Your application works fine with ten users. But what happens when a thousand users hit your checkout endpoint at the same time? What about a flash sale that doubles your normal traffic in sixty seconds? Most teams find out the hard way — in production, during the event, when it is too late to fix anything.
Load testing gives you the answer before your users do. You simulate realistic traffic against your system, measure how it responds under pressure, and find the bottlenecks before they become outages.
k6 is a modern open-source load testing tool built by Grafana Labs. You write test scripts in JavaScript, run them from the command line, and get a clean summary of latency percentiles, request rates, and failure counts. It is fast (the runtime is written in Go), integrates naturally into CI/CD pipelines, and is a common choice for teams that want more than what Apache Bench or basic curl loops can offer.
In this tutorial you will install k6 on Ubuntu, write a load test script from scratch, run tests with different virtual user counts, use checks and thresholds to define pass/fail criteria, simulate realistic ramp-up traffic, and understand how to read and act on the output.
How k6 Works
The k6 HTTP API used in this tutorial is not a browser. It does not render pages, execute page JavaScript, click buttons, or simulate a full browser session. It is an HTTP-level load generator: it creates virtual users (VUs), each of which runs your test script in a loop, sending HTTP requests as fast as your script, the server, and the network allow.
Virtual users are the core abstraction. Each VU is an independent worker that runs your script from top to bottom, then starts again from the top. If you run 50 VUs for 30 seconds, you have 50 concurrent workers hammering your server for that entire duration.
Iterations are how many times the script body runs in total. With 50 VUs each completing 20 iterations, you get 1,000 total script executions.
Checks are assertions embedded in your script — “the response status should be 200”, “the body should contain a token”. Failed checks do not abort the test; they are counted and appear in the final summary.
Thresholds are the pass/fail criteria you define upfront. You tell k6: “fail this test if p(95) latency exceeds 500ms” or “fail if more than 1% of requests get an error”. k6 exits with a non-zero code when any threshold is breached, which is what makes it drop naturally into CI pipelines — a failing load test breaks the build just like a failing unit test.
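To make the iteration arithmetic described earlier concrete, here is a minimal sketch of an options object that fixes the iteration count rather than the duration. It assumes k6's per-vu-iterations executor; the scenario name fixed_work and the maxDuration value are illustrative choices, not part of the original examples. It is written as a plain object so it can also be inspected outside k6.

```javascript
// Sketch: 50 VUs, each running the script body exactly 20 times.
// In a k6 script, prefix this declaration with `export`.
const options = {
  scenarios: {
    // "fixed_work" is a hypothetical scenario name.
    fixed_work: {
      executor: 'per-vu-iterations',
      vus: 50,
      iterations: 20,    // iterations per VU, not in total
      maxDuration: '5m', // assumed safety cap: the scenario stops here even if iterations remain
    },
  },
};

// Total script executions = VUs x iterations per VU.
const scenario = options.scenarios.fixed_work;
const totalIterations = scenario.vus * scenario.iterations;
console.log(totalIterations); // 1000
```

A duration-based run (vus plus duration) has no fixed total; the number of iterations then depends on how long each pass through the script takes.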
Prerequisites
- Ubuntu 20.04, 22.04, or 24.04
- A non-root user with sudo privileges
- Basic command-line familiarity
- Enough JavaScript to read function calls and object literals
- A running HTTP service to test against: a local nginx, a staging API, or the public https://test.k6.io endpoint used in the examples below
Step 1: Install k6
k6 provides an official apt repository. Add the signing key and repository, then install:
sudo gpg --no-default-keyring \
  --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
  --keyserver hkp://keyserver.ubuntu.com:80 \
  --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
  | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt update && sudo apt install k6 -y
Verify the installation:
k6 version
You should see something like:
k6 v0.55.0 (go1.23.4, linux/amd64)
Step 2: Write Your First Load Test Script
k6 test scripts are plain JavaScript files. Create a project directory and an initial script:
mkdir ~/k6-tests && cd ~/k6-tests
nano basic-test.js
Paste this into the file:
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
};

export default function () {
  http.get('https://test.k6.io');
  sleep(1);
}
What each part does:
- import http from 'k6/http': brings in the built-in HTTP client module
- export const options: tells k6 to run 10 virtual users for 30 seconds
- export default function: the loop body each VU executes repeatedly
- sleep(1): pauses each VU for 1 second after each request, simulating user think time
The sleep(1) matters more than it looks. Without it, your VUs will generate as many requests per second as the network allows. That is sometimes what you want (stress testing), but it rarely reflects real user behavior (load testing). For login flows or page loads, users think for a moment between actions.
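A rough way to see how much think time shapes the load is to estimate the request rate from the VU count and the time one iteration takes. The sketch below is an approximation, not something k6 computes for you: it assumes one request per iteration and a steady average response time, and the function name and sample numbers are hypothetical.

```javascript
// Rough closed-loop estimate: each VU completes one iteration every
// (response time + think time) seconds, so the overall request rate is
// the VU count divided by that iteration time.
function estimateRequestsPerSecond(vus, avgResponseSeconds, sleepSeconds) {
  const iterationSeconds = avgResponseSeconds + sleepSeconds;
  if (iterationSeconds <= 0) {
    throw new RangeError('iteration time must be positive');
  }
  return vus / iterationSeconds;
}

// Hypothetical numbers: 10 VUs and 0.25 s average responses.
console.log(estimateRequestsPerSecond(10, 0.25, 1)); // 8  (with sleep(1))
console.log(estimateRequestsPerSecond(10, 0.25, 0)); // 40 (no sleep)
```

Under these assumptions, removing the one-second pause multiplies the request rate by five without adding a single VU.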
Step 3: Run the Test and Read the Output
k6 run basic-test.js
k6 prints a live counter while the test runs, then a full summary at the end:
data_received..................: 334 kB 11 kB/s
data_sent......................: 29 kB 955 B/s
http_req_blocked...............: avg=5.13ms min=1.46µs med=3.21µs max=299ms p(90)=7.14µs p(95)=9.21µs
http_req_connecting............: avg=1.08ms min=0s med=0s max=102ms p(90)=0s p(95)=0s
http_req_duration..............: avg=215ms min=163ms med=209ms max=618ms p(90)=277ms p(95)=313ms
{ expected_response:true }...: avg=215ms min=163ms med=209ms max=618ms p(90)=277ms p(95)=313ms
http_req_failed................: 0.00% ✓ 0 ✗ 281
http_req_receiving.............: avg=1.87ms min=40.6µs med=1.03ms max=88.5ms p(90)=4.13ms p(95)=5.89ms
http_req_sending...............: avg=18.4µs min=6.91µs med=14.6µs max=355µs p(90)=31.5µs p(95)=38.3µs
http_req_tls_handshaking.......: avg=4ms min=0s med=0s max=218ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=213ms min=152ms med=207ms max=591ms p(90)=275ms p(95)=311ms
http_reqs......................: 281 9.36/s
iteration_duration.............: avg=1.22s min=1.17s med=1.21s max=1.62s p(90)=1.28s p(95)=1.32s
iterations.....................: 281 9.36/s
vus............................: 10 min=10 max=10
vus_max........................: 10 min=10 max=10
The lines you care about most:
- http_req_duration: your end-to-end request latency. p(95)=313ms means 95% of all requests finished in 313ms or less. p(95) is a common SLO metric because it captures tail latency without being dominated by a handful of extreme outliers.
- http_req_failed: the fraction of requests k6 counted as failed. By default that is any request that hit a network error or returned a status outside 200-399, so 4xx and 5xx responses count. Zero is good.
- http_reqs: total requests sent and the rate. Here k6 sent 281 requests at ~9.36 per second across 10 VUs.
- http_req_waiting: time the client waited for the first byte after sending the request. This approximates server processing time, with the response download excluded. High http_req_waiting means your server is slow; high http_req_receiving means data transfer is slow.
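To see why a percentile is more robust than an average, the sketch below computes both over a small set of made-up latencies. It uses the nearest-rank method for simplicity; k6 may interpolate between samples, so its reported figures can differ slightly. The function name and the sample data are hypothetical.

```javascript
// Nearest-rank percentile: sort the samples and take the value at
// position ceil(p * n / 100), counting from 1.
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new RangeError('need at least one sample');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical latencies in ms: 19 fast requests and one slow outlier.
const durations = Array(19).fill(200).concat([2000]);
const mean = durations.reduce((sum, d) => sum + d, 0) / durations.length;

console.log(mean);                       // 290
console.log(percentile(durations, 95));  // 200
console.log(percentile(durations, 100)); // 2000
```

The single 2-second request pulls the mean up to 290ms, while p(95) still reports what the large majority of requests experienced.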
Step 4: Add Checks and Thresholds
Raw numbers are not enough. You need to define what “passing” looks like before you run the test — otherwise you will unconsciously adjust your standard to match whatever the server delivered.
Edit basic-test.js:
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 20,
  duration: '30s',
  thresholds: {
    http_req_failed: ['rate<0.01'],   // less than 1% of requests fail
    http_req_duration: ['p(95)<500'], // 95th percentile under 500ms
  },
};

export default function () {
  const res = http.get('https://test.k6.io');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time under 1s': (r) => r.timings.duration < 1000,
  });
  sleep(1);
}
Now if either threshold is breached, k6 exits with code 99. When you add this to a CI pipeline:
k6 run basic-test.js
if [ $? -ne 0 ]; then
  echo "Load test failed: thresholds breached"
  exit 1
fi
A performance regression will fail the pipeline the same way a broken unit test would.
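Two refinements are often useful in CI. Failed checks do not fail a run on their own, but a threshold on the built-in checks metric can, and the long-form threshold syntax can stop a run early instead of letting a doomed test finish. The sketch below shows both; the 99% pass rate and the 10-second grace period are assumed values, not recommendations from this tutorial.

```javascript
// Sketch of the long-form threshold syntax.
// In a k6 script, prefix this declaration with `export`.
const options = {
  vus: 20,
  duration: '30s',
  thresholds: {
    http_req_failed: ['rate<0.01'],
    http_req_duration: [
      {
        threshold: 'p(95)<500',
        abortOnFail: true,     // stop the run as soon as this is breached
        delayAbortEval: '10s', // assumed grace period so early samples do not trigger an abort
      },
    ],
    // Require 99% of check() assertions to pass (assumed value).
    checks: ['rate>0.99'],
  },
};

console.log(Object.keys(options.thresholds).join(', '));
```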
Step 5: Simulate Realistic Traffic with Stages
Real traffic does not jump from 0 to 500 users instantly. It ramps up, holds, then ramps down. k6 stages let you model this shape:
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 }, // ramp up to 50 VUs over 1 minute
    { duration: '3m', target: 50 }, // hold at 50 VUs for 3 minutes
    { duration: '1m', target: 0 },  // ramp down to 0
  ],
  thresholds: {
    http_req_failed: ['rate<0.01'],
    http_req_duration: ['p(95)<500'],
  },
};

export default function () {
  const res = http.get('https://test.k6.io');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
The ramp-up phase serves an important purpose: it lets your server warm up — JVM JIT compilation, database connection pool initialization, DNS caching. Jumping straight to peak load can produce results that are worse than what real users experience. The steady-state phase (the middle three minutes) is your actual measurement window. The ramp-down confirms the server recovers cleanly to baseline latency.
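The shape these stages describe can be sketched as a simple linear interpolation between targets. This is an illustration of the idea, not k6's scheduler: k6 takes duration strings such as '1m' and may round differently, while the hypothetical helper below takes seconds to stay short.

```javascript
// Linear interpolation of the VU target across k6-style stages.
// Each stage ramps from the previous target to its own target.
function targetVUsAt(stages, elapsedSeconds, startVUs = 0) {
  let stageStart = 0;
  let from = startVUs;
  for (const { seconds, target } of stages) {
    const stageEnd = stageStart + seconds;
    if (elapsedSeconds <= stageEnd) {
      // A zero-length stage jumps straight to its target.
      const progress = seconds > 0 ? (elapsedSeconds - stageStart) / seconds : 1;
      return Math.round(from + (target - from) * progress);
    }
    stageStart = stageEnd;
    from = target;
  }
  return from; // past the last stage
}

// Same shape as the stages above, expressed in seconds.
const stages = [
  { seconds: 60, target: 50 },
  { seconds: 180, target: 50 },
  { seconds: 60, target: 0 },
];

console.log(targetVUsAt(stages, 30));  // 25 (halfway through ramp-up)
console.log(targetVUsAt(stages, 120)); // 50 (steady state)
console.log(targetVUsAt(stages, 270)); // 25 (halfway through ramp-down)
```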
Step 6: Load Test a POST Endpoint with JSON
GET requests cover basic scenarios, but most critical application paths involve POST requests with JSON payloads. Here is how you load test a login endpoint:
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 20 },
    { duration: '1m', target: 20 },
    { duration: '10s', target: 0 },
  ],
  thresholds: {
    http_req_failed: ['rate<0.05'],
    http_req_duration: ['p(95)<800'],
  },
};

const BASE_URL = 'http://192.168.1.100:3000';

export default function () {
  const payload = JSON.stringify({
    email: '[email protected]',
    password: 'testpassword123',
  });
  const params = {
    headers: { 'Content-Type': 'application/json' },
  };

  const res = http.post(`${BASE_URL}/api/auth/login`, payload, params);

  check(res, {
    'login successful': (r) => r.status === 200,
    'token present': (r) => {
      try {
        return JSON.parse(r.body).token !== undefined;
      } catch (_) {
        return false;
      }
    },
  });
  sleep(1);
}
Replace 192.168.1.100:3000 with your staging server address. The Content-Type: application/json header is required — without it, most frameworks (Express, Fastify, Spring Boot) will not parse the body and your test will generate a flood of 400 errors that have nothing to do with performance.
The try/catch in the check is defensive: if the server returns a non-JSON body under load (an HTML error page, for example), JSON.parse would throw and crash the check function without it.
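If several checks need the same defensive parsing, the logic can be pulled into a small helper. The sketch below is plain JavaScript with a hypothetical function name; inside a k6 check it would be called as hasToken(r.body). The sample bodies are invented for illustration.

```javascript
// Returns true only when the body is valid JSON containing a `token` field.
function hasToken(body) {
  try {
    const parsed = JSON.parse(body);
    return parsed !== null && typeof parsed === 'object' && parsed.token !== undefined;
  } catch (_) {
    return false; // non-JSON body, e.g. an HTML error page
  }
}

console.log(hasToken('{"token":"abc123"}'));           // true
console.log(hasToken('<html>502 Bad Gateway</html>')); // false
console.log(hasToken('{"error":"invalid login"}'));    // false
```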
Common Mistakes & Troubleshooting
“WARN Request Failed” with connection refused
The target URL is not reachable. Verify with curl http://192.168.1.100:3000 from the same machine. Check that the service is running and that your firewall allows inbound connections on that port. If you have UFW enabled, see the rules with sudo ufw status.
All requests return 429 Too Many Requests
Your server has rate limiting configured. If you intentionally set up rate limiting (as covered in Configure Rate Limiting in Nginx on Ubuntu), this is expected behavior. For load testing purposes, either raise the rate limit in your staging configuration or reduce the k6 VU count so requests stay within the allowed window.
p(95) latency is 5–10x higher than single-request baseline
This is the classic sign of a server-side bottleneck: saturated CPU, database connection pool exhaustion, or hitting file descriptor limits. Run top or htop on the server while the k6 test is active. Check ss -s for connection counts. Look at your application’s connection pool settings — a pool of 5 connections serving 50 concurrent VUs will queue almost every request.
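The connection pool case can be checked with back-of-the-envelope arithmetic. The sketch below is a deliberately simplified model: it assumes every request holds one connection for a fixed query time and that all VUs arrive at once. The function names and the 40ms query time are hypothetical.

```javascript
// Upper bound on throughput: each connection serves one request per
// queryMs, so the pool as a whole completes poolSize requests per queryMs.
function poolCapacityPerSecond(poolSize, queryMs) {
  return (poolSize * 1000) / queryMs;
}

// If all requests arrive together, they are served in waves of poolSize;
// the last wave waits for every earlier wave to finish.
function worstCaseWaitMs(concurrentRequests, poolSize, queryMs) {
  const waves = Math.ceil(concurrentRequests / poolSize);
  return (waves - 1) * queryMs;
}

// Hypothetical numbers: pool of 5, 50 VUs, 40 ms per query.
console.log(poolCapacityPerSecond(5, 40)); // 125
console.log(worstCaseWaitMs(50, 5, 40));   // 360
```

In this model a 40ms query can end up waiting 360ms for a connection, which is the kind of gap between single-request and p(95) latency described above.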
k6 cannot import npm packages
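k6 uses its own JavaScript runtime embedded in Go (goja, or its fork Sobek in recent releases), not Node.js. Packages that depend on Node.js APIs, such as axios, will not work, and you cannot install packages with npm and require() them directly. Use the built-in k6 modules instead, such as k6/http, k6/metrics, and k6/crypto; check and sleep are imported from the top-level k6 module. If you need to share utility functions, put them in a local .js file and import them with a relative path.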
VU count in output is lower than configured
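If you configure 100 VUs but the summary shows only 20-30, first check which number you are reading. With stages, the vus line reports the most recent value, which is low once ramp-down has begun; the max= figure next to it shows the peak that was actually reached. The test may also be too short to ramp all VUs before it ends. Note that sleep() does not reduce the VU count: a sleeping VU is still active, so a long pause such as sleep(5) lowers the request rate, not the number of VUs.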
Best Practices
Always test on staging, not production. A test with 200 VUs and no sleep can generate thousands of requests per second against a fast endpoint. Running that against a live system affects real users and can cascade into a self-inflicted outage.
Start low and scale up. Run at 5 VUs first to confirm your script is correct and your target is reachable. Scale to 20, then 50, then your expected peak. If something breaks at 50 VUs, you want to know it at 50, not discover it after wasting 10 minutes at 500.
Add jitter to your sleep calls. Replace sleep(1) with sleep(Math.random() * 2 + 1) for a random pause between 1 and 3 seconds. Fixed sleep values cause synchronized request bursts as all VUs wake up at the same moment; random jitter spreads the load more naturally.
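The jitter expression can be wrapped in a small helper so the range is explicit. The function name below is hypothetical; in a k6 script it would be used as sleep(thinkTime(1, 3)).

```javascript
// Uniform random think time in the range [minSeconds, maxSeconds).
function thinkTime(minSeconds, maxSeconds) {
  return minSeconds + Math.random() * (maxSeconds - minSeconds);
}

// Print a few sample pauses; the values differ on every run.
const samples = Array.from({ length: 5 }, () => thinkTime(1, 3));
console.log(samples.map((s) => s.toFixed(2)).join(', '));
```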
Test one scenario at a time. A script that hits 15 different endpoints tells you almost nothing useful. Test one critical path — the checkout flow, the search endpoint, the authentication sequence — and measure it precisely. You can run multiple scenario scripts and compare them.
Set thresholds before you look at results. Decide what p(95) and error rate are acceptable based on your SLOs, not based on what the server happened to deliver. Writing thresholds after the run is the same as drawing the target after you shoot.
Correlate with server-side metrics. k6 reports what the client observes. CPU usage, memory, database query times, and GC pressure on the server are equally important. If you have Prometheus and Grafana running (see Setup Prometheus and Grafana on Ubuntu), k6 can push metrics to Prometheus Remote Write in real time using the --out experimental-prometheus-rw flag, so you can watch k6 metrics alongside server metrics on the same dashboard.
Conclusion
You have installed k6 on Ubuntu and worked through a complete load testing workflow: a basic GET test, adding checks and thresholds, modeling a realistic ramp-up with stages, and testing a POST endpoint with JSON. You also know which metrics matter and how to interpret them.
The next step is to run this against your actual staging environment. Start at low VU counts, watch top and your application logs on the server, and increase load until you find the ceiling. Record the p(95) latency and max throughput for your current version — that number is your baseline. When you ship a new version, run the same test and compare. If p(95) jumps significantly without a reason, you have a regression before anyone in production notices it.
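The comparison against a baseline can be as simple as the sketch below. The helper name, the 10% default tolerance, and the sample values are assumptions for illustration; pick a tolerance that reflects the normal run-to-run noise in your environment.

```javascript
// Flags a regression when the current p(95) exceeds the baseline by more
// than the allowed fraction.
function isRegression(baselineP95Ms, currentP95Ms, tolerance = 0.10) {
  return currentP95Ms > baselineP95Ms * (1 + tolerance);
}

// Hypothetical p(95) values in ms from two runs of the same test.
console.log(isRegression(313, 330)); // false (within 10% of baseline)
console.log(isRegression(313, 420)); // true
```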
From here you can explore protocols beyond HTTP. Recent k6 releases include built-in modules for WebSocket (k6/ws), gRPC (k6/net/grpc), and browser automation (k6/browser), and k6 extensions (xk6) add further protocols and outputs maintained by the community. For distributed load testing from multiple regions, or for larger VU counts than a single machine can generate, the Grafana Cloud k6 service runs the same scripts remotely with no infrastructure to manage.