Load Testing Your Web Application with k6 on Ubuntu

Your application works fine with ten users. But what happens when a thousand users hit your checkout endpoint at the same time? What about a flash sale that doubles your normal traffic in sixty seconds? Most teams find out the hard way — in production, during the event, when it is too late to fix anything.

Load testing gives you the answer before your users do. You simulate realistic traffic against your system, measure how it responds under pressure, and find the bottlenecks before they become outages.

k6 is a modern open-source load testing tool maintained by Grafana Labs. You write test scripts in JavaScript, run them from the command line, and get a clean summary of latency percentiles, request rates, and failure counts. It is fast (the runtime is written in Go), integrates naturally into CI/CD pipelines, and has become a common choice for teams that want more than Apache Bench or basic curl loops can offer.

In this tutorial you will install k6 on Ubuntu, write a load test script from scratch, run tests with different virtual user counts, use checks and thresholds to define pass/fail criteria, simulate realistic ramp-up traffic, and understand how to read and act on the output.

How k6 Works

k6 is not a browser tool. It does not render JavaScript, click buttons, or simulate a full browser session. It is an HTTP-level load generator: it creates virtual users (VUs), each of which runs your test script in a loop, sending HTTP requests as fast as your script and the network allow.

Virtual users are the core abstraction. Each VU is an independent worker that runs your script's default function from start to finish, then starts it again. If you run 50 VUs for 30 seconds, you have 50 concurrent workers hammering your server for that entire duration.

Iterations are how many times the script body runs in total. With 50 VUs each completing 20 iterations, you get 1,000 total script executions.
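The iteration arithmetic can be sketched in plain JavaScript. This is only an illustration that runs in Node.js, outside k6. The function names are hypothetical, and the duration-based estimate assumes every iteration takes about the same time; k6 reports the real count in the iterations metric.

```javascript
// Hypothetical helpers for estimating iteration counts; not part of the k6 API.

// Total script executions when every VU completes a fixed number of iterations.
function totalIterations(vus, iterationsPerVu) {
  return vus * iterationsPerVu;
}

// Rough estimate for a duration-based run, assuming every iteration takes
// about the same time and VUs never wait on each other.
function iterationsInDuration(vus, durationSeconds, iterationSeconds) {
  return vus * Math.floor(durationSeconds / iterationSeconds);
}

console.log(totalIterations(50, 20));           // 50 VUs x 20 iterations each
console.log(iterationsInDuration(50, 30, 1.5)); // 50 VUs, 30 s run, 1.5 s per iteration
```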

Checks are assertions embedded in your script — “the response status should be 200”, “the body should contain a token”. Failed checks do not abort the test; they are counted and appear in the final summary.

Thresholds are the pass/fail criteria you define upfront. You tell k6: “fail this test if p(95) latency exceeds 500ms” or “fail if more than 1% of requests get an error”. k6 exits with a non-zero code when any threshold is breached, which is what makes it drop naturally into CI pipelines — a failing load test breaks the build just like a failing unit test.
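The way checks behave (failures are counted, never thrown) can be modeled in a few lines of plain JavaScript. This is a simplified sketch that runs in Node.js, not k6's actual implementation; createCheckTally and the sample responses are made up for illustration.

```javascript
// Simplified model of check bookkeeping; k6's real implementation differs.
function createCheckTally() {
  const tally = { passes: 0, fails: 0 };
  return {
    // Run every named predicate against a value, count the outcome,
    // and return true only if all predicates passed. Nothing is thrown.
    check(value, predicates) {
      let allPassed = true;
      for (const name of Object.keys(predicates)) {
        if (predicates[name](value)) {
          tally.passes += 1;
        } else {
          tally.fails += 1;
          allPassed = false;
        }
      }
      return allPassed;
    },
    // Fraction of checks that passed so far (1 when nothing has run yet).
    passRate() {
      const total = tally.passes + tally.fails;
      return total === 0 ? 1 : tally.passes / total;
    },
    // Copy of the current pass/fail counters.
    counts() {
      return { ...tally };
    },
  };
}

// Three hypothetical responses: two succeed, one returns a server error.
const tally = createCheckTally();
const responses = [{ status: 200 }, { status: 500 }, { status: 200 }];
for (const res of responses) {
  // A failed check is recorded, and the loop simply carries on.
  tally.check(res, { 'status is 200': (r) => r.status === 200 });
}
console.log(tally.counts(), tally.passRate());
```

A threshold is what turns a tally like this into a verdict: the counters only describe what happened, and a rule such as "pass rate above 99%" decides whether the run fails.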

Prerequisites

Ubuntu 20.04, 22.04, or 24.04
A non-root user with sudo privileges
Basic command-line familiarity
Enough JavaScript to read function calls and object literals
A running HTTP service to test against — a local nginx, a staging API, or the public https://test.k6.io endpoint used in the examples below

Step 1: Install k6

k6 provides an official apt repository. Add the signing key and repository, then install:

sudo gpg --no-default-keyring \
  --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
  --keyserver hkp://keyserver.ubuntu.com:80 \
  --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69

echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] \
  https://dl.k6.io/deb stable main" \
  | sudo tee /etc/apt/sources.list.d/k6.list

sudo apt update && sudo apt install k6 -y

Verify the installation:

k6 version

You should see something like:

k6 v0.55.0 (go1.23.4, linux/amd64)

Step 2: Write Your First Load Test Script

k6 test scripts are plain JavaScript files. Create a project directory and an initial script:

mkdir ~/k6-tests && cd ~/k6-tests
nano basic-test.js

Paste this into the file:

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
};

export default function () {
  http.get('https://test.k6.io');
  sleep(1);
}

What each part does:

import http from 'k6/http' — brings in the built-in HTTP client module
export const options — tells k6 to run 10 virtual users for 30 seconds
export default function — the loop body each VU executes repeatedly
sleep(1) — pauses each VU for 1 second after each request, simulating user think time

The sleep(1) matters more than it looks. Without it, your VUs will generate as many requests per second as the network allows. That is sometimes what you want (stress testing), but it rarely reflects real user behavior (load testing). For login flows or page loads, users think for a moment between actions.
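A back-of-the-envelope estimate shows how much the sleep dominates the request rate. The sketch below is plain JavaScript for Node.js, not a k6 script. It assumes one request per iteration and a response time of about 200 ms, and the function name is hypothetical.

```javascript
// Hypothetical estimator, not a k6 API: each VU completes one iteration every
// (response time + sleep) seconds, so the steady-state request rate is
// roughly vus / (responseSeconds + sleepSeconds).
function estimateRequestsPerSecond(vus, responseSeconds, sleepSeconds) {
  return vus / (responseSeconds + sleepSeconds);
}

// 10 VUs against a target that answers in about 200 ms (an assumed figure).
const withThinkTime = estimateRequestsPerSecond(10, 0.2, 1);
const withoutThinkTime = estimateRequestsPerSecond(10, 0.2, 0);

console.log(withThinkTime.toFixed(1));    // about 8.3 requests per second
console.log(withoutThinkTime.toFixed(1)); // about 50 requests per second
```

Under these assumptions, removing the one-second pause multiplies the load roughly sixfold with the same number of VUs.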

Step 3: Run the Test and Read the Output

k6 run basic-test.js

k6 prints a live counter while the test runs, then a full summary at the end:

     data_received..................: 334 kB  11 kB/s
     data_sent......................: 29 kB   955 B/s
     http_req_blocked...............: avg=5.13ms   min=1.46µs  med=3.21µs  max=299ms  p(90)=7.14µs  p(95)=9.21µs
     http_req_connecting............: avg=1.08ms   min=0s      med=0s      max=102ms  p(90)=0s      p(95)=0s
     http_req_duration..............: avg=215ms    min=163ms   med=209ms   max=618ms  p(90)=277ms   p(95)=313ms
       { expected_response:true }...: avg=215ms    min=163ms   med=209ms   max=618ms  p(90)=277ms   p(95)=313ms
     http_req_failed................: 0.00%   ✓ 0         ✗ 281
     http_req_receiving.............: avg=1.87ms   min=40.6µs  med=1.03ms  max=88.5ms p(90)=4.13ms  p(95)=5.89ms
     http_req_sending...............: avg=18.4µs   min=6.91µs  med=14.6µs  max=355µs  p(90)=31.5µs  p(95)=38.3µs
     http_req_tls_handshaking.......: avg=4ms      min=0s      med=0s      max=218ms  p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=213ms    min=152ms   med=207ms   max=591ms  p(90)=275ms   p(95)=311ms
     http_reqs......................: 281     9.36/s
     iteration_duration.............: avg=1.22s    min=1.17s   med=1.21s   max=1.62s  p(90)=1.28s   p(95)=1.32s
     iterations.....................: 281     9.36/s
     vus............................: 10      min=10      max=10
     vus_max........................: 10      min=10      max=10

The lines you care about most:

http_req_duration: your end-to-end request latency, the sum of http_req_sending, http_req_waiting, and http_req_receiving. p(95)=313ms means 95% of all requests finished in 313ms or less. p(95) is a common SLO metric because it captures tail latency without being dominated by a handful of rare outliers.
http_req_failed: the fraction of requests that failed. By default that means a 4xx or 5xx response, or no response at all (for example, a refused connection). Zero is good.
http_reqs: total requests sent and the rate. Here k6 sent 281 requests at ~9.36 per second across 10 VUs.
http_req_waiting: time the client waited for the first byte after sending the request (time to first byte). This is mostly server processing time, plus the network latency of one round trip, with the body transfer excluded. High http_req_waiting usually means your server is slow; high http_req_receiving means data transfer is slow.
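To see why a percentile says more than an average, it helps to compute both on a small sample. The sketch below uses the nearest-rank definition of a percentile, which is one common convention and not necessarily the one k6 uses internally, so k6's figures may differ slightly. The latencies are made up, and the code runs in Node.js rather than in k6.

```javascript
// Nearest-rank percentile: sort the samples and take the value at position
// ceil(p/100 * n). This is one common definition, used here for illustration;
// k6 may interpolate between samples, so its numbers can differ slightly.
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error('percentile needs at least one sample');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function average(samples) {
  return samples.reduce((sum, value) => sum + value, 0) / samples.length;
}

// Made-up latencies in milliseconds: 19 fast requests and one slow outlier.
const latencies = Array(19).fill(200).concat([4000]);

console.log(average(latencies));         // pulled up by the single outlier
console.log(percentile(latencies, 50));  // the typical request
console.log(percentile(latencies, 95));  // 95% of requests were at or under this
console.log(percentile(latencies, 100)); // the worst request
```

With this sample the average lands well above what almost every request experienced, while the 95th percentile still reports the fast value. The outlier only shows up in the maximum.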

Step 4: Add Checks and Thresholds

Raw numbers are not enough. You need to define what “passing” looks like before you run the test — otherwise you will unconsciously adjust your standard to match whatever the server delivered.

Edit basic-test.js:

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 20,
  duration: '30s',
  thresholds: {
    http_req_failed: ['rate<0.01'],      // less than 1% of requests fail
    http_req_duration: ['p(95)<500'],    // 95th percentile under 500ms
  },
};

export default function () {
  const res = http.get('https://test.k6.io');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time under 1s': (r) => r.timings.duration < 1000,
  });

  sleep(1);
}

Now if either threshold is breached, k6 exits with code 99. When you add this to a CI pipeline:

k6 run basic-test.js
if [ $? -ne 0 ]; then
  echo "Load test failed: thresholds breached"
  exit 1
fi

A performance regression will fail the pipeline the same way a broken unit test would.
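The link between threshold expressions and the exit code can be illustrated with a small model. This is a deliberately simplified sketch in plain JavaScript for Node.js: it only understands the expression shapes used in this tutorial, the function names are hypothetical, and the run results are invented. It is not how k6 parses thresholds internally.

```javascript
// Simplified model of threshold evaluation, for illustration only. It handles
// just the "<aggregate><operator><number>" shape used in this tutorial.
const THRESHOLDS_FAILED_EXIT_CODE = 99;
const EXPRESSION =
  /^\s*([a-z]+(?:\(\d+(?:\.\d+)?\))?)\s*(<=|>=|<|>)\s*(\d+(?:\.\d+)?)\s*$/;

function evaluateThreshold(expression, metricValues) {
  const match = EXPRESSION.exec(expression);
  if (!match) {
    throw new Error(`unsupported threshold expression: ${expression}`);
  }
  const [, aggregate, operator, limitText] = match;
  const actual = metricValues?.[aggregate];
  if (typeof actual !== 'number') {
    throw new Error(`no value recorded for ${aggregate}`);
  }
  const limit = Number(limitText);
  switch (operator) {
    case '<': return actual < limit;
    case '<=': return actual <= limit;
    case '>': return actual > limit;
    default: return actual >= limit; // only '>=' remains
  }
}

// Returns the exit code a CI job would see: 0 if every threshold holds.
function exitCodeFor(thresholds, metrics) {
  for (const [metricName, expressions] of Object.entries(thresholds)) {
    for (const expression of expressions) {
      if (!evaluateThreshold(expression, metrics[metricName])) {
        return THRESHOLDS_FAILED_EXIT_CODE;
      }
    }
  }
  return 0;
}

const thresholds = {
  http_req_failed: ['rate<0.01'],
  http_req_duration: ['p(95)<500'],
};

// Hypothetical results from two runs.
const healthyRun = { http_req_failed: { rate: 0 }, http_req_duration: { 'p(95)': 313 } };
const slowRun = { http_req_failed: { rate: 0.002 }, http_req_duration: { 'p(95)': 740 } };

console.log(exitCodeFor(thresholds, healthyRun));
console.log(exitCodeFor(thresholds, slowRun));
```

The slow run stays under the error-rate limit but breaches the latency limit, and a single breached threshold is enough to fail the run.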

Step 5: Simulate Realistic Traffic with Stages

Real traffic does not jump from 0 to 500 users instantly. It ramps up, holds, then ramps down. k6 stages let you model this shape:

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },   // ramp up to 50 VUs over 1 minute
    { duration: '3m', target: 50 },   // hold at 50 VUs for 3 minutes
    { duration: '1m', target: 0 },    // ramp down to 0
  ],
  thresholds: {
    http_req_failed: ['rate<0.01'],
    http_req_duration: ['p(95)<500'],
  },
};

export default function () {
  const res = http.get('https://test.k6.io');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}

The ramp-up phase serves an important purpose: it lets your server warm up — JVM JIT compilation, database connection pool initialization, DNS caching. Jumping straight to peak load can produce results that are worse than what real users experience. The steady-state phase (the middle three minutes) is your actual measurement window. The ramp-down confirms the server recovers cleanly to baseline latency.
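The shape that a stages array describes can be worked out by hand with linear interpolation. The sketch below is a plain JavaScript model for Node.js, not k6 code. It assumes the run starts at 0 VUs and ramps linearly between targets; k6's actual starting VU count and rounding may differ, and both function names are hypothetical.

```javascript
// Illustrative model of how a stages array maps elapsed time to a VU target.
// It assumes a start at 0 VUs and linear ramps between stage targets.

// Convert the duration strings used in this tutorial ('30s', '1m') to seconds.
// Only a single number followed by s or m is handled here.
function toSeconds(duration) {
  const match = /^(\d+)(s|m)$/.exec(duration);
  if (!match) {
    throw new Error(`unsupported duration: ${duration}`);
  }
  return Number(match[1]) * (match[2] === 'm' ? 60 : 1);
}

function targetVusAt(stages, elapsedSeconds, startVus = 0) {
  let stageStart = 0;
  let previousTarget = startVus;
  for (const stage of stages) {
    const length = toSeconds(stage.duration);
    if (elapsedSeconds <= stageStart + length) {
      // Fraction of the current stage that has elapsed, from 0 to 1.
      const progress = (elapsedSeconds - stageStart) / length;
      return Math.round(previousTarget + (stage.target - previousTarget) * progress);
    }
    stageStart += length;
    previousTarget = stage.target;
  }
  return previousTarget; // past the final stage
}

const stages = [
  { duration: '1m', target: 50 },
  { duration: '3m', target: 50 },
  { duration: '1m', target: 0 },
];

for (const t of [0, 30, 60, 120, 270, 300]) {
  console.log(`${t}s -> ${targetVusAt(stages, t)} VUs`);
}
```

Halfway through the first minute the model is at half the target, it holds at 50 for the middle three minutes, and it is back at half the target halfway through the ramp-down.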

Step 6: Load Test a POST Endpoint with JSON

GET requests cover basic scenarios, but most critical application paths involve POST requests with JSON payloads. Here is how you load test a login endpoint:

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 20 },
    { duration: '1m',  target: 20 },
    { duration: '10s', target: 0 },
  ],
  thresholds: {
    http_req_failed: ['rate<0.05'],
    http_req_duration: ['p(95)<800'],
  },
};

const BASE_URL = 'http://192.168.1.100:3000';

export default function () {
  const payload = JSON.stringify({
    email: 'testuser@example.com', // placeholder; use a real test account
    password: 'testpassword123',
  });

  const params = {
    headers: { 'Content-Type': 'application/json' },
  };

  const res = http.post(`${BASE_URL}/api/auth/login`, payload, params);

  check(res, {
    'login successful': (r) => r.status === 200,
    'token present': (r) => {
      try {
        return JSON.parse(r.body).token !== undefined;
      } catch (_) {
        return false;
      }
    },
  });

  sleep(1);
}

Replace 192.168.1.100:3000 with your staging server address. The Content-Type: application/json header is required. Without it, most frameworks (Express, Fastify, Spring Boot) will not parse the body as JSON, and your test will generate a flood of 4xx errors that have nothing to do with performance.

The try/catch in the check is defensive: if the server returns a non-JSON body under load (an HTML error page, for example), an unguarded JSON.parse would throw and abort the iteration instead of recording a failed check.
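Because the parsing logic is ordinary JavaScript, it can be pulled into a small function and tried out in Node.js before it goes into a k6 script. The helper below is a hypothetical variant of the check above. It is slightly stricter than the original, since it requires the token to be a non-empty string, and the sample bodies are invented.

```javascript
// Hypothetical helper: returns true only when the body is valid JSON
// containing a non-empty string token. Any parse failure counts as "no token".
function hasToken(body) {
  try {
    const parsed = JSON.parse(body);
    return parsed !== null && typeof parsed.token === 'string' && parsed.token.length > 0;
  } catch (_) {
    return false;
  }
}

console.log(hasToken('{"token":"abc123"}'));                        // valid login response
console.log(hasToken('<html><body>502 Bad Gateway</body></html>')); // HTML error page
console.log(hasToken('{"error":"invalid credentials"}'));           // JSON without a token
```

In the k6 script the check entry would then shrink to 'token present': (r) => hasToken(r.body).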

Common Mistakes & Troubleshooting

“WARN Request Failed” with connection refused
The target URL is not reachable. Verify with curl http://192.168.1.100:3000 from the same machine. Check that the service is running and that your firewall allows inbound connections on that port. If you have UFW enabled, see the rules with sudo ufw status.

All requests return 429 Too Many Requests
Your server has rate limiting configured. If you intentionally set up rate limiting (as covered in Configure Rate Limiting in Nginx on Ubuntu), this is expected behavior. For load testing purposes, either raise the rate limit in your staging configuration or reduce the k6 VU count so requests stay within the allowed window.

p(95) latency is 5–10x higher than single-request baseline
This is the classic sign of a server-side bottleneck: saturated CPU, database connection pool exhaustion, or hitting file descriptor limits. Run top or htop on the server while the k6 test is active. Check ss -s for connection counts. Look at your application’s connection pool settings — a pool of 5 connections serving 50 concurrent VUs will queue almost every request.

k6 cannot import npm packages
k6 uses its own embedded JavaScript runtime (goja-based), not Node.js. You cannot require('axios') or install packages with npm and expect them to load. Use the built-in k6 modules: k6 (which exports check and sleep), k6/http, k6/metrics, k6/crypto. If you need to share utility functions, put them in a local .js file and import them with a relative path.

VU count in output is lower than configured
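If you configure 100 VUs but see only 20-30 in the vus line, check whether the script uses stages: the value shown is a point-in-time sample, so during ramp-up or ramp-down it is lower than the peak target (vus_max shows the maximum allocated). A sleeping VU still counts as active, so sleep() does not reduce the VU count. What a long sleep reduces is the request rate: with sleep(5) and one request per iteration, each VU sends at most one request every five seconds, so 100 VUs produce no more than about 20 requests per second.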

Best Practices

Always test on staging, not production. A test with 200 VUs and no sleep sends requests as fast as the target can answer them, which against a fast service can mean thousands of requests per second. Running that against a live system affects real users and can cascade into a self-inflicted outage.

Start low and scale up. Run at 5 VUs first to confirm your script is correct and your target is reachable. Scale to 20, then 50, then your expected peak. If something breaks at 50 VUs, you want to know it at 50, not discover it after wasting 10 minutes at 500.

Add jitter to your sleep calls. Replace sleep(1) with sleep(Math.random() * 2 + 1) for a random pause between 1 and 3 seconds. Fixed sleep values cause synchronized request bursts as all VUs wake up at the same moment; random jitter spreads the load more naturally.
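The jitter formula generalizes to any range. The helper below is a small sketch in plain JavaScript, runnable in Node.js. The name jitteredPause is hypothetical, and the random source is passed in as a parameter only so the function can be tested with fixed values.

```javascript
// Returns a pause length in seconds, uniformly distributed in [min, max).
// The random source is injectable so the function can be tested deterministically.
function jitteredPause(minSeconds, maxSeconds, random = Math.random) {
  if (maxSeconds < minSeconds) {
    throw new RangeError('maxSeconds must not be smaller than minSeconds');
  }
  return minSeconds + random() * (maxSeconds - minSeconds);
}

// Inside a k6 script this would be used as: sleep(jitteredPause(1, 3));
for (let i = 0; i < 3; i += 1) {
  console.log(jitteredPause(1, 3).toFixed(2));
}
```

Calling jitteredPause(1, 3) is equivalent to Math.random() * 2 + 1, with the bounds spelled out instead of folded into the arithmetic.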

Test one scenario at a time. A script that hits 15 different endpoints tells you almost nothing useful. Test one critical path — the checkout flow, the search endpoint, the authentication sequence — and measure it precisely. You can run multiple scenario scripts and compare them.

Set thresholds before you look at results. Decide what p(95) and error rate are acceptable based on your SLOs, not based on what the server happened to deliver. Writing thresholds after the run is the same as drawing the target after you shoot.

Correlate with server-side metrics. k6 reports what the client observes. CPU usage, memory, database query times, and GC pressure on the server are equally important. If you have Prometheus and Grafana running (see Setup Prometheus and Grafana on Ubuntu), k6 can push metrics to Prometheus Remote Write in real time using the --out experimental-prometheus-rw flag, so you can watch k6 metrics alongside server metrics on the same dashboard.

Conclusion

You have installed k6 on Ubuntu and worked through a complete load testing workflow: a basic GET test, adding checks and thresholds, modeling a realistic ramp-up with stages, and testing a POST endpoint with JSON. You also know which metrics matter and how to interpret them.

The next step is to run this against your actual staging environment. Start at low VU counts, watch top and your application logs on the server, and increase load until you find the ceiling. Record the p(95) latency and max throughput for your current version — that number is your baseline. When you ship a new version, run the same test and compare. If p(95) jumps significantly without a reason, you have a regression before anyone in production notices it.
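The baseline comparison can be as simple as a percentage check. The sketch below is plain JavaScript for Node.js. The function name, the 10% default tolerance, and the sample numbers are all assumptions chosen for illustration; pick a tolerance that matches your own SLOs and the run-to-run noise you observe.

```javascript
// Hypothetical regression check: flags a run whose p(95) latency exceeds the
// recorded baseline by more than an allowed fraction. The 10% default
// tolerance is an arbitrary example, not a recommendation from k6.
function compareToBaseline(baselineP95Ms, currentP95Ms, tolerance = 0.1) {
  if (baselineP95Ms <= 0) {
    throw new RangeError('baseline must be a positive number of milliseconds');
  }
  const change = (currentP95Ms - baselineP95Ms) / baselineP95Ms;
  return {
    changePercent: Math.round(change * 1000) / 10, // one decimal place
    regression: change > tolerance,
  };
}

// Made-up numbers: the baseline run measured 313 ms, two later runs follow.
console.log(compareToBaseline(313, 320)); // small drift, within tolerance
console.log(compareToBaseline(313, 450)); // large jump, flagged as a regression
```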

From here you can explore protocols beyond plain HTTP. k6 ships built-in modules for WebSocket (k6/ws) and gRPC (k6/net/grpc), recent releases include a browser automation module, and community extensions built with xk6 add further protocols. For distributed load testing from multiple regions or larger VU counts than a single machine can generate, the Grafana Cloud k6 service runs the same scripts remotely with no infrastructure to manage.
