Using load-tests to ensure that your infrastructure, applications, and services can handle user requests in a timely manner is a mandatory step in your application delivery pipeline. To this end, there are many types of load-tests at your disposal, each helping you optimize resources and cost and identify bottlenecks by simulating a different usage pattern that your application needs to handle gracefully.
Generating the load these tests often require is not something that can realistically be done from your local machine, given the need for high and consistent compute and network throughput over prolonged periods of time.
This article explores how Kubernetes can be used as a foundation for a scalable load-testing strategy, using CNCF tools to orchestrate, execute and analyze results accordingly.
Before we dive into the nitty-gritty, let's discuss two of the most common load-testing metrics that are key to planning and executing your load-tests: Virtual Users (VUs) and Requests per Second (RPS).
A Virtual User (VU) simulates an actual user's interaction with your system. For example, when load-testing a “Create Account” flow in an ecommerce application, the corresponding user would have to go through multiple steps.
Under the hood, this results in multiple requests to your backend, both for the “visual” actions performed by the user and possibly for auto-complete or validation logic running in the background. These requests do not all arrive at the same time; there are delays between them, depending on how fast (or slow) an actual user walks through the process. The resulting “Virtual User script” includes all these requests, with corresponding delays between them.
Requests per Second (RPS), on the other hand, is a more direct measurement of individual requests made to an API or backend, which doesn’t necessarily correlate to a discrete (stateful) user interaction as described for a Virtual User above.
Translating between the two is doable but can be misleading. For the example above, let’s say the “Create Account” flow for 1 VU generates 10 backend requests over an average user time of 60 seconds. 100 VUs will then generate 1000 requests per minute, or roughly 17 RPS (if they arrive at regular intervals, which is unlikely).
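The arithmetic above can be captured in a couple of helper functions. This is only a rough sketch: the function names are made up for illustration, and it assumes requests are spread evenly across the flow, which real users rarely do.

```javascript
// Rough VU <-> RPS conversion. Assumes requests are spread evenly over the
// flow, which real traffic rarely does, so treat the result as an estimate.
function vusToRps(vus, requestsPerFlow, flowSeconds) {
  return (vus * requestsPerFlow) / flowSeconds;
}

// Inverse: how many VUs are needed to sustain a target RPS for the same flow.
function rpsToVus(targetRps, requestsPerFlow, flowSeconds) {
  return Math.ceil((targetRps * flowSeconds) / requestsPerFlow);
}

// "Create Account" example from the text: 100 VUs, 10 requests, 60 seconds.
console.log(vusToRps(100, 10, 60).toFixed(1)); // 16.7
console.log(rpsToVus(100, 10, 60)); // 600
```

Read the other way around, sustaining 100 RPS with this particular flow would take about 600 VUs, which shows how strongly the translation depends on the pacing of the VU script.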
Which of these to use when calibrating your load tests usually depends on your goal: both E2E performance tests and tests of a single API endpoint are completely valid (and common) cases. The first is more about users, the second about plain RPS. It is very common to use both while executing performance tests. For example, after initial E2E tests you might check specific endpoints that seem to underperform, or the endpoints that are the most "costly" (response time multiplied by frequency). Conversely, after mapping specific endpoints and how often they are called, you might reflect them in tests that make just a few requests.
Going back to the example above, you might start by running the “Create Account” test with 10000 VUs and identify one of the backend validation calls as a bottleneck. In that case, an API load-test purely on the validation endpoint might be used to ensure a throughput of 100 RPS for that request specifically.
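As a minimal sketch of the "costly endpoint" idea (response time multiplied by frequency), the snippet below ranks endpoints by that product. The endpoint paths and numbers are invented purely for illustration and are not measurements from this article.

```javascript
// Rank endpoints by "cost" = average response time x call frequency.
function rankByCost(endpoints) {
  return endpoints
    .map((e) => ({ ...e, cost: e.avgResponseMs * e.requestsPerMinute }))
    .sort((a, b) => b.cost - a.cost);
}

// Hypothetical sample data (not taken from a real test run).
const sample = [
  { path: '/login', avgResponseMs: 120, requestsPerMinute: 50 },
  { path: '/validate-email', avgResponseMs: 40, requestsPerMinute: 600 },
  { path: '/create-account', avgResponseMs: 300, requestsPerMinute: 10 },
];

console.log(rankByCost(sample).map((e) => e.path));
// [ '/validate-email', '/login', '/create-account' ]
```

In this made-up data set the fastest endpoint ends up as the most costly one because it is called so often, which is exactly the kind of candidate worth a dedicated RPS-based test.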
It is crucial to understand these VU / RPS metrics when choosing a load-testing tool and approach for your testing; different tools will implement VUs differently and understanding their corresponding heuristics will help you gain a more accurate understanding of your system behavior when testing.
Thanks to its resource management and scalability capabilities, Kubernetes provides a solid foundation for running load-tests at scale, making it possible to distribute load-testing tools across multiple nodes when needed, with resource allocations tuned to the needs of the testing tool at hand.
Going into the details though, there are 3 aspects of distributed load-testing in Kubernetes that need special attention for the testing to be successful:
Testkube is a generic test orchestration and execution framework that lets you bring the benefits of scalability and centralized test execution to any testing tool or script that you may already be using. Testkube uses Kubernetes as its test-execution runtime, which lets it rely on the corresponding scalability and resource management features for load-test execution and distribution.
Testkube features a purpose-built test-execution engine that tackles many of the above-mentioned challenges head-on:
Let’s have a look at how Testkube can be used to scale k6 tests to generate massive load in a distributed execution setup.
As discussed earlier in this article, the load generation strategy and configuration can differ significantly depending on the testing objectives. That also applies to VU and connection reuse settings, which need to match the specific use case. For the purpose of this article, we will use a simple k6 script that repeatedly sends a single request to a target service. This article is focused purely on load generation, so GCP Cloud Storage has been used as the target service to take the target's own performance out of the equation (an initial warm-up was still executed before running the actual tests).
The script looks as follows:
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  // Each iteration sends one GET request to the target page.
  const res = http.get('');
  check(res, { 'status was 200': (r) => r.status === 200 });
  // Verify that the expected page content was returned.
  check(res, {
    'verify partial text': (r) =>
      r.body.includes('Testkube test page - Lipsum'),
  });
}
It’s available in the Testkube repository:
Before we start with the Testkube run, let’s first check what kind of load can realistically be generated using a fairly typical dev PC with the following spec:
AMD Ryzen 7 5800X(8 core/16 thread) CPU
1Gb/500Mb Internet connection
Operating system: Ubuntu 24.04 LTS
The operating system configuration has been adjusted according to the k6 recommendations.
When focusing on RPS specifically, the VU setting can essentially be viewed as "threads". In this case, the goal is to identify the VU setting that results in the highest stable and predictable load. This can be achieved by gradually increasing the VUs while monitoring resource usage and the resulting load.
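The ramp-up described above can be sketched as a small selection routine: keep increasing the VU count as long as the measured RPS still improves noticeably. The function name, the 5% threshold, and the sample measurements are all assumptions made for illustration; they are not data from the runs in this article.

```javascript
// Pick the highest VU setting that still improved RPS by at least `minGain`
// over the previous accepted step. `samples` must be sorted by ascending VUs.
function pickVuSetting(samples, minGain = 0.05) {
  let best = samples[0];
  for (let i = 1; i < samples.length; i++) {
    const gain = (samples[i].rps - best.rps) / best.rps;
    if (gain < minGain) break; // RPS has plateaued (or dropped)
    best = samples[i];
  }
  return best;
}

// Hypothetical measurements from short calibration runs.
const calibrationRuns = [
  { vus: 200, rps: 5000 },
  { vus: 400, rps: 9000 },
  { vus: 800, rps: 12900 },
  { vus: 1000, rps: 13000 },
  { vus: 1200, rps: 12500 },
];

console.log(pickVuSetting(calibrationRuns)); // { vus: 800, rps: 12900 }
```

In practice the RPS figure alone is not enough; CPU and network usage on the load generator should be watched at each step as well.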
In this case, 100% CPU utilisation was reached at about 900-1000 VUs, which is also where the RPS peaked.
checks.........................: 100.00% 1643464 out of 1643464
data_received..................: 6.7 GB 105 MB/s
data_sent......................: 93 MB 1.5 MB/s
http_req_blocked...............: avg=172.9µs min=90ns med=220ns max=1.72s p(90)=290ns p(95)=310ns
http_req_connecting............: avg=20.18µs min=0s med=0s max=62.94ms p(90)=0s p(95)=0s
http_req_duration..............: avg=58.27ms min=11.49ms med=30.72ms max=28.15s p(90)=70.64ms p(95)=267.49ms
{ expected_response:true }...: avg=58.27ms min=11.49ms med=30.72ms max=28.15s p(90)=70.64ms p(95)=267.49ms
http_req_failed................: 0.00% 0 out of 821732
http_req_receiving.............: avg=36.95ms min=19.36µs med=2.06ms max=28.14s p(90)=49.84ms p(95)=246.62ms
http_req_sending...............: avg=18.46µs min=6.71µs med=16.36µs max=67.02ms p(90)=23.12µs p(95)=28.16µs
http_req_tls_handshaking.......: avg=86.68µs min=0s med=0s max=1.62s p(90)=0s p(95)=0s
http_req_waiting...............: avg=21.29ms min=7.2ms med=14.66ms max=6.76s p(90)=32.52ms p(95)=34.61ms
http_reqs......................: 821732 12958.615032/s
iteration_duration.............: avg=58.51ms min=11.59ms med=30.81ms max=28.15s p(90)=70.85ms p(95)=267.66ms
iterations.....................: 821732 12958.615032/s
vus............................: 1 min=1 max=800
vus_max........................: 800 min=800 max=800
(This is the standard k6 results output - read more at the Grafana k6 docs)
At this point, the CPU seems to be the only limiting factor (according to htop, nmon, and iotop). However, network utilization is high and close to becoming a limiting factor as well. Both RAM and disk usage are low and insignificant.
Now, let’s start running this k6 test with Testkube. For this example a node pool with GCP `c3d-highcpu-16` nodes has been added to our test cluster. The nodes are specced as follows:
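To put the network observation into numbers, the `data_received` rate from the k6 summary can be compared with the link speed. The sketch below assumes that k6 reports decimal megabytes and that the "1Gb" in the PC spec is a 1 Gb/s downlink; the helper name is made up for illustration.

```javascript
// Fraction of a link consumed by a given transfer rate.
// megabytesPerSecond: rate as reported by k6 (assumed decimal MB/s)
// linkMegabitsPerSecond: nominal link speed in Mb/s
function linkUtilization(megabytesPerSecond, linkMegabitsPerSecond) {
  return (megabytesPerSecond * 8) / linkMegabitsPerSecond;
}

// Local run: 105 MB/s received over an assumed 1 Gb/s downlink.
console.log(linkUtilization(105, 1000)); // 0.84
```

At roughly 84% of the nominal downlink, there is little room left before the network rather than the CPU caps the achievable RPS.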
Egress/Ingress bandwidth: up to 20 Gb/s
First, let’s run a standard Test Workflow, with just a single k6 instance.
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: k6-perf-test-gcp
spec:
  content: # 1
    git:
      revision: main
      paths:
      - test/k6/executor-tests/k6-perf-test-gcp.js
  container:
    resources:
      requests: # 2
        cpu: 15
        memory: 25Gi
    workingDir: /data/repo/test/k6/executor-tests
  config: # 3
    vus: {type: integer}
    duration: {type: string, default: '1m'}
  steps:
  - name: Run test
    container:
      image: grafana/k6:0.49.0
    steps:
    - run:
        shell: mkdir /data/artifacts && k6 run k6-perf-test-gcp.js --vus {{ config.vus }} --duration {{ config.duration }} # 4
        env: # 5
        - name: K6_WEB_DASHBOARD
          value: "true"
        - name: K6_WEB_DASHBOARD_EXPORT
          value: "/data/artifacts/k6-test-report.html"
      artifacts: # 6
        workingDir: /data/artifacts
        paths:
        - '*'
1. The k6 test is fetched from the repository.
2. Resource requests have been set according to the node size (allowing for slight overhead).
3. Config options, so that specific settings (VUs and duration in this case) can be set when running the workflow.
4. Run command: the directory needs to be created so the k6 HTML report can be saved; the config options are passed to the k6 command.
5. k6 ENVs used to generate the k6 HTML report.
6. Saving the test artifacts (the HTML report from k6).
The workflow has then been executed multiple times with different VU settings, and resource usage has been monitored again to determine the optimal settings for the machine specs. The CPU usage, which again seemed to be the limiting factor, maxed out at 1100-1200 VUs, and 1000 VUs has been chosen as the “safe” setting.
Having the nodes deployed with the same provider as the target service resulted in significantly lower response times compared to the local runs. As a result, the average RPS (for 1000 VUs) was significantly higher, at around 40k RPS.
checks.........................: 100.00% ✓ 4811016 ✗ 0
data_received..................: 20 GB 329 MB/s
data_sent......................: 296 MB 4.9 MB/s
http_req_blocked...............: avg=99.91µs min=90ns med=220ns max=486.22ms p(90)=270ns p(95)=300ns
http_req_connecting............: avg=25.3µs min=0s med=0s max=410.64ms p(90)=0s p(95)=0s
http_req_duration..............: avg=23.59ms min=10.94ms med=20.76ms max=679.57ms p(90)=32.67ms p(95)=39.76ms
{ expected_response:true }...: avg=23.59ms min=10.94ms med=20.76ms max=679.57ms p(90)=32.67ms p(95)=39.76ms
http_req_failed................: 0.00% ✓ 0 ✗ 2405508
http_req_receiving.............: avg=2.94ms min=33.86µs med=1.49ms max=271.81ms p(90)=7.38ms p(95)=11.25ms
http_req_sending...............: avg=43.6µs min=14.65µs med=23.98µs max=45.59ms p(90)=61.73µs p(95)=90.98µs
http_req_tls_handshaking.......: avg=50.73µs min=0s med=0s max=409.7ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=20.6ms min=4.53ms med=18.8ms max=677.7ms p(90)=25.67ms p(95)=30.08ms
http_reqs......................: 2405508 39998.588465/s
iteration_duration.............: avg=24.91ms min=11.02ms med=21.2ms max=679.66ms p(90)=36.19ms p(95)=45ms
iterations.....................: 2405508 39998.588465/s
vus............................: 1000 min=1000 max=1000
vus_max........................: 1000 min=1000 max=1000
Results have been compared for executions up to 1000VU, and the response times (both average, and P90s) have been very consistent (differences below 0.5ms). That means both the load generation, and the target service are stable under load.
At the same time, it’s important to choose VU settings carefully. Misconfigured or overly aggressive VU settings can lead to inconsistencies, degraded performance, and misleading conclusions about the system's actual capabilities. In the following case (2000 VUs), a significant increase in both average response time (39.26ms vs. 23.59ms) and 90th percentile response time (56.54ms vs. 32.67ms) is visible, while the RPS did not improve (about 39.4k vs. 40k), indicating a noticeable performance degradation.
checks.........................: 100.00% ✓ 4728432 ✗ 0
data_received..................: 20 GB 325 MB/s
data_sent......................: 292 MB 4.9 MB/s
http_req_blocked...............: avg=368.1µs min=90ns med=200ns max=1.25s p(90)=260ns p(95)=290ns
http_req_connecting............: avg=88.2µs min=0s med=0s max=1.09s p(90)=0s p(95)=0s
http_req_duration..............: avg=39.26ms min=11.08ms med=37.51ms max=1.13s p(90)=56.54ms p(95)=63.79ms
{ expected_response:true }...: avg=39.26ms min=11.08ms med=37.51ms max=1.13s p(90)=56.54ms p(95)=63.79ms
http_req_failed................: 0.00% ✓ 0 ✗ 2364216
http_req_receiving.............: avg=9.11ms min=39.84µs med=9.07ms max=772.14ms p(90)=16.94ms p(95)=19.35ms
http_req_sending...............: avg=106.97µs min=15.25µs med=25.45µs max=737.21ms p(90)=70.48µs p(95)=105.03µs
http_req_tls_handshaking.......: avg=242.01µs min=0s med=0s max=1.16s p(90)=0s p(95)=0s
http_req_waiting...............: avg=30.04ms min=0s med=27.52ms max=1.13s p(90)=43.97ms p(95)=51.4ms
http_reqs......................: 2364216 39374.691594/s
iteration_duration.............: avg=50.3ms min=11.73ms med=46.26ms max=1.29s p(90)=77.27ms p(95)=89.62ms
iterations.....................: 2364216 39374.691594/s
vus............................: 2000 min=2000 max=2000
vus_max........................: 2000 min=2000 max=2000
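One way to sanity-check the two runs above is Little's law: with a closed pool of VUs and no think time, throughput is approximately the number of VUs divided by the average iteration duration. The sketch below applies that to the reported `iteration_duration` averages; the function names are made up for illustration.

```javascript
// Little's law for a closed-loop test without sleeps:
// throughput (RPS) ~= concurrent VUs / average iteration duration.
function expectedRps(vus, avgIterationMs) {
  return (vus * 1000) / avgIterationMs;
}

// Relative change between two measurements, e.g. 0.25 for +25%.
function relativeIncrease(before, after) {
  return (after - before) / before;
}

// 1000 VUs at 24.91ms vs. 2000 VUs at 50.3ms per iteration.
console.log(Math.round(expectedRps(1000, 24.91))); // 40145
console.log(Math.round(expectedRps(2000, 50.3))); // 39761

// Response time degradation between the two runs.
console.log((relativeIncrease(23.59, 39.26) * 100).toFixed(1)); // 66.4 (avg, %)
console.log((relativeIncrease(32.67, 56.54) * 100).toFixed(1)); // 73.1 (p90, %)
```

Both estimates land close to the measured ~40k and ~39.4k RPS: doubling the VUs roughly doubled the time per iteration, so the extra VUs added latency without adding throughput.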
Now, let’s target 100k+ RPS. Testkube's `parallel` option allows running multiple “workers”, offering both scalability and flexibility. At roughly 40k RPS per worker, reaching 100k+ RPS requires 3 workers.
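The worker count follows directly from the single-instance result. As a minimal sketch (the helper name is hypothetical, and it assumes each worker can sustain about the same RPS as the single-instance run):

```javascript
// Number of workers needed to reach a target RPS, given what one worker
// sustained in the single-instance run.
function workersNeeded(targetRps, rpsPerWorker) {
  return Math.ceil(targetRps / rpsPerWorker);
}

console.log(workersNeeded(100000, 40000)); // 3
```

Because the result is rounded up, three workers leave some headroom above the 100k target, which helps if per-worker throughput drops slightly under the combined load.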
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: k6-perf-test-workers
spec:
  config:
    vus: {type: integer, default: 20}
    duration: {type: string, default: '1m'}
    workers: {type: integer, default: 3} # 1
  content:
    git:
      revision: main
      paths:
      - test/k6/executor-tests/k6-perf-test-gcp.js
  steps:
  - name: Run test
    parallel:
      count: 'config.workers'
      transfer: # 2
      - from: /data/repo
      fetch: # 3
      - from: /data/artifacts
      use:
      - name: distribute/evenly # 4
      container:
        resources:
          requests:
            cpu: 15
            memory: 25Gi
      paused: true # 5
      run:
        image: grafana/k6:0.49.0
        workingDir: /data/repo/test/k6/executor-tests
        shell: mkdir /data/artifacts && k6 run k6-perf-test-gcp.js --vus {{ config.vus }} --duration {{ config.duration }} --execution-segment '{{ index }}/{{ count }}:{{ index + 1 }}/{{ count }}'
        env:
        - name: K6_WEB_DASHBOARD
          value: "true"
        - name: K6_WEB_DASHBOARD_EXPORT
          value: "/data/artifacts/k6-test-report-worker-{{ index + 1}}.html"
    artifacts:
      workingDir: /data/artifacts
      paths:
      - '*.html'
1 - additional config option: the number of parallel “workers”
2 - transfer: copies the test file to the worker “instances”
3 - fetch: collects the artifacts from the worker “instances”
4 - distribute/evenly ensures that the “workers” are distributed evenly across the k8s nodes
5 - paused: synchronises the workers, so each “worker” starts load generation simultaneously
The k6 `--execution-segment` option is used to spread the load generation across multiple “instances” evenly, and to make the results aggregation easier.
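To make the templated `--execution-segment` value in the workflow more concrete, the sketch below generates the same `index/count:(index+1)/count` strings for a given number of workers. The function name is made up for illustration; the format mirrors the expression used in the workflow above.

```javascript
// Build one execution segment per worker, in the same
// "index/count:(index+1)/count" form the workflow template produces.
function executionSegments(count) {
  return Array.from(
    { length: count },
    (_, index) => `${index}/${count}:${index + 1}/${count}`
  );
}

console.log(executionSegments(3));
// [ '0/3:1/3', '1/3:2/3', '2/3:3/3' ]
```

Each worker therefore runs one third of the configured test, so the `--vus` value passed to every worker is the total for the whole run rather than a per-worker figure.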
Let’s now run this workflow with 3 workers, 3k VU (in total), and 15m execution time, and check the actual performance.
Worker 1:
checks.........................: 100.00% ✓ 68146940 ✗ 0
data_received..................: 281 GB 312 MB/s
data_sent......................: 4.2 GB 4.7 MB/s
http_req_duration..............: avg=25.69ms min=10.01ms med=19.59ms max=16.68s p(90)=28.86ms p(95)=36.03ms
http_reqs......................: 34073470 37853.093901/s
Worker 2:
checks.........................: 100.00% ✓ 67184534 ✗ 0
data_received..................: 277 GB 307 MB/s
data_sent......................: 4.1 GB 4.6 MB/s
http_req_duration..............: avg=26.03ms min=10.33ms med=19.81ms max=16.65s p(90)=29.57ms p(95)=36.96ms
http_reqs......................: 33592267 37320.754617/s
Worker 3:
checks.........................: 100.00% ✓ 68536268 ✗ 0
data_received..................: 282 GB 314 MB/s
data_sent......................: 4.2 GB 4.7 MB/s
http_req_duration..............: avg=25.57ms min=10.01ms med=19.49ms max=16.73s p(90)=28.78ms p(95)=36.04ms
http_reqs......................: 34268134 38069.613718/s
This adds up to over 113k RPS (roughly 6.8M requests per minute). The target service handles the increased load consistently; however, the average response time increased slightly, resulting in a slightly lower RPS per worker.
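Since each worker produces its own summary, the totals have to be combined by hand. A minimal sketch of that aggregation, using the per-worker figures reported above (the field names are made up for illustration, and the combined average is weighted by request count):

```javascript
// Per-worker figures copied from the k6 summaries above.
const workers = [
  { requests: 34073470, rps: 37853.093901, avgMs: 25.69 },
  { requests: 33592267, rps: 37320.754617, avgMs: 26.03 },
  { requests: 34268134, rps: 38069.613718, avgMs: 25.57 },
];

// Sum requests and RPS; weight the average response time by request count.
function aggregate(results) {
  const totalRequests = results.reduce((sum, w) => sum + w.requests, 0);
  const totalRps = results.reduce((sum, w) => sum + w.rps, 0);
  const weightedAvgMs =
    results.reduce((sum, w) => sum + w.avgMs * w.requests, 0) / totalRequests;
  return { totalRequests, totalRps, weightedAvgMs };
}

const total = aggregate(workers);
console.log(total.totalRequests); // 101933871
console.log(Math.round(total.totalRps)); // 113243
console.log(total.weightedAvgMs.toFixed(2)); // 25.76
```

Averages can be combined this way, but percentiles such as p(90) or p(95) cannot be derived from per-worker summaries, which is one reason a central metrics store is useful for distributed runs.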
The resource usage towards the end of the run looked like this:
gke-testkube-cloud-e-perf-test-node-p-199ba712-97lq 15280m 96% 23897Mi 84%
gke-testkube-cloud-e-perf-test-node-p-199ba712-cksq 15565m 97% 19874Mi 69%
gke-testkube-cloud-e-perf-test-node-p-199ba712-d28c 15670m 98% 19741Mi 69%
So, the nodes have been fully utilized in terms of CPU. Memory usage, initially low, increased significantly over the course of this 15m run, so longer runs would probably need more memory (for example by using c3d-standard nodes instead of `c3d-highcpu`). Network utilization during execution averaged about 2.5Gb/s per “worker”.
In the above example, several separate k6 instances were executed, each generating its own individual report. While this approach may be sufficient for certain scenarios, it has limitations when you need a consolidated view of k6 test results. The workflow can be further extended by leveraging k6’s Prometheus remote write output. It enables centralized metrics collection and aggregation across all instances and supports visualization through tools like Grafana, providing real-time insights and a unified analysis of performance test data.
In this case, Prometheus and Grafana have been deployed to our test cluster. You can find the Prometheus and Grafana values files here (with the Prometheus remote-write-receiver and native histograms enabled):
The previous workflow has been extended with:
- k6 command: `-o experimental-prometheus-rw --tag testid=worker-{{ index + 1}}`
- ENVs:
  - name: K6_PROMETHEUS_RW_SERVER_URL # Prometheus remote write endpoint
    value: 'http://prometheus-server.prometheus-grafana.svc.cluster.local:80/api/v1/write'
  - name: K6_PROMETHEUS_RW_TREND_STATS # additional trend stats
    value: 'p(95),p(99),min,max'
  - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM # native histogram enabled
    value: "true"
Resulting in this workflow:
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: k6-perf-test-gcp-workers-prometheus
spec:
  config:
    vus: {type: integer}
    duration: {type: string, default: '1m'}
    workers: {type: integer}
  content:
    git:
      revision: main
      paths:
      - test/k6/executor-tests/k6-perf-test-gcp.js
  steps:
  - name: Run test
    parallel:
      count: 'config.workers'
      transfer:
      - from: /data/repo
      fetch:
      - from: /data/artifacts
      use:
      - name: distribute/evenly
      container:
        resources:
          requests:
            cpu: 15
            memory: 25Gi
      paused: true # synchronise running all workers
      run:
        image: grafana/k6:0.49.0
        workingDir: /data/repo/test/k6/executor-tests
        shell: mkdir /data/artifacts && k6 run k6-perf-test-gcp.js -o experimental-prometheus-rw --vus {{ config.vus }} --duration {{ config.duration }} --execution-segment '{{ index }}/{{ count }}:{{ index + 1 }}/{{ count }}' --tag testid=worker-{{ index + 1}}
        env:
        - name: K6_WEB_DASHBOARD
          value: "true"
        - name: K6_WEB_DASHBOARD_EXPORT
          value: "/data/artifacts/k6-test-report-worker-{{ index + 1}}.html"
        - name: K6_PROMETHEUS_RW_SERVER_URL
          value: 'http://prometheus-server.prometheus-grafana.svc.cluster.local:80/api/v1/write'
        - name: K6_PROMETHEUS_RW_TREND_STATS
          value: 'p(95),p(99),min,max'
        - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM
          value: "true"
    artifacts:
      workingDir: /data/artifacts
      paths:
      - '*.html'
As demonstrated in the examples above, using Testkube and its `parallel` feature to run k6 tests made exceeding even 100k RPS straightforward. Testkube's flexibility and scalability make it well suited to running load tests at scale. Its tool-agnostic and highly configurable design allows integration with k6 and a wide range of other testing tools. Whether it’s load testing with tools like k6, JMeter, or Artillery, API testing with Postman, or UI testing with Playwright or Cypress, Testkube adapts to the needs of various testing scenarios.
You can find additional Test Workflow examples for k6 in the Testkube docs: You may also want to check examples for other load testing tools:
- JMeter:
- Artillery:
- Gatling:
Get started with Testkube at - there are both Cloud and On-Prem versions available - and don’t hesitate to reach out to our team on Slack for any questions you might have!