Using load-tests to ensure that your infrastructure, applications and services can handle user requests in a timely manner is a mandatory step in your application delivery pipeline. To this end, there are many types of load-tests at your disposal, helping you optimize resources/cost and identify bottlenecks by simulating the different usage patterns your application needs to handle gracefully.
It goes without saying that generating the load these tests often require is not something that can realistically be done from your local machine, considering the need for high and consistent compute/network throughput over prolonged periods of time.
This article explores how Kubernetes can be used as a foundation for a scalable load-testing strategy, using CNCF tools to orchestrate, execute and analyze results accordingly.
Before we dive into the nitty-gritty, let's discuss two of the most common load-testing metrics that are key to planning and executing your load-tests: Virtual Users (VUs) and Requests per Second (RPS).
Virtual Users are used to simulate actual user interactions with your system. For example, when load-testing a “Create Account” flow in an e-commerce application, a corresponding user would have to go through multiple steps - opening the registration page, filling in their details, submitting the form, and so on.
Under the hood this would result in multiple requests to your backend (BE), both for the “visual” actions performed by the user and possibly also for auto-complete or validation logic running in the background. These requests would not all arrive at the same time, but with delays between them, depending on how fast (or slowly) an actual user walks through the process. The resulting “Virtual User script” would include all of these requests, with the corresponding delays between them.
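A minimal sketch of what such a Virtual User script could look like in k6 - the endpoints, payloads and sleep durations below are purely illustrative and are not part of the tests run later in this article:

import http from 'k6/http';
import { sleep } from 'k6';

// Hypothetical "Create Account" flow - each sleep() models the time a real user
// spends between actions, so a single VU produces requests at a realistic pace.
export default function () {
  http.get('https://shop.example.com/signup'); // open the registration page
  sleep(5); // user fills in the form
  http.post(
    'https://shop.example.com/api/validate-email', // background validation call
    JSON.stringify({ email: 'user@example.com' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  sleep(2);
  http.post(
    'https://shop.example.com/api/accounts', // submit the account creation
    JSON.stringify({ email: 'user@example.com', password: 's3cr3t' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  sleep(3); // user reads the confirmation page
}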
Requests Per Second (RPS), on the other hand, is a more direct measurement of individual requests made to an API or backend, which doesn’t necessarily correlate to a discrete (stateful) user interaction as described for a Virtual User above.
Translating between these is doable but can be misleading. For the example above, let’s say the “Create Account” flow for 1 VU generates 10 backend requests over an average user time of 60 seconds - which means that 100 VUs will generate 1000 requests per minute, or roughly 17 RPS (assuming they come at regular intervals, which is unlikely).
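As a back-of-the-envelope formula (a sketch that assumes evenly spaced requests, which real traffic rarely is):

// Rough VU -> RPS estimate for the "Create Account" example above
const vus = 100;
const requestsPerIteration = 10; // backend requests generated by one flow
const iterationSeconds = 60;     // average time a user needs to complete the flow
const approxRps = (vus * requestsPerIteration) / iterationSeconds; // ~16.7 RPS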
Which of these to use when calibrating your load-tests usually depends on your goal - both end-to-end (E2E) performance tests and testing a single API endpoint are completely valid (and common) cases. The first is more about users, the second about plain RPS. It is very common to check both while executing performance tests - for example following up initial E2E tests by load-testing specific endpoints that seem to underperform, checking the endpoints that are the most "costly" (response time multiplied by frequency), or mapping out specific endpoints and their call frequencies and reflecting them in dedicated tests that make just a few requests.
Going back to the example above, you might start by running the “Create Account” test with 10000 VUs and identify one of the BE validation calls as a bottleneck - in that case an API load-test aimed purely at the validation endpoint might be used to ensure a throughput of 100 RPS for that request specifically.
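In k6, such an RPS-oriented test is typically expressed with an arrival-rate executor instead of a plain VU count. A minimal sketch, assuming a hypothetical validation endpoint that stands in for the one identified above:

import http from 'k6/http';

export const options = {
  scenarios: {
    validation_endpoint: {
      executor: 'constant-arrival-rate',
      rate: 100,           // start 100 iterations (requests) per timeUnit...
      timeUnit: '1s',      // ...i.e. roughly 100 RPS
      duration: '10m',
      preAllocatedVUs: 20, // VUs kept ready to sustain the rate
      maxVUs: 200,         // allow scaling up if responses slow down
    },
  },
};

export default function () {
  http.post(
    'https://shop.example.com/api/validate-email',
    JSON.stringify({ email: 'user@example.com' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
}

The executor adds VUs as needed to keep the arrival rate constant, which decouples the generated RPS from individual response times.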
It is crucial to understand these VU / RPS metrics when choosing a load-testing tool and approach for your testing; different tools will implement VUs differently and understanding their corresponding heuristics will help you gain a more accurate understanding of your system behavior when testing.
Thanks to its resource-management and scalability capabilities, Kubernetes provides a solid foundation for running load-tests at scale, making it possible to distribute load-testing tools across multiple nodes when needed, with resource allocations tuned to the needs of the testing tool at hand.
Going into the details though, there are three aspects of distributed load-testing in Kubernetes that need special attention for the testing to be successful: distributing the load generators across nodes with adequate resources, starting them simultaneously so the load is actually generated in parallel, and collecting and aggregating the results from all instances.
Testkube is a generic test orchestration and execution framework that allows you to leverage the benefits of scalability and centralized test execution for any testing tool or script you may already be using. Testkube uses Kubernetes as its test-execution runtime, allowing it to take advantage of the corresponding scalability and resource-management features for load-test execution and distribution.
Testkube features a purpose-built test-execution engine that tackles many of the above-mentioned challenges head-on, as the examples below will show.
Let’s have a look at how Testkube can be used to scale k6 tests to generate massive load in a distributed execution setup.
As discussed earlier in this article, the load generation strategy and configuration can differ significantly depending on the specific testing objectives. That also applies to VU or connection-reuse settings, which need to match the specific use-case. For the purpose of this article we will use a simple k6 script to generate a uniform load on a target service. Since this article is focused purely on load generation, GCP Cloud Storage has been used as the target service to take that part out of the equation (an initial warm-up was executed before running the actual tests anyway).
The script looks as follows:
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  // Fetch the static test page from the GCP Cloud Storage bucket
  const res = http.get('https://storage.googleapis.com/perf-test-static-page-bucket/testkube-test-page-lorem-ipsum/index.html');
  // Verify the response status and that the expected content is present
  check(res, { 'status was 200': (r) => r.status == 200 });
  check(res, {
    'verify partial text': (r) =>
      r.body.includes('Testkube test page - Lipsum'),
  });
}
It’s available in the Testkube repository: https://github.com/kubeshop/testkube/blob/main/test/k6/executor-tests/k6-perf-test-gcp.js
Before we start with the Testkube run, let’s first check what kind of load can realistically be generated using a rather typical dev PC with the following spec:
AMD Ryzen 7 5800X (8 cores/16 threads) CPU
64 GB RAM
High IOPS SSD
1 Gbps / 500 Mbps internet connection
Operating system: Ubuntu 24.04 LTS
The operating system configuration was adjusted according to the k6 recommendations for running large tests (https://grafana.com/docs/k6/latest/testing-guides/running-large-tests/#os-fine-tuning).
When focusing on RPS specifically, the VU setting can essentially be viewed as "threads". In this case the goal is to identify the VU setting that results in the highest stable and predictable throughput. This can be achieved by gradually increasing the VUs while monitoring resource usage and the resulting load.
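One way to do this is a step-wise ramp using k6 stages - a sketch with illustrative targets, reusing the static test page from the script above:

import http from 'k6/http';

export const options = {
  stages: [
    { duration: '2m', target: 400 },  // ramp up gradually...
    { duration: '2m', target: 800 },
    { duration: '2m', target: 1000 }, // ...watching CPU/network usage and the resulting RPS at each step
    { duration: '2m', target: 1200 },
    { duration: '1m', target: 0 },    // ramp down
  ],
};

export default function () {
  http.get('https://storage.googleapis.com/perf-test-static-page-bucket/testkube-test-page-lorem-ipsum/index.html');
}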
In this case, 100% CPU utilization was reached at about 900-1000 VUs, which is also where the RPS peaked.
checks.........................: 100.00% 1643464 out of 1643464
data_received..................: 6.7 GB 105 MB/s
data_sent......................: 93 MB 1.5 MB/s
http_req_blocked...............: avg=172.9µs min=90ns med=220ns max=1.72s p(90)=290ns p(95)=310ns
http_req_connecting............: avg=20.18µs min=0s med=0s max=62.94ms p(90)=0s p(95)=0s
http_req_duration..............: avg=58.27ms min=11.49ms med=30.72ms max=28.15s p(90)=70.64ms p(95)=267.49ms
{ expected_response:true }...: avg=58.27ms min=11.49ms med=30.72ms max=28.15s p(90)=70.64ms p(95)=267.49ms
http_req_failed................: 0.00% 0 out of 821732
http_req_receiving.............: avg=36.95ms min=19.36µs med=2.06ms max=28.14s p(90)=49.84ms p(95)=246.62ms
http_req_sending...............: avg=18.46µs min=6.71µs med=16.36µs max=67.02ms p(90)=23.12µs p(95)=28.16µs
http_req_tls_handshaking.......: avg=86.68µs min=0s med=0s max=1.62s p(90)=0s p(95)=0s
http_req_waiting...............: avg=21.29ms min=7.2ms med=14.66ms max=6.76s p(90)=32.52ms p(95)=34.61ms
http_reqs......................: 821732 12958.615032/s
iteration_duration.............: avg=58.51ms min=11.59ms med=30.81ms max=28.15s p(90)=70.85ms p(95)=267.66ms
iterations.....................: 821732 12958.615032/s
vus............................: 1 min=1 max=800
vus_max........................: 800 min=800 max=800
(This is the standard k6 results output - read more at the Grafana k6 docs)
Here the CPU seems to be the only limiting factor (according to htop, nmon, and iotop). However, network utilization is high and close to becoming a limiting factor as well. Both RAM and disk usage are low and insignificant.
Now, let’s start running this k6 test with Testkube. For this example a node pool with GCP `c3d-highcpu-16` nodes has been added to our test cluster. The nodes are specced as follows:
16vCPU
32GB RAM
SSD
Egress/Ingress bandwidth: up to 20 Gb/s
First, let’s run a standard Test Workflow, with just a single k6 instance.
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: k6-perf-test-gcp
spec:
  content: # 1
    git:
      uri: https://github.com/kubeshop/testkube
      revision: main
      paths:
      - test/k6/executor-tests/k6-perf-test-gcp.js
  container:
    resources:
      requests: # 2
        cpu: 15
        memory: 25Gi
    workingDir: /data/repo/test/k6/executor-tests
  config: # 3
    vus: {type: integer}
    duration: {type: string, default: '1m'}
  steps:
  - name: Run test
    container:
      image: grafana/k6:0.49.0
    steps:
    - run:
        shell: mkdir /data/artifacts && k6 run k6-perf-test-gcp.js --vus {{ config.vus }} --duration {{ config.duration }} # 4
        env: # 5
        - name: K6_WEB_DASHBOARD
          value: "true"
        - name: K6_WEB_DASHBOARD_EXPORT
          value: "/data/artifacts/k6-test-report.html"
      artifacts: # 6
        workingDir: /data/artifacts
        paths:
        - '*'
1. k6 test fetched from the Git repository
2. Resource requests set according to the node size (leaving a slight overhead)
3. Config options, so specific settings (VUs and duration in this case) can be set when running the workflow
4. Run command - the directory needs to be created so the k6 HTML report can be saved; the config options are passed to the k6 command
5. k6 environment variables used to generate the k6 HTML report
6. Saving test artifacts (the HTML report from k6)
The workflow was then executed multiple times with different VU settings, and resource usage was monitored again to determine the optimal settings for this machine spec. CPU usage, which again seemed to be the limiting factor, maxed out at 1100-1200 VUs, and 1000 VUs was chosen as the “safe” setting.
Having the nodes deployed with the same provider as the target service resulted in significantly lower response times compared to the local runs. As a result, the average RPS (for 1000 VUs) was significantly higher - around 40k RPS.
checks.........................: 100.00% ✓ 4811016 ✗ 0
data_received..................: 20 GB 329 MB/s
data_sent......................: 296 MB 4.9 MB/s
http_req_blocked...............: avg=99.91µs min=90ns med=220ns max=486.22ms p(90)=270ns p(95)=300ns
http_req_connecting............: avg=25.3µs min=0s med=0s max=410.64ms p(90)=0s p(95)=0s
http_req_duration..............: avg=23.59ms min=10.94ms med=20.76ms max=679.57ms p(90)=32.67ms p(95)=39.76ms
{ expected_response:true }...: avg=23.59ms min=10.94ms med=20.76ms max=679.57ms p(90)=32.67ms p(95)=39.76ms
http_req_failed................: 0.00% ✓ 0 ✗ 2405508
http_req_receiving.............: avg=2.94ms min=33.86µs med=1.49ms max=271.81ms p(90)=7.38ms p(95)=11.25ms
http_req_sending...............: avg=43.6µs min=14.65µs med=23.98µs max=45.59ms p(90)=61.73µs p(95)=90.98µs
http_req_tls_handshaking.......: avg=50.73µs min=0s med=0s max=409.7ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=20.6ms min=4.53ms med=18.8ms max=677.7ms p(90)=25.67ms p(95)=30.08ms
http_reqs......................: 2405508 39998.588465/s
iteration_duration.............: avg=24.91ms min=11.02ms med=21.2ms max=679.66ms p(90)=36.19ms p(95)=45ms
iterations.....................: 2405508 39998.588465/s
vus............................: 1000 min=1000 max=1000
vus_max........................: 1000 min=1000 max=1000
Results were compared for executions of up to 1000 VUs, and the response times (both averages and P90s) were very consistent (differences below 0.5 ms). That means both the load generation and the target service are stable under load.
At the same time, it’s important to choose the VU settings carefully. Misconfigured or overly aggressive VU settings can lead to inconsistencies, degraded performance, and misleading conclusions about the system's actual capabilities. In the following case a significant increase in both the average response time (39.26ms vs. 23.59ms) and the 90th percentile response time (56.54ms vs. 32.67ms) is visible, indicating a noticeable performance degradation.
checks.........................: 100.00% ✓ 4728432 ✗ 0
data_received..................: 20 GB 325 MB/s
data_sent......................: 292 MB 4.9 MB/s
http_req_blocked...............: avg=368.1µs min=90ns med=200ns max=1.25s p(90)=260ns p(95)=290ns
http_req_connecting............: avg=88.2µs min=0s med=0s max=1.09s p(90)=0s p(95)=0s
http_req_duration..............: avg=39.26ms min=11.08ms med=37.51ms max=1.13s p(90)=56.54ms p(95)=63.79ms
{ expected_response:true }...: avg=39.26ms min=11.08ms med=37.51ms max=1.13s p(90)=56.54ms p(95)=63.79ms
http_req_failed................: 0.00% ✓ 0 ✗ 2364216
http_req_receiving.............: avg=9.11ms min=39.84µs med=9.07ms max=772.14ms p(90)=16.94ms p(95)=19.35ms
http_req_sending...............: avg=106.97µs min=15.25µs med=25.45µs max=737.21ms p(90)=70.48µs p(95)=105.03µs
http_req_tls_handshaking.......: avg=242.01µs min=0s med=0s max=1.16s p(90)=0s p(95)=0s
http_req_waiting...............: avg=30.04ms min=0s med=27.52ms max=1.13s p(90)=43.97ms p(95)=51.4ms
http_reqs......................: 2364216 39374.691594/s
iteration_duration.............: avg=50.3ms min=11.73ms med=46.26ms max=1.29s p(90)=77.27ms p(95)=89.62ms
iterations.....................: 2364216 39374.691594/s
vus............................: 2000 min=2000 max=2000
vus_max........................: 2000 min=2000 max=2000
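One way to avoid drawing conclusions from a degraded run like this is to add k6 thresholds, so the execution itself fails when latency or error rates exceed what the baseline allows - a minimal sketch, with illustrative limits loosely derived from the 1000 VU results above:

export const options = {
  thresholds: {
    http_req_duration: ['avg<50', 'p(95)<100'], // fail the run if latency degrades well beyond the baseline
    http_req_failed: ['rate<0.01'],             // ...or if more than 1% of requests fail
  },
};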
Now, let’s target 100k+ RPS. Testkube's `parallel` option allows running multiple “workers”, offering both scalability and flexibility. In this case, to achieve 100k+ RPS, we need 3 workers.
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: k6-perf-test-workers
spec:
  config:
    vus: {type: integer, default: 20}
    duration: {type: string, default: '1m'}
    workers: {type: integer, default: 3} # 1
  content:
    git:
      uri: https://github.com/kubeshop/testkube
      revision: main
      paths:
      - test/k6/executor-tests/k6-perf-test-gcp.js
  steps:
  - name: Run test
    parallel:
      count: 'config.workers'
      transfer: # 2
      - from: /data/repo
      fetch: # 3
      - from: /data/artifacts
      use:
      - name: distribute/evenly # 4
      container:
        resources:
          requests:
            cpu: 15
            memory: 25Gi
      paused: true # 5
      run:
        image: grafana/k6:0.49.0
        workingDir: /data/repo/test/k6/executor-tests
        shell: mkdir /data/artifacts && k6 run k6-perf-test-gcp.js --vus {{ config.vus }} --duration {{ config.duration }} --execution-segment '{{ index }}/{{ count }}:{{ index + 1 }}/{{ count }}'
        env:
        - name: K6_WEB_DASHBOARD
          value: "true"
        - name: K6_WEB_DASHBOARD_EXPORT
          value: "/data/artifacts/k6-test-report-worker-{{ index + 1}}.html"
      artifacts:
        workingDir: /data/artifacts
        paths:
        - '*.html'
1 - additional config option: the number of parallel “workers”
2 - transfer: copies the test files to the worker “instances”
3 - fetch: collects artifacts from the worker “instances”
4 - distribute/evenly ensures the “workers” are distributed evenly across the Kubernetes nodes
5 - synchronizes the start, so every “worker” begins load generation simultaneously
The k6 `--execution-segment` option is used to spread the load generation evenly across the “instances” and to make results aggregation easier. With 3 workers, the template above expands to the segments 0/3:1/3, 1/3:2/3 and 2/3:3/3, so each worker executes a distinct third of the test.
Let’s now run this workflow with 3 workers, 3000 VUs (in total), and a 15-minute execution time, and check the actual performance.
Worker 1:
checks.........................: 100.00% ✓ 68146940 ✗ 0
data_received..................: 281 GB 312 MB/s
data_sent......................: 4.2 GB 4.7 MB/s
http_req_duration..............: avg=25.69ms min=10.01ms med=19.59ms max=16.68s p(90)=28.86ms p(95)=36.03ms
http_reqs......................: 34073470 37853.093901/s
Worker 2:
checks.........................: 100.00% ✓ 67184534 ✗ 0
data_received..................: 277 GB 307 MB/s
data_sent......................: 4.1 GB 4.6 MB/s
http_req_duration..............: avg=26.03ms min=10.33ms med=19.81ms max=16.65s p(90)=29.57ms p(95)=36.96ms
http_reqs......................: 33592267 37320.754617/s
Worker 3:
checks.........................: 100.00% ✓ 68536268 ✗ 0
data_received..................: 282 GB 314 MB/s
data_sent......................: 4.2 GB 4.7 MB/s
http_req_duration..............: avg=25.57ms min=10.01ms med=19.49ms max=16.73s p(90)=28.78ms p(95)=36.04ms
http_reqs......................: 34268134 38069.613718/s
This sums up to over 113k RPS (6.78M requests per minute). The target service handles the increased load consistently; however, the average response time increased slightly, resulting in slightly lower per-worker RPS.
Resource usage towards the end of the run looked like this:
gke-testkube-cloud-e-perf-test-node-p-199ba712-97lq 15280m 96% 23897Mi 84%
gke-testkube-cloud-e-perf-test-node-p-199ba712-cksq 15565m 97% 19874Mi 69%
gke-testkube-cloud-e-perf-test-node-p-199ba712-d28c 15670m 98% 19741Mi 69%
So, the nodes were fully utilized in terms of CPU. Memory usage, initially low, increased significantly over time during this 15-minute run, and for longer runs the available memory would probably need to be increased (for example by using `c3d-standard` nodes instead of `high-cpu` ones). Network utilization during the execution averaged about 2.5 Gb/s per “worker”.
In the above example, several separate k6 instances were executed, each generating its own individual report. While this approach may be sufficient for certain scenarios, it has limitations when you need a consolidated view of k6 test results. The workflow can be further extended by leveraging k6’s Prometheus endpoint feature. It enables centralized metrics collection and aggregation across all instances and supports visualization through tools like Grafana, providing real-time insights and a unified analysis of performance test data.
In this case Prometheus and Grafana have been deployed in our test cluster; you can find the Prometheus and Grafana values files here (Prometheus remote-write receiver and native histograms enabled): https://github.com/kubeshop/testkube/tree/main/test/examples/k6-perf/prometheus-grafana
The previous workflow has been extended with:
- k6 command: `-o experimental-prometheus-rw --tag testid=worker-{{ index + 1}}`
- ENVs:
  - name: K6_PROMETHEUS_RW_SERVER_URL
    value: 'http://prometheus-server.prometheus-grafana.svc.cluster.local:80/api/v1/write'
  - name: K6_PROMETHEUS_RW_TREND_STATS # additional trend stats
    value: 'p(95),p(99),min,max'
  - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM # native histogram enabled
    value: "true"
Resulting in this workflow:
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: k6-perf-test-gcp-workers-prometheus
spec:
  config:
    vus: {type: integer}
    duration: {type: string, default: '1m'}
    workers: {type: integer}
  content:
    git:
      uri: https://github.com/kubeshop/testkube
      revision: main
      paths:
      - test/k6/executor-tests/k6-perf-test-gcp.js
  steps:
  - name: Run test
    parallel:
      count: 'config.workers'
      transfer:
      - from: /data/repo
      fetch:
      - from: /data/artifacts
      use:
      - name: distribute/evenly
      container:
        resources:
          requests:
            cpu: 15
            memory: 25Gi
      paused: true # synchronise running all workers
      run:
        image: grafana/k6:0.49.0
        workingDir: /data/repo/test/k6/executor-tests
        shell: mkdir /data/artifacts && k6 run k6-perf-test-gcp.js -o experimental-prometheus-rw --vus {{ config.vus }} --duration {{ config.duration }} --execution-segment '{{ index }}/{{ count }}:{{ index + 1 }}/{{ count }}' --tag testid=worker-{{ index + 1}}
        env:
        - name: K6_WEB_DASHBOARD
          value: "true"
        - name: K6_WEB_DASHBOARD_EXPORT
          value: "/data/artifacts/k6-test-report-worker-{{ index + 1}}.html"
        - name: K6_PROMETHEUS_RW_SERVER_URL
          value: 'http://prometheus-server.prometheus-grafana.svc.cluster.local:80/api/v1/write'
        - name: K6_PROMETHEUS_RW_TREND_STATS
          value: 'p(95),p(99),min,max'
        - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM
          value: "true"
      artifacts:
        workingDir: /data/artifacts
        paths:
        - '*.html'
As demonstrated in the examples above, using Testkube and its parallel feature to run k6 tests made exceeding even 100k RPS almost effortless. Testkube's flexibility and scalability make it a perfect tool for running load tests at scale. Its tool-agnostic and highly configurable design allows seamless integration with k6 and a wide range of other testing tools. Whether it’s load testing with tools like k6, JMeter, or Artillery, API testing with Postman, or UI testing with Playwright or Cypress, Testkube adapts to the needs of various testing scenarios.
You can find additional Test Workflow examples for k6 in the Testkube docs: https://docs.testkube.io/articles/examples/k6-basic. You may also want to check examples for other load testing tools:
- JMeter: https://docs.testkube.io/articles/examples/jmeter-basic
- Artillery: https://docs.testkube.io/articles/examples/artillery-basic
- Gatling: https://docs.testkube.io/articles/examples/gatling-basic
Get started with Testkube at testkube.io - there are both Cloud and On-Prem versions available - and don’t hesitate to reach out to our team on Slack for any questions you might have!
Testkube is a test execution and orchestration framework for Kubernetes that works with any CI/CD system and testing tool you need, empowering teams to deliver on the promise of agile, efficient, and comprehensive testing by leveraging all the capabilities of K8s to eliminate CI/CD bottlenecks and perfect your testing workflow. Get started with Testkube's free trial today!