

You know the drill. You run the suite on your machine. Green across the board. You push, CI kicks off, and ten minutes later your Slack lights up: three failures, all in tests that passed thirty seconds ago on your laptop.
So you rerun. Maybe one passes now. Maybe two. The third is still red, but the error is different this time. You dig into the logs, but they're scattered across your CI provider's UI, mixed in with build output and deployment noise. By the time you find the relevant stack trace, you've lost twenty minutes and your confidence in the suite.
This is not a flaky test problem. It's an environment problem. And if your team is running tests on Kubernetes, the gap between "local" and "CI" is wider than most people realize.
The environment gap nobody talks about
When a test passes locally and fails in CI, the instinct is to blame the test. Flaky assertion, timing issue, bad mock. Sometimes that's true. But more often, the real cause is that "local" and "CI" are two fundamentally different environments pretending to be the same thing.
Your laptop has predictable CPU allocation, a fast local filesystem, consistent DNS resolution, and whatever services you've got running in Docker Compose. Your CI runner is a shared VM (or a container on a shared node) with throttled resources, network policies it inherited from the cluster, and services that may or may not be fully ready when your test starts hitting them.
In Kubernetes environments, this gets worse. Your application runs in pods with specific resource requests, network policies, service mesh sidecars, and ConfigMaps. Your CI pipeline runs tests in a completely different context, usually outside the cluster entirely, reaching in through an ingress or port-forward. The test and the thing it's testing aren't even in the same network namespace.
That mismatch is where flakiness lives. Not in your assertions. In the six inches of infrastructure between your test runner and your application.
Three ways the gap bites you
Resource contention changes timing. Your laptop gives a test all the CPU it wants. A shared CI runner doesn't. A test that expects a response within 200ms locally might need 800ms on a resource-constrained node. The test didn't get flaky. The environment got slower.
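One pragmatic mitigation, while the environments still differ, is to make the timeout budget a function of where the test runs rather than a hard-coded constant. A minimal sketch, assuming a CI=true environment variable (which most CI providers set) and an illustrative 4x multiplier for constrained runners:

```python
import os

def deadline_ms(base_ms: int) -> int:
    """Scale a test's response-time budget by where it runs.

    Assumes the CI=true env var most providers set; the 4x
    multiplier is illustrative, not a measured constant.
    """
    multiplier = 4 if os.environ.get("CI", "").lower() == "true" else 1
    return base_ms * multiplier

# A 200 ms local budget becomes 800 ms on a shared CI runner.
```

This is a band-aid, not a fix: it papers over the gap rather than closing it, which is the point the rest of this piece makes.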
Service readiness is non-deterministic. Locally, you start your dependencies, wait until they're healthy, then run tests. In CI, you might be testing against services that are still initializing, or that share a database with another pipeline's test run. Kubernetes readiness probes help, but only if your test runner knows to respect them.
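The fix on the test-runner side is to gate the suite on an explicit readiness check, mirroring what a Kubernetes readiness probe does. A sketch, where `probe` is any callable you supply (for example, one that GETs a hypothetical /healthz endpoint and returns whether it answered 200):

```python
import time

def wait_until_ready(probe, timeout_s=30.0, interval_s=0.5):
    """Block until probe() returns True, so tests never start
    against a half-initialized service.

    probe is caller-supplied; a /healthz HTTP check is one
    illustrative implementation, not a fixed contract.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    raise TimeoutError("service never became ready")
```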
Network topology matters more than you think. A test that calls localhost:8080 on your machine is hitting the process directly. The same test in CI might be going through a Kubernetes Service, an ingress controller, and possibly a service mesh proxy. Each hop adds latency, and each hop can fail independently. When your test reports a connection timeout, good luck figuring out which layer caused it.
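The difference is visible in the URL itself. A small sketch contrasting the two targets: the in-cluster form uses Kubernetes' well-known service DNS pattern, while the local form assumes a hypothetical port-forward standing in for the ingress path:

```python
def service_url(name: str, namespace: str, port: int, in_cluster: bool) -> str:
    """Address a test should target, depending on where it runs.

    In-cluster: Kubernetes service DNS (<svc>.<ns>.svc.cluster.local).
    Outside: localhost via a port-forward (an assumption; your
    setup may go through an ingress instead).
    """
    if in_cluster:
        return f"http://{name}.{namespace}.svc.cluster.local:{port}"
    return f"http://localhost:{port}"
```

Same test code, two entirely different network paths; only the in-cluster one exercises the hops your users actually traverse.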

The scattered tooling problem
Here's where QA teams hit a second wall. Even if you identify the root cause, fixing it requires touching infrastructure that probably isn't yours to touch.
Most QA engineers don't own the CI pipeline. They don't own the Kubernetes cluster. They don't own the network policies. They write tests, and someone else is responsible for where and how those tests run. When a test fails in CI, the QA engineer files a ticket, the DevOps team investigates, and three days later someone adjusts a timeout or adds a retry. The underlying environment mismatch remains.
The tooling doesn't help either. Test results are in one place. CI logs are in another. Application logs are in a third. Kubernetes events (the ones that would actually explain why a pod was evicted mid-test) are in kubectl output that nobody thought to capture. Sergio Valin Cabrera, a Senior Software Engineer and DevOps lead at FunXtion, described the situation his team faced before they consolidated their testing: QA processes were scattered across CI pipelines and arbitrary scripts run from multiple places, with no ability to perform testing natively within their Kubernetes clusters.
FunXtion was in the middle of migrating from a monolith to a microservices architecture on Kubernetes. Their QA testing processes were split between CI pipelines and manual execution. Reporting was fragmented: different dashboards for different test types. When a test failed, the QA engineers couldn't tell whether the failure was in their test, in the application, or in the infrastructure between.
Running tests where the code actually lives
The fix isn't writing better tests. It's running tests in the same environment where the application runs.
When tests execute as Kubernetes jobs inside the cluster, they share the same network namespace, the same DNS resolution, the same service mesh, and the same resource constraints as the application. The gap between "local behavior" and "test behavior" shrinks to nearly zero, because the test is running in the real environment, not a simulation of it.
This is the approach 3Key Company took for their CZERTAINLY platform, an open-source PKI and encryption system built on microservices. Roman Cinkais, the co-founder, was facing a common challenge: a small team (four developers, three platform engineers, one part-time tester) trying to maintain quality across a microservices platform where each service might need different types of tests. Before they adopted in-cluster testing, their process was limited to code reviews, unit tests, a handful of API tests, and manual testing before each release.
The shift to running tests inside Kubernetes changed what was possible. The team built an automated testing framework that could spin up an ephemeral environment, execute the full test suite, and tear it down, all managed in code, all running inside the cluster. As Roman put it: they gained better visibility on the impact of introduced changes and timely notification when something broke.
For FunXtion, the shift was equally concrete. They started with API load tests using k6, running natively in their Kubernetes clusters. Because the tests ran inside the cluster rather than hitting it from outside, the results were reliable. No latency artifacts from network hops, no false positives from CI runner resource limits. They automated these tests to trigger on deployment, with results piped to a Grafana dashboard and Slack notifications firing when performance metrics crossed thresholds.
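The alerting half of that pipeline is simple once results are centralized: compare each run's metrics against per-metric ceilings and notify when any are crossed. A sketch of that decision logic (metric names like p95_ms are illustrative, not k6's actual output format):

```python
def breached_thresholds(metrics: dict, thresholds: dict) -> list[str]:
    """Names of metrics that crossed their ceiling, e.g. to drive
    a Slack notification. Metric names here are illustrative.
    """
    return [name for name, ceiling in thresholds.items()
            if metrics.get(name, 0) > ceiling]
```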
The team stopped manually tracking performance test execution. The process went from "a few engineers with the required setup and knowledge" to fully automated, running on the existing Kubernetes infrastructure.
What changes when tests run inside the cluster
The immediate benefit is fewer false failures. But the downstream effects matter more for QA teams:
One place for results. When every test (functional, load, end-to-end) runs through the same orchestration layer, results land in one dashboard. You don't need to check GitHub Actions for unit tests, Grafana for load tests, and a shared spreadsheet for manual regression results. FunXtion's team now has centralized reporting and logging for all natively run tests, regardless of what triggered them.
Environment-specific test definitions. Instead of one test suite that runs everywhere and fails in unpredictable ways, you can define which tests run in which environment. FunXtion does this declaratively, following a GitOps philosophy. Tests are versioned alongside infrastructure, and the right tests run in the right clusters automatically.
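The core of that declarative mapping is small: a versioned table from environment to suites, with a conservative default for anything unrecognized. A hypothetical sketch (environment and suite names are invented for illustration):

```python
# Hypothetical mapping, versioned in Git alongside the infrastructure.
SUITES_BY_ENV = {
    "staging": ["smoke", "e2e", "load"],
    "production": ["smoke"],
}

def suites_for(env: str) -> list[str]:
    """Which test suites run in a given cluster; unknown
    environments fall back to the safe minimum."""
    return SUITES_BY_ENV.get(env, ["smoke"])
```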
QA engineers focus on testing, not infrastructure. This is the part that doesn't show up in metrics but changes everything about the job. When the test platform handles execution, scheduling, and environment setup, QA engineers can spend their time writing better tests instead of debugging why their perfectly good tests fail in someone else's pipeline. As FunXtion's team put it, they wanted their QA engineers working on what they do best (testing) without being caught up in infrastructure or Kubernetes details.
Real confidence in results. 3Key's team built something worth noting: a system where anyone can write tests that are executed independently, in an ephemeral environment that gets created and destroyed automatically. The tests aren't tied to a specific machine, a specific developer's local setup, or a specific CI runner's quirks. They run the same way every time, because they run in the same environment every time.

The hard part: getting started without boiling the ocean
You don't need to migrate every test overnight. Both 3Key and FunXtion started small.
FunXtion began with load tests: a single test type, one executor (k6), automated on deployment. Once that was working and the team trusted the results, they expanded to end-to-end tests. 3Key started by building the framework and defining their first use cases, planning to expand into performance testing with k6 and security testing with ZAP.
The pattern is the same: pick the tests that hurt the most when they fail in CI, move those into the cluster first, and prove to the team that the results are reliable. Then expand.
A few things make the transition easier:
Pick a framework-agnostic platform. Your QA team probably uses a mix of tools, and forcing everyone onto one framework will slow adoption.
Support multiple triggers. Test execution should start from CI/CD webhooks, Kubernetes events, schedules, or manual runs.
Centralize logs and artifacts. When a test does fail, debugging should take minutes, not hours.
Stop blaming the tests
The next time a test passes on your machine and fails in CI, resist the urge to add a retry or bump a timeout. Ask instead: is the test running in the same environment as the application? Does the test runner share the same network, the same resources, the same service dependencies?
If the answer is no, no amount of test-level fixes will solve the problem. The environment gap will keep producing failures that look like flaky tests but are actually infrastructure mismatches.
3Key and FunXtion both found the same thing: when tests run where the code runs, the false failures disappear and QA teams can focus on actual quality, catching real bugs, not chasing phantom ones.
Testkube runs tests as native Kubernetes jobs in your clusters, supports any testing framework, and gives QA teams a single interface for execution, results, and debugging. Explore the architecture, start a Testkube trial, or see how it integrates with your CI/CD pipeline.


About Testkube
Testkube is a cloud-native continuous testing platform for Kubernetes. It runs tests directly in your clusters, works with any CI/CD system, and supports every testing tool your team uses. By removing CI/CD bottlenecks, Testkube helps teams ship faster with confidence.
Explore the sandbox to see Testkube in action.




