

Executive Summary
Engineering teams are shipping software faster than ever. AI assistants write features in minutes, pull requests pile up, and release velocity keeps climbing. Anyone running a CI/CD pipeline already feels the catch: testing has not kept pace. More code means more tests, more tests mean longer pipelines, and longer pipelines mean more failures to triage. The tooling that was supposed to make teams faster is quietly creating new bottlenecks downstream.
In a recent webinar, Testkube CTO Ole Lensmar walked through how Testkube AI attacks this problem. Instead of bolting a chatbot onto a dashboard, Testkube infuses AI directly into the testing platform so it participates in your pipeline like any other part of your stack. Here is what was covered, plus a look at the live demo.
What are the three bottlenecks AI created?
The irony of AI-assisted development is that solving one constraint tends to expose the next. Ole broke the testing slowdown into three distinct bottlenecks.
Test creation. When AI writes more code, that code needs more tests. Teams are left with coverage gaps, because writing tests for everything AI generates is its own enormous task. AI does not always see the whole picture either: ask it to implement a feature and it may miss the ripple effects, letting bugs slip through.
Test execution. Once you have all those tests, whether you wrote them by hand or generated them, you have to run them. Suddenly a build that took five or ten minutes takes twenty-five, and the whole team waits. Your CI/CD tooling becomes the choke point.
Test analysis. After everything runs, someone has to make sense of the results: sift through failures, hunt for anomalies, and work out why a test is suddenly slow or flaky. As volume grows, mean time to resolution can climb fast.
Each bottleneck amplifies the others, and the cumulative effect is stalled releases and eroding confidence in what ships. Testkube AI is built to attack all three.
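The analysis bottleneck often starts with a simple question: is this run unusually slow? Here is a minimal sketch of one common heuristic. It assumes you have the durations of a test's previous runs and flags the latest run when it sits several standard deviations above the historical mean. The function name, threshold, and spread floor are illustrative assumptions, not Testkube features.

```python
from statistics import mean, stdev


def is_slow_anomaly(
    history_s: list[float],
    latest_s: float,
    z_threshold: float = 3.0,
    min_spread_s: float = 1.0,
) -> bool:
    """Flag `latest_s` if it is more than `z_threshold` standard deviations
    above the mean of the previous run durations in `history_s`."""
    if len(history_s) < 2:
        return False  # stdev needs at least two samples; too little history to judge
    mu = mean(history_s)
    # Floor the spread so a very stable test does not flag tiny wobbles.
    sigma = max(stdev(history_s), min_spread_s)
    return (latest_s - mu) / sigma > z_threshold


print(is_slow_anomaly([60, 62, 61, 59, 63], 150))  # True
print(is_slow_anomaly([60, 62, 61, 59, 63], 64))   # False
```

With a history of roughly 60-second runs, a 150-second run is flagged and a 64-second run is not. The spread floor keeps an almost perfectly stable test from flagging every sub-second variation.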
What is Testkube?
Testkube is an open testing platform for AI-driven engineering teams. It runs your tests inside your own clusters, with a control plane that handles test orchestration and exposes everything through a dashboard, a CLI, APIs, an MCP server, and integrations with your CI/CD tools.
A few properties matter here. Testkube is open and vendor-agnostic: it runs any framework, from Playwright and Cypress end-to-end tests to k6 load tests, API tests, security tests, and more. It is cloud-native, running inside your existing Kubernetes environment, which means execution happens on your own infrastructure for security, performance, and compliance reasons. And now it is AI-native, with AI features that already know about your tests, environments, execution history, and dependencies.
Pillar 1: AI-powered test creation
This is the headline feature, and the demo made the pitch concrete: describe what you want to test in plain language, and Testkube generates the test and runs it against your real application, in any framework and any language.
In the demo, Ole prompted Testkube to build an API test. It explored the API, generated a Postman collection, and ran it with Newman. When the first generated test failed, it noticed, went back, and improved the test on its own. He repeated the process for a k6 performance test and a Playwright end-to-end test, which ran in parallel across multiple workers and self-corrected when one node hit a locator error.
The point is not that AI can write test code. Plenty of tools do that. The point is the loop: generate, run inside your infrastructure, see real results, refine. Picture the workflow. Someone reports that the authentication service is acting up in staging. You ask for an API test against the auth endpoint, run it in staging immediately, and skip the entire "build it locally, then wire it into CI/CD" detour. Because Testkube already holds the history of previous tests against that service, it factors that context into what it generates.
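The generate, run, refine loop fits in a few lines. This is a minimal illustration, not Testkube's implementation. `generate` stands in for a model call and `run` for executing the test in your cluster. Both are hypothetical callables you would supply, and they are stubbed here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    passed: bool
    feedback: str  # e.g. assertion errors or logs from the run


def generate_run_refine(
    prompt: str,
    generate: Callable[[str, str], str],  # hypothetical: (prompt, feedback) -> test source
    run: Callable[[str], RunResult],      # hypothetical: executes the test, returns real results
    max_attempts: int = 3,
) -> tuple[str, RunResult, int]:
    """Draft a test, run it, and feed any failure back into the next draft."""
    feedback = ""
    test, result = "", RunResult(passed=False, feedback="not run")
    for attempt in range(1, max_attempts + 1):
        test = generate(prompt, feedback)
        result = run(test)
        if result.passed:
            return test, result, attempt
        feedback = result.feedback  # refine from what actually happened
    return test, result, max_attempts  # give up and surface the last failure


# Stubbed usage: the first draft fails, the refined draft passes.
def fake_generate(prompt: str, feedback: str) -> str:
    return "draft-2" if feedback else "draft-1"


def fake_run(test: str) -> RunResult:
    ok = test == "draft-2"
    return RunResult(passed=ok, feedback="" if ok else "401 from /auth/login")


test, result, attempts = generate_run_refine(
    "API test for the auth endpoint", fake_generate, fake_run
)
print(test, result.passed, attempts)  # draft-2 True 2
```

The loop stops at the first passing draft. That is exactly where a human reviewer is most useful: "passes" and "tests the right thing" are not the same.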
Why not just use my coding assistant to write tests?
One audience question got at the obvious objection: if you already use AI to write code, why not use the same tool to write the tests? Ole's answer was about context and tuning. Testkube has access to your existing tests, so it understands what is already covered and how your suite is structured. Its skills are tuned for test creation, the same way a dedicated code-review tool beats a vanilla LLM prompt. And good tests need more than source code as context. If your code has a bug, a test generated only from that code may simply validate the bug. Feeding in requirements and other context produces tests that check what the system is supposed to do.
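A small hypothetical example makes the "validating the bug" risk concrete. The requirement says orders strictly over $100 get 10% off, but the code uses `>=`. A test written by reading the code locks in the bug. A test written from the requirement catches it. Amounts are in integer cents to avoid floating-point noise.

```python
def apply_discount(total_cents: int) -> int:
    # Requirement (hypothetical): 10% off orders strictly over $100.
    # Bug: `>=` also discounts an order of exactly $100.
    if total_cents >= 10_000:
        return total_cents * 90 // 100
    return total_cents


def test_from_code() -> bool:
    # Written by reading what the code currently does: it bakes the bug in.
    return apply_discount(10_000) == 9_000


def test_from_requirement() -> bool:
    # Written from the requirement: exactly $100 gets no discount.
    return apply_discount(10_000) == 10_000


print(test_from_code())         # True: the bug is "validated"
print(test_from_requirement())  # False: the bug is caught
```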
Pillar 2: Autonomous AI agents
If test creation is the front of the pipeline, agents are everywhere else. In Testkube, an agent is something you define to handle almost any testing-and-quality task: deciding which tests to run and in what order, optimizing test configuration such as sharding and memory allocation, triaging failures, doing root-cause analysis, generating reports, enforcing quality gates, and analyzing flakiness.
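Sharding is one of the more mechanical optimizations an agent can make. This minimal sketch assumes you have each test's historical duration in seconds. It uses the classic longest-processing-time-first heuristic: sort tests from slowest to fastest and always give the next one to the least-loaded shard. The test names and durations are made up for illustration, and this is not a description of how Testkube's agents shard.

```python
import heapq


def shard_by_duration(durations: dict[str, float], shards: int) -> list[list[str]]:
    """Longest-processing-time-first: hand each test, slowest first,
    to the shard with the smallest total duration so far."""
    if shards < 1:
        raise ValueError("need at least one shard")
    heap = [(0.0, i) for i in range(shards)]  # (total seconds, shard index)
    heapq.heapify(heap)
    buckets: list[list[str]] = [[] for _ in range(shards)]
    for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)  # least-loaded shard (ties go to lower index)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    return buckets


history = {"checkout": 300, "login": 240, "search": 180, "profile": 120, "cart": 60}
plan = shard_by_duration(history, shards=2)
print(plan)  # [['checkout', 'profile', 'cart'], ['login', 'search']]
print(max(sum(history[t] for t in shard) for shard in plan))  # 480
```

Run back to back, those five tests take 900 seconds. Split across two shards, the slower shard finishes in 480. Greedy assignment is not guaranteed to be optimal in general, but it is simple and needs nothing beyond past durations.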
The demo showed an "AI Analyze" button that fires a troubleshooting agent directly from the dashboard. It pulls the logs and artifacts from a failed Playwright run and reports back what failed, where, how to fix it, and why the healthy workflows stayed healthy. Another agent produced a trend report across every execution of a workflow, complete with resource-usage patterns and optimization suggestions. A third was wired up to create an issue in Linear straight from the failure analysis, with a human-in-the-loop approval step before it acted.
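The human-in-the-loop step in that Linear example comes down to a gate between what an agent proposes and what it does. This is a minimal sketch with hypothetical names throughout. The agent produces a `ProposedAction`, and nothing runs unless an `approve` callback returns `True`. In a real setup, `approve` might be a dashboard button or a chat prompt; here it is just a function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    kind: str     # hypothetical action type, e.g. "create_issue"
    summary: str  # what the agent wants to do, shown to the reviewer


def execute_with_approval(
    action: ProposedAction,
    approve: Callable[[ProposedAction], bool],  # e.g. a dashboard button or chat prompt
    perform: Callable[[ProposedAction], str],   # the side effect, e.g. calling a tracker API
) -> str:
    """Run the agent's proposed action only if a human approves it."""
    if not approve(action):
        return f"skipped: {action.kind} not approved"
    return perform(action)


action = ProposedAction("create_issue", "Playwright checkout test fails on a missing locator")
print(execute_with_approval(action, approve=lambda a: False, perform=lambda a: "issue created"))
# skipped: create_issue not approved
```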
A detail worth highlighting: when the team went through their own backlog of customer feature requests, they realized that a large share of them, including automatic reruns of flaky tests, flakiness detection and remediation, issue-tracker integration, and anomaly detection, were now just agent use cases. Many shipped as built-in templates rather than bespoke features.
Out of the box you get a handful of core agents (troubleshooting, optimize/analyze, reporting, and a general helper) plus templates such as a flaky-test detective, smart rerun, failure categorization, and an infrastructure validator that can generate validation workflows for you. You connect whatever MCP servers your agents need, choose your models, and decide what runs automatically versus what needs sign-off.
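The core of a flaky-test detector can be surprisingly small. One common heuristic flags a test that both passed and failed on the same commit, since the code did not change between those runs. The sketch assumes you can export (test, commit, passed) records from your execution history. It illustrates the idea only and does not describe how Testkube's flaky-test detective template works.

```python
from collections import defaultdict


def flaky_tests(executions: list[tuple[str, str, bool]]) -> set[str]:
    """Flag tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test, commit, passed in executions:
        outcomes[(test, commit)].add(passed)
    # Two distinct outcomes ({True, False}) for unchanged code means the result
    # depended on something other than the code under test.
    return {test for (test, _commit), seen in outcomes.items() if len(seen) == 2}


history = [
    ("login", "a1b2c3", True),
    ("login", "a1b2c3", False),   # same commit, different outcome
    ("search", "a1b2c3", False),
    ("search", "d4e5f6", True),   # outcome changed along with the code: not flagged
]
print(flaky_tests(history))  # {'login'}
```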
Most flakiness is not really flakiness
On flakiness specifically, Ole made a sharp point: flakiness usually is not flakiness. It is something happening outside the test that you cannot see, such as a config change, a source change, or another job loading the system at the same time. Give an agent access to the right MCP servers (your source code, Grafana or Datadog metrics, your Kubernetes runtime) and it can weave those signals together to find the real cause, separating infrastructure-induced flakiness from genuine test-design issues.
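Here is a toy version of that correlation step. It assumes you can pull timestamped events, such as config changes or load jobs, from your other systems. A failure is labeled infrastructure-suspect when an event landed shortly before it. The `Event` type, the ten-minute window, and the labels are hypothetical; a real agent would weigh many more signals.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    at: datetime
    description: str  # e.g. "config change", "load job started"


def classify_failure(
    failed_at: datetime,
    events: list[Event],
    window: timedelta = timedelta(minutes=10),
) -> str:
    """Label a failure infrastructure-suspect if any external event landed
    within `window` before it; otherwise point at the test itself."""
    nearby = [e for e in events if timedelta(0) <= failed_at - e.at <= window]
    if nearby:
        return "infrastructure-suspect: " + ", ".join(e.description for e in nearby)
    return "test-design-suspect"


events = [
    Event(datetime(2024, 5, 1, 12, 0), "config change"),
    Event(datetime(2024, 5, 1, 14, 30), "load job started"),
]
print(classify_failure(datetime(2024, 5, 1, 12, 5), events))  # infrastructure-suspect: config change
print(classify_failure(datetime(2024, 5, 1, 13, 0), events))  # test-design-suspect
```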
How much does it cost to run AI testing agents?
Agents that run often can burn through tokens fast. The team hit this themselves when a noisy test with massive logs ran up the bill on failure categorization. There are two takeaways. First, you do not need a frontier model for everything: analyzing logs is a good job for a cheaper, faster, or even local model. Testkube lets you configure which models each agent uses and point at your own endpoints, including local or air-gapped models through something like Ollama. Setting token caps and viewing consumption per agent and environment are on the roadmap. Second, Testkube itself does not charge per test or per token. Because everything runs on your infrastructure, you can run as many tests in parallel as your clusters can handle. The only limits are the resources you assign.
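Token caps are on the roadmap rather than shipped, so the following is only a sketch of the idea, not Testkube configuration. It assumes a hypothetical `TokenBudget` that tracks spend per agent and environment and refuses any call that would cross the cap. At that point you could skip the run or fall back to a cheaper model.

```python
class TokenBudget:
    """Hypothetical per-agent, per-environment token cap (illustrative only)."""

    def __init__(self, cap: int) -> None:
        self.cap = cap
        self.used: dict[tuple[str, str], int] = {}

    def try_spend(self, agent: str, env: str, tokens: int) -> bool:
        """Record `tokens` if they fit under the cap; otherwise refuse."""
        key = (agent, env)
        spent = self.used.get(key, 0)
        if spent + tokens > self.cap:
            return False  # over budget: skip the call or route to a cheaper model
        self.used[key] = spent + tokens
        return True


budget = TokenBudget(cap=50_000)
print(budget.try_spend("failure-categorization", "staging", 40_000))  # True
print(budget.try_spend("failure-categorization", "staging", 20_000))  # False
print(budget.used)  # {('failure-categorization', 'staging'): 40000}
```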
Pillar 3: The Testkube MCP server
The third pillar connects everything to the AI tools you already use. The Testkube MCP server exposes around 30 tools covering most of what the platform does: running workflows, fetching results, creating workflows, and searching historical data. It lets AI coding assistants such as Claude Code or Cursor read your live testing context directly.
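Under the hood, MCP clients talk to servers using JSON-RPC 2.0. A `tools/list` request discovers what a server offers, and invoking a tool is a `tools/call` request carrying the tool's name and arguments. Your assistant builds these messages for you; the sketch below only shows their shape. The tool name `run_workflow` and the `workflowName` argument are hypothetical placeholders, since the webinar did not list the server's tool names. Transport and authentication are omitted.

```python
import json


def tools_list_request(request_id: int) -> str:
    """Build an MCP `tools/list` request: asks the server which tools it exposes."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request (JSON-RPC 2.0) as a client would send it."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


print(tools_list_request(1))
# "run_workflow" and "workflowName" are hypothetical; use the names tools/list returns.
print(tools_call_request(2, "run_workflow", {"workflowName": "auth-api-test"}))
```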
In the demo, Ole connected Cursor to a Testkube environment and asked it about a specific workflow. It called the MCP tools to assemble a high-level picture of how that test had been running. He also showed a flakiness analysis it produced over thousands of real executions, breaking failures into classes and rendering the whole thing as a clean canvas, all from data the assistant pulled through the MCP server.
This unlocks an AI-native development loop: write a feature in your AI editor, use the Testkube MCP server to run the relevant tests against your changes, get results back, and let the model correlate failures with the exact code you just touched. The current server uses token authentication, with OAuth support landing in an upcoming release, after which it will inherit the authenticating user's permissions automatically.
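Running "the relevant tests" implies some way to map a change to tests. This is a minimal sketch of one common approach, test impact analysis. It assumes a hypothetical coverage map from each test to the source files it exercises, for example one built from coverage reports. When a changed file is not covered by any test, it falls back to running everything, trading speed for safety. The webinar did not say how Testkube selects tests.

```python
def select_tests(
    changed_files: set[str],
    coverage: dict[str, set[str]],  # hypothetical: test name -> source files it exercises
    run_all_if_unmapped: bool = True,
) -> list[str]:
    """Pick tests whose covered files overlap the change set."""
    selected = sorted(t for t, files in coverage.items() if files & changed_files)
    covered = set().union(*coverage.values())
    if run_all_if_unmapped and changed_files - covered:
        return sorted(coverage)  # a change no test maps to: be safe and run everything
    return selected


coverage = {
    "auth-api": {"src/auth.py", "src/tokens.py"},
    "checkout-e2e": {"src/cart.py"},
    "search-k6": {"src/search.py"},
}
print(select_tests({"src/tokens.py"}, coverage))      # ['auth-api']
print(select_tests({"src/new_module.py"}, coverage))  # ['auth-api', 'checkout-e2e', 'search-k6']
```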
Where to start with AI test automation
One of the most grounded moments came at the end. You do not have to adopt all of this at once. If you have already nailed test creation with your own crafted prompts in Cursor or Claude Code, keep doing that, and let Testkube handle execution and analysis instead. Pick the bottleneck that is costing your team the most and start there.
That came with a healthy realism about the technology. AI is non-deterministic, and so are people. Getting agents to behave takes iteration: the team's first failure-categorization attempt went wrong before they tuned it. A useful trick Ole shared: AI is excellent at writing the very prompts and agents you will use, so ask it to draft a deterministic agent prompt for a given task and refine from there. And keep a human in the loop, especially before anything self-heals a test, because sometimes a failing test is correctly catching a real bug, and you do not want an agent to fix it away.
Key takeaways
- AI shifts the bottleneck; it does not remove it. Faster code creation simply exposes the next constraint: creating, running, and analyzing enough tests to keep up.
- Testkube AI attacks all three bottlenecks as one system. Natural-language test creation, autonomous agents, and an MCP server cover creation, execution, and analysis together.
- Tests run inside your own clusters. Execution happens on infrastructure you control, with no per-test or per-token charge and parallelism bounded only by your resources.
- Most flakiness is an infrastructure signal, not a test defect. Agents that read source, metrics, and runtime context can find the real cause instead of masking it.
- Start with one bottleneck and keep a human in the loop. Adopt incrementally, and review agent actions before anything self-heals a test that may be catching a real bug.


About Testkube
Testkube is the open testing platform for AI-driven engineering teams. It runs tests directly in your Kubernetes clusters, works with any CI/CD system, and supports every testing tool your team uses. By removing CI/CD bottlenecks, Testkube helps teams ship faster with confidence.
Get Started with a trial to see Testkube in action.




