Infrastructure Testing in Kubernetes

Published
February 28, 2025
Atulpriya Sharma
Sr. Developer Advocate
InfraCloud Technologies

Last updated
February 28, 2025

Kubernetes has become the de facto container orchestrator, enabling organizations to build scalable, available, and resilient applications that meet modern business demands. However, this power comes with complexity. Infrastructure failures, from network routing problems to node and storage failures, can cause significant business disruption. A single misconfigured policy or resource quota can lead to a complete outage, directly affecting revenue and customer trust.

That is where infrastructure testing in Kubernetes becomes crucial. By validating Kubernetes components, teams can prevent downtime, ensure compliance, and maintain high availability. Thorough testing of the various components and layers helps identify issues before they reach production, reducing mean time to recovery (MTTR) and improving overall reliability.

In this post, we’ll examine infrastructure testing strategies for Kubernetes and discuss ways and best practices for testing infrastructure to ensure production readiness.

Key Areas of Infrastructure Testing

When we talk about infrastructure testing in Kubernetes, we’re talking about the foundation that supports your entire application ecosystem. Beyond just checking if the pods are running, it involves comprehensive testing of cluster health, compute, networking, and storage resources. Such an all-around approach ensures that your infrastructure is ready to support your production workloads while keeping up with security and performance standards. 

Cluster Availability and Health

Cluster availability and health are the foundation of reliable Kubernetes operations. Effective testing requires validating the control plane components - API server responsiveness, scheduler functionality, and controller manager operations - and ensuring they meet your SLAs. You must also validate node readiness to confirm that the kubelet is healthy and that nodes have capacity to schedule your workloads.

Validate etcd as well, to confirm that core configuration is intact and to guard against data corruption. Finally, check add-on services like CoreDNS, kube-proxy, and any other tools you rely on, so that you catch warning signals early and avoid cascading failures.

Compute Resource Validation

Ensuring your cluster can effectively manage and distribute workloads is critical. It is important to verify the node capacity metrics including CPU, memory and storage while confirming the allocatable resources after system reservations. For specialized workloads like AI & ML, you need to validate device plugins and compatibility for GPU/TPU resources. 

Resource quota enforcement testing confirms that namespace-level constraints hold: limits are enforced under load and throttling behaves as expected. It's also important to validate node affinity and taint configurations for correct pod placement, so that workloads stay isolated and critical workloads get adequate resources.
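As a rough sketch of what a quota enforcement check could look like, the manifests below define a quota and then a pod that deliberately exceeds it. The namespace `infra-test` and all resource names are hypothetical and only for illustration. If enforcement works, the API server should reject the pod with an "exceeded quota" error.

```yaml
# Hypothetical namespace and names, for illustration only.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: infra-test
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "10"
---
# This pod requests 3 CPUs, more than the 2 allowed by requests.cpu above,
# so creating it should fail when the quota is enforced.
apiVersion: v1
kind: Pod
metadata:
  name: quota-breaker
  namespace: infra-test
spec:
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: "3"
          memory: 1Gi
        limits:
          cpu: "3"
          memory: 1Gi
```

A test can apply both manifests and treat a successful pod creation as a failure, since that would mean the quota is not being enforced.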

Network Validation

Network testing is critical for application connectivity. Run DNS resolution tests to confirm that both internal (kube-dns) and external name resolution work as expected within an appropriate timeout. Service resolution testing ensures that ClusterIP, NodePort, and LoadBalancer services function correctly across namespaces.
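One possible way to express the DNS checks is a small test workflow that resolves one internal and one external name with a timeout. This is a sketch, not a definitive implementation: the workflow name is hypothetical, the `busybox:1.36` image and the five-second timeout are our choices, and it assumes the default `cluster.local` cluster domain.

```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: dns-resolution-check   # hypothetical name
  namespace: testkube
spec:
  steps:
    # Internal resolution: the API server service exists in every cluster.
    - name: Resolve internal service name
      run:
        image: busybox:1.36
        command: ["/bin/sh", "-c"]
        args:
          - timeout 5 nslookup kubernetes.default.svc.cluster.local
    # External resolution: example.com is a placeholder; use a name you depend on.
    - name: Resolve external name
      run:
        image: busybox:1.36
        command: ["/bin/sh", "-c"]
        args:
          - timeout 5 nslookup example.com
```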

Test your ingress to confirm correct rule processing, TLS termination, and backend service routing under various load conditions. Validate network policies for proper pod isolation. You should also check the load balancer configuration to ensure proper health probe functionality, session persistence, and correct traffic distribution. 
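To make the pod isolation check concrete, here is a minimal sketch of a default-deny policy paired with a single allow rule. The namespace, labels, and port are hypothetical. Note that these policies only take effect if your CNI plugin enforces NetworkPolicy.

```yaml
# Deny all ingress traffic to every pod in the namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: infra-test
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow only pods labelled role=frontend to reach pods labelled app=backend on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: infra-test
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 8080
```

An isolation test would then connect to the backend from a pod with the `role=frontend` label (expected to succeed) and from a pod without it (expected to time out).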

Storage Validation

Validating storage ensures data persistence and availability for stateful workloads in Kubernetes environments. Storage class availability testing verifies provisioner functionality and validates reclaim policies across different storage backends. PVC provisioning tests must validate capacity limits, access modes and correct volume attachment to nodes.

Measuring throughput and latency under various load conditions is also critical for ensuring I/O profiles meet application requirements. This will also identify any bottlenecks during concurrent operations. Lastly, validating backups is critical for smooth operations. Check for snapshot creation capabilities, check restoring procedures with data integrity verification, and validate retention policy enforcement.
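A basic provisioning check might look like the sketch below: a claim against a storage class, plus a pod that writes a file to the volume and reads it back. The namespace, resource names, and the `standard` storage class are assumptions; substitute whatever your cluster provides.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: storage-smoke-test
  namespace: infra-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # assumption: replace with a class from your cluster
  resources:
    requests:
      storage: 1Gi
---
# The pod exits 0 only if the volume attaches and a write/read round trip succeeds.
apiVersion: v1
kind: Pod
metadata:
  name: storage-smoke-test
  namespace: infra-test
spec:
  restartPolicy: Never
  containers:
    - name: writer
      image: busybox:1.36
      command: ["/bin/sh", "-c"]
      args:
        - |
          set -e
          echo "infra-test" > /data/probe.txt
          grep -q "infra-test" /data/probe.txt
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: storage-smoke-test
```

If the pod stays in Pending, the claim was probably not bound or the volume could not be attached, which is itself a useful test signal.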

When To Run Infrastructure Tests

While we now know the importance of infrastructure testing in Kubernetes, knowing precisely when to execute infrastructure tests can reduce operational risks. Infrastructure failures can impact multiple workloads simultaneously. By knowing when to execute your infrastructure tests, you can prevent any cascading failures and reduce debug times.

Below are some strategic points when you should run your infrastructure tests:

  • Pre-deployment: Validate infrastructure and cluster health before workloads are deployed.
  • Post-deployment: After a workload is deployed, or after infrastructure changes are applied using tools like Terraform or CloudFormation.
  • Post-upgrade: After the cluster is upgraded, verify compatibility.
  • Scheduled intervals: Regular infrastructure tests to detect any configuration drift.
  • Before critical releases: Conduct pre-flight checks before deploying critical workloads.

The scope and frequency of testing should align with your management practices and be documented in a comprehensive testing plan.
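For the scheduled-interval case, a test workflow can carry its own schedule. The sketch below runs a single API server health check every six hours. The workflow name is hypothetical, and the `events`/`cronjob` block reflects our understanding of the Testkube scheduling syntax, so verify it against the version you run.

```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: scheduled-infra-check   # hypothetical name
  namespace: testkube
spec:
  events:
    # Assumed scheduling syntax: standard cron expression, every six hours.
    - cronjob:
        cron: "0 */6 * * *"
  steps:
    - name: Validate API Server Health
      run:
        image: curlimages/curl:8.7.1
        command: ["/bin/sh", "-c"]
        args:
          # -f makes curl exit non-zero on an HTTP error, failing the step.
          - curl -kfsS https://kubernetes.default.svc/healthz
```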

Implementing Infrastructure Testing with Testkube

Infrastructure testing in Kubernetes requires validating multiple components across cluster health, compute, networking, and storage. This often demands multiple testing tools and methods, many of which weren’t designed for container environments. Traditional approaches create integration challenges and add to management overhead. 

That’s where Testkube comes in. It addresses these challenges by providing a cloud-native testing framework that unifies infrastructure testing workflows. Let's understand how Testkube helps.

Testkube is a cloud-native, vendor-agnostic test execution framework that makes testing in Kubernetes straightforward. Using Testkube, you can orchestrate and automate complex validation workflows using different testing tools while maintaining Kubernetes-native operations.

Benefits of Using Testkube For Infrastructure Testing

Testkube enhances infrastructure testing by integrating seamlessly with Kubernetes, allowing teams to leverage their full potential. Here are a few benefits of using Testkube:

  • Orchestrate Complex Test Workflows: Test Workflows enable sequential and parallel test executions that mimic real-world scenarios without complex scripting, giving you better control over infrastructure validation.
  • Seamless Integration: Testkube integrates with CI/CD platforms like Jenkins, GitLab CI, and Argo CD, allowing for automated infrastructure testing as part of the deployment process.
  • Integration with Testing Tools: Testkube works with popular testing tools like k6, Curl, and Postman, allowing you to combine different approaches for comprehensive infrastructure validation.
  • Single Pane of Glass: All test results, logs, and artifacts are in one place, no matter how tests are triggered, which streamlines troubleshooting and enables holistic reporting.

Thanks to its flexibility and alignment with cloud-native technologies, Testkube allows you to execute test workflows at various strategic points in your application's lifecycle.

Infrastructure testing using Testkube

By orchestrating tests that leverage familiar tools like Curl, Pytest, and k6, Testkube eliminates integration challenges without compromising test execution. Below are a few examples demonstrating how Testkube makes complex infrastructure testing accessible and efficient for production environments.

Using Curl for Cluster Health Checks and Resource Validations

Testkube makes it easy to implement comprehensive cluster health checks using simple HTTP-based tools like Curl. This validates control plane components, etcd health, and node readiness in a Kubernetes-native way.

```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: cluster-health-validation
  namespace: testkube
spec:
  steps:
    # -f makes curl exit non-zero on HTTP errors, so a failed check fails the step.
    # /healthz is typically readable without credentials; the other endpoints
    # usually require a bearer token for a service account with matching RBAC.
    - name: Validate API Server Health
      container:
        image: curlimages/curl:8.7.1
      shell: curl -kfsS https://kubernetes.default.svc/healthz
    - name: Check Node Readiness
      container:
        image: curlimages/curl:8.7.1
      shell: curl -kfsS https://kubernetes.default.svc/api/v1/nodes | grep -i ready
    - name: Validate Etcd Health
      container:
        image: curlimages/curl:8.7.1
      shell: curl -kfsS https://kubernetes.default.svc/healthz/etcd
    - name: Check CoreDNS Status
      container:
        image: curlimages/curl:8.7.1
      shell: "curl -kfsS https://kubernetes.default.svc/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy/health"
```

This workflow validates critical cluster components by sequentially checking API server responsiveness, confirming node readiness status, verifying etcd health, and ensuring CoreDNS is working as expected.

Using cURL for Network Testing

Network testing requires validating critical communication paths in your Kubernetes cluster. Here's a simple example using cURL to verify ingress controller health:

```yaml
kind: TestWorkflow
apiVersion: testworkflows.testkube.io/v1
metadata:
  name: ingress-health-check
  namespace: testkube
spec:
  content:
    files:
      - path: /data/check_ingress.sh
        content: |-
          #!/bin/sh
          # POSIX sh is used because the curl image does not ship bash.
          set -e

          # Test Ingress controller health
          echo "Testing Ingress controller..."
          ingress_status=$(curl -s -o /dev/null -w "%{http_code}" http://ingress-nginx-controller.ingress-nginx.svc.cluster.local:10254/healthz)
          if [ "$ingress_status" = "200" ]; then
            echo "✅ Ingress controller is healthy"
          else
            echo "❌ Ingress controller check failed with status $ingress_status"
            exit 1
          fi
  steps:
    - name: Check Ingress Health
      workingDir: /data
      run:
        image: curlimages/curl:8.3.0
        command: ["/bin/sh", "-c"]
        args:
          # Run the script through sh so it works even without the execute bit.
          - |
            sh /data/check_ingress.sh
```

This workflow focuses on validating your ingress controller's health - a critical component for external traffic routing.

Using Node Selectors for GPU Test

Hardware validation ensures that specialized workloads have the resources they need. This example tests GPU availability by running a small CUDA workload on a GPU node.

```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: nvidia-gpu-workflow
  namespace: testkube
spec:
  container:
    resources:
      limits:
        nvidia.com/gpu: 1
        memory: 200Mi
        cpu: 100m
      requests:
        nvidia.com/gpu: 1
        memory: 200Mi
        cpu: 100m
  pod:
    nodeSelector:
      cloud.google.com/gke-accelerator: nvidia-tesla-t4
  steps:
    - name: Run test
      run:
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
```

The above test workflow has a single step that runs a container image which performs a vector addition using CUDA. The test will fail if the Pod running it is not scheduled on a Node with a GPU or if the CUDA Toolkit is not installed.

What we saw here were a few examples of how you can use Testkube for Infrastructure testing in Kubernetes. You can build more complex workflows to suit your requirements. Check out our blog posts to learn more about building complex workflows using Testkube. 

Summary

Infrastructure testing in Kubernetes is complex due to the presence of multiple components, resources, networking, and storage. Effective validation of all of these requires a comprehensive approach using multiple tools, each targeting different aspects of infrastructure testing. 

Testkube addresses this complexity by providing a cloud-native and vendor-agnostic testing framework that unifies different tools into cohesive workflows. It helps teams implement thorough infrastructure validation with minimal operational overhead.

We would love to hear all about the custom test workflows that you have created using Testkube for infrastructure testing. If you face any issues, remember that the entire Testkube team, plus a vibrant community of fellow Kubernetes testers, is on Slack. We're just getting started in building the most comprehensive cloud-native testing framework for Kubernetes, so feel free to follow us on Twitter @testkube_io.

About Testkube

Testkube is a test execution and orchestration framework for Kubernetes that works with any CI/CD system and testing tool you need. It empowers teams to deliver on the promise of agile, efficient, and comprehensive testing programs by leveraging the capabilities of K8s to eliminate CI/CD bottlenecks and streamline the testing workflow. Get started with Testkube's free trial today.