Kubernetes Performance Testing: Pods, Services, and Scaling
Learn how to performance test applications running on Kubernetes, including pod scaling behaviour, service mesh latency, and resource limit testing.
Mark
Performance Testing Expert
Performance testing applications on Kubernetes introduces challenges that don’t exist in traditional deployments. Pod scaling, resource limits, network policies, and service mesh overhead all affect application performance in ways that require specific testing approaches.
Understanding Kubernetes Performance Factors
Before testing, understand what affects performance in a Kubernetes environment:
| Factor | Impact | Testing Consideration |
|---|---|---|
| Pod resource limits | CPU throttling, OOM kills | Test at various load levels |
| Horizontal Pod Autoscaler | Scale-out latency | Measure time to scale |
| Service mesh (Istio, Linkerd) | Added latency per request | Compare with/without mesh |
| Network policies | Connection overhead | Test cross-namespace calls |
| Node placement | Network hops between pods | Test pod affinity scenarios |
Setting Up Test Infrastructure
For consistent results, deploy your load generator within the cluster:
# k6-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: k6-load-test
spec:
  template:
    spec:
      containers:
        - name: k6
          image: grafana/k6:latest
          command: ["k6", "run", "/scripts/test.js"]
          volumeMounts:
            - name: test-script
              mountPath: /scripts
          resources:
            requests:
              memory: "256Mi"
              cpu: "500m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
      volumes:
        - name: test-script
          configMap:
            name: k6-test-script
      restartPolicy: Never
Store your test script in a ConfigMap:
kubectl create configmap k6-test-script --from-file=test.js
kubectl apply -f k6-job.yaml
Testing Pod Scaling Behaviour
The Horizontal Pod Autoscaler (HPA) doesn’t scale instantly. Measure the lag:
// test.js - Ramping load to trigger scaling
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 10 },   // Baseline
    { duration: '1m', target: 100 },  // Spike to trigger scaling
    { duration: '5m', target: 100 },  // Hold while scaling occurs
    { duration: '2m', target: 10 },   // Return to baseline
  ],
};

export default function () {
  http.get('http://my-service.default.svc.cluster.local/api/endpoint');
  sleep(0.5);
}
Monitor HPA activity during the test:
kubectl get hpa my-app-hpa -w
Typical observations:
- Scale-up delay: 15-30 seconds after threshold breach
- Pod startup time: 10-60 seconds depending on image size and readiness probes
- Scale-down delay: 5 minutes by default, configurable through the HPA's behavior.scaleDown.stabilizationWindowSeconds setting
Resource Limit Testing
Kubernetes enforces CPU and memory limits. Test what happens when limits are reached:
# Deploy with restrictive limits
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "200m"
During load testing, monitor for:
# CPU and memory usage per pod (usage pinned at the CPU limit suggests throttling)
kubectl top pods -l app=my-app
# OOM kills (reported as OOMKilled in the pod's last container state)
kubectl describe pods -l app=my-app | grep OOMKilled
# Pod restarts
kubectl get pods -l app=my-app -o wide
Signs of resource starvation:
- Response times increase dramatically under moderate load
- Pods restart during tests
- CPU usage hits the limit while throughput plateaus
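Rather than spotting these symptoms by eye, you can encode them as k6 thresholds so the run fails automatically when latency or error rates degrade. A minimal sketch, reusing the same service endpoint as earlier; the threshold values are illustrative and should come from your own measured baselines:

// thresholds-test.js - fail the run when starvation symptoms appear
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 50,
  duration: '5m',
  thresholds: {
    // Illustrative limits - replace with your measured baseline
    http_req_duration: ['p(95)<500', 'p(99)<1000'], // milliseconds
    http_req_failed: ['rate<0.01'],                 // under 1% failed requests
  },
};

export default function () {
  http.get('http://my-service.default.svc.cluster.local/api/endpoint');
  sleep(0.5);
}

k6 exits with a non-zero status when a threshold is breached, so the Job defined above is reported as failed rather than successful.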
Service Mesh Latency
If you are using Istio, Linkerd, or a similar service mesh, measure the overhead it adds:
Without mesh (direct pod-to-pod):
kubectl exec -it client-pod -- curl -w "@curl-format.txt" http://server-pod:8080/
With mesh (through sidecar proxies):
# Same request routes through Envoy/Linkerd proxies
kubectl exec -it client-pod -- curl -w "@curl-format.txt" http://server-service:8080/
Typical service mesh overhead:
| Mesh | Added Latency (p99) |
|---|---|
| Istio | 3-10ms |
| Linkerd | 1-3ms |
| No mesh | Baseline |
For latency-sensitive applications, this overhead matters at high request volumes.
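One way to quantify the overhead in your own cluster is to run an identical k6 workload against a meshed and an unmeshed deployment of the same application and record each as a custom trend. A rough sketch, assuming two hypothetical Services, app-direct (no sidecar) and app-meshed (sidecar injected):

// mesh-comparison.js - same workload with and without the sidecar proxy in the path
import http from 'k6/http';
import { Trend } from 'k6/metrics';
import { sleep } from 'k6';

// Hypothetical Services exposing the same application
const DIRECT = 'http://app-direct.default.svc.cluster.local/api/endpoint';
const MESHED = 'http://app-meshed.default.svc.cluster.local/api/endpoint';

const directLatency = new Trend('direct_latency', true);
const meshedLatency = new Trend('meshed_latency', true);

export const options = { vus: 20, duration: '3m' };

export default function () {
  directLatency.add(http.get(DIRECT).timings.duration);
  meshedLatency.add(http.get(MESHED).timings.duration);
  sleep(0.5);
}

The end-of-test summary then shows percentiles for both trends side by side, which is easier to compare than two separate runs.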
Cross-Namespace Performance
Test performance when services communicate across namespaces:
// Test internal service calls
import http from 'k6/http';

const internalService = 'http://api-service.production.svc.cluster.local';
const crossNamespace = 'http://auth-service.security.svc.cluster.local';

export default function () {
  // Same-namespace call
  const local = http.get(`${internalService}/health`);
  // Cross-namespace call
  const remote = http.get(`${crossNamespace}/validate`);
  // Compare timings
  console.log(`Local: ${local.timings.duration}ms, Remote: ${remote.timings.duration}ms`);
}
Network policies can add latency if complex rule evaluation is required.
Ingress Controller Testing
Test the ingress controller’s capacity separately from your application:
# nginx-ingress specific metrics
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
data:
  enable-vts-status: "true" # Enable metrics
Monitor ingress metrics during load:
kubectl exec -it nginx-ingress-controller-xxx -- curl localhost:18080/nginx_status
Key metrics:
- Active connections
- Requests per second
- Connection queue depth
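To load the edge path itself rather than the internal Service, point k6 at the hostname the ingress routes. A minimal sketch; the hostname and error budget below are placeholders:

// ingress-test.js - exercise ingress controller -> Service -> pods end to end
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 100,
  duration: '10m',
  thresholds: {
    http_req_failed: ['rate<0.01'], // placeholder error budget
  },
};

export default function () {
  // Placeholder hostname served by the ingress controller
  http.get('https://app.example.com/api/endpoint');
  sleep(0.5);
}

Comparing these results against the in-cluster runs isolates how much latency and error rate the ingress layer itself contributes.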
Persistent Volume Performance
For stateful applications, test storage performance:
apiVersion: v1
kind: Pod
metadata:
  name: storage-benchmark
spec:
  restartPolicy: Never # run the benchmark once rather than restarting it
  containers:
    - name: fio
      image: ljishen/fio
      command: ["fio", "--name=randwrite", "--ioengine=libaio", "--iodepth=16",
                "--rw=randwrite", "--bs=4k", "--size=1G", "--numjobs=4",
                "--time_based", "--runtime=60", "--filename=/data/test"]
      volumeMounts:
        - name: test-volume
          mountPath: /data
  volumes:
    - name: test-volume
      persistentVolumeClaim:
        claimName: test-pvc
Compare different storage classes:
| Storage Class | IOPS | Latency | Use Case |
|---|---|---|---|
| gp2 (AWS) | 3000 burst | 1-10ms | General purpose |
| io1 (AWS) | Provisioned | <1ms | Databases |
| standard (GKE) | Variable | Variable | Development |
| ssd (GKE) | Higher | Lower | Production |
Monitoring During Tests
Deploy Prometheus and Grafana for real-time visibility:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
Key dashboards:
- Container resource usage (CPU, memory per pod)
- Network I/O per pod
- Request latency percentiles
- Error rates by service
Example PromQL queries for load testing:
# Request rate
sum(rate(http_requests_total{namespace="production"}[1m])) by (service)
# P99 latency
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[1m])) by (le, service))
# Error rate
sum(rate(http_requests_total{status=~"5.."}[1m])) / sum(rate(http_requests_total[1m]))
Cluster Autoscaler Interaction
If using cluster autoscaler, test node scaling:
- Deploy pods that exceed current node capacity
- Measure time for new nodes to join
- Verify pods schedule correctly on new nodes
# Watch node scaling
kubectl get nodes -w
# Check pending pods
kubectl get pods --field-selector=status.phase=Pending
Node scaling typically takes 2-10 minutes depending on cloud provider and instance type.
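Because node provisioning is far slower than pod startup, give the load profile a hold phase long enough for nodes to join and pending pods to schedule. A rough k6 profile for this, with illustrative durations and targets:

// node-scaling-test.js - sustained load so the HPA and cluster autoscaler both react
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // warm-up
    { duration: '3m', target: 300 },  // ramp past current node capacity
    { duration: '15m', target: 300 }, // hold while nodes are provisioned and pods schedule
    { duration: '5m', target: 50 },   // wind down and observe scale-in
  ],
};

export default function () {
  http.get('http://my-service.default.svc.cluster.local/api/endpoint');
  sleep(0.5);
}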
Recommendations
- Always test within the cluster to measure realistic internal latencies
- Test scaling boundaries before they’re hit in production
- Monitor resource metrics alongside application metrics
- Test failure scenarios - what happens when pods crash under load?
- Document baseline performance for each resource configuration
Kubernetes adds operational complexity but also provides powerful scaling capabilities. Performance testing validates that your configuration actually delivers the resilience and scalability you expect.