Performance · 17 August 2022 · 5 min read

Kubernetes Performance Testing: Pods, Services, and Scaling

Learn how to performance test applications running on Kubernetes, including pod scaling behaviour, service mesh latency, and resource limit testing.

Mark, Performance Testing Expert

Performance testing applications on Kubernetes introduces challenges that don’t exist in traditional deployments. Pod scaling, resource limits, network policies, and service mesh overhead all affect application performance in ways that require specific testing approaches.

Understanding Kubernetes Performance Factors

Before testing, understand what affects performance in a Kubernetes environment:

| Factor | Impact | Testing Consideration |
| --- | --- | --- |
| Pod resource limits | CPU throttling, OOM kills | Test at various load levels |
| Horizontal Pod Autoscaler | Scale-out latency | Measure time to scale |
| Service mesh (Istio, Linkerd) | Added latency per request | Compare with/without mesh |
| Network policies | Connection overhead | Test cross-namespace calls |
| Node placement | Network hops between pods | Test pod affinity scenarios |

Setting Up Test Infrastructure

For consistent results, deploy your load generator within the cluster:

# k6-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: k6-load-test
spec:
  template:
    spec:
      containers:
      - name: k6
        image: grafana/k6:latest
        command: ["k6", "run", "/scripts/test.js"]
        volumeMounts:
        - name: test-script
          mountPath: /scripts
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
      volumes:
      - name: test-script
        configMap:
          name: k6-test-script
      restartPolicy: Never

Store your test script in a ConfigMap:

kubectl create configmap k6-test-script --from-file=test.js
kubectl apply -f k6-job.yaml
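
Follow the run from the job's logs (the job name matches the manifest above):

kubectl logs -f job/k6-load-test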

Testing Pod Scaling Behaviour

The Horizontal Pod Autoscaler (HPA) doesn’t scale instantly. Measure the lag:

// test.js - Ramping load to trigger scaling
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 10 },   // Baseline
    { duration: '1m', target: 100 },  // Spike to trigger scaling
    { duration: '5m', target: 100 },  // Hold while scaling occurs
    { duration: '2m', target: 10 },   // Return to baseline
  ],
};

export default function () {
  http.get('http://my-service.default.svc.cluster.local/api/endpoint');
  sleep(0.5);
}

Monitor HPA activity during the test:

kubectl get hpa my-app-hpa -w

Typical observations:

  • Scale-up delay: 15-30 seconds after threshold breach
  • Pod startup time: 10-60 seconds depending on image size and readiness probes
  • Scale-down delay: 5 minutes by default (configurable; see the sketch below)
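
The scale-down window is tunable through the behavior field of the autoscaling/v2 API. A minimal sketch, assuming the my-app-hpa from above targets a Deployment named my-app (replica counts and thresholds are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60  # default is 300 (5 minutes)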

Resource Limit Testing

Kubernetes enforces CPU and memory limits. Test what happens when limits are reached:

# Deploy with restrictive limits
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "200m"

During load testing, monitor for:

# CPU pressure - throttled pods tend to sit pinned at their CPU limit
kubectl top pods -l app=my-app

# OOM kills - look for "Reason: OOMKilled" in the last container state
kubectl describe pods -l app=my-app | grep -B 2 OOMKilled

# Pod restarts
kubectl get pods -l app=my-app -o wide

Signs of resource starvation:

  • Response times increase dramatically under moderate load
  • Pods restart during tests
  • CPU usage hits the limit but throughput plateaus (the query below confirms throttling)
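
kubectl top shows current usage, not throttling itself. If Prometheus scrapes cAdvisor metrics (it does with the kube-prometheus-stack installed later in this post), this query gives the fraction of CPU periods in which each pod was throttled; the namespace label is illustrative:

# Share of CPU periods throttled per pod - sustained values near 1 mean hard throttling
sum(rate(container_cpu_cfs_throttled_periods_total{namespace="production"}[5m])) by (pod)
  / sum(rate(container_cpu_cfs_periods_total{namespace="production"}[5m])) by (pod)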

Service Mesh Latency

If you use Istio, Linkerd, or a similar service mesh, measure the overhead it adds:

Without mesh (direct pod-to-pod):

# Bare pod names are not resolvable via cluster DNS; substitute the server
# pod's IP (from `kubectl get pods -o wide`)
kubectl exec -it client-pod -- curl -w "@curl-format.txt" http://$SERVER_POD_IP:8080/

With mesh (through sidecar proxies):

# Same request routes through Envoy/Linkerd proxies
kubectl exec -it client-pod -- curl -w "@curl-format.txt" http://server-service:8080/
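
Both commands reference a curl write-out template. A typical curl-format.txt looks like this (every variable is a standard curl --write-out field):

# curl-format.txt
    time_namelookup:  %{time_namelookup}s\n
       time_connect:  %{time_connect}s\n
 time_starttransfer:  %{time_starttransfer}s\n
         time_total:  %{time_total}s\n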

Typical service mesh overhead:

| Mesh | Added Latency (p99) |
| --- | --- |
| Istio | 3-10ms |
| Linkerd | 1-3ms |
| No mesh | Baseline |

For latency-sensitive applications, this overhead matters at high request volumes.

Cross-Namespace Performance

Test performance when services communicate across namespaces:

// cross-namespace.js - compare same-namespace and cross-namespace calls
import http from 'k6/http';

const internalService = 'http://api-service.production.svc.cluster.local';
const crossNamespace = 'http://auth-service.security.svc.cluster.local';

export default function () {
  // Same namespace call
  const local = http.get(`${internalService}/health`);

  // Cross-namespace call
  const remote = http.get(`${crossNamespace}/validate`);

  // Compare timings
  console.log(`Local: ${local.timings.duration}ms, Remote: ${remote.timings.duration}ms`);
}

Network policies can add latency if complex rule evaluation is required.
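If the security namespace enforces ingress policies, the cross-namespace call above needs an explicit allow rule. A minimal sketch (names and labels mirror the script and are illustrative; the kubernetes.io/metadata.name label is set automatically on namespaces since Kubernetes 1.21):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-production
  namespace: security
spec:
  podSelector:
    matchLabels:
      app: auth-service
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: production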

Ingress Controller Testing

Test the ingress controller’s capacity separately from your application:

# nginx-ingress specific metrics
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
data:
  enable-vts-status: "true"  # Enable metrics

Monitor ingress metrics during load:

kubectl exec -it nginx-ingress-controller-xxx -- curl localhost:18080/nginx_status

Key metrics:

  • Active connections
  • Requests per second
  • Connection queue depth
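
If the controller's Prometheus endpoint is scraped instead, the same picture is available as time series. This assumes the ingress-nginx controller, which exposes the nginx_ingress_controller_requests counter:

# Requests per second through the controller, broken down by ingress resource
sum(rate(nginx_ingress_controller_requests[1m])) by (ingress)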

Persistent Volume Performance

For stateful applications, test storage performance:

apiVersion: v1
kind: Pod
metadata:
  name: storage-benchmark
spec:
  containers:
  - name: fio
    image: ljishen/fio
    command: ["fio", "--name=randwrite", "--ioengine=libaio", "--iodepth=16",
              "--rw=randwrite", "--bs=4k", "--size=1G", "--numjobs=4",
              "--time_based", "--runtime=60", "--filename=/data/test"]
    volumeMounts:
    - name: test-volume
      mountPath: /data
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: test-pvc
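
Run the benchmark and read fio's output from the pod logs (the manifest filename is illustrative; the test-pvc claim must already exist and be bound):

kubectl apply -f storage-benchmark.yaml
kubectl logs -f storage-benchmark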

Compare different storage classes:

| Storage Class | IOPS | Latency | Use Case |
| --- | --- | --- | --- |
| gp2 (AWS) | 3000 burst | 1-10ms | General purpose |
| io1 (AWS) | Provisioned | <1ms | Databases |
| standard (GKE) | Variable | Variable | Development |
| ssd (GKE) | Higher | Lower | Production |

Monitoring During Tests

Deploy Prometheus and Grafana for real-time visibility:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
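
Once the stack is running, port-forward Grafana to browse the bundled dashboards (the service name below follows from the release name prometheus; adjust if yours differs):

kubectl port-forward svc/prometheus-grafana 3000:80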

Key dashboards:

  • Container resource usage (CPU, memory per pod)
  • Network I/O per pod
  • Request latency percentiles
  • Error rates by service

Example PromQL queries for load testing:

# Request rate
sum(rate(http_requests_total{namespace="production"}[1m])) by (service)

# P99 latency
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[1m])) by (le, service))

# Error rate
sum(rate(http_requests_total{status=~"5.."}[1m])) / sum(rate(http_requests_total[1m]))

Cluster Autoscaler Interaction

If you use the cluster autoscaler, test node scaling as well:

  1. Deploy pods that exceed current node capacity (example command below)
  2. Measure time for new nodes to join
  3. Verify pods schedule correctly on new nodes
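
To create that capacity pressure, scale a deployment well past what the current nodes can hold (the deployment name is illustrative):

kubectl scale deployment my-app --replicas=50
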
# Watch node scaling
kubectl get nodes -w

# Check pending pods
kubectl get pods --field-selector=status.phase=Pending

Node scaling typically takes 2-10 minutes depending on cloud provider and instance type.

Recommendations

  1. Always test within the cluster to measure realistic internal latencies
  2. Test scaling boundaries before they’re hit in production
  3. Monitor resource metrics alongside application metrics
  4. Test failure scenarios - what happens when pods crash under load?
  5. Document baseline performance for each resource configuration

Kubernetes adds operational complexity but also provides powerful scaling capabilities. Performance testing validates that your configuration actually delivers the resilience and scalability you expect.

Tags:

#kubernetes #containers #performance-testing #docker
