Kubernetes Performance Testing: Pods, Services, and Scaling
Learn how to performance test applications running on Kubernetes, including pod scaling behaviour, service mesh latency, and resource limit testing.
Mark
Performance Testing Expert
Performance testing applications on Kubernetes introduces challenges that don’t exist in traditional deployments. Pod scaling, resource limits, network policies, and service mesh overhead all affect application performance in ways that require specific testing approaches.
Understanding Kubernetes Performance Factors
Before testing, understand what affects performance in a Kubernetes environment:
| Factor | Impact | Testing Consideration |
|---|---|---|
| Pod resource limits | CPU throttling, OOM kills | Test at various load levels |
| Horizontal Pod Autoscaler | Scale-out latency | Measure time to scale |
| Service mesh (Istio, Linkerd) | Added latency per request | Compare with/without mesh |
| Network policies | Connection overhead | Test cross-namespace calls |
| Node placement | Network hops between pods | Test pod affinity scenarios |
Setting Up Test Infrastructure
For consistent results, deploy your load generator within the cluster:
# k6-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: k6-load-test
spec:
  template:
    spec:
      containers:
        - name: k6
          image: grafana/k6:latest
          command: ["k6", "run", "/scripts/test.js"]
          volumeMounts:
            - name: test-script
              mountPath: /scripts
          resources:
            requests:
              memory: "256Mi"
              cpu: "500m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
      volumes:
        - name: test-script
          configMap:
            name: k6-test-script
      restartPolicy: Never
Store your test script in a ConfigMap:
kubectl create configmap k6-test-script --from-file=test.js
kubectl apply -f k6-job.yaml
Testing Pod Scaling Behaviour
The Horizontal Pod Autoscaler (HPA) doesn’t scale instantly. Measure the lag:
// test.js - Ramping load to trigger scaling
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 10 },   // Baseline
    { duration: '1m', target: 100 },  // Spike to trigger scaling
    { duration: '5m', target: 100 },  // Hold while scaling occurs
    { duration: '2m', target: 10 },   // Return to baseline
  ],
};

export default function () {
  http.get('http://my-service.default.svc.cluster.local/api/endpoint');
  sleep(0.5);
}
Monitor HPA activity during the test:
kubectl get hpa my-app-hpa -w
Typical observations:
- Scale-up delay: 15-30 seconds after threshold breach
- Pod startup time: 10-60 seconds depending on image size and readiness probes
- Scale-down delay: 5 minutes by default, configurable through the HPA's behavior.scaleDown.stabilizationWindowSeconds setting
Resource Limit Testing
Kubernetes enforces CPU and memory limits. Test what happens when limits are reached:
# Deploy with restrictive limits
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "200m"
During load testing, monitor for:
# CPU and memory usage per pod (usage pinned at the CPU limit suggests throttling)
kubectl top pods -l app=my-app
# OOM kills (reported as OOMKilled in the pod's last container state)
kubectl describe pods -l app=my-app | grep OOMKilled
# Pod restarts
kubectl get pods -l app=my-app -o wide
Signs of resource starvation:
- Response times increase dramatically under moderate load
- Pods restart during tests
- CPU usage hits the limit while throughput plateaus
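Rather than spotting these symptoms by eye, you can encode them as k6 thresholds so the run fails automatically when latency or error rates degrade. A minimal sketch, reusing the same service endpoint as earlier; the threshold values are illustrative and should come from your own measured baselines:

// thresholds-test.js - fail the run when starvation symptoms appear
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 50,
  duration: '5m',
  thresholds: {
    // Illustrative limits - replace with your measured baseline
    http_req_duration: ['p(95)<500', 'p(99)<1000'], // milliseconds
    http_req_failed: ['rate<0.01'],                 // under 1% failed requests
  },
};

export default function () {
  http.get('http://my-service.default.svc.cluster.local/api/endpoint');
  sleep(0.5);
}

k6 exits with a non-zero status when a threshold is breached, so the Job defined above is reported as failed rather than successful.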
Service Mesh Latency
If you are using Istio, Linkerd, or a similar service mesh, measure the overhead it adds:
Without mesh (direct pod-to-pod):
kubectl exec -it client-pod -- curl -w "@curl-format.txt" http://server-pod:8080/
With mesh (through sidecar proxies):
# Same request routes through Envoy/Linkerd proxies
kubectl exec -it client-pod -- curl -w "@curl-format.txt" http://server-service:8080/
Typical service mesh overhead:
| Mesh | Added Latency (p99) |
|---|---|
| Istio | 3-10ms |
| Linkerd | 1-3ms |
| No mesh | Baseline |
For latency-sensitive applications, this overhead matters at high request volumes.
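One way to quantify the overhead in your own cluster is to run an identical k6 workload against a meshed and an unmeshed deployment of the same application and record each as a custom trend. A rough sketch, assuming two hypothetical Services, app-direct (no sidecar) and app-meshed (sidecar injected):

// mesh-comparison.js - same workload with and without the sidecar proxy in the path
import http from 'k6/http';
import { Trend } from 'k6/metrics';
import { sleep } from 'k6';

// Hypothetical Services exposing the same application
const DIRECT = 'http://app-direct.default.svc.cluster.local/api/endpoint';
const MESHED = 'http://app-meshed.default.svc.cluster.local/api/endpoint';

const directLatency = new Trend('direct_latency', true);
const meshedLatency = new Trend('meshed_latency', true);

export const options = { vus: 20, duration: '3m' };

export default function () {
  directLatency.add(http.get(DIRECT).timings.duration);
  meshedLatency.add(http.get(MESHED).timings.duration);
  sleep(0.5);
}

The end-of-test summary then shows percentiles for both trends side by side, which is easier to compare than two separate runs.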
Cross-Namespace Performance
Test performance when services communicate across namespaces:
// Test internal service calls
import http from 'k6/http';

const internalService = 'http://api-service.production.svc.cluster.local';
const crossNamespace = 'http://auth-service.security.svc.cluster.local';

export default function () {
  // Same-namespace call
  const local = http.get(`${internalService}/health`);
  // Cross-namespace call
  const remote = http.get(`${crossNamespace}/validate`);
  // Compare timings
  console.log(`Local: ${local.timings.duration}ms, Remote: ${remote.timings.duration}ms`);
}
Network policies can add latency if complex rule evaluation is required.
Ingress Controller Testing
Test the ingress controller’s capacity separately from your application:
# nginx-ingress specific metrics
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
data:
  enable-vts-status: "true" # Enable metrics
Monitor ingress metrics during load:
kubectl exec -it nginx-ingress-controller-xxx -- curl localhost:18080/nginx_status
Key metrics:
- Active connections
- Requests per second
- Connection queue depth
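To load the edge path itself rather than the internal Service, point k6 at the hostname the ingress routes. A minimal sketch; the hostname and error budget below are placeholders:

// ingress-test.js - exercise ingress controller -> Service -> pods end to end
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 100,
  duration: '10m',
  thresholds: {
    http_req_failed: ['rate<0.01'], // placeholder error budget
  },
};

export default function () {
  // Placeholder hostname served by the ingress controller
  http.get('https://app.example.com/api/endpoint');
  sleep(0.5);
}

Comparing these results against the in-cluster runs isolates how much latency and error rate the ingress layer itself contributes.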
Persistent Volume Performance
For stateful applications, test storage performance:
apiVersion: v1
kind: Pod
metadata:
  name: storage-benchmark
spec:
  restartPolicy: Never # run the benchmark once rather than restarting it
  containers:
    - name: fio
      image: ljishen/fio
      command: ["fio", "--name=randwrite", "--ioengine=libaio", "--iodepth=16",
                "--rw=randwrite", "--bs=4k", "--size=1G", "--numjobs=4",
                "--time_based", "--runtime=60", "--filename=/data/test"]
      volumeMounts:
        - name: test-volume
          mountPath: /data
  volumes:
    - name: test-volume
      persistentVolumeClaim:
        claimName: test-pvc
Compare different storage classes:
| Storage Class | IOPS | Latency | Use Case |
|---|---|---|---|
| gp2 (AWS) | 3000 burst | 1-10ms | General purpose |
| io1 (AWS) | Provisioned | <1ms | Databases |
| standard (GKE) | Variable | Variable | Development |
| ssd (GKE) | Higher | Lower | Production |
Monitoring During Tests
Deploy Prometheus and Grafana for real-time visibility:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
Key dashboards:
- Container resource usage (CPU, memory per pod)
- Network I/O per pod
- Request latency percentiles
- Error rates by service
Example PromQL queries for load testing:
# Request rate
sum(rate(http_requests_total{namespace="production"}[1m])) by (service)
# P99 latency
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[1m])) by (le, service))
# Error rate
sum(rate(http_requests_total{status=~"5.."}[1m])) / sum(rate(http_requests_total[1m]))
Cluster Autoscaler Interaction
If using cluster autoscaler, test node scaling:
- Deploy pods that exceed current node capacity
- Measure time for new nodes to join
- Verify pods schedule correctly on new nodes
# Watch node scaling
kubectl get nodes -w
# Check pending pods
kubectl get pods --field-selector=status.phase=Pending
Node scaling typically takes 2-10 minutes depending on cloud provider and instance type.
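Because node provisioning is far slower than pod startup, give the load profile a hold phase long enough for nodes to join and pending pods to schedule. A rough k6 profile for this, with illustrative durations and targets:

// node-scaling-test.js - sustained load so the HPA and cluster autoscaler both react
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // warm-up
    { duration: '3m', target: 300 },  // ramp past current node capacity
    { duration: '15m', target: 300 }, // hold while nodes are provisioned and pods schedule
    { duration: '5m', target: 50 },   // wind down and observe scale-in
  ],
};

export default function () {
  http.get('http://my-service.default.svc.cluster.local/api/endpoint');
  sleep(0.5);
}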
Recommendations
- Always test within the cluster to measure realistic internal latencies
- Test scaling boundaries before they’re hit in production
- Monitor resource metrics alongside application metrics
- Test failure scenarios - what happens when pods crash under load?
- Document baseline performance for each resource configuration
Kubernetes adds operational complexity but also provides powerful scaling capabilities. Performance testing validates that your configuration actually delivers the resilience and scalability you expect.