Building a Performance Testing Strategy: From Requirements to Metrics
A comprehensive guide to developing a performance testing strategy, from gathering requirements to defining success criteria and reporting results.
Mark
Performance Testing Expert
Performance testing without a strategy is just running tools. A proper strategy aligns testing activities with business objectives, defines clear success criteria, and provides actionable results. Here’s how to build one.
Starting with Business Requirements
Performance requirements should trace back to business needs, not arbitrary technical targets. Ask stakeholders:
- What user experience are we promising?
- What happens if the system is slow? Lost revenue? Reputation damage?
- What growth do we expect over the next 12 months?
- Are there regulatory or contractual SLAs?
Convert business requirements to technical requirements:
| Business Requirement | Technical Requirement |
|---|---|
| "Pages should feel instant" | Page load < 2 seconds at p95 |
| "Support 10,000 concurrent users" | System handles 10,000 active sessions |
| "Handle Black Friday traffic" | 5x normal throughput for 6 hours |
| "99.9% uptime SLA" | Error rate < 0.1% under load |
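One way to keep these translations actionable is to capture them as data that later test gates can read. A minimal sketch, with illustrative field names:

```python
# Requirements-as-data: illustrative field names, values taken from the table above.
TECHNICAL_REQUIREMENTS = {
    "page_load_p95_seconds": 2.0,      # "Pages should feel instant"
    "concurrent_sessions": 10_000,     # "Support 10,000 concurrent users"
    "peak_throughput_multiplier": 5,   # "Handle Black Friday traffic" (6 hours)
    "max_error_rate": 0.001,           # "99.9% uptime SLA"
}
```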
Defining Workload Models
A workload model describes how users interact with your system. Base it on production data when possible:
User Distribution
Typical E-commerce Workload:
- 60% Browse products (read-heavy)
- 25% Search (compute-intensive)
- 10% Add to cart (write operations)
- 4% Checkout (complex transactions)
- 1% Account management
Think Time and Session Length
| User Type | Avg Session | Think Time | Actions/Session |
|---|---|---|---|
| Browser | 5 min | 15-30 sec | 8-12 |
| Buyer | 12 min | 10-20 sec | 20-30 |
| Power User | 30 min | 5-10 sec | 100+ |
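These distributions and think times translate directly into a scripted workload. Below is a minimal sketch using locust; the endpoint paths are illustrative assumptions, and any load tool that supports weighted scenarios works the same way.

```python
# Workload model sketch: task weights mirror the e-commerce distribution above,
# wait_time approximates the "Browser" think-time range. Paths are illustrative.
from locust import HttpUser, task, between


class ShopperUser(HttpUser):
    wait_time = between(15, 30)  # think time in seconds between actions

    @task(60)  # 60% browse products (read-heavy)
    def browse_products(self):
        self.client.get("/products")

    @task(25)  # 25% search (compute-intensive)
    def search(self):
        self.client.get("/search", params={"q": "widget"})

    @task(10)  # 10% add to cart (write operations)
    def add_to_cart(self):
        self.client.post("/cart", json={"product_id": 42, "qty": 1})

    @task(4)   # 4% checkout (complex transactions)
    def checkout(self):
        self.client.post("/checkout")

    @task(1)   # 1% account management
    def account(self):
        self.client.get("/account")
```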
Peak vs Normal Load
Define multiple load profiles:
Normal: 500 concurrent users, 50 requests/sec
Peak: 2,000 concurrent users, 200 requests/sec
Stress: 5,000 concurrent users, 500 requests/sec
Break: Increase until failure
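If your load tool supports staged profiles, these can be encoded directly. A minimal sketch using locust's LoadTestShape, assuming locust is the tool in use; the stage numbers mirror the profiles above and the durations are illustrative:

```python
# Staged load profile sketch: ramp through normal -> peak -> stress, then stop.
from locust import LoadTestShape


class StagedLoadShape(LoadTestShape):
    # (end_time_seconds, target_users, spawn_rate) -- illustrative durations
    stages = [
        (600, 500, 10),     # normal: 500 concurrent users for the first 10 min
        (1200, 2000, 50),   # peak: 2,000 concurrent users for the next 10 min
        (1800, 5000, 100),  # stress: 5,000 concurrent users for the final 10 min
    ]

    def tick(self):
        run_time = self.get_run_time()
        for end_time, users, spawn_rate in self.stages:
            if run_time < end_time:
                return users, spawn_rate
        return None  # stop the test after the last stage
```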
Test Types and When to Use Them
| Test Type | Purpose | Duration | Load Level |
|---|---|---|---|
| Smoke | Verify basic functionality | 5-10 min | Minimal |
| Load | Validate requirements | 30-60 min | Expected peak |
| Stress | Find breaking points | 15-30 min | Beyond peak |
| Soak/Endurance | Memory leaks, degradation | 4-24 hours | Normal load |
| Spike | Sudden traffic bursts | 15-30 min | Rapid changes |
| Scalability | Capacity planning | Variable | Incremental |
Schedule these appropriately:
- Smoke tests: Every deployment
- Load tests: Weekly or before releases
- Stress tests: Monthly or quarterly
- Soak tests: Before major releases
- Spike tests: Before events (sales, launches)
Success Criteria
Define unambiguous pass/fail criteria before testing:
Response Time Thresholds
API Endpoints:
- p50 < 100ms
- p95 < 500ms
- p99 < 1000ms
Page Loads:
- Time to First Byte < 200ms
- First Contentful Paint < 1.5s
- Largest Contentful Paint < 2.5s
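A sketch of checking these thresholds programmatically against collected latencies; the sample data is illustrative, and in practice the latencies would come from your load tool's results export:

```python
# Threshold check sketch: nearest-rank percentiles against the API targets above.
def percentile(samples, p):
    """Approximate percentile of a list of latencies (milliseconds)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1)))
    return ordered[index]


def check_thresholds(latencies_ms):
    thresholds = {50: 100, 95: 500, 99: 1000}  # p50/p95/p99 targets in ms
    failures = []
    for p, limit in thresholds.items():
        actual = percentile(latencies_ms, p)
        if actual >= limit:
            failures.append(f"p{p} = {actual:.0f}ms exceeds {limit}ms target")
    return failures


if __name__ == "__main__":
    sample = [80, 95, 120, 300, 450, 900, 110, 70, 60, 240]  # illustrative data
    print(check_thresholds(sample) or "All response-time thresholds met")
```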
Throughput Requirements
Minimum sustainable throughput:
- API: 1,000 requests/second
- Database: 5,000 queries/second
- Message queue: 10,000 messages/second
Error Budget
Acceptable error rate: < 0.1%
- 4xx errors: < 0.05% (client errors shouldn't increase under load)
- 5xx errors: < 0.05% (server errors indicate capacity issues)
- Timeouts: < 0.01%
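The same idea applies to the error budget. A minimal sketch, assuming your load tool exports per-request status codes and a timeout flag:

```python
# Error budget sketch: rates per category compared to the budgets above.
def check_error_budget(responses):
    total = len(responses)
    client_errors = sum(1 for r in responses if 400 <= r["status"] < 500)
    server_errors = sum(1 for r in responses if r["status"] >= 500)
    timeouts = sum(1 for r in responses if r.get("timed_out"))

    budgets = {
        "4xx": (client_errors / total, 0.0005),   # < 0.05%
        "5xx": (server_errors / total, 0.0005),   # < 0.05%
        "timeouts": (timeouts / total, 0.0001),   # < 0.01%
    }
    return {name: rate <= limit for name, (rate, limit) in budgets.items()}
```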
Resource Utilisation
Under peak load:
- CPU: < 70% average, < 90% peak
- Memory: < 80% with headroom for spikes
- Disk I/O: < 80% of provisioned capacity
- Network: < 70% of bandwidth limit
Environment Strategy
Test Environment Parity
| Aspect | Production | Performance Test |
|---|---|---|
| Instance types | c5.4xlarge | c5.4xlarge (same) |
| Instance count | 10 | 5 (scaled ratio) |
| Database | r5.2xlarge | r5.2xlarge (same) |
| Data volume | 500GB | 100GB (subset) |
| CDN | Yes | Yes or bypassed |
Perfect parity is often impractical. Document differences and adjust expectations accordingly.
Data Preparation
Performance test data should:
- Match production volume ratios
- Include edge cases and variety
- Be anonymised if from production
- Support the required concurrent user count
Example data sizing:
- 100,000 user accounts (10x concurrent target)
- 1,000,000 products (realistic catalog)
- 10,000,000 orders (historical data for queries)
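A minimal sketch of generating synthetic (non-production) accounts at that scale, assuming the Faker library and illustrative field names; in practice you would stream the records into the performance environment's database with bulk inserts:

```python
# Test data seeding sketch: synthetic user accounts, no production data involved.
from faker import Faker

fake = Faker()


def generate_users(count=100_000):
    for i in range(count):
        yield {
            "id": i,
            "name": fake.name(),
            "email": f"user{i}@example.test",  # deterministic, avoids collisions
            "address": fake.address(),
        }


# Small batch for illustration; scale count up to the full 100,000.
for user in generate_users(count=10):
    print(user["email"])
```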
Monitoring and Observability
What to Measure
Application metrics:
- Response time percentiles (p50, p95, p99)
- Throughput (requests/second)
- Error rates by type
- Active users/sessions
Infrastructure metrics:
- CPU, memory, disk I/O
- Network throughput and latency
- Database connections and query times
- Cache hit rates
Business metrics:
- Conversion rate under load
- Cart abandonment timing
- Search result latency
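If a full monitoring stack is not yet wired into the test environment, a lightweight sampler can capture the infrastructure side of a run. A minimal sketch assuming the psutil library; an existing Prometheus or CloudWatch setup is the better option when available:

```python
# Infrastructure sampling sketch: CPU and memory captured at a fixed interval.
import psutil


def sample_metrics(duration_s=60, interval_s=5):
    samples = []
    for _ in range(duration_s // interval_s):
        samples.append({
            "cpu_percent": psutil.cpu_percent(interval=interval_s),  # blocks interval_s
            "memory_percent": psutil.virtual_memory().percent,
        })
    return samples


if __name__ == "__main__":
    for sample in sample_metrics(duration_s=30, interval_s=5):
        print(sample)
```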
Baseline Establishment
Before testing changes, establish baselines:
Baseline Test (v1.5.2, 2024-04-01):
- Peak throughput: 850 req/s
- p95 response time: 245ms
- Error rate: 0.02%
- CPU at peak: 65%
Compare all future tests against this baseline.
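A minimal sketch of automating that comparison, using the baseline figures above and the 10% regression threshold referenced in the CI/CD section later:

```python
# Baseline comparison sketch: flag regressions beyond a 10% tolerance.
BASELINE = {
    "peak_throughput_rps": 850,
    "p95_ms": 245,
    "error_rate": 0.0002,  # 0.02%
}


def regressions(current, baseline=BASELINE, tolerance=0.10):
    issues = []
    if current["peak_throughput_rps"] < baseline["peak_throughput_rps"] * (1 - tolerance):
        issues.append("peak throughput dropped more than 10%")
    if current["p95_ms"] > baseline["p95_ms"] * (1 + tolerance):
        issues.append("p95 response time regressed more than 10%")
    if current["error_rate"] > baseline["error_rate"] * (1 + tolerance):
        issues.append("error rate regressed more than 10%")
    return issues
```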
Reporting and Communication
For Technical Teams
Detailed metrics and analysis:
- Full percentile distribution
- Resource utilisation timelines
- Specific bottleneck identification
- Reproduction steps for issues
For Stakeholders
Business-focused summary:
- Did we meet the requirements? (Yes/No)
- What’s the capacity headroom?
- What are the risks?
- What’s the recommendation?
Report Template
## Performance Test Summary
**Test Date:** 2024-04-08
**Version:** 2.1.0
**Environment:** Staging (scaled 1:2)
### Results vs Requirements
| Requirement | Target | Actual | Status |
|-------------|--------|--------|--------|
| Response time p95 | < 500ms | 312ms | PASS |
| Throughput | 1000 rps | 1,247 rps | PASS |
| Error rate | < 0.1% | 0.03% | PASS |
| CPU utilisation | < 70% | 68% | PASS |
### Key Findings
1. Database connection pool saturates at 1,100 rps
2. Memory usage stable over 4-hour soak test
3. No degradation during simulated failover
### Recommendations
1. Increase connection pool size before go-live
2. Add monitoring alert for connection pool usage
3. Approved for production deployment
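To keep reports consistent between runs, the results table can be generated from structured data rather than filled in by hand. A minimal sketch with illustrative field names:

```python
# Report rendering sketch: build the "Results vs Requirements" markdown table.
def render_results_table(rows):
    lines = [
        "| Requirement | Target | Actual | Status |",
        "|-------------|--------|--------|--------|",
    ]
    for name, target, actual, passed in rows:
        lines.append(f"| {name} | {target} | {actual} | {'PASS' if passed else 'FAIL'} |")
    return "\n".join(lines)


print(render_results_table([
    ("Response time p95", "< 500ms", "312ms", True),
    ("Throughput", "1000 rps", "1,247 rps", True),
]))
```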
Continuous Performance Testing
Integrate performance testing into CI/CD:
# On every commit
smoke_test:
- 10 users, 2 minutes
- Basic functionality check
- Fail build if p95 > 1s
# On merge to main
load_test:
- 100 users, 10 minutes
- Compare against baseline
- Alert if regression > 10%
# Weekly scheduled
full_load_test:
- Production-like load
- 1 hour duration
- Full report generation
Common Pitfalls
Testing in isolation: Performance depends on real infrastructure, network, and data. Synthetic environments give synthetic results.
Testing too late: Finding performance issues in production is expensive. Test early and often.
Ignoring variability: Single test runs aren’t statistically significant. Run multiple iterations and report ranges.
Optimising prematurely: Measure first. Don’t guess where bottlenecks are.
Focusing only on averages: Averages hide problems. Always look at percentiles and tail latency.
A performance testing strategy isn’t a document that sits on a shelf. It’s a living framework that evolves with your application and guides decision-making throughout the development lifecycle. Start with clear requirements, measure consistently, and communicate results in terms stakeholders understand.