Deep dive
The safest cost wins come from removing waste and paying the right price for stable usage, not from cutting redundancy.
Practical levers
- **Right-size + autoscale**: use observed metrics (CPU, memory, p95 latency) rather than guesses to pick instance sizes and set scaling thresholds.
- **Commit for steady state**: cover the predictable baseline with reserved instances or savings plans; let autoscaling absorb spikes on on-demand capacity.
- **Cache aggressively**: CDN for static assets, application caching for expensive reads.
- **Storage hygiene**: lifecycle policies (move cold data to cheaper tiers), delete orphaned volumes/snapshots.
- **Reduce egress**: keep traffic within region, use compression, avoid cross-region chatter.
- **Observability costs**: reduce log verbosity/retention; sample traces.
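
To make the "commit for steady state" lever concrete, here is a minimal sketch of how a commitment level could be chosen from historical usage. It assumes a simplified pricing model: committed instances are billed every hour whether used or not, and anything above the commitment is billed at the on-demand rate. The function names (`blended_cost`, `best_commitment`) and the rates are hypothetical and only for illustration; real reserved-instance and savings-plan pricing has more terms (upfront payments, term length, instance families).

```python
from typing import Sequence


def blended_cost(hourly_usage: Sequence[int], committed: int,
                 on_demand_rate: float, committed_rate: float) -> float:
    """Total cost when `committed` instances are paid for every hour
    and any usage above that level is billed on demand."""
    total = 0.0
    for used in hourly_usage:
        overflow = max(0, used - committed)  # instances beyond the commitment
        total += committed * committed_rate + overflow * on_demand_rate
    return total


def best_commitment(hourly_usage: Sequence[int],
                    on_demand_rate: float, committed_rate: float) -> int:
    """Try every commitment level from 0 up to peak usage and return
    the cheapest one (the lowest level wins on ties)."""
    if not hourly_usage:
        raise ValueError("hourly_usage must not be empty")
    candidates = range(0, max(hourly_usage) + 1)
    return min(
        candidates,
        key=lambda c: blended_cost(hourly_usage, c, on_demand_rate, committed_rate),
    )
```

With usage that sits at a steady level and spikes occasionally, the cheapest commitment under this model tends to land on the baseline rather than the peak, which is the point of the lever: commit to what is always running, and leave the spikes to autoscaling.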
Common pitfalls
- Turning off multi-AZ/replication to save money: the saving is paid for with reliability, and one outage can cost more than the redundancy did.
- Over-logging and paying more for logs than compute.
- Paying for idle resources (unused load balancers, unattached IPs and disks, dev environments left running 24/7).
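
The idle-resource pitfall is easy to quantify once there is an inventory to look at. The sketch below assumes a hypothetical `Resource` record with an `attached` flag and a monthly cost; in practice these fields would come from a cloud provider's inventory or billing export, which is not modelled here. It also shows the arithmetic behind running dev environments on a schedule instead of around the clock.

```python
from dataclasses import dataclass
from typing import Iterable, List

HOURS_PER_WEEK = 24 * 7  # 168


@dataclass
class Resource:
    name: str
    kind: str            # e.g. "disk", "ip", "load_balancer" (illustrative labels)
    attached: bool       # True if something is actually using it
    monthly_cost: float  # in whatever currency the bill uses


def find_idle(resources: Iterable[Resource]) -> List[Resource]:
    """Return the resources nothing is attached to."""
    return [r for r in resources if not r.attached]


def idle_spend(resources: Iterable[Resource]) -> float:
    """Monthly cost of everything flagged as idle."""
    return sum(r.monthly_cost for r in find_idle(resources))


def schedule_saving(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of a 24/7 bill saved by running only on a schedule."""
    return 1 - (hours_per_day * days_per_week) / HOURS_PER_WEEK
```

For example, `schedule_saving(12, 5)` gives the fraction saved by running a dev environment 12 hours a day on weekdays only: 60 of 168 weekly hours, so roughly 64% of the always-on cost is avoided.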