Alert on user-visible symptoms tied to SLOs, use multi-window burn-rate alerts, and ensure every alert has an owner and runbook. Deduplicate and route alerts to the right team.
Advanced answer
Deep dive
Actionable alerts are about impact and clarity:
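Alert on SLO burn-rate (how fast the error budget is being consumed), not on every individual error.
Use multi-window alerts (a fast window plus a slow window) to catch both sudden spikes and sustained, slower burns.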
Define severity levels with clear response expectations.
Add context: links to dashboards, traces, and recent deploys.
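The points above can be sketched as a small alert record plus a dedupe-and-route step. This is a minimal illustration in Python, not a real alerting API: the Alert fields, the severity names, the team names, and the example.com URLs are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical severity levels; a real setup would attach a response
# expectation (for example, an acknowledgement deadline) to each one.
SEVERITIES = {"page", "ticket"}


@dataclass
class Alert:
    name: str
    service: str
    severity: str      # one of SEVERITIES
    owner: str         # team that responds to the alert
    runbook_url: str   # where the responder starts
    # Extra context: dashboards, traces, recent deploys.
    context: Dict[str, str] = field(default_factory=dict)


def validate(alert: Alert) -> List[str]:
    """Return a list of problems that make the alert non-actionable."""
    problems = []
    if not alert.owner:
        problems.append("missing owner")
    if not alert.runbook_url:
        problems.append("missing runbook")
    if alert.severity not in SEVERITIES:
        problems.append("unknown severity")
    return problems


def dedupe_and_route(alerts: List[Alert]) -> Dict[str, List[Alert]]:
    """Drop repeats of the same (service, name) pair, then group by owner."""
    seen = set()
    routed: Dict[str, List[Alert]] = {}
    for alert in alerts:
        key = (alert.service, alert.name)
        if key in seen:
            continue  # same alert already queued for its owner
        seen.add(key)
        routed.setdefault(alert.owner, []).append(alert)
    return routed
```

Running validate when an alert is defined, rather than when it fires, is one way to enforce the "every alert has an owner and runbook" rule before anyone is paged.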
Examples
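Burn-rate concept (1x means the error budget runs out exactly at the end of the SLO period):
If error budget burn-rate > 14x over a 5m window -> page
If error budget burn-rate > 2x over a 1h window -> ticket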
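To make the two rules concrete, here is a minimal Python sketch that computes burn rate as the observed error ratio divided by the error ratio the SLO allows, then applies the thresholds above. The 99.9% SLO target, the WindowStats type, and the function names are assumptions for illustration; in practice the request counts would come from a metrics system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Request counts observed over one lookback window."""
    total: int
    errors: int


def burn_rate(stats: WindowStats, slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows."""
    if stats.total == 0:
        return 0.0  # no traffic in the window, so no budget was burned
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return (stats.errors / stats.total) / error_budget


def decide(fast: WindowStats, slow: WindowStats, slo_target: float = 0.999) -> str:
    """fast = last 5 minutes, slow = last hour (the windows from the rules above)."""
    if burn_rate(fast, slo_target) > 14:
        return "page"    # budget is burning fast enough to need a human now
    if burn_rate(slow, slo_target) > 2:
        return "ticket"  # sustained burn that can wait for working hours
    return "none"
```

For example, 200 errors out of 10,000 requests in the fast window is a 2% error ratio; against a 0.1% budget that is a burn rate of about 20x, so decide returns "page".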