Metrics are numeric measurements aggregated over time (CPU, latency); logs are discrete, timestamped event records; and traces follow a single request across services as a series of spans. Together they help you detect, diagnose, and understand incidents.
Advanced answer
Deep dive
Think of observability signals as answering different questions:
**Metrics**: “Is the system healthy?” Aggregated numbers over time (RPS, p95 latency, error rate). Great for dashboards, SLOs, and alerting.
**Logs**: “What happened?” Discrete event records with context (userId, orderId). Great for debugging and audits.
**Traces**: “Where did time go?” A distributed view of a single request across services, broken into spans (DB call, HTTP call, cache).
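To make the "aggregated numbers" idea concrete, here is a minimal sketch of how raw request records could be collapsed into two of the metrics mentioned above (p95 latency and error rate). The `Request` type, the sample window, and the choice of the nearest-rank percentile method are assumptions for illustration; real metric backends usually compute percentiles from histograms rather than from raw values.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    latency_ms: float
    status: int


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def error_rate(requests):
    """Fraction of requests that ended in a 5xx status."""
    errors = sum(1 for r in requests if r.status >= 500)
    return errors / len(requests)


# Made-up one-minute window: 19 fast successes and one slow failure.
window = [Request(latency_ms=10.0 * i, status=200) for i in range(1, 20)]
window.append(Request(latency_ms=900.0, status=503))

summary = {
    "requests": len(window),
    "p95_latency_ms": p95([r.latency_ms for r in window]),
    "error_rate": error_rate(window),
}
print(summary)
```

Note how the summary discards per-request detail: it can tell you that something is wrong, but not which request failed or why. That detail is what logs and traces keep.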
How they work together
A common workflow:
1) An alert fires from metrics (for example, an error-rate spike).
2) Pivot to traces to find the slow or failing hop.
3) Read the logs for that specific trace or request to see the exact errors and context.
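The workflow above can be sketched with in-memory data. Everything here is hypothetical: the `Span` and `LogRecord` types, the 5% alert threshold, the service names, and the messages are invented for illustration, and `self_ms` is assumed to be each span's own time excluding its children. A real setup would query a metrics store, a tracing backend, and a log index instead.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    self_ms: float       # time spent in this span, excluding child spans
    error: bool = False


@dataclass
class LogRecord:
    trace_id: str
    level: str
    message: str


def should_alert(error_count, request_count, threshold=0.05):
    """Step 1: metrics. Fire when the error rate exceeds the threshold."""
    return request_count > 0 and error_count / request_count > threshold


def worst_span(spans, trace_id):
    """Step 2: traces. Prefer a failing span, then the slowest one."""
    candidates = [s for s in spans if s.trace_id == trace_id]
    return max(candidates, key=lambda s: (s.error, s.self_ms))


def logs_for_trace(logs, trace_id):
    """Step 3: logs. Keep only records that carry this trace id."""
    return [record for record in logs if record.trace_id == trace_id]


spans = [
    Span("t-1", "api-gateway", 5.0),
    Span("t-1", "orders-service", 12.0),
    Span("t-1", "payments-db query", 840.0, error=True),
    Span("t-2", "api-gateway", 4.0),
]
logs = [
    LogRecord("t-1", "ERROR", "payments-db: connection timed out"),
    LogRecord("t-2", "INFO", "order created"),
    LogRecord("t-1", "INFO", "retrying payment"),
]

if should_alert(error_count=12, request_count=100):
    culprit = worst_span(spans, "t-1")
    related = logs_for_trace(logs, "t-1")
    print(culprit.name)
    for record in related:
        print(record.level, record.message)
```

Step 3 only works because each log record carries a trace id, which is the reason for the first tip below.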
Practical tips
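Include a correlation ID or trace ID in every log line so that logs can be joined to the trace they belong to.
Avoid high-cardinality labels in metrics (user IDs, request IDs, raw URLs). Each unique label combination creates its own time series, which raises storage cost and slows queries.
Sample traces in high-traffic systems so that tracing overhead and storage stay bounded.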
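Two of these tips can be sketched with the Python standard library alone. The first part attaches the current trace ID to every log line through a `logging.Filter`; the second makes a deterministic keep/drop sampling decision by hashing the trace ID. The logger name `checkout`, the log format, the sample trace ID, and the hash-bucket scheme are assumptions for illustration. This is not the API of any particular tracing library, which would normally handle both concerns for you.

```python
import contextvars
import hashlib
import io
import logging

# Holds the trace id of the request currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copies the current trace id onto every log record."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


def build_logger(stream):
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger


def keep_trace(trace_id, sample_rate):
    """Deterministic sampling: the same trace id always gets the same answer."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < int(sample_rate * 10_000)


stream = io.StringIO()
log = build_logger(stream)

trace_id_var.set("4bf92f3577b34da6")  # made-up trace id
log.info("payment authorized")
output = stream.getvalue()
print(output, end="")

# Roughly 10% of trace ids fall below the cutoff at sample_rate=0.1.
kept = sum(keep_trace(f"trace-{i}", 0.1) for i in range(10_000))
print(kept)
```

Hashing the trace ID, rather than drawing a random number in each service, means every service that sees the same ID makes the same decision, so a sampled trace is kept in full instead of in fragments.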
Common pitfalls
Logging too much at INFO level, which inflates ingestion and storage costs.