Observability · medium

How do you investigate a latency regression in production?

Tags
#latency #incident #tracing

Answer

Start with metrics to confirm scope (p95/p99, affected endpoints, regions), then use traces to locate slow spans and logs to pinpoint the exact errors or queries. Finally, correlate the regression with recent deploys and config changes.
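
For instance, confirming scope can be as simple as asking the metrics backend for tail latency per route. The sketch below assumes Prometheus and a conventional http_request_duration_seconds histogram with a route label; the URL, metric, and label names are placeholders for whatever your stack actually exposes.

```python
# Hedged sketch: pull p99 latency per route from Prometheus to confirm
# which endpoints regressed. Endpoint URL, metric name, and the "route"
# label are assumptions; adjust to your own setup.
import requests

PROM_URL = "http://prometheus:9090/api/v1/query"  # assumed address
QUERY = (
    "histogram_quantile(0.99, "
    "sum by (le, route) (rate(http_request_duration_seconds_bucket[5m])))"
)

resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=10)
resp.raise_for_status()

# Print p99 per route, worst first, to see where the regression lives.
results = resp.json()["data"]["result"]
for series in sorted(results, key=lambda s: float(s["value"][1]), reverse=True):
    route = series["metric"].get("route", "<unknown>")
    p99_seconds = float(series["value"][1])
    print(f"{route}: p99 = {p99_seconds * 1000:.0f} ms")
```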

Advanced answer

Deep dive

A structured workflow saves time:

  • Confirm impact: user-facing vs internal, % of traffic.
  • Slice by dimension: endpoint, region, tenant, version.
  • Trace bottlenecks: DB, cache, downstream, queue (see the span-grouping sketch after this list).
  • Correlate with deploys, feature flags, or traffic shifts.
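
To make the "trace bottlenecks" step concrete, a minimal sketch: given spans exported from a slow trace (the field names here are assumptions, not any particular backend's export format), total the time spent per service and operation to see where the trace actually goes slow.

```python
# Minimal sketch: group exported spans from one slow trace by
# service/operation and sum their durations to find the bottleneck.
# Field names (service, operation, duration_ms) are illustrative.
from collections import defaultdict

spans = [
    {"service": "api", "operation": "GET /checkout", "duration_ms": 940},
    {"service": "db", "operation": "SELECT orders", "duration_ms": 610},
    {"service": "cache", "operation": "GET cart", "duration_ms": 8},
    {"service": "payments", "operation": "POST /charge", "duration_ms": 290},
]

totals = defaultdict(float)
for span in spans:
    totals[(span["service"], span["operation"])] += span["duration_ms"]

# Largest contributors first: the DB query dominates this example trace.
for (service, operation), ms in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{service:10s} {operation:20s} {ms:6.0f} ms")
```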

Examples

Regression checklist:

1) p95/p99 up?
2) Which routes?
3) Which version?
4) Trace slow spans
5) Check DB: slow query log / locks / cache hit ratio (see the sketch below)
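
For checklist item 5, if the cache happens to be Redis, the hit ratio can be read straight from INFO stats. A hedged sketch with redis-py; the host, port, and the 90% threshold are illustrative assumptions, not fixed rules.

```python
# Hedged sketch: a falling cache hit ratio can explain a latency
# regression by pushing more reads onto the database.
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)  # assumed host/port
stats = r.info("stats")

hits = stats["keyspace_hits"]
misses = stats["keyspace_misses"]
total = hits + misses
hit_ratio = hits / total if total else 0.0

print(f"cache hit ratio: {hit_ratio:.1%} ({hits} hits / {misses} misses)")
if hit_ratio < 0.90:  # threshold is illustrative, not a universal rule
    print("low hit ratio -- expect more load (and latency) on the database")
```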

Common pitfalls

  • Looking only at average latency (hides tail issues; illustrated below).
  • Blaming the last deploy without evidence.
  • Ignoring downstream dependency status.
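
The first pitfall is easy to demonstrate with made-up numbers: a handful of pathologically slow requests barely moves the mean, while p99 is clearly broken.

```python
# Why averages hide tail issues: synthetic latencies where 5% of
# requests are pathologically slow.
from statistics import mean, quantiles

latencies_ms = [40] * 95 + [2500] * 5

cuts = quantiles(latencies_ms, n=100)          # 99 percentile cut points
print(f"mean = {mean(latencies_ms):.0f} ms")   # ~163 ms, looks survivable
print(f"p50  = {cuts[49]:.0f} ms")             # 40 ms
print(f"p99  = {cuts[98]:.0f} ms")             # 2500 ms, clearly broken
```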

Interview follow-ups

  • How do you decide between rollback vs fix-forward?
  • How do you test for latency regressions pre-prod? (see the sketch after this list)
  • What if traces are missing for the affected traffic?
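
For the pre-prod question, one common pattern (sketched here with invented numbers and a 10% budget) is a latency gate in CI: run a load test, compute a percentile, and fail the build if it drifts past the last known-good baseline.

```python
# Hedged sketch of a CI latency gate: compare a load-test percentile
# against a stored baseline. Baseline, samples, and the 10% budget
# are made-up values.
import sys
from statistics import quantiles

BASELINE_P95_MS = 180.0   # assumed value recorded from the last good release
TOLERANCE = 0.10          # allow 10% drift before failing the pipeline

def p95(samples_ms: list[float]) -> float:
    return quantiles(samples_ms, n=100)[94]

# In a real pipeline these samples would come from a load-test run.
candidate_samples_ms = [150, 160, 170, 175, 210, 220, 230, 400, 165, 172]

candidate_p95 = p95(candidate_samples_ms)
limit = BASELINE_P95_MS * (1 + TOLERANCE)
print(f"candidate p95 = {candidate_p95:.0f} ms (limit {limit:.0f} ms)")

if candidate_p95 > limit:
    print("latency regression detected -- failing the build")
    sys.exit(1)
```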

Related questions

Observability
What is sampling in tracing and what are the trade-offs?
#tracing #sampling #cost
Observability
What is distributed tracing and how do you propagate context?
#tracing #context #distributed-systems
Observability
Logs vs metrics vs traces — when do you use each?
#observability #logs #metrics