© 2026 LetsGit.IT. All rights reserved.

DevOps (medium)

What should a good incident response runbook include, and how do postmortems drive change?

Tags
#incident #runbook #postmortem

Answer

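A good runbook covers detection (the alert or symptom that triggers it), triage, mitigation or rollback steps, roles, escalation paths, and communications. Postmortems capture the root cause and turn it into action items with named owners, so the same failure is less likely to recur.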
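
One way to make that list checkable is to treat a runbook as structured data and flag sections that are missing or empty. The sketch below is only an illustration: the section names mirror the answer above, while `REQUIRED_SECTIONS`, `missingSections`, and the sample runbook contents are hypothetical and do not come from any particular tool.

```javascript
// Hypothetical section names, taken from the short answer above.
const REQUIRED_SECTIONS = [
  "detection",
  "triage",
  "mitigation",
  "roles",
  "escalation",
  "comms",
];

// Returns the required sections that are absent or empty in a runbook object.
function missingSections(runbook) {
  return REQUIRED_SECTIONS.filter((name) => {
    const value = runbook[name];
    if (value === undefined || value === null) return true;
    if (Array.isArray(value)) return value.length === 0;
    if (typeof value === "object") return Object.keys(value).length === 0;
    return String(value).trim() === "";
  });
}

// Illustrative runbook; every value here is made up for the example.
const exampleRunbook = {
  detection: ["Alert: checkout error rate above threshold"],
  triage: ["Confirm scope: one region or all?", "Check the most recent deploy"],
  mitigation: ["Roll back the last release", "Disable the new feature flag"],
  roles: { incidentCommander: "on-call lead", scribe: "secondary on-call" },
  escalation: [], // intentionally empty to show the check firing
  comms: ["Post status updates in the incident channel"],
};

console.log(missingSections(exampleRunbook)); // [ 'escalation' ]
```

A check like this could run in CI against runbook files, so a gap is found during review and not in the middle of an incident.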

Advanced answer

Deep dive

Expanding on the short answer, here is what usually matters in practice:

  • Reliability: detect issues early through monitoring, and limit the blast radius with rollbacks and feature flags.
  • Security: apply least privilege, rotate secrets, and guard the supply chain.
  • Automation: aim for idempotent, repeatable steps and keep configuration drift under control.
  • Explain the "why", not just the "what" (intuition + consequences).
  • Trade-offs: what you gain/lose (time, memory, complexity, risk).
  • Edge cases: empty inputs, large inputs, invalid inputs, concurrency.
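
The reliability and automation points above can be sketched together as an idempotent mitigation step: turning off a feature flag in a way that is safe to repeat. This is a minimal sketch under stated assumptions. `createFlagStore` is a hypothetical in-memory stand-in for a real flag service, and `disableFlag` and the flag name `new-checkout` are invented for the example.

```javascript
// Hypothetical in-memory flag store; a real system would call its flag service instead.
function createFlagStore(initial) {
  const flags = new Map(Object.entries(initial));
  return {
    isEnabled: (name) => flags.get(name) === true,
    set: (name, value) => flags.set(name, value),
  };
}

// Idempotent mitigation step: it describes the desired end state ("flag is off")
// rather than an action ("toggle"), so running it twice cannot make things worse.
function disableFlag(store, name) {
  if (!store.isEnabled(name)) {
    return { changed: false, note: `${name} already disabled` };
  }
  store.set(name, false);
  return { changed: true, note: `${name} disabled` };
}

const store = createFlagStore({ "new-checkout": true });
const first = disableFlag(store, "new-checkout");
const second = disableFlag(store, "new-checkout");
console.log(first.changed, second.changed); // true false
```

Writing runbook steps this way matters during an incident, when two responders may run the same step or one responder may retry after a timeout.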

Examples

A tiny example (an explanation template):

// Example: discuss trade-offs for "what-should-a-good-incident-response-runbook-inc"
function explain() {
  // Start from the core idea (the short answer), then layer trade-offs on top.
  return "A runbook includes detection steps, triage, mitigation/rollback, roles, escalation paths, and comms.";
}
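
For the postmortem half of the answer, a second small illustration: checking that every action item is actually owned. The record shape, names, and dates below are hypothetical. Treating "owned" as "has an owner and a due date" is an assumption made for this sketch, not something the answer specifies.

```javascript
// Hypothetical postmortem record; field names and values are illustrative only.
const postmortem = {
  summary: "Checkout errors after a config change",
  rootCause: "Invalid config value passed validation",
  actionItems: [
    { title: "Add schema validation for config", owner: "alice", due: "2026-03-01" },
    { title: "Add alert on checkout error rate", owner: "", due: "2026-03-15" },
    { title: "Document rollback in the runbook", owner: "bob", due: null },
  ],
};

// Assumption: an action item counts as "owned" only with both an owner and a due date.
function unownedItems(pm) {
  return pm.actionItems
    .filter((item) => !item.owner || !item.due)
    .map((item) => item.title);
}

// Lists the titles of items that still need an owner or a due date.
console.log(unownedItems(postmortem));
```

Items that come back from a check like this are the ones most likely to be forgotten, which is how a postmortem fails to drive change.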

Common pitfalls

  • Too generic: no concrete trade-offs or examples.
  • Mixing average-case and worst-case (e.g., complexity).
  • Ignoring constraints: memory, concurrency, network/disk costs.

Interview follow-ups

  • When would you choose an alternative and why?
  • What production issues show up and how do you diagnose them?
  • How would you test edge cases?
