DevOps is a culture and set of practices that align dev and ops around fast, reliable delivery with shared ownership. Success is measured by outcomes like deployment frequency, lead time for changes, change failure rate, and MTTR, plus user-impact metrics.
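As a rough illustration of how the four delivery metrics above could be computed, here is a minimal sketch. The Deployment record and its field names are hypothetical; real data would come from your CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record shape, assumed for this example only.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored


def _mean_hours(deltas: List[timedelta]) -> float:
    """Average a list of durations and express the result in hours."""
    return sum(deltas, timedelta()).total_seconds() / 3600 / len(deltas)


def delivery_metrics(deploys: List[Deployment], period_days: int) -> dict:
    """Summarize deployment frequency, lead time, change failure rate, MTTR."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "mean_lead_time_hours": _mean_hours(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mttr_hours": _mean_hours(restores) if restores else None,
    }
```

Using the mean keeps the sketch short; teams often prefer medians or percentiles because a single slow change can skew an average.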
A CI pipeline usually checks out code, builds, runs unit/integration tests, performs static analysis, packages artifacts, and publishes them. Failures often come from flaky tests, missing dependencies, environment drift, secrets/config issues, or non-deterministic builds.
Continuous delivery keeps every change releasable and usually requires a manual approval to go to production. Continuous deployment automatically ships to production once the pipeline passes.
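Rolling updates replace instances gradually and need little extra capacity, but old and new versions serve traffic side by side during the rollout, so a bad release can reach some users before it is caught. Blue/green keeps two full environments and switches all traffic at once; rollback is a quick switch back, but the duplicate environment costs more. Canary releases send a small percentage of traffic to the new version first to validate metrics, reducing risk but requiring strong monitoring.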
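The canary decision can be sketched as a small gate that compares the canary's error rate against the baseline. The function name, thresholds, and the three verdict strings below are assumptions made for illustration, not values from any particular tool.

```python
def canary_verdict(
    baseline_error_rate: float,
    canary_error_rate: float,
    canary_requests: int,
    min_requests: int = 500,    # assumed minimum sample size
    max_ratio: float = 1.5,     # assumed tolerance relative to baseline
    abs_floor: float = 0.001,   # assumed floor so a near-zero baseline is not too strict
) -> str:
    """Return "wait", "promote", or "rollback" for a canary release."""
    # Too little traffic so far: the error rate is not yet meaningful.
    if canary_requests < min_requests:
        return "wait"
    # The canary may be somewhat worse than baseline, but only within the tolerance.
    allowed = max(baseline_error_rate * max_ratio, abs_floor)
    return "promote" if canary_error_rate <= allowed else "rollback"
```

A real gate would usually look at latency and saturation as well as errors, and would apply a statistical test rather than a fixed ratio.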
GitOps treats Git as the source of truth for desired state. Changes happen via pull requests and are reconciled to the target environment automatically, giving strong auditability and consistency.
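The reconciliation step can be pictured as a diff between the desired state (what Git declares) and the actual state (what is running). The sketch below uses plain dictionaries as a stand-in for both; the function names and the dictionary shape are hypothetical and do not reflect the API of any specific GitOps tool.

```python
def plan_reconcile(desired: dict, actual: dict) -> list:
    """Compute the actions needed to make `actual` match `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    # Anything running that Git no longer declares gets removed.
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired: dict, actual: dict) -> list:
    """Apply the plan to an in-memory stand-in for the environment."""
    actions = plan_reconcile(desired, actual)
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actions
```

Running reconcile a second time with the same desired state returns an empty action list, which is the property that lets a controller run the loop continuously.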
Idempotency means applying the same config repeatedly yields the same state, enabling safe, repeatable provisioning. Validate changes with linting, plan/diff, policy checks, and a staging environment before production.
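A minimal sketch of an idempotent operation: it checks the current state first and only changes it when needed, so running it twice is as safe as running it once. The ensure_line helper and the file name are made up for this example.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if it changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, nothing to do
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "sshd_config"
    first = ensure_line(conf, "PermitRootLogin no")   # creates the file, adds the line
    second = ensure_line(conf, "PermitRootLogin no")  # no change the second time
    final_text = conf.read_text()
```

Compare this with a naive append, which would add a duplicate line on every run and leave the file in a different state each time.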
Configuration is non-sensitive and can be versioned. Secrets should live in a secret manager/KMS, be injected at runtime, rotated, and accessed with least privilege.
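One common way to inject a secret at runtime is through an environment variable populated by the platform or secret manager. The sketch below assumes hypothetical variable names (DB_PASSWORD, LOG_LEVEL) and shows two habits worth keeping: fail fast when the secret is missing, and keep it out of printed representations.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    log_level: str                        # plain configuration, safe to version
    db_password: str = field(repr=False)  # secret: excluded from repr() and logs


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment; the variable names are assumptions."""
    password = env.get("DB_PASSWORD")
    if not password:
        # Fail at startup instead of running with a missing credential.
        raise RuntimeError("DB_PASSWORD is not set")
    return Settings(log_level=env.get("LOG_LEVEL", "INFO"), db_password=password)
```

Passing the environment in as a parameter makes the loader easy to test without touching the real process environment.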
Use minimal base images and multi-stage builds, pin versions, remove build dependencies, add a .dockerignore, run as non-root, and scan images for vulnerabilities.
Deployment is for stateless workloads, StatefulSet for stateful apps needing stable identity/storage, and DaemonSet for running one pod per node (e.g., log agents).
Readiness probes gate traffic, liveness probes restart unhealthy containers, and startup probes hold off the other two until a slow-starting app has initialized. Misconfigured probes can cause restart loops or route traffic before the app is ready.
Metrics show trends and health, logs provide event details, and traces follow a request across services. Together they help detect, diagnose, and explain incidents.
Alert on symptoms tied to SLOs, use burn-rate/multi-window alerts, deduplicate, route to owners, and ensure every alert is actionable with a clear runbook.
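A burn-rate alert with two windows can be sketched as follows. Burn rate is the observed error rate divided by the error rate the SLO allows; requiring both a long and a short window to exceed the threshold avoids paging for an incident that has already recovered. The default threshold of 14.4 is an assumption for illustration (it corresponds to spending 2% of a 30-day budget in one hour); tune it to your own SLO period.

```python
def burn_rate(error_rate: float, slo: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    return error_rate / (1.0 - slo)


def should_page(
    long_window_error_rate: float,   # e.g. error rate over the last hour
    short_window_error_rate: float,  # e.g. error rate over the last 5 minutes
    slo: float,
    threshold: float = 14.4,         # assumed threshold, see the note above
) -> bool:
    """Page only when both windows are burning budget faster than the threshold."""
    return (
        burn_rate(long_window_error_rate, slo) >= threshold
        and burn_rate(short_window_error_rate, slo) >= threshold
    )
```

For example, with a 99.9% SLO, a sustained 2% error rate in both windows is a burn rate of about 20 and would page, while the same long-window rate with a healthy short window would not.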
A runbook includes detection steps, triage, mitigation/rollback, roles, escalation paths, and comms. Postmortems capture root cause and create owned action items to prevent recurrence.
SLI is the measured metric, SLO is the target, SLA is the contractual promise. Error budget is the allowed failure (1 - SLO); if it’s burned, you slow releases and focus on reliability work.
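The error budget arithmetic is simple enough to work through in a few lines. The function names are illustrative, and the 30-day period is an assumed default.

```python
def error_budget_minutes(slo: float, period_days: int = 30) -> float:
    """Allowed downtime, in minutes, for a time-based SLO over the period."""
    return (1.0 - slo) * period_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

With a 99.9% SLO, error_budget_minutes gives roughly 43.2 minutes of allowed downtime per 30 days. With 1,000,000 requests and 250 failures, about 75% of the budget remains, since the SLO allows about 1,000 failures.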
Right-size instances based on metrics, use autoscaling for variable load, and buy reserved/committed capacity for steady workloads. Add storage lifecycle policies and caching where it makes sense.