Cloud
easy, microservices, architecture, distributed-systems
Answer
A microservice is a small service that implements a single business capability and can be deployed independently. It usually owns its data and communicates with other services through APIs or events, so it can be scaled and released separately.
// Minimal Express service exposing a single resource.
const express = require('express');
const app = express();
const port = 3000;

// Return the users owned by this service.
app.get('/users', (req, res) => {
  res.json([{ id: 1, name: 'Alice' }]);
});

app.listen(port, () => {
  console.log(`Microservice listening at http://localhost:${port}`);
});
easy, cloud-computing, service-model, iaas, +2 more
Answer
IaaS provides raw infrastructure like VMs, networking and storage; you manage the OS, runtime and app. PaaS provides a managed platform/runtime where you deploy code and the provider handles OS, scaling and patches. SaaS is a complete application delivered to end users.
medium, docker, virtualization, containerization, +1 more
Answer
Containers (Docker) virtualize at the OS level: they share the host kernel, start quickly, and are lightweight. Virtual machines virtualize hardware: each VM runs its own guest OS/kernel, is heavier and slower to start, but provides stronger isolation.
hard, kubernetes, orchestration, container
Answer
Kubernetes is a container orchestration system. Core concepts include a cluster (control plane + worker nodes), pods as the smallest deployable unit running containers, deployments/statefulsets for desired replica state, services/ingress for networking, and ConfigMaps/Secrets and Namespaces for configuration and isolation.
medium, serverless, cloud, architecture, +1 more
Answer
Serverless lets you run code without managing servers: the provider handles provisioning, scaling and patching. You pay per execution, can scale to zero, and deploy quickly. Trade‑offs include cold starts, execution limits and vendor lock‑in.
easy, iaas, paas, saas, +1 more
Answer
IaaS gives you infrastructure (VMs, networks), PaaS gives you a managed runtime/platform (deploy code, provider runs it), and SaaS is a ready-to-use application (you just use it).
easy, cloud, regions, availability-zone, +1 more
Answer
A region is a geographic area (e.g., `eu-central-1`) that contains multiple Availability Zones. An AZ is a physically separate data center (or group of them) within a region. Spreading across AZs improves availability; spreading across regions improves geo‑redundancy.
easy, cloud, scaling, horizontal, +1 more
Answer
Horizontal scaling adds more instances (scale out), improving redundancy and capacity. Vertical scaling makes a single instance bigger (scale up). Horizontal is usually more resilient; vertical can be simpler but is capped by the largest instance size available and leaves a single point of failure.
medium, deployment, blue-green, canary, +1 more
Answer
Blue/green switches all traffic from old to new at once (with quick rollback). Canary rolls out to a small % first and gradually increases, reducing risk by observing metrics before full rollout.
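The canary idea can be sketched as a deterministic traffic split. This is a minimal illustration rather than any particular load balancer's API: `bucketFor`, `pickVersion`, the hash, and the version labels are hypothetical names chosen for the example.

```javascript
// Hypothetical helper: map a user ID to a stable bucket in [0, 100).
function bucketFor(userId) {
  let hash = 0;
  for (const ch of String(userId)) {
    // Simple 32-bit rolling hash; real systems typically use a stronger hash.
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return hash % 100;
}

// Send roughly canaryPercent of users to 'canary', everyone else to 'stable'.
function pickVersion(userId, canaryPercent) {
  return bucketFor(userId) < canaryPercent ? 'canary' : 'stable';
}

// Blue/green is the degenerate case: 0% before the switch, 100% after it.
console.log(pickVersion('user-42', 0));   // stable
console.log(pickVersion('user-42', 100)); // canary
```

Hashing the user ID keeps each user on the same version while the percentage is raised step by step.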
medium, load-balancer, l4, l7, +1 more
Answer
L4 works on transport (TCP/UDP) and routes connections without understanding HTTP. L7 understands application protocols (HTTP) so it can route by path/headers, do TLS termination, and apply more advanced rules.
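A rough sketch of the kind of decision only an L7 balancer can make, assuming a hypothetical routing table and pool names (`routes`, `api-pool`, `static-pool`, and `default-pool` are illustrative, not from any product):

```javascript
// Hypothetical routing table: path prefix -> backend pool name.
const routes = [
  { prefix: '/api/', backend: 'api-pool' },
  { prefix: '/static/', backend: 'static-pool' },
];

// An L7 balancer can inspect the HTTP request. An L4 balancer only sees
// addresses and ports, so it could not make this decision.
function chooseBackend(request) {
  const match = routes.find((r) => request.path.startsWith(r.prefix));
  return match ? match.backend : 'default-pool';
}

console.log(chooseBackend({ path: '/api/users' })); // api-pool
console.log(chooseBackend({ path: '/index.html' })); // default-pool
```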
medium, observability, metrics, logs, +1 more
Answer
Metrics are numbers over time (CPU, latency), logs are discrete events/messages, and traces follow a single request across services (spans). Together they help you detect, diagnose, and understand incidents.
hard, high-availability, multi-az, multi-region, +1 more
Answer
Multi-AZ protects you from a data center outage inside a region while keeping latency low and operations simple. Multi-region can survive a full region outage but adds latency, data replication complexity, and higher costs.
hard, messaging, idempotency, retries
Answer
With at-least-once delivery, a message can be delivered more than once (retries), so consumers must be idempotent (processing duplicates safely). Exactly-once is hard/expensive in practice, so idempotency is a common solution.
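One common way to make a consumer idempotent is to remember which message IDs have already been processed. The sketch below assumes each message carries a unique `id`; the in-memory `processedIds` set, the `balances` map, and `handlePayment` are hypothetical stand-ins for a durable store and real business logic.

```javascript
// Hypothetical in-memory dedupe store; a real consumer would persist this
// (for example in a database) so duplicates are detected across restarts.
const processedIds = new Set();
const balances = new Map();

// Apply a payment message at most once per message ID.
function handlePayment(message) {
  if (processedIds.has(message.id)) {
    return 'duplicate-ignored';
  }
  const current = balances.get(message.account) || 0;
  balances.set(message.account, current + message.amount);
  processedIds.add(message.id);
  return 'applied';
}

// The broker redelivers the same message after a retry.
const msg = { id: 'm-1', account: 'acc-1', amount: 50 };
handlePayment(msg);
handlePayment(msg);
console.log(balances.get('acc-1')); // 50, not 100
```

In a real system the balance update and the ID record would need to be committed together, otherwise a crash between the two steps could still cause a double apply.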
hard, secrets, kms, security
Answer
Store secrets in a secret manager (or encrypted KMS-backed store) and inject them at runtime (env/volume), not in git or plain config files. Rotate secrets and follow least privilege.
hard, cost, right-sizing, autoscaling, +1 more
Answer
Right-size instances based on metrics and use autoscaling instead of overprovisioning. Add caching (CDN/app cache) and consider reserved/committed capacity for predictable workloads.
easy, cdn, caching, performance
Answer
A CDN caches static content (images, JS, CSS) close to users. It reduces latency and offloads traffic from your origin servers, improving speed and resilience.
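The offloading effect can be illustrated with a tiny TTL cache. This is a sketch of the idea, not how any CDN is implemented: `createEdgeCache`, the TTL value, and the `fetchFromOrigin` callback are hypothetical.

```javascript
// Hypothetical edge cache: serve from cache until the TTL expires.
function createEdgeCache(ttlMs, fetchFromOrigin) {
  const entries = new Map();
  return function get(path, now) {
    const hit = entries.get(path);
    if (hit && now - hit.storedAt < ttlMs) {
      return { body: hit.body, source: 'edge' };
    }
    // Cache miss or expired entry: go back to the origin and store the result.
    const body = fetchFromOrigin(path);
    entries.set(path, { body, storedAt: now });
    return { body, source: 'origin' };
  };
}

// Count how often the origin is actually contacted.
let originHits = 0;
const get = createEdgeCache(1000, (path) => {
  originHits += 1;
  return `content of ${path}`;
});

get('/logo.png', 0);    // origin (first request)
get('/logo.png', 500);  // edge (still fresh)
get('/logo.png', 1500); // origin (TTL of 1000 ms expired)
console.log(originHits); // 2
```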
medium, autoscaling, metrics, thrashing, +1 more
Answer
Autoscaling adjusts the number of instances based on demand (metrics like CPU, latency, queue depth). A common pitfall is thrashing: scaling up and down too aggressively because of noisy metrics. Use cooldowns and well-separated scale-up and scale-down thresholds to avoid it.
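A cooldown plus a gap between the scale-up and scale-down thresholds can be sketched as follows. The threshold values, the five-minute cooldown, and the `decide` function are assumptions for illustration, not defaults of any cloud provider.

```javascript
// Hypothetical thresholds and cooldown; tune these for a real workload.
const SCALE_UP_CPU = 70;    // percent
const SCALE_DOWN_CPU = 30;  // percent
const COOLDOWN_MS = 300000; // 5 minutes between scaling actions

// Decide the next state from the current state and average CPU.
// state = { instances, lastScaledAt }; now is a timestamp in milliseconds.
function decide(state, avgCpu, now) {
  if (now - state.lastScaledAt < COOLDOWN_MS) {
    return state; // still cooling down: ignore noisy metrics
  }
  if (avgCpu > SCALE_UP_CPU) {
    return { instances: state.instances + 1, lastScaledAt: now };
  }
  if (avgCpu < SCALE_DOWN_CPU && state.instances > 1) {
    return { instances: state.instances - 1, lastScaledAt: now };
  }
  return state; // inside the band between the two thresholds
}
```

A CPU spike followed by a dip one minute later produces only one scaling action, because the second reading falls inside the cooldown.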
medium, iac, terraform, automation, +1 more
Answer
IaC defines infrastructure in code (Terraform/CloudFormation) so it’s versioned, reviewable, and reproducible. It reduces configuration drift and makes environments consistent and easier to automate.
hard, iam, security, least-privilege
Answer
Least privilege means giving only the minimum permissions needed to do the job (no more). It limits blast radius: if a key or service is compromised, the attacker can do less damage.
{
  "Effect": "Allow",
  "Action": ["s3:GetObject"],
  "Resource": ["arn:aws:s3:::my-bucket/*"]
}
hard, service-mesh, mtls, sidecar, +1 more
Answer
A service mesh adds a dedicated layer (often sidecar proxies) for service-to-service traffic: mTLS, retries, timeouts, and observability. It’s worth it when you have many services and need consistent networking/security controls, but it adds operational complexity.
easy, storage, object-storage, block-storage, +1 more
Answer
Object storage stores files as objects (key + data + metadata) and is great for blobs (images, backups). Block storage provides a raw disk volume for a VM; it’s good for databases and filesystems where you need low-latency random access.
medium, health-check, load-balancer, availability
Answer
A health check tests if an instance is ready to serve traffic. Load balancers use it to remove unhealthy instances from rotation, improving availability and reducing user-facing errors.
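A minimal sketch of how a balancer might track probe results, assuming an instance is removed after three consecutive failures and restored on the next success. `UNHEALTHY_THRESHOLD`, `recordProbe`, and `inRotation` are hypothetical names; real load balancers usually also require several successes before restoring an instance.

```javascript
// Hypothetical threshold: consecutive failed probes before removal.
const UNHEALTHY_THRESHOLD = 3;

// Record one probe result and update the instance's health flag.
function recordProbe(instance, ok) {
  instance.failures = ok ? 0 : instance.failures + 1;
  instance.healthy = instance.failures < UNHEALTHY_THRESHOLD;
}

// Only healthy instances stay in rotation.
function inRotation(instances) {
  return instances.filter((i) => i.healthy).map((i) => i.name);
}

const pool = [
  { name: 'a', failures: 0, healthy: true },
  { name: 'b', failures: 0, healthy: true },
];

// Instance 'b' fails three probes in a row and drops out of rotation.
recordProbe(pool[1], false);
recordProbe(pool[1], false);
recordProbe(pool[1], false);
console.log(inRotation(pool)); // [ 'a' ]
```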
medium, dr, rpo, rto, +1 more
Answer
RPO (Recovery Point Objective) is how much data you can afford to lose, expressed as a time window (for example, the last 15 minutes of writes). RTO (Recovery Time Objective) is how long recovery may take. Together they drive backup frequency, replication, and DR design.
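The link between these targets and backup design can be checked with simple arithmetic. The sketch below uses hypothetical targets (RPO 15 minutes, RTO 60 minutes) and hypothetical step durations; `meetsRpo` and `meetsRto` are illustrative helpers.

```javascript
// Worst case, a failure happens just before the next backup, so everything
// written since the previous backup (one full interval) is lost.
function meetsRpo(backupIntervalMin, rpoMin) {
  return backupIntervalMin <= rpoMin;
}

// Recovery time is the sum of the steps needed to get back in service.
function meetsRto(stepsMin, rtoMin) {
  const total = stepsMin.reduce((sum, m) => sum + m, 0);
  return total <= rtoMin;
}

console.log(meetsRpo(60, 15));           // false: hourly backups can lose up to 60 min
console.log(meetsRpo(5, 15));            // true
console.log(meetsRto([10, 30, 10], 60)); // true: 50 min to detect, restore, verify
```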
hard, disaster-recovery, failover, multi-region, +1 more
Answer
Backup/restore is the cheapest option but has the highest RTO and RPO. Warm standby keeps a smaller live environment ready to scale up during failover (better RTO). Active-active runs in multiple regions at once (best RTO, often best availability) but is the most complex and expensive.
hard, environments, security, blast-radius, +1 more
Answer
Separating environments (for example dev, staging, and production) reduces blast radius and prevents accidents such as deleting production resources. It improves security and compliance, makes costs clearer, and lets you apply stricter policies and approvals in production.
easy, cloud, networking, vpc, +1 more
Answer
A VPC is a private, isolated network in the cloud where you define IP ranges, subnets, routes, and firewall rules. You use it to control who can talk to what (e.g., keep databases private) and to connect securely to other networks (VPN/peering).
medium, cloud, networking, subnet, +1 more
Answer
A public subnet has a route to an Internet Gateway, so instances can be reached from the internet (with correct firewall rules). A private subnet has no direct inbound internet route; it’s commonly used for app servers and databases. Often the load balancer is public, while app/DB stay private.
medium, cloud, networking, nat, +1 more
Answer
A NAT gateway lets instances in a private subnet make outbound connections to the internet (updates, external APIs) while staying unreachable from inbound internet traffic. It’s a common pattern: private app servers + NAT for outbound, public load balancer for inbound.
hard, cloud, saas, multi-tenant, +1 more
Answer
Multi-tenant shares infrastructure between customers (cheaper and easier to scale), but isolation is harder to guarantee and noisy neighbors become a risk. Single-tenant gives stronger isolation and simpler per-customer limits, but costs more and is operationally heavier. Many systems use a hybrid approach.
hard, cloud, rate-limiting, waf, +1 more
Answer
Rate limiting protects your system from abuse and traffic spikes (often returning HTTP 429). You can enforce it at the edge (CDN/WAF), API gateway/load balancer, and in the app itself. Earlier enforcement saves resources, but the app still needs safeguards because not all traffic comes through one entry point.
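One widely used algorithm for in-app rate limiting is the token bucket. The sketch below is a minimal single-process version with an injected clock; `createBucket`, `check`, and the capacity and refill values are assumptions for illustration.

```javascript
// Token bucket: holds up to `capacity` tokens, refilled at `refillPerSec`.
function createBucket(capacity, refillPerSec, now) {
  return { capacity, refillPerSec, tokens: capacity, updatedAt: now };
}

// Returns the HTTP status a caller would send: 200 if allowed, 429 if limited.
function check(bucket, now) {
  // Refill based on the time elapsed since the last check, up to capacity.
  const elapsedSec = (now - bucket.updatedAt) / 1000;
  bucket.tokens = Math.min(
    bucket.capacity,
    bucket.tokens + elapsedSec * bucket.refillPerSec
  );
  bucket.updatedAt = now;
  if (bucket.tokens >= 1) {
    bucket.tokens -= 1;
    return 200;
  }
  return 429;
}

// Burst of 2 allowed, then 1 request per second.
const bucket = createBucket(2, 1, 0);
console.log(check(bucket, 0));    // 200
console.log(check(bucket, 0));    // 200
console.log(check(bucket, 0));    // 429
console.log(check(bucket, 1000)); // 200 after one second of refill
```

With several app instances the bucket state would have to live in a shared store, which is one reason enforcement at the edge or gateway is often simpler.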
easy, cloud, containers, registry, +1 more
Answer
A container registry stores and serves container images (like a “Git repo for images”). Teams use it to version images, scan them for vulnerabilities, control access, and deploy the same tested image to different environments.
medium, kubernetes, networking, ingress, +2 more
Answer
A Service gives stable networking to pods (stable name/IP) and load-balances inside the cluster. A LoadBalancer Service typically provisions a cloud L4 load balancer to expose the Service externally. Ingress is usually L7 HTTP routing (host/path rules, TLS) in front of Services, managed by an Ingress Controller.
medium, cloud, security, secrets, +1 more
Answer
Use overlapping validity: create a new secret version, deploy apps that can use the new secret, then revoke the old one. Prefer short-lived credentials where possible. Make sure apps reload secrets safely (restart/sidecar/reload hook) and monitor failures during the rollout.
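Overlapping validity can be sketched from the verifying side: during rotation, both the current and the previous version are accepted. `createVerifier` and its methods are hypothetical, and the plain `===` comparison is a simplification (a real verifier would use a constant-time comparison).

```javascript
// Hypothetical verifier that accepts both the current and the previous
// secret version during the rotation window.
function createVerifier() {
  let current = null;
  let previous = null;
  return {
    // Step 1: introduce a new version; the old one remains valid.
    rotate(newSecret) {
      previous = current;
      current = newSecret;
    },
    // Final step: once all clients use the new version, revoke the old one.
    revokePrevious() {
      previous = null;
    },
    // Simplified check; not constant-time.
    isValid(candidate) {
      return candidate !== null &&
        (candidate === current || candidate === previous);
    },
  };
}

const verifier = createVerifier();
verifier.rotate('v1');
verifier.rotate('v2');
console.log(verifier.isValid('v1')); // true: old version still accepted
verifier.revokePrevious();
console.log(verifier.isValid('v1')); // false: old version revoked
console.log(verifier.isValid('v2')); // true
```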
hard, cloud, storage, consistency, +1 more
Answer
Depending on the provider and operation (especially overwrites and listings), object storage can behave like an eventually consistent system, so you may not see the newest state immediately. Design for it by avoiding overwrites (use unique keys/versioning), using retries with backoff, and not relying on immediate “list shows everything” semantics.
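Two of these mitigations are easy to sketch: a capped exponential backoff schedule for retries and write-once keys instead of overwrites. `backoffDelays`, `versionedKey`, and the key layout are hypothetical; production retry logic would normally add random jitter to the delays.

```javascript
// Exponential backoff schedule in milliseconds, capped at maxMs.
function backoffDelays(attempts, baseMs, maxMs) {
  const delays = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(maxMs, baseMs * 2 ** i));
  }
  return delays;
}

// Write-once keys: a new key per version instead of overwriting in place.
function versionedKey(name, version) {
  return `${name}/v${version}`;
}

console.log(backoffDelays(5, 100, 1000)); // [ 100, 200, 400, 800, 1000 ]
console.log(versionedKey('reports/daily.json', 3)); // reports/daily.json/v3
```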
hard, kubernetes, autoscaling, hpa, +1 more
Answer
HPA (Horizontal Pod Autoscaler) scales the number of pods based on metrics (CPU, memory, custom). Cluster Autoscaler scales the number of nodes in the cluster when pods can’t be scheduled due to lack of resources. They often work together: HPA adds pods, Cluster Autoscaler adds nodes if needed.
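The Kubernetes documentation gives the core HPA calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below applies that formula with min/max bounds; the function name and sample numbers are illustrative, and it leaves out the tolerance band and stabilization windows that the real controller applies.

```javascript
// desired = ceil(currentReplicas * currentMetric / targetMetric), then clamp
// to the configured minimum and maximum replica counts.
function desiredReplicas(currentReplicas, currentMetric, targetMetric, min, max) {
  const raw = Math.ceil(currentReplicas * (currentMetric / targetMetric));
  return Math.max(min, Math.min(max, raw));
}

// 4 pods at 90% CPU with a 60% target -> scale out to 6.
console.log(desiredReplicas(4, 90, 60, 2, 10)); // 6
// 4 pods at 30% CPU with a 60% target -> scale in to 2.
console.log(desiredReplicas(4, 30, 60, 2, 10)); // 2
```

If the new pods do not fit on the existing nodes they stay pending, which is the signal Cluster Autoscaler reacts to.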
medium, cloud, networking, subnet, +1 more
Answer
Public subnets can route to the internet (via an Internet Gateway). Private subnets have no direct inbound internet access. A NAT gateway lets instances in private subnets initiate outbound connections (e.g., to fetch updates) without being publicly reachable.
medium, cloud, iam, security, +1 more
Answer
Users are identities with long‑lived credentials, typically for people. Roles are assumed by services or users for temporary access. Least privilege means granting only the minimal permissions needed, ideally via role‑based, time‑bound access.
easy, cloud, cdn, performance, +1 more
Answer
A CDN caches content at edge locations close to users, reducing latency and offloading origin servers. Use it for static assets, large files, and global audiences to improve performance and resilience.
medium, cloud, deployment, blue-green, +1 more
Answer
Blue/green keeps two full environments and switches traffic from old to new in one step (fast rollback). Canary rolls out to a small percentage first and gradually increases, reducing risk but requiring monitoring and staged rollout logic.
medium, cloud, stateless, stateful, +1 more
Answer
Stateless services don’t keep user/session state in memory, which makes them easy to scale and replace. Stateful services keep local state, so scaling requires sticky sessions, shared storage, or careful replication. Stateless designs are generally more cloud‑friendly.
medium, cloud, disaster-recovery, rto, +1 more
Answer
RTO (Recovery Time Objective) is how quickly you must restore service after an outage. RPO (Recovery Point Objective) is how much data loss is acceptable (time between last good backup and failure).
medium, cloud, observability, metrics, +2 more
Answer
Metrics are numeric time‑series (e.g., latency, error rate). Logs are detailed events with context. Traces link events across services to show end‑to‑end request flow. Together they help detect, diagnose, and explain incidents.
medium, cloud, pricing, on-demand, +2 more
Answer
On‑demand is the most flexible option but also the most expensive per hour. Reserved capacity (or savings plans) is cheaper for steady workloads but requires a commitment. Spot is the cheapest but can be interrupted, so it’s best for fault‑tolerant or batch jobs.