Database, shard

Replication vs sharding — what problem does each solve?

Answer

Replication copies the same data to multiple nodes (better read scale and availability). Sharding splits data across nodes (better write/size scale), but makes queries and transactions more complex.

Advanced answer

Deep dive

Replication

Replication keeps **copies of the same dataset** on multiple nodes.

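Primary/replica (leader/follower) is the most common topology: the primary accepts writes and replicas apply its changes.
Benefits: high availability (a replica can be promoted if the primary fails) and read scaling via read replicas.
Trade-offs: replication lag (asynchronous replicas can serve stale reads), failover complexity (detecting failure, promoting a replica, avoiding split-brain), and write throughput that is still limited by the single primary.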
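The stale-read trade-off can be illustrated with a small in-memory sketch. This is a toy model, not how any particular database implements replication: the names `Node`, `ReplicatedStore`, `pending`, and `replicate` are hypothetical, and replication is simulated by an explicit call rather than a background stream.

```python
class Node:
    """One database node holding a full copy of the dataset."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Toy primary/replica store with asynchronous replication (illustrative only)."""

    def __init__(self, replica_count=2):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self.pending = []  # writes the replicas have not applied yet (the "lag")
        self._next_replica = 0

    def write(self, key, value):
        # Every write goes through the single primary, so write
        # throughput does not grow when replicas are added.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read(self, key, from_primary=False):
        if from_primary:
            return self.primary.data.get(key)
        # Spread reads across replicas round-robin (read scaling).
        node = self.replicas[self._next_replica % len(self.replicas)]
        self._next_replica += 1
        return node.data.get(key)

    def replicate(self):
        # Apply pending writes to every replica; in a real system this
        # happens continuously and with some delay.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


store = ReplicatedStore()
store.write("user:1", "alice")
stale = store.read("user:1")                     # replica has not caught up yet
fresh = store.read("user:1", from_primary=True)  # primary always has the write
store.replicate()
caught_up = store.read("user:1")                 # replica now returns the value
```

Reading from the primary avoids the stale result but gives up the read-scaling benefit, which is why some applications route only "read your own write" requests to the primary.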

Sharding

Sharding splits the dataset into **partitions** (shards) across nodes.

Benefits: scale data size and write throughput.
Trade-offs: complex queries (cross-shard joins/aggregations), distributed transactions, resharding, choosing a good shard key.
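A minimal sketch of hash-based shard routing, assuming a string shard key and a fixed number of shards. `ShardedStore`, `shard_for`, and `count_where` are hypothetical names chosen for illustration; real systems may use range-based or directory-based placement instead.

```python
import hashlib


class ShardedStore:
    """Toy hash-sharded key-value store (illustrative only)."""

    def __init__(self, shard_count=4):
        # Each dict stands in for a separate database node.
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, shard_key):
        # Use a stable hash so the same key always maps to the same shard
        # (Python's built-in hash() of str varies between processes).
        digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, shard_key, row):
        self.shards[self.shard_for(shard_key)][shard_key] = row

    def get(self, shard_key):
        # A lookup by shard key touches exactly one shard.
        return self.shards[self.shard_for(shard_key)].get(shard_key)

    def count_where(self, predicate):
        # A query that does not filter on the shard key must visit
        # every shard and merge the results (scatter-gather).
        return sum(
            1
            for shard in self.shards
            for row in shard.values()
            if predicate(row)
        )


store = ShardedStore(shard_count=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i, "age": 20 + i % 50})

row = store.get("user:7")                               # single-shard lookup
over_60 = store.count_where(lambda r: r["age"] > 60)    # cross-shard aggregation
```

With modulo placement like this, changing `shard_count` remaps most keys to a different shard, which is one reason resharding is costly; consistent hashing is a common way to reduce that movement.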

Practical guidance

Start with replication for HA and read scale.
Consider sharding only when one node can’t handle data size/write throughput.
Many real deployments combine both: each shard is replicated.
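The combined layout can be sketched as a two-step routing decision: pick the shard from the shard key, then pick a node inside that shard's replica set. This is a simplified illustration; `ReplicaSet`, `pick_node`, and the node names are hypothetical, and a real router would also account for node health and replication lag.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ReplicaSet:
    """One shard: a primary plus replicas holding copies of that shard's data."""
    primary: str
    replicas: list = field(default_factory=list)


def pick_node(cluster, shard_key, is_write):
    """Return the name of the node that should serve this request."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Step 1 (sharding): the shard key selects the partition.
    shard = cluster[int.from_bytes(digest[:8], "big") % len(cluster)]
    # Step 2 (replication): writes go to the shard's primary,
    # reads may go to one of its replicas.
    if is_write or not shard.replicas:
        return shard.primary
    return shard.replicas[int.from_bytes(digest[8:16], "big") % len(shard.replicas)]


cluster = [
    ReplicaSet(f"shard{i}-primary", [f"shard{i}-replica{j}" for j in range(2)])
    for i in range(3)
]

write_node = pick_node(cluster, "user:42", is_write=True)
read_node = pick_node(cluster, "user:42", is_write=False)
```

Both requests for the same key land in the same shard; only the node within that shard differs.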