Databases
Recruitment and knowledge question base.
medium · hibernate · orm · performance · +1
Answer
The N+1 problem appears when Hibernate loads N parent entities with one query, and then lazily loads each parent’s children with an additional query, resulting in N extra selects. Fix it with fetch joins/entity graphs, batch fetching, or adjusting fetch strategy.
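A minimal sketch of the fetch-join fix in JPQL (Author and its books collection are hypothetical entity names):

```sql
-- N+1 pattern: one query for the parents...
--   SELECT a FROM Author a
-- ...followed by one lazy query per author for a.books (N extra selects).

-- Fetch join: a single query loads parents and children together
SELECT DISTINCT a
FROM Author a
JOIN FETCH a.books
```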
medium · transaction · acid · database-theory
Answer
ACID describes transactional guarantees: Atomicity (all‑or‑nothing), Consistency (invariants preserved), Isolation (concurrent transactions behave as if serialized), and Durability (committed data survives crashes).
hard · indexing · b-tree · hash · +1
Answer
B‑tree indexes keep keys ordered, so they support equality lookups, range scans, prefix queries and ORDER BY efficiently. Hash indexes map keys to buckets for fast equality lookups, but they don’t support ranges or ordering.
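In PostgreSQL, for example, the index type can be chosen explicitly (users is a hypothetical table):

```sql
-- B-tree (the default): supports =, <, >, BETWEEN, ORDER BY, prefix LIKE
CREATE INDEX idx_users_email_btree ON users (email);

-- Hash: equality lookups only, no range scans or ordering
CREATE INDEX idx_users_email_hash ON users USING HASH (email);
```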
hard · normalization · database-design · theory
Answer
Normalization reduces redundancy. 1NF requires atomic columns and no repeating groups. 2NF is 1NF plus every non‑key attribute depends on the whole key. 3NF is 2NF plus no transitive dependencies between non‑key attributes.
medium · sql · nosql · comparison · +1
Answer
SQL databases are relational with fixed schemas, joins and strong ACID transactions. NoSQL databases are non‑relational (document, key‑value, column, graph), often with flexible schemas and easier horizontal scaling, sometimes trading strict consistency for availability/performance.
easy · primary-key · unique · index · +1
Answer
A primary key uniquely identifies a row (and can be referenced by FKs). A unique constraint also enforces uniqueness, but isn’t “the” row identity. An index is a data structure that speeds up reads; it may be unique or not.
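The three concepts side by side, as a sketch (users is a hypothetical table):

```sql
CREATE TABLE users (
    id    BIGINT PRIMARY KEY,  -- row identity: unique, implies NOT NULL
    email TEXT UNIQUE,         -- uniqueness enforced, but not "the" identity
    name  TEXT
);

-- A plain (non-unique) index: purely a read-performance structure
CREATE INDEX idx_users_name ON users (name);
```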
easy · transaction · acid · consistency
Answer
A transaction groups multiple operations into one unit of work: either all succeed or all are rolled back. It protects data consistency, especially with concurrent users and failures.
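The classic illustration is a money transfer, sketched here against a hypothetical accounts table:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- both updates take effect together; on failure or ROLLBACK, neither does
```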
easy · index · performance · tradeoffs
Answer
An index speeds up reads (filtering/sorting) by avoiding full table scans. The trade-off is slower writes (INSERT/UPDATE/DELETE) and extra storage because the index must be maintained.
medium · sql · join · inner-join · +1
Answer
INNER JOIN returns only rows that match in both tables. LEFT JOIN returns all rows from the left table and fills missing matches from the right side with NULLs.
SELECT u.id, p.id
FROM users u
LEFT JOIN posts p ON p.user_id = u.id;
medium · normalization · denormalization · schema-design
Answer
Normalization reduces redundancy and update anomalies by splitting data into related tables. Denormalization duplicates some data to speed up reads and reduce joins, at the cost of harder writes and consistency management.
medium · transactions · isolation · concurrency
Answer
Isolation defines what anomalies are allowed when transactions run concurrently (dirty/non-repeatable reads, phantoms). Higher isolation gives stronger correctness but may reduce concurrency and performance.
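Most databases let you pick the level per transaction; a sketch against a hypothetical accounts table:

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Repeated reads of the same row now see the same value
-- even if another transaction commits a change in between.
SELECT balance FROM accounts WHERE id = 1;
SELECT balance FROM accounts WHERE id = 1;
COMMIT;
```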
hard · n-plus-one · orm · performance
Answer
It’s when you load N parent rows and then run one extra query per parent (N more queries). Avoid it with joins, batching, prefetching (eager loading), or separate “IN (...)” queries.
hard · locking · optimistic · pessimistic · +1
Answer
Optimistic locking assumes conflicts are rare: you update with a version/timestamp check and retry on conflict. Pessimistic locking locks rows upfront (e.g., `SELECT ... FOR UPDATE`) to prevent others from changing them.
SELECT *
FROM accounts
WHERE id = 1
FOR UPDATE;
hard · replication · sharding · scaling
Answer
Replication copies the same data to multiple nodes (better read scale and availability). Sharding splits data across nodes (better write/size scale), but makes queries and transactions more complex.
hard · explain · query-plan · performance
Answer
Start with the most expensive nodes (cost/time), the scan types (Seq Scan vs Index Scan), and whether row estimates match actual rows. Then check joins/sorts and whether indexes are used as expected; fix with better queries, indexes, or fresher statistics.
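In PostgreSQL, for example (orders is a hypothetical table):

```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- In the output, compare: Seq Scan vs Index Scan, estimated rows vs actual rows,
-- and which node dominates total time.
```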
easy · foreign-key · constraints · integrity
Answer
A foreign key is a constraint that links a column to a referenced primary/unique key in another table. It enforces referential integrity: you can’t reference a row that doesn’t exist (and it can control delete/update behavior).
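A sketch with hypothetical users and orders tables, including delete behavior:

```sql
CREATE TABLE orders (
    id      BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users (id) ON DELETE CASCADE
);
-- Inserting an order with a non-existent user_id now fails;
-- deleting a user also removes that user's orders (CASCADE).
```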
medium · sql · group-by · having · +1
Answer
GROUP BY groups rows for aggregation. HAVING filters groups after aggregation (while WHERE filters rows before grouping). Example: “only groups with count > 10”.
SELECT country, COUNT(*) AS cnt
FROM users
GROUP BY country
HAVING COUNT(*) > 10;
medium · index · covering-index · performance
Answer
A covering index contains all columns needed by a query, so the database can answer it using only the index (no table/heap lookup). It can significantly speed up reads, at the cost of larger indexes and slower writes.
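In PostgreSQL this can be written with INCLUDE (orders is a hypothetical table):

```sql
-- Postgres syntax: INCLUDE adds non-key columns so the query below
-- can be answered from the index alone (an index-only scan)
CREATE INDEX idx_orders_user ON orders (user_id) INCLUDE (status, total);

SELECT status, total FROM orders WHERE user_id = 42;
```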
hard · deadlock · locking · transactions · +1
Answer
A deadlock is when transactions wait on each other’s locks in a cycle, so none can proceed. Reduce it by keeping transactions short, locking rows in a consistent order, and retrying on deadlock errors.
hard · sql-injection · security · prepared-statements
Answer
SQL injection is when untrusted input changes the meaning of your SQL (e.g., by concatenating strings). Prevent it with parameterized queries/prepared statements, proper escaping by the driver/ORM, and least-privilege DB accounts.
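The same idea at the SQL level, using PostgreSQL's PREPARE/EXECUTE (users is a hypothetical table):

```sql
-- Vulnerable pattern (in app code): input concatenated into the statement
--   "SELECT * FROM users WHERE name = '" + input + "'"

-- Safe: the value is bound as a parameter and is never parsed as SQL
PREPARE find_user (text) AS
    SELECT * FROM users WHERE name = $1;

EXECUTE find_user('Robert');  -- even input like '; DROP TABLE users;-- stays a plain string
```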
easy · index · composite-index · performance
Answer
A composite index indexes multiple columns together (e.g., (org_id, email)). It helps when queries filter/sort by the leftmost prefix of those columns, reducing scans and speeding up lookups.
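The leftmost-prefix rule in practice (users is a hypothetical table):

```sql
CREATE INDEX idx_users_org_email ON users (org_id, email);

-- Can use the index (leftmost prefix):
SELECT * FROM users WHERE org_id = 7 AND email = 'a@example.com';
SELECT * FROM users WHERE org_id = 7;

-- Cannot use it efficiently (skips the leading column):
SELECT * FROM users WHERE email = 'a@example.com';
```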
medium · sql · window-functions · row_number
Answer
A window function computes a value over a “window” of rows related to the current row without collapsing rows like GROUP BY. Use cases: ranking (ROW_NUMBER), running totals, “top N per group”.
SELECT user_id,
       amount,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
FROM payments;
medium · constraints · triggers · integrity
Answer
Constraints (PK, FK, UNIQUE, CHECK) are declarative rules enforced by the DB engine. Triggers are custom code that runs on events. Prefer constraints for data integrity when possible because they’re simpler, predictable, and optimized.
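A sketch of declarative constraints on a hypothetical payments table:

```sql
-- Rules the engine enforces on every write, no custom code needed
CREATE TABLE payments (
    id     BIGINT PRIMARY KEY,
    amount NUMERIC NOT NULL CHECK (amount > 0),
    status TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'failed'))
);
```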
hard · idempotency · unique-constraint · upsert · +1
Answer
Use a unique constraint on an idempotency key (or natural key) and upsert/insert-with-conflict-handling. If the same request arrives twice, the second write becomes a no-op or updates the same row instead of creating a duplicate.
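A Postgres-flavored sketch (payments and its idempotency_key column are hypothetical):

```sql
CREATE UNIQUE INDEX uq_payments_idem ON payments (idempotency_key);

-- A retried request with the same key becomes a no-op instead of a duplicate
INSERT INTO payments (idempotency_key, amount)
VALUES ('req-123', 100)
ON CONFLICT (idempotency_key) DO NOTHING;
```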
hard · soft-delete · data-retention · auditing
Answer
Soft delete marks a row as deleted (e.g., `deleted_at`) so you can restore/audit, but it complicates queries and indexes (you must filter out deleted rows). Hard delete removes data and simplifies queries, but you lose history unless you archive elsewhere.
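A sketch of the soft-delete pattern on a hypothetical users table, including the query/index consequence:

```sql
-- Soft delete: keep the row, mark it
UPDATE users SET deleted_at = now() WHERE id = 1;

-- Every read must now exclude deleted rows; a partial index (Postgres) keeps this cheap
CREATE INDEX idx_users_active ON users (email) WHERE deleted_at IS NULL;
SELECT * FROM users WHERE email = 'a@example.com' AND deleted_at IS NULL;
```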
easy · sql · best-practices · performance · +1
Answer
`SELECT *` fetches more data than you need (more I/O and memory) and couples code to schema changes (adding a column can change results, payload size, or expose sensitive fields). Selecting only needed columns is clearer and can enable better index-only plans.
medium · sql · null · three-valued-logic · +1
Answer
NULL means “unknown”, so comparisons like `col = NULL` evaluate to UNKNOWN, not true/false (3-valued logic). Use `IS NULL` / `IS NOT NULL`, and in some DBs `IS DISTINCT FROM` to compare safely with NULLs.
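The difference in practice (users with a nullable nickname column is hypothetical):

```sql
SELECT * FROM users WHERE nickname = NULL;                 -- always 0 rows: the predicate is UNKNOWN
SELECT * FROM users WHERE nickname IS NULL;                -- correct NULL test
SELECT * FROM users WHERE nickname IS DISTINCT FROM 'bob'; -- NULL-safe comparison (where supported)
```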
medium · scaling · partitioning · sharding · +1
Answer
Partitioning splits one logical table into smaller parts, usually within one database system (helps manage data and can speed up queries via partition pruning). Sharding splits data across multiple database nodes/instances to scale out. Partitioning is often simpler; sharding adds distributed complexity.
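Declarative partitioning in PostgreSQL, as a sketch (events is a hypothetical table):

```sql
-- One logical table, many physical parts
CREATE TABLE events (
    id         BIGINT,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- Queries filtering on created_at can prune partitions they don't need.
```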
hard · views · materialized-view · performance · +1
Answer
A materialized view stores the result of a query physically, so reads can be much faster than recomputing the query each time. You use it for expensive aggregations or reporting, accepting the trade-off: refresh cost and (often) slightly stale data.
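A PostgreSQL-flavored sketch (payments is a hypothetical table):

```sql
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date_trunc('day', created_at) AS day, SUM(amount) AS total
FROM payments
GROUP BY 1;

-- Reads hit the precomputed result; data is stale until refreshed:
REFRESH MATERIALIZED VIEW daily_revenue;
```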
hard · performance · indexes · write-amplification · +1
Answer
Write amplification means one logical write causes multiple physical writes: the table row plus every affected index (and often WAL/redo logs). More indexes usually speed up reads but make inserts/updates/deletes slower and increase storage and maintenance cost.
easy · indexes · selectivity · performance · +1
Answer
Selectivity is how well a column filters rows (how many rows match a predicate). A highly selective index (few matches) is more useful, because the DB can avoid scanning many rows. Low-selectivity columns (like boolean flags) often don’t benefit much from a standalone index.
medium · optimizer · statistics · cardinality · +1
Answer
The optimizer picks a plan based on estimated row counts (cardinality). If estimates are wrong (stale stats, skewed data, correlated columns), it can choose the wrong join order or algorithm. Updating statistics (e.g., ANALYZE) and using appropriate indexes helps the optimizer estimate better.
medium · sql · like · indexes · +1
Answer
A leading wildcard (`%term`) often prevents using a normal B-tree index, so the DB may scan many rows. Alternatives include full-text search indexes, trigram indexes (where supported), or changing queries to use prefix search (`term%`) when possible.
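One common workaround in PostgreSQL is the pg_trgm extension (products is a hypothetical table):

```sql
-- A leading wildcard defeats a normal B-tree index:
SELECT * FROM products WHERE name LIKE '%phone%';

-- Postgres pg_trgm: a trigram GIN index that can serve such queries
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_name_trgm ON products USING GIN (name gin_trgm_ops);
```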
hard · migrations · schema-change · zero-downtime · +1
Answer
Use an expand/contract approach: add a new column, backfill data in batches, write to both (or keep them in sync), switch reads to the new column, then remove the old one. This avoids long blocking locks and lets you roll out safely.
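A Postgres-flavored sketch of the expand phase (users and email_normalized are hypothetical):

```sql
-- Expand: add the new column without rewriting the table
ALTER TABLE users ADD COLUMN email_normalized TEXT;

-- Backfill in small batches to avoid long locks (repeat until 0 rows updated)
UPDATE users
SET email_normalized = lower(email)
WHERE id IN (
    SELECT id FROM users WHERE email_normalized IS NULL LIMIT 1000
);

-- Contract (only after the app reads and writes the new column):
-- ALTER TABLE users DROP COLUMN email;
```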
hard · transactions · locks · mvcc · +2
Answer
Long transactions can hold locks for a long time, block other queries, and increase contention. In MVCC databases they can also prevent cleanup of old row versions, leading to bloat. They can also increase replication lag and make failures harder to recover from.
easy · database · constraints · primary-key · +1
Answer
A primary key uniquely identifies a row, implies NOT NULL, and a table has exactly one. A unique constraint also enforces uniqueness, but a table can have several, and they may allow NULLs depending on the DB.
medium · database · foreign-key · integrity · +1
Answer
Foreign keys enforce referential integrity (no orphan rows) and can simplify reasoning. The trade‑off is extra write overhead and sometimes more complex migrations. In high‑throughput systems, teams sometimes validate in the app instead of the DB.
medium · database · indexes · composite-index · +1
Answer
Most databases use the left‑most prefix of a composite index. That means an index on (A, B, C) can speed up queries on A or A,B, but not on B alone. Choose the order based on common filters and selectivity.
medium · database · indexes · covering-index · +1
Answer
A covering index contains all columns needed by a query, so the DB can answer it using only the index without fetching table rows. This reduces I/O and can be much faster, especially on large tables.
medium · database · transactions · locks · +1
Answer
A deadlock happens when two transactions wait on each other’s locks. Databases detect it and abort (rollback) one transaction so the other can continue. Apps should retry aborted transactions.
hard · database · transactions · isolation · +1
Answer
Read Committed prevents dirty reads but allows non‑repeatable reads. Repeatable Read guarantees the same rows won’t change during the transaction but can allow phantom rows. Serializable is the strictest; it behaves as if transactions run one‑by‑one, preventing phantoms but reducing concurrency.
hard · database · partitioning · sharding · +1
Answer
Partitioning splits a table into parts within a single database instance (often for management/performance). Sharding splits data across multiple database servers for horizontal scaling. Sharding adds complexity around routing and cross‑shard queries.
medium · database · denormalization · performance · +1
Answer
Denormalization duplicates data to speed up reads and reduce joins. It can improve performance for read‑heavy workloads, but increases storage, risks inconsistency, and makes writes/migrations more complex.
easy · database · oltp · olap · +1
Answer
OLTP systems handle many small, short transactions (operational workloads). OLAP systems run fewer, heavy analytical queries over large datasets (reporting/BI). Schema design, indexing, and storage often differ.
easy · database · transactions · autocommit · +1
Answer
With autocommit, each statement is its own transaction. Explicit transactions group multiple statements into one atomic unit, which matters for consistency and performance (fewer commits/round‑trips). Use explicit transactions for multi‑step changes that must succeed or fail together.