Advanced SQL Quiz: 12 Challenging Questions to Test, Improve, and Master Your SQL Skills

Advanced SQL Quiz: 12 Challenging Questions to Test, Improve, and Master Your SQL Skills

Table of Contents

Quiz Overview and Instructions

This quiz contains 12 challenging problems designed to stretch your advanced SQL skills across real-world tasks: writing set-based queries, composing and optimizing CTEs, applying window functions, debugging unexpected results, and reasoning about indexes and execution plans. Expect a mixture of question types—write-the-query prompts that require exact output, small debugging scenarios where you must identify and fix logic errors, and short multiple-choice items that test conceptual trade-offs.

Allocate about 45–60 minutes to complete the set; work steadily rather than racing. Treat each problem as independent and aim for correctness and clarity. When submitting a written query, include any assumptions about schema or NULL semantics, show the expected result rows (or a compact sample), and prefer readable formatting with CTEs and clear aliases so reviewers can follow your logic.

Do not rely on proprietary extensions unless a question explicitly permits them; stick to standard SQL where possible. For optimization questions, briefly justify choices (e.g., index usage, join order, or aggregation strategy) and, if applicable, show a simpler alternative and explain why it’s less efficient. Small, well-documented answers that run on a representative dataset will score higher than dense, undocumented one-liners.

Complex Joins and Unions

Real-world problems often require combining rows from multiple sets while controlling duplicates, preserving unmatched rows, and keeping performance predictable. Use explicit join types to express intent: INNER for intersection, LEFT/RIGHT for keeping one side, and FULL OUTER to reconcile differences. Prefer explicit ON predicates and qualified columns to avoid accidental cross joins or row multiplication when joining aggregated results; aggregate first, then join the smaller summary rather than joining raw rows and grouping later.

Anti-joins and semi-joins are key patterns. Use LEFT JOIN … WHERE right.key IS NULL or NOT EXISTS for anti-joins; NOT EXISTS is clearer and often safer with NULLs. Example: find customers with no orders.

-- anti-join (preferred)
SELECT c.id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

When combining result sets, choose UNION ALL when you want all rows and UNION when you need deduplication—UNION incurs an expensive sort/hash to remove duplicates. If you need union + dedupe but want control, append a source tag and use ROW_NUMBER() to pick preferred rows:

WITH combined AS (
  SELECT id, val, 'A' src FROM a
  UNION ALL
  SELECT id, val, 'B' src FROM b
), ranked AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY src) rn FROM combined
)
SELECT id, val FROM ranked WHERE rn = 1;

For data reconciliation, FULL OUTER JOIN with COALESCE lets you compare keys and highlight mismatches. Always ensure join keys are indexed, avoid SELECT * across joins, and test query plans for costly hash vs nested-loop behavior. Small, well-documented CTEs and clear aliases make complex joins and unions maintainable and reviewable.

Window Functions Challenges

Window functions unlock powerful row-aware calculations but introduce several subtle pitfalls to watch for. Always specify a stable ORDER BY inside the window; without it results can be non-deterministic. Choose the right ranking function: use ROW_NUMBER() when you need a single winner per partition, RANK()/DENSE_RANK() when ties should be preserved. Remember window frames: ROWS counts physical rows, RANGE groups by value (important with duplicate timestamps), and the default frame for ordered windows may be different than you expect—explicit frames avoid surprises.

Avoid filtering by a window result in the same query block; compute the window value in a CTE or subquery and then filter in the outer SELECT (e.g., find latest row per group using ROW_NUMBER() then WHERE rn = 1). For running totals, use SUM(...) OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) to guarantee row-based accumulation.

Performance-wise, large partitions force heavy sorts and memory usage. Limit partition sizes with predicates, pre-aggregate when possible, and ensure good indexes on the partition and order columns to reduce sort cost. Beware that window outputs cannot be indexed directly; materialize intermediate results if reused. When updating rows using window results, compute them in a CTE and join in the UPDATE to keep intent clear and avoid ambiguous execution order.

Small, explicit examples and testing on edge cases (NULLs, duplicate keys, and boundary frames) will catch most logical errors before they reach production.

CTEs and Recursive Queries

Break complex logic into small, named building blocks using the WITH clause: it improves readability, lets you name intermediate results, and makes testing each step easy. Use a non-recursive WITH for modular transformations and pre-aggregation.

WITH customer_totals AS (
  SELECT customer_id, SUM(amount) total
  FROM orders
  WHERE created_at >= '2024-01-01'
  GROUP BY customer_id
)
SELECT c.id, c.name, t.total
FROM customers c
JOIN customer_totals t ON t.customer_id = c.id
WHERE t.total > 1000;

For hierarchical or graph traversals, use a recursive WITH to define an anchor (base rows) and a recursive member that references the named result. Always include a clear termination condition to avoid infinite recursion and, when possible, a mechanism to detect cycles.

WITH RECURSIVE hierarchy AS (
  -- anchor: top-level managers
  SELECT id, manager_id, name, 1 AS depth, ARRAY[id] AS path
  FROM employees
  WHERE manager_id IS NULL
  UNION ALL
  -- recursive member: attach direct reports
  SELECT e.id, e.manager_id, e.name, h.depth + 1, h.path || e.id
  FROM employees e
  JOIN hierarchy h ON e.manager_id = h.id
  WHERE NOT e.id = ANY(h.path) -- prevents cycles
)
SELECT id, name, depth FROM hierarchy;

Practical tips: push selective filters into the anchor to shrink recursion, return only needed columns, index join keys, and materialize results if reused. For very deep or wide graphs prefer iterative set-based approaches or DB-specific safeguards (e.g., SQL Server’s OPTION (MAXRECURSION) or equivalent) to control resource usage.

Aggregation and Subqueries

Aggregate functions (SUM, COUNT, AVG, MIN, MAX) compress rows; GROUP BY defines the grouping columns and HAVING filters groups after aggregation. Remember non-aggregated columns must appear in GROUP BY and NULLs are excluded by most aggregates (COUNT(col) vs COUNT(*) behaves differently).

Aggregate first, then join the smaller summary to avoid row explosion and improve performance:

WITH totals AS (
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  WHERE created_at >= '2024-01-01'
  GROUP BY customer_id
)
SELECT c.id, c.name, t.total
FROM customers c
JOIN totals t ON t.customer_id = c.id
WHERE t.total > 1000;

Subqueries can be uncorrelated (single-valued or set-valued) or correlated (referencing the outer row). Use scalar subqueries in SELECT for computed values, and correlated subqueries when the calculation depends on each outer row:

-- scalar (uncorrelated)
SELECT product_id, price FROM products
WHERE price > (SELECT AVG(price) FROM products);

-- correlated: compare each order to its customer's average
SELECT o.*
FROM orders o
WHERE o.amount > (
  SELECT AVG(amount) FROM orders oo WHERE oo.customer_id = o.customer_id
);

Prefer EXISTS over IN for correlated-existence checks and to avoid surprising NULL behavior with IN. For heavy aggregations, push filters into the aggregation step, index grouping/join keys, and consider window functions (ROW_NUMBER(), SUM() OVER()) when you need per-row context instead of collapsing rows.

Solutions and Detailed Explanations

State assumptions first: list required schema columns, NULL semantics, and any DB-specific behavior you rely on. For write-the-query problems, present a clear, readable solution with one or two compact CTEs, then show expected sample rows. Explain why the approach meets the requirement and note edge cases.

Use set-based patterns for correctness and performance. For “latest row per group” use ROW_NUMBER() in a CTE and filter rn = 1 to avoid non-deterministic aggregates:

WITH ranked AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY key_col ORDER BY ts_col DESC) rn
  FROM events
)
SELECT * FROM ranked WHERE rn = 1;

For anti-joins prefer NOT EXISTS to avoid NULL surprises and assure intent:

SELECT c.* FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

When unions must dedupe, prefer UNION ALL plus ROW_NUMBER() if you need control over which source wins; use plain UNION only when dataset size and deduplication cost are acceptable. For recursive hierarchy problems show anchor and recursive member, include cycle protection and a depth limit.

In debugging questions identify root cause (wrong join type, misplaced WHERE that turns outer join into inner, incorrect frame for window) and provide the minimal code change and why it fixes correctness. For optimization justify indexes on join and partition/order columns, push selective filters into CTE anchors or WHERE clauses, and prefer aggregating before joining to reduce intermediate row counts.

For conceptual multiple-choice explain trade-offs briefly (e.g., UNION vs UNION ALL: dedupe cost vs correctness; ROW_NUMBER vs RANK: tie handling). Keep sample outputs tiny and representative to validate correctness.

Scroll to Top