Essential SQL: Master the 20% of Queries and Concepts That Solve 80% of Real-World Data Tasks

Why the 80/20 of SQL matters — which queries and concepts to prioritize

Master a compact set of patterns that cover most production needs: SELECT with WHERE, JOINs (INNER/LEFT), GROUP BY plus aggregates, and ORDER BY with LIMIT for paging and top-n queries. (projectpro.io)

Pair those with basic DML (INSERT/UPDATE/DELETE) and small, practical patterns like DISTINCT, UNION/UNION ALL, and simple subqueries — they solve ETL, reporting, and CRUD work quickly. Use EXPLAIN and proper indexes to find and fix bottlenecks early; indexing and query shape often matter more than clever SQL. (dba.stackexchange.com)
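Here is a minimal sketch of the basic DML statements and set operations. It uses Python's built-in `sqlite3` module and an in-memory database. The `products` and `archived_products` tables and their rows are hypothetical, and other engines differ in small dialect details. The sketch shows parameterized INSERT/UPDATE/DELETE, plus the difference between UNION (which de-duplicates, at the cost of a sort or hash step) and UNION ALL (which keeps every row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE archived_products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
""")

# INSERT with ? placeholders keeps values out of the SQL string
conn.executemany("INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
                 [(1, "pen", 1.50), (2, "notebook", 4.00), (3, "stapler", 9.00)])
conn.execute("INSERT INTO archived_products (id, name, price) VALUES (?, ?, ?)",
             (10, "pen", 1.50))

# UPDATE and DELETE scoped by a WHERE clause (stapler is removed)
conn.execute("UPDATE products SET price = price * 1.10 WHERE name = ?", ("notebook",))
conn.execute("DELETE FROM products WHERE price > ?", (8.0,))
conn.commit()

# UNION removes duplicate rows; UNION ALL keeps every row from both inputs
union_rows = conn.execute(
    "SELECT name FROM products UNION SELECT name FROM archived_products"
).fetchall()
union_all_rows = conn.execute(
    "SELECT name FROM products UNION ALL SELECT name FROM archived_products"
).fetchall()

print(sorted(union_rows))      # [('notebook',), ('pen',)]
print(sorted(union_all_rows))  # [('notebook',), ('pen',), ('pen',)]
```

Prefer UNION ALL when the inputs cannot overlap or duplicates are acceptable, since it skips the de-duplication work.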

After you’re fluent, add CTEs and window functions (ROW_NUMBER(), RANK(), running totals) for pagination, de-duplication, and rolling calculations. They can often replace self-joins and correlated subqueries with shorter, easier-to-read queries. (learn.microsoft.com)

A practical checkpoint: if you can write fast SELECT+JOIN+WHERE+GROUP BY queries, read an EXPLAIN plan, and add an index, you’ll cover roughly 80% of everyday data tasks.

SELECT, WHERE and filtering: pick the right columns and rows

Choose columns deliberately: list only the fields you need (SELECT id, name) rather than SELECT *. Narrower rows mean less I/O, less data sent over the network, and lower memory use. They also give the planner a chance to answer the query from a covering index alone.

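Filter rows early and precisely with WHERE clauses. Put selective predicates (equality, ranges) on indexed columns when possible, and avoid wrapping those columns in functions (e.g., WHERE DATE(ts)=…). The index stores raw column values, not function results, so the planner usually falls back to a scan unless you have a matching expression index. Rewrite such filters as ranges on the bare column instead.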
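You can check the DATE(ts) advice by asking the planner directly. The sketch below assumes SQLite (via Python's `sqlite3`), a hypothetical `orders` table with ISO-8601 text timestamps in `ts`, and an index named `idx_orders_ts`. It collects SQLite's EXPLAIN QUERY PLAN detail for the function-wrapped filter and for an equivalent half-open range. The exact plan wording varies between SQLite versions. The wrapped version should not mention the index, while the range version should.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
             "ts TEXT, total REAL)")
conn.execute("CREATE INDEX idx_orders_ts ON orders (ts)")

def plan(sql):
    """Return the 'detail' column (4th field) of SQLite's EXPLAIN QUERY PLAN rows."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Function applied to the indexed column: the index on raw ts values can't be searched
wrapped = plan("SELECT id, total FROM orders WHERE date(ts) = '2024-05-01'")

# Same day expressed as a half-open range on the bare column: index-friendly
sargable = plan("SELECT id, total FROM orders "
                "WHERE ts >= '2024-05-01' AND ts < '2024-05-02'")

print("wrapped: ", wrapped)   # typically a full SCAN of orders
print("sargable:", sargable)  # typically a SEARCH using idx_orders_ts
```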

Examples:

-- projection + selective filter
SELECT id, customer_id, total
FROM orders
WHERE customer_id = 123 AND status = 'shipped'
LIMIT 100;

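Use LIMIT for sampling. For pagination, pair it with an ORDER BY on a unique key; without one, the database may return rows in any order, and pages can overlap or skip rows. For existence checks, prefer EXISTS over large IN lists, and use NOT EXISTS for anti-joins: NOT IN returns no rows at all if its subquery yields a NULL. For complex filters, test with EXPLAIN to confirm the planner uses indexes. Small changes to which columns you SELECT and which predicates you write often yield the biggest real-world performance gains.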
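The NULL pitfall with NOT IN is easy to reproduce. This sketch uses Python's `sqlite3` with hypothetical `customers` and `orders` tables, and includes one order whose `customer_id` is NULL. EXISTS and NOT EXISTS behave as expected. NOT IN returns nothing, because `x NOT IN (1, NULL)` evaluates to UNKNOWN rather than TRUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    -- order 12 has no customer, so the subquery below yields a NULL
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, NULL);
""")

# Existence check: customers with at least one order
with_orders = conn.execute("""
    SELECT c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.name
""").fetchall()

# Anti-join with NOT EXISTS: customers with no orders
no_orders = conn.execute("""
    SELECT c.name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.name
""").fetchall()

# NOT IN: the NULL makes every "not found" comparison UNKNOWN, so nothing survives
not_in = conn.execute("""
    SELECT c.name FROM customers c
    WHERE c.id NOT IN (SELECT customer_id FROM orders)
""").fetchall()

print(with_orders)  # [('Ada',)]
print(no_orders)    # [('Grace',), ('Linus',)]
print(not_in)       # []
```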

JOINs 101: INNER, LEFT, RIGHT (and CROSS), plus practical join patterns

Joins combine rows from two (or more) tables by a relationship column. Use them to enrich, filter, or detect missing data.

INNER JOIN returns only matching rows.

SELECT o.id, c.name FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;

LEFT JOIN keeps every left-side row and fills the right-side columns with NULL when nothing matches. Use it for optional data or for anti-joins (finding orphans).

-- optional data
SELECT u.id, p.profile_pic FROM users u
LEFT JOIN profiles p ON p.user_id = u.id;
-- anti-join (missing right side)
SELECT u.* FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;
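Below is a small runnable comparison of INNER JOIN, LEFT JOIN, and the LEFT JOIN anti-join. It assumes SQLite through Python's `sqlite3` and hypothetical `users`/`orders` rows. Both INNER and LEFT joins repeat a user once per matching order. Only the LEFT JOIN keeps the user without orders, and that is exactly the row the `IS NULL` anti-join picks out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'ana'), (2, 'ben'), (3, 'cy');
    INSERT INTO orders VALUES (100, 1, 20.0), (101, 1, 35.0), (102, 2, 12.5);
""")

# INNER: only users with a matching order, one row per order
inner = conn.execute("""
    SELECT u.name, o.id FROM users u
    INNER JOIN orders o ON o.user_id = u.id
    ORDER BY o.id
""").fetchall()

# LEFT: every user; cy has no order, so o.id comes back as NULL (None)
left = conn.execute("""
    SELECT u.name, o.id FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    ORDER BY u.id, o.id
""").fetchall()

# Anti-join: keep only the NULL-filled rows, i.e. users with no orders
orphans = conn.execute("""
    SELECT u.name FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE o.id IS NULL
""").fetchall()

print(inner)    # [('ana', 100), ('ana', 101), ('ben', 102)]
print(left)     # [('ana', 100), ('ana', 101), ('ben', 102), ('cy', None)]
print(orphans)  # [('cy',)]
```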

RIGHT JOIN mirrors LEFT JOIN by keeping every right-side row. For readability, prefer flipping the table order and writing a LEFT JOIN instead.

CROSS JOIN produces a Cartesian product: every row of a is paired with every row of b, so 3 rows crossed with 4 rows yields 12. Use it only for combinatorics or intentional expansion:

SELECT a.x, b.y FROM a CROSS JOIN b;
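A common intentional use of CROSS JOIN is building a complete grid, for example every day crossed with every product. You then LEFT JOIN the facts onto the grid, so missing combinations show up as zero instead of disappearing. The sketch below follows that approach, assuming SQLite via Python's `sqlite3`; the `days`, `products`, and `sales` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (day TEXT);
    CREATE TABLE products (sku TEXT);
    CREATE TABLE sales (day TEXT, sku TEXT, qty INTEGER);
    INSERT INTO days VALUES ('2024-01-01'), ('2024-01-02');
    INSERT INTO products VALUES ('A'), ('B'), ('C');
    INSERT INTO sales VALUES ('2024-01-01', 'A', 5), ('2024-01-02', 'C', 2);
""")

# CROSS JOIN builds all 2 x 3 = 6 day/product pairs; the LEFT JOIN attaches
# sales, and COALESCE turns "no sales that day" (SUM of nothing = NULL) into 0
grid = conn.execute("""
    SELECT d.day, p.sku, COALESCE(SUM(s.qty), 0) AS qty
    FROM days d
    CROSS JOIN products p
    LEFT JOIN sales s ON s.day = d.day AND s.sku = p.sku
    GROUP BY d.day, p.sku
    ORDER BY d.day, p.sku
""").fetchall()

for row in grid:
    print(row)
# ('2024-01-01', 'A', 5)
# ('2024-01-01', 'B', 0)
# ('2024-01-01', 'C', 0)
# ('2024-01-02', 'A', 0)
# ('2024-01-02', 'B', 0)
# ('2024-01-02', 'C', 2)
```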

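Practical tips: qualify columns with table aliases, and push selective predicates onto indexed columns. For semi-joins (keep left rows that have at least one match), prefer EXISTS or IN over JOIN plus DISTINCT, which multiplies rows before de-duplicating them. For LEFT JOIN, put filters on the right table in the ON clause, not in WHERE. A WHERE test on a right-table column rejects the NULL-filled rows and silently turns the outer join into an inner join.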
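The difference between filtering a LEFT JOIN in ON and in WHERE is easiest to see side by side. This sketch uses SQLite via Python's `sqlite3` with hypothetical data: ben has only a cancelled order, and cy has none. Filtering on `o.status` in ON keeps both users as NULL-filled rows, while the same test in WHERE drops them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
    INSERT INTO users VALUES (1, 'ana'), (2, 'ben'), (3, 'cy');
    INSERT INTO orders VALUES (100, 1, 'shipped'), (101, 2, 'cancelled');
""")

# Filter in ON: it only decides which orders match; every user is still returned
in_on = conn.execute("""
    SELECT u.name, o.id FROM users u
    LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped'
    ORDER BY u.id
""").fetchall()

# Filter in WHERE: applied after the join, so ben's cancelled row and cy's
# NULL-filled row (NULL = 'shipped' is not true) are both discarded
in_where = conn.execute("""
    SELECT u.name, o.id FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE o.status = 'shipped'
    ORDER BY u.id
""").fetchall()

print(in_on)     # [('ana', 100), ('ben', None), ('cy', None)]
print(in_where)  # [('ana', 100)]
```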

Aggregation & GROUP BY: SUM, COUNT, AVG, HAVING for summary reports

Use GROUP BY plus aggregates to produce compact summary rows (per customer, product, date, etc.). SUM/COUNT/AVG compute totals, counts, and means across groups. AVG and SUM ignore NULLs by default, so AVG divides only by the count of non-NULL values, and SUM over a group whose values are all NULL returns NULL rather than 0. Use COALESCE(col,0) when you must treat NULL as 0. (academy.vertabelo.com)

COUNT(*) returns total rows in the group, while COUNT(col) counts only non-NULL values—pick intentionally. (stackoverflow.com)
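The NULL rules for COUNT, SUM, and AVG can be checked on a tiny table. This sketch assumes SQLite via Python's `sqlite3` and a hypothetical nullable `discount` column. Customer 8 has only a NULL discount, which shows SUM returning NULL. Customer 7 shows how COALESCE inside AVG changes the denominator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, discount REAL);
    INSERT INTO orders VALUES (1, 7, 10.0), (2, 7, NULL), (3, 7, 20.0), (4, 8, NULL);
""")

rows = conn.execute("""
    SELECT customer_id,
           COUNT(*)                   AS all_rows,          -- every row in the group
           COUNT(discount)            AS with_discount,     -- non-NULL values only
           AVG(discount)              AS avg_ignoring_null, -- divides by non-NULL count
           AVG(COALESCE(discount, 0)) AS avg_null_as_zero,  -- divides by all rows
           SUM(discount)              AS raw_sum,           -- NULL if all values are NULL
           COALESCE(SUM(discount), 0) AS safe_sum           -- 0 instead of NULL
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

for row in rows:
    print(row)
# (7, 3, 2, 15.0, 10.0, 30.0, 30.0)
# (8, 1, 0, None, 0.0, None, 0)
```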

Filter raw rows with WHERE before grouping to reduce work; use HAVING to filter on aggregated results (e.g., groups with SUM > threshold). (geeksforgeeks.org)

Example:

SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count, AVG(amount) AS avg_order
FROM orders
WHERE status='shipped'
GROUP BY customer_id
HAVING SUM(amount) > 1000
ORDER BY total DESC
LIMIT 50;
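Here is the same report query run against a few hypothetical rows in an in-memory SQLite database (Python's `sqlite3`). WHERE removes the cancelled 900 order before grouping, and HAVING removes customer 2 (total 500) afterwards. The count column is aliased `order_count` here so that it does not reuse the table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
             "amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount, status) VALUES (?, ?, ?)",
    [(1, 800.0, "shipped"), (1, 400.0, "shipped"), (1, 900.0, "cancelled"),
     (2, 300.0, "shipped"), (2, 200.0, "shipped"),
     (3, 1500.0, "shipped")],
)

report = conn.execute("""
    SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count,
           AVG(amount) AS avg_order
    FROM orders
    WHERE status = 'shipped'      -- row filter, applied before grouping
    GROUP BY customer_id
    HAVING SUM(amount) > 1000     -- group filter, applied after aggregation
    ORDER BY total DESC
    LIMIT 50
""").fetchall()

for row in report:
    print(row)
# (3, 1500.0, 1, 1500.0)
# (1, 1200.0, 2, 600.0)
```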

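Practical tips: alias aggregates for readability, and push selective predicates into WHERE. Write COALESCE(SUM(amount), 0) when an all-NULL group should report 0. Use AVG(COALESCE(amount, 0)) only when a missing value genuinely means zero, since it changes the denominator.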

Window functions vs aggregates: ROW_NUMBER, RANK, LAG/LEAD for per-row analytics

Aggregates (SUM/COUNT/AVG) collapse groups into single rows; window functions compute values for every input row using OVER (PARTITION BY … ORDER BY …), so you get per-row analytics without losing row-level detail. (en.wikipedia.org)
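To see the difference in output shape, this sketch computes the same SUM twice: once as a GROUP BY aggregate and once as a window function. It assumes SQLite 3.25+ via Python's `sqlite3` and a hypothetical `sales` table. Adding ORDER BY inside OVER turns the window sum into a running total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER);
    INSERT INTO sales VALUES ('east', 1, 10), ('east', 2, 20), ('east', 3, 5),
                             ('west', 1, 7),  ('west', 2, 3);
""")

# GROUP BY collapses each region to a single row
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window versions keep every input row; ORDER BY in OVER makes a running total
per_row = conn.execute("""
    SELECT region, day, amount,
           SUM(amount) OVER (PARTITION BY region)              AS region_total,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
    FROM sales
    ORDER BY region, day
""").fetchall()

print(totals)  # [('east', 35), ('west', 10)]
for row in per_row:
    print(row)
# ('east', 1, 10, 35, 10)
# ('east', 2, 20, 35, 30)
# ('east', 3, 5, 35, 35)
# ('west', 1, 7, 10, 7)
# ('west', 2, 3, 10, 10)
```

With ORDER BY in OVER, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on the ordering column share the same running total. Specify ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW if you need a strictly row-by-row sum.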

Ranking functions differ by tie-handling: ROW_NUMBER() assigns a unique sequential id (good for deterministic de-duplication), RANK() gives equal ranks and leaves gaps after ties, and DENSE_RANK() gives equal ranks without gaps. (learn.microsoft.com)

-- top-N per group / de-dupe: keep each user's latest event (rn = 1)
SELECT * FROM (
  SELECT e.*, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS rn
  FROM events e
) ranked
WHERE rn = 1;
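The tie-handling rules are easiest to compare on data that contains a tie. This sketch assumes SQLite 3.25+ via Python's `sqlite3` and a hypothetical `scores` table. It ranks one team with all three functions, then uses the wrapped ROW_NUMBER() pattern to keep the top scorer per team. A secondary sort key (`player`) makes ROW_NUMBER() deterministic when scores tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (player TEXT, team TEXT, points INTEGER);
    INSERT INTO scores VALUES ('a', 'red', 90), ('b', 'red', 90), ('c', 'red', 80),
                              ('d', 'blue', 70), ('e', 'blue', 60);
""")

# a and b tie at 90: ROW_NUMBER breaks the tie, RANK leaves a gap, DENSE_RANK doesn't
ranked = conn.execute("""
    SELECT player, points,
           ROW_NUMBER() OVER (ORDER BY points DESC, player) AS row_num,
           RANK()       OVER (ORDER BY points DESC)         AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC)         AS dense_rnk
    FROM scores
    WHERE team = 'red'
    ORDER BY row_num
""").fetchall()

# Window results can't be filtered in WHERE directly, so wrap and filter rn = 1
top_per_team = conn.execute("""
    SELECT team, player, points FROM (
        SELECT s.*, ROW_NUMBER() OVER (PARTITION BY team
                                       ORDER BY points DESC, player) AS rn
        FROM scores s
    ) t
    WHERE rn = 1
    ORDER BY team
""").fetchall()

for row in ranked:
    print(row)
# ('a', 90, 1, 1, 1)
# ('b', 90, 2, 1, 1)
# ('c', 80, 3, 3, 2)
print(top_per_team)  # [('blue', 'd', 70), ('red', 'a', 90)]
```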

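LAG() and LEAD() access neighboring rows (previous/next) inside the same partition, which makes them ideal for diffs, churn/sessionization, or detecting state changes. Both accept optional offset and default arguments, e.g. LAG(value, 1, 0), where the default replaces the NULL returned at partition edges. (learn.microsoft.com)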

SELECT ts, value, value - LAG(value) OVER(PARTITION BY id ORDER BY ts) delta
FROM readings;
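Here is a runnable version of the delta query, extended with the optional offset/default arguments and LEAD(). It assumes SQLite 3.25+ via Python's `sqlite3` and a few hypothetical readings. The first row of each `id` partition has no predecessor, so `delta` is NULL there, while `LAG(value, 1, 0.0)` substitutes 0.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (id INTEGER, ts INTEGER, value REAL);
    INSERT INTO readings VALUES (1, 1, 10.0), (1, 2, 12.5), (1, 3, 11.0),
                                (2, 1, 5.0),  (2, 2, 5.0);
""")

rows = conn.execute("""
    SELECT id, ts, value,
           value - LAG(value) OVER (PARTITION BY id ORDER BY ts) AS delta,        -- NULL at first row
           LAG(value, 1, 0.0) OVER (PARTITION BY id ORDER BY ts) AS prev_or_zero, -- default at edge
           LEAD(value) OVER (PARTITION BY id ORDER BY ts)        AS next_value    -- NULL at last row
    FROM readings
    ORDER BY id, ts
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 10.0, None, 0.0, 12.5)
# (1, 2, 12.5, 2.5, 10.0, 11.0)
# (1, 3, 11.0, -1.5, 12.5, None)
# (2, 1, 5.0, None, 0.0, 5.0)
# (2, 2, 5.0, 0.0, 5.0, None)
```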

CTEs, subqueries and basic performance tips: indexes, EXPLAIN, and simple tuning

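Break complex logic into small, named steps using WITH blocks so each step is testable and readable, and use recursive CTEs for hierarchical data. Reuse CTEs for clarity, but be aware that some engines materialize them (PostgreSQL did so unconditionally before version 12). Materialization can stop outer filters from being pushed into the CTE.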
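Here is a minimal recursive CTE for a hierarchy. It assumes SQLite via Python's `sqlite3` and a hypothetical `employees` table with a self-referencing `manager_id`. The anchor member selects the root. The recursive member then repeatedly joins children onto the rows found so far, tracking depth. SQL Server writes this with plain `WITH`, without the `RECURSIVE` keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'CEO', NULL), (2, 'CTO', 1), (3, 'Dev Lead', 2),
                                 (4, 'Dev', 3), (5, 'CFO', 1);
""")

chain = conn.execute("""
    WITH RECURSIVE org(id, name, depth) AS (
        -- anchor: the root of the hierarchy
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- recursive step: direct reports of rows already in org
        SELECT e.id, e.name, org.depth + 1
        FROM employees e JOIN org ON e.manager_id = org.id
    )
    SELECT name, depth FROM org ORDER BY depth, id
""").fetchall()

print(chain)
# [('CEO', 0), ('CTO', 1), ('CFO', 1), ('Dev Lead', 2), ('Dev', 3)]
```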

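Prefer simple subqueries for isolated lookups, and correlated subqueries when row-by-row context is required. For existence checks, use EXISTS: it can stop at the first match, and its NOT EXISTS form avoids the NULL trap of NOT IN. Many planners rewrite IN and EXISTS into the same semi-join, so confirm any speed difference with EXPLAIN.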
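This sketch contrasts the two subquery styles, using SQLite via Python's `sqlite3` and hypothetical `orders` rows. An uncorrelated subquery compares each order against the global average. A correlated subquery compares each order against its own customer's average. Planners often rewrite correlated subqueries into joins, so "evaluated per row" describes the logical meaning, not a guaranteed execution strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 150.0),
                              (3, 2, 20.0), (4, 2, 30.0), (5, 2, 40.0);
""")

# Uncorrelated: the inner query doesn't reference the outer row (global avg = 58)
above_global_avg = conn.execute("""
    SELECT id FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""").fetchall()

# Correlated: the inner query uses o.customer_id, so each order is compared
# with its own customer's average (customer 1: 100, customer 2: 30)
above_own_avg = conn.execute("""
    SELECT o.id FROM orders o
    WHERE o.amount > (SELECT AVG(i.amount) FROM orders i
                      WHERE i.customer_id = o.customer_id)
    ORDER BY o.id
""").fetchall()

print(above_global_avg)  # [(2,)]
print(above_own_avg)     # [(2,), (5,)]
```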

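Index basics: index selective WHERE and JOIN columns. In a composite index, put equality-filtered columns first and the range or ORDER BY column after them, because the index can only be used from its leftmost column onward. Create covering indexes (extra key columns, or INCLUDE columns where supported, as in SQL Server and PostgreSQL 11+) so queries can be answered without table lookups. Avoid standalone indexes on low-cardinality flags, which rarely narrow the search enough to beat a scan. Keep statistics current (ANALYZE) so the planner picks good plans.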
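You can observe the left-to-right rule for composite indexes, and the covering-index effect, with SQLite's EXPLAIN QUERY PLAN. This sketch assumes Python's `sqlite3` and a hypothetical `orders` table with an index on `(customer_id, created_at)`. SQLite has no INCLUDE clause, so the covering case relies on the query touching only indexed columns. Plan wording differs across SQLite versions. The first plan should show an index SEARCH with no temporary B-tree for the ORDER BY, the second a scan that ignores the index, and the third a covering index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                                     created_at TEXT, status TEXT, total REAL)""")
# Equality-filtered column first, then the range/ORDER BY column
conn.execute("CREATE INDEX idx_cust_created ON orders (customer_id, created_at)")

def plan(sql):
    """Return the 'detail' strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Leading column filtered: index seek, rows already ordered by created_at
seek = plan("SELECT id, total FROM orders WHERE customer_id = 42 ORDER BY created_at")

# Only the second column filtered: the leftmost prefix is missing, so no seek
no_prefix = plan("SELECT id, total FROM orders WHERE created_at >= '2024-01-01'")

# Every referenced column is in the index: no lookup back to the table
covering = plan("SELECT customer_id, created_at FROM orders WHERE customer_id = 42")

for name, p in [("seek", seek), ("no_prefix", no_prefix), ("covering", covering)]:
    print(name, p)
```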

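Use EXPLAIN to see the planned strategy, then EXPLAIN ANALYZE (PostgreSQL; MySQL 8.0.18+) to compare estimated vs actual rows. EXPLAIN ANALYZE actually executes the statement, so wrap data-modifying queries in a transaction that you roll back. Key warning signs include estimated and actual row counts that differ by orders of magnitude (often stale statistics) and sequential scans on large tables where you expected an index. Watch also for nested loops over large inputs and for sorts that spill to disk.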

Simple tuning wins: SELECT only needed columns, push predicates early, avoid wrapping indexed columns in functions, add appropriate indexes, batch writes, and retest with EXPLAIN ANALYZE after each change.
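The following is a minimal sketch of batched writes with Python's `sqlite3` and a hypothetical `events` table. `executemany` reuses one prepared statement for all rows. Wrapping the call in a single transaction means one commit instead of ten thousand. The gain is small for an in-memory database and much larger for on-disk or client/server engines, where each commit costs a disk sync or a network round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")

rows = [(i % 100, "click") for i in range(10_000)]

# "with conn" commits once on success and rolls back if an exception is raised
with conn:
    conn.executemany("INSERT INTO events (user_id, kind) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 10000
```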
