If you've ever written a SQL query so nested that you lost track of which subquery was which, Common Table Expressions (CTEs) are about to become your new best friend. They're one of PostgreSQL's most powerful features for writing readable, maintainable SQL.
What Are Common Table Expressions?
A Common Table Expression, or CTE, is a temporary named result set that you define within a query. Think of it as creating a temporary view that exists only for the duration of that single query. You define CTEs using the
WITH
keyword.
Here's the basic syntax:
WITH cte_name AS (
SELECT column1, column2
FROM some_table
WHERE condition
)
SELECT *
FROM cte_name;
The query inside the parentheses runs first, and its results become available under the name you specified. You can then reference that name in your main query as if it were a regular table.
Why Use CTEs?
1. Improved Readability
Compare these two approaches for finding customers who spent more than the average order amount:
Without CTE (nested subquery):
SELECT c.name, c.email, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.total > (
SELECT AVG(total)
FROM orders
WHERE created_at > '2025-01-01'
);
With CTE:
WITH avg_order AS (
SELECT AVG(total) AS average_total
FROM orders
WHERE created_at > '2025-01-01'
)
SELECT c.name, c.email, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
CROSS JOIN avg_order
WHERE o.total > avg_order.average_total;
The CTE version is longer, but it clearly separates the logic: first we calculate the average, then we use it. When queries get more complex, this separation becomes invaluable.
2. Reusing Results Multiple Times
CTEs really shine when you need to reference the same derived data multiple times:
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS previous_month,
revenue - LAG(revenue) OVER (ORDER BY month) AS growth
FROM monthly_sales
ORDER BY month;
Without a CTE, you'd either repeat the aggregation logic or nest subqueries within subqueries.
3. Chaining Multiple CTEs
You can define multiple CTEs and have them reference each other:
WITH
active_users AS (
SELECT id, name, email
FROM users
WHERE last_login > NOW() - INTERVAL '30 days'
),
user_orders AS (
SELECT
u.id,
u.name,
COUNT(o.id) AS order_count,
SUM(o.total) AS total_spent
FROM active_users u
LEFT JOIN orders o ON u.id = o.customer_id
GROUP BY u.id, u.name
),
user_segments AS (
SELECT
*,
CASE
WHEN total_spent > 1000 THEN 'high_value'
WHEN total_spent > 100 THEN 'medium_value'
ELSE 'low_value'
END AS segment
FROM user_orders
)
SELECT segment, COUNT(*), AVG(total_spent)
FROM user_segments
GROUP BY segment;
Each CTE builds on the previous one, creating a clear data transformation pipeline.
Recursive CTEs: The Real Superpower
The
WITH RECURSIVE
syntax unlocks queries that would be impossible with standard SQL. Recursive CTEs can reference themselves, allowing you to traverse hierarchical data or generate series.
Traversing a Hierarchy
Consider an employees table where each employee has a manager:
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT,
manager_id INTEGER REFERENCES employees(id)
);
To find all employees under a specific manager, including indirect reports:
WITH RECURSIVE org_chart AS (
-- Base case: start with the manager
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE id = 42
UNION ALL
-- Recursive case: find direct reports of people we've found
SELECT e.id, e.name, e.manager_id, oc.depth + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY depth, name;
The query first selects the starting point (base case), then repeatedly joins to find more rows (recursive case) until no new rows are found.
Generating a Date Series
Need a row for every day in a range?
WITH RECURSIVE date_series AS (
SELECT DATE '2025-01-01' AS date
UNION ALL
SELECT date + INTERVAL '1 day'
FROM date_series
WHERE date < DATE '2025-12-31'
)
SELECT * FROM date_series;
This generates 365 rows, one for each day of 2025. (Note: PostgreSQL's
generate_series()
function is often better for this specific use case, but recursive CTEs work when you need more complex logic.)
Finding Paths in a Graph
For a table of connections between nodes:
WITH RECURSIVE paths AS (
SELECT
start_node,
end_node,
ARRAY[start_node, end_node] AS path,
1 AS hops
FROM connections
WHERE start_node = 'A'
UNION ALL
SELECT
p.start_node,
c.end_node,
p.path || c.end_node,
p.hops + 1
FROM paths p
JOIN connections c ON p.end_node = c.start_node
WHERE NOT c.end_node = ANY(p.path) -- Prevent cycles
AND p.hops < 10 -- Limit depth
)
SELECT * FROM paths WHERE end_node = 'Z';
When NOT to Use CTEs
CTEs aren't always the right choice. Here are situations where you should reconsider:
1. Performance-Critical Simple Queries
In older PostgreSQL versions (before 12), CTEs were optimization fences . The planner couldn't push predicates into the CTE or merge it with the outer query. While PostgreSQL 12+ can inline non-recursive CTEs, there are still cases where a subquery performs better.
If your CTE is used only once and you notice performance issues, try rewriting it as a subquery:
-- CTE version
WITH filtered AS (
SELECT * FROM large_table
)
SELECT * FROM filtered WHERE id = 123;
-- Subquery version (may be faster if CTE isn't inlined)
SELECT * FROM (
SELECT * FROM large_table
) filtered WHERE id = 123;
-- Even better: just write it directly
SELECT * FROM large_table WHERE id = 123;
2. When Materialization Hurts Performance
By default, PostgreSQL 12+ inlines CTEs when possible. But if you use
MATERIALIZED
:
WITH my_cte AS MATERIALIZED (
SELECT * FROM huge_table
)
SELECT * FROM my_cte WHERE rare_column = 'specific_value';
The entire CTE is computed and stored before filtering, which can be catastrophic for large tables. Only use
MATERIALIZED
when you know you need it (like when the CTE is referenced multiple times and is expensive to compute).
Disk space considerations : When PostgreSQL materializes a CTE, it writes the entire result set to disk (specifically to temporary files in
pgsql_tmp
if the data exceeds
work_mem
). For a CTE returning millions of rows with wide columns, this can consume gigabytes of temporary disk space. If your server has limited temp storage or you're running multiple concurrent queries with materialized CTEs, you risk filling up disk space and crashing queries.
Monitor your disk space usage with:
-- Check current temp file usage
SELECT temp_files, temp_bytes
FROM pg_stat_database
WHERE datname = current_database();
If you see high temp file usage, consider whether your materialized CTEs are the culprit.
3. Overly Simple Cases
Don't use a CTE just because you can:
-- Overkill
WITH active AS (
SELECT * FROM users WHERE active = true
)
SELECT name FROM active;
-- Just write this
SELECT name FROM users WHERE active = true;
CTEs add cognitive overhead. If the query is already simple and clear, keep it that way.
4. When Views or Temp Tables Are Better
If you're using the same CTE logic across multiple queries, consider:
- Views : For reusable query logic that doesn't need to be materialized
- Materialized Views : For expensive computations that can be refreshed periodically
- Temporary Tables : For intermediate results needed across multiple statements in a transaction
5. Recursive CTEs Without Termination Conditions
Always ensure your recursive CTEs will terminate:
-- Dangerous: could run forever with circular references
WITH RECURSIVE bad AS (
SELECT id, parent_id FROM nodes WHERE id = 1
UNION ALL
SELECT n.id, n.parent_id FROM nodes n JOIN bad b ON n.parent_id = b.id
)
SELECT * FROM bad;
-- Safe: includes cycle detection and depth limit
WITH RECURSIVE safe AS (
SELECT id, parent_id, ARRAY[id] AS visited, 1 AS depth
FROM nodes WHERE id = 1
UNION ALL
SELECT n.id, n.parent_id, s.visited || n.id, s.depth + 1
FROM nodes n
JOIN safe s ON n.parent_id = s.id
WHERE NOT n.id = ANY(s.visited)
AND s.depth < 100
)
SELECT * FROM safe;
Performance Comparisons: CTEs, Temp Tables, and Views
Understanding when to use CTEs versus temporary tables or views requires knowing their performance characteristics. Here's a practical comparison.
CTEs (Common Table Expressions)
Best for : Single-query readability, recursive operations, moderate data sizes
| Aspect | Behavior |
|---|---|
| Storage | In-memory (spills to disk if exceeds
work_mem
) |
| Scope | Single query only |
| Indexing | Not indexable |
| Planning | Re-planned with each query execution |
| Typical overhead | Minimal for small-medium results (<100K rows) |
Performance guideline : Use CTEs freely for result sets under 100,000 rows. Above that, benchmark against alternatives.
Temporary Tables
Best for : Large intermediate results, multi-statement workflows, when you need indexes
CREATE TEMP TABLE active_orders AS
SELECT * FROM orders WHERE status = 'active';
CREATE INDEX ON active_orders(customer_id);
-- Now use in multiple queries
SELECT * FROM active_orders WHERE customer_id = 123;
SELECT COUNT(*) FROM active_orders GROUP BY region;
| Aspect | Behavior |
|---|---|
| Storage | Disk-backed (in temp tablespace) |
| Scope | Current session (dropped on disconnect) |
| Indexing | Fully indexable |
| Planning | Statistics gathered with
ANALYZE
|
| Typical overhead | Higher creation cost, faster repeated access |
Performance guideline : Consider temp tables when:
- Result set exceeds 100K-500K rows
- You need to query the same data multiple times
- You need indexes for efficient lookups
- You're doing multi-step ETL within a transaction
Views (Non-Materialized)
Best for : Reusable query logic, access control, schema abstraction
CREATE VIEW active_customers AS
SELECT c.*, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE c.active = true
GROUP BY c.id;
| Aspect | Behavior |
|---|---|
| Storage | No storage (query rewritten at execution) |
| Scope | Permanent (until dropped) |
| Indexing | Uses underlying table indexes |
| Planning | Merged with outer query (usually optimal) |
| Typical overhead | None beyond the underlying query |
Performance guideline : Views have virtually no overhead since they're query macros. Use them liberally for organization. Performance depends entirely on the underlying query.
Materialized Views
Best for : Expensive aggregations, reporting dashboards, data that changes infrequently
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total) AS revenue,
COUNT(*) AS order_count
FROM orders
GROUP BY 1;
-- Refresh when needed
REFRESH MATERIALIZED VIEW monthly_revenue;
| Aspect | Behavior |
|---|---|
| Storage | Persistent disk storage |
| Scope | Permanent (refreshed manually or via trigger) |
| Indexing | Fully indexable |
| Planning | Treated as a regular table |
| Typical overhead | Refresh cost; stale data between refreshes |
Performance guideline : Use materialized views when:
- The underlying query takes >1 second
- Data freshness of minutes/hours is acceptable
- The same aggregation is needed by multiple queries/users
Quick Decision Matrix
| Scenario | Recommended Approach |
|---|---|
| Breaking up a complex query for readability | CTE |
| Recursive tree/graph traversal | CTE (WITH RECURSIVE) |
| Same subquery used 2-3 times in one statement | CTE |
| Large result set (>500K rows) used multiple times | Temp table |
| Need indexes on intermediate results | Temp table |
| Reusable query logic across sessions | View |
| Expensive aggregation for dashboards | Materialized view |
| ETL pipeline with multiple transformation steps | Temp tables |
Benchmarking Example
When in doubt, measure. Here's a template for comparing approaches:
-- Test CTE approach
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
WITH filtered AS (
SELECT * FROM large_table WHERE category = 'A'
)
SELECT f.*, d.name
FROM filtered f
JOIN dimensions d ON f.dim_id = d.id
WHERE f.amount > 1000;
-- Test temp table approach
CREATE TEMP TABLE filtered AS
SELECT * FROM large_table WHERE category = 'A';
ANALYZE filtered;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT f.*, d.name
FROM filtered f
JOIN dimensions d ON f.dim_id = d.id
WHERE f.amount > 1000;
DROP TABLE filtered;
Compare the execution times and buffer usage. The winner often depends on your specific data distribution and query patterns.
Best Practices
- Name CTEs descriptively :
monthly_revenueis better thancte1 - Keep each CTE focused : One logical operation per CTE
- Consider the execution plan : Use
EXPLAIN ANALYZEto verify performance - Use
NOT MATERIALIZEDif you want to ensure the planner can optimize:WITH my_cte AS NOT MATERIALIZED (...) - Document recursive CTEs : They can be tricky to understand later
Wrapping Up
Common Table Expressions are a powerful tool for writing clear, maintainable SQL. They excel at breaking complex queries into logical steps, reusing intermediate results, and traversing hierarchical data with recursion.
But like any tool, they're not always the right choice. For simple queries, they add unnecessary complexity. For performance-critical code, always verify with
EXPLAIN ANALYZE
that the planner is handling them efficiently.
Master CTEs, and you'll find yourself writing SQL that's not just correct, but a pleasure to read and maintain.