← Back to all posts
Tutorial PostgreSQL

Common Table Expressions in PostgreSQL: A Complete Guide

By Tech TeamEngineeringJanuary 25, 202615 min read

If you've ever written a SQL query so nested that you lost track of which subquery was which, Common Table Expressions (CTEs) are about to become your new best friend. They're one of PostgreSQL's most powerful features for writing readable, maintainable SQL.

What Are Common Table Expressions?

A Common Table Expression, or CTE, is a temporary named result set that you define within a query. Think of it as creating a temporary view that exists only for the duration of that single query. You define CTEs using the WITH keyword.

Here's the basic syntax:

WITH cte_name AS (
    SELECT column1, column2
    FROM some_table
    WHERE condition
)
SELECT *
FROM cte_name;

The query inside the parentheses runs first, and its results become available under the name you specified. You can then reference that name in your main query as if it were a regular table.

Why Use CTEs?

1. Improved Readability

Compare these two approaches for finding customers who spent more than the average order amount:

Without CTE (nested subquery):

SELECT c.name, c.email, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.total > (
    SELECT AVG(total)
    FROM orders
    WHERE created_at > '2025-01-01'
);

With CTE:

WITH avg_order AS (
    SELECT AVG(total) AS average_total
    FROM orders
    WHERE created_at > '2025-01-01'
)
SELECT c.name, c.email, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
CROSS JOIN avg_order
WHERE o.total > avg_order.average_total;

The CTE version is longer, but it clearly separates the logic: first we calculate the average, then we use it. When queries get more complex, this separation becomes invaluable.

2. Reusing Results Multiple Times

CTEs really shine when you need to reference the same derived data multiple times:

WITH monthly_sales AS (
    SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(total) AS revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS previous_month,
    revenue - LAG(revenue) OVER (ORDER BY month) AS growth
FROM monthly_sales
ORDER BY month;

Without a CTE, you'd either repeat the aggregation logic or nest subqueries within subqueries.

3. Chaining Multiple CTEs

You can define multiple CTEs and have them reference each other:

WITH
active_users AS (
    SELECT id, name, email
    FROM users
    WHERE last_login > NOW() - INTERVAL '30 days'
),
user_orders AS (
    SELECT
        u.id,
        u.name,
        COUNT(o.id) AS order_count,
        SUM(o.total) AS total_spent
    FROM active_users u
    LEFT JOIN orders o ON u.id = o.customer_id
    GROUP BY u.id, u.name
),
user_segments AS (
    SELECT
        *,
        CASE
            WHEN total_spent > 1000 THEN 'high_value'
            WHEN total_spent > 100 THEN 'medium_value'
            ELSE 'low_value'
        END AS segment
    FROM user_orders
)
SELECT segment, COUNT(*), AVG(total_spent)
FROM user_segments
GROUP BY segment;

Each CTE builds on the previous one, creating a clear data transformation pipeline.

Recursive CTEs: The Real Superpower

The WITH RECURSIVE syntax unlocks queries that would be impossible with standard SQL. Recursive CTEs can reference themselves, allowing you to traverse hierarchical data or generate series.

Traversing a Hierarchy

Consider an employees table where each employee has a manager:

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name TEXT,
    manager_id INTEGER REFERENCES employees(id)
);

To find all employees under a specific manager, including indirect reports:

WITH RECURSIVE org_chart AS (
    -- Base case: start with the manager
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE id = 42

    UNION ALL

    -- Recursive case: find direct reports of people we've found
    SELECT e.id, e.name, e.manager_id, oc.depth + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY depth, name;

The query first selects the starting point (base case), then repeatedly joins to find more rows (recursive case) until no new rows are found.

Generating a Date Series

Need a row for every day in a range?

WITH RECURSIVE date_series AS (
    SELECT DATE '2025-01-01' AS date

    UNION ALL

    SELECT date + INTERVAL '1 day'
    FROM date_series
    WHERE date < DATE '2025-12-31'
)
SELECT * FROM date_series;

This generates 365 rows, one for each day of 2025. (Note: PostgreSQL's generate_series() function is often better for this specific use case, but recursive CTEs work when you need more complex logic.)

Finding Paths in a Graph

For a table of connections between nodes:

WITH RECURSIVE paths AS (
    SELECT
        start_node,
        end_node,
        ARRAY[start_node, end_node] AS path,
        1 AS hops
    FROM connections
    WHERE start_node = 'A'

    UNION ALL

    SELECT
        p.start_node,
        c.end_node,
        p.path || c.end_node,
        p.hops + 1
    FROM paths p
    JOIN connections c ON p.end_node = c.start_node
    WHERE NOT c.end_node = ANY(p.path)  -- Prevent cycles
      AND p.hops < 10  -- Limit depth
)
SELECT * FROM paths WHERE end_node = 'Z';

When NOT to Use CTEs

CTEs aren't always the right choice. Here are situations where you should reconsider:

1. Performance-Critical Simple Queries

In older PostgreSQL versions (before 12), CTEs were optimization fences . The planner couldn't push predicates into the CTE or merge it with the outer query. While PostgreSQL 12+ can inline non-recursive CTEs, there are still cases where a subquery performs better.

If your CTE is used only once and you notice performance issues, try rewriting it as a subquery:

-- CTE version
WITH filtered AS (
    SELECT * FROM large_table
)
SELECT * FROM filtered WHERE id = 123;

-- Subquery version (may be faster if CTE isn't inlined)
SELECT * FROM (
    SELECT * FROM large_table
) filtered WHERE id = 123;

-- Even better: just write it directly
SELECT * FROM large_table WHERE id = 123;

2. When Materialization Hurts Performance

By default, PostgreSQL 12+ inlines CTEs when possible. But if you use MATERIALIZED :

WITH my_cte AS MATERIALIZED (
    SELECT * FROM huge_table
)
SELECT * FROM my_cte WHERE rare_column = 'specific_value';

The entire CTE is computed and stored before filtering, which can be catastrophic for large tables. Only use MATERIALIZED when you know you need it (like when the CTE is referenced multiple times and is expensive to compute).

Disk space considerations : When PostgreSQL materializes a CTE, it writes the entire result set to disk (specifically to temporary files in pgsql_tmp if the data exceeds work_mem ). For a CTE returning millions of rows with wide columns, this can consume gigabytes of temporary disk space. If your server has limited temp storage or you're running multiple concurrent queries with materialized CTEs, you risk filling up disk space and crashing queries.

Monitor your disk space usage with:

-- Check current temp file usage
SELECT temp_files, temp_bytes
FROM pg_stat_database
WHERE datname = current_database();

If you see high temp file usage, consider whether your materialized CTEs are the culprit.

3. Overly Simple Cases

Don't use a CTE just because you can:

-- Overkill
WITH active AS (
    SELECT * FROM users WHERE active = true
)
SELECT name FROM active;

-- Just write this
SELECT name FROM users WHERE active = true;

CTEs add cognitive overhead. If the query is already simple and clear, keep it that way.

4. When Views or Temp Tables Are Better

If you're using the same CTE logic across multiple queries, consider:

  • Views : For reusable query logic that doesn't need to be materialized
  • Materialized Views : For expensive computations that can be refreshed periodically
  • Temporary Tables : For intermediate results needed across multiple statements in a transaction

5. Recursive CTEs Without Termination Conditions

Always ensure your recursive CTEs will terminate:

-- Dangerous: could run forever with circular references
WITH RECURSIVE bad AS (
    SELECT id, parent_id FROM nodes WHERE id = 1
    UNION ALL
    SELECT n.id, n.parent_id FROM nodes n JOIN bad b ON n.parent_id = b.id
)
SELECT * FROM bad;

-- Safe: includes cycle detection and depth limit
WITH RECURSIVE safe AS (
    SELECT id, parent_id, ARRAY[id] AS visited, 1 AS depth
    FROM nodes WHERE id = 1
    UNION ALL
    SELECT n.id, n.parent_id, s.visited || n.id, s.depth + 1
    FROM nodes n
    JOIN safe s ON n.parent_id = s.id
    WHERE NOT n.id = ANY(s.visited)
      AND s.depth < 100
)
SELECT * FROM safe;

Performance Comparisons: CTEs, Temp Tables, and Views

Understanding when to use CTEs versus temporary tables or views requires knowing their performance characteristics. Here's a practical comparison.

CTEs (Common Table Expressions)

Best for : Single-query readability, recursive operations, moderate data sizes

AspectBehavior
StorageIn-memory (spills to disk if exceeds work_mem )
ScopeSingle query only
IndexingNot indexable
PlanningRe-planned with each query execution
Typical overheadMinimal for small-medium results (<100K rows)

Performance guideline : Use CTEs freely for result sets under 100,000 rows. Above that, benchmark against alternatives.

Temporary Tables

Best for : Large intermediate results, multi-statement workflows, when you need indexes

CREATE TEMP TABLE active_orders AS
SELECT * FROM orders WHERE status = 'active';

CREATE INDEX ON active_orders(customer_id);

-- Now use in multiple queries
SELECT * FROM active_orders WHERE customer_id = 123;
SELECT COUNT(*) FROM active_orders GROUP BY region;
AspectBehavior
StorageDisk-backed (in temp tablespace)
ScopeCurrent session (dropped on disconnect)
IndexingFully indexable
PlanningStatistics gathered with ANALYZE
Typical overheadHigher creation cost, faster repeated access

Performance guideline : Consider temp tables when:

  • Result set exceeds 100K-500K rows
  • You need to query the same data multiple times
  • You need indexes for efficient lookups
  • You're doing multi-step ETL within a transaction

Views (Non-Materialized)

Best for : Reusable query logic, access control, schema abstraction

CREATE VIEW active_customers AS
SELECT c.*, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE c.active = true
GROUP BY c.id;
AspectBehavior
StorageNo storage (query rewritten at execution)
ScopePermanent (until dropped)
IndexingUses underlying table indexes
PlanningMerged with outer query (usually optimal)
Typical overheadNone beyond the underlying query

Performance guideline : Views have virtually no overhead since they're query macros. Use them liberally for organization. Performance depends entirely on the underlying query.

Materialized Views

Best for : Expensive aggregations, reporting dashboards, data that changes infrequently

CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(total) AS revenue,
    COUNT(*) AS order_count
FROM orders
GROUP BY 1;

-- Refresh when needed
REFRESH MATERIALIZED VIEW monthly_revenue;
AspectBehavior
StoragePersistent disk storage
ScopePermanent (refreshed manually or via trigger)
IndexingFully indexable
PlanningTreated as a regular table
Typical overheadRefresh cost; stale data between refreshes

Performance guideline : Use materialized views when:

  • The underlying query takes >1 second
  • Data freshness of minutes/hours is acceptable
  • The same aggregation is needed by multiple queries/users

Quick Decision Matrix

ScenarioRecommended Approach
Breaking up a complex query for readabilityCTE
Recursive tree/graph traversalCTE (WITH RECURSIVE)
Same subquery used 2-3 times in one statementCTE
Large result set (>500K rows) used multiple timesTemp table
Need indexes on intermediate resultsTemp table
Reusable query logic across sessionsView
Expensive aggregation for dashboardsMaterialized view
ETL pipeline with multiple transformation stepsTemp tables

Benchmarking Example

When in doubt, measure. Here's a template for comparing approaches:

-- Test CTE approach
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
WITH filtered AS (
    SELECT * FROM large_table WHERE category = 'A'
)
SELECT f.*, d.name
FROM filtered f
JOIN dimensions d ON f.dim_id = d.id
WHERE f.amount > 1000;

-- Test temp table approach
CREATE TEMP TABLE filtered AS
SELECT * FROM large_table WHERE category = 'A';
ANALYZE filtered;

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT f.*, d.name
FROM filtered f
JOIN dimensions d ON f.dim_id = d.id
WHERE f.amount > 1000;

DROP TABLE filtered;

Compare the execution times and buffer usage. The winner often depends on your specific data distribution and query patterns.

Best Practices

  1. Name CTEs descriptively : monthly_revenue is better than cte1
  2. Keep each CTE focused : One logical operation per CTE
  3. Consider the execution plan : Use EXPLAIN ANALYZE to verify performance
  4. Use NOT MATERIALIZED if you want to ensure the planner can optimize: WITH my_cte AS NOT MATERIALIZED (...)
  5. Document recursive CTEs : They can be tricky to understand later

Wrapping Up

Common Table Expressions are a powerful tool for writing clear, maintainable SQL. They excel at breaking complex queries into logical steps, reusing intermediate results, and traversing hierarchical data with recursion.

But like any tool, they're not always the right choice. For simple queries, they add unnecessary complexity. For performance-critical code, always verify with EXPLAIN ANALYZE that the planner is handling them efficiently.

Master CTEs, and you'll find yourself writing SQL that's not just correct, but a pleasure to read and maintain.

Need help with your database? 💬

Connect to any database, write blazing-fast queries, and share them with your team.

Get MightyDB