PostgreSQL (or just "Postgres") is the database that keeps winning developers over. It's open-source, rock-solid, and packed with features that other databases charge enterprise prices for. Whether you're building a side project or architecting a system that needs to handle millions of requests, Postgres has your back.
This tutorial takes you from zero to production-ready. We'll cover everything — installation, core SQL, PostgreSQL-specific superpowers, and the stuff you actually need to know before deploying to production. Every example is runnable, so fire up a terminal and follow along.
What is PostgreSQL and Why Learn It?
PostgreSQL is an open-source relational database management system (RDBMS) that's been around since the 1980s. It started as a research project at UC Berkeley, and today it powers everything from startups to Fortune 500 companies.
So why pick Postgres over the dozens of other databases out there?
- Standards compliant. Postgres follows the SQL standard more closely than any other major database. What you learn here transfers to other systems.
- Feature-rich. JSONB, full-text search, window functions, CTEs, custom types, extensions — Postgres does things out of the box that other databases need plugins (or entirely separate services) for.
- Reliable. ACID-compliant with a track record spanning decades. Your data is safe.
- Extensible. Need vector search for AI? PostGIS for maps? Time-series data? There's an extension for that.
- Free. No licensing costs, no feature tiers. Everything is included.
PostgreSQL vs Other Databases (Quick Comparison)
| Feature | PostgreSQL | MySQL | SQLite | MongoDB |
|---|---|---|---|---|
| ACID compliance | Full | Full (InnoDB) | Full | Multi-doc (4.0+) |
| JSON support | JSONB (indexed, queryable) | JSON (limited) | JSON1 extension | Native |
| Full-text search | Built-in | Built-in (basic) | FTS5 extension | Atlas Search |
| Extensibility | Excellent (extensions) | Limited | Limited | Limited |
| License | PostgreSQL (MIT-like) | GPL / Commercial | Public Domain | SSPL |
| Best for | General purpose, complex queries | Web apps, read-heavy | Embedded, mobile | Document-heavy apps |
Real-World PostgreSQL Use Cases
Postgres isn't just for CRUD apps. Here's where it really shines:
- Web application backends — Discord, Instagram, and Supabase all run on Postgres
- Geospatial applications — PostGIS makes Postgres a full-featured GIS
- Analytics and reporting — Window functions and materialized views handle complex data analysis
- Multi-tenant SaaS — Row-level security and schemas make tenant isolation straightforward
- AI/ML applications — pgvector enables semantic search alongside your relational data
Installing PostgreSQL
Let's get Postgres running on your machine. Pick your operating system below.
Installation on macOS
The easiest way is with Homebrew:
brew install postgresql@17
Start the service:
brew services start postgresql@17
Verify it's running:
psql postgres
You should see the `postgres=#` prompt. Type `\q` to exit.
Installation on Ubuntu/Debian
Ubuntu includes PostgreSQL in its default repositories, but for the latest version, use the official PostgreSQL apt repository:
# Import the repository signing key (apt-key is deprecated on current releases)
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /usr/share/keyrings/pgdg.gpg
# Add the PostgreSQL APT repository
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/pgdg.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
# Update and install
sudo apt update
sudo apt install postgresql-17
PostgreSQL starts automatically after installation. Verify:
sudo systemctl status postgresql
Switch to the `postgres` user to access the database:
sudo -u postgres psql
Installation on Windows
- Download the installer from postgresql.org/download/windows
- Run the installer and follow the prompts
- Set a password for the `postgres` superuser when asked (remember this!)
- The installer includes pgAdmin, a graphical management tool
After installation, open SQL Shell (psql) from the Start Menu to connect.
Docker Installation (Recommended for Development)
Docker is a great option if you want a clean, isolated setup:
docker run --name my-postgres \
-e POSTGRES_PASSWORD=mysecretpassword \
-p 5432:5432 \
-d postgres:17
Connect to it:
docker exec -it my-postgres psql -U postgres
Docker is ideal for development because you can spin up multiple Postgres versions, blow them away, and start fresh without affecting your system. For production, a managed service or dedicated server is usually a better choice.
Common Installation Issues and Fixes
Port 5432 already in use? Another Postgres instance (or a different service) is using the default port. Either stop the other process or use a different port:
# On macOS/Linux, find what's using the port
lsof -i :5432
# Or start Postgres on a different port
# In postgresql.conf, change: port = 5433
Permission denied on Linux? PostgreSQL creates a `postgres` system user during installation. Connect through that user first, then create your own role:
sudo -u postgres createuser --interactive
Can't connect from another machine? By default, Postgres only listens on localhost. Edit `postgresql.conf` to set `listen_addresses = '*'` and update `pg_hba.conf` to allow remote connections.
PostgreSQL Fundamentals
Now that Postgres is running, let's get comfortable with the basics.
Connecting to PostgreSQL
psql is the built-in command-line client. It's powerful and worth learning:
# Connect to a specific database
psql -h localhost -p 5432 -U myuser -d mydb
# Connect with a connection string
psql "postgresql://myuser:mypassword@localhost:5432/mydb"
Once connected, here are the psql commands you'll use constantly:
| Command | What it does |
|---|---|
| `\l` | List all databases |
| `\c dbname` | Connect to a different database |
| `\dt` | List tables in current schema |
| `\d tablename` | Describe a table (columns, types, indexes) |
| `\di` | List indexes |
| `\df` | List functions |
| `\x` | Toggle expanded output (great for wide rows) |
| `\timing` | Toggle query timing display |
| `\q` | Quit psql |
If you prefer a graphical interface, check out pgAdmin, DBeaver, or a modern SQL client like MightyDB, which makes it easy to write, share, and collaborate on queries.
PostgreSQL Architecture: Databases, Schemas, Tables
Postgres organizes data in a hierarchy:
PostgreSQL Cluster (server instance)
├── Database: myapp
│ ├── Schema: public (default)
│ │ ├── Table: users
│ │ ├── Table: orders
│ │ └── Table: products
│ └── Schema: analytics
│ ├── Table: events
│ └── Table: page_views
├── Database: testing
│ └── Schema: public
│ └── ...
- A cluster is a single PostgreSQL server instance
- A database is an isolated container — you can't query across databases easily
- A schema is a namespace within a database — like folders for your tables
- Tables hold your actual data
Every database starts with a `public` schema. You can create more to organize things:
CREATE SCHEMA analytics;
CREATE TABLE analytics.events (
id SERIAL PRIMARY KEY,
event_name TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
);
Creating Your First Database
Let's create a database and start building:
-- Create a new database
CREATE DATABASE tutorial;
-- Connect to it (in psql)
\c tutorial
Or from the command line:
createdb tutorial
psql tutorial
Essential SQL Operations in PostgreSQL
Time to write some SQL. We'll build a small bookstore database to make things concrete.
Data Types (PostgreSQL-Specific Types)
Postgres has all the standard SQL types plus some unique ones:
| Type | Description | Example |
|---|---|---|
| `INTEGER` / `BIGINT` | Whole numbers | `42`, `9999999999` |
| `NUMERIC(p,s)` | Exact decimal | `19.99` |
| `TEXT` | Variable-length string | `'Hello world'` |
| `VARCHAR(n)` | Variable-length with limit | `'Hello'` (max n chars) |
| `BOOLEAN` | True/false | `TRUE`, `FALSE` |
| `TIMESTAMPTZ` | Timestamp with timezone | `'2026-02-09 14:30:00+00'` |
| `DATE` | Date only | `'2026-02-09'` |
| `UUID` | Universally unique identifier | `gen_random_uuid()` |
| `JSONB` | Binary JSON (indexed!) | `'{"key": "value"}'` |
| `TEXT[]` | Array of text | `ARRAY['a', 'b', 'c']` |
| `INET` | IP address | `'192.168.1.1'` |
| `TSQUERY` | Full-text search query | `to_tsquery('hello & world')` |
Use `TEXT` instead of `VARCHAR` in most cases. In PostgreSQL, there's no performance difference between `TEXT` and `VARCHAR`. Use `VARCHAR(n)` only when you need to enforce a maximum length as a business rule.
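To see the difference in practice, here's a quick sketch (the `length_demo` table is hypothetical):

```sql
-- VARCHAR(n) enforces its limit; TEXT accepts any length
CREATE TABLE length_demo (
    code VARCHAR(3),
    note TEXT
);

INSERT INTO length_demo VALUES ('ABC', 'any length is fine here');
INSERT INTO length_demo VALUES ('ABCD', 'x');
-- ERROR:  value too long for type character varying(3)
```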
Creating and Modifying Tables
Let's build our bookstore:
-- Authors table
CREATE TABLE authors (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
bio TEXT,
born_date DATE,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Books table
CREATE TABLE books (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
author_id INTEGER REFERENCES authors(id),
isbn VARCHAR(13) UNIQUE,
price NUMERIC(10, 2) NOT NULL,
pages INTEGER,
published_date DATE,
tags TEXT[] DEFAULT '{}',
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Reviews table
CREATE TABLE reviews (
id SERIAL PRIMARY KEY,
book_id INTEGER REFERENCES books(id) ON DELETE CASCADE,
reviewer_name TEXT NOT NULL,
rating INTEGER CHECK (rating BETWEEN 1 AND 5),
comment TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
);
Notice a few PostgreSQL-specific things here:
- `SERIAL` auto-increments (Postgres also supports the SQL-standard `GENERATED ALWAYS AS IDENTITY`)
- `REFERENCES` creates foreign keys
- `ON DELETE CASCADE` removes reviews when a book is deleted
- `TEXT[]` creates an array column
- `JSONB` stores flexible structured data
- `CHECK` adds a validation constraint
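For comparison, here's what the SQL-standard identity syntax looks like, sketched on a hypothetical publishers table:

```sql
CREATE TABLE publishers (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL
);

-- The id is generated for you; inserting an explicit id requires
-- OVERRIDING SYSTEM VALUE, which catches accidental id inserts early
INSERT INTO publishers (name) VALUES ('Ace Books');
```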
Modifying tables after creation:
-- Add a column
ALTER TABLE books ADD COLUMN language TEXT DEFAULT 'English';
-- Rename a column
ALTER TABLE books RENAME COLUMN pages TO page_count;
-- Change a column type
ALTER TABLE books ALTER COLUMN page_count TYPE SMALLINT;
-- Drop a column
ALTER TABLE books DROP COLUMN language;
-- Rename the column back so later examples can keep using "pages"
ALTER TABLE books RENAME COLUMN page_count TO pages;
CRUD Operations
Create (INSERT):
-- Insert a single row
INSERT INTO authors (name, bio, born_date)
VALUES ('Ursula K. Le Guin', 'American author of science fiction', '1929-10-21');
-- Insert multiple rows
INSERT INTO books (title, author_id, isbn, price, pages, published_date, tags)
VALUES
('A Wizard of Earthsea', 1, '9780547722023', 9.99, 183, '1968-11-01', ARRAY['fantasy', 'classic']),
('The Left Hand of Darkness', 1, '9780441478125', 11.99, 304, '1969-03-01', ARRAY['sci-fi', 'classic']),
('The Dispossessed', 1, '9780060512750', 13.99, 387, '1974-05-01', ARRAY['sci-fi', 'utopia']);
-- Insert and return the new row
INSERT INTO reviews (book_id, reviewer_name, rating, comment)
VALUES (1, 'Alice', 5, 'A masterpiece of fantasy literature')
RETURNING *;
Read (SELECT):
-- Basic select
SELECT title, price FROM books;
-- With filtering
SELECT title, price FROM books
WHERE price < 12.00 AND 'classic' = ANY(tags);
-- Sorting and limiting
SELECT title, price FROM books
ORDER BY price DESC
LIMIT 5;
-- Pattern matching
SELECT title FROM books WHERE title ILIKE '%dark%';
Update:
-- Update specific rows
UPDATE books SET price = 10.99
WHERE isbn = '9780547722023';
-- Update with RETURNING to see what changed
UPDATE books SET price = price * 1.10
WHERE author_id = 1
RETURNING title, price;
Delete:
-- Delete specific rows
DELETE FROM reviews WHERE rating < 2;
-- Delete and see what was removed
DELETE FROM books WHERE published_date < '1970-01-01'
RETURNING title;
The `RETURNING` clause is a PostgreSQL superpower. It lets you see exactly what was inserted, updated, or deleted without running a separate `SELECT`. Use it generously.
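RETURNING also pairs nicely with PostgreSQL's upsert syntax. A sketch, relying on the UNIQUE constraint on isbn from the table definition above:

```sql
-- Insert, or update the price if the ISBN already exists
INSERT INTO books (title, author_id, isbn, price, pages)
VALUES ('A Wizard of Earthsea', 1, '9780547722023', 8.99, 183)
ON CONFLICT (isbn)
DO UPDATE SET price = EXCLUDED.price
RETURNING title, price;
```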
Querying Data: SELECT, WHERE, JOIN
Joins are how you connect related data across tables. Let's see the main types:
-- INNER JOIN: Only matching rows from both tables
SELECT b.title, a.name AS author
FROM books b
INNER JOIN authors a ON b.author_id = a.id;
-- LEFT JOIN: All books, even those without an author
SELECT b.title, a.name AS author
FROM books b
LEFT JOIN authors a ON b.author_id = a.id;
-- Join three tables
SELECT
b.title,
a.name AS author,
r.rating,
r.comment
FROM books b
JOIN authors a ON b.author_id = a.id
JOIN reviews r ON r.book_id = b.id
ORDER BY r.rating DESC;
Aggregations summarize data:
-- Average rating per book
SELECT
b.title,
COUNT(r.id) AS review_count,
ROUND(AVG(r.rating), 1) AS avg_rating
FROM books b
LEFT JOIN reviews r ON r.book_id = b.id
GROUP BY b.id, b.title
HAVING COUNT(r.id) > 0
ORDER BY avg_rating DESC;
Subqueries nest one query inside another:
-- Books with above-average prices
SELECT title, price
FROM books
WHERE price > (SELECT AVG(price) FROM books);
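Correlated subqueries with EXISTS are another common pattern, for example finding books that have at least one five-star review:

```sql
SELECT title
FROM books b
WHERE EXISTS (
    SELECT 1
    FROM reviews r
    WHERE r.book_id = b.id
      AND r.rating = 5
);
```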
Intermediate PostgreSQL Concepts
You've got the basics down. Now let's level up with the concepts that separate beginners from competent Postgres users.
Indexes and When to Use Them
Indexes make queries fast. Without them, Postgres has to scan every row in a table (a "sequential scan") to find matches. With the right index, it can jump directly to the rows it needs.
-- Create an index on a frequently queried column
CREATE INDEX idx_books_author_id ON books(author_id);
-- Composite index for queries that filter on multiple columns
CREATE INDEX idx_books_author_price ON books(author_id, price);
-- Partial index: only index rows that match a condition
CREATE INDEX idx_books_expensive ON books(price) WHERE price > 50.00;
-- Index for text search patterns (requires the pg_trgm extension)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_books_title_trgm ON books USING gin(title gin_trgm_ops);
When to create indexes:
- Columns frequently used in `WHERE` clauses
- Columns used in `JOIN` conditions
- Columns used in `ORDER BY`
When NOT to create indexes:
- Small tables (under a few thousand rows) — sequential scans are fine
- Columns that are rarely queried
- Tables with heavy write workloads and few reads — indexes slow down inserts/updates
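If you're unsure whether an existing index earns its keep, the statistics views track how often each one is scanned. A quick check (counts accumulate until the statistics are reset):

```sql
-- Indexes that are rarely scanned are candidates for removal
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'books'
ORDER BY idx_scan ASC;
```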
Constraints and Data Integrity
Constraints protect your data from invalid states:
-- NOT NULL: column must have a value
ALTER TABLE books ALTER COLUMN title SET NOT NULL;
-- UNIQUE: no duplicate values
ALTER TABLE authors ADD CONSTRAINT unique_author_name UNIQUE (name);
-- CHECK: custom validation
ALTER TABLE books ADD CONSTRAINT positive_price CHECK (price > 0);
-- FOREIGN KEY: referential integrity
ALTER TABLE books ADD CONSTRAINT fk_author
FOREIGN KEY (author_id) REFERENCES authors(id);
-- EXCLUSION: prevent overlapping ranges (PostgreSQL-specific)
CREATE EXTENSION IF NOT EXISTS btree_gist; -- lets GiST handle plain equality (room_id WITH =)
CREATE TABLE room_bookings (
room_id INTEGER,
booked_during TSTZRANGE,
EXCLUDE USING gist (room_id WITH =, booked_during WITH &&)
);
That last one — exclusion constraints — is pure Postgres magic. It prevents double-booking a room by ensuring no two rows can have the same `room_id` with overlapping time ranges. Try doing that with a simple `UNIQUE` constraint!
Transactions and ACID Properties
Transactions ensure that a group of operations either all succeed or all fail. This is critical for data integrity:
BEGIN;
-- Transfer money between accounts
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If everything looks good
COMMIT;
-- If something went wrong
-- ROLLBACK;
PostgreSQL is fully ACID compliant:
- Atomicity — All changes in a transaction succeed or none do
- Consistency — The database moves from one valid state to another
- Isolation — Concurrent transactions don't interfere with each other
- Durability — Committed data survives crashes
Savepoints let you partially roll back within a transaction:
BEGIN;
INSERT INTO orders (customer_id, total) VALUES (1, 99.99);
SAVEPOINT before_items;
INSERT INTO order_items (order_id, product_id) VALUES (1, 999);
-- Oops, product 999 doesn't exist!
ROLLBACK TO before_items;
-- Continue with valid data
INSERT INTO order_items (order_id, product_id) VALUES (1, 1);
COMMIT;
Views and Common Table Expressions
Views save a query for reuse — like a virtual table:
CREATE VIEW book_stats AS
SELECT
b.id,
b.title,
a.name AS author,
COUNT(r.id) AS review_count,
COALESCE(ROUND(AVG(r.rating), 1), 0) AS avg_rating
FROM books b
JOIN authors a ON b.author_id = a.id
LEFT JOIN reviews r ON r.book_id = b.id
GROUP BY b.id, b.title, a.name;
-- Now use it like a table
SELECT * FROM book_stats WHERE avg_rating >= 4.0;
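When the view's query is expensive, a materialized view stores the result on disk instead of recomputing it on every read. The trade-off is that you must refresh it yourself. A sketch built on the book_stats view above:

```sql
CREATE MATERIALIZED VIEW book_stats_cached AS
SELECT * FROM book_stats;

-- Rerun the underlying query and replace the stored rows
REFRESH MATERIALIZED VIEW book_stats_cached;
```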
Common Table Expressions (CTEs) break complex queries into readable steps:
WITH top_authors AS (
SELECT
a.id,
a.name,
COUNT(b.id) AS book_count
FROM authors a
JOIN books b ON b.author_id = a.id
GROUP BY a.id, a.name
HAVING COUNT(b.id) >= 3
),
author_ratings AS (
SELECT
ta.name,
ta.book_count,
ROUND(AVG(r.rating), 1) AS avg_rating
FROM top_authors ta
JOIN books b ON b.author_id = ta.id
JOIN reviews r ON r.book_id = b.id
GROUP BY ta.name, ta.book_count
)
SELECT * FROM author_ratings
ORDER BY avg_rating DESC;
CTEs make complex queries readable. Each `WITH` block is a named intermediate result that the next block (or the final query) can reference. You'll find more detailed examples in our guide to CTEs.
PostgreSQL-Specific Features
This is where Postgres really separates itself from the pack. These features either don't exist in other databases or require expensive add-ons.
Working with JSON/JSONB Data
JSONB is one of PostgreSQL's killer features. It lets you store structured, flexible data alongside your relational tables — and it's fully indexed and queryable.
-- Insert JSON data
INSERT INTO books (title, author_id, isbn, price, pages, metadata)
VALUES (
'Data-Driven Design',
1,
'9781234567890',
29.99,
250,
'{
"edition": 2,
"formats": ["hardcover", "ebook", "audiobook"],
"awards": [
{"name": "Best Tech Book", "year": 2025}
],
"dimensions": {"height": 23, "width": 15}
}'
);
-- Query JSON fields
SELECT
title,
metadata->>'edition' AS edition,
metadata->'dimensions'->>'height' AS height_cm
FROM books
WHERE metadata @> '{"edition": 2}';
-- Query inside JSON arrays
SELECT title
FROM books
WHERE metadata->'formats' ? 'audiobook';
-- Update a JSON field
UPDATE books
SET metadata = jsonb_set(metadata, '{edition}', '3')
WHERE isbn = '9781234567890';
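To keep containment and existence queries fast on large tables, add a GIN index on the JSONB column. A sketch; the `jsonb_path_ops` variant is smaller but accelerates only `@>`:

```sql
-- Default opclass: supports @>, ?, ?|, ?&
CREATE INDEX idx_books_metadata ON books USING gin(metadata);

-- Smaller alternative that accelerates only @>
CREATE INDEX idx_books_metadata_path ON books USING gin(metadata jsonb_path_ops);
```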
Key JSONB operators:
| Operator | Meaning | Example |
|---|---|---|
| `->` | Get JSON element (as JSON) | `data->'key'` |
| `->>` | Get JSON element (as text) | `data->>'key'` |
| `@>` | Contains | `data @> '{"key": "val"}'` |
| `?` | Key exists | `data ? 'key'` |
| `\|\|` | Concatenate/merge | `data \|\| '{"new": true}'` |
Always use JSONB, not JSON. JSONB stores data in a decomposed binary format that's faster to query and supports indexing. Plain JSON stores the raw text — it's only useful when you need to preserve exact formatting or key ordering.
Array Data Types and Operations
Arrays let you store lists of values in a single column — no join table needed for simple cases:
-- Books already have a tags array!
SELECT title, tags FROM books;
-- Query arrays
SELECT title FROM books WHERE 'sci-fi' = ANY(tags);
-- Array contains all of these
SELECT title FROM books WHERE tags @> ARRAY['sci-fi', 'classic'];
-- Array overlap (contains any of these)
SELECT title FROM books WHERE tags && ARRAY['fantasy', 'utopia'];
-- Unnest arrays into rows
SELECT title, unnest(tags) AS tag FROM books;
-- Aggregate into arrays
SELECT
author_id,
array_agg(title ORDER BY published_date) AS books_chronological
FROM books
GROUP BY author_id;
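Array queries with `@>` and `&&` can also use a GIN index, which matters once the table grows:

```sql
-- Speeds up tags @> ARRAY[...] and tags && ARRAY[...]
-- (a plain 'x' = ANY(tags) filter won't use this index)
CREATE INDEX idx_books_tags ON books USING gin(tags);
```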
Full-Text Search Basics
PostgreSQL has a built-in full-text search engine that handles stemming, ranking, and language-aware tokenization:
-- Simple full-text search
SELECT title
FROM books
WHERE to_tsvector('english', title) @@ to_tsquery('english', 'wizard');
-- Add a search index for performance
ALTER TABLE books ADD COLUMN search_vector TSVECTOR
GENERATED ALWAYS AS (
to_tsvector('english', coalesce(title, '') || ' ' || coalesce(metadata->>'description', ''))
) STORED;
CREATE INDEX idx_books_search ON books USING gin(search_vector);
-- Now search is fast
SELECT title, ts_rank(search_vector, query) AS rank
FROM books, to_tsquery('english', 'fantasy | wizard') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
For many applications, Postgres full-text search is powerful enough that you don't need a separate search service like Elasticsearch.
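On PostgreSQL 11 and later, `websearch_to_tsquery` accepts search-engine-style input (quoted phrases, minus for exclusion), which is friendlier for user-facing search boxes. A sketch reusing the search_vector column above:

```sql
SELECT title
FROM books
WHERE search_vector @@ websearch_to_tsquery('english', '"left hand" -wizard');
```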
PostgreSQL Extensions
Extensions are add-on packages that give Postgres new superpowers. Here are a few you should know about:
-- Enable an extension
CREATE EXTENSION IF NOT EXISTS pg_trgm; -- fuzzy text matching
CREATE EXTENSION IF NOT EXISTS pgcrypto; -- encryption & hashing
-- Generate UUIDs (built into Postgres 13+)
SELECT gen_random_uuid();
-- Fuzzy matching with pg_trgm
SELECT name, similarity(name, 'Ursla Le Gin') AS sim
FROM authors
WHERE name % 'Ursla Le Gin' -- finds despite typos
ORDER BY sim DESC;
Popular extensions include PostGIS (geospatial), pgvector (AI vector search), pg_stat_statements (query performance monitoring), and pg_cron (scheduled jobs). Check out our guide to PostgreSQL extensions for the full rundown.
Practice Projects and Exercises
Theory only goes so far. Here are three projects that put what you've learned into practice, each building on the concepts from previous sections.
Beginner Project: Blog Database Schema
Design and populate a blog platform database:
-- Create the schema
CREATE TABLE blog_users (
id SERIAL PRIMARY KEY,
username TEXT UNIQUE NOT NULL,
email TEXT UNIQUE NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
author_id INTEGER REFERENCES blog_users(id),
title TEXT NOT NULL,
slug TEXT UNIQUE NOT NULL,
body TEXT NOT NULL,
status TEXT DEFAULT 'draft' CHECK (status IN ('draft', 'published', 'archived')),
published_at TIMESTAMPTZ,
tags TEXT[] DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
post_id INTEGER REFERENCES posts(id) ON DELETE CASCADE,
author_id INTEGER REFERENCES blog_users(id),
body TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Exercise: Write queries to...
-- 1. Find the 5 most recent published posts with author names
-- 2. Count comments per post
-- 3. Find all posts tagged with 'postgresql'
-- 4. Get the most active commenters
Intermediate Project: E-commerce System
This one exercises joins, aggregations, and transactions:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10,2) NOT NULL CHECK (price > 0),
stock INTEGER NOT NULL DEFAULT 0 CHECK (stock >= 0)
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
status TEXT DEFAULT 'pending' CHECK (status IN ('pending', 'paid', 'shipped', 'delivered')),
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(id) ON DELETE CASCADE,
product_id INTEGER REFERENCES products(id),
quantity INTEGER NOT NULL CHECK (quantity > 0),
unit_price NUMERIC(10,2) NOT NULL
);
-- Exercise: Write queries to...
-- 1. Place an order (use a transaction to deduct stock)
-- 2. Calculate total revenue per month
-- 3. Find the top 10 customers by total spend
-- 4. List products that are running low on stock
Advanced Project: Multi-Tenant SaaS Database
This project uses schemas and row-level security — real patterns from production SaaS apps:
-- Each tenant gets their own schema
CREATE SCHEMA tenant_acme;
CREATE SCHEMA tenant_globex;
-- Or use RLS for shared-table multi-tenancy
CREATE TABLE projects (
id SERIAL PRIMARY KEY,
tenant_id INTEGER NOT NULL,
name TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Enable Row Level Security
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
-- Users can only see their own tenant's data
CREATE POLICY tenant_isolation ON projects
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::INTEGER);
-- Set the tenant context before querying
SET app.tenant_id = '1';
SELECT * FROM projects; -- Only sees tenant 1's projects
Performance and Optimization Basics
You don't need to be a DBA to write reasonably fast queries. Here are the fundamentals.
Understanding EXPLAIN
`EXPLAIN` shows you how PostgreSQL plans to execute a query. `EXPLAIN ANALYZE` actually runs it and shows real timing:
EXPLAIN ANALYZE
SELECT b.title, a.name
FROM books b
JOIN authors a ON b.author_id = a.id
WHERE b.price > 10.00;
Output looks something like:
Hash Join (cost=1.04..2.15 rows=2 width=64) (actual time=0.035..0.037 rows=2 loops=1)
Hash Cond: (b.author_id = a.id)
-> Seq Scan on books b (cost=0.00..1.05 rows=2 width=40) (actual time=0.009..0.010 rows=2 loops=1)
Filter: (price > 10.00)
-> Hash (cost=1.01..1.01 rows=1 width=36) (actual time=0.008..0.008 rows=1 loops=1)
-> Seq Scan on authors a (cost=0.00..1.01 rows=1 width=36) (actual time=0.003..0.004 rows=1 loops=1)
Planning Time: 0.150 ms
Execution Time: 0.062 ms
Things to watch for:
- Seq Scan on large tables — might need an index
- Rows (estimated vs actual) — big differences mean stale statistics (run `ANALYZE`)
- Nested Loop with many iterations — could be slow on large datasets
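The BUFFERS option adds how much data each plan node read from cache versus disk, which often explains why a query is slow:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT b.title, a.name
FROM books b
JOIN authors a ON b.author_id = a.id
WHERE b.price > 10.00;
```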
Query Optimization Essentials
Rule 1: Add indexes for your WHERE clauses.
-- Before: Seq Scan on 1M rows
SELECT * FROM orders WHERE customer_id = 42;
-- Add an index
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- After: Index Scan (much faster)
Rule 2: Only select what you need.
-- Bad: selects every column
SELECT * FROM orders;
-- Better: only the columns you need
SELECT id, status, created_at FROM orders;
Rule 3: Use `EXPLAIN ANALYZE` to verify your optimizations actually help.
Don't guess. Measure.
Connection Pooling and Configuration
PostgreSQL creates a new process for each connection, which uses memory. In production, use a connection pooler like PgBouncer to share connections across your application:
# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
Key `postgresql.conf` settings to tune:
| Setting | Default | Recommended | What it does |
|---|---|---|---|
shared_buffers
| 128MB | 25% of RAM | Cache for frequently accessed data |
work_mem
| 4MB | 16-64MB | Memory for sorts and hash operations |
effective_cache_size
| 4GB | 75% of RAM | Helps planner estimate index vs scan cost |
maintenance_work_mem
| 64MB | 512MB-1GB | Memory for VACUUM, CREATE INDEX |
Don't blindly set `work_mem` too high. It's allocated per operation, per connection. If you have 100 connections and set `work_mem = 256MB`, a complex query with multiple sort operations could consume gigabytes of RAM.
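A safer pattern is raising `work_mem` only where you need it, per session or per transaction:

```sql
-- For one heavy reporting session
SET work_mem = '256MB';
-- ... run the big query ...
RESET work_mem;

-- Or scoped to a single transaction
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... query ...
COMMIT;
```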
Moving to Production
Building locally is one thing. Running PostgreSQL in production requires some extra care.
Security Best Practices
1. Use strong passwords and limit superuser access:
-- Create application users with minimal privileges
CREATE ROLE app_user WITH LOGIN PASSWORD 'strong-random-password';
GRANT CONNECT ON DATABASE myapp TO app_user;
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
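One gotcha: `GRANT ... ON ALL TABLES` only covers tables that exist right now. For tables created later, set default privileges too:

```sql
-- Applies to tables created in public from now on
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;
```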
2. Configure `pg_hba.conf` carefully:
# Only allow specific IPs with encrypted passwords
host myapp app_user 10.0.0.0/8 scram-sha-256
host all all 0.0.0.0/0 reject
3. Enable SSL/TLS for connections:
# In postgresql.conf
ssl = on
ssl_cert_file = '/path/to/server.crt'
ssl_key_file = '/path/to/server.key'
4. Keep PostgreSQL updated. Security patches are released regularly. Stay current.
Backup and Recovery Strategies
Logical backups with `pg_dump` — best for smaller databases and migrations:
# Backup a single database
pg_dump -Fc mydb > mydb_backup.dump
# Restore
pg_restore -d mydb mydb_backup.dump
Physical backups with `pg_basebackup` — best for large databases and point-in-time recovery:
# Full base backup
pg_basebackup -D /backup/base -Ft -z -P
Continuous archiving enables point-in-time recovery (PITR) by archiving WAL (Write-Ahead Log) files:
# In postgresql.conf
archive_mode = on
archive_command = 'cp %p /archive/%f'
Test your backups regularly. A backup you haven't tested is not a backup. Schedule monthly restore tests to a separate server.
Monitoring Your PostgreSQL Instance
At minimum, monitor these:
Active connections:
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
Long-running queries:
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
AND now() - query_start > interval '5 minutes';
Table bloat (dead tuples):
SELECT relname, n_dead_tup, n_live_tup,
round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
Cache hit ratio (should be > 99%):
SELECT
sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;
Tools like pg_stat_statements , pgMonitor , and Grafana dashboards make monitoring much easier in practice.
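As a starting point with pg_stat_statements (it must be listed in `shared_preload_libraries` before the extension can be created; the column names below are from PostgreSQL 13+):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- The five queries consuming the most total execution time
SELECT query, calls, round(mean_exec_time::numeric, 2) AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```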
Troubleshooting Common Issues
Here's a quick reference for problems you'll likely run into:
"FATAL: too many connections" — You've hit the `max_connections` limit. Either increase it (not ideal) or use a connection pooler like PgBouncer.
"ERROR: deadlock detected" — Two transactions are waiting on each other. Review your transaction logic and try to acquire locks in a consistent order.
Queries suddenly got slow — Statistics might be stale. Run `ANALYZE` on the affected tables, or check if autovacuum is keeping up.
"ERROR: could not resize shared memory segment" — Usually happens on macOS when `shared_buffers` is set too high. Increase `kern.sysv.shmmax` or lower `shared_buffers`.
Disk space filling up — Check for WAL accumulation (the `pg_wal/` directory), table bloat, or temp files from large sorts. Running `VACUUM FULL` reclaims space but locks the table.
FAQ
Should I use PostgreSQL or MySQL?
For most new projects, PostgreSQL is the better choice. It has stricter SQL compliance, better JSON support, more advanced features, and stronger data integrity guarantees. MySQL is still solid for read-heavy web applications, but Postgres does everything MySQL does and more.
How much data can PostgreSQL handle?
PostgreSQL can handle terabytes of data on a single server. With extensions like Citus for sharding, it can scale to petabytes across multiple nodes. For most applications, a well-tuned single Postgres server is more than enough.
Is PostgreSQL free for commercial use?
Yes. PostgreSQL uses a permissive open-source license (similar to MIT/BSD). You can use it for any purpose — commercial or otherwise — without paying licensing fees.
What's the difference between `SERIAL` and `GENERATED ALWAYS AS IDENTITY`?
Both auto-generate integer values. `GENERATED ALWAYS AS IDENTITY` is the SQL-standard way and is recommended for new projects. `SERIAL` is a PostgreSQL shorthand that's been around longer. The behavior is nearly identical.
Should I use an ORM or write raw SQL?
Both have their place. ORMs (like SQLAlchemy, Prisma, or Django ORM) are great for standard CRUD operations and migrations. But for complex queries, reporting, and performance-critical paths, knowing raw SQL gives you much more control. Learning SQL first, then using an ORM, is the best approach.
Next Steps: Advanced Learning Path
You've covered a lot of ground! Here's where to go from here:
- Deep-dive into query optimization — Learn to read EXPLAIN plans and profile slow queries. Small changes can mean 100x speed improvements.
- Explore PostgreSQL extensions — PostGIS for geospatial, pgvector for AI, pg_cron for scheduling.
- Learn about replication — Set up streaming replication for high availability.
- Practice with real data — Import a public dataset (try Kaggle or the IMDb dataset) and write queries against it.
- Join the community — The PostgreSQL mailing lists, r/PostgreSQL, and PostgreSQL Slack are all welcoming and helpful.
Happy querying! 🐘
Further Reading
- PostgreSQL Official Documentation — the best database docs on the internet
- The Art of PostgreSQL — excellent book for intermediate to advanced users
- Use The Index, Luke — deep dive into database indexing