PostgreSQL Performance Optimization Guide
Why Choose PostgreSQL?
PostgreSQL is a powerful open-source relational database known for its reliability, rich features, and performance.
Index Optimization
Indexes are usually the most effective way to speed up queries, but each one adds write and storage overhead, so create them deliberately.
1. B-Tree Index (Default)
Handles equality and range comparisons (=, <, >, BETWEEN, IN), which covers most scenarios:
-- Create single-column index
CREATE INDEX idx_users_email ON users(email);
-- Create composite index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Create unique index
CREATE UNIQUE INDEX idx_users_username ON users(username);
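Column order in a composite index matters: the planner can use it only when the query filters on a leftmost prefix of its columns. With idx_orders_user_date from above:
-- Uses the index: filters on the leading column
SELECT * FROM orders WHERE user_id = 42;
-- Uses both columns of the index
SELECT * FROM orders WHERE user_id = 42 AND created_at > '2025-01-01';
-- Cannot use the index efficiently: the leading column is missing
SELECT * FROM orders WHERE created_at > '2025-01-01';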
2. Partial Index
Index only rows that meet specific conditions to reduce index size:
-- Index only active users
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';
-- Index only unpaid orders
CREATE INDEX idx_unpaid_orders ON orders(created_at)
WHERE status = 'pending';
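The planner applies a partial index only when the query's WHERE clause provably implies the index predicate. With idx_unpaid_orders from above:
-- Can use idx_unpaid_orders: the predicate matches
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;
-- Cannot use it: 'shipped' rows are not in the index
SELECT * FROM orders WHERE status = 'shipped' ORDER BY created_at;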
3. Expression Index
Create indexes based on computed results:
-- Index on lowercase email
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Index on JSON field
CREATE INDEX idx_users_preferences ON users((preferences->>'theme'));
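The query has to repeat the indexed expression exactly, or the index is ignored:
-- Uses idx_users_email_lower
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
-- Does not use it: the bare column is a different expression
SELECT * FROM users WHERE email = 'alice@example.com';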
4. GIN Index
Suitable for full-text search and arrays:
-- Full-text search index
CREATE INDEX idx_posts_search ON posts
USING GIN(to_tsvector('english', title || ' ' || content));
-- Array index
CREATE INDEX idx_tags ON articles USING GIN(tags);
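Example queries these GIN indexes can serve:
-- Full-text search with the @@ operator
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('english', 'postgres & performance');
-- Array containment with @>
SELECT * FROM articles WHERE tags @> ARRAY['postgresql'];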
5. GiST Index
Suitable for geometric data, range types, and nearest-neighbor searches. The built-in point type works out of the box; geometry/geography columns require the PostGIS extension:
CREATE INDEX idx_locations ON stores
USING GIST(location);
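GiST also indexes built-in range types. A minimal sketch, assuming a hypothetical reservations table with a tsrange column named during:
-- Index a time range column
CREATE INDEX idx_reservations_during ON reservations USING GIST (during);
-- Find reservations overlapping a time window
SELECT * FROM reservations
WHERE during && tsrange('2025-01-01 09:00', '2025-01-01 17:00');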
Query Optimization
Use EXPLAIN ANALYZE
Analyze query execution plans:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123 AND created_at > NOW() - INTERVAL '30 days';
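In the output, compare the planner's estimated row counts with the actual ones and watch for unexpected Seq Scan nodes on large tables. The BUFFERS option additionally reports how many pages each node read:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE user_id = 123 AND created_at > NOW() - INTERVAL '30 days';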
Avoid SELECT *
Query only the columns you need:
-- Bad
SELECT * FROM users WHERE id = 1;
-- Good
SELECT id, name, email FROM users WHERE id = 1;
Use JOIN Instead of Subqueries
Modern PostgreSQL usually plans an uncorrelated IN (...) subquery as a semi-join, so measure before rewriting; an explicit JOIN mainly helps readability and older planners:
-- Subquery form
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE country = 'US');
-- JOIN form
SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.country = 'US';
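An equivalent EXISTS form usually performs the same, since the planner also treats it as a semi-join:
SELECT o.* FROM orders o
WHERE EXISTS (
SELECT 1 FROM users u
WHERE u.id = o.user_id AND u.country = 'US'
);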
Batch Operations
-- Bad (multiple single inserts)
INSERT INTO logs (message) VALUES ('Log 1');
INSERT INTO logs (message) VALUES ('Log 2');
-- Good (batch insert)
INSERT INTO logs (message) VALUES
('Log 1'),
('Log 2'),
('Log 3');
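For bulk loads of thousands of rows or more, COPY is faster still because it avoids per-statement parsing and planning. A minimal sketch, assuming a server-readable CSV file at a hypothetical path (from a client, use psql's \copy instead):
-- Bulk load from CSV (hypothetical path)
COPY logs (message) FROM '/path/to/logs.csv' WITH (FORMAT csv);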
Use LIMIT
Limit the number of results returned:
SELECT * FROM articles
ORDER BY created_at DESC
LIMIT 10;
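When paging deep into results, avoid large OFFSET values: the server still reads and discards every skipped row. Keyset pagination filters on the last seen sort key instead (the timestamp literal below is a placeholder for the last created_at value from the previous page):
-- Keyset pagination
SELECT * FROM articles
WHERE created_at < '2025-01-15 00:00:00'
ORDER BY created_at DESC
LIMIT 10;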
Configuration Optimization
Key postgresql.conf Parameters
# Memory settings (assuming 16GB RAM)
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM
work_mem = 64MB # Memory per sort/hash operation (a single query may use several)
maintenance_work_mem = 512MB # Memory for maintenance operations
# Concurrency settings
max_connections = 100
max_worker_processes = 8
max_parallel_workers_per_gather = 4
# Logging settings
log_min_duration_statement = 1000 # Log slow queries (>1 second)
log_line_prefix = '%t [%p]: ' # Log format
# Checkpoint settings
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
# WAL settings
wal_buffers = 16MB
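These parameters can also be changed from SQL: ALTER SYSTEM writes to postgresql.auto.conf, and pg_reload_conf() applies every reloadable setting (shared_buffers and max_connections still require a restart):
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();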
Connection Pooling
Use PgBouncer to reduce connection overhead. Transaction pooling is the most efficient mode, but it breaks session-level state such as SET commands, advisory locks, and temporary tables:
# pgbouncer.ini
[databases]
mydb = host=localhost dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
Table Design Optimization
1. Choose Appropriate Data Types
-- Bad
CREATE TABLE users (
age VARCHAR(3), -- Using string for numbers
is_active VARCHAR(5) -- Using string for booleans
);
-- Good
CREATE TABLE users (
age SMALLINT, -- Using integer type
is_active BOOLEAN -- Using boolean type
);
2. Use Constraints
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id),
amount DECIMAL(10,2) CHECK (amount > 0),
status VARCHAR(20) NOT NULL DEFAULT 'pending',
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
3. Partitioned Tables
Partition large tables:
-- Partition by date range
CREATE TABLE logs (
id SERIAL,
message TEXT,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE logs_2025_01 PARTITION OF logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE logs_2025_02 PARTITION OF logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
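Queries that filter on the partition key touch only the matching partitions (partition pruning), which you can confirm in the plan:
EXPLAIN
SELECT * FROM logs
WHERE created_at >= '2025-02-01' AND created_at < '2025-02-15';
-- The plan should scan only logs_2025_02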
Maintenance Tasks
1. VACUUM
Reclaim space and update statistics:
-- Manual VACUUM
VACUUM ANALYZE users;
-- Full VACUUM (locks table)
VACUUM FULL users;
-- Auto VACUUM configuration
ALTER TABLE users SET (
autovacuum_vacuum_scale_factor = 0.1,
autovacuum_analyze_scale_factor = 0.05
);
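To check whether autovacuum is keeping up, compare dead tuple counts against the last vacuum times:
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;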
2. REINDEX
Rebuild indexes:
-- Rebuild single index
REINDEX INDEX idx_users_email;
-- Rebuild all indexes on table
REINDEX TABLE users;
-- Rebuild all indexes in database (use with caution)
REINDEX DATABASE mydb;
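On PostgreSQL 12 and later, REINDEX supports CONCURRENTLY, which rebuilds the index without blocking writes at the cost of a slower rebuild:
REINDEX INDEX CONCURRENTLY idx_users_email;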
3. Update Statistics
-- Update table statistics
ANALYZE users;
-- Update all tables
ANALYZE;
Monitoring and Debugging
View Slow Queries
-- View currently running queries
SELECT pid, query, state, query_start
FROM pg_stat_activity
WHERE state != 'idle';
-- Cancel a query, or terminate its backend entirely (replace 12345 with the actual pid)
SELECT pg_cancel_backend(12345);
SELECT pg_terminate_backend(12345);
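For a history of slow statements rather than a point-in-time snapshot, the pg_stat_statements extension aggregates timing per normalized query (it must first be listed in shared_preload_libraries):
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 10 statements by mean execution time (the column is mean_time before PostgreSQL 13)
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;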
View Table Sizes
-- View table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(quote_ident(schemaname)||'.'||quote_ident(tablename))) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(quote_ident(schemaname)||'.'||quote_ident(tablename)) DESC;
View Index Usage
-- Low idx_scan values flag rarely-used indexes; idx_scan = 0 means never used
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
Caching Strategies
1. Application-Level Caching
Use Redis to cache hot data:
async function getUser(id) {
  // Check cache first
  const cached = await redis.get(`user:${id}`);
  if (cached) {
    return JSON.parse(cached); // cached value is a JSON string
  }
  // Cache miss: query the database (node-postgres style client assumed)
  const result = await db.query("SELECT id, name, email FROM users WHERE id = $1", [id]);
  const user = result.rows[0];
  if (user) {
    // Cache for 1 hour; delete the key whenever the user row is updated
    await redis.setex(`user:${id}`, 3600, JSON.stringify(user));
  }
  return user;
}
2. Materialized Views
-- Create materialized view
CREATE MATERIALIZED VIEW sales_summary AS
SELECT
DATE_TRUNC('day', created_at) AS date,
COUNT(*) AS order_count,
SUM(amount) AS total_amount
FROM orders
GROUP BY DATE_TRUNC('day', created_at);
-- Create a unique index (also required for REFRESH ... CONCURRENTLY)
CREATE UNIQUE INDEX idx_sales_summary_date ON sales_summary(date);
-- Refresh view
REFRESH MATERIALIZED VIEW sales_summary;
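With the unique index in place, the view can also be refreshed without blocking concurrent reads:
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;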
Common Performance Issues
1. N+1 Query Problem
-- Bad: N+1 queries
SELECT * FROM orders; -- 1 query
-- Then query user for each order
SELECT * FROM users WHERE id = $1; -- N queries (one per order)
-- Good: Use JOIN
SELECT o.*, u.name, u.email
FROM orders o
JOIN users u ON o.user_id = u.id; -- 1 query
2. Missing Indexes
-- Find tables with heavy sequential scan activity (possible missing indexes)
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan AS avg_seq_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC;
3. Lock Contention
-- View lock waits (PostgreSQL 9.6+; pg_blocking_pids finds the blocking backends directly)
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));
Summary
PostgreSQL performance optimization is an ongoing process that requires:
- Properly designed table structures and indexes
- Writing efficient SQL queries
- Tuning database configuration parameters
- Regular maintenance and monitoring
- Using caching strategies
Remember: Measure first, then optimize. Use EXPLAIN ANALYZE to analyze queries and find the real performance bottlenecks.