
PostgreSQL Performance Optimization Guide


Why Choose PostgreSQL?

PostgreSQL is a powerful open-source relational database known for its reliability, rich features, and performance.

Index Optimization

Indexes are key to improving query performance.

1. B-Tree Index (Default)

Suitable for most scenarios:

-- Create single-column index
CREATE INDEX idx_users_email ON users(email);

-- Create composite index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Create unique index
CREATE UNIQUE INDEX idx_users_username ON users(username);
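Column order matters in a composite index: the index on (user_id, created_at) above serves queries that filter on user_id alone or on both columns, but not on created_at alone. For example:

-- Served by idx_orders_user_date (leftmost-prefix match)
SELECT * FROM orders WHERE user_id = 123;
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2025-01-01';

-- Not served efficiently (skips the leading column)
SELECT * FROM orders WHERE created_at > '2025-01-01';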

2. Partial Index

Index only rows that meet specific conditions to reduce index size:

-- Index only active users
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';

-- Index only unpaid orders
CREATE INDEX idx_unpaid_orders ON orders(created_at)
WHERE status = 'pending';
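Note that the planner can use a partial index only when the query's WHERE clause implies the index predicate:

-- Can use idx_active_users
SELECT email FROM users WHERE status = 'active' AND email = 'alice@example.com';

-- Cannot use it (the predicate is not implied)
SELECT email FROM users WHERE email = 'alice@example.com';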

3. Expression Index

Create indexes based on computed results:

-- Index on lowercase email
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Index on JSON field
CREATE INDEX idx_users_preferences ON users((preferences->>'theme'));
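The same rule applies here: the query must repeat the indexed expression exactly for the index to be considered:

-- Uses idx_users_email_lower
SELECT id FROM users WHERE LOWER(email) = 'alice@example.com';

-- Uses idx_users_preferences
SELECT id FROM users WHERE preferences->>'theme' = 'dark';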

4. GIN Index

Suitable for full-text search and arrays:

-- Full-text search index
CREATE INDEX idx_posts_search ON posts
USING GIN(to_tsvector('english', title || ' ' || content));

-- Array index
CREATE INDEX idx_tags ON articles USING GIN(tags);
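Queries then hit these indexes through the matching operators (assuming tags is a text array):

-- Full-text search via the GIN index
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || content)
      @@ to_tsquery('english', 'postgres & performance');

-- Array containment via the GIN index
SELECT * FROM articles WHERE tags @> ARRAY['postgresql'];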

5. GiST Index

Suitable for geographic data and range queries:

CREATE INDEX idx_locations ON stores
USING GIST(location);
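As a sketch, assuming location is a PostGIS geography column (PostGIS is an extension, not part of core PostgreSQL), the index can accelerate proximity searches:

-- Stores within 1 km of a point
SELECT * FROM stores
WHERE ST_DWithin(location, ST_SetSRID(ST_MakePoint(-73.99, 40.73), 4326)::geography, 1000);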

Query Optimization

Use EXPLAIN ANALYZE

Analyze query execution plans:

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123 AND created_at > NOW() - INTERVAL '30 days';
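Adding the BUFFERS option also shows how many pages came from cache versus disk, which helps distinguish CPU-bound from I/O-bound queries:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE user_id = 123 AND created_at > NOW() - INTERVAL '30 days';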

Avoid SELECT *

Query only the columns you need; narrower rows mean less I/O and network transfer, and can enable index-only scans:

-- Bad
SELECT * FROM users WHERE id = 1;

-- Good
SELECT id, name, email FROM users WHERE id = 1;

Use JOIN Instead of Subqueries

-- Bad (subquery)
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE country = 'US');

-- Good (JOIN)
SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.country = 'US';
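Modern PostgreSQL usually plans an IN (...) subquery as a semi-join anyway, so the bigger wins come from rewriting correlated subqueries. An EXISTS form is a third equivalent option that never duplicates order rows even when the join key is not unique:

SELECT o.* FROM orders o
WHERE EXISTS (
  SELECT 1 FROM users u
  WHERE u.id = o.user_id AND u.country = 'US'
);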

Batch Operations

-- Bad (multiple single inserts)
INSERT INTO logs (message) VALUES ('Log 1');
INSERT INTO logs (message) VALUES ('Log 2');

-- Good (batch insert)
INSERT INTO logs (message) VALUES
  ('Log 1'),
  ('Log 2'),
  ('Log 3');
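For large bulk loads, COPY is typically much faster still (the file path here is illustrative):

-- Server-side bulk load from a CSV file
COPY logs (message) FROM '/tmp/logs.csv' WITH (FORMAT csv);

-- Or load a client-local file via psql
-- \copy logs (message) FROM 'logs.csv' WITH (FORMAT csv)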

Use LIMIT

Limit the number of results returned:

SELECT * FROM articles
ORDER BY created_at DESC
LIMIT 10;
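For deep pagination, prefer keyset (cursor) pagination over large OFFSET values, which force PostgreSQL to read and discard every skipped row. A sketch, assuming the client remembers the last created_at it saw (add id as a tie-breaker if timestamps can collide):

-- Next page: seek past the last row of the previous page
SELECT id, title, created_at FROM articles
WHERE created_at < '2025-01-15 10:00:00'  -- last value from the previous page
ORDER BY created_at DESC
LIMIT 10;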

Configuration Optimization

Key postgresql.conf Parameters

# Memory settings (assuming 16GB RAM)
shared_buffers = 4GB                # 25% of RAM
effective_cache_size = 12GB         # 75% of RAM
work_mem = 64MB                     # Memory per sort/hash operation (a query may use several)
maintenance_work_mem = 512MB        # Memory for maintenance operations

# Concurrency settings
max_connections = 100
max_worker_processes = 8
max_parallel_workers_per_gather = 4

# Logging settings
log_min_duration_statement = 1000   # Log statements slower than 1000 ms (1 second)
log_line_prefix = '%t [%p]: '       # Log format

# Checkpoint settings
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

# WAL settings
wal_buffers = 16MB
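Parameters can also be changed without editing the file. Most of these take effect on a configuration reload, though some (such as shared_buffers) require a restart:

ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();

-- Verify the active value
SHOW work_mem;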

Connection Pooling

Use PgBouncer to reduce connection overhead. Note that transaction pooling (pool_mode = transaction below) does not preserve session state such as SET variables, session-scoped prepared statements, or advisory locks across transactions:

# pgbouncer.ini
[databases]
mydb = host=localhost dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

Table Design Optimization

1. Choose Appropriate Data Types

-- Bad
CREATE TABLE users (
  age VARCHAR(3),           -- Using string for numbers
  is_active VARCHAR(5)      -- Using string for booleans
);

-- Good
CREATE TABLE users (
  age SMALLINT,             -- Using integer type
  is_active BOOLEAN         -- Using boolean type
);

2. Use Constraints

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER NOT NULL REFERENCES users(id),
  amount DECIMAL(10,2) CHECK (amount > 0),
  status VARCHAR(20) NOT NULL DEFAULT 'pending',
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
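On PostgreSQL 10 and later, identity columns are generally preferred over SERIAL: they are SQL-standard and tie the sequence to the column more cleanly. A sketch (order_events is an illustrative table):

CREATE TABLE order_events (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  order_id INTEGER NOT NULL REFERENCES orders(id),
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);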

3. Partitioned Tables

Partition large tables:

-- Partition by date range
CREATE TABLE logs (
  id SERIAL,
  message TEXT,
  created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE logs_2025_01 PARTITION OF logs
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE logs_2025_02 PARTITION OF logs
  FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
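On PostgreSQL 11 and later, a default partition catches rows that fall outside every defined range, so inserts do not fail when a matching partition is missing:

CREATE TABLE logs_default PARTITION OF logs DEFAULT;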

Maintenance Tasks

1. VACUUM

Reclaim space and update statistics:

-- Manual VACUUM
VACUUM ANALYZE users;

-- Full VACUUM (locks table)
VACUUM FULL users;

-- Auto VACUUM configuration (per table)
ALTER TABLE users SET (
  autovacuum_vacuum_scale_factor = 0.1,   -- vacuum when ~10% of rows are dead
  autovacuum_analyze_scale_factor = 0.05  -- analyze when ~5% of rows have changed
);

2. REINDEX

Rebuild indexes:

-- Rebuild single index
REINDEX INDEX idx_users_email;

-- Rebuild all indexes on table
REINDEX TABLE users;

-- Rebuild all indexes in database (use with caution)
REINDEX DATABASE mydb;
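On PostgreSQL 12 and later, REINDEX ... CONCURRENTLY rebuilds an index without blocking writes, at the cost of taking longer overall:

REINDEX INDEX CONCURRENTLY idx_users_email;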

3. Update Statistics

-- Update table statistics
ANALYZE users;

-- Update all tables
ANALYZE;

Monitoring and Debugging

View Slow Queries

-- View currently running queries
SELECT pid, query, state, query_start
FROM pg_stat_activity
WHERE state != 'idle';

-- Terminate long-running queries (here: active for over 5 minutes)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state != 'idle'
  AND pid != pg_backend_pid()
  AND query_start < NOW() - INTERVAL '5 minutes';
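For historical slow-query analysis (rather than only what is running right now), the pg_stat_statements extension aggregates timing statistics per normalized query. The column names below are for PostgreSQL 13+; older versions use total_time and mean_time:

-- Requires: CREATE EXTENSION pg_stat_statements;
-- and shared_preload_libraries = 'pg_stat_statements'
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;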

View Table Sizes

-- View table sizes
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

View Index Usage

SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;  -- rarely used indexes first; idx_scan = 0 marks removal candidates

Caching Strategies

1. Application-Level Caching

Use Redis to cache hot data:

async function getUser(id) {
  // Check cache first
  const cached = await redis.get(`user:${id}`);
  if (cached) {
    return JSON.parse(cached);
  }

  // Cache miss: query the database (node-postgres style result object)
  const result = await db.query("SELECT id, name, email FROM users WHERE id = $1", [id]);
  const user = result.rows[0];

  // Write to cache with a 1-hour TTL
  if (user) {
    await redis.setex(`user:${id}`, 3600, JSON.stringify(user));
  }

  return user;
}

2. Materialized Views

-- Create materialized view
CREATE MATERIALIZED VIEW sales_summary AS
SELECT
  DATE_TRUNC('day', created_at) AS date,
  COUNT(*) AS order_count,
  SUM(amount) AS total_amount
FROM orders
GROUP BY DATE_TRUNC('day', created_at);

-- Create index
CREATE INDEX idx_sales_summary_date ON sales_summary(date);

-- Refresh view
REFRESH MATERIALIZED VIEW sales_summary;
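A plain REFRESH locks the view against reads while it rebuilds. REFRESH ... CONCURRENTLY avoids that, but requires a unique index on the view (date is unique here because it is the GROUP BY key):

-- One-time setup, then non-blocking refreshes
CREATE UNIQUE INDEX idx_sales_summary_date_uq ON sales_summary(date);
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;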

Common Performance Issues

1. N+1 Query Problem

-- Bad: N+1 queries
SELECT * FROM orders;  -- 1 query
-- Then query user for each order
SELECT * FROM users WHERE id = $1;  -- N queries

-- Good: Use JOIN
SELECT o.*, u.name, u.email
FROM orders o
JOIN users u ON o.user_id = u.id;  -- 1 query

2. Missing Indexes

-- Find tables missing indexes
SELECT
  schemaname,
  tablename,
  seq_scan,
  seq_tup_read,
  idx_scan,
  seq_tup_read / seq_scan AS avg_seq_tup  -- average rows read per sequential scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC;

3. Lock Contention

-- View lock waits via pg_blocking_pids() (PostgreSQL 9.6+)
SELECT
  blocked.pid AS blocked_pid,
  blocked.query AS blocked_query,
  blocking.pid AS blocking_pid,
  blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));

Summary

PostgreSQL performance optimization is an ongoing process that requires:

  1. Properly designed table structures and indexes
  2. Writing efficient SQL queries
  3. Tuning database configuration parameters
  4. Regular maintenance and monitoring
  5. Using caching strategies

Remember: Measure first, then optimize. Use EXPLAIN ANALYZE to analyze queries and find the real performance bottlenecks.
