Overview
A systematic performance audit of a Django application that had grown organically over two years without a coherent indexing strategy. By the time performance became critical, the database was under significant load: 500K+ records in key tables, P99 query latency over 12 seconds on some pages, and database CPU averaging 80% during peak hours.
No new infrastructure. No schema redesign. Just systematic analysis and targeted indexing.
Problem
The application served a marketplace with user listings, transactions, and messaging. As data volume grew, queries that were fast at 10K rows became intolerable at 500K rows. Sequential scans were happening on hot queries.
Key symptoms:
- Product listing page: 4–12 seconds to load
- User transaction history: 6–8 seconds for users with 100+ transactions
- Search filters (by category + location + price): 8–15 seconds
- Admin reporting queries: timing out (30s+ limit hit)
Process
Step 1: Identify the Worst Offenders
pg_stat_statements + pganalyze gave a ranked list of the most expensive queries by total time:
SELECT
query,
calls,
total_exec_time / 1000 AS total_exec_seconds,
mean_exec_time AS mean_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
This identified 23 queries responsible for 87% of total database time.
Step 2: Understand Each Query with EXPLAIN ANALYZE
For each slow query, the first step was always:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM listings
WHERE category_id = 5 AND city_id = 42 AND price < 5000
ORDER BY created_at DESC
LIMIT 20;
Output revealed the problem types:
- Seq Scan on large tables — no relevant index
- Hash Join with high actual rows — misestimated cardinality
- Sort operations — no covering index for ORDER BY columns
- Index Scan + filter — partial index could improve selectivity
Key Index Optimizations
Case 1: Composite Index for Filtered + Sorted Queries
Before: Sequential scan on listings (480K rows)
Latency: 4.2 seconds
-- Query Django generated:
SELECT * FROM listings
WHERE category_id = 5 AND is_active = true
ORDER BY created_at DESC
LIMIT 20;
Fix: Composite index with column ordering matching the query:
CREATE INDEX CONCURRENTLY idx_listings_category_active_created
ON listings (category_id, is_active, created_at DESC)
WHERE is_active = true; -- Partial index: excludes inactive rows
After: Index scan on 4,200 rows → 280ms (15× improvement)
The partial index (WHERE is_active = true) halved the index size and improved selectivity, since ~50% of rows are inactive.
Case 2: Covering Index for Transaction History
Before: Index scan + heap fetch for user transaction page Latency: 6.1 seconds
The query needed user_id, amount, status, and created_at. The existing index only covered user_id, forcing a heap fetch for every row.
-- Existing index:
CREATE INDEX idx_transactions_user ON transactions (user_id);
-- Covering index (all needed columns in index):
CREATE INDEX CONCURRENTLY idx_transactions_user_covering
ON transactions (user_id, created_at DESC)
INCLUDE (amount, status, reference_id);
After: Index-only scan — no heap access → 145ms (42× improvement)
Case 3: Multi-Column GIN Index for JSONB Search
A feature stored product attributes in a JSONB column. Filtering by attribute values was doing sequential scans on parsed JSON:
-- Before: seq scan
SELECT * FROM products
WHERE attributes @> '{"color": "blue", "size": "L"}';
-- GIN index for JSONB containment queries
CREATE INDEX CONCURRENTLY idx_products_attributes_gin
ON products USING gin (attributes);
After: GIN index scan → 380ms → 28ms (13× improvement)
Case 4: Partial Index for Active Sessions
A sessions table had 2M+ rows, but only 50K were active. Every active session query was scanning the full table:
CREATE INDEX CONCURRENTLY idx_sessions_active
ON user_sessions (user_id, last_activity DESC)
WHERE is_active = true; -- Only 50K rows in index, not 2M
After: 18ms (from 8.2 seconds — 450× improvement, but most of that was volume reduction)
What Didn't Work
Foreign key indexes — PostgreSQL doesn't auto-create indexes on foreign key columns (unlike MySQL). Several queries were doing seq scans on transactions.user_id, listings.seller_id etc. Simply adding basic FK indexes helped significantly before any other optimization.
Index on low-cardinality columns — Added CREATE INDEX ON orders (status) early on. Status has ~5 distinct values across millions of rows — the planner correctly ignored this index in favor of seq scans for most queries. Dropped it and used it only in partial indexes.
Aggressive index creation — Created 30+ indexes in the first pass. Write performance degraded. Audited which indexes were actually being used:
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0; -- Indexes never used
Dropped 11 unused indexes. Write throughput recovered.
Failures & Lessons
Failure 1: Not using CONCURRENTLY
First attempt to add indexes used plain CREATE INDEX (which locks the table). A 480K-row table took 8 minutes — during which the site was effectively down. All production index creation must use CONCURRENTLY.
Failure 2: Wrong column order in composite indexes
A composite index (city_id, category_id) didn't help queries filtering only on category_id. Index ordering matters: most selective column first, then supporting columns. Needed separate indexes for different access patterns.
Failure 3: VACUUM neglect
After heavy bulk deletes, table bloat was significant. The planner's row estimates were off by 10×, causing it to choose seq scans when indexes existed. ANALYZE after bulk operations kept statistics current.
Metrics
| Query | Before | After | Improvement |
|---|---|---|---|
| Listing page (category + city filter) | 4.2s | 280ms | 15× |
| Transaction history | 6.1s | 145ms | 42× |
| JSONB attribute search | 380ms | 28ms | 13× |
| Active sessions lookup | 8.2s | 18ms | 455× |
| Admin reporting | 30s+ timeout | 2.1s | >14× |
| Database CPU (peak) | ~80% | ~27% | 67% reduction |
Future Improvements
- Query plan stability — Use
pg_hint_planfor queries where the planner makes poor choices despite good statistics - Partitioning — Table partitioning for the transactions table by created_at; will significantly improve time-range queries as data grows
- Connection pooling — PgBouncer to reduce connection overhead; separate issue from query performance but adjacent
- Materialized views — For the admin reporting queries that aggregate large datasets; acceptable to be slightly stale
Key Takeaways
pg_stat_statementsis the first tool; anecdote-driven optimization is noise- EXPLAIN (ANALYZE, BUFFERS) not just EXPLAIN — without BUFFERS you don't see cache hit patterns
- Partial indexes are underused; they dramatically reduce index size and maintenance cost for queries with selective WHERE clauses
- Covering indexes eliminate heap access entirely for read-heavy queries — the biggest single-query improvement available
- Creating too many indexes hurts write performance; audit with pg_stat_user_indexes and drop unused ones