10–40×

Latency Reduction

Queries Fixed

12s → 280ms

P99 Improvement

67%

CPU Load Reduction

Overview

A systematic performance audit of a Django application that had grown organically over two years without a coherent indexing strategy. By the time performance became critical, the database was under significant load: 500K+ records in key tables, P99 query latency over 12 seconds on some pages, and database CPU averaging 80% during peak hours.

No new infrastructure. No schema redesign. Just systematic analysis and targeted indexing.

Problem

The application served a marketplace with user listings, transactions, and messaging. As data volume grew, queries that were fast at 10K rows became intolerable at 500K rows. Sequential scans were happening on hot queries.

Key symptoms:

Product listing page: 4–12 seconds to load
User transaction history: 6–8 seconds for users with 100+ transactions
Search filters (by category + location + price): 8–15 seconds
Admin reporting queries: timing out (30s+ limit hit)

Process

Step 1: Identify the Worst Offenders

pg_stat_statements + pganalyze gave a ranked list of the most expensive queries by total time:

SELECT
    query,
    calls,
    total_exec_time / 1000 AS total_exec_seconds,
    mean_exec_time AS mean_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

This identified 23 queries responsible for 87% of total database time.

Step 2: Understand Each Query with EXPLAIN ANALYZE

For each slow query, the first step was always:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) 
SELECT * FROM listings 
WHERE category_id = 5 AND city_id = 42 AND price < 5000
ORDER BY created_at DESC 
LIMIT 20;

Output revealed the problem types:

Seq Scan on large tables — no relevant index
Hash Join with high actual rows — misestimated cardinality
Sort operations — no covering index for ORDER BY columns
Index Scan + filter — partial index could improve selectivity

Key Index Optimizations

Case 1: Composite Index for Filtered + Sorted Queries

Before: Sequential scan on listings (480K rows) Latency: 4.2 seconds

-- Query Django generated:
SELECT * FROM listings 
WHERE category_id = 5 AND is_active = true
ORDER BY created_at DESC 
LIMIT 20;

Fix: Composite index with column ordering matching the query:

CREATE INDEX CONCURRENTLY idx_listings_category_active_created
ON listings (category_id, is_active, created_at DESC)
WHERE is_active = true;  -- Partial index: excludes inactive rows

After: Index scan on 4,200 rows → 280ms (15× improvement)

The partial index (WHERE is_active = true) halved the index size and improved selectivity, since ~50% of rows are inactive.

Case 2: Covering Index for Transaction History

Before: Index scan + heap fetch for user transaction page Latency: 6.1 seconds

The query needed user_id, amount, status, and created_at. The existing index only covered user_id, forcing a heap fetch for every row.

-- Existing index:
CREATE INDEX idx_transactions_user ON transactions (user_id);

-- Covering index (all needed columns in index):
CREATE INDEX CONCURRENTLY idx_transactions_user_covering
ON transactions (user_id, created_at DESC)
INCLUDE (amount, status, reference_id);

After: Index-only scan — no heap access → 145ms (42× improvement)

Case 3: Multi-Column GIN Index for JSONB Search

A feature stored product attributes in a JSONB column. Filtering by attribute values was doing sequential scans on parsed JSON:

-- Before: seq scan
SELECT * FROM products 
WHERE attributes @> '{"color": "blue", "size": "L"}';

-- GIN index for JSONB containment queries
CREATE INDEX CONCURRENTLY idx_products_attributes_gin
ON products USING gin (attributes);

After: GIN index scan → 380ms → 28ms (13× improvement)

Case 4: Partial Index for Active Sessions

A sessions table had 2M+ rows, but only 50K were active. Every active session query was scanning the full table:

CREATE INDEX CONCURRENTLY idx_sessions_active
ON user_sessions (user_id, last_activity DESC)
WHERE is_active = true;  -- Only 50K rows in index, not 2M

After: 18ms (from 8.2 seconds — 450× improvement, but most of that was volume reduction)

What Didn't Work

Foreign key indexes — PostgreSQL doesn't auto-create indexes on foreign key columns (unlike MySQL). Several queries were doing seq scans on transactions.user_id, listings.seller_id etc. Simply adding basic FK indexes helped significantly before any other optimization.

Index on low-cardinality columns — Added CREATE INDEX ON orders (status) early on. Status has ~5 distinct values across millions of rows — the planner correctly ignored this index in favor of seq scans for most queries. Dropped it and used it only in partial indexes.

Aggressive index creation — Created 30+ indexes in the first pass. Write performance degraded. Audited which indexes were actually being used:

SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0;  -- Indexes never used

Dropped 11 unused indexes. Write throughput recovered.

Failures & Lessons

Failure 1: Not using CONCURRENTLY First attempt to add indexes used plain CREATE INDEX (which locks the table). A 480K-row table took 8 minutes — during which the site was effectively down. All production index creation must use CONCURRENTLY.

Failure 2: Wrong column order in composite indexes A composite index (city_id, category_id) didn't help queries filtering only on category_id. Index ordering matters: most selective column first, then supporting columns. Needed separate indexes for different access patterns.

Failure 3: VACUUM neglect After heavy bulk deletes, table bloat was significant. The planner's row estimates were off by 10×, causing it to choose seq scans when indexes existed. ANALYZE after bulk operations kept statistics current.

Metrics

Query	Before	After	Improvement
Listing page (category + city filter)	4.2s	280ms	15×
Transaction history	6.1s	145ms	42×
JSONB attribute search	380ms	28ms	13×
Active sessions lookup	8.2s	18ms	455×
Admin reporting	30s+ timeout	2.1s	>14×
Database CPU (peak)	~80%	~27%	67% reduction

Future Improvements

Query plan stability — Use pg_hint_plan for queries where the planner makes poor choices despite good statistics
Partitioning — Table partitioning for the transactions table by created_at; will significantly improve time-range queries as data grows
Connection pooling — PgBouncer to reduce connection overhead; separate issue from query performance but adjacent
Materialized views — For the admin reporting queries that aggregate large datasets; acceptable to be slightly stale

Key Takeaways

pg_stat_statements is the first tool; anecdote-driven optimization is noise
EXPLAIN (ANALYZE, BUFFERS) not just EXPLAIN — without BUFFERS you don't see cache hit patterns
Partial indexes are underused; they dramatically reduce index size and maintenance cost for queries with selective WHERE clauses
Covering indexes eliminate heap access entirely for read-heavy queries — the biggest single-query improvement available
Creating too many indexes hurts write performance; audit with pg_stat_user_indexes and drop unused ones