2026-01-08

pgvector Performance & Optimization 2025

research

Learned: 2026-01-08 Topic: Vector Database, PostgreSQL


Key Insights

  1. HNSW > IVFFlat for production: 15x faster queries, handles updates better
  2. pgvector 0.8.0 is game-changing: iterative scans fix overfiltering, 9x faster filtered queries
  3. pgvector vs Pinecone: 28x lower latency, 16x higher throughput, 75% cost savings at 50M vectors
  4. Scale limit: pgvector handles 50-100M vectors well; beyond that consider dedicated solutions

Index Types Comparison

Factor          HNSW          IVFFlat
Query speed     40.5 QPS      2.6 QPS (at 99.8% recall)
Build time      32x slower    Fast
Memory          2.8x more     Compact
Updates         No rebuild    Needs rebuild
Build on empty  Yes           No

Recommendation: HNSW for production unless memory-constrained
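The "build on empty" row has an operational consequence worth spelling out. A minimal sketch (table and column names are illustrative):

```sql
-- HNSW can be created on an empty table; rows inserted later are
-- indexed incrementally, with no rebuild required
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(768));
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

-- IVFFlat should only be built AFTER loading representative data:
-- its lists are k-means centroids computed from the rows present at
-- build time, so an index built on an empty table partitions poorly
```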


Configuration Tuning

HNSW Parameters

CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);

SET hnsw.ef_search = 100;  -- Query accuracy (default 40)

Parameter        Default  Recommendation
m                16       16-48 (higher for >768 dims)
ef_construction  64       64-128 (higher = better quality)
ef_search        40       100-200 for production

IVFFlat Parameters

CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops)
WITH (lists = 1000);

SET ivfflat.probes = 32;  -- sqrt(lists)
  • lists: rows/1000 (up to 1M), sqrt(rows) (over 1M)
  • probes: sqrt(lists) for good balance
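Those rules of thumb can be derived directly from the row count. A sketch (assumes the items table; the CASE thresholds mirror the bullets above):

```sql
-- Derive lists (rows/1000 up to 1M rows, sqrt(rows) beyond) and
-- probes (sqrt(lists)) from the current table size
WITH t AS (SELECT COUNT(*) AS n FROM items)
SELECT n,
       CASE WHEN n <= 1000000 THEN GREATEST(n / 1000, 1)
            ELSE CEIL(SQRT(n))::int
       END AS lists,
       CEIL(SQRT(CASE WHEN n <= 1000000 THEN GREATEST(n / 1000, 1)
                      ELSE CEIL(SQRT(n))::int END))::int AS probes
FROM t;
```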

pgvector 0.8.0 Critical Features (Nov 2024)

Iterative Scans - Fixes overfiltering problem:

SET hnsw.iterative_scan = relaxed_order;     -- off | relaxed_order | strict_order
SET ivfflat.iterative_scan = relaxed_order;  -- ivfflat supports off | relaxed_order only
  • Up to 9x faster queries
  • 100x better relevance for filtered queries
  • Better cost estimation for query planner
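Iterative scans come with companion settings that bound how far a scan may continue; the values below are illustrative starting points, not measured recommendations:

```sql
SET hnsw.max_scan_tuples = 20000;     -- cap on tuples visited while iterating
SET hnsw.scan_mem_multiplier = 2;     -- allow up to 2x work_mem during the scan
SET ivfflat.max_probes = 100;         -- ceiling as probes grow past ivfflat.probes
```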

Hybrid Search Pattern

-- Create indexes
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON docs (category);      -- B-tree for WHERE
CREATE INDEX ON docs (created_at);    -- B-tree for filters

-- Hybrid query
SELECT id, title, embedding <=> query_vec AS distance  -- <=> is cosine distance; lower = more similar
FROM docs
WHERE category = 'technical'
  AND created_at > '2024-01-01'
ORDER BY embedding <=> query_vec
LIMIT 10;

Scaling Guidelines

Scale      Recommendation
< 10M      pgvector excellent
10-100M    pgvector + pgvectorscale
100M-1B    Partitioning + optimization
> 1B       Dedicated vector DB
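At the 100M-1B tier, list partitioning keeps each HNSW index at a size pgvector handles comfortably. A sketch with a hypothetical docs schema partitioned by category:

```sql
CREATE TABLE docs (
    id        bigserial,
    category  text NOT NULL,
    embedding vector(768)
) PARTITION BY LIST (category);

CREATE TABLE docs_technical PARTITION OF docs FOR VALUES IN ('technical');

-- One HNSW index per partition; queries filtered on category
-- touch only that partition's (much smaller) index
CREATE INDEX ON docs_technical USING hnsw (embedding vector_cosine_ops);
```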

Optimization Strategies

  1. Dimensionality reduction: 1536 → 768 dims = 2x memory savings, 97% accuracy
  2. Use halfvec (float16): 50% memory reduction
  3. Bulk load then index: Much faster than incremental
  4. Concurrent index creation for production
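Strategies 2-4 combine naturally: store halfvec, bulk load with COPY, then build the index once. A sketch (table, index, and file names are illustrative):

```sql
CREATE TABLE items (
    id        bigserial PRIMARY KEY,
    embedding halfvec(768)    -- float16 storage: ~50% of vector(768)
);

-- 1) Bulk load first; COPY is far faster than row-by-row INSERTs
-- COPY items (embedding) FROM '/path/to/embeddings.csv' CSV;

-- 2) Then build once, CONCURRENTLY to avoid blocking production writes
CREATE INDEX CONCURRENTLY items_embedding_idx
  ON items USING hnsw (embedding halfvec_cosine_ops);
```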

Benchmarks vs Dedicated DBs

Database  Best For                     Scale
pgvector  SQL integration, ACID, cost  50-100M
Pinecone  Managed, enterprise          Billions
Qdrant    Complex filtering            Billions
Milvus    GPU, largest scale           Billions

pgvector advantages:

  • 75% cheaper than Pinecone
  • Full SQL (JOINs, transactions)
  • Existing Postgres ecosystem

Production Memory Sizing (50M vectors, 768 dims)

  • Base vectors: 150 GB
  • HNSW index: 450 GB
  • shared_buffers: 200 GB
  • Total: ~850 GB

-- postgresql.conf
shared_buffers = 200GB
maintenance_work_mem = 8GB
work_mem = 256MB
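The base-vector line is just 50M × 768 dims × 4 bytes for float32; a quick sanity check in psql (tuple headers and alignment overhead are ignored):

```sql
SELECT pg_size_pretty(50000000::bigint * 768 * 4) AS raw_vector_bytes;
```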

Monitoring

-- Query performance
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
WHERE query LIKE '%<=>%'
ORDER BY mean_exec_time DESC;

-- Index size
SELECT pg_size_pretty(pg_relation_size('items_embedding_idx'));

-- Recall validation
-- Recall validation: the exact baseline must bypass the ANN index
-- (otherwise both queries return the same approximate results)
CREATE TEMP TABLE exact AS   -- run after: SET enable_indexscan = off;
  SELECT id FROM items ORDER BY embedding <-> q LIMIT 100;
WITH approx AS (SELECT id FROM items ORDER BY embedding <-> q LIMIT 100)
SELECT COUNT(*) * 1.0 / 100 AS recall
FROM approx WHERE id IN (SELECT id FROM exact);

Quick Reference

-- HNSW index (recommended)
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);

-- Query tuning
SET hnsw.ef_search = 100;
SET hnsw.iterative_scan = relaxed_order;

-- Maintenance
VACUUM ANALYZE items;
REINDEX INDEX CONCURRENTLY items_embedding_idx;

-- Debug
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM items ORDER BY embedding <-> '[...]' LIMIT 10;