Hybrid search
Also known as: hybrid retrieval, sparse-dense retrieval, BM25 + vector
Combining semantic (embedding) search with keyword search (BM25) and fusing the results — the production-default retrieval strategy in 2026.
What it means
Semantic search is great at conceptual queries ("how do I save money on infra") and bad at exact tokens (product SKUs, error codes, legal citations, names with weird spellings). Keyword search is the inverse — it nails exact matches and falls apart on paraphrase. Hybrid search runs both in parallel and merges the results, getting the strengths of each.
The mechanics: index every document twice, once as a BM25 inverted index and once as a vector index. At query time, run both retrievers, get two ranked lists, then fuse them. Reciprocal Rank Fusion (RRF) is the simplest method and works shockingly well: score each doc by the sum of 1/(k + rank) across both lists, where rank is the doc's position in a list and k is a smoothing constant. Weighted score fusion (alpha * semantic_score + (1-alpha) * bm25_score) gives you a tunable knob, but BM25 scores and embedding similarities live on different scales, so normalize both before mixing them. Most production stacks use one of these.
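To make the RRF formula concrete, here is a minimal sketch in Python. The function name, the document ids, and the default k=60 (a commonly used value, not something this entry specifies) are illustrative assumptions; the two input lists stand in for whatever your BM25 and vector retrievers return.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc ids with RRF.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a requirement.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retriever outputs for the query "What does error E1042 mean".
bm25_hits = ["doc_e1042", "doc_sso", "doc_errors_overview"]
vector_hits = ["doc_errors_overview", "doc_e1042", "doc_troubleshooting"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}\t{score:.5f}")
```

Note that RRF only looks at ranks, never at raw scores, which is why it needs no score normalization and behaves reasonably without tuning.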
The difference shows up on real-world queries. "How do I configure SSO for Okta": both retrievers nail it. "What does error E1042 mean": BM25 saves you because the embedding might match other errors. "Ways to make our docs less confusing": semantic shines because no doc literally contains that phrase. Run only one and you'll lose roughly 20-30% of your queries on the long tail.
Most vector databases and search engines (Pinecone, Qdrant, Weaviate, OpenSearch, Postgres with pgvector plus ParadeDB) support hybrid search natively in 2026. You don't have to build it. You do have to tune the fusion weight and decide whether to rerank the merged list; the answer is usually yes.
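For a feel of what tuning the fusion weight does, here is a small sketch of weighted score fusion (alpha * semantic_score + (1-alpha) * bm25_score) in Python. It assumes min-max normalization to put both score sets on a 0-1 scale, which is one common choice rather than the only one; the function names, doc ids, and scores are made up for illustration, and managed databases expose their own equivalents of this knob.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every doc as an equally strong match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(semantic_scores, bm25_scores, alpha=0.5):
    """Blend normalized scores; alpha=1.0 is pure semantic, 0.0 is pure BM25."""
    sem = min_max_normalize(semantic_scores)
    kw = min_max_normalize(bm25_scores)
    fused = {}
    for doc_id in set(sem) | set(kw):
        # A doc missing from one retriever's results contributes 0 for that side.
        fused[doc_id] = alpha * sem.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw scores: cosine similarities vs. unbounded BM25 scores.
semantic_scores = {"doc_a": 0.82, "doc_b": 0.78, "doc_c": 0.60}
bm25_scores = {"doc_b": 12.4, "doc_d": 9.1, "doc_a": 3.3}

for alpha in (0.1, 0.5, 0.9):
    ranking = [doc_id for doc_id, _ in weighted_fusion(semantic_scores, bm25_scores, alpha)]
    print(alpha, ranking)
```

Sweeping alpha over a labeled set of real queries, and keeping the value that gives the best retrieval metric, is a reasonable way to pick the weight before adding a reranker on top.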
Example
A legal-research RAG indexes case law with both BM25 and embeddings. A query for "Smith v. Jones 2019 negligence ruling" hits the BM25 index for the case name; "cases about employer liability for remote workers" hits the vector index. The two result lists are fused and reranked, and the top 5 results go to the LLM.
Why it matters
Pure vector search is a 2022 architecture. If you're building production RAG in 2026 and not running hybrid, you're leaving easy wins on the floor. Most "embedding model is bad" diagnoses are actually missing-keyword-search diagnoses.