All terms
RAG & retrieval

Reranking

Also known as: cross-encoder reranking, second-stage retrieval, reranker

A second-pass model that takes the top-K results from initial retrieval and reorders them by true relevance to the query.

What it means

Vector search is fast but coarse — it embeds the query and documents independently and just measures distance. That's good enough to surface 20 candidate chunks, but the top-1 result is often not actually the best answer. Rerankers fix this. They take the query and each candidate together, run them through a cross-encoder model that can attend to both, and produce a relevance score that's far more accurate than cosine similarity. The standard pattern is two-stage retrieval: pull top-50 to top-100 with cheap vector search, then rerank to top-3 to top-10 with a cross-encoder. The reranker is slower per-pair (it has to run a transformer over query+doc together), but you only run it on a small candidate set, so total latency stays under ~200ms. Cohere Rerank, Voyage rerank, BGE rerankers, and Jina Rerank are the production-grade options in 2026. Why rerankers help so much: cross-encoders see the query and the document jointly, so they can catch subtle relevance signals that bi-encoder embeddings miss. They're also better at penalizing fluency-without-substance — a chunk that uses lots of related words but doesn't actually answer the question gets demoted. On real RAG benchmarks, adding a reranker typically lifts answer quality more than upgrading the LLM. Naive RAG ships without reranking and the team spends two months wondering why the bot keeps citing the wrong doc. Rerankers are the cheapest single quality lever in retrieval — turn one on before you tune anything else.

Example

A query "what is our parental leave policy" pulls 50 candidate chunks via vector search. The reranker scores each (query, chunk) pair, surfaces the actual policy section to top-1, and demotes a chunk that mentioned parental leave in passing.

Why it matters

If you only do one optimization to a basic RAG pipeline, add a reranker. It's a 1-line API call that often closes most of the gap between 'this kind of works' and 'this is shippable.'

Related terms