Embedding model
Also known as: encoder, text embedding model, sentence encoder
A model that turns a piece of text (or image, or audio) into a fixed-length vector, where similar inputs produce similar vectors. Different from the LLM that generates answers.
What it means
An embedding model is a neural network trained to map inputs into a vector space where geometric distance corresponds to semantic similarity. Feed it "the cat sat on the mat" and you get back, say, a 1,536-dimensional float array. Feed it "a feline rested on the rug" and you get a different array, but one whose cosine distance to the first is small. That property is what makes semantic search work.
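To make the geometry concrete, here is a minimal sketch using the open-source sentence-transformers library (one model choice among many; the hosted APIs discussed below work the same way, and exact scores vary by model):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small 384-dim encoder

# Same model, three inputs: two paraphrases and one unrelated sentence.
vecs = model.encode([
    "the cat sat on the mat",
    "a feline rested on the rug",
    "quarterly revenue grew 12%",
])

print(util.cos_sim(vecs[0], vecs[1]))  # high: the paraphrases land close together
print(util.cos_sim(vecs[0], vecs[2]))  # low: unrelated text lands far away
```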
Embedding models are not LLMs and you don't use them to generate text. They're typically encoder-only transformers (think BERT-family) trained with contrastive objectives — pull paraphrases together, push unrelated text apart, often using mined hard negatives. The output dimensionality (768, 1024, 1536, 3072) is fixed, regardless of input length. Most have a max input of 512 to 8,192 tokens.
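To illustrate the training objective, here is a simplified in-batch InfoNCE-style contrastive loss in PyTorch. Real recipes add mined hard negatives, much larger batches, and tuned temperatures, so treat this as a sketch of the idea rather than any particular model's recipe:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchors, positives, temperature=0.05):
    """Simplified InfoNCE: each anchor's paraphrase is its positive;
    every other in-batch embedding serves as a negative."""
    a = F.normalize(anchors, dim=-1)    # (batch, dim), unit length
    p = F.normalize(positives, dim=-1)  # (batch, dim), unit length
    logits = a @ p.T / temperature      # all pairwise cosine similarities, scaled
    labels = torch.arange(len(a))       # the i-th positive belongs to the i-th anchor
    return F.cross_entropy(logits, labels)

# Toy batch: 8 anchor/paraphrase pairs of 768-dim embeddings.
anchors, positives = torch.randn(8, 768), torch.randn(8, 768)
print(contrastive_loss(anchors, positives))
```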
The 2026 lineup, roughly best-in-class first: Voyage AI (voyage-3, voyage-code-3) consistently tops MTEB; OpenAI's text-embedding-3-large is the safe default API; Cohere Embed v3 offers strong multilingual support; Jina v3 is a good open-source option; BGE and E5 are popular open weights you can self-host; Nomic Embed is the pick if you want fully open weights and training data. Specialized embeddings exist for code (voyage-code-3), legal, biomedical, and multimodal search (CLIP-style models for image+text).
Two practical points teams get wrong: you have to use the same embedding model for documents and queries (otherwise the vectors are in different spaces and similarity is meaningless), and switching embedding models means re-indexing everything. Treat the embedding model choice like a database schema decision, not a hyperparameter.
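A cheap guardrail against both mistakes is to pin the model name and dimensionality to the index at build time and refuse anything else at write and query time. A minimal sketch, with a hypothetical Index class standing in for your vector store's metadata:

```python
from dataclasses import dataclass, field

@dataclass
class Index:
    model_name: str  # e.g. "voyage-3", fixed at build time
    dim: int         # must match every vector stored
    vectors: list = field(default_factory=list)

    def add(self, model_name: str, vector: list[float]) -> None:
        assert model_name == self.model_name, "re-index before switching models"
        assert len(vector) == self.dim, "dimensionality mismatch"
        self.vectors.append(vector)

    def query(self, model_name: str, vector: list[float]) -> None:
        # Same checks: a query vector from a different model lives in a
        # different space, so similarity scores would be meaningless.
        assert model_name == self.model_name and len(vector) == self.dim
        # ... nearest-neighbor search would go here ...
```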
Example
Voyage's voyage-3 turns the sentence "How do I reset my password?" into a 1,024-dim vector. The same model embeds 200,000 help-center paragraphs into 1,024-dim vectors. At query time you cosine-compare the query vector against the stored vectors to find the nearest paragraphs.
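In code, the query path looks roughly like this. A sketch assuming the voyageai Python client (verify signatures against the current SDK) and brute-force NumPy search; at 200,000 paragraphs you would normally put the vectors behind an approximate-nearest-neighbor index instead:

```python
import numpy as np
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

docs = ["To reset your password, open Settings ...", "..."]  # 200k paragraphs in practice
doc_vecs = np.array(vo.embed(docs, model="voyage-3", input_type="document").embeddings)

query_vec = np.array(vo.embed(["How do I reset my password?"],
                              model="voyage-3", input_type="query").embeddings[0])

# Cosine similarity = dot product of L2-normalized vectors.
doc_norm = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
scores = doc_norm @ (query_vec / np.linalg.norm(query_vec))
top5 = np.argsort(scores)[::-1][:5]  # indices of the five nearest paragraphs
print([docs[i] for i in top5])
```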
Why it matters
The embedding model is the ceiling on your retrieval quality: no amount of reranking saves a bad embedder. It's also a near-permanent commitment, because re-indexing terabytes of data is expensive. Pick deliberately.