Embeddings
Numerical representations that capture the semantic meaning of text, images, or other data as vectors, enabling machines to measure how similar two pieces of content are.
An embedding is a list of numbers, a vector, that represents the meaning of a piece of content. Text, images, audio: anything you can feed into an embedding model comes out as a fixed-length array of floats, commonly a few hundred to a few thousand dimensions (384, 768, 1,536, and 3,072 are typical sizes). Semantically similar inputs land near each other in that space, with closeness usually measured by cosine similarity or a related distance metric. "Refund policy" and "how do I get my money back" are completely different strings, but their embeddings are close neighbors. That's the trick that makes modern search and retrieval actually work.
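To make "close neighbors" concrete, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three vectors are made-up 4-dimensional values chosen for illustration, not output from any real model; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example only.
refund_policy = [0.8, 0.1, 0.5, 0.2]    # "Refund policy"
money_back = [0.7, 0.2, 0.6, 0.1]       # "how do I get my money back"
shipping_times = [0.1, 0.9, 0.0, 0.4]   # "how long does shipping take"

close = cosine_similarity(refund_policy, money_back)
far = cosine_similarity(refund_policy, shipping_times)

print(f"refund vs money back: {close:.3f}")
print(f"refund vs shipping:   {far:.3f}")
```

With these toy values, the two refund-related vectors score much higher than the refund and shipping pair, which is the property a retrieval system relies on.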
Most RAG pipelines start here. Embed the query, find the nearest stored vectors in a vector database, pull the relevant context, then let the model generate a response. Bad embeddings mean bad retrieval; bad retrieval invites hallucinations however capable your model is, because the model can only ground its answer in the context it was handed. The embedding layer is the quality ceiling for your entire retrieval system.
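The embed, search, and assemble steps can be sketched in a few lines. This is an illustration under loud assumptions: `toy_embed` is a hypothetical stand-in that just counts words (a real embedding model captures meaning, not word overlap), the in-memory list stands in for a vector database, and `build_prompt` is an invented helper that stops short of calling any language model.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, index, k=2):
    """Return the k stored chunks whose vectors are nearest to the query vector."""
    query_vec = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, context_chunks):
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Embed the corpus once, up front; the list plays the role of a vector database.
chunks = [
    "refunds are issued within 14 days of purchase",
    "shipping takes 3 to 5 business days",
    "contact support to reset your password",
]
index = [(chunk, toy_embed(chunk)) for chunk in chunks]

query = "when are refunds issued"
top_chunks = retrieve(query, index, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

Swapping `toy_embed` for a real model changes nothing about the flow, only the quality of what `retrieve` returns, which is why the embedding step caps the quality of everything after it.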
The practical considerations: embedding models are cheap, often fractions of a cent per thousand chunks on hosted APIs, but you embed your entire corpus upfront and re-embed when content changes. More importantly, your choice of embedding model is a commitment. You can't swap models without re-embedding everything, because different models produce incompatible vector spaces: a query vector from one model cannot be meaningfully compared with stored vectors from another. Pick one that matches your domain, benchmark it against your actual queries, and treat the embedding pipeline as infrastructure from day one.
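One way to treat the pipeline as infrastructure is to store, next to every vector, which model produced it and a hash of the text it was computed from. The sketch below is an assumption about how such bookkeeping might look, not a feature of any particular vector database; `StoredEmbedding`, `needs_reembedding`, and the model names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class StoredEmbedding:
    """Hypothetical record kept alongside each vector."""
    chunk_id: str
    model: str          # name of the embedding model that produced the vector
    content_hash: str   # SHA-256 of the text that was embedded
    vector: list


def content_hash(text):
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reembedding(record, current_text, current_model):
    """True if the chunk's text changed or the vector came from a different model."""
    return (
        record.model != current_model
        or record.content_hash != content_hash(current_text)
    )


original_text = "Refunds are issued within 14 days."
record = StoredEmbedding(
    chunk_id="policy-001",
    model="embed-model-v1",             # invented model name
    content_hash=content_hash(original_text),
    vector=[0.8, 0.1, 0.5, 0.2],        # made-up values
)

unchanged = needs_reembedding(record, original_text, "embed-model-v1")
text_edited = needs_reembedding(record, "Refunds are issued within 30 days.", "embed-model-v1")
model_swapped = needs_reembedding(record, original_text, "embed-model-v2")

print(unchanged, text_edited, model_swapped)
```

Content edits flag only the affected chunks, while a model change flags every record at once, which mirrors the cost asymmetry described above.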
Further reading:
- OpenAI Embeddings Guide — Official documentation on embeddings, including model options and best practices
- What are Embeddings? by Vicki Boykis — Deep dive into embeddings from mathematical fundamentals through production use