What Is Semantic Search? Embeddings, Keywords, and Hybrid Explained
Search by meaning, not just matching words — the concept behind RAG and AI search.
Semantic search finds results by meaning rather than by matching exact words. Where a keyword search for “how to reset my password” only finds documents containing those literal words, a semantic search also surfaces a page titled “recovering account access” — because it understands the two phrases mean nearly the same thing. It does this by converting text into numerical vectors called embeddings and finding the ones closest in “meaning space.” This is the foundation underneath modern AI search and RAG.
Keyword search vs semantic search
Keyword search (the classic approach, e.g. BM25 in Elasticsearch) ranks documents by how often and how prominently your query terms appear. It’s fast, transparent, and unbeatable for exact matches — error codes, product SKUs, names, precise quotes. Its weakness: it’s literal. It misses synonyms, paraphrases, and intent. Search “car” and it won’t find “automobile.”
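To make the literal-matching behavior concrete, here is a minimal sketch of BM25-style scoring in plain Python. It is not how Elasticsearch implements it: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the three sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; real engines use richer analyzers.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a simplified BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term that is not literally present contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

docs = [
    "used car for sale",
    "automobile repair manual",
    "car insurance and car loans",
]
scores = bm25_scores("car", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The "automobile" document scores zero for the query "car" because the two words share no tokens, which is exactly the limitation described above.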
Semantic search understands that “car” and “automobile” are close in meaning, that “the capital of France” relates to “Paris,” and that a question can match an answer phrased completely differently. Its weakness: it can be too fuzzy, missing exact strings (it might not prioritize the precise SKU you typed) and it costs more to compute.
| | Keyword (BM25) | Semantic (vector) |
|---|---|---|
| Matches on | Literal terms | Meaning / intent |
| Handles synonyms | No | Yes |
| Exact strings (SKUs, codes) | Excellent | Can miss |
| Speed / cost | Very fast, cheap | Heavier (embeddings) |
| Explainability | High (you see the matched words) | Lower (it’s vectors) |
What are embeddings?
An embedding is a list of numbers, called a vector, that represents the meaning of a piece of text. An embedding model reads text and outputs a fixed-length vector of, say, 768 numbers. No single number is interpretable on its own; meaning is encoded in the pattern across all dimensions. The key property: texts with similar meaning produce vectors that sit close together in that high-dimensional space, and unrelated texts sit far apart.
So “the cat sat on the mat” and “a feline rested on the rug” land near each other, while “quarterly revenue forecast” lands far away. Semantic search works by embedding your query, then finding the stored document vectors nearest to it, as judged by a similarity measure such as cosine similarity. Those nearest neighbors are your results.
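The nearest-neighbor lookup can be sketched in a few lines. The three-dimensional vectors below are hand-made stand-ins, not the output of any real embedding model (real models produce hundreds of dimensions), and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, doc_vecs, k=2):
    """Return the k document keys whose vectors are most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:k]]

# Toy vectors (assumed for illustration only).
doc_vecs = {
    "the cat sat on the mat": [0.9, 0.1, 0.0],
    "a feline rested on the rug": [0.8, 0.2, 0.1],
    "quarterly revenue forecast": [0.0, 0.1, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend this is the embedded query

top = nearest(query_vec, doc_vecs, k=2)
print(top)
```

This version compares the query against every stored vector, which is fine for a handful of documents but is the linear scan that a vector index exists to avoid.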
Popular open-source embedding models include nomic-embed-text and mxbai-embed-large (both runnable locally via Ollama); a common cloud option is OpenAI’s text-embedding-3-small. One rule matters above all: embed your documents and your queries with the same model, or the vectors won’t be comparable.
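One way to enforce the same-model rule is to store the model name next to every vector and check it at query time. The sketch below is an assumed design, not a feature of any particular library; `StoredEmbedding` and `assert_comparable` are hypothetical names, and the short vectors are placeholders.

```python
from dataclasses import dataclass

@dataclass
class StoredEmbedding:
    doc_id: str
    model: str    # name of the embedding model that produced the vector
    vector: list

def assert_comparable(query_model, query_vec, stored):
    """Raise ValueError if a query vector cannot be compared with a stored one."""
    if stored.model != query_model:
        # Matching dimensions alone do not make two vector spaces comparable.
        raise ValueError(
            f"{stored.doc_id} was embedded with {stored.model!r}, "
            f"but the query uses {query_model!r}"
        )
    if len(stored.vector) != len(query_vec):
        raise ValueError("vector dimensions differ")

stored = StoredEmbedding("doc-1", "nomic-embed-text", [0.1, 0.2, 0.3])

# Same model on both sides: the check passes silently.
assert_comparable("nomic-embed-text", [0.3, 0.2, 0.1], stored)

# Different model: the check refuses instead of returning meaningless scores.
try:
    assert_comparable("mxbai-embed-large", [0.3, 0.2, 0.1, 0.0], stored)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Failing loudly matters here because a mismatched search usually does not crash on its own; it just returns poor results.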
What is hybrid search?
Hybrid search runs keyword and semantic search together and blends the scores. You get the best of both: the precision of keyword matching for exact terms and the recall of semantic matching for synonyms and intent. In practice this is what most strong production search systems use, because pure semantic search alone tends to miss the exact-match cases users also expect to work. Many vector databases — Qdrant and Weaviate among them — support hybrid search natively.
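As a rough illustration of blending, the sketch below rescales each score set to the 0 to 1 range and takes a weighted average. This is one simple approach under stated assumptions, not the method any specific database uses; engines often fuse rankings instead (reciprocal rank fusion is a common choice). The document IDs, the scores, and the `alpha` weight are invented for the example.

```python
def min_max(scores):
    """Rescale scores to 0..1 so keyword and semantic scores become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_scores(keyword, semantic, alpha=0.5):
    """Weighted blend: alpha weights the semantic side, (1 - alpha) the keyword side."""
    kw, sem = min_max(keyword), min_max(semantic)
    docs = set(kw) | set(sem)
    return {
        doc: alpha * sem.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in docs
    }

# Hypothetical scores for the query "how to reset my password".
keyword = {"reset-password-howto": 6.0, "account-recovery": 0.0, "pricing": 1.0}
semantic = {"reset-password-howto": 0.82, "account-recovery": 0.78, "pricing": 0.10}

blended = hybrid_scores(keyword, semantic, alpha=0.5)
ranked = sorted(blended, key=blended.get, reverse=True)
print(ranked)
```

The "account-recovery" page gets no keyword score at all, yet it still ranks second because the semantic side carries it, while the exact-match page stays on top.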
Where do vector databases fit in?
To do semantic search at scale you need somewhere to store millions of embeddings and search them fast. That’s a vector database (or a vector index). It builds a structure — usually an HNSW graph — that finds approximate nearest neighbors in milliseconds instead of comparing your query to every vector one by one. Options range from pgvector (a PostgreSQL extension) to dedicated engines like Qdrant, Chroma, and Weaviate. We cover choosing one in the self-hosted RAG guide.
When to use which
- Use keyword search when exact matching dominates: code search, legal citations, product catalogs with precise IDs, or any time users type strings they expect to match literally.
- Use semantic search when intent and phrasing vary: FAQ/help centers, document Q&A, “search by what I mean,” and anything feeding an LLM in a RAG pipeline.
- Use hybrid search when you want both — which, honestly, is most real-world search. Start hybrid if you can; it’s the safest default.
How this connects to AI search and RAG
Semantic search is the retrieval half of retrieval-augmented generation. In RAG, you semantically search your knowledge base for the chunks most relevant to a question, then hand those chunks to an LLM to write a grounded, cited answer. The same machinery powers open-source AI answer engines like Vane and Khoj. Understand semantic search and embeddings, and the rest of the modern AI-search stack stops being mysterious.
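The retrieval half of that pipeline can be sketched as follows. The chunk vectors are hand-made placeholders rather than real embeddings, the prompt wording is an assumption, and no LLM is called; the sketch stops at the prompt you would send.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, chunks, k=2):
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, retrieved):
    """Number the retrieved chunks so the model can cite them in its answer."""
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(retrieved, start=1))
    return (
        "Answer the question using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )

# (text, vector) pairs; the vectors are toy values assumed for illustration.
chunks = [
    ("To recover account access, open Settings and choose Reset.", [0.9, 0.1, 0.0]),
    ("Password resets expire after 24 hours.", [0.8, 0.3, 0.0]),
    ("Our quarterly revenue forecast is published in March.", [0.0, 0.1, 0.9]),
]
question = "How do I reset my password?"
query_vec = [0.85, 0.2, 0.0]  # pretend embedding of the question

retrieved = retrieve(query_vec, chunks, k=2)
prompt = build_prompt(question, retrieved)
print(prompt)
```

Only the two password-related chunks reach the prompt, which is what keeps the generated answer grounded in relevant material.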
FAQ
Is semantic search the same as neural search? Effectively yes, in common usage. “Neural search” emphasizes that the embeddings come from a neural network; “semantic search” emphasizes the goal (search by meaning). People use the terms interchangeably.
Does semantic search replace keyword search? No — it complements it. Keyword search still wins for exact matches. Hybrid search combines both and is usually the strongest choice.
Do I need a GPU to do semantic search? For generating embeddings on a modest corpus and running the vector search, a CPU is fine. A GPU mainly speeds up embedding very large datasets or running large local models.
What’s the difference between an embedding and a vector database? An embedding is the numeric representation of one piece of text. A vector database stores many embeddings and searches them efficiently. The embedding is the data; the database is where it lives and gets queried.
This is the foundation; the fun is in building on it. Read the Self-Hosted RAG complete guide to turn semantic search into a private AI knowledge base, or browse all guides. Aquila is the independent home for AI search you own. Own your search.