What Is a Vector Database? ANN Indexes, Filtering, and Hybrid Search

A vector database is a system built to store embeddings — the numeric vectors that represent the meaning of text, images, or other data — and to find the ones most similar to a query vector, fast. Instead of matching keywords like a traditional database, it answers “which stored items are closest in meaning to this one?” in milliseconds, even across millions of vectors. That capability is what makes semantic search and RAG possible at scale. This guide explains what a vector database does, how it works under the hood, and — honestly — when you need a dedicated one versus a lighter option.

This is the conceptual explainer. When you’re choosing a specific product, the head-to-heads live on the comparison pages: best self-hosted vector databases, Qdrant vs Weaviate, and more.

The problem a vector database solves

Once your content is turned into embeddings, searching by meaning is conceptually simple: embed the query, then find the stored vectors with the highest similarity. The catch is scale.

Comparing your query to a few hundred vectors one by one (“brute force”) is instant. Comparing it to ten million on every single search is far too slow for a real application. A naive search has to touch every vector on every query, so its cost grows linearly with the size of the collection and quickly becomes unworkable.
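As a baseline, this is roughly what brute-force search looks like. The minimal, dependency-free Python sketch below (Python is our choice for illustration) scores a query against every stored vector with cosine similarity and keeps the top k. The function names and the random data are hypothetical; real code would use NumPy or an ANN library rather than pure-Python loops.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors, k=3):
    """Exact k-nearest-neighbor search by scoring every stored vector.

    Every query costs O(N * d) work for N vectors of dimension d,
    which is why this approach stops scaling as N grows.
    Returns (index, similarity) pairs, most similar first.
    """
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    random.seed(0)
    dim = 8
    stored = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(1000)]
    query = [random.gauss(0, 1) for _ in range(dim)]
    for idx, sim in brute_force_search(query, stored, k=3):
        print(idx, round(sim, 3))
```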

A vector database exists to make that search fast anyway. It does three things a plain database doesn’t do well:

Stores high-dimensional vectors efficiently alongside their metadata.
Builds a specialized index so similarity search doesn’t have to scan everything.
Serves nearest-neighbor queries with low latency, plus filtering, updates, and persistence.

How it works: approximate nearest neighbor (ANN) search

The core technique is Approximate Nearest Neighbor (ANN) search. The key word is approximate. Instead of guaranteeing the mathematically exact closest vectors (which requires checking them all), an ANN index returns the closest ones with high probability, by skipping the vast majority of vectors that are unlikely to be relevant.

This is a deliberate trade: you give up a small amount of accuracy, measured as recall (the fraction of the true nearest neighbors you actually retrieve), in exchange for an enormous speed-up. A well-tuned index might return 98–99% of the true neighbors while examining only a small fraction of the data. For search and RAG, that trade is almost always worth it.
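Recall is straightforward to measure yourself: run the same queries through the ANN index and through an exact brute-force search, then count how many of the exact top-k ids the index also returned. A minimal Python sketch; the function names are our own, not any particular library’s API.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search returned.

    exact_ids: the top-k ids from an exact (brute-force) search.
    approx_ids: the top-k ids from the ANN index for the same query.
    """
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


def mean_recall(approx_results, exact_results):
    """Average recall@k over a batch of queries (one id list per query)."""
    pairs = list(zip(approx_results, exact_results))
    return sum(recall_at_k(a, e) for a, e in pairs) / len(pairs)


if __name__ == "__main__":
    exact = [4, 7, 1, 9, 3]   # true 5 nearest neighbors
    approx = [4, 7, 9, 2, 3]  # the ANN index missed id 1 and returned id 2 instead
    print(recall_at_k(approx, exact))  # prints 0.8 (4 of the 5 true neighbors)
```

In practice you would average this over a few hundred representative queries while adjusting the index’s search parameters, and stop once recall is good enough for your application.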

The two index families you’ll meet most:

HNSW (Hierarchical Navigable Small World)

HNSW is the most popular ANN index in modern vector databases. It builds a layered graph where each vector is connected to its nearby neighbors, with sparse “express lanes” at the top layers for jumping across the space quickly. A search starts at the top, hops toward the query region, then descends into denser layers to refine — like zooming in on a map.

Strengths: excellent recall-vs-speed balance, great for high-throughput, low-latency queries. The usual default.
Costs: higher memory use (the graph is kept in RAM) and slower builds and inserts than simpler indexes.
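To make the “hop toward the query, then refine” idea concrete, here is a deliberately simplified sketch in plain Python (our choice of language; no specific database is implied). It builds a single-layer neighbor graph by brute force, then runs the best-first beam search that HNSW performs inside each layer, keeping the `ef` best nodes found so far. Real HNSW adds the stacked layers, builds the graph incrementally, and prunes edges heuristically. The names `build_knn_graph`, `graph_search`, `m`, and `ef` are illustrative; the last two echo parameter names commonly used for HNSW.

```python
import heapq
import math


def build_knn_graph(vectors, m=4):
    """Toy construction: link each vector to its m nearest neighbors, found by brute force.

    Edges are stored in both directions. Real HNSW builds incrementally,
    stacks several layers, and prunes neighbors heuristically.
    """
    graph = {i: set() for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: math.dist(v, vectors[j]))
        for j in others[:m]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def graph_search(query, vectors, graph, entry=0, ef=10, k=3):
    """Best-first beam search over the graph, as HNSW does within one layer.

    ef is the beam width: how many of the best nodes found so far are kept.
    Returns (top-k ids, number of vectors whose distance was computed).
    """
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d_c, c = heapq.heappop(candidates)
        if d_c > -results[0][0]:
            break  # nothing left to expand can improve the result set
        for n in graph[c]:
            if n in visited:
                continue
            visited.add(n)
            d_n = math.dist(query, vectors[n])
            if len(results) < ef or d_n < -results[0][0]:
                heapq.heappush(candidates, (d_n, n))
                heapq.heappush(results, (-d_n, n))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    ranked = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in ranked[:k]], len(visited)


if __name__ == "__main__":
    points = [[float(i)] for i in range(50)]  # 50 vectors on a line
    graph = build_knn_graph(points, m=2)
    print(graph_search([37.2], points, graph, entry=0, ef=10, k=3))
```

On this toy graph the vectors lie on a line and the entry point sits at one end, so the search still walks most of the path before it reaches the query. That long walk is exactly what HNSW’s sparse upper layers are for: they supply long-range jumps, so the dense bottom layer only has to refine locally.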

IVF (Inverted File Index)

IVF first clusters all your vectors into groups (cells). At query time it figures out which few clusters your query is near and searches only those, ignoring the rest. It’s often combined with compression (e.g. product quantization) to shrink memory, which is where names like IVFFlat (uncompressed) and IVFPQ (product-quantized) come from.

Strengths: lower memory footprint, fast to build, scales to very large datasets, especially with quantization.
Costs: recall depends on how many clusters you probe; tuning matters more, and accuracy can dip if your data doesn’t cluster cleanly.
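A minimal IVF sketch in plain Python, under the same caveats as any toy: a few rounds of k-means carve the data into cells, each cell keeps an inverted list of the vector ids assigned to it, and a query scans only the `nprobe` cells whose centroids are nearest (FAISS uses the same name for this setting). `ToyIVF`, `kmeans`, and `nearest` are hypothetical helpers, and there is no quantization, so this corresponds to the uncompressed IVFFlat flavor.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(vectors, n_clusters, iters=10, seed=0):
    """A few rounds of Lloyd's k-means: assign to nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(iters):
        buckets = [[] for _ in range(n_clusters)]
        for v in vectors:
            buckets[nearest(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ToyIVF:
    def __init__(self, vectors, n_clusters=8, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_clusters, seed=seed)
        self.lists = [[] for _ in self.centroids]  # inverted lists: cell -> vector ids
        for i, v in enumerate(vectors):
            self.lists[nearest(v, self.centroids)].append(i)

    def search(self, query, k=3, nprobe=2):
        """Scan only the nprobe nearest cells. Returns (top-k ids, vectors scanned)."""
        order = sorted(range(len(self.centroids)),
                       key=lambda c: math.dist(query, self.centroids[c]))
        candidates = [i for c in order[:nprobe] for i in self.lists[c]]
        candidates.sort(key=lambda i: math.dist(query, self.vectors[i]))
        return candidates[:k], len(candidates)


if __name__ == "__main__":
    random.seed(1)
    data = [[random.random(), random.random()] for _ in range(2000)]
    index = ToyIVF(data, n_clusters=16)
    for nprobe in (1, 4, 16):
        ids, scanned = index.search([0.5, 0.5], k=3, nprobe=nprobe)
        print(f"nprobe={nprobe}: scanned {scanned} of {len(data)} vectors -> {ids}")
```

Raising `nprobe` scans more cells and recovers more of the true neighbors; setting it to the number of cells turns the search back into an exact brute-force scan. That is the tuning trade-off described above.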

	HNSW	IVF (+PQ)
Structure	Layered neighbor graph	Clustered cells
Recall/speed	Excellent, default choice	Good, very tunable
Memory	Higher (graph in RAM)	Lower (esp. with quantization)
Build/insert speed	Slower	Faster
Best when	Latency-critical search	Huge datasets, memory-constrained

You rarely implement these yourself — you pick a database and choose (or accept the default of) an index type. But knowing the trade lets you read a config file and understand why your search is fast, slow, or memory-hungry.

Beyond similarity: metadata filtering

Pure similarity search isn’t enough for real applications. You almost always need to combine “closest in meaning” with hard constraints: only this user’s documents, only articles from the last 30 days, only items in stock. That’s metadata filtering.

A vector database stores structured fields (a category, a date, a tenant ID, a price) alongside each vector and lets you filter on them in the same query as the similarity search. The hard part, and a real differentiator between products, is doing this efficiently. A naive implementation either filters first and is left with a subset the prebuilt index can’t search efficiently, or searches first and then throws away most of the results, sometimes returning fewer matches than you asked for. Good engines integrate filtering into the ANN traversal so it stays fast and accurate. This is one reason a dedicated vector database can beat a bolt-on solution at scale.
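A minimal Python sketch of the two naive strategies, using exact scans in place of a real ANN index so that the only thing that varies is where the filter sits. The item layout (`id`, `vec`, `tenant`) and the function names are hypothetical. With a selective filter, post-filtering can come back empty. Pre-filtering returns the right items here only because the sketch brute-forces the survivors, which is the slow path that filtering inside the index traversal is meant to avoid.

```python
import math


def post_filter_search(query, items, predicate, k, candidate_pool):
    """Search first, filter second: take the candidate_pool nearest items, then drop non-matches.

    If matching items are rare, few (or none) survive, so fewer than k come back.
    """
    ranked = sorted(items, key=lambda it: math.dist(query, it["vec"]))[:candidate_pool]
    return [it["id"] for it in ranked if predicate(it)][:k]


def pre_filter_search(query, items, predicate, k):
    """Filter first, search second: keep matching items, then rank only those.

    Returns up to k matches, but the ranking here is a brute-force scan;
    in a real engine the filtered subset no longer lines up with the prebuilt index.
    """
    survivors = [it for it in items if predicate(it)]
    survivors.sort(key=lambda it: math.dist(query, it["vec"]))
    return [it["id"] for it in survivors[:k]]


def is_tenant_a(item):
    return item["tenant"] == "a"


if __name__ == "__main__":
    # 100 items on a line; only every 20th belongs to tenant "a" (5% of the data).
    items = [{"id": i, "vec": [float(i)], "tenant": "a" if i % 20 == 0 else "b"}
             for i in range(100)]
    print(post_filter_search([50.0], items, is_tenant_a, k=3, candidate_pool=10))  # prints []
    print(pre_filter_search([50.0], items, is_tenant_a, k=3))  # prints [40, 60, 20]
```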

Hybrid search: vectors plus keywords

Semantic (vector) search is great at meaning but can miss exact strings such as a precise product SKU, an error code, or a person’s name. Keyword search (typically ranked with BM25) is the opposite: literal and precise, but blind to synonyms. Hybrid search runs both and fuses the two result lists, so you get the recall of semantic search and the precision of keyword matching.

Many vector databases offer hybrid search as a built-in single-query feature, often using a fusion method like Reciprocal Rank Fusion (RRF) to merge the two ranked lists. For most production search, hybrid is the strongest default — we cover the why in What Is Semantic Search. When comparing products, native hybrid support (versus DIY assembly in application code) is a meaningful line item.
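Reciprocal Rank Fusion is simple enough to sketch in a few lines of Python. It ignores raw scores, which are not comparable between BM25 and cosine similarity, and uses only each document’s rank in each list: score(d) = Σ 1 / (k + rank). The constant k = 60 comes from the original RRF paper and is a common default; individual databases may use other constants or add weighting, so treat this as an illustration rather than any product’s implementation. The document ids are made up.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Documents missing from a list get nothing from it.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    vector_hits = ["refund-policy", "returns-faq", "error-E42"]  # semantic ranking
    keyword_hits = ["error-E42", "refund-policy", "shipping"]    # BM25 ranking
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
    # prints ['refund-policy', 'error-E42', 'returns-faq', 'shipping']
```

Here `refund-policy` wins because it ranks high in both lists, while `error-E42`, the exact-string match that vector search ranked last, is pulled up to second place by its first-place keyword ranking.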

When you actually need a vector database

Here’s the honest part, because not every project needs a dedicated vector database. Match the tool to the scale:

A library like FAISS — for in-process, fixed datasets

FAISS (and similar libraries) is an ANN library, not a database. It gives you the index algorithms (HNSW, IVF, and more) to run inside your own process. There’s no server, no network API, and no database-style persistence or metadata filtering: you can save an index to a file, but durability, updates, and the metadata that goes with each vector are yours to manage. It’s ideal for research, batch jobs, notebooks, or embedding a fast index into one application where the dataset is mostly static and you don’t need concurrent writes, metadata filtering, or a network service. When your needs grow past “fast search in one process,” a library starts to feel like infrastructure you’re rebuilding by hand.

pgvector — when you already run Postgres

pgvector is an extension that adds vector search to PostgreSQL (it supports HNSW and IVFFlat indexes). If you already run Postgres, your vector count is modest-to-moderate, and you value keeping vectors next to your relational data (same database, same backups, same transactions, one fewer system to operate), pgvector is often the most pragmatic choice. The tradeoff: at very large scale or with demanding hybrid-search and filtering needs, a purpose-built engine generally has the edge, since vector search is an add-on to Postgres rather than the system’s whole reason to exist.

A dedicated vector database — at scale and in production

Dedicated engines (Qdrant, Weaviate, Milvus, Chroma, and others) are the right call when you have millions of vectors, need low-latency search under concurrent load, want efficient metadata filtering and native hybrid search, and need real operational features: horizontal scaling, replication, live inserts and deletes, and a network API. This is the production tier for serious semantic search and RAG.

A useful rule of thumb:

Situation	Reasonable choice
Prototype / notebook / static dataset	A library like FAISS
Already on Postgres, modest scale	pgvector
Millions of vectors, production load, filtering + hybrid	Dedicated vector database

The point of this guide is what a vector database is and whether you need one. Which dedicated database to pick is a separate, opinionated question — covered in best self-hosted vector databases, Qdrant vs Weaviate, and the Typesense vs Meilisearch comparison for the site-search angle.

How it fits the self-hosted stack

For anyone building private, self-hosted AI search, the vector database is the piece that holds your index — and therefore your data. Self-hosting it means your embeddings (which encode the meaning of your private documents) never leave your infrastructure. Most of the strong options ship an official Docker image and run on a single box for a small-to-medium corpus, scaling to a cluster when you need it. Own this layer and you own the most sensitive part of the stack: the searchable representation of your content.

FAQ

What’s the difference between a vector database and a regular database? A regular database matches exact values and keywords. A vector database stores embeddings and finds items by similarity of meaning using ANN indexes. They solve different problems — and many real systems use both, or use pgvector to add vector search to a regular Postgres database.

Is a vector database the same as an embedding? No. An embedding is a single vector representing one piece of content. A vector database stores many embeddings and searches them efficiently. The embedding is the data; the database is where it lives and gets queried.

Do I need a vector database for RAG? You need somewhere to store and search embeddings. For a small or prototype RAG system, a library or pgvector is often enough. At production scale — millions of vectors, concurrent queries, filtering — a dedicated vector database earns its place.

What is HNSW, in one sentence? HNSW is a graph-based ANN index that finds approximate nearest neighbors very quickly by navigating a layered network of vectors; it’s the most common default index in modern vector databases.

Can’t I just use pgvector for everything? Often, yes — especially if you already run Postgres and your scale is modest. pgvector handles a lot of real workloads well. Dedicated engines pull ahead at large scale or when you need advanced hybrid search and high-throughput filtered queries. See the comparison pages for the tradeoffs.

A vector database is the search engine at the heart of meaning-based search. Build your understanding outward: start with embeddings, see them in action in semantic search, or pick a self-hosted option in best self-hosted vector databases. Aquila is the independent home for AI search you own. Own your search.