Self-Host SearXNG: Your Own Private Metasearch Engine (No Tracking)

SearXNG is a free, self-hostable metasearch engine: it forwards your query to the search engines you enable (more than 200 are supported), aggregates the results, and hands them back to you without tracking you, profiling you, storing your queries, or showing ads. Self-hosting it gives you a private search front-end you fully own, running on a $5 VPS or a spare box at home. This guide covers why you’d want one, how to stand it up with Docker, the configuration that actually matters, and when it’s worth bolting an Ollama-powered LLM on top to get AI-style answers (the Vane/Perplexica stack).

If you want the wider landscape of AI answer engines first, see Open-Source Perplexity Alternatives — SearXNG is the private search layer several of them sit on.

What SearXNG actually is

SearXNG is a community-maintained fork of the original Searx project. It does not crawl or index the web itself. Instead it acts as a proxy in front of the search engines you already know — Google, Bing, DuckDuckGo, Brave, Wikipedia, and many more. You type a query, SearXNG fans it out to whichever engines you’ve enabled (up to a couple hundred services), de-duplicates and re-ranks the combined results, and returns one clean page.
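The fan-out, de-duplicate, and re-rank flow can be pictured with a small toy model. This is only an illustrative sketch, not SearXNG's actual merging code: the function names, the URL normalization rule, and the reciprocal-position scoring are all assumptions chosen to keep the example short.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Build a comparison key: lowercase host without 'www.', no scheme, no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    key = host + parts.path.rstrip("/")
    return key + ("?" + parts.query if parts.query else "")


def merge_results(per_engine: dict[str, list[dict]]) -> list[dict]:
    """Merge ranked result lists from several engines into one de-duplicated list.

    Hypothetical scoring: each engine contributes 1/position for a URL,
    so pages that several engines rank highly float to the top.
    """
    merged: dict[str, dict] = {}
    for engine, results in per_engine.items():
        for position, item in enumerate(results, start=1):
            entry = merged.setdefault(
                normalize(item["url"]),
                {"url": item["url"], "title": item["title"], "engines": [], "score": 0.0},
            )
            entry["engines"].append(engine)
            entry["score"] += 1.0 / position
    return sorted(merged.values(), key=lambda e: e["score"], reverse=True)


if __name__ == "__main__":
    # Placeholder engine names and URLs, for illustration only.
    sample = {
        "engine_a": [
            {"url": "https://example.org/docs/", "title": "Docs"},
            {"url": "https://example.com/a", "title": "A"},
        ],
        "engine_b": [{"url": "http://www.example.org/docs", "title": "Docs"}],
    }
    for row in merge_results(sample):
        print(row["url"], row["engines"], row["score"])
```

The point of the sketch is the shape of the work: the expensive part (crawling and indexing) never happens locally, and the merge step is cheap dictionary bookkeeping.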

A few facts worth pinning down (verified June 2026):

Property	SearXNG
What it is	Self-hostable metasearch engine
License	AGPL-3.0
GitHub stars	~32.3k (as of June 2026)
Engines aggregated	200+ search services
Tracking	None — users are neither tracked nor profiled
Ads	None
Deployment	Docker (official image), or bare-metal

The headline property is the one you can’t get from any commercial search engine: it logs nothing about you and monetizes nothing about you. There’s no profile, no ad auction, no “people also searched for” surveillance loop. The results are just results.

Why self-host your search

You can use one of the many public SearXNG instances without hosting anything. But running your own gets you guarantees a public instance can’t:

No trust assumption. On a public instance you’re trusting an operator you’ve never met not to log queries. On your own instance, you are the operator. The privacy is structural, not promised.
No rate-limiting roulette. Public instances get hammered, hit upstream rate limits, and serve degraded results. Yours serves only you (or your team).
Full configuration control. You choose exactly which engines run, the default region and language, safe-search level, and the UI. You can disable the engines that leak the most or perform worst for your use.
A private base for AI search. This is the big one for developers: a self-hosted SearXNG is the retrieval plumbing that AI answer engines like Vane plug into. Own the search layer and the whole self-hosted AI search stack stays on your hardware.

The tradeoff is the usual one: you run it, you patch it, you keep it alive. For a metasearch front-end that’s genuinely light work — but it’s still your box now.

Docker setup overview

SearXNG ships an official Docker image, and Docker is the path of least resistance. The shape of a deployment looks like this — always defer to the project’s current README and searxng-docker repo for exact, up-to-date commands, because compose files and config keys drift between releases.

1. Prerequisites

You need a host with Docker and Docker Compose installed. A 1–2 vCPU / 1–2 GB RAM VPS is plenty for a personal instance — SearXNG is lightweight because the heavy lifting (actual crawling and indexing) happens on the upstream engines, not on your box. A Hetzner or DigitalOcean-class VPS, or a Raspberry Pi / home server, all work.

2. Pull the stack

The recommended deployment is the searxng-docker compose bundle, which brings up three containers:

SearXNG itself (the search app)
Redis / Valkey (an in-memory cache SearXNG uses for rate-limiting and short-lived state)
Caddy (an optional reverse proxy that handles TLS / HTTPS automatically)

You clone the searxng-docker repository, set a couple of environment values, and run docker compose up -d. That’s the whole “installation.”

3. Point it at your domain (or localhost)

If you’re exposing it on the internet, set your hostname and let Caddy fetch a Let’s Encrypt certificate so the instance is served over HTTPS. If it’s purely internal — your laptop, your homelab, behind a VPN — you can run it on localhost or a private IP and skip public TLS entirely. Internal-only is the most private posture: nothing about your instance is reachable from outside your network.

4. Generate a secret and lock it down

On first run you set a secret_key in settings.yml; SearXNG uses it for cryptographic signing, so it should be a long random value rather than a placeholder. If your instance is public, you’ll also want to keep the JSON output format disabled unless you specifically need it (the default configuration serves only HTML for a reason: an open JSON endpoint can be abused as a scraping proxy).
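One way to produce a suitable value for secret_key is Python's standard secrets module. This is a minimal sketch; the 32-byte length is an assumption, and the helper name is made up for this example. Any long, random, unguessable string works.

```python
import secrets


def new_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string (two hex characters per byte) for use as secret_key."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into your settings.yml; do not commit it to version control.
    print(new_secret_key())
```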

That’s the baseline. You now have a working private metasearch engine at your address.

Configuration basics

Almost everything is driven by a single YAML file, settings.yml. The Docker bundle lets you mount your own copy and override defaults. The settings that matter most for a real deployment:

Engines

Each upstream search engine is a block you can enable, disable, or weight. Out of the box, dozens are on. In practice you’ll want to:

Disable engines that constantly rate-limit or break so they stop dragging down result latency.
Disable engines you don’t trust for privacy if you’re being strict.
Weight the engines you trust most so their results rank higher in the aggregated list.
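The pruning and weighting steps above can be sketched as a transformation over the engine list. This assumes each engine entry is a mapping with a name plus optional disabled and weight keys, which matches how engine blocks are commonly written in settings.yml; check the current SearXNG documentation before relying on those key names. The helper and the engine names are hypothetical.

```python
def apply_engine_overrides(engines, disable=(), weights=None):
    """Return a new engine list with some engines disabled and others re-weighted.

    engines: list of dicts such as {"name": "alpha"}
    disable: names to switch off (assumed key: "disabled")
    weights: mapping of name -> ranking weight (assumed key: "weight")
    """
    weights = weights or {}
    updated = []
    for engine in engines:
        entry = dict(engine)  # copy, so the caller's list is left untouched
        if entry["name"] in disable:
            entry["disabled"] = True
        if entry["name"] in weights:
            entry["weight"] = weights[entry["name"]]
        updated.append(entry)
    return updated


if __name__ == "__main__":
    # Placeholder engine names, for illustration only.
    current = [{"name": "alpha"}, {"name": "beta"}, {"name": "gamma"}]
    for row in apply_engine_overrides(current, disable={"beta"}, weights={"gamma": 2}):
        print(row)
```

In practice you would make the same changes by hand in your mounted settings.yml; the sketch only shows that each change is a one-key edit per engine.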

Search defaults

Set the default language/region, safe_search level, and which categories (general, images, news, science, etc.) appear. For a team instance, sensible defaults save everyone from fiddling with toggles.

Rate limiting and bot protection

The limiter (backed by Redis/Valkey) protects a public instance from abuse, and it keeps abusive traffic from pushing upstream engines into rate-limiting your server’s IP. Leave it on for anything internet-facing.
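To make the idea concrete, here is a generic sliding-window limiter keyed by client IP. It is not SearXNG's limiter, which keeps its state in Redis/Valkey and applies its own rules; this in-memory class, its name, and its parameters are assumptions used only to show the counting logic.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._hits = defaultdict(deque)  # client_ip -> timestamps of accepted requests

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
    # 192.0.2.1 is a documentation-only address used as a placeholder.
    print([limiter.allow("192.0.2.1") for _ in range(5)])
```

A shared store such as Redis or Valkey matters once more than one worker process serves requests, because per-process memory like the dictionary above would let each worker count separately.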

UI and result format

You can theme the front-end, set the default results-per-page, and — importantly for the AI stack — enable the JSON output format if (and only if) you intend to query SearXNG programmatically from an LLM pipeline. Keep it disabled otherwise.
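With the JSON format enabled, a programmatic query is a plain HTTP GET. The sketch below assumes the instance answers on /search with q and format=json parameters and returns a results array whose items carry title, url, and content fields; the base URL and port are placeholders, and the helper names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str) -> str:
    """Build the search URL that requests JSON output."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def parse_results(payload: str, limit: int = 5) -> list[dict]:
    """Keep only the fields an LLM pipeline usually needs from the JSON response."""
    data = json.loads(payload)
    return [
        {"title": r.get("title", ""), "url": r.get("url", ""), "snippet": r.get("content", "")}
        for r in data.get("results", [])[:limit]
    ]


def search(base_url: str, query: str, limit: int = 5) -> list[dict]:
    """Query a SearXNG instance; fails with an HTTP error if JSON output is not enabled."""
    with urlopen(build_search_url(base_url, query), timeout=10) as response:
        return parse_results(response.read().decode("utf-8"), limit)


if __name__ == "__main__":
    # Assumed local address; replace with wherever your instance listens.
    for hit in search("http://localhost:8080", "self-hosted metasearch"):
        print(hit["title"], hit["url"])
```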

A pragmatic rule: change as little as possible at first. Get it running with defaults, use it for a week, then prune the engines that annoy you.

The privacy benefits, concretely

It’s worth being specific about what “private” buys you here, because the word is overused:

No query logging tied to you. A self-hosted instance you run for yourself has no incentive — and, configured normally, no mechanism — to build a search history profile. There’s no advertiser to sell it to.
No browser fingerprinting by upstream engines. SearXNG queries the upstream engines on your behalf, from its IP, not from your browser. Google sees your server, not your browser. You get the engines’ results without handing them your cookies or browser fingerprint. One limit: on a single-user instance the server’s IP still stands in for you, and on a home-hosted box it is your home IP.
No ad-driven ranking. Commercial engines rank partly by what’s profitable to show you. SearXNG just aggregates and de-duplicates. No sponsored slots are silently mixed in.
No third-party operator in the loop. On an internal instance, nobody else’s front-end sits between you and the engines; your queries leave your network only as de-identified fan-out requests to the engines you chose.

The honest caveat: the upstream engines still see the queries (just not who you are). SearXNG de-identifies you; it doesn’t make the search itself disappear. If an engine could fingerprint the query content itself, that content still leaves your box. For most people the de-identification is exactly the win they wanted; for true air-gapped needs you’d be searching a local index, not the live web.

When to add an LLM layer (the Vane / Ollama stack)

SearXNG by itself returns a list of links — fast, private, ad-free, but it’s a list. To get a Perplexity-style cited answer (“here’s the synthesized answer, with sources”), you add a language model on top that:

Takes your question and runs a SearXNG query (using that JSON output format).
Reads the top results.
Writes a grounded, cited summary.
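The three steps above can be sketched as a small pipeline. This is a hedged outline, not how Vane implements it: search_fn and generate_fn are hypothetical callables you supply (for example, a SearXNG JSON query and a call to a local model), and the prompt wording is an assumption.

```python
def build_prompt(question: str, results: list[dict], max_sources: int = 3) -> str:
    """Turn the top search results into a numbered context block plus the question."""
    sources = results[:max_sources]
    context = "\n\n".join(
        f"[{i}] {r['title']} ({r['url']})\n{r['snippet']}"
        for i, r in enumerate(sources, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources inline as [n].\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, search_fn, generate_fn, max_sources: int = 3) -> str:
    """Search, build a grounded prompt, and hand it to the model."""
    results = search_fn(question)  # step 1: run the search
    prompt = build_prompt(question, results, max_sources)  # step 2: read the top results
    return generate_fn(prompt)  # step 3: write the cited summary


if __name__ == "__main__":
    # Stub callables so the sketch runs without a search instance or a model.
    fake_search = lambda q: [
        {"title": "Example page", "url": "https://example.org", "snippet": "Example text."}
    ]
    echo_model = lambda prompt: f"(model output for a {len(prompt)}-character prompt)"
    print(answer("What is a metasearch engine?", fake_search, echo_model))
```

Keeping the search and generation steps as injected callables makes the privacy tradeoff explicit: swapping generate_fn between a local model and a cloud API changes where the prompt and retrieved context go, while the search step stays on your instance.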

You don’t have to build that yourself. Vane (the project formerly known as Perplexica, MIT-licensed, ~35.4k GitHub stars as of June 2026) bundles exactly this: a Next.js front-end, an API backend, and a private SearXNG instance, shipped as a Docker image. Point it at a local Ollama model so answer generation stays entirely on your hardware, or at a cloud LLM (OpenAI, Claude, Gemini, Groq) when you want more reasoning horsepower. Khoj and SurfSense are alternatives with different strengths; all are covered in Open-Source Perplexity Alternatives.

So when should you add the LLM layer?

Add it when you want answers, not link lists — research, Q&A, “summarize the current state of X.” This is where a model earns its keep.
Keep it bare when you just want private web search to click through yourself. A raw SearXNG instance is lighter, faster, and needs no GPU.
Go local-LLM (Ollama) when privacy is the whole point: the prompt, the retrieved pages, and the generated answer all stay on your hardware, and only the de-identified search queries go out to the upstream engines.
Go cloud-LLM for generation only when you want frontier-model quality and are willing to send the prompt and retrieved context to a provider, while keeping the search private. Make that tradeoff knowingly.

If your real goal is chatting with your own documents rather than the live web, that’s retrieval-augmented generation, and the building blocks are embeddings and a vector database rather than a metasearch engine. SearXNG searches the web; RAG searches your stuff. Many self-hosted setups run both.

FAQ

Is SearXNG a search engine or a metasearch engine? A metasearch engine. It doesn’t crawl or index the web itself; it forwards your query to the engines you’ve enabled (out of 200+ supported) and aggregates the results. That’s why it’s lightweight to self-host: there’s no index to build or store.

Do I need a powerful server to self-host SearXNG? No. A 1–2 vCPU / 1–2 GB RAM VPS or even a Raspberry Pi handles a personal instance comfortably, because the heavy lifting happens on the upstream engines. You only need more horsepower if you add a local LLM for AI answers, where a GPU helps.

Is self-hosted SearXNG actually private? Yes, with one nuance. Your queries aren’t logged or profiled, and upstream engines see your server’s IP rather than your browser, so they can’t tie searches to your cookies or browser fingerprint. The query content itself still reaches the upstream engines (just de-identified). For most people that de-identification is the whole point.

Can SearXNG give me AI answers like Perplexity? Not on its own — it returns links. Pair it with an LLM and you get cited answers. The easiest route is Vane (formerly Perplexica), which bundles SearXNG plus a model layer; point it at a local Ollama model for fully private AI search.

SearXNG vs Searx: what’s the difference? SearXNG is the actively maintained community fork of the original Searx. It’s the one you want today; “Searx” tutorials are mostly outdated.

A private search front-end is one of the highest-leverage things you can self-host: small footprint, big privacy payoff. From here, add an open-source AI answer engine for cited summaries, or learn the embeddings that power searching your own data. Aquila is the independent home for AI search you own.