How to Self-Host Vane (formerly Perplexica): A Complete Guide
Run a Perplexity-style AI answer engine on your own hardware — bundled SearXNG, your choice of LLM, nothing logged.
Vane (the project formerly known as Perplexica) is the most popular open-source, self-hosted alternative to Perplexity — an AI answer engine that takes your question, searches the web through a private SearXNG instance, reads the results, and writes a cited answer. This guide walks you through self-hosting it end to end: the Docker setup using the bundled image, connecting a local model via Ollama or a cloud LLM, the configuration that matters, and how to keep the whole loop private. If you’re still deciding between AI answer engines, start with our wider roundup of open-source Perplexity alternatives and come back here once you’ve settled on Vane.
The canonical repository is github.com/ItzCrazyKns/Vane. The old ItzCrazyKns/Perplexica path now redirects there — same project, same lineage.
What Vane is (and what it isn’t)
Vane is an AI answering engine, not a chatbot and not a document knowledge base. It is the closest open-source experience to Perplexity itself: you type a question in plain language, Vane runs a real web search, ranks and reads the top results, and returns a synthesized answer with inline source citations you can click through.
Architecturally, it bundles three things into one deployable stack:
- A Next.js frontend — the chat-style web UI you actually use.
- An API backend — the orchestration layer that handles search, retrieval, and the LLM calls (the repository is roughly 99% TypeScript).
- A private, integrated SearXNG metasearch engine — the privacy-preserving search layer that fans your query out to many search backends without tracking you.
That bundling is the reason Vane is so easy to self-host: you don’t have to wire SearXNG up separately. The whole thing ships as a single Docker image, itzcrazykns1337/vane:latest, and a docker compose up stands it up.
| | Vane |
|---|---|
| What it is | Self-hosted AI answer engine (Perplexity-style) |
| Formerly | Perplexica (rebranded around March 2026) |
| Canonical repo | github.com/ItzCrazyKns/Vane |
| License | MIT |
| GitHub stars | ~35.4k (as of June 2026) |
| Stack | Next.js frontend + API backend + bundled SearXNG |
| Docker image | itzcrazykns1337/vane:latest |
| Local LLMs | Yes — via Ollama |
| Cloud LLMs | OpenAI, Anthropic Claude, Google Gemini, Groq, and more |
The MIT license is worth flagging: it’s permissive, so you can run, modify, and even embed Vane in commercial work without copyleft obligations. That’s a meaningful difference from AGPL-licensed tools like Khoj if you plan to build on top of it.
Naming note. You’ll still find “Perplexica” all over GitHub forks, Medium posts, and YouTube tutorials written before the rebrand. Vane and Perplexica are the same project — when you search for setup help, search for both names.
Why self-host Vane instead of using Perplexity
Hosted Perplexity is polished and frictionless. It also logs every query you make, decides on its own opaque logic which sources to trust, can’t run offline or behind your firewall, and charges a subscription for the good models. Self-hosting Vane flips each of those:
- Privacy by architecture. With a local model, your prompts, the retrieved pages, and the generated answer stay on your own hardware. Searches still reach upstream engines, but through your own SearXNG instance rather than a logged-in account. No third-party AI service logs your questions, and nothing trains someone else’s model.
- Model choice. Run a local LLM through Ollama for fully air-gapped answers, or wire in a frontier cloud model when you want maximum reasoning quality — your call, per task.
- No gatekeeper. You control which search engines feed the answers and there are no ads or ranking incentives working against you.
- No subscription. Past the cost of a VPS (or a box you already own), running it is effectively free if you use local models.
The honest tradeoff: you run it, you patch it, you keep it alive. A local 7–8B model won’t reason like a frontier cloud model, and a SearXNG-backed search won’t have the freshness of a commercial crawler. For a privacy-conscious developer, owning the stack is exactly the point. For a casual user, the hosted product is less hassle. This guide assumes you’ve decided control matters more than convenience.
Before you start: prerequisites
You need three things:
- A host with Docker and Docker Compose. A VPS, a homelab box, or your own laptop all work. For CPU-only inference, 2–4 vCPUs and 8 GB of RAM is a reasonable floor; for comfortable local LLM generation you want a GPU (a consumer card with 12–24 GB of VRAM makes a big difference).
- A model backend. Either a running Ollama instance (for local, private inference) or an API key for a cloud provider (OpenAI, Anthropic, Gemini, Groq).
- Basic comfort with the terminal. Everything below is git clone, edit a config file, docker compose up.
A note on hardware: SearXNG and the Vane app itself are light. The resource cost is almost entirely the LLM. If you point Vane at a cloud model, even a small VPS runs the stack fine because the heavy compute happens at the provider.
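A quick preflight check can confirm the host has the tools listed above before you clone anything. The sketch below is an illustration, not part of Vane: `missing_tools` is a hypothetical helper name, and the tool list in the usage comment is an assumption based on the prerequisites in this guide.

```shell
#!/bin/sh
# Hypothetical preflight helper (not part of Vane): print each named
# command that is NOT found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage on the host you plan to deploy to:
#   missing_tools git docker curl
#   docker compose version   # confirms the Compose plugin is installed
#   free -h                  # Linux only: shows total and available RAM
```

Empty output from `missing_tools` means every named tool was found; anything it prints still needs installing.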
Step-by-step: self-hosting Vane with Docker
These steps describe the general shape of a Vane deployment. Always follow the project’s own README for exact, current commands — config keys, compose files, and image tags drift between releases, especially after a rebrand.
1. Install Docker and clone the repo
Install Docker and Docker Compose on your host, then clone the canonical repository:
git clone https://github.com/ItzCrazyKns/Vane.git
cd Vane
The repo ships a docker-compose.yaml that defines the full stack — frontend, backend, and the bundled SearXNG. If you’d rather not build from source, the prebuilt image itzcrazykns1337/vane:latest is referenced by the compose file, so most users never compile anything.
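Before starting anything, it can help to confirm which image the compose file actually references, since tags may drift between releases. The helper below is a rough sketch: `compose_images` is a hypothetical name, and it uses plain text matching, so it only handles simple `image:` lines without trailing comments. With Docker installed, `docker compose config --images` reports the same thing from the fully resolved configuration.

```shell
#!/bin/sh
# Hypothetical helper: list the image references declared in a Compose file
# using plain text matching (no Docker needed). Commented-out lines are skipped
# because the pattern only matches lines that start with "image:".
compose_images() {
  sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$1" | tr -d "\"'"
}

# Usage, from the cloned repo (file name as given in this guide):
#   compose_images docker-compose.yaml
# With Docker installed, Compose can report this from its resolved config:
#   docker compose config --images
#   docker compose config --services
```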
2. Create your configuration
Vane reads its settings from a config file (historically a TOML file you copy from a .sample). This is where you tell it which model backend to use. The two paths:
- Local (Ollama): point Vane at your Ollama server’s address (commonly http://host.docker.internal:11434 when Ollama runs on the host and Vane runs in Docker). You’ll select a chat model and an embedding model from what you’ve pulled.
- Cloud: paste an API key for OpenAI, Anthropic Claude, Google Gemini, or Groq. Leave the local fields blank or fill in both and choose per query in the UI.
You don’t have to commit to one — Vane lets you register multiple providers and switch models from the web UI at query time.
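If you go the Ollama route, it is worth checking that the server answers at the address you plan to enter before blaming Vane for a connection error. The sketch below assumes `curl` is installed; `ollama_reachable` is a hypothetical helper name. Two things commonly trip people up on Linux hosts: `host.docker.internal` may not resolve inside a container unless the compose service maps it (for example with an `extra_hosts` entry of `host.docker.internal:host-gateway`), and Ollama listens on 127.0.0.1 by default, so containers may not reach it until you set `OLLAMA_HOST` to a wider bind address.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if an Ollama server answers at the given
# base URL. GET /api/tags is Ollama's endpoint for listing pulled models.
ollama_reachable() {
  curl -fsS --max-time 5 "$1/api/tags" >/dev/null 2>&1
}

# Usage on the host:
#   ollama_reachable http://localhost:11434 && echo "Ollama is up"
# Then run the same check against the exact address you put in Vane's config,
# ideally from inside the Vane container, since that is where it must resolve.
```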
3. (For local models) pull your Ollama models first
If you’re going local, make sure Ollama has the models before you start Vane:
ollama pull llama3.1 # a chat/reasoning model
ollama pull nomic-embed-text # an embedding model for retrieval
A capable instruct model in the 7–8B range is the usual starting point for local AI search; scale up if your hardware allows. For more on picking the embedding side, see our guide to the best local embedding models.
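To confirm both pulls landed before you start the stack, you can check the output of `ollama list`. The helper below is a small sketch with a hypothetical name, `has_model`; it assumes the listing keeps its usual layout of one header row followed by rows whose first column is `name:tag`.

```shell
#!/bin/sh
# Hypothetical helper: read `ollama list` output on stdin and succeed if the
# named model appears in the first column, with or without an explicit tag.
has_model() {
  awk -v want="$1" '
    NR > 1 {                      # skip the header row
      name = $1
      base = name
      sub(/:.*/, "", base)        # strip ":tag" to compare bare names too
      if (name == want || base == want) found = 1
    }
    END { exit found ? 0 : 1 }'
}

# Usage:
#   ollama list | has_model llama3.1 || echo "run: ollama pull llama3.1"
#   ollama list | has_model nomic-embed-text || echo "run: ollama pull nomic-embed-text"
```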
4. Bring up the stack
From the project directory:
docker compose up -d
This starts the Next.js frontend, the API backend, and the bundled SearXNG instance together. Give it a moment to pull images and initialize on the first run.
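Because the first start can take a while, a small polling loop saves you from refreshing the browser by hand. This is a sketch under assumptions: `wait_for_http` is a hypothetical helper, it needs `curl`, and port 3000 in the usage comment is only the common default, so confirm it against the compose file.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Arguments: URL, number of attempts (default 30), seconds between tries (default 2).
wait_for_http() {
  wfh_url=$1
  wfh_tries=${2:-30}
  wfh_delay=${3:-2}
  while [ "$wfh_tries" -gt 0 ]; do
    # -f makes curl fail on HTTP errors, so a 5xx during startup counts as "not ready"
    curl -fsS --max-time 5 -o /dev/null "$wfh_url" 2>/dev/null && return 0
    wfh_tries=$((wfh_tries - 1))
    [ "$wfh_tries" -gt 0 ] && sleep "$wfh_delay"
  done
  return 1
}

# Usage after `docker compose up -d`:
#   docker compose ps
#   wait_for_http http://127.0.0.1:3000 || docker compose logs --tail=50
```

If the wait times out, the last lines of the logs usually show whether the problem is an image pull, a port clash, or a bad config value.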
5. Open the UI and ask a question
By default the web UI is served on a local port (commonly 127.0.0.1:3000 — check the compose file). Open it in your browser, type a question, and Vane will:
- Send your query to its private SearXNG instance.
- Retrieve and rank the top web results.
- Feed them to your chosen LLM as context.
- Return a synthesized, cited answer.
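If you chose a local model, ranking and answer generation ran entirely on your hardware. No prompt or answer went to a third-party AI service; the web search itself still reached upstream engines through SearXNG.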
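The four steps above amount to a retrieve-then-generate loop. The sketch below illustrates only the third step: packing ranked results into a prompt with numbered sources so the model has something to cite. It is not Vane's actual implementation. `build_prompt`, the tab-separated input format, and the instruction wording are all hypothetical, chosen to show the idea in a few lines.

```shell
#!/bin/sh
# Illustration only; Vane's real pipeline is TypeScript and may differ.
# Reads ranked results on stdin as "title<TAB>url<TAB>snippet" lines and
# prints a prompt whose sources are numbered [1], [2], ... for citation.
# Argument: the user's question.
build_prompt() {
  printf 'Answer using only the sources below and cite them as [n].\n\n'
  awk -F '\t' '{ printf "[%d] %s (%s)\n%s\n\n", NR, $1, $2, $3 }'
  printf 'Question: %s\n' "$1"
}

# Usage with two made-up results:
#   printf 'Title A\thttps://example.com/a\tSnippet A\nTitle B\thttps://example.com/b\tSnippet B\n' \
#     | build_prompt 'What is SearXNG?'
```

Numbering the sources in the prompt is what lets the answer carry inline citations that map back to clickable links.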
Connecting Ollama vs. a cloud LLM
This is the most consequential choice you’ll make, because it’s the privacy-vs-quality dial.
| | Ollama (local) | Cloud LLM |
|---|---|---|
| Where prompts go | Stay on your hardware | Sent to the provider |
| Privacy | Fully air-gapped possible | Provider sees prompts + retrieved context |
| Answer quality | Good; bounded by your hardware | Frontier-grade reasoning |
| Cost | Hardware only (no per-query fee) | Per-token API billing |
| GPU needed | Strongly recommended | No |
| Best for | Privacy-critical, offline, or cost-sensitive use | Maximum quality on hard questions |
A pragmatic pattern many people land on: keep search and retrieval local (SearXNG runs on your box regardless), and use a cloud LLM only for the final generation step when you want frontier quality. Your queries to the search layer stay private; only the prompt and retrieved snippets go to the model provider — a tradeoff you’re now making knowingly rather than by default. If privacy is the whole reason you’re here, go fully local with Ollama.
Configuration that actually matters
Once Vane is running, a few settings are worth tuning:
Search modes (Speed / Balanced / Quality)
Vane exposes search modes that trade latency for thoroughness. Speed does a lighter search and answers fast; Quality retrieves and reads more before answering. Start on Balanced and adjust based on whether you care more about wait time or depth.
Which model handles which job
You can often assign one model to chat/generation and a different (smaller, cheaper, or local) one to embeddings. A common cost-saver: local embeddings for retrieval, cloud model only for the final answer.
The bundled SearXNG
Because Vane ships its own SearXNG, you inherit all of SearXNG’s tunability — which upstream engines are enabled, default region and language, safe-search. If a particular engine is rate-limiting or returning junk, disabling it in the SearXNG config improves Vane’s answers. For the full picture of what you can tune there, see our self-host SearXNG guide — the same engine, just running standalone.
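Disabling an engine in SearXNG comes down to a short entry in its settings.yml. The helper below prints such a fragment; `searxng_disable_stanza` is a hypothetical name and the engine names in the usage comment are only examples. Two assumptions to check: where the bundled instance keeps its settings file (see the Vane README), and whether that file sets `use_default_settings: true`. If it does, a short `engines:` list like this is merged over the defaults; if it does not, edit the existing entry for that engine instead.

```shell
#!/bin/sh
# Hypothetical helper: print a SearXNG settings.yml fragment that disables the
# named engines. `disabled: true` on an `engines:` entry is SearXNG's own
# syntax; the location of the bundled settings file is NOT assumed here.
searxng_disable_stanza() {
  printf 'engines:\n'
  for engine in "$@"; do
    printf '  - name: %s\n    disabled: true\n' "$engine"
  done
}

# Usage (engine names are examples; use the ones misbehaving for you):
#   searxng_disable_stanza bing qwant
```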
Keeping it internal
The most private posture is to not expose Vane to the public internet at all — run it on localhost, a private IP, or behind a VPN, so the instance is only reachable from inside your network. If you do expose it (for a team), put it behind a reverse proxy with TLS and authentication; never leave an AI answer engine with cloud API keys open to the world.
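One way to verify the localhost-only posture on a Linux host is to look at what is actually listening. The sketch below parses `ss -ltn` output; `loopback_only` is a hypothetical helper, and port 3000 is the common default mentioned earlier rather than a guarantee. Keep in mind that Docker publishes ports on all interfaces unless the compose port mapping is prefixed with a host address, as in `127.0.0.1:3000:3000`.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -ltn` output on stdin and succeed only if
# something listens on the given port AND every such listener is bound to a
# loopback address. Fails if the port is exposed or nothing listens on it.
loopback_only() {
  awk -v port="$1" '
    NR > 1 {                              # skip the header row
      addr = $4                           # "Local Address:Port" column
      n = split(addr, parts, ":")
      if (parts[n] != port) next
      seen = 1
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host != "127.0.0.1" && host != "[::1]") exposed = 1
    }
    END { exit (seen && !exposed) ? 0 : 1 }'
}

# Usage on the server:
#   ss -ltn | loopback_only 3000 && echo "UI is reachable from this host only"
# To use a loopback-only instance from your laptop, an SSH tunnel works:
#   ssh -N -L 3000:127.0.0.1:3000 user@your-server
```

With the tunnel open, browsing to port 3000 on your own machine reaches the remote UI without exposing it publicly.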
Vane vs. the alternatives — a quick orientation
Vane is the best pick when you want “Perplexity, but mine” — a web answer engine. It’s less of a document knowledge base than some alternatives:
- Want to chat with your own documents (PDFs, notes, Markdown) across many clients? Khoj is the stronger choice — it’s an AI “second brain” built around your files, with pgvector under the hood.
- Want just private web search to click through yourself, no AI layer? Run SearXNG standalone.
- Building Q&A over your own corpus rather than the live web? That’s retrieval-augmented generation, a different architecture built on embeddings and a vector database.
Many people run more than one — Vane for the web, a RAG stack for their own data.
Privacy and data ownership
This is the entire reason to self-host. With Vane pointed at a local Ollama model and using its bundled SearXNG, your prompts, the retrieved pages, and the generated answer are all handled inside your network. No AI provider logs them or trains on them. The one outbound hop is SearXNG querying upstream search engines, so a fully air-gapped machine can run the model but cannot search the live web.
The moment you wire in a cloud LLM, you reintroduce one dependency: the prompt and retrieved context go to that provider for the generation step. That can be a perfectly reasonable, deliberate tradeoff for answer quality — just make it knowingly, and keep the search and any document ingestion local if privacy is your driver. The same principle runs through every guide on this site, from SearXNG to our self-hosted RAG guide.
FAQ
Is Vane the same as Perplexica?
Yes. The project formerly known as Perplexica was rebranded to Vane around March 2026. The GitHub repo moved to github.com/ItzCrazyKns/Vane and the old Perplexica path now redirects there. Older tutorials still say “Perplexica” — same software.
Can I run Vane completely offline?
Yes, with one caveat. Point Vane at a local LLM through Ollama and use its bundled SearXNG, and retrieval and answer generation run on your own hardware with no third-party AI service involved. Web search still needs internet access to reach upstream search engines, so “offline” here means no third-party AI service, not no network.
Do I need a GPU to self-host Vane?
Only for local LLM inference, and even then it’s a strong recommendation rather than a hard requirement: local models run on CPU, but slowly. If you point Vane at a cloud LLM (OpenAI, Claude, Gemini, Groq), the heavy compute happens at the provider and a small CPU-only VPS runs the stack comfortably.
Is Vane free to use?
Yes. Vane is open-source under the permissive MIT license, so there’s no software cost. Your only costs are the hardware or VPS you run it on, plus per-token API fees if you choose a cloud LLM instead of a local one.
Vane vs. Khoj: which should I self-host?
Pick Vane if you want a Perplexity-style engine that answers from the live web. Pick Khoj if your goal is searching and chatting with your own documents across browser, Obsidian, desktop, and phone. They solve overlapping but different problems, and plenty of people run both.
Vane is the most direct way to own a Perplexity-style answer engine end to end. From here, learn the private search layer it sits on, compare it against Khoj and other alternatives, or move on to chatting with your own documents. Aquila is the independent home for AI search you own. Own your search.