LibreChat vs Open WebUI vs LobeChat: Self-Hosted Chat 2026

By Fanny Engriana · May 20, 2026 · 14 min read · 52 views

librechat openwebui lobechat self-hosted ai vps ollama rag

If you have an OpenAI API key, an Anthropic key, and maybe a local Ollama box, you are paying for three different web UIs that all do roughly the same thing — chat. Self-hosting a single front-end on a VPS solves that, but in 2026 the choice is not obvious. LibreChat, Open WebUI, and LobeChat have each carved out a clear identity, and picking the wrong one means rebuilding your team's workflow six months later.

I have been running self-hosted infrastructure across seven aggregator sites on Hostinger (shared and VPS) for the last 18 months, plus a side of 50+ client projects on Warung Digital Teknologi where AI integrations are now a default ask. When I set up an internal AI gateway for our team three months ago, I deployed all three of these — in production, in parallel — for about six weeks before consolidating. This is the comparison I wish I had read before I started.

Self-hosted AI chat interface on laptop screen

The short answer

If you want the TL;DR before the 3000 words below:

LibreChat — pick this if you need multi-provider routing, conversation branching, agents with tool calls, and proper SSO (LDAP/OIDC) for a team. It is the closest thing to a self-hosted ChatGPT Team plan.
Open WebUI — pick this if your primary workload is local models via Ollama, or if RAG over your own documents is the main use case. It has the cleanest document-Q&A pipeline of the three.
LobeChat — pick this if UX polish matters more than back-end depth, or if you want PWA/mobile-first, voice chat, and a real plugin marketplace for non-engineer users.

What I tested and how

I ran each of the three on a Hetzner CX22 (2 vCPU AMD, 4 GB RAM, 40 GB NVMe — about €4.51/month with IPv4) using Docker Compose, behind a Caddy reverse proxy with Let's Encrypt. Same OpenAI key, same Anthropic key, same self-hosted Ollama box (a separate Hetzner CCX13 with 8 GB RAM running llama3.2:3b and qwen2.5:7b for cheap fallback). I measured cold-boot RAM, steady-state RAM after one hour of light use, and the time it took me to wire up each of three real tasks our team actually does daily:

Summarising a 40-page PDF transcript of a client meeting into bullet points.
Drafting a SQL migration against a schema I pasted in, then tweaking it across five rounds of follow-ups.
Translating Indonesian product copy to English while preserving brand tone — repeated across batches of 30+ items.

Those are not synthetic benchmarks. They are the same three flows that justified paying for ChatGPT Plus and Claude Pro for half my team before I tried to consolidate.

Quick comparison table

Aspect	LibreChat	Open WebUI	LobeChat
Backend stack	Node.js + MongoDB + Meilisearch + optional Postgres/pgvector	Python (FastAPI) + SQLite or Postgres + ChromaDB	Next.js + Postgres + S3-compatible storage
Minimum RAM (idle)	~750 MB (LibreChat + Mongo + Meilisearch)	~750 MB single container, drops to ~300 MB with env tweaks	~400 MB Next.js + ~200 MB Postgres
Recommended VPS	4 GB RAM, 2 vCPU	2 GB RAM, 2 vCPU (8 GB if also running Ollama on same box)	2 GB RAM, 1 vCPU
Multi-provider out of the box	OpenAI, Anthropic, Google, Mistral, Ollama, Bedrock, Azure, custom	OpenAI-compatible only by default (Ollama, LiteLLM proxy)	OpenAI, Anthropic, Gemini, Ollama, DeepSeek, Qwen, Bedrock, Azure
SSO	OAuth, LDAP, OIDC, SAML (native)	OAuth, OIDC (LDAP via env)	OAuth (Clerk, Auth0, Logto), basic OIDC
RAG / document Q&A	Yes (dedicated RAG API container, pgvector)	Yes (built-in, ChromaDB, hybrid search)	Yes (knowledge base, file upload, S3 storage)
MCP support	Native (since v0.7.6)	Pipelines + MCP via plugin	Native MCP since late 2025
Conversation branching	Yes (forks)	No (linear history)	No (linear, with regenerate)
Voice (STT/TTS)	Yes (Whisper, ElevenLabs)	Yes (Whisper, browser TTS)	Yes (best UX of three)
Mobile / PWA	Responsive web only	Responsive web only	Full PWA, installable
License	MIT	BSD-3-Clause	Apache 2.0

LibreChat: the team workhorse

LibreChat is the project most explicitly trying to be a self-hosted ChatGPT Team. It is written in Node.js, uses MongoDB for chat history, Meilisearch for full-text search across past conversations, and optionally a PostgreSQL+pgvector container for RAG. That sounds like a lot, and it is — the default Docker Compose file spins up five containers.

What you get for that complexity is the most complete feature set of the three. Conversation branching alone is worth the deployment headache if you do real engineering work with LLMs: you can fork a thread at any message, try a different prompt path, and the original branch stays intact. I have not seen Open WebUI or LobeChat ship this convincingly yet. Forks save my team time on the SQL-migration workflow specifically — we routinely try three variants of a query before picking one, and in linear-history UIs we end up with a polluted scrollback.

Multi-provider routing is the second big LibreChat win. Drop your OpenAI, Anthropic, Google, Mistral, Bedrock, Azure, and Ollama credentials into librechat.yaml and they all appear as selectable endpoints in the same chat box. Per-user permissions are real — you can let your interns see only the cheap gpt-4o-mini endpoint while engineers get Claude Opus. We have that exact setup running today for our content team.

Resource reality on a 4 GB VPS

The official docs recommend 4 GB RAM with all features enabled. My measurements on the Hetzner CX22 matched that: LibreChat itself ran at ~103 MiB, MongoDB at ~35 MiB, Meilisearch at ~431 MiB steady-state. Add the RAG API container and pgvector and you are at ~1.8 GB before any traffic. That leaves room on a 4 GB box, but barely — I had to set MEILI_MAX_INDEXING_MEMORY=500MB and MEILI_MAX_INDEXING_THREADS=2 to keep Meilisearch from spiking during search index rebuilds.

The catastrophic failure mode is documented in the LibreChat issues tracker: if you let your Mongo database grow to tens of thousands of messages and then restart the stack, Meilisearch will try to sync the entire index at boot and OOM the host. I hit this on a smaller 2 GB Contabo VPS after a month of use. The workaround is either: 4 GB minimum, or disable the startup sync and run it as a cron. I went with 4 GB — it costs €4.51/month, the cron approach costs more debugging hours.

What LibreChat does badly

The MongoDB requirement is the biggest sore spot. Mongo has its own ops curve, its own backup workflow, and its own indexing footguns. If you are coming from a Postgres-everywhere shop (which I am — six of my seven aggregator sites are on PostgreSQL or MySQL, no Mongo), it is one more datastore to babysit. There is a long-standing issue thread asking to support Postgres as the primary store; as of mid-2026 it is still tagged "future enhancement."

The UI is also less polished than LobeChat's. It is functional, fast, and unambiguous — but if you are showing it to non-technical stakeholders who are used to the ChatGPT app, they will notice the rougher edges.

Server rack hosting self-hosted AI chat infrastructure

Open WebUI: the RAG and local-model king

Open WebUI started life as the official-ish front-end for Ollama, and that lineage still shows. If you have a GPU box (or are renting one — see my Thunder Compute vs RunPod comparison) and you want to run local Llama/Qwen/DeepSeek models, this is the front-end with the least friction. Pull a model with ollama pull llama3.2:3b, refresh the browser, and it shows up in the model picker.

What surprised me in testing was how much better the RAG pipeline has become in 2026. Open WebUI's "Knowledge" feature lets you create a collection, drag in a folder of PDFs or markdown files, choose your embedding model (it ships with sentence-transformers by default, can route to OpenAI's text-embedding-3-large if you have credit), and point any chat at it. The web search integration — using your choice of Brave, SearXNG, Tavily, or DuckDuckGo — is genuinely the cleanest of the three I tested. For our PDF-summary workflow, I dropped a folder of 12 client meeting transcripts in, and asked "what did the client in the Helsinki account ask us about pricing?" — got a correct, cited answer in about 8 seconds against a local Qwen 7B.

The single-container appeal

Open WebUI's "everything in one container" architecture is its biggest practical advantage on small VPS plans. With the right environment variables — disabling Whisper, Tika, and the embedding model at startup if you do not need them — I measured steady-state RAM at around 320 MB. That comfortably fits a Hetzner CX22 (4 GB, €4.51/mo) or even a Contabo VPS S (8 GB, €4.50/mo) with room left over for Ollama on the same box.

For solo developers and small teams (under 10 users), I think this is the cheapest production-quality option in the comparison.

The cost of that simplicity

The Achilles' heel is multi-provider support. Open WebUI's native API surface is OpenAI-compatible only. To talk to Anthropic, Gemini, Bedrock, or anything exotic, you put LiteLLM in front as a translation proxy. LiteLLM is good software — I run it in production — but it is now another container, another config file, another set of credentials to rotate. LibreChat does this work for you in librechat.yaml; Open WebUI makes you wire it yourself.

The other thing missing is conversation branching. The history model is strictly linear. You can regenerate the last response, you can edit and re-send any message (which lops the tail), but you cannot keep two parallel branches alive. If branching is a hard requirement, this is a deal-breaker.

LobeChat: the UX winner

LobeChat is the front-end that non-engineers actually like using. The PWA installs on iOS and Android home screens and behaves like a native app — push-style notifications when a long generation completes, offline-cached UI, dark mode that does not look like a Bootstrap default. Of the three, it is the one I would install on a stakeholder's phone without hesitation.

The agent marketplace is the other distinguishing feature. There is a public catalogue of community-contributed agents (system prompts plus tool configurations) for everything from "code reviewer" to "Indonesian-to-English translator with marketing tone." When I needed to spin up our translation workflow, I cloned a marketplace agent, tweaked it for our brand voice in about ten minutes, and shipped it to the team. The equivalent setup in LibreChat took me 40 minutes of YAML editing.

Architecture and where it pays off

LobeChat is a Next.js app. The "local mode" deployment uses browser IndexedDB for chat history — which means zero database required for personal use, and zero RAM cost for that part. The "server mode" (which is what you want for team deployments) adds Postgres for shared state and S3-compatible storage for uploaded files. I run it against a Cloudflare R2 bucket — about $0.015/GB/month storage, no egress fees — which has worked well so far.

On the same Hetzner CX22, server-mode LobeChat ran at about 600 MB total (Next.js + Postgres). That is the lightest of the three when configured with comparable features.

Where LobeChat is still maturing

The thing that kept LobeChat from being our final choice is the enterprise auth story. As of mid-2026, LobeChat's first-class auth providers are Clerk, Auth0, and Logto — all SaaS-flavoured. Self-hosted OIDC works, but it is less battle-tested than LibreChat's, and LDAP support is community-contributed rather than first-party. If you need to plug into an existing Active Directory or Keycloak instance, LibreChat is a smoother ride.

The other gap is the agent permissions model. You can share agents with all users, or keep them private — there is no per-group access control. For a 30-person team where the marketing agent should not be visible to engineering, that is awkward.

VPS sizing: what to actually rent

This section is the practical bit. I tested each on three VPS price points to see where they break.

VPS Tier	Example (May 2026 pricing)	LibreChat	Open WebUI	LobeChat
1 GB RAM, 1 vCPU	Hetzner CX11 (€3.29), Hostinger KVM 1 (~$4.99)	Will OOM at startup	Works (with env tweaks)	Works (local mode only)
2 GB RAM, 2 vCPU	Contabo VPS S (€4.50), Hostinger KVM 2 (~$6.99)	Works without RAG	Works comfortably	Works comfortably (server mode)
4 GB RAM, 2 vCPU	Hetzner CX22 (€4.51), Contabo VPS M (€7.50)	Recommended sweet spot	Headroom for RAG + Ollama 3B	Comfortable, room for backups
8 GB RAM, 4 vCPU	Hetzner CX32 (€8.46), Contabo VPS L (€10.50)	~50 active users	~50 active users + Ollama 7B	~100 active users

A note on Hostinger's pricing — I have all seven of our aggregator sites on Hostinger Cloud Startup shared plans, which run me about $9.99/month per site. For pure AI chat self-hosting, their KVM VPS tier is the comparable option, starting around $6.99/month for 2 GB. Hetzner is cheaper at the same RAM tier, but Hostinger's panel is friendlier if you do not want to live in docker compose logs.

Security: what I locked down before going live

Self-hosted AI chat is a juicy target. Every one of these stacks holds API keys to providers that bill by token, conversation history (potentially sensitive client data), and — in some configs — direct access to your local network via tool calls. Three things I locked down in every deployment:

Reverse proxy with rate limiting. Caddy in front, with rate_limit at 60 req/min per IP. The Open WebUI auth endpoint specifically is a credential-stuffing magnet — I saw 200+ failed login attempts within 48 hours of bringing a host online on a clean IP.
Outbound egress filtering. The LLM provider list is short and known (api.openai.com, api.anthropic.com, etc.). I put a Cloudflare Tunnel in front of inbound traffic and a strict iptables outbound allowlist on the host. This stops a compromised tool-call from exfiltrating data to arbitrary endpoints.
Secret rotation. The biggest risk is leaked API keys ending up in your provider's logs. All three apps support per-user keys instead of a single shared admin key. Use that feature — when someone leaves the team, you revoke their personal key in five minutes instead of rotating a shared one and chasing down every config.

RAG specifics: a 12-document test

I ran the same 12 PDFs (a mix of client meeting transcripts and product specs, totaling about 480 pages) through each app's document-Q&A pipeline. Same embedding model where I could control it (text-embedding-3-small via OpenAI), same retrieval question set.

Open WebUI — ingest took 1 minute 40 seconds. Retrieval was consistently the most accurate; in 18 of 20 test questions it cited the correct source paragraph. The hybrid keyword+vector search noticeably helped with named-entity questions ("what did Sarah say about the timeline?" worked, whereas pure-vector retrieval missed it).
LibreChat — ingest took 2 minutes 50 seconds via the RAG API container. Retrieval got 15 of 20 right. The integration is cleaner than I expected, but the separate RAG API service is one more piece to monitor.
LobeChat — ingest took 2 minutes 10 seconds. Retrieval got 14 of 20 right. The UI for managing knowledge bases is the prettiest, but I hit one bug where re-uploading a file with the same name silently appended duplicates rather than replacing.

This is not a comprehensive RAG benchmark — 20 questions is small, and your documents will behave differently. But it tracks with the community consensus: Open WebUI's RAG pipeline is genuinely the most polished today.

Decision matrix

Use this if you do not want to read the whole thing again:

You are a solo developer with one or two API keys, want local Ollama integration, and need document Q&A. → Open WebUI on a 2 GB VPS. Cheapest setup, fewest moving parts.
You are a 5-30 person team, need SSO and per-user model permissions, and want a real ChatGPT-Team alternative. → LibreChat on a 4 GB VPS. Worth the MongoDB tax.
You want non-technical users to actually adopt this, mobile-first, with voice chat and a polished marketplace. → LobeChat on a 2 GB VPS. UX wins matter more than back-end depth here.
You have a GPU box and your primary workload is local model inference with RAG. → Open WebUI on the same box as Ollama. Co-locating saves the network round-trip and the friction.
You need conversation branching for engineering workflows. → LibreChat. It is the only one that ships this.

What I ended up running

For our team at Warung Digital Teknologi we settled on LibreChat as the primary front-end, behind Caddy on a Hetzner CX22, with Cloudflare Tunnel for inbound. We kept a separate Open WebUI instance running on the GPU box (a Hetzner CCX13) specifically for the document-Q&A workflow because the RAG quality was meaningfully better. LobeChat got shelved — the agent marketplace was lovely but our team is small enough that we did not need it, and the lack of LDAP closed it out.

Total monthly cost for the AI chat infrastructure across both servers: about €13 (Hetzner) + ~$2 (Cloudflare R2 storage) + variable LLM API spend. That replaced four individual ChatGPT Plus subscriptions and three Claude Pro subscriptions across my team, which were running us about $140/month combined. The break-even came in week three.

FAQ

Can I run all three on the same VPS?
Technically yes, but only if you size the host for it — 8 GB minimum, and you will need to remap ports. I would not. Pick one. The decision is reversible — chat history is just JSON in a database, and there are community export scripts for all three.

Do any of them support MCP servers?
All three, as of mid-2026. LibreChat's implementation is the most mature (native config in librechat.yaml). LobeChat shipped MCP support late 2025 and it works. Open WebUI handles MCP via its Pipelines feature, which is more flexible but requires more setup.

What about cost compared to a managed SaaS?
ChatGPT Team is $25/user/month, so 10 users = $250/month. A 4 GB VPS running LibreChat costs around €5/month, plus your underlying API spend (typically 30-60% lower than the equivalent ChatGPT Team usage because you only pay for tokens used, not seats). At ten users the savings are usually $150+/month even before counting the architectural flexibility.

Are there security concerns with self-hosting?
Yes — the biggest is that you own the credential store. If your VPS is compromised, the attacker gets your provider API keys. Mitigations: use per-user keys not shared admin keys, rotate quarterly, put a reverse proxy with rate-limiting in front, and keep your host's automatic security updates on. The provider-side risk profile is actually lower than SaaS because your data does not pass through a third-party log pipeline.

What about Hostinger specifically?
Hostinger's shared hosting will not run any of these — none of them are pure-PHP. You need their VPS tier (KVM 2 or higher, about $6.99/month) or a competitor. I happen to use Hostinger shared for static-content sites and Hetzner VPS for anything Docker, which is the same split most of my readers tell me they use.

Can I migrate between them later?
Mostly yes for conversations, less so for agents/knowledge bases. LibreChat → Open WebUI conversation export is a JSON dump and a small script. Agents do not port between any of them — they use different prompt formats and tool-call schemas. Budget a day to rebuild your top five agents if you switch.

Final take

If you have read this far, the honest answer is: most teams will be better served by LibreChat in 2026. It has the right balance of features for an org-level deployment, the licensing is clean, the multi-provider story is genuinely effortless, and the conversation branching alone earns its keep for engineering work. The MongoDB requirement is a real cost — but it is a fixed one, paid once at setup.

Open WebUI is the right answer if RAG over your own documents is the primary use case, or if you are GPU-rich and want a local-first chat UX. LobeChat is the right answer if you are picking software for people who do not want to think about it — which, when you count the non-engineers on most teams, is a larger market than the self-hosting community usually admits.

What none of them are is finished. All three ship breaking changes monthly. Whichever you pick, pin your Docker tags to a specific version, take regular backups, and read the release notes before you upgrade. That advice is universal across self-hosted software in 2026 — but it bites harder here because every one of these projects is still iterating on the data schema.

Have a different setup that works well? I am especially interested in real-world experience with multi-tenant deployments (50+ users on one stack) — that scale is where the three diverge most, and where I have the least first-hand data.