How to Install AnythingLLM on Ubuntu with Ollama

Written by: Bagus Facsi Aginsa
Published at: 19 May 2026


Running a local LLM with Ollama is powerful. Giving it a chat UI with Open WebUI makes it approachable. But both of those tools are built around conversations. A prompt goes in, a response comes out. What they do not do well is give you a dedicated knowledge base per project.

If you want to ask questions about your internal runbooks without the model mixing in its general training data, or you want your team to have separate AI workspaces (one for engineering docs, one for HR policies, one for product specs), you need something built around documents and workspaces, not just conversations. That tool is AnythingLLM.

AnythingLLM is a self-hosted AI workspace that organizes your work into isolated environments called workspaces. Each workspace has its own document collection. When you ask a question inside a workspace, the system searches those documents for relevant passages and feeds them to the model as grounded context. This is called Retrieval-Augmented Generation, or RAG, and it is the difference between a model that guesses and one that actually reads your files.

This tutorial walks you through installing AnythingLLM on Ubuntu using Docker, connecting it to your local Ollama backend, uploading documents, and having the model answer questions from them. If you have not set up Ollama yet, start with Ollama vs LM Studio: Choosing the Right Tool to Run Local LLMs on Ubuntu first. If you already have Open WebUI running and are wondering how AnythingLLM differs, the short answer is covered below.


How AnythingLLM Differs from Open WebUI

Both tools connect to Ollama and provide a browser-based chat interface. The difference is in architecture and purpose.

Open WebUI is a general-purpose chat interface. It has conversation history, a model switcher, and document upload, but its design centers on conversations: documents play a supporting role rather than defining separate, self-contained project spaces.

AnythingLLM is organized around workspaces. Each workspace is an independent silo with:

  • Its own set of embedded documents in the vector database: documents embedded in one workspace are not searchable from another
  • Its own chat model configuration: you can use a fast 3B model in one workspace and a larger 13B model in another (the embedding model, by contrast, is a single system-wide setting)
  • Its own chat history, separate per user
  • Its own access controls in multi-user mode

This makes AnythingLLM the better fit when you have specific document collections to query and need answers grounded in them, rather than a general chat assistant.

The RAG pipeline AnythingLLM runs under the hood:

  1. You upload a document (PDF, DOCX, Markdown, plain text, or a URL)
  2. AnythingLLM chunks the document into small passages and converts each chunk into a vector embedding using an embedding model
  3. Those embeddings are stored in a built-in vector database (LanceDB by default)
  4. When you send a chat message, AnythingLLM embeds your question, finds the most semantically similar document chunks, and injects them into the prompt before sending it to Ollama
  5. Ollama generates a response grounded in those retrieved chunks
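To make step 4 concrete, here is a toy sketch of the similarity search in shell and awk. It is not how AnythingLLM works internally (it delegates this to LanceDB), and the three-dimensional vectors and chunk names are made up for illustration; real nomic-embed-text vectors have 768 dimensions. The function scores each stored chunk by cosine similarity to the query vector and prints the best match.

```shell
# Toy illustration of retrieval: rank chunks by cosine similarity to a query.
# Input (hypothetical format): one chunk per line, "chunk_id v1 v2 v3 ..."
# Usage: top_chunk "q1 q2 q3" < chunks.txt
top_chunk() {
  awk -v q="$1" '
    BEGIN {
      # Split the query vector and precompute its length (L2 norm).
      n = split(q, qv, " ")
      for (i = 1; i <= n; i++) qnorm += qv[i] * qv[i]
      qnorm = sqrt(qnorm)
    }
    {
      # Dot product and norm for this chunk (fields 2..n+1 hold the vector).
      dot = 0; norm = 0
      for (i = 1; i <= n; i++) {
        dot += qv[i] * $(i + 1)
        norm += $(i + 1) * $(i + 1)
      }
      if (norm == 0 || qnorm == 0) next
      score = dot / (qnorm * sqrt(norm))
      # Keep the highest-scoring chunk seen so far.
      if (best == "" || score > best_score) { best = $1; best_score = score }
    }
    END { if (best != "") printf "%s %.3f\n", best, best_score }
  '
}
```

For example, piping the lines "deploy-rollout 0.9 0.1 0.0", "hr-leave-policy 0.0 0.2 0.9", and "k8s-scaling 0.7 0.6 0.1" into top_chunk "1.0 0.2 0.0" selects deploy-rollout, because its vector points in nearly the same direction as the query. A vector database performs the same kind of ranking, but can use indexes so it does not have to compare the query against every chunk.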

The final text is generated by your chat model running in Ollama. The embedding model is a separate, lighter model whose only job is to produce the vectors used in the search step.


Prerequisites

  • Ubuntu 20.04, 22.04, or 24.04
  • Ollama installed and running with at least one model already pulled
  • Docker installed (covered in Step 1 if you do not have it)
  • 8 GB of RAM recommended (4 GB is the bare minimum, and the Ollama models you run need memory of their own)
  • At least 10 GB of free disk space
  • A user with sudo privileges
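If you want to check the RAM and disk prerequisites from a script, the following sketch reads /proc/meminfo and df output. The function names and the 7500 MiB cut-off are illustrative choices: a machine sold as 8 GB usually reports a little less than 8 GiB in MemTotal because the kernel reserves some memory, so the check leaves room for that.

```shell
# Total RAM in whole MiB, from a meminfo-format file (default /proc/meminfo).
ram_mib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Free space in whole GiB on the filesystem that holds the given path.
free_disk_gib() {
  df -Pk "${1:-$HOME}" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Print a warning per unmet prerequisite; return non-zero if any fails.
preflight() {
  local ram disk ok=0
  ram=$(ram_mib)
  disk=$(free_disk_gib "${1:-$HOME}")
  if [ "$ram" -lt 7500 ]; then
    echo "RAM: ${ram} MiB (about 8 GB recommended)"; ok=1
  fi
  if [ "$disk" -lt 10 ]; then
    echo "Disk: ${disk} GiB free (at least 10 GiB needed)"; ok=1
  fi
  if [ "$ok" -eq 0 ]; then echo "Preflight OK: ${ram} MiB RAM, ${disk} GiB free"; fi
  return "$ok"
}
```

Running preflight with no argument checks the filesystem that holds your home directory, which is where the storage directory from Step 3 will live.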

Confirm Ollama is responding before you continue:

curl http://localhost:11434/api/tags

You should see a JSON object listing your downloaded models. A connection error means Ollama is not running; start it with sudo systemctl start ollama.
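To check for specific models rather than reading the JSON by eye, a small grep and awk sketch can pull the model names out of the /api/tags response. It assumes the compact JSON that Ollama returns and is not a general JSON parser; jq is the more robust tool if you have it installed. The function names are illustrative.

```shell
# Print one model name per line from an /api/tags JSON response on stdin.
model_names() {
  grep -o '"name":"[^"]*"' | sed 's/^"name":"//; s/"$//'
}

# Read /api/tags JSON on stdin; succeed only if every model given as an
# argument is present. A bare name such as nomic-embed-text also matches
# tagged entries like nomic-embed-text:latest.
require_models() {
  local names m missing=0
  names=$(model_names)
  for m in "$@"; do
    if ! printf '%s\n' "$names" | awk -F: -v m="$m" '$0 == m || $1 == m { found = 1 } END { exit !found }'; then
      echo "missing: $m"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, curl -s http://localhost:11434/api/tags | require_models llama3.2:3b prints any missing model and returns a non-zero status, which lets a setup script stop early.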


Step 1: Install Docker

Skip this step if Docker is already installed. Otherwise, the official install script handles everything:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker

The usermod command adds your user to the docker group so you can run containers without sudo. The newgrp command starts a new shell with that group already active, so you can continue without logging out; other terminals pick up the change only after you log out and back in.

Verify it works:

docker --version
docker run hello-world
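If docker run hello-world fails with a permission error, the usermod change has usually not reached your current shell yet. This sketch separates the two cases by comparing the groups recorded for your user with the groups active in the running shell; the function names are illustrative.

```shell
# Succeeds if USER is listed in GROUP in the system's group database.
in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qx -- "$2"
}

# Succeeds if GROUP is active in the current shell process. After usermod,
# this stays false until you log in again or switch with newgrp.
active_in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

# Explain the docker group state for a user (default: the current user).
docker_group_hint() {
  local user="${1:-$(id -un)}"
  if ! in_group "$user" docker; then
    echo "not in docker group: run sudo usermod -aG docker $user"
  elif ! active_in_group docker; then
    echo "in docker group, but not in this shell: run newgrp docker or log in again"
  else
    echo "docker group active"
  fi
}
```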

Step 2: Pull the Embedding Model

AnythingLLM needs a dedicated embedding model to convert document text into vectors. It is separate from the model you use for chat: lighter, faster, and optimized for semantic similarity rather than text generation.

Pull nomic-embed-text from Ollama:

ollama pull nomic-embed-text

The model is about 274 MB and downloads quickly. You do not interact with it directly; AnythingLLM calls it internally every time it embeds a document or processes a chat message.
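For a sense of what an embedding request looks like, this sketch builds the JSON body for Ollama's /api/embed endpoint. It does not reproduce AnythingLLM's internal calls, which this tutorial does not show; it only illustrates the kind of request an embedding client sends. The escaping handles double quotes and backslashes but not newlines or other control characters, so use jq for arbitrary text. The function name is illustrative.

```shell
# Build the JSON body for Ollama's /api/embed endpoint.
# Escapes backslashes and double quotes only (no newlines/control chars).
embed_request() {
  local model="$1" text
  text=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","input":"%s"}\n' "$model" "$text"
}
```

For example, curl -s http://localhost:11434/api/embed -d "$(embed_request nomic-embed-text 'How do I roll back a deployment?')" returns a JSON object whose embeddings field holds one vector per input, 768 numbers long for nomic-embed-text.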


Step 3: Create the Storage Directory

AnythingLLM stores documents, vector embeddings, conversation history, and settings persistently in a single directory on your host machine. Create it before launching the container:

mkdir -p $HOME/anythingllm-storage
chmod 777 $HOME/anythingllm-storage

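The broad permission is a shortcut. The container runs its processes as a non-root internal user whose UID may not match your host user, so making the directory world-writable avoids permission errors on first start. The directory lives on your local disk, so this does not expose it over the network, but any other local account on the machine can read and modify your documents; on a shared server, prefer giving ownership of the directory to the container’s UID instead.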
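To confirm the mode bits, now or after any later change, these two helpers read them back. They assume GNU stat and find, as shipped with Ubuntu, and the function names are illustrative. Once the container is running, comparing docker exec anythingllm id -u with your own id -u (assuming the image includes the id utility) tells you whether the UIDs actually differ; if they match, a normal chmod 755 is enough.

```shell
# Octal permission bits of a path (GNU stat syntax, as on Ubuntu).
perm_octal() {
  stat -c '%a' -- "$1"
}

# Succeeds if the path itself is world-writable (the "other" write bit, 0002).
is_world_writable() {
  [ -n "$(find "$1" -maxdepth 0 -perm -0002)" ]
}
```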


Step 4: Run the AnythingLLM Container

Pull and start the container with a single command:

docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  --add-host=host.docker.internal:host-gateway \
  -v $HOME/anythingllm-storage:/app/server/storage \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm

What each flag does:

  • -p 3001:3001 exposes port 3001 so your browser can reach the UI
  • --restart unless-stopped restarts the container automatically after a reboot or crash unless you explicitly stop it
  • --cap-add SYS_ADMIN is needed because AnythingLLM embeds Chromium for its web scraping features, and Chromium’s sandbox requires this capability inside Docker
  • --add-host=host.docker.internal:host-gateway creates a hostname inside the container that resolves to your host machine, which is how the container reaches Ollama
  • -v $HOME/anythingllm-storage:/app/server/storage mounts your storage directory into the container so data persists across container recreations
  • -e STORAGE_DIR tells AnythingLLM where to write its data inside the container

The image is about 2.5 GB and takes a few minutes to download on first run.

Check the container started:

docker ps
docker logs anythingllm --tail 30

Look for a log line reporting that the server is listening on port 3001. Once it appears, the backend is ready.
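If you script the installation, polling the UI address is more reliable than reading logs by eye. This sketch retries a URL with curl until it returns a successful HTTP status or the attempt budget runs out. The function name and the default of 30 attempts two seconds apart are arbitrary choices.

```shell
# Poll URL until curl gets a successful (non-4xx/5xx) response.
# Usage: wait_for_ready URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ready() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes HTTP errors count as failures; -sS hides progress but keeps errors.
    if curl -fsS -o /dev/null "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_ready http://localhost:3001 60 2 waits up to about two minutes, which leaves room for a slow first start.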


Step 5: Complete the Setup Wizard

Open your browser and go to http://localhost:3001 (or replace localhost with your server’s IP if accessing from another machine).

AnythingLLM presents a first-run setup wizard. Work through each screen:

LLM Provider

Select Ollama from the provider list. Set the base URL to:

http://host.docker.internal:11434

This is the hostname the container uses to reach Ollama on your host. From the model dropdown, select the model you want to use for chat (for example, llama3.2:3b). If the dropdown shows no models, check that Ollama is running and that the URL is correct.

Embedding Model

Select Ollama as the embedding provider. Set the same base URL:

http://host.docker.internal:11434

From the embedding model dropdown, select nomic-embed-text. This is the model you pulled in Step 2. If it does not appear, confirm you ran ollama pull nomic-embed-text.

Vector Database

Leave the selection as LanceDB. LanceDB is embedded in AnythingLLM and requires no external service or separate installation. It is well suited to single-server use and scales to millions of vectors, far beyond the size of a typical document collection.

Create Admin Account

Enter a username, email address, and password. This becomes the admin account with full access to all settings, workspaces, and users. Choose a strong password even on a local server; this account is the key to all your documents.

Click through to finish. You will land on the main AnythingLLM dashboard.


Step 6: Create a Workspace

In the left sidebar, click the + icon next to “Workspaces” to create your first workspace. Give it a name that describes the document collection it will hold, for example Engineering Runbooks, Product Docs, or Security Policies.

Each workspace starts empty. The name is just a label; the document collection you build in the next step is what gives it meaning.


Step 7: Upload Documents

Click on your workspace to open it, then click the Upload Document button (an upload icon shown with the workspace; its exact placement varies between AnythingLLM versions) to open the document manager.

AnythingLLM accepts:

  • PDF files
  • Word documents (.docx)
  • Plain text (.txt)
  • Markdown files (.md)
  • Web URLs

Upload a document. After upload, it appears in the document library. To make it searchable, select it, click Move to Workspace, and then click Save and Embed to add it to the workspace’s vector database.

This two-step process matters. A document in the library is stored but not yet searchable. A document that has been embedded is chunked, vectorized, and indexed, and that index is what the RAG pipeline searches at query time.

For a 10-page PDF, embedding takes roughly 5–15 seconds on CPU. A 100-page document may take a minute or two.
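The chunking step can be illustrated with a fixed-size character splitter that keeps some overlap between neighboring chunks, so text near a boundary appears in both of them. This is a simplified sketch with hypothetical sizes; AnythingLLM’s own text splitter and its default chunk settings may work differently. The function name is illustrative.

```shell
# Fixed-size chunker with overlap. Lines are joined with spaces first.
# Usage: chunk_text SIZE OVERLAP < file
# Note: awk length() counts characters in gawk with a UTF-8 locale, bytes in mawk.
chunk_text() {
  awk -v size="$1" -v overlap="$2" '
    { text = text (NR > 1 ? " " : "") $0 }
    END {
      step = size - overlap
      if (step <= 0) { print "overlap must be smaller than size" > "/dev/stderr"; exit 1 }
      n = length(text)
      for (i = 1; i <= n; i += step) {
        print substr(text, i, size)
        # Stop once this chunk reaches the end of the text.
        if (i + size - 1 >= n) break
      }
    }'
}
```

For example, printf 'abcdefghij' | chunk_text 4 1 prints abcd, defg, and ghij: each chunk starts three characters after the previous one, and consecutive chunks share one character. Every chunk needs its own embedding, which is why longer documents take proportionally longer to embed.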


Step 8: Chat with Your Documents

Switch to the Chat tab in the workspace. Ask a question that the documents you uploaded can answer. For example, if you uploaded an engineering runbook:

How do I perform a zero-downtime rolling deployment?

AnythingLLM will retrieve the most relevant document chunks, include them in the prompt, and return an answer grounded in your actual document content. Below the response you will see a Citations panel listing the specific document sections that were used. This lets you verify that the answer came from your document rather than the model’s general training data.

If the model responds with something like “I don’t have information about that in the provided context”, no sufficiently similar chunks were found. In Query mode this is the correct behavior: the model tells you it cannot find the answer rather than guessing.


Step 9: Enable Multi-User Mode

By default, AnythingLLM runs in single-user mode. To give teammates their own accounts:

Go to Settings → Security and enable Multi-User Mode. Once enabled, navigate to Settings → Users to create accounts.

Roles:

  • Admin: full access to settings, all workspaces, and user management
  • Manager: can manage workspaces and documents but cannot change global settings
  • Default: can use workspaces they have been granted access to

Assign users to specific workspaces from the workspace settings page. A default-role user only sees and can query workspaces they have been assigned to. Their conversation history is private to their account.


Common Mistakes and Troubleshooting

No models appear in the LLM dropdown during setup

The container cannot reach Ollama. Test connectivity from inside the container:

docker exec -it anythingllm curl http://host.docker.internal:11434/api/tags

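If you see “connection refused”, Ollama is most likely listening on 127.0.0.1 only, which the container cannot reach. Make it listen on all interfaces by editing the Ollama systemd service. Be aware that this also makes port 11434 reachable from other machines on your network, so block it with a firewall if the server is not on a trusted network: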

sudo systemctl edit ollama

Add:

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

Then reload and restart:

sudo systemctl daemon-reload
sudo systemctl restart ollama
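To confirm which address Ollama is listening on, before and after the change, this sketch classifies ss -ltn output for port 11434. The port is hard-coded to Ollama’s default (adjust it if you changed OLLAMA_HOST), and the function name and its three labels are illustrative.

```shell
# Classify how Ollama listens, from `ss -ltn` output on stdin.
# Prints "loopback-only", "non-loopback", or "not-listening".
ollama_bind_scope() {
  awk -v port=11434 '
    $1 == "LISTEN" {
      addr = $4   # Local Address:Port column
      if (addr ~ (":" port "$")) {
        if (addr ~ /^(127\.0\.0\.1|\[::1\]):/) loop = 1
        else other = 1
      }
    }
    END {
      if (other) print "non-loopback"
      else if (loop) print "loopback-only"
      else print "not-listening"
    }'
}
```

Run ss -ltn | ollama_bind_scope: loopback-only means the container still cannot reach Ollama, while non-loopback means the override took effect.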

Document is stuck processing and never finishes embedding

Confirm that nomic-embed-text is pulled:

ollama list

If it is missing, pull it:

ollama pull nomic-embed-text
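The check-then-pull routine above can be wrapped in a function that only pulls when the model is missing, which makes it safe to run repeatedly in a setup script. The function name and the matching rule, which accepts the model with or without a tag, are illustrative choices.

```shell
# Pull an Ollama model only if `ollama list` does not already show it.
# "nomic-embed-text" also matches "nomic-embed-text:latest".
ensure_model() {
  local model="$1"
  # Skip the header row; compare the NAME column with and without its tag.
  if ollama list | awk -v m="$model" 'NR > 1 { split($1, p, ":"); if ($1 == m || p[1] == m) found = 1 } END { exit !found }'; then
    echo "$model already present"
  else
    ollama pull "$model"
  fi
}
```

Running ensure_model nomic-embed-text a second time simply reports that the model is already present.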

Also check for errors in the container logs:

docker logs anythingllm --tail 50

Answers are vague or the model says it does not know

The most common cause is that the document was uploaded to the library but not embedded into the workspace. In the workspace document panel, the document should show a filled indicator meaning it is embedded. If it shows as just uploaded, click the embed/move button to add it to the workspace vector store.

Container exits immediately on startup

The --cap-add SYS_ADMIN flag is likely missing. Remove the failed container and re-run with all flags from Step 4:

docker rm anythingllm

Chat responses are slow

This is normal on CPU-only machines for models larger than 3B parameters. If you have a GPU, check that Ollama is using it: run ollama ps while a model is loaded and confirm that the PROCESSOR column reads 100% GPU rather than 100% CPU. If Ollama was installed before your GPU driver, re-run the Ollama install script so it can detect the hardware.
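To check GPU use without reading the table by eye, this sketch parses ollama ps output and succeeds only when every loaded model reports 100% GPU in the PROCESSOR column. It relies on the current column format of ollama ps, which may change between Ollama versions, and the function name is illustrative.

```shell
# Read `ollama ps` output on stdin. Succeeds only if at least one model is
# loaded and every loaded model shows "100% GPU" (not CPU or a CPU/GPU split).
all_on_gpu() {
  awk 'NR > 1 && NF > 0 { seen = 1; if ($0 !~ /100% GPU/) cpu = 1 } END { exit (!seen || cpu) }'
}
```

For example, ollama ps | all_on_gpu && echo "all models on GPU". A failure with no models loaded only means nothing is running yet, so send a prompt first.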


Best Practices

Use one workspace per document domain. Mixing a Kubernetes configuration guide with an HR policy manual in the same workspace degrades retrieval quality because unrelated chunks compete to be returned for every query. Workspace separation is cheap; retrieval confusion is not.

Prefer Query mode over Conversation mode for factual tasks. AnythingLLM offers two chat modes per workspace. Query mode only generates a response if relevant document chunks are found. Conversation mode blends retrieved context with the model’s general knowledge, which can lead to answers that sound confident but go beyond your documents. For a knowledge base where accuracy matters, Query mode is the safer default.

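Back up the storage directory regularly. Everything lives in $HOME/anythingllm-storage, so a compressed archive of it captures your entire AnythingLLM state, including documents, vector indexes, conversations, and settings. Stopping the container first (docker stop anythingllm, then docker start anythingllm afterwards) avoids archiving databases mid-write:

mkdir -p $HOME/backups
tar czf $HOME/backups/anythingllm-$(date +%F).tar.gz -C $HOME anythingllm-storage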
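For scheduled backups, the same archive can be produced by a small function that checks the source, creates the destination, and prints the archive path. The name backup_storage is illustrative; it runs essentially the tar command above, with relative paths inside the archive.

```shell
# Archive a storage directory into DEST_DIR/<name>-YYYY-MM-DD.tar.gz.
# Usage: backup_storage SOURCE_DIR DEST_DIR
backup_storage() {
  local src="${1%/}" dest="$2" archive
  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
  mkdir -p "$dest" || return 1
  archive="$dest/$(basename "$src")-$(date +%F).tar.gz"
  # -C keeps paths in the archive relative (anythingllm-storage/...).
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}
```

For example, run docker stop anythingllm, then backup_storage "$HOME/anythingllm-storage" "$HOME/backups", then docker start anythingllm from a cron job or systemd timer.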

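Put Nginx in front if the interface needs to be accessed beyond localhost. Do not expose port 3001 directly to the internet; reverse proxy through Nginx with TLS instead. Because Docker’s published ports bypass UFW rules, also change the Step 4 port mapping to -p 127.0.0.1:3001:3001 so that only Nginx on the same host can reach AnythingLLM. The process is the same as putting Open WebUI behind Nginx: include the WebSocket upgrade headers, and disable proxy buffering so streamed responses reach the browser as they are generated:

server {
    listen 443 ssl;
    server_name ai.example.com;

    # Certbot typically adds the ssl_certificate and ssl_certificate_key lines here.

    location / {
        proxy_pass http://127.0.0.1:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Pass streamed output through immediately instead of buffering it.
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}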

For the SSL certificate, refer to Secure Nginx with Let’s Encrypt SSL Using Certbot on Ubuntu.

Keep the container image updated monthly. The project ships frequent updates with new embedding providers, bug fixes, and UI improvements. Updating is safe because all data lives in the mounted storage directory, not inside the container:

docker pull mintplexlabs/anythingllm
docker stop anythingllm && docker rm anythingllm

Then re-run the docker run command from Step 4. Your workspaces, documents, and conversations survive the container replacement.
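The update steps can be collected into one function so the flags never drift from Step 4. This is a sketch: it reuses the Step 4 flags unchanged, reads the host storage path from a STORAGE_DIR_HOST variable introduced here for illustration, and stops before touching the running container if the pull fails.

```shell
# Host path of the storage directory (hypothetical variable; defaults to Step 3).
STORAGE_DIR_HOST="${STORAGE_DIR_HOST:-$HOME/anythingllm-storage}"

update_anythingllm() {
  # Abort early so a failed pull never removes the working container.
  docker pull mintplexlabs/anythingllm || return 1
  # Ignore errors here so this also works when no container exists yet.
  docker stop anythingllm >/dev/null 2>&1
  docker rm anythingllm >/dev/null 2>&1
  # Same flags as the Step 4 docker run command.
  docker run -d \
    --name anythingllm \
    --restart unless-stopped \
    -p 3001:3001 \
    --cap-add SYS_ADMIN \
    --add-host=host.docker.internal:host-gateway \
    -v "$STORAGE_DIR_HOST":/app/server/storage \
    -e STORAGE_DIR="/app/server/storage" \
    mintplexlabs/anythingllm
}
```

If you changed the port mapping to 127.0.0.1:3001:3001 for Nginx, make the same change here.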


Conclusion

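You now have AnythingLLM running on Ubuntu, connected to Ollama for both chat generation and document embeddings, with a workspace that can answer questions grounded in documents you control. Your documents and conversations stay on your server, with no subscriptions and no dependency on external APIs. One caveat: AnythingLLM sends anonymous usage telemetry by default, which you can switch off in its settings or by passing -e DISABLE_TELEMETRY="true" to docker run.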

What you have built is the foundation of a private knowledge assistant, one that can answer accurately from your actual documentation rather than hallucinating from general training. That is a meaningful quality improvement over plain conversation, especially for technical or policy-heavy content where precision matters.

From here, the natural next steps are:

  • Agent mode: Enable agents in a workspace to let the model perform live web searches and run calculations, not just retrieve from static documents
  • URL scraping: Point AnythingLLM at a documentation site URL and it will crawl and embed the pages automatically, useful for keeping a workspace in sync with an actively maintained docs site
  • Multiple LLM configurations: Set different Ollama models per workspace. Use a small, fast model for structured question answering where documents do most of the work, and reserve a larger model for workspaces that require reasoning across complex material