ChromaDB shows up in a lot of RAG tutorials as “the vector database you pip-install and use inside your Python script.” That embedded mode is convenient for a quick prototype, but it only works from the single process that started it. The moment you want two services to share the same vector store, embedded mode stops being enough.
The answer is to run ChromaDB as a standalone HTTP server. One process owns the data. Everything else talks to it over the network using a clean REST API you can call from any language, or directly from the terminal with curl.
In this tutorial you will install ChromaDB on Ubuntu, start it as a dedicated server, create collections, insert documents with vector embeddings, run semantic similarity queries, apply metadata filters, and handle updates and deletes, all through the REST API using curl. You will also containerize it with Docker, keep it alive with systemd, and protect it behind Nginx. By the end you will have ChromaDB running as a proper service and understand exactly how its API works.
What Is ChromaDB and How Does Server Mode Differ
ChromaDB is an open-source vector database built for AI applications. You store documents alongside their vector embeddings in collections, then query the collection to find documents whose meaning is closest to your query. The similarity calculation happens on the embedding vectors, arrays of floats that encode the semantic content of each document.
Embedded mode (what most RAG tutorials show) runs ChromaDB in-process. Your Python script imports chromadb, creates a PersistentClient, and the database lives inside that process. No network, no separate service, but only that one process can access it at a time.
Server mode runs ChromaDB as a separate process listening on HTTP port 8000. Any number of clients connect over the network using the REST API. This is what this tutorial covers.
The REST API does not run the built-in text embedding model. You bring your own vectors. In practice this is how production systems work anyway: you generate embeddings with a dedicated model (Ollama, OpenAI, sentence-transformers) and store the resulting float arrays in ChromaDB. This tutorial uses simple 4-dimensional vectors so you can understand the mechanics without needing an embedding model.
Prerequisites
Before starting, make sure you have:
- Ubuntu 20.04, 22.04, or 24.04
- Python 3.8 or newer, check with python3 --version
- pip and venv, check with pip3 --version
- curl and jq installed for making and reading API calls
- A non-root user with sudo privileges
- At least 1 GB of free RAM
Install Python tooling and jq if needed:
sudo apt update
sudo apt install -y python3 python3-pip python3-venv jq
Step 1: Install ChromaDB in a Virtual Environment
Create a working directory and a virtual environment:
mkdir -p ~/chromadb-server
cd ~/chromadb-server
python3 -m venv venv
source venv/bin/activate
A virtual environment keeps ChromaDB’s dependencies separate from your system Python. On a server running multiple Python applications this avoids version conflicts.
Install ChromaDB:
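pip install "chromadb<0.6"
This installs the chroma command-line server and the Python client library in one package. The version pin matters: every curl command in this tutorial targets the /api/v1 REST endpoints of the 0.5.x server, and ChromaDB 1.0 and later replace them with a /api/v2 API, so an unpinned install would reject these requests.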
Confirm the install:
chroma --version
Step 2: Start ChromaDB as an HTTP Server
Create a directory for ChromaDB’s on-disk storage:
mkdir -p ~/chromadb-server/data
Start the server:
chroma run --path ~/chromadb-server/data --host 0.0.0.0 --port 8000
What each flag does:
- --path is the directory where ChromaDB persists collections and vectors. All data written here survives restarts.
- --host 0.0.0.0 listens on all network interfaces. Use 127.0.0.1 instead if you only need local access (recommended in production; external traffic should go through a reverse proxy).
- --port 8000 is the default port. Change it if something else is already using 8000.
You should see startup output that includes the address the server is listening on, similar to:
Starting Chroma server on http://0.0.0.0:8000
Open a second terminal and verify the server is up:
curl -s http://localhost:8000/api/v1/heartbeat | jq
Expected response:
{
"nanosecond heartbeat": 1717000000000000000
}
Any JSON response here means ChromaDB is running and accepting connections. Leave the server running and work in the second terminal from here on.
Step 3: Understand the Core Concepts
Before touching the collection API, take two minutes to understand ChromaDB’s data model.
Collection is the top-level container, similar to a table in a relational database. When you create a collection you declare:
- A name, which must be unique within the database
- The distance metric, set through the hnsw:space metadata key: cosine is standard for text and most ML models, l2 (squared Euclidean) is the default if you omit it, and ip (inner product) is used by some recommendation models.
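The three metrics reduce to short formulas. The sketch below writes them out in plain Python as ChromaDB's documentation describes them (squared L2, 1 minus the inner product, and 1 minus cosine similarity). The function names are illustrative and are not part of any ChromaDB client; for all three, lower means more similar.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """l2: sum of squared differences (no square root is taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """ip: 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """cosine: 1 minus cosine similarity, so the result lies between 0 and 2."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]  # two perpendicular unit vectors
    print("l2:", squared_l2(a, b))              # 2.0
    print("ip:", inner_product_distance(a, b))  # 1.0
    print("cosine:", cosine_distance(a, b))     # 1.0
```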
Document is a single record inside a collection. Each document has four parts:
- id: a unique string identifier within the collection
- embedding: the array of floats representing the item
- document: the original text (optional, stored alongside the vector)
- metadata: a flat key-value object for structured fields like category, source, or timestamp; values must be strings, numbers, or booleans (nested objects are not accepted)
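Because metadata values must be flat scalars, a quick client-side check can catch a nested object before the request ever reaches the server. This is a minimal sketch, assuming the accepted value types are strings, integers, floats, and booleans; check_metadata is a hypothetical helper, not a ChromaDB function.

```python
from typing import Any, Dict

# Value types assumed to be valid in ChromaDB metadata (no nested dicts).
SCALAR_TYPES = (str, int, float, bool)


def check_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return metadata unchanged, or raise ValueError on a non-scalar value."""
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise ValueError(f"metadata key {key!r} must be a string")
        if not isinstance(value, SCALAR_TYPES):
            raise ValueError(
                f"metadata[{key!r}] is a {type(value).__name__}; "
                "use a string, number, or boolean"
            )
    return metadata


if __name__ == "__main__":
    print(check_metadata({"category": "nginx", "reviewed": True}))
    try:
        check_metadata({"category": {"primary": "nginx"}})
    except ValueError as err:
        print("rejected:", err)
```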
Query is the search operation. You provide a query embedding vector and ChromaDB returns the N documents whose stored vectors are closest to it, ranked by distance (lower distance means more similar).
Metadata filtering lets you combine vector similarity with structured conditions in one query. You can limit results to a specific category and ChromaDB applies the filter before the similarity ranking runs.
Step 4: Create a Collection
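In this tutorial you will work with simple 4-dimensional vectors. Real embedding models produce hundreds of dimensions or more (all-MiniLM-L6-v2 produces 384, nomic-embed-text 768, and OpenAI's text-embedding-3-small 1536), but 4 is enough to understand the mechanics without needing an actual embedding model.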
Create a collection called knowledge_base:
curl -s -X POST http://localhost:8000/api/v1/collections \
-H "Content-Type: application/json" \
-d '{
"name": "knowledge_base",
"metadata": {"hnsw:space": "cosine"}
}' | jq
Expected response:
{
"id": "3e8a4f12-...",
"name": "knowledge_base",
"metadata": {
"hnsw:space": "cosine"
},
"tenant": "default_tenant",
"database": "default_database"
}
The hnsw:space setting chooses the distance metric; use cosine for text and most ML embedding models. This setting is fixed at creation time and cannot be changed later without dropping and recreating the collection.
The id field is a UUID that ChromaDB generates for the collection. You will need this UUID for all subsequent operations on this collection. Save it to a shell variable:
COLLECTION_ID=$(curl -s http://localhost:8000/api/v1/collections/knowledge_base | jq -r '.id')
echo $COLLECTION_ID
Verify the collection was created by listing all collections:
curl -s http://localhost:8000/api/v1/collections | jq
Step 5: Insert Documents
Now insert some documents into the knowledge_base collection. Each document has an id, an embedding vector, the original document text, and a metadata object.
Insert six documents with a single request:
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/add" \
-H "Content-Type: application/json" \
-d '{
"ids": ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6"],
"embeddings": [
[0.10, 0.90, 0.20, 0.80],
[0.15, 0.85, 0.25, 0.75],
[0.12, 0.88, 0.10, 0.82],
[0.80, 0.10, 0.70, 0.15],
[0.82, 0.12, 0.68, 0.18],
[0.50, 0.40, 0.60, 0.30]
],
"documents": [
"To restart nginx, run: sudo systemctl restart nginx",
"Check nginx status with: sudo systemctl status nginx",
"View nginx error logs at /var/log/nginx/error.log",
"Kubernetes deployments manage rolling updates across pods",
"HorizontalPodAutoscaler scales pod count based on CPU usage",
"Prometheus scrapes metrics from targets defined in prometheus.yml"
],
"metadatas": [
{"category": "nginx"},
{"category": "nginx"},
{"category": "nginx"},
{"category": "kubernetes"},
{"category": "kubernetes"},
{"category": "monitoring"}
]
}' | jq
Expected response:
true
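true means the request succeeded. Note that add is not an upsert: if a record with the same id already exists, ChromaDB keeps the existing record and logs an "Add of existing embedding ID" warning instead of overwriting it. To insert new records and replace existing ones in a single call, send the same body to the upsert endpoint (/api/v1/collections/$COLLECTION_ID/upsert). That makes it safe to re-index updated content without deduplication logic.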
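When you generate request bodies like this from real data instead of typing them, most mistakes are parallel arrays that do not line up. The sketch below builds the JSON body for add (or upsert) and checks the shapes first. build_records_payload is a hypothetical helper, and its checks are a reasonable client-side subset, not a description of every rule the server enforces.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_records_payload(
    ids: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    documents: Optional[Sequence[str]] = None,
    metadatas: Optional[Sequence[Dict[str, Any]]] = None,
) -> str:
    """Return a JSON request body with one entry per id in every array."""
    n = len(ids)
    if len(set(ids)) != n:
        raise ValueError("ids must be unique within a single request")
    if len(embeddings) != n:
        raise ValueError("expected exactly one embedding per id")
    dimensions = {len(vector) for vector in embeddings}
    if len(dimensions) > 1:
        raise ValueError(f"embeddings have mixed dimensions: {sorted(dimensions)}")

    body: Dict[str, Any] = {
        "ids": list(ids),
        "embeddings": [list(vector) for vector in embeddings],
    }
    # Optional arrays must also line up index-for-index with ids.
    for field, values in (("documents", documents), ("metadatas", metadatas)):
        if values is not None:
            if len(values) != n:
                raise ValueError(f"{field} must have one entry per id")
            body[field] = list(values)
    return json.dumps(body)


if __name__ == "__main__":
    print(build_records_payload(
        ["doc1"],
        [[0.10, 0.90, 0.20, 0.80]],
        ["To restart nginx, run: sudo systemctl restart nginx"],
        [{"category": "nginx"}],
    ))
```

You could pipe the printed JSON into curl and pass -d @- so curl reads the request body from standard input.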
Confirm the documents were stored:
curl -s "http://localhost:8000/api/v1/collections/$COLLECTION_ID/count" | jq
Expected response:
6
Step 6: Retrieve Documents by ID
You can retrieve any document by its ID using the get endpoint:
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/get" \
-H "Content-Type: application/json" \
-d '{
"ids": ["doc1"],
"include": ["documents", "metadatas", "embeddings"]
}' | jq
Expected output:
{
"ids": ["doc1"],
"embeddings": [[0.1, 0.9, 0.2, 0.8]],
"documents": ["To restart nginx, run: sudo systemctl restart nginx"],
"metadatas": [{"category": "nginx"}],
"uris": null,
"data": null
}
The include array controls which fields are returned. You can retrieve multiple documents by ID in one call:
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/get" \
-H "Content-Type: application/json" \
-d '{
"ids": ["doc1", "doc4", "doc6"],
"include": ["documents", "metadatas"]
}' | jq
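A get response is flat: each field is a single list aligned index-for-index with ids. If you consume it from a script, zipping those lists into one record per ID is usually handier. A minimal sketch; get_response_to_records is a hypothetical helper, and the sample input is the doc1 response shown earlier in this step.

```python
from typing import Any, Dict, List


def get_response_to_records(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the parallel lists of a get response into one dict per record."""
    records = []
    for i, record_id in enumerate(response["ids"]):
        record: Dict[str, Any] = {"id": record_id}
        for field in ("documents", "metadatas", "embeddings"):
            values = response.get(field)
            if values is not None:  # fields left out of include come back as null
                record[field[:-1]] = values[i]  # "documents" -> "document", etc.
        records.append(record)
    return records


if __name__ == "__main__":
    sample = {
        "ids": ["doc1"],
        "embeddings": [[0.1, 0.9, 0.2, 0.8]],
        "documents": ["To restart nginx, run: sudo systemctl restart nginx"],
        "metadatas": [{"category": "nginx"}],
        "uris": None,
        "data": None,
    }
    print(get_response_to_records(sample))
```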
Step 7: Run a Similarity Search
This is the core operation. You provide a query embedding and ChromaDB returns the closest stored documents.
Search for the 3 most similar documents to a query vector that is close to the nginx cluster:
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/query" \
-H "Content-Type: application/json" \
-d '{
"query_embeddings": [[0.12, 0.88, 0.22, 0.78]],
"n_results": 3,
"include": ["documents", "metadatas", "distances"]
}' | jq
Expected output (distances rounded to four decimal places):
{
"ids": [["doc1", "doc2", "doc3"]],
"distances": [[0.0004, 0.0009, 0.0054]],
"metadatas": [[
{"category": "nginx"},
{"category": "nginx"},
{"category": "nginx"}
]],
"documents": [[
"To restart nginx, run: sudo systemctl restart nginx",
"Check nginx status with: sudo systemctl status nginx",
"View nginx error logs at /var/log/nginx/error.log"
]],
"embeddings": null,
"uris": null,
"data": null
}
The distances field is the cosine distance between the query vector and each stored vector. Values close to 0.0 mean nearly identical direction (very similar). Values close to 2.0 mean opposite direction. This is the opposite of Qdrant's score field: in ChromaDB, lower is better.
Notice the results are nested one level deep, ids[0], distances[0], documents[0], because ChromaDB supports batching multiple query vectors in a single request. The outer array corresponds to each query vector; here there is only one.
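The nesting is easier to see with two query vectors. The sketch below splits a query response into one list of hits per query embedding. split_query_results is a hypothetical helper, and the sample response is made up to show the shape, not real server output.

```python
from typing import Any, Dict, List


def split_query_results(response: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Return one list of hits per query vector from a query response."""
    per_query: List[List[Dict[str, Any]]] = []
    for q, ids in enumerate(response["ids"]):  # outer index: which query vector
        hits = []
        for i, hit_id in enumerate(ids):  # inner index: rank within that query
            hit: Dict[str, Any] = {"id": hit_id}
            for field in ("distances", "documents", "metadatas"):
                if response.get(field) is not None:
                    hit[field[:-1]] = response[field][q][i]
            hits.append(hit)
        per_query.append(hits)
    return per_query


if __name__ == "__main__":
    # Illustrative shape for two query vectors with n_results = 2.
    sample = {
        "ids": [["doc1", "doc2"], ["doc5", "doc4"]],
        "distances": [[0.01, 0.02], [0.03, 0.04]],
        "documents": None,
        "metadatas": None,
    }
    for q, hits in enumerate(split_query_results(sample)):
        print(q, [hit["id"] for hit in hits])
```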
The Kubernetes and monitoring documents did not appear in the top 3 because their vectors point in a very different direction from the query vector.
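You can reproduce this ranking by hand. The sketch below computes exact cosine distances from the query to the six vectors inserted in Step 5 and sorts them. It is a brute-force check, not ChromaDB's HNSW index, and the server's distances may differ slightly in the last digits because of floating-point precision.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Vectors inserted in Step 5, and the query vector used in this step.
STORED: Dict[str, List[float]] = {
    "doc1": [0.10, 0.90, 0.20, 0.80],
    "doc2": [0.15, 0.85, 0.25, 0.75],
    "doc3": [0.12, 0.88, 0.10, 0.82],
    "doc4": [0.80, 0.10, 0.70, 0.15],
    "doc5": [0.82, 0.12, 0.68, 0.18],
    "doc6": [0.50, 0.40, 0.60, 0.30],
}
QUERY = [0.12, 0.88, 0.22, 0.78]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cos(angle between a and b): 0 is same direction, 2 is opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def top_n(query: Sequence[float], stored: Dict[str, List[float]], n: int) -> List[Tuple[str, float]]:
    """Exact nearest neighbours: score every vector, sort ascending, keep n."""
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1])  # lower distance = more similar
    return scored[:n]


if __name__ == "__main__":
    for doc_id, distance in top_n(QUERY, STORED, 3):
        print(f"{doc_id}: {distance:.4f}")
```

On these inputs it prints doc1, doc2, and doc3 with distances of about 0.0004, 0.0009, and 0.0054. The nearest non-nginx vector, doc6, sits at roughly 0.30.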
Step 8: Filter Search Results by Metadata
In the query API, metadata filtering is the where field: it restricts the candidate documents before ChromaDB ranks them by similarity.
Search for similar documents but only within the kubernetes category:
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/query" \
-H "Content-Type: application/json" \
-d '{
"query_embeddings": [[0.12, 0.88, 0.22, 0.78]],
"n_results": 2,
"where": {"category": "kubernetes"},
"include": ["documents", "metadatas", "distances"]
}' | jq
Now ChromaDB only returns documents where category == "kubernetes", even though those vectors are farther from the query (distances rounded):
{
"ids": [["doc5", "doc4"]],
"distances": [[0.622, 0.649]],
"metadatas": [[
{"category": "kubernetes"},
{"category": "kubernetes"}
]],
"documents": [[
"HorizontalPodAutoscaler scales pod count based on CPU usage",
"Kubernetes deployments manage rolling updates across pods"
]],
"embeddings": null,
"uris": null,
"data": null
}
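The same brute-force approach explains this result. The sketch below filters the Step 5 records to the kubernetes category and then ranks the survivors by exact cosine distance. It illustrates filter-then-rank and is not ChromaDB's internal implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Step 5 vectors paired with their category metadata.
STORED: Dict[str, Tuple[List[float], Dict[str, str]]] = {
    "doc1": ([0.10, 0.90, 0.20, 0.80], {"category": "nginx"}),
    "doc2": ([0.15, 0.85, 0.25, 0.75], {"category": "nginx"}),
    "doc3": ([0.12, 0.88, 0.10, 0.82], {"category": "nginx"}),
    "doc4": ([0.80, 0.10, 0.70, 0.15], {"category": "kubernetes"}),
    "doc5": ([0.82, 0.12, 0.68, 0.18], {"category": "kubernetes"}),
    "doc6": ([0.50, 0.40, 0.60, 0.30], {"category": "monitoring"}),
}
QUERY = [0.12, 0.88, 0.22, 0.78]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_top_n(query: Sequence[float], stored, category: str, n: int) -> List[Tuple[str, float]]:
    """Keep only records in the given category, then rank the survivors."""
    candidates = [
        (doc_id, cosine_distance(query, vector))
        for doc_id, (vector, metadata) in stored.items()
        if metadata.get("category") == category
    ]
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:n]


if __name__ == "__main__":
    for doc_id, distance in filtered_top_n(QUERY, STORED, "kubernetes", 2):
        print(f"{doc_id}: {distance:.4f}")
```

It prints doc5 (about 0.6220) ahead of doc4 (about 0.6490): doc5's slightly larger second and fourth components bring it a little closer to the query's direction.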
The simple {"category": "kubernetes"} syntax is shorthand for an equality match. For more complex conditions you can use explicit operators. To exclude a category:
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/query" \
-H "Content-Type: application/json" \
-d '{
"query_embeddings": [[0.12, 0.88, 0.22, 0.78]],
"n_results": 3,
"where": {"category": {"$ne": "kubernetes"}},
"include": ["documents", "metadatas", "distances"]
}' | jq
The available operators are $eq, $ne, $gt, $gte, $lt, $lte, $in, and $nin. Combine conditions with $and and $or:
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/query" \
-H "Content-Type: application/json" \
-d '{
"query_embeddings": [[0.12, 0.88, 0.22, 0.78]],
"n_results": 3,
"where": {
"$or": [
{"category": {"$eq": "nginx"}},
{"category": {"$eq": "monitoring"}}
]
},
"include": ["documents", "metadatas", "distances"]
}' | jq
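To make the operator semantics concrete, here is a small local evaluator that checks a where clause against one metadata dict. It is a hypothetical teaching sketch, not ChromaDB's code: it covers only the operators listed above, and its handling of a record that lacks the filtered key (treated as a mismatch) is our own choice, which may differ from the server's.

```python
from typing import Any, Dict


def _compare(op: str, actual: Any, expected: Any) -> bool:
    """Apply one comparison operator to a single metadata value."""
    if op == "$eq":
        return actual == expected
    if op == "$ne":
        return actual != expected
    if op == "$gt":
        return actual > expected
    if op == "$gte":
        return actual >= expected
    if op == "$lt":
        return actual < expected
    if op == "$lte":
        return actual <= expected
    if op == "$in":
        return actual in expected
    if op == "$nin":
        return actual not in expected
    raise ValueError(f"unsupported operator: {op}")


def matches(where: Dict[str, Any], metadata: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in the where clause."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(sub, metadata) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(sub, metadata) for sub in condition):
                return False
        elif key not in metadata:
            return False  # our choice: a record without the key never matches
        elif isinstance(condition, dict):
            # Explicit form, e.g. {"category": {"$ne": "kubernetes"}}
            ((op, expected),) = condition.items()
            if not _compare(op, metadata[key], expected):
                return False
        elif metadata[key] != condition:
            # Shorthand form: {"category": "nginx"} means an $eq match
            return False
    return True


if __name__ == "__main__":
    metadatas: Dict[str, Dict[str, Any]] = {
        "doc1": {"category": "nginx"},
        "doc4": {"category": "kubernetes"},
        "doc6": {"category": "monitoring"},
    }
    where = {"$or": [{"category": {"$eq": "nginx"}}, {"category": {"$eq": "monitoring"}}]}
    print([doc_id for doc_id, meta in metadatas.items() if matches(where, meta)])
```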
Step 9: Update and Delete Documents
Update a document’s metadata without changing its embedding:
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/update" \
-H "Content-Type: application/json" \
-d '{
"ids": ["doc1", "doc2"],
"metadatas": [
{"category": "nginx", "reviewed": true},
{"category": "nginx", "reviewed": true}
]
}' | jq
Expected response:
true
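Include every metadata key you want the document to keep; the example repeats category next to the new reviewed flag so the category filters from Step 8 keep matching. Verify the change: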
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/get" \
-H "Content-Type: application/json" \
-d '{
"ids": ["doc1"],
"include": ["metadatas"]
}' | jq
Delete specific documents by ID:
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/delete" \
-H "Content-Type: application/json" \
-d '{"ids": ["doc6"]}' | jq
Delete documents by metadata filter:
curl -s -X POST "http://localhost:8000/api/v1/collections/$COLLECTION_ID/delete" \
-H "Content-Type: application/json" \
-d '{"where": {"category": "monitoring"}}' | jq
Delete an entire collection (destructive, deletes all documents and the collection definition):
curl -s -X DELETE http://localhost:8000/api/v1/collections/knowledge_base | jq
Step 10: Run ChromaDB with Docker
If you prefer containers over a pip install, ChromaDB publishes an official Docker image.
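If the server from Step 2 is still running, stop it first with Ctrl+C, because the container also binds port 8000. Then create the data directory: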
mkdir -p ~/chromadb-server/data
Run the container:
docker run -d \
--name chromadb \
--restart unless-stopped \
-p 8000:8000 \
-v ~/chromadb-server/data:/chroma/chroma \
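chromadb/chroma:0.5.23
The image tag is pinned to a 0.5.x release because the /api/v1 endpoints used in this tutorial are not served by ChromaDB 1.0 and later, and /chroma/chroma is the data path those 0.5.x images use. The -v mount maps your local data directory into the container so collections survive container rebuilds. --restart unless-stopped keeps ChromaDB running after a server reboot unless you explicitly stop the container.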
Verify it is responding:
curl -s http://localhost:8000/api/v1/heartbeat | jq
All the curl commands from the previous steps work identically against the Docker-hosted server.
Step 11: Run as a Systemd Service
For a non-Docker setup, create a systemd unit to keep ChromaDB running in the background and restart it on failure.
Find the exact path to the chroma binary:
which chroma
# Example: /home/ubuntu/chromadb-server/venv/bin/chroma
Create the service file:
sudo nano /etc/systemd/system/chromadb.service
[Unit]
Description=ChromaDB Vector Database Server
After=network.target
[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/home/YOUR_USERNAME/chromadb-server
ExecStart=/home/YOUR_USERNAME/chromadb-server/venv/bin/chroma run \
--path /home/YOUR_USERNAME/chromadb-server/data \
--host 127.0.0.1 \
--port 8000
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Replace YOUR_USERNAME with your actual Linux username. Notice that --host 127.0.0.1 is used here, so ChromaDB only listens on localhost. External traffic goes through Nginx (next step), not directly to ChromaDB.
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable chromadb
sudo systemctl start chromadb
sudo systemctl status chromadb
Check the logs if something goes wrong:
journalctl -u chromadb -n 50 --no-pager
Step 12: Protect ChromaDB Behind Nginx
ChromaDB has no built-in HTTPS and no authentication by default. Any process that can reach port 8000 can read and delete all your data. For any server accessible over a network, put ChromaDB behind Nginx.
Install Nginx and the password utility:
sudo apt install -y nginx apache2-utils
Create a password file:
sudo htpasswd -c /etc/nginx/.chromadb_htpasswd chromauser
You will be prompted to enter and confirm a password.
Create an Nginx site:
sudo nano /etc/nginx/sites-available/chromadb
server {
listen 8001;
location / {
auth_basic "ChromaDB";
auth_basic_user_file /etc/nginx/.chromadb_htpasswd;
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_read_timeout 120s;
}
}
Enable the site and restart Nginx:
sudo ln -s /etc/nginx/sites-available/chromadb /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
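Now port 8001 is the public-facing port with HTTP basic authentication. With the systemd unit from Step 11, port 8000 stays bound to localhost only. Basic authentication over plain HTTP sends the password base64-encoded rather than encrypted, so add TLS to this server block before exposing port 8001 beyond a trusted network.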
Test with curl:
curl -s -u chromauser:yourpassword http://localhost:8001/api/v1/heartbeat | jq
All the API calls from earlier work the same way through Nginx: add -u chromauser:yourpassword to every curl command and use port 8001. Services on the same machine can still connect directly to port 8000 without credentials.
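The -u flag makes curl send an HTTP Basic Authorization header, which is the base64 encoding of user:password. A service that calls the proxy from code has to send the same header. A minimal standard-library sketch; the helper names are hypothetical, and the port and credentials are the placeholders from this step.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def heartbeat_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a heartbeat request through the Nginx proxy."""
    request = urllib.request.Request(f"{base_url}/api/v1/heartbeat")
    request.add_header("Authorization", basic_auth_header(user, password))
    return request


if __name__ == "__main__":
    req = heartbeat_request("http://localhost:8001", "chromauser", "yourpassword")
    print(req.full_url)
    print(req.get_header("Authorization"))
```

With the proxy running, passing req to urllib.request.urlopen sends it and returns the heartbeat JSON; a wrong password should produce a 401 from Nginx.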
Common Mistakes and Troubleshooting
curl returns Connection refused at localhost:8000
The ChromaDB server is not running. If using systemd, check its status:
sudo systemctl status chromadb
journalctl -u chromadb -n 30 --no-pager
COLLECTION_ID variable is empty
The jq -r '.id' command returned nothing, meaning the collection does not exist or the name was wrong. Confirm the collection name with:
curl -s http://localhost:8000/api/v1/collections | jq '.[].name'
Invalid collection: collection not found
The UUID in $COLLECTION_ID does not match any collection on the server. This happens when the collection was deleted and recreated, because the new collection gets a new UUID. Re-run the command that sets the variable:
COLLECTION_ID=$(curl -s http://localhost:8000/api/v1/collections/knowledge_base | jq -r '.id')
Add of existing embedding ID
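You tried to add a document with an ID that already exists, so ChromaDB skipped that record and kept the stored version. Either change the ID, or send the request to the upsert endpoint (or update, for records that already exist) to replace the existing document.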
Queries return unexpected or irrelevant results
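Most likely the collection was created without setting "hnsw:space": "cosine", so it uses the default squared L2 distance. For unit-normalized embeddings L2 and cosine produce the same ranking, but for unnormalized vectors (such as the hand-written ones in this tutorial) L2 also weighs vector length, which can reorder results. Drop the collection and recreate it with the correct metric; you will need to re-index all documents.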
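A small numeric check shows when the metric matters. For unit-length vectors, squared L2 distance equals exactly twice the cosine distance (|a - b|² = 2 - 2·cos), so both metrics rank results identically; for unnormalized vectors they can disagree. The vectors below are made up for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Distance = Callable[[Sequence[float], Sequence[float]], float]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def rank(query: Sequence[float], stored: Dict[str, List[float]], distance: Distance) -> List[str]:
    """Return the stored keys ordered from most to least similar."""
    return sorted(stored, key=lambda key: distance(query, stored[key]))


QUERY = [1.0, 0.0]
STORED: Dict[str, List[float]] = {"long": [10.0, 1.0], "short": [0.5, 0.5]}

if __name__ == "__main__":
    print("cosine:", rank(QUERY, STORED, cosine_distance))  # ['long', 'short']
    print("raw l2:", rank(QUERY, STORED, squared_l2))  # ['short', 'long']
    unit = {key: normalize(vec) for key, vec in STORED.items()}
    print("unit l2:", rank(normalize(QUERY), unit, squared_l2))  # ['long', 'short']
```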
High memory usage
If you later use the Python client with the built-in embedding function, the all-MiniLM-L6-v2 model downloads on first use (around 80 MB) and stays in memory in the process running the client (roughly 200–400 MB while loaded). When using the REST API directly with curl, no embedding model runs at all, so memory usage is much lower.
Best Practices
Always specify the distance metric at collection creation. The hnsw:space setting is fixed when the collection is created and cannot be changed later. Use cosine for text and most ML embedding models. Re-creating a collection to change this requires re-indexing all documents.
Store COLLECTION_ID wherever you store config. The UUID is stable: it does not change across server restarts. Fetch it once at startup and store it in your application's environment variables or config file rather than querying for it on every request.
Bind ChromaDB to localhost only in production. The systemd service example uses --host 127.0.0.1 for this reason. Direct exposure on 0.0.0.0:8000 means any machine that can reach the server can delete all your collections.
Back up the data directory regularly. The ~/chromadb-server/data directory is your entire database. It holds a SQLite database, so for a guaranteed-consistent copy stop ChromaDB while the copy runs. Set up a cron job to copy it to a separate location:
0 2 * * * rsync -a ~/chromadb-server/data/ /mnt/backups/chromadb/
Keep IDs stable and meaningful. Use stable identifiers for documents (file paths, document slugs, UUIDs derived from content) rather than sequential integers. ChromaDB's upsert endpoint replaces an existing record that has the same ID, so predictable IDs make re-indexing safe.
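Two common ways to derive stable IDs are sketched below: a name-based UUID computed from the source path and chunk number, and a hash of the content itself. Both are deterministic, so re-running an indexer produces the same IDs. The helper names are hypothetical; uuid.uuid5 and hashlib.sha256 come from Python's standard library.

```python
import hashlib
import uuid


def id_from_source(path: str, chunk: int = 0) -> str:
    """Same path and chunk number always give the same UUID."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{path}#{chunk}"))


def id_from_content(text: str) -> str:
    """Same text always gives the same ID; edited text gives a new one."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(id_from_source("docs/nginx/restart.md"))
    print(id_from_content("To restart nginx, run: sudo systemctl restart nginx"))
```

The choice affects re-indexing. A source-based ID stays the same when a file is edited, so upserting the new version replaces the old record. A content-based ID changes whenever the text changes, so an upsert adds a new record and the stale one has to be deleted separately.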
Conclusion
You now have ChromaDB running as a standalone server on Ubuntu and know how to operate it entirely through the REST API. You installed it in a virtual environment, started the HTTP server, created a collection, inserted documents with embedding vectors, ran similarity searches, applied metadata filters, updated and deleted documents, set up Docker and systemd alternatives, and protected the API with Nginx basic auth, all with curl.
The fundamentals you practiced here translate directly to any language SDK. The official Python and JavaScript clients, and community clients for languages such as Go, wrap the same REST API you just used by hand.
From here, the natural next step is generating real embeddings instead of manually crafted vectors. If you have Ollama running locally, pull nomic-embed-text and call its embedding endpoint to convert actual text into 768-dimensional vectors before inserting them into ChromaDB. The combination of Ollama for embeddings, ChromaDB as the vector store, and a Fastify API to tie it together is covered end-to-end in the RAG API with Fastify, Ollama, and ChromaDB tutorial.