If you have run Kubernetes on bare metal for any length of time, you have probably hit the storage wall. On a managed cloud platform, you ask for a PersistentVolumeClaim and a real disk appears out of nowhere, because the cloud provider has a storage driver wired in for you. On your own hardware, that magic does not exist. You create a claim, and it sits there forever in Pending, because nothing is listening to actually create the volume.
This is the single biggest gap between a “cloud” cluster and a self-hosted one. Stateless apps are easy. The moment you want a database, a file upload directory, or anything that has to survive a pod restart, you need real persistent storage that follows your pod around the cluster.
Longhorn fills that gap. It is a lightweight, distributed block storage system built specifically for Kubernetes, and it turns the local disks already in your nodes into replicated, highly available volumes. In this tutorial you will install Longhorn with Helm, make it your default StorageClass, provision a volume, run a stateful app on top of it, and prove that your data survives a pod being deleted. You will also take a snapshot so you understand how backups work.
This guide is for developers, sysadmins, and DevOps engineers running self-hosted Kubernetes. It builds on the same bare-metal foundation as my earlier posts on installing MetalLB and getting started with Helm. If you have those in place, you are ready.
Conceptual Overview
Before installing anything, let us get a few Kubernetes storage terms straight, because they trip up almost everyone at first.
A PersistentVolume (PV) is an actual piece of storage in the cluster. Think of it as a real disk that has been registered with Kubernetes.
A PersistentVolumeClaim (PVC) is a request for storage made by a pod. The pod says “I need 5Gi of storage,” and Kubernetes tries to satisfy that request with a PV. You almost always work with PVCs, not PVs directly.
A StorageClass is the bridge that makes this automatic. It defines a “type” of storage and points to a provisioner, the program that creates a real volume on demand. Without a StorageClass and a working provisioner, a PVC has no way to be fulfilled, which is exactly why claims hang in Pending on a fresh bare-metal cluster.
This is where Longhorn comes in. Longhorn is the provisioner. It runs as a set of pods across your nodes, takes the raw disk space on each node, and carves out block volumes from it. When a pod asks for storage through a Longhorn StorageClass, Longhorn creates the volume, attaches it to whatever node the pod is running on, and keeps replicas of the data on other nodes for safety.
That replication is the key feature. By default Longhorn keeps three copies of every volume on three different nodes. If a node dies, your data still exists elsewhere, and the volume can re-attach to a surviving node. That is what makes it genuinely production-grade rather than a toy local-disk hack.
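Replication has a cost in raw disk space, and a quick calculation makes it concrete. The sketch below is only illustrative arithmetic: the helper name raw_capacity_gib is my own and not part of Longhorn, and it ignores snapshots and thin provisioning, so treat the result as a planning figure for live data rather than an exact measurement.

```shell
# Hypothetical helper (not part of Longhorn): raw disk space that one fully
# written volume occupies across the cluster, in GiB.
# Usage: raw_capacity_gib <volume_size_gib> <replica_count>
raw_capacity_gib() {
  local size_gib="$1" replicas="$2"
  echo $(( size_gib * replicas ))
}

# A 2 GiB volume with the default three replicas prints 6.
raw_capacity_gib 2 3
```

In other words, budget roughly three times the requested size across your nodes if you keep the default replica count.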
Prerequisites
Before starting, make sure you have:
- A running Kubernetes cluster with kubectl configured. Confirm with kubectl get nodes. Ideally three or more worker nodes, so replication has somewhere to put copies. Longhorn works on a single node too, but you lose the high-availability benefit.
- Helm 3 installed. Check with helm version and confirm it starts with v3. If not, see my Helm guide.
- open-iscsi installed on every node. Longhorn uses the Linux iSCSI subsystem to attach volumes to pods. This is the most common reason installs fail, so we handle it first.
- Some free disk space on each node. The default Longhorn data directory lives under /var/lib/longhorn, so make sure the root filesystem has room, or plan to mount a dedicated disk there.
- Basic comfort with kubectl, YAML, and the Linux command line.
In this guide the cluster runs Ubuntu 22.04 nodes on the 192.168.1.0/24 subnet. Adjust anything network-specific to match your own setup.
Step 1: Install open-iscsi on Every Node
Longhorn attaches volumes to pods using iSCSI, a protocol for block storage over the network. The open-iscsi package provides the client side, and it must be present and running on every node that will host Longhorn volumes. Skipping this is the number-one cause of broken installs.
Run this on each node in the cluster:
sudo apt-get update
sudo apt-get install -y open-iscsi nfs-common
sudo systemctl enable --now iscsid
The nfs-common package provides the NFS client. Longhorn needs it for NFS backup targets and for ReadWriteMany volumes, so installing it now avoids a separate trip later.
Confirm the iSCSI daemon is running:
sudo systemctl status iscsid
● iscsid.service - iSCSI initiator daemon (iscsid)
Loaded: loaded (/lib/systemd/system/iscsid.service; enabled)
Active: active (running) since Sat 2026-06-20 09:12:03 UTC
If you have many nodes and you have read my Ansible guide, this is a perfect candidate for a quick playbook rather than logging into each machine by hand.
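If you do not have Ansible set up, a plain SSH loop does the same job for a handful of nodes. This is a minimal sketch that assumes passwordless SSH and sudo on every node. The node addresses are placeholders I made up on the 192.168.1.0/24 subnet, and the function name is hypothetical. It defaults to a dry run that only prints the commands.

```shell
# Sketch: install the Longhorn node prerequisites over SSH.
# NODES holds placeholder addresses; replace them with your own nodes.
NODES="${NODES:-192.168.1.11 192.168.1.12 192.168.1.13}"
REMOTE_CMD='sudo apt-get update && sudo apt-get install -y open-iscsi nfs-common && sudo systemctl enable --now iscsid'

install_prereqs() {
  local node
  for node in $NODES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default: only print what would run, so nothing changes by accident.
      echo "ssh $node \"$REMOTE_CMD\""
    else
      ssh "$node" "$REMOTE_CMD"
    fi
  done
}

install_prereqs
```

Once the printed commands look right, run it again with DRY_RUN=0 to actually execute them.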
Longhorn also ships an environment-check script that confirms every node is ready. Run it once from your workstation, which needs kubectl and jq installed for the script to work:
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.7.2/scripts/environment_check.sh | bash
It checks for iscsid, the right kernel modules, and a few other requirements, and tells you exactly which node is missing what. Fix anything it flags before moving on.
Step 2: Install Longhorn with Helm
With the prerequisites in place, the install itself is short. Add the Longhorn Helm repository and refresh your cache:
helm repo add longhorn https://charts.longhorn.io
helm repo update
Install Longhorn into its own namespace:
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --version 1.7.2
Notice I pinned the version with --version. As I stressed in the Helm guide, pinning keeps your install reproducible so a future helm upgrade does not surprise you with a new release. Check the latest stable version on the Longhorn releases page and use that.
Longhorn deploys quite a few components: a manager on every node, the CSI driver, an instance manager, and a web UI. Watch them come up:
kubectl get pods -n longhorn-system --watch
This takes a few minutes on first install because several images need to be pulled. Wait until everything reaches Running or Completed:
NAME                                        READY   STATUS    RESTARTS   AGE
longhorn-manager-7q4xk                      1/1     Running   0          3m
longhorn-driver-deployer-6c9d8f7b4d-2xk9p   1/1     Running   0          3m
longhorn-ui-5d8f9c7b5d-mn4qt                1/1     Running   0          3m
instance-manager-0a1b2c3d                   1/1     Running   0          2m
csi-attacher-7f8d9c6b4d-abcde               1/1     Running   0          2m
...
Press Ctrl+C to stop watching once things settle. Do not panic if a pod or two restarts once during startup; that is normal as components wait for each other.
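If you would rather script the wait than watch the pod list, kubectl rollout status blocks until a workload is ready. The sketch below assumes the workload names match the pod names shown above (a longhorn-manager DaemonSet, plus longhorn-driver-deployer and longhorn-ui Deployments). The function name and the 300s default timeout are my own choices.

```shell
# Sketch: block until the core Longhorn workloads report ready.
# Usage: wait_for_longhorn [timeout]   e.g. wait_for_longhorn 600s
wait_for_longhorn() {
  local ns="longhorn-system" timeout="${1:-300s}"
  kubectl -n "$ns" rollout status daemonset/longhorn-manager --timeout="$timeout" &&
  kubectl -n "$ns" rollout status deployment/longhorn-driver-deployer --timeout="$timeout" &&
  kubectl -n "$ns" rollout status deployment/longhorn-ui --timeout="$timeout"
}

# Against a live cluster:
#   wait_for_longhorn && echo "Longhorn is up"
```

This is handy in CI or provisioning scripts, where an interactive --watch is not an option.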
Step 3: Verify the StorageClass
The Helm chart automatically creates a StorageClass named longhorn and marks it as the cluster default. Confirm it:
kubectl get storageclass
NAME                 PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
longhorn (default)   driver.longhorn.io   Delete          Immediate           true                   4m
That (default) next to longhorn is important. It means any PVC that does not name a specific StorageClass will be handled by Longhorn automatically. If you ever run more than one storage system, only one should be the default, so keep an eye on this.
If for some reason it is not marked default, you can set it yourself:
kubectl patch storageclass longhorn \
  -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
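To check the "only one default" rule without reading the table by eye, you can count the classes that carry the default annotation. This is a small sketch with function names of my own. The counting function reads tab-separated lines, so it can be tried without a cluster.

```shell
# Print "name<TAB>is-default-annotation" for every StorageClass.
list_storage_classes() {
  kubectl get storageclass -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}'
}

# Read those lines on stdin and print how many classes are marked default.
count_default_classes() {
  awk -F'\t' '$2 == "true" { n++ } END { print n + 0 }'
}

# Against a live cluster:
#   list_storage_classes | count_default_classes
```

A result of 1 is what you want. Anything higher means two storage systems are both claiming to be the default.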
Step 4: Provision Your First Volume
Now let us prove the whole thing works by creating a PersistentVolumeClaim. Create a file named test-pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
A few notes on this:
- accessModes: ReadWriteOnce means the volume can be mounted read-write by a single node at a time. This is the normal mode for block storage and what databases expect.
- storageClassName: longhorn explicitly asks for Longhorn. Since it is the default, you could omit this line, but being explicit is a good habit.
- 2Gi is the size. Longhorn will thin-provision this, so it does not consume the full amount up front.
Apply it:
kubectl apply -f test-pvc.yaml
Check its status:
kubectl get pvc
NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
longhorn-test   Bound    pvc-3f9a2b1c-7d8e-4f5a-9b0c-1d2e3f4a5b6c   2Gi        RWO            longhorn       8s
That Bound status is the moment everything pays off. On a fresh bare-metal cluster without a provisioner, this claim would be stuck in Pending forever. With Longhorn watching, a real volume was created and bound in seconds.
Step 5: Run a Stateful App on the Volume
A bound PVC is nice, but the real test is using it from a pod and confirming the data survives. Let us deploy a small app that writes to the volume.
Create test-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: storage-test
spec:
  containers:
    - name: app
      image: busybox
      command: ["/bin/sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: longhorn-test
This runs a minimal BusyBox container that just sleeps, with our Longhorn volume mounted at /data. Apply it and wait for it to be ready:
kubectl apply -f test-pod.yaml
kubectl wait --for=condition=Ready pod/storage-test --timeout=60s
Now write some data into the volume:
kubectl exec storage-test -- sh -c 'echo "Longhorn keeps my data safe" > /data/notes.txt'
kubectl exec storage-test -- cat /data/notes.txt
Longhorn keeps my data safe
Here is the moment of truth. Delete the pod entirely and recreate it:
kubectl delete pod storage-test
kubectl apply -f test-pod.yaml
kubectl wait --for=condition=Ready pod/storage-test --timeout=60s
Then read the file again:
kubectl exec storage-test -- cat /data/notes.txt
Longhorn keeps my data safe
The file is still there. The pod was destroyed and recreated, but because the data lived on a Longhorn persistent volume rather than inside the container, it survived. This is the entire reason persistent storage exists, demonstrated in a handful of commands.
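The same check can be wrapped into one repeatable function, which is useful after upgrades or node maintenance. This is a sketch that assumes the storage-test pod, the test-pod.yaml manifest, and the /data/notes.txt file from this step. The function name and the 120s timeout are my own choices.

```shell
# Sketch: read the file, recreate the pod, read it again, and compare.
verify_persistence() {
  local pod="storage-test" manifest="test-pod.yaml" before after
  before="$(kubectl exec "$pod" -- cat /data/notes.txt)" || return 1
  kubectl delete pod "$pod" --wait=true >/dev/null || return 1
  kubectl apply -f "$manifest" >/dev/null || return 1
  kubectl wait --for=condition=Ready "pod/$pod" --timeout=120s >/dev/null || return 1
  after="$(kubectl exec "$pod" -- cat /data/notes.txt)" || return 1
  if [ "$before" = "$after" ]; then
    echo "PASS: data survived the pod being recreated"
  else
    echo "FAIL: file contents changed"
    return 1
  fi
}

# Against a live cluster, from the directory holding test-pod.yaml:
#   verify_persistence
```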
Step 6: Access the Longhorn Dashboard
Longhorn ships a clean web UI that shows your volumes, replicas, node health, and disk usage. It is genuinely useful for understanding what is happening under the hood.
The UI runs as a service inside the cluster. The quickest way to see it is a port-forward:
kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80
Now open http://localhost:8080 in your browser. You will see your longhorn-test volume listed, and if you click into it, you can see its three replicas spread across different nodes. That replica view is worth a look, because it makes the high-availability story concrete: your 2Gi volume physically exists in three places.
For a permanent setup you would expose this through your Ingress controller instead of a port-forward. If you have followed my NGINX Ingress guide, you can route a hostname to longhorn-frontend. Do not expose the dashboard publicly without authentication, though. It can create and delete volumes, so protect it with an ingress auth annotation or keep it internal only.
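As a rough illustration of that Ingress setup, the sketch below prints a manifest that puts basic auth in front of longhorn-frontend using the ingress-nginx auth annotations. The hostname, the Ingress name, the ingress class nginx, and the secret name basic-auth are all assumptions of mine, so adjust them to your environment.

```shell
# Sketch: render an Ingress for the Longhorn UI with basic auth.
# Assumes ingress-nginx and a secret named basic-auth in longhorn-system,
# created beforehand with something like:
#   htpasswd -c auth admin
#   kubectl -n longhorn-system create secret generic basic-auth --from-file=auth
render_longhorn_ingress() {
  local host="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ingress
  namespace: longhorn-system
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
spec:
  ingressClassName: nginx
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: longhorn-frontend
                port:
                  number: 80
EOF
}

# Placeholder hostname; pipe the output to "kubectl apply -f -" when ready.
render_longhorn_ingress longhorn.example.internal
```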
Step 7: Take a Snapshot
Longhorn can take point-in-time snapshots of a volume, which are the foundation of any backup strategy. A snapshot captures the state of the volume at a moment in time so you can roll back to it.
The easiest way to take one is through the dashboard: open your volume, and click “Take Snapshot.” But you can also do it declaratively, which is what you want for automation. Create snapshot.yaml:
apiVersion: longhorn.io/v1beta2
kind: Snapshot
metadata:
  name: longhorn-test-snap1
  namespace: longhorn-system
spec:
  volume: pvc-3f9a2b1c-7d8e-4f5a-9b0c-1d2e3f4a5b6c
  createSnapshot: true
A word of caution: the volume field here must be the Longhorn volume name, which is the pvc-... name you saw earlier in the kubectl get pvc output, not the PVC name itself. Snapshots live on the same disks as the volume, so they protect against accidental deletion and bad writes, but not against losing the underlying disks.
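Because the Longhorn volume name differs from the PVC name, it is easy to script the lookup rather than copy it by hand. The sketch below assumes a dynamically provisioned volume, where the PV name recorded in the claim's spec.volumeName is also the Longhorn volume name. The function names are hypothetical, and the createSnapshot field reflects my reading of the Snapshot resource, so verify it against the CRD for your Longhorn version.

```shell
# Look up the PV name bound to a PVC; for dynamically provisioned Longhorn
# volumes this is assumed to match the Longhorn volume name.
# Usage: longhorn_volume_for_pvc <pvc-name> [namespace]
longhorn_volume_for_pvc() {
  kubectl get pvc "$1" -n "${2:-default}" -o jsonpath='{.spec.volumeName}'
}

# Render a Snapshot manifest for a given Longhorn volume.
# Usage: render_snapshot <longhorn-volume-name> <snapshot-name>
render_snapshot() {
  local volume="$1" snap_name="$2"
  cat <<EOF
apiVersion: longhorn.io/v1beta2
kind: Snapshot
metadata:
  name: ${snap_name}
  namespace: longhorn-system
spec:
  volume: ${volume}
  createSnapshot: true
EOF
}

# Against a live cluster:
#   render_snapshot "$(longhorn_volume_for_pvc longhorn-test)" longhorn-test-snap1 | kubectl apply -f -
```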
For real disaster recovery, you configure a backup target, an external location like an S3 bucket or an NFS share, and Longhorn copies backups there. If you self-host object storage, my guide on MinIO pairs perfectly with this: point Longhorn’s backup target at your MinIO bucket and your cluster backs itself up to storage you control. Set the backup target under Settings in the dashboard, then schedule recurring backups per volume.
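Recurring backups can also be declared as a resource instead of being clicked together in the dashboard. The sketch below renders a Longhorn RecurringJob. The job name, the schedule, and the retention count are example values I picked, and it assumes a backup target is already configured and that your volumes belong to the default group. Check the field names against the RecurringJob CRD in your Longhorn version before relying on it.

```shell
# Sketch: render a RecurringJob that backs up volumes in the "default" group.
# Usage: render_recurring_backup <job-name> <cron-expression> <retain-count>
render_recurring_backup() {
  local name="$1" cron="$2" retain="$3"
  cat <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: ${name}
  namespace: longhorn-system
spec:
  cron: "${cron}"
  task: "backup"
  groups:
    - default
  retain: ${retain}
  concurrency: 1
EOF
}

# Example values: nightly at 02:00, keeping the last 7 backups.
render_recurring_backup backup-nightly "0 2 * * *" 7
```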
Common Mistakes and Troubleshooting
PVC stuck in Pending. First check that Longhorn pods are actually running with kubectl get pods -n longhorn-system. If they are fine, the usual culprit is open-iscsi missing on a node. Run the environment-check script from Step 1 again. Also confirm the StorageClass exists with kubectl get storageclass.
Pods fail to mount with iscsiadm errors. This is the classic symptom of iscsid not running on the node where the pod was scheduled. SSH into that node and run sudo systemctl enable --now iscsid. Remember it must be on every node, not just the control plane.
Volume degraded or only one replica. If you have fewer than three schedulable nodes, Longhorn cannot place three replicas and the volume shows as “Degraded.” This is expected on small clusters. You can lower the default replica count in the dashboard under Settings, or in the StorageClass parameters, by setting numberOfReplicas to match your node count.
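Since the parameters of an existing StorageClass cannot be edited in place, one approach on a small cluster is to create a second class with a lower replica count. This is a sketch: the class name longhorn-2r is made up, and the other fields mirror what kubectl get storageclass showed for the default class.

```shell
# Sketch: render a Longhorn StorageClass with a custom replica count.
# Usage: render_storageclass <class-name> <replica-count>
render_storageclass() {
  local name="$1" replicas="$2"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${name}
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "${replicas}"
EOF
}

# Example: a two-replica class for a two-node cluster.
render_storageclass longhorn-2r 2
```

New claims that set storageClassName to the new class get the lower replica count, while existing volumes keep whatever count they were created with.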
Not enough disk space. Longhorn stores data under /var/lib/longhorn by default. If your root partition is small, volumes will fail to schedule. Check available space with df -h /var/lib/longhorn. If you need more room, see my note on increasing a storage partition, or mount a dedicated disk at that path before installing.
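A scripted version of that df check makes it easy to run on every node before installing. The function names are my own, and the 20 GiB threshold in the usage comment is an arbitrary example rather than a Longhorn requirement.

```shell
# Print the free space at a path in whole GiB (df -Pk reports 1 KiB blocks).
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeed only if the path has at least the requested number of GiB free.
# Usage: has_free_gib <path> <minimum-gib>
has_free_gib() {
  local path="$1" min_gib="$2" free
  free="$(free_gib "$path")"
  [ -n "$free" ] || return 2   # df could not read the path
  if [ "$free" -ge "$min_gib" ]; then
    echo "OK: ${free} GiB free at ${path}"
  else
    echo "LOW: only ${free} GiB free at ${path}, wanted ${min_gib}"
    return 1
  fi
}

# On a node (example threshold):
#   has_free_gib /var/lib/longhorn 20
```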
Uninstalling leaves things behind. Longhorn protects you from accidentally deleting it while volumes exist. To fully remove it, you must first set the deleting-confirmation-flag setting to true in the dashboard, then run helm uninstall longhorn -n longhorn-system. Skipping that flag makes the uninstall hang.
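The confirmation flag can be set from the command line as well, which helps when the dashboard is not reachable. This sketch patches the Longhorn setting object and then runs the same helm uninstall. The function name is my own, and the release name and namespace match the install command used earlier in this guide.

```shell
# Sketch: confirm deletion, then remove the Helm release.
# This destroys Longhorn and its volumes; back up anything you need first.
uninstall_longhorn() {
  kubectl -n longhorn-system patch settings.longhorn.io deleting-confirmation-flag \
    --type=merge -p '{"value": "true"}' &&
  helm uninstall longhorn -n longhorn-system
}

# Run deliberately, never as part of an unattended script:
#   uninstall_longhorn
```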
Best Practices
A few habits will keep your storage reliable in production.
Run at least three nodes. Longhorn’s whole value is replication, and the default replica count is three. With fewer nodes you get a working volume but no real redundancy. Three nodes is the practical minimum for production.
Use a dedicated disk for storage. Rather than letting Longhorn share the root filesystem, mount a separate disk at /var/lib/longhorn on each node. This isolates storage I/O from the operating system and prevents a full data disk from taking down the whole node.
Configure an off-cluster backup target. Snapshots live on the same disks as your data, so they do not protect you from losing those disks. Always set a backup target on external storage, such as a MinIO or S3 bucket, and schedule recurring backups for important volumes.
Pin the chart version and upgrade deliberately. Storage is the last thing you want changing unexpectedly. Install with --version, read the upgrade notes before bumping it, and never jump multiple minor versions at once.
Set resource requests on your stateful workloads. Databases and other stateful apps deserve guaranteed CPU and memory so they are not starved or evicted. Longhorn keeps the data safe, but an evicted database pod is still downtime.
Protect the dashboard. The Longhorn UI can destroy volumes. Keep it behind authentication or internal-only access, never open to the internet. If you expose it through Ingress, add a basic-auth annotation at minimum.
Conclusion
You have turned the plain local disks in your bare-metal nodes into real, replicated, persistent storage for Kubernetes. You installed open-iscsi on every node, deployed Longhorn with Helm, confirmed it became your default StorageClass, provisioned a PersistentVolumeClaim, ran a stateful pod on top of it, and proved that your data survives a pod being destroyed. You also explored the dashboard and took a snapshot.
This closes one of the biggest gaps in a self-hosted cluster. Combined with MetalLB for external IPs, an Ingress controller for routing, and cert-manager for TLS, you now have the four pillars that managed Kubernetes gives you for free: load balancing, routing, certificates, and storage. Your bare-metal cluster can finally run databases, file stores, and other stateful workloads with confidence.
From here, the natural next steps are: configure a MinIO or S3 backup target and schedule recurring backups, deploy a real stateful application like PostgreSQL on a Longhorn volume, and experiment with volume expansion, since the StorageClass already allows it. Once storage is solid, the rest of your platform starts to feel like a proper cloud of your own.
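As a starting point for the expansion experiment, growing a volume comes down to raising the storage request on the claim. This is a minimal sketch with a hypothetical function name. It assumes the longhorn-test claim from this guide and that your Longhorn version can expand the volume in its current state, which is worth confirming in the release notes first.

```shell
# Sketch: request a larger size for an existing PVC.
# Usage: expand_pvc <pvc-name> <new-size> [namespace]
expand_pvc() {
  local pvc="$1" new_size="$2" ns="${3:-default}"
  kubectl -n "$ns" patch pvc "$pvc" --type=merge \
    -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"${new_size}\"}}}}"
}

# Against a live cluster, growing the 2Gi test claim to 4Gi:
#   expand_pvc longhorn-test 4Gi
#   kubectl get pvc longhorn-test
```

Volumes can only grow, never shrink, so pick the new size with some care.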