Any service that lives on a single IP address dies with the machine that holds it. The classic cure is a floating IP (also called a virtual IP or VIP): an address that is not owned by any one server but moves automatically to whichever server is currently healthy. Clients and DNS only ever know the floating IP; which physical machine answers is an implementation detail.
On Linux, the standard tool for this is Keepalived, an implementation of VRRP (Virtual Router Redundancy Protocol). In this tutorial you will install Keepalived on two Ubuntu servers, configure a floating IP between them, add a health check so failover triggers on service failure (not just machine death), and, the part that matters, actually test the failover.
This guide keeps the setup service-agnostic; the VIP can front anything that listens on an IP. For the most common pairing, an HA Nginx load balancer, see Setup a High Availability Load Balancer with Nginx and Keepalived on Ubuntu.
How VRRP and Floating IPs Work
VRRP is a simple protocol once you see the moving parts:
- Both nodes run Keepalived and share a virtual router ID and a virtual IP address.
- Each node has a priority. The highest-priority healthy node wins the election, becomes MASTER, and adds the VIP to its network interface. The other node stays BACKUP.
- The master multicasts a VRRP advertisement every second (to 224.0.0.18, IP protocol 112): “I am master, priority 200.”
- If advertisements stop (the master crashed or lost network), or the master starts advertising a priority below the backup's because a health check failed, the backup promotes itself, adds the VIP to its own interface, and broadcasts gratuitous ARP so every switch and neighbour immediately learns the VIP’s new home.
Failover typically completes in 3 to 4 seconds. When the original master recovers, default behaviour is preemption: the higher-priority node takes the VIP back (you can disable this, covered below).
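The "3 to 4 seconds" figure follows from the VRRP timers. In VRRPv2 (RFC 3768, the version Keepalived uses unless configured otherwise), a backup waits 3 × advert_int plus a priority-dependent skew of (256 − priority)/256 seconds before declaring the master dead. A minimal sketch of that arithmetic follows; the helper name master_down_interval is purely illustrative and not part of Keepalived:

```shell
#!/usr/bin/env bash
# Sketch: VRRPv2 master-down interval as defined in RFC 3768.
#   skew_time            = (256 - priority) / 256 seconds
#   master_down_interval = 3 * advert_int + skew_time
# master_down_interval is an illustrative helper, not a Keepalived command.
master_down_interval() {
    local advert_int=$1 priority=$2
    awk -v a="$advert_int" -v p="$priority" \
        'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

# Backup node with advert_int 1 and priority 100
master_down_interval 1 100
```

With a one-second advertisement interval, a backup at priority 100 waits a little over 3.6 seconds, and the gratuitous ARP that follows accounts for the rest.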
Prerequisites
- Two servers running Ubuntu 22.04 or 24.04 LTS on the same subnet. VRRP advertisements are link-local and the VIP is announced with ARP, so both nodes need to share a broadcast domain
- One unused IP address on that subnet for the VIP, outside any DHCP pool
- A user with sudo privileges on both servers
- Networks that allow VRRP: multicast to 224.0.0.18 and IP protocol 112 between the nodes. On-premises networks generally do; most public clouds do not (they need unicast mode plus the provider’s floating-IP feature)
The examples use:
- node1: 192.168.1.11/24 on interface ens18
- node2: 192.168.1.12/24 on interface ens18
- Virtual IP: 192.168.1.100
Check your interface name with ip link show and substitute.
Step 1: Install Keepalived on Both Nodes
sudo apt update
sudo apt install keepalived
Also install psmisc if it is missing. It provides killall, which the health check below uses:
sudo apt install psmisc
Step 2: Configure the Master (node1)
sudo nano /etc/keepalived/keepalived.conf
vrrp_script check_service {
    script "/usr/bin/killall -0 nginx"
    interval 2
    fall 3
    rise 2
    weight -150
}

vrrp_instance VI_1 {
    interface ens18
    state MASTER
    priority 200
    virtual_router_id 51
    advert_int 1
    virtual_ipaddress {
        192.168.1.100/24
    }
    authentication {
        auth_type PASS
        auth_pass Sup3rS3c
    }
    track_script {
        check_service
    }
}
Understanding each block:
vrrp_script check_service is the health check. killall -0 probes whether a process named nginx exists (swap in whatever your VIP fronts: haproxy, postgres, or a custom script). interval 2 runs it every 2 seconds. fall 3 demands 3 consecutive failures before acting (prevents flapping on a hiccup); rise 2 demands 2 consecutive successes to recover. Any command works as a check. Exit code 0 is healthy, anything else is a failure, so curl -sf http://localhost/health makes a fine deeper check.
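Because the only contract is the exit code, a custom check can be a few lines of shell. The sketch below is one possible middle ground between killall -0 and a full HTTP probe: it tests whether a TCP port accepts connections. It assumes bash and the coreutils timeout command are available; the name check_tcp, the host and the port are illustrative choices, not anything Keepalived prescribes.

```shell
#!/usr/bin/env bash
# Sketch of a custom check for vrrp_script: exit 0 = healthy, non-zero = failed.
# Assumptions: the service listens on TCP; bash and coreutils `timeout` exist.
# check_tcp is an illustrative name; host and port below are examples.
check_tcp() {
    local host=$1 port=$2
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout caps the attempt so a hung service cannot stall the check.
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example: probe a local web server and report the result.
if check_tcp 127.0.0.1 80; then
    echo "healthy"
else
    echo "failed"
fi
```

Used as a real check, the script would end with a bare check_tcp 127.0.0.1 80 instead of the if block, so that the function's exit status becomes the script's exit status, and the script line in vrrp_script would point at the file's absolute path.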
weight -150 is the crucial line. On check failure, subtract 150 from this node’s priority: 200 minus 150 is 50, which is lower than the backup’s 100, so the backup wins the next election and takes the VIP. Without a weight (or with weight 0), a failed check puts the whole instance in FAULT state instead, which is a blunter version of the same effect. The weighted approach is preferred because recovery is equally graceful.
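The election arithmetic can be written out as a small sketch. The helper effective_priority is hypothetical and models only negative weights; the clamp to the 1 to 254 range is an assumption about how out-of-range results are handled, so treat the both-nodes-failed case as indicative rather than exact.

```shell
#!/usr/bin/env bash
# Sketch: how a negative track_script weight changes the election.
# effective_priority is an illustrative helper; it models only negative weights.
effective_priority() {
    local base=$1 weight=$2 check=$3
    local prio=$base
    if [ "$check" = "fail" ] && [ "$weight" -lt 0 ]; then
        prio=$(( base + weight ))
    fi
    # Assumption: results are kept inside the 1..254 range used for elections.
    if [ "$prio" -lt 1 ]; then prio=1; fi
    if [ "$prio" -gt 254 ]; then prio=254; fi
    echo "$prio"
}

node1=$(effective_priority 200 -150 fail)   # nginx is down on node1
node2=$(effective_priority 100 -150 ok)     # nginx is up on node2
echo "node1=$node1 node2=$node2"
if [ "$node2" -gt "$node1" ]; then
    echo "node2 takes the VIP"
fi
```

The same arithmetic shows why the weight has to exceed the gap between the two priorities: with weight -50, node1 would drop only to 150 and keep the VIP despite the dead service.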
priority 200: higher wins, so the intended master gets the bigger number. Usable values run from 1 to 254 (VRRP reserves 0 and 255 for special cases).
virtual_router_id 51 is the VRRP group identity, 1 to 255. Both nodes must match, and the ID must be unique on the subnet: two unrelated Keepalived clusters using the same ID on the same network will fight each other.
advert_int 1 is the advertisement interval in seconds. The backup declares the master dead after about 3 missed intervals, so this directly sets your failover time floor.
virtual_ipaddress is the floating IP. Keepalived adds/removes it from ens18 as state changes. You can list several.
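authentication is a shared secret so stray VRRP speakers cannot join the group. It must be identical on both nodes, and only the first 8 characters are used. With auth_type PASS the secret travels in plain text, so treat it as a guard against misconfiguration rather than as real security.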
Enable and start:
sudo systemctl enable --now keepalived
Step 3: Configure the Backup (node2)
Same file, two differences: state BACKUP and lower priority.
sudo nano /etc/keepalived/keepalived.conf
vrrp_script check_service {
    script "/usr/bin/killall -0 nginx"
    interval 2
    fall 3
    rise 2
    weight -150
}

vrrp_instance VI_1 {
    interface ens18
    state BACKUP
    priority 100
    virtual_router_id 51
    advert_int 1
    virtual_ipaddress {
        192.168.1.100/24
    }
    authentication {
        auth_type PASS
        auth_pass Sup3rS3c
    }
    track_script {
        check_service
    }
}
sudo systemctl enable --now keepalived
(state MASTER/BACKUP is only the initial state. The election by priority decides who actually holds the VIP.)
Step 4: Verify the VIP
On node1:
ip addr show ens18
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP
    link/ether 52:54:00:11:11:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.11/24 brd 192.168.1.255 scope global ens18
    inet 192.168.1.100/24 scope global secondary ens18
The second inet line is the VIP. node2 should show only its own address. Watch state transitions live on either node:
journalctl -u keepalived -f
You will see lines like (VI_1) Entering MASTER STATE on node1 and Entering BACKUP STATE on node2. From any third machine, confirm the VIP answers:
ping -c 3 192.168.1.100
Step 5: Test the Failover
From a third machine, start a continuous probe against the VIP:
ping 192.168.1.100
Now break the master. Two escalating tests.
Test 1: service failure. On node1, stop the tracked service:
sudo systemctl stop nginx
Within about 6 to 8 seconds (3 failed checks, then re-election) the VIP appears on node2. Run ip addr show ens18 there and it now lists 192.168.1.100. The ping stream shows a few lost packets, then resumes. This is the scenario track_script exists for: the machine is fine, the service is dead, and the VIP moved anyway.
Restart nginx on node1 and, because its priority recovers to 200, it preempts and takes the VIP back.
Test 2: node failure. Reboot node1 entirely:
sudo reboot
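Same outcome, faster detection, because the backup reacts to the master's advertisements rather than waiting on health checks. A clean reboot stops Keepalived gracefully, and a master that shuts down cleanly normally announces priority 0, which lets the backup take over in well under a second. A hard power-off or a pulled cable is the case where the backup has to wait out roughly 3 missed advertisements (about 3 seconds). When node1 finishes booting, it takes the VIP back again.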
If both tests pass, you have a working floating IP.
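To put a number on the "few lost packets", the probe can be run with timestamps and the longest pause between replies extracted afterwards. The sketch below assumes the iputils ping shipped with Ubuntu, whose -D option prefixes every line with a [unix.timestamp]; longest_gap is an illustrative helper name, and the sample input is synthetic.

```shell
#!/usr/bin/env bash
# Sketch: find the longest pause between successful replies in `ping -D` output.
# longest_gap is an illustrative helper that reads such lines on stdin.
longest_gap() {
    awk '/bytes from/ {
        t = substr($1, 2, length($1) - 2) + 0    # drop the brackets around the timestamp
        if (seen && t - prev > max) max = t - prev
        prev = t; seen = 1
    }
    END { printf "%.1f\n", max + 0 }'
}

# Synthetic sample: replies one second apart, with a pause during failover.
printf '%s\n' \
    '[100.0] 64 bytes from 192.168.1.100: icmp_seq=1 ttl=64 time=0.3 ms' \
    '[101.0] 64 bytes from 192.168.1.100: icmp_seq=2 ttl=64 time=0.3 ms' \
    '[105.0] 64 bytes from 192.168.1.100: icmp_seq=6 ttl=64 time=0.4 ms' \
    '[106.0] 64 bytes from 192.168.1.100: icmp_seq=7 ttl=64 time=0.3 ms' \
    | longest_gap
```

For a real measurement, run ping -D 192.168.1.100 | tee ping.log on the third machine during the test, then longest_gap < ping.log. Replies normally arrive one second apart, so the outage is roughly the reported gap minus one second.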
Preemption: Should the Recovered Master Take the VIP Back?
Default VRRP behaviour is yes: highest priority always wins, so a recovered master causes a second failover. Each failover is a few seconds of disruption, and for most services two blips are worse than one.
To make the VIP stay wherever it currently is until an actual failure, use nopreempt:
vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    priority 200
    ...
}
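Two caveats: nopreempt requires the initial state to be BACKUP on both nodes (priorities still decide the first election), and you should make the same change on both. With this setup, a recovered node simply waits as a hot standby. One interaction to check: a nopreempt backup does not take over from a master that is still advertising, even at a reduced priority, so the weight -150 health check may no longer move the VIP when only the service dies. Removing the weight line, so that a failed check puts the instance in FAULT state instead, is the usual way around this; re-run Test 1 from Step 5 after the change. For services with connection state (databases, WebSocket servers), nopreempt is almost always what you want; for stateless HTTP either choice is fine.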
Unicast Mode for Clouds and Filtered Networks
VRRP’s default multicast does not survive most cloud networks and some corporate ones. Keepalived can send advertisements directly between nodes instead. Add to the vrrp_instance block on node1:
unicast_src_ip 192.168.1.11
unicast_peer {
    192.168.1.12
}
And the mirror image on node2:
unicast_src_ip 192.168.1.12
unicast_peer {
    192.168.1.11
}
Note that on public clouds this makes the election work, but the VIP itself must still be routable to the winning node. That usually requires the provider’s floating/elastic IP API, typically driven from a notify_master script. On your own network (bare metal, Proxmox, VMware), unicast alone is enough.
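A state-change hook for the cloud case might look like the sketch below. It assumes the generic notify directive, which according to the keepalived.conf man page calls the script with four arguments: INSTANCE or GROUP, the name, the new state, and the priority. The function reassign_cloud_ip is a hypothetical placeholder; every provider has its own CLI or API for this, and none is assumed here.

```shell
#!/usr/bin/env bash
# Sketch of a notify handler. Assumed arguments (keepalived.conf man page):
#   $1 = INSTANCE|GROUP, $2 = name, $3 = new state, $4 = priority
# reassign_cloud_ip is a hypothetical placeholder for your provider's real
# CLI or API call. The VIP value mirrors this tutorial.
VIP="192.168.1.100"

reassign_cloud_ip() {
    echo "placeholder: ask the provider to route $1 to this node"
}

on_notify() {
    local kind=$1 name=$2 state=$3
    case "$state" in
        MASTER)            reassign_cloud_ip "$VIP" ;;
        BACKUP|FAULT|STOP) echo "$kind $name entered $state, nothing to reassign" ;;
        *)                 echo "unexpected state: $state" >&2; return 1 ;;
    esac
}

# Example call with the arguments Keepalived would pass on promotion.
on_notify INSTANCE VI_1 MASTER 200
```

In a real script the last line would be on_notify "$@", and the placeholder would be replaced by a call that is safe to repeat, since a node can be promoted more than once.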
Common Problems and Troubleshooting
Both nodes claim MASTER and both hold the VIP (split-brain).
The nodes cannot hear each other. Check in order: firewall dropping VRRP (with UFW: sudo ufw allow to 224.0.0.18 on both nodes, see Setup Firewall Using UFW on Ubuntu); mismatched virtual_router_id or auth_pass; nodes on different subnets; multicast-hostile network (switch to unicast, above). To see who is actually talking on the wire: sudo tcpdump -i ens18 proto 112.
The health check never triggers failover.
Look for VRRP_Script(check_service) failed in journalctl -u keepalived. If it is absent, the script never runs or never fails: use absolute paths (/usr/bin/killall), confirm psmisc is installed, and remember recent Keepalived versions enforce script security. If the log says a script is being dropped for security reasons, add enable_script_security and a script_user in a global_defs block, or run the check as a dedicated user.
Failover works, but clients keep timing out for a minute afterwards.
Stale ARP somewhere in the path. Keepalived sends gratuitous ARP on takeover, but some switches rate-limit or ignore it. Verify with arping 192.168.1.100 from the client side and check switch ARP/CAM timers; on clouds, this is the sign you need the provider’s floating IP mechanism rather than raw VRRP.
State flaps between MASTER and BACKUP every few seconds.
Either the health check is marginal (service restart-looping, so check it directly) or something else on the subnet shares your virtual_router_id. tcpdump proto 112 shows every VRRP speaker and their priorities; a rogue advertisement with a higher priority is immediately visible.
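On a busy segment the raw capture scrolls quickly, so it can help to reduce it to one line per speaker. The sketch below assumes tcpdump prints advertisements in roughly the form "IP 192.168.1.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 200, ..."; the exact layout can differ between tcpdump versions, and vrrp_speakers is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch: summarise VRRP speakers from tcpdump text output read on stdin.
# Assumed line shape (may vary by tcpdump version):
#   <time> IP <src> > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 200, ...
vrrp_speakers() {
    awk '/VRRP/ {
        src = vrid = prio = ""
        for (i = 1; i < NF; i++) {
            if ($i == "IP")   src  = $(i + 1)
            if ($i == "vrid") vrid = $(i + 1)
            if ($i == "prio") prio = $(i + 1)
        }
        gsub(/,/, "", vrid); gsub(/,/, "", prio)
        if (src != "") print src, "vrid=" vrid, "prio=" prio
    }' | sort -u
}

# Synthetic sample: the two tutorial nodes plus repeated advertisements.
printf '%s\n' \
    '12:00:01.000000 IP 192.168.1.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 200, authtype simple, intvl 1s, length 20' \
    '12:00:02.000000 IP 192.168.1.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 200, authtype simple, intvl 1s, length 20' \
    '12:00:02.500000 IP 192.168.1.12 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20' \
    | vrrp_speakers
```

Against live traffic, something like sudo tcpdump -n -l -c 20 -i ens18 proto 112 | vrrp_speakers would do. In a healthy cluster only the current master appears; a second source address on the same vrid points at split-brain or a foreign cluster.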
The VIP is present but the service is unreachable on it.
Keepalived only moves the address. The service must actually listen on it. The safe pattern is binding to 0.0.0.0 (all addresses); a service bound specifically to the node’s primary IP will not answer on the VIP.
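A quick way to check the binding is to look at the local address column of ss -ltn. The sketch below is a hypothetical helper, listens_for_vip, that reads that output on stdin and succeeds only if the port is bound to 0.0.0.0 or to the VIP itself. It deliberately ignores the other wildcard forms ss can print for IPv6 listeners, so a failure here means "look closer", not "definitely broken".

```shell
#!/usr/bin/env bash
# Sketch: decide from `ss -ltn` output (stdin) whether a port answers on the VIP.
# listens_for_vip is an illustrative helper. It recognises only 0.0.0.0:PORT and
# VIP:PORT in the Local Address column (field 4); IPv6 wildcards are not handled.
listens_for_vip() {
    local vip=$1 port=$2
    awk -v vip="$vip" -v port="$port" '
        BEGIN { any = "0.0.0.0:" port; own = vip ":" port }
        $1 == "LISTEN" && ($4 == any || $4 == own) { found = 1 }
        END { exit !found }'
}

# Synthetic sample: a service bound only to the node's primary address.
if printf '%s\n' \
    'State  Recv-Q Send-Q Local Address:Port Peer Address:Port' \
    'LISTEN 0      511    192.168.1.11:80    0.0.0.0:*' \
    | listens_for_vip 192.168.1.100 80; then
    echo "port 80 answers on the VIP"
else
    echo "port 80 is not bound to the VIP"
fi
```

On a node, the equivalent live check would be ss -ltn | listens_for_vip 192.168.1.100 80.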
Best Practices
Track the service, not just the machine. A floating IP without a track_script only protects against total node death, which is the rarer failure in practice. Track the process (or better, an HTTP health endpoint) of whatever the VIP is for.
Test failover both ways, on a schedule. The standby that has never been failed over to is not a standby, it is a hope. After the initial tests in Step 5, repeat them whenever configs change.
Use nopreempt unless you have a reason not to. Halving the number of failover events is free reliability.
Keep the paired configuration in version control or automation. The two nodes’ configs must stay consistent (same router ID, same password, same tracked service); drift between them is a classic slow-burn outage. A small Ansible playbook handles it. See Getting Started with Ansible on Ubuntu.
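Short of a full playbook, a small script can catch the most damaging kinds of drift. The sketch below compares the settings that must be identical on both nodes; vrrp_field and configs_match are illustrative helper names, the parsing is a simple first-match on the directive name rather than a real Keepalived config parser, and the sample files are synthetic.

```shell
#!/usr/bin/env bash
# Sketch: compare the settings that must match between two keepalived.conf files.
# vrrp_field and configs_match are illustrative helpers; the parsing is naive
# (first line whose first word equals the key).
vrrp_field() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

configs_match() {
    local a=$1 b=$2 key
    for key in virtual_router_id advert_int auth_pass; do
        if [ "$(vrrp_field "$a" "$key")" != "$(vrrp_field "$b" "$key")" ]; then
            echo "mismatch: $key"
            return 1
        fi
    done
    echo "consistent"
}

# Synthetic sample: node2 has drifted to a different router ID.
tmp=$(mktemp -d)
printf 'virtual_router_id 51\nadvert_int 1\nauth_pass Sup3rS3c\n' > "$tmp/node1.conf"
printf 'virtual_router_id 52\nadvert_int 1\nauth_pass Sup3rS3c\n' > "$tmp/node2.conf"
configs_match "$tmp/node1.conf" "$tmp/node2.conf" || echo "fix the drift before deploying"
rm -rf "$tmp"
```

In practice the second file would be a copy of the peer's /etc/keepalived/keepalived.conf fetched by whatever means you already use, and the check would run before each restart of Keepalived.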
Conclusion
You installed Keepalived on two Ubuntu servers and turned one fragile address into a floating IP: VRRP elects a master by priority, a track script drops that priority the moment the fronted service dies, the VIP follows the healthy node within seconds, and nopreempt keeps recovery from causing a second outage. You proved all of it by stopping the service and rebooting the master while watching the ping stream.
The natural next step is putting something behind the VIP worth protecting: Setup a High Availability Load Balancer with Nginx and Keepalived on Ubuntu builds the full HA load balancer pair, How to Configure Nginx as Layer 4 Load Balancer covers the balancing tier for arbitrary TCP/UDP services, and if your servers live on a fixed address for the first time, How to Set a Static IP Address with Netplan on Ubuntu is the prerequisite worth double-checking, because floating IPs and DHCP do not mix.