A single database server is a single point of failure. If the disk dies, the kernel panics, or someone fat-fingers a systemctl stop, your application goes down and your most recent data might be gone. Logical backups with pg_dump help, but restoring a large database from a dump can take hours, and any writes since the last dump are lost.
Streaming replication solves this. You run a second PostgreSQL server, called a standby, that receives a continuous stream of every change happening on the primary. If the primary fails, the standby already has an up-to-date copy and can be promoted to take over. As a bonus, the standby can serve read-only queries, offloading reporting and analytics traffic away from the primary.
This tutorial walks you through setting up physical streaming replication between two Ubuntu servers from scratch. You will configure a primary, build a standby with pg_basebackup, use a replication slot so the primary never throws away data the standby still needs, verify replication is healthy, and learn how to promote the standby during a failover. This guide assumes you already have PostgreSQL installed; if you do not, start with How to Install and Configure PostgreSQL on Ubuntu first.
How Streaming Replication Works
Before touching any config, it helps to understand what is actually being streamed.
Every change PostgreSQL makes, every insert, update, delete, and schema change, is first written to the Write-Ahead Log (WAL) before it touches the data files. The WAL is an append-only sequence of records describing exactly what changed. This is what makes PostgreSQL crash-safe: after a crash, it replays the WAL to recover.
Streaming replication takes advantage of this. The standby connects to the primary, requests the WAL stream, and replays those same records against its own copy of the data. The result is a byte-for-byte identical replica that stays only a fraction of a second behind.
A few terms you will see throughout this guide:
- Primary, the read-write server where your application sends writes
- Standby (or replica), the read-only server that replays the primary’s WAL
- WAL, the Write-Ahead Log, the stream of change records being replicated
- Replication slot, a marker on the primary that tracks how far the standby has consumed the WAL, so the primary keeps that WAL until the standby has it
- pg_basebackup, a tool that takes a full physical copy of the primary to seed the standby
This setup is asynchronous by default: the primary commits a transaction without waiting for the standby to confirm it received the WAL. That gives you the best performance and is the right choice for most setups. We will mention the synchronous option near the end.
Prerequisites
- Two Ubuntu servers (20.04, 22.04, or 24.04), one for the primary and one for the standby
- PostgreSQL of the same major version installed on both (this guide uses PostgreSQL 16)
- A non-root user with
sudoon both servers - Network connectivity between the two servers on port 5432
- Basic comfort with the Linux command line and editing config files
For this tutorial the servers use these addresses. Replace them with your own throughout:
Primary: 10.0.0.10
Standby: 10.0.0.20
A quick note on versions: streaming replication requires the primary and standby to run the same PostgreSQL major version. You cannot stream from a 15 primary to a 16 standby. Confirm with psql --version on both machines before you start.
Step 1: Configure the Primary Server
All of the work in this step happens on the primary (10.0.0.10).
Create a Replication Role
The standby needs a dedicated role to log in and pull the WAL stream. It does not need superuser rights, only the REPLICATION attribute. Open a psql shell:
sudo -i -u postgres psql
Create the role with a strong password:
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'a_strong_replication_password';
REPLICATION is a special privilege that allows the role to initiate replication connections but nothing else, which keeps the blast radius small if the credential leaks. Exit psql:
\q
Edit postgresql.conf
Open the main config file:
sudo nano /etc/postgresql/16/main/postgresql.conf
Set the following values. Most of them exist already, commented out, so search and edit rather than appending duplicates:
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 512MB
hot_standby = on
Here is what each one does:
listen_addresses = '*'tells PostgreSQL to accept connections on all network interfaces, not just localhost, so the standby can reach itwal_level = replicamakes PostgreSQL log enough information in the WAL to support replication (this is actually the default in modern versions, but set it explicitly)max_wal_sendersis the number of concurrent standby connections allowed; 10 is plenty of headroom for one or two standbysmax_replication_slotsis the number of replication slots you can createwal_keep_sizeis a safety buffer of WAL the primary keeps around even without a slot; it helps if a standby briefly disconnectshot_standby = ononly matters when this server acts as a standby, but setting it everywhere keeps your configs identical and ready for failover
Edit pg_hba.conf
The standby must be allowed to connect using the special replication pseudo-database. Open the host-based authentication file:
sudo nano /etc/postgresql/16/main/pg_hba.conf
Add this line, using the standby’s IP address:
host replication replicator 10.0.0.20/32 scram-sha-256
This says: allow the replicator role to make replication connections from exactly 10.0.0.20, authenticating with a hashed password. Using a /32 mask locks it to that single host rather than an entire subnet.
Apply the Changes
listen_addresses only takes effect on a full restart, so restart rather than reload:
sudo systemctl restart postgresql
If you run UFW, allow the standby to reach the database port:
sudo ufw allow from 10.0.0.20 to any port 5432
Create a Replication Slot
A replication slot is the piece that makes this setup robust. Without one, if the standby disconnects for longer than wal_keep_size allows, the primary recycles WAL the standby still needed, and replication breaks permanently. A slot tells the primary “do not delete WAL past this point until the standby confirms it has it.”
Create one in psql:
sudo -i -u postgres psql
SELECT pg_create_physical_replication_slot('standby1_slot');
Verify it exists:
SELECT slot_name, slot_type, active FROM pg_replication_slots;
slot_name | slot_type | active
---------------+-----------+--------
standby1_slot | physical | f
active is f (false) for now because no standby has connected yet. Exit psql with \q.
One caution worth stating early: a replication slot will hold WAL forever if its standby goes away and never comes back. That can fill your primary’s disk. We cover how to guard against that in the best practices section.
Step 2: Build the Standby Server
Everything in this step happens on the standby (10.0.0.20).
The standby will be a clone of the primary, so its existing data directory must be wiped and replaced with a fresh copy pulled directly from the primary using pg_basebackup.
Stop PostgreSQL and Clear the Data Directory
First stop the service:
sudo systemctl stop postgresql
The default data directory on Ubuntu is /var/lib/postgresql/16/main. Empty it. Be careful with this command and double-check the path:
sudo -u postgres rm -rf /var/lib/postgresql/16/main
Run pg_basebackup
Now pull a full copy of the primary. Run this as the postgres user:
sudo -u postgres pg_basebackup \
-h 10.0.0.10 \
-U replicator \
-D /var/lib/postgresql/16/main \
-S standby1_slot \
-R \
-P \
-X stream
Let me break down each flag, because this single command does most of the heavy lifting:
-h 10.0.0.10is the primary’s address-U replicatoris the replication role you created-D /var/lib/postgresql/16/mainis where the copied data lands-S standby1_slottells the standby to use the replication slot you created on the primary-Rautomatically writes the connection settings into the new data directory and creates thestandby.signalfile that marks this server as a standby-Pshows a progress bar-X streamstreams WAL during the backup so the copy is consistent even on a busy primary
You will be prompted for the replicator password. After it finishes, you will see something like:
33611/33611 kB (100%), 1/1 tablespace
Inspect What -R Generated
The -R flag is a convenience that saves you from writing connection details by hand. Look at what it produced:
sudo -u postgres cat /var/lib/postgresql/16/main/postgresql.auto.conf
primary_conninfo = 'user=replicator password=a_strong_replication_password host=10.0.0.10 port=5432 sslmode=prefer ...'
primary_slot_name = 'standby1_slot'
This is the standby telling itself how to reach the primary and which slot to use. You should also see an empty marker file confirming standby mode:
sudo -u postgres ls -l /var/lib/postgresql/16/main/standby.signal
When standby.signal is present, PostgreSQL boots into standby mode and starts replaying WAL instead of accepting writes.
Start the Standby
sudo systemctl start postgresql
If you allowed remote access on the primary’s firewall but also run UFW on the standby, nothing extra is needed here, the standby makes an outbound connection to the primary.
Step 3: Verify Replication Is Working
Now confirm the two servers are actually talking. There are checks to run on each side.
On the Primary
Open psql on the primary and look at the replication status:
sudo -i -u postgres psql -x -c "SELECT client_addr, state, sync_state, sent_lsn, replay_lsn FROM pg_stat_replication;"
-[ RECORD 1 ]------------------
client_addr | 10.0.0.20
state | streaming
sync_state | async
sent_lsn | 0/3000148
replay_lsn | 0/3000148
state = streaming means the WAL is flowing. When sent_lsn and replay_lsn match, the standby has caught up to the primary completely. The slot should now show as active too:
SELECT slot_name, active FROM pg_replication_slots;
active will now be t.
On the Standby
Confirm the server is running in recovery (standby) mode:
sudo -i -u postgres psql -c "SELECT pg_is_in_recovery();"
pg_is_in_recovery
-------------------
t
t means yes, this is a standby. A primary would return f.
The Real Test: Write on Primary, Read on Standby
Nothing beats an end-to-end test. On the primary, create some data:
sudo -i -u postgres psql -c "CREATE DATABASE repltest;"
sudo -i -u postgres psql -d repltest -c "CREATE TABLE t (id int); INSERT INTO t VALUES (1), (2), (3);"
Now, on the standby, read it back:
sudo -i -u postgres psql -d repltest -c "SELECT * FROM t;"
id
----
1
2
3
The rows appeared on the standby within milliseconds without you copying anything. Replication works. If you try to write on the standby, it will refuse:
sudo -i -u postgres psql -d repltest -c "INSERT INTO t VALUES (4);"
ERROR: cannot execute INSERT in a read-only transaction
That read-only enforcement is exactly what you want. The standby is a faithful mirror, not a place to accept independent writes.
Step 4: Promote the Standby (Failover)
The whole point of a standby is to take over when the primary dies. Promotion turns the standby into a fully independent read-write primary.
Simulate a primary failure by stopping it:
sudo systemctl stop postgresql # run on the primary
Then promote the standby:
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/16/main
On Ubuntu’s packaged PostgreSQL you can also use the wrapper:
sudo pg_ctlcluster 16 main promote
Confirm the standby is now a primary:
sudo -i -u postgres psql -c "SELECT pg_is_in_recovery();"
It now returns f, and the server accepts writes. The standby.signal file is automatically removed during promotion.
Important: once promoted, the old primary and the new primary have diverged. You cannot simply restart the old primary and expect it to rejoin as a standby, because both may have written conflicting WAL. To bring the old primary back as a standby, you either rebuild it with pg_basebackup again, or use pg_rewind to resynchronize it cheaply. pg_rewind is the production-friendly path, since it only copies the blocks that differ instead of the entire database.
Common Mistakes and Troubleshooting
“FATAL: no pg_hba.conf entry for replication connection”
You forgot the replication line in the primary’s pg_hba.conf, or the standby’s IP does not match. Remember the database column must literally say replication, not the name of a real database. Reload the primary after editing: sudo systemctl reload postgresql.
“could not connect to server: Connection refused”
Either listen_addresses is still localhost on the primary, or the firewall is blocking port 5432. Check what PostgreSQL is bound to with ss -tlnp | grep 5432 on the primary. You want to see 0.0.0.0:5432, not 127.0.0.1:5432.
“requested WAL segment has already been removed”
The standby fell behind and the primary recycled WAL it still needed. This is exactly what replication slots prevent, so make sure you created the slot and passed -S to pg_basebackup. If you hit this without a slot, you must rebuild the standby from scratch.
Standby will not start, “database files are incompatible with server”
The two servers are running different PostgreSQL major versions, or different CPU architectures. Physical replication requires both to match. Check psql --version on each.
pg_basebackup hangs or is very slow
Usually a network bottleneck or a primary under heavy write load. The -P flag shows progress so you can tell whether it is moving. For large databases, run it during a quieter period.
Best Practices
Watch your replication slots. A slot whose standby has disappeared will make the primary retain WAL forever, eventually filling the disk. Monitor lag with this query on the primary, which shows how much WAL each slot is holding:
SELECT slot_name, active,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained
FROM pg_replication_slots;
If a slot is inactive and retained keeps growing, drop the slot with SELECT pg_drop_replication_slot('standby1_slot'); to release the WAL, then rebuild that standby when it returns.
Do not store the replication password in plain text if you can avoid it. The -R flag writes it into postgresql.auto.conf. For tighter security, move it into a .pgpass file owned by the postgres user with 0600 permissions, and remove the password from primary_conninfo.
Consider synchronous replication for zero data loss, carefully. By default replication is asynchronous, so a primary crash can lose the last few transactions that had not reached the standby yet. Setting synchronous_standby_names makes the primary wait for the standby to confirm each commit, guaranteeing no committed transaction is lost. The trade-off is real: if the synchronous standby goes down, writes on the primary block until it returns. Only use it when losing a transaction is worse than a write outage, and always with at least two standbys.
Use the standby for read-only queries. Point reporting dashboards and read-heavy analytics at the standby to take load off the primary. Just remember replication is slightly delayed, so a read on the standby may not reflect a write that happened a moment ago on the primary.
Automate failover for production. Manually promoting a standby is fine for learning, but a real production cluster should detect primary failure and promote automatically. Tools like Patroni or repmgr handle leader election, promotion, and reconfiguration so you are not doing it by hand at 3 AM.
Conclusion
You now have a working PostgreSQL streaming replication setup: a primary that streams its Write-Ahead Log to a standby, a replication slot that keeps the primary from discarding WAL the standby still needs, verified end-to-end replication, and a tested promotion procedure for failover. This is the foundation of a highly available database, the same architecture that backs many production systems.
From here, good next steps are:
- pg_rewind, to rejoin a failed primary as a standby without a full rebuild
- Patroni or repmgr, to automate failover and manage the cluster as a unit
- PgBouncer, a connection pooler that can route writes to the primary and reads to standbys
- Delayed replicas, a standby intentionally kept hours behind, useful for recovering from an accidental
DELETEbefore it replicates
Replication is one of the most valuable skills in operating PostgreSQL at scale, and you now have the core of it running on your own servers.