2026-02-23 · 2 min read

ClickHouse Replication Lag: How to Diagnose & Fix

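Replication lag in ClickHouse measures how far a replica is behind the other replicas of the same shard. You can check it with a single query against system.replicas. Lag under 30 seconds is healthy. Over 5 minutes indicates a serious problem. The usual first step is SYSTEM SYNC REPLICA database.table, which pulls pending entries from the shared replication log into the replica's queue and waits until they are processed. It does not remove the underlying cause, so persistent lag still needs diagnosis.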

Detecting Replication Lag

-- Replicated tables that are currently lagging, worst first
SELECT
    database,
    table,
    replica_name,
    absolute_delay AS lag_seconds,
    queue_size,
    inserts_in_queue,
    merges_in_queue
FROM system.replicas
WHERE absolute_delay > 0
ORDER BY absolute_delay DESC;

Common Causes

1. Network congestion — High network utilization between ClickHouse nodes slows replication throughput.

2. Heavy merge activity — Large background merges consume disk I/O, starving replication of resources.

3. ZooKeeper/Keeper issues — ClickHouse uses ZooKeeper to coordinate replication. A degraded ZooKeeper session causes replication to stall.

4. Replica overloaded — If a replica is serving heavy query load, it may fall behind on replication.
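
Some of these causes can be narrowed down from ClickHouse's own system tables. The queries below are a minimal sketch rather than a complete diagnostic: they assume a reasonably recent ClickHouse release (the zookeeper_exception column in particular is not present in older versions), and the LIMIT values are arbitrary. Network congestion (cause 1) is not visible from SQL and has to be checked with host-level tooling.

```sql
-- Cause 2: background merges currently running, longest-running first.
-- Long-running, large merges here suggest disk I/O pressure.
SELECT
    database,
    table,
    round(elapsed) AS elapsed_seconds,
    round(progress * 100, 1) AS progress_pct,
    num_parts,
    formatReadableSize(total_size_bytes_compressed) AS merge_size
FROM system.merges
ORDER BY elapsed DESC
LIMIT 10;

-- Cause 3: replicas whose ZooKeeper/Keeper session looks unhealthy.
-- A read-only table or an expired session means replication cannot progress.
SELECT
    database,
    table,
    is_readonly,
    is_session_expired,
    zookeeper_exception,   -- assumption: column exists in your version
    active_replicas,
    total_replicas
FROM system.replicas
WHERE is_readonly = 1
   OR is_session_expired = 1
   OR active_replicas < total_replicas;

-- Cause 4: rough view of the query load on this replica right now.
SELECT
    count() AS running_queries,
    formatReadableSize(sum(memory_usage)) AS memory_in_use
FROM system.processes;
```

Run these on the lagging replica itself, since system.merges and system.processes only describe the node you are connected to.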

Fixing Replication Lag

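-- Pull pending log entries into the queue and wait until they are processed
-- (most common first step)
SYSTEM SYNC REPLICA database.table_name;

-- Check the replication queue for stuck operations, oldest first.
-- num_tries, last_exception and postpone_reason show why an entry is not progressing.
SELECT type, create_time, required_quorum, source_replica, parts_to_merge,
       num_tries, last_exception, postpone_reason
FROM system.replication_queue
WHERE database = 'database' AND table = 'table_name'
ORDER BY create_time ASC
LIMIT 20;

-- Reinitialize the table's ZooKeeper/Keeper session state if replication is stuck
SYSTEM RESTART REPLICA database.table_name;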

Monitoring Thresholds

  • Green: absolute_delay < 30 seconds
  • Warning: absolute_delay between 30–300 seconds
  • Critical: absolute_delay > 300 seconds
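
These thresholds can be applied directly in a query. The following is a small sketch that labels every replicated table with the status bands listed above; the status column name and the label strings are illustrative choices, not anything ClickHouse defines.

```sql
-- Label each replicated table using the green / warning / critical bands.
SELECT
    database,
    table,
    absolute_delay AS lag_seconds,
    multiIf(
        absolute_delay < 30,   'green',
        absolute_delay <= 300, 'warning',
        'critical'
    ) AS status
FROM system.replicas
ORDER BY absolute_delay DESC;
```

Sorting by absolute_delay keeps the most delayed tables at the top, so a scheduled check only needs to look at the first few rows.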

Clustersight monitors replication lag continuously and sends a Slack alert with the SYSTEM SYNC REPLICA command pre-filled when lag exceeds your threshold.

Read more: How to Monitor ClickHouse in Production

Frequently Asked Questions

How do I check ClickHouse replication lag?

Query SELECT database, table, absolute_delay FROM system.replicas WHERE absolute_delay > 0 ORDER BY absolute_delay DESC. The absolute_delay column shows lag in seconds.

What is acceptable replication lag in ClickHouse?

Under 30 seconds is healthy. 30–300 seconds is a warning. Over 300 seconds (5 minutes) is a serious problem requiring investigation.

How do I fix replication lag in ClickHouse?

Run SYSTEM SYNC REPLICA database.table to make the replica pull its pending replication log entries and wait until they are processed. If lag persists, check system.replication_queue for stuck operations and address the underlying cause.