
Kubernetes Liveness != Readiness

Problem

We used the same /health endpoint for both liveness and readiness probes. The health check included a database ping; under load, the ping timed out. Kubernetes marked the pods as unhealthy and restarted them, which cascaded into restarts across the cluster.
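The failure mode can be sketched with a toy version of the combined endpoint (a minimal illustration, not our actual service; `ping_database`, the latency values, and the 0.5s timeout are made up for the example):

```python
DB_TIMEOUT = 0.5  # seconds the health check waits for the database (hypothetical)

def ping_database(latency: float) -> bool:
    """Stand-in for a real DB ping: fails when the database responds slower than the timeout."""
    return latency <= DB_TIMEOUT

def health(db_latency: float) -> int:
    """Combined /health endpoint: couples process liveness to database health."""
    return 200 if ping_database(db_latency) else 503

# Under load the database slows down. The process itself is fine, but the
# liveness probe (pointed at /health) now fails, so Kubernetes restarts the pod.
print(health(db_latency=0.1))  # 200: DB fast, pod reported healthy
print(health(db_latency=2.0))  # 503: DB slow -> liveness fails -> restart
```

Every restarted pod comes back with cold caches and fresh connection pools, pushing more load onto the already-slow database, which is what makes the restarts cascade.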

Key Insight

Liveness and readiness serve fundamentally different purposes:

| Probe | Controls | Failure Action |
|-----------|------------------------|----------------------------|
| Liveness | Is the process alive? | Restart the pod |
| Readiness | Can it serve traffic? | Stop routing traffic to it |

Separate endpoints with different thresholds:

```yaml
livenessProbe:
  httpGet:
    path: /health/live    # Just: is the process running?
    port: 8000
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health/ready   # Can it serve? (DB, deps OK)
    port: 8000
  failureThreshold: 1     # Stop traffic immediately
```
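The split boils down to two handlers with different scopes (a minimal Python sketch; the `db_ok` and `deps_ok` flags stand in for whatever dependency checks the service actually performs):

```python
def live() -> int:
    # Liveness: no dependency checks. If this handler can run, the process is alive.
    return 200

def ready(db_ok: bool, deps_ok: bool = True) -> int:
    # Readiness: gate traffic on external dependencies (database, downstream services).
    return 200 if (db_ok and deps_ok) else 503

print(live())              # 200 even while the database is down
print(ready(db_ok=False))  # 503: pod is removed from Service endpoints, not restarted
```

When the database recovers, readiness flips back to 200 and the pod rejoins the Service endpoints on its own, with no restart and no cold start.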

Takeaway

Conflating liveness and readiness causes cascading restarts under load. Liveness should be trivial ("is the process alive?"). Readiness checks external dependencies. Different endpoints, different thresholds, different failure actions.