Problem
We used the same /health endpoint for both liveness and readiness probes. Under load, the health check's database ping timed out, so both probes failed. Kubernetes treated the failed liveness probe as a dead process and restarted the pods, causing cascading restarts across the cluster.
Key Insight
Liveness and readiness serve fundamentally different purposes:
| Probe | Question it answers | Failure Action |
|---|---|---|
| Liveness | Is the process alive? | Restart the pod |
| Readiness | Can it serve traffic? | Stop routing traffic to it |
Separate endpoints with different thresholds:
livenessProbe:
  httpGet:
    path: /health/live   # Just: is the process running?
    port: 8000
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/ready  # Can it serve? (DB, deps OK)
    port: 8000
  failureThreshold: 1    # Stop traffic immediately

Takeaway
Conflating liveness and readiness causes cascading restarts under load. Liveness should be trivial ("is the process alive?"). Readiness checks external dependencies. Different endpoints, different thresholds, different failure actions.
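
As a rough illustration of the split, here is a minimal sketch of the two endpoints using only the Python standard library. It is not our service's actual code: `HealthHandler`, `check_database`, and `make_server` are hypothetical names, and `check_database` is a stub standing in for a real dependency ping with a short timeout.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def check_database() -> bool:
    """Hypothetical stub. A real check would ping the DB with a short, bounded timeout."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    # Assumption: the dependency check is swappable so it can be replaced in tests.
    dependency_check = staticmethod(check_database)

    def do_GET(self):
        if self.path == "/health/live":
            # Liveness: no I/O and no dependencies.
            # If this handler runs at all, the process is alive.
            self._reply(200, {"status": "alive"})
        elif self.path == "/health/ready":
            # Readiness: consult dependencies. A 503 fails the probe,
            # so Kubernetes stops routing traffic but does not restart the pod.
            ok = self.dependency_check()
            self._reply(200 if ok else 503, {"status": "ready" if ok else "not ready"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep probe traffic out of the request log.


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Build a server on the port used in the probe config above."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_server().serve_forever()` would start it on port 8000. The point of the design is that a slow or unreachable database can only fail `/health/ready`, which takes the pod out of rotation, while `/health/live` keeps returning 200 and so never triggers a restart.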