When deploying microservices on Kubernetes, how many replicas should I run? Intuition suggests that two replicas are enough for high availability: if one pod dies, the other keeps the service online. The industry standard, however, is to run three. Why is this the case?

In this post, I will explain the reasons to choose three replicas over two, along with some best practices grounded in how Kubernetes actually handles disruptions.

The Math Behind Preferring Three Replicas

The suggestion to run three replicas is often justified with a hand-wavy “because it is more resilient.” Rather than rely on intuition, let’s put some numbers behind the reasoning.

The Model

We model resiliency as the probability of having at least two healthy pods at any moment. This matches real-world operations: you want at least one spare replica of capacity so that transient failures, rollouts, and other events that take down a single pod still leave you with redundancy.

Assumptions:

  • Each pod is independently unavailable with probability \(p\), meaning that it is available with probability \(1-p\)
  • The target state is at least two healthy pods at all times, so the service can withstand routine maintenance or failure events
  • Costs scale approximately linearly with replica count

Formula:

Let \(n\) be the number of replicas and \(p\) the per-pod unavailability probability.

To find the probability that at least two pods are healthy, we sum the probabilities of the scenarios where exactly \(i\) pods are healthy, for every value of \(i\) from \(2\) up to \(n\). The number of healthy pods follows a binomial distribution, so each scenario is one term of its probability mass function:

$$ A_{\geq 2}(n,p) = \sum_{i=2}^{n} \binom{n}{i} (1-p)^i p^{n-i} $$

Breaking down the formula, you get:

  • \(\binom{n}{i}\) is the number of ways to choose which \(i\) pods are healthy
  • \((1 - p)^i\) is the probability that those \(i\) specific pods are healthy
  • \(p^{n-i}\) is the probability that the remaining \(n - i\) pods are down
  • We sum from \(i = 2\) to \(i = n\) to cover “at least two of the pods are healthy”
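
Equivalently, since “at least two healthy” is the complement of “zero healthy or exactly one healthy,” the sum collapses into a closed form that is easier to evaluate by hand:

$$ A_{\geq 2}(n,p) = 1 - p^n - n(1-p)\,p^{n-1} $$

For \(n = 3\) and \(p = 0.01\), this gives \(1 - (0.01)^3 - 3(0.99)(0.01)^2 = 0.999702\), matching the longhand calculation below.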

Let’s look at the two cases where \(n = 2\) and \(n = 3\) with different probabilities of pod failure.

p = 1% (99% per-pod availability)

With two replicas, “at least two healthy” means both must be healthy, so the probability reduces to the square of the per-pod availability:

$$ A_{\geq 2}(2, 0.01) = (1-p)^2 = (0.99)^2 = 0.9801 $$

With three replicas the formula has two terms. We add the probability that exactly two of the three replicas are healthy at a given moment to the probability that all three are healthy at that moment:

$$A_{\geq 2}(3,p) = \binom{3}{2}(1-p)^2 p + (1-p)^3 = 3(1-p)^2p + (1-p)^3$$

Substituting our values, we calculate the probability that exactly two are healthy

$$ \binom{3}{2}(0.99)^2(0.01) = 3 \times 0.9801 \times 0.01 = 0.029403 $$

and the probability that exactly three are healthy

$$ (0.99)^3 = 0.970299 $$

then sum them together to get the total availability

$$ A_{\geq 2}(3, 0.01) = 0.029403 + 0.970299 = 0.999702 $$

So, with two replicas and 1% per-pod unavailability, the probability of having at least two healthy pods at any given moment is \(0.98\), whereas with three replicas it rises to \(0.9997\).

We can convert these calculations into minutes per month (a 30-day month has \(30 \times 24 \times 60 = 43200\) minutes):

  • Two replicas: \((1 - 0.9801) \times 43200 \approx 859.7\) minutes below target
  • Three replicas: \((1 - 0.999702) \times 43200 \approx 12.9\) minutes below target

This is an improvement of about \(859.7 / 12.9 \approx 67\times\) for only 50% more cost.

If we repeat this experiment with different values for per-pod availability, we see even more relative improvement:

p = 0.5% (99.5% per-pod availability)

  • Two replicas: \(A_{\geq 2}(2, 0.005) = (0.995)^2 = 0.990025\), implying 430.9 minutes below target
  • Three replicas: \(A_{\geq 2}(3, 0.005) = 0.99992525\), implying 3.2 minutes below target

Improvement: \(133\times\) less time in a degraded state
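
The three-replica figure comes straight from the closed form above:

$$ A_{\geq 2}(3, 0.005) = 1 - (0.005)^3 - 3(0.995)(0.005)^2 = 0.99992525 $$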

p = 0.1% (99.9% per-pod availability)

  • Two replicas: \(A_{\geq 2}(2, 0.001) = 0.998001\) implying 86.4 minutes below target
  • Three replicas: \(A_{\geq 2}(3, 0.001) = 0.999997\) implying 0.13 minutes below target

Improvement: \(660\times\) less time in a degraded state

The following plot visualizes these numbers for varying values of \(p\). The vertical axis tracks resiliency (the probability of at least two pods running) at a moment in time; the horizontal axis tracks the probability of a single pod being unavailable at a moment in time.

You can see a dramatically different resiliency curve between two and three replicas, and a less pronounced difference between three and four replicas.

[Figure: Resilience vs Availability, plotting the probability of at least two healthy pods against per-pod unavailability for two, three, and four replicas]

Why “≥2 Healthy” Is the Right Target

This discussion does not describe the downtime or availability of your service. Rather, it describes resiliency as measured by the number of minutes per month you spend with fewer than two healthy pods, leaving you one failure away from a total outage.

Indeed, if the target is “≥1 healthy pod”, the math looks great:

  • With two replicas: \(A_{\geq 1}(2, 0.01) = 1 - (0.01)^2 = 0.9999\)
  • With three replicas: \(A_{\geq 1}(3, 0.01) = 1 - (0.01)^3 = 0.999999\)
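
In general, “at least one healthy” fails only when every pod is down at the same time, so \(A_{\geq 1}(n,p) = 1 - p^n\), which races toward 1 as \(n\) grows.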

Both look amazing! But this metric is misleading: it measures whether any pod at all is alive, not your ability to withstand the next fault. If your target steady state is a single pod, you are sitting at a capacity cliff edge with no safety margin. Production services should maintain redundancy at all times, which means targeting two or more healthy pods.

Of course, more replicas always improve availability — you could run four, six, or even ten if you wanted. But the return diminishes quickly. Going from one to two replicas gives you redundancy, but in a fragile way. Going from two to three gives you a massive jump in robustness for minimal cost. Adding more beyond three, however, tends to yield smaller resilience gains relative to cost.

At 1% per-pod unavailability:

  • Two replicas: 860 minutes degraded/month
  • Three replicas: 13 minutes degraded/month (66× better)
  • Four replicas: about 0.2 minutes degraded/month
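
The four-replica figure follows from the same closed form:

$$ A_{\geq 2}(4, 0.01) = 1 - (0.01)^4 - 4(0.99)(0.01)^3 \approx 0.999996 $$

which corresponds to \((1 - 0.999996) \times 43200 \approx 0.2\) minutes below target per month.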

The jump from two to three is dramatic, recovering roughly 850 minutes of redundancy per month; the jump from three to four recovers fewer than 13. This is why three has emerged as the industry default: it’s the sweet spot on the availability vs cost curve. You can always add more replicas for performance and scaling reasons.

The Kubernetes Confusion

At this point, you may be convinced that you should run your service with a steady state of at least two pods healthy and available at all times. Unfortunately, Kubernetes presents what might seem like a counter-intuitive model for controlling how many pods are available during regularly scheduled maintenance events.

In particular, minReplicas (a field on the HorizontalPodAutoscaler) only sets the lower bound on the number of replicas the autoscaler maintains during steady-state operation. During a rolling update it is the Deployment strategy’s maxUnavailable that governs how many pods may be down at once, and during voluntary disruptions such as pod evictions or node drains it is your PodDisruptionBudget, not minReplicas.

maxUnavailable is commonly set to one (the Deployment example later in this post does exactly that), which allows Kubernetes to take one pod offline before its replacement is ready. A node drain without a PodDisruptionBudget behaves similarly: the pod is evicted first and only rescheduled afterwards. With only two replicas, routine operations therefore regularly leave you running on a single pod for a period of time, and every one of them becomes a risky gamble. That is before counting unscheduled events like out-of-memory kills or segfaults that terminate a pod with no notice.

As a practical example, imagine a critical service running with minReplicas = 2 to save costs. During a routine upgrade, one pod is terminated, as expected given maxUnavailable = 1. Unfortunately, the replacement pod gets stuck in CrashLoopBackOff because of a misconfigured health check timeout, something that had always passed in staging but fails under production load. Until the health check issue is resolved, a single pod is serving all production traffic. In this case, increasing minReplicas to 3 is a negligible cost increase that buys operational peace of mind.

Putting It Into Practice

If you adopt a minimum of three replicas as your baseline, here are some Kubernetes best practices that ensure you keep at least two healthy replicas the vast majority of the time:

Configure horizontal scaling with minReplicas: 3

This property sets a lower limit on the number of pod replicas maintained during steady-state operation. You may still dip below this threshold during regular maintenance events, or during faults that require bringing up a replacement pod.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
spec:
  scaleTargetRef:          # the workload this autoscaler manages (name assumed)
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 3
  maxReplicas: 10

Pod Disruption Budget

A Kubernetes Pod Disruption Budget (PDB) is a resource that lets you specify the minimum number or percentage of Pods that must remain available during a voluntary disruption. Voluntary disruptions are planned events like node maintenance, cluster upgrades, or scaling down deployments, which might require evicting Pods. With minReplicas set to 3, setting minAvailable to 2 ensures you keep redundancy during these events.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-service
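
The same budget can also be expressed as a cap on disrupted pods instead of a floor on available ones. With three replicas, a maxUnavailable of 1 is equivalent to minAvailable: 2 (a PDB takes one or the other, not both):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  maxUnavailable: 1   # with three replicas, this keeps at least two available
  selector:
    matchLabels:
      app: my-service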

Rolling Update Strategy

During a deployment, Kubernetes will take pods offline up to the setting of maxUnavailable (for Deployments the default is 25%, rounded down to a whole pod; it is commonly pinned to an absolute value of 1, as below). A maxUnavailable of one is safe when running three pods, but unsafe for two. By also setting maxSurge, you allow Kubernetes to create additional pods during a deployment, so replacements come up before old pods go away.

This fits naturally with three replicas:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # one extra pod may be created during the rollout
      maxUnavailable: 1    # at most one pod may be taken down at a time
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:1.0   # placeholder image

If you are running more than three replicas, a good setting for both fields is 25%. For example, consider a Deployment with replicas: 8, maxSurge: "25%", and maxUnavailable: "25%". During a rolling update, maxSurge (25% of 8 = 2) allows up to 10 pods to run at any given time, while maxUnavailable (25% of 8 = 2) requires that at least 6 pods remain ready. The result is a controlled, gradual rollout: new pods are brought online and become healthy before older pods are scaled down, and the application never drops below 6 ready replicas, as sketched below.
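
As a sketch, that percentage-based strategy for the hypothetical 8-replica Deployment looks like this:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: "25%"         # 25% of 8 = up to 2 extra pods during the rollout
    maxUnavailable: "25%"   # 25% of 8 = at most 2 pods down at a time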

Optional: Topology Spread Constraints

Use topology spread constraints (set on the pod template’s spec) to distribute pods across availability zones, so that a single zone failure cannot take out all of your replicas at once:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: my-service

Together, these practices ensure you keep at least two healthy replicas within your Kubernetes cluster, even during routine maintenance and rollouts.

When You Might Use Fewer

For production services, three or more pods are desirable. However, there are legitimate exceptions:

  • Development and staging environments where availability isn’t critical and cost matters more. Two replicas (or even one) can be acceptable for non-production workloads.
  • Extremely low-traffic services where even a single pod is over-provisioned. If your service handles 10 requests per day, three replicas might be overkill. Consider whether the service should exist at all, or whether it should be merged with another service.
  • Batch jobs and async workers where you’re processing queues rather than serving live traffic. These workloads have different availability models — losing a worker temporarily just means tasks take longer to process.