Inventory Theory for Compute Capacity: How Much Buffer Should You Hold?


Your cloud capacity is inventory.

Reserved instances are safety stock. Spot instances are just-in-time procurement. Auto-scaling is your reorder system. And like any inventory problem, you’re balancing two risks: too much (waste) and too little (stockouts).

Operations research solved these problems decades ago. Let’s apply inventory theory to cloud capacity planning.

Inventory Concept      Cloud Equivalent
Safety stock           Reserved capacity headroom
Cycle stock            Baseline committed instances
Pipeline inventory     Instances being provisioned
Seasonal stock         Pre-scaled capacity for known peaks
Stockout               Throttling, 503s, outages
Holding cost           Paying for idle resources
Ordering cost          Provisioning overhead, cold starts

Once you see cloud through this lens, classic inventory models become directly applicable.

Every inventory problem balances two costs: the stockout cost of too little capacity and the holding cost of too much.

Stockout cost (Capacity < Demand):
  - Requests throttled or dropped
  - Latency spikes (degraded experience)
  - Revenue lost (checkout failures, abandoned sessions)
  - SLA breaches (penalties, credits)
  - Reputation damage (customers remember outages)

Quantifying stockout cost:

Stockout cost per hour:
  Revenue at risk:           $50,000/hour
  Probability of stockout:   5% (given current buffer)
  Expected stockout cost:    $2,500/hour of exposure

Holding cost (Capacity > Demand):
  - Paying for idle resources
  - Capital tied up (opportunity cost)
  - Committed to wrong instance types
  - Harder to migrate (locked into reservations)

Quantifying holding cost:

Excess capacity cost:
  Reserved instances unused:  20% of fleet
  Monthly reserved spend:     $100,000
  Waste:                      $20,000/month

Total Cost = Holding Cost + Stockout Cost

Minimize total cost by finding the optimal inventory level.

This is the core of inventory theory—and it applies directly to capacity planning.
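
To see the trade-off concretely, here's a minimal sketch that scans capacity levels for the cost minimum, assuming normally distributed hourly demand; the holding and stockout dollar figures are placeholders, not recommendations:

```python
from statistics import NormalDist

# Assumptions for illustration: demand in req/sec is roughly Normal(9000, 1000);
# holding and stockout costs are placeholder values.
demand = NormalDist(mu=9_000, sigma=1_000)
HOLDING = 0.01     # $ per req/sec of provisioned capacity per hour (assumed)
STOCKOUT = 2_500   # expected $ lost per hour in which demand exceeds capacity

def expected_hourly_cost(capacity: int) -> float:
    """Holding cost of what you provision plus expected stockout cost."""
    p_stockout = 1 - demand.cdf(capacity)
    return capacity * HOLDING + p_stockout * STOCKOUT

# Scan capacity levels; the minimum is the economically optimal point.
best = min(range(8_000, 14_001, 100), key=expected_hourly_cost)
print(best, round(expected_hourly_cost(best), 2))  # optimum ~12,000 here
```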

Safety stock protects against demand variability. In cloud terms: how much headroom above expected peak?

Safety Stock = z × σ × √L

Where:
  z = service level factor (e.g., 1.65 for 95%, 2.33 for 99%)
  σ = standard deviation of demand
  L = lead time to replenish

Translated to cloud capacity:

Capacity Headroom = z × σ_demand × √(scale_up_time)

Example:
  Target availability:        99.9% (z = 3.09)
  Demand std dev:             1,000 requests/sec
  Time to scale up:           5 minutes = 0.083 hours
  
  Headroom = 3.09 × 1,000 × √0.083
           = 3.09 × 1,000 × 0.29
           = 896 requests/sec of buffer capacity

If your current capacity handles 10,000 req/sec and peak demand averages 9,000 req/sec with σ=1,000, you need ~900 req/sec headroom to hit 99.9% availability.
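
The same arithmetic as a sketch, using Python's standard library to turn the availability target into z:

```python
from math import sqrt
from statistics import NormalDist

def headroom(service_level: float, sigma: float, lead_time_hours: float) -> float:
    """Safety-stock headroom: z * sigma * sqrt(lead time)."""
    z = NormalDist().inv_cdf(service_level)  # 0.999 -> ~3.09
    return z * sigma * sqrt(lead_time_hours)

# ~890 req/sec with the example's numbers (the text rounds to ~900)
print(round(headroom(0.999, sigma=1_000, lead_time_hours=5 / 60)))
```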

Headroom depends on:

  1. Variability (σ): More variable demand → more headroom needed
  2. Lead time: Slower scaling → more headroom needed
  3. Service level: Higher availability target → more headroom needed

That gives you three levers:

Reduce variability:     Smooth traffic (rate limiting, queuing)
Reduce lead time:       Faster auto-scaling, warm pools
Accept lower service:   Maybe 99.5% is enough?

Each approach reduces required headroom differently.

The Economic Order Quantity (EOQ) model answers a classic question: what order size balances ordering costs against holding costs?

EOQ = √(2DS/H)

Where:
  D = annual demand
  S = ordering/setup cost per order
  H = holding cost per unit per year

For cloud, the question becomes: how much capacity should you commit to reserved instances, and how much should you keep flexible?

Commitment size = √(2 × Annual Compute Demand × Commitment Overhead / Flexibility Premium)

Where:
  Annual Compute Demand:    Total compute-hours needed
  Commitment Overhead:      Cost of managing reservations, forecasting, etc.
  Flexibility Premium:      On-demand price - Reserved price (what you pay for flexibility)

More useful framing for cloud:

Reserved vs On-Demand Decision:

Reserved instance cost:     $0.40/hour (1-year commit)
On-demand cost:             $1.00/hour
Break-even utilization:     40%

If utilization > 40%:       Reserve
If utilization < 40%:       On-demand

But this ignores uncertainty. What if demand drops?

Expected value calculation:

Scenario A (80% prob): Demand stays high
  Reserved cost: $0.40 × 8760 hours = $3,504
  On-demand cost: $1.00 × 8760 = $8,760
  Savings: $5,256

Scenario B (20% prob): Demand drops 70% (below the 40% break-even)
  Reserved cost: $3,504 (still committed)
  On-demand cost: $1.00 × 2,628 = $2,628
  Loss: $876 (the commitment costs more than the usage it covered)

Expected value of reserving:
  0.8 × $5,256 + 0.2 × (-$876) = $4,030 expected savings

Reserve if expected savings > 0
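
Here's that expected-value check as a sketch, with the example's rates and scenario weights baked in as assumptions:

```python
HOURS_PER_YEAR = 8_760
RESERVED_RATE = 0.40   # $/hour, 1-year commitment (from the example)
ON_DEMAND_RATE = 1.00  # $/hour

def savings_if_reserved(hours_needed: float) -> float:
    """What reserving saves versus buying the same usage on-demand."""
    reserved_cost = RESERVED_RATE * HOURS_PER_YEAR  # paid whether used or not
    return ON_DEMAND_RATE * hours_needed - reserved_cost

scenarios = [                      # (probability, compute-hours actually needed)
    (0.8, HOURS_PER_YEAR),         # demand stays high
    (0.2, 0.3 * HOURS_PER_YEAR),   # demand drops 70%
]
expected = sum(p * savings_if_reserved(h) for p, h in scenarios)
print(f"${expected:,.0f}")  # ~$4,030 expected savings -> reserve
```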

Optimal reservation depends on:

  1. Demand certainty: More certain → reserve more
  2. Discount depth: Bigger discount → reserve more
  3. Commitment length: Longer commitment → need more certainty

Rules of thumb:

High certainty + deep discount:    Reserve aggressively (70-80% of base)
Moderate certainty:                Reserve conservatively (50-60%)
High uncertainty:                  Minimize commitments, stay flexible

The newsvendor problem: how much to stock when demand is uncertain and leftovers have salvage value?

Classic example: newspaper vendor deciding how many papers to buy. Too few = missed sales. Too many = unsold papers.

Optimal quantity where:
  P(Demand ≤ Q*) = (p - c) / (p - s)

Where:
  p = selling price (revenue per unit)
  c = cost per unit
  s = salvage value (what you get for excess)

Spot instances are like newsvendor inventory:

  • You acquire them speculatively
  • If demand materializes, they generate value
  • If not, the spend is sunk (salvage ≈ 0, though you can terminate and stop paying)

Spot buffer decision:

Value if used (p):          $1.00/hour of revenue protected
Cost of spot (c):           $0.30/hour
Salvage if unused (s):      $0.00 (can terminate, pay nothing more)

Critical ratio = (p - c) / (p - s)
               = ($1.00 - $0.30) / ($1.00 - $0.00)
               = 0.70

Stock spot capacity at the 70th percentile of the demand distribution

This means: if protecting $1 of revenue costs $0.30 in spot capacity, you should provision enough spot to cover 70% of the demand distribution—not 95% or 99%.

Why? Because the marginal cost of protection ($0.30) exceeds the marginal benefit once you’re past the 70th percentile.

Demand distribution:
  50th percentile:    8,000 req/sec
  70th percentile:    9,500 req/sec
  90th percentile:    12,000 req/sec
  99th percentile:    15,000 req/sec

Optimal spot buffer: Cover up to 9,500 req/sec
Above that:          Accept some throttling (it's not economical to buffer)
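
A sketch of the whole decision; the normal fit for demand (mu = 8,000, sigma = 2,900) is an assumption chosen to roughly match the percentiles above:

```python
from statistics import NormalDist

def critical_ratio(p: float, c: float, s: float) -> float:
    """Newsvendor critical ratio: stock up to this demand quantile."""
    return (p - c) / (p - s)

ratio = critical_ratio(p=1.00, c=0.30, s=0.00)  # 0.70

# Demand fit is illustrative; in practice use your empirical percentiles.
demand = NormalDist(mu=8_000, sigma=2_900)
print(f"Provision spot up to ~{demand.inv_cdf(ratio):,.0f} req/sec")  # ~9,500
```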

This is counterintuitive. We’re trained to think “always provision for peak.” But the math says: provision for the economically optimal point, which is often well below peak.

When should you trigger scaling? Too early wastes money. Too late causes stockouts.

Reorder Point = Expected demand during lead time + Safety stock
ROP = d × L + z × σ × √L

Where:
  d = average demand rate
  L = lead time
  z = service level factor
  σ = demand standard deviation

Scale-up trigger point:

Average demand:              8,000 req/sec
Current capacity:            10,000 req/sec
Time to scale up:            3 minutes
Demand variability (σ):      500 req/sec
Target service level:        99% (z = 2.33)

Demand during scale-up = 8,000 req/sec × 180 sec = 1.44M requests
Safety buffer = 2.33 × 500 × √(3/60) ≈ 260 req/sec

Trigger scale-up when:
  Current utilization approaches (Capacity - Safety buffer) / Capacity
  = (10,000 - 260) / 10,000
  = 97.4%

But that's too late! The buffer covers variability, not the demand growth that can happen during the 3-minute lead time.

Better: Trigger at 80% utilization to give scaling time to complete.
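
One way to fold lead time into the trigger, as a sketch; the demand growth rate is an assumed input you'd estimate from recent traffic:

```python
from math import sqrt
from statistics import NormalDist

def trigger_utilization(capacity: float, growth_per_min: float,
                        lead_time_min: float, sigma: float,
                        service_level: float) -> float:
    """Utilization at which to start scaling so capacity lands in time.

    Accounts for expected demand growth over the lead time (the d x L term)
    plus the variability buffer (z * sigma * sqrt(L)).
    """
    z = NormalDist().inv_cdf(service_level)
    growth_during_lead = growth_per_min * lead_time_min
    buffer = z * sigma * sqrt(lead_time_min / 60)
    return (capacity - growth_during_lead - buffer) / capacity

print(trigger_utilization(10_000, 0, 3, 500, 0.99))    # flat demand: ~0.974
print(trigger_utilization(10_000, 600, 3, 500, 0.99))  # rising demand: ~0.79
```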

Sophisticated systems adjust triggers based on:

Time of day:
  Peak hours:       Trigger at 70% utilization (more buffer)
  Off-peak:         Trigger at 85% utilization (less buffer needed)

Demand trend:
  Demand rising:    Trigger earlier
  Demand falling:   Trigger later (avoid over-provisioning)

Recent variability:
  High variance:    Trigger earlier
  Stable:           Trigger later
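
These adjustments are easy to encode. A sketch with made-up step sizes (tune them against your own traffic):

```python
def dynamic_trigger(base: float, is_peak: bool, demand_rising: bool,
                    variance_ratio: float) -> float:
    """Context-aware scale-up trigger (step sizes are illustrative)."""
    trigger = base
    if is_peak:
        trigger -= 0.10           # peak hours: more buffer
    else:
        trigger += 0.05           # off-peak: less buffer needed
    if demand_rising:
        trigger -= 0.05           # rising demand: trigger earlier
    if variance_ratio > 1.5:      # recent sigma / long-run sigma
        trigger -= 0.05           # unusually noisy: trigger earlier
    return max(0.50, min(0.95, trigger))

print(round(dynamic_trigger(0.80, is_peak=True, demand_rising=True,
                            variance_ratio=1.8), 2))  # 0.6
```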

How do you choose the right service level? Higher isn’t always better.

Service Level    Capacity Needed    Cost        Stockout Risk
90%              100 units          $100K       10%
95%              115 units          $115K       5%
99%              140 units          $140K       1%
99.9%            175 units          $175K       0.1%
99.99%           220 units          $220K       0.01%

Each “9” costs more. Is it worth it?

Optimal service level where:
  Marginal cost of capacity = Marginal reduction in stockout cost

Going from 99% to 99.9%:
  Additional capacity cost:       $35K/year
  Stockout probability reduction: 0.9%
  Annual stockout days avoided:   0.9% × 365 = 3.3 days
  Cost per stockout day:          $50K
  Stockout cost avoided:          3.3 × $50K = $165K

  ROI: $165K / $35K = 4.7x → Worth it

Going from 99.9% to 99.99%:
  Additional capacity cost:       $45K/year
  Stockout probability reduction: 0.09%
  Annual stockout days avoided:   0.09% × 365 = 0.33 days
  Cost per stockout day:          $50K
  Stockout cost avoided:          0.33 × $50K = $16K

  ROI: $16K / $45K = 0.36x → Not worth it
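
The marginal-nine test fits in a few lines (numbers from the comparison above):

```python
def nine_roi(added_capacity_cost: float, prob_reduction: float,
             cost_per_stockout_day: float) -> float:
    """ROI of one more nine: stockout cost avoided / capacity cost added."""
    days_avoided = prob_reduction * 365
    return days_avoided * cost_per_stockout_day / added_capacity_cost

print(round(nine_roi(35_000, 0.009, 50_000), 1))   # 4.7  -> buy the nine
print(round(nine_roi(45_000, 0.0009, 50_000), 2))  # 0.36 -> not worth it
```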

There’s an economically optimal service level. It’s often lower than engineers instinctively want.

The 99.99% myth:

Engineers:  "We need five nines!"
Finance:    "What does that cost?"
Engineers:  "Whatever it takes."
Math:       "The marginal value of the 5th nine is $3K. The marginal cost is $200K."

Pick your service level based on stockout cost, not engineering pride.

To apply these models, start by collecting demand data:
  - Average demand by hour/day/week
  - Standard deviation of demand
  - Peak demand events (frequency, magnitude)
  - Trend (growing, stable, declining)

Stockout costs:
  - Revenue per request
  - SLA penalty per hour of degradation
  - Customer churn from poor experience
  - Reputation/brand impact

Holding costs:
  - Reserved instance rates
  - On-demand rates
  - Opportunity cost of capital

Then size each layer with the matching model:

Base capacity:        Reserve instances covering ~60% of average demand
Headroom:             Safety stock formula for variability
Spot buffer:          Newsvendor model for peak coverage
Trigger points:       Reorder point model for auto-scaling
Service level:        Marginal cost = marginal benefit analysis

Example capacity portfolio:
  Reserved (1-year):   50% of base (predictable, cheap)
  Reserved (flexible): 20% of base (some flexibility)
  On-demand:           20% of base (full flexibility)
  Spot:                10% buffer (opportunistic)
  Auto-scale headroom: Covers demand spikes up to service level target
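
As a quick sanity check, here's the blended cost of a portfolio like that; the flexible-reserved and spot rates are assumptions:

```python
# (share of base capacity, assumed $/hour per unit)
portfolio = {
    "reserved_1yr":      (0.50, 0.40),   # rate from the earlier example
    "reserved_flexible": (0.20, 0.55),   # assumed
    "on_demand":         (0.20, 1.00),   # rate from the earlier example
    "spot":              (0.10, 0.30),   # assumed
}
blended = sum(share * rate for share, rate in portfolio.values())
print(f"Blended rate: ${blended:.2f}/hr vs $1.00/hr all on-demand")  # $0.54/hr
```
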
100% availability requires infinite capacity.

Infinite capacity costs infinite money.

Pick a service level and optimize for it.
"We have auto-scaling, we're fine."

But auto-scaling takes 3-5 minutes.
In 3 minutes at 10K req/sec, that's 1.8M requests.
If you're at capacity when scaling triggers, those requests fail.

Buffer for lead time.

Second, static thresholds:

Scaling at 80% utilization always:
  - Wasteful at 3am (demand is low, 80% is fine)
  - Dangerous before Black Friday (should scale earlier)

Dynamic thresholds based on context.

Third, treating all capacity as one pool:

Not all capacity is fungible:
  - GPU instances vs CPU
  - Memory-optimized vs compute-optimized
  - Regional capacity constraints

Model each capacity type separately.
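
Running the same math per pool is straightforward. A self-contained sketch with assumed per-pool parameters:

```python
from math import sqrt
from statistics import NormalDist

def headroom(slo: float, sigma: float, lead_hours: float) -> float:
    """Safety-stock headroom per pool: z * sigma * sqrt(lead time)."""
    return NormalDist().inv_cdf(slo) * sigma * sqrt(lead_hours)

# Pool parameters (sigma, lead time, SLO) are illustrative assumptions.
pools = {
    "cpu_general":      (1_000, 5 / 60, 0.999),   # fast to scale
    "gpu_inference":    (300, 20 / 60, 0.99),     # slow to provision
    "memory_optimized": (500, 8 / 60, 0.995),
}
for name, (sigma, lead, slo) in pools.items():
    print(f"{name}: ~{headroom(slo, sigma, lead):,.0f} req/sec headroom")
```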

Cloud capacity is inventory. Inventory theory applies:

Problem                 Model                  Key Insight
How much headroom?      Safety stock           Depends on variability, lead time, service level
How much to reserve?    EOQ / expected value   Balance discount vs flexibility
How much spot buffer?   Newsvendor             Optimal is often below peak; covering everything isn't economical
When to scale?          Reorder point          Trigger early enough for lead time
What service level?     Marginal analysis      Each “9” has diminishing returns

The capacity planning mindset:

Old thinking:
  "Provision for peak. Add buffer. Don't run out."

Inventory thinking:
  "Balance holding costs vs stockout costs. 
   Find the economically optimal point.
   Accept that some stockout risk is rational."

You wouldn’t run a warehouse by saying “stock infinite inventory so we never run out.” Don’t run cloud capacity that way either.

Find your optimal inventory level. It’s probably lower than you think—and that’s okay.