Inventory Theory for Compute Capacity: How Much Buffer Should You Hold?


Your cloud capacity is inventory.

Reserved instances are safety stock. Spot instances are just-in-time procurement. Auto-scaling is your reorder system. And like any inventory problem, you’re balancing two risks: too much (waste) and too little (stockouts).

Operations research solved these problems decades ago. Let’s apply inventory theory to cloud capacity planning.

Inventory Concept      Cloud Equivalent
Safety stock           Reserved capacity headroom
Cycle stock            Baseline committed instances
Pipeline inventory     Instances being provisioned
Seasonal stock         Pre-scaled capacity for known peaks
Stockout               Throttling, 503s, outages
Holding cost           Paying for idle resources
Ordering cost          Provisioning overhead, cold starts

Once you see cloud through this lens, classic inventory models become directly applicable.

Every inventory problem balances two costs: the stockout cost of too little capacity and the holding cost of too much.

Stockout cost (Capacity < Demand):
  - Requests throttled or dropped
  - Latency spikes (degraded experience)
  - Revenue lost (checkout failures, abandoned sessions)
  - SLA breaches (penalties, credits)
  - Reputation damage (customers remember outages)

Quantifying stockout cost:

Stockout cost per hour:
  Revenue at risk:           $50,000/hour
  Probability of stockout:   5% (given current buffer)
  Expected stockout cost:    $2,500/hour of exposure

Holding cost (Capacity > Demand):
  - Paying for idle resources
  - Capital tied up (opportunity cost)
  - Committed to wrong instance types
  - Harder to migrate (locked into reservations)

Quantifying holding cost:

Excess capacity cost:
  Reserved instances unused:  20% of fleet
  Monthly reserved spend:     $100,000
  Waste:                      $20,000/month

Total Cost = Holding Cost + Stockout Cost

Minimize total cost by finding the optimal inventory level.

This is the core of inventory theory—and it applies directly to capacity planning.
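
To see the trade-off concretely, here's a minimal sketch that scans capacity levels for the cost minimum, assuming normally distributed hourly demand; the holding and stockout dollar figures are placeholders, not recommendations:

```python
from statistics import NormalDist

# Assumptions for illustration: demand in req/sec is roughly Normal(9000, 1000);
# holding and stockout costs are placeholder values.
demand = NormalDist(mu=9_000, sigma=1_000)
HOLDING = 0.01     # $ per req/sec of provisioned capacity per hour (assumed)
STOCKOUT = 2_500   # expected $ lost per hour in which demand exceeds capacity

def expected_hourly_cost(capacity: int) -> float:
    """Holding cost of what you provision plus expected stockout cost."""
    p_stockout = 1 - demand.cdf(capacity)
    return capacity * HOLDING + p_stockout * STOCKOUT

# Scan capacity levels; the minimum is the economically optimal point.
best = min(range(8_000, 14_001, 100), key=expected_hourly_cost)
print(best, round(expected_hourly_cost(best), 2))  # optimum ~12,000 here
```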

Safety stock protects against demand variability. In cloud terms: how much headroom above expected peak?

Safety Stock = z × σ × √L

Where:
  z = service level factor (e.g., 1.65 for 95%, 2.33 for 99%)
  σ = standard deviation of demand
  L = lead time to replenish

Translated to cloud capacity:

Capacity Headroom = z × σ_demand × √(scale_up_time)

Example:
  Target availability:        99.9% (z = 3.09)
  Demand std dev:             1,000 requests/sec
  Time to scale up:           5 minutes = 0.083 hours
  
  Headroom = 3.09 × 1,000 × √0.083
           = 3.09 × 1,000 × 0.29
           = 896 requests/sec of buffer capacity

If your current capacity handles 10,000 req/sec and peak demand averages 9,000 req/sec with σ=1,000, you need ~900 req/sec headroom to hit 99.9% availability.
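
The same arithmetic as a sketch, using Python's standard library to turn the availability target into z:

```python
from math import sqrt
from statistics import NormalDist

def headroom(service_level: float, sigma: float, lead_time_hours: float) -> float:
    """Safety-stock headroom: z * sigma * sqrt(lead time)."""
    z = NormalDist().inv_cdf(service_level)  # 0.999 -> ~3.09
    return z * sigma * sqrt(lead_time_hours)

# ~890 req/sec with the example's numbers (the text rounds to ~900)
print(round(headroom(0.999, sigma=1_000, lead_time_hours=5 / 60)))
```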

Headroom depends on:

  1. Variability (σ): More variable demand → more headroom needed
  2. Lead time: Slower scaling → more headroom needed
  3. Service level: Higher availability target → more headroom needed

That gives you three levers:

Reduce variability:     Smooth traffic (rate limiting, queuing)
Reduce lead time:       Faster auto-scaling, warm pools
Accept lower service:   Maybe 99.5% is enough?

Each approach reduces required headroom differently.

The Economic Order Quantity (EOQ) model answers a classic question: what order size balances ordering costs against holding costs?

EOQ = √(2DS/H)

Where:
  D = annual demand
  S = ordering/setup cost per order
  H = holding cost per unit per year

For cloud, the question becomes: how much capacity should you commit to reserved instances, and how much should you keep flexible?

Commitment size = √(2 × Annual Compute Demand × Commitment Overhead / Flexibility Premium)

Where:
  Annual Compute Demand:    Total compute-hours needed
  Commitment Overhead:      Cost of managing reservations, forecasting, etc.
  Flexibility Premium:      On-demand price - Reserved price (what you pay for flexibility)

More useful framing for cloud:

Reserved vs On-Demand Decision:

Reserved instance cost:     $0.40/hour (1-year commit)
On-demand cost:             $1.00/hour
Break-even utilization:     40%

If utilization > 40%:       Reserve
If utilization < 40%:       On-demand

But this ignores uncertainty. What if demand drops?

Expected value calculation:

Scenario A (80% prob): Demand stays high
  Reserved cost: $0.40 × 8760 hours = $3,504
  On-demand cost: $1.00 × 8760 = $8,760
  Savings: $5,256

Scenario B (20% prob): Demand drops 70% (below the 40% break-even)
  Reserved cost: $3,504 (still committed)
  On-demand cost: $1.00 × 2,628 = $2,628
  Loss: $876 (the commitment costs more than the usage it covered)

Expected value of reserving:
  0.8 × $5,256 + 0.2 × (-$876) = $4,030 expected savings

Reserve if expected savings > 0
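
Here's that expected-value check as a sketch, with the example's rates and scenario weights baked in as assumptions:

```python
HOURS_PER_YEAR = 8_760
RESERVED_RATE = 0.40   # $/hour, 1-year commitment (from the example)
ON_DEMAND_RATE = 1.00  # $/hour

def savings_if_reserved(hours_needed: float) -> float:
    """What reserving saves versus buying the same usage on-demand."""
    reserved_cost = RESERVED_RATE * HOURS_PER_YEAR  # paid whether used or not
    return ON_DEMAND_RATE * hours_needed - reserved_cost

scenarios = [                      # (probability, compute-hours actually needed)
    (0.8, HOURS_PER_YEAR),         # demand stays high
    (0.2, 0.3 * HOURS_PER_YEAR),   # demand drops 70%
]
expected = sum(p * savings_if_reserved(h) for p, h in scenarios)
print(f"${expected:,.0f}")  # ~$4,030 expected savings -> reserve
```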

Optimal reservation depends on:

  1. Demand certainty: More certain → reserve more
  2. Discount depth: Bigger discount → reserve more
  3. Commitment length: Longer commitment → need more certainty

Rules of thumb:

High certainty + deep discount:    Reserve aggressively (70-80% of base)
Moderate certainty:                Reserve conservatively (50-60%)
High uncertainty:                  Minimize commitments, stay flexible

The newsvendor problem: how much to stock when demand is uncertain and leftovers have salvage value?

Classic example: newspaper vendor deciding how many papers to buy. Too few = missed sales. Too many = unsold papers.

Optimal quantity where:
  P(Demand ≤ Q*) = (p - c) / (p - s)

Where:
  p = selling price (revenue per unit)
  c = cost per unit
  s = salvage value (what you get for excess)

Spot instances are like newsvendor inventory:

  • You acquire them speculatively
  • If demand materializes, they generate value
  • If not, the spend is sunk (salvage ≈ 0, though you can terminate and stop paying)

Spot buffer decision:

Value if used (p):          $1.00/hour of revenue protected
Cost of spot (c):           $0.30/hour
Salvage if unused (s):      $0.00 (can terminate, pay nothing more)

Critical ratio = (p - c) / (p - s)
               = ($1.00 - $0.30) / ($1.00 - $0.00)
               = 0.70

Stock spot capacity at the 70th percentile of the demand distribution

This means: if protecting $1 of revenue costs $0.30 in spot capacity, you should provision enough spot to cover 70% of the demand distribution—not 95% or 99%.

Why? Because the marginal cost of protection ($0.30) exceeds the marginal benefit once you’re past the 70th percentile.

Demand distribution:
  50th percentile:    8,000 req/sec
  70th percentile:    9,500 req/sec
  90th percentile:    12,000 req/sec
  99th percentile:    15,000 req/sec

Optimal spot buffer: Cover up to 9,500 req/sec
Above that:          Accept some throttling (it's not economical to buffer)
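
A sketch of the whole decision; the normal fit for demand (mu = 8,000, sigma = 2,900) is an assumption chosen to roughly match the percentiles above:

```python
from statistics import NormalDist

def critical_ratio(p: float, c: float, s: float) -> float:
    """Newsvendor critical ratio: stock up to this demand quantile."""
    return (p - c) / (p - s)

ratio = critical_ratio(p=1.00, c=0.30, s=0.00)  # 0.70

# Demand fit is illustrative; in practice use your empirical percentiles.
demand = NormalDist(mu=8_000, sigma=2_900)
print(f"Provision spot up to ~{demand.inv_cdf(ratio):,.0f} req/sec")  # ~9,500
```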

This is counterintuitive. We’re trained to think “always provision for peak.” But the math says: provision for the economically optimal point, which is often well below peak.

When should you trigger scaling? Too early wastes money. Too late causes stockouts.

Reorder Point = Expected demand during lead time + Safety stock
ROP = d × L + z × σ × √L

Where:
  d = average demand rate
  L = lead time
  z = service level factor
  σ = demand standard deviation

Scale-up trigger point:

Average demand:              8,000 req/sec
Current capacity:            10,000 req/sec
Time to scale up:            3 minutes
Demand variability (σ):      500 req/sec
Target service level:        99% (z = 2.33)

Demand during scale-up = 8,000 req/sec × 180 sec = 1.44M requests
Safety buffer = 2.33 × 500 × √(3/60) ≈ 260 req/sec

Trigger scale-up when:
  Current utilization approaches (Capacity - Safety buffer) / Capacity
  = (10,000 - 260) / 10,000
  = 97.4%

But that's too late! The buffer covers variability, not the demand growth that can happen during the 3-minute lead time.

Better: Trigger at 80% utilization to give scaling time to complete.
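
One way to fold lead time into the trigger, as a sketch; the demand growth rate is an assumed input you'd estimate from recent traffic:

```python
from math import sqrt
from statistics import NormalDist

def trigger_utilization(capacity: float, growth_per_min: float,
                        lead_time_min: float, sigma: float,
                        service_level: float) -> float:
    """Utilization at which to start scaling so capacity lands in time.

    Accounts for expected demand growth over the lead time (the d x L term)
    plus the variability buffer (z * sigma * sqrt(L)).
    """
    z = NormalDist().inv_cdf(service_level)
    growth_during_lead = growth_per_min * lead_time_min
    buffer = z * sigma * sqrt(lead_time_min / 60)
    return (capacity - growth_during_lead - buffer) / capacity

print(trigger_utilization(10_000, 0, 3, 500, 0.99))    # flat demand: ~0.974
print(trigger_utilization(10_000, 600, 3, 500, 0.99))  # rising demand: ~0.79
```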

Sophisticated systems adjust triggers based on:

Time of day:
  Peak hours:       Trigger at 70% utilization (more buffer)
  Off-peak:         Trigger at 85% utilization (less buffer needed)

Demand trend:
  Demand rising:    Trigger earlier
  Demand falling:   Trigger later (avoid over-provisioning)

Recent variability:
  High variance:    Trigger earlier
  Stable:           Trigger later
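
These adjustments are easy to encode. A sketch with made-up step sizes (tune them against your own traffic):

```python
def dynamic_trigger(base: float, is_peak: bool, demand_rising: bool,
                    variance_ratio: float) -> float:
    """Context-aware scale-up trigger (step sizes are illustrative)."""
    trigger = base
    if is_peak:
        trigger -= 0.10           # peak hours: more buffer
    else:
        trigger += 0.05           # off-peak: less buffer needed
    if demand_rising:
        trigger -= 0.05           # rising demand: trigger earlier
    if variance_ratio > 1.5:      # recent sigma / long-run sigma
        trigger -= 0.05           # unusually noisy: trigger earlier
    return max(0.50, min(0.95, trigger))

print(round(dynamic_trigger(0.80, is_peak=True, demand_rising=True,
                            variance_ratio=1.8), 2))  # 0.6
```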

How do you choose the right service level? Higher isn’t always better.

Service Level    Capacity Needed    Cost        Stockout Risk
90%              100 units          $100K       10%
95%              115 units          $115K       5%
99%              140 units          $140K       1%
99.9%            175 units          $175K       0.1%
99.99%           220 units          $220K       0.01%

Each “9” costs more. Is it worth it?

Optimal service level where:
  Marginal cost of capacity = Marginal reduction in stockout cost

Going from 99% to 99.9%:
  Additional capacity cost:       $35K/year
  Stockout probability reduction: 0.9%
  Annual stockout days avoided:   0.9% × 365 = 3.3 days
  Cost per stockout day:          $50K
  Stockout cost avoided:          3.3 × $50K = $165K

  ROI: $165K / $35K = 4.7x → Worth it

Going from 99.9% to 99.99%:
  Additional capacity cost:       $45K/year
  Stockout probability reduction: 0.09%
  Annual stockout days avoided:   0.09% × 365 = 0.33 days
  Cost per stockout day:          $50K
  Stockout cost avoided:          0.33 × $50K = $16K

  ROI: $16K / $45K = 0.36x → Not worth it
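
The marginal-nine test fits in a few lines (numbers from the comparison above):

```python
def nine_roi(added_capacity_cost: float, prob_reduction: float,
             cost_per_stockout_day: float) -> float:
    """ROI of one more nine: stockout cost avoided / capacity cost added."""
    days_avoided = prob_reduction * 365
    return days_avoided * cost_per_stockout_day / added_capacity_cost

print(round(nine_roi(35_000, 0.009, 50_000), 1))   # 4.7  -> buy the nine
print(round(nine_roi(45_000, 0.0009, 50_000), 2))  # 0.36 -> not worth it
```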

There’s an economically optimal service level. It’s often lower than engineers instinctively want.

The 99.99% myth:

Engineers:  "We need five nines!"
Finance:    "What does that cost?"
Engineers:  "Whatever it takes."
Math:       "The marginal value of the 5th nine is $3K. The marginal cost is $200K."

Pick your service level based on stockout cost, not engineering pride.

To apply these models, start by collecting demand data:
  - Average demand by hour/day/week
  - Standard deviation of demand
  - Peak demand events (frequency, magnitude)
  - Trend (growing, stable, declining)

Stockout costs:
  - Revenue per request
  - SLA penalty per hour of degradation
  - Customer churn from poor experience
  - Reputation/brand impact

Holding costs:
  - Reserved instance rates
  - On-demand rates
  - Opportunity cost of capital

Then size each layer with the matching model:

Base capacity:        Reserve instances covering ~60% of average demand
Headroom:             Safety stock formula for variability
Spot buffer:          Newsvendor model for peak coverage
Trigger points:       Reorder point model for auto-scaling
Service level:        Marginal cost = marginal benefit analysis

Example capacity portfolio:
  Reserved (1-year):   50% of base (predictable, cheap)
  Reserved (flexible): 20% of base (some flexibility)
  On-demand:           20% of base (full flexibility)
  Spot:                10% buffer (opportunistic)
  Auto-scale headroom: Covers demand spikes up to service level target
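
As a quick sanity check, here's the blended cost of a portfolio like that; the flexible-reserved and spot rates are assumptions:

```python
# (share of base capacity, assumed $/hour per unit)
portfolio = {
    "reserved_1yr":      (0.50, 0.40),   # rate from the earlier example
    "reserved_flexible": (0.20, 0.55),   # assumed
    "on_demand":         (0.20, 1.00),   # rate from the earlier example
    "spot":              (0.10, 0.30),   # assumed
}
blended = sum(share * rate for share, rate in portfolio.values())
print(f"Blended rate: ${blended:.2f}/hr vs $1.00/hr all on-demand")  # $0.54/hr
```
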
100% availability requires infinite capacity.

Infinite capacity costs infinite money.

Pick a service level and optimize for it.
"We have auto-scaling, we're fine."

But auto-scaling takes 3-5 minutes.
In 3 minutes at 10K req/sec, that's 1.8M requests.
If you're at capacity when scaling triggers, those requests fail.

Buffer for lead time.

Second, static thresholds:

Scaling at 80% utilization always:
  - Wasteful at 3am (demand is low, 80% is fine)
  - Dangerous before Black Friday (should scale earlier)

Dynamic thresholds based on context.

Third, treating all capacity as one pool:

Not all capacity is fungible:
  - GPU instances vs CPU
  - Memory-optimized vs compute-optimized
  - Regional capacity constraints

Model each capacity type separately.
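
Running the same math per pool is straightforward. A self-contained sketch with assumed per-pool parameters:

```python
from math import sqrt
from statistics import NormalDist

def headroom(slo: float, sigma: float, lead_hours: float) -> float:
    """Safety-stock headroom per pool: z * sigma * sqrt(lead time)."""
    return NormalDist().inv_cdf(slo) * sigma * sqrt(lead_hours)

# Pool parameters (sigma, lead time, SLO) are illustrative assumptions.
pools = {
    "cpu_general":      (1_000, 5 / 60, 0.999),   # fast to scale
    "gpu_inference":    (300, 20 / 60, 0.99),     # slow to provision
    "memory_optimized": (500, 8 / 60, 0.995),
}
for name, (sigma, lead, slo) in pools.items():
    print(f"{name}: ~{headroom(slo, sigma, lead):,.0f} req/sec headroom")
```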

Cloud capacity is inventory. Inventory theory applies:

Problem                 Model                  Key Insight
How much headroom?      Safety stock           Depends on variability, lead time, service level
How much to reserve?    EOQ / expected value   Balance discount vs flexibility
How much spot buffer?   Newsvendor             Optimal is often below peak; covering everything isn't economical
When to scale?          Reorder point          Trigger early enough for lead time
What service level?     Marginal analysis      Each “9” has diminishing returns

The capacity planning mindset:

Old thinking:
  "Provision for peak. Add buffer. Don't run out."

Inventory thinking:
  "Balance holding costs vs stockout costs. 
   Find the economically optimal point.
   Accept that some stockout risk is rational."

You wouldn’t run a warehouse by saying “stock infinite inventory so we never run out.” Don’t run cloud capacity that way either.

Find your optimal inventory level. It’s probably lower than you think—and that’s okay.