Your cloud capacity is inventory.
Reserved instances are safety stock. Spot instances are just-in-time procurement. Auto-scaling is your reorder system. And like any inventory problem, you’re balancing two risks: too much (waste) and too little (stockouts).
Operations research solved these problems decades ago. Let’s apply inventory theory to cloud capacity planning.
The Inventory Parallel
| Inventory Concept | Cloud Equivalent |
|---|---|
| Safety stock | Reserved capacity headroom |
| Cycle stock | Baseline committed instances |
| Pipeline inventory | Instances being provisioned |
| Seasonal stock | Pre-scaled capacity for known peaks |
| Stockout | Throttling, 503s, outages |
| Holding cost | Paying for idle resources |
| Ordering cost | Provisioning overhead, cold starts |
Once you see cloud capacity through this lens, classic inventory models apply directly.
The Fundamental Trade-off
Every inventory problem balances two costs:
Cost of Too Little (Understocking)
Capacity < Demand:
- Requests throttled or dropped
- Latency spikes (degraded experience)
- Revenue lost (checkout failures, abandoned sessions)
- SLA breaches (penalties, credits)
- Reputation damage (customers remember outages)
Quantifying stockout cost:
Stockout cost per hour:
Revenue at risk: $50,000/hour
Probability of stockout: 5% (given current buffer)
Expected stockout cost: $2,500/hour of exposure
Cost of Too Much (Overstocking)
Capacity > Demand:
- Paying for idle resources
- Capital tied up (opportunity cost)
- Committed to wrong instance types
- Harder to migrate (locked into reservations)
Quantifying holding cost:
Excess capacity cost:
Reserved instances unused: 20% of fleet
Monthly reserved spend: $100,000
Waste: $20,000/month
The Optimization
Total Cost = Holding Cost + Stockout Cost
Minimize Total Cost by finding optimal inventory level
This is the core of inventory theory—and it applies directly to capacity planning.
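A minimal sketch of this optimization, assuming normally distributed demand, a holding cost charged per unit of idle capacity, and a stockout cost charged per unit of unserved demand. The cost values and demand parameters are hypothetical. The sketch scans candidate capacity levels and keeps the one with the lowest expected total cost.

```python
from statistics import NormalDist

def expected_excess(capacity: float, demand: NormalDist) -> float:
    """E[max(D - capacity, 0)]: expected unserved demand when D is normal."""
    std_normal = NormalDist()
    z = (capacity - demand.mean) / demand.stdev
    # Standard normal loss function: pdf(z) - z * (1 - cdf(z))
    return demand.stdev * (std_normal.pdf(z) - z * (1 - std_normal.cdf(z)))

def expected_total_cost(capacity: float, demand: NormalDist,
                        holding_cost: float, stockout_cost: float) -> float:
    """Holding cost on idle capacity plus stockout cost on unserved demand."""
    unserved = expected_excess(capacity, demand)
    idle = capacity - demand.mean + unserved  # identity: E[max(capacity - D, 0)]
    return holding_cost * idle + stockout_cost * unserved

def cheapest_capacity(demand: NormalDist, holding_cost: float,
                      stockout_cost: float, step: int = 10) -> int:
    """Scan capacity from the mean up to mean + 4 sigma and keep the cheapest level."""
    start = int(demand.mean)
    stop = int(demand.mean + 4 * demand.stdev)
    return min(range(start, stop, step),
               key=lambda q: expected_total_cost(q, demand, holding_cost, stockout_cost))

demand = NormalDist(mu=9_000, sigma=1_000)  # hypothetical peak demand, req/sec
print(cheapest_capacity(demand, holding_cost=1.0, stockout_cost=9.0))
```

When the stockout cost is nine times the holding cost, the scan lands within one grid step of the 90th percentile of demand. That happens because the optimum satisfies P(D ≤ Q) = 9 / (9 + 1), which is the same critical-ratio logic Model 3 uses.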
Model 1: Safety Stock for Capacity Headroom
Safety stock protects against demand variability. In cloud terms: how much headroom above expected peak?
The Classic Formula
Safety Stock = z × σ × √L
Where:
z = service level factor (e.g., 1.65 for 95%, 2.33 for 99%)
σ = standard deviation of demand per period
L = lead time to replenish, in the same period units
Applied to Cloud
Capacity Headroom = z × σ_demand × √(scale_up_time)
Example:
Target availability: 99.9% (z = 3.09)
Demand std dev: 1,000 requests/sec
Time to scale up: 5 minutes = 0.083 hours
Headroom = 3.09 × 1,000 × √0.083
= 3.09 × 1,000 × 0.29
= 896 requests/sec of buffer capacity
If your current capacity handles 10,000 req/sec and peak demand averages 9,000 req/sec with σ = 1,000 req/sec, you need about 900 req/sec of headroom to hit 99.9% availability. Your existing 1,000 req/sec margin is therefore just enough.
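Here is the same arithmetic as a sketch. Instead of a lookup table, it derives z from the standard normal distribution using Python's `statistics.NormalDist`. It also keeps the worked example's convention of σ in req/sec and lead time in hours.

```python
from math import sqrt
from statistics import NormalDist

def service_level_z(service_level: float) -> float:
    """One-sided z-score for a target service level (0.999 -> about 3.09)."""
    return NormalDist().inv_cdf(service_level)

def capacity_headroom(service_level: float, sigma: float, lead_time: float) -> float:
    """Safety-stock headroom z * sigma * sqrt(L).

    As in the worked example, sigma is in req/sec and lead_time in hours.
    """
    return service_level_z(service_level) * sigma * sqrt(lead_time)

headroom = capacity_headroom(0.999, sigma=1_000, lead_time=5 / 60)
print(f"{headroom:.0f} req/sec of buffer capacity")
```

This prints about 892 req/sec. The 896 in the worked example comes from rounding √0.083 to 0.29.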
The Insight
Headroom depends on:
- Variability (σ): More variable demand → more headroom needed
- Lead time: Slower scaling → more headroom needed
- Service level: Higher availability target → more headroom needed
Reduce variability: Smooth traffic (rate limiting, queuing)
Reduce lead time: Faster auto-scaling, warm pools
Accept lower service: Maybe 99.5% is enough?
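Each approach shrinks the required headroom by a different amount, because headroom scales linearly with σ and z but only with the square root of lead time.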
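This sketch compares the three levers side by side. It recomputes headroom for the Model 1 example, changing one factor at a time. The specific alternatives (halving σ, halving lead time, relaxing to 99.5%) are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def capacity_headroom(service_level: float, sigma: float, lead_time: float) -> float:
    # Safety-stock headroom: z * sigma * sqrt(L)
    return NormalDist().inv_cdf(service_level) * sigma * sqrt(lead_time)

baseline = dict(service_level=0.999, sigma=1_000, lead_time=5 / 60)
levers = {
    "halve variability": dict(baseline, sigma=500),
    "halve lead time": dict(baseline, lead_time=2.5 / 60),
    "accept 99.5%": dict(baseline, service_level=0.995),
}

base = capacity_headroom(**baseline)
for name, params in levers.items():
    ratio = capacity_headroom(**params) / base
    print(f"{name}: {ratio:.0%} of baseline headroom")
```

The levers differ sharply in effect. Halving σ cuts the headroom in half. Halving lead time only trims it to about 71%, and relaxing the target to 99.5% leaves about 83%.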
Model 2: Economic Order Quantity for Reserved Instances
EOQ answers: what’s the optimal order size balancing ordering costs and holding costs?
The Classic Formula
EOQ = √(2DS/H)
Where:
D = annual demand
S = ordering/setup cost per order
H = holding cost per unit per year
Applied to Reserved Instances
The question: How much capacity should we commit to in reserved instances vs keeping flexible?
Commitment size = √(2 × Annual Compute Demand × Commitment Overhead / Flexibility Premium)
Where:
Annual Compute Demand: Total compute-hours needed
Commitment Overhead: Cost of managing reservations, forecasting, etc.
Flexibility Premium: On-demand price - Reserved price (what you pay for flexibility)
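The classic EOQ formula is a one-liner. The sketch below applies it to the commitment mapping above. All usage inputs are hypothetical placeholders, not benchmarks: 10 always-on instances' worth of compute-hours, a $500 overhead per commitment decision, and the $0.60/hour gap between the on-demand and reserved prices used in the next section.

```python
from math import sqrt

def economic_order_quantity(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Classic EOQ: sqrt(2 * D * S / H)."""
    if holding_cost <= 0:
        raise ValueError("holding_cost must be positive")
    return sqrt(2 * annual_demand * order_cost / holding_cost)

# Hypothetical inputs, mapped as in the text:
#   D = annual compute-hours, S = overhead per commitment, H = flexibility premium per hour
commitment_size = economic_order_quantity(
    annual_demand=10 * 8_760,   # 10 always-on instances
    order_cost=500.0,           # assumed cost of one reservation decision
    holding_cost=1.00 - 0.40,   # on-demand price minus reserved price
)
print(f"{commitment_size:,.0f} compute-hours per commitment")
```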
Practical Framing
A more useful framing for cloud:
Reserved vs On-Demand Decision:
Reserved instance cost: $0.40/hour (1-year commit)
On-demand cost: $1.00/hour
Break-even utilization: 40%
If utilization > 40%: Reserve
If utilization < 40%: On-demand
But this ignores uncertainty. What if demand drops?
Expected value calculation:
Scenario A (80% prob): Demand stays high
Reserved cost: $0.40 × 8760 hours = $3,504
On-demand cost: $1.00 × 8760 = $8,760
Savings: $5,256
Scenario B (20% prob): Demand drops 50%
Reserved cost: $3,504 (still committed)
On-demand cost: $1.00 × 4,380 hours = $4,380
Savings: $876 (50% utilization is still above the 40% break-even, so reserving wins; it would only lose below 40%)
Expected value of reserving:
0.8 × $5,256 + 0.2 × $876 = $4,380 expected savings
Reserve if expected savings > 0
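The expected-value calculation generalizes to any set of scenarios. This minimal sketch assumes a single instance and flat hourly rates. Utilization is the fraction of the year the instance would actually be needed. The function names are illustrative.

```python
HOURS_PER_YEAR = 8_760

def break_even_utilization(reserved_rate: float, on_demand_rate: float) -> float:
    """Utilization above which a reservation beats on-demand."""
    return reserved_rate / on_demand_rate

def reservation_savings(utilization: float, reserved_rate: float, on_demand_rate: float) -> float:
    """One-year savings from reserving one instance instead of running the same hours on demand."""
    reserved_cost = reserved_rate * HOURS_PER_YEAR                  # owed whether used or not
    on_demand_cost = on_demand_rate * HOURS_PER_YEAR * utilization  # paid only for hours used
    return on_demand_cost - reserved_cost

def expected_savings(scenarios, reserved_rate: float, on_demand_rate: float) -> float:
    """scenarios: (probability, utilization) pairs whose probabilities sum to 1."""
    return sum(p * reservation_savings(u, reserved_rate, on_demand_rate) for p, u in scenarios)

scenarios = [(0.8, 1.0), (0.2, 0.5)]  # demand stays high / demand drops 50%
print(break_even_utilization(0.40, 1.00))
print(round(expected_savings(scenarios, 0.40, 1.00)))
```

A scenario only turns into a loss below the 40% break-even. For example, replacing the second scenario with a 70% drop, `(0.2, 0.3)`, produces a $876 loss in that scenario and brings expected savings down to about $4,030.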
The Insight
Optimal reservation depends on:
- Demand certainty: More certain → reserve more
- Discount depth: Bigger discount → reserve more
- Commitment length: Longer commitment → need more certainty
High certainty + deep discount: Reserve aggressively (70-80% of base)
Moderate certainty: Reserve conservatively (50-60%)
High uncertainty: Minimize commitments, stay flexible
Model 3: Newsvendor for Spot Instance Buffers
The newsvendor problem: how much to stock when demand is uncertain and leftovers have salvage value?
Classic example: newspaper vendor deciding how many papers to buy. Too few = missed sales. Too many = unsold papers.
The Classic Formula
Optimal quantity where:
P(Demand ≤ Q*) = (p - c) / (p - s)
Where:
p = selling price (revenue per unit)
c = cost per unit
s = salvage value (what you get for excess)
Applied to Spot Buffer Capacity
Spot instances are like newsvendor inventory:
- You acquire them speculatively
- If demand materializes, they generate value
- If not, you’ve paid for capacity you didn’t use (salvage = 0, though you can release them at any time to stop paying)
Spot buffer decision:
Value if used (p): $1.00/hour of revenue protected
Cost of spot (c): $0.30/hour
Salvage if unused (s): $0.00 (can terminate, pay nothing more)
Critical ratio = (p - c) / (p - s)
= ($1.00 - $0.30) / ($1.00 - $0.00)
= 0.70
Stock spot capacity at the 70th percentile of demand distribution
The Insight
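This means: if protecting $1 of revenue costs $0.30 in spot capacity, you should provision enough spot to cover the 70th percentile of the demand distribution, not the 95th or 99th.
Why? Each additional unit of spot capacity is used only if demand exceeds it, so its expected benefit is $1.00 × P(demand > that level). Past the 70th percentile, that probability drops below 30%, and the expected benefit falls below the $0.30 the unit costs.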
Demand distribution:
50th percentile: 8,000 req/sec
70th percentile: 9,500 req/sec
90th percentile: 12,000 req/sec
99th percentile: 15,000 req/sec
Optimal spot buffer: Cover up to 9,500 req/sec
Above that: Accept some throttling (it's not economical to buffer)
This is counterintuitive. We’re trained to think “always provision for peak.” But the math says: provision for the economically optimal point, which is often well below peak.
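Here is a sketch of the newsvendor calculation using the percentiles above. Linear interpolation between the tabulated percentiles is an assumption. With real traffic, you would read the quantile from your own demand history.

```python
def critical_ratio(price: float, cost: float, salvage: float = 0.0) -> float:
    """Newsvendor critical ratio (p - c) / (p - s)."""
    return (price - cost) / (price - salvage)

def quantile_from_table(table, q: float) -> float:
    """Linearly interpolate a demand quantile from sorted (percentile, value) pairs."""
    for (p0, v0), (p1, v1) in zip(table, table[1:]):
        if p0 <= q <= p1:
            return v0 + (v1 - v0) * (q - p0) / (p1 - p0)
    raise ValueError(f"percentile {q} is outside the table")

# Demand percentiles from the example above (req/sec)
demand_percentiles = [(0.50, 8_000), (0.70, 9_500), (0.90, 12_000), (0.99, 15_000)]

ratio = critical_ratio(price=1.00, cost=0.30)
buffer_target = quantile_from_table(demand_percentiles, ratio)
print(f"critical ratio {ratio:.2f} -> cover up to {buffer_target:,.0f} req/sec")
```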
Model 4: Reorder Point for Auto-Scaling
When should you trigger scaling? Too early wastes money. Too late causes stockouts.
The Classic Formula
Reorder Point = Expected demand during lead time + Safety stock
ROP = d × L + z × σ × √L
Where:
d = average demand rate
L = lead time
z = service level factor
σ = demand standard deviation
Applied to Auto-Scaling Triggers
Scale-up trigger point:
Average demand: 8,000 req/sec
Current capacity: 10,000 req/sec
Time to scale up: 3 minutes
Demand variability (σ): 500 req/sec
Target service level: 99% (z = 2.33)
Demand during scale-up = 8,000 req/sec × 180 sec = 1.44M requests
Safety buffer = 2.33 × 500 × √(3/60) = 260 req/sec equivalent
Trigger scale-up when:
Current utilization approaches (Capacity - Safety buffer) / Capacity
= (10,000 - 260) / 10,000
= 97.4%
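But that's too late! The 97.4% trigger covers only random fluctuation (the z × σ × √L term) and ignores the d × L term. Demand keeps arriving, and often keeps rising, for the full 3 minutes before new capacity comes online.
Better: Trigger at a lower threshold, such as 80% utilization, so scaling has time to complete.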
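This minimal sketch of the trigger calculation uses the worked example's units, with lead time in hours inside the square root. The optional `growth_per_hour` parameter (req/sec gained per hour) is a hypothetical addition. It subtracts the demand increase expected while new capacity provisions, which is one way to see why a trigger near 80% is reasonable.

```python
from math import sqrt

def scale_up_trigger(capacity: float, sigma: float, lead_time_hours: float,
                     z: float, growth_per_hour: float = 0.0) -> float:
    """Utilization at which to start scaling.

    safety buffer = z * sigma * sqrt(L), as in the worked example.
    growth_per_hour (hypothetical) reserves room for demand growth
    while new capacity is still provisioning.
    """
    safety = z * sigma * sqrt(lead_time_hours)
    growth_during_lead_time = growth_per_hour * lead_time_hours
    return (capacity - safety - growth_during_lead_time) / capacity

lead_time = 3 / 60  # 3 minutes, in hours
print(f"{scale_up_trigger(10_000, 500, lead_time, z=2.33):.1%}")
# Demand rising 500 req/sec per minute (30,000 per hour), a hypothetical ramp
print(f"{scale_up_trigger(10_000, 500, lead_time, z=2.33, growth_per_hour=30_000):.1%}")
```

With no growth term, the trigger reproduces the 97.4% above. Adding a 500 req/sec-per-minute ramp pulls it down to about 82.4%.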
Dynamic Reorder Points
Sophisticated systems adjust triggers based on:
Time of day:
Peak hours: Trigger at 70% utilization (more buffer)
Off-peak: Trigger at 85% utilization (less buffer needed)
Demand trend:
Demand rising: Trigger earlier
Demand falling: Trigger later (avoid over-provisioning)
Recent variability:
High variance: Trigger earlier
Stable: Trigger later
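One way to express these rules is a small function that starts from the time-of-day thresholds above and then nudges them for trend and variability. The 5-point adjustments and the 50-95% clamp are hypothetical choices, not values from the text.

```python
def dynamic_trigger(peak_hours: bool, demand_trend: float, high_variance: bool) -> float:
    """Scale-up utilization threshold adjusted for context.

    The 70% / 85% bases come from the text; the 5-point nudges and the
    50-95% clamp are hypothetical.
    """
    threshold = 0.70 if peak_hours else 0.85
    if demand_trend > 0:        # demand rising: trigger earlier
        threshold -= 0.05
    elif demand_trend < 0:      # demand falling: trigger later
        threshold += 0.05
    if high_variance:           # recent volatility: trigger earlier
        threshold -= 0.05
    return min(max(threshold, 0.50), 0.95)

print(f"{dynamic_trigger(peak_hours=True, demand_trend=+1, high_variance=True):.0%}")
print(f"{dynamic_trigger(peak_hours=False, demand_trend=-1, high_variance=False):.0%}")
```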
Model 5: Service Level Targeting
How do you choose the right service level? Higher isn’t always better.
The Cost Trade-off
| Service Level | Capacity Needed | Cost | Stockout Risk |
|---|---|---|---|
| 90% | 100 units | $100K | 10% |
| 95% | 115 units | $115K | 5% |
| 99% | 140 units | $140K | 1% |
| 99.9% | 175 units | $175K | 0.1% |
| 99.99% | 220 units | $220K | 0.01% |
Each “9” costs more. Is it worth it?
The Calculation
Optimal service level where:
Marginal cost of capacity = Marginal reduction in stockout cost
Going from 99% to 99.9%:
Additional capacity cost: $35K/year
Stockout probability reduction: 0.9%
Stockout days avoided per year: 0.9% × 365 = 3.3 days
Cost per stockout day: $50K
Stockout cost avoided: 3.3 × $50K = $165K
ROI: $165K / $35K = 4.7x → Worth it
Going from 99.9% to 99.99%:
Additional capacity cost: $45K/year
Stockout probability reduction: 0.09%
Stockout days avoided per year: 0.09% × 365 = 0.33 days
Cost per stockout day: $50K
Stockout cost avoided: 0.33 × $50K = $16K
ROI: $16K / $45K = 0.36x → Not worth it
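The marginal comparison can be scripted so it is easy to rerun with your own numbers. This sketch uses the table's costs and the $50K stockout day from the text. Like the worked example, it treats (1 − service level) as the share of days with a stockout.

```python
# Annual capacity cost by service level (from the table above)
CAPACITY_COST = {0.99: 140_000, 0.999: 175_000, 0.9999: 220_000}
COST_PER_STOCKOUT_DAY = 50_000
DAYS_PER_YEAR = 365

def marginal_roi(current: float, target: float) -> float:
    """Stockout cost avoided per dollar of extra capacity.

    Treats (1 - service level) as the share of days with a stockout,
    as in the worked example.
    """
    extra_cost = CAPACITY_COST[target] - CAPACITY_COST[current]
    days_avoided = (target - current) * DAYS_PER_YEAR
    return days_avoided * COST_PER_STOCKOUT_DAY / extra_cost

for current, target in [(0.99, 0.999), (0.999, 0.9999)]:
    roi = marginal_roi(current, target)
    verdict = "worth it" if roi > 1 else "not worth it"
    print(f"{current:.2%} -> {target:.2%}: ROI {roi:.2f}x ({verdict})")
```

Without the intermediate rounding, the ratios come out at about 4.69x and 0.365x, so the conclusions are unchanged.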
The Insight
There’s an economically optimal service level. It’s often lower than engineers instinctively want.
The five-nines (99.999%) myth:
Engineers: "We need five nines!"
Finance: "What does that cost?"
Engineers: "Whatever it takes."
Math: "The marginal value of the 5th nine is $3K. The marginal cost is $200K."
Pick your service level based on stockout cost, not engineering pride.
Putting It Together: A Capacity Framework
Step 1: Understand Your Demand
Collect data:
- Average demand by hour/day/week
- Standard deviation of demand
- Peak demand events (frequency, magnitude)
- Trend (growing, stable, declining)
Step 2: Quantify Your Costs
Stockout costs:
- Revenue per request
- SLA penalty per hour of degradation
- Customer churn from poor experience
- Reputation/brand impact
Holding costs:
- Reserved instance rates
- On-demand rates
- Opportunity cost of capital
Step 3: Calculate Optimal Levels
Base capacity: Reserve instances covering ~60% of average demand
Headroom: Safety stock formula for variability
Spot buffer: Newsvendor model for peak coverage
Trigger points: Reorder point model for auto-scaling
Service level: Marginal cost = marginal benefit analysis
Step 4: Build the Portfolio
Capacity portfolio:
Reserved (1-year): 50% of base (predictable, cheap)
Reserved (flexible): 20% of base (some flexibility)
On-demand: 20% of base (full flexibility)
Spot: 10% buffer (opportunistic)
Auto-scale headroom: Covers demand spikes up to service level target
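Turning the percentages into instance counts is simple arithmetic. This sketch assumes base capacity is measured in whole instances. It uses integer percentages to avoid floating-point truncation, and it sends any rounding remainder to on-demand, a hypothetical tie-break chosen because that tier is the most flexible.

```python
# Tier shares from the portfolio above, in whole percent of base capacity
PORTFOLIO_PCT = {
    "reserved_1yr": 50,
    "reserved_flexible": 20,
    "on_demand": 20,
    "spot": 10,
}

def allocate(base_instances: int) -> dict:
    """Split a base instance count across tiers, rounding each tier down."""
    plan = {tier: base_instances * pct // 100 for tier, pct in PORTFOLIO_PCT.items()}
    # Hypothetical tie-break: leftover instances go to the most flexible tier
    plan["on_demand"] += base_instances - sum(plan.values())
    return plan

print(allocate(120))
print(allocate(7))
```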
Common Mistakes
Mistake 1: Targeting 100% Availability
100% availability requires infinite capacity.
Infinite capacity costs infinite money.
Pick a service level and optimize for it.
Mistake 2: Ignoring Lead Time
"We have auto-scaling, we're fine."
But auto-scaling takes 3-5 minutes.
In 3 minutes at 10K req/sec, that's 1.8M requests.
If you're already at capacity when scaling triggers, any demand above capacity during those 3 minutes is throttled or fails.
Buffer for lead time.
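The lead-time arithmetic is worth making explicit. This sketch assumes demand starts at or below capacity when scaling triggers and then ramps linearly at a hypothetical `growth_per_minute`. Only the portion above the capacity line is lost.

```python
def requests_during_lead_time(rate_per_sec: float, lead_time_minutes: float) -> float:
    """Total requests arriving while new capacity provisions."""
    return rate_per_sec * lead_time_minutes * 60

def requests_over_capacity(capacity: float, start_rate: float,
                           growth_per_minute: float, lead_time_minutes: float) -> float:
    """Requests above capacity during the lead time under a linear demand ramp
    (a hypothetical traffic shape)."""
    if start_rate > capacity:
        raise ValueError("sketch assumes demand starts at or below capacity")
    if growth_per_minute <= 0:
        return 0.0
    t_cross = (capacity - start_rate) / growth_per_minute  # minutes until demand hits capacity
    overflow_minutes = max(lead_time_minutes - t_cross, 0.0)
    # Triangle between the ramp and the capacity line: req/sec * minutes, converted to requests
    return growth_per_minute * overflow_minutes ** 2 / 2 * 60

print(f"{requests_during_lead_time(10_000, 3):,.0f}")          # all requests in 3 minutes
print(f"{requests_over_capacity(10_000, 10_000, 500, 3):,.0f}")  # triggered at 100%
print(f"{requests_over_capacity(10_000, 8_000, 500, 3):,.0f}")   # triggered at 80%
```

Suppose demand is already at 10,000 req/sec and rising 500 req/sec per minute. Then about 135,000 of the 1.8M requests arrive above capacity. Triggering at 8,000 req/sec (80% utilization) leaves 4 minutes of runway, so none are lost during a 3-minute scale-up.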
Mistake 3: Static Thresholds
Scaling at 80% utilization always:
- Wasteful at 3am (demand is low and stable, so you could safely trigger later)
- Dangerous before Black Friday (should scale earlier)
Use dynamic thresholds based on context.
Mistake 4: Treating All Capacity as Equal
Not all capacity is fungible:
- GPU instances vs CPU
- Memory-optimized vs compute-optimized
- Regional capacity constraints
Model each capacity type separately.
Summary
Cloud capacity is inventory. Inventory theory applies:
| Problem | Model | Key Insight |
|---|---|---|
| How much headroom? | Safety stock | Depends on variability, lead time, service level |
| How much to reserve? | EOQ / Expected value | Balance discount vs flexibility |
| How much spot buffer? | Newsvendor | Optimal is often below peak—it’s not economical to cover everything |
| When to scale? | Reorder point | Trigger early enough for lead time |
| What service level? | Marginal analysis | Each “9” has diminishing returns |
The capacity planning mindset:
Old thinking:
"Provision for peak. Add buffer. Don't run out."
Inventory thinking:
"Balance holding costs vs stockout costs.
Find the economically optimal point.
Accept that some stockout risk is rational."
You wouldn’t run a warehouse by saying “stock infinite inventory so we never run out.” Don’t run cloud capacity that way either.
Find your optimal inventory level. It’s probably lower than you think—and that’s okay.