Signs You’re Operating Kubernetes at Scale


Kubernetes works until it doesn’t.

The docs cover installation, basic operations, maybe some tuning. They don’t cover what happens when you have 3,000 nodes, 50,000 pods, and 10,000 services. That knowledge lives in incident reports and war stories.

This is a cheatsheet of symptoms. Each one signals you’ve crossed into “at scale” territory. Some link to deeper dives. Others are just: here’s the symptom, here’s the fix.

In its default iptables mode, kube-proxy rebuilds the entire rule set on every Service or Endpoints change. At 100 services, this takes milliseconds. At 10,000 services, it takes tens of seconds.

Watch: kubeproxy_sync_proxy_rules_duration_seconds climbing steadily.

The problem: During sync, new connections can fail. If sync takes longer than the interval between changes, you never catch up.

Deep dive: Beyond kube-proxy: eBPF Service Routing in Kubernetes
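
If moving off kube-proxy isn't on the table yet, the iptables-mode sync knobs buy time. A minimal KubeProxyConfiguration sketch, with illustrative values:

```yaml
# Sketch: batch rule rebuilds instead of syncing on every change.
# Values are illustrative; tune against your own Service/Endpoints churn rate.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "iptables"
iptables:
  # Don't resync more often than this, even under heavy churn.
  minSyncPeriod: 5s
  # Run a full resync at least this often.
  syncPeriod: 30s
```

IPVS mode degrades more gracefully because it keeps per-service state in kernel hash tables rather than one long rule chain; the eBPF options in the deep dive go further.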


Kubernetes controllers use informer caches to avoid hammering the API server. These caches are eventually consistent. At scale, “eventually” gets longer.

Watch: Reconciliation loops doing redundant work. Race conditions between controllers. Resources getting created twice.

The problem: Your controller reads from cache, sees version N, makes a decision. By the time it acts, the real state is version N+3. The decision was wrong.

Deep dive: Eventual Consistency and Stale Caches in Kubernetes Controllers


Every pod, nearly every new connection, every service-discovery lookup hits CoreDNS. At scale, the CoreDNS pods become a bottleneck.

Watch: Application latency spikes. Intermittent connection failures. SERVFAIL responses. CoreDNS pods at 100% CPU.

The problem: Default CoreDNS deployment doesn’t scale with cluster size. DNS becomes a single point of contention.

Deep dive: CoreDNS Under Pressure: How We Fixed DNS Bottlenecks with NodeLocal DNSCache
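
A common stopgap alongside NodeLocal DNSCache is scaling CoreDNS replicas with the cluster instead of keeping the install-time replica count. A sketch that assumes the cluster-proportional-autoscaler is deployed and watching this ConfigMap; the name and ratios follow the upstream dns-autoscaler example and are illustrative:

```yaml
# Sketch: scale CoreDNS with cluster size via cluster-proportional-autoscaler.
# Assumes the autoscaler is deployed and pointed at this ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
data:
  # One replica per 16 nodes or per 256 cores, whichever yields more,
  # and never fewer than two. Ratios are illustrative.
  linear: |
    {"coresPerReplica": 256, "nodesPerReplica": 16, "min": 2, "preventSinglePointFailure": true}
```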


Admission webhooks—both mutating and validating—are in the critical path for every relevant API request. At scale, webhook latency compounds.

Watch: apiserver_admission_webhook_admission_duration_seconds increasing. API calls timing out. Cascading failures when a webhook is slow or down.

The problem: A 100ms webhook on every pod creation doesn't matter at 10 pods/minute. At 1,000 pods/minute, it's 100 seconds of webhook time per minute; without enough concurrency, the queue never drains.

Deep dive: Admission Webhooks at Scale: Diagnosis, Hardening, and Multi-Cluster Consistency
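
Two cheap mitigations while you diagnose: bound the webhook timeout and narrow its scope so it isn't in the path of every request. In the sketch below, the webhook name, service, and opt-in label are hypothetical, and failing open versus closed is a deliberate policy call:

```yaml
# Sketch: scope and bound a webhook so a slow or dead backend can't stall
# every pod creation. Names and labels are hypothetical placeholders.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook
webhooks:
  - name: policy.example.com
    clientConfig:
      service:
        name: policy-webhook
        namespace: policy-system
        path: /validate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    # Fail open (or closed, if policy demands it) and never hold an API
    # request for more than a couple of seconds.
    failurePolicy: Ignore
    timeoutSeconds: 2
    # Only intercept namespaces that opt in, instead of the whole cluster.
    namespaceSelector:
      matchLabels:
        policy.example.com/enforce: "true"
    admissionReviewVersions: ["v1"]
    sideEffects: None
```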


etcd backs everything in Kubernetes—every object, every watch, every change. At scale, etcd becomes the bottleneck.

Watch: Slow API responses. Watch lag. etcd_mvcc_db_total_size_in_bytes growing. Compaction taking minutes instead of seconds.

The problem: Too many objects, too many watches, too much churn. etcd's single-leader write path can't keep up.

Deep dive: From etcd to Watch: How Kubernetes Watches Actually Work
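
Compaction and the backend quota are the usual first levers. A fragment of a kubeadm-style etcd static pod manifest, with illustrative values:

```yaml
# Fragment of an etcd static pod spec (kubeadm-style layout assumed).
# Flag values are illustrative, not recommendations.
spec:
  containers:
    - name: etcd
      command:
        - etcd
        # Compact old revisions regularly so the keyspace stays small.
        - --auto-compaction-mode=periodic
        - --auto-compaction-retention=30m
        # Raise the backend quota before hitting "database space exceeded";
        # ~8 GiB is the commonly cited practical ceiling.
        - --quota-backend-bytes=8589934592
```

Compaction releases old revisions but not disk space; defragmentation (etcdctl defrag) reclaims it and briefly blocks the member it runs on, so do one member at a time.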


The scheduler scores every feasible node for every pod. Scoring is roughly O(nodes × pods in scheduling cycle).

Watch: scheduler_pending_pods growing. scheduler_scheduling_attempt_duration_seconds spiking. Pods sitting in Pending for minutes.

The problem: With 5,000 nodes and complex affinity rules, the scheduler spends more time thinking than placing.

Deep dive: How the Kubernetes Scheduler Actually Works
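
The biggest single knob is how many feasible nodes get scored per pod. A KubeSchedulerConfiguration sketch; 10% is illustrative, and recent schedulers already lower the default adaptively on large clusters:

```yaml
# Sketch: score a sample of feasible nodes instead of all of them.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
# Percentage of feasible nodes scored per scheduling cycle; on a 5,000-node
# cluster, scoring 500 nodes rarely changes the placement outcome.
percentageOfNodesToScore: 10
```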


kubectl apply stores the previous configuration in the kubectl.kubernetes.io/last-applied-configuration annotation. An object's annotations are capped at 256 KiB (262,144 bytes) in total.

Watch: apply failures with metadata.annotations: Too long: must have at most 262144 bytes.

The problem: Large ConfigMaps, complex CRDs, or deeply nested specs hit the limit. This breaks GitOps workflows that rely on kubectl apply.

Fix: Use server-side apply (kubectl apply --server-side). It tracks field ownership differently and doesn’t store the full config in an annotation.


Kubernetes 1.18+ ships API Priority and Fairness (P&F), a system that protects the API server from overload by queuing and throttling requests.

Watch: apiserver_flowcontrol_rejected_requests_total increasing. Controllers logging “rate limited” or backing off. Legitimate requests getting 429 Too Many Requests.

The problem: At scale, legitimate controllers and kubelets generate enough traffic to trigger P&F. The defaults assume smaller clusters.

Fix: Tune P&F flow schemas. Identify which priority levels are saturated (apiserver_flowcontrol_current_inqueue_requests). Reduce API chatter—use informers properly, batch operations, avoid unnecessary watches.
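
If one well-understood, high-volume client keeps hitting the defaults, giving it its own priority level is often safer than raising global limits. In the sketch below, the ServiceAccount, shares, and queue sizes are hypothetical; match the API version to what your cluster serves (flowcontrol.apiserver.k8s.io/v1 as of 1.29):

```yaml
# Sketch: a dedicated priority level plus a FlowSchema that routes one
# controller's traffic into it. All names and numbers are hypothetical.
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: bulk-controllers
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 20
    limitResponse:
      type: Queue
      queuing:
        queues: 32
        queueLengthLimit: 50
        handSize: 4
---
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: bulk-controllers
spec:
  priorityLevelConfiguration:
    name: bulk-controllers
  matchingPrecedence: 500
  distinguisherMethod:
    type: ByUser
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: my-operator        # hypothetical
            namespace: operators     # hypothetical
      resourceRules:
        - verbs: ["list", "watch", "update", "patch"]
          apiGroups: ["*"]
          resources: ["*"]
          namespaces: ["*"]
```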


Every node sends a heartbeat to the control plane—by default, every 10 seconds. With Leases (default since 1.17), this updates a Lease object in etcd.

Watch: etcd write latency climbing. etcd_disk_wal_fsync_duration_seconds increasing. Control plane CPU spent on heartbeat processing.

The problem: 5,000 nodes × 1 heartbeat per 10 seconds = 500 writes/second, just for liveness. Add to that all the actual work.

Fix: Lease objects (now default) are much lighter than Node status updates. If you’re still on Node status heartbeats, migrate. Consider increasing --node-status-update-frequency if your SLOs allow.
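
The kubelet-side knobs live in KubeletConfiguration. A sketch with illustrative values; loosening them trades etcd write load for slower detection of dead nodes, so make sure kube-controller-manager's --node-monitor-grace-period stays comfortably larger than the heartbeat interval:

```yaml
# Sketch: kubelet heartbeat tuning. Values are illustrative.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# How often the kubelet computes node status (and posts it when it changes).
nodeStatusUpdateFrequency: 20s
# How often unchanged status is still reported to the API server.
nodeStatusReportFrequency: 5m
# Lease duration; the kubelet renews its Lease at a fraction of this, so a
# longer lease means fewer etcd writes but slower failure detection.
nodeLeaseDurationSeconds: 60
```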


A single Kubernetes cluster has practical limits: ~5,000 nodes, ~150,000 pods, ~100,000 total objects. These aren’t hard limits—they’re where things start breaking.

Watch: You’re past the tuning phase. Every component is optimized. It’s still not enough.

The problem: At some point, a single control plane can’t handle the load. The architecture needs to change.

Deep dive: Scaling Beyond 5,000 Nodes Per Cluster


Scale problems are configuration problems until they’re architecture problems.

The pattern is usually: hit a wall, tune something, buy time, hit the next wall. Most of the symptoms above have tuning fixes. Some—like the multi-cluster threshold—require rethinking.

Kubernetes doesn’t warn you when you’ve outgrown a component. These symptoms are the warnings.