Eventual Consistency and Stale Caches in Kubernetes Controllers


Your custom controller watches for a label change. A user adds the label. Your controller does nothing — or worse, does the wrong thing. Thirty seconds later, it finally reacts. What happened?

The answer lies in the informer cache, and understanding it is the difference between controllers that work in demos and controllers that work in production.

Consider a simple scenario:

  1. User runs: kubectl label deployment web feature.example.com/inject-sidecar=true
  2. API server accepts the write, returns success
  3. Your controller’s reconcile loop runs
  4. Controller checks: “Does this Deployment have the sidecar label?”
  5. Cache says: “No”
  6. Controller does nothing

The label exists in etcd. The API server knows about it. But your controller’s local cache hasn’t caught up yet. Your controller just made a decision based on a lie.

This gets worse on busy clusters. When the API server is under load, watch events queue up. When your controller is processing a backlog, event handlers fall behind. The window between “truth in etcd” and “truth in your cache” widens from milliseconds to seconds — sometimes longer.

Symptoms you’ll see:

  • Flickering state: Resource toggles between two states as controllers fight over stale views
  • Unnecessary reconciliations: Controller keeps requeueing because it can’t see its own writes
  • Race conditions: Two controllers both think they need to act, both act, chaos ensues
  • Silent failures: Controller checks a condition, condition appears false, controller exits early

To fix these problems, you need to understand the machinery between the API server and your Reconcile() function.

Kubernetes controllers don’t poll the API server. Instead, they use the watch protocol:

  1. Initial List: On startup, the controller fetches all relevant objects (e.g., all Deployments in the cluster)
  2. Watch: Controller opens a long-lived HTTP/2 stream. The API server pushes events (ADDED, MODIFIED, DELETED) as objects change

This is efficient — you get updates pushed to you rather than polling. But it introduces a fundamental reality: your controller sees an eventually consistent view of the cluster.
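
For concreteness, here is roughly what that list-then-watch sequence looks like against a raw client-go clientset; the informer machinery described next does this for you, plus reconnection and relisting. A hedged sketch: clientset is assumed to be an already-configured kubernetes.Interface, and error handling for dropped watches is omitted.

func watchDeployments(ctx context.Context, clientset kubernetes.Interface) error {
    // 1. Initial list: fetch everything and remember the list's ResourceVersion.
    list, err := clientset.AppsV1().Deployments(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
    if err != nil {
        return err
    }

    // 2. Watch: resume the event stream from that ResourceVersion.
    w, err := clientset.AppsV1().Deployments(metav1.NamespaceAll).Watch(ctx, metav1.ListOptions{
        ResourceVersion: list.ResourceVersion,
    })
    if err != nil {
        return err
    }
    defer w.Stop()

    for event := range w.ResultChan() {
        deploy, ok := event.Object.(*appsv1.Deployment)
        if !ok {
            continue // e.g. an ERROR event carries a *metav1.Status instead
        }
        fmt.Printf("%s %s/%s rv=%s\n", event.Type, deploy.Namespace, deploy.Name, deploy.ResourceVersion)
    }
    return nil
}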

The client-go library provides SharedInformer to manage this watch lifecycle. Here’s what’s actually happening:

API Server
    |
    |  Watch stream (HTTP/2)
    v
Reflector ------- Consumes watch events, handles reconnection
    |
    v
DeltaFIFO ------- Buffers changes, coalesces multiple updates
    |
    v
Indexer --------- The actual cache (thread-safe in-memory store)
    |
    |  Event handlers (OnAdd, OnUpdate, OnDelete)
    v
Work Queue ------ Rate-limited queue of keys to reconcile
    |
    v
Reconcile ------- Your code runs here

Every box in this diagram is a place where delay can accumulate.
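
For reference, here is a hedged sketch of how those boxes get wired together with plain client-go; controller-runtime assembles the same pipeline for you when you register a controller with ctrl.NewControllerManagedBy. The clientset is assumed to be an already-configured kubernetes.Interface.

func startDeploymentInformer(clientset kubernetes.Interface) error {
    // Reflector, DeltaFIFO, and Indexer all live behind the shared informer.
    factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute) // resync period
    informer := factory.Apps().V1().Deployments().Informer()

    // Rate-limited work queue of keys; reconcile workers pop from this.
    queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

    informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
                queue.Add(key) // only the key is queued, never the object itself
            }
        },
        UpdateFunc: func(_, newObj interface{}) {
            if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
                queue.Add(key)
            }
        },
        DeleteFunc: func(obj interface{}) {
            if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
                queue.Add(key)
            }
        },
    })

    stop := make(chan struct{})
    factory.Start(stop) // starts the Reflector and the DeltaFIFO -> Indexer pump

    // Block until the Indexer has been populated by the initial list.
    if !cache.WaitForCacheSync(stop, informer.HasSynced) {
        return fmt.Errorf("cache failed to sync")
    }
    // Reconcile workers would now loop on queue.Get() ... (omitted)
    return nil
}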

Every object in Kubernetes has a metadata.resourceVersion field. This isn’t a version number you control — it’s an opaque string derived from the etcd revision.

metadata:
  name: web
  resourceVersion: "1847293"  # etcd revision when this object was last modified

When you watch resources, your client tracks its position in the event stream using ResourceVersion. When the watch reconnects, it asks the API server to resume from the last ResourceVersion it saw (if the server still has it) or relists everything.

Key insight: If you read an object from cache and it has resourceVersion: "1847293", you’re seeing the state as of etcd revision 1847293. The object might have been modified since then — your cache just hasn’t received the event yet.

The time between “API server accepts a write” and “your Reconcile() sees it” is your lag window. Let’s trace where time goes:

  1. API server watch cache: The API server doesn’t stream directly from etcd. It maintains an in-memory watch cache and flushes events to watchers periodically. Default flush interval: ~100ms.
     Delay contribution: 0-100ms typically

  2. Network transit: Events travel from the API server to your controller over the network.
     Delay contribution: <1ms (same node) to 10-50ms (cross-region)

  3. Reflector and DeltaFIFO: The Reflector pushes events into DeltaFIFO. A separate goroutine pops events and updates the Indexer. If events arrive faster than they’re processed, they queue up.
     Delay contribution: Microseconds normally, can spike to seconds under load

  4. Event handlers: When the cache updates, your event handlers run (OnAdd, OnUpdate, OnDelete). If you do anything slow here — logging, metrics, complex filtering — you block subsequent events.
     Delay contribution: Should be microseconds, but bad code makes this milliseconds or worse

  5. Work queue: Event handlers typically just add a key to the work queue. But if your reconciler is slow, the queue grows. New events wait behind old ones.
     Delay contribution: Depends entirely on your reconciler throughput

  6. Reconcile: Finally, your code runs. But you’re reading from the cache, which reflects state as of when the event handler ran — not when Reconcile runs.

Total lag budget: On a healthy cluster, 100-500ms is typical. On a busy cluster with slow reconcilers, 5-30 seconds is possible.

Want to see this in your cluster? Log the ResourceVersion at write time and compare to what your cache returns:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    var obj appsv1.Deployment
    if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    log.Info("Reconciling",
        "name", obj.Name,
        "cacheResourceVersion", obj.ResourceVersion,
        // add a "queuedAt" pair here if you record enqueue time yourself
    )

    // ...
    return ctrl.Result{}, nil
}

Compare against kubectl get deployment web -o jsonpath='{.metadata.resourceVersion}' to see how far behind your cache is.

The most common bug:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var deploy appsv1.Deployment
    if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
        return ctrl.Result{}, err
    }
    
    // Add a label
    if deploy.Labels == nil {
        deploy.Labels = make(map[string]string)
    }
    deploy.Labels["my-controller/processed"] = "true"
    
    if err := r.Update(ctx, &deploy); err != nil {
        return ctrl.Result{}, err
    }
    
    // BUG: Reading immediately after writing
    var updated appsv1.Deployment
    if err := r.Get(ctx, req.NamespacedName, &updated); err != nil {
        return ctrl.Result{}, err
    }
    
    // This might be false! Cache hasn't caught up to our write.
    if updated.Labels["my-controller/processed"] != "true" {
        log.Error(nil, "Label not found after update!")  // This happens.
    }
    
    return ctrl.Result{}, nil
}

The Update() call succeeds and returns the updated object. But r.Get() reads from the cache, which hasn’t received the watch event yet.
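
One narrow fix follows from that last sentence: controller-runtime writes the server's response back into the object you passed to Update(), so you can make follow-up decisions from that in-memory copy instead of issuing a second, cached Get(). A sketch of the tail of the reconciler above:

    if err := r.Update(ctx, &deploy); err != nil {
        return ctrl.Result{}, err
    }

    // After a successful Update, `deploy` already reflects the server's response:
    // the new ResourceVersion and the label we just set. No second Get() against
    // the (possibly stale) cache is needed.
    log.Info("updated",
        "resourceVersion", deploy.ResourceVersion,              // new version from the server
        "processed", deploy.Labels["my-controller/processed"],  // "true"
    )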

This pattern is deceptively dangerous:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var deploy appsv1.Deployment
    if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
        return ctrl.Result{}, err
    }
    
    // Dangerous: acting on absence
    if _, exists := deploy.Labels["feature.example.com/sidecar"]; !exists {
        log.Info("Sidecar label not present, skipping")
        return ctrl.Result{}, nil
    }
    
    // ... inject sidecar

    return ctrl.Result{}, nil
}

If a user just added the label, your cache might not have it yet. You skip processing, and the user wonders why nothing happened. Worse, you don’t requeue — so you might never process it until the next resync.
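
If you must act on absence, one hedge is to confirm it with an uncached read before giving up. A sketch, assuming an APIReader field like the one introduced in the mitigations below:

    if _, exists := deploy.Labels["feature.example.com/sidecar"]; !exists {
        // The cache says the label is missing; double-check against the API
        // server before acting on absence. r.APIReader is an uncached
        // client.Reader (see the read-through pattern later in this section).
        var fresh appsv1.Deployment
        if err := r.APIReader.Get(ctx, req.NamespacedName, &fresh); err != nil {
            return ctrl.Result{}, client.IgnoreNotFound(err)
        }
        if _, exists := fresh.Labels["feature.example.com/sidecar"]; !exists {
            log.Info("Sidecar label not present, skipping")
            return ctrl.Result{}, nil
        }
        deploy = fresh // the cache was stale; continue with the fresh copy
    }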

Controllers often watch multiple resource types:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var deploy appsv1.Deployment
    if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
        return ctrl.Result{}, err
    }
    
    // Get the associated ConfigMap
    var configMap corev1.ConfigMap
    configMapName := deploy.Annotations["my-controller/config"]
    if err := r.Get(ctx, types.NamespacedName{
        Namespace: deploy.Namespace,
        Name:      configMapName,
    }, &configMap); err != nil {
        if apierrors.IsNotFound(err) {
            // BUG: ConfigMap might exist but not be in cache yet
            log.Info("ConfigMap not found, waiting...")
            return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
        }
        return ctrl.Result{}, err
    }
    
    // ...

    return ctrl.Result{}, nil
}

You’re watching Deployments. Someone creates a ConfigMap and then annotates a Deployment to reference it. The Deployment event arrives, but the ConfigMap watch might not have received its ADDED event yet. You conclude the ConfigMap doesn’t exist.
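
One way to make this self-correcting is to watch the secondary resource as well and map its events back to the Deployments that reference it, so reconciliation re-fires once the ConfigMap's ADDED event finally lands in the cache. A hedged sketch against a recent controller-runtime (the Watches and MapFunc signatures have shifted between versions); the annotation key matches the example above:

func (r *MyReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&appsv1.Deployment{}).
        // Also watch ConfigMaps: when one changes, enqueue any Deployment in
        // the same namespace that references it via the my-controller/config
        // annotation.
        Watches(&corev1.ConfigMap{}, handler.EnqueueRequestsFromMapFunc(
            func(ctx context.Context, obj client.Object) []reconcile.Request {
                var deployments appsv1.DeploymentList
                if err := r.List(ctx, &deployments, client.InNamespace(obj.GetNamespace())); err != nil {
                    return nil
                }
                var reqs []reconcile.Request
                for _, d := range deployments.Items {
                    if d.Annotations["my-controller/config"] == obj.GetName() {
                        reqs = append(reqs, reconcile.Request{
                            NamespacedName: types.NamespacedName{Namespace: d.Namespace, Name: d.Name},
                        })
                    }
                }
                return reqs
            },
        )).
        Complete(r)
}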

Two controllers watching the same resource, both with stale views:

Controller A cache: Deployment has 3 replicas
Controller B cache: Deployment has 3 replicas

User sets replicas to 5

Controller A sees update event (replicas=5)
Controller A: "I need to create a monitoring config for 5 replicas"
Controller A updates Deployment annotations

Controller B sees the annotation update (but has stale replicas=3 in cache)
Controller B: "Annotations changed, let me process this... replicas=3"
Controller B "fixes" replicas back to 3 based on stale cache

Controller A sees replicas change to 3...

Both controllers are acting rationally based on their view. But their views are inconsistent, and they fight.

Don’t trust a single reconciliation. If conditions aren’t met, requeue and check again:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var deploy appsv1.Deployment
    if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    
    // Check if we've processed this
    if deploy.Annotations["my-controller/processed"] != "true" {
        // Do processing...
        if deploy.Annotations == nil {
            deploy.Annotations = make(map[string]string)
        }
        deploy.Annotations["my-controller/processed"] = "true"
        if err := r.Update(ctx, &deploy); err != nil {
            return ctrl.Result{}, err
        }
        // Don't trust the update immediately - requeue to verify
        return ctrl.Result{RequeueAfter: 1 * time.Second}, nil
    }
    
    return ctrl.Result{}, nil
}

Always carry the ResourceVersion you read into your update. The API server rejects updates with a stale ResourceVersion:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var deploy appsv1.Deployment
    if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
        return ctrl.Result{}, err
    }
    
    // Modify; guard against a nil label map before writing
    if deploy.Labels == nil {
        deploy.Labels = make(map[string]string)
    }
    deploy.Labels["processed"] = "true"
    
    // Update - this uses the ResourceVersion from Get()
    if err := r.Update(ctx, &deploy); err != nil {
        if apierrors.IsConflict(err) {
            // Someone else modified it - requeue and try again
            log.Info("Conflict detected, requeueing")
            return ctrl.Result{Requeue: true}, nil
        }
        return ctrl.Result{}, err
    }
    
    return ctrl.Result{}, nil
}

The conflict error is your friend. It tells you your view was stale and prevents you from clobbering someone else’s changes.
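
client-go also ships a small helper, retry.RetryOnConflict from k8s.io/client-go/util/retry, for retrying the get-mutate-update loop in place rather than requeueing. A sketch:

    err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
        // Re-read on every attempt so each retry starts from the newest version
        // this client can see, then reapply the mutation.
        var deploy appsv1.Deployment
        if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
            return err
        }
        if deploy.Labels == nil {
            deploy.Labels = make(map[string]string)
        }
        deploy.Labels["processed"] = "true"
        return r.Update(ctx, &deploy)
    })
    if err != nil {
        return ctrl.Result{}, err
    }

Note that the re-read inside the closure still goes through the cache here, so a retry may spin a few times before the cache catches up; returning the conflict and letting the work queue requeue is often just as effective.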

When you absolutely need fresh data, bypass the cache:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Normal cached read
    var deploy appsv1.Deployment
    if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
        return ctrl.Result{}, err
    }
    
    // Need fresh data for critical decision? Bypass cache.
    var freshDeploy appsv1.Deployment
    if err := r.APIReader.Get(ctx, req.NamespacedName, &freshDeploy); err != nil {
        return ctrl.Result{}, err
    }
    
    // freshDeploy was read directly from the API server

    return ctrl.Result{}, nil
}

Use this sparingly — it adds API server load and defeats the purpose of caching. But for critical decisions where eventual consistency isn’t acceptable, it’s the right tool.
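
The uncached reader is not something controller-runtime injects for you; you wire it up from the manager, which exposes a client.Reader that talks straight to the API server. A sketch of that setup (APIReader is simply our own field name):

type MyReconciler struct {
    client.Client               // cached client supplied by the manager
    APIReader client.Reader     // uncached reads, straight to the API server
    Scheme    *runtime.Scheme
}

// Wherever you construct the reconciler (e.g. in main.go):
r := &MyReconciler{
    Client:    mgr.GetClient(),    // reads from the informer cache
    APIReader: mgr.GetAPIReader(), // bypasses the cache entirely
    Scheme:    mgr.GetScheme(),
}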

The kube-controller-manager uses an expectations pattern for ReplicaSets: track what you expect to happen, and wait for the cache to confirm it.

type Expectations struct {
    mu       sync.Mutex
    expected map[string]expectation
}

// NewExpectations initializes the map; writing to a nil map would panic.
func NewExpectations() *Expectations {
    return &Expectations{expected: make(map[string]expectation)}
}

type expectation struct {
    add    int  // expecting this many adds
    delete int  // expecting this many deletes
}

func (e *Expectations) ExpectCreations(key string, count int) {
    e.mu.Lock()
    defer e.mu.Unlock()
    exp := e.expected[key]
    exp.add += count
    e.expected[key] = exp
}

func (e *Expectations) CreationObserved(key string) {
    e.mu.Lock()
    defer e.mu.Unlock()
    exp := e.expected[key]
    exp.add--
    e.expected[key] = exp
}

func (e *Expectations) SatisfiedExpectations(key string) bool {
    e.mu.Lock()
    defer e.mu.Unlock()
    exp := e.expected[key]
    return exp.add <= 0 && exp.delete <= 0
}

In your reconciler:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    key := req.NamespacedName.String()
    
    // Don't reconcile until expectations are met
    if !r.expectations.SatisfiedExpectations(key) {
        log.Info("Expectations not yet satisfied, skipping")
        return ctrl.Result{}, nil
    }
    
    // ... determine we need to create 3 pods
    
    r.expectations.ExpectCreations(key, 3)
    for i := 0; i < 3; i++ {
        if err := r.Create(ctx, &pod); err != nil {
            // Creation failed - adjust expectations
            r.expectations.CreationObserved(key)
            return ctrl.Result{}, err
        }
    }
    
    return ctrl.Result{}, nil
}

Your pod informer’s event handler calls CreationObserved() when it sees new pods. This stops the reconciler from creating duplicates while the cache still can’t show it the pods it just created.
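
The observation side is the other half of the pattern. A hedged sketch of wiring it up with controller-runtime, where ownerKey is a hypothetical helper that maps a Pod back to the namespace/name key of the resource that owns it:

func (r *MyReconciler) setupPodObserver(mgr ctrl.Manager) error {
    podInformer, err := mgr.GetCache().GetInformer(context.Background(), &corev1.Pod{})
    if err != nil {
        return err
    }

    podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            pod, ok := obj.(*corev1.Pod)
            if !ok {
                return
            }
            // ownerKey (hypothetical helper, not shown) derives the owning
            // resource's "namespace/name" key from the Pod's OwnerReferences.
            if key, ok := ownerKey(pod); ok {
                r.expectations.CreationObserved(key)
            }
        },
    })

    return nil
}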

For tracking whether a controller has processed the latest spec changes, use the Generation pattern:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var obj myv1.MyResource
    if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
        return ctrl.Result{}, err
    }
    
    // Skip if we've already processed this generation
    if obj.Status.ObservedGeneration == obj.Generation {
        return ctrl.Result{}, nil
    }
    
    // Process the spec...
    
    // Update status to record that we've processed this generation
    obj.Status.ObservedGeneration = obj.Generation
    // meta.SetStatusCondition (from k8s.io/apimachinery/pkg/api/meta) replaces the
    // existing "Ready" condition in place instead of appending a duplicate entry
    // on every generation change.
    meta.SetStatusCondition(&obj.Status.Conditions, metav1.Condition{
        Type:               "Ready",
        Status:             metav1.ConditionTrue,
        Reason:             "Reconciled", // illustrative reason string
        ObservedGeneration: obj.Generation,
        LastTransitionTime: metav1.Now(),
    })
    
    if err := r.Status().Update(ctx, &obj); err != nil {
        return ctrl.Result{}, err
    }
    
    return ctrl.Result{}, nil
}

metadata.generation increments only when the spec changes. status.observedGeneration records which generation your controller last processed. This gives you a reliable way to detect whether there is new work to do, even with cache lag.
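
For your own CRD this means the status type needs an observedGeneration field; it is not there by default. A minimal sketch of the Go types, with illustrative names:

// MyResourceStatus is the status subresource for MyResource. The field names
// are illustrative; only the pattern matters.
type MyResourceStatus struct {
    // observedGeneration is the metadata.generation most recently processed
    // by the controller.
    // +optional
    ObservedGeneration int64 `json:"observedGeneration,omitempty"`

    // +optional
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}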

Add metrics to track how stale your cache reads are:

var (
    cacheAgeHistogram = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "controller_cache_age_seconds",
            Help:    "Age of objects read from cache",
            Buckets: []float64{0.01, 0.05, 0.1, 0.5, 1, 5, 10, 30},
        },
        []string{"controller", "resource"},
    )
)

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    
    var deploy appsv1.Deployment
    if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
        return ctrl.Result{}, err
    }
    
    // Estimate cache age from the most recent managedFields timestamp
    if n := len(deploy.ManagedFields); n > 0 {
        if lastUpdate := deploy.ManagedFields[n-1].Time; lastUpdate != nil {
            age := time.Since(lastUpdate.Time).Seconds()
            cacheAgeHistogram.WithLabelValues("mycontroller", "deployment").Observe(age)
        }
    }
    
    // ...

    return ctrl.Result{}, nil
}
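
The histogram still has to be registered somewhere. With controller-runtime, one option is the global metrics registry, which exposes it on the manager's /metrics endpoint alongside the built-in metrics:

import (
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

func init() {
    // Register with controller-runtime's registry so the metric is served
    // from the manager's /metrics endpoint.
    metrics.Registry.MustRegister(cacheAgeHistogram)
}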

For testing, add artificial delay to your event handlers:

func setupEventHandlers(mgr ctrl.Manager) error {
    informer, err := mgr.GetCache().GetInformer(context.Background(), &appsv1.Deployment{})
    if err != nil {
        return err
    }
    
    informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            if simulateLag { // a package-level test flag, defined elsewhere
                time.Sleep(2 * time.Second)  // Simulate busy cluster
            }
        },
        UpdateFunc: func(old, new interface{}) {
            if simulateLag {
                time.Sleep(2 * time.Second)
            }
        },
    })
    
    return nil
}

Run your controller with this lag injected and watch how it behaves.

Log enough context to debug stale cache issues:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)
    
    var deploy appsv1.Deployment
    if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
        return ctrl.Result{}, err
    }
    
    log.V(1).Info("Reconciling",
        "resourceVersion", deploy.ResourceVersion,
        "generation", deploy.Generation,
        "observedGeneration", deploy.Status.ObservedGeneration,
        "labels", deploy.Labels,
    )
    
    // ... do work
    
    log.V(1).Info("Reconcile complete",
        "resultingResourceVersion", deploy.ResourceVersion,
    )
    
    return ctrl.Result{}, nil
}

When debugging, compare these ResourceVersions across log entries to trace propagation delays.

The informer cache isn’t broken — it’s working as designed. Kubernetes is an eventually consistent system, and your controller must be too.

Design principles:

  1. Idempotency: Running your reconciler twice with the same input should produce the same result
  2. Convergence: Given enough time without new inputs, the system reaches the desired state
  3. Tolerance: Your controller handles stale data gracefully — it might make suboptimal decisions, but never catastrophically wrong ones
  4. Verification: Don’t trust a single check. Requeue, recheck, confirm

The goal isn’t to eliminate cache lag — it’s to build controllers that work correctly despite it.