Kubernetes Networking Demystified: Tracing the Magic (and Debugging the Nightmare)


You run curl my-service:8080 from a pod, and it just works. The request reaches another pod, possibly on a different node, and you get a response. Magic.

Until it doesn’t work. Then you’re staring at iptables rules, tcpdump output, and CNI logs wondering where your packet went.

This post traces a packet through Kubernetes networking — from pod to Service to pod across nodes. Understanding this flow turns debugging from nightmare to systematic diagnosis.

Let’s make this concrete. We have:

Node 1 (192.168.1.10)
  Pod A: 10.244.1.5 (client)

Node 2 (192.168.1.11)  
  Pod B: 10.244.2.8 (server, behind Service)
  Pod C: 10.244.2.9 (server, behind Service)

Service: my-service
  ClusterIP: 10.96.45.67
  Port: 8080
  Endpoints: 10.244.2.8:8080, 10.244.2.9:8080

Pod A runs: curl my-service:8080

What actually happens? Let’s trace it.
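
If you want to pull the same picture from a live cluster, these commands show pod IPs, node placement, and the Service's endpoints (resource names here follow the example above):

kubectl get pods -o wide              # pod IPs and the nodes they run on
kubectl get service my-service        # ClusterIP and port
kubectl get endpoints my-service      # the pod IP:port pairs behind the Service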

Before crossing nodes, let’s understand the simplest case: two pods on the same node.

Each pod gets its own network namespace — an isolated network stack with its own interfaces, routes, and iptables rules. But pods need to communicate. The CNI plugin creates a virtual ethernet pair (veth) connecting each pod to the node:

+-----------------------------------------------------------+
|                          Node                             |
|                                                           |
|   +-----------+                    +-----------+          |
|   |   Pod A   |                    |   Pod B   |          |
|   |   eth0    |                    |   eth0    |          |
|   +-----+-----+                    +-----+-----+          |
|         | veth                           | veth           |
|         |                                |                |
|   +-----+--------------------------------+-----+          |
|   |             Bridge (cni0/cbr0)             |          |
|   +--------------------------------------------+          |
|                                                           |
+-----------------------------------------------------------+

The veth pair: One end (eth0) is inside the pod’s network namespace. The other end is attached to a bridge on the node. The pair acts like a virtual cable.

The bridge: A software switch (commonly named cni0, cbr0, or docker0). All pod veths connect to this bridge. Packets to other pods on the same node go through the bridge.
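
There is nothing CNI-specific about this plumbing. As a rough sketch, you could wire up a fake "pod" by hand with plain iproute2 (the namespace and interface names below are invented for illustration):

# Hand-rolled version of what a CNI plugin does on pod creation
ip netns add demo-pod                               # the "pod's" network namespace
ip link add veth-pod type veth peer name veth-host  # create the veth pair
ip link set veth-pod netns demo-pod                 # one end goes inside the namespace
ip link set veth-host master cni0                   # the other end attaches to the bridge
ip link set veth-host up
ip netns exec demo-pod ip addr add 10.244.1.99/24 dev veth-pod
ip netns exec demo-pod ip link set veth-pod up
ip netns exec demo-pod ip link set lo up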

Pod A (10.244.1.5) sends to Pod B (10.244.1.6), both on Node 1:

1. Pod A sends packet (src: 10.244.1.5, dst: 10.244.1.6)
2. Packet exits via eth0 (inside pod) -> enters veth -> arrives at bridge
3. Bridge looks up MAC for 10.244.1.6 -> forwards to Pod B's veth
4. Packet enters Pod B's eth0
5. Pod B receives packet

No iptables (for basic connectivity), no encapsulation. Just Layer 2 switching on the bridge.

# On the node, list veths
ip link show type veth

# See the bridge
ip link show type bridge
brctl show cni0

# See pod connections
bridge fdb show dev cni0

# tcpdump on the bridge
tcpdump -i cni0 -n host 10.244.1.5

Now for the interesting part. Pod A doesn’t call Pod B directly — it calls my-service:8080. The Service has a ClusterIP (10.96.45.67) that doesn’t exist on any interface. How does this work?

A ClusterIP is a virtual IP. No interface has this address. No ARP entry exists. If you try to ping it from outside the cluster, nothing responds.
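
You can verify this on any node; both commands should print nothing, because nothing owns the address:

# No interface holds the ClusterIP
ip addr | grep 10.96.45.67

# No ARP/neighbor entry exists for it either
ip neigh | grep 10.96.45.67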

Yet from inside a pod, it works. The secret: iptables rewrites the destination before the packet leaves.

kube-proxy runs on every node. It watches Services and Endpoints, then programs iptables rules that:

  1. Intercept packets destined for ClusterIPs
  2. Rewrite the destination to an actual pod IP (DNAT)
  3. Load balance across endpoints

iptables organizes rules into chains. kube-proxy creates a hierarchy:

Packet arrives (dst: 10.96.45.67:8080)
    |
    v
PREROUTING (or OUTPUT, for traffic generated in the node's own network namespace)
    |
    v
KUBE-SERVICES -- matches on ClusterIP:port
    |
    v
KUBE-SVC-ABCD1234 -- the Service's chain, randomly selects an endpoint
    |
    +--- 50% ---> KUBE-SEP-ENDPOINT1 (DNAT to 10.244.2.8)
    |
    +--- 50% ---> KUBE-SEP-ENDPOINT2 (DNAT to 10.244.2.9)

KUBE-SERVICES: The entry point. Has rules for every Service, matching on ClusterIP:port.

KUBE-SVC-*: One chain per Service. Contains probability-based jumps to endpoint chains (this is how load balancing works).

KUBE-SEP-*: One chain per endpoint (pod). Performs the actual DNAT — rewriting destination from ClusterIP to pod IP.

Let’s see what kube-proxy creates:

# Dump all iptables rules
iptables-save | grep my-service

# Or more specifically, find the Service chain
iptables -t nat -L KUBE-SERVICES -n | grep 10.96.45.67

Example output (annotated):

# Entry in KUBE-SERVICES for our Service
-A KUBE-SERVICES -d 10.96.45.67/32 -p tcp -m tcp --dport 8080 \
    -j KUBE-SVC-ABCD1234  # Jump to Service chain

# The Service chain with load balancing
-A KUBE-SVC-ABCD1234 -m statistic --mode random --probability 0.5 \
    -j KUBE-SEP-ENDPOINT1  # 50% to first endpoint
-A KUBE-SVC-ABCD1234 \
    -j KUBE-SEP-ENDPOINT2  # Remaining 50% to second endpoint

# Endpoint chains - the actual DNAT
-A KUBE-SEP-ENDPOINT1 -p tcp \
    -j DNAT --to-destination 10.244.2.8:8080
-A KUBE-SEP-ENDPOINT2 -p tcp \
    -j DNAT --to-destination 10.244.2.9:8080

The probability math: with two endpoints, the first rule matches 50% of the time and the second catches everything else (the other 50%). With three endpoints the first matches 33%, the second matches 50% of what remains (33% overall), and the last catches the rest (33%).
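
As a sketch, a Service with three endpoints might produce a chain like this (chain names invented, probabilities rounded):

-A KUBE-SVC-WXYZ5678 -m statistic --mode random --probability 0.33333 \
    -j KUBE-SEP-EP1   # 1/3 of all traffic
-A KUBE-SVC-WXYZ5678 -m statistic --mode random --probability 0.50000 \
    -j KUBE-SEP-EP2   # 1/2 of the remaining 2/3 = 1/3
-A KUBE-SVC-WXYZ5678 \
    -j KUBE-SEP-EP3   # everything left = 1/3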

Before DNAT:

src: 10.244.1.5 (Pod A)
dst: 10.96.45.67:8080 (Service ClusterIP)

After DNAT:

src: 10.244.1.5 (Pod A)
dst: 10.244.2.8:8080 (Pod B - actual endpoint)

The packet now has a real destination. It can be routed to Pod B.

One problem: Pod B’s response will have:

src: 10.244.2.8:8080 (Pod B)
dst: 10.244.1.5 (Pod A)

Pod A sent to 10.96.45.67 but receives from 10.244.2.8. Won’t it be confused?

conntrack saves us. The kernel tracks connections. When the response arrives, it reverses the DNAT:

Response packet:
  Before reverse DNAT: src=10.244.2.8, dst=10.244.1.5
  After reverse DNAT:  src=10.96.45.67, dst=10.244.1.5

Pod A sees the response coming from the ClusterIP it originally contacted. The illusion holds.

# See connection tracking entries
conntrack -L | grep 10.96.45.67
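
An entry for a DNATed connection records both directions, and you can see the rewrite in it; the output looks roughly like this (ports and timeouts will differ):

tcp 6 86392 ESTABLISHED src=10.244.1.5 dst=10.96.45.67 sport=41892 dport=8080 src=10.244.2.8 dst=10.244.1.5 sport=8080 dport=41892 [ASSURED]

The first tuple is what Pod A sent (to the ClusterIP); the second is the reply direction (from the real pod IP). The kernel uses this pair to reverse the DNAT on the way back.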

Our packet is now destined for 10.244.2.8 (Pod B on Node 2). But there’s a problem.

Pod IPs (10.244.x.x) are internal to Kubernetes. Your physical network doesn’t know how to route them:

Node 1 (192.168.1.10) wants to send to 10.244.2.8
Physical router: "10.244.2.8? Never heard of it. Drop."

Node 1’s routing table doesn’t have a route to 10.244.2.0/24. Neither does your datacenter’s router.

An overlay network encapsulates pod-to-pod packets inside node-to-node packets:

+----------------------------------------------------------+
| Original Packet                                          |
| src: 10.244.1.5   dst: 10.244.2.8   payload: HTTP GET    |
+----------------------------------------------------------+
                         |
                  VXLAN Encapsulation
                         |
                         v
+----------------------------------------------------------+
| Outer Header                                             |
| src: 192.168.1.10  dst: 192.168.1.11  proto: UDP:8472    |
+----------------------------------------------------------+
| VXLAN Header (VNI: 1)                                    |
+----------------------------------------------------------+
| Inner Packet (original)                                  |
| src: 10.244.1.5   dst: 10.244.2.8   payload: HTTP GET    |
+----------------------------------------------------------+

The physical network only sees the outer header: Node 1 sending UDP to Node 2. It routes normally. Node 2 receives, strips the outer header, and delivers the inner packet to Pod B.

VXLAN (Virtual Extensible LAN) is the most common overlay in Kubernetes (used by Flannel, Calico in VXLAN mode, etc.).
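
Under the hood this is an ordinary Linux VXLAN device. A hand-made equivalent of what Flannel sets up might look like this (VNI 1 and UDP 8472 are Flannel's defaults; the addresses and names follow the example and are illustrative):

# Create a VXLAN interface bound to the node's physical NIC
ip link add flannel.1 type vxlan id 1 dev eth0 local 192.168.1.10 dstport 8472 nolearning
ip addr add 10.244.1.0/32 dev flannel.1
ip link set flannel.1 up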

Key components on each node:

  1. VXLAN interface (flannel.1, vxlan.calico): A virtual interface that handles encap/decap
  2. FDB (Forwarding Database): Maps pod IPs/MACs to node IPs
  3. Routes: Direct pod CIDR traffic to the VXLAN interface

# See the VXLAN interface
ip -d link show flannel.1

# See the FDB entries (which node has which pods)
bridge fdb show dev flannel.1

# See routes to other pod CIDRs
ip route | grep 10.244

Example routing table on Node 1:

10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1  # Local pods
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink              # Node 2's pods
10.244.3.0/24 via 10.244.3.0 dev flannel.1 onlink              # Node 3's pods

Traffic to 10.244.2.x goes through flannel.1, which encapsulates it.
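
You can also ask the kernel which path a specific pod IP would take; on Node 1 the answer should point at the VXLAN interface (output roughly like):

ip route get 10.244.2.8
# 10.244.2.8 via 10.244.2.0 dev flannel.1 src 10.244.1.0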

Full journey for Pod A (10.244.1.5, Node 1) to Pod B (10.244.2.8, Node 2):

Node 1:
  1. Pod A sends: src=10.244.1.5, dst=10.244.2.8
  2. Packet exits pod via veth -> arrives at bridge (cni0)
  3. 10.244.2.8 is not in the local pod subnet (10.244.1.0/24)
  4. Node routing table sends the packet to flannel.1 (VXLAN interface)
  5. VXLAN encapsulates:
     - Outer: src=192.168.1.10, dst=192.168.1.11, UDP:8472
     - Inner: original packet
  6. Encapsulated packet sent on physical network (eth0)

Physical Network:
  7. Packet routed from 192.168.1.10 to 192.168.1.11

Node 2:
  8. eth0 receives packet
  9. Kernel sees UDP:8472 -> hands to VXLAN interface
  10. flannel.1 decapsulates, extracts inner packet
  11. Inner packet: src=10.244.1.5, dst=10.244.2.8
  12. Routes to cni0 (bridge)
  13. Bridge forwards to Pod B's veth
  14. Pod B receives original packet

VXLAN adds ~50 bytes of overhead (outer IP + UDP + VXLAN header). If your physical MTU is 1500:

Physical MTU:   1500
VXLAN overhead:  -50
Pod MTU:        1450

If pods use MTU 1500, packets that need the full size will fail (too big after encapsulation). Symptoms:

  • Small requests work, large requests hang
  • SSH works, SCP fails
  • TCP connections stall

Check MTU configuration:

# On pod
cat /sys/class/net/eth0/mtu

# On node VXLAN interface
cat /sys/class/net/flannel.1/mtu

Most CNIs set pod MTU correctly, but misconfigurations happen.
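
A quick way to confirm an MTU problem is to send pings with the don't-fragment bit set and a payload near the limit (assuming the pod image ships an iputils-style ping that supports -M):

# 1422-byte payload + 28 bytes of ICMP/IP headers = 1450, which should fit
kubectl exec -it pod-a -- ping -c 3 -M do -s 1422 10.244.2.8
# 1472 + 28 = 1500 - too big once VXLAN adds its ~50 bytes; if this hangs or
# errors while the smaller ping works, you are looking at an MTU problem
kubectl exec -it pod-a -- ping -c 3 -M do -s 1472 10.244.2.8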

Overlay isn’t the only option. Calico can run in BGP mode:

  • Each node advertises its pod CIDR to the network
  • Routers learn: “10.244.2.0/24 is behind 192.168.1.11”
  • No encapsulation needed — native routing

Trade-offs:

  • BGP: No overhead, but requires network integration (not all environments support it)
  • VXLAN: Works anywhere, but has overhead

Cloud providers often use their own routing (VPC routes in AWS/GCP) — no overlay, no BGP, just cloud magic.
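
You can usually tell which mode a node is using from its routes for remote pod CIDRs (interface names and route attributes vary by CNI; these lines are illustrative):

# Overlay (VXLAN): remote pod CIDRs point at the VXLAN device
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink

# BGP (e.g. Calico with BIRD): remote pod CIDRs point at the other node's real IP
10.244.2.0/24 via 192.168.1.11 dev eth0 proto bird

# Cloud-routed: often no per-node route at all - the VPC route table
# sends 10.244.2.0/24 to Node 2's network interface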

Let’s put it all together. Pod A curls my-service:8080:

NODE 1 (192.168.1.10)
======================
Pod A (10.244.1.5)
    | (1) curl my-service:8080
    |     DNS resolves to 10.96.45.67
    |     Packet: src=10.244.1.5, dst=10.96.45.67:8080
    v
iptables (PREROUTING)
    | (2) DNAT: dst 10.96.45.67 -> 10.244.2.8
    |     Packet: src=10.244.1.5, dst=10.244.2.8:8080
    v
Bridge (cni0)
    | (3) 10.244.2.8 not local -> route lookup
    v
VXLAN (flannel.1)
    | (4) Encapsulate
    |     Outer: src=192.168.1.10, dst=192.168.1.11
    v
eth0 (192.168.1.10)
    | (5) Send to physical network
    v
==== Physical Network ====
    |
    v
NODE 2 (192.168.1.11)
======================
eth0 (192.168.1.11)
    | (6) Receive encapsulated packet
    v
VXLAN (flannel.1)
    | (7) Decapsulate
    |     Extract: src=10.244.1.5, dst=10.244.2.8
    v
Bridge (cni0)
    | (8) Forward to Pod B's veth
    v
Pod B (10.244.2.8)
    (9) Receive packet, process HTTP request

The return path:

  1. Pod B responds: src=10.244.2.8, dst=10.244.1.5
  2. VXLAN encapsulates, sends to Node 1
  3. Node 1 decapsulates
  4. conntrack matches the existing connection
  5. Reverse DNAT: src becomes 10.96.45.67 (the ClusterIP)
  6. Pod A receives response from “my-service”

Armed with this knowledge, you can debug systematically.

Step 1: Can the pod reach anything?

# From inside the pod
kubectl exec -it pod-a -- ping 8.8.8.8
kubectl exec -it pod-a -- ping 10.244.1.1  # Node's bridge IP

If this fails, the problem is basic connectivity (CNI, veth, bridge).

Step 2: Can the pod reach other pods on the same node?

kubectl exec -it pod-a -- ping <pod-on-same-node-ip>

If this fails: bridge or veth issue.

Step 3: Can the pod reach pods on other nodes?

kubectl exec -it pod-a -- ping <pod-on-different-node-ip>

If same-node works but cross-node fails: overlay problem.

Step 4: Can the pod reach the ClusterIP?

kubectl exec -it pod-a -- curl -v 10.96.45.67:8080

If direct pod IP works but ClusterIP fails: kube-proxy/iptables problem.

Step 5: Is DNS working?

kubectl exec -it pod-a -- nslookup my-service
kubectl exec -it pod-a -- cat /etc/resolv.conf

If IP works but name doesn’t: DNS problem (CoreDNS, resolv.conf).
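
If the name lookup is the problem, check CoreDNS itself (it conventionally runs in kube-system with the k8s-app=kube-dns label, but verify the label in your cluster):

kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
kubectl get endpoints kube-dns -n kube-system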

# SSH to the node, then:

# Find rules for your Service
iptables-save | grep <service-name>
iptables-save | grep <cluster-ip>

# List the KUBE-SERVICES chain
iptables -t nat -L KUBE-SERVICES -n --line-numbers

# Follow a specific Service chain
iptables -t nat -L KUBE-SVC-XXXXX -n

What to look for:

  • Is there a rule matching your ClusterIP?
  • Does the Service chain have endpoint rules?
  • Are the endpoint IPs correct?

Missing rules? Check if kube-proxy is running:

kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy

Capture traffic to see where packets go (or stop):

# On the pod (if tcpdump available)
kubectl exec -it pod-a -- tcpdump -i eth0 -n host 10.244.2.8

# On node, at the bridge
tcpdump -i cni0 -n host 10.244.1.5

# On node, at the VXLAN interface
tcpdump -i flannel.1 -n host 10.244.2.8

# On node, at the physical interface (see encapsulated packets)
tcpdump -i eth0 -n udp port 8472

# On destination node
tcpdump -i eth0 -n udp port 8472
tcpdump -i flannel.1 -n host 10.244.1.5

Interpret what you see:

  • Packets at cni0 but not flannel.1? Routing problem.
  • Packets at flannel.1 but not remote eth0? Physical network problem.
  • Packets at remote eth0 but not flannel.1? VXLAN decap problem.
  • Packets at remote cni0 but not pod? Bridge/veth problem.

Connection tracking state is the next thing to check:

# See all tracked connections
conntrack -L

# Filter for your Service
conntrack -L | grep 10.96.45.67

# Watch new connections
conntrack -E

Stale conntrack entries can cause weird issues (traffic to old pod IPs). Flushing can help (but disrupts existing connections):

conntrack -F
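
A more surgical option, if you suspect entries pointing at a Service or pod that no longer exists, is to delete only those (destination filters shown; adjust to your IPs):

# Drop tracked connections whose original destination is the ClusterIP
conntrack -D -d 10.96.45.67

# Or connections still aimed at a pod IP that has gone away
conntrack -D -d 10.244.2.8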

kube-proxy not running:

  • Symptom: ClusterIP doesn’t work, direct pod IP works
  • Check: kubectl get pods -n kube-system -l k8s-app=kube-proxy
  • Fix: Restart kube-proxy, check logs

CNI misconfiguration:

  • Symptom: Pods can’t communicate at all, or only on same node
  • Check: kubectl get pods -n kube-system for CNI pods (flannel, calico, etc.)
  • Check: /etc/cni/net.d/ for CNI config
  • Fix: Reinstall CNI, check config

iptables rules missing:

  • Symptom: ClusterIP doesn’t work after Service creation
  • Check: iptables-save | grep <service-ip>
  • Cause: kube-proxy error, RBAC issue
  • Fix: Check kube-proxy logs

MTU mismatch:

  • Symptom: Small packets work, large fail; TCP stalls
  • Check: MTU on pod, bridge, VXLAN interface
  • Fix: Configure CNI with correct MTU

NetworkPolicy blocking:

  • Symptom: Some pods can’t connect, others can
  • Check: kubectl get networkpolicy -A
  • Fix: Add appropriate NetworkPolicy rules

Firewall blocking VXLAN:

  • Symptom: Cross-node fails, same-node works
  • Check: tcpdump -i eth0 udp port 8472 — packets sent but not received?
  • Fix: Open UDP 8472 (VXLAN) between nodes

A quick cheat sheet for node-level inspection:

# Check all network interfaces
ip addr

# Check routes
ip route

# Check iptables NAT rules
iptables -t nat -L -n -v

# Check VXLAN FDB
bridge fdb show dev flannel.1

# Check CNI config
cat /etc/cni/net.d/*

# Check kube-proxy mode
kubectl logs -n kube-system -l k8s-app=kube-proxy | grep "Using"

# Check endpoints for a Service
kubectl get endpoints my-service

The magic of Kubernetes networking is built on:

  1. veths and bridges: Connect pods within a node
  2. iptables (kube-proxy): Implement Services via DNAT
  3. Overlay networks: Carry pod traffic across nodes via encapsulation

When debugging:

  • Trace the path systematically (pod -> bridge -> overlay -> remote node -> pod)
  • Use tcpdump at each hop to see where packets stop
  • Check iptables for Service issues
  • Check conntrack for connection state issues
  • Check MTU for “big packets fail” symptoms

The magic is just routing, NAT, and encapsulation. Once you know the layers, you know where to look.