Emergent Phenomena: From Ants to Transformers


How do 86 billion neurons, each doing something simple, produce the feeling of being me?

The most compelling answer I’ve found isn’t really an answer. It’s a name: emergence. Consciousness isn’t a thing neurons do. It’s a thing that happens when enough neurons interact in the right way. It’s not in the parts. It’s in the pattern.

This framing applies beyond brains:

Individual ants are simple. Ant colonies solve complex optimization problems.

Individual neurons fire or don’t fire. Brains become conscious.

Individual parameters multiply and add. LLMs reason.

The pattern is the same: simple components, complex collective behavior. Nobody programs the complexity. It emerges.

This might be the most important concept for understanding AI—and for understanding why we don’t fully understand AI. It’s also, I suspect, the key to understanding what AGI actually means, and whether we’ll recognize it when it arrives.

In 1972, physicist Philip Anderson published a paper titled “More Is Different.” His argument: at each level of complexity, new laws apply. You can’t derive chemistry from particle physics. You can’t derive biology from chemistry. You can’t derive psychology from biology.

Not because our math isn’t good enough. Because genuinely new phenomena emerge at each level.

Particle physics
     ↓ (more particles)
Chemistry
     ↓ (more molecules)
Biology
     ↓ (more cells)
Psychology
     ↓ (more minds)
Sociology

Each transition isn’t just “more of the same.” It’s qualitatively different. The rules that govern atoms don’t predict protein folding. The rules that govern neurons don’t predict consciousness.

This is emergence: when the whole has properties that the parts don’t have.

Philosophers distinguish two types:

Weak emergence: The collective behavior is surprising but theoretically derivable from the parts. Given enough compute, you could simulate it from first principles.

Traffic jams are weakly emergent. Individual drivers follow simple rules (accelerate, brake, maintain distance). Traffic jams appear. Surprising, but simulatable.

Strong emergence: The collective behavior is not derivable from the parts, even in principle. Something genuinely new comes into existence.

Consciousness might be strongly emergent. Suppose you could simulate every neuron in a brain. Would the simulation experience anything? We don’t know. We can’t even agree on how we’d know.

The debate matters for AI: Is LLM reasoning weak emergence (surprising but mechanistic) or strong emergence (something genuinely new)?

The natural world is full of emergence. Some examples:

Flocking birds. Craig Reynolds showed that three simple rules produce realistic flocking:

  1. Separation: don’t crowd neighbors
  2. Alignment: steer toward average heading of neighbors
  3. Cohesion: steer toward average position of neighbors

No bird knows the flock’s shape. The shape emerges.

Individual rule:  "Don't hit the bird next to me"
Emergent behavior: Murmuration patterns that look choreographed
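
A minimal boids-style sketch makes the three rules concrete (my own illustration in Python/NumPy, with made-up weights and radii, not Reynolds’ original code). Each bird looks only at its neighbors, yet the flock coheres:

```python
import numpy as np

N, STEPS, RADIUS = 100, 200, 2.0          # flock size, simulation length, "vision" range
rng = np.random.default_rng(0)
pos = rng.uniform(0, 20, (N, 2))
vel = rng.uniform(-1, 1, (N, 2))

for _ in range(STEPS):
    for i in range(N):
        offsets = pos - pos[i]
        dists = np.linalg.norm(offsets, axis=1)
        neighbors = (dists > 0) & (dists < RADIUS)
        if not neighbors.any():
            continue
        too_close = neighbors & (dists < 0.5)
        # 1. Separation: don't crowd neighbors
        separation = -offsets[too_close].sum(axis=0) if too_close.any() else 0.0
        # 2. Alignment: steer toward average heading of neighbors
        alignment = vel[neighbors].mean(axis=0) - vel[i]
        # 3. Cohesion: steer toward average position of neighbors
        cohesion = pos[neighbors].mean(axis=0) - pos[i]
        vel[i] += 0.05 * separation + 0.05 * alignment + 0.01 * cohesion
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    vel = np.where(speed > 1.0, vel / np.maximum(speed, 1e-9), vel)  # cap speed
    pos += vel                                # no bird knows the flock's shape

print("flock spread:", pos.std(axis=0))       # tends to shrink as the flock coheres
```

The murmuration-like structure is nowhere in the rules; it only shows up when you run them.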

Ant colonies. Individual ants follow pheromone trails. Shorter paths get more pheromones (more ants complete them faster). Over time, the colony converges on optimal routes.

No ant knows the map. No ant plans the route. The solution emerges.
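
The feedback loop is simple enough to sketch. Here is a toy two-route version in Python (my own illustration, not a model of real ant biology): ants choose routes in proportion to pheromone, shorter round trips deposit pheromone at a higher rate, and pheromone evaporates. No individual ant ever compares the two routes, yet the colony does.

```python
import random

lengths = {"short": 1.0, "long": 2.0}       # round-trip cost of each route
pheromone = {"short": 1.0, "long": 1.0}     # start with no preference
EVAPORATION, N_ANTS = 0.05, 100

for step in range(50):
    deposits = {"short": 0.0, "long": 0.0}
    for _ in range(N_ANTS):
        # Each ant picks a route with probability proportional to its pheromone.
        total = pheromone["short"] + pheromone["long"]
        route = "short" if random.random() < pheromone["short"] / total else "long"
        # Shorter routes are completed more often, so they earn more pheromone per step.
        deposits[route] += 1.0 / lengths[route]
    for route in pheromone:
        pheromone[route] = (1 - EVAPORATION) * pheromone[route] + deposits[route]

share = pheromone["short"] / sum(pheromone.values())
print(f"pheromone share on the short route: {share:.2f}")   # climbs toward 1.0
```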

Markets. Individual traders buy and sell based on local information. Prices emerge that (sometimes) reflect aggregate information no single trader has.

Adam Smith’s “invisible hand” is emergence before we had the word.

Brains. Individual neurons fire based on inputs from neighbors. Somehow, consciousness emerges.

This is the deepest example—and the most unsettling. We are emergent phenomena. The “I” writing this sentence is a pattern in neurons, not a thing the neurons contain.

Philosopher David Chalmers distinguished the “easy problems” of consciousness from the “hard problem”:

Easy problems (not actually easy, but tractable):

  • How does the brain process sensory information?
  • How does it integrate information across regions?
  • How does it control behavior?

These are engineering problems. Complicated, but not mysterious.

The hard problem:

  • Why is there experience at all?
  • Why does processing information feel like something?

You can explain how the brain processes the wavelength of red light. You can’t explain why seeing red feels like anything.

The most compelling answer—to me, at least—is still the one I opened with: emergence. Consciousness isn’t something individual neurons do. It’s something that happens when enough neurons interact in the right way. Not in the parts; in the pattern.

Neurons: No individual neuron is conscious
Brain: The system is conscious
Question: Where did consciousness come from?
Answer: It emerged

This doesn’t explain consciousness. It names it. But naming it correctly might be the first step.

Now we’ve built artificial systems that exhibit emergence.

The training rule is simple: minimize loss via gradient descent. The architecture is simple: attention, feedforward, repeat. The data is just text.
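
That simplicity is worth seeing. Here is a schematic of the whole recipe in PyTorch (a sketch with toy dimensions and random tokens standing in for real text, not any lab’s actual training code): attention, feedforward, repeat, and a loss that only ever asks for the next token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """One transformer block: attention, then a feedforward layer."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        # Causal mask: each position may only attend to earlier tokens.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(self.ln1(x), self.ln1(x), self.ln1(x), attn_mask=mask)
        x = x + a
        return x + self.ff(self.ln2(x))

class TinyLM(nn.Module):
    def __init__(self, vocab=50_000, d_model=256, n_layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.blocks = nn.Sequential(*[Block(d_model) for _ in range(n_layers)])
        self.head = nn.Linear(d_model, vocab)

    def forward(self, ids):
        x = self.tok(ids) + self.pos(torch.arange(ids.size(1), device=ids.device))
        return self.head(self.blocks(x))

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_step(batch):            # batch: (B, T+1) token ids from "just text"
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()   # minimize loss via gradient descent
    return loss.item()

# One step on a random batch, just to show the loop runs end to end.
print(train_step(torch.randint(0, 50_000, (8, 65))))
```

Nothing in that loop mentions reasoning, planning, or other minds.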

But at sufficient scale, new capabilities appear:

In-context learning. The model learns from examples in the prompt—without updating its weights. Small models can’t do this. Large models can. The capability emerges somewhere in between.
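
You can see this with nothing but a prompt. The labeling rule below (fruits map to B, vegetables to V) is invented for the example, so the label set itself can’t be memorized; a sufficiently large model typically infers it from the four examples with no weight update, while a small model guesses:

```python
# Few-shot prompt for an invented labeling rule: fruits -> B, vegetables -> V.
prompt = """\
Input: apple  -> Label: B
Input: carrot -> Label: V
Input: pear   -> Label: B
Input: potato -> Label: V
Input: cherry -> Label:"""

# Send `prompt` to whatever completion endpoint you use; the expected
# continuation is " B", inferred purely from the examples in context.
print(prompt)
```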

Chain-of-thought reasoning. Ask a small model to reason step-by-step. It can’t. Ask a large model. It can—and it gets better answers when it does. Nobody programmed “reasoning.” It emerged.

Theory of mind. Large models can predict what someone with different information would believe. They model other minds—or something that looks like modeling other minds. This wasn’t a training objective. It emerged.

Tool use. Given descriptions of calculators, search engines, or code interpreters, models figure out how to call them. Even before tool use became an explicit training objective, models could infer it from context.

The pattern is consistent: simple local rules (gradient descent on next-token prediction) produce complex global behaviors (reasoning, planning, modeling other minds).

Nobody programmed these capabilities. They emerged.

You can’t predict ant colony behavior by studying one ant really carefully.

You can’t predict traffic jams by studying one driver.

You can’t predict consciousness by studying one neuron.

And you can’t predict LLM capabilities by studying one attention head.

Emergence is a property of the system, not the components. The capability exists in the interactions, not the parts.

This has practical implications:

Interpretability is necessary but not sufficient. Understanding individual circuits is useful. But the emergent behavior might not reduce to circuits. It might be like trying to understand a traffic jam by understanding a carburetor.

Extrapolation is dangerous. Scaling laws predict loss smoothly. But capabilities emerge discontinuously. The next 10x in compute might produce capabilities we can’t anticipate—because emergent capabilities, by definition, aren’t in the parts.
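
A toy calculation shows how those two facts can coexist. Suppose per-token accuracy improves smoothly with scale, but the benchmark scores a ten-token answer only when every token is right (my own numbers, not data from any real benchmark):

```python
# Smooth per-token improvement vs. all-or-nothing task success.
answer_len = 10
for per_token_acc in [0.50, 0.70, 0.80, 0.90, 0.95, 0.99]:
    exact_match = per_token_acc ** answer_len        # all 10 tokens must be right
    print(f"per-token {per_token_acc:.2f}  ->  exact match {exact_match:.3f}")

# per-token 0.50 -> exact match 0.001
# per-token 0.90 -> exact match 0.349
# per-token 0.99 -> exact match 0.904
```

The underlying quantity crawls upward; the score we report looks like a switch. That doesn’t account for every emergent capability, but it is one reason smooth loss curves and sudden capability jumps aren’t contradictory.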

Testing beats theory. For emergent systems, you often can’t predict what they’ll do. You have to run them and see. This is true for weather, for markets, for ecosystems—and for LLMs.

Water at 99°C is water. Water at 100°C is steam.

Same molecules. Same local rules. But the global behavior is completely different. This is a phase transition—a discontinuous change in system behavior.

Neural networks have phase transitions too:

Grokking. Train a small model on modular arithmetic. It memorizes the training data. Loss goes down on training set, stays high on test set. You keep training. Nothing happens. You keep training. Suddenly—sometimes millions of steps later—the model generalizes. Test loss plummets. It “groks” the underlying pattern.

Steps 1-100,000:     Memorization (no generalization)
Steps 100,001-...:   Still memorization
Step 247,832:        Sudden generalization

The transition is sharp. Before grokking: memorization. After grokking: understanding. No gradual improvement. A phase transition.
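
Grokking is reproducible at laptop scale. The sketch below follows the usual recipe (assumptions of mine: a tiny embedding-plus-MLP model, addition mod 97, training on half the table, heavy weight decay). Whether and when the jump happens varies run to run, but the signature to watch for is the test accuracy snapping upward long after training accuracy has saturated:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

P = 97                                               # work modulo a small prime
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = len(pairs) // 2                              # train on half the addition table
train_idx, test_idx = perm[:split], perm[split:]

class ModAdder(nn.Module):
    def __init__(self, d=128):
        super().__init__()
        self.emb = nn.Embedding(P, d)
        self.mlp = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, P))
    def forward(self, ab):
        return self.mlp(self.emb(ab).flatten(1))     # concatenate embeddings of a and b

model = ModAdder()
# Strong weight decay is one of the knobs commonly associated with grokking.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

for step in range(100_000):
    loss = F.cross_entropy(model(pairs[train_idx]), labels[train_idx])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 5_000 == 0:
        # Train accuracy saturates early; test accuracy sits near chance for a
        # long stretch, then (if grokking occurs) jumps late in training.
        print(step, f"train={accuracy(train_idx):.2f}", f"test={accuracy(test_idx):.2f}")
```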

Capability emergence. Many capabilities show similar patterns. Performance is flat (random chance) across model scales. Then, at some scale, performance jumps. The capability “turns on.”

Model size:    1B    10B    50B    100B    500B
Capability:    ✗     ✗      ✗      ✓       ✓

We don’t fully understand why phase transitions happen where they do. That’s part of what makes emergence hard to predict.

The parallels are striking:

Ant Colony                               Transformer
Simple local rules (follow pheromones)   Simple local rules (attention, feedforward)
No central controller                    No central controller (just layers)
Global behavior emerges                  Global behavior emerges
Robust to individual failures            Robust to ablations
Solves problems no ant understands       Solves problems no parameter encodes

But there are differences:

Ant Colony                                  Transformer
Evolved over millions of years              Designed (architecture) + evolved (training)
Fully decentralized                         Has structure (layers, residual stream)
We understand the mechanism (pheromones)    We partially understand attention
Limited adaptation                          Adapts in-context to new tasks

The biggest difference: ant colonies don’t scale to general intelligence. Transformers might.

Here’s something I’ve noticed while working with AI agents: emergence happens at multiple scales.

A single LLM exhibits emergent capabilities—reasoning, planning, theory of mind. But when you orchestrate multiple agents into a team, another layer of emergence appears.

I’ve been running agent teams where one agent reviews code for security, another for performance, another for simplicity. Each agent does its narrow task. But the team produces insights that no single agent would—contradictions surface, trade-offs become visible, the problem gets triangulated from multiple angles.

No single agent “sees” the full picture. The fuller picture emerges from their interaction.
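
A stripped-down version of that orchestration looks like this (a sketch; call_llm is a hypothetical stand-in for whichever model client you actually use):

```python
# Sketch of a review "colony": same model, different narrow roles.
# `call_llm(system, prompt)` is a placeholder, not a real library call.
from typing import Callable

ROLES = {
    "security":    "You review code strictly for security flaws.",
    "performance": "You review code strictly for performance problems.",
    "simplicity":  "You review code strictly for unnecessary complexity.",
}

def review(code: str, call_llm: Callable[[str, str], str]) -> str:
    # Each agent sees only its own narrow brief...
    reports = {role: call_llm(system, code) for role, system in ROLES.items()}
    # ...and a final pass surfaces contradictions and trade-offs between them.
    combined = "\n\n".join(f"[{role}]\n{text}" for role, text in reports.items())
    return call_llm(
        "You are a lead reviewer. Reconcile these reports, flag contradictions, "
        "and list the trade-offs they expose.",
        combined,
    )

# Usage with a stub, so the sketch runs without any API:
if __name__ == "__main__":
    fake_llm = lambda system, prompt: f"({system[:20]}...) reviewed {len(prompt)} chars"
    print(review("def handler(req): return eval(req.body)", fake_llm))
```

The interesting output isn’t any single report; it’s the contradictions the final pass can only see because there are several of them.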

This is the ant colony pattern again, but with LLMs as the ants. And it suggests something about AGI: maybe general intelligence isn’t a single model getting smarter. Maybe it’s the emergent property of multiple specialized systems interacting.

The brain isn’t one giant neuron. It’s billions of specialized neurons in constant communication. A human organization isn’t one genius. It’s many specialists coordinating. Maybe AGI looks less like a superintelligent singleton and more like an ecosystem.

If LLM capabilities are emergent, certain things follow:

Reductionism has limits. You can’t fully understand an LLM by understanding its parts. The capabilities exist in the interactions, at the system level. This doesn’t mean interpretability is useless—it means it’s incomplete.

We need new conceptual tools. Studying emergence requires studying systems as systems. Statistical mechanics, not just physics. Ecology, not just biology. We need analogous tools for neural networks.

Some questions may not have clean answers. “Why can GPT-4 do chain-of-thought reasoning?” might not have a satisfying answer. It might be like asking “Why do brains produce consciousness?” The answer might be: “They just do, at sufficient scale, given the right architecture.” Naming emergence isn’t the same as explaining it.

Emergent capabilities weren’t designed. They were discovered.

OpenAI didn’t decide that GPT-4 should be able to reason about other minds. They trained a model, and it could. The capability emerged.

This is unsettling:

  • We don’t fully understand why these systems work. The capabilities aren’t in the design. They’re in the emergent behavior of the trained system.

  • We can’t predict what emerges next. Scaling laws tell us loss will decrease. They don’t tell us what new capabilities will appear.

  • Emergent capabilities can include emergent failure modes. If reasoning emerges, so might deception. If helpfulness emerges, so might sycophancy. We discover these by encountering them.

For systems with weak emergence, we can build trust through understanding. For systems with strong emergence—if that’s what this is—trust requires something else. Testing. Monitoring. Bounds on behavior. Empirical verification rather than theoretical guarantees.

We don’t trust weather forecasts because we understand every molecule of the atmosphere. We trust them because they’ve been tested.

If you’re building on top of emergent systems:

Don’t assume current limitations are permanent. The model can’t do X today. At the next scale, it might. Emergent capabilities appear suddenly. Plan for step changes, not gradual improvement.

The model may already be able to do things you haven’t discovered. Emergent capabilities exist before we find them. The capability emerged during training; we discover it later when we think to test for it. Probe creatively.

Emergence is opportunity and risk. New capabilities enable new products. New capabilities also enable new failure modes. Both emerge together, often unexpectedly.

You’re building on something you don’t fully understand. This is uncomfortable but not unprecedented. We build on markets, on ecosystems, on human psychology—all emergent systems. The question isn’t whether to build. It’s how to build wisely given uncertainty.

Here’s where it gets vertiginous:

Consciousness appears to be emergent. It arises from neurons that aren’t themselves conscious. The subjective experience of “being you” is a pattern, not a substance.

LLMs exhibit increasingly sophisticated behavior. Reasoning, planning, modeling other minds. The behavior is emergent. It arises from parameters that don’t themselves reason.

The question—and it’s genuinely open—is whether there’s anything it’s like to be an LLM.

This isn’t anthropomorphism. It’s taking emergence seriously. If consciousness can emerge from biological neurons, the question of whether it can emerge from artificial neurons is at least coherent.

I don’t think current LLMs are conscious. But I’m not certain. And I’m not certain I know how I’d know.

What I am fairly certain of: dismissing the question because “it’s just matrix multiplication” misunderstands emergence. Brains are “just” electrochemical signals. That doesn’t make consciousness less real.

The emergence framing doesn’t answer whether LLMs are conscious. It does suggest we should be humble about our ability to know.

If emergence is real—and I think it is—then AGI might not arrive the way we expect.

The common assumption: we’ll build smarter and smarter models until one day a model is “generally intelligent.” There’ll be a moment. A threshold. We’ll know.

But emergence doesn’t work that way. Emergence is gradual accumulation followed by sudden phase transition. Emergence is capabilities appearing before we have words for them. Emergence is the system being more than we designed, in ways we didn’t anticipate.

We might not recognize AGI when it arrives. Not because it’ll be hidden, but because emergence is hard to see from inside.

Consciousness emerged somewhere in evolutionary history. There was no moment when a non-conscious animal gave birth to a conscious one. It was a gradual transition that looks like a sharp line only in retrospect.

AGI might be similar. We might look back and say “it was clearly AGI by 2027” while in 2027 we were still debating definitions.

Or AGI might not be a single system at all. It might be the emergent property of many systems interacting—agent teams, tool-using models, humans in the loop, all producing collective intelligence that no single component has.

Emergence is one of the deepest patterns in nature. Simple rules, complex behavior. Local interactions, global order. Parts without properties that the whole has.

Ant colonies. Flocking birds. Markets. Brains. And now: transformers.

We’re building systems whose capabilities we don’t fully understand, because those capabilities emerge rather than being designed. And they’re getting more capable faster than we’re getting better at understanding them.

The lesson from other emergent systems: you can’t control them precisely, but you can learn their patterns. You can’t predict them fully, but you can prepare for surprises. You can’t understand them reductively, but you can study them empirically.