Activity vs Progress: The Visibility Trap in Engineering Productivity


Your engineering team is working hard. Nobody’s slacking. Standups are full, Slack is active, PRs are flowing, sprints are completing on time.

A feature that should take three days still takes three weeks to reach a customer. Not because anyone is coasting — because the system around them rewards effort that’s visible over effort that matters.

What's on the performance review:
  "Impact," "ownership," "collaboration," "technical excellence"

What leadership actually notices:
  Always visible in Slack                → "strong communicator"
  Speaks up in every meeting             → "has leadership presence"
  Name on lots of PRs                    → "high output"
  Lots of commits, lots of lines         → "prolific"
  Files tickets proactively              → "takes initiative"
  Joins every cross-team thread          → "collaborative"

What leadership doesn't see:
  Spent 3 days thinking through a hard design
  Wrote 50 lines that saved the team 5,000
  Said nothing in the meeting because nothing needed saying
  Didn't file a ticket because they just fixed it

None of these signals are on any official rubric. They don’t have to be. They shape who gets called “high-performer” at calibration and who gets told to “increase visibility.” This isn’t because leaders are shallow — measuring individual progress is genuinely hard, and visible signals become proxies because better ones don’t exist yet. But developers quickly figure out what gets rewarded, and, as in any market where quality is hard to observe, visible work starts crowding out valuable work.

                Activity Signals          Progress Signals
──────────────────────────────────────────────────────────────
Visible?        Yes, immediately          Only in retrospect
Countable?      Yes, trivially            Requires judgment
Attributable?   To individuals            Distributed across teams
Shapes promo?   Strongly (implicit)       Weakly, if at all

What shapes promotions and what moves the business have almost nothing in common.

A developer's 8-hour day:

Meetings                                ████████░░░░░░░░  1.5h
Context switches (Slack, email, PRs)    █████░░░░░░░░░░░  1.0h
Waiting (CI, reviews, deploys, envs)    ████████░░░░░░░░  1.5h
Shallow coding (small fixes, comments)  ██████████░░░░░░  2.0h
Deep work (actual feature building)     ██████████░░░░░░  2.0h
                                        ────────────────
                                        8.0h total

Activity metrics capture all five categories. They’re all “work.” Progress comes from the last one.

What leadership implicitly rewards:
  Slack threads resolved          → visible
  Meetings attended               → visible
  PRs reviewed                    → visible
  Tickets created & closed        → visible
  Lines of code committed         → visible
  Deep work on hard problems      → invisible

Visible hours:     6
Deep work hours:   2

Some of those visible hours do matter — a code review catches a real bug, a meeting unblocks a dependency. The problem isn’t that visible work is worthless. It’s that the system can’t distinguish visible work that moved the needle from visible work that didn’t.

Sometimes the developer is trying to ship the feature. The infrastructure won’t let them.

A 3-day feature, in practice:

Day 1:  Start coding. CI is backed up — 45 min queue.
        Push a fix. Flaky test fails. Re-run. Wait again.
        Staging env is broken. File a ticket. Switch to something else.

Day 2:  Staging is back. Deploy fails — config drift.
        Slack thread with platform team. Meeting to triage.
        Env fixed by EOD. No code shipped.

Day 3:  Finally deploy to staging. Works.
        Prod deploy needs approval. Approver is OOO.

Day 4:  Approved. Deploy to prod. Rollback — dependency issue.
        Debug. Fix. Redeploy. Works.

Day 5:  Feature live. 3 days of coding. 5 days of calendar.
        Activity generated: 14 Slack threads, 3 tickets,
        2 meetings, 8 CI runs, 4 deploys.

Broken internal infrastructure converts deep work into activity — waiting waste, defect waste, motion waste packed into a single sprint. The developer didn’t choose to spend two days fighting CI and staging — the system forced it. And every hour fighting broken tooling shows up as “activity” on someone’s dashboard: tickets filed, threads resolved, incidents triaged. At some point, infrastructure itself becomes the bottleneck on what the team can ship.
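One number that captures this conversion is flow efficiency: time spent actually advancing the work divided by total calendar time. A minimal sketch using the timeline above (treating the 3-day estimate as active work and an 8-hour day as the baseline — both simplifying assumptions):

```python
def flow_efficiency(active_hours, calendar_days, hours_per_day=8):
    """Fraction of elapsed calendar time spent actually advancing the work.

    Everything else -- CI queues, broken staging, approval delays --
    is wait time that shows up as 'activity' instead.
    """
    return active_hours / (calendar_days * hours_per_day)

# 3 days of real coding spread across 5 calendar days
print(f"{flow_efficiency(3 * 8, 5):.0%}")  # 60%
```

The missing 40% is exactly the slice the activity dashboard records as tickets, threads, and re-runs.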

The irony: fixing the infrastructure would reduce activity metrics across the board. Fewer tickets, fewer Slack threads, fewer heroic debugging sessions. On the dashboard, it looks like the team got less productive — which is why most infrastructure investments die in the valley before they pay off.

Goldratt called this out decades ago: throughput is value reaching users, not things produced. A feature behind a flag that never gets flipped isn’t progress — it’s inventory. Not all non-user-facing work is waste — migrations, tech debt reduction, and platform improvements enable future throughput. But the question is whether that’s what’s actually happening, or whether it’s the story told after the fact.

Team Alpha:
  PRs merged per week:          50
  % reaching users:             10%
  Throughput:                   5 PRs of user value

Team Beta:
  PRs merged per week:          15
  % reaching users:             80%
  Throughput:                   12 PRs of user value

Team Alpha looks 3.3x faster on the activity dashboard. Team Beta delivers 2.4x more value.

                            Alpha       Beta
──────────────────────────────────────────────────
Output (PRs/week):          50          15
Throughput (user value):    5           12
Waste (unused output):      45          3
Waste ratio:                90%         20%

If you're the VP of Engineering:

  Activity report says:   "Alpha is our top team, ship more like them"
  Throughput report says: "Alpha has a 90% waste ratio"

Same data. Two conclusions. One of them costs you quarters of misallocated investment.
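The arithmetic behind both reports is trivial to reproduce. A sketch, using the team numbers from the tables above:

```python
def throughput_report(prs_per_week, pct_reaching_users):
    """Split raw output into user-facing throughput and unused waste."""
    throughput = round(prs_per_week * pct_reaching_users)
    waste = prs_per_week - throughput
    return {
        "output": prs_per_week,        # what the activity report sees
        "throughput": throughput,      # what users actually received
        "waste": waste,                # merged but never reached users
        "waste_ratio": waste / prs_per_week,
    }

alpha = throughput_report(50, 0.10)  # output 50, throughput 5,  waste ratio 90%
beta = throughput_report(15, 0.80)   # output 15, throughput 12, waste ratio 20%
```

Same inputs, two columns apart: which column the VP reads determines which team gets the headcount.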

Each of these patterns has a benign explanation — smaller PRs, better CI/CD, legitimate estimation changes. They can also indicate gaming. The data alone won’t tell you which. The trend lines will.

Pattern 1 — PR inflation:

Quarter     PRs/sprint    Avg lines/PR    Features shipped
─────────────────────────────────────────────────────────────
Q1          28            340             4
Q2          35            210             4
Q3          47            120             3

PRs went up 68%. Features went down.

Work is being split to hit the metric. The overhead of reviewing three PRs instead of one slows delivery.

Pattern 2 — Deploy frequency theater:

Quarter     Deploys/day    Rollback rate    Incidents/week
──────────────────────────────────────────────────────────────
Q1          1.2            3%               2
Q2          2.1            3%               3
Q3          3.2            3%               5

Deploys went up 167%. Quality stayed the same.
Incidents scaled linearly — more deploys, more fires.

Higher frequency with unchanged risk just means more incidents. Config changes and no-op deploys inflate the number without touching confidence.

Pattern 3 — Story point inflation:

Quarter     Points/sprint    Customer features    Internal churn
──────────────────────────────────────────────────────────────────
Q1          62               5                    12 tickets
Q2          74               4                    22 tickets
Q3          89               4                    38 tickets

Points up 44%. Customer features flat. Internal churn tripled.

Points measure estimated effort, not value. A 5-point internal refactor and a 5-point revenue feature look identical on the burndown chart.
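All three patterns share one shape: an activity metric rising while its outcome metric stays flat or falls. A crude detector for that shape, assuming quarterly series like the ones above (the 20% growth threshold is an arbitrary illustration, not a recommendation):

```python
def diverging(activity, outcome, threshold=0.20):
    """Flag when activity grew materially but the outcome didn't follow.

    activity, outcome: per-quarter series, oldest first.
    threshold: minimum relative growth in activity worth investigating.
    """
    activity_growth = (activity[-1] - activity[0]) / activity[0]
    outcome_growth = (outcome[-1] - outcome[0]) / outcome[0]
    return activity_growth >= threshold and outcome_growth <= 0

# Pattern 1: PRs per sprint up 68%, features shipped down
print(diverging([28, 35, 47], [4, 4, 3]))   # True -- investigate
# Healthy team: activity and outcome rising together
print(diverging([28, 35, 47], [4, 5, 6]))   # False
```

A True here isn't proof of gaming — it's the cue to go ask what changed, which is exactly the judgment the raw numbers can't supply.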

Metric            What It Measures          Gaming Vector            What It Misses
─────────────────────────────────────────────────────────────────────────────────────
PRs merged        Code movement             Split PRs smaller        Impact of each PR
Deploy frequency  Release cadence           Deploy config changes    Whether deploys matter
Story points      Estimated effort burned   Re-estimate higher       Value of the effort
Slack activity    Communication volume      Reply to everything      Whether anything was decided
Lines changed     Volume of code            Refactors, formatting    Whether it’s the right code
Tickets closed    Backlog throughput        Create-and-close         Whether anyone needed the ticket

All of these can trend up while the product stands still. The things that would actually catch this — how much calendar time is real work vs waiting, how much of what you build reaches users and whether they found it useful, how much time developers lose to broken tooling — are harder to measure, slower to move, and don’t fit neatly into a quarterly calibration. Which is why most orgs default to counting activity instead.

The activity trap:

  1. Leadership implicitly rewards visibility
  2. Developers learn what gets noticed (rational response)
  3. Activity goes up (looks like success)
  4. Progress stays flat (nobody's tracking that)
  5. Leadership asks "why aren't we shipping faster?"
  6. Response: measure more activity (goto 1)

Nobody designed this on purpose. Leaders default to visible signals because measuring progress is genuinely hard. Developers optimize for what gets rewarded because that’s rational. Everyone is stuck in the same loop.

Breaking out starts with one honest question at the next calibration: “what changed for our users this quarter?” Not PRs merged, not tickets closed, not Slack activity — what actually changed. If the room goes quiet, that’s the gap between activity and progress. Closing it is uncomfortable, slow, and probably the most important work leadership can do.