Your engineering team is working hard. Nobody’s slacking. Standups are full, Slack is active, PRs are flowing, sprints close on time.
Yet a feature that should take three days still takes three weeks to reach a customer. Not because anyone is coasting, but because the system around them rewards effort that’s visible over effort that matters.
Why Activity Wins
What's on the performance review:
"Impact," "ownership," "collaboration," "technical excellence"
What leadership actually notices:
Always visible in Slack → "strong communicator"
Speaks up in every meeting → "has leadership presence"
Name on lots of PRs → "high output"
Lots of commits, lots of lines → "prolific"
Files tickets proactively → "takes initiative"
Joins every cross-team thread → "collaborative"
What leadership doesn't see:
Spent 3 days thinking through a hard design
Wrote 50 lines that saved the team 5,000
Said nothing in the meeting because nothing needed saying
Didn't file a ticket because they just fixed it
None of these signals appear on any official rubric. They don’t have to. They shape who gets called a “high performer” at calibration and who gets told to “increase visibility.” This isn’t because leaders are shallow; measuring individual progress is genuinely hard, and visible signals become proxies because better ones don’t exist yet. But developers quickly figure out what gets rewarded, and, as in any adverse-selection dynamic, the visible work starts crowding out the valuable work.
                   Activity Signals        Progress Signals
──────────────────────────────────────────────────────────────
Visible?           Yes, immediately        Only in retrospect
Countable?         Yes, trivially          Requires judgment
Attributable?      To individuals          Distributed across teams
Shapes promo?      Strongly (implicit)     Weakly, if at all
What shapes promotions and what moves the business have almost nothing in common.
Where the Hours Go
A developer's 8-hour day:
Meetings                                 ████████░░░░░░░░   1.5h
Context switches (Slack, email, PRs)     █████░░░░░░░░░░░   1.0h
Waiting (CI, reviews, deploys, envs)     ████████░░░░░░░░   1.5h
Shallow coding (small fixes, comments)   ██████████░░░░░░   2.0h
Deep work (actual feature building)      ██████████░░░░░░   2.0h
                                                          ────────
                                                           8.0h total
Activity metrics capture all five categories. They’re all “work.” Progress comes from the last one.
What leadership implicitly rewards:
Slack threads resolved → visible
Meetings attended → visible
PRs reviewed → visible
Tickets created & closed → visible
Lines of code committed → visible
Deep work on hard problems → invisible
Visible hours: 6
Deep work hours: 2
Some of those visible hours do matter — a code review catches a real bug, a meeting unblocks a dependency. The problem isn’t that visible work is worthless. It’s that the system can’t distinguish visible work that moved the needle from visible work that didn’t.
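To put numbers on the split, a minimal sketch (hours from the breakdown above; the category labels are just for this example):

```python
# Hours from the sample 8-hour day above.
day_hours = {
    "meetings": 1.5,
    "context switches": 1.0,
    "waiting": 1.5,
    "shallow coding": 2.0,
    "deep work": 2.0,
}

# Everything except deep work registers on an activity dashboard.
visible = sum(h for name, h in day_hours.items() if name != "deep work")
deep = day_hours["deep work"]
total = sum(day_hours.values())

print(f"visible hours:   {visible:.1f}")  # 6.0
print(f"deep-work hours: {deep:.1f}")     # 2.0
print(f"share of the day that produces progress: {deep / total:.0%}")  # 25%
```

A quarter of the day produces the progress; the other three quarters produce the metrics.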
Where the Other Hours Go
Sometimes the developer is trying to ship the feature. The infrastructure won’t let them.
A 3-day feature, in practice:
Day 1: Start coding. CI is backed up: 45 min queue.
       Push a fix. Flaky test fails. Re-run. Wait again.
       Staging env is broken. File a ticket. Switch to something else.
Day 2: Staging is back. Deploy fails: config drift.
       Slack thread with platform team. Meeting to triage.
       Env fixed by EOD. No code shipped.
Day 3: Finally deploy to staging. Works.
       Prod deploy needs approval. Approver is OOO.
Day 4: Approved. Deploy to prod. Rollback: dependency issue.
       Debug. Fix. Redeploy. Works.
Day 5: Feature live. 3 days of coding. 5 days of calendar.
Activity generated: 14 Slack threads, 3 tickets,
2 meetings, 8 CI runs, 4 deploys.
Broken internal infrastructure converts deep work into activity — waiting waste, defect waste, motion waste packed into a single sprint. The developer didn’t choose to spend two days fighting CI and staging — the system forced it. And every hour fighting broken tooling shows up as “activity” on someone’s dashboard: tickets filed, threads resolved, incidents triaged. At some point, infrastructure itself becomes the bottleneck on what the team can ship.
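Lean has a name for measuring this: flow efficiency, the share of calendar time a work item is actively worked rather than waiting. A minimal sketch using the timeline above (touch time is the 3 coding days the developer actually got):

```python
# Flow efficiency = touch time / calendar time.
# Numbers from the 5-day feature timeline above.
touch_days = 3.0      # time actually spent building the feature
calendar_days = 5.0   # elapsed time from first commit to live in prod

flow_efficiency = touch_days / calendar_days
print(f"flow efficiency: {flow_efficiency:.0%}")  # 60%

# The missing 40% is queue time: CI backlogs, a broken staging env,
# an OOO approver, a rollback. None of it was the developer's choice,
# and all of it registered as "activity" on someone's dashboard.
```

The exact split matters less than the fact that most orgs never measure it at all.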
The irony: fixing the infrastructure would reduce activity metrics across the board. Fewer tickets, fewer Slack threads, fewer heroic debugging sessions. On the dashboard, it looks like the team got less productive — which is why most infrastructure investments die in the valley before they pay off.
The Throughput Illusion
Goldratt called this out decades ago: throughput is value reaching users, not things produced. A feature behind a flag that never gets flipped isn’t progress — it’s inventory. Not all non-user-facing work is waste — migrations, tech debt reduction, and platform improvements enable future throughput. But the question is whether that’s what’s actually happening, or whether it’s the story told after the fact.
Team Alpha:
PRs merged per week: 50
% reaching users: 10%
Throughput: 5 PRs of user value
Team Beta:
PRs merged per week: 15
% reaching users: 80%
Throughput: 12 PRs of user value
Team Alpha looks 3.3x faster on the activity dashboard. Team Beta delivers 2.4x more value.
                             Alpha     Beta
──────────────────────────────────────────────────
Output (PRs/week):              50       15
Throughput (user value):         5       12
Waste (unused output):          45        3
Waste ratio:                   90%      20%
If you're the VP of Engineering:
Activity report says: "Alpha is our top team, ship more like them"
Throughput report says: "Alpha has a 90% waste ratio"
Same data. Two conclusions. One of them costs you quarters of misallocated investment.
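The two conclusions come from the same two inputs per team. A minimal sketch of the arithmetic (figures from the tables above):

```python
def throughput_report(prs_per_week: int, pct_reaching_users: float) -> dict:
    """Turn raw PR counts into throughput and waste figures."""
    throughput = prs_per_week * pct_reaching_users
    waste = prs_per_week - throughput
    return {
        "output": prs_per_week,
        "throughput": throughput,
        "waste_ratio": waste / prs_per_week,
    }

for team, prs, pct in [("Alpha", 50, 0.10), ("Beta", 15, 0.80)]:
    r = throughput_report(prs, pct)
    print(f"{team}: output={r['output']}/wk, "
          f"throughput={r['throughput']:.0f}/wk, "
          f"waste_ratio={r['waste_ratio']:.0%}")

# Alpha: output=50/wk, throughput=5/wk, waste_ratio=90%
# Beta:  output=15/wk, throughput=12/wk, waste_ratio=20%
```

The activity report reads the first column; the throughput report reads the last two.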
Three Patterns
Each of these patterns has a benign explanation: smaller PRs, better CI/CD, legitimate estimation changes. They can also indicate gaming. The data alone won’t tell you which; the trend lines will. A sketch for checking them follows Pattern 3.
Pattern 1 — PR inflation:
Quarter    PRs/sprint    Avg lines/PR    Features shipped
─────────────────────────────────────────────────────────────
Q1             28            340                4
Q2             35            210                4
Q3             47            120                3
PRs went up 68%. Features went down.
Work is being split to hit the metric. The overhead of reviewing three PRs instead of one slows delivery.
Pattern 2 — Deploy frequency theater:
Quarter    Deploys/day    Rollback rate    Incidents/week
──────────────────────────────────────────────────────────────
Q1             1.2             3%                2
Q2             2.1             3%                3
Q3             3.2             3%                5
Deploys went up 167%. Quality stayed the same.
Incidents scaled linearly — more deploys, more fires.
Higher frequency with unchanged risk just means more incidents. Config changes and no-op deploys inflate the number without touching confidence.
Pattern 3 — Story point inflation:
Quarter    Points/sprint    Customer features    Internal churn
──────────────────────────────────────────────────────────────────
Q1              62                  5              12 tickets
Q2              74                  4              22 tickets
Q3              89                  4              38 tickets
Points up 44%. Customer features flat. Internal churn tripled.
Points measure estimated effort, not value. A 5-point internal refactor and a 5-point revenue feature look identical on the burndown chart.
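All three patterns have the same shape: an activity series rising while the outcome series stays flat or falls. A minimal sketch of that check (quarterly series from the tables above; the growth threshold is an illustrative assumption, not a standard):

```python
def diverging(activity: list[float], outcome: list[float],
              min_activity_growth: float = 0.25) -> bool:
    """Flag when activity grows materially while the outcome doesn't.

    The 25% threshold is an illustrative default; tune for your cadence.
    """
    activity_growth = activity[-1] / activity[0] - 1
    outcome_growth = outcome[-1] / outcome[0] - 1
    return activity_growth >= min_activity_growth and outcome_growth <= 0

# Quarterly series (Q1 -> Q3) from the patterns above.
print(diverging([28, 35, 47], [4, 4, 3]))  # True: PR inflation
print(diverging([62, 74, 89], [5, 4, 4]))  # True: story point inflation

# Pattern 2 wants a ratio instead: if incidents per deploy stay flat
# while deploy count climbs, the extra frequency added fires, not safety.
deploys_per_week = [d * 7 for d in (1.2, 2.1, 3.2)]
incidents_per_week = [2, 3, 5]
per_deploy = [i / d for i, d in zip(incidents_per_week, deploys_per_week)]
print([round(r, 2) for r in per_deploy])  # [0.24, 0.2, 0.22], roughly constant
```

None of this proves gaming on its own; it tells you where to go ask questions.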
What These Metrics Miss
| Metric | What It Measures | Gaming Vector | What It Misses |
|---|---|---|---|
| PRs merged | Code movement | Split PRs smaller | Impact of each PR |
| Deploy frequency | Release cadence | Deploy config changes | Whether deploys matter |
| Story points | Estimated effort burned | Re-estimate higher | Value of the effort |
| Slack activity | Communication volume | Reply to everything | Whether anything was decided |
| Lines changed | Volume of code | Refactors, formatting | Whether it’s the right code |
| Tickets closed | Backlog throughput | Create-and-close | Whether anyone needed the ticket |
All of these can trend up while the product stands still. The measures that would actually catch it are harder to measure, slower to move, and don’t fit neatly into a quarterly calibration:
How much calendar time is real work vs. waiting
How much of what you build reaches users, and whether they found it useful
How much time developers lose to broken tooling
Which is why most orgs default to counting activity instead.
The Uncomfortable Truth
The activity trap:
1. Leadership implicitly rewards visibility
2. Developers learn what gets noticed (rational response)
3. Activity goes up (looks like success)
4. Progress stays flat (nobody's tracking that)
5. Leadership asks "why aren't we shipping faster?"
6. Response: measure more activity (goto 1)
Nobody designed this on purpose. Leaders default to visible signals because measuring progress is genuinely hard. Developers optimize for what gets rewarded because that’s rational. Everyone is stuck in the same loop.
Breaking out starts with one honest question at the next calibration: “what changed for our users this quarter?” Not PRs merged, not tickets closed, not Slack activity — what actually changed. If the room goes quiet, that’s the gap between activity and progress. Closing it is uncomfortable, slow, and probably the most important work leadership can do.