The Invisible Score
You've had the conversation. Maybe in a sprint retro, maybe in a one-on-one, maybe in a thread that goes quiet too fast. Someone says the codebase is slowing the team down. Everyone nods. And then nothing happens.
Not because anyone disagrees. Because nobody can prove it.
The cost of a bad codebase doesn't appear on your velocity chart or your DORA metrics or your quarterly OKR review. It shows up as absence — features that weren't built, deploys that weren't attempted, hires that didn't stay. You can't put "things we didn't do" on a spreadsheet. This is why "technical debt" is the most agreed-upon and least-acted-upon problem in software engineering. The evidence is real. The proof is invisible.
A framework from piechowski.io proposes a simpler move: stop trying to measure the codebase directly. Measure the behaviors it produces instead.
The Score Nobody Keeps
The Codebase Drag Audit scores five behavioral signals, each rated 0–2.
Padded estimates. Not because engineers are bad at estimating — because they've learned what "two days of work" actually costs in this codebase. A 0 means estimates track reality. A 2 means every estimate carries an apology tax for coupling and blast radius the team can't predict.
Deploy fear. A 0 means the team ships daily without drama. A 2 means deploys are batched, avoided, or require a designated captain — someone whose job is to absorb the risk the system should be handling.
Avoided files. Every codebase has its haunted house — the module nobody touches, the one you build around instead of through. A 0 means most of the codebase is approachable. A 2 means there are files the team treats like load-bearing walls: too embedded to touch safely, so every change routes around them.
Unreliable tests. Not "no tests" — worse. Tests that pass while production breaks. Coverage numbers that represent confidence the team doesn't actually feel. A 0 means tests catch real regressions. A 2 means the test suite is theater.
Onboarding friction. How long before a new hire commits meaningful code. A 0 means days. A 2 means weeks — and the gap is filled with undocumented environment variables, broken seed data, and setup instructions that were accurate eighteen months ago.
Each signal scored independently. Total: 0–10.
The thresholds are clean: 0–3 is normal friction. 4–6 means drag is real and your team knows it. 7+ means the codebase is the bottleneck — and no amount of process change, hiring, or motivational talks will fix it.
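If it helps to make the arithmetic concrete, here is a minimal sketch of the scoring in Python. The signal names, the helper functions, and the example team are illustrative assumptions; the article itself prescribes only the 0–2 ratings and the thresholds above.

```python
# Minimal sketch of the drag audit arithmetic. Signal names, helpers, and
# the example team are illustrative, not the framework's own tooling.

SIGNALS = (
    "padded_estimates",
    "deploy_fear",
    "avoided_files",
    "unreliable_tests",
    "onboarding_friction",
)

def total_score(scores: dict) -> int:
    """Sum the five signals, each rated 0-2, into a 0-10 drag score."""
    for name in SIGNALS:
        if scores.get(name) not in (0, 1, 2):
            raise ValueError(f"{name} must be scored 0, 1, or 2")
    return sum(scores[name] for name in SIGNALS)

def interpret(total: int) -> str:
    """Map a total score onto the thresholds from the audit."""
    if total <= 3:
        return "normal friction"
    if total <= 6:
        return "drag is real and the team knows it"
    return "the codebase is the bottleneck"

# Hypothetical team: deploy fear and onboarding friction maxed out.
team = {
    "padded_estimates": 1,
    "deploy_fear": 2,
    "avoided_files": 1,
    "unreliable_tests": 1,
    "onboarding_friction": 2,
}
score = total_score(team)
print(score, "-", interpret(score))  # 7 - the codebase is the bottleneck
```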
What Behaviors Know That Dashboards Don't
What makes this framework land isn't the scoring. It's what's being measured.
Traditional engineering metrics measure output: lines shipped, tickets closed, deploys completed. The drag audit measures something different — the gap between what your team could do and what they actually do, visible in the behavioral patterns they've already developed.
Every one of those five signals is something your team already exhibits. They pad estimates. They fear deploys. They avoid files. They don't trust tests. They watch new hires struggle. None of this is hidden. It's in every retro, every standup, every hallway conversation where someone quietly changes the subject.
The problem is framing. Each behavior, taken alone, has a plausible alternative explanation. Estimates are padded because "estimating is hard." Deploys are scary because "production is complex." New hires are slow because "our domain is unique." Every individual symptom has an alibi.
The score strips the alibis away. Five independent signals all pointing the same direction isn't bad luck. It's an environment.
The behaviors you're most frustrated by on your team are the most honest data you have. Padded estimates aren't a morale problem. Deploy fear isn't a confidence problem. Avoided files aren't a courage problem. They're accurate readings of an environment that makes those responses rational. Your team isn't slow. Your team is correct — they've priced in the drag you haven't measured yet.
The Audit as Conversation
Here's how to run it. Thirty minutes, maximum.
Score each signal 0–2. Do it individually first — have each engineer score independently, then compare. The disagreements are as revealing as the scores. When one person rates deploy fear a 0 and another rates it a 2, that tells you something about who absorbs the risk and who's shielded from it.
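A rough sketch of what that comparison step might look like in code. The engineers, their scores, and the spread heuristic are all assumptions for illustration; nothing here comes from the framework itself.

```python
from statistics import mean

# Illustrative only: compare each engineer's independent scores and flag
# the signals where ratings diverge, since those are the ones to discuss.

individual_scores = {
    "engineer_a": {"padded_estimates": 1, "deploy_fear": 0, "avoided_files": 2,
                   "unreliable_tests": 1, "onboarding_friction": 2},
    "engineer_b": {"padded_estimates": 2, "deploy_fear": 2, "avoided_files": 2,
                   "unreliable_tests": 1, "onboarding_friction": 2},
}

for signal in ("padded_estimates", "deploy_fear", "avoided_files",
               "unreliable_tests", "onboarding_friction"):
    ratings = [scores[signal] for scores in individual_scores.values()]
    spread = max(ratings) - min(ratings)
    flag = "  <- discuss: who absorbs this risk, who is shielded?" if spread >= 2 else ""
    print(f"{signal:22} avg {mean(ratings):.1f}  spread {spread}{flag}")
```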
The number you get isn't a judgment. It's a shared language.
Before the audit, "the codebase is slow" is a feeling. After, "we're a 7 — deploy fear and onboarding friction are our highest signals" is a diagnosis. Feelings get acknowledged in retros and forgotten by the next sprint. Diagnoses get resourced in planning.
Start with the highest-scoring signal. Don't attempt a comprehensive refactoring push — that's force applied to complexity, and it almost always stalls. A deploy-fear score of 2 tells you exactly where to invest: CI speed, rollback automation, smaller batch sizes. An onboarding-friction score of 2 points somewhere different: documentation, seed data, environment setup. The score doesn't just tell you there's a problem. It tells you which problem to solve first.
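As a small illustration of that prioritization, the sketch below ranks a hypothetical team's signals and attaches the two investment lists named above; treat the structure and names as assumptions, not as part of the framework.

```python
# Sketch, not the framework's tooling: rank signals by score to pick the
# first target. Focus notes for deploy fear and onboarding friction echo
# the article; the other signals are left for the team to fill in.

FOCUS_NOTES = {
    "deploy_fear": "CI speed, rollback automation, smaller batch sizes",
    "onboarding_friction": "documentation, seed data, environment setup",
}

team = {
    "padded_estimates": 1,
    "deploy_fear": 2,
    "avoided_files": 1,
    "unreliable_tests": 1,
    "onboarding_friction": 2,
}

for signal, score in sorted(team.items(), key=lambda kv: kv[1], reverse=True):
    note = FOCUS_NOTES.get(signal, "team decides where to invest")
    print(f"{score}  {signal:22} -> {note}")
```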
Run a two-week targeted sprint against that signal. Measure before and after. Re-score.
The most useful diagnostics don't discover new information. They make visible what everyone already knows but can't say out loud.
Your team has been measuring codebase drag for years — with their estimates, their deployment rituals, their workarounds, their warnings to new hires. They just didn't have a number to attach to it. Now they do. The score doesn't fix the codebase. It fixes the conversation. And the conversation is where investment starts.
Source: piechowski.io — "Why Your Engineering Team Is Slow (It's the Codebase, Not the People)" (2026-03-31)