Any engineer who has ever worked at a somewhat successful company has inevitably run afoul of the “Do It Right” vs. “Do It Fast” mindset. A customer comes with a problem and cash in hand to resolve it. Engineering looks at it and comes up with a beautifully architected solution that they can have ready in a month. Then leadership demands it be done in a week. Corners get cut, product gets delivered, and engineers fill the code with “// TODO: Fix this later” comments or Jira backlog items to go clean it up later. And that’s where ideas die.
But what’s the true cost of technical debt? We’re all guilty of it, but can you really quantify its impact on an organization or a product?
Last week I came across a writeup by Abi Noba, “Technical Debt Cripples Productivity”, which is actually a review of a conference paper out of Sweden called “Technical Debt Cripples Software Developer Productivity”. In the paper, the authors study 43 software developers from 6 different companies within their network, surveying them twice a week over 7 weeks on a variety of areas of their work.
The surprising takeaway:
23% of their time was lost due to Technical Debt.
That’s basically a full day every week, lost to technical debt.
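As a quick back-of-the-envelope check on that “full day” claim, here is a small sketch. Only the 23% figure comes from the paper; the 40-hour, 5-day work week is my own assumption, and the function name is just for illustration.

```python
# Assumption: a 5-day, 40-hour work week. Only the 23% figure comes from the paper.
HOURS_PER_WEEK = 40
DAYS_PER_WEEK = 5
DEBT_FRACTION = 0.23


def time_lost(fraction, hours_per_week=HOURS_PER_WEEK, days_per_week=DAYS_PER_WEEK):
    """Return (hours, days) lost per week for a given fraction of working time."""
    return fraction * hours_per_week, fraction * days_per_week


hours, days = time_lost(DEBT_FRACTION)
print(f"{hours:.1f} hours, or {days:.2f} days, lost per week")
```

Under those assumptions it works out to a little over nine hours, or slightly more than one working day, per developer per week.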
When they dug deeper into what kind of issues the engineers were running into, they came up with this great visualization:
With the additional technical debt, there was growing uncertainty about what exactly the system did vs. what people thought it did, which meant a reliance on additional testing to ensure the previous behavior was maintained. In addition, trying to find documents and workarounds for the impacted systems was another leading time sink. Of course, a lot of time was also spent trying to clean up the debt, but that is a luxury many developers don’t get in a fast-moving organization.
It’s worth pointing out that the survey respondents skewed toward the embedded devices space, where a bad code change could brick hardware. That level of impact isn’t quite the same in other fields like cloud or game development. However, I bet you would find a similar distribution in almost all software engineering fields.
A related article is a whitepaper from Google on how they manage code quality, and its impact on developer productivity.
In a surprising, albeit creepy, Big Brother-ish move that would only work at an organization the size and scale of Google, they pulled access logs from a wide variety of internal tools that monitor developer productivity. These logs give them data on time spent writing code, lines of code written, time spent reviewing colleagues’ code, and more. Combining these logs with their existing internal engineering surveys, similar to those in the previous paper, they constructed a model of engineering productivity within their org.
In the upper right quadrant are the things that have both a large absolute effect on developer productivity and high statistical significance across multiple samples. The top 3 up there are:
- Satisfaction with Code Quality
- Priority Shifts
- Technical Debt
No surprise, priority shifts from leadership have a huge impact on productivity. But seeing general satisfaction with code quality and technical debt so prominent, particularly over things like internal changes to infrastructure or tooling, or even complicated processes (all of which appear in the lower left quadrant), was a real surprise to me. They’re even more impactful than an org change or slow build times.
So what can we learn from these findings? A few things, I think.
It’s not a perfect correlation and it’s not the sole factor by any means, but the data shows that developer satisfaction drops in a code base saddled with a lot of technical debt. Why? Because developers find themselves not doing fun, innovative new work, but instead trying not to bring down the house of cards they find themselves in. This leads to increased stress and slower throughput, due to the extra care and thought they have to put into the work.
Technical debt is important enough that it requires more than lip service from leadership. When forced to make a quick band-aid change to a system, engineering needs a “relief window” during which they can clean up the mess and prepare for the next time this happens. If technical debt keeps building without any opportunity for repair, it leads not only to system instabilities and technical problems but also to stress and burnout among engineers. See our previous post on Emotional Intelligence & Leadership.
As shown in the first paper, with proper processes around testing you can manage some level of technical debt. With enough automated testing around the uncertain areas of code, you can insulate developers from the “messiness” and let them treat it as a black box: So long as the same inputs yield the same outputs, you’re good. These tests can also be used later as part of refactoring to mitigate impacts on external systems, but that depends a lot on the extent of the refactoring.
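To make the black-box idea concrete, here is a minimal sketch of what is often called a characterization (or “golden master”) test. Everything in it is hypothetical: `legacy_shipping_cost` stands in for whatever messy routine you are afraid to touch, and the recorded cases are simply whatever the current code returned when it was last known to be acceptable.

```python
def legacy_shipping_cost(weight_kg, express):
    """Hypothetical stand-in for a messy legacy routine nobody wants to touch."""
    cost = 5.0
    if weight_kg > 10:
        cost += (weight_kg - 10) * 0.5
    if express:
        cost = cost * 2 + 1  # odd surcharge nobody remembers the reason for
    return round(cost, 2)


# Recorded by running the current code once and saving what it returned.
# These pin down what the system does today, not what it "should" do.
GOLDEN_CASES = [
    ((1, False), 5.0),
    ((10, True), 11.0),
    ((12, False), 6.0),
    ((20, True), 21.0),
]


def find_regressions(func, cases):
    """Return (args, expected, actual) for every case whose output has changed."""
    regressions = []
    for args, expected in cases:
        actual = func(*args)
        if actual != expected:
            regressions.append((args, expected, actual))
    return regressions


regressions = find_regressions(legacy_shipping_cost, GOLDEN_CASES)
print("regressions:", regressions)
```

The same recorded cases can be replayed against a refactored version of the function, so any change in observable behavior shows up as a non-empty list. A sketch like this only covers the inputs you thought to record, which is part of why its value depends on the extent of the refactoring.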
When leadership asks for heroic measures to get new features out into the wild, they need to understand that these typically come with side-effects beyond the requested feature. These shortcuts can compound to make future features harder and slower to implement, and lead to an overall slowdown in development due to the care that must be taken not to disturb an increasingly fragile system.
It’s important, as an engineering manager, to understand not only how long it will take to implement a feature, but also what future impacts that decision will have and whether those impacts can be mitigated. The same way we accrue story points or other metrics of work on a project, we should be collecting “debt points” that need to be repaid later for the health of the system and the staff.
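One way to picture “debt points” is as a simple ledger kept alongside the backlog. The sketch below is purely illustrative: the class names, the point values, and the example shortcuts are all made up, and in practice this would more likely be a label or custom field in your issue tracker than standalone code.

```python
from dataclasses import dataclass, field


@dataclass
class DebtItem:
    """One shortcut taken, with a rough estimate of the cost to clean it up."""
    description: str
    points: int
    repaid: bool = False


@dataclass
class DebtLedger:
    """Hypothetical running tally of debt points accrued and repaid."""
    items: list = field(default_factory=list)

    def accrue(self, description, points):
        item = DebtItem(description, points)
        self.items.append(item)
        return item

    def repay(self, item):
        item.repaid = True

    def outstanding(self):
        # Sum only the points for shortcuts that have not been cleaned up yet.
        return sum(item.points for item in self.items if not item.repaid)


ledger = DebtLedger()
hack = ledger.accrue("Hard-coded customer ID in billing job", 5)
ledger.accrue("Skipped integration tests for rush feature", 3)
ledger.repay(hack)
print("outstanding debt points:", ledger.outstanding())
```

Tracking the outstanding total over time gives a manager something to point to when asking for a relief window, in the same way a burndown chart supports a conversation about scope.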