Always Fix Broken Windows
This three-part series, written by Peter Antman, traces the evolution of quality assurance, looks at the current state of "quality" in the software industry, and explains what we all must do to raise quality standards throughout the technological world. In this post, Peter explains how poor processes, if left untouched, lead to compounding deterioration in the quality of your product.
Think of old bicycles in cities. A bike is left locked up on the sidewalk for a while. Someone steals the bell. Left in that state in the same place for long enough, the bike will seem to fall apart by itself, as if rotting.
In 1982, the social scientists James Q. Wilson and George L. Kelling formulated a theory from observations like these, known as “broken windows.” A landscape, they argued, signals the social norms prevalent in the local community. A broken window left broken says: “it's okay to break windows.” A broken bicycle sends the same signal.
There are deep ramifications for software developers here. A broken build that is not fixed quickly is just like a broken window. It signals that no one cares. It signals that it’s okay to break the build. Gradually, morale will decline. I've seen this happen time and time again.
In the first two posts in this series, I wrote about how to build quality in by stopping the line, and about how implementing that in software development requires social change.
If you have a continuous integration environment that is frequently (or always) red and you don’t stop the line, the system will decline gradually over time. Every time the line is not stopped, the team is pushed toward producing more bugs, more broken windows.
At a previous workplace, I eventually got fed up and decided to try to fix this downward trend. For a number of years we had tried to come to grips with a growing number of defects. We had a good focus on testing, we had a rather huge testing environment, and we had done a number of so-called “go greens” and cleanups (i.e. fixing bugs and broken tests), but the number of bugs kept growing, and with them the emergency customer cases.
Having tried almost everything, I instituted a zero-bug policy, which basically meant that no bugs should ever be allowed to go into a release, and that any bug found by customers should be fixed immediately.
Practically, this meant we now had a stop-the-line policy. If the test environment was red, people had to stop coding and find out why. If a bug came in as a support issue it would have higher priority than any other development issue. Period. No argument.
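The rule in the previous paragraph can be sketched as a small decision function. This is only an illustration, not the tooling we actually built: the names Kind, WorkItem and next_task are hypothetical, and the build status is assumed to arrive as a simple boolean.

```python
from dataclasses import dataclass
from enum import IntEnum


class Kind(IntEnum):
    # Hypothetical categories; a lower value means a higher priority.
    RED_BUILD = 0
    SUPPORT_BUG = 1
    FEATURE = 2


@dataclass(frozen=True)
class WorkItem:
    title: str
    kind: Kind


def next_task(build_is_green: bool, backlog: list[WorkItem]) -> WorkItem:
    """Pick what to work on next under a stop-the-line rule."""
    if not build_is_green:
        # Stop the line: nothing from the backlog is picked while the build is red.
        return WorkItem("Find out why the test environment is red", Kind.RED_BUILD)
    if not backlog:
        raise ValueError("backlog is empty")
    # min() returns the first item with the lowest kind, so support bugs
    # outrank features and backlog order breaks ties within a kind.
    return min(backlog, key=lambda item: item.kind)
```

With a green build, a backlog holding one feature and one support bug yields the support bug. With a red build, the same call returns the "find out why" task regardless of what is in the backlog.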
That was, of course, easier said than actually done. It took a couple of years to really get it working, and we had to develop a number of tools to help us achieve this. The most important of these changes were the social ones, redefining our culture into this:
“We write fault-free code. We write tests to achieve that goal. When a test fails we fix the problem immediately. A bug is just an unwritten test that failed, which must therefore be fixed immediately.”
Eventually, the policy got the name “Fix Next Sprint,” which meant that bugs discovered outside of daily work would automatically be fixed in the next sprint, since the cost of disturbing the team in-sprint was higher than the cost of waiting one week (yes, we had one-week sprints). Sometimes a bug would have to wait a couple more sprints, but in the end every bug was handled.
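As a rough illustration of the scheduling rule, the sketch below computes which sprint a newly reported bug lands in. It assumes sprints run back to back from a known start date; the function name fix_sprint_start and its parameters are hypothetical, and only the one-week sprint length comes from the description above.

```python
from datetime import date, timedelta

SPRINT_LENGTH = timedelta(weeks=1)  # one-week sprints, as described above


def fix_sprint_start(reported: date, first_sprint_start: date) -> date:
    """Return the start date of the sprint after the one the bug was reported in."""
    if reported < first_sprint_start:
        raise ValueError("bug was reported before the first sprint started")
    # Number of whole sprints completed before the report date.
    sprints_elapsed = (reported - first_sprint_start) // SPRINT_LENGTH
    # "Fix Next Sprint": schedule the bug for the sprint that follows.
    return first_sprint_start + (sprints_elapsed + 1) * SPRINT_LENGTH
```

The sketch covers only the default case. A bug that had to wait a couple more sprints would simply be moved forward by whole multiples of SPRINT_LENGTH.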
This stopped the mountain of defects from growing and, in time, it also meant far fewer customer emergency cases. Some of us actually missed the hectic focus of severe crises, but hey, that was a small price to pay. And I dare say, as long as we don’t have instant feedback loops, this is the only way to fix the problem: Build the quality in by stopping the line manually!
Unfortunately, this is not something you do just once. Every day the fundamental ethical message must be signaled: We do not yield! It’s a truly Sisyphean task that echoes the almost existential credo, “in spite of it all!” That is, it's a lot of work, but a meaningful and worthwhile struggle.
I keep a close watch on these tests of mine
I keep my Jenkins open all the time
I see a defect coming down the line
Because you're mine, I stop the line
About the author:
Peter Antman has a passion for improving how we build software together. Currently he is coaching companies doing agile transformations as a consultant at Crisp. He has a background as Head of Development of Atex Polopoly, where he enjoyed building large-scale test environments. He was also one of the early committers to the JBoss project.