Where is that Red 'Stop' Button in Your Development Process?
This three-part series written by Peter Antman takes a journey through the evolution of quality assurance, takes a look at the current state of "quality" in the software industry and explains what we all must do to raise the standards of quality throughout the technological world. In this post, Peter explains the importance of combining automation with human touch in order to create products of the highest possible quality.
In the first post of this series, I wrote about Toyoda Sakichi, the founder of Toyota Industries, who in the 1920s invented a loom that would automatically stop when a thread broke. He thereby also invented the concept of “stop-the-line” to build quality in.
Incremental compile with visual feedback is a small step toward the automaticity of the Sakichi loom. Beyond that, we still have these longish feedback cycles, be it manually running unit tests or waiting on the automatic build or system tests run by our continuous integration (CI) system.
Even worse, most popular modern languages and environments, such as Python, JavaScript or Ruby, don't have incremental feedback mechanisms; you have to run them to see if they are even syntactically correct.
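As a partial mitigation, an editor or save hook can at least parse a file without executing it. The sketch below is an illustration and not something from the original series: it relies on Python's built-in compile(), and the function name syntax_errors and the default file name are hypothetical. It catches syntax errors only, so it falls well short of the feedback an incremental compiler gives.

```python
def syntax_errors(source: str, filename: str = "<editor buffer>") -> list:
    """Return syntax error messages for `source`; an empty list means it parses."""
    try:
        # compile() parses and byte-compiles the text but does not execute it.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return [f"{filename}:{exc.lineno}: {exc.msg}"]
    return []


if __name__ == "__main__":
    print(syntax_errors("total = 1 + 2\n"))        # parses cleanly
    print(syntax_errors("def broken(:\n    pass\n"))  # reports a problem on line 1
```

A check like this shortens the feedback loop for one narrow class of faults; everything else still has to wait for tests or the CI run.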
Today we have better support for automating all of this than ever before. But as long as time passes between doing work and seeing the result of our work, we will fight a Sisyphean battle for quality, because we cannot genuinely build quality in.
The problem can be boiled down to these two aspects:
- The bigger the system, the longer it will take to run the test suite that truly proves the system is working (unit testing does not do that, but that is another topic). The longer it takes for the result to come back, the less value it will have.
- The bigger the system, the more people will be involved. When faults are not detected by the one producing them, quality assurance quickly becomes a social (or even sociological) problem.
But if we cannot build quality in automatically, what can we do?
Toyoda Sakichi later sold his patent for the loom and gave the large sum of money to his son to create an automobile company, which of course was named Toyota. Toyota is known - among other things - for the Just-In-Time (JIT) concept, which basically means you should only do something when there is an actual need.
A product with a quality problem is unfinished work seen from a Toyota (or “lean”) perspective. This is a waste in itself, since unfinished work is actually inventory, and inventory is something no one has ordered (not JIT), and therefore has no value, only a cost.
One troublesome side effect of producing inventory is that inventories are actually queues, and the longer the queues get, the worse the lead times get. In the end, not having a grip on quality will grind the production process to a halt, either by work piling up in the QA department or by all the defect reports coming in from the field.
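The link between queue length and lead time can be made concrete with Little's Law, which says that in a stable system the average lead time equals the average work in progress divided by the average throughput. The numbers and the function name below are made up for illustration; they are not measurements from any real team.

```python
def average_lead_time(work_in_progress: float, throughput_per_day: float) -> float:
    """Little's Law: average lead time (days) = items in the queue / items finished per day."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return work_in_progress / throughput_per_day


if __name__ == "__main__":
    # Hypothetical QA queue: the team closes 4 defects per day in both cases.
    print(average_lead_time(20, 4))  # 5.0 days with 20 open defects
    print(average_lead_time(60, 4))  # 15.0 days once the queue has tripled
```

With throughput unchanged, tripling the pile of unfinished work triples the time each item waits.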
The quest to build quality in therefore also lies at the heart of the Toyota Production System. It even has a name: Jidoka (autonomation), meaning automation with a human touch, that is, giving machines the human ability to detect quality problems when they happen.
But how do they handle situations where the machines can't detect defects?
The human touch of many machines at Toyota plants is actually a combination of technical and social solutions. Every machine also has a manual stopping device attached to it. If a worker detects a quality problem he or she is expected to press that stop button. The team leader then has about 30 seconds to make the decision of whether or not the production line should be stopped.
And lines are stopped, trust me. A worker detecting a fault can potentially bring a whole plant to a grinding halt!
How often do we dare to stop all developers from developing when the CI environment is red? I can't even tell you the number of stories I have heard about teams setting up lava lamps or other physical signals connected to their CI environment and, just weeks or months later, pulling the plug on the lamp because it is perpetually red.
When you don’t get instant feedback, the truly hard part is not creating your automatic integration environment; it’s fostering a culture that really stops the line when your automated system has detected faults. A CI system is not a stop-the-line device; it's just a mechanism, with a delay, that discovers faults.
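To make the distinction concrete, here is a minimal sketch of what a stop-the-line rule layered on top of CI could look like. Everything in it is hypothetical (the Line class, its method names, the build identifiers), and it does not use the API of any real CI product. The policy it models is an assumption about how a team might agree to work: once a build is red, only changes that fix the build may be merged.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical model of a shared mainline guarded by CI results."""
    stopped: bool = False
    stopped_by: str = ""

    def report_build(self, build_id: str, passed: bool) -> None:
        """Record a CI result: a red build stops the line, a green one restarts it."""
        if passed:
            self.stopped = False
            self.stopped_by = ""
        elif not self.stopped:
            # Remember the first failing build; later red builds do not overwrite it.
            self.stopped = True
            self.stopped_by = build_id

    def may_merge(self, is_fix: bool) -> bool:
        """While the line is stopped, only changes flagged as fixes get through."""
        return (not self.stopped) or is_fix


if __name__ == "__main__":
    line = Line()
    line.report_build("build-41", passed=False)
    print(line.may_merge(is_fix=False))  # False: feature work waits
    print(line.may_merge(is_fix=True))   # True: the fix is let through
    line.report_build("build-42", passed=True)
    print(line.may_merge(is_fix=False))  # True: the line is running again
```

The code is the easy part. The rule only works if the team accepts that feature work really does wait while the build is red.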
If you - as a developer, as a manager or as a company - are not prepared to change the culture and, at least initially, pay the price of stopping work to chase defects, I would question the value of having automated continuous integration tests.
It’s even worse than that. If you have a continuous integration environment that is frequently (or always) red and you don’t stop the line, the system will decline over time. Every time the line is not stopped, the team is nudged toward producing more bugs.
So in the end, as long as we do not have an automatic stop-the-line mechanism, it’s all about norms and cultures. That’s the topic of the third post in this series.
About the author:
Peter Antman has a passion for improving how we build software together. Currently he is coaching companies doing agile transformations as a consultant at Crisp. He has a background as Head of Development of Atex Polopoly where he enjoyed building large scale test environments. He was also one of the early committers to the JBoss project.