Posted September 09, 2015
Automation Vs. Testing: Getting The Balance Right
The world of software is completely crazy about automation. Thanks perhaps to DevOps, the mania is, if anything, increasing. Some companies are eliminating the tester role in favor of toolsmiths with programming experience who can build frameworks; sometimes this is called "developer productivity." The language these companies use is one of replacement: of dropping humans in favor of the machine.
I'd like to suggest a different language: the language of complement, where we recognize that testing and programming are different, and that automation solves certain types of problems and creates others. For the best outcome, we need both.
Let's start by covering how testing and automation are different.
I've spent the past several years in good company wrestling with the definition of 'software testing'. What software testers do in practice is very different from how we typically hear the word 'test', which comes from the school system. For most of us, from age 5 until some point in our early twenties, a test is a thing with answers that are clearly right or wrong. Teachers and professors put together groups of questions, usually with multiple-choice or essay-style answers, to see whether we remember what they said in class and what is in the textbook.
In science and in software, testing is a different beast. When I am testing software, I'm performing experiments on a piece of software and carefully observing the results. Testing, then, is a performance, something done with skill. That includes not only the design of these experiments, but also learning and altering our plan based on the results of the last test. Domain testing, for example, is a good way to discover information about the types of data that might or might not work for a variable. Often, though, the most important information is discovered with a bit of luck.
Often, I'll find myself exploring some aspect of a feature and notice something interesting. That 'something interesting' helps me form ideas about what is happening under a set of conditions and guides my decisions about what to do next.
Testing is all about exploration, observation, learning about your surroundings, and making decisions and judgments about your experience.
'Check' is the word Michael Bolton and James Bach use for the actions performed by most of the testing tools we use. The definition they have created is: Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product. I like to simplify this a little bit by saying that a check is a question about a software product that can be answered with a 'yes' or a 'no'. Test tooling is great for performing checks; in fact, that is all it can do.
Here is an example that you might see when automating a user interface:
Navigate to ebay.com
Log in with $user and $password
Assert you are returned to ebay.com
Assert text ("Hello " + $firstname + " !") is displayed
Every time you see the word Assert in that example, a check will be performed. The tool must answer yes or no to the questions "Was I returned to ebay.com?" and "Am I being greeted on the home page after logging in?" Checks look precisely at the things you identify and they completely ignore everything else. A check doesn't observe, explore, learn, or judge. It just returns a confirmation that the condition you expected was met, or not.
Sometimes we try to recreate this checking action with people, by using checklists and very detailed test cases.
One of the biggest motivations I see for creating checks, which can run at the unit level, in the user interface, or somewhere in between, is the ability to repeat the exact same scenario and set of questions as many times as you can run the program. Or, at least, until the software you are trying to test changes enough to make the check useless. It can take a lot of work to create environments, database seed scripts, and the tests themselves. But once you're there, repeating a test is usually trivial.
Repeatability also usually makes it unlikely that you will find new information about the product.
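As a minimal sketch, the steps above might look something like this in Python. FakePage, log_in, and run_checks are hypothetical stand-ins for a real browser automation tool, invented here so the example runs on its own; nothing in it reflects how eBay or any particular tool actually works. Note that the fake page has a broken layout that no assertion asks about.

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    """Hypothetical stand-in for a page loaded by a browser automation tool."""
    url: str
    visible_text: str
    layout_broken: bool = False  # something a person would notice at a glance


def log_in(user: str, password: str, first_name: str) -> FakePage:
    """Pretend login: returns a home page with a greeting and a broken layout."""
    return FakePage(
        url="https://www.ebay.com/",
        visible_text=f"Hello {first_name} !",
        layout_broken=True,
    )


def run_checks(page: FakePage, first_name: str) -> dict[str, bool]:
    """Each entry is one check: a yes/no answer to a question fixed in advance."""
    return {
        "returned to ebay.com": page.url.startswith("https://www.ebay.com"),
        "greeting displayed": f"Hello {first_name} !" in page.visible_text,
    }


page = log_in("some_user", "some_password", "Pat")
results = run_checks(page, "Pat")
print(results)             # both checks answer yes
print(page.layout_broken)  # True, yet no check asked about it
```

Both checks pass, and the broken layout goes unreported, because nobody wrote a question about it.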
Brian Marick describes this through the idea of a minefield. Imagine there is a minefield laid out before you, the kind where a mine blows up when you step on a certain spot. You don't know where the mines are right now, and there are only a couple of very specific paths you can take. When you take these paths, you find a couple of the mines and remove them (hopefully without getting blown up). Each time you take that path after the first, you aren't very likely to find a mine in the same spots again.
People are pretty bad at repeating steps precisely. When I was working for a company that relied heavily on detailed test scripts, I had a constantly wandering eye and mind even though the mission was to follow the script and try to be robot-like. This need to deviate is a good thing. Going off the script when appropriate means exposing yourself to new information.
The minefield analogy suggests that people should be exploring and testing to discover new information, in addition to running repeatable checks to confirm what we think we know.
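The minefield can be played out as a toy simulation. This is only a sketch under invented assumptions: a 10 by 10 grid, 15 randomly placed mines standing in for bugs, and paths that each cover 20 cells. None of these numbers come from Marick; they just make the effect visible.

```python
import random

rng = random.Random(7)  # fixed seed so every run of this sketch is identical

cells = [(x, y) for x in range(10) for y in range(10)]
mines = set(rng.sample(cells, 15))          # hypothetical bugs hidden in the field
scripted_path = set(rng.sample(cells, 20))  # the same steps on every single pass


def walk(path, remaining):
    """Remove and return the mines that this walk steps on."""
    found = path & remaining
    remaining.difference_update(found)
    return found


# Repeating the scripted path: only the first pass can find anything.
remaining = set(mines)
scripted_finds = [len(walk(scripted_path, remaining)) for _ in range(5)]

# Varying the path on each pass: later passes can still turn up something new.
remaining = set(mines)
varied_finds = [
    len(walk(set(rng.sample(cells, 20)), remaining)) for _ in range(5)
]

print("scripted:", scripted_finds)
print("varied:  ", varied_finds)
```

In this model every scripted pass after the first reports zero, because the mines on that path are already gone. The varied passes can keep finding mines, though how many depends on the seed and on the made-up numbers above.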
The amount of time it takes to get information from a test falls on a spectrum. At the "as fast as you can type" end of the spectrum are what are called quick attacks. Quick attacks are a style of test that requires next to no setup or planning. You can just perform them and observe what happens. This might sound trivial, but they are powerful because you can run them so quickly and they tend to yield a good harvest of bugs. Matt Heusser, my colleague at Excelon Development, likes to point out that a "trained", high maturity programmer, especially working in pairs, can find most of these bugs on their own. That's certainly true, as far as it goes. Yet most of the companies I work with hire because they are not exactly high maturity ... yet.
There are also plenty of techniques beyond quick attacks that call for a more detailed analysis of the requirements, platform, and sometimes code. These "other end of the spectrum" techniques require time to design, special data conditions, and often setup conditions before we can run the test and get results.
Automated checks require a great deal of setup time by themselves; someone has to write code or at least record and inspect with a tool. I'd dare say that the absolute fastest you can go from idea to information with a check is close to the slowest you get with a human test. After the checks are created, assuming you want to run them more than one time, you will have to worry about keeping them in sync with, and running against, your product. Using tools to create checks is often like creating a software project in parallel to the one you are selling.
Occasionally, I see tool-aided testing used as a reason to reduce the test department or get rid of it altogether. If you aren't spending a lot of time refactoring into a component architecture where you can turn features on and off quickly, and creating elaborate monitoring systems to notice when something is going wrong, a check-heavy strategy probably isn't for you. Even if you are, context matters, and there are plenty of contexts, life-critical systems for example, where I wouldn't take the risk.
You'll notice the subtle idea of substitution wormed its way into the conversation again here. It's powerful. But I think we can do better. If a test requires a great deal of setup that can be scripted, and we want to run it many times, then we can do that in code, or with a tool, making the re-run cheap. We can also use the check to drive us to an interesting place: two users that have identical names, for example, or a page that a script is editing while we try to edit it ourselves, to test for concurrency issues. Michael Larsen calls this "Taxi Cab Testing": the script drives us to an interesting place faster than a human can, then we jump out and test.
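Here is a rough sketch of what the taxi ride might look like for the identical-names case. It uses Python's built-in sqlite3 module with a throwaway in-memory database; the users table, the function name, and the sample name are all assumptions made up for illustration, since a real product would have its own schema and setup steps.

```python
import sqlite3


def drive_to_duplicate_names(conn: sqlite3.Connection, name: str) -> list[int]:
    """Scripted setup: create two users who share a name, return their ids."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    ids = []
    for _ in range(2):
        cursor = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        ids.append(cursor.lastrowid)
    conn.commit()
    return ids


conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
user_ids = drive_to_duplicate_names(conn, "Alex Smith")

# The taxi stops here. From this point on a person explores: search, merge,
# password reset, reports, anywhere two identical names might collide.
print(f"Two users named 'Alex Smith' are ready: ids {user_ids}. Start exploring.")
```

The script asserts nothing about the product. Its only job is to make the expensive setup cheap and repeatable, so the human time goes into the exploring.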
When all you see your entire life are white swans, you don't think of the arrival of a black one. This idea, that the presence of only white swans does not disprove the existence of black ones, was an example in medieval logic classes ... right up until Willem de Vlamingh found black swans in Australia in 1697.
Black swan events surprise us. They seem obvious in hindsight but are hard to predict up front. Things that we could not have imagined, like a storm knocking out the power to Amazon's main data center or worse, a tsunami wave following a major earthquake creating a nuclear disaster, can have a massive impact.
Checks that run repeatedly ask the same questions over and over. They'll be unable to find black swans almost by definition. Testing is a strategy for putting people in a position to discover black swans, sometimes on purpose. Things get a little complicated here. Checks can be embedded inside a test. That means a check can be run and then a person can use the results of that check to learn something interesting and go off and investigate, like our Taxi Cab example. So, even though the check can't discover a big important problem, a person reviewing the results of that check might get the information they need to discover the problem.
How To Talk About This
The terms check and test were carefully selected, but the topic is very much rooted in philosophy and social science, specifically the work of Harry Collins in his books The Shape of Actions and Tacit And Explicit Knowledge. The reading can come off a bit dry and academic. Yet there are practical implications that can be used in day-to-day work. Software testing performed by people is a strategy used to discover new and important things about a product.
Checks might help these people to some degree, but on their own are only answering yes or no to simple questions.
Choose your strategy carefully.