Manual testing is the oldest and most essential form of software testing. At its core, it means a tester interacting with an application as a user would, searching for defects and checking whether the application meets its stated requirements. As technology has advanced over the years, more and more tests have become automated across the different layers of the application under test. Despite this, the art and science of manual testing will remain critical to software quality for the foreseeable future.
Author: Matthew Heusser
The Problem of (Early) Testing
One of the core problems in early testing (and still today) was the idea of coverage. We wanted to know which elements of the system had been tested, and perhaps how thoroughly. The most obvious way to do this was to break the requirements down into bullet points and make sure there was something to cover each one. Each bullet point was a possible test case, an idea described over ten different pages of “Program Test Methods”, Bill Hetzel’s 1973 collection of conference papers that became the first book on software testing. The test case idea is relatively simple, consisting of four elements: preconditions, actions, expected result, and actual result. Before the actual testing begins, someone creates all the test cases, then hands them down to the testers to do the actual work. Sometimes these are called “test scripts”, as in scripts to follow, like a play.
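To make that structure concrete, here is a minimal sketch of how such a test case might be captured as a data structure. The field names and the example values are illustrative, not taken from Hetzel’s book.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class TestCase:
        """One scripted test case: the four classic elements."""
        preconditions: List[str]   # state the system must be in before testing
        actions: List[str]         # steps the tester performs, in order
        expected_result: str       # what the requirement says should happen
        actual_result: str = ""    # filled in by the tester during the run

    login_blank_password = TestCase(
        preconditions=["User 'demo' exists", "Tester is on the login page"],
        actions=["Enter user name 'demo'", "Leave the password blank", "Click Submit"],
        expected_result="An error message appears and the user is not logged in",
    )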
Manual Testing as Script-Following
When Hetzel’s book came out in 1973, most organizations were either living in chaos or trying to follow some form of waterfall model. Under waterfall, the program was built, then system-tested until the bugs fell out and were fixed, and then the product was shipped. Testers were often end-users, people from customer support, or staff in other roles. The scripts told them what to do. When the software needed an upgrade, the scripts could be re-run. In theory, they allowed the company to swap out anyone to do testing, lowering the hourly rate and making it easier to find testers.
In practice, this led to boring work, often done badly. Dr. Cem Kaner is a retired professor of Software Engineering at Florida Tech. He is also a co-author of Testing Computer Software, the best-selling testing book of all time. Dr. Kaner has pointed out several problems with test cases. For example, some people claim their applications are too complex and that scripts provide navigational guidance for the new tester. Dr. Kaner would ask you to remember a time you drove to a new place with a navigator who gave you specific directions on when to turn, but not why each of those directions mattered. If you returned, would you remember the route? Script-following encourages people to follow the same steps every time. It might be consistent, but it can decrease coverage over time as features are added to the application. Worst of all, test cases can cause the person executing them to focus only on the expected result, missing other bugs in the software.
As a result, extremely detailed manual test cases have a reputation for being boring, no fun, low-value, and even brittle, because when the user interface changes, the test case directions “break.” Exploration is the opposite of scripted testing, though it comes with risks of its own.
Manual Testing as Exploration
When Adam Yuret started his technology career, he was doing testing. His first assignment was to take a new piece of software and “play with it.” At the sound of the words “play with it”, some professionals will wince. Yet play, fundamentally, is open-ended exploration. Exploratory testing is the process of learning about software while testing it, with the next test idea derived from the results of the previous one. Like chess, it takes skill and discipline to do well -- while appearing unpredictable, even confusing, to the outside observer.
The problem with exploration is that pesky issue of coverage. How do you know when you are done? How can we have confidence that we touched all the pieces of the application that matter to a customer? And how can you describe the work you did to other people? One way around this is to create a map, or a testing dashboard. The testing dashboard allows the testers to rate the amount of coverage they have for any feature, and the quality of that feature, on a scale from one to ten.
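Here is one low-tech sketch of what such a dashboard might look like in code; the feature names and ratings are invented for illustration.

    # A low-tech testing dashboard: for each feature, how much was explored
    # (coverage) and how good it looked (quality), each on a 1-10 scale.
    dashboard = {
        "Search":    {"coverage": 8, "quality": 7},
        "Checkout":  {"coverage": 5, "quality": 9},
        "Reporting": {"coverage": 2, "quality": 4},   # barely touched so far
    }

    for feature, scores in dashboard.items():
        print(f"{feature:10} coverage={scores['coverage']}/10 quality={scores['quality']}/10")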
Using Tools To Explore
A third kind of manual testing is using tools to gather or aggregate data. A tester who populates a grid, cuts and pastes the data into a spreadsheet, and applies a sum function to see if the totals add up is certainly doing manual testing. What if the data came out of a database, the tester wrote a SELECT statement, and then ran the results through a diffing tool to compare against the previous release? Is that manual testing?
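A sketch of that kind of one-time tool might look like the following, assuming a SQLite database and a baseline file saved from the previous release; the file, table, and column names are hypothetical.

    import sqlite3
    import difflib

    # One-time tool: pull the order totals for this release and diff them
    # against a baseline saved from the previous release.
    conn = sqlite3.connect("orders.db")                      # hypothetical database
    rows = conn.execute(
        "SELECT order_id, total FROM orders ORDER BY order_id"
    ).fetchall()
    current = [f"{order_id},{total}" for order_id, total in rows]

    with open("orders_previous_release.txt") as f:           # baseline from last release
        previous = f.read().splitlines()

    # Show only the lines that changed; the tester reads and judges the result.
    for line in difflib.unified_diff(previous, current, lineterm=""):
        print(line)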
The key distinction I would like to make here is between tools that run against software unattended, producing results, and tools that are crafted by a tester for one-time use. The first time the tester creates the tool and runs it, the work is manual: they are actively working with the product, learning, and changing what they do based on that information. If the test becomes something that runs automatically, only creating an email alert in case of failure, then it is no longer manual testing. Few people would call that manual testing; they are more likely to call it “technical” testing.
Feature Tests vs. Regression Tests
Most modern testing groups have at least two kinds of functional testing - testing the individual features by themselves, and testing to see if a change made since the feature was tested has caused it to regress, or go backward. In a two-week sprint, regression testing can’t be more than a day or two, which means the teams want to have both fewer regressions in the process (cleaner code, fewer mistakes) and more tooling that runs continuously, to find bugs as they are created.
Feature testing is usually a combination of creating tooling to run every time there is a new build and exploring to find new, unexpected bugs. There are, after all, a million questions a tester might want to ask once, but a much smaller list of questions that need to be asked for every release and can be captured in code as automated checks. Where feature testing is more manual and a little automated, most teams attempt to make regression testing more automated and less manual, so they can run regression tests more often, with less effort, and release more frequently. The problem is finding the right balance.
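As a rough sketch, a “question worth asking on every release” captured as an automated check might look like this; the Cart class and its module are hypothetical stand-ins for real application code, and a runner such as pytest would execute the functions.

    # Two questions worth asking on every release, captured as automated checks.
    from shopping_cart import Cart   # hypothetical module under test

    def test_empty_cart_total_is_zero():
        assert Cart().total() == 0

    def test_removing_last_item_empties_the_cart():
        cart = Cart()
        cart.add("widget", price=5.00)
        cart.remove("widget")
        assert cart.total() == 0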
The Future of Manual Testing
Despite years of prognostication to the contrary, manual testing is still very much alive. In fact, as long as humans have a few tests they don’t want to automate, we’ll have manual testing. If some human asks the question “I wonder what happens if …” we’ll have exploration.
Here’s the challenge: for any given sprint, the technical staff might have something like 200 person-hours to devote to testing above the unit (code) level. Recording a piece of automation might take ten person-hours; doing the same check by hand might take one. The technical staff need to decide whether to create twenty automated checks or 200 human feature-checks. The “right” answer is probably a combination of the two. The manual side of that might include following some directions, some exploration, and some tool work. Dr. Kaner suggests checklists, allowing the staff member to do the testing first, then refer to the checklist to see if they missed anything.
Putting It All Together
A few years ago, a Fortune 1000 firm in my area outsourced a large number of test scripts to a team in a developing nation. The scripts were high-level, and the overnight test results came back all passing, even when the databases and web servers were down. The company did not really have insight into what the testers were doing, so they re-wrote the test cases with more detail, at a lower reading level, then re-wrote them again. Eventually they found that, at the fifth-grade reading level, the results that came back made sense.
This is the worst kind of manual testing, the kind that makes luminaries say things like “You’re a manual tester? You’re a dinosaur, you’re going to die!” I heard exactly that in 2012, in Sweden, when I was presenting on testing. It is sad that this is the only kind of testing so many people have seen. When people hear the term “manual testing”, that is often what they think of.
Instead of giving a single answer to the question, I suggest asking the person what they mean by manual testing: Is it exploration? Following directions? Building small, one-time-use tools? When does that make sense - for special corner cases that might only need to be tested once? Unless everybody involved understands what the objectives are, the definition of manual testing is left open to debate, and the quality of your application is at risk.
Contexts for Manual Testing
Manual testing may sound like a thing of the past, but it is still a normal method of testing; it is simply used less often than it once was. Because most offices only run for forty hours each week, a question commonly asked is “How much time should I invest in tooling if I still need time for human investigation?”
In our research, we have found that there are many different environments; some are better for manual testing, and others for automation and tooling. Today we will explore that topic, specifically focusing on testing at the Graphical User Interface (GUI) level, with the goal of making those tough decisions easier.
New and Emerging Platforms and Technologies
The capability to develop programs on a new platform, such as the web, a phone, or a tablet (or plugins such as Flash or Silverlight), is usually created well ahead of the technology to drive tests on that platform. For example, when Google made the Android operating system, there were no tools to drive native Android applications, and certainly none for the operating system itself.
The explanation is simple: the people at Google who were testing were humans. Probably smart humans, perhaps using tools. At the code level, Android may have had unit tests, model-driven tests, and all sorts of checks, but at the system level, Android used a manual testing strategy. The same was true for Google Glass, and it is probably true today for most virtual reality platforms (such as the Oculus), autonomous vehicles, and cutting-edge video games, especially games that respond to human motion.
Jason Huggins, the initial creator of the Selenium project, once suggested this: if people want to remain as manual testers, they should surf the wave of new technology, always staying at the front. How can we use that advice?
Emergent and One-Time Risks
Several of the teams I have worked with have held a “demo”, or “show and tell”, when a feature or sprint was complete. Engaged customers would often ask “what happens if we leave (required_field) blank?” or similar questions. They would try it; sometimes they would find a bug, and other times a new requirement. Some customers enjoy this kind of give-and-take, but it does not work for everyone.
The point is that nobody said “Wait! Stop!” or “You can’t test that! We need to automate it first!” That kind of objection would have been unnecessary, and everyone knew it. Yet many organizations I have worked with try to mandate that all testing be automated. Pushed to explain, they may claim that the only manual testing that occurs is while the testers are recording automation. This, as so many companies already know, is not the only way to test -- and it is often the worst way for your company.
There are simply too many test ideas that need to run once or twice but are not worth institutionalizing, not worth running all the time. Another time, I worked with an eCommerce team that did not support the iPad. No matter what they were told about tablets being unsupported, customers consistently used the iPad anyway, and large volumes of money were flowing in through it. The company asked me to do a manual “quick assessment” of blocker conditions for the iPad over a few days.
Expect wise organizations to continue to do human testing as features are developed, as well as when new risks, such as new browsers or use cases, emerge.
Unstable User Interfaces
For mature user interfaces, change is a good thing -- but make sure the changes you make today impact only one sub-system, so you can see the failing tests, revise them, and see them pass. Like cutting wood, you measure twice and cut once; the failing tests that need to be “greened” are the second measure.
When an application is new, its user interface is constantly changing. That means a lot of false errors: slow test runs that hang while waiting for data that is not present, too much re-running, and more time than expected spent waiting and fixing. A few organizations have gotten the hang of proactively changing the automated checks before the new GUI is created, but they are a tiny minority, with a very specific tool set.
Teams testing a new, small application that is constantly changing may be better off testing it by hand. If all the team does is create new, small applications that do not have much ongoing maintenance, then automating tests at the GUI level might not be worthwhile for finding bugs. It might provide more value as a working demo! This is especially true for applications that do not exist yet. Testing paper and whiteboard prototypes can replace days or months of work with only minutes of effort, yet there isn’t a computer in the world that can do that job (at least not yet).
Media and Graphic Design
The ultimate unstable user interfaces are often the most common user interfaces, like eCommerce sites and video games. One eCommerce project hired me as a human tester (a manual tester) to make sure everything looked correct. The project also had automated GUI tests that ran overnight. A typical change would be to move a mini shopping-cart logo, on the side of the screen, down twenty pixels. When I tested that change, the “submit” button was now twenty pixels lower, with the bottom ten pixels off the screen and only the top half of the letters showing. The tool was still able to click the button, but I saw the problem immediately because I am a human.
There are some modern tools that can look for visual differences between test runs and bring them to the attention of the testers, but a media application like the one I was working on would just throw errors with every change. While there is more automation in video games and multimedia applications (for example, “bots” to simulate teams in an online football game), humans will need to keep playing a role in testing graphics-intensive applications for years to come.
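A bare-bones version of that visual comparison might look like the sketch below, using the Pillow imaging library to compare two screenshots; the file names are invented, and real visual-testing tools add masking and thresholds so a fast-changing design does not drown the tester in false alarms.

    from PIL import Image, ImageChops   # Pillow imaging library

    # Compare a screenshot from this test run against a saved baseline image
    # of the same size.
    baseline = Image.open("checkout_baseline.png").convert("RGB")
    current = Image.open("checkout_current.png").convert("RGB")

    diff = ImageChops.difference(baseline, current)
    box = diff.getbbox()   # None if the two images are pixel-identical

    if box is None:
        print("No visual change detected.")
    else:
        print(f"Pixels changed inside region {box}; flag for a human to review.")
        diff.save("checkout_diff.png")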
As a Coach of Testing Skills
What can you do with a large IT group, with dozens or hundreds of test automators, twice that many production programmers, plus business analysts, product owners, managers, and scrum masters? Or what can you do with an organization with no test automators, where the programmers take care of both jobs?
An organization that large will likely need someone focused exclusively on test design, strategy, and coaching -- a person focused not on how to write test automation code, but on what to test and how to know whether things are successful. This person will likely be involved in figuring out testing requirements, project plans, and staffing needs, and in teaching technical staff when to test manually and which risks are better covered by tools.
This reminds me of my old mentor, William F. Duke, who was a non-leadership “Technical Director” at the National Security Agency. He described his role this way: “For the most part, I am self-directed. I do have some officially assigned duties, but they are mostly in the vein of ‘Go forth and do good things.’” After twenty years with the agency, he knew the systems and knew how to add value. Did Bill Duke write code? No. Nobody cared, either.
Regulated Software
Some companies have established written business processes for how they test software. Those might include written test cases with physical signatures on a document and dates recording when each test ran. These processes are antiquated and incredibly hard to change. The systems they cover can control avionics, medical devices that are embedded in the human body or that dispense medicine, or financial transactions.
Many software testers can be a bit cavalier. They say that their test automation software will produce screen captures, even videos of the run, and put them automatically on a network drive. The videos will be stamped with a trace back to the test idea, plus a timestamp, plus the exact build of the software that the test ran against. For that matter, they can include the Docker image ID, which makes it possible to reproduce the entire web server.
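A sketch of how an automated run might stamp its evidence with that trace information follows; the requirement ID, environment variable names, and file paths are all invented for illustration.

    import json
    import os
    from datetime import datetime, timezone

    # Stamp a captured screenshot (or video) with the trace information an
    # auditor would ask for.
    evidence = {
        "test_idea": "REQ-142: reject an expired credit card",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "build": os.environ.get("BUILD_NUMBER", "unknown"),
        "docker_image_id": os.environ.get("DOCKER_IMAGE_ID", "unknown"),
        "artifact": "//fileshare/test-evidence/REQ-142/checkout.png",
    }

    with open("REQ-142_evidence.json", "w") as f:
        json.dump(evidence, f, indent=2)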
And yet, for some reason, the auditors and executives responsible are rarely comfortable with testers using this style. For the foreseeable future, systems that could kill or bankrupt someone, but have not done so under manual testing, will probably continue to be tested that way. Yes, a few of them will shift to a more tools-focused approach, but the conversion will be slow, with plenty of notice.
Complex Legacy Systems
Ironically, while the latest shiny systems can’t use tooling because they are too new, legacy systems have a hard time with tooling because they are too old. For example, one company I worked with had an older, undocumented, “green screen”-style application. Their data was always stored in a database, but to get into that database it had to be entered manually through the front end. The front end combined data entry with some transformation rules, and changes to the system were few and far between. Long-time technical staff knew how to test every screen, and knew that, unless the database or the storage of the data changed, changes to a screen would almost certainly be limited to that one screen. Testing a single screen might take an hour or two.
The Future of Manual Testing
After this analysis, it should be clear that many of the contexts that are heavy on manual testing are heavy for a reason. Instead of criticizing them, or complaining that the industry is a “bunch of dinosaurs”, we may be better off recognizing the reality that drove them to those decisions. The two extremes, the super-advanced interfaces (too new to automate) and the legacy systems (too old to automate), tie together to tell an interesting story. The best place to use tooling is the middle of the bell curve: applications that are both popular enough to create market demand for tooling support and mature enough to support that demand.
Then there are the other contexts: regulated software, unstable interfaces, and graphics-intensive applications, where the problems that might arise are ones only a human can recognize. And, of course, feature testing. Some manual testing work can be exciting, interesting, and valuable. Some of it, like the regulated work, can be boring but important. Some will be just boring.
Most organizations will use a mixture of approaches for different parts of the process, and that mixture will change over time. Because of this, you need a conscious strategy. Understand what you spend your time on. Most importantly, be prepared to change your approaches. In this business, the only constant is change.
Additional Resources