What You Need to Know About AI and How It Will Change Testing
Testing is on the verge of a revolution brought about by the use of Artificial Intelligence (AI) in practices such as test case design, test management, test execution, and evaluation of test results.
AI-based testing solutions have the potential to generate test cases, select test cases to run, recommend test approaches, and analyze test results to determine risk and help make deployment decisions. Vendors are already introducing solutions that will make testing more efficient and robust while reducing the tester workload.
Many of these tools use machine learning approaches, a subset of techniques under the artificial intelligence umbrella. Artificial intelligence may also include classic statistical techniques such as correlation, regression, and time series forecasting, as well as expert systems that are programmed with rules emulating an expert’s thought processes in a particular problem domain. All of these leverage data and a variety of algorithms.
With machine learning, the application “learns” based on processing the data about the problem domain. In reality, what is happening is that large amounts of data are fed through algorithms that attempt to use that data to derive an answer or decision. This is typically done a number of times until the algorithms converge on a solution that meets accuracy requirements.
Three Approaches to Machine Learning
There are three main approaches to machine learning that enable large sets of data to be processed:
Neural Networks
The most common machine learning approach is neural networks. Neural networks attempt to model the function of the human brain, by creating a multi-layer sequence of algorithms.
Relevant data is fed into the input layer of the network, where the algorithms process it. That processed data is sent to one or more hidden layers, where it is manipulated further, and finally to an output layer that produces a result.
Repeating this process and adjusting the network after each pass is called training the network. The first time the data passes through the network, the end result is usually not a good representation of the correct answer. However, assuming that the correct answer is known, the algorithms will compare known results to generated results and adjust to attempt to come closer to that answer.
Teams may have to train the network through a number of iterations before it begins to converge on an acceptable solution. How many times it is trained depends on the desired accuracy, although for better accuracy, the team may also have to rearchitect the network, in effect starting over again.
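The training loop described above can be sketched in a few dozen lines of plain Python. This is only an illustrative sketch, not how any particular framework works: the network size, the learning rate, the number of passes, and the logical-OR training data are all assumptions chosen to keep the example small.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, net):
    """Pass one input through the hidden layer and then the output layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, output


def train(samples, epochs=2000, learning_rate=0.5, n_hidden=2, seed=0):
    """Repeatedly compare generated results to known results and adjust the weights."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    net = {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": 0.0,
    }
    losses = []
    for _ in range(epochs):
        grad_w_hidden = [[0.0] * n_in for _ in range(n_hidden)]
        grad_b_hidden = [0.0] * n_hidden
        grad_w_out = [0.0] * n_hidden
        grad_b_out = 0.0
        total_error = 0.0
        for x, target in samples:
            hidden, output = forward(x, net)
            error = output - target              # generated result minus known result
            total_error += error ** 2
            delta_out = error * output * (1.0 - output)
            grad_b_out += delta_out
            for j in range(n_hidden):
                grad_w_out[j] += delta_out * hidden[j]
                delta_hidden = delta_out * net["w_out"][j] * hidden[j] * (1.0 - hidden[j])
                grad_b_hidden[j] += delta_hidden
                for i in range(n_in):
                    grad_w_hidden[j][i] += delta_hidden * x[i]
        losses.append(total_error / len(samples))  # mean squared error for this pass
        # Adjust every weight a small step in the direction that reduces the error.
        net["b_out"] -= learning_rate * grad_b_out
        for j in range(n_hidden):
            net["w_out"][j] -= learning_rate * grad_w_out[j]
            net["b_hidden"][j] -= learning_rate * grad_b_hidden[j]
            for i in range(n_in):
                net["w_hidden"][j][i] -= learning_rate * grad_w_hidden[j][i]
    return net, losses


# Hypothetical training data: the logical OR of two inputs.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
net, losses = train(OR_SAMPLES)
print(f"error after first pass: {losses[0]:.4f}, after last pass: {losses[-1]:.4f}")
```

The error recorded for each pass is the number a team would watch to decide whether the network is converging or needs to be rearchitected.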
There are several popular frameworks and services for creating neural networks. Google’s open source TensorFlow is the most popular, but there are also OpenNN, Keras, and Amazon’s SageMaker (a managed cloud service rather than an open source framework), to name a few. If you plan on developing a neural network, look at each of the available options to see which one best meets your needs and skill set.
While you are likely to need a data scientist or mathematician to help with creating a sophisticated network, many simpler ones can be created with only a knowledge of a programming language such as Python and some background in mathematics or statistics.
Genetic Algorithms
Another popular approach to AI is genetic algorithms. They’re called genetic algorithms because the process is loosely modeled after the workings of genetic selection – that is, through a combination of heredity, variation, and selection.
Algorithms are evaluated based on some criteria, and the best algorithms are combined to form a next generation of algorithms, some of which at least should perform better.
A key to propagating a genetic algorithm is its fitness – that is, its ability to make incrementally and successively better predictions. Fitness is a measure that the system designer develops to reflect the overall success of the application. It could be to maximize revenue, achieve a high throughput, or some other measure that represents the business or technical goal.
Genetic algorithms search a domain based on the biological principles of genetics. They apply successive generations of candidate algorithms, each derived from the previous set, in order to optimize the fitness score.
Typically, a genetic algorithm attempts to optimize, or at least find a relatively high fitness, across the search space. Searches scan across the whole space so that they don’t settle on a local optimum. It’s possible, even likely, that the fitness score will go down during some successive generations.
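As a rough illustration of heredity, variation, and selection, here is a minimal sketch in Python. The fitness function (count the 1s in a bit string), the population size, and the mutation rate are assumptions made up for the example; a real application would substitute a fitness measure that reflects its own business or technical goal.

```python
import random

GENOME_LENGTH = 20  # hypothetical problem size


def fitness(genome):
    """Illustrative fitness measure: the number of 1 bits in the genome."""
    return sum(genome)


def crossover(parent_a, parent_b, rng):
    """Heredity: the child takes the front of one parent and the back of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(genome, rng, rate=0.02):
    """Variation: occasionally flip a bit."""
    return [1 - gene if rng.random() < rate else gene for gene in genome]


def evolve(generations=40, population_size=30, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENOME_LENGTH)]
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Selection: rank by fitness and keep the better half as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        parents = ranked[:population_size // 2]
        population = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size)
        ]
    return best_per_generation


history = evolve()
print("best fitness per generation:", history)
```

Because this sketch does not carry the best individual forward unchanged, the best fitness can dip from one generation to the next, which matches the behavior described above.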
Rules-Based
AI systems can also be rules-based. In a rule-based system, the knowledge engineer interviews a problem domain expert about his or her sequence of thought steps in evaluating a problem and coming to a decision. The knowledge engineer codifies these thoughts into rules, which are then programmed into the application.
The approach to rules is more sophisticated than simply an if-then-else construct. AI algorithms follow steps on how rules are evaluated, using either forward- or backward-chaining.
The name forward chaining comes from the fact that the expert system inference engine starts with the data and reasons its way to the answer. Alternatively, with backward chaining, the engine presumes an answer and tries to evaluate data in support of that answer.
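The difference between the two chaining strategies can be shown with a toy inference engine. This is a minimal sketch: the rule set and fact names are hypothetical, and production expert systems use far richer rule languages.

```python
# Each rule is (set of premises, conclusion). All names are hypothetical.
RULES = [
    ({"age_over_18", "valid_id"}, "eligible_applicant"),
    ({"no_recent_claims"}, "low_risk"),
    ({"eligible_applicant", "low_risk"}, "approve_application"),
]


def forward_chain(facts, rules):
    """Start with the data and keep firing rules until nothing new is learned."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules, seen=None):
    """Presume an answer (the goal) and look for data that supports it."""
    if goal in facts:
        return True
    seen = set() if seen is None else seen
    if goal in seen:  # guard against rules that refer to each other in a loop
        return False
    seen = seen | {goal}
    for premises, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(premise, facts, rules, seen) for premise in premises
        ):
            return True
    return False


facts = {"age_over_18", "valid_id", "no_recent_claims"}
print("approve_application" in forward_chain(facts, RULES))
print(backward_chain("approve_application", facts, RULES))
```

Forward chaining derives everything it can from the facts, while backward chaining only explores the rules that could lead to the presumed answer.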
While a popular approach twenty years ago, rules-based expert systems are less frequently used today. It turns out that few experts can explicitly define the thought processes they go through in order to come up with an answer in a given problem domain. However, they are still used in domains where the steps are well-defined and quantitative in nature, such as whether or not to approve an insurance application.
These three techniques represent the vast majority of AI-type applications being developed or in use. Neural networks are dominant today, but future systems will likely use elements of all three for a more holistic way of expressing learning and knowledge.
AI Rapidly Moves Into Mainstream Software
There are several reasons why machine learning approaches are popular today.
- First, many useful machine learning systems are relatively easy to build. While they require large amounts of data and an understanding of how the algorithms process that data, most are conceptually simple.
- Second, the availability of large amounts of data, along with the ability to store and process it, makes training machine learning systems easier and more accurate. If we have multiple independent variables that point to the answer to a dependent variable with a high degree of correlation, we can build a relationship model that is trained to produce the correct answer reliably. Large amounts of data make it easier to find that correlation.
- Third, these systems can be genuinely useful in a wide variety of decision-making domains. In some domains, these can result in higher sales and profits, better safety, and better health care.
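To make the idea of correlation and a relationship model concrete, here is a small sketch using a single independent variable for brevity. The data points are invented for illustration, and a real system would typically use many variables and far more data.

```python
import math


def pearson_correlation(xs, ys):
    """Strength of the linear relationship between two variables (-1 to 1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def fit_line(xs, ys):
    """Least-squares line: the simplest 'relationship model' for one variable."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations of an independent variable and a dependent variable.
independent = [1, 2, 3, 4, 5]
dependent = [2.1, 3.9, 6.2, 7.8, 10.1]

correlation = pearson_correlation(independent, dependent)
slope, intercept = fit_line(independent, dependent)
print(f"correlation={correlation:.3f}, model: y = {slope:.2f} * x + {intercept:.2f}")
```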
We are collecting so much data today that it is likely that many different types of systems will use these techniques in the future. But even today systems for financial trading, e-commerce recommendation engines, facial recognition, and behavior prediction, to name a few, are very amenable to machine learning solutions.
These types of systems have a lot of data at their disposal, which can be used to train machine learning systems to suggest product purchases, buy and sell financial securities, and recognize people.
These types of systems are not one hundred percent accurate because they rely on complex relationships between input data and output results, but they are usually at least as good as a human performing the same task. This means that they can supplement rather than replace human capabilities. And they can be both highly useful and profitable.
Recommendation engines, which suggest other related purchases to e-commerce consumers, fail the majority of the time, but succeed often enough to satisfy most users. Trading systems will fail to profit occasionally, but over time will do at least as well as humans.
Testing and Machine Learning
Commercial testing vendors are offering machine learning-based solutions for test case generation and selection, risk analysis, test result classification, and other test-related activities. Testers can benefit from these tools by creating and running better test cases, understanding the results of test runs, and communicating risk to stakeholders.
Machine learning in automated testing tools can enable access to application properties often missed by standard object recognition techniques. This means that testers can more easily keep automated tests up to date, even when physical locations of object properties change on the UI.
Machine learning systems will eventually be an important part of software and system testing. Today, most such systems are limited in scope, and many are experimental. In the future, they will become pervasive in more and more applications.
There is Still Work to Be Done
Machine learning systems lack the ability to take over entire jobs, whether it be in manufacturing, software testing, or any other field. Applications that require a high degree of precision, such as self-driving cars and intelligent drones, are likely to require a complex combination of approaches in order to achieve that level of accuracy.
Also, in cases where getting the wrong answer is potentially damaging or a safety issue, such as in law enforcement, there is the need for better accuracy. The required level of precision will demand the refinement of existing techniques, as well as the development of entirely new ones, and will likely take years of research and development effort.
How to Test Machine Learning Systems
Even as machine learning is being adapted to assist testers, testers have to be able to test such systems prior to deployment. Testing tools, recommendation engines, trading systems, medical diagnosis systems, decision support systems, driverless vehicles and more are increasingly using AI in some way to make decisions or assist human operators.
Testing is built on the logical foundation that for every given input, there is a defined and unique output. If I enter A, the system will always return B. What if software isn’t supposed to behave like that?
Machine learning systems are increasingly being used in e-commerce recommendation engines, predictive analytics, big data mining, and a host of other daily applications. And while some seem trivial, e-commerce sites are increasing their sales by sifting customer data to find relationships between products people buy.
We are using big data trends to determine the likelihood of failure in networks, automobiles, aircraft, and public transportation systems.
Testing isn’t always at the top of our minds as we try to develop and deploy systems based on machine learning algorithms. Simply building a good set of algorithms that model the problem space is difficult enough. But testing is a part of the software development and deployment process, and we need to look seriously at how these systems will be tested.
Testing can take two different forms:
- First is the traditional type of testing, where the application is unit tested by developers, smoke tested by automation during the build and integration process, and manually tested by testers. This process is well-known, though it will vary depending on the type of system being developed.
- Second is testing of the solutions generated, and presumably the underlying algorithms. This is where testing becomes more problematic. How do we test systems that may return a different result to the same data over time? Traditional testing techniques have no way of taking such a result into account. So what are testers supposed to do?
Testing machine learning systems is qualitatively different from testing any other type of software. In most testing situations, testers seek to make sure that the actual output matches the expected one.
With machine learning systems, looking for exactly the right output is exactly the wrong approach. Instead, here’s what testers need to focus on:
- Have objective and measurable acceptance criteria. Know the standard deviation you can accept in your problem space. This requires some quantitative information, and the ability to understand and interpret those measurements.
- Test with new data, rather than the original training data. If necessary, split your training set into two groups, one that does training, and one that does testing. Better, obtain and use fresh data if you are able.
- Don’t count on all results being correct; in fact, some may vary significantly from the correct value. And if it’s not good enough, testers may have to recommend throwing out the entire network architecture and starting over.
- Understand the architecture of the network as a part of the testing process. Testers won’t necessarily need to know exactly how the neural network was constructed, but they do need to understand enough to judge whether or not it meets requirements.
The key to testing the system is to understand both the requirements for the production results and the limitations of the algorithms.
The requirements need to translate into objective measurements; ideally, the standard deviation of the results around the mean, assuming that the mean result is closely related to the actual result found in the training data.
Testers need to be able to assess their results from a statistical standpoint, rather than a yes-no standpoint.
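The points above (hold back data for testing, and judge results statistically rather than yes-no) can be sketched as follows. This is an illustration under stated assumptions: the `predict` functions stand in for a trained model, the data is synthetic, and the thresholds `max_bias` and `max_stdev` are hypothetical acceptance criteria that a team would set for its own problem space.

```python
import random
import statistics


def split_data(records, test_fraction=0.25, seed=42):
    """Shuffle the records and hold back a portion that is never used for training."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def evaluate(predict, test_set):
    """Return the mean and standard deviation of the prediction errors."""
    errors = [predict(x) - actual for x, actual in test_set]
    return statistics.mean(errors), statistics.stdev(errors)


def meets_acceptance(mean_error, stdev_error, max_bias=0.5, max_stdev=1.0):
    """Statistical pass/fail: small average error and an acceptable spread."""
    return abs(mean_error) <= max_bias and stdev_error <= max_stdev


# Synthetic data: the true relationship is y = 2x plus a little noise.
noise = random.Random(7)
records = [(x, 2 * x + noise.uniform(-0.5, 0.5)) for x in range(40)]
train_set, test_set = split_data(records)


def good_model(x):      # stand-in for a well-trained model
    return 2 * x


def biased_model(x):    # stand-in for a model that is consistently off
    return 2 * x + 3


for name, model in [("good", good_model), ("biased", biased_model)]:
    mean_error, stdev_error = evaluate(model, test_set)
    print(name, round(mean_error, 3), round(stdev_error, 3),
          meets_acceptance(mean_error, stdev_error))
```

Neither model matches every expected value exactly, yet only one falls within the acceptance criteria, which is the kind of judgment testers will increasingly be asked to make.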
Putting the Different Pieces of AI Together
AI and machine learning systems are poised to revolutionize society across industries, and software testing in particular. These systems will radically change how we make decisions, and the criteria by which we judge those decisions. Application users will partner with AI and machine learning systems to test more quickly and efficiently. With the guidance of intelligent systems, testers can, simply, be better testers.
But there’s more to AI than that. Testers will also have to test AI systems themselves. Relatively few are doing so today, but the number will only increase. Testing AI and machine learning applications will be very different from testing traditional applications. While many testers don’t envision themselves at that point, it is coming sooner than they might think.