Testing Challenges Associated with Machine Learning APIs

  April 22, 2015


Machine learning is a combination of specialized algorithms, data, and automation. The machine is actually an algorithm that learns from past mistakes in order to create a refined solution. The learning process involves training with known data and then testing to ensure the training process worked as intended.

Even though machine learning, a branch of data science, isn’t new, it has received far more attention lately because of the many ways in which it’s used. For example, you can’t create a self-driving car without machine learning because the algorithms that control the car’s functionality need to adapt to the particular situations in which the car is used and to the characteristics of that particular car. You also find machine learning used in robotics, fraud detection, handwriting analysis, e-mail spam filtering, and even the suggestions for additional purchases on sites such as Amazon.com. It’s likely that any application you work on in the future will incorporate some type of machine learning because the technology is so versatile and extends applications in ways that make them easier to use and proactive in meeting user needs.

Similarities with Other API Testing

All APIs require specific kinds of checks and tests to ensure they work as advertised. For example, the fact that you’re working with a machine learning API doesn’t mean you can forget about security testing. In fact, you want to be sure that a machine learning API is secure because you don’t want your competitors to know what sorts of data problems you’re working through in secret. You also need to check processing speed and the ability to get reliable answers. You perform all the standard checks on a machine learning API that you’d perform on any other API; however, the actual testing process differs to a degree (as discussed in the remainder of the article).

Understanding the Challenge

With most APIs, you need to test for specific sorts of things. For example, if you provide specific data as input, then you expect specific data as output. The answers are as easy as 2 + 2 = 4. Machine learning, by contrast, relies on huge data sets, multiple algorithms, and pattern recognition to meet its goals. Checking for one specific answer isn’t enough; you need to know how often the answers are right. In some cases, you don’t even know the answer until you use the API. That’s right, using machine learning can help you solve problems for which you don’t currently have an answer (or even a good guess), so there is no way for you to know the answer at the outset. It’s the reason you’re using machine learning in the first place!
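
The contrast between the two styles of check can be sketched in a few lines of Python. This is only an illustration: `toy_spam_model`, the labelled messages, and the 0.75 threshold are all made up for the example and don’t come from any particular machine learning API.

```python
def exact_match_test(predict, cases):
    """Classic API check: every input must map to its one known output."""
    return all(predict(x) == expected for x, expected in cases)


def accuracy(predict, cases):
    """Fraction of labelled cases the model answers correctly."""
    hits = sum(1 for x, expected in cases if predict(x) == expected)
    return hits / len(cases)


def threshold_test(predict, cases, min_accuracy=0.9):
    """Machine learning check: pass when accuracy is good enough."""
    return accuracy(predict, cases) >= min_accuracy


# Hypothetical stand-in for a trained model: flags a message as spam
# whenever it contains the word "free".
def toy_spam_model(message):
    return "spam" if "free" in message.lower() else "ham"


labelled_cases = [
    ("Free prize inside", "spam"),
    ("Meeting moved to 3pm", "ham"),
    ("Claim your FREE gift", "spam"),
    ("Lunch tomorrow?", "ham"),
    ("Feel free to call me", "ham"),  # the toy model gets this one wrong
]

print(exact_match_test(toy_spam_model, labelled_cases))      # False
print(accuracy(toy_spam_model, labelled_cases))              # 0.8
print(threshold_test(toy_spam_model, labelled_cases, 0.75))  # True
```

The exact-match check fails on a single wrong guess, while the threshold check asks whether the model is right often enough for your purposes. Choosing that threshold is a judgment call that depends on the problem domain.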

The manner in which you train the algorithm also makes a huge difference. If you provide the wrong sort of training data, then the algorithm won’t produce useful results. In this respect, it’s the same situation that you encounter when training anything else. If the training doesn’t match the real-world scenario, then the person, animal, or device lacks the skills required to deal with that scenario, and the only remedy is additional training.

When working with machine learning, you must also choose the correct machine to perform a specific type of problem solving. Not every machine is good at solving every problem, and there isn’t any silver bullet solution either. The only way you can know whether you have the correct machine (algorithm) to solve your particular problem is through testing. This means that you really require full access to a machine learning API, and you must take the time required to delve into each of the algorithms supplied by that API as you work through your specific problem scenario.

Performing an Initial Test

You need a starting point for any testing scenario. In most cases, you want a test data set that mirrors the real-world data set in every way possible, including size. The size of your test data set must match the size of the expected real-world data set because some algorithms are sensitive to size and complexity. That means if you use only a subset of the data, the results of your testing will be skewed and you won’t see the expected results in a real-world scenario.

A common rule of thumb is to divide this data set into two parts. You use 70 percent of the data to train your algorithm. Training means providing both the data and the associated answers that you’re seeking from the data. The correlation between data and answer forms the pattern recognition that machine learning relies upon to guess at answers for data that has no answers.

You present the other 30 percent to the algorithm as data alone, without the answers, and use it to test the results of your training. Because you hold the answers associated with this 30 percent, you can check how well the algorithm performs in guessing the correct answers. The machine won’t always provide the correct answer. Every machine learning scenario encounters situations where the output is incorrect because the input data is skewed in some way or contains outliers (data points that fall outside the expected range). The error rate of a machine learning algorithm helps determine whether that algorithm is the right one for your particular problem domain.
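
As a rough sketch of the 70/30 procedure, the following Python uses only the standard library. The data is synthetic and the nearest-centroid model is a deliberately simple stand-in chosen for illustration; a real machine learning API would supply its own training and prediction calls, and names such as `split_data` and `train_centroids` are hypothetical.

```python
import random


def split_data(rows, train_fraction=0.7, seed=0):
    """Shuffle labelled rows and split them into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_centroids(train_rows):
    """Learn the mean feature value for each label (nearest-centroid model)."""
    totals, counts = {}, {}
    for value, label in train_rows:
        totals[label] = totals.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


def predict(centroids, value):
    """Guess the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def error_rate(centroids, test_rows):
    """Fraction of test rows where the guess differs from the known answer."""
    wrong = sum(
        1 for value, label in test_rows if predict(centroids, value) != label
    )
    return wrong / len(test_rows)


# Synthetic, made-up data: two overlapping groups of measurements.
rng = random.Random(42)
rows = [(rng.gauss(0.0, 1.0), "low") for _ in range(100)]
rows += [(rng.gauss(3.0, 1.0), "high") for _ in range(100)]

train_rows, test_rows = split_data(rows)   # 70 percent / 30 percent
model = train_centroids(train_rows)        # training sees data and answers
print(f"trained on {len(train_rows)} rows, tested on {len(test_rows)} rows")
print(f"error rate: {error_rate(model, test_rows):.2%}")
```

The model never sees the answers for the held-out rows during training; they are used only afterward to count wrong guesses. Because the two groups overlap, some errors are expected even when everything works as intended.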

Using Cross-validation Tests

Assuming that you have settled on an algorithm, the training is completed, and the initial test goes well, you can use cross-validation testing to verify that you have the correct setup. It’s entirely possible to pass the initial test and still have a setup that won’t work in a real-world scenario.

To perform cross-validation testing, you split your data into ten parts. Nine parts are used to train your algorithm. You use the remaining part to test the algorithm. After each test, you obtain an error figure for that test (how many times the machine guesses wrong). You perform the test ten times, rotating through each of the ten parts one at a time as the test data. When the procedure is complete, you can verify that the error levels don’t vary significantly between tests. In addition, the average of the error outputs from the tests gives you an idea of just how well your machine will work in your specific problem scenario.
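
The ten-part rotation described above is commonly known as k-fold cross-validation with k = 10. Here is a minimal standard-library sketch of it, assuming synthetic data and a simple nearest-mean stand-in model. The `train` and `predict` functions are hypothetical placeholders for whatever your machine learning API actually provides.

```python
import random
import statistics


def k_fold_error_rates(rows, train, predict, k=10, seed=0):
    """Rotate through k parts, testing on one and training on the rest."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal parts
    rates = []
    for i, test_fold in enumerate(folds):
        train_rows = [
            row for j, fold in enumerate(folds) if j != i for row in fold
        ]
        model = train(train_rows)
        wrong = sum(
            1 for value, label in test_fold if predict(model, value) != label
        )
        rates.append(wrong / len(test_fold))
    return rates


# Hypothetical stand-in model: classify by the nearest per-label mean.
def train(train_rows):
    groups = {}
    for value, label in train_rows:
        groups.setdefault(label, []).append(value)
    return {label: statistics.mean(values) for label, values in groups.items()}


def predict(model, value):
    return min(model, key=lambda label: abs(model[label] - value))


# Synthetic, made-up data for illustration only.
rng = random.Random(7)
rows = [(rng.gauss(0.0, 1.0), "low") for _ in range(100)]
rows += [(rng.gauss(3.0, 1.0), "high") for _ in range(100)]

rates = k_fold_error_rates(rows, train, predict)
print("per-fold error rates:", [round(r, 2) for r in rates])
print("mean error:", round(statistics.mean(rates), 3))
print("spread (std dev):", round(statistics.pstdev(rates), 3))
```

A small spread suggests the error level is stable across the ten tests, and the mean serves as the overall estimate. How much spread counts as significant depends on your data and on how costly a wrong guess is.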

Checking Other Algorithms

Because of the nature of machine learning, you can’t ever be completely certain that another algorithm isn’t better than the one you’re using unless you actually test the other algorithm with your specific problem domain. This fact makes machine learning testing different from just about anything you might have experienced in the past. Yes, you have what you think is a really good answer, even though you have a few error outputs, but unless you compare that output with the output of other algorithms, you won’t know for certain. Part of the testing process is to test other algorithms (within reason) to determine whether they’ll provide better answers than the algorithm you tried at the outset. In short, testing can take quite a bit of time when working with machine learning because you test each available algorithm, rather than just one. The essential thing to remember is that no one algorithm is best at everything; each algorithm is best at just a particular problem domain.
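
A comparison of this kind might look like the following sketch, which trains three simple candidates on the same training rows and scores them on the same held-out rows. Everything here is an assumption made for illustration: the synthetic data is built so that one candidate clearly suits it, and the three toy algorithms stand in for whatever algorithms a real machine learning API offers.

```python
import random


def train_majority(train_rows):
    """Baseline: always answer with the most common training label."""
    labels = [label for _, label in train_rows]
    winner = max(sorted(set(labels)), key=labels.count)
    return lambda value: winner


def train_centroid(train_rows):
    """Answer with the label whose mean value is closest."""
    groups = {}
    for value, label in train_rows:
        groups.setdefault(label, []).append(value)
    means = {label: sum(v) / len(v) for label, v in groups.items()}
    return lambda value: min(means, key=lambda label: abs(means[label] - value))


def train_nearest_neighbor(train_rows):
    """Answer with the label of the single closest training row."""
    return lambda value: min(train_rows, key=lambda row: abs(row[0] - value))[1]


def error_rate(model, test_rows):
    """Fraction of test rows the model labels incorrectly."""
    wrong = sum(1 for value, label in test_rows if model(value) != label)
    return wrong / len(test_rows)


# Synthetic, made-up data: "mid" values sit between two "edge" groups,
# so a single mean per label describes the "edge" class poorly.
rng = random.Random(1)
rows = [(rng.gauss(0.0, 0.5), "mid") for _ in range(100)]
rows += [(rng.gauss(-4.0, 0.5), "edge") for _ in range(50)]
rows += [(rng.gauss(4.0, 0.5), "edge") for _ in range(50)]
rng.shuffle(rows)
train_rows, test_rows = rows[:140], rows[140:]

candidates = {
    "majority baseline": train_majority,
    "centroid": train_centroid,
    "nearest neighbor": train_nearest_neighbor,
}
results = {
    name: error_rate(trainer(train_rows), test_rows)
    for name, trainer in candidates.items()
}
for name, rate in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:>18}: {rate:.2%}")
```

On data shaped differently, the ranking could easily reverse, which is why each candidate has to be run against your own problem rather than judged by reputation.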

Further Resources