Volkswagen Test Rigging: Who’s to Blame for Cheating Software? Part 2

  September 25, 2015

Ways in which regulators can find scandalous audit logs

As Volkswagen’s emission scandal has widened, its share price has tanked and the company has set aside approximately $7.3 billion to cover costs related to the scandal, including recalls. Its market cap has plummeted even further since we wrote Part 1 of this series, and the week is just coming to a close.

Who’s to Blame?

Volkswagen CEO Martin Winterkorn stepped down as the company tries to cope with the heat for cheating on emission tests and thwarting environmental standards. At the same time, the U.S. and German governments are looking to launch criminal investigations in order to identify the culprits. No one from Volkswagen has come forward to accept responsibility for faking emissions tests. Even the CEO said there was no wrongdoing on his part. While some would blame corporate greed, the question arises: how should regulators go about finding the truth? The answer may lie in following a “bottom to top” approach. Following the development and testing audit trail, provided such a log still exists and has not been tampered with, could be the key.

Let’s take a look at the software development lifecycle journey from the start. Welcome to software development 1.0, my friends:

Requirement Gathering: Since Volkswagen has admitted to installing software that sensed when the car was being tested, thereby activating the equipment to reduce emissions, a critical question arises. Were such requirements explicitly asked for by someone working on the project? If they were, were they documented in a requirement management system?  The answer to this question is probably NO. Having an explicit document highlighting the need for artificially lowering the test results is something that creates obvious problems. That brings me to the second stage:

Coding: Without an explicit requirement document specifying the need for cheating software, how was the feature actually built into the product? Developers probably had verbal commitments or, as often happens in software development, the feature was rolled out due to a mistake on someone's part. In either scenario, a few critical areas to look at would be:

  • Source Control Management check-ins: These check-ins would help explain who made changes to the code, when such changes were made, and which specific revisions took place during the timeframe. The timestamp, along with the name of the person making the change, would help clarify whether this was a deliberate attempt to cheat the system.

  • Unit test cases written: A good programming practice is to write a unit test case to ensure that the code works as the developer expects it to work. As a result, to verify their code, developers often send different dummy inputs to particular methods just to verify the method returns an expected value.

In a recent article, the German newsmagazine “Der Spiegel” described how applications are going beyond the state of testability because automobiles now contain 100 million lines of code. This certainly makes it difficult to test applications that are able to recognize specific situations. However, it is critical to look not just at the tests performed on the application as a whole, but also at the functional and non-functional tests executed against specific portions of code before they were actually checked in. These could in turn prove useful in understanding whether unit tests for the particular piece of software in play revealed the context in which the code gave different toxic readings.
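To make the unit test idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the function nox_control_level, its parameters, and the steering-angle condition are assumptions, not Volkswagen's actual code. It only shows how a developer's dummy inputs, if they were checked in as test cases, would document the context in which the code behaves differently.

```python
import unittest


def nox_control_level(speed_kmh: float, steering_angle_deg: float) -> str:
    """Hypothetical controller: returns 'full' or 'reduced' emission control."""
    # Assumed context check (illustration only): wheels turning while the
    # steering wheel stays still looks like a test bench, not road driving.
    looks_like_test_bench = speed_kmh > 0 and abs(steering_angle_deg) < 1.0
    return "full" if looks_like_test_bench else "reduced"


class NoxControlLevelTest(unittest.TestCase):
    def test_dummy_input_resembling_test_bench(self):
        # Dummy input: moving wheels, no steering movement.
        self.assertEqual(nox_control_level(50.0, 0.0), "full")

    def test_dummy_input_resembling_road_driving(self):
        # Dummy input: same speed, but the steering wheel is being turned.
        self.assertEqual(nox_control_level(50.0, 15.0), "reduced")
```

Saved in a file, these cases can be run with python -m unittest. A pair of test cases like this, sitting next to the code in source control, would show an investigator exactly which inputs the developer expected to produce which behavior.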

  • Code Coverage: Another way of determining what percentage of the 100 million lines of code was tested is to look at code coverage metrics. A variety of coverage metrics (statement, branch, path, etc.) offer different insights into which parts of the code base were tested. Using this information, a developer would typically figure out which parts still need testing and then create test cases to ensure every statement in the code is covered.

In Volkswagen’s case, path coverage, if it was ever tracked, would have been a great way to find all paths or loops from start to end inside an application that had been tested at least once. The logical complexity of an application design can be handled by using tools available for creating flow graphs that are often used in code coverage. Especially when reports specify that the software sat dormant in the vehicles until engineers hitched it up for testing, an insight into coverage metrics could have come in handy. It can help figure out if a particular assembly, class, or method was actually tested. If it wasn’t tested, why weren’t additional tests written for the method causing the problem? Or if it was tested, why didn’t the test case reveal an unknown problem deep in the code base?

Outsourcing Software? How Does that Play In?

Many times, organizations outsource their development to companies in other countries, primarily to minimize cost and increase margins. If the software in play was actually outsourced, the firm that developed it also has a stake in the controversy. But should Volkswagen then bear a lot less of the blame? To answer that, it is important to understand who performed the testing, and what kind of testing was performed, before the software was rolled out to the market. This brings me to the testing phase of the software.

Integration and Regression Tests: A critical piece of evidence is the kind of integration tests (manual or automated) that were conducted. The key question is whether the program in question was actually tested with other modules, and what the results were. Additionally, with countries like Australia unable to confirm whether Volkswagen cars there have the cheating software, such integration tests could hold the key to understanding whether Volkswagen saw any discrepancies when testing was performed on its end.

Similarly, regression tests are often run to manage the risk of change associated with rolling out a new feature. With the code base changing after the introduction of the new software, especially given that the system consists of fairly complex sub-systems, did regression tests indicate any problems and, in turn, link their causes to the new feature?

Getting access to any of these test results requires that the results were tracked and have not been doctored. A test management system that records what kinds of tests were run, who ran them, and when they were run would be critical.
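A minimal sketch of the kind of record such a system might hold, and the kind of question an investigator might ask of it, follows. The RunRecord fields, the module name, and the sample entries are all invented for illustration and do not describe any real test management product or any real Volkswagen data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RunRecord:
    kind: str          # e.g. "integration" or "regression"
    module: str        # part of the code base the test exercised
    run_by: str        # who ran the test
    run_at: datetime   # when it was run
    passed: bool


def runs_for_module(runs, module, start, end):
    """Return runs against `module` between start and end, oldest first."""
    selected = [r for r in runs if r.module == module and start <= r.run_at <= end]
    return sorted(selected, key=lambda r: r.run_at)


# Invented sample data, for illustration only.
history = [
    RunRecord("regression", "emission_control", "engineer_b", datetime(2014, 3, 9), False),
    RunRecord("integration", "emission_control", "engineer_a", datetime(2014, 3, 2), True),
    RunRecord("integration", "infotainment", "engineer_a", datetime(2014, 3, 5), True),
]

timeline = runs_for_module(
    history, "emission_control", datetime(2014, 3, 1), datetime(2014, 3, 31)
)
```

The resulting timeline lists, in order, who tested the module in question and with what outcome, which is exactly the "what, who, and when" trail described above.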

There are a lot of “ifs” and “buts” involved in uncovering how the chain of events actually played out in this real-world scenario. Over the next few days, it will be interesting to see whether any of this evidence comes in handy during the investigation.

Nikhil Kaul is Product Marketing Manager for TestComplete at SmartBear Software, the choice of more than two million software professionals for building and delivering the world’s best applications. He recently received a Master's Degree in Business Administration from Georgetown University. Read his blog posts or follow him on Twitter @kaulnikhil