Introduction
In 2016, it was reported that the US Air Force's F-35 Lightning II Joint Strike Fighter had been hit by yet another software issue, this one affecting the functioning of its radars and forcing pilots to reboot them. Previously, the USAF had stated that the software program had 419 deficiencies still to be addressed and that 700 to 800 issues with software, hardware, and the Autonomic Logistics Information System had already been corrected. [1] While the Air Force and its contractors work diligently to correct such mistakes, the potential fallout of software errors of this kind is catastrophic. Beyond putting at risk an asset costing upwards of $132M [2], software failures of this nature could also jeopardize military missions. Even more tragically, human lives could be lost.
The Problem is Complexity
As systems become more capable, it becomes harder to test in advance all the ways they will be used. Once software has been tested and the problems found have been fixed, it will work reliably under the conditions for which it was tested, but only under those conditions. The reason there are not more software tragedies is that testers have been able to exercise these systems in most of the ways they will typically be used. Yet all it takes is one software failure, and a subsequent lawsuit, to seriously damage a company's reputation. Such test-and-fix approaches remain vital forms of dynamic testing.
Whether performed on individual units or on the entire system, these dynamic approaches share one common shortcoming: they all rely on test cases. Test case scenarios are constructed from the same source documents that developers use, such as requirements and specification documents. These documents are far more thorough at defining what the finished product should do than at defining what it should not do. Developers inject about 100 defects into every 1,000 lines of code they write. [3] Many of these defects will never surface in the test case scenarios built from those documents, yet they could have devastating, unforeseen effects in the future.
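To see how such a defect hides, consider a minimal, hypothetical sketch in C (the function, the range, and the values are invented for illustration, not drawn from any real system): a routine that passes every test case derived from its requirement yet misbehaves on input the requirement never mentions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical example: average two redundant sensor readings. The
 * (invented) requirement specifies readings in the range 0..100000, so
 * every test case derived from it uses in-range values and passes. */
static int32_t average_readings(int32_t a, int32_t b)
{
    /* Latent defect: a + b can overflow int32_t, which is undefined
     * behavior in C; on a typical two's-complement target it wraps. */
    return (a + b) / 2;
}

int main(void)
{
    /* Requirement-derived test: in-range inputs, correct result (95000). */
    printf("nominal:  %d\n", average_readings(90000, 100000));

    /* Abnormal input (say, a corrupted reading) never appears in the test
     * cases, yet here the wrap-around yields a nonsensical negative value. */
    printf("abnormal: %d\n", average_readings(2000000000, 2000000000));
    return 0;
}
```

A reviewer reading the arithmetic is far more likely to question the unguarded addition than a tester who, quite reasonably, builds cases from the documented range.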
If Quality Cannot be Tested in, Then What?
Software product quality assurance in the aerospace industry has been intertwined with process assurance for more than 40 years. At NASA's Goddard Space Flight Center, software quality assurance (SQA) is assigned to the office of Systems Safety and Mission Assurance (SSMA). The SSMA office stipulates a list of tasks to be performed at each of six phases of the software development life cycle, from concept and requirements gathering through design, development, testing, and maintenance.
Within the development phase, the very first requirement of SSMA is to “perform code walk-throughs and peer reviews.” [4] DO-178C, “Software Considerations in Airborne Systems and Equipment Certification,” is the recently published document by which certification authorities such as the FAA, EASA, and Transport Canada approve all commercial software-based aerospace systems. DO-178C was published in January 2012 and replaces DO-178B, which was last revised in 1992. Section 6.1 defines the purpose of the software verification process. In comparison with its predecessor, DO-178C adds the following statement about the Executable Object Code: “The Executable Object Code is robust with respect to the software requirements such that it can respond correctly to abnormal inputs and conditions.” Compare that to the earlier version of this statement in DO-178B: “The Executable Object Code satisfies the software requirements (that is, intended function) and provides confidence in the absence of unintended functionality.” This addition underscores the importance of identifying and rectifying unintended behavior in the code. (DO-178B and DO-178C)
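A minimal sketch of what that robustness objective asks of the code, continuing the invented sensor example above (an illustration only, not code prescribed by DO-178C): an abnormal input is detected and reported rather than allowed to propagate a meaningless result.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define READING_MIN 0
#define READING_MAX 100000   /* hypothetical requirement-specified range */

/* Robust variant: out-of-range (abnormal) readings are rejected and
 * signaled to the caller instead of silently producing garbage. */
static bool average_readings_robust(int32_t a, int32_t b, int32_t *out)
{
    if (a < READING_MIN || a > READING_MAX ||
        b < READING_MIN || b > READING_MAX) {
        return false;                  /* abnormal condition reported */
    }
    *out = (a + b) / 2;                /* cannot overflow within the valid range */
    return true;
}

int main(void)
{
    int32_t avg;
    if (!average_readings_robust(2000000000, 90000, &avg)) {
        puts("abnormal reading rejected");   /* responds correctly to abnormal input */
    }
    return 0;
}
```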
The guidelines use the terms “verification” and “validation” (also referred to as “V&V”) to encompass software quality process requirements. While the terms are sometimes used interchangeably, validation generally refers to traditional, dynamic testing that provides confirmation “by objective evidence.” Verification, on the other hand, refers to confirmation by examination. In a software development environment, software verification is confirmation that the output of a particular phase of development meets all of the input requirements for that phase. Software testing is one of several verification activities intended to confirm that the software development output meets its input requirements. Other verification activities specifically listed include:
- Walk-throughs
- Various static and dynamic analyses
- Code and document inspections
- Module level testing
- Integration testing
As Capers Jones points out, "A synergistic combination of formal inspections, static analysis and formal testing can achieve combined defect removal efficiency levels of 99%." [5] Where tool-assisted peer review stands out is in code and document inspections, as well as in providing a central location for reviewing test cases, plans, and the results of static analysis tools. While some believe static analysis of the code is best left to automated tools, code reviews are actually more effective at finding errors. Most forms of testing average only about 30% to 35% defect removal efficiency and seldom top 50%. Formal design and code inspections, on the other hand, can achieve 95% defect removal efficiency. [6]
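The arithmetic behind a combined figure like 99% is simply that each activity removes a share of whatever defects survived the one before it. The short sketch below uses assumed per-stage efficiencies, chosen only to show how the stages compound, not figures taken from [5] or [6].

```c
#include <stdio.h>

int main(void)
{
    /* Assumed per-stage defect removal efficiencies (illustrative only):
     * formal inspection, static analysis, formal testing. */
    const double stages[] = { 0.85, 0.55, 0.85 };
    double surviving = 1.0;

    for (int i = 0; i < 3; i++) {
        surviving *= 1.0 - stages[i];   /* fraction escaping this stage */
    }
    printf("combined removal efficiency: %.1f%%\n", 100.0 * (1.0 - surviving));
    /* 0.15 * 0.45 * 0.15 is roughly 0.01, i.e. about 99% removed overall. */
    return 0;
}
```

Under those assumptions, only about one defect in a hundred escapes all three stages, which is why combining complementary activities outperforms leaning on any single one.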
There are some verification requirements that can only be satisfied by code review. “While analysis may be used to verify that all requirements are traced, only review can determine the correctness of the trace between requirements because human interpretation is required to understand the implications of any given requirement. The implications must be considered not only for the directly traced requirements but also for the untraced but applicable requirements. Human review techniques are better suited to such qualitative judgments than are analyses.” [7]
The software verification process is aimed at showing the correctness of the software. It consists of requirement reviews, code reviews, analyses, and testing. Reviews are to be regularly conducted throughout the software development process to ensure that the Software Development Plan is being followed. All steps in the decomposition of high-level system requirements to object code are considered in this process. DO-178B and DO-178C require examination of the output of all processes to check for software correctness and to find errors. DO-178C (section 6) requires that:
- The high-level software requirements are correctly and completely formed from the system requirements
- The high-level requirements are complete and consistent
- The software architecture correctly and completely meets all high-level requirements
- The low-level requirements correctly and completely fulfill the software architecture
- The low-level requirements are consistent and correct
- The software code correctly satisfies all low-level requirements
- All code is traceable to one or more low-level requirements
- The object code correctly implements the software on the target computer, and it is traceable and complies with all low-level and high-level requirements [8]
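One common way to make the traceability objectives reviewable is to annotate the code with the identifiers of the requirements it implements. The sketch below is hypothetical; the identifiers HLR-7 and LLR-42 and the requirement text are invented for illustration.

```c
#include <stdint.h>

/* Traces to LLR-42 ("report average engine temperature in whole degrees
 * Celsius"), which decomposes HLR-7. Requirement IDs are hypothetical. */
int32_t average_engine_temp_c(const int32_t *samples, int32_t count)
{
    int64_t sum = 0;                 /* 64-bit accumulator avoids overflow */
    for (int32_t i = 0; i < count; i++) {
        sum += samples[i];           /* LLR-42: accumulate raw samples */
    }
    return (count > 0) ? (int32_t)(sum / count) : 0;   /* LLR-42: whole degrees */
}
```

A reviewer can then confirm the trace in both directions: the code does what LLR-42 says, and nothing in the function exists without a requirement behind it.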
The guidance is clear that all code reviews must be in writing; otherwise, there is no proof a review has been performed. Statements about the code in general, about specific lines, and about specific issues must all be tied to the person, time, and date of their identification. If needed, this data should be presented as both comments and metrics to allow an accounting of the development process. Firms may perform, manage, and document the process manually, as long as they use “appropriate controls to ensure consistency and independence.”
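A minimal sketch of the record that such written documentation boils down to, whether kept by a tool or by hand (the structure and field names are assumptions, not taken from the guidance): each finding carries its statement, its location, and the person, time, and date of its identification.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical review-finding record: the fields mirror what the guidance
 * asks to be captured about each statement made during a review. */
struct review_finding {
    const char *file;        /* source file under review             */
    int         line;        /* specific line, or 0 for "in general"  */
    const char *author;      /* reviewer who raised the finding       */
    time_t      raised_at;   /* time and date of identification       */
    const char *comment;     /* the statement itself                  */
    int         resolved;    /* feeds process metrics when aggregated */
};

int main(void)
{
    struct review_finding f = {
        "nav/altitude.c", 112, "j.doe", time(NULL),
        "Unguarded addition can overflow; add a range check.", 0
    };
    printf("%s:%d [%s] %s\n", f.file, f.line, f.author, f.comment);
    return 0;
}
```

Aggregating the timestamps and resolution flags across such records is what turns individual comments into the metrics the guidance mentions.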
Source code evaluations should be extended to verification of internal linkages between modules and layers (horizontal and vertical interfaces) and of compliance with their design specifications. Documentation of the procedures used and the results of source code evaluations should be maintained as part of design verification. [9] DO-178C does not go into detail about how code reviews and evaluations should be performed. While thousands of organizations have successfully implemented and defended peer code reviews, many others have failed. The difference most often comes down to poor implementation strategies, which can be readily addressed:
- Reviews are too long. After just a few hours, attention wanders and effectiveness decreases; all-day code reviews can be almost painful. Keep reviews short, no more than one or two hours per day. In that time, developers will be able to review between 150 and 300 lines of code, depending on complexity. Not surprisingly, this rate of review also yields the highest rate of defects identified per line of code (defects/LOC).
- Reviews are seen as an additional task. This is especially true when a review backlog builds up. Rather than let reviews become a bottleneck, make them a daily activity or take them as they come in. Let a code review serve as a break from a hard problem or as a way to transition between tasks.
- Comments are seen as subjective. It is easy to discount a colleague’s comments as just their opinion. Make it easy for reviewers to annotate the specific code in question and to get other reviewers to weigh in.
- Remote reviews can be challenging. Distributed teams are a given, and bringing teams together for reviews is at odds with the need for regular, brief reviews. Instead, facilitate remote reviews with tools designed for remote collaboration in general and for peer code review specifically.
- Documentation is not automated. The administrative burden of documenting, archiving, and distributing review records can be overwhelming. Use tools that make compliance documentation an automatic by-product of the review.
One of the most important contributions a company can make to the successful adoption of code reviews is the set of tools it provides its teams. The right tool set lets each development team find its own best way to do code reviews, supporting a bottom-up approach to code review design and ensuring fuller achievement of both the potential gains and regulatory compliance.