In the first two articles in this series, I discussed some concepts and principles about measuring peer reviews, and identified a number of base and derived metrics to calculate from each review. This final article, adapted from my book "Peer Reviews in Software," describes how to analyze all of that peer review data.
One aspect of benefit is effectiveness, or yield: the percentage of the defects present in a work product that your reviews discover. You’d like this to be close to 100%. Another aspect is efficiency: the effort required to find a defect by peer review. Less effort per defect indicates higher efficiency. As before, the discussion below refers to inspections as the type of formal peer review from which data typically is collected.
As you accumulate data from multiple inspections of a particular type of work product, the spreadsheet I described in the previous article will calculate the averages of the metrics. You can then use scatter charts to look for correlations between pairs of metrics. The spreadsheets do not contain any scatter charts, but it wouldn’t be difficult to add those that you find informative.
Figure 1 shows a sample scatter chart. This chart plots Effort.per.Defect (average labor hours of review needed to find one defect) against Rate.Preparation (average lines of code per hour studied during individual preparation) for a set of code inspections. Each data point represents a separate inspection. For example, the highlighted point in Figure 1 marks an inspection that had a preparation rate of 100 lines per hour and found one defect for every half hour of review effort.
Figure 1. Sample scatter chart of two inspection metrics
This chart shows that inspections having preparation rates slower than about 200 lines of code per hour are less efficient than those with higher preparation rates. That is, it takes more hours of work, on average, to find each defect if the inspectors prepare slowly. This doesn’t necessarily mean that you should race through the material to try to boost efficiency, though.
A plot of Defect.Density (the number of defects found per thousand lines of code) versus Rate.Inspection (Figure 2) shows that effectiveness decreases at higher inspection rates. That is, the faster you go, the fewer defects you find. (Or is it that discovering more defects forces you to cover the material more slowly?) This type of data analysis helps you judge the optimum balance between efficiency and effectiveness.
Figure 2. Relationship between defect density and code inspection rate
Statistical Process Control
As your organization establishes repeatable development and inspection processes, you can use statistical process control (SPC) to monitor key inspection parameters. SPC is a set of analytical techniques for measuring the stability of a process and identifying when an individual performance of the process falls outside the expected range of variation, or control limits. Points that lie beyond the control limits stimulate an inquiry to understand why. The historical data trends from a process that is in control facilitate predicting future outcomes of the process. Inspection metrics that are amenable to SPC include Rate.Preparation, Rate.Inspection, Size.Actual, and Defect.Density.
As a simple illustration of SPC, Figure 3 depicts a control chart containing preparation rates (lines of code per hour) from 25 code inspections. The control chart plots specific data (Rate.Preparation) from the process being observed (inspection preparation) versus the set of observations (inspection number). The average preparation rate was 240 lines of code per hour, shown with the solid horizontal line. The upper control limit, shown as the dashed line at 440 LOC/hour, is a statistic that attempts to discriminate normal variation or noise in the process from variations that can be attributed to some assignable cause.
Figure 3. Sample inspection data control chart
Figure 3 shows that the preparation rate for inspection 14 (shown in red) was an abnormally high 560 LOC/hour, which lies well beyond the upper control limit. Because inspection 14 departed from the expected preparation rate range, its results should be viewed with caution. Did the inspectors spend less time preparing than the organization’s historical data recommends for optimum defect discovery? Was there something unusual about the work product, such as particularly straightforward or clean code, or reused code, that justified the faster preparation rate? Was the inspection team particularly experienced and efficient?
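The control-limit arithmetic can be sketched in a few lines. This version assumes the simple mean-plus-three-standard-deviations convention and computes the limits from in-control historical data; production SPC tools typically estimate sigma from moving ranges instead, and all rates below are hypothetical:

```python
# Sketch: flag a preparation rate that falls outside control limits
# computed from in-control historical data. All values are hypothetical.
from statistics import mean, stdev

def control_limits(baseline, sigmas=3):
    """Return (lower, center, upper) limits from in-control observations."""
    center = mean(baseline)
    spread = sigmas * stdev(baseline)
    return center - spread, center, center + spread

# Historical Rate.Preparation observations, LOC/hour:
baseline = [230, 250, 210, 260, 240, 220, 255, 245, 235, 250,
            225, 265, 240, 230]
lcl, center, ucl = control_limits(baseline)

# A new inspection prepared at 560 LOC/hour lies beyond the upper
# limit, so its results deserve scrutiny before being trusted.
new_rate = 560
print(new_rate > ucl)
```

Note that the limits are computed from a baseline believed to be in control; including an outlier like the 560 LOC/hour point in the baseline would inflate the limits and mask the very signal you are looking for.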
Correlating defect densities with preparation rates for a repeatable process lets you judge whether a code component that had an abnormally high preparation rate might suffer future quality problems because of residual defects. This is the kind of data analysis that allowed my friend Nick, the quality manager mentioned in the first article in this series, to predict how many major defects he expected to find when inspecting a new code module.
Measuring the Impact of Inspections
Participants can subjectively judge the value of inspections. Sometimes, though, a more quantitative benefit analysis is desired. Ideally, you will be able to demonstrate that inspections are saving your project team, company, or customers more time than they consume.
Inspection effectiveness is a lagging indicator: you can’t measure it at the time of the inspection, only later. Calculating effectiveness requires that you know both how many defects your inspections discover and how many were found later in development, during testing, and by customers. Defects found by customers within three to six months following product release provide a reasonable indication of how many significant defects eluded your quality nets. To illustrate an inspection effectiveness calculation, consider the following sample data for a single code module:
Defects found during code inspection: 7
Defects found during unit testing:
Defects found during system testing:
Defects found by customer:
Total defects initially present in the code: 13
Code inspection effectiveness: 100 * (7 / 13) = 54%
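The effectiveness arithmetic generalizes to a small helper. A sketch, with the function name my own and the counts taken from the example above:

```python
def inspection_effectiveness(found_by_inspection, found_later):
    """Percentage of all known defects that the inspection caught.

    found_later counts defects discovered afterward: in unit and
    system testing, plus those reported by customers within the
    post-release tracking window.
    """
    total = found_by_inspection + found_later
    return 100 * found_by_inspection / total

# The example above: 7 of 13 total defects were found by inspection.
print(round(inspection_effectiveness(7, 6)))  # 54
```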
If you know your inspection effectiveness, you can estimate how many defects remain in a document following inspection. Suppose that your average effectiveness for inspecting requirements specifications is 60 percent. You found 16 major defects while inspecting a particular specification. Assuming that the average applies to this inspection, you can estimate that the document originally contained about 27 major defects, of which 11 remain to be discovered later. Without knowing your inspection effectiveness, you can’t make any claims about the quality of a document following inspection.
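The estimate is simply the found count divided by the effectiveness expressed as a fraction. A sketch (the helper name is mine) using the numbers from this paragraph:

```python
def estimate_remaining_defects(found, effectiveness_pct):
    """Estimate (total, remaining) defects given average effectiveness."""
    total = round(found / (effectiveness_pct / 100))
    return total, total - found

# 16 major defects found at an average 60% inspection effectiveness:
total, remaining = estimate_remaining_defects(16, 60)
print(total, remaining)  # 27 11
```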
Suppose you inspect just a small sample of a document. Combining your known inspection effectiveness, the document size, and the number of defects found in the sample lets you estimate the total defects in the rest of the document. This is simplistic because it assumes that the sample is representative of the whole document, which might not be true. To estimate the potential cost of those remaining defects, multiply the defect count estimate by your average cost of correcting defects found through testing or by customers. (If you don’t know those average costs, begin collecting them now!) Compare that projected cost with the cost of inspecting the rest of the document to judge whether such inspection is warranted economically.
Using some typical numbers, the following example suggests that it would be cheaper to remove the remaining defects by inspection (90 labor hours) than by system testing (270 labor hours). This illustration assumes that testing and inspection find the same types of bugs, but that also might not be the case.
Defects found in a 2-page sample of 20 pages of code:
Estimated defects in the remaining 18 pages: 18
Effort spent inspecting the 2-page sample: 10 labor hours
Estimated effort to inspect the remaining 18 pages: 5 labor hours/page * 18 pages = 90 labor hours
Average effort to find and correct a defect in system test: 15 labor hours
Estimated effort needed to find and correct the remaining 18 defects by system testing: 15 * 18 = 270 labor hours
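The comparison above reduces to two multiplications. A sketch using the example's numbers, with illustrative function names:

```python
def remaining_inspection_cost(pages_left, hours_per_page):
    """Labor hours to inspect the pages not yet inspected."""
    return pages_left * hours_per_page

def remaining_testing_cost(defects_left, hours_per_defect):
    """Labor hours to find and fix the remaining defects in system test."""
    return defects_left * hours_per_defect

inspect_cost = remaining_inspection_cost(18, 5)   # 90 labor hours
test_cost = remaining_testing_cost(18, 15)        # 270 labor hours
print(inspect_cost < test_cost)  # True: inspection is cheaper here
```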
The more efficient your inspections are, the more defects they discover per hour of effort. Your efficiency will increase with experience. When you begin holding inspections, strive to maximize efficiency, which reduces the average cost of finding a defect.
There’s a paradox here, though. A successful inspection program leads to process improvements that reduce the number of errors developers make. This leaves fewer defects in the deliverables to be discovered by inspection or testing. So, as your product quality improves, the cost of discovering each defect will increase, and the trends in your metrics will become harder to interpret. Monitor both inspection efficiency and effectiveness to understand whether a trend toward decreasing efficiency truly indicates higher product quality, or if it means your inspections are not working as well as they should be. You might reach a point where the increased cost of using inspections to hunt for the few defects present in the product exceeds the business risks of shipping the product with those defects still present.
Peer review measurement, like everything else you do on a project, is not free. If you’re serious about quality, though, you’ll find that the small investment you make in metrics is well repaid by the insights you gain. You’ll learn how your peer review program is—and is not—working and how to improve it. And you’ll be able to convince skeptics that peer reviews are indeed a valuable investment that improves both product quality and team productivity.