Measuring the Benefits of Peer Review

  February 24, 2014

It doesn’t take as much effort as you might fear to collect, analyze and interpret metrics from your peer reviews. It’s more a matter of establishing a bit of infrastructure to store the data, then making it a habit for review participants to record just a few numbers from each review experience. In fact, I think that peer review metrics provide an easy way to begin growing a measurement culture in your organization.
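For instance, the “bit of infrastructure” can be as simple as a shared spreadsheet or a tiny script. Here is a minimal sketch in Python of the kind of per-review record you might capture; the field names and units are illustrative assumptions, not a prescribed format:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ReviewRecord:
    """The few numbers recorded from one peer review (illustrative fields, not a standard)."""
    review_date: date
    work_product: str      # e.g., "requirements spec", "design doc", "code module"
    size: float            # pages or thousands of lines reviewed
    prep_hours: float      # total reviewer preparation effort
    meeting_hours: float   # total review meeting effort, all participants
    rework_hours: float    # effort spent fixing the defects found
    major_defects: int
    minor_defects: int

    @property
    def defect_density(self) -> float:
        """Defects found per unit of size reviewed."""
        return (self.major_defects + self.minor_defects) / self.size
```

A spreadsheet with the same columns works just as well; what matters is capturing consistent fields from every review.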

Why Collect Data?

Recording data about the review process and product quality is a distinguishing characteristic of formal peer reviews, such as the type of rigorous peer reviews called inspections. Data answers important questions, provides quantifiable insights and historical perspective, and lets you base decisions on facts instead of perceptions, memories or opinions.

For example, one organization learned that it could inspect requirements specifications written by experienced business analysts twice as fast as those written by novices, because the experienced analysts’ specifications contained fewer defects. This data revealed the need to train and mentor novice BAs. Another organization improved its development process by studying data on defect injection rates and the types of defects its inspections did not catch. This example illustrates the value of recording the life-cycle activities during which each defect is created and discovered.

One way to choose appropriate metrics is the Goal-Question-Metric, or GQM, technique:

  1. First, state your business or technical goals.
  2. Next, identify questions you need to answer to tell if you are reaching those goals.
  3. Finally, select metrics that will let you answer those questions.

One goal might be to reduce your rework costs through peer reviews. Answers to the following questions could help you judge whether you’re reaching that worthy goal (a sketch of how the goal, questions, and metrics fit together follows the list):

  • What percentage of each project’s development effort is spent on rework?
  • How much effort do our reviews consume? How much do they save?
  • How many defects do we discover by review? What kind? How severe? At what life-cycle stage?
  • What percentage of the defects in our products do our reviews remove?
  • Do we spend less time testing, debugging, and maintaining products that we reviewed than those we did not?
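As a rough sketch of how that GQM chain might be written down (the candidate metrics and the review_effectiveness helper below are illustrative assumptions, not a standard set):

```python
# Hypothetical GQM mapping for the rework-reduction goal above.
gqm_rework = {
    "goal": "Reduce rework costs through peer reviews",
    "questions": {
        "What percentage of development effort is spent on rework?":
            ["rework hours per project", "total development hours per project"],
        "How much effort do reviews consume, and how much do they save?":
            ["prep + meeting hours per review", "estimated hours saved by finding defects early"],
        "How many defects do reviews find, of what kind and severity?":
            ["defects per review, broken out by type, severity, and life-cycle stage"],
        "What percentage of product defects do reviews remove?":
            ["defects found in review / (defects found in review + defects found later)"],
    },
}

# Example derived metric: the defect removal percentage for the last question.
def review_effectiveness(found_in_review: int, found_later: int) -> float:
    """Fraction of known defects that reviews caught before later testing or release."""
    total = found_in_review + found_later
    return found_in_review / total if total else 0.0

print(f"{review_effectiveness(40, 10):.0%}")  # 40 of 50 known defects -> 80%
```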

Some Measurement Caveats

Software measurement is a sensitive subject. It’s important to be honest and nonjudgmental about metrics. Data is neither good nor bad, so a manager must neither reward nor punish individuals for their metrics results. The first time a team member is penalized for some data he reported is the last time that person will submit accurate data.

Defects found prior to peer review should remain private to the author, and information about defects found in a specific peer review should be shared only with the project team, not with its managers. You can aggregate the data from multiple reviews to monitor averages and trends in your peer review process without compromising the privacy of individual authors. The project manager should share this aggregated data with the rest of the team so they can see the insights the data provides and recognize the benefits of peer review.
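As a minimal sketch of that kind of rollup (the numbers and field names below are made up; the point is that only process-level aggregates, with no author or reviewer names, ever leave the team):

```python
from statistics import mean

# Hypothetical per-review summaries: process numbers only, no names attached.
reviews = [
    {"defects": 12, "size_pages": 20, "prep_hours": 6.0},
    {"defects": 5,  "size_pages": 8,  "prep_hours": 2.5},
    {"defects": 9,  "size_pages": 15, "prep_hours": 4.0},
]

avg_defect_density = mean(r["defects"] / r["size_pages"] for r in reviews)
avg_prep_rate = mean(r["size_pages"] / r["prep_hours"] for r in reviews)

print(f"Average defect density: {avg_defect_density:.2f} defects/page")
print(f"Average preparation rate: {avg_prep_rate:.1f} pages/hour")
```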

Beware the phenomenon known as measurement dysfunction, which arises when the measurement process, or the ways in which managers use the data, leads to counterproductive behaviors by the people providing the data. People behave in the ways for which they are rewarded; they usually avoid behaviors that could have unpleasant consequences. Some forms of measurement dysfunction that can arise from peer reviews include inflating or deflating defect severities, marking defects as closed when they really aren’t resolved, and distorting defect densities, preparation times, and defect discovery rates to make them look more favorable.

There’s a natural tension between a work product author’s desire to create defect-free products and the reviewers’ desire to find lots of bugs. Evaluating either authors or reviewers according to the number of defects found during a review will lead to conflict.

If you rate reviewers based on how many defects they find, they’ll report many defects, even if it means arguing with the author about whether every small issue truly is a defect. It’s not necessary to know who identified each defect or to count how many each reviewer found. What is important is that all team members participate constructively in peer reviews.

Help managers avoid the temptation to misuse the data for individual performance evaluation by not making individual defect data available to them.

It’s also tempting to overanalyze the review data. Avoid trying to draw significant conclusions from data collected shortly after launching your peer review program. There’s a definite learning curve as software people begin participating in systematic reviews and figure out how to do them effectively and constructively.

If you begin tracking a new metric, give it time to stabilize and make sure you’re getting reliable data before jumping to any conclusions. The trends you observe are more significant than any single data point.
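One simple way to focus on trends rather than individual data points is to chart a short moving average of each metric. Here’s a hedged sketch (the three-review window and the sample densities are arbitrary assumptions):

```python
# Smooth noisy per-review defect densities with a simple moving average so that
# trends, rather than any single data point, drive the conclusions.
def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Average each value with up to `window - 1` of its predecessors."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

densities = [0.90, 0.40, 0.70, 0.60, 0.50, 0.55]  # hypothetical defects per page
print(moving_average(densities))
```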

Now that we have a basic foundation of peer review measurement principles, the next sections will get into some specific metrics to track and how to analyze the data. To read more about base metrics, derived metrics, and how to get the most out of your peer review data, download our free eBook below:

Measuring the Benefits of Peer Review eBook
