What Your QA Team Can Learn from Open Source Development Projects
Studies show that major FOSS projects have fewer defects per line of code than proprietary software. Free and open source projects follow slightly different protocols than their proprietary counterparts. You can apply some of these processes in your team to your benefit, even if you’re developing proprietary software.
Proprietary development shops can learn a few things from successful open source projects. Empirical evidence from these studies suggests that FOSS projects tend to have fewer defects per line of code than proprietary software does, but why? Licensing alone won't get you quality. Exemplary software quality comes from good engineering and from removing obstacles that prevent developers from doing their best work.
This is not to say that open source development is a guarantee of code quality. SourceForge, Google Project Hosting, GitHub, and other open source hosting sites are littered with projects that have lousy code. Josh Berkus, a core PostgreSQL contributor who's done time on proprietary projects, says, "Nothing is true across the board. You can't say proprietary software is always lower quality. There's some really schlocky open source software out there, and some really tightly-written proprietary code."
"It all depends on the 'proprietary software,'" adds Greg Kroah-Hartman, Linux kernel developer and maintainer for the linux-stable branch. "What about the software that runs the Space Shuttle? That's proprietary, and it has the least number of defects-per-lines of anything else known at this point in time."
However, the cream of the crop in open source tends to have fewer defects per line of code than proprietary software. In its 2003 series "How Open Source and Commercial Software Compare," Reasoning looked at several projects, including the Linux kernel's TCP/IP stack, Apache HTTP Server, the Tomcat Application Server, and MySQL. The code analysis tool provider found that "an active, mature open source project may have fewer defects than a similar commercial project."
Reasoning performed automated software inspection to find typical code errors that can cause application crashes, data corruption, or security vulnerabilities. For example, they looked for memory leaks, null pointer dereferences, bad deallocations, out of bounds array accesses, and uninitialized variables. This isn't a full analysis — code that does well in Reasoning's methodology may still have a less-than-stellar user interface or may lack features when compared to a proprietary application — but it does give insight into a significant aspect of code quality.
As a baseline, Reasoning found an "average defect density" of 0.57 defects per 1,000 lines of code (KLOC) across 200 customer projects totaling more than 35 million lines of code. In MySQL 4.0.16, for example, Reasoning found only 0.09 defects per KLOC, which it compared against commercial database components with between one and five years in the field. The Linux kernel's TCP/IP stack had a defect density of 0.013 per KLOC, the lowest rate among the commercial UNIX stacks and embedded networking operating systems it was compared with. Similar results were demonstrated by Coverity in its 2008 and 2009 open source reports.
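For teams that want to track the same metric, defect density is just a ratio of defects found to thousands of lines of code. The sketch below is illustrative only: the function names and the 250,000-line sample project are hypothetical, and it says nothing about how Reasoning's tools counted defects or lines.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Return defects per 1,000 lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


def expected_defects(density_per_kloc: float, lines_of_code: int) -> float:
    """Return the defect count implied by a density and a codebase size."""
    return density_per_kloc * lines_of_code / 1000


# Hypothetical project: 25 defects found in 250,000 lines of code.
sample_density = defect_density(25, 250_000)

# The article's baseline of 0.57 per KLOC applied to 35 million lines.
# The source says "more than 35 million", so treat this as a rough floor.
baseline_defects = expected_defects(0.57, 35_000_000)

print(f"Sample project: {sample_density:.2f} defects per KLOC")
print(f"Baseline implies roughly {baseline_defects:,.0f} defects")
```

Comparing your own number against a published baseline is only meaningful if defects and lines are counted the same way, so treat any such comparison as a rough indicator.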
Clearly, there are a few things that open source projects get right. Some of them can be emulated by any development process. Here’s how they might help your company’s development and QA teams.
The Unfair Advantage of Unpaid Staff
At first glance, a project that depends in part or in whole on contributors who are not paid to focus on the project would be at a disadvantage. After all, in a commercially motivated project with a dedicated development staff, the company can expect developers to show up and direct development, rather than depending on the whims of a community that may have other professional priorities.
However, while companies may be able to hire the best and brightest to work on a project, they can't hire everyone. Therein lies an advantage for open source projects that depend on contributors. They may not dedicate 40 hours a week to a project, but they bring their "A Game" when they do show up.
Berkus argues that open source projects "only get the most productive hours of [developers'] contributions."
The key is specialization. Where a company may not be able to staff a project with people who are experts in every area, a successful open source project may attract the right contributors for each feature. "We have PostgreSQL contributors who might put in 10 hours a month reviewing and writing code. That's actually the most productive hours they put in,” says Berkus. Some people have specialty knowledge (such as cryptography or memory management). They may work on one feature, make that improvement, and they're done. “Whereas when you have a fulltime staff you end up taking somebody whose specialty is completely different and say, 'Go learn about cryptography.' The feature you end up with is not as good as if you could count on this expertise," explains Berkus.
Open source projects also give anyone the power to review code — whether or not they're responsible for it. The axiom, "With enough eyes, all bugs are shallow," has a lot of truth to it. Berkus notes that proprietary shops discourage this kind of behavior. If a developer looks over code from another part of a project, they're likely to be asked by a manager, "You're not on that component, why are you wasting your time?"
The Marketing Influence
One advantage that open source projects have is that marketing and sales have little or no influence on their schedules. Many open source projects, particularly those that are not driven by a single vendor, have time-based release schedules or "release when it's ready" schedules. Features that are not ready at release time are held for future releases or eventually scrapped altogether.
Vendor-driven, proprietary projects tend to have schedules and feature requirements that encourage shipping code regardless of the actual quality. Berkus says that a problem for "packaged software" is that "It's driven by marketing. Marketing says that we need feature XYZ in the next release in 2012. Developers need to somehow make that happen. The problem is that if they run into unforeseen obstacles, it's seldom possible to increase the budget in order to overcome obstacles and stay on the same timeline... so they're faced with letting the release date slip or shoehorning in a feature that doesn't work properly. Really, it's a failure of a company refusing to choose between cost, schedule, and feature.”
That’s one key difference from corporate-driven projects. "An open source project does not have a marketing department. If you're asking about a feature, you're asking the release team," points out Berkus.
So how could a vendor solve this problem, faced with the need to ship a product on time, with features that were promised to customers? Berkus says that it could be difficult for a company that needs to make money on a project, but suggests that they "need to prioritize" and "have a contingency fund if a feature is more difficult than anticipated. Have a slush fund of X to spend on extra effort and time to make it happen."
The Marketing staff should look to Engineering to help set timelines, too. "You really ought to be having a meeting with engineering, asking, ‘How much is this going to cost to implement? What's the risk that it will fail?’" he says.
Berkus contends that open source and proprietary projects do share one common problem: The influence of high-end users. "For any piece of software, the users you're going to hear from most often are your high-end users. For databases, it’s the users with terabytes of data; they have a laundry list of things they want. This happens whether you're an open source project or a proprietary software company.”
Gradually, the development team begins to design the software around those users’ features because they make so much noise. That causes problems because "You break something the majority of your users actually liked." For example, Berkus says, "Something that used to work with one command becomes more complicated so that high-end users have more options. It makes high end users happy, but pisses everyone else off."
It's worse with sales-driven projects, though. "The conduit of communication is different," Berkus notes. Sales communicates the needs of the company’s big accounts to the development staff with far more emphasis than the needs of small accounts, if it passes those along at all. With open source projects, it's a level playing field. Users have direct access to bug trackers and development lists. They can even fix problems themselves.
To avoid the "power user distortion field," developers on either side need to listen to feedback from all users — not just the noisiest power users.
Wasting time may be advantageous, at least for quality if not bottom-line productivity. "If it's not broke, don't fix it." Surely we've all heard this from a project manager at some point or another in our careers. That might be a good excuse not to fix or replace aging office equipment as long as it's doing its job, but it's a lousy reason not to refactor code.
Open source projects are constantly refactoring code and taking out code that is no longer relevant. One only need look at the Linux kernel from release to release as a prime example. Pieces that are no longer relevant are removed. Features are constantly re-evaluated and redesigned for better performance.
Berkus says that 80% to 90% of the code in PostgreSQL has been refactored over the last 10 years, "and heavily used portions of the code more than that." What's more, PostgreSQL has cut code nearly as much as it's added code. "We've probably added 50,000 to 80,000 lines of code, and cut 30,000 lines of code." In proprietary projects, he says, "That sort of thing never happens. That comes down to management. They get rewarded for meeting deadlines, adding new stuff, saving money. They do not get rewarded for long-term code reliability and maintainability."
Proprietary projects, says Berkus, should also learn from the modularity of open source projects. Though open source projects often achieve this by re-using code from other projects (which a proprietary project may not be able to do), a proprietary shop can still try to make things more modular. Instead, says Berkus, proprietary shops "tend to build everything monolithically," which increases the amount of effort for each change as the codebase grows.
Finally, there's reputation to consider. While most developers in a proprietary shop are known only to their immediate peers, an open source developer is putting their reputation on the line for everyone to see. Says Kroah-Hartman, "The knowledge that others will be reviewing the code, and that your name is on it in a public area is a great incentive to take the time and effort to ensure that your code is correct before sending it out."
Still, Kroah-Hartman cautions that companies (and projects) should realize that quality is all about good engineering processes, not the licenses used, and that it's a mistake to group all projects of one type together. "Both groups need to learn from each other, and there are excellent examples of both successes and failures for both types of projects."
What else have you seen done in open source projects that could be adopted in other realms? Tell us about it in the comments.