Code Coverage in Large Applications

Although its name refers to code, the code coverage metric is just as meaningful for testing. It is a broadly accepted maxim that “the code not tested is the code that does not work,” and code coverage gives you a direct measure of how much of your code is tested. Code coverage lets you detect which parts of the code are exercised by your automated or manual tests and which are not. It is an established best practice to collect code coverage for both unit and functional tests in order to measure how effective the existing tests are. Code coverage also lets you identify potential dead code in the tested application, which is at a minimum useful for general code hygiene and is required in groups where “no dead code” is a policy.

Code coverage analysis can be applied to a software product as a whole, as well as to specific components or libraries. Depending on the codebase size, measured in lines of source code (LOC), collecting code coverage presents varying degrees of difficulty. Getting code coverage for a 10,000 LOC application can be fairly easy, while doing so for a 500,000 LOC application may require a more complicated test setup. Code coverage analysis of several million LOC is a very different beast and calls for a carefully planned divide-and-conquer approach.

Integrated Code Coverage Analysis with AQtime

At SmartBear, we make products for software quality, so we care about code quality like no one else. Our flagship product, TestComplete, is a very complex piece of software that includes more than 3 million LOC. This is comparable, for example, to the size of the Eclipse Platform or the Qt 5 framework. In this article, we would like to share some real-life challenges that we came across during coverage profiling of TestComplete and how we solved them with the help of AQtime. Some tips are technical, while others are general workflow recommendations. We hope you will find all of them useful for setting up code coverage analysis of your large and unruly apps.

AQtime, a runtime application profiler, includes the capability to collect code coverage (line/statement coverage), along with performance, memory, and other runtime metrics. AQtime collects code coverage information during a single application run and can merge coverage data from multiple runs into a roll-up report.

If you use both AQtime and TestComplete, we provide an easy way to integrate TestComplete functional testing with the code coverage capabilities of AQtime. Using this integration, you can automatically collect code coverage for your TestComplete automated tests and create roll-up reports.

Effective Practices for Coverage Analysis of Large Applications with AQtime

Disable Compiler and Linker Optimizations During Build

AQtime identifies an application’s functions by analyzing its binary code rather than the source code. It then uses the debug information to map the application’s binary code instructions back to the original source code lines.

Compiler and linker optimizations, however, can complicate the matter significantly and cause unusual profiling results. For example, when profiling an optimized release build, a developer can see a method reported as hit when it is never actually executed, or end up with total coverage numbers that are higher than they really are. To avoid these and similar issues, we recommend performing coverage analysis on binaries built without compiler and linker optimizations.

Here are some examples that illustrate this (a short build and code sketch follows the list):

  • The Visual C++ linker supports the COMDAT folding optimization (/OPT:ICF), whereby functions containing identical code are folded (merged) into a single copy in the resulting EXE to reduce its size. This optimization kicks in, for instance, with templates, field accessors, AddRef methods and default destructors. In this case, the same function in the application binary code corresponds to several functions in the source code, and there is no way for the profiler to know which source-level function is the “correct” one. As a result, you can see misattributed function hits in your coverage results.
  • Inline functions are expanded into the body of the caller at each function call (where possible). Since inlined functions do not exist as functions in the compiled binary and debug information, they aren’t discoverable by AQtime. Consequently, they do not show up in the coverage results. The coverage numbers for inlined functions are included in the coverage of their caller functions.
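
To make both effects concrete, here is a minimal sketch (not taken from TestComplete) together with one possible Visual C++ command line for an unoptimized coverage build; the exact flags for your own project may differ:

```cpp
// Build without optimizations for coverage runs, for example:
//   cl /Od /Ob0 /Zi example.cpp /link /DEBUG /OPT:NOICF /OPT:NOREF
// /Od and /Ob0 disable optimization and inlining, /Zi and /DEBUG produce
// debug information, and /OPT:NOICF turns off COMDAT folding.

#include <cstddef>
#include <vector>

// With /OPT:ICF these two instantiations compile to byte-identical code and may
// be folded into one function in the EXE, so a hit on one can be reported
// against the other in the coverage results.
template <typename T>
std::size_t CountItems(const std::vector<T>& items) {
    return items.size();
}

// With optimizations enabled this accessor is typically inlined into its
// callers, disappears from the binary, and therefore never shows up as a
// separate function in the coverage report.
inline int Twice(int value) { return value * 2; }

int main() {
    std::vector<int>      ints{1, 2, 3};
    std::vector<unsigned> counts{4u, 5u};
    return static_cast<int>(CountItems(ints) + CountItems(counts)) + Twice(0);
}
```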

Use Light Coverage Profiler for Large Applications

AQtime includes two profilers for coverage analysis – the Coverage profiler and the Light Coverage profiler. The latter, as you can guess from the name, is a lightweight version of the Coverage profiler: it does not track certain statistics, such as threads and hit counts.

Unlike performance profiling, where it is useful to know which lines and functions get hit more often than others, coverage analysis is only about whether a line or function gets hit at all. The Light Coverage profiler collects less data than the Coverage profiler, which makes it faster and adds less overhead to application execution. For complex applications with large codebases, the Light Coverage profiler is therefore the preferred choice.

Do Smaller Test Runs and Merge Results

Large applications consisting of many executable modules also come with a huge volume of debug information – up to ten times larger than the file size of the application binaries. For example, the total size of all TestComplete modules including debug information is about 2 GB, which is a challenging amount of data for a profiler to keep track of in real time, given that this amount alone hits the process memory limit on 32-bit Windows.

In addition, the profiler needs to dynamically collect data from the profiled application about the binary instructions being executed, so it is in danger of running out of memory.

That is why a rule of thumb when dealing with large applications is to collect coverage for each test separately rather than in one long session for all tests at once. First, this gives better profiling performance and helps avoid memory limit issues. Second, it is generally easier to isolate issues, should they come up during profiling, and relate them to a specific test rather than to a continuous sequence of tests.

This approach, however, leaves you with multiple coverage result sets from multiple profiler runs. You will then need to combine these individual result sets to get the total coverage percentage for your project. In AQtime, this is done by merging two or more sets of coverage results.

The resulting merged report shows you aggregated data for multiple coverage test runs. Here, a source code line is reported as covered if it was covered in at least one of the result sets merged into the report. The total coverage for functions, source files, binary modules and the entire project is also re-calculated from the summarized data, so you can easily see these totals across all the test runs performed.
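
Conceptually, the merge works like a per-file set union of covered lines. The sketch below only illustrates that rule with made-up data structures; it is not AQtime's actual result format:

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

// Conceptual sketch only: per-run results map a source file to the set of line
// numbers that were hit during that run.
using RunCoverage = std::map<std::string, std::set<int>>;

// A line counts as covered in the merged report if any run covered it,
// i.e. the merge is a per-file set union.
RunCoverage MergeRuns(const std::vector<RunCoverage>& runs) {
    RunCoverage merged;
    for (const auto& run : runs) {
        for (const auto& [file, lines] : run) {
            merged[file].insert(lines.begin(), lines.end());
        }
    }
    return merged;
}

// Total line coverage for the merged report, given the number of executable
// lines known for each file (hypothetical input the profiler would supply).
double TotalCoveragePercent(const RunCoverage& merged,
                            const std::map<std::string, int>& executableLines) {
    int covered = 0, total = 0;
    for (const auto& [file, lineCount] : executableLines) {
        total += lineCount;
        auto it = merged.find(file);
        if (it != merged.end()) covered += static_cast<int>(it->second.size());
    }
    return total > 0 ? 100.0 * covered / total : 0.0;
}

int main() {
    RunCoverage runA{{"parser.cpp", {10, 11, 12}}};
    RunCoverage runB{{"parser.cpp", {12, 40}}, {"ui.cpp", {5}}};
    RunCoverage merged = MergeRuns({runA, runB});
    std::cout << TotalCoveragePercent(merged, {{"parser.cpp", 100}, {"ui.cpp", 50}})
              << "%\n";   // 5 of 150 executable lines covered across both runs
    return 0;
}
```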

Increase Application Timeouts and Test Run Timeouts During Coverage Analysis

In general, adding coverage analysis to tests increases test runtime because collecting coverage has some runtime overhead. While this overhead is typically quite small, it varies depending on a number of factors, such as:

  • The number of application modules, functions and source code lines being analyzed at once. For example, analyzing code coverage for 5-10 modules is faster than for 50 modules.
  • The level of code coverage analysis. For example, function coverage analysis has better runtime performance than line coverage analysis.
  • Computer hardware configuration. For example, coverage profiling on a physical machine is faster than on a virtual machine.

In some cases, the performance overhead is low or even insignificant. In other cases, the performance impact can cause the application execution to take a different source code path. This can happen with applications that interact with external resources (databases, web servers, remote computers, other applications and so on) or include time-sensitive components. With such applications, even minimal performance degradation may cause the application to hit its timeout limits before completing operations, and therefore trigger timeout-related code. As a result, the application may behave differently during coverage profiling than it does normally, and you may get different coverage data because different code paths are touched.

To avoid timeout-related issues during coverage profiling, we recommend temporarily increasing the timeout limits used in your application and infrastructure – database timeouts, request timeouts, session timeouts and so on – for the period during which coverage profiling is performed. This helps prevent your application from hitting timeouts, and consequently from exercising unexpected code paths, when it runs under a profiler.

This recommendation also applies to timeouts configured for the unit and functional tests that you use for coverage analysis. For example, you may need to increase timeout limits for the test runs to accommodate the slowdown that code coverage instrumentation introduces in the tested application. The exact values depend, of course, on the specific application, the tests used for coverage analysis and the factors mentioned above.
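
One simple way to handle application-side timeouts is to scale them when the application detects it is running under the profiler. The sketch below assumes a hypothetical COVERAGE_PROFILING environment variable set by the test harness; this is just one possible convention, not an AQtime feature:

```cpp
#include <chrono>
#include <cstdlib>

// Timeouts for external calls are scaled up during coverage runs so that the
// profiling overhead does not push execution onto timeout-handling code paths.
std::chrono::milliseconds RequestTimeout() {
    using namespace std::chrono;
    const milliseconds normalLimit{5000};                        // regular production limit
    const bool underProfiler = std::getenv("COVERAGE_PROFILING") != nullptr;
    return underProfiler ? normalLimit * 4 : normalLimit;        // be generous under the profiler
}

int main() {
    return RequestTimeout().count() > 0 ? 0 : 1;                 // trivial usage check
}
```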

Test in Many Different Environments to Reach Specific Code

The Windows operating system changes from version to version, and many projects are likely to contain code that executes only on a specific Windows version. Our own TestComplete is no exception. Here are some examples of environment-specific code that we came across when performing code coverage analysis of TestComplete:

  • Interacting with 32-bit vs. 64-bit applications from TestComplete (which itself is 32-bit) requires using different structures to pass data to and receive data from external applications because of different pointer sizes and address space arrangement in 32- and 64-bit environments.
  • Interacting with Unicode and ANSI applications. For example, using the SendMessageW function vs. SendMessageA with the corresponding string types.
  • Aero-style wizards on Windows 7 and Vista vs. classic dialogs on Windows XP.
  • Presenting operating system information, such as the full Windows version name, to the user in textual form.
  • Getting special folder paths using SHGetFolderPath on Windows XP vs. SHGetKnownFolderPath on Windows Vista and later.

Windows version-specific code presents challenges to code coverage analysis, as a test suite run on a single version of Windows simply will not reach code blocks specific to other Windows versions. This means you will have to run coverage tests in many different environments and merge the results to get the overall coverage.
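
For example, version-specific branching similar to the last bullet above might look like the simplified sketch below (an illustration, not TestComplete's actual code); each branch can only be covered on the matching Windows version:

```cpp
#include <windows.h>
#include <versionhelpers.h>
#include <shlobj.h>
#include <knownfolders.h>
#include <string>

// The Vista+ branch can only be covered when the tests run on Windows Vista
// or later, and the fallback branch only on Windows XP, so coverage results
// from both environments must be merged to see the full picture.
std::wstring GetDocumentsFolder() {
    if (IsWindowsVistaOrGreater()) {
        PWSTR path = nullptr;
        std::wstring result;
        if (SUCCEEDED(SHGetKnownFolderPath(FOLDERID_Documents, 0, nullptr, &path))) {
            result = path;
            CoTaskMemFree(path);
        }
        return result;                               // reached only on Vista and later
    }
    wchar_t buffer[MAX_PATH] = {};
    SHGetFolderPathW(nullptr, CSIDL_PERSONAL, nullptr, SHGFP_TYPE_CURRENT, buffer);
    return buffer;                                   // reached only on Windows XP runs
}

int main() {
    return GetDocumentsFolder().empty() ? 1 : 0;     // trivial usage check
}
```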

Choose Realistic Coverage Goals (Or How to Stop Worrying About 100% Coverage and Ship Code)

Some consider 100% code coverage to be the Holy Grail of development and quality metrics. However, 100% test coverage is rather difficult to achieve and is thus rare, especially in the case of large applications with complex logic. Moreover, targeting 100% coverage can actually impede development and testing productivity. Once the application's core and most frequently used functionality is covered by tests, each additional percentage point of coverage becomes hard to get because of the effort needed to reach the less commonly used areas.

Exception handling blocks are one example of code that is tricky to cover, because exceptions are not supposed to happen in normal operating conditions. You will need negative tests to cover various exceptional situations, such as missing or corrupted files and registry items, access denial, inter-process communication issues and so on. AQtime provides a Failure Emulator that helps you artificially create some of these error conditions during application runtime. Still, some exceptions can be so extremely rare and hard to emulate that there is little business value in having them covered. If you target 100% perfect coverage, you can later find yourself spending most of the development cycle creating and maintaining tests for these rare cases. Wouldn't you rather spend your development efforts on enhancing your product and adding new features that your customers need?
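
To make this concrete, here is a small hypothetical example (not TestComplete code) of an exception handler that only a deliberate negative test will ever reach:

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// The catch block runs only when the settings file is missing or unreadable,
// so a "happy path" functional test never covers it; a negative test that
// removes or locks the file is needed to reach it.
std::string LoadSettings(const std::string& path) {
    try {
        std::ifstream file(path);
        if (!file) {
            throw std::runtime_error("cannot open " + path);
        }
        return std::string(std::istreambuf_iterator<char>(file),
                           std::istreambuf_iterator<char>());
    } catch (const std::exception&) {
        return "{}";   // fall back to empty settings; covered only by a negative test
    }
}

int main() {
    // A negative test would call LoadSettings with a path known not to exist.
    return LoadSettings("missing-settings.json") == "{}" ? 0 : 1;
}
```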

That is why it is important to choose a realistic coverage goal. For example, closer to release dates you might want 80-95% coverage. However, depending on the complexity of your application, even 20% can be acceptable as an initial or intermediate goal if it is the right 20%, the part that represents the application's core functionality.

Maintain the Code Coverage Level as the Project Evolves

As the project evolves and new code is written and existing code is changed, your tests should evolve as well to keep coverage profiling meaningful. Watch your code coverage trends over time and adjust the testing process accordingly. If you notice that the coverage level tends to decrease, it is a good indication that the tests are not keeping pace with changes in the application codebase. In this case, you need to spend more time creating new tests or improving existing ones to cover the code that is being missed.

Conclusion

Code coverage is a great way to find the parts of the codebase that are being missed by tests, so that you can improve the tests. Code coverage also prompts you to review code that needs attention, which helps you identify logic flaws or unused code that can be removed.

Visit our Resource Center to learn more about Code Coverage and Profiling.