Fundamentals of Performance Profiling

Performance profilers are software development tools designed to help you analyze the performance of your applications and improve poorly performing sections of code. They measure how long a routine takes to execute, how often it is called, where it is called from, and what share of the total time at a given point in the run is spent executing that routine. If you've used a profiler in the past, you'll certainly agree that it is a wonderful asset during the development and QA process. Did you ever wonder, though, whether the results and timings produced by a Java or C# performance profiler are actually correct?

There are different ways to measure the performance of an application while it runs. Depending on the method used, profiler results will vary; this can affect your ability to optimize your projects. Profiling methods fall into two broad categories: Instrumenting and Sampling. Let's take a look at each.

Instrumentation

Instrumenting profilers insert special code at the beginning and end of each routine to record when the routine starts and when it exits. With this information, the profiler aims to measure the actual time taken by the routine on each call. This type of profiler may also record which other routines are called from a routine. It can then display the time for the entire routine and also break it down into time spent locally and time spent on each call to another routine.
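
To make the idea concrete, here is a minimal sketch in Python of what the inserted entry and exit code might record. It is not how any particular product works: the `instrument` decorator, the `stats` table, and the sample routines `load` and `parse` are all hypothetical names chosen for illustration, and the sketch ignores threads and recursion.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-routine statistics gathered by the hypothetical `instrument` decorator.
stats = defaultdict(lambda: {"calls": 0, "total": 0.0, "children": 0.0})
_stack = []  # names of instrumented routines currently executing


def instrument(func):
    """Wrap func with entry and exit probes, as an instrumenting profiler would."""
    name = func.__name__

    @wraps(func)
    def wrapper(*args, **kwargs):
        _stack.append(name)
        start = time.perf_counter()  # entry probe
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start  # exit probe
            _stack.pop()
            entry = stats[name]
            entry["calls"] += 1
            entry["total"] += elapsed
            if _stack:
                # Charge this call to the calling routine's child time.
                stats[_stack[-1]]["children"] += elapsed

    return wrapper


def local_time(name):
    """Time spent in the routine's own code, excluding instrumented callees."""
    return stats[name]["total"] - stats[name]["children"]


@instrument
def parse(n):
    return sum(i * i for i in range(n))


@instrument
def load():
    return [parse(20_000) for _ in range(5)]


load()
for routine, s in stats.items():
    print(f"{routine}: calls={s['calls']} total={s['total']:.6f}s "
          f"local={local_time(routine):.6f}s")
```

Here `load` shows a large total time but a small local time, because nearly all of its time is spent in calls to `parse`. That is the breakdown described above.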

Currently, two types of instrumenting profilers are available on the market: source-code modifying and binary.

Source-code modifying profilers create several problems. They tend to conflict with source code control systems. They do not always reliably parse the source they are supposed to instrument. In fact, since adding and removing instrumentation can be a bit tricky, source-code modifying profilers often suggest that users work with a copy of the project source to avoid possible corruption.

Also, at best these profilers can only insert their instrumenting code at the start of a procedure in source. At this point, procedure setup has already run (for stack frames, local variables, parameters). In small procedures, setup can be a significant portion of execution time. Yet, there is no way to time the setup itself using a source-code modifying profiler.

Binary profilers (or hierarchical profilers) work strictly at runtime. They insert their instrumentation directly into an application's executable code once it is loaded in memory. Source code is not required in any way, and thus there is no risk of corrupting it. Since a binary profiler works anew on each execution, it is also very easy to find some slow code on one execution, try an improvement in source, recompile and test again. Incremental optimization is supported in real time.

A binary profiler also inserts its instrumentation at the very first machine instruction of each routine. This ensures that routine setup is counted in the timing.

Pitfalls of Instrumentation

The timer calls that an instrumenting profiler inserts at the start and end of each profiled routine take some time themselves. To account for this, instrumenting profilers calibrate themselves: at the start of each run they measure the overhead incurred by the instrumentation, and they later subtract this overhead from performance measurements. This usually works out very well.
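
The calibration step can be sketched roughly as follows. This is an illustration under assumptions, not a description of any specific profiler: `measure_probe_overhead` and `timed_call` are hypothetical helpers, and taking the minimum over many trials is just one plausible way to estimate the cost of an empty start/stop pair.

```python
import time


def measure_probe_overhead(trials=100_000):
    """Estimate the cost of one start/stop timer pair with nothing in between."""
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter_ns()
        stop = time.perf_counter_ns()
        best = min(best, stop - start)
    # Nanoseconds. Using the minimum (an assumption of this sketch) filters
    # out trials that were disturbed by the scheduler.
    return best


def timed_call(func, overhead_ns):
    """Time one call, then subtract the calibrated probe overhead."""
    start = time.perf_counter_ns()
    func()
    stop = time.perf_counter_ns()
    raw = stop - start
    return raw, max(raw - overhead_ns, 0)


def small_routine():
    return 2 + 2


overhead = measure_probe_overhead()
raw, corrected = timed_call(small_routine, overhead)
print(f"probe overhead ~{overhead} ns, raw {raw} ns, corrected {corrected} ns")
```

For a routine as short as `small_routine`, the overhead can be a large part of the raw figure, which is why the subtraction matters.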

However, when a routine is very short, another effect of the instrumentation becomes important. Modern processors depend heavily on order of execution for branch prediction and other CPU optimizations. Inevitably, inserting a timing operation at the start and end of a very small routine disturbs the way it would execute in the CPU without the timing calls. If you have a small routine that is called millions of times, an instrumenting profiler will not yield an accurate time comparison between this routine and larger routines. If you ignore this, you may spend a great deal of effort optimizing routines that are not the real bottlenecks.

Sampling

To help address the limitations of instrumenting profilers, sampling profilers let applications run without any runtime modifications. Nothing is inserted, order of execution is not affected, and all profiling work is done outside the application’s process.

The operating system interrupts the CPU at regular intervals (time slices) to execute process switches. At that point, a sampling profiler will record the currently executing instruction (the execution point) for the application it is profiling. This is as short an operation as can possibly be implemented: the contents of one CPU register, the instruction pointer, are copied to memory. Using debug information linked into the application's executable, the profiler later correlates the recorded execution points with the routine and source code line they belong to. What the profiling finally yields is the frequency with which a given routine or source line was executing during a given period of the application's run, or over the entire run.
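
The correlation step can be pictured with a small Python sketch. Everything in it is made up for illustration: the `SYMBOLS` table stands in for real debug information, the addresses and routine names are invented, and `resolve` and `histogram` are hypothetical helpers rather than any profiler's API.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, routine name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1400, "parse_input"),
    (0x1900, "compute_checksum"),
    (0x2000, "write_output"),
]
IMAGE_END = 0x2400  # first address past the last routine
STARTS = [start for start, _ in SYMBOLS]


def resolve(address):
    """Map a sampled instruction address to the routine that contains it."""
    if not STARTS[0] <= address < IMAGE_END:
        return "<unknown>"
    # The containing routine is the last one that starts at or before address.
    return SYMBOLS[bisect_right(STARTS, address) - 1][1]


def histogram(samples):
    """Return each routine's share of the recorded samples."""
    counts = Counter(resolve(addr) for addr in samples)
    total = sum(counts.values())
    return {name: hits / total for name, hits in counts.most_common()}


# Made-up execution points, one per sampling interrupt.
samples = [0x1004, 0x1910, 0x1A20, 0x1410, 0x19F8, 0x1BB0, 0x2008, 0x1930]
for name, share in histogram(samples).items():
    print(f"{name:18s} {share:.1%}")
```

With these invented samples, five of the eight execution points fall inside `compute_checksum`, so it is reported as executing 62.5% of the time.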

As the profiler operation is executed less often, and as it is so much simpler than a time measurement, the overhead is negligible, and the application runs practically at its real speed.

A sampling profiler is the perfect tool to isolate small, often-called routines that cause bottlenecks in program execution. The downside is that its evaluations of time spent are approximations. A fast routine may, by coincidence, happen to be executing at many of the sampling interrupts and so appear slower than it really is. To make sure a given routine is slow, it is recommended that you run the application through the sampling profiler twice.
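
A toy simulation shows how this coincidence can arise. The workload below is entirely synthetic and the numbers are assumptions chosen to make the effect obvious: a routine called `fast` runs for 1 ms out of every 10 ms cycle, so its true share is 10%, and `sample_share` is a hypothetical helper that mimics a timer firing at a fixed interval.

```python
def routine_at(t_us):
    """Synthetic workload: each 10 ms cycle runs `fast` for 1 ms, then `slow` for 9 ms."""
    return "fast" if t_us % 10_000 < 1_000 else "slow"


def sample_share(routine, interval_us, phase_us, n_samples=10_000):
    """Fraction of samples landing in `routine` for a given interval and start offset."""
    hits = sum(
        routine_at(phase_us + i * interval_us) == routine for i in range(n_samples)
    )
    return hits / n_samples


# A sampling interval equal to the cycle length sees the same point of every
# cycle, so the estimate depends entirely on where the first sample falls.
run_1 = sample_share("fast", interval_us=10_000, phase_us=500)
run_2 = sample_share("fast", interval_us=10_000, phase_us=5_000)

# An interval that does not line up with the cycle drifts across it instead.
run_3 = sample_share("fast", interval_us=10_007, phase_us=500)

print(f"run 1: {run_1:.1%}  run 2: {run_2:.1%}  run 3: {run_3:.1%}")
```

In the first run every sample lands in `fast`, in the second none do, and only the third comes close to the true 10%. Real workloads are rarely this regular, but the disagreement between the first two runs is exactly what a second profiling pass is meant to expose.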

Another limitation of sampling is that it only tells you which routine is currently executing, not where it was called from. A sampling profiler cannot give you a parent-child call trace of your application. Nor can it show you that a routine is actually running slowly when the time is spent not in its own code but in the routines it calls, either because it makes many calls or because the called routines are slow. By contrast, the code that an instrumenting profiler inserts at the beginning of each routine traces out and records which other routine it is being called from.

How To Ensure Proper Performance Analysis

There are many other differences between profiling methods. But our point is already made: not all profilers are alike. Each has strengths and weaknesses, and each is properly applied only to specific aspects of application testing during development. Read how to select the best profiling tool to get a better grasp on a few different tools. To accurately and successfully isolate bottlenecks in your code, you must use a combination of profilers.

To learn more about Performance Profiling, visit our Code Profiling Resource Center.