code review
review tool software inspection
code review tool blog site map contact us
Sales/Support: (978) 236-7860
Free book about peer code review
Free webinars about code review

Collaboratorallowed us to improve our development process and share the knowledge from code reviews with the whole team.

Jagdev Dhillon
Senior Director of Engineering
GoldenGate Software
read more

NullSafe Patent-Pending Technology

Smart Bear has developed a unique, patent-pending algorithm to perform the complex task of real-world NullPointerExecption detection.

Why finding NullPointerExceptions is hard

Consider the following Java code snippet:

	char getFirstChar( String s )
	{
		if ( s.length() == 0 )
			return ' ';
		return s.charAt( 0 );
	}
	

The intent is to return the first character of a string, or a space character if the string is empty. Of course if s is null, this routine will throw a NullPointerException when the JVM tries to evalute the length of the string.

In this example, static analysis should be able to identify this possible bug without difficulty. So the user fixes the bug in the obvious way:

	char getFirstChar( String s )
	{
		if ( s == null )
			return ' ';
		if ( s.length() == 0 )
			return ' ';
		return s.charAt( 0 );
	}
	

A simple code-path analysis should now be able to detect that the s.length() and s.charAt() expressions will never be evaluated if s is null. Again, no trouble.

However this is not always the way such a bug is fixed. In fact, in this example many Java developers would reailze that the Apache Jakarta Commons library has a routine that better fits the purpose at hand. (Lest the reader believes this is a contrived example, in a scan of the Apache Tomcat project (possibly the most-used servlet container in the world) we found thousands of references to StringUtils and other methods in the Apache Commons library.)

	import org.apache.commons.StringUtils;
	char getFirstChar( String s )
	{
		if ( StringUtils.isEmpty( s ) )
			return ' ';
		return s.charAt( 0 );
	}
	

Now the job of the static analysis engine gets harder. In order for the code-path analysis to determine that s.charAt() is not evaluated when s is null, it has to know that StringUtils.isEmpty() always returns true if its argument is null.

This adds complexity in several ways. First, function calls have to be evaluted recursively. From a code-path point of view, the analysis has to "go into" StringUtils.isEmpty() and understand its behavior.

Second, the result of the StringUtils.isEmpty() function depends on its input. That means you can't just scan StringUtils.isEmpty() once and cache the results — the results are different depending on the invocation.

Third, the code-path analysis must handle multiple data types. It has to know that if an object-type input value is null, the boolean-type output value is false, whereas if the object-type input is non-null, the output value can be either true or false. With boolean and object types there is (only!) a power-of-two explosion in the number of combinations of input/output values; with integer types it's ridiculous.

It gets worse. Consider this:

	import org.apache.commons.StringUtils;
	char getFirstChar( String s )
	{
		int n;

		if ( StringUtils.isEmpty( s ) )
			n = 0;
		else
			n = s.length();
		bar();   /* always execute bar() */
		if ( n >= 1 )
			return s.charAt( 0 );
		return ' ';
	}
	

Several new concepts are at play. First, we're transferring information about the possible null-ness of s to another variable n. In particular, n is zero if s is null. But it doesn't go backwards — if later in the routine you know that n is zero that could mean either that s is null or that s is non-null and empty.

The analysis also has to know that if StringUtils.isEmpty() returns false, then s is necessarily non-null. It must understand this so that it knows that the s.length() in the second part of the conditional is safe.

Finally, the analysis needs to undestand that the check for n >= 1 is also sufficient to know that s is non-null.

The difficulties continue. Functions have to be followed recursively resulting in exponential growth in the amount of input/output information that has to be understood. Every time there is a conditional or a loop inside a function the possible number of code paths increases exponentially.

Prior Work

Previous attempts at making Java code null-safe have, of course, failed. We say "of course" because none of us are currently using any tool that finds the types of errors described above. NullSafe is the first one.

Why have previous attempts failed?

Previous attempts at NPE-detection have fallen victim to the kinds of complexity problems described in the previous section. No matter how sophisticated the code-path analysis or theorem-proving engine, no other system has been successfully applied to real-world code bases.

Various short-cuts have been suggested that could make the problem of NPE-detection tractable. Our claim is that each of these proposals require trade-offs unacceptable to most development groups.

Trade-Off #1: Source code annotation

"Source code annotation" requires the developer to specify method input/output contracts. Usually these go in JavaDoc so the static analysis rules are included in the normal documentation of the method.

In the example above, the developer might specify that "s cannot be null." Then within the implementation of getFirstChar() the static analysis engine can just assume s is non-null, and complain only of other code calls getFirstChar() and passes in a possibly-null argument.

The situation is more difficult with StringUtils.isEmpty() because it can take a null argument but output depends on this input. The source code annotation system therefore needs a logic language to express such things; this is where things can get confusing and difficult for devleopers to write.

The biggest problem with this short-cut is that existing code bases are not already documented in this fashion. The task of annotating everything is daunting. Imagine annotating everything in Sun's standard Java library! And what about 3rd-party libraries that aren't annotated? What if you don't even have the source code to those libraries?

(NullSafe does support JavaDoc annotations, but this is a tool for developers to accurate describe their node and are specifically not required for the tool to work.)

Trade-Off #2: Single-method analysis

This short-cut is simply: Don't scan method-calls recursively. Although this makes the problem much more tractible, a huge number of false-positives result. In the Apache Jakarta Tomcat code base for example, thousands of false positives would be identified. (We've tried!) "Fixing" the false-positives means you're adding a bunch of unnecessary code and precludes the use of utility classes like StringUtils whose very purpose is to provide null-safe wrapper routines.

The Smart Bear NullSafe Patent

We have devised a way to scan code deeply and accurately enough to solve all the types of problems described in the first section, an in-memory and on-disk caching mechanism that stores enough information to be valuable but not so much that the cache becomes too large, and do it all while keeping exponential explosion of code paths and analysis in check.

We cannot (yet!) divulge all the details of our methods because of the current stage of the patent. Once the patent because published, we will link to it and describe the method here in plain English.