The Healthy Fear Associated With Legacy Code
The life of a developer would be rainbows and unicorns if all we did was add new code to code bases. Imagine a world without maintenance programming, debugging, and scratching your head while squinting at confusing existing code. It'd be pretty nice, right?
Sadly, we don't live in that world. Most of our efforts in software development involve a blend of new and old code. We write some new code, stuff it into some existing code, and then try to figure out how the two things will behave together in production. Consequently, both writing and reviewing code necessarily involve a kind of constant, subconscious risk management. "Hmm... should we really touch this code?"
There's rarely a set of explicit heuristics that guides this decision; it tends to be a matter of feel. It'd be nice if there were a way to be a bit more deliberate about it.
Understanding Legacy Code
"Legacy code" is a rather nebulous term. Even the Wikipedia entry offers multiple possible meanings for legacy code:
- Code that relates to a no-longer-supported hardware or software dependency
- Code inherited from someone else
- Code that's part of an older version of the software
- Code that isn't covered by automated unit tests
When I think about what pops into my head when someone says "legacy code," it wouldn't surprise me if any of these descriptions were true. I could imagine any or all of them being applicable. But for me, legacy code really translates to "code you're afraid to touch."
Any of the things mentioned above would make you leery of code. If the code works on outdated dependencies, how do you test to see if your changes work? If it's inherited from someone else, did they know something that you don't? If the code is from some outdated version of the software, who will be affected by your changes, and how? And if there are no automated tests, how do you know you're not breaking things?
All of these questions will tend to float half-formed through your head as you weed into existing code and tweak it. And probably all of these questions drift through your mind when you're reviewing someone else's code. Since your primary purpose is checking rather than modifying, if you're going to be reviewing code, you need to understand how to identify legacy code—code that is risky (scary) to change. This is doubly true if you're evaluating the code of less experienced folks who perhaps haven't yet known the horror of making some tiny change to existing code and introducing a horrendous, counter-intuitive bug.
Recognizing the Signs
When should you be nervous? What does dangerous-to-change code look like? What properties does it have? These are the important questions to answer when evaluating whether you should sign off on a code change or whether you should suggest that your peer find a way to accomplish her task in a way that doesn't involve touching the code in question.
Here are some heuristics. (The following are signs that extra caution is warranted with the code that you’re reviewing.)
The Obvious Stuff
Before we delve into nuance and code-specific properties, let's get some obvious signs of dealing with legacy code out of the way. You can audit the source control history of the code base, asking around and finding people familiar with its history, to learn a number of things. If no one has touched the file(s) in question for years, there's a heightened element of risk. If the people who wrote the code are no longer with the group or even the company, there's danger as well. And, of course, if no one, including the original authors, can easily articulate what the code is supposed to do or why it was written, you're on thin ice when you change it.
Another obvious consideration is the one addressed by Michael Feathers' definition of legacy code: code not covered by unit tests. Do you have any way of knowing if the code will continue behaving as expected after the changes? Automated tests are probably the best way to answer this question, but comments, documentation, or manual regression test plans and results can also help. When you look at the code, see if you have any way to verify whether the changes aren't having unanticipated effects.
Fan-in is a measure of how many elements in a codebase depend on a particular element. Think of a method in your codebase that was called in 400 different places. This method would have a very high degree of fan-in. High fan-in makes code riskier to change because there are so many different sources using it in so many potentially different ways. If the code you're looking at has high fan-in, proceed with caution.
Global State and Hidden Dependencies
Another property of code that should scare you is the interaction with hidden dependencies, especially global state. Global variables, by their nature, can be accessed and mutated anywhere in the code base at any time. As a result, any code that deals with such variables is part of a complex interleaving of logic that is non-obvious to someone touching that code. Modifying code like this can have ripple effects throughout the entire codebase and should be approached with extreme caution.
Global state tends to be the most dangerous dependency, but there are other, subtler things to beware of as well. The modified code may be collaborating with other code in hidden ways as well. This can happen with runtime binding schemes or side effects not obvious from method and class signatures. Does a method called "CalculateAverage" randomly write something to a database? Code with unpredictable behaviors like that is best approached with a metaphorical hazmat suit.
The final heuristic that I'll offer is this: be leery of modifications to methods that are complex. Complexity can come in various forms, but there are two particularly obvious and telling signs to note. The first is the size of the method. If methods are hundreds or thousands of lines long, they're doing a lot of things—more things than a human can keep straight in his/her head. The second sign of complexity is the number of control flow statements or paths through the code. This is known as cyclomatic complexity. If there are dozens or hundreds of paths through the code, odds are that you're not covering all test scenarios after you've modified the code.
Proceed with Caution
Learning when to have a healthy fear of old code is a skill that will serve you well, both as a reviewer and while writing code. If you develop a deliberate approach to deciding whether or not to touch existing code, you'll wind up spending fewer late nights chasing down arcane bugs. And even when you have no choice but to touch it, you'll better understand the risks and be able to allocate more time for regression testing. Until you find yourself in the land of rainbows and unicorns (only writing new code), recognizing legacy code will at least make your life a little better.