Unit tests are a popular, even indispensable, part of modern software development, but they didn’t exist until the 1990s, and certainly weren’t indispensable until much later. That means a lot of code was written in the pre-unit-test era, sometimes called “PU.”
Programmers disagree on how (or even if) to add code coverage to a legacy application, but one thing they agree on is that it is expensive. We thought it would be best to invite two experts, Matt Heusser and Bob Reselman, with two different perspectives, to present ideas on the value of code coverage for legacy applications, and let you decide what makes sense for you.
So good luck, and come out swinging. I mean, um, here’s to hoping we all learn something.
About Our Contenders
Bob Reselman is a nationally known technologist, technology writer and software developer. He has written numerous books and articles on computer programming, project management and technical process. He has held positions with the consulting company Cap Gemini, the computer manufacturer Gateway, and the automotive web site Edmunds.com.
Matt Heusser is the principal consultant at Excelon Development, where he advises teams on technology projects. Primarily known for his writing, Matt was the lead organizer for the workshop on technical debt and is the co-author of Save Our Scrum. He served a term on the board of directors of the Association for Software Testing, has two technology degrees, and spent a couple of years teaching information systems at night for Calvin College.
Increasing Code Coverage on Legacy Apps — Pro or Con?
Bob: I am a big fan of Test Driven Development. I live it; I breathe it. When I am writing new code, I always go for 100% coverage. And, for organizations I’ve managed, I’ve set the standard at 80%. However, when it comes to legacy apps, my thinking is that that boat has sailed. As long as the legacy code is working, going back to increase code coverage numbers is a waste of time and money.
It was a hard pill for me to swallow.
Matt: I’ve certainly seen code coverage efforts fail, Bob - at one company I worked with, the outsourced programmers just called every method, wrapped in a handle-all-exceptions block - the “unit tests” didn’t do anything! Even if they aren’t cheating, adding new automated checks to legacy code is a slog, and it can be demoralizing; on a ten-million-line system, it might take intense effort to move the needle from 0% to 0.5%. But that doesn’t mean it is impossible, or that we should give up on trying to increase coverage for legacy applications.
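To make the “call every method and swallow every exception” trick concrete, here is a minimal sketch in Python. The function `apply_discount` and both test classes are invented for illustration; they are not from the company Matt describes. The first test executes the code, so it raises the coverage number, yet it cannot fail. The second asserts on a result.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical production function, invented for this illustration."""
    return price - price * percent / 100


class CoverageOnlyTest(unittest.TestCase):
    def test_apply_discount(self):
        # Runs the code, so the lines count as covered, but the test can
        # never fail: every exception is swallowed and nothing is asserted.
        try:
            apply_discount(200, 10)
        except Exception:
            pass


class BehaviorTest(unittest.TestCase):
    def test_apply_discount(self):
        # Checks an actual result, so a regression turns this test red.
        self.assertEqual(apply_discount(200, 10), 180)
```

Both tests report the same coverage for `apply_discount`; only the second one tells you anything when the code changes.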
Bob: What you describe is not unit testing under Test Driven Development. Seems more like unit testing under Organizational Insanity.
Matt: Oh, I’m sure that’s not TDD, and it isn’t what I recommend, either. Instead, I’d say that is the strawman I encounter the most when people say “We tried code coverage and it didn’t work.” Another problem I see is speed of the test/deploy process; it should only take a few minutes.
Bob: Implementing a feasible Test Driven Development process cannot be done in minutes. In my experience transforming a shop into one in which TDD is part of the development culture takes months, maybe a year.
Matt: Let me just interrupt there, Bob - we agree on this! By test/deploy pipeline, I don’t mean how long it takes for the team to learn the skills. I mean the actual process to move from committed code to a finished build with automated checks complete. In other words, how long did the build software take to run, and did it run unit tests automatically? If that can run in minutes, we can have a tight feedback loop. One common problem with legacy applications is that the only way to explore and test the software is at the highest levels. To run the checks, we need to do a full build, deploy a webserver, do a database refresh … if the feedback loop doesn’t take hours yet, it soon will. When programmers get failures, they don’t trust them, or the code changed. They want to re-run everything and now we are two days behind and have to debug. It’s a pain. Too many organizations feel that pain and give up.
Bob: It sounds to me like they are trying to run integration or system tests and calling them unit tests. Neither of those is really TDD, which is a discipline. It’s fantastic for new code, but trying to add it on top of old code changes the very paradigm the work was done under. And we’re assuming that the legacy code is testable.
The legacy application - one that worked, and that is generating enough money for the company to keep working - was built under some different assumptions. If we say we should have one line of test code for every ten lines of production code, which is, by the way, conservative, then a 10-million-line application needs a million lines of test code to be covered. Doing that while maintaining things, and changing the architecture to support it? You’d be better off leaving well enough alone.
Maybe you can find some efficiency applying unit tests or higher-level testing to machine-to-machine code, code in which one machine talks to another. That flavor of code tends to have abstract interfaces that are testable, maybe beyond the simple “here is a value x, make sure I get back value y” testing. But let’s look at the bane of most testers, Graphical User Interface testing.
A while back I was directing a company that had an application in which the DOM was generated automagically, based on the URL and parameters defined in the query string. It turned out that the IDs of the page elements were not following a predictable convention because each developer, over the years that the code was being developed, applied his/her own convention. Thus, when we recorded a simple Selenium script of a page flow process, we found the script was only applicable to that one process, of which there were thousands of possible permutations. After a week of trying to apply tests, we threw in the towel. We estimated that it would take 2 developers at least 6 months to come up with scripts that just ran, forget adequate coverage.
Do I wish that the UI code was self-enforcing, making it so that the “next” developer could not veer away from a pre-existing convention? Yes. Do I wish that there was some sort of UI unit testing in force so that it caused failure when the developer went off the reservation? Yes. But such issues are more about good design. Well-designed software is testable, always!
Matt: I think we can agree that retrofitting unit tests on top of a legacy system might not be the place to start. On most legacy systems I meet up with, someone has to check out the code manually, get it into an IDE, press F5, copy some files to a network drive, maybe bounce a web server by hand. If you don’t have Continuous Integration running, the programmers will make changes that break each other’s unit tests and it’ll end up a mess. So Unit Test coverage might not be the first modernization improvement for a legacy team to tackle. We might agree there. But let’s get past that. Did you get complaints that the code didn’t support unit tests, that unit testing was impossible, except at the highest levels?
Bob: Any code at the class level and maybe at the very distinct component level is unit testable. Of course, sometimes mocks and fakes may be required to emulate external dependencies. If the code cannot be unit tested, logic dictates that it is not a unit.
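As a rough illustration of a fake standing in for an external dependency, consider the sketch below. Everything in it is hypothetical: `InvoiceCalculator`, `FakeRateService`, and the exchange-rate scenario are invented for the example. It also assumes the class under test receives its dependency from outside, which is exactly what much legacy code does not do.

```python
class FakeRateService:
    """Hand-written fake for an external exchange-rate service (hypothetical)."""

    def __init__(self, rates):
        self._rates = rates

    def get_rate(self, currency):
        # Answers from an in-memory dict instead of calling out over the network.
        return self._rates[currency]


class InvoiceCalculator:
    """Hypothetical unit under test; its dependency is passed in."""

    def __init__(self, rate_service):
        self._rate_service = rate_service

    def total_in(self, currency, amount_usd):
        return round(amount_usd * self._rate_service.get_rate(currency), 2)


def test_total_uses_rate_from_service():
    # The rate is fixed by the fake, so the expected total is known in advance.
    calculator = InvoiceCalculator(FakeRateService({"EUR": 0.5}))
    assert calculator.total_in("EUR", 100) == 50.0
```

The test needs no network and no database, which is what keeps it in unit-test territory.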
Matt: In my time in legacy software, I have seen “Master” class objects that contained all the data. These had to be created and populated for the rest of the code to do anything. To do that you need to have a database connection. I’ve seen architectures where the code was designed as a monolith - a single jar file - where the classes were intermixed. No one had ever thought of creating a class in isolation, so they are hard to create in isolation, which makes unit tests “hard” or “impossible.”
Bob: Oh sure. The design of those legacy applications doesn’t support TDD, Matt. You’re helping my argument here.
Matt: Wait a minute. If the code isn’t made of isolated components, that’s a design problem. That it is hard to unit test is a symptom of something greater. So we need to teach the team how to do good design, then give them tools to make the code cleaner over time. Do you disagree?
Bob: So now you want to redesign a system whose work-life is measured in person-decades. That’s a lot of extra work for a system that is working fine now, and has for years. Why not just plan to sunset the system? Jeepers Matt, even Microsoft realized that there was only so far that they could go with the original version of Windows, before they had to gut the whole thing. Let a dying system die.
Matt: Agreed that progress on legacy systems slows down until you want to throw them away. I’d hope that by adding coverage, we could prevent the crisis where you decide to scrap and rewrite.
Back around 2008 I organized the first workshop on technical debt, and we talked about the Boy Scout Rule -- this idea that you leave a campsite in better condition than you found it. When I think about the legacy work I have done, it is often adding a new usertype, or a new field -- that sort of work. We’re typically in one screen at a time. That screen is a mess; the business logic is all tied in to the user interface which is mixed in with the database. Instead of trying to fix everything, I suggest the Boy Scout Rule. Pull out the code that you are working on, extract it to a class or method, then unit test that method. If everyone does something like that when they touch the code, eventually the pieces of the code that change the most get unit test coverage. I’d rather have that than some arbitrary percentage, wouldn’t you?
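A small sketch of what that extraction might look like, under invented assumptions: the screen, the `shipping_cost` rule, its numbers, and the `orders` table are all hypothetical. The idea is only that the business rule moves into a function with no UI or database in it, and that function gets the unit test.

```python
def shipping_cost(weight_kg, is_express):
    """Business rule pulled out of a (hypothetical) screen handler.

    The formula and numbers are invented for illustration.
    """
    base = 5.0 + 1.5 * weight_kg
    return base * 2 if is_express else base


def on_save_clicked(form, db):
    """What is left of the legacy handler: UI and database plumbing only."""
    cost = shipping_cost(float(form["weight"]), form["express"] == "yes")
    db.execute(
        "UPDATE orders SET shipping = ? WHERE id = ?",
        (cost, form["order_id"]),
    )


def test_shipping_cost():
    # The extracted rule can be checked with plain values.
    assert shipping_cost(2, is_express=False) == 8.0
    assert shipping_cost(2, is_express=True) == 16.0
```

The handler itself stays untested here; only the piece that was touched and extracted gains coverage.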
Bob: Yes, your thinking is wise. Of course, we want to make it better. And, because we are writing new code, we subject that code to unit testing. However, we do not want to waste time writing unit tests around that ugly code just to get a pretty code coverage number. It does not make economic sense. The legacy code is out there, it has been working despite its warts. So, what is the benefit to dedicating labor to affirm something we sorta know anyway?
Matt: I can agree that the Scout Rule isn’t enough; you end up with this hub of untested code surrounded by these spokes of functionality. You need a vision for the design to guide toward. Two that I’m familiar with are the facade pattern and the strangler pattern. First we hide the old code behind a facade - say exposing it as a web service. Any new code needs to go on the front side of the facade. So we need to create a clear separation between old code, which is okay to stay untested, and new code, which needs to be covered by automated checks.
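Here is a minimal in-process sketch of that separation. Matt’s example exposes the old code as a web service; this version uses a plain class as the facade to keep the example short, and every name in it (`legacy_calc_price`, `PricingFacade`, `quote_with_tax`) is hypothetical.

```python
def legacy_calc_price(cust_type, qty, flags):
    """Stand-in for old, untested code that stays as it is (hypothetical)."""
    price = qty * 10
    if cust_type == "W":
        price = price * 0.8
    return price


class PricingFacade:
    """The only doorway new code may use to reach the legacy pricing logic."""

    def quote(self, customer_type, quantity):
        # Translate the clean interface into the legacy calling convention.
        code = "W" if customer_type == "wholesale" else "R"
        return legacy_calc_price(code, quantity, flags=None)


def quote_with_tax(facade, customer_type, quantity, tax_rate):
    """New code sits in front of the facade and is covered by checks."""
    return round(facade.quote(customer_type, quantity) * (1 + tax_rate), 2)


def test_quote_with_tax():
    assert quote_with_tax(PricingFacade(), "wholesale", 10, 0.25) == 100.0
```

Because new code only talks to `PricingFacade`, the legacy function behind it could later be replaced piece by piece, which is the strangler idea.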
Bob: If our marching orders are to create a Web Service using as much legacy code as possible, then yes, using a façade pattern makes a lot of sense. However, that we are creating a Web Service means we are, by definition, beyond the scope of unit testing. Web Services need to be subject to a broader regimen of testing, load and scale testing, for example. At this level of testing, even considering applying the discipline of unit testing is a bit comical. I mean, let’s say we have a Web Service that consumes a subordinate web service outside the domain. There is no way we’re going to get a realistic coverage number. And, as far as 100% passing goes, we’re talking about 100% passing tests under what conditions?
Matt: Exposing the old code as a service at least makes it possible for us to put integration tests around it, which is better than we had before! And we could instrument those and look at coverage.
I grant that the time certainly is an investment. It slows you down at first and requires skills development. The quick and the easy way is to just add the new column. But that leaves you with each change taking a little longer to get to production, with a little more risk -- and each person saying it isn’t their fault, they just added an “if statement and a new parameter”. I look at that system of effects and I think, no thanks.
It’s a big hill, but I think one worth climbing. And I tend to focus more on the climb - on what we are doing now and improving - instead of our ‘failure’ at the end of year to hit some arbitrary amount of coverage. I think we can agree on that. Knowing that we have made some improvement, having an understanding of what parts of the code do have unit tests -- I see that as a very good thing. Eventually your old code, behind the facade, is much smaller than the new. You might only measure coverage on the new.
Bob: So you just said you’re not adding old code to the legacy app. That’s my position. Also, you forgot to mention the nemesis of redirection. How many times do you have to redirect program flow away from bad software to better software before it becomes absurd? 10 redirects? 20 redirects? 100 redirects? You want to maintain code with 100 avenues of band-aid redirection?
Matt: You can build an interface so everything goes through one redirect. If you have to, use a switch statement. I want to change the nature of the system, so over time the legacy app becomes a smaller and smaller codebase. In some cases, it can ‘go away’ entirely. This isn’t black and white - there is room for gray.
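One way to picture “everything goes through one redirect” is a single dispatch function. The sketch below is an assumption about what Matt means, not his design, and the handler names are invented. A dictionary lookup plays the role of the switch statement.

```python
def legacy_handle(request):
    """Stand-in for the old application's entry point (hypothetical)."""
    return f"legacy:{request['action']}"


def new_create_invoice(request):
    """A handler that has already been rewritten and unit tested (hypothetical)."""
    return "new:create_invoice"


# Actions that have been migrated; everything else still goes to the legacy app.
MIGRATED = {
    "create_invoice": new_create_invoice,
}


def dispatch(request):
    """The single redirect: callers never choose between old and new code."""
    handler = MIGRATED.get(request["action"], legacy_handle)
    return handler(request)
```

For example, `dispatch({"action": "create_invoice"})` reaches the new handler, while `dispatch({"action": "print_report"})` falls through to the legacy code. Migrating another action means adding one entry to `MIGRATED`, not another redirect.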
Bob: I’m talking black and white because of the skill sets of the staff. What you described takes nuance, it takes skills and craft - skills that many technical teams simply don’t have.
Matt: Agreed. So get ‘em some books. Conduct some experiments. Do some pair - and even mob - programming. Bring in some guest lectures. Start a book club. Send them to training. Shake things up!
Bob: Well, I admire your spunk, Matt. But, don’t get me started on “send ‘em to training”, which I also consider to be a waste of time. Education, yes; training can best be done on one’s own time.
Of course, we should be promoting better practices all the way around: good design, good coding practices as well as good testing practices. But if you think that merely having tests running against legacy code is going to make things better, you are mistaken. I can see how under the right conditions you might turn a legacy project around. But certainly that isn’t all projects!
Matt: You’re right there, Bob. There are some teams where people don’t read, where management is uninterested in anything but tomorrow’s deadline, where no one has the experience or fortitude to see the world as anything but what it is. Those are the kind of folks who don’t read blogs, but in the odd case one does, I’d suggest the old Irish proverb: If you want to get there, I wouldn’t start from here!
Handling Legacy Code
Unit testing is a fundamental tenet of the practice of Test Driven Development. As a result, the typical reaction by managers pursuing the DevOps way of life is to have development staff go back into the codebase and write unit tests.