Programmers, like testers, are often organized into specialties: front end, back end/server, and ops. Each person does their part: one builds the user interface, another builds infrastructure on the server, someone else adds a few automated checks to run along the build, and the software comes together. Experience shows that the intersections of those specialties are where people miscommunicate, make assumptions, or are unclear on responsibilities, exactly the kind of place where bugs arise.
One of the questions that comes up at those intersections is integration testing: how do we test the product to find problems that only show up when all the pieces are combined and in the right place?
Putting it together: What is Integration Testing?
Let's start by saying that integration testing is looking for information on how two systems work together. That's a little vague, right? A system could be nearly anything: an API sending and retrieving data from a database, or entire software products working together, like TweetDeck and Twitter. An integration test can be performed anywhere there is a coupling between two software systems.
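To make that concrete, here is a minimal sketch (in Python) of a test at an API-to-database coupling. The endpoint, payload, table, and file names are hypothetical stand-ins; the point is that the test drives one system and verifies the result in the other.

    import sqlite3
    import requests

    API_BASE = "http://localhost:8000"  # hypothetical service under test
    DB_PATH = "app.db"                  # hypothetical backing database

    def test_created_record_reaches_database():
        # Drive the first system: create a record through the public API.
        response = requests.post(f"{API_BASE}/tweets", json={"text": "hello, integration"})
        assert response.status_code == 201
        tweet_id = response.json()["id"]

        # Inspect the second system: confirm the record landed in storage.
        with sqlite3.connect(DB_PATH) as conn:
            row = conn.execute("SELECT text FROM tweets WHERE id = ?", (tweet_id,)).fetchone()

        assert row is not None, "record never reached the database"
        assert row[0] == "hello, integration"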
When that coupling is broken, software becomes a black hole. Customers can submit forms and send data all they want, but it doesn't do any good. FedEx doesn't know that Amazon has packages ready to be shipped from the warehouse to customers who have already paid. Insurance companies can't make decisions about which parts of a healthcare bill they should cover for their customers. And, more importantly, people might not be able to share this article on social media.
It is easy for a tester to develop a sort of blindness during a release cycle. New features come in one at a time, and sometimes one piece at a time, so testing happens at a granular level. First the API gets tested to see how data is created, updated, and deleted, and to get a rough sense of how information flows through the product. Eventually the user interface is wired up, and the tester takes a look at look and feel, usability, and probably does some cursory testing around submitting forms. In all of this activity, we forget about the bigger picture: how the features work together to make a product, and even more, how that product will work with other software in the wild.
Sometimes, doing integration testing means taking a step back and thinking about how things work together. Instead of thinking in terms of functions like login and post tweet, there has to be a shift toward scenarios a person would actually perform. What happens if someone wants to take a picture, post it in a tweet with some text, and then send the link to a friend for some retweet love? And what happens when one of those several steps goes wrong?
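One way to express that shift is a scenario-style test that strings several features together. Everything about the client object below is a hypothetical stand-in for whatever driver the product offers; the structure is what matters, since each step depends on the one before it, and a failure shows where the chain breaks.

    def test_photo_tweet_retweeted_by_friend(client):
        # Step 1: take a picture and post it in a tweet with some text.
        photo_id = client.upload_photo("vacation.jpg")            # hypothetical helper
        tweet = client.post_tweet(text="check this out!", media=photo_id)
        assert tweet.id is not None

        # Step 2: grab the public link for that tweet.
        link = client.get_tweet_link(tweet.id)
        assert link.startswith("https://")

        # Step 3: a friend follows the link and sees both photo and text.
        friend_view = client.fetch_as_guest(link)
        assert "check this out!" in friend_view.text
        assert friend_view.has_media

        # Step 4: the friend retweets, and the count reflects it end to end.
        client.retweet(tweet.id, as_user="friend")
        assert client.get_tweet(tweet.id).retweet_count == 1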
The tricky part is figuring out exactly where the problem is. If a tweet is posted through the Twitter web interface and never shows up in TweetDeck, how do we know what went wrong? There is the Twitter API that sends data to other pieces of software, there is the 140-character string that makes up the tweet, and there is the system consuming and displaying the data. A good place to start troubleshooting this layered problem is the log files. The log files tell the story of how far the data traveled. If the tweet was never sent from the Twitter interface, there will probably be no trace. But if it made it to TweetDeck, there will be a trail. Maybe that leads to authentication problems, maybe it leads to malformed JSON, or maybe a special character in the tweet is making TweetDeck angry. Figuring this out without good log files would not be fun.
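When logs are available as files, even a crude search across each layer narrows things down. This sketch assumes hypothetical log paths and a tweet ID to chase; the idea is simply to check, layer by layer, how far the data got before the trail goes cold.

    TWEET_ID = "1234567890"  # hypothetical ID of the missing tweet

    # Hypothetical log locations for each layer the data passes through.
    layers = [
        ("Twitter web interface", "/var/log/twitter-web.log"),
        ("Twitter API", "/var/log/twitter-api.log"),
        ("TweetDeck consumer", "/var/log/tweetdeck.log"),
    ]

    for name, path in layers:
        try:
            with open(path) as log:
                found = any(TWEET_ID in line for line in log)
        except FileNotFoundError:
            found = False
        print(f"{name}: {'trail found' if found else 'no trace -- look here'}")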
The Healthcare Challenge
Alistair Cockburn, describing his Hexagonal Architecture, frames an integration test as a scenario that runs from the software product all the way through an external adaptor or component.
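A minimal sketch of that idea, with hypothetical names throughout: the application core talks to the outside world only through a port (an interface), an adaptor implements that port for a concrete external system, and an integration test runs the scenario through the real adaptor rather than a fake.

    from typing import Protocol

    class BillingPort(Protocol):
        """Port: how the application core talks to any billing system."""
        def submit_claim(self, claim: dict) -> str: ...

    class HttpBillingAdaptor:
        """Adaptor: one concrete way to reach a real external billing system."""
        def __init__(self, base_url: str):
            self.base_url = base_url  # hypothetical billing endpoint

        def submit_claim(self, claim: dict) -> str:
            import requests
            response = requests.post(f"{self.base_url}/claims", json=claim)
            response.raise_for_status()
            return response.json()["claim_id"]

    def bill_patient_visit(visit: dict, billing: BillingPort) -> str:
        """Application core: knows nothing about HTTP, only the port."""
        claim = {"patient": visit["patient_id"], "codes": visit["diagnosis_codes"]}
        return billing.submit_claim(claim)

    # A unit test would pass a fake BillingPort here; an integration test passes
    # the real adaptor so the scenario runs all the way through the external system.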
There really is no better example of the importance of integration testing than electronic health record (EHR) systems. Every time a person visits a doctor or a hospital, a stack of paperwork follows them around. There is a form for just about any type of clinical interaction someone might need. This isn't documentation for the sake of documentation: each form is supposed to give a history and be a tool for sharing information with other healthcare facilities. EHRs were created to shrink the number of forms needed and to create a universal language between hospitals. One record to rule them all.
The rollouts, of course, were not that simple. Today there are probably more than 20 popular EHR systems, not counting the ones healthcare companies created themselves. Each of these stores data in one way and sends data out in a completely different way. That sounds bad, but it gets worse. The data captured in these systems gets sent on to insurance and billing companies to figure out who owes whom how much money. Each of these companies wants the data in a specific format, and sometimes there are state-by-state restrictions, too.
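To picture what that heterogeneity means in practice, here is a hedged sketch. The field names and translation rules are invented, but the shape of the problem is real: one internal representation, and a different outbound translation for every receiving company.

    # Hypothetical: the same visit, translated for each receiving company.
    internal_record = {"patient_id": "TEST-001", "dx": ["J45.20"], "charge_cents": 12500}

    def to_acme_billing(rec: dict) -> dict:
        # ACME wants dollars and a single primary diagnosis (invented rules).
        return {"patient": rec["patient_id"],
                "primary_dx": rec["dx"][0],
                "amount_usd": rec["charge_cents"] / 100}

    def to_state_insurer(rec: dict) -> dict:
        # The state insurer wants cents and all codes joined (invented rules).
        return {"member_id": rec["patient_id"],
                "dx_codes": ";".join(rec["dx"]),
                "amount_cents": rec["charge_cents"]}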
There is no single medical record representing each person; there is a mesh of records that come together to represent a person, and that mesh gets sliced up a hundred different ways to be useful to anyone who might need the data. Without careful integration testing of the connections between the EHR, the billing companies, and the insurance companies, this all falls over and is useless (but still expensive).
Starting simple can be a good strategy to test the waters
In this healthcare example, a very basic patient record is a good place to start. What seems like an overly simple test will yield a lot of information about the system. If it 'works', you now know that your product can generate data, generate a file, and that the receiving system can accept and process that file. If it doesn't work, not much time has been wasted on data setup, and hopefully a critical issue has been uncovered.
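As a sketch, with an entirely hypothetical export helper and intake endpoint, that first test might be no more than this:

    import requests
    from my_ehr_client import export_patient_record  # hypothetical export helper

    BILLING_INTAKE = "http://billing.example/intake"  # hypothetical receiving system

    def test_minimal_record_reaches_billing(tmp_path):
        # Ask the EHR to generate a file for the simplest possible patient record.
        record_file = export_patient_record(patient_id="TEST-001", out_dir=tmp_path)
        assert record_file.exists() and record_file.stat().st_size > 0

        # Hand the file to the receiving system and confirm it is accepted.
        with open(record_file, "rb") as f:
            response = requests.post(BILLING_INTAKE, files={"record": f})
        assert response.status_code == 200, "billing system rejected a minimal record"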
Assuming the EHR can send data and the billing system can receive data, this connection is the responsibility of a data file and its contents. These files are expected to be in a specific format and to contain specific types of data. Testing this can be thought of as an exercise in domain testing. Assuming the fields can be tested one at a time, there are ranges of valid and invalid data that need to be sent across. For example, a diagnosis field will accept a group of diagnoses in the relevant category; everything outside of that category is invalid. The test is seeing what happens when those values are passed across the ether into the billing system.
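Treated as domain testing, the field checks fall naturally into a table of values and expected outcomes. The diagnosis codes and the send_to_billing helper below are hypothetical; the pattern is exercising one field at a time with representative valid and invalid values.

    import pytest
    from my_ehr_client import send_to_billing  # hypothetical helper that builds and sends the file

    # Hypothetical diagnosis codes: some inside the relevant category, some not.
    @pytest.mark.parametrize("diagnosis_code,expected", [
        ("J45.20", "accepted"),   # valid: inside the category
        ("J45.99", "accepted"),   # valid: edge of the category
        ("Z99.99", "rejected"),   # invalid: outside the category
        ("",       "rejected"),   # invalid: missing value
        ("J45-20", "rejected"),   # invalid: malformed code
    ])
    def test_diagnosis_field_domain(diagnosis_code, expected):
        result = send_to_billing(patient_id="TEST-001", diagnosis=diagnosis_code)
        assert result.status == expected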
The important thing to remember about integration is that it comes in layers. Software has to be coherent and come together as a product someone would want to use, and then that product also needs to play well with the outside world. Thinking of software testing only in terms of features is a trap. The real value proposition is in how software communicates with the rest of the world.