What should have been a straightforward e-commerce site development project turned into a laughing stock and a disaster of unmitigated proportions. Let’s look at how that project went sour, so that developers and IT leaders can avoid a similar failure closer to home.
Once the government shutdown of early October 2013 faded from the press, a new mess came to dominate the media's attention: what an absolute clusterfrack HealthCare.gov, the official website for the Affordable Care Act (a.k.a. ObamaCare), had turned into.
HealthCare.gov was supposed to operate on a simple functional spec: present the person with the available options, determine the individual’s ability to pay and whether they needed a subsidy, collect the individual’s information and payment, and transmit it all to a health insurance provider in the citizen’s state.
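To show how small that four-step spec is at its core, here is a minimal Python sketch of the flow. Everything in it is hypothetical: the `Plan` and `Applicant` types, the function names, and the subsidy rule (a simplified band of 100% to 400% of a poverty line, with illustrative dollar figures) are assumptions for illustration, not HealthCare.gov's actual logic or the real eligibility rules.

```python
from dataclasses import dataclass

# Illustrative figures only (assumptions, not the real eligibility rules):
# a household poverty line and the band of multiples in which a subsidy applies.
POVERTY_BASE = 11_490        # one-person household, illustrative
POVERTY_PER_PERSON = 4_020   # each additional household member, illustrative
SUBSIDY_BAND = (1.0, 4.0)    # multiples of the poverty line

@dataclass
class Plan:
    plan_id: str
    insurer: str
    state: str
    monthly_premium: float

@dataclass
class Applicant:
    name: str
    state: str
    household_size: int
    annual_income: float

def present_options(plans: list[Plan], state: str) -> list[Plan]:
    """Step 1: show the plans offered in the applicant's state, cheapest first."""
    return sorted((p for p in plans if p.state == state),
                  key=lambda p: p.monthly_premium)

def needs_subsidy(a: Applicant) -> bool:
    """Step 2: check whether income falls inside the (assumed) subsidy band."""
    poverty_line = POVERTY_BASE + POVERTY_PER_PERSON * (a.household_size - 1)
    low, high = SUBSIDY_BAND
    return low <= a.annual_income / poverty_line <= high

def enroll(a: Applicant, plan: Plan, payment_token: str) -> dict:
    """Steps 3-4: collect details and payment, then build the record that
    would be transmitted to the insurer in the applicant's state."""
    if plan.state != a.state:
        raise ValueError("plan is not offered in the applicant's state")
    if not payment_token:
        raise ValueError("payment is required to complete enrollment")
    return {
        "insurer": plan.insurer,
        "plan_id": plan.plan_id,
        "applicant": a.name,
        "subsidy_eligible": needs_subsidy(a),
        "payment_token": payment_token,
    }
```

For example, with two Ohio plans priced at $320 and $280, `present_options(plans, "OH")` lists the $280 plan first, and a single applicant earning $30,000 falls inside the assumed subsidy band. A real system would also need identity checks, income verification, and a reliable link to each insurer, none of which this sketch attempts.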
It did none of these things reliably. On its first day of operation, just six people were able to sign up.
Over the following months, with near-constant overhauls, the site got better. By January 2014, HealthCare.gov had signed up 1.1 million people in a single month. Still, the site is plagued by outages, payments that never get transmitted, and seemingly non-existent security.
What went wrong with HealthCare.gov wasn't just one issue; it was many, and together they caused a cascade of failures. Let’s look at the elements involved. As we review them, consider how these human errors might be reflected in your own company’s development process. (I hope you breathe a sigh of relief and say, “Not at all!”)
The bidding war
The contractor that built HealthCare.gov first became involved in 2007, when it bid to provide technology services for the Department of Health and Human Services (HHS), two years before Barack Obama became President. The law passed in 2010, but development on HealthCare.gov didn't begin until December 2011. HHS then offered the contract for the site to the four finalists from the 2007 bid and awarded the project to the bidder that provided the "best value."
The contractor spent $174 million building the site, well below the $600 million figure cited by some news outlets. Even so, $174 million is a ridiculous amount of money for what is essentially an e-commerce site.
That contractor is now getting killed in the court of public opinion. The federal government dropped it in favor of Accenture; North Carolina dumped it from its tax collection system; and Massachusetts fired it from the contract to build the state's own health insurance exchange site for its residents.
But it wasn't all the contractor's fault.
Wrong server model
At its heart, HealthCare.gov is an e-commerce site, not much different from Amazon: shop, select, and check out. "It's similar to a standard e-commerce site that a few engineers can build here in the Silicon Valley," said Raj Bains, director of products at Clustrix, which makes scale-out SQL databases.
For an e-commerce site like that, the architecture has to scale out, which means running many application servers and adding more when you need to grow. Bains cited a Clustrix customer, nomorerack.com (a reseller like Overstock.com), as an example of scale-out. For Cyber Monday this past Christmas season, the site quickly went from 6 servers to 14 by leasing more hardware from its provider; then, as traffic leveled off, it let the leases expire.
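The arithmetic behind scale-out is simple enough to sketch. The Python below is a hypothetical capacity planner, not Clustrix's or nomorerack.com's tooling: it assumes each application server handles a fixed number of requests per second (500 in the usage below, an invented figure) and keeps 25% headroom, then works out how many identical servers to lease as load rises and when to release them as it falls.

```python
import math

def servers_needed(requests_per_sec: float, capacity_per_server: float,
                   headroom: float = 0.25, minimum: int = 2) -> int:
    """How many identical application servers a scale-out tier needs.

    `headroom` reserves spare capacity (25% by default) so a spike or a
    failed node doesn't immediately saturate the remaining servers.
    `minimum` keeps at least two servers for redundancy (an assumption).
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    effective = capacity_per_server * (1 - headroom)
    return max(minimum, math.ceil(requests_per_sec / effective))

def scaling_plan(load_samples: list[float],
                 capacity_per_server: float) -> list[tuple[float, int, str]]:
    """For each load sample, the target server count and whether to lease
    more servers ("add"), let leases expire ("release"), or "hold"."""
    plan = []
    current = 0
    for load in load_samples:
        target = servers_needed(load, capacity_per_server)
        if target > current:
            action = "add"
        elif target < current:
            action = "release"
        else:
            action = "hold"
        plan.append((load, target, action))
        current = target
    return plan
```

With those assumed numbers, 2,250 requests per second calls for 6 servers and 5,250 calls for 14, mirroring the 6-to-14 jump described above; `scaling_plan([2250, 5250, 2250], 500)` adds servers for the peak and releases them afterward. A scale-up design has no equivalent knob: once the single big box is saturated, the only option is a bigger box.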
Instead, Bains said, HealthCare.gov used the 1990s database approach: a single big server that scales up. That approach breaks down with many concurrent users. "Legacy players used to sell giant boxes to the enterprise; and if the load is too much you throw that away and get a bigger box. That's not how scale-out, cloud architecture works," he said.
The wrong design model
Agile development is all the rage today, but the contractor didn't use it to build the most important features first, get them working, and then add functionality in relatively short sprints. Instead, it used the old waterfall model: build as much as you can, then test it.
Agile means testing right from the get-go. Tony Barbagallo, vice president of marketing at Clustrix, said HealthCare.gov wasn't designed that way. "If you don’t do the design exercise correctly that's the first point of failure," he said.
Most websites today are built with an Agile process: you get the basic infrastructure up very quickly and start testing for bottlenecks, then add features in small batches and test each batch before it goes onto the production servers. "Silicon Valley has gone completely to Agile. Testing starts a few weeks into the process. And every 4 to 6 weeks you add new features you like and test, say 'that's good,' and add another task and test," Barbagallo said.
HealthCare.gov got virtually no testing. The first tests were run just days before the October 1 launch; and when the simulation reached just a few hundred concurrent users, it crashed.
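The kind of test that came too late can be sketched in a few lines. This Python example ramps up concurrent simulated users against a stand-in service until the error rate crosses a threshold. The `SimulatedService` class, a capacity of 300 concurrent requests, and the 1% error threshold are all assumptions chosen for illustration; a real load test would drive a staging copy of the site over the network with a dedicated load-testing tool, starting months before launch rather than days.

```python
import asyncio

class SimulatedService:
    """Hypothetical stand-in for the system under test: it can hold only
    `capacity` requests in flight and rejects anything beyond that."""

    def __init__(self, capacity: int, latency: float = 0.01):
        self.capacity = capacity
        self.latency = latency
        self.in_flight = 0

    async def handle(self) -> bool:
        if self.in_flight >= self.capacity:
            return False                      # overloaded: request fails
        self.in_flight += 1
        try:
            await asyncio.sleep(self.latency)  # simulated work
            return True
        finally:
            self.in_flight -= 1

async def run_step(service: SimulatedService, users: int) -> float:
    """Fire `users` concurrent requests; return the fraction that failed."""
    results = await asyncio.gather(*(service.handle() for _ in range(users)))
    return results.count(False) / users

async def ramp(service: SimulatedService, start: int = 50, step: int = 50,
               max_users: int = 1000, max_error_rate: float = 0.01):
    """Raise concurrency step by step. Return the first user count whose
    error rate exceeds the threshold, or None if every step passes."""
    users = start
    while users <= max_users:
        if await run_step(service, users) > max_error_rate:
            return users
        users += step
    return None
```

With the assumed capacity, `asyncio.run(ramp(SimulatedService(capacity=300)))` returns 350: the first step at which requests start failing. The value of running this early is that the breaking point shows up while there is still time to change the architecture.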
Too many cooks spoil the broth, and they can ruin a website, too. Government ineptitude played a real role here. First, work on the site didn't start until nearly two years after the law passed. In the business world, work would have gotten underway the next day.
Then the government came in with last-minute changes. A government oversight report said that the Obama administration told contractors one month before HealthCare.gov went live that consumers should be required to register before they could see prices for the plans. Before that change in software requirements, you could see the costs without registering. Other last-minute changes were imposed on the contractor before it had even finished the core system functionality, let alone tested it.
"As simple as it sounds, getting business stakeholders to agree on a definition of 'success' can become quite difficult and any project initiative should not move forward until this is understood by all participants in the project with everyone protecting the values that drive to success and raising the flag to things that are known causes for late project failures," said Tony McClain, a client partner with Geneca, a custom software development firm. "One example that immediately comes to mind with Healthcare.gov is the allowance of last minute changes that can and apparently did disrupt or compromise the usability of the system."
Worst. Coding. Ever.
Microsoft Windows 7, the entire operating system, is approximately 50 million lines of code. HealthCare.gov, a simple e-commerce website, weighs in at 500 million lines of code, ten times as much, according to the New York Times. Many developers have expressed skepticism about this statistic, but no one really knows the actual size.
The code that has been seen is a disaster. A recent Time.com article showed just how bad things were: on the back end of the site, data from the few people who managed to attempt a sign-up was garbled and, in some cases, unusable. The nightly reports on new enrollees sent to insurance companies were riddled with errors, including syntax mistakes and transposed or duplicate data.
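Errors like those are cheap to catch before a file leaves the building. The sketch below validates a nightly enrollment report and flags malformed rows, duplicate enrollees, and fields that fail their expected format, which is how transposed values usually surface. The CSV layout, the field names, the `validate_report` function, and the "E plus six digits" ID format are hypothetical and not meant to match the real insurer feeds.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical record layout for a nightly enrollment report.
FIELDS = ["enrollee_id", "last_name", "first_name", "date_of_birth", "plan_id"]
ID_PATTERN = re.compile(r"^E\d{6}$")   # assumed ID format, e.g. E000123

def validate_report(text: str) -> list[str]:
    """Return human-readable problems found in a CSV enrollment report."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != FIELDS:
        return [f"unexpected header: {header}"]

    problems = []
    first_seen = {}                    # enrollee_id -> line where it first appeared
    for line_no, row in enumerate(reader, start=2):   # line 1 is the header
        if len(row) != len(FIELDS):
            problems.append(
                f"line {line_no}: expected {len(FIELDS)} fields, got {len(row)}")
            continue
        rec = dict(zip(FIELDS, row))
        # Per-field format checks; a transposed ID and date fail both.
        if not ID_PATTERN.match(rec["enrollee_id"]):
            problems.append(f"line {line_no}: bad enrollee_id {rec['enrollee_id']!r}")
        try:
            datetime.strptime(rec["date_of_birth"], "%Y-%m-%d")
        except ValueError:
            problems.append(
                f"line {line_no}: bad date_of_birth {rec['date_of_birth']!r}")
        # Duplicate detection keyed on the enrollee ID.
        key = rec["enrollee_id"]
        if key in first_seen:
            problems.append(f"line {line_no}: duplicate of line {first_seen[key]} ({key})")
        else:
            first_seen[key] = line_no
    return problems
```

A report whose third line has the ID and date of birth swapped, whose fourth line repeats the second, and whose fifth line is truncated produces four problems: two format errors on line 3, a duplicate on line 4, and a short row on line 5. Rejecting the file on any problem, rather than sending it and fixing it later, is the design choice that matters here.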
The worst offense was the improper use of DataTables, an open-source plug-in for jQuery that improves how HTML tables handle and display data. DataTables is available under both the GPL v2 and a BSD license, but either way, as the DataTables team notes, you have to "keep the copyright notices in the software." HealthCare.gov did not. SpryMedia, the company behind DataTables, said it was "extremely disappointed" by the move. One sensationalistic website claimed SpryMedia was going to sue, but that was never borne out.
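Keeping notices intact is easy to automate. The hypothetical check below, a sketch rather than a substitute for a proper license-compliance tool, walks a directory of third-party JavaScript and lists any file whose first 20 lines never mention "copyright". The `missing_notices` function, the directory layout, the marker word, and the 20-line window are all assumptions.

```python
from pathlib import Path

def missing_notices(vendor_dir: str, marker: str = "copyright",
                    header_lines: int = 20) -> list[str]:
    """Return paths of .js files under vendor_dir whose first `header_lines`
    lines do not contain `marker` (case-insensitive)."""
    missing = []
    for path in sorted(Path(vendor_dir).rglob("*.js")):
        with path.open(encoding="utf-8", errors="replace") as f:
            # zip() stops once range() is exhausted, so only the header is read.
            head = "".join(line for _, line in zip(range(header_lines), f))
        if marker.lower() not in head.lower():
            missing.append(str(path))
    return missing
```

Running it against the built, minified output as well as the sources matters, because minification is a common way such header comments get stripped.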
OK, the contractor was publicly trashed and lost the contract, as well as plenty of other business. But no one in government has lost a job over this debacle, nor has anyone been held accountable for other federal IT boondoggles.
In 2012, the Air Force pulled the plug on its Expeditionary Combat Support System, an ERP system designed to replace more than 200 legacy systems. After spending more than $1 billion, it had found that another $1.1 billion would deliver only about 25% of the original scope, with completion not expected until 2020. The FBI's disastrous Virtual Case File initiative cost $170 million before being cancelled. Its replacement, called Sentinel, went live in 2012 at a cost of $451 million.
In business, debacles like these would generate some sort of post-mortem analysis to learn what went wrong and how to avoid it in the future. Yet in none of these situations did the government address the cause of the problem (though some scapegoats were presumably slaughtered). Sadly, no one ever pays when the government screws up. Except the taxpayer.