Why Did Our Application Crash? A Loaded Question…
Test and Monitor | Posted October 17, 2012

Congratulations, you hit your deadline (yay!). Your product is now full of new features, and your marketing team is salivating over fun taglines and promotions. Out in Customer Zone, your users are clamoring on the forums. You can see the traffic spiking there, and you're sure it's all accolades, although you haven't actually taken the time to check.

Hmm… so why are you sitting in a conference room a week later, sleep-deprived and cranky, for the dreaded post-mortem? Because all that marketing and customer loyalty brought your servers to their knees, exposed a memory leak, and left your application timing out before most users could even log in.

Yeah, admit it: anyone who has been in this industry long enough has sat through that post-mortem. It usually involves a lot of finger pointing and grumbling, and it requires reports in triplicate for the CMO, the CTO, and a very unhappy Director. No matter where the fingers point during the meeting, by the end they are all aimed at the Quality Assurance team. And you're as baffled as everyone else. After all, you took the time to load test the application, and you were sure it could survive the load.

Three reasons why load tests don’t work

So why didn't your load tests save you? Most people assume that knowing how to construct a load test and making the effort to run it is all there is to it. Let's examine some of the hidden causes of load test failures. Here are three common reasons why load tests don't actually prevent failures under load.

1. Your load predictions were inaccurate. Where did you get your estimates?

Marketing can tell you more about their campaigns and their expected reach than anyone else. They are muttering, to themselves and to anyone else who will listen, about conversion rates, new vs. existing customers, social media reach and sentiment, and other marketing mysteries. Are you aligned with their expectations? If they're good, they brought in all the customers they were looking for and then some.

Operations can tell you better than anyone what your load history looks like, both on an average day and immediately after a launch. In fact, they should be on the review team for any load test cases you put together, since they get the first call when things go awry in production.

Well, they get the first alert, anyway. The first call goes to Support, another typically untapped goldmine of information. They still remember, in fitful sleep, the last time you released without doing any load testing. They can probably recite from memory the browsers and platforms most likely to crash and burn when threads are exhausted or memory is leaking somewhere. They know which customers hold the most licenses and are most likely to wake them up because the application died before the whole team could get its work done.

Lesson here?

Use your extended team. Quality Assurance is a company-wide matter, and the best QA teams know they have to poke into all the dusty corners to factor in every relevant concern. Sending your test plans to the extended team for review using Collaborator by SmartBear can gather valuable input from all of those constituents and lead to a more accurate load test. Even better, a collaborative review tool like PRC lets all of those groups see each other's input, so everyone is aware of each other's activities and perspectives. Imagine… a dialog between Product Marketing and Operations!
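To make "where did you get your estimates?" concrete, here is a minimal Python sketch of how numbers from Operations and Marketing might be combined into a single load test target. Every field name, the concurrency fraction, the 1.5 safety margin, and the sample figures are hypothetical assumptions for illustration only; they are not SmartBear guidance or data from any real launch.

```python
import math
from dataclasses import dataclass


@dataclass
class LoadEstimateInputs:
    """Hypothetical inputs gathered from the extended team (all names are illustrative)."""
    baseline_peak_users: int     # Operations: typical daily peak of concurrent users
    launch_spike_factor: float   # Operations: past post-launch peak divided by typical peak
    campaign_reach: int          # Marketing: people the campaign is expected to reach
    conversion_rate: float       # Marketing: fraction of reached people who sign up
    concurrency_fraction: float  # assumption: fraction of new sign-ups online at the same time
    safety_margin: float = 1.5   # assumption: headroom in case the campaign beats its forecast


def target_concurrent_users(inputs: LoadEstimateInputs) -> int:
    """Return a rough concurrent-user target for a launch load test."""
    # Existing users, scaled by how much past launches have spiked traffic.
    existing = inputs.baseline_peak_users * inputs.launch_spike_factor
    # New users the campaign brings in who are likely to be online simultaneously.
    new = inputs.campaign_reach * inputs.conversion_rate * inputs.concurrency_fraction
    # Round up so the test target never undershoots the estimate.
    return math.ceil((existing + new) * inputs.safety_margin)


if __name__ == "__main__":
    example = LoadEstimateInputs(
        baseline_peak_users=2_000,
        launch_spike_factor=2.0,
        campaign_reach=500_000,
        conversion_rate=0.02,
        concurrency_fraction=0.1,
    )
    print(target_concurrent_users(example))  # 7500
```

The formula matters less than the inputs: each number has an owner on the extended team, so a reviewer from Operations or Marketing can challenge it before the test ever runs.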

2. You didn’t chase down the Mad Girlfriend bugs.

Time is always a precious commodity, and we applaud any development team that takes the time to run load tests regularly. We also applaud the management team that encourages this wise use of time. But let's face it: unless your release is specifically about scalability and stability and NOT about features (I can hear the product marketing teams gagging already), load testing usually doesn't get the time or focus it deserves. Even when teams do devote time to it, it's usually a fixed, time-boxed window at the end of the cycle, when every second counts.

So here you are, at the eleventh hour. It's Saturday, yes, Saturday, and you've commandeered a network of computers to squeeze as much concurrency and load as possible out of the weekend they gave you (oh, thank you so much!) for testing. You kick off your tests and, thankfully, they report failures. Nobody wants to cop to this, but after that much setup it honestly feels better to find an actual bug than to get a golden run.

But we all know that not all failures are created equal. There's always that one bug that makes you feel small and inadequate when you try to describe it. You know something was wrong: screens were slow to respond, database entries sometimes didn't get created, elements of the page didn't load. But the logs were clean, and you couldn't find a pattern to when it happened. Your application is exhibiting the dreaded Mad Girlfriend symptoms (substitute Boyfriend if the shoe fits): something is obviously wrong, but it keeps telling you everything is fine.

So, in the small window left for bug fixing and regression, you pick the ones that look easiest to tackle. And if you only tested at the GUI/browser level, you may have missed the chance to find the root cause at the API level, or vice versa. That leaves the Mad Girlfriends sitting off to the side, ominously silent and happily ignored. But we all know what happens when we ignore the Mad Girlfriend… Eventually, things erupt.

Lesson here?

The best load tests are done early enough in the cycle to let the team investigate even the silent but deadly bugs. Ideally, load testing is continual, with builds regularly promoted into an environment dedicated to running load tests. You can control what gets deployed, and when, by using ABS to build and deploy the code you want to test. Then run both LoadUIWeb and LoadUI scripts on a continual basis to cover both the browser and API levels and catch even the most elusive failures earlier in the cycle.

3. Your test environment didn’t match your production environment closely enough.

Of course, everyone will tell you that your QA environments should be identical in configuration and platform to your production environment. How else can you be sure your tests are valid in the real world? Okay, reality check time. Most companies have fairly robust production environments, especially now that cloud environments are so adaptable and cheap. But very few companies have the budget or staff to make their test environments match production.

Even in the relatively inexpensive world of cloud computing, costs can climb quickly once you start paying the usage charges that load testing racks up. Plus, who's gonna manage that environment? Load and performance tests are notoriously painful to set up. If you run your load tests as one of the final stages of development, you have to interrupt an already busy Operations or Development team to set up what is, by necessity, a complex environment, only to tear it down again later or reconfigure it for a different release or product.

Lesson here? 

Chances are you won't ever have a truly replicated production environment for testing. But spending a fraction of that money on a dedicated environment that has the most important elements of production (load balancing and monitoring, at the very least) will save you the cost of load failures later. By running AlertSite on both your load test environment and your production environment, you also get regular reports and alerts from the same site monitoring in both places, so what you see under load in testing mirrors what production would show under the same conditions.

So… take a load off

Here's what they will tell you at the post-mortem: "You're there to Assure Quality, so why didn't you?" And here's your answer: Quality Assurance is a big task that everyone has a part in. Taking load tests seriously means making them an integral part of your software development cycle, from test plan review through site monitoring. So do your homework and engage your colleagues, then take a load off your mind by putting a load on your system.

Tell us what you think – is it important to have a load test environment that is a duplicate of your production environment?
