Load Testing the Post Office
Hi there! My name is Dain, and I'm part of the fantastic development team working on the open source load testing tool loadUI. This is my first entry on this blog, and I thought I would share some thoughts on the topic of load testing.
One of the questions we get asked a lot about loadUI is, "My server needs to handle X simultaneous requests, how do I test this?" Technically, this question is perfectly valid. It has also been answered several times in our forum. From a load testing perspective, however, it seldom makes much sense. The question I always want to respond with is, "Why do you have a requirement that states that your server needs to process a certain number of requests in parallel?" If you have a good reason for this, then by all means, loadUI will let you test it. However, let's take a look at what you'd actually be testing, and how that would relate to how many users your server can handle (hint: they're pretty much unrelated).
Let's get started, old school!
Let me use an analogy to better explain some performance metrics. Imagine, if you will, an old-fashioned, brick-and-mortar post office. Yes, those do still exist, at least for the time being. Customers arrive at the post office with some business to do, and they start by getting a queue number. They then wait for their number to be called, approach the correct teller, and proceed to handle their business. Once done, they leave the post office and go home.
In our post office, we have a number of tellers working. Each teller can handle the business needs of a single customer at any one time. If we have a single teller, we can only handle a single person's business at a time. If we have two, we can handle two customers, and so on. This is straightforward, right? This means that if we need to increase the throughput of customers, we add a bunch more tellers to solve the problem. Thus, it makes perfect sense to use the number of tellers as the measurement of whether or not our post office can handle the number of customers coming in!
Except... well, how many tellers do we need? Say a new customer arrives every minute. Does this mean we need one teller? Five? Ten? There is actually no way to tell without looking at other parts of the system. So the answer is, quite simply, it depends. It depends on the nature of the customers' business need. It depends on the speed of the teller.
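To make "it depends" a bit more concrete, here is a minimal sketch in Python. It is not anything loadUI does for you, and the function name and numbers are made up for illustration. It uses the standard queueing rule of thumb that the average number of busy tellers equals the arrival rate multiplied by the average time each customer spends at the counter, so the result is only a lower bound that assumes steady arrivals.

```python
import math

def min_tellers(arrivals_per_minute: float, service_minutes: float) -> int:
    """Lower bound on tellers needed so the queue does not grow forever.

    Hypothetical helper for illustration: assumes a steady arrival rate
    and a known average service time per customer.
    """
    # Average number of customers being served at any moment.
    offered_load = arrivals_per_minute * service_minutes
    return max(1, math.ceil(offered_load))

# Same arrival rate (one customer per minute), different kinds of business.
for service_minutes in (0.5, 3, 10):
    print(service_minutes, "min per customer ->", min_tellers(1.0, service_minutes), "teller(s)")
```

The same stream of customers needs anything from one teller to ten depending on how long each errand takes, and bursty arrivals would call for some headroom on top of that.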
Getting closer with throughput
Instead of just looking at the number of tellers, we can take a look at the customers entering the post office and the time between arrivals. Obviously, we need to look at this during a time when the post office is being heavily used if we want to be able to gauge the capacity of the facility. In fact, if we have an existing post office, it's likely that we can check the logs of our queue number dispenser (or the server request log, if we allow ourselves to slip away from the analogy for just a moment) and from that figure out the required throughput at peak times. Even if we're expecting an influx of new users, we should have some idea of what to expect in comparison to the existing data (are we expecting to double the traffic? Just double the expected throughput!).
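As a rough sketch of reading peak throughput out of a log, the Python snippet below counts arrivals in fixed one-minute buckets and reports the busiest one. The function name and the input format (arrival times as seconds since the start of the log) are assumptions for the example; a real request log would need parsing first.

```python
from collections import Counter

def peak_per_minute(arrival_seconds):
    """Return (minute_index, count) for the busiest one-minute bucket.

    Hypothetical helper: expects arrival times in seconds since the
    start of the log, and assumes there is at least one arrival.
    """
    per_minute = Counter(int(t // 60) for t in arrival_seconds)
    return max(per_minute.items(), key=lambda item: item[1])

# Made-up arrival times: two in the first minute, four in the second, one in the third.
arrivals = [5, 20, 65, 70, 80, 119, 130]
minute, count = peak_per_minute(arrivals)
print("busiest minute:", minute, "with", count, "arrivals")
```

Fixed buckets are the simplest choice; a sliding window would catch bursts that straddle a minute boundary, at the cost of a little more code.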
Once we've clearly defined our needs, we can verify that the post office handles a specific throughput by sending in customers at the specified rate, and watching the queue. In the best case, there is always an idle teller waiting when a new customer arrives, and there is no queue time. In the worst case, the queue grows over time until people can no longer fit in the building (or at least until they give up and go home). In reality, we'll usually see a bit of queuing at times when many customers arrive at once, and other times when several tellers are idly waiting.
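The "send in customers at a fixed rate and watch the queue" idea can be sketched as a tiny simulation in Python. This is a toy model with invented names, not how loadUI works internally: customers arrive at perfectly regular intervals, every errand takes the same time, and the next customer in line always goes to whichever teller frees up first.

```python
import heapq

def simulate(num_tellers, arrival_interval, service_time, num_customers):
    """Return each customer's time spent waiting in the queue (minutes)."""
    # Min-heap holding the time at which each teller next becomes free.
    free_at = [0.0] * num_tellers
    waits = []
    for i in range(num_customers):
        arrival = i * arrival_interval
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)  # wait only if no teller is idle
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service_time)
    return waits

# One customer per minute, three minutes per errand, 100 customers.
for tellers in (2, 3, 4):
    waits = simulate(tellers, 1.0, 3.0, 100)
    print(tellers, "tellers: longest wait", max(waits), "minutes")
```

With too few tellers the wait keeps growing for as long as customers keep arriving, which is the worst case described above; with enough tellers nobody waits at all.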
What about customer satisfaction?
And so we've finally arrived at the most important thing of all: customer satisfaction. Really, it's the only thing that matters at all. Imagine yourself as the customer here, with a post-related errand to do. When all is said and done, do you care about how many tellers were handling other people's business? Do you care how fast they were at their job? Sure, if it means your turn will come sooner, then yes. Nevertheless, that's secondary to the one thing you really care about, which is how long you'll be stuck at the post office. Everything else is just noise. In fact, the analogy we've been using has a flaw. To complete the analogy we have to look outside the post office, in the parking lot. It turns out the real user isn't the customer at all, but the wife who is waiting in the car. She doesn't even see the queue, or the number of tellers, or any of the other customers. The only thing she experiences is the time from when her husband leaves the car until he returns. This is the user we're targeting.
Back to reality
What does all this mean for our actual problem of web apps and servers? It means that the only thing relevant to the user is response time, and the only thing relevant to us is the highest throughput we can achieve while keeping the response time at an acceptable level. What is acceptable depends a lot on user expectations for the type of request being made, but some basic testing with a handful of users should be able to determine a breaking point where users start to feel like the app is unresponsive. The goal should be to keep the queue short or nonexistent for the throughput given, which should correspond to the peak load your service is getting (or expecting to get). Thus, the criteria for a load test should be a specified throughput that must be handled, and the maximum time each request is allowed to take.
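Those two criteria can be written down as a simple pass/fail check. The Python sketch below is only an illustration with hypothetical names and thresholds: it takes the response times recorded during a test run, works out the achieved throughput, and compares both numbers against the requirements.

```python
def passes_load_test(response_times_s, test_duration_s, required_tps, max_response_s):
    """Check a finished test run against the two load test criteria.

    Hypothetical helper: response_times_s holds one entry (in seconds)
    per completed request, test_duration_s is the length of the run.
    """
    achieved_tps = len(response_times_s) / test_duration_s
    slowest = max(response_times_s, default=0.0)
    return achieved_tps >= required_tps and slowest <= max_response_s

# Made-up run: four requests completed in two seconds, slowest took 0.5 s.
print(passes_load_test([0.2, 0.4, 0.3, 0.5], test_duration_s=2.0,
                       required_tps=2.0, max_response_s=0.5))
```

Using the slowest request is the strictest reading of "maximum time each request is allowed to take"; some teams prefer a percentile instead, which would be a different criterion from the one in this post.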
Did you find this post interesting? Are you itching to try out some load testing? Well, there's a fine tool called loadUI, which can certainly help you with that! In fact, it's free, open source, and easy to use! We've got some nice tutorials to get you started. There are also a couple of videos showing off the tool in action. Using the new Statistics Workbench, you can view both throughput and response times in real time (denoted as Throughput > TPS and Time Taken, respectively, in loadUI). Please, try it and let us know what you think. We welcome questions as well as feedback in our forums.
I hope you've enjoyed this blog entry. Feel free to post your own thoughts on load testing down below in the comments section. Until next time!