Capacity Planning on a Cocktail Napkin
Test and Monitor | Posted February 15, 2012

A few simple rules of thumb and a bit of arithmetic can help you make simple, but useful, capacity estimates.

Normally, after eight hours on my feet with an audience, I'm both exhausted and wired; what I really want is to fall face down on my bed until I recover enough to order room service.

I was in Chicago teaching a course in Web systems architecture, and several of my students insisted I come with them to the famous BIG Bar in the Hyatt Regency. We got settled around a table, I ordered a Shirley Temple, and one of the students – a tall, slender, rather intimidatingly beautiful brunette – asked me a question.

"I'm starting a Web application start-up, and I'm worried about our platform. We're well-enough funded, but I don't want to spend any more than I have to ... but I think we might be big, and I don't want to have my system fall down the day we open. How can I make sure we have enough capacity without buying too much?"

I thought for a second. "Really answering that question would take some work and time, but we can make some good guesses right here." I pulled over a dry cocktail napkin and took my fountain pen out of my pocket. "We start with some simple assumptions...."

#

What we were talking about was capacity planning – basically predicting how big a system you'll need to sustain an expected demand. Real capacity planning is based on two statistical models: a workload model, which describes how often someone uses the system and how, and a performance model that describes the behavior of the system under load.

Capacity planning really got started in the early days of telephony, when telephone companies were the Internet startups of the day. The initial work by A. K. Erlang around the turn of the last century led to a whole body of lovely mathematics, almost none of which we'll use here.

A detailed and accurate workload model, and an associated performance model, allow high-quality predictions to be made, and let network managers play "what-if" games to determine rather exactly how much capacity they need. Think of how rarely you pick up the phone and get no dial tone (except in a really extreme emergency), or try to make a call only to be told the circuits are busy.

But my student's problem was different. She didn't have a good workload model, and she had no performance model at all. The whole system consisted of some user-interface wireframes, a set of user stories, and a business plan.

#

I drew x and y axes on the cocktail napkin, and then drew a squiggly curve.

"So we can start from first principles. Here's a picture representing the amount of traffic you get when your site is up and operating. The x axis is time, and the y axis is in Erlangs, the number of concurrent sessions you have running at one time." I pointed to the highest point on my squiggle. "That's your maximum usage over this time, so what you want to do is make sure your system can handle at least that much load."

"Of course." She'd moved very close to me to look at my sketch. I tried to keep my eyes on the napkin.

"Now, we know some things about Web systems. The biggest one is that each client usually makes demands on the system in bursts, and then sits and thinks for a long time. So we can assume the length of a session is fairly short, a few seconds up to maybe a minute. By definition, the utilization in Erlangs is the arrival rate lambda times the session duration h –"

I wrote the equation on my cocktail napkin, above my squiggled graph:

E=λh

"Remember always to include a Greek letter, it looks much more impressive then. But this is the basic equation we care about. If we can guess any two of these terms we can compute the other one.
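For anyone who would rather not do the algebra on a napkin, here is a minimal Python sketch of E = λh solved for each term in turn. The function names are my own invention, and the sketch assumes consistent units: λ in arrivals per second and h in seconds, so E comes out in Erlangs (average concurrent sessions).

```python
# E = λh: load in Erlangs = arrival rate × session duration.
# Assumes λ in arrivals per second and h in seconds.

def offered_load(rate: float, duration: float) -> float:
    """E = λh, the average number of concurrent sessions."""
    return rate * duration


def rate_for_load(load: float, duration: float) -> float:
    """Solve E = λh for λ, the arrival rate."""
    return load / duration


def duration_for_load(load: float, rate: float) -> float:
    """Solve E = λh for h, the session duration."""
    return load / rate


if __name__ == "__main__":
    # Hypothetical numbers: 2 arrivals per second, 5-second sessions.
    print(offered_load(2.0, 5.0))        # 10.0
    print(rate_for_load(10.0, 5.0))      # 2.0
    print(duration_for_load(10.0, 2.0))  # 5.0
```

Any two known terms give you the third, which is exactly the trick the rest of the napkin relies on.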

"So, now, let's make a guess. You've got your business plan – how many users a day do you expect to need to be profitable?"

She frowned for a moment. "Around 10,000."

"Okay, well, users tend to come in on a Pareto distribution – 80% of the work comes in 20% of the time. So we're going to have about 8,000 users over about 2.4 hours, or about 3000 an hour...."

"Really, it's 3333.3," she said, looking a bit triumphant. Her dark, slightly almond-shaped eyes caught mine. Students always like correcting the teacher.
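Here is the same peak-hour arithmetic as a small Python function, with the rule-of-thumb numbers exposed as parameters you can change. One thing worth flagging: 2.4 hours is 20% of a twelve-hour active day, not of a full 24-hour day, so treat the busy-window length as an assumption. Stretching it to 4.8 hours (20% of 24) halves the hourly rate.

```python
# Peak-hour estimate from a daily user count, using the 80/20 rule of thumb.
# Assumption (from the conversation): 80% of users arrive within a
# 2.4-hour busy window. 2.4 hours is 20% of a 12-hour active day; 20% of
# a full 24-hour day would be 4.8 hours, which halves the hourly rate.

def peak_hourly_rate(daily_users: float,
                     busy_share: float = 0.8,
                     busy_hours: float = 2.4) -> float:
    """Users per hour during the busy window."""
    return daily_users * busy_share / busy_hours


print(f"{peak_hourly_rate(10_000):.1f}")                  # 3333.3
print(f"{peak_hourly_rate(10_000, busy_hours=4.8):.1f}")  # 1666.7
```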

#

Now, my student and I had actually constructed the basics of a capacity planning model. We've got a workload. We know that to fit their business plan, they need to handle 10,000 customers a day, and based on some basic rules of thumb, we've guessed that means 3333 customers an hour during the most loaded hours of the day, with each session loading the system for around 10 seconds. We apply the Erlang equation and do a little algebra:

λ = 3333 users/hr ÷ 3600 sec/hr ≈ 0.93 users/sec

A session, we decided, is about 10 seconds, so the load in Erlangs is:

E = λh = 0.93 users/sec × 10 sec = 9.3 concurrent users
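The same two steps in Python make a quick check on the arithmetic. The 3,333 users per hour and the 10-second session are the guesses from the conversation, not measurements; rounding up at the end reflects that you can't plan for a fraction of a user.

```python
import math

PEAK_USERS_PER_HOUR = 3333  # guessed busy-hour arrivals (80/20 rule of thumb)
SESSION_SECONDS = 10        # guessed length of one session

lam = PEAK_USERS_PER_HOUR / 3600   # λ, arrivals per second
load = lam * SESSION_SECONDS       # E = λh, average concurrent sessions

print(f"lambda = {lam:.2f} users/sec")     # lambda = 0.93 users/sec
print(f"E = {load:.1f} concurrent users")  # E = 9.3 concurrent users
print(math.ceil(load))                     # 10: round up to whole users
```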

#

She broke out in a smile. "That's all? I need to plan on 9.3 users?"

"Well, not quite. First of all, you can't have a third of a user, so you'd want to plan for 10. But there's more to it than that. See, you can't depend on your users to come in in a nice orderly fashion; they're not taking numbers at a deli counter. Instead you have a whole lot of independent actors, each one making their own decisions about when to use your system. Because they're arriving randomly, we've got to think about them statistically.

"So far, we've been talking about averages, and we know we expect the users to arrive at about 0.93 per second on average – or inverting that, a little bit more than a second between each user –"

"It's 1.08 seconds," she grinned.

"How are you doing that?"

"I've got my little secrets."

"Right." I shook my head. "So anyway, all right, the average inter-arrival time is one over lambda or about 1.08 seconds, but it's coming from a random process, and since the actors are individuals, they don't have any co-ordination; it's going to be what's called a memory-less distribution. Does that ring a bell, say, from statistics classes?"

"Nothing, sorry." She smiled brightly and brushed her long dark hair back from her face.

"Okay, well, arrivals like that form what's called a Poisson process. If we count discrete events, like the number of arrivals in a second, we get a Poisson distribution; if we measure a continuous value, like the time between arrivals, we get an exponential distribution, and that's the memory-less one."

"Okay, that does sound familiar."
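A small simulation shows the two views of the same random arrivals. The sketch below assumes the post's peak rate of about 0.93 arrivals per second and an arbitrary 100,000-second run; it draws exponential gaps between arrivals, then counts arrivals in each one-second bucket. The average gap should come out near 1/λ ≈ 1.08 seconds, and the per-second counts, which follow a Poisson distribution, should have a mean and a variance that are both close to λ.

```python
import random
import statistics

LAM = 3333 / 3600       # λ, arrivals per second at peak (about 0.93)
SECONDS = 100_000       # arbitrary length of the simulated busy period
rng = random.Random(1)  # fixed seed so the run is repeatable

gaps = []               # continuous view: time between successive arrivals
counts = [0] * SECONDS  # discrete view: number of arrivals in each second

t = 0.0
while True:
    # expovariate takes the rate λ; its samples have mean 1/λ.
    gap = rng.expovariate(LAM)
    t += gap
    if t >= SECONDS:
        break
    gaps.append(gap)
    counts[int(t)] += 1

print(f"mean gap: {statistics.fmean(gaps):.2f} s (1/lambda = {1 / LAM:.2f} s)")
print(f"arrivals per second: mean {statistics.mean(counts):.3f}, "
      f"variance {statistics.pvariance(counts):.3f}")
```

The mean-equals-variance property of those counts is what the next step leans on.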

"Now, those distributions have lots of nice properties, which make them easy to calculate with. And the nicest one is that if the number of arrivals per second follows a Poisson distribution with parameter lambda – in other words, an arrival rate of about 0.93 per second – then we can approximate it with a normal distribution whose mean is 0.93 arrivals per second and whose variance is also 0.93. That means the standard deviation, sigma, is the square root of 0.93." I wrote those down on the cocktail napkin next, along with a sketch of the normal distribution's bell curve:

Poisson arrivals (exponential gaps between them)

Means approximately normal

Mean is λ, variance σ² = λ

So std dev σ = √λ

"That means we expect the arrival rate to be about 0.93 per second, but it's also a random variable that we know is approximately normally distributed, with a standard deviation of about the square root of 0.93, which is..."

"0.96," she said.

"Are you a lightning calculator or what? Okay, 0.96. One sigma above the mean is about 0.93 + 0.96, or almost 2 arrivals a second. Since that's one standard deviation up, that happens about 15% of the time. A few percent of the time, two standard deviations up, it'll be almost 3 arrivals per second. So you really need to think about times when you have three arrivals in a second. You can work out that it's not very likely to last for very long, but you won't go far wrong if you plan for 30 concurrent user sessions at a time: 3 arrivals per second, each one running ten seconds."
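The napkin arithmetic can be replayed with the standard library's `statistics.NormalDist`, assuming, as the conversation does, that arrivals per second are roughly normal with mean λ and standard deviation √λ. Under that approximation the rate exceeds one standard deviation above the mean about 16% of the time and two standard deviations about 2% of the time; the session count at the end is the same "3 arrivals times 10 seconds" rule of thumb.

```python
import math
from statistics import NormalDist

LAM = 3333 / 3600          # mean arrivals per second at peak
SESSION_SECONDS = 10       # guessed session length
sigma = math.sqrt(LAM)     # Poisson counts: variance = λ, so σ = √λ
approx = NormalDist(mu=LAM, sigma=sigma)  # the napkin's normal approximation

for k in (1, 2):
    level = LAM + k * sigma
    tail = 1 - approx.cdf(level)  # one-sided: share of seconds above level
    print(f"mean + {k} sigma = {level:.2f} arrivals/s, "
          f"exceeded {tail:.1%} of the time")

# Size for the ~2-sigma rate rounded up to whole arrivals, times session length.
planned_sessions = math.ceil(LAM + 2 * sigma) * SESSION_SECONDS
print(planned_sessions)  # 30
```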

"How accurate is that going to be?"

"Oh, not very, but what do you want on one cocktail napkin?"
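To see how rough "not very" is, the sketch below compares exact Poisson tail probabilities with the normal approximation. The continuity correction (evaluating the normal at k − 0.5) is a standard adjustment for integer counts and is my addition, not something on the napkin. With λ this small the two disagree noticeably: the exact chance of 3 or more arrivals in a given second, for instance, is about 7%.

```python
import math
from statistics import NormalDist

LAM = 3333 / 3600  # mean arrivals per second at peak


def poisson_pmf(k: int, lam: float) -> float:
    """Exact P(N = k) for a Poisson count with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_tail(k: int, lam: float) -> float:
    """Exact P(N >= k)."""
    return 1 - sum(poisson_pmf(i, lam) for i in range(k))


approx = NormalDist(mu=LAM, sigma=math.sqrt(LAM))
for k in range(1, 4):
    exact = poisson_tail(k, LAM)
    normal = 1 - approx.cdf(k - 0.5)  # continuity correction for integer counts
    print(f"P(N >= {k}): exact {exact:.3f}, normal approx {normal:.3f}")
```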

"Good point." She studied the napkin carefully. "Wait, there's something I don't understand. What about the other side of the curve? The mean is 0.93 arrivals per second — but one sigma down is negative 0.03. What does that mean?"

"Easy: you can't have fewer than zero arrivals in a second, so what it really means is that some seconds won't have any arrivals at all. Even at your high load, at least 15% of the time you'll get no arrivals whatsoever."

"Oh, sure, that makes sense. But then there will be times when the system is idle even during high loads, right?"

"Yes, and those quiet seconds help even out the heavily loaded ones."
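For Poisson arrivals the chance of an empty second has a simple exact form, P(N = 0) = e^(-λ), and the napkin's normal approximation understates it. A short sketch, again assuming the post's peak rate, is below. Note that with 10-second sessions, a second with no new arrivals isn't necessarily a second with no work in progress.

```python
import math

LAM = 3333 / 3600             # mean arrivals per second at peak
p_idle = math.exp(-LAM)       # Poisson: P(N = 0) = e^(-λ)
idle_seconds = 3600 * p_idle  # expected seconds with no arrivals in one busy hour

print(f"{p_idle:.0%} of busy-hour seconds get no new arrivals")  # 40% ...
print(f"about {idle_seconds:.0f} of 3600 seconds")
```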

She seemed very impressed. "May I keep this?" she said, taking the cocktail napkin.

"Of course." She put it in her purse and drank the last of her drink.

"So, um..." After talking so glibly, I was suddenly tongue-tied. "Would you like to, um, have d–" Just then she stood up and waved. "Rod, over here!"

There was a man coming in, tall, perfectly sculpted features, an expensive suit. He could have been Clark Kent, if Kent wore Armani.

"I want to introduce you to my husband. Rod, this is our instructor!" We shook hands, and she left, happily arm in arm with Rod.

So I drank the rest of my Shirley Temple.

I wonder what's on TV tonight?