Sizing, Danish Style

Folks in telecommunications and operations research have used Erlang models for almost a century. A. K. Erlang, a Danish telephone engineer, developed these models to help plan the capacity of the phone network and predict the grade of service that could be guaranteed, given some basic metrics about call volume and duration. Telephone networks are expensive to deploy, particularly when upgrading your trunk lines involves digging up large portions of rocky Danish ground or running cables under the North Sea.

The Erlang-B formula predicts the probability that an incoming call cannot be serviced, based on the call arrival rate, average call time, and number of lines available.  Erlang-C is similar, but allows for calls to be queued while waiting for service. It predicts the probability that a call will be queued. It can also show when calls will never be serviced, because the rate of arriving calls exceeds the system's total capacity to serve them.

Erlang models are widely used in telecomm, including GPRS network sizing, trunk line sizing, call center staffing models, and other capacity planning arenas where request arrival is apparently random. In fact, you can use it to predict the capacity and wait time at a restaurant, bank branch, or theme park, too.

It should be pretty obvious that Erlang models are widely applicable in computer performance analysis, too. There's a rich body of literature on this subject that goes back to the dawn of the mainframe. Erlang models are the foundation of most capacity management groups. I'm not even going to scratch the surface here, except to show how some back-of-the-envelope calculations can help you save millions of dollars.

One Million Page Views

In my case, I wanted to look at thread pool sizing. Suppose you have an even 1,000,000 requests per hour to handle. This implies an arrival rate (or lambda) of 0.27777... requests per millisecond. (Erlang units are dimensionless, but you need to start with the same units of time, whether it's hours, days, or milliseconds.) I'm going to assume for the moment that the system is pretty fast, so it handles a request in 250 milliseconds, on average.

(Please note that there are many assumptions underneath simply statements like "on average". For the moment, I'll pretend that request processing time follows a normal distribution, even though any modern system is more likely to be bimodal.)

Table 1 shows a portion of the Erlang-C table for these parameters. Feel free to double-check my work with this spreadsheet or this short C program to compute the Erlang-B and Erlang-C values for various numbers of threads. (Thanks to Kenneth J. Christensen for the original program. I can only claim credit for the extra "for" loop.)

Table 1. Erlang-C values at 250 ms / request

NPr_Queue (Erlang-C)
67undef
68undef
69undef
700.921417281
710.791698369
720.676255938
730.574128540
740.484342834
750.405921606
760.337892350
770.279296163
780.229196685
790.186688788
800.150906701
810.121031288
820.096296202
830.075992736
840.059473196
850.046152756
860.035509802
870.027084849
880.020478191
890.015346497
900.011398581
910.008390600
920.006120940
930.004424999
940.003170077
950.002250524
960.001583268
970.001103786
980.000762573
990.000522098

From Table 1, I can immediately see that anything less than 70 threads will never keep up. With less than 70 threads, the queue of unprocessed requests will grow without bound. I need at least 91 threads to get below a 1% chance that a request will be delayed by queueing.

Performance and Capacity

Now, what happens if the average request processing time goes up by 100 milliseconds on those same million requests? Adjusting the parameters, I get Table 2.

Table 2. Erlang-C values at 350 ms / request

NPr_Queue (Erlang-C)
96undef
97undef
980.907100356
990.797290966
1000.697789489
1010.608014385
1020.527376532
1030.455282634
1040.391138874
1050.334354749
1060.284347016
1070.240543652
1080.202387733
1090.169341130
1100.140887936
1110.116537521
1120.095827141
1130.078324041
1140.063626999
1150.051367297
1160.041209109
1170.032849334
1180.026016901
1190.020471625
1200.016002658
1210.012426630
1220.009585560
1230.007344611
1240.005589775
1250.004225555

Now we need a minimum of 99 threads before we can even expect to keep up and we need 122 threads to get down under that 1% queuing threshold.

On the other hand, what about increasing performance by 100 millseconds per request? I'll let you run the calculator for that, but it looks to me like we need between 42 and 59 threads to meet the same thresholds.

That swing, from 150 to 350 milliseconds per request makes a huge difference in the number of concurrent threads your system must support to handle a million requests per hour---almost a factor of 3 times. Would you be willing to triple your hardware for the same request volume? Next time anyone says that "CPU is cheap", fold your arms and tell them "Erlang would not approve." On the flip side, it might be worth spending some administrator time on performance tuning to bring down your average page latency. Or maybe some programmer time to integrate memcached so every single page doesn't have to trudge all the way to the database.

Summary and Extension

Obviously, there's a lot more to performance analysis for web servers than this. Over time, I'll be mixing more analytic pieces with the pragmatic, hands-on posts that I usually make. It'll take some time. For one thing, I have to go back and learn about stochastic process and Markov chains. Pattern recognition and signal processing I've got. Advanced probability and statistics I don't got.

In fact, I'll offer a free copy of Release It to the first commenter who can show me how to derive an Erlang-like model that accounts for a) garbage collection times (bimodal processing time distribution), b) multiple coupled wait states during processing, c) non-equilibrium system states, and d) processing time that varies as a function of system utilization.