# The statistics of offered load

Posted on Sep 20, 2012 in Tutorial

Almost everyone with a high school education is familiar with the Gaussian or Normal distribution. One reason for this is that students’ scores on exams or over complete terms are “graded on a curve,” that curve being a Normal probability law.

The two important parameters of the Normal curve are its mean and standard deviation. The mean is a measure of its “central tendency”; that is, the single and preferred location around which the most probable values are bunched up. The standard deviation measures the spread of the distribution.

A discrete relative of the Normal curve is important in traditional communications theory: the Poisson distribution, which looks more and more like a Normal curve as its mean grows. The Poisson distribution is of use in counting discrete events that have some expected rate of arrival; e.g., 10 calls per hour or 5 packets per millisecond or even 800 bits per packet, and so on. Agner Erlang employed the Poisson distribution when he modeled offered load on telephone systems and came up with the various models that now bear his name. The concept is very simple: if calls arrive according to a Poisson distribution, on average once every T seconds, and each lasts M minutes on average, we can begin to work out how likely our network is to block traffic.
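As a rough illustration of the kind of calculation Erlang's approach leads to, here is a minimal sketch of the Erlang B blocking formula, computed with its standard recursion. The arrival rate, holding time, and channel counts are made-up numbers chosen only for the example, and the function name `erlang_b` is my own.

```python
def erlang_b(offered_load_erlangs: float, channels: int) -> float:
    """Blocking probability for Poisson arrivals on `channels` trunks (Erlang B)."""
    blocking = 1.0  # with zero channels, every call is blocked
    for n in range(1, channels + 1):
        blocking = (offered_load_erlangs * blocking) / (
            n + offered_load_erlangs * blocking
        )
    return blocking


# Hypothetical traffic: one call every 30 seconds on average, lasting 2 minutes.
arrival_rate_per_s = 1 / 30
mean_hold_time_s = 2 * 60
offered_load = arrival_rate_per_s * mean_hold_time_s  # load in erlangs

for channels in (4, 6, 8, 10):
    print(channels, "channels -> blocking", round(erlang_b(offered_load, channels), 4))
```

The offered load is simply the arrival rate multiplied by the mean holding time, so everything in this model hangs on those two averages being meaningful.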

It is natural to assume that offered load works this way in many communications systems; e.g., paging networks. However, the truth is stranger than this convenient fiction. In practice, offered load on many networks looks more like a Power Law distribution. A simple and commonly used model for such scenarios is the so-called “80-20 rule” aka the Pareto principle. Such a rule as applied to network traffic might go something like 80% of your traffic comes from 20% of your customers. The converse would naturally be that 80% of your customers only offer 20% of your traffic. By tuning some of the elements of the Pareto principle, one can come up with similar statements like 90% of the load comes from 10% of the subscribers or 1% of the population has 99% of the wealth, and so on.
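To make the "tuning" concrete, here is a small sketch that assumes load per subscriber follows a continuous Pareto distribution with shape parameter alpha greater than 1. Under that assumption, the heaviest fraction p of subscribers offers a share p^(1 - 1/alpha) of the total load. The function names are hypothetical, and real traffic need not follow this idealized law.

```python
import math


def shape_for_rule(top_fraction: float, load_share: float) -> float:
    """Pareto shape alpha at which `top_fraction` of users offer `load_share` of load."""
    return 1.0 / (1.0 - math.log(load_share) / math.log(top_fraction))


def top_share(top_fraction: float, alpha: float) -> float:
    """Share of total load from the heaviest `top_fraction` of users (needs alpha > 1)."""
    return top_fraction ** (1.0 - 1.0 / alpha)


alpha_80_20 = shape_for_rule(0.20, 0.80)  # the classic 80-20 rule
alpha_90_10 = shape_for_rule(0.10, 0.90)  # the 90-10 variant

print("alpha for 80-20:", round(alpha_80_20, 3))
print("alpha for 90-10:", round(alpha_90_10, 3))
print("share from top 1% under 80-20:", round(top_share(0.01, alpha_80_20), 3))
```

The more lopsided the rule, the closer alpha gets to 1, which is the edge of the region where the mean of the distribution stops existing.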

I believe that one of the first modern scientists to work on a Pareto principle in communications was Benoit Mandelbrot at IBM. He analyzed data errors on modem calls and concluded that the arrival of errors did not follow Poisson (or Normal) statistics but rather a Pareto law. By now, it is commonly accepted that the arrival rates of data errors and of packets, the lengths of error bursts, hard disk failures, and any number of other communications phenomena are more properly accounted for by power law distributions or possibly by “long tail distributions”.

A discrete form of these power law curves that is more suitable for consideration in, say, paging or other wireless data networks, is due to Zipf. Zipf studied the frequency of occurrence of words in spoken languages and concluded that they appeared to be distributed in accordance with a power law. For example, the most frequent word in most English text is “the,” which occurs about 7% of the time. The next most frequent word is “of,” which appears about ½ as often, at 3.5%. And so on. Back at WebLink, we (Larry Martin) patented a couple of speech-to-text conversion methods that were based on this Zipfian model of the English language, recognizing that a relatively small set of words could be used to capture over 99% of text messages. [The real problems were in handling the proper names of people, places, and things.]

Zipf associated his observations with a “principle of least effort”; that is, humans wanted to use the fewest words possible to convey their message. This principle of least effort may also be behind the behavior of subscribers to paging and other wireless systems. Why send a message if you don’t have to? As I recall, the typical high-traffic source on the WebLink network was a computer system designed to report every few minutes that it was still working. However, there were very few of these. On the other hand, many subscribers never sent or received a message in any given month, while another large fraction sent or received fewer than three pages. In other words, a Pareto/Zipf model was appropriate.

That’s interesting so far as it goes, but there was another dynamic at work. There would occasionally be “catastrophes” of offered load when all of those folks who rarely or never sent a page all got together and delivered their one page a year (or month or day). That is, the same reasons that motivated one of these folks to send a message motivated a large proportion of them to do the same. As often as not, this would be some compelling event; e.g., a major accident, a major storm, a major sporting event, etc. I imagine that it’s the same with Twitter now. Some news-worthy bandwagon comes by and everyone hops on it with their mobile devices and starts texting away, in effect.

I recall having the greatest difficulty explaining this to other senior managers at WebLink. The idea that every random event in the universe must be in accordance with a Normal distribution is deeply ingrained, I guess. I eventually dug up traffic reports for every paging terminal in our network, graphed them on a monthly and yearly basis, and got curves very much like those in the Zipf article on Wikipedia. Even then, my data was considered suspect. Management had the view that there had to be two Normal distributions at work: one for average people and another for “cheaters.” They expected to see a combined distribution with one mean for the average subscribers and another, higher mean for cheaters (meaning people who were trying to get something for nothing . . . a very Romney-esque point of view). They were quite disturbed to see smooth distributions with no apparent means at all. As the Zipf article shows, these curves are straight lines on log-log graphs.
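The straight-line check is easy to reproduce. The sketch below fits a least-squares line to log(messages) against log(rank); the slope of that line is the power law exponent. The data here is synthetic, generated from an exact power law with an assumed exponent of -1.1 and an arbitrary scale, and stands in for a real traffic report, which I do not have to hand.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    variance = sum((a - mean_x) ** 2 for a in log_x)
    return covariance / variance


# Synthetic stand-in for a traffic report: subscribers ranked by message count.
ranks = list(range(1, 1001))
messages = [5000.0 * rank ** -1.1 for rank in ranks]  # assumed exponent and scale

slope = loglog_slope(ranks, messages)
print("fitted log-log slope:", round(slope, 3))
```

With real data the points scatter around the line, and a mixture of two Normal populations of the kind management expected would show up as two humps rather than a single straight run.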

I raised this important point in the context of capacity management on public safety communications systems, because the statistics of offered load in an emergency cannot be estimated on the basis of some fudge factor over a “mean” non-emergency load. For power law distributions with a heavy enough tail, there is no convergence to a mean; a mean simply doesn’t exist (strictly, this is the case when the tail probability falls off no faster than 1/x). Likewise, standard deviations and variances don’t exist (when the tail probability falls off no faster than 1/x²). So, it is impossible to guesstimate high levels of traffic on the basis of adding 1 or 2 or 6 (non-existent) standard deviations to the (non-existent) mean. For a system in which a Pareto (80-20) rule is at work, only a fraction of subscribers offer load in any average observation window (hour, day, week, month . . .). Under emergency circumstances, that rule is inverted; e.g., the computer that would normally report its “up” status every five minutes is now off the air, its service engineer is being paged, and all of the little old ladies who never get messages are being contacted by every family member because of the big storm.
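One way to see why a “mean” is a poor planning number here: the sketch below assumes a discrete power law with probability proportional to k^(-1.1) for k = 1 up to some cutoff, where k can be read as messages offered in an observation window. That reading, the exponent, and the cutoffs are all assumptions for illustration. Any finite cutoff gives a finite mean, but the value is set almost entirely by where the tail is cut off rather than by the bulk of the subscribers.

```python
def truncated_power_law_mean(n_max: int, s: float) -> float:
    """Mean of a distribution with p(k) proportional to k**-s on k = 1..n_max."""
    numerator = sum(k ** (1.0 - s) for k in range(1, n_max + 1))
    normalizer = sum(k ** -s for k in range(1, n_max + 1))
    return numerator / normalizer


cutoffs = [10**2, 10**3, 10**4, 10**5]
means = [truncated_power_law_mean(n, s=1.1) for n in cutoffs]

for cutoff, mean in zip(cutoffs, means):
    print(f"cutoff {cutoff:>7}: mean {mean:10.1f}")
```

Each tenfold increase in the cutoff moves the mean by a large factor, so there is no stable average to which a fudge factor could be applied.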

More likely than not, the cellular networks are also down, and their techs are being paged. Wink wink.

So, under the general heading of Zipf’s principle of least effort and the idea of “why send a message when you don’t have to?”, consider that when there is a common state of emergency, that is exactly when almost everyone has to send a message.

The curve below is a Zipf-Mandelbrot probability distribution for a situation with up to 10,000 messages. It plots the probability of a given number of messages being sent against that number, and shows a power law distribution with an exponent of -1.1. Curves like this are typical of paging (and other wireless) networks.
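For readers who want to regenerate a curve of this kind, here is a minimal sketch of a Zipf-Mandelbrot probability mass function, p(k) proportional to (k + q)^(-s) for k = 1 to N. The values N = 10,000 and s = 1.1 come from the description above; the offset q = 1.0 is a hypothetical choice, since the post does not state the value used.

```python
def zipf_mandelbrot_pmf(n_max: int, s: float, q: float) -> list:
    """Probabilities p(k) proportional to (k + q)**-s for k = 1..n_max."""
    weights = [(k + q) ** -s for k in range(1, n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# N and s follow the text; q is an assumed offset for illustration only.
pmf = zipf_mandelbrot_pmf(10_000, s=1.1, q=1.0)

for k in (1, 10, 100, 1_000, 10_000):
    print(f"P(k = {k:>6}) = {pmf[k - 1]:.6f}")

# Probability mass sitting beyond the first 100 values of k.
print("tail mass beyond k = 100:", round(sum(pmf[100:]), 3))
```

Plotting `pmf` against k on log-log axes gives a nearly straight line whose slope approaches -s once k is much larger than q.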