I’d like to shift gears from last time and talk more explicitly about some technical issues.
Let me begin by mentioning that I am not an expert in modeling, routing, or capturing network traffic. But as Feynman said, “Science is the belief in the ignorance of the experts”. This is the first in a series of blog posts intended to highlight what appears to be a common misperception about network traffic statistics by many experts based on old data and to indirectly illustrate how this misperception may have adversely affected network security research in recent years. Finally, it will sketch some characterization and detection techniques that seek to leverage the actual features and emerging trends in network traffic.
The field of queueing theory was inaugurated by A.K. Erlang in 1909 with his publication of “The Theory of Probabilities and Telephone Conversations”, which demonstrated that telephone traffic was Poissonian. This meant that the interarrival time between any two calls could be represented as a random Poisson variable, i.e., the times between successive calls were independent and identically distributed, and the probability distribution of such a generic interarrival time was given by
where was the overall rate at which calls were placed. This equation governs not only telephone calls but also network flow interarrivals and radioactive decays. Poissonian traffic is very nice from a mathematical point of view, and this fact allowed telcos to analytically model networks and plan to provision them accordingly. Since telco infrastructure was (and is) expensive, this was a big deal.
Fast forward a little over 50 years: Leonard Kleinrock applied queueing theory to the design of packet-switched networks, and in 1969 his lab was one of the original ARPANET sites. (In fact, Kleinrock and his student Charley Kline sent the very first message over the Internet—Kline was typing “login” and got as far as “lo” before the network crashed.) The Internet was built on the edifice of queueing theory, which in turn was deeply rooted in Poissonian traffic models.
This should serve to illustrate why a now-famous 1994 paper titled “On the self-similar nature of Ethernet traffic” by Leland et al. stopped a generation of network researchers and engineers in their tracks. The measurements and analysis in this paper definitively showed that LAN packet interarrivals (versus flow interarrivals) had self-similar or fractal behavior, and were not Poissonian.
It turns out that a Poisson arrival model is the only one that has interarrival times that are both independent and stationary. So the fact that packet interarrivals were not Poissonian meant that they were not independent of each other and/or were not stationary. Stationarity essentially means “time-invariance” and for various reasons is often simply conflated with so-called weak stationarity in the literature. Weak stationarity is defined explicitly here for the sake of later discussion as follows. The starting point for characterizing a time series has traditionally been the autocorrelation
where is the variance of . If the expected value of is constant and the autocorrelation can be written as , then is called weakly stationary.
Leland et al. explicitly verified that their data was stationary, but often stationarity has traditionally been taken for granted over intermediate timescales of minutes to hours. The reason is simple: while it is natural to expect the packet interarrivals on a LAN to show some daily and weekly patterns, over shorter timescales there is no obvious cause to expect statistical variation. This left independence of packet interarrival times as the odd man out. Since flow interarrivals were still Poissonian (a fact indicated experimentally for interactive TELNET and FTP sessions even in a well-known 1995 paper by Paxson and Floyd called “Wide area traffic: the failure of Poisson modeling” and corroborated by Barakat et al. in the 2003 paper “Modeling Internet backbone traffic at the flow level”), the natural conclusion would have been that individual flows exhibited autocorrelation. And in fact this turned out to be the case.
What Leland et al. essentially did to establish self-similarity was to put the timestamps of packets traversing a link into small bins and average over the bins. They found that the resulting locally averaged time series had similar autocorrelation structure over a wide range of bin sizes and the variances scaled as an inverse power of the bin size, which are the defining features of “exact second-order self-similarity”. For Poissonian traffic, larger bins would have produced progressively flatter locally averaged series, but they saw that the locally averaged series looked qualitatively similar for any reasonable choice of bin size.
In short, LAN traffic was “bursty” no matter how you looked at it:
For example, our analysis of the Ethernet data shows that the generally accepted argument for the “Poisson-like” nature of aggregate traffic, namely, that aggregate traffic becomes smoother (less bursty) as the number of traffic sources increases, has very little to do with reality. In fact, using the degree of self-similarity…as a measure of “burstiness,” we show that the burstiness of LAN traffic typically intensifies as the number of active traffic sources increases, contrary to commonly held views.
This presaged challenges for computer network provisioning of a sort that were unprecedented before then. People took notice. A raft of studies followed that confirmed and extended these results. They are still held as gospel by most of the network research community. And they inform the methods used to analyze and model network traffic today.
But experimental results over the last five years and longstanding theoretical work have indicated that it is time to reevaluate the assumption of non-Poisson behavior. In a very real sense, it may be time to go back to basics in order to provide more sophisticated tools for analyzing network traffic. The next posts in this series will speak to these developments.
update 11/24: fixed historical blurb re: Charley Kline.