Why Poissonian traffic models matter more now than ever, part 3

21 July 2009

A paper by Karagiannis et al. that appeared around the same time as Veitch et al. and called “A Nonstationary Poisson View of Internet Traffic” provided concrete evidence to support the convergence to Poissonian packet interarrivals. Mindful of the same short-timescale issues as Veitch et al., Karagiannis et al. remarked that “the packet interarrival time distribution may deviate from the Poisson model for very small values because of multiple-packet deterministic sequences” due primarily to buffered packets in the upstream router. However, this should not be a factor except as the link becomes saturated, which is precisely the caveat of Cao et al. detailed earlier.

Karagiannis et al. examined traces from multiple links, including an OC-48 backbone link and a 100 Mbps trans-Pacific backbone link in 2003, and concluded that (formatting as in the original)

•    Packet arrivals appear Poisson at sub-second time scales: The packet interarrivals follow an exponential distribution. In addition, packet sizes and interarrival times appear uncorrelated…
•    Internet traffic is nonstationary at multi-second time scales: We demonstrate that traffic oscillates around a global mean, in a piecewise linear manner.
•    Internet traffic exhibits long-range dependence (LRD) at large time-sacles: In agreement with previous findings, we observe that Internet traffic is LRD at scales of seconds and above.

They went on to note that

…Our work suggests that Poisson models should not be abandoned especially in the Internet backbone with high speeds, and huge levels of traffic multiplexing.

With respect to the OC-48 data, Karagiannis et al. also showed that

For interarrival times, independence holds for 20,000 consecutive packet arrivals…

Moreover,

To stress-test the claim for the memoryless properties of Poisson arrivals and independence, we studied bursts of packets…We find that the distributions of the duration of the busy/idle period…are well approximated by exponential distributions. This is irrespective of the interarrival time that is used as the boundary for distinguishing between idle and busy periods.

Karagiannis et al. concluded by noting that

this type of traffic model (i.e., Poisson with nonstationarity at multi-second scales) is consistent with the kind of long-range dependence that is commonly observed in network data over larger time scales…we expect the traffic characteristics for the Internet backbone to continue to grow even better behaved in the future.

The coup de grâce in the fifteen-year saga of packet interarrival behavior may have been delivered in early 2009, when Gÿorgy Terdik and Tibor Gyires delivered a paper called “Does the Internet Still Demonstrate Fractal Nature?” In this paper Terdik and Gyires noted the work of Veitch et al. and remarked that

a Poisson cluster process could model the aggregate traffic where the packet interarrivals within individual clusters of each flow could be characterized by an overdispersed Gamma distribution.

Analyzing data from an OC-192 link in 2008 with this in mind, Terdik and Gyires found that

the burstiness of the interarrival times decreased significantly compared to earlier traces…Furthermore, we found that in many traces the distribution was Poisson deviating from previous observations. Therefore in answering our original question, we can conclude that based on the sample traces, the Internet is losing its self-similar nature that was so prevalent for years.

And there you have it. It is a subtle picture of evolving behavior for network packet interarrival times, but the central point is clear: network traffic is becoming increasingly (and at high speeds already appears to be) Poissonian. This simplification means that a lot of elegant and powerful mathematical techniques that were not ever considered for profiling network traffic because of earlier results in the nineties are actually increasingly likely to be the appropriate way to handle things now and in the future. It may also mean that the debates you hear now about network capacity are not going to continue for many more years, because the telcos will be able to plan better. (Or they may continue simply because it’s to the telcos’ advantage to have such debates.)

In the next post in this series, I’ll discuss some mathematically oriented ideas that seek to exploit the emergence of Poisson traffic to provide a lasting foundation for scalable network profiling algorithms.


Random bits

20 July 2009

Joanna Rutkowska interview focusing on system-level security

Null pointer exploits in Linux

THz transistors for imaging

Quantum walks

Moths jam bat sonar


Maginot line

17 July 2009

From Andrew Krepinevich’s article in the July/August issue of Foreign Affairs:

U.S. military operations are very dependent on commercial land-based information infrastructure. If cyberattacks inflicted substantial damage on them or disrupted them, not only would great economic turmoil ensue; much of the military capability of the United States could prove to be the modern equivalent of the Maginot Line…It may be that a defensive [cyber]strategy cannot be successfully pursued and that the United States will be forced to develop its own cyberweapons and rely on deterring the worst sorts of cyberattacks. In short, the potential for a surprise of the worst sort in this realm remains a real possibility.


Random bits

17 July 2009
Nmap 5 released
Delay Tolerant Networking
AT&T 40Gbps disaster recovery exercise
Alcatel 100Gbps carrier edge router
Can AI fight terrorism?

Why Poissonian traffic models matter more now than ever, part 2

15 July 2009

My first encounters with queueing theory and non-Poisson network traffic statistics came at the same time, in early 2001. I was less than a year into my first real job and was tasked with helping a DoD development effort on a then-novel network intrusion detection system. As a mathematician sometimes masquerading as a physicist while sometimes applying these fields to communications security, I was then a total neophyte (and am still very much an amateur) when it comes to the details of network infrastructure, protocol stacks and the like.

At any rate, I arrived at an Army base in Hawaii with not a lot to do immediately: the prototype IDS was still in the early stages of implementation. So I decided a good thing to do with my time would be to simulate its performance. And I figured the best way to do that would be to try to simulate the Internet first. I was a young guy and didn’t know any better.

I rapidly worked through a bunch of papers describing new observations of power-law distributions in both network packet interarrivals and link topologies. And I built a long simulation based on this stuff in MATLAB. Probably I never got rid of half the mistakes in my code, but given an hour on my laptop, it could output a hundred or so nodes producing extremely simple surrogates for network traffic. Depending on the parameters I’d see link queues in equilibrium, or overflowing, but nothing of real utility for what I was really there to work on. So I abandoned the exercise, because by then the earliest iteration of the system was operational and I had some real data to analyze.

The point of this personal background is to highlight that I was naturally predisposed from the very beginning of my professional experience with computer networks to accept the long-prevailing views about power laws and multifractal traffic and all that. But over time that changed. The thing that started it was when a DoD researcher who I know well and whose talents I respect enormously mentioned offhandedly to me that he’d heard that network operators monitoring network backbones said that unlike on most links, their traffic was “pure Poisson”. That claim of emergent Poisson behavior at high speeds made sense to me.

As I later told a roomful of smart people in early 2009 (some of them incredulous because they’d read about fractal interarrival statistics over the years): if you had enough monkeys typing “W-O-R-D” over TELNET sessions on the same link, the fact that each monkey’s keystrokes wouldn’t be independent and identically distributed shouldn’t mean that the overall sequence of packet interarrival times would be non-Poissonian.

In fact there was both theoretical work based on models from the eighties (notably Sriram and Whitt’s 1986 paper “Characterizing Superposition Arrival Processes in Packet Multiplexers for Voice and Data”, which emphasized the importance of the relationship between the number of flows, aggregate traffic rate, and relevant timescales) and measurements from 2001 suggesting essentially the same thing. Two 2001 papers from Bell Labs by Cao et al. titled “On the Nonstationarity of Internet Traffic” and “The Effect of Statistical Multiplexing on the Long-Range Dependence of Internet Packet Traffic” argued that as the number of flows increased, traffic would become closer to the Poissonian ideal. They pointed out that binning interarrival data would not reveal this emergent behavior, and demonstrated the tendency towards Poisson behavior in models and with real data. About the only thing that appeared as if it might affect these results in practice would be link saturation, but even this was conjecture.

So what you had at this point was an opportunity for an academic catfight, but one that could have real consequences for network protocol and infrastructure design. But the fight apparently was decided outside the journals and conferences. There was a quietly solidifying consensus of fractal traffic. (As an aside, an engrossing but massive study by Harry Collins called Gravity’s Shadow details how consensus is reached among research communities through the example of the gravitational wave detection community.)

But that consensus, which still exists among the broader community, is apparently being dismantled and replaced with a new one. The first sign of this might have been a paper by Veitch et al. called “Multifractality in TCP/IP traffic: the case against”, where some of the more exotic claimed traffic behaviors were called into question. Veitch et al. also highlighted the utility of “Poisson cluster processes” (basically, Poissonian flow arrivals, but with distinct intra-flow statistics) in accurately modeling network traffic, including the reproduction of pseudoscaling. They said that

backbone traffic has been said to tend to a Poisson process with increasing traffic rate. While it is true that the distribution of inter-arrival times tends to exponential as [the number of flows] increases, the inter-arrivals remain correlated…This contradicts the necessary assumption of independent inter-arrival times of a Poisson process…One must be careful of the subtle fact that when examining inter-arrival times as traffic rates increase, one is in fact shifting the focus of observation to smaller and smaller scales.

That is, Veitch et al. argued that the correlations amongst intra-flow interarrivals would be preserved within the aggregate traffic even as the overall rate continued to increase. Although this is a subtle argument, the emergence of exponential interarrival behavior was already clear.

The next post in this series will deal with the most recent work on network packet arrival statistics.


Random bits

15 July 2009

Korean network attacks traced to proxies in UK and US

Got $1M to spend on network security? Richard Bejtlich says spend $850K on people, $100K on tech (this low figure made possible by FOSS), and $50K on miscellany

Twitter: pwned


Random bits

14 July 2009

Three reasons why US cybersecurity sucks

Schneier on the recent Korean network attacks

It might be a good idea to actually monitor IPv6 traffic


Random bits

13 July 2009

Data from Arbor regarding the recent Korean network attacks

A 112-bit elliptic curve discrete log problem solved

Warskimming


Random bits

10 July 2009

Anyone remember the rainbow books or TEMPEST?

Turns out that defense contractors are espionage targets

Update on Korean network attacks


Why Poissonian traffic models matter more now than ever, part 1

10 July 2009

I’d like to shift gears from last time and talk more explicitly about some technical issues.

Let me begin by mentioning that I am not an expert in modeling, routing, or capturing network traffic. But as Feynman said, “Science is the belief in the ignorance of the experts”. This is the first in a series of blog posts intended to highlight what appears to be a common misperception about network traffic statistics by many experts based on old data and to indirectly illustrate how this misperception may have adversely affected network security research in recent years. Finally, it will sketch some characterization and detection techniques that seek to leverage the actual features and emerging trends in network traffic.

The field of queueing theory was inaugurated by A.K. Erlang in 1909 with his publication of “The Theory of Probabilities and Telephone Conversations”, which demonstrated that telephone traffic was Poissonian. This meant that the interarrival time between any two calls could be represented as a random Poisson variable, i.e., the times between successive calls were independent and identically distributed, and the probability distribution of such a generic interarrival time T was given by

\mathbb{P}(T > t) = e^{-qt}

where q was the overall rate at which calls were placed. This equation governs not only telephone calls but also network flow interarrivals and radioactive decays. Poissonian traffic is very nice from a mathematical point of view, and this fact allowed telcos to analytically model networks and plan to provision them accordingly. Since telco infrastructure was (and is) expensive, this was a big deal.

Fast forward a little over 50 years: Leonard Kleinrock applied queueing theory to the design of packet-switched networks, and in 1969 his lab was one of the original ARPANET sites. (In fact, Kleinrock and his student Charley Kline sent the very first message over the Internet—Kline was typing “login” and got as far as “lo” before the network crashed.) The Internet was built on the edifice of queueing theory, which in turn was deeply rooted in Poissonian traffic models.

This should serve to illustrate why a now-famous 1994 paper titled “On the self-similar nature of Ethernet traffic” by Leland et al. stopped a generation of network researchers and engineers in their tracks. The measurements and analysis in this paper definitively showed that LAN packet interarrivals (versus flow interarrivals) had self-similar or fractal behavior, and were not Poissonian.

It turns out that a Poisson arrival model is the only one that has interarrival times that are both independent and stationary. So the fact that packet interarrivals were not Poissonian meant that they were not independent of each other and/or were not stationary. Stationarity essentially means “time-invariance” and for various reasons is often simply conflated with so-called weak stationarity in the literature. Weak stationarity is defined explicitly here for the sake of later discussion as follows. The starting point for characterizing a time series X_t has traditionally been the autocorrelation

r(s,t) := \frac{\mathbb{E}[(X_s - \mathbb{E}X_s)(X_t - \mathbb{E}X_t)]}{\sqrt{\sigma_s \sigma_t}}

where \sigma_s is the variance of X_s. If the expected value of X is constant and the autocorrelation can be written as r(s,t) \equiv r(t-s), then X is called weakly stationary.

Leland et al. explicitly verified that their data was stationary, but often stationarity has traditionally been taken for granted over intermediate timescales of minutes to hours. The reason is simple: while it is natural to expect the packet interarrivals on a LAN to show some daily and weekly patterns, over shorter timescales there is no obvious cause to expect statistical variation. This left independence of packet interarrival times as the odd man out. Since flow interarrivals were still Poissonian (a fact indicated experimentally for interactive TELNET and FTP sessions even in a well-known 1995 paper by Paxson and Floyd called “Wide area traffic: the failure of Poisson modeling” and corroborated by Barakat et al. in the 2003 paper “Modeling Internet backbone traffic at the flow level”), the natural conclusion would have been that individual flows exhibited autocorrelation. And in fact this turned out to be the case.

What Leland et al. essentially did to establish self-similarity was to put the timestamps X of packets traversing a link into small bins and average over the bins. They found that the resulting locally averaged time series had similar autocorrelation structure over a wide range of bin sizes and the variances scaled as an inverse power of the bin size, which are the defining features of “exact second-order self-similarity”. For Poissonian traffic, larger bins would have produced progressively flatter locally averaged series, but they saw that the locally averaged series looked qualitatively similar for any reasonable choice of bin size.

In short, LAN traffic was “bursty” no matter how you looked at it:

For example, our analysis of the Ethernet data shows that the generally accepted argument for the “Poisson-like” nature of aggregate traffic, namely, that aggregate traffic becomes smoother (less bursty) as the number of traffic sources increases, has very little to do with reality. In fact, using the degree of self-similarity…as a measure of “burstiness,” we show that the burstiness of LAN traffic typically intensifies as the number of active traffic sources increases, contrary to commonly held views.

This presaged challenges for computer network provisioning of a sort that were unprecedented before then. People took notice. A raft of studies followed that confirmed and extended these results. They are still held as gospel by most of the network research community. And they inform the methods used to analyze and model network traffic today.

But experimental results over the last five years and longstanding theoretical work have indicated that it is time to reevaluate the assumption of non-Poisson behavior. In a very real sense, it may be time to go back to basics in order to provide more sophisticated tools for analyzing network traffic. The next posts in this series will speak to these developments.

update 11/24: fixed historical blurb re: Charley Kline.


Follow

Get every new post delivered to your Inbox.