Why Poissonian traffic models matter more now than ever, part 6

Imagine that you were a network security researcher working to develop a traffic characterization or anomaly detection algorithm for a packet-level (versus flow-level) security platform. If you were convinced that complex packet interarrival behavior was intrinsic to network traffic, you might not even pursue the use of tools leveraging Poisson behavior. Even armed with the time rescaling theorem, the only reasonable way to estimate packet rates over short time intervals would require using a model like a Poisson cluster process, and you’d still need a good model for the clusters themselves before you felt comfortable. Even though this looks doable, it’d be complicated, and going any further to produce an overall algorithm would just be that much harder. (Although there is a known large-deviation result for Poisson cluster processes, it is considerably more complicated than the result sketched earlier. See Bordenave and Torrisi’s paper “Large deviations of Poisson cluster processes” for more details.) Add to this the fact that most network traffic security platforms don’t normalize their data in a way that is mathematically convenient, and it’s not hard to see why these sorts of techniques have been underutilized.

Other methods besides the one described earlier in this series of posts, including methods based on Markov processes, are also well-suited for providing scalable, long-term solutions to computer network traffic characterization. Equilibrium’s platform provides a solid foundation for incorporating these (and other) techniques. For example, the Girsanov theorem can be applied to produce a simple, scalable likelihood ratio test rooted in martingale theory that can help to determine changes in network traffic patterns with only trivial dependence on any statistical properties of traffic. It can be evaluated on a per-packet basis at high speeds, or periodically at whatever speeds the computational resources available will support.

More complicated large deviation results have also been studied, including one based on the transient fluctuation theorem (TFT) in statistical physics. Unfortunately, the computation of the sample mean entropy production rate that is at the heart of this result is so involved that pursuing a scalable characterization or detection algorithm based on it doesn’t look worthwhile unless the computational requirements can be improved. (See Jiang, Qian, and Qian’s Mathematical Theory of Nonequilibrium Steady States: On the Frontier of Probability and Dynamical Systems for background on this.) Interestingly, however, the TFT turns out to be a corollary of the Girsanov theorem but was discovered decades later. Like the path integral, whose foundations were laid by Wiener in 1923 (when Feynman was five years old), the TFT was effectively rediscovered by physicists long after probability theorists had covered much of the same ground.

The Azuma-Hoeffding inequality provides another powerful tool for analyzing martingales with bounded differences that is similar to large-deviation techniques. Getting good a priori bounds in order to apply this inequality is tricky for martingales derived from Poisson or continuous-time Markov processes, principally because such processes can have a large number of arrivals or transitions in very short time intervals. Even though this “explosion” is very unlikely at any given time, on the relatively rare but inevitable occasions when it happens, it makes a big difference. Though for martingales derived from the Dynkin formula a trick using time-ordered exponentiation can help, there is a high computational cost incurred and it is not clear that the results can be useful without frequent resets anyway. You can cheat and get the bounds after the fact, but you inevitably pay a price for that. (The technical reason for this is that the implied a posteriori measure is not absolutely continuous with respect to the actual measure.) One of our team has actually numerically evaluated the price paid for cheating this way in the case of Poisson processes: the price isn’t always that bad, but sometimes it is, and it’s always there.

Striking the right balance between effectiveness, simplicity, and rigor for building on results like these is something we like to focus on at Equilibrium as part of providing the best possible foundation for a long-term network monitoring solution.

Leave a Reply