Random bits

13 August 2009

This week in mathematical physics returns to talk about the flavor du jour…graphene

100 GbE ruminations

Dense packings of the Platonic and Archimedean solids


MIL-OSS

13 August 2009

Yesterday I spoke at MIL-OSS at the Georgia Tech Research Institute (thanks to the audience and the organizers) and talked about our technology and open source strategy, but the highlight of the day for me was when my former colleague David Wheeler explained Jim Stogdill’s phrase “code is maneuver” and its implications for open source: because the effectiveness of militaries will be increasingly dependent on their software, the ability to modify, patch, and improve that software (and of course to secure the systems and networks running that software) will be increasingly decisive in conflicts.

When open source gets pitched to the Hill and AT&L, a strong case can be made for it on national security grounds: government purpose rights mean less in practice than they do in theory, and Uncle Sam should have the four freedoms for (e.g.) the F-22 avionics code, and not just for webservers. (More generally, anyone who pays for custom code should either press for these freedoms or get a deep discount.) It’s important here to note that open source does not need to mean “released to the public”: it’s really not that hard to license even highly classified code under an open-source license and still deal with classification and ITAR restrictions appropriately.

There will be a lot of inertia against this. Procurement and management execs care more about how much money they control than almost anything else, and the easiest way to spend a lot of money (as someone commented during a panel session) is to write the same software over and over again. But it’s important to note that open source software generally creates wealth and markets for the public, even if it adversely affects the bottom line of pure-proprietary software companies. Their business models are not an excuse to needlessly duplicate time, money, and effort on projects–and certainly not to artificially impede the performance of our military and government.


Common data sets and the illusion of scientific security testing

12 August 2009

In my book there is nothing as good as real data produced by a red team, except captured data produced by a red team from NSA (even if it’s not really annotated or labeled well and their MO isn’t quite what it would be in practice). When I was involved with a past IDS research effort in the early 2000s there was a great deal of emphasis on the DARPA/Lincoln Labs datasets, which were old even back then. They were a lot better than nothing, but one thing that concerns me and most everyone else is the lack of good common data, let alone reproducible testbeds like the National Cyber Range is supposed to provide. So it is nice to see that the DARPA/LL datasets are deadlong live the 2009 CDX datasets.

(Wireshark opened the small border data capture fine on my laptop, so don’t let the lack of .pcap extensions bother you.)

But one often-implied corollary of having common or reproducible input data troubles me. Some folks have got the idea that it is possible to scientifically evaluate computer security systems. Even with good input data, I don’t believe such a thing is really possible except in an extremely narrow sense. Let me explain by way of analogy.

Suppose someone came to you with a box of padlocks of the same model and asked you to scientifically evaluate the security of that padlock model. There are a few things you could do that would be obvious. You could test mechanical properties scientifically, asking questions like: How much force does it take applied in such-and-such a way to produce a mechanical failure of the lock? What is the dominant failure mode that results? but it is very implausible to imagine that you could evaluate all the possible failure modes–and hence the actual security of the lock–scientifically.

Sticking with the mechanical failure modes: what if someone decides to use acid to dissolve the lock? or liquid nitrogen to make it brittle? or heats it with an acetylene torch? And maybe a cold, brittle lock is easier to pick; or a hot, ductile lock is…you get the picture. And this doesn’t even begin to address lockpicking in all its forms, which is equal parts art and science.

(BTW/FWIW: one of my favorite episodes from college involves breaking into a room [that I was allowed to be in] that was secured with a fancy keypad lock system using nothing more than a piece of string from an interoffice envelope. It was after a power outage, and the keypad was inoperative, but the folks that installed the lock didn’t think about a very simple mechanical failure mode.)

In the real world you can usually expect a combinatorial explosion of possible failure modes that would have to be tested to assure security. Even in quantum cryptography people rightly worry about things that aren’t in the formal protocols, like efficiencies and TEMPEST-type issues with photon detectors. One of the reasons people are so excited about quantum crypto in the first place is that it is, among other things, a truly credible attempt to use physical theory to reduce the number of failure modes in a security protocol. And one of the reasons I don’t bother to pay attention to formal security proofs outside of cryptography is that their assumptions are never credible to a degree comparable to the Bell inequalities.

This is not to say that security systems shouldn’t be tested–of course they should (especially if there is a “proof” of security)–but it doesn’t make sense to read too much into the results if they’re good. (If your results from evaluating a security system are bad, then that security system is not for you, regardless of why.) In science a hypothesis can never be proved, only disproved. And in security evaluation a system can never be proven secure, only broken. The difference is that in science the hypotheses can be deductively identified and tailored to test good theories that seek to reflect a underlying objective truth of big-n Nature; in security evaluation the system can only be used to test attacks that seek to reflect the ingenuity of one particular set of red team tactics, for which there is often no underlying objective validity, just a common-sense notion of what ought to be done. The domain of applicability of any security evaluation is fundamentally limited because there is no way to come up with a scientific theory of security. Science typically deals with establishing and understanding regularities in phenomena, while security evaluation typically deals with the opposite.

I was hoping to be able to (but can’t) make it to a meeting in Seattle at the end of the month that is trying to produce

progress in the area of Quantifiable Scientific Evaluation of CyberSecurity research. Currently, there is no well understood scientific standard used to guage [sic] the quality of research results in this area. Instead, decisions are made by program committees and journal editors.  Also, experimental results are often not repeatable, sometimes due to the proprietary nature of the code or the privacy of the data. This meeting seeks to establish the beginnings of an agreed-upon set of scientific standards whereby progress can be measured, and identify barriers to such standards.

Since I can’t be there, I will just say this: Concentrate on getting good, normalized inputs and outputs for comparative security evaluations. That is plenty hard enough, even though it is not science except in a trivial sense. If a goal is to use nontrivial science in security research, try applying ideas from science (like immunology or my favorite, statistical physics) and mathematics in the development of engineering principles for security systems–where it can be of some benefit–rather than in the evaluation of systems, where anything nontrivial that can be done might be valid and statistically significant and of practical engineering value, but is still probably not scientific.

Anyway, my hat is off to the CDX guys for putting those pcap files and logs up.


Random bits

11 August 2009

“We’re talking about small businesses, the lifeblood of the U.S., that are getting hit for five or six figures because they’ve embraced online banking.”

“The CEO who lets the Security organization become the compliance department has abdicated…his responsibility to understand and manage organizational risk. That is a fiduciary breach of CEO responsibility to shareholders.”

Supercomputer visuals without graphics chips (reminds me of a cute little lattice Boltzmann MATLAB routine I wrote that ran 100x faster than the results could be properly displayed…that sort of thing must be pretty frustrating at petaflop scales)

“The fact that [the President] has yet to name anyone [as White House cybersecurity coordinator] — despite the administration’s self-professed focus on information security matters — illustrates the challenge of finding someone willing to take on the job.” I wonder who they’ll find that’s “crazy enough to attempt the job.”


Scaling up ion trap quantum computing

9 August 2009

Science has just published a paper by Dave Wineland’s group at NIST that describes an experimental implementation of

a combination of all the fundamental elements required to perform scalable quantum computing using qubits stored in the internal states of trapped atomic ions.

(Technology Review has a layman’s overview here.)

It has been eight years since I really kept up with quantum computing, but this result is clearly very important. I recall hearing about the idea of a “quantum bus” to shuttle ions back and forth in order to scale up ion trap quantum computing at a dinner that I was at along with Wineland in 2000, and the idea appears to date from 1998 or even earlier. There were no obvious showstoppers, his group (among others) was good and certain to be adequately funded for as long as it would take, and so it is not that big a surprise to see this working now.

But even though the transport of ions in traps had been done before, this current result has built on of over a decade of work. An idea apparently developed over the last few years that I don’t recall from those early days (it seems to date from a 2003 paper) involves using Mg+ “refrigerator” ions to sympathetically cool qubit-carrying Be+ ions by

using a combination of Doppler cooling and resolved sideband cooling on the 24Mg+ ions. Importantly, the cooling light only interacts with 24Mg+, leaving the qubits stored in 9Be+ intact.

Although the fact that it took so long for them to get here indicates just how difficult the physics and engineering challenges are, more results extending this work (probably starting with improving the fidelity, with longer-term goals of working with multidimensional trap arrays and finding good ways to scale up the entanglement distribution) are certain to follow.

In the last paragraph of a review paper on quantum algorithms and protocols I wrote in 2000, I said that “it is appropriate to say some words about whether a scalable quantum computer will in fact ever be built. [I believe] that such devices could well be built within 20 years.” The Wineland group’s latest result makes me feel better about that statement (and worse about using RSA for anything long-term) than I have in the last couple of years.

Update: Speaking of “certain to be adequately funded for as long as it would take”, I just noticed a Nature article from June talking about IARPA cutting funding to Wineland’s group. From the article:

In 2007, the newly created IARPA took over funding for quantum information science from the National Security Agency. The following year, IARPA stopped funding the NIST researchers because, it says, it did not want to fund other government agencies…the NIST funds ran out while [IARPA] managers were reviewing the programme and deciding how NIST might be involved.

While IARPA’s stance is understandable, this sort of disruption is not conducive to National Medal of Science-level work, even if as NIST said

It’s ultimately the responsibility of NIST to make sure these programmes receive the resources they need, and we are committed to ensuring they remain adequately funded.


Why Poissonian traffic models matter more now than ever, part 6

6 August 2009

Imagine that you were a network security researcher working to develop a traffic characterization or anomaly detection algorithm for a packet-level (versus flow-level) security platform. If you were convinced that complex packet interarrival behavior was intrinsic to network traffic, you might not even pursue the use of tools leveraging Poisson behavior. Even armed with the time rescaling theorem, the only reasonable way to estimate packet rates over short time intervals would require using a model like a Poisson cluster process, and you’d still need a good model for the clusters themselves before you felt comfortable. Even though this looks doable, it’d be complicated, and going any further to produce an overall algorithm would just be that much harder. (Although there is a known large-deviation result for Poisson cluster processes, it is considerably more complicated than the result sketched earlier. See Bordenave and Torrisi’s paper “Large deviations of Poisson cluster processes” for more details.) Add to this the fact that most network traffic security platforms don’t normalize their data in a way that is mathematically convenient, and it’s not hard to see why these sorts of techniques have been underutilized.

Other methods besides the one described earlier in this series of posts, including methods based on Markov processes, are also well-suited for providing scalable, long-term solutions to computer network traffic characterization. Equilibrium’s platform provides a solid foundation for incorporating these (and other) techniques. For example, the Girsanov theorem can be applied to produce a simple, scalable likelihood ratio test rooted in martingale theory that can help to determine changes in network traffic patterns with only trivial dependence on any statistical properties of traffic. It can be evaluated on a per-packet basis at high speeds, or periodically at whatever speeds the computational resources available will support.

More complicated large deviation results have also been studied, including one based on the transient fluctuation theorem (TFT) in statistical physics. Unfortunately, the computation of the sample mean entropy production rate that is at the heart of this result is so involved that pursuing a scalable characterization or detection algorithm based on it doesn’t look worthwhile unless the computational requirements can be improved. (See Jiang, Qian, and Qian’s Mathematical Theory of Nonequilibrium Steady States: On the Frontier of Probability and Dynamical Systems for background on this.) Interestingly, however, the TFT turns out to be a corollary of the Girsanov theorem but was discovered decades later. Like the path integral, whose foundations were laid by Wiener in 1923 (when Feynman was five years old), the TFT was effectively rediscovered by physicists long after probability theorists had covered much of the same ground.

The Azuma-Hoeffding inequality provides another powerful tool for analyzing martingales with bounded differences that is similar to large-deviation techniques. Getting good a priori bounds in order to apply this inequality is tricky for martingales derived from Poisson or continuous-time Markov processes, principally because such processes can have a large number of arrivals or transitions in very short time intervals. Even though this “explosion” is very unlikely at any given time, on the relatively rare but inevitable occasions when it happens, it makes a big difference. Though for martingales derived from the Dynkin formula a trick using time-ordered exponentiation can help, there is a high computational cost incurred and it is not clear that the results can be useful without frequent resets anyway. You can cheat and get the bounds after the fact, but you inevitably pay a price for that. (The technical reason for this is that the implied a posteriori measure is not absolutely continuous with respect to the actual measure.) One of our team has actually numerically evaluated the price paid for cheating this way in the case of Poisson processes: the price isn’t always that bad, but sometimes it is, and it’s always there.

Striking the right balance between effectiveness, simplicity, and rigor for building on results like these is something we like to focus on at Equilibrium as part of providing the best possible foundation for a long-term network monitoring solution.


Random bits

6 August 2009

Galrahn’s take on Russian subs near the east coast (and why this is even in the news)

USAF is definitely going to centralize network responsibilities this time around

“The first simple programming language with a molecular-scale implementation”


Random bits

5 August 2009

More from Arbor on the Iranian firewall

Five futuristic interfaces on display at SIGGRAPH


Random bits

4 August 2009

Return to the Iranian Firewall

“every computer sold in China will have to contain the Green Dam Youth Escort software package” (sounds like China would be opening itself up to network attacks from the US along with other comers if it weren’t for self-deterrence)

“a general principle of organization for random packing [that] may provide the foundations for a theory of jammed matter”


Why Poissonian traffic models matter more now than ever, part 5

3 August 2009

To sketch how the large-deviation approach discussed in the last post in this series can actually work, consider a process with conditional rate function 1 for the first N = 10000 arrivals and q_* thereafter. Take the maximum-likelihood rate estimate over the previous n arrivals and see how the resulting putative martingales behave using the following MATLAB code:

function y=blogf(q,N,n,plotflag)

% Piecewise-constant inhomogeneous Poisson
% process at unit rate for N steps, then at
% rate q for N steps. MLE of rate obtained
% at each timestep using a window of n steps,
% then time rescaling theorem applied to
% obtain a putative unit-rate Poisson process.
% Finally the estimated martingale is formed
% and a large-deviation result applied
% to obtain the probability of the observed
% behavior at the beginning and middle
% (i.e., where the jump occurs).

% NB. There is no error checking done (e.g.,
% for n, N) and the code is far from perfect,
% but should be OK for illustrative purposes

% unit rate Poisson process
at1 = cumsum(-log(rand(1,N)));
% rate q  
at2 = cumsum(-log(rand(1,N))/q);    
at = [at1, at1(end)+at2];   % arrival times
iat = [at(1), diff(at)];    % interarrivals

% empirical rate estimate over last n arrivals
qe = zeros(1,2*N);    
for j = 1:n
 qe(j) = j/at(j);
end
for j = (n+1):2*N
 qe(j) = n/(at(j) - at(j-n));
end

% time rescaled process w/ empirical rate
tau = zeros(1,2*N);   
tau(1) = iat(1)*qe(1);
for j = 2:2*N
 tau(j) = tau(j-1) + iat(j)*qe(j);
end

% putative martingale from empirical rescaling
m = (1:2*N) - tau;  

t = 1;  % timescale
% j1 and j2 are respectively the indices
% of the first arrival further than n*t from 0
% and from the Nth arrival
j1 = find(tau > n*t, 1, 'first');
j2 = find(tau > tau(N)+n*t, 1, 'first');
% martingale differences
dm1 = m(j1) - m(1); dm2 = m(j2) - m(N);
% a, x, yy vars from large-deviation result
a1 = dm1/n; a2 = dm2/n;
x1 = a1/t + 1; x2 = a2/t + 1;   
yy = t*n;   % (note: y is output var.)
% log of asymp. prob. of fluctuations
logp1 = yy*(-x1*log(x1) + x1 - 1);   
logp2 = yy*(-x2*log(x2) + x2 - 1);

y = [logp1 logp2];

if plotflag
 figure;plot(tau,m,'k');hold;
 mmmm = [min(m) max(m)];
 plot(tau(N)*ones(1,2),mmmm,'k:');
 axis tight
 xlabel('\tau'),ylabel('N_\tau - \tau')
 s1 = 'actual rate changes from 1 to ';
 s2 = sprintf('%g',q);
 s3 = ' at ';
 s4 = sprintf('%d',N);
 s5 = 'th arrival; MLE rate over ';
 s6 = sprintf('%d',n);
 s7 = ' arrivals';
 title([s1 s2 s3 s4 s5 s6 s7]);
end

Two typical outputs are shown below.

Typical graphical output of blogf for <img src='http://s0.wp.com/latex.php?latex=q_%2A+%3D+1.5%2C+N+%3D+10000%2C+n+%3D+1000.&bg=ffffff&fg=333333&s=0' alt='q_* = 1.5, N = 10000, n = 1000.' title='q_* = 1.5, N = 10000, n = 1000.' class='latex' />” width=”300″ height=”225″ /><p class=Typical graphical output of blogf for q_* = 1.5, N = 10000, n = 1000.

As above, except <img src='http://s0.wp.com/latex.php?latex=q_%2A+%3D+.67&bg=ffffff&fg=333333&s=0' alt='q_* = .67' title='q_* = .67' class='latex' />.” width=”300″ height=”225″ /><p class=As above, except q_* = .67.

The following code snippet (which omits the necessary plotting commands) can be used inline for a quick evaluation of the behavior of the overall technique as a function of q_* and n:

% L = number of runs to average over
N = 10000; L = 40;
q = .65:.05:1.65; n = ceil(logspace(1,3,25));
for j = 1:21
 qj = q(j);
 for k = 1:25
  nk = n(k); temp = zeros(L,2);
  for l = 1:L
   temp(l,:) = blogf(qj,N,nk,0);
  end
  z1(j,k) = mean(temp(:,1));    % initial
  z2(j,k) = mean(temp(:,2));    % middle
 end
end

Using this, it’s easy to see the difference between the log-likelihood of observed fluctuations before and during an abrupt rate change.

Average log-likelihood of the initial <img src='http://s0.wp.com/latex.php?latex=n&bg=ffffff&fg=333333&s=0' alt='n' title='n' class='latex' /> steps over 40 observed runs as a function of <img src='http://s0.wp.com/latex.php?latex=n&bg=ffffff&fg=333333&s=0' alt='n' title='n' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=q_%2A&bg=ffffff&fg=333333&s=0' alt='q_*' title='q_*' class='latex' />. Note that there should be no dependence on <img src='http://s0.wp.com/latex.php?latex=q_%2A&bg=ffffff&fg=333333&s=0' alt='q_*' title='q_*' class='latex' /> in this case, and that in fact no such dependence is evident.” width=”300″ height=”225″ /><p class=Average log-likelihood of the initial n steps over 40 observed runs as a function of n and q_*. Note that there should be no dependence on q_* in this case, and that in fact no such dependence is evident.

Average log-likelihood of the <img src='http://s0.wp.com/latex.php?latex=n&bg=ffffff&fg=333333&s=0' alt='n' title='n' class='latex' /> steps immediately following a rate change over 40 observed runs. Note the tame behavior around <img src='http://s0.wp.com/latex.php?latex=q_%2A+%3D+1&bg=ffffff&fg=333333&s=0' alt='q_* = 1' title='q_* = 1' class='latex' />.” width=”300″ height=”225″ /><p class=Average log-likelihood of the n steps immediately following a rate change over 40 observed runs. Note the tame behavior around q_* = 1.

From the figures, it’s pretty clear that this kind of mathematically oriented technique can work effectively to detect and estimate the likelihood of even relatively small but abrupt rate changes quickly, without relying on heuristics.

But it can do more. Suppose the rate is piecewise linear (for convenience, as a function of arrivals), changing from 1 to q_* linearly over N arrivals and then remaining there for another N arrivals. The relevant changes to the MATLAB code basically consist of using the lines

qq = linspace(1,q,N);
at2 = cumsum(-log(rand(1,N))./qq);
at3 = cumsum(-log(rand(1,N))/q);
at12end = at1(end)+at2(end);
at = [at1, at1(end)+at2, at12end+at3];

and changing instances of 2N to 3N along with revised plotting commands.

Here the size of the window used makes a big difference:

Typical graphical output of the modified blogf code for <img src='http://s0.wp.com/latex.php?latex=q_%2A+%3D+1.5%2C+N+%3D+10000%2C+n+%3D+1000&bg=ffffff&fg=333333&s=0' alt='q_* = 1.5, N = 10000, n = 1000' title='q_* = 1.5, N = 10000, n = 1000' class='latex' />.” width=”300″ height=”225″ /><p class=Typical graphical output of the modified blogf code for q_* = 1.5, N = 10000, n = 1000.

Typical graphical output of the modified blogf code for <img src='http://s0.wp.com/latex.php?latex=q_%2A+%3D+1.5%2C+N+%3D+10000%2C+n+%3D+5000&bg=ffffff&fg=333333&s=0' alt='q_* = 1.5, N = 10000, n = 5000' title='q_* = 1.5, N = 10000, n = 5000' class='latex' />.” width=”300″ height=”225″ /><p class=Typical graphical output of the modified blogf code for q_* = 1.5, N = 10000, n = 5000.

The average likelihood after the rate begins to change for q_* = 1.5, N = 10000 is approximately 10^{-0.48} for n = 1000 and 10^{-17} for n = 5000. Of course, the rate estimation window needn’t equal the window used for measuring the likelihoods of fluctuations, but this just adds another variable to the analysis.

The overall point is that as arrivals get closer to Poissonian, so that an arrival rate can be well-defined and estimated, the picture for characterization and detection algorithms improves substantially. The next and final post in this series will give a little bit of perspective.


Follow

Get every new post delivered to your Inbox.