In my book there is nothing as good as real data produced by a red team, except captured data produced by a red team from NSA (even if it’s not really annotated or labeled well and their MO isn’t quite what it would be in practice). When I was involved with a past IDS research effort in the early 2000s there was a great deal of emphasis on the DARPA/Lincoln Labs datasets, which were old even back then. They were a lot better than nothing, but one thing that concerns me and most everyone else is the lack of good common data, let alone reproducible testbeds like the National Cyber Range is supposed to provide. So it is nice to see that the DARPA/LL datasets are dead–long live the 2009 CDX datasets.
(Wireshark opened the small border data capture fine on my laptop, so don’t let the lack of .pcap extensions bother you.)
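If you would rather confirm what an extensionless file is before handing it to Wireshark, looking at its first four bytes is usually enough. What follows is only a minimal sketch: the function names are hypothetical and nothing here comes from the CDX distribution. It relies on the standard magic numbers of the libpcap and pcapng file formats.

```python
from typing import Optional

# Magic numbers from the libpcap savefile and pcapng formats.
# The byte order of the magic tells you the byte order of the writer.
CAPTURE_MAGICS = {
    b"\xa1\xb2\xc3\xd4": "pcap (big-endian, microsecond timestamps)",
    b"\xd4\xc3\xb2\xa1": "pcap (little-endian, microsecond timestamps)",
    b"\xa1\xb2\x3c\x4d": "pcap (big-endian, nanosecond timestamps)",
    b"\x4d\x3c\xb2\xa1": "pcap (little-endian, nanosecond timestamps)",
    b"\x0a\x0d\x0d\x0a": "pcapng",
}


def sniff_capture_format(header: bytes) -> Optional[str]:
    """Return a description of the capture format, or None if unrecognized."""
    return CAPTURE_MAGICS.get(header[:4])


def sniff_capture_file(path: str) -> Optional[str]:
    """Read the first four bytes of a file and classify them (hypothetical helper)."""
    with open(path, "rb") as f:
        return sniff_capture_format(f.read(4))
```

Calling `sniff_capture_file` on a downloaded capture returns one of the descriptions above, or `None` if the file is something else entirely (a compressed archive, say), in which case it needs unpacking first.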
But one often-implied corollary of having common or reproducible input data troubles me. Some folks have got the idea that it is possible to scientifically evaluate computer security systems. Even with good input data, I don’t believe such a thing is really possible except in an extremely narrow sense. Let me explain by way of analogy.
Suppose someone came to you with a box of padlocks of the same model and asked you to scientifically evaluate the security of that padlock model. There are a few things you could do that would be obvious. You could test mechanical properties scientifically, asking questions like: How much force does it take applied in such-and-such a way to produce a mechanical failure of the lock? What is the dominant failure mode that results? But it is very implausible to imagine that you could evaluate all the possible failure modes–and hence the actual security of the lock–scientifically.
Sticking with the mechanical failure modes: what if someone decides to use acid to dissolve the lock? or liquid nitrogen to make it brittle? or heats it with an acetylene torch? And maybe a cold, brittle lock is easier to pick; or a hot, ductile lock is…you get the picture. And this doesn’t even begin to address lockpicking in all its forms, which is equal parts art and science.
(BTW/FWIW: one of my favorite episodes from college involves breaking into a room [that I was allowed to be in] that was secured with a fancy keypad lock system using nothing more than a piece of string from an interoffice envelope. It was after a power outage, and the keypad was inoperative, but the folks that installed the lock didn’t think about a very simple mechanical failure mode.)
In the real world you can usually expect a combinatorial explosion of possible failure modes that would have to be tested to assure security. Even in quantum cryptography people rightly worry about things that aren’t in the formal protocols, like efficiencies and TEMPEST-type issues with photon detectors. One of the reasons people are so excited about quantum crypto in the first place is that it is, among other things, a truly credible attempt to use physical theory to reduce the number of failure modes in a security protocol. And one of the reasons I don’t bother to pay attention to formal security proofs outside of cryptography is that their assumptions are never credible to a degree comparable to the Bell inequalities.
This is not to say that security systems shouldn’t be tested–of course they should (especially if there is a “proof” of security)–but it doesn’t make sense to read too much into the results if they’re good. (If your results from evaluating a security system are bad, then that security system is not for you, regardless of why.) In science a hypothesis can never be proved, only disproved. And in security evaluation a system can never be proven secure, only broken. The difference is that in science the hypotheses can be deductively identified and tailored to test good theories that seek to reflect an underlying objective truth of big-n Nature; in security evaluation the system can only be used to test attacks that seek to reflect the ingenuity of one particular set of red team tactics, for which there is often no underlying objective validity, just a common-sense notion of what ought to be done. The domain of applicability of any security evaluation is fundamentally limited because there is no way to come up with a scientific theory of security. Science typically deals with establishing and understanding regularities in phenomena, while security evaluation typically deals with the opposite.
I was hoping to be able to (but can’t) make it to a meeting in Seattle at the end of the month that is trying to produce
progress in the area of Quantifiable Scientific Evaluation of CyberSecurity research. Currently, there is no well understood scientific standard used to guage [sic] the quality of research results in this area. Instead, decisions are made by program committees and journal editors. Also, experimental results are often not repeatable, sometimes due to the proprietary nature of the code or the privacy of the data. This meeting seeks to establish the beginnings of an agreed-upon set of scientific standards whereby progress can be measured, and identify barriers to such standards.
Since I can’t be there, I will just say this: Concentrate on getting good, normalized inputs and outputs for comparative security evaluations. That is plenty hard enough, even though it is not science except in a trivial sense. If a goal is to use nontrivial science in security research, try applying ideas from science (like immunology or my favorite, statistical physics) and mathematics in the development of engineering principles for security systems–where it can be of some benefit–rather than in the evaluation of systems, where anything nontrivial that can be done might be valid and statistically significant and of practical engineering value, but is still probably not scientific.
Anyway, my hat is off to the CDX guys for putting those pcap files and logs up.
Securing the Information Highway
20 October 2009
The November/December issue of Foreign Affairs (unfortunately not yet available online as of this writing) has an eponymous piece by Wesley Clark and Peter Levin: in it, they write that
Clark and Levin recount William Safire’s claim that a 3-kiloton explosion of a Siberian natural gas pipeline in 1982 (“the most monumental non-nuclear explosion and fire ever seen from space”) was the direct consequence of a Trojan inserted into Canadian SCADA software that the CIA allowed the KGB to steal. They recirculate the rumor that the Israeli destruction of a purported Syrian nuclear facility in 2007 was facilitated by a cyberattack targeting Syrian air defense systems. (This blog has linked to other reports of capabilities along similar lines, such as this one.)
But their real focus (not surprisingly, given Levin’s history as founder of a hardware outfit specializing in the area) is on the problem of validating hardware. DoD has been very concerned with the idea of hardware Trojans the last few years. Nobody in the military/intelligence-industrial complex wants to take it on faith that chips that are manufactured in China or Taiwan don’t have backdoors. There are apparent precedents for the hardware Trojan such as old reports involving Crypto AG. So NSA started up a trusted foundry and DARPA started the TRUST program (whose PM funded some of my research some years back, so I applaud his taste on both counts). But that leaves the vast majority of chips in network components still unaccounted for, including a large number of counterfeit chips.
Clark and Levin propose an emphasis on reconfigurable hardware (such as FPGAs) and the sort of immunological paradigm started by the Forrest group at UNM as an example of a sound defensive strategy. The practical utility of the paradigm for network defense (as compared to its undeniable conceptual elegance) is not clear to me, but then again I’m obviously a partisan when it comes to the best scientific principles for designing network defense infrastructure. Still, the ideas of using reconfigurable hardware and of avoiding a computational and network monoculture, an aim that goes hand-in-hand with immunological principles, are sound ones that I’ve agreed with for some time. I gained an appreciation of the benefits of FPGAs from performing research on algorithms for reconfigurable computing architectures some years back, and at a conference last year I got into a brief argument on the security dangers of monocultures with a government sysadmin who lauded the monolithic computing infrastructure he maintained. So it’s not a stretch to say that I am extremely sympathetic to their point of view.
Clark and Levin close by highlighting the need for open infrastructure (both reconfigurable hardware and open source software), and insofar as it can be implemented, this is entirely the correct approach to technological security of any form. As Reagan said: “trust, but verify”.
update 10/23: The article is available here (subscription required).