Feynman called the canonical ensemble

> the summit of statistical mechanics, and the entire subject is either the slide-down from this summit, as the principle is applied to various cases, or the climb-up to where the fundamental law is derived and the concepts of thermal equilibrium and temperature $T$ clarified.

In short, the canonical ensemble is the probabilistic description of a system in thermal equilibrium with its environment. It is expressed through the Boltzmann-Gibbs distribution (which in accordance with the more general mathematical usage I will simply call the Gibbs distribution):

$$p_k = \frac{e^{-E_k/k_B T}}{Z},$$

where $E_k$ is the energy of the *k*th possible state of the system (for convenience, assume a finite number of such possible states), the partition function that serves as a normalizing factor on the right hand side above is given by

$$Z := \sum_k e^{-E_k/k_B T},$$

and where $k_B$ is the Boltzmann constant.
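The formula above is easy to compute for a finite state space. A minimal sketch in natural units (the energy values are arbitrary placeholders, and `gibbs` is just an illustrative helper, with $k_B = 1$ by default):

```python
import numpy as np

def gibbs(energies, T, k_B=1.0):
    """Gibbs distribution p_k = exp(-E_k / (k_B T)) / Z over a finite state space.

    Subtracting the largest exponent before exponentiating avoids overflow;
    the shift cancels in the ratio, since energy is only defined up to an
    additive constant anyway.
    """
    x = -np.asarray(energies, dtype=float) / (k_B * T)
    w = np.exp(x - x.max())
    return w / w.sum()

p = gibbs([0.0, 1.0, 2.0], T=1.0)  # lower-energy states are more probable
```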

There is a quick derivation of the Gibbs distribution for a finite system from the basic postulate that **the probability of a state depends only on its energy**. I have seen elements of it in physics textbooks, but invariably I have found that the derivations introduce a function that counts the number of system states at a given energy. That approach has the clear advantage of motivating the introduction of entropy, but it also has the cost of clouding the essence of the derivation itself.

What follows is a quick derivation of the Gibbs distribution along different lines. It is not completely rigorous as presented here, and I cannot recall having seen it anywhere else, but it is almost certainly out there somewhere. (As usual in cases like this, I will give $.50 and a Sprite to the first person who can show me an essentially identical argument in the literature: claim your prize in a comment.)

The key to the derivation is that only differences in energy can be measured, or as Feynman put it in his derivation, “energy is only defined up to an additive constant”. This and the basic postulate imply that

$$p_k = \frac{f(E_k + E_0)}{\sum_j f(E_j + E_0)}$$

for some function $f$ and arbitrary $E_0$. Define

$$g(E_0) := \frac{\sum_j f(E_j + E_0)}{\sum_j f(E_j)}$$

and note that $g(0) = 1$ by definition. It follows that

$$f(E_k + E_0) = g(E_0) f(E_k).$$

Therefore

$$g(E_0 + E_1) f(E_k) = f(E_k + E_0 + E_1) = g(E_0) f(E_k + E_1) = g(E_0) g(E_1) f(E_k),$$

implying that

$$g(E_0 + E_1) = g(E_0) g(E_1),$$

and in turn, since $g(0) = 1$ and $g$ is positive and continuous,

$$g(E) = e^{cE}$$

for some constant $c$. As a result,

$$f(E_k) = f(0) e^{c E_k}.$$

Setting, without loss of generality,

$$f(0) = 1$$

and

$$c = -\beta$$

produces the Gibbs distribution so long as the temperature is *defined* via $\beta \equiv 1/k_B T$. Note also that

$$g(E_0) = \frac{\sum_j e^{-\beta(E_j + E_0)}}{\sum_j e^{-\beta E_j}} = e^{-\beta E_0},$$

so that $f(E_k + E_0) = g(E_0) f(E_k)$ as required for the self-consistency of the argument. And although the derivation here is only appropriate for a fixed $\beta$, this amounts to the canonical ensemble.
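The two facts the derivation turns on (shifting all energies by a constant leaves the distribution unchanged, and the exponential is multiplicative over energy shifts) are easy to verify numerically. A quick sanity check in natural units, with an arbitrary made-up spectrum:

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.uniform(0.0, 5.0, size=10)  # arbitrary finite energy spectrum
beta = 0.7                           # inverse temperature, natural units

def p(energies):
    """Gibbs probabilities at fixed beta."""
    w = np.exp(-beta * np.asarray(energies))
    return w / w.sum()

# Shifting every energy by a constant E0 leaves the distribution unchanged:
# this is exactly the invariance the derivation starts from.
for E0 in (0.0, 1.5, -3.0):
    assert np.allclose(p(E), p(E + E0))

# The multiplicative property g(E0 + E1) = g(E0) g(E1) for g(E) = exp(-beta E):
g = lambda E0: np.exp(-beta * E0)
assert np.isclose(g(1.0 + 2.0), g(1.0) * g(2.0))
```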

In effect, this way of arriving at the Gibbs distribution takes the notion of temperature as a primitive, compared to the usual way which takes entropy as a primitive. There is a surprisingly good reason to do this: entropy can be introduced, defined, and applied in much broader contexts than physics, precisely because it is abstract. Temperature, on the other hand, has no obvious information-theoretical interpretation, and more often than not its physical interpretation has to be given in terms of the average kinetic energy of a gas particle in thermal equilibrium with whatever system is actually under consideration. That is, the meaning of temperature is only really clear in relation to a gas, though as the well-known kinetic theorist Harold Grad put it:

> Any dynamical system can be used as a thermometer to measure the temperature of another large enough system.

Hasok Chang’s recent book *Inventing Temperature: Measurement and Scientific Progress* and Lawrence Sklar’s wonderful *Physics and Chance: Philosophical Issues in the Foundation of Statistical Mechanics* both discuss the nature of temperature, with complementary emphases.

One of the things that we’ve done at EQ is to define a sensible notion of temperature for network traffic and demonstrate its utility for anomaly detection. The essential ideas and some actual data are detailed in a paper available from the downloads section of our website (note that this is a preprint addressed to physicists, and it’s still a work in progress). The question “what is the temperature of a network?” doesn’t have a “right answer”, but neither does the question “what is the entropy of X?” Even so, there are ways to define “appropriate” temperatures and/or entropies, some better than others. A 2000 paper by Mark Burgess of cfengine fame was among the first places that ideas like this were publicly applied to network security, though Burgess had floated these ideas a few years before then, and EQ’s technology drew a lot of lessons from the later stages of an effort initiated by Dave Ford at NSA that was declassified around the same time.

Because temperature is usually only defined in thermal equilibrium, where the Gibbs distribution applies, it makes sense to go in the opposite direction and use the Gibbs distribution to define an effective temperature for systems, whether or not they are actually in equilibrium. This is basically what we do, and it turns out that there is a unique way to do it provided that a characteristic timescale for the system is known. In situations like these I take to heart a comment of Henry McKean, a mathematician well known for contributions to probability theory and statistical physics (among other things), that
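The details of EQ's construction are in the paper; purely as an illustration of the general idea, here is one generic way to assign an effective inverse temperature to empirical data: choose the $\beta$ whose Gibbs distribution reproduces the observed mean energy (which is also the maximum-likelihood fit for an exponential family). Everything here is hypothetical — the `effective_beta` helper, the natural units, and the made-up energy values — and this is not a description of EQ's actual method:

```python
import numpy as np

def effective_beta(energies, observed_mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Bisect for the beta whose Gibbs mean energy equals observed_mean.

    The Gibbs mean <E>_beta is strictly decreasing in beta, so bisection
    works whenever observed_mean lies strictly between min(E) and max(E).
    """
    E = np.asarray(energies, dtype=float)

    def mean_energy(beta):
        w = np.exp(-beta * (E - E.min()))  # shift exponent for stability
        return (w * E).sum() / w.sum()

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_energy(mid) > observed_mean:
            lo = mid  # mean still too high: need a larger beta
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical "states" of some observed quantity, with made-up energies:
E = np.array([0.0, 1.0, 2.0, 3.0])
beta = effective_beta(E, observed_mean=1.2)
```

For this spectrum the uniform distribution has mean energy 1.5, so an observed mean of 1.2 (biased toward low-energy states) yields a positive effective $\beta$, i.e. a finite positive effective temperature.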

> One of the faults of mathematicians is [that] when physicists give them an equation, they take it absolutely seriously.

In a nutshell, statistical physics is great at extracting the few most relevant parameters describing gigantic volumes of data. The detailed microstate of a cup of water encodes many orders of magnitude more information than can go over a 100 Gbps link in a year, and temperature, pressure, and volume do a pretty good job of describing the important features. It’s not much of a stretch to apply the same ideas to network traffic.
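The orders of magnitude in that comparison are easy to check. A rough back-of-the-envelope sketch (a 250 ml cup of water, and even a conservative single bit per molecule, which vastly understates a true microstate):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
link_bits = 100e9 * SECONDS_PER_YEAR       # ~3e18 bits over a 100 Gbps link in a year

# ~250 g of water / 18 g/mol, times Avogadro's number: ~8e24 molecules.
molecules = 250 / 18 * 6.022e23

# Even one bit per molecule exceeds the link capacity by a factor of ~1e6,
# and specifying positions and momenta takes far more than one bit each.
ratio = molecules / link_bits
```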