Entropy: the measure of our ignorance

Mar 14, 2026 · #information-theory #entropy #philosophy #complexity

Entropy is usually mistaken for disorder. It is better understood as a measure of what we do not know about a system — the bridge that ties together thermodynamics, information, and the physical cost of forgetting.

Figure: an ordered lattice of particles dissolving into a scattered cloud. Entropy counts the configurations we cannot tell apart: the gap between what we measure and what we ignore.

Ask ten people what entropy is and nine will answer “disorder.” It is the popular image: the room that never tidies itself, the coffee that cools, the universe drifting toward chaos. The picture is not wrong, but it hides the essential point. Entropy is not a property of objects. It is a property of our knowledge of objects — a measure of uncertainty. Once that shift is made, three disciplines that seemed unrelated turn out to be the same thing seen from three angles.

Boltzmann: counting the ways of being

Ludwig Boltzmann, at the end of the nineteenth century, attacked thermodynamics with a radical idea. A gas in a box is something like 10²³ molecules. At the macroscopic scale we measure only a handful of quantities: temperature, pressure, volume. That is the macrostate. But an astronomically large number of distinct microscopic configurations (the positions and velocities of every molecule, known as microstates) produce exactly the same macroscopic reading.

His formula, carved on his tombstone in Vienna, reads S = k · ln(W): the entropy S equals Boltzmann’s constant k times the natural logarithm of W, the number of microstates compatible with the observed macrostate. In plain language, entropy is the logarithm of the number of ways a system can be arranged without changing what we observe. The logarithm is there for a simple reason: the number of combinations multiplies when independent systems are joined, while we want a quantity that adds up. Taking the logarithm turns “how many arrangements” into “how much information,” and information adds. Take the logarithm in base 2 and the unit is the bit; the natural logarithm merely rescales it.
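The additivity argument can be checked numerically. The sketch below is only an illustration: the two microstate counts are made-up numbers, not values for any real system.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K

def boltzmann_entropy(w: int) -> float:
    """S = k * ln(W) for W equally likely microstates."""
    return K_B * math.log(w)

# Hypothetical microstate counts for two independent systems.
w_a, w_b = 10**6, 10**9

# Joining independent systems multiplies the counts...
s_joint = boltzmann_entropy(w_a * w_b)
# ...while the entropies simply add.
s_sum = boltzmann_entropy(w_a) + boltzmann_entropy(w_b)

# The same count expressed in bits (base-2 logarithm, no physical constant).
bits_a = math.log2(w_a)

print(s_joint, s_sum, bits_a)
```

The two entropy values agree up to floating-point rounding, which is the whole reason the logarithm is in the formula.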

Why does coffee cool? Not through any mysterious force. Simply because there are overwhelmingly more microstates in which energy is spread evenly than microstates in which it stays concentrated in the cup. The system is not “seeking” disorder; it wanders at random among its possible configurations, and the “spread-out” configurations are so vastly more numerous that it almost certainly lands in one of them. The arrow of time — that sense that the past differs from the future — is statistical. It is not impossible for the coffee to reheat itself spontaneously; it is just absurdly improbable.
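A toy model makes the counting concrete. Assume N particles, each of which can sit in the left or right half of a box; the macrostate is just how many are on the left. This is a deliberately simplified stand-in for the coffee cup, and N = 1000 is an arbitrary choice.

```python
import math

N = 1000  # hypothetical number of particles in the toy model

def multiplicity(n_left: int) -> int:
    """Number of microstates with exactly n_left particles in the left half."""
    return math.comb(N, n_left)

total_microstates = 2 ** N

# Microstates within 5% of N of a perfectly even split (450 to 550 on the left).
near_even = sum(multiplicity(n) for n in range(450, 551))
fraction_near_even = near_even / total_microstates

# The fully "concentrated" macrostate (everything on the left) has one microstate.
all_left = multiplicity(N)

print(fraction_near_even, all_left)
```

Even at a thousand particles, well over 99% of all microstates sit close to the even split, against a single microstate for the fully concentrated case. At 10²³ particles the imbalance is beyond any practical meaning of "improbable."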

It is worth resisting the urge to memorise the symbols. The formula is only notation. The image fits in a sentence: entropy counts the number of ways a system can be arranged without changing what you observe of it.

Shannon: surprise written as an equation

In 1948, Claude Shannon at Bell Labs was not interested in gases but in an engineer’s problem: how much information does a message carry, and how far can it be compressed? He defined the entropy of a source as the average amount of surprise per symbol: H = − Σ pᵢ · log₂(pᵢ), measured in bits.

Read past the symbols and the formula simply says “average surprise.” An event of probability p carries more surprise the rarer it is; you weight each surprise by how likely it is to occur, you sum, and you obtain the average uncertainty of the source. Two limiting cases make everything clear. A rigged coin that always lands heads carries zero surprise: H = 0. There is no information at all — you already know the outcome, so there is nothing to transmit. A perfectly fair coin carries H = 1 bit: maximum uncertainty for two outcomes, and each toss teaches you exactly one bit.
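The two limiting cases are easy to reproduce. This is a minimal sketch of the formula as stated above; the function name is an illustrative choice.

```python
import math

def shannon_entropy(probs):
    """Average surprise in bits: sum of p * log2(1/p) over outcomes with p > 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

rigged = shannon_entropy([1.0, 0.0])   # always heads
fair = shannon_entropy([0.5, 0.5])     # fair coin
biased = shannon_entropy([0.9, 0.1])   # somewhere in between

print(rigged, fair, biased)
```

The rigged coin gives 0 bits, the fair coin exactly 1 bit, and a 90/10 coin a little under half a bit. Outcomes with zero probability are skipped, following the usual convention that they contribute nothing to the sum.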

Shannon’s genius was to show that H is also the limit of lossless compression: on average, no encoding can spend fewer than H bits per symbol. When a tool like gzip shrinks a log file tenfold, it is exploiting the fact that the true entropy of that text is very low, because it is repetitive and predictable. Pure randomness, by contrast, is incompressible: a file of white noise will not zip down, because its entropy is already maximal. To compress is to remove redundancy until you reach the entropy floor.
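The contrast is easy to observe with Python's standard zlib module, which implements DEFLATE, the algorithm gzip uses. The log line below is invented for the example, and the exact ratios will vary from run to run for the random input.

```python
import os
import zlib

# A hypothetical, perfectly repetitive log: the same line ten thousand times.
repetitive = b"2026-03-14 12:00:00 INFO heartbeat ok\n" * 10_000

# Random bytes of the same length, standing in for white noise.
noise = os.urandom(len(repetitive))

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)

print(f"repetitive log: {compression_ratio(repetitive):.4f}")
print(f"random noise:   {compression_ratio(noise):.4f}")
```

The repetitive log shrinks to a tiny fraction of its size, while the random bytes do not shrink at all and typically grow by a few bytes of container overhead.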

The same object under two names

The kinship between Boltzmann and Shannon is not an analogy; it is an identity of structure. The same logarithm, the same probability-weighted sum, the same idea: count the possibilities, weight them by their probability. Thermodynamic entropy is just the Shannon entropy of the distribution over microstates, up to the constant factor k·ln 2 that converts bits into joules per kelvin. When all W microstates are equally likely, each has probability 1/W, the weighted sum collapses to log W, and Boltzmann’s formula falls out as the special case.
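The identity can be verified for a small case. The sketch below uses the Gibbs form of thermodynamic entropy, S = −k Σ pᵢ ln(pᵢ), which is the standard probability-weighted generalisation of Boltzmann's formula; W = 1024 is an arbitrary illustrative count.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K

def shannon_bits(probs):
    """Shannon entropy in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def gibbs_entropy(probs):
    """Thermodynamic entropy in J/K: k * sum of p * ln(1/p)."""
    return K_B * sum(p * math.log(1 / p) for p in probs if p > 0)

W = 1024                      # hypothetical number of microstates
uniform = [1 / W] * W         # every microstate equally likely

s_gibbs = gibbs_entropy(uniform)
s_boltzmann = K_B * math.log(W)                           # S = k ln W
s_from_bits = shannon_bits(uniform) * K_B * math.log(2)   # bits -> J/K

print(s_gibbs, s_boltzmann, s_from_bits)
```

All three numbers coincide up to rounding: the uniform distribution over 1024 microstates carries 10 bits, and multiplying by k·ln 2 recovers k·ln(W).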

This is the decisive move: entropy is not in the system, it lives in the relationship between the system and an observer. The coffee does not contain “disorder” in itself. It has a macrostate — what we measure — and a multitude of microstates we ignore. Entropy quantifies precisely what we ignore.

James Clerk Maxwell had sensed this with his famous demon. Imagine a being that knew the position and velocity of every molecule; it could sort fast molecules from slow ones, hot from cold, without spending energy, apparently violating the second law of thermodynamics. The modern resolution, due to Rolf Landauer and Charles Bennett, is beautiful: the demon must eventually erase the information it accumulates in its memory, and erasing one bit dissipates at least kT·ln 2 of energy as heat, where T is the temperature of the surroundings. Information is not an abstraction. It is physical. To erase is to heat.
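To get a feel for the size of the Landauer bound, it helps to plug in numbers. The sketch assumes room temperature of 300 K and, as an arbitrary example, the erasure of one gigabyte; real hardware dissipates many orders of magnitude more than this theoretical floor.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K

def landauer_limit_joules(temperature_kelvin: float, bits: float = 1.0) -> float:
    """Minimum heat dissipated when erasing `bits` bits at the given temperature."""
    return bits * K_B * temperature_kelvin * math.log(2)

per_bit = landauer_limit_joules(300.0)                 # assumed room temperature
per_gigabyte = landauer_limit_joules(300.0, bits=8e9)  # 1 GB = 8e9 bits

print(f"{per_bit:.3e} J per bit, {per_gigabyte:.3e} J per gigabyte")
```

At 300 K the floor is a few zeptojoules per bit: tiny, but strictly greater than zero, which is all the demon argument needs.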

What this changes

The shift from “disorder” to “ignorance” is not pedantry; it reorganises how we think about several practical problems at once.

The first is compression. Any stream of data — text, sensor readings, the lines a machine logs about itself — is a Shannon source. A line that repeats identically carries almost no entropy: it teaches nothing, which is exactly why it compresses to nothing and why it is useless as a signal. The rare line, the error that surfaces once in ten thousand events, carries high surprise: many bits, high value. The honest way to think about retention and sampling is therefore not by volume but by surprise — keep the rare aggressively, sample the predictable brutally. Observing a system well is not displaying everything; it is compressing intelligently, throwing away redundancy and keeping surprise.
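A retention rule based on surprise can be sketched in a few lines. Everything here is hypothetical: the function names, the 8-bit threshold, the one-in-a-thousand sampling rate, and the choice to estimate probabilities from the frequencies within a single batch. A production system would more likely keep running estimates over a stream.

```python
import math
from collections import Counter

def surprisal_bits(events):
    """Surprise of each distinct event, log2(1/p), with p estimated from the batch."""
    counts = Counter(events)
    total = len(events)
    return {event: math.log2(total / count) for event, count in counts.items()}

def retain(events, threshold_bits=8.0, sample_every=1000):
    """Keep every high-surprise event; keep 1 in `sample_every` of the rest."""
    bits = surprisal_bits(events)
    seen = Counter()
    kept = []
    for event in events:
        if bits[event] >= threshold_bits:
            kept.append(event)              # rare: always keep
        else:
            if seen[event] % sample_every == 0:
                kept.append(event)          # predictable: sample sparsely
            seen[event] += 1
    return kept

# An error that surfaces once in ten thousand events.
stream = ["heartbeat ok"] * 9_999 + ["disk failure"]
kept = retain(stream)

print(len(stream), "->", len(kept))
print(surprisal_bits(stream)["disk failure"])
```

On this stream the single failure line carries a little over 13 bits and is always kept, while the ten thousand heartbeats, each worth a tiny fraction of a bit, are thinned to a handful of samples.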

The second is the temptation to record everything. Landauer’s principle is a quiet warning against it. A display that shows all of the data has reduced nothing; it has merely moved the chaos from the machine into the observer’s head. Every one of those bits must be held somewhere and, by Landauer’s principle, paid for in heat when it is finally erased, without ever having bought a reduction in uncertainty. The demon teaches that knowledge is not free, and that the genuinely useful act is not accumulation but the disciplined discarding of the predictable.

There is even a hardware echo of all this. For decades, engineers have fought thermal noise — entropy itself — as the enemy of reliable computation. A more recent line of research turns that on its head, treating physical randomness as a resource for certain probabilistic workloads rather than something to suppress. Whatever becomes of those efforts, the framing is the same one Maxwell stumbled on: noise is not simply waste, and information has a thermodynamic price.

So the next time the coffee cools or a file compresses, it is worth remembering that the two are the same phenomenon in different clothes. In both cases we are measuring hidden possibilities: the coffee has a crowd of microscopic configurations we cannot see, the file has predictable patterns we can throw away. Entropy is what remains once we have removed everything we could have guessed. It is, in the most literal sense, the measure of our ignorance.

