Entropy is a simple concept with a lot of depth: it is a measure of the uncertainty or randomness in a system. In information theory, entropy quantifies the amount of information contained in a message or data source. The more uncertain or random the data, the higher the entropy.


Quantifying Surprise

Entropy, in a sense, is the expected value of surprise. If you have a fair coin, the outcome of a flip is uncertain, and the flip thus has high entropy. If you have a biased coin that always lands on heads, the outcome is certain, and the flip has low entropy.

Surprise should be high when something unlikely happens, and low when something likely happens. What distinguishes surprise from probability is that surprise is inversely related to probability, and that the surprise of compound events is additive rather than multiplicative.

To achieve this, we define surprise as the negative logarithm of the probability:

$$S(x) = -\log_b P(x)$$

Where $x$ is the event, $P(x)$ is the probability of the event, and $b$ is the base of the logarithm. The base determines the unit of measurement for surprise. Common bases are 2 (bits), e (nats), and 10 (hartleys).
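As a minimal sketch of this definition (the function name `surprise` is just for illustration, and base 2 is assumed so the result is in bits), the surprise of an event can be computed directly from its probability:

```python
import math

def surprise(probability: float, base: float = 2) -> float:
    """Surprise (self-information) of an event: -log_b(p)."""
    return -math.log(probability, base)

# A fair coin flip: p = 0.5 -> exactly 1 bit of surprise.
print(surprise(0.5))    # 1.0

# A near-certain event carries almost no surprise.
print(surprise(0.999))  # ~0.0014 bits

# An unlikely event carries a lot of surprise.
print(surprise(0.001))  # ~9.97 bits
```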

Consider, for example, someone rolling a 1 on a 6-sided die 3 times in a row. The probability of that sequence is:

$$P = \left(\tfrac{1}{6}\right)^3 = \tfrac{1}{216} \approx 0.0046$$

But the surprise (measured in bits) would instead be:

$$S = -\log_2\left(\tfrac{1}{6}\right)^3 = 3 \times \left(-\log_2 \tfrac{1}{6}\right) = 3 \times \log_2 6 \approx 7.75 \text{ bits}$$

In essence, the logarithm turns the multiplication of probabilities into an addition of surprises. In the case of three rolls, the surprise is simply three times that of a single roll, whereas the probability of seeing that particular sequence is the single-roll probability cubed, $\left(\tfrac{1}{6}\right)^3$.
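To see this concretely, the short sketch below (reusing the illustrative `surprise` helper from above) checks that the surprise of three consecutive 1s is exactly three times the surprise of a single 1, while the probabilities multiply:

```python
import math

def surprise(probability: float, base: float = 2) -> float:
    return -math.log(probability, base)

p_single = 1 / 6               # probability of rolling a 1 once
p_triple = p_single ** 3       # probabilities multiply: (1/6)^3 = 1/216

s_single = surprise(p_single)  # ~2.585 bits
s_triple = surprise(p_triple)  # ~7.755 bits

# Surprise adds where probability multiplies.
assert math.isclose(s_triple, 3 * s_single)
print(p_triple, s_triple)      # 0.00463..., 7.754...
```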


Entropy of a Random Variable

We obtain entropy by taking the expected value of the surprise over all possible outcomes of a random variable.

This involves first multiplying the surprise of each outcome by its respective probability, and then summing these products over all possible outcomes.

Defined formally, the entropy of a discrete random variable $X$ with possible outcomes $x_1, \ldots, x_n$ and corresponding probabilities $P(x_1), \ldots, P(x_n)$ is given by:

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i)$$

This reframes the surprise function into the entropy function by taking the weighted average of surprise across all outcomes.
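As a rough sketch of this definition (the `entropy` function and the example distributions are just for illustration, with base 2 assumed), the weighted average of surprise can be computed directly:

```python
import math

def entropy(probabilities, base: float = 2) -> float:
    """Entropy H(X) = -sum(p * log_b(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# A fair coin: maximum uncertainty for two outcomes -> 1 bit.
print(entropy([0.5, 0.5]))    # 1.0

# A heavily biased coin: the outcome is nearly certain -> close to 0 bits.
print(entropy([0.99, 0.01]))  # ~0.08

# A fair six-sided die: log2(6) ~ 2.585 bits.
print(entropy([1/6] * 6))     # ~2.585
```

This mirrors the fair versus biased coin example from earlier: the fair coin has the highest possible entropy for two outcomes, while the biased coin's entropy collapses toward zero.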

What Entropy Tells Us

Entropy provides a measure of the uncertainty or unpredictability associated with a random variable. A higher entropy value indicates greater uncertainty, while a lower entropy value indicates less uncertainty.
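For instance, under the same illustrative `entropy` sketch as above, a spread-out distribution scores higher than one where a single outcome dominates:

```python
import math

def entropy(probabilities, base: float = 2) -> float:
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# The more evenly spread the distribution, the higher the entropy.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits -- the maximum for four outcomes
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.24 bits -- one outcome dominates
```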

While not covered in this introduction, entropy is applied in many contexts, chief among them comparing probability distributions, optimizing data encoding schemes, and measuring information gain in machine learning.