Cross Entropy builds on the concept of Entropy in Information by measuring the difference between two probability distributions. While entropy quantifies the uncertainty within a single distribution, cross entropy evaluates how one distribution diverges from another.

Why do we need this?

In an ideal world, we would have access to the underlying Probability Distribution used to create our sample, making entropy very easy to calculate. But in reality, we don't have this perfect Probability Distribution, and thus we need an approximation. What cross entropy allows us to do is compare the true distribution (which we may not know) with an estimated distribution (which we do know).


How Can We Quantify the Difference between Two Probability Distributions?

We extend upon the idea of cross entropy with something called Kullback-Leibler Divergence (KL Divergence), which measures how one probability distribution diverges from a second, expected probability distribution.
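
For reference, the standard definition of KL Divergence, written with the same true distribution $P$ and estimated distribution $Q$ used for cross entropy below, is:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log_b \frac{P(x)}{Q(x)}$$

It measures the extra information (in bits, nats, or hartleys, depending on the base $b$) incurred by using $Q$ in place of $P$.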

Formal Definition

Cross Entropy between two discrete probability distributions $P$ and $Q$ over the same set of events is defined as:

$$H(P, Q) = -\sum_{x} P(x) \log_b Q(x)$$

Where:

  • $P$ is the true distribution (the actual probabilities of events).
  • $Q$ is the estimated distribution (the predicted probabilities of events).
  • $b$ is the base of the logarithm, commonly 2 (bits), e (nats), or 10 (hartleys).
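
As a minimal sketch (not taken from any particular library; the function name cross_entropy and the default base of 2 are choices made for this example), the definition translates directly into Python:

```python
import math

def cross_entropy(p, q, base=2):
    """Cross entropy H(P, Q) = -sum_x P(x) * log_b Q(x).

    p and q are sequences of probabilities over the same set of events.
    Terms where P(x) == 0 contribute nothing to the sum.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            total -= px * math.log(qx, base)
    return total
```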

Example Values

Let’s say we have two distributions over the outcomes of a 3-sided die: a true distribution $P$ and an estimated distribution $Q$.

We can calculate the cross entropy by plugging both distributions into the formula above; a worked calculation with illustrative values is sketched below.
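
As an illustration, assume $P = (0.5,\ 0.25,\ 0.25)$ and $Q = (\tfrac{1}{3},\ \tfrac{1}{3},\ \tfrac{1}{3})$ (these values are assumptions chosen only for this example). Using base 2:

$$H(P, Q) = -\left(0.5 \log_2 \tfrac{1}{3} + 0.25 \log_2 \tfrac{1}{3} + 0.25 \log_2 \tfrac{1}{3}\right) = \log_2 3 \approx 1.58 \text{ bits}$$

For comparison, the entropy of $P$ itself is $H(P) = 1.5$ bits, so using $Q$ instead of $P$ costs roughly an extra $0.08$ bits per event.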

What Does This Mean?

The value of cross entropy indicates how well the predicted distribution $Q$ approximates the true distribution $P$. A lower cross entropy value suggests that $Q$ is a better approximation of $P$, while a higher value indicates greater divergence between the two distributions.
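
Using the cross_entropy sketch from above (the candidate distributions here are again illustrative assumptions), this ordering shows up directly:

```python
p = [0.5, 0.25, 0.25]      # assumed true distribution
q_close = [0.4, 0.3, 0.3]  # estimate fairly close to p
q_far = [0.1, 0.1, 0.8]    # estimate far from p

# The closer estimate yields the lower cross entropy.
print(cross_entropy(p, q_close))  # ~1.53 bits
print(cross_entropy(p, q_far))    # ~2.57 bits
```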

Important

Cross Entropy is NOT Symmetric: $H(P, Q)$ is not necessarily equal to $H(Q, P)$. This asymmetry reflects the fact that the true and estimated distributions play different roles in the calculation.
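
Continuing with the illustrative distributions used earlier, swapping the arguments of the cross_entropy sketch gives a different value, which confirms the asymmetry:

```python
p = [0.5, 0.25, 0.25]  # illustrative "true" distribution
q = [1/3, 1/3, 1/3]    # illustrative uniform estimate

print(cross_entropy(p, q))  # H(P, Q) ~= 1.58 bits
print(cross_entropy(q, p))  # H(Q, P) ~= 1.67 bits -- a different value
```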


Properties of Cross Entropy

One important property is that the cross entropy is always greater than or equal to the entropy of the true distribution: $H(P, Q) \ge H(P)$. This is because using an estimated distribution $Q$ can never reduce the uncertainty inherent in the true distribution $P$.
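
This lower bound follows from a standard identity that decomposes cross entropy into entropy plus KL Divergence:

$$H(P, Q) = H(P) + D_{\mathrm{KL}}(P \,\|\, Q) \ge H(P)$$

since $D_{\mathrm{KL}}(P \,\|\, Q) \ge 0$, with equality exactly when $Q = P$.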