KL Divergence (Kullback-Leibler Divergence) is a measure of how one probability distribution diverges from a second, expected probability distribution. Extending the idea of Cross Entropy, KL Divergence quantifies the difference between two distributions by measuring the inefficiency of assuming that the distribution is $Q$ when the true distribution is $P$.
It can be thought of as the amount of information lost when $Q$ is used to approximate $P$. In other words, it tells us how much more "surprise" we would experience if we were to use the distribution $Q$ instead of the true distribution $P$.
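For discrete distributions $P$ and $Q$ over the same set of outcomes, this is commonly written as (a standard formulation, stated here for reference):

$$
D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} = H(P, Q) - H(P)
$$

where $H(P, Q)$ is the cross entropy of $P$ and $Q$, and $H(P)$ is the entropy of $P$. The second form makes the link to Cross Entropy explicit: KL Divergence is the extra number of bits (or nats) we pay on average for encoding data from $P$ using a code optimized for $Q$.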
What is KL Divergence Used For?
We use KL Divergence primarily to quantify how far an estimated distribution is from the distribution we actually care about. This is particularly useful because in many real-world cases we don't have access to the true distribution $P$, and we need to rely on an estimated distribution $Q$. KL Divergence helps us understand how well our estimated distribution $Q$ approximates the true distribution $P$.
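As a minimal sketch of the formula above, here is a small Python example that computes $D_{KL}(P \,\|\, Q)$ for two discrete distributions using NumPy. The function name and the example distributions are illustrative, not part of any particular library.

```python
import numpy as np

def kl_divergence(p, q):
    """Compute D_KL(P || Q) for two discrete distributions given as arrays of probabilities."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Terms where p(x) = 0 contribute nothing to the sum, so only keep outcomes with p(x) > 0.
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Example: a "true" distribution p and an approximation q over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # small positive value; it is 0 only when p and q are identical
print(kl_divergence(q, p))  # note: generally different, since KL Divergence is not symmetric
```

The second print highlights a common pitfall: $D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$ in general, so KL Divergence is not a distance metric.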