Jensen's Inequality is probably one of the most fundamental building blocks of fields like Information Theory and, later, Variational Inference. In layman's terms, it states that for a convex function, the function of the average is less than or equal to the average of the function. This has deep implications for understanding expectations and variances in probabilistic systems.

Formal Definition
1. Linear Edition
If $f$ is convex (shaped like a smile ∪), then for any two points $x_1, x_2$ in the domain of $f$ and a weight $t \in [0, 1]$:
$$f(t x_1 + (1 - t) x_2) \le t f(x_1) + (1 - t) f(x_2)$$
Imagine drawing a straight line connecting two points on a curved “smiley face” graph. For a convex shape, the straight line between those two points always sits on or above the curve itself. The weighted average of the function values is greater than or equal to the function of the weighted average.
- Right Side: The straight line (the chord).
- Left Side: The actual curved function.
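The chord-above-curve picture can be checked numerically. The sketch below assumes $f(x) = x^2$ and the points $x_1 = -1$, $x_2 = 3$ purely for illustration; the helper name `chord_gap` is hypothetical. It sweeps the weight $t$ and measures how far the chord sits above the curve.

```python
def chord_gap(f, x1, x2, t):
    """Chord height minus curve height: t*f(x1) + (1-t)*f(x2) - f(t*x1 + (1-t)*x2)."""
    chord = t * f(x1) + (1 - t) * f(x2)   # right side: the straight line
    curve = f(t * x1 + (1 - t) * x2)      # left side: the curved function
    return chord - curve


def square(x):
    """An illustrative convex function, f(x) = x^2."""
    return x * x


# Sweep the weight t over 0.0, 0.1, ..., 1.0 for two fixed points on the curve.
gaps = [chord_gap(square, -1.0, 3.0, k / 10) for k in range(11)]

print(all(g >= -1e-12 for g in gaps))  # True: the chord never dips below the curve
print(gaps[0], gaps[10])               # 0.0 0.0: chord and curve meet at the endpoints
print(gaps[5])                         # 4.0: largest gap at the midpoint for x^2
```

For $x^2$ the gap works out to $t(1 - t)(x_1 - x_2)^2$, which is why it vanishes at $t = 0$ and $t = 1$ and peaks at $t = 0.5$.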
2. Probabilistic Extension
Let $f$ be a convex function (where the slope is increasing, i.e., $f''(x) \ge 0$ when $f$ is twice differentiable) and let $X$ be a random variable. Then:
$$f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]$$
Equality Condition:
If $f$ is strictly convex (for example, $f''(x) > 0$ everywhere), then equality, $f(\mathbb{E}[X]) = \mathbb{E}[f(X)]$, holds if and only if $X$ is constant almost surely (no variance).
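A short worked instance of the equality condition, taking $f(x) = x^2$ (strictly convex, since $f''(x) = 2 > 0$) as an illustrative choice: the gap between the two sides is exactly the variance.

$$\mathbb{E}[f(X)] - f(\mathbb{E}[X]) = \mathbb{E}[X^2] - \left(\mathbb{E}[X]\right)^2 = \operatorname{Var}(X) \ge 0$$

The gap is zero precisely when $\operatorname{Var}(X) = 0$, that is, when $X$ is constant almost surely.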
If a system is convex (like compound interest or an option payoff), adding volatility (randomness) without changing the mean never lowers, and typically raises, the expected outcome. Being exposed to the ups and downs (the average of the function) is at least as good as staying at the steady average (the function of the average).
- Left Side: The Function of the Average.
- Right Side: The Average of the Function.
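The volatility claim can be sketched with a call-option payoff $\max(S - K, 0)$, which is convex in the price $S$. The strike $K = 100$, the two price distributions, and the helper names `expected_value` and `call_payoff` are all hypothetical choices for illustration; both distributions share the same mean, so only the volatility differs.

```python
def expected_value(outcomes):
    """Expected value of a discrete distribution given as (value, probability) pairs."""
    return sum(p * v for v, p in outcomes)


def call_payoff(price, strike=100.0):
    """Call option payoff at expiry, max(price - strike, 0): a convex function of price."""
    return max(price - strike, 0.0)


# Hypothetical price distributions with the same mean (100) but different volatility.
steady = [(100.0, 1.0)]                 # no randomness at all
volatile = [(80.0, 0.5), (120.0, 0.5)]  # +/- 20 around the same mean

for name, dist in [("steady", steady), ("volatile", volatile)]:
    mean_price = expected_value(dist)
    f_of_mean = call_payoff(mean_price)                                 # function of the average
    mean_of_f = expected_value([(call_payoff(v), p) for v, p in dist])  # average of the function
    print(name, mean_price, f_of_mean, mean_of_f)

# steady 100.0 0.0 0.0
# volatile 100.0 0.0 10.0
```

With no volatility the two sides coincide; spreading the price around the same mean leaves the left side at 0.0 but lifts the right side to 10.0.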
3. The Basic Case
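This connects the Linear Edition to the Probabilistic Extension. If we treat the weight $t$ as a probability, letting $X = x_1$ with probability $t$ and $X = x_2$ with probability $1 - t$, then
$$\mathbb{E}[X] = t x_1 + (1 - t) x_2, \qquad \mathbb{E}[f(X)] = t f(x_1) + (1 - t) f(x_2),$$
so $f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]$ becomes exactly the Linear Edition inequality.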
The “Linear Edition” is just a special case of the “Probabilistic Extension” where your random variable has only two possible outcomes ($x_1$ or $x_2$). It shows that the same logic covers simple weighted averages and general probability distributions alike.
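The reduction can be sketched in a few lines: compute both sides of the probabilistic form for a two-point distribution and compare them with the weighted-average form. The choice of $f = \exp$, the values $x_1 = 0$, $x_2 = 2$, $t = 0.25$, and the helper name `jensen_sides` are illustrative assumptions.

```python
import math


def jensen_sides(f, outcomes):
    """Return (f(E[X]), E[f(X)]) for a discrete X given as (value, probability) pairs."""
    mean = sum(p * x for x, p in outcomes)
    return f(mean), sum(p * f(x) for x, p in outcomes)


# Two-point case: X = x1 with probability t, X = x2 with probability 1 - t.
x1, x2, t = 0.0, 2.0, 0.25
lhs, rhs = jensen_sides(math.exp, [(x1, t), (x2, 1 - t)])

# The same quantities written in the Linear Edition's weighted-average form.
linear_lhs = math.exp(t * x1 + (1 - t) * x2)
linear_rhs = t * math.exp(x1) + (1 - t) * math.exp(x2)

print(math.isclose(lhs, linear_lhs), math.isclose(rhs, linear_rhs))  # True True
print(lhs <= rhs)                                                    # True
```

`jensen_sides` accepts any finite list of outcomes, so the same call covers distributions with more than two values; the two-point list is just the Linear Edition in disguise.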
4. Concave Edition
Let $f$ be a concave function (shaped like a frown ∩, i.e., $f''(x) \le 0$ when $f$ is twice differentiable) and let $X$ be a random variable. Then:
$$f(\mathbb{E}[X]) \ge \mathbb{E}[f(X)]$$
Equality Condition:
If $f$ is strictly concave (for example, $f''(x) < 0$ everywhere), then equality, $f(\mathbb{E}[X]) = \mathbb{E}[f(X)]$, holds if and only if $X$ is constant almost surely.
This is the logic of diminishing returns (like the utility of money). In a concave world, reliability is valuable. The “average of the outcomes” is never better than, and usually worse than, the “outcome of the average,” because the straight line (the chord) sits below the curve.
- Scenario: Would you rather have a guaranteed $50 (Function of the Average) or a 50/50 coin flip between $0 and $100 (Average of the Function)?
- Result: Most people prefer the guaranteed $50.
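The coin-flip scenario can be made concrete with a concave utility of money. The sketch below assumes $u(w) = \sqrt{w}$ purely as an illustrative utility; it is not a claim about how people actually value money.

```python
import math


def utility(wealth):
    """Hypothetical concave utility of money, u(w) = sqrt(w), chosen for illustration only."""
    return math.sqrt(wealth)


gamble = [(0.0, 0.5), (100.0, 0.5)]  # 50/50 coin flip: $0 or $100

expected_money = sum(p * w for w, p in gamble)            # E[W] = 50.0
utility_of_average = utility(expected_money)              # u(E[W]) = sqrt(50)
average_utility = sum(p * utility(w) for w, p in gamble)  # E[u(W)] = 0.5*0 + 0.5*10

print(expected_money, round(utility_of_average, 3), average_utility)  # 50.0 7.071 5.0

# Sure amount giving the same utility as the gamble (its certainty equivalent): u^-1(E[u(W)]).
print(average_utility ** 2)  # 25.0
```

Under this assumed utility the guaranteed $50 is worth about 7.07 utility units versus 5.0 for the gamble, and the gamble is only as good as a sure $25.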