Before Shannon, the concept of entropy was already prevalent in the thermodynamics and statistical mechanics literature. In classical thermodynamics, the second law states that the total entropy of any isolated thermodynamic system tends to increase with time. In the late 1800s, Ludwig Boltzmann and Josiah W. Gibbs statistically analyzed the randomness associated with an ensemble of gas particles. They called this measure entropy and defined it to be proportional to the logarithm of the number of microstates such a gas could occupy. Although developed in a different context, their mathematical formulation of entropy is equivalent, up to a multiplicative constant, to Shannon's definition.
Shannon defined a measure of the uncertainty, or randomness, associated with an RV and called it entropy [154]. Entropy is the average uncertainty associated with each possible value of the RV:
$$H(X) = \sum_{x} P(x) \log \frac{1}{P(x)} \tag{58}$$

$$H(X) = -\sum_{x} P(x) \log P(x). \tag{59}$$
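As a concrete check of (58)-(59), the following minimal sketch computes the entropy of a discrete RV, assuming its distribution is supplied as a list of probabilities. The function name `shannon_entropy` and the base-2 default (entropy in bits) are illustrative choices, not taken from the text.

```python
import math

def shannon_entropy(probs, base=2.0):
    """H(X) = -sum_x P(x) log P(x); zero-probability values contribute 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries one bit of uncertainty; a biased coin carries less.
print(shannon_entropy([0.5, 0.5]))   # 1.0
print(shannon_entropy([0.9, 0.1]))
# Four equally likely values carry two bits.
print(shannon_entropy([0.25] * 4))   # 2.0
```

Skipping terms with P(x) = 0 implements the usual convention 0 log 0 = 0.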
Alfred Rényi [138] generalized Shannon's measure of entropy by introducing a family of entropy functions parameterized by a continuous parameter $\alpha$:
$$H_{\alpha}(X) = \frac{1}{1-\alpha} \log \sum_{x} P(x)^{\alpha}, \qquad \alpha > 0,\ \alpha \neq 1. \tag{60}$$
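Writing the Rényi parameter as α, the sketch below evaluates (60) for a discrete distribution and compares it with Shannon entropy. The helper names are hypothetical. It relies on two standard properties of the Rényi family: H_α approaches Shannon entropy as α → 1, and H_α is non-increasing in α.

```python
import math

def renyi_entropy(probs, alpha, base=2.0):
    """H_alpha(X) = log(sum_x P(x)**alpha) / (1 - alpha), for alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    s = sum(p ** alpha for p in probs if p > 0)
    return math.log(s, base) / (1.0 - alpha)

def shannon_entropy(probs, base=2.0):
    """Shannon entropy, the alpha -> 1 limit of the Renyi family."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

probs = [0.7, 0.2, 0.1]
for alpha in (0.5, 0.999, 2.0):
    # Values decrease as alpha grows; alpha = 0.999 is close to Shannon.
    print(alpha, renyi_entropy(probs, alpha))
print("Shannon:", shannon_entropy(probs))
```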
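We can also interpret Shannon entropy as the expectation of the RV $-\log P(X)$. Given $N$ observations $x_1, \ldots, x_N$, this expectation can be approximated by a sample mean, i.e.,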
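$$H(X) = E_{P}\left[-\log P(X)\right] \tag{61}$$

$$H(X) \approx -\frac{1}{N} \sum_{n=1}^{N} \log P(x_n) = -\frac{1}{N} \log \prod_{n=1}^{N} P(x_n). \tag{62}$$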
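The sample-mean approximation in (62) can be checked numerically. This sketch assumes a known discrete pmf (chosen for illustration) and draws samples from it with `random.choices`; it works in natural logarithms (nats), and the helper names are hypothetical.

```python
import math
import random

def sample_entropy_estimate(samples, pmf):
    """Approximate H(X) = E[-log P(X)] by the sample mean of -log P(x_n)."""
    return -sum(math.log(pmf[x]) for x in samples) / len(samples)

def log_likelihood(samples, pmf):
    """log prod_n P(x_n), accumulated as a sum of logs to avoid underflow."""
    return sum(math.log(pmf[x]) for x in samples)

pmf = {"a": 0.5, "b": 0.25, "c": 0.25}  # exact entropy: 1.5 * ln 2 nats
random.seed(0)
samples = random.choices(list(pmf), weights=list(pmf.values()), k=10_000)

h_hat = sample_entropy_estimate(samples, pmf)
exact = -sum(p * math.log(p) for p in pmf.values())
print(h_hat, exact)
# The estimate equals -(1/N) times the log-likelihood of the samples.
print(math.isclose(h_hat, -log_likelihood(samples, pmf) / len(samples)))  # True
```

Summing logarithms instead of multiplying probabilities gives the same quantity as the right-hand side of (62) while avoiding floating-point underflow for large N.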
The expression on the right involves the product of the probabilities of occurrence of the observations. This product is, in fact, the likelihood function associated with the observations. Recall that the ML estimate selects the parameter value that maximizes the likelihood function, in which each factor is the probability of an observation conditioned on that parameter value. Indeed, when Shannon's entropy is estimated by the sample mean in (62), we can prove that the ML parameter estimates are the same as the minimum-entropy parameter estimates:
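$$\begin{aligned}
\arg\min_{\theta} H(X; \theta) &\approx \arg\min_{\theta} \left[ -\frac{1}{N} \log \prod_{n=1}^{N} P(x_n \mid \theta) \right] \\
&= \arg\max_{\theta} \prod_{n=1}^{N} P(x_n \mid \theta) \\
&= \hat{\theta}_{\mathrm{ML}}.
\end{aligned} \tag{63}$$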
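To illustrate the equivalence in (63), the sketch below assumes a Bernoulli model with parameter θ and searches a grid of candidate values. Both the model and the grid search are illustrative choices, not the method of the text. Minimizing the sample-mean entropy estimate and maximizing the log-likelihood select the same θ, which matches the closed-form Bernoulli ML estimate (number of ones divided by N).

```python
import math

def bernoulli_pmf(x, theta):
    """P(x | theta) for x in {0, 1}."""
    return theta if x == 1 else 1.0 - theta

def entropy_estimate(samples, theta):
    """Sample-mean estimate of H(X; theta) = -(1/N) sum_n log P(x_n | theta)."""
    return -sum(math.log(bernoulli_pmf(x, theta)) for x in samples) / len(samples)

def log_likelihood(samples, theta):
    """log prod_n P(x_n | theta)."""
    return sum(math.log(bernoulli_pmf(x, theta)) for x in samples)

samples = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # 7 ones out of 10
grid = [i / 1000 for i in range(1, 1000)]  # candidate theta values in (0, 1)

theta_min_h = min(grid, key=lambda t: entropy_estimate(samples, t))
theta_ml = max(grid, key=lambda t: log_likelihood(samples, t))
print(theta_min_h, theta_ml)  # 0.7 0.7
```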
The joint entropy of two RVs $X$ and $Y$ is
$$H(X, Y) = -\sum_{x} \sum_{y} P(x, y) \log P(x, y). \tag{64}$$
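A minimal sketch of (64), assuming the joint distribution of two discrete RVs is given as a 2-D table with rows indexed by x and columns by y (the tables and helper names are illustrative). It also compares H(X, Y) with H(X) + H(Y), using the standard fact that H(X, Y) ≤ H(X) + H(Y), with equality exactly when X and Y are independent.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def joint_entropy(joint, base=2.0):
    """H(X, Y) = -sum_x sum_y P(x, y) log P(x, y) for a 2-D table (rows: x, columns: y)."""
    return entropy((p for row in joint for p in row), base)

def marginal_entropies(joint, base=2.0):
    """Return (H(X), H(Y)) computed from the row and column sums of the joint table."""
    px = [sum(row) for row in joint]        # P(x) = sum_y P(x, y)
    py = [sum(col) for col in zip(*joint)]  # P(y) = sum_x P(x, y)
    return entropy(px, base), entropy(py, base)

independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y are independent fair bits
copied = [[0.5, 0.0], [0.0, 0.5]]           # Y is an exact copy of X

for name, table in [("independent", independent), ("copied", copied)]:
    hx, hy = marginal_entropies(table)
    print(name, joint_entropy(table), hx + hy)
# independent 2.0 2.0
# copied 1.0 2.0
```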