next up previous
Next: Nonparametric Density Estimation Up: Statistical Inference Previous: Maximum-a-Posteriori (MAP) Estimation


Expectation-Maximization (EM) Algorithm

There are times when we want to apply the ML or MAP estimation technique, but the data ${\bf x}$ is incomplete. This implies that the model consists of two parts: (a) the observed part: ${\bf x}$ and (b) the hidden part: ${\bf y}$. We can associate RVs $X$ and $Y$ with the observed and hidden parts, respectively. We can still apply ML or MAP estimation techniques if we assume a certain joint PDF $P (X,Y)$ between the observed and hidden RVs, and then marginalize over the hidden RVs $Y$. Marginalization of an RV $Y$ chosen from a set of RVs refers to the process of integration of the joint PDF over the values $y$ of the chosen RV. This is the key idea behind the EM algorithm.

Considering ML estimation, for example, we compute the optimal parameter as

$\displaystyle \theta^{*}$ $\textstyle =$ $\displaystyle \mathop{\mbox{argmax }}_{\theta} \Bigg( \log L (\theta \vert {\bf x}) \Bigg)$  
  $\textstyle =$ $\displaystyle \mathop{\mbox{argmax }}_{\theta} \Bigg( \log \bigg( \int_{\mathcal{S}_Y} P ({\bf x}, y \vert \theta) dy \bigg) \Bigg),$ (37)

where $L (\cdot)$ is the likelihood function described previously in Section 2.3.1. This key idea is formalized in the expectation-maximization (EM) algorithm [43,104].

Herman O. Hartley [71] pioneered the research on the EM algorithm in the late 1950s. The first concrete mathematical foundation, however, was laid by Dempster, Laird, and Rubin [43] in the late 1970s. Neal and Hinton [111,112,108] presented the EM algorithm from a new perspective of lower-bound maximization. Over the years, the EM algorithm has found many applications in various domains and has become a powerful estimation tool [104,48].

The EM algorithm is an iterative optimization procedure. Starting with an initial parameter estimate $\theta^0$, it is guaranteed to converge to the local maximum of the likelihood function $L (\theta
\vert {\bf x})$. The EM algorithm consists of two steps: (a) the E step or the expectation step and (b) the M step or the maximization step.

The iterations proceed until convergence to a local maximum of $L (\theta
\vert {\bf x})$. Actually, the M step need not find that $\theta^{i+1}$ corresponding to the maximum value of the Q function, but rather it is sufficient to find any $\theta^{i+1}$ such that
$\displaystyle Q (\theta^{i+1}) \ge Q (\theta^i).$     (52)

This modified strategy is referred to as the generalized-EM (GEM) algorithm and is also guaranteed to converge [43].


next up previous
Next: Nonparametric Density Estimation Up: Statistical Inference Previous: Maximum-a-Posteriori (MAP) Estimation
Suyash P. Awate 2007-02-21