Next: Nonparametric Density Estimation
Up: Statistical Inference
Previous: Maximum-a-Posteriori (MAP) Estimation
Expectation-Maximization (EM) Algorithm
There are times when we want to apply the ML or MAP estimation technique, but the data
is
incomplete. This implies that the model consists of two parts: (a) the observed part:
and (b) the hidden part:
. We can associate RVs
and
with the observed
and hidden parts, respectively. We can still apply ML or MAP estimation techniques if we assume a
certain joint PDF
between the observed and hidden RVs, and then marginalize over
the hidden RVs
. Marginalization of an RV
chosen from a set of RVs refers to the process of
integration of the joint PDF over the values
of the chosen RV. This is the key idea behind the
EM algorithm.
Considering ML estimation, for example, we compute the optimal parameter as
where
is the likelihood function described previously in Section 2.3.1. This
key idea is formalized in the expectation-maximization (EM)
algorithm [43,104].
Herman O. Hartley [71] pioneered the research on the EM algorithm in the late 1950s. The
first concrete mathematical foundation, however, was laid by Dempster, Laird, and
Rubin [43] in the late 1970s. Neal and Hinton [111,112,108] presented
the EM algorithm from a new perspective of lower-bound maximization. Over the years, the EM
algorithm has found many applications in various domains and has become a powerful estimation
tool [104,48].
The EM algorithm is an iterative optimization procedure. Starting with an initial parameter estimate
, it is guaranteed to converge to the local maximum of the likelihood function
. The EM algorithm consists of two steps: (a) the E step or the
expectation step and (b) the M step or the maximization step.
-
The E step constructs an optimal lower bound
to the log-likelihood function
. This optimal lower bound is a function of
that touches the
log-likelihood function at the current parameter estimate
, i.e.,
 |
|
|
(38) |
and never exceeds the objective function at any
, i.e.,
 |
|
|
(39) |
Intuitively, maximizing this optimal lower bound
(in the M step) will surely take us
closer to the maximum of the log-likelihood function
, i.e., the ML
estimate. We compute this optimal lower bound as follows [108,42,104].
Let us rewrite the log-likelihood function as
where
is any arbitrary PDF. Applying Jensen's inequality [34], and using the
concavity of the
function, gives:
Our goal is to try to find the particular PDF
such that
is the optimal
lower bound that touches the log-likelihood function at the current parameter estimate
.
We can achieve this goal by solving the following constrained-optimization [137] problem:
Using the Lagrange-multiplier [137] approach, the objective function to be maximized is
 |
|
|
(43) |
The derivative of the objective function
with respect to
is
 |
|
|
(44) |
The derivative of the objective function
with respect to
is
 |
|
|
(45) |
The objective function achieves its maximum value when both the aforementioned derivatives in
(2.44) and (2.45) are zero. Using these conditions, and
after some simplification, we get
This gives our optimal lower bound as
 |
|
|
(47) |
We can confirm that
indeed equals
, which
indicates that
touches the log-likelihood function
at
and is an optimal lower bound.
-
The M step performs the maximization of the function
with respect to the variable
.
where the Q function is
Observe that
also depends on the current parameter estimate
that is
considered a constant. The M step assigns the new parameter estimate
as the one
that maximizes
, i.e.,
 |
|
|
(51) |
The iterations proceed until convergence to a local maximum of
. Actually, the M step need not find that
corresponding to the maximum value of
the Q function, but rather it is sufficient to find any
such that
 |
|
|
(52) |
This modified strategy is referred to as the generalized-EM (GEM) algorithm and is also
guaranteed to converge [43].
Next: Nonparametric Density Estimation
Up: Statistical Inference
Previous: Maximum-a-Posteriori (MAP) Estimation
Suyash P. Awate
2007-02-21