This section formulates the classification problem as an optimal-segmentation problem using an information-theoretic goodness measure associated with the Markov PDFs. It begins by forming a connection between information-theoretic measures, such as mutual information and entropy [34], and classification.
Loosely speaking, the mutual information between two random variables quantifies the degree of functional dependence between them. For functionally-dependent random variables, each variable uniquely determines the other, and the mutual information is maximized. On the other hand, independent random variables convey no information about each other, and their mutual information is zero (minimal). For image segmentation [87], we can say that a good segmentation is one in which the voxel-neighborhood intensity values provide the most information about the class labels. Likewise, knowing the voxel class should provide the most reliable estimate of the voxel neighborhood. Clearly, there is no strict functional dependence and images are inherently stochastic, but mutual information provides a well-founded mechanism for quantifying the degree to which these properties hold.
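The two extremes described above can be checked numerically on a toy discrete example. The following is a minimal sketch, not part of the proposed method: the function name `mutual_information` and the two 2x2 joint distributions are hypothetical, chosen only to illustrate the functionally dependent and independent cases.

```python
import math

def mutual_information(joint):
    """Mutual information (in bits) of a discrete joint PMF given as a 2-D list."""
    px = [sum(row) for row in joint]          # marginal of the row variable
    py = [sum(col) for col in zip(*joint)]    # marginal of the column variable
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                       # 0 * log(0) is taken as 0
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

# Functionally dependent case: each variable determines the other,
# so all probability mass lies on the diagonal.
dependent = [[0.5, 0.0],
             [0.0, 0.5]]

# Independent case: the joint PMF is the product of its marginals.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

mi_dependent = mutual_information(dependent)      # equals the 1-bit entropy of either variable
mi_independent = mutual_information(independent)  # equals zero
```

In the dependent case the mutual information reaches the entropy of either variable, which is its largest possible value here; in the independent case it vanishes.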
Using the set of conditional PDFs $P(Z \mid L = k)$ for the $K$ classes, we can define a joint PDF $P(L, Z)$ between the RVs $L$ and $Z$. At each voxel $t$, an instance $(l_t, z_t)$ is drawn from the joint PDF. What we observe, however, are only the intensity-neighborhood vectors $z_t$. The label values $l_t$ define the classification and must be estimated. We define the optimal segmentation as the one that maximizes the mutual information between $L$ and $Z$, i.e.,
$$I(L, Z) = h(Z) - h(Z \mid L) = h(Z) - \sum_{k=1}^{K} P(L = k)\, h(Z \mid L = k).$$
The entropy of class $k$ is
$$h(Z \mid L = k) = -\int P(Z = z \mid L = k) \log P(Z = z \mid L = k)\, dz.$$
The entropy of the Markov PDF associated with the entire image, $h(Z)$, is independent of the label assignment $\{l_t\}$, so we can ignore it during the optimization. Thus, (6.3) implies that the optimal segmentation is the one that minimizes a weighted average of the entropies $h(Z \mid L = k)$ of the $K$ Markov PDFs associated with the $K$ stationary-ergodic regions. The present mutual-information-based energy gives more importance, or weight, to reducing the entropies of larger regions in the image, in direct proportion to their size: the weights are the probabilities of occurrence of the classes, $P(L = k)$, in the image.
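The weighted-average-of-entropies energy can be illustrated on a toy labeling. The sketch below rests on strong simplifying assumptions that are not part of the method: each voxel carries a single quantized intensity instead of a neighborhood vector, and class entropies are computed from histograms rather than from the Markov PDFs. The names `discrete_entropy` and `weighted_class_entropy` and the data are hypothetical.

```python
import math
from collections import Counter

def discrete_entropy(values):
    """Histogram-based entropy (in bits) of a list of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def weighted_class_entropy(features, labels):
    """Sum over classes of (class fraction) * (entropy of that class's features)."""
    n = len(labels)
    by_class = {}
    for z, l in zip(features, labels):
        by_class.setdefault(l, []).append(z)
    # The weight len(zs) / n plays the role of the class probability.
    return sum((len(zs) / n) * discrete_entropy(zs) for zs in by_class.values())

# Toy "image": four dark voxels followed by four bright voxels.
features = [0, 0, 1, 0, 5, 5, 6, 5]

good_labels = [0, 0, 0, 0, 1, 1, 1, 1]   # labels follow the intensity structure
bad_labels = [0, 1, 0, 1, 0, 1, 0, 1]    # labels ignore the intensity structure

good_energy = weighted_class_entropy(features, good_labels)
bad_energy = weighted_class_entropy(features, bad_labels)
```

The labeling that groups similar intensities yields the lower energy, which is the behavior the minimization is meant to reward.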
Rewriting $I(L, Z) = H(L) - H(L \mid Z)$ provides more insight into this optimality metric. We see that the metric encourages segmentations with equal voxel counts for the $K$ classes (a uniform PDF for $L$, implying maximal $H(L)$) while demanding high predictability of the label $l_t$ at each voxel $t$ given its neighborhood intensities $z_t$ (low $H(L \mid Z = z_t)$, leading to low $H(L \mid Z)$).
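That the two decompositions of the mutual information agree can be verified on a small example. This is a sketch under the assumption that the neighborhood variable is quantized to three discrete values, so that all entropies are discrete; the joint table `joint` and the helper `entropy` are hypothetical.

```python
import math

def entropy(p):
    """Entropy (in bits) of a discrete PMF given as a sequence of probabilities."""
    return -sum(x * math.log2(x) for x in p if x > 0.0)

# Hypothetical joint PMF: rows index the label, columns index a quantized feature.
joint = [[0.30, 0.15, 0.05],
         [0.05, 0.10, 0.35]]

p_l = [sum(row) for row in joint]          # label marginal
p_z = [sum(col) for col in zip(*joint)]    # feature marginal

# Conditional entropy of the feature given the label: class-weighted class entropies.
h_z_given_l = sum(pl * entropy([p / pl for p in row])
                  for pl, row in zip(p_l, joint))

# Conditional entropy of the label given the feature.
h_l_given_z = sum(pz * entropy([p / pz for p in col])
                  for pz, col in zip(p_z, zip(*joint)))

mi_from_z = entropy(p_z) - h_z_given_l     # feature entropy minus weighted class entropies
mi_from_l = entropy(p_l) - h_l_given_z     # label entropy minus label uncertainty
```

Both expressions give the same value, and that value cannot exceed the label entropy, which is largest when the classes are equally likely.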
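Equations (6.3) and (6.4) give the optimal segmentation as
$$\begin{aligned}
\{T_k^{*}\}_{k=1}^{K} &= \operatorname*{argmax}_{\{T_k\}_{k=1}^{K}} I(L, Z) \\
&= \operatorname*{argmin}_{\{T_k\}_{k=1}^{K}} \sum_{k=1}^{K} P(L = k)\, h(Z \mid L = k).
\end{aligned}$$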
To estimate $P(L = k)$ from the data, we observe that the discrete random variable $L$ can take only $K$ possible values. Furthermore, $|T_k|$ voxels, out of a total of $|T|$ voxels, have $L = k$. Thus,
$$P(L = k) = \frac{|T_k|}{|T|}.$$
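The class probabilities are simply the fractions of voxels carrying each label. A minimal sketch, assuming the current labeling is available as a flat list of integer labels; the name `class_priors` and the example labels are hypothetical.

```python
from collections import Counter

def class_priors(labels):
    """Fraction of voxels assigned to each class, keyed by class label."""
    counts = Counter(labels)       # number of voxels per class
    total = len(labels)            # total number of voxels
    return {k: counts[k] / total for k in sorted(counts)}

# Ten voxels, three classes.
labels = [0, 0, 1, 2, 2, 2, 1, 0, 2, 2]
priors = class_priors(labels)
```

The resulting fractions sum to one and serve as the weights on the class entropies.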
So far, we have not taken into account any a priori information in the segmentation process
and we have derived all probabilities solely from the data. The formulation, however, extends in a
straightforward manner to include a priori information using standard Bayesian strategies
followed by optimization involving the resulting posterior probabilities.
Section 6.4.3 discusses how to integrate a priori information in the form of
brain tissue probabilistic atlases into the proposed method. For the minimization in
(6.9), we manipulate the regions $\{T_k\}_{k=1}^{K}$ using a
gradient-descent optimization strategy. Section 6.4.2 gives the details.