Next: Optimal Parzen-Window Kernel Parameter Up: Adaptive Markov Image Modeling Previous: Data-Driven Nonparametric Markov Statistics


Consistency of the Data-Driven Markov Model

The power of the Markov model on the random field and nonparametric density estimation comes with some additional theoretical constraints that warrant mention. For the Parzen-window estimate to converge [125,48], the kernel parameter $\sigma$ must decrease as the number of samples increases. In practice, $\sigma$ can be derived from the data themselves, and several authors have proposed ML-based schemes for estimating $\sigma$ [15,62]. Section 3.4 discusses this in more detail.
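As a rough illustration of choosing $\sigma$ from the data, the sketch below picks the kernel width that maximizes a leave-one-out log-likelihood of a Parzen-window estimate. It assumes an isotropic Gaussian kernel and a fixed candidate grid; the function names (`parzen_density`, `loo_log_likelihood`, `select_sigma`) are hypothetical, and this is a common stand-in rather than the specific schemes of [15,62] or of Section 3.4.

```python
import numpy as np


def parzen_density(query, samples, sigma):
    """Parzen-window estimate at each query point (isotropic Gaussian kernel).

    query has shape (m, d) and samples has shape (n, d).
    """
    query = np.asarray(query, dtype=float)
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    # Squared Euclidean distance between every query point and every sample.
    sq_dist = ((query[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * sigma ** 2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)).sum(axis=1) / (n * norm)


def loo_log_likelihood(samples, sigma):
    """Sum of log densities, each sample scored by an estimate that omits it."""
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    sq_dist = ((samples[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(kernel, 0.0)  # leave each sample out of its own estimate
    norm = (2.0 * np.pi * sigma ** 2) ** (d / 2.0)
    density = kernel.sum(axis=1) / ((n - 1) * norm)
    return np.log(density + 1e-300).sum()  # tiny offset guards against log(0)


def select_sigma(samples, candidates):
    """Return the candidate sigma with the highest leave-one-out likelihood."""
    scores = [loo_log_likelihood(samples, s) for s in candidates]
    return candidates[int(np.argmax(scores))]


# Toy data (an assumption for illustration): 1D standard-normal samples.
rng = np.random.default_rng(0)
candidates = [0.05, 0.1, 0.2, 0.4, 0.8]
for n in (50, 500):
    data = rng.normal(size=(n, 1))
    print(n, select_sigma(data, candidates))
```

Leaving each sample out of its own estimate matters here: without it, the likelihood would grow without bound as $\sigma$ shrinks, and the selection would always pick the smallest candidate.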

Another important issue is consistency. A consistent system is one in which the joint PDF $P (\{ X_t \}_{t \in \mathcal{T}})$ of all the random variables yields, by the rules of probabilistic inference, each conditional PDF $P (X_t \vert {\bf y}_t)$ uniquely. Besag's proof of the Hammersley-Clifford theorem [14], also known as the Markov-Gibbs equivalence theorem, shows that the conditional Markov PDFs $P (X_t \vert {\bf y}_t)$ must be restricted to a specific form in order to give a consistent structure to the entire system.
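For reference, the restricted form is usually written as a Gibbs distribution over the cliques of the neighborhood system. The symbols $Z$ (normalizing constant), $\mathcal{C}$ (set of cliques), and $V_c$ (clique potentials) are standard textbook notation assumed here and are not defined in this section; the theorem also assumes that the joint PDF is strictly positive.

```latex
P (\{ x_t \}_{t \in \mathcal{T}})
  = \frac{1}{Z} \exp \Big( - \sum_{c \in \mathcal{C}} V_c (x_c) \Big),
\qquad
P (x_t \vert {\bf y}_t)
  = \frac{\exp \big( - \sum_{c \ni t} V_c (x_c) \big)}
         {\sum_{x'_t} \exp \big( - \sum_{c \ni t} V_c (x'_c) \big)}
```

Only the cliques containing $t$ survive in the conditional, which is why it depends on the neighborhood ${\bf y}_t$ alone. In $x'_c$, the value at $t$ is replaced by $x'_t$ while the neighbors stay fixed, and for continuous intensities the sum over $x'_t$ becomes an integral.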

The Markov PDFs that the proposed method learns empirically from the data do, indeed, yield a consistent system asymptotically, i.e., as the amount of data tends to infinity. This follows from the convergence of the Parzen-window density estimate to the true Markov PDF. This convergence, however, holds only when the observations in the sample are independently generated from a single underlying PDF. The stationarity of the Markov random field implies that all observations are derived from a single PDF. However, in our case, these observations are the neighborhood-intensity vectors, which may share neighboring voxel values. Independence requires sampling from a subset $\mathcal {U}$ of the entire voxel-set $\mathcal{T}$, such that no two voxels in the subset have overlapping neighborhoods, i.e.,

$\displaystyle \mathcal{U} \subset \mathcal{R}, \qquad \forall a,b \in \mathcal{U},\ a \neq b : \mathcal{N}_a \cap \mathcal{N}_b = \emptyset.$ (103)

The constraint of nonoverlapping neighborhoods wastes a large amount of data ($\{ {\bf z}_t \}_{t \in \mathcal{T} \setminus \mathcal {U}}$) [14] and would, in practice, leave too few image samples. However, Levina [98] shows that ergodicity allows convergence even in the case of overlapping data, and thus it is appropriate to derive the sample $\mathcal{A}$ from the entire set of image neighborhoods in $\mathcal {R}$.
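To give a sense of how much data the nonoverlap constraint discards, the sketch below counts neighborhood samples on a 2D image under both sampling schemes. It assumes square neighborhoods of side `2 * radius + 1` that lie fully inside the image; the helper names (`neighborhood_centers`, `extract_neighborhoods`) are hypothetical, and the neighborhood shape used elsewhere in this work may differ.

```python
import numpy as np


def neighborhood_centers(shape, radius, overlapping):
    """Centers of square neighborhoods that fit entirely inside a 2D image.

    With overlapping=False, centers are spaced one full neighborhood width
    apart, so no two neighborhoods share a pixel.
    """
    step = 1 if overlapping else 2 * radius + 1
    rows = np.arange(radius, shape[0] - radius, step)
    cols = np.arange(radius, shape[1] - radius, step)
    return [(r, c) for r in rows for c in cols]


def extract_neighborhoods(image, centers, radius):
    """Stack the flattened neighborhood-intensity vector of every center."""
    return np.stack([
        image[r - radius:r + radius + 1, c - radius:c + radius + 1].ravel()
        for r, c in centers
    ])


# Toy image (an assumption for illustration): 64 x 64 random intensities.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
radius = 2

disjoint = neighborhood_centers(image.shape, radius, overlapping=False)
everything = neighborhood_centers(image.shape, radius, overlapping=True)
print(len(disjoint), len(everything))
print(extract_neighborhoods(image, disjoint, radius).shape)
```

For this 64 x 64 image with 5 x 5 neighborhoods, the disjoint scheme keeps 144 of the 3600 available neighborhoods, which is the kind of loss that motivates sampling from all of them.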


Suyash P. Awate 2007-02-21