Next: Consistency of the Data-Driven Up: Adaptive Markov Image Modeling Previous: Wavelet modeling

Data-Driven Nonparametric Markov Statistics

In order to rely on image samples to produce nonparametric estimates of Markov statistics, we must assume that different neighborhood-intensities in the image are derived from the same PDF. Mathematically, this is the notion of stationarity associated with a random field. A stationary region $\mathcal{R} \subset \mathcal{T}$ is one where the Markov PDFs $P ({\bf Z}_t)$ are exactly the same for all voxels $t$ in that region [47,161], i.e.,

$\displaystyle \forall t \in \mathcal{R}, P ({\bf Z}_t) = P ({\bf Z}).$     (99)

In other words, the Markov statistics are shift invariant. Stationarity provides many observations $\{ {\bf z}_t \}_{t \in \mathcal{R}}$, all derived from $P ({\bf Z})$.
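
As a minimal sketch of how the observations $\{ {\bf z}_t \}_{t \in \mathcal{R}}$ might be gathered in practice, the following Python function collects one neighborhood-intensity vector per pixel of a region. The function name, the 2-D list representation of the image, and the square neighborhood of half-width `radius` (center pixel included) are illustrative assumptions, not definitions taken from the text.

```python
def neighborhood_vectors(image, region, radius=1):
    """Collect the observations {z_t} for pixels t in `region`.

    image  : 2-D list of intensities (rows of equal length)
    region : iterable of (row, col) positions t
    radius : half-width of the square neighborhood (assumed shape)
    """
    rows, cols = len(image), len(image[0])
    samples = []
    for r, c in region:
        # Skip positions whose neighborhood would fall outside the image.
        if r < radius or c < radius or r >= rows - radius or c >= cols - radius:
            continue
        # Flatten the (2*radius + 1)^2 neighborhood into one vector z_t.
        z_t = [image[r + dr][c + dc]
               for dr in range(-radius, radius + 1)
               for dc in range(-radius, radius + 1)]
        samples.append(z_t)
    return samples
```

Under the stationarity assumption, every vector returned by this function is treated as a draw from the same PDF $P ({\bf Z})$, regardless of the position $t$ it came from.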

Stationarity alone, however, is not sufficient to provide accurate estimates of the Markov PDFs from a single observed image. To do this, we must rely on another statistical property, namely ergodicity. Essentially, ergodicity guarantees accurate estimation of certain ensemble properties of the random field, e.g., the Markov PDFs $P ({\bf Z})$, from observations $\{ {\bf z}_t \}_{t \in \mathcal{R}}$ in a single realization of the stationary random field, i.e., the observed image. Mathematically, it guarantees that, for certain quantities associated with $P ({\bf Z})$, the spatial averages (i.e., over $\mathcal {R}$) converge to the ensemble averages (i.e., over ${\bf z}$) as the size of the region $\vert\mathcal{R}\vert$ tends to infinity [161]. Ergodicity achieves this by ensuring that: (a) random variables become independent as the shift between them approaches infinity, and (b) the random variables in the MRF become progressively less dependent with increasing spatial distance at a sufficiently rapid rate. Therefore, spatial averages over sufficiently large regions $\mathcal {R}$ behave like averages of nearly independent random variables and, consequently, the weak law of large numbers [161] ensures the convergence of such averages to the desired ensemble average.
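
The convergence of spatial averages can be illustrated on a toy example. The sketch below uses a one-dimensional first-order autoregressive process as a stand-in for a stationary random field whose dependence decays with distance; this process, the coefficient `a`, and the function name are assumptions chosen for illustration and are not the image model used in this work. Its ensemble mean is zero, so the average along a single long realization should approach zero.

```python
import random

def ar1_spatial_average(n, a=0.8, seed=0):
    """Average of n consecutive samples from one realization of
    x_t = a * x_{t-1} + e_t, with e_t ~ N(0, 1) and |a| < 1."""
    rng = random.Random(seed)
    # Draw x_0 from the stationary distribution N(0, 1 / (1 - a^2)),
    # so that every x_t has the same PDF (stationarity).
    x = rng.gauss(0.0, (1.0 / (1.0 - a * a)) ** 0.5)
    total = 0.0
    for _ in range(n):
        total += x
        # The correlation between samples k steps apart decays as a^k.
        x = a * x + rng.gauss(0.0, 1.0)
    return total / n

if __name__ == "__main__":
    for n in (100, 10_000, 200_000):
        print(n, ar1_spatial_average(n))
```

Because nearby samples are correlated, the average converges more slowly than it would for independent draws, which is why the region $\mathcal{R}$ must be sufficiently large.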

To represent the Markov PDFs $P ({\bf Z})$, we use the nonparametric Parzen-window technique [125,48]. The Parzen-window probability estimate for $P ({\bf z})$ is defined as the ensemble average

\begin{displaymath}
P ({\bf z})
= \frac {1} {\vert\mathcal{S}'\vert}
\sum_{{\bf z}' \in \mathcal{S}'} G_d ({\bf z} - {\bf z}'; \Psi_d),
\end{displaymath} (100)

where $\mathcal{S}'$ is a random sample [47,161] drawn from the PDF $P ({\bf Z})$, $d = \vert\mathcal{N}_t\vert$ is the neighborhood size, and $G_d ({\bf z}; \Psi_d)$ is the $d$-dimensional Gaussian kernel with zero mean and covariance matrix $\Psi_d$. Having no a priori information on the structure of $P ({\bf Z})$, we choose an isotropic Gaussian kernel, i.e.,
$\displaystyle \Psi_d = \sigma^2 I_d,$     (101)

where $I_d$ is the $d \times d$ identity matrix and $\sigma $ is the standard deviation along each dimension. Ergodicity enables us to approximate the ensemble average as a spatial average:
\begin{displaymath}
P ({\bf z})
\approx
\frac {1} {\vert\mathcal{A}\vert}
\sum_{t \in \mathcal{A}} G_d ({\bf z} - {\bf z}_t; \Psi_d),
\end{displaymath} (102)

where the set $\mathcal{A}$ is a small subset of $\mathcal {R}$. Taking $\mathcal{A} = \mathcal{R}$ would increase the computational cost of the scheme, because every evaluation of the estimate sums over all elements of $\mathcal{A}$. Section 3.5.1 describes an effective technique for choosing this Parzen-window sample. As we saw in Section 2.4, the density estimate varies with the value of the kernel parameter $\sigma $. Section 3.4 describes a data-driven technique to estimate an optimal value of $\sigma $.
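
A minimal Python sketch of the estimate in Equation (102) follows. With the isotropic covariance of Equation (101), the kernel reduces to $G_d ({\bf z}; \sigma^2 I_d) = (2 \pi \sigma^2)^{-d/2} \exp ( - \Vert {\bf z} \Vert^2 / (2 \sigma^2) )$, which is what `gaussian_kernel` computes. The function names are hypothetical, and `subsample` draws $\mathcal{A}$ uniformly at random only as a placeholder; it is not the selection technique of Section 3.5.1, and the value of `sigma` is left to the caller rather than estimated as in Section 3.4.

```python
import math
import random

def gaussian_kernel(diff, sigma):
    """Isotropic d-dimensional Gaussian G_d(diff; sigma^2 I_d)."""
    d = len(diff)
    sq_norm = sum(x * x for x in diff)
    norm_const = (2.0 * math.pi * sigma * sigma) ** (-d / 2.0)
    return norm_const * math.exp(-sq_norm / (2.0 * sigma * sigma))

def parzen_estimate(z, samples, sigma):
    """Average of kernels centered on each z_t in `samples`."""
    total = 0.0
    for z_t in samples:
        diff = [a - b for a, b in zip(z, z_t)]
        total += gaussian_kernel(diff, sigma)
    return total / len(samples)

def subsample(observations, size, seed=0):
    """Uniform random subset A of the observations (placeholder choice)."""
    return random.Random(seed).sample(observations, size)
```

Each call to `parzen_estimate` costs time proportional to $\vert\mathcal{A}\vert \, d$, so a call such as `parzen_estimate(z, subsample(observations, 500), sigma)` is much cheaper than one that sums over every observation in a large region.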


Suyash P. Awate 2007-02-21