Next: Engineering Enhancements Up: Adaptive Markov Image Modeling Previous: Consistency of the Data-Driven


Optimal Parzen-Window Kernel Parameter

The nonparametric Parzen-window scheme for estimating Markov PDFs requires an appropriate value for the kernel parameter $\sigma $. Section 3.3 described an ML-based estimate for this parameter and discussed the theoretical advantages of such a strategy. A maximum-likelihood estimate for $\sigma $ is equivalent to the choice that minimizes the entropy of the Markov statistics of the stationary-ergodic random field. That is,

\begin{align}
\sigma^* &= \mathop{\mbox{argmax }}_{\sigma} \prod_{t \in \mathcal{R}} P ({\bf z}_t; \sigma) \notag \\
&= \mathop{\mbox{argmax }}_{\sigma} \sum_{t \in \mathcal{R}} \log P ({\bf z}_t; \sigma) \notag \\
&\approx \mathop{\mbox{argmin }}_{\sigma} \sum_{{\bf z}' \in \mathcal{S}'_{\sigma}} \Big( - \log P ({\bf z}'; \sigma) \Big) \notag \\
&= \mathop{\mbox{argmin }}_{\sigma} E_{P({\bf Z}; \sigma)} \Big[ - \log P ({\bf Z}; \sigma) \Big] \notag \\
&= \mathop{\mbox{argmin }}_{\sigma} h ({\bf Z}; \sigma), \tag{104}
\end{align}

where $\mathcal{S}'_{\sigma}$ is a random sample derived from the PDF $P ({\bf Z}; \sigma)$, and $h ({\bf Z}; \sigma)$ is the $\sigma $-dependent entropy of the random variable ${\bf Z}$. Indeed, the relationship between log-likelihood and entropy is well documented in the literature [170]. We use the iterative Newton-Raphson optimization scheme [137] to find the optimal value of $\sigma $.
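
For reference, the textbook one-dimensional Newton-Raphson iteration applied to the entropy objective in (104) can be sketched as follows. The iteration index $k$ is notation introduced only for this sketch; how the derivatives are evaluated, how $\sigma $ is parameterized, and the stopping rule are not specified in this section.

```latex
\begin{align}
\sigma_{k+1} = \sigma_k -
\left. \frac{\partial h ({\bf Z}; \sigma) / \partial \sigma}
            {\partial^2 h ({\bf Z}; \sigma) / \partial \sigma^2}
\right|_{\sigma = \sigma_k} \notag
\end{align}
```

The update moves toward a stationary point of $h ({\bf Z}; \sigma)$; it approaches a minimum only where the second derivative is positive, so a reasonable starting value $\sigma_0$ matters in practice.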

It is important to note that a naive application of ML estimation yields $\sigma = 0$. Careful observation shows that computing $P ({\bf z}_t)$ using a sample $\mathcal{A}$ that includes ${\bf z}_t$ produces an optimal kernel-parameter estimate of zero [70,29,135]. This is because $\sigma = 0$ places an impulse function at each of the observations $\{ {\bf z}_t \}_{t \in \mathcal{T}}$, thereby maximizing each probability $P ({\bf z}_t)$. The resulting PDF estimate $P ({\bf Z})$, a superposition of impulse functions, is highly irregular and has little practical utility. Therefore, to regularize the PDF estimate, we ensure that the set $\mathcal{A}$ used to compute $P ({\bf z}_t)$ does not contain the observation ${\bf z}_t$, i.e.,

\begin{align}
P ({\bf z}_t) \approx \frac {1} {\vert\mathcal{A}_t\vert} \sum_{u \in \mathcal{A}_t} G_d ({\bf z}_t - {\bf z}_u; \Psi_d), \quad \mbox{where } \mathcal{A}_t \subset \mathcal{R} \mbox{ and } t \notin \mathcal{A}_t. \tag{105}
\end{align}

This method of regularization is called cross validation, and we employ it throughout this dissertation. It is known to be versatile, producing effective density estimates in a variety of situations [49,151,63,70,29]. Chow et al. [29] prove the consistency of the resulting nonparametric data-driven density estimator. The cross-validation-based PDF estimate, however, is also known to undersmooth the density at times and to be sensitive to outliers [151,156].
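
The effect of excluding ${\bf z}_t$ from its own density estimate can be illustrated with a minimal Python sketch. It is not the implementation used in this dissertation and rests on several assumptions: the kernel is an isotropic Gaussian with covariance $\sigma^2 I$, the set $\mathcal{A}_t$ is taken to be all samples other than $t$, synthetic 2-D Gaussian data stand in for image neighborhoods, and a coarse grid search replaces Newton-Raphson. The function names are hypothetical.

```python
import numpy as np


def parzen_log_density(z, sigma, leave_one_out=True):
    """Log Parzen-window density evaluated at every sample z[t].

    z     : (n, d) array of samples.
    sigma : standard deviation of an isotropic Gaussian kernel (assumption).
    """
    n, d = z.shape
    # Pairwise squared distances ||z_t - z_u||^2, shape (n, n).
    diff = z[:, None, :] - z[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Log of the Gaussian kernel value for every pair (t, u).
    log_k = -0.5 * sq_dist / sigma ** 2 - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    if leave_one_out:
        # Drop the u == t term so that z_t never explains itself.
        np.fill_diagonal(log_k, -np.inf)
    count = n - 1 if leave_one_out else n
    # Numerically stable log of the kernel average (log-sum-exp).
    row_max = np.max(log_k, axis=1, keepdims=True)
    log_sum = row_max[:, 0] + np.log(np.sum(np.exp(log_k - row_max), axis=1))
    return log_sum - np.log(count)


def entropy_estimate(z, sigma, leave_one_out=True):
    """Sample-mean entropy estimate: average of -log P(z_t; sigma)."""
    return -np.mean(parzen_log_density(z, sigma, leave_one_out))


# Synthetic stand-in data (assumption): 200 samples from a 2-D standard normal.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))

# Coarse grid of candidate kernel widths.
sigmas = np.logspace(-2, 1, 31)
h_naive = [entropy_estimate(z, s, leave_one_out=False) for s in sigmas]
h_loo = [entropy_estimate(z, s, leave_one_out=True) for s in sigmas]

# Grid value with the lowest cross-validated entropy.
best = sigmas[int(np.argmin(h_loo))]
```

On this grid the naive estimate is lowest at the smallest $\sigma $ tried, reflecting the collapse toward impulse functions, whereas the leave-one-out estimate attains its minimum at an interior value of $\sigma $.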

Other schemes such as plug-in bandwidth estimators perform more smoothing, but at the risk of missing subtle features in the PDF [156]. This is an example of the classic tradeoff between robustness and sensitivity. As Simonoff [156] puts it: data-driven smoothing-parameter selection remains a controversial issue where no specific method is accepted as the gold standard. Figure 3.1 shows the variation of the entropy measure as a function of $\sigma $ for the standard Lena image.

Figure 3.1: Optimal kernel bandwidth. (a) The Lena image. (b) The entropy estimate for the Lena image as a function of Parzen-window kernel $\sigma $.

Alternative strategies for regularization of the PDF estimate include spline-based methods [156] and incorporation of roughness penalties via the first/second derivatives of the logarithm or square-root of the PDF. For instance, Good and Gaskins [66,67] derive such a derivative-based roughness penalty by penalizing the KL-divergence between the estimated PDF and its shifted version. The resulting $\sigma $ estimates are known as penalized-ML estimates.
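
To make the link between a shift-based KL-divergence and a derivative-based penalty concrete, the following univariate sketch may help. The notation is introduced only for this illustration and is not taken from the cited works: $f$ is a smooth one-dimensional PDF, $\epsilon$ is a small shift, and $\alpha$ is a hypothetical penalty weight. A second-order Taylor expansion of $\log f(x - \epsilon)$ gives the approximation in the first line, and the second line shows one generic form of a penalized log-likelihood built from it.

```latex
\begin{align}
\mathrm{KL} \big( f(x) \,\Vert\, f(x - \epsilon) \big)
  &\approx \frac{\epsilon^2}{2} \int \frac{f'(x)^2}{f(x)} \, dx
   = 2 \epsilon^2 \int \left( \frac{d}{dx} \sqrt{f(x)} \right)^2 dx, \notag \\
\hat{f} &= \mathop{\mbox{argmax }}_{f}
  \left[ \sum_{t} \log f(z_t) - \alpha \int \frac{f'(x)^2}{f(x)} \, dx \right]. \notag
\end{align}
```

The equality in the first line follows from $(\sqrt{f})' = f' / (2 \sqrt{f})$, which is why the penalty can be written equivalently in terms of the derivative of the square root of the PDF.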


Suyash P. Awate 2007-02-21