next up previous
Next: Bayesian Image Restoration Up: Markov Random Fields Previous: Markov Consistency


Parameter Estimation

Modeling the Markov PDFs parametrically entails data-driven optimal estimation of the parameters associated with the GRF potential functions or the Markov PDFs, lest we enforce an ill-fitted model on the data. Even nonparametric schemes are not free of internal parameters and one would want to learn these parameters in a data-driven manner. Standard estimation schemes, e.g., maximum likelihood, are not applicable in a straightforward manner for this task. Consider that we want to estimate some parameter $\theta$ in the MRF model. A ML-estimation scheme needs to evaluate the joint PDF of all the RVs in the MRF, i.e., $P ({\bf x} \vert \theta)$, which is a function of $\theta$. We can compute the potential functions $V_c ({\bf x}, \theta)$, as functions of $\theta$, in a simple way. The partition function $\eta (\theta)$, however, involves a $\theta$-dependent integral over the entire $\vert\mathcal{T}\vert$-dimensional space of possible realizations of the MRF. This is virtually intractable for any practical dataset, or image, comprising a reasonable number of indices $\vert\mathcal{T}\vert$. For instance, a 256 $\times $ 256 pixels image results in a $65536$D space.

Besag [14,15] devised one way to bypass this problem in the following way. Based on his idea, we first choose a set of indices $\mathcal{T}_\alpha$ such that the neighborhoods for the indices in $\mathcal{T}_\alpha$ do not overlap, i.e.,

$\displaystyle \mathcal{T}_\alpha$ $\textstyle \subset$ $\displaystyle \mathcal{T},$ (80)
$\displaystyle \mathcal{N}_t \cap \mathcal{N}_u$ $\textstyle =$ $\displaystyle \phi, \forall t,u \in \mathcal{T}_\alpha.$ (81)

This makes the set of random vectors corresponding to these neighborhoods mutually independent and identically distributed and, hence, a random sample. Besag referred to this partitioning process as the coding scheme. Then, the likelihood function is
$\displaystyle L (\theta) = \prod_{t \in \mathcal{T}_\alpha} P (x_t \vert {\bf y}_t, \theta)$     (82)

and the optimal parameter estimate is
$\displaystyle \mathop{\mbox{argmax }}_{\theta} L (\theta).$     (83)

This does not involve evaluation of the unwieldy partition function and standard numerical optimization techniques, e.g., the Newton-Raphson method, can produce the optimal estimate.

A major drawback of the coding-based parameter estimation is the wastage of data [14,15] because it utilizes only a small part $\mathcal{T}_\alpha$ ( $\vert\mathcal{T}_\alpha\vert \ll \vert\mathcal{T}\vert$) of the entire data. Another drawback is that the partition $\mathcal{T}_\alpha$ is not unique, and different partitions produce potentially different parameter estimates. There appears no clear way of reconciliation between these different estimates [99].

To alleviate the drawbacks of the coding scheme, Besag [14,15] invented a simple approximate scheme called the pseudo-likelihood estimation. This eliminated any coding strategies and used all the data at hand. The pseudo-likelihood function $L_{\mathrm{pseudo}}
(\theta)$ is simply the product of the conditional likelihoods at each index $t \in \mathcal{T}$, i.e.,

$\displaystyle L_{\mathrm{pseudo}} (\theta) = \prod_{t \in \mathcal{T}} P (X_t \vert {\bf y}_t, \theta).$     (84)

The optimal parameter estimate is
$\displaystyle \mathop{\mbox{argmax }}_{\theta} L_{\mathrm{pseudo}} (\theta).$     (85)

The overlapping neighborhoods of indices $t$ in the product do not produce independent observations, and the resulting function is not the true likelihood function--hence the name. Geman and Graffigne [62], later proved that the pseudo-likelihood estimate converges, with probability one, to the true ML estimate asymptotically with infinite data ( $\vert\mathcal{T}\vert
\rightarrow \infty$).

The literature also presents other methods of MRF-parameter estimation such as those based on mean-field approximations and least-squares fitting [99].


next up previous
Next: Bayesian Image Restoration Up: Markov Random Fields Previous: Markov Consistency
Suyash P. Awate 2007-02-21