next up previous contents
Next: Clustering using the Feature Up: Document Segmentation Previous: Feature Selection   Contents

Estimation of Feature Importances

The K-Means and other standard clustering algorithms use simple euclidean distance between two vectors for segmentation. The euclidean difference between two feature vectors, u and v, is given by $d^2(v,u) = (v-u)^T(v-u)$. However, the response of the image to different bands may not be equally strong [22]. To control the impact of each band individually, the difference along each component is weighed differently. The weighted distance between two feature vectors is given by:
\begin{displaymath}
d^2(v,u) = w_i^2 \sum_{i=1}^{n} (v_i-u_i)^T(v_i-u_i)
\end{displaymath} (19)

where $w_i$ is the weight assigned to the scale/orientation band in feature vector component i. It is clear that $w_i \ge 0$ for all i. Without loss in generality, the constraint
\begin{displaymath}
\sum_{i=1}^{n} w_i = 1
\end{displaymath} (20)

can also be added, as only relative weights are important. The weights should be assigned such that the bands having high discriminatory power are assigned higher weights. The clustering algorithm assigns a label $s_i^I$ under the clustering scheme represented by I to each pixel i such that the aggregate clustering error:
\begin{displaymath}
J = \sum_{i=1}^{n} d^2(v_i,\bar{v}^{s_i^I})
\end{displaymath} (21)

is minimum. The objective function is a function of the clustering scheme as well as the weight vector W as the clustering error depends on the distance measure given by Equation  19. J can be optimized in two steps:
  1. Optimize J with respect to cluster assignments using the K-Means algorithm.
  2. Optimize J with respect to the weight vector W. The procedure is described below.
  3. Iterate over the above two steps till the change in J is very low.
The objective function can be written as:
\begin{displaymath}
J(W,I) = \sum_{i=1}^{n} w_j^2 \psi(j),
\end{displaymath} (22)

where n is the number of components in the feature vector, and $\psi(i) = \sum_{i=1}^{n} d_j^2(v_i, \bar{v}^{s_i^I})$ is the clustering error limited to the feature j. To minimize J with respect to $w_i$, we incorporate the constraint given in Equation  20 into the objective function:
\begin{displaymath}
F(W,\lambda) = \sum_{i=1}^{n} w_i^2 \psi_i - \lambda (\sum_{i=1}^{n} w_i - 1)
\end{displaymath} (23)

Differentiating Equation  23 with respect to $w_m$, we get
\begin{displaymath}
w_m = \frac{\lambda}{2\psi_m}
\end{displaymath} (24)

Substituting Equation  24 into Equation  20 and equating to zero, we get,
\begin{displaymath}
w_m = \frac{1}{\sum_{k=1}^{n} \frac{\psi_m}{\psi_k}}
\end{displaymath} (25)

Thus, using the above formula the weights can be calculated iteratively.
next up previous contents
Next: Clustering using the Feature Up: Document Segmentation Previous: Feature Selection   Contents
2002-06-03