
Feature Selection

The output of the Gabor filter bank is a set of filtered images, one per filter. To obtain uniform and complete coverage of the frequency domain, Gabor filters at up to 6 scales and 30 orientations were used, i.e. up to 180 filtered images. Using all 180 filtered images is computationally expensive. Moreover, some of the filtered images have very little discriminatory power, while others carry very little information about the original image. Hence, a subset of the filtered images suffices for segmentation. Consider a set of $n$ $d$-dimensional feature vectors $x_1, x_2, \ldots, x_n$ belonging to two classes. The two classes will be well separated in the feature space if the function
\begin{displaymath}
J(w) = \frac{\vert m_1-m_2\vert^2}{s_1^2+s_2^2} \qquad (18)
\end{displaymath}

is maximized. Here $m_1$ and $m_2$ are the means, and $s_1$ and $s_2$ the standard deviations, of the two classes. Maximizing this ratio rewards a large separation between the class means while penalizing a large spread within each class, i.e. it favors features for which the points of each class are compactly clustered around their respective mean. The discriminatory power of each feature is estimated by computing $J(w)$ for that feature alone, and the ten features with the highest values of $J(w)$ are used for segmentation.
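The per-feature ranking described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: the helper names `fisher_score` and `select_features` are hypothetical, class labels are assumed to be `0` and `1`, $s_1$ and $s_2$ are taken as population (not sample) standard deviations, and a feature with zero spread in both classes is assigned an infinite score when its class means differ (an assumed convention). Component $j$ of each feature vector plays the role of filtered image $j$.

```python
from statistics import fmean, pstdev


def fisher_score(class_a, class_b):
    """Return J = (m1 - m2)^2 / (s1^2 + s2^2) for one scalar feature.

    Assumption: population standard deviations (statistics.pstdev).
    """
    m1, m2 = fmean(class_a), fmean(class_b)
    s1, s2 = pstdev(class_a), pstdev(class_b)
    denom = s1 ** 2 + s2 ** 2
    if denom == 0.0:
        # Both classes are constant on this feature (assumed handling):
        # perfectly separable if the means differ, useless otherwise.
        return float("inf") if m1 != m2 else 0.0
    return (m1 - m2) ** 2 / denom


def select_features(vectors, labels, k=10):
    """Score each of the d features by J and return the k best.

    vectors: n sequences of length d (one value per filtered image).
    labels:  n class labels, assumed to be 0 or 1.
    Returns (indices of the top-k features, list of all d scores).
    """
    d = len(vectors[0])
    scores = []
    for j in range(d):
        class_a = [x[j] for x, y in zip(vectors, labels) if y == 0]
        class_b = [x[j] for x, y in zip(vectors, labels) if y == 1]
        scores.append(fisher_score(class_a, class_b))
    # Highest J first; sorted() is stable, so ties keep the lower index first.
    ranked = sorted(range(d), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy data (hypothetical): 4 feature vectors, 3 features, two classes.
vectors = [(0.0, 5.0, 1.0), (0.2, 5.1, 3.0), (1.0, 5.0, 2.0), (1.2, 4.9, 2.0)]
labels = [0, 0, 1, 1]
top, scores = select_features(vectors, labels, k=2)
print(top)  # [0, 1]
```

In the toy data, feature 0 separates the classes cleanly ($J \approx 50$), feature 1 only weakly ($J \approx 2$), and feature 2 not at all ($J = 0$, equal means). In the setting of this section each vector would have up to 180 components and `k=10`. Because every feature is scored independently, this criterion ignores correlations between filtered images, so two nearly redundant features can both be selected.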
2002-06-03