
System Overview

We use the following approach for image segmentation:
  1. Preprocess the image to obtain a binary image in which the text is black and the background is white.
  2. Pass the image through a set of filters, each tuned to a certain band of frequency and orientation, and compute the feature vector corresponding to each filtered image (discussed in more detail in the following Sections).
  3. Compute the feature image as described in step 2 above.
  4. Select a set of filters to which the image has a high response. This is described in more detail in Section 4.6.
  5. Construct a feature vector using this reduced set of filters and segment using the K-Means algorithm.
An overview of the system is shown in Figure 2.

Figure 2: Overview of the System
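
Steps 4 and 5 of the approach above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: ranking filters by total squared response is an assumed stand-in for the selection criterion of Section 4.6, the helper names (`select_filters`, `pixel_features`, `kmeans`) and the toy responses are hypothetical, and filtered images are represented as plain nested lists.

```python
def filter_energy(response):
    """Total squared response of one filtered image (a 2-D list)."""
    return sum(v * v for row in response for v in row)


def select_filters(responses, keep):
    """Indices of the `keep` filtered images with the highest energy.

    Energy ranking is an assumption of this sketch; the text defers the
    actual selection criterion to Section 4.6.
    """
    ranked = sorted(range(len(responses)),
                    key=lambda i: filter_energy(responses[i]),
                    reverse=True)
    return sorted(ranked[:keep])


def pixel_features(responses, chosen):
    """One feature vector per pixel (row-major): |response| per chosen filter."""
    rows, cols = len(responses[0]), len(responses[0][0])
    return [[abs(responses[i][r][c]) for i in chosen]
            for r in range(rows) for c in range(cols)]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-Means with a deterministic, evenly spaced start."""
    step = max(1, len(points) // k)
    centers = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        for n, p in enumerate(points):
            labels[n] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data: three filtered versions of a 2x4 image. Filter 0 responds to the
# left half, filter 1 to the right half, and filter 2 hardly responds at all.
responses = [
    [[1.0, 0.9, 0.1, 0.0], [0.9, 1.0, 0.0, 0.1]],
    [[0.0, 0.1, 0.9, 1.0], [0.1, 0.0, 1.0, 0.9]],
    [[0.01, 0.0, 0.01, 0.0], [0.0, 0.01, 0.0, 0.01]],
]
chosen = select_filters(responses, keep=2)      # drops the weak filter
features = pixel_features(responses, chosen)    # one vector per pixel
labels, centers = kmeans(features, k=2)         # one label per pixel
```

On this toy input the weak third filter is discarded and the left and right halves of the image fall into different clusters. In a real document image the per-pixel features would normally be smoothed over a neighbourhood before clustering; that step is omitted here.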

As mentioned in step 2, we segment the text in the images using a set of filters, each tuned to a certain band of frequency and orientation. The Laplacian pyramid can be used to obtain a set of pseudo band-pass filtered images, each of which passes a particular frequency band. The number of levels of the Laplacian pyramid determines the number of band-pass filtered images used for segmentation. The Laplacian pyramid is adequate for extracting text from the background, because text has a stronger response to the filters, while background areas with little intensity variation have nearly no response [17]. For segmenting multilingual documents, however, analysis of the frequency content of the image alone is inadequate: the orientation information also needs to be extracted.

The Gabor filter can separate out information at various scales and orientations. Hence we currently use a Gabor filter bank, which decomposes a given input image into a number of filtered images, each covering some band of frequency and orientation. Our implementation uses Gabor filters at up to 6 scales and 30 orientations, giving a total of 180 filters. The scale and orientation of a filter are parameterized by $(u_0,v_0)$, while $\sigma_x$ and $\sigma_y$ determine the filter bandwidth. The next Sections give a brief description of image pyramids, followed by the Gabor filter.
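
To make the parameterization concrete, the sketch below builds the 6 x 30 = 180 centre frequencies $(u_0,v_0)$ and samples one kernel. It assumes the common even-symmetric form, a Gaussian envelope with widths $\sigma_x$ and $\sigma_y$ multiplied by $\cos(2\pi(u_0 x + v_0 y))$, which may differ in detail from the definition given in the following Sections. The octave spacing of the radial frequencies, the value `f_max=0.25`, the kernel size, and the function names are illustrative assumptions, not values from the text.

```python
import math


def gabor_centers(n_scales=6, n_orientations=30, f_max=0.25):
    """Return one (u0, v0) centre frequency per filter in the bank.

    Octave-spaced radial frequencies and orientations evenly spaced over
    [0, pi) are assumptions of this sketch.
    """
    centers = []
    for s in range(n_scales):
        f = f_max / (2 ** s)                      # radial frequency, cycles/pixel
        for o in range(n_orientations):
            theta = math.pi * o / n_orientations  # orientation in radians
            centers.append((f * math.cos(theta), f * math.sin(theta)))
    return centers


def gabor_kernel(u0, v0, sigma_x, sigma_y, half_size):
    """Even-symmetric Gabor kernel sampled on a (2*half_size+1)^2 grid.

    The Gaussian envelope is axis-aligned here (not rotated with the
    carrier), a simplification that is exact when sigma_x == sigma_y.
    """
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            envelope = math.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2))
            carrier = math.cos(2 * math.pi * (u0 * x + v0 * y))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


bank = gabor_centers()                 # 180 (u0, v0) pairs
u0, v0 = bank[0]                       # highest frequency, orientation 0
kernel = gabor_kernel(u0, v0, sigma_x=2.0, sigma_y=2.0, half_size=4)
```

Here the radial frequency $\sqrt{u_0^2+v_0^2}$ sets the scale and the angle of $(u_0,v_0)$ sets the orientation, so each pair in `bank` identifies one filter. Convolving the image with each kernel would give the filtered images referred to in step 2.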
2002-06-03