next up previous contents
Next: Gabor Filters Up: Image Pyramids for generating Previous: Gaussian Filter   Contents

Gaussian and Laplacian Pyramids

The Gaussian pyramid is computed as follows. The original image is convolved with a Gaussian kernel. As described above the resulting image is a low pass filtered version of the original image. The cut-off frequency can be controlled using the parameter $\sigma$. The Laplacian is then computed as the difference between the original image and the low pass filtered image. This process is continued to obtain a set of band-pass filtered images (since each is the difference between two levels of the Gaussian pyramid). Thus the Laplacian pyramid is a set of band pass filters

Figure 5: The filtered images stacked one on top of the other form a tapering pyramid structure, hence the name

Implementation Let $I(x,y)$ be the original image. The Gaussian pyramid on image I is defined as:
G_0(x,y) = I
\end{displaymath} (6)

G_{i+1}(x,y) = REDUCE(G_i(x,y))
\end{displaymath} (7)

The REDUCE operation is carried out by convolving the image with a Gaussian low pass filter. The filter mask is designed such that the center pixel gets more weight than the neighboring ones and the remaining terms are chosen so that their sum is 1. The Gaussian kernel is given by:
w(r,c) = w(r)w(c),
\end{displaymath} (8)

where, w(r) =

\begin{displaymath}\left( \begin{array}{ccccc}
\frac{1}{4}-\frac{a}{2} & \frac{...
...& \frac{1}{4} & \frac{1}{4}-\frac{a}{2}\\
\end{array} \right)\end{displaymath}

a is chosen in the range 0.3 to 0.6 The prediction error $L_0(x,y)$ is then given by
L_i(x,y) = G_i(x,y) - EXPAND(G_{i+1}(x,y))
\end{displaymath} (9)

The EXPAND operation is defined as follows:
G_{i+1}(x,y) = 4 \sum_{m=-2}^{2} \sum_{n=-2}^{2} w(m,n) G_{i}(\frac{x-m}{2},\frac{y-n}{2})
\end{displaymath} (10)

Only terms for which (x-m)/2 and (y-n)/2 are integers are included in the sum. Rather than encode $G_0$ , $L_0$ and $G_1$ is encoded. This results in a net data compression because:
  1. $L_0$ is largely uncorrelated, and so may be represented pixel by pixel with many fewer bits than $G_0$
  2. $G_1$ is low-pass filtered, and so may be encoded at a reduced sample rate. Further data compression is achieved by iterating this process.
By repeating these steps several times a sequence of images $L_0$ , $L_1$ , $L_2$ ,... , $L_n$ are obtained. If we now imagine these images stacked one above another, the result is a tapering pyramid data structure - hence the name. The Laplacian pyramid can thus be used to represent images as a series of band-pass filtered images, each sampled at successively sparser densities. It is frequently used in image processing and pattern recognition tasks because of its ease of computation. We were able to extract the text from the background in an image by using 3 levels of the Laplacian pyramid. We used the K-Means algorithm to segment the 3 images obtained at each level of the pyramid. The text, which has a stronger response to the filters forms one cluster, while the background areas with little intensity variation form a separate cluster. However, for segmenting multilingual documents, analysis of the frequency content of the image alone is inadequate. In addition to the frequency information, the orientation information also needs to be extracted, for which we have used the Gabor filters. In the next section, we give a brief description of the Gabor filters.
next up previous contents
Next: Gabor Filters Up: Image Pyramids for generating Previous: Gaussian Filter   Contents