
Introduction

Document Image Analysis is concerned with converting document images into electronic form. This involves the automatic interpretation of images of printed and handwritten documents, including text, forms, postal envelopes, bank cheques, engineering drawings, and maps [1]. Several systems that work in specific domains, such as those mentioned above, have been developed. Document Image Analysis can be defined as the process that performs the overall interpretation of document images [2]. It is a key area of research for applications in machine vision and media processing, including page readers, content-based document retrieval, and digital libraries.

The first step of a Document Image Analysis system is to extract the text, figures, tables, graphs, charts, mathematical equations, etc. from the background. This involves segmenting the document image into equations, tables, charts and text. An important follow-up problem in multilingual documents is the segmentation of text by script.

This project aims to separate the regions of different scripts in the scanned image of a document. Such a system would be essential in a multilingual country like India, where a single document page (for example, a passport application form, a public examination question paper, or a railway booking form) may contain words in two or more scripts, typically those of English, Hindi and the local language. Once the regions of different scripts have been separated, each can be fed to the corresponding OCR system. Applications of such a system include digital libraries, multimedia systems, information retrieval systems, and e-governance.

The report is organized as follows. Section 2 briefly reviews ongoing work in related areas. Section 3 describes some of the important properties common to most Indian scripts. Section 4 presents a Gabor filter based multi-channel filtering approach to segmentation, which relies on the textural properties of Indian scripts; this section includes a brief description of Gabor filters. A technique for feature space reduction and estimation of feature importances is also presented. Sections 5 and 6 present future work and conclusions. Finally, experimental results are presented in Section 7.
2002-06-03