Course Project Reports
Texture Synthesis on Surfaces
Particle Engine
A study of the Alpha 21364 Processor
Final Year Project
Real Time Operating System
Embedded Device
Distributed Database Management System
Multimedia Editor
Sound Editor
Online Chat Server
Summer Internship
Graphics Course Projects
Web Portal
Final Year Project - Document Image Analysis (ongoing)
Guides: Dr. P J Narayanan and Dr. C V Jawahar
Documents are mainly a paper-based medium containing information of various kinds such as text, graphics,
pictures, mathematical formulae, and tables. Nowadays, most documents are
being stored in electronic form because of their ease of storage, search, and
retrieval. However, even today, a number of documents (government files,
books, magazines, newspapers etc.) exist in print format. Such documents lack
the long-term persistence of electronic documents and their ease of storage
and retrieval. Hence, a number of researchers are devoting their time and
effort to the problem of converting scanned images of these documents
(called document images) to electronic form.

Document Image Analysis is concerned with this problem of transferring document images into electronic form. It involves the automatic interpretation of images of printed and handwritten documents, including text, forms, postal envelopes, bank checks, engineering drawings, maps etc. Several systems that work in specific domains, like the ones mentioned above, have been developed. Document Image Analysis can be defined as the process that performs the overall interpretation of document images. It is a key area of research for various applications in machine vision and media processing, including page readers, forms processing, content-based document retrieval and transmission, and digital libraries.

My final year project involves the development of a system for converting document images to electronic form. It would take in the scanned image of a document and convert it to a suitable format like HTML, XML or a custom format. The entire system would be completed in three phases:

i) Segmentation: The input image would first be segmented to separate out the text, the images (both natural and synthetic, like charts, equations etc.) and the background. The text would be fed to an OCR system (which may or may not be developed by me, depending on time constraints), while the images (including charts, mathematical equations etc.) would be handled by a separate module.

ii) Extraction of the geometric structure of the document: The geometric structure or layout of the document would then be extracted for use in reconstructing the document.

iii) Mapping this geometric structure to a logical structure: Once the layout of the document has been obtained, a new electronic document (such as an HTML document) would be produced.

Before starting with the second phase, I decided to look into possible applications of the system.
I came up with the following three applications:

a) Automatic building of a Geographical Information System (GIS): The system would take maps with embedded textual information as input, extract this information and combine the maps in different layers. I have prepared a small write-up about GIS in the course of this study.

b) Classification of a page based on its structure/layout: The layout information also gives an indication of the type of page and can be used to classify the page without using its text content. Existing algorithms rely solely on the text content of web pages for categorization. However, web documents carry a lot of information in their structure and in the images, audio, video etc. present in them. My colleague Kranthi and I have written a paper titled 'Web Page Classification based on Document Structure' based on this idea, which won the 2nd prize at the IEEE India Council National level Student Paper contest 2001.

c) Conversion of a multilingual document to electronic form: Given the scanned image of a multilingual document, the aim would be to separate out the text in the different languages, feed each to its respective OCR and then convert the document to electronic form.

I have decided to develop a system for the analysis of multilingual documents. So far, I have completed the first phase of the project. Analyzing a multilingual document poses a great challenge and, if solved, could prove very useful in the Indian context. For a more detailed report click here. To download a pdf version of the same click here.
Real Time Operating System (ongoing)
Guide: Dr. Govindarajulu
The motivation for the project was that building an OS from scratch allows one to gain a deeper understanding of the internals of a computer. This project would give us in-depth knowledge of the various hardware components that go into the making of a computer system. We started by looking at the necessary as well as the desired features of an RTOS. We also looked at the source code/specifications of a few operating systems like Linux and QNX, and came up with the following specifications for our system:
· Soft Real Time
· Micro-kernel based
· Single-user
· Multi-processing
· Text console based

The Operating System would be based on a preemptive soft-real-time scheduling algorithm. It would support the MSDOS FAT32 file system. There would be support for ELF, Portable Executable and COFF binaries.
Embedded Device
Guides: Dr. M B Srinivas and Dr. Kamalakar Karlapalem
The system we plan to develop is a specialized device that would enable customers to shop online without needing a computer or an internet connection. It is essentially an embedded system for web-based shopping. All that would be needed is a telephone connection, the telephone number of the retailer and this device. The system would offer a simple, secure and efficient means of carrying out retail transactions. Internally it would be a Linux-based PC with an integrated web client, optimized for its small display.
Multimedia Editor
Guide: Dr. C V Jawahar
The project involved developing an editor for image, audio and video files. The editor supports 3 image file formats (JPEG, GIF and PNM), 3 audio file formats (WAV, AU and MP3) and 2 video file formats (MPEG and AVI). It supports the regular editing features like cut, copy and paste, as well as multimedia-specific editing features.

The Image Editor works on the raw format, and all its editing features are implemented in this format. JPEG, GIF and PNM files are first converted into the raw format and the transformations are then applied; transformations have not been implemented directly in the compressed domain. The image editing features include rotation of the image, superimposition of two images, histogram equalization, color inversion, edge detection and color sharpening.

The Audio Editor uses the basic WAV format as the base for implementing the transformations. AU and MP3 audio files are decoded into WAV files and, once the relevant transformations have been applied, the WAV files are encoded back into MP3/AU files. Winamp plugins are used to decode MP3 to WAV. The audio editing features include alteration of the volume settings, filtering, and Fourier domain transformations.

The Video Editor makes use of the DirectX API to decode videos in the MPEG and AVI file formats into their constituent frames, which are then edited. Once the frames have been edited using the features implemented in the Image Editor, they are put together to form a video again. The video editing features include all the image editing features applied to individual frames of a video, merging two videos, and inserting frames of one video into another.
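Two of the image editing features listed above, histogram equalization and color inversion, can be illustrated on raw pixel data. The following is only a minimal sketch and not the editor's actual code: it assumes an 8-bit grayscale image held as a list of rows, and the function names `equalize_histogram` and `invert_colors` are hypothetical.

```python
def equalize_histogram(pixels, levels=256):
    """Return a histogram-equalized copy of a grayscale image.

    `pixels` is a list of rows, each a list of ints in [0, levels - 1].
    (Assumed representation of the "raw format"; not taken from the project.)
    """
    # Count how often each gray level occurs.
    hist = [0] * levels
    for row in pixels:
        for value in row:
            hist[value] += 1

    # Build the cumulative distribution function (CDF).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    if total == 0:
        # Empty image: nothing to do.
        return [row[:] for row in pixels]

    # CDF value at the darkest gray level that actually occurs.
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Only one gray level present: nothing to spread out.
        return [row[:] for row in pixels]

    # Remap levels so that the output CDF is approximately linear.
    scale = (levels - 1) / (total - cdf_min)
    mapping = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[mapping[value] for value in row] for row in pixels]


def invert_colors(pixels, levels=256):
    """Return the negative of a grayscale image."""
    return [[levels - 1 - value for value in row] for row in pixels]
```

For a color image the same functions could be applied per channel, although equalizing channels independently shifts colors; that choice is left open here.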
Distributed Database Management System
Guide: Dr. Kamalakar Karlapalem
As part of our Distributed Database Management System course, we implemented a distributed layer over the MySQL DBMS. The project was implemented in Java. The tables of a single database were distributed over three systems. Further, individual tables were also fragmented vertically and horizontally and distributed over the various sites. The database catalog was placed at all three sites. Three modules were implemented: the distributed select, the distributed update and the optimization module. Only the select and update clauses were implemented, as they are enough to deliver the basic functionality of the DBMS.

The distributed select involves displaying all the tuples that satisfy the select. If the table is horizontally partitioned, this is easy, as the select has to be performed with the same set of attributes on all the horizontal fragments. If the condition of the query is defined over the attribute used in the partitioning condition, we look for a contradiction between the user query condition and the partitioning condition. This increases efficiency because, in case of a contradiction, a select over that fragment yields no tuples and can be skipped. If the conditions are defined over different attributes, an ‘and’ of the two conditions is performed while selecting. Vertically partitioned tables need to be joined on the key attributes.

Updates are propagated through the various partitions. Updating a horizontally partitioned table involves checking whether the update changes the attribute on which the fragmentation condition is defined. If it does, the tuple is deleted from its original fragment and inserted into the fragment whose condition it now satisfies. An update over a vertically partitioned table requires the entire tuple with all its attributes to be selected, the change made, the old tuple deleted and the new tuple inserted into the database. Distributed insert is also implemented to facilitate updates.

One way of optimizing distributed query execution is to minimize communication costs, i.e. at each node the selects are performed before the joins, and only the reduced results are sent to the next node in the network where they are needed.
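The contradiction check for a select over horizontal fragments can be sketched as follows. This is an illustration only: the project itself was written in Java over MySQL, whereas this sketch uses in-memory Python data, and it assumes that every condition is a simple half-open range on one attribute. The names `RangeCondition`, `Fragment` and `distributed_select`, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeCondition:
    """Hypothetical predicate: low <= row[attribute] < high."""
    attribute: str
    low: float
    high: float

    def matches(self, row):
        return self.low <= row[self.attribute] < self.high

    def contradicts(self, other):
        """True when no value can satisfy both conditions at once."""
        if self.attribute != other.attribute:
            return False  # conditions on different attributes never contradict
        return self.high <= other.low or other.high <= self.low


@dataclass
class Fragment:
    site: str                  # node that stores this horizontal fragment
    condition: RangeCondition  # partitioning condition of the fragment
    rows: list                 # tuples stored at the site, as dicts


def distributed_select(fragments, query):
    """Run `query` over all fragments, skipping contradictory ones.

    Returns the matching rows and the list of sites that were contacted.
    """
    results, visited = [], []
    for fragment in fragments:
        if query.contradicts(fragment.condition):
            continue  # the select would yield no tuples, so skip the site
        visited.append(fragment.site)
        # 'and' of the user condition and the partitioning condition.
        results.extend(
            row for row in fragment.rows
            if query.matches(row) and fragment.condition.matches(row)
        )
    return results, visited


# Sample data (made up for illustration): employees partitioned by salary.
fragments = [
    Fragment("site1", RangeCondition("salary", 0, 1000),
             [{"name": "a", "salary": 500, "age": 30}]),
    Fragment("site2", RangeCondition("salary", 1000, 5000),
             [{"name": "b", "salary": 1500, "age": 45},
              {"name": "c", "salary": 4000, "age": 28}]),
    Fragment("site3", RangeCondition("salary", 5000, float("inf")),
             [{"name": "d", "salary": 9000, "age": 50}]),
]
```

A query on `salary` between 1200 and 2000 contacts only `site2`, while a query on `age` has to contact all three sites, since the partitioning attribute gives no information about it.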
Sound Editor
Guide: Dr. Rajeev Sangal
We were introduced to this project by Prof. Rajendran, who is part of a team at IIT Chennai that has developed a Hindi text-to-speech system. The system is based on a parameter concatenation model and makes use of the following speech parameters:
· Linear Predictive Coefficients (LPC)
· Formants
· Pitch
· Gain

The researchers working on this TTS system had been studying these parameters theoretically. They were not able to manipulate/control these parameters and see the results, so a need was felt for a tool to display and control them. To develop such a tool we first had to understand the theory behind these parameters, their implementation etc. We then developed a system that could display these parameters and allow the user to manipulate them. This system also allowed for recording and simple manipulations of a speech signal, like cutting, copying, pasting, concatenating two signals etc.
Guide: Prof. Ramana Reddy
This project was done by a group of five students in the Distributed Computing course under Prof. Ramana Reddy. The coding was done in Java so as to make it easily portable to any system that supports the Java Runtime Environment. It is based on a client-server architecture. The communication between the server and the client was implemented through Remote Method Invocation (RMI). RMI was preferred over raw TCP sockets because it lets the client call methods on the server directly, which gives a more fine-grained approach to distributed application development with less programming overhead. The GUI was designed using the Swing classes of Java.

This application was specifically designed for taking tests over the web. The client downloads the set of questions from the server at the beginning of the test. The score is displayed as soon as the user finishes the test. The user can compare his performance with his previous performances or with the toppers of the test so as to know where he stands.
Summer Internship
As part of the industry training after my 2nd year I worked as an intern at Exceed Commerce,
Description of Toadnode
Toadnode is a computer program that allows the user to share and download files with Internet users all over the world. Toadnode helps find files that a user wants to download and helps him share files he thinks other Internet users want. Each Toadnode client is a node in an anonymous, worldwide file-sharing network. This network works differently from most because it is not centralized, i.e. there is no central server. In traditional file-sharing networks, large server computers that operate 24 hours a day make files available for download. Toadnode uses peer-to-peer file sharing, which doesn't use servers. Toadnode is available for download.

During the course of the project I learned a lot about software design, development, testing and maintenance. This was an invaluable experience for the international exposure it gave me.
Graphics Course Projects
Guide: Prof. Asthana
· Fractal Generator: Fractals are self-similar geometric figures, i.e. within a figure there are infinitely many smaller versions of itself. I developed an application that generates a few popular fractals like Julia and Mandelbrot, and also a fractal of my own. Click here for a few fractals generated by the application.
· Brick Game: The popular brick game was developed in OpenGL, DirectX, Turbo C and VRML, and the performance of all the versions was studied.
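The Mandelbrot and Julia fractals mentioned above are commonly generated with the escape-time method: iterate z -> z*z + c and record how quickly |z| grows beyond a bound. The sketch below shows that general technique, not the application's own code; the function names, the text-mode rendering and the viewing window are assumptions made for illustration.

```python
def escape_time(z, c, max_iter=50, radius=2.0):
    """Iterate z -> z*z + c; return the step at which |z| first exceeds radius.

    Returns max_iter when the orbit stays bounded for all iterations.
    """
    for step in range(max_iter):
        if abs(z) > radius:
            return step
        z = z * z + c
    return max_iter


def mandelbrot_rows(width=60, height=24, max_iter=50):
    """Text rendering of the Mandelbrot set: z starts at 0, c varies per pixel."""
    rows = []
    for j in range(height):
        y = -1.2 + 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.0 + 2.6 * i / (width - 1)
            inside = escape_time(0j, complex(x, y), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return rows


def julia_rows(c, width=60, height=24, max_iter=50):
    """Text rendering of a Julia set: c is fixed, the start value z varies per pixel."""
    rows = []
    for j in range(height):
        y = -1.2 + 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -1.6 + 3.2 * i / (width - 1)
            inside = escape_time(complex(x, y), c, max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return rows
```

Calling `print("\n".join(mandelbrot_rows()))` draws the set in a terminal; a graphical version would map the escape step to a color instead of a single character.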
Web Portal
Guide: Dr. Rajeev Sangal
The course on Perl and Scripting Languages was practically oriented, with a lot of emphasis given to the project. We built a web portal to understand the complexities and issues involved in web design. Our portal included:

a) Two chat systems, one in Java and the other using HTML and Perl CGI: The HTML chat system was developed because some people disable Java applets for security reasons. Both versions supported chat in several Indian languages.

b) Bulletin Board: A bulletin board (BB) is a common knowledge pool where a number of users can share information by posting it. The issues involved here include:
i. Many users may try to post a message on the BB simultaneously, resulting in the messages getting jumbled up.
ii. If user requests are handled one at a time, some users may be forced to wait for a long time.
iii. A message posted by one user may not be appropriate for another individual or group.
We took care of the first issue by locking the file before writing into it. The second issue was handled by forking a new process to handle each request; this process waits on the lock while the user continues with his work. The third issue is handled by customizable filters.

c) A rudimentary lexical analyzer: We first built a database of the high-frequency words corresponding to each category, using a corpus provided by the Language Technologies Research Center (LTRC). The user gives a document as input to the analyzer, which categorizes it based on the frequency of occurrence of these keywords.

d) Mail Server: We have provided a web interface for the user to send and receive mail. A user wanting to send a mail simply types it into our web interface. We take the mail from the web interface and send it directly to port 25 of the mail server. Next, we configured Sendmail to handle virtual POP accounts. To receive mail, we query port 110 to communicate with the POP daemon.
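The categorization done by the lexical analyzer in item (c) can be sketched as a keyword-frequency score. This is a minimal illustration under several assumptions: the portal was written in Perl, the LTRC corpus is not reproduced here, the tokenizer below handles only lowercase Latin letters, and the stop-word list and the function names `build_keyword_table` and `categorize` are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not from the project).
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "in", "to", "is"})


def tokenize(text):
    """Split text into lowercase words, dropping stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_keyword_table(corpus, top_n=5):
    """Map each category to the set of its `top_n` most frequent words.

    `corpus` maps a category name to a list of training documents.
    """
    table = {}
    for category, documents in corpus.items():
        counts = Counter()
        for document in documents:
            counts.update(tokenize(document))
        table[category] = {word for word, _ in counts.most_common(top_n)}
    return table


def categorize(document, table):
    """Return the category whose keywords occur most often, or None."""
    counts = Counter(tokenize(document))
    scores = {
        category: sum(counts[word] for word in keywords)
        for category, keywords in table.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy corpus, made up for illustration only.
corpus = {
    "sports": ["cricket match score", "cricket team wins match"],
    "finance": ["stock market price", "market bank stock"],
}
table = build_keyword_table(corpus)
```

With this toy table, `categorize("the cricket team won the match", table)` selects the sports category, and a document that contains none of the keywords is left uncategorized.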