Optical Character Recognition (OCR) is the process of recognizing handwritten or printed text by mechanical or electronic means (usually a computer). The process amounts to scanning the text into computer memory (using an optical scanner or a pen-based display), processing the scanned image, and outputting a stream of ASCII characters that represents the text.
The take-home problem chosen for this year's contest is an unusual one. Unlike past years' problems, it doesn't have a solution. In an academic setting such a problem is usually called a Ph.D. thesis, while in industry it's sometimes known as "If you can solve this one, I will make you a vice-president." Yes, there are many optical character recognizers out there, used for purposes ranging from mail delivery to processing income tax returns. However, most of them are very limited in the kinds of input they accept. For example, the U.S. Postal Service issues guidelines that describe how an address must be printed in order to be read by its OCR equipment. And the popular Newton computer will only recognize your handwriting if you write in a very specific way. In fact, the current state-of-the-art OCR program produced by AT&T, which can recognize printed characters in almost any font with the accuracy of a human typist (about 2-3 typos per page), costs $10,000! Of course, our problem will not be so grandiose; nevertheless, it will hopefully give you a glimpse of the kind of work you can expect to do if you choose to go into the field of Computer Science.