University of Utah
Department of Computer Science


AutoSlog and AutoSlog-TS

Project Description

Information extraction (IE) systems typically rely on a dictionary of extraction patterns to identify relevant information. In most IE systems, these dictionaries are constructed manually, which is extremely tedious and time-consuming. To address this knowledge-engineering bottleneck, we have developed a system called AutoSlog that automatically builds dictionaries of extraction patterns for new domains. AutoSlog uses an annotated corpus and simple linguistic rules. A training corpus for AutoSlog must be annotated by a person to indicate which noun phrases need to be extracted from a text. For example, given the sentence "The mayor was kidnapped by armed men", a person would mark "the mayor" as a kidnapping victim and "armed men" as the perpetrators. AutoSlog then proposes patterns that are capable of extracting these noun phrases. From this example sentence, AutoSlog would create one pattern, "X was kidnapped", to extract the mayor, and a second pattern, "was kidnapped by Y", to extract the armed men. Because the patterns are general, they will extract similar information from new texts as well.
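The kidnapping example can be made concrete with a small sketch. This is only an illustration of the idea, not AutoSlog's actual rule set or implementation: the two heuristics, the preposition list, and the names Annotation and propose_pattern are all assumptions made for this example, and the verb group is supplied by hand where a real system would obtain it from syntactic analysis.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny preposition list for this illustration.
PREPOSITIONS = {"by", "in", "on", "at", "with", "from"}


@dataclass(frozen=True)
class Annotation:
    phrase: str  # noun phrase marked by the human annotator
    role: str    # e.g. "victim" or "perpetrator"


def propose_pattern(sentence: str, verb_group: str, ann: Annotation) -> Optional[str]:
    """Propose one extraction pattern for an annotated noun phrase.

    Toy heuristics (assumptions, not AutoSlog's real rules):
      - phrase before the verb group          -> "X <verb group>"
      - phrase after it, behind a preposition -> "<verb group> <prep> Y"
    Returns None when neither heuristic applies.
    """
    text = sentence.lower()
    vg = verb_group.lower()
    np_start = text.find(ann.phrase.lower())
    vg_start = text.find(vg)
    if np_start < 0 or vg_start < 0:
        return None  # phrase or verb group not found in the sentence

    if np_start < vg_start:
        return f"X {vg}"

    # Words between the end of the verb group and the start of the phrase.
    between = text[vg_start + len(vg):np_start].split()
    if len(between) == 1 and between[0] in PREPOSITIONS:
        return f"{vg} {between[0]} Y"
    return None


SENTENCE = "The mayor was kidnapped by armed men"
VERB_GROUP = "was kidnapped"  # supplied by hand in this sketch

victim_pattern = propose_pattern(SENTENCE, VERB_GROUP, Annotation("mayor", "victim"))
perp_pattern = propose_pattern(SENTENCE, VERB_GROUP, Annotation("armed men", "perpetrator"))

print(victim_pattern)  # X was kidnapped
print(perp_pattern)    # was kidnapped by Y
```

Each proposed pattern keeps the verb group and replaces the annotated noun phrase with a slot, which is why the same pattern can match other sentences with different victims or perpetrators.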

AutoSlog has been used to build dictionaries for three different domains: terrorism, joint ventures, and microelectronics. Given a training corpus for the terrorism domain, AutoSlog produced a dictionary with only 5 person-hours of effort; that dictionary achieved 98% of the performance of a hand-crafted dictionary that required approximately 1500 person-hours to build. We are currently working with a new version of AutoSlog, called AutoSlog-TS, that generates dictionaries of extraction patterns using only preclassified texts and does not require the detailed text annotations that AutoSlog requires.


For more information, mail riloff@cs.utah.edu.
Last modified: Wed Nov 22 06:26:13 1995