Ellen Riloff's Research Interests
My primary research area is natural language processing, though I am
also interested in machine learning, information retrieval, and
artificial intelligence in general. Much of my research focuses on
automatically acquiring the knowledge needed for conceptual natural
language processing using unannotated text corpora. Some current and
previous research projects are described below. ("We" refers to work
done by me and members of the NLP group at Utah.)
- Learning Patterns for Information Extraction
We have developed several algorithms that automatically learn
extraction patterns. Our first system, AutoSlog, learned extraction
patterns from an annotated training corpus. Its successor, AutoSlog-TS,
learned extraction patterns from an unannotated but preclassified
training corpus (preclassified means that each text is labeled as
relevant or irrelevant to the domain). Most recently, we have developed
a meta-bootstrapping algorithm that learns both extraction patterns and
a semantic lexicon simultaneously, using only unannotated texts.
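As a rough illustration of how a preclassified corpus can guide pattern learning, the following Python sketch ranks candidate extraction patterns by how strongly they are associated with relevant texts. It assumes each text has already been reduced to the set of pattern strings it activates; the pattern strings, the toy data, and the `rank_patterns` helper are hypothetical. The score used here, relevance rate times log2 of frequency, is one reasonable heuristic for preferring patterns that are both reliable and common, not necessarily the exact AutoSlog-TS formula.

```python
import math
from collections import Counter

def rank_patterns(texts):
    """Rank candidate patterns by relevance rate * log2(frequency).

    `texts` is a list of (patterns, is_relevant) pairs, where `patterns` is the
    set of (hypothetical) pattern strings that the text activates.
    """
    total = Counter()     # number of texts in which each pattern appears
    relevant = Counter()  # number of relevant texts in which it appears
    for patterns, is_relevant in texts:
        for p in set(patterns):
            total[p] += 1
            if is_relevant:
                relevant[p] += 1
    scores = {}
    for p, freq in total.items():
        relevance_rate = relevant[p] / freq  # fraction of its texts that are relevant
        scores[p] = relevance_rate * math.log2(freq)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy preclassified corpus (hypothetical): True = relevant to the domain.
texts = [
    ({"<subj> was kidnapped", "<subj> said"}, True),
    ({"<subj> was kidnapped"}, True),
    ({"<subj> said"}, False),
    ({"<subj> said", "<subj> was kidnapped"}, True),
    ({"<subj> said"}, False),
]
ranked = rank_patterns(texts)
for pattern, score in ranked:
    print(f"{score:.3f}  {pattern}")
```

In this toy corpus the domain-specific pattern outranks the generic one, even though the generic pattern is more frequent, because half of the generic pattern's occurrences fall in irrelevant texts.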
- Learning Semantic Lexicons
Semantic dictionaries are essential for language understanding. We
have developed two different algorithms for automatically acquiring
domain-specific semantic lexicons. The first is the meta-bootstrapping
algorithm described above; the second is a statistical collocation
algorithm that also requires only an unannotated text corpus.
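A statistical collocation approach can be illustrated with a small Python sketch: starting from a few seed words for a semantic category, it scores each candidate noun by the fraction of its occurrences that fall within a small window of a seed word. This is an illustrative co-occurrence score, not necessarily the algorithm referred to above; the window size, the toy sentences, the `rank_candidates` helper, and the assumption that the set of candidate nouns is supplied in advance (for example, by a part-of-speech tagger) are all hypothetical.

```python
from collections import Counter

def rank_candidates(sentences, seeds, nouns, window=2):
    """Score each candidate noun by the fraction of its occurrences that
    appear within `window` tokens of a seed word for the category."""
    near = Counter()  # occurrences near a seed word
    freq = Counter()  # all occurrences
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            if tok not in nouns or tok in seeds:
                continue  # only score non-seed candidate nouns
            freq[tok] += 1
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if any(c in seeds for c in context):
                near[tok] += 1
    scores = {w: near[w] / freq[w] for w in freq}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

seeds = {"truck", "car"}                        # hypothetical VEHICLE seeds
nouns = {"truck", "car", "jeep", "minister"}    # assumed tagger output
sentences = [
    "the truck and jeep crashed",
    "a car and jeep collided",
    "the jeep was parked",
    "the minister spoke",
    "the minister left",
]
result = rank_candidates(sentences, seeds, nouns)
print(result)
```

Here "jeep" appears near a vehicle seed in two of its three occurrences and rises to the top, while "minister" never does.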
- Extraction-based Text Categorization
Our forays into information retrieval have used information extraction
techniques to support high-precision text categorization. We have
developed several text classification algorithms that use extraction
patterns to recognize specific contexts and role relationships, which
can be essential for some classification tasks. The most notable
algorithms use relevancy signatures and augmented
relevancy signatures to represent extraction-based text
classification terms.
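The idea of using extraction patterns as classification terms can be sketched in Python. Here a signature is assumed to be a (trigger word, extraction pattern) pair; a signature counts as reliable when it occurs in enough training texts and a high enough fraction of those texts are relevant, and a new text is labeled relevant if it contains at least one reliable signature. The pair representation, the thresholds `min_rel` and `min_freq`, and the helper names are assumptions for illustration, not necessarily the exact published formulation.

```python
from collections import Counter

def learn_reliable(training, min_rel=0.9, min_freq=3):
    """Return signatures with P(relevant | signature) >= min_rel that occur
    in at least min_freq training texts."""
    total, relevant = Counter(), Counter()
    for signatures, is_relevant in training:
        for sig in set(signatures):
            total[sig] += 1
            if is_relevant:
                relevant[sig] += 1
    return {sig for sig, n in total.items()
            if n >= min_freq and relevant[sig] / n >= min_rel}

def classify(signatures, reliable):
    """Label a text relevant if it contains at least one reliable signature."""
    return any(sig in reliable for sig in signatures)

KIDNAP = ("kidnapped", "<subj> was kidnapped")  # hypothetical signatures
SAID = ("said", "<subj> said")
training = [
    ({KIDNAP}, True),
    ({KIDNAP, SAID}, True),
    ({KIDNAP}, True),
    ({SAID}, False),
    ({SAID}, True),
    ({SAID}, False),
]
reliable = learn_reliable(training)
print(classify({KIDNAP}, reliable))  # contains a reliable signature
print(classify({SAID}, reliable))    # generic signature only
```

Because the decision rests on a pattern in context rather than on a bare keyword, a word like "said" that appears in both relevant and irrelevant texts never triggers a relevant label on its own.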
- Anaphora Resolution
We have recently begun to investigate the problem of resolving
anaphora. One key challenge is to identify which noun phrases need to
be resolved, since many do not. We created a corpus-based method for
automatically identifying definite noun phrases that are not anaphoric
because they do not have antecedents in the text.
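One way a corpus can reveal non-anaphoric definite noun phrases is sketched below in Python, using two simple heuristics chosen for illustration: a definite NP that appears in the first sentence of a text has no earlier text to refer back to, and an NP that occurs repeatedly but only ever with "the" (such as "the FBI") is likely to be self-contained. The document representation (pre-chunked noun phrases paired with their determiners), the `min_count` threshold, and the helper names are hypothetical, and this is not presented as the exact method described above.

```python
from collections import defaultdict

def learn_lists(docs, min_count=2):
    """docs: list of documents; each document is a list of sentences; each
    sentence is a list of (determiner, phrase) pairs from an NP chunker."""
    first_sentence = set()       # definite NPs seen in a text's first sentence
    dets = defaultdict(list)     # phrase -> every determiner it occurred with
    for doc in docs:
        for idx, sentence in enumerate(doc):
            for det, phrase in sentence:
                dets[phrase].append(det)
                if idx == 0 and det == "the":
                    first_sentence.add(phrase)
    definite_only = {p for p, ds in dets.items()
                     if len(ds) >= min_count and all(d == "the" for d in ds)}
    return first_sentence, definite_only

def is_non_anaphoric(phrase, first_sentence, definite_only):
    """Guess that a definite NP needs no antecedent if either list contains it."""
    return phrase in first_sentence or phrase in definite_only

docs = [  # toy, pre-chunked corpus (hypothetical)
    [[("the", "fbi"), ("a", "bomb")], [("the", "bomb")]],
    [[("a", "man")], [("the", "fbi"), ("the", "man")]],
    [[("the", "pentagon")], [("a", "bomb")]],
]
first, definite = learn_lists(docs)
for phrase in ["fbi", "pentagon", "bomb", "man"]:
    print(phrase, is_non_anaphoric(phrase, first, definite))
```

In the toy corpus, "the bomb" and "the man" are left for a resolver because they also occur with "a" and first appear as definite only after an earlier mention.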
- Question Answering
Another recent area of interest is computational question answering,
especially for reading comprehension. We developed a rule-based system
called Quarc that can take reading comprehension exams. We also
participated in the Summer 2000 Johns Hopkins Workshop on Reading
Comprehension, where we developed a question answering system called
Spot.
- Natural Language Interfaces for Programming
An ongoing research project involves creating a natural language
interface for Java programming. This interface accepts natural
language sentences as programming instructions (e.g., "create a for
loop that iterates from 1 to 10") and automatically generates the
corresponding Java source code.
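A toy version of such an interface can be written as a single template. The Python sketch below recognizes only requests shaped like the example above and emits a Java `for` loop; the regular expression, the loop variable name `i`, the inclusive reading of "from 1 to 10", and the `to_java` helper are all hypothetical, and a real interface would need a parser and many more constructs.

```python
import re

# Hypothetical template for requests like "create a for loop that iterates from 1 to 10".
FOR_LOOP = re.compile(
    r"create a for loop that iterates from (-?\d+) to (-?\d+)", re.IGNORECASE)

def to_java(instruction, var="i"):
    """Translate one supported English instruction into Java source text."""
    m = FOR_LOOP.search(instruction)
    if m is None:
        raise ValueError(f"unsupported instruction: {instruction!r}")
    start, end = int(m.group(1)), int(m.group(2))
    # Assumes "from A to B" includes B, hence the <= comparison.
    return f"for (int {var} = {start}; {var} <= {end}; {var}++) {{\n    // loop body\n}}"

print(to_java("create a for loop that iterates from 1 to 10"))
```

A template like this breaks on any paraphrase ("loop i over 1..10"), which is exactly the gap a genuine natural language front end has to close.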
- Sundance
Most of our research uses our home-grown natural language processing
system called Sundance. Sundance is a heuristic-based shallow parser
that also activates and instantiates case frames for information
extraction.
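The notion of activating and instantiating a case frame can be illustrated with a small Python sketch. It assumes a hypothetical, simplified clause representation (subject, verb root, voice, direct object, and prepositional phrases); a case frame is activated when its trigger verb and voice match the clause, and its slots are then filled from the corresponding syntactic roles. None of these class or function names come from Sundance itself.

```python
from dataclasses import dataclass, field

@dataclass
class Clause:
    # Simplified shallow-parse output for one clause (hypothetical representation).
    subject: str
    verb: str    # root form of the verb, e.g. "kidnap"
    voice: str   # "active" or "passive"
    dobj: str = ""
    pps: dict = field(default_factory=dict)  # preposition -> noun phrase

@dataclass
class CaseFrame:
    trigger: str
    voice: str
    slots: dict  # syntactic role ("subj", "dobj", or a preposition) -> semantic role

def instantiate(frame, clause):
    """Return the filled slots if the clause activates the frame, else None."""
    if clause.verb != frame.trigger or clause.voice != frame.voice:
        return None  # frame not activated
    filled = {}
    for role, label in frame.slots.items():
        if role == "subj":
            value = clause.subject
        elif role == "dobj":
            value = clause.dobj
        else:
            value = clause.pps.get(role, "")
        if value:
            filled[label] = value
    return filled

frame = CaseFrame("kidnap", "passive", {"subj": "victim", "by": "perpetrator"})
clause = Clause("the mayor", "kidnap", "passive", pps={"by": "armed men"})
print(instantiate(frame, clause))
```

Keying activation on voice as well as the trigger verb matters: in "armed men kidnapped the mayor" the subject is the perpetrator, not the victim, so an active-voice clause should not fill this passive frame.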
Publications on these topics can be found at
http://www.cs.utah.edu/~riloff/publications.html