Sean Igo

M.S. Computer Science, graduated 2007

Natural Language Processing

Advisor: Dr. Ellen Riloff

sigo AT cs dot utah dot edu

Thesis research:

Identifying Reduced-Relative Passive Voice Constructions in Shallow Parsing Environments

Verbs in passive voice can affect the relationship between the syntactic and thematic roles of noun phrases in a sentence. For example, in both of the following sentences, Sam is the agent and the dog is the theme:

Sam washed the dog.

The dog was washed by Sam.

However, their syntactic roles are different. In the first sentence, Sam is the syntactic subject of washed and the dog is the direct object. In the second, the dog has moved to subject position and Sam has moved into a prepositional phrase. NLP systems that rely on the relationship between syntactic and thematic roles must be aware of this property of passive-voice verbs.
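The role shift described above can be sketched as a small lookup. This is only an illustration for a verb like washed, not code from the thesis or from any parser: the function name `thematic_roles` and the role labels `subject`, `direct_object`, and `by_object` are hypothetical names chosen for this example.

```python
def thematic_roles(syntactic, voice):
    """Map syntactic roles to thematic roles for an agent/theme verb.

    `syntactic` is a dict such as {"subject": "Sam", "direct_object": "the dog"}.
    The role names are illustrative assumptions, not a parser's output format.
    """
    if voice == "active":
        # Active voice: the subject acts, the direct object is acted upon.
        mapping = {"subject": "agent", "direct_object": "theme"}
    elif voice == "passive":
        # Passive voice: the subject is acted upon, the by-phrase names the actor.
        mapping = {"subject": "theme", "by_object": "agent"}
    else:
        raise ValueError(f"unknown voice: {voice!r}")
    return {mapping[role]: phrase
            for role, phrase in syntactic.items()
            if role in mapping}


active = thematic_roles({"subject": "Sam", "direct_object": "the dog"}, "active")
passive = thematic_roles({"subject": "The dog", "by_object": "Sam"}, "passive")
print(active)
print(passive)
```

Both calls assign Sam the agent role and the dog the theme role, even though the two sentences place them in different syntactic positions. A system that always applied the active mapping would get the passive sentence backwards.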

Typically, verbs in passive voice are easily recognized because of the auxiliary be-verb preceding them. The auxiliary be is not completely reliable, though, because it disappears if the verb is in a reduced relative clause:

The dog washed by Sam was a border collie.

I call passive-voice verbs of this form “reduced passives”.
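To make the distinction concrete, here is a minimal sketch of the auxiliary-based check and the case it misses. It is not the method developed in the thesis and does not use Sundance. It assumes the input is already tagged with Penn Treebank part-of-speech tags supplied by hand, and the function name `classify_participles` and its labels are hypothetical.

```python
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
HAVE_FORMS = {"have", "has", "had", "having"}


def classify_participles(tagged):
    """Label each past participle (tag VBN) by the auxiliary before it.

    `tagged` is a list of (word, tag) pairs. Adverbs between the auxiliary
    and the participle are skipped. A participle with no be or have
    auxiliary is only a *candidate* reduced passive: this check alone
    cannot confirm it.
    """
    results = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "VBN":
            continue
        j = i - 1
        while j >= 0 and tagged[j][1].startswith("RB"):
            j -= 1  # step left over intervening adverbs
        prev = tagged[j][0].lower() if j >= 0 else ""
        if prev in BE_FORMS:
            label = "passive"
        elif prev in HAVE_FORMS:
            label = "perfect"
        else:
            label = "reduced-passive candidate"
        results.append((word, label))
    return results


full = [("The", "DT"), ("dog", "NN"), ("was", "VBD"), ("washed", "VBN"),
        ("by", "IN"), ("Sam", "NNP"), (".", ".")]
reduced = [("The", "DT"), ("dog", "NN"), ("washed", "VBN"), ("by", "IN"),
           ("Sam", "NNP"), ("was", "VBD"), ("a", "DT"), ("border", "NN"),
           ("collie", "NN"), (".", ".")]
print(classify_participles(full))
print(classify_participles(reduced))
```

The sketch takes the VBN tag on washed in the second sentence as given. In practice, past-tense and past-participle forms are often identical, so a tagger may not make that distinction reliably, which is part of why the missing auxiliary makes these verbs hard to recognize.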

Full parsers, such as those written by Charniak and Collins, are capable of building syntactic structures that reflect these cases. Post-processing of these parsers' output can identify the verbs as being in passive voice despite the lack of an auxiliary. Unfortunately, the accuracy of these parsers, and therefore of this approach to recognizing reduced passives, drops sharply if the input text does not resemble the parser's training data. Creating annotated corpora to retrain them in a new domain would be very costly.

Shallow (partial) parsers, on the other hand, lack the syntactic sophistication necessary to identify reduced passives through simple post-processing, but they are more robust when given out-of-domain or ungrammatical input text. My hypothesis is that shallow parsers can recognize reduced passives if supported by knowledge of lexical and semantic properties of verbs, and that this knowledge can be extracted from unannotated domain corpora.

The goal of my research was to add reduced passive recognition to the Sundance shallow parser.