WORD SENSE DISAMBIGUATION

 

APPLIED TO

 

SINGLE DOMAIN-FREE

COLLOCATIONS

 

 

 

Bill Phillips

cs6010

proposal

 

 

 

ABSTRACT

 

This proposal describes a method for applying word sense disambiguation to a small collocation of words, such as a single phrase or sentence. Current word sense disambiguation methods rely on large collocations and domain knowledge to achieve satisfactory results. By applying higher-order interdependent word probabilities, it should be possible to achieve results comparable to existing methods without the need for a large collocation or domain knowledge.

 

INTRODUCTION

 

Until recently, most NLP research has focused on understanding larger texts. This is especially true of word-sense disambiguation research. Although this is a natural and important area in which to conduct research, there are currently many applications in which only a single phrase or sentence is available. The methods used for word-sense disambiguation on a larger text cannot be applied successfully in such a setting. In this paper, I outline a new method of word-sense disambiguation that is applicable to single phrases.

Word-sense disambiguation is of fundamental importance in language understanding. In English, a given word can take on completely different meanings even within a single part of speech (POS). For example, the word plant, functioning as a noun, can refer either to a living organism or to a factory. Even the basic meaning of a sentence containing the noun plant cannot be determined unless the word is first properly disambiguated. Synonym resolution, which is required for any meaning representation of a sentence, likewise cannot be performed without first successfully disambiguating the words involved.

Current methods for word sense disambiguation cannot be applied to a single small collocation of words in a domain-free environment, because such a collocation inherently lacks the data these methods require. My approach overcomes this problem by applying higher-order interdependent word statistics to a single, small collocation. Such statistical information contains the additional relevant data needed to allow successful word sense disambiguation in this environment.

 

BACKGROUND

 

Both supervised and unsupervised methods have been successfully implemented to apply word sense disambiguation to words found within a discourse. The most successful methods rely on one or both of two properties of word sense usage in language. The first is one sense per discourse: if the sense of an ambiguous word can be determined at one location within a discourse, the word will generally be used with this same sense throughout the discourse. Yarowsky relied heavily on this property in his most recent unsupervised method, which is considered the current benchmark for unsupervised word sense algorithms. Some words he found to exhibit this correlation are shown in fig. 1. As can be seen, when a word appears more than once (the applicability measure), the probability that it takes on the same meaning each time is extremely high.

The other property of language that is exploited in word sense disambiguation algorithms is one sense per collocation. This refers to the phenomenon that surrounding words strongly and consistently help discriminate the meaning of an ambiguous word. For example, given manufacturing plant, the sense of the noun plant is clearly that of a factory. The presence of manufacturing immediately before plant disambiguates its meaning.

The most successful and applicable methods currently used employ stochastic knowledge derived from these two language properties. In general, a naive Bayesian model is applied to the probabilities of surrounding words appearing for a given sense of the word. The probabilities of each word are assumed to act independently. The various probabilities from each relevant word are then combined in some weighted voting scheme to determine the most probable sense.
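As a rough illustration of the independence assumption just described, the following Python sketch scores each sense of plant by summing the log-probabilities of the context words. The counts, the sense labels, the function name naive_bayes_sense, and the use of add-one smoothing are all assumptions made for illustration; they are not taken from any of the cited methods.

```python
import math
from collections import Counter

# Toy, hand-made counts of context words seen with each sense of "plant".
# These numbers are illustrative assumptions, not corpus statistics.
SENSE_WORD_COUNTS = {
    "factory": Counter({"manufacturing": 8, "company": 6, "labor": 4, "large": 2}),
    "living": Counter({"water": 7, "green": 6, "grow": 5, "large": 2}),
}
SENSE_PRIOR = {"factory": 0.5, "living": 0.5}


def naive_bayes_sense(context_words):
    """Return (best sense, per-sense log scores) under word independence."""
    vocab = set(context_words)
    for counts in SENSE_WORD_COUNTS.values():
        vocab.update(counts)

    scores = {}
    for sense, counts in SENSE_WORD_COUNTS.items():
        total = sum(counts.values())
        score = math.log(SENSE_PRIOR[sense])
        for word in context_words:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        scores[sense] = score
    return max(scores, key=scores.get), scores


best_sense, sense_scores = naive_bayes_sense(["company", "large", "manufacturing"])
```

Each context word contributes to the score on its own, so a word such as large, which is equally common with both senses in this toy table, cannot change the outcome no matter which other words accompany it.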

For example, in the following discourse:

"The corporation’s new business model for reducing operating expenses requires the location of manufacturing facilities in rural areas. The company has a large plant in the Adirondack Mountain Region. This factory has the lowest labor cost of any of their facilities."

the word plant is disambiguated to refer to a factory by the surrounding words (corporation, business, expenses, manufacturing, factory, etc.) which all point to the current domain of a manufacturing plant. In general, current methods use word associations as a means of classifying the domain of the text. Once the domain of the text is known, the sense of the word can be disambiguated with very high reliability as being the sense that corresponds to this domain. In the above example, the word probabilities can classify the text as referring to corporations or business. With this knowledge, the word plant can be reliably determined to refer to a manufacturing plant.

There are two different applications in which these methods fail to achieve satisfactory results. The first is when the word to be disambiguated can have different senses within the same domain. This is commonly exhibited by verbs and, more generally, by words that take on several meanings or meanings that are independent of any domain. For example, in the following discourse, the verb take has different meanings that current methods cannot reliably disambiguate.

"John was very sick. I took him to the pharmacy so he could buy the medicine he needed to take."

Words in the discourse such as sick, pharmacy, and medicine all point to the correct disambiguation for the second occurrence of take, but they do not help to distinguish the sense of the first occurrence. A larger discourse will not aid in this disambiguation.

The other application where these methods fail is the disambiguation of a word in a single domain-free phrase or sentence. Here the collocation of words is not large enough for existing methods to perform disambiguation. As an example, consider the following sentence:

"The company has large plants in the Adirondack Mountain Region."

In this sentence, when it appears alone, the word plants should be disambiguated to refer to factories, but large and Adirondack Mountain Region may carry more weight than company under the currently used methods.

The problem with current methods is that they rely, either explicitly or implicitly, too heavily on matching a word-sense to its surrounding domain. When such a domain does not exist or is not applicable to the current word-sense, these methods break down. Obviously, for my area of research dealing with smaller collocations, there is not a large enough body of text for a domain to be established.

 

PROPOSED WORK

 

To solve this problem, I propose using a probabilistic model that accounts for the interdependent probabilities of word associations. Instead of making the false assumption that word associations act independently, my model considers associations among groups of n=2 context words, and it examines all possible pairs. For example, in the previous sentence, the probabilities of the word pairs (companies, have), (companies, large), (companies, Adirondack Mountains), (have, large), (have, Adirondack Mountains), and (large, Adirondack Mountains) occurring with each sense of the word plant could be computed. Each association would be allowed to vote, with greater weight given to the more deterministic probabilities.
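The pairwise voting idea can be sketched in Python as follows. This is only one possible reading of the proposal: the probability table PAIR_SENSE_PROBS, its values, the two sense labels, and the particular weighting (the gap between the two sense probabilities) are hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical table: P(sense | unordered pair of context words).
# The values are invented for illustration only.
PAIR_SENSE_PROBS = {
    frozenset(["company", "large"]): {"factory": 0.90, "living": 0.10},
    frozenset(["company", "has"]): {"factory": 0.70, "living": 0.30},
    frozenset(["large", "adirondack"]): {"factory": 0.20, "living": 0.80},
    frozenset(["company", "adirondack"]): {"factory": 0.75, "living": 0.25},
}
SENSES = ("factory", "living")


def pairwise_vote(context_words):
    """Let every pair of context words vote, weighted by how decisive it is."""
    totals = {sense: 0.0 for sense in SENSES}
    for pair in combinations(sorted(set(context_words)), 2):
        probs = PAIR_SENSE_PROBS.get(frozenset(pair))
        if probs is None:
            continue  # no statistics for this pair, so it casts no vote
        # Weight is the gap between the two senses: 0 for a 50/50 pair,
        # 1 for a pair that always occurs with a single sense.
        weight = abs(probs["factory"] - probs["living"])
        winner = max(probs, key=probs.get)
        totals[winner] += weight
    return max(totals, key=totals.get), totals


best_sense, vote_totals = pairwise_vote(["company", "has", "large", "adirondack"])
```

With these invented numbers, the pair (large, adirondack) votes for the living-organism sense, but it is outvoted by the three pairs that contain company, so the factory sense wins.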

By using such interdependent probabilities, it is possible to capture additional information that is not otherwise available. This information is inherently less dependent on the domain context, and more localized to the word being examined. It also allows more reliable information to be gathered from the data, since the natural word associations are better accounted for.

I chose to look only at two-word interdependencies for two reasons. First, a large information gain is obtained in going from a single-word independence assumption to a two-word dependency model. Although some further information may be gained by going to three-word dependencies, the gain would not be very large. Second, the number of probabilities that must be accounted for when going from an independence assumption to two-word dependencies is over an order of magnitude greater, but is still manageable. When going from n=2 to n=3, the number of probabilities that need to be found and stored becomes unmanageable. In short, the tradeoff between complexity and information gain is not beneficial beyond two-word dependencies.
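To make the growth in the number of probabilities concrete, the short Python sketch below counts the distinct unordered groups of context words for n=1, 2, and 3. The vocabulary size of 10,000 words is an assumed figure chosen only for illustration, and the counts are upper bounds for a single sense of a single target word, since many combinations may never be observed in practice.

```python
from math import comb


def parameter_counts(vocab_size, max_order=3):
    """Upper bound on distinct unordered word groups for each dependency order."""
    return {n: comb(vocab_size, n) for n in range(1, max_order + 1)}


# Assumed vocabulary of 10,000 context words (an illustrative figure).
counts = parameter_counts(10_000)
for order, count in counts.items():
    print(f"n={order}: {count:,} possible word groups")
```

Under this assumption there are 10,000 single words, roughly 50 million pairs, and roughly 167 billion triples, which illustrates why the step from n=2 to n=3 is so much harder to manage than the step from n=1 to n=2.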

There are two major steps that must be dealt with in order to implement my solution. First, all of the two-word dependencies need to be computed in a domain-free environment. This is not a trivial problem, since a large number of possible domains must be accounted for. It would not have been possible until recently, because a sufficiently large corpus of text was not available. However, the Internet now supplies a nearly limitless amount of text, making it possible to acquire enough text to compute such probabilities accurately. A database of search engine queries would provide an excellent source of domain-free, independent phrases to complement traditional text sources.

The second hurdle is the implementation details of my solution. The weighting that should be given to different probabilities needs to be determined, as does the method by which the probabilities vote. For example, should only the best rule be applied, or should all probabilities be accounted for? What factors should be accounted for in gathering probabilities? Should the syntactic structure of surrounding words, the word order, the distance from the target word, or the conceptual role of the words be factored in, and what weighting and cutoff values should be used? Many of these factors will need to be determined experimentally. Fortunately, a lot of research has already been done to determine the importance of these factors for word sense disambiguation in known domains, and this information is generally valid for my application. This existing knowledge, combined with experimentation, should lead to an optimal implementation of my algorithm.
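The difference between applying only the best rule and accounting for all probabilities can be illustrated with a small Python sketch. Everything here is hypothetical: the evidence records, the field names llr (a log-likelihood ratio, positive when the evidence favors the factory sense) and distance (words between the pair and the target), and the inverse-distance discount are assumptions for illustration, not design decisions of the proposal.

```python
# Hypothetical evidence for one occurrence of "plant".
# llr > 0 favors "factory"; llr < 0 favors "living". Values are invented.
EVIDENCE = [
    {"pair": ("company", "large"), "llr": 2.0, "distance": 3},
    {"pair": ("large", "adirondack"), "llr": -1.2, "distance": 1},
    {"pair": ("has", "adirondack"), "llr": -0.9, "distance": 2},
]


def best_rule(evidence):
    """Decide using only the single strongest piece of evidence."""
    strongest = max(evidence, key=lambda item: abs(item["llr"]))
    return "factory" if strongest["llr"] > 0 else "living"


def weighted_sum(evidence, distance_decay=1.0):
    """Decide using all evidence, discounting pairs far from the target."""
    total = sum(
        item["llr"] / (item["distance"] ** distance_decay) for item in evidence
    )
    return "factory" if total > 0 else "living"


print(best_rule(EVIDENCE))
print(weighted_sum(EVIDENCE))
```

On this invented evidence the two strategies disagree: the single strongest pair favors the factory sense, while the combined, distance-discounted evidence favors the living-organism sense. This is the kind of choice that would have to be settled experimentally.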

 

EVALUATION

 

I will compare my solution against some well-established algorithms applied to short, independent, domain-free collocations. I consider it a minimal criterion that my solution obtain statistically significantly better results than these methods, since they are not optimized to perform well for this application.
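The proposal does not name a particular significance test. As one possible choice, the Python sketch below applies an exact McNemar (sign) test to the test items on which two systems disagree. The choice of test, the function name exact_mcnemar_p, and the example counts are assumptions made for illustration only.

```python
from math import comb


def exact_mcnemar_p(only_a_correct, only_b_correct):
    """Two-sided exact p-value from the items on which two systems disagree."""
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # the systems never disagree, so there is no evidence
    k = min(only_a_correct, only_b_correct)
    # Probability of a split at least this lopsided if both systems
    # were equally likely to be the one that is correct.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented example: the new method alone is correct on 15 items,
# the baseline alone is correct on 5 items.
p_value = exact_mcnemar_p(15, 5)
print(f"p = {p_value:.4f}")
```

Items that both systems get right, or both get wrong, carry no information about which system is better, which is why only the disagreements enter the calculation.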

Unfortunately, since there has not been much research to date in this area, there are no commonly used benchmarks for testing word sense disambiguation on smaller collocations. One common word sense disambiguation benchmark that can be used is the set of SEMCOR tests. Each test consists of only a few sentences. Although this does not exactly meet the criteria for testing my solution, it would still provide some useful results for analysis.

In order to test my system more accurately, I will need to create my own test sets. Given the large quantity of data that must be collected to gather the needed statistical information, it should not be difficult to also collect valid testing data. Care will be taken to separate the data used for gathering the statistics from that used for testing the system. A random sampling of search engine queries may also be usable for this purpose, since such queries already consist of short, domain-free collocations.
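One simple way to keep the statistics-gathering data separate from the test data is sketched below in Python. The function name split_phrases, the 10% test fraction, the fixed seed, and the example queries are assumptions for illustration; duplicates are removed first so that the same phrase cannot end up on both sides of the split.

```python
import random


def split_phrases(phrases, test_fraction=0.1, seed=0):
    """Return (training phrases, test phrases) with no phrase in both."""
    unique = sorted(set(phrases))  # drop duplicates before splitting
    rng = random.Random(seed)      # fixed seed makes the split repeatable
    rng.shuffle(unique)
    n_test = max(1, int(len(unique) * test_fraction))
    return unique[n_test:], unique[:n_test]


# Invented example queries standing in for a search engine query log.
queries = [
    "power plant jobs",
    "house plant care",
    "plant closing notice",
    "house plant care",
    "tomato plant disease",
]
train_phrases, test_phrases = split_phrases(queries, test_fraction=0.25)
```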

 

SUMMARY

 

By gathering statistical information that is explicitly related to disambiguating a given word independent of its domain, it is possible to perform word sense disambiguation in a single, short collocation of words. Interdependent word association probabilities provide this needed statistical information. It is currently possible to gather these statistics in a domain-free environment. These probabilities can then be applied to any domain-free sentence using a simple voting scheme to perform word sense disambiguation. The same techniques used here to allow disambiguation of small collocations of words should also be more broadly applicable to other areas of word sense disambiguation where the current methods fail due to a lack of domain-dependence exhibited by the target word.

 

 

 

 

 

 

 

 

BIBLIOGRAPHY

 

Yarowsky, David "Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora", in Proceedings, COLING-92, Nantes, France, 1992.

Yarowsky, David "Unsupervised Word-Sense Disambiguation Rivaling Supervised Methods", in Proceedings, 33rd Annual Meeting of the Association for Computational Linguistics, 1995.