A method for disambiguating word senses in a large corpus

Gale, William A.; Church, Kenneth W.; Yarowsky, David

doi:10.1007/BF00136984

Published: December 1992

Volume 26, pages 415–439 (1992)

Computers and the Humanities

Abstract

Word sense disambiguation has been recognized as a major problem in natural language processing research for over forty years. Both quantitative and qualitative methods have been tried, but much of this work has been stymied by difficulties in acquiring appropriate lexical resources. The availability of this testing and training material has enabled us to develop quantitative disambiguation methods that achieve 92% accuracy in discriminating between two very distinct senses of a noun. In the training phase, we collect a number of instances of each sense of the polysemous noun. Then in the testing phase, we are given a new instance of the noun, and are asked to assign the instance to one of the senses. We attempt to answer this question by comparing the context of the unknown instance with contexts of known instances using a Bayesian argument that has been applied successfully in related tasks such as author identification and information retrieval. The proposed method is probably most appropriate for those aspects of sense disambiguation that are closest to the information retrieval task. In particular, the proposed method was designed to disambiguate senses that are usually associated with different topics.
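
The training and testing phases described above can be sketched as a small naive Bayes classifier over context words. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `train` and `classify`, the smoothing parameter `alpha`, the word-independence assumption, and the toy contexts for the noun "sentence" are all invented here for the example.

```python
import math
from collections import Counter


def train(labeled_contexts):
    """Training phase: count context words for each known sense.

    labeled_contexts is an iterable of (sense, context_words) pairs.
    """
    word_counts = {}          # sense -> Counter of context words
    sense_counts = Counter()  # sense -> number of training instances
    vocab = set()
    for sense, words in labeled_contexts:
        sense_counts[sense] += 1
        word_counts.setdefault(sense, Counter()).update(words)
        vocab.update(words)
    return word_counts, sense_counts, vocab


def classify(context, model, alpha=1.0):
    """Testing phase: pick the sense with the highest log posterior.

    Assumes context words are independent given the sense (naive Bayes)
    and uses additive smoothing with a hypothetical parameter alpha.
    """
    word_counts, sense_counts, vocab = model
    total_instances = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        counts = word_counts[sense]
        denom = sum(counts.values()) + alpha * len(vocab)
        # Log prior of the sense plus log likelihood of each known word.
        score = math.log(sense_counts[sense] / total_instances)
        for word in context:
            if word in vocab:  # words never seen in training are skipped
                score += math.log((counts[word] + alpha) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Invented toy data: two topical senses of the noun "sentence".
training = [
    ("judicial", ["judge", "prison", "court", "years"]),
    ("judicial", ["court", "crime", "prison"]),
    ("grammatical", ["word", "verb", "grammar"]),
    ("grammatical", ["noun", "word", "clause"]),
]
model = train(training)
print(classify(["prison", "judge"], model))
print(classify(["verb", "noun"], model))
```

Because the score depends only on which words occur in the context and not on their order, a sketch like this is best suited to senses tied to different topics, which matches the scope the abstract claims for the method.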

Author information

Authors and Affiliations

AT&T Bell Laboratories, 600 Mountain Avenue, P.O. Box 636, Murray Hill, NJ 07974-0636
William A. Gale, Kenneth W. Church & David Yarowsky

Additional information

William Gale is in a statistics department at AT&T Bell Laboratories. He has done research in physics, radio astronomy, and economics in the past, and founded the Society for Artificial Intelligence and Statistics. His current interests include lexical issues such as word sense discrimination, word similarity measures, and word correspondences in parallel texts.

Kenneth Ward Church received his Ph.D. in Computer Science from MIT, and then went to work at AT&T Bell Laboratories on problems in speech and language. Recently, he has been advocating the use of statistical methods for analyzing large corpora.

David Yarowsky is currently pursuing a Ph.D. in Computer Science at the University of Pennsylvania. He spent several years at AT&T Bell Laboratories doing research in statistical natural language processing.

About this article

Cite this article

Gale, W.A., Church, K.W. & Yarowsky, D. A method for disambiguating word senses in a large corpus. Comput Hum 26, 415–439 (1992). https://doi.org/10.1007/BF00136984

Issue Date: December 1992
DOI: https://doi.org/10.1007/BF00136984
