Integrating Machine Readable Dictionary and Thesaurus for Conceptual Context Representation of Word Sense

Chen, Jen Nan; Chang, Jason S.

doi:10.1007/978-94-017-0952-1_10

Part of the book series: Text, Speech and Language Technology (TLTB, volume 10)

Abstract

This paper describes a heuristic approach to the automatic acquisition of contextual representations of senses in a machine-readable dictionary (MRD). Including contextual information in an MRD-based lexical database offers several benefits. First, the representation can be used to merge closely related senses into a coarser sense division, so that unnecessarily fine sense distinctions can be avoided in word sense disambiguation (WSD). Second, the contextual information can serve as a knowledge base for developing a WSD system. Third, if the algorithms are run on several MRDs, the contextual representation provides a means of linking relevant senses across those MRDs to create an integrated lexical database. The algorithms rely primarily on information retrieval techniques to build, for each MRD sense, a list of the topics most relevant to its definition. We describe an implementation of the method that uses definition sentences in the Longman Dictionary of Contemporary English, drawing on the topical word lists and topical cross-references in the Longman Lexicon of Contemporary English. We have conducted a series of experiments and evaluations to assess the performance of the proposed approach.
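
The general idea of ranking topics against a sense definition, and then grouping senses that share a top topic, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the topic names, word lists, stopword list, sample definitions, and the simple inverse-frequency weighting are all assumptions made for illustration, and none of them are taken from the Longman Dictionary or the Longman Lexicon.

```python
import math
import re
from collections import Counter

# Hypothetical toy topical word lists (illustrative only, not from the Longman Lexicon).
TOPICS = {
    "money": {"money", "account", "deposit", "loan"},
    "geography": {"river", "land", "shore", "lake", "slope"},
    "machines": {"row", "switch", "key", "machine"},
}

# Hypothetical sense definitions written for this example (not quoted from any dictionary).
SENSES = {
    "bank_1": "land along the side of a river or lake",
    "bank_2": "a place where money is kept and paid out",
    "bank_3": "a sum of money kept as a deposit",
}

# Assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "or", "and", "as", "is", "where", "along", "side"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def topic_weights(topics):
    """Weight each word by how few topics contain it (an IDF-like heuristic)."""
    n = len(topics)
    df = Counter(w for words in topics.values() for w in words)
    return {w: math.log(1 + n / count) for w, count in df.items()}


def rank_topics(definition, topics, weights):
    """Return (topic, score) pairs for topics sharing words with the definition, best first."""
    tf = Counter(tokenize(definition))
    scores = {}
    for name, words in topics.items():
        score = sum(tf[w] * weights[w] for w in words if w in tf)
        if score > 0:
            scores[name] = score
    # Sort by descending score, then by topic name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def merge_senses(senses, topics):
    """Group sense identifiers by their top-ranked topic (None if no topic matches)."""
    weights = topic_weights(topics)
    clusters = {}
    for sense_id, definition in senses.items():
        ranked = rank_topics(definition, topics, weights)
        top = ranked[0][0] if ranked else None
        clusters.setdefault(top, []).append(sense_id)
    return clusters


if __name__ == "__main__":
    print(merge_senses(SENSES, TOPICS))
```

With the toy data above, the two money-related definitions fall into one group and the river definition into another, which mirrors the coarser sense division the abstract describes. A realistic system would also need stemming or lemmatisation (the sketch does not match "kept" to "keep") and a principled threshold for deciding when two senses are close enough to merge.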

Author information

Authors and Affiliations

Department of Information Management, Ming Chuan University, Taipei, Taiwan, ROC
Jen Nan Chen
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, ROC
Jason S. Chang

Editor information

Editors and Affiliations

Computing Research Laboratory, New Mexico State University, Las Cruces, New Mexico, USA
Evelyne Viegas

About this chapter

Cite this chapter

Chen, J.N., Chang, J.S. (1999). Integrating Machine Readable Dictionary and Thesaurus for Conceptual Context Representation of Word Sense. In: Viegas, E. (eds) Breadth and Depth of Semantic Lexicons. Text, Speech and Language Technology, vol 10. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-0952-1_10

DOI: https://doi.org/10.1007/978-94-017-0952-1_10
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5347-3
Online ISBN: 978-94-017-0952-1
eBook Packages: Springer Book Archive
