Practical Word-Sense Disambiguation Using Co-occurring Concept Codes

Machine Translation

Abstract

Most previous corpus-based approaches to resolving word-sense ambiguity collect lexical information from the context of the word to be disambiguated, and therefore suffer from data sparseness. To address this problem, this paper proposes a disambiguation method using co-occurring concept codes (CCCs). The use of concept-code features together with concept-code generalization effectively alleviates the data sparseness problem and also reduces the number of features to a practical size without any loss in system performance. We demonstrate the effectiveness of the CCC features and the concept-code generalization through experimental evaluation. The proposed disambiguation method is applied to a Korean-to-Japanese MT system and tested with various machine-learning techniques. In a lexical sample evaluation, our CCC-based method achieved a precision of 82.00%, an 11.83% improvement over the baseline. It also achieved a precision of 83.51% in an experiment on real text, which shows that the proposed method is useful for practical MT systems.
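The idea of replacing lexical context features with generalized concept codes can be illustrated with a minimal sketch. It is not the authors' implementation: the toy thesaurus, its three-digit hierarchical codes, and the helper names `generalize` and `ccc_features` are all assumptions made for illustration, and generalization is approximated here by truncating a code to a shorter prefix.

```python
from collections import Counter

# Hypothetical toy thesaurus: word -> hierarchical concept code.
# The codes are invented for illustration; near-synonyms share a code.
TOY_THESAURUS = {
    "river": "121",
    "stream": "121",
    "money": "451",
    "loan": "452",
    "deposit": "453",
}


def generalize(code: str, level: int) -> str:
    """Keep only the first `level` digits of a hierarchical code (assumed scheme)."""
    return code[:level]


def ccc_features(context_words, level=3):
    """Count the concept codes of the context words, generalized to `level` digits."""
    codes = (TOY_THESAURUS[w] for w in context_words if w in TOY_THESAURUS)
    return Counter(generalize(c, level) for c in codes)


# Three toy contexts of an ambiguous word such as "bank".
contexts = [["river", "stream"], ["money", "loan"], ["deposit", "money"]]

lexical = {w for ctx in contexts for w in ctx}                     # word features
fine = set().union(*(ccc_features(ctx, 3) for ctx in contexts))    # full codes
coarse = set().union(*(ccc_features(ctx, 2) for ctx in contexts))  # generalized codes

print(len(lexical), len(fine), len(coarse))
```

On this toy data the feature inventory shrinks from five distinct words to four concept codes and then to two generalized codes, so contexts with no words in common can still share features.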

Author information

Corresponding author

Correspondence to Youjin Chung.

About this article

Cite this article

Chung, Y., Lee, JH. Practical Word-Sense Disambiguation Using Co-occurring Concept Codes. Mach Translat 19, 59–82 (2005). https://doi.org/10.1007/s10590-005-2559-y

  • DOI: https://doi.org/10.1007/s10590-005-2559-y
