
Evaluation of Linguistic Features for Word Sense Disambiguation with Self-Organized Document Maps

Computers and the Humanities

Abstract

Word sense disambiguation automatically determines the appropriate senses of a word in context. We have previously shown that self-organized document maps have properties similar to a large-scale semantic structure that is useful for word sense disambiguation. This work evaluates the impact of different linguistic features on self-organized document maps for word sense disambiguation. The features evaluated include qualitative features, e.g. part-of-speech and syntactic labels, and quantitative features, e.g. cut-off levels for word frequency. It is shown that linguistic features help make contextual information explicit. If the training corpus is large, even contextually weak features, such as base forms, act in concert to produce statistically significant sense distinctions. However, the most important features are syntactic dependency relations and base forms annotated with part-of-speech or syntactic labels. We achieve 62.9% ± 0.73% correct results on the fine-grained lexical task of the English SENSEVAL-2 data. On the 96.7% of the test cases that need no back-off to the most frequent sense, we achieve 65.7% correct results.
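
The back-off to the most frequent sense mentioned above can be illustrated with a minimal sketch. It is not the article's implementation: the function names, the toy cue-word classifier, and the sense labels are all hypothetical, and the classifier merely stands in for the document-map model. The sketch only shows the general pattern, in which a classifier may abstain, the abstained cases fall back to the most frequent training sense, and coverage is the share of cases that needed no back-off.

```python
from collections import Counter


def most_frequent_sense(training_senses):
    """Return the sense label that occurs most often in the training data."""
    return Counter(training_senses).most_common(1)[0][0]


def disambiguate(contexts, classify, fallback_sense):
    """Label each context, backing off when the classifier abstains (returns None).

    Returns the list of labels and the coverage, i.e. the fraction of
    contexts that the classifier labelled without any back-off.
    """
    labels = []
    backed_off = 0
    for context in contexts:
        sense = classify(context)
        if sense is None:
            sense = fallback_sense
            backed_off += 1
        labels.append(sense)
    coverage = 1 - backed_off / len(contexts) if contexts else 0.0
    return labels, coverage


# Hypothetical toy data: sense labels and cue words are invented for illustration.
training = ["bank/river", "bank/finance", "bank/finance"]
toy_rules = {"money": "bank/finance", "water": "bank/river"}


def toy_classifier(context):
    """Stand-in classifier: returns a sense if a cue word occurs, else abstains."""
    for cue, sense in toy_rules.items():
        if cue in context:
            return sense
    return None


contexts = [
    "money in the bank",
    "water by the bank",
    "the bank was closed",
    "bank of money",
]
labels, coverage = disambiguate(
    contexts, toy_classifier, most_frequent_sense(training)
)
```

In this toy run only the third context triggers the back-off, so coverage plays the same role as the 96.7% figure in the abstract: it separates the cases the model decides on its own from those resolved by the most frequent sense.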



Author information

Corresponding author

Correspondence to Krister Lindén.

About this article

Cite this article

Lindén, K. Evaluation of Linguistic Features for Word Sense Disambiguation with Self-Organized Document Maps. Comput Hum 38, 417–435 (2004). https://doi.org/10.1007/s10579-004-1948-9

  • DOI: https://doi.org/10.1007/s10579-004-1948-9
