Abstract
Word sense disambiguation automatically determines the appropriate sense of a word in context. We have previously shown that self-organized document maps have properties similar to a large-scale semantic structure that is useful for word sense disambiguation. This work evaluates the impact of different linguistic features on self-organized document maps for word sense disambiguation. The features evaluated are qualitative features, e.g. part-of-speech and syntactic labels, and quantitative features, e.g. cut-off levels for word frequency. It is shown that linguistic features help make contextual information explicit. If the training corpus is large, even contextually weak features, such as base forms, act in concert to produce statistically significant sense distinctions. However, the most important features are syntactic dependency relations and base forms annotated with part-of-speech or syntactic labels. We achieve 62.9% ± 0.73% correct results on the fine-grained lexical task of the English SENSEVAL-2 data. On the 96.7% of the test cases that need no back-off to the most frequent sense, we achieve 65.7% correct results.
Cite this article
Lindén, K. Evaluation of Linguistic Features for Word Sense Disambiguation with Self-Organized Document Maps. Comput Hum 38, 417–435 (2004). https://doi.org/10.1007/s10579-004-1948-9
DOI: https://doi.org/10.1007/s10579-004-1948-9