Evaluation of Linguistic Features for Word Sense Disambiguation with Self-Organized Document Maps

Lindén, Krister

doi:10.1007/s10579-004-1948-9

Published: November 2004

Volume 38, pages 417–435 (2004)
Computers and the Humanities

Krister Lindén^1,2

Abstract

Word sense disambiguation automatically determines the appropriate senses of a word in context. We have previously shown that self-organized document maps have properties similar to a large-scale semantic structure that is useful for word sense disambiguation. This work evaluates the impact of different linguistic features on self-organized document maps for word sense disambiguation. The features evaluated are qualitative features, e.g. part-of-speech and syntactic labels, and quantitative features, e.g. cut-off levels for word frequency. It is shown that linguistic features help make contextual information explicit. If the training corpus is large, even contextually weak features, such as base forms, will act in concert to produce statistically significant sense distinctions. However, the most important features are syntactic dependency relations and base forms annotated with part-of-speech or syntactic labels. We achieve 62.9% ± 0.73% correct results on the fine-grained lexical task of the English SENSEVAL-2 data. On the 96.7% of the test cases that need no back-off to the most frequent sense, we achieve 65.7% correct results.
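
To make three of the ideas in the abstract concrete (base forms annotated with a label, a word-frequency cut-off, and back-off to the most frequent sense), here is a minimal Python sketch. It is not the article's implementation: the function names, the `base/label` feature format, the toy documents, and the sense names are all hypothetical, and the self-organized document map itself is left out and represented only by a prediction that may be missing.

```python
from collections import Counter

def annotate(tokens):
    """Join each base form with its label, e.g. ('bank', 'N') -> 'bank/N'.
    The 'base/label' format is an assumption made for this illustration."""
    return [f"{base}/{label}" for base, label in tokens]

def apply_cutoff(docs, min_count):
    """Keep only features that occur at least min_count times in the corpus."""
    counts = Counter(feature for doc in docs for feature in doc)
    return [[f for f in doc if counts[f] >= min_count] for doc in docs]

def most_frequent_sense(training_senses):
    """Return the sense label seen most often in the training data."""
    return Counter(training_senses).most_common(1)[0][0]

def disambiguate(prediction, fallback):
    """Use the model's prediction when it made one, else back off."""
    return prediction if prediction is not None else fallback

# Toy data (hypothetical): three contexts of the word "bank".
docs = [
    annotate([("bank", "N"), ("river", "N")]),
    annotate([("bank", "N"), ("loan", "N")]),
    annotate([("bank", "V"), ("turn", "N")]),
]
filtered = apply_cutoff(docs, min_count=2)
mfs = most_frequent_sense(["finance", "finance", "river"])

print(filtered)
print(disambiguate(None, mfs))     # no model prediction, so back off
print(disambiguate("river", mfs))  # model prediction is used as-is
```

Annotating the base form keeps the noun and verb readings of "bank" apart as separate features, and the cut-off then discards features too rare to support a sense distinction in this toy corpus.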

Author information

Authors and Affiliations

Neural Networks Research Centre, Helsinki University of Technology, P.O. Box 9800, FIN-02015 HUT, Finland
Krister Lindén
Department of General Linguistics, University of Helsinki, P.O. Box 9, FIN-00014, Finland
Krister Lindén

Corresponding author

Correspondence to Krister Lindén.

About this article

Cite this article

Lindén, K. Evaluation of Linguistic Features for Word Sense Disambiguation with Self-Organized Document Maps. Comput Hum 38, 417–435 (2004). https://doi.org/10.1007/s10579-004-1948-9

Issue Date: November 2004
DOI: https://doi.org/10.1007/s10579-004-1948-9
