Comparisons of Relatedness Measures Through a Word Sense Disambiguation Task

Schwab, Didier; Tchechmedjiev, Andon; Goulian, Jérôme; Sérasset, Gilles

doi:10.1007/978-3-319-08043-7_13

Part of the book series: Text, Speech and Language Technology (TLTB, volume 48)

Abstract

In recent years, Michael Zock’s work has focused on finding the most appropriate word when writing or speaking. Semantic relatedness between words can play an important role in this context. Previous studies have identified three kinds of approaches for evaluating relatedness measures: a theoretical examination of whether certain mathematical properties are desirable in mathematically defined measures (distances, similarities, scores, …); a comparison with human judgement; and an evaluation through NLP applications. In this article, we present a novel approach to analysing semantic relatedness between words, based on the relevance of semantic relatedness measures at the global level of a word sense disambiguation task. More specifically, for a given selection of senses for a text, a global similarity can be computed by combining the pairwise similarities between all the selected senses through a particular function (a sum, for example). This global similarity value can be compared with other values pertaining to the selection, for example the F1 measure obtained by evaluating the selection against a gold standard reference annotation. We use several classical local semantic similarity measures, as well as measures built by our team, and study the correlation between the global score and the F1 values computed against a gold standard. We are thus able to locate the typical output of an algorithm relative to an exhaustive evaluation, and hence to optimise the measures and the sense selection process in general.
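The global score described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the chapter: the function names `overlap`, `global_score` and `pearson` are hypothetical, the toy word-overlap measure merely stands in for a local similarity measure, and Pearson's coefficient is assumed as the correlation statistic, since the abstract does not say which one is used.

```python
from itertools import combinations
from math import sqrt


def overlap(def_a, def_b):
    """Toy local measure (assumption): count of distinct words shared by two definitions."""
    return len(set(def_a.split()) & set(def_b.split()))


def global_score(selected_definitions, local_measure=overlap):
    """Combine pairwise local similarities by summation over all unordered pairs."""
    return sum(local_measure(a, b) for a, b in combinations(selected_definitions, 2))


def pearson(xs, ys):
    """Pearson correlation between two equally long, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# One sense selection, represented by the definition text of each chosen sense.
selection = ["a b c", "b c d", "c e"]
score = global_score(selection)

# Hypothetical global scores and F1 values for three sense selections.
correlation = pearson([1.0, 2.0, 3.0], [0.2, 0.4, 0.6])
```

With one global score and one F1 value per sense selection, a correlation near 1 would suggest that maximising the global score also tends to improve agreement with the gold standard.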

Notes

1. Sufficient in the sense of permitting the exhibition of statistical significance, even though in practice we generate several orders of magnitude more samples than the bare minimum necessary to obtain statistically significant differences in the average values.
2. The article is available at https://wiki.csc.calpoly.edu/CSC-581-S11-06/browser/trunk/treebank_paper/buraw/wsj_0105.ready.buraw.
3. http://getalp.imag.fr/static/wsd/Schwab-et-al-SemanticSimilarity2014.html.
4. http://wn-similarity.sourceforge.net.
5. Given that we have over a million configurations and that the correlation is calculated in chunks of 100 scores, each group contains over 10,000 samples, which at a 10⁻⁴ difference range should guarantee sufficient statistical power.

Author information

Authors and Affiliations

Université de Grenoble Alpes, LIG-GETALP, Grenoble, France
Didier Schwab, Andon Tchechmedjiev, Jérôme Goulian & Gilles Sérasset

Corresponding author

Correspondence to Didier Schwab.

Editor information

Editors and Affiliations

CNRS-LIF, UMR 7279, Aix-Marseille University, City, France
Núria Gala
CNRS-LIF, UMR 7279, Aix-Marseille University and University of Mainz, Marseille, France
Reinhard Rapp
CNRS-LIF, UMR 7279, Aix-Marseille University, Marseille, France
Gemma Bel-Enguix

About this chapter

Cite this chapter

Schwab, D., Tchechmedjiev, A., Goulian, J., Sérasset, G. (2015). Comparisons of Relatedness Measures Through a Word Sense Disambiguation Task. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_13

DOI: https://doi.org/10.1007/978-3-319-08043-7_13
Published: 12 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08042-0
Online ISBN: 978-3-319-08043-7
eBook Packages: Computer Science (R0)
