Finding the Most Frequent Sense of a Word by the Length of Its Definition

Calvo, Hiram; Gelbukh, Alexander

doi:10.1007/978-3-319-13647-9_1

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8856)

Abstract

Most frequent sense (MFS) is a very powerful heuristic in word sense disambiguation, extremely difficult to outperform even with sophisticated methods. We show that counting the number of words, characters, or relationships in a word’s sense definitions allows guessing the most frequent sense of the word: the MFS usually has a longer gloss, more examples of usage, and more relationships with other words (synonyms, hyponyms, etc.). In addition, we show that this effect is resource-dependent, causing some algorithms to perform differently with different dictionaries.
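
The length heuristic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sense identifiers, the toy glosses, and the function name `guess_mfs_by_gloss_length` are all hypothetical, and the sketch only assumes that each sense of a word comes with a gloss string.

```python
from typing import Dict

# Hypothetical toy sense inventory (not taken from any real dictionary):
# sense identifier -> gloss text.
TOY_SENSES: Dict[str, str] = {
    "bank#financial": (
        "a financial institution that accepts deposits "
        "and channels the money into lending activities"
    ),
    "bank#river": "sloping land beside a body of water",
}


def guess_mfs_by_gloss_length(senses: Dict[str, str], unit: str = "words") -> str:
    """Return the sense whose gloss is longest, as a guess for the MFS.

    unit="words" counts whitespace-separated tokens;
    unit="chars" counts characters.
    Ties are resolved in favour of the sense listed first.
    """
    if not senses:
        raise ValueError("senses must not be empty")
    if unit not in ("words", "chars"):
        raise ValueError("unit must be 'words' or 'chars'")

    def length(gloss: str) -> int:
        return len(gloss.split()) if unit == "words" else len(gloss)

    # max() returns the first maximal key, which gives the tie-breaking rule.
    return max(senses, key=lambda sense_id: length(senses[sense_id]))


if __name__ == "__main__":
    print(guess_mfs_by_gloss_length(TOY_SENSES, unit="words"))
    print(guess_mfs_by_gloss_length(TOY_SENSES, unit="chars"))
```

The same pattern would extend to the other signals mentioned in the abstract (number of usage examples, number of relationships) by swapping the `length` function for a different count, provided the lexical resource exposes those fields.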

Author information

Authors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz s/n, esq. Av. Mendizábal, D.F., 07738, Mexico
Hiram Calvo & Alexander Gelbukh

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz s/n, Col. Nueva Industrial Vallejo, 07738, Mexico City, Mexico
Alexander Gelbukh
Área Académica de Computación y Electrónica, Carretera Pachuca-Tulancingo, Universidad Autónoma del Estado de Hidalgo, Km. 4.5, Col. Carboneras, Mineral de la Reforma, 42180, Hidalgo, Mexico
Félix Castro Espinoza
Facultad de ciencias, Universidad Autónoma Nacional de México, Ciudad Universitaria, México DF, Mexico
Sofía N. Galicia-Haro

About this paper

Cite this paper

Calvo, H., Gelbukh, A. (2014). Finding the Most Frequent Sense of a Word by the Length of Its Definition. In: Gelbukh, A., Espinoza, F.C., Galicia-Haro, S.N. (eds) Human-Inspired Computing and Its Applications. MICAI 2014. Lecture Notes in Computer Science, vol 8856. Springer, Cham. https://doi.org/10.1007/978-3-319-13647-9_1

DOI: https://doi.org/10.1007/978-3-319-13647-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13646-2
Online ISBN: 978-3-319-13647-9
eBook Packages: Computer Science, Computer Science (R0)
