Abstract
In this paper we propose a general framework for word sense disambiguation using knowledge latent in Wikipedia. Specifically, we exploit the rich and growing Wikipedia corpus to build a large and robust knowledge repository of keyphrases and their associated candidate topics. Keyphrases are derived mainly from Wikipedia article titles and from the anchor texts of wikilinks. A given keyphrase is disambiguated using both the commonness of each candidate topic and its context-dependent relatedness, where unnecessary (and potentially noisy) context information is pruned. Extensive experimental evaluations with different relatedness measures show that the proposed technique achieves disambiguation accuracy comparable to state-of-the-art techniques while incurring orders of magnitude less computation cost.
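The abstract does not give formulas, so the Python sketch below only illustrates the general idea of combining commonness with pruned context relatedness. It is not the authors' implementation, and several parts are assumptions. Commonness is taken as the fraction of wikilinks with a given anchor text that point to a topic; article titles are not modeled. Relatedness uses the link-based measure of Milne and Witten, which is one common choice; the paper itself evaluates several measures. Context topics are assumed to be the most common sense of the other keyphrases. The linear weight `alpha`, the pruning threshold `min_rel`, all function names, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_repository(anchors):
    """Map each keyphrase to {candidate topic: commonness}.

    anchors: iterable of (anchor_text, target_article) pairs taken from wikilinks.
    Commonness (assumed definition): share of links with this anchor text that
    point to the topic.
    """
    counts = defaultdict(Counter)
    for text, topic in anchors:
        counts[text.lower()][topic] += 1
    repo = {}
    for phrase, ctr in counts.items():
        total = sum(ctr.values())
        repo[phrase] = {topic: n / total for topic, n in ctr.items()}
    return repo


def link_relatedness(a, b, inlinks, n_articles):
    """Milne-Witten style relatedness from the sets of articles linking to a and b."""
    in_a, in_b = inlinks.get(a, set()), inlinks.get(b, set())
    shared = len(in_a & in_b)
    if shared == 0:
        return 0.0
    big, small = max(len(in_a), len(in_b)), min(len(in_a), len(in_b))
    denom = math.log(n_articles) - math.log(small)
    if denom <= 0:  # degenerate case: every article links to both topics
        return 0.0
    dist = (math.log(big) - math.log(shared)) / denom
    return max(0.0, 1.0 - dist)


def disambiguate(phrase, context_phrases, repo, inlinks, n_articles,
                 alpha=0.5, min_rel=0.1):
    """Pick the candidate topic maximizing a hypothetical linear score."""
    key = phrase.lower()
    candidates = repo[key]
    # Context topics: most common sense of each other known keyphrase (assumption).
    context = []
    for p in context_phrases:
        senses = repo.get(p.lower())
        if senses and p.lower() != key:
            context.append(max(senses, key=senses.get))
    rel = {(t, c): link_relatedness(t, c, inlinks, n_articles)
           for t in candidates for c in context}
    # Prune context topics weakly related to every candidate (likely noise).
    kept = [c for c in context if max(rel[(t, c)] for t in candidates) >= min_rel]

    def score(topic):
        avg_rel = sum(rel[(topic, c)] for c in kept) / len(kept) if kept else 0.0
        return alpha * candidates[topic] + (1 - alpha) * avg_rel

    best = max(candidates, key=score)
    return best, score(best)


# Hypothetical toy data: anchor-text statistics and in-link sets.
anchors = ([("jaguar", "Jaguar Cars")] * 7 + [("jaguar", "Jaguar (animal)")] * 3
           + [("ocelot", "Ocelot"), ("rainforest", "Rainforest"), ("paris", "Paris")])
inlinks = {
    "Jaguar (animal)": {1, 2, 3, 4, 5, 6},
    "Jaguar Cars": {10, 11, 12, 13, 14, 15},
    "Ocelot": {1, 2, 3, 4, 20},
    "Rainforest": {2, 3, 5, 30, 31},
    "Paris": {50, 51, 52},
}
repo = build_repository(anchors)
# Without context, commonness alone picks "Jaguar Cars".
print(disambiguate("jaguar", [], repo, inlinks, 100))
# With related context, "Jaguar (animal)" wins; the unrelated "Paris" is pruned.
print(disambiguate("jaguar", ["ocelot", "rainforest", "paris"], repo, inlinks, 100))
```

Because pruning here is applied once per target keyphrase rather than per candidate, every candidate is scored against the same reduced context. This keeps the comparison fair, and it also means relatedness is computed only for the retained context topics when scoring.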
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, C., Sun, A., Datta, A. (2011). A Generalized Method for Word Sense Disambiguation Based on Wikipedia. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_65
DOI: https://doi.org/10.1007/978-3-642-20161-5_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer Science, Computer Science (R0)