Word Sense Disambiguation Using Wikipedia

Dandala, Bharath; Mihalcea, Rada; Bunescu, Razvan

doi:10.1007/978-3-642-35085-6_9

Part of the book series: Theory and Applications of Natural Language Processing (NLP)

Abstract

This paper describes explorations in word sense disambiguation using Wikipedia as a source of sense annotations. Through experiments on four different languages, we show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.

Notes

1. http://meta.wikimedia.org/wiki/List_of_Wikipedias
2. The average length of a paragraph is 80 words.
3. http://www.senseval.org
4. Note that this baseline assumes the availability of a sense-tagged corpus in order to determine the most frequent sense of a word. The baseline is therefore "informed," as compared to a random, "uninformed" sense selection.
5. http://translate.google.com
6. http://www.wiktionary.org
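
The most-frequent-sense baseline described in note 4 can be sketched roughly as follows. This is only an illustration, assuming the sense-tagged corpus is available as (word, sense) pairs. The function names and the toy sense labels are hypothetical and are not taken from the chapter.

```python
from collections import Counter, defaultdict


def train_mfs(tagged_corpus):
    """Map each word to its most frequent sense in a sense-tagged corpus.

    tagged_corpus: iterable of (word, sense) pairs (assumed input format).
    """
    counts = defaultdict(Counter)
    for word, sense in tagged_corpus:
        counts[word][sense] += 1
    # For each word, keep the single sense with the highest count.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def mfs_accuracy(model, test_set):
    """Fraction of test instances whose gold sense equals the word's MFS."""
    test_set = list(test_set)
    if not test_set:
        return 0.0
    correct = sum(1 for word, gold in test_set if model.get(word) == gold)
    return correct / len(test_set)


# Hypothetical toy data, for illustration only.
train = [
    ("bar", "bar (music)"),
    ("bar", "bar (establishment)"),
    ("bar", "bar (establishment)"),
    ("paper", "paper (material)"),
]
test = [("bar", "bar (establishment)"), ("bar", "bar (music)")]

model = train_mfs(train)
score = mfs_accuracy(model, test)
```

Because the sense counts come from annotated data, the baseline is "informed" in the sense of note 4. A random baseline would instead pick uniformly among the senses of each word.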

Acknowledgements

This material is based in part upon work supported by the National Science Foundation IIS awards #1018613 and #1018590 and CAREER award #0747340. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and Affiliations

Department of Computer Science, University of North Texas, Denton, TX, USA
Bharath Dandala & Rada Mihalcea
School of EECS, Ohio University, Athens, OH, USA
Razvan Bunescu

Corresponding author

Correspondence to Rada Mihalcea.

Editor information

Editors and Affiliations

Department of Computer Science, Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt, Darmstadt, Germany
Iryna Gurevych & Jungi Kim

About this chapter

Cite this chapter

Dandala, B., Mihalcea, R., Bunescu, R. (2013). Word Sense Disambiguation Using Wikipedia. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_9

DOI: https://doi.org/10.1007/978-3-642-35085-6_9
Published: 21 February 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35084-9
Online ISBN: 978-3-642-35085-6
eBook Packages: Computer Science, Computer Science (R0)