Word Sense Disambiguation Using Wikipedia

The People’s Web Meets NLP

Abstract

This paper describes explorations in word sense disambiguation using Wikipedia as a source of sense annotations. Through experiments on four different languages, we show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.

Notes

  1. http://meta.wikimedia.org/wiki/List_of_Wikipedias

  2. The average length of a paragraph is 80 words.

  3. http://www.senseval.org

  4. Note that this baseline assumes the availability of a sense tagged corpus in order to determine the most frequent sense of a word. The baseline is therefore “informed,” as compared to a random, “uninformed” sense selection.

  5. http://translate.google.com

  6. http://www.wiktionary.org
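
The most-frequent-sense baseline described in the notes can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function names `train_mfs` and `mfs_accuracy`, the (word, sense) pair format, and the toy sense labels are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_mfs(tagged_corpus):
    """Map each word to its most frequent sense in a sense-tagged corpus.

    tagged_corpus is an iterable of (word, sense) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for word, sense in tagged_corpus:
        counts[word][sense] += 1
    # most_common(1) returns a one-element list holding the top (sense, count) pair
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def mfs_accuracy(mfs, test_pairs):
    """Fraction of test instances whose gold sense equals the stored sense.

    Words never seen in training have no stored sense and count as errors.
    """
    test_pairs = list(test_pairs)
    if not test_pairs:
        return 0.0
    correct = sum(1 for word, gold in test_pairs if mfs.get(word) == gold)
    return correct / len(test_pairs)


# Toy data with made-up sense labels, for illustration only.
train = [
    ("bar", "bar (establishment)"),
    ("bar", "bar (establishment)"),
    ("bar", "bar (music)"),
]
test = [
    ("bar", "bar (music)"),
    ("bar", "bar (establishment)"),
]

mfs = train_mfs(train)
accuracy = mfs_accuracy(mfs, test)
```

Because the sense counts come from annotated data, the baseline is "informed" in the sense the note describes: it cannot be computed without a sense tagged corpus, unlike a random selection among the candidate senses.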

Acknowledgements

This material is based in part upon work supported by the National Science Foundation IIS awards #1018613 and #1018590 and CAREER award #0747340. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Corresponding author

Correspondence to Rada Mihalcea.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dandala, B., Mihalcea, R., Bunescu, R. (2013). Word Sense Disambiguation Using Wikipedia. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_9

  • DOI: https://doi.org/10.1007/978-3-642-35085-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35084-9

  • Online ISBN: 978-3-642-35085-6

  • eBook Packages: Computer Science, Computer Science (R0)
