Word Sense Disambiguation Using Wikipedia

Abstract

This paper describes explorations in word sense disambiguation using Wikipedia as a source of sense annotations. Through experiments on four different languages, we show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Bharath Dandala (1)
  • Rada Mihalcea (1)
  • Razvan Bunescu (2)

  1. Department of Computer Science, University of North Texas, Denton, USA
  2. School of EECS, Ohio University, Athens, USA