Artificial Intelligence Review, Volume 41, Issue 2, pp 241–260

Unsupervised word sense disambiguation with N-gram features

  • Daniel Preotiuc-Pietro
  • Florentina Hristea


The present paper concentrates on feature selection for unsupervised word sense disambiguation (WSD) performed with an underlying Naïve Bayes model. It introduces web N-gram features which, to our knowledge, are used for the first time in unsupervised WSD. By creating features from unlabeled data, we "help" a simple, knowledge-lean disambiguation algorithm increase its accuracy significantly, since the algorithm receives easily obtainable knowledge. The performance of this method is compared to that of methods relying on completely different feature sets. Test results for nouns, adjectives, and verbs show that web N-gram feature selection is a reliable alternative to previously existing approaches, provided that a "quality list" of features, adapted to the part of speech, is used.
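The underlying model described above — a Naïve Bayes classifier over context features, fitted without labels via the EM algorithm — can be illustrated with a minimal sketch. This is not the authors' implementation; it is a generic Bernoulli-mixture EM over binary context features (e.g., the presence near the target word of words drawn from an N-gram-derived feature list), with all variable names and the toy data being our own assumptions:

```python
import numpy as np

def em_naive_bayes(X, n_senses, n_iter=50, seed=0):
    """Unsupervised Naive Bayes (Bernoulli mixture) fitted with EM.

    X: (n_contexts, n_features) binary matrix; each row is one occurrence
    of the ambiguous word, each column a context feature (e.g. presence
    of a word from an N-gram-derived "quality list" near the target).
    Returns sense priors, per-sense feature probabilities, and the
    posterior sense distribution for every context.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(n_senses, 1.0 / n_senses)        # P(sense)
    theta = rng.uniform(0.3, 0.7, (n_senses, d))  # P(feature | sense)
    for _ in range(n_iter):
        # E-step: unnormalised log P(sense, context) under Naive Bayes,
        # then normalise per row to get sense responsibilities.
        log_p = (np.log(pi)
                 + X @ np.log(theta).T
                 + (1 - X) @ np.log(1 - theta).T)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from soft counts (smoothed).
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = (resp.T @ X + 1.0) / (nk[:, None] + 2.0)
    return pi, theta, resp

# Toy data: two groups of contexts activating disjoint feature sets,
# standing in for two senses of an ambiguous word.
X = np.array([[1, 1, 0, 0]] * 5 + [[0, 0, 1, 1]] * 5)
pi, theta, resp = em_naive_bayes(X, n_senses=2)
senses = resp.argmax(axis=1)  # induced sense label per context
```

On this separable toy data, EM assigns the two groups of contexts to different induced senses; in the unsupervised setting, the quality of the feature columns of `X` — which is what the paper's N-gram feature selection addresses — largely determines how well these induced senses align with actual word senses.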


Keywords: Bayesian classification · The EM algorithm · Word sense disambiguation · Unsupervised disambiguation · Web-scale N-grams





Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. Department of Computer Science, University of Sheffield, Sheffield, UK
  2. Department of Computer Science, University of Bucharest, Bucharest, Romania
