Programming and Computer Software, Volume 36, Issue 1, pp 11–18

Automatic word sense disambiguation based on document networks

  • D. Yu. Turdakov
  • S. D. Kuznetsov


This paper surveys work on word sense disambiguation and describes the method used in the Texterra system [1]. The method is based on computing the semantic relatedness of Wikipedia concepts. The proposed method is compared with existing word sense disambiguation methods on several document collections.
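To make the idea of relatedness-based disambiguation concrete, the following is a minimal sketch and not the Texterra implementation. It assumes each Wikipedia concept is represented by the set of concepts it links to, uses a Dice coefficient over those sets as a stand-in relatedness measure, and picks the candidate sense most related to the surrounding context concepts. The names `dice_relatedness`, `disambiguate`, and the toy `LINKS` graph are hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Set


def dice_relatedness(links_a: Set[str], links_b: Set[str]) -> float:
    """Dice coefficient over two link sets (an illustrative stand-in measure)."""
    if not links_a and not links_b:
        return 0.0
    return 2.0 * len(links_a & links_b) / (len(links_a) + len(links_b))


def disambiguate(candidates: List[str], context: List[str],
                 links: Dict[str, Set[str]]) -> str:
    """Return the candidate concept with the highest total relatedness to the context."""
    def score(candidate: str) -> float:
        return sum(
            dice_relatedness(links.get(candidate, set()), links.get(c, set()))
            for c in context
        )
    return max(candidates, key=score)


# Hypothetical toy link graph: concept -> set of concepts it links to.
LINKS = {
    "Jaguar (animal)": {"Cat", "Predator", "South America", "Rainforest"},
    "Jaguar Cars": {"Car", "United Kingdom", "Engine", "Luxury vehicle"},
    "Rainforest": {"South America", "Tree", "Predator", "Climate"},
    "Predator": {"Cat", "Food chain", "Rainforest"},
}

# Disambiguate "jaguar" given two unambiguous context concepts.
best = disambiguate(["Jaguar (animal)", "Jaguar Cars"],
                    ["Rainforest", "Predator"], LINKS)
print(best)
```

In this toy graph the animal sense shares link neighbors with both context concepts while the car sense shares none, so the animal sense is selected. A real system would draw the link sets from a Wikipedia dump and could substitute any other relatedness measure.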


Semantic Relatedness, Ambiguous Word, Word Sense, Word Sense Disambiguation, Test Collection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Texterra: A Toolkit for Text Mining.
  2. Miller, G.A., WordNet: A Lexical Database for English, Commun. ACM, 1995, vol. 38, no. 11, pp. 39–41.
  3. Cycorp, Inc.
  4. Francis, W. and Kucera, H., Brown Corpus Manual.
  5. The Penn Treebank Project. http://www.cis.upenn.edu/~treebank/.
  6. Manning, C.D. and Schütze, H., Foundations of Statistical Natural Language Processing, Cambridge, Mass.: MIT, 1999.
  7. Ide, N. and Véronis, J., Word Sense Disambiguation: The State of the Art, Computational Linguistics, 1998.
  8. Agirre, E. and Edmonds, P.G., Word Sense Disambiguation: Algorithms and Applications, Springer, 2006.
  9. Senseval Web Page.
  10. Mihalcea, R., Using Wikipedia for Automatic Word Sense Disambiguation, Proc. of NAACL HLT 2007, Rochester, NY, 2007, pp. 196–203.
  11. Mihalcea, R. and Csomai, A., Wikify!: Linking Documents to Encyclopedic Knowledge, Proc. of the 16th ACM Conf. on Information and Knowledge Management (CIKM’07), 2007.
  12. Menczer, F., Evolution of Document Networks, Proc. of the National Academy of Sciences of the United States of America.
  13. Albert, R. and Barabási, A.-L., Statistical Mechanics of Complex Networks, Rev. Modern Phys., 2002, vol. 74, pp. 47–97.
  14. Cohen, R. and Havlin, S., Scale-free Networks are Ultrasmall, Phys. Rev. Lett., 2003, vol. 90, no. 5, 058701.
  15. Turdakov, D. and Velikhov, P., Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Applications to Word Sense Disambiguation, Proc. of SYRCoDIS, 2008.
  16. Kilgarriff, A. and Grefenstette, G., Introduction to the Special Issue on the Web as Corpus, Computational Linguistics, 2003, vol. 29, no. 3, pp. 333–347.
  17. Zesch, T. and Gurevych, I., Analysis of the Wikipedia Category Graph for NLP Applications, Proc. of the TextGraphs-2 Workshop, NAACL-HLT, 2007.
  18. Lesk, M., Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone, ACM Special Interest Group for Design of Communication, Proc. of the 5th Ann. Int. Conf. on System Documentation, 1986, pp. 24–26.
  19. Pradhan, S., Loper, E., Dligach, D., and Palmer, M., SemEval-2007 Task 17: English Lexical Sample, SRL and All Words, Proc. of the 4th Int. Workshop on Semantic Evaluations (SemEval-2007), 2007, Prague, Czech Republic, pp. 87–92.
  20. Strube, M. and Ponzetto, S.P., WikiRelate! Computing Semantic Relatedness Using Wikipedia, Proc. of AAAI, 2006, pp. 1419–1424.
  21. Milne, D. and Witten, I.H., An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links, Proc. of the AAAI’08 Workshop on Wikipedia and Artificial Intelligence, 2008.
  22. Cucerzan, S., Large-Scale Named Entity Disambiguation Based on Wikipedia Data, Proc. 2007 Joint Conf. on EMNLP and CoNLL, Prague, 2007, pp. 708–716.
  23. Bunescu, R. and Pasca, M., Using Encyclopedic Knowledge for Named Entity Disambiguation, Proc. of the 11th Conf. of the European Chapter of the Association for Computational Linguistics (EACL), Trento, Italy, 2006.
  24. Milne, D. and Witten, I.H., Learning to Link with Wikipedia, Proc. of the ACM Conf. on Information and Knowledge Management, 2008.
  25. Medelyan, O., Witten, I.H., and Milne, D., Topic Indexing with Wikipedia, Proc. of the AAAI’08 Workshop on Wikipedia and Artificial Intelligence, 2008.
  26. Jeh, G. and Widom, J., SimRank: A Measure of Structural-Context Similarity, Proc. of the Eighth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2002.
  27. Lizorkin, D., Velikhov, P., Grinev, M., and Turdakov, D., Accuracy Estimate and Optimization Techniques for SimRank Computation, Proc. of the 34th Int. Conf. on Very Large Data Bases (VLDB’08), pp. 422–433.

Copyright information

© Pleiades Publishing, Ltd. 2010

Authors and Affiliations

  1. Department of Computational Mathematics and Cybernetics, Moscow State University, Moscow, Russia
  2. Institute of System Programming, Russian Academy of Sciences, Moscow, Russia
