Automatic word sense disambiguation based on document networks

Abstract

In this paper, works on word sense disambiguation are surveyed, and the method used in the Texterra system [1] is described. The method is based on calculating the semantic relatedness of Wikipedia concepts. The proposed method is compared with existing word sense disambiguation methods on several document collections.
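The abstract names the idea but not the formula. The following is a minimal Python sketch, under stated assumptions, of how link-based relatedness between Wikipedia concepts can drive sense selection. It is not the Texterra metric of [15]. It uses the incoming-link measure of Milne and Witten [21] as a stand-in. The concept names, link sets, and article count (`inlinks`, `TOTAL_ARTICLES`) are hypothetical toy data. Each candidate sense of an ambiguous word is scored by its summed relatedness to the concepts of context words, and the highest-scoring sense is chosen.

```python
import math
from typing import Dict, List, Set


def link_relatedness(in_a: Set[str], in_b: Set[str], total_articles: int) -> float:
    """Relatedness of two concepts from their sets of incoming links.

    Stand-in: the Milne and Witten link-based measure (ref. [21]), NOT the
    Texterra metric. Returns a value in [0, 1]; 0 if no article links to both.
    """
    common = in_a & in_b
    if not common:
        return 0.0
    big, small = max(len(in_a), len(in_b)), min(len(in_a), len(in_b))
    denominator = math.log(total_articles) - math.log(small)
    if denominator <= 0:  # degenerate: link set as large as the whole collection
        return 0.0
    distance = (math.log(big) - math.log(len(common))) / denominator
    return max(0.0, 1.0 - distance)


def disambiguate(candidates: List[str], context: List[str],
                 inlinks: Dict[str, Set[str]], total_articles: int) -> str:
    """Return the candidate sense with the highest summed relatedness to the context."""
    def score(sense: str) -> float:
        return sum(link_relatedness(inlinks[sense], inlinks[c], total_articles)
                   for c in context)
    return max(candidates, key=score)


# Hypothetical toy data: article title -> titles of articles that link to it.
TOTAL_ARTICLES = 1000
inlinks = {
    "Jaguar (animal)": {"Amazon", "Big cat", "Leopard", "Rainforest ecology", "Belize"},
    "Jaguar Cars": {"Land Rover", "Coventry", "Ford", "Tata Motors"},
    "Rainforest": {"Amazon", "Belize", "Rainforest ecology", "Biodiversity"},
    "Predator": {"Big cat", "Leopard", "Food chain"},
}

sense = disambiguate(["Jaguar (animal)", "Jaguar Cars"],
                     ["Rainforest", "Predator"], inlinks, TOTAL_ARTICLES)
print(sense)  # → Jaguar (animal)
```

Summing relatedness over all context concepts is the simplest aggregation; the exact scoring scheme used in Texterra is not given in this abstract.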


References

  1. Texterra: A Toolkit for Text Mining. http://modis.ispras.ru/texterra.

  2. Miller, G.A., WordNet: A Lexical Database for English, Commun. ACM, 1995, vol. 38, no. 11, pp. 39–41.

  3. Cycorp, Inc. www.cyc.com.

  4. Francis, W. and Kucera, H., Brown Corpus Manual. http://icame.uib.no/brown/bcm.html.

  5. The Penn Treebank Project. http://www.cis.upenn.edu/~treebank/.

  6. Manning, C.D. and Schutze, H., Foundations of Statistical Natural Language Processing, Cambridge, Mass.: MIT, 1999.

  7. Ide, N. and Véronis, J., Word Sense Disambiguation: The State of the Art, Computational Linguistics, 1998.

  8. Agirre, E. and Edmonds, P.G., Word Sense Disambiguation: Algorithms and Applications, Springer, 2006.

  9. Senseval Web Page. www.senseval.org.

  10. Mihalcea, R., Using Wikipedia for Automatic Word Sense Disambiguation, Proc. of NAACL HLT 2007, Rochester, NY, 2007, pp. 196–203.

  11. Mihalcea, R. and Csomai, A., Wikify!: Linking Documents to Encyclopedic Knowledge, Proc. of the 16th ACM Conf. on Information and Knowledge Management (CIKM’07), 2007.

  12. Menczer, F., Evolution of Document Networks, Proc. of the National Academy of Sciences of the United States of America.

  13. Albert, R. and Barabási, A.-L., Statistical Mechanics of Complex Networks, Rev. Modern Phys., 2002, vol. 74, pp. 47–97.

  14. Cohen, R. and Havlin, S., Scale-free Networks are Ultrasmall, Phys. Rev. Lett., 2003, vol. 90, no. 5, 058701.

  15. Turdakov, D. and Velikhov, P., Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Applications to Word Sense Disambiguation, Proc. of SYRCoDIS, 2008.

  16. Kilgarriff, A. and Grefenstette, G., Introduction to the Special Issue on the Web as Corpus, Computational Linguistics, 2003, vol. 29, no. 3, pp. 333–347.

  17. Zesch, T. and Gurevych, I., Analysis of the Wikipedia Category Graph for NLP Applications, Proc. of the TextGraphs-2 Workshop, NAACL-HLT, 2007.

  18. Lesk, M., Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone, ACM Special Interest Group for Design of Communication, Proc. of the 5th Ann. Int. Conf. on System Documentation, 1986, pp. 24–26.

  19. Pradhan, S., Loper, E., Dligach, D., and Palmer, M., SemEval-2007 Task 17: English Lexical Sample, SRL and All Words, Proc. of the 4th Int. Workshop on Semantic Evaluations (SemEval-2007), 2007, Prague, Czech Republic, pp. 87–92.

  20. Strube, M. and Ponzetto, S.P., WikiRelate! Computing Semantic Relatedness Using Wikipedia, Proc. of AAAI, 2006, pp. 1419–1424.

  21. Milne, D. and Witten, I.H., An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links, Proc. of the AAAI’08 Workshop on Wikipedia and Artificial Intelligence, 2008.

  22. Cucerzan, S., Large-Scale Named Entity Disambiguation Based on Wikipedia Data, Proc. 2007 Joint Conf. on EMNLP and CoNLL, Prague, 2007, pp. 708–716.

  23. Bunescu, R. and Pasca, M., Using Encyclopedic Knowledge for Named Entity Disambiguation, Proc. of the 11th Conf. of the European Chapter of the Association for Computational Linguistics (EACL), Trento, Italy, 2006.

  24. Milne, D. and Witten, I.H., Learning to Link with Wikipedia, Proc. of the ACM Conf. on Information and Knowledge Management, 2008.

  25. Medelyan, O., Witten, I.H., and Milne, D., Topic Indexing with Wikipedia, Proc. of the AAAI’08 Workshop on Wikipedia and Artificial Intelligence, 2008.

  26. Jeh, G. and Widom, J., SimRank: A Measure of Structural-Context Similarity, Proc. of the Eighth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2002.

  27. Lizorkin, D., Velikhov, P., Grinev, M., and Turdakov, D., Accuracy Estimate and Optimization Techniques for SimRank Computation, Proc. of the 34th Int. Conf. on Very Large Data Bases (VLDB’08), 2008, pp. 422–433.

Author information

Correspondence to D. Yu. Turdakov.

Additional information

Original Russian Text © D.Yu. Turdakov, S.D. Kuznetsov, 2010, published in Programmirovanie, 2010, Vol. 36, No. 1.

About this article

Cite this article

Turdakov, D.Y., Kuznetsov, S.D. Automatic word sense disambiguation based on document networks. Program Comput Soft 36, 11–18 (2010). https://doi.org/10.1134/S0361768810010032

Keywords

  • Semantic Relatedness
  • Ambiguous Word
  • Word Sense
  • Word Sense Disambiguation
  • Test Collection