This paper surveys work on word sense disambiguation and describes the method used in the Texterra system. The method is based on computing the semantic relatedness of Wikipedia concepts. The proposed method is compared with existing word sense disambiguation methods on several document collections.
Original Russian Text © D.Yu. Turdakov, S.D. Kuznetsov, 2010, published in Programmirovanie, 2010, Vol. 36, No. 1.
Turdakov, D.Y., Kuznetsov, S.D. Automatic word sense disambiguation based on document networks. Program Comput Soft 36, 11–18 (2010). https://doi.org/10.1134/S0361768810010032