Abstract
Computing semantic relatedness is a key component of information retrieval tasks and natural language processing applications. Wikipedia provides a knowledge base for computing word relatedness with broader coverage than WordNet. In this paper we use a new intrinsic information content (IC) metric with the Wikipedia category graph (WCG) to measure the semantic relatedness between words. We have developed an efficient algorithm to extract the categories assigned to a given word from the WCG. This extraction strategy is coupled with a new intrinsic information content metric based on the subgraph composed of the hypernyms of a given concept. We have also developed a process to quantify the information content of this subgraph. When tested on a common benchmark of similarity ratings, the proposed approach shows a good correlation value compared to other computational models.
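As a rough illustration of the ingredients named in the abstract, the sketch below collects the hypernym subgraph of a category in a toy category graph and scores two categories by the information content of their most informative shared ancestor. It is not the metric proposed in the paper, whose definition is not given here. It substitutes the widely used intrinsic IC of Seco et al. (1 - log(hypo + 1) / log(N)) with a Resnik-style similarity, and the graph data and all function names are hypothetical.

```python
import math
from collections import deque

# Toy category graph (hypothetical data): category -> list of parent categories.
PARENTS = {
    "Cars": ["Vehicles"],
    "Bicycles": ["Vehicles"],
    "Vehicles": ["Transport"],
    "Trains": ["Rail transport"],
    "Rail transport": ["Transport"],
    "Transport": [],
}


def hypernym_subgraph(category, parents=PARENTS):
    """Return the category together with all of its ancestors (breadth-first)."""
    seen = {category}
    queue = deque([category])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def descendant_count(category, parents=PARENTS):
    """Count the categories that have `category` as a proper ancestor."""
    return sum(
        1
        for other in parents
        if other != category and category in hypernym_subgraph(other, parents)
    )


def intrinsic_ic(category, parents=PARENTS):
    """Seco-style intrinsic IC (an assumption, not the paper's metric)."""
    total = len(parents)
    return 1.0 - math.log(descendant_count(category, parents) + 1) / math.log(total)


def resnik_similarity(a, b, parents=PARENTS):
    """IC of the most informative shared ancestor, or 0.0 if there is none."""
    shared = hypernym_subgraph(a, parents) & hypernym_subgraph(b, parents)
    return max((intrinsic_ic(c, parents) for c in shared), default=0.0)


if __name__ == "__main__":
    print(sorted(hypernym_subgraph("Cars")))
    print(resnik_similarity("Cars", "Bicycles"))
    print(resnik_similarity("Cars", "Trains"))
```

In this toy graph, "Cars" and "Bicycles" share the fairly specific ancestor "Vehicles", so they score higher than "Cars" and "Trains", which meet only at the root, where the IC is zero. The real WCG is much larger and can contain cycles. The `seen` set keeps the traversal finite in that case.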
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hadj Taieb, M.A., Ben Aouicha, M., Tmar, M., Ben Hamadou, A. (2012). Wikipedia Category Graph and New Intrinsic Information Content Metric for Word Semantic Relatedness Measuring. In: Xiang, Y., Pathan, M., Tao, X., Wang, H. (eds) Data and Knowledge Engineering. ICDKE 2012. Lecture Notes in Computer Science, vol 7696. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34679-8_13
DOI: https://doi.org/10.1007/978-3-642-34679-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34678-1
Online ISBN: 978-3-642-34679-8
eBook Packages: Computer Science, Computer Science (R0)