Wikipedia Category Graph and New Intrinsic Information Content Metric for Word Semantic Relatedness Measuring

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7696)

Abstract

Computing semantic relatedness is a key component of information retrieval tasks and natural language processing applications. Wikipedia provides a knowledge base for computing word relatedness with broader coverage than WordNet. In this paper we use a new intrinsic information content (IC) metric with the Wikipedia category graph (WCG) to measure the semantic relatedness between words. We have developed an efficient algorithm to extract the categories assigned to a given word from the WCG. This extraction strategy is coupled with a new intrinsic information content metric based on the subgraph composed of the hypernyms of a given concept. We have also developed a process to quantify the information content of this subgraph. When tested on a common benchmark of similarity ratings, the proposed approach shows a good correlation value compared to other computational models.
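The abstract does not give the paper's IC formula, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a tiny hand-made category graph (`PARENTS`), a hypothetical word-to-category mapping (`WORD_CATEGORIES`), and, as a stand-in for the new metric, the familiar descendant-count form of intrinsic IC, 1 - log(descendants + 1) / log(N). Relatedness is then taken, Resnik-style, as the highest IC among the categories shared by the two hypernym subgraphs.

```python
import math

# Hypothetical toy category graph: category -> parent (hypernym) categories.
PARENTS = {
    "Root": [],
    "Science": ["Root"],
    "Culture": ["Root"],
    "Biology": ["Science"],
    "Physics": ["Science"],
    "Music": ["Culture"],
    "Biophysics": ["Biology", "Physics"],
}

# Hypothetical mapping from words to the categories assigned to them.
WORD_CATEGORIES = {
    "cell": ["Biology"],
    "membrane": ["Biophysics"],
    "guitar": ["Music"],
}


def hypernym_subgraph(category):
    """Return the category together with all of its ancestors."""
    seen = set()
    stack = [category]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def descendant_count(category):
    """Count the categories that have `category` as a proper ancestor."""
    return sum(
        1
        for other in PARENTS
        if other != category and category in hypernym_subgraph(other)
    )


def intrinsic_ic(category):
    """Stand-in intrinsic IC: 1 - log(descendants + 1) / log(N)."""
    total = len(PARENTS)
    return 1.0 - math.log(descendant_count(category) + 1) / math.log(total)


def relatedness(word_a, word_b):
    """Highest IC among categories shared by the two hypernym subgraphs."""
    best = 0.0
    for cat_a in WORD_CATEGORIES[word_a]:
        for cat_b in WORD_CATEGORIES[word_b]:
            shared = hypernym_subgraph(cat_a) & hypernym_subgraph(cat_b)
            best = max([best] + [intrinsic_ic(c) for c in shared])
    return best
```

In this toy graph, "cell" and "membrane" share the specific category Biology and therefore score higher than "cell" and "guitar", which share only the root. A real implementation would need to handle the size and the cycles of the actual Wikipedia category graph, which this sketch ignores.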


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hadj Taieb, M.A., Ben Aouicha, M., Tmar, M., Ben Hamadou, A. (2012). Wikipedia Category Graph and New Intrinsic Information Content Metric for Word Semantic Relatedness Measuring. In: Xiang, Y., Pathan, M., Tao, X., Wang, H. (eds) Data and Knowledge Engineering. ICDKE 2012. Lecture Notes in Computer Science, vol 7696. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34679-8_13

  • DOI: https://doi.org/10.1007/978-3-642-34679-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34678-1

  • Online ISBN: 978-3-642-34679-8

  • eBook Packages: Computer Science, Computer Science (R0)
