Knowledge and Information Systems, Volume 41, Issue 2, pp 467–497

A new semantic relatedness measurement using WordNet features

Regular Paper

Abstract

Computing semantic similarity/relatedness between concepts and words is an important issue in many research fields. Information theoretic approaches exploit the notion of Information Content (IC), which gives a better understanding of a concept's semantics. In this paper, we present a complete survey of IC metrics together with a critical study. Then, we propose a new intrinsic IC computing method that uses taxonomical features extracted from an ontology for a particular concept. This approach quantifies the subgraph formed by the concept's subsumers, using the depth and the descendants count as taxonomical parameters. In a second part, we integrate this IC metric into a new parameterized multistrategy approach for measuring word semantic relatedness. This measure exploits WordNet features such as the noun “is a” taxonomy, the nominalization relation (which allows the verb “is a” taxonomy to be used) and the words shared (overlaps) between glosses. Our work has been evaluated and compared with related works using a wide set of benchmarks designed for word semantic similarity/relatedness tasks. The results show that our IC method and the new relatedness measure correlate better with human judgments than related works.
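The abstract does not give the paper's IC formula or the weights of its multistrategy measure, so the sketch below only illustrates the general ideas it names; it is not the authors' method. As a stand-in for the proposed IC it uses the well-known intrinsic IC of Seco et al. (2004), which relies only on a concept's descendant (hyponym) count; the depth-based subsumer-subgraph quantification of the proposed metric is not reproduced. This IC is plugged into Lin's (1998) similarity, a Lesk-style gloss overlap is added, and the two are mixed with a hypothetical weight `alpha`. The toy taxonomy, the glosses, the stop-word list and the linear combination are all assumptions made for illustration, and the nominalization relation and verb taxonomy are omitted.

```python
import math

# Hypothetical toy noun taxonomy: concept -> direct hypernyms ("is a" parents).
TAXONOMY = {
    "entity": [],
    "object": ["entity"],
    "artifact": ["object"],
    "vehicle": ["artifact"],
    "car": ["vehicle"],
    "bicycle": ["vehicle"],
    "instrument": ["artifact"],
    "guitar": ["instrument"],
}

# Hypothetical WordNet-style glosses and a tiny illustrative stop-word list.
GLOSSES = {
    "car": "a motor vehicle with four wheels",
    "bicycle": "a vehicle with two wheels moved by foot pedals",
    "guitar": "a stringed instrument usually with six strings",
}
STOPWORDS = {"a", "an", "the", "with", "by", "of", "usually"}


def subsumers(c):
    """Return c together with all of its ancestors in the is-a taxonomy."""
    seen, stack = set(), [c]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(TAXONOMY[node])
    return seen


def hyponym_count(c):
    """Number of concepts strictly below c (its descendants)."""
    return sum(1 for other in TAXONOMY if other != c and c in subsumers(other))


def intrinsic_ic(c):
    """Seco et al. (2004) intrinsic IC: 1 - log(hypo(c) + 1) / log(N)."""
    n = len(TAXONOMY)
    return 1.0 - math.log(hyponym_count(c) + 1) / math.log(n)


def lin_similarity(c1, c2):
    """Lin (1998): 2 * IC(lcs) / (IC(c1) + IC(c2)); lcs = most informative common subsumer."""
    common = subsumers(c1) & subsumers(c2)
    ic_lcs = max(intrinsic_ic(c) for c in common)
    denom = intrinsic_ic(c1) + intrinsic_ic(c2)
    return 2.0 * ic_lcs / denom if denom > 0 else 0.0


def gloss_overlap(c1, c2):
    """Lesk-style overlap: shared content words, normalised by the shorter gloss."""
    w1 = set(GLOSSES[c1].split()) - STOPWORDS
    w2 = set(GLOSSES[c2].split()) - STOPWORDS
    return len(w1 & w2) / min(len(w1), len(w2))


def relatedness(c1, c2, alpha=0.7):
    """Hypothetical linear mix of taxonomic and gloss evidence; alpha is illustrative only."""
    return alpha * lin_similarity(c1, c2) + (1 - alpha) * gloss_overlap(c1, c2)


print(round(lin_similarity("car", "bicycle"), 3))  # 0.472
print(round(lin_similarity("car", "guitar"), 3))   # 0.138
print(gloss_overlap("car", "bicycle"))             # 0.5
print(round(relatedness("car", "bicycle"), 3))     # 0.48
```

On this toy taxonomy, "car" and "bicycle" share the fairly specific subsumer "vehicle", so they score higher than "car" and "guitar", whose most informative common subsumer is the more general "artifact". A real implementation would read the taxonomy and glosses from WordNet (for example through NLTK's WordNet interface) rather than hard-coding them.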

Keywords

Semantic similarity, Semantic relatedness, WordNet, Information content, Gloss

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. MIRACL Laboratory, Sfax University, Sfax, Tunisia
