Abstract
Semantic similarity assessment between concepts is an important task in many language-related applications. Many approaches that assess concept similarity using a single knowledge source have been proposed. In this paper, we identify some limitations of the existing similarity measures. To address them, we carry out an extensive study of semantic similarity between concepts and, from it, present a unified framework for semantic similarity computation. Instantiating the framework yields generic and flexible semantic similarity measures. In particular, by introducing multiple knowledge sources, we obtain new similarity measures for cases that existing methods cannot handle. An evaluation on eight benchmarks, three widely used ones (M&C, R&G, and WordSim-353) and five that we developed ourselves (Jiang-1, Jiang-2, Jiang-3, Jiang-4, and Jiang-5), supports these intuitions with respect to human judgements. Overall, some of the methods proposed in this paper correlate well with human judgements (under both Pearson and Spearman correlation) and are effective ways of determining semantic similarity between concepts.
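The evaluation criterion named in the abstract, correlation between computed similarity scores and human judgements, can be illustrated with a minimal sketch. The function names and the toy ratings below are hypothetical and chosen only for illustration; they are not taken from the paper or from any of its benchmarks. Pearson correlation measures linear agreement between the two score lists, while Spearman correlation applies the same formula to their ranks (with ties given their average rank).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: human ratings and a measure's scores for five word pairs.
human_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
measure_scores = [1.0, 4.0, 9.0, 16.0, 25.0]

pearson_r = pearson(human_ratings, measure_scores)
spearman_rho = spearman(human_ratings, measure_scores)
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

In this toy case the measure orders the pairs exactly as the human ratings do, so the rank correlation is perfect even though the relationship is not linear; this is one reason the two coefficients are usually reported together.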
Acknowledgements
The authors would like to thank the anonymous referees for their valuable comments and suggestions, which greatly improved the exposition of the paper. The work described in this paper is supported by the National Natural Science Foundation of China under Grant Nos. 61772210 and U1911201; the Guangdong Province Universities Pearl River Scholar Funded Scheme (2018); and the Project of Science and Technology in Guangzhou in China under Grant Nos. 201807010043 and 202007040006. Thanks also go to my students Rong Qu, Yongyi Fang, and Yudong Liu for their discussions, programming, and experiments.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Jiang, Y. A unified framework for semantic similarity computation of concepts. Multimed Tools Appl 80, 32335–32378 (2021). https://doi.org/10.1007/s11042-021-10966-1
DOI: https://doi.org/10.1007/s11042-021-10966-1