Skip to main content

Advertisement

Log in

A unified framework for semantic similarity computation of concepts

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

Semantic similarity assessment between concepts is an important task in many language related applications. In the past, many approaches to assess similarity of concepts have been proposed by using one knowledge source. In this paper, some limitations of the existing similarity measures are identified. To tackle these problems, we propose an extensive study for semantic similarity of concepts from which a unified framework for semantic similarity computation is presented. Based on our framework, we give some generic and flexible approaches to semantic similarity measures resulting from instantiations of the framework. In particular, we obtain some new approaches to similarity measures that existing methods cannot deal with by introducing multiple knowledge sources. The evaluation based on eight benchmarks, three widely used benchmarks (i.e., M&C, R&G, and WordSim-353 benchmarks) and five benchmarks developed in ourselves (i.e, Jiang-1, Jiang-2, Jiang-3, Jiang-4, and Jiang-5 benchmarks), sustains the intuitions with respect to human judgements. Overall, some methods proposed in this paper have a good human correlation (Pearson correlation with human judgments and Spearman correlation with human judgments) and constitute some effective ways of determining semantic similarity between concepts.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Similar content being viewed by others

Notes

  1. https://en.wikipedia.org/

  2. http://wordnet.princeton.edu/

  3. https://www.nlm.nih.gov/mesh/

  4. http://www.disease-ontology.org/

  5. http://human-phenotype-ontology.github.io/

  6. http://www.geneontology.org/

  7. http://obi-ontology.org/

References

  1. Abid A, Rouached M, Messai N (2020) Semantic web service composition using semantic similarity measures and formal concept analysis. Multimed Tools Appl 79:6569–6597

    Article  Google Scholar 

  2. Agirre E, Alfonseca E, Hall K, Kravalova J, Pasca M, Soroa A (2009) A study on similarity and relatedness using distributional and WordNet-based approaches. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Stroudsburg, pp 19–27

  3. Alonso I, Contreras D (2016) Evaluation of semantic similarity metrics applied to the automatic retrieval of medical documents: an UMLS approach. Expert Syst Appl 44:386–399

    Article  Google Scholar 

  4. Aouicha MB, Taieb MAH (2016) Computing semantic similarity between biomedical concepts using new information content approach. J Biomed Inform 59:258–275

    Article  Google Scholar 

  5. Aouicha MB, Taieb MAH, Hamadou AB (2016) Taxonomy-based information content and wordnet-wiktionary-wikipedia glosses for semantic relatedness. Appl Intell 45(2):475–511

    Article  Google Scholar 

  6. Baker T, Lamb D, Taleb-Bendiab A, Al-Jumeily D (2010) Facilitating semantic adaptation of web services at runtime using a meta-data layer. In: Proceedings of IEEE 2010 third international conference on Developments in eSystems Engineering (DESE 2010), IEEE, New York, pp 231–236

  7. Bandrowski A, Brinkman R, Brochhausen M, Brush MH, Bug B, Chibucos MC, Clancy K, Courtot M, Derom D, Dumontier M, Fan L, Fostel J, Fragoso G, Gibson F, Gonzalez-Beltran A, Haendel MA, He Y, Heiskanen M, Hernandez-Boussard T, Jensen M, Lin Y, Lister AL, Lord P, Malone J, Manduchi E, McGee M, Morrison N, Overton JA, Parkinson H, Peters B, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Schober D, Smith B, Soldatova LN, Stoeckert CJ, Taylor CF, Torniai C, Turner JA, Vita R, Whetzel PL, Zheng J (2016) The ontology for biomedical investigations. PLoS One 11(4):e0154556

    Article  Google Scholar 

  8. Batet M, Sanchez D, Valls A, Gibert K (2013) Semantic similarity estimation from multiple ontologies. Appl Intell 38(1):29–44

    Article  Google Scholar 

  9. Bekhet S, Ahmed A (2020) Evaluation of similarity measures for video retrieval. Multimed Tools Appl 79:6265–6278

    Article  Google Scholar 

  10. Bizer C, Heath T, Berners-Lee T (2009) Linked data - the story so far. Int J Semant Web Inf Syst 5(3):1–22

    Article  Google Scholar 

  11. Bizer C, Lehmann J, Kobilarov G, Auer S, Becker C, Cyganiak R, Hellmann S (2009) DBpedia - a crystallization point for the web of data. J Web Semant 7(3):154–165

    Article  Google Scholar 

  12. Budanitsky A, Hirst G (2006) Evaluating WordNet-based measures of lexical semantic relatedness. Comput Linguist 32(1):13–47

    Article  MATH  Google Scholar 

  13. Capuano A, Rinaldi AM, Russo C (2020) An ontology-driven multimedia focused crawler based on linked open data and deep learning techniques. Multimed Tools Appl 79:7577–7598

    Article  Google Scholar 

  14. Church KW, Hanks P (1990) Word association norms, mutual information, and lexicography. Comput Linguist 16(1):22–29

    Google Scholar 

  15. Cilibrasi RL, Vitanyi PMB (2007) The Google similarity distance. IEEE Trans Knowl Data Eng 19(3):370–383

    Article  Google Scholar 

  16. Coletti MH, Bleich HL (2001) Medical subject headings used to search the biomedical literature. J Am Med Inform Assoc 8(4):317–323

    Article  Google Scholar 

  17. Couto FM, Silva MJ, Coutinho PM (2007) Measuring semantic similarity between gene ontology terms. Data Knowl Eng 61(1):137–152

    Article  Google Scholar 

  18. Cross V, Yu X, Hu X (2013) Unifying ontological similarity measures: a theoretical and empirical investigation. Int J Approx Reason 54(7):861–875

    Article  Google Scholar 

  19. Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407

    Article  Google Scholar 

  20. Fellbaum C (1998) WordNet: an electronic lexical database. Academic Press, Cambridge, MA

    Book  MATH  Google Scholar 

  21. Ferreira R, Lins RD, Simske SJ, Freitas F, Riss M (2016) Assessing sentence similarity through lexical, syntactic and semantic analysis. Comput Speech Lang 39:1–28

    Article  Google Scholar 

  22. Finkelstein L, Gabrilovich E, Matias Y, Rivlin E, Solan Z, Wolfman G, Ruppin E (2002) Placing search in context: the concept revisited. ACM Trans Inf Syst 20(1):116–131

    Article  Google Scholar 

  23. Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: Proceedings of the 20th International Joint Conference on Artificial intelligence (IJCAI 2007). Morgan Kaufmann Publishers, San Francisco, CA, USA, pp 1606–1611

    Google Scholar 

  24. Gao JB, Zhang BW, Chen XH (2015) A WordNet-based semantic similarity measurement combining edge-counting and information content theory. Eng Appl Artif Intell 39:80–88

    Article  Google Scholar 

  25. Garla VN, Brandt C (2012) Semantic similarity in the biomedical domain: an evaluation across knowledge sources. BioMed Central Bioinform 13(1):261–273

    Google Scholar 

  26. Gene Ontology Consortium (2004) The gene ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258–D261

    Article  Google Scholar 

  27. Goldstone RL (1994) The role of similarity in categorization: providing a groundwork. Cognition 52(2):125–157

    Article  MathSciNet  Google Scholar 

  28. Hadj Taieb MA, Aouicha MB, Hamadou AB (2014) A new semantic relatedness measurement using WordNet features. Knowl Inf Syst 41(2):467–497

    Article  Google Scholar 

  29. Hadj Taieb MA, Aouicha MB, Hamadou AB (2014) Ontology-based approach for measuring semantic similarity. Eng Appl Artif Intell 36:238–261

    Article  Google Scholar 

  30. Halavais A, Lackaff D (2008) An analysis of topical coverage of Wikipedia. J Comput-Mediat Commun 13(2):429–440

    Article  Google Scholar 

  31. Hamedani MR, Kim SW, Kim DJ (2016) SimCC: a novel method to consider both content and citations for computing similarity of scientific papers. Inf Sci 334-335:273–292

    Article  Google Scholar 

  32. Harispe S, Sanchez D, Ranwez S, Janaqi S, Montmain J (2014) A framework for unifying ontology-based semantic similarity measures: a study in the biomedical domain. J Biomed Inform 48:38–53

    Article  Google Scholar 

  33. Hirst G, St-Onge D (1998) Lexical chains as representations of context for the detection and correction of malapropisms. WordNet: An Electronic Lexical Database, The MIT Press, Cambridge, MA, pp 305–332

    Google Scholar 

  34. Jiang Y, Bai W, Zhang X, Hu J (2017) Wikipedia-based information content and semantic similarity computation. Inf Process Manag 53(1):248–265

    Article  Google Scholar 

  35. Jiang JJ, Conrath DW (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of the 10th international conference on research on computational linguistics, The Association for Computational Linguistics and Chinese Language Processing (ACLCLP), Taipei, pp 19–33

  36. Jiang Y, Yang M, Qu R (2019) Semantic similarity measures for formal concept analysis using linked data and WordNet. Multimed Tools Appl 78:19807–19837

    Article  Google Scholar 

  37. Jiang Y, Zhang X, Tang Y, Nie R (2015) Feature-based approaches to semantic similarity assessment of concepts using Wikipedia. Inf Process Manag 51(3):215–234

    Article  Google Scholar 

  38. Lastra-Diaz JJ, Garcia-Serrano A (2015) A novel family of IC-based similarity measures with a detailed experimental survey on WordNet. Eng Appl Artif Intell 46:140–153

    Article  Google Scholar 

  39. Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. WordNet: An Electronic Lexical Database, The MIT Press, Cambridge, MA, pp 265–283

    Google Scholar 

  40. Lee D, Cornet R, Lau F, de Keizer N (2013) A survey of SNOMED CT implementations. J Biomed Inform 46(1):87–96

    Article  Google Scholar 

  41. Li Y, Bandar ZA, McLean D (2003) An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans Knowl Data Eng 15(4):871–882

    Article  Google Scholar 

  42. Lin D (1998) An information-theoretic definition of similarity. In: Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998). Morgan Kaufmann Publishers, San Francisco, CA, USA, pp 296–304

    Google Scholar 

  43. Liu H, Bao H, Xu D (2012) Concept vector for semantic similarity and relatedness based on WordNet structure. J Syst Softw 85(2):370–381

    Article  Google Scholar 

  44. Liu YH, Wacholder N (2017) Evaluating the impact of MeSH (medical subject headings) terms on different types of searchers. Inf Process Manag 53(4):851–870

    Article  Google Scholar 

  45. Maarek YS, Berry DM, Kaiser GE (1991) An information retrieval approach for automatically constructing software libraries. IEEE Trans Softw Eng 17(8):800–813

    Article  Google Scholar 

  46. Maguitman AG, Menczer F, Erdinc F, Roinestad H, Vespignani A (2006) Algorithmic computation and approximation of semantic similarity. World Wide Web 9(4):431–456

    Article  Google Scholar 

  47. Martinez-Gil J (2014) An overview of textual semantic similarity measures based on web intelligence. Artif Intell Rev 42(4):935–943

    Article  Google Scholar 

  48. Medelyan O, Milne D, Legg C, Witten IH (2009) Mining meaning from Wikipedia. Int J Hum Comput Stud 67(9):716–754

    Article  Google Scholar 

  49. Meng L, Gu J, Zhou Z (2012) A new model of information content based on concept’s topology for measuring semantic similarity in WordNet. Int J Grid Distribute Comput 5(3):81–93

    Google Scholar 

  50. Meng L, Huang R, Gu J (2014) Measuring semantic similarity of word pairs using path and information content. Int J Future Generation Commun Netw 7(3):183–194

    Article  Google Scholar 

  51. Meymandpour R, Davis JG (2016) A semantic similarity measure for linked data: an information content-based approach. Knowl-Based Syst 109:276–293

    Article  Google Scholar 

  52. Miller GA, Charles WG (1991) Contextual correlates of semantic similarity. Lang Cogn Process 6(1):1–28

    Article  Google Scholar 

  53. Nosofsky RM (1992) Similarity scaling and cognitive process models. Annu Rev Psychol 43(1):25–53

    Article  Google Scholar 

  54. Oliva J, Serrano JI, del Castillo MD, Iglesias A (2011) SyMSS: a syntax-based measure for short-text semantic similarity. Data Knowl Eng 70(4):390–405

    Article  Google Scholar 

  55. Ou W, Xuan R, Gou J, Zhou Q, Cao Y (2020) Semantic consistent adversarial cross-modal retrieval exploiting semantic similarity. Multimed Tools Appl 79:14733–14750

    Article  Google Scholar 

  56. Pedersen T, Pakhomov SVS, Patwardhan S, Chute CG (2007) Measures of semantic similarity and relatedness in the biomedical domain. J Biomed Inform 40(3):288–299

    Article  Google Scholar 

  57. Pellegrin L, Escalante HJ, Montes-y-Gomez M, Gonzalez FA (2019) Exploiting label semantic relatedness for unsupervised image annotation with large free vocabularies. Multimed Tools Appl 78:19641–19662

    Article  Google Scholar 

  58. Petrakis EGM, Varelas G, Hliaoutakis A, Raftopoulou P (2006) X-similarity: computing semantic similarity between concepts from different ontologies. J Digit Inf Manag 4(4):233–237

    Google Scholar 

  59. Pilehvar MT, Navigli R (2015) From senses to texts: an all-in-one graph-based approach for measuring semantic similarity. Artif Intell 228:95–128

    Article  MathSciNet  MATH  Google Scholar 

  60. Pirro G (2009) A semantic similarity metric combining features and intrinsic information content. Data Knowl Eng 68(11):1289–1308

    Article  Google Scholar 

  61. Ponzetto SP, Strube M (2007) Knowledge derived from Wikipedia for computing semantic relatedness. J Artif Intell Res 30:181–212

    Article  MATH  Google Scholar 

  62. Rada R, Mili H, Bicknell M, Blettner E (1989) Development and application of a metric on semantic nets. IEEE Trans Syst Man Cybern 19(1):17–30

    Article  Google Scholar 

  63. Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of International Joint Conference for Artificial Intelligence (IJCAI 1995). Morgan Kaufmann Publishers, San Francisco, CA, USA, pp 448–453

    Google Scholar 

  64. Resnik P (1999) Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J Artif Intell Res 11:95–130

    Article  MATH  Google Scholar 

  65. Rodriguez MA, Egenhofer MJ (2003) Determining semantic similarity among entity classes from different ontologies. IEEE Trans Knowl Data Eng 15(2):442–456

    Article  Google Scholar 

  66. Rubenstein H, Goodenough J (1965) Contextual correlates of synonymy. Commun ACM 8(10):627–633

    Article  Google Scholar 

  67. Safyan M, Qayyum ZU, Sarwar S, Garcia-Castro R, Ahmed M (2019) Ontology-driven semantic unified modelling for concurrent activity recognition (OSCAR). Multimed Tools Appl 78:2073–2104

    Article  Google Scholar 

  68. Samih H, Rady S, Gharib TF (2020) Enhancing image retrieval for complex queries using external knowledge sources. Multimed Tools Appl 79:27633–27657

    Article  Google Scholar 

  69. Sanchez D, Batet M (2011) Semantic similarity estimation in the biomedical domain: an ontology-based information-theoretic perspective. J Biomed Inform 44(5):749–759

    Article  Google Scholar 

  70. Sanchez D, Batet M (2012) A new model to compute the information content of concepts from taxonomic knowledge. Int J Semant Web Inf Syst 8(2):34–50

    Article  Google Scholar 

  71. Sanchez D, Batet M (2013) A semantic similarity method based on information content exploiting multiple ontologies. Expert Syst Appl 40(4):1393–1399

    Article  Google Scholar 

  72. Sanchez D, Batet M, Isern D (2011) Ontology-based information content computation. Knowl-Based Syst 24(2):297–303

    Article  Google Scholar 

  73. Sanchez D, Batet M, Isern D, Valls A (2012) Ontology-based semantic similarity: a new feature-based approach. Expert Syst Appl 39(9):7718–7728

    Article  Google Scholar 

  74. Sarwar S, Qayyum ZU, Garcia-Castro R, Safyan M, Munir RF (2019) Ontology based E-learning framework: a personalized, adaptive and context aware model. Multimed Tools Appl 78:34745–34771

    Article  Google Scholar 

  75. Seco N, Veale T, Hayes J (2004) An intrinsic information content metric for semantic similarity in WordNet. In: Proceedings of the 16th European Conference on Artificial Intelligence (ECAI), IOS Press, Amsterdam, pp 1089–1094

  76. Shepard RN (1962) The analysis of proximities: multidimensional scaling with an unknown distance function. I Psychometrika 27(2):125–140

    Article  MathSciNet  MATH  Google Scholar 

  77. Staab S, Studer R (2009) Handbook on Ontologies. Springer, Second Edition

    Book  MATH  Google Scholar 

  78. Strube M, Ponzetto SP (2006) WikiRelate! Computing semantic relatedness using Wikipedia. In: Proceedings of the 21st national conference on artificial intelligence (AAAI 2006), AAAI Press, Cambridge, pp 1419-1424

  79. Suchanek FM, Kasneci G, Weikum G (2008) YAGO: a large ontology from Wikipedia and WordNet. J Web Semant 6(3):203–217

    Article  Google Scholar 

  80. Tversky A (1977) Features of similarity. Psychol Rev 84(4):327–352

    Article  Google Scholar 

  81. Wolk K, Wolk A (2017) Machine enhanced translation of the human phenotype ontology project. Procedia Comput Sci 121:11–18

    Article  Google Scholar 

  82. Wu Z, Palmer M (1994) Verb semantics and lexical selection. In: Proceedings of the 32nd annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, pp 133–138

    Chapter  Google Scholar 

  83. Zhou Z, Wang Y, Gu J (2008) A new model of information content for semantic similarity in WordNet. In: Proceedings of second international conference on Future Generation Communication and Networking Symposia (FGCNS 2008), IEEE, New York, pp 85–89

Download references

Acknowledgements

The authors would like to thank the anonymous referees for their valuable comments and suggestions which greatly improved the exposition of the paper. The works described in this paper are supported by The National Natural Science Foundation of China under Grant Nos. 61772210 and U1911201; Guangdong Province Universities Pearl River Scholar Funded Scheme (2018); The Project of Science and Technology in Guangzhou in China under Grant Nos. 201807010043 and 202007040006. Thanks for my students Rong Qu, Yongyi Fang, and Yudong Liu for their discussion, programming, and experiments.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yuncheng Jiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Jiang, Y. A unified framework for semantic similarity computation of concepts. Multimed Tools Appl 80, 32335–32378 (2021). https://doi.org/10.1007/s11042-021-10966-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-021-10966-1

Keywords

Navigation