Abstract
Computing semantic similarity and relatedness has attracted increasing attention from researchers. Most previous approaches, including edge-based and information content-based methods, rely on a single semantic relation in WordNet, typically the “is-a” relation. Relying solely on “is-a” limits both the semantic variety and the adequacy of the computation, which may create a performance ceiling: the computed results for some word pairs are too small and deviate significantly from human judgments. To address this problem, we make four contributions: (1) Drawing on genetics theory, we introduce the notion of the nearest common descendant to supplement the commonalities between concepts. (2) We design targeted methods for the different incomplete semantic relations, so that each relation participates in similarity and relatedness computation in the manner best suited to it. (3) We combine the incomplete semantic relations similar-to and antonymy to address the challenge of measuring adjective and adverb similarity/relatedness in WordNet. (4) We propose a targeted independent computation and largest contribution aggregation method to break through the performance ceiling of measurements based on the “is-a” relation alone. We evaluate the proposed model on seven widely used datasets. The evaluations indicate that our method significantly improves on existing methods based on the “is-a” relation alone, raising their best Pearson coefficient with human judgments on both MC30 and RG65 to 0.9. As the semantic relations in WordNet are developed and enriched, the proposed model can be expected to play a more prominent role.
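As a rough illustration of two ideas named in the abstract, the sketch below shows one possible reading of "largest contribution aggregation" (taking the maximum of scores computed independently per semantic relation) together with the standard Pearson coefficient used to compare computed scores with human judgments. The abstract does not give the model's formulas, so the function names, the relation labels, and the choice of a plain maximum are assumptions made for illustration and not the authors' implementation.

```python
from math import sqrt
from typing import Dict, Sequence


def aggregate_largest(scores: Dict[str, float]) -> float:
    """Return the largest of the per-relation scores.

    Assumption: "largest contribution aggregation" is read here as a plain
    maximum over scores computed independently for each semantic relation.
    """
    if not scores:
        return 0.0
    return max(scores.values())


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-relation scores for one word pair (made-up numbers).
pair_scores = {"is_a": 0.2, "antonymy": 0.7, "similar_to": 0.4}
combined = aggregate_largest(pair_scores)
```

In an evaluation of this kind, `pearson` would be applied to the list of combined scores and the corresponding human ratings from a dataset such as MC30 or RG65.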
Acknowledgements
This work has been supported by the Natural Science Foundation of Guangxi of China under the contract number 2018GXNSFAA138087, the National Natural Science Foundation of China under the contract numbers 61462010 and 61363036, the Innovation Project of Guangxi Graduate Education under the contract number XYCSZ2019064 and Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
About this article
Cite this article
Zhu, X., Yang, X., Huang, Y. et al. Measuring similarity and relatedness using multiple semantic relations in WordNet. Knowl Inf Syst 62, 1539–1569 (2020). https://doi.org/10.1007/s10115-019-01387-6
DOI: https://doi.org/10.1007/s10115-019-01387-6