
Knowledge and Information Systems, Volume 55, Issue 1, pp 79–111

An efficient path computing model for measuring semantic similarity using edge and density

  • Xinhua Zhu
  • Fei Li
  • Hongchao Chen
  • Qi Peng

Regular Paper

Abstract

The shortest path between two concepts in a taxonomic ontology is commonly used to represent the semantic distance between them in edge-based semantic similarity measures. In the past, edge counting, which is simple, intuitive, and computationally cheap, was considered the default method for path computation. However, a large lexical taxonomy such as WordNet has irregular link densities between concepts because of its broad domain, and edge-counting-based path computation cannot handle this non-uniformity. In this paper, we argue that path computation can be separated from edge-based similarity measures to form general path computing models. To address the non-uniform density of concepts in a large taxonomic ontology, we propose a new path computing model that compensates for the local area density of concepts, defined as the number of direct hyponyms of the subsumers of the concepts on the shortest path. Following information theory, the model treats the local area density of concepts as an extension of the edge-counting-based path. Because it is a general path computing model, it can be applied in various edge-based similarity approaches. Experimental results show that the proposed path model raises the average optimal correlation between edge-based measures and human judgments from less than 0.79 to more than 0.86 on the Miller and Charles benchmark for WordNet, and from less than 0.75 to more than 0.82 on the Pedersen et al. benchmark for SNOMED-CT (averaged over the Physician and Coder ratings). It is also considerably more efficient than information content computation in a dynamic ontology, making edge-based similarity measurement a method that combines high performance with high efficiency.
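To make the contrast between plain edge counting and density compensation concrete, the following is a minimal Python sketch that finds the shortest path between two concepts through their lowest common subsumer, then measures it two ways: by counting edges, and by weighting each edge with the local density (number of direct hyponyms) of the subsumer it leads into. Everything here is an illustrative assumption: the toy taxonomy, the function names, and especially the log2(1 + d) edge weight are hypothetical. The abstract does not state the authors' formula, so this is not their model.

```python
import math

# HYPOTHETICAL toy taxonomy (child -> parent, single inheritance).
# Illustrative data only; not an excerpt of WordNet or SNOMED-CT.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "horse": "animal",
    "cow": "animal",
    "tree": "plant",
}


def ancestors(concept):
    """Return [concept, parent, grandparent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def direct_hyponym_counts():
    """Map each subsumer to its number of direct hyponyms (local density)."""
    counts = {}
    for parent in PARENT.values():
        counts[parent] = counts.get(parent, 0) + 1
    return counts


def path_edges(c1, c2):
    """Return the (child, parent) edges on the shortest path between c1 and
    c2, which in a single-inheritance tree runs through their lowest
    common subsumer (LCS)."""
    up1, up2 = ancestors(c1), ancestors(c2)
    on_up2 = set(up2)
    lcs = next((node for node in up1 if node in on_up2), None)
    if lcs is None:
        raise ValueError(f"{c1!r} and {c2!r} share no subsumer")
    edges = []
    for chain in (up1, up2):
        stop = chain.index(lcs)
        # Pair each node with its parent, up to (and including) the LCS.
        edges.extend(zip(chain[:stop], chain[1:stop + 1]))
    return edges


def edge_count_path(c1, c2):
    """Classic edge counting: every edge has length 1."""
    return len(path_edges(c1, c2))


def density_weighted_path(c1, c2):
    """HYPOTHETICAL density compensation: an edge into a subsumer with d
    direct hyponyms costs log2(1 + d). This equals 1 (plain edge counting)
    when d == 1 and grows in denser regions. Not the paper's formula."""
    density = direct_hyponym_counts()
    return sum(math.log2(1 + density[parent]) for _, parent in path_edges(c1, c2))


if __name__ == "__main__":
    for a, b in [("dog", "cat"), ("dog", "tree")]:
        print(a, b, edge_count_path(a, b), round(density_weighted_path(a, b), 3))
```

Edge counting gives 2 for dog/cat and 4 for dog/tree, treating every link as equal. Under the assumed weighting, the two edges into the dense `animal` node (four direct hyponyms) each cost about 2.32, while the edge into the single-child `plant` node keeps a cost of 1. Whether density should lengthen or shorten a path, and by how much, is a modeling choice; the sketch only shows where a local-density term plugs into an otherwise ordinary shortest-path computation.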

Keywords

Path computing model, Semantic similarity, Local density, WordNet, SNOMED-CT

Notes

Acknowledgements

This work has been supported by the National Natural Science Foundation of China under contract numbers 61363036 and 61462010, and by the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

References

  1. Srihari RK, Zhang ZF, Rao A (2000) Intelligent indexing and semantic retrieval of multimodal documents. Inf Retr 2(2–3):245–275
  2. Patwardhan S, Banerjee S, Pedersen T (2003) Using measures of semantic relatedness for word sense disambiguation. In: Proceedings of computational linguistics and intelligent text processing, pp 241–257
  3. Sánchez D, Moreno A (2008) Learning non-taxonomic relationships from web documents for domain ontology construction. Data Knowl Eng 64(3):600–623
  4. Budanitsky A, Hirst G (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In: Workshop on WordNet and other lexical resources, second meeting of the North American chapter of the association for computational linguistics, vol 2, issue 12, pp 29–34
  5. Liu X, Zhou Y, Zheng R (2007) Measuring semantic similarity in WordNet. In: Proceedings of machine learning and cybernetics, pp 3431–3435
  6. Kozima H (1994) Computing lexical cohesion as a tool for text analysis. Ph.D. thesis, Computer Science and Information Mathematics, Graduate School of Electro-Communications, University of Electro-Communications
  7. Wu Z, Palmer M (1994) Verb semantics and lexical selection. In: Proceedings of the 32nd annual meeting of the association for computational linguistics, pp 133–138
  8. Seco N, Veale T, Hayes J (2004) An intrinsic information content metric for semantic similarity in WordNet. In: Proceedings of the European conference on artificial intelligence, pp 1089–1090
  9. Rodríguez MA, Egenhofer MJ (2003) Determining semantic similarity among entity classes from different ontologies. IEEE Trans Knowl Data Eng 15(2):442–456
  10. Zhou Z, Wang Y, Gu J (2008) New model of semantic similarity measuring in WordNet. In: Proceedings of intelligent system and knowledge engineering, pp 256–261
  11. Hao D, Zuo WL, Peng T (2011) An approach for calculating semantic similarity between words using WordNet. In: Proceedings of digital manufacturing and automation, pp 177–180
  12. Jiang JJ, Conrath DW (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of research in computational linguistics, pp 19–33
  13. Li Y, Bandar Z, McLean D (2003) An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans Knowl Data Eng 15(4):871–882
  14. Hadj Taieb MA, Aouicha MB, Hamadou AB (2014) Ontology-based approach for measuring semantic similarity. Eng Appl Artif Intell 36(8):238–261
  15. Rada R, Mili H, Bicknell E et al (1989) Development and application of a metric on semantic nets. IEEE Trans Syst Man Cybern 19(1):17–30
  16. Resnik P (1999) Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J Artif Intell Res 11(4):95–130
  17. Borgida A, Walsh T, Hirsh H (2005) Towards measuring similarity in description logics. In: 2005 international workshop on description logics, pp 286–294
  18. d'Amato C (2007) Similarity-based learning methods for the semantic web. Ph.D. thesis, Department of Computer Science, University of Bari, Italy
  19. d'Amato C, Staab S, Fanizzi N (2008) On the influence of description logics ontologies on conceptual similarity. In: Proceedings of knowledge engineering: practice and patterns, pp 48–63
  20. Ramon J (2002) Clustering and instance based learning in first order logic. Ph.D. thesis, Department of Computer Science, Leuven, Belgium
  21. Hirst G, St-Onge D (1998) Lexical chains as representations of context for the detection and correction of malapropisms. MIT Press, Cambridge, pp 305–322
  22. Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. MIT Press, Cambridge, pp 265–283
  23. Lin D (1998) An information-theoretic definition of similarity. In: Proceedings of machine learning, pp 296–304
  24. Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of artificial intelligence, pp 448–453
  25. Meng L, Gu J, Zhou Z (2012) A new model of information content based on concept’s topology for measuring semantic similarity in WordNet. Int J Grid Distrib Comput 5(3):81–94
  26. Devitt A, Vogel C (2004) The topology of WordNet: some metrics. In: Proceedings of the global WordNet conference, pp 106–111
  27. Spackman KA (2004) SNOMED CT milestones: endorsements are added to already impressive standards credentials. Healthc Inform Bus Mag Inf Commun Syst 21(9):54–56
  28. Harispe S, Ranwez S, Janaqi S et al (2015) Semantic similarity from natural language and ontology analysis. Synth Lect Hum Lang Technol 8(1):254
  29. Miller GA, Charles WG (1991) Contextual correlates of semantic similarity. Lang Cognit Process 6(1):1–28
  30. Kipper KS (2006) VerbNet: a broad-coverage comprehensive verb lexicon. http://repository.upenn.edu/dissertations/AAI3179808
  31. Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceedings of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics, pp 86–90
  32. Richardson SD, Dolan WB, Vanderwende L (1998) MindNet: acquiring and structuring semantic information from text. In: Proceedings of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics, pp 1098–1102
  33. Hadj Taieb MA, Aouicha MB, Hamadou AB (2014) A new semantic relatedness measurement using WordNet features. Knowl Inf Syst 41(2):467–497
  34. Yang D, Powers D (2006) Verb similarity on the taxonomy of WordNet. In: Proceedings of the global WordNet conference, pp 177–178
  35. Sussna M (1993) Word sense disambiguation for free-text indexing using a massive semantic network. In: Proceedings of information and knowledge management, pp 67–74
  36. Sánchez D, Batet M (2011) Ontology-based information content computation. Knowl Based Syst 24(2):297–303
  37. Patwardhan S, Pedersen T (2006) Using WordNet-based context vectors to estimate the semantic relatedness of concepts. In: Proceedings of EACL 2006 workshop on making sense of sense: bringing computational linguistics and psycholinguistics together, pp 1–8
  38. Petrakis E, Varelas G, Hliaoutakis A et al (2006) X-similarity: computing semantic similarity between concepts from different ontologies. J Digit Inf Manag 4(4):233–237
  39. Rubenstein H, Goodenough JB (1965) Contextual correlates of synonymy. Commun ACM 8(10):627–633
  40. Pedersen T, Pakhomov SVS, Patwardhan S, Chute CG (2007) Measures of semantic similarity and relatedness in the biomedical domain. J Biomed Inform 40(3):288–299
  41. Princeton University (2014) The MIT Java Wordnet interface. http://projects.csail.mit.edu/jwi/

Copyright information

© Springer-Verlag London Ltd. 2017

Authors and Affiliations

  1. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China
