Exploiting Taxonomical Knowledge to Compute Semantic Similarity: An Evaluation in the Biomedical Domain

  • Montserrat Batet
  • David Sanchez
  • Aida Valls
  • Karina Gibert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6096)


Determining the semantic similarity between concept pairs is an important task in many language-related problems. In the biomedical field, several approaches have been proposed that assess the semantic similarity between concepts by exploiting the knowledge provided by a domain ontology. In this paper, some of those approaches are studied, exploiting the taxonomical structure of a biomedical ontology (SNOMED-CT). Then, a new measure is presented, based on computing the amount of overlapping and non-overlapping taxonomical knowledge between concept pairs. The performance of our proposal is compared against related ones using a set of standard benchmarks of manually ranked terms. The correlation between the results obtained by the computerized approaches and the manual ranking shows that our proposal clearly outperforms previous works.
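
The idea of comparing overlapping and non-overlapping taxonomical knowledge can be illustrated with a minimal sketch. The abstract does not state the formula, so everything below is an assumption made for illustration: the toy taxonomy `PARENTS` is hypothetical (it is not taken from SNOMED-CT), the helper names are invented, and the plain ratio of shared to total ancestors stands in for whatever expression the paper actually defines.

```python
# Hypothetical toy taxonomy (concept -> direct parents); NOT SNOMED-CT data.
PARENTS = {
    "disease": [],
    "heart disease": ["disease"],
    "lung disease": ["disease"],
    "myocardial infarction": ["heart disease"],
    "angina": ["heart disease"],
    "pneumonia": ["lung disease"],
}


def ancestors(concept, parents=PARENTS):
    """Return the concept itself plus all of its taxonomical ancestors."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def overlap_counts(c1, c2, parents=PARENTS):
    """Count shared and non-shared ancestors of two concepts."""
    a1, a2 = ancestors(c1, parents), ancestors(c2, parents)
    shared = len(a1 & a2)      # overlapping taxonomical knowledge
    non_shared = len(a1 ^ a2)  # knowledge held by only one of the concepts
    return shared, non_shared


def similarity(c1, c2, parents=PARENTS):
    """Illustrative score in [0, 1]: shared ancestors over all ancestors.

    This ratio is an assumption; the paper's own formula may differ.
    """
    shared, non_shared = overlap_counts(c1, c2, parents)
    return shared / (shared + non_shared)


if __name__ == "__main__":
    for pair in [("myocardial infarction", "angina"),
                 ("myocardial infarction", "pneumonia")]:
        print(pair, similarity(*pair))
```

In this toy hierarchy, the two heart conditions share two of their four distinct ancestors, while a heart condition and a lung condition share only the root, so the first pair scores higher. Counting ancestor sets rather than a single shortest path means that, in an ontology with multiple inheritance, every superconcept contributes to the comparison.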


Keywords: Semantic similarity, Ontologies, Biomedicine, Data mining





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Montserrat Batet (1)
  • David Sanchez (1)
  • Aida Valls (1)
  • Karina Gibert (2)
  1. Department of Computer Science and Mathematics, Intelligent Technologies for Advanced Knowledge Acquisition Research Group, Universitat Rovira i Virgili, Tarragona, Spain
  2. Department of Statistics and Operations Research, Knowledge Engineering and Machine Learning Group, Universitat Politècnica de Catalunya, Barcelona, Spain
