MedSim: A Novel Semantic Similarity Measure in Bio-medical Knowledge Graphs

  • Kai Lei
  • Kaiqi Yuan
  • Qiang Zhang
  • Ying Shen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11061)


We present MedSim, a novel semantic SIMilarity method based on well-established public bio-MEDical knowledge graphs (KGs) and a large-scale corpus, to study the therapeutic substitution of antibiotics. Beyond the hierarchy and corpus of the KGs, MedSim further captures medicine characteristics by constructing multi-dimensional, medicine-specific feature vectors. A dataset of 528 antibiotic pairs scored by doctors is used for evaluation, and MedSim achieves a statistically significant improvement over other semantic similarity methods. Furthermore, promising applications of MedSim in drug substitution and drug abuse prevention are presented in a case study.
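The abstract does not say how two medicine feature vectors are compared. As a minimal sketch only, assuming cosine similarity as a stand-in comparison, the idea of scoring a pair of medicines from their feature vectors might look like the following. The function name, the vector layout, and the binary feature values are all hypothetical illustrations, not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical binary feature vectors (illustrative values only):
# each position marks whether a medicine has some characteristic.
drug_a = [1, 0, 1, 1]
drug_b = [1, 0, 1, 0]
drug_c = [0, 1, 0, 0]

sim_ab = cosine_similarity(drug_a, drug_b)  # shares two features with drug_a
sim_ac = cosine_similarity(drug_a, drug_c)  # shares no features with drug_a
```

Under this assumption, a pair that shares more characteristics scores higher, so `sim_ab` exceeds `sim_ac`. A measure such as MedSim would combine feature-level evidence with the hierarchy and corpus signals mentioned above, and that combination is not shown here.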


Semantic similarity, Semantic networks, Bioinformatics



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China
