Measuring the Semantic World – How to Map Meaning to High-Dimensional Entity Clusters in PubMed?

  • Janus Wawrzinek
  • Wolf-Tilo Balke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11279)


The exponential increase of scientific publications in the medical field urgently calls for innovative access paths beyond the limits of term-based search. As an example, the search term “diabetes” returns over 600,000 publications in the medical digital library PubMed. In such cases, the automatic extraction of semantic relations between important entities such as active substances, diseases, and genes can help to reveal entity relationships and thus allow simplified access to the knowledge embedded in digital libraries. For such semantic-relation tasks, distributional embedding models based on neural networks promise considerable progress in terms of accuracy, performance, and scalability. Yet, despite the recent successes of neural networks in this field, questions arise related to their non-deterministic nature: Are the extracted semantic relations meaningful, and do they perhaps even reveal new, previously unknown entity relationships? In this paper, we address this question by measuring the associations between important pharmaceutical entities such as active substances (drugs) and diseases in high-dimensional embedded space. Our investigation shows that although only a few of the contextualized associations directly correlate with spatial distance, they have potential for predicting new associations, which makes the method suitable as a new, literature-based technique for important practical tasks such as drug repurposing.
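To make the idea of measuring associations by spatial distance in an embedded space concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that each drug and disease is already represented by a vector, and it uses cosine similarity as the distance measure, which is a common choice for word embeddings but is an assumption here. The entity names and the three-dimensional toy vectors are hypothetical; real embeddings would be learned from PubMed text and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_diseases(drug_vec, disease_vecs):
    """Rank diseases by similarity to a drug vector, most similar first."""
    scores = {
        name: cosine_similarity(drug_vec, vec)
        for name, vec in disease_vecs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings (assumption: not taken from the paper).
drug = [1.0, 0.0, 0.0]
diseases = {
    "disease_a": [0.9, 0.1, 0.0],   # points in almost the same direction
    "disease_b": [0.0, 1.0, 0.0],   # orthogonal to the drug vector
    "disease_c": [-1.0, 0.0, 0.0],  # points in the opposite direction
}

ranking = rank_diseases(drug, diseases)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a setting like the one described above, highly ranked drug-disease pairs that are not yet recorded as known associations would be the candidates worth inspecting for tasks such as drug repurposing.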


Digital libraries, Information extraction, Neural embeddings



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. IFIS TU-Braunschweig, Brunswick, Germany
