Semantic Facettation in Pharmaceutical Collections Using Deep Learning for Active Substance Contextualization

  • Janus Wawrzinek
  • Wolf-Tilo Balke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10647)


Alternative access paths to literature beyond mere keyword or bibliographic search are a major success factor in today’s digital libraries. Especially in the sciences, users are in dire need of complex knowledge spaces and facettations in which entities such as chemical substances, genes, or mathematical formulae may play a central role. However, even for clear-cut entities, the requirements in terms of contextualized similarities or rankings may differ strongly. In this paper, we show how deep learning techniques applied to scientific corpora lead to a strongly contextualized description of entities. As an application case, we take pharmaceutical entities in the form of small molecules and demonstrate how their learned contexts and profiles reflect their actual use as well as possible new uses, e.g., for drug design or repurposing. Our evaluation shows that the results are comparable to expensive, manually maintained classifications in the field. Since our techniques rely only on deep embeddings of textual documents, our methodology promises to generalize to other use cases as well.
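The abstract describes ranking substances by their learned contexts and checking those rankings against manually maintained classifications. A minimal sketch of that idea follows, assuming entity embeddings have already been learned from a text corpus (for instance by a word2vec-style model). The drug names, vectors, and class labels are hypothetical toy data, and precision at k is an illustrative choice of agreement measure, not necessarily the evaluation used in the paper.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_neighbours(query: str, embeddings: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Rank all other entities by cosine similarity to the query entity; return the top k."""
    q = embeddings[query]
    scored = [(name, cosine(q, vec)) for name, vec in embeddings.items() if name != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def precision_at_k(query: str, embeddings: Dict[str, Vector],
                   classes: Dict[str, Set[str]], k: int) -> float:
    """Fraction of the k nearest neighbours sharing at least one manual class with the query."""
    neighbours = nearest_neighbours(query, embeddings, k)
    hits = sum(1 for name, _ in neighbours if classes[name] & classes[query])
    return hits / k

# Hypothetical toy embeddings: in practice these would be learned from a corpus.
embeddings: Dict[str, Vector] = {
    "drug_a": [0.9, 0.1, 0.0],
    "drug_b": [0.8, 0.2, 0.1],
    "drug_c": [0.1, 0.9, 0.2],
    "drug_d": [0.0, 0.8, 0.3],
}

# Hypothetical manual classification (stand-in for a curated drug taxonomy).
classes: Dict[str, Set[str]] = {
    "drug_a": {"class_x"},
    "drug_b": {"class_x"},
    "drug_c": {"class_y"},
    "drug_d": {"class_y"},
}

if __name__ == "__main__":
    for name, score in nearest_neighbours("drug_a", embeddings, 3):
        print(f"{name}: {score:.3f}")
    print("P@2 for drug_a:", precision_at_k("drug_a", embeddings, classes, 2))
```

Cosine similarity is used here because it ignores vector length and compares only direction, which is the usual convention for comparing word embeddings; any other similarity over the learned vectors could be swapped in.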


Keywords: Digital libraries, Information extraction, Facettation, Deep learning


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. IFIS, TU Braunschweig, Braunschweig, Germany
