Semantic Facettation in Pharmaceutical Collections Using Deep Learning for Active Substance Contextualization
Alternative access paths to literature beyond mere keyword or bibliographic search are a major success factor in today’s digital libraries. Especially in the sciences, users need complex knowledge spaces and facettations in which entities such as chemical substances, genes, or mathematical formulae play a central role. However, even for clear-cut entities, the requirements for contextualized similarities or rankings may differ strongly. In this paper, we show how deep learning techniques applied to scientific corpora lead to a strongly contextualized description of entities. As an application case, we take pharmaceutical entities in the form of small molecules and demonstrate how their learned contexts and profiles reflect their actual use as well as possible new uses, e.g., for drug design or repurposing. As our evaluation shows, the results are comparable to expensive, manually maintained classifications in the field. Since our techniques rely only on deep embeddings of textual documents, our methodology promises to generalize to other use cases as well.
Keywords: Digital libraries; Information extraction; Facettation; Deep learning