Measuring the Semantic World – How to Map Meaning to High-Dimensional Entity Clusters in PubMed?

Wawrzinek, Janus; Balke, Wolf-Tilo

doi:10.1007/978-3-030-04257-8_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11279)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

The exponential growth of scientific publications in the medical field urgently calls for innovative access paths beyond the limits of term-based search. For example, the search term “diabetes” returns over 600,000 publications in the medical digital library PubMed. In such cases, the automatic extraction of semantic relations between important entities such as active substances, diseases, and genes can help to reveal entity relationships and thus simplify access to the knowledge embedded in digital libraries. For such semantic-relation tasks, distributional embedding models based on neural networks promise considerable progress in terms of accuracy, performance, and scalability. Yet, despite the recent successes of neural networks in this field, questions arise from their non-deterministic nature: Are the extracted semantic relations meaningful, and do they perhaps even reveal new, previously unknown entity relationships? In this paper, we address this question by measuring the associations between important pharmaceutical entities such as active substances (drugs) and diseases in a high-dimensional embedding space. Our investigation shows that, while only a few of the contextualized associations directly correlate with spatial distance, they nevertheless have potential for predicting new associations, which makes the method suitable as a new, literature-based technique for important practical tasks such as drug repurposing.
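
To make the idea of measuring drug-disease associations by spatial distance concrete, here is a minimal sketch. It assumes cosine similarity as the distance measure, which is a common choice for word embeddings; the abstract does not state which measure or embedding model the authors use. The function names and the three-dimensional vectors are hypothetical and invented purely for illustration; they are not data or results from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_diseases(drug_vec, disease_vecs, k=3):
    """Return the k diseases whose vectors lie closest to the drug vector."""
    scored = [
        (name, cosine_similarity(drug_vec, vec))
        for name, vec in disease_vecs.items()
    ]
    # Highest similarity first: a smaller angle means a closer entity.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, invented for illustration only (assumption, not paper data).
drug_vectors = {"metformin": [0.9, 0.1, 0.2]}
disease_vectors = {
    "diabetes mellitus": [0.8, 0.2, 0.1],
    "hypertension": [0.1, 0.9, 0.3],
    "asthma": [0.2, 0.1, 0.9],
}

ranking = rank_diseases(drug_vectors["metformin"], disease_vectors)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a setting like the one the abstract describes, the vectors would come from an embedding model trained on PubMed text, and a highly ranked disease with no documented link to the drug would be a candidate for further investigation rather than a confirmed association.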

Author information

Authors and Affiliations

IFIS TU-Braunschweig, Mühlenpfordstrasse 23, 38106, Brunswick, Germany
Janus Wawrzinek & Wolf-Tilo Balke

Authors

Janus Wawrzinek
Wolf-Tilo Balke

Corresponding author

Correspondence to Janus Wawrzinek.

Editor information

Editors and Affiliations

University College London Qatar, Doha, Qatar
Milena Dobreva
University of Waikato, Hamilton, New Zealand
Annika Hinze
University of Ljubljana, Ljubljana, Slovenia
Maja Žumer

About this paper

Cite this paper

Wawrzinek, J., Balke, W.-T. (2018). Measuring the Semantic World – How to Map Meaning to High-Dimensional Entity Clusters in PubMed? In: Dobreva, M., Hinze, A., Žumer, M. (eds) Maturity and Innovation in Digital Libraries. ICADL 2018. Lecture Notes in Computer Science, vol 11279. Springer, Cham. https://doi.org/10.1007/978-3-030-04257-8_2

DOI: https://doi.org/10.1007/978-3-030-04257-8_2
Published: 15 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04256-1
Online ISBN: 978-3-030-04257-8
eBook Packages: Computer Science, Computer Science (R0)
