Towards Creating a New Triple Store for Literature-Based Discovery

Koroleva, Anna; Anisimova, Maria; Gil, Manuel

doi:10.1007/978-3-030-60470-7_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12237)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Literature-based discovery (LBD) is a field of research that aims to discover new knowledge by mining the scientific literature. LBD systems commonly rely on knowledge bases. SemMedDB, created with the SemRep information extraction system, is the most frequently used database in LBD. However, new applications of LBD are emerging that go beyond the scope of SemMedDB. In this work, we propose new discovery patterns in the domain of Natural Products that are not covered by existing databases and tools. Our goal is thus to create a new, extended knowledge base that addresses the limitations of SemMedDB. Our proposed contribution is threefold: 1) we add types of entities and relations that are of interest for LBD but are not covered by SemMedDB; 2) we plan to leverage the full texts of scientific publications, instead of titles and abstracts only; 3) we envisage using the RDF model for our database, in accordance with Semantic Web standards. To create the new database, we plan to build a distantly supervised entity and relation extraction system based on a deep neural network architecture. We describe the methods and tools we plan to employ.
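The abstract names two concrete technical choices: distant supervision for relation extraction and the RDF model for storage. The Python sketch that follows illustrates both under clearly stated assumptions and is not the authors' system. Here, distant supervision means the common heuristic of labelling each ordered pair of co-occurring entities with a relation whenever a seed knowledge base already contains that triple, and with "NA" otherwise. The resulting noisy labels would then train the neural extractor, which is not shown. Several elements are hypothetical and chosen only for illustration: the seed triples, the example sentences, the `http://example.org/lbd/` namespace, and the function names `distant_label`, `iri` and `to_ntriples`. Entity matching uses naive exact substring search, and N-Triples is used only because it is the simplest RDF serialization.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical namespace; a real store would reuse IRIs from existing ontologies.
BASE = "http://example.org/lbd/"

# Hypothetical seed knowledge base of (subject, relation, object) triples,
# standing in for triples taken from existing curated resources.
SEED_KB = {
    ("curcumin", "inhibits", "NF-kB"),
    ("Curcuma longa", "produces", "curcumin"),
}


@dataclass(frozen=True)
class Example:
    sentence: str
    subject: str
    obj: str
    relation: str  # "NA" if no seed triple links this ordered pair


def distant_label(sentences, kb):
    """Label every ordered entity pair co-occurring in a sentence with the
    relation a seed triple assigns to it, or "NA" otherwise."""
    relation_of = {(s, o): r for s, r, o in kb}
    entities = sorted({e for s, _, o in kb for e in (s, o)})
    examples = []
    for sentence in sentences:
        # Naive, case-sensitive substring matching stands in for NER plus normalization.
        found = [e for e in entities if e in sentence]
        for s in found:
            for o in found:
                if s != o:
                    examples.append(
                        Example(sentence, s, o, relation_of.get((s, o), "NA"))
                    )
    return examples


def iri(name):
    """Map a surface name to an IRI in the hypothetical namespace."""
    return f"<{BASE}{quote(name.replace(' ', '_'))}>"


def to_ntriples(examples):
    """Serialize positively labelled pairs as sorted, deduplicated N-Triples lines."""
    lines = {
        f"{iri(ex.subject)} {iri(ex.relation)} {iri(ex.obj)} ."
        for ex in examples
        if ex.relation != "NA"
    }
    return sorted(lines)


if __name__ == "__main__":
    corpus = [
        "In vitro, curcumin inhibits NF-kB activation.",
        "Curcuma longa rhizomes are a major source of curcumin.",
        "curcumin was well tolerated in the trial.",
    ]
    labelled = distant_label(corpus, SEED_KB)
    for ex in labelled:
        print(ex.relation, ex.subject, "->", ex.obj)
    print("\n".join(to_ntriples(labelled)))
```

For illustration, the sketch serializes the positively labelled pairs directly. In a pipeline like the one proposed, the trained extractor's predictions would populate the triple store instead, and those predictions would be mapped to ontology identifiers rather than to surface strings.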

This work is funded by a grant (9710.3.01.5.0001.08) from Health@N, ZHAW.

Author information

Authors and Affiliations

Institute of Applied Simulation, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Grüentalstrasse 14, P.O. Box, 8820, Waedenswil, Switzerland
Anna Koroleva, Maria Anisimova & Manuel Gil
Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Bâtiment Génopode, 1015, Lausanne, Switzerland
Anna Koroleva, Maria Anisimova & Manuel Gil

Corresponding authors

Correspondence to Anna Koroleva or Manuel Gil.

Editor information

Editors and Affiliations

Singapore University of Technology and Design, Singapore, Singapore
Wei Lu
Shanghai Jiao Tong University, Shanghai, China
Kenny Q. Zhu

About this paper

Cite this paper

Koroleva, A., Anisimova, M., Gil, M. (2020). Towards Creating a New Triple Store for Literature-Based Discovery. In: Lu, W., Zhu, K.Q. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2020. Lecture Notes in Computer Science, vol 12237. Springer, Cham. https://doi.org/10.1007/978-3-030-60470-7_5

DOI: https://doi.org/10.1007/978-3-030-60470-7_5
Published: 15 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60469-1
Online ISBN: 978-3-030-60470-7
eBook Packages: Computer Science, Computer Science (R0)