Abstract
Over one million new biomedical articles are published every year. Efficient and accurate text-mining tools are urgently needed to automatically extract knowledge from these articles to support research and genetic testing. The extraction of gene-disease associations has received particular attention. However, existing text-mining tools for extracting gene-disease associations are limited because they consider each sentence in isolation. In our experiments, the best existing tools, such as BeFree and DTMiner, achieve at most 48% precision and 78% recall. In this study, we designed and implemented a deep learning approach, named RENET, which considers the correlation between the sentences in an article to extract gene-disease associations. RENET significantly improves precision and recall, to 85.2% and 81.8%, respectively. The source code of RENET is available at https://bitbucket.org/alexwuhkucs/gda-extraction/src/master/.
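For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how precision and recall could be computed for a set of extracted gene-disease pairs. It is not taken from the RENET code base: the function name `precision_recall` and the example pairs are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

# A (gene, disease) pair. The identifiers used below are illustrative only.
Pair = Tuple[str, str]


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted pairs against gold-standard pairs."""
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted associations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold-standard associations that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard annotations and tool output.
gold = {
    ("BRCA1", "breast cancer"),
    ("CFTR", "cystic fibrosis"),
    ("HTT", "Huntington disease"),
}
predicted = {
    ("BRCA1", "breast cancer"),
    ("CFTR", "cystic fibrosis"),
    ("TP53", "asthma"),
    ("APOE", "gout"),
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the four predicted pairs are correct, and two of the three gold pairs are recovered, so a tool can have reasonable recall while still reporting many false associations.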
Acknowledgments
This work was supported by Hong Kong ITF Grant ITS/331/17FP and General Research Fund No. 27204518.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, Y., Luo, R., Leung, H.C.M., Ting, H.-F., Lam, T.-W. (2019). RENET: A Deep Learning Approach for Extracting Gene-Disease Associations from Literature. In: Cowen, L. (ed.) Research in Computational Molecular Biology. RECOMB 2019. Lecture Notes in Computer Science, vol 11467. Springer, Cham. https://doi.org/10.1007/978-3-030-17083-7_17
DOI: https://doi.org/10.1007/978-3-030-17083-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17082-0
Online ISBN: 978-3-030-17083-7
eBook Packages: Computer Science, Computer Science (R0)