Abstract
The detection and normalization of diseases in biomedical texts are key biomedical natural language processing tasks. Disease names must not only be identified but also normalized, that is, linked to clinical taxonomies of diseases such as MeSH®. In this paper we describe deep learning methods that tackle both tasks. We train and test our methods on the well-known NCBI disease benchmark corpus. We propose to represent disease names with graph embeddings that leverage MeSH®'s graph structure together with the lexical information available in the taxonomy. We also show that combining neural named entity recognition models with our graph-based entity linking methods via multitask learning improves disease recognition on the NCBI corpus.
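A minimal sketch of one way to exploit MeSH®'s graph structure, assuming a DeepWalk-style approach (the abstract does not say which graph-embedding method is used here). MeSH tree numbers are dot-separated hierarchical codes, so parent-child edges can be read off by stripping the last segment, and uniform random walks over the resulting graph yield node sequences that a skip-gram model could turn into embeddings. The tree numbers (`X01`, ...) and the helper names `build_graph` and `random_walks` are hypothetical. This is not the authors' implementation, and it covers only the graph-structure half, leaving out the lexical information they combine with it.

```python
import random


def build_graph(tree_numbers):
    """Build an undirected parent-child adjacency list from dotted tree numbers.

    A tree number such as "X01.100.200" is treated as the child of "X01.100"
    (its prefix up to the last dot), mirroring how MeSH tree numbers nest.
    """
    graph = {node: set() for node in tree_numbers}
    for node in tree_numbers:
        if "." in node:
            parent = node.rsplit(".", 1)[0]
            if parent in graph:  # only link to parents present in the input
                graph[node].add(parent)
                graph[parent].add(node)
    # Sorted neighbour lists keep the seeded walks reproducible.
    return {node: sorted(neighbours) for node, neighbours in graph.items()}


def random_walks(graph, walks_per_node, walk_length, seed=0):
    """Generate DeepWalk-style uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # isolated node: end this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical toy hierarchy, not real MeSH records.
tree_numbers = ["X01", "X01.100", "X01.100.200", "X01.100.300", "X01.400"]
graph = build_graph(tree_numbers)
walks = random_walks(graph, walks_per_node=2, walk_length=4)
print(graph["X01"])  # ['X01.100', 'X01.400']
print(len(walks))    # 10
```

Each walk plays the role of a sentence for a word2vec-style skip-gram model. The sketch treats every tree number as a separate node; because a single MeSH descriptor can carry several tree numbers, a descriptor-level graph would merge those into one node.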
Notes
- 1.
Full MeSH® example: https://meshb.nlm.nih.gov/record/ui?ui=D011125.
- 2.
We drop one training abstract, 8528200, because it is duplicated.
- 3.
- 4.
The code of our experiments is available at: https://github.com/druv022/Disease-Normalization-with-Graph-Embeddings.
- 5.
They also include a wide range of optimizations such as re-ranking, coherence models or abbreviation resolution.
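The preprocessing step in note 2, dropping a training abstract that occurs more than once, can be sketched as follows. The `(abstract_id, text)` input format, the helper name `drop_repeated_abstracts`, and the ID `0000001` are hypothetical; the sketch simply keeps the first copy of each ID and reports the IDs it discarded.

```python
def drop_repeated_abstracts(abstracts):
    """Keep the first occurrence of each abstract ID and report dropped IDs.

    `abstracts` is a list of (abstract_id, text) pairs; this format is a
    hypothetical stand-in for however the corpus is actually loaded.
    """
    seen = set()
    kept, dropped = [], []
    for abstract_id, text in abstracts:
        if abstract_id in seen:
            dropped.append(abstract_id)  # later copy of an ID already kept
            continue
        seen.add(abstract_id)
        kept.append((abstract_id, text))
    return kept, dropped


corpus = [
    ("8528200", "first copy"),
    ("0000001", "another abstract"),  # hypothetical ID
    ("8528200", "second copy"),
]
kept, dropped = drop_repeated_abstracts(corpus)
print(dropped)  # ['8528200']
```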
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Pujary, D., Thorne, C., Aziz, W. (2021). Disease Normalization with Graph Embeddings. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_18
DOI: https://doi.org/10.1007/978-3-030-55187-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55186-5
Online ISBN: 978-3-030-55187-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)