Disease Normalization with Graph Embeddings

Pujary, D.; Thorne, C.; Aziz, W.

doi:10.1007/978-3-030-55187-2_18

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1251)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference

Abstract

The detection and normalization of diseases in biomedical texts are key biomedical natural language processing tasks. Disease names must not only be identified, but also normalized, that is, linked to clinical taxonomies that describe diseases, such as MeSH®. In this paper we describe deep learning methods that tackle both tasks. We train and test our methods on the well-known NCBI disease benchmark corpus. We propose to represent disease names with graph embeddings that combine MeSH®'s graph structure with the lexical information available in the taxonomy. We also show that combining neural named entity recognition models with our graph-based entity linking methods via multitask learning leads to improved disease recognition on the NCBI corpus.

Notes

1. Full MeSH® example: https://meshb.nlm.nih.gov/record/ui?ui=D011125.
2. We drop one training abstract because it is repeated: abstract 8528200.
3. https://www.nltk.org/.
4. The code of our experiments is available at: https://github.com/druv022/Disease-Normalization-with-Graph-Embeddings.
5. They also include a wide range of optimizations such as re-ranking, coherence models or abbreviation resolution.

Author information

Authors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
D. Pujary & W. Aziz
Elsevier, Amsterdam, The Netherlands
D. Pujary & C. Thorne

Corresponding author

Correspondence to C. Thorne.

Editor information

Editors and Affiliations

Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia

Cite this paper

Pujary, D., Thorne, C., Aziz, W. (2021). Disease Normalization with Graph Embeddings. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_18

DOI: https://doi.org/10.1007/978-3-030-55187-2_18
Published: 25 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55186-5
Online ISBN: 978-3-030-55187-2
eBook Packages: Intelligent Technologies and Robotics (R0)
