Disease Normalization with Graph Embeddings

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1251)

Abstract

The detection and normalization of diseases in biomedical texts are key biomedical natural language processing tasks. Disease names must not only be identified, but also normalized, that is, linked to clinical taxonomies that describe diseases, such as MeSH®. In this paper we describe deep learning methods that tackle both tasks. We train and test our methods on the well-known NCBI disease benchmark corpus. We propose to represent disease names with graph embeddings that combine MeSH®'s graphical structure with the lexical information available in the taxonomy. We also show that combining neural named entity recognition models with our graph-based entity linking methods via multitask learning leads to improved disease recognition in the NCBI corpus.

Notes

  1. Full MeSH® example: https://meshb.nlm.nih.gov/record/ui?ui=D011125.

  2. We drop one training abstract because it is repeated: abstract 8528200.

  3. https://www.nltk.org/.

  4. The code of our experiments is available at: https://github.com/druv022/Disease-Normalization-with-Graph-Embeddings.

  5. They also include a wide range of optimizations such as re-ranking, coherence models or abbreviation resolution.

Author information

Corresponding author

Correspondence to C. Thorne.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Pujary, D., Thorne, C., Aziz, W. (2021). Disease Normalization with Graph Embeddings. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_18
