Graph-Enriched Biomedical Entity Representation Transformer

Sakhovskiy, Andrey; Semenova, Natalia; Kadurin, Artur; Tutubalina, Elena

doi:10.1007/978-3-031-42448-9_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14163)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages


Abstract

Infusing external domain-specific knowledge about diverse biomedical concepts and relationships into language models (LMs) advances their ability to handle specialised in-domain tasks like medical concept normalization (MCN). However, existing biomedical LMs are primarily trained with contrastive learning using synonymous concept names from a terminology (e.g., UMLS) as positive anchors, while accurate aggregation of the features of graph nodes and their neighbors remains a challenge. In this paper, we present the Graph-Enriched Biomedical Entity Representation Transformer (GEBERT), which captures graph structural data from the UMLS via graph neural networks and contrastive learning. In GEBERT, we enrich the entity representations by introducing an additional graph-based node-level contrastive objective. To enable mutual knowledge sharing between the textual and the structural modalities, we minimize the contrastive objective between a concept's node representation and its textual embedding obtained via the LM. We explore several state-of-the-art convolutional graph architectures, namely GraphSAGE and GAT, to learn relational information from local node neighborhoods. After task-specific supervision, GEBERT achieves state-of-the-art results on five MCN datasets in English.
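
The cross-modal objective described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (mean_aggregate, cross_modal_infonce), the toy concept IDs and vectors, the weight-free mean aggregation (a simplification in the spirit of GraphSAGE, with no learned parameters), and the InfoNCE-style loss, which stands in for whatever contrastive loss the authors actually use.

```python
import math

def mean_aggregate(features, neighbors):
    """Simplified GraphSAGE-style step (assumption: no learned weights).
    A node's new vector is the average of its own vector and the mean of
    its neighbors' vectors; isolated nodes keep their vector."""
    out = {}
    for node, vec in features.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            out[node] = list(vec)
            continue
        nbr_mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(len(vec))]
        out[node] = [(a + b) / 2 for a, b in zip(vec, nbr_mean)]
    return out

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def cross_modal_infonce(text_emb, node_emb, temperature=0.1):
    """InfoNCE-style loss (a stand-in, not the paper's loss): each concept's
    text embedding should score highest against its own node embedding."""
    ids = sorted(text_emb)
    total = 0.0
    for pos, cid in enumerate(ids):
        logits = [cosine(text_emb[cid], node_emb[other]) / temperature
                  for other in ids]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[pos]
    return total / len(ids)

# Toy data: three hypothetical concepts with 2-d "text" embeddings.
text = {"C1": [1.0, 0.0], "C2": [0.0, 1.0], "C3": [1.0, 1.0]}
graph = {"C1": ["C3"], "C2": ["C3"], "C3": ["C1", "C2"]}

nodes = mean_aggregate(text, graph)
aligned = cross_modal_infonce(text, nodes)
# Swapping two node vectors breaks the text-node pairing and raises the loss.
swapped = {"C1": nodes["C2"], "C2": nodes["C1"], "C3": nodes["C3"]}
mismatched = cross_modal_infonce(text, swapped)
print(aligned < mismatched)
```

In this toy setting the loss is lower when each text embedding is paired with its own node embedding than when two node embeddings are swapped, which is the behaviour a cross-modal contrastive objective is meant to reward.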



Acknowledgments

This work was supported by the Russian Science Foundation, grant no. 23-11-00358.

Author information

Authors and Affiliations

Sber AI, Moscow, Russia
Andrey Sakhovskiy & Natalia Semenova
Artificial Intelligence Research Institute, Moscow, Russia
Natalia Semenova, Artur Kadurin & Elena Tutubalina
Kazan (Volga Region) Federal University, Kazan, Russia
Andrey Sakhovskiy & Elena Tutubalina


Corresponding author

Correspondence to Andrey Sakhovskiy.

Editor information

Editors and Affiliations

Democritus University of Thrace, Xanthi, Greece
Avi Arampatzis
University of Amsterdam, Amsterdam, The Netherlands
Evangelos Kanoulas
CERTH-ITI, Thessaloniki, Greece
Theodora Tsikrika
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Utrecht University, Utrecht, The Netherlands
Anastasia Giachanou
Elsevier, Amsterdam, The Netherlands
Dan Li
University of Amsterdam, Amsterdam, The Netherlands
Mohammad Aliannejadi
University of Lausanne, Lausanne, Switzerland
Michalis Vlachos
University of Padua, Padova, Italy
Guglielmo Faggioli
University of Padua, Padova, Italy
Nicola Ferro


About this paper

Cite this paper

Sakhovskiy, A., Semenova, N., Kadurin, A., Tutubalina, E. (2023). Graph-Enriched Biomedical Entity Representation Transformer. In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_10

DOI: https://doi.org/10.1007/978-3-031-42448-9_10
Published: 11 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42447-2
Online ISBN: 978-3-031-42448-9
eBook Packages: Computer Science, Computer Science (R0)
