BioBERT-Based Model for COVID-Related Named Entity Recognition

Soni, Govind; Verma, Shikha; Sharan, Aditi; Ahmad, Owais

doi:10.1007/978-981-99-5085-0_32

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 755)

Included in the following conference series:

International Conference on Advances in IoT and Security with AI

Abstract

In natural language processing, information extraction from textual data is an important task. Named entity recognition is the most popular information extraction task, especially in the medical domain. It aims to identify entities and assign them to predefined categories (Cho and Lee in BMC Bioinform 20:1–11, 2019). With the emergence of COVID-19, COVID-related digital resources have increased drastically, introducing new entity types that are semantically similar to one another as well as entities that were previously unknown. As a result, entity extraction has become more challenging, and well-defined models developed earlier are not suitable for extracting such entities. Earlier research suggested that state-of-the-art models were generic and paid little attention to domain-specific knowledge. It is therefore important for research to move in a direction that incorporates biomedical domain knowledge into named entity recognition. This paper aims to identify biomedical entities on the COVID benchmark dataset (Cho and Lee in BMC Bioinform 20:1–11, 2019) released by the University of Illinois. The experiments were performed using the biomedical domain-specific model BioBERT. We further compared different versions of pre-trained weights for the BioBERT model, and the experimental results show that the BioBERT-Base v1.1 (+PubMed 1M) weights (Lee et al. in Bioinformatics, 2019) outperform the others.
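
To make the task concrete, the following is a minimal sketch of how token-level predictions are commonly turned into typed entities and scored. It is not the authors' code. The BIO tagging scheme, the entity-level F1 metric, the function names `bio_to_spans` and `entity_f1`, and the example labels `DISEASE` and `VIRUS` are all assumptions made for illustration; the abstract does not state which tagging scheme or metric was used.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into typed entity spans.

    Assumption: a stray "I-X" with no preceding "B-X" opens a new span
    (lenient decoding) rather than being discarded.
    """
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel flushes a span that ends on the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a prediction counts only on exact span and type match."""
    gold_spans: Set[Span] = set(bio_to_spans(gold))
    pred_spans: Set[Span] = set(bio_to_spans(pred))
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence: ["COVID-19", "is", "caused", "by", "SARS-CoV-2"]
gold_tags = ["B-DISEASE", "O", "O", "O", "B-VIRUS"]
pred_tags = ["B-DISEASE", "O", "O", "O", "O"]  # the model misses the virus mention
score = entity_f1(gold_tags, pred_tags)
```

Under these assumptions, a score of this kind computed on the same test set would be one way to compare different pre-trained weight versions of a model.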

References

Cho H, Lee H (2019) Biomedical named entity recognition using deep neural networks with contextual information. BMC Bioinform 20:1–11
Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2019) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz682
Wang X, Song X, Li B, Guan Y, Han J (2020) Comprehensive named entity recognition on cord-19 with distant or weak supervision. arXiv preprint arXiv:2003.12218
Zhao D, Li J, Feng Y, Ji H (2015) Natural language processing and Chinese computing. Springer
Das D, Katyal Y, Verma J, Dubey S, Singh A, Agarwal K, Bhaduri S, Ranjan R (2020) Information retrieval and extraction on covid-19 clinical articles using graph community detection and bio-BERT embeddings. In: Proceedings of the 1st workshop on NLP for COVID-19 at ACL 2020
Catelli R, Gargiulo F, Casola V, De Pietro G, Fujita H, Esposito M (2020) Crosslingual named entity recognition for clinical de-identification applied to a covid-19 Italian data set. Appl Soft Comput 97:106779
Arguello-Casteleiro M, Maroto N, Wroe C, Torrado CS, Henson C, Des-Diz J, Fernandez-Prieto M, Furmston T, Fernandez DM, Kulshrestha M, et al (2021) Named entity recognition and relation extraction for covid-19: explainable active learning with word2vec embeddings and transformer-based BERT models. In: Artificial intelligence XXXVIII: 41st SGAI international conference on artificial intelligence, AI 2021, Cambridge, UK, December 14–16, 2021, Proceedings 41. Springer, pp 158–163
Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4):1234–1240
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365
Lyman CA, Anderson C, Morris M, Nandal UK, Martindale MJ, Clement M, Broderick G (2019) When the how outweighs the what: the pivotal importance of context. In: 2019 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 2149–2156
Kumar S, Sahu A, Sharan A (2022) Deep learning based architecture for entity extraction from covid related documents. In: Proceedings of 4th international conference on information systems and management science (ISMS) 2021. Springer, pp 419–427
Wang LL, Lo K, Chandrasekhar Y, Reas R, Yang J, Eide D, Funk K, Kinney R, Liu Z, Merrill W, et al (2020) Cord-19: the covid-19 open research dataset
Atliha V (2023) Improving image captioning methods using machine learning approaches. PhD thesis, Vilniaus Gedimino technikos universitetas
Brockmeier AJ, Ju M, Przybyla P, Ananiadou S (2019) Improving reference prioritisation with PICO recognition. BMC Med Inform Decis Mak 19:1–14
Zhang Y, Lin H, Yang Z, Wang J, Sun Y, Xu B, Zhao Z (2019) Neural network-based approaches for biomedical relation classification: a review. J Biomed Inform 99:103294
Zhong N, Bradshaw JM, Liu J, Taylor JG (2011) Brain informatics. IEEE Intell Syst 26(5):16–21
Srinivasan P, Qiu XY (2007) Go for gene documents. BMC Bioinform (BioMed Central) 8:1–15
Ganeshkumar M, Ravi V, Sowmya V, Gopalakrishnan E, Soman K, Chakraborty C (2022) Identification of intracranial haemorrhage (ICH) using ResNet with data augmentation using CycleGAN and ICH segmentation using SegAN. Multimed Tools Appl 81(25):36257–36273
Jha PK, Valekunja UK, Reddy AB (2023) SlumberNet: deep learning classification of sleep stages using residual neural networks. bioRxiv, 2023–05
Hong G, Kim Y, Choi Y, Song M (2021) BioPREP: deep learning-based predicate classification with SemMedDB. J Biomed Inform 122:103888
Zhang R, Hristovski D, Schutte D, Kastrin A, Fiszman M, Kilicoglu H (2021) Drug repurposing for covid-19 via knowledge graph completion. J Biomed Inform 115:103696
Liang Y, Kelemen A (2005) Temporal gene expression classification with regularised neural network. Int J Bioinform Res Appl 1(4):399–413

Author information

Authors and Affiliations

School of Computer and System Sciences, Jawaharlal Nehru University, New Delhi, India
Govind Soni, Shikha Verma & Aditi Sharan
Thoucentric, Bangalore, India
Owais Ahmad

Corresponding author

Correspondence to Govind Soni.

Editor information

Editors and Affiliations

Department of Electronics, Deen Dayal Upadhyaya College, University of Delhi, New Delhi, India
Anurag Mishra
Department of Computer Science and Engineering, MNNIT Allahabad, Prayagraj, India
Deepak Gupta
Faculty of Science and Technology, University of Canberra, Bruce, ACT, Australia
Girija Chetty

About this paper

Cite this paper

Soni, G., Verma, S., Sharan, A., Ahmad, O. (2023). BioBERT-Based Model for COVID-Related Named Entity Recognition. In: Mishra, A., Gupta, D., Chetty, G. (eds) Advances in IoT and Security with Computational Intelligence. ICAISA 2023. Lecture Notes in Networks and Systems, vol 755. Springer, Singapore. https://doi.org/10.1007/978-981-99-5085-0_32

DOI: https://doi.org/10.1007/978-981-99-5085-0_32
Published: 23 September 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5084-3
Online ISBN: 978-981-99-5085-0
eBook Packages: Intelligent Technologies and Robotics (R0)
