Abstract
Information extraction from text is an important task in natural language processing. Named entity recognition (NER) is the most widely used information extraction task, especially in the medical domain. It aims to identify entities and assign them to predefined categories (Cho and Lee in BMC Bioinform 20:1–11, 2019). With the emergence of COVID-19, COVID-related digital resources have grown drastically, introducing both new entity types that are semantically similar to one another and entities that were previously unknown. As a result, entity extraction has become more challenging, and well-established models developed earlier are not suited to extracting such entities. Earlier research suggested that state-of-the-art models were generic and paid little attention to domain-specific knowledge. It is therefore important for research to move toward named entity recognition that incorporates biomedical domain knowledge. This paper aims to identify biomedical entities on the COVID benchmark dataset (Cho and Lee in BMC Bioinform 20:1–11, 2019) released by the University of Illinois. The experiments were performed using the biomedical domain-specific model BioBERT (Lee et al. in Bioinformatics, 2019). We further compared different versions of pre-trained weights for BioBERT, and the experimental results show that the BioBERT-Base v1.1 (+PubMed 1M) weights outperform the others.
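As a rough illustration of what assigning entities to predefined categories and comparing model variants can look like in practice, the sketch below converts token-level tags into typed entity spans and scores a prediction against gold labels. The abstract names neither the tagging scheme nor the evaluation metric, so the BIO scheme, the entity-level F1 score, the labels DISEASE and VIRUS, and the function names `bio_to_spans` and `entity_f1` are all assumptions made for illustration, not the authors' setup.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect typed spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((label, start, i))
            # Only a B- tag opens a new span; stray I- tags are ignored.
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1 between two BIO tag sequences."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence: "COVID-19 is caused by SARS-CoV-2 ."
gold = ["B-DISEASE", "O", "O", "O", "B-VIRUS", "O"]
pred_full = ["B-DISEASE", "O", "O", "O", "B-VIRUS", "O"]  # finds both entities
pred_partial = ["B-DISEASE", "O", "O", "O", "O", "O"]     # misses the virus

print(entity_f1(gold, pred_full))
print(entity_f1(gold, pred_partial))
```

Under these assumptions, each set of pre-trained weights would produce its own predicted tag sequence for the same test sentences, and the resulting scores could be compared directly.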
References
Cho H, Lee H (2019) Biomedical named entity recognition using deep neural networks with contextual information. BMC Bioinform 20:1–11
Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2019) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz682
Wang X, Song X, Li B, Guan Y, Han J (2020) Comprehensive named entity recognition on cord-19 with distant or weak supervision. arXiv preprint arXiv:2003.12218
Zhao D, Li J, Feng Y, Ji H (2015) Natural language processing and Chinese computing. Springer
Das D, Katyal Y, Verma J, Dubey S, Singh A, Agarwal K, Bhaduri S, Ranjan R (2020) Information retrieval and extraction on covid-19 clinical articles using graph community detection and bio-BERT embeddings. In: Proceedings of the 1st workshop on NLP for COVID-19 at ACL 2020
Catelli R, Gargiulo F, Casola V, De Pietro G, Fujita H, Esposito M (2020) Crosslingual named entity recognition for clinical de-identification applied to a covid-19 Italian data set. Appl Soft Comput 97:106779
Arguello-Casteleiro M, Maroto N, Wroe C, Torrado CS, Henson C, Des-Diz J, Fernandez-Prieto M, Furmston T, Fernandez DM, Kulshrestha M, et al (2021) Named entity recognition and relation extraction for covid-19: explainable active learning with word2vec embeddings and transformer-based BERT models. In: Artificial intelligence XXXVIII: 41st SGAI international conference on artificial intelligence, AI 2021, Cambridge, UK, December 14–16, 2021, Proceedings 41. Springer, pp 158–163
Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4):1234–1240
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365
Lyman CA, Anderson C, Morris M, Nandal UK, Martindale MJ, Clement M, Broderick G (2019) When the how outweighs the what: the pivotal importance of context. In: 2019 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 2149–2156
Kumar S, Sahu A, Sharan A (2022) Deep learning based architecture for entity extraction from covid related documents. In: Proceedings of 4th international conference on information systems and management science (ISMS) 2021. Springer, pp 419–427
Wang LL, Lo K, Chandrasekhar Y, Reas R, Yang J, Eide D, Funk K, Kinney R, Liu Z, Merrill W, et al (2020) Cord-19: the covid-19 open research dataset
Atliha V (2023) Improving image captioning methods using machine learning approaches. PhD thesis, Vilniaus Gedimino technikos universitetas
Brockmeier AJ, Ju M, Przybyla P, Ananiadou S (2019) Improving reference prioritisation with PICO recognition. BMC Med Inform Decis Mak 19:1–14
Zhang Y, Lin H, Yang Z, Wang J, Sun Y, Xu B, Zhao Z (2019) Neural network-based approaches for biomedical relation classification: a review. J Biomed Inform 99:103294
Zhong N, Bradshaw JM, Liu J, Taylor JG (2011) Brain informatics. IEEE Intell Syst 26(5):16–21
Srinivasan P, Qiu XY (2007) Go for gene documents. BMC Bioinform (BioMed Central) 8:1–15
Ganeshkumar M, Ravi V, Sowmya V, Gopalakrishnan E, Soman K, Chakraborty C (2022) Identification of intracranial haemorrhage (ICH) using ResNet with data augmentation using CycleGAN and ICH segmentation using SegAN. Multimed Tools Appl 81(25):36257–36273
Jha PK, Valekunja UK, Reddy AB (2023) SlumberNet: deep learning classification of sleep stages using residual neural networks. bioRxiv, 2023–05
Hong G, Kim Y, Choi Y, Song M (2021) BioPREP: deep learning-based predicate classification with SemMedDB. J Biomed Inform 122:103888
Zhang R, Hristovski D, Schutte D, Kastrin A, Fiszman M, Kilicoglu H (2021) Drug repurposing for covid-19 via knowledge graph completion. J Biomed Inform 115:103696
Liang Y, Kelemen A (2005) Temporal gene expression classification with regularised neural network. Int J Bioinform Res Appl 1(4):399–413
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Soni, G., Verma, S., Sharan, A., Ahmad, O. (2023). BioBERT-Based Model for COVID-Related Named Entity Recognition. In: Mishra, A., Gupta, D., Chetty, G. (eds) Advances in IoT and Security with Computational Intelligence. ICAISA 2023. Lecture Notes in Networks and Systems, vol 755. Springer, Singapore. https://doi.org/10.1007/978-981-99-5085-0_32
DOI: https://doi.org/10.1007/978-981-99-5085-0_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5084-3
Online ISBN: 978-981-99-5085-0
eBook Packages: Intelligent Technologies and Robotics (R0)