BioBERT-Based Model for COVID-Related Named Entity Recognition

  • Conference paper
Advances in IoT and Security with Computational Intelligence (ICAISA 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 755)

Abstract

In natural language processing, information extraction from textual data is an important task. Named entity recognition is the most popular information extraction task, especially in the medical domain. Entity extraction aims to identify entity mentions and classify them into predefined categories (Cho and Lee in BMC Bioinform 20:1–11, 2019). With the emergence of COVID-19, COVID-related digital resources have grown drastically, and new types of entities have appeared: some are semantically similar to existing ones, while others were previously unknown. This makes entity extraction more challenging, and well-defined models developed earlier are not suitable for extracting such entities. Earlier research suggested that state-of-the-art models were generic and paid little attention to domain-specific knowledge. It is therefore important for research to move toward models that incorporate biomedical domain knowledge for named entity recognition. This paper aims to identify entities of a biomedical nature, specifically on the COVID benchmark dataset released by the University of Illinois (Cho and Lee in BMC Bioinform 20:1–11, 2019). The experiments were performed using BioBERT, a biomedical domain-specific model. We further compared different versions of pre-trained weights for the BioBERT model, and the experimental results show that the BioBERT-Base v1.1 (+PubMed 1M) version (Lee et al. in Bioinformatics, 2019) outperforms the other models.
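
The abstract frames entity extraction as finding entity mentions and assigning them to predefined categories. Models such as BioBERT are commonly fine-tuned for this as token classification over BIO tags, and model variants are commonly compared by entity-level F1; the abstract states neither the tagging scheme nor the metric, so both are assumptions here. The minimal sketch below decodes BIO tags into entity spans and scores predicted spans against gold spans. The function names `bio_to_spans` and `entity_f1`, the labels `VIRUS` and `DISEASE`, and the example sentence are hypothetical and are not taken from the paper or its dataset.

```python
from typing import List, Set, Tuple

# A hypothetical entity span: (start token index, end token index exclusive, entity type).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence (B-X begins, I-X continues, O is outside) into spans."""
    spans: Set[Span] = set()
    start, etype = 0, None
    for i, tag in enumerate(tags):
        # I-X directly after an open entity of type X extends that entity.
        if tag.startswith("I-") and tag[2:] == etype:
            continue
        # Any other tag closes the currently open entity, if there is one.
        if etype is not None:
            spans.add((start, i, etype))
            etype = None
        # B-X starts a new entity; a stray I-X is leniently treated as a start too.
        if tag != "O":
            start, etype = i, tag[2:]
    if etype is not None:
        spans.add((start, len(tags), etype))
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1: a span counts only if boundaries and type match."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # spans found with correct boundaries and type
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sentence and labels, for illustration only.
    tokens = ["SARS-CoV-2", "causes", "COVID-19", "pneumonia"]
    gold = [["B-VIRUS", "O", "B-DISEASE", "I-DISEASE"]]
    pred = [["B-VIRUS", "O", "B-DISEASE", "O"]]
    for s, e, t in sorted(bio_to_spans(gold[0])):
        print(t, " ".join(tokens[s:e]))  # VIRUS SARS-CoV-2, then DISEASE COVID-19 pneumonia
    print(entity_f1(gold, pred))  # → (0.5, 0.5, 0.5)
```

Strict matching (exact boundaries and type) is only one possible criterion: under it, the truncated `DISEASE` prediction counts as both a false positive and a false negative, whereas a partial-overlap criterion would give it some credit.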

References

  1. Cho H, Lee H (2019) Biomedical named entity recognition using deep neural networks with contextual information. BMC Bioinform 20:1–11

  2. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2019) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz682

  3. Wang X, Song X, Li B, Guan Y, Han J (2020) Comprehensive named entity recognition on CORD-19 with distant or weak supervision. arXiv preprint arXiv:2003.12218

  4. Zhao D, Li J, Feng Y, Ji H (2015) Natural language processing and Chinese computing. Springer

  5. Das D, Katyal Y, Verma J, Dubey S, Singh A, Agarwal K, Bhaduri S, Ranjan R (2020) Information retrieval and extraction on COVID-19 clinical articles using graph community detection and Bio-BERT embeddings. In: Proceedings of the 1st workshop on NLP for COVID-19 at ACL 2020

  6. Catelli R, Gargiulo F, Casola V, De Pietro G, Fujita H, Esposito M (2020) Crosslingual named entity recognition for clinical de-identification applied to a COVID-19 Italian data set. Appl Soft Comput 97:106779

  7. Arguello-Casteleiro M, Maroto N, Wroe C, Torrado CS, Henson C, Des-Diz J, Fernandez-Prieto M, Furmston T, Fernandez DM, Kulshrestha M, et al (2021) Named entity recognition and relation extraction for COVID-19: explainable active learning with word2vec embeddings and transformer-based BERT models. In: Artificial intelligence XXXVIII: 41st SGAI international conference on artificial intelligence, AI 2021, Cambridge, UK, December 14–16, 2021, Proceedings 41. Springer, pp 158–163

  8. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4):1234–1240

  9. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

  10. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

  11. Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365

  12. Lyman CA, Anderson C, Morris M, Nandal UK, Martindale MJ, Clement M, Broderick G (2019) When the how outweighs the what: the pivotal importance of context. In: 2019 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 2149–2156

  13. Kumar S, Sahu A, Sharan A (2022) Deep learning based architecture for entity extraction from COVID-related documents. In: Proceedings of 4th international conference on information systems and management science (ISMS) 2021. Springer, pp 419–427

  14. Wang LL, Lo K, Chandrasekhar Y, Reas R, Yang J, Eide D, Funk K, Kinney R, Liu Z, Merrill W, et al (2020) CORD-19: the COVID-19 open research dataset

  15. Atliha V (2023) Improving image captioning methods using machine learning approaches. PhD thesis, Vilniaus Gedimino technikos universitetas

  16. Brockmeier AJ, Ju M, Przybyla P, Ananiadou S (2019) Improving reference prioritisation with PICO recognition. BMC Med Inform Decis Mak 19:1–14

  17. Zhang Y, Lin H, Yang Z, Wang J, Sun Y, Xu B, Zhao Z (2019) Neural network-based approaches for biomedical relation classification: a review. J Biomed Inform 99:103294

  18. Zhong N, Bradshaw JM, Liu J, Taylor JG (2011) Brain informatics. IEEE Intell Syst 26(5):16–21

  19. Srinivasan P, Qiu XY (2007) Go for gene documents. BMC Bioinform (BioMed Central) 8:1–15

  20. Ganeshkumar M, Ravi V, Sowmya V, Gopalakrishnan E, Soman K, Chakraborty C (2022) Identification of intracranial haemorrhage (ICH) using ResNet with data augmentation using CycleGAN and ICH segmentation using SegAN. Multimed Tools Appl 81(25):36257–36273

  21. Jha PK, Valekunja UK, Reddy AB (2023) SlumberNet: deep learning classification of sleep stages using residual neural networks. bioRxiv, 2023–05

  22. Hong G, Kim Y, Choi Y, Song M (2021) BioPREP: deep learning-based predicate classification with SemMedDB. J Biomed Inform 122:103888

  23. Zhang R, Hristovski D, Schutte D, Kastrin A, Fiszman M, Kilicoglu H (2021) Drug repurposing for COVID-19 via knowledge graph completion. J Biomed Inform 115:103696

  24. Liang Y, Kelemen A (2005) Temporal gene expression classification with regularised neural network. Int J Bioinform Res Appl 1(4):399–413

Author information

Correspondence to Govind Soni.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Soni, G., Verma, S., Sharan, A., Ahmad, O. (2023). BioBERT-Based Model for COVID-Related Named Entity Recognition. In: Mishra, A., Gupta, D., Chetty, G. (eds) Advances in IoT and Security with Computational Intelligence. ICAISA 2023. Lecture Notes in Networks and Systems, vol 755. Springer, Singapore. https://doi.org/10.1007/978-981-99-5085-0_32