Investigation of Biomedical Named Entity Recognition Methods

  • Conference paper
4th International Conference on Artificial Intelligence and Applied Mathematics in Engineering (ICAIAME 2022)

Abstract

Biomedical named entity recognition is the task of identifying mentions of entities such as diseases, symptoms, drugs, proteins, and chemicals in biomedical texts. It plays an important role in natural language processing applications such as relation extraction, question answering, keyword extraction, machine translation, and text summarization. Information extraction in the biomedical domain can support early diagnosis of diseases, detection of missing relationships between biomedical entities such as diseases and chemicals, and determination of drug interactions and side effects. Because biomedical texts contain domain-specific words, complicated phrases, and abbreviations, named entity recognition in this domain remains a challenging task. In this study, we first review methods for named entity recognition in the biomedical domain and classify them into four categories: dictionary-based, rule-based, machine learning, and deep learning methods. Recent advances such as deep learning and transformer-based biomedical language models have helped to achieve successful results in the named entity recognition task. Second, we conduct an experimental study on MedMentions, an annotated dataset that is available to researchers. Finally, we present our experimental results and discuss the challenges and opportunities of the existing methods. The experimental study shows that the most successful method for extracting diseases and symptoms from biomedical texts is BioBERT, with an F1 score of 0.72.
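To make the first of the four categories and the reported metric concrete, the following is a minimal sketch of a dictionary-based tagger scored with span-level precision, recall, and F1. It is not the authors' implementation: the toy lexicon, the sentence, the gold annotations, and the function names `dictionary_tag` and `span_f1` are all hypothetical, and the abstract does not state whether the reported F1 is computed over exact entity spans or over tokens, so exact-span matching is assumed here.

```python
from typing import Dict, List, Set, Tuple

# A span is (start token index, end token index exclusive, entity label).
Span = Tuple[int, int, str]


def dictionary_tag(tokens: List[str], lexicon: Dict[str, str], max_len: int = 3) -> Set[Span]:
    """Greedy longest-match lookup of lowercase token n-grams in a lexicon."""
    spans: Set[Span] = set()
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in lexicon:
                spans.add((i, i + n, lexicon[phrase]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of labeled spans."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, for illustration only.
TOY_LEXICON = {"type 2 diabetes": "Disease", "fatigue": "Symptom", "metformin": "Drug"}
tokens = "Patients with type 2 diabetes reported fatigue and nausea".split()
gold = {(2, 5, "Disease"), (6, 7, "Symptom"), (8, 9, "Symptom")}

pred = dictionary_tag(tokens, TOY_LEXICON)
precision, recall, f1 = span_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case the lexicon has no entry for "nausea", so the tagger misses one gold span. This is the typical weakness of dictionary-based methods: every match is correct, but coverage is limited to what the lexicon contains.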

Author information

Corresponding author

Correspondence to Azer Çelikten.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Çelikten, A., Onan, A., Bulut, H. (2023). Investigation of Biomedical Named Entity Recognition Methods. In: Hemanth, D.J., Yigit, T., Kose, U., Guvenc, U. (eds) 4th International Conference on Artificial Intelligence and Applied Mathematics in Engineering. ICAIAME 2022. Engineering Cyber-Physical Systems and Critical Infrastructures, vol 7. Springer, Cham. https://doi.org/10.1007/978-3-031-31956-3_18
