A Hybrid Approach for French Medical Entity Recognition and Normalization

  • Allaouzi Imane
  • Mohamed Ben Ahmed
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 37)


Medical documents written in natural language are available in electronic form and constitute an invaluable source for medical research. This paper describes our system, based on a hybrid approach, for Named Entity Recognition and Normalization in French medical documents using the QUAERO corpus [1]. To evaluate the system, we took part in three subtasks: Entity Normalization, Named Entity Extraction, and Classification, which involve 10 categories: Anatomy, Chemicals & Drugs, Devices, Disorders, Geographic Areas, Living Beings, Objects, Phenomena, Physiology, and Procedures. The results on both tasks, Named Entity Recognition and Normalization, show that the system performs well compared with other methods for French Medical Entity Recognition and Normalization.


Medical entity recognition, Automatic categorization, Normalization, UMLS, Machine learning, Knowledge-based, NLP


  1. Névéol, A., Grouin, C., Leixa, J., Rosset, S., Zweigenbaum, P.: The QUAERO French medical corpus: a resource for medical entity recognition and normalization. In: Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing – BioTxtM, pp. 24–30 (2014)
  2. György, S.: Feature engineering for domain independent named entity recognition and biomedical text mining applications. In: Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and the University of Szeged, pp. 5–6 (2008)
  3. Sekine, S., Sudo, K., Nobata, C.: Extended named entity hierarchy. In: LREC-2002 (2002)
  4. Meystre, S.M., Haug, P.J.: Comparing natural language processing tools to extract medical problems from narrative text. In: AMIA Annual Symposium Proceedings, pp. 525–529 (2005)
  5. Alfred, R., Leong, L.C., On, C.K., Anthony, P.: Malay named entity recognition based on rule-based approach. Int. J. Mach. Learn. Comput. 4(3), 301–302 (2014)
  6. Tang, B., Cao, H., Wu, Y., Jiang, M., Xu, H.: Clinical entity recognition using structural support vector machines with rich features. In: Proceedings of the ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics, p. 1 (2012)
  7. Abacha, A.B., Zweigenbaum, P.: Medical entity recognition: a comparison of semantic and statistical methods. In: Proceedings of BioNLP Workshop, pp. 57–58 (2011)
  8. Jiang, M., Chen, Y., Liu, M., Rosenbloom, S.T., Mani, S., Denny, J.C., Xu, H.: A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. J. Am. Med. Inform. Assoc. 18, 601–606 (2011)
  9. Ho-Dac, L.M., Tanguy, L., Grauby, C., Hnub, N., Mby, A.H., Malosse, J., Rivière, L., Veltz-Mauclair, A., Wauquier, M.: LITL at CLEF eHealth2016: recognizing entities in French biomedical documents. In: CLEF 2016 Online Working Notes, p. 3. CEUR-WS (2016)
  10. McCray, A.T.: The scope and structure of the first version of the UMLS semantic network. In: Proceedings of the Annual Symposium on Computer Application in Medical Care, pp. 126–130 (1990)
  11. National Library of Medicine: Semantic networks. In: UMLS Reference Manual. U.S. National Library of Medicine, National Institutes of Health, Bethesda (2009). Chapter 1
  12. National Library of Medicine: Semantic networks. In: UMLS Reference Manual. U.S. National Library of Medicine, National Institutes of Health, Bethesda (2009). Chapter 5
  13. Elnahrawy, E.M.: Log-based chat room monitoring using text categorization: a comparative study. In: International Conference on Information and Knowledge Sharing, Acta Press Series, pp. 1–2 (2002)
  14. Breiman, L.: Mach. Learn. 45, 5 (2001)
  15. Segal, M.R.: Machine Learning Benchmarks and Random Forest Regression. Center for Bioinformatics & Molecular Biostatistics (2014)
  16. Galibert, O., Rosset, S., Grouin, C., Zweigenbaum, P., Quintard, L.: Structured and extended named entity evaluation in automatic speech transcriptions. In: Proceedings of 5th International Joint Conference on Natural Language Processing, p. 521 (2011)
  17. Névéol, A., Cohen, K.B., Grouin, C., Hamon, T., Lavergne, T., Kelly, L., Goeuriot, L., Rey, G., Robert, A., Tannier, X., Zweigenbaum, P.: Clinical information extraction at the CLEF eHealth evaluation lab 2016. In: CLEF 2016 Online Working Notes. CEUR-WS (2016)
  18. Van Mulligen, E., Afzal, Z., Akhondi, S.A., Vo, D., Kors, J.A.: Erasmus MC at CLEF eHealth 2016: concept recognition and coding in French texts. In: CLEF 2016 Online Working Notes. CEUR-WS (2016)
  19. Cabot, C., Soualmia, L.F., Dahamna, B., Darmoni, S.J.: SIBM at CLEF eHealth evaluation lab 2016: extracting concepts in French medical texts with ECMT and CIMIND. In: CLEF 2016 Online Working Notes. CEUR-WS (2016)

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. LIST/FSTT, Abdelmalek Essaadi University, Tangier, Morocco