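Evaluation of a tool to assist the anonymization of medical documents, based on natural language processing (original title: Évaluation d’un outil d’aide à l’anonymisation des documents médicaux basé sur le traitement automatique du langage naturel)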

Part of the book series: Informatique et Santé (INFORMATIQUE, volume 1)

Abstract

Personal data must be anonymized before they can legally be used in a research project. The ALADIN research project, which is developing a tool for detecting hospital-acquired infections, required a corpus of 2000 medical documents. To help annotators anonymize this corpus, an anonymization tool relying on Natural Language Processing techniques was developed. Evaluated against a gold standard consisting of the manual anonymization of the documents, the automatic phase of the anonymizer achieved a recall of 79.7%, a precision of 85.2% and an F-score of 82.4%. The performance of the automatic anonymization can still be improved, but the tool already helps considerably, both by saving time and by improving the quality of anonymization (including the accuracy of the labels assigned to anonymized terms and the computation of time durations).
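The three reported figures are mutually consistent if the F-score is taken to be the balanced F1 measure, that is, the harmonic mean of precision and recall. The abstract does not say which F-score variant was used, so this is an assumption; the function name `f_score` and its `beta` parameter below are illustrative and not part of the evaluated tool. A minimal sketch of the check:

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score of a precision/recall pair.

    With beta = 1 (assumed here) this is the harmonic mean of the two values.
    Inputs may be fractions or percentages; the result uses the same scale.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when nothing was detected correctly
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported = f_score(precision=85.2, recall=79.7)
print(round(reported, 1))  # prints 82.4
```

Because the harmonic mean is pulled towards the lower of the two values, the F-score (82.4%) sits slightly below the arithmetic mean of precision and recall (82.45%).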

Author information

Corresponding author

Correspondence to Quentin Gicquel.

Copyright information

© 2011 Springer-Verlag France

About this chapter

Cite this chapter

Gicquel, Q. et al. (2011). Évaluation d’un outil d’aide à l’anonymisation des documents médicaux basé sur le traitement automatique du langage naturel. In: Staccini, P.M., Harmel, A., Darmoni, S.J., Gouider, R. (eds) Systèmes d’information pour l’amélioration de la qualité en santé. Informatique et Santé, vol 1. Springer, Paris. https://doi.org/10.1007/978-2-8178-0285-5_15

  • DOI: https://doi.org/10.1007/978-2-8178-0285-5_15

  • Publisher Name: Springer, Paris

  • Print ISBN: 978-2-8178-0284-8

  • Online ISBN: 978-2-8178-0285-5
