Recognition and pseudonymisation of medical records for secondary use

  • Johannes Heurix
  • Stefan Fenz
  • Antonio Rella
  • Thomas Neubauer
Original Article


Health records rank among the most sensitive personal information existing today. An unwanted disclosure to unauthorised parties usually results in significant negative consequences for an individual. Therefore, health records must be adequately protected in order to ensure the individual’s privacy. However, health records are also valuable resources for clinical studies and research activities. In order to make the records available for privacy-preserving secondary use, thorough de-personalisation is a crucial prerequisite to prevent re-identification. This paper introduces MEDSEC, a system which automatically converts paper-based health records into de-personalised and pseudonymised documents which can be accessed by secondary users without compromising the patients’ privacy. The system converts the paper-based records into a standardised structure that facilitates automated processing and the search for useful information.


Keywords: De-personalisation, Information management, Secondary use, Privacy, Pseudonymisation



We thank our business partners XiTrust Secure Technologies and Xylem Technologies for supporting the implementation of the case studies carried out within the MEDSEC project. The research was funded by BRIDGE (#824884), FFG-Austrian Research Promotion Agency, and supported by COMET K1, FFG-Austrian Research Promotion Agency.

Ethical standard

Since real-life records from a hospital archive with personal data were used in the case study, special care was taken to ensure the involved patients’ privacy. Access to the data was only allowed for the directly involved project members. Furthermore, the test data were only accessible within the archive computer network and records were not stored, copied, or processed outside the network environment.



Copyright information

© International Federation for Medical and Biological Engineering 2015

Authors and Affiliations

  • Johannes Heurix (1)
  • Stefan Fenz (2)
  • Antonio Rella (3)
  • Thomas Neubauer (2)

  1. SBA Research, Vienna, Austria
  2. Vienna University of Technology, ISIS, Vienna, Austria
  3. Xitrust Secure Technologies, Graz, Austria
