Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSELECTRIC)

Abstract

This chapter presents an overview of anonymization techniques that can be used to protect different types of patient data. In Section 2.1, we first discuss anonymization principles that can prevent identity and sensitive information disclosure in demographic data publishing, as well as algorithms for enforcing these principles. Subsequently, in Section 2.2, we turn our attention to diagnosis code anonymization, which, somewhat surprisingly, has not been considered by the medical informatics community. After motivating the need for anonymizing diagnosis codes, we present several related principles and transformation models that have been proposed by the data management community. We then review anonymization algorithms, detailing how they apply these principles and transformation models. Finally, in Section 2.3, we examine genomic data, which are also susceptible to privacy attacks.
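To make the demographic-data principles concrete, the following sketch checks two well-known ones on a toy table: k-anonymity, which guards against identity disclosure by requiring every combination of quasi-identifier values to be shared by at least k records, and distinct l-diversity (the simplest variant of l-diversity), which guards against sensitive information disclosure by requiring at least l distinct sensitive values in each such group. This is a minimal illustration under stated assumptions, not the chapter's own method: the function names, the attribute names (`age`, `zip`, `diagnosis`) and the already-generalized records are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable, Mapping, Sequence

Record = Mapping[str, object]


def _qi_key(record: Record, quasi_identifiers: Sequence[str]) -> tuple:
    # Project a record onto its quasi-identifier values.
    return tuple(record[attr] for attr in quasi_identifiers)


def is_k_anonymous(records: Iterable[Record], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k records."""
    group_sizes = Counter(_qi_key(r, quasi_identifiers) for r in records)
    return all(size >= k for size in group_sizes.values())


def is_distinct_l_diverse(records: Iterable[Record], quasi_identifiers: Sequence[str],
                          sensitive: str, l: int) -> bool:
    """True if every quasi-identifier group has at least l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        groups[_qi_key(r, quasi_identifiers)].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical demographic records after generalizing age and zip code.
table = [
    {"age": "20-29", "zip": "130**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "130**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "148**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "148**", "diagnosis": "diabetes"},
]
qis = ["age", "zip"]
print(is_k_anonymous(table, qis, k=2))                      # True
print(is_k_anonymous(table, qis, k=3))                      # False
print(is_distinct_l_diverse(table, qis, "diagnosis", l=2))  # True
```

Each generalized group holds two records with two different diagnoses, so the table satisfies k = 2 and l = 2; reaching k = 3 would require coarser generalization or suppression.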

Notes

  1. ICD-9 codes are described in the International Classification of Diseases, Ninth Revision, Clinical Modification, http://www.cdc.gov/nchs/icd/icd9cm.htm

  2. The identifier is used only for reference and may be omitted if this is clear from the context.

  3. http://hapmap.ncbi.nlm.nih.gov/

  4. Minor Allele Frequencies (MAFs) are the frequencies at which the less common allele occurs in a given population.
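The MAF definition above can be turned into a small calculation. The sketch below relies on assumptions the note itself does not state: a biallelic SNP in a diploid population, with each individual's genotype coded as the number (0, 1 or 2) of copies of one arbitrarily chosen allele. The frequency p of that allele is then its count among the 2n alleles of n individuals, and the MAF is the smaller of p and 1 - p. The function name and the sample genotypes are hypothetical.

```python
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """Sample MAF at one biallelic SNP.

    Assumes diploid individuals whose genotypes are coded as the number of
    copies (0, 1 or 2) of an arbitrarily chosen allele.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded as 0, 1 or 2")
    # Frequency of the chosen allele among the 2n alleles of n individuals.
    p = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is less common.
    return min(p, 1.0 - p)


print(minor_allele_frequency([0, 1, 2, 0, 0]))  # 0.3
print(minor_allele_frequency([2, 2, 2, 1]))     # 0.125
```

Counting the other allele instead gives the same result, because taking the minimum of p and 1 - p makes the choice of reference allele irrelevant.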

Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Gkoulalas-Divanis, A., Loukides, G. (2013). Overview of Patient Data Anonymization. In: Anonymization of Electronic Medical Records to Support Clinical Analysis. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5668-1_2

  • DOI: https://doi.org/10.1007/978-1-4614-5668-1_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-5667-4

  • Online ISBN: 978-1-4614-5668-1

  • eBook Packages: Engineering, Engineering (R0)
