Abstract
This chapter presents an overview of anonymization techniques that can be used to protect different types of patient data. In Section 2.1, we first discuss anonymization principles that can prevent identity and sensitive information disclosure in demographic data publishing, as well as algorithms for enforcing these principles. Subsequently, in Section 2.2, we turn our attention to diagnosis code anonymization, which, somewhat surprisingly, has not been considered by the medical informatics community. After motivating the need for anonymizing diagnosis codes, we present several related principles and transformation models that have been proposed by the data management community. We then review anonymization algorithms, detailing how they use these principles and transformation models. Last, in Section 2.3, we examine genomic data, which are also susceptible to privacy attacks.
Notes
- 1. ICD-9 codes are described in the International Classification of Diseases, Ninth Revision – Clinical Modification, http://www.cdc.gov/nchs/icd/icd9cm.htm
- 2. The identifier is used only for reference and may be omitted, if this is clear from the context.
- 3.
- 4. Minor Allele Frequencies (MAFs) are the frequencies at which the less common allele occurs in a given population.
© 2013 The Author(s)
About this chapter
Cite this chapter
Gkoulalas-Divanis, A., Loukides, G. (2013). Overview of Patient Data Anonymization. In: Anonymization of Electronic Medical Records to Support Clinical Analysis. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5668-1_2
DOI: https://doi.org/10.1007/978-1-4614-5668-1_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5667-4
Online ISBN: 978-1-4614-5668-1
eBook Packages: Engineering (R0)