Improving privacy preservation policy in the modern information age
Anonymization, or de-identification, techniques are methods for protecting the privacy of human subjects in sensitive data sets while preserving the utility of those data sets. In the case of health data, anonymization techniques may be used to remove or mask patient identities while allowing the health data content to be used by the medical and pharmaceutical research community. The efficacy of anonymization methods has come under repeated attack, and several researchers have shown that anonymized data can be re-identified to reveal the identities of data subjects via approaches such as "linking," in which an anonymized data set is joined to a publicly available one on the attributes they share. Nevertheless, despite these deficiencies, many government privacy policies still depend on anonymization techniques as the primary approach to preserving privacy. In this report, we survey the anonymization landscape and consider the range of anonymization approaches that can be used to de-identify data containing personally identifiable information. We then review several notable government privacy policies that leverage anonymization. In particular, we examine the European Union's General Data Protection Regulation (GDPR) and show that it takes a more goal-oriented approach to data privacy: it defines privacy in terms of a desired outcome (a defense against the risk of personal data disclosure) and is agnostic to the actual method of privacy preservation. The GDPR goes further, framing its privacy-preservation regulations relative to the state of the art, the cost of implementation, the risks incurred, and the context of data processing. This has potential implications for the GDPR's robustness to future technological innovation, in marked contrast to privacy regulations that depend explicitly on more definite technical specifications.
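The "linking" attack described above amounts to a join on quasi-identifiers: attributes such as ZIP code, birth date, and sex that survive de-identification but also appear in public records. The sketch below illustrates the idea; all records, names, and field names are invented for this example, and real attacks operate at population scale rather than on toy lists.

```python
# Hypothetical sketch of a "linking" re-identification attack:
# an "anonymized" health data set (names removed) is joined to a
# public voter-style list on shared quasi-identifiers.

anonymized_health = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1971-02-18", "sex": "M", "diagnosis": "asthma"},
]

public_voter_list = [
    {"name": "A. Example", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "B. Sample", "zip": "02141", "dob": "1980-05-02", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")

def link(health_rows, voter_rows, keys=QUASI_IDENTIFIERS):
    """Re-identify health records whose quasi-identifiers match
    exactly one entry in the public list."""
    reidentified = []
    for h in health_rows:
        matches = [v for v in voter_rows
                   if all(v[k] == h[k] for k in keys)]
        if len(matches) == 1:  # a unique match reveals the identity
            reidentified.append({"name": matches[0]["name"], **h})
    return reidentified

print(link(anonymized_health, public_voter_list))
```

In this toy data the first health record matches exactly one voter entry, attaching a name to a diagnosis that was supposedly de-identified. Defenses such as k-anonymity work by generalizing or suppressing quasi-identifiers so that no record is unique on them.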
Keywords: Privacy · Digital privacy · Data privacy · Data utility · Anonymization · De-identification · Data management · HIPAA · GDPR
We would like to thank Marjory Blumenthal and Rebecca Balebako for their detailed and thoughtful review of early drafts of this document. We are immensely grateful for their comments and feedback. Any errors contained herein are our own and should not be attributed to them.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.