Differentially Private Data Sets Based on Microaggregation and Record Perturbation
We present an approach to generating differentially private data sets that consists of adding noise to a microaggregated version of the original data set. This idea has already been proposed in the literature as a way to reduce data sensitivity and hence the noise required to reach differential privacy. The novelty of our approach is that we treat the microaggregated data set as the target of protection, rather than protecting the original data set and viewing the microaggregated data set as a mere intermediate step. As a result, we avoid the complexities inherent in the insensitive microaggregation used in previous contributions, and we significantly improve the utility of the data. This claim is supported by theoretical and empirical utility comparisons between our approach and existing approaches.
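As a rough illustration of the two steps named above (microaggregate first, then perturb the microaggregated records), a minimal Python sketch follows. It is not the authors' algorithm. It assumes a single numerical attribute with known bounds `lower` and `upper`, a simple fixed-size microaggregation over sorted values, and independent Laplace noise per record with scale `(upper - lower) / epsilon`. The function names and the noise calibration are hypothetical placeholders; the calibration that actually yields differential privacy depends on the sensitivity analysis in the paper.

```python
import random


def microaggregate(values, k):
    """Fixed-size univariate microaggregation (illustrative assumption).

    Sorts the values, groups k consecutive ones, and replaces each value
    with the mean of its group. A remainder smaller than k is merged into
    the last group, so every group has at least k members when
    len(values) >= k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # too few records left for another full group
            end = n
        group = order[start:end]
        centroid = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = centroid
        start = end
    return result


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(records, lower, upper, epsilon, rng):
    """Add independent Laplace noise to each microaggregated record.

    The scale (upper - lower) / epsilon is a placeholder assumption,
    not the calibration derived in the paper.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = (upper - lower) / epsilon
    return [v + laplace_noise(scale, rng) for v in records]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    data = [1, 2, 3, 10, 11, 12, 13]
    centroids = microaggregate(data, k=3)
    print(centroids)
    print(perturb(centroids, lower=0, upper=20, epsilon=1.0, rng=rng))
```

In this sketch the noise is applied to the microaggregated values, which reflects the idea of taking the microaggregated data set, rather than the original one, as the object to protect.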
Keywords: Anonymization, Differential privacy, Microaggregation, Privacy
Acknowledgments and Disclaimer
Partial support for this work was received from the European Commission (projects H2020-644024 “CLARUS” and H2020-700540 “CANVAS”), the Government of Catalonia (ICREA Acadèmia Prize to J. Domingo-Ferrer), and the Spanish Government (projects TIN2014-57364-C2-1-R “SmartGlacis” and TIN2015-70054-REDC). The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are their own and are not necessarily shared by UNESCO.