Abstract
Differential privacy is a privacy model for anonymization that offers more robust privacy guarantees than previous models, such as k-anonymity and its extensions. However, it is often overlooked that the utility of differentially private outputs is quite limited, either because of the amount of noise that must be added to obtain them or because utility is preserved only for a restricted class of queries. In contrast, k-anonymity-like anonymization offers general-purpose data releases that make no assumptions about how the protected data will be used. This paper proposes a mechanism for general-purpose differentially private data releases, with a specific focus on preserving the utility of the protected data. Our proposal relies on univariate microaggregation to reduce the amount of noise needed to satisfy differential privacy. The theoretical benefits of the proposal are illustrated in a practical setting.
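The intuition behind using microaggregation to reduce noise can be sketched as follows. This is an illustration under stated assumptions, not the mechanism proposed in the paper: the function names (`microaggregate`, `noise_scale`, `laplace_noise`), the attribute bounds, and the sample values are all hypothetical. It assumes a numerical attribute bounded in [lo, hi] and the standard Laplace mechanism, whose noise scale is the sensitivity divided by epsilon. It also assumes that group membership stays fixed when one record changes, so that the mean of a group of k records moves by at most (hi - lo) / k. A real mechanism has to justify that assumption, and the sketch by itself does not establish differential privacy.

```python
import random


def laplace_noise(scale, rng):
    """Draw a Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def microaggregate(values, k):
    """Univariate microaggregation: sort the values, form groups of k
    consecutive ones, and replace each value by its group mean.
    The last group absorbs the remainder, so it may hold up to 2k - 1 values."""
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = max(1, len(values) // k)
    out = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        idx = order[start:end]
        mean = sum(values[i] for i in idx) / len(idx)
        for i in idx:
            out[i] = mean
    return out


def noise_scale(lo, hi, epsilon, k=1):
    """Laplace scale for a value bounded in [lo, hi].
    k=1 corresponds to a raw record; k>1 to the mean of k records,
    ASSUMING fixed groups (hypothetical simplification)."""
    return (hi - lo) / (k * epsilon)


# Hypothetical ages bounded in [0, 100], with epsilon = 1.0 and k = 3.
ages = [21, 35, 29, 62, 47, 50, 33]
centroids = microaggregate(ages, k=3)
scale_raw = noise_scale(0, 100, epsilon=1.0)        # scale for raw records
scale_agg = noise_scale(0, 100, epsilon=1.0, k=3)   # scale for group means

rng = random.Random(0)
masked = [c + laplace_noise(scale_agg, rng) for c in centroids]
```

Under these assumptions the noise scale for group means is k times smaller than for raw records, at the cost of the information lost by replacing records with their group mean. The trade-off between these two error sources is what the choice of k controls.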
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sánchez, D., Domingo-Ferrer, J., Martínez, S. (2014). Improving the Utility of Differential Privacy via Univariate Microaggregation. In: Domingo-Ferrer, J. (eds) Privacy in Statistical Databases. PSD 2014. Lecture Notes in Computer Science, vol 8744. Springer, Cham. https://doi.org/10.1007/978-3-319-11257-2_11
DOI: https://doi.org/10.1007/978-3-319-11257-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11256-5
Online ISBN: 978-3-319-11257-2
eBook Packages: Computer Science (R0)