Abstract
Despite the tremendous privacy and liability benefits of anonymization, most shared data today is only pseudonymized. The reason is simple: no existing anonymization technology is general purpose, easy to use, and able to preserve data quality. This paper presents the design of Diffix, a new approach to database anonymization that promises to break new ground in the utility/privacy trade-off. Diffix acts as an SQL proxy between the analyst and an unmodified live database. Diffix adds a minimal amount of noise to answers (Gaussian with a standard deviation of only two for counting queries) and places no limit on the number of queries an analyst may make. Diffix works with any type of data, and configuration is simple and data-independent: the administrator does not need to consider the identifiability or sensitivity of the data itself. This paper presents a high-level but complete description of Diffix. It motivates the design through examples of attacks and defenses, and provides some evidence for how Diffix can provide strong anonymity with such low noise levels.
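To give a feel for how small the stated noise is, the following minimal sketch perturbs a counting-query answer with zero-mean Gaussian noise of standard deviation two. It is not Diffix's actual mechanism: the abstract specifies only the noise distribution, so the function name `noisy_count`, the constant `NOISE_SD`, the rounding step, and the example count of 1000 are all illustrative assumptions. The sketch draws fresh noise on every call and does not attempt to model how Diffix copes with an unlimited number of queries.

```python
import random

# Standard deviation stated in the abstract for counting queries.
NOISE_SD = 2.0


def noisy_count(true_count, rng):
    """Return true_count plus zero-mean Gaussian noise, rounded to an integer.

    Hypothetical helper for illustration only; rounding is an assumption.
    """
    return round(true_count + rng.gauss(0.0, NOISE_SD))


# Simulate many noisy answers for a hypothetical true count of 1000.
rng = random.Random(0)
samples = [noisy_count(1000, rng) for _ in range(10_000)]

# Empirical mean and standard deviation of the noisy answers.
mean = sum(samples) / len(samples)
sd = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(f"mean={mean:.2f}, sd={sd:.2f}")
```

For a count in the thousands, noise on this scale changes the answer by a fraction of a percent, which is the sense in which the abstract calls it minimal.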
Notes
- 1. For example, a state-of-the-art anonymization tool is ARX [9]. Using this tool requires, among many other things, that the user be able to classify data as identifying, quasi-identifying, sensitive, or insensitive; create masking-based, interval-based, or order-based generalization hierarchies; and understand and configure privacy models such as \(\delta\)-presence, l-diversity, t-closeness, \(\delta\)-disclosure, k-Anonymity, k-Map, (\(\epsilon\),\(\delta\))-differential privacy, and risk-based privacy models for prosecutor, journalist, and marketer risks.
- 2.
- 3. Distributions other than Gaussian may serve better, but in any event the noise is small and so we haven’t yet explored this question.
References
1. Article 29 Data Protection Working Party: Opinion 05/2014 on Anonymisation Techniques. http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
2. Denning, D.E.: Secure statistical databases with random sample queries. ACM Trans. Database Syst. (TODS) 5(3), 291–315 (1980)
3. Denning, D.E., Denning, P.J., Schwartz, M.D.: The tracker: a threat to statistical database security. ACM Trans. Database Syst. (TODS) 4(1), 76–96 (1979)
4. Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 202–210. ACM (2003)
5. Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006). doi:10.1007/11787006_1
6. Erlingsson, Ú., Pihur, V., Korolova, A.: RAPPOR: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054–1067. ACM (2014)
7. Fellegi, I., Phillips, J.: Statistical confidentiality: some theory and application to data dissemination. In: Annals of Economic and Social Measurement, vol. 3, no. 2, pp. 399–409. NBER (1974)
8. Fellegi, I.P.: On the question of statistical confidentiality. J. Am. Stat. Assoc. 67(337), 7–18 (1972)
9. Prasser, F., Kohlmayer, F.: Putting statistical disclosure control into practice: the ARX data anonymization tool. In: Gkoulalas-Divanis, A., Loukides, G. (eds.) Medical Data Privacy Handbook, pp. 111–148. Springer, Cham (2015). doi:10.1007/978-3-319-23633-9_6
10. Kotschy, W.: The new General Data Protection Regulation - Is there sufficient pay-off for taking the trouble to anonymize or pseudonymize data? November 2016. https://fpf.org/wp-content/uploads/2016/11/Kotschy-paper-on-pseudonymisation.pdf
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Francis, P., Probst Eide, S., Munz, R. (2017). Diffix: High-Utility Database Anonymization. In: Schweighofer, E., Leitold, H., Mitrakas, A., Rannenberg, K. (eds) Privacy Technologies and Policy. APF 2017. Lecture Notes in Computer Science, vol 10518. Springer, Cham. https://doi.org/10.1007/978-3-319-67280-9_8
DOI: https://doi.org/10.1007/978-3-319-67280-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67279-3
Online ISBN: 978-3-319-67280-9
eBook Packages: Computer Science, Computer Science (R0)