Diffix: High-Utility Database Anonymization

Francis, Paul; Probst Eide, Sebastian; Munz, Reinhard

doi:10.1007/978-3-319-67280-9_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10518)

Included in the following conference series:

Annual Privacy Forum

Abstract

In spite of the tremendous privacy and liability benefits of anonymization, most shared data today is only pseudonymized. The reason is simple: there haven’t been any anonymization technologies that are general purpose, easy to use, and preserve data quality. This paper presents the design of Diffix, a new approach to database anonymization that promises to break new ground in the utility/privacy trade-off. Diffix acts as an SQL proxy between the analyst and an unmodified live database. Diffix adds a minimal amount of noise to answers—Gaussian with a standard deviation of only two for counting queries—and places no limit on the number of queries an analyst may make. Diffix works with any type of data and configuration is simple and data-independent: the administrator does not need to consider the identifiability or sensitivity of the data itself. This paper presents a high-level but complete description of Diffix. It motivates the design through examples of attacks and defenses, and provides some evidence for how Diffix can provide strong anonymity with such low noise levels.
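
To give a feel for the noise level quoted above, the following is a minimal sketch that perturbs a count with zero-mean Gaussian noise of standard deviation two. It is an illustration only, not Diffix's algorithm: the names `noisy_count` and `summarize`, the rounding, and the clamping at zero are assumptions made here. The sketch also draws fresh noise on every call, which repeated queries could average away, so it does not capture how Diffix supports an unlimited number of queries.

```python
import random
import statistics
from typing import Optional, Tuple


def noisy_count(true_count: int, sd: float = 2.0,
                rng: Optional[random.Random] = None) -> int:
    """Perturb a count with zero-mean Gaussian noise (hypothetical helper).

    The result is rounded to an integer and clamped at zero; both choices
    are assumptions of this sketch, not details taken from the paper.
    """
    rng = rng or random.Random()
    return max(0, round(true_count + rng.gauss(0.0, sd)))


def summarize(true_count: int, trials: int, seed: int = 0) -> Tuple[float, float]:
    """Return the mean and sample standard deviation of repeated noisy counts."""
    rng = random.Random(seed)
    samples = [noisy_count(true_count, rng=rng) for _ in range(trials)]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    mean, spread = summarize(true_count=100, trials=10_000)
    print(f"mean={mean:.2f} spread={spread:.2f}")
```

For a true count of 100, the reported answers cluster within a few units of the true value, which is the sense in which a standard deviation of two is a small distortion for aggregate counts.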

Notes

1. For example, a state-of-the-art anonymization tool is ARX [9]. Using this tool requires, among many other things, that the user be able to classify data as identifying, quasi-identifying, sensitive, and insensitive; create masking-based, interval-based, or order-based generalization hierarchies; and understand and configure privacy models such as \(\delta\)-presence, l-diversity, t-closeness, \(\delta\)-disclosure, k-Anonymity, k-Map, \((\epsilon, \delta)\)-differential privacy, and risk-based privacy models for prosecutor, journalist, and marketer risks.
2. Despite the thousands of papers on anonymity, we could find only two that try to add noise in a way that depends on the data [2, 6]. This is why the paper cites so little related work.
3. Distributions other than Gaussian may serve better, but in any event the noise is small, so we have not yet explored this question.

References

1. Article 29 Data Protection Working Party: Opinion 05/2014 on Anonymisation Techniques. http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
2. Denning, D.E.: Secure statistical databases with random sample queries. ACM Trans. Database Syst. (TODS) 5(3), 291–315 (1980)
3. Denning, D.E., Denning, P.J., Schwartz, M.D.: The tracker: a threat to statistical database security. ACM Trans. Database Syst. (TODS) 4(1), 76–96 (1979)
4. Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 202–210. ACM (2003)
5. Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006). doi:10.1007/11787006_1
6. Erlingsson, Ú., Pihur, V., Korolova, A.: RAPPOR: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054–1067. ACM (2014)
7. Fellegi, I., Phillips, J.: Statistical confidentiality: some theory and application to data dissemination. In: Annals of Economic and Social Measurement, vol. 3, no. 2, pp. 399–409. NBER (1974)
8. Fellegi, I.P.: On the question of statistical confidentiality. J. Am. Stat. Assoc. 67(337), 7–18 (1972)
9. Prasser, F., Kohlmayer, F.: Putting statistical disclosure control into practice: the ARX data anonymization tool. In: Gkoulalas-Divanis, A., Loukides, G. (eds.) Medical Data Privacy Handbook, pp. 111–148. Springer, Cham (2015). doi:10.1007/978-3-319-23633-9_6
10. Kotschy, W.: The new General Data Protection Regulation - Is there sufficient pay-off for taking the trouble to anonymize or pseudonymize data? November 2016. https://fpf.org/wp-content/uploads/2016/11/Kotschy-paper-on-pseudonymisation.pdf

Author information

Authors and Affiliations

Max Planck Institute for Software Systems, Kaiserslautern, Saarbrücken, Germany
Paul Francis & Reinhard Munz
Aircloak GmbH, Kaiserslautern, Germany
Sebastian Probst Eide

Corresponding author

Correspondence to Paul Francis.

Editor information

Editors and Affiliations

Centre for Legal Informatics, University of Vienna, Vienna, Austria
Erich Schweighofer
A-SIT, Graz, Austria
Herbert Leitold
European Union Agency for Network and Information Security, Heraklion, Greece
Andreas Mitrakas
Goethe University Frankfurt, Frankfurt, Hessen, Germany
Kai Rannenberg

Cite this paper

Francis, P., Probst Eide, S., Munz, R. (2017). Diffix: High-Utility Database Anonymization. In: Schweighofer, E., Leitold, H., Mitrakas, A., Rannenberg, K. (eds) Privacy Technologies and Policy. APF 2017. Lecture Notes in Computer Science, vol 10518. Springer, Cham. https://doi.org/10.1007/978-3-319-67280-9_8

DOI: https://doi.org/10.1007/978-3-319-67280-9_8
Published: 11 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67279-3
Online ISBN: 978-3-319-67280-9
eBook Packages: Computer Science, Computer Science (R0)
