Diffix: High-Utility Database Anonymization

  • Paul Francis
  • Sebastian Probst Eide
  • Reinhard Munz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10518)

Abstract

In spite of the tremendous privacy and liability benefits of anonymization, most shared data today is only pseudonymized. The reason is simple: there haven’t been any anonymization technologies that are general purpose, easy to use, and preserve data quality. This paper presents the design of Diffix, a new approach to database anonymization that promises to break new ground in the utility/privacy trade-off. Diffix acts as an SQL proxy between the analyst and an unmodified live database. Diffix adds a minimal amount of noise to answers (Gaussian with a standard deviation of only two for counting queries) and places no limit on the number of queries an analyst may make. Diffix works with any type of data, and configuration is simple and data-independent: the administrator does not need to consider the identifiability or sensitivity of the data itself. This paper presents a high-level but complete description of Diffix. It motivates the design through examples of attacks and defenses and provides some evidence for how Diffix can provide strong anonymity with such low noise levels.
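To give a sense of scale for the noise level quoted above, the following Python sketch adds zero-mean Gaussian noise with standard deviation 2 to a true count. It is an illustration only and not Diffix's mechanism: the helper names `noisy_count` and `averaged_answer`, the constant `SIGMA`, the example count of 137, and the use of fresh, independent noise on every call are all assumptions of this sketch. With a standard deviation of 2, roughly 95% of single answers land within ±1.96 × 2 ≈ 3.9 of the true count. The sketch also hints at why allowing unlimited queries is a real challenge for any noise-based scheme: if every answer carried fresh independent noise, averaging n repeated answers would shrink the error to roughly 2/√n.

```python
import random
import statistics

# Standard deviation quoted in the abstract for counting queries.
SIGMA = 2.0


def noisy_count(true_count: int, rng: random.Random) -> float:
    """Hypothetical helper: true_count plus zero-mean Gaussian noise (sd = SIGMA).

    Illustration only. It draws fresh, independent noise on every call,
    which would be unsafe if an analyst could repeat the query without limit.
    """
    return true_count + rng.gauss(0.0, SIGMA)


def averaged_answer(true_count: int, repeats: int, rng: random.Random) -> float:
    """Hypothetical helper: average `repeats` independently noised answers."""
    return statistics.fmean(noisy_count(true_count, rng) for _ in range(repeats))


if __name__ == "__main__":
    rng = random.Random(42)
    true_count = 137  # arbitrary example value
    # One answer: about 95% of answers fall within +/- 1.96 * SIGMA of the truth.
    print("single answer:", round(noisy_count(true_count, rng), 2))
    # 10,000 fresh-noise answers: the expected error of the mean is about
    # SIGMA / sqrt(10000) = 0.02, so the average converges on the true count.
    print("average of 10,000 answers:", round(averaged_answer(true_count, 10_000, rng), 2))
```

The averaged answer converging on the true count is the kind of averaging attack that a system placing no limit on queries has to withstand; this sketch deliberately does not model any such defense.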

Keywords

Privacy, Anonymity, Analytics, Database

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Paul Francis (1)
  • Sebastian Probst Eide (2)
  • Reinhard Munz (1)
  1. Max Planck Institute for Software Systems, Kaiserslautern and Saarbrücken, Germany
  2. Aircloak GmbH, Kaiserslautern, Germany
