Diffix: High-Utility Database Anonymization

  • Conference paper
Privacy Technologies and Policy (APF 2017)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10518)

Abstract

In spite of the tremendous privacy and liability benefits of anonymization, most shared data today is only pseudonymized. The reason is simple: there haven’t been any anonymization technologies that are general purpose, easy to use, and preserve data quality. This paper presents the design of Diffix, a new approach to database anonymization that promises to break new ground in the utility/privacy trade-off. Diffix acts as an SQL proxy between the analyst and an unmodified live database. Diffix adds a minimal amount of noise to answers (Gaussian with a standard deviation of only two for counting queries) and places no limit on the number of queries an analyst may make. Diffix works with any type of data, and configuration is simple and data-independent: the administrator does not need to consider the identifiability or sensitivity of the data itself. This paper presents a high-level but complete description of Diffix. It motivates the design through examples of attacks and defenses, and provides some evidence for how Diffix can provide strong anonymity with such low noise levels.
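
To give a feel for the noise level quoted in the abstract, the following minimal sketch adds zero-mean Gaussian noise with a standard deviation of two to a counting-query answer. It is an illustration only, not the Diffix algorithm: the abstract does not describe how Diffix generates or seeds its noise, and the names `NOISE_SD` and `noisy_count` are hypothetical.

```python
import random

# Standard deviation quoted in the abstract for counting queries.
NOISE_SD = 2.0


def noisy_count(true_count, rng):
    """Return true_count plus zero-mean Gaussian noise (illustrative only).

    This is plain independent noise; it is an assumption made for the
    sketch and is not claimed to match how Diffix produces its noise.
    """
    return true_count + rng.gauss(0.0, NOISE_SD)


# Repeat the same counting query many times to see the spread of answers.
rng = random.Random(0)
samples = [noisy_count(1000, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
sd = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(f"mean={mean:.2f} sd={sd:.2f}")
```

With a true count of 1000, noise of this size shifts a typical answer by only a few units, which is the sense in which the abstract calls the noise minimal.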

Notes

  1. For example, a state-of-the-art anonymization tool is ARX [9]. Using this tool requires, among many other things, that the user can classify data as identifying, quasi-identifying, sensitive, and insensitive; can create masking-based, interval-based, or order-based generalization hierarchies; and can understand and configure privacy models such as \(\delta\)-presence, l-diversity, t-closeness, \(\delta\)-disclosure, k-Anonymity, k-Map, (\(\epsilon\), \(\delta\))-differential privacy, and risk-based privacy models for prosecutor, journalist, and marketer risks.

  2. Despite the thousands of papers on anonymity, we could only find two that try to add noise in a way that depends on the data [2, 6]. This is why the paper cites so little related work.

  3. Distributions other than Gaussian may serve better, but in any event the noise is small and so we haven’t yet explored this question.

References

  1. Article 29 Data Protection Working Party Opinion 05/2014 on Anonymisation Techniques. http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf

  2. Denning, D.E.: Secure statistical databases with random sample queries. ACM Trans. Database Syst. (TODS) 5(3), 291–315 (1980)

  3. Denning, D.E., Denning, P.J., Schwartz, M.D.: The tracker: a threat to statistical database security. ACM Trans. Database Syst. (TODS) 4(1), 76–96 (1979)

  4. Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 202–210. ACM (2003)

  5. Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006). doi:10.1007/11787006_1

  6. Erlingsson, Ú., Pihur, V., Korolova, A.: RAPPOR: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054–1067. ACM (2014)

  7. Fellegi, I., Phillips, J.: Statistical confidentiality: some theory and application to data dissemination. In: Annals of Economic and Social Measurement, vol. 3, no. 2, pp. 399–409. NBER (1974)

  8. Fellegi, I.P.: On the question of statistical confidentiality. J. Am. Stat. Assoc. 67(337), 7–18 (1972)

  9. Prasser, F., Kohlmayer, F.: Putting statistical disclosure control into practice: the ARX data anonymization tool. In: Gkoulalas-Divanis, A., Loukides, G. (eds.) Medical Data Privacy Handbook, pp. 111–148. Springer, Cham (2015). doi:10.1007/978-3-319-23633-9_6

  10. Kotschy, W.: The new General Data Protection Regulation - Is there sufficient pay-off for taking the trouble to anonymize or pseudonymize data? November 2016. https://fpf.org/wp-content/uploads/2016/11/Kotschy-paper-on-pseudonymisation.pdf

Author information

Corresponding author

Correspondence to Paul Francis.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Francis, P., Probst Eide, S., Munz, R. (2017). Diffix: High-Utility Database Anonymization. In: Schweighofer, E., Leitold, H., Mitrakas, A., Rannenberg, K. (eds) Privacy Technologies and Policy. APF 2017. Lecture Notes in Computer Science, vol 10518. Springer, Cham. https://doi.org/10.1007/978-3-319-67280-9_8

  • DOI: https://doi.org/10.1007/978-3-319-67280-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67279-3

  • Online ISBN: 978-3-319-67280-9

  • eBook Packages: Computer Science (R0)
