Diffix: High-Utility Database Anonymization
In spite of the tremendous privacy and liability benefits of anonymization, most shared data today is only pseudonymized. The reason is simple: there haven’t been any anonymization technologies that are general purpose, easy to use, and preserve data quality. This paper presents the design of Diffix, a new approach to database anonymization that promises to break new ground in the utility/privacy trade-off. Diffix acts as an SQL proxy between the analyst and an unmodified live database. Diffix adds a minimal amount of noise to answers—Gaussian with a standard deviation of only two for counting queries—and places no limit on the number of queries an analyst may make. Diffix works with any type of data and configuration is simple and data-independent: the administrator does not need to consider the identifiability or sensitivity of the data itself. This paper presents a high-level but complete description of Diffix. It motivates the design through examples of attacks and defenses, and provides some evidence for how Diffix can provide strong anonymity with such low noise levels.
KeywordsPrivacy Anonymity Analytics Database
- 1.Article 29 Data Protection Working Party Opinion 05/2014 on Anonymisation Techniques. http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
- 4.Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 202–210. ACM (2003)Google Scholar
- 6.Erlingsson, Ú., Pihur, V., Korolova, A.: RAPPOR: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054–1067. ACM (2014)Google Scholar
- 7.Fellegi, I., Phillips, J.: Statistical confidentiality: some theory and application to data dissemination. In: Annals of Economic and Social Measurement, vol. 3, no. 2, pp. 399–409. NBER (1974)Google Scholar
- 10.Kotschy, W.: The new General Data Protection Regulation - Is there sufficient pay-off for taking the trouble to anonymize or pseudonymize data? November 2016. https://fpf.org/wp-content/uploads/2016/11/Kotschy-paper-on-pseudonymisation.pdf