Calibrating Noise to Sensitivity in Private Data Analysis
We continue a line of research initiated in [10,11]on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user.
Previous work focused on the case of noisy sums, in which f = ∑ i g(x i ), where x i denotes the ith row of the database and g maps database rows to [0,1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case.
The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over non-interactive.
KeywordsLaplace Distribution True Answer Query Function Privacy Breach Semantic Security
- 1.Adam, N.R., Wortmann, J.C.: Security-control methods for statistical databases: a comparative study. ACM Computing Surveys 25(4) (December 1989)Google Scholar
- 2.Agrawal, D., Aggarwal, C.C.: On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, New York (2001)Google Scholar
- 3.Agrawal, R., Srikant, R.: Privacy-preserving data mining. In: Chen, W., Naughton, J.F., Bernstein, P.A. (eds.) SIGMOD Conference, pp. 439–450. ACM, New York (2000)Google Scholar
- 4.Ben-Sasson, E., Harsha, P., Raskhodnikova, S.: Some 3cnf properties are hard to test. In: STOC, pp. 345–354. ACM, New York (2000)Google Scholar
- 5.Web page for the Bertinoro CS-Statistics workshop on privacy and confidentiality (July 2005), Available from, http://www.stat.cmu.edu/~hwainer
- 6.Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: The sulq framework. In: PODS (2005)Google Scholar
- 7.Chawla, S., Dwork, C., McSherry, F., Smith, A., Wee, H.: Toward privacy in public databases. In: Theory of Cryptography Conference (TCC), pp. 363–385 (2005)Google Scholar
- 8.Chawla, S., Dwork, C., McSherry, F., Talwar, K.: On the utility of privacy-preserving histograms. In: 21st Conference on Uncertainty in Artificial Intelligence (UAI) (2005)Google Scholar
- 10.Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 202–210 (2003)Google Scholar
- 12.Evfimievski, A.V., Gehrke, J., Srikant, R.: Limiting privacy breaches in privacy preserving data mining. In: Proceedings of the Twenty- Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 211–222 (2003)Google Scholar
- 14.Roque, G.: Masking microdata with mixtures of normal distributions. University of California, Riverside (2000); Doctoral DissertationGoogle Scholar