Privacy-Preserving Datamining on Vertically Partitioned Databases

  • Cynthia Dwork
  • Kobbi Nissim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3152)

Abstract

In a recent paper Dinur and Nissim considered a statistical database in which a trusted database administrator monitors queries and adds noise to the responses with the goal of maintaining data privacy [5]. Under a rigorous definition of a privacy breach, Dinur and Nissim proved that unless the total number of queries is sub-linear in the size of the database, a substantial amount of noise is required to avoid a breach, rendering the database almost useless.
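
To make the query model concrete, here is a minimal Python sketch of a curator in the single-attribute setting of Dinur and Nissim, where the database is a vector of bits and a query asks for the sum of the bits over a subset of rows. The Gaussian noise with fixed standard deviation `sigma` and the hard cap `max_queries` (a stand-in for keeping the total number of queries sub-linear in the database size) are illustrative assumptions, not the paper's exact mechanism, and the class name `NoisyStatDB` is hypothetical.

```python
import random


class NoisyStatDB:
    """Hypothetical curator answering noisy subset-sum queries over d in {0,1}^n.

    Assumptions (illustrative only): additive Gaussian noise with a fixed
    standard deviation `sigma`, and a hard cap `max_queries` on the total
    number of queries, standing in for a sub-linear query budget.
    """

    def __init__(self, bits, sigma, max_queries, seed=None):
        if any(b not in (0, 1) for b in bits):
            raise ValueError("database entries must be 0 or 1")
        self._bits = list(bits)
        self._sigma = sigma
        self._remaining = max_queries
        self._rng = random.Random(seed)

    def query(self, subset):
        """Return sum_{i in subset} d_i plus noise; refuse once the budget is spent."""
        if self._remaining <= 0:
            raise RuntimeError("query budget exhausted")
        self._remaining -= 1
        # Duplicate indices are ignored: a query is a set of rows.
        true_sum = sum(self._bits[i] for i in set(subset))
        return true_sum + self._rng.gauss(0.0, self._sigma)
```

For instance, `NoisyStatDB([1, 0, 1, 1], sigma=1.0, max_queries=2).query({0, 2, 3})` returns 3 plus a Gaussian perturbation, and once two queries have been answered any further query raises `RuntimeError`.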

As databases grow increasingly large, restricting the total number of queries to be sub-linear in the size of the database becomes realistic. We investigate this regime further, generalizing the previous work in two important directions: multi-attribute databases (previous work dealt only with single-attribute databases) and vertically partitioned databases, in which different subsets of attributes are stored in different databases. In addition, we show how to use our techniques for datamining on published noisy statistics.
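
A vertically partitioned database can be pictured as several components that share the same rows (individuals) but each store a different subset of the attributes. The sketch below, using a hypothetical class `NoisyPartition` and made-up attribute names, lets each component answer noisy counting queries only over its own columns. Gaussian noise is again an illustrative assumption, and this is not the authors' protocol for combining answers across components.

```python
import random


class NoisyPartition:
    """Hypothetical component of a vertically partitioned database.

    It stores some attributes for every row (rows are shared across
    components by index) and answers noisy counts over its own attributes only.
    Gaussian noise with standard deviation `sigma` is an illustrative assumption.
    """

    def __init__(self, columns, sigma, seed=None):
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must cover the same rows")
        self._columns = {name: list(col) for name, col in columns.items()}
        self._n = lengths.pop() if lengths else 0
        self._sigma = sigma
        self._rng = random.Random(seed)

    def attributes(self):
        return sorted(self._columns)

    def count(self, rows, predicate):
        """Noisy count of rows i in `rows` whose local record satisfies `predicate`.

        The record passed to `predicate` maps only this component's attribute
        names to row i's values, so predicates on other attributes fail.
        """
        total = 0
        for i in set(rows):
            if not 0 <= i < self._n:
                raise IndexError(f"row {i} out of range")
            record = {name: col[i] for name, col in self._columns.items()}
            if predicate(record):
                total += 1
        return total + self._rng.gauss(0.0, self._sigma)


# Hypothetical data: four individuals, attributes split across two components.
db1 = NoisyPartition({"smoker": [1, 0, 1, 1], "over_50": [0, 0, 1, 1]}, sigma=1.0, seed=1)
db2 = NoisyPartition({"disease": [1, 0, 1, 0]}, sigma=1.0, seed=2)
```

Here `db1` can count smokers over 50, but a predicate mixing `smoker` and `disease` cannot be evaluated by either component alone (it raises `KeyError`); this is the kind of cross-database question that motivates studying vertically partitioned databases.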

Keywords

Data Privacy, Statistical Databases, Data Mining, Vertically Partitioned Databases

References

  1. Agrawal, D., Aggarwal, C.: On the Design and Quantification of Privacy Preserving Data Mining Algorithms. In: Proceedings of the 20th Symposium on Principles of Database Systems (2001)
  2. Adam, N.R., Wortmann, J.C.: Security-Control Methods for Statistical Databases: A Comparative Study. ACM Computing Surveys 21(4), 515–556 (1989)
  3. Agrawal, R., Srikant, R.: Privacy-preserving data mining. In: Proc. of the ACM SIGMOD Conference on Management of Data, pp. 439–450 (2000)
  4. Chawla, S., Dwork, C., McSherry, F., Smith, A., Wee, H.: Toward Privacy in Public Databases (submitted for publication) (2004)
  5. Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 202–210 (2003)
  6. Duncan, G.: Confidentiality and statistical disclosure limitation. In: Smelser, N., Baltes, P. (eds.) International Encyclopedia of the Social and Behavioral Sciences, Elsevier, New York (2001)
  7. Evfimievski, A.V., Gehrke, J., Srikant, R.: Limiting privacy breaches in privacy preserving data mining. In: Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 211–222 (2003)
  8. Fienberg, S.: Confidentiality and Data Protection Through Disclosure Limitation: Evolving Principles and Technical Advances. In: IAOS Conference on Statistics, Development and Human Rights (September 2000), available at http://www.statistik.admin.ch/about/international/fienberg_final_paper.doc
  9. Fienberg, S., Makov, U., Steele, R.: Disclosure Limitation and Related Methods for Categorical Data. Journal of Official Statistics 14, 485–502 (1998)
  10. Franconi, L., Merola, G.: Implementing Statistical Disclosure Control for Aggregated Data Released Via Remote Access, Working Paper No. 30, United Nations Statistical Commission and European Commission, joint ECE/EUROSTAT work session on statistical data confidentiality (April 2003), available at http://www.unece.org/stats/documents/2003/04/confidentiality/wp.30.e.pdf
  11. Goldwasser, S., Micali, S.: Probabilistic Encryption and How to Play Mental Poker Keeping Secret All Partial Information. In: STOC 1982, pp. 365–377 (1982)
  12. Raghunathan, T.E., Reiter, J.P., Rubin, D.B.: Multiple Imputation for Statistical Disclosure Limitation. Journal of Official Statistics 19(1), 1–16 (2003)
  13. Rubin, D.B.: Discussion: Statistical Disclosure Limitation. Journal of Official Statistics 9(2), 461–469 (1993)
  14. Shoshani, A.: Statistical databases: Characteristics, problems and some solutions. In: Proceedings of the 8th International Conference on Very Large Data Bases (VLDB 1982), pp. 208–222 (1982)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Cynthia Dwork (1)
  • Kobbi Nissim (1)
  1. Microsoft Research SVC, Mountain View, USA