PCPs and the Hardness of Generating Private Synthetic Data

  • Jonathan Ullman
  • Salil Vadhan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6597)


Assuming the existence of one-way functions, we show that there is no polynomial-time, differentially private algorithm \(\mathcal{A}\) that takes a database \(D \in (\{0,1\}^d)^n\) and outputs a “synthetic database” \(\widehat{D}\) all of whose two-way marginals are approximately equal to those of D. (A two-way marginal is the fraction of database rows \(x \in \{0,1\}^d\) with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS ’07), who gave an algorithm running in time \(\mathrm{poly}(n, 2^d)\).
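To make the definition of a two-way marginal concrete, the following is a minimal Python sketch that computes all two-way marginals of a small binary database by brute force and compares them with those of a candidate synthetic database. The function names `two_way_marginals` and `max_marginal_error` and the toy databases are illustrative assumptions, not part of the paper, and the sketch involves no privacy mechanism at all.

```python
from itertools import combinations, product


def two_way_marginals(db):
    """Map (i, j, a, b) to the fraction of rows x with x[i] == a and x[j] == b.

    `db` is a non-empty list of equal-length tuples over {0, 1}.
    """
    n = len(db)
    d = len(db[0])
    marginals = {}
    for i, j in combinations(range(d), 2):        # every pair of columns
        for a, b in product((0, 1), repeat=2):    # every pair of values
            count = sum(1 for x in db if x[i] == a and x[j] == b)
            marginals[(i, j, a, b)] = count / n
    return marginals


def max_marginal_error(db, synthetic):
    """Largest absolute difference between corresponding two-way marginals."""
    m_db = two_way_marginals(db)
    m_syn = two_way_marginals(synthetic)
    return max(abs(m_db[key] - m_syn[key]) for key in m_db)


# Hypothetical toy data: n = 4 rows, d = 3 columns.
D = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1)]
D_hat = [(1, 1, 1), (1, 0, 1)]  # a candidate synthetic database

marginals = two_way_marginals(D)
error = max_marginal_error(D, D_hat)
```

Here `marginals[(0, 1, 1, 1)]` is the fraction of rows of `D` whose first two columns both equal 1. The hardness result concerns producing a `D_hat` with small `error` privately and in polynomial time, not the cost of evaluating the marginals themselves.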

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC ’09) with encodings based on probabilistically checkable proofs.

We also present both negative and positive results for generating “relaxed” synthetic data, where the fraction of rows in D satisfying a predicate c is estimated by applying c to each row of \(\widehat{D}\) and aggregating the results in some way.
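As a rough illustration of the relaxed notion, the sketch below estimates the fraction of rows satisfying a predicate by applying the predicate to each synthetic row and passing the results to an aggregation function. The names `true_fraction`, `relaxed_estimate`, and `average` are assumptions made for this example. The abstract does not specify which aggregators are considered, so the choice of aggregator is left as a parameter, and taking the plain average corresponds to the standard synthetic-data setting.

```python
def average(values):
    """Plain average of a non-empty list of numbers."""
    return sum(values) / len(values)


def true_fraction(db, predicate):
    """Fraction of rows of the real database that satisfy the predicate."""
    return average([1 if predicate(x) else 0 for x in db])


def relaxed_estimate(synthetic, predicate, aggregate=average):
    """Apply the predicate to every synthetic row, then aggregate the results.

    `aggregate` is any function from a list of 0/1 values to a number;
    the default (averaging) is an assumption chosen for illustration.
    """
    return aggregate([1 if predicate(x) else 0 for x in synthetic])


# Hypothetical toy data and predicate: "columns 0 and 1 are both 1".
D = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1)]
D_hat = [(1, 1, 1), (1, 0, 1)]


def both_set(x):
    return x[0] == 1 and x[1] == 1


exact = true_fraction(D, both_set)
estimate = relaxed_estimate(D_hat, both_set)
```

Passing a different `aggregate` changes how the per-row answers are combined without changing how the predicate is applied to each row.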


privacy, digital signatures, inapproximability, constraint satisfaction problems, probabilistically checkable proofs


  1. Adam, N.R., Wortmann, J.: Security-control methods for statistical databases: A comparative study. ACM Computing Surveys 21, 515–556 (1989)
  2. Alekhnovich, M., Braverman, M., Feldman, V., Klivans, A.R., Pitassi, T.: The complexity of properly learning simple concept classes. J. Comput. Syst. Sci. 74, 16–34 (2008)
  3. Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In: Proceedings of the 26th Symposium on Principles of Database Systems, pp. 273–282 (2007)
  4. Barak, B., Goldreich, O.: Universal arguments and their applications. SIAM J. Comput. 38, 1661–1694 (2008)
  5. Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: The SuLQ framework. In: Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (June 2005)
  6. Blum, A., Ligett, K., Roth, A.: A learning theory approach to non-interactive database privacy. In: Proceedings of the 40th ACM SIGACT Symposium on Theory of Computing (2008)
  7. Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 202–210 (2003)
  8. Duncan, G.: Confidentiality and statistical disclosure limitation. In: International Encyclopedia of the Social and Behavioral Sciences. Elsevier, Amsterdam (2001)
  9. Dwork, C.: A firm foundation for private data analysis. Communications of the ACM (to appear)
  10. Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006, Part II. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006)
  11. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006)
  12. Dwork, C., Naor, M., Reingold, O., Rothblum, G., Vadhan, S.: When and how can privacy-preserving data release be done efficiently? In: Proceedings of the 2009 International ACM Symposium on Theory of Computing (STOC) (2009)
  13. Dwork, C., Nissim, K.: Privacy-preserving datamining on vertically partitioned databases. In: Franklin, M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 528–544. Springer, Heidelberg (2004)
  14. Dwork, C., Rothblum, G., Vadhan, S.P.: Boosting and differential privacy. In: Proceedings of FOCS 2010 (2010)
  15. Evfimievski, A., Grandison, T.: Privacy Preserving Data Mining (a short survey). In: Encyclopedia of Database Technologies and Applications. Information Science Reference (2006)
  16. Feldman, V.: Hardness of proper learning. In: The Encyclopedia of Algorithms. Springer, Heidelberg (2008)
  17. Feldman, V.: Hardness of approximate two-level logic minimization and PAC learning with membership queries. Journal of Computer and System Sciences 75(1), 13–26 (2009)
  18. Goldreich, O.: Foundations of Cryptography, vol. 2. Cambridge University Press, Cambridge (2004)
  19. Håstad, J.: Some optimal inapproximability results. J. ACM 48, 798–859 (2001)
  20. Kearns, M.J., Valiant, L.G.: Cryptographic limitations on learning boolean formulae and finite automata. J. ACM 41, 67–95 (1994)
  21. Kilian, J.: A note on efficient zero-knowledge proofs and arguments (extended abstract). In: STOC (1992)
  22. Micali, S.: Computationally sound proofs. SIAM J. Comput. 30, 1253–1298 (2000)
  23. Naor, M., Yung, M.: Universal one-way hash functions and their cryptographic applications. In: STOC, pp. 33–43 (1989)
  24. Pitt, L., Valiant, L.G.: Computational limitations on learning from examples. J. ACM 35, 965–984 (1988)
  25. Reiter, J.P., Drechsler, J.: Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality. IAB Discussion Paper, Institut für Arbeitsmarkt und Berufsforschung (IAB), Nürnberg, Institute for Employment Research, Nuremberg, Germany (2007)
  26. Rompel, J.: One-way functions are necessary and sufficient for secure signatures. In: STOC, pp. 387–394 (1990)
  27. Roth, A., Roughgarden, T.: Interactive privacy via the median mechanism. In: STOC 2010 (2010)
  28. Ullman, J., Vadhan, S.P.: PCPs and the hardness of generating synthetic data. Electronic Colloquium on Computational Complexity (ECCC) 17, 17 (2010)
  29. Valiant, L.G.: A theory of the learnable. Communications of the ACM 27(11), 1134–1142 (1984)

Copyright information

© International Association for Cryptologic Research 2011

Authors and Affiliations

  • Jonathan Ullman (1)
  • Salil Vadhan (1)
  1. School of Engineering and Applied Sciences & Center for Research on Computation and Society, Harvard University, Cambridge, USA
