Empirical Evaluation of Statistical Inference from Differentially-Private Contingency Tables

  • Anne-Sophie Charest
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7556)


In this paper, we empirically evaluate the quality of statistical inference from differentially-private synthetic contingency tables. We compare three methods: histogram perturbation, the Dirichlet-Multinomial synthesizer, and the Hardt-Ligett-McSherry algorithm. We consider a goodness-of-fit test for models suitable for the real data, and a model selection procedure. We find that the theoretical guarantees associated with these differentially-private datasets do not always translate well into guarantees about the statistical inference on the synthetic datasets.
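To make the first of the three methods concrete, the following is a minimal sketch of histogram perturbation using the standard Laplace mechanism. It is not taken from the paper: the function names, the default sensitivity of 2 (which assumes neighbouring datasets differ by replacing one record), the clamping of negative counts, and the example table are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_histogram(counts, epsilon, sensitivity=2.0, seed=None):
    """Return noisy, non-negative cell counts for a flattened contingency table.

    `sensitivity=2.0` is an assumption (replace-one-record neighbours);
    use 1.0 under the add/remove-one-record definition.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    # Clamping negatives to zero is post-processing of the noisy counts,
    # so it does not weaken the differential privacy guarantee.
    return [max(0.0, x) for x in noisy]


# Hypothetical 2x2 table, flattened row by row.
table = [12, 0, 7, 31]
synthetic = perturb_histogram(table, epsilon=1.0, seed=0)
```

Smaller values of `epsilon` give stronger privacy but larger noise, which is the kind of trade-off whose effect on downstream tests an empirical evaluation would measure.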


Keywords (added by machine, not by the authors): Contingency Table; Synthetic Dataset; Model Selection Procedure; Multiplicative Weight; Synthetic Data Generation





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Anne-Sophie Charest, Université Laval, Canada
