Gaussian Mixture Models for Classification and Hypothesis Tests Under Differential Privacy

  • Xiaosu Tong
  • Bowei Xi
  • Murat Kantarcioglu
  • Ali Inan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10359)


Many statistical models are constructed using very basic statistics: mean vectors, variances, and covariances. Gaussian mixture models are one example. When a data set contains sensitive information and cannot be released directly to users, such models can still be constructed easily from noise-added query responses and provide preliminary results to users. Although the queried basic statistics meet the differential privacy guarantee, the complex models constructed from these statistics may not. However, it is up to the users to decide how to query a database and how to further use the queried results. In this article, our goal is to understand the impact of differential privacy mechanisms on Gaussian mixture models. Our approach involves querying basic statistics from a database under differential privacy protection and using the noise-added responses to build classifiers and perform hypothesis tests. We find that adding Laplace noise may have a non-negligible effect on model outputs. For example, the variance-covariance matrix after noise addition may no longer be positive definite. We propose a heuristic algorithm to repair the noise-added variance-covariance matrix. We then examine the classification error using the noise-added responses, through experiments with both simulated and real-life data, and demonstrate under which conditions the impact of the added noise can be reduced. We compute the exact type I and type II errors under differential privacy for the one-sample z test, the one-sample t test, and the two-sample t test with equal variances. We then show under which conditions a hypothesis test returns a reliable result given differentially private means, variances, and covariances.
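To make the positive-definiteness issue concrete, the following is a minimal Python sketch, not the authors' method. It assumes NumPy, a hypothetical noise scale parameter `scale` (for the Laplace mechanism this would be sensitivity divided by epsilon), and independent Laplace noise on each distinct matrix entry. The abstract does not describe the proposed heuristic repair, so the sketch substitutes a generic eigenvalue-clipping repair purely for illustration; all function names are hypothetical.

```python
import numpy as np


def laplace_noisy_covariance(cov, scale, rng):
    """Add Laplace noise to a covariance matrix, keeping it symmetric.

    `scale` is a hypothetical Laplace scale b (sensitivity / epsilon).
    """
    d = cov.shape[0]
    noise = rng.laplace(loc=0.0, scale=scale, size=(d, d))
    # Keep the upper triangle (with diagonal) and mirror it below.
    noise = np.triu(noise) + np.triu(noise, k=1).T
    return cov + noise


def is_positive_definite(m):
    """True if every eigenvalue of the symmetric matrix m is positive."""
    return bool(np.all(np.linalg.eigvalsh(m) > 0))


def clip_eigenvalues(m, floor=1e-3):
    """Generic repair (an assumption, not the paper's heuristic):
    raise every eigenvalue below `floor` up to `floor`."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.maximum(vals, floor)
    repaired = (vecs * vals) @ vecs.T
    # Symmetrize to remove floating-point asymmetry.
    return (repaired + repaired.T) / 2


# A symmetric matrix with eigenvalues 3 and -1, standing in for a
# noise-added covariance matrix that has lost positive definiteness.
broken = np.array([[1.0, 2.0], [2.0, 1.0]])
repaired = clip_eigenvalues(broken)
print(is_positive_definite(broken), is_positive_definite(repaired))
```

Eigenvalue clipping changes the matrix as little as possible in the directions that were already valid, but it is only one of several possible repairs and may differ from the heuristic evaluated in the paper.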


Differential privacy; Statistical database; Mixture model; Classification; Hypothesis test



Copyright information

© IFIP International Federation for Information Processing 2017

Authors and Affiliations

  • Xiaosu Tong, Amazon, Seattle, USA
  • Bowei Xi, Department of Statistics, Purdue University, West Lafayette, USA
  • Murat Kantarcioglu, Department of Computer Science, University of Texas at Dallas, Dallas, USA
  • Ali Inan, Department of Computer Engineering, Adana Science and Technology University, Adana, Turkey
