Privacy Preserving Synthetic Data Release Using Deep Learning

  • Nazmiye Ceren Abay
  • Yan Zhou
  • Murat Kantarcioglu
  • Bhavani Thuraisingham
  • Latanya Sweeney
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)

Abstract

For many critical applications, ranging from health care to the social sciences, releasing personal data while protecting individual privacy is paramount. Over the years, data anonymization and synthetic data generation techniques have been proposed to address this challenge. Unfortunately, data anonymization approaches do not provide rigorous privacy guarantees. Although there are existing synthetic data generation techniques that use rigorous definitions of differential privacy, to our knowledge these techniques have not been compared extensively using different utility metrics.

In this work, we provide two novel contributions. First, we compare existing techniques on different datasets using different utility metrics. Second, we present a novel approach that utilizes deep learning techniques, coupled with an efficient analysis of privacy costs, to generate differentially private synthetic datasets with higher data utility. We show that we can learn deep learning models that capture relationships among multiple features, and then use these models to generate differentially private synthetic datasets. Our extensive experimental evaluation conducted on multiple datasets indicates that our proposed approach is more robust (i.e., one of the top-performing techniques on almost all types of data we have experimented with) compared to state-of-the-art methods in terms of various data utility measures. Code related to this paper is available at: https://github.com/ncabay/synthetic_generation.
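This excerpt does not include the paper's model architecture or its privacy-cost analysis. As a rough illustration of the core mechanism such approaches build on (differentially private SGD: per-example gradient clipping plus Gaussian noise), here is a minimal NumPy sketch that trains a toy one-hidden-layer autoencoder privately and then decodes perturbed latent codes into synthetic records. All names and hyperparameters (`clip_norm`, `noise_mult`, the latent-perturbation generation step) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 records with 4 correlated features in [0, 1].
base = rng.random((200, 1))
X = np.clip(np.hstack([base,
                       base + 0.1 * rng.standard_normal((200, 1)),
                       1.0 - base,
                       0.5 * base]), 0.0, 1.0)

# Tiny autoencoder: linear encoder 4 -> 2, sigmoid decoder 2 -> 4.
W1 = 0.1 * rng.standard_normal((4, 2))
W2 = 0.1 * rng.standard_normal((2, 4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

clip_norm, noise_mult, lr, batch = 1.0, 1.1, 0.5, 32  # illustrative values

for step in range(300):
    idx = rng.choice(len(X), size=batch, replace=False)
    g1 = np.zeros_like(W1)
    g2 = np.zeros_like(W2)
    for x in X[idx]:                       # per-example gradients
        h = x @ W1                         # encoder
        out = sigmoid(h @ W2)              # decoder
        delta2 = (out - x) * out * (1 - out)  # MSE backprop through sigmoid
        d2 = np.outer(h, delta2)
        d1 = np.outer(x, delta2 @ W2.T)
        # Clip the joint per-example gradient to L2 norm <= clip_norm.
        norm = np.sqrt((d1 ** 2).sum() + (d2 ** 2).sum())
        scale = min(1.0, clip_norm / (norm + 1e-12))
        g1 += d1 * scale
        g2 += d2 * scale
    # Add Gaussian noise calibrated to the clipping bound, then average.
    g1 += noise_mult * clip_norm * rng.standard_normal(W1.shape)
    g2 += noise_mult * clip_norm * rng.standard_normal(W2.shape)
    W1 -= lr * g1 / batch
    W2 -= lr * g2 / batch

# Generate synthetic records: perturb latent codes and decode.
latent = X @ W1
synthetic = sigmoid((latent + 0.1 * rng.standard_normal(latent.shape)) @ W2)
print(synthetic.shape)  # (200, 4)
```

The clipping bound caps each example's influence on an update, so the added Gaussian noise can be calibrated to yield a differential-privacy guarantee; a real implementation would also track the cumulative privacy cost across the 300 steps with a privacy accountant.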

Keywords

Differential privacy · Deep learning · Data generation

Notes

Acknowledgement

The research reported herein was supported in part by NIH award 1R01HG006844, NSF awards CNS-1111529, CICI-1547324, and IIS-1633331, and ARO award W911NF-17-1-0356.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Texas at Dallas, Richardson, USA
  2. Harvard University, Cambridge, USA
