Embedding differential privacy in decision tree algorithm with different depths

Abstract

Differential privacy (DP) has become one of the most important solutions for privacy protection in recent years. Previous studies have shown that prediction accuracy usually increases as more data mining (DM) logic is considered in the DP implementation. However, although one-step DM computation for the decision tree (DT) model has been investigated, existing research has not studied scenarios in which DP is embedded in two-step DM computation, three-step DM computation, and so on up to the DM computation of the whole model. Embedding DP in more than two steps of DM computation is very challenging because the solution space grows exponentially as the computational complexity increases. In this work, we propose algorithms based on the Markov Chain Monte Carlo (MCMC) method, which can efficiently search a computationally infeasible space, to embed DP into the DT generation algorithm. We compare the performance of embedding DP in DT at different depths, i.e., one-step DM computation (previous work), two-step, three-step, and the whole model. We find that a deeper combination of DP and DT does help to increase prediction accuracy. However, when the privacy budget is very large (e.g., ϵ = 10), it may overwhelm the complexity of the DT model, and the increasing trend is not obvious. We also find that prediction accuracy decreases as model complexity increases.
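To make the general idea concrete, the following is a minimal sketch of how an MCMC method can draw from an exponential-mechanism distribution over a space too large to enumerate. It is not the algorithm proposed in the article: the function names, the toy utility, the ring-shaped candidate space, and the sensitivity value are all assumptions chosen for illustration. The sketch uses Metropolis-Hastings with a symmetric proposal, so a candidate is accepted with probability min(1, exp(ϵ · (u' − u) / (2Δu))). A chain of finite length only approximates the target distribution, so the sketch by itself does not establish a strict ϵ-DP guarantee.

```python
import math
import random


def mh_exponential_mechanism(utility, propose, start, epsilon, sensitivity, steps, rng):
    """Approximately sample x with probability proportional to
    exp(epsilon * utility(x) / (2 * sensitivity)) via Metropolis-Hastings.

    `propose` must be a symmetric proposal: moving a -> b is as likely as b -> a.
    """
    current = start
    current_u = utility(current)
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_u = utility(candidate)
        # Log of the acceptance ratio; the normalising constant cancels out,
        # which is why the full candidate space never has to be enumerated.
        log_ratio = epsilon * (candidate_u - current_u) / (2.0 * sensitivity)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_u = candidate, candidate_u
    return current


# Hypothetical toy search space: the integers 0..99 arranged on a ring.
N_CANDIDATES = 100


def toy_utility(x):
    # Assumed utility for illustration: candidates closer to 70 score higher.
    return -abs(x - 70)


def ring_neighbor(x, rng):
    # Symmetric proposal: step to the left or right neighbour with equal probability.
    return (x + rng.choice((-1, 1))) % N_CANDIDATES


rng = random.Random(0)
sample = mh_exponential_mechanism(
    toy_utility, ring_neighbor, start=0,
    epsilon=20.0, sensitivity=1.0, steps=5000, rng=rng,
)
print(sample)
```

In a decision tree setting, a candidate would presumably stand for a choice of splits over one or more levels and the utility for a split-quality score, but that mapping is an assumption here and is not taken from the article. With a large ϵ the chain concentrates on the highest-utility candidates; with a small ϵ the output is closer to uniform.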

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61525204, 61572322), the Science and Technology Commission of Shanghai Municipality Project (Grant Nos. 14510722600, 16QA1402200), the Aeronautical Science Foundation of China (Grant No. 20145557010), and the NRF Singapore CREATE Program E2S2.

Author information

Correspondence to Jianguo Yao or Haibing Guan.

Additional information

Conflict of interest: The authors declare that they have no conflict of interest.

About this article

Cite this article

Bai, X., Yao, J., Yuan, M. et al. Embedding differential privacy in decision tree algorithm with different depths. Sci. China Inf. Sci. 60, 082104 (2017). https://doi.org/10.1007/s11432-016-0442-1

Keywords

  • differential privacy
  • decision tree
  • exponential mechanism
  • exhaustive search
  • MCMC