
The VLDB Journal, Volume 27, Issue 2, pp 201–223

PrivPfC: differentially private data publication for classification

  • Dong Su
  • Jianneng Cao
  • Ninghui Li
  • Min Lyu
Regular Paper

Abstract

In this paper, we tackle the problem of constructing a differentially private synopsis for classification analysis. Several state-of-the-art methods follow the structure of existing classification algorithms and are all iterative, which is suboptimal because they make locally optimal choices and divide the privacy budget among many sequentially composed steps. We propose PrivPfC, a new differentially private method for releasing data for classification. The key idea underlying PrivPfC is to privately select, in a single step, a grid that partitions the data domain into a number of cells. This selection is done using the exponential mechanism with a novel quality function, which maximizes the expected number of records correctly classified by a histogram classifier. PrivPfC supports both binary and multiclass classification. Through extensive experiments on real datasets, we demonstrate PrivPfC's superiority over the state-of-the-art methods.
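
The single-step grid selection described in the abstract can be illustrated with a minimal sketch of the exponential mechanism. Everything specific here is an assumption made for illustration: the toy one-dimensional records, the candidate grids (equal-width partitions into 1, 2, 4, or 8 cells), the helper names, and above all the quality function. `majority_correct` is a simplified, noise-free stand-in that counts the records a per-cell majority vote would classify correctly; it is not the quality function defined in the paper, which is stated in terms of the expected number of correctly classified records. A sensitivity of 1 is assumed because adding or removing one record changes this count by at most 1.

```python
import math
import random


def majority_correct(records, num_cells, domain=8.0):
    """Hypothetical stand-in quality: records a per-cell majority vote gets right."""
    width = domain / num_cells
    counts = [[0, 0] for _ in range(num_cells)]  # [negatives, positives] per cell
    for x, label in records:
        cell = min(int(x / width), num_cells - 1)
        counts[cell][label] += 1
    return sum(max(neg, pos) for neg, pos in counts)


def selection_probabilities(scores, epsilon, sensitivity=1.0):
    """Exponential mechanism: P(c) proportional to exp(epsilon * q(c) / (2 * sensitivity))."""
    top = max(scores)  # shift by the maximum score for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def select_grid(candidates, records, epsilon, rng):
    """Pick one candidate grid in a single private step."""
    scores = [majority_correct(records, c) for c in candidates]
    probs = selection_probabilities(scores, epsilon)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Toy labelled records (x, label) on the domain [0, 8); purely illustrative.
records = [(0.5, 0), (1.5, 0), (2.5, 0), (3.5, 1),
           (4.5, 1), (5.5, 1), (6.5, 0), (7.5, 0)]
candidates = [1, 2, 4, 8]  # number of equal-width cells in each candidate grid

scores = [majority_correct(records, c) for c in candidates]
probs = selection_probabilities(scores, epsilon=1.0)
chosen = select_grid(candidates, records, epsilon=1.0, rng=random.Random(0))
```

Because the whole privacy budget for selection is spent in one draw, nothing is divided across sequential steps. Note that this noise-free stand-in always scores finer grids at least as high as coarser ones, since it ignores the noise later added to the cell counts; a quality function based on expected correctness, as the abstract describes, would not share that bias.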

Keywords

  • Differential privacy
  • Classification
  • Privacy preserving data publishing

Notes

Acknowledgements

We thank the reviewers for their valuable comments. This paper is based upon work supported by the United States National Science Foundation under grants CNS-1116991 and CNS-1640374; by the Key Laboratory on High Performance Computing, Anhui Province; by NSFC (61672486, 61672480, 11671376); and by the Key Program of NSFC (71631006).

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Purdue University, West Lafayette, USA
  2. Institute for Infocomm Research, Singapore, Singapore
  3. University of Science and Technology of China, Hefei, China
