Abstract
Differential privacy is the standard privacy definition for performing analyses over sensitive data. Yet, its privacy budget bounds the number of tasks an analyst can perform with reasonable accuracy, which makes it challenging to deploy in practice. This can be alleviated by private sketching, where the dataset is compressed into a single noisy sketch vector which can be shared with the analysts and used to perform arbitrarily many analyses. However, the algorithms to perform specific tasks from sketches must be developed on a case-by-case basis, which is a major impediment to their use. In this paper, we introduce the generic moment-to-moment (\(\textrm{M}^2\textrm{M}\)) method to perform a wide range of data exploration tasks from a single private sketch. Among other things, this method can be used to estimate empirical moments of attributes, the covariance matrix, counting queries (including histograms), and regression models. Our method treats the sketching mechanism as a black-box operation, and can thus be applied to a wide variety of sketches from the literature, widening their range of applications without further engineering or privacy loss, and removing some of the technical barriers to the wider adoption of sketches for data exploration under differential privacy. We validate our method with data exploration tasks on artificial and real-world data, and show that it can be used to reliably estimate statistics and train classification models from private sketches.
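The pipeline the abstract describes — compress the dataset into one noisy random-feature sketch, then answer many queries from that sketch alone — can be illustrated with a toy example. Everything below (the random Fourier feature map, the noise scale `sigma`, the least-squares fit of the target moment on synthetic support points) is an illustrative assumption for this sketch, not the paper's calibrated mechanism; in particular, the noise scale is not tied to any privacy budget here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: n points in d dimensions (synthetic, for illustration only).
n, d = 5_000, 2
X = rng.normal(loc=1.0, scale=0.5, size=(n, d))

# Random Fourier feature map Phi(x) = exp(i * W @ x), one common sketch choice.
m = 500                                   # sketch size
W = rng.normal(scale=1.0, size=(m, d))    # random frequencies

def phi(points):
    """Complex random Fourier features, shape (num_points, m)."""
    return np.exp(1j * points @ W.T)

# Private sketch: average feature vector plus additive noise. A real
# deployment would calibrate the noise to the privacy budget.
sigma = 0.005
noise = sigma * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
z = phi(X).mean(axis=0) + noise

# Moment-to-moment idea: to estimate E[f(X)], fit coefficients a such that
# f(x) ~ <a, Phi(x)> on synthetic support points, then read the answer off
# the sketch as Re(<a, z>). Here f(x) = x_0, i.e. the mean of attribute 0.
T = rng.uniform(-1.0, 3.0, size=(3_000, d))            # synthetic support
a, *_ = np.linalg.lstsq(phi(T), T[:, 0], rcond=None)   # min-norm LS fit
estimate = float(np.real(z @ a))

print(f"estimate from sketch: {estimate:.3f}")
print(f"empirical mean:       {X[:, 0].mean():.3f}")
```

Note that the dataset is touched only once, to build `z`; the fit of `a` uses synthetic points, so further queries (other moments, counts over bins) reuse the same sketch at no extra privacy cost.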
F. Houssiau and V. Schellekens—These authors contributed equally.
Appendices
A Proof of Theorem 1
Let \(J_\varSigma \) denote the left-hand side of the inequality, i.e., the mean squared error between the empirical mean \(\overline{f}\) and its estimate \(\widetilde{f}\) obtained from the sketch. Denoting \(X=(X_1,\dots ,X_n)\), we have
where (i) uses the independence of \(\xi \) and \(X\) together with the fact that \(\mathbb {E}\left[ \xi \right] = 0\), and (ii) uses the independence of the samples \((X_i)_{1\le i\le n}\) (here \(\mathbb {V}[\cdot ]\) denotes the variance of a random variable). Finally, we apply Jensen's inequality (since \(x\mapsto x^2\) is convex) to show that \(\left( \mathbb {E}_{X}\left[ f(X)\right] -\langle a,\mathbb {E}_{X}\left[ \varPhi (X)\right] \rangle \right) ^2 \le \mathbb {E}_{X}\left[ \left( f(X) - \langle a,\varPhi (X)\rangle \right) ^2\right] \), which concludes the proof.
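The display equation that steps (i) and (ii) refer to was lost in extraction. Under the natural assumption that the estimator has the form \(\widetilde{f}=\langle a,\widetilde{z}\rangle \) with \(\widetilde{z}=\frac{1}{n}\sum _{i=1}^n\varPhi (X_i)+\xi \), a plausible reconstruction of the chain, consistent with the justifications above, is:

```latex
\begin{aligned}
J_\varSigma
&= \mathbb{E}_{X,\xi}\Big[\Big(\tfrac{1}{n}\textstyle\sum_{i=1}^n f(X_i)
   - \big\langle a,\, \tfrac{1}{n}\textstyle\sum_{i=1}^n \varPhi(X_i) + \xi \big\rangle\Big)^2\Big] \\
&\overset{(i)}{=} \mathbb{E}_{X}\Big[\Big(\tfrac{1}{n}\textstyle\sum_{i=1}^n
   \big(f(X_i) - \langle a, \varPhi(X_i)\rangle\big)\Big)^2\Big]
   + \mathbb{E}_{\xi}\big[\langle a, \xi\rangle^2\big] \\
&\overset{(ii)}{=} \big(\mathbb{E}_{X}[f(X)] - \langle a, \mathbb{E}_{X}[\varPhi(X)]\rangle\big)^2
   + \tfrac{1}{n}\,\mathbb{V}\big[f(X) - \langle a, \varPhi(X)\rangle\big]
   + \mathbb{E}_{\xi}\big[\langle a, \xi\rangle^2\big]
\end{aligned}
```

In (i) the cross term vanishes because \(\xi \) is independent of \(X\) and \(\mathbb {E}[\xi ]=0\); in (ii) the second moment of the empirical average of the i.i.d. variables \(Y_i = f(X_i)-\langle a,\varPhi (X_i)\rangle \) decomposes as \((\mathbb {E}[Y])^2 + \mathbb {V}[Y]/n\). The Jensen step then bounds the first (squared-bias) term.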
B \(\textrm{M}^2\textrm{M}\) Learning Procedure
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Houssiau, F., Schellekens, V., Chatalic, A., Annamraju, S.K., de Montjoye, YA. (2023). M\(^2\)M: A General Method to Perform Various Data Analysis Tasks from a Differentially Private Sketch. In: Lenzini, G., Meng, W. (eds) Security and Trust Management. STM 2022. Lecture Notes in Computer Science, vol 13867. Springer, Cham. https://doi.org/10.1007/978-3-031-29504-1_7
Print ISBN: 978-3-031-29503-4
Online ISBN: 978-3-031-29504-1