M\(^2\)M: A General Method to Perform Various Data Analysis Tasks from a Differentially Private Sketch

  • Conference paper
Security and Trust Management (STM 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13867)


Abstract

Differential privacy is the standard privacy definition for performing analyses over sensitive data. Yet, its privacy budget bounds the number of tasks an analyst can perform with reasonable accuracy, which makes it challenging to deploy in practice. This can be alleviated by private sketching, where the dataset is compressed into a single noisy sketch vector which can be shared with the analysts and used to perform arbitrarily many analyses. However, the algorithms to perform specific tasks from sketches must be developed on a case-by-case basis, which is a major impediment to their use. In this paper, we introduce the generic moment-to-moment (\(\textrm{M}^2\textrm{M}\)) method to perform a wide range of data exploration tasks from a single private sketch. Among other things, this method can be used to estimate empirical moments of attributes, the covariance matrix, counting queries (including histograms), and regression models. Our method treats the sketching mechanism as a black-box operation, and can thus be applied to a wide variety of sketches from the literature, widening their ranges of applications without further engineering or privacy loss, and removing some of the technical barriers to the wider adoption of sketches for data exploration under differential privacy. We validate our method with data exploration tasks on artificial and real-world data, and show that it can be used to reliably estimate statistics and train classification models from private sketches.
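
For concreteness, the snippet below illustrates the kind of private sketch the paper builds on: the records are mapped through a random feature map, averaged, and released once with calibrated noise. The random Fourier feature map, the Gaussian-mechanism noise calibration, and all names are illustrative assumptions rather than the paper's own mechanism (bounded-DP setting, with the dataset size n treated as public).

```python
import numpy as np

def rff_features(X, Omega):
    """Random Fourier feature map Phi(x) = [cos(Omega^T x), sin(Omega^T x)] / sqrt(m),
    so every record satisfies ||Phi(x)||_2 = 1."""
    m = Omega.shape[1]
    proj = X @ Omega                                    # (n, m)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(m)

def private_sketch(X, Omega, epsilon, delta, rng):
    """Release a single noisy sketch z = (sum_i Phi(X_i) + xi) / n.

    Noise is calibrated with the standard Gaussian mechanism: adding or removing
    one record changes the feature sum by at most ||Phi(x)||_2 = 1 (L2 sensitivity),
    so xi ~ N(0, sigma^2 I) with sigma = sqrt(2 ln(1.25/delta)) / epsilon gives
    (epsilon, delta)-DP for the sum (valid for epsilon < 1).
    """
    n = X.shape[0]
    Phi = rff_features(X, Omega)
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    xi = rng.normal(scale=sigma, size=Phi.shape[1])
    return (Phi.sum(axis=0) + xi) / n                   # sketch vector, shape (2m,)

# The curator computes the sketch once; analysts only ever see z (and Omega).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))                          # sensitive dataset, n=5000, d=2
Omega = rng.normal(size=(2, 100))                       # m=100 random frequencies
z = private_sketch(X, Omega, epsilon=0.5, delta=1e-5, rng=rng)
```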

F. Houssiau and V. Schellekens—These authors contributed equally.


Notes

  1.

    We consider only unbounded DP for conciseness; the private sketches from Sect. 2.3 extend in a straightforward manner to the bounded DP setting, in which case no noise needs to be added to the denominator in (2).



Author information

Corresponding author: Florimond Houssiau.

Appendices

A Proof of Theorem 1

Let \(J_\varSigma \), the left-hand side of the inequality, denote the mean squared error between the empirical mean \(\overline{f}\) and the estimate \(\widetilde{f}\) obtained from the sketch. Writing \(X=(X_1,\dots ,X_n)\), we have

$$
\begin{aligned}
J_\varSigma &= \mathbb{E}_{X,\xi}\left[\left(\frac{1}{n}\sum_{i=1}^n f(X_i) - \Big\langle a, \frac{1}{n}\Big(\sum_{i=1}^n \varPhi(X_i) + \xi\Big)\Big\rangle\right)^{\!2}\right] \\
&= \mathbb{E}_{X,\xi}\left[\left(\frac{1}{n}\sum_{i=1}^n \big(f(X_i)-\langle a,\varPhi(X_i)\rangle\big) - \frac{1}{n}\langle a, \xi\rangle\right)^{\!2}\right] \\
&\overset{(i)}{=} \mathbb{E}_{X}\left[\left(\frac{1}{n}\sum_{i=1}^n \big(f(X_i)-\langle a,\varPhi(X_i)\rangle\big)\right)^{\!2}\right] + \frac{1}{n^2}\,\mathbb{E}_{\xi}\left[\langle a,\xi\rangle^2\right] \\
&\overset{(ii)}{=} \frac{n(n-1)}{n^2}\,\big(\mathbb{E}_{X}[f(X)]-\langle a,\mathbb{E}_{X}[\varPhi(X)]\rangle\big)^2 + \frac{n}{n^2}\,\mathbb{E}_{X}\left[\big(f(X) - \langle a,\varPhi(X)\rangle\big)^2\right] + \|a\|_2^2\,\frac{\mathbb{V}[\xi]}{n^2}
\end{aligned}
$$

where (i) uses the independence of \(\xi \) and \(X\) together with \(\mathbb {E}\left[ \xi \right] = 0\), and (ii) uses the fact that the samples \((X_i)_{1\le i\le n}\) are drawn independently (\(\mathbb {V}[\cdot ]\) denotes the variance of a random variable): expanding the square, the \(n(n-1)\) cross terms factor into the squared expectation, while the \(n\) diagonal terms give the second moment. Finally, Jensen's inequality (since \(x\mapsto x^2\) is convex) gives \(\left( \mathbb {E}_{X}\left[ f(X)\right] -\langle a,\mathbb {E}_{X}\left[ \varPhi (X)\right] \rangle \right) ^2 \le \mathbb {E}_{X}\left[ \left( f(X) - \langle a,\varPhi (X)\rangle \right) ^2\right] \), which concludes the proof.
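
As a sanity check on the derivation above, the equality in step (ii) can be verified numerically. The snippet below is a small Monte Carlo illustration with an arbitrary toy target \(f\), feature map \(\varPhi \), fixed coefficients \(a\), and i.i.d. Gaussian noise \(\xi \); all of these choices are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: scalar data X ~ N(0,1), feature map Phi(x) = (x, x^2), target f(x) = |x|.
def Phi(x):
    return np.stack([x, x**2], axis=-1)

def f(x):
    return np.abs(x)

n = 50                                  # sample size
a = np.array([0.3, 0.4])                # fixed sketch coefficients
sigma_xi = 0.5                          # noise std per coordinate, so V[xi] = sigma_xi**2

def squared_error_once():
    X = rng.normal(size=n)
    xi = rng.normal(scale=sigma_xi, size=2)
    sketch_estimate = a @ ((Phi(X).sum(axis=0) + xi) / n)   # <a, (1/n)(sum_i Phi(X_i) + xi)>
    return (f(X).mean() - sketch_estimate) ** 2

# Left-hand side: Monte Carlo estimate of J_Sigma over fresh draws of X and xi.
J_mc = np.mean([squared_error_once() for _ in range(100_000)])

# Right-hand side of step (ii), with the expectations over X themselves estimated numerically.
Xbig = rng.normal(size=1_000_000)
bias_sq = (f(Xbig).mean() - a @ Phi(Xbig).mean(axis=0)) ** 2
second_moment = np.mean((f(Xbig) - Phi(Xbig) @ a) ** 2)
J_closed = (n * (n - 1) / n**2) * bias_sq + (n / n**2) * second_moment \
           + (a @ a) * sigma_xi**2 / n**2

print(J_mc, J_closed)   # the two values should agree up to Monte Carlo error
```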

B \(\textrm{M}^2\textrm{M}\) Learning Procedure

[Figure a in the original: pseudocode of the \(\textrm{M}^2\textrm{M}\) learning procedure, rendered as an image.]
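
Since the procedure itself appears only as an image in the original, the snippet below sketches, as a rough stand-in, the kind of regularized least-squares fit that Theorem 1 motivates: choose coefficients \(a\) such that \(\langle a, \varPhi (x)\rangle \approx f(x)\) on analyst-chosen proposal points, while keeping \(\|a\|_2\) small to control the noise term \(\|a\|_2^2\,\mathbb {V}[\xi ]/n^2\). The proposal distribution, the regularization weight, and all names are assumptions, not the authors' exact algorithm.

```python
import numpy as np

def m2m_coefficients(f, feature_map, proposal_points, reg):
    """Fit coefficients a such that <a, Phi(x)> approximates f(x) on points drawn
    from an analyst-chosen proposal distribution, by ridge regression. The ridge
    penalty also keeps ||a||_2 small, which controls the ||a||_2^2 V[xi] / n^2
    noise term appearing in Theorem 1.
    """
    Phi = feature_map(proposal_points)                 # (N, k) design matrix
    y = f(proposal_points)                             # (N,) target values
    k = Phi.shape[1]
    # Regularized normal equations: (Phi^T Phi + N * reg * I) a = Phi^T y.
    return np.linalg.solve(Phi.T @ Phi + len(y) * reg * np.eye(k), Phi.T @ y)

def estimate_from_sketch(a, z):
    """Estimate the empirical mean (1/n) sum_i f(X_i) as <a, z> (Theorem 1)."""
    return a @ z

# Example (reusing rff_features, Omega and z from the illustration after the
# abstract): estimate the empirical mean of the first attribute from the sketch.
# proposal = np.random.default_rng(2).normal(size=(20_000, 2))
# a = m2m_coefficients(lambda P: P[:, 0], lambda P: rff_features(P, Omega), proposal, reg=1e-3)
# print(estimate_from_sketch(a, z))
```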


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Houssiau, F., Schellekens, V., Chatalic, A., Annamraju, S.K., de Montjoye, YA. (2023). M\(^2\)M: A General Method to Perform Various Data Analysis Tasks from a Differentially Private Sketch. In: Lenzini, G., Meng, W. (eds) Security and Trust Management. STM 2022. Lecture Notes in Computer Science, vol 13867. Springer, Cham. https://doi.org/10.1007/978-3-031-29504-1_7

  • DOI: https://doi.org/10.1007/978-3-031-29504-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29503-4

  • Online ISBN: 978-3-031-29504-1

  • eBook Packages: Computer Science, Computer Science (R0)
