Abstract
In large-scale machine learning, it is crucial to reduce computational complexity and memory demands while preserving generalization performance. Moreover, because collected data may contain sensitive information, privacy-preserving machine learning algorithms are of great practical significance. This paper studies the performance of a differentially private stochastic gradient descent (SGD) algorithm based on random features. The algorithm first maps the original data into a low-dimensional feature space, avoiding the storage requirements that traditional kernel methods incur on large-scale data. It then optimizes the parameters iteratively by stochastic gradient descent. Finally, an output perturbation mechanism injects random noise to guarantee algorithmic privacy. We prove that the proposed algorithm satisfies differential privacy while achieving fast convergence rates under mild conditions.
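To make the three steps concrete, the following minimal Python sketch combines random Fourier features, one-pass SGD with a decaying step size, and Gaussian output perturbation. The function names (`rff_features`, `dp_sgd_rff`), the squared loss, and the fixed `noise_scale` are illustrative assumptions, not the paper's exact algorithm; in the paper, the noise is calibrated to the algorithm's sensitivity and the target privacy budget.

```python
import numpy as np

def rff_features(X, W, b):
    """Random Fourier features approximating a Gaussian (RBF) kernel:
    phi(x) = sqrt(2/D) * cos(W x + b), so phi(x)^T phi(y) ~ k(x, y)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def dp_sgd_rff(X, y, D=200, sigma_rbf=1.0, eta=0.1, noise_scale=0.1, seed=0):
    """One pass of SGD on the least-squares loss in the random-feature space,
    followed by Gaussian output perturbation on the final iterate."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Sample the feature map once: frequencies ~ N(0, 1/sigma^2), phases ~ U[0, 2pi].
    W = rng.normal(scale=1.0 / sigma_rbf, size=(D, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    Phi = rff_features(X, W, b)           # n x D feature matrix
    w = np.zeros(D)
    for t in range(n):                    # one pass over the data
        g = (Phi[t] @ w - y[t]) * Phi[t]  # gradient of 0.5*(phi^T w - y)^2
        w -= eta / np.sqrt(t + 1) * g     # decaying step size eta_t = eta / sqrt(t+1)
    # Output perturbation: release w plus Gaussian noise. The scale here is a
    # placeholder; a private release would set it from the sensitivity of w
    # and the desired (epsilon, delta).
    w_private = w + rng.normal(scale=noise_scale, size=D)
    return w_private, W, b

# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
w_priv, W, b = dp_sgd_rff(X, y)
pred = rff_features(X, W, b) @ w_priv
print("training MSE:", np.mean((pred - y) ** 2))
```

Since the feature map is data-independent and the parameter vector lives in R^D rather than a reproducing kernel Hilbert space, training never stores the n x n kernel matrix, which is the memory saving the abstract refers to.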
Ethics declarations
Conflict of interest: The authors declare no conflict of interest.
Additional information
This work is supported by the Zhejiang Provincial Natural Science Foundation of China (LR20A010001) and the National Natural Science Foundation of China (12271473 and U21A20426).
Cite this article
Wang, Y.G., Guo, Z.C. Differentially private SGD with random features. Appl. Math. J. Chin. Univ. 39, 1–23 (2024). https://doi.org/10.1007/s11766-024-5037-0
Keywords
- learning theory
- differential privacy
- stochastic gradient descent
- random features
- reproducing kernel Hilbert spaces