Abstract
We analyze stochastic proximal subgradient descent in the case where the objective functions are path differentiable and satisfy a Sard-type condition. While the accumulation set may not be reduced to a unique point, we show that the time spent by the iterates to move from one accumulation point to another goes to infinity. An oscillation-type behavior of the drift is also established. These results show a strong stability property of proximal subgradient descent. Using the theory of closed measures, Bolte et al. (2020) established this type of behavior for deterministic subgradient descent. Our proof technique relies on classical work on stochastic approximation of differential inclusions, which allows us to extend the results from the deterministic case to a stochastic and proximal setting and to treat these different cases in a unified manner.
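To make the algorithm concrete, the following is a minimal sketch of one common form of the stochastic proximal subgradient update, x_{k+1} = prox_{γ_k g}(x_k − γ_k v_k), where v_k is a noisy subgradient of the smooth-free part f at x_k. It is not the paper's exact scheme. All specifics are illustrative assumptions: the objective (mean absolute residual f plus an l1 penalty g = λ‖x‖₁, as in the lasso), uniform sampling of one data point per step, the step-size schedule γ_k = γ0/(k+1)^α, and the hypothetical function names. The sketch also accumulates τ = Σ γ_k, since in the stochastic-approximation viewpoint "time" along the iterates is usually measured by cumulative step size.

```python
import random


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1, applied coordinate-wise."""
    return [max(abs(xi) - t, 0.0) * (1.0 if xi > 0 else -1.0) for xi in x]


def lad_subgradient(x, a, b):
    """One subgradient of x -> |<a, x> - b| (choosing 0 at the kink)."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    s = (r > 0) - (r < 0)  # sign of the residual, 0 when r == 0
    return [s * ai for ai in a]


def stochastic_prox_subgradient(data, lam, x0, n_iter, gamma0=0.5, alpha=0.75, seed=0):
    """Hypothetical sketch: minimize mean_i |<a_i, x> - b_i| + lam * ||x||_1.

    Assumed step sizes gamma_k = gamma0 / (k + 1) ** alpha, with alpha in (1/2, 1],
    so that sum(gamma_k) diverges while sum(gamma_k ** 2) converges.
    """
    rng = random.Random(seed)
    x = list(x0)
    tau = 0.0  # cumulative step size ("time" elapsed along the iterates)
    for k in range(n_iter):
        gamma = gamma0 / (k + 1) ** alpha
        a, b = rng.choice(data)        # sample one data point uniformly
        v = lad_subgradient(x, a, b)   # its expectation is a subgradient of the mean loss
        forward = [xi - gamma * vi for xi, vi in zip(x, v)]  # explicit subgradient step
        x = soft_threshold(forward, gamma * lam)             # proximal step on the l1 term
        tau += gamma
    return x, tau


if __name__ == "__main__":
    data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -2.0), ([1.0, 1.0], -1.0)]
    x, tau = stochastic_prox_subgradient(data, lam=0.1, x0=[0.0, 0.0], n_iter=2000)
    print(x, tau)
```

Because the step sizes are not summable, τ keeps growing with the number of iterations. This is the time scale on which statements such as "the time spent moving between accumulation points goes to infinity" are naturally phrased.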
Data availability
Not applicable.
References
Aubin, J.P., Cellina, A.: Differential inclusions: set-valued maps and viability theory. In: Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 264. Springer, Berlin (1984). https://doi.org/10.1007/978-3-642-69512-4
Benaïm, M.: Dynamics of stochastic approximation algorithms. In: Séminaire de Probabilités, XXXIII, Lecture Notes in Mathematics, vol. 1709, pp. 1–68. Springer, Berlin (1999). https://doi.org/10.1007/BFb0096509
Benaïm, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions. SIAM J. Control Optim. (Electron.) 44(1), 328–348 (2005). https://doi.org/10.1137/S0363012904439301
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)
Bolte, J., Pauwels, E.: Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning. arXiv:1909.10300 (2019)
Bolte, J., Pauwels, E., Rios-Zertuche, R.: Long term dynamics of the subgradient method for Lipschitz path differentiable functions. arXiv:2006.00098 (2020)
Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. (2018). https://doi.org/10.1137/16M1080173
Clarke, F.H., Ledyaev, Y.S., Stern, R.J., Wolenski, P.R.: Nonsmooth analysis and control theory. In: Graduate Texts in Mathematics, vol. 178. Springer, New York (1998)
Coste, M.: An introduction to o-minimal geometry. Dottorato di Ricerca in Matematica, Dipartimento di Matematica, Università di Pisa. Istituti Editoriali e Poligrafici Internazionali (2000). https://books.google.fr/books?id=vK56IAAACAAJ
Davis, D., Drusvyatskiy, D., Kakade, S., Lee, J.D.: Stochastic subgradient method converges on tame functions. Found. Comput. Math. (2019). https://doi.org/10.1007/s10208-018-09409-5
van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84(2), 497–540 (1996). https://doi.org/10.1215/S0012-7094-96-08416-1
Duchi, J., Ruan, F.: Stochastic methods for composite and weakly convex optimization problems (2018)
Ioffe, A.D.: An invitation to tame optimization. SIAM J. Optim. 19(4), 1894–1917 (2009). https://doi.org/10.1137/080722059
Majewski, S., Miasojedow, B., Moulines, E.: Analysis of nonsmooth stochastic approximation: the differential inclusion approach. arXiv:1805.01916 (2018)
Rios-Zertuche, R.: Examples of pathological dynamics of the subgradient method for Lipschitz path-differentiable functions. arXiv:2007.11699 (2020)
Rockafellar, R.T., Wets, R.J.B.: Variational analysis. In: Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 317. Springer, Berlin (1998). https://doi.org/10.1007/978-3-642-02431-3
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)
Funding
This work is supported by the Région Ile-de-France.
Ethics declarations
Conflict of interest
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Schechtman, S. Stochastic proximal subgradient descent oscillates in the vicinity of its accumulation set. Optim Lett 17, 177–190 (2023). https://doi.org/10.1007/s11590-022-01884-8
DOI: https://doi.org/10.1007/s11590-022-01884-8