Stochastic proximal subgradient descent oscillates in the vicinity of its accumulation set

  • Original Paper
Optimization Letters

Abstract

We analyze the stochastic proximal subgradient descent in the case where the objective functions are path differentiable and satisfy a Sard-type condition. While the accumulation set may not reduce to a single point, we show that the time the iterates take to move from one accumulation point to another goes to infinity. An oscillation-type behavior of the drift is also established. These results show a strong stability property of the proximal subgradient descent. Using the theory of closed measures, Bolte et al. (2020) established this type of behavior for the deterministic subgradient descent. Our proof technique relies on classical work on the stochastic approximation of differential inclusions, which allows us to extend the deterministic results to a stochastic and proximal setting, and to treat these different cases in a unified manner.
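To make the iteration concrete, here is a minimal NumPy sketch of the stochastic proximal subgradient update x_{k+1} = prox_{gamma_k g}(x_k - gamma_k v_k), where v_k is a noisy subgradient of the data term f. Everything specific here is an illustrative assumption, not the paper's setting: f(x) = (1/n) sum_i |a_i^T x - b_i| (a Lipschitz, semialgebraic, hence path differentiable loss), g = lam * ||x||_1 (the lasso penalty [17]), the step-size schedule gamma_k = gamma0 / (k+1)^0.6, the synthetic data, and all function names.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: componentwise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def objective(x, A, b, lam):
    """Illustrative F(x) = (1/n) sum_i |a_i^T x - b_i| + lam * ||x||_1."""
    return np.mean(np.abs(A @ x - b)) + lam * np.sum(np.abs(x))


def stochastic_prox_subgradient(A, b, lam, n_iter=5000, gamma0=0.5, power=0.6, seed=0):
    """Run x_{k+1} = prox_{gamma_k * g}(x_k - gamma_k * v_k) with g = lam * ||.||_1.

    v_k is a subgradient of the loss on a single uniformly sampled row, so its
    conditional mean is a subgradient of the data term f at x_k.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for k in range(n_iter):
        # Assumed step sizes: sum(gamma_k) = inf and sum(gamma_k**2) < inf.
        gamma = gamma0 / (k + 1) ** power
        i = rng.integers(n)  # sample one data point uniformly
        residual = A[i] @ x - b[i]
        # Subgradient of |a_i^T x - b_i|; np.sign(0) = 0 picks 0 at the kink.
        v = np.sign(residual) * A[i]
        # Forward (subgradient) step on f, then backward (proximal) step on g.
        x = soft_threshold(x - gamma * v, gamma * lam)
    return x


# Synthetic, noiseless example data (hypothetical, for illustration only).
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((200, 5))
x_true = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
b = A @ x_true
lam = 0.01

x_hat = stochastic_prox_subgradient(A, b, lam)
print("F(0)     =", objective(np.zeros(5), A, b, lam))
print("F(x_hat) =", objective(x_hat, A, b, lam))
```

The sketch returns only the last iterate; the results above concern the whole trajectory (how long the iterates take to travel between accumulation points), so studying them would mean recording every x_k rather than only the final one.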

Data availability

Not applicable.

References

  1. Aubin, J.P., Cellina, A.: Differential inclusions: set-valued maps and viability theory. In: Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 264. Springer, Berlin (1984). https://doi.org/10.1007/978-3-642-69512-4

  2. Benaïm, M.: Dynamics of stochastic approximation algorithms. In: Séminaire de Probabilités, XXXIII, Lecture Notes in Mathematics, vol. 1709, pp. 1–68. Springer, Berlin (1999). https://doi.org/10.1007/BFb0096509

  3. Benaïm, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions. SIAM J. Control Optim. (Electron.) 44(1), 328–348 (2005). https://doi.org/10.1137/S0363012904439301

  4. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)

  5. Bolte, J., Pauwels, E.: Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning. arXiv:1909.10300 (2019)

  6. Bolte, J., Pauwels, E., Rios-Zertuche, R.: Long term dynamics of the subgradient method for Lipschitz path differentiable functions. arXiv:2006.00098 (2020)

  7. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. (2018). https://doi.org/10.1137/16M1080173

  8. Clarke, F.H., Ledyaev, Y.S., Stern, R.J., Wolenski, P.R.: Nonsmooth analysis and control theory. In: Graduate Texts in Mathematics, vol. 178. Springer, New York (1998)

  9. Coste, M.: An introduction to O-minimal geometry. Dottorato di ricerca in matematica / Università di Pisa, Dipartimento di Matematica. Istituti editoriali e poligrafici internazionali (2000). https://books.google.fr/books?id=vK56IAAACAAJ

  10. Davis, D., Drusvyatskiy, D., Kakade, S., Lee, J.D.: Stochastic subgradient method converges on tame functions. Found. Comput. Math. (2019). https://doi.org/10.1007/s10208-018-09409-5

  11. van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84(2), 497–540 (1996). https://doi.org/10.1215/S0012-7094-96-08416-1

  12. Duchi, J., Ruan, F.: Stochastic methods for composite and weakly convex optimization problems (2018)

  13. Ioffe, A.D.: An invitation to tame optimization. SIAM J. Optim. 19(4), 1894–1917 (2009). https://doi.org/10.1137/080722059

  14. Majewski, S., Miasojedow, B., Moulines, E.: Analysis of nonsmooth stochastic approximation: the differential inclusion approach. arXiv:1805.01916 (2018)

  15. Rios-Zertuche, R.: Examples of pathological dynamics of the subgradient method for Lipschitz path-differentiable functions. arXiv:2007.11699 (2020)

  16. Rockafellar, R.T., Wets, R.J.B.: Variational analysis. In: Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 317. Springer, Berlin (1998). https://doi.org/10.1007/978-3-642-02431-3

  17. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)

Funding

This work is supported by the Région Île-de-France.

Author information

Correspondence to S. Schechtman.

Ethics declarations

Conflict of interest

Not applicable.

About this article

Cite this article

Schechtman, S. Stochastic proximal subgradient descent oscillates in the vicinity of its accumulation set. Optim Lett 17, 177–190 (2023). https://doi.org/10.1007/s11590-022-01884-8

  • DOI: https://doi.org/10.1007/s11590-022-01884-8
