Abstract
Many optimization problems arising in machine learning can be cast as the minimization of the sum of two functions: the first typically represents the expected risk, which in practice is replaced by the empirical risk, while the second imposes a priori information on the solution. Since in general the first term is differentiable and the second is convex, proximal gradient methods are very well suited to such optimization problems. However, in large-scale machine learning applications the computation of the full gradient of the differentiable term can be prohibitively expensive, making these algorithms unsuitable. For this reason, proximal stochastic gradient methods have been extensively studied in the optimization literature over the last decades. In this paper we develop a proximal stochastic gradient algorithm based on two main ingredients: a technique to dynamically reduce the variance of the stochastic gradients along the iterative process, combined with a descent condition in expectation for the objective function, aimed at fixing the value of the steplength parameter at each iteration. For general objective functionals, the a.s. convergence of the limit points of the sequence generated by the proposed scheme to stationary points is proved. For convex objective functionals, both the a.s. convergence of the whole sequence of iterates to a minimum point and an \({\mathcal {O}}(1/k)\) convergence rate for the objective function values are shown. The practical implementation of the proposed method requires neither the computation of the exact gradient of the empirical risk during the iterations nor the tuning of an optimal steplength. An extensive numerical experimentation highlights that the proposed approach is robust with respect to the setting of the hyperparameters and competitive with state-of-the-art methods.
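As a rough illustration of the class of methods discussed in the abstract, a plain proximal stochastic gradient iteration with an \(\ell _1\) regularizer (whose proximal operator is soft-thresholding) can be sketched as follows. This is a generic toy sketch, not the algorithm proposed in the paper: the quadratic objective, the noise model, the steplength and the regularization weight are all arbitrary placeholders.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1, i.e. componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgd_step(x, stoch_grad, alpha, reg_weight):
    """One proximal stochastic gradient step: x+ = prox_{alpha R}(x - alpha g)."""
    return prox_l1(x - alpha * stoch_grad, alpha * reg_weight)

# Toy problem: F(x) = 0.5 * ||x - b||^2 plus an l1 regularizer with weight 0.1.
rng = np.random.default_rng(0)
b = np.array([1.0, -2.0, 0.05])
x = np.zeros(3)
for _ in range(200):
    g = (x - b) + 0.01 * rng.standard_normal(3)  # noisy gradient of F
    x = prox_sgd_step(x, g, alpha=0.1, reg_weight=0.1)
```

On this toy problem the iterates approach the soft-thresholded point \((0.9,-1.9,0)\), which is the exact minimizer of the regularized objective.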
Availability of Data and Materials
The datasets analysed during the current study are available in links given in the paper.
Change history
23 June 2023
A Correction to this paper has been published: https://doi.org/10.1007/s10915-023-02267-6
Notes
\(\Vert a-b\Vert ^2+\Vert b-c\Vert ^2-\Vert a-c\Vert ^2 = 2(a-b)^T(c-b), \quad \forall a,b,c \in {\mathbb {R}}^d\).
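This identity, used via Footnote 1 in the proof of Theorem 3, is easy to verify numerically; a minimal sketch with arbitrary random vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c = rng.standard_normal((3, 5))  # three arbitrary vectors in R^5

# ||a-b||^2 + ||b-c||^2 - ||a-c||^2 should equal 2 (a-b)^T (c-b)
lhs = np.dot(a - b, a - b) + np.dot(b - c, b - c) - np.dot(a - c, a - c)
rhs = 2.0 * np.dot(a - b, c - b)
assert np.isclose(lhs, rhs)
```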
References
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. Ser. A 137(1), 91–129 (2013)
Bertsekas, D.: Convex Optimization Theory, Chapter 6 on Convex Optimization Algorithms, pp. 251–489. Athena Scientific, Belmont (2009)
Berahas, A.S., Cao, L., Scheinberg, K.: Global convergence rate analysis of a generic line search algorithm with noise. SIAM J. Optim. 31(2), 1489–1518 (2021)
Bollapragada, R., Byrd, R., Nocedal, J.: Adaptive sampling strategies for stochastic optimization. SIAM J. Optim. 28(4), 3312–3343 (2018)
Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search based methods for nonsmooth optimization. SIAM J. Optim. 26, 891–921 (2016)
Bonettini, S., Porta, F., Prato, M., Rebegoldi, S., Ruggiero, V., Zanni, L.: Recent advances in variable metric first-order methods. In: Donatelli, M., Serra-Capizzano, S. (eds.) Computational Methods for Inverse Problems in Imaging. Springer INDAM Series, vol. 36, pp. 1–31 (2019)
Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
Bottou, L.: Online algorithms and stochastic approximations. In: Saad, D. (ed.) Online Learning and Neural Networks. Cambridge University Press, Cambridge (1998). https://leon.bottou.org/publications/pdf/online-1998.pdf
Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for machine learning. Math. Program. 134(1), 128–155 (2012)
Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., Burachik, R.S., Combettes, P.L., Elser, V., Luke, D.R., Wolkowicz, H. (eds.) Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications, pp. 185–212. Springer, New York (2011)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. SIAM Multiscale Model. Simul. 4, 1168–1200 (2005)
Duchi, J., Singer, Y.: Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res. 10, 2873–2898 (2009)
Franchini, G., Ruggiero, V., Zanni, L.: Ritz-like values in steplength selections for stochastic gradient methods. Soft. Comput. 24, 17573–17588 (2020)
Franchini, G., Ruggiero, V., Trombini, I.: Automatic steplength selection in stochastic gradient methods. Mach. Learn. Optim. Data Sci. LOD 2021, 4124–4132 (2021)
Freund, J.E.: Mathematical Statistics. Prentice-Hall, Englewood Cliffs (1962)
Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)
Iusem, A.N., Jofrè, A., Oliveira, R.I., Thompson, P.: Variance-based extragradient methods with line search for stochastic variational inequalities. SIAM J. Optim. 29(1), 175–206 (2019)
Le, T.V., Gopee, N.: Classifying CIFAR-10 images using unsupervised feature & ensemble learning. https://trucvietle.me/files/601-report.pdf
Paquette, C., Scheinberg, K.: A stochastic line search method with expected complexity analysis. SIAM J. Optim. 30(1), 349–376 (2020)
Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)
Poon, C., Liang, J., Schoenlieb, C.: Local convergence properties of SAGA/Prox-SVRG and acceleration. In: Proceedings of the 35th International Conference on Machine Learning, PMLR, vol. 80, pp. 4124–4132 (2018)
Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: ProxSARAH: an efficient algorithmic framework for stochastic composite nonconvex optimization. J. Mach. Learn. Res. 21, 1–48 (2020)
Poon, C., Liang, J., Schoenlieb, C.: Local convergence properties of SAGA/Prox-SVRG and acceleration. In: Dy, J., Krause, A (eds.) Proceedings of the 35th International Conference on Machine Learning, PMLR, Proceedings of Machine Learning Research, vol. 80, pp. 4124–4132 (2018)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)
Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1), 83–112 (2017)
Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost and momentum: faster stochastic variance reduction algorithms. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems, vol. 216, pp. 2406–2416. Curran Associates Inc. (2019)
Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057–2075 (2014)
Yang, Z., Wang, C., Zang, Y., Li, J.: Mini-batch algorithms with Barzilai-Borwein update step. Neurocomputing 314, 177–185 (2018)
Acknowledgements
The authors thank the anonymous referees for their careful reading and useful remarks and suggestions that improved the quality of the paper.
Funding
This work has been partially supported by the INdAM research group GNCS. The publication was created with the co-financing of the European Union-FSE-REACT-EU, PON Research and Innovation 2014–2020 DM1062/2021.
Author information
Contributions
All authors contributed equally to the study conception and design. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Proofs of Theorems for Sect. 2
To prove Theorems 1, 2, 3 and 4, Lemmas 1, 2 and 3 are needed. Lemma 1 recalls well-known results on the proximal operator (for the proof, see [5, 10] and references therein), while Lemma 3 is a classical result from stochastic analysis.
Lemma 1
Let \(\alpha >0,\; x\in \text{ dom } (P),\; u\in \mathbb {R}^d\). The following statements hold true.
a. \({\hat{y}}={\text {prox}}_{\alpha R}(x-\alpha u)\) if and only if \(\frac{1}{\alpha }(x-{\hat{y}})-u=w\), \(w\in \partial R({\hat{y}})\).
b. The function \(h_{\alpha }\) is strongly convex with modulus of convexity \(\displaystyle \frac{1}{\alpha }\).
c. \(h_\alpha (x; x) = 0\).
d. \(h_{\alpha }(p_{\alpha }(x); x)\le 0\) and \(h_{\alpha }(p_{\alpha }(x); x) = 0\) if and only if \(p_{\alpha }(x) = x\).
e. x is a stationary point for problem (1) if and only if \(x = p_{\alpha }(x)\) if and only if \(h_{\alpha }(p_{\alpha }(x);x) = 0\).
Lemma 2
Under Assumption 1 (i), consider the sequence \(\{x^{(k)}\}\) generated by the iteration (7). If \(\alpha _k>0\), the following inequality holds:
Proof
In view of (4), we have
Now, from the convexity of R at \({x}^{(k+1)}\) and \(\frac{x^{(k)}-{x}^{(k+1)}}{\alpha _k}-(\nabla F(x^{(k)})+e_g^{(k)})\in \partial R({x}^{(k+1)})\) (Lemma 1 a), we obtain
Including the above inequality in (A2), we obtain
Then, we have
By adding and subtracting \(\frac{\alpha _k}{2}\Vert e_g^{(k)}\Vert ^2\), we obtain
\(\square \)
Lemma 3
[20, Lemma 11] Let \(\nu _k\), \(u_k\), \(\alpha _k\), \(\beta _k\) be nonnegative random variables such that, almost surely,
\({\mathbb {E}}(\nu _{k+1}|{\mathcal {F}}_k)\le (1+\alpha _k)\nu _k-u_k+\beta _k, \qquad \sum _{k=0}^{\infty }\alpha _k<\infty , \qquad \sum _{k=0}^{\infty }\beta _k<\infty ,\)
where \({\mathbb {E}}(\nu _{k+1}|{\mathcal {F}}_k)\) denotes the conditional expectation for the given \(\nu _0,\dots ,\nu _k\), \(u_0,\dots ,u_k\), \(\alpha _0,\dots ,\alpha _k\), \(\beta _0,\dots ,\beta _k\). Then
\(\nu _k\rightarrow \nu \ \text{ a.s. } \quad \text{ and } \quad \sum _{k=0}^{\infty }u_k<\infty \ \text{ a.s. },\)
where \(\nu \ge 0\) is some random variable.
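A toy deterministic sequence satisfying the recursion \(\nu _{k+1}\le (1+\alpha _k)\nu _k-u_k+\beta _k\) with summable \(\alpha _k,\beta _k\) illustrates the conclusion of Lemma 3, namely the convergence of \(\nu _k\) and the summability of \(u_k\); the specific sequences below are arbitrary choices:

```python
import numpy as np

# Arbitrary summable perturbation sequences a_k, b_k and a nonnegative
# "descent" term u_k; the recursion mimics the hypothesis of Lemma 3.
K = 2000
a = 1.0 / np.arange(1, K + 1) ** 2
b = 1.0 / np.arange(1, K + 1) ** 2
nu = np.empty(K + 1)
nu[0] = 1.0
u_total = 0.0
for k in range(K):
    u_k = min(nu[k], 1.0 / (k + 1) ** 1.5)  # u_k <= nu_k keeps nu nonnegative
    nu[k + 1] = (1 + a[k]) * nu[k] - u_k + b[k]
    u_total += u_k
# nu settles down to a limit and the u_k accumulate to a finite total,
# as the lemma predicts.
```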
Proof of Theorem 1
In view of Assumption 1 (iii), \(P(x^{(k)})-P^*\) is a nonnegative random variable and, from (9), we obtain:
In view of (8) and Lemma 3, we obtain that \(P(x^{(k+1)})-P^*\longrightarrow {\overline{P}}\) a.s. and
In order to conclude the proof, we follow a strategy similar to the one employed in the proof of [23, Theorem 2.1]. Define a new random variable \(w_j = \sum _{k\ge j}{\mathbb {E}}\left( -h_{\alpha _k}(x^{(k+1)};x^{(k)})-{e_g^{(k)}}^T(x^{(k+1)}-x^{(k)})|{\mathcal {F}}_k\right) \). The sequence \(\{w_j\}\) is nonincreasing and converges to 0 as \(j\rightarrow +\infty \). As a consequence, from the monotone convergence theorem, it holds that
which implies
and, hence,
Then \(h_{\alpha _k}(x^{(k+1)};x^{(k)})+{e_g^{(k)}}^T(x^{(k+1)}-x^{(k)})\rightarrow 0\) a.s. \(\square \)
Proof of Theorem 2
We suppose that there exists a subsequence of \(\{x^{(k)}\}\) that converges a.s. to \({\bar{x}}\), namely there exists \({\mathcal {K}}\subseteq {\mathbb {N}}\) such that
We observe that, since \(h_{\alpha _k}\) is strongly convex with modulus of convexity \(\displaystyle \frac{1}{\alpha _{{max}}}\) and \(p_{\alpha _k}(x^{(k)})\) is its minimum point, we have
Setting \(z=x^{(k+1)}\) in the previous inequality gives
From the last inequality and Lemma 2, we have
and, consequently, by considering the conditional expectation in both members, we have
In view of the law of total expectation and the hypothesis on the sequence \(\{\varepsilon _k\}\), the above inequality allows us to state that
From (A5) and (A6) we can conclude that
Then there exists \({\mathcal {K}}'\subseteq {\mathcal {K}}\) such that \(\lim _{k\rightarrow \infty ,k\in {\mathcal {K}}'}(x^{(k+1)}-p_{\alpha _k}(x^{(k)}))=0\) a.s. By continuity of the operator \(p_{\alpha _k}(\cdot )\) with respect to all its arguments, since \(\{x^{(k)}\}_{k\in {\mathcal {K}}}\) is bounded a.s., \(\{p_{\alpha _k}(x^{(k)})\}_{k\in {\mathcal {K}}'}\) is bounded a.s. as well. Thus \(\{x^{(k+1)}\}_{k\in {\mathcal {K}}'}\) is also bounded a.s. and there exists a limit point \(\bar{{\bar{x}}}\) of \(\{x^{(k+1)}\}_{k\in {\mathcal {K}}'}\). We define \({\mathcal {K}}''\subseteq {\mathcal {K}}'\) such that \(\lim _{{k\rightarrow \infty , \, k\in {\mathcal {K}}''}}x^{(k+1)} = \bar{{\bar{x}}}\) a.s. By continuity of the operator \(p_{\alpha _k}(\cdot )\), (A7) implies that \(\bar{{\bar{x}}}=p_{\alpha _k}({\bar{x}})\) a.s.
Since \(h_{\alpha _k}(x;x^{(k)})+{e_g^{(k)}}^T(x-x^{(k)})\) is strongly convex with modulus of convexity \(\frac{1}{\alpha _{max}}\) as well and \(x^{(k+1)}\) is its minimum point, we have
By setting \(z=x^{(k)}\) in the previous inequality, we obtain
In view of Theorem 1 (ii), we can state that
Thus we proved that \({\bar{x}}=\bar{{\bar{x}}}=p_{\alpha _k}({\bar{x}})\) a.s. and by Lemma 1 e., we have that \({\bar{x}}\) is a stationary point a.s. \(\square \)
Proof of Theorem 3
Let \(x^*\in X^*\). Since \(\displaystyle \frac{x^{(k)}-x^{(k+1)}}{\alpha _k}-g^{(k)} \in \partial R(x^{(k+1)}), \) it holds that
It follows that, \(\forall y\in {\mathbb {R}}^d\),
and, hence, the following inequality holds
For \(y=x^*\) the previous inequality gives
As a consequence, we obtain the following relations:
where the second inequality follows from the convexity of F and the last inequality follows from the fact that \(P(x^{(k)})-P(x^*)\ge 0\). From a basic property of the Euclidean norm (see Footnote 1) we can write
Taking the conditional expectation with respect to the \(\sigma \)-algebra \({\mathcal {F}}_k\), we obtain
Since \(\alpha _k\in {\mathcal {F}}_{k+1}\) where \({\mathcal {F}}_k\subset {\mathcal {F}}_{k+1}\), in view of the tower property we obtain \({\mathbb {E}}\left( \alpha _k{e_g^{(k)}}^T(x^{(k)}-x^*)|{\mathcal {F}}_k\right) =0\) and we rewrite (A10) as
By combining (A11) and part i) of Theorem 1 together with Lemma 3, we can state that the sequence \(\{\Vert x^{(k)}-x^*\Vert \}_{k\in {\mathbb {N}}}\) converges a.s.
Next we prove the almost sure convergence of the sequence \(\{x^{(k)}\}\) by following a strategy similar to the one employed in [21, Theorem 2.1]. Let \(\{x_i^*\}_i\) be a countable subset of the relative interior \(\text {ri}(X^*)\) that is dense in \(X^*\). From the almost sure convergence of \(\Vert x^{(k)}-x^*\Vert \) for every \(x^*\in X^*\), we have that, for each i, \(\text {Prob}(\{\Vert x^{(k)}-x_i^*\Vert \} \ \text{ is } \text{ not } \text{ convergent})=0\). Therefore, we observe that
where the inequality follows from the union bound, i.e. for each i, \(\{\Vert x^{(k)}-x_i^*\Vert \}\) is a convergent sequence a.s. For a contradiction, suppose that there are convergent subsequences \(\{u_{k_j}\}_{k_j}\) and \(\{v_{k_j}\}_{k_j}\) of \(\{x^{(k)}\}\) which converge to their limiting points \(u^*\) and \(v^*\) respectively, with \(\Vert u^*-v^*\Vert =r>0\). By Theorem 2, \(u^*\) and \(v^*\) are stationary; in particular, since P is convex, they are minimum points, i.e. \(u^*,v^*\in X^*\). Since \(\{x^*_i\}_i\) is dense in \(X^*\), for any \(\epsilon >0\) there exist \(x^*_{i_1}\) and \(x^*_{i_2}\) such that \(\Vert x^*_{i_1}-u^*\Vert <\epsilon \) and \(\Vert x^*_{i_2}-v^*\Vert <\epsilon \). Therefore, for all \(k_j\) sufficiently large,
On the other hand, for sufficiently large j, we have
This contradicts the fact that \(\{\Vert x^{(k)}-x^*_{i_1}\Vert \}\) is convergent. Therefore, we must have \(u^*=v^*\); hence there exists \({\bar{x}}\in X^*\) such that \(x^{(k)}\longrightarrow {\bar{x}}\). \(\square \)
Proof of Theorem 4
If we do not neglect the term \(P(x^{(k)}) - P(x^*)\) in (A9) and in all the subsequent inequalities, instead of (A11) we obtain
Summing the previous inequality from 0 to K and taking the total expectation, we obtain
By neglecting the term \(- {\mathbb {E}}(\Vert x^{(K+1)}-x^*\Vert ^2)\) and bounding the second term by S (Theorem 1 (i))
we obtain
Setting \({\overline{x}}^{(K)}= \frac{1}{K+1} \sum _{k=0}^K x^{(k)}\), by Jensen’s inequality we observe that \( {\mathbb {E}}(P({\overline{x}}^{(K)}))\le \frac{1}{K+1} \sum _{k=0}^K {{\mathbb {E}}(P(x^{(k)}))}\). Thus, by dividing (A13) by \(K+1\), we can write
Thus, we obtain the \({\mathcal {O}}(1/K)\) ergodic convergence rate of \({\mathbb {E}}\left( P({\overline{x}}^{(K)})- P(x^*)\right) \).
Now, we assume \(\sum _{k=0}^\infty k \eta _k=\Sigma \). In (A13) the term \(\sum _{k=0}^K {\mathbb {E}}\left( P(x^{(k)})- P(x^*)\right) \) is equal to \({\mathbb {E}} \left( \sum _{k=0}^K P(x^{(k)}) \right) -(K+1) P(x^*) \). We observe that, since \(0\le P(x^{(0)})-P(x^*)\), we can write
Now we determine a lower bound for \({\mathbb {E}} \left( \sum _{k=1}^K P(x^{(k)})\right) \). From the inequality (8), we have that \({\mathbb {E}}\left( P(x^{(k)})-P(x^{(k+1)})|{\mathcal {F}}_k\right) +\eta _k\ge 0\) and, hence, by considering the total expectation we obtain \({\mathbb {E}}\left( P(x^{(k)})-P(x^{(k+1)})\right) +{\mathbb {E}}(\eta _k)\ge 0\). Thus, we have
Then, we can write
Consequently, we can conclude that
\(\square \)
Appendix B Hyperparameter Settings for Hybrid Methods
For the Prox-SVRG method we use the hyperparameter setting proposed in [27], i.e., \({\overline{N}}=1\), \(m=2N\), where m is the number of Prox-SVRG inner iterations. This means that a full gradient has to be computed every two epochs. As for the fixed steplength \({\overline{\alpha }}\), we tried the values suggested in the experimental part of [27], i.e., \(\alpha =\{ \frac{1}{{\hat{L}}},\frac{0.1}{{\hat{L}}},\frac{0.01}{{\hat{L}}}\}\), where \({\hat{L}}\) is an approximation of the Lipschitz constant L of \(\nabla F\). In Table we report the best steplength values obtained for all the test problems.
For the Prox-SARAH method we use the hyperparameter setting specified in [22] where, by borrowing the notation of the referred paper, \(q=2+0.01+(\frac{1}{100})\), \(C=\frac{q^2}{(q^2+8){\hat{L}}^2\gamma ^2}\) and the values for the other hyperparameters are shown in Table .
For the Prox-Spider-boost method we use the hyperparameter setting specified in [26] and the values for the hyperparameters are shown in Table .
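The settings stated above can be collected in a small helper for reproducibility. Here \({\hat{L}}\) and N are problem-dependent inputs, and the value of \(\gamma \) below is a placeholder: its actual value is fixed in the cited papers, not here.

```python
def hybrid_method_settings(L_hat, N, gamma=0.95):
    """Collect the hyperparameter choices described in this appendix.

    L_hat approximates the Lipschitz constant of grad F, N is the number
    of samples; gamma is a placeholder value (set in the cited papers).
    """
    prox_svrg = {
        "N_bar": 1,
        "m": 2 * N,  # inner iterations: a full gradient every two epochs
        "alpha_grid": [1.0 / L_hat, 0.1 / L_hat, 0.01 / L_hat],
    }
    q = 2 + 0.01 + 1.0 / 100  # as specified for Prox-SARAH above
    prox_sarah = {
        "q": q,
        "C": q**2 / ((q**2 + 8) * L_hat**2 * gamma**2),
    }
    return prox_svrg, prox_sarah

# Example with illustrative placeholder values of L_hat and N:
svrg, sarah = hybrid_method_settings(L_hat=10.0, N=1000)
```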
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Franchini, G., Porta, F., Ruggiero, V. et al. A Line Search Based Proximal Stochastic Gradient Algorithm with Dynamical Variance Reduction. J Sci Comput 94, 23 (2023). https://doi.org/10.1007/s10915-022-02084-3