Accelerated proximal stochastic variance reduction for DC optimization

He, Lulu; Ye, Jimin; E, 
Jianwei

doi:10.1007/s00521-021-06348-1

Accelerated proximal stochastic variance reduction for DC optimization

Review
Published: 06 August 2021

Volume 33, pages 13163–13181, (2021)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

Lulu He¹,
Jimin Ye¹ &
Jianwei E¹

517 Accesses
2 Citations
1 Altmetric
Explore all metrics

Abstract

In this article, we study an important class of stochastic difference-of-convex (SDC) programming whose objective is given in the form of the sum of a continuously differentiable convex function, a simple convex function and a continuous concave function. Recently, a proximal stochastic variance reduction difference-of-convex algorithm (Prox-SVRDCA) (Xu et al., 2019) is developed for this problem. And, Prox-SVRDCA reduces to the proximal stochastic variance reduction gradient (Prox-SVRG) (Xiao and Zhang, 2014) as the continuous concave function is disappeared, and hence Prox-SVRDCA is potentially slow in practice. Inspired by recently proposed acceleration techniques, an accelerated proximal stochastic variance reduction difference-of-convex algorithm (AProx-SVRDCA) is proposed. Different from Prox-SVRDCA, an extrapolation acceleration step that involves the latest two iteration points is incorporated in AProx-SVRDCA. The experimental results show that, for a fairly general choice of the extrapolation parameter, the acceleration will be achieved for AProx-SVRDCA. Then, a rigorous theoretical analysis is presented. We first show that any accumulation point of the generated iteration sequences is a stationary point of the objective function. Furthermore, different from the traditional convergence analysis in the existing nonconvex stochastic optimizations, a global convergence property with respect to the generated sequences is established under the assumption: the Kurdyka-Łojasiewicz property together with the continuity and differentiability of the concave part in objective function. To the best of our knowledge, this is the first time that the acceleration trick is incorporated into nonconvex nonsmooth SDC programming. Finally, extended experimental results show the superiority of our proposed algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems

Article 19 August 2020

Smoothing sample average approximation method for solving stochastic second-order-cone complementarity problems

Article Open access 10 April 2018

Proximal-like incremental aggregated gradient method with Bregman distance in weakly convex optimization problems

Article 29 May 2021

References

Gasso G, Rakotomamonjy A, Canu S (2009) Recovering sparse signals with a certain family of nonconvex penalties and DC programming. IEEE Trans Signal Process. 57(12):4686–4698
Article MathSciNet Google Scholar
Zhang S, Xin J (2014) Minimization of transformed $l_1$ penalty: Theory, difference of convex function algorithm, and robust application in compressed sensing. Math Program. 169(3):1–30
Google Scholar
Le Thi HA, Le HM, Pham Dinh T (2015) Feature selection in machine learning: an exact penalty approach using a difference of convex function algorithm. Mach Learn. 101(1–3):163–186
Article MathSciNet Google Scholar
PhamDinh T, LeThi HA (1997) Convex analysis approach to DC programming: theory, algorithms and applications, Acta Math Vietnam 22 (1): 289–355
MathSciNet MATH Google Scholar
Wen B, Chen X, Pong T (2018) A proximal difference-of-convex algorithm with extrapolation. Comput Optimiz Appl 69(2):297–324
Article MathSciNet Google Scholar
PhamDinh T, LeThi HA (1998) Optimization algorithm for solving the trust-region subproblem, SIAM J Optimiz 8 (2): 476–505
Article MathSciNet Google Scholar
Ahn M, Pang J, Xin J (2017) Difference-of-convex learning: directional stationarity, optimality, and sparsity, SIAM J Optimiz 27 (3): 1637–1665
Article MathSciNet Google Scholar
LeThi HA, PhamDinh T (2018) Dc programming and DCA: thirty years of developments, Math Program 169: 5–68
Article MathSciNet Google Scholar
Pham DN (2016) DCA based algorithms for learning with sparsity in high dimensional setting and stochastical learning, Ph. D. thesis, University of Lorraine
Le Thi HA, Le HM, Phan DN, et al (2017) Stochastic DCA for sparse multiclass logistic regression. In: Advances in Intelligent Systems and Computing
Le Thi HA, Le HM, Phan DN, et al (2017) Stochastic DCA for the large-sum of non-convex functions problem and its application to group variable selection in classification. In: International Conference on Machine Learning
Le Thi HA, Huynh VN, Pham Dinh T (2019) Stochastic difference-of-convex algorithms for solving nonconvex optimization problems. arXiv:1911.04334v1
Nemirovski A, Juditsky A, Lan G et al (2009) Robust stochastic approximation approach to stochastic programming. SIAM J Optim 19(4):1574–1609
Roux N, Schmidt M, Bach F (2013) A stochastic gradient method with an exponential convergence rate for finite training sets. Adv Neural Inform Process Syst 4:2663–2671
Google Scholar
Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Advance in Neural Information Processing Systems, pp 315–323
Defazio A, Bach F, Julien S (2014) SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. AAdv Neural Inform Process Syst 2:1646–1654
Google Scholar
Xiao L, Zhang T (2014) A proximal stochastic gradient method with progressive variance reduction. SIAM J Optimiz 24(4):2057–2075
Article MathSciNet Google Scholar
Xu Y, Qi Q, Lin Q, et al (2019) Stochastic optimization for DC functions and non-smooth non-convex regularizers with non-asymptotic convergence. arXiv:1811.11829v2
Nguyen L, Liu J, Scheinberg K, et al (2017) Stochastic recursive gradient algorithm for nonconvex optimization. arXiv:1705.07261v1
Nguyen L, Scheinberg K, Taká M (2021) Inexact sarah algorithm for stochastic optimization. Optimiz Methods Softw 36(1):237–258
Article MathSciNet Google Scholar
Lei L, Jordan M (2017) Less than a single pass: Stochastically controlled stochastic gradient. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol. 54, pp 148–156
Lu Q, Liao X, Li H, Huang T (2020) A computation-efficient decentralized algorithm for composite constrained optimization, IEEE Trans Signal Inform Process over Netw 6: 774–789
Article MathSciNet Google Scholar
Lin Y, Jin X, Chen J et al (2019) An analytic computation-driven algorithm for decentralized multicore systems. Future Gene Comput Syst. 96:101–110
Article Google Scholar
Sodhro AH, Pirbhulal S, de Albuquerque VHC (2019) Artificial intelligence-driven mechanism for edge computing-based industrial applications. IEEE Trans Ind Inform. 15(7):4235–4243
Article Google Scholar
Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM J Imaging Sci 2(1):183–202
Article MathSciNet Google Scholar
Allen-Zhu Z (2017) Katyusha: The first direct acceleration of stochastic gradient methods. In: Acm Sigact Symposium on Theory of Computing
Zhou K (2018) Direct acceleration of SAGA using sampled negative momentum. arXiv:1806.11048v4
Nitanda A (2014) Stochastic proximal gradient descent with acceleration techniques. In: Advances in Neural Information Processing Systems
Allen-Zhu Z (2018) Katyusha X: Practical momentum method for stochastic sum-of-nonconvex optimization. In: Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80
Driggs D, Ehrhardt M et al. Accelerating variance-reduced stochastic gradient methods. Mathematical Programming. https://doi.org/10.1007/s10107-020-01566-2
Lan G, Zhou Y (2018) Random gradient extrapolation for distributed and stochastic optimization. SIAM J Optimiz 28(4):2753–2782
Article MathSciNet Google Scholar
Nesterov Y (2004) Introductory lectures on convex optimizaton: a basic course. Applied Optimization. vol. 87. Kluwer Academic Publishers. London
Book Google Scholar
Attouch H, Bolte J, Redont P et al (2010) Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the kurdyka-lojasiewicz inequality. Mathematics of Operations Research. 35:438–457
Article Google Scholar
Bolte J, Sabach S, Teboulle M (2014) Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math Program 146:459–494
Article MathSciNet Google Scholar
Parikh N, Boyd S (2013) Proximal algorithms. Found Trends Optimiz 1(3):123–231
Google Scholar
Shang F, Jiao L, Zhou K et al (2018) ASVRG: Accelerated Proximal SVRG. In: Proceedings of Machine Learning Research, vol 95, pp 815–830
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties.J Am Stat Assoc 96(456):1348–1360
Article MathSciNet Google Scholar
Gong P, Zhang C, Lu Z, et al (2013) A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: Proceedings of the 30th International Conference on Machine Learning, pp 37–45
Pankratov EL, Spagnolo B (2005) Optimization of impurity profile for p-n-junctionin heterostructures.Eur Phys J B 46:15–19
Article Google Scholar
Giuffrida A, Valenti D, Ziino G et al (2009) A stochastic interspecific competition model to predict the behaviour of listeria monocytogenes in the fermentation process of a traditional sicilian salami. Eur Food Res Technol. 228:767–775
Article Google Scholar
Denaro G, Valenti D, La Cognata A et al (2013) Spatio-temporal behaviour of the deep chlorophyll maximum in mediterranean sea: development of a stochastic model for picophytoplankton dynamics. Ecol Complex 13:21–34
Article Google Scholar
Denaro G, Valenti D, Spagnolo B et al (2013) Dynamics of two picophytoplankton groups in mediterranean sea: analysis of the deep chlorophyll maximum by a stochastic advection-reaction-diffusion model. Plos One 8:e66765
Article Google Scholar
Pizzolato N, Fasconaro A, Adorno DP et al (2010) Resonant activation in polymer translocation: new insights into the escape dynamics of molecules driven by an oscillating field. Phys Biol. 7(3):034001
Article Google Scholar
Falci G, La Cognata A, Berritta M et al (2013) Design of a lambda system for population transfer in superconducting nanocircuits. Phys Rev B. 87(13):214515
Article Google Scholar
Mikhaylov AN, Gryaznov EG, Belov AI et al (2016) Field- and irradiation-induced phenomena in memristive nanomaterials. Physica Status Solidi 13:870–881
Google Scholar
Carollo A, Spagnolo B, Valenti D (2018) Uhlmann curvature in dissipative phase transitions. Sci Rep 8:9852
Article Google Scholar
Spagnolo B, Valenti D (2008) Volatility effects on the escape time in financial market models.Int J Bifurcation and Chaos 18(9):2775–2786
Article Google Scholar

Download references

Funding

This work is supported in part by the National Nature Science Foundation of China under Grant 61573014, and in part by the Fundamental Research Funds for the Central Universities under Grant JB210717.

Author information

Authors and Affiliations

School of Mathematics and Statistics, Xi’dian University, Xi’an, 710071, China
Lulu He, Jimin Ye & Jianwei E

Authors

Lulu He
View author publications
You can also search for this author in PubMed Google Scholar
Jimin Ye
View author publications
You can also search for this author in PubMed Google Scholar
Jianwei E
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jimin Ye.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

The proof of Lemma 3

Proof

By taking expectation with respect to $i_k$, and noting that $i_k$ is dependent of $x^s_k, x^s_{k-1}$ and ${\tilde{x}}^{s-1}$, we obtain

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{i_k}||v^s_k-\nabla f(y^s_k)||^2\\&\quad ={\mathbb {E}}_{i_k}||\nabla f_{i_k}(y^s_k)-\nabla f_{i_k}({\tilde{x}}^{s-1}) +\nabla f({\tilde{x}}^{s-1})-\nabla f(y^s_k)||^2 \\&\quad \le {\mathbb {E}}_{i_k}||\nabla f_{i_k}(y^s_k)-\nabla f_{i_k}({\tilde{x}}^{s-1})||^2 \\&\quad \le L^2||y^s_k-{\tilde{x}}^{s-1}||^2 \\&\quad \le 2L^2||x^s_k-{\tilde{x}}^{s-1}||^2+2L^2\beta ^2_k||x^s_k-x^s_{k-1}||^2 \end{aligned} \end{aligned}$$

The first inequality holds by ${\mathbb {E}}[X-{\mathbb {E}}[X]]^2\le {\mathbb {E}}[X^2]$, X is a random variable. The second inequality follows from $\nabla f_i$ is Lipschitz continuously. The last inequality comes from the definition of $y^s_k$ and Cauchy–Schwarz Inequality. This completes the proof. $\square$

The proof of Lemma 4

Proof

Firstly, it follows from the strong convexity of minimization (2) that

$$\begin{aligned} \begin{aligned}&R_1(x^s_{k+1})+\frac{L}{2}||x^s_{k+1}-y^s_k||^2+\langle v^s_k -\partial R_2(x^s_k),x^s_{k+1}-x^s_k\rangle \\&\quad \le R_1(x^s_k)+\frac{L}{2}||x^s_k-y^s_k||^2-\frac{L}{2}||x^s_{k+1}-x^s_k||^2 \end{aligned} \end{aligned}$$

(16)

Then, we have

$$\begin{aligned} \begin{aligned}&F(x^s_{k+1}) \\&\quad = f(x^s_{k+1})+R_1(x^s_{k+1})-R_2(x^s_{k+1}) \\&\quad \le f(y^s_k)+\langle \nabla f(y^s_k)-v^s_k,x^s_{k+1}-x^s_k\rangle \\&\qquad +\frac{L}{2}||x^s_{k+1}-y^s_k||^2+R_1(x^s_{k+1}) \\&\qquad - R_2(x^s_k)+\langle \nabla f(y^s_k),x^s_k-y^s_k\rangle +\langle v^s_k -\partial R_2(x^s_k),x^s_{k+1}-x^s_k\rangle \\&\quad \le f(x^s_k)+\langle \nabla f(y^s_k)-v^s_k,x^s_{k+1}-x^s_k\rangle \\&\qquad +\frac{L}{2}||x^s_{k+1}-y^s_k||^2+R_1(x^s_{k+1}) \\&\qquad - R_2(x^s_k)+\langle v^s_k-\partial R_2(x^s_k),x^s_{k+1}-x^s_k\rangle \\&\quad \le F(x^s_k)+\langle \nabla f(y^s_k)-v^s_k,x^s_{k+1}-x^s_k\rangle \\&\qquad +\frac{L\beta ^2_k}{2}||x^s_k-x^s_{k-1}||^2 \\&\qquad -\frac{L}{2}||x^s_{k+1}-x^s_k||^2 \end{aligned} \end{aligned}$$

(17)

where the first inequality follows from the assumptions that f is L-smooth and the convexity of $R_2$. The second inequality comes from the assumption that the convexity of f. The last inequality holds by (15). To reflect the variance induced by stochastic sampling, we performing the following scaling on the second term of the last inequality in (16).

$$\begin{aligned} \begin{aligned}&\langle \nabla f(y^s_k)-v^s_k,x^s_{k+1}-x^s_k\rangle \\&\quad = \langle \nabla f(y^s_k)-v^s_k,x^s_{k+1}-{\bar{x}}^s_{k+1}\rangle \\&\qquad +\langle \nabla f(y^s_k)-v^s_k,{\bar{x}}^s_{k+1}-x^s_k\rangle \\&\quad \le ||\nabla f(y^s_k)-v^s_k||\cdot ||x^s_{k+1}-{\bar{x}}^s_{k+1}||\\&\qquad +\langle \nabla f(y^s_k)-v^s_k,{\bar{x}}^s_{k+1}-x^s_k\rangle \\&\quad \le \frac{1}{L}||\nabla f(y^s_k)-v^s_k||^{2}+\langle \nabla f(y^s_k) -v^s_k,{\bar{x}}^s_{k+1}-x^s_k\rangle \\ \end{aligned} \end{aligned}$$

(18)

where ${\bar{x}}^s_{k+1}=prox_{\frac{1}{L}R_1}\big (y^s_k-\frac{1}{L}(\nabla f(y^s_k)-\partial R_2(x^s_k))\big )$. The first inequality holds by Cauchy–Schwarz Inequality. The second inequality holds by the nonexpansiveness property of proximal operator. Note that $i_k$ is independent of $x^s_{k_1}, x^s_k$, and ${\tilde{x}}^{s-1}$. By substituting (17) in (16) and taking expectation over $i_k$ we have

$$\begin{aligned} \begin{aligned} {\mathbb {E}}_{i_k}[F(x^s_{k+1})]&\le F(x^s_k)+\frac{1}{L}{\mathbb {E}}_{i_k} ||\nabla f(y^s_k)-v^s_k||^2\\&\quad +\frac{L\beta ^2_k}{2}||x^s_k-x^s_{k-1}||^2 -\frac{L}{2}{\mathbb {E}}_{i_k}||x^s_{k+1}-x^s_k||^2 \\&\le F(x^s_k)+2L||x^s_k-{\tilde{x}}^{s-1}||^2\\&\quad +\frac{5L\beta ^2_k}{2} ||x^s_k-x^s_{k-1}||^2-\frac{L}{2}{\mathbb {E}}_{i_k}||x^s_{k+1}-x^s_k||^2 \end{aligned} \end{aligned}$$

where in the second inequality we use Lemma 3. This completes the proof. $\square$

The proof of Lemma 5

Proof

By Lemma 4, we have the following inequality

$$\begin{aligned} \begin{aligned} {\mathbb {E}}_{i_k}[F(x^s_{k+1})]&\le F(x^s_k)+2L||x^s_k-{\tilde{x}}^{s-1}||^2\\&\quad +\frac{5L\beta ^2_k}{2}||x^s_k -x^s_{k-1}||^2-\frac{L}{2}{\mathbb {E}}_{i_k}||x^s_{k+1}-x^s_k||^2 \end{aligned} \end{aligned}$$

By adding $c_{k+1}{\mathbb {E}}_{i_k}[||x^s_{k+1}-x^s_k||^2+||x^s_{k+1}-{\tilde{x}}^{s-1}||^2]$ on both sides of this inequality,

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{i_k}[F(x^s_{k+1})+c_{k+1}\big (||x^s_{k+1}-x^s_k||^2+||x^s_{k+1} -{\tilde{x}}^{s-1}||^2\big )] \\&\quad \le F(x^s_k)+2L||x^s_k -{\tilde{x}}^{s-1}||^2+\frac{5L\beta ^2_k}{2}||x^s_k-x^s_{k-1}||^2 \\&\qquad +(c_{k+1}-\frac{L}{2}){\mathbb {E}}_{i_k}||x^s_{k+1}-x^s_k||^2 +c_{k+1}{\mathbb {E}}_{i_k}||x^s_{k+1}-{\tilde{x}}^{s-1}||^2 \end{aligned} \end{aligned}$$

(19)

We scale the last term on the right hand side of the above inequality,

$$\begin{aligned} \begin{aligned}&c_{k+1}{\mathbb {E}}_{i_k}||x^s_{k+1}-{\tilde{x}}^{s-1}||^2 \\&\quad = c_{k+1}[{\mathbb {E}}_{i_k}||x^s_{k+1}-x^s_k||^2\\&\qquad +2{\mathbb {E}}_{i_k}\langle x^s_{k+1}-x^s_k,x^s_k-{\tilde{x}}^{s-1}\rangle +||x^s_k-{\tilde{x}}^{s-1}||^2] \\&\quad \le c_{k+1}[{\mathbb {E}}_{i_k}||x^s_{k+1}-x^s_k||^2+\frac{1}{a} {\mathbb {E}}_{i_k}||x^s_{k+1}-x^s_k||^2\\&\qquad +a||x^s_k-{\tilde{x}}^{s-1}|^2+||x^s_k -{\tilde{x}}^{s-1}||^2] \\&\quad = c_{k+1}(1+\frac{1}{a}){\mathbb {E}}_{i_k}||x^s_{k+1}-x^s_k||^2\\&\qquad +c_{k+1}(1+a)||x^s_k-{\tilde{x}}^{s-1}||^2 \end{aligned} \end{aligned}$$

(20)

where $a>0$, the inequality comes from Cauchy–Schwarz Inequality and the Young’s Inequality. Substituting (19) in (18), then we have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{i_k}[F(x^s_{k+1})+c_{k+1}\big (||x^s_{k+1}-x^s_k||^2 +||x^s_{k+1}-{\tilde{x}}^{s-1}||^2\big )] \\&\quad \le F(x^s_k)+\frac{5L\beta ^2_k}{2}||x^s_k-x^s_{k-1}||^2\\&\qquad +[2L +c_{k+1}(1+a)]||x^s_k-{\tilde{x}}^{s-1}||^2 \\&\qquad +\left[ c_{k+1}\left( 2+\frac{1}{a}\right) -\frac{L}{2}\right] {\mathbb {E}}_{i_k}||x^s_{k+1}-x^s_k||^2 \\&\quad = F(x^s_k)+c_k[||x^s_k-x^s_{k-1}||^2+||x^s_k-{\tilde{x}}^{s-1}||^2]\\&\qquad -[2L+c_{k+1}(1+a)]||x^s_k-x^s_{k-1}||^2 \\&\qquad -\frac{5L\beta ^2_k}{2}||x^s_k-{\tilde{x}}^{s-1}||^2-\left[ \frac{L}{2} -c_{k+1}\left( 2+\frac{1}{a}\right) \right] \\&\qquad {\mathbb {E}}_{i_k}||x^s_{k+1}-x^s_k||^2 \end{aligned} \end{aligned}$$

(21)

By definition of $c_k$ and $c_k\le \frac{La}{2(1+2a)}$, $\forall ~k\in \{0,1,\ldots ,m-1\}$ in each epoch s

$$\begin{aligned} G^s_{k+1}\le G^s_k \end{aligned}$$

On the other hand, by $x^s_{-1}=x^s_0={\tilde{x}}^{s-1}=x^{s-1}_m$, for any s

$$\begin{aligned} \begin{aligned} G^s_m&={\mathbb {E}}[F(x^s_m)+c_m(||x^s_m-x^s_{m-1}||^2+||x^s_m-{\tilde{x}}^{s-1}||^2)] \\&\le {\mathbb {E}}[F(x^s_0)+c_0(||x^s_0-x^s_{-1}||^2+||x^s_0-{\tilde{x}}^{s-1}||^2] \\&={\mathbb {E}}[F({\tilde{x}}^{s-1})] \\&\le {\mathbb {E}}[F(x^{s-1}_m)+c_m(||x^{s-1}_m-x^{s-1}_{m-1}||^2 +||x^{s-1}_m-{\tilde{x}}^{s-2}||^2] \\&= G^{s-1}_m \end{aligned} \end{aligned}$$

(22)

Therefore, the sequence $\{G^s_k\}$ is nonincreasing $\forall k\in \{0,1,\ldots ,m-1\}$ and s. This together with the fact that it is also bounded from below by $\min f$. So, we can obtain that $\{G^s_k\}$ is convergent.

Next, we will present an upper bound on $c_0$. In view of the definition of $c_{k+1}$, we have

$$\begin{aligned} \begin{aligned} c_{k+1}&=\frac{1}{1+a}c_k-\frac{1}{1+a}\left( \frac{5L\beta ^2_k}{2}+2L\right) \\&=\frac{1}{1+a}\left[ \frac{1}{1+a}c_{k-1}-\frac{1}{1+a}\left( \frac{5L\beta ^2_k}{2} +2L\right) \right] \\&\quad -\frac{1}{1+a}\left( \frac{5L\beta ^2_k}{2}+2L\right) \\&=\frac{1}{(1+a)^2}c_{k-1} -\left[ \frac{1}{1+a}+\frac{1}{(1+a)^2}\right] \left( \frac{5L\beta ^2_k}{2}+2L\right) \\&=...... \\&=\frac{1}{(1+a)^{k+1}}c_0-\left[ \frac{1}{1+a}+... +\frac{1}{(1+a)^{k+1}}\right] \\&\quad \left( \frac{5L\beta ^2_k}{2}+2L\right) \\&=\frac{1}{(1+a)^{k+1}}c_0-\frac{1-\left( \frac{1}{1+a} \right) ^{k+1}}{a}\left( \frac{5L\beta ^2_k}{2}+2L\right) \end{aligned} \end{aligned}$$

Because of $c_m=0$, that is, when $k+1=m$, we have

$$\begin{aligned} c_m=0=\frac{1}{(1+a)^m}c_0-\frac{1 -\left( \frac{1}{1+a}\right) ^m}{a}\left( \frac{5L\beta ^2_k}{2}+2L\right) \end{aligned}$$

when $m=\frac{1}{a}$ and $a\rightarrow 0$, we have

$$\begin{aligned} c_0=\frac{(1+a)^m-1}{a}\left( \frac{5L\beta ^2_k}{2}+2L\right) \rightarrow \frac{e-1}{a}\left( \frac{5L\beta ^2_k}{2}+2L\right) \end{aligned}$$

As a result, we can obtain that $c_0\le \frac{e-1}{a}(\frac{5L\beta ^2_k}{2}+2L)$. There we use $(1+a)^\frac{1}{a}\rightarrow e$, when $a\rightarrow 0$. This completes the proof. $\square$

Rights and permissions

Reprints and permissions

About this article

Cite this article

He, L., Ye, J. & E, . Accelerated proximal stochastic variance reduction for DC optimization. Neural Comput & Applic 33, 13163–13181 (2021). https://doi.org/10.1007/s00521-021-06348-1

Download citation

Received: 22 November 2020
Accepted: 18 July 2021
Published: 06 August 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s00521-021-06348-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Accelerated proximal stochastic variance reduction for DC optimization

Abstract

Access this article

Similar content being viewed by others

Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems

Smoothing sample average approximation method for solving stochastic second-order-cone complementarity problems

Proximal-like incremental aggregated gradient method with Bregman distance in weakly convex optimization problems

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendices

Appendix

The proof of Lemma 3

Proof

The proof of Lemma 4

Proof

The proof of Lemma 5

Proof

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Accelerated proximal stochastic variance reduction for DC optimization

Abstract

Access this article

Similar content being viewed by others

Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems

Smoothing sample average approximation method for solving stochastic second-order-cone complementarity problems

Proximal-like incremental aggregated gradient method with Bregman distance in weakly convex optimization problems

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendices

Appendix

The proof of Lemma 3

Proof

The proof of Lemma 4

Proof

The proof of Lemma 5

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation