Accelerating incremental gradient optimization with curvature information

Abstract

This paper studies an acceleration technique for the incremental aggregated gradient (IAG) method through the use of curvature information for solving strongly convex finite sum optimization problems. These optimization problems arise in large-scale learning applications. Our technique utilizes a curvature-aided gradient tracking step to produce accurate gradient estimates incrementally using Hessian information. We propose and analyze two methods utilizing the new technique, the curvature-aided IAG (CIAG) method and the accelerated CIAG (A-CIAG) method, which are analogous to the gradient method and Nesterov’s accelerated gradient method, respectively. Setting \(\kappa\) to be the condition number of the objective function, we prove R-linear convergence rates of \(1 - \frac{4c_0 \kappa }{(\kappa +1)^2}\) for the CIAG method and \(1 - \sqrt{\frac{c_1}{2\kappa }}\) for the A-CIAG method, where \(c_0,c_1 \le 1\) are constants inversely proportional to the distance between the initial point and the optimal solution. When the initial iterate is close to the optimal solution, the R-linear convergence rates match those of the gradient and accelerated gradient methods, even though CIAG and A-CIAG operate in an incremental setting with strictly lower computation complexity. Numerical experiments confirm our findings. The source codes used for this paper can be found at http://github.com/hoitowai/ciag/.
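
To make the curvature-aided gradient tracking step concrete, the following is a minimal Python sketch of a CIAG-style update (it is not the released implementation linked above; the oracles grad_f and hess_f, the cyclic component order, and the use of NumPy arrays are illustrative assumptions). The aggregates b and H store the gradient and Hessian information from the last visit to every component, so that b + H @ theta reproduces the curvature-aided gradient estimate while only one component is refreshed per iteration:

def ciag(grad_f, hess_f, theta0, m, gamma, n_iters):
    # theta0 is a NumPy array; grad_f(i, t) and hess_f(i, t) return the gradient and
    # Hessian of the i-th component at t.  tau[i] is the iterate at the last visit to f_i.
    theta = theta0.copy()
    tau = [theta0.copy() for _ in range(m)]
    b = sum(grad_f(i, tau[i]) - hess_f(i, tau[i]) @ tau[i] for i in range(m))
    H = sum(hess_f(i, tau[i]) for i in range(m))
    for k in range(n_iters):
        i = k % m                                   # visit one component per iteration
        b -= grad_f(i, tau[i]) - hess_f(i, tau[i]) @ tau[i]
        H -= hess_f(i, tau[i])
        tau[i] = theta.copy()                       # refresh component i at the current iterate
        b += grad_f(i, tau[i]) - hess_f(i, tau[i]) @ tau[i]
        H += hess_f(i, tau[i])
        theta = theta - gamma * (b + H @ theta)     # curvature-aided gradient surrogate step
    return theta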

References

  1. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015)

  2. Arjevani, Y., Shamir, O.: Dimension-free iteration complexity of finite sum optimization problems. In: Advances in Neural Information Processing Systems 29, pp. 3540–3548 (2016)

  3. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)

  4. Bertsekas, D.P.: Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. Optim. Mach. Learn. 2010(1–38), 3 (2011)

  5. Blatt, D., Hero, A.O., Gauchman, H.: A convergent incremental gradient method with a constant step size. SIAM J. Optim. 18(1), 29–51 (2007)

  6. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)

  7. Bubeck, S., et al.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015)

  8. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)

  9. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems 27, pp. 1646–1654 (2014)

  10. Feyzmahdavian, H.R., Aytekin, A., Johansson, M.: A delayed proximal gradient method with linear convergence rate. In: IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. IEEE (2014)

  11. Gower, R.M., Roux, N.L., Bach, F.: Tracking the gradients using the Hessian: a new look at variance reducing stochastic methods. In: AISTATS (2018)

  12. Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.: A globally convergent incremental Newton method. Math. Program. 151(1), 283–313 (2015)

  13. Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.: Why random reshuffling beats stochastic gradient descent. Math. Program. https://doi.org/10.1007/s10107-019-01440-w (2019)

  14. Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.: On the convergence rate of incremental aggregated gradient algorithms. SIAM J. Optim. 27(2), 1035–1048 (2017)

  15. Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Math. Program. 171, 167–215 (2018)

  16. Mairal, J.: Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM J. Optim. 25(2), 829–855 (2015)

  17. Mokhtari, A., Eisen, M., Ribeiro, A.: IQN: an incremental quasi-Newton method with local superlinear convergence rate. SIAM J. Optim. 28(2), 1670–1698 (2018)

  18. Nedić, A., Bertsekas, D.P.: Convergence rate of incremental subgradient algorithms. In: Uryasev, S., Pardalos, P.M. (eds.) Stochastic Optimization: Algorithms and Applications. Applied Optimization, vol. 54. Springer, Boston (2001)

  19. Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM J. Optim. 12(1), 109–138 (2001)

  20. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer, Berlin (2013)

  21. Nitanda, A.: Stochastic proximal gradient descent with acceleration techniques. In: Advances in Neural Information Processing Systems, pp. 1574–1582 (2014)

  22. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

  23. Rodomanov, A., Kropotov, D.: A superlinearly-convergent proximal Newton-type method for the optimization of finite sums. In: International Conference on Machine Learning, pp. 2597–2605 (2016)

  24. Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1–2), 83–112 (2017)

  25. Schmidt, M., Roux, N.L., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems 24, pp. 1458–1466 (2011)

  26. So, A.M.C., Zhou, Z.: Non-asymptotic convergence analysis of inexact gradient methods for machine learning without strong convexity. Optim. Methods Softw. 32(4), 963–992 (2017)

  27. Vanli, N.D., Gürbüzbalaban, M., Ozdaglar, A.: A stronger convergence result on the proximal incremental aggregated gradient method. arXiv preprint arXiv:1611.08022 (2016)

  28. Vapnik, V.N.: An overview of statistical learning theory. IEEE Trans. Neural Netw. 10(5), 988–999 (1999)

  29. Wai, H.T., Shi, W., Nedić, A., Scaglione, A.: Curvature-aided incremental aggregated gradient method. In: Proceedings of Allerton (2017)

  30. Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057–2075 (2014)

  31. Zheng, S., Meng, Q., Wang, T., Chen, W., Yu, N., Ma, Z.M., Liu, T.Y.: Asynchronous stochastic gradient descent with delay compensation. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 4120–4129. JMLR.org (2017)

Acknowledgements

This work has been partially supported by the NSF Grant CCF-1717391 and CUHK Direct Grant #4055113.

Corresponding author

Correspondence to Hoi-To Wai.

Additional information

In memory of Dr. Wei Shi, a respected friend and talented scholar.

Appendices

A Proof of Proposition 3

Let us express the gradient error as

$${{{\varvec{e}}}}_{\textsf{CIAG}}^k = \sum _{i=1}^m \left( {\nabla }f_i ( {\varvec{\theta }}^{\tau _i^k} ) + {\nabla }^2 f_i ( {\varvec{\theta }}^{\tau _i^k} ) ( {\varvec{\theta }}^k - {\varvec{\theta }}^{\tau _i^k} ) - {\nabla }f_i ( {\varvec{\theta }}^k ) \right)$$
(59)

Applying Lemma 1:

$$\begin{aligned} \begin{aligned}&\Vert {{{\varvec{e}}}}_{\textsf{CIAG}}^k \Vert \le \sum _{i=1}^m \frac{L_{H,i}}{2} \Vert {\varvec{\theta }}^{\tau _i^k} - {\varvec{\theta }}^k \Vert ^2 \le \sum _{i=1}^m \frac{L_{H,i}}{2} \underbrace{(k - \tau _i^k)}_{\le K} \sum _{j=\tau _i^k}^{k-1} \Vert {\varvec{\theta }}^{j+1} - {\varvec{\theta }}^j \Vert ^2 \\&\quad \le \frac{K L_H}{2} \sum _{j= (k-K)_{++}}^{k-1} \Vert {\varvec{\theta }}^{j+1} - {\varvec{\theta }}^j \Vert ^2 \le \frac{K L_H}{2} \gamma ^2 \sum _{j=(k-K)_{++}}^{k-1} \Vert {{{\varvec{e}}}}_{\textsf{CIAG}}^j + {\nabla }F({\varvec{\theta }}^j) \Vert ^2 \\&\quad \le \gamma ^2 K L_H \sum _{j=(k-K)_{++}}^{k-1} \left( \Vert {{{\varvec{e}}}}_{\textsf{CIAG}}^j \Vert ^2 + \Vert {\nabla }F({\varvec{\theta }}^j) \Vert ^2 \right) \;. \end{aligned} \end{aligned}$$
(60)

Furthermore, we have

$$\begin{aligned}&\Vert {\nabla }F({\varvec{\theta }}^j) \Vert ^2 = \Vert {\nabla }F({\varvec{\theta }}^j) - {\nabla }F({\varvec{\theta }}^\star ) \Vert ^2 \le L^2 V^{(j)}, \end{aligned}$$
(61)
$$\begin{aligned}&\Vert {{{\varvec{e}}}}_{\textsf{CIAG}}^j \Vert \overset{(a)}{\le } \sum _{i=1}^m L_{H,i} \left( V^{(j)} + V^{(\tau _i^j)} \right) \le 2 L_H \max _{ \ell \in \{ \tau _i^j \}_{i=1}^m \cup \{j\} } V^{(\ell )} \;, \end{aligned}$$
(62)

where (a) is due to \(\Vert {{{\varvec{a}}}} - {{{\varvec{b}}}} \Vert ^2 \le 2 (\Vert {{{\varvec{a}}}}\Vert ^2 + \Vert {{{\varvec{b}}}} \Vert ^2)\). Plugging these back into (60) and using \(\tau _i^{k-K} \ge k - 2K\) gives:

$$\begin{aligned} \begin{aligned} \Vert {{{\varvec{e}}}}_{\textsf{CIAG}}^k \Vert&\le \gamma ^2 K L_H \sum _{j=(k-K)_{++}}^{k-1} \left( L^2 V^{(j)} + \left( 2 L_H \max _{ \ell \in \{ \tau _i^j \}_{i=1}^m \cup \{j\} } V^{(\ell )} \right) ^2 \right) \\&\le \gamma ^2 K^2 L_H \left( L^2 \max _{ (k-K)_{++} \le \ell \le k-1 } V^{(\ell )} + 4 L_H^2 \max _{ (k-2K)_{++} \le \ell \le k-1 } (V^{(\ell )})^2 \right) \;. \end{aligned} \end{aligned}$$
(63)
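
As a numerical sanity check of the first inequality above (Lemma 1 applied to each component), the snippet below uses toy logistic components f_i(t) = log(1 + exp(a_i^T t)), for which L_{H,i} = ||a_i||^3/(6*sqrt(3)) is a valid Hessian-Lipschitz constant; the construction and all parameter values are our own illustrative choices, not taken from the paper:

import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 8
A = rng.normal(size=(m, d))                 # rows a_i of the toy components

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_f(i, t):
    return sigma(A[i] @ t) * A[i]

def hess_f(i, t):
    s = sigma(A[i] @ t)
    return s * (1.0 - s) * np.outer(A[i], A[i])

L_H = np.linalg.norm(A, axis=1) ** 3 / (6.0 * np.sqrt(3.0))   # per-component L_{H,i}

worst = 0.0
for _ in range(200):
    t, t_old = rng.normal(size=d), rng.normal(size=d)
    err = sum(np.linalg.norm(grad_f(i, t_old) + hess_f(i, t_old) @ (t - t_old) - grad_f(i, t))
              for i in range(m))
    bound = np.sum(L_H / 2.0) * np.linalg.norm(t - t_old) ** 2
    worst = max(worst, err / bound)
print("max (error / bound) over random trials:", worst)        # stays below 1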

B Step 3 in the Proof of Theorem 1

Combining Propositions 1 and 3 yields

$$\begin{aligned} \begin{aligned} V^{(k+1)}&\le \left( 1 - 2\gamma \frac{ \mu L }{\mu + L}\right) V^{(k)} \\&\quad + 2 \gamma ^3 K^2 L_H \left( L^2 \max _{ (k-K)_{++} \le \ell \le k } (V^{(\ell )})^{\frac{3}{2}}+ 4 L_H^2 \max _{ (k-2K)_{++} \le \ell \le k } (V^{(\ell )})^{\frac{5}{2}} \right) \\&\quad + 2 \gamma ^6 K^4 L_H^2 \left( L^4 \max _{ (k-K)_{++} \le \ell \le k-1 } (V^{(\ell )})^2 + 16 L_H^4 \max _{ (k-2K)_{++} \le \ell \le k-1 } (V^{(\ell )})^4 \right) , \end{aligned} \end{aligned}$$
(64)

which is the exact form of Eq. (44). The right hand side of (64) can be decomposed into two parts: the first term is of the same order as \(V^{(k)}\), while the remaining terms are delayed and higher-order terms in \(V^{(\ell )}\).

Observe that (64) is a special case of (48) in Proposition 5 with \(R^{(k)} = V^{(k)}\), \(M=2K+1\), \(p=1 - 2 \gamma \mu L / (\mu + L)\) and

$$\begin{aligned} \begin{aligned}&q_1 = 2 \gamma ^3 K^2 L^2 L_H,~\eta _1 = 3/2,~q_2 = 8 \gamma ^3 K^2 L_H^3,~\eta _2 = 5/2 \;, \\&q_3 = 2 \gamma ^6 K^4 L_H^2 L^4,~\eta _3 = 2,~q_4 = 32 \gamma ^6 K^4 L_H^6,~\eta _4 = 4 \;. \end{aligned} \end{aligned}$$
(65)

The corresponding convergence condition in (49) can be satisfied if

$$\begin{aligned} \begin{aligned}&\gamma ^5 ~ 2K^4 L_H^2 \left( L^4 V^{(1)} + 16 L_H^4 (V^{(1)})^3 \right)< \frac{ \mu L }{ \mu + L } \\&\text{and}~~\gamma ^2 ~ 2K^2 L_H \left( L^2 (V^{(1)})^{1/2} + 4 L_H^2 (V^{(1)})^{3/2} \right) < \frac{ \mu L }{ \mu + L } \;, \end{aligned} \end{aligned}$$
(66)

which is implied by (28). The proof is thus concluded.

C Proof of Proposition 5

The proof of the proposition is divided into two parts. We first show that under (49), the sequence \(\{ R^{(k)} \}_{k \ge 1}\) converges linearly as in part (a) of the proposition; then we show that the rate of convergence is asymptotically given by p as in part (b) of the proposition [cf. (50)].

The first part of the proof proceeds by induction on \(\ell \ge 1\), with the claim:

$$\begin{aligned} R^{(k)} \le \delta ^\ell ~ R^{(1)},~\forall ~k=(\ell -1)M + 2,..., \ell M + 1\;. \end{aligned}$$
(67)

The base case when \(\ell =1\) can be straightforwardly established:

$$\begin{aligned} \begin{aligned}&\textstyle R^{(2)} \le p R^{(1)} + \sum _{j=1}^J q_j (R^{(1)})^{\eta _j} \le \delta R^{(1)} \;, \\&\vdots \\&\textstyle R^{(M+1)} \le p R^{(M)} + \sum _{j=1}^J q_j (R^{(1)})^{\eta _j} \le \delta R^{(1)} \;. \end{aligned} \end{aligned}$$
(68)

Suppose that the statement (67) is true up to \(\ell =c\); for \(\ell =c+1\), we have:

$$\begin{aligned} \begin{aligned} R^{( cM+ 2)}&\le p R^{( cM+1 )} + \sum _{j=1}^J q_j \max _{ k' \in [ (c-1)M + 2, cM +1 ] } (R^{(k')})^{\eta _j} \\&\le p \left( \delta ^c R^{(1)} \right) + \sum _{j=1}^J q_j \left( \delta ^c R^{(1)} \right) ^{\eta _j} \le \delta ^c ~ \left( pR^{(1)} + \sum _{j=1}^J q_j (R^{(1)})^{\eta _j} \right) \le \delta ^{c+1} R^{(1)} \;. \end{aligned} \end{aligned}$$

A similar statement also holds for \(R^{(k)}\) with \(k=cM+3,...,(c+1)M+1\). We thus conclude with:

$$\begin{aligned} R^{(k)} \le \delta ^{ \lceil (k-1) / M \rceil } ~ R^{(1)},~\forall ~ k \ge 1 \;, \end{aligned}$$
(69)

which proves the first part of the proposition.

The second part of the proof establishes the asymptotic linear rate of convergence in (50). We consider the upper bound sequence \(\{ \bar{R}^{(k)} \}_{k \ge 1}\) such that \(\bar{R}^{(1)} = R^{(1)}\) and the inequality (48) is tight for \(\{ \bar{R}^{(k)} \}_{k \ge 1}\). Obviously, it also holds that \(\bar{R}^{(k)} \le \delta ^{ \lceil (k-1) / M \rceil } \bar{R}^{(1)}\) for all \(k \ge 1\). Now, observe that

$$\begin{aligned} \frac{\bar{R}^{(k+1)}}{\bar{R}^{(k)}} = p + \frac{ \sum _{j=1}^J q_j \max _{ k' \in [(k-M+1)_{++}, k] } (\bar{R}^{(k')})^{\eta _j} }{ \bar{R}^{(k)} } \;. \end{aligned}$$
(70)

For any \(k' \in [k-M+1,k]\) and any \(\eta > 1\), we have:

$$\begin{aligned} \begin{aligned}&\frac{ (\bar{R}^{(k')})^{\eta } }{ \bar{R}^{(k)} } = \frac{ \bar{R}^{(k')} }{ \bar{R}^{(k)} } ~ (\bar{R}^{(k')})^{\eta -1} \le \frac{ \bar{R}^{(k')} }{ \bar{R}^{(k)} } (R^{(1)})^{\eta -1} \delta ^{ (\lceil \frac{k'-1}{M} \rceil )(\eta -1) }\;. \end{aligned} \end{aligned}$$
(71)

Note that as \(\bar{R}^{(k+1)} / \bar{R}^{(k)} \ge p\), we have:

$$\begin{aligned} \frac{ (\bar{R}^{(k')})^{\eta } }{ \bar{R}^{(k)} } \le p^{-M} (R^{(1)})^{\eta -1} \delta ^{ (\lceil \frac{k'-1}{M} \rceil )(\eta -1) } \;. \end{aligned}$$
(72)

Taking \(k \rightarrow \infty\) shows that the right hand side vanishes. As a result, we have \(\lim _{k \rightarrow \infty } \bar{R}^{(k+1)} / \bar{R}^{(k)} = p\). This proves part (b) of the proposition.
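
To illustrate parts (a) and (b) of the proposition, the short Python simulation below iterates the recursion (48) with equality for illustrative values of p, M, q_j and eta_j chosen so that condition (49) holds; the sequence decays R-linearly and the successive ratio approaches p, as claimed:

p, M = 0.9, 5
q = [0.02, 0.01]
eta = [1.5, 2.0]
R = [1.0]                                   # R^{(1)}; here sum_j q_j * R^{(1)}^(eta_j - 1) = 0.03 < 1 - p
for _ in range(300):
    window = R[-M:]                         # the last (at most) M iterates
    R.append(p * R[-1] + sum(qj * max(window) ** ej for qj, ej in zip(q, eta)))
print("final iterate:", R[-1])
print("asymptotic ratio R^(k+1)/R^(k):", R[-1] / R[-2], "  (compare with p =", p, ")")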

D Proof of Proposition 2

The following proof is partially inspired by [7, 21, 25]. For simplicity, we drop the subscript ACIAG in \({{{\varvec{g}}}}_{\textsf{ACIAG}}^k\) and \({{{\varvec{e}}}}_{\textsf{ACIAG}}^k\). Define \(\rho \mathrel{\mathop :}=1 - \sqrt{\mu \gamma }\) and the estimation sequence as:

$$\begin{aligned} \begin{aligned} \varPhi _1 ( {\varvec{\theta }})&\mathrel{\mathop :}=F ( {\varvec{\theta }}_{ex}^1 ) + \frac{ \mu }{2} \Vert {\varvec{\theta }}- {\varvec{\theta }}_{ex}^1 \Vert ^2 \\ \varPhi _{k+1}( {\varvec{\theta }})&\mathrel{\mathop :}=\rho ~\varPhi _k ( {\varvec{\theta }}) + \sqrt{\mu \gamma } \left( F( {\varvec{\theta }}_{ex}^k) + \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}- {\varvec{\theta }}_{ex}^k \rangle + \frac{\mu }{2} \Vert {\varvec{\theta }}- {\varvec{\theta }}_{ex}^k \Vert ^2 \right) \;, \end{aligned} \end{aligned}$$
(73)

where \({{{\varvec{g}}}}^k \mathrel{\mathop :}={{{\varvec{b}}}}^k + {{{\varvec{H}}}}^k {\varvec{\theta }}_{ex}^k\) is the gradient surrogate used in (17). Recall that \({{{\varvec{e}}}}^k \mathrel{\mathop :}={{{\varvec{g}}}}^k - {\nabla }F( {\varvec{\theta }}_{ex}^k )\) is the gradient error. The following inequality, which holds for all \({\varvec{\theta }}\in \mathbb{R}^d\), can be immediately obtained using (73) and the \(\mu\)-strong convexity of \(F({\varvec{\theta }})\):

$$\begin{aligned} \begin{aligned}&\varPhi _{k+1} ({\varvec{\theta }}) - F({\varvec{\theta }}) = \rho \varPhi _k ( {\varvec{\theta }}) - F({\varvec{\theta }}) \\&\qquad + \sqrt{\mu \gamma } \left( F( {\varvec{\theta }}_{ex}^k) + \langle {\nabla }F({\varvec{\theta }}_{ex}^k) + {{{\varvec{e}}}}^k, {\varvec{\theta }}- {\varvec{\theta }}_{ex}^k \rangle + \frac{\mu }{2} \Vert {\varvec{\theta }}- {\varvec{\theta }}_{ex}^k \Vert ^2 \right) \\&\quad \le \rho \left( \varPhi _k ( {\varvec{\theta }}) - F({\varvec{\theta }}) \right) + \sqrt{\mu \gamma } \langle {{{\varvec{e}}}}^k, {\varvec{\theta }}- {\varvec{\theta }}_{ex}^k \rangle \\&\quad \le \rho ^k \left( \varPhi _1( {\varvec{\theta }}) - F({\varvec{\theta }}) \right) + \sum _{\ell =1}^k \rho ^{k-\ell } \sqrt{ \mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}- {\varvec{\theta }}_{ex}^\ell \rangle \;. \end{aligned} \end{aligned}$$
(74)

To facilitate our development, let us denote:

$$\begin{aligned} \varPhi _k^\star \mathrel{\mathop :}=\min _{ {\varvec{\theta }}} \varPhi _k ( {\varvec{\theta }}),~~{{{\varvec{v}}}}^k \mathrel{\mathop :}=\arg \min _{ {\varvec{\theta }}} \varPhi _k ( {\varvec{\theta }}) \;. \end{aligned}$$
(75)

By setting \({\varvec{\theta }}= {\varvec{\theta }}^\star\) in (74), we have:

$$\begin{aligned} \begin{aligned}&\varPhi _{k+1}^\star - F({\varvec{\theta }}^\star ) \le \varPhi _{k+1}({\varvec{\theta }}^\star ) - F({\varvec{\theta }}^\star ) \\&\quad \le \rho ^k \left( \frac{\mu }{2} \Vert {\varvec{\theta }}^\star - {\varvec{\theta }}_{ex}^1 \Vert ^2 + F({\varvec{\theta }}_{ex}^1) - F({\varvec{\theta }}^\star ) \right) + \sum _{\ell =1}^k \rho ^{k-\ell } \sqrt{ \mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\star - {\varvec{\theta }}_{ex}^\ell \rangle \\&\quad \le 2 \rho ^k \left( F({\varvec{\theta }}^1) - F({\varvec{\theta }}^\star ) \right) + \sum _{\ell =1}^k \rho ^{k-\ell } \sqrt{ \mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\star - {\varvec{\theta }}_{ex}^\ell \rangle \;. \end{aligned} \end{aligned}$$
(76)

Now, if \(F( {\varvec{\theta }}^{k+1} ) \le \varPhi _{k+1}^\star\), then the inequality above shows the evolution of the optimality gap \(h^{(k)}\). This motivates our next step, relating \(F( {\varvec{\theta }}^{k+1} )\) to \(\varPhi _{k+1}^\star\).

Lower bounding \(\varPhi _{k+1}^\star\) in the presence of errors. Since \({\nabla }^2 \varPhi _k ( {\varvec{\theta }}) = \mu {{{\varvec{I}}}}\), the function \(\varPhi _k({\varvec{\theta }})\) is quadratic and we can represent \(\varPhi _k({\varvec{\theta }})\) alternatively as

$$\begin{aligned} \varPhi _k ({\varvec{\theta }}) = \varPhi _k^\star + \frac{\mu }{2} \Vert {\varvec{\theta }}- {{{\varvec{v}}}}^k \Vert ^2 \;. \end{aligned}$$
(77)

By substituting (77) into the definition of \(\varPhi _{k+1} ({\varvec{\theta }})\) in (73) and evaluating the first order optimality condition of the latter, we have:

$$\begin{aligned} \begin{aligned}&\sqrt{\mu \gamma } ( {{{\varvec{g}}}}^k + \mu ( {{{\varvec{v}}}}^{k+1} - {\varvec{\theta }}_{ex}^k ) ) + \rho ~ \mu ( {{{\varvec{v}}}}^{k+1} - {{{\varvec{v}}}}^k ) = {{{\varvec{0}}}} \;,\\&\Longrightarrow {{{\varvec{v}}}}^{k+1} = \rho {{{\varvec{v}}}}^k + \sqrt{ \mu \gamma } {\varvec{\theta }}_{ex}^k - \sqrt{\frac{\gamma }{\mu }} {{{\varvec{g}}}}^k \;. \end{aligned} \end{aligned}$$
(78)

By setting \({\varvec{\theta }}={\varvec{\theta }}_{ex}^k\) in (73) and using the recursive definition of \(\varPhi _{k+1} ({\varvec{\theta }})\), we obtain

$$\begin{aligned} \begin{aligned} \varPhi _{k+1} ( {\varvec{\theta }}_{ex}^k )&= \rho \varPhi _{k} ( {\varvec{\theta }}_{ex}^k ) + \sqrt{\mu \gamma } F( {\varvec{\theta }}_{ex}^k ) = \rho \left( \varPhi _k^\star + \frac{\mu }{2} \Vert {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \Vert ^2 \right) + \sqrt{\mu \gamma } F( {\varvec{\theta }}_{ex}^k ) \;, \end{aligned} \end{aligned}$$
(79)

while setting \({\varvec{\theta }}={\varvec{\theta }}_{ex}^k\) in (77) and using (78) gives us:

$$\begin{aligned} \begin{aligned} \varPhi _{k+1} ( {\varvec{\theta }}_{ex}^k )&= \varPhi _{k+1}^\star + \frac{\mu }{2} \left( \rho ^2 \Vert {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \Vert ^2 + \frac{\gamma }{\mu } \Vert {{{\varvec{g}}}}^k \Vert ^2 + 2 \rho \sqrt{\frac{\gamma }{\mu }} \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \rangle \right) \;. \end{aligned} \end{aligned}$$
(80)

Comparing the right hand side of (79) and (80) shows:

$$\begin{aligned} \begin{aligned} \varPhi _{k+1}^\star&= \rho \left( \varPhi _k^\star + \frac{\mu }{2} \Vert {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \Vert ^2 \right) + \sqrt{\mu \gamma } F( {\varvec{\theta }}_{ex}^k ) \\&\quad - \frac{\mu }{2}\left( \rho ^2 \Vert {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \Vert ^2 + \frac{\gamma }{\mu } \Vert {{{\varvec{g}}}}^k \Vert ^2 + 2 \rho \sqrt{\frac{\gamma }{\mu }} \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \rangle \right) \\&= \rho \varPhi _k^\star + \sqrt{\mu \gamma } F({\varvec{\theta }}_{ex}^k) + \frac{\mu }{2} \rho \sqrt{\mu \gamma } \Vert {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \Vert ^2 - \frac{\gamma }{2} \Vert {{{\varvec{g}}}}^k \Vert ^2 - \rho \sqrt{\mu \gamma } \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \rangle \;. \end{aligned} \end{aligned}$$

Using the fact \({{{\varvec{v}}}}^k - {\varvec{\theta }}_{ex}^k = (\sqrt{\mu \gamma })^{-1} \left( {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \right)\) (proven in Sect. D.1), we have

$$\begin{aligned} \begin{aligned} \varPhi _{k+1}^\star&= \rho \varPhi _k^\star + \sqrt{\mu \gamma } F({\varvec{\theta }}_{ex}^k) + \frac{\mu }{2} \frac{ \rho }{ \sqrt{\mu \gamma } } \Vert {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \Vert ^2 - \frac{ \gamma }{2} \Vert {{{\varvec{g}}}}^k \Vert ^2 - \rho \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle \;. \end{aligned} \end{aligned}$$
(81)

We obtain the following chain:

$$\begin{aligned} \begin{aligned}&F( {\varvec{\theta }}^{k+1} ) - \varPhi _{k+1}^\star \overset{(a)}{\le } F( {\varvec{\theta }}_{ex}^k ) - \gamma \langle {\nabla }F( {\varvec{\theta }}_{ex}^k ), {{{\varvec{g}}}}^k \rangle + \frac{L \gamma ^2}{2} \Vert {{{\varvec{g}}}}^k \Vert ^2 - \varPhi _{k+1}^\star \\&\quad \overset{(b)}{=} \rho ~ \left( F({\varvec{\theta }}_{ex}^k ) + \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle - \varPhi _k^\star \right) \\&\qquad -\gamma \langle {\nabla }F( {\varvec{\theta }}_{ex}^k ), {{{\varvec{g}}}}^k \rangle + \frac{\gamma }{2} \left( 1 + L \gamma \right) \Vert {{{\varvec{g}}}}^k \Vert ^2 - \frac{\mu }{2} \frac{ \rho }{ \sqrt{\mu \gamma } } \Vert {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \Vert ^2 \\&\quad \overset{(c)}{=} \rho ~ \left( F({\varvec{\theta }}_{ex}^k) + \langle {\nabla }F({\varvec{\theta }}_{ex}^k), {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle - \varPhi _k^\star \right) -\gamma \langle {\nabla }F( {\varvec{\theta }}_{ex}^k ), {{{\varvec{g}}}}^k \rangle \\&\qquad + \rho \langle {{{\varvec{e}}}}^k, {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle + \frac{\gamma }{2} \left( 1 + L \gamma \right) \Vert {{{\varvec{g}}}}^k \Vert ^2 - \frac{\mu }{2} \frac{ \rho }{ \sqrt{\mu \gamma } } \Vert {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \Vert ^2 \\&\quad \overset{(d)}{\le } \rho ~ \left( F({\varvec{\theta }}^k) - \varPhi _k^\star + \langle {{{\varvec{e}}}}^k, {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle \right) - \frac{\mu }{2} \frac{ 1 - \mu \gamma }{ \sqrt{\mu \gamma } } \Vert {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \Vert ^2 \\&\qquad + \frac{\gamma }{2} \left( 1 + L \gamma \right) \Vert {{{\varvec{g}}}}^k \Vert ^2 - \gamma \langle {\nabla }F( {\varvec{\theta }}_{ex}^k ), {{{\varvec{g}}}}^k \rangle \\&\quad \overset{(e)}{\le } \rho ~ \left( F({\varvec{\theta }}^k) - \varPhi _k^\star + \langle {{{\varvec{e}}}}^k, {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle \right) - \frac{\mu }{2} \frac{ 1 - \mu \gamma }{ \sqrt{\mu \gamma } } \Vert {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \Vert ^2 + \gamma \Vert {{{\varvec{e}}}}^k \Vert ^2 \;, \end{aligned} \end{aligned}$$
(82)

where (a) is due to the L-smoothness of F; (b) is due to (81); (c) is obtained by expanding \({{{\varvec{g}}}}^k\) as \({\nabla }F({\varvec{\theta }}_{ex}^k) + {{{\varvec{e}}}}^k\); (d) is obtained by adding and subtracting \((\mu /2) \Vert {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \Vert ^2\) inside the first bracket, applying the identity \(\rho + \rho / \sqrt{\mu \gamma } = (1 - \mu \gamma ) / \sqrt{\mu \gamma }\), and using the \(\mu\)-strong convexity of F; and (e) is due to the following chain of inequalities:

$$\begin{aligned} \begin{aligned}&\frac{\gamma }{2} \left( 1 + L \gamma \right) \Vert {{{\varvec{g}}}}^k \Vert ^2 - \gamma \langle {\nabla }F( {\varvec{\theta }}_{ex}^k ), {{{\varvec{g}}}}^k \rangle \\&\quad \le \frac{\gamma }{2} \left( 1 + L \gamma \right) \left( \Vert {{{\varvec{e}}}}^k \Vert ^2 + \Vert {\nabla }F( {\varvec{\theta }}_{ex}^k ) \Vert ^2 \right) + \frac{ L \gamma ^2 }{2} \left( \Vert {\nabla }F({\varvec{\theta }}_{ex}^k ) \Vert ^2 + \Vert {{{\varvec{e}}}}^k \Vert ^2 \right) - \gamma \Vert {\nabla }F( {\varvec{\theta }}_{ex}^k ) \Vert ^2 \\&\quad = \left( \frac{\gamma }{2} + L \gamma ^2 \right) \Vert {{{\varvec{e}}}}^k \Vert ^2 + \left( -\frac{\gamma }{2} + L \gamma ^2 \right) \Vert {\nabla }F( {\varvec{\theta }}_{ex}^k ) \Vert ^2 \le \gamma \Vert {{{\varvec{e}}}}^k \Vert ^2 \;. \end{aligned} \end{aligned}$$

As \(\varPhi _1( {\varvec{\theta }}^1 ) = F( {\varvec{\theta }}^1 ) = \varPhi _1^\star\), applying the inequality (82) recursively shows:

$$\begin{aligned} \begin{aligned}&F( {\varvec{\theta }}^{k+1} ) - \varPhi _{k+1}^\star \le \\&\sum _{\ell =1}^k \rho ^{k-\ell } \left( (1-\sqrt{\mu \gamma }) \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \rangle + \gamma \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{2} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \right) \;. \end{aligned} \end{aligned}$$
(83)

Importantly, (83) establishes a lower bound on \(\varPhi _{k+1}^\star\) in terms of \(F({\varvec{\theta }}^{k+1})\) and \({{{\varvec{e}}}}^k\).

Proving Proposition 2. Finally, summing up (83) and (76) gives:

$$\begin{aligned} \begin{aligned} h^{(k+1)}&\le 2 \rho ^k h^{(1)} + \sum _{\ell =1}^k \rho ^{k-\ell } \left( \sqrt{\mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\star - {\varvec{\theta }}_{ex}^\ell \rangle \right. \\&\quad \left. +\, \rho \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \rangle + \gamma \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{2} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \right) \\&= 2 \rho ^k h^{(1)} + \sum _{\ell =1}^k \rho ^{k-\ell } \left( \sqrt{\mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\star - {\varvec{\theta }}^\ell \rangle \right. \\&\quad \left. +\, \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \rangle + \gamma \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{2} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \right) \;. \end{aligned} \end{aligned}$$
(84)

Let us take a look at the last summands in the above inequality: for any \(\ell \ge 1\),

$$\begin{aligned} \begin{aligned}&\sqrt{\mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\star - {\varvec{\theta }}^\ell \rangle + \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \rangle + \gamma \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{2} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \\&\quad \overset{(a)}{\le } \sqrt{\mu \gamma } \Vert {{{\varvec{e}}}}^\ell \Vert \Vert {\varvec{\theta }}^\star - {\varvec{\theta }}^\ell \Vert + \left( \gamma + \frac{ \sqrt{\gamma / \mu } }{ 1 - \mu \gamma } \right) \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \\&\quad \overset{(b)}{\le } \sqrt{2 \gamma h^{(\ell )}} \Vert {{{\varvec{e}}}}^\ell \Vert + \left( \gamma + \frac{ \sqrt{\gamma / \mu } }{ 1 - \mu \gamma } \right) \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \\&\quad \overset{(c)}{\le } \sqrt{2 \gamma h^{(\ell )}} \Vert {{{\varvec{e}}}}^\ell \Vert + \sqrt{\frac{9\gamma }{\mu }} \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \;, \end{aligned} \end{aligned}$$
(85)

where (a) follows from the fact \(\langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \rangle \le (1/2) ( \Vert {{{\varvec{e}}}}^\ell \Vert ^2 / c + c \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2 )\) for any \(c > 0\), where we have set \(c = \frac{\mu }{2} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }}\); (b) is due to the relation \(\Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}^\star \Vert \le \sqrt{2 h^{(\ell )} / \mu }\); (c) is due to \(\gamma + \frac{ \sqrt{\gamma / \mu } }{ 1 - \mu \gamma } \le 3 \sqrt{ \gamma / \mu }\), which can be verified by replacing \(\gamma\) with its upper bound 1/(2L) in the denominator of the fraction on the left-hand side. Combining (84) and (85) yields the desired result of Proposition 2.

D.1 Proof of the equality

We prove \({{{\varvec{v}}}}^k - {\varvec{\theta }}_{ex}^k = (\sqrt{\mu \gamma })^{-1} \left( {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \right)\) using induction on k. Clearly, the said equality holds for \(k=1\) since \({{{\varvec{v}}}}^1 = {\varvec{\theta }}^1 = {\varvec{\theta }}_{ex}^1\), and we assume that it holds up to k. Consider:

$$\begin{aligned} \begin{aligned}&{{{\varvec{v}}}}^{k+1} - {\varvec{\theta }}_{ex}^{k+1} = \rho {{{\varvec{v}}}}^k + \sqrt{ \mu \gamma } {\varvec{\theta }}_{ex}^k - \sqrt{\frac{\gamma }{\mu }} {{{\varvec{g}}}}^k - {\varvec{\theta }}_{ex}^{k+1} \\&\quad =\rho ( {{{\varvec{v}}}}^k - {\varvec{\theta }}_{ex}^k ) + {\varvec{\theta }}_{ex}^k - \sqrt{\frac{\gamma }{\mu }} {{{\varvec{g}}}}^k - {\varvec{\theta }}_{ex}^{k+1} = \frac{ \rho }{ \sqrt{\mu \gamma } } ( {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k ) + {\varvec{\theta }}_{ex}^k - \sqrt{\frac{\gamma }{\mu }} {{{\varvec{g}}}}^k - {\varvec{\theta }}_{ex}^{k+1} \;, \end{aligned} \end{aligned}$$

where we have used the induction hypothesis. Furthermore, using \({\varvec{\theta }}^{k+1} = {\varvec{\theta }}_{ex}^k - \gamma {{{\varvec{g}}}}^k\),

$$\begin{aligned} \begin{aligned}&{{{\varvec{v}}}}^{k+1} - {\varvec{\theta }}_{ex}^{k+1} = \sqrt{\mu \gamma }^{-1} \left( \rho ({\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k) + \sqrt{\mu \gamma } ( {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}_{ex}^{k+1} ) - \gamma {{{\varvec{g}}}}^k \right) \\&\quad \overset{(a)}{=} \sqrt{ \mu \gamma }^{-1} \left( \sqrt{\mu \gamma } ( {\varvec{\theta }}^{k+1} - {\varvec{\theta }}_{ex}^{k+1} ) + \rho ({\varvec{\theta }}^{k+1} - {\varvec{\theta }}^k ) \right) = \sqrt{ \mu \gamma }^{-1} \left( {\varvec{\theta }}_{ex}^{k+1} - {\varvec{\theta }}^{k+1} \right) \;, \end{aligned} \end{aligned}$$
(86)

where (a) is due to \(\rho ({\varvec{\theta }}^{k+1} - {\varvec{\theta }}^k ) = (1 + \sqrt{\mu \gamma } ) ( {\varvec{\theta }}_{ex}^{k+1} - {\varvec{\theta }}^{k+1} )\).
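
The identity can also be confirmed numerically. In the sketch below (a toy recursion with arbitrary surrogate directions g^k; the extrapolation weight alpha = (1 - sqrt(mu*gamma))/(1 + sqrt(mu*gamma)) is the one implied by (a); all parameter values are illustrative), the relation v^k - theta_ex^k = (theta_ex^k - theta^k)/sqrt(mu*gamma) holds at every step up to machine precision:

import numpy as np

rng = np.random.default_rng(0)
d, mu, gamma = 4, 0.5, 0.1
s = np.sqrt(mu * gamma)
rho = 1.0 - s
alpha = (1.0 - s) / (1.0 + s)               # so that rho*(theta^{k+1}-theta^k) = (1+s)*(theta_ex^{k+1}-theta^{k+1})

theta = rng.normal(size=d)
theta_ex = theta.copy()                     # theta_ex^1 = theta^1
v = theta.copy()                            # v^1 = theta^1
for _ in range(50):
    g = rng.normal(size=d)                  # the identity holds for an arbitrary surrogate g^k
    v = rho * v + s * theta_ex - np.sqrt(gamma / mu) * g    # update (78)
    theta_next = theta_ex - gamma * g                       # main step theta^{k+1} = theta_ex^k - gamma g^k
    theta_ex = theta_next + alpha * (theta_next - theta)    # extrapolation
    theta = theta_next
    assert np.allclose(v - theta_ex, (theta_ex - theta) / s)
print("identity verified over 50 random steps")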

E Proof of Proposition 4

We begin by observing that due to the \(L_{H,i}\)-Lipschitz continuity of the Hessian of \(f_i\) and using Lemma 1, we have:

$$\begin{aligned} \begin{aligned}&\Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^\ell \Vert = \Vert {{{\varvec{g}}}}_{\textsf{ACIAG}}^\ell - {\nabla }F( {\varvec{\theta }}_{ex}^\ell ) \Vert \le \sum _{i=1}^m \frac{L_{H,i}}{2} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}_{ex}^{\tau _i^\ell } \Vert ^2 \;. \end{aligned} \end{aligned}$$
(87)

Now, expanding the right hand side of (87) gives:

$$\begin{aligned} \begin{aligned} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^\ell \Vert&\le \sum _{i=1}^m \frac{L_{H,i}}{2} \Big \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}_{ex}^{\tau _i^\ell } \Big \Vert ^2 \le \sum _{i=1}^m \frac{L_{H,i}}{2} ~ \underbrace{( \ell - \tau _i^\ell )}_{\le K} \sum _{j=\tau _i^\ell }^{\ell -1} \Vert {\varvec{\theta }}_{ex}^{j+1} - {\varvec{\theta }}_{ex}^j \Vert ^2 \\&\quad \le \frac{K L_{H}}{2} \sum _{j=( \ell -K )_{++}}^{\ell -1} \Vert {\varvec{\theta }}_{ex}^{j+1} - {\varvec{\theta }}_{ex}^j \Vert ^2 = \frac{K L_{H}}{2} \sum _{j=( \ell -K )_{++}}^{\ell -1} \Vert \gamma {{{\varvec{g}}}}_{\textsf{ACIAG}}^j + \underbrace{\alpha ( {\varvec{\theta }}^{j+1} - {\varvec{\theta }}^j)}_{= {\varvec{\theta }}_{ex}^{j+1} - {\varvec{\theta }}^{j+1}} \Vert ^2 \\&\quad \le \frac{3 K L_H}{2} \sum _{j=( \ell -K )_{++}}^{\ell -1} \left( \gamma ^2 \left( \Vert {{{\varvec{e}}}}^j \Vert ^2 + \Vert {\nabla }F( {\varvec{\theta }}_{ex}^j ) \Vert ^2 \right) + \Vert {\varvec{\theta }}_{ex}^{j+1} - {\varvec{\theta }}^{j+1} \Vert ^2 \right) \;. \end{aligned} \end{aligned}$$
(88)

Remarkably, the above bound resembles that of Proposition 3 with the exception of the last term that depends on \({\varvec{\theta }}_{ex}^{j+1} - {\varvec{\theta }}^{j+1}\). This is included to account for the extrapolated iterates used in the A-CIAG method.

To obtain the upper bound on \(\Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^\ell \Vert\) claimed in Proposition 4, we next upper bound \(\Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^j \Vert ^2\) and \(\Vert {\nabla }F({\varvec{\theta }}_{ex}^j) \Vert ^2\), respectively. Firstly,

$$\begin{aligned} \begin{aligned} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^j \Vert&\le \sum _{i=1}^m \frac{L_{H,i}}{2} \Big \Vert {\varvec{\theta }}_{ex}^j - {\varvec{\theta }}_{ex}^{\tau _i^j} \Big \Vert ^2 \\&\le \sum _{i=1}^m L_{H,i} \left( (1+\alpha )^2 \Vert {\varvec{\theta }}^j - {\varvec{\theta }}^{\tau _i^j} \Vert ^2 + \alpha ^2 \Vert {\varvec{\theta }}^{j-1} - {\varvec{\theta }}^{\tau _i^j-1} \Vert ^2 \right) \;. \end{aligned} \end{aligned}$$
(89)

Noticing that \(\Vert {\varvec{\theta }}^j - {\varvec{\theta }}^{\tau _i^j} \Vert ^2 \le 2 ( \Vert {\varvec{\theta }}^j - {\varvec{\theta }}^\star \Vert ^2 + \Vert {\varvec{\theta }}^{\tau _i^j} - {\varvec{\theta }}^\star \Vert ^2 ) \le (4/\mu ) ( h^{(j)} + h^{(\tau _i^j)} )\), it follows from (89) that

$$\begin{aligned} \begin{aligned} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^j \Vert&\le \frac{4}{\mu } \sum _{i=1}^m L_{H,i} \left( (1+\alpha )^2 ( h^{(j)} + h^{(\tau _i^j)} ) + \alpha ^2 ( h^{(j-1)} + h^{(\tau _i^j - 1)} ) \right) \\&\le \frac{ 8 L_H }{\mu } \left( (1+\alpha )^2 + \alpha ^2 \right) \max _{ (j- K-1)_{++} \le q \le j } h^{(q)} \le \frac{ 40 L_H }{\mu } \max _{ (j- K-1)_{++} \le q \le j } h^{(q)} \;, \end{aligned} \end{aligned}$$
(90)

which implies

$$\begin{aligned} \begin{aligned} \sum _{j=(\ell -K)_{++}}^{\ell -1} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^j \Vert ^2&\le K \left( \frac{ 40 L_H }{\mu }\right) ^2 \max _{ (\ell - 2K-1)_{++} \le q \le \ell } ( h^{(q)} )^2 \;. \end{aligned} \end{aligned}$$
(91)

Secondly,

$$\begin{aligned} \begin{aligned} \Vert {\nabla }F( {\varvec{\theta }}_{ex}^j ) \Vert ^2&\le 2L^2 \left( \Vert {\varvec{\theta }}^j - {\varvec{\theta }}^\star \Vert ^2 + \Vert {\varvec{\theta }}^j - {\varvec{\theta }}^{j-1} \Vert ^2 \right) \le \frac{4L^2}{\mu } \left( 3 h^{(j)} + 2 h^{(j-1)} \right) \;, \end{aligned} \end{aligned}$$
(92)

thus

$$\begin{aligned} \begin{aligned} \sum _{j=(\ell -K)_{++}}^{\ell -1} \Vert {\nabla }F( {\varvec{\theta }}_{ex}^j ) \Vert ^2&\le \frac{20L^2 K }{\mu } \max _{ (\ell - K - 1)_{++} \le q \le \ell -1} h^{(q)} \;. \end{aligned} \end{aligned}$$
(93)

Substituting (91) and (93) into the right hand side of (88) verifies Proposition 4.

F Step 3 in the Proof of Theorem 2

To proceed with the proof, let us define the following quantity:

$$\begin{aligned} \begin{aligned}&\tilde{E}^{(\ell )} \mathrel{\mathop :}=\gamma ^{\frac{5}{2}} \sqrt{\frac{9}{2}} K^2 L_H \left( \left( \frac{40L_H}{\mu } \right) ^2 \max _{ (\ell -2K-1)_{++} \le q \le \ell } (h^{(q)})^2 + \frac{20L^2}{\mu } \max _{ (\ell -K-1)_{++} \le q \le \ell } h^{(q)} \right) \\&\quad + \gamma ^{\frac{9}{2}} \frac{ 81 K^4 L_H^2 }{4 \sqrt{\mu }} \left( \left( \frac{40L_H}{\mu } \right) ^4 \max _{ (\ell -2K-1)_{++} \le q \le \ell } (h^{(q)})^4 + \left( \frac{20L^2}{\mu } \right) ^2 \max _{ (\ell -K-1)_{++} \le q \le \ell } (h^{(q)})^2 \right) \;. \end{aligned} \end{aligned}$$

Using Proposition 4, we obtain:

$$\begin{aligned} \begin{aligned}&\sqrt{2 \gamma h^{(\ell )}} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^\ell \Vert + \sqrt{\frac{9\gamma }{\mu }} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^\ell \Vert ^2 \\&\quad \le \tilde{E}^{(\ell )} + \sum _{j=(\ell -K+1)_{++}}^{\ell } \left( \sqrt{\frac{9 \gamma h^{(\ell )} K^2 L_H^2}{2}} \Vert {\varvec{\theta }}^{j} - {\varvec{\theta }}_{ex}^{j} \Vert ^2 + \frac{27 K^3 L_H^2}{4} \sqrt{\frac{9\gamma }{\mu }} \Vert {\varvec{\theta }}^j - {\varvec{\theta }}_{ex}^j \Vert ^4 \right) \;. \end{aligned} \end{aligned}$$
(94)

We need to further bound \(h^{(k)}\) [recall (41) in Proposition 2] in terms of itself to create a ‘recursion’ for \(h^{(k)}\). To upper bound the right hand side of (41), let us start from (94). It follows that

$$\begin{aligned} \begin{aligned}&\sum _{\ell =1}^k \rho ^{k-\ell } \left( \sqrt{2 \gamma h^{(\ell )}} \Vert {{{\varvec{e}}}}^\ell \Vert + \sqrt{\frac{9\gamma }{\mu }} \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \right) \le \sum _{\ell =1}^k \rho ^{k-\ell } \Bigg ( \tilde{E}^{(\ell )} \\&\quad + \left( \sum _{j=\ell }^{\min \{k,\ell +K-1\}} \left( \sqrt{\frac{9 \gamma K^2 L_H^2 h^{(j)}}{2} } + \frac{81 K^3 L_H^2}{4} \sqrt{\frac{\gamma }{\mu }} \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2 \right) - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \right) \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2 \Bigg ). \end{aligned} \end{aligned}$$

Moreover, we observe for \(\ell \ge 2\):

$$\begin{aligned} \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2 \le 2 ( \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}^\star \Vert ^2 + \Vert {\varvec{\theta }}^{\ell -1} - {\varvec{\theta }}^\star \Vert ^2 ) \le \frac{4}{\mu } \left( h^{(\ell )} + h^{(\ell -1)} \right) \;. \end{aligned}$$
(95)

The coefficient in front of the last \(\Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2\) term can be upper bounded as:

$$\begin{aligned} \tilde{C}^{(\ell ,k)} \mathrel{\mathop :}=\gamma K^2 L_H \sqrt{\frac{9}{2}} \max _{ \ell \le q \le \min \{ \ell +K-1,k \}} (h^{(q)})^{\frac{1}{2}} + {\gamma } \frac{81 K^4 L_H^2}{\mu ^{\frac{3}{2}}} \left( h^{(\ell )} + h^{(\ell -1)} \right) - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu }}. \end{aligned}$$

Let us now define

$$\begin{aligned} \begin{aligned}&E^{(\ell ,k)} \mathrel{\mathop :}=\tilde{E}^{(\ell )} + \tilde{C}^{(\ell ,k)} \frac{\Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2}{\sqrt{\gamma }} \;, \end{aligned} \end{aligned}$$
(96)

where \(E^{(\ell ,k)} = E^{(\ell ,k -1)}\) for all \(k \ge \ell + m\). Applying Proposition 2 readily shows

$$\begin{aligned} h^{(k+1)} \le 2 ( 1 - \sqrt{\mu \gamma } )^k h^{(1)} + \sum _{\ell =1}^k (1 - \sqrt{\mu \gamma })^{k - \ell } E^{(\ell ,k)} \;. \end{aligned}$$
(97)

Concluding the Proof of Theorem 2. Our goal is to analyze (97) using Proposition 6. Let us recognize that:

$$\begin{aligned} \begin{aligned} R^{(k)}&= \bar{h}^{(k)},~p = (1-\sqrt{\mu \gamma }),~b = 2,~M= 2K+1,~\eta _1 = \frac{3}{2},~\eta _2 = \frac{5}{2},~\eta _3 = 2,~\eta _4 = 4,\\ s_1&= \gamma ^{\frac{5}{2}} \sqrt{\frac{9}{2}} K^2 L_H \frac{20L^2}{\mu },~ s_2 = \gamma ^{\frac{5}{2}} \sqrt{\frac{9}{2}} K^2 L_H \left( \frac{40L_H}{\mu } \right) ^2, \\ s_3&= \gamma ^{\frac{9}{2}} \frac{81 K^4 L_H^2}{4\sqrt{\mu }} \left( \frac{20L^2}{\mu }\right) ^2,~ s_4 = \gamma ^{\frac{9}{2}} \frac{81 K^4 L_H^2}{4\sqrt{\mu }} \left( \frac{40L_H}{\mu } \right) ^4 \;,\\ c&= \frac{\mu }{4} \frac{1 - \mu \gamma }{\sqrt{\mu }},~D^{(\ell )} = \frac{ \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2 }{\sqrt{\gamma }},~f( \bar{h}^{(q)} ) = \gamma \left( K^2 L_H \sqrt{\frac{9}{2}} (\bar{h}^{(q)})^{\frac{1}{2}} + \frac{162 K^4 L_H^2}{\mu ^{\frac{3}{2}}} \bar{h}^{(q)} \right) \;. \end{aligned} \end{aligned}$$

The conditions in (55) are satisfied when

$$\begin{aligned} \begin{aligned}&\frac{\sqrt{\mu }}{4} - \gamma \left( K^2 L_H \sqrt{9} (\bar{h}^{(1)})^{\frac{1}{2}} + \frac{324 K^4 L_H^2}{\mu ^{\frac{3}{2}}} \bar{h}^{(1)} + \frac{\mu ^{\frac{3}{2}}}{4} \right) \ge 0 \\&\Longleftrightarrow \gamma \le \frac{\sqrt{\mu }}{4} \left( K^2 L_H \sqrt{9} (\bar{h}^{(1)})^{\frac{1}{2}} + \frac{324 K^4 L_H^2}{\mu ^{\frac{3}{2}}} \bar{h}^{(1)} + \frac{\mu ^{\frac{3}{2}}}{4} \right) ^{-1} \mathrel{\mathop :}=\frac{\bar{c}_3}{L} \;, \end{aligned} \end{aligned}$$
(98)

and

$$\begin{aligned} \begin{aligned} 1 > (1-\sqrt{\mu \gamma })&+ \gamma ^{\frac{5}{2}} \sqrt{\frac{9}{2}} K^2 L_H \left( \frac{20L^2}{\mu } (2 \bar{h}^{(1)})^{\frac{1}{2}} + \left( \frac{40L_H}{\mu } \right) ^2 (2 \bar{h}^{(1)})^{\frac{3}{2}} \right) \\&+ \gamma ^{\frac{9}{2}} \frac{81 K^4 L_H^2}{4\sqrt{\mu }} \left( \left( \frac{20L^2}{\mu }\right) ^2 (2 \bar{h}^{(1)} ) + \left( \frac{40L_H}{\mu } \right) ^4 (2 \bar{h}^{(1)})^3 \right) \;, \end{aligned} \end{aligned}$$
(99)

which is implied by

$$\begin{aligned} \begin{aligned}&\gamma< \left( \frac{\sqrt{\mu }}{\sqrt{18} K^2 L_H}\left( \frac{20L^2}{\mu } (2 \bar{h}^{(1)})^{\frac{1}{2}} + \left( \frac{40L_H}{\mu } \right) ^2 (2 \bar{h}^{(1)})^{\frac{3}{2}} \right) ^{-1} \right) ^{\frac{1}{2}} \mathrel{\mathop :}=\frac{\bar{c}_1}{L}~~~~\text{and} \\&\gamma < \left( \frac{2 {\mu }}{81 K^4 L_H^2} \left( \left( \frac{20L^2}{\mu }\right) ^2 (2 \bar{h}^{(1)} ) + \left( \frac{40L_H}{\mu } \right) ^4 (2 \bar{h}^{(1)})^3 \right) ^{-1} \right) ^{\frac{1}{4}} \mathrel{\mathop :}=\frac{\bar{c}_2}{L} \;. \end{aligned} \end{aligned}$$
(100)

Substituting these constants into Proposition 6 proves the claims in Theorem 2.
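
For concreteness, the small helper below (an illustrative sketch; the function name and the example arguments are our own, not part of the paper) evaluates the three step-size bounds in (98) and (100); their minimum gives an admissible gamma for invoking Proposition 6:

import numpy as np

def aciag_stepsize_bound(mu, L, L_H, K, h1_bar):
    # The three bounds below transcribe (98) and (100); gamma must not exceed any of them
    # (the paper writes them as c_3/L, c_1/L and c_2/L, respectively).
    g3 = (np.sqrt(mu) / 4.0) / (3.0 * K**2 * L_H * np.sqrt(h1_bar)
                                + 324.0 * K**4 * L_H**2 * h1_bar / mu**1.5
                                + mu**1.5 / 4.0)
    g1 = np.sqrt((np.sqrt(mu) / (np.sqrt(18.0) * K**2 * L_H))
                 / (20.0 * L**2 / mu * np.sqrt(2.0 * h1_bar)
                    + (40.0 * L_H / mu) ** 2 * (2.0 * h1_bar) ** 1.5))
    g2 = ((2.0 * mu / (81.0 * K**4 * L_H**2))
          / ((20.0 * L**2 / mu) ** 2 * (2.0 * h1_bar)
             + (40.0 * L_H / mu) ** 4 * (2.0 * h1_bar) ** 3)) ** 0.25
    return min(g1, g2, g3)

print(aciag_stepsize_bound(mu=1.0, L=10.0, L_H=1.0, K=5, h1_bar=0.1))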

G Proof of Proposition 6

Define \(\{ \bar{R}^{(k)} \}_{k \ge 1}\) that satisfies:

$$\begin{aligned} \bar{R}^{(k+1)} = p^k b \bar{R}^{(1)} + \sum _{\ell =1}^k p^{k-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell - M)_{++} \le q \le \ell } (\bar{R}^{(q)})^{\eta _j} \right) ,~~\bar{R}^{(1)} = R^{(1)} \;. \end{aligned}$$
(101)

By subtracting \(p \bar{R}^{(k)}\) from \(\bar{R}^{(k+1)}\), (101) can be alternatively expressed as:

$$\begin{aligned} \bar{R}^{(k+1)} - p \bar{R}^{(k)} = \sum _{j=1}^J s_j \max _{ (k- M)_{++} \le q \le k } (\bar{R}^{(q)})^{\eta _j} \;. \end{aligned}$$
(102)

Now, consider statements (i) and (ii) in (56) as the following event:

$$\begin{aligned} \begin{aligned} \mathcal{E}_z = \Big \{&~ \bar{R}^{((z-1)M + k+1)} \ge R^{((z-1)M + k+1)}, ~\bar{R}^{((z-1)M + k+1)} \le \delta ^z (b \bar{R}^{(1)} ),~ k = 1,..., M \Big \} \;, \end{aligned} \end{aligned}$$

for all \(z \ge 1\). We shall prove that \(\mathcal{E}_z\) is true for \(z=1,2,...\) using induction.

Base case with \(z=1\). To prove \(\mathcal{E}_1\), let us apply another induction on k inside the event. For the base case of \(k=1\),

$$\begin{aligned} \begin{aligned} \bar{R}^{(2)}&\ge p ( b R^{(1)} ) + \sum _{j=1}^J s_j (R^{(1)})^{\eta _j} - ( \bar{f} - f( R^{(1)})) D^{(1)} = R^{(2)} \;, \end{aligned} \end{aligned}$$
(103)

where we used the fact \(\bar{f} \ge f( b R^{(1)} ) \ge f( R^{(1)} )\). Furthermore, the base case holds as:

$$\begin{aligned} \bar{R}^{(2)} = (b \bar{R}^{(1)}) \left( p + (1/b) \sum _{j=1}^J s_j ( \bar{R}^{(1)} )^{\eta _j - 1} \right) \le \delta ( b \bar{R}^{(1)} ) \;. \end{aligned}$$
(104)

For the induction step, suppose that the statements in \(\mathcal{E}_1\) are true up to \(k=k' - 1\), so that \(\bar{R}^{(k')} \ge R^{(k')}\) and \(\bar{R}^{(k')} \le \delta ( b \bar{R}^{(1)} )\). Considering the case of \(k=k'\), we observe that \(\bar{f} \ge f( b R^{(1)} ) \ge f (\delta b R^{(1)} ) \ge f( \bar{R}^{(q)} ) \ge f( R^{(q)} )\) for all \(q=1,...,k'\). Therefore, we can lower bound \(\bar{R}^{(k'+1)}\) as:

$$\begin{aligned} \begin{aligned}&\bar{R}^{(k'+1)} = p^{k'} ( b \bar{R}^{(1)} ) + \sum _{\ell =1}^{k'} p^{k'-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell -M)_{++} \le q \le \ell } (\bar{R}^{(q)})^{\eta _j} \right) \\&\quad \ge p^{k'} ( b R^{(1)} ) + \sum _{\ell =1}^{k'} p^{k'-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell -M)_{++} \le q \le \ell } (R^{(q)})^{\eta _j} - \left( \bar{f} - \max _{\ell \le q \le k'} f(R^{(q)}) \right) D^{(\ell )} \right) , \end{aligned} \end{aligned}$$

where the right hand side is exactly \(R^{(k'+1)}\); also, using (102), we can show:

$$\begin{aligned} \begin{aligned} \bar{R}^{(k'+1)}&\le ( b \bar{R}^{(1)} ) \left( \delta p + \sum _{j=1}^J s_j (b \bar{R}^{(1)})^{\eta _j-1} \right) \le \delta ( b \bar{R}^{(1)} ) \;. \end{aligned} \end{aligned}$$
(105)

Induction case. Suppose that \(\mathcal{E}_z\) is true for all z up to \(z'\); we consider the case when \(z = z' + 1\). Once again, we apply another induction on k. In the base case of \(k = 1\) and \(z=z' + 1\), we have

$$\begin{aligned} \begin{aligned}&\bar{R}^{(z'M+2)} = p^{z'M+1} ( b \bar{R}^{(1)} ) + \sum _{\ell =1}^{z'M+1} p^{z'M+1-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell -M)_{++} \le q \le \ell } (\bar{R}^{(q)})^{\eta _j} \right) \\&\quad \ge p^{z'M+1} ( b R^{(1)} ) + \sum _{\ell =1}^{z'M+1} p^{z'M+1-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell -M)_{++} \le q \le \ell } (R^{(q)})^{\eta _j} \right. \\&\qquad \left. - \left( \bar{f} - \max _{\ell \le q \le z'M + 1} f(R^{(q)}) \right) D^{(\ell )} \right) = R^{(z'M+2)} \;, \end{aligned} \end{aligned}$$

where we used \(\bar{f} \ge f( b R^{(1)} ) \ge f ( \bar{R}^{(q)} ) \ge f( R^{(q)} )\) for all q up to \(q = z'M+1\) (by the induction hypothesis). Furthermore, the base case holds since:

$$\begin{aligned} \begin{aligned} \bar{R}^{(z'M+2)}&= p \bar{R}^{(z'M+1)} + \sum _{j=1}^J s_j \max _{ (z'M+1-M)_{++} \le q \le z'M+1 } ( \bar{R}^{(q)} )^{\eta _j} \\&\quad \le \delta ^{z'} (b \bar{R}^{(1)}) \left( p + \sum _{j=1}^J s_j (\delta ^{z'})^{\eta _j-1} (b \bar{R}^{(1)})^{\eta _j-1} \right) \le \delta ^{z'+1} ( b \bar{R}^{(1)} ) \;. \end{aligned} \end{aligned}$$
(106)

Let the statements in \(\mathcal{E}_z\) be true up to \(k=k' - 1\), \(z=z'+1\). With \(k = k'\),

$$\begin{aligned} \begin{aligned} \bar{R}^{( z'M + k' + 1 )}&\ge p^{z'M+k'} ( b R^{(1)} ) + \sum _{\ell =1}^{z'M+k'} p^{z'M+k'-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell -M)_{++} \le q \le \ell } (R^{(q)})^{\eta _j} \right. \\&\quad \left. - \left( \bar{f} - \max _{\ell \le q \le z'M + k'} f(R^{(q)}) \right) D^{(\ell )} \right) = R^{(z'M + k' + 1)} \;,\\ \bar{R}^{(z'M+k'+1)}&\le \delta ^{z'} (b \bar{R}^{(1)}) \left( \delta p + \sum _{j=1}^J s_j (\delta ^{z'})^{\eta _j-1} (b \bar{R}^{(1)})^{\eta _j-1} \right) \le \delta ^{z'+1} ( b \bar{R}^{(1)} ) \;. \end{aligned} \end{aligned}$$
(107)

The induction case is thus proven. This shows that the event \(\mathcal{E}_z\) is true for all \(z \ge 1\).

Proving statement (iii). We apply statement (ii) to prove (iii). From (102),

$$\begin{aligned} \begin{aligned} \frac{ \bar{R}^{(k+1)} }{ \bar{R}^{(k)} }&= p + \frac{1}{ \bar{R}^{(k)} } \sum _{j=1}^J s_j \max _{ (k-M)_{++} \le q \le k} (\bar{R}^{(q)} )^{\eta _j} \;. \end{aligned} \end{aligned}$$
(108)

For any \(q \in [(k-M)_{++}, k]\), we have

$$\begin{aligned} \frac{ (\bar{R}^{(q)})^{\eta _j} }{\bar{R}^{(k)}} = \frac{ \bar{R}^{(q)} }{ \bar{R}^{(k)} } (\bar{R}^{(q)})^{\eta _j - 1} \le \frac{ \bar{R}^{(q)} }{ \bar{R}^{(k)} } \left( \delta ^{\lceil (q-1) / M \rceil } ( b R^{(1)} ) \right) ^{\eta _j - 1} \;. \end{aligned}$$
(109)

Since \(\eta _j > 1\) and \(|q-k| \le M\), we have \(\delta ^{\lceil (q-1) / M \rceil ( \eta _j - 1 )} \rightarrow 0\) as \(k \rightarrow \infty\); moreover, as \(\bar{R}^{(k+1)} / \bar{R}^{(k)} \ge p\) for all \(k \ge 1\), we have \(\bar{R}^{(q)} / \bar{R}^{(k)} \le p^{-M}\) for all such q. Therefore, we get

$$\begin{aligned} \lim _{ k \rightarrow \infty } \frac{ \max _{ (k-M)_{++} \le q \le k} (\bar{R}^{(q)} )^{\eta _j} }{ \bar{R}^{(k)} } = 0,~\forall ~j \Longrightarrow \lim _{ k \rightarrow \infty } \frac{ \bar{R}^{(k+1)} }{ \bar{R}^{(k)} } = p \;. \end{aligned}$$
(110)

Cite this article

Wai, HT., Shi, W., Uribe, C.A. et al. Accelerating incremental gradient optimization with curvature information. Comput Optim Appl 76, 347–380 (2020). https://doi.org/10.1007/s10589-020-00183-1
