A Proof of Proposition 3
Let us express the gradient error as
$${{{\varvec{e}}}}_{\textsf{CIAG}}^k = \sum _{i=1}^m \left( {\nabla }f_i ( {\varvec{\theta }}^{\tau _i^k} ) + {\nabla }^2 f_i ( {\varvec{\theta }}^{\tau _i^k} ) ( {\varvec{\theta }}^k - {\varvec{\theta }}^{\tau _i^k} ) - {\nabla }f_i ( {\varvec{\theta }}^k ) \right)$$
(59)
Applying Lemma 1:
$$\begin{aligned} \begin{aligned}&\Vert {{{\varvec{e}}}}_{\textsf{CIAG}}^k \Vert \le \sum _{i=1}^m \frac{L_{H,i}}{2} \Vert {\varvec{\theta }}^{\tau _i^k} - {\varvec{\theta }}^k \Vert ^2 \le \sum _{i=1}^m \frac{L_{H,i}}{2} \underbrace{(k - \tau _i^k)}_{\le K} \sum _{j=\tau _i^k}^{k-1} \Vert {\varvec{\theta }}^{j+1} - {\varvec{\theta }}^j \Vert ^2 \\&\quad \le \frac{K L_H}{2} \sum _{j= (k-K)_{++}}^{k-1} \Vert {\varvec{\theta }}^{j+1} - {\varvec{\theta }}^j \Vert ^2 \le \frac{K L_H}{2} \gamma ^2 \sum _{j=(k-K)_{++}}^{k-1} \Vert {{{\varvec{e}}}}_{\textsf{CIAG}}^j + {\nabla }F({\varvec{\theta }}^j) \Vert ^2 \\&\quad \le \gamma ^2 K L_H \sum _{j=(k-K)_{++}}^{k-1} \left( \Vert {{{\varvec{e}}}}_{\textsf{CIAG}}^j \Vert ^2 + \Vert {\nabla }F({\varvec{\theta }}^j) \Vert ^2 \right) \;. \end{aligned} \end{aligned}$$
(60)
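As a numerical sanity check on the first inequality in (60), the sketch below evaluates the CIAG gradient error (59) and compares it with \(\sum _i (L_{H,i}/2) \Vert {\varvec{\theta }}^{\tau _i^k} - {\varvec{\theta }}^k \Vert ^2\). It assumes Lemma 1 is the standard Taylor bound for functions with \(L_{H,i}\)-Lipschitz Hessians, and, for illustration only, logistic components \(f_i({\varvec{\theta }}) = \log (1 + \exp (x_i^\top {\varvec{\theta }}))\), whose Hessian is Lipschitz with \(L_{H,i} = (\sqrt{3}/18) \Vert x_i \Vert ^3\). The stale points and all helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(x, theta):
    # gradient of log(1 + exp(x^T theta))
    return sigmoid(x @ theta) * x

def hess(x, theta):
    s = sigmoid(x @ theta)
    return s * (1.0 - s) * np.outer(x, x)

def ciag_error(X, theta, stale):
    """Gradient error (59): first-order surrogate built at each stale point minus the true gradient."""
    e = np.zeros_like(theta)
    for x, th_old in zip(X, stale):
        e += grad(x, th_old) + hess(x, th_old) @ (theta - th_old) - grad(x, theta)
    return e

def lemma1_bound(X, theta, stale):
    # sum_i L_{H,i}/2 * ||theta^{tau_i} - theta||^2, with L_{H,i} = sqrt(3)/18 * ||x_i||^3
    L_H = np.sqrt(3.0) / 18.0 * np.linalg.norm(X, axis=1) ** 3
    dist2 = np.array([np.sum((th_old - theta) ** 2) for th_old in stale])
    return float(np.sum(L_H / 2.0 * dist2))

rng = np.random.default_rng(0)
m, d = 5, 3
X = rng.normal(size=(m, d))
theta = rng.normal(size=d)
stale = [theta + 0.5 * rng.normal(size=d) for _ in range(m)]   # hypothetical delayed iterates
err = float(np.linalg.norm(ciag_error(X, theta, stale)))
bound = lemma1_bound(X, theta, stale)
print(f"||e|| = {err:.4g}, bound = {bound:.4g}")
```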
Furthermore, we have
$$\begin{aligned}&\Vert {\nabla }F({\varvec{\theta }}^j) \Vert ^2 = \Vert {\nabla }F({\varvec{\theta }}^j) - {\nabla }F({\varvec{\theta }}^\star ) \Vert ^2 \le L^2 V^{(j)}, \end{aligned}$$
(61)
$$\begin{aligned}&\Vert {{{\varvec{e}}}}_{\textsf{CIAG}}^j \Vert \overset{(a)}{\le } \sum _{i=1}^m L_{H,i} \left( V^{(j)} + V^{(\tau _i^j)} \right) \le 2 L_H \max _{ \ell \in \{ \tau _i^j \}_{i=1}^m \cup \{j\} } V^{(\ell )} \;, \end{aligned}$$
(62)
where (a) follows from Lemma 1 and \(\Vert {{{\varvec{a}}}} - {{{\varvec{b}}}} \Vert ^2 \le 2 (\Vert {{{\varvec{a}}}}\Vert ^2 + \Vert {{{\varvec{b}}}} \Vert ^2)\). Plugging (61) and (62) back into (60) and using \(\tau _i^{k-K} \ge k - 2K\) gives:
$$\begin{aligned} \begin{aligned} \Vert {{{\varvec{e}}}}_{\textsf{CIAG}}^k \Vert&\le \gamma ^2 K L_H \sum _{j=(k-K)_{++}}^{k-1} \left( L^2 V^{(j)} + \left( 2 L_H \max _{ \ell \in \{ \tau _i^j \}_{i=1}^m \cup \{j\} } V^{(\ell )} \right) ^2 \right) \\&\le \gamma ^2 K^2 L_H \left( L^2 \max _{ (k-K)_{++} \le \ell \le k-1 } V^{(\ell )} + 4 L_H^2 \max _{ (k-2K)_{++} \le \ell \le k-1 } (V^{(\ell )})^2 \right) \;. \end{aligned} \end{aligned}$$
(63)
B Step 3 in the Proof of Theorem 1
Combining Propositions 1 and 3 yields
$$\begin{aligned} \begin{aligned} V^{(k+1)}&\le \left( 1 - 2\gamma \frac{ \mu L }{\mu + L}\right) V^{(k)} \\&\quad + 2 \gamma ^3 K^2 L_H \left( L^2 \max _{ (k-K)_{++} \le \ell \le k } (V^{(\ell )})^{\frac{3}{2}}+ 4 L_H^2 \max _{ (k-2K)_{++} \le \ell \le k } (V^{(\ell )})^{\frac{5}{2}} \right) \\&\quad + 2 \gamma ^6 K^4 L_H^2 \left( L^4 \max _{ (k-K)_{++} \le \ell \le k-1 } (V^{(\ell )})^2 + 16 L_H^4 \max _{ (k-2K)_{++} \le \ell \le k-1 } (V^{(\ell )})^4 \right) , \end{aligned} \end{aligned}$$
(64)
which is exactly the form of Eq. (44). The right-hand side of (64) consists of two kinds of terms: the first is of the same order as \(V^{(k)}\), while the remaining ones are delayed, higher-order terms in \(V^{(\ell )}\).
Observe that (64) is a special case of (48) in Proposition 5 with \(R^{(k)} = V^{(k)}\), \(M=2K+1\), \(p=1 - 2 \gamma \mu L / (\mu + L)\) and
$$\begin{aligned} \begin{aligned}&q_1 = 2 \gamma ^3 K^2 L^2 L_H,~\eta _1 = 3/2,~q_2 = 8 \gamma ^3 K^2 L_H^3,~\eta _2 = 5/2 \;, \\&q_3 = 2 \gamma ^6 K^4 L_H^2 L^4,~\eta _3 = 2,~q_4 = 32 \gamma ^6 K^4 L_H^6,~\eta _4 = 4 \;. \end{aligned} \end{aligned}$$
(65)
The corresponding convergence condition in (49) can be satisfied if
$$\begin{aligned} \begin{aligned}&\gamma ^5 ~ 2K^4 L_H^2 \left( L^4 V^{(1)} + 16 L_H^4 (V^{(1)})^3 \right)< \frac{ \mu L }{ \mu + L } \\&\text{and}~~\gamma ^2 ~ 2K^2 L_H \left( L^2 (V^{(1)})^{1/2} + 4 L_H^2 (V^{(1)})^{3/2} \right) < \frac{ \mu L }{ \mu + L } \;, \end{aligned} \end{aligned}$$
(66)
which is implied by (28). This concludes the proof.
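The bookkeeping in (65) and (66) can be checked mechanically. The sketch below lists the pairs \((q_j, \eta _j)\), tests both inequalities in (66), and evaluates \(p + \sum _j q_j (V^{(1)})^{\eta _j - 1}\). It assumes, without quoting (49), that the relevant contraction factor has this form (it is the quantity the base case (68) relies on); when both inequalities in (66) hold, this factor is below 1. The parameter values and function names are hypothetical.

```python
import math

def prop5_constants(gamma, K, L, L_H):
    """(q_j, eta_j) pairs read off the right-hand side of (64), as listed in (65)."""
    return [
        (2 * gamma**3 * K**2 * L**2 * L_H, 1.5),
        (8 * gamma**3 * K**2 * L_H**3, 2.5),
        (2 * gamma**6 * K**4 * L_H**2 * L**4, 2.0),
        (32 * gamma**6 * K**4 * L_H**6, 4.0),
    ]

def condition_66(gamma, K, mu, L, L_H, V1):
    """Both inequalities in (66) for step size gamma and initial error V1 = V^{(1)}."""
    rhs = mu * L / (mu + L)
    lhs1 = gamma**5 * 2 * K**4 * L_H**2 * (L**4 * V1 + 16 * L_H**4 * V1**3)
    lhs2 = gamma**2 * 2 * K**2 * L_H * (L**2 * math.sqrt(V1) + 4 * L_H**2 * V1**1.5)
    return lhs1 < rhs and lhs2 < rhs

def contraction_factor(gamma, K, mu, L, L_H, V1):
    """Assumed form: p + sum_j q_j V1^(eta_j - 1), with p = 1 - 2 gamma mu L / (mu + L)."""
    p = 1 - 2 * gamma * mu * L / (mu + L)
    return p + sum(q * V1 ** (eta - 1) for q, eta in prop5_constants(gamma, K, L, L_H))

params = dict(K=5, mu=0.1, L=1.0, L_H=0.5, V1=1.0)   # hypothetical problem constants
for gamma in (1e-3, 1e-2, 0.1, 1.0):
    print(gamma, condition_66(gamma, **params), contraction_factor(gamma, **params))
```

Because the left-hand sides of (66) scale as \(\gamma ^2\) and \(\gamma ^5\), the condition always holds for a small enough step size, which is what (28) provides.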
C Proof of Proposition 5
The proof of the proposition is divided into two parts. We first show that under (49), the sequence \(\{ R^{(k)} \}_{k \ge 1}\) converges linearly as in part (a) of the proposition; then we show that the rate of convergence is asymptotically given by p as in part (b) of the proposition [cf. (50)].
The first part is proven by induction on \(\ell \ge 1\), with the induction hypothesis
$$\begin{aligned} R^{(k)} \le \delta ^\ell ~ R^{(1)},~\forall ~k=(\ell -1)M + 2,..., \ell M + 1\;. \end{aligned}$$
(67)
The base case \(\ell =1\) follows directly:
$$\begin{aligned} \begin{aligned}&\textstyle R^{(2)} \le p R^{(1)} + \sum _{j=1}^J q_j (R^{(1)})^{\eta _j} \le \delta R^{(1)} \;, \\&\vdots \\&\textstyle R^{(M+1)} \le p R^{(M)} + \sum _{j=1}^J q_j (R^{(1)})^{\eta _j} \le \delta R^{(1)} \;. \end{aligned} \end{aligned}$$
(68)
Suppose that (67) holds up to \(\ell =c\). Then, for \(\ell =c+1\), we have:
$$\begin{aligned} \begin{aligned} R^{( cM+ 2)}&\le p R^{( cM+1 )} + \sum _{j=1}^J q_j \max _{ k' \in [ (c-1)M + 2, cM +1 ] } (R^{(k')})^{\eta _j} \\&\le p \left( \delta ^c R^{(1)} \right) + \sum _{j=1}^J q_j \left( \delta ^c R^{(1)} \right) ^{\eta _j} \le \delta ^c ~ \left( pR^{(1)} + \sum _{j=1}^J q_j (R^{(1)})^{\eta _j} \right) \le \delta ^{c+1} R^{(1)} \;. \end{aligned} \end{aligned}$$
A similar argument applies to \(R^{(k)}\) for \(k=cM+3,...,(c+1)M+1\). We thus conclude that
$$\begin{aligned} R^{(k)} \le \delta ^{ \lceil (k-1) / M \rceil } ~ R^{(1)},~\forall ~ k \ge 1 \;, \end{aligned}$$
(69)
which proves the first part of the proposition.
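Part (a) can be illustrated by iterating (48) with equality and comparing against the bound (69). The sketch assumes that \((x)_{++}\) means \(\max \{x, 1\}\) for these 1-based indices and that \(\delta = p + \sum _j q_j (R^{(1)})^{\eta _j - 1}\), the smallest value for which the base case (68) goes through; the parameter values and the helper name `tight_recursion` are hypothetical.

```python
import math

def tight_recursion(R1, p, qs, etas, M, n_steps):
    """Iterate (48) with equality; R[k-1] stores R^{(k)}.

    The max runs over the window [(k-M+1)_{++}, k], with (x)_{++} taken as max(x, 1).
    """
    R = [R1]
    for k in range(1, n_steps):          # compute R^{(k+1)} from R^{(1)}, ..., R^{(k)}
        lo = max(k - M + 1, 1)
        w = max(R[lo - 1:k])             # largest window entry (all entries are nonnegative)
        extra = sum(q * w**eta for q, eta in zip(qs, etas))
        R.append(p * R[-1] + extra)
    return R

# Hypothetical parameters; delta uses the assumed form p + sum_j q_j (R1)^(eta_j - 1)
p, qs, etas, M, R1 = 0.9, [0.02, 0.01], [1.5, 2.0], 3, 1.0
delta = p + sum(q * R1 ** (eta - 1) for q, eta in zip(qs, etas))
R = tight_recursion(R1, p, qs, etas, M, 200)
bound_ok = all(
    R[k - 1] <= delta ** math.ceil((k - 1) / M) * R1 * (1 + 1e-12)
    for k in range(1, len(R) + 1)
)
print(f"delta = {delta:.3f}, bound (69) holds: {bound_ok}")
```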
The second part of the proof establishes the asymptotic linear rate of convergence in (50). Consider the upper bound sequence \(\{ \bar{R}^{(k)} \}_{k \ge 1}\) with \(\bar{R}^{(1)} = R^{(1)}\) for which the inequality (48) holds with equality. By the first part, it also holds that \(\bar{R}^{(k)} \le \delta ^{ \lceil (k-1) / M \rceil } \bar{R}^{(1)}\) for all \(k \ge 1\). Now, observe that
$$\begin{aligned} \frac{\bar{R}^{(k+1)}}{\bar{R}^{(k)}} = p + \frac{ \sum _{j=1}^J q_j \max _{ k' \in [(k-M+1)_{++}, k] } (\bar{R}^{(k')})^{\eta _j} }{ \bar{R}^{(k)} } \;. \end{aligned}$$
(70)
For any \(k' \in [k-M+1,k]\) and any \(\eta > 1\), we have:
$$\begin{aligned} \begin{aligned}&\frac{ (\bar{R}^{(k')})^{\eta } }{ \bar{R}^{(k)} } = \frac{ \bar{R}^{(k')} }{ \bar{R}^{(k)} } ~ (\bar{R}^{(k')})^{\eta -1} \le \frac{ \bar{R}^{(k')} }{ \bar{R}^{(k)} } (R^{(1)})^{\eta -1} \delta ^{ (\lceil \frac{k'-1}{M} \rceil )(\eta -1) }\;. \end{aligned} \end{aligned}$$
(71)
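Note that since \(\bar{R}^{(k+1)} / \bar{R}^{(k)} \ge p\) for all k, we have \(\bar{R}^{(k')} / \bar{R}^{(k)} \le p^{-(k-k')} \le p^{-M}\) for \(k' \in [k-M+1,k]\), and hence: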
$$\begin{aligned} \frac{ (\bar{R}^{(k')})^{\eta } }{ \bar{R}^{(k)} } \le p^{-M} (R^{(1)})^{\eta -1} \delta ^{ (\lceil \frac{k'-1}{M} \rceil )(\eta -1) } \;. \end{aligned}$$
(72)
Since \(k' \ge k-M+1\), \(\delta < 1\), and \(\eta > 1\), the right-hand side of (72) vanishes as \(k \rightarrow \infty\); hence the second term in (70) vanishes as well, and \(\lim _{k \rightarrow \infty } \bar{R}^{(k+1)} / \bar{R}^{(k)} = p\). This proves part (b) of the proposition.
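Part (b) can be observed on the same kind of tight sequence: the ratio \(\bar{R}^{(k+1)} / \bar{R}^{(k)}\) starts above p and approaches p as the higher-order terms die out. The sketch again assumes \((x)_{++} = \max \{x, 1\}\); the parameters and the helper name are hypothetical.

```python
def tight_recursion(R1, p, qs, etas, M, n_steps):
    """Iterate (48) with equality; R[k-1] stores R-bar^{(k)}, window [(k-M+1)_{++}, k]."""
    R = [R1]
    for k in range(1, n_steps):
        lo = max(k - M + 1, 1)
        w = max(R[lo - 1:k])
        R.append(p * R[-1] + sum(q * w**eta for q, eta in zip(qs, etas)))
    return R

p = 0.9
R = tight_recursion(1.0, p, [0.02, 0.01], [1.5, 2.0], 3, 1000)
ratios = [R[k + 1] / R[k] for k in range(len(R) - 1)]
print(ratios[0], ratios[-1])   # the first ratio exceeds p; the last one is close to p
```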
D Proof of Proposition 2
The following proof is partially inspired by [7, 21, 25]. For simplicity, we drop the subscript ACIAG in \({{{\varvec{g}}}}_{\textsf{ACIAG}}^k\) and \({{{\varvec{e}}}}_{\textsf{ACIAG}}^k\). Define \(\rho \mathrel{\mathop :}=1 - \sqrt{\mu \gamma }\) and the estimate sequence as:
$$\begin{aligned} \begin{aligned} \varPhi _1 ( {\varvec{\theta }})&\mathrel{\mathop :}=F ( {\varvec{\theta }}_{ex}^1 ) + \frac{ \mu }{2} \Vert {\varvec{\theta }}- {\varvec{\theta }}_{ex}^1 \Vert ^2 \\ \varPhi _{k+1}( {\varvec{\theta }})&\mathrel{\mathop :}=\rho ~\varPhi _k ( {\varvec{\theta }}) + \sqrt{\mu \gamma } \left( F( {\varvec{\theta }}_{ex}^k) + \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}- {\varvec{\theta }}_{ex}^k \rangle + \frac{\mu }{2} \Vert {\varvec{\theta }}- {\varvec{\theta }}_{ex}^k \Vert ^2 \right) \;, \end{aligned} \end{aligned}$$
(73)
where \({{{\varvec{g}}}}^k \mathrel{\mathop :}={{{\varvec{b}}}}^k + {{{\varvec{H}}}}^k {\varvec{\theta }}_{ex}^k\) is the gradient surrogate used in (17). Recall that \({{{\varvec{e}}}}^k \mathrel{\mathop :}={{{\varvec{g}}}}^k - {\nabla }F( {\varvec{\theta }}_{ex}^k )\) is the gradient error. The following inequality, which holds for all \({\varvec{\theta }}\in \mathbb{R}^d\), can be immediately obtained using (73) and the \(\mu\)-strong convexity of \(F({\varvec{\theta }})\):
$$\begin{aligned} \begin{aligned}&\varPhi _{k+1} ({\varvec{\theta }}) - F({\varvec{\theta }}) = \rho \varPhi _k ( {\varvec{\theta }}) - F({\varvec{\theta }}) \\&\qquad + \sqrt{\mu \gamma } \left( F( {\varvec{\theta }}_{ex}^k) + \langle {\nabla }F({\varvec{\theta }}_{ex}^k) + {{{\varvec{e}}}}^k, {\varvec{\theta }}- {\varvec{\theta }}_{ex}^k \rangle + \frac{\mu }{2} \Vert {\varvec{\theta }}- {\varvec{\theta }}_{ex}^k \Vert ^2 \right) \\&\quad \le \rho \left( \varPhi _k ( {\varvec{\theta }}) - F({\varvec{\theta }}) \right) + \sqrt{\mu \gamma } \langle {{{\varvec{e}}}}^k, {\varvec{\theta }}- {\varvec{\theta }}_{ex}^k \rangle \\&\quad \le \rho ^k \left( \varPhi _1( {\varvec{\theta }}) - F({\varvec{\theta }}) \right) + \sum _{\ell =1}^k \rho ^{k-\ell } \sqrt{ \mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}- {\varvec{\theta }}_{ex}^\ell \rangle \;. \end{aligned} \end{aligned}$$
(74)
To facilitate our development, let us denote:
$$\begin{aligned} \varPhi _k^\star \mathrel{\mathop :}=\min _{ {\varvec{\theta }}} \varPhi _k ( {\varvec{\theta }}),~~{{{\varvec{v}}}}^k \mathrel{\mathop :}=\arg \min _{ {\varvec{\theta }}} \varPhi _k ( {\varvec{\theta }}) \;. \end{aligned}$$
(75)
By setting \({\varvec{\theta }}= {\varvec{\theta }}^\star\) in (74), we have:
$$\begin{aligned} \begin{aligned}&\varPhi _{k+1}^\star - F({\varvec{\theta }}^\star ) \le \varPhi _{k+1}({\varvec{\theta }}^\star ) - F({\varvec{\theta }}^\star ) \\&\quad \le \rho ^k \left( \frac{\mu }{2} \Vert {\varvec{\theta }}^\star - {\varvec{\theta }}_{ex}^1 \Vert ^2 + F({\varvec{\theta }}_{ex}^1) - F({\varvec{\theta }}^\star ) \right) + \sum _{\ell =1}^k \rho ^{k-\ell } \sqrt{ \mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\star - {\varvec{\theta }}_{ex}^\ell \rangle \\&\quad \le 2 \rho ^k \left( F({\varvec{\theta }}^1) - F({\varvec{\theta }}^\star ) \right) + \sum _{\ell =1}^k \rho ^{k-\ell } \sqrt{ \mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\star - {\varvec{\theta }}_{ex}^\ell \rangle \;. \end{aligned} \end{aligned}$$
(76)
Now, if \(F( {\varvec{\theta }}^{k+1} ) \le \varPhi _{k+1}^\star\), then (76) directly bounds the evolution of the optimality gap \(h^{(k)}\). This motivates our next step: relating \(F( {\varvec{\theta }}^{k+1} )\) to \(\varPhi _{k+1}^\star\).
Lower bounding \(\varPhi _{k+1}^\star\) in the presence of errors. Since \({\nabla }^2 \varPhi _k ( {\varvec{\theta }}) = \mu {{{\varvec{I}}}}\), the function \(\varPhi _k({\varvec{\theta }})\) is quadratic and we can alternatively represent it as
$$\begin{aligned} \varPhi _k ({\varvec{\theta }}) = \varPhi _k^\star + \frac{\mu }{2} \Vert {\varvec{\theta }}- {{{\varvec{v}}}}^k \Vert ^2 \;. \end{aligned}$$
(77)
By substituting (77) into the definition of \(\varPhi _{k+1} ({\varvec{\theta }})\) in (73) and evaluating the first-order optimality condition of the latter, we have:
$$\begin{aligned} \begin{aligned}&\sqrt{\mu \gamma } ( {{{\varvec{g}}}}^k + \mu ( {{{\varvec{v}}}}^{k+1} - {\varvec{\theta }}_{ex}^k ) ) + \rho ~ \mu ( {{{\varvec{v}}}}^{k+1} - {{{\varvec{v}}}}^k ) = {{{\varvec{0}}}} \;,\\&\Longrightarrow {{{\varvec{v}}}}^{k+1} = \rho {{{\varvec{v}}}}^k + \sqrt{ \mu \gamma } {\varvec{\theta }}_{ex}^k - \sqrt{\frac{\gamma }{\mu }} {{{\varvec{g}}}}^k \;. \end{aligned} \end{aligned}$$
(78)
By setting \({\varvec{\theta }}={\varvec{\theta }}_{ex}^k\) in (73) and using the recursive definition of \(\varPhi _{k+1} ({\varvec{\theta }})\), we obtain
$$\begin{aligned} \begin{aligned} \varPhi _{k+1} ( {\varvec{\theta }}_{ex}^k )&= \rho \varPhi _{k} ( {\varvec{\theta }}_{ex}^k ) + \sqrt{\mu \gamma } F( {\varvec{\theta }}_{ex}^k ) = \rho \left( \varPhi _k^\star + \frac{\mu }{2} \Vert {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \Vert ^2 \right) + \sqrt{\mu \gamma } F( {\varvec{\theta }}_{ex}^k ) \;, \end{aligned} \end{aligned}$$
(79)
while setting \({\varvec{\theta }}={\varvec{\theta }}_{ex}^k\) in (77) and using (78) gives us:
$$\begin{aligned} \begin{aligned} \varPhi _{k+1} ( {\varvec{\theta }}_{ex}^k )&= \varPhi _{k+1}^\star + \frac{\mu }{2} \left( \rho ^2 \Vert {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \Vert ^2 + \frac{\gamma }{\mu } \Vert {{{\varvec{g}}}}^k \Vert ^2 + 2 \rho \sqrt{\frac{\gamma }{\mu }} \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \rangle \right) \;. \end{aligned} \end{aligned}$$
(80)
Comparing the right hand side of (79) and (80) shows:
$$\begin{aligned} \begin{aligned} \varPhi _{k+1}^\star&= \rho \left( \varPhi _k^\star + \frac{\mu }{2} \Vert {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \Vert ^2 \right) + \sqrt{\mu \gamma } F( {\varvec{\theta }}_{ex}^k ) \\&\quad - \frac{\mu }{2}\left( \rho ^2 \Vert {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \Vert ^2 + \frac{\gamma }{\mu } \Vert {{{\varvec{g}}}}^k \Vert ^2 + 2 \rho \sqrt{\frac{\gamma }{\mu }} \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \rangle \right) \\&= \rho \varPhi _k^\star + \sqrt{\mu \gamma } F({\varvec{\theta }}_{ex}^k) + \frac{\mu }{2} \rho \sqrt{\mu \gamma } \Vert {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \Vert ^2 - \frac{\gamma }{2} \Vert {{{\varvec{g}}}}^k \Vert ^2 - \rho \sqrt{\mu \gamma } \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}_{ex}^k - {{{\varvec{v}}}}^k \rangle \;. \end{aligned} \end{aligned}$$
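The closed forms (78) and the display above for \(\varPhi _{k+1}^\star\) can be spot-checked: evaluating \(\varPhi _{k+1}\) directly from the recursion (73), with \(\varPhi _k\) written as in (77), must agree with \(\varPhi _{k+1}^\star + (\mu /2) \Vert {\varvec{\theta }}- {{{\varvec{v}}}}^{k+1} \Vert ^2\) at every \({\varvec{\theta }}\). The sketch below does this for random data; the surrogate \({{{\varvec{g}}}}^k\) and the value \(F({\varvec{\theta }}_{ex}^k)\) are arbitrary, since these identities do not depend on them being a gradient or a function value. All names are hypothetical.

```python
import numpy as np

def phi_next(theta, phi_star, v, F_ex, g, theta_ex, mu, gamma):
    """Phi_{k+1}(theta) evaluated directly from (73), with Phi_k written as in (77)."""
    s = np.sqrt(mu * gamma)
    rho = 1.0 - s
    phi_k = phi_star + 0.5 * mu * np.sum((theta - v) ** 2)
    lin = F_ex + g @ (theta - theta_ex) + 0.5 * mu * np.sum((theta - theta_ex) ** 2)
    return rho * phi_k + s * lin

def closed_form(phi_star, v, F_ex, g, theta_ex, mu, gamma):
    """Minimizer v^{k+1} from (78) and minimum value Phi_{k+1}^* from the display above."""
    s = np.sqrt(mu * gamma)
    rho = 1.0 - s
    v_next = rho * v + s * theta_ex - np.sqrt(gamma / mu) * g
    diff = theta_ex - v
    phi_star_next = (rho * phi_star + s * F_ex + 0.5 * mu * rho * s * (diff @ diff)
                     - 0.5 * gamma * (g @ g) - rho * s * (g @ diff))
    return phi_star_next, v_next

rng = np.random.default_rng(3)
d, mu, gamma = 4, 0.2, 0.5
phi_star, F_ex = 1.3, 2.0
v, theta_ex, g = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
ps, vn = closed_form(phi_star, v, F_ex, g, theta_ex, mu, gamma)
th = rng.normal(size=d)
print(phi_next(th, phi_star, v, F_ex, g, theta_ex, mu, gamma), ps + 0.5 * mu * np.sum((th - vn) ** 2))
```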
Using the fact \({{{\varvec{v}}}}^k - {\varvec{\theta }}_{ex}^k = (\sqrt{\mu \gamma })^{-1} \left( {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \right)\) (proven in Sect. D.1), we have
$$\begin{aligned} \begin{aligned} \varPhi _{k+1}^\star&= \rho \varPhi _k^\star + \sqrt{\mu \gamma } F({\varvec{\theta }}_{ex}^k) + \frac{\mu }{2} \frac{ \rho }{ \sqrt{\mu \gamma } } \Vert {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \Vert ^2 - \frac{ \gamma }{2} \Vert {{{\varvec{g}}}}^k \Vert ^2 - \rho \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle \;. \end{aligned} \end{aligned}$$
(81)
We obtain the following chain:
$$\begin{aligned} \begin{aligned}&F( {\varvec{\theta }}^{k+1} ) - \varPhi _{k+1}^\star \overset{(a)}{\le } F( {\varvec{\theta }}_{ex}^k ) - \gamma \langle {\nabla }F( {\varvec{\theta }}_{ex}^k ), {{{\varvec{g}}}}^k \rangle + \frac{L \gamma ^2}{2} \Vert {{{\varvec{g}}}}^k \Vert ^2 - \varPhi _{k+1}^\star \\&\quad \overset{(b)}{=} \rho ~ \left( F({\varvec{\theta }}_{ex}^k ) + \langle {{{\varvec{g}}}}^k, {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle - \varPhi _k^\star \right) \\&\qquad -\gamma \langle {\nabla }F( {\varvec{\theta }}_{ex}^k ), {{{\varvec{g}}}}^k \rangle + \frac{\gamma }{2} \left( 1 + L \gamma \right) \Vert {{{\varvec{g}}}}^k \Vert ^2 - \frac{\mu }{2} \frac{ \rho }{ \sqrt{\mu \gamma } } \Vert {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \Vert ^2 \\&\quad \overset{(c)}{=} \rho ~ \left( F({\varvec{\theta }}_{ex}^k) + \langle {\nabla }F({\varvec{\theta }}_{ex}^k), {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle - \varPhi _k^\star \right) -\gamma \langle {\nabla }F( {\varvec{\theta }}_{ex}^k ), {{{\varvec{g}}}}^k \rangle \\&\qquad + \rho \langle {{{\varvec{e}}}}^k, {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle + \frac{\gamma }{2} \left( 1 + L \gamma \right) \Vert {{{\varvec{g}}}}^k \Vert ^2 - \frac{\mu }{2} \frac{ \rho }{ \sqrt{\mu \gamma } } \Vert {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \Vert ^2 \\&\quad \overset{(d)}{\le } \rho ~ \left( F({\varvec{\theta }}^k) - \varPhi _k^\star + \langle {{{\varvec{e}}}}^k, {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle \right) - \frac{\mu }{2} \frac{ 1 - \mu \gamma }{ \sqrt{\mu \gamma } } \Vert {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \Vert ^2 \\&\qquad + \frac{\gamma }{2} \left( 1 + L \gamma \right) \Vert {{{\varvec{g}}}}^k \Vert ^2 - \gamma \langle {\nabla }F( {\varvec{\theta }}_{ex}^k ), {{{\varvec{g}}}}^k \rangle \\&\quad \overset{(e)}{\le } \rho ~ \left( F({\varvec{\theta }}^k) - \varPhi _k^\star + \langle {{{\varvec{e}}}}^k, {\varvec{\theta }}^k - 
{\varvec{\theta }}_{ex}^k \rangle \right) - \frac{\mu }{2} \frac{ 1 - \mu \gamma }{ \sqrt{\mu \gamma } } \Vert {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \Vert ^2 + \gamma \Vert {{{\varvec{e}}}}^k \Vert ^2 \;, \end{aligned} \end{aligned}$$
(82)
where (a) is due to the L-smoothness of F; (b) is due to (81); (c) is obtained by expanding \({{{\varvec{g}}}}^k\) as \({\nabla }F({\varvec{\theta }}_{ex}^k) + {{{\varvec{e}}}}^k\); (d) uses the \(\mu\)-strong convexity of F, i.e., \(F({\varvec{\theta }}_{ex}^k) + \langle {\nabla }F({\varvec{\theta }}_{ex}^k), {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \rangle \le F({\varvec{\theta }}^k) - (\mu /2) \Vert {\varvec{\theta }}^k - {\varvec{\theta }}_{ex}^k \Vert ^2\), together with the identity \(\rho + \rho / \sqrt{\mu \gamma } = (1 - \mu \gamma ) / \sqrt{\mu \gamma }\); and (e) is due to the following chain of inequalities, whose last step uses \(\gamma \le 1/(2L)\):
$$\begin{aligned} \begin{aligned}&\frac{\gamma }{2} \left( 1 + L \gamma \right) \Vert {{{\varvec{g}}}}^k \Vert ^2 - \gamma \langle {\nabla }F( {\varvec{\theta }}_{ex}^k ), {{{\varvec{g}}}}^k \rangle \\&\quad \le \frac{\gamma }{2} \left( 1 + L \gamma \right) \left( \Vert {{{\varvec{e}}}}^k \Vert ^2 + \Vert {\nabla }F( {\varvec{\theta }}_{ex}^k ) \Vert ^2 \right) + \frac{ L \gamma ^2 }{2} \left( \Vert {\nabla }F({\varvec{\theta }}_{ex}^k ) \Vert ^2 + \Vert {{{\varvec{e}}}}^k \Vert ^2 \right) - \gamma \Vert {\nabla }F( {\varvec{\theta }}_{ex}^k ) \Vert ^2 \\&\quad = \left( \frac{\gamma }{2} + L \gamma ^2 \right) \Vert {{{\varvec{e}}}}^k \Vert ^2 + \left( -\frac{\gamma }{2} + L \gamma ^2 \right) \Vert {\nabla }F( {\varvec{\theta }}_{ex}^k ) \Vert ^2 \le \gamma \Vert {{{\varvec{e}}}}^k \Vert ^2 \;. \end{aligned} \end{aligned}$$
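Step (e) can be spot-checked numerically: for \({{{\varvec{g}}}}^k = {\nabla }F({\varvec{\theta }}_{ex}^k) + {{{\varvec{e}}}}^k\) with arbitrary vectors and any \(\gamma \in (0, 1/(2L)]\), the left-hand side never exceeds \(\gamma \Vert {{{\varvec{e}}}}^k \Vert ^2\), while it can fail for larger step sizes. The vectors below are random stand-ins for \({\nabla }F({\varvec{\theta }}_{ex}^k)\) and \({{{\varvec{e}}}}^k\), and the function name is hypothetical.

```python
import numpy as np

def step_e_holds(grad_F, e, L, gamma, tol=1e-12):
    """Check (gamma/2)(1 + L gamma)||g||^2 - gamma <grad F, g> <= gamma ||e||^2, with g = grad F + e."""
    g = grad_F + e
    lhs = 0.5 * gamma * (1.0 + L * gamma) * (g @ g) - gamma * (grad_F @ g)
    return lhs <= gamma * (e @ e) + tol

rng = np.random.default_rng(0)
L = 2.0
trials = [
    step_e_holds(rng.normal(size=5), rng.normal(size=5), L, rng.uniform(0.0, 1.0 / (2.0 * L)))
    for _ in range(1000)
]
print(all(trials))
# With e = 0 and gamma = 3/L the inequality fails, so the step-size restriction matters.
print(step_e_holds(np.ones(3), np.zeros(3), 1.0, 3.0))
```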
As \(\varPhi _1( {\varvec{\theta }}^1 ) = F( {\varvec{\theta }}^1 ) = \varPhi _1^\star\), applying the inequality (82) recursively shows:
$$\begin{aligned} \begin{aligned}&F( {\varvec{\theta }}^{k+1} ) - \varPhi _{k+1}^\star \le \\&\sum _{\ell =1}^k \rho ^{k-\ell } \left( (1-\sqrt{\mu \gamma }) \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \rangle + \gamma \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{2} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \right) \;. \end{aligned} \end{aligned}$$
(83)
Importantly, (83) establishes a lower bound on \(\varPhi _{k+1}^\star\) in terms of \(F({\varvec{\theta }}^{k+1})\) and the past gradient errors \(\{ {{{\varvec{e}}}}^\ell \}_{\ell =1}^k\).
Proving Proposition 2. Finally, summing up (83) and (76) gives:
$$\begin{aligned} \begin{aligned} h^{(k+1)}&\le 2 \rho ^k h^{(1)} + \sum _{\ell =1}^k \rho ^{k-\ell } \left( \sqrt{\mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\star - {\varvec{\theta }}_{ex}^\ell \rangle \right. \\&\quad \left. +\, \rho \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \rangle + \gamma \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{2} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \right) \\&= 2 \rho ^k h^{(1)} + \sum _{\ell =1}^k \rho ^{k-\ell } \left( \sqrt{\mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\star - {\varvec{\theta }}^\ell \rangle \right. \\&\quad \left. +\, \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \rangle + \gamma \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{2} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \right) \;. \end{aligned} \end{aligned}$$
(84)
Consider the summands in the last line of (84): for any \(\ell \ge 1\),
$$\begin{aligned} \begin{aligned}&\sqrt{\mu \gamma } \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\star - {\varvec{\theta }}^\ell \rangle + \langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \rangle + \gamma \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{2} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \\&\quad \overset{(a)}{\le } \sqrt{\mu \gamma } \Vert {{{\varvec{e}}}}^\ell \Vert \Vert {\varvec{\theta }}^\star - {\varvec{\theta }}^\ell \Vert + \left( \gamma + \frac{ \sqrt{\gamma / \mu } }{ 1 - \mu \gamma } \right) \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \\&\quad \overset{(b)}{\le } \sqrt{2 \gamma h^{(\ell )}} \Vert {{{\varvec{e}}}}^\ell \Vert + \left( \gamma + \frac{ \sqrt{\gamma / \mu } }{ 1 - \mu \gamma } \right) \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \\&\quad \overset{(c)}{\le } \sqrt{2 \gamma h^{(\ell )}} \Vert {{{\varvec{e}}}}^\ell \Vert + \sqrt{\frac{9\gamma }{\mu }} \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \;, \end{aligned} \end{aligned}$$
(85)
where (a) follows from the Cauchy-Schwarz inequality and the inequality \(\langle {{{\varvec{e}}}}^\ell , {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \rangle \le (1/2) ( \Vert {{{\varvec{e}}}}^\ell \Vert ^2 / c + c \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2 )\), valid for any \(c > 0\), with \(c = \frac{\mu }{2} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }}\); (b) is due to the relation \(\Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}^\star \Vert \le \sqrt{2 h^{(\ell )} / \mu }\), which follows from the \(\mu\)-strong convexity of F; (c) is due to \(\gamma + \frac{ \sqrt{\gamma / \mu } }{ 1 - \mu \gamma } \le 3 \sqrt{ \gamma / \mu }\), which holds because \(\gamma \le 1/(2L)\) and \(\mu \le L\) give \(\mu \gamma \le 1/2\), so that \(1/(1-\mu \gamma ) \le 2\) and \(\gamma = \sqrt{\mu \gamma } \sqrt{\gamma / \mu } \le \sqrt{\gamma / \mu }\). Combining (84) and (85) yields the desired result of Proposition 2.
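The scalar inequality behind step (c) in (85) can be checked on a grid of step sizes in \((0, 1/(2L)]\) for a few choices of \(\mu \le L\). The grid, the \((\mu , L)\) pairs, and the function names are illustrative assumptions.

```python
import numpy as np

def coeff_lhs(gamma, mu):
    """Left-hand side of step (c): gamma + sqrt(gamma/mu) / (1 - mu*gamma)."""
    return gamma + np.sqrt(gamma / mu) / (1.0 - mu * gamma)

def step_c_holds(mu, L, n=2001):
    """Check coeff_lhs <= 3 sqrt(gamma/mu) on a grid of gamma in (0, 1/(2L)], assuming mu <= L."""
    gammas = np.linspace(1e-10, 1.0 / (2.0 * L), n)
    return bool(np.all(coeff_lhs(gammas, mu) <= 3.0 * np.sqrt(gammas / mu)))

pairs = [(1e-3, 1.0), (0.1, 1.0), (1.0, 1.0), (0.5, 20.0)]
for mu, L in pairs:
    print(mu, L, step_c_holds(mu, L))
```

The tightest case is \(\mu = L\) at \(\gamma = 1/(2L)\), where the left-hand side equals roughly \(2.71 \sqrt{\gamma / \mu }\).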
D.1 Proof of the Equality
We prove \({{{\varvec{v}}}}^k - {\varvec{\theta }}_{ex}^k = (\sqrt{\mu \gamma })^{-1} \left( {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k \right)\) by induction on k. The equality holds for \(k=1\) since \({{{\varvec{v}}}}^1 = {\varvec{\theta }}^1 = {\varvec{\theta }}_{ex}^1\). Assume that it holds up to k, and consider:
$$\begin{aligned} \begin{aligned}&{{{\varvec{v}}}}^{k+1} - {\varvec{\theta }}_{ex}^{k+1} = \rho {{{\varvec{v}}}}^k + \sqrt{ \mu \gamma } {\varvec{\theta }}_{ex}^k - \sqrt{\frac{\gamma }{\mu }} {{{\varvec{g}}}}^k - {\varvec{\theta }}_{ex}^{k+1} \\&\quad =\rho ( {{{\varvec{v}}}}^k - {\varvec{\theta }}_{ex}^k ) + {\varvec{\theta }}_{ex}^k - \sqrt{\frac{\gamma }{\mu }} {{{\varvec{g}}}}^k - {\varvec{\theta }}_{ex}^{k+1} = \frac{ \rho }{ \sqrt{\mu \gamma } } ( {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k ) + {\varvec{\theta }}_{ex}^k - \sqrt{\frac{\gamma }{\mu }} {{{\varvec{g}}}}^k - {\varvec{\theta }}_{ex}^{k+1} \;, \end{aligned} \end{aligned}$$
where we have used the induction hypothesis. Furthermore, using \({\varvec{\theta }}^{k+1} = {\varvec{\theta }}_{ex}^k - \gamma {{{\varvec{g}}}}^k\),
$$\begin{aligned} \begin{aligned}&{{{\varvec{v}}}}^{k+1} - {\varvec{\theta }}_{ex}^{k+1} = \sqrt{\mu \gamma }^{-1} \left( \rho ({\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k) + \sqrt{\mu \gamma } ( {\varvec{\theta }}_{ex}^k - {\varvec{\theta }}_{ex}^{k+1} ) - \gamma {{{\varvec{g}}}}^k \right) \\&\quad \overset{(a)}{=} \sqrt{ \mu \gamma }^{-1} \left( \sqrt{\mu \gamma } ( {\varvec{\theta }}^{k+1} - {\varvec{\theta }}_{ex}^{k+1} ) + \rho ({\varvec{\theta }}^{k+1} - {\varvec{\theta }}^k ) \right) = \sqrt{ \mu \gamma }^{-1} \left( {\varvec{\theta }}_{ex}^{k+1} - {\varvec{\theta }}^{k+1} \right) \;, \end{aligned} \end{aligned}$$
(86)
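where (a) is obtained by rearranging terms using \({\varvec{\theta }}^{k+1} = {\varvec{\theta }}_{ex}^k - \gamma {{{\varvec{g}}}}^k\) and \(\rho + \sqrt{\mu \gamma } = 1\), and the last equality is due to the extrapolation step, which satisfies \(\rho ({\varvec{\theta }}^{k+1} - {\varvec{\theta }}^k ) = (1 + \sqrt{\mu \gamma } ) ( {\varvec{\theta }}_{ex}^{k+1} - {\varvec{\theta }}^{k+1} )\).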
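The identity can also be observed numerically. The sketch runs the updates with an arbitrary surrogate \({{{\varvec{g}}}}^k\) (the identity does not use any property of \({{{\varvec{g}}}}^k\)), the minimizer update (78), the step \({\varvec{\theta }}^{k+1} = {\varvec{\theta }}_{ex}^k - \gamma {{{\varvec{g}}}}^k\), and an extrapolation weight \(\alpha = \rho / (1 + \sqrt{\mu \gamma })\), which is our reading of the relation used in the last equality of (86). It reports the largest deviation from \({{{\varvec{v}}}}^k - {\varvec{\theta }}_{ex}^k = ({\varvec{\theta }}_{ex}^k - {\varvec{\theta }}^k)/\sqrt{\mu \gamma }\). The function name is hypothetical.

```python
import numpy as np

def identity_gap(mu, gamma, d=4, n_iter=50, seed=0):
    """Max deviation from v^k - theta_ex^k = (theta_ex^k - theta^k) / sqrt(mu*gamma)."""
    rng = np.random.default_rng(seed)
    s = np.sqrt(mu * gamma)
    rho = 1.0 - s
    alpha = rho / (1.0 + s)                   # assumed extrapolation weight
    theta = rng.normal(size=d)
    theta_ex, v = theta.copy(), theta.copy()  # v^1 = theta^1 = theta_ex^1
    max_gap = 0.0
    for _ in range(n_iter):
        g = rng.normal(size=d)                                   # arbitrary surrogate g^k
        v = rho * v + s * theta_ex - np.sqrt(gamma / mu) * g     # (78), uses theta_ex^k
        theta_next = theta_ex - gamma * g                        # theta^{k+1}
        theta_ex = theta_next + alpha * (theta_next - theta)     # theta_ex^{k+1}
        theta = theta_next
        gap = np.linalg.norm((v - theta_ex) - (theta_ex - theta) / s)
        max_gap = max(max_gap, float(gap))
    return max_gap

settings = [(0.1, 0.5), (1.0, 0.25), (0.01, 0.05)]
print([identity_gap(mu, gamma) for mu, gamma in settings])
```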
E Proof of Proposition 4
We begin by observing that due to the \(L_{H,i}\)-Lipschitz continuity of the Hessian of \(f_i\) and using Lemma 1, we have:
$$\begin{aligned} \begin{aligned}&\Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^\ell \Vert = \Vert {{{\varvec{g}}}}_{\textsf{ACIAG}}^\ell - {\nabla }F( {\varvec{\theta }}_{ex}^\ell ) \Vert \le \sum _{i=1}^m \frac{L_{H,i}}{2} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}_{ex}^{\tau _i^\ell } \Vert ^2 \;. \end{aligned} \end{aligned}$$
(87)
Now, expanding the right hand side of (87) gives:
$$\begin{aligned} \begin{aligned} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^\ell \Vert&\le \sum _{i=1}^m \frac{L_{H,i}}{2} \Big \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}_{ex}^{\tau _i^\ell } \Big \Vert ^2 \le \sum _{i=1}^m \frac{L_{H,i}}{2} ~ \underbrace{( \ell - \tau _i^\ell )}_{\le K} \sum _{j=\tau _i^\ell }^{\ell -1} \Vert {\varvec{\theta }}_{ex}^{j+1} - {\varvec{\theta }}_{ex}^j \Vert ^2 \\&\quad \le \frac{K L_{H}}{2} \sum _{j=( \ell -K )_{++}}^{\ell -1} \Vert {\varvec{\theta }}_{ex}^{j+1} - {\varvec{\theta }}_{ex}^j \Vert ^2 = \frac{K L_{H}}{2} \sum _{j=( \ell -K )_{++}}^{\ell -1} \Vert - \gamma {{{\varvec{g}}}}_{\textsf{ACIAG}}^j + \underbrace{\alpha ( {\varvec{\theta }}^{j+1} - {\varvec{\theta }}^j)}_{= {\varvec{\theta }}_{ex}^{j+1} - {\varvec{\theta }}^{j+1}} \Vert ^2 \\&\quad \le \frac{3 K L_H}{2} \sum _{j=( \ell -K )_{++}}^{\ell -1} \left( \gamma ^2 \left( \Vert {{{\varvec{e}}}}^j \Vert ^2 + \Vert {\nabla }F( {\varvec{\theta }}_{ex}^j ) \Vert ^2 \right) + \Vert {\varvec{\theta }}_{ex}^{j+1} - {\varvec{\theta }}^{j+1} \Vert ^2 \right) \;. \end{aligned} \end{aligned}$$
(88)
Notably, the above bound resembles that of Proposition 3, except for the last term, which depends on \({\varvec{\theta }}_{ex}^{j+1} - {\varvec{\theta }}^{j+1}\). This term accounts for the extrapolated iterates used in the A-CIAG method.
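Both (60) and (88) rely on the telescoping bound \(\Vert {\varvec{x}}^{\tau } - {\varvec{x}}^k \Vert ^2 \le (k - \tau ) \sum _{j=\tau }^{k-1} \Vert {\varvec{x}}^{j+1} - {\varvec{x}}^j \Vert ^2\), a consequence of the Cauchy-Schwarz inequality that holds for any sequence. The sketch checks it on random trajectories, which stand in for either the iterates or the extrapolated iterates; the function name is hypothetical.

```python
import numpy as np

def telescoping_holds(n_steps=10, d=3, trials=100, seed=2):
    """Check ||x^tau - x^k||^2 <= (k - tau) * sum_{j=tau}^{k-1} ||x^{j+1} - x^j||^2 for all tau < k."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        traj = np.cumsum(rng.normal(size=(n_steps + 1, d)), axis=0)   # arbitrary trajectory
        k = n_steps
        for tau in range(k):
            lhs = np.sum((traj[tau] - traj[k]) ** 2)
            steps = np.diff(traj[tau:k + 1], axis=0)                  # k - tau increments
            rhs = (k - tau) * np.sum(steps ** 2)
            if lhs > rhs * (1.0 + 1e-12):
                return False
    return True

print(telescoping_holds())
```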
To obtain the upper bound on \(\Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^\ell \Vert\) stated in Proposition 4, we next bound \(\Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^j \Vert ^2\) and \(\Vert {\nabla }F({\varvec{\theta }}_{ex}^j) \Vert ^2\) in turn. First,
$$\begin{aligned} \begin{aligned} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^j \Vert&\le \sum _{i=1}^m \frac{L_{H,i}}{2} \Big \Vert {\varvec{\theta }}_{ex}^j - {\varvec{\theta }}_{ex}^{\tau _i^j} \Big \Vert ^2 \\&\le \sum _{i=1}^m L_{H,i} \left( (1+\alpha )^2 \Vert {\varvec{\theta }}^j - {\varvec{\theta }}^{\tau _i^j} \Vert ^2 + \alpha ^2 \Vert {\varvec{\theta }}^{j-1} - {\varvec{\theta }}^{\tau _i^j-1} \Vert ^2 \right) \;. \end{aligned} \end{aligned}$$
(89)
Since \(\Vert {\varvec{\theta }}^j - {\varvec{\theta }}^{\tau _i^j} \Vert ^2 \le 2 ( \Vert {\varvec{\theta }}^j - {\varvec{\theta }}^\star \Vert ^2 + \Vert {\varvec{\theta }}^{\tau _i^j} - {\varvec{\theta }}^\star \Vert ^2 ) \le (4/\mu ) ( h^{(j)} + h^{(\tau _i^j)} )\), and \((1+\alpha )^2 + \alpha ^2 \le 5\) for \(\alpha \le 1\), it follows from (89) that
$$\begin{aligned} \begin{aligned} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^j \Vert&\le \frac{4}{\mu } \sum _{i=1}^m L_{H,i} \left( (1+\alpha )^2 ( h^{(j)} + h^{(\tau _i^j)} ) + \alpha ^2 ( h^{(j-1)} + h^{(\tau _i^j - 1)} ) \right) \\&\le \frac{ 8 L_H }{\mu } \left( (1+\alpha )^2 + \alpha ^2 \right) \max _{ (j- K-1)_{++} \le q \le j } h^{(q)} \le \frac{ 40 L_H }{\mu } \max _{ (j- K-1)_{++} \le q \le j } h^{(q)} \;, \end{aligned} \end{aligned}$$
(90)
which implies
$$\begin{aligned} \begin{aligned} \sum _{j=(\ell -K)_{++}}^{\ell -1} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^j \Vert ^2&\le K \left( \frac{ 40 L_H }{\mu }\right) ^2 \max _{ (\ell - 2K-1)_{++} \le q \le \ell } ( h^{(q)} )^2 \;. \end{aligned} \end{aligned}$$
(91)
Secondly,
$$\begin{aligned} \begin{aligned} \Vert {\nabla }F( {\varvec{\theta }}_{ex}^j ) \Vert ^2&\le 2L^2 \left( \Vert {\varvec{\theta }}^j - {\varvec{\theta }}^\star \Vert ^2 + \Vert {\varvec{\theta }}^j - {\varvec{\theta }}^{j-1} \Vert ^2 \right) \le \frac{4L^2}{\mu } \left( 3 h^{(j)} + 2 h^{(j-1)} \right) \;, \end{aligned} \end{aligned}$$
(92)
thus
$$\begin{aligned} \begin{aligned} \sum _{j=(\ell -K)_{++}}^{\ell -1} \Vert {\nabla }F( {\varvec{\theta }}_{ex}^j ) \Vert ^2&\le \frac{20L^2 K }{\mu } \max _{ (\ell - K - 1)_{++} \le q \le \ell -1} h^{(q)} \;. \end{aligned} \end{aligned}$$
(93)
Substituting (91) and (93) into the right hand side of (88) verifies Proposition 4.
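The bound (92) on the gradient at the extrapolated point can be spot-checked on a strongly convex quadratic \(F({\varvec{\theta }}) = \frac{1}{2} {\varvec{\theta }}^\top A {\varvec{\theta }}\) whose spectrum lies in \([\mu , L]\), so that \({\varvec{\theta }}^\star = {\varvec{0}}\) and \(h^{(j)} = F({\varvec{\theta }}^j)\). The quadratic, the random iterates, and the extrapolation form \({\varvec{\theta }}_{ex}^j = {\varvec{\theta }}^j + \alpha ({\varvec{\theta }}^j - {\varvec{\theta }}^{j-1})\) with \(\alpha \in [0,1]\) are illustrative assumptions; the function name is hypothetical.

```python
import numpy as np

def check_92(mu, L, alpha, d=5, trials=200, seed=1):
    """Spot-check (92) on F(theta) = 0.5 theta^T A theta with eigenvalues in [mu, L]."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))      # random orthogonal basis
    A = Q @ np.diag(np.linspace(mu, L, d)) @ Q.T

    def F(th):
        # minimizer theta* = 0 and F* = 0, so the optimality gap h equals F
        return 0.5 * th @ A @ th

    for _ in range(trials):
        th_prev, th = rng.normal(size=d), rng.normal(size=d)
        th_ex = th + alpha * (th - th_prev)            # extrapolated point theta_ex^j
        lhs = np.sum((A @ th_ex) ** 2)                 # ||grad F(theta_ex^j)||^2
        rhs = 4.0 * L**2 / mu * (3.0 * F(th) + 2.0 * F(th_prev))
        if lhs > rhs * (1.0 + 1e-12):
            return False
    return True

print([check_92(0.1, 1.0, a) for a in (0.0, 0.5, 0.99)])
```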
F Step 3 in the Proof of Theorem 2
To proceed with the proof, let us define the following quantity:
$$\begin{aligned} \begin{aligned}&\tilde{E}^{(\ell )} \mathrel{\mathop :}=\gamma ^{\frac{5}{2}} \sqrt{\frac{9}{2}} K^2 L_H \left( \left( \frac{40L_H}{\mu } \right) ^2 \max _{ (\ell -2K-1)_{++} \le q \le \ell } (h^{(q)})^2 + \frac{20L^2}{\mu } \max _{ (\ell -K-1)_{++} \le q \le \ell } h^{(q)} \right) \\&\quad + \gamma ^{\frac{9}{2}} \frac{ 81 K^4 L_H^2 }{4 \sqrt{\mu }} \left( \left( \frac{40L_H}{\mu } \right) ^4 \max _{ (\ell -2K-1)_{++} \le q \le \ell } (h^{(q)})^4 + \left( \frac{20L^2}{\mu } \right) ^2 \max _{ (\ell -K-1)_{++} \le q \le \ell } (h^{(q)})^2 \right) \;. \end{aligned} \end{aligned}$$
Using Proposition 4, we obtain:
$$\begin{aligned} \begin{aligned}&\sqrt{2 \gamma h^{(\ell )}} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^\ell \Vert + \sqrt{\frac{9\gamma }{\mu }} \Vert {{{\varvec{e}}}}_{\textsf{ACIAG}}^\ell \Vert ^2 \\&\quad \le \tilde{E}^{(\ell )} + \sum _{j=(\ell -K+1)_{++}}^{\ell } \left( \sqrt{\frac{9 \gamma h^{(\ell )} K^2 L_H^2}{2}} \Vert {\varvec{\theta }}^{j} - {\varvec{\theta }}_{ex}^{j} \Vert ^2 + \frac{27 K^3 L_H^2}{4} \sqrt{\frac{9\gamma }{\mu }} \Vert {\varvec{\theta }}^j - {\varvec{\theta }}_{ex}^j \Vert ^4 \right) \;. \end{aligned} \end{aligned}$$
(94)
To create a recursion for \(h^{(k)}\), we need to bound the right-hand side of (41) in Proposition 2 in terms of \(h^{(k)}\) itself. Starting from (94), it follows that
$$\begin{aligned} \begin{aligned}&\sum _{\ell =1}^k \rho ^{k-\ell } \left( \sqrt{2 \gamma h^{(\ell )}} \Vert {{{\varvec{e}}}}^\ell \Vert + \sqrt{\frac{9\gamma }{\mu }} \Vert {{{\varvec{e}}}}^\ell \Vert ^2 - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \Vert {\varvec{\theta }}_{ex}^\ell - {\varvec{\theta }}^\ell \Vert ^2 \right) \le \sum _{\ell =1}^k \rho ^{k-\ell } \Bigg ( \tilde{E}^{(\ell )} \\+ &\left( \sum _{j=\ell }^{\min \{k,\ell +K-1\}} \left( \sqrt{\frac{9 \gamma K^2 L_H^2 h^{(j)}}{2} } + \frac{81 K^3 L_H^2}{4} \sqrt{\frac{\gamma }{\mu }} \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2 \right) - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu \gamma }} \right) \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2 \Bigg ). \end{aligned} \end{aligned}$$
Moreover, we observe for \(\ell \ge 2\):
$$\begin{aligned} \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2 \le 2 ( \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}^\star \Vert ^2 + \Vert {\varvec{\theta }}^{\ell -1} - {\varvec{\theta }}^\star \Vert ^2 ) \le \frac{4}{\mu } \left( h^{(\ell )} + h^{(\ell -1)} \right) \;. \end{aligned}$$
(95)
Using (95), the coefficient in front of the last \(\Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2\) term is upper bounded by \(\tilde{C}^{(\ell ,k)} / \sqrt{\gamma }\), where
$$\begin{aligned} \tilde{C}^{(\ell ,k)} \mathrel{\mathop :}=\gamma K^2 L_H \sqrt{\frac{9}{2}} \max _{ \ell \le q \le \min \{ \ell +K-1,k \}} (h^{(q)})^{\frac{1}{2}} + {\gamma } \frac{81 K^4 L_H^2}{\mu ^{\frac{3}{2}}} \left( h^{(\ell )} + h^{(\ell -1)} \right) - \frac{\mu }{4} \frac{1-\mu \gamma }{\sqrt{\mu }}. \end{aligned}$$
Define
$$\begin{aligned} \begin{aligned}&E^{(\ell ,k)} \mathrel{\mathop :}=\tilde{E}^{(\ell )} + \tilde{C}^{(\ell ,k)} \frac{\Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2}{\sqrt{\gamma }} \;, \end{aligned} \end{aligned}$$
(96)
and note that \(E^{(\ell ,k)} = E^{(\ell ,k -1)}\) for all \(k \ge \ell + m\). Applying Proposition 2 readily shows
$$\begin{aligned} h^{(k+1)} \le 2 ( 1 - \sqrt{\mu \gamma } )^k h^{(1)} + \sum _{\ell =1}^k (1 - \sqrt{\mu \gamma })^{k - \ell } E^{(\ell ,k)} \;. \end{aligned}$$
(97)
Concluding the Proof of Theorem 2. We analyze (97) using Proposition 6 with the following identifications:
$$\begin{aligned} R^{(k)}= & {} \bar{h}^{(k)},~p = (1-\sqrt{\mu \gamma }),~b = 2,~M= 2K+1,~\eta _1 = \frac{3}{2},~\eta _2 = \frac{5}{2}, \eta _3 = 2,~\eta _4 = 4\\ s_1= & {} \gamma ^{\frac{5}{2}} \sqrt{\frac{9}{2}} K^2 L_H \frac{20L^2}{\mu },~ s_2 = \gamma ^{\frac{5}{2}} \sqrt{\frac{9}{2}} K^2 L_H \left( \frac{40L_H}{\mu } \right) ^2, \\ s_3= & {} \gamma ^{\frac{9}{2}} \frac{81 K^4 L_H^2}{4\sqrt{\mu }} \left( \frac{20L^2}{\mu }\right) ^2,~ s_4 = \gamma ^{\frac{9}{2}} \frac{81 K^4 L_H^2}{4\sqrt{\mu }} \left( \frac{40L_H}{\mu } \right) ^4 \;,\\ c= & {} \frac{\mu }{4} \frac{1 - \mu \gamma }{\sqrt{\mu }},~D^{(\ell )} = \frac{ \Vert {\varvec{\theta }}^\ell - {\varvec{\theta }}_{ex}^\ell \Vert ^2 }{\sqrt{\gamma }},~f( \bar{h}^{(q)} ) = \gamma \left( K^2 L_H \sqrt{\frac{9}{2}} (\bar{h}^{(q)})^{\frac{1}{2}} + \frac{162 K^4 L_H^2}{\mu ^{\frac{3}{2}}} \bar{h}^{(q)} \right) \;. \end{aligned}$$
The conditions in (55) are satisfied when
$$\begin{aligned} \begin{aligned}&\frac{\sqrt{\mu }}{4} - \gamma \left( K^2 L_H \sqrt{9} (\bar{h}^{(1)})^{\frac{1}{2}} + \frac{324 K^4 L_H^2}{\mu ^{\frac{3}{2}}} \bar{h}^{(1)} + \frac{\mu ^{\frac{3}{2}}}{4} \right) \ge 0 \\&\Longleftrightarrow \gamma \le \frac{\sqrt{\mu }}{4} \left( K^2 L_H \sqrt{9} (\bar{h}^{(1)})^{\frac{1}{2}} + \frac{324 K^4 L_H^2}{\mu ^{\frac{3}{2}}} \bar{h}^{(1)} + \frac{\mu ^{\frac{3}{2}}}{4} \right) ^{-1} \mathrel{\mathop :}=\frac{\bar{c}_3}{L} \;, \end{aligned} \end{aligned}$$
(98)
and
$$\begin{aligned} \begin{aligned} 1 > (1-\sqrt{\mu \gamma })&+ \gamma ^{\frac{5}{2}} \sqrt{\frac{9}{2}} K^2 L_H \left( \frac{20L^2}{\mu } (2 \bar{h}^{(1)})^{\frac{1}{2}} + \left( \frac{40L_H}{\mu } \right) ^2 (2 \bar{h}^{(1)})^{\frac{3}{2}} \right) \\&+ \gamma ^{\frac{9}{2}} \frac{81 K^4 L_H^2}{4\sqrt{\mu }} \left( \left( \frac{20L^2}{\mu }\right) ^2 (2 \bar{h}^{(1)} ) + \left( \frac{40L_H}{\mu } \right) ^4 (2 \bar{h}^{(1)})^3 \right) \;, \end{aligned} \end{aligned}$$
(99)
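Since \(1 - (1-\sqrt{\mu \gamma }) = \sqrt{\mu \gamma }\), condition (99) holds if each of the two remaining terms on its right-hand side is smaller than \(\sqrt{\mu \gamma }/2\). This is implied by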
$$\begin{aligned} \begin{aligned}&\gamma< \left( \frac{\sqrt{\mu }}{\sqrt{18} K^2 L_H}\left( \frac{20L^2}{\mu } (2 \bar{h}^{(1)})^{\frac{1}{2}} + \left( \frac{40L_H}{\mu } \right) ^2 (2 \bar{h}^{(1)})^{\frac{3}{2}} \right) ^{-1} \right) ^{\frac{1}{2}} \mathrel{\mathop :}=\frac{\bar{c}_1}{L}~~~~\text{and} \\&\gamma < \left( \frac{2 {\mu }}{81 K^4 L_H^2} \left( \left( \frac{20L^2}{\mu }\right) ^2 (2 \bar{h}^{(1)} ) + \left( \frac{40L_H}{\mu } \right) ^4 (2 \bar{h}^{(1)})^3 \right) ^{-1} \right) ^{\frac{1}{4}} \mathrel{\mathop :}=\frac{\bar{c}_2}{L} \;. \end{aligned} \end{aligned}$$
(100)
Substituting these constants into Proposition 6 proves the claims in Theorem 2.
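The step-size conditions can also be checked numerically. The sketch below evaluates \(\bar{c}_1/L\), \(\bar{c}_2/L\), and \(\bar{c}_3/L\) from (98) and (100). It then confirms that a step size slightly below their minimum satisfies both (98) and (99). The function names are hypothetical. The values of \(\mu, L, L_H, K\) and \(\bar{h}^{(1)}\) are illustrative assumptions only.

```python
import math

def step_size_bounds(mu, L, L_H, K, h1):
    """Return (c1/L, c2/L, c3/L) from (100) and (98); h1 plays the role of hbar^(1)."""
    P1 = 20 * L**2 / mu * (2 * h1) ** 0.5 + (40 * L_H / mu) ** 2 * (2 * h1) ** 1.5
    P2 = (20 * L**2 / mu) ** 2 * (2 * h1) + (40 * L_H / mu) ** 4 * (2 * h1) ** 3
    g1 = (math.sqrt(mu) / (math.sqrt(18) * K**2 * L_H) / P1) ** 0.5
    g2 = (2 * mu / (81 * K**4 * L_H**2) / P2) ** 0.25
    X = 3 * K**2 * L_H * math.sqrt(h1) + 324 * K**4 * L_H**2 / mu**1.5 * h1 + mu**1.5 / 4
    g3 = math.sqrt(mu) / 4 / X
    return g1, g2, g3

def condition_98(gamma, mu, L_H, K, h1):
    """Left-hand side of the first line of (98); it must be >= 0."""
    X = 3 * K**2 * L_H * math.sqrt(h1) + 324 * K**4 * L_H**2 / mu**1.5 * h1 + mu**1.5 / 4
    return math.sqrt(mu) / 4 - gamma * X

def rhs_99(gamma, mu, L, L_H, K, h1):
    """Right-hand side of (99); it must be < 1."""
    A = gamma**2.5 * math.sqrt(9 / 2) * K**2 * L_H * (
        20 * L**2 / mu * (2 * h1) ** 0.5 + (40 * L_H / mu) ** 2 * (2 * h1) ** 1.5)
    B = gamma**4.5 * 81 * K**4 * L_H**2 / (4 * math.sqrt(mu)) * (
        (20 * L**2 / mu) ** 2 * (2 * h1) + (40 * L_H / mu) ** 4 * (2 * h1) ** 3)
    return (1 - math.sqrt(mu * gamma)) + A + B

# Hypothetical problem constants, chosen only for illustration.
mu, L, L_H, K, h1 = 0.1, 1.0, 0.5, 3, 0.01
g1, g2, g3 = step_size_bounds(mu, L, L_H, K, h1)
gamma = 0.99 * min(g1, g2, g3)
```

Here \(\bar{c}_3/L\) is the binding constraint. At \(\gamma = \bar{c}_3/L\), the left-hand side of (98) is zero up to rounding, which reflects the equivalence stated there.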
G Proof of Proposition 6
Define the sequence \(\{ \bar{R}^{(k)} \}_{k \ge 1}\) by
$$\begin{aligned} \bar{R}^{(k+1)} = p^k b \bar{R}^{(1)} + \sum _{\ell =1}^k p^{k-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell - M)_{++} \le q \le \ell } (\bar{R}^{(q)})^{\eta _j} \right) ,~~\bar{R}^{(1)} = R^{(1)} \;, \end{aligned}$$
(101)
Subtracting \(p \bar{R}^{(k)}\) from \(\bar{R}^{(k+1)}\) shows that, for \(k \ge 2\), (101) can be equivalently expressed as
$$\begin{aligned} \bar{R}^{(k+1)} - p \bar{R}^{(k)} = \sum _{j=1}^J s_j \max _{ (k- M)_{++} \le q \le k } (\bar{R}^{(q)})^{\eta _j} \;. \end{aligned}$$
(102)
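The equivalence between (101) and (102) can be checked by computing the sequence both ways. In the sketch below, the function names and parameter values are hypothetical. It also assumes that \((x)_{++}\) denotes \(\max\{x, 1\}\), so that every window starts at index 1 or later. It further shows that at \(k = 1\) the one-step form is off by \((b-1) p \bar{R}^{(1)}\), which is why (102) is stated for \(k \ge 2\).

```python
def pp(x):
    # Assumption: (x)_{++} denotes max(x, 1), so windows start at index 1 or later.
    return max(x, 1)

def rbar_closed_form(R1, p, b, s, eta, M, n):
    """Rbar^(1), ..., Rbar^(n) evaluated term by term from (101)."""
    Rb = {1: R1}
    for k in range(1, n):
        total = p**k * b * Rb[1]
        for ell in range(1, k + 1):
            window = range(pp(ell - M), ell + 1)
            total += p ** (k - ell) * sum(
                sj * max(Rb[q] ** ej for q in window) for sj, ej in zip(s, eta))
        Rb[k + 1] = total
    return Rb

def rbar_one_step(R1, p, b, s, eta, M, n):
    """Same sequence via (102) for k >= 2; Rbar^(2) comes from (101) with k = 1."""
    Rb = {1: R1, 2: p * b * R1 + sum(sj * R1 ** ej for sj, ej in zip(s, eta))}
    for k in range(2, n):
        window = range(pp(k - M), k + 1)
        Rb[k + 1] = p * Rb[k] + sum(
            sj * max(Rb[q] ** ej for q in window) for sj, ej in zip(s, eta))
    return Rb

# Hypothetical parameters chosen only for illustration.
params = dict(R1=1.0, p=0.9, b=2.0, s=[0.01, 0.01], eta=[1.5, 2.0], M=3, n=40)
closed, one_step = rbar_closed_form(**params), rbar_one_step(**params)
max_diff = max(abs(closed[i] - one_step[i]) for i in range(1, 41))
# At k = 1 the one-step form misses the extra (b - 1) p Rbar^(1) term (R1 = 1 here).
gap_k1 = closed[2] - params["p"] * closed[1] - sum(params["s"])
```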
Next, we express statements (i) and (ii) in (56) as the following event:
$$\begin{aligned} \begin{aligned} \mathcal{E}_z = \Big \{&~ \bar{R}^{((z-1)M + k+1)} \ge R^{((z-1)M + k+1)}, ~\bar{R}^{((z-1)M + k+1)} \le \delta ^z (b \bar{R}^{(1)} ),~ k = 1,..., M \Big \} \;, \end{aligned} \end{aligned}$$
for all \(z \ge 1\). We prove by induction that \(\mathcal{E}_z\) holds for every \(z = 1, 2, \ldots\)
Base case with \(z=1\). To prove \(\mathcal{E}_1\), we use a nested induction on k. For the base case \(k=1\),
$$\begin{aligned} \begin{aligned} \bar{R}^{(2)}&\ge p ( b R^{(1)} ) + \sum _{j=1}^J s_j (R^{(1)})^{\eta _j} - ( \bar{f} - f( R^{(1)})) D^{(1)} = R^{(2)} \;, \end{aligned} \end{aligned}$$
(103)
where we used \(\bar{f} \ge f( b R^{(1)} ) \ge f( R^{(1)} )\). Moreover, the upper bound in \(\mathcal{E}_1\) holds for \(k=1\), since
$$\begin{aligned} \bar{R}^{(2)} = (b \bar{R}^{(1)}) \left( p + (1/b) \sum _{j=1}^J s_j ( \bar{R}^{(1)} )^{\eta _j - 1} \right) \le \delta ( b \bar{R}^{(1)} ) \;. \end{aligned}$$
(104)
For the induction step, suppose that the statements in \(\mathcal{E}_1\) hold up to \(k=k' - 1\), i.e., \(\bar{R}^{(k')} \ge R^{(k')}\) and \(\bar{R}^{(k')} \le \delta ( b \bar{R}^{(1)} )\). For \(k=k'\), we observe that \(\bar{f} \ge f( b R^{(1)} ) \ge f (\delta b R^{(1)} ) \ge f( \bar{R}^{(q)} ) \ge f( R^{(q)} )\) for all \(q=1,...,k'\). Therefore, we can lower bound \(\bar{R}^{(k'+1)}\) as:
$$\begin{aligned} \begin{aligned}&\bar{R}^{(k'+1)} = p^{k'} ( b \bar{R}^{(1)} ) + \sum _{\ell =1}^{k'} p^{k'-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell -M)_{++} \le q \le \ell } (\bar{R}^{(q)})^{\eta _j} \right) \\&\quad \ge p^{k'} ( b R^{(1)} ) + \sum _{\ell =1}^{k'} p^{k'-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell -M)_{++} \le q \le \ell } (R^{(q)})^{\eta _j} - \left( \bar{f} - \max _{\ell \le q \le k'} f(R^{(q)}) \right) D^{(\ell )} \right) , \end{aligned} \end{aligned}$$
where the right-hand side is exactly \(R^{(k'+1)}\). Moreover, using (102), we obtain
$$\begin{aligned} \begin{aligned} \bar{R}^{(k'+1)}&\le ( b \bar{R}^{(1)} ) \left( \delta p + \sum _{j=1}^J s_j (b \bar{R}^{(1)})^{\eta _j-1} \right) \le \delta ( b \bar{R}^{(1)} ) \;. \end{aligned} \end{aligned}$$
(105)
Induction Case. Suppose that \(\mathcal{E}_z\) holds for all z up to \(z'\), and consider \(z = z' + 1\). Once again, we use a nested induction on k. For the base case \(k = 1\), we have
$$\begin{aligned} \begin{aligned}&\bar{R}^{(z'M+2)} = p^{z'M+1} ( b \bar{R}^{(1)} ) + \sum _{\ell =1}^{z'M+1} p^{z'M+1-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell -M)_{++} \le q \le \ell } (\bar{R}^{(q)})^{\eta _j} \right) \\&\quad \ge p^{z'M+1} ( b R^{(1)} ) + \sum _{\ell =1}^{z'M+1} p^{z'M+1-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell -M)_{++} \le q \le \ell } (R^{(q)})^{\eta _j} \right. \\&\qquad \left. - \left( \bar{f} - \max _{\ell \le q \le z'M + 1} f(R^{(q)}) \right) D^{(\ell )} \right) = R^{(z'M+2)} \;, \end{aligned} \end{aligned}$$
where we used \(\bar{f} \ge f( b R^{(1)} ) \ge f ( \bar{R}^{(q)} ) \ge f( R^{(q)} )\) for all q up to \(q = z'M+1\), which follows from the induction hypothesis. Moreover, the upper bound in \(\mathcal{E}_{z'+1}\) holds for \(k = 1\), since
$$\begin{aligned} \begin{aligned} \bar{R}^{(z'M+2)}&= p \bar{R}^{(z'M+1)} + \sum _{j=1}^J s_j \max _{ (z'M+1-M)_{++} \le q \le z'M+1 } ( \bar{R}^{(q)} )^{\eta _j} \\&\quad \le \delta ^{z'} (b \bar{R}^{(1)}) \left( p + \sum _{j=1}^J s_j (\delta ^{z'})^{\eta _j-1} (b \bar{R}^{(1)})^{\eta _j-1} \right) \le \delta ^{z'+1} ( b \bar{R}^{(1)} ) \;. \end{aligned} \end{aligned}$$
(106)
Now suppose that the statements in \(\mathcal{E}_{z'+1}\) hold up to \(k=k' - 1\). For \(k = k'\), we have
$$\begin{aligned} \begin{aligned} \bar{R}^{( z'M + k' + 1 )}&\ge p^{z'M+k'} ( b R^{(1)} ) + \sum _{\ell =1}^{z'M+k'} p^{z'M+k'-\ell } \left( \sum _{j=1}^J s_j \max _{ (\ell -M)_{++} \le q \le \ell } (R^{(q)})^{\eta _j} \right. \\&\quad \left. - \left( \bar{f} - \max _{\ell \le q \le z'M + k'} f(R^{(q)}) \right) D^{(\ell )} \right) = R^{(z'M + k' + 1)} \;,\\ \bar{R}^{(z'M+k'+1)}&\le \delta ^{z'} (b \bar{R}^{(1)}) \left( \delta p + \sum _{j=1}^J s_j (\delta ^{z'})^{\eta _j-1} (b \bar{R}^{(1)})^{\eta _j-1} \right) \le \delta ^{z'+1} ( b \bar{R}^{(1)} ) \;. \end{aligned} \end{aligned}$$
(107)
This completes the induction step, so the event \(\mathcal{E}_z\) holds for all \(z \ge 1\).
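The envelope in statement (ii) can be illustrated numerically. The sketch below generates \(\bar{R}^{(k)}\) from (101) through its one-step form (102). It then checks that every index \(n = (z-1)M + k + 1\), with \(1 \le k \le M\), satisfies \(\bar{R}^{(n)} \le \delta ^z (b \bar{R}^{(1)})\). The parameter values, the helper `rbar_sequence`, and the reading \((x)_{++} = \max\{x, 1\}\) are all assumptions. The sketch also makes the hypothetical choice \(\delta = p + \sum _j s_j (b \bar{R}^{(1)})^{\eta _j - 1}\), which keeps the bracketed factors in (104) and (106) at most \(\delta\).

```python
import math

def rbar_sequence(R1, p, b, s, eta, M, n):
    """List Rb with Rb[k] = Rbar^(k) for k = 1..n (Rb[0] unused), built from (101)
    through its one-step form (102)."""
    Rb = [None, R1, p * b * R1 + sum(sj * R1 ** ej for sj, ej in zip(s, eta))]
    for k in range(2, n):
        window = range(max(k - M, 1), k + 1)  # assumption: (x)_{++} = max(x, 1)
        Rb.append(p * Rb[k] + sum(
            sj * max(Rb[q] ** ej for q in window) for sj, ej in zip(s, eta)))
    return Rb

# Hypothetical parameters and a hypothetical choice of delta.
R1, p, b, M = 1.0, 0.9, 2.0, 3
s, eta = [0.01, 0.01], [1.5, 2.0]
delta = p + sum(sj * (b * R1) ** (ej - 1) for sj, ej in zip(s, eta))
Rb = rbar_sequence(R1, p, b, s, eta, M, n=200)
# Index i = (z-1)M + k + 1 with 1 <= k <= M lies in block z = ceil((i-1)/M).
envelope_ok = all(Rb[i] <= delta ** math.ceil((i - 1) / M) * b * R1
                  for i in range(2, 201))
```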
Proving statement (iii). We derive (iii) from statement (ii). By (102), for \(k \ge 2\),
$$\begin{aligned} \begin{aligned} \frac{ \bar{R}^{(k+1)} }{ \bar{R}^{(k)} }&= p + \frac{1}{ \bar{R}^{(k)} } \sum _{j=1}^J s_j \max _{ (k-M)_{++} \le q \le k} (\bar{R}^{(q)} )^{\eta _j} \;. \end{aligned} \end{aligned}$$
(108)
For any \(q \in [(k-M)_{++}, k]\), we have
$$\begin{aligned} \frac{ (\bar{R}^{(q)})^{\eta _j} }{\bar{R}^{(k)}} = \frac{ \bar{R}^{(q)} }{ \bar{R}^{(k)} } (\bar{R}^{(q)})^{\eta _j - 1} \le \frac{ \bar{R}^{(q)} }{ \bar{R}^{(k)} } \left( \delta ^{\lceil (q-1) / M \rceil } ( b R^{(1)} ) \right) ^{\eta _j - 1} \;. \end{aligned}$$
(109)
Since \(\eta _j > 1\), \(\delta < 1\), and \(k - M \le q \le k\), we have \(\delta ^{\lceil (q-1) / M \rceil ( \eta _j - 1 )} \rightarrow 0\) as \(k \rightarrow \infty\). Moreover, since \(\bar{R}^{(k+1)} / \bar{R}^{(k)} \ge p\) for all \(k \ge 1\), we have \(\bar{R}^{(q)} / \bar{R}^{(k)} \le p^{-(k-q)} \le p^{-M}\) for all such q. Therefore,
$$\begin{aligned} \lim _{ k \rightarrow \infty } \frac{ \max _{ (k-M)_{++} \le q \le k} (\bar{R}^{(q)} )^{\eta _j} }{ \bar{R}^{(k)} } = 0,~\forall ~j \Longrightarrow \lim _{ k \rightarrow \infty } \frac{ \bar{R}^{(k+1)} }{ \bar{R}^{(k)} } = p \;. \end{aligned}$$
(110)
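Statement (iii) can be illustrated in the same way. With hypothetical parameters and the assumed reading \((x)_{++} = \max\{x, 1\}\), the sketch below computes the ratios \(\bar{R}^{(k+1)} / \bar{R}^{(k)}\). As (108) and (110) predict, they stay at or above \(p\) and approach \(p\) as \(k\) grows. The helper `rbar_sequence` is illustrative only.

```python
def rbar_sequence(R1, p, b, s, eta, M, n):
    """List Rb with Rb[k] = Rbar^(k) for k = 1..n (Rb[0] unused), built from (101)
    through its one-step form (102)."""
    Rb = [None, R1, p * b * R1 + sum(sj * R1 ** ej for sj, ej in zip(s, eta))]
    for k in range(2, n):
        window = range(max(k - M, 1), k + 1)  # assumption: (x)_{++} = max(x, 1)
        Rb.append(p * Rb[k] + sum(
            sj * max(Rb[q] ** ej for q in window) for sj, ej in zip(s, eta)))
    return Rb

# Hypothetical parameters chosen only for illustration.
R1, p, b, M = 1.0, 0.9, 2.0, 3
s, eta = [0.01, 0.01], [1.5, 2.0]
Rb = rbar_sequence(R1, p, b, s, eta, M, n=400)
ratios = [Rb[k + 1] / Rb[k] for k in range(1, 400)]  # Rbar^(k+1) / Rbar^(k)
```

The excess of each ratio over \(p\) equals the last term in (108). Because every \(\eta _j > 1\), that term shrinks together with \(\bar{R}^{(k)}\).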