Appendix A: More Experiments
A.1 Results with Support Vector Machine (SVM)
This subsection compares SAAGs against SVRG and VR-SGD on the SVM problem using the mushroom and gisette datasets. All methods use a stochastic backtracking line search to find the step size. Figure 8 presents the results, comparing suboptimality against training time (in seconds). The results are similar to those with logistic regression but are less smooth. SAAGs outperform the other methods on the mushroom dataset (first row) and the gisette dataset (second row) in suboptimality against training time, and on gisette in accuracy against training time; on the mushroom dataset, all methods give nearly identical accuracy against training time. SAAG-IV outperforms the other methods, while SAAG-III sometimes lags behind VR-SGD. The results with logistic regression are also better than those with the SVM problem. The optimization problem for SVM is given below:
$$ \underset{w}{\min} F(w) = \frac{1}{n} \sum\limits_{i = 1}^{n} \max\left( 0, 1 - y_{i} w^{T} x_{i} \right)^{2} + \frac{\lambda}{2} \|w\|^{2}, $$
(13)
where λ is the regularization coefficient (also called the penalty parameter), which balances the trade-off between margin size and error [7].
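As an illustration of objective (13), the following minimal sketch evaluates the squared-hinge SVM loss and its gradient with NumPy. The function names, the synthetic data, and the value of `lam` are assumptions made for this example only; they are not the experimental setup used in the paper.

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """Eq. (13): mean of max(0, 1 - y_i w^T x_i)^2 plus (lam / 2) * ||w||^2."""
    margins = np.maximum(0.0, 1.0 - y * (X @ w))
    return np.mean(margins ** 2) + 0.5 * lam * np.dot(w, w)

def svm_gradient(w, X, y, lam):
    """Gradient of Eq. (13); the squared hinge is differentiable everywhere."""
    margins = np.maximum(0.0, 1.0 - y * (X @ w))
    return -2.0 * (X.T @ (margins * y)) / X.shape[0] + lam * w

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = np.where(rng.normal(size=50) > 0, 1.0, -1.0)
w = rng.normal(size=5)
lam = 1e-3

print(svm_objective(w, X, y, lam), np.linalg.norm(svm_gradient(w, X, y, lam)))
```

Because the hinge is squared, the loss term is smooth, so a gradient-based solver can be applied to it directly.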
A.2 Comparison of SAAGs (I, II, III and IV) for non-smooth problem
The comparison of SAAGs for the non-smooth problem is depicted in Fig. 9, using the Adult dataset with mini-batches of 32 data points. As the figure shows, just as for the smooth problem, the results with SAAG-III and IV are stable and better than or equal to those with SAAG-I and II.
A.3 Effect of mini-batch size on SAAG-III, IV, SVRG and VR-SGD for non-smooth problem
The effect of mini-batch size on SAAG-III, IV, SVRG and VR-SGD for the non-smooth problem is depicted in Fig. 10, using the rcv1 binary dataset with mini-batches of 32, 64 and 128 data points. As with the smooth problem, the proposed methods outperform SVRG and VR-SGD. SAAG-IV gives the best results in terms of time and epochs, whereas SAAG-III gives the best results in terms of gradients/n.
A.4 Effect of mini-batch size on SAAGs (I, II, III, IV) for smooth problem
The effect of mini-batch size on SAAGs (I, II, III, IV) for the smooth problem is depicted in Fig. 11, using the Adult dataset with mini-batch sizes of 32, 64 and 128 data points. The results are similar to those for the non-smooth problem.
A.5 Effect of regularization coefficient for non-smooth problem
Figure 12 depicts the effect of the regularization coefficient on SAAG-III, IV, SVRG and VR-SGD for the non-smooth problem using the rcv1 dataset, with coefficient values \(10^{-3}\), \(10^{-5}\) and \(10^{-7}\). The results are similar to those for the smooth problem. As the figure shows, with the larger value \(10^{-3}\) none of the methods performs well, but once the coefficient is sufficiently small its exact value makes little difference; in all cases the proposed methods outperform SVRG and VR-SGD.
Appendix B: Proofs
The following assumptions are considered in the paper:
Assumption 1 (Smoothness)
Suppose function \(f_{i}: \mathbb {R}^{n} \rightarrow \mathbb {R}\) is convex and differentiable, and that the gradient \(\nabla f_{i}\), \(\forall i\), is L-Lipschitz-continuous, where \(L > 0\) is the Lipschitz constant. Then we have,
$$ \| \nabla f_{i}(y) - \nabla f_{i}(x)\| \le L \|y-x\|, $$
(14)
$$ \begin{array}{ll} \text{and},\quad f_{i}(y) \le f_{i}(x) +\nabla f_{i}(x)^{T}(y-x)+ \frac{L}{2} \|y-x\|^{2}. \end{array} $$
(15)
Assumption 2 (Strong Convexity)
Suppose function \(F: \mathbb {R}^{n} \rightarrow \mathbb {R}\) is a μ-strongly convex function for \(\mu > 0\) and \(F^{*}\) is the optimal value of F. Then we have,
$$ F(y) \ge F(x) +\nabla F(x)^{T}(y-x) + \frac{\mu}{2} \|y-x\|^{2}, $$
(16)
$$ \begin{array}{ll} \text{and},\quad & F(x) - F^{*} \le \frac{1}{2\mu}\|\nabla F(x)\|^{2} \end{array} $$
(17)
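Inequality (17) can be checked numerically on a simple example. The sketch below uses a strongly convex quadratic, for which μ is the smallest eigenvalue of the Hessian. The quadratic, its dimension, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M.T @ M + 0.5 * np.eye(5)      # positive-definite Hessian (hypothetical example)
b_vec = rng.normal(size=5)
mu = np.linalg.eigvalsh(A).min()   # strong-convexity constant of F

F = lambda x: 0.5 * x @ A @ x - b_vec @ x
grad_F = lambda x: A @ x - b_vec
x_star = np.linalg.solve(A, b_vec)  # minimizer, where grad_F vanishes

def pl_gap(x):
    """Right-hand side minus left-hand side of (17); non-negative if (17) holds."""
    return np.sum(grad_F(x) ** 2) / (2.0 * mu) - (F(x) - F(x_star))

gaps = [pl_gap(rng.normal(size=5) * 3.0) for _ in range(1000)]
print(min(gaps) >= -1e-9)
```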
Assumption 3 (Assumption 3 in [12])
For all \(s = 1, 2, \ldots, S\), the following inequality holds
$$ \mathbb{E}\left[F({w^{s}_{0}}) - F(w^{*}) \right] \le c\mathbb{E}\left[F(\tilde{w}^{s-1}) - F(w^{*}) \right] $$
(18)
where \(0 < c \ll m\) is a constant.
We derive our proofs by taking motivation from [12] and [27]. Before providing the proofs, we provide certain lemmas, as given below:
Lemma 1 (3-Point Property [17])
Let \(\hat {z}\) be the optimal solution of the following problem: \(\underset {z\in \mathbb {R}^{d}}{\min }\quad \frac {\tau }{2} \|z-z_{0}\|^{2} + r(z)\), where \(\tau \ge 0\) and \(r(z)\) is a convex function (but possibly non-differentiable). Then for any \(z\in \mathbb {R}^{d}\), the following inequality holds,
$$ \frac{\tau}{2} \|\hat{z}-z_{0}\|^{2} + r(\hat{z}) \le r(z) + \frac{\tau}{2} \left( \|z-z_{0}\|^{2} - \|z - \hat{z}\|^{2} \right) $$
(19)
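The 3-point property (19) can be illustrated with the choice \(r(z) = \lambda \|z\|_{1}\), for which the minimizer \(\hat{z}\) is given by soft-thresholding of \(z_{0}\) with threshold \(\lambda/\tau\) (for \(\tau > 0\)). The sketch below is a numerical sanity check under that assumed choice of r; the helper names and parameter values are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 applied to v."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def three_point_gap(z, z0, tau, lam):
    """Right-hand side minus left-hand side of (19) for r(z) = lam * ||z||_1."""
    z_hat = soft_threshold(z0, lam / tau)
    r = lambda u: lam * np.sum(np.abs(u))
    lhs = 0.5 * tau * np.sum((z_hat - z0) ** 2) + r(z_hat)
    rhs = r(z) + 0.5 * tau * (np.sum((z - z0) ** 2) - np.sum((z - z_hat) ** 2))
    return rhs - lhs

rng = np.random.default_rng(0)
z0 = rng.normal(size=10)
gaps = [three_point_gap(rng.normal(size=10) * 3.0, z0, tau=2.0, lam=0.5)
        for _ in range(1000)]
print(min(gaps) >= -1e-10)
```

The gap vanishes at \(z = \hat{z}\) and is non-negative elsewhere, which is how (19) is used in the proofs of Theorems 1 and 2 with \(z = w^{*}\).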
Lemma 2 (Theorem 4 in [16])
For non-smooth problems, taking \(\tilde {\nabla }^{\prime }_{s,k} = \frac {1}{b} {\sum }_{i \in B_{k}} \nabla f_{i} ({w^{s}_{k}}) - \frac {1}{b} {\sum }_{i \in B_{k}} \nabla f_{i} (\tilde {w}^{s-1}) + \frac {1}{n} {\sum }_{i = 1}^{n} \nabla f_{i}(\tilde {w}^{s-1})\), we have \(\mathbb {E} \left [\tilde {\nabla }^{\prime }_{s,k}\right ] = \nabla f({w^{s}_{k}})\) and the variance satisfies the following inequality,
$$ \begin{array}{@{}rcl@{}} \mathbb{E} \left[\|\tilde{\nabla}^{\prime}_{s,k} - \nabla f({w^{s}_{k}})\|^{2}\right] &\le& 4L\alpha(b) \left[ F({w^{s}_{k}}) - F(w^{*})\right. \\ &&\left.+ F(\tilde{w}^{s-1}) - F(w^{*}) \right], \end{array} $$
(20)
where \(\alpha(b) = (n-b)/(b(n-1))\).
Following Lemma 2 for non-smooth problems, one can easily prove the following result for smooth problems,
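The factor \(\alpha(b)\) coincides with the standard finite-population correction for the variance of a mini-batch mean drawn uniformly without replacement. The sketch below checks that identity exactly by enumerating all mini-batches of a small synthetic set of vectors. It is an illustration under that sampling assumption, not the argument of [16], and all names and sizes are hypothetical.

```python
import itertools
import numpy as np

def alpha(n, b):
    """alpha(b) = (n - b) / (b * (n - 1)), as defined after (20)."""
    return (n - b) / (b * (n - 1))

def minibatch_mean_variance(Z, b):
    """Exact E||mean_{i in B} z_i - mean(z)||^2 over all size-b subsets B."""
    n = Z.shape[0]
    z_bar = Z.mean(axis=0)
    sq = [np.sum((Z[list(B)].mean(axis=0) - z_bar) ** 2)
          for B in itertools.combinations(range(n), b)]
    return float(np.mean(sq))

rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 3))  # 8 hypothetical per-sample vectors
pop_var = float(np.mean(np.sum((Z - Z.mean(axis=0)) ** 2, axis=1)))

for b in (1, 2, 4, 8):
    print(b, minibatch_mean_variance(Z, b), alpha(8, b) * pop_var)
```

The two printed columns agree for every b: \(\alpha(1) = 1\) recovers the single-sample variance, and \(\alpha(n) = 0\) reflects that a full batch has no sampling variance.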
Lemma 3
For smooth problems, taking \(\tilde {\nabla }^{\prime }_{s,k} = \frac {1}{b} {\sum }_{i \in B_{k}} \nabla f_{i} ({w^{s}_{k}}) - \frac {1}{b} {\sum }_{i \in B_{k}} \nabla f_{i} (\tilde {w}^{s-1}) + \frac {1}{n} {\sum }_{i = 1}^{n} \nabla f_{i}(\tilde {w}^{s-1})\), we have \(\mathbb {E} \left [\tilde {\nabla }^{\prime }_{s,k}\right ] = \nabla f({w^{s}_{k}})\) and the variance satisfies the following inequality,
$$ \begin{array}{@{}rcl@{}} \mathbb{E} \left[\|\tilde{\nabla}^{\prime}_{s,k} - \nabla f({w^{s}_{k}})\|^{2}\right] &\le& 4L\alpha(b) \left[ f({w^{s}_{k}}) - f^{*} \right.\\ &&\left.+ f(\tilde{w}^{s-1}) - f^{*}\right], \end{array} $$
(21)
where \(\alpha(b) = (n-b)/(b(n-1))\).
Lemma 4 (Extension of Lemma 3.4 in [27] to mini-batches)
Under Assumption 1 for smooth regularizer, we have
$$ \mathbb{E} \left[\|\nabla_{B_{k}} f({w^{s}_{k}}) - \nabla_{B_{k}} f(w^{*})\|^{2}\right] \le 2L \left[ f({w^{s}_{k}}) - f(w^{*}) \right] $$
(22)
Proof
Given any k = 0, 1,..., (m − 1), consider the function,
$$ \phi_{B_{k}} (w) = f_{B_{k}} (w) - f_{B_{k}}(w^{*}) - \nabla_{B_{k}} f(w^{*})^{T} (w-w^{*}) $$
It is straightforward to check that \(\nabla \phi _{B_{k}} (w^{*}) = 0\), hence \(\min _{w} \phi _{B_{k}} (w) = \phi _{B_{k}} (w^{*}) = 0\). Since \(\phi _{B_{k}}\) is convex and its gradient \(\nabla \phi _{B_{k}}\) is L-Lipschitz continuous, we have,
$$ \begin{array}{@{}rcl@{}} \frac{1}{2L}\|\nabla\phi_{B_{k}} (w)\|^{2} &\le& \phi_{B_{k}} (w) - \min_{w} \phi_{B_{k}} (w)\\ &=& \phi_{B_{k}} (w) - \phi_{B_{k}} (w^{*}) = \phi_{B_{k}} (w)\\ \implies \| \nabla f_{B_{k}} (w) - \nabla f_{B_{k}}(w^{*})\|^{2} &\le& 2L \left[ f_{B_{k}} (w) - f_{B_{k}}(w^{*}) - \nabla_{B_{k}} f(w^{*})^{T} (w - w^{*}) \right] \end{array} $$
Taking expectation, we have
$$ \begin{array}{@{}rcl@{}} \mathbb{E}[\| \nabla f_{B_{k}} (w) - \nabla f_{B_{k}}(w^{*})\|^{2} ] &\le& 2L \left[ f (w) - f(w^{*})\right.\\ && \left.- \nabla f(w^{*})^{T} (w-w^{*}) \right]\\ \end{array} $$
(23)
By optimality, \(\nabla f(w^{*}) = 0\), so we have
$$ \begin{array}{ll} \mathbb{E}[\| \nabla f_{B_{k}} (w) - \nabla f_{B_{k}}(w^{*})\|^{2} ] \le 2L \left[ f (w) - f(w^{*}) \right] \end{array} $$
This proves the required lemma. □
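The per-mini-batch inequality used in the proof above can be checked numerically. The sketch below does so for a least-squares mini-batch loss, taking L as the largest eigenvalue of the mini-batch Hessian. The loss, the data, and the names are assumptions for illustration; the inequality is evaluated at arbitrary pairs of points, since for a convex function with L-Lipschitz gradient it does not require the reference point to be a minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
b = 32                                  # mini-batch size (hypothetical)
Xb = rng.normal(size=(b, 6))            # features of one mini-batch B_k
yb = rng.normal(size=b)
L = np.linalg.eigvalsh(Xb.T @ Xb / b).max()  # Lipschitz constant of grad f_B

f_B = lambda w: 0.5 * np.mean((Xb @ w - yb) ** 2)
grad_B = lambda w: Xb.T @ (Xb @ w - yb) / b

def lemma4_gap(w, w_ref):
    """2L * [f_B(w) - f_B(w_ref) - grad_B(w_ref)^T (w - w_ref)] minus
    ||grad_B(w) - grad_B(w_ref)||^2; non-negative if the inequality holds."""
    lhs = np.sum((grad_B(w) - grad_B(w_ref)) ** 2)
    rhs = 2.0 * L * (f_B(w) - f_B(w_ref) - grad_B(w_ref) @ (w - w_ref))
    return rhs - lhs

gaps = [lemma4_gap(rng.normal(size=6), rng.normal(size=6)) for _ in range(1000)]
print(min(gaps) >= -1e-9)
```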
Lemma 5 (Extension of Lemma 3.4 in [27] to mini-batches)
Under Assumption 1 for non-smooth regularizer, we have
$$ \mathbb{E} \left[\|\nabla_{B_{k}}f({w^{s}_{k}}) - \nabla_{B_{k}} f(w^{*})\|^{2}\right] \le 2L \left[ F({w^{s}_{k}}) - F(w^{*}) \right] $$
(24)
Proof
From inequality (23), we have
$$ \begin{array}{@{}rcl@{}} \mathbb{E}[\| \nabla f_{B_{k}} (w) - \nabla f_{B_{k}}(w^{*})\|^{2} ] &\le& 2L \left[ f (w) - f(w^{*}) \right.\\ &&\left.- \nabla f(w^{*})^{T} (w-w^{*}) \right]\\ \end{array} $$
(25)
By optimality, there exists \(\xi \in \partial g(w^{*})\) such that \(\nabla F(w^{*}) = \nabla f(w^{*}) + \xi = 0\). Hence we have
$$ \begin{array}{ll} \mathbb{E}[\| \nabla f_{B_{k}} (w) - \nabla f_{B_{k}}(w^{*})\|^{2} ] &\le 2L \left[ f (w) - f(w^{*}) + \xi^{T} (w-w^{*}) \right]\\ &\le 2L \left[ f (w) - f(w^{*}) + g(w) - g (w^{*}) \right]\\ &\le 2L \left[ F(w) - F(w^{*})\right] \end{array} $$
(26)
where the second inequality follows from the convexity of g. This proves the required lemma. □
Lemma 6 (Variance Bound for smooth problem)
Under Assumption 1 and taking \(\nabla _{B_{k}} f({w^{s}_{k}}) = \frac {1}{b} {\sum }_{i \in B_{k}} \nabla f_{i} ({w^{s}_{k}})\), \( \nabla _{B^{\prime }_{k}} f(\tilde {w}^{s-1}) = \frac {1}{n} {\sum }_{i \in B_{k}} \nabla f_{i} (\tilde {w}^{s-1})\), \(\tilde {\mu }^{s} = \frac {1}{n} {\sum }_{i = 1}^{n} \nabla f_{i}(\tilde {w}^{s-1})\) and the gradient estimator \(\tilde {\nabla }_{s,k} = \nabla _{B_{k}} f({w^{s}_{k}}) - \nabla _{B^{\prime }_{k}} f(\tilde {w}^{s-1}) +\tilde {\mu }^{s} \), the variance satisfies the following inequality (see Footnote 3),
$$ \begin{array}{@{}rcl@{}} \mathbb{E} \left[\| \tilde{\nabla}_{s,k} -\nabla f({w^{s}_{k}})\|^{2} \right] &\le& 8L\alpha(b) \left[ f({w^{s}_{k}}) - f^{*} \right]\\&&+ \frac{8L\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}}\\ &&\times\left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime}\\ \end{array} $$
(27)
where α(b) = (n − b)/(b(n − 1)) and \(R^{\prime }\) is a constant.
Proof
First, the expectation of the estimator is given by
$$ \begin{array}{ll} \mathbb{E}\left[\tilde{\nabla}_{s,k} \right] &= \mathbb{E}\left[\nabla_{B_{k}} f({w^{s}_{k}}) - \nabla_{B^{\prime}_{k}} f(\tilde{w}^{s-1}) +\tilde{\mu}^{s} \right]\\ &= \nabla f({w^{s}_{k}}) - \frac{b}{n} \nabla f(\tilde{w}^{s-1}) + \nabla f(\tilde{w}^{s-1})\\ & = \nabla f({w^{s}_{k}}) + \frac{m-1}{m} \nabla f(\tilde{w}^{s-1}), \end{array} $$
(28)
where the last equality follows since \(n = mb\). Now the variance bound is calculated as follows,
$$ \begin{array}{@{}rcl@{}} &&\mathbb{E} \left[ \| \tilde{\nabla}_{s,k} - \nabla f({w^{s}_{k}}) \|^{2}\right]\\ &=& \mathbb{E}\left[\| \nabla_{B_{k}} f({w^{s}_{k}}) - \nabla_{B^{\prime}_{k}} f(\tilde{w}^{s-1}) +\nabla f (\tilde{w}^{s-1}) - \nabla f ({w^{s}_{k}})\|^{2}\right]\\ &=& \mathbb{E}\left[\| \nabla_{B_{k}} f({w^{s}_{k}}) - \nabla_{B_{k}} f(\tilde{w}^{s-1}) +\nabla f (\tilde{w}^{s-1}) - \nabla f ({w^{s}_{k}}) + \frac{m-1}{m}\nabla_{B_{k}} f(\tilde{w}^{s-1}) \|^{2}\right]\\ &\le& 2 \mathbb{E} \left[ \| \nabla_{B_{k}} f({w^{s}_{k}}) - \nabla_{B_{k}} f(\tilde{w}^{s-1}) +\nabla f (\tilde{w}^{s-1}) - \nabla f ({w^{s}_{k}})\|^{2}\right] + \frac{2(m-1)^{2}}{m^{2}} \mathbb{E} \left[ \| \nabla_{B_{k}} f(\tilde{w}^{s-1}) \|^{2} \right]\\ &\le& 8L\alpha(b) \left[ f({w^{s}_{k}}) - f^{*} + f(\tilde{w}^{s-1}) - f^{*} \right] + \frac{2(m-1)^{2}}{m^{2}} \mathbb{E} \left[ \| \nabla_{B_{k}} f(\tilde{w}^{s-1}) \|^{2} \right] \end{array} $$
(29)
where the first inequality follows from \(\|a+b\|^{2} \le 2\left (\|a\|^{2} + \|b\|^{2} \right )\) for \(a,b\in \mathbb {R}^{d}\) and the second from applying Lemma 3.
$$ \begin{array}{@{}rcl@{}} \text{Now,} &&\frac{2(m-1)^{2}}{m^{2}} \mathbb{E} \left[ \| \nabla_{B_{k}} f(\tilde{w}^{s-1}) \|^{2} \right]\\ &\le& \frac{2(m-1)^{2}}{m^{2}} \left[2 \mathbb{E} \|\nabla_{B_{k}} f(\tilde{w}^{s-1}) - \nabla_{B_{k}} f(w^{*}) \|^{2}\right.\\&&\qquad\quad\quad\quad\left. + 2 \mathbb{E} \| \nabla_{B_{k}} f(w^{*}) \|^{2} \right]\\ & \le& \frac{8L(m-1)^{2}}{m^{2}}\left[ f(\tilde{w}^{s-1}) - f(w^{*}) \right] + R^{\prime} \end{array} $$
(30)
The first inequality follows from \(\|a+b\|^{2} \le 2\left (\|a\|^{2} + \|b\|^{2} \right )\) for \(a,b\in \mathbb {R}^{d}\); the second inequality follows from Lemma 4 and the assumption \(\mathbb {E}\| \nabla _{B_{k}} f(w^{*}) \|^{2} \le R, \forall k\), taking \(R^{\prime } = \frac {2(m-1)^{2}}{m^{2}} R\). Now, substituting the above inequality in (29), we have
$$ \begin{array}{@{}rcl@{}} &&\mathbb{E} \left[ \| \tilde{\nabla}_{s,k} - \nabla f({w^{s}_{k}}) \|^{2}\right]\\ &\le& 8L\alpha(b) \left[ f({w^{s}_{k}}) - f^{*} + f(\tilde{w}^{s-1}) - f^{*} \right]\\ &&+ \frac{8L(m-1)^{2}}{m^{2}}\left[ f(\tilde{w}^{s-1}) - f(w^{*}) \right] + R^{\prime}\\ &=& 8L\alpha(b) \left[ f({w^{s}_{k}}) - f^{*} \right]\\ &&+ \frac{8L\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}} \left[ \!f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime}, \end{array} $$
(31)
This proves the required lemma. □
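The expectation (28) of the estimator in Lemma 6 can be verified exactly on a small example by averaging over every mini-batch of size b. The sketch below assumes least-squares components \(f_{i}\) and mini-batches drawn uniformly without replacement; the data, sizes, and variable names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, b, d = 6, 2, 3
m = n // b                      # number of mini-batches per epoch, n = m * b
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad_i(w):
    """Per-sample gradients of f_i(w) = 0.5 * (x_i^T w - y_i)^2, one row per i."""
    return (X @ w - y)[:, None] * X

w_k = rng.normal(size=d)        # current iterate w_k^s
w_tilde = rng.normal(size=d)    # snapshot point from the previous epoch
full_k = grad_i(w_k).mean(axis=0)        # grad f(w_k^s)
mu_tilde = grad_i(w_tilde).mean(axis=0)  # full gradient at the snapshot

estimates = []
for B in itertools.combinations(range(n), b):
    idx = list(B)
    g_B = grad_i(w_k)[idx].sum(axis=0) / b            # (1/b) * sum over B_k
    g_B_prime = grad_i(w_tilde)[idx].sum(axis=0) / n  # (1/n) * sum over B_k
    estimates.append(g_B - g_B_prime + mu_tilde)

mean_estimate = np.mean(estimates, axis=0)
expected = full_k + (m - 1) / m * mu_tilde  # right-hand side of (28)
print(np.allclose(mean_estimate, expected))
```

The average matches \(\nabla f({w^{s}_{k}}) + \frac{m-1}{m} \nabla f(\tilde{w}^{s-1})\), so the estimator is biased by the second term, unlike the estimator of Lemmas 2 and 3.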
Lemma 7 (Variance Bound for non-smooth problem)
Under Assumption 1 and taking notations as in Lemma 6, the variance bound satisfies the following inequality,
$$ \begin{array}{@{}rcl@{}} \mathbb{E} \left[\| \tilde{\nabla}_{s,k} -\nabla f({w^{s}_{k}})\|^{2} \right] &\le& 8L\alpha(b) \left[ F({w^{s}_{k}}) - F^{*} \right]\\ &&+ \frac{8L\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}}\\ &&\times\left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime}, \end{array} $$
(32)
where \(\alpha(b) = (n-b)/(b(n-1))\) and \(R^{\prime }\) is a constant.
Proof
$$ \begin{array}{@{}rcl@{}} &&\mathbb{E} \left[ \| \tilde{\nabla}_{s,k} - \nabla f({w^{s}_{k}}) \|^{2}\right]\\ &=& \mathbb{E}\left[\| \nabla_{B_{k}} f({w^{s}_{k}}) - \nabla_{B^{\prime}_{k}} f(\tilde{w}^{s-1}) +\nabla f (\tilde{w}^{s-1}) - \nabla f ({w^{s}_{k}})\|^{2}\right]\\ &=& \mathbb{E}\left[\| \nabla_{B_{k}} f({w^{s}_{k}}) - \nabla_{B_{k}} f(\tilde{w}^{s-1}) +\nabla f (\tilde{w}^{s-1}) - \nabla f ({w^{s}_{k}}) + \frac{m-1}{m}\nabla_{B_{k}} f(\tilde{w}^{s-1}) \|^{2}\right]\\ &\le& 2 \mathbb{E} \left[ \| \nabla_{B_{k}} f({w^{s}_{k}}) - \nabla_{B_{k}} f(\tilde{w}^{s-1}) +\nabla f (\tilde{w}^{s-1}) - \nabla f ({w^{s}_{k}})\|^{2}\right] + \frac{2(m-1)^{2}}{m^{2}} \mathbb{E} \left[ \| \nabla_{B_{k}} f(\tilde{w}^{s-1}) \|^{2} \right]\\ &\le& 8L\alpha(b) \left[ F({w^{s}_{k}}) - F({w^{*}}) + F(\tilde{w}^{s-1}) - F({w^{*}}) \right] + \frac{2(m-1)^{2}}{m^{2}} \mathbb{E} \left[ \| \nabla_{B_{k}} f(\tilde{w}^{s-1}) \|^{2} \right] \end{array} $$
(33)
where the first inequality follows from \(\|a+b\|^{2} \le 2\left (\|a\|^{2} + \|b\|^{2} \right )\) for \(a,b\in \mathbb {R}^{d}\) and the second from applying Lemma 2.
$$ \begin{array}{@{}rcl@{}} \text{Now,} &&\frac{2(m-1)^{2}}{m^{2}} \mathbb{E} \left[ \| \nabla_{B_{k}} f(\tilde{w}^{s-1}) \|^{2} \right]\\ &\le& \frac{2(m-1)^{2}}{m^{2}} \left[2 \mathbb{E} \|\nabla_{B_{k}} f(\tilde{w}^{s-1}) - \nabla_{B_{k}} f(w^{*}) \|^{2}\right.\\ &&\qquad\quad\quad~\left.+ 2 \mathbb{E} \| \nabla_{B_{k}} f(w^{*}) \|^{2} \right]\\ & \le& \frac{8L(m-1)^{2}}{m^{2}}\left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime} \end{array} $$
(34)
The first inequality follows from \(\|a+b\|^{2} \le 2\left (\|a\|^{2} + \|b\|^{2} \right )\) for \(a,b\in \mathbb {R}^{d}\); the second inequality follows from Lemma 5 and the assumption \(\| \nabla _{B_{k}} f(w^{*}) \|^{2} \le R, \forall k\), taking \(R^{\prime } = \frac {2(m-1)^{2}}{m^{2}} R\). Now, substituting the above inequality in (33), we have
$$ \begin{array}{@{}rcl@{}} &&\mathbb{E} \left[ \| \tilde{\nabla}_{s,k} - \nabla f({w^{s}_{k}}) \|^{2}\right]\\ &\le& 8L\alpha(b) \left[ F({w^{s}_{k}}) - F^{*} + F(\tilde{w}^{s-1}) - F^{*} \right]\\ &&+ \frac{8L(m-1)^{2}}{m^{2}}\left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime}\\ &=& 8L\alpha(b) \left[ F({w^{s}_{k}}) - F^{*} \right] + \frac{8L\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}}\\ &&\times\left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime}, \end{array} $$
(35)
This proves the required lemma. □
Proof of Theorem 1 (Non-strongly convex and smooth problem with SAAG-IV)
Proof
By smoothness, we have,
$$ \begin{array}{@{}rcl@{}} f(w^{s}_{k + 1}) &\le& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ &=& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ &&- \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ &=& f({w^{s}_{k}}) + <\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ && + <\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}>\\&&- \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}, \end{array} $$
(36)
where β is an appropriately chosen positive value. Now, separately simplifying the terms, we have
$$ \begin{array}{@{}rcl@{}} &&\mathbb{E}\left[ f({w^{s}_{k}}) + <\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ &=& f({w^{s}_{k}}) + \mathbb{E}\left[<\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}>\right] + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ &=& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}) + \frac{m-1}{m}\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ &=& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} + \frac{m-1}{m}<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ &\le& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{*} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{*} - {w^{s}_{k}} \|^{2} - \frac{L\beta}{2} \| w^{*} - w^{s}_{k + 1}\|^{2}\\ &&+ \frac{m-1}{m}<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ & =& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{*} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{*} - {w^{s}_{k}} \|^{2} - \frac{L\beta}{2} \| w^{*} - w^{s}_{k + 1}\|^{2}\\ &&+ \frac{m-1}{m}\left[<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - w^{*}>- <\nabla f(\tilde{w}^{s-1}), {w^{s}_{k}} - w^{*}>\right]\\ &\le& f(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right]\\ &&+ \frac{m-1}{m}\left[\frac{1}{2\delta}\|\nabla f(\tilde{w}^{s-1})\|^{2} + \frac{\delta}{2} \|w^{s}_{k + 1} - w^{*}\|^{2} - \left[ \frac{1}{2\delta}\|\nabla f(\tilde{w}^{s-1})\|^{2} +\frac{\delta}{2} \|{w^{s}_{k}} - w^{*}\|^{2}\right]\right]\\ &=& f(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{\delta(m-1)}{2m}\left[\|w^{s}_{k + 1} - w^{*}\|^{2} - \|{w^{s}_{k}} - w^{*}\|^{2}\right],\\ &=& f(w^{*}) + \left( \frac{L\beta}{2} - \frac{\delta(m-1)}{2m}\right) \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2} \right],\\ &=& f(w^{*}), 
\end{array} $$
(37)
The second equality follows from \(\mathbb {E}\left [\tilde {\nabla }_{s,k}\right ] = \nabla f({w^{s}_{k}}) + \frac {m-1}{m}\nabla f(\tilde {w}^{s-1})\), the first inequality follows from Lemma 1, the second inequality follows from convexity, i.e., \( f(w^{*}) \ge f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{*} - {w^{s}_{k}}>\), and Young’s inequality, i.e., \(x^{T}y \le \frac{1}{2\delta}\|x\|^{2} + \frac{\delta}{2}\|y\|^{2}\) for \(\delta > 0\), and the last equality follows by choosing \(\delta = \frac {mL\beta }{(m-1)}\).
$$ \begin{array}{@{}rcl@{}} \text{and}, &&\mathbb{E} \left[ <\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}> - \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ &\le& \mathbb{E} \left[ \frac{1}{2L(\beta-1)} \|\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}\|^{2} + \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} - \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ &=& \frac{1}{2L(\beta-1)} \mathbb{E} \left[\| \tilde{\nabla}_{s,k} - \nabla f({w^{s}_{k}})\|^{2} \right]\\ &\le& \frac{1}{2L(\beta-1)} \left[8L\alpha(b) \left[ f({w^{s}_{k}}) - f^{*} \right] + \frac{8L\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime}\right]\\ &=& \frac{4\alpha(b)}{(\beta-1)} \left[ f({w^{s}_{k}}) - f^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime} \end{array} $$
(38)
The first inequality follows from Young’s inequality and the second from Lemma 6, with \(R^{\prime \prime } = R^{\prime }/(2L(\beta -1))\). Now, substituting inequalities (37) and (38) into (36) and taking expectation w.r.t. mini-batches, we have
$$ \begin{array}{@{}rcl@{}} \mathbb{E} \left[ f(w^{s}_{k + 1})\right] &\le& f(w^{*}) + \frac{4\alpha(b)}{(\beta-1)} \left[ f({w^{s}_{k}}) - f^{*} \right]\\ &&+ \frac{4\left( \alpha(b)m^{2}+(m - 1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime}\\ \mathbb{E} \left[ f(w^{s}_{k + 1}) - f(w^{*})\right] &\le& \frac{4\alpha(b)}{(\beta-1)} \left[ f({w^{s}_{k}}) - f^{*} \right]\\ &&+ \frac{4\left( \alpha(b)m^{2} + (m - 1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime} \end{array} $$
(39)
Taking sum over k = 0, 1,..., (m − 1) and dividing by m, we have
$$ \begin{array}{@{}rcl@{}} &&\frac{1}{m}{\sum}_{k = 0}^{m-1}\mathbb{E} \left[ f(w^{s}_{k + 1}) - f^{*}\right]\\ &\le& \frac{1}{m}{\sum}_{k = 0}^{m-1} \left[ \frac{4\alpha(b)}{(\beta-1)} \left[ f({w^{s}_{k}}) - f^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime}\right]\\ &&\frac{1}{m}{\sum}_{k = 1}^{m}\mathbb{E} \left[ f({w^{s}_{k}}) - f^{*}\right]\\ &\le& \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m}{\sum}_{k = 1}^{m} \left[ f({w^{s}_{k}}) - f^{*} \right] + \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m} \left[ f({w^{s}_{0}}) - f^{*} - \lbrace f({w^{s}_{m}}) - f^{*} \rbrace\right]\\ &&+ \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f (\tilde{w}^{s-1}) - f^{*} \right]+ R^{\prime\prime} \end{array} $$
(40)
Subtracting \(\frac {4\alpha (b)}{(\beta -1)} \frac {1}{m}{\sum }_{k = 1}^{m} \left [ f({w^{s}_{k}}) - f^{*} \right ] \) from both sides, we have
$$ \begin{array}{@{}rcl@{}} &&\left( 1-\frac{4\alpha(b)}{(\beta-1)}\right)\frac{1}{m}{\sum}_{k = 1}^{m}\mathbb{E} \left[ f({w^{s}_{k}}) - f^{*}\right]\\ &\le& \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m} \left[ f({w^{s}_{0}}) - f^{*} - \lbrace f({w^{s}_{m}}) - f^{*} \rbrace\right]\\ &&+ \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f (\tilde{w}^{s-1}) - f^{*} \right]+ R^{\prime\prime} \end{array} $$
(41)
Since \(f({w^{s}_{m}}) - f^{*} \ge 0\), dropping this term and using Assumption 3, we have
$$ \begin{array}{@{}rcl@{}} &&\left( 1-\frac{4\alpha(b)}{(\beta-1)}\right)\frac{1}{m}{\sum}_{k = 1}^{m}\mathbb{E} \left[ f({w^{s}_{k}}) - f^{*}\right]\\ &\le& \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m} \left[ f({w^{s}_{0}}) - f^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f (\tilde{w}^{s-1}) - f^{*} \right]+ R^{\prime\prime}\\ &\le& \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m} \left[ c\left( f (\tilde{w}^{s-1}) - f^{*}\right) \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f (\tilde{w}^{s-1}) - f^{*} \right]+ R^{\prime\prime}\\ &=&\left[ \frac{4\alpha(b)}{(\beta-1)}\frac{c}{m} +\frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \right] \left[ \left( f (\tilde{w}^{s-1}) - f^{*}\right) \right]+ R^{\prime\prime}, \end{array} $$
(42)
Dividing both sides by \(\left (1-\frac {4\alpha (b)}{(\beta -1)}\right )\), and since \(\tilde {w}^{s} = 1/m {\sum }_{k = 1}^{m}{w^{s}_{k}}\), so that by convexity \(f(\tilde {w}^{s}) \le 1/m {\sum }_{k = 1}^{m} f({w^{s}_{k}})\), we have
$$ \begin{array}{@{}rcl@{}} \mathbb{E} \left[ f(\tilde{w}^{s}) - f^{*}\right] &\le& \left[ \frac{4\alpha(b)}{(\beta - 1 - 4\alpha(b))}\frac{c}{m} + \frac{4\left( \alpha(b)m^{2} + (m-1)^{2}\right)}{m^{2}(\beta - 1 - 4\alpha(b))} \right]\\ &&\times\left[ f (\tilde{w}^{s-1}) - f^{*} \right]+ R^{\prime\prime\prime}, \end{array} $$
(43)
where \(R^{\prime \prime \prime } = R^{\prime \prime } (\beta -1)/\left (\beta -1-4\alpha (b)\right )\). Now, applying this inequality recursively, we have
$$ \begin{array}{ll} \mathbb{E} \left[ f(\tilde{w}^{s}) - f^{*}\right] \le C^{s} \left[ f (\tilde{w}^{0}) - f^{*} \right] + R^{\prime\prime\prime\prime}, \end{array} $$
(44)
where \(R^{\prime \prime \prime \prime }= R^{\prime \prime \prime }/(1-C)\), since \({\sum }_{i = 0}^{k} r^{i} \le {\sum }_{i = 0}^{\infty } r^{i} = \frac {1}{1-r}\) for \(0 \le r < 1\), and \(C = \left [ \frac {4\alpha (b)}{(\beta -1-4\alpha (b))}\frac {c}{m} +\frac {4\left (\alpha (b)m^{2}+(m-1)^{2}\right )}{m^{2}(\beta -1-4\alpha (b))} \right ]\). For a suitable choice of β, one can easily prove that C < 1. This proves linear convergence with some initial error. □
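The recursion behind (43) and (44) can be made concrete with a short sketch. It evaluates the contraction factor C for one set of parameters and unrolls the worst case \(e_{s} = C e_{s-1} + R^{\prime\prime\prime}\), comparing it with the closed-form bound \(C^{s} e_{0} + R^{\prime\prime\prime}/(1-C)\). The values of n, b, β, c, the residual `R`, and the initial error are assumptions chosen only to illustrate a case with C < 1.

```python
def alpha(n, b):
    """alpha(b) = (n - b) / (b * (n - 1))."""
    return (n - b) / (b * (n - 1))

def contraction_factor(n, b, beta, c):
    """C from the proof of Theorem 1, with m = n / b mini-batches per epoch."""
    m = n / b
    a = alpha(n, b)
    denom = beta - 1.0 - 4.0 * a
    return (4.0 * a * c / m + 4.0 * (a * m**2 + (m - 1.0) ** 2) / m**2) / denom

def unroll(C, R, e0, S):
    """Iterate e_s = C * e_{s-1} + R, the worst case permitted by (43)."""
    errs = [e0]
    for _ in range(S):
        errs.append(C * errs[-1] + R)
    return errs

# Hypothetical parameter values, for illustration only.
C = contraction_factor(n=1000, b=10, beta=10.0, c=1.0)
R, e0, S = 1e-3, 1.0, 20
errs = unroll(C, R, e0, S)
bounds = [C**s * e0 + R / (1.0 - C) for s in range(S + 1)]

print(C)  # roughly 0.5 for these values
print(all(e <= bd + 1e-12 for e, bd in zip(errs, bounds)))
```

With these values the error contracts geometrically toward the floor \(R^{\prime\prime\prime}/(1-C)\); smaller β moves C toward or above 1, in which case the bound is no longer informative.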
Proof of Theorem 2 (Strongly convex and smooth problem with SAAG-IV)
Proof
By smoothness, we have,
$$ \begin{array}{@{}rcl@{}} f(w^{s}_{k + 1}) &\le& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ &=& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ &&- \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ &=& f({w^{s}_{k}}) + <\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ && + <\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}>\\&&- \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \end{array} $$
(45)
where β is an appropriately chosen positive value. Now, separately simplifying the terms, we have
$$ \begin{array}{@{}rcl@{}} &&\mathbb{E}\left[ f({w^{s}_{k}}) + <\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ &=& f({w^{s}_{k}}) + \mathbb{E}\left[<\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}>\right] + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ &=& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}) + \frac{m-1}{m}\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ &=& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} + \frac{m-1}{m}<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ &\le& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{*} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{*} - {w^{s}_{k}} \|^{2} - \frac{L\beta}{2} \| w^{*} - w^{s}_{k + 1}\|^{2}\\ &&+ \frac{m-1}{m}<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ &=& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{*} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{*} - {w^{s}_{k}} \|^{2} - \frac{L\beta}{2} \| w^{*} - w^{s}_{k + 1}\|^{2}\\ &&+ \frac{m-1}{m}\left[<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - \tilde{w}^{s-1}>- <\nabla f(\tilde{w}^{s-1}), {w^{s}_{k}} - \tilde{w}^{s-1}>\right]\\ &\le& f(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right]\\ &+& \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f(\tilde{w}^{s-1}) - \left( f({w^{s}_{k}}) - f(\tilde{w}^{s-1})\right)\right]\\ &=& f(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f({w^{s}_{k}})\right], \end{array} $$
(46)
The second equality follows from \(\mathbb {E}\left [\tilde {\nabla }_{s,k}\right ] = \nabla f({w^{s}_{k}}) + \frac {m-1}{m}\nabla f(\tilde {w}^{s-1})\), the first inequality follows from Lemma 1, and the second inequality follows from convexity, i.e., \(f(x) \ge f(y) + <\nabla f(y), x - y>\).
$$ \begin{array}{@{}rcl@{}} \text{and}, &&\mathbb{E} \left[ <\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}> - \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ &\le& \mathbb{E} \left[ \frac{1}{2L(\beta-1)} \|\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}\|^{2} + \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} - \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ &\le& \frac{1}{2L(\beta-1)} \left[8L\alpha(b) \left[ f({w^{s}_{k}}) - f^{*} \right] + \frac{8L\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime}\right]\\ &=& \frac{4\alpha(b)}{(\beta-1)} \left[ f({w^{s}_{k}}) - f^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime} \end{array} $$
(47)
The first inequality follows from Young’s inequality and the second from Lemma 6, with \(R^{\prime \prime } = R^{\prime }/(2L(\beta -1))\).
Now, substituting inequalities (46) and (47) into (45) and taking expectation w.r.t. mini-batches, we have
$$ \begin{array}{@{}rcl@{}} &&\mathbb{E} \left[ f(w^{s}_{k + 1})\right]\\ &\le& f(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f({w^{s}_{k}})\right]\\ &&+ \frac{4\alpha(b)}{(\beta-1)} \left[ f({w^{s}_{k}}) - f^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime},\\ &&\mathbb{E} \left[ f(w^{s}_{k + 1})-f(w^{*})\right]\\ &\le& \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f({w^{s}_{k}})\right]\\ &&+ \frac{4\alpha(b)}{(\beta-1)} \left[ f({w^{s}_{k}}) - f^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime} \end{array} $$
(48)
Taking sum over k = 0, 1,..., (m − 1) and dividing by m, we have
$$ \begin{array}{@{}rcl@{}} &&\frac{1}{m}{\sum}_{k = 0}^{m-1}\mathbb{E} \left[ f(w^{s}_{k + 1})-f(w^{*})\right]\\ &\le& \frac{1}{m}{\sum}_{k = 0}^{m-1}\left\lbrace \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f({w^{s}_{k}})\right]\right\rbrace\\ &&+ \frac{1}{m}{\sum}_{k = 0}^{m-1}\left\lbrace\frac{4\alpha(b)}{(\beta-1)} \left[ f({w^{s}_{k}}) - f^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime}\right\rbrace\\ &&\frac{1}{m}{\sum}_{k = 1}^{m}\mathbb{E} \left[ f({w^{s}_{k}})-f(w^{*})\right]\\ &\le& \frac{L\beta}{2m} \left[ \| w^{*} - {w^{s}_{0}} \|^{2} - \| w^{*} - {w^{s}_{m}}\|^{2}\right] + \frac{m-1}{m^{2}}\left[ f({w^{s}_{m}}) - f({w^{s}_{0}})\right]\\ &&+ \frac{4\alpha(b)}{(\beta-1)}\frac{1}{m}\left\lbrace {\sum}_{k = 1}^{m}\left[ f({w^{s}_{k}}) - f^{*} \right] + f({w^{s}_{0}}) - f^{*} - \left( f({w^{s}_{m}}) - f^{*} \right)\right\rbrace\\ &&+ \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime} \end{array} $$
(49)
Subtracting \(\frac {4\alpha (b)}{(\beta -1)}\frac {1}{m}{\sum }_{k = 1}^{m}\left [ f({w^{s}_{k}}) - f^{*} \right ]\) from both sides, we have
$$ \begin{array}{@{}rcl@{}} &&\left( 1-\frac{4\alpha(b)}{(\beta-1)}\right)\frac{1}{m}{\sum}_{k = 1}^{m}\mathbb{E} \left[ f({w^{s}_{k}}) - f^{*}\right]\\ &\le& \frac{L\beta}{2m} \left[ \| w^{*} - {w^{s}_{0}} \|^{2} - \| w^{*} - {w^{s}_{m}}\|^{2}\right] - \left( \frac{m-1}{m^{2}} - \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m}\right) \left[ f({w^{s}_{0}}) -f({w^{s}_{m}}) \right]\\ &&+ \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime}\\ &\le& \frac{L\beta}{2m} \| w^{*} - {w^{s}_{0}} \|^{2} - \left( \frac{m-1}{m^{2}} - \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m}\right) \left[ f({w^{s}_{0}}) -f^{*} - \lbrace f({w^{s}_{m}}) - f^{*}\rbrace \right]\\ &&+ \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime}\\ &\le& \frac{L\beta}{2m} \frac{2}{\mu}\left( f({w^{s}_{0}})- f^{*} \right) - \left( \frac{m-1}{m^{2}} - \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m}\right) \left[ c\left[ f(\tilde{w}^{s-1}) - f^{*} \right] - c\left[ f(\tilde{w}^{s}) - f^{*} \right] \right]\\ &&+ \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime}\\ &\le& \frac{L\beta}{2m} \frac{2}{\mu}c\left[ f(\tilde{w}^{s-1}) - f^{*} \right] - \left( \frac{m-1}{m^{2}} - \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m}\right) \left[ c\left[ f(\tilde{w}^{s-1}) - f^{*} \right] - c\left[ f(\tilde{w}^{s}) - f^{*} \right] \right]\\ &&+ \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime} \end{array} $$
(50)
The second inequality follows by dropping the non-positive term \(-\| w^{*} - {w^{s}_{m}}\|^{2} \le 0\), the third inequality follows from strong convexity, i.e., \(\| {w^{s}_{0}} - w^{*}\|^{2} \le 2/ \mu \left (f({w^{s}_{0}})- f^{*} \right ) \), together with two applications of Assumption 3, and the fourth inequality follows from Assumption 3.
Since \(\tilde {w}^{s} = 1/m {\sum }_{k = 1}^{m}{w^{s}_{k}}\), convexity gives \(f(\tilde {w}^{s}) \le 1/m {\sum }_{k = 1}^{m} f({w^{s}_{k}})\), and we have
$$ \begin{array}{@{}rcl@{}} &&\left( 1-\frac{4\alpha(b)}{(\beta-1)}\right)\mathbb{E} \left[ f(\tilde{w}^{s}) - f^{*}\right]\\ &\le& \frac{L\beta}{2m} \frac{2}{\mu}c\left[ f(\tilde{w}^{s-1}) - f^{*} \right] - \left( \frac{m-1}{m^{2}} - \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m}\right)\\ &&\times\left[ c\left[ f(\tilde{w}^{s-1}) - f^{*} \right] - c\left[ f(\tilde{w}^{s}) - f^{*} \right] \right]\\ &&+ \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ f(\tilde{w}^{s-1}) - f^{*} \right] + R^{\prime\prime} \end{array} $$
(51)
Subtracting \( c\left (\frac {m-1}{m^{2}} - \frac {4\alpha (b)}{(\beta -1)} \frac {1}{m}\right ) \mathbb {E} \left [ f(\tilde {w}^{s}) - f^{*}\right ]\) from both sides, we have
$$ \begin{array}{@{}rcl@{}} &&\left( 1-\frac{4\alpha(b)}{(\beta-1)} - c\left( \frac{m-1}{m^{2}} - \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m}\right)\right) \mathbb{E} \left[ f(\tilde{w}^{s}) - f^{*}\right]\\ &\le& \left[\frac{cL\beta}{m\mu} + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} -\frac{c(m-1)}{m^{2}} + \frac{4c\alpha(b)}{m(\beta-1)}\right]\\ &&\times\left[ f(\tilde{w}^{s-1}) -f^{*} \right]+ R^{\prime\prime} \end{array} $$
(52)
Dividing both sides by \( \left (1-\frac {4\alpha (b)}{(\beta -1)} - c\left (\frac {m-1}{m^{2}} - \frac {4\alpha (b)}{(\beta -1)} \frac {1}{m}\right )\right )\), we have
$$ \mathbb{E} \left[ f(\tilde{w}^{s}) - f^{*}\right] \le C\left[ f(\tilde{w}^{s-1}) -f^{*} \right]+ R^{\prime\prime\prime} $$
(53)
where \(C = \left [\frac {cL\beta }{m\mu } + \frac {4\left (\alpha (b)m^{2}+(m-1)^{2}\right )}{m^{2}(\beta -1)} -\frac {c(m-1)}{m^{2}} + \frac {4c\alpha (b)}{m(\beta -1)}\right ]\)\(\left (1-\frac {4\alpha (b)}{(\beta -1)} - c\left (\frac {m-1}{m^{2}} - \frac {4\alpha (b)}{(\beta -1)} \frac {1}{m}\right )\right )^{-1}\) and \(R^{\prime \prime \prime }= R^{\prime \prime }\left (1-\frac {4\alpha (b)}{(\beta -1)} - c\left (\frac {m-1}{m^{2}} - \frac {4\alpha (b)}{(\beta -1)} \frac {1}{m}\right )\right )^{-1}\). Now, recursively applying the inequality, we have
$$ \mathbb{E} \left[ f(\tilde{w}^{s}) - f^{*}\right] \le C^{s}\left[ f(\tilde{w}^{0}) -f^{*} \right]+ R^{\prime\prime\prime\prime}, $$
(54)
The inequality holds with \(R^{\prime \prime \prime \prime }= R^{\prime \prime \prime }/(1-C)\), since \({\sum }_{i = 0}^{k} r^{i} \le {\sum }_{i = 0}^{\infty } r^{i} = \frac {1}{1-r}\) for \(0 \le r<1\). For a suitable choice of β, one can verify that C < 1. This proves linear convergence up to the error term \(R^{\prime \prime \prime \prime }\). □
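The recursive step from (53) to (54) can be checked numerically. The following is a minimal sketch, assuming the worst case in which (53) holds with equality; the function names and the numeric values are illustrative and do not come from the paper.

```python
def unroll_recursion(e0, C, R, s):
    """Iterate e_j = C * e_{j-1} + R for j = 1..s (worst case of (53)) and return e_s."""
    e = e0
    for _ in range(s):
        e = C * e + R
    return e


def closed_form_bound(e0, C, R, s):
    """Right-hand side of (54): C^s * e0 + R / (1 - C), assuming 0 <= C < 1."""
    if not 0.0 <= C < 1.0:
        raise ValueError("the geometric-series bound needs 0 <= C < 1")
    return C ** s * e0 + R / (1.0 - C)


# Illustrative (hypothetical) values: initial gap e0, contraction C, per-epoch error R.
e0, C, R = 1.0, 0.5, 0.01
for s in (1, 5, 20):
    print(s, unroll_recursion(e0, C, R, s), closed_form_bound(e0, C, R, s))
```

The unrolled value equals \(C^{s} e_{0} + R(1-C^{s})/(1-C)\), which never exceeds the closed-form bound and approaches \(R/(1-C)\) as s grows.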
Proof of Theorem 3
(Non-strongly convex and non-smooth problem with SAAG-IV)
Proof
By smoothness, we have,
$$ \begin{array}{@{}rcl@{}} f(w^{s}_{k + 1}) &\le& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}> \\ &&+ \frac{L}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \end{array} $$
(55)
$$ \begin{array}{@{}rcl@{}} \text{Now,} \!\!\!&&F(w^{s}_{k + 1}) = f(w^{s}_{k + 1}) + g(w^{s}_{k + 1})\\ \!\!\!&\le& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ &&+ \frac{L}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ \!\!\!&=& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ &&+ \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} - \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ \!\!\!&=& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}>\\ &&+ \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} + <\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}>\\ &&- \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \end{array} $$
(56)
where β is an appropriately chosen positive value (with β > 1, so that the factor L(β − 1) used below is positive). Now, separately simplifying the terms, we have
$$ \begin{array}{@{}rcl@{}} &&\mathbb{E}\left[ f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ & =& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + \mathbb{E}\left[<\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}>\right] + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ & =& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\nabla f({w^{s}_{k}}) + \frac{m-1}{m}\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ & =& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} + \frac{m-1}{m}<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ & \le& f({w^{s}_{k}}) + g(w^{*}) + <\nabla f({w^{s}_{k}}), w^{*} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{*} - {w^{s}_{k}} \|^{2} - \frac{L\beta}{2} \| w^{*} - w^{s}_{k + 1}\|^{2}\\ && + \frac{m-1}{m}<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ & =& f({w^{s}_{k}}) + g(w^{*}) + <\nabla f({w^{s}_{k}}), w^{*} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{*} - {w^{s}_{k}} \|^{2} - \frac{L\beta}{2} \| w^{*} - w^{s}_{k + 1}\|^{2}\\ && + \frac{m-1}{m}\left[<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - w^{*}>- <\nabla f(\tilde{w}^{s-1}), {w^{s}_{k}} - w^{*}>\right]\\ & \le& f(w^{*}) + g(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right]\\ && + \frac{m-1}{m}\left[\frac{1}{2\delta}\|\nabla f(\tilde{w}^{s-1})\|^{2} + \frac{\delta}{2} \|w^{s}_{k + 1} - w^{*}\|^{2} - \left[ \frac{1}{2\delta}\|\nabla f(\tilde{w}^{s-1})\|^{2} +\frac{\delta}{2} \|{w^{s}_{k}} - w^{*}\|^{2}\right]\right]\\ & =& F(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{\delta(m-1)}{2m}\left[\|w^{s}_{k + 1} - w^{*}\|^{2} - \|{w^{s}_{k}} - w^{*}\|^{2}\right],\\ & =& F(w^{*}) + \left( \frac{L\beta}{2} - 
\frac{\delta(m-1)}{2m}\right) \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2} \right],\\ & =&F(w^{*}), \end{array} $$
(57)
The second equality follows from \(\mathbb {E}\left [\tilde {\nabla }_{s,k}\right ] = \nabla f({w^{s}_{k}}) + \frac {m-1}{m}\nabla f(\tilde {w}^{s-1})\), the first inequality follows from Lemma 1, the second inequality follows from convexity, i.e., \(f(w^{*}) \ge f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{*} - {w^{s}_{k}}>\), and Young’s inequality, i.e., \(x^{T}y \le \frac{1}{2\delta}\|x\|^{2} + \frac{\delta}{2}\|y\|^{2}\) for \(\delta > 0\), and the last equality follows by choosing \(\delta = \frac {mL\beta }{(m-1)}\).
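To make the last equality of (57) explicit, substituting \(\delta = \frac{mL\beta}{(m-1)}\) into the coefficient of the bracketed distance terms gives
$$ \frac{L\beta}{2} - \frac{\delta(m-1)}{2m} = \frac{L\beta}{2} - \frac{mL\beta}{(m-1)}\cdot\frac{(m-1)}{2m} = \frac{L\beta}{2} - \frac{L\beta}{2} = 0, $$
so the whole distance term vanishes and only \(F(w^{*})\) remains.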
$$ \begin{array}{@{}rcl@{}} \text{And}, &&\mathbb{E} \left[ <\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}> - \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ & \le& \mathbb{E} \left[ \frac{1}{2L(\beta-1)} \|\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}\|^{2} + \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} - \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ & =& \frac{1}{2L(\beta-1)} \mathbb{E} \left[\| \tilde{\nabla}_{s,k} - \nabla f({w^{s}_{k}})\|^{2} \right]\\ & \le& \frac{1}{2L(\beta-1)}\left[8L\alpha(b) \left[ F({w^{s}_{k}}) - F^{*} \right] + \frac{8L\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}} \left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime}\right]\\ & =& \frac{4\alpha(b)} {(\beta-1)}\left[ F({w^{s}_{k}}) - F^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime\prime} \end{array} $$
(58)
The first inequality follows from Young’s inequality and the second inequality follows from Lemma 7 with \(R^{\prime \prime } = R^{\prime }/(2L(\beta -1))\). Now, substituting the values into (56) from inequalities (57) and (58), and taking expectation w.r.t. mini-batches, we have
$$ \begin{array}{@{}rcl@{}} \mathbb{E} \left[ F(w^{s}_{k + 1})\right] &\le& F(w^{*}) + \frac{4\alpha(b)} {(\beta-1)}\left[ F({w^{s}_{k}}) - F^{*} \right]\\ &&+ \frac{4\left( \alpha(b)m^{2} + (m - 1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime\prime}\\ \mathbb{E} \left[ F(w^{s}_{k + 1}) - F(w^{*})\right] &\le& \frac{4\alpha(b)} {(\beta-1)}\left[ F({w^{s}_{k}}) - F^{*} \right]\\ &&+ \frac{4\left( \alpha(b)m^{2} + (m - 1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime\prime} \end{array} $$
(59)
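The Young’s-inequality step used in (58), with \(\delta = L(\beta-1)\), can be sanity-checked numerically. This is a minimal sketch in plain Python; the helper names `inner` and `young_gap` and the sample values of L and β are hypothetical, chosen only for illustration.

```python
import random


def inner(x, y):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, y))


def young_gap(x, y, delta):
    """Right side minus left side of <x, y> <= ||x||^2 / (2 delta) + delta ||y||^2 / 2.

    The gap equals ||x - delta * y||^2 / (2 delta), so it is non-negative for delta > 0.
    """
    return inner(x, x) / (2.0 * delta) + delta * inner(y, y) / 2.0 - inner(x, y)


# Assumed illustrative constants: smoothness L and beta > 1, giving delta = L * (beta - 1).
L_const, beta = 2.0, 5.0
delta = L_const * (beta - 1.0)

rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(10)]  # plays the role of grad f(w_k) - tilde_grad
y = [rng.uniform(-1, 1) for _ in range(10)]  # plays the role of w_{k+1} - w_k
print(young_gap(x, y, delta))
```

The gap is zero exactly when \(x = \delta y\), which is why the bound is tight only in that direction.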
Taking sum over k = 0, 1,..., (m − 1) and dividing by m, we have
$$ \begin{array}{@{}rcl@{}} &&\frac{1}{m}{\sum}_{k = 0}^{m-1}\mathbb{E} \left[ F(w^{s}_{k + 1})-F(w^{*})\right]\\ &\le& \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m}{\sum}_{k = 0}^{m-1} \left[ F({w^{s}_{k}}) - F(w^{*}) \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime}\\ &&\frac{1}{m}{\sum}_{k = 1}^{m}\mathbb{E} \left[ F({w^{s}_{k}}) - F(w^{*})\right]\\ &\le& \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m} \left\lbrace {\sum}_{k = 1}^{m} \left[ F({w^{s}_{k}}) - F(w^{*}) \right] + F({w^{s}_{0}}) - F(w^{*}) - \lbrace F({w^{s}_{m}}) - F(w^{*}) \rbrace \right\rbrace\\ &&+ \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime} \end{array} $$
(60)
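The second inequality in (60) relies on an index shift. Writing \(a_{k} = F({w^{s}_{k}}) - F(w^{*})\) as shorthand (notation introduced here only for this remark), the sum on the right-hand side is rewritten as
$$ {\sum}_{k = 0}^{m-1} a_{k} = {\sum}_{k = 1}^{m} a_{k} + a_{0} - a_{m}, $$
which is what produces the extra terms \(F({w^{s}_{0}}) - F(w^{*})\) and \(-\lbrace F({w^{s}_{m}}) - F(w^{*})\rbrace\) inside the braces.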
Subtracting \(\frac {4\alpha (b)}{(\beta -1)} \frac {1}{m}{\sum }_{k = 1}^{m} \left [ F({w^{s}_{k}}) - F(w^{*})\right ] \) from both sides, we have
$$ \begin{array}{@{}rcl@{}} \!\!&&\left( 1-\frac{4\alpha(b)}{(\beta-1)}\right)\frac{1}{m}{\sum}_{k = 1}^{m}\mathbb{E} \left[ F({w^{s}_{k}}) - F(w^{*})\right]\\ \!\!& \le& \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m} \left[ F({w^{s}_{0}}) - F(w^{*}) - \lbrace F({w^{s}_{m}}) - F(w^{*}) \rbrace\right]\\ \!\!&& + \frac{4\left( \alpha(b)m^{2} + (m - 1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime} \end{array} $$
(61)
Since \(F({w^{s}_{m}}) - F(w^{*}) \ge 0\), dropping this term and using Assumption 3, we have
$$ \begin{array}{@{}rcl@{}} &&\left( 1-\frac{4\alpha(b)}{(\beta-1)}\right)\frac{1}{m}{\sum}_{k = 1}^{m}\mathbb{E} \left[ F({w^{s}_{k}}) - F(w^{*})\right]\\ &\le& \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m} \left[ F({w^{s}_{0}}) - F(w^{*}) \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime}\\ &\le& \frac{4\alpha(b)}{(\beta-1)} \frac{1}{m} \left[ c\left( F(\tilde{w}^{s-1}) - F(w^{*})\right) \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime}\\ &=& \left( \frac{4\alpha(b)}{(\beta-1)} \frac{c}{m}+ \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \right) \left[ \left( F(\tilde{w}^{s-1}) - F(w^{*})\right) \right] + R^{\prime\prime}, \end{array} $$
(62)
Dividing both sides by \(\left (1-\frac {4\alpha (b)}{(\beta -1)}\right )\), and using \(\tilde {w}^{s} = 1/m {\sum }_{k = 1}^{m}{w^{s}_{k}}\) together with convexity, i.e., \(F(\tilde {w}^{s}) \le 1/m {\sum }_{k = 1}^{m} F({w^{s}_{k}})\), we have
$$ \begin{array}{@{}rcl@{}} \mathbb{E} \left[ F(\tilde{w}^{s}) - F^{*}\right]\! &\le&\! \left( \! \frac{4\alpha(b)}{(\beta - 1 - 4\alpha(b))} \frac{c}{m}+ \frac{4\left( \alpha(b)m^{2} + (m-1)^{2}\right)}{m^{2}(\beta - 1 - 4\alpha(b))}\!\right)\\ &&\times\left[ F (\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime\prime}, \end{array} $$
(63)
where \( R^{\prime \prime \prime }= R^{\prime \prime } \left (1-\frac {4\alpha (b)}{(\beta -1)}\right )^{-1}\). Now, applying above inequality recursively, we have
$$ \begin{array}{ll} \mathbb{E} \left[ F(\tilde{w}^{s}) - F(w^{*})\right] \le C^{s} \left[ F(\tilde{w}^{0}) - F(w^{*}) \right] + R^{\prime\prime\prime\prime}, \end{array} $$
(64)
The inequality holds with \(R^{\prime \prime \prime \prime }= R^{\prime \prime \prime }/(1-C)\), since \({\sum }_{i = 0}^{k} r^{i} \le {\sum }_{i = 0}^{\infty } r^{i} = \frac {1}{1-r}\) for \(0 \le r<1\), and \(C = \left (\frac {4\alpha (b)}{(\beta -1-4\alpha (b))} \frac {c}{m}+ \frac {4\left (\alpha (b)m^{2}+(m-1)^{2}\right )}{m^{2}(\beta -1-4\alpha (b))}\right ) \). For a suitable choice of β, one can verify that C < 1. This proves linear convergence up to the error term \(R^{\prime \prime \prime \prime }\). □
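The claim that C < 1 for a suitable β can be explored numerically. The sketch below simply evaluates the constant C of (64); the function name and the sample values of α(b), c and m are assumptions for illustration, not values taken from the experiments.

```python
def contraction_factor_thm3(alpha_b, beta, c, m):
    """Evaluate C from (64) for given alpha(b), beta, c and inner-loop length m.

    Requires beta - 1 - 4 * alpha_b > 0 so that the division in (63) is valid.
    """
    denom = beta - 1.0 - 4.0 * alpha_b
    if denom <= 0:
        raise ValueError("need beta > 1 + 4 * alpha_b")
    first = 4.0 * alpha_b * c / (m * denom)
    second = 4.0 * (alpha_b * m ** 2 + (m - 1) ** 2) / (m ** 2 * denom)
    return first + second


# Hypothetical values: alpha(b) = 0.1, c = 1, m = 100.
for beta in (3.0, 6.0, 10.0):
    print(beta, contraction_factor_thm3(0.1, beta, 1.0, 100))
```

Because the denominator is positive, C < 1 is equivalent to \(\beta > 1 + 4\alpha(b) + \frac{4\alpha(b)c}{m} + \frac{4(\alpha(b)m^{2}+(m-1)^{2})}{m^{2}}\), so for these sample values any β above roughly 5.72 gives a contraction.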
Proof of Theorem 4
(Strongly convex and non-smooth problem with SAAG-IV)
Proof
By smoothness, we have,
$$ \begin{array}{@{}rcl@{}} f(w^{s}_{k + 1}) & \le& f({w^{s}_{k}}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}> \\ &&+ \frac{L}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ \end{array} $$
Now,
$$ \begin{array}{@{}rcl@{}} F(w^{s}_{k + 1}) &=& f(w^{s}_{k + 1}) + g(w^{s}_{k + 1})\\ & \le& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ &&+ \frac{L}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ & =& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ && + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} - \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ & =& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}>\\ && + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} + <\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}, w^{s}_{k + 1}\\ && - {w^{s}_{k}}>- \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \end{array} $$
(65)
where β is an appropriately chosen positive value (with β > 1, so that the factor L(β − 1) used below is positive). Now, separately simplifying the terms, we have
$$ \begin{array}{@{}rcl@{}} &&\mathbb{E}\left[ f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ & =& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + \mathbb{E}\left[<\tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}>\right] + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ & =& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\nabla f({w^{s}_{k}}) + \frac{m-1}{m}\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2}\\ & =& f({w^{s}_{k}}) + g(w^{s}_{k + 1}) + <\nabla f({w^{s}_{k}}), w^{s}_{k + 1} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} + \frac{m-1}{m}<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ & \le& f({w^{s}_{k}}) + g(w^{*}) + <\nabla f({w^{s}_{k}}), w^{*} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{*} - {w^{s}_{k}} \|^{2} - \frac{L\beta}{2} \| w^{*} - w^{s}_{k + 1}\|^{2}\\ &&+ \frac{m-1}{m}<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - {w^{s}_{k}}>\\ & =& f({w^{s}_{k}}) + g(w^{*}) + <\nabla f({w^{s}_{k}}), w^{*} - {w^{s}_{k}}> + \frac{L\beta}{2} \| w^{*} - {w^{s}_{k}} \|^{2} - \frac{L\beta}{2} \| w^{*} - w^{s}_{k + 1}\|^{2}\\ && + \frac{m-1}{m}\left[<\nabla f(\tilde{w}^{s-1}), w^{s}_{k + 1} - \tilde{w}^{s-1}>- <\nabla f(\tilde{w}^{s-1}), {w^{s}_{k}} - \tilde{w}^{s-1}>\right]\\ & \le& f(w^{*}) + g(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right]\\ && + \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f(\tilde{w}^{s-1}) - \left( f({w^{s}_{k}}) - f(\tilde{w}^{s-1})\right)\right]\\ & =& f(w^{*}) + g(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f({w^{s}_{k}})\right],\\ & =& F(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f({w^{s}_{k}})\right], 
\end{array} $$
(66)
The second equality follows from \(\mathbb {E}\left [\tilde {\nabla }_{s,k}\right ] = \nabla f({w^{s}_{k}}) + \frac {m-1}{m}\nabla f(\tilde {w}^{s-1})\), the first inequality follows from Lemma 1, and the second inequality follows from convexity, i.e., \(f(x) \ge f(y) + <\nabla f(y), x - y>\).
$$ \begin{array}{@{}rcl@{}} \text{And}, &&\mathbb{E} \left[ <\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}, w^{s}_{k + 1} - {w^{s}_{k}}> - \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ & \le& \mathbb{E} \left[ \frac{1}{2L(\beta-1)} \|\nabla f({w^{s}_{k}}) - \tilde{\nabla}_{s,k}\|^{2} + \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} - \frac{L(\beta-1)}{2} \| w^{s}_{k + 1} - {w^{s}_{k}} \|^{2} \right]\\ & =& \frac{1}{2L(\beta-1)} \mathbb{E} \left[\| \tilde{\nabla}_{s,k} - \nabla f({w^{s}_{k}})\|^{2} \right]\\ & \le& \frac{1}{2L(\beta-1)}\left[8L\alpha(b) \left[ F({w^{s}_{k}}) - F^{*} \right] + \frac{8L\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}} \left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime}\right]\\ & =& \frac{4\alpha(b)} {(\beta-1)}\left[ F({w^{s}_{k}}) - F^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime\prime}, \end{array} $$
(67)
The first inequality follows from Young’s inequality and the second inequality follows from Lemma 7 with \(R^{\prime \prime } = R^{\prime }/(2L(\beta -1))\). Now, substituting the values into (65) from inequalities (66) and (67), and taking expectation w.r.t. mini-batches, we have
$$ \begin{array}{@{}rcl@{}} &&\mathbb{E} \left[ F(w^{s}_{k + 1})\right]\\ & \le& F(w^{*}) + \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f({w^{s}_{k}})\right]\\ &&+ \frac{4\alpha(b)} {(\beta-1)}\left[ F({w^{s}_{k}}) - F^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime\prime}\\ &&\mathbb{E} \left[ F(w^{s}_{k + 1})-F(w^{*})\right]\\ & \le& \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f({w^{s}_{k}})\right]\\ &&+ \frac{4\alpha(b)} {(\beta-1)}\left[ F({w^{s}_{k}}) - F^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime\prime} \end{array} $$
(68)
Taking sum over k = 0, 1,..., (m − 1) and dividing by m, we have
$$ \begin{array}{@{}rcl@{}} &&\frac{1}{m}{\sum}_{k = 0}^{m-1}\mathbb{E} \left[ F(w^{s}_{k + 1})-F(w^{*})\right]\\ & \le& \frac{1}{m}{\sum}_{k = 0}^{m-1}\left\lbrace \frac{L\beta}{2} \left[ \| w^{*} - {w^{s}_{k}} \|^{2} - \| w^{*} - w^{s}_{k + 1}\|^{2}\right] + \frac{m-1}{m}\left[ f(w^{s}_{k + 1}) - f({w^{s}_{k}})\right] \right\rbrace\\ &&+ \frac{1}{m}{\sum}_{k = 0}^{m-1}\left\lbrace\frac{4\alpha(b)} {(\beta-1)}\left[ F({w^{s}_{k}}) - F^{*} \right] + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F^{*} \right] + R^{\prime\prime} \right\rbrace\\ &&\frac{1}{m}{\sum}_{k = 1}^{m}\mathbb{E} \left[ F({w^{s}_{k}})-F(w^{*})\right]\\ & \le& \frac{L\beta}{2m} \left[ \| w^{*} - {w^{s}_{0}} \|^{2} - \| w^{*} - {w^{s}_{m}}\|^{2}\right] + \frac{m-1}{m^{2}}\left[ f({w^{s}_{m}}) - f({w^{s}_{0}})\right]\\ &&+ \frac{4\alpha(b)}{(\beta-1)}\frac{1}{m}\left\lbrace {\sum}_{k = 1}^{m}\left[ F({w^{s}_{k}}) - F(w^{*}) \right] + F({w^{s}_{0}}) - F(w^{*}) - \left( F({w^{s}_{m}}) - F(w^{*}) \right)\right\rbrace\\ && + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime} \end{array} $$
(69)
Subtracting, \(\frac {4\alpha (b)}{(\beta -1)}\frac {1}{m}{\sum }_{k = 1}^{m}\left [ F({w^{s}_{k}}) - F(w^{*}) \right ]\) from both sides, we have
$$ \begin{array}{@{}rcl@{}} && \left( 1-\frac{4\alpha(b)}{(\beta-1)}\right)\frac{1}{m}{\sum}_{k = 1}^{m}\mathbb{E} \left[ F({w^{s}_{k}}) - F(w^{*}) \right]\\ & \le& \frac{L\beta}{2m} \left[ \| w^{*} - {w^{s}_{0}} \|^{2} - \| w^{*} - {w^{s}_{m}}\|^{2}\right] + \frac{m-1}{m^{2}}\left[ f({w^{s}_{m}}) - f({w^{s}_{0}})\right]\\ &&+ \frac{4\alpha(b)}{(\beta-1)}\frac{1}{m}\left\lbrace F({w^{s}_{0}}) - F(w^{*}) - \left( F({w^{s}_{m}}) - F(w^{*}) \right)\right\rbrace\\ && + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime}\\ & \le& \frac{L\beta}{2m} \left[ \| w^{*} - {w^{s}_{0}} \|^{2} \right] + \frac{m-1}{m^{2}}\left[ F({w^{s}_{m}}) - F({w^{s}_{0}})\right] + \frac{4\alpha(b)}{(\beta-1)}\frac{1}{m}\left\lbrace F({w^{s}_{0}}) - F(w^{*}) - \left( F({w^{s}_{m}}) - F(w^{*}) \right)\right\rbrace\\ && + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime\prime}\\ & \le& \frac{L\beta}{2m} \frac{2}{\mu} \left[ F({w^{s}_{0}}) - F(w^{*}) \right] + \frac{m-1}{m^{2}}\left[ F({w^{s}_{m}}) - F({w^{s}_{0}})\right] + \frac{4\alpha(b)}{(\beta-1)}\frac{1}{m}\left\lbrace F({w^{s}_{0}}) - F(w^{*}) - \left( F({w^{s}_{m}}) - F(w^{*}) \right)\right\rbrace\\ && + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime\prime}\\ & =& \left( \frac{L\beta}{m\mu} + \frac{4\alpha(b)}{m(\beta-1)} - \frac{m-1}{m^{2}}\right) \left[ F({w^{s}_{0}}) - F(w^{*}) \right] + \left( \frac{m-1}{m^{2}} - \frac{4\alpha(b)}{m(\beta-1)} \right) \left[F({w^{s}_{m}}) - F(w^{*})\right]\\ && + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime\prime}\\ & \le& \left( \frac{L\beta}{m\mu} + \frac{4\alpha(b)}{m(\beta-1)} - \frac{m-1}{m^{2}}\right) c \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + \left( \frac{m-1}{m^{2}} - 
\frac{4\alpha(b)}{m(\beta-1)} \right) c \left[F(\tilde{w}^{s}) - F(w^{*})\right]\\ && + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)} \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime\prime}\\ & \le& \left( \frac{Lc\beta}{m\mu} + \frac{4c\alpha(b)}{m(\beta-1)} - \frac{c(m-1)}{m^{2}} + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)}\right) \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right]\\ &&+ \left( \frac{m-1}{m^{2}} - \frac{4\alpha(b)}{m(\beta-1)}\right)c \left[F(\tilde{w}^{s}) - F(w^{*})\right] + R^{\prime\prime\prime} \end{array} $$
(70)
The second inequality follows from dropping \(\| w^{*} - {w^{s}_{m}}\|^{2} \ge 0\) and converting \(f({w^{s}_{m}}) - f({w^{s}_{0}})\) to \(F({w^{s}_{m}}) - F({w^{s}_{0}})\) by introducing a constant, which is absorbed into \(R^{\prime \prime \prime } = R^{\prime \prime } + (m-1)g({w^{s}_{0}})/m^{2}\); the third inequality follows from strong convexity, i.e., \(\| {w^{s}_{0}} - w^{*}\|^{2} \le 2/ \mu \left (F({w^{s}_{0}})- F(w^{*}) \right ) \); and the fourth inequality follows from Assumption 3. Since \(\tilde {w}^{s} = 1/m {\sum }_{k = 1}^{m}{w^{s}_{k}}\), convexity gives \(F(\tilde {w}^{s}) \le 1/m {\sum }_{k = 1}^{m} F({w^{s}_{k}})\); using this and subtracting \(\left (\frac {m-1}{m^{2}} - \frac {4\alpha (b)}{m(\beta -1)} \right ) c \left [F(\tilde {w}^{s}) - F(w^{*})\right ]\) from both sides, we have
$$ \begin{array}{@{}rcl@{}} \!\!&&\left( \!1 - \frac{4\alpha(b)}{(\beta - 1)} - \frac{c(m - 1)}{m^{2}} + \frac{4c\alpha(b)}{m(\beta - 1)}\!\right)\mathbb{E} \left[ F(\tilde{w}^{s}) - F(w^{*}) \right]\\ \!\!&\le& \!\left( \!\frac{Lc\beta}{m\mu} \!+ \frac{4c\alpha(b)}{m(\beta-1)} - \frac{c(m-1)}{m^{2}} + \frac{4\left( \alpha(b)m^{2}+(m-1)^{2}\right)}{m^{2}(\beta-1)}\right)\\ \!\!&&\times\left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime\prime} \end{array} $$
(71)
Dividing both sides by \( \left (1-\frac {4\alpha (b)}{(\beta -1)} - \frac {c(m-1)}{m^{2}} + \frac {4c\alpha (b)}{m(\beta -1)}\right )\), we have
$$ \begin{array}{ll} &\mathbb{E} \left[ F(\tilde{w}^{s}) - F(w^{*}) \right] \le C \left[ F(\tilde{w}^{s-1}) - F(w^{*}) \right] + R^{\prime\prime\prime\prime} \end{array} $$
(72)
where \(C = \left (\frac {Lc\beta }{m\mu } + \frac {4c\alpha (b)}{m(\beta -1)} - \frac {c(m-1)}{m^{2}} + \frac {4\left (\alpha (b)m^{2}+(m-1)^{2}\right )}{m^{2}(\beta -1)}\right )\)\(\left (1-\frac {4\alpha (b)}{(\beta -1)} - \frac {c(m-1)}{m^{2}} + \frac {4c\alpha (b)}{m(\beta -1)}\right )^{-1} \) and \(R^{\prime \prime \prime \prime } = R^{\prime \prime \prime }\left (1-\frac {4\alpha (b)}{(\beta -1)} - \frac {c(m-1)}{m^{2}} + \frac {4c\alpha (b)}{m(\beta -1)}\right )^{-1} \). Now, applying this inequality recursively, we have
$$ \begin{array}{ll} \mathbb{E} \left[ F(\tilde{w}^{s}) - F(w^{*}) \right] \le C^{s} \left[ F(\tilde{w}^{0}) - F(w^{*}) \right] + R^{\prime\prime\prime\prime\prime}, \end{array} $$
(73)
The inequality holds with \(R^{\prime \prime \prime \prime \prime }= R^{\prime \prime \prime \prime }/(1-C)\), since \({\sum }_{i = 0}^{k} r^{i} \le {\sum }_{i = 0}^{\infty } r^{i} = \frac {1}{1-r}\) for \(0 \le r<1\). For a suitable choice of β, one can verify that C < 1. This proves linear convergence up to the error term \(R^{\prime \prime \prime \prime \prime }\). □
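As with Theorem 3, the condition C < 1 can be probed numerically for the strongly convex case. The following sketch evaluates the constant C defined after (72); the function name and all sample values (L, μ, α(b), c, m, β) are hypothetical and serve only to illustrate how the terms trade off.

```python
def contraction_factor_thm4(L, mu, alpha_b, beta, c, m):
    """Evaluate C defined after (72): numerator of (71) divided by its left-hand factor."""
    if beta <= 1.0:
        raise ValueError("need beta > 1")
    numerator = (
        L * c * beta / (m * mu)
        + 4.0 * c * alpha_b / (m * (beta - 1.0))
        - c * (m - 1) / m ** 2
        + 4.0 * (alpha_b * m ** 2 + (m - 1) ** 2) / (m ** 2 * (beta - 1.0))
    )
    denominator = (
        1.0
        - 4.0 * alpha_b / (beta - 1.0)
        - c * (m - 1) / m ** 2
        + 4.0 * c * alpha_b / (m * (beta - 1.0))
    )
    if denominator <= 0:
        raise ValueError("left-hand factor of (71) must be positive")
    return numerator / denominator


# Hypothetical values: L = 1, mu = 0.5, alpha(b) = 0.1, c = 1, m = 1000.
for beta in (2.0, 8.0, 20.0):
    print(beta, contraction_factor_thm4(1.0, 0.5, 0.1, beta, 1.0, 1000))
```

Note the competing effects: a larger β shrinks the terms with β − 1 in the denominator but grows the term \(Lc\beta/(m\mu)\), so a sufficiently long inner loop m is also needed to keep C below one.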