Appendix A: Convergence Analysis of Theorem 1
Before presenting the theoretical analysis, we define \({\overline{\varvec{\theta }}}_{[t+1]}^{[s]}\) and \(F(\varvec{\theta })\), which are used throughout.
\({\overline{\varvec{\theta }}}_{[t+1]}^{[s]}\) is defined as
$$\begin{aligned} {\overline{\varvec{\theta }}}_{[t+1]}^{[s]} {\mathop {=}\limits ^\textrm{def}} \varvec{\theta }_{[t]}^{[s]} -\gamma \varvec{v}^{[s]}_{[t]} \end{aligned}$$
(12)
\(F(\varvec{\theta })\) is defined as
$$\begin{aligned} F(\varvec{\theta })= \frac{1}{l} \sum _{i=1}^l {F_i(\varvec{\theta })} \end{aligned}$$
(13)
Based on (12), it is easy to verify that \(({\overline{\varvec{\theta }}}_{[t+1]}^{[s]})_{J(t)} =(\varvec{\theta }_{[t+1]}^{[s]})_{J(t)}\), while \((\varvec{\theta }_{[t+1]}^{[s]})_{\setminus J(t)} =(\varvec{\theta }_{[t]}^{[s]})_{\setminus J(t)}\). Since \(J(t)\) is drawn uniformly from the \(k\) blocks of the partition, we have \(\mathbb {E}_{J(t)} (\varvec{\theta }_{[t+1]}^{[s]} - \varvec{\theta }_{[t]}^{[s]}) = \frac{1}{k} \left( {\overline{\varvec{\theta }}}_{[t+1]}^{[s]} - \varvec{\theta }_{[t]}^{[s]} \right)\). That is, up to the factor \(\frac{1}{k}\), \({\overline{\varvec{\theta }}}_{[t+1]}^{[s]} - \varvec{\theta }_{[t]}^{[s]}\) is the expected step \(\varvec{\theta }_{[t+1]}^{[s]} - \varvec{\theta }_{[t]}^{[s]}\).
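As a sanity check of this identity, the Python sketch below enumerates all \(k\) blocks and averages the resulting steps. It assumes that \(J(t)\) is drawn uniformly from a partition of the coordinates and that only the coordinates in \(J(t)\) move by \(-\gamma \varvec{v}\); the helper names are hypothetical and not part of DSG. It also checks the companion second-moment identity \(\mathbb {E}_{J(t)} \Vert \Delta ^s_{t} \Vert ^2 = \frac{1}{k} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2\) that the later bounds rely on.

```python
import random


def block_steps(v, gamma, blocks):
    """Return the k possible steps theta_{t+1} - theta_t, one per block J.

    Assumption: only the coordinates in J move, each by -gamma * v[j].
    """
    d = len(v)
    steps = []
    for J in blocks:
        step = [0.0] * d
        for j in J:
            step[j] = -gamma * v[j]
        steps.append(step)
    return steps


def expected_block_step(v, gamma, blocks):
    """Average step over a uniformly sampled block J(t)."""
    steps = block_steps(v, gamma, blocks)
    k = len(blocks)
    return [sum(s[j] for s in steps) / k for j in range(len(v))]


def expected_sq_step(v, gamma, blocks):
    """Average squared norm of the step over a uniformly sampled block J(t)."""
    steps = block_steps(v, gamma, blocks)
    return sum(sum(x * x for x in s) for s in steps) / len(blocks)


# Usage: the full step theta_bar_{t+1} - theta_t = -gamma * v touches every coordinate.
random.seed(0)
v = [random.uniform(-1.0, 1.0) for _ in range(6)]
gamma, blocks = 0.1, [[0, 1], [2, 3], [4, 5]]
full = [-gamma * x for x in v]
k = len(blocks)
print(all(abs(a - f / k) < 1e-12 for a, f in zip(expected_block_step(v, gamma, blocks), full)))
print(abs(expected_sq_step(v, gamma, blocks) - sum(f * f for f in full) / k) < 1e-12)
```

Because every coordinate lies in exactly one block, the same identities hold for unequal block sizes.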
We first establish an inequality in Lemma 1. Based on Lemma 1, we prove that \(\mathbb {E}\Vert \varvec{\theta }_{[t-1]}^{[s]} - {\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert ^2 \le \rho \mathbb {E}\Vert \varvec{\theta }_{[t]}^{[s]} - {\overline{\varvec{\theta }}}_{[t+1]}^{[s]}\Vert ^2\) (Lemma 2), where \(\rho >1\) is a user-defined parameter. We then prove that the expected objective is monotonically non-increasing, i.e., \(\mathbb {E} F(\varvec{\theta }_{[t+1]}^{[s]}) \le \mathbb {E} F(\varvec{\theta }_{[t]}^{[s]})\) (Lemma 3). The analysis considers only the case \(|\mathcal {B}|=1\) without loss of generality; the case \(|\mathcal {B}|>1\) can be proved similarly.
Lemma 1
In each iteration of DSG, with \(\Delta ^{s}_{t} = \varvec{\theta }_{[t+1]}^{[s]} - \varvec{\theta }_{[t]}^{[s]}\), we have the following inequality.
$$\begin{aligned} \left\langle (\varvec{v}^{[s]}_{[t]})_{J(t)}, (\Delta ^{s}_{t})_{J(t)}\right\rangle \le - \frac{1}{\gamma } \Vert (\Delta ^{s}_{t})_{J(t)} \Vert ^2 \end{aligned}$$
(14)
Proof
The updating rule in line 7 of Algorithm 2 is equivalent to solving the following problem.
$$\begin{aligned} \varvec{\theta }_{[t+1]}^{[s]} = \arg \min _{\varvec{\theta }} P(\varvec{\theta }) = \arg \min _{\varvec{\theta }}{} & {} \left\langle (\varvec{v}^{[s]}_{[t]})_{J(t)}, (\varvec{\theta } - \varvec{\theta }_{[t]}^{[s]})_{J(t)}\right\rangle + \frac{1}{2\gamma } \left\| (\varvec{\theta } - \varvec{\theta }_{[t]}^{[s]})_{J(t)} \right\| ^2\nonumber \\ s.t.{} & {} \varvec{\theta }_{\setminus J(t)} = (\varvec{\theta }_{[t]}^{[s]})_{\setminus J(t)} \end{aligned}$$
(15)
Substituting \(\varvec{\theta }_{[t+1]}^{[s]}=\left[ \begin{array}{c} \left( \varvec{\theta }_{[t]}^{[s]}\right) _{J(t)} - \gamma \left( \varvec{v}^{[s]}_{[t]} \right) _{J(t)} \\ (\varvec{\theta }_{[t]}^{[s]})_{\setminus J(t)} \end{array} \right]\), the minimizer of (15), into \(P\), we have \(P(\varvec{\theta }_{[t+1]}^{[s]}) = - \frac{\gamma }{2} \left\| (\varvec{v}^{[s]}_{[t]})_{J(t)} \right\| ^2\). Since this is the minimum value of (15), for any feasible \(\varvec{\theta }\) we have
$$\begin{aligned} \left\langle (\varvec{v}^{[s]}_{[t]})_{J(t)}, (\varvec{\theta } - \varvec{\theta }_{[t]}^{[s]})_{J(t)}\right\rangle + \frac{1}{2\gamma } \left\| (\varvec{\theta } - \varvec{\theta }_{[t]}^{[s]})_{J(t)} \right\| ^2 \ge - \frac{\gamma }{2} \left\| (\varvec{v}^{[s]}_{[t]})_{J(t)} \right\| ^2 \end{aligned}$$
(16)
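Setting \(\varvec{\theta }=\varvec{\theta }_{[t+1]}^{[s]}\) in (16), for which \((\Delta ^{s}_{t})_{J(t)} = -\gamma (\varvec{v}^{[s]}_{[t]})_{J(t)}\), both sides of (16) coincide, and \(\left\langle (\varvec{v}^{[s]}_{[t]})_{J(t)}, (\Delta ^{s}_{t})_{J(t)}\right\rangle = -\gamma \Vert (\varvec{v}^{[s]}_{[t]})_{J(t)} \Vert ^2 = -\frac{1}{\gamma } \Vert (\Delta ^{s}_{t})_{J(t)} \Vert ^2\), which gives (14). This completes the proof. \(\square\)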
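Lemma 1 can also be checked numerically. The sketch below assumes that line 7 of Algorithm 2 is the block step \((\varvec{\theta })_{J(t)} \leftarrow (\varvec{\theta })_{J(t)} - \gamma (\varvec{v})_{J(t)}\) implied by (15); `block_update` and `subproblem_value` are illustrative names, not the authors' code. It confirms that the step attains \(P = -\frac{\gamma }{2}\Vert (\varvec{v})_{J(t)}\Vert ^2\), that random points do no better, and that (14) holds with equality.

```python
import random


def block_update(theta, v, gamma, J):
    """Assumed form of the update: theta_J <- theta_J - gamma * v_J, other coordinates fixed."""
    new = list(theta)
    for j in J:
        new[j] = theta[j] - gamma * v[j]
    return new


def subproblem_value(x, theta, v, gamma, J):
    """P(x) from (15); only the coordinates in J enter the objective."""
    lin = sum(v[j] * (x[j] - theta[j]) for j in J)
    quad = sum((x[j] - theta[j]) ** 2 for j in J) / (2.0 * gamma)
    return lin + quad


random.seed(1)
d, gamma, J = 6, 0.3, [1, 3, 4]
theta = [random.gauss(0, 1) for _ in range(d)]
v = [random.gauss(0, 1) for _ in range(d)]
new = block_update(theta, v, gamma, J)
delta = [a - b for a, b in zip(new, theta)]

v_J_sq = sum(v[j] ** 2 for j in J)
p_min = subproblem_value(new, theta, v, gamma, J)
inner = sum(v[j] * delta[j] for j in J)          # <v_J, Delta_J>
delta_J_sq = sum(delta[j] ** 2 for j in J)       # ||Delta_J||^2
print(abs(p_min + gamma / 2 * v_J_sq) < 1e-12)   # P at the update equals -gamma/2 ||v_J||^2
print(abs(inner + delta_J_sq / gamma) < 1e-12)   # (14) holds with equality
```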
Lemma 2
Let \(k\) be the number of blocks in the partition of \(\{1,\ldots ,d \times p \}\), let \(\rho > 1\) be a constant, and define \(\theta = \frac{\rho ^{\frac{1}{2}} - \rho ^{\frac{m}{2}}}{1-\rho ^{\frac{1}{2}}}\). Suppose the steplength \(\gamma >0\) satisfies \(\gamma \le \frac{k^{1/2}(1-\rho ^{-1})-2}{4 L_{nor} \left( 1+ \theta \right) }\). Then, under Assumptions 1 and 2, we have
$$\begin{aligned} \mathbb {E}\Vert \varvec{\theta }_{[t-1]}^{[s]} - {\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert ^2 \le \rho \mathbb {E}\Vert \varvec{\theta }_{[t]}^{[s]} - {\overline{\varvec{\theta }}}_{[t+1]}^{[s]}\Vert ^2 \end{aligned}$$
(17)
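The constant \(\theta\) is the geometric sum \(\sum _{i=1}^{m-1} \rho ^{i/2}\), which is how it arises when the sums over \(t'\) in (26) and (30) are bounded. The following Python sketch checks the closed form and those two sum bounds for illustrative values of \(\rho\) and \(m\); the function names are hypothetical.

```python
import math


def theta_closed_form(rho, m):
    """theta = (rho^{1/2} - rho^{m/2}) / (1 - rho^{1/2}), as defined in Lemma 2 (rho > 1)."""
    r = math.sqrt(rho)
    return (r - rho ** (m / 2)) / (1 - r)


def theta_as_sum(rho, m):
    """The same quantity as the geometric sum rho^{1/2} + rho^{2/2} + ... + rho^{(m-1)/2}."""
    return sum(rho ** (i / 2) for i in range(1, m))


def sum_in_30(rho, t):
    """sum_{t'=0}^{t-1} rho^{(t - t')/2}; bounded by theta when t <= m - 1."""
    return sum(rho ** ((t - tp) / 2) for tp in range(t))


def sum_in_26(rho, t):
    """sum_{t'=0}^{t-1} rho^{(t - 1 - t')/2}; bounded by 1 + theta when t <= m."""
    return sum(rho ** ((t - 1 - tp) / 2) for tp in range(t))


rho, m = 1.5, 10
th = theta_closed_form(rho, m)
print(th, theta_as_sum(rho, m))
print(max(sum_in_30(rho, t) for t in range(m)) <= th + 1e-9)
print(max(sum_in_26(rho, t) for t in range(1, m + 1)) <= 1 + th + 1e-9)
```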
Proof
According to (A.8) in (Liu and Wright, 2015), we have
$$\begin{aligned}{} & {} \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]}\Vert ^2 - \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]}\Vert ^2\nonumber \\{} & {} \quad \le 2 \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} + {\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert \end{aligned}$$
(18)
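Inequality (18) is the elementary bound \(\Vert a\Vert ^2 - \Vert b\Vert ^2 \le 2\Vert a\Vert \Vert b - a\Vert\), which follows from \(\Vert a\Vert ^2 - \Vert b\Vert ^2 = 2\langle a, a-b\rangle - \Vert a-b\Vert ^2\) and the Cauchy-Schwarz inequality, applied with \(a = \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]}\) and \(b = \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]}\). A randomized Python spot check (a sanity test, not a proof):

```python
import math
import random


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def lhs_rhs_18(a, b):
    """Return (||a||^2 - ||b||^2, 2 ||a|| ||b - a||), the two sides of (18)."""
    diff = [bi - ai for ai, bi in zip(a, b)]
    return norm(a) ** 2 - norm(b) ** 2, 2.0 * norm(a) * norm(diff)


random.seed(2)
violations = 0
for _ in range(10000):
    a = [random.gauss(0, 1) for _ in range(5)]
    b = [ai + random.gauss(0, 0.5) for ai in a]   # b is a random perturbation of a
    lhs, rhs = lhs_rhs_18(a, b)
    if lhs > rhs + 1e-12:
        violations += 1
print(violations)
```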
The second factor on the right-hand side of (18) is bounded as follows when \(\mathcal {B}=\{i_t\}\) and a single block \(J(t)\) is updated.
$$\begin{aligned}{} & {} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} + {\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert \nonumber \\{} & {} \quad = \left\| \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }_{[t]}^{[s]} + \gamma \varvec{{v}}_{[t]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} + \varvec{\theta }_{[t-1]}^{[s]} - \gamma \varvec{v}_{[t-1]}^{[s]} \right\| \nonumber \\{} & {} \quad \le \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} \Vert + \gamma \left\| \varvec{{v}}_{[t]}^{[s]} - \varvec{v}_{[t-1]}^{[s]} \right\| =\Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} \Vert \nonumber \\{} & {} \quad + \gamma \left\| \nabla F_{i_t}(\varvec{\theta }_{[t]}^{[s]})- \nabla F_{i_t}(\varvec{\theta }^{[s-1]}) \right. \nonumber \\{} & {} \left. \quad + \varvec{\mu }^{[s-1]} - \nabla F_{i_{t-1}}(\varvec{\theta }_{[t-1]}^{[s]})+ \nabla F_{i_{t-1}}(\varvec{\theta }^{[s-1]}) - \varvec{\mu }^{[s-1]} \right\| \nonumber \\{} & {} \quad = \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} \Vert + \gamma \left\| \nabla F_{i_t}(\varvec{\theta }_{[t]}^{[s]}) \right. \nonumber \\{} & {} \left. 
\quad - \nabla F_{i_t}(\varvec{\theta }^{[s-1]}) - \nabla F_{i_{t-1}}(\varvec{\theta }_{[t-1]}^{[s]}) + \nabla F_{i_{t-1}}(\varvec{\theta }^{[s-1]}) \right\| \nonumber \\{} & {} \quad \le \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} \Vert \nonumber \\{} & {} \quad + \gamma \left\| \nabla F_{i_t}(\varvec{\theta }_{[t]}^{[s]}) -\nabla F_{i_t}(\varvec{\theta }^{[s-1]}) \right\| + \gamma \left\| \nabla F_{i_{t-1}}(\varvec{\theta }_{[t-1]}^{[s]}) - \nabla F_{i_{t-1}}(\varvec{\theta }^{[s-1]}) \right\| \nonumber \\{} & {} \quad \le \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} \Vert + {\gamma L_{nor}} \left( \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }^{[s-1]} \Vert + \Vert \varvec{\theta }_{[t-1]}^{[s]} - \varvec{\theta }^{[s-1]} \Vert \right) \nonumber \\{} & {} \quad \le \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} \Vert + 2 \gamma L_{nor}\sum _{t' = 0 }^{t-1} \Vert \Delta ^s_{t'} \Vert \end{aligned}$$
(19)
where the first and second inequalities use \(\Vert a_1+a_2 \Vert \le \Vert a_1 \Vert +\Vert a_2 \Vert\), the third inequality uses Assumption 1, and the final inequality follows from \(\Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }^{[s-1]} \Vert =\Vert \sum _{t' = 0 }^{t-1} \Delta ^s_{t'} \Vert \le \sum _{t' = 0 }^{t-1}\Vert \Delta ^s_{t'} \Vert\) and the analogous bound for \(\Vert \varvec{\theta }_{[t-1]}^{[s]} - \varvec{\theta }^{[s-1]} \Vert\).
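The bound (19) works with the variance-reduced gradient \(\varvec{v}_{[t]}^{[s]} = \nabla F_{i_t}(\varvec{\theta }_{[t]}^{[s]}) - \nabla F_{i_t}(\varvec{\theta }^{[s-1]}) + \varvec{\mu }^{[s-1]}\), and (33) later uses the fact that it is unbiased. The sketch below illustrates this with hypothetical least-squares components \(F_i(\varvec{\theta }) = \frac{1}{2}(a_i^\top \varvec{\theta } - b_i)^2\), assuming \(i_t\) is uniform on \(\{1,\ldots ,l\}\) and \(\varvec{\mu }^{[s-1]} = \nabla F(\varvec{\theta }^{[s-1]})\) with \(F\) as in (13); these choices are assumptions for illustration only.

```python
import random


def grad_i(a_i, b_i, theta):
    """Gradient of the hypothetical component F_i(theta) = 0.5 * (a_i . theta - b_i)^2."""
    r = sum(x * y for x, y in zip(a_i, theta)) - b_i
    return [r * x for x in a_i]


def full_grad(A, b, theta):
    """Gradient of F = (1/l) sum_i F_i, as in (13)."""
    l = len(A)
    g = [0.0] * len(theta)
    for a_i, b_i in zip(A, b):
        for j, gj in enumerate(grad_i(a_i, b_i, theta)):
            g[j] += gj / l
    return g


def vr_gradient(A, b, i, theta, snapshot, mu):
    """v = grad F_i(theta) - grad F_i(snapshot) + mu, the estimator appearing in (19)."""
    g_now = grad_i(A[i], b[i], theta)
    g_old = grad_i(A[i], b[i], snapshot)
    return [x - y + z for x, y, z in zip(g_now, g_old, mu)]


random.seed(3)
l, d = 8, 4
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(l)]
b = [random.gauss(0, 1) for _ in range(l)]
snapshot = [random.gauss(0, 1) for _ in range(d)]   # plays the role of theta^{[s-1]}
theta = [random.gauss(0, 1) for _ in range(d)]      # plays the role of theta_{[t]}^{[s]}
mu = full_grad(A, b, snapshot)                      # assumed: full gradient at the snapshot

# Average v over all l equally likely choices of i_t and compare with grad F(theta).
vs = [vr_gradient(A, b, i, theta, snapshot, mu) for i in range(l)]
avg_v = [sum(v[j] for v in vs) / l for j in range(d)]
gap = max(abs(x - y) for x, y in zip(avg_v, full_grad(A, b, theta)))
print(gap)
```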
If \(t=1\), according to (19), we have
$$\begin{aligned} \Vert \varvec{\theta }_{[1]}^{[s]} -\overline{\varvec{\theta }}_{[2]}^{[s]} - \varvec{\theta }_{[0]}^{[s]} + \overline{\varvec{\theta }}_{[1]}^{[s]} \Vert \le \Vert \varvec{\theta }_{[1]}^{[s]} - \varvec{\theta }_{[0]}^{[s]} \Vert + 2\gamma L_{nor} \Vert \Delta ^s_{0} \Vert \end{aligned}$$
(20)
Substituting (20) into (18) and taking expectations, we have
$$\begin{aligned}{} & {} \mathbb {E} \Vert \varvec{\theta }_{[0]}^{[s]} -\overline{\varvec{\theta }}_{[1]}^{[s]}\Vert ^2 - \mathbb {E} \Vert \varvec{\theta }_{[1]}^{[s]} -\overline{\varvec{\theta }}_{[2]}^{[s]}\Vert ^2 \le 2 \mathbb {E} \left( \Vert \varvec{\theta }_{[0]}^{[s]} -\overline{\varvec{\theta }}_{[1]}^{[s]} \Vert \Vert \varvec{\theta }_{[1]}^{[s]} -\overline{\varvec{\theta }}_{[2]}^{[s]} - \varvec{\theta }_{[0]}^{[s]} + \overline{\varvec{\theta }}_{[1]}^{[s]} \Vert \right) \nonumber \\{} & {} \quad \le 2 \mathbb {E} \left( \Vert \varvec{\theta }_{[0]}^{[s]} -\overline{\varvec{\theta }}_{[1]}^{[s]} \Vert \Vert \varvec{\theta }_{[1]}^{[s]} - \varvec{\theta }_{[0]}^{[s]} \Vert \right) + 4\gamma L_{nor} \mathbb {E} \left( \Vert \varvec{\theta }_{[0]}^{[s]} -\overline{\varvec{\theta }}_{[1]}^{[s]} \Vert \Vert \Delta ^s_{0} \Vert \right) \nonumber \\{} & {} \quad \le 2 k^{-\frac{1}{2}} \mathbb {E} \left( \Vert \varvec{\theta }_{[0]}^{[s]} -\overline{\varvec{\theta }}_{[1]}^{[s]} \Vert ^2 \right) + 4\gamma L_{nor} \mathbb {E} \left( \Vert \varvec{\theta }_{[0]}^{[s]} -\overline{\varvec{\theta }}_{[1]}^{[s]} \Vert \Vert \Delta ^s_{0} \Vert \right) \end{aligned}$$
(21)
where the last inequality uses (A.13) in (Liu and Wright, 2015). Further, we bound \(\mathbb {E} \left( \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert \Vert \Delta ^s_{t} \Vert \right)\) as
$$\begin{aligned}{} & {} \mathbb {E} \left( \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert \Vert \Delta ^s_{t} \Vert \right) \le \frac{1}{2}\mathbb {E} \left( k^{-\frac{1}{2}} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 + k^{ \frac{1}{2}} \Vert \Delta ^s_{t} \Vert ^2 \right) \nonumber \\{} & {} \quad = \frac{1}{2} \mathbb {E} \left( k^{-\frac{1}{2}} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 + k^{\frac{1}{2}} \mathbb {E}_{J(t)} \Vert \Delta ^s_{t} \Vert ^2 \right) \nonumber \\{} & {} \quad = \frac{1}{2} \mathbb {E} \left( k^{-\frac{1}{2}} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 + k^{- \frac{1}{2}}\mathbb {E} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 \right) \nonumber \\{} & {} \quad = k^{- \frac{1}{2}}\mathbb {E} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 \end{aligned}$$
(22)
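where the inequality is Young's inequality \(ab \le \frac{1}{2}\left( k^{-\frac{1}{2}} a^2 + k^{\frac{1}{2}} b^2\right)\), and the second equality uses \(\mathbb {E}_{J(t)} \Vert \Delta ^s_{t} \Vert ^2 = \frac{1}{k} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2\). Substituting (22) with \(t=0\) into (21), we have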
$$\begin{aligned} \mathbb {E} \Vert \varvec{\theta }_{[0]}^{[s]} -\overline{\varvec{\theta }}_{[1]}^{[s]}\Vert ^2 - \mathbb {E} \Vert \varvec{\theta }_{[1]}^{[s]} -\overline{\varvec{\theta }}_{[2]}^{[s]}\Vert ^2 \le k^{-\frac{1}{2}} \left( 2 + 4\gamma L_{nor} \right) \mathbb {E} \left( \Vert \varvec{\theta }_{[0]}^{[s]} -\overline{\varvec{\theta }}_{[1]}^{[s]} \Vert ^2 \right) \end{aligned}$$
(23)
which implies that
$$\begin{aligned} \mathbb {E} \Vert \varvec{\theta }_{[0]}^{[s]} -\overline{\varvec{\theta }}_{[1]}^{[s]}\Vert ^2 \le \left( 1 - \frac{ 2 + 4\gamma L_{nor} }{\sqrt{k}} \right) ^{-1} \mathbb {E} \Vert \varvec{\theta }_{[1]}^{[s]} -\overline{\varvec{\theta }}_{[2]}^{[s]}\Vert ^2 \le \rho \mathbb {E} \Vert \varvec{\theta }_{[1]}^{[s]} -\overline{\varvec{\theta }}_{[2]}^{[s]}\Vert ^2 \end{aligned}$$
(24)
where the last inequality follows from \(\rho ^{-1} \le 1 - \frac{ 2 + 4\gamma L_{nor} }{\sqrt{k}} \Leftrightarrow \gamma \le \frac{k^{1/2}(1-\rho ^{-1})-2}{4 L_{nor} }\), and the right-hand condition is implied by the step-size assumption of Lemma 2 because \(\theta \ge 0\). Thus, (17) holds for \(t=1\).
Next, we consider the case \(t>1\) and proceed by induction on \(t\). For \(t' \le t-1\) and any \(\beta >0\), we have
$$\begin{aligned}{} & {} \mathbb {E} \left( \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert \Vert \Delta ^s_{t'} \Vert \right) \le \frac{1}{2}\mathbb {E} \left( k^{-\frac{1}{2}} \beta ^{-1} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 + k^{ \frac{1}{2}} \beta \Vert \Delta ^s_{t'} \Vert ^2 \right) \nonumber \\{} & {} \quad = \frac{1}{2} \mathbb {E} \left( k^{-\frac{1}{2}} \beta ^{-1} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 + k^{ \frac{1}{2}} \beta \mathbb {E}_{J(t')} \Vert \Delta ^s_{t'} \Vert ^2 \right) \nonumber \\{} & {} \quad = \frac{1}{2} \mathbb {E} \left( k^{-\frac{1}{2}} \beta ^{-1} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 + k^{-\frac{1}{2}} \beta \mathbb {E} \Vert \varvec{\theta }_{[t']}^{[s]} -{\overline{\varvec{\theta }}}_{[t'+1]}^{[s]} \Vert ^2 \right) \nonumber \\{} & {} \quad \le \frac{1}{2} \mathbb {E} \left( k^{-\frac{1}{2}} \beta ^{-1} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 + k^{-\frac{1}{2}} \rho ^{t-t'} \beta \mathbb {E} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 \right) \nonumber \\{} & {} \quad {\mathop {\le }\limits ^{\beta =\rho ^{\frac{t'-t}{2}} }} k^{-\frac{1}{2}} \rho ^{\frac{t-t'}{2}} \mathbb {E} \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2 \end{aligned}$$
(25)
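The choice \(\beta =\rho ^{\frac{t'-t}{2}}\) in the last step of (25) minimizes the Young-type bound over \(\beta >0\), by the AM-GM inequality. The Python sketch below, with illustrative values of \(k\), \(\rho\), and \(t-t'\) (the helper name `young_rhs` is hypothetical), evaluates the bound at that \(\beta\) and on a grid of other values.

```python
def young_rhs(k, rho, gap, beta):
    """0.5 * (k^{-1/2} / beta + k^{-1/2} * rho^gap * beta): the coefficient in (25), gap = t - t'."""
    c = k ** -0.5
    return 0.5 * (c / beta + c * rho ** gap * beta)


k, rho, gap = 16, 1.3, 3
beta_star = rho ** (-gap / 2)              # the choice made in (25)
best = young_rhs(k, rho, gap, beta_star)
target = k ** -0.5 * rho ** (gap / 2)      # the right-hand side of (25)
grid = [0.05 * j for j in range(1, 100)]   # other candidate values of beta
print(abs(best - target) < 1e-12)
print(all(young_rhs(k, rho, gap, b) >= best - 1e-12 for b in grid))
```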
Assume, as the induction hypothesis, that (17) holds for all \(t' <t\), so that (25) can be applied with \(t-1\) in place of \(t\). Substituting (19) into (18) and taking expectations on both sides, we obtain
$$\begin{aligned}{} & {} \mathbb {E} \left( \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]}\Vert ^2 - \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]}\Vert ^2 \right) \nonumber \\{} & {} \quad \le 2 \mathbb {E} \left( \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} + {\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert \right) \nonumber \\{} & {} \quad \le 2 \mathbb {E} \left( \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert \left( \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }_{[t-1]}^{[s]} \Vert + 2 \gamma L_{nor}\sum _{t' = 0 }^{t-1} \Vert \Delta ^s_{t'} \Vert \right) \right) \nonumber \\{} & {} \quad = 2 \mathbb {E} \left( \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert \Vert \varvec{\theta }_{[t]}^{[s]} -\varvec{\theta }_{[t-1]}^{[s]} \Vert \right) + 4 \gamma L_{nor} \sum _{t' = 0 }^{t-1} \mathbb {E} \left( \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert \Vert \Delta ^s_{t'} \Vert \right) \nonumber \\{} & {} \quad \le 2 k^{-1/2} \mathbb {E}\left( \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert ^2 \right) + 4 \gamma k^{-1/2} L_{nor} \mathbb {E}\left( \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert ^2 \right) \sum _{t' = 0 }^{t-1}\rho ^{\frac{t-1-t'}{2}}\nonumber \\{} & {} \quad \le k^{-1/2} \mathbb {E}\left( \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert ^2 \right) \left( 2 + 4 \gamma L_{nor} \left( 1+ \frac{\rho ^{\frac{1}{2}} - \rho ^{\frac{m}{2}}}{1-\rho ^{\frac{1}{2}}} \right) \right) \nonumber \\{} & {} \quad = k^{-1/2} \mathbb {E}\left( \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]} \Vert ^2 \right) \left( 2 + 4 \gamma L_{nor} \left( 1+ \theta \right) \right) \end{aligned}$$
(26)
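where the third inequality uses (A.13) in (Liu and Wright, 2015) for the first term and, for the second term, (22) and (25) with \(t-1\) in place of \(t\); the fourth inequality uses \(\sum _{t' = 0 }^{t-1}\rho ^{\frac{t-1-t'}{2}} = 1+\sum _{i=1}^{t-1}\rho ^{\frac{i}{2}} \le 1+\theta\) for \(t \le m\). Based on (26), we have that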
$$\begin{aligned}{} & {} \mathbb {E} \left( \Vert \varvec{\theta }_{[t-1]}^{[s]} -{\overline{\varvec{\theta }}}_{[t]}^{[s]}\Vert ^2 \right) \nonumber \\{} & {} \quad \le \left( 1- k^{-1/2} \left( 2+4 \gamma L_{nor} \left( 1+ \theta \right) \right) \right) ^{-1} \cdot \mathbb {E} \left( \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]}\Vert ^2 \right) \nonumber \\{} & {} \quad \le \rho \mathbb {E} \left( \Vert \varvec{\theta }_{[t]}^{[s]} -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]}\Vert ^2 \right) \end{aligned}$$
(27)
where the last inequality follows from
$$\begin{aligned} \rho ^{-1} \le 1- k^{-1/2} \left( 2+4 \gamma L_{nor} \left( 1+ \theta \right) \right) \Leftrightarrow \gamma \le \frac{k^{1/2}(1-\rho ^{-1})-2}{4 L_{nor} \left( 1+ \theta \right) } \end{aligned}$$
(28)
This completes the proof. \(\square\)
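The equivalence (28) can be read as a recipe for the largest admissible steplength. The sketch below uses illustrative constants, and `gamma_max` and `contraction` are hypothetical helper names; it evaluates the right-hand side of (28) and confirms that, at that value of \(\gamma\), the factor in (27) equals \(\rho\). The bound is positive only when \(k^{1/2}(1-\rho ^{-1}) > 2\), i.e., when the partition has enough blocks.

```python
import math


def theta_q(rho, m):
    """theta from Lemma 2 (rho > 1)."""
    r = math.sqrt(rho)
    return (r - rho ** (m / 2)) / (1 - r)


def gamma_max(k, rho, m, L_nor):
    """Right-hand side of (28); positive only if sqrt(k) * (1 - 1/rho) > 2."""
    return (math.sqrt(k) * (1 - 1 / rho) - 2) / (4 * L_nor * (1 + theta_q(rho, m)))


def contraction(k, rho, m, L_nor, gamma):
    """(1 - k^{-1/2} (2 + 4 gamma L_nor (1 + theta)))^{-1}, the factor in (27)."""
    return 1 / (1 - (2 + 4 * gamma * L_nor * (1 + theta_q(rho, m))) / math.sqrt(k))


# Illustrative constants, not values from the paper.
k, rho, m, L_nor = 100, 1.5, 5, 2.0
g = gamma_max(k, rho, m, L_nor)
print(g, contraction(k, rho, m, L_nor, g))
```

Any smaller positive \(\gamma\) gives a factor strictly below \(\rho\), which is why (28) is stated as an upper bound.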
Lemma 3
Let \(\rho > 1\) be a constant, let \(k\) be the number of blocks in the partition of \(\{1,\ldots ,d \times p \}\), and define \(\theta = \frac{\rho ^{\frac{1}{2}} - \rho ^{\frac{m}{2}}}{1-\rho ^{\frac{1}{2}}}\). Suppose the steplength \(\gamma\) satisfies \(\gamma \le \min \left\{ \frac{1}{\frac{L_{\max }}{2} +\frac{2 L_{\max } \theta }{k^{1/2}}}, \frac{k^{1/2}(1-\rho ^{-1})-2}{4 L_{nor} \left( 1+ \theta \right) } \right\}\). Then, under Assumptions 1 and 2, the expected objective \(\mathbb {E} F(\varvec{\theta }_{[t]}^{[s]})\) is monotonically non-increasing, i.e., \(\mathbb {E} F(\varvec{\theta }_{[t+1]}^{[s]}) \le \mathbb {E} F(\varvec{\theta }_{[t]}^{[s]})\).
Proof
Taking the expectation of \(F(\varvec{\theta }_{[t+1]}^{[s]})\) with respect to \(J(t)\), we have
$$\begin{aligned}{} & {} \mathbb {E}_{J(t)} F(\varvec{\theta }_{[t+1]}^{[s]}) = \mathbb {E}_{J(t)} F(\varvec{\theta }_{[t]}^{[s]} + \Delta _t^s) \nonumber \\{} & {} \quad \le \mathbb {E}_{J(t)} \left( F(\varvec{\theta }_{[t]}^{[s]} ) + \left\langle \nabla _{J(t)} F(\varvec{\theta }_{[t]}^{[s]}), ( \Delta _t^s)_{J(t)}\right\rangle + \frac{L_{\max }}{2} \left\| ( \Delta _t^s)_{J(t)} \right\| ^2 \right) \nonumber \\{} & {} \quad = F(\varvec{\theta }_{[t]}^{[s]} ) + \mathbb {E}_{J(t)} \left( \left\langle \nabla _{J(t)} F(\varvec{\theta }_{[t]}^{[s]}), (\Delta _t^s)_{J(t)}\right\rangle \right. \nonumber \\{} & {} \left. \quad + \frac{L_{\max }}{2} \left\| ( \Delta _t^s)_{J(t)} \right\| ^2 \right) \nonumber \\{} & {} \quad = F(\varvec{\theta }_{[t]}^{[s]} ) + \mathbb {E}_{J(t)} \left( \left\langle (\varvec{v}^s_t)_{J(t)}, ( \Delta _t^s)_{J(t)}\right\rangle + \frac{L_{\max }}{2} \left\| ( \Delta _t^s)_{J(t)} \right\| ^2 \right. \nonumber \\{} & {} \left. \quad +\left\langle \nabla _{J(t)} F(\varvec{\theta }_{[t]}^{[s]}) -(\varvec{v}^s_t)_{J(t)}, ( \Delta _t^s)_{J(t)}\right\rangle \right) \nonumber \\{} & {} \quad \le F(\varvec{\theta }_{[t]}^{[s]}) + \mathbb {E}_{J(t)} \left( -\frac{1}{\gamma } \left\| ( \Delta _t^s)_{J(t)} \right\| ^2 + \frac{L_{\max }}{2} \left\| ( \Delta _t^s)_{J(t)} \right\| ^2 \right. \nonumber \\{} & {} \quad \left. 
+\left\langle \nabla _{J(t)} F(\varvec{\theta }_{[t]}^{[s]}) -(\varvec{v}^s_t)_{J(t)}, ( \Delta _t^s)_{J(t)}\right\rangle \right) \nonumber \\{} & {} \quad = F(\varvec{\theta }_{[t]}^{[s]}) + \mathbb {E}_{J(t)} \left( \left( \frac{L_{\max }}{2} -\frac{1}{\gamma } \right) \left\| ( \Delta _t^s)_{J(t)} \right\| ^2 \right) \nonumber \\{} & {} \quad +\mathbb {E}_{J(t)} \left\langle \nabla _{J(t)} F(\varvec{\theta }_{[t]}^{[s]}) -(\varvec{v}^s_t)_{J(t)}, ( \Delta _t^s)_{J(t)}\right\rangle \nonumber \\{} & {} \quad = F(\varvec{\theta }_{[t]}^{[s]}) + \frac{1}{k}\left( \frac{L_{\max }}{2} -\frac{ 1}{\gamma } \right) \Vert \overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert ^2\nonumber \\{} & {} \quad +\mathbb {E}_{J(t)} \left\langle \nabla _{J(t)} F(\varvec{\theta }_{[t]}^{[s]}) -(\varvec{v}^s_t)_{J(t)}, ( \Delta _t^s)_{J(t)}\right\rangle \end{aligned}$$
(29)
where the first inequality uses (6), and the second inequality uses (14) in Lemma 1.
For the expectation of the last term on the right-hand side of (29), we have
$$\begin{aligned}{} & {} \mathbb {E} \left\langle \nabla _{J(t)} F(\varvec{\theta }_{[t]}^{[s]}) -(\varvec{v}^s_t)_{J(t)}, (\Delta _t^s)_{J(t)}\right\rangle \nonumber \\{} & {} \quad = \mathbb {E} \left\langle \frac{1}{|\mathcal {B}'|}\sum _{i\in \mathcal {B}'} \nabla _{J(t)} F_i(\varvec{\theta }_{[t]}^{[s]}) -(\varvec{v}^s_t)_{J(t)}, (\Delta _t^s)_{J(t)}\right\rangle \nonumber \\{} & {} \quad = \mathbb {E} \left\langle \frac{1}{|\mathcal {B}'|}\sum _{i\in \mathcal {B}'} \nabla _{J(t)} F_i(\varvec{\theta }_{[t]}^{[s]}) - \left( \nabla F_{i_{t}}(\varvec{\theta }_{[t]}^{[s]}) \right. \right. \nonumber \\{} & {} \quad \left. \left. - \nabla F_{i_{t}}(\varvec{\theta }^{[s-1]}) + \varvec{\mu }^{[s-1]} \right) _{J(t)}, ( \Delta _t^s)_{J(t)}\right\rangle \nonumber \\{} & {} \quad = \mathbb {E} \frac{1}{|\mathcal {B}'|}\sum _{i\in \mathcal {B}'} \left\langle \nabla _{J(t)} F_i(\varvec{\theta }_{[t]}^{[s]}) - \nabla _{J(t)} F_i(\varvec{\theta }^{[s-1]}), ( \Delta _t^s)_{J(t)} \right\rangle \nonumber \\{} & {} \quad - \mathbb {E} \left\langle \nabla _{J(t)} F_{i_{t}}(\varvec{\theta }_{[t]}^{[s]}) - \nabla _{J(t)} F_{i_{t}}(\varvec{\theta }^{[s-1]}), (\Delta _t^s)_{J(t)} \right\rangle \nonumber \\{} & {} \quad \le \mathbb {E} \frac{1}{|\mathcal {B}'|}\sum _{i\in \mathcal {B}'} \left( \left\| \nabla _{J(t)} F_i(\varvec{\theta }_{[t]}^{[s]}) - \nabla _{J(t)} F_i(\varvec{\theta }^{[s-1]})\right\| \left\| \Delta _t^s \right\| \right) \nonumber \\{} & {} \quad + \mathbb {E} \left( \left\| \nabla _{J(t)} F_{i_{t}}(\varvec{\theta }_{[t]}^{[s]}) - \nabla _{J(t)} F_{i_{t}}(\varvec{\theta }^{[s-1]})\right\| \left\| \Delta _t^s \right\| \right) \nonumber \\{} & {} \quad \le \frac{2L_{\max } }{k}\mathbb {E} \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }^{[s-1]} \Vert \Vert \overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert \nonumber \\{} & {} \quad \le \frac{2L_{\max } }{k}\mathbb {E} \sum _{t'=0}^{t-1} \Vert \Delta _{t'}^s \Vert \Vert 
\overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert \nonumber \\{} & {} \quad \le 2L_{\max }\sum _{t'=0}^{t-1} \frac{\rho ^{\frac{t-t'}{2}}}{k^{3/2}} \mathbb {E} \Vert \overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert ^2\nonumber \\{} & {} \quad \le 2L_{\max } k^{-3/2} \frac{\rho ^{\frac{1}{2}} - \rho ^{\frac{m}{2}}}{1-\rho ^{\frac{1}{2}}} \cdot \mathbb {E} \Vert \overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert ^2\nonumber \\{} & {} \quad = 2L_{\max } k^{-3/2} \theta \mathbb {E} \Vert \overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert ^2 \end{aligned}$$
(30)
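where the first inequality uses the Cauchy-Schwarz inequality (Callebaut, 1965), the second inequality uses Assumption 1, the third inequality uses \(\Vert \sum _{i=1}^n a_i \Vert \le \sum _{i=1}^n \Vert a_i \Vert\), the fourth inequality uses (25), and the fifth inequality uses \(\sum _{t'=0}^{t-1}\rho ^{\frac{t-t'}{2}} \le \sum _{i=1}^{m-1}\rho ^{\frac{i}{2}} = \theta\) for \(t \le m-1\).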
By taking expectations on both sides of (29) and substituting (30), we have
$$\begin{aligned}{} & {} \mathbb {E} F(\varvec{\theta }_{[t+1]}^{[s]})\nonumber \\{} & {} \quad \le \mathbb {E} F(\varvec{\theta }_{[t]}^{[s]}) + \frac{1}{k}\left( \frac{L_{\max }}{2} -\frac{ 1}{\gamma } \right) \mathbb {E} \Vert \overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert ^2 +\mathbb {E} \left\langle \nabla _{J(t)} F(\varvec{\theta }_{[t]}^{[s]}) -(\varvec{v}^s_t)_{J(t)}, ( \Delta _t^s)_{J(t)}\right\rangle \nonumber \\{} & {} \quad \le \mathbb {E} F(\varvec{\theta }_{[t]}^{[s]})- \frac{1}{k} \cdot \left( \frac{ 1}{\gamma }- \frac{L_{\max }}{2} -\frac{2 L_{\max } \theta }{k^{1/2}} \right) \mathbb {E} \Vert \overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert ^2 \end{aligned}$$
(31)
where \(\frac{ 1}{\gamma }- \frac{L_{\max }}{2} -\frac{2 L_{\max } \theta }{k^{1/2}}\ge 0\) because the assumption on \(\gamma\) gives \(\gamma ^{-1} \ge \frac{L_{\max }}{2} +\frac{2 L_{\max } \theta }{k^{1/2}}\). This completes the proof. \(\square\)
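For concrete constants, the steplength in Lemma 3 can be computed directly. The following sketch uses illustrative values of \(k\), \(\rho\), \(m\), \(L_{\max }\), and \(L_{nor}\) (not taken from the paper; the helper names are hypothetical) and checks that the bracketed coefficient in (31) is nonnegative at the resulting \(\gamma\).

```python
import math


def theta_q(rho, m):
    """theta from Lemmas 2 and 3 (rho > 1)."""
    r = math.sqrt(rho)
    return (r - rho ** (m / 2)) / (1 - r)


def gamma_lemma3(k, rho, m, L_max, L_nor):
    """Minimum of the two steplength bounds assumed in Lemma 3."""
    th = theta_q(rho, m)
    g_descent = 1 / (L_max / 2 + 2 * L_max * th / math.sqrt(k))
    g_lemma2 = (math.sqrt(k) * (1 - 1 / rho) - 2) / (4 * L_nor * (1 + th))
    return min(g_descent, g_lemma2)


def descent_coeff(k, rho, m, L_max, gamma):
    """1/gamma - L_max/2 - 2 L_max theta / k^{1/2}, the bracket multiplying the decrease in (31)."""
    th = theta_q(rho, m)
    return 1 / gamma - L_max / 2 - 2 * L_max * th / math.sqrt(k)


# Illustrative constants, not values from the paper.
k, rho, m, L_max, L_nor = 100, 1.5, 5, 1.0, 2.0
gamma = gamma_lemma3(k, rho, m, L_max, L_nor)
print(gamma, descent_coeff(k, rho, m, L_max, gamma))
```

With these constants the Lemma 2 bound is the binding one; the descent bound alone would make the coefficient exactly zero.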
We now prove Theorem 1.
Proof
We have that
$$\begin{aligned}{} & {} \Vert \varvec{\theta }_{[t+1]}^{[s]} - \varvec{\theta }^* \Vert ^2 = \Vert \varvec{\theta }_{[t]}^{[s]} + \Delta _t^s - \varvec{\theta }^* \Vert ^2 \nonumber \\= & {} \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }^* \Vert ^2 - \Vert \Delta _t^s \Vert ^2 - 2 \langle \left( \varvec{\theta }^* -\varvec{\theta }_{[t]}^{[s]} - \Delta _t^s \right) _{J(t)},( \Delta _t^s)_{J(t)} \rangle \nonumber \\= & {} \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }^* \Vert ^2 - \Vert \Delta _t^s \Vert ^2 - 2 \langle \left( \varvec{\theta }^* -\varvec{\theta }_{[t]}^{[s]} - \Delta _t^s \right) _{J(t)}, -\gamma (\varvec{v}_t^s)_{J(t)} \rangle \nonumber \\= & {} \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }^* \Vert ^2 - \Vert \Delta _t^s \Vert ^2 + 2 \gamma \underbrace{ \left( \langle \left( \varvec{\theta }^* -\varvec{\theta }_{[t]}^{[s]} \right) _{J(t)},( {v}_t^s)_{J(t)} \rangle \right) }_{T_1} + 2 \gamma \underbrace{ \left( \langle \left( \Delta _{t}^s \right) _{J(t)},( {v}_t^s)_{J(t)} \rangle \right) }_{T_2} \end{aligned}$$
(32)
For the expectation of \(T_1\), we have
$$\begin{aligned} \mathbb {E}(T_1)= & {} \mathbb {E} { \left( \langle \left( \varvec{\theta }^* -\varvec{\theta }_{[t]}^{[s]} \right) _{J(t)},( {v}_t^s)_{J(t)} \rangle \right) }\nonumber \\= & {} \frac{1}{k}\mathbb {E} \langle \varvec{\theta }^* -\varvec{\theta }_{[t]}^{[s]}, {v}_t^s \rangle \nonumber \\= & {} \frac{1}{k}\mathbb {E} \langle \varvec{\theta }^* -\varvec{\theta }_{[t]}^{[s]}, \nabla F_{i_{t}}(\varvec{\theta }_{[t]}^{[s]}) - \nabla F_{i_{t}}(\varvec{\theta }^{[s-1]}) + \varvec{\mu }^{[s-1]} \rangle \nonumber \\= & {} \frac{1}{k}\mathbb {E} \langle \varvec{\theta }^* -\varvec{\theta }_{[t]}^{[s]}, \nabla F_{i_{t}}(\varvec{\theta }_{[t]}^{[s]}) \rangle + \frac{1}{k} \langle \mathbb {E} ( \varvec{\theta }^* -\varvec{\theta }_{[t]}^{[s]} ), \mathbb {E} (- \nabla F_{i_{t}}(\varvec{\theta }^{[s-1]}) + \varvec{\mu }^{[s-1]} ) \rangle \nonumber \\= & {} \frac{1}{k}\mathbb {E} \langle \varvec{\theta }^* -\varvec{\theta }_{[t]}^{[s]}, \nabla F_{i_{t}}(\varvec{\theta }_{[t]}^{[s]}) \rangle \nonumber \\\le & {} \frac{1}{k}\mathbb {E} \left( F_{i_t}( \varvec{\theta }^*)- F_{i_{t}}(\varvec{\theta }_{[t]}^{[s]}) \right) \nonumber \\= & {} \frac{1}{k}\mathbb {E} \left( F( \varvec{\theta }^*)- F(\varvec{\theta }_{[t]}^{[s]}) \right) \end{aligned}$$
(33)
where the first inequality uses the convexity of \(F_i\). For the expectation of \(T_2\), we have
$$\begin{aligned}{} & {} \mathbb {E}(T_2) =\mathbb {E} \langle \left( \Delta _{t}^s \right) _{J(t)},( {v}_t^s)_{J(t)} \rangle \end{aligned}$$
(34)
$$\begin{aligned}{} & {} \quad = \mathbb {E} \langle \left( \Delta _{t} \right) _{J(t)}, \left( \nabla F_{i_{t}}(\varvec{\theta }_{[t]}^{[s]}) - \nabla F_{i_{t}}(\varvec{\theta }^{[s-1]}) + \varvec{\mu }^{[s-1]} \right) _{J(t)} \rangle \nonumber \\{} & {} \quad = \mathbb {E} \langle \left( \Delta _{t} \right) _{J(t)}, \left( \nabla F_{i_{t}}(\varvec{\theta }_{[t]}^{[s]}) - \nabla F_{i_{t}}(\varvec{\theta }^{[s-1]}) \right) _{J(t)} \rangle +\mathbb {E} \langle \left( \Delta _{t} \right) _{J(t)}, (\varvec{\mu }^{[s-1]})_{J(t)} \rangle \nonumber \\{} & {} \quad \le \frac{1}{k} \mathbb {E} \left( \Vert \overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert \left\| \nabla F_{i_{t}}(\varvec{\theta }_{[t]}^{[s]}) - \nabla F_{i_{t}}(\varvec{\theta }^{[s-1]}) \right\| \right) + \mathbb {E} \langle \left( \Delta _{t} \right) _{J(t)}, (\varvec{\mu }^{[s-1]})_{J(t)} \rangle \nonumber \\{} & {} \quad \le \frac{L_{res }}{k} \mathbb {E} \left( \Vert \overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }^{[s-1]} \Vert \right) + \mathbb {E} \langle \left( \Delta _{t} \right) _{J(t)}, (\varvec{\mu }^{[s-1]})_{J(t)} \rangle \nonumber \\{} & {} \quad \le \frac{L_{res }}{k} \mathbb {E} \left( \sum _{t'=0}^{t-1} \Vert \overline{\varvec{\theta }}_{[t+1]}^{[s]} -\varvec{\theta }_{[t]}^{[s]} \Vert \Vert \Delta ^{s}_{t'} \Vert \right) + \mathbb {E} \langle \left( \Delta _{t} \right) _{J(t)}, (\varvec{\mu }^{[s-1]})_{J(t)} \rangle \nonumber \\{} & {} \quad \le \frac{L_{res}}{k^{3/2}} \sum _{t'=0}^{t-1} \rho ^{(t-t')/2} \mathbb {E}(\Vert \varvec{\theta }_{[t]}^{[s]}-{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2)+ \mathbb {E} \langle \left( \Delta _{t} \right) _{J(t)}, (\varvec{\mu }^{[s-1]})_{J(t)} \rangle \nonumber \\{} & {} \quad \le \frac{ L_{res} \theta }{k^{3/2}} \mathbb {E}(\Vert \varvec{\theta }_{[t]}^{[s]}-{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2)+ \mathbb {E} \langle 
\left( \Delta _{t} \right) _{J(t)}, (\varvec{\mu }^{[s-1]})_{J(t)} \rangle \end{aligned}$$
(35)
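where the first inequality uses the Cauchy-Schwarz inequality, the second inequality uses Assumption 1, the third inequality uses \(\Vert \sum _{i} a_i \Vert \le \sum _{i} \Vert a_i \Vert\), the fourth inequality uses (25), and the last inequality uses the definition of \(\theta\). By substituting the upper bounds from (33) and (35) into (32), we have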
$$\begin{aligned} \mathbb {E}\Vert \varvec{\theta }_{[t+1]}^{[s]} - \varvec{\theta }^* \Vert ^2\le & {} \mathbb {E} \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }^* \Vert ^2 - \frac{1}{k} \mathbb {E}(\Vert \varvec{\theta }_{[t]}^{[s]}-{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2)\nonumber \\{} & {} + \frac{2 \gamma }{k} \mathbb {E} \left( F( \varvec{\theta }^*)- F(\varvec{\theta }_{[t]}^{[s]}) \right) + 2 \gamma \left( \frac{ L_{res} \theta }{k^{3/2}} \mathbb {E}(\Vert \varvec{\theta }_{[t]}^{[s]}\right. \nonumber \\{} & {} \left. -{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2) + \mathbb {E} \langle \left( \Delta _{t} \right) _{J(t)}, (\varvec{\mu }^{[s-1]})_{J(t)} \rangle \right) \nonumber \\= & {} \mathbb {E} \Vert \varvec{\theta }_{[t]}^{[s]} - \varvec{\theta }^* \Vert ^2 + \frac{2 \gamma }{k} \mathbb {E} \left( F( \varvec{\theta }^*)- F(\varvec{\theta }_{[t]}^{[s]}) \right) \nonumber \\{} & {} - \frac{1}{k} \left( 1- \frac{2 L_{res} \theta \gamma }{k^{1/2} }\right) \mathbb {E}(\Vert \varvec{\theta }_{[t]}^{[s]}-{\overline{\varvec{\theta }}}_{[t+1]}^{[s]} \Vert ^2) + 2 \gamma \mathbb {E} \langle \left( \Delta _{t} \right) _{J(t)}, (\varvec{\mu }^{[s-1]})_{J(t)} \rangle \end{aligned}$$
(36)
We consider a fixed stage \(s+1\) with \(\varvec{\theta }_{[0]}^{[s+1]} = \varvec{\theta }_{[m]}^{[s]}\). Summing inequality (36) over \(t = 0,\ldots ,m-1\), we obtain
$$\begin{aligned} \mathbb {E}\Vert \varvec{\theta }^{[s+1]} - \varvec{\theta }^* \Vert ^2\le & {} \mathbb {E}\Vert \varvec{\theta }^{[s]} - \varvec{\theta }^* \Vert ^2+ \sum _{t'=0}^{m-1}\frac{2 \gamma }{k} \mathbb {E} \left( F( \varvec{\theta }^*)\right. \nonumber \\{} & {} \left. - F(\varvec{\theta }^{[s+1]}_{[t']}) \right) - \sum _{t'=0}^{m-1} \frac{1}{k} \left( 1- \frac{2 L_{res} \theta \gamma }{k^{1/2} }\right) \cdot \mathbb {E}(\Vert \varvec{\theta }^{[s+1]}_{[t']}\nonumber \\{} & {} -\overline{\varvec{\theta }}_{[t'+1]}^{[s+1]} \Vert ^2) + 2 \gamma \sum _{t'=0}^{m-1} \mathbb {E} \left\langle \left( \Delta _{t'} \right) _{J(t')}, (\varvec{\mu }^{[s-1]})_{J(t')} \right\rangle \nonumber \\= & {} \mathbb {E}\Vert \varvec{\theta }^{[s]} - \varvec{\theta }^* \Vert ^2 + \sum _{t'=0}^{m-1}\frac{2 \gamma }{k} \mathbb {E} \left( F( \varvec{\theta }^*)- F(\varvec{\theta }^{[s+1]}_{[t']}) \right) \nonumber \\{} & {} - \sum _{t'=0}^{m-1} \frac{1}{k} \left( 1- \frac{2 L_{res} \theta \gamma }{k^{1/2} }\right) \cdot \mathbb {E}(\Vert \varvec{\theta }^{[s+1]}_{[t']}-\overline{\varvec{\theta }}_{[t'+1]}^{[s+1]} \Vert ^2) \nonumber \\{} & {} + 2 \gamma \sum _{t'=0}^{m-1} \mathbb {E} \left\langle \varvec{\theta }^{[s+1]}_{[t']}-\varvec{\theta }^{[s+1]}_{[t'+1]}, \nabla F (\varvec{\theta }^{[s-1]}) \right\rangle \nonumber \\\le & {} \mathbb {E}\Vert \varvec{\theta }^{[s]} - \varvec{\theta }^* \Vert ^2 + \sum _{t'=0}^{m-1}\frac{2 \gamma }{k} \mathbb {E} \left( F( \varvec{\theta }^*)- F(\varvec{\theta }^{[s+1]}_{[t']}) \right) \nonumber \\{} & {} - \sum _{t'=0}^{m-1} \frac{1}{k} \left( 1- \frac{2 L_{res} \theta \gamma }{k^{1/2} }\right) \cdot \mathbb {E}(\Vert \varvec{\theta }^{[s+1]}_{[t']}-\overline{\varvec{\theta }}_{[t'+1]}^{[s+1]} \Vert ^2)\nonumber \\{} & {} + 2 \gamma \mathbb {E} \left( F(\varvec{\theta }^{[s]} ) - F(\varvec{\theta }^{[s+1]}) + \frac{L_{res}}{2k} \sum _{t'=0}^{m-1} \Vert \varvec{\theta }^{[s+1]}_{[t']}-\overline{\varvec{\theta }}_{[t'+1]}^{[s+1]} \Vert ^2 
\right) \nonumber \\\le & {} \mathbb {E}\Vert \varvec{\theta }^{[s]} - \varvec{\theta }^* \Vert ^2 + \frac{2 \gamma }{k}\sum _{t'=0}^{m-1} \left( F( \varvec{\theta }^*) - \mathbb {E}F(\varvec{\theta }^{[s+1]}_{[t']}) \right) \nonumber \\{} & {} + 2 \gamma \left( \mathbb {E}F(\varvec{\theta }^{[s]}) - \mathbb {E}F(\varvec{\theta }^{[s+1]}) \right) \nonumber \\{} & {} - \sum _{t'=0}^{m-1} \frac{1}{k} \left( 1- L_{res} \gamma - \frac{2 L_{res} \theta \gamma }{k^{1/2} }\right) \cdot \mathbb {E}(\Vert \varvec{\theta }^{[s+1]}_{[t']}-\overline{\varvec{\theta }}_{[t'+1]}^{[s+1]} \Vert ^2)\nonumber \\\le & {} \mathbb {E}\Vert \varvec{\theta }^{[s]} - \varvec{\theta }^* \Vert ^2 + \frac{2 \gamma }{k}\sum _{t'=0}^{m-1} \left( F( \varvec{\theta }^*) - \mathbb {E}F(\varvec{\theta }^{[s+1]}_{[t']}) \right) \nonumber \\{} & {} + 2 \gamma \left( \mathbb {E}F(\varvec{\theta }^{[s]}) - \mathbb {E}F(\varvec{\theta }^{[s+1]}) \right) \end{aligned}$$
(37)
where the second inequality uses (3) and the final inequality follows from the condition \(1- L_{res} \gamma - \frac{2 L_{res} \theta \gamma }{k^{1/2} } \ge 0\). Define \(\mathcal {F}(\varvec{\theta }^{[s]}) = \mathbb {E} \Vert \varvec{\theta }^{[s]} - \varvec{\theta }^* \Vert ^2 + 2 \gamma \mathbb {E} \left( F( \varvec{\theta }^{[s]}) - F( \varvec{\theta }^*) \right)\). According to (37), we have
$$\begin{aligned} \mathcal {F}(\varvec{\theta }^{[s+1]})\le & {} \mathcal {F}(\varvec{\theta }^{[s]}) - \frac{2 \gamma }{k}\sum _{t'=0}^{m-1} \mathbb {E} \left( F( \varvec{\theta }^{[s+1]}_{[t']}) - F( \varvec{\theta }^*) \right) \nonumber \\\le & {} \mathcal {F}(\varvec{\theta }^{[s]}) - \frac{2 m \gamma }{ k}\mathbb {E} \left( F( \varvec{\theta }^{[s+1]}) - F( \varvec{\theta }^*) \right) \end{aligned}$$
(38)
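where the second inequality follows from the monotonicity of \(\mathbb {E} F(\varvec{\theta }_{[t]}^{[s]})\) (Lemma 3). Summing (38) over \(s=0,1,\ldots ,S-1\) and using this monotonicity once more to bound each \(\mathbb {E} F(\varvec{\theta }^{[s]})\), \(s \le S\), from below by \(\mathbb {E} F(\varvec{\theta }^{[S]})\), we have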
$$\begin{aligned} \mathcal {F}(\varvec{\theta }^{[S]}) \le \mathcal {F}(\varvec{\theta }^{[0]}) - \frac{2 m \gamma S}{ k}\mathbb {E} \left( F( \varvec{\theta }^{[S]}) - F( \varvec{\theta }^*) \right) \end{aligned}$$
(39)
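Since \(\mathcal {F}(\varvec{\theta }^{[S]}) \ge 0\) (because \(\varvec{\theta }^*\) minimizes \(F\)), rearranging (39) gives \(\mathbb {E} F( \varvec{\theta }^{[S]}) - F( \varvec{\theta }^*) \le \frac{k\, \mathcal {F}(\varvec{\theta }^{[0]})}{2 m \gamma S}\), i.e., a sublinear \(O(1/S)\) convergence rate. This completes the proof. \(\square\)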
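As a minimal numerical sketch (not part of the original analysis), the two quantities that drive this proof can be evaluated directly: the largest \(\gamma\) permitted by \(1- L_{res} \gamma - \frac{2 L_{res} \theta \gamma }{k^{1/2} } \ge 0\), namely \(\gamma \le \frac{k^{1/2}}{L_{res}(k^{1/2}+2\theta )}\), and the suboptimality bound obtained from (39). The function names, the treatment of \(\theta\) as a positive scalar, and the numbers plugged in are illustrative assumptions.

```python
import math


def max_step_size(L_res: float, theta: float, k: int) -> float:
    """Largest gamma with 1 - L_res*gamma - 2*L_res*theta*gamma/sqrt(k) >= 0."""
    return 1.0 / (L_res * (1.0 + 2.0 * theta / math.sqrt(k)))


def suboptimality_bound(F0: float, m: int, gamma: float, k: int, S: int) -> float:
    """Bound on E F(theta^[S]) - F(theta^*) from (39), assuming calF(theta^[S]) >= 0.

    F0 plays the role of calF(theta^[0]); the bound decays like 1/S.
    """
    return k * F0 / (2.0 * m * gamma * S)


# Illustrative values only (hypothetical, not taken from the paper).
L_res, theta, k, m = 2.0, 1.0, 4, 50
gamma = max_step_size(L_res, theta, k)
print(gamma)  # 0.25

# At the largest admissible gamma the condition holds with equality.
slack = 1.0 - L_res * gamma - 2.0 * L_res * theta * gamma / math.sqrt(k)
print(slack)  # 0.0

for S in (10, 100, 1000):
    print(S, suboptimality_bound(F0=1.0, m=m, gamma=gamma, k=k, S=S))
```

Taking \(\gamma\) at its largest admissible value gives the smallest constant in the \(O(1/S)\) bound; any smaller \(\gamma\) also satisfies the condition but slows the bound down proportionally.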
Convergence Analysis of Theorem 2
Lemma 4
The function \(\nu \sum _{j=1}^p\left\| \varvec{\tilde{\beta }}^j - \varvec{\beta }^j \right\| ^2\), viewed as a function of the parameter \(\varvec{\beta }\) in (5), has the normal Lipschitz constant \(2\nu\), defined analogously to Definition 1.
Proof
First, we have that
$$\begin{aligned} \nu \Vert \varvec{\beta } \Vert ^2 - \nu \sum _{j=1}^p\left\| \varvec{\tilde{\beta }}^j - \varvec{\beta }^j \right\| ^2= & {} \nu \Vert \varvec{\beta } \Vert ^2 - \nu \Vert \varvec{\beta } \Vert ^2 + 2\nu \sum _{j=1}^p \langle \tilde{\varvec{\beta }}^j, \varvec{\beta }^j \rangle -\nu \Vert \varvec{\tilde{\beta }} \Vert ^2\nonumber \\= & {} 2\nu \sum _{j=1}^p \langle \varvec{\tilde{\beta }}^j, \varvec{\beta }^j \rangle -\nu \Vert \varvec{\tilde{\beta }} \Vert ^2 \end{aligned}$$
(40)
The right-hand side of (40) is affine, and hence convex, in \(\varvec{\beta }\), with gradient \(2\nu \varvec{\tilde{\beta }}\). Thus, by the first-order characterization of convexity (which here holds with equality), we have
$$\begin{aligned} \nu \Vert \varvec{\beta } \Vert ^2 - \nu \sum _{j=1}^p\left\| \varvec{\tilde{\beta }}^j - \varvec{\beta }^j \right\| ^2 \ge \nu \Vert \varvec{\beta }' \Vert ^2 - \nu \sum _{j=1}^p\left\| \varvec{\tilde{\beta }}^j - \varvec{\beta }'^j \right\| ^2 + \left\langle 2\nu \varvec{\tilde{\beta }}, \varvec{\beta } - \varvec{\beta }' \right\rangle \end{aligned}$$
(41)
Based on (41), and writing \(2\nu \varvec{\tilde{\beta }} = 2\nu \varvec{\beta }' + (2\nu \varvec{\tilde{\beta }} - 2\nu \varvec{\beta }')\), we have that
$$\begin{aligned}{} & {} - \nu \sum _{j=1}^p\left\| \varvec{\tilde{\beta }}^j - \varvec{\beta }^j \right\| ^2 \ge - \nu \Vert \varvec{\beta } \Vert ^2 - \nu \Vert \varvec{\beta }' \Vert ^2 + \left\langle 2\nu \varvec{\beta }', \varvec{\beta } \right\rangle \nonumber \\{} & {} - \nu \sum _{j=1}^p\left\| \varvec{\tilde{\beta }}^j - \varvec{\beta }'^j \right\| ^2 + \left\langle 2\nu \varvec{\tilde{\beta }} - 2\nu \varvec{\beta }', \varvec{\beta } -\varvec{\beta }' \right\rangle \nonumber \\= & {} - \nu \sum _{j=1}^p\left\| \varvec{\tilde{\beta }}^j - \varvec{\beta }'^j \right\| ^2 + \left\langle 2\nu \varvec{\tilde{\beta }} - 2\nu \varvec{\beta }', \varvec{\beta } -\varvec{\beta }' \right\rangle -\frac{2 \nu }{2}\Vert \varvec{\beta } - \varvec{\beta }' \Vert ^2 \end{aligned}$$
(42)
According to (42), we have that the function \(\nu \sum _{j=1}^p\left\| \varvec{\tilde{\beta }}^j - \varvec{\beta }^j \right\| ^2\) with the parameter \(\varvec{\beta }\) in (5) has the normal Lipschitz constant \(2\nu\). This completes the proof. \(\square\)
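The claim of Lemma 4 can be spot-checked numerically. The hypothetical sketch below assumes \(\varvec{\beta }\) and \(\varvec{\tilde{\beta }}\) are stored as \(p\) blocks of equal length \(d\) filled with random entries, and checks that \(h(\varvec{\beta }) = \nu \sum _{j} \Vert \varvec{\tilde{\beta }}^j - \varvec{\beta }^j \Vert ^2\) satisfies \(h(\varvec{\beta }) \le h(\varvec{\beta }') + \langle \nabla h(\varvec{\beta }'), \varvec{\beta } - \varvec{\beta }' \rangle + \frac{2\nu }{2}\Vert \varvec{\beta } - \varvec{\beta }' \Vert ^2\) with \(\nabla h(\varvec{\beta }') = 2\nu (\varvec{\beta }' - \varvec{\tilde{\beta }})\). For this quadratic the inequality is in fact an equality. The values of \(\nu\), \(p\) and \(d\) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
nu, p, d = 0.7, 3, 4  # hypothetical weight, number of blocks, block size
beta_tilde = rng.normal(size=(p, d))  # row j holds the block tilde-beta^j


def h(beta: np.ndarray) -> float:
    """h(beta) = nu * sum_j ||tilde-beta^j - beta^j||^2."""
    return float(nu * np.sum((beta_tilde - beta) ** 2))


def grad_h(beta: np.ndarray) -> np.ndarray:
    """Gradient of h: 2 * nu * (beta - tilde-beta)."""
    return 2.0 * nu * (beta - beta_tilde)


def quadratic_upper_bound(beta: np.ndarray, beta_prime: np.ndarray, L: float) -> float:
    """h(beta') + <grad h(beta'), beta - beta'> + (L / 2) * ||beta - beta'||^2."""
    diff = beta - beta_prime
    return float(h(beta_prime) + np.sum(grad_h(beta_prime) * diff) + 0.5 * L * np.sum(diff ** 2))


for _ in range(5):
    beta, beta_prime = rng.normal(size=(p, d)), rng.normal(size=(p, d))
    # With L = 2 * nu the bound is attained exactly, so 2 * nu cannot be lowered.
    assert np.isclose(h(beta), quadratic_upper_bound(beta, beta_prime, 2.0 * nu))
```

Replacing `2.0 * nu` by any smaller constant makes the right-hand side fall below `h(beta)` whenever `beta != beta_prime`, which is why \(2\nu\) is the tight constant here.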
Lemma 5
Let \(\varvec{\bar{\theta }}^{[1]}\) be the solution of (6) produced by one iteration of batch gradient descent with learning rate \(\frac{1}{L_{nor}}\). Assume \(\mathbb {E} \bar{\mathcal {F}}(\varvec{\theta }^{[S]};\varvec{\beta }) \le \bar{\mathcal {F}}(\varvec{\bar{\theta }}^{[1]};\varvec{\beta })\) for each call of the DSG algorithm. Then, for the DSGAM algorithm, we have
$$\begin{aligned} {\mathcal {F}}(\varvec{\theta }^{[t]},\varvec{\beta }^{[t+1]}) - {\mathbb {E}} {\mathcal {F}}(\varvec{\theta }^{[t+1]},\varvec{\beta }^{[t+1]}) \ge \frac{1}{2L_{nor} } \left\| \nabla F(\varvec{\theta }^{[t]}) \right\| ^2 \end{aligned}$$
(43)
where \(\nabla F(\varvec{\theta }^{[t]}) = \frac{1}{l}\sum _{i=1}^l \nabla F_i(\varvec{\theta }^{[t]})\).
Proof
According to the \(L_{nor}\)-Lipschitz continuity of the gradient of \(\mathcal {F}(\varvec{\theta },\varvec{\beta })\) w.r.t. the parameter \(\varvec{\theta }\) and \(\varvec{\theta }^{[t+1]} = \varvec{\theta }^{[t]} - \frac{1}{L_{nor}} \nabla F(\varvec{\theta }^{[t]})\), we have that
$$\begin{aligned}{} & {} \mathbb {E} \mathcal {F}(\varvec{\theta }^{[t+1]},\varvec{\beta }^{[t+1]}) \le \mathcal {F}(\overline{\theta ^{[t]}}^{[1]},\varvec{\beta }^{[t+1]})\nonumber \\{} & {} \quad \le \mathcal {F}(\varvec{\theta }^{[t]},\varvec{\beta }^{[t+1]}) + \left\langle \nabla F(\varvec{\theta }^{[t]}), \varvec{\theta }^{[t+1]} - \varvec{\theta }^{[t]} \right\rangle + \frac{L_{nor}}{2} \left\| \varvec{\theta }^{[t+1]} - \varvec{\theta }^{[t]} \right\| ^2\nonumber \\{} & {} \quad = \mathcal {F}(\varvec{\theta }^{[t]},\varvec{\beta }^{[t+1]}) - \frac{1}{L_{nor}} \left\| \nabla F(\varvec{\theta }^{[t]}) \right\| ^2+ \frac{1}{2L_{nor}} \left\| \nabla F(\varvec{\theta }^{[t]}) \right\| ^2\nonumber \\{} & {} \quad = \mathcal {F}(\varvec{\theta }^{[t]},\varvec{\beta }^{[t+1]}) - \frac{1}{2 L_{nor}} \left\| \nabla F(\varvec{\theta }^{[t]}) \right\| ^2 \end{aligned}$$
(44)
where the first inequality uses Assumption 1 and the first equality uses \(\varvec{\theta }^{[t+1]} = \varvec{\theta }^{[t]} - \frac{1}{L_{nor}} \nabla F(\varvec{\theta }^{[t]})\). Rearranging (44) gives (43). This completes the proof. \(\square\)
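The key step in (44), namely that a gradient step with learning rate \(1/L_{nor}\) lowers the objective by at least \(\frac{1}{2L_{nor}}\Vert \nabla F(\varvec{\theta }^{[t]})\Vert ^2\), can be checked on a toy problem. The sketch below is a hypothetical stand-in rather than the paper's model: it takes \(F(\varvec{\theta }) = \frac{1}{l}\sum _i F_i(\varvec{\theta })\) with least-squares terms \(F_i(\varvec{\theta }) = \frac{1}{2}(\mathbf {a}_i^\top \varvec{\theta } - b_i)^2\) on random data, ignores \(\varvec{\beta }\), and uses the largest eigenvalue of the averaged Hessian as \(L_{nor}\).

```python
import numpy as np

rng = np.random.default_rng(1)
l, n = 20, 5  # hypothetical number of samples and dimension
A = rng.normal(size=(l, n))
b = rng.normal(size=l)


def F(theta: np.ndarray) -> float:
    """F(theta) = (1/l) * sum_i 0.5 * (a_i^T theta - b_i)^2."""
    return float(0.5 * np.mean((A @ theta - b) ** 2))


def grad_F(theta: np.ndarray) -> np.ndarray:
    """(1/l) * sum_i grad F_i(theta)."""
    return A.T @ (A @ theta - b) / l


# Lipschitz constant of grad F: largest eigenvalue of the Hessian A^T A / l.
L_nor = float(np.linalg.eigvalsh(A.T @ A / l).max())

theta_t = rng.normal(size=n)
g = grad_F(theta_t)
theta_next = theta_t - g / L_nor  # one batch gradient step with learning rate 1/L_nor
decrease = F(theta_t) - F(theta_next)
guaranteed = float(g @ g) / (2.0 * L_nor)
print(decrease, guaranteed)
```

For a quadratic the decrease equals \(\Vert g\Vert ^2/L_{nor} - g^\top H g/(2L_{nor}^2)\), which is at least \(\Vert g\Vert ^2/(2L_{nor})\) because \(g^\top H g \le L_{nor}\Vert g\Vert ^2\).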
Now, we provide the proof of Theorem 2.
Proof
According to Lemma 3.3 in Beck (2015), we have that
$$\begin{aligned}{} & {} \mathbb {E} \mathcal {F}(\varvec{\theta }^{[t+1]},\varvec{\beta }^{[t+1]}) - \mathcal {F}(\varvec{\theta }^{*},\varvec{\beta }^{*}) \le \left\| \nabla F(\varvec{\theta }^{[t]}) \right\| \left( \left\| \varvec{\theta }^{[t]} -\varvec{\theta }^{*} \right\| + \left\| \varvec{\beta }^{[t+1]} -\varvec{\beta }^{*} \right\| \right) \nonumber \\{} & {} \quad \le \left\| \nabla F(\varvec{\theta }^{[t]}) \right\| \left( \left\| \varvec{\theta }^{[0]} -\varvec{\theta }^{*} \right\| + \left\| \varvec{\beta }^{[0]} -\varvec{\beta }^{*} \right\| \right) =\left\| \nabla F(\varvec{\theta }^{[t]}) \right\| R_2 \end{aligned}$$
(45)
where the second inequality uses the fact \(\left\| \varvec{\theta }^{[t]} -\varvec{\theta }^{*} \right\| + \left\| \varvec{\beta }^{[t+1]} -\varvec{\beta }^{*} \right\| \le \left\| \varvec{\theta }^{[0]} -\varvec{\theta }^{*} \right\| + \left\| \varvec{\beta }^{[0]} -\varvec{\beta }^{*} \right\|\). According to Lemma 5, we have that
$$\begin{aligned}{} & {} \mathcal {F}(\varvec{\theta }^{[t]},\varvec{\beta }^{[t]}) - \mathbb {E} \mathcal {F}(\varvec{\theta }^{[t+1]},\varvec{\beta }^{[t+1]})\nonumber \\{} & {} \quad \ge \mathcal {F}(\varvec{\theta }^{[t]},\varvec{\beta }^{[t+1]}) - \mathbb {E} \mathcal {F}(\varvec{\theta }^{[t+1]},\varvec{\beta }^{[t+1]})\nonumber \\{} & {} \quad \ge \frac{1}{2 L_{nor} } \left\| \nabla F(\varvec{\theta }^{[t]}) \right\| ^2\nonumber \\{} & {} \quad \ge \frac{ \left( \mathbb {E} \mathcal {F}(\varvec{\theta }^{[t+1]},\varvec{\beta }^{[t+1]}) - \mathcal {F}(\varvec{\theta }^{*},\varvec{\beta }^{*}) \right) ^2 }{2 L_{nor} R_2^2} \end{aligned}$$
(46)
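Here, the first inequality in (46) holds because the \(\varvec{\beta }\)-update does not increase \(\mathcal {F}(\varvec{\theta }^{[t]},\cdot )\), the second follows from Lemma 5, and the third follows from (45). Similarly, considering line 3 of DSGAM and applying Lemma 4, we have that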
$$\begin{aligned} \mathcal {F}(\varvec{\theta }^{[t]},\varvec{\beta }^{[t]}) - \mathbb {E} \mathcal {F}(\varvec{\theta }^{[t+1]},\varvec{\beta }^{[t+1]}) \ge \frac{ \left( \mathbb {E} \mathcal {F}(\varvec{\theta }^{[t+1]},\varvec{\beta }^{[t+1]}) - \mathcal {F}(\varvec{\theta }^{*},\varvec{\beta }^{*}) \right) ^2 }{4 \nu R_2^2} \end{aligned}$$
(47)
This inequality is proved in Beck (2015).
Since (46) and (47) both hold, taking the larger of the two lower bounds gives
$$\begin{aligned} \mathcal {F}(\varvec{\theta }^{[t]},\varvec{\beta }^{[t]}) - \mathbb {E} \mathcal {F}(\varvec{\theta }^{[t+1]},\varvec{\beta }^{[t+1]}) \ge \frac{ \left( \mathbb {E} \mathcal {F}(\varvec{\theta }^{[t+1]},\varvec{\beta }^{[t+1]}) - \mathcal {F}(\varvec{\theta }^{*},\varvec{\beta }^{*}) \right) ^2 }{2\min \{L_{nor}, 2\nu \} R_2^2} \end{aligned}$$
(48)
According to Lemma 3.6 in Beck (2015) and (48), we have that
$$\begin{aligned} \mathbb {E} \mathcal {F}(\varvec{\theta }^{[T]},\varvec{\beta }^{[T]}) - \mathcal {F}(\varvec{\theta }^{*},\varvec{\beta }^{*}) \le \max \left\{ \left( \frac{1}{2} \right) ^{\frac{T-1}{2}} R_1, \frac{8 \min \{L_{nor}, 2\nu \} R_2^2}{T-1} \right\} \end{aligned}$$
(49)
This completes the proof. \(\square\)
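The final step can be illustrated by simulating the worst case of the recursion behind (48). In the hypothetical sketch below, \(A_k\) stands for \(\mathbb {E} \mathcal {F}(\varvec{\theta }^{[k]},\varvec{\beta }^{[k]}) - \mathcal {F}(\varvec{\theta }^{*},\varvec{\beta }^{*})\), the recursion constant \(\kappa = 1/(2\min \{L_{nor}, 2\nu \} R_2^2)\) is read off from (48), \(A_0\) is set to \(R_1\) (an assumption about the role of \(R_1\)), and all numeric values are arbitrary. Each step solves \(A_k - A_{k+1} = \kappa A_{k+1}^2\) with equality, and the code checks a bound of the form \(A_T \le \max \{(1/2)^{(T-1)/2} A_0, \frac{4}{\kappa (T-1)}\}\), which is our reading of Lemma 3.6 in Beck (2015).

```python
import math


def worst_case_gaps(A0: float, kappa: float, n_steps: int) -> list[float]:
    """Sequence with A_k - A_{k+1} = kappa * A_{k+1}**2 (equality case of (48))."""
    gaps = [A0]
    for _ in range(n_steps):
        a = gaps[-1]
        # Positive root of kappa * x**2 + x - a = 0.
        gaps.append((-1.0 + math.sqrt(1.0 + 4.0 * kappa * a)) / (2.0 * kappa))
    return gaps


# Hypothetical constants, chosen only for illustration.
L_nor, nu, R1, R2 = 4.0, 1.5, 10.0, 2.0
kappa = 1.0 / (2.0 * min(L_nor, 2.0 * nu) * R2 ** 2)

gaps = worst_case_gaps(R1, kappa, 200)
for T in range(2, len(gaps)):
    bound = max(0.5 ** ((T - 1) / 2) * R1, 4.0 / (kappa * (T - 1)))
    assert gaps[T] <= bound + 1e-12
print(gaps[-1], 4.0 / (kappa * (len(gaps) - 2)))
```

Early on, the geometric term dominates while the gap is large; after a few steps the \(1/(T-1)\) term takes over, which is the sublinear regime described by the second argument of the maximum.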