To derive a practical algorithm, we need to satisfy the update rules (C1) and (C2), as well as the partial monotonicity conditions (G-PM) and (\(\hbox {F}^*\)-PM). As we have already discussed in Sect. 3, this can be done when for some \({\widetilde{\tau }}_i>0\) we set
$$\begin{aligned} {\widetilde{T}}_i={\widetilde{\tau }}_i I, \quad \text {and}\quad {\hat{T}}_i={\widetilde{\tau }}_i I. \end{aligned}$$
(37)
The result of these choices is Algorithm 2, whose convergence we studied in Theorem 3.1. Our task now is to verify its conditions, in particular (G-pc) and (\(\hbox {F}^*\)-pc) [alternatively (\(\hbox {F}^*-\hbox {pm}\)) and (G-pm)], as well as (\(\hbox {C1}''\)), (\(\hbox {C2}''\)), and (\(\hbox {C3}''\)) for \(\Gamma \) of the projection form \(\gamma P\).
An Approach to Updating \(\Sigma \)
We have not yet defined an explicit update rule for \(\Sigma _{i+1}\), merely requiring that it satisfy (\(\hbox {C2}''\)) and (\(\hbox {C1}''\)). The former in particular requires
$$\begin{aligned} \Sigma ^{-1}_{i+1} \ge {\widetilde{\omega }}_i (1-\delta )^{-1} K T_i K^*. \end{aligned}$$
With the help of some linear operator \(\mathcal {F} \in \mathcal {L}(\mathcal {L}(Y; Y); \mathcal {L}(Y;Y))\) satisfying
$$\begin{aligned} \mathcal {F}(K T_i K^*) \ge K T_i K^*, \end{aligned}$$
(38)
our approach is to define
$$\begin{aligned} \Sigma ^{-1}_{i+1} :={{\widetilde{\omega }}_i}(1-\delta )^{-1} \mathcal {F}(K T_i K^*). \end{aligned}$$
(39)
Then, (\(\hbox {C2}''\)) is satisfied provided \(T^{-1}_i \in \mathcal {Q}\). Since \({\widetilde{\tau }}_{i+1}={\widetilde{\omega }}_i{\widetilde{\tau }}_i\) gives \( {\widetilde{\tau }}_{i+1}^{-1}\Sigma _{i+1}^{-1} ={\widetilde{\tau }}_i^{-1}(1-\delta )^{-1} \mathcal {F}(K T_i K^*), \) the condition (\(\hbox {C1}''\)) reduces to requiring, for each \(i \in \mathbb {N}\), that
$$\begin{aligned}&{\widetilde{\tau }}_i^{-1} (I + 2 \Gamma T_i) T^{-1}_i - {\widetilde{\tau }}_{i+1}^{-1} T^{-1}_{i+1} \ge -2{\widetilde{\tau }}_i^{-1}(\Gamma _i-\Gamma ), \quad \text {and} \end{aligned}$$
(40a)
$$\begin{aligned}&\frac{1}{1-\delta }\left( {\widetilde{\tau }}_{i}^{-1}\mathcal {F}\left( KT_{i}K^*\right) - {\widetilde{\tau }}_{i+1}^{-1}\mathcal {F}\left( KT_{i+1} K^*\right) \right) \nonumber \\&\quad \ge -2 {\widetilde{\tau }}_{i+1}^{-1} R_{i+1}. \end{aligned}$$
(40b)
To apply Theorem 3.1, it remains to verify, in special cases, the conditions (40) together with (\(\hbox {C3}''\)) and the partial strong convexity conditions (G-pc) and (\(\hbox {F}^*\)-pc).
When \(\Gamma \) is a Multiple of a Projection
We now take \(\Gamma =\bar{\gamma }P\) for some \(\bar{\gamma }>0\) and a projection operator \(P \in \mathcal {L}(X; X)\), that is, P is idempotent, \(P^2=P\), and self-adjoint, \(P^*=P\). We let \(P^\perp :=I-P\), so that \(P^\perp P=P P^\perp =0\). We then assume that \(\widetilde{\mathcal {K}}\) satisfies, for some \(\bar{\gamma }^\perp >0\),
$$\begin{aligned}{}[0, \bar{\gamma }^\perp P^\perp ] \subset \widetilde{\mathcal {K}}. \end{aligned}$$
(41)
To unify our analysis for gap and non-gap estimates of Theorem 3.1, we now pick \(\lambda =1/2\) in the former case, and \(\lambda =1\) in the latter. We then pick \(0 \le \gamma \le \lambda \bar{\gamma }\), and \(0 \le \gamma _i^\perp \le \lambda \bar{\gamma }^\perp \), and set
$$\begin{aligned} T_i= & {} \tau _i P+\tau _i^\perp P^\perp , \quad \Omega _i=\omega _i P+\omega _i^\perp P^\perp , \quad \text {and}\nonumber \\ \Gamma _i= & {} \gamma P + \gamma _i^\perp P^\perp . \end{aligned}$$
(42)
With this, \(\tau _i, \tau _i^\perp > 0\) guarantee \(T_i \in {\mathcal {Q}}\), and \(T_i\) is self-adjoint. Moreover, since \(\gamma P \in \lambda [0, \Gamma ]\) and, by (41), \(\gamma _i^\perp P^\perp \in \lambda \widetilde{\mathcal {K}}\), we have \(\Gamma _i \in \lambda ([0, \Gamma ] + \widetilde{\mathcal {K}})\), exactly as required in both the gap and the non-gap cases of Theorem 3.1.
Since
$$\begin{aligned} KT_{i}K^*= & {} \tau _{i} KPK^* +\tau _{i}^\perp KP^\perp K^*\\= & {} (\tau _{i} - \tau _{i}^\perp ) KPK^* + \tau _{i}^\perp KK^*, \end{aligned}$$
we are encouraged to take
$$\begin{aligned} {\mathcal {F}}(K T_{i} K^*) :=\max \{0, \tau _{i} - \tau _{i}^\perp \} \Vert KP\Vert ^2 I + \tau _{i}^\perp \Vert K\Vert ^2 I.\nonumber \\ \end{aligned}$$
(43)
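Observe that (43) satisfies (38): since \(KPK^* \le \Vert KP\Vert ^2 I\) and \(KK^* \le \Vert K\Vert ^2 I\), while \((\tau _i-\tau _i^\perp )KPK^* \le 0\) whenever \(\tau _i < \tau _i^\perp \), each term in the expansion of \(KT_iK^*\) above is bounded by the corresponding term of (43). Inserting (43) into (39), we obtain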
$$\begin{aligned} \Sigma _{i+1}= & {} \sigma _{i+1} I \quad \text {with}\nonumber \\ \sigma ^{-1}_{i+1}= & {} \frac{{\widetilde{\omega }}_i}{1-\delta }\left( \max \{0, \tau _{i} - \tau _{i}^\perp \} \Vert KP\Vert ^2 + \tau _i^\perp \Vert K\Vert ^2\right) .\nonumber \\ \end{aligned}$$
(44)
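To make (43) and (44) concrete, here is a minimal Python sketch, assuming a toy setting with \(X=Y=\mathbb {R}^2\), \(P=\mathrm {diag}(1,0)\) and randomly drawn \(2\times 2\) matrices K (all illustrative assumptions). The hypothetical helper `sigma_next` evaluates the scalar dual step from (44), and the hypothetical helper `satisfies_38` checks the operator inequality (38) by comparing the scalar in (43) with the largest eigenvalue of \(KT_iK^*\), using the closed form for symmetric \(2\times 2\) matrices.

```python
import math
import random

def sigma_next(tau, tau_perp, omega_tilde, delta, norm_KP, norm_K):
    """Scalar dual step sigma_{i+1} from (44)."""
    sigma_inv = omega_tilde / (1.0 - delta) * (
        max(0.0, tau - tau_perp) * norm_KP ** 2 + tau_perp * norm_K ** 2
    )
    return 1.0 / sigma_inv

def eigmax_sym2(a, b, c):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) + math.sqrt(0.25 * (a - c) ** 2 + b * b)

def satisfies_38(K, tau, tau_perp):
    """Check F(K T K^*) >= K T K^* for T = tau P + tau_perp P^perp, P = diag(1, 0)."""
    t = (tau, tau_perp)
    # K T K^* as a symmetric 2x2 matrix
    m = [[sum(K[i][k] * t[k] * K[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]
    norm_KP = math.hypot(K[0][0], K[1][0])  # KP keeps only the first column of K
    ktk = [[sum(K[k][i] * K[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    norm_K = math.sqrt(eigmax_sym2(ktk[0][0], ktk[0][1], ktk[1][1]))  # spectral norm
    f = max(0.0, tau - tau_perp) * norm_KP ** 2 + tau_perp * norm_K ** 2  # scalar in (43)
    # f I - K T K^* is positive semidefinite iff f >= lambda_max(K T K^*)
    return f >= eigmax_sym2(m[0][0], m[0][1], m[1][1]) - 1e-12 * max(1.0, f)

# Usage: (38) should hold for every K and every tau, tau_perp > 0.
random.seed(0)
for _ in range(1000):
    K = [[random.uniform(-2.0, 2.0) for _ in range(2)] for _ in range(2)]
    assert satisfies_38(K, random.uniform(0.01, 3.0), random.uniform(0.01, 3.0))
```

Taking the positive part \(\max \{0, \tau _i-\tau _i^\perp \}\) is what makes the bound valid in both regimes \(\tau _i \ge \tau _i^\perp \) and \(\tau _i < \tau _i^\perp \).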
Since \(\Sigma _{i+1}\) is now a scalar multiple of the identity, in (40b) we also take \(R_{i+1}=\rho _{i+1} I\), assuming for some \(\bar{\rho }>0\) that
$$\begin{aligned}{}[0, \bar{\rho }I] \subset \hat{\mathcal {K}}. \end{aligned}$$
Setting
$$\begin{aligned} \eta _i :={\widetilde{\tau }}_i^{-1}\max \{0, \tau _{i} - \tau _{i}^\perp \} - {\widetilde{\tau }}_{i+1}^{-1}\max \{0, \tau _{i+1} - \tau _{i+1}^\perp \} \end{aligned}$$
we thus expand (40) as
$$\begin{aligned}&{\widetilde{\tau }}_i^{-1} (1 + 2 \gamma \tau _i) \tau ^{-1}_i - {\widetilde{\tau }}_{i+1}^{-1} \tau ^{-1}_{i+1} \ge 0, \end{aligned}$$
(45a)
$$\begin{aligned}&{\widetilde{\tau }}_i^{-1}\tau _i^{\perp ,-1} - {\widetilde{\tau }}_{i+1}^{-1} \tau _{i+1}^{\perp ,-1} \ge -2 {\widetilde{\tau }}_i^{-1} \gamma _i^\perp , \end{aligned}$$
(45b)
$$\begin{aligned}&\frac{1}{1-\delta } \left( \eta _i \Vert KP\Vert ^2+({\widetilde{\tau }}_i^{-1} \tau _i^\perp -{\widetilde{\tau }}_{i+1}^{-1} \tau _{i+1}^\perp )\Vert K\Vert ^2\right) \nonumber \\&\quad \ge -2 {\widetilde{\tau }}_{i+1}^{-1} \rho _{i+1}. \end{aligned}$$
(45c)
We are almost ready to state a general convergence result for projective \(\Gamma \). First, however, we make the penalty terms more explicit. Since the choices (42) satisfy
$$\begin{aligned}&\Gamma _i-\lambda \Gamma =(\gamma -\lambda \bar{\gamma })P + \gamma _i^\perp P^\perp \le \gamma _i^\perp P^\perp \\&\quad \text {and}\quad R_{i+1}=\rho _{i+1} I, \end{aligned}$$
we suppose for simplicity that
$$\begin{aligned} \psi _{\Gamma _i-\lambda \Gamma }(x) =\gamma _i^\perp \psi ^\perp (P^\perp x) \quad \text {and}\quad \phi _{R_{i+1}}(y)=\rho _{i+1} \phi (y)\nonumber \\ \end{aligned}$$
(46)
for some \(\psi ^\perp : P^\perp X \rightarrow \mathbb {R}\) and \(\phi : Y \rightarrow \mathbb {R}\). The conditions (G-pc) and (\(\hbox {F}^*\)-pc) reduce in this case to the satisfaction for some \(\bar{\gamma }, \bar{\gamma }^\perp , \bar{\rho }>0\) of
for all \(x, x' \in X\) and \(0 \le \gamma ^\perp \le \bar{\gamma }^\perp \), as well as of
for all \(y, y' \in Y\) and \(0 \le \rho \le \bar{\rho }\). Analogues of (G-pm) and (\(\hbox {F}^*-\hbox {pm}\)) can be formed.
To summarise the findings of this section, we state the following proposition.
Proposition 4.1
Suppose (G-pcr) and (\(\hbox {F}^*\)-pcr) hold for some projection operator \(P \in \mathcal {L}(X; X)\) and scalars \(\bar{\gamma }, \bar{\gamma }^\perp , \bar{\rho }> 0\). With \(\lambda =1/2\), pick \(\gamma \in [0, \lambda \bar{\gamma }]\). For each \(i \in \mathbb {N}\), suppose (45) is satisfied with
$$\begin{aligned} 0 \le \gamma _i^\perp \le \lambda \bar{\gamma }^\perp , \quad 0 \le \rho _i \le \lambda \bar{\rho }, \quad \text {and}\quad {\widetilde{\tau }}_0 \ge {\widetilde{\tau }}_i >0. \end{aligned}$$
(47)
If we solve (45a) as an equality, define \(T_i\), \(\Gamma _i\), and \(\Sigma _{i+1}\) through (42) and (44), and set \(R_{i+1}=\rho _{i+1}I\), then the iterates of Algorithm 2 satisfy, with \(C_0\) and \(D_{i+1}\) as in (27), the estimate
$$\begin{aligned}&\frac{\delta }{2}\Vert P(x^N-{\widehat{x}})\Vert ^2 + \frac{1}{\tau ^{-1}_0 + 2\gamma }\mathcal {G}^{N}\nonumber \\&\quad \le {\widetilde{\tau }}_N\tau _N\left( C_0 + \sum _{i=0}^{N-1} D_{i+1} \right) . \end{aligned}$$
(48)
If we take \(\lambda =1\), then (48) holds with \(\mathcal {G}^{N} = 0\).
Observe that presently
$$\begin{aligned} D_{i+1}= & {} {\widetilde{\tau }}_i^{-1}\gamma _i^\perp \psi ^\perp (P^\perp ({x^{i+1}-{\widehat{x}}}))\nonumber \\&+\, {\widetilde{\tau }}_{i+1}^{-1}\rho _{i+1}\phi ({y^{i+1}-{\widehat{y}}}). \end{aligned}$$
(49)
Proof
Since the conditions of Theorem 3.1 have either been assumed through (47) or already been verified, we may apply the theorem. Multiplying (26) by \({\widetilde{\tau }}_N\tau _N\), we obtain
$$\begin{aligned} \frac{\delta }{2}\Vert x^N-{\widehat{x}}\Vert _P^2 + {\widetilde{q}}_N{\widetilde{\tau }}_N\tau _N \mathcal {G}^{N} \le {\widetilde{\tau }}_N\tau _N\biggl ( C_0 + \sum _{i=0}^{N-1} D_{i+1} \biggr ).\nonumber \\ \end{aligned}$$
(50)
Now, observe that solving (45a) exactly gives
$$\begin{aligned} {\widetilde{\tau }}_{N}^{-1}\tau _N^{-1}= & {} {\widetilde{\tau }}_{N-1}^{-1}\tau _{N-1}^{-1} + 2\gamma {\widetilde{\tau }}_{N-1}^{-1}\nonumber \\= & {} {\widetilde{\tau }}_{0}^{-1}\tau _{0}^{-1} + \sum _{j=0}^{N-1} 2\gamma {\widetilde{\tau }}_{j}^{-1} = {\widetilde{\tau }}_{0}^{-1}\tau _{0}^{-1} + 2\gamma {\widetilde{q}}_N. \end{aligned}$$
(51)
Since \({\widetilde{q}}_N \ge {\widetilde{\tau }}_0^{-1}\) for \(N \ge 1\), we therefore have the estimate
$$\begin{aligned} {\widetilde{q}}_N{\widetilde{\tau }}_{N}\tau _N= & {} \frac{\widetilde{q}_N}{{{\widetilde{\tau }}^{-1}_{0}}\tau _{0}^{-1}+ 2\gamma {\widetilde{q}_N}} \nonumber \\= & {} \frac{1}{{\widetilde{\tau }}_{0}^{-1}\tau _{0}^{-1} \widetilde{q}_N^{-1}+ 2\gamma } \ge \frac{1}{\tau _{0}^{-1} + 2\gamma }. \end{aligned}$$
(52)
With this, (50) yields (48).\(\square \)
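The identities (51) and (52) can be sanity-checked numerically. The following sketch (hypothetical helper name, arbitrary test sequences \({\widetilde{\omega }}_i \in (0,1]\)) propagates (45a) as an equality together with \({\widetilde{\tau }}_{i+1}={\widetilde{\omega }}_i{\widetilde{\tau }}_i\), accumulates \({\widetilde{q}}_N=\sum _{j=0}^{N-1}{\widetilde{\tau }}_j^{-1}\), and tests both relations at every N.

```python
def check_51_52(tau0, tt0, gamma, omega_tilde_seq, rtol=1e-9):
    """Propagate (45a) as an equality and test (51) and (52) at every N.

    tt stands for the tilde-tau sequence, updated as tt_{i+1} = omega_tilde_i * tt_i.
    Returns True if both relations hold (up to rtol) along the whole sequence.
    """
    tau, tt, q = tau0, tt0, 0.0  # q accumulates q_N = sum_{j<N} 1/tt_j
    for w in omega_tilde_seq:
        q += 1.0 / tt
        tt_next = w * tt
        # (45a) as an equality: 1/(tt_{i+1} tau_{i+1}) = 1/(tt_i tau_i) + 2 gamma / tt_i
        tau = 1.0 / (tt_next * (1.0 / (tt * tau) + 2.0 * gamma / tt))
        tt = tt_next
        rhs51 = 1.0 / (tt0 * tau0) + 2.0 * gamma * q
        if abs(1.0 / (tt * tau) - rhs51) > rtol * rhs51:
            return False  # (51) violated
        if q * tt * tau < (1.0 - rtol) / (1.0 / tau0 + 2.0 * gamma):
            return False  # (52) violated
    return True

# Usage with a deterministic sequence of omega_tilde values in [0.5, 1].
print(check_51_52(1.0, 2.0, 0.3, [0.5 + 0.05 * (i % 11) for i in range(100)]))
```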
Primal and Dual Penalties with Projective \(\Gamma \)
We now study conditions that guarantee the convergence of the term \({\widetilde{\tau }}_N\tau _N \sum _{i=0}^{N-1} D_{i+1}\) in (48). The right-hand sides of (45b) and (45c) enter \(D_{i+1}\) through (49). In most practical cases, which we study below, \(\phi \) and \(\psi \) transfer these right-hand side penalties into simple linear factors within \(D_{i+1}\). Optimal rates are therefore obtained by solving (45b) and (45c) as equalities, with the right-hand sides proportional to each other. However, since \(\eta _i \ge 0\), and since it will be the case that \(\eta _i=0\) for large i, we replace (45c) by the simpler, sufficient condition
$$\begin{aligned} \frac{1}{1-\delta } ({\widetilde{\tau }}_i^{-1} \tau _i^\perp -{\widetilde{\tau }}_{i+1}^{-1} \tau _{i+1}^\perp )\Vert K\Vert ^2 \ge -2 {\widetilde{\tau }}_{i+1}^{-1} \rho _{i+1}. \end{aligned}$$
(53)
Then, we try to make the left-hand sides of (45b) and (53) proportional with only \(\tau _{i+1}^\perp \) as a free variable. That is, for some proportionality constant \(\zeta > 0\), we solve
$$\begin{aligned} {\widetilde{\tau }}_i^{-1}\tau _i^{\perp ,-1} - {\widetilde{\tau }}_{i+1}^{-1} \tau _{i+1}^{\perp ,-1} = \zeta ({\widetilde{\tau }}_i^{-1} \tau _i^\perp -{\widetilde{\tau }}_{i+1}^{-1} \tau _{i+1}^\perp ). \end{aligned}$$
(54)
Multiplying both sides of (54) by \(\zeta ^{-1}{\widetilde{\tau }}_{i+1}\tau _{i+1}^\perp \) and using \({\widetilde{\tau }}_{i+1}={\widetilde{\omega }}_i{\widetilde{\tau }}_i\) gives the quadratic condition on \(\tau _{i+1}^\perp \)
$$\begin{aligned} \tau _{i+1}^{\perp ,2} +{\widetilde{\omega }}_i(\zeta ^{-1}\tau _i^{\perp ,-1}- \tau _i^\perp ) \tau _{i+1}^\perp - \zeta ^{-1}= 0. \end{aligned}$$
Thus,
$$\begin{aligned} \tau _{i+1}^\perp= & {} \frac{1}{2} \left( {\widetilde{\omega }}_i(\tau _i^\perp -\zeta ^{-1}\tau _i^{\perp ,-1})\right. \nonumber \\&\left. + \sqrt{{\widetilde{\omega }}_i^2(\tau _i^\perp -\zeta ^{-1}\tau _i^{\perp ,-1})^2+4\zeta ^{-1}} \right) . \end{aligned}$$
(55)
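A direct floating-point evaluation of (55) can lose accuracy when the first term inside the parentheses is strongly negative. The sketch below, with hypothetical function names, therefore evaluates the same positive root through the product of the roots, \(-\zeta ^{-1}\), in that case, and then measures the residual of (54), taking \({\widetilde{\tau }}_{i+1}={\widetilde{\omega }}_i{\widetilde{\tau }}_i\).

```python
import math

def tau_perp_next(tau_perp, omega_tilde, zeta):
    """tau_{i+1}^perp from (55): the positive root of t^2 - b t - 1/zeta = 0,
    where b = omega_tilde * (tau_perp - 1/(zeta * tau_perp))."""
    b = omega_tilde * (tau_perp - 1.0 / (zeta * tau_perp))
    disc = math.sqrt(b * b + 4.0 / zeta)
    if b >= 0.0:
        return 0.5 * (b + disc)
    # Same root, rewritten via the product of roots (-1/zeta) to avoid cancellation
    return (2.0 / zeta) / (disc - b)

def residual_54(tau_perp, omega_tilde, zeta, tt=1.0):
    """Left-hand side minus right-hand side of (54), with tt_{i+1} = omega_tilde * tt_i."""
    t = tau_perp_next(tau_perp, omega_tilde, zeta)
    tt_next = omega_tilde * tt
    lhs = 1.0 / (tt * tau_perp) - 1.0 / (tt_next * t)
    rhs = zeta * (tau_perp / tt - t / tt_next)
    return lhs - rhs

# Usage: with tau_perp = 2, omega_tilde = 0.5, zeta = 0.1 the root is 2.5.
print(tau_perp_next(2.0, 0.5, 0.1), residual_54(2.0, 0.5, 0.1))
```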
Solving (45b) and (53) as equalities, (54) and (55) give
$$\begin{aligned} 2{\widetilde{\tau }}_i^{-1} \gamma _i^\perp = \frac{2\zeta (1-\delta )}{\Vert K\Vert ^2} {\widetilde{\tau }}_{i+1}^{-1} \rho _{i+1} = \zeta ({\widetilde{\tau }}_{i+1}^{-1} \tau _{i+1}^\perp -{\widetilde{\tau }}_i^{-1} \tau _i^\perp ).\nonumber \\ \end{aligned}$$
(56)
Note that this quantity is non-negative exactly when \(\omega _i^\perp \ge {\widetilde{\omega }}_i\). We have
$$\begin{aligned} \frac{\omega _i^\perp }{{\widetilde{\omega }}_i}= & {} \frac{\tau _{i+1}^\perp }{\tau _i^\perp {\widetilde{\omega }}_i}\\= & {} \frac{1}{2} \left( 1-\zeta ^{-1}\tau _i^{\perp ,-2}\right. \\&\left. + \sqrt{(1-\zeta ^{-1}\tau _i^{\perp ,-2})^2+4\zeta ^{-1}{\widetilde{\omega }}_i^{-2}\tau _i^{\perp ,-2}} \right) . \end{aligned}$$
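Writing \(s :=\zeta ^{-1}\tau _i^{\perp ,-2}\), the right-hand side is at least 1 if and only if \((1-s)^2+4s{\widetilde{\omega }}_i^{-2} \ge (1+s)^2\), that is, if and only if \({\widetilde{\omega }}_i \le 1\). Hence \(\omega _i^\perp \ge {\widetilde{\omega }}_i\) whenever \({\widetilde{\omega }}_i \le 1\); in particular, (56) is then non-negative.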
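The equivalence between \(\omega _i^\perp \ge {\widetilde{\omega }}_i\) and \({\widetilde{\omega }}_i \le 1\) can also be observed numerically. This sketch (helper names hypothetical) evaluates the closed-form ratio \(\omega _i^\perp /{\widetilde{\omega }}_i\) displayed above, compares it with the ratio \(\tau _{i+1}^\perp /(\tau _i^\perp {\widetilde{\omega }}_i)\) obtained directly from (55), and reports on which side of 1 it falls.

```python
import math

def omega_ratio(tau_perp, omega_tilde, zeta):
    """omega_i^perp / omega_tilde_i from the closed form, with s = 1/(zeta tau_perp^2)."""
    s = 1.0 / (zeta * tau_perp ** 2)
    return 0.5 * (1.0 - s + math.sqrt((1.0 - s) ** 2 + 4.0 * s / omega_tilde ** 2))

def omega_ratio_direct(tau_perp, omega_tilde, zeta):
    """The same ratio computed as tau_{i+1}^perp / (tau_i^perp omega_tilde_i) via (55)."""
    b = omega_tilde * (tau_perp - 1.0 / (zeta * tau_perp))
    t_next = 0.5 * (b + math.sqrt(b * b + 4.0 / zeta))
    return t_next / (tau_perp * omega_tilde)

# Usage: the ratio exceeds 1 for omega_tilde < 1 and drops below 1 for omega_tilde > 1.
for w in (0.5, 1.0, 1.2):
    print(w, omega_ratio(1.0, w, 1.0))
```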
The next lemma summarises these results for the standard choice of \({\widetilde{\omega }}_i\).
Lemma 4.1
Let \(\tau _{i+1}^\perp \) be given by (55), and set
$$\begin{aligned} {\widetilde{\omega }}_i=\omega _i=1/\sqrt{1+2\gamma \tau _i}. \end{aligned}$$
(57)
Then, \(\omega _i^\perp \ge {\widetilde{\omega }}_i\), \({\widetilde{\tau }}_i \le {\widetilde{\tau }}_0\), and (45) is satisfied with the right-hand sides given by the non-negative quantity in (56). Moreover,
$$\begin{aligned} \tau _i^\perp \le \zeta ^{-1/2} \implies \tau _{i+1}^\perp \le \zeta ^{-1/2}. \end{aligned}$$
(58)
Proof
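The choice (57) satisfies (45a) as an equality, so that (45) in its entirety is satisfied with the right-hand sides of (45b)–(45c) given by (56). The bound \({\widetilde{\tau }}_i \le {\widetilde{\tau }}_0\) follows from \({\widetilde{\omega }}_i \le 1\), which also gives \(\omega _i^\perp \ge {\widetilde{\omega }}_i\) by the discussion preceding the lemma. Finally, if \(\tau _i^\perp \le \zeta ^{-1/2}\), then the first term inside the parentheses of (55) is non-positive; since \(u \mapsto u+\sqrt{u^2+4\zeta ^{-1}}\) is increasing, (55) then gives \(\tau _{i+1}^\perp \le \frac{1}{2}\sqrt{4\zeta ^{-1}}=\zeta ^{-1/2}\), which is (58).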
\(\square \)
Specialisation of Algorithm 2 to the choices in Lemma 4.1 yields the steps of Algorithm 3. Observe that \({\widetilde{\tau }}_i\) disappears entirely from the algorithm. To obtain convergence rates, and to justify the initial conditions, we will shortly exploit, for specific \(\phi \) and \(\psi \), the telescoping property stemming from the non-negativity of the last term of (56).
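Algorithm 3 itself is not reproduced in this section, so the following is only a sketch of its step-length recursion, under the assumption that \(\tau _{i+1}=\omega _i\tau _i\) (as the notation \(\Omega _i\) in (42) and the relation \(\omega _i^\perp =\tau _{i+1}^\perp /\tau _i^\perp \) suggest). The hypothetical function `algorithm3_steps` combines (57), (44) and (55); the primal and dual proximal steps are omitted, and \(\Vert KP\Vert \), \(\Vert K\Vert \) are passed in as given numbers.

```python
import math

def algorithm3_steps(tau0, tau_perp0, gamma, zeta, delta, norm_KP, norm_K, n_iter):
    """Step lengths tau_i, tau_i^perp and sigma_{i+1} (hypothetical sketch).

    Only the parameter updates are computed; the proximal steps of
    Algorithm 3 are not reproduced here.
    """
    taus, tau_perps, sigmas = [tau0], [tau_perp0], []
    tau, tau_perp = tau0, tau_perp0
    for _ in range(n_iter):
        omega = 1.0 / math.sqrt(1.0 + 2.0 * gamma * tau)  # (57): omega_tilde_i = omega_i
        sigma_inv = omega / (1.0 - delta) * (              # (44)
            max(0.0, tau - tau_perp) * norm_KP ** 2 + tau_perp * norm_K ** 2)
        sigmas.append(1.0 / sigma_inv)
        b = omega * (tau_perp - 1.0 / (zeta * tau_perp))   # (55)
        tau_perp = 0.5 * (b + math.sqrt(b * b + 4.0 / zeta))
        tau = omega * tau                                  # assumed tau_{i+1} = omega_i tau_i
        taus.append(tau)
        tau_perps.append(tau_perp)
    return taus, tau_perps, sigmas

# Usage: zeta = 1 and tau_perp0 = 0.8 satisfy zeta <= tau_perp0^{-2}.
taus, tau_perps, sigmas = algorithm3_steps(1.0, 0.8, 0.5, 1.0, 0.5, 1.0, 2.0, 200)
print(taus[-1], tau_perps[-1], sigmas[-1])
```

With \(\zeta \le \tau _0^{\perp ,-2}\), the implication (58) keeps every \(\tau _i^\perp \) below \(\zeta ^{-1/2}\).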
There is still, however, one matter to take care of. We need \(\rho _i \le \lambda \bar{\rho }\) and \(\gamma _i^\perp \le \lambda \bar{\gamma }^\perp \), although in many cases of practical interest, the upper bounds are infinite and hence inconsequential. We calculate from (55) and (57) that
$$\begin{aligned} \gamma _i^\perp= & {} \frac{\zeta }{2} ({\widetilde{\omega }}_i^{-1} \tau _{i+1}^\perp - \tau _i^\perp ) = \frac{1}{4} \left( -\zeta \tau _i^\perp -\tau _i^{\perp ,-1} \right. \nonumber \\&\left. +\sqrt{(\zeta \tau _i^\perp -\tau _i^{\perp ,-1})^2+4\zeta {\widetilde{\omega }}_i^{-2}} \right) \nonumber \\\le & {} \sqrt{\zeta ({\widetilde{\omega }}_i^{-2}-1)} =\sqrt{2\zeta \gamma \tau _i} \le \sqrt{2\zeta \gamma \tau _0}. \end{aligned}$$
(60)
Therefore, we need to choose \(\zeta \) and \(\tau _0\) to satisfy \(2\zeta \gamma \tau _0 \le (\lambda \bar{\gamma }^\perp )^2\). Likewise, we calculate from (56), (57), and (60) that
$$\begin{aligned} \rho _{i+1}= & {} \frac{\Vert K\Vert ^2 {\widetilde{\omega }}_i }{(1-\delta )\zeta } \gamma _i^\perp \le \frac{\Vert K\Vert ^2 {\widetilde{\omega }}_i }{(1-\delta )\zeta } \sqrt{2 \zeta \gamma \tau _i}\\\le & {} \frac{\Vert K\Vert ^2}{(1-\delta )\zeta } \sqrt{2 \zeta \gamma \tau _0}. \end{aligned}$$
This tells us to choose \(\tau _0\) and \(\zeta \) to satisfy \(2 \Vert K\Vert ^4 \gamma \tau _0/((1-\delta )^2 \zeta ) \le (\lambda \bar{\rho })^2\). Overall, we obtain for \(\tau _0\) and \(\zeta \) the condition
$$\begin{aligned} 0 < \tau _0 \le \frac{\lambda ^2}{2\gamma } \min \left\{ \frac{\bar{\gamma }^{\perp ,2}}{\zeta }, \frac{\bar{\rho }^2 \zeta (1-\delta )^2}{\Vert K\Vert ^4} \right\} . \end{aligned}$$
(61)
This can always be satisfied through suitable choices of \(\tau _0\) and \(\zeta \).
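The admissibility conditions (60) and (61) can be checked on a numerical example. The sketch below (hypothetical helper names, illustrative parameter values, and again assuming \(\tau _{i+1}=\omega _i\tau _i\)) computes the largest \(\tau _0\) allowed by (61), then generates \(\gamma _i^\perp \) and \(\rho _{i+1}\) from (55)–(57) and verifies the bounds \(\gamma _i^\perp \le \sqrt{2\zeta \gamma \tau _i}\), \(\gamma _i^\perp \le \lambda \bar{\gamma }^\perp \) and \(\rho _{i+1} \le \lambda \bar{\rho }\).

```python
import math

def tau0_max(lam, gamma, zeta, delta, gamma_perp_bar, rho_bar, norm_K):
    """Largest tau_0 allowed by (61)."""
    return lam ** 2 / (2.0 * gamma) * min(
        gamma_perp_bar ** 2 / zeta,
        rho_bar ** 2 * zeta * (1.0 - delta) ** 2 / norm_K ** 4,
    )

def penalty_parameters(tau0, tau_perp0, gamma, zeta, delta, norm_K, n_iter):
    """Yield (tau_i, gamma_i^perp, rho_{i+1}) from (55), (56) and (57)."""
    tau, tp = tau0, tau_perp0
    for _ in range(n_iter):
        w = 1.0 / math.sqrt(1.0 + 2.0 * gamma * tau)        # (57)
        b = w * (tp - 1.0 / (zeta * tp))
        tp_next = 0.5 * (b + math.sqrt(b * b + 4.0 / zeta))  # (55)
        g_perp = 0.5 * zeta * (tp_next / w - tp)              # first expression in (60)
        rho = norm_K ** 2 * w / ((1.0 - delta) * zeta) * g_perp
        yield tau, g_perp, rho
        tau, tp = w * tau, tp_next                            # assumed tau_{i+1} = omega_i tau_i

# Usage with lambda = 1/2, gamma = 1/2, zeta = 1, delta = 1/2 and ||K|| = 2.
t0 = tau0_max(0.5, 0.5, 1.0, 0.5, 1.0, 1.0, 2.0)
worst_rho = max(rho for _, _, rho in penalty_parameters(t0, 0.8, 0.5, 1.0, 0.5, 2.0, 500))
print(t0, worst_rho)
```

Fixing \(\zeta \) first and then shrinking \(\tau _0\) is one simple way of meeting (61).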
If now \(\phi \equiv C_\phi \) and \(\psi \equiv C_\psi ^\perp \), using the non-negativity of (56), we calculate
$$\begin{aligned}&\sum _{i=0}^{N-1} {\widetilde{\tau }}^{-1}_{i+1}\rho _{i+1} \phi ({y^{i+1}-{\widehat{y}}}) =\frac{\Vert K\Vert ^2C_\phi }{2(1-\delta )}\sum _{i=0}^{N-1}\nonumber \\&\quad \left( {\widetilde{\tau }}_{i+1}^{-1} \tau _{i+1}^\perp - {\widetilde{\tau }}_i^{-1} \tau _i^\perp \right) \le \frac{\Vert K\Vert ^2 C_\phi }{2(1-\delta )} {\widetilde{\tau }}_{N}^{-1} \tau _{N}^\perp . \end{aligned}$$
(62)
Similarly
$$\begin{aligned} \sum _{i=0}^{N-1} {\widetilde{\tau }}^{-1}_i \gamma _i^\perp \psi ({x^{i+1}-{\widehat{x}}}) \le \frac{\zeta C_\psi ^\perp }{2} {\widetilde{\tau }}_{N}^{-1} \tau _{N}^\perp . \end{aligned}$$
(63)
Using these expressions to expand (49), we obtain the following convergence result.
Theorem 4.1
Suppose (G-pcr) and (\(\hbox {F}^*\)-pcr) hold for some projection operator \(P \in \mathcal {L}(X; X)\), scalars \(\bar{\gamma }, \bar{\gamma }^\perp , \bar{\rho }> 0\) with \(\phi \equiv C_\phi \), and \(\psi \equiv C_\psi ^\perp \), for some constants \(C_\phi , C_\psi ^\perp >0\). With \(\lambda =1/2\), fix \(\gamma \in (0, \lambda \bar{\gamma }]\). Select initial \(\tau _0, \tau ^\perp _0 > 0\), as well as \(\delta \in (0, 1)\) and \(\zeta \le (\tau _0^\perp )^{-2}\) satisfying (61). Then, Algorithm 3 satisfies for some \(C_0,C_\tau >0\) the estimate
$$\begin{aligned}&\frac{\delta }{2}\Vert P(x^N-{\widehat{x}})\Vert ^2 + \frac{1}{\tau ^{-1}_0+2\gamma }\mathcal {G}^{N} \le \frac{C_0 C_\tau ^2}{N^2}\nonumber \\&\quad +\, \frac{C_\tau }{2 N}\left( \zeta ^{1/2} C_\psi ^\perp +\frac{\zeta ^{-1/2}\Vert K\Vert ^2}{1-\delta }C_\phi \right) , \quad (N \ge 1). \end{aligned}$$
(64)
If we take \(\lambda =1\), then (64) holds with \(\mathcal {G}^{N} = 0\).
Proof
During the course of the derivation of Algorithm 3, we have verified (45), solving (45a) as an equality. Moreover, Lemma 4.1 and (61) guarantee (47). We may therefore apply Proposition 4.1. Inserting (62) and (63) into (48) and (49) gives
$$\begin{aligned}&\frac{\delta }{2}\Vert P(x^N-{\widehat{x}})\Vert ^2 + \frac{1}{\tau ^{-1}_0+2\gamma }\mathcal {G}^{N} \le \tau _N{\widetilde{\tau }}_N\nonumber \\&\quad \times \, \biggl ( C_0 + \frac{\zeta C_\psi ^\perp }{2} {\widetilde{\tau }}_{N}^{-1} \tau _{N}^\perp + \frac{\Vert K\Vert ^2 C_\phi }{2(1-\delta )} {\widetilde{\tau }}_{N}^{-1} \tau _{N}^\perp \biggr ). \end{aligned}$$
(65)
The condition \(\zeta \le (\tau _0^\perp )^{-2}\) now guarantees \(\tau _N^\perp \le \zeta ^{-1/2}\) through (58). Moreover, \({\widetilde{\tau }}_i\) is not used in Algorithm 3, so it only affects the convergence rate estimates. We therefore simply take \({\widetilde{\tau }}_0=\tau _0\), so that, by \({\widetilde{\omega }}_i=\omega _i\) in (57), \({\widetilde{\tau }}_N=\tau _N\) for all \(N \in \mathbb {N}\). With this and the bound \(\tau _N \le C_\tau /N\) from Remark 3.2, (64) follows by simple estimation of (65).\(\square \)
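The telescoping argument behind (62) and (63), used in the proof above, can be illustrated numerically. In the sketch below (hypothetical names, illustrative constants \(C_\phi \), \(C_\psi ^\perp \), and again assuming \(\tau _{i+1}=\omega _i\tau _i\)), we take \({\widetilde{\tau }}_0=\tau _0\) as in the proof, so that \({\widetilde{\tau }}_i=\tau _i\), obtain \(\gamma _i^\perp \) and \(\rho _{i+1}\) from (56), and compare the accumulated sums with the right-hand sides of (62) and (63).

```python
import math

def penalty_sums(tau0, tau_perp0, gamma, zeta, delta, norm_K, C_phi, C_psi, n_iter):
    """Return (sum, bound) pairs for (63) and (62), with tilde-tau_i = tau_i."""
    tau, tp = tau0, tau_perp0
    sum_primal = sum_dual = 0.0
    for _ in range(n_iter):
        w = 1.0 / math.sqrt(1.0 + 2.0 * gamma * tau)          # (57)
        b = w * (tp - 1.0 / (zeta * tp))
        tp_next = 0.5 * (b + math.sqrt(b * b + 4.0 / zeta))    # (55)
        tau_next = w * tau                                     # assumed tau_{i+1} = omega_i tau_i
        incr = tp_next / tau_next - tp / tau                   # bracket in (56), non-negative
        g_perp = 0.5 * zeta * tau * incr                       # from 2 g / tau_i = zeta * incr
        rho = norm_K ** 2 * tau_next * incr / (2.0 * (1.0 - delta))
        sum_primal += C_psi * g_perp / tau
        sum_dual += C_phi * rho / tau_next
        tau, tp = tau_next, tp_next
    bound_primal = 0.5 * zeta * C_psi * tp / tau                          # (63)
    bound_dual = norm_K ** 2 * C_phi / (2.0 * (1.0 - delta)) * tp / tau  # (62)
    return sum_primal, bound_primal, sum_dual, bound_dual

print(penalty_sums(1.0, 0.8, 0.5, 1.0, 0.5, 2.0, 1.0, 1.0, 100))
```

The gap between each bound and its sum is exactly the dropped initial term, \(\frac{\zeta C_\psi ^\perp }{2}{\widetilde{\tau }}_0^{-1}\tau _0^\perp \) for (63) and \(\frac{\Vert K\Vert ^2C_\phi }{2(1-\delta )}{\widetilde{\tau }}_0^{-1}\tau _0^\perp \) for (62).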
Remark 4.1
As a special case of Algorithm 3, if we choose \(\zeta = \tau _0^{\perp , -2}\), then we can show from (55) that \(\tau _i^\perp =\tau _0^\perp =\zeta ^{-1/2}\) for all \(i \in \mathbb {N}\).
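Remark 4.1 can be checked in a few lines: with \(\zeta =\tau _0^{\perp ,-2}\), the first term inside the parentheses of (55) vanishes, so (55) returns \(\zeta ^{-1/2}\) regardless of \({\widetilde{\omega }}_i\). The sketch below (hypothetical helper name) iterates (55) for an arbitrary sequence of \({\widetilde{\omega }}_i\).

```python
import math

def tau_perp_sequence(tau_perp0, zeta, omegas):
    """Iterate (55) for a given sequence of omega_tilde_i values."""
    seq = [tau_perp0]
    for w in omegas:
        tp = seq[-1]
        b = w * (tp - 1.0 / (zeta * tp))
        seq.append(0.5 * (b + math.sqrt(b * b + 4.0 / zeta)))
    return seq

# Usage: zeta = tau_perp0^{-2} keeps the sequence constant at tau_perp0 = 0.7.
print(tau_perp_sequence(0.7, 0.7 ** -2, [0.9, 0.5, 0.99, 0.3, 1.0]))
```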
Remark 4.2
The convergence rate provided by Theorem 4.1 is a mixed \(O(1/N^2) + O(1/N)\) rate, similar to that derived in [5] for a type of forward–backward splitting algorithm for smooth G. Ours is, of course, a backward–backward type algorithm. It is interesting to note that, using the differentiability properties of infimal convolutions [23, Proposition 18.7] and the representation of a smooth G as an infimal convolution, it is formally possible to derive a forward–backward algorithm from Algorithm 3. The difficulty lies in combining this conversion trick with the conditions on the step lengths.
Dual Penalty Only with Projective \(\Gamma \)
Continuing with the projective \(\Gamma \) setup of Sect. 4.2, we now study the case \(\widetilde{\mathcal {K}}=\{0\}\), that is, when only the dual penalty \(\phi \) is available with \(\psi \equiv 0\). To use Proposition 4.1, we need to satisfy (47) and (45), with (45a) holding as an equality. Since \(\gamma _i^\perp =0\), (45b) becomes
$$\begin{aligned} {\widetilde{\tau }}_i^{-1}\tau _i^{\perp ,-1} - {\widetilde{\tau }}_{i+1}^{-1} \tau _{i+1}^{\perp ,-1} \ge 0. \end{aligned}$$
(66)
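With respect to \(\tau _{i+1}^{\perp }\), the left-hand side of (45c) is maximised (and the penalty on the right-hand side minimised) when the left-hand side of (66) is minimised, that is, when \(\tau _{i+1}^{\perp }\) is as small as (66) allows. We therefore solve (66) as an equality, which gives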
$$\begin{aligned} \tau _{i+1}^{\perp }= \tau _i^{\perp }{\widetilde{\omega }}_i^{-1}. \end{aligned}$$
Consequently, \(\omega _i^\perp ={\widetilde{\omega }}_i^{-1}\), and (45c) becomes
$$\begin{aligned} \frac{1}{1-\delta }\eta _i \Vert KP\Vert ^2 + \frac{{\widetilde{\tau }}_i^{-2}}{1-\delta } (1-{\widetilde{\omega }}_{i}^{-2})\Vert K\Vert ^2 \ge -2 {\widetilde{\tau }}_{i+1}^{-1} \rho _{i+1}.\nonumber \\ \end{aligned}$$
(67)
To satisfy (45a) at the same time, this suggests choosing, for some yet undetermined \(a_i>0\),
$$\begin{aligned} {\widetilde{\omega }}_{i} :=\frac{1}{\sqrt{1+a_i{\widetilde{\tau }}_i^2}} \quad \text {and}\quad \omega _i :=\frac{1}{{\widetilde{\omega }}_i(1+2\gamma \tau _i)}. \end{aligned}$$
(68)
Since \(\eta _i \ge 0\), (67) is satisfied with the choice (68) if we take
$$\begin{aligned} \rho _{i+1} = {\widetilde{\tau }}_{i+1} a_i\frac{\Vert K\Vert ^2}{2(1-\delta )}. \end{aligned}$$
To use Proposition 4.1, we need to satisfy \(\rho _{i+1} \le \lambda \bar{\rho }\). Since (68) implies that \(\{{\widetilde{\tau }}_i\}_{i=0}^\infty \) is non-increasing, we can satisfy this for large enough i if \(a_i \searrow 0\). To ensure satisfaction for all \(i \in \mathbb {N}\), it suffices to take \(\{a_i\}_{i=0}^\infty \) non-increasing, and satisfy the initial condition
$$\begin{aligned} a_0 {\widetilde{\tau }}_0 \frac{\Vert K\Vert ^2}{2(1-\delta )} \le \lambda \bar{\rho }. \end{aligned}$$
(69)
The rule \({\widetilde{\tau }}_{i+1}={\widetilde{\omega }}_i{\widetilde{\tau }}_i\) and (68) give \({\widetilde{\tau }}_{i+1}^{-2} = {\widetilde{\tau }}_{i}^{-2} + a_i\). We therefore see that
$$\begin{aligned} {\widetilde{\tau }}_N^{-1}\tau _N^{-1}= & {} {\widetilde{\tau }}_0^{-1}\tau _0^{-1} + 2\gamma \sum _{i=0}^{N-1} \sqrt{\textstyle {\widetilde{\tau }}_0^{-2} + \sum _{j=0}^{i-1} a_j}\\\ge & {} 2\gamma \sum _{i=0}^{N-1} \sqrt{\textstyle {\widetilde{\tau }}_0^{-2} + \sum _{j=0}^{i-1} a_j} =: 1/\mu _0^N. \end{aligned}$$
Assuming \(\phi \) to have the structure (46), moreover,
$$\begin{aligned} \sum _{i=0}^{N-1} D_{i+1}= & {} \sum _{i=0}^{N-1} \phi _{{\widetilde{\tau }}_{i+1}^{-1} R_{i+1}}({y^{i+1}-{\widehat{y}}})\nonumber \\= & {} \frac{\Vert K\Vert ^2}{2(1-\delta )} \sum _{i=0}^{N-1} a_i \phi ({y^{i+1}-{\widehat{y}}}). \end{aligned}$$
Thus, the rate (48) in Proposition 4.1 states
$$\begin{aligned} \frac{\delta }{2}\Vert P(x^N-{\widehat{x}})\Vert ^2 + \frac{1}{\tau ^{-1}_0 + 2\gamma }\mathcal {G}^{N} \le \mu _0^N C_0 + \frac{\Vert K\Vert ^2}{2(1-\delta )} \mu _1^N\nonumber \\ \end{aligned}$$
(70)
for
$$\begin{aligned} \mu _1^N :=\mu _0^N \sum _{i=0}^{N-1} a_i \phi ({y^{i+1}-{\widehat{y}}}). \end{aligned}$$
The convergence rate is thus completely determined by \(\mu _0^N\) and \(\mu _1^N\).
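The recursions of this subsection can be traced numerically. The following sketch, with a hypothetical helper name, an arbitrary non-increasing test sequence \(a_i\), and the assumption \(\tau _{i+1}=\omega _i\tau _i\), applies (68) together with \({\widetilde{\tau }}_{i+1}={\widetilde{\omega }}_i{\widetilde{\tau }}_i\); it can be used to confirm both \({\widetilde{\tau }}_{i+1}^{-2}={\widetilde{\tau }}_i^{-2}+a_i\) and the closed form for \({\widetilde{\tau }}_N^{-1}\tau _N^{-1}\) from which \(\mu _0^N\) is defined.

```python
import math

def algorithm4_tilde_steps(tau0, tt0, gamma, a):
    """Propagate tau_i and tilde-tau_i (tt) through (68) for a given list a = [a_0, ...].

    Uses tt_{i+1} = omega_tilde_i tt_i and the assumed rule tau_{i+1} = omega_i tau_i.
    """
    tau, tt = tau0, tt0
    taus, tts = [tau], [tt]
    for a_i in a:
        w_tilde = 1.0 / math.sqrt(1.0 + a_i * tt ** 2)   # (68), first part
        w = 1.0 / (w_tilde * (1.0 + 2.0 * gamma * tau))   # (68), second part
        tau, tt = w * tau, w_tilde * tt
        taus.append(tau)
        tts.append(tt)
    return taus, tts

# Usage with the non-increasing test sequence a_i = 1/(i+1).
taus, tts = algorithm4_tilde_steps(1.0, 2.0, 0.4, [1.0 / (i + 1) for i in range(50)])
print(taus[-1], tts[-1])
```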
Remark 4.3
If \(\phi \equiv 0\), that is, if \(F^*\) is strongly convex, we may simply pick \({\widetilde{\omega }}_i=\omega _i=1/\sqrt{1+2\gamma \tau _i}\), that is, \(a_i=2\gamma \tau _i{\widetilde{\tau }}_i^{-2}\), and obtain from (70) an \(O(1/N^2)\) convergence rate.
For a more generally applicable algorithm, suppose \(\phi ({y^{i+1}-{\widehat{y}}}) \equiv C_\phi \) as in Theorem 4.1. We need to choose \(a_i\). One possibility is to pick some \(q \in (0, 1]\) and
$$\begin{aligned} a_i :={\widetilde{\tau }}_0^{-2}\bigl ((i+1)^{q}-i^{q}\bigr ). \end{aligned}$$
(71)
The concavity of \(t \mapsto t^q\) for \(q \in (0, 1]\) easily shows that \(\{a_i\}_{i=0}^\infty \) is non-increasing. With the choice (71), the inner sums telescope to \(\sum _{j=0}^{i-1} a_j={\widetilde{\tau }}_0^{-2} i^q\), and we compute
$$\begin{aligned}&\sum _{i=0}^{N-1} \sqrt{\textstyle {\widetilde{\tau }}_0^{-2} + \sum _{j=0}^{i-1} a_j} ={\widetilde{\tau }}_0^{-1}\sum _{i=0}^{N-1} \sqrt{1+i^{q}} \ge {\widetilde{\tau }}_0^{-1}\sum _{i=0}^{N-1} i^{q/2}\nonumber \\&\quad \ge {\widetilde{\tau }}_0^{-1} \int _0^{N-1} x^{q/2} \,dx = \frac{{\widetilde{\tau }}_0^{-1}}{1+q/2}(N-1)^{1+q/2}, \end{aligned}$$
and
$$\begin{aligned} \sum _{i=0}^{N-1} a_i \le {\widetilde{\tau }}_0^{-2} N^q. \end{aligned}$$
If \(N \ge 2\), so that \(N-1 \ge N/2\), we find with \(C_a=2^{1+q/2}(1+q/2)/(\lambda \gamma )\) that
$$\begin{aligned} \mu _0^N \le \frac{{\widetilde{\tau }}_0 C_a}{N^{1+q/2}}, \quad \text {and}\quad \mu _1^N \le \frac{C_a C_\phi }{{\widetilde{\tau }}_0N^{1-q/2}}. \end{aligned}$$
(72)
Letting \(q \searrow 0\), the rates in (72) approach a uniform O(1 / N) over both the initialisation and the dual sequence. By choosing \(q=1\), we get \(O(1/N^{3/2})\) convergence with respect to the initialisation, and \(O(1/N^{1/2})\) with respect to the dual sequence.
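The estimates leading to (72) can be checked numerically. The sketch below (hypothetical helper names, arbitrary values of \({\widetilde{\tau }}_0\) and \(\gamma \)) builds \(a_i\) from (71), evaluates \(\mu _0^N\) from its definition, and compares it with the integral bound \({\widetilde{\tau }}_0(1+q/2)/(2\gamma (N-1)^{1+q/2})\) and, using \(N-1 \ge N/2\) for \(N \ge 2\), with \({\widetilde{\tau }}_0(1+q/2)2^{1+q/2}/(2\gamma N^{1+q/2})\). The telescoping identity \(\sum _{i=0}^{N-1}a_i={\widetilde{\tau }}_0^{-2}N^q\), which governs \(\mu _1^N=\mu _0^N C_\phi \sum _{i=0}^{N-1}a_i\), can be checked from the same helper.

```python
import math

def a_seq(q, tt0, n):
    """a_i from (71) for i = 0, ..., n-1 (tt0 stands for tilde-tau_0)."""
    return [((i + 1) ** q - i ** q) / tt0 ** 2 for i in range(n)]

def mu0(q, tt0, gamma, N):
    """mu_0^N = 1 / (2 gamma sum_{i<N} sqrt(tt0^{-2} + sum_{j<i} a_j))."""
    a = a_seq(q, tt0, N)
    total, partial = 0.0, 0.0
    for i in range(N):
        total += math.sqrt(tt0 ** -2 + partial)
        partial += a[i]
    return 1.0 / (2.0 * gamma * total)

# Usage: mu_0^N decays like N^{-(1+q/2)}.
for N in (10, 100, 1000):
    print(N, mu0(1.0, 1.5, 0.7, N))
```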
With these choices, Algorithm 2 yields Algorithm 4, whose convergence properties are stated in the next theorem.
Theorem 4.2
Suppose (G-pcr) and (\(\hbox {F}^*\)-pcr) hold for some projection operator \(P \in \mathcal {L}(X; X)\) and \(\bar{\gamma }, \bar{\gamma }^\perp , \bar{\rho }\ge 0\) with \(\psi \equiv 0\) and \(\phi \equiv C_\phi \) for some constant \(C_\phi \ge 0\). With \(\lambda =1/2\), choose \(\gamma \in (0, \lambda \bar{\gamma }]\), and pick the sequence \(\{a_i\}_{i=0}^\infty \) by (71) for some \(q \in (0, 1]\). Select initial \(\tau _0, \tau _0^\perp , {\widetilde{\tau }}_0 >0\) and \(\delta \in (0, 1)\) verifying (69). Then, Algorithm 4 satisfies
$$\begin{aligned}&\frac{\delta }{2}\Vert P(x^N-{\widehat{x}})\Vert ^2 + \frac{1}{\tau ^{-1}_0 + 2\gamma }\mathcal {G}^{N} \le \frac{{\widetilde{\tau }}_0 C_a C_0}{N^{1+q/2}}\nonumber \\&\quad + \frac{C_a C_\phi \Vert K\Vert ^2}{2(1-\delta ){\widetilde{\tau }}_0 N^{1-q/2}}, \quad (N \ge 2). \end{aligned}$$
(74)
If we take \(\lambda =1\), then (74) holds with \(\mathcal {G}^{N} = 0\).
Proof
We apply Proposition 4.1, whose assumptions we have verified during the course of the present section. In particular, \({\widetilde{\tau }}_i \le {\widetilde{\tau }}_0\) through the choice (68), which forces \({\widetilde{\omega }}_i \le 1\). We have also already derived the rate (70) from (48). Inserting (72) into (70), noting that the former is only valid for \(N \ge 2\), immediately gives (74).\(\square \)