Abstract
In a Hilbert space \({{\mathcal {H}}}\), we study the convergence properties of a class of relaxed inertial forward–backward algorithms. They aim to solve structured monotone inclusions of the form \(Ax + Bx \ni 0\), where \(A:{{\mathcal {H}}}\rightarrow 2^{{\mathcal {H}}}\) is a maximally monotone operator and \(B:{{\mathcal {H}}}\rightarrow {{\mathcal {H}}}\) is a cocoercive operator. We extend to this class of problems the acceleration techniques initially introduced by Nesterov and then developed by Beck and Teboulle for structured convex minimization (FISTA). As a key ingredient of our approach, we develop an inertial and parametric version of the Krasnoselskii–Mann theorem, in which the joint adjustment of the inertia and relaxation parameters plays a central role. This study is a natural extension of the techniques introduced by the authors for the study of relaxed inertial proximal algorithms. As an illustration, we consider the inertial Nash equilibration of a game combining non-cooperative and cooperative aspects.
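The iteration studied in the paper can be sketched numerically. The following Python snippet is a schematic instance of a relaxed inertial forward–backward step for \(0 \in Ax + Bx\) with \(A=\partial g\) and \(B=\nabla f\) on a toy \(\ell _1\)-regularized least-squares problem; the fixed inertia and relaxation parameters `alpha` and `rho` are illustrative choices only, not the paper's parameter rules, which require a joint adjustment of the two sequences.

```python
import numpy as np

def relaxed_inertial_fb(grad_f, prox_g, x0, mu, alpha, rho, n_iter):
    """Schematic relaxed inertial forward-backward iteration:
         y_k     = x_k + alpha * (x_k - x_{k-1})         (inertia)
         z_k     = J_{mu A}(y_k - mu * B(y_k))           (forward-backward step)
         x_{k+1} = (1 - rho) * y_k + rho * z_k           (relaxation)
    """
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iter):
        y = x + alpha * (x - x_prev)
        z = prox_g(y - mu * grad_f(y), mu)
        x_prev, x = x, (1.0 - rho) * y + rho * z
    return x

# Toy instance: A = subdifferential of ||.||_1, B = gradient of 0.5*||Qx - b||^2.
rng = np.random.default_rng(0)
Q = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
L = np.linalg.norm(Q, 2) ** 2                  # Lipschitz constant of B; B is (1/L)-cocoercive
grad_f = lambda x: Q.T @ (Q @ x - b)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # J_{tA}: soft-thresholding
mu = 1.0 / L                                   # step size, inside ]0, 2/L[
x_star = relaxed_inertial_fb(grad_f, prox_g, np.zeros(10),
                             mu=mu, alpha=0.3, rho=0.8, n_iter=5000)
```

A fixed point of the iteration satisfies \(x = J_{\mu A}(x-\mu Bx)\), i.e. \(0\in Ax+Bx\), so the residual \(\Vert x - J_{\mu A}(x-\mu Bx)\Vert \) serves as a natural stopping criterion.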
Notes
In the statements, we use \(t_k\) with index k instead of i so as to keep the notation of the sequences homogeneous.
References
Álvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38(4), 1102–1119 (2000)
Álvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14, 773–782 (2004)
Álvarez, F., Attouch, H.: The Heavy Ball with Friction Dynamical System for Convex Constrained Minimization Problems, Optimization, Namur, 1998. Lecture Notes in Economics and Mathematical Systems, vol. 481, pp. 25–35. Springer, Berlin (2000)
Álvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9(1–2), 3–11 (2001)
Attouch, H., Cabot, A.: Convergence rates of inertial forward–backward algorithms. SIAM J. Optim. 28(1), 849–874 (2018)
Attouch, H., Cabot, A.: Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. HAL-01708905 (2018)
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing damping. Math. Progr. Ser. B 168, 123–175 (2018)
Attouch, H., Maingé, P.-E.: Asymptotic behavior of second-order dissipative evolution equations combining potential with non-potential effects. ESAIM Control Optim. Calc. Var. 17, 836–857 (2010)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than \(\frac{1}{k^2}\). SIAM J. Optim. 26(3), 1824–1834 (2016)
Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Progr. Ser. B. https://doi.org/10.1007/s10107-018-1252-x
Aujol, J.-F., Dossal, C.: Stability of over-relaxations for the forward–backward algorithm, application to FISTA. SIAM J. Optim. 25(4), 2408–2433 (2015)
Baillon, J.B., Brézis, H.: Une remarque sur le comportement asymptotique des semigroupes non linéaires. Houston J. Math. 2(1), 5–7 (1976)
Baillon, J.B., Haddad, G.: Quelques propriétés des opérateurs angle-bornés et \(n\)-cycliquement monotones. Israël J. Math. 26, 137–150 (1977)
Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Bot, R.I., Csetnek, E.R.: Second order forward–backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 54(3), 1423–1443 (2016)
Bot, R.I., Csetnek, E.R., Hendrich, C.: Inertial Douglas-Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256, 472–487 (2015)
Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution. Lecture Notes 5, North Holland (1972)
Brézis, H., Browder, F.E.: Nonlinear ergodic theorems. Bull. Am. Math. Soc. 82(6), 959–961 (1976)
Briceño-Arias, L.M., Combettes, P.L.: A monotone\(+\)skew splitting model for composite monotone inclusions in duality. SIAM J. Optim. 21, 1230–1250 (2011)
Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166, 968–982 (2015)
Chen, G., Rockafellar, R.T.: Convergence rates in forward–backward splitting. SIAM J. Optim. 7, 421–444 (1997)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)
Combettes, P.L., Yamada, I.: Compositions and convex combinations of averaged nonexpansive operators. J. Math. Anal. Appl. 425, 55–70 (2015)
Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization, PhD thesis, Massachusetts Institute of Technology (1989)
Eckstein, J., Bertsekas, D.P.: On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Progr. 55, 293–318 (1992)
Gabay, D.: Applications of the Method of Multipliers to Variational Inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary Value Problems, pp. 299–331. North-Holland, Amsterdam (1983)
Hofbauer, J., Sorin, S.: Best response dynamics for continuous zero-sum games. Discret. Contin. Dyn. Syst. Ser. B 6, 215–224 (2006)
Iutzeler, F., Hendrickx, J.M.: Generic online acceleration scheme for optimization algorithms via relaxation and inertia (2017), arXiv:1603.05398v3 [math.OC]
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Progr. 159(1), 81–107 (2016)
Liang, J., Fadili, J., Peyré, G.: Local linear convergence of forward-backward under partial smoothness. In: Advances in Neural Information Processing Systems, pp. 1970–1978 (2014)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51, 311–325 (2015)
Maingé, P.-E.: Convergence theorems for inertial KM-type algorithms. J. Comput. Appl. Math. 219, 223–236 (2008)
Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155, 447–454 (2003)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Soviet Math. Doklady 27, 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston (2004)
Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591–597 (1967)
Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1, 123–231 (2013)
Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)
Raguet, H., Fadili, J., Peyré, G.: A generalized forward–backward splitting. SIAM J. Imaging Sci. 6, 1199–1226 (2013)
Rockafellar, R.T.: Monotone operators associated with saddle-functions and minimax problems. In: Browder, F.E. (ed.) Nonlinear Functional Analysis, Vol. I, Proceedings of Symposia in Pure Mathematics. American Mathematical Society, Providence (1970)
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Inf. Process. Syst. 27, 2510–2518 (2014)
Tseng, P.: Applications of a splitting algorithm to decomposition in convex programming and variational inequalities. SIAM J. Control Optim. 29, 119–138 (1991)
Tseng, P.: A modified forward–backward splitting method for maximal monotone mappings. SIAM J. Control Optim. 38, 431–446 (2000)
Tseng, P.: On accelerated proximal gradient methods for convex–concave optimization. Technical report (2008)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
Appendices
Appendix A: Yosida Regularization
Let \(A:{{\mathcal {H}}}\rightarrow 2^{{\mathcal {H}}}\) be a maximally monotone operator. For any \(\mu >0\), its resolvent with index \(\mu \) is defined by \(J_{\mu A} = \left( I + \mu A \right) ^{-1}\). The operator \(J_{\mu A}: {{\mathcal {H}}}\rightarrow {{\mathcal {H}}}\) is nonexpansive and everywhere defined (indeed it is firmly nonexpansive). The Yosida regularization of A with parameter \(\mu \) is defined by \( A_{\mu } = \frac{1}{\mu } \left( I- J_{\mu A} \right) \). The operator \(A_{\mu }\) is \(\mu \)-cocoercive: for all \(x, y \in {{\mathcal {H}}}\) we have

$$\begin{aligned} \langle A_{\mu }x-A_{\mu }y,\, x-y\rangle \ge \mu \Vert A_{\mu }x-A_{\mu }y \Vert ^2. \end{aligned}$$
This property immediately implies that \(A_{\mu }: {{\mathcal {H}}}\rightarrow {{\mathcal {H}}}\) is \(\frac{1}{\mu }\)-Lipschitz continuous. Another property that proves useful is the resolvent equation (see, for example, [18, Proposition 2.6] or [14, Proposition 23.6])

$$\begin{aligned} \left( A_{\mu }\right) _{\theta } = A_{\mu + \theta }, \end{aligned}$$
which is valid for any \(\theta , \mu >0\). This property makes it possible to compute the resolvents of \(A_\mu \) simply, via

$$\begin{aligned} J_{\theta A_{\mu }}(x)= \frac{\mu }{\mu +\theta }\, x + \frac{\theta }{\mu +\theta }\, J_{(\mu +\theta ) A}(x) \end{aligned}$$
for any \(\theta , \mu >0\). Also note that for any \(x \in {{\mathcal {H}}}\), and any \(\mu >0\)
Finally, for any \(\mu >0\), \(A_{\mu }\) and A have the same set of zeros: \({\mathrm{zer}}A_{\mu }= {\mathrm{zer}}A\). For a detailed account of the properties of maximally monotone operators and of the Yosida approximation, see References [14, 18].
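As a concrete sanity check, the identities above can be verified numerically for the maximally monotone operator \(A=\partial |\cdot |\) on \({\mathbb {R}}\), whose resolvent \(J_{\mu A}\) is the soft-thresholding map. The sketch below (an illustration, not part of the paper) tests the \(\mu \)-cocoercivity of \(A_\mu \) and the resolvent formula \(J_{\theta A_{\mu }}(x)= \frac{\mu }{\mu +\theta }x + \frac{\theta }{\mu +\theta }J_{(\mu +\theta ) A}(x)\) on random data.

```python
import numpy as np

def J(x, mu):
    # Resolvent of A = subdifferential of |.| : soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def yosida(x, mu):
    # A_mu = (I - J_{mu A}) / mu; for this A it is the clipped map x -> clip(x/mu, -1, 1).
    return (x - J(x, mu)) / mu

rng = np.random.default_rng(1)
x, y = rng.standard_normal(1000), rng.standard_normal(1000)
mu, theta = 0.7, 0.4

# mu-cocoercivity, checked componentwise (each coordinate is a 1D scalar pair):
# <A_mu x - A_mu y, x - y> >= mu * |A_mu x - A_mu y|^2.
d = yosida(x, mu) - yosida(y, mu)
coco_ok = bool(np.all(d * (x - y) >= mu * d**2 - 1e-12))

# Resolvent of the Yosida regularization: the candidate must satisfy
# cand + theta * A_mu(cand) = x, i.e. cand = (I + theta * A_mu)^{-1}(x).
cand = mu / (mu + theta) * x + theta / (mu + theta) * J(x, mu + theta)
resolvent_ok = bool(np.allclose(cand + theta * yosida(cand, mu), x))
```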
Appendix B: Properties of the Operator \(M_{A,B,\mu }\)
The cocoercivity properties of the operator \(M_{A,B,\mu }\), defined by \(M_{A,B,\mu }(x)=\frac{1}{\mu } \left( x- J_{\mu A}( x-\mu B(x))\right) \), play a central role in our analysis.
Lemma B.1
Let A be a maximally monotone operator. Suppose that B is \(\lambda \)-cocoercive, and that \(\mu \in ]0, 2\lambda [\). Then the operator \(M_{A,B,\mu }\) is \(\beta \)-cocoercive with \(\beta = \mu \left( 1-\frac{\mu }{4\lambda }\right) \).
Proof
The proof is based on the link between the cocoercivity property and the \(\alpha \)-averaged property. Recall that \(T: {{\mathcal {H}}}\rightarrow {{\mathcal {H}}}\) is \(\alpha \)-averaged if \(T= (1-\alpha )I + \alpha R\) for some nonexpansive operator \(R: {{\mathcal {H}}}\rightarrow {{\mathcal {H}}}\) and some \(\alpha \in ]0,1[\). The operators \(J_{\mu A}\) and \(I-\mu B\) are \(\alpha _1\)-averaged and \(\alpha _2\)-averaged, respectively, with \(\alpha _1:= \frac{1}{2}\) and \(\alpha _2:= \frac{\mu }{2 \lambda }\); see Reference [14, Corollary 23.8], resp. [14, Proposition 4.33]. By [24, Proposition 2.4], their composition \(J_{\mu A}(I-\mu B)\) is \(\alpha \)-averaged with \(\alpha = \frac{\alpha _1 + \alpha _2 -2 \alpha _1 \alpha _2}{1- \alpha _1 \alpha _2}\), which yields \(\alpha = \frac{1}{2(1-\frac{\mu }{4\lambda } )}\). Hence \(J_{\mu A}(I-\mu B) =(1-\alpha )I + \alpha R\) for some nonexpansive operator R, which gives \(M_{A,B,\mu } = \frac{\alpha }{\mu } (I-R)\). Since \(I-R\) is \(\frac{1}{2}\)-cocoercive (see, for example, [3, Lemma 2.3]), we finally obtain that \(M_{A,B,\mu }\) is \(\frac{1}{2}\times \frac{\mu }{\alpha }= \mu \left( 1-\frac{\mu }{4\lambda }\right) \)-cocoercive. \(\square \)
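The cocoercivity constant \(\beta =\mu (1-\frac{\mu }{4\lambda })\) of Lemma B.1 can be probed numerically. In the sketch below (an illustration with arbitrarily chosen data), \(A\) is the normal cone of a box, so that \(J_{\mu A}\) is a projection, and \(B\) is the gradient of a convex quadratic, which is \(\lambda \)-cocoercive with \(\lambda =1/L\) by the Baillon–Haddad theorem.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
C = rng.standard_normal((n, n))
P = C.T @ C                                   # symmetric positive semidefinite
L_lip = np.linalg.norm(P, 2)                  # Lipschitz constant of B = grad(0.5 x'Px)
lam = 1.0 / L_lip                             # cocoercivity constant of B
B = lambda x: P @ x
proj = lambda x: np.clip(x, -1.0, 1.0)        # J_{mu A} for A = normal cone of [-1,1]^n

mu = lam                                      # any step in ]0, 2*lam[ is admissible
beta = mu * (1.0 - mu / (4.0 * lam))          # predicted cocoercivity constant

def M(x):
    # M_{A,B,mu}(x) = (x - J_{mu A}(x - mu B(x))) / mu
    return (x - proj(x - mu * B(x))) / mu

# Empirical check of <M(x)-M(y), x-y> >= beta * ||M(x)-M(y)||^2 on random pairs.
worst = np.inf
for _ in range(200):
    x, y = 3 * rng.standard_normal(n), 3 * rng.standard_normal(n)
    d = M(x) - M(y)
    denom = d @ d
    if denom > 1e-12:
        worst = min(worst, (d @ (x - y)) / denom)
```

By Lemma B.1 the ratio recorded in `worst` should never fall below \(\beta \), up to floating-point tolerance.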
Lemma B.2
Let \(A:{{\mathcal {H}}}\rightarrow 2^{{\mathcal {H}}}\) be a maximally monotone operator and let \(B: {{\mathcal {H}}}\rightarrow {{\mathcal {H}}}\) be a \(\lambda \)-cocoercive operator for some \(\lambda >0\). Given \((z,q)\in \mathrm{gph}(A+B)\) and \(\theta \in [0,1[\), we have for every \(x\in {{\mathcal {H}}}\) and \(\mu >0\),
In particular, we obtain for \(\theta =0\)
Now assume that \({\mathrm{zer}}(A+B)\ne \emptyset \) and let \(z\in {\mathrm{zer}}(A+B)\). For every \(x\in {{\mathcal {H}}}\), \(\mu >0\) and \(\theta \in [0,1[\), the following inequality holds true
Proof
First observe that for every \(x\in {{\mathcal {H}}}\) and \(\mu >0\), \(M_{A,B,\mu }(x)=A_\mu (x-\mu B(x))+B(x).\) Using the classical property (65) of the Yosida approximation, this leads to
Since \((z,q)\in \mathrm{gph}(A+B)\), we have \(q-B(z)\in A(z)\). The monotonicity of A then yields
Hence
The \(\lambda \)-cocoercivity of B gives
On the other hand, by using the Cauchy–Schwarz inequality, we obtain
By combining (70), (71) and (72), we immediately find (66). Inequality (67) (resp. (68)) is obtained by taking \(\theta =0\) (resp. \(q=0\)) in (66). \(\square \)
Lemma B.3
Let \(A:{{\mathcal {H}}}\rightarrow 2^{{\mathcal {H}}}\) be a maximally monotone operator, and let \(B: {{\mathcal {H}}}\rightarrow {{\mathcal {H}}}\) be a \(\lambda \)-cocoercive operator for some \(\lambda >0\). Let \((\xi _n)\) be a sequence in \({{\mathcal {H}}}\), and let \((\lambda _n)\) be a sequence of positive numbers. Assume that \(\xi _n\rightharpoonup {\overline{x}}\) weakly in \({{\mathcal {H}}}\) and that
Then we have \({\overline{x}}\in {\mathrm{zer}}(A+B)\).
Proof
By using inclusion (69) with \(x=\xi _n\) and \(\mu =\lambda _n\), we obtain
and hence
Since the operator B is \(\frac{1}{\lambda }\)-Lipschitz continuous, we have
We then deduce from (73) that the left-hand side of (74) tends to 0 strongly in \({{\mathcal {H}}}\) as \(n\rightarrow +\infty \). Invoking (73) again, we see that \(\xi _n-\lambda _{n} M_{A,B,\lambda _{n}}(\xi _n)\) converges weakly to \({\overline{x}}\) as \(n\rightarrow +\infty \). Taking the limit as \(n\rightarrow +\infty \) in (74), we conclude that \(0\in (A+B)({\overline{x}})\), due to the graph-closedness of the maximally monotone operator \(A+B\) with respect to the weak-strong topology in \({{\mathcal {H}}}\times {{\mathcal {H}}}\). \(\square \)
We now study the variations of the function \((x,\mu )\mapsto \mu M_{A,B,\mu }(x)\).
Lemma B.4
Let \(A:{{\mathcal {H}}}\rightarrow 2^{{\mathcal {H}}}\) be a maximally monotone operator, and let \(B: {{\mathcal {H}}}\rightarrow {{\mathcal {H}}}\) be a \(\lambda \)-cocoercive operator for some \(\lambda >0\). Assume that \({\mathrm{zer}}(A+B)\ne \emptyset \). Let \(\gamma \), \(\delta \in ]0,2\lambda [\), and x, \(y\in {{\mathcal {H}}}\). Then, for each \(z\in {\mathrm{zer}}(A+B)\), we have
Proof
The proof follows the lines of Reference [10, Lemma A.4]. By using successively the definition of \(M_{A,B,\gamma }\), the resolvent identity [14, Proposition 23.28 (i)], and the nonexpansive property of the resolvent, we obtain
Since \(\delta \in ]0, 2\lambda [\), the operator \(I-\delta B\) is \(\frac{\delta }{2\lambda }\)-averaged, see Reference [14, Proposition 4.33]. This guarantees that the operator \(I-\delta B\) is nonexpansive, hence
On the other hand, observe that for \(z\in {\mathrm{zer}}(A+B)\),
From the triangle inequality combined with the nonexpansive property of the resolvent, we deduce that
By putting together (75), (76) and (77), we conclude that
\(\square \)
Appendix C: Some Auxiliary Results
Lemma C.1
(Attouch-Cabot [6]) Let \((a_k)\), \((\alpha _k)\) and \((w_k)\) be sequences of real numbers satisfying
Assume that \(\alpha _k\ge 0\) for every \(k\ge 1\). Let \((t_i)\) and \((t_{i,k})\) be the sequences respectively defined by (6) and (7).
(i) For every \(k\ge 1\), we have

$$\begin{aligned} \sum _{i=1}^k a_i\le t_{1,k}a_1+\sum _{i=1}^{k-1} t_{i+1,k} w_i. \end{aligned}$$

(ii) Under \((K_0)\), assume that \(\sum _{i=1}^{+\infty }t_{i+1}[w_i]_+<+\infty \). Then the series \(\sum _{i\ge 1}[a_i]_+\) is convergent, and

$$\begin{aligned} \sum _{i= 1}^{+\infty }[a_i]_+\le t_1 [a_1]_+ +\sum _{i=1}^{+\infty } t_{i+1} [w_i]_+. \end{aligned}$$
The proof is omitted; the reader is referred to Reference [6, Lemma B.1]. It makes use of Reference [6, Lemma 2.4], which corresponds to Lemma 2.1.
The next lemma gives basic properties of the averaging process (58), see Reference [6, Lemma B.2] for the proof.
Lemma C.2
(Attouch-Cabot [6]) Let \(({{\mathcal {X}}},\Vert .\Vert )\) be a Banach space and let \((x_k)\) be a bounded sequence of \({{\mathcal {X}}}\). Given a sequence \((\tau _{i,k})_{i,k\ge 1}\) of nonnegative numbers satisfying (56) and (57), let \(({\widehat{x}}_k)\) be the averaged sequence defined by \({\widehat{x}}_k=\sum _{i=1}^{+\infty }\tau _{i,k}x_i\). Then we have
(i) The sequence \(({\widehat{x}}_k)\) is well-defined, bounded and \(\sup _{k\ge 1}\Vert {\widehat{x}}_k\Vert \le \sup _{k\ge 1}\Vert x_k\Vert \).

(ii) If \((x_k)\) converges to \({\overline{x}}\in {{\mathcal {X}}}\), then the sequence \(({\widehat{x}}_k)\) is also convergent and \(\lim _{k\rightarrow +\infty }{\widehat{x}}_k={\overline{x}}\).
Attouch, H., Cabot, A. Convergence of a Relaxed Inertial Forward–Backward Algorithm for Structured Monotone Inclusions. Appl Math Optim 80, 547–598 (2019). https://doi.org/10.1007/s00245-019-09584-z
Keywords
- Structured monotone inclusions
- Inertial forward–backward algorithms
- Cocoercive operators
- Relaxation
- Convergence rate
- Inertial Krasnoselskii–Mann iteration
- Nash equilibration