
Convergence of a relaxed inertial proximal algorithm for maximally monotone operators

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

In a Hilbert space \({\mathcal {H}}\), given \(A{:}\;{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\) a maximally monotone operator, we study the convergence properties of a general class of relaxed inertial proximal algorithms. This study aims to extend to the case of the general monotone inclusion \(Ax \ni 0\) the acceleration techniques initially introduced by Nesterov in the case of convex minimization. The relaxed form of the proximal algorithms plays a central role. It comes naturally with the regularization of the operator A by its Yosida approximation with a variable parameter, a technique recently introduced by Attouch–Peypouquet (Math Program Ser B, 2018. https://doi.org/10.1007/s10107-018-1252-x) for a particular class of inertial proximal algorithms. Our study provides an algorithmic version of the convergence results obtained by Attouch–Cabot (J Differ Equ 264:7138–7182, 2018) in the case of continuous dynamical systems.


Notes

  1. Note that in [4, Proposition 14], a closely related but different condition has been considered: the difference of the quotients is assumed to be less than or equal to c (and this guarantees (\(K_0\))).

References

  1. Álvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14, 773–782 (2004)


  2. Álvarez, F., Attouch, H.: The heavy ball with friction dynamical system for convex constrained minimization problems. In: Optimization (Namur, 1998), Lecture Notes in Economics and Mathematical Systems, vol. 481, pp. 25–35. Springer, Berlin (2000)

  3. Álvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set Valued Anal. 9(1–2), 3–11 (2001)


  4. Attouch, H., Cabot, A.: Convergence rates of inertial forward–backward algorithms. SIAM J. Optim. 28, 849–874 (2018)


  5. Attouch, H., Cabot, A.: Convergence of damped inertial dynamics governed by regularized maximally monotone operators. J. Differ. Equ. 264, 7138–7182 (2018)


  6. Attouch, H., Maingé, P.E.: Asymptotic behavior of second-order dissipative evolution equations combining potential with non-potential effects. ESAIM Control Optim. Calc. Var. 17, 836–857 (2010)


  7. Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Program. Ser. B. https://doi.org/10.1007/s10107-018-1252-x (2018)

  8. Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, Berlin (2011)


  9. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)


  10. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York (1982)


  11. Bot, R.I., Csetnek, E.R.: Second order forward–backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 54, 1423–1443 (2016)


  12. Boţ, R.I., Csetnek, E.R., Hendrich, C.: Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256, 472–487 (2015)


  13. Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d'évolution. Lecture Notes 5. North-Holland (1972)

  14. Brézis, H., Browder, F.E.: Nonlinear ergodic theorems. Bull. Am. Math. Soc. 82(6), 959–961 (1976)


  15. Brézis, H., Lions, P.L.: Produits infinis de résolvantes. Isr. J. Math. 29, 329–345 (1978)


  16. Cabot, A., Frankel, P.: Asymptotics for some proximal-like method involving inertia and memory aspects. Set Valued Var. Anal. 19, 59–74 (2011)


  17. Combettes, P.L., Glaudin, L.E.: Quasinonexpansive iterations on the affine hull of orbits: from Mann’s mean value algorithm to inertial methods. SIAM J. Optim. 27, 2356–2380 (2017)


  18. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)


  19. Eckstein, J., Ferris, M.C.: Operator-splitting methods for monotone affine variational inequalities, with a parallel application to optimal control. Informs J. Comput. 10, 218–235 (1998)


  20. Iutzeler, F., Hendrickx, J.M.: Generic online acceleration scheme for optimization algorithms via relaxation and inertia. Optim. Methods Softw. (2018) (to appear)

  21. Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51, 311–325 (2015)


  22. Maingé, P.-E.: Convergence theorems for inertial KM-type algorithms. J. Comput. Appl. Math. 219, 223–236 (2008)


  23. Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155, 447–454 (2003)


  24. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27, 372–376 (1983)


  25. Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591–597 (1967)


  26. Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)


  27. Pesquet, J.-C., Pustelnik, N.: A parallel inertial proximal optimization method. Pac. J. Optim. 8, 273–305 (2012)



Author information


Corresponding author

Correspondence to Alexandre Cabot.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A. Yosida regularization

A set-valued mapping A from \({\mathcal {H}}\) to \({\mathcal {H}}\) assigns to each \(x\in {\mathcal {H}}\) a set \(A(x)\subset {\mathcal {H}}\), hence it is a mapping from \({\mathcal {H}}\) to \(2^{\mathcal {H}}\). Every set-valued mapping \(A:{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\) can be identified with its graph defined by

$$\begin{aligned}{\mathrm{gph}}A=\{(x,u)\in {\mathcal {H}}\times {\mathcal {H}}: \, u\in A(x)\}.\end{aligned}$$

The set \(\{x\in {\mathcal {H}}:\ 0\in A(x)\}\) of the zeros of A is denoted by \({\mathrm{zer}}A\). An operator \(A:{\mathcal {H}}\rightarrow 2^{\mathcal {H}}\) is said to be monotone if for any \((x,u)\), \((y,v)\in {\mathrm{gph}}A\), one has \(\langle y-x, v-u\rangle \ge 0\). It is maximally monotone if there exists no monotone operator whose graph strictly contains \({\mathrm{gph}}A\). If a single-valued operator \(A:{\mathcal {H}}\rightarrow {\mathcal {H}}\) is continuous and monotone, then it is maximally monotone, cf. [13, Proposition 2.4].
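
As a concrete illustration (ours, not taken from the paper), consider on \({\mathcal {H}}={\mathbb {R}}\) the subdifferential of the absolute value,

$$\begin{aligned} A(x)=\partial |x|= {\left\{ \begin{array}{ll} \{-1\} &{}\quad \text{ if } x<0,\\ \left[ -1,1\right] &{}\quad \text{ if } x=0,\\ \{1\} &{}\quad \text{ if } x>0, \end{array}\right. } \end{aligned}$$

which is maximally monotone, being the subdifferential of a proper lower semicontinuous convex function, and satisfies \({\mathrm{zer}}A=\{0\}\). This operator will serve in the numerical sketch at the end of this appendix.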

Given a maximally monotone operator A and \(\lambda >0\), the resolvent of A with index \(\lambda \) and the Yosida regularization of A with parameter \(\lambda \) are defined by

$$\begin{aligned} J_{\lambda A} = \left( I + \lambda A \right) ^{-1}\qquad \hbox {and}\qquad A_{\lambda } = \frac{1}{\lambda } \left( I- J_{\lambda A} \right) , \end{aligned}$$

respectively. The operator \(J_{\lambda A}: {\mathcal {H}}\rightarrow {\mathcal {H}}\) is everywhere defined and nonexpansive (indeed, it is firmly nonexpansive). Moreover, \(A_{\lambda }\) is \(\lambda \)-cocoercive: for all \(x, y \in {\mathcal {H}}\) we have

$$\begin{aligned} \langle A_{\lambda }y - A_{\lambda }x, y-x\rangle \ge \lambda \Vert A_{\lambda }y - A_{\lambda }x \Vert ^2 . \end{aligned}$$

This property immediately implies that \(A_{\lambda }: {\mathcal {H}}\rightarrow {\mathcal {H}}\) is \(\frac{1}{\lambda }\)-Lipschitz continuous. Another property that proves useful is the resolvent equation (see, for example, [13, Proposition 2.6] or [8, Proposition 23.6])

$$\begin{aligned} (A_\lambda )_{\mu }= A_{(\lambda +\mu )}, \end{aligned}$$

which is valid for any \(\lambda , \mu >0\). This property makes it possible to compute the resolvent of \(A_\lambda \) in closed form:

$$\begin{aligned} J_{\mu A_\lambda } = \frac{\lambda }{\lambda + \mu }I + \frac{\mu }{\lambda + \mu }J_{(\lambda + \mu )A}, \end{aligned}$$

for any \(\lambda , \mu >0\). Also note that for any \(x \in {\mathcal {H}}\) and any \(\lambda >0\), we have \( A_\lambda (x) \in A (J_{\lambda A}x)= A( x - \lambda A_\lambda (x))\). Finally, for any \(\lambda >0\), the operators \(A_{\lambda }\) and A have the same set of zeros: \({\mathrm{zer}}A_{\lambda }= {\mathrm{zer}}A\). For a detailed presentation of maximally monotone operators and the Yosida approximation, the reader may consult [8] or [13].
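
To make these formulas concrete, here is a minimal numerical sketch (ours, not part of the paper), written in Python with NumPy, for the operator \(A=\partial |\cdot |\) introduced above: its resolvent \(J_{\lambda A}\) is the soft-thresholding map, and we verify on a grid of points the closed-form expression of \(J_{\mu A_\lambda }\) together with the \(\lambda \)-cocoercivity inequality.

import numpy as np

def resolvent(x, lam):
    # J_{lam A}(x) for A = the subdifferential of |.|: soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def yosida(x, lam):
    # Yosida regularization A_lam = (I - J_{lam A}) / lam.
    return (x - resolvent(x, lam)) / lam

lam, mu = 0.7, 1.3
x = np.linspace(-3.0, 3.0, 601)

# Candidate for J_{mu A_lam} given by the closed-form expression above.
y = lam / (lam + mu) * x + mu / (lam + mu) * resolvent(x, lam + mu)
# y equals J_{mu A_lam}(x) if and only if y + mu * A_lam(y) = x.
assert np.allclose(y + mu * yosida(y, lam), x)

# lam-cocoercivity: <A_lam(y) - A_lam(x), y - x> >= lam |A_lam(y) - A_lam(x)|^2.
d = yosida(x[1:], lam) - yosida(x[:-1], lam)
assert np.all(d * (x[1:] - x[:-1]) >= lam * d**2 - 1e-12)

On this example, \(A_\lambda \) is the saturation map \(x\mapsto \max (-1,\min (1,x/\lambda ))\), which makes the \(\frac{1}{\lambda }\)-Lipschitz continuity directly visible.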

Appendix B. Some auxiliary results

In this section, we present some auxiliary lemmas that are used throughout the paper.

Lemma B.1

Let \((a_k)\), \((\alpha _k)\) and \((w_k)\) be sequences of real numbers satisfying

$$\begin{aligned} a_{i+1}\le \alpha _i a_i+w_i \quad \text{ for } \text{ every } i\ge 1. \end{aligned}$$
(67)

Assume that \(\alpha _i\ge 0\) for every \(i\ge 1\).

  1. (i)

    For every \(k\ge 1\), we have

    $$\begin{aligned} \sum _{i=1}^k a_i\le t_{1,k}a_1+\sum _{i=1}^{k-1} t_{i+1,k} w_i, \end{aligned}$$
    (68)

    where the double sequence \((t_{i,k})\) is defined by (13).

  2. (ii)

    Under \((K_0)\), assume that the sequence \((t_i)\) defined by (15) satisfies \(\sum _{i=1}^{+\infty }t_{i+1}(w_i)_+<+\infty \). Then the series \(\sum _{i\ge 1}(a_i)_+\) is convergent, and

    $$\begin{aligned}\sum _{i= 1}^{+\infty }(a_i)_+\le t_1(a_1)_+ +\sum _{i=1}^{+\infty } t_{i+1} (w_i)_+.\end{aligned}$$

Proof

\(\mathrm{{(i)}}\) Recall from Lemma 2.4 (i) that \(\alpha _i t_{i+1,k}=t_{i,k}-1\) for every \(i\ge 1\) and \(k\ge i+1\). Multiplying inequality (67) by \(t_{i+1,k}\) gives

$$\begin{aligned}t_{i+1,k} a_{i+1}\le (t_{i,k}-1) a_i+t_{i+1,k}w_i,\end{aligned}$$

or equivalently

$$\begin{aligned}a_i\le (t_{i,k}a_i-t_{i+1,k} a_{i+1}) +t_{i+1,k}w_i.\end{aligned}$$

By summing from \(i=1\) to \(k-1\), we deduce that

$$\begin{aligned}\sum _{i=1}^{k-1} a_i\le t_{1,k}a_1-t_{k,k}a_k+\sum _{i=1}^{k-1} t_{i+1,k} w_i.\end{aligned}$$

Since \(t_{k,k}=1\), moving the term \(t_{k,k}a_k=a_k\) to the left-hand side yields inequality (68).

\(\mathrm{{(ii)}}\) Taking the positive part of each side of (67), and using that \(\alpha _i\ge 0\) together with the subadditivity of the positive part, we find

$$\begin{aligned}(a_{i+1})_+\le \alpha _i (a_i)_+ +(w_i)_+ .\end{aligned}$$

By applying \(\mathrm{{(i)}}\) with \((a_i)_+\) (resp. \((w_i)_+\)) in place of \(a_i\) (resp. \(w_i\)), we obtain for every \(k\ge 1\)

$$\begin{aligned} \sum _{i=1}^k (a_i)_+\le t_{1,k}(a_1)_+ +\sum _{i=1}^{k-1} t_{i+1,k} (w_i)_+ \le t_{1}(a_1)_+ +\sum _{i=1}^{+\infty } t_{i+1} (w_i)_+<+\infty , \end{aligned}$$

because \(t_{i+1,k} \le t_{i+1}\), and \(\sum _{i=1}^{+\infty } t_{i+1} (w_i)_+<+\infty \) by assumption. The conclusion follows by letting k tend to \(+\infty \). \(\square \)
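
As a quick sanity check (ours, not part of the paper), the following Python snippet instantiates part \(\mathrm{{(i)}}\) numerically. Since definition (13) of \(t_{i,k}\) is not reproduced in this appendix, the snippet generates \(t_{i,k}\) through the backward recursion \(t_{k,k}=1\), \(t_{i,k}=1+\alpha _i t_{i+1,k}\), which is precisely the identity \(\alpha _i t_{i+1,k}=t_{i,k}-1\) from Lemma 2.4 (i) used in the proof. When (67) is taken with equality, the proof shows that (68) also holds with equality, which is what we test.

import numpy as np

rng = np.random.default_rng(0)
k = 50
alpha = rng.uniform(0.0, 1.2, size=k + 1)  # alpha[i] stands for alpha_i >= 0
w = rng.normal(size=k + 1)                 # w[i] stands for w_i

# Build (a_i) so that recursion (67) holds with equality.
a = np.zeros(k + 1)
a[1] = rng.normal()
for i in range(1, k):
    a[i + 1] = alpha[i] * a[i] + w[i]

# Backward recursion: t[k] = 1 and t[i] = 1 + alpha_i * t[i+1].
t = np.zeros(k + 1)
t[k] = 1.0
for i in range(k - 1, 0, -1):
    t[i] = 1.0 + alpha[i] * t[i + 1]

lhs = a[1:k + 1].sum()                     # sum of a_1, ..., a_k
rhs = t[1] * a[1] + sum(t[i + 1] * w[i] for i in range(1, k))
assert np.isclose(lhs, rhs)                # (68), here with equality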

Given a bounded sequence \((x_k)\) in a Banach space \(({\mathcal {X}},\Vert \cdot \Vert )\), the next lemma gives basic properties of the averaged sequence \((\widehat{x}_k)\) defined by (57).

Lemma B.2

Let \(({\mathcal {X}},\Vert \cdot \Vert )\) be a Banach space and let \((x_k)\) be a bounded sequence in \({\mathcal {X}}\). Given a sequence \((\tau _{i,k})_{i,k\ge 1}\) of nonnegative numbers satisfying (55)–(56), let \((\widehat{x}_k)\) be the averaged sequence defined by \(\widehat{x}_k=\sum _{i=1}^{+\infty }\tau _{i,k}x_i\). Then we have

  1. (i)

    The sequence \((\widehat{x}_k)\) is well defined and bounded, with \(\sup _{k\ge 1}\Vert \widehat{x}_k\Vert \le \sup _{k\ge 1}\Vert x_k\Vert \).

  2. (ii)

    If \((x_k)\) converges to \(\overline{x}\in {\mathcal {X}}\), then the sequence \((\widehat{x}_k)\) is also convergent and \(\lim _{k\rightarrow +\infty }\widehat{x}_k=\overline{x}\).

Proof

\(\mathrm{{(i)}}\) Set \(M=\sup _{k\ge 1}\Vert x_k\Vert <+\infty \). In view of (55), observe that for every \(k\ge 1\),

$$\begin{aligned} \sum _{i=1}^{+\infty }\tau _{i,k}\Vert x_i\Vert \le M\, \sum _{i=1}^{+\infty }\tau _{i,k}=M. \end{aligned}$$
(69)

Since the space \({\mathcal {X}}\) is complete, the absolute convergence established in (69) implies that the series \(\sum _{i\ge 1}\tau _{i,k}x_i\) converges. From the definition of \(\widehat{x}_k\), we then have \(\Vert \widehat{x}_k\Vert \le \sum _{i=1}^{+\infty }\tau _{i,k}\Vert x_i\Vert ,\) and hence \(\Vert \widehat{x}_k\Vert \le M\) in view of (69).

\(\mathrm{{(ii)}}\) Assume that \((x_k)\) converges to \(\overline{x}\in {\mathcal {X}}\). By using (55), we have for every \(k\ge 1\),

$$\begin{aligned} \Vert \widehat{x}_k-\overline{x}\Vert = \left\| \sum _{i=1}^{+\infty }\tau _{i,k}(x_i-\overline{x})\right\| \le \sum _{i=1}^{+\infty }\tau _{i,k}\Vert x_i-\overline{x}\Vert . \end{aligned}$$

Fix \(\varepsilon >0\), and let \(K\ge 1\) be such that \(\Vert x_i-\overline{x}\Vert \le \varepsilon \) for every \(i\ge K\). From the above inequality, we obtain

$$\begin{aligned} \Vert \widehat{x}_k-\overline{x}\Vert \le \left( \sup _{i\in \{1,\ldots ,K\}}\Vert x_i-\overline{x}\Vert \right) \sum _{i=1}^K\tau _{i,k} +\varepsilon \sum _{i=K+1}^{+\infty }\tau _{i,k} \le M\sum _{i=1}^K\tau _{i,k} +\varepsilon , \end{aligned}$$

with \(M=\sup _{i\ge 1}\Vert x_i-\overline{x}\Vert <+\infty \). Taking the upper limit as \(k\rightarrow +\infty \), we deduce from (56) that

$$\begin{aligned}\limsup _{k\rightarrow +\infty }\Vert \widehat{x}_k-\overline{x}\Vert \le \varepsilon .\end{aligned}$$

Since this is true for every \(\varepsilon >0\), we conclude that \(\lim _{k\rightarrow +\infty }\Vert \widehat{x}_k-\overline{x}\Vert =0\). \(\square \)
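
To close this appendix, here is a toy illustration (ours) of Lemma B.2 in Python, with the classical Cesàro weights \(\tau _{i,k}=1/k\) for \(i\le k\) and \(\tau _{i,k}=0\) otherwise; these weights are nonnegative, sum to 1 over i, and vanish as \(k\rightarrow +\infty \) for each fixed i, which is the behavior required by conditions (55)–(56), not reproduced here.

import numpy as np

n = 10_000
i = np.arange(1, n + 1)
x = 2.0 + (-1.0)**i / i     # x_i -> 2 with damped oscillations
xhat = np.cumsum(x) / i     # Cesaro averages: xhat_k = (1/k) * sum_{i<=k} x_i

# (i): the averaged sequence inherits the sup bound of (x_i), ...
assert np.abs(xhat).max() <= np.abs(x).max() + 1e-12
# ... and (ii): it converges to the same limit.
assert abs(xhat[-1] - 2.0) < 1e-3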


Cite this article

Attouch, H., Cabot, A. Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. Math. Program. 184, 243–287 (2020). https://doi.org/10.1007/s10107-019-01412-0
