Testing and Non-linear Preconditioning of the Proximal Point Method

Abstract

Employing the ideas of non-linear preconditioning and testing of the classical proximal point method, we formalise common arguments in convergence rate and convergence proofs of optimisation methods, reducing them to the verification of a simple iteration-wise inequality. When applied to fixed point operators, the latter can be seen as a generalisation of firm non-expansivity or the \(\alpha \)-averaged property. The main purpose of this work is to provide the abstract background theory for our companion paper “Block-proximal methods with spatially adapted acceleration”. In the present account we demonstrate the effectiveness of the general approach on several classical algorithms, as well as their stochastic variants. Besides, of course, the proximal point method, these methods include gradient descent, forward–backward splitting, Douglas–Rachford splitting, Newton’s method, as well as several methods for saddle-point problems, such as the Alternating Directions Method of Multipliers and the Chambolle–Pock method.

References

  1. Martinet, B.: Brève communication. Régularisation d’inéquations variationnelles par approximations successives. ESAIM 4(R3), 154–158 (1970)

  2. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976). https://doi.org/10.1137/0314056

  3. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542

  4. Loris, I., Verhoeven, C.: On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty. Inverse Probl. 27(12), 125007 (2011). https://doi.org/10.1088/0266-5611/27/12/125007

  5. Gabay, D.: Applications of the method of multipliers to variational inequalities. In: M. Fortin, R. Glowinski (eds.) Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, vol. 15, pp. 299–331. North-Holland (1983)

  6. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011). https://doi.org/10.1007/s10851-010-0251-1

  7. He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5(1), 119–149 (2012). https://doi.org/10.1137/100814494

  8. Valkonen, T., Pock, T.: Acceleration of the PDHGM on partially strongly convex functions. J. Math. Imaging Vis. 59, 394–414 (2017). https://doi.org/10.1007/s10851-016-0692-2

  9. Valkonen, T.: Block-proximal methods with spatially adapted acceleration (2017). http://tuomov.iki.fi/m/blockcp.pdf (Submitted)

  10. Browder, F.E.: Nonexpansive nonlinear operators in a Banach space. Proc. Natl. Acad. Sci. USA 54(4), 1041 (1965)

  11. Wright, S.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015). https://doi.org/10.1007/s10107-015-0892-3

  12. Censor, Y., Zenios, S.A.: Proximal minimization algorithm with D-functions. J. Optim. Theory Appl. 73(3), 451–464 (1992). https://doi.org/10.1007/BF00940051

  13. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993). https://doi.org/10.1137/0803026

  14. Lorenz, D., Pock, T.: An inertial forward-backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51(2), 311–325 (2015). https://doi.org/10.1007/s10851-014-0523-2

  15. Hohage, T., Homann, C.: A generalization of the Chambolle-Pock algorithm to Banach spaces with applications to inverse problems (2014) (Preprint)

  16. Hua, X., Yamashita, N.: Block coordinate proximal gradient methods with variable Bregman functions for nonsmooth separable optimization. Math. Program. 160(1), 1–32 (2016). https://doi.org/10.1007/s10107-015-0969-z

  17. Brezis, H., Crandall, M.G., Pazy, A.: Perturbations of nonlinear maximal monotone sets in Banach space. Commun. Pure Appl. Math. 23(1), 123–144 (1970). https://doi.org/10.1002/cpa.3160230107

  18. Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73(4), 591–597 (1967). https://doi.org/10.1090/S0002-9904-1967-11761-0

  19. Browder, F.E.: Convergence theorems for sequences of nonlinear operators in Banach spaces. Math. Z. 100(3), 201–225 (1967). https://doi.org/10.1007/BF01109805

  20. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004). https://doi.org/10.1002/cpa.20042

  21. Douglas Jr., J., Rachford Jr., H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956). https://doi.org/10.2307/1993056

  22. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer, New York (2017). https://doi.org/10.1007/978-3-319-48311-5

  23. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014). https://doi.org/10.1137/130921428

  24. Mann, W.R.: Mean value methods in iteration. Proc. Am. Math. Soc. 4(3), 506–510 (1953)

  25. Schaefer, H.: Über die Methode sukzessiver Approximationen. Jahresbericht der Deutschen Mathematiker-Vereinigung 59, 131–140 (1957)

  26. Petryshyn, W.: Construction of fixed points of demicompact mappings in Hilbert space. J. Math. Anal. Appl. 14(2), 276–284 (1966)

  27. Krasnoselski, M.A.: Two remarks about the method of successive approximations. Uspekhi Mat. Nauk. 19, 123–127 (1955)

  28. Shiryaev, A.N.: Probability. Graduate Texts in Mathematics. Springer, New York (1996)

  29. Qu, Z., Richtárik, P., Takáč, M., Fercoq, O.: SDNA: stochastic dual Newton ascent for empirical risk minimization (2015)

  30. Pilanci, M., Wainwright, M.J.: Iterative Hessian sketch: fast and accurate solution approximation for constrained least-squares. J. Mach. Learn. Res. 17(53), 1–38 (2016)

  31. Condat, L.: A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 158(2), 460–479 (2013). https://doi.org/10.1007/s10957-012-0245-9

  32. Vũ, B.C.: A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv. Comput. Math. 38(3), 667–681 (2013). https://doi.org/10.1007/s10444-011-9254-8

  33. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program. (2015). https://doi.org/10.1007/s10107-015-0957-3

  34. Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. (2015). https://doi.org/10.1007/s10107-015-0901-6

Author information

Corresponding author

Correspondence to Tuomo Valkonen.

Appendices

Appendix A: Outer Semicontinuity of Maximal Monotone Operators

We could not find the following result explicitly stated in the literature, although it is hidden in, e.g., the proof of [2, Theorem 1].

Lemma A.1

Let \(H: U\rightrightarrows U\) be maximal monotone on a Hilbert space \(U\). Then H is weak-to-strong outer semicontinuous: for any sequence \(\{u^i\}_{i \in \mathbb {N}}\), and any \(z^i \in H(u^i)\) such that \(u^i\mathrel {\rightharpoonup }u\) weakly, and \(z^i \rightarrow z\) strongly, we have \(z \in H(u)\).

Proof

By monotonicity, for any \(u' \in U\) and \(z' \in H(u')\), we have \(D_i :=\langle u'-u^i,z'-z^i\rangle \ge 0\). Since a weakly convergent sequence is bounded, we have \(D_i = \langle u'-u^i,z'-z\rangle + \langle u'-u^i,z-z^i\rangle \le \langle u'-u^i,z'-z\rangle +C\Vert z-z^i\Vert \) for some \(C>0\) independent of i. Taking the limit, we therefore have \(\langle u'-u,z'-z\rangle \ge 0\). If we had \(z \not \in H(u)\), this would contradict the maximality of H, i.e., that its graph is not properly contained in the graph of any other monotone operator. \(\square \)

Appendix B: Three-Point Inequalities

The following three-point formulas are central to handling forward steps with respect to smooth functions.

Lemma B.1

Suppose \(J \in \mathrm {cpl}(X)\) has L-Lipschitz gradient. Then

$$\begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle \ge -\frac{L}{4}\Vert x-z\Vert ^2 \quad ({\widehat{x}}, z, x \in X), \end{aligned}$$
(74)

as well as

$$\begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle \ge J(x)-J({\widehat{x}}) - \frac{L}{2}\Vert x-z\Vert ^2 \quad ({\widehat{x}}, z, x \in X). \end{aligned}$$
(75)

Proof

Regarding the “three-point hypomonotonicity” (74), the L-Lipschitz gradient implies co-coercivity (see [22] or Appendix C)

$$\begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),z-{\widehat{x}}\rangle \ge L^{-1} \Vert \nabla J(z)-\nabla J({\widehat{x}})\Vert ^2. \end{aligned}$$

Thus using Cauchy’s inequality

$$\begin{aligned} \begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle&=\langle \nabla J(z)-\nabla J({\widehat{x}}),z-{\widehat{x}}\rangle +\langle \nabla J(z)-\nabla J({\widehat{x}}),x-z\rangle \\&\ge -\frac{L}{4}\Vert x-z\Vert ^2. \end{aligned} \end{aligned}$$

To prove (75), the Lipschitz gradient implies the smoothness or “descent inequality” (again, [22] or Appendix C)

$$\begin{aligned} J(z)-J(x) \ge \langle \nabla J(z),z-x\rangle - \frac{L}{2}\Vert x-z\Vert ^2. \end{aligned}$$
(76)

By convexity \(J({\widehat{x}})-J(z) \ge \langle \nabla J(z),{\widehat{x}}-z\rangle \). Summed, we obtain (75). \(\square \)
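As a quick numerical sanity check of Lemma B.1, the following sketch samples random triples \(({\widehat{x}}, z, x)\) and evaluates the slack in (74) and (75). It assumes, purely for illustration, the quadratic \(J(x)=\frac{1}{2}\langle Ax,x\rangle \) on \(\mathbb {R}^5\) with a random positive semi-definite matrix A and L its largest eigenvalue; the names `gap_74` and `gap_75` are hypothetical helpers, not notation from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M.T @ M                      # symmetric positive semi-definite Hessian (assumed test case)
L = np.linalg.eigvalsh(A)[-1]    # Lipschitz constant of grad J = largest eigenvalue of A

def J(x):
    return 0.5 * x @ A @ x

def grad_J(x):
    return A @ x

def gap_74(x_hat, z, x):
    # left-hand side of (74) minus its right-hand side; should be >= 0
    lhs = (grad_J(z) - grad_J(x_hat)) @ (x - x_hat)
    return lhs + 0.25 * L * np.sum((x - z) ** 2)

def gap_75(x_hat, z, x):
    # left-hand side of (75) minus its right-hand side; should be >= 0
    lhs = grad_J(z) @ (x - x_hat)
    rhs = J(x) - J(x_hat) - 0.5 * L * np.sum((x - z) ** 2)
    return lhs - rhs

# each sample is a (3, n) array whose rows are x_hat, z, x
samples = [rng.standard_normal((3, n)) for _ in range(1000)]
min_gap_74 = min(gap_74(*s) for s in samples)
min_gap_75 = min(gap_75(*s) for s in samples)
print(min_gap_74, min_gap_75)
```

Both minima should be non-negative up to rounding error. Such a check cannot replace the proof, but it is a cheap way to catch a wrong constant or sign when adapting the inequalities.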

Lemma B.2

Suppose \(J \in \mathrm {cpl}(X)\) has L-Lipschitz gradient and is \(\gamma \)-strongly convex. Then for any \(\tau >0\), we have

$$\begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle \ge \frac{2\gamma -\tau L^2}{2}\Vert x-{\widehat{x}}\Vert ^2 -\frac{1}{2\tau }\Vert x-z\Vert ^2 \quad ({\widehat{x}}, z, x \in X), \end{aligned}$$
(77)

as well as

$$\begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle \ge J(x)-J({\widehat{x}}) + \frac{\gamma -\tau L^2}{2}\Vert x-{\widehat{x}}\Vert ^2 -\frac{1}{2\tau }\Vert x-z\Vert ^2 \quad ({\widehat{x}}, z, x \in X). \end{aligned}$$
(78)

Proof

To prove (78), using strong convexity, the Lipschitz gradient, and Cauchy’s inequality, we have

$$\begin{aligned} \begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle&=\langle \nabla J(x),x-{\widehat{x}}\rangle +\langle \nabla J(z)-\nabla J(x),x-{\widehat{x}}\rangle \\&\ge J(x)-J({\widehat{x}}) + \frac{\gamma }{2}\Vert x-{\widehat{x}}\Vert ^2 -\frac{1}{2\tau }\Vert x-z\Vert ^2 - \frac{\tau L^2}{2}\Vert x-{\widehat{x}}\Vert ^2. \end{aligned} \end{aligned}$$

Regarding (77), using the \(\gamma \)-strong monotonicity of \(\nabla J\), we estimate completely analogously

$$\begin{aligned} \begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle&=\langle \nabla J(x)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle +\langle \nabla J(z)-\nabla J(x),x-{\widehat{x}}\rangle \\&\ge \gamma \Vert x-{\widehat{x}}\Vert ^2 -\frac{1}{2\tau }\Vert x-z\Vert ^2 - \frac{\tau L^2}{2}\Vert x-{\widehat{x}}\Vert ^2. \end{aligned} \end{aligned}$$

\(\square \)
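For the reader's convenience, the step shared by both estimates in the proof of Lemma B.2 can be spelled out as follows. This is only an expanded restatement: Cauchy–Schwarz, then the L-Lipschitz gradient, then Cauchy’s (Young’s) inequality \(ab \le \frac{1}{2\tau }a^2 + \frac{\tau }{2}b^2\) with \(a=\Vert x-z\Vert \) and \(b=L\Vert x-{\widehat{x}}\Vert \).

$$\begin{aligned} \begin{aligned} \langle \nabla J(z)-\nabla J(x),x-{\widehat{x}}\rangle&\ge -\Vert \nabla J(z)-\nabla J(x)\Vert \Vert x-{\widehat{x}}\Vert \ge -L\Vert x-z\Vert \Vert x-{\widehat{x}}\Vert \\&\ge -\frac{1}{2\tau }\Vert x-z\Vert ^2 - \frac{\tau L^2}{2}\Vert x-{\widehat{x}}\Vert ^2. \end{aligned} \end{aligned}$$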

Since smooth functions with a positive Hessian are locally convex, the above lemmas readily extend to this case, locally. In fact, we have the following more precise result:

Lemma B.3

Suppose \(J \in C^2(X)\) with \(\nabla ^2 J({\widehat{x}}) > 0\) at given \({\widehat{x}}\in X\). Then for any \(\tau \in (0, 2]\) and all \(z, x, \eta \in X\), we have

$$\begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle \ge \frac{(1-\delta _{z,\eta })(2-\tau )}{2}\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} -\frac{1+\delta _{z,\eta }}{2\tau } \Vert x-z\Vert ^2_{\nabla ^2 J(\eta )} \end{aligned}$$
(79)

with

$$\begin{aligned} \delta _{z,\eta } :=\inf \left\{ \delta \ge 0 \,\Bigg |\, \begin{array}{r} (1-\delta )\nabla ^2 J(\eta ) \le \nabla ^2 J(\zeta ) \le (1+\delta )\nabla ^2 J(\eta ) \\ \text { for all } \zeta \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}}) \end{array} \right\} . \end{aligned}$$
(80)

If \(x \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})\), then also

$$\begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle \ge J(x)-J({\widehat{x}}) + \frac{(1-\delta _{z,\eta })(1-\tau )-2\delta _{z,\eta }}{2}\Vert x-{\widehat{x}}\Vert _{\nabla ^2 J(\eta )}^2 -\frac{1+\delta _{z,\eta }}{2\tau }\Vert x-z\Vert _{\nabla ^2 J(\eta )}^2. \end{aligned}$$
(81)

Proof

By Taylor expansion, for some \(\zeta \) between z and \({\widehat{x}}\), and any \(\tau >0\), we have

$$\begin{aligned} \begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle&=\langle \nabla ^2 J(\zeta )(z-{\widehat{x}}),x-{\widehat{x}}\rangle \\&=\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\zeta )} +\langle \nabla ^2 J(\zeta )(z-x),x-{\widehat{x}}\rangle \\&\ge \frac{2-\tau }{2}\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\zeta )} -\frac{1}{2\tau } \Vert x-z\Vert ^2_{\nabla ^2 J(\zeta )}. \end{aligned} \end{aligned}$$
(82)

Since \(\zeta \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})\), by the definition of \(\delta _{z,\eta }\), we obtain (79).

Similarly, by Taylor expansion, for some \(\zeta _0\) between x and \({\widehat{x}}\), we have

$$\begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle - J(x) + J({\widehat{x}}) = \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle -\frac{1}{2}\langle \nabla ^2 J(\zeta _0)(x-{\widehat{x}}),x-{\widehat{x}}\rangle \end{aligned}$$
(83)

Using (82) we obtain

$$\begin{aligned} \begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle \! -\! J(x) + J({\widehat{x}})&\ge \frac{1}{2}\Vert x-{\widehat{x}}\Vert ^2_{(2-\tau )\nabla ^2 J(\zeta ) - \nabla ^2 J(\zeta _0)} -\frac{1}{2\tau } \Vert x-z\Vert ^2_{\nabla ^2 J(\zeta )}. \end{aligned} \end{aligned}$$

Using the assumption \(x \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})\), we have \(\zeta _0 \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})\). Hence we obtain (81) by the definition of \(\delta _{z,\eta }\) and \((1-\delta _{z,\eta })(2-\tau )-(1+\delta _{z,\eta })=(1-\delta _{z,\eta })(1-\tau )-2\delta _{z,\eta }\). \(\square \)
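The simplest instance of Lemma B.3 is a quadratic J, whose Hessian is constant, so that \(\delta _{z,\eta }=0\) in (80) for every \(\eta \). The sketch below checks (79) and (81) numerically in that special case only; the matrix A, the sampled values of \(\tau \), and the helper names are illustrative assumptions. For a quadratic the ball condition on x plays no role, so x is sampled freely.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)   # constant, positive definite Hessian, hence delta = 0

def J(x):
    return 0.5 * x @ A @ x

def sqnorm_A(v):
    # squared norm weighted by the Hessian: <A v, v>
    return v @ A @ v

def gap_79(x_hat, z, x, tau):
    # slack in (79) with delta = 0; grad J(z) - grad J(x_hat) = A (z - x_hat)
    lhs = (A @ (z - x_hat)) @ (x - x_hat)
    rhs = 0.5 * (2 - tau) * sqnorm_A(x - x_hat) - sqnorm_A(x - z) / (2 * tau)
    return lhs - rhs

def gap_81(x_hat, z, x, tau):
    # slack in (81) with delta = 0; grad J(z) = A z
    lhs = (A @ z) @ (x - x_hat)
    rhs = (J(x) - J(x_hat) + 0.5 * (1 - tau) * sqnorm_A(x - x_hat)
           - sqnorm_A(x - z) / (2 * tau))
    return lhs - rhs

taus = (0.1, 0.5, 1.0, 2.0)   # sample values from the admissible range (0, 2]
gaps_79, gaps_81 = [], []
for tau in taus:
    for _ in range(500):
        x_hat, z, x = rng.standard_normal((3, n))
        gaps_79.append(gap_79(x_hat, z, x, tau))
        gaps_81.append(gap_81(x_hat, z, x, tau))
min_gap_79, min_gap_81 = min(gaps_79), min(gaps_81)
print(min_gap_79, min_gap_81)
```

For a genuinely non-quadratic J one would additionally have to bound \(\delta _{z,\eta }\) from (80), which this sketch does not attempt.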

We can also derive the following alternate result:

Lemma B.4

Suppose \(J \in C^2(X)\) with \(\nabla ^2 J({\widehat{x}}) > 0\) at given \({\widehat{x}}\in X\). Then for all \(z, x, \eta \in X\) we have

$$\begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle \ge \frac{1-\delta _{z,\eta }}{2}\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} + \frac{1-\delta _{z,\eta }}{2}\Vert z-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} - \frac{1}{2}\Vert x-z\Vert ^2_{\nabla ^2 J(\eta )} \end{aligned}$$
(84)

for \(\delta _{z,\eta }\) given by (80). If \(x \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})\), then also

$$\begin{aligned} \begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle \ge&-\delta _{z,\eta }\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} + \frac{1-\delta _{z,\eta }}{2}\Vert z-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} - \frac{1}{2}\Vert x-z\Vert ^2_{\nabla ^2 J(\eta )} \\&+ J(x)-J({\widehat{x}}). \end{aligned} \end{aligned}$$
(85)

Proof

By Taylor expansion, for some \(\zeta \) between z and \({\widehat{x}}\), we have

$$\begin{aligned} \begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle&=\langle \nabla ^2 J(\zeta )(z-{\widehat{x}}),x-{\widehat{x}}\rangle \\&= \langle \nabla ^2 J(\eta )(z-{\widehat{x}}),x-{\widehat{x}}\rangle \\&\quad +\,\langle [\nabla ^2 J(\zeta )-\nabla ^2 J(\eta )](z-{\widehat{x}}),x-{\widehat{x}}\rangle \\&\ge \langle \nabla ^2 J(\eta )(z-{\widehat{x}}),x-{\widehat{x}}\rangle \\&\quad -\, \frac{\delta _{z,\eta }}{2}\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} - \frac{\delta _{z,\eta }}{2}\Vert z-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )}. \end{aligned} \end{aligned}$$
(86)

In the last step we have used Cauchy’s inequality, and the definition of \(\delta _{z,\eta }\) following \(\zeta \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})\). The standard three-point or Pythagoras’ identity states

$$\begin{aligned} \langle \nabla ^2 J(\eta )(z-{\widehat{x}}),x-{\widehat{x}}\rangle = \frac{1}{2}\Vert z-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} + \frac{1}{2}\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} - \frac{1}{2}\Vert x-z\Vert ^2_{\nabla ^2 J(\eta )}. \end{aligned}$$

Applying this in (86), we obtain (84).

To prove (85), we use (83), the definition of \(\delta _{z,\eta }\), and (84). \(\square \)
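The three-point identity used in the proof above follows from a one-line expansion. Writing \(A :=\nabla ^2 J(\eta )\), assumed self-adjoint, and reading \(\Vert v\Vert _A^2\) as \(\langle Av,v\rangle \) (our interpretation of the weighted-norm notation), we have

$$\begin{aligned} \Vert x-z\Vert ^2_{A} = \Vert (x-{\widehat{x}})-(z-{\widehat{x}})\Vert ^2_{A} = \Vert x-{\widehat{x}}\Vert ^2_{A} - 2\langle A(z-{\widehat{x}}),x-{\widehat{x}}\rangle + \Vert z-{\widehat{x}}\Vert ^2_{A}. \end{aligned}$$

Solving for the inner product and dividing by two gives the stated identity.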

Appendix C: Projected Gradients and Smoothness

The next lemma generalises well-known properties (see, e.g., [22]) of smooth convex functions to projected gradients, when P is taken to be a projection operator. With P a random projection, taking the expectation in (89), we obtain in particular a connection to the Expected Separable Over-approximation property in the stochastic coordinate descent literature [34].

Lemma C.1

Let \(J \in \mathrm {cpl}(X)\), and \(P \in \mathcal {L}(X; X)\) be self-adjoint and positive semi-definite on a Hilbert space X. Suppose P has a pseudo-inverse \(P^\dag \) satisfying \( P P^\dag P = P\). Consider the properties:

  1. (i)

    P-relative Lipschitz continuity of \(\nabla J\) with factor L:

    $$\begin{aligned} \Vert \nabla J(x)-\nabla J(y)\Vert _P \le L \Vert x-y\Vert _{P^\dag } \quad (x, y \in X). \end{aligned}$$
    (87)
  2. (ii)

    The P-relative property

    $$\begin{aligned} \langle \nabla J(x+Ph) - \nabla J(x),Ph\rangle \le L\Vert h\Vert _P^2 \quad (x, h \in X). \end{aligned}$$
    (88)
  3. (iii)

    P-relative smoothness of J with factor L:

    $$\begin{aligned} J(x+Ph) \le J(x) + \langle \nabla J(x),Ph\rangle +\frac{L}{2}\Vert h\Vert _P^2 \quad (x, h \in X). \end{aligned}$$
    (89)
  4. (iv)

    The P-relative property

    $$\begin{aligned} J(y) \le J(x) + \langle \nabla J(y),y-x\rangle -\frac{1}{2L}\Vert \nabla J(x)-\nabla J(y)\Vert _P^2 \quad (x, y \in X). \end{aligned}$$
    (90)
  5. (v)

    P-relative co-coercivity of \(\nabla J\) with factor \(L^{-1}\):

    $$\begin{aligned} L^{-1} \Vert \nabla J(x)-\nabla J(y)\Vert _P^2 \le \langle \nabla J(x)-\nabla J(y),x-y\rangle \quad (x, y \in X). \end{aligned}$$
    (91)

We have (i) \(\implies \) (ii) \(\iff \) (iii) \(\implies \) (iv) \(\implies \) (v). If P is invertible, all are equivalent.

Proof

(i) \(\implies \) (ii): Take \(y=x+Ph\) and multiply (87) by \(\Vert h\Vert _P\). Then use Cauchy–Schwarz.

(ii) \(\implies \) (iii): Using the mean value theorem and (88), we compute (89):

$$\begin{aligned} \begin{aligned}&J(x+Ph) - J(x) - \langle \nabla J(x),Ph\rangle =\int _0^1 \langle \nabla J(x+tPh),Ph\rangle \,dt - \langle \nabla J(x),Ph\rangle \\&\quad =\int _0^1 \langle \nabla J(x+tPh)-\nabla J(x),Ph\rangle \,dt \le \int _0^1 t \,dt \cdot L\Vert h\Vert _P^2 = \frac{L}{2} \Vert h\Vert _P^2. \end{aligned} \end{aligned}$$

(iii) \(\implies \) (ii): Add together (89) at the point \(x\) with step \(h\) and (89) at the point \(x+Ph\) with step \(-h\).

(iii) \(\implies \) (iv): Adding \(-\langle \nabla J(y),x+Ph\rangle \) on both sides of (89), we get

$$\begin{aligned} J(x+Ph) - \langle \nabla J(y),x+Ph\rangle \le J(x) - \langle \nabla J(y),x\rangle + \langle \nabla J(x)-\nabla J(y),Ph\rangle +\frac{L}{2}\Vert h\Vert _P^2. \end{aligned}$$

The left-hand side is minimised with respect to x by taking \(x=y-Ph\). Taking on the right-hand side \(h=L^{-1}(\nabla J(y)-\nabla J(x))\) therefore gives (90).

(iv) \(\implies \) (v): Summing the estimate (90) with the same estimate with x and y exchanged, we obtain (91).

(v) \(\implies \) (i) when P is invertible: Cauchy–Schwarz. \(\square \)
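To illustrate Lemma C.1 with a non-invertible P, the following sketch checks P-relative smoothness (89) and P-relative co-coercivity (91) numerically. All concrete choices are assumptions made for illustration: \(J(x)=\frac{1}{2}\langle Ax,x\rangle \) on \(\mathbb {R}^6\), P the orthogonal projection onto the first two coordinates, and L the largest eigenvalue of PAP, for which (88) holds in this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2
M = rng.standard_normal((n, n))
A = M.T @ M                                  # Hessian of the assumed quadratic J
P = np.diag([1.0] * k + [0.0] * (n - k))     # orthogonal projection onto the first k coordinates
L = np.linalg.eigvalsh(P @ A @ P)[-1]        # smallest factor L for which (88) holds here

def J(x):
    return 0.5 * x @ A @ x

def grad_J(x):
    return A @ x

def gap_89(x, h):
    # right-hand side of (89) minus left-hand side; ||h||_P^2 = <Ph, h>
    Ph = P @ h
    return J(x) + grad_J(x) @ Ph + 0.5 * L * (h @ Ph) - J(x + Ph)

def gap_91(x, y):
    # right-hand side of (91) minus left-hand side; ||g||_P^2 = <Pg, g>
    g = grad_J(x) - grad_J(y)
    return g @ (x - y) - (g @ P @ g) / L

# each pair is a (2, n) array: rows are (x, h) for (89) and (x, y) for (91)
pairs = [rng.standard_normal((2, n)) for _ in range(1000)]
min_gap_89 = min(gap_89(*p) for p in pairs)
min_gap_91 = min(gap_91(*p) for p in pairs)
print(min_gap_89, min_gap_91)
```

Here L is typically much smaller than the largest eigenvalue of A, which is the kind of gain that block-wise step lengths exploit in coordinate descent methods.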

About this article

Cite this article

Valkonen, T. Testing and Non-linear Preconditioning of the Proximal Point Method. Appl Math Optim 82, 591–636 (2020). https://doi.org/10.1007/s00245-018-9541-6

Mathematics Subject Classification

  • 49M29
  • 65K10
  • 65K15
  • 90C30
  • 90C47