Testing and Non-linear Preconditioning of the Proximal Point Method

Valkonen, Tuomo

doi:10.1007/s00245-018-9541-6

Published: 28 November 2018

Volume 82, pages 591–636, (2020)

Applied Mathematics & Optimization

Tuomo Valkonen

Abstract

Employing the ideas of non-linear preconditioning and testing of the classical proximal point method, we formalise common arguments in convergence rate and convergence proofs of optimisation methods to the verification of a simple iteration-wise inequality. When applied to fixed point operators, the latter can be seen as a generalisation of firm non-expansivity or the $\alpha $-averaged property. The main purpose of this work is to provide the abstract background theory for our companion paper “Block-proximal methods with spatially adapted acceleration”. In the present account we demonstrate the effectiveness of the general approach on several classical algorithms, as well as their stochastic variants. Besides, of course, the proximal point method, these methods include gradient descent, forward–backward splitting, Douglas–Rachford splitting, Newton’s method, as well as several methods for saddle-point problems, such as the Alternating Directions Method of Multipliers and the Chambolle–Pock method.

References

[1] Martinet, B.: Brève communication. Régularisation d’inéquations variationnelles par approximations successives. ESAIM 4(R3), 154–158 (1970)
[2] Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976). https://doi.org/10.1137/0314056
[3] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542
[4] Loris, I., Verhoeven, C.: On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty. Inverse Probl. 27(12), 125007 (2011). https://doi.org/10.1088/0266-5611/27/12/125007
[5] Gabay, D.: Applications of the method of multipliers to variational inequalities. In: M. Fortin, R. Glowinski (eds.) Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, vol. 15, pp. 299–331. North-Holland (1983)
[6] Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011). https://doi.org/10.1007/s10851-010-0251-1
[7] He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5(1), 119–149 (2012). https://doi.org/10.1137/100814494
[8] Valkonen, T., Pock, T.: Acceleration of the PDHGM on partially strongly convex functions. J. Math. Imaging Vis. 59, 394–414 (2017). https://doi.org/10.1007/s10851-016-0692-2
[9] Valkonen, T.: Block-proximal methods with spatially adapted acceleration (2017). http://tuomov.iki.fi/m/blockcp.pdf (Submitted)
[10] Browder, F.E.: Nonexpansive nonlinear operators in a Banach space. Proc. Natl. Acad. Sci. USA 54(4), 1041 (1965)
[11] Wright, S.: Coordinate descent algorithms. Math. Progr. 151(1), 3–34 (2015). https://doi.org/10.1007/s10107-015-0892-3
[12] Censor, Y., Zenios, S.A.: Proximal minimization algorithm with D-functions. J. Optim. Theory Appl. 73(3), 451–464 (1992). https://doi.org/10.1007/BF00940051
[13] Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993). https://doi.org/10.1137/0803026
[14] Lorenz, D., Pock, T.: An inertial forward-backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51(2), 311–325 (2015). https://doi.org/10.1007/s10851-014-0523-2
[15] Hohage, T., Homann, C.: A generalization of the Chambolle-Pock algorithm to Banach spaces with applications to inverse problems (2014) (Preprint)
[16] Hua, X., Yamashita, N.: Block coordinate proximal gradient methods with variable Bregman functions for nonsmooth separable optimization. Math. Program. 160(1), 1–32 (2016). https://doi.org/10.1007/s10107-015-0969-z
[17] Brezis, H., Crandall, M.G., Pazy, A.: Perturbations of nonlinear maximal monotone sets in Banach space. Commun. Pure Appl. Math. 23(1), 123–144 (1970). https://doi.org/10.1002/cpa.3160230107
[18] Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73(4), 591–597 (1967). https://doi.org/10.1090/S0002-9904-1967-11761-0
[19] Browder, F.E.: Convergence theorems for sequences of nonlinear operators in Banach spaces. Math. Z. 100(3), 201–225 (1967). https://doi.org/10.1007/BF01109805
[20] Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004). https://doi.org/10.1002/cpa.20042
[21] Douglas, J., Jr., Rachford, H.H., Jr.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956). https://doi.org/10.2307/1993056
[22] Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer, New York (2017). https://doi.org/10.1007/978-3-319-48311-5
[23] Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014). https://doi.org/10.1137/130921428
[24] Mann, W.R.: Mean value methods in iteration. Proc. Am. Math. Soc. 4(3), 506–510 (1953)
[25] Schaefer, H.: Über die Methode sukzessiver Approximationen. Jahresbericht der Deutschen Mathematiker-Vereinigung 59, 131–140 (1957)
[26] Petryshyn, W.: Construction of fixed points of demicompact mappings in Hilbert space. J. Math. Anal. Appl. 14(2), 276–284 (1966)
[27] Krasnoselski, M.A.: Two remarks about the method of successive approximations. Uspekhi Mat. Nauk. 19, 123–127 (1955)
[28] Shiryaev, A.N.: Probability. Graduate Texts in Mathematics. Springer, New York (1996)
[29] Qu, Z., Richtárik, P., Takáč, M., Fercoq, O.: SDNA: stochastic dual Newton ascent for empirical risk minimization (2015)
[30] Pilanci, M., Wainwright, M.J.: Iterative Hessian sketch: fast and accurate solution approximation for constrained least-squares. J. Mach. Learn. Res. 17(53), 1–38 (2016)
[31] Condat, L.: A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 158(2), 460–479 (2013). https://doi.org/10.1007/s10957-012-0245-9
[32] Vũ, B.C.: A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv. Comput. Math. 38(3), 667–681 (2013). https://doi.org/10.1007/s10444-011-9254-8
[33] Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Progr. (2015). https://doi.org/10.1007/s10107-015-0957-3
[34] Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Progr. (2015). https://doi.org/10.1007/s10107-015-0901-6

Author information

Tuomo Valkonen
Present address: ModeMat, Escuela Politécnica Nacional, Quito, Ecuador

Authors and Affiliations

Department of Mathematical Sciences, University of Liverpool, Liverpool, UK
Tuomo Valkonen

Corresponding author

Correspondence to Tuomo Valkonen.

Appendices

Appendix A: Outer Semicontinuity of Maximal Monotone Operators

We could not find the following result explicitly stated in the literature, although it is hidden in, e.g., the proof of [2, Theorem 1].

Lemma A.1

Let $H: U\rightrightarrows U$ be maximal monotone on a Hilbert space $U$. Then $H$ is weak-to-strong outer semicontinuous: for any sequence $\{u^i\}_{i \in \mathbb {N}}$, and any $z^i \in H(u^i)$ such that $u^i\mathrel {\rightharpoonup }u$ weakly, and $z^i \rightarrow z$ strongly, we have $z \in H(u)$.

Proof

By monotonicity, for any $u' \in U$ and $z' \in H(u')$ we have $D_i :=\langle u'-u^i,z'-z^i\rangle \ge 0$. Since a weakly convergent sequence is bounded, we have $D_i \le \langle u'-u^i,z'-z\rangle +C\Vert z-z^i\Vert $ for some $C>0$ independent of $i$. Taking the limit, we therefore have $\langle u'-u,z'-z\rangle \ge 0$. If we had $z \not \in H(u)$, this would contradict the maximality of $H$, i.e., that its graph is not properly contained in the graph of any monotone operator. $\square $

Appendix B: Three-Point Inequalities

The following three-point formulas are central to handling forward steps with respect to smooth functions.

Lemma B.1

Suppose $J \in \mathrm {cpl}(X)$ has $L$-Lipschitz gradient. Then

$$\begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle \ge -\frac{L}{4}\Vert x-z\Vert ^2 \quad ({\widehat{x}}, z, x \in X), \end{aligned}$$

(74)

as well as

$$\begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle \ge J(x)-J({\widehat{x}}) - \frac{L}{2}\Vert x-z\Vert ^2 \quad ({\widehat{x}}, z, x \in X). \end{aligned}$$

(75)

Proof

Regarding the “three-point hypomonotonicity” (74), the L-Lipschitz gradient implies co-coercivity (see [22] or Appendix C)

$$\begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),z-{\widehat{x}}\rangle \ge L^{-1} \Vert \nabla J(z)-\nabla J({\widehat{x}})\Vert ^2. \end{aligned}$$

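Thus, splitting the inner product and applying Cauchy’s inequality to the second term, we obtain

$$\begin{aligned} \begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle&=\langle \nabla J(z)-\nabla J({\widehat{x}}),z-{\widehat{x}}\rangle +\langle \nabla J(z)-\nabla J({\widehat{x}}),x-z\rangle \\&\ge L^{-1}\Vert \nabla J(z)-\nabla J({\widehat{x}})\Vert ^2 -\left( L^{-1}\Vert \nabla J(z)-\nabla J({\widehat{x}})\Vert ^2 +\frac{L}{4}\Vert x-z\Vert ^2\right) \\&= -\frac{L}{4}\Vert x-z\Vert ^2. \end{aligned} \end{aligned}$$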

To prove (75), we observe that the Lipschitz gradient implies the smoothness or “descent inequality” (again, [22] or Appendix C)

$$\begin{aligned} J(z)-J(x) \ge \langle \nabla J(z),z-x\rangle - \frac{L}{2}\Vert x-z\Vert ^2. \end{aligned}$$

(76)

By convexity $J({\widehat{x}})-J(z) \ge \langle \nabla J(z),{\widehat{x}}-z\rangle $. Summed, we obtain (75). $\square $
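As a numerical sanity check of Lemma B.1, the following sketch evaluates the slack in (74) and (75) at random points for a convex quadratic $J(x)=\frac{1}{2}\langle Ax,x\rangle $, for which $L$ is the largest eigenvalue of $A$. The quadratic test function and the name `check_three_point` are our own illustrative assumptions and do not come from the article.

```python
import numpy as np

def check_three_point(n=5, trials=1000, seed=0):
    """Smallest observed slack in (74) and (75) for a random convex quadratic."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    A = M.T @ M                       # symmetric positive semi-definite Hessian
    L = np.linalg.eigvalsh(A)[-1]     # Lipschitz constant of the gradient x -> A x
    J = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x
    slack74, slack75 = np.inf, np.inf
    for _ in range(trials):
        xhat, z, x = rng.standard_normal((3, n))
        dist2 = np.sum((x - z) ** 2)  # ||x - z||^2
        # (74): <grad J(z) - grad J(xhat), x - xhat> + (L/4)||x - z||^2 >= 0
        lhs74 = (grad(z) - grad(xhat)) @ (x - xhat)
        slack74 = min(slack74, lhs74 + L / 4 * dist2)
        # (75): <grad J(z), x - xhat> - [J(x) - J(xhat) - (L/2)||x - z||^2] >= 0
        lhs75 = grad(z) @ (x - xhat)
        slack75 = min(slack75, lhs75 - (J(x) - J(xhat) - L / 2 * dist2))
    return slack74, slack75

if __name__ == "__main__":
    s74, s75 = check_three_point()
    print("min slack (74):", s74)
    print("min slack (75):", s75)
```

Both slacks should be non-negative up to floating-point rounding; a negative value of noticeable size would indicate a mistake in the constants.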

Lemma B.2

Suppose $J \in \mathrm {cpl}(X)$ has $L$-Lipschitz gradient and is $\gamma $-strongly convex. Then for any $\tau >0$ holds

$$\begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle \ge \frac{2\gamma -\tau L^2}{2}\Vert x-{\widehat{x}}\Vert ^2 -\frac{1}{2\tau }\Vert x-z\Vert ^2 \quad ({\widehat{x}}, z, x \in X), \end{aligned}$$

(77)

as well as

$$\begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle \ge J(x)-J({\widehat{x}}) + \frac{\gamma -\tau L^2}{2}\Vert x-{\widehat{x}}\Vert ^2 -\frac{1}{2\tau }\Vert x-z\Vert ^2 \quad ({\widehat{x}}, z, x \in X). \end{aligned}$$

(78)

Proof

To prove (78), using strong convexity, the Lipschitz gradient, and Cauchy’s inequality, we have

$$\begin{aligned} \begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle&=\langle \nabla J(x),x-{\widehat{x}}\rangle +\langle \nabla J(z)-\nabla J(x),x-{\widehat{x}}\rangle \\&\ge J(x)-J({\widehat{x}}) + \frac{\gamma }{2}\Vert x-{\widehat{x}}\Vert ^2 -\frac{1}{2\tau }\Vert x-z\Vert ^2 - \frac{\tau L^2}{2}\Vert x-{\widehat{x}}\Vert ^2. \end{aligned} \end{aligned}$$

Regarding (77), using the $\gamma $-strong monotonicity of $\nabla J$, we estimate completely analogously

$$\begin{aligned} \begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle&=\langle \nabla J(x)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle +\langle \nabla J(z)-\nabla J(x),x-{\widehat{x}}\rangle \\&\ge \gamma \Vert x-{\widehat{x}}\Vert ^2 -\frac{1}{2\tau }\Vert x-z\Vert ^2 - \frac{\tau L^2}{2}\Vert x-{\widehat{x}}\Vert ^2. \end{aligned} \end{aligned}$$

$\square $
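The role of the free parameter $\tau $ in Lemma B.2 can be explored numerically. The sketch below, again under the assumption of a quadratic $J(x)=\frac{1}{2}\langle Ax,x\rangle $ with $A$ positive definite (so that $\gamma $ and $L$ are the extreme eigenvalues of $A$), samples random points and random $\tau >0$ and reports the smallest slack in (77) and (78). The function name `check_strongly_convex` is hypothetical.

```python
import numpy as np

def check_strongly_convex(n=5, trials=1000, seed=1):
    """Smallest observed slack in (77) and (78) for a strongly convex quadratic."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    A = M.T @ M + np.eye(n)           # positive definite Hessian
    eigs = np.linalg.eigvalsh(A)      # eigenvalues in ascending order
    gamma, L = eigs[0], eigs[-1]      # strong convexity and Lipschitz factors
    J = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x
    slack77, slack78 = np.inf, np.inf
    for _ in range(trials):
        xhat, z, x = rng.standard_normal((3, n))
        tau = rng.uniform(0.01, 2.0)  # any tau > 0 is admissible in the lemma
        a = np.sum((x - xhat) ** 2)   # ||x - xhat||^2
        b = np.sum((x - z) ** 2)      # ||x - z||^2
        rhs77 = (2 * gamma - tau * L**2) / 2 * a - b / (2 * tau)
        rhs78 = J(x) - J(xhat) + (gamma - tau * L**2) / 2 * a - b / (2 * tau)
        slack77 = min(slack77, (grad(z) - grad(xhat)) @ (x - xhat) - rhs77)
        slack78 = min(slack78, grad(z) @ (x - xhat) - rhs78)
    return slack77, slack78

if __name__ == "__main__":
    print(check_strongly_convex())
```

Small $\tau $ keeps the coefficient of $\Vert x-{\widehat{x}}\Vert ^2$ positive at the cost of a larger penalty on $\Vert x-z\Vert ^2$, which is the trade-off the lemma encodes.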

Since smooth functions with a positive Hessian are locally convex, the above lemmas readily extend to this case, locally. In fact, we have the following more precise result:

Lemma B.3

Suppose $J \in C^2(X)$ with $\nabla ^2 J({\widehat{x}}) > 0$ at given ${\widehat{x}}\in X$. Then for any $\tau \in (0, 2]$ and all $z, x, \eta \in X$, we have

$$\begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle \ge \frac{(1-\delta _{z,\eta })(2-\tau )}{2}\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} -\frac{1+\delta _{z,\eta }}{2\tau } \Vert x-z\Vert ^2_{\nabla ^2 J(\eta )} \end{aligned}$$

(79)

with

$$\begin{aligned} \delta _{z,\eta } :=\inf \left\{ \delta \ge 0 \,\Bigg |\, \begin{array}{r} (1-\delta )\nabla ^2 J(\eta ) \le \nabla ^2 J(\zeta ) \le (1+\delta )\nabla ^2 J(\eta ) \\ \text { for all } \zeta \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}}) \end{array} \right\} . \end{aligned}$$

(80)

If $x \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})$, then also

$$\begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle \ge J(x)-J({\widehat{x}}) + \frac{(1-\delta _{z,\eta })(1-\tau )-2\delta _{z,\eta }}{2}\Vert x-{\widehat{x}}\Vert _{\nabla ^2 J(\eta )}^2 -\frac{1+\delta _{z,\eta }}{2\tau }\Vert x-z\Vert _{\nabla ^2 J(\eta )}^2. \end{aligned}$$

(81)

Proof

By Taylor expansion, for some $\zeta $ between z and ${\widehat{x}}$, and any $\tau >0$, we have

$$\begin{aligned} \begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle&=\langle \nabla ^2 J(\zeta )(z-{\widehat{x}}),x-{\widehat{x}}\rangle \\&=\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\zeta )} +\langle \nabla ^2 J(\zeta )(z-x),x-{\widehat{x}}\rangle \\&\ge \frac{2-\tau }{2}\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\zeta )} -\frac{1}{2\tau } \Vert x-z\Vert ^2_{\nabla ^2 J(\zeta )}. \end{aligned} \end{aligned}$$

(82)

Since $\zeta \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})$, by the definition of $\delta _{z,\eta }$, we obtain (79).

Similarly, by Taylor expansion, for some $\zeta _0$ between x and ${\widehat{x}}$, we have

$$\begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle - J(x) + J({\widehat{x}}) = \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle -\frac{1}{2}\langle \nabla ^2 J(\zeta _0)(x-{\widehat{x}}),x-{\widehat{x}}\rangle \end{aligned}$$

(83)

Using (82) we obtain

$$\begin{aligned} \begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle \! -\! J(x) + J({\widehat{x}})&\ge \frac{1}{2}\Vert x-{\widehat{x}}\Vert ^2_{(2-\tau )\nabla ^2 J(\zeta ) - \nabla ^2 J(\zeta _0)} -\frac{1}{2\tau } \Vert x-z\Vert ^2_{\nabla ^2 J(\zeta )}. \end{aligned} \end{aligned}$$

Using the assumption $x \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})$, we have $\zeta _0 \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})$. Hence we obtain (81) by the definition of $\delta _{z,\eta }$ and $(1-\delta _{z,\eta })(2-\tau )-(1+\delta _{z,\eta })=(1-\delta _{z,\eta })(1-\tau )-2\delta _{z,\eta }$. $\square $
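To make the quantity $\delta _{z,\eta }$ of (80) concrete, consider the one-dimensional example $J(x)=e^x$ on $X=\mathbb {R}$, which is our own illustrative choice. There $\nabla ^2 J(\zeta )=e^\zeta $ is monotone, so the infimum in (80) can be written in closed form from the endpoints of the interval ${{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})$. The sketch below computes it and reports the smallest slack in (79) and (81) over random samples; the names `delta` and `min_slack_b3` are hypothetical.

```python
import math
import random

def delta(z, eta, xhat):
    """delta_{z,eta} from (80) for J(x) = exp(x) on the real line."""
    r = abs(z - xhat)
    # The Hessian exp(zeta) ranges over [exp(xhat - r), exp(xhat + r)] on the
    # closed ball, so only the two endpoints constrain delta.
    return max(0.0, math.exp(xhat + r - eta) - 1.0, 1.0 - math.exp(xhat - r - eta))

def min_slack_b3(trials=2000, seed=3):
    """Smallest observed slack in (79) and (81) for J = exp."""
    rnd = random.Random(seed)
    s79 = s81 = math.inf
    for _ in range(trials):
        xhat, z, eta = (rnd.uniform(-1.0, 1.0) for _ in range(3))
        tau = rnd.uniform(0.05, 2.0)          # the lemma requires tau in (0, 2]
        r = abs(z - xhat)
        x = xhat + rnd.uniform(-r, r)         # x in the closed ball, as (81) requires
        d = delta(z, eta, xhat)
        H = math.exp(eta)                     # Hessian at eta defines the norms
        a = H * (x - xhat) ** 2               # ||x - xhat||^2 in the Hessian norm
        b = H * (x - z) ** 2                  # ||x - z||^2 in the Hessian norm
        lhs79 = (math.exp(z) - math.exp(xhat)) * (x - xhat)
        rhs79 = (1 - d) * (2 - tau) / 2 * a - (1 + d) / (2 * tau) * b
        lhs81 = math.exp(z) * (x - xhat)
        rhs81 = (math.exp(x) - math.exp(xhat)
                 + ((1 - d) * (1 - tau) - 2 * d) / 2 * a
                 - (1 + d) / (2 * tau) * b)
        s79 = min(s79, lhs79 - rhs79)
        s81 = min(s81, lhs81 - rhs81)
    return s79, s81

if __name__ == "__main__":
    print(min_slack_b3())
```

Note that (79) holds for all $x$, whereas the sampling here restricts $x$ to the ball so that the same points can be used for (81).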

We can also derive the following alternate result:

Lemma B.4

Suppose $J \in C^2(X)$ with $\nabla ^2 J({\widehat{x}}) > 0$ at given ${\widehat{x}}\in X$. Then for all $z, x, \eta \in X$ we have

$$\begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle \ge \frac{1-\delta _{z,\eta }}{2}\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} + \frac{1-\delta _{z,\eta }}{2}\Vert z-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} - \frac{1}{2}\Vert x-z\Vert ^2_{\nabla ^2 J(\eta )} \end{aligned}$$

(84)

for $\delta _{z,\eta }$ given by (80). If $x \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})$, then also

$$\begin{aligned} \begin{aligned} \langle \nabla J(z),x-{\widehat{x}}\rangle \ge&-\delta _{z,\eta }\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} + \frac{1-\delta _{z,\eta }}{2}\Vert z-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} - \frac{1}{2}\Vert x-z\Vert ^2_{\nabla ^2 J(\eta )} \\&+ J(x)-J({\widehat{x}}). \end{aligned} \end{aligned}$$

(85)

Proof

By Taylor expansion, for some $\zeta $ between z and ${\widehat{x}}$, we have

$$\begin{aligned} \begin{aligned} \langle \nabla J(z)-\nabla J({\widehat{x}}),x-{\widehat{x}}\rangle&=\langle \nabla ^2 J(\zeta )(z-{\widehat{x}}),x-{\widehat{x}}\rangle \\&= \langle \nabla ^2 J(\eta )(z-{\widehat{x}}),x-{\widehat{x}}\rangle \\&\quad +\,\langle [\nabla ^2 J(\zeta )-\nabla ^2 J(\eta )](z-{\widehat{x}}),x-{\widehat{x}}\rangle \\&\ge \langle \nabla ^2 J(\eta )(z-{\widehat{x}}),x-{\widehat{x}}\rangle \\&\quad -\, \frac{\delta _{z,\eta }}{2}\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} - \frac{\delta _{z,\eta }}{2}\Vert z-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )}. \end{aligned} \end{aligned}$$

(86)

In the last step we have used Cauchy’s inequality, and the definition of $\delta _{z,\eta }$ following $\zeta \in {{\mathrm{cl}}}B(\Vert z-{\widehat{x}}\Vert , {\widehat{x}})$. The standard three-point or Pythagoras’ identity states

$$\begin{aligned} \langle \nabla ^2 J(\eta )(z-{\widehat{x}}),x-{\widehat{x}}\rangle = \frac{1}{2}\Vert z-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} + \frac{1}{2}\Vert x-{\widehat{x}}\Vert ^2_{\nabla ^2 J(\eta )} - \frac{1}{2}\Vert x-z\Vert ^2_{\nabla ^2 J(\eta )}. \end{aligned}$$

Applying this in (86), we obtain (84).

To prove (85), we use (83), the definition of $\delta _{z,\eta }$, and (84). $\square $
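The estimates (84) and (85) can be checked in the same one-dimensional setting $J(x)=e^x$, which we assume purely for illustration because $\delta _{z,\eta }$ from (80) is then available in closed form. The helper names `delta` and `min_slack_b4` below are hypothetical.

```python
import math
import random

def delta(z, eta, xhat):
    """delta_{z,eta} from (80) for J(x) = exp(x) on the real line."""
    r = abs(z - xhat)
    return max(0.0, math.exp(xhat + r - eta) - 1.0, 1.0 - math.exp(xhat - r - eta))

def min_slack_b4(trials=2000, seed=4):
    """Smallest observed slack in (84) and (85) for J = exp."""
    rnd = random.Random(seed)
    s84 = s85 = math.inf
    for _ in range(trials):
        xhat, z, eta = (rnd.uniform(-1.0, 1.0) for _ in range(3))
        r = abs(z - xhat)
        x = xhat + rnd.uniform(-r, r)         # x in the closed ball, as (85) requires
        d = delta(z, eta, xhat)
        H = math.exp(eta)                     # Hessian at eta defines the norms
        a = H * (x - xhat) ** 2               # ||x - xhat||^2 in the Hessian norm
        c = H * (z - xhat) ** 2               # ||z - xhat||^2 in the Hessian norm
        b = H * (x - z) ** 2                  # ||x - z||^2 in the Hessian norm
        lhs84 = (math.exp(z) - math.exp(xhat)) * (x - xhat)
        rhs84 = (1 - d) / 2 * a + (1 - d) / 2 * c - b / 2
        lhs85 = math.exp(z) * (x - xhat)
        rhs85 = -d * a + (1 - d) / 2 * c - b / 2 + math.exp(x) - math.exp(xhat)
        s84 = min(s84, lhs84 - rhs84)
        s85 = min(s85, lhs85 - rhs85)
    return s84, s85

if __name__ == "__main__":
    print(min_slack_b4())
```

Compared with Lemma B.3, there is no parameter $\tau $ here; the price is the fixed factor $\frac{1}{2}$ on $\Vert x-z\Vert ^2_{\nabla ^2 J(\eta )}$.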

Appendix C: Projected Gradients and Smoothness

The next lemma generalises well-known properties (see, e.g., [22]) of smooth convex functions to projected gradients, when we take $P$ as a projection operator. With $P$ a random projection, taking the expectation in (89), we in particular obtain a connection to the Expected Separable Over-approximation property in the stochastic coordinate descent literature [34].

Lemma C.1

Let $J \in \mathrm {cpl}(X)$, and $P \in \mathcal {L}(X; X)$ be self-adjoint and positive semi-definite on a Hilbert space X. Suppose P has a pseudo-inverse $P^\dag $ satisfying $ P P^\dag P = P$. Consider the properties:

(i)
P-relative Lipschitz continuity of $\nabla J$ with factor L:
$$\begin{aligned} \Vert \nabla J(x)-\nabla J(y)\Vert _P \le L \Vert x-y\Vert _{P^\dag } \quad (x, y \in X). \end{aligned}$$
(87)
(ii)
The P-relative property
$$\begin{aligned} \langle \nabla J(x+Ph) - \nabla J(x),Ph\rangle \le L\Vert h\Vert _P^2 \quad (x, h \in X). \end{aligned}$$
(88)
(iii)
P-relative smoothness of J with factor L:
$$\begin{aligned} J(x+Ph) \le J(x) + \langle \nabla J(x),Ph\rangle +\frac{L}{2}\Vert h\Vert _P^2 \quad (x, h \in X). \end{aligned}$$
(89)
(iv)
The P-relative property
$$\begin{aligned} J(y) \le J(x) + \langle \nabla J(y),y-x\rangle -\frac{1}{2L}\Vert \nabla J(x)-\nabla J(y)\Vert _P^2 \quad (x, y \in X). \end{aligned}$$
(90)
(v)
P-relative co-coercivity of $\nabla J$ with factor $L^{-1}$:
$$\begin{aligned} L^{-1} \Vert \nabla J(x)-\nabla J(y)\Vert _P^2 \le \langle \nabla J(x)-\nabla J(y),x-y\rangle \quad (x, y \in X). \end{aligned}$$
(91)

We have (i) $\implies $ (ii) $\iff $ (iii) $\implies $ (iv) $\implies $ (v). If P is invertible, all are equivalent.

Proof

(i) $\implies $ (ii): Take $y=x+Ph$ and multiply (87) by $\Vert h\Vert _P$. Then use Cauchy–Schwarz.

(ii) $\implies $ (iii): Using the mean value theorem and (88), we compute (89):

$$\begin{aligned} \begin{aligned}&J(x+Ph) - J(x) - \langle \nabla J(x),Ph\rangle =\int _0^1 \langle \nabla J(x+tPh),Ph\rangle \,dt - \langle \nabla J(x),Ph\rangle \\&\quad =\int _0^1 \langle \nabla J(x+tPh)-\nabla J(x),Ph\rangle \,dt \le \int _0^1 t \,dt \cdot L\Vert h\Vert _P^2 = \frac{L}{2} \Vert h\Vert _P^2. \end{aligned} \end{aligned}$$

(iii) $\implies $ (ii): Add together (89) for $x=x'$ and $x=x'+Ph$.

(iii) $\implies $ (iv): Adding $-\langle \nabla J(y),x+Ph\rangle $ on both sides of (89), we get

$$\begin{aligned} J(x+Ph) - \langle \nabla J(y),x+Ph\rangle \le J(x) - \langle \nabla J(y),x\rangle + \langle \nabla J(x)-\nabla J(y),Ph\rangle +\frac{L}{2}\Vert h\Vert _P^2. \end{aligned}$$

The left-hand side is minimised with respect to $x$ by taking $x=y-Ph$. Taking on the right-hand side $h=L^{-1}(\nabla J(y)-\nabla J(x))$ therefore gives (90).

(iv) $\implies $ (v): Summing the estimate (90) with the same estimate with x and y exchanged, we obtain (91).

(v) $\implies $ (i) when P is invertible: Cauchy–Schwarz. $\square $
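For a concrete instance of Lemma C.1, one may take $P$ to be the orthogonal projection onto a block of coordinates, so that $P^\dag =P$, and $J(x)=\frac{1}{2}\langle Ax,x\rangle $ with $A$ positive definite. Under these assumptions, which are ours and not the article's, property (88) holds with $L$ equal to the largest eigenvalue of $PAP$, and the lemma then predicts (89), (90) and (91) with the same $L$. The sketch below checks this on random samples; the name `check_projected_smoothness` is hypothetical.

```python
import numpy as np

def check_projected_smoothness(n=6, k=3, trials=1000, seed=2):
    """Smallest observed slack in (88)-(91) for a quadratic and a coordinate projection."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    A = M.T @ M + np.eye(n)                        # positive definite Hessian
    P = np.diag((np.arange(n) < k).astype(float))  # projection onto the first k coordinates
    L = np.linalg.eigvalsh(P @ A @ P)[-1]          # smallest factor for which (88) holds
    J = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x
    norm_p2 = lambda v: v @ P @ v                  # squared P-seminorm <Pv, v>
    worst = {"88": np.inf, "89": np.inf, "90": np.inf, "91": np.inf}
    for _ in range(trials):
        x, y, h = rng.standard_normal((3, n))
        Ph = P @ h
        g = grad(x) - grad(y)
        # Each entry is (right-hand side) - (left-hand side) of the named inequality.
        worst["88"] = min(worst["88"], L * norm_p2(h) - (grad(x + Ph) - grad(x)) @ Ph)
        worst["89"] = min(worst["89"], J(x) + grad(x) @ Ph + L / 2 * norm_p2(h) - J(x + Ph))
        worst["90"] = min(worst["90"], J(x) + grad(y) @ (y - x) - norm_p2(g) / (2 * L) - J(y))
        worst["91"] = min(worst["91"], g @ (x - y) - norm_p2(g) / L)
    return worst

if __name__ == "__main__":
    for name, slack in check_projected_smoothness().items():
        print(f"min slack ({name}): {slack:.3e}")
```

Property (87) is deliberately not tested: for a non-invertible $P$ it generally fails for this $J$, since $\Vert x-y\Vert _{P^\dag }$ vanishes whenever $x-y$ lies in the null space of $P$. This is consistent with the lemma, which only asserts the implication from (i) to the remaining properties.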

About this article

Cite this article

Valkonen, T. Testing and Non-linear Preconditioning of the Proximal Point Method. Appl Math Optim 82, 591–636 (2020). https://doi.org/10.1007/s00245-018-9541-6

Published: 28 November 2018
Issue Date: October 2020
DOI: https://doi.org/10.1007/s00245-018-9541-6
