On the convergence rate of the Halpern-iteration

Abstract

In this work, we give a tight estimate of the rate of convergence for the Halpern-iteration for approximating a fixed point of a nonexpansive mapping in a Hilbert space. Specifically, using semidefinite programming and duality we prove that the norm of the residuals is upper bounded by the distance of the initial iterate to the closest fixed point divided by the number of iterations plus one.

Introduction

Let H be a Hilbert space equipped with a symmetric inner product \(\langle . , . \rangle :H \times H \rightarrow {\mathbb {R}}\). Let \(T:H \rightarrow H\) be a nonexpansive mapping and consider for fixed \(x_0\in H\) the Halpern-Iteration

$$\begin{aligned} x_{k+1}:= \lambda _k x_0 + (1- \lambda _k)T(x_k) \end{aligned}$$
(1)

from [5] with \(\lambda _k:= \tfrac{1}{k+2}\) for approximating a fixed point of T. Let \(\Vert x \Vert :=\sqrt{\langle x , x \rangle }\) denote the induced norm and \( Fix(T):= \{ x \in H \ : \ x=T(x) \}\) the set of fixed points of T. It is well known that, if the set Fix(T) is nonempty, then the sequence \( \{x_k\}_{k\in {\mathbb {N}}_0}\) converges to the point \(x_*\in Fix(T)\) that minimizes the distance to \(x_0\); see [17, Theorem 2] and [18] for generalizations of this remarkable property. As a consequence the norm of the residuals \(x_k -T(x_k)\) tends to zero, i.e. \( \lim _{k \rightarrow \infty } \Vert x_k -T(x_k) \Vert =0\). Our goal here is to quantify their rate of convergence. A first result of this type was generated via proof mining in [10] in normed spaces (see also [8] and [9] for further details on results in more general spaces). Here, we improve the result for the setting of Hilbert spaces. Our proof technique is not based on proof mining, but on semidefinite programming, and is strongly motivated by the recent work of Taylor et al. [15] on worst case performance of first order minimization methods. Our methodology and focus here are, however, slightly different. We present two new proofs below. The first one is short and uses a parameter choice derived from the technique used in [15]. The second proof, based on semidefinite programming, is self-contained and adapts the framework of [15] to fixed point problems. The second approach can also be applied to other choices of parameters \(\lambda _k\) and to other fixed point methods. The resulting rates are, however, not obvious in general. After the manuscript of this work was made public in late 2017, the author (see section 4.1 and following of [12]) and independently other authors (see e.g. [2, 3, 4, 6, 16]) have studied similar setups, which may all be categorized as part of the performance estimation framework.
Let us briefly sketch how our setup here and in [12] may be applied in the context of proximal point algorithms (e.g. the setup of [2, 6]) by defining the nonexpansive operator \(T:=2J-I\), where I is the identity and J is a firmly nonexpansive operator (e.g. the resolvent operator of a maximal monotone operator). Because the fixed points of J and T are the same, one may now apply the Halpern-Iteration (1) to find a fixed point of J. The (tight) convergence rate is then implied by Theorem 2.1 below.
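The construction above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than part of the analysis: the helper names `halpern`, `project_box` and `reflect_box` are hypothetical, and J is taken to be the projection onto the box \([-1,1]^2\), one simple example of a firmly nonexpansive operator. The sketch runs iteration (1) with \(\lambda _k = 1/(k+2)\) on \(T=2J-I\) and compares the halved residuals with the bound (2).

```python
import math

def halpern(T, x0, num_steps):
    """Return the iterates x_0, ..., x_{num_steps} of (1) with lambda_k = 1/(k+2)."""
    xs = [list(x0)]
    for k in range(num_steps):
        lam = 1.0 / (k + 2)
        Tx = T(xs[-1])
        xs.append([lam * a + (1.0 - lam) * b for a, b in zip(x0, Tx)])
    return xs

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def project_box(x, lo=-1.0, hi=1.0):
    """J: projection onto the box [lo, hi]^n (assumed example of a firmly nonexpansive map)."""
    return [min(max(c, lo), hi) for c in x]

def reflect_box(x):
    """T := 2J - I, which is nonexpansive and has the same fixed points as J."""
    return [2.0 * p - c for p, c in zip(project_box(x), x)]

x0 = [3.0, -0.5]
xs = halpern(reflect_box, x0, 50)

# The fixed point closest to x0 is its projection onto the box.
x_star = project_box(x0)
R = norm([a - b for a, b in zip(x0, x_star)])

# Halved residual norms 1/2 * ||x_k - T(x_k)|| for k = 0, ..., 50.
half_residuals = [0.5 * norm([a - b for a, b in zip(x, reflect_box(x))]) for x in xs]

# Bound (2), checked with a small floating-point tolerance.
bound_holds = all(r <= R / (k + 1) + 1e-12 for k, r in enumerate(half_residuals))
print(bound_holds)
```

Any other firmly nonexpansive J (for instance a resolvent) could be substituted for `project_box` without changing the iteration itself.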

Main result

Theorem 2.1

Let \( x_0 \in H\) be arbitrary but fixed. If T has fixed points, i.e. \( Fix(T) \not = \emptyset \), then the iterates defined in (1) satisfy

$$\begin{aligned} \tfrac{1}{2} \Vert x_k - T(x_k) \Vert \le \frac{ \Vert x_0 - x_* \Vert }{k+1} \ \forall k \in {\mathbb {N}}_0 \quad \forall x_* \in Fix(T) \end{aligned}$$
(2)

This bound is tight.

Remark 2.1

A generalization of the Halpern-Iteration, the sequential averaging method (SAM), was analyzed in the recent paper [14], where for the first time a rate of convergence of order O(1/k) could be established for SAM. The rate of convergence in (2) is even slightly faster than the one established for the more general framework in [14] (by a factor of 4). More importantly, however, as shown by Example 3.1 below, the estimate (2) is actually tight, in the sense that for every \(k\in {\mathbb {N}}_0\) there exists a Hilbert space H and a nonexpansive operator T with some fixed point \(x_*\) such that the inequality (2) is not strict.

Estimate (2) refers to the step length \(\lambda _k:=1/(k+2)\). The restriction to this choice is motivated by problem (17) below in the proof based on semidefinite programming; in numerical tests for small dimensions k these coefficients provided a better worst-case complexity than any other choice of coefficients.

Next, an elementary direct proof of Theorem 2.1 is given.

Direct proof based on a weighted sum

The iteration (1) with \(\lambda _k=1/(k+2)\) implies for \(1\le j\le k\):

$$\begin{aligned} \textstyle x_j = \frac{1}{j+1}x_0 + \frac{j}{j+1}T(x_{j-1}) \quad \hbox {or}\,\, \ T(x_{j-1}) = \frac{j+1}{j}x_j-\frac{1}{j}x_0 . \end{aligned}$$
(3)

By nonexpansiveness the following inequalities hold:

$$\begin{aligned} {\Vert T(x_k)-x_*\Vert ^2}\le {\Vert x_k-x_*\Vert ^2} \quad \hbox {for}\,\, x_* \in Fix(T) \end{aligned}$$
(4)

and

$$\begin{aligned} {\Vert T(x_j)-T(x_{j-1})\Vert ^2}\le {\Vert x_j-x_{j-1}\Vert ^2} \quad \hbox {for} \,\, j=1,\ldots ,k. \end{aligned}$$
(5)

Below we reformulate the following weighted sum of (5):

$$\begin{aligned} 0 \ge \sum _{j=1}^kj(j+1)\left( {\Vert T(x_j)-T(x_{j-1})\Vert ^2}- {\Vert x_j-x_{j-1}\Vert ^2} \right) . \end{aligned}$$
(6)

Using the second relation in (3) the first terms in the summation (6) are

$$\begin{aligned}&j(j+1){\Vert T(x_j)-T(x_{j-1})\Vert ^2} \nonumber \\&\quad = j(j+1) {\Vert x_j-T(x_j)+\frac{1}{j} (x_j-x_0)\Vert ^2} \nonumber \\&\quad = \textstyle j(j+1){\Vert x_j-T(x_j)\Vert ^2}+2(j+1)\langle x_j-T(x_j) , x_j-x_0 \rangle +\frac{j+1}{j} {\Vert x_j-x_0\Vert ^2}, \nonumber \\ \end{aligned}$$
(7)

and using the first relation in (3) it follows for the second terms in (6)

$$\begin{aligned}&-j(j+1){\Vert x_j-x_{j-1}\Vert ^2} \nonumber \\&\quad = \textstyle - j(j+1) {\Vert \frac{1}{j+1}(x_0-T(x_{j-1}))+T(x_{j-1}) -x_{j-1}\Vert ^2} \nonumber \\&\quad = \textstyle -\frac{j}{j+1} {\Vert x_0-T(x_{j-1})\Vert ^2}-2j\langle x_0-T(x_{j-1}) , T(x_{j-1}) -x_{j-1} \rangle \nonumber \\&\qquad -j(j+1) {\Vert T(x_{j-1}) -x_{j-1}\Vert ^2}. \end{aligned}$$
(8)

Observe [using again the second relation in (3)] that the first term in (8)

$$\begin{aligned} -\frac{j}{j+1} {\Vert x_0-T(x_{j-1})\Vert ^2}= & {} - \frac{j}{j+1} {\Vert \frac{j+1}{j}x_0-\frac{j+1}{j}x_j\Vert ^2} \nonumber \\= & {} - \frac{j+1}{j}{\Vert x_0-x_j\Vert ^2} \end{aligned}$$
(9)

cancels the third term in (7). Summing up the second terms in (8) for \(j=1,\ldots ,k\) we shift the summation index,

$$\begin{aligned} -\sum _{j=1}^k 2j\langle x_0-T(x_{j-1}) , T(x_{j-1})-x_{j-1} \rangle =\sum _{j=0}^{k-1}2(j+1)\langle x_{j}-T(x_{j}) , x_0-T(x_{j}) \rangle , \end{aligned}$$

so that summing up the second terms in (7) and in (8) for \(j=1,\ldots ,k\) results in

$$\begin{aligned}&2(k+1)\langle x_k-T(x_k) , x_k-x_0 \rangle \nonumber \\&\quad +2\sum _{j=1}^{k-1}(j+1) \langle x_j-T(x_j) , x_j-T(x_j) \rangle +2{\Vert x_0-T(x_0)\Vert ^2}. \end{aligned}$$
(10)

Shifting again the index in the summation of the third terms in (8)

$$\begin{aligned} -\sum _{j=1}^k j(j+1){\Vert x_{j-1}-T(x_{j-1})\Vert ^2}= -\sum _{j=0}^{k-1} (j+1)(j+2){\Vert x_{j}-T(x_{j})\Vert ^2} \end{aligned}$$

and summing up the first terms in (7) and the third terms in (8) for \(j=1,\ldots ,k\) gives

$$\begin{aligned} k(k+1){\Vert x_{k}-T(x_{k})\Vert ^2}-2\sum _{j=1}^{k-1}(j+1){\Vert x_j-T(x_j)\Vert ^2}-2{\Vert x_0-T(x_0)\Vert ^2} \end{aligned}$$
(11)

where the sum in the middle cancels the sum in the middle of (10) and the terms \(2{\Vert x_0-T(x_0)\Vert ^2}\) cancel as well. The only remaining terms are the first terms in (10) and (11).

Thus, inserting (9), (10), and (11) in (6) leads to

$$\begin{aligned} 0 \ge k(k+1){\Vert x_k-T(x_k)\Vert ^2}+2(k+1)\langle x_k-T(x_k) , x_k-x_0 \rangle . \end{aligned}$$
(12)

Applying the Cauchy–Schwarz inequality to the second term in (12) leads to

$$\begin{aligned} \frac{1}{2} \Vert x_k-T(x_k)\Vert \le \frac{1}{k} \Vert x_k-x_0\Vert \end{aligned}$$

which may be interesting in its own right. To prove the theorem, (12) is divided by \(k+1\) and then (4) is added:

$$\begin{aligned} 0\ge & {} k{\Vert x_k-T(x_k)\Vert ^2}+2\langle x_k-T(x_k) , x_k-x_0 \rangle + {\Vert T(x_k)-x_*\Vert ^2}- {\Vert x_k-x_*\Vert ^2}\nonumber \\= & {} \textstyle \frac{k+1}{2} {\Vert x_k-T(x_k)\Vert ^2}-\frac{2}{k+1}{\Vert x_0-x_*\Vert ^2} \nonumber \\&+ \frac{2}{k+1}{\Vert x_0-x_*- \frac{k+1}{2}(x_k-T(x_k))\Vert ^2}. \end{aligned}$$
(13)

To see the last equation, the last two terms in (13) can be combined, and then a straightforward but tedious multiplication of the terms \(a:=x_k-T(x_k)\), \(b:= x_k-x_0\), \(c:= T(x_k)-x_*\), \(a+c=x_k-x_*\), and \(a+c-b =x_0-x_*\) reveals the identity.
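For readers who wish to check the last equality in (13), the multiplication can be sketched as follows. The abbreviation \(d:= x_0-x_* = a+c-b\) is introduced here only for this sketch. Using \(b-c=a-d\), the first line of (13) becomes

$$\begin{aligned} k\Vert a\Vert ^2 + 2\langle a , b \rangle + \Vert c\Vert ^2 - \Vert a+c\Vert ^2&= (k-1)\Vert a\Vert ^2 + 2\langle a , b-c \rangle \\&= (k+1)\Vert a\Vert ^2 - 2\langle a , d \rangle , \end{aligned}$$

while expanding the square in the last term of (13) gives

$$\begin{aligned} \tfrac{2}{k+1}\Vert d - \tfrac{k+1}{2}a\Vert ^2 = \tfrac{2}{k+1}\Vert d\Vert ^2 - 2\langle a , d \rangle + \tfrac{k+1}{2}\Vert a\Vert ^2 , \end{aligned}$$

so that the right-hand side of (13) also equals \((k+1)\Vert a\Vert ^2 - 2\langle a , d \rangle \).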

Omitting the last term in (13) one obtains

$$\begin{aligned} {\Vert x_k-T(x_k)\Vert ^2}\le \left( \tfrac{2}{k+1}\right) ^2 {\Vert x_0-x_*\Vert ^2} \end{aligned}$$

which proves the theorem when taking square roots on both sides. \(\square \)
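As a numerical sanity check of the intermediate inequality (12), the following Python sketch evaluates its right-hand side along the Halpern iterates. The operator is an assumption made for illustration only: a rotation of the plane followed by the projection onto the unit disk, which is nonexpansive as a composition of nonexpansive maps. The names `T`, `dot` and `rhs_12` are likewise hypothetical.

```python
import math

def T(x, theta=1.0):
    """Rotate by theta, then project onto the unit disk (assumed nonexpansive test operator)."""
    c, s = math.cos(theta), math.sin(theta)
    y = (c * x[0] - s * x[1], s * x[0] + c * x[1])
    scale = max(1.0, math.hypot(y[0], y[1]))
    return (y[0] / scale, y[1] / scale)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

x0 = (4.0, -2.0)
x = x0
rhs_12 = []  # right-hand side of (12) for k = 1, 2, ...
for k in range(100):
    Tx = T(x)
    if k >= 1:
        r = (x[0] - Tx[0], x[1] - Tx[1])   # residual x_k - T(x_k)
        d = (x[0] - x0[0], x[1] - x0[1])   # x_k - x_0
        rhs_12.append(k * (k + 1) * dot(r, r) + 2 * (k + 1) * dot(r, d))
    lam = 1.0 / (k + 2)
    x = (lam * x0[0] + (1 - lam) * Tx[0], lam * x0[1] + (1 - lam) * Tx[1])

# (12) states that every entry is nonpositive; a small tolerance absorbs rounding.
inequality_12_holds = all(v <= 1e-9 for v in rhs_12)
print(inequality_12_holds)
```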

The above proof is somewhat unintuitive as the choice of the weights with which the inequalities (4) and (5) are added in (6) and (13) is far from obvious. In fact we owe the suggestion of these weights to an extremely helpful anonymous referee, who extracted it from a more complex construction in [15] which was also the basis for the initial proof of this paper based on semidefinite programming. We state this proof next since it offers a generalizable approach for analyzing fixed point iterations; it can be adapted to the KM iteration, as done for example in the recent thesis [12], though this adaptation is quite technical. The proof based on semidefinite programming also led to Example 3.1 below showing that the rate of convergence is tight.

Proof based on semidefinite programming

Let \(x_* \in Fix(T)\). The Halpern-Iteration was stated in the form (1) to comply with existing literature. For our proof, however, it is more convenient to consider the shifted sequence \({\bar{x}}_1:= x_0\) and \({\bar{x}}_k:=x_{k-1} \ \forall k \in {\mathbb {N}}_{\not = 0 }:=\{1,2,3,\dots \} \) and to show a shifted statement

$$\begin{aligned} \tfrac{1}{2} \Vert {\bar{x}}_k - T({\bar{x}}_k) \Vert \le \frac{\Vert {\bar{x}}_1-x_* \Vert }{k} \quad \forall \,\, k\in {\mathbb {N}}_{\not =0 } \end{aligned}$$
(14)

Let us define \(g(x):=\frac{1}{2}(x-T(x))\). It is well known that g is firmly nonexpansive. For sake of completeness the argument is repeated here:

$$\begin{aligned} \begin{aligned}&\Vert g(x)-g(y) \Vert ^2 - \langle g(x)-g(y) , x-y \rangle \\&\quad = \Vert g(x)-g(y) - \tfrac{1}{2}(x-y) \Vert ^2 - \tfrac{1}{4} \Vert x-y \Vert ^2 \\&\quad = \tfrac{1}{4} \Vert T(x)-T(y) \Vert ^2 - \tfrac{1}{4}\Vert x-y \Vert ^2 \le 0 \quad \forall \,\, x,y\in H. \end{aligned} \end{aligned}$$

Firm nonexpansiveness of g together with the Cauchy–Schwarz inequality also implies \( \Vert g(x)-g(y) \Vert \le \Vert x-y \Vert \ \forall x,y\in H\). For \( k =1\) the statement (14) follows immediately since \(g(x_*)=0\) and therefore \( \tfrac{1}{2} \Vert {\bar{x}}_1 - T( {\bar{x}}_1) \Vert = \Vert g({\bar{x}}_1 ) \Vert = \Vert g({\bar{x}}_1) - g( x_*) \Vert \le \tfrac{\Vert {\bar{x}}_1-x_* \Vert }{1}\).
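The firm nonexpansiveness of g can also be checked numerically. The sketch below is only an illustration: the operator T (a rotation followed by the projection onto the unit disk) and all helper names are assumptions, not taken from the text. It samples random pairs x, y and records the largest value of \(\Vert g(x)-g(y) \Vert ^2 - \langle g(x)-g(y) , x-y \rangle \), which should not exceed zero beyond rounding errors.

```python
import math
import random

def T(x, theta=2.0):
    """Rotate by theta, then project onto the unit disk (assumed nonexpansive test operator)."""
    c, s = math.cos(theta), math.sin(theta)
    y = (c * x[0] - s * x[1], s * x[0] + c * x[1])
    scale = max(1.0, math.hypot(y[0], y[1]))
    return (y[0] / scale, y[1] / scale)

def g(x):
    """g(x) = (x - T(x)) / 2 as defined in the proof."""
    Tx = T(x)
    return (0.5 * (x[0] - Tx[0]), 0.5 * (x[1] - Tx[1]))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

rng = random.Random(0)  # fixed seed so that the run is reproducible
worst_gap = -math.inf
for _ in range(2000):
    x = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    y = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    dg, dxy = sub(g(x), g(y)), sub(x, y)
    # ||g(x) - g(y)||^2 - <g(x) - g(y), x - y>, nonpositive by firm nonexpansiveness
    worst_gap = max(worst_gap, dot(dg, dg) - dot(dg, dxy))

print(worst_gap <= 1e-12)
```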

For fixed \(k\ge 2\) we first consider the differences \({\bar{x}}_j - {\bar{x}}_1\) for \( j \in \{2,\dots ,k \}\)

$$\begin{aligned} \begin{aligned} {\bar{x}}_{j} -{\bar{x}}_1&= x_{j-1} - {\bar{x}}_1\\&= \lambda _{j-2} x_0 + (1- \lambda _{j-2}) T(x_{j-2}) - {\bar{x}}_1\\&= \tfrac{1}{j} x_0 + \left( 1- \tfrac{1}{j}\right) T(x_{j-2} ) - {\bar{x}}_1 \\&= \left( \tfrac{1}{j}-1\right) {\bar{x}}_1 + \left( 1- \tfrac{1}{j}\right) T( {\bar{x}}_{j-1} ) \\&= \left( \tfrac{1}{j} -1\right) {\bar{x}}_1 + \left( 1- \tfrac{1}{j}\right) ({\bar{x}}_{j-1} - 2 g( {\bar{x}}_{j-1}) ) \\&= \left( 1-\tfrac{1}{j}\right) ({\bar{x}}_{j-1} - {\bar{x}}_1) - 2 \left( 1- \tfrac{1}{j}\right) g({\bar{x}}_{j-1} ) \\&= \tfrac{j-1}{j} ({\bar{x}}_{j-1} - {\bar{x}}_1) - 2 \tfrac{j-1}{j} g({\bar{x}}_{j-1} ) \end{aligned} \end{aligned}$$

which inductively leads to

$$\begin{aligned} {\bar{x}}_{j} -{\bar{x}}_1=- 2 \sum _{l=1}^{j-1} \tfrac{l}{j} g({\bar{x}}_{l}). \end{aligned}$$

Let us shorten the notation slightly and define \( g_i:= g({\bar{x}}_i) \), \( R:= \Vert {\bar{x}}_1 -x_* \Vert \ge 0\), the vector \(b=(\langle g_i , {\bar{x}}_1-x_* \rangle )_{i=1}^k\), the matrices \(A:= (\langle g_i , g_j \rangle )_{i,j=1}^k\) and

$$\begin{aligned} L:=-2 \begin{pmatrix} 0 &{} \quad \tfrac{1}{2} &{} \quad \tfrac{1}{3} &{} \quad \ldots &{} \quad \tfrac{1}{k}\\ 0 &{} \quad 0 &{} \quad \tfrac{2}{3} &{} \quad \ldots &{} \quad \tfrac{2}{k}\\ \vdots &{} \quad \vdots &{} \quad 0 &{} \quad \ddots &{} \quad \vdots \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 &{} \quad \tfrac{k-1}{k} \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 \end{pmatrix} \in {\mathbb {R}}^{k\times k}. \end{aligned}$$

Let \(b^T\) denote the transpose of b. Note that

$$\begin{aligned} \begin{pmatrix} R^2 &{} \quad b^T \\ b &{} \quad A \end{pmatrix} \in {\mathbb {R}}^{(k+1) \times (k+1)} \end{aligned}$$

is a Gramian matrix formed from \( {\bar{x}}_1-x_*,g_1,\dots ,g_k\in H\) and is therefore symmetric and positive semidefinite. We proceed by expressing the inequalities from firm nonexpansiveness in terms of the Gram matrix. Since L is often of much lower dimension than H, this is sometimes referred to as the “kernel trick”. Keeping in mind that we can rewrite the differences \( {\bar{x}}_j - {\bar{x}}_1 = - 2 \sum _{l=1}^{j-1} \frac{l}{j} g_l\) for \(j\in \{1,\dots ,k\}\), we arrive at

$$\begin{aligned} AL = ( \langle g_i , {\bar{x}}_j - {\bar{x}}_1 \rangle )_{i,j=1}^k. \end{aligned}$$

Let \(e \in {\mathbb {R}}^k\) denote the vector of all ones. Then

$$\begin{aligned} \mathrm {diag} (AL) e^T - AL= ( \langle g_i , {\bar{x}}_i - {\bar{x}}_j \rangle )_{i,j=1}^k, \end{aligned}$$

where \(\mathrm {diag}(.)\) denotes the diagonal of its matrix argument, holds true. Hence

$$\begin{aligned} \mathrm {diag} (AL) e^T +e \ \mathrm {diag} (AL)^T - AL -L^T A= ( \langle g_i-g_j , {\bar{x}}_i - {\bar{x}}_j \rangle )_{i,j=1}^k \end{aligned}$$

and

$$\begin{aligned} b e^T + AL= & {} ( \langle g_i , {\bar{x}}_j - x_* \rangle )_{i,j=1}^k,\\ \mathrm {diag} (A) e^T + e \ \mathrm {diag}(A)^T -2 A= & {} (\Vert g_i-g_j \Vert ^2)_{i,j=1}^k. \end{aligned}$$
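The identity \(AL = ( \langle g_i , {\bar{x}}_j - {\bar{x}}_1 \rangle )_{i,j=1}^k\), on which the other matrix identities build, can be verified in exact rational arithmetic. The following sketch assumes a concrete operator for illustration, namely the rotation of the plane by 90 degrees (a linear isometry with rational entries); the variable names are hypothetical.

```python
from fractions import Fraction

def T(x):
    """Rotation by 90 degrees: an isometry, hence nonexpansive (assumed test operator)."""
    return (-x[1], x[0])

def g(x):
    Tx = T(x)
    return ((x[0] - Tx[0]) / 2, (x[1] - Tx[1]) / 2)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

k = 6
x0 = (Fraction(3), Fraction(1))

# Shifted iterates: xbar[1] = x_0 and xbar[j] = x_{j-1}; index 0 is unused.
xbar = [None, x0]
for j in range(2, k + 1):
    lam = Fraction(1, j)  # lambda_{j-2} = 1/j
    Tx = T(xbar[j - 1])
    xbar.append(tuple(lam * a + (1 - lam) * b for a, b in zip(x0, Tx)))

gs = [None] + [g(xbar[i]) for i in range(1, k + 1)]

# A is the Gram matrix of g_1, ..., g_k; L has the entry -2*l/j at position (l, j) for l < j.
A = [[dot(gs[i], gs[j]) for j in range(1, k + 1)] for i in range(1, k + 1)]
L = [[Fraction(-2 * l, j) if l < j else Fraction(0) for j in range(1, k + 1)]
     for l in range(1, k + 1)]

AL = [[sum(A[i][m] * L[m][j] for m in range(k)) for j in range(k)] for i in range(k)]
direct = [[dot(gs[i], (xbar[j][0] - x0[0], xbar[j][1] - x0[1])) for j in range(1, k + 1)]
          for i in range(1, k + 1)]

identity_holds = AL == direct  # exact comparison, no rounding involved
print(identity_holds)
```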

The firm nonexpansiveness inequalities \( \Vert g_i- g_j \Vert ^2 \le \langle g_i-g_j , {\bar{x}}_i-{\bar{x}}_j \rangle \) are equivalent to the component-wise inequality

$$\begin{aligned} \mathrm {diag} (A) e^T + e \ \mathrm {diag}(A)^T -2 A\le & {} \mathrm {diag} (AL) e^T +e \ \mathrm {diag} (AL)^T \nonumber \\&- AL -L^T A. \end{aligned}$$
(15)

Note that only \( \tfrac{k^2-k}{2}\) of these componentwise inequalities are non-redundant. From \(g_*:=g(x_*)=0\) we obtain another k inequalities, i.e. \( \Vert g_i \Vert ^2 \le \langle g_i , {\bar{x}}_i-x_* \rangle \), which translate to

$$\begin{aligned} \mathrm {diag}(A) \le b + \mathrm {diag}(AL). \end{aligned}$$
(16)

Defining \(U:=I-L\) , relations (15) and (16) can be shortened slightly to

$$\begin{aligned} \mathrm {diag} (AU) e^T + e \ \mathrm {diag}(AU)^T \le AU +U^T A \end{aligned}$$

and

$$\begin{aligned} \mathrm {diag}(AU) \le b. \end{aligned}$$

Let \(e_k \in {\mathbb {R}}^k\) denote the k-th unit vector, \( {\mathbb {S}}^n := \{ X \in {\mathbb {R}}^{n \times n }\ | \ X=X^T \}\) denote the space of symmetric matrices and \({\mathbb {S}}_+^n := \{ X \in {\mathbb {S}}^n\ | x^TXx\ge 0 \ \forall x \in {\mathbb {R}}^n \}\) the convex cone of positive semidefinite matrices. Consider the chain of inequalities

$$\begin{aligned}&\begin{aligned} \Vert g({\bar{x}}_k) \Vert ^2 = \underset{ y_0 \in {\mathbb {R}}, y_1 \in {\mathbb {R}}^{k} , Y_2\in {\mathbb {S}}^k}{\text {maximize}} (Y_2)_{kk} \,&| \ \begin{pmatrix} y_0 &{} y_1^T \\ y_1 &{} Y_2 \end{pmatrix} \in {\mathbb {S}}_+^{k+1}, \ y_0\le R^2, \ \mathrm {diag}(Y_2 U) \le y_1 \\&|\ \mathrm {diag}(Y_2 U)e^T + e \ \mathrm {diag}(U^T Y_2)^T \le Y_2U+ U^TY_2 \\&| \ y_0=R^2 ,y_1=b, Y_2= A \\ \\ \end{aligned} \nonumber \\&\begin{aligned} \le \underset{ y_0 \in {\mathbb {R}}, y_1 \in {\mathbb {R}}^{k} , Y_2=Y_2^T \in {\mathbb {S}}^k}{\text {maximize}} (Y_2)_{kk} \&| \ \begin{pmatrix} y_0 &{} y_1^T \\ y_1 &{} Y_2 \end{pmatrix} \in {\mathbb {S}}_+^{k+1}, \ y_0\le R^2, \ \mathrm {diag}(Y_2 U) \le y_1 \\&|\ \mathrm {diag}(Y_2 U)e^T + e \ \mathrm {diag}(U^T Y_2)^T \le Y_2U+ U^TY_2 \\ \\ \end{aligned} \end{aligned}$$
(17)
$$\begin{aligned}&\begin{aligned} \le \underset{ \xi \in {\mathbb {R}}_+, X\in {\mathbb {S}}^k \cap {\mathbb {R}}_+^{k\times k }}{\text {minimize}} \ \ R^2 \xi \ | \&\begin{pmatrix} \xi &{} -\tfrac{1}{2} \mathrm {diag}(X)^T \\ -\tfrac{1}{2} \mathrm {diag}(X) &{} U F(X) + F(X)U^T \end{pmatrix} - \begin{pmatrix} 0&{} 0\\ 0 &{} e_ke_k^T \end{pmatrix} \in {\mathbb {S}}_+^{k+1} \end{aligned} \end{aligned}$$
(18)

for

$$\begin{aligned} F(X) := \mathrm {Diag}(Xe) + \tfrac{1}{2} \mathrm {Diag}(\mathrm {diag}(X))-X, \end{aligned}$$
(19)

where \(\mathrm {Diag}(.)\) denotes the square diagonal matrix with its vector argument on the diagonal. The first equality follows from construction, the first inequality from relaxing, and the second inequality from weak conic duality as detailed in Sect. 5. We conclude the proof by showing feasibility of \({\hat{\xi }} := \frac{1}{k^2} >0\) and

$$\begin{aligned} {\hat{X}} := \frac{1}{k^2} \begin{pmatrix} 0 &{} \quad 1 \cdot 2 &{} \quad 0 &{} \quad \ldots &{} \quad \ldots &{} \quad 0 \\ 1 \cdot 2 &{} \quad 0&{} \quad 2 \cdot 3 &{} \quad 0 &{} \quad \ldots &{} \quad \vdots \\ 0 &{} \quad 2 \cdot 3 &{} \quad \ddots &{} \quad \ddots &{} \quad \ddots &{} \quad \vdots \\ \vdots &{} \quad \ddots &{} \quad \ddots &{} \quad \ddots &{} \quad (k-2)(k-1) &{} \quad 0 \\ 0&{} \quad \ldots &{} \quad 0&{} \quad (k-2)(k-1) &{} \quad 0&{} \quad (k-1)k \\ 0 &{} \quad \ldots &{} \quad 0 &{} \quad 0 &{} \quad (k-1)k &{} \quad 2 k \end{pmatrix} \in {\mathbb {R}}^{k \times k} \end{aligned}$$

for the last optimization problem (18). First note that \({\hat{X}}= {\hat{X}}^T\) is symmetric and nonnegative. A short computation reveals that the equality

$$\begin{aligned} U F({\hat{X}}) + F({\hat{X}})U^T= 2 e_k e_k^T \end{aligned}$$

holds true: Define the diagonal matrix \( D:= \frac{1}{k} \mathrm {Diag}([1,\ldots ,k ]^T ) \in {\mathbb {R}}^{k \times k }\), together with the strict upper triangular matrix

$$\begin{aligned} P:= \begin{pmatrix} 0 &{} \quad 1 &{} \quad \ldots &{} \quad 1\\ &{} \quad \ddots &{} \quad \ddots &{} \quad \vdots \\ &{} \quad &{} \quad \ddots &{} \quad 1 \\ &{} \quad &{} \quad &{} \quad 0 \end{pmatrix} \in {\mathbb {R}}^{k\times k} \end{aligned}$$
(20)

and the bidiagonal matrix

$$\begin{aligned} B:= \begin{pmatrix} 0 &{} \quad 1 &{} \quad &{} \quad \\ 1 &{} \quad \ddots &{} \quad \ddots &{} \quad \\ &{} \quad \ddots &{} \quad \ddots &{} \quad 1 \\ &{} \quad &{} \quad 1 &{} \quad 0 \end{pmatrix} \in {\mathbb {S}}^k. \end{aligned}$$

The matrices \(U, {\hat{X}}\) and \(F({\hat{X}})\) can now be expressed as

$$\begin{aligned} \begin{aligned}&U= I +2 D P D^{-1}, \quad {\hat{X}} = DBD+\frac{2}{k} e_ke_k^T&\quad \text { and } \\&F({\hat{X}} ) = 2D^2-e_ke_k^T -DBD. \end{aligned} \end{aligned}$$
(21)

Combining the equalities (21), \(D e_k =e_k\) and \( D^{-1} e_k =e_k\), yields

$$\begin{aligned} \begin{aligned} U F ({\hat{X}} )&= 2 D^2 -e_k e_k^T- DBD +4 DPD-2 DP e_k e_k^T -2 DP BD \\&= D(2I -e_k e_k^T -B+4 P-2 P e_k e_k^T -2 PB) D \end{aligned} \end{aligned}$$
(22)

and using (22) we compute

$$\begin{aligned} \begin{aligned}&U F({\hat{X}}) + F({\hat{X}}) U^T -2 e_k e_k^T \\&\quad = D(4I - 2e_k e_k^T -2 B + 4 P-2 P e_k e_k^T -2 PB + 4 P^T-2 e_k e_k^T P^T -2 B P^T - 2 e_k e_k^T)D \\&\quad = D(\underbrace{4I+ 4 P+ 4 P^T}_{=4 e e^T} - 4e_k e_k^T -2 \underbrace{P e_k}_{e-e_k} e_k^T -2 e_k \underbrace{e_k^T P^T}_{e^T-e_k^T} -2 B -2 PB -2 B P^T )D \\&\quad = D (4 e e^T \underbrace{-2 ee_k^T -2 e_k e^T -2 B -2 PB -2 B P^T}_{=-4 ee^T}) D \\&\quad = 0, \end{aligned} \end{aligned}$$

which implies \(U F({\hat{X}}) + F({\hat{X}})U^T= 2 e_k e_k^T\) as we claimed above. Consequently

$$\begin{aligned} \begin{pmatrix} {\hat{\xi }} &{} \quad -\tfrac{1}{2} \mathrm {diag}({\hat{X}})^T \\ -\tfrac{1}{2} \mathrm {diag}({\hat{X}}) &{} \quad U F({\hat{X}}) + F({\hat{X}})U^T \end{pmatrix} - \begin{pmatrix} 0&{} \quad 0\\ 0 &{} \quad e_ke_k^T \end{pmatrix} =\begin{pmatrix} \frac{1}{k^2} &{} \quad - \frac{1}{k} e_k^T \\ - \frac{1}{k} e_k &{} \quad e_k e_k^T \end{pmatrix} \succeq 0 \end{aligned}$$

is positive semidefinite and, as a result, \( {\hat{\xi }}\) and \( {\hat{X}}\) are feasible for (18). Hence

$$\begin{aligned} \Vert g({\bar{x}}_k) \Vert ^2 \le R^2 {\hat{\xi }} = \frac{\Vert {\bar{x}}_1 - x_* \Vert ^2}{k^2}, \end{aligned}$$

which yields the desired result after taking the square root. \(\square \)
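The feasibility of \({\hat{\xi }}\) and \({\hat{X}}\) can also be confirmed by direct computation. The sketch below is a check in exact rational arithmetic for small k, not a replacement for the argument above; the function names are hypothetical. It builds \(U=I-L\), \({\hat{X}}\) and \(F({\hat{X}})\) from their definitions, tests \(U F({\hat{X}}) + F({\hat{X}})U^T= 2 e_k e_k^T\), and then tests that the block matrix in (18) equals \(vv^T\) with \(v=(\tfrac{1}{k},0,\dots ,0,-1)^T\), which makes it positive semidefinite.

```python
from fractions import Fraction

def dual_certificate(k):
    """Build U, X_hat and F(X_hat) from the proof with exact rational entries."""
    zero = Fraction(0)
    # U = I - L, where L has the entry -2*l/j at position (l, j) for l < j (1-based).
    U = [[Fraction(1) if l == j else (Fraction(2 * l, j) if l < j else zero)
          for j in range(1, k + 1)] for l in range(1, k + 1)]
    X = [[zero] * k for _ in range(k)]
    for i in range(1, k):  # off-diagonal weights i(i+1)/k^2
        X[i - 1][i] = X[i][i - 1] = Fraction(i * (i + 1), k * k)
    X[k - 1][k - 1] = Fraction(2 * k, k * k)
    # F(X) = Diag(X e) + 1/2 Diag(diag(X)) - X, see (19).
    F = [[-X[i][j] for j in range(k)] for i in range(k)]
    for i in range(k):
        F[i][i] += sum(X[i]) + X[i][i] / 2
    return U, X, F

def matmul(P, Q):
    return [[sum(P[i][m] * Q[m][j] for m in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def certificate_ok(k):
    U, X, F = dual_certificate(k)
    UF = matmul(U, F)
    # U F + F U^T; since F is symmetric, F U^T is the transpose of U F.
    S = [[UF[i][j] + UF[j][i] for j in range(k)] for i in range(k)]
    e_kk = [[Fraction(int(i == j == k - 1)) for j in range(k)] for i in range(k)]
    if S != [[2 * v for v in row] for row in e_kk]:
        return False
    # Block matrix of (18) for xi = 1/k^2, compared with the rank-one matrix v v^T.
    xi = Fraction(1, k * k)
    M = [[xi] + [-X[j][j] / 2 for j in range(k)]]
    for i in range(k):
        M.append([-X[i][i] / 2] + [S[i][j] - e_kk[i][j] for j in range(k)])
    v = [Fraction(1, k)] + [Fraction(0)] * (k - 1) + [Fraction(-1)]
    return M == [[a * b for b in v] for a in v]

all_ok = all(certificate_ok(k) for k in range(2, 15))
print(all_ok)
```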

Remark 2.2

The matrix \({\hat{X}}\) in the above proof, carrying the weights \(j(j+1)\) used in (6), was obtained by solving (18) with YALMIP [11] in combination with the SDP solver Sedumi [13] for small values of k. In order to provide a theoretical proof that the points \({\hat{\xi }}\) and \({\hat{X}}\) above are not only feasible but actually optimal for (18) and to prove tightness of the derived bound, we refer to Example 3.1 below, which was derived from a numerically obtained low-rank optimal solution of (17). More precisely, after numerically determining the optimal value of (18) a linear equation was added to (18) requiring that \((Y_2)_{kk}\) equals this value, and then the trace of \(Y_{2}\) was minimized with the intention to find the optimal solution with minimum rank. This optimal solution was then used to derive Example 3.1 below proving the tightness of (2). In fact, for any optimal solution of the SDP relaxation (17), there exists at least one Lipschitz continuous operator \({{\tilde{T}}}_k:{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) for appropriately chosen d with some fixed point \(x_*\) such that the inequality in (18) is tight: This follows from appropriately labeling the columns of the symmetric square root of such an optimal solution and a Lipschitz extension argument. Specifically, the Kirszbraun-Theorem [7] allows a Lipschitz extension of an operator that is Lipschitz on a discrete set to the entire space. For further details we refer to Section 4.2 of [12].

Tightness and choice of step lengths

Example 3.1

We consider the following one-dimensional real example with fixed point \(x_* = 0\) and starting point \(x_0\not = 0\). Let \(k\in {\mathbb {N}}\) be given. A nonexpansive mapping proving tightness of (2) can then be set up as follows: Let \(T:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be defined via

$$\begin{aligned} T(x):= {\left\{ \begin{array}{ll} x+\tfrac{2R}{k+1}&{} \quad \text { if } x\le - \tfrac{R}{k+1} \\ -x &{} \quad \text { if } -\tfrac{R}{k+1}< x < \tfrac{R}{k+1} \\ x-\tfrac{2R}{k+1}&{} \quad \text { if } \tfrac{R}{k+1} \le x \end{array}\right. } \end{aligned}$$
(23)

for some fixed \(k\in {\mathbb {N}}\) and \(R:= \Vert x_0-x_*\Vert _2\) with \(x_0\in {\mathbb {R}}\) and \(x_*:= 0\). Note that T satisfies \(T(x_*)=0=x_*\) and is 1-Lipschitz continuous, i.e. nonexpansive, because it is piece-wise linear, continuous and the derivative is bounded in norm by one (\(|T'|\le 1\)) whenever it exists. We will now show that applying the Halpern-Iteration results in an equality in (2) for the k-th iterate, i.e.

$$\begin{aligned} \Vert \tfrac{1}{2}(x_k-T(x_k))\Vert _2=\tfrac{\Vert x_0-x_*\Vert _2}{k+1} \end{aligned}$$

is satisfied. This means that the bound (2) can not be improved without making further assumptions, as the operator above would otherwise pose a counterexample. For the first k iterates of the Halpern-Iteration (1) we can obtain

$$\begin{aligned} x_j=x_0 \ \underset{\ge \tfrac{1}{k+1}}{\underbrace{\left( 1- \tfrac{j}{k+1} \right) }} \quad \text { for }\,\, j\in \{0,\dots ,k\} \end{aligned}$$

inductively: for \(x_0=0=x_*\) there is nothing to prove. The case \(0<x_0=\Vert x_0-x_*\Vert _2 =R\) follows by using the definition of T and considering for \( j \in \{0,\dots ,k-1\}\) the iterates

$$\begin{aligned} \begin{aligned} x_{j+1}&= \tfrac{x_0}{j+2} +\left( 1 - \tfrac{1}{j+2}\right) T(\underset{\ge \tfrac{x_0}{k+1}=\tfrac{R}{k+1}}{\underbrace{x_j}})\\&=\tfrac{x_0}{j+2} +\left( 1 - \tfrac{1}{j+2} \right) \left( x_j -\tfrac{2R}{k+1}\right) \\&=\tfrac{x_0}{j+2} +\left( 1 - \tfrac{1}{j+2}\right) \left( x_0 \, \left( 1- \tfrac{j}{k+1} \right) -\tfrac{2x_0}{k+1}\right) \\&= x_0\left( \tfrac{1}{j+2} +\left( 1 - \tfrac{1}{j+2}\right) \left( 1- \tfrac{j}{k+1} -\tfrac{2}{k+1}\right) \right) \\&= x_0\left( \tfrac{1}{j+2} +1 - \tfrac{j+2}{k+1}- \tfrac{1}{j+2} +\tfrac{1}{k+1}\right) \\&= x_0 \ \underset{\ge \tfrac{1}{k+1}}{\underbrace{\left( 1- \tfrac{j+1}{k+1} \right) }}, \end{aligned} \end{aligned}$$

which implies that

$$\begin{aligned} \Vert \tfrac{1}{2}(x_k-T(x_k))\Vert _2= & {} \Vert \tfrac{1}{2}\left( x_0 \left( 1- \tfrac{k}{k+1}\right) -\left( x_0\left( 1- \tfrac{k}{k+1}\right) -\tfrac{2R}{k+1}\right) \right) \Vert _2 \nonumber \\= & {} \tfrac{R}{k+1}= \tfrac{\Vert x_0-x_*\Vert _2}{k+1} \end{aligned}$$

holds true. The case \(x_0<0\) follows from the operator's point symmetry, i.e. \(T(-x)=-T(x)\). This completes the proof of tightness. \(\square \)
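Example 3.1 is easy to reproduce on a computer. The following sketch uses exact rational arithmetic so that equality in (2) can be tested without rounding; the values \(k=7\) and \(x_0=5\) as well as the helper names `make_T` and `halpern_scalar` are arbitrary choices made for this illustration.

```python
from fractions import Fraction

def make_T(k, R):
    """The piecewise linear operator (23) for given k and R."""
    t = Fraction(R) / (k + 1)
    def T(x):
        if x <= -t:
            return x + 2 * t
        if x < t:
            return -x
        return x - 2 * t
    return T

def halpern_scalar(T, x0, num_steps):
    """Iterates x_0, ..., x_{num_steps} of (1) with lambda_j = 1/(j+2)."""
    xs = [x0]
    for j in range(num_steps):
        lam = Fraction(1, j + 2)
        xs.append(lam * x0 + (1 - lam) * T(xs[-1]))
    return xs

k = 7
x0 = Fraction(5)
R = abs(x0)  # distance to the fixed point x_* = 0
T = make_T(k, R)
xs = halpern_scalar(T, x0, k)

# Closed form x_j = x_0 * (1 - j/(k+1)) claimed in the example.
closed_form = [x0 * (1 - Fraction(j, k + 1)) for j in range(k + 1)]
half_residual = abs(xs[k] - T(xs[k])) / 2

print(xs == closed_form, half_residual == R / (k + 1))
```

Note that the operator depends on k, so equality in (2) is attained at the k-th iterate for which T was constructed.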

While Example 3.1 shows that the bound (2) is best possible for the Halpern-Iteration with \(\lambda _k=1/(k+2)\) , the rate of convergence could be improved for this example, if the values \(\lambda _k\) were chosen appropriately less than \(1/(k+2)\). Let us illustrate next that this is not true in general, i.e. that choosing smaller values of \(\lambda _k\) does not always provide faster convergence. Let H be a Hilbert space with a countable orthonormal Schauder basis \(\{e_j\}_{j\in {\mathbb {N}}}\) and T be the linear operator defined by \(T(e_j)=e_{j+1}\) for \(j\in {\mathbb {N}}\). Hence the unique fixed point is \(x_*=0\). When choosing \(x_0=e_1\), then for any choice of step lengths \(\lambda _j\in [0,1]\), the k-th iterate always lies in the convex hull of \(e_1,\ldots ,e_{k+1}\), and the choice of \(\lambda _j\) minimizing the error \(\Vert x_k-x_*\Vert \) is precisely the step length \(\lambda _j=1/(j+2)\) for all \(1\le j\le k\). This step length leads to a slightly faster rate of convergence than (2) for this example, namely \(\frac{1}{2} \Vert x_k-T(x_k)\Vert \le \frac{\Vert x_0-x_*\Vert }{\sqrt{2}(k+1)}\). While this step length does not minimize the residual \(\frac{1}{2} \Vert x_k-T(x_k)\Vert \) it shows that smaller values of \(\lambda _j\) such as \(\lambda _j := \rho /(j+2)\) for all j with \(\rho \in [0,1)\) lead to larger residuals. On the other hand larger values \(\lambda _j > 1/(j+2)\) for all j lead to larger residuals for Example 3.1.
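The shift example can be checked in the same way. Since the k-th iterate and its image under T only involve \(e_1,\ldots ,e_{k+2}\), the sketch below represents vectors by their first \(k+2\) coefficients, a finite truncation assumed here for illustration. It confirms that \(x_k = \tfrac{1}{k+1}(e_1+\cdots +e_{k+1})\) and that \(\left( \tfrac{1}{2} \Vert x_k-T(x_k)\Vert \right) ^2 = \tfrac{1}{2(k+1)^2}\) for \(\lambda _j=1/(j+2)\).

```python
from fractions import Fraction

def shift(v):
    """Right shift T(e_j) = e_{j+1} on a coefficient list whose last entry is zero."""
    return [Fraction(0)] + v[:-1]

k = 5
dim = k + 2  # enough coordinates to hold x_k and T(x_k)
x0 = [Fraction(1)] + [Fraction(0)] * (dim - 1)  # x_0 = e_1

x = x0
for j in range(k):
    lam = Fraction(1, j + 2)
    Tx = shift(x)
    x = [lam * a + (1 - lam) * b for a, b in zip(x0, Tx)]

# Squared norm of the residual x_k - T(x_k), computed exactly.
residual_sq = sum((a - b) ** 2 for a, b in zip(x, shift(x)))
half_residual_sq = residual_sq / 4  # equals (1/2 * ||x_k - T(x_k)||)^2

print(x[:k + 1] == [Fraction(1, k + 1)] * (k + 1))
print(half_residual_sq == Fraction(1, 2 * (k + 1) ** 2))
```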

Conclusions

We have derived a new and tight complexity bound for the Halpern-Iteration with coefficients chosen as \(\lambda _k= \tfrac{1}{k+2}\). The proof based on semidefinite programming can in principle be adapted for other choices of parameters and fixed point iterations, again leading to tight complexity bounds. For example, for the Krasnoselski–Mann (KM) iteration (see e.g. [1])

$$\begin{aligned} x_{k+1}:=(1-t) x_k+t \ T(x_k) \end{aligned}$$

with some constant stepsize \(t\in [\tfrac{1}{2},1]\) a proof can be found in [12] (Theorem 4.9), where

$$\begin{aligned} L:=-2 \begin{pmatrix} 0 &{} \quad t &{} \quad t &{} \quad \ldots &{} \quad t\\ 0 &{} \quad 0 &{} \quad t &{} \quad \ldots &{} \quad t\\ \vdots &{} \quad \vdots &{} \quad 0 &{} \quad \ddots &{} \quad \vdots \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 &{} \quad t \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 \end{pmatrix} \in {\mathbb {R}}^{k\times k} \end{aligned}$$

is used to define inequalities of the form (15). However, while in practice the KM-Iteration with constant stepsize may often perform much better than the Halpern-Iteration, its worst-case complexity is an order of magnitude worse, and the convergence analysis based on semidefinite programming is considerably longer.

References

  1. Cominetti, R., Soto, J.A., Vaisman, J.: On the rate of convergence of Krasnoselskii–Mann iterations and their connection with sums of Bernoullis. Isr. J. Math. 199(2), 757–772 (2014)

  2. Gu, G., Yang, J.: Optimal nonergodic sublinear convergence rate of proximal point algorithm for maximal monotone inclusion problems (2019). arXiv preprint arXiv:1904.05495

  3. Gu, G., Yang, J.: On the optimal linear convergence factor of the relaxed proximal point algorithm for monotone inclusion problems (2019). arXiv preprint arXiv:1905.04537

  4. Gu, G., Yang, J.: On the optimal ergodic sublinear convergence rate of the relaxed proximal point algorithm for variational inequalities (2019). arXiv preprint arXiv:1905.06030

  5. Halpern, B.: Fixed points of nonexpanding maps. Bull. Am. Math. Soc. 73(6), 957–961 (1967)

  6. Kim, D.: Accelerated proximal point method for maximally monotone operators (2019). arXiv preprint arXiv:1905.05149

  7. Kirszbraun, M.D.: Über die zusammenziehende und Lipschitzsche Transformationen. Fund. Math. 22, 77–108 (1934)

  8. Kohlenbach, U.: On quantitative versions of theorems due to FE Browder and R Wittmann. Adv. Math. 226(3), 2764–2795 (2011)

  9. Kohlenbach, U., Leuştean, L.: Effective metastability of Halpern iterates in CAT (0) spaces. Adv. Math. 231(5), 2526–2556 (2012)

  10. Leustean, L.: Rates of asymptotic regularity for Halpern iterations of nonexpansive mappings. J. UCS 13(11), 1680–1691 (2007)

  11. Lofberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: 2004 IEEE International Symposium on Computer Aided Control Systems Design, IEEE, pp. 284–289 (2004)

  12. Lieder, F.: Projection based methods for conic linear programming—optimal first order complexities and norm constrained quasi newton methods, (Doctoral dissertation, Universitäts-und Landesbibliothek der Heinrich-Heine-Universität Düsseldorf) (2018). https://docserv.uni-duesseldorf.de/servlets/DerivateServlet/Derivate-49971/Dissertation.pdf

  13. Sturm, J.F.: Implementation of interior point methods for mixed semidefinite and second order cone optimization problems. Optim. Methods Softw. 17(6), 1105–1154 (2002)

  14. Sabach, S., Shtern, S.: A first order method for solving convex bilevel optimization problems. SIAM J. Optim. 27(2), 640–660 (2017)

  15. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. 161(1–2), 307–345 (2017)

  16. Ryu, E.K., Taylor, A.B., Bergeling, C., Giselsson, P.: Operator splitting performance estimation: tight contraction factors and optimal parameter selection (2018). arXiv preprint arXiv:1812.00146

  17. Wittmann, R.: Approximation of fixed points of nonexpansive mappings. Arch. der Math. 58(5), 486–491 (1992)

  18. Xu, H.K.: Iterative algorithms for nonlinear operators. J. Lond. Math. Soc. 66(1), 240–256 (2002)

Acknowledgements

Open Access funding provided by Projekt DEAL. I would like to thank Florian Jarre for careful proofreading of this paper. I would also like to sincerely thank the anonymous referees for their constructive criticism that helped to improve the presentation of the paper and that led to the direct proof based on a weighted sum.

Author information

Corresponding author

Correspondence to Felix Lieder.

Ethics declarations

Conflict of interest

The author declares that he has no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix, a technical duality result

While the duality used in (17) and (18) is the well known weak conic duality, the format of the problems (17) and (18) is quite intricate. We therefore derive the dual problem explicitly. Define the Euclidean space \({\mathbb {E}}:= {\mathbb {R}} \times {\mathbb {S}}^k \times {\mathbb {S}}^{k+1}\) and the closed convex cone \({\mathcal {K}}:={\mathbb {R}}_+ \times ({\mathbb {S}}^k \cap {\mathbb {R}}_+^{k\times k}) \times {\mathbb {S}}_+^{k+1}\). We denote by \(A\cdot B:= \mathrm{trace}({AB})\) the trace inner product of symmetric matrices A and B. Then \({\mathcal {K}}\) is self-dual with respect to the canonical inner product \(\langle X , Y \rangle _{{\mathbb {E}}}:=x_1 y_1 + X_2 \cdot Y_2 + X_3 \cdot Y_3\), which we define for any

$$\begin{aligned} \begin{pmatrix} x_1\\ X_2 \\ X_3 \end{pmatrix}, \begin{pmatrix} y_1\\ Y_2 \\ Y_3 \end{pmatrix} \in {\mathbb {E}}. \end{aligned}$$
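
For reference, self-duality means that the dual cone coincides with \({\mathcal {K}}\) itself. The following is a brief sketch that relies only on the standard facts that \({\mathbb {R}}_+\), the entrywise nonnegative symmetric matrices and \({\mathbb {S}}_+^{k+1}\) are each self-dual in their respective spaces; the symbol \({\mathcal {K}}^\circ \) for the dual cone is notation introduced here for illustration and does not appear elsewhere in the text:

$$\begin{aligned} {\mathcal {K}}^\circ := \{ Y \in {\mathbb {E}} \ | \ \langle X , Y \rangle _{{\mathbb {E}}} \ge 0 \ \text {for all } X \in {\mathcal {K}} \} = {\mathbb {R}}_+ \times ({\mathbb {S}}^k \cap {\mathbb {R}}_+^{k\times k}) \times {\mathbb {S}}_+^{k+1} = {\mathcal {K}}. \end{aligned}$$

In particular, the inner product of any two elements of \({\mathcal {K}}\) is nonnegative, which is the property that weak conic duality rests on.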

We proceed by restating problems (17) and (18) in standard form and exploit (weak) conic duality. In fact, by adding slack variables \(S\in {\mathcal {K}}\), we can write (17) equivalently as a conic optimization problem in dual standard form

$$\begin{aligned} \underset{Y\in {\mathbb {S}}^{k+1}, \ S \in {\mathbb {E}}}{ \text {maximize}}\{ {\tilde{B}} \cdot Y \ | \ {\mathcal {A}}^*(Y) + S ={\widetilde{C}}, \ S \in {\mathcal {K}} \} \end{aligned}$$
(24)

for

$$\begin{aligned} {\tilde{C}} := \begin{pmatrix} R^2\\ 0\\ 0 \end{pmatrix} \in {\mathbb {E}}, \end{aligned}$$
$$\begin{aligned} {\tilde{B}} =\begin{pmatrix} 0&{} \quad 0\\ 0 &{} \quad e_ke_k^T \end{pmatrix} \in {\mathbb {S}}^{k+1} \end{aligned}$$
(25)

and the linear operator \({\mathcal {A}}^*: {\mathbb {S}}^{k+1}\rightarrow {\mathbb {E}}\)

$$\begin{aligned}&{{\mathcal {A}}^*} \left( \begin{pmatrix} y_0 &{} \quad y_1^T \\ y_1 &{} \quad Y_2 \end{pmatrix}\right) \nonumber \\&\quad := \begin{pmatrix} y_0 \\ \mathrm {diag}(Y_2 U)e^T + e \ \mathrm {diag}(U^T Y_2)^T - (Y_2U+ U^TY_2) + \mathrm {Diag}(\mathrm {diag}(Y_2 U) - y_1) \\ -\begin{pmatrix} y_0 &{} \quad y_1^T \\ y_1 &{} \quad Y_2 \end{pmatrix} \end{pmatrix}. \end{aligned}$$
(26)

The constraint \({\mathcal {A}}^*(Y) + S ={\widetilde{C}}, S \in {\mathcal {K}}\) is a direct translation of the constraints in (17) except for the diagonal entries. Here we use that the term \(\mathrm {diag}(Y_2 U)e^T + e \ \mathrm {diag}(U^T Y_2)^T - (Y_2U+ U^TY_2)\) has an all-zero diagonal, which is why we can “place” the constraints \(\mathrm {diag}(Y_2 U) - y_1\le 0\) on the diagonal.
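
The zero-diagonal claim can be checked entrywise. The short computation below is a sketch that uses only the symmetry of \(Y_2\) and therefore holds for any square matrix U; the index i and the shorthand \(W:= Y_2U\) (so that \(W^T = U^TY_2\)) are introduced here for illustration:

$$\begin{aligned} \left( \mathrm {diag}(W)e^T + e \ \mathrm {diag}(W^T)^T - (W+ W^T) \right) _{ii} = W_{ii} + W_{ii} - 2 W_{ii} = 0 \quad \text {for all } i. \end{aligned}$$

Hence the i-th diagonal entry of the second component of \({\mathcal {A}}^*(Y)\) reduces to \((Y_2U)_{ii} - (y_1)_i\). Since the second component of \({\widetilde{C}}\) is zero and the corresponding block of the slack S is entrywise nonnegative, this entry must be nonpositive.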

Recall that the optimal value of the conic optimization problem (24) in dual standard form is upper bounded by the optimal value of the conic optimization problem in primal standard form

$$\begin{aligned} \underset{{\widetilde{X}} \in {\mathbb {E}}}{\text {minimize}} \{ \langle {\widetilde{C}} , {\widetilde{X}} \rangle _{{\mathbb {E}}} \ | \ {\mathcal {A}} ({\widetilde{X}}) = {\tilde{B}}, \ {\widetilde{X}} \in {\mathcal {K}} \} \end{aligned}$$
(27)

where the linear operators \({\mathcal {A}}^*: {\mathbb {S}}^{k+1}\rightarrow {\mathbb {E}}\) and \( {\mathcal {A}}: {\mathbb {E}} \rightarrow {\mathbb {S}}^{k+1}\) are adjoint, i.e. \({\mathcal {A}}\) is the unique operator that satisfies

$$\begin{aligned} \langle {\mathcal {A}}^*(Y) , {\tilde{X}} \rangle _{{\mathbb {E}} } = Y \cdot {\mathcal {A}}({\tilde{X}}) \end{aligned}$$
(28)

for all \(Y\in {\mathbb {S}}^{k+1}, \ {\tilde{X}}\in {\mathbb {E}}\). The “\(Y_2\)-part” of the adjoint of the second component of \({\mathcal {A}}^*\) can be obtained from

$$\begin{aligned} \begin{aligned}&(\mathrm {diag}(Y_2 U)e^T + e \ \mathrm {diag}(U^T Y_2)^T - (Y_2U+ U^TY_2) + \mathrm {Diag}(\mathrm {diag}(Y_2 U)) ) \cdot X_1 \\&\quad = Y_2 \cdot ( U(\underbrace{\mathrm {Diag}(X_1e) + \tfrac{1}{2} \mathrm {Diag}(\mathrm {diag}(X_1))-X_1}_{=F(X_1)} ) \\&\qquad + \underbrace{(\mathrm {Diag}(X_1e) + \tfrac{1}{2} \mathrm {Diag}(\mathrm {diag}(X_1))-X_1}_{=F(X_1)}) U^T) \end{aligned} \end{aligned}$$
(29)

leading to the full description of \({\mathcal {A}}\)

$$\begin{aligned} {\mathcal {A}} (\begin{pmatrix} \xi \\ X_1 \\ X_2\end{pmatrix}) = \begin{pmatrix} \xi &{} \quad -\tfrac{1}{2} \mathrm {diag}(X_1)^T \\ -\tfrac{1}{2} \mathrm {diag}(X_1) &{} \quad U F(X_1) + F(X_1)U^T \end{pmatrix} - X_2 \end{aligned}$$
(30)

where, as before, \(F(X) = \mathrm {Diag}(Xe) + \tfrac{1}{2} \mathrm {Diag}(\mathrm {diag}(X))-X\). This shows that we can rewrite (27) in the claimed form (18) by eliminating the (semidefinite) variable \(X_2\). This establishes the inequality between (17) and (18). \(\square \)
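
The adjoint relation (28) between the operators in (26) and (30) can also be sanity-checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original derivation. The function names `F`, `A_star`, `A` and `inner_E` are illustrative, and the matrix `U` is an arbitrary random stand-in (an assumption), since the identity only uses that U is square and that the matrix arguments are symmetric.

```python
import numpy as np

def F(X):
    """F(X) = Diag(X e) + 1/2 Diag(diag(X)) - X for a symmetric matrix X."""
    return np.diag(X.sum(axis=1)) + 0.5 * np.diag(np.diag(X)) - X

def A_star(Y, U):
    """Operator (26): maps a symmetric (k+1)x(k+1) matrix Y to a triple in E."""
    y0, y1, Y2 = Y[0, 0], Y[1:, 0], Y[1:, 1:]
    W = Y2 @ U                      # W^T = U^T Y2 because Y2 is symmetric
    d = np.diag(W)                  # diag(Y2 U) = diag(U^T Y2)
    e = np.ones(len(d))
    second = np.outer(d, e) + np.outer(e, d) - (W + W.T) + np.diag(d - y1)
    return y0, second, -Y

def A(xi, X1, X2, U):
    """Operator (30): maps a triple (xi, X1, X2) in E to a (k+1)x(k+1) matrix."""
    k = X1.shape[0]
    M = np.empty((k + 1, k + 1))
    M[0, 0] = xi
    M[0, 1:] = M[1:, 0] = -0.5 * np.diag(X1)
    M[1:, 1:] = U @ F(X1) + F(X1) @ U.T
    return M - X2

def inner_E(P, Q):
    """Canonical inner product on E = R x S^k x S^(k+1)."""
    return P[0] * Q[0] + np.trace(P[1] @ Q[1]) + np.trace(P[2] @ Q[2])

def random_symmetric(n, rng):
    G = rng.standard_normal((n, n))
    return (G + G.T) / 2

rng = np.random.default_rng(0)
k = 4
U = rng.standard_normal((k, k))     # assumption: arbitrary stand-in for U
Y = random_symmetric(k + 1, rng)
X = (rng.standard_normal(), random_symmetric(k, rng), random_symmetric(k + 1, rng))

lhs = inner_E(A_star(Y, U), X)      # <A^*(Y), X>_E
rhs = np.trace(Y @ A(*X, U))        # Y . A(X)
print(abs(lhs - rhs))               # should be at the level of rounding error

# The off-diagonal-generating term of (26) has an all-zero diagonal.
W = Y[1:, 1:] @ U
ones = np.ones(k)
T = np.outer(np.diag(W), ones) + np.outer(ones, np.diag(W)) - (W + W.T)
print(np.max(np.abs(np.diag(T))))
```

Such a check does not replace the derivation in (29), but it helps to catch sign or factor errors (for example the factors \(\tfrac{1}{2}\) in (30)) when the operators are implemented for use in a semidefinite programming solver.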

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Lieder, F. On the convergence rate of the Halpern-iteration. Optim Lett 15, 405–418 (2021). https://doi.org/10.1007/s11590-020-01617-9

  • DOI: https://doi.org/10.1007/s11590-020-01617-9

Keywords

  • Halpern-iteration
  • Fixed point methods
  • First order methods
  • Semidefinite programming
  • Performance estimation
  • Proximal point