1 Introduction

Let H be a Hilbert space equipped with a symmetric inner product \(\langle . , . \rangle :H \times H \rightarrow {\mathbb {R}}\). Let \(T:H \rightarrow H\) be a nonexpansive mapping and consider for fixed \(x_0\in H\) the Halpern-Iteration

$$\begin{aligned} x_{k+1}:= \lambda _k x_0 + (1- \lambda _k)T(x_k) \end{aligned}$$
(1)

from [5] with \(\lambda _k:= \tfrac{1}{k+2}\) for approximating a fixed point of T. Let \(\Vert x \Vert :=\sqrt{\langle x , x \rangle }\) denote the induced norm and \( Fix(T):= \{ x \in H \ : \ x=T(x) \}\) the set of fixed points of T. It is well known that, if the set Fix(T) is nonempty, then the sequence \( \{x_k\}_{k\in {\mathbb {N}}_0}\) converges to the point \(x_*\in Fix(T)\) minimizing the distance to \(x_0\); see [17], Theorem 2, and [18] for generalizations of this remarkable property. As a consequence, the norm of the residuals \(x_k -T(x_k)\) tends to zero, i.e. \( \lim _{k \rightarrow \infty } \Vert x_k -T(x_k) \Vert =0\). Our goal here is to quantify their rate of convergence. A first result of this type was obtained via proof mining in [10] for normed spaces (see also [8] and [9] for further details on results in more general spaces). Here, we improve the result for the setting of Hilbert spaces. Our proof technique is not based on proof mining, but on semidefinite programming, and is strongly motivated by the recent work of Taylor et al. [15] on the worst-case performance of first order minimization methods. Our methodology and focus here are, however, slightly different. We present two new proofs below. The first one is short and uses a parameter choice derived from the technique used in [15]. The second proof, based on semidefinite programming, is self-contained and adapts the framework of [15] to fixed point problems. The second approach can also be applied to other choices of the parameters \(\lambda _k\) and to other fixed point methods; the resulting rates are, however, not obvious in general.

After the manuscript of this work was made public in late 2017, the author (see Section 4.1 and following of [12]) and independently other authors (see e.g. [2,3,4, 6, 16]) have studied similar setups, which may all be categorized as part of the performance estimation framework. Let us briefly sketch how our setup here and in [12] may be applied in the context of proximal point algorithms (e.g. the setup of [2, 6]) by defining the nonexpansive operator \(T:=2J-I\), where I is the identity and J is a firmly nonexpansive operator (e.g. the resolvent operator of a maximal monotone operator). Because the fixed points of J and T coincide, one may apply the Halpern-Iteration (1) to find a fixed point of J. The (tight) convergence rate is then implied by Theorem 2.1 below.
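For readers who wish to experiment, the following minimal Python sketch implements the Halpern-Iteration (1); the choice of J as the projection onto the closed unit ball is only an illustrative stand-in for a resolvent operator and is not part of the setup above.

```python
import numpy as np

def halpern(T, x0, k):
    """Halpern-Iteration (1) with lambda_j = 1/(j+2), anchored at x0."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    for j in range(k):
        lam = 1.0 / (j + 2)
        x = lam * x0 + (1 - lam) * T(x)
    return x

# Illustrative choice: J = projection onto the closed unit ball (firmly
# nonexpansive), T = 2J - I nonexpansive with Fix(T) = Fix(J) = unit ball.
def J(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

T = lambda x: 2 * J(x) - x
x0 = np.array([3.0, 4.0])
k = 200
xk = halpern(T, x0, k)
print(0.5 * np.linalg.norm(xk - T(xk)))                      # residual after k steps
print(np.linalg.norm(x0 - np.array([0.6, 0.8])) / (k + 1))   # right-hand side of (2), x_* = x0/||x0||
```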

2 Main result

Theorem 2.1

Let \( x_0 \in H\) be arbitrary but fixed. If T has fixed points, i.e. \( Fix(T) \not = \emptyset \), then the iterates defined in (1) satisfy

$$\begin{aligned} \tfrac{1}{2} \Vert x_k - T(x_k) \Vert \le \frac{ \Vert x_0 - x_* \Vert }{k+1} \ \forall k \in {\mathbb {N}}_0 \quad \forall x_* \in Fix(T) \end{aligned}$$
(2)

This bound is tight.
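As a quick numerical illustration of (2) (a sketch assuming numpy; the plane rotation below is merely one convenient nonexpansive operator and is not discussed in the text), one can check that the scaled residuals never exceed the bound:

```python
import numpy as np

theta = 0.9                                   # any angle; the rotation is an isometry
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
T = lambda x: Q @ x                           # nonexpansive, Fix(T) = {0}

x0 = np.array([1.0, 0.0])
x, worst = x0.copy(), 0.0
for k in range(200):
    ratio = 0.5 * (k + 1) * np.linalg.norm(x - T(x)) / np.linalg.norm(x0)
    worst = max(worst, ratio)
    x = x0 / (k + 2) + (1 - 1.0 / (k + 2)) * T(x)   # Halpern step (1)
print(worst)                                  # stays <= 1, in accordance with (2)
```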

Remark 2.1

A generalization of the Halpern-Iteration, the sequential averaging method (SAM), was analyzed in the recent paper [14], where for the first time a rate of convergence of order O(1/k) could be established for SAM. The rate of convergence in (2) is even slightly faster than the one established for the more general framework in [14] (by a factor of 4). More importantly, however, as shown by Example 3.1 below, the estimate (2) is tight, in the sense that for every \(k\in {\mathbb {N}}_0\) there exist a Hilbert space H and a nonexpansive operator T with some fixed point \(x_*\) such that (2) holds with equality.

Estimate (2) refers to the step length \(\lambda _k:=1/(k+2)\). The restriction to this choice is motivated by problem (17) below in the proof based on semidefinite programming; in numerical tests for small dimensions k these coefficients provided a better worst-case complexity than any other choice of coefficients.

Next, an elementary direct proof of Theorem 2.1 is given.

Direct proof based on a weighted sum

The iteration (1) with \(\lambda _k=1/(k+2)\) implies for \(1\le j\le k\):

$$\begin{aligned} \textstyle x_j = \frac{1}{j+1}x_0 + \frac{j}{j+1}T(x_{j-1}) \quad \hbox {or}\,\, \ T(x_{j-1}) = \frac{j+1}{j}x_j-\frac{1}{j}x_0 . \end{aligned}$$
(3)

By nonexpansiveness the following inequalities hold:

$$\begin{aligned} {\Vert T(x_k)-x_*\Vert ^2}\le {\Vert x_k-x_*\Vert ^2} \quad \hbox {for}\,\, x_* \in Fix(T) \end{aligned}$$
(4)

and

$$\begin{aligned} {\Vert T(x_j)-T(x_{j-1})\Vert ^2}\le {\Vert x_j-x_{j-1}\Vert ^2} \quad \hbox {for} \,\, j=1,\ldots ,k. \end{aligned}$$
(5)

Below we reformulate the following weighted sum of the inequalities (5):

$$\begin{aligned} 0 \ge \sum _{j=1}^kj(j+1)\left( {\Vert T(x_j)-T(x_{j-1})\Vert ^2}- {\Vert x_j-x_{j-1}\Vert ^2} \right) . \end{aligned}$$
(6)

Using the second relation in (3), the first terms in the summation (6) are

$$\begin{aligned}&j(j+1){\Vert T(x_j)-T(x_{j-1})\Vert ^2} \nonumber \\&\quad = j(j+1) {\Vert x_j-T(x_j)+\frac{1}{j} (x_j-x_0)\Vert ^2} \nonumber \\&\quad = \textstyle j(j+1){\Vert x_j-T(x_j)\Vert ^2}+2(j+1)\langle x_j-T(x_j) , x_j-x_0 \rangle +\frac{j+1}{j} {\Vert x_j-x_0\Vert ^2}, \nonumber \\ \end{aligned}$$
(7)

and using the first relation in (3), the second terms in (6) become

$$\begin{aligned}&-j(j+1){\Vert x_j-x_{j-1}\Vert ^2} \nonumber \\&\quad = \textstyle - j(j+1) {\Vert \frac{1}{j+1}(x_0-T(x_{j-1}))+T(x_{j-1}) -x_{j-1}\Vert ^2} \nonumber \\&\quad = \textstyle -\frac{j}{j+1} {\Vert x_0-T(x_{j-1})\Vert ^2}-2j\langle x_0-T(x_{j-1}) , T(x_{j-1}) -x_{j-1} \rangle \nonumber \\&\qquad -j(j+1) {\Vert T(x_{j-1}) -x_{j-1}\Vert ^2}. \end{aligned}$$
(8)

Observe [using again the second relation in (3)] that the first term in (8)

$$\begin{aligned} -\frac{j}{j+1} {\Vert x_0-T(x_{j-1})\Vert ^2}= & {} - \frac{j}{j+1} {\Vert \frac{j+1}{j}x_0-\frac{j+1}{j}x_j\Vert ^2} \nonumber \\= & {} - \frac{j+1}{j}{\Vert x_0-x_j\Vert ^2} \end{aligned}$$
(9)

cancels the third term in (7). Summing up the second terms in (8) for \(j=1,\ldots ,k\) we shift the summation index,

$$\begin{aligned} -\sum _{j=1}^k 2j\langle x_0-T(x_{j-1}) , T(x_{j-1})-x_{j-1} \rangle =\sum _{j=0}^{k-1}2(j+1)\langle x_{j}-T(x_{j}) , x_0-T(x_{j}) \rangle , \end{aligned}$$

so that summing up the second terms in (7) and in (8) for \(j=1,\ldots ,k\) results in

$$\begin{aligned}&2(k+1)\langle x_k-T(x_k) , x_k-x_0 \rangle \nonumber \\&\quad +2\sum _{j=1}^{k-1}(j+1) {\Vert x_j-T(x_j)\Vert ^2} +2{\Vert x_0-T(x_0)\Vert ^2}. \end{aligned}$$
(10)

Shifting again the index in the summation of the third terms in (8)

$$\begin{aligned} -\sum _{j=1}^k j(j+1){\Vert x_{j-1}-T(x_{j-1})\Vert ^2}= -\sum _{j=0}^{k-1} (j+1)(j+2){\Vert x_{j}-T(x_{j})\Vert ^2} \end{aligned}$$

and summing up the first terms in (7) and the third terms in (8) for \(j=1,\ldots ,k\) gives

$$\begin{aligned} k(k+1){\Vert x_{k}-T(x_{k})\Vert ^2}-2\sum _{j=1}^{k-1}(j+1){\Vert x_j-T(x_j)\Vert ^2}-2{\Vert x_0-T(x_0)\Vert ^2} \end{aligned}$$
(11)

where the sum in the middle cancels the sum in the middle of (10) and the terms \(2{\Vert x_0-T(x_0)\Vert ^2}\) cancel as well. The only remaining terms are the first terms in (10) and (11).

Thus, inserting (9), (10), and (11) in (6) leads to

$$\begin{aligned} 0 \ge k(k+1){\Vert x_k-T(x_k)\Vert ^2}+2(k+1)\langle x_k-T(x_k) , x_k-x_0 \rangle . \end{aligned}$$
(12)

For \(k\ge 1\), applying the Cauchy–Schwarz inequality to the second term in (12) leads to

$$\begin{aligned} \frac{1}{2} \Vert x_k-T(x_k)\Vert \le \frac{1}{k} \Vert x_k-x_0\Vert \end{aligned}$$

which may be interesting in its own right. To prove the theorem, (12) is divided by \(k+1\) and then (4) is added:

$$\begin{aligned} 0\ge & {} k{\Vert x_k-T(x_k)\Vert ^2}+2\langle x_k-T(x_k) , x_k-x_0 \rangle + {\Vert T(x_k)-x_*\Vert ^2}- {\Vert x_k-x_*\Vert ^2}\nonumber \\= & {} \textstyle \frac{k+1}{2} {\Vert x_k-T(x_k)\Vert ^2}-\frac{2}{k+1}{\Vert x_0-x_*\Vert ^2} \nonumber \\&+ \frac{2}{k+1}{\Vert x_0-x_*- \frac{k+1}{2}(x_k-T(x_k))\Vert ^2}. \end{aligned}$$
(13)

To see the last equality, the last two terms in (13) can be combined; a straightforward but tedious expansion in terms of \(a:=x_k-T(x_k)\), \(b:= x_k-x_0\), \(c:= T(x_k)-x_*\), using \(a+c=x_k-x_*\) and \(a+c-b =x_0-x_*\), reveals the identity.
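The expansion can also be delegated to a computer algebra system. The following sketch (a verification aid assuming sympy is available, not part of the original argument) expands all squared norms bilinearly in the symbols aa, ab, ac, bb, bc, cc standing for the inner products of a, b and c, and confirms that the two sides of the identity agree:

```python
import sympy as sp

k, aa, bb, cc, ab, ac, bc = sp.symbols('k aa bb cc ab ac bc', real=True)

def sqnorm(alpha, beta, gamma):
    """||alpha*a + beta*b + gamma*c||^2 expanded via bilinearity of <.,.>."""
    return (alpha**2 * aa + beta**2 * bb + gamma**2 * cc
            + 2 * alpha * beta * ab + 2 * alpha * gamma * ac
            + 2 * beta * gamma * bc)

# a = x_k - T(x_k), b = x_k - x_0, c = T(x_k) - x_*,
# so x_k - x_* = a + c and x_0 - x_* = a + c - b.
lhs = k * sqnorm(1, 0, 0) + 2 * ab + sqnorm(0, 0, 1) - sqnorm(1, 0, 1)
rhs = ((k + 1) / 2 * sqnorm(1, 0, 0)
       - 2 / (k + 1) * sqnorm(1, -1, 1)
       + 2 / (k + 1) * sqnorm(1 - (k + 1) / 2, -1, 1))
print(sp.simplify(lhs - rhs))   # expected output: 0
```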

Omitting the last term in (13) one obtains

$$\begin{aligned} {\Vert x_k-T(x_k)\Vert ^2}\le \left( \tfrac{2}{k+1}\right) ^2 {\Vert x_0-x_*\Vert ^2} \end{aligned}$$

which proves the theorem when taking square roots on both sides. \(\square \)

The above proof is somewhat unintuitive, as the choice of the weights with which the inequalities (4) and (5) are added in (6) and (13) is far from obvious. In fact, we owe the suggestion of these weights to an extremely helpful anonymous referee, who extracted them from a more complex construction in [15], which was also the basis for the original proof of this paper based on semidefinite programming. We state this proof next since it offers a generalizable approach for analyzing fixed point iterations; it can be modified, for example, to the KM iteration as in the recent thesis [12], though this modification is quite technical. The proof based on semidefinite programming also led to Example 3.1 below, showing that the rate of convergence is tight.

Proof based on semidefinite programming Let \(x_* \in Fix(T)\). The Halpern-Iteration was stated in the form (1) to comply with the existing literature. For our proof, however, it is more convenient to consider the shifted sequence \({\bar{x}}_1:= x_0\) and \({\bar{x}}_k:=x_{k-1} \ \forall k \in {\mathbb {N}}_{\not = 0 }:=\{1,2,3,\dots \} \) and to show the shifted statement

$$\begin{aligned} \tfrac{1}{2} \Vert {\bar{x}}_k - T({\bar{x}}_k) \Vert \le \frac{\Vert {\bar{x}}_1-x_* \Vert }{k} \quad \forall \,\, k\in {\mathbb {N}}_{\not =0 } \end{aligned}$$
(14)

Let us define \(g(x):=\frac{1}{2}(x-T(x))\). It is well known that g is firmly nonexpansive. For the sake of completeness, the argument is repeated here:

$$\begin{aligned} \begin{aligned}&\Vert g(x)-g(y) \Vert ^2 - \langle g(x)-g(y) , x-y \rangle \\&\quad = \Vert g(x)-g(y) - \tfrac{1}{2}(x-y) \Vert ^2 - \tfrac{1}{4} \Vert x-y \Vert ^2 \\&\quad = \tfrac{1}{4} \Vert T(x)-T(y) \Vert ^2 - \tfrac{1}{4}\Vert x-y \Vert ^2 \le 0 \quad \forall \,\, x,y\in H. \end{aligned} \end{aligned}$$

Firm nonexpansiveness and the Cauchy–Schwarz inequality also imply \( \Vert g(x)-g(y) \Vert \le \Vert x-y \Vert \ \forall x,y\in H\). For \( k =1\) the statement (14) follows immediately since \(g(x_*)=0\) and therefore \( \tfrac{1}{2} \Vert {\bar{x}}_1 - T( {\bar{x}}_1) \Vert = \Vert g({\bar{x}}_1 ) \Vert = \Vert g({\bar{x}}_1) - g( x_*) \Vert \le \tfrac{\Vert {\bar{x}}_1-x_* \Vert }{1}\).
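As a small numerical sanity check (not needed for the proof), one may verify the firm nonexpansiveness of g on random points for a concrete nonexpansive map; the componentwise sine used below is purely an illustrative example and does not appear in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T = np.sin                          # componentwise sine: 1-Lipschitz, hence nonexpansive
g = lambda x: 0.5 * (x - T(x))

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    lhs = np.linalg.norm(g(x) - g(y)) ** 2
    rhs = np.dot(g(x) - g(y), x - y)
    assert lhs <= rhs + 1e-12       # firm nonexpansiveness of g
print("firm nonexpansiveness held on all sampled pairs")
```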

For fixed \(k\ge 2\) we first consider the differences \({\bar{x}}_j - {\bar{x}}_1\) for \( j \in \{2,\dots ,k \}\)

$$\begin{aligned} \begin{aligned} {\bar{x}}_{j} -{\bar{x}}_1&= x_{j-1} - {\bar{x}}_1\\&= \lambda _{j-2} x_0 + (1- \lambda _{j-2}) T(x_{j-2}) - {\bar{x}}_1\\&= \tfrac{1}{j} x_0 + \left( 1- \tfrac{1}{j}\right) T(x_{j-2} ) - {\bar{x}}_1 \\&= \left( \tfrac{1}{j}-1\right) {\bar{x}}_1 + \left( 1- \tfrac{1}{j}\right) T( {\bar{x}}_{j-1} ) \\&= \left( \tfrac{1}{j} -1\right) {\bar{x}}_1 + \left( 1- \tfrac{1}{j}\right) ({\bar{x}}_{j-1} - 2 g( {\bar{x}}_{j-1}) ) \\&= \left( 1-\tfrac{1}{j}\right) ({\bar{x}}_{j-1} - {\bar{x}}_1) - 2 \left( 1- \tfrac{1}{j}\right) g({\bar{x}}_{j-1} ) \\&= \tfrac{j-1}{j} ({\bar{x}}_{j-1} - {\bar{x}}_1) - 2 \tfrac{j-1}{j} g({\bar{x}}_{j-1} ) \end{aligned} \end{aligned}$$

which inductively leads to

$$\begin{aligned} {\bar{x}}_{j} -{\bar{x}}_1=- 2 \sum _{l=1}^{j-1} \tfrac{l}{j} g({\bar{x}}_{l}). \end{aligned}$$

Let us shorten the notation slightly and define \( g_i:= g({\bar{x}}_i) \), \( R:= \Vert {\bar{x}}_1 -x_* \Vert \ge 0\), the vector \(b=(\langle g_i , {\bar{x}}_1-x_* \rangle )_{i=1}^k\), the matrices \(A:= (\langle g_i , g_j \rangle )_{i,j=1}^k\) and

$$\begin{aligned} L:=-2 \begin{pmatrix} 0 &{} \quad \tfrac{1}{2} &{} \quad \tfrac{1}{3} &{} \quad \ldots &{} \quad \tfrac{1}{k}\\ 0 &{} \quad 0 &{} \quad \tfrac{2}{3} &{} \quad \ldots &{} \quad \tfrac{2}{k}\\ \vdots &{} \quad \vdots &{} \quad 0 &{} \quad \ddots &{} \quad \vdots \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 &{} \quad \tfrac{k-1}{k} \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 \end{pmatrix} \in {\mathbb {R}}^{k\times k}. \end{aligned}$$

Let \(b^T\) denote the transpose of b. Note that

$$\begin{aligned} \begin{pmatrix} R^2 &{} \quad b^T \\ b &{} \quad A \end{pmatrix} \in {\mathbb {R}}^{(k+1) \times (k+1)} \end{aligned}$$

is a Gramian matrix formed from \( {\bar{x}}_1-x_*,g_1,\dots ,g_k\in H\) and is therefore symmetric and positive semidefinite. We proceed by expressing the inequalities from firm nonexpansiveness in terms of this Gram matrix. Since the Gram matrix is often of much lower dimension than H, this is sometimes referred to as the “kernel trick”. Keeping in mind that we can rewrite the differences \( {\bar{x}}_j - {\bar{x}}_1 = - 2 \sum _{l=1}^{j-1} \frac{l}{j} g_l\) for \(j\in \{1,\dots ,k\}\), we arrive at

$$\begin{aligned} AL = ( \langle g_i , {\bar{x}}_j - {\bar{x}}_1 \rangle )_{i,j=1}^k. \end{aligned}$$

Let \(e \in {\mathbb {R}}^k\) denote the vector of all ones. Then

$$\begin{aligned} \mathrm {diag} (AL) e^T - AL= ( \langle g_i , {\bar{x}}_i - {\bar{x}}_j \rangle )_{i,j=1}^k, \end{aligned}$$

where \(\mathrm {diag}(.)\) denotes the column vector formed by the diagonal of its matrix argument. Hence

$$\begin{aligned} \mathrm {diag} (AL) e^T +e \ \mathrm {diag} (AL)^T - AL -L^T A= ( \langle g_i-g_j , {\bar{x}}_i - {\bar{x}}_j \rangle )_{i,j=1}^k \end{aligned}$$

and

$$\begin{aligned} b e^T + AL= & {} ( \langle g_i , {\bar{x}}_j - x_* \rangle )_{i,j=1}^k,\\ \mathrm {diag} (A) e^T + e \ \mathrm {diag}(A)^T -2 A= & {} (\Vert g_i-g_j \Vert ^2)_{i,j=1}^k. \end{aligned}$$

The firm nonexpansiveness inequalities \( \Vert g_i- g_j \Vert ^2 \le \langle g_i-g_j , {\bar{x}}_i-{\bar{x}}_j \rangle \) are equivalent to the component-wise inequality

$$\begin{aligned} \mathrm {diag} (A) e^T + e \ \mathrm {diag}(A)^T -2 A\le & {} \mathrm {diag} (AL) e^T +e \ \mathrm {diag} (AL)^T \nonumber \\&- AL -L^T A. \end{aligned}$$
(15)

Note that only \( \tfrac{k^2-k}{2}\) of these componentwise inequalities are nonredundant. From \(g_*:=g(x_*)=0\) we obtain another k inequalities, namely \( \Vert g_i \Vert ^2 \le \langle g_i , {\bar{x}}_i-x_* \rangle \), which translate to

$$\begin{aligned} \mathrm {diag}(A) \le b + \mathrm {diag}(AL). \end{aligned}$$
(16)

Defining \(U:=I-L\), relations (15) and (16) can be shortened slightly to

$$\begin{aligned} \mathrm {diag} (AU) e^T + e \ \mathrm {diag}(AU)^T \le AU +U^T A \end{aligned}$$

and

$$\begin{aligned} \mathrm {diag}(AU) \le b. \end{aligned}$$
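Before turning to the semidefinite program, the following numpy sketch illustrates the Gram-matrix reformulation on a concrete run (the componentwise sine is again only an illustrative nonexpansive map with fixed point 0, not part of the text): it checks the identity \(AL = (\langle g_i , {\bar{x}}_j - {\bar{x}}_1 \rangle )\) and the two compact inequalities above.

```python
import numpy as np

k = 6
T = np.sin                                   # illustrative nonexpansive map with T(0) = 0
g = lambda x: 0.5 * (x - T(x))

# Shifted sequence: xbar_1 = x_0, xbar_j = (1/j) x_0 + (1 - 1/j) T(xbar_{j-1}).
x0 = np.array([2.0, -1.0, 0.5])
xbar = [x0]
for j in range(2, k + 1):
    xbar.append(x0 / j + (1 - 1.0 / j) * T(xbar[-1]))
x_star = np.zeros_like(x0)

G = np.array([g(x) for x in xbar])           # rows g_1, ..., g_k
A = G @ G.T                                  # Gram matrix (<g_i, g_j>)
b = G @ (x0 - x_star)                        # (<g_i, xbar_1 - x_*>)
L = np.array([[-2.0 * (i + 1) / (j + 1) if i < j else 0.0
               for j in range(k)] for i in range(k)])
U = np.eye(k) - L

inner = np.array([[np.dot(g(xbar[i]), xbar[j] - xbar[0])
                   for j in range(k)] for i in range(k)])
print(np.allclose(A @ L, inner))             # AL = (<g_i, xbar_j - xbar_1>)

AU = A @ U
lhs = np.diag(AU)[:, None] + np.diag(AU)[None, :]
print(np.all(lhs <= AU + AU.T + 1e-12))      # compact form of (15)
print(np.all(np.diag(AU) <= b + 1e-12))      # compact form of (16)
```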

Let \(e_k \in {\mathbb {R}}^k\) denote the k-th unit vector, \( {\mathbb {S}}^n := \{ X \in {\mathbb {R}}^{n \times n }\ | \ X=X^T \}\) denote the space of symmetric matrices and \({\mathbb {S}}_+^n := \{ X \in {\mathbb {S}}^n\ | x^TXx\ge 0 \ \forall x \in {\mathbb {R}}^n \}\) the convex cone of positive semidefinite matrices. Consider the chain of inequalities

$$\begin{aligned}&\begin{aligned} \Vert g({\bar{x}}_k) \Vert ^2 = \underset{ y_0 \in {\mathbb {R}}, y_1 \in {\mathbb {R}}^{k} , Y_2\in {\mathbb {S}}^k}{\text {maximize}} (Y_2)_{kk} \,&| \ \begin{pmatrix} y_0 &{} y_1^T \\ y_1 &{} Y_2 \end{pmatrix} \in {\mathbb {S}}_+^{k+1}, \ y_0\le R^2, \ \mathrm {diag}(Y_2 U) \le y_1 \\&|\ \mathrm {diag}(Y_2 U)e^T + e \ \mathrm {diag}(U^T Y_2)^T \le Y_2U+ U^TY_2 \\&| \ y_0=R^2 ,y_1=b, Y_2= A \\ \\ \end{aligned} \nonumber \\&\begin{aligned} \le \underset{ y_0 \in {\mathbb {R}}, y_1 \in {\mathbb {R}}^{k} , Y_2=Y_2^T \in {\mathbb {S}}^k}{\text {maximize}} (Y_2)_{kk} \&| \ \begin{pmatrix} y_0 &{} y_1^T \\ y_1 &{} Y_2 \end{pmatrix} \in {\mathbb {S}}_+^{k+1}, \ y_0\le R^2, \ \mathrm {diag}(Y_2 U) \le y_1 \\&|\ \mathrm {diag}(Y_2 U)e^T + e \ \mathrm {diag}(U^T Y_2)^T \le Y_2U+ U^TY_2 \\ \\ \end{aligned} \end{aligned}$$
(17)
$$\begin{aligned}&\begin{aligned} \le \underset{ \xi \in {\mathbb {R}}_+, X\in {\mathbb {S}}^k \cap {\mathbb {R}}_+^{k\times k }}{\text {minimize}} \ \ R^2 \xi \ | \&\begin{pmatrix} \xi &{} -\tfrac{1}{2} \mathrm {diag}(X)^T \\ -\tfrac{1}{2} \mathrm {diag}(X) &{} U F(X) + F(X)U^T \end{pmatrix} - \begin{pmatrix} 0&{} 0\\ 0 &{} e_ke_k^T \end{pmatrix} \in {\mathbb {S}}_+^{k+1} \end{aligned} \end{aligned}$$
(18)

for

$$\begin{aligned} F(X) := \mathrm {Diag}(Xe) + \tfrac{1}{2} \mathrm {Diag}(\mathrm {diag}(X))-X, \end{aligned}$$
(19)

where \(\mathrm {Diag}(.)\) denotes the square diagonal matrix with its vector argument on the diagonal. The first equality follows by construction, the first inequality from relaxing the constraints \(y_0=R^2\), \(y_1=b\), \(Y_2= A\), and the second inequality from weak conic duality as detailed in Sect. 5. We conclude the proof by showing feasibility of \({\hat{\xi }} := \frac{1}{k^2} >0\) and

$$\begin{aligned} {\hat{X}} := \frac{1}{k^2} \begin{pmatrix} 0 &{} \quad 1 \cdot 2 &{} \quad 0 &{} \quad \ldots &{} \quad \ldots &{} \quad 0 \\ 1 \cdot 2 &{} \quad 0&{} \quad 2 \cdot 3 &{} \quad 0 &{} \quad \ldots &{} \quad \vdots \\ 0 &{} \quad 2 \cdot 3 &{} \quad \ddots &{} \quad \ddots &{} \quad \ddots &{} \quad \vdots \\ \vdots &{} \quad \ddots &{} \quad \ddots &{} \quad \ddots &{} \quad (k-2)(k-1) &{} \quad 0 \\ 0&{} \quad \ldots &{} \quad 0&{} \quad (k-2)(k-1) &{} \quad 0&{} \quad (k-1)k \\ 0 &{} \quad \ldots &{} \quad 0 &{} \quad 0 &{} \quad (k-1)k &{} \quad 2 k \end{pmatrix} \in {\mathbb {R}}^{k \times k} \end{aligned}$$

for the last optimization problem (18). First note that \({\hat{X}}= {\hat{X}}^T\) is symmetric and entrywise nonnegative. A short computation reveals that the equality

$$\begin{aligned} U F({\hat{X}}) + F({\hat{X}})U^T= 2 e_k e_k^T \end{aligned}$$

holds true: Define the diagonal matrix \( D:= \frac{1}{k} \mathrm {Diag}([1,\ldots ,k ]^T ) \in {\mathbb {R}}^{k \times k }\), together with the strict upper triangular matrix

$$\begin{aligned} P:= \begin{pmatrix} 0 &{} \quad 1 &{} \quad \ldots &{} \quad 1\\ &{} \quad \ddots &{} \quad \ddots &{} \quad \vdots \\ &{} \quad &{} \quad \ddots &{} \quad 1 \\ &{} \quad &{} \quad &{} \quad 0 \end{pmatrix} \in {\mathbb {R}}^{k\times k} \end{aligned}$$
(20)

and the bidiagonal matrix

$$\begin{aligned} B:= \begin{pmatrix} 0 &{} \quad 1 &{} \quad &{} \quad \\ 1 &{} \quad \ddots &{} \quad \ddots &{} \quad \\ &{} \quad \ddots &{} \quad \ddots &{} \quad 1 \\ &{} \quad &{} \quad 1 &{} \quad 0 \end{pmatrix} \in {\mathbb {S}}^k. \end{aligned}$$

The matrices \(U, {\hat{X}}\) and \(F({\hat{X}})\) can now be expressed as

$$\begin{aligned} \begin{aligned}&U= I +2 D P D^{-1}, \quad {\hat{X}} = DBD+\frac{2}{k} e_ke_k^T&\quad \text { and } \\&F({\hat{X}} ) = 2D^2-e_ke_k^T -DBD. \end{aligned} \end{aligned}$$
(21)

Combining the equalities (21), \(D e_k =e_k\) and \( D^{-1} e_k =e_k\), yields

$$\begin{aligned} \begin{aligned} U F ({\hat{X}} )&= 2 D^2 -e_k e_k^T- DBD +4 DPD-2 DP e_k e_k^T -2 DP BD \\&= D(2I -e_k e_k^T -B+4 P-2 P e_k e_k^T -2 PB) D \end{aligned} \end{aligned}$$
(22)

and using (22) we compute

$$\begin{aligned} \begin{aligned}&U F({\hat{X}}) + F({\hat{X}}) U^T -2 e_k e_k^T \\&\quad = D(4I - 2e_k e_k^T -2 B + 4 P-2 P e_k e_k^T -2 PB + 4 P^T-2 e_k e_k^T P^T -2 B P^T - 2 e_k e_k^T)D \\&\quad = D(\underbrace{4I+ 4 P+ 4 P^T}_{=4 e e^T} - 4e_k e_k^T -2 \underbrace{P e_k}_{e-e_k} e_k^T -2 e_k \underbrace{e_k^T P^T}_{e^T-e_k^T} -2 B -2 PB -2 B P^T )D \\&\quad = D (4 e e^T \underbrace{-2 ee_k^T -2 e_k e^T -2 B -2 PB -2 B P^T}_{=-4 ee^T}) D \\&\quad = 0, \end{aligned} \end{aligned}$$

which implies \(U F({\hat{X}}) + F({\hat{X}})U^T= 2 e_k e_k^T\) as we claimed above. Consequently

$$\begin{aligned} \begin{pmatrix} {\hat{\xi }} &{} \quad -\tfrac{1}{2} \mathrm {diag}({\hat{X}})^T \\ -\tfrac{1}{2} \mathrm {diag}({\hat{X}}) &{} \quad U F({\hat{X}}) + F({\hat{X}})U^T \end{pmatrix} - \begin{pmatrix} 0&{} \quad 0\\ 0 &{} \quad e_ke_k^T \end{pmatrix} =\begin{pmatrix} \frac{1}{k^2} &{} \quad - \frac{1}{k} e_k^T \\ - \frac{1}{k} e_k &{} \quad e_k e_k^T \end{pmatrix} \succeq 0 \end{aligned}$$

is positive semidefinite (it equals \(vv^T\) for \(v:=(\tfrac{1}{k}, -e_k^T)^T\)), and as a result the pair \(({\hat{\xi }}, {\hat{X}})\) is feasible for (18). Hence

$$\begin{aligned} \Vert g({\bar{x}}_k) \Vert ^2 \le R^2 {\hat{\xi }} = \frac{\Vert {\bar{x}}_1 - x_* \Vert ^2}{k^2}, \end{aligned}$$

which yields the shifted statement (14), and hence the theorem, after taking square roots. \(\square \)
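The algebraic verification of the certificate can also be reproduced numerically. The following numpy sketch builds \(U\), \({\hat{X}}\) and \(F({\hat{X}})\) for a concrete k and checks both the equality \(U F({\hat{X}}) + F({\hat{X}})U^T= 2 e_k e_k^T\) and the positive semidefiniteness of the block matrix in (18); it is an illustration of the argument above, not an additional proof step.

```python
import numpy as np

k = 7

# L and U = I - L as in the proof (L_{ij} = -2 i/j for i < j).
L = np.array([[-2.0 * (i + 1) / (j + 1) if i < j else 0.0
               for j in range(k)] for i in range(k)])
U = np.eye(k) - L

# Candidate certificate (xi_hat, X_hat) from the proof.
xi_hat = 1.0 / k**2
X_hat = np.zeros((k, k))
for j in range(1, k):                      # off-diagonal weights j(j+1)
    X_hat[j - 1, j] = X_hat[j, j - 1] = j * (j + 1)
X_hat[k - 1, k - 1] = 2 * k
X_hat /= k**2

def F(X):
    """F(X) = Diag(X e) + 1/2 Diag(diag(X)) - X, cf. (19)."""
    return np.diag(X.sum(axis=1)) + 0.5 * np.diag(np.diag(X)) - X

e_k = np.zeros(k); e_k[-1] = 1.0
S = U @ F(X_hat) + F(X_hat) @ U.T
print(np.allclose(S, 2 * np.outer(e_k, e_k)))      # expected: True

# Block matrix from (18); its eigenvalues should be >= 0 (up to rounding).
M = np.block([[np.array([[xi_hat]]), -0.5 * np.diag(X_hat)[None, :]],
              [-0.5 * np.diag(X_hat)[:, None], S - np.outer(e_k, e_k)]])
print(np.min(np.linalg.eigvalsh(M)) >= -1e-12)     # expected: True
```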

Remark 2.2

The matrix \({\hat{X}}\) in the above proof, carrying the weights \(j(j+1)\) used in (6), was obtained by solving (18) with YALMIP [11] in combination with the SDP solver Sedumi [13] for small values of k. For a theoretical proof that the points \({\hat{\xi }}\) and \({\hat{X}}\) above are not only feasible but actually optimal for (18), and to prove tightness of the derived bound, we refer to Example 3.1 below, which was derived from a numerically obtained low-rank optimal solution of (17). More precisely, after numerically determining the optimal value of (18), a linear equation was added to (17) requiring that \((Y_2)_{kk}\) equals this value, and then the trace of \(Y_2\) was minimized with the intention of finding an optimal solution of minimum rank. This optimal solution was then used to derive Example 3.1 below, proving the tightness of (2). In fact, for any optimal solution of the SDP relaxation (17), there exists at least one Lipschitz continuous operator \({{\tilde{T}}}_k:{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) for appropriately chosen d with some fixed point \(x_*\) such that the inequality in (18) is tight: this follows from appropriately labeling the columns of the symmetric square root of such an optimal solution and a Lipschitz extension argument. Specifically, the Kirszbraun-Theorem [7] allows a Lipschitz extension of an operator that is Lipschitz on a discrete set to the entire space. For further details we refer to Section 4.2 of [12].
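For readers without access to YALMIP/Sedumi, a roughly equivalent computation can be set up in Python. The sketch below (assuming cvxpy with an SDP-capable solver such as SCS is installed; it is not the code used for the paper) solves (18) for a small k, and the optimal value should come out close to \(R^2/k^2\).

```python
import numpy as np
import cvxpy as cp

k, R = 5, 1.0

# U = I - L with L_{ij} = -2 i/j for i < j, as in the proof of Theorem 2.1.
L = np.array([[-2.0 * (i + 1) / (j + 1) if i < j else 0.0
               for j in range(k)] for i in range(k)])
U = np.eye(k) - L
e_k = np.zeros(k); e_k[-1] = 1.0

xi = cp.Variable(nonneg=True)
X = cp.Variable((k, k), symmetric=True)
F = cp.diag(cp.sum(X, axis=1)) + 0.5 * cp.diag(cp.diag(X)) - X   # F(X), cf. (19)

Z = cp.Variable((k + 1, k + 1), PSD=True)    # the block matrix appearing in (18)
constraints = [
    X >= 0,                                  # entrywise nonnegativity
    Z[0, 0] == xi,
    Z[0, 1:] == -0.5 * cp.diag(X),
    Z[1:, 0] == -0.5 * cp.diag(X),
    Z[1:, 1:] == U @ F + F @ U.T - np.outer(e_k, e_k),
]
prob = cp.Problem(cp.Minimize(R ** 2 * xi), constraints)
prob.solve()
print(prob.value, R ** 2 / k ** 2)           # the two values should nearly coincide
```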

3 Tightness and choice of step lengths

Example 3.1

We consider the following one-dimensional real example with fixed point \(x_* = 0\) and starting point \(x_0\not = 0\). Let \(k\in {\mathbb {N}}\) be given. A nonexpansive mapping proving tightness of (2) can then be set up as follows: Let \(T:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be defined via

$$\begin{aligned} T(x):= {\left\{ \begin{array}{ll} x+\tfrac{2R}{k+1}&{} \quad \text { if } x\le - \tfrac{R}{k+1} \\ -x &{} \quad \text { if } -\tfrac{R}{k+1}< x < \tfrac{R}{k+1} \\ x-\tfrac{2R}{k+1}&{} \quad \text { if } \tfrac{R}{k+1} \le x \end{array}\right. } \end{aligned}$$
(23)

for the given \(k\in {\mathbb {N}}\) and \(R:= \Vert x_0-x_*\Vert _2\) with \(x_0\in {\mathbb {R}}\) and \(x_*:= 0\). Note that T satisfies \(T(x_*)=0=x_*\) and is 1-Lipschitz continuous, i.e. nonexpansive, because it is piecewise linear, continuous, and its derivative is bounded in absolute value by one (\(|T'|\le 1\)) whenever it exists. We will now show that applying the Halpern-Iteration results in equality in (2) for the k-th iterate, i.e.

$$\begin{aligned} \Vert \tfrac{1}{2}(x_k-T(x_k))\Vert _2=\tfrac{\Vert x_0-x_*\Vert _2}{k+1} \end{aligned}$$

is satisfied. This means that the bound (2) cannot be improved without making further assumptions, as the operator above would otherwise provide a counterexample. For the first k iterates of the Halpern-Iteration (1) we obtain

$$\begin{aligned} x_j=x_0 \ \underset{\ge \tfrac{1}{k+1}}{\underbrace{\left( 1- \tfrac{j}{k+1} \right) }} \quad \text { for }\,\, j\in \{0,\dots ,k\} \end{aligned}$$

inductively: for \(x_0=0=x_*\) there is nothing to prove. The case \(0<x_0=\Vert x_0-x_*\Vert _2 =R\) follows by using the definition of T and considering for \( j \in \{0,\dots ,k-1\}\) the iterates

$$\begin{aligned} \begin{aligned} x_{j+1}&= \tfrac{x_0}{j+2} +\left( 1 - \tfrac{1}{j+2}\right) T(\underset{\ge \tfrac{x_0}{k+1}=\tfrac{R}{k+1}}{\underbrace{x_j}})\\&=\tfrac{x_0}{j+2} +\left( 1 - \tfrac{1}{j+2} \right) \left( x_j -\tfrac{2R}{k+1}\right) \\&=\tfrac{x_0}{j+2} +\left( 1 - \tfrac{1}{j+2}\right) \left( x_0 \, \left( 1- \tfrac{j}{k+1} \right) -\tfrac{2x_0}{k+1}\right) \\&= x_0\left( \tfrac{1}{j+2} +\left( 1 - \tfrac{1}{j+2}\right) \left( 1- \tfrac{j}{k+1} -\tfrac{2}{k+1}\right) \right) \\&= x_0\left( \tfrac{1}{j+2} +1 - \tfrac{j+2}{k+1}- \tfrac{1}{j+2} +\tfrac{1}{k+1}\right) \\&= x_0 \ \underset{\ge \tfrac{1}{k+1}}{\underbrace{\left( 1- \tfrac{j+1}{k+1} \right) }}, \end{aligned} \end{aligned}$$

which implies that

$$\begin{aligned} \Vert \tfrac{1}{2}(x_k-T(x_k))\Vert _2= & {} \Vert \tfrac{1}{2}\left( x_0 \left( 1- \tfrac{k}{k+1}\right) -\left( x_0\left( 1- \tfrac{k}{k+1}\right) -\tfrac{2R}{k+1}\right) \right) \Vert _2 \nonumber \\= & {} \tfrac{R}{k+1}= \tfrac{\Vert x_0-x_*\Vert _2}{k+1} \end{aligned}$$

holds true. The case \(x_0<0\) follows from the operator's point symmetry \(T(-x)=-T(x)\). This completes the proof of tightness. \(\square \)
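The construction can be replayed numerically. The following sketch implements T from (23) and runs the Halpern-Iteration, reproducing equality in (2) up to rounding; it merely illustrates the computation above.

```python
import numpy as np

def make_T(R, k):
    """The piecewise linear nonexpansive map from (23) with fixed point x_* = 0."""
    s = R / (k + 1)
    def T(x):
        if x <= -s:
            return x + 2 * s
        if x < s:
            return -x
        return x - 2 * s
    return T

k, R = 10, 1.0
T = make_T(R, k)
x = R                              # x_0 = R, so ||x_0 - x_*|| = R
for j in range(k):                 # k steps of the Halpern-Iteration (1)
    lam = 1.0 / (j + 2)
    x = lam * R + (1 - lam) * T(x)
print(0.5 * abs(x - T(x)), R / (k + 1))   # both values agree (up to rounding)
```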

While Example 3.1 shows that the bound (2) is best possible for the Halpern-Iteration with \(\lambda _k=1/(k+2)\), the rate of convergence could be improved for this particular example if the values \(\lambda _k\) were chosen appropriately smaller than \(1/(k+2)\). Let us illustrate next that this is not true in general, i.e. that choosing smaller values of \(\lambda _k\) does not always provide faster convergence. Let H be a Hilbert space with a countable orthonormal Schauder basis \(\{e_j\}_{j\in {\mathbb {N}}}\) and T be the linear operator defined by \(T(e_j)=e_{j+1}\) for \(j\in {\mathbb {N}}\). Hence the unique fixed point is \(x_*=0\). When choosing \(x_0=e_1\), then for any choice of step lengths \(\lambda _j\in [0,1]\), the k-th iterate always lies in the convex hull of \(e_1,\ldots ,e_{k+1}\), and the choice of \(\lambda _j\) minimizing the error \(\Vert x_k-x_*\Vert \) is precisely the step length \(\lambda _j=1/(j+2)\) for all \(0\le j\le k-1\). This step length leads to a slightly faster rate of convergence than (2) for this example, namely \(\frac{1}{2} \Vert x_k-T(x_k)\Vert \le \frac{\Vert x_0-x_*\Vert }{\sqrt{2}(k+1)}\). While this step length does not minimize the residual \(\frac{1}{2} \Vert x_k-T(x_k)\Vert \), the example shows that smaller values of \(\lambda _j\), such as \(\lambda _j := \rho /(j+2)\) for all j with \(\rho \in [0,1)\), lead to larger residuals. On the other hand, larger values \(\lambda _j > 1/(j+2)\) for all j lead to larger residuals in Example 3.1. A small numerical illustration of this discussion is sketched below.
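The following sketch (an illustration, not part of the original argument) runs the Halpern-Iteration on the shift operator with step lengths \(\lambda _j=\rho /(j+2)\) for several values of \(\rho \) and prints the resulting residuals and errors; the last line prints the reference value \(1/(\sqrt{2}(k+1))\) mentioned above.

```python
import numpy as np

k, d = 10, 12                        # d = k + 2 coordinates suffice for k steps

def shift(x):
    """Linear shift T(e_j) = e_{j+1}: an isometry with unique fixed point 0."""
    y = np.zeros_like(x)
    y[1:] = x[:-1]
    return y

def halpern(rho, k):
    x0 = np.zeros(d); x0[0] = 1.0    # x_0 = e_1
    x = x0.copy()
    for j in range(k):
        lam = rho / (j + 2)
        x = lam * x0 + (1 - lam) * shift(x)
    return x

for rho in (0.5, 0.75, 1.0):
    x = halpern(rho, k)
    res = 0.5 * np.linalg.norm(x - shift(x))   # residual
    err = np.linalg.norm(x)                    # distance to x_* = 0
    print(rho, res, err)
print(1.0 / (np.sqrt(2) * (k + 1)))            # reference value from the text
```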

4 Conclusions

We have derived a new and tight complexity bound for the Halpern-Iteration with coefficients chosen as \(\lambda _k= \tfrac{1}{k+2}\). The proof based on semidefinite programming can in principle be adapted for other choices of parameters and fixed point iterations, again leading to tight complexity bounds. For example, for the Krasnoselski–Mann (KM) iteration (see e.g. [1])

$$\begin{aligned} x_{k+1}:=(1-t) x_k+t \ T(x_k) \end{aligned}$$

with some constant stepsize \(t\in [\tfrac{1}{2},1]\) a proof can be found in [12] (Theorem 4.9), where

$$\begin{aligned} L:=-2 \begin{pmatrix} 0 &{} \quad t &{} \quad t &{} \quad \ldots &{} \quad t\\ 0 &{} \quad 0 &{} \quad t &{} \quad \ldots &{} \quad t\\ \vdots &{} \quad \vdots &{} \quad 0 &{} \quad \ddots &{} \quad \vdots \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 &{} \quad t \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 \end{pmatrix} \in {\mathbb {R}}^{k\times k} \end{aligned}$$

is used to define inequalities of the form (15). However, while in practice the KM-Iteration with constant stepsize may often perform much better than the Halpern-Iteration, its worst-case complexity is an order of magnitude worse, and the convergence analysis based on semidefinite programming is considerably longer.
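As a closing illustration (a sketch under stated assumptions, using the shift operator from Sect. 3 as a test instance; no worst-case claim is attached to this single example), one can compare the residuals of the KM-Iteration and the Halpern-Iteration numerically.

```python
import numpy as np

def shift(x):
    y = np.zeros_like(x); y[1:] = x[:-1]
    return y

def km(T, x0, t, k):
    """KM iteration x_{j+1} = (1 - t) x_j + t T(x_j) with constant stepsize t."""
    x = np.asarray(x0, dtype=float)
    for _ in range(k):
        x = (1 - t) * x + t * T(x)
    return x

def halpern(T, x0, k):
    """Halpern-Iteration (1) with lambda_j = 1/(j+2)."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    for j in range(k):
        x = x0 / (j + 2) + (1 - 1.0 / (j + 2)) * T(x)
    return x

k, d = 50, 60
x0 = np.zeros(d); x0[0] = 1.0
for name, xk in (("KM, t = 1/2", km(shift, x0, 0.5, k)),
                 ("Halpern", halpern(shift, x0, k))):
    print(name, 0.5 * np.linalg.norm(xk - shift(xk)))   # residual after k steps
```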