Theorem 2.1
Let \( x_0 \in H\) be arbitrary but fixed. If T has fixed points, i.e. \( Fix(T) \not = \emptyset \), then the iterates defined in (1) satisfy
$$\begin{aligned} \tfrac{1}{2} \Vert x_k - T(x_k) \Vert \le \frac{ \Vert x_0 - x_* \Vert }{k+1} \ \forall k \in {\mathbb {N}}_0 \quad \forall x_* \in Fix(T) \end{aligned}$$
(2)
This bound is tight.
Remark 2.1
A generalization of the Halpern-Iteration, the sequential averaging method (SAM), was analyzed in the recent paper
[14], where for the first time a rate of convergence of order O(1/k) could be established for SAM. The bound in (2) is even slightly sharper (by a factor of 4) than the one established for the more general framework in
[14]. More importantly, however, as shown by Example 3.1 below, the estimate (2) is actually tight, in the sense that for every \(k\in {\mathbb {N}}_0\) there exist a Hilbert space H and a nonexpansive operator T with some fixed point \(x_*\) such that inequality (2) holds with equality.
Estimate (2) refers to the step length \(\lambda _k:=1/(k+2)\). The restriction to this choice is motivated by problem (17) below in the proof based on semidefinite programming; in numerical tests for small dimensions k these coefficients provided a better worst-case complexity than any other choice of coefficients.
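As a purely illustrative addition (not part of the paper), the bound (2) can be checked numerically along the iteration (1) with \(\lambda _k=1/(k+2)\). The sketch below uses a planar rotation as a hypothetical test operator; it is nonexpansive (an isometry) with fixed point \(x_*=0\).

```python
import numpy as np

# Minimal numerical check of the bound (2) for the iteration (1) with
# lambda_k = 1/(k+2).  The test operator is a planar rotation, a hypothetical
# choice: it is nonexpansive (an isometry) with fixed point x_* = 0.
theta = 2.0
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
T = lambda x: Rot @ x

x_star = np.zeros(2)                   # fixed point of the rotation
x0 = np.array([1.0, -0.5])             # arbitrary but fixed starting point
x = x0.copy()
for k in range(100):
    lhs = 0.5 * np.linalg.norm(x - T(x))
    rhs = np.linalg.norm(x0 - x_star) / (k + 1)
    assert lhs <= rhs + 1e-12, (k, lhs, rhs)
    lam = 1.0 / (k + 2)
    x = lam * x0 + (1.0 - lam) * T(x)  # x_{k+1} from x_k, cf. (1)
```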
Next, an elementary direct proof of Theorem 2.1 is given.
Direct proof based on a weighted sum
The iteration (1) with \(\lambda _k=1/(k+2)\) implies for \(1\le j\le k\):
$$\begin{aligned} \textstyle x_j = \frac{1}{j+1}x_0 + \frac{j}{j+1}T(x_{j-1}) \quad \hbox {or}\,\, \ T(x_{j-1}) = \frac{j+1}{j}x_j-\frac{1}{j}x_0 . \end{aligned}$$
(3)
By nonexpansiveness the following inequalities hold:
$$\begin{aligned} {\Vert T(x_k)-x_*\Vert ^2}\le {\Vert x_k-x_*\Vert ^2} \quad \hbox {for}\,\, x_* \in Fix(T) \end{aligned}$$
(4)
and
$$\begin{aligned} {\Vert T(x_j)-T(x_{j-1})\Vert ^2}\le {\Vert x_j-x_{j-1}\Vert ^2} \quad \hbox {for} \,\, j=1,\ldots ,k. \end{aligned}$$
(5)
Below we reformulate the following weighted sum of (5):
$$\begin{aligned} 0 \ge \sum _{j=1}^kj(j+1)\left( {\Vert T(x_j)-T(x_{j-1})\Vert ^2}- {\Vert x_j-x_{j-1}\Vert ^2} \right) . \end{aligned}$$
(6)
Using the second relation in (3) the first terms in the summation (6) are
$$\begin{aligned}&j(j+1){\Vert T(x_j)-T(x_{j-1})\Vert ^2} \nonumber \\&\quad = j(j+1) {\Vert x_j-T(x_j)+\frac{1}{j} (x_j-x_0)\Vert ^2} \nonumber \\&\quad = \textstyle j(j+1){\Vert x_j-T(x_j)\Vert ^2}+2(j+1)\langle x_j-T(x_j) , x_j-x_0 \rangle +\frac{j+1}{j} {\Vert x_j-x_0\Vert ^2}, \nonumber \\ \end{aligned}$$
(7)
and using the first relation in (3) it follows for the second terms in (6)
$$\begin{aligned}&-j(j+1){\Vert x_j-x_{j-1}\Vert ^2} \nonumber \\&\quad = \textstyle - j(j+1) {\Vert \frac{1}{j+1}(x_0-T(x_{j-1}))+T(x_{j-1}) -x_{j-1}\Vert ^2} \nonumber \\&\quad = \textstyle -\frac{j}{j+1} {\Vert x_0-T(x_{j-1})\Vert ^2}-2j\langle x_0-T(x_{j-1}) , T(x_{j-1}) -x_{j-1} \rangle \nonumber \\&\qquad -j(j+1) {\Vert T(x_{j-1}) -x_{j-1}\Vert ^2}. \end{aligned}$$
(8)
Observe [using again the second relation in (3)] that the first term in (8)
$$\begin{aligned} -\frac{j}{j+1} {\Vert x_0-T(x_{j-1})\Vert ^2}= & {} - \frac{j}{j+1} {\Vert \frac{j+1}{j}x_0-\frac{j+1}{j}x_j\Vert ^2} \nonumber \\= & {} - \frac{j+1}{j}{\Vert x_0-x_j\Vert ^2} \end{aligned}$$
(9)
cancels the third term in (7). Summing up the second terms in (8) for \(j=1,\ldots ,k\) we shift the summation index,
$$\begin{aligned} -\sum _{j=1}^k 2j\langle x_0-T(x_{j-1}) , T(x_{j-1})-x_{j-1} \rangle =\sum _{j=0}^{k-1}2(j+1)\langle x_{j}-T(x_{j}) , x_0-T(x_{j}) \rangle , \end{aligned}$$
so that summing up the second terms in (7) and in (8) for \(j=1,\ldots ,k\) results in
$$\begin{aligned}&2(k+1)\langle x_k-T(x_k) , x_k-x_0 \rangle \nonumber \\&\quad +2\sum _{j=1}^{k-1}(j+1) \langle x_j-T(x_j) , x_j-T(x_j) \rangle +2{\Vert x_0-T(x_0)\Vert ^2}. \end{aligned}$$
(10)
Shifting again the index in the summation of the third terms in (8)
$$\begin{aligned} -\sum _{j=1}^k j(j+1){\Vert x_{j-1}-T(x_{j-1})\Vert ^2}= -\sum _{j=0}^{k-1} (j+1)(j+2){\Vert x_{j}-T(x_{j})\Vert ^2} \end{aligned}$$
and summing up the first terms in (7) and the third terms in (8) for \(j=1,\ldots ,k\) gives
$$\begin{aligned} k(k+1){\Vert x_{k}-T(x_{k})\Vert ^2}-2\sum _{j=1}^{k-1}(j+1){\Vert x_j-T(x_j)\Vert ^2}-2{\Vert x_0-T(x_0)\Vert ^2} \end{aligned}$$
(11)
where the sum in the middle cancels the sum in the middle of (10) and the terms \(2{\Vert x_0-T(x_0)\Vert ^2}\) cancel as well. The only remaining terms are the first terms in (10) and (11).
Thus, inserting (9), (10), and (11) in (6) leads to
$$\begin{aligned} 0 \ge k(k+1){\Vert x_k-T(x_k)\Vert ^2}+2(k+1)\langle x_k-T(x_k) , x_k-x_0 \rangle . \end{aligned}$$
(12)
Applying the Cauchy–Schwarz inequality to the second term in (12) leads, for \(k\ge 1\), to
$$\begin{aligned} \frac{1}{2} \Vert x_k-T(x_k)\Vert \le \frac{1}{k} \Vert x_k-x_0\Vert \end{aligned}$$
which may be interesting in its own right. To prove the theorem, (12) is divided by \(k+1\) and then (4) is added:
$$\begin{aligned} 0\ge & {} k{\Vert x_k-T(x_k)\Vert ^2}+2\langle x_k-T(x_k) , x_k-x_0 \rangle + {\Vert T(x_k)-x_*\Vert ^2}- {\Vert x_k-x_*\Vert ^2}\nonumber \\= & {} \textstyle \frac{k+1}{2} {\Vert x_k-T(x_k)\Vert ^2}-\frac{2}{k+1}{\Vert x_0-x_*\Vert ^2} \nonumber \\&+ \frac{2}{k+1}{\Vert x_0-x_*- \frac{k+1}{2}(x_k-T(x_k))\Vert ^2}. \end{aligned}$$
(13)
To see the last equality, the last two terms in (13) can be combined; a straightforward but tedious expansion in terms of \(a:=x_k-T(x_k)\), \(b:= x_k-x_0\), and \(c:= T(x_k)-x_*\) (so that \(a+c=x_k-x_*\) and \(a+c-b =x_0-x_*\)) then reveals the identity.
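This identity can also be checked mechanically. The following sketch (our addition, not part of the proof) verifies it for random vectors a, b, c.

```python
import numpy as np

# Numerical sanity check of the identity behind the last equality in (13),
# written in terms of a = x_k - T(x_k), b = x_k - x_0, c = T(x_k) - x_*.
rng = np.random.default_rng(0)
for k in range(1, 25):
    a, b, c = rng.standard_normal((3, 5))
    lhs = k * (a @ a) + 2 * (a @ b) + c @ c - (a + c) @ (a + c)
    d = a + c - b                                    # = x_0 - x_*
    rhs = (k + 1) / 2 * (a @ a) - 2 / (k + 1) * (d @ d) \
          + 2 / (k + 1) * ((d - (k + 1) / 2 * a) @ (d - (k + 1) / 2 * a))
    assert abs(lhs - rhs) < 1e-9
```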
Omitting the last term in (13) one obtains
$$\begin{aligned} {\Vert x_k-T(x_k)\Vert ^2}\le \left( \tfrac{2}{k+1}\right) ^2 {\Vert x_0-x_*\Vert ^2} \end{aligned}$$
which proves the theorem when taking square roots on both sides. \(\square \)
The above proof is somewhat unintuitive as the choice of the weights with which the inequalities (4) and (5) are added in (6) and (13) is far from obvious. In fact we owe the suggestion of these weights to an extremely helpful anonymous referee, who extracted it from a more complex construction in
[15], which was also the basis for the initial proof of this paper based on semidefinite programming. We state this proof next since it offers a generalizable approach for analyzing fixed point iterations; it can, for example, be modified to cover the KM iteration, as done in the recent thesis
[12], though this modification is quite technical. The proof based on semidefinite programming also led to Example 3.1 below, showing that the rate of convergence is tight.
Proof based on semidefinite programming
Let \(x_* \in Fix(T)\). The Halpern-Iteration was stated in the form (1) to comply with existing literature. For our proof, however, it is more convenient to consider the shifted sequence \({\bar{x}}_1:= x_0\) and \({\bar{x}}_k:=x_{k-1} \ \forall k \in {\mathbb {N}}_{\not = 0 }:=\{1,2,3,\dots \} \) and to show the shifted statement
$$\begin{aligned} \tfrac{1}{2} \Vert {\bar{x}}_k - T({\bar{x}}_k) \Vert \le \frac{\Vert {\bar{x}}_1-x_* \Vert }{k} \quad \forall \,\, k\in {\mathbb {N}}_{\not =0 } \end{aligned}$$
(14)
Let us define \(g(x):=\frac{1}{2}(x-T(x))\). It is well known that g is firmly nonexpansive. For the sake of completeness, the argument is repeated here:
$$\begin{aligned} \begin{aligned}&\Vert g(x)-g(y) \Vert ^2 - \langle g(x)-g(y) , x-y \rangle \\&\quad = \Vert g(x)-g(y) - \tfrac{1}{2}(x-y) \Vert ^2 - \tfrac{1}{4} \Vert x-y \Vert ^2 \\&\quad = \tfrac{1}{4} \Vert T(x)-T(y) \Vert ^2 - \tfrac{1}{4}\Vert x-y \Vert ^2 \le 0 \quad \forall \,\, x,y\in H. \end{aligned} \end{aligned}$$
Firm nonexpansiveness and the Cauchy–Schwarz inequality also imply \( \Vert g(x)-g(y) \Vert \le \Vert x-y \Vert \ \forall \, x,y\in H\). For \( k =1\) the statement (14) follows immediately since \(g(x_*)=0\) and therefore \( \tfrac{1}{2} \Vert {\bar{x}}_1 - T( {\bar{x}}_1) \Vert = \Vert g({\bar{x}}_1 ) \Vert = \Vert g({\bar{x}}_1) - g( x_*) \Vert \le \tfrac{\Vert {\bar{x}}_1-x_* \Vert }{1}\).
For fixed \(k\ge 2\) we first consider the differences \({\bar{x}}_j - {\bar{x}}_1\) for \( j \in \{2,\dots ,k \}\)
$$\begin{aligned} \begin{aligned} {\bar{x}}_{j} -{\bar{x}}_1&= x_{j-1} - {\bar{x}}_1\\&= \lambda _{j-2} x_0 + (1- \lambda _{j-2}) T(x_{j-2}) - {\bar{x}}_1\\&= \tfrac{1}{j} x_0 + \left( 1- \tfrac{1}{j}\right) T(x_{j-2} ) - {\bar{x}}_1 \\&= \left( \tfrac{1}{j}-1\right) {\bar{x}}_1 + \left( 1- \tfrac{1}{j}\right) T( {\bar{x}}_{j-1} ) \\&= \left( \tfrac{1}{j} -1\right) {\bar{x}}_1 + \left( 1- \tfrac{1}{j}\right) ({\bar{x}}_{j-1} - 2 g( {\bar{x}}_{j-1}) ) \\&= \left( 1-\tfrac{1}{j}\right) ({\bar{x}}_{j-1} - {\bar{x}}_1) - 2 \left( 1- \tfrac{1}{j}\right) g({\bar{x}}_{j-1} ) \\&= \tfrac{j-1}{j} ({\bar{x}}_{j-1} - {\bar{x}}_1) - 2 \tfrac{j-1}{j} g({\bar{x}}_{j-1} ) \end{aligned} \end{aligned}$$
which inductively leads to
$$\begin{aligned} {\bar{x}}_{j} -{\bar{x}}_1=- 2 \sum _{l=1}^{j-1} \tfrac{l}{j} g({\bar{x}}_{l}). \end{aligned}$$
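As a quick plausibility check (our addition, using a hypothetical nonexpansive test map), the closed form above can be compared with the shifted iteration numerically:

```python
import numpy as np

# Compare xbar_j - xbar_1 from the shifted iteration with the closed form
# -2 * sum_{l<j} (l/j) * g(xbar_l).  The rotation operator is a hypothetical
# test choice; it is nonexpansive with fixed point 0.
theta = 1.3
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
T = lambda x: Rot @ x
g = lambda x: 0.5 * (x - T(x))                 # g(x) = (x - T(x))/2 as above

kmax = 8
xbar = [None, np.array([1.0, 2.0])]            # xbar_1 = x_0, 1-based indexing
for j in range(2, kmax + 1):
    xbar.append((1 / j) * xbar[1] + (1 - 1 / j) * T(xbar[j - 1]))

for j in range(2, kmax + 1):
    closed_form = -2 * sum((l / j) * g(xbar[l]) for l in range(1, j))
    assert np.allclose(xbar[j] - xbar[1], closed_form)
```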
Let us shorten the notation slightly and define \( g_i:= g({\bar{x}}_i) \), \( R:= \Vert {\bar{x}}_1 -x_* \Vert \ge 0\), the vector \(b=(\langle g_i , {\bar{x}}_1-x_* \rangle )_{i=1}^k\), the matrices \(A:= (\langle g_i , g_j \rangle )_{i,j=1}^k\) and
$$\begin{aligned} L:=-2 \begin{pmatrix} 0 &{} \quad \tfrac{1}{2} &{} \quad \tfrac{1}{3} &{} \quad \ldots &{} \quad \tfrac{1}{k}\\ 0 &{} \quad 0 &{} \quad \tfrac{2}{3} &{} \quad \ldots &{} \quad \tfrac{2}{k}\\ \vdots &{} \quad \vdots &{} \quad 0 &{} \quad \ddots &{} \quad \vdots \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 &{} \quad \tfrac{k-1}{k} \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 \end{pmatrix} \in {\mathbb {R}}^{k\times k}. \end{aligned}$$
Let \(b^T\) denote the transpose of b. Note that
$$\begin{aligned} \begin{pmatrix} R^2 &{} \quad b^T \\ b &{} \quad A \end{pmatrix} \in {\mathbb {R}}^{(k+1) \times (k+1)} \end{aligned}$$
is a Gram matrix formed from \( {\bar{x}}_1-x_*,g_1,\dots ,g_k\in H\) and is therefore symmetric and positive semidefinite. We proceed by expressing the inequalities from firm nonexpansiveness in terms of this Gram matrix. Since the Gram matrix is often of much lower dimension than H, this is sometimes referred to as the "kernel trick". Keeping in mind that we can rewrite the differences \( {\bar{x}}_j - {\bar{x}}_1 = - 2 \sum _{l=1}^{j-1} \frac{l}{j} g_l\) for \(j\in \{1,\dots ,k\}\), we arrive at
$$\begin{aligned} AL = ( \langle g_i , {\bar{x}}_j - {\bar{x}}_1 \rangle )_{i,j=1}^k. \end{aligned}$$
Let \(e \in {\mathbb {R}}^k\) denote the vector of all ones. Then
$$\begin{aligned} \mathrm {diag} (AL) e^T - AL= ( \langle g_i , {\bar{x}}_i - {\bar{x}}_j \rangle )_{i,j=1}^k, \end{aligned}$$
where \(\mathrm {diag}(\cdot )\) denotes the vector of diagonal entries of its matrix argument. Hence
$$\begin{aligned} \mathrm {diag} (AL) e^T +e \ \mathrm {diag} (AL)^T - AL -L^T A= ( \langle g_i-g_j , {\bar{x}}_i - {\bar{x}}_j \rangle )_{i,j=1}^k \end{aligned}$$
and
$$\begin{aligned} b e^T + AL= & {} ( \langle g_i , {\bar{x}}_j - x_* \rangle )_{i,j=1}^k,\\ \mathrm {diag} (A) e^T + e \ \mathrm {diag}(A)^T -2 A= & {} (\Vert g_i-g_j \Vert ^2)_{i,j=1}^k. \end{aligned}$$
The firm nonexpansiveness inequalities \( \Vert g_i- g_j \Vert ^2 \le \langle g_i-g_j , {\bar{x}}_i-{\bar{x}}_j \rangle \) are equivalent to the component-wise inequality
$$\begin{aligned} \mathrm {diag} (A) e^T + e \ \mathrm {diag}(A)^T -2 A\le & {} \mathrm {diag} (AL) e^T +e \ \mathrm {diag} (AL)^T \nonumber \\&- AL -L^T A. \end{aligned}$$
(15)
Note that only \( \tfrac{k^2-k}{2}\) of these componentwise inequalities are nonredundant. From
\(g_*:=g(x_*)=0\) we obtain another k inequalities, i.e. \( \Vert g_i \Vert ^2 \le \langle g_i , {\bar{x}}_i-x_* \rangle \), which translate to
$$\begin{aligned} \mathrm {diag}(A) \le b + \mathrm {diag}(AL). \end{aligned}$$
(16)
Defining \(U:=I-L\) , relations (15) and (16) can be shortened slightly to
$$\begin{aligned} \mathrm {diag} (AU) e^T + e \ \mathrm {diag}(AU)^T \le AU +U^T A \end{aligned}$$
and
$$\begin{aligned} \mathrm {diag}(AU) \le b. \end{aligned}$$
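These two shortened relations can be verified numerically for a concrete nonexpansive test map. The sketch below (our own construction; the rotation operator is hypothetical) builds A, b, L, and U from such an iteration and checks both inequalities componentwise.

```python
import numpy as np

# Sanity check of the matrix reformulation: for a hypothetical nonexpansive
# test map T (a planar rotation with fixed point x_* = 0), build the Gram
# data A, b and the matrices L, U, then verify the two shortened relations.
theta = 1.3
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
T = lambda x: Rot @ x
g = lambda x: 0.5 * (x - T(x))

k = 8
xbar = [None, np.array([1.0, 2.0])]                 # xbar_1 = x_0
for j in range(2, k + 1):
    xbar.append((1 / j) * xbar[1] + (1 - 1 / j) * T(xbar[j - 1]))

x_star = np.zeros(2)
G = np.column_stack([g(xbar[i]) for i in range(1, k + 1)])  # columns g_1,...,g_k
A = G.T @ G                                                 # Gram matrix (<g_i, g_j>)
b = G.T @ (xbar[1] - x_star)
L = np.array([[-2 * l / j if l < j else 0.0
               for j in range(1, k + 1)] for l in range(1, k + 1)])
U = np.eye(k) - L

e = np.ones(k)
d = np.diag(A @ U)
assert np.all(np.outer(d, e) + np.outer(e, d) <= A @ U + U.T @ A + 1e-10)
assert np.all(d <= b + 1e-10)
```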
Let \(e_k \in {\mathbb {R}}^k\) denote the k-th unit vector, \( {\mathbb {S}}^n := \{ X \in {\mathbb {R}}^{n \times n }\ | \ X=X^T \}\) denote the space of symmetric matrices and \({\mathbb {S}}_+^n := \{ X \in {\mathbb {S}}^n\ | x^TXx\ge 0 \ \forall x \in {\mathbb {R}}^n \}\) the convex cone of positive semidefinite matrices. Consider the chain of inequalities
$$\begin{aligned}&\begin{aligned} \Vert g({\bar{x}}_k) \Vert ^2 = \underset{ y_0 \in {\mathbb {R}}, y_1 \in {\mathbb {R}}^{k} , Y_2\in {\mathbb {S}}^k}{\text {maximize}} (Y_2)_{kk} \,&| \ \begin{pmatrix} y_0 &{} y_1^T \\ y_1 &{} Y_2 \end{pmatrix} \in {\mathbb {S}}_+^{k+1}, \ y_0\le R^2, \ \mathrm {diag}(Y_2 U) \le y_1 \\&|\ \mathrm {diag}(Y_2 U)e^T + e \ \mathrm {diag}(U^T Y_2)^T \le Y_2U+ U^TY_2 \\&| \ y_0=R^2 ,y_1=b, Y_2= A \\ \\ \end{aligned} \nonumber \\&\begin{aligned} \le \underset{ y_0 \in {\mathbb {R}}, y_1 \in {\mathbb {R}}^{k} , Y_2=Y_2^T \in {\mathbb {S}}^k}{\text {maximize}} (Y_2)_{kk} \&| \ \begin{pmatrix} y_0 &{} y_1^T \\ y_1 &{} Y_2 \end{pmatrix} \in {\mathbb {S}}_+^{k+1}, \ y_0\le R^2, \ \mathrm {diag}(Y_2 U) \le y_1 \\&|\ \mathrm {diag}(Y_2 U)e^T + e \ \mathrm {diag}(U^T Y_2)^T \le Y_2U+ U^TY_2 \\ \\ \end{aligned} \end{aligned}$$
(17)
$$\begin{aligned}&\begin{aligned} \le \underset{ \xi \in {\mathbb {R}}_+, X\in {\mathbb {S}}^k \cap {\mathbb {R}}_+^{k\times k }}{\text {minimize}} \ \ R^2 \xi \ | \&\begin{pmatrix} \xi &{} -\tfrac{1}{2} \mathrm {diag}(X)^T \\ -\tfrac{1}{2} \mathrm {diag}(X) &{} U F(X) + F(X)U^T \end{pmatrix} - \begin{pmatrix} 0&{} 0\\ 0 &{} e_ke_k^T \end{pmatrix} \in {\mathbb {S}}_+^{k+1} \end{aligned} \end{aligned}$$
(18)
for
$$\begin{aligned} F(X) := \mathrm {Diag}(Xe) + \tfrac{1}{2} \mathrm {Diag}(\mathrm {diag}(X))-X, \end{aligned}$$
(19)
where \(\mathrm {Diag}(.)\) denotes the square diagonal matrix with its vector argument on the diagonal. The first equality follows from construction, the first inequality from relaxing, and the second inequality from weak conic duality as detailed in Sect. 5. We conclude the proof by showing feasibility of \({\hat{\xi }} := \frac{1}{k^2} >0\) and
$$\begin{aligned} {\hat{X}} := \frac{1}{k^2} \begin{pmatrix} 0 &{} \quad 1 \cdot 2 &{} \quad 0 &{} \quad \ldots &{} \quad \ldots &{} \quad 0 \\ 1 \cdot 2 &{} \quad 0&{} \quad 2 \cdot 3 &{} \quad 0 &{} \quad \ldots &{} \quad \vdots \\ 0 &{} \quad 2 \cdot 3 &{} \quad \ddots &{} \quad \ddots &{} \quad \ddots &{} \quad \vdots \\ \vdots &{} \quad \ddots &{} \quad \ddots &{} \quad \ddots &{} \quad (k-2)(k-1) &{} \quad 0 \\ 0&{} \quad \ldots &{} \quad 0&{} \quad (k-2)(k-1) &{} \quad 0&{} \quad (k-1)k \\ 0 &{} \quad \ldots &{} \quad 0 &{} \quad 0 &{} \quad (k-1)k &{} \quad 2 k \end{pmatrix} \in {\mathbb {R}}^{k \times k} \end{aligned}$$
for the last optimization problem (18). First note that \({\hat{X}}= {\hat{X}}^T\) is symmetric with nonnegative entries. A short computation reveals that the equality
$$\begin{aligned} U F({\hat{X}}) + F({\hat{X}})U^T= 2 e_k e_k^T \end{aligned}$$
holds true: Define the diagonal matrix \( D:= \frac{1}{k} \mathrm {Diag}([1,\ldots ,k ]^T ) \in {\mathbb {R}}^{k \times k }\), together with the strict upper triangular matrix
$$\begin{aligned} P:= \begin{pmatrix} 0 &{} \quad 1 &{} \quad \ldots &{} \quad 1\\ &{} \quad \ddots &{} \quad \ddots &{} \quad \vdots \\ &{} \quad &{} \quad \ddots &{} \quad 1 \\ &{} \quad &{} \quad &{} \quad 0 \end{pmatrix} \in {\mathbb {R}}^{k\times k} \end{aligned}$$
(20)
and the bidiagonal matrix
$$\begin{aligned} B:= \begin{pmatrix} 0 &{} \quad 1 &{} \quad &{} \quad \\ 1 &{} \quad \ddots &{} \quad \ddots &{} \quad \\ &{} \quad \ddots &{} \quad \ddots &{} \quad 1 \\ &{} \quad &{} \quad 1 &{} \quad 0 \end{pmatrix} \in {\mathbb {S}}^k. \end{aligned}$$
The matrices \(U, {\hat{X}}\) and \(F({\hat{X}})\) can now be expressed as
$$\begin{aligned} \begin{aligned}&U= I +2 D P D^{-1}, \quad {\hat{X}} = DBD+\frac{2}{k} e_ke_k^T&\quad \text { and } \\&F({\hat{X}} ) = 2D^2-e_ke_k^T -DBD. \end{aligned} \end{aligned}$$
(21)
Combining the equalities (21), \(D e_k =e_k\) and \( D^{-1} e_k =e_k\), yields
$$\begin{aligned} \begin{aligned} U F ({\hat{X}} )&= 2 D^2 -e_k e_k^T- DBD +4 DPD-2 DP e_k e_k^T -2 DP BD \\&= D(2I -e_k e_k^T -B+4 P-2 P e_k e_k^T -2 PB) D \end{aligned} \end{aligned}$$
(22)
and using (22) we compute
$$\begin{aligned} \begin{aligned}&U F({\hat{X}}) + F({\hat{X}}) U^T -2 e_k e_k^T \\&\quad = D(4I - 2e_k e_k^T -2 B + 4 P-2 P e_k e_k^T -2 PB + 4 P^T-2 e_k e_k^T P^T -2 B P^T - 2 e_k e_k^T)D \\&\quad = D(\underbrace{4I+ 4 P+ 4 P^T}_{=4 e e^T} - 4e_k e_k^T -2 \underbrace{P e_k}_{e-e_k} e_k^T -2 e_k \underbrace{e_k^T P^T}_{e^T-e_k^T} -2 B -2 PB -2 B P^T )D \\&\quad = D (4 e e^T \underbrace{-2 ee_k^T -2 e_k e^T -2 B -2 PB -2 B P^T}_{=-4 ee^T}) D \\&\quad = 0, \end{aligned} \end{aligned}$$
which implies \(U F({\hat{X}}) + F({\hat{X}})U^T= 2 e_k e_k^T\) as we claimed above. Consequently
$$\begin{aligned} \begin{pmatrix} {\hat{\xi }} &{} \quad -\tfrac{1}{2} \mathrm {diag}({\hat{X}})^T \\ -\tfrac{1}{2} \mathrm {diag}({\hat{X}}) &{} \quad U F({\hat{X}}) + F({\hat{X}})U^T \end{pmatrix} - \begin{pmatrix} 0&{} \quad 0\\ 0 &{} \quad e_ke_k^T \end{pmatrix} =\begin{pmatrix} \frac{1}{k^2} &{} \quad - \frac{1}{k} e_k^T \\ - \frac{1}{k} e_k &{} \quad e_k e_k^T \end{pmatrix} \succeq 0 \end{aligned}$$
is positive semidefinite and, as a result, \( {\hat{\xi }}\) and \( {\hat{X}}\) are feasible for (18). Hence
$$\begin{aligned} \Vert g({\bar{x}}_k) \Vert ^2 \le R^2 {\hat{\xi }} = \frac{\Vert {\bar{x}}_1 - x_* \Vert ^2}{k^2}, \end{aligned}$$
which yields the desired result after taking the square root. \(\square \)
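The matrix identities used in the feasibility argument, namely (21), \(U F({\hat{X}}) + F({\hat{X}})U^T= 2 e_k e_k^T\), and positive semidefiniteness of the block matrix above, can be double-checked numerically. The following sketch is our addition and only restates the algebra proved above.

```python
import numpy as np

# Numerical double-check of the feasibility argument: the factorizations (21),
# the identity U*F(Xhat) + F(Xhat)*U^T = 2*e_k*e_k^T, and positive
# semidefiniteness of the block matrix certifying feasibility for (18).
def F(X):
    # F(X) = Diag(X e) + (1/2) Diag(diag(X)) - X, cf. (19)
    return np.diag(X @ np.ones(len(X))) + 0.5 * np.diag(np.diag(X)) - X

for k in range(2, 12):
    D = np.diag(np.arange(1, k + 1) / k)
    P = np.triu(np.ones((k, k)), 1)                  # strict upper triangle of ones, cf. (20)
    B = np.diag(np.ones(k - 1), 1) + np.diag(np.ones(k - 1), -1)
    ek = np.zeros(k)
    ek[-1] = 1.0
    U = np.eye(k) + 2 * D @ P @ np.linalg.inv(D)     # = I - L, cf. (21)
    Xhat = D @ B @ D + (2 / k) * np.outer(ek, ek)
    assert np.allclose(F(Xhat), 2 * D @ D - np.outer(ek, ek) - D @ B @ D)
    assert np.allclose(U @ F(Xhat) + F(Xhat) @ U.T, 2 * np.outer(ek, ek))
    # feasibility matrix for (18) with xi_hat = 1/k^2
    M = np.block([[np.array([[1 / k ** 2]]), -0.5 * np.diag(Xhat)[None, :]],
                  [-0.5 * np.diag(Xhat)[:, None],
                   U @ F(Xhat) + F(Xhat) @ U.T - np.outer(ek, ek)]])
    assert np.min(np.linalg.eigvalsh(M)) > -1e-10
```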
Remark 2.2
The matrix \({\hat{X}}\) in the above proof, carrying the weights \(j(j+1)\) used in (6), was obtained by solving (18) with YALMIP
[11] in combination with the SDP solver SeDuMi
[13] for small values of k. In order to provide a theoretical proof that the points \({\hat{\xi }}\) and \({\hat{X}}\) above are not only feasible but actually optimal for (18), and to prove tightness of the derived bound, we refer to Example 3.1 below, which was derived from a numerically obtained low-rank optimal solution of (17). More precisely, after numerically determining the optimal value of (18), a linear equation was added to (18) requiring that \((Y_2)_{kk}\) equals this value, and then the trace of \(Y_2\) was minimized with the intention of finding an optimal solution of minimum rank. This optimal solution was then used to derive Example 3.1 below, proving the tightness of (2). In fact, for any optimal solution of the SDP relaxation (17) there exists at least one Lipschitz continuous operator \({{\tilde{T}}}_k:{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) for appropriately chosen d with some fixed point \(x_*\) such that the inequality in (18) is tight: this follows from appropriately labeling the columns of the symmetric square root of such an optimal solution and a Lipschitz extension argument. Specifically, the Kirszbraun theorem
[7] allows a Lipschitz extension of an operator that is Lipschitz on a discrete set to the entire space. For further details we refer to Section 4.2 of
[12].
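As described in Remark 2.2, the relaxation (17) can also be solved numerically. The original computations used YALMIP and SeDuMi; purely as an illustration, the following sketch formulates the relaxation with the Python package cvxpy (our own reformulation; variable names and solver choice are not from the paper). Up to solver accuracy, the optimal value should agree with \(R^2/k^2\), the squared bound in (14).

```python
import numpy as np
import cvxpy as cp

# Sketch of the SDP relaxation (17) in cvxpy (the paper used YALMIP/SeDuMi).
# Up to solver accuracy the optimal value should equal R^2/k^2, cf. (14).
k, R = 5, 1.0
L = np.array([[-2 * l / j if l < j else 0.0
               for j in range(1, k + 1)] for l in range(1, k + 1)])
U = np.eye(k) - L
ones = np.ones((k, k))

Y = cp.Variable((k + 1, k + 1), PSD=True)       # surrogate for the Gram matrix
y0, y1, Y2 = Y[0, 0], Y[1:, 0], Y[1:, 1:]
dU = cp.diag(cp.diag(Y2 @ U))                   # Diag(diag(Y2*U)) as a matrix
constraints = [y0 <= R ** 2,
               cp.diag(Y2 @ U) <= y1,
               dU @ ones + ones @ dU <= Y2 @ U + U.T @ Y2]
prob = cp.Problem(cp.Maximize(Y2[k - 1, k - 1]), constraints)
prob.solve()
print(prob.value, R ** 2 / k ** 2)              # these should roughly agree
```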