Non-optimality of the Greedy Algorithm for subspace orderings in the method of alternating projections

The method of alternating projections involves projecting an element of a Hilbert space cyclically onto a collection of closed subspaces. It is known that the resulting sequence always converges in norm and that one can obtain estimates for the rate of convergence in terms of quantities describing the geometric relationship between the subspaces in question, namely their pairwise Friedrichs numbers. We consider the question of how best to order a given collection of subspaces so as to obtain the best estimate on the rate of convergence. We prove, by relating the ordering problem to a variant of the famous Travelling Salesman Problem, that correctness of a natural form of the Greedy Algorithm would imply that $\mathrm{P}=\mathrm{NP}$, before presenting a simple example which shows that, contrary to a claim made in the influential paper [Kayalar-Weinert, Math. Control Signals Systems, vol. 1(1), 1988], the result of the Greedy Algorithm is not in general optimal. We go on to establish sharp estimates on the degree to which the result of the Greedy Algorithm can differ from the optimal result. Underlying all of these results is a construction which shows that for any matrix whose entries satisfy certain natural assumptions it is possible to construct a Hilbert space and a collection of closed subspaces such that the pairwise Friedrichs numbers between the subspaces are given precisely by the entries of that matrix.


Introduction
Let $X$ be a real or complex Hilbert space, $N \ge 2$ an integer, and suppose that $M_1, \dots, M_N$ are closed subspaces of $X$. Furthermore let $P_k$ denote the orthogonal projection onto $M_k$, $1 \le k \le N$, and let $P_M$ denote the orthogonal projection onto the intersection $M = M_1 \cap \dots \cap M_N$. If we let $T = P_N \cdots P_1$ then it follows from a classical theorem due to Halperin [8] that
$$\|T^n x - P_M x\| \to 0, \quad n \to \infty, \tag{1.1}$$
for all $x \in X$. It follows easily that, for any $x \in X$, the sequence in $X$ obtained by starting at $x$ and then projecting cyclically onto the $N$ subspaces $M_1, \dots, M_N$ must converge to the point $P_M x$, which is the point in $M$ closest to the starting vector $x$. This procedure is known as the method of alternating projections and has many applications, for instance to the iterative solution of large linear systems but also in the theory of partial differential equations and in image restoration; see [3] for a survey.
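The cyclic projection scheme described above can be tried out numerically. The following is a minimal sketch in pure Python, assuming $X = \mathbb{R}^3$ and two planes through the origin given by unit normals; the helper names (`project_onto_plane`, `alternating_projections`) and the concrete normals are illustrative choices, not taken from the paper. Here the intersection $M$ is the first coordinate axis, so $P_M x$ keeps only the first coordinate of $x$.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


def project_onto_plane(x, normal):
    """Orthogonal projection of x onto the plane through 0 with unit normal `normal`."""
    t = dot(x, normal)
    return tuple(a - t * b for a, b in zip(x, normal))


def alternating_projections(x, normals, sweeps):
    """Apply T = P_N ... P_1 to x `sweeps` times (one sweep = one full cycle)."""
    for _ in range(sweeps):
        for normal in normals:
            x = project_onto_plane(x, normal)
    return x


# Hypothetical data: M_1 = {z = 0}, M_2 = {0.6*y + 0.8*z = 0}.
# Their intersection M is the x-axis, so P_M (x, y, z) = (x, 0, 0).
normals = [(0.0, 0.0, 1.0), (0.0, 0.6, 0.8)]
x0 = (1.0, 2.0, 3.0)
limit = alternating_projections(x0, normals, sweeps=100)
expected = (1.0, 0.0, 0.0)
```

In this sketch the iterates approach `expected` geometrically; the contraction per sweep is governed by the angle between the two planes.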
In view of these applications it is important to understand the rate at which the convergence in (1.1) takes place; see for instance [1,2,6,7] for in-depth investigations. Recall that the Friedrichs number $c(L_1, L_2)$ between two closed subspaces $L_1, L_2$ of $X$ is defined as
$$c(L_1, L_2) = \sup\big\{ |(x_1, x_2)| : x_k \in L_k \cap L^\perp \text{ and } \|x_k\| \le 1 \text{ for } k = 1, 2 \big\},$$
where $L = L_1 \cap L_2$. The Friedrichs number lies in the interval $[0, 1]$ and may be thought of as the cosine of the 'angle' between the subspaces $L_1$ and $L_2$. It is shown in [9, Theorem 2] that for $N = 2$ in the method of alternating projections we have
$$\|T^n - P_M\| = c(M_1, M_2)^{2n-1}, \quad n \ge 1. \tag{1.2}$$
For general $N \ge 2$, if the subspaces $M_1, \dots, M_N$ are pairwise quasi-disjoint then
$$\|T^n - P_M\| \le c(M_N, M_1)^{n-1} \prod_{k=1}^{N-1} c(M_k, M_{k+1})^n, \quad n \ge 1. \tag{1.3}$$
Moreover, the assumption on the subspaces cannot be omitted. The same bound was obtained earlier in [9] in the special case where the subspaces are independent. Examples in [5, Section 3] show both that the bound in (1.3) fails to be sharp in some special cases, thus disproving a conjecture made in [9], and more generally that it is not possible for $N \ge 3$ to obtain a sharp upper bound for $\|T^n - P_M\|$, $n \ge 1$, which depends only on the pairwise Friedrichs numbers between the subspaces $M_1, \dots, M_N$. Nevertheless, the estimate in (1.3) recovers the sharp bound in (1.2) when $N = 2$ and holds with equality in a number of other cases, for instance if all of the spaces $M_1, \dots, M_N$ are one-dimensional. We also see from (1.3) that if the Friedrichs number between a pair of consecutive subspaces is zero then we have convergence in the method of alternating projections after at most two steps. Since our interest here is primarily in the asymptotic rate of convergence as $n \to \infty$, there is no significant loss of generality in assuming that $c(M_k, M_\ell) > 0$ for $1 \le k, \ell \le N$ with $k \ne \ell$. In this case (1.3) may be recast as
$$\|T^n - P_M\| \le \frac{r^n}{c(M_N, M_1)}, \quad n \ge 1, \quad \text{where } r = \prod_{k=1}^{N} c(M_k, M_{k+1}), \tag{1.4}$$
indices henceforth being considered modulo $N$. Since the asymptotic rate of convergence is determined by the value of $r \in (0, 1]$, it is natural to seek the reordering of the subspaces $M_1, \dots, M_N$ which leads to the smallest possible value of $r$.
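The one-dimensional case mentioned above can be checked by hand. The following short computation is a sketch under the assumption that $M_k = \operatorname{span}\{u_k\}$ for unit vectors $u_k$ spanning pairwise distinct lines; the vectors $u_k$ are notation introduced here for illustration, and the inner product is taken to be linear in its first argument. Under this assumption $M = \{0\}$, so $P_M = 0$, and $c(M_k, M_\ell) = |(u_k, u_\ell)|$ for $k \ne \ell$.

```latex
\begin{align*}
P_k x &= (x, u_k)\, u_k, \\
T x = P_N \cdots P_1 x &= (x, u_1) \Big( \prod_{k=1}^{N-1} (u_k, u_{k+1}) \Big) u_N, \\
T^n x &= (x, u_1)\, (u_N, u_1)^{n-1} \Big( \prod_{k=1}^{N-1} (u_k, u_{k+1}) \Big)^{n} u_N, \\
\|T^n - P_M\| = \|T^n\| &= c(M_N, M_1)^{n-1} \prod_{k=1}^{N-1} c(M_k, M_{k+1})^{n}, \qquad n \ge 1.
\end{align*}
```

The last line uses $\sup\{|(x, u_1)| : \|x\| \le 1\} = 1$. For $N = 2$ the right-hand side reduces to $c(M_1, M_2)^{2n-1}$.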
More formally, given $N \ge 2$ we let $S_N$ denote the symmetric group on $N$ letters and for each $\sigma \in S_N$ we let $r_\sigma = \prod_{k=1}^{N} c(M_{\sigma(k)}, M_{\sigma(k+1)})$, so that for the reordered product $T_\sigma = P_{\sigma(N)} \cdots P_{\sigma(1)}$ we obtain
$$\|T_\sigma^n - P_M\| \le \frac{r_\sigma^n}{c(M_{\sigma(N)}, M_{\sigma(1)})}, \quad n \ge 1.$$
The objective therefore is to find a permutation $\sigma \in S_N$ such that $r_\sigma = r_*$, where $r_* = \min\{r_\sigma : \sigma \in S_N\}$, and to find such a permutation a version of the following 'greedy' algorithm was proposed in [9, Section 9].
Greedy Algorithm: Given $N \ge 2$ independent closed subspaces $M_1, \dots, M_N$ of a Hilbert space $X$ whose mutual Friedrichs numbers are known, we obtain permutations $\sigma_k \in S_N$, $1 \le k \le N$, as follows. Let $\sigma_k(1) = k$ and for $j = 2, \dots, N$ consider as possible values for $\sigma_k(j)$ any previously unused index $\ell$ which minimises $c(M_{\sigma_k(j-1)}, M_\ell)$. If at any stage there is more than one choice of such an index $\ell$ then proceed by considering all possible choices of this index and take $\sigma_k$ to be that permutation which, among those leading to the least value of $r_{\sigma_k}$, comes first in the lexicographical ordering. Return the permutation $\sigma_G = \sigma_\ell$, where $\ell \in \{1, \dots, N\}$ is the smallest index such that $r_{\sigma_\ell} = \min\{r_{\sigma_k} : 1 \le k \le N\}$.
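The Greedy Algorithm can be sketched in a few lines of Python and compared against an exhaustive search for $r_*$. The sketch below works directly on a matrix of pairwise Friedrichs numbers and uses 0-based indices; the function names and the $4 \times 4$ matrix `W` are hypothetical illustrations and are not the example from the paper. Exact rational arithmetic (`fractions.Fraction`) is used so that ties are detected exactly.

```python
from fractions import Fraction as F
from itertools import permutations
from math import prod


def cycle_product(w, order):
    """r_sigma: product of the weights along the closed cycle `order`."""
    n = len(order)
    return prod(w[order[i]][order[(i + 1) % n]] for i in range(n))


def greedy_paths(w, path, unused):
    """Yield every ordering extending `path` by always moving to a nearest unused index."""
    if not unused:
        yield tuple(path)
        return
    last = path[-1]
    best = min(w[last][v] for v in unused)
    for v in sorted(unused):
        if w[last][v] == best:  # explore all tied choices
            yield from greedy_paths(w, path + [v], unused - {v})


def greedy_order(w):
    """Return (r_G, sigma_G): least product first, then lexicographically first path."""
    n = len(w)
    candidates = []
    for k in range(n):
        paths = greedy_paths(w, [k], frozenset(range(n)) - {k})
        candidates.append(min((cycle_product(w, p), p) for p in paths))
    return min(candidates)


def optimal_order(w):
    """Return (r_*, sigma_*) by brute force over all cycles through index 0."""
    n = len(w)
    return min((cycle_product(w, (0,) + p), (0,) + p)
               for p in permutations(range(1, n)))


# Hypothetical Friedrichs matrix with N = 4 (symmetric, zero diagonal).
W = [
    [F(0), F(3, 10), F(2, 5), F(1, 2)],
    [F(3, 10), F(0), F(7, 20), F(9, 20)],
    [F(2, 5), F(7, 20), F(0), F(9, 10)],
    [F(1, 2), F(9, 20), F(9, 10), F(0)],
]
r_g, sigma_g = greedy_order(W)
r_star, sigma_star = optimal_order(W)
```

For this particular matrix every greedy path is drawn along the cheapest edge between indices 0 and 1 and is then forced to use the expensive edge between indices 2 and 3, so that `r_g` is strictly larger than `r_star`.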
If for $N \ge 2$ we let $r_G = r_{\sigma_G}$, then the Greedy Algorithm is correct if and only if $r_G = r_*$ for all constellations of subspaces. By definition of $r_*$ it is clear that $r_* \le r_G$. In Section 3 we show that if the Greedy Algorithm were correct then it would follow that $\mathrm{P} = \mathrm{NP}$. We then exhibit a simple example with $N = 4$ in which $r_* < r_G$. Both results are obtained as a consequence of a construction, presented in Section 2, which shows that any suitable collection of numbers in $[0, 1]$ arises as the set of pairwise Friedrichs numbers between subspaces of some Hilbert space. This result is of independent interest and in particular implies that the problem of finding an optimal ordering is at least as hard as solving a multiplicative form of the Travelling Salesman Problem (TSP). In Section 4 we give sharp estimates for the maximal discrepancies between $r_*$ and $r_G$. In particular, we show that generically $r_G < r_*^{1/2}$, and that the estimate is optimal in the sense that for every $\varepsilon \in (0, 1)$ there exists some $N \ge 2$ and a suitable collection of $N$ subspaces of some Hilbert space such that $r_G > (1 - \varepsilon) r_*^{1/2}$. The last step once again requires the construction from Section 2.

Friedrichs matrices
Given $N \ge 2$ closed subspaces $M_1, \dots, M_N$ of a Hilbert space, we may consider the $N \times N$ matrix $(c(M_k, M_\ell))_{1 \le k, \ell \le N}$ whose entries are the pairwise Friedrichs numbers between the various subspaces. We call the matrix arising in this way the Friedrichs matrix corresponding to the collection of subspaces. It is clear that any Friedrichs matrix must be symmetric, have zeros along its main diagonal and elsewhere must have entries lying in the interval $[0, 1]$. Is every square matrix which has these three properties a Friedrichs matrix for some collection of closed subspaces? The following result answers this question in the affirmative. Here and in what follows we use the same notation as in Section 1.
Theorem 2.1. Let $\mathbb{F} \in \{\mathbb{R}, \mathbb{C}\}$ and $N \ge 2$, and suppose that $C$ is an $N \times N$ matrix which is symmetric, has zeros along its main diagonal and elsewhere has entries lying in the interval $[0, 1]$. Then there exists a Hilbert space $X$ over the field $\mathbb{F}$ and closed subspaces $M_1, \dots, M_N$ of $X$ such that $C$ is the corresponding Friedrichs matrix. Furthermore, the subspaces can be constructed in such a way that $M_k \cap M_\ell = \{0\}$ for $1 \le k, \ell \le N$ with $k \ne \ell$ and, if $N \ge 3$, $P_k P_\ell P_m = 0$ for $1 \le k, \ell, m \le N$ mutually distinct.
Proof. Let $C = (c_{k,\ell})$ and suppose first that $0 \le c_{k,\ell} < 1$ for $1 \le k, \ell \le N$. Let $\{e_{k,\ell} : 1 \le k, \ell \le N,\ k \ne \ell\}$ be an orthonormal basis for the space $X = \mathbb{F}^{N(N-1)}$ endowed with the Euclidean norm, and set [...], noting that these sets are orthonormal, and consider the closed subspaces of $X$ given by $M_k = \operatorname{span} B_k$. By our assumption that the entries of $C$ be strictly smaller than 1 we see [...] By the first part we may find closed subspaces [...] and endow $X$ with its natural Hilbert space norm. Moreover, let $U, V$ be two closed subspaces of [...]
$$[\dots] = \begin{cases} U, & c_{\ell,m} = 1 \text{ and } k = \ell, \\ V, & c_{\ell,m} = 1 \text{ and } k = m, \\ \{0\}, & \text{otherwise}, \end{cases}$$
[...] By [4, Theorem 9.35] this implies that $c(M_k, M_\ell) = 1 = c_{k,\ell}$, and hence we have the required subspaces. Moreover, it is clear from the construction that $M_k \cap M_\ell = \{0\}$ for $1 \le k, \ell \le N$ with $k \ne \ell$ and, if $N \ge 3$, that $P_k P_\ell P_m = 0$ for $1 \le k, \ell, m \le N$ mutually distinct.

Incorrectness of the Greedy Algorithm
In this section we turn to the Greedy Algorithm presented in Section 1, and in particular we ask whether the algorithm is correct in the sense that the ordering it produces leads to the optimal value of r ∈ [0, 1] in (1.4). We first consider the connection between our problem of finding an optimal ordering and the classical TSP, and we show in Corollary 3.3 below that correctness of the Greedy Algorithm for a sufficiently large class of cases would imply that P = NP. We then exhibit a simple example in which the Greedy Algorithm gives a suboptimal ordering.
Recall that in the graph-theoretical formulation of the TSP we are given, for some $N \ge 2$, a complete graph $K_N$ with vertices $V_N = \{1, 2, \dots, N\}$ and a weight function $w \colon \{(k, \ell) \in V_N^2 : k \ne \ell\} \to \mathbb{R}$ such that $w(k, \ell) = w(\ell, k)$ for $1 \le k, \ell \le N$ with $k \ne \ell$, and the objective is to find a permutation $\sigma_* \in S_N$ such that $\Sigma_{\sigma_*} = \min\{\Sigma_\sigma : \sigma \in S_N\}$, where for a permutation $\sigma \in S_N$ we let $\Sigma_\sigma = \sum_{k=1}^{N} w(\sigma(k), \sigma(k+1))$ with indices, as usual, considered modulo $N$. We will be interested primarily in the multiplicative form of the TSP, denoted by MTSP, in which the objective is to minimise not the additive cost but the multiplicative one, that is, to find $\sigma_* \in S_N$ such that $\Pi_{\sigma_*} = \min\{\Pi_\sigma : \sigma \in S_N\}$, where for a permutation $\sigma \in S_N$ we let $\Pi_\sigma = \prod_{k=1}^{N} w(\sigma(k), \sigma(k+1))$.
It is clear that TSP and MTSP have the same solution, and indeed one may pass from one form of the problem to the other simply by replacing the weight function by its logarithm or its exponential, as appropriate. Furthermore, the solution of TSP is unaffected by shifting the values of the weight function by a constant amount, which implies in particular that there is no loss of generality in considering the MTSP only for weight functions taking values in the range [0, 1].
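The passage between the additive and the multiplicative problem can be made concrete as follows. This is a minimal sketch, assuming the weights are given as a symmetric list-of-lists matrix with at least three vertices; the names `to_multiplicative` and `best_tour` and the sample weights are hypothetical. The additive weights are first shifted so that they are non-positive and then exponentiated, which produces multiplicative weights in $(0, 1]$ without changing which tour is optimal.

```python
from itertools import permutations
from math import exp, prod


def to_multiplicative(w):
    """Shift the additive weights to be <= 0, then exponentiate into (0, 1]."""
    n = len(w)
    shift = max(w[i][j] for i in range(n) for j in range(n) if i != j)
    return [[0.0 if i == j else exp(w[i][j] - shift) for j in range(n)]
            for i in range(n)]


def best_tour(w, combine):
    """Brute-force the tour through vertex 0 minimising `combine` of its edge weights.

    Only one traversal direction per cycle is kept (second vertex < last vertex).
    """
    n = len(w)
    tours = ((0,) + p for p in permutations(range(1, n)) if p[0] < p[-1])
    return min(tours,
               key=lambda t: combine(w[t[i]][t[(i + 1) % n]] for i in range(n)))


# Hypothetical additive TSP instance on four vertices.
additive = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
multiplicative = to_multiplicative(additive)
tsp_tour = best_tour(additive, sum)          # minimise the sum of weights
mtsp_tour = best_tour(multiplicative, prod)  # minimise the product of weights
```

Since the product of the transformed weights along a tour equals the exponential of its shifted additive cost, both searches select the same tour.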
It is well known that the TSP, and hence also MTSP, is NP-complete. This means that it lies in the complexity class NP and is NP-hard, which is to say that any other problem in NP can be transformed into an instance of the TSP in polynomial time. Furthermore, by considering the corresponding decision problems it can be seen that TSP and hence MTSP remain NP-complete if the weight function is assumed to take distinct values on distinct pairs. Our first result is an application of Theorem 2.1 showing that the subspace ordering problem is NP-hard.
Proposition 3.1. The problem of finding an optimal ordering for collections of independent closed subspaces with pairwise distinct Friedrichs numbers is NP-hard.
Proof. It suffices to show that every instance of TSP with distinct costs can be transformed in polynomial time into a subspace ordering problem with pairwise distinct Friedrichs numbers. However, this follows straightforwardly from Theorem 2.1. Indeed, given an instance of TSP on $N \ge 2$ vertices we may transform it to an instance of MTSP with weight function taking values in the range $[0, 1]$ in $O(N^2)$ steps. Let $C = (c_{k,\ell})_{1 \le k, \ell \le N}$ be the symmetric matrix with zeros along its main diagonal and entries $c_{k,\ell} = w(k, \ell)$ for $1 \le k, \ell \le N$ with $k \ne \ell$. By Theorem 2.1 there exists a Hilbert space $X$ and independent closed subspaces $M_1, \dots, M_N$ of $X$ such that $C$ is the associated Friedrichs matrix. Moreover, it is clear from the proof of Theorem 2.1 that it is possible to obtain these subspaces in polynomial time. If we find a permutation $\sigma_* \in S_N$ such that $r_{\sigma_*} = r_*$, then since $r_\sigma = \Pi_\sigma$ for all $\sigma \in S_N$ the permutation $\sigma_*$ also solves our instance of MTSP, and hence the original instance of TSP. Since TSP is known to be NP-hard, our problem is too.
Remark 3.2. Note that the subspaces $M_1, \dots, M_N$ are not merely independent but satisfy the much stronger conditions described in Theorem 2.1. In particular, the result remains true if the subspaces which we are trying to order are merely pairwise quasi-disjoint in the sense of Section 1.
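The result shows that the existence of any polynomial-time algorithm which solves the subspace ordering problem in a sufficiently large number of cases implies that $\mathrm{P} = \mathrm{NP}$. In particular, we obtain the following consequence for the Greedy Algorithm.
Corollary 3.3. If the Greedy Algorithm is correct for all collections of independent closed subspaces with pairwise distinct Friedrichs numbers, then $\mathrm{P} = \mathrm{NP}$.
Proof. It is straightforward to see that if all the pairwise Friedrichs numbers are distinct then the Greedy Algorithm terminates after $O(N^3)$ steps, where $N \ge 2$ is the number of subspaces we are required to order optimally. The claim therefore follows from Proposition 3.1.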
Remark 3.4. The version of the Greedy Algorithm formulated in [9, Section 9] differs from ours in that it does not consider all possible greedy paths and hence runs in polynomial time even if the pairwise Friedrichs numbers are not assumed to be distinct. Note also that, as in the case of Proposition 3.1, the assumption of independence on the subspaces can be relaxed to pairwise quasi-disjointness.
Given that the question whether P = NP is a long-standing open problem, one may view Proposition 3.1 as evidence suggesting that the Greedy Algorithm does not in general lead to an optimal ordering of the subspaces in question. This is indeed the case, as the following example illustrates.
Remark 3.6. Example 3.5 disproves a claim made in [9, Section 9], namely that the Greedy Algorithm always leads to an optimal ordering in the case of independent subspaces. The examples considered in [9, Section 9] involve only N = 3 subspaces, a special case in which the Greedy Algorithm performs an exhaustive search of all possible orderings (up to the direction in which they are traversed) and in particular is correct. Thus Example 3.5 is minimal in terms of the number of subspaces involved.

Sharp estimates for the degree of suboptimality
Having shown in Section 3 that the Greedy Algorithm does not in general lead to an optimal ordering of the subspaces in the method of alternating projections, we seek now to quantify how much the result reached by the Greedy Algorithm can disagree with the optimal result. Given a collection of closed subspaces of a Hilbert space such that at least one of the pairwise Friedrichs numbers is zero, we see that for suitable orderings of the subspaces we obtain convergence after at most two steps in the method of alternating projections. Another essentially uninteresting case for asymptotic analysis is when all of the pairwise Friedrichs numbers equal 1, so that no ordering leads to a useful estimate in (1.3). If either of these two cases holds we shall say that the collection of subspaces involved is non-generic, and otherwise we call it generic.
The main result of this section is the estimate
$$r_* \le r_G \le r_*^{1/2}. \tag{4.1}$$
Moreover, the second inequality is strict unless the collection $M_1, \dots, M_N$ of subspaces is non-generic.
Proof. For $1 \le k \le N$ let $\sigma_k \in S_N$ be the permutation produced by running the Greedy Algorithm with the starting vertex $\sigma_k(1) = k$ and let $r_k = r_{\sigma_k}$. Then certainly $r_* \le r_k$ for $1 \le k \le N$, and hence also $r_* \le r_G$. Write $w(k, \ell) = c(M_k, M_\ell)$ for $1 \le k, \ell \le N$ with $k \ne \ell$. For $1 \le k, \ell \le N$ let $s_k(\ell) = \sigma_k(\sigma_k^{-1}(\ell) + 1)$ denote the index of the successor to $M_\ell$ in the ordering of the subspaces determined by $\sigma_k$, noting that $s_k(\ell) = k$ if $\sigma_k^{-1}(\ell) = N$. Let $\sigma \in S_N$ and $1 \le k, \ell \le N$. If $\sigma_k^{-1}(\sigma(\ell)) < \sigma_k^{-1}(\sigma(\ell+1))$, which is to say that in the ordering determined by $\sigma_k$ the subspace $M_{\sigma(\ell)}$ comes before $M_{\sigma(\ell+1)}$, then by definition of the Greedy Algorithm we must have
$$w\big(\sigma(\ell), s_k(\sigma(\ell))\big) \le w\big(\sigma(\ell), \sigma(\ell+1)\big),$$
and if instead $M_{\sigma(\ell+1)}$ comes first then the same argument gives $w(\sigma(\ell+1), s_k(\sigma(\ell+1))) \le w(\sigma(\ell), \sigma(\ell+1))$. Since $w$ takes values in $[0, 1]$ it follows that
$$w\big(\sigma(\ell), s_k(\sigma(\ell))\big)\, w\big(\sigma(\ell+1), s_k(\sigma(\ell+1))\big) \le w\big(\sigma(\ell), \sigma(\ell+1)\big) \tag{4.2}$$
for $1 \le k, \ell \le N$. Thus for $1 \le k \le N$ we have
$$r_k^2 = \prod_{\ell=1}^{N} w\big(\sigma(\ell), s_k(\sigma(\ell))\big)\, w\big(\sigma(\ell+1), s_k(\sigma(\ell+1))\big) \le \prod_{\ell=1}^{N} w\big(\sigma(\ell), \sigma(\ell+1)\big) = r_\sigma. \tag{4.3}$$
Since $\sigma \in S_N$ was arbitrary we deduce that $r_k^2 \le r_*$ for $1 \le k \le N$, and in particular $r_G^2 \le r_*$, as required. Now suppose that $r_G^2 = r_*$, and let $\sigma_* \in S_N$ be a permutation such that $r_{\sigma_*} = r_*$.
Since $r_G^2 \le r_k^2 \le r_*$ for $1 \le k \le N$, we see that in fact $r_k^2 = r_*$ for $1 \le k \le N$. Now either one of the pairwise Friedrichs numbers is zero or all of the pairwise Friedrichs numbers are non-zero. In the latter case it is clear from (4.3) that we must have equality in (4.2) for $1 \le k, \ell \le N$ when $\sigma = \sigma_*$. Taking $k = \sigma_*(\ell)$ in (4.2) for $1 \le \ell \le N$, it follows that
$$w\big(\sigma_*(\ell), \sigma_*(\ell+1)\big) = \min\big\{ w(\sigma_*(\ell), k) : 1 \le k \le N,\ k \ne \sigma_*(\ell) \big\}$$
for $1 \le \ell \le N$. It follows that $\sigma_*$ is itself a permutation considered by the Greedy Algorithm, and therefore $r_* = r_G$. Hence $r_*^2 = r_*$, and since $r_* \ne 0$ we have $r_* = 1$, which implies that $c(M_k, M_\ell) = 1$ for $1 \le k, \ell \le N$ with $k \ne \ell$. It follows that $r_G^2 < r_*$ unless the collection $M_1, \dots, M_N$ of subspaces is non-generic.
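The two-sided estimate $r_* \le r_G \le r_*^{1/2}$ can be sanity-checked numerically. The sketch below is an illustration only: it draws random symmetric weight matrices with entries in $[0.05, 1]$ (standing in for Friedrichs matrices), assumes that no ties occur so that a plain nearest-neighbour rule suffices, and compares the greedy value with a brute-force optimum. All function names, the seed and the matrix size are arbitrary choices made here.

```python
import random
from itertools import permutations
from math import prod, sqrt


def cycle_product(w, order):
    """Product of the weights along the closed cycle `order`."""
    n = len(order)
    return prod(w[order[i]][order[(i + 1) % n]] for i in range(n))


def nearest_neighbour_order(w, start):
    """Greedy ordering from `start`; assumes no ties among the weights."""
    order, unused = [start], set(range(len(w))) - {start}
    while unused:
        nxt = min(unused, key=lambda v: w[order[-1]][v])
        order.append(nxt)
        unused.remove(nxt)
    return order


def r_greedy(w):
    """Best greedy value over all starting indices."""
    return min(cycle_product(w, nearest_neighbour_order(w, k))
               for k in range(len(w)))


def r_optimal(w):
    """Exhaustive minimum over all cycles through index 0."""
    n = len(w)
    return min(cycle_product(w, (0,) + p) for p in permutations(range(1, n)))


def random_weight_matrix(n, rng):
    """Symmetric matrix with zero diagonal and off-diagonal entries in [0.05, 1]."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.uniform(0.05, 1.0)
    return w


rng = random.Random(0)
results = []
for _ in range(200):
    w = random_weight_matrix(6, rng)
    results.append((r_optimal(w), r_greedy(w)))
```

Each pair in `results` holds the optimal and the greedy value for one random matrix, so both inequalities can be checked entry by entry, up to floating-point rounding.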
It remains to be investigated to what extent the second bound in (4.1) is sharp for generic constellations of subspaces. Our final example shows that it cannot be improved in the sense that given any $\varepsilon \in (0, 1)$ there exists a generic constellation of subspaces of some Hilbert space such that $r_G > (1 - \varepsilon) r_*^{1/2}$. In fact, there exists a constellation of $N$ such subspaces for every even $N \ge 4$. [...]
$$c(M_k, M_\ell) = \begin{cases} c & \text{if } k = \ell \pm 1 \pmod{N}, \\ c\delta & \text{if } k = \ell \pm 2 \pmod{N} \text{ and } k \text{ is even}, \\ 1 & \text{otherwise.} \end{cases}$$
Let $\sigma_0 \in S_N$ denote the identity permutation. Then $r_{\sigma_0} = c^N$. If we think of the subspaces as the vertices of a complete graph of order $N$, and we let the edges have weights given by the pairwise Friedrichs numbers, then $r_\sigma \ge r_{\sigma_0}$ for all permutations $\sigma \in S_N$ involving no $c\delta$-edges. Moreover, any cycle $\sigma \in S_N$ which uses at least one of the $c\delta$-edges cannot use more than $n - 1$ of them, and must involve at least two 1-edges, so for any such cycle $r_\sigma \ge c^{n-1} (c\delta)^{n-1} = c^{N-2} \delta^{n-1}$.

Acknowledgements
For financial support O.D. thanks Magdalen College, Oxford, A.J. thanks the Mathematical Institute of the University of Oxford, S.R. and R.S. thank both St John's College, Oxford, and the Mathematical Institute, and L.S. thanks the EPSRC. All authors would further like to express their thanks to Alexis Chevalier, Stefan Kiefer, Dominik Peters and Zhixuan Wang for useful discussions.