Modified Iterations for Data-Sparse Solution of Linear Systems

A modification of standard linear iterative methods for the solution of linear equations is investigated aiming at improved data-sparsity with respect to a rank function. The convergence speed of the modified method is compared to the rank growth of its iterates for certain model cases. The considered general setup is common in the data-sparse treatment of high dimensional problems such as sparse approximation and low rank tensor calculus.


Introduction
In this work we investigate modifications of linear fixed-point iterations for computing approximate solutions of a linear equation Au = f (1.1) in a Banach space V. A standard linear iterative method for solving (1.1) takes the form u_{m+1} = M u_m + Nf (1.2) and corresponds to a linear fixed-point equation u = Mu + Nf (1.3) with M = I − NA. If N is a concrete function of A, then this function defines a class of iterative solvers for all (invertible) A. For instance, for the Jacobi method N is the inverse of the diagonal part of A. In other cases it may be a polynomial of A, and so on. However, in this work we will make direct assumptions on the properties of M and Nf, which may or may not be realizable for a given A. Hence we take the fixed-point problem (1.3) as the starting point of our considerations.
In a practical implementation, if the dimension of V is very large or (in principle) infinite, the use of data-sparse representations of its elements is required for storing the iterates and performing the matrix-vector products. An example that motivates our work is low-rank matrix and tensor formats which can be used for the numerical treatment of high-dimensional problems, and have found many applications in scientific computing [5,13,16,17,19]. In these formats the numerical complexity for storing vectors and performing basic linear algebra operations is typically captured by one or several rank functions, rank: V → N ∪ {+∞}. These rank functions are usually sub-additive, rank(u + v) ≤ rank(u) + rank(v), and satisfy rank(0) = 0. More generally, such a sub-additive rank function may arise from a generating set D ⊆ V, typically called a dictionary, by defining rank(u) for u ≠ 0 as the minimal number of elements from D needed to write u as a linear combination, or +∞ if u is not in the finite span. The goal is then to find possibly sparse, that is low-rank, representations of a sought solution in the dictionary. This very general concept of expansion in dictionaries occurs frequently in nonlinear approximation, and covers classical sparsity (then the dictionary consists of unit vectors) or more general best M-term approximation problems. For low-rank matrices or tensors the dictionary consists of all elementary tensors. Of course, when using data-sparse representations with respect to a dictionary, it is implicitly assumed that the true solution of the problem admits accurate 'low-rank' approximations, but verifying this analytically in advance can be difficult depending on the application. Also note that in many applications the choice of the dictionary is not only motivated by reducing the numerical complexity, but has some well defined and problem dependent purpose for revealing the structure, patterns, principal subspaces etc. of some measured data.
In this paper we investigate the rank accumulation in iterative methods like the linear fixed-point iteration (1.2) in relation to its convergence speed. When looking at (1.2) we see that a rank increase occurs due to two operations: the application of the operator M and the addition of Nf. To deal with the first we assume a multiplicative model, in which the rank of Mu_m (and typically also the cost of forming Mu_m) is proportional to the rank of u_m, that is, rank(Mu_m) ≤ μ_1 rank(u_m). Then rank(u_{m+1}) ≤ μ_1 rank(u_m) + rank(Nf). (1.4) For several steps one can either turn this into an exponentially growing bound, or draw upon refined estimates on how powers M^ℓ or polynomials p(M) increase the rank.
Here we can mention two examples. In sparse approximation in V = R^n, when the rank of a vector is defined as the number of its nonzero elements, a banded matrix M can be efficiently applied to a sparse vector, but will increase the number of nonzero entries by a multiple of the bandwidth. The bandwidth of M^ℓ, however, does not grow exponentially but only linearly in ℓ. In fact, for the same reason the bandwidth of any polynomial p_ℓ(M) of degree at most ℓ grows only linearly. Note that if in the initial linear equation (1.1) the matrix A has a banded structure and N = D is a diagonal preconditioner (e.g. in the Jacobi method), then M = I − DA has the same band structure.
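The linear band growth in the sparse-approximation example can be checked directly. The following sketch is our own illustration (the matrix entries are arbitrary and play no role for the support bound): for a tridiagonal M, i.e. bandwidth 1 + b with b = 2, the number of nonzeros of M^ℓ e_i grows at most linearly, nnz(M^ℓ e_i) ≤ 1 + ℓb.

```python
# Sparsity "rank" = number of nonzero entries.  A tridiagonal M
# (bandwidth 1 + b with b = 2) spreads the support of a vector by at
# most one index on each side per application, so nnz(M^l e_i) <= 1 + l*b.

def tridiag_apply(n, x):
    # y = M x for a tridiagonal M with all nonzero entries equal to 1
    # (the actual values are irrelevant for the support bound shown here)
    return [
        (x[i - 1] if i > 0 else 0.0) + x[i] + (x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]

def nnz(x):
    return sum(1 for v in x if v != 0.0)

n, b = 50, 2
x = [0.0] * n
x[n // 2] = 1.0                      # unit vector e_i, nnz = 1
for l in range(1, 11):
    x = tridiag_apply(n, x)
    assert nnz(x) <= 1 + l * b       # linear, not exponential, growth
```

With all-positive entries no cancellation occurs, so the bound is attained with equality until the support hits the boundary.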
As a second example assume that (1.3) is a fixed-point matrix equation in V = R^{n×n} and M is a Kronecker product operator, M = M̃_1 ⊗ M̃_2. It is well known that a matrix Sylvester equation, that is, (1.1) with A = Ã_1 ⊗ I + I ⊗ Ã_2, can be transformed into such a problem with M having spectral radius less than one, if both Ã_1 and Ã_2 have eigenvalues with negative real part [25]. The Kronecker product operator M can then be efficiently applied to a low-rank matrix (in the usual sense) and does not increase the rank at all. Therefore, applying a polynomial p_ℓ(M) of degree ℓ to a matrix increases the rank at most by a factor of ℓ + 1; see Section 3.1 for a more general example.
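The rank bookkeeping in the Kronecker case can be made concrete with a factored low-rank representation. This is a hypothetical sketch (the matrices and the factored-list format are our own illustration): a rank-r matrix X = Σ_j a_j b_j^T is stored as a list of pairs (a_j, b_j), and M = M_1 ⊗ M_2 acts as X ↦ M_1 X M_2^T, i.e. on the factors only.

```python
# Factored low-rank storage: X = sum_j a_j b_j^T kept as (a_j, b_j) pairs.
# Applying M = M1 (x) M2, i.e. X -> M1 X M2^T, maps each pair to
# (M1 a_j, M2 b_j); the number of stored terms (a rank bound) is unchanged.

def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def apply_kron(M1, M2, factors):
    # rank is not increased at all
    return [(mat_vec(M1, a), mat_vec(M2, b)) for a, b in factors]

def apply_poly(coeffs, M1, M2, factors):
    # p(M) X = sum_k c_k M1^k X (M2^k)^T: at most (deg + 1) * r terms
    out, cur = [], factors
    for k, c in enumerate(coeffs):       # coeffs = [c_0, c_1, ..., c_l]
        if k > 0:
            cur = apply_kron(M1, M2, cur)
        out.extend(([c * x for x in a], list(b)) for a, b in cur)
    return out

M1 = [[0.5, 0.1], [0.0, 0.4]]
M2 = [[0.3, 0.0], [0.2, 0.6]]
X = [([1.0, 2.0], [3.0, 4.0])]           # rank-one X = a b^T
assert len(apply_kron(M1, M2, X)) == 1   # rank not increased
l = 3
assert len(apply_poly([1.0] * (l + 1), M1, M2, X)) == (l + 1) * len(X)
```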
Our main attention in this work is on the second step in the iteration (1.2), which is the addition of Nf. In standard applications, Nf is a fully populated vector and the inequality (1.4) indicates that even one such step is infeasible if rank(Nf) is very large or infinite. The standard approach would be replacing Nf by an approximation Ñf of acceptable rank, and will be discussed in Section 3.2. As an alternative, we propose using a sequence g_m → Nf of approximations with (usually) growing rank. This leads to the modified fixed-point iteration û_{m+1} = M û_m + g_{m+1} (1.5) considered in this paper. It turns out that the û_m converge to a solution u of (1.3) at a similar speed as the standard iteration if the convergence g_m → Nf is fast enough.
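The standard iteration (1.2) and the modification (1.5) differ only in the added vector. A minimal runnable sketch (a small dense example with illustrative numbers of our own; here g_m simply keeps the first m entries of Nf) shows that both reach the same fixed point:

```python
# Minimal sketch of the modified iteration u^{m+1} = M u^m + g_{m+1}
# against the standard iteration u^{m+1} = M u^m + Nf.

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

def step(M, u, rhs):                 # one fixed-point step u -> M u + rhs
    return [x + y for x, y in zip(mat_vec(M, u), rhs)]

n = 4
M = [[0.3 if i == j else 0.05 for j in range(n)] for i in range(n)]  # ||M|| < 1
Nf = [1.0, 0.5, 0.25, 0.125]

def g(m):
    # approximations g_m -> Nf of growing "rank": keep the first m entries
    return [v if i < m else 0.0 for i, v in enumerate(Nf)]

u_std = [0.0] * n
u_mod = [0.0] * n
for m in range(1, 30):
    u_std = step(M, u_std, Nf)       # standard iteration (1.2)
    u_mod = step(M, u_mod, g(m))     # modified iteration (1.5)

err = max(abs(a - b) for a, b in zip(u_std, u_mod))
assert err < 1e-8                    # both converge to the same fixed point
```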
The proposed modification (1.5) of the fixed-point iteration is an interesting and easily realizable variation of the standard iteration. While several questions could be considered, we focus on its impact on the rank accumulation in certain model cases for M and Nf to show that it can be beneficial compared to the standard iteration. Specifically, to limit the scope we investigate only the cases in which the approximation g_m → Nf converges exponentially fast, and in which the rank amplification by the repeated application of the iteration matrix is either exponential or linear in the number of steps. The latter scenario is motivated by the examples mentioned above.
Besides its implications on the computational cost in iterative methods, our rank estimates also serve a theoretical purpose of characterizing low-rank approximability of structured linear equations, since they yield upper bounds on the corresponding approximation numbers for the solution u in terms of the rank parameter r. However, the asymptotic rates obtained in this way by using linear fixed-point iterations (or modifications of them) are not necessarily optimal and can be slow, and are hence mainly relevant if V is of very large or infinite dimension. For the two mentioned examples of sparse approximation of systems with banded matrices, and matrix or tensor equations with Sylvester-type operators, better rates than those implied by our analysis (covered by Examples 3.3 and 3.7) are available for Hilbert spaces based on different and more problem related approximation schemes; see, e.g., [4,12] and [10,14,15,20,22,26], respectively. Furthermore, by taking the fixed-point formulation (1.3) as the point of departure we avoid any discussion of the conditions under which it is actually possible to design for a given linear equation (1.1) an iterative solver (1.2) for which M is highly contractive and mildly rank increasing at the same time (which could be conflicting targets), while Nf admits fast converging and available low-rank approximations. The existence of such a linear iteration would directly imply that the solution can be well approximated in the dictionary. Clearly, if it is not available, it can be more efficient to use a few steps of a method with a general spectrally equivalent preconditioner N and then apply truncation. This difficult question is at the very core of understanding low-rank approximability and preconditioning of linear systems for a given rank function, and should not be treated in a general setup like in this paper. There is, however, an interesting converse logic to this.
If the solution u of a given linear equation (1.1) does not satisfy the approximability estimates that would be implied by certain assumptions on M and N , then it means that there cannot exist a linear iterative solver with the desired properties. While this may sound trivial, it can actually be seen as a remarkable non-existence result, say for matrix decompositions A = N −1 (I − M).
Our results extend the studies in [21], where mainly the Richardson iteration and the steepest descent method have been considered, to the broad class of linear iterations (1.2). Of course, a great number of other iterative methods for low-rank solutions of linear systems have been developed in the literature, among them those based on variational formulations, nonlinear optimization, or greedy methods. One common feature of such methods, that should also be applied in a practical implementation of the modified iterations considered in this paper, is the rank truncation of intermediate iterates. A truncation usually occurs either as an adaptive projection (hard thresholding) or as a prox-operator (soft thresholding), and its combination with fixed-point iterations can be studied in a quite general context; see, e.g. [3,7,8,11] for sparse vectors and [1,2,6,18,23,24] for low-rank tensors. In particular, if a certain low-rank approximability of the solution is already known or assumed, then suitable adaptive truncation schemes can lead to refined and near-optimal error estimates. This is, however, outside the scope of the present paper.
The paper is outlined as follows. In Section 2 the convergence rate of the modified iteration (1.5) is estimated for the model case that the g m approximate Nf exponentially fast. In Sections 3.1-3.4 rank estimates for approximate solutions obtained from the standard and modified iteration are derived from assumptions on the rank increasing properties of the iteration matrix M. Section 3.5 presents numerical comparisons of the obtained bounds.

Convergence of the Modified Iteration
Let u be a solution to the linear fixed-point equation (1.3). Given 0 < ε ≤ 1 we seek an approximation u_ε to u of relative accuracy ε, that is, ||u − u_ε|| ≤ ε ||u||. (2.1) Such a u_ε will be called an ε-solution of the fixed-point equation (1.3). Here the choice of the norm (and hence of the Banach space V) is usually problem dependent and can already account, e.g., for a large condition number or unboundedness of the operator A in the initial linear equation (1.1). In particular, we assume that M satisfies ζ := ||M|| < 1 (2.2) in the corresponding operator norm. It guarantees that u is the unique solution of (1.3) and that the standard iteration (1.2) converges to u for every starting point u_0 at a linear rate: ||u_m − u|| ≤ ζ^m ||u_0 − u||. Note that ζ is a property of the chosen iteration (1.2) and of the chosen norm. It will be convenient to use the starting value u_0 = 0 throughout the paper. It leads to the relative error ||u_m − u|| ≤ ζ^m ||u|| (2.3) after m steps of the iteration, and hence the number of steps needed for an ε-solution is upper bounded by m^{(1.2)}(ε) = ln ε / ln ζ. (2.4) Recall that it is always assumed that 0 < ε ≤ 1.
We now consider the modified iteration (1.5) in which in the (m + 1)-th step Nf is replaced by some (simpler) approximation g_{m+1}. A first general statement on this approach is the following.
In the following, we assume that the g_m approximate Nf exponentially fast and satisfy ||Nf − g_m|| ≤ C ξ^m ||Nf||, (2.7) where 0 < ξ ≤ ζ and C > 0 is a constant (that may depend on Nf). Then the error after m steps of the modified iteration (1.5) can be estimated; as for the standard iteration it will be convenient to consider here and in the following only the starting point û_0 = 0. The proof of the resulting estimate (Proposition 2.2) is an immediate induction from (2.5) and (2.7). Now note that due to u − Mu = Nf it holds that ||Nf|| ≤ (1 + ζ) ||u||. From Proposition 2.2 we thus obtain the following estimate on the relative error for the modified iteration with starting point û_0 = 0: ||û_m − u|| ≤ (ζ^m + C(1 + ζ) Σ_{k=1}^{m} ζ^{m−k} ξ^k) ||u||. (2.8) This should be compared with (2.3).
Example 2.3 Consider the model case ξ = ζ/2 and C = 1, for which (2.8) bounds the relative error after m steps by (2 + ζ)ζ^m. To obtain a relative error ε, the modified iteration (1.5) hence requires at most m^{(1.5)}(ε) = (ln ε − ln(2 + ζ))/ln ζ = ln ε/ln ζ + ln(2 + ζ)/|ln ζ| steps. The standard iteration (1.2) needs only m^{(1.2)}(ε) = ln ε/ln ζ steps. Note, however, that ln(2 + ζ) < ln 3 < 1.1. Therefore in the case of a fast standard iteration, when ln ζ is not close to zero, the number of additional steps in the modified iteration is very small. For instance with ζ = 1/2 the additional term ln(2 + ζ)/|ln ζ| equals 1.322, so the modified iteration needs at most two steps more than the standard iteration.
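The step-count comparison can be checked numerically (values as in the example above):

```python
import math

# The extra number of steps of the modified iteration in the example is
# ln(2 + zeta)/|ln zeta|; since ln(2 + zeta) < ln 3 < 1.1, it is small
# whenever ln zeta is not close to zero.

for zeta in (0.5, 0.25, 0.1):
    extra = math.log(2 + zeta) / abs(math.log(zeta))
    assert extra < 1.1 / abs(math.log(zeta))   # ln(2 + zeta) < 1.1

# the concrete value quoted for zeta = 1/2
assert abs(math.log(2.5) / math.log(2) - 1.322) < 1e-3
```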
Generalizing this example we can derive a bound for the required number of steps for reaching a certain accuracy by estimating the inverse function of the right-hand side in (2.8).

Proposition 2.4 Let u be a solution to (1.3) and assume (2.2) and (2.7) with ξ ≤ ζ. Then the modified iteration (1.5) with û_0 = 0 yields an ε-solution after at most m^{(1.5)}(ε) ≤ ln ε / ln ζ + K_1(ζ, ξ, C) steps, with K_1(ζ, ξ, C) = ln(1 + C(1 + ζ)ξ/(ζ − ξ)) / ln ζ^{−1}, (2.10) in case ξ < ζ, whereas in case ξ = ζ it holds that m^{(1.5)}(ε) ≤ ln ε / ln ζ + (ln ln ε^{−1}) / ln ζ^{−1} + K_2(ζ, C). (2.11)
The proof for ξ < ζ simply follows from (2.8) by omitting the term (ξ/ζ)^m and rearranging for m. The case ξ = ζ is treated as Lemma A.1 in the Appendix, where also the accuracy of the bound (2.11) is illustrated in Fig. 4. It shows that the estimate is reasonably good, but too pessimistic for ζ close to one. Both constants K_1(ζ, ξ, C) and K_2(ζ, C) are unbounded for ζ → 1.
Recall that m^{(1.2)}(ε) = ln ε / ln ζ is the iteration bound for an ε-solution with the standard iteration. If ζ/ξ is sufficiently large, then (2.10) shows that the number of additional steps required by the modified iteration for reaching the same accuracy is effectively constant, and indeed small if ζ itself is very small, see Example 2.3. In the case ξ = ζ we can roughly state that the number of additional steps grows only like ln ln ε^{−1}, but with a constant that behaves like 1/ln ζ^{−1} when ζ → 1. In practice, for a fixed ζ, say up to ζ ≤ 0.9, and reasonable ε, the actual number of additional steps asserted by this bound is still effectively constant, as can be seen in Fig. 4 in the Appendix. For example, for a fast iteration with ζ = 1/2 and C = 1, (2.11) provides the bound for the case ξ = ζ. For small ε this is considerably worse than (2.9), where ξ = ζ/2, but in turn this bound is actually valid for all possible ξ ≤ ζ = 1/2. We conclude this section by mentioning a further possible modification of the standard linear iteration, in which instead of a fixed iteration matrix M a sequence M_m → M is used. This leads to iterations of the form ū_{m+1} = M_{m+1} ū_m + g_{m+1}. The matrix M_m could be implicitly given by a fixed linear iterative solver applied to a family of approximations A_m → A of the linear system itself, or by a sequence N_m → N of preconditioners. Assuming (2.2), it is not difficult to prove that if g_m → Nf and M_m → M, then ū_m → u, the fixed point of (1.3); the argument is similar to the proof of Proposition 2.1. Based on suitable assumptions on the convergence speed M_m → M one can then study error estimates. In this work, however, we restrict our attention to the simpler variation (1.5) with M being fixed.

Rank Growth in the Standard and Modified Iteration
The modified iteration (1.5) will usually need some more steps than the standard iteration (1.2) to reach a target accuracy ε for the relative error, which is indicated by the error estimates stated in the previous section. In turn the rank of the iterates may grow a little less per step, since we are adding g_{m+1} instead of Nf. In this section we compare the achievable accuracy with the accumulated representation ranks of the approximate solutions generated by the standard iteration and its modification in simplified model cases. While the results are perhaps too generic (and thereby too pessimistic) to use when studying a particular linear equation, our aim is to show that the modified iteration can provide some improvement. We mention again that rank truncation during the iteration is not considered in our analysis, but recommended in computations. The ranks required for a certain accuracy in practice can hence be much smaller than the rank bounds obtained below.

Rank Growth in the Standard Iteration
Due to the representation u_m = Σ_{ℓ=0}^{m−1} M^ℓ Nf (3.1) the ranks of the iterates of the standard iteration (with u_0 = 0) can be estimated in terms of the following, in general unknown, constants: ν_m := sup_{u≠0} rank(Σ_{ℓ=0}^{m−1} M^ℓ u) / rank(u).
Using these constants we have the following obvious estimate from (3.1): rank(u_m) ≤ ν_m rank(Nf). (3.2) In general all the ν_m could be infinite or equal to the dimension of V. A basic assumption in our paper is that at least for small m the ν_m are small compared to the dimension of V and do not grow too fast. But even then, since in the definition of ν_m we have taken a supremum, the estimate (3.2) is quite generic and our results will not account for any additional structure that could be exploited when applying powers of M to Nf in a particular instance. Another issue is that the estimate (3.2) is only reasonable if rank(Nf) is finite. Let us assume this; then together with the convergence speed (2.4) one obtains rank bounds for an ε-solution of the fixed-point equation (1.3), depending on the behaviour of the constants ν_m.
When the constants ν_m are not known, it is possible to estimate them by the constants μ_ℓ := sup_{u≠0} rank(M^ℓ u) / rank(u), which can be easier to determine. In most cases we may rightfully assume μ_ℓ ≤ μ_1^ℓ. Then clearly ν_m ≤ Σ_{ℓ=0}^{m−1} μ_ℓ, and therefore ν_m ≤ Σ_{ℓ=0}^{m−1} μ_1^ℓ = (μ_1^m − 1)/(μ_1 − 1). (3.3) We call the upper bound in this estimate the worst-case behaviour, since in the typical case μ_1 > 1 it indicates exponential rank growth in the standard iteration. It leads to rather pessimistic results. With (2.4) it implies that for ε > 0, there exists an ε-solution u_ε for the linear equation (1.1) satisfying (2.1) and with a rank bounded by rank(u_ε) = O(ε^{ln μ_1 / ln ζ}) for ε → 0. If Nf has finite rank we can deduce an algebraic decay rate for the best low-rank approximation error of u with respect to the rank, namely τ_r(u) = O(r^{ln ζ / ln μ_1}) for r → ∞, where τ_r are the approximation numbers defined in (1.6).
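The two worst-case rates can be sketched as follows (a derivation sketch under the stated assumptions: worst case (3.3), finite rank(Nf), μ_1 > 1; constants are suppressed):

```latex
\begin{align*}
  \operatorname{rank}(u_m) &\le \nu_m \operatorname{rank}(Nf)
      \le \frac{\mu_1^{m}-1}{\mu_1-1}\,\operatorname{rank}(Nf),
  &
  \|u_m - u\| &\le \zeta^{m}\,\|u\| .
\end{align*}
Choosing $m = m^{(1.2)}(\varepsilon) = \ln\varepsilon/\ln\zeta$ gives
\[
  \operatorname{rank}(u_\varepsilon)
    = O\bigl(\mu_1^{\,\ln\varepsilon/\ln\zeta}\bigr)
    = O\bigl(\varepsilon^{\,\ln\mu_1/\ln\zeta}\bigr),
\]
and inverting $r \sim \mu_1^{m}$, i.e.\ $m \sim \ln r/\ln\mu_1$, yields
\[
  \tau_r(u) \le \zeta^{m}\,\|u\| = O\bigl(r^{\,\ln\zeta/\ln\mu_1}\bigr),
  \qquad r \to \infty .
\]
```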
There exist interesting examples for which the ν_m do not increase exponentially. As mentioned in the introduction, for sparse approximation in R^n (rank being the number of nonzero elements) a banded matrix M with bandwidth 1 + b will increase the number of nonzero elements of a vector by at most a factor μ_1 ≤ 1 + b. However, it holds μ_ℓ ≤ 1 + ℓb, since M^ℓ has bandwidth 1 + ℓb. Indeed, the band support for different powers of M is nested, so that also ν_m ≤ 1 + (m − 1)b. As another example, assume M is of the form M = M_1 + M_2, where both M_1 and M_2 do not increase the rank when applied to any u, that is, μ_1(M_1) ≤ 1 and μ_1(M_2) ≤ 1. Assume furthermore that M_1 and M_2 commute. Then M^ℓ = Σ_{k=0}^{ℓ} (ℓ choose k) M_1^k M_2^{ℓ−k} shows that in such a case we have μ_ℓ ≤ ℓ + 1.
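The binomial identity behind μ_ℓ ≤ ℓ + 1 can be verified numerically. In the sketch below (our own small example) the commuting parts are M_1 X = BX and M_2 X = XC on R^{2×2}, which each preserve rank; M^ℓ X is then a sum of ℓ + 1 rank-one terms when rank(X) = 1.

```python
from math import comb

# On V = R^{n x n} let M X = B X + X C.  The parts X -> B X and X -> X C
# commute and preserve rank, so M^l X = sum_k C(l,k) B^k X C^(l-k) is a
# sum of l + 1 rank-one terms for rank-one X, i.e. mu_l <= l + 1.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def mpow(A, k):
    P = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(k):
        P = matmul(P, A)
    return P

B = [[0.2, 0.1], [0.0, 0.3]]
C = [[0.4, 0.0], [0.1, 0.2]]
X = [[1.0, 2.0], [2.0, 4.0]]       # rank one: (1, 2)^T (1, 2)

l = 3
Y = X                              # direct repeated application of M
for _ in range(l):
    Y = matadd(matmul(B, Y), matmul(Y, C))

Z = [[0.0, 0.0], [0.0, 0.0]]       # binomial expansion: l + 1 terms
for k in range(l + 1):
    T = matmul(mpow(B, k), matmul(X, mpow(C, l - k)))
    Z = matadd(Z, scale(comb(l, k), T))

assert all(abs(Y[i][j] - Z[i][j]) < 1e-12 for i in range(2) for j in range(2))
```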

This implies ν_m ≤ Σ_{ℓ=0}^{m−1} (ℓ + 1) = m(m + 1)/2.
However, in special cases one can go further. Assume additionally that p(M_2) is rank-preserving for any polynomial p. Then Σ_{ℓ=0}^{m−1} M^ℓ = Σ_{ℓ=0}^{m−1} Σ_{k=0}^{ℓ} (ℓ choose k) M_1^k M_2^{ℓ−k} = Σ_{k=0}^{m−1} M_1^k p_k(M_2) for certain polynomials p_k, so that ν_m ≤ m. Both examples can be generalized to operators of such form on tensor spaces, but in the case of the Sylvester-like structure, ν_m becomes a polynomial of higher order.
It implies a super-algebraic decay rate for the best rank-r approximation error of the solution u to a fixed-point equation (1.3). More generally, for a polynomial growth ν_m ≤ p(m), where p is a polynomial of degree q, one obtains rank(u_ε) = O((ln ε / ln ζ)^q) and τ_r(u) = O(ζ^{r^{1/q}}).

Standard Iteration with Fixed Approximation of Nf
Now we discuss the case that Nf has very large or infinite rank. In practice, when an approximation Ñf of finite rank satisfying

||Nf − Ñf|| ≤ δ ||Nf|| (3.5)

is available, we can simply use Ñf in the standard iteration. This is equivalent to solving the perturbed fixed-point equation

v = Mv + Ñf, (3.7)

which in case that N is invertible corresponds to a linear equation Av = N^{−1}Ñf. Denoting by v_m the iterates of the perturbed iteration (with v_0 = 0), the error to the original solution u splits as

||u − v_m|| ≤ ||u − v|| + ||v − v_m||. (3.8)

For simplicity, let us choose target accuracies of the form

||u − v|| ≤ (ε/(2 + ε)) ||u||, ||v − v_m|| ≤ (ε/2) ||v|| (3.9)

for (3.5) and (3.7). Then (3.8) becomes

||u − v_m|| ≤ ε ||u||, (3.10)

where v_m satisfies

rank(v_m) ≤ ν_m rank(Ñf) (3.11)

by Proposition 3.1. The second inequality in (3.9) is satisfied after at most m = (ln ε − ln 2)/ln ζ iterations. One can now proceed as above by assuming different cases for ν_m and rank(Ñf).

Example 3.4
For later comparison with the modified iteration, we assume (2.7) holds with C = 1 and rank(g_m) ≤ m_0 · m. Then we choose Ñf = g_{m̄} such that (3.5) is satisfied for δ = ((1 − ζ)/(1 + ζ)) · ε/(2 + ε), as required in (3.9). For this, we need to truncate after m̄ = ln δ / ln ξ terms of (2.7), so that rank(Ñf) ≤ m_0 m̄ = m_0 ln δ / ln ξ. Assuming the worst case (3.3) of exponential rank growth, we conclude from (3.10) and (3.11) that there exists an ε-solution u_ε for the initial fixed-point equation (1.3) that satisfies rank(u_ε) ≲ μ_1^{(ln ε − ln 2)/ln ζ} · m_0 ln δ / ln ξ, (3.12) where the constant only depends on μ_1.
Correspondingly, in case of linear rank growth ν_m ≤ m, we obtain rank(u_ε) ≲ ((ln ε − ln 2)/ln ζ) · m_0 ln δ / ln ξ. (3.13) The bounds (3.12) and (3.13) are depicted in Figs. 1 and 2 further below for some values of ζ and ξ.

Rank Growth in the Modified Iteration
In the modified iteration (1.5) we can deal with the case that Nf has large rank by replacing it with a sequence g_m with growing ranks. In non-recursive form, the modified iteration with û_0 = 0 reads û_m = M^{m−1} g_1 + M^{m−2} g_2 + · · · + M^0 g_m. This can also be written as û_m = Σ_{k=1}^{m} (Σ_{ℓ=0}^{m−k} M^ℓ)(g_k − g_{k−1}). Instead of Proposition 3.1 we hence have the following rank estimates: rank(û_m) ≤ Σ_{k=1}^{m} μ_{m−k} rank(g_k) (3.14) and rank(û_m) ≤ Σ_{k=1}^{m} ν_{m−k+1} rank(g_k − g_{k−1}), (3.15) where g_0 = 0.
For the standard iteration, knowing the behaviour of the constants ν_m or μ_ℓ is sufficient for deriving approximation results in terms of ζ. In the case of the modified iteration we also need to know how fast the ranks of the g_m grow in relation to how fast the error Nf − g_m tends to zero. We keep the assumption (2.7), but restrict to C = 1, that is, ||Nf − g_m|| ≤ ξ^m ||Nf|| (3.16) for some ξ ≤ ζ, and consider the simplest case that the ranks of the g_m grow linearly, that is, rank(g_m) ≤ m_0 · m, (3.17) where m_0 is a fixed constant. In combination with (2.7) this assumption is equivalent to Nf belonging to a certain approximation class defined by the decay condition τ_{m_0·m}(Nf) ≤ ξ^m ||Nf||, where τ_r again are the approximation numbers (1.6). Note however that in a practical method the g_m must be available. When the rank function is defined by a dictionary D, a most reasonable model for (3.17) is that Nf admits an initial expansion Nf = Σ_{i=1}^∞ c_i h_i with h_i ∈ D (3.18) and then approximating it by batches of m_0 terms, taking g_m = Σ_{i=1}^{m_0·m} c_i h_i, (3.19) so that rank(g_m − g_{m−1}) ≤ m_0 (3.20) for all m. A related approach that arises in practice is that a dictionary expansion of f = Σ_i f_i is given. Then assuming that the operator N does not increase rank by more than a factor μ_N, one could take g_m = N(f_1 + · · · + f_m) so that rank(g_m − g_{m−1}) ≤ μ_N for all m.
Clearly in (3.19) we can trade a larger batch size m_0 for a faster approximation rate. In general, if (3.16) and (3.17) hold for some sequence g_m one can define for an integer t > 1 the sequence g'_m = g_{t·m}, (3.21) which will satisfy (3.16) and (3.17), too, but with different constants ξ' = ξ^t and m'_0 = t m_0. In particular, in the case that ξ = ζ we can always pass to ξ' = ζ^2 < ζ, which enables the more accurate estimate (2.10) for the required number of steps in Proposition 2.4. The difference will be illustrated in some numerical comparisons further below.

With (3.17) the rank estimate (3.14) simplifies to rank(û_m) ≤ m_0 Σ_{k=1}^{m} k μ_{m−k}. (3.24) For rigorous bounds on the rank of an ε-solution we can insert the estimates for m^{(1.5)}(ε) provided in (2.10) and (2.11) in the right-hand side of (3.24). We omit the resulting formulas. In the worst case (3.3) of exponential rank growth the asymptotic behaviour is rank(u_ε) = O(ε^{ln μ_1 / ln ζ}) (with a constant deteriorating for ζ → 1), but as explained above, in the considered model (3.16) and (3.17) we can always assume ξ = ζ^2 < ζ. Figure 1 further below contains the precise bounds for some combinations of ζ and ξ. In the case of linear rank growth ν_m ≤ m, the estimate (3.15) combined with (3.20) yields the quadratic bound rank(û_m) ≤ m_0 Σ_{k=1}^{m} (m − k + 1) = m_0 m(m + 1)/2, (3.25) and hence rank(u_ε) = O((ln ε / ln ζ)^2) for an ε-solution of (1.3), with different constants for the cases ξ < ζ and ξ = ζ. The constants deteriorate with ξ → ζ and ζ → 1, respectively. Some concrete values are plotted in Fig. 2 below. We omit more detailed formulas and just note the implied approximation rate τ_r(u) = O(ζ^{√r}) for r → ∞, with constants depending on ζ, ξ and m_0.
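The interplay of batch size and rate in the model (3.16)-(3.21) can be illustrated for an orthonormal dictionary with geometrically decaying coefficients (a hypothetical example of our own; in this special case the tail bound holds essentially with equality):

```python
import math

# Batch model: orthonormal dictionary (unit vectors in R^n) and
# geometrically decaying coefficients c_i = xi**i, so that the tail norm
# ||Nf - g_m|| for g_m = first m0*m terms satisfies the rate (3.16),
# and the reparametrized sequence g'_m = g_{t m} the squared rate xi^t.

n, xi = 40, 0.4
c = [xi ** i for i in range(1, n + 1)]        # coefficients of Nf
norm = math.sqrt(sum(v * v for v in c))       # ||Nf||

def tail(m, m0=1):
    # ||Nf - g_m|| with g_m = first m0*m dictionary terms, rank(g_m) <= m0*m
    return math.sqrt(sum(v * v for v in c[m0 * m:]))

t = 2
for m in range(1, 10):
    assert tail(m) <= xi ** m * norm + 1e-15             # rate (3.16), m0 = 1
    assert tail(m, m0=t) <= (xi ** t) ** m * norm + 1e-15  # batch t, rate xi^t
```

Trading m0 = 1 for m0 = t buys the faster rate ξ^t at the price of rank-t instead of rank-one updates, exactly as in (3.21).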

Modified Iteration with Target Accuracy for Nf
Since one seeks only an ε-solution to the fixed-point equation (1.3) it may not be necessary to approximate Nf with higher and higher rank. Similar to the discussion for the standard iteration in Section 3.2, one can terminate at some g_{m̄} = Ñf satisfying ||Nf − g_{m̄}|| ≤ δ ||Nf|| as in (3.5) and then proceed with g_m = g_{m̄} for all subsequent iterations. We can analyse such an approach as a modified iteration v̂_{m+1} = M v̂_m + g_{min(m+1, m̄)} (3.26) for the perturbed fixed-point equation v = Mv + g_{m̄}, provided that ||g_{m̄} − g_m|| ≤ C̃ ξ^m ||g_{m̄}||, (3.27) where C̃ is some not too large constant. For example, we may assume the h_i in a dictionary expansion (3.18) of Nf to be pairwise orthogonal, as would be the case in sparse approximation in R^n or in a singular value decomposition of a matrix. Then we can take C̃ = (1 − ξ^{2m̄})^{−1/2}, which yields (3.27) for m ≤ m̄ (for larger m the left side of (3.27) is zero anyway). Let m̂ be the number of steps required for (3.26) to reach an (ε/2)-solution for v, which by (3.8) will be an ε-solution for the original u. Assuming m̂ = m̄ + k̂ ≥ m̄, then according to (3.15) the final rank estimate will be rank(v̂_m̂) ≤ Σ_{ℓ=k̂+1}^{m̂} ν_ℓ rank(g_{m̂−ℓ+1} − g_{m̂−ℓ}).
Example 3.8 For exponential rank growth (3.3), and assuming the model (3.20), this means rank(v̂_m̂) ≤ m_0 Σ_{ℓ=k̂+1}^{m̂} (μ_1^ℓ − 1)/(μ_1 − 1). (3.28) In the case of linear rank growth ν_m ≤ m one gets rank(v̂_m̂) ≤ m_0 Σ_{ℓ=k̂+1}^{m̂} ℓ = m_0 (m̂(m̂ + 1) − k̂(k̂ + 1))/2. (3.29) We omit the formulas for rank estimates in terms of the target accuracy ε. Numerical values are provided in Figs. 1 and 2. They indicate that using the modified iteration until some g_{m̄} = Ñf as derived above can outperform the standard iteration with fixed Ñf.

Numerical Illustration of Error Bounds
In the numerical illustrations in Figs. 1 and 2 we compare the derived rank estimates for achieving an ε-solution with the modified iteration in the two scenarios of an exponential rank growth ν_m ≤ 2^m − 1, i.e. μ_1 = 2 in (3.3), and of a linear rank growth ν_m ≤ m. Both scenarios are evaluated for the values ζ = 0.5 and ζ = 0.9 (spectral norm of M). We consider the approximation rate (2.7) for Nf with C = 1 in the two cases ξ = ζ/2 and ξ = ζ in (3.16), where we assume m_0 = 1 in (3.20), that is, g_m is obtained from g_{m−1} by a rank-one update. To check the potential merit of a larger batch size in the case ξ = ζ, we also consider m_0 = 2 (rank-two updates) with squared rate ξ = ζ^2 (i.e. t = 2 in (3.21)). According to the plots, the larger batch size can be slightly beneficial for the case of exponential rank growth, but does not help in the case of linear rank growth. The following functions are shown in Figs. 1 and 2:
- The rank bounds (3.24) (in Fig. 1) and (3.25) (in Fig. 2) for the modified iteration as solid lines, when using as m the minimal number of steps m^{(1.5)}(ε) such that the right-hand side of (2.8) is less than ε. The values for m^{(1.5)}(ε) are determined numerically and are depicted in Fig. 4 in the Appendix (as solid lines). One could use instead the derived upper bounds (2.10)-(2.11) for m^{(1.5)}(ε) (these can be seen in Fig. 4 as dotted lines). One then obtains slightly worse rank bounds, especially in the case ξ = ζ.
- The rank bounds for a modified iteration (3.26) as dotted lines, using some truncation Ñf = g_{m̄} as a final approximation, according to (3.28) (Fig. 1) and (3.29) (Fig. 2). We used C̃ = (1 − ξ^{2m̄})^{−1/2} in (3.27), as motivated above.
- The rank bound for the standard iteration when using the same truncation Ñf = g_{m̄}, according to (3.12) (Fig. 1) and (3.13) (Fig. 2), respectively. These are only given for batch size m_0 = 1 and are depicted as cross markers for ξ = ζ/2 and plus markers for ξ = ζ.

[Fig. 1: Solid lines: rank bound (3.24) for the modified iteration (cf. Fig. 4). Dotted lines: rank bound for a modified iteration (3.26) with target accuracy Ñf = g_{m̄} according to (3.28). Cross and plus markers: standard iteration using the same Ñf according to (3.12) for ξ = ζ/2 (cross) and ξ = ζ (plus).]
[Fig. 2: Same comparison for linear rank growth (cf. Fig. 4). Dotted lines according to (3.29); cross and plus markers: standard iteration according to (3.13) for ξ = ζ/2 (cross) and ξ = ζ (plus).]

As can be seen from both figures, modified iterations can perform equally well or better than the standard iteration with truncated right-hand side. Especially for the case of linear rank growth, the modified iterations with a target accuracy for Nf (dotted lines) seem to provide a reasonable improvement for ξ = ζ, in particular keeping in mind that they are more data-sparse. It appears that with exponential rank growth the modified iteration should not be terminated at a fixed g_{m̄} = Ñf, and that it helps to take a larger batch size to ensure fewer steps. However, the rank bounds, especially for ζ = 0.9 (Fig. 1, right), are ridiculously large and only of theoretical interest. This illustrates that if for a given linear equation (1.1) there does not exist an iteration that is either fast or not exponentially rank increasing, its solution might not admit a good low-rank approximation.

Numerical Experiment
Finally, we include a small numerical experiment to compare the actual convergence of the methods for a particular problem. We generate a 1000 × 1000 tridiagonal matrix A = L + D + R, with diagonal entries in D uniformly distributed in the interval [2, 3], whereas the lower and upper off-diagonal entries in L and R are uniformly distributed in [−2, −1] and [−1, 0], respectively. The goal is to solve the linear equation Au = f, where f has exponentially decaying entries f_i = (4/5)^i. Since the exact solution can be well approximated by sparse vectors, we aim at iterations that build approximations with possibly few nonzero entries. Hence the rank function here is the number of nonzero entries of a vector.
We employ two iterative solvers (1.2), the Jacobi method, where N = D^{−1}, and an approximate Gauss-Seidel method, with N = D^{−1}(I − LD^{−1} + (LD^{−1})^2) (a truncated Neumann series), which is an approximation of (L + D)^{−1} = D^{−1}(I + LD^{−1})^{−1}. Correspondingly, M = I − NA is a tridiagonal banded matrix for the Jacobi method (with zero diagonal), and a five-banded matrix (with only one upper diagonal) for the approximate Gauss-Seidel method. We use the standard iterations (1.2) and modified iterations (1.5). As approximations g_m → Nf we take g_m = N f_m, where f_m contains the largest m entries of f in modulus. Compared to taking the m largest entries of Nf this has the advantage that the g_m can be recursively computed from the sparse columns of N without forming Nf. (Almost no difference was observed for the two approaches.) All four resulting methods are also tested in a truncated version, where after every step entries smaller than a fixed threshold are deleted from the iterate. The results for various instances of A varied slightly but were overall quite similar. In Fig. 3 we show one of the better outcomes. The left plot shows the decrease of the relative error (in Euclidean norm) to the exact solution u, with the dashed lines corresponding to the standard iteration and solid lines to the modified iteration. The numerically computed spectral radii and spectral norms in this instance were ρ(M) ≈ 0.934 and ζ = ||M|| ≈ 1.145 for the Jacobi method, and ρ(M) ≈ 0.885 and ζ = ||M|| ≈ 0.981 for the approximate Gauss-Seidel method. The threshold in the truncated versions was 10^{−9} to reach a relative accuracy below 10^{−8}.
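A reduced sketch of this experiment can be written as follows. It is not the paper's setup verbatim: we use a smaller, deterministic, diagonally dominant tridiagonal matrix (constant diagonals chosen by us so that the Jacobi iteration provably converges; the random instance above is harder), the Jacobi preconditioner only, and no truncation.

```python
# Reduced sketch: tridiagonal A = L + D + R with constant diagonals,
# right-hand side f_i = (4/5)^i, Jacobi preconditioner N = D^{-1}, and
# modified iteration with g_m = N f_m, where f_m keeps the m largest
# entries of f (= the first m, since f is decreasing).

n = 200
lo, d, up = -1.0, 2.5, -0.5          # sub-, main and superdiagonal values

def apply_A(x):
    return [(lo * x[i - 1] if i > 0 else 0.0) + d * x[i]
            + (up * x[i + 1] if i < n - 1 else 0.0) for i in range(n)]

f = [(4.0 / 5.0) ** (i + 1) for i in range(n)]

def jacobi_step(u, rhs):
    # u <- u + D^{-1}(rhs - A u), i.e. u <- M u + N rhs with M = I - D^{-1}A
    r = apply_A(u)
    return [u[i] + (rhs[i] - r[i]) / d for i in range(n)]

u_std = [0.0] * n                    # standard iteration: full rhs f
u_mod = [0.0] * n                    # modified iteration: g_m = N f_m
for m in range(1, 120):
    u_std = jacobi_step(u_std, f)
    f_m = [f[i] if i < m else 0.0 for i in range(n)]
    u_mod = jacobi_step(u_mod, f_m)
    # the modified iterate stays sparse: support within the first m entries
    assert sum(1 for v in u_mod if v != 0.0) <= m

res = max(abs(a - b) for a, b in zip(apply_A(u_std), f))
assert res < 1e-8                    # standard iteration converged
gap = max(abs(a - b) for a, b in zip(u_std, u_mod))
assert gap < 1e-6                    # modified iterates reach the same solution
```

The sparsity assertion inside the loop mirrors the point of the right panel of Fig. 3: without any truncation the modified iterates carry at most m nonzeros after m steps, whereas the standard iterates fill up immediately.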
In the right panel of Fig. 3 we investigate the convergence speed with respect to the number of used nonzero entries. For the standard iterations only the truncated versions are shown (the vertical dashed lines), since without truncation the iterates immediately fill up. While all truncated methods eventually need about the same number of nonzero entries for a relative error in the magnitude of the threshold, the modified iterations need fewer nonzeros during the overall process. One should also recall that the standard methods operate with a full vector Nf throughout. Note that the Jacobi method without truncation (blue line), while being slower, is capable of constructing relatively sparse solutions, whereas the approximate Gauss-Seidel method (red line) clearly requires the truncation. For comparison, the right panel also displays the best possible (relative) sparse approximation errors (i.e. the decay of τ_r(u)) as a black dotted line, which are obtained from using the largest entries (in modulus) of the true solution u. The truncated approximate Gauss-Seidel method gets closest to this minimal error before reaching the number of non-zeros required for the accuracy specified by the threshold. Note that for both the truncated Jacobi and approximate Gauss-Seidel method the final error is essentially optimal with respect to the sparsity.

[Fragment of the proof of Lemma A.1 from the Appendix:] … 2b + 1 is therefore stronger than (A.3) and hence also sufficient for (A.2). Using the definition of x and a we rewrite this as … Now noting that b ln ζ^{−1} = ln ε / ln ζ + … + ln ln a^{−1} / ln ζ + 1 / ln ζ and setting K_2(ζ, C) = ln ln a^{−1} / ln ζ proves the assertion.