Stability of Low-Rank Tensor Representations and Structured Multilevel Preconditioning for Elliptic PDEs

Folding grid value vectors of size $2^L$ into $L$th-order tensors of mode size $2\times\cdots\times2$, combined with low-rank representation in the tensor train format, has been shown to result in highly efficient approximations for various classes of functions. These include solutions of elliptic PDEs on nonsmooth domains or with oscillatory data. This tensor-structured approach is attractive because it leads to highly compressed, adaptive approximations based on simple discretizations. Standard choices of the underlying bases, such as piecewise multilinear finite elements on uniform tensor product grids, entail the well-known matrix ill-conditioning of discrete operators. We demonstrate that, for low-rank representations, the use of tensor structure itself additionally introduces representation ill-conditioning, a new effect specific to computations in tensor networks. We analyze the tensor structure of a BPX preconditioner for a second-order linear elliptic operator and construct an explicit tensor-structured representation of the preconditioner, with ranks independent of the number $L$ of discretization levels. The straightforward application of the preconditioner yields discrete operators whose matrix conditioning is uniform with respect to the discretization parameter, but in decompositions that suffer from representation ill-conditioning. By additionally eliminating certain redundancies in the representations of the preconditioned discrete operators, we obtain reduced-rank decompositions that are free of both matrix and representation ill-conditioning. For an iterative solver based on soft thresholding of low-rank tensors, we obtain convergence and complexity estimates and demonstrate its reliability and efficiency for discretizations with up to $2^{50}$ nodes in each dimension.


Introduction
The direct textbook treatment of elliptic PDEs by low-order discretizations on uniform grids becomes unaffordable for many important problem classes. The high computational costs are due to the prohibitively large number of degrees of freedom required to resolve specific features of solutions, such as singularities and high-frequency oscillations, that arise in problems with nonsmooth or oscillatory data. More efficient discretizations can be obtained with basis functions that are adapted to the given problem and require fewer degrees of freedom. However, the construction and analysis of such methods (for instance, of hp-adaptive solvers) generally depends on specific features of the considered problem classes and accordingly specialized analytical tools.
By the approach considered in this work, efficiency is achieved in a different way: extremely large arrays of coefficients parametrizing simple, uniformly refined low-order discretizations are themselves parametrized as nonlinear functions of relatively few effective degrees of freedom. The latter parametrization is based on representing the coefficient arrays, reshaped into high-order tensors, in the tensor train decomposition with low ranks. This representation exploits low-rank structure with respect to a hierarchy of dyadic scales, providing, at each scale, a problem-adapted basis that can be computed using standard techniques of numerical linear algebra. In other words, for the identification of suitable degrees of freedom, this approach avoids relying on problem-specific a priori information; instead, suitable degrees of freedom are found by the low-rank tensor compression of generic, conceptually straightforward discretizations.
In numerical solvers for PDE problems that operate on such highly compressed, nonlinear representations of basis coefficients, new difficulties arise compared to a standard entry-wise representation. As we demonstrate in this contribution, specific types of ill-conditioning in such tensor representations can dramatically affect the numerical stability of solvers. We show how a special low-rank representation of a BPX preconditioner allows us to overcome these difficulties and to obtain estimates for the total computational complexity of computing solutions with low-rank tensor train structure.
1.1. Low-rank tensor approximations. The development of low-rank tensor representations [18,24,45,47,50], such as the tensor train format, was originally motivated by applications to high-dimensional PDEs. As observed in [19,37,43,44], the artificial treatment of coefficient vectors in lower-dimensional problems as high-dimensional quantities, known in the literature as quantized tensor train (QTT) decomposition or tensorization, leads to highly efficient approximations in many problems of interest. See [38] for a general overview and, for instance, [29,36] for further applications.
To briefly illustrate this concept, let us suppose that a function $u$ has an accurate approximation $u \approx \sum_{j=1}^{N} u_j \varphi_j$ in terms of the basis functions $\{\varphi_j\}_{j=1,\ldots,N}$ with the coefficient vector $\mathbf{u} = (u_j)_{j=1,\ldots,N} \in \mathbb{R}^N$. Assuming $N = n_1 \cdots n_L$, the basic idea is to re-interpret $\mathbf{u}$ as a higher-order tensor of mode sizes $n_1, \ldots, n_L$ via the unique decomposition $j = \sum_{\ell=1}^{L} i_\ell \prod_{k=1}^{\ell-1} n_k$ with $i_\ell \in \{0, \ldots, n_\ell - 1\}$ for all $\ell = 1, \ldots, L$.
We assume a simple choice of basis functions, such as low-order splines, combined with a compressed, nonlinearly parametrized approximation of the corresponding coefficients $\mathbf{u}$ in the tensor train format,
(1.1) $u_{i_1,\ldots,i_L} \approx \mathbf{U}_1(i_1)\,\mathbf{U}_2(i_2)\cdots\mathbf{U}_L(i_L)$, where $\mathbf{U}_\ell(i_\ell) \in \mathbb{R}^{r_{\ell-1}\times r_\ell}$.
The actual degrees of freedom are now the entries of the third-order tensors $\mathbf{U}_\ell \in \mathbb{R}^{r_{\ell-1}\times n_\ell\times r_\ell}$ with $\ell \in \{1,\ldots,L\}$, which are referred to as cores (where $r_0 = r_L = 1$ for notational convenience). In the case of $n_\ell = n \in \mathbb{N}$ for all $\ell$, which we consider in this work, the total number of parameters defining this approximation equals $\sum_{\ell=1}^{L} n\, r_{\ell-1} r_\ell \lesssim (\log N)\max\{r_1^2,\ldots,r_{L-1}^2\}$. For certain representative approximation problems (such as functions with isolated singularities or high-frequency oscillations), as shown in [19,31,34,37], one obtains approximations where the rank parameters $r_1,\ldots,r_{L-1}$ grow at most polylogarithmically with respect to the corresponding approximation error. This suggests the possibility of constructing numerical methods with complexity scaling as $(\log N)^\alpha$ for a fixed $\alpha$.
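As an illustration of this tensorization, the following sketch (our own; the sampled function and the truncation tolerance are arbitrary choices) folds a vector of $2^L$ samples into an $L$th-order tensor of mode sizes $2\times\cdots\times2$ and compresses it by successive truncated SVDs of the unfoldings, reporting the resulting ranks and parameter count.

```python
# Sketch (ours): QTT-style compression of 2^L samples by successive SVDs.
import numpy as np

L, tol = 16, 1e-10
t = np.linspace(0.0, 1.0, 2**L, endpoint=False)
u = np.exp(-t) * np.sin(32 * np.pi * t)          # smooth oscillatory samples

ranks, params, r = [], 0, 1
M = u.reshape(1, -1)
for l in range(L - 1):
    M = M.reshape(2 * r, -1)                     # split off the l-th binary index
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    rnew = max(1, int((s > tol * s[0]).sum()))   # truncated rank of this unfolding
    params += 2 * r * rnew                       # core U_l has r_{l-1} x 2 x r_l entries
    M, r = s[:rnew, None] * Vt[:rnew], rnew
    ranks.append(rnew)
params += 2 * r                                  # last core
print("QTT ranks:", ranks)
print("parameters:", params, "vs", 2**L, "entries")
```

For this function the ranks remain small, so the parameter count scales like $\log N$ rather than $N$, in line with the complexity discussion above.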
1.2. Multilevel low-rank approximations for elliptic boundary value problems. In this work, we focus on the application of low-rank tensor techniques for solving second-order elliptic boundary value problems on domains $\Omega\subset\mathbb{R}^D$, where we are mainly interested in the cases $D\in\{1,2,3\}$. First, consider the exact solution $u$ and finite element solutions $u_h$, where $h>0$ is a mesh-size parameter, that are simple, low-order finite element functions with coefficient vectors $\mathbf{u}_h$. These are given by suitable linear systems of the form $\mathbf{A}_h\mathbf{u}_h = \mathbf{f}_h$. For each mesh size $h$, one can seek instead $u^{\mathrm{LR}}_h$ from the same finite element space whose coefficient vector $\mathbf{u}^{\mathrm{LR}}_h$ is a low-rank approximation in the form (1.1) of $\mathbf{u}_h$. In order to benefit from the complexity reduction afforded by the representation (1.1), the vector $\mathbf{u}^{\mathrm{LR}}_h$ needs to be computed directly in this low-rank representation. Using corresponding representations of $\mathbf{A}_h$ and $\mathbf{f}_h$, this can be achieved by iteratively solving the nonlinear problem in terms of the cores $\mathbf{U}_1,\ldots,\mathbf{U}_L$ of $\mathbf{u}^{\mathrm{LR}}_h$ in (1.1). In our setting, the binary indexing $(i_1,\ldots,i_L)$ used in the interpretation of $\mathbf{u}_h$ as a tensor of order $L$ corresponds to uniform grid refinement with $L$ levels, and thus $h\sim 2^{-L}$. The separation of variables expressed by (1.1) therefore applies not to the spatial dimensions but rather to the dyadic scales of $u^{\mathrm{LR}}_h$. In our model problem, the underlying discretization uses piecewise $D$-linear finite elements. Using the triangle inequality, we can decompose the error $u - u^{\mathrm{LR}}_h$ into a discretization error $u - u_h$, for which on uniform meshes one obtains bounds of the form $\|u-u_h\|_{H^1}\le C_u h^s$ with $C_u>0$ depending only on $u$ and $0<s\le 1$, and the computation error $u_h - u^{\mathrm{LR}}_h$ including the error of low-rank approximation. In problems where $u$ exhibits, for instance, singularities or high-frequency oscillations, one may be dealing with extremely large $C_u$ or with $s\ll 1$. Thus, achieving reasonable total errors may require values of $h$ so small that the entry-wise representation of coefficient vectors and matrices is computationally infeasible.
Under natural assumptions on the data and on the underlying mesh, the problem of finding $\mathbf{u}_h$ remains well-conditioned with respect to the problem data independently of $h$. However, for very small $h$ as considered here, it becomes a nontrivial issue to ensure numerical stability of algorithms, since these are affected by the condition numbers $O(h^{-2})$ of $\mathbf{A}_h$. Regardless of the type of solver that is employed, preconditioning $\mathbf{A}_h$ becomes a necessity for avoiding numerical instabilities even for moderately small $h$. As a first step, we therefore construct a preconditioner for $\mathbf{A}_h$ that can be applied directly in low-rank form, where both the resulting matrix condition numbers after preconditioning and the tensor representation ranks are uniformly bounded with respect to the discretization level $L$.
However, we also find that when such a preconditioner is applied as usual by standard matrix-vector multiplication in the tensor format, numerical solvers still stagnate at an error $\|u_h - u^{\mathrm{LR}}_h\|_{H^1}$ of order $O(\epsilon h^{-2})$, where $\epsilon$ is the machine precision. This shows that ensuring uniformly bounded matrix condition numbers by preconditioning is not sufficient for low-rank tensor methods to remain numerically stable for very small $h$. It turns out that tensor representations of vectors in the form (1.1) generated by the action of $\mathbf{A}_h$ can be extremely sensitive to perturbations of each single core. This new type of ill-conditioning cannot be eradicated by simply multiplying by the preconditioner, and any further numerical manipulations of the resulting tensor representations are prone to large round-off errors. To quantify this effect, we introduce the notion of representation condition numbers.
Without addressing the issue of representation ill-conditioning, one can therefore only expect $\|u - u^{\mathrm{LR}}_h\|_{H^1} = O(C_u h^s + \epsilon h^{-2})$. With the optimal choice of $h$, this yields a total error of order $O\big(C_u^{2/(2+s)}\epsilon^{s/(2+s)}\big)$; even in the ideal case $s=1$, one thus has a limitation to $O(\epsilon^{1/3})$. In the present paper, by analytically combining the low-rank representations of the preconditioner and of the stiffness matrix, we obtain a tensor representation that retains favorable representation condition numbers also for large $L$ and leads to solvers that remain numerically stable even for $h$ on the order of the machine precision $\epsilon$. For the problems preconditioned in this manner, we can apply results from [4,6] to obtain bounds for the number of operations required for computing $\mathbf{u}^{\mathrm{LR}}_h$, in terms of the ranks of low-rank best approximations of $\mathbf{u}_h$ with the same error. Since the costs depend only weakly on the discretization level $L$, one may then in fact simply choose $L$ so large that $h\approx\epsilon$. This ensures that the discretization error $\|u-u_h\|_{H^1}$ is negligible in all practical situations and only the explicitly controllable low-rank approximation error $\|u_h - u^{\mathrm{LR}}_h\|_{H^1}$ remains.

1.3. Conditioning of tensor train representations. Let us now briefly outline the source of numerical instability that we need to mitigate here. Subspace-based tensor decompositions such as the Tucker format, hierarchical tensors [24], or the presently considered tensor train format [45] share the basic stability property that the existence of low-rank best approximations with fixed rank parameters is guaranteed. In contrast, such best approximation problems for canonical tensors are in general ill-posed [14], and one has the well-known border rank phenomena where given tensors can be approximated arbitrarily well by tensors of lower canonical ranks. In subspace-based formats, such pathologies of the canonical rank are avoided by working only with matrix ranks of certain tensor matricizations. This leads to natural higher-order generalizations of the singular value decomposition (SVD), in particular the TT-SVD algorithm for tensor trains.
However, when performing computations in such tensor formats, tensors in general do not remain in orthogonalized standard representations, such as those given by the TT-SVD. For instance, the action of low-rank representations of finite element stiffness matrices in iterative solvers may create tensor train representations with substantial redundancies that are far from their respective SVD forms. A return to the rank-reduced SVD form can then in principle be accomplished by applying standard linear algebra operations (such as QR decomposition and SVD) to the representation components.
As we demonstrate in what follows, in relevant cases, tensor train representations can become so ill-conditioned that performing this rank reduction in machine precision no longer produces useful results. To our knowledge, this particular point has not received attention in the literature so far. As we consider in further detail in Section 4, a particular instance where this effect occurs is given by multilevel low-rank representations of discretization matrices of differential operators.
In order to illustrate these issues, let us consider a low-rank matrix $M = AB^T$ with $A\in\mathbb{R}^{m\times r}$ and $B\in\mathbb{R}^{n\times r}$. Performing numerical manipulations of $A$, for instance a QR factorization in machine precision, amounts to replacing $M$ by $\tilde M = \tilde A B^T$ with $\|A - \tilde A\|_F \le \delta \|A\|_F$, where $\delta$ will ideally be close to the relative machine precision. Similarly to standard perturbation estimates for matrix products (see, e.g., [27, Sec. 3]), one obtains the generally sharp worst-case bound $\|\tilde M - M\|_F \le \delta\,\|A\|_F\,\|B\|_{2\to2}$. In the case of high-order tensor train representations, one may think of $B$ as composed of many individual cores. Even when each of these cores looks completely innocent, their cumulative effect can lead to a very large $\|B\|_{2\to2}$. In cases where cancellations occur in the product with $A$, the size of $\|M\|_F$, however, can be small compared to $\|A\|_F\,\|B\|_{2\to2}$, and perturbations of $A$ are strongly amplified. This means that any numerical manipulation of such representations (such as orthogonalization, which is also the first step in performing a TT-SVD, see Section 3.6) can introduce extremely large errors in the represented tensor.
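The following minimal numerical sketch (our construction; the sizes and the cancellation parameter kappa are arbitrary) illustrates this amplification: the factors $A$ and $B$ are large while $M = AB^T$ is small due to cancellation, so a relative perturbation of $A$ at the level of the machine precision produces a much larger relative error in $M$.

```python
# Sketch (ours): cancellation in M = A B^T amplifies perturbations of A.
import numpy as np

rng = np.random.default_rng(0)
m = n = 500
kappa = 1e8                                       # controls the degree of cancellation
a, b = rng.standard_normal(m), rng.standard_normal(n)
e = 1e-6 * rng.standard_normal(n)
A = np.column_stack([a, kappa * a])               # ||A||_F ~ kappa
B = np.column_stack([b, (-b + e) / kappa])        # built so that A B^T = a e^T
M = A @ B.T                                       # small despite large factors

delta = 1e-16                                     # relative perturbation of A
dA = delta * np.linalg.norm(A) * rng.standard_normal(A.shape) / np.sqrt(A.size)
Mt = (A + dA) @ B.T
print("relative error in M:", np.linalg.norm(Mt - M) / np.linalg.norm(M))
print("worst-case bound   :",
      delta * np.linalg.norm(A) * np.linalg.norm(B, 2) / np.linalg.norm(M))
```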
We define the representation condition number of an operator in low-rank representation as the factor by which its action may deteriorate the conditioning of tensor train representations. In the case of the finite element stiffness matrices $\mathbf{A}_h$, we find that this condition number scales (matching the standard matrix condition number) as $O(h^{-2})$, which agrees with the numerically observed loss of precision. One may regard this as a tensor-decomposition analogue of the classical amplification of relative errors by ill-conditioned matrices. However, this error amplification manifests itself not in the action of the tensor representation of $\mathbf{A}_h$ on any single tensor core, which by itself is harmless, but rather in the cumulative effect that emerges when further operations are performed on the resulting output cores.

1.4. Novelty and relation to previous work. As a main contribution of this work, we introduce basic notions and auxiliary results for studying the representation conditioning of tensor train representations. In particular, our finding that the stiffness matrix represented in low-rank format has a representation condition number of order $2^{2L}$ explains numerical instabilities in its direct application for large $L$ as observed in tests in [11]. We prove a new result on a BPX preconditioner for second-order elliptic problems that is tailored to our purposes, and we construct a low-rank decomposition of the preconditioned stiffness matrix with the following properties: it is well-conditioned as a matrix, uniformly in the discretization level $L$; its ranks are independent of $L$; and its representation condition numbers remain moderate for large $L$. Based on these properties, we establish an estimate for the total computational complexity of finding approximate solutions in low-rank form. These complexity bounds are shown for an iterative solver based on the soft thresholding of tensors [6], for which the ranks of approximate solutions can be estimated in terms of the ranks of the exact Galerkin solution. We identify appropriate approximability assumptions on solutions in the present context, which are slightly different from those proved in [34].
Difficulties with the numerical stability of solvers for large $L$ have also been noted previously in [34]. In [11,46], a reformulation as a constrained minimization problem with Volterra integral operators is proposed. It is demonstrated numerically in [11], for up to $L\approx 20$, to lead to improved numerical stability, compared to a direct finite difference discretization, for Poisson-type problems in $D=2$ dimensions. However, in this reformulation, which so far has been studied only experimentally, the matrix condition number still grows exponentially with respect to $L$, and numerical stability is still observed to be lacking for larger values of $L$.
A different class of preconditioners based on approximate matrix exponentials has been proposed for QTT decompositions in [39]. In the different context of separation of spatial coordinates in high-dimensional problems, tensor representations have been combined with multilevel preconditioners based on multigrid methods [8,23], BPX preconditioners [1], and wavelet Riesz bases [5]. There the required representation ranks of preconditioners have been observed to increase with discretization levels, in contrast to the uniformly bounded ranks that we obtain in our present setting of tensor separation between scales.

1.5. Outline. In Section 2, we consider the structure of discretization matrices in detail and establish a result on symmetric BPX preconditioning. In Section 3, we recapitulate basic notation and operations for the tensor train format. In Section 4, we introduce notions of representation condition numbers of tensor decompositions and investigate some of their basic properties. Building on these concepts, in Section 5 we construct well-conditioned multilevel low-rank representations of preconditioned discretization matrices. In Section 6, we discuss the implications of our findings for the complexity of finding approximate solutions, and we illustrate the performance of numerical solvers in Section 7.
We use the following general notational conventions: $A\lesssim B$ denotes $A\le CB$ with $C$ independent of any parameters explicitly appearing in the expressions $A$ and $B$, and $A\sim B$ denotes that both $A\lesssim B$ and $A\gtrsim B$ hold. We use $\|\cdot\|_2$ to denote the $\ell^2$-norm both of vectors and of higher-order tensors, and $\|\cdot\|_{2\to2}$ to denote the associated operator norm. In addition, $\|\cdot\|_F$ denotes the Frobenius norm of matrices. By $\langle\cdot,\cdot\rangle$, we denote the $\ell^2$-inner product of vectors and tensors or the $L^2$-inner product of functions, as well as the corresponding duality product.

Discretization and Preconditioning
The model problem that we focus on in what follows is posed on the product domain $\Omega = (0,1)^D$. With $\Gamma\subseteq\partial\Omega$ denoting the Dirichlet boundary, we consider the corresponding Sobolev space of functions defined on $\Omega$ and vanishing on $\Gamma$,
(2.1) $V = \{v\in H^1(\Omega) : v|_\Gamma = 0\}$.
On this space, we consider the variational problem
(2.2) find $u\in V$ such that $a(u,v) = f(v)$ for all $v\in V$,
where $a: V\times V\to\mathbb{R}$ is the bilinear form given by $a(u,v) = \int_\Omega \big(\nabla u^T A\,\nabla v + c\,uv\big)\,dx$ and $f\in V'$ is a given linear form. We assume the diffusion and reaction coefficients $A\in L^\infty(\Omega,\mathbb{R}^{D\times D})$ and $c\in L^\infty(\Omega)$ to be strongly elliptic and nonnegative, respectively: $\xi^T A\xi \gtrsim \xi^T\xi$ for all $\xi\in\mathbb{R}^D$ and $c\ge 0$ a.e. on $\Omega$. Under the assumptions on the data made so far, the bilinear form $a$ is continuous and coercive and the linear form $f$ is continuous. By the Lax-Milgram theorem, (2.2) has a unique solution satisfying $\|u\|_V \lesssim \|f\|_{V'}$. Additional assumptions on the data of the problem (2.2), essential for its tensor-structured preconditioning and solution, are stated in Sections 2 and 5.
In what follows, we consider a hierarchy of discretizations based on piecewise $D$-linear nodal basis functions on a sequence of uniform grids with cell sizes $2^{-\ell}\times\cdots\times2^{-\ell}$, $\ell = 0, 1, 2, \ldots$; the basis functions can be written as tensor products of standard univariate hat functions.
In this section, we describe $V_\ell$ with $\ell\in\mathbb{N}_0$, nested finite-dimensional subspaces of the space $V$ introduced in (2.1). We will use these subspaces to approximate the solution of the variational problem stated in (2.2).
2.1. Finite element spaces for $\Omega = (0,1)$. Throughout this section, we assume that an arbitrary number $\ell\in\mathbb{N}_0$ of refinement levels is fixed. We consider a uniform partition of $\Omega$ into $2^\ell$ subintervals and corresponding $2^\ell$ continuous piecewise linear functions defined on $\Omega$. Then, by tensorization, we introduce basis functions defined on $\Omega$.
The $\ell$-dependent normalization factor on the right-hand side of (2.8) results in the uniform normalization (2.9). By the above construction of basis functions, each $\hat\varphi_{\ell,j}$ with $j\in\hat J_\ell$ is a degree-one polynomial on every $\Omega_{\ell,i}$ with $i\in\hat J_\ell$. This implies that, for $\alpha = 0,1$, there exist matrices $\hat M_{\ell,\alpha}$, with rows and columns indexed by $\hat J_\ell\times\{\alpha,1\}$ and $\hat J_\ell$ respectively, such that (2.10a) holds for all $i,j\in\hat J_\ell$, where $\psi_0$ and $\psi_1$ are the standard monomials of degree zero and one,
(2.10b) $\psi_0(t) = 1$ and $\psi_1(t) = t$ for all $t\in(-1,1)$.
We note that the matrix $\hat M_{\ell,0}$ is rectangular of size $2^{\ell+1}\times 2^\ell$ and the matrix $\hat M_{\ell,1}$ is a square matrix of order $2^\ell$.
For the functions $\psi_0,\psi_1$ defined in (2.10b), since $\psi_1' = \psi_0$, the odd rows of $\hat M_{\ell,0}$ form a multiple of $\hat M_{\ell,1}$: for $\beta = 1$ and all $i,j\in\hat J_\ell$, we have (2.10c). Furthermore, the matrices $\hat M_{\ell,0}$ and $\hat M_{\ell,1}$ have the explicit form (2.10d), which will be used below, where $\hat I_\ell$ and $\hat S_\ell$, given by (2.10e), are square matrices of order $2^\ell$. The finite element spaces $\operatorname{span}\{\hat\varphi_{\ell,j}\}_{j\in\hat J_\ell}$ with $\ell\in\mathbb{N}_0$ are nested: for all $L,\ell\in\mathbb{N}_0$ such that $\ell\le L$, we have
(2.11) $\hat\varphi_{\ell,j} = \sum_{j'\in\hat J_L} (\hat P_{\ell,L})_{j'j}\,\hat\varphi_{L,j'}$ for all $j\in\hat J_\ell$,
where $\hat P_{\ell,L}$, given explicitly in (2.12), is the matrix of the identity operator from $\operatorname{span}\{\hat\varphi_{\ell,j}\}_{j\in\hat J_\ell}$ to $\operatorname{span}\{\hat\varphi_{L,j'}\}_{j'\in\hat J_L}$ with respect to the bases defined in (2.8).

2.2. Finite element spaces for $\Omega = (0,1)^D$. The partition (2.5) induces a uniform tensor product partition of $\Omega$ that consists of the $2^{\ell D}$ elements given in (2.14). Tensorizing (2.8), we obtain the $2^{\ell D}$ functions
(2.15) $\varphi_{\ell,j} = \hat\varphi_{\ell,j_1}\otimes\cdots\otimes\hat\varphi_{\ell,j_D}$ with $j = (j_1,\ldots,j_D)\in J_\ell$,
which are continuous on $\Omega$ and $D$-linear on each of the partition elements given by (2.14). We will use these functions as a basis of a finite-dimensional subspace of $V$,
(2.16) $V_\ell = \operatorname{span}\{\varphi_{\ell,j}\}_{j\in J_\ell}$.
The normalization of the univariate factors in (2.9) implies the uniform normalization (2.17), and hence a norm equivalence with constants independent of $\ell\in\mathbb{N}$. Also, the relationship (2.10a) results in the corresponding identity (2.18a) for all $i = (i_1,\ldots,i_D)\in J_\ell$, $j = (j_1,\ldots,j_D)\in J_\ell$ and $\beta = (\beta_1,\ldots,\beta_D)\in\{0,1\}^D$. Note that, for each $\alpha\in\{0,1\}^D$, the rows and columns of $M_{L,\alpha}$ are indexed by the product index set of (2.18c) and by $J_L$, respectively. The embedding (2.10c) carries over to the tensorized matrices accordingly. The finite element spaces $V_\ell$ with $\ell\in\mathbb{N}_0$ are also nested: for all $L,\ell\in\mathbb{N}_0$ such that $\ell\le L$, we have $V_\ell\subset V_L$. In particular, the basis functions of $V_\ell$ and $V_L$ introduced in (2.15) satisfy the refinement relation
(2.19) $\varphi_{\ell,j} = \sum_{j'\in J_L} (P_{\ell,L})_{j'j}\,\varphi_{L,j'}$ for all $j\in J_\ell$,
where
(2.20) $P_{\ell,L} = \hat P_{\ell,L}\otimes\cdots\otimes\hat P_{\ell,L}$ ($D$ factors),
with $\hat P_{\ell,L}$ given by (2.12). The stiffness matrix for the bilinear form $a$ and discretization level $\ell$ is given by
(2.21) $A_\ell = \big(a(\varphi_{\ell,i},\varphi_{\ell,j})\big)_{j,i\in J_\ell}$.
Note that, due to (2.17), the scaling of $A_\ell$ is uniform with respect to $\ell$. For the right-hand side, we set $f_\ell = \big(f(\varphi_{\ell,i})\big)_{i\in J_\ell}$.

2.3. Representation of differential operators. The bilinear form $a: V\times V\to\mathbb{R}$ given by (2.3) can be rewritten in the form (2.22a), with coefficient functions of the form (2.22b), in terms of the affine transformations $\phi_{L,i}$ with $i\in J_L$ defined by (2.7) and (2.18b), and finite index sets $\Gamma_{\alpha\alpha'}$. In this section, we analyze the dimension structure of the matrix $A_L$ of $a$ restricted to $V_L\times V_L$ with respect to the basis of $\varphi_{L,j}$ with $j\in J_L$, whose entries are induced by the tensor product dimension structure of the basis. Splitting integration over the elements $\Omega_{L,i}$ with $i\in J_L$, given by (2.14), and applying (2.18a), we obtain (2.23b). Let us now, for all $(\alpha,\alpha')\in\mathcal{D}$, introduce a matrix $\Lambda_{L,\alpha,\alpha'}$ as in (2.24a). Using these matrices, we can rewrite (2.23b) as (2.24b).

Example 2.1. In the case of the negative Laplacian, we deal with a bilinear form given by (2.22a) with $\mathcal{D} = \big\{\big((\delta_{k1},\ldots,\delta_{kD}),(\delta_{k1},\ldots,\delta_{kD})\big)\big\}_{k=1}^{D}$ and $c_{\alpha\alpha'} = 1$ for all $(\alpha,\alpha')\in\mathcal{D}$. For each $(\alpha,\alpha)$, the corresponding coefficient is of the form (2.22b) with $\Gamma_{\alpha\alpha} = \{0\}$, $\chi_{\alpha\alpha,0} = 1$ and $(c_{L,\alpha\alpha})_{i0} = 1$ for all $i\in J_L$. The corresponding matrix $\Lambda_{L,\alpha,\alpha}$ given by (2.24a) takes the Kronecker product form (2.25a), where the factors $\hat\Lambda_{L,0,0}$ and $\hat\Lambda_{L,1,1}$ are diagonal matrices independent of $(\alpha,\alpha')\in\mathcal{D}$ whose rows and columns are indexed by $\hat J_L\times\{0,1\}$ and $\hat J_L\times\{1\}$, respectively; their nonzero entries are given in (2.25b).

The multilevel tensor structure of the factorization (2.24b) and, in particular, of $\Lambda_{L,\alpha,\alpha'}$ with $(\alpha,\alpha')\in\mathcal{D}$ is investigated in Section 5. This analysis applies to the case of general nonconstant coefficients $c_{\alpha\alpha'}$ with $(\alpha,\alpha')\in\mathcal{D}$ under the assumption that each of them exhibits multilevel low-rank structure in the sense of Section 3. Specifically, in Section 5, we analyze the low-rank structure of every factor matrix $M_{L,\alpha}$ with $\alpha\in\{0,1\}^D$ and also show how the low-rank structure of $c_{\alpha\alpha'}$ with $(\alpha,\alpha')\in\mathcal{D}$ translates into that of $\Lambda_{L,\alpha,\alpha'}$. First, however, in the remainder of Section 2 we turn to the multilevel preconditioning of $A_L$. This gives rise to the preconditioned operator $B_L$ and matrices $Q_{L,\alpha}$ with $\alpha\in\{0,1\}^D$, defined in (2.31c) below, which relate to $B_L$ as $M_{L,\alpha}$ with $\alpha\in\{0,1\}^D$ relate to $A_L$. The low-rank multilevel structure of $B_L$ and $Q_{L,\alpha}$ with $\alpha\in\{0,1\}^D$ is the main topic of Section 5.
Remark 2.2. In the one-dimensional case ($D = 1$), let us consider a diffusion operator with a coefficient $c$ that is piecewise constant: $c\circ\hat\phi_{L,i} = (\hat c_L)_i$ on $(-1,1)$ for all $i\in\hat J_L$, cf. (2.22b). Such coefficients appear, for example, as approximations in the midpoint quadrature rule. Then the representation (2.23b) takes the form (2.26), where $\hat M_{L,1}$, defined by (2.10a) and given explicitly by (2.10d), is proportional to $2^L(\hat I_L - \hat S_L)$. The representation (2.26) has been used for this one-dimensional case in [15,16,31]; the representation (2.24b) provides a generalization to higher dimensions and general coefficients.
2.4. Multilevel preconditioning. Among the various existing methods for preconditioning discretization matrices of second-order elliptic problems, we are especially interested in approaches that provide optimal preconditioning and at the same time lead to favorable multilevel low-rank structures. A choice that meets these criteria is based on the classical BPX preconditioner [10]. For our particular purposes, in what follows we also obtain a new result on symmetric preconditioning by this method.
The BPX preconditioner requires a hierarchy of nested finite element spaces, which in the present case are the uniformly refined spaces defined in (2.16). The standard implementable form of the preconditioner (cf. [10,53]) is then given by (2.27). Interpreting $C_{2,L}$ as a mapping of coefficient sequences $(\langle v,\varphi_{L,j}\rangle)_{j\in J_L}$ to nodal values of finite element functions, one obtains the corresponding matrix representation in (2.27), where $P_{\ell,L}$ is as in (2.19), (2.20). The following result on the BPX preconditioner (2.27) was established in [12,48]; see also [9,54].
Theorem 2.3. Let $A_L$ and $C_{2,L}$ be as in (2.21) and (2.27). Then there exist $c, C > 0$ independent of $L$ such that
$c\,\langle C_{2,L}^{-1} v, v\rangle \le \langle A_L v, v\rangle \le C\,\langle C_{2,L}^{-1} v, v\rangle$ for all $v$.
This preconditioner is therefore optimal, that is, the condition numbers of the preconditioned systems remain bounded uniformly in the discretization level. It is usually applied in the form of left-sided preconditioning: Theorem 2.3 implies in particular that $\operatorname{cond}(C_{2,L} A_L)$ is uniformly bounded with respect to $L$ and that there exists $\omega > 0$ such that the Richardson iteration $u^{k+1} = u^k - \omega\, C_{2,L}(A_L u^k - f_L)$ converges at an $L$-independent rate. Also standard implementations of the preconditioned conjugate gradient method use only the action of $C_{2,L}$.
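For orientation, the following sketch (ours) illustrates the effect of Theorem 2.3 in the simplest setting: the one-dimensional Dirichlet Laplacian discretized by piecewise linear elements, with the textbook BPX scaling $h_\ell^{2-D} = 2^{-\ell}$ for $D = 1$ and standard interpolation matrices. This is not the specific normalization used in (2.27) and (2.28), but it exhibits the same uniform boundedness of the preconditioned condition numbers.

```python
# Sketch (ours): textbook 1D BPX preconditioning of the P1 Dirichlet Laplacian.
# cond(A_L) grows like 2^{2L}, while cond(C_{2,L} A_L) stays uniformly bounded.
import numpy as np

def prolongation(l):                      # linear interpolation, level l -> l+1
    nc, nf = 2**l - 1, 2**(l + 1) - 1
    P = np.zeros((nf, nc))
    for j in range(nc):
        P[2*j, j], P[2*j + 1, j], P[2*j + 2, j] = 0.5, 1.0, 0.5
    return P

for L in range(2, 9):
    n = 2**L - 1
    A = 2**L * (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))  # stiffness, h = 2^-L
    C = 2.0**(-L) * np.eye(n)             # level-L term of the BPX sum
    P = np.eye(n)
    for l in range(L - 1, 0, -1):         # accumulate P_{l,L} and the lower-level terms
        P = P @ prolongation(l)
        C += 2.0**(-l) * (P @ P.T)
    evA = np.linalg.eigvalsh(A)
    evCA = np.sort(np.linalg.eigvals(C @ A).real)
    print(L, "cond(A) = %.1e" % (evA[-1] / evA[0]),
          "cond(CA) = %.2f" % (evCA[-1] / evCA[0]))
```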
For our purposes, for several reasons explained in further detail in what follows, we require symmetric preconditioning, that is, an implementable operator $C_L$ such that $C_L A_L C_L$ is symmetric and well-conditioned uniformly in $L$. Although $C_{2,L}^{1/2}$ provides optimal symmetric preconditioning by Theorem 2.3, it is not directly numerically realizable.
We thus instead consider two-sided preconditioning by the implementable operator $C_L$ defined in (2.28). For bounding the condition number of the symmetrically preconditioned operator $C_L A_L C_L$, we need to establish the spectral equivalence of $A_L$ and $C_L^{-2}$. This is not a direct consequence of Theorem 2.3. Although relying mainly on adaptations of established techniques as in [26,51,54], the following result appears to be new. The proof is given in Appendix A.
Theorem 2.4. With $A_L$ as in (2.21) and $C_L$ as in (2.28), there exist $c, C > 0$ independent of $L$ such that
$c\,\langle C_L^{-2} v, v\rangle \le \langle A_L v, v\rangle \le C\,\langle C_L^{-2} v, v\rangle$ for all $v$.

Remark 2.5. As an immediate consequence of Theorem 2.4, the condition number of $C_L A_L C_L$ is bounded uniformly in $L$.

In what follows, we consider the symmetrically preconditioned problem of finding $\bar u_L$ such that
(2.31a) $B_L \bar u_L = C_L f_L$, where (2.31b) $B_L = C_L A_L C_L$.
For our purposes, the symmetrically preconditioned operator is preferable mainly for two reasons. On the one hand, an important advantage of the symmetric preconditioning (2.31b) consists in the norm equivalence (2.30), since ultimately we are interested in numerical schemes with guaranteed convergence in the $H^1$ norm. With low-rank methods using SVD-based rank truncations, as considered in further detail in Section 6, for any $\varepsilon > 0$ we can find $\bar v$ such that $\|\bar u_L - \bar v\|_2 \le \varepsilon$ with $\bar u_L$ as in (2.31a). With the nodal basis coefficients $v = C_L\bar v$, the corresponding finite element functions $v = \sum_{j\in J_L} v_j\,\varphi_{L,j}$ and $u_L = \sum_{j\in J_L} (C_L\bar u_L)_j\,\varphi_{L,j}$ then satisfy the estimate resulting from (2.30). On the other hand, the symmetric preconditioning (2.31b) allows for the explicit assembly of the preconditioned operator $B_L$ directly in low-rank form, as considered in detail in Section 5.

Tensor Train Decomposition
In this section, we recapitulate the definition of the tensor train (TT) decomposition of multidimensional arrays and present the notation that we need for the following sections.
The TT decomposition uses one of many possible ways to separate variables in multidimensional arrays; see, e.g., the survey [40] and the monograph [22]. The TT decomposition is a particular case of the more general hierarchical tensor representation, also known as the hierarchical Tucker representation [18,24]. Both the TT and hierarchical tensor representations can be interpreted as successive subspace approximation or low-rank matrix factorization, and this relation allows for the quasi-optimal low-rank approximation of tensors built upon standard matrix algorithms.
The number of parameters of the representation, formally linear in $L$, is mainly governed by the ranks, such as $r_1,\ldots,r_{L-1}$ in (3.1a) and $p_1,\ldots,p_{L-1}$ in (3.1b). In many applications, the complexity is observed, theoretically as well as numerically, to depend only moderately on $L$ (see, e.g., [20]), which makes it possible to mitigate or completely avoid the so-called curse of dimensionality associated with the entry-wise storage of high-dimensional arrays.
The use of $L$ for the dimensionality of tensors in this section is not accidental: in the present paper, the "dimension" index $\ell\in\{1,\ldots,L\}$ enumerates the levels of discretization, and each of the mode indices ($i_\ell$ and $j_\ell$ with $\ell\in\{1,\ldots,L\}$ above) represents the corresponding $D$ bits of the $D$ "physical" dimensions. In this case, the TT format separates not the "physical" dimensions of tensors but rather their multilevel structure, and adaptive low-rank approximation allows this structure to be resolved in vectors and matrices. In this setting, the TT decomposition is known as the quantized tensor train (QTT) decomposition [21,37,43,44]. This idea is further explained in Section 3.7.

3.2. Core notation.
In this section, we present the notation developed in [30,32,35], which we use extensively to work with TT representations. For the sake of brevity, several definitions and properties will be stated for cores with two mode indices, which naturally arise in TT representations of matrices. The setting with a single mode index per core can be considered a particular case in the same way as vectors can be considered one-column matrices.
For explicitly defining a core $U_\ell$, as a tensor of order four as in (3.2), in terms of its blocks (which in turn can be matrices or vectors), we use the notation (3.3), where square brackets are used for distinction from matrices. The following matrices are examples of blocks that we frequently use in this paper:
(3.4) $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $J = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$.
To apply the usual matrix transposition to TT decompositions of matrices, we will use the transposition of the mode indices of cores, which acts on every slice as matrix transposition, for all values of the indices.

3.3. Strong Kronecker product. We are interested in cores as factors of TT decompositions, and we now present how decompositions of the form (3.1a)-(3.1b) can be expressed in terms of cores. For that purpose, we use the strong Kronecker product, introduced for two-level matrices in [13]. In order to avoid confusion with the Hadamard and tensor products, we denote this operation by $\bowtie$, as in [35, Definition 2.1], where it was introduced specifically for connecting cores into TT representations.
Consider cores $U$ and $V$ of ranks $p\times r$ and $r\times q$ and of mode sizes $m_1\times m_2$ and $n_1\times n_2$, respectively. The strong Kronecker product $U\bowtie V$ of $U$ and $V$ is the core of rank $p\times q$ and mode size $m_1 n_1\times m_2 n_2$ given, in terms of the matrix multiplication of slices (of sizes $p\times r$ and $r\times q$), by $(U\bowtie V)(i_1 i_2,\, j_1 j_2) = U(i_1,j_1)\,V(i_2,j_2)$ for all values of the mode indices. In other words, we define $U\bowtie V$ as the usual matrix product of the corresponding core matrices, their entries (blocks) being multiplied by means of the Kronecker product. For example, for two cores of rank $2\times2$ we have
$\begin{bmatrix} U_{11} & U_{12}\\ U_{21} & U_{22}\end{bmatrix}\bowtie\begin{bmatrix} V_{11} & V_{12}\\ V_{21} & V_{22}\end{bmatrix} = \begin{bmatrix} U_{11}\otimes V_{11} + U_{12}\otimes V_{21} & U_{11}\otimes V_{12} + U_{12}\otimes V_{22}\\ U_{21}\otimes V_{11} + U_{22}\otimes V_{21} & U_{21}\otimes V_{12} + U_{22}\otimes V_{22}\end{bmatrix}$.
Using the strong Kronecker product, we can rewrite (3.1a) and (3.1b) as $u = U_1\bowtie\cdots\bowtie U_L$ and $A = A_1\bowtie\cdots\bowtie A_L$, where the first equalities indicate that any tensor of dimension $m\times n$ can be identified with a core of rank $1\times1$ and mode size $m\times n$.
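A direct implementation may clarify the definition. In the following sketch (ours), a core of rank $p\times q$ and mode size $m_1\times m_2$ is stored as an array of shape $(p, q, m_1, m_2)$, with rank indices first; the blocks $I$ and $J$ below are the ones given in (3.4).

```python
# Sketch (ours): strong Kronecker product of cores stored as (p, q, m1, m2) arrays.
import numpy as np

def strong_kron(U, V):
    p, r, m1, m2 = U.shape
    r2, q, n1, n2 = V.shape
    assert r == r2, "rank mismatch"
    W = np.zeros((p, q, m1 * n1, m2 * n2))
    for i in range(p):                # matrix product of core matrices ...
        for j in range(q):
            for k in range(r):        # ... with blocks combined by Kronecker products
                W[i, j] += np.kron(U[i, k], V[k, j])
    return W

I = np.eye(2); J = np.array([[0.0, 1.0], [0.0, 0.0]])
U = np.zeros((1, 2, 2, 2)); U[0, 0], U[0, 1] = I, J          # core [ I  J ]
V = np.zeros((2, 1, 2, 2)); V[0, 0], V[1, 0] = J, I          # core [ J ; I ]
print(strong_kron(U, V)[0, 0])        # the 4x4 matrix I (x) J + J (x) I
```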
3.4. Representation map. Since many different tuples of cores may represent (or approximate) the same tensor, we need to distinguish representations as tuples of cores. We denote such tuples by sans-serif letters, for example, $\mathsf U = (U_1,\ldots,U_L)$ and $\mathsf A = (A_1,\ldots,A_L)$ for the decompositions given by (3.1a) and (3.1b). Further, we denote by $\tau$ the function mapping tuples of cores into cores (in particular, into tensors when the rank of the resulting core is $1\times1$): $\tau(U_1,\ldots,U_L) = U_1\bowtie\cdots\bowtie U_L$. For the sets of all tuples of $L\in\mathbb{N}$ cores with compatible ranks, we write $\mathrm{TT}_L = \mathrm{TT}^1_L$ in the case of blocks with one mode index, and $\mathrm{TT}^2_L$ in the case of two mode indices as in (3.2).
Furthermore, let us assume that $\mathsf U = (U_1,\ldots,U_L)\in\mathrm{TT}_L$, i.e., that $U_1,\ldots,U_L$ are cores such that $\tau(U_1,\ldots,U_L)$ is a core of rank $r_0\times r_L$ and mode size $n$, where $r_0, r_L, n\in\mathbb{N}$. Then by $\tau^-$ and $\tau^+$ we denote the matrices of sizes $r_0 n\times r_L$ and $r_0\times n r_L$, respectively, indexed by $\beta_0 = 1,\ldots,r_0$, $i = 1,\ldots,n$ and $\beta_L = 1,\ldots,r_L$. These matrices may be called matricizations of the core $\tau(U_1,\ldots,U_L)$: they are obtained by interpreting the rank indices as row and column indices, which is consistent with (3.3), and by interpreting all mode indices as either row or column indices. For notational convenience, we set $\tau^-(\emptyset) = 1$ and $\tau^+(\emptyset) = 1$ for empty lists of cores. Moreover, for each $\ell = 1,\ldots,L$, we define the partial matricizations $\tau^-_\ell(\mathsf U) = \tau^-(U_1,\ldots,U_{\ell-1})$ and $\tau^+_\ell(\mathsf U) = \tau^+(U_{\ell+1},\ldots,U_L)$.

3.5. Unfolding matrices, ranks, and orthogonality. Let us consider a vector $u$ of size $n_1\cdots n_L$ and a matrix $A$ of corresponding size. For every $\ell = 1,\ldots,L-1$, we denote by $U_\ell(u)$ and $U_\ell(A)$ the $\ell$th unfolding matrices of $u$ and $A$, which are the matrices obtained by separating the first $\ell$ mode indices from the remaining ones. The decompositions given by (3.1a)-(3.1b) or, equivalently, by (3.9c) imply $\operatorname{rank} U_\ell(u)\le r_\ell$ and $\operatorname{rank} U_\ell(A)\le p_\ell$ for each $\ell = 1,\ldots,L-1$; furthermore, the decompositions provide low-rank factorizations of the unfolding matrices with the respective numbers of rank-one terms. For example, in the case of a vector, using the notation introduced in (3.10c)-(3.10d), we can write $U_\ell(u) = \tau^-_{\ell+1}(\mathsf U)\,\tau^+_\ell(\mathsf U)$. Conversely, if $u$ and $A$ are such that, for every $\ell = 1,\ldots,L-1$, the unfolding matrices $U_\ell(u)$ and $U_\ell(A)$ have approximations of ranks $r_\ell$ and $p_\ell$, respectively, and of accuracy $\varepsilon$ in the Frobenius norm, then representations $\mathsf U = (U_1,\ldots,U_L)$ and $\mathsf A = (A_1,\ldots,A_L)$ of ranks $r_1,\ldots,r_{L-1}$ and $p_1,\ldots,p_{L-1}$ with $\|u - \tau(\mathsf U)\|_2 \le \sqrt{L-1}\,\varepsilon$ and $\|A - \tau(\mathsf A)\|_2 \le \sqrt{L-1}\,\varepsilon$ can be constructed by the TT-SVD algorithm [45, Algorithm 1].
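The following compact sketch (ours, following [45, Algorithm 1] in spirit; the test vector is an arbitrary choice) constructs such a representation from truncated SVDs of the unfoldings, with the per-unfolding tolerance chosen so that the total $\ell^2$ error stays below a prescribed bound.

```python
# Sketch (ours): TT-SVD of a vector with mode sizes n; cores of shape (r0, n_l, r1).
import numpy as np

def tt_svd(x, n, eps):
    L = len(n)
    delta = eps / max(1.0, np.sqrt(L - 1))       # per-unfolding truncation tolerance
    cores, r = [], 1
    M = np.asarray(x, dtype=float).reshape(1, -1)
    for l in range(L - 1):
        M = M.reshape(r * n[l], -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        tail = np.cumsum(s[::-1]**2)[::-1]       # tail[k] = sum of s_j^2 for j >= k
        rnew = next((k for k in range(1, len(s)) if tail[k] <= delta**2), len(s))
        cores.append(U[:, :rnew].reshape(r, n[l], rnew))
        M, r = s[:rnew, None] * Vt[:rnew], rnew
    cores.append(M.reshape(r, n[-1], 1))
    return cores

def tt_eval(cores):                               # contract cores back into a vector
    v = cores[0].reshape(-1, cores[0].shape[-1])
    for C in cores[1:]:
        v = (v @ C.reshape(C.shape[0], -1)).reshape(-1, C.shape[-1])
    return v.ravel()

x = np.sqrt(np.linspace(0.0, 1.0, 2**12))         # singular at 0, QTT-compressible
U = tt_svd(x, [2] * 12, 1e-8)
print([C.shape[2] for C in U[:-1]], np.linalg.norm(tt_eval(U) - x))
```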
Next, we recapitulate the notion of orthogonality of decompositions in terms of the matricization operators defined in (3.10a)-(3.10d). If a core $U_\ell$ is such that the matrix $\tau^-(U_\ell)$ has orthonormal columns, then the core is called left-orthogonal. Similarly, if the matrix $\tau^+(U_\ell)$ has orthonormal rows, then the core is called right-orthogonal. Further, if $\mathsf U\in\mathrm{TT}_L$ is such that the columns of each matrix $\tau^-_\ell(\mathsf U)$ with $\ell = 2,\ldots,L+1$ are orthonormal, then the decomposition is called left-orthogonal. Analogously, if the rows of each matrix $\tau^+_\ell(\mathsf U)$ with $\ell = 0,\ldots,L-1$ are orthonormal, then the decomposition is called right-orthogonal. It is easy to see that any core of the form $U = U_1\bowtie U_2$ is left- or right-orthogonal if both $U_1$ and $U_2$ are left- or right-orthogonal, respectively. As a result, any decomposition $\mathsf U = (U_1,\ldots,U_L)$ is left- or right-orthogonal if each of the cores $U_1,\ldots,U_L$ is left- or right-orthogonal.
Moreover, we say that $\mathsf U$ is in left-orthogonal TT-SVD form if $\tau^-_{\ell+1}(\mathsf U)$ has orthonormal columns and $\tau^+_\ell(\mathsf U)$ has orthogonal rows for each $\ell = 1,\ldots,L-1$; in other words, these matrices provide the SVD of $U_\ell(u)$ for each $\ell$, where the norms of the rows of $\tau^+_\ell(\mathsf U)$ are the corresponding singular values, and $\|u\|_2 = \|U_L\|_2$. Analogously, $\mathsf U$ is in right-orthogonal TT-SVD form if $\tau^-_{\ell+1}(\mathsf U)$ has orthogonal columns and $\tau^+_\ell(\mathsf U)$ has orthonormal rows. These TT-SVD forms can be obtained numerically for any given $\mathsf U$ by the procedure of [45, Algorithm 1] without rank truncation.
3.6. Operations on cores. We require several further operations, which are explained in this section. We start with the mode product of cores, which was introduced in [30, Definition 2.2] and which generalizes matrix multiplication to the case of cores.
For example, for a core $A$ with two mode indices and a core $B$ with one or two mode indices, each core being of rank $2\times2$, the mode product is formed block-wise as in the example (3.12). The mode product and the strong Kronecker product inherit distributivity from the usual matrix product and from the Kronecker product: for $\mathsf A = (A_1,\ldots,A_L)$ and $\mathsf U = (U_1,\ldots,U_L)$ such that the products $A_\ell\bullet U_\ell$ with $\ell = 1,\ldots,L$ are all defined, we have that the product $\tau(\mathsf A)\bullet\tau(\mathsf U)$ is defined and is given by
(3.13) $\tau(\mathsf A)\bullet\tau(\mathsf U) = \tau(A_1\bullet U_1,\ldots,A_L\bullet U_L)$.
When $\tau(\mathsf A)$ and $\tau(\mathsf U)$ are both of rank $1\times1$ and can therefore be identified with matrices, $\tau(\mathsf A)\bullet\tau(\mathsf U)$ is the core of rank $1\times1$ identified with the matrix-matrix product of these matrices, and (3.13) gives a representation for the product of a matrix $A = \tau(\mathsf A)$ and a vector $u = \tau(\mathsf U)$ given by (3.1b) and (3.1a).
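A sketch of the core-wise product (3.13) in the matrix-vector case (ours; matrix cores are stored as $(p_{\ell-1}, p_\ell, m_\ell, n_\ell)$ arrays and vector cores as $(r_{\ell-1}, n_\ell, r_\ell)$ arrays):

```python
# Sketch (ours): mode product of a matrix core with a vector core. Note that
# the ranks of the product representation are the products p_l * r_l.
import numpy as np

def core_mode_product(A, U):
    p0, p1, m, n = A.shape
    r0, n2, r1 = U.shape
    assert n == n2, "mode size mismatch"
    W = np.einsum('abmn,cnd->acmbd', A, U)   # contract the column mode of A
    return W.reshape(p0 * r0, m, p1 * r1)

Aeye = np.zeros((1, 1, 2, 2)); Aeye[0, 0] = np.eye(2)    # rank-one identity core
u = np.random.default_rng(1).standard_normal((1, 2, 1))  # rank-one vector core
print(np.allclose(core_mode_product(Aeye, u), u))        # identity acts trivially
```

Applying this core-wise to all $\ell$ realizes (3.13); the multiplication of ranks is the mechanism by which redundancies of the kind discussed in Section 4 can enter a representation.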
Finally, our derivations involve Kronecker products of cores, which are defined as the Kronecker product of the corresponding arrays. For any $p, p', q, q'\in\mathbb{N}$ and $m, n, m', n'\in\mathbb{N}$, let $A$ be a core of rank $p\times p'$ and mode size $m\times n$, and let $B$ be a core of rank $q\times q'$ and mode size $m'\times n'$. Then the Kronecker product $A\otimes B$ of $A$ and $B$ is the core of rank $pq\times p'q'$ and mode size $mm'\times nn'$ given in terms of the Kronecker products of all pairs of block tensors or, equivalently, in terms of the Kronecker products of all pairs of slice matrices. Similarly to (3.13), we have
(3.15) $\tau(\mathsf A)\otimes\tau(\mathsf B) = \tau(A_1\otimes B_1,\ldots,A_L\otimes B_L)$
for any representations $\mathsf A = (A_1,\ldots,A_L)$ and $\mathsf B = (B_1,\ldots,B_L)$. The relations (3.13) and (3.15) reflect the well-known fact that the matrix and Kronecker products can be recast core-wise; see, e.g., [22,40,45]. One of the most important properties of the TT decomposition of tensors is that any representation can be made left- or right-orthogonal in the sense of Section 3.5 by the successive application of the QR decomposition [18,22,24,41,45]. We now briefly present an algorithm for the left-orthogonalization of a decomposition, which we use as an example in the discussion of representation conditioning. This scheme is also the first step in the computation of the TT-SVD form of a TT representation, as in [45, Algorithm 2].
Algorithm 3.1 left-orthogonalization $\mathrm{orth}^-$ of a TT representation (right-orthogonalization $\mathrm{orth}^+$ can be performed analogously)
1: set $W_1 = U_1$
2: for $\ell = 1,\ldots,L-1$ do (sweep through the representation from left to right)
3: set $M = \tau^-(W_\ell)$
4: compute a matrix QR decomposition: $M = QR$
5: define $V_\ell$, of the same dimensions as $U_\ell$, so that $\tau^-(V_\ell) = Q$
6: define $W_{\ell+1}$, of the same dimensions as $U_{\ell+1}$, so that $\tau^+(W_{\ell+1}) = R\,\tau^+(U_{\ell+1})$
7: end for
8: set $V_L = W_L$
In exact arithmetic, we have $\tau(\mathsf V) = \tau(\mathsf U)$ for any $\mathsf U\in\mathrm{TT}^S_L$ with $L, S\in\mathbb{N}$ and $\mathsf V = \mathrm{orth}^-(\mathsf U)$, and this is the view adhered to in the references cited above. However, the situation is drastically different when errors are introduced (e.g., due to round-off) in the course of orthogonalization, namely, in lines 4 and 6 of Algorithm 3.1.
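A sketch of this sweep (ours, for cores stored as $(r_{\ell-1}, n_\ell, r_\ell)$ arrays):

```python
# Sketch (ours) of Algorithm 3.1: QR-factorize the left matricization of each
# core and push the triangular factor into the next core; in exact arithmetic
# the represented tensor is unchanged.
import numpy as np

def orth_left(cores):
    cores = [C.copy() for C in cores]
    for l in range(len(cores) - 1):
        r0, n, r1 = cores[l].shape
        Q, R = np.linalg.qr(cores[l].reshape(r0 * n, r1))
        cores[l] = Q.reshape(r0, n, Q.shape[1])
        cores[l + 1] = np.tensordot(R, cores[l + 1], axes=(1, 0))
    return cores
```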

3.7. Low-rank multilevel decomposition of vectors and matrices. Here, we discuss how we use the tensor train decomposition for the resolution of low-rank multilevel structure in the vectors and matrices involved in the solution of (2.2).
To reorder the entries of Kronecker products, we use particular permutation matrices defined as follows. First, for every $L\in\mathbb{N}$, we define $\Pi_L$ as the permutation matrix of order $2^{DL}$ such that (3.16) holds for all $i_{k,\ell}\in\{1,2\}$ with $k = 1,\ldots,D$ and $\ell = 1,\ldots,L$. In our present setting, we are interested in functions whose coefficients admit low-rank TT representations in the sense that $u = \Pi_L\,\tau(\mathsf U)$ with some $\mathsf U = (U_1,\ldots,U_L)$. The set $J_L$, which is defined by (2.14), is isomorphic to $\{1,2\}^{DL}$. The matrix $\Pi_L$, when applied to a vector whose components are indexed by $J_L$, folds the vector into a $DL$-dimensional array, transposes the $DL$ indices according to the transformation of the ordering of the product $\{1,\ldots,D\}\times\{1,\ldots,L\}$ from big-endian to little-endian, and unfolds the resulting array back into a vector.
In other words, the matrix $\Pi_L$, acting on a vector whose entries are enumerated so that the indices corresponding to each dimension and all of the levels occur one after another, rearranges the entries in such a way that the indices corresponding to each level and all of the dimensions occur one after another. In the present paper, we will use $\Pi_L$ to permute the rows and columns of matrices, as the following example illustrates.
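In terms of index manipulations, the action of $\Pi_L$ can be realized by a reshape and transpose, as in the following sketch (ours; we assume the dimension-major input ordering described above):

```python
# Sketch (ours): apply Pi_L by reordering the DL binary indices of a vector
# from dimension-major to level-major ordering.
import numpy as np

def apply_Pi(v, D, L):
    T = v.reshape((2,) * (D * L))     # axis k*L + l carries bit (dimension k, level l)
    perm = [k * L + l for l in range(L) for k in range(D)]   # level-major target order
    return T.transpose(perm).ravel()

v = np.arange(2 ** (2 * 3))           # D = 2, L = 3, as in Example 3.3 below
print(apply_Pi(v, 2, 3)[:8])
```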
Example 3.3. In the case of $D = 2$ and $L = 3$, the corresponding relation can be written out explicitly in terms of the matrices defined in (3.4) above.

Representation Conditioning
Since the TT decomposition is based on low-rank matrix factorization, redundancy (linear dependence) in explicit TT representations can be eliminated analytically. This is illustrated in Appendix B: see (B.1a)-(B.1c) and, for more practical examples, the proof of Lemma 5.5. On the other hand, in the course of computations, this reduction has to be done numerically.
In exact arithmetic, it can always be achieved by the TT rounding algorithm [45, Algorithm 2] using the TT-SVD. In practice, however, it may fail due to round-off errors: a small perturbation of a single core in a TT decomposition may, through catastrophic cancellations, introduce a large perturbation in the represented tensor. This can occur even in the course of orthogonalization (Algorithm 3.1), which is essential for ensuring the stability of the TT rounding algorithm. We now turn to an analysis of the potential for such error amplification, which we refer to as representation conditioning.

4.1. Examples of ill-conditioning of tensor representations. We first consider a simple example of a tensor for which relative perturbations on the order of the machine precision can lead to large changes in the represented tensor.
Example 4.1. Take $D = 1$ (so that $I = \{0,1\}$) and let $x$ be the tensor with all entries equal to one, $x_{i_1,\ldots,i_L} = 1$ for $i_1,\ldots,i_L\in I$. Clearly, $x$ can be represented by $\mathsf X = (X_\ell)_{\ell=1,\ldots,L}$ with $\operatorname{ranks}(\mathsf X) = (1,\ldots,1)$, where $X_\ell = [(1,1)^T]$ for each $\ell$. However, we also have an alternative representation $\mathsf Y$ with $\operatorname{ranks}(\mathsf Y) = (2,\ldots,2)$: for any fixed $R > 0$ and $y_0 = (0,0)^T$, $y_R = (R,R)^T$, we instead set the cores as in (4.1). For $\varepsilon > 0$, consider a perturbation of $\mathsf Y$ obtained by replacing $Y_\ell$ for some fixed $1 < \ell < L$ by a core perturbed by $\varepsilon$ in a single block. This corresponds to a relative error of order $\varepsilon$ with respect to $\|Y_\ell\|_2$. The resulting perturbed tensor $x^\varepsilon$ is again constant with entries $1 + (R^L + 1)\varepsilon$, and therefore satisfies
(4.2) $\|x^\varepsilon - x\|_2 = (R^L + 1)\,\varepsilon\,\|x\|_2$.
For instance, with $R = 4$ and $L \ge 25$, we obtain $R^L > 10^{15}$. Consequently, any numerical manipulation of the representation can then lead to very large round-off errors that leave no significant digits in the output; in particular, an automatic rank reduction of the representation by SVD will in general not produce any useful result.
To illustrate this numerically, we consider the left-orthogonalization $\mathrm{orth}^-(\mathsf Y)$ with $R = 4$ and machine precision $\epsilon\approx 2\times10^{-16}$, which is also the first step in computing the TT-SVD. In exact arithmetic, the tensor $\tau(\mathrm{orth}^-(\mathsf Y))$ is identical to $\tau(\mathsf Y)$; in inexact arithmetic, however, this can be far from true. The associated relative numerical errors are compared to the bound (4.2) in Table 1. We consider two ways of evaluating the difference in the $\ell^2$-norm: by extracting all tensor entries and computing the norm of their differences directly, or by assembling the difference in TT format and computing its norm using another orthogonalization. Due to numerical effects, the resulting values are not identical, but they agree in their order of magnitude, which is also the same as predicted for a particular perturbation by (4.2).
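The effect can be reproduced with a few lines of code. The following sketch uses our own variant of a redundant rank-two representation of the all-ones tensor, with the same $R^L$ growth as in Example 4.1; the cores need not coincide with those of (4.1).

```python
# Sketch (ours): a redundant rank-two representation of the all-ones tensor
# in which left-orthogonalization in floating point loses many digits.
import numpy as np

def tt_eval(cores):
    v = cores[0].reshape(-1, cores[0].shape[-1])
    for C in cores[1:]:
        v = (v @ C.reshape(C.shape[0], -1)).reshape(-1, C.shape[-1])
    return v.ravel()

def orth_left(cores):
    cores = [C.copy() for C in cores]
    for l in range(len(cores) - 1):
        r0, n, r1 = cores[l].shape
        Q, R = np.linalg.qr(cores[l].reshape(r0 * n, r1))
        cores[l] = Q.reshape(r0, n, Q.shape[1])
        cores[l + 1] = np.tensordot(R, cores[l + 1], axes=(1, 0))
    return cores

L, R = 20, 4.0
first = np.zeros((1, 2, 2)); first[0, :, :] = [1.0, R]
mid = np.zeros((2, 2, 2))
for i in range(2): mid[:, i, :] = [[1.0, R], [0.0, R]]
u = np.array([1.0, R])
for _ in range(L - 2): u = u @ np.array([[1.0, R], [0.0, R]])
g = u[1]                                   # large intermediate weight, ~ R^{L-1}
last = np.zeros((2, 2, 1)); last[:, :, 0] = [[1.0 - g, 1.0 - g], [1.0, 1.0]]
cores = [first] + [mid] * (L - 2) + [last]

x = tt_eval(cores)                         # exactly the all-ones vector
xo = tt_eval(orth_left(cores))
print("rel. error after orth:", np.linalg.norm(xo - x) / np.linalg.norm(x))
print("order of g * eps_mach:", g * 2e-16)
```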
The type of instability observed in Example 4.1 occurs in a similar way in other operations, for instance in the computation of inner products, or even in the extraction of a single entry of the tensor. Due to its fundamental importance in many algorithms, we use orthogonalization as an illustrative example in what follows.
Example 4.1 may seem artificial, since in the explicit construction of tensor representations one will usually try to avoid such redundant representations that can cause cancellations. However, redundancies of this kind may also be generated when matrix-vector products are performed. We next consider an example of practical relevance where both matrix and vector are each in multilevel tensor representations of minimal ranks, but the resulting representation of their product has a similar ill-conditioning as the previous example.
Example 4.2. We consider the representation of the negative Laplacian with homogeneous Dirichlet boundary conditions on $(0,1)$, discretized by piecewise linear finite elements on a uniform grid with $2^L$ interior nodes. The resulting stiffness matrix $A^{\mathrm{DD}}_L$ admits the representation (4.3) derived in [35, Cor. 3.2], where $H = 1 + 2^{-L}$ and the elementary blocks are as defined in (3.4). The first eigenvector of $A^{\mathrm{DD}}_L$, corresponding to the lowest eigenvalue $\lambda_{\min,L}\approx\pi^2$, is $x_{\min,L} = \big(\sin(\pi i 2^{-L}/H)\big)_{i=1,\ldots,2^L}$, which admits the explicit low-rank representation $\mathsf X$ given in (4.4). Then the representation $\mathsf A\bullet\mathsf X$ of the matrix-vector product $A^{\mathrm{DD}}_L x_{\min,L}$ in exact arithmetic satisfies $\tau(\mathsf A\bullet\mathsf X) = A^{\mathrm{DD}}_L x_{\min,L} = \lambda_{\min,L} x_{\min,L} = \lambda_{\min,L}\tau(\mathsf X)$. We consider a numerical test similar to that in Example 4.1, comparing the relative error in $\mathrm{orth}^-(\mathsf A\bullet\mathsf X)$ to that of $\mathrm{orth}^-(\mathsf X)$. The results are given in Table 2, where the differences are computed in the TT format. Whereas the numerical manipulation of $\mathsf X$ leads to errors close to the machine precision $\epsilon$, in $\mathrm{orth}^-(\mathsf A\bullet\mathsf X)$ we observe large relative errors of order $2^{2L}\epsilon$. Note that the representation of $A^{\mathrm{DD}}_L$ according to (4.3) has a structure similar to that of the redundant representation (4.1) in the previous example: the cores $A_1,\ldots,A_{L-1}$ have only positive entries, whereas $A_L$ can introduce cancellations, in particular when the matrix is applied to low-frequency grid functions as above.

4.2. Representation amplification factors and condition numbers.
We now introduce a quantitative measure for the stability of TT representations under numerical manipulations. To first order in the size of the perturbation, it is determined by the relative condition numbers of the multilinear mapping $\tau$ with respect to the component tensors in its argument. Here we use the appropriate metric on the components, corresponding to the perturbations considered above that arise in linear algebra operations.

Definition 4.3. We define the representation amplification factors of $\mathsf X\in\mathrm{TT}_L$, for $\ell = 1,\ldots,L$, by
(4.5) $\mathrm{ramp}_\ell(\mathsf X) = \sup\big\{\|\tau(\tilde{\mathsf X}) - \tau(\mathsf X)\|_2\,/\,(\varepsilon\,\|\tau(\mathsf X)\|_2) : \|\tilde X_\ell - X_\ell\|_2 \le \varepsilon\,\|X_\ell\|_2,\ \tilde X_k = X_k\ \text{for}\ k\ne\ell\big\}$,
and the representation condition numbers by $\mathrm{rcond}(\mathsf X) = \sum_{\ell=1}^{L}\mathrm{ramp}_\ell(\mathsf X)$. By the multilinearity of $\tau$, if $\mathsf X,\tilde{\mathsf X}\in\mathrm{TT}_L$ with $x = \tau(\mathsf X)$, $\tilde x = \tau(\tilde{\mathsf X})$ are such that $\|X_\ell - \tilde X_\ell\|_2\le\varepsilon\,\|X_\ell\|_2$ for each $\ell$, then for such relative perturbations of size $\varepsilon$ of the cores we have, up to higher-order terms in $\varepsilon$, the bound $\|\tilde x - x\|_2 \le \varepsilon\,\mathrm{rcond}(\mathsf X)\,\|x\|_2$.

In the following characterization, we use the notation $\tau^-_\ell$ and $\tau^+_\ell$ for left and right partial matricizations as introduced in (3.10c)-(3.10d).

Proposition 4.4. For any $\mathsf X\in\mathrm{TT}_L$ and $\ell = 1,\ldots,L$,
(4.6) $\mathrm{ramp}_\ell(\mathsf X) = \|\tau^-_\ell(\mathsf X)\|_{2\to2}\;\|X_\ell\|_2\;\|\tau^+_\ell(\mathsf X)\|_{2\to2}\,/\,\|\tau(\mathsf X)\|_2$.

Proof. For fixed $\ell$ in (4.5), let $\mathsf X, \tilde{\mathsf X}$ satisfy the conditions in the supremum. Then $\|\tau(\tilde{\mathsf X}) - \tau(\mathsf X)\|_2 \le \|\tau^-_\ell(\mathsf X)\|_{2\to2}\,\|\tilde X_\ell - X_\ell\|_2\,\|\tau^+_\ell(\mathsf X)\|_{2\to2}$.
The claim thus follows by taking the supremum over $\tilde{\mathsf X}$ with $\|\tilde X_\ell - X_\ell\|_2 = \varepsilon\,\|X_\ell\|_2$, for which the bound is in fact attained.

Remark 4.5. The quantities in Definition 4.3 measuring the amplification of perturbations can be defined in an analogous way for more general tensor networks by considering perturbations in the respective components; see [7,22,42,49] for an overview of such more general tensor formats.
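For small $L$, the amplification factors can be evaluated directly from the characterization (4.6), by forming the partial matricizations densely, as in the following sketch (ours; it assumes the formula (4.6) and cores stored as $(r_{\ell-1}, n_\ell, r_\ell)$ arrays):

```python
# Sketch (ours): evaluate ramp_l(X) via (4.6) using dense partial matricizations;
# feasible for small L only, since tau_l^- and tau_l^+ are formed explicitly.
import numpy as np

def partial_contractions(cores):          # matrices tau^-_1, ..., tau^-_L
    mats, v = [np.ones((1, 1))], np.ones((1, 1))
    for C in cores[:-1]:
        v = (v @ C.reshape(C.shape[0], -1)).reshape(-1, C.shape[-1])
        mats.append(v)
    return mats

def ramp(cores):
    left = partial_contractions(cores)
    right = partial_contractions([C.transpose(2, 1, 0) for C in reversed(cores)])
    x_norm = np.linalg.norm(left[-1] @ cores[-1].reshape(cores[-1].shape[0], -1))
    return [np.linalg.norm(left[l], 2) * np.linalg.norm(C)
            * np.linalg.norm(right[len(cores) - 1 - l], 2) / x_norm
            for l, C in enumerate(cores)]

rng = np.random.default_rng(0)
cores = ([rng.standard_normal((1, 2, 3))]
         + [rng.standard_normal((3, 2, 3)) for _ in range(4)]
         + [rng.standard_normal((3, 2, 1))])
print(ramp(cores))
```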
We have the following general observations concerning possible representation condition numbers, where in certain special cases we can also give bounds that depend only on the TT ranks. Here we use the notion of TT-SVD forms introduced in Section 3.5.

Proposition 4.6. Let $\mathsf X\in\mathrm{TT}_L$; then the following hold for $\ell = 1,\ldots,L$.
Modifications to the components of a TT representation that leave the represented tensor unchanged can still lead to a change in the representation condition numbers.This change can be bounded from above as follows.
These bounds constitute Proposition 4.7, with the general estimates stated in (4.7). In the particular case when the matrix $\tau^+_\ell(\tilde{\mathsf X})$ has orthonormal rows, one has the stronger bound (4.8); if $\tilde X_{\ell+1}$ is right-orthogonal, then (4.9) holds in addition.

Proof. The estimates (4.7) follow from the submultiplicativity of the norms involved for the first inequality, and from analogous estimates for the second. To see (4.8), observe that $R\,\tau^+_\ell(\tilde{\mathsf X}) = \tau^+_\ell(\mathsf X)$ and that, under the given additional assumption, $\|\tau^+_\ell(\tilde{\mathsf X})\|_{2\to2} = 1$ and $\|\tau^+_\ell(\mathsf X)\|_{2\to2} = \|R\|_{2\to2}$. Under the further assumption for (4.9), we have $\|\tilde X_{\ell+1}\|_2 = \|R\|_F$, and thus the claimed bound follows.

Note that the improved bounds (4.8) and (4.9), which do not depend on the particular transformation $R$, correspond to the transformations made in algorithms for right-orthogonalizing $\mathsf X\in\mathrm{TT}_L$. When the roles of $X_\ell$, $X_{\ell+1}$ and the corresponding orthogonality requirements are reversed, (4.8) and (4.9) are replaced by $\mathrm{ramp}_{\ell+1}(\tilde{\mathsf X})\le\mathrm{ramp}_{\ell+1}(\mathsf X)$ and $\mathrm{ramp}_\ell(\tilde{\mathsf X})\le\sqrt{r_{\ell+1}}\,\mathrm{ramp}_\ell(\mathsf X)$.

4.3. Orthogonalization as an example of a numerical operation. Orthogonalization of tensor train representations is usually done via QR decompositions of matricized cores. When performed at machine precision $\epsilon$, these decompositions are affected by round-off errors: applied to $M\in\mathbb{R}^{m\times n}$, where $mn\epsilon$ is sufficiently small, the standard Householder algorithm yields $Q, R$ satisfying the bound (4.10), as shown in [27, §19]. As a consequence of Proposition 4.7, we obtain a statement on the numerical errors incurred by the orthogonalization of TT representations. As a simplifying assumption, let us suppose that the QR factorizations in $\mathrm{orth}^-(\mathsf X)$, $\mathrm{orth}^+(\mathsf X)$ of $\mathsf X\in\mathrm{TT}_L$ are computed in machine precision up to the error bound (4.10), but that matrix-matrix multiplications are performed exactly (and hence the computed Householder reflectors act as exactly orthogonal matrices). Then, recursively using (4.8) and (4.9), we obtain the bounds (4.11) and (4.12), where $r_\ell = \operatorname{rank}_\ell(\mathsf X)$ for $\ell = 1,\ldots,L$. The analogous statements for the relative errors $\|\tau(\mathrm{orth}^+(\mathsf X)) - \tau(\mathsf X)\|_2/\|\tau(\mathsf X)\|_2$ and $\|\tau(\mathrm{orth}^-(\mathsf X)) - \tau(\mathsf X)\|_2/\|\tau(\mathsf X)\|_2$ hold with $\mathrm{ramp}$ replaced by $\mathrm{rcond}$.
Taking into account further numerical effects due to inexact matrix-matrix multiplications leads to substantially more complicated bounds involving additional prefactors depending more strongly on intermediate steps in the algorithms. As our numerical illustrations in Section 4.1 demonstrate, however, the order of magnitude of the resulting errors is typically already very well predicted by the bounds (4.11) and (4.12).
Definition 4.8 introduces the corresponding quantities for operator representations: for $\mathsf A\in\mathrm{TT}^2_L$, the factors $\mathrm{mramp}_\ell(\mathsf A)$ and $\mathrm{mrcond}(\mathsf A)$ are defined as the suprema, over vector representations $\mathsf X$, of the ratios of the respective quantities of $\mathsf A\bullet\mathsf X$ and of $\mathsf X$. In other words, these are the largest factors by which the action of the matrix representation $\mathsf A$ can possibly change the representation amplification factors and the condition numbers of a vector representation. By definition, these functions are submultiplicative. We do not have an explicit representation of these quantities as in Proposition 4.4, but we obtain the following upper bound in terms of the components of representations.
Proposition 4.9. For $\mathsf A\in\mathrm{TT}^2_L$, we define the auxiliary matrices as in (4.14). Then $\mathrm{mramp}_\ell(\mathsf A)\le\beta_\ell(\mathsf A)$ for $\ell = 1,\ldots,L$, where $\beta_\ell(\mathsf A)$ is defined in terms of these matrices, and, if $\tau(\mathsf A)$ is invertible, the bound (4.15) holds as well.

The first statement follows from the estimates for $\tau^-_\ell(\mathsf Y)$ with $\mathsf Y = \mathsf A\bullet\mathsf X$, as well as the analogous bound for $\tau^+_\ell(\mathsf Y)$. For (4.15), note that if $\tau(\mathsf A)$ is invertible, then the quantities of $\mathsf X$ can in turn be estimated in terms of those of $\mathsf A\bullet\mathsf X$.

In certain situations, Proposition 4.9 provides qualitatively sharp bounds. We now demonstrate this in the simple example of the stiffness matrix of the Dirichlet Laplacian on $(0,1)$, for whose representation (4.3) both quantities scale as $2^{2L}$. Similar results are observed numerically for direct representations of more general stiffness matrices of second-order elliptic problems.

Proof. The upper bounds follow by direct computation from Proposition 4.9 via evaluation of the auxiliary matrices in (4.14). For the lower bound on $\mathrm{mramp}_\ell(\mathsf A)$, we estimate the supremum from below using the representation $\mathsf X_{\max}$, analogous to (4.4), of the eigenvector $x_{\max,L} = \big(\sin(\pi i/H)\big)_{i=1,\ldots,2^L}$ corresponding to the largest eigenvalue $\lambda_{\max,L}\sim 2^{2L}$. To this end, it suffices to evaluate $\mathrm{ramp}_\ell(\mathsf A\bullet\mathsf X_{\max})/\mathrm{ramp}_\ell(\mathsf X_{\max})$ via Proposition 4.4 in a direct but tedious calculation. For the lower bound on $\mathrm{mrcond}(\mathsf A)$, we instead use $x_{\min,L} = \big(\sin(\pi i 2^{-L}/H)\big)_{i=1,\ldots,2^L}$ in the representation (4.4).
Thus, applying the matrix representation $\mathsf A$ to the tensor decomposition $\mathsf X$ of a vector may in general increase its representation condition number by a factor proportional to $2^{2L}$. For instance, if $\mathsf X$ is given in TT-SVD form with representation condition number close to one, the further numerical manipulation of $\mathsf A\bullet\mathsf X$ can cause errors of order $O(2^{2L}\epsilon\,\|\tau(\mathsf X)\|_2)$, where $\epsilon$ is the machine precision. This effect is also observed in the numerical tests in Section 7.1.

Multilevel Low-Rank Tensor Structure of the Operator
In this section, we analyze the low-rank structure of the preconditioner $C_L$, given by (2.28), and of the preconditioned discrete differential operator $B_L$ in the form of (2.31b). The resulting low-rank representations are designed specifically to have small representation condition numbers in the sense of Definition 4.8, which is not generally the case for low-rank decompositions of $B_L$.
The central idea for obtaining well-conditioned representations is to directly combine the representations of the differential operators $\hat M_{L,\alpha}$ as in (2.10d) with those of the averaging matrices $\hat P_{\ell,L}$ in the preconditioner. This leads to a natural rank-reduced representation of the products $\hat M_{L,\alpha}\hat P_{\ell,L}$, where the cancellations causing representation ill-conditioning that are present in the tensor decomposition of $\hat M_{L,\alpha}$ are explicitly absorbed by the preconditioner and thus removed from the final representation.

5.1. Auxiliary results. In this section, for $\ell\in\mathbb{N}$, we present explicit joint representations of the identity matrix $\hat I_\ell$ and the shift matrix $\hat S_\ell$, given by (2.10e), and of the linear vectors $\hat\xi_\ell$ and $\hat\eta_\ell$, defined in (2.13). These representations will be presented here in terms of the following cores: those defined in (5.1), and $P = \begin{bmatrix}1 & 0\end{bmatrix}$.
Our derivations will also involve the square Kronecker-product matrices defined in (5.2) with $\ell\in\mathbb{N}$, and iterated strong Kronecker products such as $\hat U^{\bowtie(\ell-1)} = \hat U\bowtie\cdots\bowtie\hat U$ with $\ell - 1$ factors. We start with the following auxiliary result, which appeared in slightly different forms in [30,35]. The brief derivation, in the form given here, provides an illustration and simplifies further proofs given below.
Lemma 5.1. For every $\ell\in\mathbb{N}$, the matrices $\hat I_\ell$, $\hat S_\ell$ and $\hat J_\ell$, given by (2.10e) and (5.2), satisfy the joint representation (5.3), where the blocks $I$ and $J$ are given by (3.4) and the core $\hat U$ is as defined in (5.1).
Proof. For $\ell = 1$, the claim is trivial. Let us assume that $\ell > 1$. Then, splitting each of the matrices $\hat S_\ell$, $\hat I_\ell$ and $\hat J_\ell$ into four blocks, we obtain the recurrence relations (5.4). Using the core product, these relations can be recast as (5.5). Applying (5.5) recursively, we obtain (5.3).
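The recurrence can be checked numerically; in the sketch below (ours), we assume that $\hat S_\ell$ denotes the upper shift matrix of order $2^\ell$ and that $I$ and $J$ are the blocks from (3.4), so that one level of refinement corresponds to $\hat I_\ell = \hat I_{\ell-1}\otimes I$ and $\hat S_\ell = \hat I_{\ell-1}\otimes J + \hat S_{\ell-1}\otimes J^T$.

```python
# Sketch (ours): numerical check of the block recurrence behind Lemma 5.1,
# assuming S_l is the upper shift of order 2^l and I, J are as in (3.4).
import numpy as np

I2 = np.eye(2)
J = np.array([[0.0, 1.0], [0.0, 0.0]])
Ihat, Shat = I2.copy(), J.copy()          # level 1
for l in range(2, 7):
    Ihat, Shat = np.kron(Ihat, I2), np.kron(Ihat, J) + np.kron(Shat, J.T)
assert np.allclose(Ihat, np.eye(2**6))
assert np.allclose(Shat, np.eye(2**6, k=1))
print("recurrence verified up to level 6")
```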
As the following auxiliary result shows, a similar technique applies to cores whose blocks are vectors.
Proof. For $\ell = 0, 1$, the claim is trivial. Let us assume that $\ell > 1$. Splitting each of the vectors $\hat\xi_\ell$ and $\hat\eta_\ell$ into two blocks, we arrive at the recursion (5.7a), from which it is easy to see that (5.7b) holds. Using the core product, the relations (5.7a) and (5.7b) can be recast as (5.8). Applying (5.8) recursively and comparing $\hat\xi_1$ and $\hat\eta_1$ with the first column of the core $\hat X$, which is given by $\hat X P$, we obtain (5.6).
Proof. Consider L ∈ N and α ∈ {0, 1}. Immediately from (2.10d), we obtain a core-wise representation to which Lemma 5.1 applies; this yields the claimed decomposition in the case ℓ = L. Using that T̂_0^T T̂_0 = 2 Î, we obtain the claim for the remaining cases. □
Proof. We start by rewriting (2.12) in terms of the core product, where the middle core is to be omitted when ℓ = 0. Applying Lemma 5.1 (for ℓ > 0) and Lemma 5.2 to expand the middle and the last cores, we prove the claim. □

5.3. Explicit analysis of univariate factors under preconditioning.
Here, we obtain an optimal-rank representation of the product M_{L,α} P_{ℓ,L} and note how the products M_{L,α} P_{ℓ,L} P_{ℓ,L}^T and P_{ℓ,L} P_{ℓ,L}^T can be represented, all for L ∈ N, α ∈ {0, 1}^D and ℓ = 0, …, L. The optimal-rank representation of the product M_{L,α} P_{ℓ,L} is obtained in terms of the cores defined in (5.12).
The proof of the following lemma is rather technical and is therefore given in Appendix B.
Combining the decomposition (5.11) and its transpose, we can rewrite the product P_{ℓ,L} P_{ℓ,L}^T core-wise as in (5.14), where the factors are given by (5.15). We remark that the ranks of the decomposition (5.14) are 4, …, 4.
Direct calculation with the expressions given in (5.1)-(5.9) leads to (5.18) with Â = ( 1 0 ; 0 0 ). Explicit expressions for Û, X and Ẑ_α, K̂_α with α = 0, 1 can likewise be calculated based on (5.1) and (5.9); we refrain from stating them to keep the exposition concise.
5.4. Analysis in D dimensions by tensorization.
In this section, we generalize the results of Section 5.3 to the case of multiple dimensions and analyze the low-rank tensor structure of the preconditioner C_L, given by (2.28), and of the preconditioned discrete differential operator B_L in the form (2.31b). For the latter, we first derive a representation of the matrices Q_{L,α} with L ∈ N and α ∈ {0, 1}^D, which are defined in (2.31c).
The representations derived below are composed from the cores defined in (5.19) for all α ∈ {0, 1}^D, where the factors are given by (5.15) and (5.17). Tensorizing (5.14) core-wise and distributing the scaling factor over the cores, we obtain the decompositions (5.20), where the cores are given by (5.19) and the permutation matrix Π_L is as defined in (3.16). Applying [35, Lemma 5.5] to the sum of such matrices with ℓ = 1, …, L and adding the term corresponding to ℓ = 0, we obtain the following result.
Theorem 5.6. For any L ∈ N, the matrix C_L, defined by (2.28), admits the decomposition (5.21), where the middle cores are composed of the subcores given in (5.19).
For any L ∈ N, ℓ = 0, 1, …, L and α ∈ {0, 1}^D, tensorizing (5.16) core-wise and distributing the scaling factor over the cores results in the decompositions (5.22), where M_{L,α} and P_{ℓ,L} are given by (2.18c) and (2.20), the cores are given by (5.19) and the permutation matrices Π_L and Π_{L,α} are as defined in (3.16) and (3.18). Similarly as for C_L above, we can apply [35, Lemma 5.5] to the sum of the matrices given by (5.22) with ℓ = 1, …, L and add the term corresponding to ℓ = 0. This leads to the following result, which is analogous to Theorem 5.6.
In Example 2.1, the case of the Laplace operator was considered, and the factors Λ_{L,αα′} with (α, α′) ∈ D for the suitable D were explicitly given in the Kronecker product form (2.25a). That form immediately leads to a multilevel TT decomposition of ranks 1, …, 1 for each Λ_{L,αα′}. Here, we analyze the structure of Λ_{L,αα′} with (α, α′) ∈ D in the general setting of Section 2.3, for an arbitrary set D ⊂ {0, 1}^D × {0, 1}^D of differential indices, under the additional assumption that the coefficient functions (2.22b) exhibit low-rank structure.
Theorem 5.8. For D ⊂ {0, 1}^D × {0, 1}^D and L ∈ N, consider a bilinear form of the type (2.22a)-(2.22b), where each coefficient vector c_{L,αα′} with (α, α′) ∈ D admits a multilevel TT decomposition of the form (5.24a) with ranks r_{0,αα′}, …, r_{L,αα′} not exceeding r ∈ N. Then the preconditioned matrix B_L of a, defined by (2.23a), (2.28) and (2.31a), admits a multilevel TT decomposition of ranks R_0, …, R_L bounded as in (5.25).
Remark 5.9 (sharper bounds in specific cases). The last inequality of (5.25) is given for a general case with D^2 second-order terms (no symmetry is assumed), D first-order terms and a zero-order term. For the Laplacian in the case D = 2, however, the first equality given in (5.25) results in R = 1152, a marked reduction from the bound R ≤ 12288 obtained for a general second-order bilinear form with constant coefficients.
Remark 5.10 (inexact application). In computations, algorithms using products of B_L with vectors, rather than explicit representations of B_L, may be expected to be more efficient. Indeed, such products can be formed by adding the products of the terms in the sum (2.31b), and for each term the product can be computed by three multiplications. On the intermediate results obtained between these multiplications and additions, low-rank re-approximation can be performed, as explained further for the example of the discretized Laplacian in Section 5.5; a sketch of this pattern is given below. The given bounds for TT ranks appear to be highly pessimistic for such inexact schemes.
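As an illustration of this term-wise application with intermediate re-approximation, the following self-contained Python sketch folds a vector of length 2^L into an Lth-order tensor and rounds it by successive truncated SVDs of the unfoldings (a basic TT-SVD on dense arrays, for small L only); the function names and the dense stand-in operators C and A are ours, not part of the paper's implementation.

```python
import numpy as np

def tt_round(v, tol=1e-12):
    """TT-SVD rounding of a vector of length 2^L folded into an L-th order
    2 x ... x 2 tensor: successive truncated SVDs of the unfoldings.
    Returns the rounded vector and the TT ranks (dense-array sketch)."""
    L = int(np.log2(v.size))
    w = v.reshape(1, -1)
    Us, ranks = [], []
    for _ in range(L - 1):
        r, m = w.shape
        w = w.reshape(2 * r, m // 2)
        U, s, Vt = np.linalg.svd(w, full_matrices=False)
        k = max(1, int(np.sum(s > tol * s[0])))
        Us.append(U[:, :k]); ranks.append(k)
        w = s[:k, None] * Vt[:k]
    out = w
    for U in reversed(Us):                  # re-expand to a full vector
        out = (U @ out.reshape(U.shape[1], -1)).reshape(U.shape[0] // 2, -1)
    return out.reshape(-1), ranks

def apply_term(C, A, v, tol=1e-12):
    """One term of the operator sum: three products, with low-rank
    re-approximation of each intermediate result (cf. Remark 5.10)."""
    w, _ = tt_round(C @ v, tol)
    w, _ = tt_round(A @ w, tol)
    return tt_round(C @ w, tol)

# example: nodal values of sin(pi x) on 2^L points have TT ranks <= 2
L = 10; n = 2 ** L
v = np.sin(np.pi * np.arange(1, n + 1) / n)
_, ranks = tt_round(v)
print(max(ranks))   # -> 2
```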
Remark 5.11.The analysis in D dimensions is given here for the most generic discretization obtained by tensorization.The approach can be applied to discretizations that are not of tensor product form in order to mitigate the growth of the rank bounds with respect to D.

5.5. Numerical illustrations.
In summary, we obtain a combined tensor representation B_L of the preconditioned matrix B_L = C_L A_L C_L; similarly, from Theorem 5.6, we also have a representation C_L of the preconditioner. Instead of B_L, one can alternatively consider the simple product representation C_L • A_L • C_L, which corresponds to performing the action of the preconditioner C_L separately from that of A_L.
Note that, in Section 4.4, we have assumed decompositions consisting of L cores. The decompositions in Theorems 5.6, 5.7, and 5.8 comprise L + 2 cores, with the first and last playing special roles, since they can be merged with the respective adjacent cores. The cores in these extended decompositions are thus indexed by ℓ = 0, …, L + 1 in what follows, so that again the bounds for ℓ = 1, …, L are relevant.
One benefit of the combined representation B_L is the rank reduction compared to C_L • A_L • C_L. More importantly, however, the decomposition B_L is constructed so that the representation condition numbers mrcond_ℓ(B_L), ℓ = 1, …, L, remain moderate even for large L. In contrast, the representation condition numbers of C_L • A_L • C_L are in general of the same order of magnitude as those of A_L. In other words, whereas the matrix condition number of C_L A_L C_L is uniformly bounded, applying the preconditioner C_L separately is insufficient for improving also the representation condition number; for the latter, one needs the carefully constructed combined representation B_L.
We now present numerical observations that illustrate how different the decompositions A_L, C_L • A_L • C_L and B_L are in terms of representation conditioning, and demonstrate the improvement afforded by our findings presented in Sections 4, 5.2, and 5.4. As in Example 2.1, we consider the case of the Laplacian: A_L = D_L with D_L as in (A.1), in the explicit representation derived in [35]; similar representations can be obtained for D > 1 by tensorization.
We first consider the upper bounds β_ℓ, defined in (4.14), for mramp_ℓ from Proposition 4.9. Since both A_L^{-1} and B_L^{-1} are bounded independently of L, by (4.15), up to fixed constants the respective β_ℓ are also upper bounds for the corresponding representation condition numbers mrcond_ℓ.
For B_L, instead of directly computing the estimates for mramp_ℓ(B_L) with ℓ = 1, …, L given by Proposition 4.9, we do this for the factors of a decomposition that is equivalent to B_L and is also based on (2.31b). Note that the equality (5.26) of decompositions holds in terms of factors Θ_{L,k} with k = 1, …, D given as follows: for every k, we set Θ_{L,k} in terms of the square root of the corresponding factor Λ_{L,α,α′}, which is diagonal and of Kronecker product form (2.25a); its decomposition with ranks 1, …, 1 is thus obtained by element-wise application of the square root to each core. The equality (5.26) results in the second of the inequalities in (5.27), where the equivalence is uniform with respect to L ∈ N and, for each L ∈ N, the β_ℓ with ℓ = 1, …, L are as defined in (4.14). Besides its use in (5.27), the alternative form (5.26) of B_L is also used to improve the efficiency of residual approximation in the numerical tests of Section 7.
Figure 1(a) shows the computed values of max_ℓ β_ℓ(Θ_{L,1}) for different values of L and D = 1, 2. In contrast, as also shown in Figure 1, the corresponding bounds for the decompositions without the combined construction grow exponentially with respect to L. Although Proposition 4.10 shows that they can lead to useful qualitative statements, the upper bounds provided by β_ℓ cannot be expected to be quantitatively sharp. The direct evaluation of the suprema in the definitions (4.13) is in general infeasible, but testing with concrete V ∈ TT_{2^L} can provide some further insight. For D = 1, we use TT-SVD representations V_1, V_min, V_max (of maximum ranks 1, 2, and 2, respectively) of the constant vector and of the vectors x_{min,L} and x_{max,L} considered above. Consequently, as in the examples of Section 4.1, for each such choice of V and any representation of a matrix M, the absolute and relative errors incurred by the orthogonalization of M • V give an indication of the order of magnitude of ramp_ℓ(M • V) and rcond_ℓ(M • V).
The results are summarized in Tables 3 and 4. We see that in all cases, the absolute and relative errors for B_L are close to machine precision ε ≈ 2.2 × 10^{−16}, which is quantitatively better than indicated by the upper bounds in Figure 1. For A_L and C_L • A_L • C_L, we observe an amplification of relative errors that is exponential in L (and in fact slightly worse for C_L • A_L • C_L). The absolute errors for C_L are close to ε, which is important for the evaluation of preconditioned right-hand sides; the corresponding relative errors increase with L in the case of V_max, which is to be expected since C_L damps high-frequency oscillations.

Complexity of Solvers
We now consider the numerical computation of u_L solving (2.31a). Here the objective is to find u_ε ∈ V_{L(ε)} such that ‖u − u_ε‖_{H^1} ≲ ε, and we obtain an estimate for the computational complexity of achieving this goal, with B_L in the representation constructed in Section 5.5, which is well-conditioned up to a constant independent of L; see (5.27). Assuming that L(ε) ∼ |log ε| is suitably chosen a priori and that the TT singular values of u_L satisfy a natural decay estimate, we show that the number of arithmetic operations for computing a tensor train representation of u_ε is of order O(|log ε|^θ), where θ > 0 depends only on the low-rank approximability of the u_L.
Remark 6.1. The methods we consider rely on the accurate evaluation of residuals B_L v − C_L f_L. As we have seen in Section 5.5, for the representations B_L and C_L of B_L and C_L that we have constructed, the quantities mramp_ℓ(B_L) and mramp_ℓ(C_L) grow only moderately with respect to L. Indeed, the results of Table 3 indicate that, provided that v and f_L are given in well-conditioned representations, the corresponding residuals can be evaluated with an absolute error close to machine precision; this is corroborated also by our further numerical tests in Section 7. For the convergence analysis of this section, we assume exact arithmetic.
6.1. Estimates of ranks and computational costs.
To estimate the computational complexity of finding approximate solutions, we use the quasi-optimality properties of an iterative
method using soft thresholding of hierarchical tensors introduced in [6]. This construction directly carries over to the special case of the TT format, leading to a soft thresholding operation S_α that is non-expansive with respect to the ℓ_2-norm. It can be realized numerically for TT representations, as described in [6, Sec. 3], at essentially the same cost as the TT-SVD. Note that since B_L is well-conditioned uniformly with respect to L, as a consequence of Theorem 2.3 we can choose ω > 0 such that ξ = sup_{L>0} ‖I − ωB_L‖ satisfies ξ < 1. The basic iterative method applied to the present problem has the form
(6.1) u_L^{n+1} = S_{α_n}( u_L^n − ω (B_L u_L^n − g_L) ), n ≥ 0,
with u_L^0 = 0 and α_n → 0 determined (according to [6, Alg. 2]) as follows: set α_0 = ω ‖g_L‖_2/(d − 1), and for a fixed B > ‖B_L‖_{2→2}, take the update given in (6.2). In what follows, we refer to the algorithm given by (6.1), (6.2) as STSolve.
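The following Python sketch mimics (6.1) in a dense stand-in setting: the iterate is folded into a matrix and S_α is realized by soft shrinkage of its singular values (the two-core analogue of the TT realization), while the update of α_n is a simplified placeholder rather than the exact rule (6.2); all names and the test matrix are ours.

```python
import numpy as np

def soft_threshold(v, alpha, shape):
    """S_alpha for a vector folded into a matrix: soft shrinkage of the
    singular values (matrix-case stand-in for the TT realization)."""
    U, s, Vt = np.linalg.svd(v.reshape(shape), full_matrices=False)
    s = np.maximum(s - alpha, 0.0)
    return (U @ (s[:, None] * Vt)).reshape(-1)

def st_solve(B, g, omega, shape, tol=1e-10, max_iter=20000):
    """Soft-thresholded Richardson iteration, cf. (6.1). For a matrix (d = 2)
    the rule alpha_0 = omega*||g||/(d-1) gives omega*||g||; the halving below
    is a simplified stand-in for the update rule (6.2)."""
    u = np.zeros_like(g)
    alpha = omega * np.linalg.norm(g)
    for _ in range(max_iter):
        res = B @ u - g
        if np.linalg.norm(res) <= tol:
            break
        u_new = soft_threshold(u - omega * res, alpha, shape)
        if np.linalg.norm(u_new - u) <= alpha / 4:   # stagnation: shrink alpha
            alpha *= 0.5
        u = u_new
    return u

# well-conditioned SPD test matrix with spectrum in [1, 3], so omega = 1/2
# gives ||I - omega B|| = 1/2 < 1
rng = np.random.default_rng(1)
n = 64
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
B = Q @ np.diag(np.linspace(1.0, 3.0, n)) @ Q.T
g = rng.standard_normal(n)
u = st_solve(B, g, omega=0.5, shape=(8, 8))
print(np.linalg.norm(B @ u - g))
```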
Recall that u_L = Σ_{j∈J_L} (C_L u_L)_j φ_{L,j}, with analogous notation for the iterates, where ‖u_L‖_V ∼ ‖u_L‖_2. Our convergence analysis is based on the following assumption of uniform decay of singular values, which is discussed further in Section 6.2.
Assumption 6.2. For all L ∈ N and ℓ = 1, …, L, let the singular values σ_{ℓ,j}(u_L) with j = 1, …, 2^{D max(ℓ, L−ℓ)} of the ℓth unfolding matrix U^{(ℓ)}(u_L), defined as in (3.11a), satisfy a decay bound with constants C, c, β > 0 independent of ℓ and L; a representative form of such a bound is sketched below.
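A bound of the following form is consistent with the named constants C, c, β and with the complexity exponent θ discussed above; we state it here only as a representative instance of the assumed decay, not as a verbatim quotation of the assumption:

```latex
\sigma_{\ell,j}(\mathbf{u}_L)\;\le\; C\,\exp\!\bigl(-c\,j^{\beta}\bigr),
\qquad j = 1,\dots,2^{D\max(\ell,\,L-\ell)}.
```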
Theorem 6.3. Let ε > 0. Then STSolve stops after finitely many steps with an iterate u_{L,ε} satisfying the prescribed residual bound. In addition, let Assumption 6.2 hold. Then there exist c_1, c_2 > 0 and ρ ∈ (0, 1), independent of L and n, such that the ranks and errors of the iterates satisfy the corresponding geometric bounds with these constants.
Proof. This is the statement of [6, Thm. 5.1(ii)] applied to our setting, combined with [6, Rem. 5.6] concerning the dependence of ε_n on L. □
Remark 6.5 (complexity bounds). If B_L has fixed representation ranks, as in the case of the Laplacian, the costs of each step are dominated by those of applying S_{α_n}, which are of order O(L (max_ℓ rank_ℓ(u_L^n))^3). By Corollary 6.4, the total number of operations for the N steps needed to guarantee an H^1-error of order ε is thus bounded by a fixed power of 1 + |log ε|, as stated in (6.4), with a uniform constant C > 0.
In cases with variable coefficients, such that B_L does not have an exact low-rank form but needs to be applied approximately, the iteration given in (6.1) and (6.2) can be adapted to residual approximations with prescribed tolerance as given in [6, Alg. 3], which preserves the statement of Theorem 6.3, as shown in [6, Prop. 5.9]. Depending on the L- and ε-dependent rank bounds for B_L, one may then obtain additional factors in the estimate (6.4).
Remark 6.6. Complexity estimates are also given in [4] for a similar iterative method based on hierarchical SVD truncation (which in the present setting translates to a direct TT-SVD truncation). A simplified version of this method operating on fixed discretizations is given in [6, Alg. 4]. Based on the theory for this method, one can also derive rank and complexity bounds similar to (6.4), but with a less favorable exponent: one arrives at a number of operations bounded by C(1 + |log ε|)^t for some C > 0, where t > 0 now depends on the representation ranks and condition number of B_L, and the bound can be substantially worse than (6.4). The practical performance of the scheme from [4], however, tends to be comparable to that of STSolve considered above.
Remark 6.7. Alternatively, the linear systems B_L u_L = g_L can be solved by the AMEn methods introduced in [17]. The basic version analyzed in [17, Sec. 5] relies on residual approximations of a certain quality and increases the approximation ranks in each iteration. However, the available convergence results only lead to a complexity bound that increases faster than exponentially in L. In the practical implementation that we also consider for comparison in Section 7, the basic method is combined with a faster heuristic residual approximation scheme based on the alternating least squares (ALS) method and with additional rank reduction steps. Although no convergence analysis is available for this version, the method performs well in our tests with well-conditioned B_L.
6.2. Low-rank approximability assumptions.
For the case of one or two dimensions, a low-rank approximation analysis for the solution of the problem (2.2) under certain analyticity assumptions on the coefficients and right-hand side, following from the regularity analysis developed in [2, 3], is available in [28, 33, 34]. The following result can be obtained as an immediate consequence of [34, Theorem 5.16].
Numerical Experiments
In the results given below, iteration numbers for AMEn need to be interpreted differently from those for STSolve: each AMEn iteration in the convergence plots comprises several substeps with local residual evaluations for each core.
7.1. Results without preconditioning.
We first illustrate the results obtained by a direct application of multilevel tensor representations of stiffness matrices A_L without preconditioning. Such representations have been derived, for instance, in [35]. In the present case of mixed Dirichlet and Neumann boundary conditions, this leads to representations similar to the pure Dirichlet case in (4.3). Here we consider the case D = 1, where for simplicity we take the reaction coefficient c = 0 and the right-hand side f = 1, that is, we solve the weak formulation of
(7.1) −u″ = 1 on (0, 1), u(0) = 0, u′(1) = 0.
Using AMEn directly with the system matrix A_L and right-hand side f_L, we observe that the resulting residual indicators stagnate at values above 2^{2L} ε, where ε ≈ 2.2 × 10^{−16} is the relative machine precision (for L = 30, for instance, 2^{2L} ε ≈ 2.5 × 10^2). This is to be expected in view of the matrix and representation ill-conditioning of A_L.
If we instead implement the preconditioned matrix C_L A_L C_L by pre- and post-multiplying with a separate tensor representation C_L of the preconditioner, we still obtain essentially the same type of stagnation at approximately 2^{2L} ε. Since the represented matrix C_L A_L C_L is now well-conditioned, these remaining catastrophic round-off errors and the resulting stagnation are entirely due to representation ill-conditioning, which is not removed by simply multiplying by the preconditioner. This effect is observed both with AMEn and with STSolve. The results are shown in Figure 3, with the residual values computed with respect to the system matrices A_L and C_L A_L C_L, respectively.
7.2. Constant-coefficient diffusion, D = 1.
We now consider the same basic test case (7.1), but with B_L = C_L A_L C_L in the combined tensor representation constructed in Section 5. In this and the following tests, residual values always refer to the preconditioned residuals ‖B_L • V − g_L‖_2 of the iterates V, which are proportional to the H^1-errors of the corresponding grid functions. With a target residual of 10^{−12}, both AMEn and STSolve converge unaffected by any round-off errors for very large values of L. Indeed, this remains true for values of L that are substantially larger than the case L = 50 shown here; but since the corresponding mesh widths are then smaller than machine precision, the results are more difficult to interpret.
For the AMEn solver, we assemble the complete representation of B_L. In exact arithmetic, this would in fact be equivalent to applying the representations A_L and C_L separately, and the differences from the previous case are entirely due to the different tensor decompositions. With STSolve, we have the additional option of using error-controlled inexact residual evaluations as in [6, Alg. 3] to reduce the arising ranks of intermediate results; as shown in [6, Prop. 5.9], the statement of Theorem 6.3 still applies to this modification. To this end, we use that the tensor representation can be directly rewritten in the form B_L = Θ_{L,1}^T Θ_{L,1} as in (5.26), where Θ_{L,1} is uniformly bounded with respect to L, and apply an additional recompression by TT-SVD after applying Θ_{L,1}.
7.3. Highly oscillatory diffusion coefficients, D = 1.
We next consider the family of problems with oscillatory diffusion coefficients on Ω = (0, 1) given by (7.2) for large values of K, for which the exact solution is known explicitly. For K ∈ 4N, we represent the vectors u_ex and v_ex of nodal values of u and u′ in the multiscale TT format with ranks bounded by seven and six, respectively, as in (7.3). The coefficient a_K does not have an explicit low-rank form, and we compute approximations as follows: using the explicit rank-three representation of c(x) = 2 + cos(Kπx), we solve with STSolve the equations c(x_i) a_K(x_i) = 1 in the points x_i = 2^{−L}(i − 1/2), i = 1, …, 2^L, as an elliptic problem on ℓ_2({1, …, 2^L}) for a_K; the tolerance is chosen to ensure a sufficient uniform error bound.
We compare the results for the values K = 2^{10}, 2^{20}, 2^{30}, 2^{40} with L = 50 in Figure 5. The observed convergence patterns of both methods show hardly any influence of the value of K. Note that the computed preconditioned coefficients u_L do not satisfy the same rank bound as (7.3) (which holds for C_L u_L, the corresponding vector of scaled nodal values). In each case, comparison with the explicit low-rank forms of u_ex and v_ex shows that the expected total error bounds are achieved.
More specifically, approximations of the H^1-error of the solutions can be obtained in a numerically stable way by evaluating ‖u_ex − C_L u_L‖_2 and ‖v_ex − Θ_{L,1} u_L‖_2, where Θ_{L,1} is the factor of the preconditioned Laplacian stiffness matrix as in Section 7.2. In Table 5, we summarize the obtained approximations of H^1-errors for different solver tolerances and parameters L. We observe an effect that is particular to the present one-dimensional setting: the accuracy in the nodal values is limited only by the solver tolerance as soon as L is sufficiently large to resolve the oscillations in the solution. Similarly to Section 7.2, STSolve is used with inexact residual evaluation, now using that the tensor representation of B_L can be written in the form B_L = Θ_{L,1}^T Θ_{L,1} + Θ_{L,2}^T Θ_{L,2} as in (5.26); a sketch of this pattern is given below. Here Θ_{L,1} and Θ_{L,2} are uniformly bounded, and each has maximum representation rank 24. Although these ranks remain independent of L, additional rank reductions in this decomposition are important from a quantitative point of view: since B_L has maximum representation rank 1152, applying it directly would lead to very large ranks. In the available version of AMEn, the decomposition of B_L needs to be used directly, but the impact of large residual ranks is limited due to the ALS-type residual approximation. In this case, the main downside of the direct assembly of B_L is the higher memory requirement for large L.
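A schematic version of this inexact residual evaluation, reusing the tt_round stand-in from the sketch after Remark 5.10 (the factor matrices passed in are likewise stand-ins, not the paper's Θ_{L,k}), could look as follows:

```python
import numpy as np  # tt_round as defined in the sketch after Remark 5.10

def residual(theta_factors, g, u, tol=1e-14):
    """Evaluate B u - g for B given as a sum of Th^T Th over the factors Th
    (a stand-in for the forms (5.26) used in Sections 7.2 and 7.3),
    recompressing after each factor application to keep ranks moderate.
    Returns the rounded residual vector and its TT ranks."""
    r = -np.asarray(g, dtype=float)
    for Th in theta_factors:
        w, _ = tt_round(Th @ u, tol)      # apply factor, then recompress
        v, _ = tt_round(Th.T @ w, tol)    # apply transpose, then recompress
        r = r + v
    return tt_round(r, tol)
```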
In terms of computational costs, the error-controlled full residual approximation used by STSolve is substantially more expensive in all considered tests than the heuristic ALS-based residual approximation used by AMEn. The precise CPU timings are of limited significance due to the different implementations, but we observe running times on the order of several minutes with STSolve and of seconds with AMEn in the tests with D = 1, and of several hours with STSolve and several minutes with AMEn in the case of D = 2. Although no convergence analysis is available for this AMEn implementation, it is thus an interesting practical choice, especially for the present well-conditioned representations.

Conclusion and Outlook
We have identified notions of condition numbers of tensor representations that determine the propagation of errors in numerical algorithms.In the application to multilevel tensor-structured discretizations of second-order elliptic PDEs, the careful construction of tensor representations of preconditioned system matrices guided by these notions leads to solvers that remain numerically stable also for very large discretization levels.For one such method based on soft thresholding of tensors, we have shown that the total number of arithmetic operations scales like a fixed power of the logarithm of the prescribed bound on the total solution error.
The new variant of BPX preconditioning that we have analyzed leads to a very natural low-rank structure of the symmetrically preconditioned stiffness matrix. Remarkably, unlike the rank increase with discretization levels observed in the case of separation of spatial coordinates [5], in the present case of tensor separation of scales we obtain preconditioner representation ranks that remain uniformly bounded with respect to the discretization level. Similar results can be obtained for related preconditioners based on wavelet transforms, which are the subject of ongoing work.
For the preconditioned solvers, the relevant approximability properties of solutions we have identified are slightly different from the ones for nodal basis coefficients studied, e.g., in [34].The numerically observed favorable decay of TT singular values of preconditioned quantities thus requires further investigation; it also depends on the particular choice of preconditioner.
The practical application to more general problems was not considered here to avoid further technicalities, but one can similarly treat different boundary conditions, more general coefficients (such as highly oscillatory diffusion coefficients in D > 1) or more general domains by techniques developed in [28]. We also expect that our basic considerations concerning the combined low-rank representations of preconditioners and discretization matrices of differential operators can be applied, with potentially more technical effort, to other types of basis expansions and to different classes of PDE problems.
Although the representation ranks of preconditioned matrices that we obtain are bounded independently of the discretization level, they are fairly large for D > 1.This suggests the further investigation of solvers with improved quantitative performance, in particular the combination of AMEn-type methods with efficient residual approximation strategies for preconditioned operator representations.
We expect that the framework we have proposed here for studying the conditioning of tensor representations can be developed further to provide more detailed information, as well as sharper computable bounds for representations of matrices.
This shows the lower bound in (2.29).
Arguing along similar lines to obtain the upper bound in (2.29) would lead to a constant depending linearly on L, and we thus turn to a different approach using Lemma A.1. Let R_ℓ = P_{ℓ,L} (P_{ℓ,L}^T P_{ℓ,L})^{−1} P_{ℓ,L}^T be the discrete orthogonal projector onto V_ℓ. For any w ∈ V_L, setting w_0 = R_0 w and w_ℓ = (R_ℓ − R_{ℓ−1}) w for ℓ = 1, …, L, we obtain the decomposition w = Σ_{ℓ=0}^{L} w_ℓ. We thus arrive at the upper bound in (2.29), completing the proof of Theorem 2.4. □
Remark A.2.Although we have used some simplifications due to the tensor structure in our particular setting, the proof of Theorem 2.4 carries over to more general hierarchies of finite element spaces, provided that one can establish a corresponding strengthened Cauchy-Schwarz inequality as in (A.3), see, e.g., [9,52,54].
For any scalar coefficients α, β and blocks or subcores V_11, V_12, V_21, V_22, consider the identities (B.1a)-(B.1c). When the partitioning shown in (B.1a)-(B.1c) is in terms of blocks (which, by our identification convention, are subcores of rank 1 × 1), the rank of the product is 2 × 2. The left-hand side of (B.1a) and the right-hand side of (B.1c) represent this core "in the TT format", which has only one rank parameter and happens to be nothing other than a low-rank matrix factorization in these two cases. The "ranks" of the first decomposition, equal to 2, are larger than the "ranks" of the last decomposition, equal to 1.
The TT representation (B.1b) consists of three cores and has ranks 2, 1.However, all mode indices of its middle core are dummy indices (the mode size of the middle core is 1 × 1), so the middle core can be merged with either of the neighboring cores without changing the decomposition scheme (by the latter we mean the set and the ordering of the variables separated by the TT format).
Rewriting the matrix multiplication core-wise, we combine the rank-two decompositions given by (B.2a)-(B.2b) into a rank-four decomposition (B.2c) for the product, where Â and Ŵ_α with α = 0, 1 are as in (5.15) and (5.17) and E = Î • P, Û = Û • Û and Ŷ = V • X are newly introduced cores. Direct calculation with the expressions given in (5.1), (5.9) and (5.12) yields explicit forms in terms of the blocks I, I_1, I_2 and J defined in (3.4). This proves the claim in the case α = 0, since M̂_0 = N̂_0 by (5.9) and (5.12).
Sweeping from level L to level ℓ. In the decomposition (B.2e), the ranks involved in the core products to the right of T̂_0 (in particular, those bounding the ranks of the unfolding matrices with indices ℓ, …, L − 1 + α) are all equal to two. To prove the claim, it remains to consider the case α = 1 and obtain a reduced decomposition in which those ranks are all equal to one instead of two. To this end, we note that Ŷ_0 M̂_1 = M̂_1 Ŷ_1 = M̂_1 Ŷ_1 N̂_1 and T̂_0 M̂_1 = T̂_1. Applying these relations to (B.2e), we obtain the claim in the case α = 1. □

Definition 3.2 (mode product of cores). Let p, p′, r, r′ ∈ N and m, n, k ∈ N. Consider cores A and B of ranks p × p′ and r × r′ and of mode sizes m × k and k × n, respectively. Provided that the second mode size of A equals the first mode size of B, the mode core product A • B of A and B is the core of ranks pr × p′r′ and mode size m × n given, in terms of the matrix multiplication of blocks (of sizes m × k and k × n), in the case p = p′ = r = r′ = 2 by
  [ A_11 B_11  A_11 B_12  A_12 B_11  A_12 B_12
    A_11 B_21  A_11 B_22  A_12 B_21  A_12 B_22
    A_21 B_11  A_21 B_12  A_22 B_11  A_22 B_12
    A_21 B_21  A_21 B_22  A_22 B_21  A_22 B_22 ].
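As a concrete rendering of this definition, the following sketch (with our own data layout: a core of ranks p × p′ and mode size m × n stored as a 4-way array indexed by rank row, rank column, mode row, mode column) assembles exactly the block array displayed above:

```python
import numpy as np

def mode_core_product(A, B):
    """Mode core product of Definition 3.2: cores A (ranks p x p', mode size
    m x k) and B (ranks r x r', mode size k x n) yield a core of ranks
    pr x p'r' and mode size m x n; block ((i,j),(i',j')) is A[i,i'] @ B[j,j']."""
    p, p_, m, k = A.shape
    r, r_, k_, n = B.shape
    assert k == k_, "second mode size of A must equal first mode size of B"
    C = np.einsum('abmk,cdkn->acbdmn', A, B)   # multiply blocks pairwise
    return C.reshape(p * r, p_ * r_, m, n)

# for cores of ranks 1 x 1 (plain blocks) this reduces to the ordinary
# matrix product of the blocks
A = np.arange(8.0).reshape(1, 1, 2, 4)
B = np.arange(12.0).reshape(1, 1, 4, 3)
assert np.allclose(mode_core_product(A, B)[0, 0], A[0, 0] @ B[0, 0])
```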


Figure 3. Results for Section 7.1, computed residual bounds in dependence on the iteration count: (a) AMEn applied directly to A_L; (b) AMEn with directly multiplied C_L A_L C_L; (c) STSolve with directly multiplied C_L A_L C_L; each for L = 10, 15, 20, 25, 30 (by increasing line thickness).

Figure 6. Results for Section 7.4: residual bounds (black) and maximum approximation ranks (grey), for the well-conditioned representation of B_L, L = 50.

Sweeping from level L to level 1. Let us define the auxiliary cores C and G, and note that the second and fourth rows in each of the cores E and Ŷ C are equal. This implies that E = C E and Ŷ C = C Ŷ C. Further, in each of the cores Ŵ_0 C and Û, the last row is zero, so that Ŵ_0 C = G Ŵ_0 C and Û = G Û. These equalities allow us to sweep the cores C and G through the last L − ℓ and the first ℓ levels, respectively: starting from (B.2c), we obtain
(B.2d) N_{ℓ,L,α} = c_{ℓ,L} Â Û^{ℓ} Ŵ_0 Ŷ^{(L−ℓ)} C E M̂_α = c_{ℓ,L} Â Û^{ℓ} Ŵ_0 C (Ŷ C)^{(L−ℓ)} E M̂_α = c_{ℓ,L} Â Û^{ℓ} G Ŵ_0 C (Ŷ C)^{(L−ℓ)} E M̂_α = c_{ℓ,L} Â (Û G)^{ℓ} Ŵ_0 C (Ŷ C)^{(L−ℓ)} E M̂_α.
Sweeping from level 1 to level L. Further, we notice that the cores involved satisfy the relations Â = Â F, F Û G = Û F, F Ŵ_0 C = T̂_0 H, H Ŷ C = Ŷ_0 H and H E = Î. These relations allow us to sweep the cores F and H through the first ℓ and the last L − ℓ levels, respectively: continuing from (B.2c), we derive
(B.2e) N_{ℓ,L,α} = c_{ℓ,L} Â F (Û G)^{ℓ} Ŵ_0 C (Ŷ C)^{(L−ℓ)} E M̂_α = c_{ℓ,L} Â Û^{ℓ} F Ŵ_0 C (Ŷ C)^{(L−ℓ)} E M̂_α = c_{ℓ,L} Â Û^{ℓ} T̂_0 H (Ŷ C)^{(L−ℓ)} E M̂_α,
and sweeping H further to the right through the remaining L − ℓ levels via the relations above yields the reduced form.