Truncation of tensors in the hierarchical format

Tensors are in general large-scale data which require a special representation; such a representation is also called a format. After mentioning the r-term and the tensor subspace format, we describe the hierarchical tensor format, which is the most flexible one. Since operations with tensors often produce tensors of larger memory cost, truncation to reduced ranks is of utmost importance. The so-called higher-order singular-value decomposition (HOSVD) provides a safe truncation with explicit error control. The paper explains in detail how the HOSVD procedure is performed within the hierarchical tensor format. Finally, we state special favourable properties of the HOSVD truncation.


Introduction
In the standard case, tensors of order d are quantities v indexed by d indices, i.e., the entries of v are v[i_1, ..., i_d], where, e.g., all indices run from 1 to n. Hence the data size is n^d. This shows that even moderate values of n and d yield a huge number, so that it is impossible to store all entries. Instead one needs a data-sparse tensor representation. In this paper we mention such representations. The optimal one is the hierarchical representation explained in Sect. 4. A slight generalisation is the tree-based format described in Falcó et al. [3].
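As a back-of-the-envelope illustration (the numbers n = 100, d = 10 are our own choice, not from the paper), the entry count n^d can be computed directly:

```python
# Entry count of a full order-d tensor with n values per index: n**d.
# Already for moderate n and d this exceeds any realistic memory.
n, d = 100, 10
full_entries = n ** d                # number of entries of the full tensor
bytes_needed = 8 * full_entries      # assuming 8-byte floating-point entries
print(full_entries)                  # prints 100000000000000000000
```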
Section 2 contains an introduction to tensor spaces and the notation used. We mainly restrict ourselves to the finite-dimensional case, in which we do not have to distinguish between algebraic and topological tensor spaces. The latter tensor spaces are discussed in [3]. Since true tensor spaces (those of order d ≥ 3) have less pleasant properties than matrices (which are tensors of order 2), one tries to interpret tensors as matrices. This leads to the technique of matricisation explained in Sect. 2.2. The range of the obtained matrices defines the minimal subspaces introduced in Sect. 2.3. The dimension of the minimal subspaces yields the associated ranks. The singular-value decomposition applied to the matricisations leads to the so-called higher-order singular-value decomposition (HOSVD), which will be important later. Finally, in Sect. 2.5, we discuss basis transformations.
In Sect. 3 we briefly discuss two classical representations of tensors: the r-term format (also called the CP format) and the tensor subspace format (also called the Tucker format). For the latter format the HOSVD is explained in Sect. 3.3: instead of applying the HOSVD to the full tensor, we can apply it to the smaller core tensor. As a result of the HOSVD we can introduce special HOSVD bases. These bases allow a simple truncation to smaller ranks (i.e., the data sparsity is improved; cf. Sect. 3.4).
Section 4 is devoted to the hierarchical tensor format. In principle it is a recursive application of the tensor subspace format. It is connected with a binary tree. The generalisation to a general tree yields the tree-based format in [3]. However, for practical reasons one should use a binary tree. A key point is the indirect coding of the bases discussed in Sect. 4.2. As a result, only transfer matrices are stored instead of large-sized basis vectors. Basis transformations can be performed by simple matrix operations (cf. Sect. 4.3). Similarly, the orthonormalisation of the bases is performed by an orthonormalisation of the transfer matrices (cf. Sect. 4.4). The main challenge is the computation of the HOSVD bases. As demonstrated in Sect. 4.5, one can obtain these bases by singular-value decompositions involving only the transfer matrices. The corresponding truncation can be performed as in the previous section (cf. Sect. 4.6).
The SVD truncation to lower ranks can be regarded as a projection onto smaller subspaces. However, different from general projections, the SVD projection has particular properties which are discussed in the final Sect. 5. In Sect. 5.1 we consider the case of the tensor subspace representation of Sect. 3.2. It turns out that certain properties of the given tensor (e.g., side conditions or smoothness properties) are inherited by the projected (truncated) approximation. As proved in Sect. 5.2, the same statement holds for the best approximation in the format of lower ranks. Finally, this statement is generalised to the hierarchical tensor representation.

Definitions, notation
Let V_j (1 ≤ j ≤ d) be arbitrary vector spaces over the field K, where either K = R or K = C. Then the algebraic tensor space V := a⊗_{j=1}^d V_j consists of all (finite) linear combinations of elementary tensors ⊗_{j=1}^d v^(j) (v^(j) ∈ V_j). The algebraic definition of V and of the tensor product is based on the following universal property (cf. Greub [4, Chap. I, §2]): for any vector space U over K and any multilinear mapping ϕ : V_1 × ... × V_d → U, there exists a unique linear mapping Φ : V → U such that ϕ(v^(1), v^(2), ..., v^(d)) = Φ(v^(1) ⊗ v^(2) ⊗ ... ⊗ v^(d)). In the case of infinite-dimensional tensor spaces, one can equip the tensor space with a norm ‖·‖. The completion with respect to ‖·‖ yields the topological tensor space (cf. Hackbusch [5, §4]). In this article, we restrict ourselves to the finite-dimensional case. Then the algebraic tensor space introduced above is already complete with respect to any norm and therefore coincides with the topological tensor space. This fact allows us to avoid the affix 'a' in V = a⊗_{j=1}^d V_j; instead, V = ⊗_{j=1}^d V_j is sufficient. The simplest example of a tensor space is based on the vector spaces V_j = K^{n_j}, where the vectors v ∈ K^{n_j} are indexed by i ∈ I_j := {1, ..., n_j}. Instead of K^{n_j} we also write K^{I_j}.

Then the elementary product v = ⊗_{j=1}^d v^(j) has the entries v[i_1, ..., i_d] = ∏_{j=1}^d v^(j)[i_j] for all i = (i_1, ..., i_d) ∈ I := I_1 × ... × I_d. Therefore the tensor space V is isomorphic to K^I.
The second example is based on the matrix spaces U_j = K^{m_j × n_j}. Then the tensor space U := ⊗_{j=1}^d U_j can be interpreted as follows. Set V = ⊗_{j=1}^d V_j with V_j = K^{n_j} as above, while W = ⊗_{j=1}^d W_j is generated by W_j = K^{m_j}. Matrices M_j ∈ U_j define linear maps belonging to the vector space L(V_j, W_j). Now the elementary tensor M := ⊗_{j=1}^d M_j ∈ U can be regarded as a linear map in L(V, W) defined by

M (⊗_{j=1}^d v^(j)) = ⊗_{j=1}^d (M_j v^(j)) for all v^(j) ∈ V_j. (2)

The tensor product ⊗_{j=1}^d M_j of matrices is also called the Kronecker product. In the finite-dimensional case, U coincides with L(V, W).
The definition of V = ⊗_{j=1}^d V_j by all linear combinations of elementary tensors ensures that any v ∈ V has a representation

v = Σ_{ν=1}^r ⊗_{j=1}^d v_ν^(j) with v_ν^(j) ∈ V_j. (3)

The tensor rank of v is the smallest possible integer r in (3). It is denoted by rank(v). If d = 2 and V_j = K^{n_j}, the tensor space V = V_1 ⊗ V_2 is isomorphic to the matrix space K^{n_1 × n_2}. The elementary tensor v ⊗ w corresponds to the rank-1 matrix v w^T. In this case the tensor rank coincides with the usual matrix rank. d = 1 is the trivial case where V = ⊗_{j=1}^d V_j coincides with V_1. For d = 0, the empty product V = ⊗_{j=1}^d V_j is defined by the underlying field K.

Remark 1 The dimension of V = ⊗_{j=1}^d V_j is ∏_{j=1}^d dim(V_j). The spaces (U ⊗ V) ⊗ W, U ⊗ (V ⊗ W), and (U ⊗ W) ⊗ V are isomorphic as vector spaces. Here (U ⊗ V) ⊗ W is the tensor space of order 2 based on the vector spaces X := U ⊗ V and W. However, these spaces are not isomorphic as tensor spaces; for instance, they have different elementary tensors. V = ⊗_{j=1}^d V_j and W = ⊗_{j=1}^d W_j are isomorphic as tensor spaces if, for all 1 ≤ j ≤ d, the vector spaces V_j and W_j are isomorphic.

Matricisation
Within the theory of tensor spaces, the matrix case corresponding to d = 2 is exceptional. This means that most properties of matrices do not generalise to tensors of order d ≥ 3. An example is the tensor rank for d ≥ 3: in general its determination is NP-hard (cf. Håstad [9]). Tensors in ⊗_{j=1}^d R^{n_j} can also be regarded as elements of ⊗_{j=1}^d C^{n_j}, but the corresponding tensor ranks may differ. Matrix decompositions like the Jordan normal form or the singular-value decomposition have no equivalent for d ≥ 3.
To overcome these difficulties one may try to interpret tensors as matrices. According to Remark 1, for all 1 ≤ j ≤ d, the tensor space V = ⊗_{k=1}^d V_k is isomorphic to V_j ⊗ V_[j], where V_[j] := ⊗_{k≠j} V_k. Here k ≠ j means k ∈ {1, ..., d}\{j}. The vector space isomorphism M_j maps ⊗_{k=1}^d v^(k) into v^(j) ⊗ v^[j] with v^[j] = ⊗_{k≠j} v^(k); it is called the j-th matricisation. In the case of V_k = K^{I_k}, M_j(v) can be regarded as a matrix with the entries M_j(v)[i_j, i_[j]] = v[i_1, ..., i_d] for all i_j ∈ I_j and i_[j] ∈ I_[j] := ×_{k≠j} I_k. An obvious generalisation reads as follows. Set D := {1, ..., d} and choose a subset α ⊂ D with ∅ ≠ α ≠ D. The complement is α^c := D\α. Define I_α := ×_{j∈α} I_j and V_α := ⊗_{j∈α} V_j. The matricisation with respect to α uses the isomorphism

M_α : V → V_α ⊗ V_{α^c}. (4)

Here i_α denotes the tuple (i_j)_{j∈α} ∈ I_α. M_α(v) can be regarded as a matrix in K^{I_α × I_{α^c}}. For α = {j} we obtain the j-th matricisation M_j from above. For α = D, the set α^c is empty. The formal definition ⊗_{j∈∅} V_j = K explains that M_D : V → V ⊗ K. Regarding M_D(v) as a matrix means that there is only one column containing the vectorised tensor v. Analogously, M_∅(v) = M_D(v)^T contains v as a row vector.
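The matricisation M_α(v) is easy to realise for explicitly stored tensors. The following sketch (our own helper, assuming numpy arrays with 0-based axes) moves the directions in α to the row index and the directions in α^c to the column index:

```python
import numpy as np

# Sketch of the matricisation M_alpha(v): the axes listed in alpha become the
# row index i_alpha, the remaining axes the column index i_{alpha^c}.
def matricise(v, alpha):
    d = v.ndim
    alpha_c = [k for k in range(d) if k not in alpha]
    perm = list(alpha) + alpha_c
    rows = int(np.prod([v.shape[k] for k in alpha]))
    return np.transpose(v, perm).reshape(rows, -1)

v = np.arange(24.0).reshape(2, 3, 4)   # order-3 tensor with I_1 x I_2 x I_3 = 2 x 3 x 4
M2 = matricise(v, [1])                 # 2nd matricisation
print(M2.shape)                        # prints (3, 8)
```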
The α-rank of a tensor v was already defined by Hitchcock [10, p. 170] via the matrix rank of M_α(v): rank_α(v) := rank(M_α(v)). For different α the α-ranks are in general different; the only general relation is rank_α(v) = rank_{α^c}(v), since M_{α^c}(v) = M_α(v)^T. Let M := ⊗_{j=1}^d M^(j) be an elementary Kronecker product of matrices M^(j). The tensor Mv is defined by (2) and satisfies

M_α(Mv) = (M^α ⊗ M^{α^c}) M_α(v), (5a)

where M^α := ⊗_{j∈α} M^(j) and M^{α^c} := ⊗_{j∈α^c} M^(j) are partial Kronecker products. Interpreting M_α(v) and M_α(Mv) as matrices, the equivalent statement is

M_α(Mv) = M^α M_α(v) (M^{α^c})^T. (5b)
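Since the matricisations for α and α^c are transposes of each other, their matrix ranks agree; the following small numerical check (our own construction) uses a rank-1 tensor:

```python
import numpy as np

# Check that rank_alpha(v) = rank_{alpha^c}(v) on a rank-1 example:
# the two matricisations are transposes of each other.
rng = np.random.default_rng(0)
v = np.einsum('i,j,k->ijk', rng.standard_normal(4),
              rng.standard_normal(5), rng.standard_normal(6))  # elementary tensor
M_alpha = v.reshape(4, 30)       # matricisation for alpha = {1}
M_alpha_c = M_alpha.T            # matricisation for alpha^c = {2, 3}
assert np.linalg.matrix_rank(M_alpha) == np.linalg.matrix_rank(M_alpha_c) == 1
```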

Minimal subspaces
Given v ∈ V = ⊗_{j=1}^d V_j, there may be smaller subspaces U_j ⊂ V_j such that v ∈ U = ⊗_{j=1}^d U_j. The subspaces of minimal dimension are called the minimal subspaces and denoted by U_j^min(v). They satisfy v ∈ ⊗_{j=1}^d U_j^min(v). The minimal subspaces can be characterised by

U_α^min(v) = range(M_α(v)), (6)

interpreting M_α(v) as a matrix in K^{I_α × I_{α^c}}. This includes the case α = {j}, for which U_α^min(v) is written as U_j^min(v). In the infinite-dimensional case one cannot interpret M_α(v) as a matrix. Then the definition (6) must be replaced by

U_α^min(v) = { φ_{α^c}(v) : φ_{α^c} ∈ ⊗_{j∈α^c} V_j' }, (7)

where V_j' is the dual space of V_j. The application of φ_{α^c} = ⊗_{j∈α^c} ϕ^(j) to an elementary tensor v = ⊗_{j=1}^d v^(j) is defined by

φ_{α^c}(v) = ( ∏_{j∈α^c} ϕ^(j)(v^(j)) ) ⊗_{j∈α} v^(j). (8)

In the general case the α-rank is defined by rank_α(v) = dim(U_α^min(v)). An important property is that, under natural conditions, the α-rank is lower semicontinuous with respect to weak convergence (cf. Hackbusch [5, Theorem 6.24], Falcó-Hackbusch [2]).

Higher-order singular-value decomposition (HOSVD)
In the following we assume that all V_j are pre-Hilbert spaces equipped with the Euclidean scalar product ⟨·,·⟩. The induced scalar product in V = ⊗_{j=1}^d V_j is defined on elementary tensors by ⟨⊗_{j=1}^d v^(j), ⊗_{j=1}^d w^(j)⟩ = ∏_{j=1}^d ⟨v^(j), w^(j)⟩.
Interpreting M_α(v) as a matrix, one may determine its singular-value decomposition (SVD)

M_α(v) = Σ_{i=1}^{r_α} σ_i^(α) b_i^(α) (b_i^{(α^c)})^T with σ_1^(α) ≥ σ_2^(α) ≥ ... ≥ σ_{r_α}^(α) > 0, (10)

where the σ_i^(α) > 0 are the singular values, while {b_i^(α)} and {b_i^{(α^c)}} are the orthonormal systems of the left and right singular vectors. De Lathauwer-De Moor-Vandewalle [1] introduced the name HOSVD for the simultaneous SVD of the matricisations M_j(v), 1 ≤ j ≤ d. Note that in general the SVD spectra (σ_i^(j))_{1≤i≤r_j} as well as the ranks r_j = rank_j(v) do not coincide for different j. Compare also Hackbusch-Uschmajew [8].
It will turn out that the important quantities in (10) are the singular values σ_i^(α). These quantities are also characterised by the diagonalisation of the matrix M_j(v) M_j(v)^H = Σ_{i=1}^{r_j} (σ_i^(j))^2 b_i^(j) (b_i^(j))^H. Note that M_j(v) has ∏_{k≠j} n_k columns, which may be a huge quantity. However, M_j(v) M_j(v)^H is only of the size n_j × n_j.
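This observation suggests the following computation (a sketch with our own variable names): form the small Gram matrix M_j(v) M_j(v)^H and read off the squared singular values as its eigenvalues, instead of an SVD of the wide matrix:

```python
import numpy as np

# Singular values of M_1(v) via the small n_1 x n_1 Gram matrix
# M_1(v) M_1(v)^H, whose eigenvalues are the squared singular values.
rng = np.random.default_rng(1)
v = rng.standard_normal((5, 6, 7))
M1 = v.reshape(5, 42)                          # 1st matricisation (n_1 = 5)
gram = M1 @ M1.T                               # 5 x 5, cheap to diagonalise
eigvals = np.sort(np.linalg.eigvalsh(gram))[::-1]
sigma = np.sqrt(np.clip(eigvals, 0.0, None))   # singular values of M_1(v)
assert np.allclose(sigma, np.linalg.svd(M1, compute_uv=False))
```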

Basis representations, transformations
The notation (1) refers to the unit vectors e_{i_j} of K^{I_j}. We may choose another basis B_j = (b_1^(j), ..., b_{n_j}^(j)) of V_j and represent v as in

v = Σ_{i_1, ..., i_d} c[i_1, ..., i_d] b_{i_1}^(1) ⊗ ... ⊗ b_{i_d}^(d) (11)

with another coefficient tensor c ∈ ⊗_{j=1}^d K^{I_j}. If B_j and B'_j are two bases of V_j, there are transformations T^(j) and S^(j) = (T^(j))^{-1} with B'_j = B_j T^(j) and B_j = B'_j S^(j), i.e., b'_i^(j) = Σ_k T_{ki}^(j) b_k^(j). Form T := ⊗_{j=1}^d T^(j) and S := ⊗_{j=1}^d S^(j); then the coefficient tensors in v = B c = B' c' are related by c' = S c.

Remark 2 According to (5b), the matricisations of v and its coefficient tensor c are related by M_α(v) = B_α M_α(c) (B_{α^c})^T.

Tensor representations

r-Term format
Often, the dimension ∏_{j=1}^d n_j of ⊗_{j=1}^d K^{n_j} is much larger than the available computer memory. Therefore a naive representation of a tensor via its entries (1) is impossible. A classical tensor representation is the r-term format (also called the canonical or CP format) related to (3). Let V = ⊗_{j=1}^d V_j. We fix an integer r ∈ N_0 = N ∪ {0} and define the set

R_r := { Σ_{ν=1}^r ⊗_{j=1}^d v_ν^(j) : v_ν^(j) ∈ V_j },

i.e., v ∈ R_r is represented by r elementary tensors with the factors v_ν^(j). One checks that R_r = {v : rank(v) ≤ r}. As long as rank(v) ≤ r holds with r of moderate size, this format yields a suitable representation. If rank(v) is too large, one may try to find an approximating tensor of smaller rank. Another question is the implementation of tensor operations within this format. Adding u ∈ R_r and v ∈ R_s, one obtains the representation of the sum w := u + v in R_{r+s}. Other operations let the representation rank increase even more. An example is the multiplication by a Kronecker matrix M represented by s elementary Kronecker products: the product Mv with v ∈ R_r belongs to R_{r·s}. Therefore one needs a truncation procedure which approximates a tensor from R_t (t too large) by an approximation in R_r for a suitable r < t. Unfortunately, this task is rather difficult within the r-term format (cf. Hackbusch [5, §7, §9]).
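A minimal sketch of the r-term format for d = 3 (our own data layout: a list of factor triples) shows how addition simply concatenates terms and increases the representation rank from r and s to r + s:

```python
import numpy as np

# r-term (CP) format for d = 3: a tensor is stored as a list of factor
# triples; each triple encodes one elementary tensor. Addition concatenates
# the term lists, so the representation rank grows from r and s to r + s.
def cp_to_full(terms):
    return sum(np.einsum('i,j,k->ijk', a, b, c) for a, b, c in terms)

rng = np.random.default_rng(2)
u_terms = [tuple(rng.standard_normal(n) for n in (4, 5, 6)) for _ in range(2)]  # r = 2
v_terms = [tuple(rng.standard_normal(n) for n in (4, 5, 6)) for _ in range(3)]  # s = 3
w_terms = u_terms + v_terms                       # sum is represented by r + s = 5 terms
assert np.allclose(cp_to_full(w_terms), cp_to_full(u_terms) + cp_to_full(v_terms))
```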

Tensor-subspace format
A remedy is the Tucker format or tensor-subspace format, which is related to (6) and (11).
Let n_j = dim(V_j). Assume that we know that v ∈ ⊗_{j=1}^d U_j holds for subspaces U_j ⊂ V_j of (hopefully much) smaller dimension r_j than n_j. Choose any basis (or even only a generating system) B_j = {b_i^(j) : 1 ≤ i ≤ r_j} of U_j. Then there is a tensor c ∈ ⊗_{j=1}^d K^{r_j} (the so-called core tensor) such that

v = Σ_{i_1=1}^{r_1} ... Σ_{i_d=1}^{r_d} c[i_1, ..., i_d] b_{i_1}^(1) ⊗ ... ⊗ b_{i_d}^(d). (13b)

Note the difference to (11): the sums in (11) have n_j terms, whereas (13b) only uses r_j < n_j as upper bound.
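The representation (13b) can be sketched for d = 3 with numpy (the sizes are our own illustration); the core tensor c is contracted with the basis matrices B_j whose columns are the b_i^(j):

```python
import numpy as np

# Tensor-subspace (Tucker) representation (13b) for d = 3:
# v[i1,i2,i3] = sum over (a,b,c) of core[a,b,c] * B1[i1,a] * B2[i2,b] * B3[i3,c].
rng = np.random.default_rng(3)
n, r = (8, 9, 10), (2, 3, 4)
B = [rng.standard_normal((n[j], r[j])) for j in range(3)]  # columns = basis vectors b^(j)_i
core = rng.standard_normal(r)                               # core tensor c
v = np.einsum('abc,ia,jb,kc->ijk', core, B[0], B[1], B[2])
assert v.shape == (8, 9, 10)
# Storage: 8*2 + 9*3 + 10*4 + 2*3*4 = 107 numbers instead of 8*9*10 = 720.
```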

Definition 1
We denote the set of all tensors in V with a representation (13b) by T_r, where r = (r_1, ..., r_d) is a multi-index.

Remark 3
The optimal choice of U_j is given by U_j = U_j^min(v) (cf. Sect. 2.3), since then r_j = rank_j(v) is minimal. The memory cost for the core tensor is ∏_{j=1}^d r_j. Therefore this representation is unfavourable for large d.

HOSVD
The first step is a transformation into orthonormal bases B'_j with B_j = B'_j T_j (e.g., by a QR decomposition B_j = QR yielding B'_j = Q and T_j = R). According to (12), the core tensor is transformed accordingly; again denoting B'_j and the new core tensor by B_j and c, we may assume orthonormal bases (b_i^(j))_{1≤i≤r_j}. The second step is the HOSVD applied to the core tensor c ∈ ⊗_{j=1}^d K^{r_j}. Assume that the matricisation M_α(c) has the singular-value decomposition M_α(c) = X_α Σ_α Y_α^T. Then M_α(v) = B_α M_α(c) (B_{α^c})^T = (B_α X_α) Σ_α (B_{α^c} Y_α)^T is the singular-value decomposition of M_α(v) with the unitary matrices B_α X_α and B_{α^c} Y_α. Taking α = {j}, we obtain a new basis by the transform B_j^HOSVD := B_j X_j; the basis {b_i^(j),HOSVD : 1 ≤ i ≤ r_j} is called the j-th HOSVD basis. The core tensor has to be transformed into c' as above. Again denoting B' and c' by B and c, we obtain the representation (13b) with respect to the HOSVD bases.
Since we do not need the right singular vectors in Y_α, the practical computation first forms the products P_j := M_j(c) M_j(c)^H ∈ K^{r_j × r_j}. This is the most expensive step with an arithmetic cost of O((∏_{j=1}^d r_j) Σ_{j=1}^d r_j). The second step is the singular-value decomposition of P_j (cost: O(Σ_{j=1}^d r_j^3)). The representation of v ∈ T_r by the HOSVD bases allows two types of truncations. The number r_j = dim(U_j) may be larger than necessary, i.e., larger than rank_j(v) = dim(U_j^min(v)). This is detected by vanishing singular values. Assume that σ_{s_j}^(j) > 0, whereas σ_i^(j) = 0 for s_j < i ≤ r_j. Then the sums in (13b) can be shortened (replace r_j by s_j). After this step, v ∈ ⊗_{j=1}^d U_j ⊂ T_s holds with U_j = U_j^min(v) and s_j = rank_j(v). Note that the described procedure yields a shorter representation while the tensor is unchanged.
A truncation changing the tensor is described next.

HOSVD truncation
Assume again that the representation (13b) of v ∈ T_r uses the HOSVD bases. We are looking for an approximation u ∈ T_s with smaller dimensions s_j < r_j of the corresponding subspaces U_j. This problem has two answers. First, there is a (not necessarily unique) best approximation u_best ∈ T_s with ‖v − u_best‖ = min{‖v − u‖ : u ∈ T_s}. Its computation must be done iteratively. It is hard to ensure that the corresponding minimisation method converges to the global minimum, since there may be many local minima.
A much easier approach is the HOSVD truncation: given v ∈ T_r with HOSVD bases in (13b), omit all terms involving indices i_j > s_j. The other terms are unchanged. Obviously, the resulting tensor u_HOSVD belongs to T_s and its computation requires no arithmetical operations.
In the case of matrices (d = 2) one knows that u_HOSVD = u_best. However, for d ≥ 3, u_HOSVD is not necessarily the best, but a quasi-optimal approximation:

‖v − u_HOSVD‖ ≤ ( Σ_{j=1}^d Σ_{i=s_j+1}^{r_j} (σ_i^(j))^2 )^{1/2} ≤ √d ‖v − u_best‖ (14)

(cf. [5, Theorem 10.3]). Since the singular values σ_i^(j) are known, the first inequality in (14) yields a precise error estimate. Given a tolerance ε, one can choose the s_j such that the error is below ε. The second inequality proves quasi-optimality.
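The HOSVD truncation and the first inequality in (14) can be illustrated numerically (our own construction; each mode is projected onto the leading s_j left singular vectors of the corresponding matricisation of v):

```python
import numpy as np

# HOSVD truncation for d = 3: project each mode onto the span of the leading
# s_j left singular vectors of M_j(v). The squared truncation error is bounded
# by the sum of the squared discarded singular values, cf. (14).
rng = np.random.default_rng(4)
v = rng.standard_normal((6, 7, 8))
s = (3, 3, 3)

u = v.copy()
discarded_sq = 0.0
for j in range(3):
    M = np.moveaxis(v, j, 0).reshape(v.shape[j], -1)   # j-th matricisation
    U, sigma, _ = np.linalg.svd(M, full_matrices=False)
    discarded_sq += float(np.sum(sigma[s[j]:] ** 2))
    P = U[:, :s[j]] @ U[:, :s[j]].T                    # orthogonal projection, mode j
    u = np.moveaxis(np.tensordot(P, u, axes=(1, j)), 0, j)

err = np.linalg.norm(v - u)
assert err <= np.sqrt(discarded_sq) + 1e-10
```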

Definition, notation
The tree-based tensor formats use a so-called dimension partition tree T_D. The root of the tree is D = {1, ..., d}, while the leaves are {1}, ..., {d}. The tree describes how D is divided recursively. The vertices of the tree are subsets of D: either a vertex α is a singleton (and therefore a leaf) or it has sons α_i with the property that α is the disjoint union of the α_i. Examples for d = 4 are given in (15) [trees (a)-(d), not reproduced here]: (a) is the shallow tree whose root D has the four sons {1}, {2}, {3}, {4}; (b) is the balanced binary tree with the interior vertices {1,2} and {3,4}; (c) is a linear binary tree; (d) is a further variant. The first interpretation is that each tree describes a bracketing of V_1 ⊗ V_2 ⊗ V_3 ⊗ V_4; e.g., tree (b) corresponds to (V_1 ⊗ V_2) ⊗ (V_3 ⊗ V_4). The second interpretation involves the associated subspaces. The tree (a) corresponds to the Tucker format in Sect. 3.2: all subspaces U_1, ..., U_4 are involved directly. In the case of tree (b) one first forms the subspaces U_1 ⊗ U_2 and U_3 ⊗ U_4 and determines subspaces U_{1,2} ⊂ U_1 ⊗ U_2 and U_{3,4} ⊂ U_3 ⊗ U_4. Finally U_{1,2} ⊗ U_{3,4} is defined. The trees (c) and (d) lead to analogous constructions. The final subspace U_D must be such that v ∈ U_D holds for the tensor v which we want to represent. Obviously, the one-dimensional subspace U_D = span{v} is sufficient.
Restricting ourselves to binary trees T_D, we obtain the hierarchical tensor format (cases (b), (c) in (15); cf. Hackbusch-Kühn [7]). The practical advantage of a binary tree is the fact that the quantities appearing in the later computations are matrices. The further restriction to linear trees as in case (c) of (15) leads to the so-called TT format or matrix product format (cf. Verstraete-Cirac [14], Oseledets-Tyrtyshnikov [11,12]).
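Dimension partition trees can be sketched as plain data structures; the encoding below (a hypothetical helper of our own, not from the paper) builds a balanced binary tree as in case (b) and a linear, TT-like tree as in case (c):

```python
# Binary dimension partition trees for d = 4: each vertex is a frozenset of
# directions; a non-leaf vertex maps to the pair of its sons, a leaf to None.
def build_binary_tree(modes, split_at):
    """split_at(alpha) returns the position at which the tuple alpha
    is divided into its two sons."""
    tree = {}
    def descend(alpha):
        if len(alpha) == 1:
            tree[frozenset(alpha)] = None                    # leaf
        else:
            k = split_at(alpha)
            sons = (alpha[:k], alpha[k:])
            tree[frozenset(alpha)] = tuple(frozenset(s) for s in sons)
            for son in sons:
                descend(son)
    descend(tuple(modes))
    return tree

balanced = build_binary_tree(range(1, 5), lambda a: len(a) // 2)  # case (b)
linear = build_binary_tree(range(1, 5), lambda a: len(a) - 1)     # TT-like, case (c)
assert balanced[frozenset({1, 2, 3, 4})] == (frozenset({1, 2}), frozenset({3, 4}))
assert linear[frozenset({1, 2, 3, 4})] == (frozenset({1, 2, 3}), frozenset({4}))
```

Both trees have 2d − 1 = 7 vertices; they differ only in how the interior vertices partition D.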
Consider a vertex α ⊂ D of the binary tree T_D together with its sons α_1 and α_2. The sons are associated with subspaces U_{α_1} ⊂ V_{α_1} and U_{α_2} ⊂ V_{α_2}, and we require the inclusion

U_α ⊂ U_{α_1} ⊗ U_{α_2}. (16)

The minimal subspaces U_α^min(v) introduced in (7) and (8) satisfy the inclusion U_α^min(v) ⊂ U_{α_1}^min(v) ⊗ U_{α_2}^min(v). This proves the following remark.

Remark 4
The existence of subspaces U_α (α ∈ T_D) with the required property (16) is ensured by the optimal choice U_α := U_α^min(v).

Implementation of the subspaces
In principle, all subspaces U_α (α ∈ T_D) are described by basis vectors:

U_α = span{b_ℓ^(α) : 1 ≤ ℓ ≤ r_α}. (18)

However, for #α > 1 the b_ℓ^(α) are already tensors of order #α which should not be stored explicitly. Therefore we distinguish two cases.

Case A. α = {j} is a leaf. Then the basis vectors b_i^(j) of U_α = U_j are stored explicitly.

Case B. α is a non-leaf vertex with sons α_1 and α_2. Note that {b_i^(α_1) ⊗ b_j^(α_2) : 1 ≤ i ≤ r_{α_1}, 1 ≤ j ≤ r_{α_2}} spans U_{α_1} ⊗ U_{α_2} ⊃ U_α. Hence each b_ℓ^(α) has a representation

b_ℓ^(α) = Σ_{i=1}^{r_{α_1}} Σ_{j=1}^{r_{α_2}} c_{ij}^(α,ℓ) b_i^(α_1) ⊗ b_j^(α_2) (19)

with coefficients c_{ij}^(α,ℓ) forming an r_{α_1} × r_{α_2} matrix C^(α,ℓ). The tuple (C^(α,ℓ))_{1≤ℓ≤r_α} of matrices can be regarded as a tensor C_α of order 3 with the entries C_α[i, j, ℓ] = c_{ij}^(α,ℓ).

Remark 5 The representation of a tensor v by the hierarchical format uses the data b_i^(j) (at the leaves), C_α (at the non-leaf vertices), and c_1^(D). The memory cost of the hierarchical format is bounded by d n r + (d − 1) r^3 + 1, where n := max_j dim(V_j) and r := max_{α∈T_D} r_α.
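The bound of Remark 5 is easily evaluated; with illustrative numbers of our own choice, the contrast to the full storage n^d is drastic:

```python
# Storage bound of the hierarchical format (Remark 5): d*n*r + (d-1)*r**3 + 1,
# compared with the n**d entries of the full tensor. Numbers are illustrative.
d, n, r = 10, 100, 20
hierarchical_cost = d * n * r + (d - 1) * r ** 3 + 1
full_cost = n ** d
print(hierarchical_cost)   # prints 92001
assert hierarchical_cost < full_cost
```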
Although the representation of v by the quantities (b_i^(j), C_α, c_1^(D)) is rather indirect, all tensor operations can be performed by a recursion in the tree T_D (either from the leaves to the root or in the opposite direction). Below we describe transformations, the orthonormalisation of the bases, and the HOSVD computation. Concerning other operations we refer to [5, §13].

Transformations
We recall that the bases {b_ℓ^(α) : 1 ≤ ℓ ≤ r_α} are well-defined by (18), but they are not directly accessible except at the leaves α ∈ T_D. Transformations of the bases are described by the corresponding modifications of the matrices C^(α,ℓ). As in Sect. 3.2 we form the matrix B_α = [b_1^(α), ..., b_{r_α}^(α)], related to a linear map in L(K^{r_α}, V_α). For simplicity we will call B_α the basis (of the spanned subspace).
The basis B_α is connected with the bases B_{α_1} and B_{α_2} at the son vertices via the data C_α. Whenever one of these bases changes, C_α must be updated. Eq. (21) describes the update caused by a transformation of B_α, while (22) considers the transformations of B_{α_1} and B_{α_2}.
Basis transformation at α. Assume that α is not a leaf and that B_α and B'_α are two bases related by B'_α = B_α S^(α), i.e., b'_ℓ^(α) = Σ_m S_{mℓ}^(α) b_m^(α). The corresponding coefficient matrices C^(α,ℓ) and C'^(α,ℓ) satisfy

C'^(α,ℓ) = Σ_{m=1}^{r_α} S_{mℓ}^(α) C^(α,m). (21)

Using the tensor C_α, this transformation becomes a matrix multiplication in the third direction. Basis transformation at the son vertices α_i. Let α_1, α_2 be the sons of α, and let B'_{α_i} and B_{α_i} be two bases related by B'_{α_i} T^(α_i) = B_{α_i} (i = 1, 2). The corresponding coefficient matrices C^(α,ℓ) and C'^(α,ℓ) are related by

C'^(α,ℓ) = T^(α_1) C^(α,ℓ) (T^(α_2))^T. (22)

This is equivalent to a corresponding transformation of the tensor C_α in the first two directions.
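The effect of a son-basis transformation can be verified numerically for a single non-leaf vertex (our own sketch for real matrices; the order-2 tensor b_ℓ^(α) is written as the matrix B_1 C B_2^T):

```python
import numpy as np

# Transfer-matrix update under son-basis transformations: if the old son
# bases satisfy B1 = B1_new @ T1 and B2 = B2_new @ T2, replacing C by
# T1 @ C @ T2.T leaves the represented tensor b = B1 @ C @ B2.T unchanged.
rng = np.random.default_rng(5)
B1, B2 = rng.standard_normal((6, 3)), rng.standard_normal((7, 4))
C = rng.standard_normal((3, 4))
b = B1 @ C @ B2.T                                   # the (order-2) basis tensor

T1, T2 = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
B1_new, B2_new = B1 @ np.linalg.inv(T1), B2 @ np.linalg.inv(T2)  # new son bases
C_new = T1 @ C @ T2.T                               # updated transfer matrix
assert np.allclose(B1_new @ C_new @ B2_new.T, b)    # same tensor, new coding
```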

Orthonormalisation
Orthonormality of the (non-accessible) bases {b_ℓ^(α)} can be checked by corresponding properties of the coefficient matrices C^(α,ℓ). The following sufficient condition is easy to prove.

Remark 6
Let α be a non-leaf vertex. The basis {b_ℓ^(α)} is orthonormal if (a) the bases {b_i^(α_1)}, {b_i^(α_2)} of the sons α_1, α_2 are orthonormal and (b) the matrices C^(α,ℓ) in (19) are orthonormal with respect to the Frobenius scalar product: ⟨C^(α,ℓ), C^(α,m)⟩_F = δ_{ℓm}.

The bases can be orthonormalised as follows. Orthonormalise the explicitly given bases at the leaves (e.g., by QR). As soon as the bases at the sons of a vertex α are orthonormal, two cases arise. - Case A1. Let α_1 be the first son of α. If the basis at α_1 is transformed, the matrices C^(α,ℓ) are updated according to (22); the basis {b_ℓ^(α)} itself remains unchanged.
- Case A2. If the matrices C^(α,ℓ) are not orthonormal in the sense of Remark 6(b), orthonormalise them with respect to the Frobenius scalar product; this corresponds to a basis transformation b_ℓ^(α),new = Σ_i T_{iℓ} b_i^(α). (In addition, this transformation causes changes at the father vertex according to Case A1 or Case A2.)
As in Sect. 3.3, the bases have to be orthonormalised before the HOSVD bases are computed.

HOSVD bases
The challenge is the computation of the HOSVD, more precisely of the singular values σ_i^(α) and the left singular vectors (tensors) b_i^(α) of M_α(v). We recall that these data require the diagonalisation of the square matrix X_α := M_α(v) M_α(v)^H. In the case of the tensor subspace representation of Sect. 3.3 it was possible to reduce M_α(v) M_α(v)^H to M_α(c) M_α(c)^H involving the (smaller) core tensor. Now we reduce the computation of M_α(v) M_α(v)^H to matrix operations only involving the data C_α.
The range of X_α := M_α(v) M_α(v)^H is contained in the subspace spanned by the basis B_α = {b_ℓ^(α)}, so that X_α can be written as

X_α = Σ_{i,j=1}^{r_α} e_{ij}^(α) b_i^(α) (b_j^(α))^H, i.e., X_α = B_α E_α B_α^H, (23)

with some coefficients e_{ij}^(α) which form an r_α × r_α matrix E_α. To simplify matters we assume that the bases are already orthonormal (cf. Sect. 4.4). We start with the root α = D of the tree. Since r_D = 1, the coefficient c_1^(D) in v = c_1^(D) b_1^(D) is a scalar. The definition of M_D(v) in Sect. 2.2 shows that X_D = v v^H. On the other hand, the equality v = c_1^(D) b_1^(D) yields E_D = [|c_1^(D)|^2], where σ_1^(D) = |c_1^(D)| = ‖v‖ is the only singular value of M_D(v). Its left singular vector is v (up to normalisation). The following recursion starts with α = D.
We assume that for some non-leaf vertex α ∈ T_D the singular values σ_i^(α) and the matrix E_α are known. Now we want to determine E_{α_1} and E_{α_2} for the sons α_1 and α_2 of α. Concerning X_α and X_{α_1}, we recall the definition of M_α(v) by (4). The entries of X_α are

X_α[i_α, k_α] = Σ_{k_{α^c} ∈ I_{α^c}} v[i_α, k_{α^c}] conj(v[k_α, k_{α^c}]). (25)

On the left-hand side, e.g., i_α ∈ I_α = ×_{j∈α} I_j and k_α form the pair of matrix indices, while on the right-hand side (i_α, k_{α^c}) ∈ I_D is the index of v. Analogously we have

X_{α_1}[i_{α_1}, k_{α_1}] = Σ_{i_{α_2} ∈ I_{α_2}} Σ_{k_{α^c} ∈ I_{α^c}} v[i_{α_1}, i_{α_2}, k_{α^c}] conj(v[k_{α_1}, i_{α_2}, k_{α^c}]).

The sum over k_{α^c} ∈ I_{α^c} already appears in (25), so that X_{α_1} can be expressed by means of X_α. Returning to the matrices M_α(v) and M_{α_1}(v), the latter sum can be regarded as a matrix multiplication when we interpret the basis vectors b_ℓ^(α) via their representation (19).
Since the bases are orthonormal, we obtain ⟨b_μ^(α_2), b_κ^(α_2)⟩ = δ_{μκ} (Kronecker delta). Hence inserting (19) and (23) into the expression for X_{α_1} yields

E_{α_1} = Σ_{ℓ,m=1}^{r_α} e_{ℓm}^(α) C^(α,ℓ) (C^(α,m))^H.

This proves that (23) holds for α_1 instead of α with the coefficient matrix E_{α_1}. A similar treatment of X_{α_2} yields E_{α_2} = Σ_{ℓ,m=1}^{r_α} e_{ℓm}^(α) (C^(α,ℓ))^T conj(C^(α,m)) and proves the following theorem, stating that the matrices E_{α_1}, E_{α_2} are obtained from E_α and C_α by matrix operations only.
Diagonalisation of the explicitly given matrices E_{α_1} and E_{α_2} yields E_{α_1} = U Σ_{α_1}^2 U^H and E_{α_2} = V Σ_{α_2}^2 V^H with unitary matrices U, V and diagonal matrices Σ_{α_i} = diag{σ_1^(α_i), σ_2^(α_i), ...}. If singular values σ_ν^(α_i) vanish, the corresponding contributions can be omitted. This reduces the associated subspace U_α (cf. (16)) to the minimal subspace U_α^min(v). Correspondingly, the value of r_α becomes rank_α(v).
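A numerical sanity check (our own sketch, real case with orthonormal son bases) of the reduction in this section: inserting the transfer representation (19) of the basis vectors into (23) expresses the Gram matrix at the son α_1 through the small matrices C^(α,ℓ) and E_α only:

```python
import numpy as np

# If X_alpha = sum_{l,m} E[l,m] b_l b_m^T with b_l = (19), orthonormal son
# bases and real data, then the Gram matrix at the first son satisfies
# X_{alpha_1} = B1 @ E1 @ B1.T with E1 = sum_{l,m} E[l,m] C[l] @ C[m].T.
rng = np.random.default_rng(6)
n1, n2, r1, r2, ra = 6, 7, 2, 3, 4
B1 = np.linalg.qr(rng.standard_normal((n1, r1)))[0]   # orthonormal columns
B2 = np.linalg.qr(rng.standard_normal((n2, r2)))[0]
C = rng.standard_normal((ra, r1, r2))                 # transfer matrices C[l]
E = rng.standard_normal((ra, ra)); E = E @ E.T        # symmetric E_alpha

# basis tensors b_l of U_alpha, written as n1 x n2 matrices (matricised w.r.t. alpha_1)
Mb = np.array([B1 @ C[l] @ B2.T for l in range(ra)])
X1 = sum(E[l, m] * Mb[l] @ Mb[m].T for l in range(ra) for m in range(ra))
E1 = sum(E[l, m] * C[l] @ C[m].T for l in range(ra) for m in range(ra))
assert np.allclose(X1, B1 @ E1 @ B1.T)
```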

HOSVD truncation
We assume that, following the procedure described above, the hierarchical representation uses the HOSVD bases. The format H_r with r = (r_α)_{α∈T_D} consists of all tensors v ∈ V with rank_α(v) ≤ r_α. Given v ∈ H_r, we ask for an approximation u ∈ H_s for a smaller tuple s with s ≤ r.
The truncation is analogous to the procedure in Sect. 3.4. In terms of the (implicitly defined) bases, the approximation u_HOSVD is obtained by omitting all contributions involving the HOSVD basis vectors b_i^(α) for s_α < i ≤ r_α. In practice this means that the coefficient matrices C^(α,i) are omitted for s_α < i ≤ r_α, while the remaining r_{α_1} × r_{α_2} matrices C^(α,i) are reduced to the size s_{α_1} × s_{α_2} by deleting the last r_{α_1} − s_{α_1} rows and r_{α_2} − s_{α_2} columns. If α = {j} is a leaf, the explicitly given basis {b_i^(j)} is shortened. The approximation error v − u_HOSVD satisfies

‖v − u_HOSVD‖ ≤ ( Σ_{α∈T_D\{D}} Σ_{i=s_α+1}^{r_α} (σ_i^(α))^2 )^{1/2} ≤ √(2d − 3) ‖v − u_best‖ (27)

(cf. [5, Theorem 11.58]). The first inequality allows us to explicitly control the error with respect to the Euclidean norm by the choice of the omitted singular values. The second inequality proves quasi-optimality of this truncation. u_best ∈ H_s is the best approximation. The parameter d is the order of the tensor.
The number 2d − 3 on the right-hand side becomes smaller if s_α = r_α holds for some vertices α. For instance, the TT format as described in [11] uses the maximal value s_j = r_j = dim(V_j) for the leaves. Then (27) holds with 2d − 3 replaced by d − 1.

Properties of the SVD projection

Case of the tensor-subspace format
The HOSVD truncation of the tensor-subspace format in Sect. 3.4 is described by the Kronecker product Π := ⊗_{j=1}^d P_j, where P_j : V_j → span{b_i^(j),HOSVD : 1 ≤ i ≤ s_j} is the orthogonal projection onto the span of the leading HOSVD basis vectors. The tensor product Π of the single projections P_j can also be written as a usual product Π = ∏_{j=1}^d P̂_j of the maps P̂_j := I_1 ⊗ ... ⊗ I_{j−1} ⊗ P_j ⊗ I_{j+1} ⊗ ... ⊗ I_d, where I_k is the identity map on V_k. Since the projections P̂_j commute, the order of the factors in ∏_{j=1}^d P̂_j does not matter. We recall the singular-value decomposition of the matricisation M_j(v) (cf. Sect. 2.4): M_j(v) = Σ_i σ_i^(j) b_i^(j) (b_i^[j])^T, where the superscript [j] = {j}^c denotes the complement of the leaf α = {j}. Using (5b), we get P̂_j v = Σ_{i≤s_j} σ_i^(j) b_i^(j) ⊗ b_i^[j]. However, we may also define

P̃_j := the orthogonal projection onto span{b_i^[j] : 1 ≤ i ≤ s_j} acting in the directions {j}^c, (29)

for which we obtain the identical value P̂_j v = P̃_j v, although the projections are different. This property has interesting consequences. We introduce Π_j := (∏_{k≠j} P̂_k) P̃_j and observe that Π_j v = (∏_{k≠j} P̂_k) P̃_j v = (∏_{k≠j} P̂_k) P̂_j v = Πv holds for the special tensor v, although Π_j ≠ Π. Note that all maps P̂_k (k ≠ j) and P̃_j are elementary tensors containing the identity I_j : V_j → V_j with respect to the j-th direction. This proves the next lemma, for which we introduce the following notation. Let ϕ_j : V_j → W_j be a linear map. It gives rise to the elementary Kronecker product φ_j := I_1 ⊗ ... ⊗ I_{j−1} ⊗ ϕ_j ⊗ I_{j+1} ⊗ ... ⊗ I_d.

Lemma 1 Let ϕ_j : V_j → W_j and φ_j be as above. Then φ_j Π_j = Π_j φ_j holds (the latter Π_j contains the identity I_j : W_j → W_j instead of I_j : V_j → V_j).
This allows the following estimate with respect to the Euclidean norm.

Conclusion 2
Given v ∈ V, let u_HOSVD ∈ T_s be the HOSVD approximation defined in Sect. 3.4. With φ_j from above we have

φ_j u_HOSVD = φ_j Π_j v = Π_j φ_j v, hence ‖φ_j u_HOSVD‖ ≤ ‖φ_j v‖. (30)

In the case of infinite-dimensional Hilbert spaces V we may consider unbounded linear maps φ_j. The subspace of elements v for which φ_j v is defined is called the domain of φ_j.

Conclusion 3
If v ∈ V belongs to the domain of φ j , then also u HOSVD belongs to the domain and satisfies (30).

An important example is the topological tensor space V = L^2(Ω) = ⊗_{j=1}^d L^2(Ω_j), where Ω is the Cartesian product of the Ω_j. Set φ_j = ∂^k/∂x_j^k. If the function v ∈ V possesses a k-th derivative with respect to x_j, then by Conclusion 3 also u_HOSVD is k-times differentiable in the L^2 sense and satisfies ‖∂^k u_HOSVD/∂x_j^k‖_{L^2} ≤ ‖∂^k v/∂x_j^k‖_{L^2}. Assuming sufficient smoothness of v and using the Gagliardo-Nirenberg inequality, we proved in [6] estimates of ‖v − u_HOSVD‖_∞ with respect to the maximum norm by means of the L^2 norm of v − u_HOSVD. This is important for the pointwise evaluation of the truncated function.
Another immediate conclusion from (30) is that φ_j v = 0 implies φ_j u_HOSVD = 0. For instance, let ϕ_j be a functional on V_j (i.e., W_j = K). Examples of ϕ_j are the mean value ϕ_j(u) = 1^T u or the evaluation at a certain index i*: ϕ_j(u) = u[i*]. We say that v satisfies the side condition ϕ_j if φ_j v = 0. We conclude that u_HOSVD satisfies the same side condition. In the case of ϕ_j(u) = 1^T u, also u_HOSVD has a vanishing mean. If ϕ_j(u) = u[i*], then u_HOSVD[i] = 0 holds for all i with i_j = i*.
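The inheritance of side conditions is easy to observe numerically (our own example): enforce zero mean in one direction and project onto the leading left singular vectors of the corresponding matricisation:

```python
import numpy as np

# If every column of M_1(v) has zero mean (1^T v = 0 in direction 1), the left
# singular vectors with nonzero singular value are orthogonal to the vector 1,
# so the HOSVD projection in mode 1 preserves the zero-mean side condition.
rng = np.random.default_rng(7)
v = rng.standard_normal((6, 7, 8))
v -= v.mean(axis=0, keepdims=True)       # enforce the side condition in mode 1
M = v.reshape(6, -1)                     # 1st matricisation
U = np.linalg.svd(M, full_matrices=False)[0][:, :2]   # leading left sing. vectors
u = np.tensordot(U @ U.T, v, axes=(1, 0))             # HOSVD projection, mode 1
assert np.allclose(u.mean(axis=0), 0.0)  # truncated tensor keeps zero mean
```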
In the case of matrix spaces V j , structural properties like symmetry or sparsity can be described by functionals.One concludes that the HOSVD approximations lead to matrices of the same structure.

Best approximation u best
We recall that the HOSVD approximation u_HOSVD ∈ T_r of v ∈ V is not (necessarily) the best approximation u_best, defined by

‖v − u_best‖ = min{‖v − u‖ : u ∈ T_r} (31)

(cf. Definition 1). Nevertheless, u_best has similar properties as u_HOSVD. Depending on the multiplicity of certain singular values, the SVD approximation may be unique; in this case u_HOSVD = u_best holds. If the SVD approximation is not unique, we may choose u_best as u_HOSVD = P̂_j v_j. Knowing that P̂_j is an SVD projection, we may replace it by P̃_j as defined in (29). The projection Π_j := Π^[j] P̃_j has the same properties as Π_j in §5.1. This proves the following (cf. Uschmajew [13]).

Theorem 4
The statements of Lemma 1 and the Conclusions 2 and 3 also hold for the best approximation u best in (31) and the related mapping Π j .

Case of the hierarchical format
The HOSVD truncation within the hierarchical format (cf. Sect. 4.6) can be expressed by orthogonal projections P_α for all vertices α of the tree T_D. However, different from Sect. 5.1, the projections P_α and P_β commute if and only if α ∩ β = ∅. The truncation is described by the product of all P_α, where the factors are ordered in such a way that P_α is applied before P_{α_1} and P_{α_2} (α_1, α_2 sons of α). Because of these restrictions, the analysis is more involved. We refer the reader to Hackbusch [6, §4]. As a result, the statements in Sect. 5.1 also hold for the hierarchical format.
According to (12), we have v = B'c' with c' := Tc. Denoting B' and c' again by B and c, we obtain the representation (13b) with orthonormal bases (b_i^(j)).
If the transformation concerns the second son of α, C^(α,ℓ) must be changed into C^(α,ℓ) T^T. Case B. Consider a non-leaf vertex α. If the basis {b_ℓ^(α)} is to be transformed into b_ℓ^(α),new := Σ_i T_{iℓ} b_i^(α), one has to change the coefficient matrices according to C^(α,ℓ),new := Σ_i T_{iℓ} C^(α,i).

Since r_D = 1 at the root α = D, we have σ_1^(D) = ‖v‖. Assume that the HOSVD basis {b_ℓ^(α)} is already chosen for the representation at the vertex α (we recall that for non-leaf α the definition of b_ℓ^(α) is implicit via the matrices C^(α,ℓ)).

Then B_{α_1}^HOSVD = B_{α_1} U and B_{α_2}^HOSVD = B_{α_2} V are the desired HOSVD bases at the vertices α_1 and α_2. If α_i is a leaf, this transformation is performed explicitly. Otherwise the coefficient matrices are modified according to Sect. 4.3. The procedure is repeated for the sons of α_1, α_2 until we reach the leaves. Then at all vertices HOSVD bases are introduced together with the singular values σ_ν^(α). If there are vanishing singular values σ_ν^(α) = 0, the corresponding basis vectors can be omitted.

Set U_k := U_k^min(u_best) for 1 ≤ k ≤ d. Let P_k : V_k → U_k be the orthogonal projection onto U_k. Based on these projections we define P̂_k and Π as in Sect. 5.1. Now we fix one index j and define Π^[j] := ∏_{k≠j} P̂_k. Set v_j := Π^[j] v ∈ U_1 ⊗ ... ⊗ U_{j−1} ⊗ V_j ⊗ U_{j+1} ⊗ ... ⊗ U_d and note that P̂_j v_j = u_best. Based on the SVD of M_j(v_j) we can determine its HOSVD approximation u_HOSVD ∈ T_r. Since it is the minimiser of min_{u∈T_r} ‖v_j − u‖, we have ‖v_j − u_HOSVD‖ ≤ ‖v_j − u_best‖. For an indirect proof assume that ‖v_j − u_HOSVD‖ < ‖v_j − u_best‖. Both u_HOSVD and u_best are in the range of Π^[j], i.e., (I − Π^[j]) u_HOSVD = (I − Π^[j]) u_best = 0. Pythagoras' equality yields

‖v − u_HOSVD‖^2 = ‖Π^[j](v − u_HOSVD)‖^2 + ‖(I − Π^[j])(v − u_HOSVD)‖^2
               = ‖v_j − u_HOSVD‖^2 + ‖(I − Π^[j]) v‖^2
               < ‖v_j − u_best‖^2 + ‖(I − Π^[j]) v‖^2
               = ‖Π^[j](v − u_best)‖^2 + ‖(I − Π^[j])(v − u_best)‖^2
               = ‖v − u_best‖^2

in contradiction to the optimality of u_best. Hence, ‖v − u_HOSVD‖ = ‖v − u_best‖ must hold.