Analysis of tensor approximation schemes for continuous functions

In this article, we analyze tensor approximation schemes for continuous functions. We assume that the function to be approximated lies in an isotropic Sobolev space and discuss the cost when approximating this function in the continuous analogue of the Tucker tensor format or of the tensor train format. We especially show that the cost of both approximations are dimension-robust when the Sobolev space under consideration provides appropriate weights.


Introduction
The efficient approximate representation of multivariate functions is an important task in numerical analysis and scientific computing. In this article, we hence consider the approximation of functions which live on the product of m bounded domains Ω 1 × · · · × Ω m , each of which satisfies Ω j ⊂ R n j . Besides a sparse grid approximation of the function under consideration, being discussed in, e.g., [8,17,18,50], one can also apply a low-rank approximation by means of a tensor approximation scheme, see, e.g., [15,21,22,34,35] and the references therein.
The low-rank approximation in the situation of the product of m = 2 domains is well understood. It is related to the singular value decomposition and has been studied for arbitrary product domains in, e.g., [19,20], see also [46,47,48] for the periodic case. However, the situation is not that clear for the product of m > 2 domains, where one ends up with tensor decompositions. Such tensor decompositions are generalizations of the well known singular value decomposition and the corresponding low-rank matrix approximation methods of two dimensions to the higher-dimensional setting. There, besides the curse of dimension, we encounter -due to the non-existence of an Eckart-Young-Mirsky theorem -that the concepts of singular value decomposition and low-rank approximation can be generalized to higher dimensions in more than one way. Consequently, there exist many generalizations of the singular value decomposition of a function and of low-rank approximations to tensors. To this end, various schemes have been developed over the years in different areas of the sciences and have successfully been applied to various high-dimensional problems ranging from quantum mechanics and physics via biology and econometrics, computer graphics and signal processing to numerical analysis. Recently, tensor methods have even been recognized as special deep neural networks in machine learning and big data analysis [11,28]. As tensor approximation schemes, we have, for example, matrix product states, DMRG, MERA, PEPS, CP, CANDECOMP, PARAFAC, Tucker, tensor train, tree tensor networks and hierarchical Tucker, to name a few. A mathematical introduction into tensor methods is given in the seminal book [21], while a survey on existing methods and their literature can be found in [16]. Also various software packages have been developed for an algebra of operators dealing with tensors.
Tensor methods are usually analyzed as low-rank approximations to a full discrete tensor of data with respect to the ℓ 2 -norm or Frobenius-norm. In this respective, they can be seen as compression methods which may avoid the curse of dimensionality, which is inherent in the full tensor representation. The main tool for studying tensor compression schemes is the so-called tensor-rank , compare [12,13,21]. Thus, instead of O(N n ) storage, as less as O(nNr 3 ) or even only O(nNr 2 ) storage is needed, where N denotes the number of data points in one coordinate direction, n denotes the dimension of the tensor under consideration and r denotes the respective tensor rank of the data. The cost complexity of the various algorithms working with sparse tensor representations is correspondingly reduced and working in a sparse tensor format allows to alleviate or to completely break the curse of dimension for suitable tensor data classes, i.e., for sufficiently small r.
However, the question where the tensor data stem from and the issue of the accuracy of the full tensor approximation, i.e., the discretization error of the full tensor itself and its relation to the error of a subsequent low-rank tensor approximation, is usually not adequately addressed. 1 Instead, only the approximation property of a low-rank tensor scheme with respect to the full tensor data is considered. But the former question is important since it clearly makes no sense to derive a tensor approximation with an error that is substantially smaller than the error which is already inherent in the full tensor data due to some discretization process for a continuous highdimensional function which stems from some certain function class.
The approximation rates to continuous functions can be determined by a recursive use of the singular value decomposition, which is successively applied to convert the function into a specific continuous tensor format. We studied the singular value decomposition for arbitrary domains in [19,20] and we now can apply these results to discuss approximation rates of continuous tensor formats. In the present article, given a function f ∈ H k (Ω 1 × · · · × Ω m ), we study the continuous analogues of the Tucker tensor decomposition and of the tensor train decomposition. We give bounds on the ranks required to ensure that the tensor decomposition admits a prescribed target accuracy. Especially, our analysis takes into account the influence of errors induced by truncating infinite expansions to finite ones. We therefore study an algorithm that computes the desired tensor expansion which is in contrast to the question of the smallest tensor-rank. We finally show that (isotropic) Sobolev spaces with dimension weights help to beat the curse of dimension when the number m of product domains tends to infinity.
Besides the simple situation of Ω 1 = · · · = Ω m = [0, 1], which is usually considered in case of tensor decompositions, there are many more applications of our general setting. For example, non-Newtonian flow can be modeled by a coupled system which consists of the Navier Stokes equation for the flow in a three-dimensional geometry described by Ω 1 and of the Fokker-Planck equation in a 3(ℓ − 1)-dimensional configuration space Ω 2 × · · · × Ω ℓ , consisting of ℓ − 1 spheres. Here, ℓ denotes the number of atoms in a chain-like molecule which constitutes the non-Newtonian behavior of the flow, for details see [4,29,31,37]. Another example is homogenization. After unfolding [10], a two-scale homogenization problem gives raise to the product of the macroscopic physical domain and the periodic microscopic domain of the cell problem, see [32]. For multiple scales, several periodic microscopic domains appear which reflect the different separable scales, see e.g. [27]. Also the m-th moment of linear elliptic boundary value problems with random source terms, i.e. Au(ω) = f (ω) in Ω, are known to satisfy a deterministic partial differential equation on the m-fold product domain Ω × · · · × Ω. There, the solution's m-th moment M u is given by the equation see [40,41]. This approach extends to boundary value problems with random diffusion and to random domains as well [9,25]. Moreover, we find the product of several domains in quantum mechanics for e.g. the Schrödinger equation or the Langevin equation, where each domain is three-dimensional and corresponds to a single particle. Finally, we encounter it in uncertainty quantification, where one has the product of the physical domain Ω 1 and of in general infinitely many intervals Ω 2 = Ω 3 = Ω 4 = . . . for the random input parameter, which reflects its series expansion by the Karhunen-Lòeve decomposition or the Lévy-Ciesielski decomposition.
The remainder of this article is organized as follows: In Section 2, we give a short introduction to our results on the singular value decomposition, which are needed to derive the estimates for the continuous tensor decompositions. Then, in Section 3, we study the continuous Tucker tensor format, computed by means of the higher-oder singular value decomposition. Next, we study the continuous tensor train decomposition in Section 4, computed by means of a repeated use of a vector-valued singular value decomposition. Finally, Section 5 concludes with some final remarks.
Throughout this article, to avoid the repeated use of generic but unspecified constants, we denote by C D that C is bounded by a multiple of D independently of parameters which C and D may depend on. Obviously, C D is defined as D C, and C ∼ D as C D and C D. Moreover, given a Lipschitz-smooth domain Ω ⊂ R n , L 2 (Ω) means the space of square integrable functions on Ω. For real numbers k ≥ 0, the associated Sobolev space is denoted by H k (Ω), where its norm · H k (Ω) is defined in the standard way, compare [33,45]. As usual, we have H 0 (Ω) = L 2 (Ω). The seminorm in H k (Ω) is denoted by | · | H k (Ω 1 ) . Although not explicitly written, our subsequent analysis covers also the situation of Ω being not a domain but a (smooth) manifold.

Definition and calculation.
Let Ω 1 ⊂ R n 1 and Ω 2 be Lipschitz-smooth domains. To represent functions f ∈ L 2 (Ω 1 ×Ω 2 ) on the tensor product domain Ω 1 ×Ω 2 in an efficient way, we will consider low-rank approximations which separate the variables x ∈ Ω 1 and y ∈ Ω 2 in accordance with It is well known (see e.g. [30,38,42]) that the best possible representation (2.1) in the L 2 -sense is given by the singular value decomposition, also called Karhunen-Lòeve expansion. 2 Then, the coefficients λ(α) ∈ R are the singular values and the ϕ(α) ∈ L 2 (Ω 1 ) and ψ(α) ∈ L 2 (Ω 2 ) are the left and right (L 2 -normalized) eigenfunctions of the integral operator This means that f (x, y)v(y) dy. 2 We refer the reader to [44] for a comprehensive historical overview on the singular value decomposition.

2.2.
Regularity of the eigenfunctions. Now, we consider functions f ∈ H k (Ω 1 × Ω 2 ). In the following, we collect results from [19,20] concerning the singular value decomposition of such functions. We repeat the proof whenever needed for having explicit constants. To this end, we define the mixed Sobolev space H k,ℓ mix (Ω 1 × Ω 2 ) as a tensor product of Hilbert spaces which we equip with the usual cross norm .
Note that are continuous with Proof. According to (2.2) and Lemma 2.1, we have This proves the first assertion. The second assertion follows by duality.
As an immediate consequence of Lemma 2.2, we obtain . We will show later in Lemma 2.6 how to improve this estimate by sacrificing some regularity.
2.3. Truncation error. We next give estimates on the decay rate of the eigenvalues of the integral operator To this end, we exploit the smoothness in the function's first variable and assume hence f ∈ H k,0 mix (Ω 1 × Ω 2 ). We introduce finite element spaces U r ⊂ L 2 (Ω 1 ), which consist of r discontinuous, piecewise polynomial functions of total degree ⌈k⌉ on a quasi-uniform triangulation of Ω 1 with mesh width h r ∼ r −1/n 1 . Then, given a function w ∈ H k (Ω 1 ), the L 2orthogonal projection P r : uniformly in r due to the Bramble-Hilbert lemma, see e.g., [6,7].
For the approximation of f (x, y) in the first variable, i.e. (P r ⊗ I)f (x, y), we obtain the following approximation result for the present choice of U r , see [20] for the proof.
By combining this lemma with the approximation estimate (2.6) and in view of λ(α) − λ r (α) ≥ 0 for all α ∈ {1, 2, . . . , r} according to the min-max theorem of Courant-Fischer, see [1] for example, we conclude that the truncation error of the singular value decomposition can be bounded by Since the eigenvalues of the integral operator K f and its adjoint K f are the same, we can also exploit the smoothness of f in the second coordinate by interchanging the roles of Ω 1 and Ω 2 in the above considerations. We thus obtain the following theorem: Then, it holds Remark 2.5. Theorem 2.4 implies that the eigenvalues {λ(α)} α∈N in case of a function f ∈ H k (Ω 1 × Ω 2 ) decay like Having the decay rate of the eigenvalues at hand, we are able to improve the result of Lemma 2.2 by sacrificing some regularity. Note that the proof of this result is based upon an argument from [43].
Finally, we exploit the product structure of L 2 (Ω 1 × Ω 2 ) and the orthonormality of {ψ(α)} α∈N to derive the first assertion, i.e., The second assertion follows in complete analogy.
Consider now a vector-valued function w ∈ [H k (Ω 1 )] m of dimension m. Then, instead of (2.6), we find since w consists of m components and we thus need m-times as many ansatz functions for our approximation argument. Hence, in case of a vector-valued function , we conclude by exploiting the smoothness in the first variable 3 that the truncation error of the singular value decomposition can be estimated by Hence, the decay rate of the singular values is considerably reduced. Finally, we like to remark that Lemma 2.6 holds also in the vector case, i.e., provided that f has extra regularity in terms of f ∈ [H k+n 1 (Ω 1 × Ω 2 )] m . Here, analogously to above, . After these preparations, we now introduce and analyze two types of continuous analogues of tensor formats, namely of the Tucker format [26,49] and of the tensor train format [36,34], and discuss their approximation properties for functions f ∈ H k (Ω 1 × · · · × Ω m ). 3 Note that the kernel function of S ⋆ f S f is matrix-valued while the kernel function of S f S ⋆ f is scalar-valued.

Truncation error.
If we intend to truncate the singular value decomposition (3.11) after r j terms such that the truncation error is bounded by ε, we have to choose We have the following result on the Tucker decomposition: Theorem 3.1. Let f ∈ H k (Ω 1 × · · · × Ω m ) for some fixed k > 0 and 0 < ε < 1.
If the ranks are chosen according to r j = ε −n j /k for all j = 1, . . . , m. Then, the truncation error of the truncated Tucker decomposition is while the storage cost for the core tensor of f TF r 1 ,...,rm are m j=1 r j = ε −(n 1 +···+nm)/k .
Proof. For the approximation of the core tensor, the sets of the univariate eigenfunctions {ϕ j (α j )} r j α j =1 are used for all j = 1, . . . , m, cf. (3.12). Due to orthonormality, we find where we obtain f TF ∞,...,∞ = f in case of j = 1. Since for all j ∈ {1, 2, . . . , m}, we arrive with (3.13) and the summation over j = 1, . . . , m at the desired error estimate. This completes the proof, since the estimate on the number of coefficients in the core tensor is obvious.
3.3. Sobolev spaces with dimension weights. The cost of the core tensor of the Tucker decomposition exhibit the curse of dimension as the number m of subdomains increases. This can be seen most simply for the example n j = n. Then, the cost are ε −nm/k , which expresses the curse of dimension as long as k is not proportional to m. Nonetheless, in case of Sobolev spaces with dimension weights, the curse of dimension can be beaten.
For f ∈ H k+n (Ω m ), we shall discuss the situation m → ∞ in more detail. To this end, we assume that all subdomains are identical to a single domain Ω ⊂ R n of dimension n and note that the limit m → ∞ only makes sense when weights are included in the underlying Sobolev spaces which ensure that higher dimensions become less important. For our proofs we choose as usual m arbitrary but fixed and show the existence of m-independent constants in the convergence and cost estimates.
The Sobolev spaces H k γ (Ω m ) with dimension weights γ ∈ R m we consider are given by all functions f ∈ H k (Ω m ) such that (3.14) γ k j f H k (Ω m ) for all |β| = k and j = 1, 2, . . . , m.
The definition in (3.14) means that, given a function f with norm f H k (Ω m ) < ∞, the partial derivatives with respect to x j become less important as the dimension j increases. Such functions appear for example in uncertainty quantification. Let be given a Karhunen-Loève expansion and insert it into a function b : R → R of finite smoothness W k,∞ (R). Then, the function b u(x, y) satisfies (3.14) with respect to the y-variable, where γ j = σ j . Hence, the solution of a given partial differential equation would satisfy a decay estimate similar to (3.14) whenever the stochastic field enters the partial differential equation through a non-smooth coefficient function b, compare [14,23,24] for example.
Then, for all 0 < ε < 1, the error of the continuous Tucker decomposition with ranks (3.16) r j = γ n j j (1+δ)n/k ε −n/k is of order ε while the storage cost for the core tensor of f TF r 1 ,...,rm are bounded by ε −n/k independent of the dimension m.
Proof. In view of Theorem 2.4 and (3.14), we deduce by choosing the ranks as in Therefore, we reach the desired over-all truncation error When the weights γ j decay as in (3.16), then the cost of the core tensor are Hence, the cost of the core tensor stay bounded independently of m since

Tensor train format
4.1. Tensor train decomposition. For the discussion of the continuous tensor train decomposition, we should assume that the domains Ω j ⊂ R n j , j = 1, . . . , m, are arranged in such a way that it holds n 1 ≤ · · · ≤ n m . 5 Now, consider f ∈ H k (Ω 1 × · · · × Ω m ) and separate the variables x 1 ∈ Ω 1 and (x 2 , . . . , x m ) ∈ Ω 2 × · · · × Ω m by the singular value decomposition Since we can separate (α 1 , x 2 ) ∈ N × Ω 2 from (x 3 , . . . , x m ) ∈ Ω 3 × · · · × Ω m by means of a second singular value decomposition and arrive at (4.18) By repeating the last step and successively separating (α j−1 , x j ) ∈ N × Ω j from (x j+1 , . . . , x m ) ∈ Ω j+1 × · · · × Ω m for j = 3, . . . , m − 1 we finally arrive at the 5 The considerations in this section are based upon [5]. Nonetheless, the results derived there are not correct. The authors did not consider the impact of the vector-valued singular value decomposition in a proper way, which indeed does result in the curse of dimension.
In contrast to the Tucker format, we do not obtain a huge core tensor since each of the m − 1 singular value decompositions of the tensor train decomposition removes the actual first spatial domain from the approximant. We just obtain a product of matrix -valued functions (except for the first and last factor which are vector-valued functions), each of which is related with a specific domain Ω j . This especially results in only m − 1 sums in contrast to the m sums for the Tucker format.

Truncation error.
In practice, we truncate the singular value decomposition in step j after r j terms, thus arriving at the representation One readily infers by using again Pythogoras' theorem that the truncation error is bounded by see also [35]. Note that, for j ≥ 2, the singular values {λ j (α)} α∈N in this estimate do not coincide with the singular values from the original continuous tensor train decomposition due to the truncation.
We next shall give bounds on the truncation error. In the j-th step of the algorithm, j = 2, 3, . . . , m − 1, one needs to approximate the vector-valued function by a vector-valued singular value decomposition. This means that we consider the singular value decomposition (2.9) for a vector-valued function in case of the domains Ω j and Ω j+1 × · · · × Ω m .
We summarize our findings in the following theorem, which holds in this form also if the subdomains Ω j ⊂ R n j are not ordered in such a way that n 1 ≤ · · · ≤ n m .
Theorem 4.1. Let f ∈ H k+max{n 1 ,...,nm} (Ω 1 × · · · × Ω m ) for some fixed k > 0 and 0 < ε < 1. Then, the over-all truncation error of the tensor train decomposition with truncation ranks (4.20) is The storage cost for f TT r 1 ,...,r m−1 are given by  If n := n 1 = · · · = n m , then the cost of the tensor train decomposition are O(ε −(2m−1)n/k ). Thus, the cost are quadratic compared to the cost of the Tucker decomposition. However, in practice, one performs m/2 forward steps and m/2 backward steps. This means one computes m/2 steps as described above to successively separate x 1 , x 2 , . . . , x m/2 from the other variables. Then, one performs the algorithm in the opposite direction, i.e., one successively separates x m , x m−1 , . . . , x m/2+1 from the other variables. This way, the over-all cost are reduced to the order O(ε −mn/k ). 6 4.3. Sobolev spaces with dimension weights. Like for the Tucker decomposition, the cost of the tensor train decomposition suffer from the curse of dimension as the number m of subdomains increases. We therefore discuss again appropriately Sobelev spaces with dimension weights, where we assume for reasons of simplicity that all subdomains are identical to a single domain Ω ⊂ R n of dimension n. (Ω m ) for some fixed k > 0 with weights (3.14) that decay like (3.15). For 0 < ε < 1, choose the ranks successively in accordance with  Then, the error of the continuous tensor train decomposition is of order ε while the storage cost of f TT r 1 ,...,rm stay bounded by M exp(ε −n/k ) 2 independent of the dimension m.
Proof. The combination of Theorem 2.4, (3.14) and (4.22) implies ∞ α j =r j +1 λ j (α j ) r j r j−1 −k/n γ k j f H k (Ω m ) ε j 1+δ , j = 1, 2, . . . , M, Hence, as in the proof of Theorem 3.2, the approximation error of the continuous tensor train decomposition is bounded by a multiple of ε independent of m.
This recursively yields Hence, by using that θ = (δ ′ − δ)n/k > 1, we obtain and, hence, are bounded independently of m in view of (4.23).

Discussion and conclusion
In the present article, we considered the continuous versions of the Tucker tensor format and of the tensor train format for the approximation of functions which live on an m-fold product of arbitrary subdomains. By considering (isotropic) Sobolev smoothness, we derived estimates on the ranks to be chosen in order to realize a prescribed target accuracy. These estimates exhibit the curse of dimension.
Both tensor formats have in common that always only the variable with respect to a single domain is separated from the other variables by means of the singular value decomposition. This enables cheaper storage schemes, while the influence of the over-all dimension of the product domain is reduced to a minimum.
We also examined the situation of Sobolev spaces with dimension weights. Having sufficiently fast decaying weights helps to beat the curse of dimension as the number of subdomains tends to infinity. It turned out that algebraically decaying weights are appropriate for both, the Tucker tensor format and the tensor train format.
We finally remark that we considered here only the ranks of the tensor decomposition in the continuous case, i.e., for functions and not for tensors of discrete data. Of course, an additional projection step onto suitable finite dimensional trial spaces on the individual domains would be necessary to arrive at a fully discrete approximation scheme that can really be used in computer simulations. This would impose a further error of discretization type which needs to be balanced with the truncation error of the particular continuous tensor format.