1 Introduction

The efficient approximate representation of multivariate functions is an important task in numerical analysis and scientific computing. In this article, we therefore consider the approximation of functions which live on the product of m bounded domains \(\varOmega _1\times \dots \times \varOmega _m\), where \(\varOmega _j\subset {\mathbb {R}}^{n_j}\). Besides a sparse grid approximation of the function under consideration, as discussed in [8, 17, 18, 50], one can also apply a low-rank approximation by means of a tensor approximation scheme, see, e.g., [15, 21, 22, 34, 35] and the references therein.

The low-rank approximation in the situation of the product of \(m=2\) domains is well understood. It is related to the singular value decomposition and has been studied for arbitrary product domains in [19, 20], see also [46,47,48] for the periodic case. However, the situation is not that clear for the product of \(m>2\) domains, where one ends up with tensor decompositions. Such tensor decompositions are generalizations of the well-known singular value decomposition and of the corresponding two-dimensional low-rank matrix approximation methods to the higher-dimensional setting. There, besides the curse of dimension, we encounter the additional difficulty that, due to the nonexistence of an Eckart–Young–Mirsky theorem, the concepts of singular value decomposition and low-rank approximation can be generalized to higher dimensions in more than one way. Consequently, there exist many generalizations of the singular value decomposition of a function and of low-rank approximations to tensors. Accordingly, various schemes have been developed over the years in different areas of the sciences and have successfully been applied to high-dimensional problems ranging from quantum mechanics and physics via biology and econometrics, computer graphics and signal processing to numerical analysis. Recently, tensor methods have even been recognized as special deep neural networks in machine learning and big data analysis [11, 28]. As tensor approximation schemes, we have, for example, matrix product states, DMRG, MERA, PEPS, CP, CANDECOMP, PARAFAC, Tucker, tensor train, tree tensor networks and hierarchical Tucker, to name a few. A mathematical introduction to tensor methods is given in the seminal book [21], while a survey on existing methods and their literature can be found in [16]. Moreover, various software packages have been developed that provide an algebra of operators for dealing with tensors.

Tensor methods are usually analyzed as low-rank approximations to a full discrete tensor of data with respect to the \(\ell _2\)-norm or Frobenius norm. In this respect, they can be seen as compression methods which may avoid the curse of dimensionality that is inherent in the full tensor representation. The main tool for studying tensor compression schemes is the so-called tensor rank, compare [12, 13, 21]. Thus, instead of \({\mathcal {O}}(N^n)\) storage, as little as \({\mathcal {O}}(nNr^3)\) or even only \({\mathcal {O}}(nNr^2)\) storage is needed, where N denotes the number of data points in one coordinate direction, n denotes the dimension of the tensor under consideration, and r denotes the respective tensor rank of the data. The cost complexity of the various algorithms working with sparse tensor representations is correspondingly reduced, and working in a sparse tensor format makes it possible to alleviate or even completely break the curse of dimension for suitable tensor data classes, i.e., for sufficiently small r.

However, the question where the tensor data stem from and the issue of the accuracy of the full tensor approximation, i.e., the discretization error of the full tensor itself and its relation to the error of a subsequent low-rank tensor approximation, are usually not adequately addressed. Instead, only the approximation property of a low-rank tensor scheme with respect to the full tensor data is considered. But the former question is important since it clearly makes no sense to derive a tensor approximation with an error that is substantially smaller than the error which is already inherent in the full tensor data due to the discretization of a continuous high-dimensional function stemming from a certain function class.

The approximation rates for continuous functions can be determined by a recursive use of the singular value decomposition, which is successively applied to convert the function into a specific continuous tensor format. We studied the singular value decomposition for arbitrary domains in [19, 20], and we can now apply these results to discuss approximation rates of continuous tensor formats. In the present article, given a function \(f\in H^k(\varOmega _1\times \dots \times \varOmega _m)\), we study the continuous analogues of the Tucker tensor decomposition and of the tensor train decomposition. We give bounds on the ranks required to ensure that the tensor decomposition achieves a prescribed target accuracy. In particular, our analysis takes into account the influence of errors induced by truncating infinite expansions to finite ones. We therefore study an algorithm that computes the desired tensor expansion, in contrast to the question of the smallest possible tensor rank. We finally show that (isotropic) Sobolev spaces with dimension weights help to beat the curse of dimension when the number m of product domains tends to infinity.

Besides the simple situation of \(\varOmega _1 = \dots = \varOmega _m = [0,1]\), which is usually considered in case of tensor decompositions, there are many more applications of our general setting. For example, non-Newtonian flow can be modeled by a coupled system which consists of the Navier–Stokes equation for the flow in a three-dimensional geometry described by \(\varOmega _1\) and of the Fokker–Planck equation in a \(3(\ell -1)\)-dimensional configuration space \(\varOmega _2\times \dots \times \varOmega _{\ell }\), consisting of \(\ell -1\) spheres. Here, \(\ell \) denotes the number of atoms in a chain-like molecule which constitutes the non-Newtonian behavior of the flow; for details, see [4, 29, 31, 37]. Another example is homogenization. After unfolding [10], a two-scale homogenization problem gives rise to the product of the macroscopic physical domain and the periodic microscopic domain of the cell problem, see [32]. For multiple scales, several periodic microscopic domains appear which reflect the different separable scales, see, e.g., [27]. Also, the m-th moment of linear elliptic boundary value problems with random source terms, i.e., \(Au(\omega )=f(\omega )\) in \(\varOmega \), is known to satisfy a deterministic partial differential equation on the m-fold product domain \(\varOmega \times \dots \times \varOmega \). There, the solution’s m-th moment \({\mathcal {M}}_u\) is given by the equation

$$\begin{aligned} (A\otimes \dots \otimes A){\mathcal {M}}_u = {\mathcal {M}}_f\ \text {in}\ \varOmega \times \dots \times \varOmega , \end{aligned}$$

see [40, 41]. This approach extends to boundary value problems with random diffusion and to random domains as well [9, 25]. Moreover, we find the product of several domains in quantum mechanics for the Schrödinger equation or the Langevin equation, where each domain is three-dimensional and corresponds to a single particle. Finally, we encounter it in uncertainty quantification, where one has the product of the physical domain \(\varOmega _1\) and of, in general, infinitely many intervals \(\varOmega _2 = \varOmega _3 = \varOmega _4 = \dots \) for the random input parameter, which reflects its series expansion by the Karhunen–Loève decomposition or the Lévy–Ciesielski decomposition.

The remainder of this article is organized as follows: In Sect. 2, we give a short introduction to our results on the singular value decomposition, which are needed to derive the estimates for the continuous tensor decompositions. Then, in Sect. 3, we study the continuous Tucker tensor format, computed by means of the higher-order singular value decomposition. Next, we study the continuous tensor train decomposition in Sect. 4, computed by means of a repeated use of a vector-valued singular value decomposition. Finally, Sect. 5 concludes with some final remarks.

Throughout this article, to avoid the repeated use of generic but unspecified constants, we denote by \(C \lesssim D\) that C is bounded by a multiple of D independently of parameters which C and D may depend on. Obviously, \(C \gtrsim D\) is defined as \(D \lesssim C\), and \(C \sim D\) as \(C \lesssim D\) and \(C \gtrsim D\). Moreover, given a Lipschitz-smooth domain \(\varOmega \subset {\mathbb {R}}^n\), \(L^2(\varOmega )\) denotes the space of square integrable functions on \(\varOmega \). For real numbers \(k\ge 0\), the associated Sobolev space is denoted by \(H^k(\varOmega )\), where its norm \(\Vert \cdot \Vert _{H^k(\varOmega )}\) is defined in the standard way, compare [33, 45]. As usual, we have \(H^0(\varOmega ) = L^2(\varOmega )\). The seminorm in \(H^k(\varOmega )\) is denoted by \(|\cdot |_{H^k(\varOmega )}\). Although not explicitly written, our subsequent analysis also covers the situation of \(\varOmega \) being not a domain but a (smooth) manifold.

2 Singular Value Decomposition

2.1 Definition and Calculation

Let \(\varOmega _1\subset {\mathbb {R}}^{n_1}\) and \(\varOmega _2\subset {\mathbb {R}}^{n_2}\) be Lipschitz-smooth domains. To represent functions \(f\in L^2(\varOmega _1\times \varOmega _2)\) on the tensor product domain \(\varOmega _1\times \varOmega _2\) in an efficient way, we will consider low-rank approximations which separate the variables \({\varvec{x}}\in \varOmega _1\) and \({\varvec{y}}\in \varOmega _2\) in accordance with

$$\begin{aligned} f({\varvec{x}},{\varvec{y}})\approx f_r({\varvec{x}},{\varvec{y}}) := \sum _{\alpha =1}^r \sqrt{\lambda (\alpha )}\varphi ({\varvec{x}},\alpha )\psi (\alpha ,{\varvec{y}}). \end{aligned}$$
(2.1)

It is well known (see, e.g., [30, 38, 42]) that the best possible representation (2.1) in the \(L^2\)-sense is given by the singular value decomposition, also called Karhunen–Loève expansion. Then, the coefficients \(\sqrt{\lambda (\alpha )}\in {\mathbb {R}}\) are the singular values, and the \(\varphi (\alpha )\in L^2(\varOmega _1)\) and \(\psi (\alpha )\in L^2(\varOmega _2)\) are the left and right (\(L^2\)-normalized) eigenfunctions of the integral operator

$$\begin{aligned} {\mathcal {S}}_f:L^2(\varOmega _1)\rightarrow L^2(\varOmega _2), \quad u\mapsto ({\mathcal {S}}_fu)({\varvec{y}}) := \int _{\varOmega _1} f({\varvec{x}},{\varvec{y}})u({\varvec{x}}){\text {d}}\!{\varvec{x}}. \end{aligned}$$

This means that

$$\begin{aligned} \sqrt{\lambda (\alpha )}\psi (\alpha ,{\varvec{y}}) = \big ({\mathcal {S}}_f\varphi (\alpha )\big )({\varvec{y}}) \quad \text {and}\quad \sqrt{\lambda (\alpha )}\varphi ({\varvec{x}},\alpha ) = \big ({\mathcal {S}}_f^\star \psi (\alpha )\big )({\varvec{x}}), \end{aligned}$$
(2.2)

where

$$\begin{aligned} {\mathcal {S}}_f^\star :L^2(\varOmega _2)\rightarrow L^2(\varOmega _1), \quad v\mapsto ({\mathcal {S}}_f^\star v)({\varvec{x}}) := \int _{\varOmega _2} f({\varvec{x}},{\varvec{y}})v({\varvec{y}}){\text {d}}\!{\varvec{y}} \end{aligned}$$

is the adjoint of \({\mathcal {S}}_f\). Particularly, the left and right eigenfunctions \(\{\varphi (\alpha )\}_{\alpha =1}^\infty \) and \(\{\psi (\alpha )\}_{\alpha =1}^\infty \) form orthonormal bases in \(L^2(\varOmega _1)\) and \(L^2(\varOmega _2)\), respectively.

In order to compute the singular value decomposition, we need to solve the eigenvalue problem

$$\begin{aligned} {\mathcal {K}}_f\varphi (\alpha ) = \lambda (\alpha )\varphi (\alpha ) \end{aligned}$$

for the integral operator

$$\begin{aligned} {\mathcal {K}}_f={\mathcal {S}}_f^\star {\mathcal {S}}_f:L^2(\varOmega _1)\rightarrow L^2(\varOmega _1),\quad u\mapsto ({\mathcal {K}}_fu)({\varvec{x}}) := \int _{\varOmega _1} k_f({\varvec{x}},{\varvec{x}}')u({\varvec{x}}'){\text {d}}\!{\varvec{x}}'. \end{aligned}$$
(2.3)

Since \(f\in L^2(\varOmega _1\times \varOmega _2)\), the kernel

$$\begin{aligned} k_f({\varvec{x}},{\varvec{x}}') = \int _{\varOmega _2} f({\varvec{x}},{\varvec{y}}) f({\varvec{x}}',{\varvec{y}}) {\text {d}}\!{\varvec{y}} \in L^2(\varOmega _1\times \varOmega _1) \end{aligned}$$
(2.4)

is a symmetric Hilbert–Schmidt kernel. Hence, there exist countably many eigenvalues

$$\begin{aligned} \lambda (1)\ge \lambda (2)\ge \cdots \ge \lambda (m)\overset{m\rightarrow \infty }{\longrightarrow } 0 \end{aligned}$$

and the associated eigenfunctions \(\{\varphi (\alpha )\}_{\alpha \in {\mathbb {N}}}\) constitute an orthonormal basis in \(L^2(\varOmega _1)\).

Likewise, to obtain an orthonormal basis of \(L^2(\varOmega _2)\), we can solve the eigenvalue problem

$$\begin{aligned} \widetilde{{\mathcal {K}}}_f\psi (\alpha ) = {\widetilde{\lambda }}(\alpha )\psi (\alpha ) \end{aligned}$$

for the integral operator

$$\begin{aligned} \widetilde{{\mathcal {K}}}_f={\mathcal {S}}_f{\mathcal {S}}_f^\star : L^2(\varOmega _2)\rightarrow L^2(\varOmega _2),\quad u\mapsto (\widetilde{{\mathcal {K}}}_fu)({\varvec{y}}) := \int _{\varOmega _2} {\widetilde{k}}_f({\varvec{y}},{\varvec{y}}')u({\varvec{y}}'){\text {d}}\!{\varvec{y}}' \end{aligned}$$

with symmetric Hilbert–Schmidt kernel

$$\begin{aligned} {\widetilde{k}}_f({\varvec{y}},{\varvec{y}}') = \int _{\varOmega _1} f({\varvec{x}},{\varvec{y}}) f({\varvec{x}},{\varvec{y}}') {\text {d}}\!{\varvec{x}} \in L^2(\varOmega _2\times \varOmega _2). \end{aligned}$$
(2.5)

It holds \(\lambda (\alpha ) = {\widetilde{\lambda }}(\alpha )\), and the sequences \(\{\varphi (\alpha )\}\) and \(\{\psi (\alpha )\}\) are related by (2.2).
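
For illustration, the following Python sketch (not part of the analysis) shows how the continuous singular value decomposition can be approximated in practice: the function is sampled on a tensor product grid over \(\varOmega _1=\varOmega _2=[0,1]\), and a matrix singular value decomposition of the suitably scaled sample matrix yields discrete counterparts of \(\sqrt{\lambda (\alpha )}\), \(\varphi (\alpha )\) and \(\psi (\alpha )\). The test function and all names are illustrative choices.

```python
import numpy as np

# Sketch: discrete counterpart of the SVD (2.1) for Omega_1 = Omega_2 = [0, 1].
# We sample f on a tensor grid and use a midpoint-rule quadrature weight h so
# that the matrix SVD approximates the L^2 singular value decomposition.
f = lambda x, y: 1.0 / (1.0 + (x - y) ** 2)       # smooth test function (illustrative)

N = 200
h = 1.0 / N
x = (np.arange(N) + 0.5) * h                      # midpoints in [0, 1]
F = f(x[:, None], x[None, :])                     # sample matrix F[i, j] = f(x_i, y_j)

# SVD of the scaled sample matrix: singular values approximate sqrt(lambda(alpha)).
U, s, Vt = np.linalg.svd(h * F)
phi = U / np.sqrt(h)                              # L^2-normalized left eigenfunctions
psi = Vt / np.sqrt(h)                             # L^2-normalized right eigenfunctions

# Rank-r truncation f_r as in (2.1) and its discrete L^2 error.
r = 5
F_r = (phi[:, :r] * s[:r]) @ psi[:r, :]
err = np.sqrt(h * h * np.sum((F - F_r) ** 2))     # discrete L^2(Omega_1 x Omega_2) norm
print("largest singular values:", s[:5])
print("rank-%d L2 error: %.2e" % (r, err))
```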

2.2 Regularity of the Eigenfunctions

Now, we consider functions \(f\in H^k(\varOmega _1\times \varOmega _2)\). In the following, we collect results from [19, 20] concerning the singular value decomposition of such functions. We repeat proofs whenever needed in order to obtain explicit constants. To this end, we define the mixed Sobolev space \(H_{{\text {mix}}}^{k,\ell }(\varOmega _1\times \varOmega _2)\) as a tensor product of Hilbert spaces

$$\begin{aligned} H_{{\text {mix}}}^{k,\ell }(\varOmega _1\times \varOmega _2) := H^k(\varOmega _1)\otimes H^\ell (\varOmega _2), \end{aligned}$$

which we equip with the usual cross-norm

$$\begin{aligned} \Vert f\Vert _{H_{{\text {mix}}}^{k,\ell }(\varOmega _1\times \varOmega _2)} := \sqrt{\sum _{|\varvec{\alpha }|\le k}\sum _{|\varvec{\beta }|\le \ell } \bigg \Vert \frac{\partial ^{|\varvec{\alpha }|}\partial ^{|\varvec{\beta }|}}{\partial \mathbf{x}^{\varvec{\alpha }}\partial \mathbf{y}^{\varvec{\beta }}} f\bigg \Vert _{L^2(\varOmega _1\times \varOmega _2)}^2}. \end{aligned}$$

Note that

$$\begin{aligned} H^k(\varOmega _1\times \varOmega _2)\subset H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2),\quad H^k(\varOmega _1\times \varOmega _2)\subset H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2). \end{aligned}$$

Lemma 1

Assume that \(f\in H^k(\varOmega _1\times \varOmega _2)\) for some fixed \(k\ge 0\). Then, the operators

$$\begin{aligned} {\mathcal {S}}_f:L^2(\varOmega _1)\rightarrow H^k(\varOmega _2),\quad {\mathcal {S}}_f^\star :L^2(\varOmega _2)\rightarrow H^k(\varOmega _1) \end{aligned}$$

are continuous with

$$\begin{aligned} \big \Vert {\mathcal {S}}_f\big \Vert _{L^2(\varOmega _1)\rightarrow H^k(\varOmega _2)} \le \Vert f\Vert _{H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)},\quad \big \Vert {\mathcal {S}}_f^\star \big \Vert _{L^2(\varOmega _2)\rightarrow H^k(\varOmega _1)} \le \Vert f\Vert _{H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)}. \end{aligned}$$

Proof

From \(H^k(\varOmega _1\times \varOmega _2)\subset H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)\), it follows for \(f\in H^k(\varOmega _1\times \varOmega _2)\) that \(f\in H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)\). Therefore, the operator \({\mathcal {S}}_f:L^2(\varOmega _1) \rightarrow H^k(\varOmega _2)\) is continuous since

$$\begin{aligned} \big \Vert {\mathcal {S}}_fu\big \Vert _{H^k(\varOmega _2)}&= \sup _{\Vert v\Vert _{H^{-k}(\varOmega _2)}=1} ({\mathcal {S}}_fu,v)_{L^2(\varOmega _2)}\\&= \sup _{\Vert v\Vert _{H^{-k}(\varOmega _2)}=1}(f,u\otimes v)_{L^2(\varOmega _1\times \varOmega _2)}\\&\le \sup _{\Vert v\Vert _{H^{-k}(\varOmega _2)}=1}\Vert f\Vert _{H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)} \Vert u\otimes v\Vert _{H_{{\text {mix}}}^{0,-k}(\varOmega _1\times \varOmega _2)}\\&\le \Vert f\Vert _{H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)}\Vert u\Vert _{L^2(\varOmega _1)}. \end{aligned}$$

Note that we have used here \(H_{{\text {mix}}}^{0,-k}(\varOmega _1\times \varOmega _2) = L^2(\varOmega _1)\otimes H^{-k}(\varOmega _2)\). Proceeding likewise for \({\mathcal {S}}_f^\star : L^2(\varOmega _2)\rightarrow H^k(\varOmega _1)\) completes the proof. \(\square \)

Lemma 2

Assume that \(f\in H^k(\varOmega _1\times \varOmega _2)\) for some fixed \(k\ge 0\). Then, it holds \({\mathcal {S}}_f\varphi (\alpha )\in H^k(\varOmega _2)\) and \({\mathcal {S}}_f^\star \psi (\alpha )\in H^k(\varOmega _1)\) for all \(\alpha \in {\mathbb {N}}\) with

$$\begin{aligned} \Vert \varphi (\alpha )\Vert _{H^k(\varOmega _1)}&\le \frac{1}{\sqrt{\lambda (\alpha )}} \Vert f\Vert _{H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)},\\ \Vert \psi (\alpha )\Vert _{H^k(\varOmega _2)}&\le \frac{1}{\sqrt{\lambda (\alpha )}} \Vert f\Vert _{H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)}. \end{aligned}$$

Proof

According to (2.2) and Lemma 1, we have

$$\begin{aligned} \Vert \varphi (\alpha )\Vert _{H^k(\varOmega _1)} = \frac{1}{\sqrt{\lambda (\alpha )}} \big \Vert {\mathcal {S}}_f^\star \psi (\alpha )\big \Vert _{H^k(\varOmega _1)}\le \frac{1}{\sqrt{\lambda (\alpha )}} \Vert f\Vert _{H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)}\Vert \psi (\alpha )\Vert _{L^2(\varOmega _2)}. \end{aligned}$$

This proves the first assertion. The second assertion follows by duality. \(\square \)

As an immediate consequence of Lemma 2, we obtain

$$\begin{aligned} \sum _{\alpha =1}^r\lambda (\alpha )\Vert \varphi (\alpha )\Vert _{H^k(\varOmega _1)}^2 \le r\Vert f\Vert _{H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)}^2 \end{aligned}$$

and

$$\begin{aligned} \sum _{\alpha =1}^r\lambda (\alpha )\Vert \psi (\alpha )\Vert _{H^k(\varOmega _2)}^2 \le r\Vert f\Vert _{H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)}^2. \end{aligned}$$

We will show later in Lemma 4 how to improve this estimate by sacrificing some regularity.

2.3 Truncation Error

We next give estimates on the decay rate of the eigenvalues of the integral operator \({\mathcal {K}}_f = {\mathcal {S}}_f^\star {\mathcal {S}}_f\) with kernel (2.4). To this end, we exploit the smoothness in the function's first variable and hence assume \(f\in H_{{\text {mix}}}^{k,0}(\varOmega _1 \times \varOmega _2)\). We introduce finite element spaces \(U_r\subset L^2(\varOmega _1)\) of dimension r, consisting of discontinuous, piecewise polynomial functions of total degree \(\lceil k\rceil \) on a quasi-uniform triangulation of \(\varOmega _1\) with mesh width \(h_r\sim r^{-1/n_1}\). Then, given a function \(w\in H^k(\varOmega _1)\), the \(L^2\)-orthogonal projection \(P_r:L^2(\varOmega _1)\rightarrow U_r\) satisfies

$$\begin{aligned} \Vert (I-P_r)w \Vert _{L^2(\varOmega _1)} \le c_k r^{-k/n_1}|w|_{H^k(\varOmega _1)} \end{aligned}$$
(2.6)

uniformly in \(r\) due to the Bramble–Hilbert lemma, see, e.g., [6, 7].
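
As a small illustration of the rate in (2.6), the following sketch computes the \(L^2\)-orthogonal projection onto piecewise constants on r equal cells of [0,1], corresponding to \(n_1=1\) and \(k=1\); the test function and all names are illustrative, and the quadrature is a simple midpoint rule.

```python
import numpy as np

# Sketch: L^2-orthogonal projection onto piecewise constants on [0, 1]
# (n_1 = 1, k = 1), illustrating the rate r^{-k/n_1} from (2.6).
w = lambda x: np.sin(3.0 * np.pi * x)              # illustrative test function

def projection_error(r, n_quad=200):
    """Discrete L^2 error of the piecewise-constant projection on r equal cells."""
    err2 = 0.0
    for i in range(r):
        a, b = i / r, (i + 1) / r
        hq = (b - a) / n_quad
        t = a + (np.arange(n_quad) + 0.5) * hq     # midpoint quadrature nodes
        y = w(t)
        err2 += hq * np.sum((y - y.mean()) ** 2)   # cell average = L^2 projection
    return np.sqrt(err2)

for r in (8, 16, 32, 64):
    print(r, "%.3e" % projection_error(r))         # errors decay roughly like r^{-1}
```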

For the approximation of \(f({\varvec{x}},{\varvec{y}})\) in the first variable, i.e., \(\big ((P_r\otimes I)f\big )({\varvec{x}},{\varvec{y}})\), we obtain the following approximation result for the present choice of \(U_r\), see [20] for the proof.

Lemma 3

Assume that \(f\in H^k(\varOmega _1\times \varOmega _2)\) for some fixed \(k\ge 0\). Let \(\lambda (1)\ge \lambda (2)\ge \ldots \ge 0\) be the eigenvalues of the operator \({\mathcal {K}}_f={\mathcal {S}}_f^\star {\mathcal {S}}_f\) and \(\lambda _r(1)\ge \lambda _r(2)\ge \ldots \ge \lambda _r(r)\ge 0\) those of \({\mathcal {K}}_f^{\,r}:= P_r{\mathcal {K}}_fP_r\). Then, it holds

$$\begin{aligned} \Vert f-(P_r\otimes I) f\Vert _{L^2(\varOmega _1\times \varOmega _2)}^2 = {\text {trace}}{\mathcal {K}}_f-{\text {trace}}{\mathcal {K}}_{f}^{\,r} = \sum _{\alpha =1}^{r}\big (\lambda (\alpha )-\lambda _r(\alpha )\big ) + \sum _{\alpha =r+1}^\infty \lambda (\alpha ). \end{aligned}$$

By combining this lemma with the approximation estimate (2.6) and in view of \(\lambda (\alpha )-\lambda _r(\alpha ) \ge 0\) for all \(\alpha \in \{1,2,\ldots ,r\}\) according to the min-max theorem of Courant–Fischer, see [1], for example, we conclude that the truncation error of the singular value decomposition can be bounded by

$$\begin{aligned} \Bigg \Vert f\!-\!\sum _{\alpha =1}^r\sqrt{\lambda (\alpha )} \big (\varphi (\alpha )\otimes \psi (\alpha )\big )\Bigg \Vert _{L^2(\varOmega _1\times \varOmega _2)}\!\!=\!\! \sqrt{\sum _{\alpha =r+1}^\infty \lambda (\alpha )} \!\le \! c_k r^{-\frac{k}{n_1}}|f|_{H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)}. \end{aligned}$$

Since the nonzero eigenvalues of the integral operators \({\mathcal {K}}_f={\mathcal {S}}_f^\star {\mathcal {S}}_f\) and \(\widetilde{{\mathcal {K}}}_f={\mathcal {S}}_f{\mathcal {S}}_f^\star \) coincide, we can also exploit the smoothness of f in the second coordinate by interchanging the roles of \(\varOmega _1\) and \(\varOmega _2\) in the above considerations. We thus obtain the following theorem:

Theorem 1

Let \(f\in H^k(\varOmega _1\times \varOmega _2)\) for some fixed \(k\ge 0\), and let

$$\begin{aligned} f_r^{{\text {SVD}}} = \sum _{\alpha =1}^r\sqrt{\lambda (\alpha )} \big (\varphi (\alpha )\otimes \psi (\alpha )\big ). \end{aligned}$$

Then, it holds

$$\begin{aligned} \Vert f-f_r^{{\text {SVD}}}\Vert _{L^2(\varOmega _1\times \varOmega _2)} = \sqrt{\sum _{\alpha =r+1}^\infty \lambda (\alpha )} \le c_k r^{-\frac{k}{\min \{n_1,n_2\}}} |f|_{H^k(\varOmega _1\times \varOmega _2)}. \end{aligned}$$
(2.7)

Remark 1

Theorem 1 implies that the eigenvalues \(\{\lambda (\alpha )\}_{\alpha \in {\mathbb {N}}}\) in case of a function \(f\in H^k(\varOmega _1\times \varOmega _2)\) decay like

$$\begin{aligned} \lambda (\alpha )\lesssim \alpha ^{-\frac{2k}{\min \{n_1,n_2\}}-1} \quad \text {as } \alpha \rightarrow \infty . \end{aligned}$$
(2.8)
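
The decay (2.8) can be observed numerically. The following sketch (with \(n_1=n_2=1\) and ad hoc test functions, not taken from the article) compares the eigenvalues obtained from a sampled kernel of limited Sobolev smoothness with those of an analytic one.

```python
import numpy as np

# Sketch: empirical check of the eigenvalue decay (2.8) on [0,1] x [0,1]
# (n_1 = n_2 = 1). Both kernels are illustrative: |x - y| has limited Sobolev
# smoothness and shows algebraic decay, whereas the analytic kernel exp(-x*y)
# decays much faster.
N = 400
h = 1.0 / N
x = (np.arange(N) + 0.5) * h

for name, f in (("|x - y|  ", lambda u, v: np.abs(u - v)),
                ("exp(-x*y)", lambda u, v: np.exp(-u * v))):
    s = np.linalg.svd(h * f(x[:, None], x[None, :]), compute_uv=False)
    lam = s ** 2                                   # eigenvalues lambda(alpha) = s(alpha)^2
    print(name, ["%.1e" % v for v in lam[:6]])
```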

Having the decay rate of the eigenvalues at hand, we are able to improve the result of Lemma 2 by sacrificing some regularity. Note that the proof of this result is based upon an argument from [43].

Lemma 4

Assume that \(f\in H^{k+\min \{n_1,n_2\}}(\varOmega _1\times \varOmega _2)\) for some fixed \(k\ge 0\). Then, it holds

$$\begin{aligned} \sum _{\alpha =1}^\infty \lambda (\alpha )\Vert \varphi (\alpha )\Vert _{H^k(\varOmega _1)}^2 =\Vert f\Vert _{H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)}^2 \end{aligned}$$

and

$$\begin{aligned} \sum _{\alpha =1}^\infty \lambda (\alpha )\Vert \psi (\alpha )\Vert _{H^k(\varOmega _2)}^2 =\Vert f\Vert _{H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)}^2. \end{aligned}$$

Proof

Without loss of generality, we assume \(n_1\!\le \! n_2\). Then, since \(f\!\in \! H^{k+n_1}(\varOmega _1\!\times \!\varOmega _2)\), we conclude from (2.8) that

$$\begin{aligned} \lambda (\alpha )\lesssim \alpha ^{-\frac{2(k+n_1)}{n_1}-1} \quad \text {as } \alpha \rightarrow \infty , \end{aligned}$$

where we used that \(n_1=\min \{n_1,n_2\}\). Moreover, by interpolating between \(L^2(\varOmega _1)\) and \(H^{k+n_1}(\varOmega _1)\), compare [33, 45], for example, we find

$$\begin{aligned} \Vert \varphi (\alpha )\Vert _{H^k(\varOmega _1)}^2 \lesssim \lambda (\alpha )^{-\frac{k}{k+n_1}}, \end{aligned}$$

that is

$$\begin{aligned} \lambda (\alpha )\Vert \varphi (\alpha )\Vert _{H^k(\varOmega _1)}^2 \lesssim \lambda (\alpha )^{\frac{n_1}{k+n_1}}. \end{aligned}$$

As a consequence, we infer that

$$\begin{aligned} \lambda (\alpha )\Vert \varphi (\alpha )\Vert _{H^k(\varOmega _1)}^2 \lesssim \alpha ^{-(\frac{2(k+n_1)}{n_1}+1)\cdot \frac{n_1}{k+n_1}} = \alpha ^{-(2+\delta )} \end{aligned}$$

with \(\delta = \frac{n_1}{k+n_1}>0\). Therefore, it holds

$$\begin{aligned} \sum _{\alpha =1}^{\infty }\alpha ^{(1+\delta ')}\lambda (\alpha )\Vert \varphi (\alpha )\Vert _{H^k(\varOmega _1)}^2 < \infty \end{aligned}$$

for any \(\delta '\in (0,\delta )\). Hence, the series

$$\begin{aligned} A({\varvec{x}}) := \sum _{\alpha =1}^{\infty }\alpha ^{(1+\delta ')}\lambda (\alpha ) |\partial _{{\varvec{x}}}^{\varvec{\beta }}\varphi (\alpha ,{\varvec{x}})|^2 \end{aligned}$$

converges for almost all \({\varvec{x}}\in \varOmega _1\), provided that \(|\varvec{\beta }|\le k\). Likewise, the series

$$\begin{aligned} B({\varvec{y}}) := \sum _{\alpha =1}^{\infty }\alpha ^{-(1+\delta ')} |\psi (\alpha ,{\varvec{y}})|^2 \end{aligned}$$

converges for almost all \({\varvec{y}}\in \varOmega _2\). Thus, the series

$$\begin{aligned} \sum _{\alpha =1}^{\infty }\sqrt{\lambda (\alpha )}|\partial _{{\varvec{x}}}^{\varvec{\beta }} \varphi (\alpha ,{\varvec{x}})||\psi (\alpha ,{\varvec{y}})| \le \sqrt{A({\varvec{x}})}\sqrt{B({\varvec{y}})} \end{aligned}$$

converges for almost all \({\varvec{x}}\in \varOmega _1\) and \({\varvec{y}}\in \varOmega _2\), provided that \(|\varvec{\beta }|\le k\). Because of Egorov’s theorem, the pointwise absolute convergence almost everywhere implies uniform convergence. Hence, we can switch differentiation and summation to get

$$\begin{aligned} \Vert f\Vert _{H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)}^2&= \sum _{|\varvec{\beta }|\le k}\Bigg \Vert \partial _{{\varvec{x}}}^{\varvec{\beta }} \sum _{\alpha =1}^{\infty }\sqrt{\lambda (\alpha )} \big (\varphi (\alpha )\otimes \psi (\alpha )\big )\Bigg \Vert _{L^2(\varOmega _1\times \varOmega _2)}^2\\&= \sum _{|\varvec{\beta }|\le k}\Bigg \Vert \sum _{\alpha =1}^{\infty } \sqrt{\lambda (\alpha )}\big (\partial _{{\varvec{x}}}^{\varvec{\beta }} \varphi (\alpha )\otimes \psi (\alpha )\big )\Bigg \Vert _{L^2(\varOmega _1\times \varOmega _2)}^2. \end{aligned}$$

Finally, we exploit the product structure of \(L^2(\varOmega _1\times \varOmega _2)\) and the orthonormality of \(\{\psi (\alpha )\}_{\alpha \in {\mathbb {N}}}\) to derive the first assertion, i.e.,

$$\begin{aligned} \Vert f\Vert _{H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)}^2&= \sum _{\alpha =1}^{\infty }\lambda (\alpha )\sum _{|\varvec{\beta }|\le k} \big \Vert \partial _{{\varvec{x}}}^{\varvec{\beta }} \varphi (\alpha )\big \Vert _{L^2(\varOmega _1)}^2\Vert \psi (\alpha )\Vert _{L^2(\varOmega _2)}^2\\&= \sum _{\alpha =1}^{\infty }\lambda (\alpha ) \Vert \varphi (\alpha )\Vert _{H^k(\varOmega _1)}^2. \end{aligned}$$

The second assertion follows in complete analogy. \(\square \)

2.4 Vector-Valued Functions

In addition to the aforementioned results, we will also need the following result, which concerns the approximation of vector-valued functions. Here and in the sequel, given a domain \(\varOmega \subset {\mathbb {R}}^n\), a vector-valued function \({\varvec{w}} = [w(\alpha )]_{\alpha =1}^m\) is an element of \([L^2(\varOmega )]^m\) or \([H^k(\varOmega )]^m\), respectively, if the norms

$$\begin{aligned} \Vert {\varvec{w}}\Vert _{[L^2(\varOmega )]^m} = \sqrt{\sum _{\alpha =1}^m \Vert w(\alpha )\Vert _{L^2(\varOmega )}^2},\quad \Vert {\varvec{w}}\Vert _{[H^k(\varOmega )]^m} = \sqrt{\sum _{\alpha =1}^m \Vert w(\alpha )\Vert _{H^k(\varOmega )}^2} \end{aligned}$$

are finite. Likewise, the seminorm is defined in \([H^k(\varOmega )]^m\).

Consider now a vector-valued function \({\varvec{w}}\in [H^k(\varOmega _1)]^m\) of dimension m. Then, instead of (2.6), we find

$$\begin{aligned} \Vert (I-P_r){\varvec{w}}\Vert _{[L^2(\varOmega _1)]^m} \le c_k \bigg (\frac{r}{m}\bigg )^{-k/n_1}|{\varvec{w}}|_{[H^k(\varOmega _1)]^m}, \end{aligned}$$

since \({\varvec{w}}\) consists of m components and we thus need m-times as many ansatz functions for our approximation argument. Hence, in case of a vector-valued function \({\varvec{f}}\in [H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)]^m\simeq [H^k(\varOmega _1)]^m \otimes L^2(\varOmega _2)\), we conclude by exploiting the smoothness in the first variable that the truncation error of the singular value decomposition can be estimated by

$$\begin{aligned} \bigg \Vert {\varvec{f}}-\sum _{\alpha =1}^r\sqrt{\lambda (\alpha )} \big (\varvec{\varphi }(\alpha )\otimes \psi (\alpha )\big )\Bigg \Vert _{[L^2(\varOmega _1\times \varOmega _2)]^m}\!\!\! \le c_k \bigg (\frac{r}{m}\bigg )^{-k/n_1}\!|{\varvec{f}}|_{[H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)]^m}.\nonumber \\ \end{aligned}$$
(2.9)

Hence, the decay rate of the singular values is considerably reduced. Finally, we would like to remark that Lemma 4 also holds in the vector-valued case, i.e.,

$$\begin{aligned} \sum _{\alpha =1}^\infty \lambda (\alpha )\Vert \varvec{\varphi }(\alpha )\Vert _{[H^k(\varOmega _1)]^m}^2 = \Vert {\varvec{f}}\Vert _{[H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)]^m}^2 \end{aligned}$$

and

$$\begin{aligned} \sum _{\alpha =1}^\infty \lambda (\alpha )\Vert \psi (\alpha )\Vert _{H^k(\varOmega _2)}^2 =\Vert {\varvec{f}}\Vert _{[H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)]^m}^2, \end{aligned}$$
(2.10)

provided that \({\varvec{f}}\) has extra regularity in terms of \({\varvec{f}}\in [H^{k+n_1}(\varOmega _1\times \varOmega _2)]^m\). Here, analogously to above, \([H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)]^m\simeq [L^2(\varOmega _1)]^m\otimes H^k(\varOmega _2)\).
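
In discrete form, the vector-valued singular value decomposition used in (2.9) amounts to stacking the m components along the \(\varOmega _1\)-direction and computing a single matrix singular value decomposition, as the following sketch with illustrative data and names indicates.

```python
import numpy as np

# Sketch: discrete vector-valued singular value decomposition as used in (2.9).
# A function f : Omega_1 x Omega_2 -> R^m is sampled as an array of shape
# (m, N1, N2); stacking the m components along the Omega_1 direction reduces
# the task to one matrix SVD of size (m*N1) x N2. Data and names are illustrative.
m, N1, N2 = 3, 100, 120
h1, h2 = 1.0 / N1, 1.0 / N2
x = (np.arange(N1) + 0.5) * h1
y = (np.arange(N2) + 0.5) * h2

# Illustrative sample: component a is cos((a+1)*pi*x) * exp(-(a+1)*y).
F = np.stack([np.cos((a + 1) * np.pi * x)[:, None] * np.exp(-(a + 1) * y)[None, :]
              for a in range(m)])                  # shape (m, N1, N2)

M = F.reshape(m * N1, N2) * np.sqrt(h1 * h2)       # scaled stacked unfolding
U, s, Vt = np.linalg.svd(M, full_matrices=False)

r = 2
err = np.linalg.norm(M - (U[:, :r] * s[:r]) @ Vt[:r, :])
print("singular values:", ["%.2e" % v for v in s[:4]])
print("rank-%d truncation error in [L^2]^m: %.2e" % (r, err))
```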

After these preparations, we now introduce and analyze two types of continuous analogues of tensor formats, namely of the Tucker format [26, 49] and of the tensor train format [34, 36], and discuss their approximation properties for functions \(f\in H^k(\varOmega _1\times \dots \times \varOmega _m)\).

3 Tucker Tensor Format

3.1 Tucker Decomposition

We shall consider from now on a product domain which consists of m different domains \(\varOmega _j\subset {\mathbb {R}}^{n_j}\), \(j=1,\ldots ,m\). For given \(f\in L^2(\varOmega _1\times \dots \times \varOmega _m)\) and \(j\in \{1,2,\ldots ,m\}\), we apply the singular value decomposition to separate the variables \({\varvec{x}}_j\in \varOmega _j\) and \(({\varvec{x}}_1,\ldots , {\varvec{x}}_{j-1},{\varvec{x}}_{j+1},\ldots ,{\varvec{x}}_m)\in \varOmega _1 \times \dots \times \varOmega _{j-1}\times \varOmega _{j+1}\times \dots \times \varOmega _m\). We hence get

$$\begin{aligned} \begin{aligned}&f({\varvec{x}}_1,\ldots ,{\varvec{x}}_{j-1},{\varvec{x}}_j,{\varvec{x}}_{j+1},\ldots ,{\varvec{x}}_m)\\&\quad = \sum _{\alpha _j=1}^\infty \sqrt{\lambda _j(\alpha _j)}\varphi _j({\varvec{x}}_j,\alpha _j) \psi _j(\alpha _j,{\varvec{x}}_1,\ldots ,{\varvec{x}}_{j-1},{\varvec{x}}_{j+1},\ldots ,{\varvec{x}}_m), \end{aligned} \end{aligned}$$
(3.1)

where the left eigenfunctions \(\{\varphi _j(\alpha _j)\}_{\alpha _j\in {\mathbb {N}}}\) form an orthonormal basis in \(L^2(\varOmega _j)\). Consequently, if we iterate over all \(j\in \{1,2,\ldots ,m\}\), this yields an orthonormal basis \(\{\varphi _1(\alpha _1)\otimes \cdots \otimes \varphi _m(\alpha _m)\}_{\varvec{\alpha }\in {\mathbb {N}}^m}\) of \(L^2(\varOmega _1\times \dots \times \varOmega _m)\), and we arrive at the representation

$$\begin{aligned} f({\varvec{x}}_1,\ldots ,{\varvec{x}}_m) = \sum _{\alpha _1=1}^\infty \cdots \sum _{\alpha _m=1}^\infty \omega (\varvec{\alpha })\varphi _1(\alpha _1,{\varvec{x}}_1)\cdots \varphi _m(\alpha _m,{\varvec{x}}_m). \end{aligned}$$
(3.2)

Herein, the tensor \(\big [\omega (\varvec{\alpha })\big ]_{\varvec{\alpha }\in {\mathbb {N}}^m}\) is the core tensor, where a single coefficient is given by

$$\begin{aligned} \omega (\alpha _1,\ldots ,\alpha _m) \!= \!\!\int _{\varOmega _1\times \dots \times \varOmega _m}\!\!\! f({\varvec{x}}_1,\ldots ,{\varvec{x}}_m) \varphi _1(\alpha _1,{\varvec{x}}_1) \cdots \varphi _m(\alpha _m,{\varvec{x}}_m){\text {d}}\!\,({\varvec{x}}_1,\ldots ,{\varvec{x}}_m). \end{aligned}$$
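
A discrete counterpart of this construction is the higher-order singular value decomposition of a sampled tensor: for each j, the leading left singular vectors of the mode-j unfolding play the role of the eigenfunctions \(\varphi _j(\alpha _j)\), and the core tensor collects the coefficients \(\omega (\varvec{\alpha })\). The following Python sketch (with an illustrative test function and ad hoc ranks, not taken from the article) demonstrates this for \(m=3\).

```python
import numpy as np

# Sketch: discrete truncated Tucker decomposition (higher-order SVD) mirroring
# (3.1)-(3.2) for m = 3 and Omega_1 = Omega_2 = Omega_3 = [0, 1].
N = 60
h = 1.0 / N
x = (np.arange(N) + 0.5) * h
X1, X2, X3 = np.meshgrid(x, x, x, indexing="ij")
F = 1.0 / (1.0 + X1 + 2.0 * X2 + 3.0 * X3)            # sampled test function

def mode_unfold(T, j):
    """Mode-j unfolding: variable x_j versus all remaining variables."""
    return np.moveaxis(T, j, 0).reshape(T.shape[j], -1)

ranks = (6, 6, 6)
factors = []
for j in range(3):
    U, _, _ = np.linalg.svd(mode_unfold(F, j), full_matrices=False)
    factors.append(U[:, :ranks[j]])                   # discrete phi_j(alpha_j)

# Core tensor: contract F with the factor matrices in every mode.
core = np.einsum("abc,ai,bj,ck->ijk", F, *factors)
# Reconstruction from the Tucker format and its discrete L^2 error.
F_tucker = np.einsum("ijk,ai,bj,ck->abc", core, *factors)
err = np.sqrt(h ** 3 * np.sum((F - F_tucker) ** 2))
print("core shape:", core.shape, " L2 error: %.2e" % err)
```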

3.2 Truncation Error

If we intend to truncate the singular value decomposition (3.1) after \(r_j\) terms such that the truncation error is bounded by \(\varepsilon \), we have to choose

$$\begin{aligned} \sqrt{\sum _{\alpha _j = r_j+1}^\infty \lambda _j(\alpha _j)} \lesssim r_j^{-k/n_j}|f|_{H^k(\varOmega _1\times \dots \times \varOmega _m)} \overset{!}{\lesssim }\varepsilon \qquad \Longrightarrow \qquad r_j = \varepsilon ^{-{n_j}/k} \end{aligned}$$
(3.3)

according to Theorem 1. Doing so for all \(j\in \{1,2,\ldots ,m\}\), we obtain the approximation

$$\begin{aligned} f_{r_1,\ldots ,r_m}^{{\text {TF}}}({\varvec{x}}_1,\ldots ,{\varvec{x}}_m) = \sum _{\alpha _1=1}^{r_1}\cdots \sum _{\alpha _m=1}^{r_m} \omega (\alpha _1,\ldots ,\alpha _m)\varphi _1(\alpha _1,{\varvec{x}}_1) \cdots \varphi _m(\alpha _m,{\varvec{x}}_m). \end{aligned}$$

We have the following result on the Tucker decomposition:

Theorem 2

Let \(f\in H^k(\varOmega _1\times \dots \times \varOmega _m)\) for some fixed \(k>0\) and \(0<\varepsilon <1\). If the ranks are chosen according to \(r_j=\varepsilon ^{-n_j/k}\) for all \(j=1,\dots ,m\), then the truncation error of the truncated Tucker decomposition is

$$\begin{aligned} \big \Vert f-f_{r_1,\ldots ,r_m}^{{\text {TF}}}\big \Vert _{L^2(\varOmega _1\times \dots \times \varOmega _m)} \le \sqrt{\sum _{j=1}^m \sum _{\alpha _j = r_j+1}^\infty \lambda _j(\alpha _j)}\lesssim \sqrt{m}\varepsilon , \end{aligned}$$

while the storage cost for the core tensor of \(f_{r_1,\ldots ,r_m}^{{\text {TF}}}\) is \(\prod _{j=1}^m r_j = \varepsilon ^{-(n_1+\dots +n_m)/k}\).

Proof

For the approximation of the core tensor, the sets of the univariate eigenfunctions \(\{\varphi _j(\alpha _j)\}_{\alpha _j=1}^{r_j}\) are used for all \(j=1,\ldots ,m\), cf. (3.2). Due to orthonormality, we find

$$\begin{aligned}&\big \Vert f-f_{r_1,\ldots ,r_m}^{{\text {TF}}}\big \Vert _{L^2(\varOmega _1\times \dots \times \varOmega _m)}^2\\&\quad = \sum _{j=1}^m \big \Vert f_{r_1,\dots ,r_{j-1},\infty ,\dots ,\infty }^{{\text {TF}}} -f_{r_1,\dots ,r_j,\infty ,\dots ,\infty }^{{\text {TF}}}\big \Vert _{L^2(\varOmega _1\times \dots \times \varOmega _m)}^2, \end{aligned}$$

where we obtain \(f_{\infty ,\ldots ,\infty }^{{\text {TF}}} = f\) in case of \(j=1\). Since

$$\begin{aligned}&\big \Vert f_{r_1,\dots ,r_{j-1},\infty ,\dots ,\infty }^{{\text {TF}}} -f_{r_1,\dots ,r_j,\infty ,\dots ,\infty }^{{\text {TF}}}\big \Vert _{L^2(\varOmega _1\times \dots \times \varOmega _m)}^2\\&\quad \le \big \Vert f_{\infty ,\dots ,\infty }^{{\text {TF}}} -f_{\infty ,\dots ,\infty ,r_j,\infty ,\dots ,\infty }^{{\text {TF}}}\big \Vert _{L^2(\varOmega _1\times \dots \times \varOmega _m)}^2 = \sum _{\alpha _j = r_j+1}^\infty \lambda _j(\alpha _j) \end{aligned}$$

for all \(j\in \{1,2,\ldots ,m\}\), we arrive with (3.3) and the summation over \(j = 1,\ldots ,m\) at the desired error estimate. This completes the proof, since the estimate on the number of coefficients in the core tensor is obvious. \(\square \)
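
For concreteness, the rank choice of Theorem 2 and the resulting core storage can be evaluated as follows; the ranks are rounded up to integers, and all parameter values are illustrative.

```python
import numpy as np

# Sketch: ranks and core-tensor storage of the truncated Tucker decomposition
# according to Theorem 2, for illustrative values of the accuracy eps, the
# smoothness k and the subdomain dimensions n_j.
eps, k = 1e-2, 2.0
n = [3, 1, 1, 1]                                   # e.g. a spatial domain and parameters
ranks = [int(np.ceil(eps ** (-nj / k))) for nj in n]
print("ranks r_j = eps^(-n_j/k):", ranks)
print("core storage prod_j r_j: ", int(np.prod(ranks)))
```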

3.3 Sobolev Spaces with Dimension Weights

The cost of the core tensor of the Tucker decomposition exhibits the curse of dimension as the number m of subdomains increases. This can be seen most simply for the example \(n_j=n\). Then, the cost is \(\varepsilon ^{-nm/k}\), which expresses the curse of dimension as long as k is not proportional to m. Nonetheless, in case of Sobolev spaces with dimension weights, the curse of dimension can be beaten.

For \(f\in H^{k+n}(\varOmega ^m)\), we shall discuss the situation \(m\rightarrow \infty \) in more detail. To this end, we assume that all subdomains are identical to a single domain \(\varOmega \subset {\mathbb {R}}^n\) of dimension n and note that the limit \(m\rightarrow \infty \) only makes sense when weights are included in the underlying Sobolev spaces which ensure that higher dimensions become less important. For our proofs, we choose as usual m arbitrary but fixed and show the existence of m-independent constants in the convergence and cost estimates.

The Sobolev spaces \(H_{\varvec{\gamma }}^k(\varOmega ^m)\) with dimension weights \(\varvec{\gamma }\in {\mathbb {R}}^m\) which we consider are given by all functions \(f\in H^k(\varOmega ^m)\) such that

$$\begin{aligned} \bigg \Vert \frac{\partial ^k f}{\partial {\varvec{x}}_j^{\varvec{\beta }}}\bigg \Vert _{L^2(\varOmega ^m)} \lesssim \gamma _j^k\Vert f\Vert _{H^k(\varOmega ^m)}\ \text {for all}\ |\varvec{\beta }| = k\ \text {and}\ j=1,2,\ldots ,m. \end{aligned}$$
(3.4)

The definition in (3.4) means that, given a function f with norm \(\Vert f\Vert _{H^k(\varOmega ^m)}<\infty \), the partial derivatives with respect to \({\varvec{x}}_j\) become less important as the dimension j increases. Such functions appear, for example, in uncertainty quantification. Suppose we are given a Karhunen–Loève expansion

$$\begin{aligned} u(\mathbf{x},\mathbf{y}) = \sum _{j = 1}^m \sigma _j \varphi _j(\mathbf{x}) y_j, \quad y_j\in [-\nicefrac {1}{2},\nicefrac {1}{2}], \end{aligned}$$

and insert it into a function \(b:{\mathbb {R}}\rightarrow {\mathbb {R}}\) of finite smoothness \(W^{k,\infty }({\mathbb {R}})\). Then, the function \(b\big (u(\mathbf{x},\mathbf{y})\big )\) satisfies (3.4) with respect to the \(\mathbf{y}\)-variable, where \(\gamma _j = \sigma _j\). Hence, the solution of a given partial differential equation would satisfy a decay estimate similar to (3.4) whenever the stochastic field enters the partial differential equation through a non-smooth coefficient function b, compare [14, 23, 24], for example.

It turns out that algebraically decaying weights (3.5) are sufficient to beat the curse of dimension in case of the Tucker tensor decomposition.

Theorem 3

Given \(\delta > 0\), let \(f\in H_{\varvec{\gamma }}^k(\varOmega ^m)\) for some fixed \(k>0\) with weights (3.4) that decay like

$$\begin{aligned} \gamma _j\lesssim j^{-(1+\delta ')/k}\ \text {for some } \delta '>\delta +\frac{k}{n}. \end{aligned}$$
(3.5)

Then, for all \(0<\varepsilon <1\), the error of the continuous Tucker decomposition with ranks

$$\begin{aligned} r_j = \big \lceil \gamma _j^n j^{(1+\delta )n/k}\varepsilon ^{-n/k}\big \rceil \end{aligned}$$
(3.6)

is of order \(\varepsilon \), while the storage cost for the core tensor of \(f_{r_1,\ldots ,r_m}^{{\text {TF}}}\) stays bounded independently of the dimension m, with its logarithm bounded by a multiple of \(\varepsilon ^{-n/k}\).

Proof

In view of Theorem 1 and (3.4), we deduce by choosing the ranks as in (3.6) that

$$\begin{aligned} \sqrt{\sum _{\alpha _j = r_j+1}^\infty \lambda _j(\alpha _j)} \lesssim r_j^{-k/n} \gamma _j^k \Vert f\Vert _{H^k(\varOmega ^m)} \lesssim \frac{\varepsilon }{j^{1+\delta }}. \end{aligned}$$

Therefore, we reach the desired overall truncation error

$$\begin{aligned} \sum _{j=1}^m \frac{\varepsilon }{j^{1+\delta }} \lesssim \varepsilon \ \text {as } m\rightarrow \infty . \end{aligned}$$
(3.7)

Since the weights \(\gamma _j\) decay as in (3.5), the cost of the core tensor is

$$\begin{aligned} C := \prod _{j=1}^m r_j \le \prod _{j=1}^m \big (1+\gamma _j^n j^{(1+\delta )n/k}\varepsilon ^{-n/k}\big ) \lesssim \prod _{j=1}^m \big (1+j^{-\theta }\varepsilon ^{-n/k}\big ) \end{aligned}$$

with \(\theta = (\delta '-\delta )n/k>1\). Hence, the cost of the core tensor stays bounded independently of m since

$$\begin{aligned} \log C\lesssim \sum _{j=1}^m \log \big (1+j^{-\theta }\varepsilon ^{-n/k}\big ) \le \varepsilon ^{-n/k}\sum _{j=1}^m j^{-\theta }\lesssim \varepsilon ^{-n/k} \ \text {as } m\rightarrow \infty . \end{aligned}$$

\(\square \)
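
The effect of the weighted rank choice (3.6) can be illustrated numerically: with algebraically decaying weights (3.5), the product \(\prod _{j=1}^m r_j\) stabilizes once the ranks have dropped to one, no matter how large m becomes. The following sketch uses illustrative parameter values only.

```python
import numpy as np

# Sketch: weighted rank choice (3.6) under algebraically decaying dimension
# weights (3.5). The core storage prod_j r_j stabilizes as the number m of
# subdomains grows, in line with Theorem 3. Parameter values are illustrative.
eps, k, n = 1e-2, 2.0, 1
delta = 0.5
delta_prime = delta + k / n + 0.5                  # satisfies delta' > delta + k/n

for m in (5, 20, 100, 1000):
    j = np.arange(1, m + 1, dtype=float)
    gamma = j ** (-(1.0 + delta_prime) / k)        # weights as in (3.5)
    r = np.ceil(gamma ** n * j ** ((1 + delta) * n / k) * eps ** (-n / k))
    print("m = %4d:  prod r_j = %.0f" % (m, np.prod(r)))
```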

4 Tensor Train Format

4.1 Tensor Train Decomposition

For the discussion of the continuous tensor train decomposition, we shall assume that the domains \(\varOmega _j\subset {\mathbb {R}}^{n_j}\), \(j=1,\ldots ,m\), are arranged in such a way that it holds \(n_1\le \dots \le n_m\).

Now, consider \(f\in H^k(\varOmega _1\times \dots \times \varOmega _m)\) and separate the variables \({\varvec{x}}_1\in \varOmega _1\) and \(({\varvec{x}}_2,\ldots ,{\varvec{x}}_m)\in \varOmega _2\times \dots \times \varOmega _m\) by the singular value decomposition

$$\begin{aligned} f({\varvec{x}}_1,{\varvec{x}}_2,\ldots ,{\varvec{x}}_m) = \sum _{\alpha _1=1}^\infty \sqrt{\lambda _1(\alpha _1)}\varphi _1({\varvec{x}}_1,\alpha _1) \psi _1(\alpha _1,{\varvec{x}}_2,\ldots ,{\varvec{x}}_m). \end{aligned}$$

Since

$$\begin{aligned} \bigg [\sqrt{\lambda _1(\alpha _1)}\psi _1(\alpha _1)\bigg ]_{\alpha _1=1}^\infty \in \ell ^2({\mathbb {N}})\otimes L^2(\varOmega _2\times \dots \times \varOmega _m), \end{aligned}$$

we can separate \((\alpha _1,{\varvec{x}}_2)\in {\mathbb {N}} \times \varOmega _2\) from \(({\varvec{x}}_3,\ldots ,{\varvec{x}}_m)\in \varOmega _3\times \dots \times \varOmega _m\) by means of a second singular value decomposition and arrive at

$$\begin{aligned} \begin{aligned}&\bigg [\sqrt{\lambda _1(\alpha _1)}\psi _1(\alpha _1,{\varvec{x}}_2, \ldots ,{\varvec{x}}_m)\bigg ]_{\alpha _1=1}^\infty \\&\quad = \sum _{\alpha _2=1}^\infty \sqrt{\lambda _2(\alpha _2)} \bigg [\varphi _2(\alpha _1,{\varvec{x}}_2,\alpha _2)\bigg ]_{\alpha _1=1}^\infty \psi _2(\alpha _2,{\varvec{x}}_3,\ldots ,{\varvec{x}}_m). \end{aligned} \end{aligned}$$
(4.1)

By repeating the last step and successively separating \((\alpha _{j-1},{\varvec{x}}_j)\in {\mathbb {N}}\times \varOmega _j\) from \(({\varvec{x}}_{j+1},\ldots ,{\varvec{x}}_m)\in \varOmega _{j+1} \times \dots \times \varOmega _m\) for \(j=3,\ldots ,m-1\), we finally arrive at the representation

$$\begin{aligned} f({\varvec{x}}_1,\ldots ,{\varvec{x}}_m)&= \sum _{\alpha _1=1}^\infty \cdots \sum _{\alpha _{m-1}=1}^\infty \varphi _1(\alpha _1,{\varvec{x}}_1)\varphi _2(\alpha _1,{\varvec{x}}_2,\alpha _2)\\&\quad \cdots \varphi _{m-1}(\alpha _{m-2},{\varvec{x}}_{m-1},\alpha _{m-1}) \varphi _m(\alpha _{m-1},{\varvec{x}}_m), \end{aligned}$$

where

$$\begin{aligned} \varphi _m(\alpha _{m-1},{\varvec{x}}_m) = \sqrt{\lambda _{m-1}(\alpha _{m-1})}\psi _{m-1}(\alpha _{m-1},{\varvec{x}}_m). \end{aligned}$$

In contrast to the Tucker format, we do not obtain a huge core tensor, since each of the \(m-1\) singular value decompositions of the tensor train decomposition removes the currently first spatial variable from the remainder. We just obtain a product of matrix-valued functions (except for the first and last factors, which are vector-valued functions), each of which is related to a specific domain \(\varOmega _j\). This especially results in only \(m-1\) sums in contrast to the m sums for the Tucker format.
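
The following Python sketch (illustrative data and names, fixed maximal rank per step) mirrors this construction in the discrete setting: a sampled tensor is decomposed by a forward sweep of successive singular value decompositions, where in each step the currently first variable, together with the previous rank index, is split off from the remaining variables.

```python
import numpy as np

# Sketch: discrete tensor train decomposition by repeated (vector-valued) SVDs,
# mirroring the construction of this section for a function sampled on [0,1]^4.
N = 20
h = 1.0 / N
x = (np.arange(N) + 0.5) * h
grids = np.meshgrid(*(4 * [x]), indexing="ij")
F = 1.0 / (1.0 + sum((j + 1) * g for j, g in enumerate(grids)))   # shape (N,)*4

def tt_svd(T, max_rank):
    """Forward TT-SVD sweep with a fixed maximal rank per separation step."""
    cores, r_prev = [], 1
    rest = T.reshape(r_prev * T.shape[0], -1)
    for j in range(T.ndim - 1):
        U, s, Vt = np.linalg.svd(rest, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(U[:, :r].reshape(r_prev, -1, r))   # core for variable x_{j+1}
        rest = s[:r, None] * Vt[:r, :]                  # carries the remaining variables
        r_prev = r
        if j < T.ndim - 2:
            rest = rest.reshape(r_prev * T.shape[j + 1], -1)
    cores.append(rest.reshape(r_prev, -1, 1))           # last core for x_m
    return cores

cores = tt_svd(F, max_rank=4)
G = cores[0]                                            # reconstruct the TT approximant
for C in cores[1:]:
    G = np.tensordot(G, C, axes=([G.ndim - 1], [0]))
err = np.sqrt(h ** 4 * np.sum((F - G.reshape(F.shape)) ** 2))
print("TT ranks:", [C.shape[2] for C in cores[:-1]], " L2 error: %.2e" % err)
```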

4.2 Truncation Error

In practice, we truncate the singular value decomposition in step j after \(r_j\) terms, thus arriving at the representation

$$\begin{aligned} f_{r_1,\ldots ,r_{m-1}}^{{\text {TT}}}({\varvec{x}}_1,\ldots ,{\varvec{x}}_m)&= \sum _{\alpha _1=1}^{r_1}\cdots \sum _{\alpha _{m-1}=1}^{r_{m-1}}\varphi _1(\alpha _1,{\varvec{x}}_1) \varphi _2(\alpha _1,{\varvec{x}}_2,\alpha _2)\\&\quad \cdots \varphi _{m-1}(\alpha _{m-2},{\varvec{x}}_{m-1},\alpha _{m-1}) \varphi _m(\alpha _{m-1},{\varvec{x}}_m). \end{aligned}$$

One readily infers by using Pythagoras’ theorem that the truncation error is bounded by

$$\begin{aligned} \Vert f-f_{r_1,\ldots ,r_{m-1}}^{{\text {TT}}}\Vert _{L^2(\varOmega _1\times \dots \times \varOmega _m)} \le \sqrt{\sum _{j=1}^{m-1}\sum _{\alpha _j=r_j+1}^\infty \lambda _j(\alpha _j)}, \end{aligned}$$

see also [35]. Note that, for \(j\ge 2\), the singular values \(\{\lambda _j(\alpha )\}_{\alpha \in {\mathbb {N}}}\) in this estimate do not coincide with the singular values from the original continuous tensor train decomposition due to the truncation.

We shall next give bounds on the truncation error. In the j-th step of the algorithm, \(j=2,3,\ldots ,m-1\), one needs to approximate the vector-valued function

$$\begin{aligned} {\varvec{g}}_j({\varvec{x}}_j,\ldots ,{\varvec{x}}_m) := \bigg [\sqrt{\lambda _{j-1}(\alpha _{j-1})} \psi _{j-1}(\alpha _{j-1},{\varvec{x}}_j, \ldots ,{\varvec{x}}_m)\bigg ]_{\alpha _{j-1}=1}^{r_{j-1}} \end{aligned}$$

by a vector-valued singular value decomposition. This means that we consider the singular value decomposition (2.9) for a vector-valued function in case of the domains \(\varOmega _j\) and \(\varOmega _{j+1}\times \dots \times \varOmega _m\).

For \(f\in H^{k+n_{m-1}}(\varOmega _1\times \dots \times \varOmega _m)\), it holds \({\varvec{g}}_2\in [H^{k+n_{m-1}}(\varOmega _2\times \dots \times \varOmega _m)]^{r_1}\) and

$$\begin{aligned} |{\varvec{g}}_2|_{[H^k(\varOmega _2\times \dots \times \varOmega _m)]^{r_1}}&\le \sqrt{\sum _{\alpha _1=1}^\infty \lambda _1(\alpha _1) |\psi _1(\alpha _1)|_{H^k(\varOmega _2\times \dots \times \varOmega _m)}^2}\\&\le |f|_{H^k(\varOmega _1\times \dots \times \varOmega _m)} \end{aligned}$$

according to Lemma 4, precisely in its vectorized version (2.10). It follows \({\varvec{g}}_3 \in [H^{k+n_{m-1}}(\varOmega _3\times \dots \times \varOmega _m)]^{r_2}\) and, again by (2.10),

$$\begin{aligned} |{\varvec{g}}_3|_{[H^k(\varOmega _3\times \dots \times \varOmega _m)]^{r_2}}&\le \sqrt{\sum _{\alpha _2=1}^\infty \lambda _2(\alpha _2) |\psi _2(\alpha _2)|_{H^k(\varOmega _3\times \dots \times \varOmega _m)}^2}\\&\le |{\varvec{g}}_2|_{[H^k(\varOmega _2\times \dots \times \varOmega _m)]^{r_1}}. \end{aligned}$$

We hence conclude recursively \({\varvec{g}}_j\in [H^{k+n_{m-1}}(\varOmega _j\times \dots \times \varOmega _m)]^{r_{j-1}}\) and

$$\begin{aligned} |{\varvec{g}}_j|_{[H^k(\varOmega _j\times \dots \times \varOmega _m)]^{r_{j-1}}} \le |f|_{H^k(\varOmega _1\times \dots \times \varOmega _m)}\ \text {for all}\ j=2,3,\ldots ,m-1. \end{aligned}$$
(4.2)

Estimate (4.2) shows that the \(H^k\)-seminorm of the vector-valued functions \({\varvec{g}}_j\) stays bounded by \(|f|_{H^k(\varOmega _1\times \dots \times \varOmega _m)}\). But according to (2.9), we have in the j-th step only the truncation error estimate

$$\begin{aligned}&\Bigg \Vert {\varvec{g}}_j-\sum _{\alpha _j=1}^{r_j}\sqrt{\lambda _j(\alpha _j)} \big (\varvec{\varphi }_j(\alpha _j)\otimes \psi _j(\alpha _j)\big ) \Bigg \Vert _{[L^2(\varOmega _j\times \dots \times \varOmega _m)]^{r_{j-1}}}\\&\quad \lesssim \bigg (\frac{r_j}{r_{j-1}}\bigg )^{-k/n_j} |{\varvec{g}}_j|_{[H^k(\varOmega _j\times \dots \times \varOmega _m)]^{r_{j-1}}}. \end{aligned}$$

Hence, in view of (4.2), to achieve the target accuracy \(\varepsilon \) per truncation, the truncation ranks need to be increased in accordance with

$$\begin{aligned} r_1 = \varepsilon ^{-n_1/k},\quad r_2 = \varepsilon ^{-(n_1+n_2)/k},\quad \ldots , \quad r_{m-1} = \varepsilon ^{-(n_1+\dots +n_{m-1})/k}. \end{aligned}$$
(4.3)

We summarize our findings in the following theorem, which holds in this form also if the subdomains \(\varOmega _j\subset {\mathbb {R}}^{n_j}\) are not ordered in such a way that \(n_1\le \dots \le n_m\).

Theorem 4

Let \(f\in H^{k+\max \{n_1,\ldots ,n_m\}}(\varOmega _1\times \dots \times \varOmega _m)\) for some fixed \(k>0\) and \(0<\varepsilon <1\). Then, the overall truncation error of the tensor train decomposition with truncation ranks (4.3) is

$$\begin{aligned} \Vert f-f_{r_1,\ldots ,r_{m-1}}^{{\text {TT}}}\Vert _{L^2(\varOmega _1\times \dots \times \varOmega _m)} \lesssim \sqrt{m}\varepsilon . \end{aligned}$$

The storage cost for \(f_{r_1,\ldots ,r_{m-1}}^{{\text {TT}}}\) is given by

$$\begin{aligned} r_1 + \sum _{j=2}^{m-1} {r_{j-1} r_j} = \varepsilon ^{-n_1/k} + \varepsilon ^{-(2n_1+n_2)/k} +\dots + \varepsilon ^{-(2n_1+\dots +2n_{m-2}+n_{m-1})/k} \end{aligned}$$
(4.4)

and hence is bounded by \({\mathcal {O}}(\varepsilon ^{-(2m-1) \max \{n_1,\ldots ,n_{m-1}\}/k})\).
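
The rank and storage formulas (4.3) and (4.4) are easily evaluated; the following snippet (ranks rounded up to integers, parameter values illustrative) does so.

```python
import numpy as np

# Sketch: truncation ranks (4.3) and storage cost (4.4) of the tensor train
# decomposition, evaluated for illustrative values of eps, k and the (ordered)
# subdomain dimensions n_1 <= ... <= n_m.
eps, k = 1e-1, 2.0
n = [1, 1, 2, 3]
r = [int(np.ceil(eps ** (-sum(n[:j + 1]) / k))) for j in range(len(n) - 1)]
storage = r[0] + sum(r[j - 1] * r[j] for j in range(1, len(r)))
print("TT ranks r_1,...,r_{m-1}:", r)              # r_j = eps^{-(n_1+...+n_j)/k}
print("storage (4.4):           ", storage)
```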

Remark 2

If \(n:=n_1=\dots =n_m\), then the cost of the tensor train decomposition is \({\mathcal {O}}(\varepsilon ^{-(2m-1)n/k})\). Thus, the cost is essentially the square of the cost of the Tucker decomposition. However, in practice, one performs m/2 forward steps and m/2 backward steps. This means one computes m/2 steps as described above to successively separate \({\varvec{x}}_1,{\varvec{x}}_2,\ldots ,{\varvec{x}}_{m/2}\) from the other variables. Then, one performs the algorithm in the opposite direction, i.e., one successively separates \({\varvec{x}}_{m},{\varvec{x}}_{m-1}, \ldots ,{\varvec{x}}_{m/2+1}\) from the other variables. This way, the overall cost is reduced to the order \({\mathcal {O}}(\varepsilon ^{-mn/k})\).

4.3 Sobolev Spaces with Dimension Weights

As for the Tucker decomposition, the cost of the tensor train decomposition suffers from the curse of dimension as the number m of subdomains increases. We therefore discuss again appropriate Sobolev spaces with dimension weights, where we assume for reasons of simplicity that all subdomains are identical to a single domain \(\varOmega \subset {\mathbb {R}}^n\) of dimension n.

Theorem 5

Given \(\delta > 0\), let \(f\in H_{\varvec{\gamma }}^{k+n}(\varOmega ^m)\) for some fixed \(k>0\) with weights (3.4) that decay like (3.5). For \(0<\varepsilon <1\), choose the ranks successively in accordance with

$$\begin{aligned} r_j = \big \lceil r_{j-1}\gamma _j^n j^{(1+\delta )n/k}\varepsilon ^{-n/k}\big \rceil \end{aligned}$$
(4.5)

if \(j\le M\) and \(r_j = 0\) if \(j>M\). Here, M is given by

$$\begin{aligned} M = \varepsilon ^{-1/(1+\delta ')}. \end{aligned}$$
(4.6)

Then, the error of the continuous tensor train decomposition is of order \(\varepsilon \), while the storage cost of \(f_{r_1,\ldots ,r_m}^{{\text {TT}}}\) stays bounded by \(M\exp (\varepsilon ^{-n/k})^2\) independent of the dimension m.

Proof

The combination of Theorem 1, (3.4) and (4.5) implies

$$\begin{aligned} \sqrt{\sum _{\alpha _j = r_j+1}^\infty \lambda _j(\alpha _j)} \lesssim \bigg (\frac{r_j}{r_{j-1}}\bigg )^{-k/n} \gamma _j^k \Vert f\Vert _{H^k(\varOmega ^m)}\lesssim \frac{\varepsilon }{j^{1+\delta }}, \quad j=1,2,\ldots ,M, \end{aligned}$$

and

$$\begin{aligned} \sqrt{\sum _{\alpha _{M+1} = 1}^\infty \lambda _{M+1}(\alpha _{M+1})}\lesssim \gamma _{M+1}^k\Vert f\Vert _{H^k(\varOmega ^m)}\lesssim \varepsilon \Vert f\Vert _{H^k(\varOmega ^m)}. \end{aligned}$$

Hence, as in the proof of Theorem 3, the approximation error of the continuous tensor train decomposition is bounded by a multiple of \(\varepsilon \) independent of m.

Next, we observe for all \(j\le M\) that

$$\begin{aligned} r_j \lesssim \big \lceil r_{j-1}\gamma _j^n j^{(1+\delta )n/k}\varepsilon ^{-n/k}\big \rceil \lesssim r_{j-1}\gamma _j^n j^{(1+\delta )n/k}\varepsilon ^{-n/k}+1. \end{aligned}$$

This recursively yields

$$\begin{aligned} r_j-1&\lesssim \sum _{p=1}^j \prod _{q=p}^j \gamma _q^n q^{(1+\delta )n/k}\varepsilon ^{-n/k}\\&\lesssim \sum _{p=1}^j\varepsilon ^{(p-j-1)n/k}\prod _{q=p}^j q^{-\theta }\\&= \sum _{p=1}^j \varepsilon ^{(p-j-1)n/k}\bigg (\frac{(p-1)!}{j!}\bigg )^\theta \\&= \sum _{p=1}^j \varepsilon ^{-p n/k}\bigg (\frac{(j-p)!}{j!}\bigg )^\theta . \end{aligned}$$

Hence, by using that \(\theta = (\delta '-\delta )n/k>1\), we obtain

$$\begin{aligned} r_j\lesssim \sum _{p=0}^j \varepsilon ^{-p n/k}\frac{(j-p)!}{j!} \le \sum _{p=0}^j \frac{\varepsilon ^{-p n/k}}{p!}\le \exp (\varepsilon ^{-n/k}). \end{aligned}$$

Therefore, the cost (4.4) is

$$\begin{aligned} r_1 + \sum _{j=2}^M r_{j-1} r_j\le \sum _{j=1}^M r_j^2 \lesssim M \exp (\varepsilon ^{-n/k})^2 \end{aligned}$$

and is hence bounded independently of m in view of (4.6). \(\square \)
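
To illustrate Theorem 5, the following snippet evaluates the successive rank choice (4.5) together with the cutoff dimension M from (4.6) for illustrative parameter values; M is rounded up to an integer, dimensions beyond M are simply dropped in line with setting \(r_j=0\) for \(j>M\), and none of the computed quantities depends on m.

```python
import numpy as np

# Sketch: successive rank choice (4.5) with cutoff dimension M from (4.6) for
# the tensor train format in Sobolev spaces with dimension weights. All
# parameter values are illustrative; dimensions j > M are dropped.
eps, k, n = 1e-1, 1.0, 1
delta = 0.5
delta_prime = delta + k / n + 0.5                  # satisfies delta' > delta + k/n
M = int(np.ceil(eps ** (-1.0 / (1.0 + delta_prime))))

for m in (10, 100, 1000):
    ranks, r_prev = [], 1
    for j in range(1, min(m, M) + 1):
        gamma_j = j ** (-(1.0 + delta_prime) / k)  # weight decay (3.5)
        r_j = int(np.ceil(r_prev * gamma_j ** n
                          * j ** ((1 + delta) * n / k) * eps ** (-n / k)))
        ranks.append(r_j)
        r_prev = r_j
    storage = ranks[0] + sum(ranks[i - 1] * ranks[i] for i in range(1, len(ranks)))
    print("m = %4d:  M = %d,  ranks = %s,  storage = %d" % (m, M, ranks, storage))
```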

5 Discussion and Conclusion

In the present article, we considered the continuous versions of the Tucker tensor format and of the tensor train format for the approximation of functions which live on an m-fold product of arbitrary subdomains. By considering (isotropic) Sobolev smoothness, we derived estimates on the ranks to be chosen in order to realize a prescribed target accuracy. These estimates exhibit the curse of dimension.

Both tensor formats have in common that, in each step, only the variable associated with a single domain is separated from the remaining variables by means of the singular value decomposition. This enables cheaper storage schemes, while the influence of the overall dimension of the product domain is reduced to a minimum.

We also examined the situation of Sobolev spaces with dimension weights. Having sufficiently fast decaying weights helps to beat the curse of dimension as the number of subdomains tends to infinity. It turned out that algebraically decaying weights are appropriate for both the Tucker tensor format and the tensor train format.

We finally remark that we considered here only the ranks of the tensor decomposition in the continuous case, i.e., for functions and not for tensors of discrete data. Of course, an additional projection step onto suitable finite-dimensional trial spaces on the individual domains would be necessary to arrive at a fully discrete approximation scheme that can really be used in computer simulations. This would impose a further error of discretization type which needs to be balanced with the truncation error of the particular continuous tensor format.