1 Introduction

In this article, we consider the random boundary value problem

$$\begin{aligned} -\nabla \cdot \big (a(\omega )\nabla u(\omega )\big ) = f{\text { in }} D,\quad u(\omega )=0{\text { on }} \partial D, \end{aligned}$$
(1)

where \(D\subset {\mathbb {R}}^d\) denotes a domain and \(\omega \in \varOmega \) is a random parameter, with \(\varOmega \) denoting the set of possible outcomes. As the solution \(u\) depends on the parameter \(\omega \), we aim at an efficient approximation of the solution map \(\omega \mapsto u(\omega )\). The numerical solution of (1) has attracted considerable attention during the last decade, motivated by the need to quantify the impact of uncertainties in PDE-based models.

The key idea of our novel approach is to combine a multilevel stochastic collocation framework with adaptive low-rank tensor techniques. This involves the following steps:

  1.

    A standard technique for random diffusion problems, the Karhunen–Loève expansion of the diffusion coefficient \(a\), is truncated after \(N\in {\mathbb {N}}\) terms to turn (1) into a parametric PDE depending on N random parameters. This truncated problem is approximated by a stochastic collocation scheme.

  2.

    We use a hierarchy of finite element discretizations for discretizing the physical domain D and represent the solution u as a telescoping sum. The smoothness properties of the solution u are exploited to adapt the polynomial degrees in the stochastic collocation of the differences of u between two consecutive finite element levels. This allows us to choose a low polynomial degree for the fine spatial discretization while using higher polynomial degrees only on coarser finite element levels.

  3.

    Because of the high dimensionality of the parameter domain, each difference in the multilevel sum needs to be evaluated at a large number of collocation points. We use adaptive low-rank tensor techniques to obtain good approximations from a relatively small number of samples, while maintaining the accuracy of the multilevel scheme. In turn, the use of low-rank tensor approximation drastically reduces the number of required solutions of the discretized PDE on every finite element level.

Multilevel and low-rank tensor approximation techniques have been extensively studied for the solution of (1). In the following, we briefly describe some of the existing approaches.

A number of different multilevel techniques have been proposed that aim at equilibrating the errors of the spatial approximation and the approximation in the random parameter. If statistics of the solution or of a quantity of interest need to be computed, multilevel quadrature methods, like the multilevel (quasi-)Monte Carlo method or even more general quadrature approaches, are viable; we refer to [7, 19, 20, 23, 28, 30, 31] for instances of this approach. Closer to the setting considered in this paper, the work by Teckentrup et al. [47] proposes to directly interpolate the solution \(u\) itself at suitable collocation points in the parameter domain chosen from a sparse index set. Given additional smoothness in the spatial variable, a spatial sparse-grid approximation can be incorporated, which leads to the multiindex stochastic collocation proposed in [27].

Low-rank tensor approximation techniques have turned out to be a versatile tool for solving PDEs with random data; see [40, 41] and the references therein. In particular, a variety of low-rank approaches have been proposed to address the linear systems arising from a Galerkin discretization of (1); see, e.g., [10, 11, 13–17, 34–36, 39, 48]. Non-intrusive tensor-based approaches for uncertainty quantification can be built upon black box approximation techniques [3, 5, 42, 44].

To the best of our knowledge, there is little work on merging multilevel and tensor approximation techniques in uncertainty quantification. Recently, Lee and Elman [38] proposed a two-level scheme in the context of the Galerkin method for PDEs with random data. This scheme uses the solution from the coarse level to identify a dominant subspace in the domain of the random parameter, which in turn is used to speed up the solution on the fine level by avoiding costly low-rank truncations. The combination of multilevel and tensor approximation techniques proposed in this paper is conceptually different and is not restricted to this two-level approach but allows for multiple levels.

The rest of this paper is organized as follows. In Sect. 2, we formulate the mathematical setting and recall the Karhunen-Loève expansion. Section 3 is concerned with the discretization of (1) in the spatial domain and in the stochastic domain. In Sect. 4, we describe an existing multilevel scheme and analyze the impact of perturbations on this scheme. Section 5 contains the main contribution of this paper, a novel combination of the multilevel scheme with a low-rank tensor approximation. Finally, Sect. 6 reports numerical results for PDEs with a random diffusion coefficient on the unit square featuring a variety of different stochastic diffusion coefficients.

Throughout this article, in order to avoid the repeated use of generic but unspecified constants, we indicate by \(C\lesssim D\) that \(C\) can be bounded by a multiple of \(D\), independently of parameters on which \(C\) and \(D\) may depend. Obviously, \(C\gtrsim D\) is defined as \(D\lesssim C\), and we write \(C\eqsim D\) if both \(C\lesssim D\) and \(C\gtrsim D\) hold.

2 Problem setting

Let \(D\subset {\mathbb {R}}^d\) denote a bounded Lipschitz domain. Typically, we have \(d=2,3\). Moreover, let \((\varOmega ,{\mathcal {F}},{\mathbb {P}})\) be a complete and separable probability space, where \(\varOmega \) is the set of outcomes, \({\mathcal {F}}\subset 2^\varOmega \) is some \(\sigma \)-algebra of possible events, and \({\mathbb {P}}:{\mathcal {F}}\rightarrow [0,1]\) is a probability measure on \({\mathcal {F}}\). We are interested in solving the following stochastic diffusion problem: Find \(u\in L^2\big (\varOmega ;H^1_0(D)\big )\) such that

$$\begin{aligned} \begin{aligned} -\nabla \cdot \big (a(\omega )\nabla u(\omega )\big )&= f,&{\text {in }} D,\\ u(\omega )&= 0,&{\text {on }}\partial D, \end{aligned} \end{aligned}$$

holds for \({\mathbb {P}}\)-almost every \(\omega \in \varOmega \). Here and in the sequel, for a Banach space \({\mathcal {X}}\), we define the Lebesgue–Bochner space \(L^p(\varOmega ;{\mathcal {X}})\), \(1\le p\le \infty \), as the space of all equivalence classes of strongly measurable functions \(v:\varOmega \rightarrow {\mathcal {X}}\) whose norm

$$\begin{aligned} \Vert v\Vert _{L^p(\varOmega ;{\mathcal {X}})} \,{:}{=}{\left\{ \begin{array}{ll} \displaystyle {\left( \int _\varOmega \Vert v(\omega )\Vert _{{\mathcal {X}}}^p {\mathrm {d}}{\mathbb {P}}(\omega )\right) ^{1/p}},&{} p<\infty \\ \displaystyle {\mathop {\text {ess sup}}\limits _{\omega \in \varOmega }\Vert v(\omega )\Vert _{{\mathcal {X}}}},&{} p=\infty \end{array}\right. } \end{aligned}$$

is finite. If \(p=2\) and \({\mathcal {X}}\) is a separable Hilbert space, then the Bochner space is isomorphic to the tensor product space

$$\begin{aligned} L^2(\varOmega ;{\mathcal {X}})\cong L^2(\varOmega )\otimes {\mathcal {X}}. \end{aligned}$$

Throughout this article, we shall assume that the load \(f\in L^2(D)\) is purely deterministic. Still, by straightforward modifications, it is also possible to deal with random loads, see, e.g., [1]. Additionally, for the sake of simplicity, we restrict ourselves here to the case of uniformly elliptic diffusion problems. This means that we assume the existence of constants \(a_{\mathrm {min}} > 0\) and \(a_{\mathrm {max}} < \infty \), independent of the parameter \(\omega \in \varOmega \), such that for almost every \(x\in D\) there holds

$$\begin{aligned} a_{\mathrm {min}} \le a(\omega ,x) \le a_{\mathrm {max}}\quad {\mathbb {P}}{\text {-almost surely}}. \end{aligned}$$
(2)

Nevertheless, we emphasize that the presented approach directly transfers to diffusion problems where the constants \(a_{\mathrm {min}}\) and \(a_{\mathrm {max}}\) may depend on \(\omega \in \varOmega \) and are only \({\mathbb {P}}\)-integrable, as is the case for log-normally distributed diffusion coefficients, cf. [33, 45]. In particular, all results presented here remain valid in this case.

Typically, the diffusion coefficient is not directly amenable to numerical computations and thus has to be represented in a suitable way. To that end, one decomposes the diffusion coefficient by means of the Karhunen–Loève expansion.

Let the covariance kernel of \(a(\omega ,x)\) be defined by the positive semi-definite function

$$\begin{aligned} {\mathcal {C}}(x,x')\,{:}{=}\int _\varOmega \big (a(\omega ,x)-{\mathbb {E}}[a]({x})\big )\big (a(\omega ,{x}')-{\mathbb {E}}[a]({x}')\big ) {\mathrm {d}}{\mathbb {P}}. \end{aligned}$$

Herein, the integral with respect to \(\varOmega \) has to be understood as a Bochner integral, cf. [32]. One can show that \({\mathcal {C}}(x,x')\) is well defined provided that \(a\in L^2\big (\varOmega ;L^2(D)\big )\). Now, let \(\{(\lambda _n,\varphi _n)\}_n\) denote the eigenpairs obtained by solving the eigenproblem for the diffusion coefficient's covariance, i.e.,

$$\begin{aligned} \int _D{\mathcal {C}}({x},{x}')\varphi _n({x}'){\mathrm {d}}{x}'=\lambda _n\varphi _n({x}). \end{aligned}$$

Then, the Karhunen-Loève expansion of \(a(\omega ,x)\) is given by

$$\begin{aligned} a(\omega ,x) ={\mathbb {E}}[a]({x}) +\sum _{n=1}^\infty \sqrt{\lambda _n}X_{n}(\omega )\varphi _n({x}), \end{aligned}$$
(3)

where \(X_n:\varOmega \rightarrow \varGamma _n\subset {\mathbb {R}}\) for \(n=1,2,\ldots \) are centered, pairwise uncorrelated and \(L^2\)-normalized random variables given by

$$\begin{aligned} X_n(\omega )\,{:}{=}\frac{1}{\sqrt{\lambda _n}}\int _D \big (a(\omega ,x)-{\mathbb {E}}[a](x)\big )\varphi _n(x){\mathrm {d}}x. \end{aligned}$$

From condition (2), we directly infer that the image of the random variables is a bounded set and that \({\mathbb {E}}[a]({x})>0\). Thus, without loss of generality, we assume that \(\varGamma _n=[-1,1]\). The cases we wish to study here are the uniformly distributed case, i.e., \(X_n\sim {\mathcal {U}}([-1,1])\), and the log-uniformly distributed case, which means that the diffusion coefficient has the form \(a(\omega ,x)=\exp \big (b(\omega ,x)\big )\), where \(b(\omega ,x)\) is given analogously to (3), also with \(X_n\sim {\mathcal {U}}([-1,1])\), and satisfies (2).

Although we have by now separated the spatial and the stochastic influences in the diffusion coefficient, we are still facing an infinite sum. For numerical purposes, this sum has to be truncated appropriately. The impact of truncating the Karhunen–Loève expansion on the solution is bounded by

$$\begin{aligned} \Vert u-u_{N}\Vert _{L^2(\varOmega ;H^1_0(D))}\lesssim \Vert a-a_{N}\Vert _{L^2(\varOmega ;L^\infty (D))}\le \varepsilon (N), \end{aligned}$$

where \(\varepsilon (N)\rightarrow 0\) monotonically as \(N\rightarrow \infty \), see, e.g., [9, Theorem 4.2] and [46, Theorem 2.7]. Herein, we set

$$\begin{aligned} a_N(\omega ,x) \,{:}{=}{\mathbb {E}}[a]({x})+ \sum _{n=1}^N\sqrt{\lambda _n}X_{n}(\omega )\varphi _n({x}), \end{aligned}$$

and denote by \(u_N\) the solution to

$$\begin{aligned} -\nabla \cdot \big (a_N(\omega )\nabla u_N(\omega )\big ) = f{\text { in }} D,\quad u_N(\omega ) = 0{\text { on }}\partial D. \end{aligned}$$

Note that these estimates are stated for the log-normal and the uniformly distributed cases, but they directly transfer to the log-uniform case.
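Numerically, the eigenpairs \((\lambda _n,\varphi _n)\) are obtained by discretizing the covariance operator. The following Python sketch illustrates the construction of \(a_N\) from samples of a toy one-dimensional random field; the field, the grid, and all variable names are illustrative assumptions, and the quadrature weights of the continuous eigenproblem are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
m, M = 64, 2000                        # grid points, Monte Carlo samples
x = np.linspace(0.0, 1.0, m)

# toy random field a(omega, x); purely illustrative
samples = 2.0 + sum(np.exp(-n / 2.0) * rng.uniform(-1.0, 1.0, (M, 1))
                    * np.sin(np.pi * n * x) for n in range(1, 11))

mean = samples.mean(axis=0)            # estimate of E[a](x)
C = np.cov(samples, rowvar=False)      # discrete covariance C(x_i, x_j)

lam, phi = np.linalg.eigh(C)           # eigenpairs of the covariance
lam = np.clip(lam[::-1], 0.0, None)    # sort descending, clip round-off
phi = phi[:, ::-1]

def a_N(y, N=5):
    """Truncated expansion a_N as in (3), evaluated at y in [-1, 1]^N."""
    return mean + phi[:, :N] @ (np.sqrt(lam[:N]) * np.asarray(y))
```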

Assuming additionally that the \(\{X_n\}_n\) are independent and admit densities \(\rho _n:\varGamma _n\rightarrow {\mathbb {R}}_{+}\) with respect to the Lebesgue measure, we end up with the parametric diffusion problem: Find \(u_N\in L^2_\rho \big (\varGamma ;H^1_0(D)\big )\) such that

$$\begin{aligned} -\nabla \cdot \big (a_N({{\varvec{y}}})\nabla u_N({{\varvec{y}}})\big ) = f{\text { in }} D, \end{aligned}$$
(4)

where \(\rho \,{:}{=}\rho _1(y_1)\cdots \rho _N(y_N)\), \(\varGamma \,{:}{=}\times _{n=1}^N \varGamma _n\) and \({{\varvec{y}}}={{\varvec{y}}}(\omega )\,{:}{=}[y_1(\omega ),\ldots ,y_N(\omega )] \in \varGamma \). Herein, the space \(L^2_\rho \big (\varGamma ;H^1_0(D)\big )\) is endowed with the norm

$$\begin{aligned} \left\| v \right\| _{L^2_\rho (\varGamma ;H^1_0(D))}\,{:}{=}\left( \int _{\varGamma } \left\| v({{\varvec{y}}}) \right\| _{H^1_0(D)}^2 \rho ({{\varvec{y}}}) {\mathrm {d}}{{\varvec{y}}} \right) ^{1/2}. \end{aligned}$$

Note that \(\rho _n=1/2\) in the case \(X_n\sim {\mathcal {U}}([-1,1])\). In view of the polynomial interpolation with respect to the parameter \({{\varvec{y}}}\in \varGamma \), we shall finally introduce, for a given Banach space \({\mathcal {X}}\), the space

$$\begin{aligned} C^0(\varGamma ;{\mathcal {X}})\,{:}{=}\Big \{v:\varGamma \rightarrow {\mathcal {X}}: v{\text { is continuous and }}\sup _{{{\varvec{y}}}\in \varGamma }\Vert v({{\varvec{y}}})\Vert _{{\mathcal {X}}}<\infty \Big \}. \end{aligned}$$

3 Discretization

Later on, a standard stochastic collocation scheme, cf. [1], is used for the stochastic discretization of the differences of the solutions to the parametric diffusion problem (4) on consecutive grids. To that end, we use tensor product polynomial interpolation in the parameter space \(\varGamma \) and a finite element approximation in the physical domain D.

3.1 Polynomial interpolation

Let \({\mathcal {P}}_{{{\varvec{p}}}}(\varGamma ) \subset L_\rho ^2(\varGamma )\) denote the span of tensor product polynomials with degree at most \({{\varvec{p}}}=(p_1,\ldots ,p_N)\), i.e.,

$$\begin{aligned} {\mathcal {P}}_{{{\varvec{p}}}}(\varGamma ) = \bigotimes _{n=1}^N {\mathcal {P}}_{p_n}(\varGamma _n) \end{aligned}$$

with

$$\begin{aligned} {\mathcal {P}}_{p_n}(\varGamma _n) = \text {span}\{ y_n^m : m=0,\ldots ,p_n\},\quad n=1,\ldots ,N. \end{aligned}$$

Given interpolation points \(y_{n,k_n}\in \varGamma _n\), \(k_n=0,\ldots ,p_n\), the Lagrange basis for \({\mathcal {P}}_{p_n}(\varGamma _n)\) is defined by \(\{l_{n,k_n}\in {\mathcal {P}}_{p_n}(\varGamma _n):l_{n,k_n}(y_{n,j_n}) = \delta _{k_n,j_n},\ j_n,k_n =0,\ldots ,p_n \}\). By a tensor product construction, we obtain the Lagrange basis \(\{l_{{{\varvec{k}}}}:{\varvec{k}}\in {\mathcal {K}}_{\varvec{p}}\}\) for \({\mathcal {P}}_{{{\varvec{p}}}}(\varGamma )\) where

$$\begin{aligned} l_{{{\varvec{k}}}}({{\varvec{y}}}) \,{:}{=}\prod _{n=1}^N l_{n,k_n}(y_n) \end{aligned}$$

for a multiindex \({{\varvec{k}}}=(k_1,\ldots ,k_N)\in {\mathcal {K}}_{{{\varvec{p}}}}\) with

$$\begin{aligned} {\mathcal {K}}_{{{\varvec{p}}}}\,{:}{=}\big \{(k_1,\ldots ,k_N)\in {\mathbb {N}}^N : k_n = 0,\ldots ,p_n,\, n=1,\ldots ,N\big \}. \end{aligned}$$

For all functions \(v\in C^0\big (\varGamma ;H^1_0(D)\big )\), the tensor product interpolation points \({{\varvec{y}}}_{{{\varvec{k}}}} \,{:}{=}(y_{1,k_1},\ldots ,y_{N,k_N}) \in \varGamma \) give rise to the interpolation operator

$$\begin{aligned} {\mathcal {I}}_{{{\varvec{p}}}}:C^0\big (\varGamma ; H_0^1(D)\big ) \rightarrow {\mathcal {P}}_{{{\varvec{p}}}}(\varGamma ) \otimes H_0^1(D) \end{aligned}$$

defined by

$$\begin{aligned} {\mathcal {I}}_{{{\varvec{p}}}}[v]({{\varvec{y}}}) = \sum _{{{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}}} v({{\varvec{y}}}_{{{\varvec{k}}}}) l_{{{\varvec{k}}}}({{\varvec{y}}}). \end{aligned}$$
(5)

With regard to (4), our goal is to approximate the solution \(u_N\) by

$$\begin{aligned} u_N({{\varvec{y}}}) \approx {\mathcal {I}}_{{{\varvec{p}}}}[u_N]({{\varvec{y}}}) = \sum _{{{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}}} u_N({{\varvec{y}}}_{{{\varvec{k}}}}) l_{{{\varvec{k}}}}({{\varvec{y}}}). \end{aligned}$$

In order to obtain the coefficients \(u_N({{\varvec{y}}}_{{{\varvec{k}}}})\), we have to solve

$$\begin{aligned} -\nabla \cdot \big (a_N({{\varvec{y}}}_{{{\varvec{k}}}})\nabla u_N({{\varvec{y}}}_{{{\varvec{k}}}})\big ) = f{\text { in }} D,\quad u_N({{\varvec{y}}}_{{{\varvec{k}}}})=0{\text { on }}\partial D, \end{aligned}$$
(6)

for all \({{\varvec{y}}}_{{{\varvec{k}}}}\) with \({{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}}\). For each \({{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}}\), (6) is a deterministic diffusion problem on D which can be approximated by the finite element method.
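As a concrete illustration of (5) and (6), the following Python sketch evaluates the tensor product interpolant at a point \({{\varvec{y}}}\in \varGamma \), using the Chebyshev nodes introduced in Sect. 3.2; the callable `solve`, which returns the coefficient vector of \(u_N({{\varvec{y}}}_{{{\varvec{k}}}})\), is a hypothetical stand-in for the PDE solver.

```python
import itertools
import numpy as np

def cheb_nodes(p):
    """Chebyshev nodes eta_k in [-1, 1], k = 0, ..., p (cf. Sect. 3.2)."""
    k = np.arange(p + 1)
    return np.cos((2.0 * k + 1.0) / (2.0 * (p + 1.0)) * np.pi)

def lagrange(nodes, k, y):
    """k-th Lagrange basis polynomial for `nodes`, evaluated at y."""
    others = np.delete(nodes, k)
    return np.prod((y - others) / (nodes[k] - others))

def interpolant(solve, p, y):
    """Evaluate I_p[u_N](y) from (5); `solve(point)` is assumed to return
    the coefficient vector of u_N at a collocation point, cf. (6)."""
    nodes = [cheb_nodes(pn) for pn in p]
    value = 0.0
    for k in itertools.product(*(range(pn + 1) for pn in p)):
        point = [nodes[n][k[n]] for n in range(len(p))]
        weight = np.prod([lagrange(nodes[n], k[n], y[n])
                          for n in range(len(p))])
        value = value + weight * solve(point)
    return value
```

Note that the loop visits all \(\prod _{n=1}^N(p_n+1)\) collocation points; it is exactly this exponential growth in N that motivates the low-rank techniques of Sect. 5.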

3.2 Interpolation error

To study the impact of the interpolation error, we have to take the smoothness of \(u_N\) with respect to the parameter \({{\varvec{y}}}\in \varGamma \) into account. It is well known, see, e.g., [12, Theorem 4.3] for the affine case and [33, Theorem 3.1] for the log-normal case, that \(u_N\) satisfies the decay estimate

$$\begin{aligned} \big \Vert \partial ^{{\varvec{\alpha }}}_{{{\varvec{y}}}}u_N({{\varvec{y}}})\big \Vert _{H^1_0(D)}\le C|{\varvec{\alpha }}|!c^{|{\varvec{\alpha }}|}{\varvec{\gamma }}^{\varvec{\alpha }}\Vert f\Vert _{L^2(D)},{\text { where }}\gamma _n\,{:}{=}\sqrt{\lambda _n}\Vert \varphi _n\Vert _{L^\infty (D)}, \end{aligned}$$
(7)

cf. (3), for some constants \(C,c>0\). In the sequel, we consider the interpolation based on the Chebyshev nodes

$$\begin{aligned} \eta _k\,{:}{=}\cos \bigg (\frac{2k+1}{2(p+1)}\pi \bigg )\in [-1,1],\quad k=0,\ldots ,p. \end{aligned}$$

The associated uni-directional interpolation operator shall be denoted by

$$\begin{aligned} {\mathcal {I}}_p:C^{0}([-1,1])\rightarrow {\mathcal {P}}_p,\quad v(x)\mapsto \sum _{k=0}^pv(\eta _k) l_k(x). \end{aligned}$$

It satisfies, for a function \(v\in C^{p+1}([-1,1])\), the well-known interpolation error estimate

$$\begin{aligned} \bigg |v(x)-\sum _{k=0}^p v(\eta _k)l_k(x)\bigg |\le \frac{1}{2^{p}(p+1)!}\max _{\xi \in [-1,1]}\big |v^{(p+1)}(\xi )\big | \end{aligned}$$

and the stability estimate

$$\begin{aligned} \bigg \Vert \sum _{k=0}^p v(\eta _k)l_k(x)\bigg \Vert _{C^0([-1,1])}\le \bigg (\frac{2}{\pi }\log (p+1)+1\bigg )\Vert v\Vert _{C^0([-1,1])}, \end{aligned}$$

see, e.g., [43]. Therefore, we obtain by a tensor product construction the stability estimate for \({\mathcal {I}}_{{{\varvec{p}}}}\) according to

$$\begin{aligned} \big \Vert {\mathcal {I}}_{{{\varvec{p}}}}[v]\big \Vert _{C^0(\varGamma ; H_0^1(D))}\le C_s({{\varvec{p}}})\Vert v\Vert _{C^0(\varGamma ; H_0^1(D))} \end{aligned}$$

with

$$\begin{aligned} C_s({{\varvec{p}}})\,{:}{=}\prod _{i=1}^N\bigg (\frac{2}{\pi }\log (p_i+1)+1\bigg ). \end{aligned}$$

Obviously, the stability constant grows exponentially as \(N\rightarrow \infty \); this regime is, however, not considered here. Additionally, since usually \(p_i\eqsim |\log \varepsilon |\) for a desired accuracy \(\varepsilon \), we have

$$\begin{aligned} C_s({{\varvec{p}}})\eqsim \big (\log |\log \varepsilon |\big )^N, \end{aligned}$$

which means that the stability constant depends only mildly on the desired accuracy. Moreover, we emphasize that there exist regimes where the stability constant is bounded. If the error is, for example, measured in \(L^2_\rho \left( \varGamma ; H_0^1(D)\right) \) and the interpolation points are chosen as the roots of the polynomials orthogonal with respect to the densities \(\rho _n\), then the corresponding stability estimate holds with \(C_s({{\varvec{p}}})=1\), cf. [1]. Even without this orthogonality property, there are bounds on the stability constant for Chebyshev points if the error is measured in \(L^1\left( \varGamma ; H_0^1(D)\right) \), see [18]. Nevertheless, in order to obtain a black box interpolation that is independent of the particular density function, we employ here the Chebyshev points and measure the error with respect to \(C^0\big (\varGamma ; H_0^1(D)\big )\), at the cost of a stability constant that is not robust with respect to the polynomial degree.

We obtain the following interpolation result for the solution \(u_N\) to (4), which is a straightforward modification of the related results in [25, Appendix A].

Theorem 1

Let \(c\gamma _k<2\) for all \(k=1,\ldots ,N\). Then, given that

$$\begin{aligned} p_k=\bigg \lceil \frac{\log (\varepsilon )}{\log (c\gamma _k/2)}\bigg \rceil -1, \end{aligned}$$

the polynomial interpolation satisfies the error estimate

$$\begin{aligned} \big \Vert u_N({{{\varvec{y}}}})-{\mathcal {I}}_{{{\varvec{p}}}}[u_N]({{\varvec{y}}})\big \Vert _{H^1_0(D)}\lesssim \varepsilon C({{\varvec{p}}})\Vert f\Vert _{L^2(D)} \end{aligned}$$

for some constant \(C({{\varvec{p}}})\) that mildly depends on \(\varepsilon \).

Proof

There holds by (7) and the repeated application of the triangle inequality that

$$\begin{aligned}&\big \Vert u_N({{{\varvec{y}}}})-{\mathcal {I}}_{{{\varvec{p}}}}[u_N]({{\varvec{y}}})\big \Vert _{H^1_0(D)}\\&\quad \le \sum _{k=1}^N \big \Vert \big ({\mathcal {I}}_{p_1}\otimes \ldots \otimes {\mathcal {I}}_{p_{k-1}}\otimes ({\text {Id}}-{{\mathcal {I}}_{p_k}}) \otimes {\text {Id}}\otimes \ldots \otimes {\text {Id}}\big )u_N({{{\varvec{y}}}})\big \Vert _{H^1_0(D)}\\&\quad \le \sum _{k=1}^N\bigg [\prod _{m=1}^{k-1}\bigg (\frac{2}{\pi }\log (p_m+1)+1\bigg )\bigg ] \bigg [\frac{1}{2^{p_k}(p_k+1)!}C(p_k+1)!c^{p_k+1}\gamma _k^{p_k+1}\bigg ]\Vert f\Vert _{L^2(D)}\\&\quad =\sum _{k=1}^N\bigg [\prod _{m=1}^{k-1}\bigg (\frac{2}{\pi }\log (p_m+1)+1\bigg )\bigg ] \bigg [2\bigg (\frac{c\gamma _k}{2}\bigg )^{p_k+1}C c\bigg ]\Vert f\Vert _{L^2(D)}. \end{aligned}$$

Thus, with

$$\begin{aligned} p_k=\bigg \lceil \frac{\log (\varepsilon )}{\log (c\gamma _k/2)}\bigg \rceil -1, \end{aligned}$$

we obtain

$$\begin{aligned} \big \Vert u_N({{{\varvec{y}}}})-{\mathcal {I}}_{{{\varvec{p}}}}[u_N]({{\varvec{y}}})\big \Vert _{H^1_0(D)} \le Cc\varepsilon \Bigg (\sum _{k=1}^N\bigg [\prod _{m=1}^{k-1} \bigg (\frac{2}{\pi }\log (p_m+1)+1\bigg )\bigg ]\Bigg )\Vert f\Vert _{L^2(D)}. \end{aligned}$$

\(\square \)

Remark 1

The constant \(C({{\varvec{p}}})\) from the previous theorem can be bounded according to

$$\begin{aligned} C_s({{\varvec{p}}})\lesssim C({{\varvec{p}}})\lesssim NC_s({{\varvec{p}}}), \end{aligned}$$

where we recall that \(C_s({{\varvec{p}}})\) denotes the stability constant of \({\mathcal {I}}_{{{\varvec{p}}}}\). Thus, \(C({{\varvec{p}}})\) also potentially grows exponentially as \(N\rightarrow \infty \) and depends mildly on \(\varepsilon \).
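The degree choice of Theorem 1 is easy to evaluate in practice. A minimal Python sketch, assuming illustrative values for the constant \(c\) and the decay parameters \(\gamma _k\):

```python
import numpy as np

def degrees(eps, c, gamma):
    """Degrees p_k from Theorem 1, assuming c * gamma_k < 2 for all k."""
    gamma = np.asarray(gamma)
    assert np.all(c * gamma < 2.0)
    p = np.ceil(np.log(eps) / np.log(c * gamma / 2.0)) - 1.0
    return np.maximum(p, 0).astype(int)

# example: exponential decay gamma_k = exp(-k/2), as in Sect. 6.1
print(degrees(1e-4, 1.0, np.exp(-0.5 * np.arange(1, 11))))
```

As expected, the faster \(\gamma _k\) decays, the smaller the degree \(p_k\) assigned to the k-th parameter direction.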

3.3 Finite element approximation

In order to compute the coefficients \(u_N({{\varvec{y}}}_{{{\varvec{k}}}})\) in (6), we consider an approximation by the finite element method. To this end, let \({\mathcal {T}}_0=\{\tau _{0,k}\}\) be a coarse grid triangulation of the domain D. Then, for \(\ell \ge 1\), a uniform and shape regular triangulation \({\mathcal {T}}_\ell =\{\tau _{\ell ,k}\}\) is recursively obtained by uniformly refining each element \(\tau _{\ell -1,k}\) into \(2^d\) elements with diameter \(h_\ell \eqsim 2^{-\ell }\). We define the space of piecewise linear finite elements according to

$$\begin{aligned} {\mathcal {S}}_\ell ^1(D)\,{:}{=}\{v\in C^{0}(D):v|_{\partial D}=0\ {\text {and}}\ v|_\tau \in \varPi _1\ {\text {for all}}\ \tau \in {\mathcal {T}}_\ell \}\subset H_0^1(D), \end{aligned}$$
(8)

where \(\varPi _1\) denotes the space of all polynomials of total degree 1. Then, the finite element approximations \(u_{N,\ell }({{{\varvec{y}}}_{{{\varvec{k}}}}})\in {\mathcal {S}}_\ell ^1(D)\) to the coefficients \(u_N({{{\varvec{y}}}_{{{\varvec{k}}}}})\) satisfy the following well-known error estimate.

Lemma 1

Let D be a convex domain and let \(f\in L^2(D)\) and \(a_N\in L^\infty \big (\varGamma ;W^{1,\infty }(D)\big )\) for all \(N\in {\mathbb {N}}\). Then, for \({{\varvec{y}}}\in \varGamma \), the finite element solution \(u_{N,\ell }({{\varvec{y}}})\in {\mathcal {S}}_\ell ^1(D)\) of the diffusion problem (4) satisfies the error estimate

$$\begin{aligned} \Vert u_N({{\varvec{y}}})-u_{N,\ell }({{\varvec{y}}})\Vert _{H_{0}^{1}(D)}\lesssim 2^{-\ell }\Vert u_N({{\varvec{y}}})\Vert _{H^2(D)}\lesssim 2^{-\ell }\Vert f\Vert _{L^2(D)}. \end{aligned}$$
(9)

Note that we restrict ourselves here to piecewise linear finite elements. Nevertheless, with obvious modifications, the presented results remain valid for higher-order finite elements as well. Moreover, for the sake of simplicity, we consider here nested sequences of finite element spaces, i.e.,

$$\begin{aligned} {\mathcal {S}}_0^1(D)\subset {\mathcal {S}}_1^1(D)\subset \ldots . \end{aligned}$$
(10)

This is not a requirement, as has been discussed in [23].

3.4 Stochastic collocation error

By a tensor product argument, the combination of the finite element approximation in the spatial variable and the interpolation in the parameter yields the following approximation result.

Theorem 2

Let the polynomial degree \({{\varvec{p}}}\) be chosen such that there holds

$$\begin{aligned} \big \Vert u_N({{{\varvec{y}}}})-{\mathcal {I}}_{{{\varvec{p}}}}[u_N]({{\varvec{y}}})\big \Vert _{H^1_0(D)}\lesssim 2^{-\ell }C({{\varvec{p}}})\Vert f\Vert _{L^2(D)}. \end{aligned}$$

Then, the fully discrete approximation \({\mathcal {I}}_{{{\varvec{p}}}}[u_{N,\ell }]\in {\mathcal {P}}_{{{\varvec{p}}}}(\varGamma )\otimes {\mathcal {S}}^1_\ell (D)\) satisfies the error estimate

$$\begin{aligned} \big \Vert u_N({{{\varvec{y}}}})-{\mathcal {I}}_{{{\varvec{p}}}}[u_{N,\ell }]({{\varvec{y}}})\big \Vert _{H^1_0(D)}\lesssim 2^{-\ell }\big (C({{\varvec{p}}})+C_s({{\varvec{p}}})\big )\Vert f\Vert _{L^2(D)}, \end{aligned}$$

where \(u_{N,\ell }({{\varvec{y}}})\) is the finite element approximation to \(u_N({{\varvec{y}}})\) on level \(\ell \) that fulfills (9) and \(C_s({{\varvec{p}}})\) denotes the stability constant of \({\mathcal {I}}_{{{\varvec{p}}}}\).

4 Multilevel approximation

In the previous section, we have introduced the classical stochastic collocation as proposed in, e.g., [1]. The related error estimate is based on a tensor product argument between the spatial approximation and the discretization of the parameter. The idea of the related multilevel approximation is now to perform an error equilibration, as known from sparse tensor product approximations, cf. [8].

4.1 Multilevel scheme

We start by representing the finite element approximation \(u_{N,L}({{\varvec{y}}})\) for a maximal level \(L\in {\mathbb {N}}\) by the telescoping sum

$$\begin{aligned} u_{N,L}({{\varvec{y}}})=\sum _{\ell =0}^L \big (u_{N,\ell }({{\varvec{y}}})-u_{N,\ell -1}({{\varvec{y}}})\big )\quad {\text {with }}u_{N,-1}\,{:}{=}0. \end{aligned}$$

Instead of applying the interpolation operator in the parameter \({{\varvec{y}}}\in \varGamma \) with a fixed degree \({{{\varvec{p}}}}\), we adapt the degree to the finite element approximation level and obtain the multilevel approximation

$$\begin{aligned} u_N({{\varvec{y}}})\approx u_{N,L}^{\mathrm{M}\mathrm{L}}({{\varvec{y}}})\,{:}{=}\sum _{\ell =0}^L {\mathcal {I}}_{{{\varvec{p}}}^{(\ell )}}\big [u_{N,\ell }-u_{N,\ell -1}\big ]({{\varvec{y}}}). \end{aligned}$$
(11)

The goal is now to choose the polynomial degrees \(\{{{\varvec{p}}}^{(\ell )}\}\) to decrease as the refinement level \(\ell \) of the finite element approximation increases, thereby equilibrating a high spatial accuracy with a relatively low polynomial degree. In order to facilitate this, it is crucial to have the following mixed regularity estimate for \(u_N\). There holds

$$\begin{aligned} \big \Vert \partial ^{{\varvec{\alpha }}}_{{{\varvec{y}}}}u_N({{\varvec{y}}})\big \Vert _{H^2(D)}\le C|{\varvec{\alpha }}|!c^{|{\varvec{\alpha }}|}\tilde{{\varvec{\gamma }}}^{\varvec{\alpha }}\Vert f\Vert _{L^2(D)},{\text { where }}\tilde{\gamma }_n\,{:}{=}\sqrt{\lambda _n}\Vert \varphi _n\Vert _{W^{1,\infty }(D)}, \end{aligned}$$

cf. (7), for some constants \(C,c>0\). See [12] for a proof of this statement in the affine case and [33] for the log-normal case. The estimate for the log-uniform case can be derived with the same techniques that are applied in these works. From this estimate, one can derive the parametric smoothness of the Galerkin error. This is stated by the following lemma, which is proven, e.g., in [29, 37].

Lemma 2

For the error of the Galerkin projection, there holds the estimate

$$\begin{aligned} \big \Vert \partial ^{{\varvec{\alpha }}}_{{{\varvec{y}}}}(u_N-u_{N,\ell })({{\varvec{y}}})\big \Vert _{H_{0}^{1}(D)}\lesssim 2^{-\ell }|{\varvec{\alpha }}|! c^{|{\varvec{\alpha }}|}{{\varvec{\gamma }}}^{{\varvec{\alpha }}}\Vert f\Vert _{L^2(D)} \quad {\text {for all }}|\varvec{\alpha }|\ge 0 \end{aligned}$$
(12)

with a constant \(c>0\) depending on \(a_{\min }\) and \(a_{\max }\), where \(\gamma _k\,{:}{=}\Vert \sqrt{\lambda _k}\varphi _k\Vert _{W^{1,\infty }(D)}\) and \({\varvec{\gamma }}\,{:}{=}(\gamma _1,\ldots ,\gamma _N)\).

With this lemma at hand, it is easy to derive the following error estimate in complete analogy to the proof of Theorem 1.

Theorem 3

Let the degree \({{\varvec{p}}}^{(\ell ')}\in {\mathbb {N}}^N\) be such that the interpolation error is \(C\big ({{\varvec{p}}}^{(\ell ')}\big )\varepsilon \eqsim 2^{-\ell '}\). Then, there holds the error estimate

$$\begin{aligned} \big \Vert ({\text {Id}}-{\mathcal {I}}_{\varvec{p}^{(\ell ')}})[u_N-u_{N,\ell }]({{\varvec{y}}})\big \Vert _{H_{0}^{1}(D)}\lesssim 2^{-(\ell +\ell ')}\Vert f\Vert _{L^2(D)}. \end{aligned}$$
(13)

Proof

Let \({\varvec{p}}^{(\ell ')}=(p_1,\ldots ,p_N)\). From Theorem 1, we infer

$$\begin{aligned}&\big \Vert ({\text {Id}}-{\mathcal {I}}_{\varvec{p}^{(\ell ')}})[u_N-u_{N,\ell }]({{\varvec{y}}})\big \Vert _{H_{0}^{1}(D)}\\&\qquad \le \sum _{k=1}^N\bigg [\prod _{m=1}^{k-1}\bigg (\frac{2}{\pi }\log (p_m+1)+1\bigg )\bigg ]\frac{\big \Vert \partial ^{(p_k+1){\varvec{e}}_k}_{{{\varvec{y}}}}(u_N-u_{N,\ell })({{\varvec{y}}})\big \Vert _{H_{0}^{1}(D)}}{2^{p_k}(p_k+1)!}. \end{aligned}$$

Now, we may apply (12) to the last term in this expression and obtain

$$\begin{aligned}&\big \Vert ({\text {Id}}-{\mathcal {I}}_{\varvec{p}^{(\ell ')}})[u_N-u_{N,\ell }]({{\varvec{y}}})\big \Vert _{H_{0}^{1}(D)}\\&\qquad \lesssim 2^{-\ell }\sum _{k=1}^N\bigg [\prod _{m=1}^{k-1}\bigg (\frac{2}{\pi }\log (p_m+1)+1\bigg )\bigg ] \frac{C(p_k+1)!c^{p_k+1}{\gamma }_k^{p_k+1}\Vert f\Vert _{L^2(D)}}{2^{p_k}(p_k+1)!}. \end{aligned}$$

Finally, the assertion is obtained by inserting the assumption \(C\big ({{\varvec{p}}}^{(\ell ')}\big )\varepsilon \eqsim 2^{-\ell '}\) into the previous inequality

$$\begin{aligned} \big \Vert ({\text {Id}}-{\mathcal {I}}_{\varvec{p}^{(\ell ')}})[u_N-u_{N,\ell }]({{\varvec{y}}})\big \Vert _{H_{0}^{1}(D)} \lesssim 2^{-(\ell +\ell ')}\Vert f\Vert _{L^2(D)}. \end{aligned}$$

\(\square \)

Theorem 4

Let \(\big \{{{\varvec{p}}}^{(\ell ')}\big \}\subset {\mathbb {N}}^N\) be a sequence of polynomial degrees that gives rise to an error estimate of the form (13) with \(\ell '=L-\ell \), where \(u_N\in L^2_\rho \big (\varGamma ;H^1_0(D)\big )\) is the solution to (4) that satisfies (9). Then, the error of the multilevel approximation (11) is bounded by

$$\begin{aligned} \bigg \Vert u_N({{\varvec{y}}})-\sum _{\ell =0}^L {\mathcal {I}}_{{{\varvec{p}}}^{(\ell )}}\big [u_{N,\ell }-u_{N,\ell -1}\big ]({{\varvec{y}}})\bigg \Vert _{H^1_0(D)} \lesssim 2^{-L}L\Vert f\Vert _{L^{2}(D)}. \end{aligned}$$
(14)

Proof

We shall apply the following multilevel splitting of the error

$$\begin{aligned} \begin{aligned}&\bigg \Vert u_N({{\varvec{y}}})-\sum _{\ell =0}^L{\mathcal {I}}_{{{\varvec{p}}}^{(\ell )}}\big [u_{N,\ell }-u_{N,\ell -1}\big ]({{\varvec{y}}})\bigg \Vert _{H^1_0(D)}\\&\quad =\bigg \Vert u_N({{\varvec{y}}})-u_{N,L}({{\varvec{y}}})+\sum _{\ell =0}^L(u_{N,\ell }-u_{N,\ell -1})({{\varvec{y}}})-\sum _{\ell =0}^L {\mathcal {I}}_{{{\varvec{p}}}^{(\ell )}}\big [u_{N,\ell }-u_{N,\ell -1}\big ]({{\varvec{y}}})\bigg \Vert _{H^1_0(D)}\\&\quad \le \big \Vert u_N({{\varvec{y}}})-u_{N,L}({{\varvec{y}}})\big \Vert _{H^1_0(D)} + \sum _{\ell =0}^L\big \Vert ({\text {Id}}-{\mathcal {I}}_{\varvec{p}^{(\ell ')}})[u_{N,\ell }-u_{N,\ell -1}]({{\varvec{y}}})\big \Vert _{H^1_0(D)}. \end{aligned} \end{aligned}$$
(15)

The first term just reflects the finite element approximation error and is thus bounded by (9). Thanks to (13), the term inside the sum satisfies

$$\begin{aligned} \begin{aligned}&\big \Vert ({\text {Id}}-{\mathcal {I}}_{\varvec{p}^{(\ell ')}})[u_{N,\ell }-u_{N,\ell -1}]({{\varvec{y}}})\big \Vert _{H^1_0(D)}\\&\qquad \le \big \Vert ({\text {Id}}-{\mathcal {I}}_{\varvec{p}^{(\ell ')}})[u_{N}-u_{N,\ell }]({{\varvec{y}}})\big \Vert _{H^1_0(D)} +\big \Vert ({\text {Id}}-{\mathcal {I}}_{\varvec{p}^{(\ell ')}})[u_{N}-u_{N,\ell -1}]({{\varvec{y}}})\big \Vert _{H^1_0(D)}\\&\qquad \lesssim 2^{-(\ell +L-\ell )}\Vert f\Vert _{L^2(D)}+2^{-(\ell -1+L-\ell )}\Vert f\Vert _{L^2(D)}\lesssim 2^{-L}\Vert f\Vert _{L^2(D)}. \end{aligned} \end{aligned}$$

Thus, we can estimate (15) as

$$\begin{aligned} \bigg \Vert u_N({{\varvec{y}}})-\sum _{\ell =0}^L {\mathcal {I}}_{{{\varvec{p}}}^{(\ell )}}\big [u_{N,\ell } -u_{N,\ell -1}\big ]({{\varvec{y}}})\bigg \Vert _{H^1_0(D)}&\lesssim 2^{-L}\Vert f\Vert _{L^2(D)} +\sum _{\ell =0}^L2^{-L}\Vert f\Vert _{L^2(D)}\\&\le 2^{-L}(L+2)\Vert f\Vert _{L^2(D)}. \end{aligned}$$

This completes the proof. \(\square \)

4.2 Perturbed multilevel scheme

The multilevel scheme from above relies on the exact evaluation of the differences

$$\begin{aligned} \delta _{N,\ell } \,{:}{=}u_{N,\ell }-u_{N,\ell -1} \end{aligned}$$

in the interpolation points \({{\varvec{y}}}_{{{\varvec{k}}}}^{(\ell )}\in \varGamma \), \({{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}\), on each level \(\ell \). We now slightly relax this assumption and consider perturbations

$$\begin{aligned} \tilde{\delta }_{N,\ell ,{{\varvec{k}}}} \approx \delta _{N,\ell }({{\varvec{y}}}_{{{\varvec{k}}}}^{(\ell )}),\qquad {{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}, \end{aligned}$$
(16)

and the associated perturbed interpolant

$$\begin{aligned} \tilde{\delta }_{N,\ell }({{\varvec{y}}}) \,{:}{=}\sum _{{{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}} \tilde{\delta }_{N,\ell ,{{\varvec{k}}}} l_{{{\varvec{k}}}}({{\varvec{y}}}). \end{aligned}$$

In view of (11), the perturbed multilevel approximation then reads

$$\begin{aligned} \tilde{u}_{N,L}^{\mathrm{M}\mathrm{L}}({{\varvec{y}}})\,{:}{=}\sum _{\ell =0}^L \tilde{\delta }_{N,\ell }({{\varvec{y}}}). \end{aligned}$$
(17)

For each level \(\ell \), we have the stability estimate

$$\begin{aligned} \Vert \tilde{\delta }_{N,\ell }({{\varvec{y}}}) - {\mathcal {I}}_{{{\varvec{p}}}^{(\ell )}}[\delta _{N,\ell }]({{\varvec{y}}}) \Vert _{H_{0}^{1 }(D)} \le C_s({{\varvec{p}}}^{(\ell )}) \max _{{{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}} \Vert \tilde{\delta }_{N,\ell ,{{\varvec{k}}}} -\delta _{N,\ell }({{\varvec{y}}}_{{{\varvec{k}}}}^{(\ell )}) \Vert _{H_{0}^{1 }(D)}. \end{aligned}$$

From Theorem 4, we immediately derive the following lemma.

Lemma 3

Assume that for all \({{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}\) the perturbations from (16) fulfill

$$\begin{aligned} \left\| \tilde{\delta }_{N,\ell ,{{\varvec{k}}}} -\delta _{N,\ell }\left( {{\varvec{y}}}_{{{\varvec{k}}}}^{(\ell )}\right) \right\| _{H_{0}^{1 }(D)} \lesssim 2^{-L}\Vert f\Vert _{L^2(D)}. \end{aligned}$$

Then

$$\begin{aligned} \left\| u_N({{\varvec{y}}}) - \tilde{u}_{N,L}^{\mathrm{M}\mathrm{L}}({{\varvec{y}}}) \right\| _{H_{0}^{1 }(D)} \lesssim 2^{-L}L\left\| f \right\| _{L^2(D)}. \end{aligned}$$

A particular perturbation based on low-rank truncations will be considered in the following section.

5 Low-rank tensor approximation

The main computational challenge in the multilevel scheme presented above is the evaluation of the differences \(\delta _{N,\ell }({{\varvec{y}}}_{{{\varvec{k}}}}^{(\ell )})\) for all \({{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}\). To address high parameter dimensions N, we suggest to approximate these differences in a low-rank tensor format.

Let \(n_\ell \,{:}{=}{\text {dim}} {\mathcal {S}}_\ell ^1(D)\) for the finite element space from (8) and let \(\{\psi _{\ell ,i} \in {\mathcal {S}}_\ell ^1(D): i=1,\ldots ,n_\ell \}\) be an orthonormal basis of \({\mathcal {S}}_\ell ^1(D)\) with respect to the \(H_0^1(D)\) inner product, i.e.,

$$\begin{aligned} \langle \psi _{\ell ,i}, \psi _{\ell ,j} \rangle _{H_0^1(D)} = 0, \qquad i\ne j, \end{aligned}$$

and \(\left\| \psi _{\ell ,i} \right\| _{H_0^1(D)} = 1\).

Remark 2

An \(H_0^1(D)\)-orthonormal basis can be obtained from the standard nodal finite element basis \(\{\phi _{\ell ,i} \in {\mathcal {S}}_\ell ^1(D): i=1,\ldots ,n_\ell \}\) by computing the Cholesky factorization \({\mathbf {A}} = {\mathbf {L}}{\mathbf {L}}^\top \) of the Gram matrix \({\mathbf {A}}\in {\mathbb {R}}^{n_\ell \times n_\ell }\) of the nodal basis with respect to the \(H_0^1(D)\) inner product, given by

$$\begin{aligned} {\mathbf {A}}_{i,j} \,{:}{=}\langle \phi _{\ell ,i}, \phi _{\ell ,j} \rangle _{H_0^1(D)}. \end{aligned}$$

We then have the relation

$$\begin{aligned} \psi _{\ell ,i} = \sum _{j=1}^{n_\ell } ({\mathbf {L}}^{-1})_{i,j} \phi _{\ell ,j}. \end{aligned}$$
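A minimal sketch of Remark 2 in Python, assuming the Gram matrix \({\mathbf {A}}\) has been assembled by some finite element code (the function names are ours):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def orthonormalize(A):
    """Cholesky factor L with A = L L^T of the (symmetric positive
    definite) Gram matrix A from Remark 2."""
    return cholesky(A, lower=True)

def to_orthonormal(L, u_nodal):
    # since psi = L^{-1} phi, nodal coefficients transform as c = L^T u,
    # and the H^1_0 norm of u becomes the Euclidean norm of c
    return L.T @ u_nodal

def to_nodal(L, c):
    # inverse transform u = L^{-T} c by triangular back substitution
    return solve_triangular(L.T, c, lower=False)
```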

Given the nestedness assumption (10), we have \(\delta _{N,\ell }({{\varvec{y}}})\in {\mathcal {S}}_\ell ^1(D)\). We can hence write

$$\begin{aligned} \delta _{N,\ell }({{\varvec{y}}}) = \sum _{i=1}^{n_\ell } {\mathbf {u}}_i^{(\ell )}({{\varvec{y}}}) \psi _{\ell ,i} \end{aligned}$$
(18)

with \({\mathbf {u}}^{(\ell )}({{\varvec{y}}}) \in {\mathbb {R}}^{n_\ell }\). Let now \(K_\ell \,{:}{=}\#{\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}\) and define \({\mathbf {X}}^{(\ell )}\in {\mathbb {R}}^{K_\ell \cdot n_\ell }\) as

$$\begin{aligned} {\mathbf {X}}_{({{\varvec{k}}},i)}^{(\ell )} \,{:}{=}{\mathbf {u}}_i^{(\ell )}({{\varvec{y}}}_{{{\varvec{k}}}}^{(\ell )}),\qquad {{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}. \end{aligned}$$
(19)

In the following, we interpret the vector \({\mathbf {X}}^{(\ell )}\) as a tensor of order \(N+1\) and size

$$\begin{aligned} (p_1^{(\ell )}+1)\times \cdots \times (p_N^{(\ell )}+1) \times n_\ell \end{aligned}$$

and use low-rank tensor methods to construct a data-sparse approximation \(\tilde{{\mathbf {X}}}^{(\ell )} \approx {\mathbf {X}}^{(\ell )}\). In particular, we make use of the hierarchical tensor format introduced in [26] and analyzed in [21].

5.1 Hierarchical tensor format

In the following, we consider tensors \({\mathbf {X}}\in {\mathbb {R}}^{\mathcal {J}}\) of order \(d\in {\mathbb {N}}\) over general product index sets \({\mathcal {J}}= {\mathcal {J}}_1\times \ldots \times {\mathcal {J}}_d\). We first need the concept of the matricization of a tensor.

Definition 1

Let \({\mathcal {D}}\,{:}{=}\{1,\ldots ,d\}\). Given a subset \(t\subset {\mathcal {D}}\) with complement \([t]\,{:}{=}{\mathcal {D}}\setminus t\), the matricization

$$\begin{aligned} {\mathcal {M}}_t:{\mathbb {R}}^{\mathcal {J}}\rightarrow {\mathbb {R}}^{{\mathcal {J}}_t} \otimes {\mathbb {R}}^{{\mathcal {J}}_{[t]}},\qquad {\mathcal {J}}_t\,{:}{=}\times _{i\in t} {\mathcal {J}}_i,\quad {\mathcal {J}}_{[t]}\,{:}{=}\times _{i\in [t]} {\mathcal {J}}_i, \end{aligned}$$

of a tensor \({\mathbf {X}} \in {\mathbb {R}}^{\mathcal {J}}\) is defined by its entries

$$\begin{aligned} {\mathcal {M}}_t({\mathbf {X}})_{(j_i)_{i\in t}, (j_i)_{i\in [t]}} \,{:}{=}{\mathbf {X}}_{(j_1,\ldots ,j_d)},\qquad (j_1,\ldots ,j_d)\in {\mathcal {J}}. \end{aligned}$$
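In Python, the matricization of Definition 1 amounts to a transpose followed by a reshape; a small sketch with 0-based mode indices (the paper uses 1-based subsets \(t\subset {\mathcal {D}}\)):

```python
import numpy as np

def matricization(X, t):
    """M_t(X) from Definition 1: the modes in t index the rows, the
    complementary modes [t] index the columns (0-based indices)."""
    t = sorted(t)
    comp = [i for i in range(X.ndim) if i not in t]
    rows = int(np.prod([X.shape[i] for i in t]))
    return np.transpose(X, axes=t + comp).reshape(rows, -1)

# example: a 2 x 3 x 4 x 5 tensor matricized for t = {0, 2}
X = np.random.rand(2, 3, 4, 5)
print(matricization(X, [0, 2]).shape)   # prints (8, 15)
```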

The subsets \(t\subset {\mathcal {D}}\) are organized in a binary dimension tree \(T_{\mathcal {D}}\) with root \({\mathcal {D}}\) such that each node \(t\in T_{\mathcal {D}}\) is non-empty and each \(t\in T_{\mathcal {D}}\) with \(\#t\ge 2\) is the disjoint union of its sons \(t_1,t_2\in T_{\mathcal {D}}\), cf. Fig. 1.

Fig. 1 Dimension trees \(T_{\mathcal {D}}\) for \(d=5\). Left: balanced tree. Right: linear tree

Definition 2

Let \(T_{\mathcal {D}}\) be a dimension tree. The hierarchical rank \({{\varvec{r}}}\,{:}{=}(r_t)_{t\in T_{\mathcal {D}}}\) of a tensor \({\mathbf {X}}\in {\mathbb {R}}^{\mathcal {J}}\) is defined by

$$\begin{aligned} r_t \,{:}{=}{\text {rank}}\big ({\mathcal {M}}_t({\mathbf {X}})\big ),\quad t\in T_{\mathcal {D}}. \end{aligned}$$

For a given hierarchical rank \({{\varvec{r}}}\,{:}{=}(r_t)_{t\in T_{\mathcal {D}}}\), the hierarchical format \({\mathcal {H}}_{{{\varvec{r}}}}\) is defined by

$$\begin{aligned} {\mathcal {H}}_{{{\varvec{r}}}} \,{:}{=}\big \{{\mathbf {X}}\in {\mathbb {R}}^{\mathcal {J}}: {\text {rank}}\big ({\mathcal {M}}_t({\mathbf {X}})\big ) \le r_t,\, t\in T_{\mathcal {D}}\big \}. \end{aligned}$$

Given a tensor \({\mathbf {X}}\in {\mathcal {H}}_{{{\varvec{r}}}}\), Definition 2 implies that we can choose (orthogonal) matrices \({\mathbf {U}}_t \in {\mathbb {R}}^{{\mathcal {J}}_t\times r_t}\) for all \(t\in T_{\mathcal {D}}\) such that \({\text {range}}({\mathbf {U}}_t) = {\text {range}}\big ({\mathcal {M}}_t({\mathbf {X}})\big )\). Moreover, for every non-leaf node \(t \in T_{\mathcal {D}}\) with sons \(t_1,t_2\in T_{\mathcal {D}}\), there exists a transfer tensor \({\mathbf {B}}_t\in {\mathbb {R}}^{r_t\times r_{t_1} \times r_{t_2}}\) such that

$$\begin{aligned} ({\mathbf {U}}_t)_{\cdot ,s} = \sum _{s_1=1}^{r_{t_1}} \sum _{s_2 = 1}^{r_{t_2}} ({\mathbf {B}}_t)_{(s,s_1,s_2)} ({\mathbf {U}}_{t_1})_{\cdot ,s_1} \otimes ({\mathbf {U}}_{t_2})_{\cdot ,s_2},\qquad s = 1,\ldots ,r_t, \end{aligned}$$
(20)

where \(({\mathbf {U}}_t)_{\cdot ,s}\) denotes the sth column of \({\mathbf {U}}_t\). At the root node \(t={\mathcal {D}}\), we identify the tensor \({\mathbf {X}}\) with the column matrix \({\mathbf {U}}_{\mathcal {D}}\in {\mathbb {R}}^{{\mathcal {J}}\times 1}\).

The recursive relation (20) is the key to represent the tensor \({\mathbf {X}}\) compactly. For all leaf nodes \(t\in T_{\mathcal {D}}\), we explicitly store the matrices \({\mathbf {U}}_t\), whereas for all inner nodes \(t\in T_{\mathcal {D}}\) only the transfer tensors \({\mathbf {B}}_t\) are required. The complexity for the hierarchical representation sums up to \({\mathcal {O}}(dr^3 + drn)\), where \(r\,{:}{=}r_{\text {max}} = \max _{t\in T_{\mathcal {D}}} r_t\), \(n\,{:}{=}\max _{i\in {\mathcal {D}}} \#{\mathcal {J}}_i\). The effective rank \(r_{\mathrm {eff}}\) is the real positive number such that \((d-1)r_{\mathrm {eff}}^3 + dr_{\mathrm {eff}}n\) is the actual storage cost for a tensor in \({\mathcal {H}}_{{{\varvec{r}}}}\).
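For a given storage budget, the effective rank is thus the positive root of a cubic; a small sketch (all values illustrative):

```python
import numpy as np

def effective_rank(storage, d, n):
    """Positive real root r_eff of (d - 1) r^3 + d r n = storage."""
    roots = np.roots([d - 1.0, 0.0, d * n, -float(storage)])
    real = roots[np.isreal(roots) & (roots.real > 0.0)].real
    return float(real[0])

# example: order d = 5, mode size n = 100, budget of 1e5 entries
print(effective_rank(1e5, d=5, n=100))
```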

In the multilevel scheme introduced above, the tensor \({\mathbf {X}}^{(\ell )}\) from (19) is defined via the numerical solution of the original PDE on levels \(\ell \) and \(\ell -1\) at all collocation points. This means that an explicit computation of \({\mathbf {X}}^{(\ell )}\) in terms of all its entries would only be possible for a small length \(N\) of the Karhunen–Loève expansion and moderate polynomial degrees \({{\varvec{p}}}^{(\ell )}\). To overcome this limitation, we suggest to approximate \({\mathbf {X}}^{(\ell )}\) directly in the hierarchical tensor format \({\mathcal {H}}_{{{\varvec{r}}}}\) by the so-called cross approximation technique introduced in [4].

5.2 Cross approximation technique

The main idea of tensor cross approximation is to exploit the inherent low-rank structure directly by the evaluation of a (small) number of well-chosen tensor entries. Prior numerical experiments indicate that cross approximation works particularly well for tensors of small size in each direction \(i=1,\ldots ,d\). Considering the tensor \({\mathbf {X}} = {\mathbf {X}}^{(\ell )}\) from (19), we observe that the size \(n_\ell \) in direction \(d=N+1\) becomes rather large for higher levels \(\ell \) such that the cross approximation technique cannot be applied directly. We therefore use the following variant consisting of three steps for each particular level \(\ell \):

Step 1:

Find an (orthogonal) matrix \({\mathbf {V}}\in {\mathbb {R}}^{n_\ell \times r_d}\) such that

$$\begin{aligned} {\mathcal {M}}_{\{d\}}({\mathbf {X}}) \approx {\mathbf {V}}{\mathbf {V}}^\top {\mathcal {M}}_{\{d\}}({\mathbf {X}}). \end{aligned}$$
Step 2:

Define a tensor \({\mathbf {Y}}\in {\mathbb {R}}^{{\mathcal {J}}^\prime }\) with \({\mathcal {J}}^\prime \,{:}{=}{\mathcal {J}}_{\{1,\ldots ,d-1\}}\times \{1,\ldots ,r_d\}\) via

$$\begin{aligned} {\mathcal {M}}_{\{d\}}({\mathbf {Y}}) = {\mathbf {V}}^\top {\mathcal {M}}_{\{d\}}({\mathbf {X}}) \end{aligned}$$
(21)

and use cross approximation to find \(\tilde{{\mathbf {Y}}} \approx {\mathbf {Y}}\).

Step 3:

Build the final approximation \(\tilde{{\mathbf {X}}}\) from

$$\begin{aligned} {\mathcal {M}}_{\{d\}}(\tilde{{\mathbf {X}}}) = {\mathbf {V}} {\mathcal {M}}_{\{d\}}(\tilde{{\mathbf {Y}}}). \end{aligned}$$
(22)

The advantage of applying the cross approximation technique to the tensor \({\mathbf {Y}}\) instead of \({\mathbf {X}}\) lies in the reduced size in direction \(d=N+1\) for which we expect \(r_d \ll n_\ell \). We now describe in more detail how the three approximation steps are carried out.

In Step 1, our aim is to construct an (approximate) basis \({\mathbf {V}}\) of the column space of \({\mathcal {M}}_{\{d\}}({\mathbf {X}})\). To this end, we use the greedy strategy from Algorithm 1 over a subset \({\mathcal {J}}_{\mathrm {train}} \subset {\mathcal {J}}_{\{1,\ldots ,d-1\}}\) of column indices.

[Algorithm 1: greedy construction of the column basis \({\mathbf {V}}\) over a training set \({\mathcal {J}}_{\mathrm {train}}\)]
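Since the listing of Algorithm 1 is not reproduced here, the following Python sketch indicates one plausible realization of the greedy strategy under stated assumptions: `column(j)` evaluates a column of \({\mathcal {M}}_{\{d\}}({\mathbf {X}})\) (each call requiring PDE solves), `train` is a fixed training set, and the per-loop enrichment of \({\mathcal {J}}_{\mathrm {train}}\) with random crosses described below is omitted.

```python
import numpy as np

def greedy_basis(column, train, tol):
    """Greedily build an orthonormal basis V of the (approximate) column
    space of M_{d}(X), sampled on the training set `train`."""
    cols = {j: column(j) for j in train}     # cache expensive evaluations
    V = np.zeros((len(next(iter(cols.values()))), 0))
    while V.shape[1] < len(cols):
        # residuals of the training columns after projection onto range(V)
        res = {j: c - V @ (V.T @ c) for j, c in cols.items()}
        j_star, r = max(res.items(), key=lambda kv: np.linalg.norm(kv[1]))
        if np.linalg.norm(r) <= tol:
            break
        V = np.column_stack([V, r / np.linalg.norm(r)])   # Gram-Schmidt
    return V
```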

To construct the training set \({\mathcal {J}}_{\mathrm {train}}\), we use the following strategy known from tensor cross approximation [22, Sec. 3.5]. Starting with a random index \({{\varvec{j}}}\in {\mathcal {J}}_{\{1,\ldots ,d-1\}}\), we consider the set

$$\begin{aligned} {\mathcal {J}}_{\text {cross}}({{\varvec{j}}}) \,{:}{=}\{ (j_1,\ldots ,j_{i-1},k,j_{i+1},\ldots ,j_{d-1}): k\in {\mathcal {J}}_i,\, i = 1,\ldots ,d-1\} \end{aligned}$$
(23)

which forms a 'cross' with center \({{\varvec{j}}}\). Repeating this strategy a small number s of times (say \(s=3\)) for random indices \({{\varvec{j}}}^1,\ldots ,{{\varvec{j}}}^s\in {\mathcal {J}}_{\{1,\ldots ,d-1\}}\), we arrive at

$$\begin{aligned} {\mathcal {J}}_{\mathrm {train}} \,{:}{=}{\mathcal {J}}_{\mathrm {cross}}({{\varvec{j}}}^1)\cup \ldots \cup {\mathcal {J}}_{\mathrm {cross}}({{\varvec{j}}}^s), \end{aligned}$$

which determines the training set for the first loop of Algorithm 1. In every subsequent loop of Algorithm 1, this set is enriched with s additional (random) crosses. In line 3, we reuse tensor entries in the training set \({\mathcal {J}}_{\mathrm {train}}\) that were computed in previous loops of the algorithm.

Once the matrix \({\mathbf {V}}\) is constructed, our next aim in Step 2 is to approximate the tensor \({\mathbf {Y}}\in {\mathbb {R}}^{{\mathcal {J}}^\prime }\) from (21) in the hierarchical tensor format \({\mathcal {H}}_{{{\varvec{r}}}}\). Recalling the main idea of the approach in [4], we seek to recursively approximate the matricization \({\mathbf {M}} = {\mathcal {M}}_t({\mathbf {Y}})\) at any node \(t\in T_{\mathcal {D}}\) by a so-called cross approximation of the form

$$\begin{aligned} {\mathbf {M}} \approx \tilde{{\mathbf {M}}} \,{:}{=}{\mathbf {M}}|_{{\mathcal {J}}_t^\prime \times {\mathcal {C}}_t} \cdot {\mathbf {M}}|_{{\mathcal {R}}_t\times {\mathcal {C}}_t}^{-1} \cdot {\mathbf {M}}|_{{\mathcal {R}}_t\times {\mathcal {J}}_{[t]}^\prime } \end{aligned}$$
(24)

with \({\text {rank}}(\tilde{{\mathbf {M}}}) = r_t\) and pivot sets \({\mathcal {R}}_t \subset {\mathcal {J}}_t^\prime \), \({\mathcal {C}}_t \subset {\mathcal {J}}_{[t]}^\prime \) of size \(r_t\).

In our algorithm, the approximation (24) is found by a successive rank-one approximation of the matrix \({\mathbf {M}}\). Let \(\tilde{{\mathbf {M}}}^{(0)}\,{:}{=}{\mathbf {0}}\) and assume that an approximation \(\tilde{{\mathbf {M}}}^{(k)} \approx {\mathbf {M}}\) of the form (24) with \({\text {rank}}(\tilde{{\mathbf {M}}}^{(k)}) = k\) has already been constructed. We then seek a rank-one approximation \(\tilde{{\mathbf {R}}}^{(k)}\) of the remainder \({\mathbf {R}}^{(k)}\,{:}{=}{\mathbf {M}}-\tilde{{\mathbf {M}}}^{(k)}\) and update our approximation as \(\tilde{{\mathbf {M}}}^{(k+1)}\,{:}{=}\tilde{{\mathbf {M}}}^{(k)}+\tilde{{\mathbf {R}}}^{(k)}\). The final approximation is obtained as \(\tilde{{\mathbf {M}}} = \tilde{{\mathbf {M}}}^{(r_t)}\).

In each step k, the pivot sets \({\mathcal {R}}_t\) and \({\mathcal {C}}_t\) are enriched by a new pivot element, which is found by searching for entries of large modulus in the remainder \({\mathbf {R}}^{(k)}\). As in Step 1, we again use training sets of the form (23) to identify new pivot elements. We use the absolute value of the pivot at step k as an error estimator for the remainder \({\mathbf {R}}^{(k)}\). In this way, the rank \(r_t\) at each node \(t\in T_{\mathcal {D}}\) can be chosen adaptively in order to reach a given (heuristic) target accuracy \(\varepsilon _{\text {ten}}\ge 0\) such that \(\Vert {\mathbf {M}}-\tilde{{\mathbf {M}}}\Vert _2 \approx \varepsilon _{\text {ten}} \Vert {\mathbf {M}}\Vert _2\).
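For a single matricization, the successive rank-one construction with partial pivoting can be sketched in Python as follows; `entry(i, j)` is an on-demand evaluation of \({\mathbf {M}}\), and the restriction of the pivot search to training crosses is simplified here to full rows and columns (an assumption made for readability).

```python
import numpy as np

def cross_approximation(entry, m, n, eps_ten, max_rank=50):
    """Successive rank-one cross approximation of an m x n matrix M in the
    spirit of (24): returns U, W with M ~ U @ W.T."""
    U, W = np.zeros((m, 0)), np.zeros((n, 0))
    i = 0                                         # initial row pivot
    for _ in range(min(m, n, max_rank)):
        # residual row i of R = M - U W^T
        row = np.array([entry(i, j) for j in range(n)]) - U[i] @ W.T
        j = int(np.argmax(np.abs(row)))
        pivot = row[j]
        if np.abs(pivot) <= eps_ten:              # pivot as error estimator
            break
        col = np.array([entry(k, j) for k in range(m)]) - U @ W[j]
        U = np.column_stack([U, col / pivot])
        W = np.column_stack([W, row])
        mask = np.abs(col / pivot)                # choose the next row pivot
        mask[i] = 0.0
        i = int(np.argmax(mask))
    return U, W
```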

The matrices \({\mathbf {M}}|_{{\mathcal {J}}_t^\prime \times {\mathcal {C}}_t}, {\mathbf {M}}|_{{\mathcal {R}}_t\times {\mathcal {J}}_{[t]}^\prime }\) in (24) are never formed explicitly. The essential information for the construction of \({\mathbf {Y}}\in {\mathcal {H}}_{{{\varvec{r}}}}\) with \({{\varvec{r}}}=(r_t)_{t\in T_{\mathcal {D}}}\) is condensed in the pivot sets \({\mathcal {R}}_t,{\mathcal {C}}_t\) and the matrices \({\mathbf {M}}|_{{\mathcal {R}}_t\times {\mathcal {C}}_t}\in {\mathbb {R}}^{r_t\times r_t}\) from (24). This construction is explicit in the sense that the necessary transfer tensors \({\mathbf {B}}_t\) for all inner nodes \(t\in T_{\mathcal {D}}\) and the matrices \({\mathbf {U}}_t\) in the leaf nodes \(t\in T_{\mathcal {D}}\) are directly determined by the values of \({\mathbf {Y}}\) at certain entries defined by the pivot sets. The details of this procedure can be found in [4, 24].

After the cross approximation has been performed, Step 3 involves no further approximation but only a simple matrix-matrix product. Assume that the tensor \({\mathbf {Y}}\) has been approximated by \(\tilde{{\mathbf {Y}}}\) represented in \({\mathcal {H}}_{{{\varvec{r}}}}\) by means of transfer tensors \({\mathbf {B}}_t\) for inner nodes \(t\in T_{\mathcal {D}}\) and matrices \({\mathbf {U}}_t\) for leaf nodes \(t\in T_{\mathcal {D}}\). In the node \(t=\{d\}\), we now compute the matrix \({\mathbf {U}}_t^\prime \,{:}{=}{\mathbf {V}}{\mathbf {U}}_t\), whereas for all other leaf nodes \(t\in T_{\mathcal {D}}\) we keep \({\mathbf {U}}_t^\prime \,{:}{=}{\mathbf {U}}_t\). It turns out that the tensor \(\tilde{{\mathbf {X}}}\) from (22) is then represented by the transfer tensors \({\mathbf {B}}_t\) and the matrices \({\mathbf {U}}_t^\prime \).

5.3 Error analysis

We now study the effect of a perturbed multilevel approximation introduced through tensor approximations \(\tilde{{\mathbf {X}}}^{(\ell )} \approx {\mathbf {X}}^{(\ell )}\). In particular, our aim is to derive from Lemma 3 an indication of the accuracy required in the tensor approximation in order to maintain the convergence result for the multilevel scheme.

Thanks to the orthonormality of the basis \(\{\psi _{\ell ,i}\}\) in (18), we immediately derive from (19) that

$$\begin{aligned} \left\| \delta _{N,\ell }({{\varvec{y}}}_{{{\varvec{k}}}}^{(\ell )}) \right\| _{H_0^1(D)} = \left\| {\mathbf {X}}^{(\ell )}_{({{\varvec{k}}},\cdot )} \right\| _2,\qquad {{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}. \end{aligned}$$

In order to apply Lemma 3, we need to ensure that

$$\begin{aligned} \left\| \tilde{{\mathbf {X}}}^{(\ell )}_{({{\varvec{k}}},\cdot )}-{\mathbf {X}}^{(\ell )}_{({{\varvec{k}}},\cdot )} \right\| _2 \lesssim 2^{-L},\qquad {{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}. \end{aligned}$$

Noting that \(\left\| \delta _{N,\ell }({{\varvec{y}}}_{{{\varvec{k}}}}^{(\ell )}) \right\| _{H_0^1(D)}\lesssim 2^{-\ell }\), this can be guaranteed if we require

$$\begin{aligned} \left\| \tilde{{\mathbf {X}}}^{(\ell )}_{({{\varvec{k}}},\cdot )} -{\mathbf {X}}^{(\ell )}_{({{\varvec{k}}},\cdot )} \right\| _2 \lesssim 2^{\ell -L} \left\| {\mathbf {X}}^{(\ell )}_{({{\varvec{k}}},\cdot )} \right\| _2,\qquad {{\varvec{k}}}\in {\mathcal {K}}_{{{\varvec{p}}}^{(\ell )}}. \end{aligned}$$

This motivates performing the tensor approximation with a relative accuracy of \(\varepsilon _\ell \eqsim 2^{\ell -L}\) such that

$$\begin{aligned} \left\| \tilde{{\mathbf {X}}}^{(\ell )} - {\mathbf {X}}^{(\ell )} \right\| _2 \lesssim \varepsilon _\ell \left\| {\mathbf {X}}^{(\ell )} \right\| _2. \end{aligned}$$
(25)

As a consequence, the tensor approximation on higher levels \(\ell \) may be performed less accurately.

Let us emphasize that the underlying assumption in this analysis is that the tensor approximation can be performed with a prescribed small error. This is by no means clear. For example, this requires the hierarchical ranks to stay moderate in order to be able to store the hierarchical tensor representation. It is an open question to identify situations for which this is guaranteed; see [2, 48] for some work in this direction.

5.4 Final algorithm

Compiling all the results obtained above, our final strategy is summarized in Algorithm 2.

[Algorithm 2: multilevel stochastic collocation combined with adaptive low-rank tensor approximation]
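A high-level sketch of Algorithm 2 under stated assumptions: `solve(l, y)` returns the finite element solution \(u_{N,l}({{\varvec{y}}})\) as a coefficient vector with respect to the orthonormal basis (with coarse solutions prolongated to a common space), `degrees(l)` returns \({{\varvec{p}}}^{(l)}\), e.g., via Theorem 1, and `cross_approx` is a black box low-rank routine such as the one described in Sect. 5.2; all of these names are ours.

```python
import numpy as np

def cheb_node(p, k):
    """k-th Chebyshev node for degree p, cf. Sect. 3.2."""
    return np.cos((2.0 * k + 1.0) / (2.0 * (p + 1.0)) * np.pi)

def multilevel_lowrank(solve, degrees, cross_approx, L, eps0=0.25):
    """Approximate each difference tensor X^(l) from (19) in low-rank form
    with the level-dependent accuracy (25); together, the returned list
    represents the perturbed multilevel approximation (17)."""
    tensors = []
    for l in range(L + 1):
        p = degrees(l)
        eps_l = eps0 * 2.0 ** (l - L)             # accuracy schedule (25)

        def fiber(k, p=p, l=l):
            # spatial fiber of X^(l) at collocation multi-index k
            y = [cheb_node(p[n], k[n]) for n in range(len(p))]
            coarse = solve(l - 1, y) if l > 0 else 0.0
            return solve(l, y) - coarse

        shape = tuple(pn + 1 for pn in p)
        tensors.append(cross_approx(fiber, shape, eps_l))
    return tensors
```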

6 Numerical experiments

In the numerical experiments, we consider the parametric diffusion problem on the unit square given by

$$\begin{aligned} -\nabla \cdot \big (a({{\varvec{y}}})\nabla u({{\varvec{y}}})\big )&= 1,\quad \, \hbox {in}\,\, D = (0,1)^2,\\ u({{\varvec{y}}})&= 0,\quad \, \hbox {on}\,\, \partial D. \end{aligned}$$

On each level \(\ell \) of the proposed multilevel scheme, the domain D is discretized by a uniform quadrilateral mesh with mesh size

$$\begin{aligned} h_\ell = 2^{-\ell }h_0,\qquad h_0 = 1/4, \end{aligned}$$

using \({\mathcal {Q}}_1\), i.e., bilinear finite elements with \(n_\ell \) degrees of freedom.

To construct the interpolation operator \({\mathcal {I}}_{{{\varvec{p}}}^{(\ell )}}\) from (5), we use an isotropic polynomial degree on each level defined by

$$\begin{aligned} {{\varvec{p}}}^{(\ell )} = (p^{(\ell )},\ldots ,p^{(\ell )})\in {\mathbb {N}}^N,\qquad p^{(\ell )}\,{:}{=}\lfloor (L-\ell +1)/2\rfloor , \end{aligned}$$

where, for simplicity, we have ignored the mild influence of the polynomial degree on the stability constant of the interpolation. Also, possible anisotropies induced by the decay of the Karhunen-Loève expansion are not considered here. The interpolation points \({{\varvec{y}}}_{{{\varvec{k}}}} \in \varGamma = [-1,1]^N\) are given by the tensorized roots of the Chebyshev polynomials of the first kind of degree \(p^{(\ell )}+1\). The accuracy for the tensor approximation from (25) on each level is chosen as

$$\begin{aligned} \varepsilon _\ell = 2^{\ell -L}\varepsilon _0,\qquad \varepsilon _0 = 1/4. \end{aligned}$$

For each level \(\ell \), we report the effective rank \(r_{\mathrm {eff}}\) and the maximal rank \(r_{\mathrm {max}}\) of the approximate tensor \(\tilde{{\mathbf {X}}}^{(\ell )}\approx {\mathbf {X}}^{(\ell )}\) represented in the hierarchical tensor format \({\mathcal {H}}_{{{\varvec{r}}}}\). In addition, we state the number of tensor evaluations for Step 1 and Step 2 during the cross approximation of the tensor \({\mathbf {X}}^{(\ell )}\). Note that each evaluation on level \(\ell \) requires the solution of the PDE on level \(\ell \) and level \(\ell -1\).

To measure the error, we randomly choose \(M=100\) parameters \({{\varvec{y}}}^{i}\in \varGamma \) and compute

$$\begin{aligned} \varepsilon ^{\mathrm{M}\mathrm{L}}_{L,2} [u]&\,{:}{=}\left( \sum _{i=1}^M \left\| \tilde{u}_{N,L}^{\mathrm{M}\mathrm{L}}({{\varvec{y}}}^{i}) - u_{N,L}({{\varvec{y}}}^{i}) \right\| _{H_0^1(D)}^2 \bigg / \sum _{i=1}^M \left\| u_{N,L}({{\varvec{y}}}^{i}) \right\| _{H_0^1(D)}^2\right) ^{1/2},\\ \varepsilon ^{\mathrm{M}\mathrm{L}}_{L,\infty } [u]&\,{:}{=}\max _{i=1,\ldots ,M} \left( \left\| \tilde{u}_{N,L}^{\mathrm{M}\mathrm{L}}({{\varvec{y}}}^{i}) - u_{N,L}({{\varvec{y}}}^{i}) \right\| _{H_0^1(D)} \bigg / \left\| u_{N,L}({{\varvec{y}}}^{i}) \right\| _{H_0^1(D)}\right) . \end{aligned}$$

To study the impact of the different levels, we also compute for the perturbed differences \(\tilde{\delta }_{N,\ell }\) the errors

$$\begin{aligned} \varepsilon ^{(\ell )}_{L,2} [u]&\,{:}{=}\left( \sum _{i=1}^M \left\| {\mathcal {I}}_{{{\varvec{p}}}^{(\ell )}}[\tilde{\delta }_{N,\ell }]({{\varvec{y}}}^{i}) \right\| _{H_0^1(D)}^2 \bigg / \sum _{i=1}^M \left\| u_{N,L}({{\varvec{y}}}^{i}) \right\| _{H_0^1(D)}^2\right) ^{1/2}, \\ \varepsilon ^{(\ell )}_{L,\infty } [u]&\,{:}{=}\max _{i=1,\ldots ,M} \left( \left\| {\mathcal {I}}_{{{\varvec{p}}}^{(\ell )}}[\tilde{\delta }_{N,\ell }]({{\varvec{y}}}^{i}) \right\| _{H_0^1(D)} \bigg / \left\| u_{N,L}({{\varvec{y}}}^{i}) \right\| _{H_0^1(D)}\right) . \end{aligned}$$

For uniformly distributed \(y_n \sim {\mathcal {U}}([-1,1])\), \(n=1,\ldots ,N\), we evaluate the expected value of the multilevel solution and compute

$$\begin{aligned} \varepsilon ^{\mathbb {E}}_L [u]\,{:}{=}\left\| {\mathbb {E}}\left[ \tilde{u}_{N,L}^{\mathrm{M}\mathrm{L}}\right] -{\mathbb {E}}[u_{\mathrm {ref}}] \right\| _{H_0^1(D)} \Big / \left\| {\mathbb {E}}[u_{\mathrm {ref}}] \right\| _{H_0^1(D)}, \end{aligned}$$

where \(u_{\mathrm {ref}}\) is the reference solution obtained from the multilevel scheme on the highest level \(L=7\).
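
Purely as an illustration of the quantity being approximated, and not of the evaluation strategy actually used for the low-rank surrogate, a plain Monte Carlo estimate of this expectation could be sketched as follows, with `surrogate(y)` a hypothetical callable returning the FEM coefficient vector of \(\tilde{u}_{N,L}^{\mathrm{ML}}({{\varvec{y}}})\):

```python
import numpy as np

def mc_expectation(surrogate, N, samples=10_000, seed=0):
    """Plain Monte Carlo estimate of E[u] for y_n ~ U([-1, 1]).
    `surrogate(y)` is a hypothetical callable; this estimator is for
    illustration only."""
    rng = np.random.default_rng(seed)
    ys = rng.uniform(-1.0, 1.0, size=(samples, N))
    return np.mean([surrogate(y) for y in ys], axis=0)
```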

From the multilevel solution, we can immediately compute approximations to output functionals, such as

$$\begin{aligned} \psi (u)\,{:}{=}\int _D u\,{\mathrm {d}}x. \end{aligned}$$

Analogously to the errors for the solution, we then obtain the relative error for the expected value of the output functional as

$$\begin{aligned} \varepsilon ^{\mathbb {E}}_L [\psi ] \,{:}{=}\left| {\mathbb {E}}\left[ \psi \left( \tilde{u}_{N,L}^{\mathrm{M}\mathrm{L}}\right) \right] -{\mathbb {E}}[\psi (u_{\mathrm {ref}})]\right| \Big / \big |{\mathbb {E}}[\psi (u_{\mathrm {ref}})]\big |. \end{aligned}$$
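
Since the model problem has right-hand side \(f\equiv 1\), the functional satisfies \(\psi (u_h)=\sum _j u_j \int _D \varphi _j \,{\mathrm {d}}x\), i.e., it is the inner product of the FEM coefficient vector with the load vector for \(f=1\); a sketch, with the assembled load vector assumed given:

```python
import numpy as np

def output_functional(u_vec, load_vector):
    """psi(u_h) = int_D u_h dx = sum_j u_j * int_D phi_j dx.
    `load_vector` is the FEM load vector for f = 1, i.e., its j-th entry
    is int_D phi_j dx; assembling it is library-specific (deal.II in the
    experiments reported here)."""
    return float(np.dot(load_vector, u_vec))
```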

All numerical experiments have been carried out on a quad-core Intel(R) Xeon(R) CPU E31225 at 3.10 GHz. The times reported for each level \(\ell \) are single-core CPU times. For the finite element approximation, we have used the software library deal.II, see [6]. All sparse linear systems have been solved with the multifrontal solver UMFPACK.

6.1 Karhunen–Loève expansion with exponential decay

In the first experiment, the Karhunen–Loève expansion of the diffusion coefficient is given by

$$\begin{aligned} a({{\varvec{y}}},x) = 2 + \sum _{n=1}^N \sqrt{\lambda _n} b_n(x) y_n \end{aligned}$$
(26)

with

$$\begin{aligned} b_n(x) = \sin (2\pi n x_1) \sin (2\pi n x_2). \end{aligned}$$

We consider an exponential decay of the eigenvalues, defined by \(\lambda _n \,{:}{=}\exp (-n)\). The results of this experiment for \(N=10\) and \(N=20\) can be found in Table 1 and Fig. 2.
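
A point evaluation of this coefficient is straightforward; the following sketch implements (26), where the function name and the `decay` argument are our own:

```python
import numpy as np

def diffusion_coefficient(y, x, decay=lambda n: np.exp(-n)):
    """Evaluate a(y, x) from (26) at x = (x1, x2) for a parameter vector
    y of length N; `decay` supplies the eigenvalues lambda_n (here
    lambda_n = exp(-n))."""
    n = np.arange(1, len(y) + 1)
    b_n = np.sin(2 * np.pi * n * x[0]) * np.sin(2 * np.pi * n * x[1])
    return 2.0 + float(np.sum(np.sqrt(decay(n)) * b_n * np.asarray(y)))
```

The algebraic decays considered below are obtained by passing `decay=lambda n: 1.0 / n**4` or `decay=lambda n: 1.0 / n**2` instead.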

Table 1 Karhunen–Loève expansion with exponential decay: multilevel approximation for \(L=7\) with the number of tensor evaluations for Step 1 and Step 2 and the time spent on each level

From the last two columns of Table 1, it can be seen that our adaptive choice of the polynomial degrees and of the hierarchical ranks successfully equilibrates the error across the finite element levels. The hierarchical ranks first increase and then decrease again as the level increases. This decrease is the most important feature of our approach: it significantly reduces the number of queries to the solution on the finer levels and thereby the cost of the overall solution process. Figure 2 shows that the error decreases proportionally to \(h_L\) as the maximal level \(L\) increases, as expected from our error estimates.

Fig. 2

Karhunen–Loève expansion with exponential decay for \(N=10\). Left: errors \(\varepsilon ^{\mathrm {ML}}_{L,2} [u]\) (interpolation L2), \(\varepsilon ^{\mathrm {ML}}_{L,\infty } [u]\) (interpolation L\(\infty \)), and \(\varepsilon ^{\mathbb {E}}_L [u]\) (expectation) for the solution. Right: errors \(\varepsilon ^{\mathrm {ML}}_L [\psi ]\) (interpolation) and \(\varepsilon ^{\mathbb {E}}_L [\psi ]\) (expectation) for the output functional

In Fig. 3, we report the hierarchical ranks of the approximate tensor \(\tilde{{\mathbf {X}}}^{(\ell )} \approx {\mathbf {X}}^{(\ell )}\) from (19) on level \(\ell =4\) for \(N=10\) (left) and \(N=20\) (right). Each node \(t\in T_{{\mathcal {D}}}\) in the trees in Fig. 3 is equipped with the corresponding rank \(r_t = {\text {rank}}\left( {\mathcal {M}}_t(\tilde{{\mathbf {X}}}^{(\ell )})\right) \) from Definition 2. We observe that for \(N=20\) many more of the ranks \(r_t\) are very small, in particular for nodes t associated with higher values of the index n, which also leads to a reduced effective rank \(r_{\mathrm {eff}}\) compared to the case \(N=10\). This behavior is most likely due to the decreasing importance of the random variables corresponding to the trailing terms of the Karhunen–Loève expansion as the dimension N increases.

Fig. 3

Karhunen–Loève expansion with exponential decay: hierarchical ranks for \(\tilde{{\mathbf {X}}}^{(\ell )} \approx {\mathbf {X}}^{(\ell )}\) from (19) on level \(\ell =4\). Left: \(N=10\). Right: \(N=20\)

6.2 Karhunen–Loève expansion with fast algebraic decay

In this experiment, the diffusion coefficient is again given by (26). We now consider an algebraic decay of the eigenvalues, defined by \(\lambda _n \,{:}{=}1/n^4\). The results of this experiment for \(N=10\) and \(N=20\) can be found in Table 2 and Fig. 4.

Table 2 Karhunen–Loève expansion with fast algebraic decay: multilevel approximation for \(L=7\) with the number of tensor evaluations for Step 1 and Step 2 and the time spent on each level
Fig. 4

Karhunen–Loève expansion with fast algebraic decay for \(N=10\). Left: errors \(\varepsilon ^{\mathrm {ML}}_{L,2} [u]\), \(\varepsilon ^{\mathrm {ML}}_{L,\infty } [u]\), and \(\varepsilon ^{\mathbb {E}}_L [u]\) for the solution. Right: errors \(\varepsilon ^{\mathrm {ML}}_L [\psi ]\) and \(\varepsilon ^{\mathbb {E}}_L [\psi ]\) for the output functional

6.3 Karhunen–Loève expansion with slow algebraic decay

In this experiment, the diffusion coefficient is again given by (26), now with a slower algebraic decay of the eigenvalues, defined by \(\lambda _n \,{:}{=}1/n^2\). The results of this experiment for \(N=10\) can be found in Table 3 and Fig. 5. As expected, the maximal hierarchical rank becomes significantly higher than for the fast algebraic decay.

Table 3 Karhunen–Loève expansion with slow algebraic decay: multilevel approximation for \(L=7\) with the number of tensor evaluations for Step 1 and Step 2 and the time spent on each level
Fig. 5

Karhunen–Loève expansion with slow algebraic decay for \(N=10\). Left: errors \(\varepsilon ^{\mathrm {ML}}_{L,2} [u]\), \(\varepsilon ^{\mathrm {ML}}_{L,\infty } [u]\), and \(\varepsilon ^{\mathbb {E}}_L [u]\) for the solution. Right: errors \(\varepsilon ^{\mathrm {ML}}_L [\psi ]\) and \(\varepsilon ^{\mathbb {E}}_L [\psi ]\) for the output functional

6.4 Log-uniform case

Finally, to demonstrate that our approach does not depend on an affine-linear decomposition of the diffusion coefficient with respect to the parameters, we consider

$$\begin{aligned} a({{\varvec{y}}},x) = \exp \left( \sum _{n=1}^N \sqrt{\lambda _n} b_n(x) y_n \right) , \end{aligned}$$

with an algebraic decay defined by \(\lambda _n \,{:}{=}1/n^2\). The results of this experiment for \(N=10\) can be found in Table 4 and Fig. 6.
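
The same building blocks give a point evaluation of this non-affine coefficient; a sketch, reusing \(b_n\) from (26):

```python
import numpy as np

def diffusion_coefficient_log(y, x):
    """Log-uniform coefficient: exponential of the (no longer affine)
    Karhunen-Loeve sum, with b_n from (26) and lambda_n = 1/n^2, so that
    sqrt(lambda_n) = 1/n."""
    n = np.arange(1, len(y) + 1)
    b_n = np.sin(2 * np.pi * n * x[0]) * np.sin(2 * np.pi * n * x[1])
    return float(np.exp(np.sum(b_n * np.asarray(y) / n)))
```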

Table 4 Log-uniform case: multilevel approximation for \(L=7\) with the number of tensor evaluations for Step 1 and Step 2 and the time spent on each level
Fig. 6

Log-uniform case for \(N=10\). Left: errors \(\varepsilon ^{\mathrm {ML}}_{L,2} [u]\), \(\varepsilon ^{\mathrm {ML}}_{L,\infty } [u]\), and \(\varepsilon ^{\mathbb {E}}_L [u]\) for the solution. Right: errors \(\varepsilon ^{\mathrm {ML}}_L [\psi ]\) and \(\varepsilon ^{\mathbb {E}}_L [\psi ]\) for the output functional

7 Conclusions

In this article, we have considered multilevel tensor approximation for elliptic partial differential equations with a random diffusion coefficient. By combining the multilevel idea for the approximation in the random parameter, first introduced in the context of multilevel Monte Carlo methods, with a hierarchical tensor product approximation, we provide an efficient means to directly represent the solution in a data-sparse format. This representation can be employed directly for the evaluation of various functionals of the solution without additional costly computations. In contrast to previous works, we do not rely on an a priori sparsified polynomial representation, but adaptively compute a data-sparse representation of the solution with the aid of the hierarchical tensor format and cross approximation. The numerical results confirm the effectiveness of the presented method.