Foundations of Computational Mathematics, Volume 15, Issue 4, pp 839–898

Adaptive Near-Optimal Rank Tensor Approximation for High-Dimensional Operator Equations


Abstract

We consider a framework for the construction of iterative schemes for operator equations that combine low-rank approximation in tensor formats and adaptive approximation in a basis. Under fairly general assumptions, we conduct a rigorous convergence analysis, where all parameters required for the execution of the methods depend only on the underlying infinite-dimensional problem, but not on a concrete discretization. Under certain assumptions on the rates for the involved low-rank approximations and basis expansions, we can also give bounds on the computational complexity of the iteration as a function of the prescribed target error. Our theoretical findings are illustrated and supported by computational experiments. These demonstrate that problems in very high dimensions can be treated with controlled solution accuracy.

Keywords

Low-rank tensor approximation · Adaptive methods · High-dimensional operator equations · Computational complexity

Mathematics Subject Classification

41A46 · 41A63 · 65D99 · 65J10 · 65N12 · 65N15

1 Introduction

1.1 Motivation

Any attempt to recover or approximate a function of a large number of variables with the aid of classical low-dimensional techniques is inevitably impeded by the curse of dimensionality. This means that when only assuming classical smoothness (e.g., in terms of Sobolev or Besov regularity) of order \(s>0\), the necessary computational work needed to realize a desired target accuracy \(\varepsilon \) in \(d\) dimensions scales like \(\varepsilon ^{-d/s}\), i.e., one faces an exponential increase in the spatial dimension \(d\). This can be ameliorated by dimension-dependent smoothness measures. In many high-dimensional problems of interest, the approximand has bounded high-order mixed derivatives, which under suitable assumptions can be used to construct sparse grid-type approximations where the computational work scales like \(C_d \varepsilon ^{-1/s}\). Under such regularity assumptions, one can thus obtain a convergence rate independent of \(d\). In general, however, the constant \(C_d\) will still grow exponentially in \(d\). This has been shown to hold even under extremely restrictive smoothness assumptions in [30] and has been observed numerically in a relatively simple but realistic example in [13].

Hence, in contrast to the low-dimensional regime, regularity is no longer a sufficient structural property that ensures computational feasibility, and additional low-dimensional structure of the sought high-dimensional object is required. Such a structure could be the dependence of the function on a much smaller (unknown) number of variables; see, for example, [12]. It could also mean sparsity with respect to some (a priori) unknown dictionary. In particular, dictionaries consisting of rank-one tensors \(g(x_1,\ldots , x_d)=g_1(x_1)\cdots g_d(x_d)=: (g_1\otimes \cdots \otimes g_d)(x)\) open very promising perspectives and have recently attracted substantial attention.

As a simple example consider \(g(x)= \bigotimes _{i=1}^d g_i(x_i)\) on the unit cube \(\varOmega = [0,1]^d\), where the \(g_i\) are sufficiently smooth. Employing for each factor \(g_i\) a standard spline approximation of order \(s\) with \(n\) knots yields an \(L_\infty \)-accuracy of order \(n^{-s}\), which gives rise to an overall accuracy on the order of \(d n^{-s}\) at the expense of \(dn=:N\) degrees of freedom. Hence, assuming that \(||g||_{\infty }\) does not depend on \(d\), an accuracy \(\varepsilon \) requires
$$\begin{aligned} N=N(\varepsilon ,d)\sim d^{\frac{1+s}{s}} \varepsilon ^{-1/s} \end{aligned}$$
(1)
degrees of freedom. In contrast, it would take the order of \(N= n^d\) degrees of freedom to realize an accuracy of order \(n^{-s} = N^{-s/d}\) when using a standard tensor product spline approximation, which means that in this case \(N(\varepsilon ,d)\sim \varepsilon ^{-d/s}\). Thus, while the first approximation – using a nonlinear parametrization of a reference basis – breaks the curse of dimensionality, the second one obviously does not.
Of course, it is generally unrealistic to assume that \(u\) is a simple tensor, but the curse of dimensionality can still be significantly mitigated when \(u\) is well approximable by relatively short sums of rank-one tensors. By this we mean that for some norm \(\Vert \cdot \Vert \) we have
$$\begin{aligned} \Big \Vert { u - \sum _{j=1}^{r(\varepsilon )} g_{j,1}\otimes \cdots \otimes g_{j,d}} \Big \Vert \le \varepsilon \end{aligned}$$
(2)
where the rank \(r(\varepsilon )\) grows only moderately as \(\varepsilon \) decreases. In our initial example, in these terms we had \(r(\varepsilon )=1\) for all \(\varepsilon >0\). Assuming that all the factors \(g_{j,i}\) in the preceding approximation are sufficiently smooth, the count (1) applied to each summand with target accuracy \(\varepsilon /r\) shows that now at most
$$\begin{aligned} N(\varepsilon ,d,r) \lesssim r^{1+\frac{1}{s}}d^{\frac{1+s}{s}} \varepsilon ^{-\frac{1}{s}} \end{aligned}$$
(3)
degrees of freedom are required, which is still acceptable. This is clearly very crude reasoning because it does not take into account a possible additional decay in the rank-one summands.
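To put some numbers to these counts, the following short computation (an illustrative sketch only; the constants hidden in (1) and (3) are ignored, and the values of \(s\), \(r\), and \(\varepsilon \) are chosen arbitrarily) compares them with the full tensor-product count \(\varepsilon ^{-d/s}\) for a few dimensions \(d\).

```python
# Illustrative comparison of the degree-of-freedom counts discussed above;
# constants hidden in (1) and (3) are ignored, values are for orientation only.
s, eps, r = 2.0, 1e-3, 5   # smoothness order, target accuracy, assumed rank (all illustrative)

for d in (2, 5, 10, 50):
    n_full = eps ** (-d / s)                                  # full tensor-product grid: eps^{-d/s}
    n_rank1 = d ** ((1 + s) / s) * eps ** (-1 / s)            # single rank-one term, cf. (1)
    n_rankr = r ** (1 + 1 / s) * d ** ((1 + s) / s) * eps ** (-1 / s)   # rank-r sum, cf. (3)
    print(f"d={d:3d}:  full {n_full:.3e}   rank-1 {n_rank1:.3e}   rank-{r} {n_rankr:.3e}")
```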

This argument, however, already indicates that good approximability in the sense of (2) is not governed by classical regularity assumptions. Instead, the key is to exploit an approximate global low-rank structure of \(u\). This leads to a highly nonlinear approximation problem, where one aims to identify suitable lower-dimensional tensor factors, which can be interpreted as a \(u\)-dependent dictionary.

This discussion, though admittedly somewhat oversimplified, immediately raises several questions that we will briefly discuss as they guide subsequent developments.

Format of approximation: the hope that \(r(\varepsilon )\) in (2) can be rather small is based on the fact that the rank-one tensors are allowed to “optimally adapt” to the approximand \(u\). The format of the approximation used in (2) is sometimes called canonical since it is a formal direct generalization of classical Hilbert–Schmidt expansions for \(d=2\). However, a closer look reveals a number of well-known pitfalls. In fact, they are already encountered in the discrete case. The collection of sums of rank-one tensors of a given length is not closed, and the best approximation problem is not well-posed; see, for example, [35]. There appears to be no reliable computational strategy that can be proven to yield near-minimal rank approximations for a given target accuracy in this format. In this work, we therefore employ different tensor formats that allow us to obtain provably near-minimal rank approximations, as explained later.

A two-layered problem: Given a suitable tensor format, even if a best tensor approximation is known in the infinite-dimensional setting of a continuous problem, the resulting lower-dimensional factors still need to be approximated. Since finding these factors is part of the solution process, the determination of efficient discretizations for these factors will need to be intertwined with the process of finding low-rank expansions. We have chosen here to organize this process through selecting low-dimensional orthonormal wavelet bases for the tensor factors. However, other types of basis expansions would be conceivable as well.

The issue of the total complexity of tensor approximations, taking into account the approximation of the involved lower-dimensional factors, is addressed in [19, 34].

1.2 Conceptual Preview

The problem of finding a suitable format of tensor approximations has been extensively studied in the literature over the years, however, mainly in the discrete or finite-dimensional setting; see, for example, [17, 22, 25, 31, 33]. Some further aspects in a function space setting have been addressed, for example, in [14, 39, 40]. For an overview and further references we also refer the reader to [20] and the recent survey [18]. A central question in these works is: given a tensor, how can one obtain in a stable manner low-rank approximations, and how accurate are they when compared with best tensor approximations in the respective format?

We shall heavily draw on these findings in the present paper, but under the following somewhat different perspectives. First of all, we are interested in the continuous infinite-dimensional setting, i.e., in sparse tensor approximations of a function that is a priori not given in any finite tensor format but that one may expect to be well approximable by simple tensors in a way to be made precise later. We shall not discuss here the question of the concrete conditions under which this is actually the case. Moreover, the objects to be recovered are not given explicitly but only implicitly as a solution to an operator equation
$$\begin{aligned} Au =f, \end{aligned}$$
(4)
where \(A: V\rightarrow V'\) is an isomorphism of some Hilbert space \(V\) onto its dual \(V'\). One may think of \(V\), in the simplest instance, as a high-dimensional \(\mathrm{L}_{2}\) space, or as a Sobolev space. More generally, as in the context of parametric diffusion problems, \(V\) could be a tensor product of a Sobolev space and an \(\mathrm{L}_{2}\) space. Accordingly, we shall always assume that we have a Gelfand triplet
$$\begin{aligned} V\subset H \equiv H' \subset V', \end{aligned}$$
(5)
in the sense of dense continuous embeddings, where we assume that \(H\) is a tensor product Hilbert space, that is,
$$\begin{aligned} H = H_1 \otimes \cdots \otimes H_d \end{aligned}$$
(6)
with lower-dimensional Hilbert spaces \(H_i\). A typical example would be \(H = \mathrm{L}_{2}(\varOmega ^d) = \mathrm{L}_{2}(\varOmega )\otimes \cdots \otimes \mathrm{L}_{2}(\varOmega )\) for a domain \(\varOmega \) of small spatial dimension.

The main contribution of this work is to put forward a strategy that addresses the main obstacles identified previously and results in an algorithm that, under mild assumptions, can be rigorously proven to provide for any target accuracy \(\varepsilon \) an approximate solution of near-minimal rank and representation complexity of the involved tensor factors. Specifically, (i) it is based on stable tensor formats relying on optimal subspaces; (ii) successive solution updates involve a combined refinement of ranks and factor discretizations; (iii) (near-)optimality is achieved, thanks to (i), through accompanying suitable subspace correction and coarsening schemes.

The following comments on the main ingredients are to provide some orientation. A first essential step is to choose a universal basis for functions of a single variable in \(H_i\). Here, we focus on wavelet bases, but other systems, like the trigonometric system for periodic problems, are conceivable as well. As soon as functions of a single variable, especially the factors in our rank-one tensors, are expanded in such a basis, the whole problem of approximating \(u\) reduces to approximating its infinite coefficient tensor \(\mathbf{u}\) induced by the expansion
$$\begin{aligned} u=\sum _{\nu \in \nabla ^d} u_\nu \, \varPsi _\nu ,\quad \varPsi _\nu := \psi _{\nu _1}\otimes \cdots \otimes \psi _{\nu _d},\quad \mathbf{u}= (u_\nu )_{\nu \in \nabla ^d}, \end{aligned}$$
see subsequent discussion. The original operator Eq. (4) is then equivalent to an infinite system
$$\begin{aligned} \mathbf {A}\mathbf{u}=\mathbf {f},\quad \text{ where }\quad \mathbf {A}=\big (\langle A\varPsi _\nu ,\varPsi _{\nu '}\rangle \big )_{\nu ,\nu '\in \nabla ^d},\quad \mathbf {f}= \big (\langle f,\varPsi _\nu \rangle \big )_{\nu \in \nabla ^d}. \end{aligned}$$
(7)
For standard types of Sobolev spaces \(V\) it is well understood how to rescale the tensor product basis \(\{ \varPsi _\nu \}_{\nu \in \nabla ^d}\) in such a way that it becomes a Riesz basis for \(V\). This, in turn, together with the fact that \(\kappa _{V\rightarrow V'}(A):= \Vert A\Vert _{V\rightarrow V'}\Vert A^{-1}\Vert _{V'\rightarrow V}\) is finite, allows one to show that \(\kappa _{\ell _2\rightarrow \ell _2}(\mathbf {A})\) is finite, see [11]. Hence one can find a positive \(\omega \) such that \(\Vert \mathbf{I}- \omega \mathbf {A}\Vert _{\ell _2\rightarrow \ell _2}\le \rho < 1\), i.e., the operator \(\mathbf{I}- \omega \mathbf {A}\) is a contraction, so that the iteration
$$\begin{aligned} \mathbf{u}_{k+1} := \mathbf{u}_k +\omega (\mathbf {f}- \mathbf {A}\mathbf{u}_k),\quad k=0,1,2,\ldots , \end{aligned}$$
(8)
converges for any initial guess to the solution \(\mathbf{u}\) of (7).
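As a finite-dimensional illustration of the idealized iteration (8) (a minimal sketch; the matrix below is a hypothetical stand-in for a finite section of the rescaled operator \(\mathbf {A}\)), a damped Richardson iteration converges linearly as soon as \(\Vert \mathbf{I}-\omega \mathbf {A}\Vert \le \rho < 1\).

```python
import numpy as np

# Minimal sketch of the Richardson iteration (8) on a small symmetric positive
# definite matrix standing in for a finite section of A (hypothetical data).
rng = np.random.default_rng(0)
n = 50
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)                 # SPD test matrix
f = rng.standard_normal(n)
u_exact = np.linalg.solve(A, f)

eigs = np.linalg.eigvalsh(A)
omega = 2.0 / (eigs.min() + eigs.max())     # yields ||I - omega*A|| = rho < 1
rho = (eigs.max() - eigs.min()) / (eigs.max() + eigs.min())

u = np.zeros(n)
for k in range(200):
    u = u + omega * (f - A @ u)             # iteration (8)

print(f"contraction factor rho = {rho:.3f},  final error = {np.linalg.norm(u - u_exact):.2e}")
```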

Of course, (8) is only an idealization because the full coefficient sequences \(\mathbf{u}_k\) cannot be computed. Nevertheless, adaptive wavelet methods can be viewed as realizing (8) approximately, keeping only relatively few wavelet coefficients “active” while still preserving enough accuracy to ensure convergence to \(\mathbf{u}\) (e.g., [9, 10]).

In the present high-dimensional context this kind of adaptation is no longer feasible. Instead, we propose here a much more nonlinear adaptation concept. Being able to keep increasingly accurate approximations on a path toward near-minimal rank approximations with properly sparsified tensor factors relies crucially on suitable correction mechanisms. An important contribution of this work is to identify and analyze just such methods. Conceptually, they are embedded in a properly perturbed numerical realization of (8) of the form
$$\begin{aligned} \mathbf{u}_{k+1} = \mathrm{C}_{\varepsilon _2(k)}\big (\mathrm{P}_{\varepsilon _1(k)}(\mathbf{u}_k + \omega (\mathbf {f}- \mathbf {A}\mathbf{u}_k))\big ), \quad k=0,1,2,\ldots , \end{aligned}$$
(9)
where \(\mathrm{P}_{\varepsilon _1(k)}\) and \(\mathrm{C}_{\varepsilon _2(k)}\) are certain reduction operators, and the \(\varepsilon _i(k)\), \(i=1,2\), are suitable tolerances that decrease for increasing \(k\).

More precisely, the purpose of \(\mathrm{P}_{\varepsilon }\) is to “correct” the current tensor expansion and, in doing so, reduce the rank subject to an accuracy tolerance \(\varepsilon \). We shall always refer to such a rank reduction operation as a recompression. For this operation to work as desired, it is essential that the employed tensor format be stable in the sense that the best approximation problem for any given rank is well-posed. As explained earlier, this excludes the use of the canonical format. Instead, we use the so-called hierarchical Tucker (HT) format since on the one hand it inherits the stability of the Tucker format [14], as a classical best subspace method, while on the other hand it better ameliorates the curse of dimensionality that the Tucker format may still be prone to. In Sect. 2 we collect the relevant prerequisites. This draws to a large extent on known results for the finite-dimensional case but requires proper formulation and extension of these notions and facts for the current sequence space setting. The second reduction operation \(\mathrm{C}_{\varepsilon }\), in turn, is a coarsening scheme that reduces the number of degrees of freedom used by the wavelet representations of the tensor factors, again subject to some accuracy constraint \(\varepsilon \).
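The following sketch mimics the structure of the perturbed iteration (9) in the simplest case \(m=2\), where the recompression \(\mathrm{P}_{\varepsilon }\) can be realized by a truncated SVD and the coarsening \(\mathrm{C}_{\varepsilon }\) by discarding small coefficients. The operator, tolerances, and data are purely illustrative; this is not the algorithm analyzed later in the paper, only an indication of how the two reduction operators enter the basic iteration.

```python
import numpy as np

def recompress(U, eps):
    # P_eps: truncate the SVD (= HOSVD for m = 2) to the smallest rank whose
    # discarded singular values have l2 norm at most eps.
    Q, s, Vt = np.linalg.svd(U, full_matrices=False)
    tail = np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]])   # tail[r] = sum_{k>=r} s_k^2
    r = int(np.argmax(tail <= eps ** 2))
    return (Q[:, :r] * s[:r]) @ Vt[:r, :]

def coarsen(U, eps):
    # C_eps: a crude stand-in for coefficient coarsening: drop the smallest
    # entries whose combined l2 contribution stays (roughly) below eps.
    flat = np.sort(np.abs(U).ravel())
    k = int(np.searchsorted(np.cumsum(flat ** 2), eps ** 2, side="right"))
    return U if k == 0 else np.where(np.abs(U) > flat[k - 1], U, 0.0)

rng = np.random.default_rng(1)
n = 60
A1 = np.eye(n) + 0.1 * rng.standard_normal((n, n)) / np.sqrt(n)   # hypothetical factors of A
A2 = np.eye(n) + 0.1 * rng.standard_normal((n, n)) / np.sqrt(n)
apply_A = lambda U: A1 @ U @ A2.T          # A acts mode-wise on order-2 coefficient tensors
F = np.outer(rng.standard_normal(n), rng.standard_normal(n))      # rank-one data

U, omega = np.zeros((n, n)), 1.0           # omega = 1 suffices here: A is a small perturbation of I
for k in range(25):
    eps_k = 2.0 ** (-k)                    # decreasing tolerances eps_1(k), eps_2(k)
    U = coarsen(recompress(U + omega * (F - apply_A(U)), eps_k), 0.5 * eps_k)
print("rank of final iterate:", np.linalg.matrix_rank(U),
      " residual:", np.linalg.norm(F - apply_A(U)))
```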

1.3 What Is New?

The use of rank reduction techniques in iterative schemes is in principle not new; see, for example, [3, 5, 6, 21, 24, 26, 28] and the further references given in [18]. To our knowledge, corresponding approaches can be subdivided roughly into two categories. In the first one, iterates are always truncated to a fixed tensor rank. This allows one to control the complexity of the approximation, but convergence of such iterations can be guaranteed only under very restrictive assumptions (e.g., concerning highly effective preconditioners). In the second category, schemes achieve a desired target accuracy by instead prescribing an error tolerance for the rank truncations, but the corresponding ranks arising during the iteration are not controlled. A common feature of both groups of results is that they operate on a fixed discretization of the underlying continuous problems.

In contrast, the principal novelty of the present approach can be sketched as follows. The first key element is to show that, based on a known error bound for a given approximation to the unknown solution, a judiciously chosen recompression produces a near-minimal rank approximation to the solution of the continuous problem for a slightly larger accuracy tolerance. Moreover, the underlying projections are stable with respect to certain sparsity measures. As pointed out earlier, this reduction needs to be intertwined with a sufficiently accurate but possibly coarse approximation of the tensor factors. A direct coarsening of the full wavelet coefficient tensor would face the curse of dimensionality and, thus, would be practically infeasible. The second critical element is therefore to introduce certain lower-dimensional quantities, termed tensor contractions, from which the degrees of freedom to be discarded in the coarsening are identified. This notion of contractions also serves to define suitable sparsity classes with respect to wavelet coefficients, facilitating a computationally efficient, rigorously founded combination of tensor recompression and coefficient coarsening.

These concepts culminate in the main result of this paper, which can be summarized in an admittedly oversimplified way as follows.

Meta-Theorem: Whenever the solution to (7) has certain tensor-rank approximation rates and when the involved tensor factors have certain best \(N\)-term approximation rates, a judicious numerical realization of the iteration (9) realizes these rates. Moreover, up to logarithmic factors, the computational complexity is optimal. More specifically, for the smallest \(k\) such that the approximate solution \(\mathbf{u}_k\) satisfies \(\Vert \mathbf{u}_k -\mathbf{u}\Vert _{\ell _2}\le \tau \), the iterate \(\mathbf{u}_k\) has HT ranks that can be bounded, up to multiplication by a uniform constant, by the smallest possible HT ranks needed to realize accuracy \(\tau \).

In the theorem that we will eventually prove we admit classes of operators with unbounded ranks, in which case the rank bounds contain a factor of the form \(|\log \tau |^c\), where \(c\) is a fixed exponent.

To our knowledge, this is the first result of this type, where convergence to the solution of the infinite-dimensional problem is guaranteed under realistic assumptions, and all ranks arising during the process remain proportional to the respective smallest possible ones. A rigorous proof of rank near optimality, using an iteration of the preceding type, is to be contrasted with approaches based on greedy approximation as studied, for example, in [7], where approximations in the (unstable) canonical format are constructed through successive greedy updates. In principle, this does not seem to offer much hope for finding minimal or near-minimal rank approximations, as the greedy search operates far from orthonormal bases, and errors committed early in the iteration cannot easily be corrected. Although variants of the related proper generalized decomposition, as studied in [16], can alleviate some of these difficulties, for example by employing different tensor formats, the basic issue of controlling ranks in a greedy procedure remains.

1.4 Layout

The remainder of the paper is devoted to developing these ingredients, together with the complexity analysis needed to make the statements in the preceding metatheorem precise. Carrying out this program raises some issues that we briefly address now, as they guide the subsequent developments.

After collecting some preliminaries in Sect. 2, we devote Sect. 3 to a pivotal element of our approach, namely, the development and analysis of suitable recompression and coarsening schemes that yield an approximation in the HT format, that is, for a given target accuracy, of near-minimal rank with possibly sparse tensor factors (in a sense to be made precise later).

Of course, one can hope that the solution of (4) is particularly tensor sparse in the sense that relatively low HT ranks already provide high accuracy if the data \(f\) are tensor sparse and if the operator \(A\) (resp. \(\mathbf {A}\)) is tensor sparse in the sense that its application does not increase ranks too drastically. Suitable models of operator classes that allow us to properly weigh tensor sparsity and wavelet expansion sparsity are introduced and analyzed in Sect. 4. The approximate application of such operators with certified output accuracy builds on the findings in Sect. 3.

Finally, in Sect. 5 we formulate an adaptive iterative algorithm and analyze its complexity. Starting from the coarsest possible approximation \(\mathbf{u}^0 =0\), approximations in the tensor format are built successively, where the error tolerances in the iterative scheme are updated for each step in such a way that two goals are achieved. On the one hand, the tolerances are sufficiently stringent to guarantee the convergence of the iteration up to any desired target accuracy. On the other hand, we ensure that at each stage of the iteration, the approximations remain sufficiently coarse to realize the metatheorem formulated earlier. Here we specify concrete tensor approximability assumptions on \(\mathbf{u}\), \(\mathbf {f}\), and \(\mathbf {A}\) that allow us to make its statement precise.

2 Preliminaries

In this section we set the notation and collect the relevant ingredients for stable tensor formats in the infinite-dimensional setting. In the remainder of this work, for simplicity’s sake we shall use the abbreviation \(||\cdot ||:=||\cdot ||_{\mathrm{\ell }_{2}}\), where the \(\mathrm{\ell }_{2}\)-space is taken over the appropriate index set.

Our basic assumption is that we have a Riesz basis \(\{\varPsi _\nu \}_{\nu \in \nabla ^d}\) for \(V\), where \(\nabla \) is a countable index set. In other words, we require that the index set have a Cartesian product structure. Therefore, any \(u\in V\) can be identified with its basis coefficient sequence \({\mathbf {u}} := {(u_\nu )_{\nu \in \nabla ^d}}\) in the unique representation \(u=\sum _{\nu \in \nabla ^d}u_\nu \varPsi _\nu \), with uniformly equivalent norms. Thus, \(d\) will in general correspond to the spatial dimension of the domain of functions under consideration. In addition, it can be important to reserve the option of grouping some of the variables into a possibly smaller number \(m\le d\) of groups of variables, i.e., \(m\in \mathrm{I}\!\mathrm{N}\) and \(d = d_1 + \cdots + d_m\) for \(d_i\in \mathbb {N}\).

A canonical point of departure for the construction of \(\{ \varPsi _\nu \}\) is a collection of Riesz bases for each component Hilbert space \(H_i\) [see (6)], which we denote by \(\{ \psi ^{H_i}_\nu \}_{\nu \in \nabla ^{H_i}}\). To fit in the preceding context, we may assume without loss of generality that all \(\nabla ^{H_i}\) are identical, denoted by \(\nabla \). The precise structure of \(\nabla \) is irrelevant at this point; however, in the case where the \(\psi ^{H_i}_\nu \) are wavelets, each \(\nu =(j,k)\) encodes a dyadic level \(j=|\nu |\) and a spatial index \(k=k(\nu )\). This latter case is of particular interest since, for instance, when \(V\) is a Sobolev space, a simple rescaling of \(\psi ^{H_1}_{\nu _1} \otimes \cdots \otimes \psi ^{H_d}_{\nu _d}\) yields a Riesz basis \(\{ \varPsi _\nu \}\) for \(V\subseteq H\) as well.

A simple scenario would be \(V=H=\mathrm{L}_{2}((0,1)^d)\), which is the situation considered in our numerical illustration in Sect. 6. A second example is given by elliptic diffusion equations with stochastic coefficients. In this case, \(V= \mathrm{H}^{1}_0(\varOmega ) \otimes \mathrm{L}_{2}((-1,1)^\infty )\), and \(H = \mathrm{L}_{2}(\varOmega \times (-1,1)^\infty )\). Here a typical choice of basis for \(\mathrm{L}_{2}((-1,1)^\infty )\) is given by tensor products of polynomials on \((-1,1)\), while one can take a wavelet basis for \(\mathrm{H}^{1}_0(\varOmega )\), obtained by rescaling a standard \(\mathrm{L}_{2}\) basis. A third representative scenario concerns diffusion equations on high-dimensional product domains \(\varOmega ^d\). Here, for instance, \(V = \mathrm{H}^{1}(\varOmega ^d)\) and \(H = \mathrm{L}_{2}(\varOmega ^d)\). We shall comment on some additional difficulties that arise in the application of operators in this case in Remark 18.

We now regard \(\mathbf{u}\) as a tensor of order \(m\) on \(\nabla ^d = \nabla ^{d_1}\times \cdots \times \nabla ^{d_m}\) and look for representations or approximations of \(\mathbf{u}\) in terms of rank-one tensors:
$$\begin{aligned} \mathbf{V}^{(1)}\otimes \cdots \otimes \mathbf{V}^{(m)} := \big (V^{(1)}_{\nu _1}\cdots V^{(m)}_{\nu _m}\big )_{\nu =(\nu _1,\ldots ,\nu _m)\in \nabla ^d}. \end{aligned}$$
Rather than looking for approximations or representations in the canonical format
$$\begin{aligned} \mathbf{u}= \sum _{k=1}^r a_k \mathbf{U}_k^{(1)}\otimes \cdots \otimes \mathbf{U}_k^{({m})}, \end{aligned}$$
we will employ tensor representations of a format that is perhaps best motivated as follows. Consider for each \(i=1,\ldots ,m\) (finitely or infinitely many) pairwise orthonormal sequences \(\mathbf{U}^{(i)}_k =(U^{(i)}_{\nu _i,k})_{\nu _i\in \nabla ^{d_i}}\in \ell _2(\nabla ^{d_i})\), \(k=1,\ldots ,r_i\), that is,
$$\begin{aligned} \langle \mathbf{U}^{(i)}_k,\mathbf{U}^{(i)}_l\rangle := \sum _{\nu _i\in \nabla ^{d_i}} U^{(i)}_{\nu _i,k}U^{(i)}_{\nu _i,l} = \delta _{k,l},\quad i=1,\ldots ,m. \end{aligned}$$
We stress that here and in the sequel \(r_i=\infty \) is admitted. The matrices \(\mathbf{U}^{(i)} = \big (U^{(i)}_{\nu _i,k}\big )_{\nu _i\in \nabla ^{d_i},1\le k\le r_i}\) are often termed orthonormal mode frames. It will be convenient to use the notational convention \({\mathsf {k}} = (k_1,\ldots , k_t)\), \({\mathsf {n}} = (n_1,\ldots ,n_t)\), \({\mathsf {r}} = (r_1,\ldots ,r_t)\), and so forth, for multiindices in \(\mathrm{I}\!\mathrm{N}^t_0\), \(t\in \mathrm{I}\!\mathrm{N}\). Defining for \({\mathsf {r}} \in \mathrm{I}\!\mathrm{N}_0^m\)
$$\begin{aligned} {\mathsf {K}_m}({\mathsf {r}}) := \left\{ \begin{array}{ll} \times _{i=1}^m \{{1},\ldots ,{{r}_i}\} &{}\quad \hbox {if } \min {\mathsf {r}} > 0,\\ \emptyset &{}\quad \hbox {if } \min {\mathsf {r}} = 0, \end{array} \right. \end{aligned}$$
and noting that \(\ell _2(\nabla ^d)=\bigotimes _{j=1}^m\ell _2(\nabla ^{d_j})\) is a tensor product Hilbert space, the tensors
$$\begin{aligned} \mathbb {U}_{\mathsf {k}} := \mathbf{U}^{(1)}_{k_1}\otimes \cdots \otimes \mathbf{U}^{(m)}_{k_m}, \quad {\mathsf {k}}\in {\mathsf {K}_m}({\mathsf {r}}), \end{aligned}$$
(10)
form an orthonormal basis for the subspace of \(\ell _2(\nabla ^d)\), generated by the system \(\mathbb {U}:= (\mathbb {U}_{\mathsf {k}})_{{\mathsf {k}}\in {\mathsf {K}_m}({\mathsf {r}})}\). Hence, for any \(\mathbf{u}\in \ell _2(\nabla ^d)\) the orthogonal projection
$$\begin{aligned} \mathrm{P }_{\mathbb {U}}\mathbf{u}=\sum _{{\mathsf {k}}\in {\mathsf {K}_m}({\mathsf {r}})} a_{\mathsf {k}} \mathbb {U}_{\mathsf {k}}, \quad a_{\mathsf {k}} = \langle \mathbf{u}, \mathbb {U}_{\mathsf {k}} \rangle ,\, {\mathsf {k}}\in {\mathsf {K}_m}({\mathsf {r}}), \end{aligned}$$
(11)
is the best approximation to \(\mathbf{u}\in \ell _2(\nabla ^d)\) from the subspace spanned by \(\mathbb {U}\). The uniquely defined order-\(m\) tensor \(\mathbf {a}\) with entries \(\langle \mathbf{u},\mathbb {U}_{\mathsf {k}}\rangle \), \({\mathsf {k}}\in {\mathsf {K}_m}({\mathsf {r}})\), is referred to as a core tensor. Moreover, when the \(\mathbf{U}^{(i)}_k\), \(k\in \mathrm{I}\!\mathrm{N}\), are bases for all of \(\ell _2(\nabla ^{d_i})\), that is, \({\mathsf {K}_m}({\mathsf {r}}) =\mathrm{I}\!\mathrm{N}^m\), one has, of course, \(\mathrm{P }_\mathbb {U}\mathbf{u}=\mathbf{u}\), while for any \({\mathsf {s}}\le {\mathsf {r}}\), componentwise, the “box truncation”
$$\begin{aligned} \mathrm{P }_{\mathbb {U},{\mathsf {s}}} \mathbf{u}:= \sum _{{\mathsf {k}}\in {\mathsf {K}_m}({\mathsf {s}})} \langle \mathbf{u},\mathbb {U}_{\mathsf {k}}\rangle \mathbb {U}_{\mathsf {k}} \end{aligned}$$
(12)
is a simple mechanism of further reducing the ranks of an approximation from the subspace spanned by \(\mathbb {U}\) at the expense of a minimal loss of accuracy.
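For finitely supported tensors, the box truncation (12) amounts to restricting the core tensor to the leading indices in each mode. A minimal numpy sketch for \(m=3\) (with randomly generated orthonormal mode frames; all sizes are illustrative):

```python
import numpy as np

def box_truncate(core, frames, s):
    # P_{U,s}: keep only the leading s_i columns of each mode frame, i.e.,
    # restrict the core tensor to the index box K_3(s); cf. (12), here m = 3.
    U1, U2, U3 = (U[:, :si] for U, si in zip(frames, s))
    return np.einsum('abc,ia,jb,kc->ijk', core[:s[0], :s[1], :s[2]], U1, U2, U3)

rng = np.random.default_rng(2)
n, r = 20, 6
frames = [np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(3)]   # orthonormal mode frames
core = rng.standard_normal((r, r, r))
u = np.einsum('abc,ia,jb,kc->ijk', core, *frames)                           # full representation (13)

u_s = box_truncate(core, frames, (3, 3, 3))
# by orthonormality, the error is exactly the l2 norm of the discarded core entries
err = np.linalg.norm(u - u_s)
discarded = np.sqrt(np.linalg.norm(core) ** 2 - np.linalg.norm(core[:3, :3, :3]) ** 2)
print(f"{err:.6f} == {discarded:.6f}")
```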
The existence of best approximations and their realizability through linear projections suggests approximating a given tensor in \(\ell _2(\nabla ^d)\) by expressions of the form
$$\begin{aligned} \mathbf {u} = \sum _{k_1=1}^{r_1} \cdots \sum _{k_m=1}^{r_m} a_{k_1,\ldots ,k_m} \,(\mathbf {U}^{(1)}_{k_1} \otimes \cdots \otimes \mathbf {U}^{(m)}_{k_m} ), \end{aligned}$$
(13)
even without insisting that the \(i\)th mode frame \(\mathbf{U}^{(i)}\) have pairwise orthonormal column vectors \(\mathbf {U}^{(i)}_k\in \mathrm{\ell }_{2}(\nabla ^{d_i})\), \(k = 1,\ldots ,r_i\). However, these columns can always be orthonormalized, which results in a corresponding modification of the core tensor \(\mathbf {a} = (a_{{\mathsf {k}}})_{{\mathsf {k}}\in {\mathsf {K}_m}({\mathsf {r}})}\); for fixed mode frames, the latter is uniquely determined.

When we sometimes write \((\mathbf{U}^{(i)}_k)_{k\in \mathrm{I}\!\mathrm{N}}\) for convenience, although the \(\mathbf{U}^{(i)}_k\) may be specified through (13) only for \(k\le r_i\), this is always understood to mean \(\mathbf{U}^{(i)}_k=0\) for \(k> r_i\).

If the core tensor \(\mathbf {a}\) is represented directly by its entries, then (13) corresponds to the so-called Tucker format [37, 38] or subspace representation. The hierarchical Tucker format [22], as well as the special case of the tensor train format [33], corresponds to representations in the form (13) as well but use a further structured tensor decomposition for the core tensor \(\mathbf {a}\) that can exploit a stronger type of information sparsity. For \(m=2\) the singular value decomposition (SVD) or its infinite dimensional counterpart, the Hilbert–Schmidt decomposition, yields \(\mathbf{u}\)-dependent mode frames that even give a diagonal core tensor. Although this is no longer possible for \(m > 2\), the SVD remains the main workhorse behind Tucker as well as hierarchical Tucker formats. For the reader’s convenience, we summarize in what follows the relevant facts for these tensor representations in a way tailored to present needs.

2.1 Tucker Format

It is instructive to consider first the simpler case of the Tucker format in more detail.

2.1.1 Some Prerequisites

As mentioned earlier, for a general \(\mathbf {u}\in \mathrm{\ell }_{2}(\nabla ^d)\), the sum in (13) may be infinite. For each \(i\in \{1,\ldots ,m\}\) we consider the mode-\(i\) matricization of \(\mathbf {u}\), that is, the infinite matrix \((u^{(i)}_{\nu _i,\check{\nu }_i})_{\nu _i\in \nabla ^{d_i},\,\check{\nu }_i\in \nabla ^{d-d_i}}\) with entries \(u^{(i)}_{\nu _i,\check{\nu }_i} := u_\nu \) for \(\nu \in \nabla ^{d}\), which defines a Hilbert–Schmidt operator:
$$\begin{aligned} T^{(i)}_{\mathbf{u}} :\mathrm{\ell }_{2}(\nabla ^{d-d_i}) \rightarrow \mathrm{\ell }_{2}(\nabla ^{d_i}),\quad (c_{\tilde{\nu }})_{\tilde{\nu }\in \nabla ^{d-d_i}} \mapsto \Big ( \sum _{\tilde{\nu }\in \nabla ^{d-d_i}} u^{(i)}_{\nu ,\tilde{\nu }} c_{\tilde{\nu }} \Big )_{\nu \in \nabla ^{d_i}}. \end{aligned}$$
(14)
We define the rank vector \({{\mathrm{rank}}}(\mathbf {u})\) by its entries
$$\begin{aligned} {{\mathrm{rank}}}_{i}(\mathbf {u}) := \dim {{\mathrm{range}}}T^{(i)}_\mathbf{u},\quad i=1,\ldots ,m; \end{aligned}$$
(15)
see [23]. It is referred to as the multilinear rank of \(\mathbf {u}\). We denote by
$$\begin{aligned} \mathcal {R}= \mathcal {R}_{{\mathcal {T}}} := (\mathrm{I}\!\mathrm{N}_0 \cup \{ \infty \})^m \end{aligned}$$
(16)
the set of admissible rank vectors in the Tucker format. For such rank vectors \({\mathsf {r}} \in {\mathcal {R}}\) we introduce the notation
$$\begin{aligned} |{\mathsf {r}}|_\infty := \max _{j=1,\ldots ,m} \,{r}_j. \end{aligned}$$
Given any \({\mathsf {r}} \in \mathcal {R}\), we can then define the set
$$\begin{aligned} {\mathcal {T}}({\mathsf {r}}) := \bigl \{ \mathbf {u}\in \mathrm{\ell }_{2}(\nabla ^d) :{{\mathrm{rank}}}_i(\mathbf {u}) \le {r}_i ,\, i=1,\ldots ,m\bigr \}, \end{aligned}$$
(17)
of those sequences whose multilinear rank is bounded componentwise by \({\mathsf {r}}\). It is easy to see that the elements of \({\mathcal {T}}({\mathsf {r}})\) possess a representation of the form (13). Specifically, for any system of orthonormal mode frames \(\mathbb {V}= \bigl (\mathbf {V}^{(i)}\bigr )_{i=1}^m\) with \(r_i\) columns (where \(r_i\) could be infinity), the \(\mathbb {V}\)-rigid Tucker class
$$\begin{aligned} {\mathcal {T}}(\mathbb {V},\mathsf {r}):= \{ \mathrm{P }_{\mathbb {V}} \mathbf{v}: \mathbf{v}\in \ell _2(\nabla ^d)\} \end{aligned}$$
(18)
is contained in \({\mathcal {T}}({\mathsf {r}})\).
The actual computational complexity of the elements of \({\mathcal {T}}({\mathsf {r}})\) can be quantified by
$$\begin{aligned} {{\mathrm{supp}}}_i (\mathbf {u}) := \bigcup _{z \in {{\mathrm{range}}}T^{(i)}_\mathbf{u}} {{\mathrm{supp}}}z. \end{aligned}$$
(19)
It is not hard to see that these quantities are controlled by the “joint support” of the \(i\)th mode frame, that is, \( {{\mathrm{supp}}}_i (\mathbf {u}) \subseteq \bigcup _{k\le r_i}{{\mathrm{supp}}}\,\mathbf{U}^{(i)}_k\). Note that if \(\#{{\mathrm{supp}}}_i(\mathbf{u}) < \infty \), then one necessarily also has \({{\mathrm{rank}}}_i(\mathbf{u}) <\infty \).
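In the finitely supported case, the matricization (14) and the multilinear rank (15) can be realized directly by reshaping; the following minimal sketch (0-based modes, illustrative sizes) makes this explicit.

```python
import numpy as np

def matricization(u, i):
    # Mode-i matricization of an order-m array (finitely supported case of (14)):
    # mode i becomes the row index, the remaining modes are flattened into columns.
    perm = (i,) + tuple(j for j in range(u.ndim) if j != i)
    return np.transpose(u, perm).reshape(u.shape[i], -1)

def multilinear_rank(u):
    # rank_i(u) = rank of the mode-i matricization, cf. (15)
    return tuple(np.linalg.matrix_rank(matricization(u, i)) for i in range(u.ndim))

rng = np.random.default_rng(3)
g = [rng.standard_normal(5) for _ in range(3)]
x = np.einsum('i,j,k->ijk', *g)                       # a rank-one tensor
print(multilinear_rank(x))                            # (1, 1, 1)
y = x + np.einsum('i,j,k->ijk', *(rng.standard_normal(5) for _ in range(3)))
print(multilinear_rank(y))                            # typically (2, 2, 2)
```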

The following result, which can be found, for example, in [14, 20, 39], ensures the existence of best approximations in \( {\mathcal {T}}({\mathsf {r}})\) also for infinite ranks.

Theorem 1

Let \(\mathbf {u}\!\in \! \mathrm{\ell }_{2}(\nabla ^d)\) and \(0 \le {r}_i \!\le \! {{\mathrm{rank}}}_i(\mathbf {u})\); then there exists \(\mathbf {v} \!\in \! {\mathcal {T}}({\mathsf {r}})\) such that
$$\begin{aligned} ||\mathbf {u} - \mathbf {v}|| = \min _{{{\mathrm{rank}}}(\mathbf {w})\le {\mathsf {r}}} ||\mathbf {u} - \mathbf {w}||. \end{aligned}$$
The matricization \(T^{(i)}_\mathbf{u}\) of a given \(\mathbf{u}\in \ell _2(\nabla ^d)\), defined in (14), allows one to invoke the SVD or Hilbert–Schmidt decomposition. By the spectral theorem, for each \(i\) there exist a nonnegative real sequence \((\sigma ^{(i)}_n)_{n\in \mathrm{I}\!\mathrm{N}}\), where \(\sigma ^{(i)}_n\) are the eigenvalues of \(\bigl ((T^{(i)}_{\mathbf{u}})^* T^{(i)}_{\mathbf{u}}\bigr )^{1/2}\), and orthonormal bases \(\mathbf{U}^{(i)}=\{ \mathbf {U}^{(i)}_n\}_{n\in \mathrm{I}\!\mathrm{N}}\) for a subspace of \(\mathrm{\ell }_{2}(\nabla ^{d_i})\) and \(\{ \mathbf {V}^{(i)}_n\}_{n\in \mathrm{I}\!\mathrm{N}}\) for \(\mathrm{\ell }_{2}(\nabla ^{d-d_i})\) [again tacitly assuming that \(\mathbf{U}^{(i)}_n=\mathbf{V}^{(i)}_n =0\) for \(n> \dim \,\mathrm{range}(T_\mathbf{u}^{(i)})\)] such that
$$\begin{aligned} T^{(i)}_{\mathbf{u}} = \sum _{n\in \mathrm{I}\!\mathrm{N}} \sigma ^{(i)}_n \langle \mathbf {V}^{(i)}_n, \cdot \rangle \mathbf {U}^{(i)}_n. \end{aligned}$$
(20)
The \(\sigma ^{(i)}_k\) are referred to as mode-\(i\) singular values.
To simplify notation in a summary of the properties of the particular orthonormal mode frames \(\mathbf{U}^{(i)}\), \(i=1,\ldots ,m\), defined by (20), we define, for any vector \({\mathsf {x}} = (x_i)_{i=1,\ldots ,m}\) and for \(i\in \{{1},\ldots ,{m}\}\),
$$\begin{aligned} \check{{\mathsf {x}}}_i&:= (x_1, \ldots , x_{i-1}, x_{i+1},\ldots ,x_m), \nonumber \\ \check{{\mathsf {x}}}_i|_y&:= (x_1, \ldots , x_{i-1}, y, x_{i+1},\ldots ,x_m) \end{aligned}$$
(21)
to refer to the corresponding vector with entry \(i\) deleted or entry \(i\) replaced by \(y\), respectively. We shall also need the auxiliary quantities
$$\begin{aligned} a^{(i)}_{pq} := \sum _{\check{{\mathsf {k}}}_i\in \mathsf {K}_{m-1}(\check{{\mathsf {r}}}_i)}a_{\check{{\mathsf {k}}}_i|_p}a_{\check{{\mathsf {k}}}_i|_q}, \end{aligned}$$
(22)
derived from the core tensor, where \(i\in \{1,\ldots ,m\}\) and \(p,q\in \{{1},\ldots ,{r_i}\}\).

2.1.2 Higher-Order Singular Value Decomposition

The representation (20) is the main building block of the higher-order singular value decomposition (HOSVD) [27] for the Tucker tensor format (13). In the following theorem, we summarize its properties in the more general case of infinite-dimensional sequence spaces, where the SVD is replaced by the spectral theorem for compact operators. These facts could also be extracted from the treatment in [20, Sect. 8.3].

Theorem 2

For any \(\mathbf {u}\in \mathrm{\ell }_{2}(\nabla ^d)\) the orthonormal mode frames \(\{ \mathbf {U}^{(i)}_k\}_{k\in \mathrm{I}\!\mathrm{N}}\), \(i=1,\ldots ,m\), with \(\mathbf {U}^{(i)}_k\in \mathrm{\ell }_{2}(\nabla ^{d_i})\), defined by (20), and the corresponding core tensor \(\mathbf {a}\) with entries \(a_{\mathsf {k}} = \langle \mathbf{u},\mathbb {U}_{\mathsf {k}}\rangle \), have the following properties:
  1. (i)

    For all \(i\in \{1,\ldots ,m\}\) we have \((\sigma ^{(i)}_k)_{k\in \mathrm{I}\!\mathrm{N}}\in \mathrm{\ell }_{2}(\mathrm{I}\!\mathrm{N})\), and \(\sigma ^{(i)}_k \ge \sigma ^{(i)}_{k+1}\ge 0\) for all \(k\in \mathrm{I}\!\mathrm{N}\), where \(\sigma ^{(i)}_k\) are the mode-\(i\) singular values in (20).

     
  2. (ii)

    For all \(i\in \{1,\ldots ,m\}\) and all \(p,q\in \mathrm{I}\!\mathrm{N}\), we have \(a^{(i)}_{pq} = \bigl |\sigma ^{(i)}_p\bigr |^2 \delta _{pq}\), where the \(a^{(i)}_{pq}\) are defined by (22).

     
  3. (iii)
    For each \({\mathsf {r}} \in \mathrm{I}\!\mathrm{N}^m_0\), we have
    $$\begin{aligned} \Bigg \Vert {\mathbf {u} - \sum _{{\mathsf {k}}\in {\mathsf {K}_m}({\mathsf {r}})} a_{\mathsf {k}} \mathbb {U}_{\mathsf {k}} }\Bigg \Vert \le \Big (\sum _{i=1}^m \sum _{k = r_i + 1}^{\infty } |\sigma ^{(i)}_k|^2\Big )^{\frac{1}{2}} \le \sqrt{m} \inf _{{{\mathrm{rank}}}(\mathbf {w})\le {\mathsf {r}}} ||\mathbf {u} - \mathbf {w}||. \end{aligned}$$
    (23)
     
If in addition \({{\mathrm{supp}}}\mathbf {u} \subseteq \varLambda _1\times \cdots \times \varLambda _m \subset \nabla ^d\) for finite \(\varLambda _i\subset \nabla ^{d_i}\), then \({{\mathrm{supp}}}\mathbf {U}^{(i)}_k \subseteq \varLambda _i\), and we have \({{\mathrm{supp}}}\mathbf {a} \subseteq {\mathsf {K}_m}({\bar{{\mathsf {r}}}})\), with \({\bar{{\mathsf {r}}}}\in \mathrm{I}\!\mathrm{N}_0^m\) satisfying \({\bar{r}}_i \le \#\varLambda _i\) for \(i=1,\ldots ,m\).

Proof

The representation (20) converges in the Hilbert–Schmidt norm, and as a consequence we have
$$\begin{aligned} \mathbf {u} = \Big (\, \sum _{n\in \mathrm{I}\!\mathrm{N}} \sigma ^{(i)}_n \mathbf {U}^{(i)}_{\nu _i,n} \mathbf {V}^{(i)}_{\check{\nu }_i,n} \Big )_{\nu \in \nabla ^d}, \quad i = 1,{\ldots },m, \end{aligned}$$
(24)
with convergence in \(\mathrm{\ell }_{2}(\nabla ^d)\). Furthermore, \(\{\mathbb {U}_{\mathsf {n}}\}_{{\mathsf {n}}\in \mathrm{I}\!\mathrm{N}^m}\) with \(\mathbb {U}_{\mathsf {n}} := \bigotimes _{j=1}^m \mathbf {U}^{(j)}_{n_j}\) is an orthonormal system in \(\mathrm{\ell }_{2}(\nabla ^d)\) [spanning a strict subspace of \(\mathrm{\ell }_{2}(\nabla ^d)\) when \(|{{\mathrm{rank}}}(\mathbf{u})|_\infty <\infty \)]. For \(a_{\mathsf {n}} = \langle \mathbf {u}, \mathbb {U}_{\mathsf {n}}\rangle \) we have thus shown \(\mathbf {a} = (a_\mathsf {n})_\mathsf {n}\in \mathrm{\ell }_{2}(\mathrm{I}\!\mathrm{N}^m)\) and \(\mathbf {u} = \sum _{{\mathsf {n}}\in \mathrm{I}\!\mathrm{N}^m} a_{\mathsf {n}} \mathbb {U}_{\mathsf {n}}\). The further properties of the expansion can now be obtained along the lines of [27]; see also [2, 20]. \(\square \)
In what follows we shall denote by
$$\begin{aligned} \mathbb {U}(\mathbf{u}) = \mathbb {U}^{{\mathcal {T}}}(\mathbf{u}) := \{\mathbf{U}^{(i)}: i=1,\ldots ,m, \, \hbox {generated by HOSVD}\} \end{aligned}$$
(25)
the particular system of orthonormal mode frames generated for a given \(\mathbf{u}\) by the HOSVD. It will occasionally be important to identify the specific tensor format to which a given system of mode frames refers, for which we use a corresponding superscript, such as in \(\mathbb {U}^{{\mathcal {T}}}\) for the Tucker format.

Property (iii) in Theorem 2 leads to a simple procedure for truncation to lower multilinear ranks with an explicit error estimate in terms of the mode-\(i\) singular values. In this manner, one does not necessarily obtain the best approximation for prescribed rank, but the approximation is quasi-optimal in the sense that the error is at most by a factor \(\sqrt{m}\) larger than the error of best approximation with the same multilinear rank.

We now introduce the notation
$$\begin{aligned} {\lambda }_{{\tilde{{{\mathsf {r}}}}}}(\mathbf{u}) = {\lambda }^{\mathcal {T}}_{{\tilde{{\mathsf {r}}}}}(\mathbf{u}) := \Big ( \sum _{i = 1}^m \sum _{k = \,\tilde{r}_{i}+ 1}^{{{\mathrm{rank}}}_i(\mathbf {u})} \bigl |\sigma ^{(i)}_{k}\bigr |^2 \Big )^{\frac{1}{2}}, \quad {\tilde{{\mathsf {r}}}} \in \mathrm{I}\!\mathrm{N}_0^m. \end{aligned}$$
(26)
This quantity plays the role of a computable error estimate, as made explicit in the following direct consequence of Theorem 2.

Corollary 1

For a HOSVD of \(\mathbf {u} \in \mathrm{\ell }_{2}(\nabla ^d)\), as in Theorem 2, and for \({\tilde{{\mathsf {r}}}}\) with \(0\le \tilde{r}_i \le {{\mathrm{rank}}}_i(\mathbf {u})\), we have
$$\begin{aligned} \Vert \mathbf {u} - \mathrm{P }_{\mathbb {U}(\mathbf{u}),{{\tilde{{\mathsf {r}}}}}}\mathbf{u}\Vert \le {\lambda }^{\mathcal {T}}_{{{\tilde{{\mathsf {r}}}}}} (\mathbf{u}) \le \sqrt{m} \inf _{\mathbf {w}\in {\mathcal {T}}({\tilde{{\mathsf {r}}}})} \Vert \mathbf {u} - \mathbf {w}\Vert , \end{aligned}$$
where \(\mathrm{P }_{\mathbb {U}(\mathbf{u}),{{\tilde{{\mathsf {r}}}}}}\) is defined in (12).
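For a finitely supported order-3 tensor, the HOSVD of Theorem 2, the truncation of Corollary 1, and the error estimator (26) can be put together in a few lines; the sketch below (illustrative sizes and data) also checks the bound \(\Vert \mathbf{u}-\mathrm{P}_{\mathbb{U}(\mathbf{u}),\tilde{\mathsf{r}}}\mathbf{u}\Vert \le \lambda ^{\mathcal {T}}_{\tilde{\mathsf{r}}}(\mathbf{u})\).

```python
import numpy as np

def hosvd(u):
    # HOSVD of a finitely supported order-3 tensor: mode frames from the left
    # singular vectors of the mode-i matricizations, core a_k = <u, U_k> as in (11).
    frames, sigmas = [], []
    for i in range(u.ndim):
        perm = (i,) + tuple(j for j in range(u.ndim) if j != i)
        mat = np.transpose(u, perm).reshape(u.shape[i], -1)
        U, s, _ = np.linalg.svd(mat, full_matrices=False)
        frames.append(U)
        sigmas.append(s)                                  # mode-i singular values
    core = np.einsum('ijk,ia,jb,kc->abc', u, *frames)
    return core, frames, sigmas

def truncate(core, frames, r):
    # box truncation (12) to multilinear rank r
    U1, U2, U3 = (U[:, :ri] for U, ri in zip(frames, r))
    return np.einsum('abc,ia,jb,kc->ijk', core[:r[0], :r[1], :r[2]], U1, U2, U3)

rng = np.random.default_rng(4)
u = rng.standard_normal((8, 9, 10))
core, frames, sigmas = hosvd(u)
r = (3, 3, 3)
err = np.linalg.norm(u - truncate(core, frames, r))
lam = np.sqrt(sum(np.sum(s[ri:] ** 2) for s, ri in zip(sigmas, r)))   # estimator (26)
print(err <= lam + 1e-10, f"error {err:.4f} <= lambda {lam:.4f}")
```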

While projections to subspaces spanned by the \(\mathbb {U}_{\mathsf {k}}(\mathbf{u})\), \({\mathsf {k}}\in {\mathsf {K}_m}({\mathsf {r}})\), do not, in general, realize the best approximation from \({\mathcal {T}}({\mathsf {r}})\) [only from \({\mathcal {T}}(\mathbb {U}(\mathbf{u}),{\mathsf {r}})\)], exact best approximations are still orthogonal projections based on suitable mode frames.

Corollary 2

For \(\mathbf{u}\in \mathrm{\ell }_{2}(\nabla ^d)\) and \({\mathsf {r}} = (r_i)_{i=1}^m\in \mathrm{I}\!\mathrm{N}_0^m\) with \(0\le r_i\le {{\mathrm{rank}}}_i(\mathbf{u})\), \(i=1,\ldots ,m\), there exists an orthonormal mode frame system \(\bar{\mathbb {U}}(\mathbf{u}, {\mathsf {r}})\) such that
$$\begin{aligned} ||\mathbf{u}- \mathrm{P }_{\bar{\mathbb {U}}(\mathbf{u}, {\mathsf {r}})} \mathbf{u}|| = \min _{\mathbf{w}\in {\mathcal {T}}{({\mathsf {r}})}}||\mathbf{u}- \mathbf{w}||, \end{aligned}$$
with \(\mathrm{P }_{\bar{\mathbb {U}}(\mathbf{u}, {\mathsf {r}})}\) given by (11).

Proof

By Theorem 1, a best approximation of ranks \({\mathsf {r}}\) for \(\mathbf{u}\),
$$\begin{aligned} \bar{\mathbf{u}}\in \mathrm{arg\,min}\{ ||\mathbf{u}- \mathbf{v}|| :{{\mathrm{rank}}}_i (\mathbf{v}) \le r_i ,\, i=1,\ldots ,m\}, \end{aligned}$$
exists. Defining \(\bar{\mathbb {U}}(\mathbf{u}, {\mathsf {r}}):= \mathbb {U}(\bar{\mathbf{u}})\) as the orthonormal mode frame system for \(\bar{\mathbf{u}}\), given by the HOSVD, we obtain the assertion. \(\square \)

Remark 1

Suppose that for a finitely supported vector \(\mathbf {u}\) on \(\nabla ^d\) we have a possibly redundant representation
$$\begin{aligned} \mathbf {u} = \sum _{{\mathsf {k}}\in {\mathsf {K}_m}({\tilde{{\mathsf {r}}}})} \tilde{a}_{\mathsf {k}} \bigotimes _{i=1}^m {{\tilde{\mathbf{U}}}}^{(i)}_{k_i}, \end{aligned}$$
where the vectors \({\tilde{\mathbf{U}}}^{(i)}_{k}\), \(k=1,\ldots ,\tilde{r}_i\), may be linearly dependent. Then, by standard linear algebra procedures, we can obtain a HOSVD of \(\mathbf {u}\) with a number of arithmetic operations that can be estimated by
$$\begin{aligned} C m |{\tilde{{\mathsf {r}}}}|_\infty ^{m+1} + C |\tilde{{\mathsf {r}}}|_\infty ^2 \sum _{i=1}^m \#{{\mathrm{supp}}}_i(\mathbf {u}), \end{aligned}$$
(27)
where \(C>0\) is an absolute constant (e.g., [20]).
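A sketch of the first step behind Remark 1 for \(m=3\): the possibly redundant mode frames are orthonormalized by reduced QR factorizations, and the triangular factors are absorbed into the core tensor, after which the HOSVD of the full tensor can be obtained from an HOSVD of the (small) updated core combined with the Q factors. All names and sizes below are illustrative.

```python
import numpy as np

def orthonormalize_representation(core, frames):
    # Reduced QR of each mode frame; the R factors are absorbed into the core,
    # leaving an equivalent representation (13) with orthonormal mode frames.
    Qs, Rs = zip(*(np.linalg.qr(U) for U in frames))
    return np.einsum('abc,da,eb,fc->def', core, *Rs), list(Qs)

rng = np.random.default_rng(5)
n, rt = 30, 5
frames = [rng.standard_normal((n, rt)) for _ in range(3)]   # possibly redundant columns
core = rng.standard_normal((rt, rt, rt))
u = np.einsum('abc,ia,jb,kc->ijk', core, *frames)

core2, Qs = orthonormalize_representation(core, frames)
u2 = np.einsum('abc,ia,jb,kc->ijk', core2, *Qs)
print(np.allclose(u, u2))   # the tensor is unchanged, now with orthonormal mode frames
```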

2.2 Hierarchical Tucker Format

The Tucker format as it stands, in general, still gives rise to an increase of degrees of freedom that is exponential in \(d\). One way to mitigate the curse of dimensionality is to further decompose the core tensor \(\mathbf{a}\) in (13). We now briefly formulate the relevant notions concerning the hierarchical Tucker format in the present sequence space context, following essentially the developments in [17, 22]; see also [20].

2.2.1 Dimension Trees

Definition 1

Let \(m\in \mathrm{I}\!\mathrm{N}\), \(m\ge 2\). A set \(\mathcal {D}_{m} \subset 2^{\{ 1,\ldots ,m\}}\) is called a (binary) dimension tree if the following hold:
  1. (i)

    \(\{1,\ldots ,m\} \in \mathcal {D}_{m}\), and for each \(i\in \{1,\ldots ,m\}\) we have \(\{i\} \in \mathcal {D}_{m}\).

     
  2. (ii)

    Each \(\alpha \in \mathcal {D}_{m}\) is either a singleton or there exist unique disjoint \(\alpha _1, \alpha _2 \in \mathcal {D}_{m}\), called children of \(\alpha \), such that \(\alpha = \alpha _1 \cup \alpha _2\).

     
Singletons \(\{ i\} \in \mathcal {D}_{m}\) are referred to as leaves,
$$\begin{aligned} {0_{m}} := \{1,\ldots ,m\} \end{aligned}$$
as root, and elements of \({\mathcal {I}}(\mathcal {D}_{m}) := \mathcal {D}_{m}\setminus \bigl \{{0_{m}} ,\{1\},\ldots ,\{m\} \bigr \}\) as interior nodes. The set of leaves is denoted by \(\mathcal{L}(\mathcal {D}_{m})\), where we additionally set \(\mathcal{N}(\mathcal {D}_{m}) := \mathcal {D}_{m}\setminus \mathcal {L}(\mathcal {D}_m) = {\mathcal {I}}(\mathcal {D}_{m})\cup \{ {0_{m}}\}\). When an enumeration of \(\mathcal{L}(\mathcal {D}_{m})\) is required, we shall always assume the ascending order with respect to the indices, i.e., in the form \(\{\{1\},\ldots , \{m\}\}\).
It will be convenient to introduce the two functions
$$\begin{aligned} {\mathrm{c}_{i}}: \mathcal{D}_m\setminus \mathcal{L}(\mathcal{D}_m)\rightarrow \mathcal{D}_m\setminus \{{0_{m}}\},\quad {\mathrm{c}_{i}}(\alpha ):= \alpha _i,\quad i=1,2, \end{aligned}$$
producing the “left” and “right” children of a nonleaf node \(\alpha {\in \mathcal {N}(\mathcal{D}_m)}\), which, in view of Definition 1, are well defined up to their order, which we fix by the condition \(\min \alpha _1 < \min \alpha _2\).

Note that for a binary dimension tree as defined earlier, \(\# \mathcal {D}_{m} = 2m-1\) and \(\#\mathcal{N}(\mathcal {D}_{m}) = m-1\).
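A balanced binary dimension tree in the sense of Definition 1 can be encoded, for instance, as a dictionary mapping each node \(\alpha \) to its pair of children; the following small sketch (a possible encoding, not prescribed by the text) builds the tree for \(m=4\) used in Example 1 below and checks \(\#\mathcal {D}_m = 2m-1\).

```python
def dimension_tree(modes):
    # Balanced binary dimension tree: nodes are tuples of mode indices, leaves map
    # to None, nonleaf nodes map to their children (ordered so that min a1 < min a2).
    alpha = tuple(modes)
    if len(alpha) == 1:
        return {alpha: None}
    c1, c2 = alpha[:len(alpha) // 2], alpha[len(alpha) // 2:]
    tree = {alpha: (c1, c2)}
    tree.update(dimension_tree(c1))
    tree.update(dimension_tree(c2))
    return tree

tree = dimension_tree(range(1, 5))                      # m = 4
leaves = [a for a, ch in tree.items() if ch is None]
interior = [a for a, ch in tree.items() if ch is not None and len(a) < 4]
print(sorted(tree, key=len, reverse=True))              # root, interior nodes, leaves
print(len(tree) == 2 * 4 - 1, len(leaves) == 4, interior == [(1, 2), (3, 4)])
```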

Remark 2

The restriction to binary trees in Definition 1 is not necessary, but it leads to the most favorable complexity estimates for algorithms operating on the resulting tensor format. With this restriction dropped, the Tucker format (13) can be treated in the same framework, with the \(m\)-ary dimension tree consisting only of root and leaves, i.e., \(\bigl \{ {0_{m}}, \{1\},\ldots ,\{m\} \bigr \}\). In principle, all subsequent results carry over to more general dimension trees (see [15, Sect. 5.2]).

Definition 2

We shall refer to a family
$$\begin{aligned} \mathbb {U}= \bigl \{ \mathbf{U}^{(\alpha )}_k \in \mathrm{\ell }_{2}(\nabla ^{\sum _{j\in \alpha } d_j}) \,:\, \alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\} ,\; k=1,\ldots ,k_\alpha \bigr \}, \end{aligned}$$
with \(k_\alpha \in \mathrm{I}\!\mathrm{N}\cup \{\infty \}\) for each \(\alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\}\), as hierarchical mode frames. In addition, these are called orthonormal if for all \(\alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\}\) we have \(\langle \mathbf{U}^{(\alpha )}_i, \mathbf{U}^{(\alpha )}_j\rangle = \delta _{ij}\) for \(i,j=1,\ldots ,k_\alpha \), and nested if
$$\begin{aligned}&\overline{{{\mathrm{span}}}}\{ \mathbf{U}^{(\alpha )}_k:k=1,\ldots ,k_\alpha \}\subseteq \overline{{{\mathrm{span}}}}\{ \mathbf{U}^{({\mathrm{c}_1}(\alpha ))}_k:k=1,\ldots , k_{{\mathrm{c}_1}(\alpha )} \} \\&\quad \otimes \; \overline{{{\mathrm{span}}}}\{ \mathbf{U}^{({\mathrm{c}_2}(\alpha ))}_k:k=1,\ldots ,k_{{\mathrm{c}_2}{(\alpha )}} \}. \end{aligned}$$
As for the Tucker format, we set \(\mathbf {U}^{(i)} := \mathbf {U}^{(\{i\})}\), and for \({\mathsf {k}}\in \mathrm{I}\!\mathrm{N}^m\) we retain the notation
$$\begin{aligned} \mathbb {U}_{\mathsf {k}} := \bigotimes _{i=1}^m \mathbf{U}^{(i)}_{k_i} \,. \end{aligned}$$

Again, to express that \(\mathbb {U}\) is associated with the hierarchical format, we sometimes write \(\mathbb {U}^{{\mathcal {H}}}\). Of course, \(\mathbb {U}^{{\mathcal {H}}}\) depends on the dimension tree \(\mathcal{D}_m\), which will be kept fixed in what follows.

To define hierarchical tensor classes and to construct specific \(\mathbf{u}\)-dependent hierarchical mode frames, one can proceed as for the Tucker format. Let \(\mathcal {D}_{m}\) be a dimension tree, let \(\alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\}\), and set \(\beta := \{1,\ldots ,m\}\setminus \alpha \). For \(\mathbf {u} \in \mathrm{\ell }_{2}(\nabla ^d)\) we define the Hilbert–Schmidt operator
$$\begin{aligned} T^{(\alpha )}_\mathbf{u}:\mathrm{\ell }_{2}(\nabla ^{\sum _{i\in \beta } d_i}) \rightarrow \mathrm{\ell }_{2}(\nabla ^{\sum _{i\in \alpha } d_i}),\; \mathbf {c} \mapsto \Big ( \sum _{(\nu _i)_{i\in \beta }} u_{\nu } c_{(\nu _i)_{i\in \beta }} \Big )_{(\nu _i)_{i\in \alpha }}, \end{aligned}$$
(28)
and set
$$\begin{aligned} \mathrm{rank}_\alpha (\mathbf{u}) := \dim {{\mathrm{range}}}T^{(\alpha )}_\mathbf{u},\quad \alpha \in \mathcal{D}_m\setminus \{{0_{m}}\}. \end{aligned}$$
To be consistent with our previous notation for leaf nodes \(\{i\} \in \mathcal {D}_{m}\), we use the abbreviation \(\mathrm{rank}_i(\mathbf {u}) := \mathrm{rank}_{\{i\}} (\mathbf {u})\). Again, \(\mathrm{rank}_\alpha (\mathbf{u})\) can be infinite. The root element of the dimension tree, \({0_{m}}=\{1,\ldots ,m\}\in \mathcal {D}_{m}\), is a special case. Here we define
$$\begin{aligned} T^{({0_{m}})}_\mathbf{u}:\mathbb {R}\rightarrow \mathrm{\ell }_{2}(\nabla ^d) ,\quad t \mapsto t \,\mathbf{u}\end{aligned}$$
and correspondingly set
$$\begin{aligned} \mathrm{rank}_{{0_{m}}}{(\mathbf{u})} := 1, \quad \mathbf {U}^{({0_{m}})}_1 := \mathbf {u}, \quad \mathbf {U}^{({0_{m}})}_k := 0\,,\; k>1, \end{aligned}$$
if \(\mathbf{u}\ne 0\), and otherwise \(\mathrm{rank}_{{0_{m}}}(\mathbf{u}) :=0\). To be consistent with the Tucker format, we denote by
$$\begin{aligned} {{\mathrm{rank}}}(\mathbf{u}) = {{{\mathrm{rank}}}_{\mathcal {D}_{m}}(\mathbf{u})} := ({{\mathrm{rank}}}_\alpha (\mathbf{u}) )_{\alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\}}\, \end{aligned}$$
the hierarchical rank vector associated with \(\mathbf{u}\). Since in what follows the dimension tree \(\mathcal {D}_{m}\) will be kept fixed we suppress the corresponding subscript in the rank vector.
This allows us to define for a given \({\mathsf {r}} = ({r}_\alpha )_{\alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\}} \in (\mathrm{I}\!\mathrm{N}_0\cup \{\infty \})^{\mathcal {D}_{m}\setminus \{{0_{m}}\}}\), in analogy to (17), the class
$$\begin{aligned} {\mathcal {H}}({\mathsf {r}}) := \bigl \{ \mathbf {u} \in \ell _{2}(\nabla ^d) :{{\mathrm{rank}}}_\alpha (\mathbf {u}) \le {r}_\alpha \; \hbox {for all}\;\alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\} \bigr \} \,. \end{aligned}$$
(29)
For \({\mathcal {H}}({\mathsf {r}})\) to be nonempty the rank vectors must satisfy certain compatibility conditions; see Proposition 1 below. As detailed later, the elements of \({\mathcal {H}}({\mathsf {r}})\) can be represented in terms of hierarchical mode frames in the so-called hierarchical format with ranks \({\mathsf {r}}\).
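In the finitely supported case, the \(\alpha \)-matricization (28) and the hierarchical ranks can again be computed by reshaping. A small sketch (0-based modes, illustrative tensor) evaluating \({{\mathrm{rank}}}_\alpha \) for the nodes \(\alpha \ne {0_{m}}\) of the balanced tree with \(m=4\):

```python
import numpy as np

def rank_alpha(u, alpha):
    # Rank of the alpha-matricization (28): the modes in alpha are grouped into
    # the row index, the complementary modes beta into the column index.
    beta = tuple(j for j in range(u.ndim) if j not in alpha)
    mat = np.transpose(u, alpha + beta).reshape(
        int(np.prod([u.shape[j] for j in alpha])), -1)
    return np.linalg.matrix_rank(mat)

rng = np.random.default_rng(6)
u = rng.standard_normal((4, 4, 4, 4))
for alpha in [(0,), (1,), (2,), (3,), (0, 1), (2, 3)]:    # nodes of the balanced tree (0-based)
    print(alpha, rank_alpha(u, alpha))                    # generically 4, 4, 4, 4, 16, 16
```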
Now, for a given \(\mathbf{u}\in \ell _2(\nabla ^d)\), let \(\{ \mathbf {U}^{(\alpha )}_k \}_{k=1}^{\mathrm{rank}_\alpha (\mathbf{u})}\), \(\mathbf {U}^{(\alpha )}_k\in \mathrm{\ell }_{2}(\nabla ^{\sum _{i\in \alpha } d_i})\) be the left singular vectors and \(\sigma ^{(\alpha )}_k\) the singular values of \(T^{(\alpha )}_\mathbf{u}\). In analogy to the Tucker format, we denote by
$$\begin{aligned} \mathbb {U}(\mathbf{u})= \mathbb {U}^{{\mathcal {H}}}(\mathbf{u}):= \big \{\{\mathbf {U}^{(\alpha )}_k \}_{k=1}^{\mathrm{rank}_\alpha (\mathbf {u})}:\alpha \in \mathcal{D}_m \big \} \end{aligned}$$
(30)
the system of orthonormal hierarchical mode frames with rank vectors \({{\mathrm{rank}}}(\mathbf{u})\).

The observation that the specific systems of hierarchical mode frames \(\mathbb {U}(\mathbf{u})\) have the following nestedness property, including the root element, will be crucial. The following fact was established in a more generally applicable framework of minimal subspaces in [20] (cf. Corollary 6.18 and Theorem 6.31 there).

Proposition 1

For \(\mathbf{u}\in \mathrm{\ell }_{2}(\nabla ^d)\) and \(\alpha \in \mathcal{N}(\mathcal {D}_{m})\), the mode frames \(\{ \mathbf {U}^{(\alpha )}_k \}\) given by the left singular vectors of the operators \(T^{(\alpha )}_\mathbf{u}\) defined in (28) satisfy
$$\begin{aligned}&\overline{{{\mathrm{span}}}}\{ \mathbf{U}^{(\alpha )}_k:k=1,\ldots ,\mathrm{rank}_\alpha (\mathbf{u}) \} \subseteq \overline{{{\mathrm{span}}}}\{ \mathbf{U}^{({\mathrm{c}_1}(\alpha ))}_k:k=1,\ldots ,\mathrm{rank}_{{\mathrm{c}_1}(\alpha )}(\mathbf{u}) \} \\&\quad \otimes \, \overline{{{\mathrm{span}}}}\{ \mathbf{U}^{({\mathrm{c}_2}(\alpha ))}_k:k=1,\ldots ,\mathrm{rank}_{{\mathrm{c}_2}(\alpha )}(\mathbf{u}) \} \,, \end{aligned}$$
i.e., the family of left singular vectors of the operators \(T^{(\alpha )}_\mathbf{u}\) is comprised of orthonormal and nested mode frames for \(\mathbf{u}\).
Nestedness entails compatibility conditions on the rank vectors \({\mathsf {r}}\). In fact, it readily follows from Proposition 1 that for \(\alpha \in \mathcal{D}_m\setminus \mathcal{L}(\mathcal{D}_m)\) one has \({{\mathrm{rank}}}_\alpha (\mathbf{u})\le {{\mathrm{rank}}}_{c_1(\alpha )}(\mathbf{u}) {{\mathrm{rank}}}_{c_2(\alpha )}(\mathbf{u})\). For necessary and sufficient conditions on a rank vector \({\mathsf {r}}=(r_\alpha )_{\alpha \in \mathcal{D}_m\setminus \{{0_{m}}\}}\) for the existence of corresponding nested hierarchical mode frames, we refer to [20, Sect. 11.2.3]. In what follows we denote by
$$\begin{aligned} \mathcal {R}=\mathcal {R}_{{\mathcal {H}}}\subset (\mathrm{I}\!\mathrm{N}_0 \cup \{\infty \})^{\mathcal{D}_m\setminus \{{0_{m}}\}} \end{aligned}$$
(31)
the set of all hierarchical rank vectors satisfying the compatibility conditions for nestedness.

Following [14, 20], we can now formulate the analog to Theorem 1.

Theorem 3

Let \(\mathbf {u}\in \mathrm{\ell }_{2}(\nabla ^d)\), let \(\mathcal {D}_{m}\) be a dimension tree, and let \({\mathsf {r}} = (r_\alpha )\in \mathcal {R}_{\mathcal {H}}\) with \(0 \le r_\alpha \le {{\mathrm{rank}}}_\alpha (\mathbf {u})\) for \(\alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\}\); then there exists \(\mathbf {v} \in {\mathcal {H}}({\mathsf {r}})\) such that
$$\begin{aligned} ||\mathbf {u} - \mathbf {v}|| = \min \bigl \{ ||\mathbf {u} - \mathbf {w}|| :{{\mathrm{rank}}}_\alpha (\mathbf {w})\le r_\alpha , \alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\} \bigr \} \,. \end{aligned}$$
We recall next the specific structure of the hierarchical format. Let \(\mathbb {U}\) be a system of hierarchical orthonormal mode frames. By orthonormality and nestedness, we obtain for each \(\alpha \in \mathcal{N}(\mathcal {D}_{m})\) and \(k=1,\ldots ,\mathrm{rank}_\alpha (\mathbf{u})\) the expansion
$$\begin{aligned} \mathbf {U}^{(\alpha )}_k = \sum _{k_1 = 1}^{\mathrm{rank}_{{\mathrm{c}_1}(\alpha )}(\mathbf {u})} \sum _{k_2 = 1}^{\mathrm{rank}_{{\mathrm{c}_2}(\alpha )}(\mathbf {u})} \bigl \langle \mathbf {U}^{(\alpha )}_k , \mathbf {U}^{({\mathrm{c}_1}(\alpha ))}_{k_1} \otimes \mathbf {U}^{({\mathrm{c}_2}(\alpha ))}_{k_2} \bigr \rangle \, \mathbf {U}^{({\mathrm{c}_1}(\alpha ))}_{k_1} \otimes \mathbf {U}^{({\mathrm{c}_2}(\alpha ))}_{k_2}. \end{aligned}$$
(32)
Defining the matrices \(\mathbf {B}^{(\alpha ,k)} \in \mathrm{\ell }_{2}(\mathrm{I}\!\mathrm{N}\times \mathrm{I}\!\mathrm{N})\) with the entries
$$\begin{aligned} B^{(\alpha ,k)}_{k_1,k_2} := \bigl \langle \mathbf {U}^{(\alpha )}_{k}, \mathbf {U}^{({\mathrm{c}_1}(\alpha ))}_{k_1} \otimes \mathbf {U}^{({\mathrm{c}_2}(\alpha ))}_{k_2} \bigr \rangle \,, \end{aligned}$$
(33)
(32) can be rewritten as
$$\begin{aligned} \mathbf {U}^{(\alpha )}_k = \sum _{k_1 = 1}^{\mathrm{rank}_{{\mathrm{c}_1}(\alpha )}(\mathbf {u})} \sum _{k_2 = 1}^{\mathrm{rank}_{{\mathrm{c}_2}(\alpha )}(\mathbf {u})} B^{(\alpha ,k)}_{k_1,k_2} \, \mathbf {U}^{({\mathrm{c}_1}(\alpha ))}_{k_1} \otimes \mathbf {U}^{({\mathrm{c}_2}(\alpha ))}_{k_2}, \end{aligned}$$
(34)
providing a decomposition into vectors \(\mathbf{U}^{{\mathrm{c}_{i}}(\alpha )}_k\), \(i=1,2\), which now involve shorter multiindices supported in the children \({\mathrm{c}_{i}}(\alpha )\). This decomposition can be iterated as illustrated by the next step. Abbreviating \({\mathrm{c}_{i,j}}(\alpha )={\mathrm{c}_{i}}({\mathrm{c}_{j}}(\alpha ))\), one obtains
$$\begin{aligned} \mathbf {U}^{(\alpha )}_k&= \sum _{k_1 = 1}^{\mathrm{rank}_{{\mathrm{c}_1}(\alpha )}(\mathbf {u})} \sum _{k_2 = 1}^{\mathrm{rank}_{{\mathrm{c}_2}(\alpha )}(\mathbf {u})} \sum _{k_{1,1},k_{2,1}} \sum _{k_{1,2},k_{2,2}} B^{(\alpha ,k)}_{k_1,k_2} \, B^{({\mathrm{c}_{1}}(\alpha ),k_1 )}_{k_{1,1},k_{2,1}}\, B^{({\mathrm{c}_{2}}(\alpha ),k_2 )}_{k_{1,2},k_{2,2}} \nonumber \\&\quad \times \mathbf{U}^{({\mathrm{c}_{1,1}}(\alpha ))}_{k_{1,1}}\otimes \mathbf{U}^{({\mathrm{c}_{2,1}}(\alpha ))}_{k_{2,1}}\otimes \mathbf{U}^{({\mathrm{c}_{1,2}}(\alpha ))}_{k_{1,2}}\otimes \mathbf{U}^{({\mathrm{c}_{2,2}}(\alpha ))}_{k_{2,2}}, \quad \quad \end{aligned}$$
where each \(k_{i,j}\) ranges from \(1\) to \(\mathrm{rank}_{{\mathrm{c}_{i,j}}(\alpha )}(\mathbf{u})\).
(35)
Applying this recursively, any \(\mathbf {u} \in \mathrm{\ell }_{2}(\nabla ^d)\) can be expanded in the form
$$\begin{aligned} \mathbf {u} = \sum _{k_1 = 1}^{\mathrm{rank}_1(\mathbf {u})} \cdots \sum _{k_m = 1}^{\mathrm{rank}_m(\mathbf {u})} a_{k_1,\ldots ,k_m} \, \mathbf {U}^{(1)}_{k_1} \otimes \cdots \otimes \mathbf {U}^{(m)}_{k_m} \,, \end{aligned}$$
(36)
where the core tensor \(\mathbf {a}\) has a further decomposition in terms of the matrices \(\mathbf {B}^{(\alpha ,k)}\) for all nonleaf nodes \(\alpha \) and \(k=1,\ldots ,\mathrm{rank}_\alpha (\mathbf {u})\). This decomposition can be given explicitly as follows: for each \((k_\alpha )_{\alpha \in \mathcal {D}_{m}}\) we define the auxiliary expression
$$\begin{aligned} \hat{B}_{(k_\alpha )_{\alpha \in \mathcal {D}_{m}}} := \prod _{\beta \in \mathcal{N}(\mathcal {D}_{m})} B^{(\beta ,k_\beta )}_{(k_{{\mathrm{c}_1}(\beta )},k_{{\mathrm{c}_2}(\beta )})} \,. \end{aligned}$$
We now use this to give an entrywise definition of the tensor \(\mathrm {\Sigma }_{\mathcal {D}_{m}}(\{ \mathbf {B}^{(\alpha ,k)}\})\in \mathrm{\ell }_{2}(\mathrm{I}\!\mathrm{N}^m)\), for each tuple of leaf node indices \((k_\beta )_{\beta \in \mathcal {L}(\mathcal {D}_{m})} \in \mathrm{I}\!\mathrm{N}^{\#\mathcal {L}(\mathcal {D}_{m})}\), as
$$\begin{aligned}&\Bigl ( \mathrm {\Sigma }_{\mathcal {D}_{m}}\bigl (\{ \mathbf {B}^{(\alpha ,k)} :\alpha \in \mathcal{N}(\mathcal {D}_{m}),\, k=1,\ldots ,\mathrm{rank}_\alpha (\mathbf{u})\}\bigr ) \Bigr )_{(k_\beta )_{\beta \in \mathcal {L}(\mathcal {D}_{m})}} \nonumber \\&\quad =\mathop {\sum \limits _{(k_\delta )_{\delta \in {\mathcal {I}}(\mathcal {D}_{m})}}}\limits _{k_\delta =1,\ldots ,\mathrm{rank}_\delta (\mathbf{u})} \hat{B}_{(k_\delta )_{\delta \in \mathcal {D}_{m}}}. \end{aligned}$$
(37)
Note that the quantity on the right-hand side involves a summation over all indices corresponding to nonleaf nodes. Since the summands depend on all indices, this leaves precisely the indices corresponding to leaf nodes as free parameters, as on the left-hand side (recall that the index for the root of the tree is restricted to the value \(1\)). The tensor defined in (37) then equals the core tensor \(\mathbf {a}\), which is thus represented as
$$\begin{aligned} \mathbf {a} = \mathrm {\Sigma }_{\mathcal {D}_{m}}\bigl (\{\mathbf {B}^{(\alpha ,k)} :\alpha \in \mathcal{N}(\mathcal {D}_{m}),\,k=1,\ldots ,\mathrm{rank}_\alpha (\mathbf{u})\}\bigr ) \,. \end{aligned}$$
(38)
This representation is illustrated explicitly for \(m=4\) in Example 1 below.

Example 1

Consider \(m = 4,\, \mathcal {D}_{4} = \bigl \{ \{1,2,3,4\}, \{1,2\}, \{3,4\}, \{1\},\{2\},\{3\},\{4\} \bigr \}\). For this example we use the abbreviation \(r_\alpha := \mathrm{rank}_{\alpha }(\mathbf {u})\) and derive from (35) the expansion
$$\begin{aligned} \mathbf {u}&= \sum _{k_1 = 1}^{r_1} \sum _{k_2 = 1}^{r_2} \sum _{k_3 = 1}^{r_3} \sum _{k_4 = 1}^{r_4} \sum _{k_{\{1,2\}} = 1}^{r_{\{1,2\}}} \sum _{k_{\{3,4\}} = 1}^{r_{\{3,4\}}} B^{(\{1,2,3,4\},1)}_{(k_{\{1,2\}},k_{\{3,4\}})} \\&\times B^{(\{1,2\},k_{\{1,2\}})}_{(k_1,k_2)} \, B^{(\{3,4\},k_{\{3,4\}})}_{(k_3,k_4)} \, \mathbf {U}^{(1)}_{k_1} \otimes \mathbf {U}^{(2)}_{k_2} \otimes \mathbf {U}^{(3)}_{k_3} \otimes \mathbf {U}^{(4)}_{k_4} \,, \end{aligned}$$
that is, for the core tensor we have the decomposition
$$\begin{aligned} a_{k_1,k_2,k_3,k_4} = \sum _{k_{\{1,2\}} = 1}^{r_{\{1,2\}}} \sum _{k_{\{3,4\}} = 1}^{r_{\{3,4\}}} B^{(\{1,2,3,4\},1)}_{(k_{\{1,2\}},k_{\{3,4\}})} B^{(\{1,2\},k_{\{1,2\}})}_{(k_1,k_2)} B^{(\{3,4\},k_{\{3,4\}})}_{(k_3,k_4)} \,. \end{aligned}$$
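To make the index bookkeeping in Example 1 concrete, the following NumPy sketch (purely illustrative; the array names B_root, B_12, B_34, U1, ..., U4 and all sizes are hypothetical stand-ins for finite sections of \(\mathbf {B}^{(\{1,2,3,4\},1)}\), \(\mathbf {B}^{(\{1,2\},k_{\{1,2\}})}\), \(\mathbf {B}^{(\{3,4\},k_{\{3,4\}})}\), and \(\mathbf {U}^{(i)}_{k_i}\)) assembles the core tensor and the full tensor from the two-level decomposition:

```python
import numpy as np

# Hypothetical finite ranks and mode sizes for the balanced tree of Example 1.
r1, r2, r3, r4 = 2, 3, 2, 3          # leaf ranks
r12, r34 = 4, 5                      # ranks of the interior nodes {1,2}, {3,4}
n = 6                                # size of each univariate index set (finite section)

rng = np.random.default_rng(0)
# Transfer tensors: B_12[k12, k1, k2], B_34[k34, k3, k4], B_root[k12, k34].
B_12 = rng.standard_normal((r12, r1, r2))
B_34 = rng.standard_normal((r34, r3, r4))
B_root = rng.standard_normal((r12, r34))
# Mode frames: Ui[nu_i, k_i].
U1, U2, U3, U4 = (rng.standard_normal((n, r)) for r in (r1, r2, r3, r4))

# Core tensor a_{k1,k2,k3,k4} as in Example 1.
a = np.einsum('pq,pab,qcd->abcd', B_root, B_12, B_34)

# Full tensor u = sum_k a_k  U1[:,k1] (x) U2[:,k2] (x) U3[:,k3] (x) U4[:,k4].
u = np.einsum('abcd,ia,jb,kc,ld->ijkl', a, U1, U2, U3, U4)
print(u.shape)  # (6, 6, 6, 6)
```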

Example 2

A tensor train (TT) representation for \(m=4\) as in Example 1 would correspond to \(\mathcal {D}_{4} = \bigl \{ \{1,2,3,4\}, \{1\}, \{2,3,4\}, \{2\}, \{3,4\}, \{3\}, \{4\} \bigr \}\), i.e., a degenerate instead of a balanced binary tree. More precisely, the special case of the hierarchical Tucker format resulting from this type of tree has also been considered under the name extended TT format [32].

2.2.2 Hierarchical Singular Value Decomposition

For any given \(\mathbf{u}\in \ell _2(\nabla ^d)\) the decomposition (36), with \(\mathbf {a}\) defined by (38), can be regarded as a generalization of the HOSVD, which we shall refer to as a hierarchical singular value decomposition or \({\mathcal {H}}\)SVD. The next theorem summarizes the main properties of this decomposition in the present setting. The finite-dimensional versions of the following claims were established in [17]. All arguments given there carry over to the infinite-dimensional case, as in the proof of Theorem 2.

Theorem 4

Let \(\mathbf {u}\in \mathrm{\ell }_{2}(\nabla ^d)\), where \(d = d_1 + \cdots + d_m\), and let \(\mathcal {D}_{m}\) be a dimension tree. Then \(\mathbf {u}\) can be represented in the form
$$\begin{aligned} \mathbf {u} = \sum _{{\mathsf {k}}\in \mathrm{I}\!\mathrm{N}^m} a_{\mathsf {k}} \mathbb {U}_{\mathsf {k}}, \quad \mathbf {a} = \mathrm {\Sigma }_{\mathcal {D}_{m}}\bigl (\{ \mathbf {B}^{(\alpha ,k)} :\alpha \in \mathcal{N}(\mathcal {D}_{m}), \, k = 1,\ldots , {{\mathrm{rank}}}_\alpha (\mathbf {u})\}\bigr ) \end{aligned}$$
with \(\mathbf {a} \in \mathrm{\ell }_{2}(\mathrm{I}\!\mathrm{N}^m)\) defined by (37), \(\mathbf {B}^{(\alpha ,k)}\in \mathrm{\ell }_{2}(\mathrm{I}\!\mathrm{N}\times \mathrm{I}\!\mathrm{N})\) for \(\alpha \in \mathcal{N}(\mathcal {D}_{m})\), \(k\in \mathrm{I}\!\mathrm{N}\), and where the following hold:
  (i) \(\langle \mathbf {U}^{(i)}_k, \mathbf {U}^{(i)}_l \rangle = \delta _{kl}\) for \(i=1,\ldots ,m\) and \(k,l\in \mathrm{I}\!\mathrm{N}\);

  (ii) \({{\mathrm{rank}}}_{{0_{m}}}(\mathbf{u}) = 1\), \(||\mathbf {B}^{({0_{m}},1)}|| = ||\mathbf{u}||\), and \(\mathbf {B}^{({0_{m}},k)} = 0\) for \(k>1\);

  (iii) \(\langle \mathbf {B}^{(\alpha ,k)}, \mathbf {B}^{(\alpha ,l)}\rangle = \delta _{kl}\) for \(\alpha \in {\mathcal {I}}(\mathcal {D}_{m})\) and \(k,l\in \mathrm{I}\!\mathrm{N}\);

  (iv) for all \(i\in \{1,\ldots ,m\}\) we have \((\sigma ^{(i)}_k)_{k\in \mathrm{I}\!\mathrm{N}}\in \mathrm{\ell }_{2}(\mathrm{I}\!\mathrm{N})\), and \(\sigma ^{(i)}_k \ge \sigma ^{(i)}_{k+1}\ge 0\) for all \(k\in \mathrm{I}\!\mathrm{N}\);

  (v) for all \(i\in \{1,\ldots ,m\}\) we have \(a^{(i)}_{pq} = \bigl |\sigma ^{(i)}_p\bigr |^2 \delta _{pq}\), \(1\le p,q\le \mathrm{rank}_i(\mathbf{u})\).

2.2.3 Projections

As in the case of the Tucker format it will be important to associate suitable orthogonal projections with a given system \(\mathbb {V}\) of nested orthonormal mode frames. Recall that \({\mathsf {r}} = (r_\alpha )_{\alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\}}\in \mathcal {R}_{\mathcal {H}}\) always stands for a rank vector for the hierarchical Tucker format, satisfying the compatibility conditions implied by Proposition 1. Again, \(r_\alpha =\infty \) is permitted. We begin by introducing an analog to (18), with a slightly more involved definition. The hierarchical \(\mathbb {V}\)-rigid tensor class of rank \({\mathsf {r}}\) is given by
$$\begin{aligned} {\mathcal {H}}(\mathbb {V},{\mathsf {r}}):= \big \{\mathbf{w}: \overline{{{\mathrm{range}}}}\, T^{(\alpha )}_{\mathbf{w}} \subseteq \overline{{{\mathrm{span}}}} \{ \mathbf{V}^{(\alpha )}_k :k=1,\ldots ,r_\alpha \} \,, \alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\}\big \}, \end{aligned}$$
(39)
where \( T^{(\alpha )}_{\mathbf{w}}\) is defined by (28). Clearly, \({\mathcal {H}}(\mathbb {V},{\mathsf {r}}) \subset \mathcal {H}({{\mathsf {r}}})\).

In analogy to (12), we address next a truncation of hierarchical ranks to \({\tilde{{\mathsf {r}}}}\le {\mathsf {r}}\) for elements in \({\mathcal {H}}(\mathbb {V},{\mathsf {r}})\), where \(\mathbb {V}\) is a given system of orthonormal and nested mode frames with ranks \({\mathsf {r}}\). We assume first that \({\tilde{{\mathsf {r}}}}\) belongs also to \(\mathcal {R}_{\mathcal {H}}\). The main point is that an approximation with restricted mode frames can still be realized through an operation represented as a sequence of projections involving the given mode frames from \(\mathbb {V}\). However, the order in which these projections are applied now matters.

In a way, the proof of Lemma 1 below already indicates how to proceed, namely, by applying the restriction operators first on the “lower levels” of the dimension tree. To make this precise, we denote by \(\mathcal {D}_{m}^\ell \) the collection of elements of \(\mathcal {D}_{m}\) that have a distance exactly \(\ell \) to the root [i.e., \(\mathcal {D}_{m}^0 = \{ {0_{m}} \}\), \(\mathcal {D}_{m}^1 = \{ {\mathrm{c}_1}({0_{m}}),{\mathrm{c}_2}({0_{m}}) \}\), and so forth]. Let \(L\) be the maximal integer such that \(\mathcal {D}_{m}^L\ne \emptyset \). For \(\ell =1,\ldots ,L\) let \(\bar{\mathcal{D}}_m^\ell := \bigcup \{i\in \alpha :\alpha \in \mathcal{D}_m^\ell \}\). Then, given \(\mathbb {V}\), and abbreviating
$$\begin{aligned} \mathrm{P }_{\mathbb {V},\alpha ,{\tilde{{\mathsf {r}}}}} := \sum _{k=1}^{\tilde{r}_\alpha } \langle \mathbf{V}^{(\alpha )}_{k} ,\cdot \rangle \mathbf{V}^{(\alpha )}_k , \end{aligned}$$
we define
$$\begin{aligned} P_{\mathbb {V},\ell ,{\tilde{{\mathsf {r}}}}} := \Big ( \bigotimes _{i\in \{1,\ldots ,m\}\setminus \bar{\mathcal{D}}_m^\ell } \mathrm{I}_{i} \Big ) \otimes \Big ( \bigotimes _{\alpha \in \mathcal {D}_{m}^\ell } \mathrm{P }_{\mathbb {V},\alpha ,{\tilde{{\mathsf {r}}}}} \Big ), \end{aligned}$$
with \(\mathrm{I}_i\) denoting the identity operation on the \(i\)th tensor mode. Then, as observed in [17], the truncation operation with mode frames \(\mathbb {V}\) restricted to ranks \({\tilde{{\mathsf {r}}}}\) can be represented as
$$\begin{aligned} \mathrm{P }_{\mathbb {V},{\tilde{{\mathsf {r}}}}} := P_{\mathbb {V},L,{\tilde{{\mathsf {r}}}}} \, \cdots \,P_{\mathbb {V},2,{\tilde{{\mathsf {r}}}}}\, P_{\mathbb {V},1,{\tilde{{\mathsf {r}}}}}. \end{aligned}$$
(40)
Here the order is important because the projections \(\mathrm{P }_{\mathbb {V},\alpha ,{\tilde{{\mathsf {r}}}}}, \mathrm{P }_{\mathbb {V},\beta ,{\tilde{{\mathsf {r}}}}}\) corresponding to \(\alpha ,\beta \in \mathcal {D}_{m}\) with \(\alpha \subset \beta \) do not necessarily commute. Therefore, a different order of projections may in fact lead to an end result that has ranks larger than \({\tilde{{\mathsf {r}}}}\); cf. [17].

Specifically, given \(\mathbf{u}\in \ell _2(\nabla ^d)\), we can choose \(\mathbb {V}=\mathbb {U}(\mathbf{u})\) provided by the \({\mathcal {H}}\)SVD; see (30). Hence, \(\mathrm{P }_{\mathbb {U}(\mathbf{u}),{\tilde{{\mathsf {r}}}}}\mathbf{u}\) gives the truncation of \(\mathbf{u}\) based on the \({\mathcal {H}}\)SVD. For this particular truncation an error estimate, in terms of the error of best approximation with rank \({{\tilde{{\mathsf {r}}}}}\), is given in Theorem 5 below.

Remark 3

By (40) we have a representation of \(\tilde{\mathbf{u}}:=\mathrm{P }_{\mathbb {U}(\mathbf{u}),{\tilde{{\mathsf {r}}}}} \mathbf{u}\) in terms of a sequence of noncommuting orthogonal projections. When \({\tilde{{\mathsf {r}}}}\le {\mathsf {r}}\) does not belong to \(\mathcal {R}_{\mathcal {H}}\), the operator defined by (40) is still a projection that, however, modifies the mode frames for those nodes \(\alpha \in \mathcal {N}(\mathcal{D}_m)\) for which the rank compatibility conditions are violated. The resulting projected mode frames are then nested, that is, \(\tilde{\mathbf{u}}\) may again be represented in terms of the orthonormal and nested mode frames \(\tilde{\mathbb {U}} := \mathbb {U}(\tilde{\mathbf{u}})\).

The situation is simplified if we consider the projection to a fixed nested system of mode frames, without a further truncation of ranks that could entail non-nestedness.

Lemma 1

Let \(\mathbb {V}\) be a family of orthonormal and nested hierarchical mode frames with ranks \({\mathsf {r}}\). Then there exists a linear projection \(\mathrm{P }_{\mathbb {V}} :\mathrm{\ell }_{2}(\nabla ^d) \rightarrow {\mathcal {H}}(\mathbb {V},{\mathsf {r}})\) such that the unique best approximation in \({\mathcal {H}}(\mathbb {V},{\mathsf {r}})\) of any \(\mathbf{u}\in \mathrm{\ell }_{2}(\nabla ^d)\) is given by \(\mathrm{P }_{\mathbb {V}}\mathbf{u}\), that is,
$$\begin{aligned} ||\mathbf{u}- \mathrm{P }_{\mathbb {V}} \mathbf{u}|| = \min _{\mathbf{w}\in \mathcal {H}({\mathbb {V},{\mathsf {r}}})}||\mathbf{u}- \mathbf{w}|| \,. \end{aligned}$$

Proof

The sought projection is given by \(\mathrm{P }_\mathbb {V}= P_{\mathbb {V},1,{\mathsf {r}}}\) since
$$\begin{aligned} P_{\mathbb {V},L,{\mathsf {r}}} \, \cdots \,P_{\mathbb {V},2,{\mathsf {r}}}\, P_{\mathbb {V},1,{\mathsf {r}}} = P_{\mathbb {V},1,{\mathsf {r}}} \end{aligned}$$
holds as a consequence of the nestedness property. \(\square \)

2.2.4 Best Approximation

In analogy to (26), we define the error estimate
$$\begin{aligned} {\lambda }_{\tilde{{\mathsf {r}}}}(\mathbf{u}) = {\lambda }^{\mathcal {H}}_{\tilde{{\mathsf {r}}}}(\mathbf{u}) := \Big ( \sum _{\alpha } \sum _{k = \tilde{r}_{\alpha }+ 1}^{\mathrm{rank}_\alpha (\mathbf {u})} \bigl |\sigma ^{(\alpha )}_{k}\bigr |^2 \Big )^{\frac{1}{2}}. \end{aligned}$$
(41)
Here the sum over \(\alpha \) extends over \(\mathcal {D}_{m}\setminus \{ {0_{m}},{\mathrm{c}_2}({0_{m}}) \}\) if \(\tilde{r}_{{\mathrm{c}_1}({0_{m}})} \le \tilde{r}_{{\mathrm{c}_2}({0_{m}})}\), and otherwise over \(\mathcal {D}_{m}\setminus \{ {0_{m}},{\mathrm{c}_1}({0_{m}}) \}\). We then have the following analog of Corollary 1; see [17].

Theorem 5

For a given \(\mathbf {u} \in \mathrm{\ell }_{2}(\nabla ^d)\) let \(\mathbb {U}^{{\mathcal {H}}}(\mathbf{u})=\mathbb {U}(\mathbf{u})\) be the hierarchical orthonormal system of mode frames generated by the \({\mathcal {H}}\)SVD of \(\mathbf {u}\) as in Theorem 4. Then for hierarchical ranks \({\tilde{{\mathsf {r}}}} = (\tilde{r}_\alpha )\in {\mathcal {R}}_{\mathcal {H}}\) we have
$$\begin{aligned} ||\mathbf {u} - \mathrm{P }_{\mathbb {U}(\mathbf{u}),{\tilde{{\mathsf {r}}}}}\mathbf{u}|| \le {\lambda }^{\mathcal {H}}_{\tilde{{\mathsf {r}}}}(\mathbf{u}) \le \sqrt{2 m - 3} \, \inf \bigl \{||\mathbf {u} - \mathbf {v}|| :\mathbf {v} \in {\mathcal {H}}({\tilde{{\mathsf {r}}}}) \bigr \}. \end{aligned}$$

Corollary 3

For \(\mathbf{u}\in \mathrm{\ell }_{2}(\nabla ^d)\) and \({\mathsf {r}} = (r_\alpha )_{\alpha \in \mathcal {D}_{m}} {\in {\mathcal {R}}_{\mathcal {H}}}\) with \(0\le r_\alpha \le {{\mathrm{rank}}}_\alpha (\mathbf{u})\) there exist orthonormal and nested hierarchical mode frames \(\bar{\mathbb {U}}(\mathbf{u}, {\mathsf {r}})\) such that
$$\begin{aligned} ||\mathbf{u}- \mathrm{P }_{\bar{\mathbb {U}}(\mathbf{u}, {\mathsf {r}})} \mathbf{u}|| = \min _{\mathbf{w}\in \mathcal {H}({{\mathsf {r}}})}||\mathbf{u}- \mathbf{w}||, \end{aligned}$$
with \(\mathrm{P }_{\bar{\mathbb {U}}(\mathbf{u}, {\mathsf {r}})}\) as in Lemma 1.

Proof

By Theorem 3, a best approximation of hierarchical ranks \({\mathsf {r}}\) for \(\mathbf{u}\),
$$\begin{aligned} \bar{\mathbf{u}}\in \mathrm{arg\;min}\{ ||\mathbf{u}- \mathbf{v}|| :{{\mathrm{rank}}}_\alpha (\mathbf{v}) \le r_\alpha \} \,, \end{aligned}$$
exists. Defining \(\bar{\mathbb {U}}(\mathbf{u}, {\mathsf {r}}):= \mathbb {U}(\bar{\mathbf{u}})\) as the nested and orthonormal mode frames for \(\bar{\mathbf{u}}\), given by the \({\mathcal {H}}\)SVD, we obtain the assertion with Lemma 1. \(\square \)

Remark 4

Suppose that, in analogy to Remark 1, a compactly supported vector \(\mathbf{u}\) on \(\nabla ^d\) is given in a possibly redundant hierarchical representation
$$\begin{aligned} \mathbf {u} = \sum _{{\mathsf {k}}\in {\mathsf {K}_m}({\tilde{{\mathsf {r}}}})} \tilde{a}_{\mathsf {k}} \bigotimes _{i=1}^m \tilde{\mathbf U}^{(i)}_{k_i}, \quad \tilde{\mathbf {a}} = \mathrm {\Sigma }_{\mathcal {D}_{m}}(\{ \tilde{\mathbf {B}}^{(\alpha ,k_\alpha )}\}) \,, \end{aligned}$$
where the summations in the expansion of \(\tilde{\mathbf {a}}\) range over \(k_\alpha = 1,\ldots ,\tilde{r}_\alpha \) for each \(\alpha \), and where the vectors \({\tilde{\mathbf U}}^{(i)}_{k}\), \(k=1,\ldots ,\tilde{r}_i\), and \(\tilde{\mathbf {B}}^{(\alpha ,k)}\), \(k = 1,\ldots ,\tilde{r}_\alpha \), may be linearly dependent. Employing standard linear algebra procedures, an \({\mathcal {H}}\)SVD of \(\mathbf {u}\) can be computed from such a representation using a number of operations that can be estimated by
$$\begin{aligned} C m \,\Big (\max _{\alpha \in \mathcal {D}_{m}\setminus \{{0_{m}}\} }\tilde{r}_\alpha \Big )^4 + C \Big (\max _i{\tilde{r}_i}\Big )^2 \sum _{i=1}^m \#{{\mathrm{supp}}}_i(\mathbf{u}), \end{aligned}$$
(42)
where \(C>0 \) is a fixed constant; cf. [17, Lemma 4.9].

3 Recompression and Coarsening

As explained in Sect. 1.2, iterations of the form (9) provide updates \(\mathbf{v}= \mathbf{u}_k + \omega (\mathbf {f}- \mathbf {A}\mathbf{u}_k)\) that differ from the unknown \(\mathbf{u}\) by some known tolerance. However, even when using a “tensor-friendly” structure of the operator \(\mathbf {A}\) or a known “tensor sparsity” of the data \(\mathbf {f}\), the arithmetic operations leading to the update \(\mathbf{v}\) do not give any clue as to whether the resulting ranks are close to minimal. Hence, one needs a mechanism that realizes a subspace correction leading to tensor representations with ranks at least close to minimal ones. This consists in deriving from the known \(\mathbf{v}\) a near-best approximation to the unknown \(\mathbf{u}\), where the notion of near best in terms of ranks is made precise below. Specifically, suppose that \(\mathbf{v}\in \mathrm{\ell }_{2}(\nabla ^d)\) is an approximation of \(\mathbf{u}\in \mathrm{\ell }_{2}(\nabla ^d)\), which for some \(\eta >0\) satisfies
$$\begin{aligned} ||\mathbf{u}- \mathbf{v}||_{\mathrm{\ell }_{2}(\nabla ^d)}\le \eta . \end{aligned}$$
(43)
We shall show next how to derive from \(\mathbf{v}\) a near-minimal rank tensor approximation to \(\mathbf{u}\). Based on our preparations in Sect. 2, the following developments apply to both formats \({\mathcal {F}}\in \{{\mathcal {T}},{\mathcal {H}}\}\) and, in fact, to any format \({\mathcal {F}}\) with associated mode frame systems \(\mathbb {U}=\mathbb {U}^{{\mathcal {F}}}\) [see (25), (30)] for which one can formulate suitable projections \(\mathrm{P }_\mathbb {V}^{{\mathcal {F}}}, \mathrm{P }_{\mathbb {V},{\tilde{{\mathsf {r}}}}}^{{\mathcal {F}}}\) with analogous properties. Accordingly,
$$\begin{aligned} {\mathcal {R}}= {\mathcal {R}}_{{\mathcal {F}}},\quad {\mathcal {F}}\in \{{\mathcal {T}},{\mathcal {H}}\} \end{aligned}$$
(44)
denotes the respective set of admissible rank vectors \({\mathcal {R}}_{{\mathcal {T}}}\), \({\mathcal {R}}_{{\mathcal {H}}}\), defined in (16), (31), respectively. A crucial role in what follows is played by the following immediate consequence of Corollaries 2 and 3 combined with Corollary 1 and Theorem 5.

Remark 5

Let for a given \(\mathbf{v}\in \ell _2(\nabla ^d)\) the mode frame system \(\mathbb {U}(\mathbf{v})\) be either \(\mathbb {U}^{\mathcal {T}}(\mathbf{v})\) or \(\mathbb {U}^{\mathcal {H}}(\mathbf{v})\). Then, for any rank vector \({\mathsf {r}}\le {{\mathrm{rank}}}(\mathbf{v})\), \({\mathsf {r}}\in {\mathcal {R}}\), one has
$$\begin{aligned} ||\mathbf {v} - \hbox {P}_{{\mathbb {U}}({\mathbf {v}}),{{\mathsf {r}}}} \mathbf {v}|| \le {{\lambda }_{\mathsf {r}} (\mathbf{v}) } \le \kappa _\mathrm{P}||\mathbf {v} - \hbox {P}_{\bar{\mathbb {U}}({\mathbf {v}},{{\mathsf {r}}})} \mathbf {v}|| = \kappa _\mathrm{P}\min _{{{\mathrm{rank}}}(\mathbf {w})\le {\mathsf {r}}} ||\mathbf {v} - \mathbf {w}|| , \end{aligned}$$
(45)
where \(\kappa _\mathrm{P}=\sqrt{m}\) when \({\mathcal {F}}={\mathcal {T}}\), and \(\kappa _\mathrm{P}=\sqrt{2m-3}\) when \({\mathcal {F}}={\mathcal {H}}\).

As mentioned earlier, for \({\mathcal {F}}= {\mathcal {H}}\) the preceding notions depend on the dimension tree \(\mathcal{D}_m\). Since \(\mathcal{D}_m\) is fixed, we dispense with a corresponding notational reference.

3.1 Tensor Recompression

Given \(\mathbf {u}\in \mathrm{\ell }_{2}(\nabla ^d)\), in what follows, by \(\mathbb {U}(\mathbf{u})\) we either mean \(\mathbb {U}^{{\mathcal {T}}}(\mathbf{u})\) or \(\mathbb {U}^{{\mathcal {H}}}(\mathbf{u})\); see (25), (30).

We introduce next two notions of minimal ranks \(\mathrm{r }(\mathbf{u},\eta ), {\bar{\mathrm{r }}}(\mathbf{u}, \eta )\) for a given target accuracy \(\eta \), one for the specific mode frame system \(\mathbb {U}(\mathbf{u})\) provided by either HOSVD or \({\mathcal {H}}\)SVD, and one for the respective best mode frame systems.

Definition 3

For each \(\eta >0\) we choose \(\mathrm{r }(\mathbf {u}, \eta )\in {\mathcal {R}}\) such that
$$\begin{aligned} {{\lambda }_{\mathrm{r }(\mathbf {u}, \eta )}(\mathbf{u}) } \le \eta \,, \end{aligned}$$
and hence \( ||\mathbf {u} - \mathrm{P }_{\mathbb {U}(\mathbf {u}), \mathrm{r }(\mathbf {u},\eta )} \mathbf {u}||\le \eta \), with minimal \(|\mathrm{r }(\mathbf {u}, \eta )|_\infty \), that is,
$$\begin{aligned} \mathrm{r }(\mathbf{u},\eta ) \in \mathrm{arg\;min}\bigl \{|{\mathsf {r}}|_\infty : {\mathsf {r}}\in {\mathcal {R}},\; { {\lambda }_{{\mathsf {r}}}(\mathbf{u}) } \le \eta \bigr \} \,. \end{aligned}$$
Similarly, for each \(\eta >0\) we choose \({\bar{\mathrm{r }}}(\mathbf {u}, \eta )\in {\mathcal {R}}\) such that
$$\begin{aligned} ||\mathbf {u} - \hbox {P}_{\bar{\mathbb {U}}({\mathbf {u}},{{\bar{\mathrm{r }}}(\mathbf {u}, \eta ))}} \mathbf {u} || \le \eta , \end{aligned}$$
with minimal \(|{\bar{\mathrm{r }}}(\mathbf {u}, \eta )|_\infty \), that is (Corollary 2 and Remark 5),
$$\begin{aligned} {\bar{\mathrm{r }}}(\mathbf{u},\eta ) \in \mathrm{arg\;min}\,\bigl \{|{\mathsf {r}}|_\infty : {\mathsf {r}}\in {\mathcal {R}},\,\, \exists \,\, \mathbf{w}\in {{\mathcal {F}}({\mathsf {r}})},\,\, \Vert \mathbf{u}-\mathbf{w}\Vert \le \eta \bigr \} \,. \end{aligned}$$
(46)
Recall that the projections \(\hbox {P}_{\mathbb {U}(\mathbf{v}),{{\mathsf {r}}}}=\hbox {P}_{\mathbb {U}(\mathbf{v}),{{\mathsf {r}}}}^{{\mathcal {F}}}\) to \({\mathcal {F}}({\mathsf {r}})\) are given either by (11) or (40) when \({\mathcal {F}}\in \{{\mathcal {T}},{\mathcal {H}}\}\), respectively. In both cases, they will be used to define computable coarsening operators for any given \(\mathbf{v}\) (of finite support in \(\nabla ^d\)). In fact, setting
$$\begin{aligned} {\hat{\mathrm{P }}}_{\eta } \mathbf{v}:= \mathrm{P }_{\mathbb {U}({\mathbf{v}}),{\mathrm{r }(\mathbf{v},\eta )}}\mathbf{v}\,, \end{aligned}$$
(47)
we have by definition
$$\begin{aligned} ||\mathbf{v}- {\hat{\mathrm{P }}}_{\eta }\mathbf{v}||\le {\lambda }_{\mathrm{r }(\mathbf{v},\eta )} (\mathbf{v}) \le \eta ,\quad |{{\mathrm{rank}}}({\hat{\mathrm{P }}}_{\eta }\mathbf{v})|_\infty = |\mathrm{r }(\mathbf{v},\eta )|_\infty . \end{aligned}$$
(48)
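For orientation, a minimal sketch of how \(|\mathrm{r }(\mathbf{v},\eta )|_\infty \), and hence \({\hat{\mathrm{P }}}_{\eta }\mathbf{v}\), could be determined in practice for the Tucker format: assuming the mode-\(i\) singular values of a finitely supported \(\mathbf{v}\) have already been computed (e.g., by an HOSVD), one scans for the smallest uniform rank bound whose discarded tail satisfies \({\lambda }_{{\mathsf {r}}}(\mathbf{v})\le \eta \); the function name and interface below are hypothetical.

```python
import numpy as np

def minimal_uniform_rank(sigmas, eta):
    """Smallest r such that (sum_i sum_{k>r} sigma_i[k]^2)^(1/2) <= eta,
    i.e., the smallest uniform rank bound realizing the tail criterion.
    `sigmas` is a list of 1D arrays of nonincreasing mode-i singular values."""
    max_rank = max(len(s) for s in sigmas)
    for r in range(max_rank + 1):
        tail_sq = sum(np.sum(s[r:] ** 2) for s in sigmas)
        if np.sqrt(tail_sq) <= eta:
            return r
    return max_rank

# Hypothetical example: three modes with geometrically decaying singular values.
sigmas = [0.5 ** np.arange(20), 0.7 ** np.arange(15), 0.6 ** np.arange(25)]
print(minimal_uniform_rank(sigmas, eta=1e-3))
```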

Lemma 2

Fix any \(\alpha >0\). For any \(\mathbf{u}, \mathbf{v},\eta \) satisfying (43), i.e., \(||\mathbf{u}-\mathbf{v}||\le \eta \), one has
$$\begin{aligned} || \mathbf{u}- {\hat{\mathrm{P }}}_{\kappa _\mathrm{P}(1+\alpha )\eta }\mathbf{v}|| \le (1+ \kappa _\mathrm{P}(1+\alpha ))\eta , \end{aligned}$$
(49)
while
$$\begin{aligned} |{{\mathrm{rank}}}({\hat{\mathrm{P }}}_{\kappa _\mathrm{P}(1+\alpha )\eta }\mathbf{v})|_\infty = |\mathrm{r }(\mathbf{v},\kappa _\mathrm{P}(1+\alpha )\eta ) |_\infty \le |{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty \,. \end{aligned}$$
(50)

In other words, the ranks of \({\hat{\mathrm{P }}}_{\kappa _\mathrm{P}(1+\alpha )\eta }\mathbf{v}\) are bounded by the minimum ranks required to realize a somewhat higher accuracy.

Proof

Bearing in mind Remark 5, given \(\mathbf{u}\), one has for the projection \(\hbox {P}_{\bar{\mathbb {U}}({\mathbf{u}},{{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta ))}}\)
$$\begin{aligned}&|| \mathbf{v}- \hbox {P}_{\bar{\mathbb {U}}({\mathbf{u}},{{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta ))}}\mathbf{v}|| \le || (\mathrm{I}- \hbox {P}_{\bar{\mathbb {U}}({\mathbf{u}},{{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta ))}})(\mathbf{v}-\mathbf{u})|| \nonumber \\&\quad +||\mathbf{u}- \hbox {P}_{\bar{\mathbb {U}}({\mathbf{u}},{{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta ))}}\mathbf{u}|| \le (1+\alpha )\eta . \end{aligned}$$
(51)
On the other hand, we know that for any \({\mathsf {r}}\in {\mathcal {R}}\),
$$\begin{aligned} || \mathbf{v}- \mathrm{P }_{\mathbb {U}({\mathbf{v}}),{{\mathsf {r}}}}\mathbf{v}|| { \le {\lambda }_{\mathsf {r}}(\mathbf{v}) } \le \kappa _\mathrm{P}\, \inf _{\mathbf{w}\in {\mathcal {F}}({\mathsf {r}})} ||\mathbf{v}-\mathbf{w}||\,, \end{aligned}$$
so that, by (51), for \({\mathsf {r}}={\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )\) we have
$$\begin{aligned} || \mathbf{v}- \mathrm{P }_{\mathbb {U}({\mathbf{v}}),{{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )}}\mathbf{v}|| { \le {\lambda }_{{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )}(\mathbf{v}) } \le \kappa _\mathrm{P}(1+\alpha )\eta \,. \end{aligned}$$
Since, by definition, \(|{{\mathrm{rank}}}({\hat{\mathrm{P }}}_{\kappa _\mathrm{P}(1+\alpha )\eta }\mathbf{v})|_\infty \) is minimal to achieve the accuracy bound \(\kappa _\mathrm{P}(1+\alpha )\eta \), (50) follows. Estimate (49) follows by the triangle inequality. \(\square \)

Thus, appropriately coarsening \(\mathbf{v}\) yields an approximation to \(\mathbf{u}\) of the same quality up to a fixed (dimension-dependent) constant, where the ranks of this new approximation are bounded by the minimal ranks of a best Tucker or hierarchical Tucker approximation to \(\mathbf{u}\) for a somewhat higher accuracy.

Let us reinterpret this in terms of minimal ranks, i.e., for \(r\in \mathrm{I}\!\mathrm{N}_0\) and \({\mathcal {F}}\in \{{\mathcal {T}}, {\mathcal {H}}\}\), let
$$\begin{aligned} \sigma _{r}(\mathbf{v}) = \sigma _{r,{\mathcal {F}}}(\mathbf{v}):= \inf \,\bigl \{|| \mathbf{v}- \mathbf{w}|| \,:\; \mathbf{w}\in {\mathcal {F}}({\mathsf {r}}) \hbox { with}\ {\mathsf {r}}\in {\mathcal {R}}, |{\mathsf {r}}|_\infty \le r \bigr \}\,. \end{aligned}$$
We now consider corresponding approximation classes.

Definition 4

We call a positive, strictly increasing \(\gamma = \bigl (\gamma (n)\bigr )_{n\in \mathrm{I}\!\mathrm{N}_0}\) with \(\gamma (0)=1\) and \(\gamma (n)\rightarrow \infty \), as \(n\rightarrow \infty \), a growth sequence. For a given growth sequence \(\gamma \) we define
$$\begin{aligned} {\mathcal {A}}(\gamma )= {{\mathcal {A}}_{\mathcal {F}}({\gamma })}:= { \bigl \{\mathbf{v}\in {\mathrm{\ell }_{2}(\nabla ^d)} : \sup _{r\in \mathrm{I}\!\mathrm{N}_0} \gamma ({r})\, \sigma _{r,{\mathcal {F}}}(\mathbf{v})=:|\mathbf{v}|_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}{ <\infty }\bigr \} } \end{aligned}$$
and \(||\mathbf{v}||_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}:= ||\mathbf{v}|| + |\mathbf{v}|_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}\). We call the growth sequence \(\gamma \) admissible if
$$\begin{aligned} \rho _\gamma := \sup _{n\in \mathrm{I}\!\mathrm{N}} \gamma (n)/\gamma (n-1)<\infty \,, \end{aligned}$$
which corresponds to a restriction to at most exponential growth.

In the particular case where \(\gamma (n)\sim n^{s}\) for some \(s >0\), \(||\mathbf{v}||_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}:= ||\mathbf{v}|| + |\mathbf{v}|_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}\) is a quasi-norm and \({{\mathcal {A}}_{\mathcal {F}}({\gamma })}\) is a linear space.

Remark 6

For the subsequent developments it will be helpful to keep in mind the following way of reading \(\mathbf{v}\in {{\mathcal {A}}_{\mathcal {F}}({\gamma })}\): a given target accuracy \(\varepsilon \) can be realized at the expense of ranks of size \(\gamma ^{-1}(|\mathbf{v}|_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}/\varepsilon )\) so that a rank bound of the form \(\gamma ^{-1}(C|\mathbf{v}|_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}/\varepsilon )\), where \(C\) is any constant, marks a near-optimal performance.
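For instance, for the admissible growth sequence \(\gamma (n)=(1+n)^s\) with \(s>0\) (so that \(\rho _\gamma = 2^s\)), this reading becomes explicit:
$$\begin{aligned} \gamma ^{-1}(t) = t^{1/s}-1, \qquad \gamma ^{-1}\bigl (C|\mathbf{v}|_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}/\varepsilon \bigr ) \le \bigl (C|\mathbf{v}|_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}/\varepsilon \bigr )^{1/s}, \end{aligned}$$
that is, accuracy \(\varepsilon \) can be realized with ranks growing at most like \(\varepsilon ^{-1/s}\), in analogy to the \(N\)-term rates considered in Sect. 3.2.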

Theorem 6

Let \(\kappa _\mathrm{P}\) be as in Remark 2, and let \(\alpha > 0\). Assume that \(\mathbf{u}\in {{\mathcal {A}}_{\mathcal {F}}({\gamma })}\) and that \(\mathbf{v}\in \mathrm{\ell }_{2}(\nabla ^d)\) satisfies \(||\mathbf{u}-\mathbf{v}|| \le \eta \). Then, defining \(\mathbf{w}_\eta := {\hat{\mathrm{P }}}_{\kappa _\mathrm{P}(1+\alpha )\eta }\mathbf{v}\), one has
$$\begin{aligned} |{{\mathrm{rank}}}(\mathbf{w}_\eta )|_\infty \le \gamma ^{-1}\big (\rho _\gamma ||\mathbf{u}||_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}/(\alpha \eta )\big ),\quad ||\mathbf{u}- \mathbf{w}_\eta ||\le (1+ \kappa _\mathrm{P}(1+\alpha ))\eta \end{aligned}$$
(52)
and
$$\begin{aligned} ||\mathbf{w}_\eta ||_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}\le C ||\mathbf{u}||_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}},\quad \eta >0, \end{aligned}$$
(53)
where \(C=\alpha ^{-1}(1+\kappa _\mathrm{P}(1+\alpha )) + 1\).

Proof

The second relation in (52) was already shown in Lemma 2. We also know from (50) that \(|{{\mathrm{rank}}}(\mathbf{w}_\eta )|_\infty \le |{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty \). Thus, the first relation in (52) is clear if \(|{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty = 0\). Assume that \(|{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty >1\). Then, for \(r' :=|{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty -1\), by definition of \(|\cdot |_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}\), we have
$$\begin{aligned} {|\mathbf{u}|_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}} \ge \gamma (r')\sigma _{ r',{\mathcal {F}}}(\mathbf{u}) \ge \gamma (r')\,\alpha \eta \ge \rho _\gamma ^{-1} \gamma (|{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty )\,\alpha \eta .} \end{aligned}$$
(54)
Also, when \(|{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty =1\), we have
$$\begin{aligned} \sigma _{0,{\mathcal {T}}}(\mathbf{u})=||\mathbf{u}||>\alpha \eta =\gamma (0)\,\alpha \eta \ge \rho _\gamma ^{-1}\gamma (|{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty )\,\alpha \eta \,. \end{aligned}$$
Therefore,
$$\begin{aligned} |{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty \le \gamma ^{-1}\big (\rho _\gamma ||\mathbf{u}||_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}/(\alpha \eta )\big ), \end{aligned}$$
which is the first relation in (52).
As for the remaining claim, we need to estimate \(\gamma (r)\sigma _{r,{\mathcal {F}}}(\mathbf{w}_\eta )\) for \(r\in \mathrm{I}\!\mathrm{N}_0\). Whenever \(r \ge |{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty \), we have, by (50), \(\sigma _{r,{\mathcal {F}}}(\mathbf{w}_\eta )=0\). It thus suffices to consider \(r < |{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty \). By (49),
$$\begin{aligned} {\inf _{{\mathsf {r}}\in {\mathcal {R}}:|{\mathsf {r}}|_\infty \le r}||\mathbf{w}_\eta - \hbox {P}_{\bar{\mathbb {U}}({\mathbf{u}},{{\mathsf {r}}})}\mathbf{u}|| }&\le ||\mathbf{w}_\eta - \mathbf{u}|| + { \inf _{{\mathsf {r}}\in {\mathcal {R}}:|{\mathsf {r}}|_\infty \le r} ||\mathbf{u}-\hbox {P}_{\bar{\mathbb {U}}({\mathbf{u}},{{\mathsf {r}}})}\mathbf{u}|| } \\&\le (1+\kappa _\mathrm{P}(1+\alpha ))\eta + { \sigma _{r,{\mathcal {F}}}(\mathbf{u}). } \end{aligned}$$
Since for \(r < |{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty \) we have \(\sigma _{r,{\mathcal {F}}}(\mathbf{u})>\alpha \eta \), while \(\sigma _{|{\bar{\mathrm{r }}}(\mathbf{u},\alpha \eta )|_\infty ,{\mathcal {F}}}(\mathbf{u}) \le \alpha \eta \), we conclude that
$$\begin{aligned} \gamma (r)\,\sigma _{r,{\mathcal {F}}}(\mathbf{w}_\eta )&\le \gamma (r) \frac{(1+\kappa _\mathrm{P}(1+\alpha ))\alpha \eta }{\alpha } + \gamma (r)\,\sigma _{r,{\mathcal {F}}}(\mathbf{u}) \nonumber \\&\le \Big ( \frac{1+\kappa _\mathrm{P}(1+\alpha )}{\alpha } + 1\Big ) \gamma (r)\sigma _{r,{\mathcal {F}}}(\mathbf{u})\nonumber \\&\le \Big ( \frac{1+\kappa _\mathrm{P}(1+\alpha )}{\alpha } + 1\Big )|\mathbf{u}|_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}, \end{aligned}$$
which shows (53). \(\square \)

3.2 Coarsening of Mode Frames

We now turn to a second type of operation for reducing the complexity of given coefficient sequences in tensor representation, an operation that coarsens mode frames by discarding basis indices whose contribution is negligible. We shall use the following standard notions for best \(N\)-term approximations.

Definition 5

For \(\hat{d} \in \mathrm{I}\!\mathrm{N}\) and \(\varLambda \subset \nabla ^{\hat{d}}\) we define the restrictions
$$\begin{aligned} \mathrm{R }_{\varLambda } \mathbf {v} := \mathbf {v} \odot \chi _\varLambda ,\quad \mathbf {v}\in \mathrm{\ell }_{2}(\nabla ^{\hat{d}}) \,, \end{aligned}$$
where \(\odot \) denotes the Hadamard (elementwise) product. The compressibility of \(\mathbf{v}\) can again be described through approximation classes. For \(s >0\), we denote by \({{\mathcal {A}}^s}(\nabla ^{\hat{d}})\) the set of \(\mathbf {v}\in \mathrm{\ell }_{2}(\nabla ^{\hat{d}})\) such that
$$\begin{aligned} ||\mathbf {v}||_{{{\mathcal {A}}^s}(\nabla ^{\hat{d}})} := \sup _{N\in \mathrm{I}\!\mathrm{N}_0} (N+1)^s {\inf _{\begin{array}{c} \varLambda \subset \nabla ^{\hat{d}}\\ \#\varLambda \le N \end{array}} ||\mathbf {v} - \mathrm{R }_{\varLambda } \mathbf {v}|| } < \infty \,. \end{aligned}$$
Endowed with this (quasi-)norm, \({{\mathcal {A}}^s}(\nabla ^{\hat{d}})\) becomes a (quasi-)Banach space. When no confusion can arise, we shall suppress the index set dependence and write \({{\mathcal {A}}^s}= {{\mathcal {A}}^s}(\nabla ^{\hat{d}})\).
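For orientation (a standard observation, not part of the definition), a sequence whose nonincreasing rearrangement \((v^*_n)_{n\in \mathrm{I}\!\mathrm{N}}\) decays like \(v^*_n\lesssim n^{-s-\frac{1}{2}}\) belongs to \({{\mathcal {A}}^s}\), since
$$\begin{aligned} \inf _{\#\varLambda \le N} ||\mathbf {v} - \mathrm{R }_{\varLambda } \mathbf {v}||^2 = \sum _{n > N} |v^*_n|^2 \lesssim \sum _{n > N} n^{-2s-1} \lesssim (N+1)^{-2s}. \end{aligned}$$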

Remark 7

The same comment as in Remark 6 applies. Thinking of the growth sequence as being \(\gamma _s(n)=(n+1)^s\), realizing an accuracy \(\varepsilon \) at the expense of \((C ||\mathbf {v}||_{{{\mathcal {A}}^s}(\nabla ^{\hat{d}})} /\varepsilon )^{1/s}\) terms, where \(C\) is a constant independent of \(\varepsilon \), signifies an optimal work-accuracy balance over the class \({{\mathcal {A}}^s}(\nabla ^{\hat{d}})\).

We deliberately restrict the discussion to polynomial decay rates here since this corresponds to finite Sobolev or Besov regularity. However, with appropriate modifications, the subsequent considerations can be adapted also to approximation classes corresponding to more general growth sequences.

3.2.1 Tensor Contractions

An exhaustive search through the entries of a sequence \(\mathbf{u}\in \ell _2(\nabla ^d)\) (of finite support) would suffer from the curse of dimensionality. Being content with near-best \(N\)-term approximations, one can get around this by introducing, for each given \(\mathbf{u}\in \ell _2(\nabla ^d)\), the following quantities formed from certain contractions of the tensor \(\mathbf{u}\otimes \mathbf{u}\), given by \(\mathrm{diag}(T^{(i)}_\mathbf{u}(T^{(i)}_\mathbf{u})^*)\).

Definition 6

Let \(\mathbf{u}\in \mathrm{\ell }_{2}(\nabla ^d)\). For \(i\in \{{1},\ldots ,{m}\}\) we define, using the notation (21),
$$\begin{aligned} \pi ^{(i)}(\mathbf {u}) = \bigl ( \pi ^{(i)}_{\nu _i} (\mathbf {u}) \bigr )_{\nu _i\in \nabla ^{d_i}} :=\biggl ( \Bigl (\sum _{\check{\nu }_i} |u_{\nu }|^2 \Bigr )^{\frac{1}{2}} \biggr )_{\nu _i \in \nabla ^{d_i}} \,. \end{aligned}$$

With a slight abuse of terminology, we shall refer to these \(\pi ^{(i)}(\cdot )\) simply as contractions. Their direct computation would involve high-dimensional summations over the index sets \(\nabla ^{d-d_i}\). However, the following observations show how this can be avoided. This makes essential use of the particular orthogonality properties of the tensor formats.

Proposition 2

Let \(\mathbf{u}\in \mathrm{\ell }_{2}(\nabla ^d)\).
  (i) We have \(||\mathbf{u}|| = ||\pi ^{(i)}(\mathbf{u})||\), \(i=1,\ldots ,m\).

  (ii) Let \(\varLambda ^{(i)}\subseteq \nabla ^{d_i}\); then
  $$\begin{aligned} ||\mathbf{u}- \mathrm{R }_{\varLambda ^{(1)} \times \cdots \times \varLambda ^{(m)}} \mathbf{u}|| \le \Big (\sum _{i=1}^m \sum _{\nu \in \nabla ^{d_i}\setminus \varLambda ^{(i)}} |\pi ^{(i)}_\nu (\mathbf{u})|^2 \Big )^{\frac{1}{2}} . \end{aligned}$$
  (55)

  (iii) Let in addition \(\mathbf {U}^{(i)}\) and \(\mathbf {a}\) be mode frames and core tensor, respectively, as in Theorem 2 or 4, and let \((\sigma ^{(i)}_k)\) be the corresponding sequences of mode-\(i\) singular values. Then
  $$\begin{aligned} \pi ^{(i)}_\nu (\mathbf {u}) = \Big ( \sum _{k} \bigl |\mathbf {U}^{(i)}_{\nu , k}\bigr |^2 \bigl |\sigma ^{(i)}_{k}\bigr |^2 \Big )^{\frac{1}{2}},\quad \nu \in \nabla ^{d_i}. \end{aligned}$$
  (56)

Proof

Property (i) is clear, and (iii) is a simple consequence of the orthogonality properties of mode frames and core tensor stated in Theorems 2 and 4. Abbreviating \({\tilde{\mathbf u}}:= \mathrm{R }_{\varLambda ^{(1)} \times \cdots \times \varLambda ^{(m)}} \mathbf{u}\), property (ii) follows, in view of (i), from
$$\begin{aligned} ||{\tilde{\mathbf u}}- \mathbf {u}||^2&\le ||\mathbf {u} - \mathrm{R }_{\varLambda ^{(1)} \times \nabla ^{d_2}\times \cdots \times \nabla ^{d_m}} \mathbf {u}||^2 + \cdots + ||\mathbf {u} - \mathrm{R }_{\nabla ^{d_1} \times \cdots \times \nabla ^{d_{m-1}} \times \varLambda ^{(m)}} \mathbf {u} ||^2\nonumber \\&= \sum _{i=1}^m \sum _{\nu \in \nabla ^{d_i} \setminus \varLambda ^{(i)}} \bigl |\pi ^{(i)}_{\nu }(\mathbf {u})\bigr |^2 \,. \end{aligned}$$
\(\square \)
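The following NumPy sketch (a hypothetical illustration for a small dense tensor; all names are ad hoc) computes the contractions of Definition 6 once directly and once via the left singular vectors and singular values of the mode-\(i\) matricizations as in (56); both agree up to rounding, and Proposition 2(i) is visible as well.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal((5, 6, 7))   # small dense stand-in for a finitely supported tensor
m = u.ndim

for i in range(m):
    # Direct contraction: pi^{(i)}_nu = (sum over all other indices of |u_nu|^2)^(1/2).
    other_axes = tuple(j for j in range(m) if j != i)
    pi_direct = np.sqrt(np.sum(u ** 2, axis=other_axes))

    # Via the mode-i matricization u_(i) = U diag(sigma) V^T:
    # pi^{(i)}_nu = (sum_k |U[nu,k]|^2 sigma_k^2)^(1/2), cf. (56).
    unfolding = np.moveaxis(u, i, 0).reshape(u.shape[i], -1)
    U, sigma, _ = np.linalg.svd(unfolding, full_matrices=False)
    pi_svd = np.sqrt((U ** 2) @ (sigma ** 2))

    assert np.allclose(pi_direct, pi_svd)
    assert np.isclose(np.linalg.norm(pi_direct), np.linalg.norm(u))  # Proposition 2(i)
```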

The following subadditivity property is an immediate consequence of the triangle inequality.

Proposition 3

Let \(N\in \mathrm{I}\!\mathrm{N}\) and \(\mathbf{u}_n \in \mathrm{\ell }_{2}(\nabla ^d)\), \(n=1,\ldots ,N\). Then for each \(i\) and each \(\nu \in \nabla ^{d_i}\) we have
$$\begin{aligned} \pi ^{(i)}_\nu \Big (\,\sum _{n=1}^N \mathbf{u}_n\Big ) \le \sum _{n=1}^N \pi ^{(i)}_\nu (\mathbf{u}_n). \end{aligned}$$
Relation (56) allows us to realize [in practice, of course, for finite ranks \({{\mathrm{rank}}}(\mathbf{u})\) and finitely supported mode frames \(\mathbf{U}^{(i)}\)] the best \(N\)-term approximations of the contractions \(\pi ^{(i)}(\mathbf{u})\) through those of the mode frames \(\mathbf{U}^{(i)}_k\). Moreover, expressing coarsening errors in terms of the tails of contraction sequences requires finding good Cartesian index sets. To see how to determine them, consider a nonincreasing rearrangement
$$\begin{aligned} \pi ^{(i_1)}_{ \nu ^{i_1,1}}(\mathbf{u})\ge \pi ^{(i_2)}_{ \nu ^{i_2,2}}(\mathbf{u})\ge \cdots \ge \pi ^{(i_j)}_{ \nu ^{i_j,j}}(\mathbf{u})\ge \cdots ,\quad \nu ^{i_j,j}\in \nabla ^{d_{i_j}}, \end{aligned}$$
(57)
of the entire set of contractions for all tensor modes:
$$\begin{aligned} { \bigl \{\pi ^{(i)}_\nu (\mathbf{u}) : \nu \in \nabla ^{d_i},\, i=1,\ldots ,m \bigr \}. } \end{aligned}$$
Next, retaining only the \(N\) largest terms from the latter total ordering (57) and redistributing them to the respective dimension bins
$$\begin{aligned} \varLambda ^{(i)}(\mathbf{u};N):= \bigl \{\nu ^{i_j,j}: i_j= i,\, j=1,\ldots , N\bigr \}, \quad i=1,\ldots , m, \end{aligned}$$
(58)
the product set
$$\begin{aligned} \varLambda (\mathbf {u};N) := \mathop {\times }\limits _{i=1}^m \varLambda ^{(i)}(\mathbf {u};N) \end{aligned}$$
(59)
can be obtained at a cost that is roughly \(m\) times the analogous low-dimensional cost. By construction, one has
$$\begin{aligned} \sum _{i=1}^m \# \varLambda ^{(i)}({\mathbf {u}};N) \le N \end{aligned}$$
(60)
and
$$\begin{aligned} \sum _{i=1}^m \sum _{\nu \in \nabla ^{d_i}\setminus \varLambda ^{(i)}(\mathbf{u};N)} | \pi ^{(i)}_\nu (\mathbf{u})|^2 = \min _{\hat{\varLambda }}\left\{ \sum _{i=1}^m \sum _{\nu \in \nabla ^{d_i}\setminus \hat{\varLambda }^{(i)} } | \pi ^{(i)}_\nu (\mathbf{u})|^2\right\} , \end{aligned}$$
(61)
where \(\hat{\varLambda }\) ranges over all product sets \(\times _{i=1}^m \hat{\varLambda }^{(i)}\) with \(\sum _{i=1}^m \# \hat{\varLambda }^{(i)} \le N\).
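A possible realization of the binning (57)–(59) is sketched below (hypothetical helper; it assumes the contractions \(\pi ^{(i)}(\mathbf{u})\) are available as finite arrays): all contraction entries are sorted jointly, the \(N\) largest are retained, and their positions are redistributed to the per-mode bins \(\varLambda ^{(i)}(\mathbf{u};N)\).

```python
import numpy as np

def dimension_bins(contractions, N):
    """Index sets Lambda^{(i)}(u; N) of (58), built from the contractions pi^{(i)}(u).
    `contractions` is a list of 1D arrays, one per tensor mode."""
    # Collect (value, mode, index) triples for all entries of all contractions.
    entries = [(val, i, nu)
               for i, pi in enumerate(contractions)
               for nu, val in enumerate(pi)]
    # Nonincreasing rearrangement as in (57); keep the N largest entries.
    entries.sort(key=lambda t: t[0], reverse=True)
    bins = [set() for _ in contractions]
    for _, i, nu in entries[:N]:
        bins[i].add(nu)
    return bins

# Hypothetical example with three modes; the bin sizes sum to at most N, cf. (60).
rng = np.random.default_rng(2)
contractions = [np.sort(rng.random(8))[::-1] for _ in range(3)]
print([sorted(b) for b in dimension_bins(contractions, N=5)])
```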

Proposition 4

For any \(\mathbf {u} \in \mathrm{\ell }_{2}(\nabla ^d)\) one has
$$\begin{aligned} ||\mathbf {u} - \mathrm{R }_{\varLambda (\mathbf {u};N)}\mathbf{u}|| \le \Big ( \sum _{i=1}^m\sum _{\nu \in \nabla ^{d_i}\setminus \varLambda ^{(i)}(\mathbf{u};N)} \bigl |\pi ^{(i)}_\nu (\mathbf {u})\bigr |^2 \Big )^{\frac{1}{2}} =: {\mu }_N(\mathbf{u}) \,, \end{aligned}$$
(62)
and for any \(\hat{\varLambda } = \times _{i=1}^m \hat{\varLambda }^{(i)}\), with \(\hat{\varLambda }^{(i)} \subset \nabla ^{d_i}\) satisfying \(\sum _{i=1}^m \# \hat{\varLambda }^{(i)} \le N\), one has
$$\begin{aligned} ||\mathbf {u} - \mathrm{R }_{\varLambda (\mathbf {u};N)}\mathbf{u}|| \le {\mu }_N(\mathbf{u}) \le \sqrt{m} ||\mathbf {u} - \mathrm{R }_{\hat{\varLambda }}\mathbf {u}|| \,. \end{aligned}$$
(63)

Proof

The bound (62) is immediate from (55). Now let \(\hat{\varLambda }\) be as in the hypothesis; then one obtains by (62) and (61)
$$\begin{aligned} ||\mathbf {u} - \mathrm{R }_{\varLambda (\mathbf {u};N)}\mathbf{u}||^2&\le \sum _{i=1}^m \sum _{\nu \in \nabla ^{d_i}\setminus \hat{\varLambda }^{(i)}} \bigl |\pi ^{(i)}_{\nu }(\mathbf {u})\bigr |^2 = ||\mathbf {u} - \mathrm{R }_{\hat{\varLambda }^{(1)} \times \nabla ^{d_2}\times \cdots \times \nabla ^{d_m}} \mathbf {u}||^2 + \cdots \\&\quad +||\mathbf {u} - \mathrm{R }_{\nabla ^{d_1} \times \cdots \times \nabla ^{d_{m-1}} \times \hat{\varLambda }^{(m)}} \mathbf {u} ||^2 \le m ||\mathbf {u} - \mathrm{R }_{\hat{\varLambda }} \mathbf {u} ||^2\,. \end{aligned}$$
\(\square \)
Note that the sorting (57) used in the construction of (59) can be replaced by a quasi-sorting by binary binning; we shall return to this point in the proof of Remark 11. With the foregoing preparations at hand we define the coarsening operator
$$\begin{aligned} \mathrm{C }_{\mathbf {u}, N} \mathbf {v} := \mathrm{R }_{\varLambda (\mathbf {u};N)} \mathbf {v},\quad \mathbf {v} \in \mathrm{\ell }_{2}(\nabla ^d). \end{aligned}$$
(64)
While \( \mathrm{C }_{\mathbf {u}, N}\) is computationally feasible, it is not necessarily strictly optimal. However, we remark that for each \(N\in \mathrm{I}\!\mathrm{N}\) there exists \(\bar{\varLambda }(\mathbf {u};N) = \times _i \bar{\varLambda }^{(i)}(\mathbf {u};N)\) such that the best tensor coarsening operator
$$\begin{aligned} {{\bar{\mathrm{C }}}}_{\mathbf {u},N} \mathbf {v} := \mathrm{R }_{\bar{\varLambda }(\mathbf {u};N)} \mathbf {v}, \quad \mathbf {v} \in \mathrm{\ell }_{2}(\nabla ^d), \end{aligned}$$
(65)
realizes
$$\begin{aligned} ||\mathbf {u} - {{\bar{\mathrm{C }}}}_{\mathbf {u},N} \mathbf {u} || = \min _{\sum _i \# {{\mathrm{supp}}}_i(\mathbf {w})\le N} ||\mathbf {u} - \mathbf {w}||. \end{aligned}$$
(66)
The next observation is that the contractions are stable under the projections \(\mathrm{P }_{\mathbb {U}^{\mathcal {F}}(\mathbf{u}),{\mathsf {r}}}\), \({\mathcal {F}}\in \{{\mathcal {T}}, {\mathcal {H}}\}\).

Lemma 3

Let \(\mathbf {u}\in \mathrm{\ell }_{2}(\nabla ^d)\) and \({\mathcal {R}}={\mathcal {R}}_{\mathcal {F}}\), as in (44), given by (16), (31), respectively. Then for \(i\in \{1,\ldots ,m\}\), \(\nu \in \nabla ^{d_i}\), and any rank vector \({\mathsf {r}}\in {\mathcal {R}}\), with \({\mathsf {r}}\le {{\mathrm{rank}}}(\mathbf{u})\) componentwise, we have
$$\begin{aligned} \pi ^{(i)}_\nu (\mathrm{P }_{\mathbb {U}({\mathbf{u}}),{{\mathsf {r}}}} \mathbf{u}) \le \pi ^{(i)}_\nu (\mathbf {u}) \,, \end{aligned}$$
where \(\mathbb {U}(\mathbf{u})\) either stands for \(\mathbb {U}^{\mathcal {T}}(\mathbf{u})\) or \(\mathbb {U}^{\mathcal {H}}(\mathbf{u})\) referring to the Tucker and hierarchical Tucker format, respectively; see (25), (30).

Proof

We consider first the Tucker format. Using the orthonormality of the mode frames \(\mathbb {U}(\mathbf{u})\) we obtain
$$\begin{aligned} {\bigl (\pi ^{(i)}_\nu (\mathrm{P }_{\mathbb {U}({\mathbf{u}}),{{\mathsf {r}}}} \mathbf{u}) \bigr )^2 = \sum _{\check{{\mathsf {k}}}_i\in {\mathsf {K}_m}({\check{{\mathsf {r}}}}_i)} \Big ( \sum _{k_i=1}^{r_i} U^{(i)}_{\nu ,k_i} a_{\mathsf {k}} \Big )^2,\quad {\nu \in \nabla ^{d_i}}.} \end{aligned}$$
(67)
For any fixed \(\check{{\mathsf {k}}}_i\) we have
$$\begin{aligned} \sum _{k_i = 1}^{r_i} \sum _{l_i = 1}^{r_i} U^{(i)}_{\nu ,k_i} U^{(i)}_{\nu ,l_i} a_{\check{{\mathsf {k}}}_i|_{k_i}} a_{\check{{\mathsf {k}}}_i|_{l_i}} = \Big ( \sum _{k_i = 1}^{r_i} U^{(i)}_{\nu ,k_i} a_{\check{{\mathsf {k}}}_i|_{k_i}} \Big )^2 \ge 0, \quad {\nu \in \nabla ^{d_i}} . \end{aligned}$$
(68)
Combining this with (67) and abbreviating \({\mathsf {R}} := {{\mathrm{rank}}}(\mathbf{u})\) we obtain
$$\begin{aligned} \bigl (\pi ^{(i)}_\nu (\mathrm{P }_{\mathbb {U}({\mathbf{u}}),{{\mathsf {r}}}} \mathbf{u}) \bigr )^2 \le \sum _{k_i = 1}^{r_i} \sum _{l_i = 1}^{r_i} U^{(i)}_{\nu ,k_i} U^{(i)}_{\nu ,l_i} \sum _{\check{{\mathsf {k}}}_i \in {\mathsf {K}_m}({\check{\mathsf {R}}}_i)} a_{\check{{\mathsf {k}}}_i|_{k_i}} a_{\check{{\mathsf {k}}}_i|_{l_i}}, \quad {\nu \in \nabla ^{d_i}}. \end{aligned}$$
By Theorem 2(ii), the right-hand side equals
$$\begin{aligned} \sum _{k_i=1}^{r_i} \bigl |\sigma ^{(i)}_{k_i} U^{(i)}_{\nu ,k_i}\bigr |^2 \le \bigl ( \pi ^{(i)}_\nu (\mathbf {u}) \bigr )^2. \end{aligned}$$
This proves the assertion for the Tucker format. The proof for the hierarchical Tucker format follows similar lines. We treat additional summations arising in the core tensor in the same way as the summation over \(\check{{\mathsf {k}}}_i\) earlier and recursively apply the same argument as in (68). \(\square \)
As a next step, we shall combine the coarsening procedure of this subsection with the tensor recompression considered earlier. To this end, we define \(N(\mathbf{v},\eta ) := \min \bigl \{ N:{\mu }_N(\mathbf{v}) \le \eta \bigr \}\), where \({\mu }_N\) is defined in (62), as well as
$$\begin{aligned} {{\hat{\mathrm{C}}}_{\eta }} (\mathbf {v}) := \mathrm{C }_{\mathbf{v}, N(\mathbf{v},\eta )} \mathbf {v}, \end{aligned}$$
(69)
in order to switch from \(N\)-term approximation to the corresponding thresholding procedures. As a consequence of (62), we have
$$\begin{aligned} ||\mathbf {v} - \mathrm{C }_{\mathbf {v},N} \mathbf {v}|| {\le {\mu }_N(\mathbf{v}) } \le \kappa _\mathrm{C}||\mathbf {v} - {{\bar{\mathrm{C }}}}_{\mathbf {v},N} \mathbf {v}|| ,\quad \kappa _\mathrm{C}=\sqrt{m}. \end{aligned}$$
(70)
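For a finitely supported \(\mathbf{v}\) the threshold \(N(\mathbf{v},\eta )\) can be read off from the nonincreasing rearrangement of the contraction entries, since \({\mu }_N(\mathbf{v})^2\) is precisely the sum of the squares of all entries not among the \(N\) largest. A minimal sketch (hypothetical helper, consistent with the binning routine above):

```python
import numpy as np

def coarsening_threshold(contractions, eta):
    """Smallest N with mu_N(v) <= eta, cf. (62): mu_N(v)^2 is the sum of the squared
    contraction entries that are not among the N largest ones."""
    vals_sq = np.sort(np.concatenate(contractions) ** 2)[::-1]
    tail_sq = float(np.sum(vals_sq))
    N = 0
    while np.sqrt(max(tail_sq, 0.0)) > eta and N < len(vals_sq):
        tail_sq -= vals_sq[N]
        N += 1
    return N

# C_hat_eta(v) then retains the indices in the bins Lambda^{(i)}(v; N(v, eta)).
```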
Our general assumption on the approximability of mode frames is that \(\pi ^{(i)}(\mathbf {u}) \in {{\mathcal {A}}^s}\), which, as mentioned earlier, reflects the finite Sobolev or Besov regularity of functions whose wavelet coefficients are given by the lower-dimensional tensor factors.

3.2.2 Combination of Tensor Recompression and Coarsening

Recall that we use \(\Vert \cdot \Vert _{{{\mathcal {A}}^s}}\) to quantify the sparsity of the wavelet expansions of mode frames and \(\Vert \cdot \Vert _{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}\) to quantify low-rank approximability. The following main result of this section applies again to both the Tucker and the hierarchical Tucker format. It extends Theorem 6 in combining tensor recompression and wavelet coarsening and shows that both reduction techniques combined are optimal up to uniform constants and stable in the respective sparsity norms.

Theorem 7

For a given \(\mathbf{v}\in \ell _2(\nabla ^d)\) let the mode frame system \(\mathbb {U}(\mathbf{v})\) be either \(\mathbb {U}^{\mathcal {T}}(\mathbf{v})\) or \(\mathbb {U}^{\mathcal {H}}(\mathbf{v})\) (see (25), (30)). Let \(\mathbf {u}, \mathbf {v} \in \mathrm{\ell }_{2}(\nabla ^d)\), with \(\mathbf {u}\in {{\mathcal {A}}_{\mathcal {F}}({\gamma })}\), \(\pi ^{(i)}(\mathbf {u}) \in \mathcal{A}^s\) for \(i=1,\ldots ,m\), and \(||\mathbf {u}-\mathbf {v}|| \le \eta \). As earlier, let \(\kappa _\mathrm{P}= \kappa _\mathrm{C}= \sqrt{m}\) for the Tucker format, while for the \({\mathcal {H}}\)-Tucker format \(\kappa _\mathrm{P}= \sqrt{2m-3}\) and \(\kappa _\mathrm{C}= \sqrt{m}\). Then, for
$$\begin{aligned} \mathbf {w}_{\eta } := {\hat{\mathrm{C}}}_{\kappa _\mathrm{C}(\kappa _\mathrm{P}+1)(1+\alpha )\eta } \bigl ({\hat{\mathrm{P }}}_{\kappa _\mathrm{P}(1+\alpha )\eta } (\mathbf {v}) \bigr ) \end{aligned}$$
(71)
we have
$$\begin{aligned} ||\mathbf{u}- \mathbf{w}_\eta || \le \bigl ( 1 + \kappa _\mathrm{P}(1+\alpha ) + \kappa _\mathrm{C}(\kappa _\mathrm{P}+1)(1+\alpha )\bigr ) \eta , \end{aligned}$$
(72)
as well as
$$\begin{aligned} |{{\mathrm{rank}}}(\mathbf{w}_\eta )|_\infty \le \gamma ^{-1}\bigl (\rho _\gamma ||\mathbf{u}||_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}}/(\alpha \eta )\bigr ),\qquad ||\mathbf{w}_\eta ||_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}} \le C_1 ||\mathbf{u}||_{{{\mathcal {A}}_{\mathcal {F}}({\gamma })}} , \end{aligned}$$
(73)
with \(C_1 = (\alpha ^{-1}(1+\kappa _\mathrm{P}(1+\alpha )) + 1)\) and
$$\begin{aligned} \sum _{i=1}^m \#{{\mathrm{supp}}}_i (\mathbf {w}_\eta )&\le {2 \eta ^{-\frac{1}{s}} m\, \alpha ^{-\frac{1}{s}} } \Big ( \sum _{i=1}^m || \pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}} \Big )^{\frac{1}{s}} \,, \nonumber \\ \quad \sum _{i=1}^m ||\pi ^{(i)}(\mathbf{w}_\eta )||_{{{\mathcal {A}}^s}}&\le C_2 \sum _{i=1}^m ||\pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}}, \end{aligned}$$
(74)
with \(C_2 = 2^s(1+ {3^s}) + 2^{{4s}} { \alpha ^{-1} \bigl ( 1+ \kappa _\mathrm{P}(1+\alpha ) + \kappa _\mathrm{C}(\kappa _\mathrm{P}+ 1)(1+\alpha ) \bigr ) } m^{\max \{1,s\}}\).

Proof

Taking (49) in Lemma 2 and the definition (69) into account, the relation (72) follows from the triangle inequality.

The statements in (73) follow from Theorem 6. Note that the additional mode frame coarsening considered here does not affect these estimates.

For the proof of (74) we can proceed similarly to [9, Corollary 5.2] (see also [8, Theorem 4.9.1]). We set \(\hat{\mathbf{w}} := {\hat{\mathrm{P }}}_{\kappa _\mathrm{P}(1+\alpha )\eta } (\mathbf {v})\). Let \(N \in \mathrm{I}\!\mathrm{N}\) be minimal such that \(||\mathbf{u}- {{\bar{\mathrm{C }}}}_{\mathbf{u},N}\mathbf{u}|| \le \alpha \eta \). Then
$$\begin{aligned} ||\hat{\mathbf{w}} - {{\bar{\mathrm{C }}}}_{\mathbf{u},N} \hat{\mathbf{w}}||&\le ||(\mathrm{I}- {{\bar{\mathrm{C }}}}_{\mathbf{u},N})(\mathbf{u}- \hat{\mathbf{w}})|| + ||\mathbf{u}- {{\bar{\mathrm{C }}}}_{\mathbf{u},N} \mathbf{u}|| \\&\le ||\mathbf{u}- \hat{\mathbf{w}}|| + ||\mathbf{u}- {{\bar{\mathrm{C }}}}_{\mathbf{u},N} \mathbf{u}|| \le \bigl ({1+}\kappa _\mathrm{P}(1+\alpha )+\alpha \bigr )\eta , \end{aligned}$$
where we have used Lemma 2 to bound the first summand on the right-hand side. Consequently, by (70),
$$\begin{aligned} ||\hat{\mathbf{w}} - \mathrm{C }_{\hat{\mathbf{w}},N} \hat{\mathbf{w}}|| \le {\mu }_N(\hat{\mathbf{w}})&\le \kappa _\mathrm{C}||\hat{\mathbf{w}} - {{\bar{\mathrm{C }}}}_{\hat{\mathbf{w}},N} \hat{\mathbf{w}}|| \nonumber \\&\le \kappa _\mathrm{C}||\hat{\mathbf{w}} - {{\bar{\mathrm{C }}}}_{\mathbf{u},N} \hat{\mathbf{w}}|| \le \kappa _\mathrm{C}\bigl ({1+}\kappa _\mathrm{P}(1+\alpha )+\alpha \bigr )\eta .\quad \end{aligned}$$
(75)
Furthermore, note that without loss of generality, we may assume \(N \ge m\). Keeping in mind the definition (65) and the optimality (66), (55) yields
$$\begin{aligned} { \alpha \eta < ||\mathbf{u}- {{\bar{\mathrm{C }}}}_{\mathbf{u},N-1}\mathbf{u}|| }&\le \inf _{\sum _i \#\varLambda _i \le { N-1 }} \Big ( \sum _{i=1}^m ||\pi ^{(i)}(\mathbf{u}) - \mathrm{R }_{\varLambda _i} \pi ^{(i)}(\mathbf{u})||^2 \Big )^{\frac{1}{2}} \\&\le \sum _{i=1}^m \inf _{\#\varLambda _i \le {(N-1)}/m} ||\pi ^{(i)}(\mathbf{u}) - \mathrm{R }_{\varLambda _i} \pi ^{(i)}(\mathbf{u})|| \\&\le \bigl ({(N-1)}/m\bigr )^{-s} \sum _{i=1}^m ||\pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}}\\&{\le 2^s \bigl ({N}/m\bigr )^{-s}\sum _{i=1}^m ||\pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}}}. \end{aligned}$$
Using the latter estimate and noting that, by (75), the coarsening operator \(\hat{\mathrm{C}}_{\kappa _\mathrm{C}(1+\kappa _\mathrm{P})(1+\alpha )\eta }\) retains at most \(N\) terms, we conclude that
$$\begin{aligned} \sum _{i=1}^m \#{{\mathrm{supp}}}_i(\mathbf{w}_\eta ) \le N \le {2m\,\alpha ^{-\frac{1}{s}} \eta ^{-\frac{1}{s}} } \Bigl ( \sum _{i=1}^m || \pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}} \Bigr )^{\frac{1}{s}} \, \end{aligned}$$
(76)
and, hence, the first statement in (74). Now let \(\hat{N} = \sum _{i=1}^m \hat{N}_i\) with \(\hat{N}_i := \#{{\mathrm{supp}}}_i(\mathbf{w}_\eta )\), where we may also assume \(\hat{N}_i > 0\) without loss of generality. Resolving (76) for \(\eta \), one can rewrite (72) as
$$\begin{aligned} ||\mathbf{u}- \mathbf{w}_\eta || \le { {\hat{N}}^{-s} } {C(\alpha )} \, m^s \, \Bigl ( \sum _{i=1}^m ||\pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}} \Bigr ) \,, \end{aligned}$$
(77)
where \(C(\alpha ):= 2^s\alpha ^{-1}(1+ \kappa _\mathrm{P}(1+\alpha ) + \kappa _\mathrm{C}(\kappa _\mathrm{P}+ 1)(1+\alpha ))\). Let \(\hat{\mathbf {u}}_i\) be the best \(\hat{N}_i\)-term approximation to \(\pi ^{(i)}(\mathbf{u})\), then
$$\begin{aligned} ||\pi ^{(i)}(\mathbf{w}_\eta )||_{{{\mathcal {A}}^s}}&\le 2^s \bigl ( ||\hat{\mathbf {u}}_i||_{{{\mathcal {A}}^s}} + ||\hat{\mathbf {u}}_i - \pi ^{(i)}(\mathbf{w}_\eta )||_{{{\mathcal {A}}^s}} \bigr ) \\&\le 2^s \bigl ( ||\pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}} + {(2 \hat{N}_i + 1)^s} ||\hat{\mathbf {u}}_i - \pi ^{(i)}(\mathbf{w}_\eta )|| \bigr ) \\&\le 2^s \bigl ( ||\pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}} \\&\quad + {(2 \hat{N}_i + 1)^s} (||\hat{\mathbf {u}}_i - \pi ^{(i)}(\mathbf{u})|| + ||\pi ^{(i)}(\mathbf{u}) - \pi ^{(i)}(\mathbf{w}_\eta )|| ) \bigr )\\&\le 2^s\bigl ((1+{3^s})||\pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}} + {(2 \hat{N}_i + 1)^s} ||\pi ^{(i)}(\mathbf{u}) - \pi ^{(i)}(\mathbf{w}_\eta )|| \bigr ) \,, \end{aligned}$$
where we have used that \(||\hat{\mathbf {u}}_i - \pi ^{(i)}(\mathbf{u})|| \le {\hat{N}_i}^{-s} ||\pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}}\) and \(\#{{\mathrm{supp}}}(\hat{\mathbf {u}}_i - \pi ^{(i)}(\mathbf{w}_\eta )) \le 2 \hat{N}_i\). Moreover, as a consequence of the Cauchy–Schwarz inequality, we have the componentwise estimate
$$\begin{aligned} |\pi ^{(i)}_\nu (\mathbf{u}) - \pi ^{(i)}_\nu (\mathbf{w}_\eta )| \le \pi ^{(i)}_\nu (\mathbf{u}- \mathbf{w}_\eta ) \,, \end{aligned}$$
which yields
$$\begin{aligned} ||\pi ^{(i)}(\mathbf{u}) - \pi ^{(i)}(\mathbf{w}_\eta )||&\le ||\pi ^{(i)}(\mathbf{u}- \mathbf{w}_\eta )|| = ||\mathbf{u}- \mathbf{w}_\eta ||. \end{aligned}$$
Combining this with (77) we obtain
$$\begin{aligned}&||\pi ^{(i)}(\mathbf{w}_\eta )||_{{{\mathcal {A}}^s}}\le 2^s (1+{3^s})||\pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}}\\&\quad + {2^{s}} C(\alpha )\, m^s { \hat{N}^{-s} } (2 \hat{N}_i + 1)^s \Big (\, \sum _{k=1}^m ||\pi ^{(k)}(\mathbf{u})||_{{{\mathcal {A}}^s}} \Big ). \end{aligned}$$
Summing over \(i=1,\ldots ,m\) and noting that
$$\begin{aligned} {\hat{N}^{-s}} \sum _{i=1}^m {(2\hat{N}_i + 1)^s} \le {2^{2s}} m^{\max \{0,1-s\}}\,, \end{aligned}$$
we arrive at the second assertion in (74). \(\square \)

4 Adaptive Approximation of Operators

Whether the solution to an operator equation actually exhibits some tensor and expansion sparsity is expected to depend strongly on the structure of the involved operator. The purpose of this section is to formulate a class of operators that are tensor-friendly in the sense that their approximate application does not increase ranks too much. Making this precise requires some model assumptions; we believe those adopted here are relevant in that they cover a wide range of interesting cases, although many variants would of course be conceivable as well. In that sense, the main issue in the subsequent discussion is to identify the essential structural mechanisms, which would still work under somewhat different model assumptions.

We shall approach this on two levels. First we consider operators with an exact low-rank structure. Of course, assuming that the operator is a single tensor product of operators acting on functions of a smaller number of variables would be far too restrictive and would, moreover, concern a trivial scenario, since ranks would be preserved. More interesting are sums of tensor products such as the \(m\)-dimensional Laplacian
$$\begin{aligned} \varDelta = \partial _{x_1}^2 + \cdots + \partial _{x_m}^2, \end{aligned}$$
where, strictly speaking, each summand \(\partial _{x_j}^2\) is a tensor product of the identity operators acting on all but the \(j\)th variable with the second-order partial derivative with respect to the \(j\)th variable. Hence the wavelet representation \(\mathbf {A}\) of \(\varDelta \) in an \(L_2\)-orthonormal wavelet basis has the form
$$\begin{aligned} \mathbf {A}= \mathbf {A}_1\otimes \mathrm{I}_2\otimes \cdots \otimes \mathrm{I}_m + \cdots + \mathrm{I}_1\otimes \cdots \otimes \mathrm{I}_{m-1}\otimes \mathbf {A}_m, \end{aligned}$$
(78)
where \(\mathbf {A}_j\) is the wavelet representation of \(\partial _{x_j}^2\). There is, however, an issue concerning the scaling of the wavelet bases: for \(L_2\)-orthonormalized wavelets, \(\mathbf {A}\) is not bounded on \(\ell _2\). This issue will be taken up again later in Remark 18.
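To illustrate the structure (78) computationally, the following Python sketch applies such a Kronecker sum to a rank-one vector factor by factor, so that the full \(n^m\times n^m\) matrix is never assembled. It is a purely finite-dimensional toy: the matrices A[j] are random placeholders rather than actual wavelet representations of second derivatives, and the consistency check against the assembled operator is only feasible for very small m and n.

import numpy as np

# finite-dimensional stand-in: m factors of size n each; the matrices A[j] are
# random placeholders, not actual wavelet representations of second derivatives
m, n = 3, 8
rng = np.random.default_rng(0)
A = [rng.standard_normal((n, n)) for _ in range(m)]
v = [rng.standard_normal(n) for _ in range(m)]       # rank-one input v^1 x ... x v^m

def kron_list(mats):
    out = mats[0]
    for M in mats[1:]:
        out = np.kron(out, M)
    return out

# applying the Kronecker sum factor by factor: the j-th summand only acts on the
# j-th factor, so the result is a sum of m rank-one terms and the full
# n^m x n^m matrix is never needed
terms = [[A[j] @ v[i] if i == j else v[i] for i in range(m)] for j in range(m)]

# consistency check against the assembled operator (feasible only for tiny m, n)
full = sum(kron_list([A[j] if i == j else np.eye(n) for i in range(m)]) for j in range(m))
lhs = full @ kron_list([u.reshape(-1, 1) for u in v]).ravel()
rhs = sum(kron_list([t_i.reshape(-1, 1) for t_i in t]).ravel() for t in terms)
assert np.allclose(lhs, rhs)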

At a second stage it is important to cover also operators that do not have an explicit low-rank structure but can be approximated in a quantified manner by low-rank operators. Typical examples are potential terms, such as those arising in electronic structure calculations (see, e.g., [2] and the references cited therein), as well as rescaled versions of operators of the type (78) mentioned earlier.

4.1 Operators with Explicit Low-Rank Form

We start with a technical observation that will be used at several points. Given operators \(\mathbf {B}^{(i)} = (b^{(i)}_{\nu _i,\tilde{\nu }_i})_{\nu _i,\tilde{\nu }_i\in \nabla ^{d_i}}: \ell _2(\nabla ^{d_i})\rightarrow \ell _2(\nabla ^{d_i})\), recall that their tensor product \(\mathbf {B}= \mathbf {B}^{(1)}\otimes \cdots \otimes \mathbf {B}^{(m)}\) is given by \( \mathbf {B}_{\nu ,\tilde{\nu }}= b^{(1)}_{\nu _1,\tilde{\nu }_1}\cdots b^{(m)}_{\nu _m,\tilde{\nu }_m}\) so that whenever \(\mathbf{v}= \mathbf{v}^1\otimes \cdots \otimes \mathbf{v}^m\), \(\mathbf{v}^j\in \ell _2(\nabla ^{d_j})\), we have \(\mathbf {B}\mathbf{v}= (\mathbf {B}^{(1)}\mathbf{v}^1)\otimes \cdots \otimes (\mathbf {B}^{(m)}\mathbf{v}^m)\). Observing that for any \(\mathbf{v}\in \ell _2(\nabla )\)
$$\begin{aligned} \mathbf {B}\mathbf{v}= \Big (\mathrm{I}_1 \otimes \mathbf {B}^{(2)}\otimes \cdots \otimes \mathbf {B}^{(m)} \Big )\Big ( \big (\mathbf {B}^{(1)} \otimes \mathrm{I}_2 \otimes \cdots \otimes \mathrm{I}_m\big )\mathbf{v}\Big ) , \end{aligned}$$
(79)
we conclude
$$\begin{aligned} ||\mathbf {B}\mathbf{v}|| \le \Vert {\mathbf {B}^{(2)}\otimes \cdots \otimes \mathbf {B}^{(m)}} \,\Vert \Vert { \pi ^{(1)}\bigl (( \mathbf {B}^{(1)} \otimes \mathrm{I}_2 \otimes \cdots \otimes \mathrm{I}_m)\mathbf{v}\bigr )\Vert }. \end{aligned}$$
More generally, one obtains by the same argument
$$\begin{aligned} ||\mathbf {B}\mathbf{v}||&\le \Vert {\mathbf {B}^{(1)}\otimes \cdots \otimes \mathbf {B}^{(i-1)}\otimes \mathbf {B}^{(i+1)}\otimes \cdots \otimes \mathbf {B}^{(m)}}\Vert \nonumber \\&\times \Vert {\pi ^{(i)}\bigl ((\mathrm{I}_1\otimes \cdots \otimes \mathrm{I}_{i-1} \otimes \mathbf {B}^{(i)}\otimes \mathrm{I}_{i+1}\otimes \cdots \otimes \mathrm{I}_{m} )\mathbf{v}\bigr )}\Vert \,. \end{aligned}$$
(80)
Similarly, one derives from (79) the inequality
$$\begin{aligned} \pi ^{(i)}(\mathbf {B}\mathbf{v})_{\nu _i}&\le \big \Vert \mathbf {B}^{(1)}\otimes \cdots \otimes \mathbf {B}^{(i-1)}\otimes \mathbf {B}^{(i+1)}\otimes \cdots \otimes \mathbf {B}^{(m)}\big \Vert \nonumber \\&\times \pi ^{(i)}\big ((\mathrm{I}_1\otimes \cdots \otimes \mathrm{I}_{i-1}\otimes \mathbf {B}^{(i)} \otimes \mathrm{I}_{i+1}\otimes \cdots \otimes \mathrm{I}_m)\mathbf{v}\big )_{\nu _i},\quad \nu _i\in \nabla ^{d_i}.\nonumber \\ \end{aligned}$$
(81)
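As a small numerical sanity check of (80) in the case \(m=3\), \(i=1\), the following sketch uses that \(\pi ^{(1)}(\mathbf{w})_{\nu _1} = \bigl (\sum _{\nu _2,\nu _3} |\mathbf{w}_\nu |^2\bigr )^{1/2}\) and that the spectral norm of a Kronecker product is the product of the spectral norms; all dimensions and matrices are arbitrary placeholders chosen only for illustration.

import numpy as np

rng = np.random.default_rng(1)
n = 6
B1, B2, B3 = (rng.standard_normal((n, n)) for _ in range(3))
V = rng.standard_normal((n, n, n))           # a general (not rank-one) coefficient tensor

BV = np.einsum('ab,cd,ef,bdf->ace', B1, B2, B3, V)   # (B1 x B2 x B3) v
W = np.einsum('ab,bdf->adf', B1, V)                  # (B1 x I x I) v
pi1_W = np.sqrt((W ** 2).sum(axis=(1, 2)))           # mode-1 contraction pi^(1)

lhs = np.linalg.norm(BV)                                       # = ||B v||
rhs = np.linalg.norm(B2, 2) * np.linalg.norm(B3, 2) * np.linalg.norm(pi1_W)
assert lhs <= rhs + 1e-12                                      # inequality (80)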

4.1.1 Tucker Format

We shall be concerned first with (wavelet representations of) operators \(\mathbf {A}= (a_{\nu ,\tilde{\nu }})_{\nu , \tilde{\nu }\in \nabla ^d} : \ell _2(\nabla ^d)\rightarrow \ell _2(\nabla ^d)\) composed of tensor products of operators according to the Tucker format. For a given rank vector \({\mathsf {R}}\in \mathrm{I}\!\mathrm{N}^m\) throughout this section we assume that \(\mathbf {A}:\mathrm{\ell }_{2}(\nabla ^d)\rightarrow \mathrm{\ell }_{2}(\nabla ^d)\) is bounded and has the form
$$\begin{aligned} \mathbf {A} = \sum _{{\mathsf {n}}\in {\mathsf {K}_m}({\mathsf {R}})} c_{\mathsf {n}} \bigotimes _{i=1}^m \mathbf {A}^{(i)}_{n_i}\,, \end{aligned}$$
(82)
where \(\mathbf {A}^{(i)}_{n_i} :\mathrm{\ell }_{2}(\nabla ^{d_i})\rightarrow \mathrm{\ell }_{2}(\nabla ^{d_i})\) for \(i\in \{{1},\ldots ,{m}\}\) and \(n_i\in \{{1},\ldots ,{R_i}\}\).

Example 3

In particular, any operator of the form
$$\begin{aligned} \mathbf {A}_1 \otimes \mathrm{I}_2 \otimes \cdots \otimes \mathrm{I}_m \,+\, \cdots \,+\, \mathrm{I}_1 \otimes \cdots \otimes \mathrm{I}_{m-1} \otimes \mathbf {A}_m \end{aligned}$$
can be written in the form (82) with \({\mathsf {R}} = (2,\ldots ,2)\), \(\mathbf {A}_{1}^{(i)} = \mathrm{I}_i\), \(\mathbf {A}^{(i)}_2=\mathbf {A}_i\) for \(i=1,\ldots , m\), and core tensor
$$\begin{aligned} c_{2,1,\ldots ,1}=\cdots = c_{1,\ldots ,1,2,1,\ldots ,1} = \cdots = c_{1,\ldots ,1,2} = 1, \quad c_{\mathsf {n}} = 0\, \hbox { otherwise.} \end{aligned}$$
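For illustration, the following sketch assembles the core tensor of Example 3 for small \(m\) and verifies that the resulting Tucker representation (82) reproduces the sum of tensor products; the matrices standing in for \(\mathbf {A}_1,\ldots ,\mathbf {A}_m\) are random placeholders.

import numpy as np
from itertools import product

m, n = 3, 5
rng = np.random.default_rng(2)
A_ops = [rng.standard_normal((n, n)) for _ in range(m)]      # stand-ins for A_1,...,A_m
factors = [[np.eye(n), A_ops[i]] for i in range(m)]          # A^(i)_1 = I_i, A^(i)_2 = A_i

# core tensor of Example 3: c_n = 1 iff exactly one index equals 2
c = np.zeros((2,) * m)
for idx in product(range(2), repeat=m):
    if sum(idx) == 1:
        c[idx] = 1.0

def kron_list(mats):
    out = mats[0]
    for M in mats[1:]:
        out = np.kron(out, M)
    return out

A_tucker = sum(c[idx] * kron_list([factors[i][idx[i]] for i in range(m)])
               for idx in product(range(2), repeat=m) if c[idx] != 0)
A_sum = sum(kron_list([A_ops[j] if i == j else np.eye(n) for i in range(m)])
            for j in range(m))
assert np.allclose(A_tucker, A_sum)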
The \(\mathbf {A}^{(i)}_{n_i}\) are in general infinite matrices and not necessarily sparse in the strict sense. We shall require, however, that they be nearly sparse, as will be quantified next. To this end, suppose that for each \(\mathbf {A}^{(i)}_{n_i}\) we have a sequence of approximations \(\tilde{\mathbf {A}}^{(i)}_{n_i,[p]}\) such that, for a given sequence of tolerances \(\varepsilon ^{(i)}_{n_i,p} \), \(p\in \mathrm{I}\!\mathrm{N}_0\), we have in the spectral norm
$$\begin{aligned} ||\mathbf {A}^{(i)}_{n_i} - \tilde{\mathbf {A}}^{(i)}_{n_i,[p]}|| \le \varepsilon ^{(i)}_{n_i,p}, \quad p\in \mathrm{I}\!\mathrm{N}_0. \end{aligned}$$
(83)
Moreover, it will be important to apply such sparsified versions of the \(\mathbf {A}_{n_i}^{(i)}\) to vectors that are supported on the elements of a partition \(\{ \varLambda ^{(i)}_{n_i,[p]} \}_{p\in \mathrm{I}\!\mathrm{N}_0}\) of \(\nabla ^{d_i}\).
We shall then consider approximations \( \tilde{\mathbf {A}}\) to \(\mathbf {A}\) of the form
$$\begin{aligned} \tilde{\mathbf {A}} = \sum _{{\mathsf {n}}\in {\mathsf {K}_m}({\mathsf {R}})} c_{\mathsf {n}} \bigotimes _{i=1}^m \tilde{\mathbf {A}}^{(i)}_{n_i},\quad \tilde{\mathbf {A}}^{(i)}_{n_i} := \sum _{p\in \mathrm{I}\!\mathrm{N}_0} \tilde{\mathbf {A}}^{(i)}_{n_i,[p]} \mathrm{R }_{\varLambda ^{(i)}_{n_i,[p]}}, \end{aligned}$$
(84)
where, as earlier, \(\mathrm{R }_{\varLambda }\) denotes the restriction of a given input sequence to \(\varLambda \). The partitions \(\varLambda _{n_i}^{(i)}\) will later be identified for a class of matrices studied in the context of wavelet methods [9, 36]. In particular, choosing them in dependence on a given input sequence \(\mathbf{v}\) facilitates an adaptive approximate application of \(\mathbf {A}\) to \(\mathbf{v}\). The following lemma describes the accuracy of such approximations.

Lemma 4

Let \(\mathbf {v} \in \mathrm{\ell }_{2}(\nabla ^d)\), and let \(\mathbf {A}:\mathrm{\ell }_{2}(\nabla ^d)\rightarrow \mathrm{\ell }_{2}(\nabla ^d)\) have the form (82) for some \({\mathsf {R}}\in \mathrm{I}\!\mathrm{N}^m\), while \(\tilde{\mathbf {A}}\), given by (84), satisfies (83). Then we have
$$\begin{aligned} ||\mathbf {A}\mathbf {v} - \tilde{\mathbf {A}}\mathbf {v}|| \le \sum _{i=1}^m \sum _{n_i=1}^{R_i} \sum _{p\in \mathrm{I}\!\mathrm{N}_0} C^{(i)}_{\tilde{\mathbf {A}}} \varepsilon ^{(i)}_{n_i,p} \, \Vert {\mathrm{R }_{\varLambda ^{(i)}_{n_i,[p]}}\,\pi ^{(i)}(\mathbf {v})}\Vert , \end{aligned}$$
(85)
where
$$\begin{aligned} C^{(i)}_{\tilde{\mathbf {A}}} = \max _{n_i=1,\ldots ,R_i} \Big \Vert {\sum _{{\check{\mathsf {n}}}_i} c_{\mathsf {n}} \Big (\bigotimes _{j=1}^{i-1} \tilde{\mathbf {A}}^{(j)}_{n_j} \Big ) \otimes \Big ( \bigotimes _{j=i+1}^{m} \mathbf {A}^{(j)}_{n_j} \Big )}\Big \Vert . \end{aligned}$$

Proof

The usual insertion-triangle-inequality argument for estimating differences of products yields, upon using (80) and the definition of the constants \(C^{(i)}_{\tilde{\mathbf {A}}}\),
$$\begin{aligned} ||\mathbf {A}\mathbf {v} - \tilde{\mathbf {A}}\mathbf {v}||&\le \Big \Vert {\sum _{n_1} \Big (\mathbf {A}^{(1)}_{n_1} - \tilde{\mathbf {A}}^{(1)}_{n_1}\Big ) \otimes \Big ( \sum _{\check{{\mathsf {n}}}_1} c_n \mathbf {A}^{(2)}_{n_2} \otimes \cdots \otimes \mathbf {A}^{(m)}_{n_m}\Big ) \, \mathbf {v}}\Big \Vert \\&\quad + \cdots + \Big \Vert {\sum _{n_m} \Big (\sum _{\check{{\mathsf {n}}}_m} c_n \tilde{\mathbf {A}}^{(1)}_{n_1} \otimes \cdots \otimes \tilde{\mathbf {A}}^{(m-1)}_{n_{m-1}}\Big ) \otimes \Big (\mathbf {A}^{(m)}_{n_m} - \tilde{\mathbf {A}}^{(m)}_{n_m}\Big ) \, \mathbf {v}}\Big \Vert \\&\le C^{(1)}_{\tilde{\mathbf {A}}} \sum _{n_1} \Vert {[(\mathbf {A}^{(1)}_{n_1} - \tilde{\mathbf {A}}^{(1)}_{n_1}) \otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}] \mathbf {v}}\Vert \\&\quad + \cdots + C^{(m)}_{\tilde{\mathbf {A}}} \sum _{n_m} \Vert {[\mathrm{I}\otimes \cdots \otimes \mathrm{I}\otimes (\mathbf {A}^{(m)}_{n_m} - \tilde{\mathbf {A}}^{(m)}_{n_m})]\mathbf{v}}\Vert . \end{aligned}$$
Assertion (85) follows now, using (83), from
$$\begin{aligned} ||[(\mathbf {A}^{(1)}_{n_1} - \tilde{\mathbf {A}}^{(1)}_{n_1}) \otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}] \mathbf{v}||&\le \sum _{p} \Vert {[(\mathbf {A}^{(1)}_{n_1} - \tilde{\mathbf {A}}^{(1)}_{n_1,[p]}) \mathrm{R }_{\varLambda ^{(1)}_{n_1,[p]}} \otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}] \mathbf {v}}\Vert \\&\le \sum _p \varepsilon ^{(1)}_{n_1,p} \Vert {\mathrm{R }_{\varLambda ^{(1)}_{n_1,[p]}} \pi ^{(1)}(\mathbf{v})}\Vert \end{aligned}$$
and analogous estimates for the other summands. \(\square \)

Remark 8

The constants \(C^{(i)}_{\tilde{\mathbf {A}}}\), depending on the operator and its approximation, may introduce a dependence on \(m\). In certain cases, this dependence is exponential. For instance, in the case of an operator of the form \(\mathbf {A} = \mathbf {B} \otimes \mathbf {B}\otimes \cdots \otimes \mathbf {B}\), with \(||\tilde{\mathbf {B}}||\le ||\mathbf {B}||\), we obtain \(C^{(i)}_{\tilde{\mathbf {A}}} = ||\mathbf {B}||^{m-1}\). This constant can therefore also strongly depend on an appropriate scaling of the problem under consideration. However, in the case of an operator
$$\begin{aligned} \mathbf {A} = \mathbf {B}\otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}\,+\, \mathrm{I}\otimes \mathbf {B}\otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}\,+\, \cdots \,+\, \mathrm{I}\otimes \cdots \otimes \mathrm{I}\otimes \mathbf {B} \,, \end{aligned}$$
we obtain instead \(C^{(i)}_{\tilde{\mathbf {A}}} \le (m-1) ||\mathbf {B}||\).
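To see where the last bound comes from, note that in the corresponding Tucker representation (as in Example 3, with \(\mathbf {A}^{(i)}_1 = \mathrm{I}\) and \(\mathbf {A}^{(i)}_2 = \mathbf {B}\)) only those \({\check{\mathsf {n}}}_i\) with \(c_{\mathsf {n}} \ne 0\) contribute; assuming again \(||\tilde{\mathbf {B}}|| \le ||\mathbf {B}||\), this gives the rough count
$$\begin{aligned} \Big \Vert {\sum _{{\check{\mathsf {n}}}_i} c_{\mathsf {n}} \Big (\bigotimes _{j=1}^{i-1} \tilde{\mathbf {A}}^{(j)}_{n_j} \Big ) \otimes \Big ( \bigotimes _{j=i+1}^{m} \mathbf {A}^{(j)}_{n_j} \Big )}\Big \Vert \le \left\{ \begin{array}{ll} 1, &{}\quad n_i = 2,\\ (m-1)\, ||\mathbf {B}||, &{}\quad n_i = 1, \end{array}\right. \end{aligned}$$
since for \(n_i = 2\) the only contributing term is \(\bigotimes _{j\ne i} \mathrm{I}\), whereas for \(n_i = 1\) there are \(m-1\) contributing terms, each containing a single factor \(\tilde{\mathbf {B}}\) or \(\mathbf {B}\) and identities otherwise. Hence \(C^{(i)}_{\tilde{\mathbf {A}}} \le \max \{1, (m-1)||\mathbf {B}||\}\).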

Definition 7

Let \(\varLambda \) be a countable index set, and let \(s^* > 0\). We call an operator \(\mathbf {B}:\mathrm{\ell }_{2}(\varLambda )\rightarrow \mathrm{\ell }_{2}(\varLambda )\) \(s^*\)-compressible if for any \(0 < s < s^*\) there exist summable positive sequences \((\alpha _j)_{j\ge 0}\), \((\beta _j)_{j\ge 0}\) and for each \(j\ge 0\) there exists \(\mathbf {B}_j\) with at most \(\alpha _j 2^j\) nonzero entries per row and column such that \(||\mathbf {B} - \mathbf {B}_j|| \le \beta _j 2^{-s j} \). For a given \(s^*\)-compressible operator \(\mathbf {B}\) we denote the corresponding sequences by \(\alpha (\mathbf {B})\), \(\beta (\mathbf {B})\).

Moreover, we say that a family of operators \(\{ \mathbf {B}(n) \}_n\) is equi-\(s^*\)-compressible if all \(\mathbf {B}(n)\) are \(s^*\)-compressible with the same choice of sequences \((\alpha _j)\), \((\beta _j)\) and, in addition, for all \(\lambda \in \varLambda \) the number of nonzero elements in the rows and columns of the approximations \(\mathbf {B}(n)_j\) can be estimated jointly for all \(n\) in the form
$$\begin{aligned} \# \Big ( \bigcup _n \bigl \{ \lambda '\in \varLambda :(\mathbf {B}(n)_j)_{\lambda ,\lambda '} \ne 0 \vee (\mathbf {B}(n)_j)_{\lambda ',\lambda } \ne 0 \bigr \} \Big ) \le \alpha _j 2^j \,. \end{aligned}$$

Example 4

To give a structural example, let us assume that \(\{ \psi _\lambda \}_{\lambda \in \nabla }\) is an orthonormal wavelet basis on \(\mathbb {R}\). As earlier, let \(|\lambda |\) denote the level of the basis function \(\psi _\lambda \). For \(c,\sigma , \beta >0\) we denote by \(\mathcal {M}_{c, \sigma ,\beta }\) the class of infinite matrices for which
$$\begin{aligned} |b_{\lambda ,\lambda '}| \le c \, 2^{-||\lambda |-|\lambda '|| \sigma } \bigl ( 1 + {{\mathrm{dist}}}({{\mathrm{supp}}}\psi _\lambda , {{\mathrm{supp}}}\psi _{\lambda '}) \bigr )^{-\beta } \,. \end{aligned}$$
Such bounds are known to hold, for instance, for wavelet representations of the double layer potential operator. Again, with a suitable rescaling of the wavelets, the representations of other potential types, as well as elliptic partial differential operators, exhibit the same decay of entries. It is shown in [9, Proposition 3.4] that (when specialized to the present case of one-dimensional factors) any \(\mathbf {B} \in \mathcal {M}_{c,\sigma ,\beta }\) with \(\sigma > 1/2\), \(\beta > 1\) is \(s^*\)-compressible with \(s^* = \min \{\sigma -1/2, \beta -1\}\).

If \(\mathbf {B}(n) \in \mathcal {M}_{c(n),\sigma (n),\beta (n)}\) with \(c(n)\) and \(\sigma (n)^{-1},\beta (n)^{-1}\) uniformly bounded, then from the construction in the proof of [9, Proposition 3.4] it can be seen that the \(\mathbf {B}(n)\) are equi-\(s^*\)-compressible with \(s^* = \min \{\inf _n\sigma (n)-1/2, \inf _n\beta (n)-1\}\) since the same set of nonzero matrix entries can be used for each \(n\).
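The interplay between the per-row budget \(\alpha _j 2^j\) and the error tolerance \(\beta _j 2^{-sj}\) in Definition 7 can be illustrated by a single-scale caricature: for a matrix with off-diagonal decay \(|b_{i,k}| \le (1+|i-k|)^{-\beta }\), \(\beta >1\), truncation to a band of half-width \(2^j\) leaves at most \(2\cdot 2^j+1\) nonzero entries per row and column and, by a Schur-type estimate, a spectral error of order \(2^{-(\beta -1)j}\). The following sketch (a finite section, with the decay model and parameters chosen for illustration only, not taken from [9, 36]) makes this visible numerically.

import numpy as np

# Single-scale caricature of Definition 7: a matrix with off-diagonal decay
# |b_{ik}| <= (1 + |i-k|)^{-beta} is compressed by banded truncation B_j.
N, beta, s = 512, 2.0, 1.0                       # here s = beta - 1 plays the role of s*
i = np.arange(N)
B = (1.0 + np.abs(i[:, None] - i[None, :])) ** (-beta)

for j in range(1, 8):
    Bj = np.where(np.abs(i[:, None] - i[None, :]) <= 2 ** j, B, 0.0)  # <= 2*2^j+1 per row
    err = np.linalg.norm(B - Bj, 2)              # spectral norm of the discarded part
    print(j, err, err * 2.0 ** (s * j))          # last column stays roughly bounded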

The key property of \(s^*\)-compressible matrices in the context of adaptive methods is that they are bounded not only on \(\ell _2\) but also on the smaller approximation classes \({{\mathcal {A}}^s}\), \(0<s<s^*\), and thus preserve sparsity in a quantifiable manner. We next establish analogous concepts for the tensor setting.

To this end, assume that the components \(\mathbf {A}_{n_i}^{(i)}\) of \(\mathbf {A}\), given by (82), are \(s^*\)-compressible, and let \(\mathbf {A}^{(i)}_{n_i,j}\) be the corresponding approximations according to Definition 7. Quite in the spirit of the adaptive application of an operator in wavelet coordinates (see [9]), for approximating \(\mathbf {A}\mathbf{v}\) for a given \(\mathbf{v}\in \ell _2(\nabla ^d)\), a priori knowledge about \(\mathbf {A}\) in terms of \(s^*\)-compressibility is combined with a posteriori information on \(\mathbf{v}\). Given \(\mathbf{v}\in \ell _2(\nabla ^d)\), we now describe how to construct, for any \(J\in \mathrm{I}\!\mathrm{N}\), approximations \(\mathbf{w}_J\) to the sequence \(\mathbf {A}\mathbf{v}\). For each \(i\) and \(j\in \mathrm{I}\!\mathrm{N}_0\), let \(\bar{\varLambda }^{(i)}_{j}\) be the support of a best \(2^j\)-term approximation of \(\pi ^{(i)}(\mathbf {v})\), chosen such that, in particular, \(\bar{\varLambda }^{(i)}_p \subset \bar{\varLambda }^{(i)}_{p+1}\). If \(\mathbf {A}^{(i)}_{n_i} = \mathrm{I}\), we simply set \(\tilde{\mathbf {A}}^{(i)}_{n_i} = \mathrm{I}\). If \(\mathbf {A}^{(i)}_{n_i} \ne \mathrm{I}\), then we let \(\bar{\varLambda }^{(i)}_{-1}:=\emptyset \) and
$$\begin{aligned} \varLambda ^{(i)}_{[p]} := \left\{ \begin{array}{ll} \bar{\varLambda }^{(i)}_{p} \setminus \bar{\varLambda }^{(i)}_{p-1}, &{}\quad p=0,\ldots ,J ,\\ \nabla ^{d_i} \setminus \bar{\varLambda }^{(i)}_J, &{}\quad p=J+1,\\ \emptyset , &{}\quad p>J+1. \end{array}\right. \end{aligned}$$
(86)
Moreover, let
$$\begin{aligned} \tilde{\mathbf {A}}^{(i)}_{n_i,[p]} := \left\{ \begin{array}{ll} \mathbf {A}^{(i)}_{n_i,J-p},&{}\quad p=0,\ldots ,J,\\ 0, &{}\quad p> J. \end{array}\right. \end{aligned}$$
(87)
Note that, due to the particular choice of the sets \(\varLambda ^{(i)}_{[p]}\), the operators \(\tilde{\mathbf {A}}^{(i)}_{n_i}\) formed according to (84) depend on the sequence \(\mathbf{v}\). However, as a simple consequence of Definition 7, the \(\tilde{\mathbf {A}}^{(i)}_{n_i}\) are bounded independently of \(\mathbf {v}\).
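The bookkeeping behind (86) and (87) can be sketched as follows: the support of \(\pi ^{(i)}(\mathbf {v})\) is split into dyadic layers of nested best \(2^p\)-term supports, and layer \(p\) is then paired with the compressed matrix \(\mathbf {A}^{(i)}_{n_i,J-p}\), so that the largest coefficients meet the most accurate (and most expensive) approximation. The sketch works with a plain one-dimensional array in place of \(\pi ^{(i)}(\mathbf {v})\) and leaves the actual application of the compressed matrices as a comment.

import numpy as np

def layers_best_2j(pi_v, J):
    # Lambda_[p] as in (86): support of a best 2^p-term approximation of pi_v,
    # minus the previous layers; the last entry collects the remainder (p = J+1).
    order = np.argsort(-np.abs(pi_v))          # indices by decreasing magnitude
    layers, prev = [], 0
    for p in range(J + 1):
        size = min(2 ** p, len(order))
        layers.append(order[prev:size])
        prev = size
    layers.append(order[prev:])
    return layers

pi_v = 1.0 / (1.0 + np.arange(200.0)) ** 2      # generic decaying mode contraction
for p, lam in enumerate(layers_best_2j(pi_v, J=4)):
    print(p, len(lam), np.linalg.norm(pi_v[lam]))

# Pairing (87): layer p of the input meets the compressed matrix A_{J-p}, i.e.
#   w = sum_{p=0}^{J} A_compressed[J - p] @ restrict(v, layers[p]),
# while the remainder layer p = J+1 is dropped; its contribution is the
# ||A^{(i)}_{n_i}|| term in the error bound (91).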

Lemma 5

Assume that the components \(\mathbf {A}_{n_i}^{(i)}\) of \(\mathbf {A}\) as in (82) are \(s^*\)-compressible. Given any \(\mathbf{v}\in \ell _2(\nabla ^d)\), \(J\in \mathrm{I}\!\mathrm{N}\), let
$$\begin{aligned} \tilde{\mathbf {A}}_J := \sum _{{\mathsf {n}}\in {\mathsf {K}_m}({\mathsf {R}})} c_{\mathsf {n}} \bigotimes _{i=1}^m \tilde{\mathbf {A}}^{(i)}_{n_i},\quad \tilde{\mathbf {A}}^{(i)}_{n_i} := \sum _{p\in \mathrm{I}\!\mathrm{N}_0} \tilde{\mathbf {A}}^{(i)}_{n_i,[p]} \mathrm{R }_{\varLambda ^{(i)}_{n_i,[p]}}\, \end{aligned}$$
with \(\varLambda ^{(i)}_{n_i,[p]} := \varLambda ^{(i)}_{[p]}\) and \(\tilde{\mathbf {A}}^{(i)}_{n_i,[p]}\) defined by (86) and (87), respectively. Then, whenever \(\pi ^{(i)}(\mathbf{v})\in \mathcal {A}^s\) for some \(0<s<s^*\), the finitely supported sequence \(\mathbf{w}_J := \tilde{\mathbf {A}}_J \mathbf{v}\) satisfies
$$\begin{aligned} ||\mathbf {A}\mathbf{v}- \tilde{\mathbf {A}}_J\mathbf {v}|| \le { 2^{-s (J -1)} } \sum _{i = 1}^m C^{(i)}_{\tilde{\mathbf {A}}} R_i \bigl ( \max _{n} ||\mathbf {A}^{(i)}_{n}|| + ||\hat{\beta }^{(i)}||_{\mathrm{\ell }_{1}} \bigr ) ||\pi ^{(i)}(\mathbf {v})||_{{\mathcal {A}}^s}, \end{aligned}$$
(88)
as well as
$$\begin{aligned} \#{{\mathrm{supp}}}_i (\tilde{\mathbf {A}}_J\mathbf {v}) \le R_i ||\hat{\alpha }^{(i)}||_{\mathrm{\ell }_{1}} 2^J \,, \end{aligned}$$
(89)
where the sequences \(\hat{\alpha }\), \(\hat{\beta }\) are defined as the componentwise maxima of the sequences in Definition 7 for each \(\mathbf {A}^{(i)}_{n_i}\), that is,
$$\begin{aligned} \hat{\alpha }^{(i)}_j := \max _{n} \alpha _j(\mathbf {A}^{(i)}_n),\quad \hat{\beta }^{(i)}_j := \max _{n} \beta _j(\mathbf {A}^{(i)}_n). \end{aligned}$$
(90)

Proof

We apply Lemma 4 with \(\tilde{\mathbf {A}}^{(i)}_{n_i,[p]}\), defined in (87), and \(\varLambda ^{(i)}_{n_i,[p]} := \varLambda ^{(i)}_{[p]}\), according to (86). By \(s^*\)-compressibility, we have \( ||\mathbf {A}^{(i)}_{n_i} - \tilde{\mathbf {A}}^{(i)}_{n_i,[p]}|| \le \hat{\beta }^{(i)}_{J-p} 2^{- s(J-p)} =: \varepsilon ^{(i)}_{n_i,p}\) for \(p=0,\ldots ,J\), as well as \(||\mathbf {A}^{(i)}_{n_i} - \tilde{\mathbf {A}}^{(i)}_{n_i,[J+1]}|| = ||\mathbf {A}^{(i)}_{n_i}||\) and \(||\mathrm{R }_{\varLambda ^{(i)}_{[p]}} \pi ^{(i)}(\mathbf{v})|| = 0\) for \(p>J+1\), and therefore
$$\begin{aligned} ||\mathbf {A}\mathbf{v}- \mathbf{w}_J||&\le \sum _{i=1}^m \sum _{n_i=1}^{R_i} C^{(i)}_{\tilde{\mathbf {A}}}\Bigg \{\sum _{p=0}^{J} \hat{\beta }^{(i)}_{J-p} 2^{- s(J-p)}\, \Vert {\mathrm{R }_{\varLambda ^{(i)}_{n_i,[p]}}\,\pi ^{(i)}(\mathbf {v})}\Vert \nonumber \\&+ \, ||\mathbf {A}^{(i)}_{n_i}|| \Vert { \mathrm{R }_{\varLambda ^{(i)}_{n_i,[J+1]}}\pi ^{(i)}(\mathbf{v})}_{}\Vert \Bigg \}. \end{aligned}$$
(91)
By the choice of the \(\varLambda ^{(i)}_{[p]}\) and the definition of \(||\cdot ||_{{{\mathcal {A}}^s}}\), we obtain \(||\mathrm{R }_{\varLambda ^{(i)}_{[p]}} \pi ^{(i)}(\mathbf{v})|| \le 2^{-s(p-1)} ||\pi ^{(i)}(\mathbf{v})||_{{{\mathcal {A}}^s}}\) for \(p = 0,\ldots , J+1\), which confirms (88). Furthermore,
$$\begin{aligned} \#{{\mathrm{supp}}}_i (\tilde{\mathbf {A}}_J \mathbf {v}) \le R_i(\hat{\alpha }^{(i)}_{J} 2^{J} 2^0 + \hat{\alpha }^{(i)}_{J-1} 2^{J-1} 2^1 + \cdots + \hat{\alpha }^{(i)}_0 2^0 2^J) \le R_i ||\hat{\alpha }^{(i)}||_{\mathrm{\ell }_{1}} 2^J \,, \end{aligned}$$
(92)
which is (89). \(\square \)

Remark 9

Whenever \(\mathbf{v}\) is finitely supported, there exists a \(p(\mathbf{v})\in \mathrm{I}\!\mathrm{N}_0\) such that \(\varLambda ^{(i)}_{[p]}=\emptyset \) for \(i=1,\ldots ,m\), \(p>p(\mathbf{v})\). Hence, the right-hand side of (91) can be computed for each \(J\in \mathrm{I}\!\mathrm{N}_0\), where the sum over \(p\) terminates for \(J\ge p(\mathbf{v})\) at \(p(\mathbf{v})\). Further increasing \(J\) will then decrease all summands on the right-hand side of (91). Therefore, fixing any \(s <s^*\) (close to \(s^*\)), we can find for any \(\eta >0\) the integer \(J(\eta )\) defined by
$$\begin{aligned} J(\eta )&:= \mathop {\mathrm{arg}\; \mathrm{min}}_{J\in \mathrm{I}\!\mathrm{N}_0} \Bigg \{ \sum _{i=1}^m \sum _{n_i=1}^{R_i} C^{(i)}_{\tilde{\mathbf {A}}}\Big \{ \sum _{p=0}^{J} \hat{\beta }^{(i)}_{J-p} 2^{- s(J-p)}\, \Vert {\mathrm{R }_{\varLambda ^{(i)}_{n_i,[p]}}\,\pi ^{(i)}({\mathbf {v}})}\Vert \nonumber \\&+ \, ||\mathbf {A}^{(i)}_{n_i}|| \Vert { \mathrm{R }_{\varLambda ^{(i)}_{n_i,[J+1]}}\pi ^{(i)}(\mathbf{v})}\Vert \Big \} \le \eta \Bigg \}. \end{aligned}$$
(93)
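Since all quantities in (93) are computable for finitely supported \(\mathbf{v}\), the integer \(J(\eta )\) can be found by simply increasing \(J\) until the bound drops below \(\eta \). A minimal sketch for a single mode \(i\) and a single summand \(n_i\) (the full bound sums the same expression over \(i\) and \(n_i\)) could look as follows; here layer_norms, beta_hat, op_norm, and C stand for \(||\mathrm{R }_{\varLambda _{[p]}}\pi ^{(i)}(\mathbf{v})||\), \(\hat{\beta }^{(i)}\), \(||\mathbf {A}^{(i)}_{n_i}||\), and \(C^{(i)}_{\tilde{\mathbf {A}}}\), respectively.

import numpy as np

def J_eta(layer_norms, beta_hat, op_norm, C, s, eta, J_max=60):
    # Smallest J for which the computable bound from (93) drops below eta.
    # layer_norms[p] plays the role of ||R_{Lambda_[p]} pi(v)|| for the maximal
    # dyadic layering of the finitely supported input; beta_hat must be long enough.
    P = len(layer_norms)
    for J in range(J_max + 1):
        acc = 0.0
        for p in range(min(J, P - 1) + 1):
            acc += beta_hat[J - p] * 2.0 ** (-s * (J - p)) * layer_norms[p]
        tail = np.asarray(layer_norms[J + 1:])   # layers outside the best 2^J terms
        acc += op_norm * np.sqrt(np.sum(tail ** 2))
        if C * acc <= eta:
            return J
    return J_max

# e.g. geometrically decaying layer norms and beta_hat[j] = 2^{-j}:
print(J_eta([2.0 ** (-p) for p in range(12)], [2.0 ** (-j) for j in range(61)],
            op_norm=1.0, C=1.0, s=0.5, eta=1e-3))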
To further examine the properties of \(\tilde{\mathbf {A}}_{J(\eta )}\mathbf{v}\) for a given finitely supported \(\mathbf{v}\), let
$$\begin{aligned} C_{\hat{\alpha }}^{(i)} := ||\hat{\alpha }^{(i)}||_{\mathrm{\ell }_{1}}, \quad C_{\hat{\beta }}^{(i)} := \bigl ( \max _{n} ||\mathbf {A}^{(i)}_{n}|| + ||\hat{\beta }^{(i)}||_{\mathrm{\ell }_{1}} \bigr ) \,. \end{aligned}$$
(94)

Theorem 8

Under the assumptions of Lemma 5 on \(\mathbf {A}\) and any given finitely supported \(\mathbf{v}\in \ell _2(\nabla ^d)\), for any \(\eta >0\) let
$$\begin{aligned} \mathbf{w}_\eta := \tilde{\mathbf {A}}_{J(\eta )}\mathbf{v}=: \tilde{\mathbf {A}}_\eta \mathbf{v}, \end{aligned}$$
(95)
where \(J(\eta )\) is defined by (93). Then
$$\begin{aligned} ||\mathbf {A}\mathbf {v} - \tilde{\mathbf {A}}_\eta \mathbf {v}||&\le \eta , \end{aligned}$$
(96)
$$\begin{aligned} \# {{\mathrm{supp}}}_i (\tilde{\mathbf {A}}_\eta \mathbf {v})&\le { 4 } \,C_{\hat{\alpha }}^{(i)}\, R_i\, \eta ^{-\frac{1}{s}}\, \Big ( \sum _{j=1}^m C^{(j)}_{\hat{\beta }} C^{(j)}_{\tilde{\mathbf {A}}} R_j ||\pi ^{(j)}(\mathbf {v})||_{{\mathcal {A}}^s} \Big )^{\frac{1}{s}} \,, \end{aligned}$$
(97)
$$\begin{aligned} ||\pi ^{(i)}(\tilde{\mathbf {A}}_\eta \mathbf {v})||_{{\mathcal {A}}^s}&\le \frac{{2^{3s+2}}}{2^s-1} \,{\bigl (C^{(i)}_{\hat{\alpha }}\bigr )}^s C^{(i)}_{\hat{\beta }} C^{(i)}_{\tilde{\mathbf {A}}} \,R_i^{1+s}\, ||\pi ^{(i)}(\mathbf {v})||_{{\mathcal {A}}^s} \end{aligned}$$
(98)
for all \(i=1,\ldots ,m\), where \(C^{(i)}_{\tilde{\mathbf {A}}}\) is as in Lemma 4, and the constants \(C^{(i)}_{\hat{\alpha }}\), \(C^{(i)}_{\hat{\beta }}\) are defined by (94) and are independent of \(\mathbf {v}\), \(\eta \), and \(m\). Moreover,
$$\begin{aligned} {{{\mathrm{rank}}}_i(\tilde{\mathbf {A}}_\eta \mathbf {v})} \le R_i {{\mathrm{rank}}}_i(\mathbf {v}),\quad i=1,\ldots , m. \end{aligned}$$
(99)

Proof

(96) follows from (93). The bound (97) is an immediate consequence of (89). Choosing for a given finitely supported \(\mathbf{v}\) the mode frame system \(\mathbb {U}=\mathbb {U}(\mathbf{v})\) according to the HOSVD, (99) is clear since with \(\mathbf {U}^{(i)}\) and \(\mathbf {a}\) as in Lemma 4 we obtain
$$\begin{aligned} { \tilde{\mathbf {A}}_\eta } \mathbf {v} = \sum _{{\mathsf {n}} \in {\mathsf {K}_m}({\mathsf {R}})} \sum _{{\mathsf {k}}\in \mathrm{I}\!\mathrm{N}^m} \mathbf {d}_{(n_1,k_1),\ldots ,(n_m,k_m)} \bigotimes _{i=1}^m \tilde{\mathbf {A}}^{(i)}_{n_i} \mathbf {U}^{(i)}_{k_i} \,, \end{aligned}$$
(100)
where \(\mathbf {d}_{(n_1,k_1),\ldots ,(n_m,k_m)} = c_{\mathsf {n}} a_{\mathsf {k}}\).
Without loss of generality it suffices to prove (98) only for \(i=1\), which allows us to temporarily simplify the notation by writing \(\varLambda _{[p]}\) for \(\varLambda ^{(1)}_{[p]}\). Note first that for each \(\nu _1\in \nabla ^{d_1}\), using Proposition 3 followed by (81) and (56), we obtain
$$\begin{aligned} \pi ^{(1)}_{\nu _1}({ \tilde{\mathbf {A}}_\eta } \mathbf {v})&\le C^{(1)}_{\tilde{\mathbf {A}}} \sum _{n_1=1}^{R_1} \pi ^{(1)}_{\nu _1}(\tilde{\mathbf {A}}^{(1)}_{n_1} \otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}\,\mathbf {v}) \nonumber \\&= C^{(1)}_{\tilde{\mathbf {A}}} \sum _{n_1=1}^{R_1} \Big (\sum _k \bigl |\sigma ^{(1)}_k\bigr |^2 \bigl |(\tilde{\mathbf {A}}^{(1)}_{n_1} \mathbf {U}^{(1)}_k)_{\nu _1}\bigr |^2 \Big )^\frac{1}{2} \,, \end{aligned}$$
(101)
where we used (56) in the last step for the mode frame system \(\mathbb {U}(\mathbf{v})\) from Theorem 2. To bound next the terms on the right-hand side of (101), let
$$\begin{aligned} \hat{\varLambda }_{n_1,[0]}&:= {{\mathrm{supp}}}{{\mathrm{range}}}\mathbf {A}^{(1)}_{n_1,0} \mathrm{R }_{\varLambda _{[0]}} \,, \\ \hat{\varLambda }_{n_1,[q]}&:= \Big ( \bigcup _{j+\ell = q} {{\mathrm{supp}}}{{\mathrm{range}}}\mathbf {A}^{(1)}_{n_1,j} \mathrm{R }_{\varLambda _{[\ell ]}} \Big ) \setminus \Big ( \bigcup _{i < q} \hat{\varLambda }_{n_1,[i]} \Big ),\quad q > 0. \end{aligned}$$
By the same argument as in (92), we also obtain
$$\begin{aligned} \#\hat{\varLambda }_{n_1,[q]} \le ||\hat{\alpha }^{(1)}||_{\mathrm{\ell }_{1}} 2^q. \end{aligned}$$
(102)
For \(q=0,\ldots ,J\) and each \(k\) we have
$$\begin{aligned} ||\mathrm{R }_{\hat{\varLambda }_{n_1,[q]}} \tilde{\mathbf {A}}^{(1)}_{n_1} \mathbf {U}^{(1)}_k|| \le {\sum _{\ell = 0}^{q-1}} ||\mathrm{R }_{\hat{\varLambda }_{n_1,[q]}} \tilde{\mathbf {A}}^{(1)}_{n_1} \mathrm{R }_{\varLambda _{[\ell ]}} \mathbf {U}^{(1)}_k|| { + \Big \Vert {\tilde{\mathbf {A}}^{(1)}_{n_1} \sum _{\ell = q}^J \mathrm{R }_{\varLambda _{[\ell ]}} \mathbf {U}^{(1)}_k} }\Big \Vert , \end{aligned}$$
On the one hand, by (87), we obtain for \(\ell =0,\ldots ,q-1\),
$$\begin{aligned} \mathrm{R }_{\hat{\varLambda }_{n_1,[q]}} \tilde{\mathbf {A}}^{(1)}_{n_1} \mathrm{R }_{\varLambda _{[\ell ]}} = \mathrm{R }_{\hat{\varLambda }_{n_1,[q]}} \Big ( \mathbf {A}^{(1)}_{n_1, J-\ell } - \mathbf {A}^{(1)}_{n_1, {q-\ell -1}}\Big ) \mathrm{R }_{\varLambda _{[\ell ]}}, \end{aligned}$$
and hence
$$\begin{aligned}&||\mathrm{R }_{\hat{\varLambda }_{n_1,[q]}} \tilde{\mathbf {A}}^{(1)}_{n_1} \mathrm{R }_{\varLambda _{[\ell ]}}\mathbf {U}^{(1)}_k|| \\&\quad \le \bigl ( ||\mathbf {A}^{(1)}_{n_1} - \mathbf {A}^{(1)}_{n_1, J-\ell }|| + ||\mathbf {A}^{(1)}_{n_1} - \mathbf {A}^{(1)}_{n_1, q-\ell - 1}|| \bigr ) ||\mathrm{R }_{\varLambda _{[\ell ]}} \mathbf {U}^{(1)}_k|| \\&\quad \le \bigl ( \hat{\beta }^{(1)}_{J-\ell } 2^{-s(J-\ell )} + \hat{\beta }^{(1)}_{q-\ell -1} 2^{-s(q-\ell -1)} \bigr ) ||\mathrm{R }_{\varLambda _{[\ell ]}} \mathbf {U}^{(1)}_k|| \\&\quad \le \gamma _\ell 2^{-s(q-\ell -1)} ||\mathrm{R }_{\varLambda _{[\ell ]}} \mathbf {U}^{(1)}_k||, \end{aligned}$$
where we abbreviate \(\gamma _\ell := \hat{\beta }^{(1)}_{J-\ell }+\hat{\beta }^{(1)}_{q-\ell -1}\). On the other hand,
$$\begin{aligned} \Big \Vert {\tilde{\mathbf {A}}^{(1)}_{n_1} \sum _{\ell = q}^J \mathrm{R }_{\varLambda _{[\ell ]}} \mathbf {U}^{(1)}_k}&\Bigr \Vert \le \Big \Vert {\sum _{\ell = q}^J \bigl [(\mathbf {A}^{(1)}_{n_1,J-\ell } - \mathbf {A}^{(1)}_{n_1}) + \mathbf {A}^{(1)}_{n_1}\bigr ]\mathrm{R }_{\varLambda _{[\ell ]}} \mathbf {U}^{(1)}_k}\Big \Vert \\&\le \sum _{\ell = q}^J {\hat{\beta }}^{(1)}_{J-\ell } 2^{-s(J-\ell )} ||\mathrm{R }_{\varLambda _{[\ell ]}} \mathbf {U}^{(1)}_k|| + ||\mathbf {A}^{(1)}_{n_1}|| \Vert {\mathrm{R }_{\bigcup _{j\ge q}\varLambda _{[j]} } \mathbf {U}^{(1)}_k}\Vert . \end{aligned}$$
Combining these estimates and applying the Cauchy–Schwarz inequality yields
$$\begin{aligned}&||\mathrm{R }_{\hat{\varLambda }_{n_1,[q]}} \tilde{\mathbf {A}}^{(1)}_{n_1} \mathbf {U}^{(1)}_k|| \\&\quad \le \Big ( 3 ||\hat{\beta }^{(1)}||_{\mathrm{\ell }_{1}}+ ||\mathbf {A}^{(1)}_{n_1}|| \Big )^\frac{1}{2} \Big ( \sum _{\ell =0}^{q-1} \gamma _\ell 2^{-2s(q-\ell -1)} ||\mathrm{R }_{\varLambda _{[\ell ]}} \mathbf {U}^{(1)}_k||^2 \\&\quad \quad + \sum _{\ell =q}^J \hat{\beta }^{(1)}_{J-\ell } 2^{-2s(J-\ell )} ||\mathrm{R }_{\varLambda _{[\ell ]}} \mathbf {U}^{(1)}_k||^2 + ||\mathbf {A}^{(1)}_{n_1}|| \Vert {\mathrm{R }_{\bigcup _{j\ge q}\varLambda _{[j]}} \mathbf {U}^{(1)}_k} \Vert ^2 \Big )^\frac{1}{2} \,. \end{aligned}$$
Again using (56) as in (101) leads to
$$\begin{aligned} ||\mathrm{R }_{\hat{\varLambda }_{n_1,[q]}} \pi ^{(1)}(\tilde{\mathbf {A}}^{(1)}_{n_1} \otimes \mathrm{I}_2 \otimes \cdots \otimes \mathrm{I}_m\, \mathbf {v})||^2&\le 3C^{(1)}_{\hat{\beta }} \Big ( \sum _{\ell =0}^{q-1} \gamma _\ell 2^{-2s (q -\ell -1)} ||\mathrm{R }_{\varLambda _{[\ell ]}} \pi ^{(1)}(\mathbf{v})||^2 \\&+ \sum _{\ell =q}^J \hat{\beta }^{(1)}_{J-\ell } 2^{-2s(J-\ell )} ||\mathrm{R }_{\varLambda _{[\ell ]}} \pi ^{(1)}(\mathbf{v})||^2\\&+\, ||\mathbf {A}^{(1)}_{n_1}|| \Vert {\mathrm{R }_{\bigcup _{j\ge q}\varLambda _{[j]}} \pi ^{(1)}(\mathbf{v}) }\Vert ^2 \Big ), \end{aligned}$$
and thus, since \(||\mathrm{R }_{\varLambda _{[\ell ]}} \pi ^{(1)}(\mathbf{v})|| \le 2^{-s(\ell -1)} ||\pi ^{(1)}(\mathbf{v})||_{{\mathcal {A}}^s}\) and \(\Vert {\mathrm{R }_{\bigcup _{j\ge q}\varLambda _{[j]}} \pi ^{(1)}(\mathbf{v})}\Vert \le 2^{-s(q-1)} ||\pi ^{(1)}(\mathbf{v})||_{{\mathcal {A}}^s}\), for \(q=0,\ldots ,J\) we arrive at
$$\begin{aligned} ||\mathrm{R }_{\hat{\varLambda }_{n_1,[q]}} \pi ^{(1)}(\tilde{\mathbf {A}}^{(1)}_{n_1} \otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}\,\mathbf {v})|| \le { 2^{-s q} \, 2^{2(s+1)} C^{(1)}_{\hat{\beta }} || \pi ^{(1)}(\mathbf {v})||_{{\mathcal {A}}^s} \,. } \end{aligned}$$
(103)
Recall that the sets \(\hat{\varLambda }_{n_1,[q]}\) are disjoint with \(\#\hat{\varLambda }_{n_1,[q]} \le ||\hat{\alpha }^{(1)}||_{\mathrm{\ell }_{1}} 2^q\). By definition of the \({{\mathcal {A}}^s}\)-quasi-norm, we have
$$\begin{aligned} ||\pi ^{(1)}(\tilde{\mathbf {A}}^{(1)}_{n_1} \otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}\,\mathbf {v})||_{{\mathcal {A}}^s}&\le \sup _{q\in \mathrm{I}\!\mathrm{N}_0} \Big (\sum _{j<q} \#\hat{\varLambda }_{n_1,[j]} + 1 \Big )^s \\&\quad \times \sum _{j\ge q} ||\mathrm{R }_{\hat{\varLambda }_{n_1,[j]}} \pi ^{(1)}(\tilde{\mathbf {A}}^{(1)}_{n_1} \otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}\,\mathbf {v})||. \end{aligned}$$
Hence, from (103) we infer
$$\begin{aligned} ||\pi ^{(1)}(\tilde{\mathbf {A}}^{(1)}_{n_1} \otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}\,\mathbf {v})||_{{\mathcal {A}}^s} \le 2^{3s+2}(2^s-1)^{-1}||\hat{\alpha }^{(1)}||^s_{\mathrm{\ell }_{1}} C^{(1)}_{\hat{\beta }} \, || \pi ^{(1)}(\mathbf {v})||_{{\mathcal {A}}^s} \,. \end{aligned}$$
Since by the first inequality in (101) we have
$$\begin{aligned} ||\pi ^{(1)}({ \tilde{\mathbf {A}}_\eta } \mathbf{v})||_{ {{\mathcal {A}}^s}} \le C^{(1)}_{\tilde{\mathbf {A}}} R_1^s \sum _{n_1=1}^{R_1} ||\pi ^{(1)}(\tilde{\mathbf {A}}^{(1)}_{n_1} \otimes \mathrm{I}\otimes \cdots \otimes \mathrm{I}\,\mathbf {v})||_{ {{\mathcal {A}}^s}} \,, \end{aligned}$$
we arrive at (98). \(\square \)

Remark 10

The estimate (98) corresponds to the worst case that the sets \(\hat{\varLambda }_{n_i,[q]}\) constructed in the proof are disjoint for different \(n_i\). If, on the contrary, the \(\{\mathbf {A}^{(i)}_{n_i}\}_{n_i}\) are equi-\(s^*\)-compressible, and hence these sets are the same for all \(n_i\), we can combine (101) directly with (103) to obtain instead
$$\begin{aligned} ||\pi ^{(i)}({ \tilde{\mathbf {A}}_\eta } \mathbf {v})||_{{\mathcal {A}}^s} \lesssim {\bigl (C^{(i)}_{\hat{\alpha }}\bigr )}^s C^{(i)}_{\hat{\beta }} C^{(i)}_{\tilde{\mathbf {A}}} R_i ||\pi ^{(i)}(\mathbf {v})||_{{\mathcal {A}}^s} \,, \end{aligned}$$
i.e., an improvement by a factor \(R_i^s\). Similarly, in this case we also obtain that by a modification of (92), the estimate (97) can be replaced by
$$\begin{aligned} \# {{\mathrm{supp}}}_i ( {\tilde{\mathbf {A}}_\eta } \mathbf {v}) \lesssim C_{\hat{\alpha }}^{(i)} \eta ^{-\frac{1}{s}} \Big ( \sum _{j=1}^m C^{(j)}_{\hat{\beta }} C^{(j)}_{\tilde{\mathbf {A}}} R_j ||\pi ^{(j)}(\mathbf {v})||_{{\mathcal {A}}^s} \Big )^{\frac{1}{s}} \,. \end{aligned}$$

Remark 11

If \(r_i := {{\mathrm{rank}}}_i(\mathbf {v}) < \infty \), then the number \({{\mathrm{ops}}}({ \tilde{\mathbf {A}}_\eta } \mathbf {v})\) of arithmetic operations for evaluating \({ \tilde{\mathbf {A}}_\eta }\mathbf {v}\) as in Theorem 8, for a given HOSVD of \(\mathbf {v}\), can be estimated by
$$\begin{aligned} {{\mathrm{ops}}}({ \tilde{\mathbf {A}}_\eta } \mathbf {v}) \lesssim \prod _{i=1}^m R_i r_i + \eta ^{-\frac{1}{s}} \sum _{i=1}^m C_{\hat{\alpha }}^{(i)} R_i r_i \Big ( \sum _{j=1}^m C^{(j)}_{\hat{\beta }} C^{(j)}_{\tilde{\mathbf {A}}} R_j ||\pi ^{(j)}(\mathbf {v})||_{{\mathcal {A}}^s} \Big )^{\frac{1}{s}}\quad \quad \end{aligned}$$
(104)
with a constant independent of \(\mathbf {v}\), \(\eta \), and \(m\).

Proof

The sorting of entries of \(\pi ^{(i)}(\mathbf {v})\) required to obtain the index sets of best \(2^j\)-term approximations in Theorem 8 can be replaced by an approximate sorting by binary binning, requiring only \(\#{{\mathrm{supp}}}_i(\mathbf {v})\) operations, as suggested in [4, 29]. This only leads to a change in the generic constants in the resulting estimates.

Let \(\mathbf {v}\) have the HOSVD \(\mathbf {v} = \sum _k a_k \mathbb {U}_k\). Then we need, on the one hand, to form the core tensor of the result, which takes \(\prod _{i=1}^m R_i r_i\) operations, and, on the other hand, to evaluate the approximations to \(\tilde{\mathbf {A}}^{(i)}_{n_i} \mathbf {U}^{(i)}_{k_i}\) for \(n_i=1,\ldots , R_i\) and \(k_i=1,\ldots ,r_i\). The number of operations for each of these tasks can be estimated as in [9], which leads to (104). \(\square \)
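The approximate sorting by binary binning used in the preceding proof can be sketched as follows: entries are bucketed according to \(\lfloor \log _2 |v_k|\rfloor \), so that traversing the buckets in decreasing order yields a quasi-sorted sequence in which magnitudes decrease up to a factor of 2; this is why only the generic constants in the estimates are affected. The sketch below is a generic illustration of this idea, not the specific realization of [4, 29].

import numpy as np
from collections import defaultdict

def quasi_sort_by_binning(v):
    # Approximate decreasing-magnitude ordering in O(len(v)) operations: entries
    # are bucketed by floor(log2 |v_k|); within a bucket the order is arbitrary.
    bins = defaultdict(list)
    for k, x in enumerate(v):
        if x != 0.0:
            bins[int(np.floor(np.log2(abs(x))))].append(k)
    order = []
    for b in sorted(bins, reverse=True):        # number of bins is only O(log(max/min))
        order.extend(bins[b])
    return order

v = np.random.default_rng(3).standard_normal(1000)
mags = np.abs(v)[quasi_sort_by_binning(v)]
# along the quasi-sorted order, any later entry is at most twice any earlier one
assert all(mags[k + 1:].max(initial=0.0) <= 2.0 * mags[k] for k in range(len(mags)))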

As the first term on the right-hand side of (104) shows, the Tucker format still suffers from the curse of dimensionality due to the complexity of the core tensors.

4.1.2 Hierarchical Tucker Format

To apply operators to coefficient sequences given in the hierarchical Tucker format, we need a representation of these operators with an analogous hierarchical structure. That is, in the representation
$$\begin{aligned} \mathbf {A} = \sum _{{\mathsf {n}}\in {\mathsf {K}_m}({\mathsf {R}})} c_{\mathsf {n}} \bigotimes _{i=1}^m \mathbf {A}^{(i)}_{n_i} \,, \end{aligned}$$
(105)
for the finitely supported tensor \(\mathbf {c} = (c_{\mathsf {n}})\in \mathrm{\ell }_{2}(\mathrm{I}\!\mathrm{N}^m)\) we need in addition a hierarchical decomposition
$$\begin{aligned} \mathbf {c} = \mathrm {\Sigma }_{\mathcal {D}_{m}}(\{ \mathbf {C}^{(\alpha ,\nu )}:\alpha \in \mathcal{N}(\mathcal {D}_{m}),\, \nu \in \mathrm{I}\!\mathrm{N}\}), \end{aligned}$$
(106)
see (37) and (38). Here we extend the representation ranks of \(\mathbf {A}\) to each \(\alpha \in \mathcal {D}_{m}\) by setting \(R_{\{i\}} := R_i\) for the leaves and, for \(\alpha \in \mathcal{N}(\mathcal {D}_{m})\),
$$\begin{aligned} R_{\alpha } := \#\{\nu :\mathbf {C}^{(\alpha ,\nu )}\ne 0 \}\,. \end{aligned}$$
(107)
In what follows, we assume \( \max _{\alpha \in \mathcal {D}_{m}} R_\alpha < \infty \) and \(R_{{0_{m}}} = 1\). According to Theorem 4, \(\mathbf {v} \in \mathrm{\ell }_{2}(\nabla ^d)\) has a representation
$$\begin{aligned} \mathbf {v} = \sum _{{\mathsf {k}}\in \mathrm{I}\!\mathrm{N}^m} a_{\mathsf {k}} \mathbb {U}_{\mathsf {k}},\quad \mathbf {a} = \mathrm {\Sigma }_{\mathcal {D}_{m}}(\{ \mathbf {B}^{(\alpha ,k)}\}). \end{aligned}$$
If \(\max _{\alpha \in \mathcal {D}_{m}} {{\mathrm{rank}}}_\alpha (\mathbf {v})< \infty \), then \({ \tilde{\mathbf {A}}_\eta } \mathbf {v}\) can be represented in the form (100), with \(\mathbf {d}\) again admitting a hierarchical representation in terms of matrices \(\mathbf {D}^{(\alpha ,(\nu ,k))}\) on \(\mathrm{I}\!\mathrm{N}^2\times \mathrm{I}\!\mathrm{N}^2\) with entries
$$\begin{aligned} \mathbf {D}^{(\alpha ,(\nu ,k))}_{((\mu _1,l_1),(\mu _2,l_2))} := \mathbf {C}^{(\alpha ,\nu )}_{(\mu _1,\mu _2)} \mathbf {B}^{(\alpha ,k)}_{l_1,l_2} \,. \end{aligned}$$
That is, as in (37), we have an explicit representation
$$\begin{aligned} \mathbf {d} = \mathrm {\Sigma }_{\mathcal {D}_{m}}\bigl (\{ \mathbf {D}^{(\alpha ,(\nu ,k))} :\alpha \in \mathcal{N}(\mathcal {D}_{m}),\, \nu =1,\ldots ,R_\alpha ,\,k=1,\ldots ,{{\mathrm{rank}}}_\alpha (\mathbf{v}) \}\bigr ) \end{aligned}$$
in (100), where the indices \(k\in \mathrm{I}\!\mathrm{N}\) are replaced in the definition of \(\mathrm {\Sigma }_{\mathcal {D}_{m}}(\cdot )\) by the indices \((\nu ,k) \in \mathrm{I}\!\mathrm{N}^2\).

Example 5

To give a specific example, we consider an operator of the form
$$\begin{aligned} \mathbf {A}_1 \otimes \mathrm{I}_2 \otimes \cdots \otimes \mathrm{I}_m \,+ \cdots +\, \mathrm{I}_1 \otimes \cdots \otimes \mathrm{I}_{m-1} \otimes \mathbf {A}_m \end{aligned}$$
in the hierarchical format with dimension tree
$$\begin{aligned} \mathcal {D}_{m} = \bigl \{ {0_{m}}, \{1\}, \{2,\ldots ,m\}, \{2\}, \{3,\ldots ,m\},\ldots ,\{m\} \bigr \} \end{aligned}$$
as in Example 2. Setting \(\mathbf {A}_{1}^{(i)} = \mathrm{I}_i\), \(\mathbf {A}^{(i)}_2=\mathbf {A}_i\) for \(i=1,\ldots , m\), we obtain a representation as in (106), with \(R_\alpha = 2\) for \(\alpha \ne {0_{m}}\), and
$$\begin{aligned} \mathbf {C}^{({0_{m}},1)} = \begin{pmatrix} 0 &{}\quad 1 \\ 1 &{}\quad 0 \end{pmatrix},\quad \! \mathbf {C}^{(\alpha ,1)} = \begin{pmatrix} 1 &{}\quad 0 \\ 0 &{}\quad 0 \end{pmatrix},\quad \! \mathbf {C}^{(\alpha ,2)} = \begin{pmatrix} 0 &{}\quad 1 \\ 1 &{}\quad 0 \end{pmatrix},\quad \!\alpha \!\in \! \mathcal{N}(\mathcal {D}_{m})\setminus \{{0_{m}}\}. \end{aligned}$$
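The representation of Example 5 can be checked by a direct contraction along the linear dimension tree. The sketch below reconstructs the core tensor from the transfer matrices and verifies that it equals the core tensor of Example 3 (value 1 exactly when a single index equals 2); the recursive contraction used here is the natural one for this tree and is meant only as an illustration of the general convention fixed in (37) and (38).

import numpy as np
from itertools import product

def example5_core(m):
    # Reconstructs the core tensor of Example 3 from the transfer matrices of
    # Example 5 for the linear tree {0_m, {1}, {2..m}, {2}, {3..m}, ..., {m}}.
    C_root = np.array([[0.0, 1.0], [1.0, 0.0]])    # C^(0_m, 1)
    C1 = np.array([[1.0, 0.0], [0.0, 0.0]])        # C^(alpha, 1)
    C2 = np.array([[0.0, 1.0], [1.0, 0.0]])        # C^(alpha, 2)
    T = np.stack([C1, C2])                         # deepest node {m-1, m}: axes (nu, n_{m-1}, n_m)
    for _ in range(m - 3):                         # walk up through {k,...,m}, k = m-2,...,2
        T = np.einsum('nkv,v...->nk...', np.stack([C1, C2]), T)
    return np.einsum('kv,v...->k...', C_root, T)   # contract the rank index at the root

m = 5
c = example5_core(m)
for idx in product(range(2), repeat=m):
    assert c[idx] == (1.0 if sum(idx) == 1 else 0.0)   # exactly one index equals 2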
The estimates in Theorem 8 now directly carry over to the hierarchical Tucker format, where, as the only modification, (99) is replaced by
$$\begin{aligned} {{\mathrm{rank}}}_\alpha ({ \tilde{\mathbf {A}}_\eta } \mathbf {v}) \le R_\alpha {{\mathrm{rank}}}_\alpha (\mathbf {v}) \,. \end{aligned}$$

Remark 12

If for a given \({\mathcal {H}}\)SVD of \(\mathbf {v}\), \(r_\alpha := {{\mathrm{rank}}}_\alpha (\mathbf {v}) < \infty \), \(\alpha \in \mathcal{N}(\mathcal {D}_{m})\), then the number \({{\mathrm{ops}}}({ \tilde{\mathbf {A}}_\eta } \mathbf {v})\) of arithmetic operations for evaluating \({ \tilde{\mathbf {A}}_\eta } \mathbf {v}\) as in Theorem 8 can be estimated by
$$\begin{aligned} {{\mathrm{ops}}}({ \tilde{\mathbf {A}}_\eta } \mathbf {v}) \lesssim&\sum _{\alpha \in \mathcal{N}(\mathcal {D}_{m})} R_\alpha r_\alpha \prod _{q=1}^2 R_{{\mathrm{c}_{q}}(\alpha )} r_{{\mathrm{c}_{q}}(\alpha )} \nonumber \\&+ \, \eta ^{-\frac{1}{s}} \sum _{i=1}^m C_{\hat{\alpha }}^{(i)} R_i r_i \Big (\, \sum _{j=1}^m C^{(j)}_{\hat{\beta }} C^{(j)}_{\tilde{\mathbf {A}}} R_j ||\pi ^{(j)}(\mathbf {v})||_{{\mathcal {A}}^s} \Big )^{\frac{1}{s}} \,, \end{aligned}$$
(108)
with a constant independent of \(\mathbf {v}\), \(\eta \), and \(m\).

Comparing the first summand on the right-hand side of (108) to that in (104), we observe a substantial reduction in complexity regarding the dependence on \(m\) (and, hence, \(d\)).

4.2 Low-Rank Approximations of Operators

In many applications of interest, the involved operators do not have an explicit low-rank form, but there exist efficient approximations to these operators in low-rank representation.

This case can be handled by replacing the given operator \(\mathbf {A}\) by such a low-rank approximation and then applying the constructions of the previous subsections for operators given in low-rank form.

To make this precise, we assume that for a suitable growth sequence \(\gamma _{\mathbf {A}}\) there exist approximations \(\mathbf {A}_N\) for \(N\in \mathrm{I}\!\mathrm{N}\) with
$$\begin{aligned} \sup _N \gamma _{\mathbf {A}}(N) ||\mathbf {A} - \mathbf {A}_N|| =: M_{\mathbf {A}} < \infty \,, \end{aligned}$$
(109)
where each \(\mathbf {A}_N\) has a representation (105) with \(R_i \le N\). Moreover, in the case of the hierarchical Tucker format, we assume in addition that \(R_\alpha \le N\), with \(R_\alpha \) as in (107).
Furthermore, we need to quantify the approximability of the \(\mathbf {A}_N\). We assume that all tensor factors arising in each \(\mathbf {A}_N\) are \(s^*\)-compressible and that, for the approximations \({\tilde{\mathbf {A}}_{N,\eta }}\) of \(\mathbf {A}_N\) according to Lemma 4 and Theorem 8 (with constants \(C^{(i)}_{\tilde{\mathbf {A}}_N}\), \(C^{(i)}_{\hat{\alpha }_N}\), \(C^{(i)}_{\hat{\beta }_N}\) as in Theorem 8), we have
$$\begin{aligned} C_{\mathbf {A},\tilde{\mathbf {A}}} := \sup _N \bigl ( \max _i C^{(i)}_{\hat{\alpha }_N} \bigr )^s \bigl ( \max _i C^{(i)}_{\tilde{\mathbf {A}}_N} C^{(i)}_{\hat{\beta }_N} \bigr ) <\infty \,. \end{aligned}$$
(110)
Under these conditions, we shall say that the approximations \(\mathbf {A}_N\) to \(\mathbf {A}\) are uniformly \(s^*\)-compressible.
Under this assumption, the estimates for \({{\mathrm{ops}}}({ \tilde{\mathbf {A}}_\eta } \mathbf {v})\) obtained in Remarks 11 and 12 carry over to the present setting with an additional low-rank approximation of the operator. Here, for given \(\eta > 0\) and \(\mathbf {v}\) we choose \(N_\eta \) such that \(||\mathbf {A} - \mathbf {A}_{N_\eta }|| \le \eta /2\) and \(\tilde{\mathbf {A}}_{N_\eta ,\eta }\) such that \(||\mathbf {A}_{N_\eta } \mathbf {v} - \tilde{\mathbf {A}}_{N_\eta ,\eta } \mathbf {v}|| \le \eta /2\), which in summary yields for the Tucker format
$$\begin{aligned} {{\mathrm{ops}}}({\tilde{\mathbf {A}}_{N_\eta ,\eta }}\mathbf {v})&\lesssim \bigl (\gamma ^{-1}_\mathbf {A}(2 M_{\mathbf {A}} / \eta )\bigr )^m \prod _{i=1}^m {{\mathrm{rank}}}_i(\mathbf {v}) \nonumber \\&+\, C_{\mathbf {A},\tilde{\mathbf {A}}}^{\frac{1}{s}} \eta ^{-\frac{1}{s}} \Big ( \gamma ^{-1}_\mathbf {A}(2 M_{\mathbf {A}} / \eta ) \Big )^{1 + s^{-1}} \sum _{i=1}^m {{\mathrm{rank}}}_i(\mathbf {v}) \Big ( \sum _{j=1}^m ||\pi ^{(j)}(\mathbf {v})||_{{\mathcal {A}}^s} \Big )^{\frac{1}{s}},\nonumber \\ \end{aligned}$$
(111)
and for the hierarchical Tucker format
$$\begin{aligned} {{\mathrm{ops}}}({\tilde{\mathbf {A}}_{N_\eta ,\eta }}\mathbf {v}) \lesssim \bigl (\gamma ^{-1}_\mathbf {A}(2 M_{\mathbf {A}} / \eta )\bigr )^3 \sum _{\alpha \in \mathcal{N}(\mathcal {D}_{m})} {{\mathrm{rank}}}_\alpha (\mathbf {v}) \prod _{q=1}^2 {{\mathrm{rank}}}_{{\mathrm{c}_{q}}(\alpha )} (\mathbf {v}) \nonumber \\ \quad +\,C_{\mathbf {A},\tilde{\mathbf {A}}}^{\frac{1}{s}} \eta ^{-\frac{1}{s}} \bigl ( \gamma ^{-1}_\mathbf {A}(2 M_{\mathbf {A}} / \eta ) \bigr )^{1 + s^{-1}} \sum _{i=1}^m {{\mathrm{rank}}}_i(\mathbf {v}) \Big ( \sum _{j=1}^m ||\pi ^{(j)}(\mathbf {v})||_{{\mathcal {A}}^s} \Big )^{\frac{1}{s}}. \end{aligned}$$
(112)
Note again the reduction in complexity in the first term of (112) over (111).
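For a growth sequence of the exponential type that will be used in Assumption 1(ii), \(\gamma _\mathbf {A}(n) = e^{d_\mathbf {A} n^{1/b_\mathbf {A}}}\), the choice of \(N_\eta \) is explicit: \(\gamma _\mathbf {A}(N) \ge 2 M_\mathbf {A}/\eta \) as soon as \(N \ge \bigl (d_\mathbf {A}^{-1}\ln (2M_\mathbf {A}/\eta )\bigr )^{b_\mathbf {A}}\), which is the quantity \(\gamma ^{-1}_\mathbf {A}(2M_\mathbf {A}/\eta )\) entering (111) and (112). A minimal sketch, with parameter values chosen arbitrarily for illustration:

import math

def N_eta(eta, M_A, d_A, b_A):
    # Smallest N with gamma_A(N) = exp(d_A * N**(1/b_A)) >= 2*M_A/eta, so that
    # (109) guarantees ||A - A_N|| <= eta/2.
    if 2.0 * M_A / eta <= 1.0:
        return 1
    return max(1, math.ceil((math.log(2.0 * M_A / eta) / d_A) ** b_A))

# with M_A = 10, d_A = 0.5, b_A = 2, the ranks grow only polylogarithmically in 1/eta:
print([N_eta(10.0 ** (-k), 10.0, 0.5, 2.0) for k in range(1, 7)])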

5 An Adaptive Iterative Scheme

5.1 Formulation and Basic Convergence Properties

We now have all prerequisites in place to formulate an adaptive method whose basic structure resembles the one introduced in [10] for linear operator equations \(\mathbf {A} \mathbf {u} = \mathbf {f}\), where \(\mathbf {f}\in \mathrm{\ell }_{2}\) and \(\mathbf {A}\) is bounded and elliptic on \(\mathrm{\ell }_{2}\), that is,
$$\begin{aligned} \langle \mathbf {A} \mathbf {v},\mathbf {v}\rangle \ge {\lambda _{\mathbf {A}}}||\mathbf {v}||^2,\quad ||\mathbf {A}\mathbf {v}|| \le {\varLambda _{\mathbf {A}}}||\mathbf {v}|| \end{aligned}$$
holds for fixed constants \({\lambda _{\mathbf {A}}}, {\varLambda _{\mathbf {A}}}> 0\). The scheme can be regarded as a perturbation of a simple Richardson iteration,
$$\begin{aligned} \mathbf {v}_{i+1} := \mathbf {v}_{i} - \omega (\mathbf {A} \mathbf {v}_{i} - \mathbf {f}) \,, \end{aligned}$$
(113)
which applies to both symmetric and nonsymmetric elliptic \(\mathbf {A}\). In both cases, the parameter \(\omega >0\) can be chosen such that \(||\mathrm{I}- \omega \mathbf {A}|| < 1\).
Based on the developments in the previous sections, we have at hand numerically realizable procedures \({{\mathrm{\textsc {apply}}}}\), \({{\mathrm{\textsc {rhs}}}}\), \({{\mathrm{\textsc {recompress}}}}\), and \({{\mathrm{\textsc {coarsen}}}}\), which, for finitely supported \(\mathbf {v}\) and any tolerance \(\eta > 0\), satisfy
$$\begin{aligned}&||\mathbf {A}\mathbf {v} - {{\mathrm{\textsc {apply}}}}(\mathbf {v}; \eta )|| \le \eta ,&||\mathbf {f} - {{\mathrm{\textsc {rhs}}}}(\eta )|| \le \eta , \nonumber \\&||\mathbf {v} - {{\mathrm{\textsc {recompress}}}}(\mathbf {v}; \eta )|| \le \eta \,,\quad&||\mathbf {v} - {{\mathrm{\textsc {coarsen}}}}(\mathbf {v}; \eta )|| \le \eta . \end{aligned}$$
(114)
Specifications of the complexities of these procedures will be summarized in Sect. 5.2. The adaptive scheme that we analyze in what follows is given in Algorithm 5.1.
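To fix ideas before the analysis, the following schematic indicates how the four procedures in (114) enter a perturbed Richardson iteration. It is deliberately simplified relative to Algorithm 5.1: it uses a fixed number of inner steps, crude tolerance choices, and none of the stopping tests and parameters \(\kappa _1,\kappa _2,\kappa _3\) discussed below; the toy usage at the end replaces the low-rank machinery by dense linear algebra and trivial recompress/coarsen routines.

import numpy as np

def perturbed_richardson(apply_A, rhs, recompress, coarsen, omega, u0, eps, delta0,
                         theta=0.5, inner_steps=5):
    # Schematic perturbed Richardson iteration in the spirit of Algorithm 5.1;
    # the four callables are assumed to satisfy the accuracy requirements (114).
    u, delta = u0, delta0
    while delta > eps:
        w = u
        for j in range(inner_steps):
            eta = theta * delta * 2.0 ** (-j)          # decreasing inner tolerances
            r = apply_A(w, eta) - rhs(eta)             # approximate residual
            w = recompress(w - omega * r, eta)         # rank truncation, cf. (47)
        u = coarsen(recompress(w, theta * delta), theta * delta)   # cf. (69)
        delta = theta * delta                          # error bound contracts by theta
    return u

# toy usage: dense SPD matrix, trivial recompress/coarsen (no low-rank structure)
rng = np.random.default_rng(4)
M = rng.standard_normal((20, 20))
A = M @ M.T / 20 + np.eye(20)
f = rng.standard_normal(20)
u_eps = perturbed_richardson(
    apply_A=lambda v, eta: A @ v, rhs=lambda eta: f,
    recompress=lambda v, eta: v, coarsen=lambda v, eta: v,
    omega=1.0 / np.linalg.norm(A, 2), u0=np.zeros(20),
    eps=1e-6, delta0=float(np.linalg.norm(np.linalg.solve(A, f))))
print(np.linalg.norm(A @ u_eps - f))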

Proposition 5

Let the step size \(\omega >0\) in Algorithm 5.1 satisfy \(||\mathrm{I}- \omega \mathbf {A}|| \le \rho < 1\). Then the intermediate steps \(\mathbf {u}_{k}\) of Algorithm 5.1 satisfy \(||\mathbf {u}_k - \mathbf {u}|| \le \theta ^k\delta \), and in particular, the output \(\mathbf {u}_\varepsilon \) of Algorithm 5.1 satisfies \(||\mathbf {u}_\varepsilon - \mathbf {u}|| \le \varepsilon \).

Proof

Since \(\kappa _1+\kappa _2+\kappa _3\le 1\), it suffices to show that for any \(k\), following termination of the inner loop, the error bound
$$\begin{aligned} ||\mathbf {w}_j - \mathbf {u}|| \le \kappa _1 \theta ^{k+1} \delta \end{aligned}$$
(115)
holds. By the choice of \(\omega \), we have
$$\begin{aligned} || \mathbf {w}_{j+1} - \mathbf {u} ||&\le { ||(\mathrm{I}- \omega \mathbf {A}) (\mathbf {w}_j - \mathbf {u}) || + \omega ||(\mathbf {A}\mathbf {w}_j - \mathbf {f}) - \mathbf {r}_j || + \beta \eta _j} \\&\le \rho ||\mathbf {w}_j - \mathbf {u}|| + (\omega + \beta ) \eta _j \,, \end{aligned}$$
and recursive application of this estimate yields
$$\begin{aligned} ||\mathbf {w}_{j} -\mathbf {u}|| \le \rho ^{j} ||\mathbf {w}_0 -\mathbf {u}|| + (\omega + \beta ) \sum _{l=0}^{j-1} \rho ^{j-1-l} \eta _l\le \rho ^{j}\bigl (1+j(\omega +\beta )\bigr ) \theta ^k \delta \,. \end{aligned}$$
Thus, on the one hand, if the inner loop exits with the first condition in line 10, then (115) holds by definition of \(J\). On the other hand, if the second condition is met, then (115) holds because
$$\begin{aligned} ||\mathbf {w}_j - \mathbf {u}||&\le \rho ||\mathbf {w}_{j-1} - \mathbf {u}|| + (\omega +\beta ) \eta _{j-1} \\&\le \rho c_{\mathbf {A}}^{-1} ( ||\mathbf {r}_{j-1}|| + \eta _{j-1}) + (\omega +\beta ) \eta _{j-1} \le \kappa _1\theta ^{k+1}\delta \,. \end{aligned}$$
\(\square \)

5.2 Complexity

Quite in the spirit of adaptive wavelet methods we analyze the performance of the foregoing scheme by comparing it to an optimality benchmark addressing the following question: suppose an unknown solution exhibits a certain (unknown) rate of tensor approximability where the involved tensors have a certain (unknown) best \(N\)-term approximability with respect to their wavelet representations. Does the scheme automatically recover these rates? Thus, unlike the situation in wavelet analysis, we are dealing here with two types of approximation, and the choice of corresponding rates as a benchmark model should, of course, be representative for relevant application scenarios. For the present complexity analysis we focus on growth sequences of a subexponential or exponential type for the involved low-rank approximations, combined with an algebraic approximation rate for the corresponding tensor mode frames. The rationale for this choice is as follows. Approximation rates in classical methods are governed by the regularity of the approximand, which, unless the approximand is analytic, results in algebraic rates suffering from the curse of dimensionality. However, functions of many variables may very well exhibit a high degree of tensor sparsity without being very regular in the Sobolev or Besov sense. Therefore, fast tensor rates, combined with polynomial rates for the compressibility of the mode frames, mark an ideal target scenario for tensor methods since, as will be shown, the curse of dimensionality can be significantly ameliorated without requiring excessive regularity.

The precise formulation of our benchmark model reads as follows.

Assumption 1

Concerning the tensor approximability of \(\mathbf{u}\), \(\mathbf {A}\), and \(\mathbf {f}\), we make the following assumptions:
  1. (i)

    \(\mathbf{u}\in {{\mathcal {A}}_{\mathcal {H}}({\gamma _\mathbf{u}})}\) with \(\gamma _\mathbf{u}(n) = e^{d_\mathbf{u}n^{1/b_\mathbf{u}}}\) for some \(d_\mathbf{u}>0\), \(b_\mathbf{u}\ge 1\).

     
  2. (ii)

    \(\mathbf {A}\) satisfies (109) for an \(M_\mathbf {A} > 0\), with \(\gamma _\mathbf {A}(n) = e^{d_\mathbf {A} n^{1/b_\mathbf {A}}}\), where \(d_\mathbf {A} >0\), \(b_\mathbf {A} \ge 1\).

     
  3. (iii)

    Furthermore, let \(\mathbf {f} \in {{\mathcal {A}}_{\mathcal {H}}({\gamma _\mathbf {f}})}\) with \(\gamma _\mathbf {f}(n) = e^{d_\mathbf {f} \, n^{1/b_\mathbf {f}}}\), where \(d_\mathbf {f} = \min \{ d_\mathbf{u}, d_\mathbf {A} \}\) and \(b_\mathbf {f} = b_\mathbf{u}+ b_\mathbf {A}\).

     
Concerning the approximability of lower-dimensional components, we assume that for some \(s^* > 0\), the following properties hold:
  1. (iv)

    \(\pi ^{(i)}(\mathbf{u}) \in {{\mathcal {A}}^s}\) for \(i=1,\ldots ,m\), for any \(s\) with \(0 < s <s^*\).

     
  2. (v)

    The low-rank approximations to \(\mathbf {A}\) are uniformly \(s^*\)-compressible in the sense of Sect. 4.2, with \(C_{\mathbf {A}} := \sup _{\eta >0} C_{\mathbf {A},\tilde{\mathbf {A}}} < \infty \), where \(C_{\mathbf {A},\tilde{\mathbf {A}}}\) is defined as in (110) for each value of \(\eta \).

     
  3. (vi)

    \(\pi ^{(i)}(\mathbf {f}) \in {{\mathcal {A}}^s}\), for \(i=1,\ldots ,m\), for any \(s\) with \(0 < s <s^*\).

     
Furthermore, we assume that the number of operations needed to evaluate each required entry in the tensor approximations of \(\mathbf {A}\) or \(\mathbf {f}\) is uniformly bounded.

Note that the requirement on \(\mathbf{f}\) in (iii) is actually very mild because the data are typically more tensor sparse than the solution.

The following complexity estimates are formulated only for the more interesting case of the hierarchical Tucker format. Similar statements hold for the Tucker format, involving, however, additional terms that depend exponentially on \(m\), which makes this format suitable only for moderate values of \(m\).

Remark 13

Let \(\mathbf{v}\) have finite support with finite ranks, i.e., \({{\mathrm{rank}}}_\alpha (\mathbf{v}) < \infty \) for \(\alpha \in \mathcal {D}_{m}\). Then, under Assumptions 1, \({{\mathrm{\textsc {apply}}}}\) can be realized numerically such that for \(\mathbf{w}_\eta := {{\mathrm{\textsc {apply}}}}(\mathbf{v}; \eta )\) we have (Theorem 8 and Remark 9)
$$\begin{aligned}&\# {{\mathrm{supp}}}_i (\mathbf{w}_\eta ) \lesssim C_{\mathbf {A}}^{\frac{1}{s}} \Big (d_\mathbf {A}^{-1} \ln (M_\mathbf {A}/\eta )\Big )^{(1 + s^{-1})b_\mathbf {A}} \Big ( \sum _{j=1}^m ||\pi ^{(j)}(\mathbf {v})||_{{\mathcal {A}}^s} \Big )^{\frac{1}{s}} \eta ^{-\frac{1}{s}},\quad \quad \end{aligned}$$
(116)
$$\begin{aligned}&||\pi ^{(i)}(\mathbf {w}_\eta )||_{{{\mathcal {A}}^s}} \lesssim C_{\mathbf {A}} \bigl (d_\mathbf {A}^{-1} \ln (M_\mathbf {A}/\eta )\bigr )^{(s+1)b_\mathbf {A}} ||\pi ^{(i)}(\mathbf {v})||_{{\mathcal {A}}^s} \,, \end{aligned}$$
(117)
$$\begin{aligned}&|{{\mathrm{rank}}}(\mathbf {w}_\eta )|_\infty \le \bigl (d_\mathbf {A}^{-1} \ln (M_\mathbf {A}/\eta )\bigr )^{b_\mathbf {A}} |{{\mathrm{rank}}}(\mathbf {v})|_\infty , \end{aligned}$$
(118)
and, by (112),
$$\begin{aligned} {{\mathrm{ops}}}(\mathbf{w}_\eta )&\lesssim (m-1) \Big (d_\mathbf {A}^{-1} \ln (M_\mathbf {A}/\eta )\Big )^{3 b_\mathbf {A}} |{{\mathrm{rank}}}(\mathbf{v})|_\infty ^3 \nonumber \\&+\; m\, C_{\mathbf {A}}^{\frac{1}{s}} \, \, \bigl (d_\mathbf {A}^{-1} \ln (M_\mathbf {A}/\eta )\bigr )^{(1+s^{-1}) b_\mathbf {A}} |{{\mathrm{rank}}}(\mathbf {v})|_\infty \Big ( \sum _{i=1}^m ||\pi ^{(i)}(\mathbf {v})||_{{\mathcal {A}}^s} \Big )^{\frac{1}{s}} \eta ^{-\frac{1}{s}} .\nonumber \\ \end{aligned}$$
(119)
Thus, up to polylogarithmic terms, the curse of dimensionality is avoided. If, in addition, the approximations of \(\mathbf {A}\) are equi-\(s^*\)-compressible, then the polylogarithmic terms in the preceding estimates improve according to Remark 10.

Remark 14

Under Assumptions 1, the routine \({{\mathrm{\textsc {rhs}}}}\) can be realized numerically such that for \(\mathbf {f}_\eta :={{\mathrm{\textsc {rhs}}}}(\eta )\) we have
$$\begin{aligned}&\# {{\mathrm{supp}}}_i (\mathbf {f}_\eta ) \lesssim \eta ^{-\frac{1}{s}} ||\pi ^{(i)}(\mathbf {f})||_{{{\mathcal {A}}^s}}^{\frac{1}{s}}\,, \end{aligned}$$
(120)
$$\begin{aligned}&||\pi ^{(i)}(\mathbf {f}_\eta )||_{{{\mathcal {A}}^s}} \lesssim ||\pi ^{(i)}(\mathbf {f})||_{{{\mathcal {A}}^s}}\,, \end{aligned}$$
(121)
$$\begin{aligned}&|{{\mathrm{rank}}}(\mathbf {f}_\eta )|_\infty \lesssim \bigl (d_\mathbf {f}^{-1} \ln (||\mathbf {f}||_{{{\mathcal {A}}_{\mathcal {H}}({\gamma _\mathbf {f}})}}/\eta )\bigr )^{b_\mathbf {f}}, \end{aligned}$$
(122)
as well as
$$\begin{aligned} {{\mathrm{ops}}}(\mathbf {f}_\eta ) \lesssim (m-1) |{{\mathrm{rank}}}(\mathbf {f}_\eta )|_\infty ^3 +\; |{{\mathrm{rank}}}(\mathbf {f}_\eta )|_\infty \sum _{i=1}^m \#{{\mathrm{supp}}}_i(\mathbf {f}_\eta ) \,. \end{aligned}$$
(123)

Remark 15

We take \({{\mathrm{\textsc {recompress}}}}\) as a numerical realization of \({\hat{\mathrm{P }}}_{\eta }\) as defined in (47). This amounts to the computation of a HOSVD or \({\mathcal {H}}\)SVD, respectively, for which we have the complexity bounds given in Remarks 1 and 4.

Likewise, \({{\mathrm{\textsc {coarsen}}}}\) is a numerical realization of \(\hat{\mathrm{C}}_{\eta }\) as defined in (69), with the modification of replacing the exact sorting of the values \(\pi ^{(i)}_{\nu _i}(\cdot )\), \(i=1,\ldots ,m\), \(\nu \in \nabla ^{d_i}\), as required by \(\hat{\mathrm{C}}_{\eta }\), by an approximate sorting, as proposed in [4, 29] (Remark 11). This leads to an increase of \(\kappa _\mathrm{C}\) by only a fixed factor; for finitely supported \(\mathbf{v}\) the procedure can be realized in practice such that \(\kappa _\mathrm{C}= 2\sqrt{m}\) and using a number of operations bounded by
$$\begin{aligned} C |{{\mathrm{rank}}}(\mathbf{v})|_\infty \sum _{i=1}^m \#{{\mathrm{supp}}}_i(\mathbf{v}) \end{aligned}$$
with a fixed \(C>0\). Note that here we make the implicit assumption that the orthogonality properties required by \({{\mathrm{\textsc {coarsen}}}}\) have been enforced if necessary before the application of \({{\mathrm{\textsc {coarsen}}}}\). This can be done by an application of \({{\mathrm{\textsc {recompress}}}}(\cdot ,0)\).

Note that under the assumptions of Proposition 5, the iteration converges for any fixed \(\beta \ge 0\). A call to \({{\mathrm{\textsc {recompress}}}}\) (possibly with \(\beta =0\), i.e., without performing an approximation) is in fact necessary in each inner iteration to ensure the orthogonality properties required by \({{\mathrm{\textsc {apply}}}}\).

The main result of this paper is the following theorem. It says that whenever a solution has the approximation properties specified in Assumptions 1, the adaptive scheme recovers these rates and the required computational work has optimal complexity up to logarithmic factors. We have made an attempt to identify the dependencies of the involved constants on the problem parameters as explicitly as possible.

Theorem 9

Let \(\alpha > 0\), and let \(\kappa _\mathrm{P}, \kappa _\mathrm{C}\) be as in Theorem 7. Let the constants \(\kappa _1,\kappa _2,\kappa _3\) in Algorithm 5.1 be chosen as
$$\begin{aligned} \kappa _1&= \bigl (1 + (1+\alpha )(\kappa _\mathrm{P}+ \kappa _\mathrm{C}+ \kappa _\mathrm{P}\kappa _\mathrm{C})\bigr )^{-1},\quad \kappa _2 = (1+\alpha )\kappa _\mathrm{P}\kappa _1, \\&\quad \kappa _3 = \kappa _\mathrm{C}(\kappa _\mathrm{P}+ 1)(1+\alpha )\kappa _1 \,. \end{aligned}$$
Let \(\mathbf {A}\mathbf{u}= \mathbf {f}\), where \(\mathbf {A}\), \(\mathbf{u}\), \(\mathbf {f}\) satisfy Assumptions 1. Then \(\mathbf{u}_\varepsilon \) produced by Algorithm 5.1 satisfies
$$\begin{aligned}&|{{\mathrm{rank}}}(\mathbf{u}_\varepsilon )|_\infty \le \, \bigl ( d_\mathbf {\mathbf{u}}^{-1} \ln \bigl [(\theta \alpha )^{-1} \rho _{\gamma _\mathbf{u}} \,||\mathbf{u}||_{{{\mathcal {A}}_{\mathcal {H}}({\gamma _\mathbf {\mathbf{u}}})}}\,\varepsilon ^{-1}\bigr ] \bigr )^{ b_\mathbf {\mathbf{u}}} \,, \end{aligned}$$
(124)
$$\begin{aligned}&\sum _{i=1}^m \#{{\mathrm{supp}}}_i(\mathbf{u}_\varepsilon ) \lesssim \Bigl (\sum _{i=1}^m || \pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}} \Bigr )^{\frac{1}{s}} \varepsilon ^{-\frac{1}{s}} \,, \end{aligned}$$
(125)
as well as
$$\begin{aligned}&||\mathbf{u}_\varepsilon ||_{{{\mathcal {A}}_{\mathcal {H}}({\gamma _\mathbf {\mathbf{u}}})}} \lesssim ||\mathbf{u}||_{{{\mathcal {A}}_{\mathcal {H}}({\gamma _\mathbf {\mathbf{u}}})}} \,, \end{aligned}$$
(126)
$$\begin{aligned}&\sum _{i=1}^m || \pi ^{(i)}(\mathbf{u}_\varepsilon )||_{{{\mathcal {A}}^s}} \lesssim \sum _{i=1}^m || \pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}} \,. \end{aligned}$$
(127)
The multiplicative constant in (126) depends only on \(\alpha \) and \(m\), while those in (125) and (127) depend only on \(\alpha \), \(m\) and \(s\). For the number of required operations, we have the estimate
$$\begin{aligned} {{\mathrm{ops}}}(\mathbf{u}_\varepsilon ) \lesssim |\ln \varepsilon |^{J (3 + s^{-1}) b_\mathbf {A} + 2 b_\mathbf {f}}\, \Big ( \sum _{i=1}^m \max \{||\pi ^{(i)}(\mathbf{u})||_{{\mathcal {A}}^s}, ||\pi ^{(i)}(\mathbf {f})||_{{\mathcal {A}}^s}\} \Big )^\frac{1}{s} \, \varepsilon ^{-\frac{1}{s}} \,, \end{aligned}$$
(128)
with a multiplicative constant independent of \(\varepsilon \) and \(|| \pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}}\), \(|| \pi ^{(i)}(\mathbf {f})||_{{{\mathcal {A}}^s}}\), and with an algebraic explicit dependence on \(m\) and \(C_\mathbf {A}\).

Remark 16

Recalling the form of the growth sequence \(\gamma _\mathbf{u}(n)= e^{d_\mathbf{u}n^{1/b_\mathbf{u}}}\), the rank bound (124) can be reformulated in terms of \(\gamma _\mathbf{u}^{-1}\big (C||\mathbf{u}||_{{{\mathcal {A}}_{\mathcal {H}}({\gamma _\mathbf {\mathbf{u}}})}}/\varepsilon \big )\), which, in view of Remark 6, means that up to a multiplicative constant, the ranks remain minimal. On account of Remark 7, the same holds for the bound (125) on the sparsity of the factors.

Remark 17

The maximum number of inner iterations \(J\) that arises in the complexity estimate is defined in line 2 of Algorithm 5.1. This value depends on the freely chosen algorithm parameters \(\beta \) and \(\theta \), on the constants \(\omega \) and \(\rho \) that depend only on \(\mathbf {A}\), and on \(\kappa _1\). Thus, \(J\) depends on \(m\): the choice of \(\kappa _1\) in Theorem 9 leads to \(\kappa _1 \sim m^{-1}\) and, hence, \(J\sim \log m\). Note that since \(|\ln \varepsilon |^{c \ln m} = m^{c \ln |\ln \varepsilon |}\), this leads to an algebraic dependence of the complexity estimate on \(m\). Furthermore, the precise dependence of the constant in (128) on \(m\) is also influenced by the problem parameters from Assumption 1, which may contain additional implicit dependencies on \(m\). In particular, as can be seen from the proof, the constant has a linear dependence on \(C_\mathbf {A}^{J/s}\) if \(C_\mathbf {A}> 1\) (cf. Remark 8).

Proof

(Theorem 9) By the choice of \(\kappa _1\), \(\kappa _2\), and \(\kappa _3\), we can apply Lemma 7 to each \(\mathbf{u}_i\) produced in line 11 of Algorithm 5.1, which yields the bounds (124), (125), (126), and (127) for the values \(\varepsilon = \theta ^k \delta \), \(k\in \mathrm{I}\!\mathrm{N}\).

It therefore remains to estimate the computational complexity of each inner loop. Note that \({{\mathrm{\textsc {recompress}}}}\) in line 8 does not degrade the approximability of the intermediates \(\mathbf{w}_j\), as a consequence of Lemma 3.

Let \(\varepsilon _k := \theta ^k \delta \). We already know from Theorem 7 that
$$\begin{aligned} |{{\mathrm{rank}}}(\mathbf{u}_k)|_\infty&\le \, \bigl ( d_\mathbf {\mathbf{u}}^{-1} \ln [\alpha ^{-1} \rho _{\gamma _\mathbf{u}} \,||\mathbf{u}||_{{{\mathcal {A}}_{\mathcal {H}}({\gamma _\mathbf {\mathbf{u}}})}}\,\varepsilon _k^{-1}] \bigr )^{ b_\mathbf {\mathbf{u}}} \lesssim |\ln \varepsilon _k|^{b_\mathbf{u}} \,,\end{aligned}$$
(129)
$$\begin{aligned} \sum _{i=1}^m \#{{\mathrm{supp}}}_i(\mathbf{u}_k)&\lesssim \Bigl (\sum _{i=1}^m || \pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}} \Bigr )^{\frac{1}{s}} \varepsilon _k^{-\frac{1}{s}} \,, \end{aligned}$$
(130)
$$\begin{aligned} \sum _{i=1}^m ||\pi ^{(i)}(\mathbf{u}_k)||_{{{\mathcal {A}}^s}}&\lesssim \sum _{i=1}^m || \pi ^{(i)}(\mathbf{u})||_{{{\mathcal {A}}^s}} \,, \end{aligned}$$
(131)
where the multiplicative constants in the last two equations depend on \(\alpha \), \(m\), and \(s\). Similarly, we obtain (126) from (73). Furthermore, by definition of the iteration,
$$\begin{aligned} |{{\mathrm{rank}}}(\mathbf{w}_{j+1})|_\infty \le \bigl ( d_\mathbf {A}^{-1} \ln (2 M_\mathbf {A} / \eta _j )\bigr )^{b_\mathbf {A}} |{{\mathrm{rank}}}(\mathbf{w}_j)|_\infty + \bigl ( d_\mathbf {f}^{-1} \ln (2 |\mathbf {f}|_{{{\mathcal {A}}_{\mathcal {H}}({\gamma _\mathbf {f}})}} / \eta _j )\bigr )^{b_\mathbf {f}} \,. \end{aligned}$$
Combining this with (129) and using \(b_\mathbf {f} > b_\mathbf{u}\), we obtain
$$\begin{aligned} |{{\mathrm{rank}}}(\mathbf{w}_{j})|_\infty \lesssim | \ln \varepsilon _{k}|^{j b_\mathbf {A} + b_\mathbf {f}} \,. \end{aligned}$$
The definition of the iterates also yields
$$\begin{aligned} \#{{\mathrm{supp}}}_i(\mathbf{w}_{j+1})&\lesssim \#{{\mathrm{supp}}}_i(\mathbf{w}_{j})+C_\mathbf {A}^{\frac{1}{s}} \Big ( d_\mathbf {A}^{-1} \ln (2 M_\mathbf {A} / \eta _j )\Big )^{(1+s^{-1})b_\mathbf {A}}\\&\times \Big (\, \sum _{l=1}^m ||\pi ^{(l)}(\mathbf{w}_j)||_{{{\mathcal {A}}^s}} \Big )^{\frac{1}{s}} \eta _j^{-\frac{1}{s}} + ||\pi ^{(i)}(\mathbf {f})||_{{{\mathcal {A}}^s}}^{\frac{1}{s}} \eta _j^{-\frac{1}{s}} \,,\quad \end{aligned}$$
and, by (117),
$$\begin{aligned}&||\pi ^{(i)}(\mathbf{w}_j)||_{{\mathcal {A}}^s}\!\lesssim \!||\pi ^{(i)}(\mathbf{w}_{j-1})||_{{\mathcal {A}}^s}\,+\, \omega C_\mathbf {A} \bigl ( d_\mathbf {A}^{-1} \ln (2M_\mathbf {A}/\eta _{j-1}) \bigr )^{(1+s)b_\mathbf {A}} \\&\qquad \, ||\pi ^{(i)}(\mathbf{w}_{j-1})||_{{\mathcal {A}}^s}+\ \omega ||\pi ^{(i)}(\mathbf {f})||_{{\mathcal {A}}^s}\,. \end{aligned}$$
Using these estimates recursively together with (131) and (130) we obtain
$$\begin{aligned} ||\pi ^{(i)}(\mathbf{w}_j)||_{{\mathcal {A}}^s}\lesssim |\ln \varepsilon _k|^{j\,(1+s)\, b_\mathbf {A}} \max \bigl \{ ||\pi ^{(i)}(\mathbf{u})||_{{\mathcal {A}}^s}, ||\pi ^{(i)}(\mathbf {f})||_{{\mathcal {A}}^s}\bigr \} \end{aligned}$$
and
$$\begin{aligned} \sum _{i=1}^m \#{{\mathrm{supp}}}_i(\mathbf{w}_{j}) \lesssim |\ln \varepsilon _k|^{j\,(1+s^{-1})\, b_\mathbf {A}} \Big (\, \sum _{i=1}^m \max \{||\pi ^{(i)}(\mathbf{u})||_{{\mathcal {A}}^s}, ||\pi ^{(i)}(\mathbf {f})||_{{\mathcal {A}}^s}\} \Big )^\frac{1}{s} \,\varepsilon _k^{-\frac{1}{s}} \,. \end{aligned}$$
The total number of operations for the calls of \({{\mathrm{\textsc {apply}}}}\) in an inner loop according to (119) is dominated by the total number of operations for the calls of \({{\mathrm{\textsc {recompress}}}}\), which can be bounded up to a constant by
$$\begin{aligned} m |{{\mathrm{rank}}}(\mathbf{w}_J)|_\infty ^4 + |{{\mathrm{rank}}}(\mathbf{w}_J)|_\infty ^2 \sum _{i=1}^m \#{{\mathrm{supp}}}_i(\mathbf{w}_{J})\,. \end{aligned}$$
We thus arrive at (128). \(\square \)

Remark 18

The preceding results apply directly to problems posed on separable tensor product Hilbert spaces for which tensor product Riesz bases are available. Note that this is not the case for standard Sobolev spaces \(\mathrm{H}^{s}(\varOmega ^d)\) since in this case the norm induced by the scalar product is not a cross norm. However, for tensor product domains \(\varOmega ^d\) these spaces can be represented as intersections of \(d\) tensor product spaces with induced norms.

As mentioned in the introduction, from a sufficiently regular tensor product wavelet basis \( \{ \varPsi _\nu := \psi _{\nu _1}\otimes \cdots \otimes \psi _{\nu _d}\}_{\nu \in \nabla ^d}\) of \(\mathrm{L}_{2}(\varOmega ^d)\) we can obtain a Riesz basis of \(\mathrm{H}^{s}(\varOmega ^d)\) by a level-dependent rescaling of basis functions, for example,
$$\begin{aligned} \bigl \{ 2^{-s \max _i|\nu _i|} \varPsi _\nu \bigr \}_{\nu \in \nabla ^d} \,. \end{aligned}$$
To again arrive at a problem on \(\mathrm{\ell }_{2}\), we now rewrite the original operator equation \(Au=f\), with \(A:\mathrm{H}^{s}(\varOmega ^d)\rightarrow (\mathrm{H}^{s}(\varOmega ^d))'\), in the form
$$\begin{aligned}&\sum _{\mu \in \nabla ^d} \bigl ( 2^{-s(\max _i|\nu _i| + \max _i|\mu _i|)} \langle A \varPsi _\mu , \varPsi _\nu \rangle \bigr ) \bigl (2^{s\max _i|\mu _i|}\langle u, \varPsi _\mu \rangle \bigr )\\&\quad = 2^{-s\max _i|\nu _i|} \langle f, \varPsi _\nu \rangle , \quad \nu \in \nabla ^d\,. \end{aligned}$$
We thus obtain a well-posed problem on \(\mathrm{\ell }_{2}(\nabla ^d)\) for the rescaled coefficient sequence \(\mathbf {u} = (2^{s\max _i |\mu _i|}\langle u,\Psi _\mu \rangle )_\mu \), where \(\mathbf {A} = (2^{-s(\max _i |\nu _i| + \max _i|\mu _i|)} \langle A \Psi _\mu , \Psi _\nu \rangle )_{\nu ,\mu }\).
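As a minimal illustration of this rescaling (not part of the scheme itself; the function name and the example levels are hypothetical), the following sketch evaluates the level-dependent weight for a given multi-index of wavelet levels:

    # Illustrative sketch: level-dependent weight 2^{-s * max_i |nu_i|} used in the
    # rescaling above. Because of the maximum over the modes, the weight does not
    # factorize into a product of one-dimensional weights, which reflects the rank
    # issue discussed in the following paragraph.
    def weight(nu_levels, s):
        """nu_levels: the wavelet levels |nu_1|, ..., |nu_d| of a multi-index."""
        return 2.0 ** (-s * max(nu_levels))

    print(weight((0, 3, 1), s=1.0))  # 2^{-3} = 0.125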

This diagonal rescaling, which in the case of finite-dimensional Galerkin approximations corresponds to a preconditioning of \(A\), leads to additional problems in our context: the sequence \((2^{-s \max _i|\nu _i|})_{\nu \in \nabla ^d}\) (as well as possible equivalent alternatives) has infinite rank on the full index set \(\nabla ^d\). Hence, the application of \(\mathbf {A}\) must involve an approximation by low-rank operators, as discussed in Sect. 4.2. Strategies for handling this issue are discussed in more detail in [2]. The complexity analysis of iterative schemes when \(\mathbf {A}\) involves such a rescaling will be treated in a separate paper.

6 Numerical Experiments

We choose our example, which is intended to illustrate the results of the previous section numerically, according to several criteria. To arrive at a valid comparison between different dimensions, we choose a problem on \(\mathrm{L}_{2}((0,1)^d)\) that has similar properties for different values of \(d\). The problem has a discontinuous right-hand side and solution, which means that reasonable convergence rates can be achieved only by adaptive approximation. It is also sufficiently simple that all constants used in Algorithm 5.1 can be chosen rigorously according to the requirements of the convergence analysis.

We set \(\varOmega :=(0,1)^d\) and use tensor order \(m=d\). As an orthonormal wavelet basis \(\{ \psi _\nu \}_{\nu \in \nabla }\) of \(\mathrm{L}_{2}(0,1)\) we use Alpert multiwavelets [1] of polynomial order \(p \in \mathrm{I}\!\mathrm{N}\). Let
$$\begin{aligned} (T v )(t):= \int \limits _0^t v \,\mathrm{d}s, \end{aligned}$$
then \(T\) is a compact operator on \(\mathrm{L}_{2}(0,1)\) with \(||T|| = 2 / \pi \). The infinite matrix representation \(\bigl ( \langle T \psi _\mu ,\psi _\nu \rangle \bigr )_{\nu ,\mu \in \nabla }\) is \(s^*\)-compressible for any \(s^* >0\).
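The value of \(||T||\) can be cross-checked numerically; the following sketch, assuming a simple midpoint-type discretization on a uniform grid (this is only an illustration and not related to the implementation used for the experiments below), compares the largest singular value of the discretized operator with \(2/\pi \):

    import numpy as np

    # Discretize (Tv)(t) = int_0^t v(s) ds on a uniform grid of (0,1) and compare
    # the spectral norm of the resulting lower-triangular matrix with 2/pi.
    n = 2000
    h = 1.0 / n
    T = h * (np.tril(np.ones((n, n))) - 0.5 * np.eye(n))  # midpoint-type rule
    print(np.linalg.norm(T, 2))  # approximately 0.6366
    print(2 / np.pi)             # 0.63661...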
For \(f\in \mathrm{L}_{2}(\varOmega )\) we consider the integral equation
$$\begin{aligned} \Big ( \mathrm{I}- \omega _d \bigotimes _{i=1}^d T \Big ) u = f \end{aligned}$$
(132)
with \(\omega _d = \frac{1}{2}(\frac{\pi }{2})^d\). Note that for \(B := \omega _d \bigotimes _{i=1}^d T\) and \(A := \mathrm{I}-B\) we have \(||B|| = \frac{1}{2}\), and therefore
$$\begin{aligned} A^{-1} = (\mathrm{I}- B)^{-1} = \sum _{k=0}^\infty B^k = \sum _{k=0}^\infty \omega _d^k \bigotimes _{i=1}^d T^k. \end{aligned}$$
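The stated value of \(||B||\) follows since the spectral norm of an elementary tensor product operator is the product of the norms of its factors, and the Neumann series then also bounds \(||A^{-1}||\):
$$\begin{aligned} ||B|| = \omega _d\, ||T||^d = \tfrac{1}{2}\bigl (\tfrac{\pi }{2}\bigr )^d \bigl (\tfrac{2}{\pi }\bigr )^d = \tfrac{1}{2} \,, \qquad ||A^{-1}|| \le \sum _{k=0}^\infty ||B||^k = 2 \,. \end{aligned}$$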
Furthermore, \(A\) is a nonsymmetric, \(\mathrm{L}_{2}\)-elliptic operator with \(\langle A v, v\rangle \ge \frac{1}{2} ||v||^2_{\mathrm{L}_{2}(\varOmega )}\) and \(||A|| \le \frac{3}{2}\). Since \(\mathbf {A}\) is the representation with respect to an orthonormal basis, we obtain \(\lambda _\mathbf {A}= \frac{1}{2}\) and \(\varLambda _\mathbf {A}= \frac{3}{2}\). Due to the special structure of the operator, choosing the iteration parameter \(\omega := 1\), we have \(||\mathrm{I}- \omega \mathbf {A}|| \le \frac{1}{2} =: \rho \). We choose the right-hand side as
$$\begin{aligned} f = (1-\tau ) {\sum _{k=0}^\infty \tau ^k \bigotimes _{i=1}^d f_k },\quad f_k(x) := \sqrt{2 \pi }\,\chi _{[0,1/\pi ]} \cos (2 \pi ^2 (k+1) x), \end{aligned}$$
(133)
where \(\tau \in (0,1)\). This gives \(||f_k||_{\mathrm{L}_{2}(0,1)} =1\) and \(||f||_{\mathrm{L}_{2}(\varOmega )} =||\mathbf {f}|| = 1\), and \(\pi ^{(i)}(\mathbf {f}) \in {{\mathcal {A}}^s}\) for any \(s>0\). The functions \(f_k\) have jump discontinuities at \(\pi ^{-1}\), which need to be resolved adaptively to maintain the optimal approximation rate for the given wavelet basis.
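The normalization \(||f_k||_{\mathrm{L}_{2}(0,1)} = 1\) can be verified directly:
$$\begin{aligned} ||f_k||^2_{\mathrm{L}_{2}(0,1)} = 2\pi \int \limits _0^{1/\pi } \cos ^2\bigl (2\pi ^2(k+1)x\bigr )\,\mathrm{d}x = 1 + \frac{\sin \bigl (4\pi (k+1)\bigr )}{4\pi (k+1)} = 1 \,, \end{aligned}$$
since \(\sin (4\pi (k+1)) = 0\) for every integer \(k \ge 0\).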
From the expansion for \((\mathrm{I}- B)^{-1}\) we already know that \(\pi ^{(i)}(\mathbf{u})\in {{\mathcal {A}}^s}\) for any \(s < p\), for \(i=1,\ldots ,m\). We also have the explicit representation
$$\begin{aligned} u = (1-\tau ) \sum _{k, n=0}^\infty \tau ^k \omega _d^n \bigotimes _{i=1}^d T^n f_k \,. \end{aligned}$$
For the present choice of \(f_k\), evaluating \(\omega _d^n \bigotimes _{i=1}^d T^n f_k\) shows that \(u\rightarrow f\) as \(d\rightarrow \infty \); that is, the mode singular values of the solution approach exponential decay at rate \(\tau \) as \(d\) grows. Since \(||u-f||_{\mathrm{L}_{2}}\) is small for any \(d>3\), \(\mathbf{u}\) has similar low-rank approximability for all relevant \(d\).
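A rough quantitative impression of this effect can be obtained as follows, under the simplifying assumption that the term \(\omega _d \bigotimes _{i=1}^d T f_0\) (up to the factor \(1-\tau \)) dominates the correction \(u - f\); the grid and quadrature below are chosen only for this illustration:

    import numpy as np

    # Compute ||T f_0||_{L2(0,1)} by quadrature and the size omega_d * ||T f_0||^d
    # of the assumed leading correction term for several dimensions d.
    t = np.linspace(0.0, 1.0, 20001)
    h = t[1] - t[0]
    f0 = np.sqrt(2 * np.pi) * (t <= 1 / np.pi) * np.cos(2 * np.pi ** 2 * t)
    # trapezoidal antiderivative: (T f_0)(t_i) ~ cumulative sum of trapezoids
    Tf0 = np.concatenate(([0.0], np.cumsum(0.5 * (f0[1:] + f0[:-1]) * h)))
    norm_Tf0 = np.sqrt(np.sum(Tf0 ** 2) * h)
    for d in (4, 8, 16):
        omega_d = 0.5 * (np.pi / 2) ** d
        print(d, omega_d * norm_Tf0 ** d)  # decays rapidly with d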

Hence, for our particular choice of \(f\) the action of \(A^{-1}\) is close to the identity. It should be emphasized, however, that this only simplifies the interpretation of the results but does not simplify the problem from a computational point of view since our algorithm does not make use of this particularity. We have also chosen a problem that is completely symmetric with respect to all variables to simplify the tests and the comparison between values of \(d\), but we do not make computational use of this symmetry.

For the additional constants arising in the iteration, we choose \(\theta := \frac{1}{2}\) and \(\beta :=1\). For the hierarchical Tucker format we have \(\kappa _\mathrm{P}= \sqrt{2m-3}\) and \(\kappa _\mathrm{C}= \sqrt{m}\) and fix the derived constants \(\kappa _1,\kappa _2,\kappa _3\) as in Theorem 9 by taking \(\alpha :=1\). Furthermore, we have \(\delta = \lambda _\mathbf {A}^{-1}||\mathbf {f}|| = 2\).
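For reference, a small helper (illustrative only; the function name is ours) evaluating these constants from the formula stated before Theorem 9:

    import math

    # Constants used in the experiments: kappa_P, kappa_C for the hierarchical
    # Tucker format and the derived kappa_1, kappa_2, kappa_3, here with alpha = 1.
    def iteration_constants(m, alpha=1.0):
        kappa_P = math.sqrt(2 * m - 3)
        kappa_C = math.sqrt(m)
        kappa_1 = 1.0 / (1.0 + (1.0 + alpha) * (kappa_P + kappa_C + kappa_P * kappa_C))
        kappa_2 = (1.0 + alpha) * kappa_P * kappa_1
        kappa_3 = kappa_C * (kappa_P + 1.0) * (1.0 + alpha) * kappa_1
        return kappa_P, kappa_C, kappa_1, kappa_2, kappa_3

    print(iteration_constants(32))  # m = d = 32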

Remark 19

Since many steps of the algorithm, including the comparatively expensive approximate application of lower-dimensional operators to tensor factors and the \(QR\) factorizations of mode frames, can be performed independently for each mode, an effective parallelization of our adaptive scheme is easy to achieve.

In all following examples, we use piecewise cubic wavelets. The implementation was done in C++ using standard LAPACK routines for linear algebra operations. Iterations are stopped as soon as a required wavelet index cannot be represented as a signed 64-bit integer.

We make some simplifications in counting the number of required operations: for each matrix–matrix product, \(QR\) factorization, and SVD we use the standard estimates for the required number of multiplications (e.g., [20]); for the approximation of \(\mathbf {A}\) and \(\mathbf {f}\) we count one operation per multiplication with a matrix entry and per generated right-hand side entry, respectively (note that we thus make the simplifying assumption that all required wavelet coefficients can be evaluated using \({\mathcal {O}}(1)\) operations, which could in principle be realized in the present example but is not strictly satisfied in our current implementation). We thus neglect some minor contributions that play no asymptotic role, such as the number of operations required for adding two tensor representations, and the sorting of tensor contraction values for \({{\mathrm{\textsc {coarsen}}}}\), which is done here by a standard library call for simplicity.
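As an illustration of this bookkeeping (leading-order multiplication counts only, with all moderate constants dropped; the implementation uses the standard estimates from [20]), the counters could be organized as follows:

    # Simplified operation-count model, for illustration only.
    def ops_matmat(m, k, n):   # product of an (m x k) and a (k x n) matrix
        return m * k * n

    def ops_qr(m, n):          # QR factorization of an (m x n) matrix, m >= n
        return m * n * n

    def ops_svd(m, n):         # SVD of an (m x n) matrix, m >= n
        return m * n * n

    total_ops = ops_matmat(1000, 20, 20) + ops_qr(1000, 20) + ops_svd(1000, 20)
    print(total_ops)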

6.1 Results with Right-Hand Side of Rank 1

For comparison, we first consider a simplified version of the right-hand side reduced to the first summand, that is,
$$\begin{aligned} f = \bigotimes _{i=1}^d \sqrt{2 \pi }\,\chi _{[0,1/\pi ]} \cos (2 \pi ^2 \,\cdot ). \end{aligned}$$
In high dimensions, the solution \(u\) coincides with \(f\) up to very small correction terms.
The evolution of the computed approximate residual norms and of the corresponding estimates for the \(\mathrm{L}_{2}\)-deviation from the solution of the infinite-dimensional problem is shown in Fig. 1. Here one can clearly observe the effect of the coarsening steps after a certain number of inner iterations. Apart from the expected increase in the number \(J\) of inner iterations with the dimension, the iteration behaves quite similarly for different \(d\). In particular, in each case the resulting iterates \(\mathbf{w}_j\) in Algorithm 5.1 have rank 1, the residuals \(\mathbf {r}_j\) have ranks at most 3, and thus the maximum rank arising in the iteration is 4.
Fig. 1

Computed approximate residual norms (markers) and corresponding solution error estimates (solid lines), for \(f\) of rank one, depending on the total number of inner iterations (horizontal axis)

Note that the iteration is stopped a few steps earlier with increasing dimension because slightly stricter error tolerances are applied in the approximation of operator and right-hand side. This means that the technical limit for the maximum possible wavelet level is reached earlier.

We see that the number of operations, shown in Fig. 2, increases approximately like \(\varepsilon ^{-1/4}\), corresponding to the approximation order \(4\) of our wavelet basis. What is most remarkable here, however, is the very mild, almost linear, dependence of the total complexity on the dimension: a doubling of the dimension leads to only slightly more than twice the number of operations.
Fig. 2

Total operation count (square: \(d=32\), circle: \(d=64\), times: \(d=128\)) at the end of each inner iteration depending on the estimated error (horizontal axis), for \(f\) of rank 1. The triangle shows a slope of \(\frac{1}{4}\)

6.2 Results with Right-Hand Side of Unbounded Rank

We now use the full right-hand side \(f\) as in (133), which leads to a solution with approximately the same exponential decay of singular values as \(f\).

As shown in Fig. 3, the computed residual estimates and the corresponding estimates for the solution error behave quite similarly to the previous example. In the present case, the computed residual norms show a less regular pattern, which is mostly due to the adjustment of approximation ranks for the right-hand side.
Fig. 3

Computed approximate residual norms (markers) and corresponding solution error estimates (solid lines), for \(f\) of unbounded rank, in dependence on the total number of inner iterations (horizontal axis)

The ranks of the produced iterates \(\mathbf{w}_j\), as well as those of the intermediate quantities arising in the iteration (see line 8 of Algorithm 5.1 prior to the recompression operation), show a steady but controlled increase during the iteration, as shown in Fig. 4.
Fig. 4

Maximum ranks of iterates \(\mathbf{w}_j\) (solid lines) and maximum ranks of all intermediates arising in the inner iteration steps (dashed lines), for \(f\) of unbounded rank, in dependence on the total number of inner iterations (horizontal axis)

Note that, in this case, the number of operations, shown in Fig. 5, increases visibly faster than the limiting rate corresponding to the approximation order of the lower-dimensional multiresolution spaces. Due to the higher tensor ranks involved, this is to be expected in view of our complexity estimates. The increase of complexity with the problem dimension, however, remains very moderate.
Fig. 5

Operation count (square: \(d=32\), circle: \(d=64\), times: \(d=128\)) at the end of each inner iteration depending on the estimated error (horizontal axis), for \(f\) of unbounded rank. The triangle shows a slope of \(\frac{1}{4}\)

7 Conclusion and Outlook

The presented theory and examples indicate that the schemes developed in this work can be applied to very high-dimensional problems, with a rigorous foundation for the type of elliptic operator equations considered here. The results can be extended to more general operator equations, as long as the variational formulation, in combination with a suitable basis, induces a well-conditioned isomorphism on \(\mathrm{\ell }_{2}\). However, when the operator represents an isomorphism between spaces that are not simple tensor products, such as Sobolev spaces and their duals, additional concepts are required, which will be developed in a subsequent publication.

Acknowledgments

This work was funded in part by the Excellence Initiative of the German Federal and State Governments, DFG Grant GSC 111 (Graduate School AICES), the DFG Special Priority Program 1324, and NSF Grant #1222390.

References

  1. Alpert, B.: A class of bases in \(L^2\) for the sparse representation of integral operators. SIAM J. Math. Anal. 24(1), 246–262 (1991)
  2. Bachmayr, M.: Adaptive low-rank wavelet methods and applications to two-electron Schrödinger equations. Ph.D. thesis, RWTH Aachen (2012)
  3. Ballani, J., Grasedyck, L.: A projection method to solve linear systems in tensor format. Numer. Linear Algebra Appl. 20(1), 27–43 (2013)
  4. Barinka, A.: Fast evaluation tools for adaptive wavelet schemes. Ph.D. thesis, RWTH Aachen (2005)
  5. Beylkin, G., Mohlenkamp, M.J.: Numerical operator calculus in higher dimensions. PNAS 99(16), 10246–10251 (2002)
  6. Beylkin, G., Mohlenkamp, M.J.: Algorithms for numerical analysis in high dimensions. SIAM J. Sci. Comput. 26(6), 2133–2159 (2005)
  7. Cancès, E., Ehrlacher, V., Lelièvre, T.: Convergence of a greedy algorithm for high-dimensional convex nonlinear problems. Math. Models Methods Appl. Sci. 21(12), 2433–2467 (2011)
  8. Cohen, A.: Numerical Analysis of Wavelet Methods. Studies in Mathematics and Its Applications, vol. 32. Elsevier, Amsterdam (2003)
  9. Cohen, A., Dahmen, W., DeVore, R.: Adaptive wavelet methods for elliptic operator equations: convergence rates. Math. Comput. 70(233), 27–75 (2001)
  10. Cohen, A., Dahmen, W., DeVore, R.: Adaptive wavelet methods II—beyond the elliptic case. Found. Comput. Math. 2(3), 203–245 (2002)
  11. Dahmen, W.: Wavelet and multiscale methods for operator equations. Acta Numer. 6, 55–228 (1997)
  12. DeVore, R., Petrova, G., Wojtaszczyk, P.: Approximation of functions of few variables in high dimensions. Constr. Approx. 33, 125–143 (2011)
  13. Dijkema, T.J., Schwab, C., Stevenson, R.: An adaptive wavelet method for solving high-dimensional elliptic PDEs. Constr. Approx. 30(3), 423–455 (2009)
  14. Falcó, A., Hackbusch, W.: On minimal subspaces in tensor representations. Found. Comput. Math. 12, 765–803 (2012)
  15. Falcó, A., Hackbusch, W., Nouy, A.: Geometric structures in tensor representations. Preprint 9/2013, Max Planck Institute for Mathematics in the Sciences, Leipzig (2013)
  16. Falcó, A., Nouy, A.: Proper generalized decomposition for nonlinear convex problems in tensor Banach spaces. Numer. Math. 121, 503–530 (2012)
  17. Grasedyck, L.: Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl. 31(4), 2029–2054 (2010)
  18. Grasedyck, L., Kressner, D., Tobler, C.: A literature survey of low-rank tensor approximation techniques. GAMM-Mitt. 36, 53–78 (2013)
  19. Griebel, M., Harbrecht, H.: Approximation of two-variate functions: singular value decomposition versus regular sparse grids. INS Preprint No. 1109, Universität Bonn (2011)
  20. Hackbusch, W.: Tensor Spaces and Numerical Tensor Calculus. Springer Series in Computational Mathematics, vol. 42. Springer, Berlin (2012)
  21. Hackbusch, W., Khoromskij, B., Tyrtyshnikov, E.: Approximate iterations for structured matrices. Numer. Math. 109, 119–156 (2008)
  22. Hackbusch, W., Kühn, S.: A new scheme for the tensor representation. J. Fourier Anal. Appl. 15(5), 706–722 (2009)
  23. Hitchcock, F.L.: Multiple invariants and generalized rank of a \(p\)-way matrix or tensor. J. Math. Phys. 7, 39–79 (1927)
  24. Khoromskij, B.N., Schwab, C.: Tensor-structured Galerkin approximation of parametric and stochastic elliptic PDEs. SIAM J. Sci. Comput. 33(1), 364–385 (2011)
  25. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)
  26. Kressner, D., Tobler, C.: Preconditioned low-rank methods for high-dimensional elliptic PDE eigenvalue problems. Comput. Methods Appl. Math. 11(3), 363–381 (2011)
  27. De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21(4), 1253–1278 (2000)
  28. Matthies, H.G., Zander, E.: Solving stochastic systems with low-rank tensor compression. Linear Algebra Appl. 436(10), 3819–3838 (2012)
  29. Metselaar, A.: Handling wavelet expansions in numerical methods. Ph.D. thesis, University of Twente (2002)
  30. Novak, E., Woźniakowski, H.: Approximation of infinitely differentiable multivariate functions is intractable. J. Complexity 25, 398–404 (2009)
  31. Oseledets, I., Tyrtyshnikov, E.: Breaking the curse of dimensionality, or how to use SVD in many dimensions. SIAM J. Sci. Comput. 31(5), 3744–3759 (2009)
  32. Oseledets, I., Tyrtyshnikov, E.: Tensor tree decomposition does not need a tree. Tech. Rep. 2009-08, RAS, Moscow (2009)
  33. Oseledets, I.V.: Tensor-train decomposition. SIAM J. Sci. Comput. 33(5), 2295–2317 (2011)
  34. Schneider, R., Uschmajew, A.: Approximation rates for the hierarchical tensor format in periodic Sobolev spaces. J. Complexity 30, 56–71 (2014)
  35. de Silva, V., Lim, L.H.: Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J. Matrix Anal. Appl. 30(3), 1084–1127 (2008)
  36. Stevenson, R.: On the compressibility of operators in wavelet coordinates. SIAM J. Math. Anal. 35(5), 1110–1132 (2004)
  37. Tucker, L.R.: The extension of factor analysis to three-dimensional matrices. In: Contributions to Mathematical Psychology, pp. 109–127. Holt, Rinehart & Winston, New York (1964)
  38. Tucker, L.R.: Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279–311 (1966)
  39. Uschmajew, A.: Well-posedness of convex maximization problems on Stiefel manifolds and orthogonal tensor product approximations. Numer. Math. 115, 309–331 (2010)
  40. Uschmajew, A.: Regularity of tensor product approximations to square integrable functions. Constr. Approx. 34, 371–391 (2011)

Copyright information

© SFoCM 2014

Authors and Affiliations

  1. Institut für Geometrie und Praktische Mathematik, RWTH Aachen, Aachen, Germany