# Iterative Methods Based on Soft Thresholding of Hierarchical Tensors


## Abstract

We construct a soft thresholding operation for rank reduction in hierarchical tensors and subsequently consider its use in iterative thresholding methods, in particular for the solution of discretized high-dimensional elliptic problems. The proposed method for the latter case adjusts the thresholding parameters, by an a posteriori criterion requiring only bounds on the spectrum of the operator, such that the arising tensor ranks of the resulting iterates remain quasi-optimal with respect to the algebraic or exponential-type decay of the hierarchical singular values of the true solution. In addition, we give a modified algorithm using inexactly evaluated residuals that retains these features. The effectiveness of the scheme is demonstrated in numerical experiments.

### Keywords

Low-rank tensor approximation · Hierarchical tensor format · Soft thresholding · High-dimensional elliptic problems

### Mathematics Subject Classification

41A46 · 41A63 · 65D99 · 65F10 · 65N12 · 65N15

## 1 Introduction

Low-rank tensor decompositions have proven to be a very successful tool in the numerical approximation of high-dimensional problems, such as partial differential equations posed on high-dimensional domains. Such problems arise, for instance, in the context of multi-parametric, stochastic, or quantum-physical models; for an overview of various applications of structured tensor decompositions, we refer to the survey articles [25, 28, 34] and the references given there. In this work, we consider the task of finding approximations in low-rank form of high-dimensional functions defined implicitly as solutions of linear operator equations. This problem is not only of intrinsic interest, but also arises, for instance, in methods for time-dependent or eigenvalue problems.

Low-rank approximations can exploit particular structural features to achieve highly compressed representations of high-dimensional objects. However, this comes at the price of a strong degree of nonlinearity in these representations that makes the computation of such approximations a challenging problem. In the case of subspace-based tensor representations such as the *hierarchical Tucker format* [30] or the special case of the *tensor train format* [46], one has a notion of tensor rank as a tuple of matrix ranks of certain matricizations. From a computational perspective, these tensor formats have the major advantage that for any tensor given in such a representation, by a simple combination of linear algebra procedures, one may obtain an error-controlled, quasi-optimal approximation by a tensor of lower ranks. This is achieved by truncating the ranks of a *hierarchical singular value decomposition* [24, 47, 50], or HSVD for short, of the tensor.

We consider an alternative procedure for reducing ranks that is based on *soft thresholding* of the singular values in an HSVD, as opposed to the mentioned rank truncation (which would correspond to their hard thresholding). The new procedure has similar complexity and quasi-optimality properties, but unlike the truncation it is *non-expansive*. Making essential use of this feature, we construct iterative methods that are both guaranteed to converge and ensure strong tensor rank bounds for each iterate under very general conditions on the operator.

Even when both \({\mathcal {A}}\) and \({\mathbf {f}}\) have exact low-rank representations, the unique solution \( \mathbf {u}^*\) of the problem (1.1) need no longer be of low rank. It turns out, however, that in many cases of interest, \(\mathbf {u}^*\) can still be efficiently *approximated* by low-rank tensors up to any given error tolerance. Here, one can obtain algebraic error decay with respect to the ranks under fairly general conditions [38, 49], and superalgebraic or exponential-type decay in more specific situations [1, 15, 23, 37].

When a solution \(\mathbf{u}^*\) that has this property is approximated by an iteration such as (1.3), it is not clear to what extent the iterates \(\mathbf{u}_{k}\) also retain comparably low ranks, since the basic iteration without truncation could in principle lead to an exponential rank increase. That the ranks of \(\mathbf{u}_k\) remain comparable to those needed for approximating \(\mathbf{u}^*\) at the current accuracy therefore depends essentially on the appropriate choice of thresholding parameters \(\alpha _k\). Keeping the tensor ranks of the iterates as low as possible is of crucial importance for the computational complexity of such methods, since the number of operations for the procedures that need to be performed, such as orthogonalizations, grows like the fourth power of these ranks.

A fundamental question is therefore whether in the course of such an iteration, one can always fully profit from the low-rank approximability of \(\mathbf{u}^*\). In principle, the ranks of \(\mathbf{u}_k\) could depend, for instance, on the presence of a strong low-rank structure of \({\mathcal {A}}\), or on sufficiently good starting values \(\mathbf{u}_0\). We show in this work that when the rank reduction in each step is done by the soft thresholding procedure, *quasi-optimal* tensor ranks can be enforced *for each single iterate* \(\mathbf{u}_k\), independently of the rank increase caused by \({\mathcal {A}}\). Here, we may always start with rank zero, \(\mathbf{u}_0 = 0\). The corresponding \(\alpha _k\) are chosen by a simple a posteriori rule, requiring no information on the low-rank approximability of \(\mathbf{u}^*\). Our scheme thus automatically strikes a compromise, under quite general assumptions, between maintaining linear convergence of the iteration to \(\mathbf{u}^*\) and preventing the ranks of iterates from growing more strongly than necessary.

Throughout this paper, the notation \(A\lesssim B\) is used to indicate that there exists a constant \(C>0\) such that \(A \le C B\), and \(A \sim B\) if and only if \(A \lesssim B\) and \(B\lesssim A\).

### 1.1 Quasi-Optimality

By quasi-optimality of tensor ranks, we refer to the following property: Assuming that the hierarchical singular values of \(\mathbf{u}^*\) have a certain algebraic or exponential-type decay, the maximum tensor rank of each \(\mathbf{u}_k\) can be bounded, up to a uniform multiplicative constant, by the maximum hierarchical rank of the best hierarchical tensor approximation to \(\mathbf{u}^*\) of the same accuracy. To ensure this, we exploit the non-expansiveness of soft thresholding, which allows us to choose the thresholding parameters in each step as large as required to control the ranks, without compromising the convergence of the iteration.

In what follows, we use *weak*-\(\ell ^p\) *spaces* to quantify algebraic decay of sequences. These spaces are defined as follows: For a given real sequence \(a = (a_k)_{k\in {\mathbb {N}}}\), for each \(n\in {\mathbb {N}}\), let \(a^*_n\) be the *n*-th largest of the values \(|a_k|\). Then, for \(p>0\), the space \(\ell ^{p,\infty }\) is defined as the collection of sequences for which

$$\begin{aligned} |a|_{\ell ^{p,\infty }} {:}{=} \sup _{n \in {\mathbb {N}}} n^{\frac{1}{p}} \, a^*_n < \infty . \end{aligned}$$

For \(0< p <2\) and \(s {:}{=} \frac{1}{p} - \frac{1}{2}\), a sequence belongs to \(\ell ^{p,\infty }\) precisely when there exists \(C>0\) such that

$$\begin{aligned} \Bigl ( \sum _{k > n} |a^*_k|^2 \Bigr )^{\frac{1}{2}} \le C \, n^{-s} \quad \text {for all } n \in {\mathbb {N}}, \end{aligned}$$(1.4)

where the smallest such *C* is proportional to \(|a|_{\ell ^{p,\infty }}\), see, e.g., [17], that is, \(C = C_p |a|_{\ell ^{p,\infty }}\) where \(C_p>0\) depends only on *p*. In other words, the sequences in \(\ell ^{p,\infty }\) are precisely those for which the error of approximation in \(\ell ^2\) by the *n* entries largest in modulus decreases like \(n^{-s}\) or faster.
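To make these notions concrete, the following NumPy sketch (our own illustration, not part of the paper) computes the standard weak-\(\ell ^p\) quasi-norm \(\sup _n n^{1/p} a^*_n\) and checks the \(\ell ^2\) tail bound for an algebraically decaying sequence:

```python
import numpy as np

def weak_lp_quasinorm(a, p):
    """Weak-l^p quasi-norm |a|_{l^{p,oo}} = sup_n n^{1/p} a*_n,
    where a*_n is the n-th largest of the values |a_k|."""
    a_star = np.sort(np.abs(a))[::-1]              # non-increasing rearrangement
    n = np.arange(1, a_star.size + 1)
    return np.max(n ** (1.0 / p) * a_star)

# The sequence a_k = k^{-1/p} lies in l^{p,oo}; its quasi-norm is ~ 1.
p = 1.0
k = np.arange(1, 10001)
a = k ** (-1.0 / p)
print(weak_lp_quasinorm(a, p))                     # ~ 1.0

# l^2 tail after the n largest entries, compared with n^{-s}, s = 1/p - 1/2:
s = 1.0 / p - 0.5
tails = np.sqrt(np.cumsum(a[::-1] ** 2))[::-1]     # tails[n] = l2 norm of (a_k)_{k>n}
for n in (10, 100, 1000):
    print(tails[n] / n ** (-s))                    # bounded ratio
```

The bounded ratios in the last loop illustrate that the tail decays like \(n^{-s}\) with a constant proportional to the quasi-norm.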

Let us now exemplify our rank estimates in the case \(d=2\), that is, for the low-rank approximation of \(\mathbf{u}\in \mathcal {H}_1\otimes \mathcal {H}_2\). Here, the notion of rank is the rank of the Hilbert–Schmidt operator from \(\mathcal {H}_2\) to \(\mathcal {H}_1\) that is induced by \(\mathbf{u}\), which reduces to the matrix rank in the case of finite-dimensional \(\mathcal {H}_1, \mathcal {H}_2\). For what follows, let \(\sigma =(\sigma _k)_{k\ge 1}\) be the corresponding singular values of \(\mathbf{u}\).

The error of the best rank-*n* approximation of \(\mathbf{u}\) is then precisely \((\sum _{k>n} |\sigma _k|^2 )^\frac{1}{2}\). If \(\sigma \in \ell ^{p,\infty }\), then for the rank \(n_\varepsilon \) required to approximate \(\mathbf{u}\) up to an error \(\varepsilon \), by (1.4) we have the bound

$$\begin{aligned} n_\varepsilon \lesssim |\sigma |_{\ell ^{p,\infty }}^{\frac{1}{s}} \, \varepsilon ^{-\frac{1}{s}}, \qquad s = \textstyle \frac{1}{p} - \frac{1}{2}, \end{aligned}$$(1.5)

and if the singular values decay exponentially, \(\sigma _k \le C e^{-c k^\beta }\) with \(C, c, \beta > 0\), one obtains instead

$$\begin{aligned} n_\varepsilon \lesssim \bigl ( 1 + |\ln \varepsilon | \bigr )^{\frac{1}{\beta }}. \end{aligned}$$(1.6)

We establish estimates of the form (1.5) and (1.6) for the iterates of (1.3) with appropriately chosen \(\alpha _k\). In the case \(d>2\), these estimates are for the hierarchical tensor format. The ranks in the above estimates are then replaced by the maximum entry of the hierarchical rank tuple, and in the bounds, we obtain additional factors that are fixed powers of *d*.

Although our resulting procedure of the form (1.3) can in principle be formulated on infinite-dimensional Hilbert spaces, in this work we restrict our considerations concerning a numerically realizable version to *fixed discretizations*. In other words, in the form given here, the scheme applies either to infinite-dimensional \({\mathcal {H}}_i \simeq \ell ^2({\mathbb {N}})\) (which is of course not implementable in practice), or to a fixed finite-dimensional choice \({\mathcal {H}}_i \simeq {\mathbb {R}}^{n_i}\). That the iteration remains applicable on infinite-dimensional \({\mathcal {H}}_i\) is crucial, since this means that the method in itself does not introduce any dependence on finite tensor mode sizes \(n_i\). When suitable preconditioning is used such that the condition number of \({\mathcal {A}}\) remains bounded independently of \(n_i\) as well, the method is therefore robust with respect to the underlying discretization.

A further important point in this regard is that we do not make any restrictive assumptions on a low-rank structure of \({\mathcal {A}}\). For instance, one can reformulate problems posed on standard Sobolev spaces (e.g., on \(H^1\) on a product domain) in the present framework using suitable Riesz bases. This leads to \({\mathcal {A}}\) bounded on the tensor product space \(\mathcal {H}=\ell ^2({\mathbb {N}}^d)\) of the form (1.2). However, such \({\mathcal {A}}\) is in general not of bounded rank, and when one passes to a finite-dimensional setting, its ranks depend on the discretization; these issues are addressed in detail in [3]. The results of the present work apply also in such cases.

After describing the construction of the operation \(\mathbf {S}_{\alpha }\) in (1.3), we first identify choices of geometrically decreasing \(\alpha _k\) that lead to the desired rank estimates, provided that the asymptotic decay behavior of the hierarchical singular values of \(\mathbf{u}^*\) is known. We then construct a scheme which achieves the same type of rank bounds without using such knowledge. Based on an a posteriori criterion, the method adjusts \(\alpha _k\) to the unknown decay of the hierarchical singular values such that the quasi-optimal ranks are preserved. This method requires no a priori information beyond bounds on the spectrum of \({\mathcal {A}}\) and on the norm of \({\mathbf {f}}\). In a third step, we develop a perturbed version of the scheme that permits inexactly evaluated residuals.

### 1.2 Relation to Previous Work

The combination of a convergent iterative scheme with a low-rank truncation, such as the HSVD and its variants, has been proposed for solving problems in low-rank formats in a number of works, for instance in [4, 6, 7, 29, 33, 35, 37]. A typical problem is that either the truncation is performed with error tolerances that still ensure convergence of the iteration, in which case it is not clear how large the tensor ranks of the iterates can become, or one always truncates to a predefined target rank, in which case convergence is guaranteed only under very restrictive conditions on the underlying iteration. In [7], for instance, it is demonstrated that convergence is preserved with fixed-rank truncations in solving (1.1), provided that \({\mathcal {A}}\) has condition number very close to one.

In greedy methods [10, 20], low-rank approximations are improved by iteratively adding terms without performing any rank truncation operations. The convergence of such methods can be proven under quite general assumptions, but the resulting ranks are typically far from optimal when \(d>2\).

A further common approach is based on optimizing component tensors in a tensor decomposition to minimize a suitable objective functional. Examples are methods based on the alternating least squares approach and the density matrix renormalization group [31, 51], as well as Riemannian optimization on fixed-rank tensor manifolds [13, 36]. For further details and references concerning such methods, see also [28, §10] and [25]. With these strategies based on local optimization, one can easily restrict the tensor ranks; in fact, for many such methods, the ranks need to be fixed a priori. However, for arbitrary starting values, convergence to approximate solutions is not ensured. Such methods can indeed fail to produce good approximations, for instance due to convergence to local minima of the objective functional.

The AMEn scheme [18] is a hybrid approach in that it uses residual information, combined with optimization strategies that exploit the structure of the tensor format. Here, convergence can in fact be shown. Although the scheme has been demonstrated to perform well on typical model problems, it needs to be noted that the convergence theory given in [18] does not apply to the practically realized method (which uses additional tensor truncations), and the rank bounds that one obtains from the theory scale exponentially in *d* and do not relate to the approximability of the solution.

For the case that the rank reduction in an iteration for solving (1.1) is done by a truncated HSVD (i.e., by hard thresholding), a scheme for choosing thresholding parameters that lead to near-optimal ranks is given in [2, 3]. To the authors’ knowledge, this is the only previous instance of a method that, under realistic requirements on \({\mathcal {A}}\), guarantees global convergence to the true solution while at the same time, the arising ranks can be estimated in terms of the ranks required for approximating the solution. A limitation of the approach used there to control the ranks is that their near-optimality is enforced by truncating with a sufficiently large error tolerance, which can be done only after every few iterations when a certain error reduction has been achieved. The ranks of intermediate iterates can therefore still accumulate in the iterations between these complexity reductions and thus depend on the *low-rank structure of the operator*. At least concerning complexity bounds, this can be problematic if each application of the operator already causes a large rank increase (although this can be mitigated in practice, see Sect. 6). In the method proposed in this work, such an accumulation can be ruled out, and intermittent, sufficiently large increases in approximation errors that restore quasi-optimality are not required. A detailed comparison to the results of [2, 3] is given in Sect. 5.2.

Note that here we do not address the aspect of an adaptive underlying discretization of the problem as considered in [2, 3]. The version of our algorithm allowing inexact evaluation of residuals can, however, serve as a starting point for combining the method with adaptivity for identifying suitable discretizations in the course of the iteration. Furthermore, we expect that the concepts put forward here can also be used in the construction of adaptive methods for sparse basis representations.

Iterations using soft thresholding of sequences have been studied extensively in the context of inverse and ill-posed problems, see, e.g., [5, 8, 16], where they are especially well suited for obtaining convergence under very general conditions. Note that in such a setting, a priori choices of geometrically decreasing thresholding parameters have been proposed, e.g., in [14, 52]. Although the choice of thresholding parameters is an important issue also for ill-posed problems, these require entirely different strategies from the ones considered here, aimed at the identification of positive thresholding parameters that yield the desired regularization. Our approach for controlling the complexity of iterates—in the present case, the arising tensor ranks—in iterative schemes for well-posed problems, where the thresholding parameters converge to zero, appears to be new, in particular the a posteriori criterion that steers their decrease.

Soft thresholding of matrices, to which our procedure for hierarchical tensors reduces for \(d=2\), is also used in iterative methods for nuclear norm minimization in matrix completion, see [9, 42, 48], which are formally similar to (1.3), but have quite different features since the involved operators are far from being isomorphisms. In such a setting, dual gradient descent methods [40] can offer advantageous convergence behavior, but these exploit that iterates for the dual variable are in a relatively small linear space for completion problems and are thus not appropriate in our situation. In tensor completion, as a replacement of the nuclear norm in the matrix case, the sum of nuclear norms of matricizations has been proposed [32, 41]; however, with this approach, one does not recover the particular properties of the matrix case.^{1} In [21], repeated soft thresholding of matricizations of Tucker tensors is used in a splitting scheme for minimizing such sums of nuclear norms. This procedure bears some resemblance to our soft thresholding of hierarchical tensors, but eventually solves a different problem.

In an alternative variational formulation, one can prescribe an error tolerance, for instance \(\Vert {\mathcal {A}}\mathbf{v}-{\mathbf {f}}\Vert \le \varepsilon \), and attempt to minimize the tensor ranks over the set of such \(\mathbf{v}\). Although the admissible set is then convex, even in the matrix case \(d=2\), the rank does not define a convex functional. However, one can instead minimize an appropriate convex relaxation, such as the \(\ell ^1\)-norm of singular values. It is well known that in the matrix case, such relaxed problems can be solved by *proximal gradient methods*, which can be rewritten as iterative soft thresholding [42] and hence take precisely the form (1.3) when \(d=2\). In this case, our method can therefore also be motivated as a rank minimization scheme, although this connection does not play a role in the analysis. Note, however, that in the case of higher-order tensors, where our soft thresholding procedure no longer permits an interpretation of the resulting scheme as a proximal gradient method, this is only a formal analogy.

### 1.3 Outline

This article is arranged as follows: In Sect. 2, we collect some prerequisites concerning the hierarchical tensor format as well as soft thresholding of sequences and of Hilbert–Schmidt operators. In Sect. 3, we then describe and analyze the new soft thresholding procedure for hierarchical tensors. In Sect. 4, we consider the combination of this procedure with general contractive fixed-point iterations and derive rank estimates for sequences of thresholding parameters that are chosen based on a priori information on the tensor approximability of \(\mathbf{u}^*\). In Sect. 5, we introduce an algorithm that automatically determines a suitable choice of thresholding parameters without using information on \(\mathbf{u}^*\), analyze its convergence, and additionally give a modified version of the scheme based on inexact residual evaluations. In Sect. 6, we conclude with numerical experiments that illustrate the practical performance of the proposed method.

## 2 Preliminaries

By \(||\cdot ||\), we always denote either the canonical norm on \(\mathcal {H}\), which is the product of the norms on the \(\mathcal {H}_i\), or the \(\ell ^2\)-norm when applied to a sequence.

For separable Hilbert spaces \(\mathcal {S}_1, \mathcal {S}_2\), we write \({\mathrm {HS}}(\mathcal {S}_1, \mathcal {S}_2)\) for the space of Hilbert–Schmidt operators from \(\mathcal {S}_1\) to \(\mathcal {S}_2\) with the Hilbert–Schmidt norm \(||\cdot ||_{\mathrm {HS}}\), which reduces to the Frobenius norm in the case of finite-dimensional spaces. Hilbert–Schmidt operators have a singular value decomposition (SVD) with singular values in \(\ell ^2\) satisfying the following perturbation estimate, shown for matrices in [44] and for Hilbert–Schmidt operators in [43].

**Theorem 2.1**

(cf. [43, Cor. 5.3]) Let \(\mathcal {S}_1, \mathcal {S}_2\) be separable Hilbert spaces, let \(\mathbf {X}, {\tilde{\mathbf{X}}} \in {\mathrm {HS}}(\mathcal {S}_1, \mathcal {S}_2)\), and let \(\sigma , \tilde{\sigma }\in \ell ^2({\mathbb {N}})\) denote the corresponding sequences of singular values. Then, \(|| \sigma - \tilde{\sigma }||_{\ell ^2({\mathbb {N}})} \le || \mathbf {X} - {\tilde{\mathbf{X}}} ||_{{\mathrm {HS}}}\).
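The perturbation estimate of Theorem 2.1 is easy to observe numerically. The following NumPy sketch (our own, for illustration) checks it for a randomly perturbed matrix, where the Hilbert–Schmidt norm is the Frobenius norm:

```python
import numpy as np

# Numerical check of Theorem 2.1: the singular value sequences of two
# operators differ in l2 by at most the Hilbert-Schmidt (Frobenius)
# distance of the operators themselves.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 25))
Xt = X + 0.1 * rng.normal(size=(40, 25))

sigma = np.linalg.svd(X, compute_uv=False)
sigma_t = np.linalg.svd(Xt, compute_uv=False)

lhs = np.linalg.norm(sigma - sigma_t)
rhs = np.linalg.norm(X - Xt, 'fro')
print(lhs <= rhs)   # True
```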

### 2.1 The Hierarchical Tensor Format

We now briefly recall definitions and facts concerning the hierarchical Tucker format [30] and collect some basic observations that will play a role later. For further details on the hierarchical format, we refer to [28] and to the exhaustive treatment in [27].

We assume to be given a *binary dimension tree* for tensor order *d*, that is, a set \(\mathbb {T}\) of non-empty subsets of \(\{1,\ldots ,d\}\) such that \(\{1,\ldots ,d\} \in \mathbb {T}\) and for each \(n\in \mathbb {T}\), either \(\#n=1\) or there exist disjoint \(n_1,n_2\in \mathbb {T}\) such that \(n_1\cup n_2 = n\). Examples of such dimension trees for \(d=4\) are given in Fig. 1. A choice of \(\mathbb {T}\) that is closely related to the tensor train format is the linear dimension tree \(\bigl \{ \{1,\ldots ,d\} \bigr \} \cup \bigl \{ \{i\} :i = 1,\ldots ,d \bigr \} \cup \bigl \{ \{i,\ldots ,d\} :i = 2,\ldots ,d \bigr \}\).

For each \(n \in \mathbb {T}\) with \(n \ne \{1,\ldots ,d\}\), we define the *matricization* \({\mathcal {\hat{M}}}_n(\mathbf{u})\) as the canonical reinterpretation of \(\mathbf{u}\in \bigotimes _{i =1}^d \mathcal {H}_i\) as a Hilbert–Schmidt operator from \(\bigotimes _{i \notin n} \mathcal {H}_i\) to \(\bigotimes _{i \in n} \mathcal {H}_i\).

The subspaces arising in the hierarchical format satisfy a *nestedness property*: for each interior node *n* with children \(n_1, n_2\), one has \({\mathcal {U}}_n \subseteq {\mathcal {U}}_{n_1} \otimes {\mathcal {U}}_{n_2}\). The matricizations relevant for the format are indexed by the *effective edges* \(\{n_t, n_t^c\}\in \mathbb {E}\), \(t = 1,\ldots ,E\), of the dimension tree, and for \(n \in \mathbb {T}\), we denote by *t*(*n*) the uniquely determined element of \(\{1,\ldots ,E\}\) such that \(n=n_{t(n)}\) or \(n=n_{t(n)}^c\).

For \(t = 1,\ldots ,E\), we write \(\mathcal {M}_t {:}{=} {\mathcal {\hat{M}}}_{n_t}\) and call \(\mathcal {M}_t(\mathbf{u})\) the *t-matricization* of the tensor \(\mathbf{u}\). By \(\mathcal {M}^{-1}_t\), we denote the mapping that converts this matricization back to a tensor. Note that for each *t*, one has \(\mathcal {M}_t(\mathbf{u}) + \mathcal {M}_t(\mathbf{v}) = \mathcal {M}_t(\mathbf{u}+\mathbf{v})\) and \(||\mathcal {M}_t(\mathbf{u})||_{\mathrm {HS}} = ||\mathbf{u}||\).

We refer to the orthonormal systems spanning the subspaces at the leaves of the tree as *mode frames*, and to the coefficient tensors \(\mathbf {B}^{(n)}\) and \(\mathbf {C}\) at interior nodes as *transfer tensors*. Combining (2.5) and (2.4) recursively, one thus obtains a representation of \(\mathbf{u}\) only in terms of transfer tensors and mode frames, that is, in terms of tensors of order up to three. When such a representation uses an orthonormal basis for each \({\mathcal {U}}_n\) as we have supposed here, we refer to it as an *orthonormal hierarchical representation*. For such a representation, an SVD \(\mathbf {C}_{ij} = \sum _{\ell =1}^{r(n_1)} \sigma _\ell \mathbf {V}^{(n_1)}_{i,\ell } \mathbf {V}^{(n_2)}_{j,\ell } \) of \(\mathbf {C}\) as in (2.5) yields the SVD of the matricization of \(\mathbf{u}\) at the root edge.

The effective edges \(\mathbb {E}\) give rise to an equivalence relation between different dimension trees for a given *d*: In general, there are several dimension trees \(\mathbb {T}\) that share the same corresponding \(\mathbb {E}\), that is, involve the same matricizations of the tensor. The difference between \(\mathbb {T}\) that are equivalent in this sense is that they have the root element of the tree at a different effective edge. This is illustrated in Fig. 1 for a tensor of order four.

For each edge *t*, we denote by \({{\mathrm{rank}}}_t(\mathbf{u})\) the rank of the matricization \(\mathcal {M}_t(\mathbf{u})\), so that \(({{\mathrm{rank}}}_t(\mathbf{u}))_{t=1,\ldots ,E}\) constitutes the hierarchical rank tuple of \(\mathbf{u}\).

### 2.2 Soft Thresholding

Applied to each element of a vector or sequence, hard thresholding provides a very natural means of obtaining sparse approximations by dropping entries of small absolute value, which is closely related to best *n*-term approximation [17]. Soft thresholding not only replaces entries that have absolute value below the threshold by zero, but also decreases all remaining entries, incurring an additional error. However, this operation has a non-expansiveness property that is useful in the construction of iterative schemes. This property can be derived from a variational characterization.
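In concrete terms, for sequences both operations act entrywise by the standard formulas; the following sketch (our own, for illustration) contrasts them and checks the non-expansiveness of the soft variant:

```python
import numpy as np

def hard_threshold(x, alpha):
    # Keep entries with |x_k| > alpha, set the rest to zero.
    return np.where(np.abs(x) > alpha, x, 0.0)

def soft_threshold(x, alpha):
    # Shrink every entry toward zero by alpha: s_alpha(x) = sign(x) max(|x|-alpha, 0).
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

rng = np.random.default_rng(1)
x, y = rng.normal(size=100), rng.normal(size=100)
alpha = 0.5

# Non-expansiveness in l2: ||s_a(x) - s_a(y)|| <= ||x - y|| (soft variant only).
print(np.linalg.norm(soft_threshold(x, alpha) - soft_threshold(y, alpha))
      <= np.linalg.norm(x, 2) * 0 + np.linalg.norm(x - y))

# Hard thresholding is NOT non-expansive: a tiny perturbation across the
# threshold produces a jump.
print(hard_threshold(np.array([1.01]), 1.0), hard_threshold(np.array([0.99]), 1.0))
```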

For a Hilbert space \({\mathcal {G}}\), a convex functional \({\mathcal {J}}\) on \({\mathcal {G}}\), and \(\alpha > 0\), we define the *proximity operator* \({{\mathrm{prox}}}^{\alpha }_{{\mathcal {J}}} : {\mathcal {G}} \rightarrow {\mathcal {G}}\) by

$$\begin{aligned} {{\mathrm{prox}}}^{\alpha }_{{\mathcal {J}}}(\mathbf{v}) {:}{=} \mathop {{{\mathrm{argmin}}}}\limits _{\mathbf{w}\in {\mathcal {G}}} \Bigl \{ \textstyle \frac{1}{2} ||\mathbf{v}- \mathbf{w}||^2 + \alpha {\mathcal {J}}(\mathbf{w}) \Bigr \}. \end{aligned}$$

**Lemma 2.2**

Proximity operators of convex functionals are *non-expansive*, that is, \(|| {{\mathrm{prox}}}^{\alpha }_{{\mathcal {J}}}(\mathbf{u}) - {{\mathrm{prox}}}^{\alpha }_{{\mathcal {J}}}(\mathbf{v}) || \le || \mathbf{u}- \mathbf{v}||\) for all \(\mathbf{u}, \mathbf{v}\in {\mathcal {G}}\).

For \(\mathbf {X} \in {\mathrm {HS}}(\mathcal {S}_1, \mathcal {S}_2)\) with SVD \(\mathbf {X} = \sum _i \sigma _i(\mathbf {X}) \, \langle \cdot , \mathbf {v}_i \rangle \, \mathbf {u}_i\), where \(\sigma _i(\mathbf {X})\) denotes the *i*-th singular value of \(\mathbf {X}\), we set

$$\begin{aligned} S_\alpha (\mathbf {X}) {:}{=} \sum _i \max \{ \sigma _i(\mathbf {X}) - \alpha , 0 \} \, \langle \cdot , \mathbf {v}_i \rangle \, \mathbf {u}_i , \end{aligned}$$

that is, \(S_\alpha \) applies soft thresholding to the singular values of \(\mathbf {X}\). This operation is precisely the proximity operator \({{\mathrm{prox}}}^{\alpha }_{{\mathcal {J}}}\) with \({\mathcal {J}}\) the *nuclear norm*, \(||\mathbf {X}||_{*} = \sum _i \sigma _i(\mathbf {X})\).

**Lemma 2.3**

For all \(\mathbf {X}, {\tilde{\mathbf{X}}} \in {\mathrm {HS}}(\mathcal {S}_1, \mathcal {S}_2)\) and \(\alpha > 0\), one has \(|| S_\alpha (\mathbf {X}) - S_\alpha ({\tilde{\mathbf{X}}}) ||_{\mathrm {HS}} \le || \mathbf {X} - {\tilde{\mathbf{X}}} ||_{\mathrm {HS}}\).

This statement is shown for finite matrices \(\mathbf {X}\), e.g., in [9] using subgradient characterizations. Based on Theorem 2.1, we can give the following argument for Hilbert–Schmidt operators.
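For matrices (the case \(d=2\) relevant below), soft thresholding of singular values is easily realized via an SVD. The following NumPy sketch (our own, for illustration) also checks non-expansiveness in the Frobenius, i.e., Hilbert–Schmidt, norm:

```python
import numpy as np

def svd_soft_threshold(X, alpha):
    """S_alpha for a matrix: shrink each singular value by alpha (floored at 0)."""
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(sigma - alpha, 0.0)) @ Vt

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 20))
Y = rng.normal(size=(30, 20))

# Non-expansiveness: ||S_a(X) - S_a(Y)||_F <= ||X - Y||_F.
lhs = np.linalg.norm(svd_soft_threshold(X, 1.0) - svd_soft_threshold(Y, 1.0), 'fro')
rhs = np.linalg.norm(X - Y, 'fro')
print(lhs <= rhs)   # True

# Thresholding with a larger alpha also reduces the rank.
print(np.linalg.matrix_rank(svd_soft_threshold(X, 5.0)))  # strictly less than 20
```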

*Proof*

## 3 Soft Thresholding of Hierarchical Tensors

For a hierarchical tensor \(\mathbf {u}\) with suitably numbered edges \(\{n_t,n_t^c\}\in \mathbb {E}, t = 1, \ldots , E\), the soft thresholding \( {\mathbf {S}}_{{\alpha }} (\mathbf {u}) \) can be obtained as follows: Starting with \(\mathbf{u}_0 = \mathbf{u}\), for each *t*, we first rearrange \(\mathbf{u}_{t-1}\) such that the root element is at \(\{n_t,n_t^c\}\), together with a singular value decomposition of \(\mathcal {M}_t(\mathbf{u}_{t-1})\); this exposes the singular values \(\sigma _t(\mathbf{u}_{t-1})\) and thus allows the direct application of \(S_{t,\alpha }\) to obtain \(\mathbf{u}_t {:}{=} S_{t,\alpha }(\mathbf{u}_{t-1})\). This yields \(\mathbf {S}_\alpha (\mathbf{u}) = \mathbf{u}_E\).

1. Start with an orthonormal hierarchical representation of \(\mathbf{u}\) such that the root element is at \(\{n_1,n_1^c\}{:}{=}\{ \{1\},\{2,\ldots ,d\}\}\in \mathbb {E}\), perform an SVD of \(\mathcal {M}_1(\mathbf{u})\), and use this to obtain \(\mathbf{u}_1{:}{=} S_{1,\alpha }(\mathbf{u})\).
2. If \(d=2\), we are done; if \(d>2\), call \({{\mathrm{\textsc {STRecursion}}}}_\alpha (\mathbf{u}_1, \{2,\ldots ,d\})\) as defined in Algorithm 1.
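For illustration, the sequential structure of \(\mathbf {S}_\alpha \) can be sketched for a dense tensor of order \(d=3\) with the linear dimension tree, whose effective edges correspond to the unfoldings separating \(\{1\}\) and \(\{1,2\}\) from their complements. This dense version (our own sketch) has cost exponential in *d* and is purely illustrative; the actual algorithm works within the orthonormal hierarchical representation at \({\mathcal {O}}(dr^4 + dr^2 n)\) cost:

```python
import numpy as np

def soft_threshold_matrix(X, alpha):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - alpha, 0.0)) @ Vt

def tensor_soft_threshold(u, alpha):
    """S_alpha for a dense order-3 tensor: soft-threshold the singular values
    of each matricization in turn (edges t=1: {1}|{2,3} and t=2: {1,2}|{3})."""
    n1, n2, n3 = u.shape
    u = soft_threshold_matrix(u.reshape(n1, n2 * n3), alpha).reshape(n1, n2, n3)
    u = soft_threshold_matrix(u.reshape(n1 * n2, n3), alpha).reshape(n1, n2, n3)
    return u

rng = np.random.default_rng(3)
u = rng.normal(size=(4, 5, 6))
v = rng.normal(size=(4, 5, 6))

# As a composition of non-expansive maps, the result is non-expansive:
lhs = np.linalg.norm(tensor_soft_threshold(u, 0.5) - tensor_soft_threshold(v, 0.5))
print(lhs <= np.linalg.norm(u - v))   # True
```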

*Remark 3.1*

Assume that we are given a hierarchical tensor with \(\dim {\mathcal {H}}_i \le n \in {\mathbb {N}}\) and representation ranks bounded by *r*. Then the first step of bringing this tensor into its initial representation takes \({\mathcal {O}}(d r^4 + dr^2n)\) operations. Inspecting the procedure given in Sect. 2.1, one finds that moving the root element from one edge to an adjacent one costs \({\mathcal {O}}(r^4)\) operations, and handling the arising SVDs is of the same order. In the general procedure we have described, the root is moved across each interior node of the tree exactly three times (which is also the minimum number possible), at total cost \({\mathcal {O}}(d r^4)\). The total complexity of applying \(\mathbf {S}_\alpha \) in this manner is thus of order \({\mathcal {O}}(d r^4 + dr^2n)\), the same as for computing the HSVD.

**Proposition 3.2**

For any \(\mathbf{u}, \mathbf{v}\in \mathcal {H}\) and \(\alpha >0\), the operator \({\mathbf {S}}_\alpha \) defined in (3.1) satisfies \(||{\mathbf {S}}_\alpha (\mathbf{u}) - {\mathbf {S}}_\alpha (\mathbf{v})|| \le ||\mathbf{u}- \mathbf{v}||\).

The following lemma guarantees that applying soft thresholding to a certain matricization of a tensor does not increase the hierarchical singular values of any other matricization of this tensor.

**Lemma 3.3**

For any \(\mathbf{u}\in \mathcal {H}\) and for \(t, s = 1, \ldots , E\), one has \(\sigma _{t,i}( \mathbf{u}) \ge \sigma _{t,i}( S_{s,\alpha } (\mathbf{u}))\) for all \(i\in {\mathbb {N}}\) and any \(\alpha \ge 0\).

*Proof*

Note that for the action of \(S_{s,\alpha }\), the tensor is rearranged such that the edge *s* holds the root element. Thus, the statement follows exactly as in part 3 of the proof of Theorem 11.61 in [27], see also the proof of Theorem 7.18 in [39]; there it is shown that when singular values are decreased at the root element, this cannot cause any singular value of the other matricizations in the dimension tree to increase. \(\square \)
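Lemma 3.3 can also be observed numerically. In the dense order-3 sketch below (our own notation and helper names), thresholding the matricization at one edge leaves the singular values of the other matricization non-increased:

```python
import numpy as np

def sv(u, t):
    """Singular values of the t-th matricization of an order-3 tensor
    (t = 1: modes {1} vs {2,3}; t = 2: modes {1,2} vs {3})."""
    n1, n2, n3 = u.shape
    mat = u.reshape(n1, n2 * n3) if t == 1 else u.reshape(n1 * n2, n3)
    return np.linalg.svd(mat, compute_uv=False)

def threshold_matricization(u, t, alpha):
    """Apply soft thresholding S_{t,alpha} to the t-th matricization."""
    n1, n2, n3 = u.shape
    shape = (n1, n2 * n3) if t == 1 else (n1 * n2, n3)
    U, s, Vt = np.linalg.svd(u.reshape(shape), full_matrices=False)
    return (U @ np.diag(np.maximum(s - alpha, 0.0)) @ Vt).reshape(n1, n2, n3)

rng = np.random.default_rng(4)
u = rng.normal(size=(4, 5, 6))
w = threshold_matricization(u, 1, 0.8)   # threshold at edge t = 1

# Lemma 3.3: singular values of the *other* matricization do not increase.
print(np.all(sv(w, 2) <= sv(u, 2) + 1e-10))   # True
```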

Using the above lemma, the error incurred by application of \(\mathbf {S}_\alpha \) to a tensor \(\mathbf{u}\) can be estimated in terms of the sequences of hierarchical singular values \(\sigma _t(\mathbf{u}), t=1,\ldots , E\).

**Lemma 3.4**

*Remark 3.5*

For fixed *t* and \(\alpha \), this gives

*d*-dependence of the arising constant \(E\sim d\) is concerned.

*Proof of Lemma 3.4*

*t*.

*t*,

Summing over *t* concludes the proof. \(\square \)

Let \(\mathbf{u}\in \mathcal {H}\). Then \( \sigma _t(\mathbf{u}) \in \ell ^2\), which implies that \(d^\alpha _t(\mathbf{u}) \rightarrow 0\) as \(\alpha \rightarrow 0\). Without further assumptions, however, this convergence can be arbitrarily slow. In the following proposition, we collect results that quantify this convergence in terms of the decay of the hierarchical singular values.

**Proposition 3.6**

- (i) If in addition \(\sigma _{t}(\mathbf{u}) \in \ell ^{p,\infty }\), \(t=1,\ldots ,E\), for a \(p \in (0,2)\), one has
  $$\begin{aligned} r_{t,\alpha } \lesssim |\sigma _t(\mathbf{u})|_{\ell ^{p,\infty }}^p \, \alpha ^{-p},\quad \tau _{t,\alpha } \lesssim \, |\sigma _t(\mathbf{u})|^{p/2}_{\ell ^{p,\infty }} \, \alpha ^{1 - p /2 }, \end{aligned}$$(3.3)
  and thus
  $$\begin{aligned} || {\mathbf {S}}_\alpha (\mathbf{u}) - \mathbf{u}|| \lesssim E\, \max _{t=1,\ldots ,E} |\sigma _t(\mathbf{u})|^{p/2}_{\ell ^{p,\infty }} \, \alpha ^{1-p/2} , \end{aligned}$$(3.4)
  where the constants depend only on *p*.
- (ii) If \(\sigma _{t,j}(\mathbf{u}) \le C e^{- c j^\beta }\) for \(j\in {\mathbb {N}}\) and \(t=1,\ldots ,E\), with \(C,c , \beta > 0\), then
  $$\begin{aligned} r_{t,\alpha } \le \bigl ( c^{-1} \ln (C\alpha ^{-1}) \bigr )^{\frac{1}{\beta }} \lesssim (1 + |\ln \alpha |)^{\frac{1}{\beta }}, \quad \tau _{t,\alpha } \lesssim (1 + |\ln \alpha |)^{\frac{1}{2\beta }} \,\alpha , \end{aligned}$$
  and therefore
  $$\begin{aligned} || {\mathbf {S}}_\alpha (\mathbf{u}) - \mathbf{u}|| \lesssim E\, (1 + |\ln \alpha |)^{\frac{1}{2\beta }} \, \alpha , \end{aligned}$$(3.5)
  with constants that depend on *C*, *c*, and \(\beta \).

*Proof*

The estimates (3.3) are shown in [17], and (3.4) follows with (3.2). In the same manner, we obtain (3.5) by arguing analogously to [17, Section 7.4]. \(\square \)

*Remark 3.7*

The operation \(\mathbf {S}_\alpha \) can be interpreted in terms of a *sequence* of convex optimization problems: One has \(\mathbf {S}_\alpha (\mathbf{u}) = \tilde{\mathbf{u}}_E\), where \(\tilde{\mathbf{u}}_0 {:}{=} \mathbf{u}\) and \(\tilde{\mathbf{u}}_t {:}{=} \mathcal {M}^{-1}_t \bigl ( {{\mathrm{prox}}}^{\alpha }_{||\cdot ||_{*}} \bigl ( \mathcal {M}_t( \tilde{\mathbf{u}}_{t-1}) \bigr ) \bigr )\) for \(t = 1,\ldots ,E\).

## 4 Fixed-Point Iterations with Soft Thresholding

Since \(\mathbf {S}_\alpha \) is non-expansive, the mapping \(\mathbf {S}_\alpha \circ \mathcal {F}\) still yields a convergent fixed-point iteration, but with a modified fixed point, whose distance to the original fixed point of \(\mathcal {F}\) is quantified by the following result.

**Lemma 4.1**

*Proof*

For fixed thresholding parameter \(\alpha \), the thresholded Richardson iteration thus converges, at the same rate \(\rho \) as the unperturbed Richardson iteration, to a modified solution \(\mathbf{u}^\alpha \). Its distance to the true solution \(\mathbf{u}^*\) is proportional, uniformly in \(\alpha \), to \(||{\mathbf {S}}_\alpha (\mathbf{u}^*)-\mathbf{u}^*||\), that is, to the error of thresholding \(\mathbf{u}^*\). Note that Lemma 4.1 in fact also holds with \({\mathbf {S}}_\alpha \) replaced by an arbitrary non-expansive mapping on \(\mathcal {H}\).
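The behavior described above can be illustrated for \(d=2\). The following sketch (our own construction; the operator, step size, and parameters are illustrative assumptions, since (1.3) and (4.3) are not restated here) iterates \(\mathbf{u}_{k+1} = {\mathbf {S}}_\alpha \bigl (\mathbf{u}_k - \omega ({\mathcal {A}}\mathbf{u}_k - {\mathbf {f}})\bigr )\) with fixed \(\alpha \) for a Laplacian-type operator and a rank-one right-hand side:

```python
import numpy as np

def svd_soft_threshold(X, alpha):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - alpha, 0.0)) @ Vt

n = 50
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D discrete Laplacian

def apply_A(U):
    # A = I + L (x) I + I (x) L, spectrum contained in (1, 9)
    return U + L @ U + U @ L

rng = np.random.default_rng(5)
f = np.outer(rng.normal(size=n), rng.normal(size=n))     # rank-one right-hand side

omega, alpha = 0.2, 1e-6          # step ~ 2/(lmin+lmax); small fixed threshold
u = np.zeros((n, n))              # rank-zero starting value u_0 = 0
for _ in range(150):
    u = svd_soft_threshold(u - omega * (apply_A(u) - f), alpha)

res = np.linalg.norm(apply_A(u) - f, 'fro') / np.linalg.norm(f, 'fro')
rank = int(np.sum(np.linalg.svd(u, compute_uv=False) > 1e-8))
print(res, rank)
```

Since \(\alpha \) is fixed, the iteration converges to the modified fixed point \(\mathbf{u}^\alpha \); with the small \(\alpha \) used here, the residual is correspondingly small, and the rank of the iterate stays far below the full rank *n*.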

### 4.1 A Priori Choice of Thresholding Parameters

We now show that, with suitably chosen geometrically decreasing thresholding parameters \(\alpha _k\), the iteration retains *almost* the unperturbed rate \(\rho \).

**Proposition 4.2**

- (i) Let \(\sigma _t(\mathbf{u}^*) \in \ell ^{p,\infty }\), \(t=1,\ldots ,E\), with \(0<p<2\). Then, for the choice \(\alpha _k {:}{=} (\rho ^{k+1} c_0)^\frac{2}{2-p}\) in the iteration (4.3), with \(c_0>0\) such that (4.5) holds, we have
  $$\begin{aligned} ||\mathbf{u}_{k} - \mathbf{u}^*|| \le C_1 E \max _t |\sigma _t(\mathbf{u}^*)|^{p/2}_{\ell ^{p,\infty }} \, k \, \rho ^k, \end{aligned}$$
  where \(C_1\) depends on \(\rho \) and *p*. Furthermore, for any \(\tilde{\rho }\) such that \(\rho < {\tilde{\rho }} < 1\), with \(\alpha _k {:}{=} ({\tilde{\rho }}^{k+1} c_0)^\frac{2}{2-p}\), we have
  $$\begin{aligned} ||\mathbf{u}_{k} - \mathbf{u}^*|| \le C_2 E \max _t |\sigma _t(\mathbf{u}^*)|^{p/2}_{\ell ^{p,\infty }} (\tilde{\rho }-\rho )^{-1} {\tilde{\rho }}^{k+1} , \end{aligned}$$(4.6)
  where \(C_2\) depends on \(\rho \) and *p*.
- (ii) If \(\sigma _{t,k}(\mathbf{u}^*) \le C e^{- c k^\beta }\) with \(C, c , \beta > 0\), for the choice \(\alpha _k {:}{=} \rho ^{k+1} c_0\), we have
  $$\begin{aligned} ||\mathbf{u}_{k} - \mathbf{u}^*|| \lesssim E k^{1 + {\frac{1}{2\beta }}} \, \rho ^k, \end{aligned}$$
  and with \(\alpha _k {:}{=} {\tilde{\rho }}^{k+1} c_0\), we have instead
  $$\begin{aligned} ||\mathbf{u}_{k} - \mathbf{u}^*|| \lesssim E \,k^{ {\frac{1}{2\beta }}} \, {\tilde{\rho }}^k, \end{aligned}$$(4.7)
  where the constant depends on \(({\tilde{\rho }} - \rho )^{-1}\).

*Proof*

*k* in the assertion thus arises due to the logarithmic term in (3.5). For \(\alpha _k {:}{=} {\tilde{\rho }}^{k+1} c_0\), with \(\theta \) as above,

In summary, in this idealized setting with full knowledge of the decay of \(||{\mathbf {S}}_\alpha (\mathbf{u}^*) - \mathbf{u}^*||\) with respect to \(\alpha \), the above choices ensure convergence of the iteration with any asymptotic rate \(\tilde{\rho }> \rho \).
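This behavior can be checked numerically with a model sequence (an illustration with ad hoc constants, not taken from the paper): for \(\sigma _j = j^{-2}\), which lies in \(\ell ^{p,\infty }\) with \(p=\frac{1}{2}\), the thresholding error \(\bigl (\sum _j \min (\sigma _j,\alpha )^2\bigr )^{1/2}\) behaves like \(\alpha ^{(2-p)/2}\), so the schedule of Proposition 4.2(i) indeed produces errors decaying at the target rate \(\tilde{\rho }\):

```python
import numpy as np

# model sequence: sigma_j = j^{-2} lies in l^{p,inf} for p = 1/2,
# with weak norm sup_j j^{1/p} sigma_j = 1
sigma = 1.0/np.arange(1, 10**6 + 1)**2
d = lambda a: np.sqrt(np.sum(np.minimum(sigma, a)**2))   # soft thresholding error

rho_t, c0, p = 0.8, 1.0, 0.5
for k in range(1, 30):
    alpha_k = (rho_t**(k + 1) * c0)**(2.0/(2.0 - p))     # schedule of Proposition 4.2(i)
    # the thresholding error then decays at the target rate rho_t
    assert d(alpha_k) <= 2.0 * rho_t**(k + 1)
```

The constant 2.0 in the check is generous; asymptotically, \(d(\alpha ) \approx \sqrt{4/3}\, \alpha ^{3/4}\) for this sequence.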

### 4.2 Rank Estimates

We now give estimates for the ranks of the iterates that can arise in the course of the iteration, assuming that the \(\alpha _k\) are chosen as in Proposition 4.2.

For the proof, we will use the following lemma, which is a direct adaptation of [11, Lemma 5.1], where the same argument was applied to hard thresholding of sequences; it was restated for soft thresholding of sequences (with the same proof) in [14].

**Lemma 4.3**

- (i) If \(\sigma _t(\mathbf{v})\in \ell ^{p,\infty }\) with \(0<p<2\) for all \(t\in \{1,\ldots ,E\}\), then $$\begin{aligned} {{\mathrm{rank}}}_t \bigl ({\mathbf {S}}_\alpha (\mathbf{w})\bigr ) \le \frac{4 \varepsilon ^2}{\alpha ^2} + C_p |\sigma _t(\mathbf{v})|^p_{\ell ^{p,\infty }} \alpha ^{-p}. \end{aligned}$$
- (ii) If \(\sigma _{t,j}(\mathbf{v}) \le C e^{-c j^\beta }\) for \(j\in {\mathbb {N}}\) with \(C, c, \beta > 0\), then $$\begin{aligned} {{\mathrm{rank}}}_t \bigl ({\mathbf {S}}_\alpha (\mathbf{w})\bigr ) \le \frac{4 \varepsilon ^2}{\alpha ^2} + \bigl ( c^{-1} \ln (2C\alpha ^{-1}) \bigr )^{1/\beta }. \end{aligned}$$
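Part (i) can be probed numerically in the matrix case. The standard counting argument behind such bounds yields the constant \(C_p = 2^p\), which we use here as an assumption (it need not coincide with the constant of the lemma):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, p = 300, 1e-3, 1.0
sigma = 1.0/np.arange(1, n + 1)**2        # sigma_j(v); weak norm sup_j j*sigma_j = 1
E = rng.standard_normal((n, n))
E *= eps/np.linalg.norm(E)                # perturbation with ||W - V||_F = eps
W = np.diag(sigma) + E

s = np.linalg.svd(W, compute_uv=False)
for alpha in [1e-1, 1e-2, 1e-3]:
    rank = int(np.sum(s > alpha))         # rank of S_alpha(W): surviving singular values
    bound = 4*eps**2/alpha**2 + 2**p * 1.0/alpha**p   # assumed constant C_p = 2^p
    assert rank <= bound
```

The first term of the bound accounts for singular values pushed above \(\alpha \) by the perturbation, the second for the decay of \(\sigma (\mathbf{v})\) itself.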

*Proof*

*t* we have

**Theorem 4.4**

- (i) If \(\sigma _t (\mathbf{u}^*) \in \ell ^{p,\infty }\), \(t=1,\ldots ,E\), with \(0<p<2\), then for the choice \(\alpha _k {:}{=} ({\tilde{\rho }}^{k+1} c_0)^\frac{2}{2-p}\) in the iteration (4.3), with \(c_0>0\) such that (4.5) holds, we have $$\begin{aligned} ||\mathbf{u}_k -\mathbf{u}^*|| \lesssim d \varepsilon _k, \qquad \max _{t=1,\ldots ,E} {{\mathrm{rank}}}_t(\mathbf{u}_k) \lesssim d^2 \varepsilon _k^{-\frac{1}{s}}, \end{aligned}$$(4.8) where \(s = \textstyle \frac{1}{p} - \frac{1}{2}\).
- (ii) If \(\sigma _{t,j}(\mathbf{u}^*) \le C e^{- c j^\beta }\), for \(j\in {\mathbb {N}}\) and \(t=1,\ldots ,E\), with \(C, c, \beta > 0\), for the choice \(\alpha _k {:}{=} {\tilde{\rho }}^{k+1} c_0\), we have $$\begin{aligned} ||\mathbf{u}_{k} - \mathbf{u}^*|| \lesssim d \bigl ( 1+ |\log \varepsilon _k| \bigr )^{\frac{1}{2\beta }} \varepsilon _k, \qquad \max _{t=1,\ldots ,E} {{\mathrm{rank}}}_t(\mathbf{u}_k) \lesssim d^2 \bigl ( 1+ |\log \varepsilon _k|\bigr )^{\frac{1}{\beta }}. \end{aligned}$$(4.9)

*Proof*

*t*, Lemma 4.3 gives

*Remark 4.5*

Theorem 4.4 yields quasi-optimal rank bounds in the sense discussed in Sect. 1.1. We comment further on this point after stating our main result Theorem 5.1, which is of the same type but concerns a practical algorithm, in the next section.

## 5 A Posteriori Choice of Parameters

The results of the previous section raise the question of whether estimates as in Theorem 4.4 can still be obtained when a priori knowledge of the decay of the sequences \(\sigma _t(\mathbf{u}^*)\) is not available. We thus now consider a modified scheme that adjusts the \(\alpha _k\) automatically, without using such information on \(\mathbf{u}^*\), but still yields quasi-optimal rank estimates as in Theorem 4.4 for both cases considered there.

Note that Algorithm 2 only requires—besides a hierarchical tensor representation of \({\mathbf {f}}\) and the action of \({\mathcal {A}}\) on such representations—bounds \(\gamma , \varGamma \) on the spectrum of \({\mathcal {A}}\), certain quantities derived from these, as well as constants \(\nu \) and \(\theta \) that can be adjusted arbitrarily. The following is the main result of this work.

**Theorem 5.1**

- (i) If \(\sigma _t (\mathbf{u}^*) \in \ell ^{p,\infty }\), \(t=1,\ldots ,E\), with \(0<p<2\), then there exists \({\tilde{\rho }} \in (0,1)\) such that with \(\varepsilon _k {:}{=} {\tilde{\rho }}^k\) and \(s = \frac{1}{p} - \frac{1}{2}\), one has $$\begin{aligned} ||\mathbf{u}_k -\mathbf{u}^*|| \lesssim d \varepsilon _k, \qquad \max _{t=1,\ldots ,E}{{\mathrm{rank}}}_t(\mathbf{u}_k) \lesssim d^2 \max _{t=1,\ldots ,E} |\sigma _t(\mathbf{u}^*)|_{\ell ^{p,\infty }}^{\frac{1}{s}} \varepsilon _k^{-\frac{1}{s}}. \end{aligned}$$(5.4) The constants depend on \(\gamma , \varGamma , \theta , \nu , \alpha _0\), and *p*.
- (ii) If \(\sigma _{t,j}(\mathbf{u}^*) \le C e^{- c j^\beta }\), for \(j\in {\mathbb {N}}\) and \(t=1,\ldots ,E\), with \(C, c, \beta > 0\), then there exists \({\tilde{\rho }} \in (0,1)\) such that with \(\varepsilon _k {:}{=} {\tilde{\rho }}^k\), one has $$\begin{aligned} ||\mathbf{u}_{k} - \mathbf{u}^*|| \lesssim d \varepsilon _k, \qquad \max _{t=1,\ldots ,E}{{\mathrm{rank}}}_t(\mathbf{u}_k) \lesssim d^2 \bigl ( 1+ |\ln \varepsilon _{k}| \bigr )^{\frac{1}{\beta }}. \end{aligned}$$(5.5) The constants depend on \(\gamma , \varGamma , \theta , \nu , \alpha _0\), and on *C*, *c*, and \(\beta \).

*Remark 5.2*

*p*, and on the parameters of the iteration. Thus, we also have quasi-optimality of approximation errors. Note that the arising power of

*d* does not depend on further properties of \({\mathcal {A}}\). In the case of (5.5), we obtain

*C*,

*c* are modified.

*Remark 5.3*

The choice of \(\nu \) and \(\theta \) enters only into the value of \({\tilde{\rho }}\) and into the multiplicative constants in (5.4) and (5.5), but does not influence the asymptotic dependence on *d* and \(\varepsilon _k\). The results of Proposition 4.2, however, suggest choosing \(\theta \) smaller than \(\rho \) in order to avoid unnecessarily slowing down the convergence of the iteration. Thus, one may for instance simply set \(\nu =\frac{1}{2}\), \(\theta =\frac{\rho }{2}\), although a further adjustment may improve the quantitative behavior of the iteration for a given problem.

In the proof of Theorem 5.1, we will use the following technical lemma, which limits the decay of the soft thresholding error as the thresholding parameter is decreased.

**Lemma 5.4**

Let \(\mathbf{v}\ne 0\). Then \( d^{\alpha }_t(\mathbf{v}) \le \theta ^{-1} d^{\theta \alpha }_t (\mathbf{v}) \) for \(t=1,\ldots ,E\) and all \(\alpha >0\), \(\theta \in (0,1)\).
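In the matrix case, where the soft thresholding error is \(d^{\alpha }(\mathbf{v}) = \bigl (\sum _j \min (\sigma _j, \alpha )^2\bigr )^{1/2}\), the inequality can be checked numerically (a sketch with a model singular value sequence; the inequality in fact holds term by term for each \(\min (\sigma _j,\alpha )\)):

```python
import numpy as np

def thresholding_error(sigma, alpha):
    # ||S_alpha(v) - v|| for singular values sigma:
    # each singular value is shrunk by min(sigma_j, alpha)
    return np.sqrt(np.sum(np.minimum(sigma, alpha)**2))

sigma = 1.0/np.arange(1, 201)**2      # algebraically decaying model sequence
for alpha in [1e-1, 1e-2, 1e-3]:
    for theta in [0.1, 0.5, 0.9]:
        # Lemma 5.4: d(alpha) <= d(theta*alpha)/theta
        assert thresholding_error(sigma, alpha) \
               <= thresholding_error(sigma, theta*alpha)/theta
```
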

*Proof*

*Proof of Theorem 5.1*

*Step 1* We first show that the condition

*k*. Hence, (5.6) holds with \(k = J_0 -1\) for some \(J_0 \in {\mathbb {N}}\), and we assume this to be the minimum integer with this property. The thresholding parameter is then decreased for the following iteration, that is, \(\alpha _{J_0} = \theta \alpha _{J_0 -1} = \theta \alpha _{0}\). As in (4.4), for \(k < J_0\), we obtain

*Step 2* To show convergence of the \(\mathbf{u}_k\), we first observe that by our requirement that \(\alpha _0 \ge E^{-1} \omega ||{\mathbf {f}}||\) (which is in fact not essential for the execution of the iteration), we actually have \(\mathbf{u}_1=0\) and hence \(\mathbf{u}^{\alpha _0} = 0\), implying also \(J_0=1\). In particular,

*i* is increased each time that condition (5.6) is satisfied. For each

*i*, consistently with the previous definition of \(J_0\), we define \(J_i\) as the last index of an iterate produced with the value \(\eta _{i}\), which means that \(\mathbf{u}_{i+1,0} = \mathbf{u}_{i,J_i}\) and, as a consequence of (5.12),

*Step 3* Our next aim is to estimate the values of \(J_i\). We have already established that \(J_0 = 1\). In order to estimate \(J_i\) for \(i>0\), we use (5.7) and (5.8) to obtain

*j* sufficiently large. Thus, (5.6) follows if the two conditions

*Step 4* In order to establish rank estimates, we need to bound the errors of \(\mathbf{w}_{i,j}\) as defined in (5.13) for each \(i \ge 0\) and \(0 < j \le J_i\). Since \(\mathbf{u}_0 = \mathbf{u}^{\eta _0} = 0\) by our choice of \(\alpha _0\), for \(i=0\), we obtain

*t* and \(0 < j \le J_i\), we have

*i*,

*j*, and consequently

*Remark 5.5*

The above algorithm is universal in the sense that it does not require knowledge of the decay of the \(\sigma _t(\mathbf{u}^*)\), but we still obtain the same quasi-optimal rank estimates as with \(\alpha _k\) prescribed a priori as in Theorem 4.4. Note that in (5.5), we have absorbed the additional logarithmic factor that is present in the error bound in (4.9) by comparing to a slightly slower rate of linear convergence as in Remark 4.5, but the estimates are essentially of the same type.

*Remark 5.6*

For the effective convergence rate \({\tilde{\rho }}\) in the statement of Theorem 5.1, as can be seen from the proof (in particular from the estimates for the \(J_i\)), one has an estimate from above of the form \({\tilde{\rho }} \le {\hat{\rho }}^{\frac{1}{\log d}} < 1\), where \(\hat{\rho }\) does not explicitly depend on *d* (although it may still depend on *d* through other quantities such as \(\gamma , \varGamma \)). Consequently, combining this with the statements in (5.4) and (5.5), we generally have to expect that the number of iterations required to ensure \(||\mathbf{u}_k - \mathbf{u}^*|| \le \varepsilon \) scales like \(|\log \hat{\rho }|^{-1} \bigl ( (|\log \varepsilon | + \log d ) \log d\bigr )\). Furthermore, as can be seen from the proof, the estimates for ranks and errors deteriorate only by algebraic factors when \(\gamma \) and \(\varGamma \) vary algebraically in *d*.

*Remark 5.7*

*Remark 5.8*

Note that our entire construction relies quite strongly on special properties of soft thresholding, especially its non-expansiveness. In particular, if the soft thresholding operations are replaced by hard thresholding, none of our convergence and complexity results applies any longer. Since a number of worst-case estimates are involved, however, this difference is typically not as crucial in practice. With hard thresholding, the scheme becomes heuristic, but it may still converge and perform well. We illustrate this numerically in Sect. 6.2.
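The failure of non-expansiveness for hard thresholding can already be seen for scalars (a minimal illustration, not from the paper): a tiny perturbation across the threshold produces a jump of size about \(\alpha \).

```python
alpha, eps = 1.0, 1e-3
v, w = alpha + eps, alpha - eps          # two points straddling the threshold

def soft(x):
    # soft thresholding: shrink toward zero by alpha
    s = abs(x) - alpha
    return (s if s > 0 else 0.0) * (1 if x >= 0 else -1)

def hard(x):
    # hard thresholding: keep x unchanged if above threshold, else drop it
    return x if abs(x) > alpha else 0.0

assert abs(soft(v) - soft(w)) <= abs(v - w)   # non-expansive
assert abs(hard(v) - hard(w)) > abs(v - w)    # jump discontinuity: not non-expansive
```
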

### 5.1 Inexact Evaluation of Residuals

We finally consider a perturbed version of \({{\mathrm{\textsc {STSolve}}}}\) where residuals are no longer evaluated exactly, but only up to a certain relative error. We assume that for each given \(\mathbf{v}\) and \(\delta > 0\), we can produce \(\mathbf {r}\) such that \(||\mathbf {r} - ({\mathcal {A}}\mathbf{v}- {\mathbf {f}})|| \le \delta \).

*k* such that

*k*) to ensure the relative accuracy with respect to \(||\mathbf{u}_{k+1} - \mathbf{u}_k||\).

The residual approximations may in particular involve additional rank reduction operations. However, these serve a different purpose from the truncation by soft thresholding: While the soft thresholding step performs a rank reduction that is large enough to guarantee quasi-optimal ranks, the residual approximation needs to happen with an error small enough to ensure that the computed residual is still sufficiently accurate for reducing the error and for controlling the iteration. In particular, the residual approximations by themselves would not suffice to achieve the given rank bounds.
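A rank-reducing residual approximation of this kind can be sketched in the matrix case as follows; the truncation routine and the choice of tolerance are illustrative assumptions:

```python
import numpy as np

def truncate(X, delta):
    # SVD truncation to the smallest rank r with ||X - X_r||_F <= delta
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    tail = np.sqrt(np.cumsum((s**2)[::-1]))[::-1]   # tail[r] = error when keeping r terms
    r = int(np.sum(tail > delta))
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(1)
# a model residual: numerically rank 5 plus small noise
R = rng.standard_normal((40, 5)) @ rng.standard_normal((5, 40)) \
    + 1e-6*rng.standard_normal((40, 40))
delta = 0.1*np.linalg.norm(R)     # relative tolerance, in the spirit of Sect. 5.1
Rt = truncate(R, delta)

assert np.linalg.norm(R - Rt) <= delta
assert np.linalg.matrix_rank(Rt) <= 5
```

The truncated residual satisfies the accuracy requirement while its rank reflects the effective low-rank structure rather than the full ambient dimension.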

**Proposition 5.9**

The statement of Theorem 5.1 holds also for \({{\mathrm{\textsc {STSolve2}}}}\) given in Algorithm 3.

*Proof*

We do not restate the full proof, but instead indicate how the required estimates are modified. For a given iterate \(\mathbf{u}_k\), we now denote the exact residual by \({\bar{\mathbf {r}}}_k := {\mathcal {A}}\mathbf{u}_k - {\mathbf {f}}\) and the computed residual by \(\mathbf {r}_k\).

With (5.22) implying (5.12) and the modified estimates (5.21) and (5.23), one can now follow the proof of Theorem 5.1 to obtain the same statements.

Finally, we address the two additional checks in lines 5 and 14 which ensure that the loops for adjusting \(\delta _k\) always terminate. On the one hand, when the condition in line 14 of Algorithm 3 is satisfied, then \(||{\mathcal {A}}\mathbf{u}_{k+1} - {\mathbf {f}}|| \le \gamma \varepsilon \), which implies \(||\mathbf{u}_{k+1}-\mathbf{u}^*|| \le \varepsilon \), and we can therefore terminate the algorithm.

*Remark 5.10*

Note that as a consequence of the choice of residual approximation tolerances, the \(\delta _k\) obtained in Algorithm 3 remain proportional to \(||\mathbf {r}_k||\) during the iteration.

### 5.2 Comparison to Existing Results

As we have noted in Sect. 1.2, complexity estimates for low-rank solvers, including in particular bounds for the arising ranks, have also been obtained in [2, 3]. In these bounds, the low-rank structure of the operator plays a role, in contrast to the bounds obtained here. We now consider these differences in more detail.

The adaptive method in [2, 3] does not address the same problem as we are considering here, since it is designed to also automatically select a discretization for operator equations. One can, however, extract the basic mechanism for controlling ranks. Similarly to the iterative schemes proposed in this work, the resulting procedure \({{\mathrm{\textsc {BD15Solve}}}}\), given in Algorithm 4, can be formulated on infinite-dimensional tensor product Hilbert spaces or on finite-dimensional subspaces. We now compare the rank bounds that one obtains for this method by the results in [2] to those of Theorem 5.1.

- (i) If \(\sigma _t (\mathbf{u}^*) \in \ell ^{p,\infty }\), \(t=1,\ldots ,E\), with \(0<p<2\), then $$\begin{aligned} {\bar{r}}(\mathbf{u}^*,\delta ) \le C_1 d^{\frac{1}{2s}} \max _{t=1,\ldots ,E} |\sigma _t(\mathbf{u}^*)|_{\ell ^{p,\infty }}^{\frac{1}{s}} \delta ^{-\frac{1}{s}} ,\quad s = \frac{1}{p}-\frac{1}{2}, \end{aligned}$$(5.29) where \(C_1>0\) depends on *p* and \(\tau \). Hence, (5.28) becomes $$\begin{aligned} ||\mathbf{u}^* - \mathbf{v}_k|| \le \varepsilon _k ,\quad \max _{t=1,\ldots ,E} {{\mathrm{rank}}}_t(\mathbf{v}_k) \lesssim d^{\frac{1}{s}} \max _{t=1,\ldots ,E} |\sigma _t(\mathbf{u}^*)|_{\ell ^{p,\infty }}^{\frac{1}{s}} \varepsilon _k^{-\frac{1}{s}}. \end{aligned}$$
- (ii) If \(\sigma _{t,j}(\mathbf{u}^*) \le C e^{- c j^\beta }\), for \(j\in {\mathbb {N}}\) and \(t=1,\ldots ,E\), with \(C, c, \beta > 0\), then $$\begin{aligned} {\bar{r}}(\mathbf{u}^*,\delta ) \le C_2 \,\bigl ( \ln d + |\ln \delta | \bigr )^{\frac{1}{\beta }} , \end{aligned}$$(5.30) where \(C_2>0\) depends on \(C,c,\beta ,\tau \). In this case, (5.28) yields $$\begin{aligned} ||\mathbf{u}^* - \mathbf{v}_k|| \le \varepsilon _k ,\quad \max _{t=1,\ldots ,E} {{\mathrm{rank}}}_t(\mathbf{v}_k) \lesssim \bigl ( \ln d + |\ln \varepsilon _k| \bigr )^{\frac{1}{\beta }} . \end{aligned}$$

*d*, the bounds for \(\mathbf{v}_k\) do not depend on the properties of \({\mathcal {A}}\) and in fact have a more favorable explicit *d*-dependence than in Theorem 5.1.

However, the rank bounds obtainable for the iterates \(\mathbf{w}_{k,j}\) of the inner iteration—which of course contribute equally to the computational complexity—depend on \({\mathcal {A}}\) and on the bound *I* for the number of inner iterations. For discussing this dependence, we need to not only track explicit dependencies on *d* that arise in the algorithm, but also implicit dependencies due to the problem data. We assume a *d*-dependent family of comparable problems given by \(({\mathcal {A}},{\mathbf {f}})\). For simplicity, we assume that \({\mathbf {f}}\) is of fixed bounded rank, although our conclusions are valid also when a suitable routine for obtaining low-rank approximations of \({\mathbf {f}}\) is available.

We now consider the particular bounds for \(C(d,\varepsilon _k)\) that one obtains in various situations. It needs to be noted that the resulting worst-case estimates for \(\mathbf{w}_{k,j}\) are typically very pessimistic. In practice, one has additional flexibility in choosing \({\tilde{\kappa }}>0\) to perform additional rank truncations in line 8 of the inner iteration, whose effect is also studied in the numerical examples of Sect. 6, but which do not change the estimates.

*R*, that is, one has \({{\mathrm{rank}}}_t({\mathcal {A}}\mathbf{v}) \le R {{\mathrm{rank}}}_t(\mathbf{v})\) for all \(\mathbf{v}\) and *t*, and that \(\gamma ^{-1}\varGamma \) is bounded independently of *d*. Then, due to the choice of \(\kappa _1\), we have \(I \le m \ln d\) for some \(m>0\) independent of *d*. Since we cannot quantify the effect of the additional recompression in line 8, we can only infer

*m* and *R*, the result of Theorem 5.1 for all iterates of \({{\mathrm{\textsc {STSolve}}}}\) (where the polynomial *d*-dependence is unaffected by *R*) is thus more favorable than the estimate (5.31) for \({{\mathrm{\textsc {BD15Solve}}}}\) in this case.

If instead, for instance, \(R \sim d^q\) for a \(q>0\), we obtain \(C(d,\varepsilon _k) = C(d) = d^{ q m (\ln d + \ln R)}\), which is superalgebraic in *d*. If \({\mathcal {A}}\) is not of bounded rank, but can be approximated so that effectively \(R \sim |\ln \varepsilon _k|\), then we instead obtain \(C(d,\varepsilon _k) = |\ln \varepsilon _k|^{m \ln d}\). In such cases, Theorem 5.1 (or Proposition 5.9, respectively) becomes substantially more favorable than (5.31).

The difference becomes more pronounced when the condition number of \({\mathcal {A}}\) is allowed to grow algebraically in *d*, in which case one obtains \(C(d,\varepsilon _k)\) that has exponential-type growth in *d*. In contrast, inspecting the proof of Theorem 5.1, one finds that the *d*-dependence in the rank bounds for \({{\mathrm{\textsc {STSolve}}}}\) and \({{\mathrm{\textsc {STSolve2}}}}\) remains algebraic in *d*.

The above complexity bounds hold for any nonnegative choice of the parameter \({\tilde{\kappa }}\) in \({{\mathrm{\textsc {BD15Solve}}}}\). Although the effect of the additional truncations for positive \({\tilde{\kappa }}\) is not rigorously quantified, they typically lead to ranks that are substantially smaller than the theoretical bounds for \(\mathbf{w}_{k,j}\), and in this regard, \({{\mathrm{\textsc {BD15Solve}}}}\) offers some additional flexibility. This will be illustrated in more detail in the following section.

## 6 Numerical Experiments

The procedures \({{\mathrm{\textsc {STSolve}}}}\) and \({{\mathrm{\textsc {STSolve2}}}}\) can be applied to quite general discretized elliptic problems, since only bounds on the spectrum of the discrete operator \({\mathcal {A}}\) are required. For our numerical tests, we first consider a high-dimensional Poisson problem, including a comparison to the results obtained with \({{\mathrm{\textsc {BD15Solve}}}}\) as described in Sect. 5.2, as well as a parametric diffusion problem.

In our tests, we focus on the evolution of the ranks of iterates, using discretizations such that the discretization errors remain negligible in the considered tests. Depending on how precisely the residual approximation is realized in \({{\mathrm{\textsc {STSolve2}}}}\), complexity bounds for the iteration may be dominated by those of the soft thresholding procedure, as discussed in Remark 3.1.

### 6.1 A High-Dimensional Poisson Problem

In principle, for treating such partial differential equations, one needs to adjust the discretization to the desired solution accuracy, as analyzed in detail in [3]. For our present simplified test, we fix a sufficiently fine discretization, corresponding to a fixed choice of wavelet basis functions, in advance. Based on the analysis in [3], we then choose a corresponding preconditioner that guarantees a certain uniform bound on the condition number for this particular discretization.

Note that since the resulting diagonal scalings \({\tilde{\mathbf { S}}}^{-1}_{\bar{n}}\) consist of 10 separable terms (whose number increases with larger \(\varLambda _1\)), a naive direct application of \({\mathcal {A}}\) could increase the hierarchical ranks of a given input \(\mathbf{v}\) by a factor of up to 200; the observed ranks required for accurately approximating \({\mathcal {A}}\mathbf{v}\), however, are much lower. Therefore, we use the recompression strategy described in [3, Section 7.2] for an approximate evaluation of \({\mathcal {A}}\mathbf{v}\) with prescribed tolerance in order to avoid unnecessarily large ranks in intermediate quantities. In this setting, the inexact residual evaluation in Algorithm 3 is thus of crucial practical importance.

The numerical results for \(d=16, d=32\), and \(d=64\) are shown in Figs. 2 and 3. In each case, we use the linear dimension tree (2.1). It can be observed in Fig. 2 that the norm of the solution of the problem decreases slightly with increasing *d*; apart from this, the iteration behaves very similarly for the different values of *d*, producing in particular a monotonic decrease in discrete residual norms. As expected, these values also remain uniformly proportional to the \(H^1\)-difference to the reference solution. The values \(\alpha _k\) can be seen to first decrease in every step as long as \(\mathbf{u}_k = 0\). Subsequently, they decrease in a regular manner after an essentially constant number of iterations. As one would also expect, the final value of \(\alpha _k\) needs to be slightly smaller for larger *d*.

*d*) compared to the ranks of the corresponding computed discrete residuals \(\mathbf {r}_k\), clearly demonstrating the reduced rank increase relative to \(\mathbf{u}_k\) that we obtain by the approximate residual evaluation. The additional variation in the residual ranks is a consequence of the fact that the differences \(||\mathbf{u}_{k+1} - \mathbf{u}_k||\) decrease as long as \(\alpha _k\) remains constant, enforcing a more accurate residual evaluation. As soon as the thresholding parameter changes, the accuracy requirement is subsequently relaxed again by line 20 in Algorithm 3, since the values \(||\mathbf{u}_{k+1} - \mathbf{u}_k||\) again become larger when \(\mathbf{u}^{\alpha _k}\) changes. Note furthermore that the ranks show little variation with increasing *d*, which is substantially more favorable than the quadratic increase with *d* that is possible in the estimates of Theorem 5.1.

**Table 1** Comparison of execution times (in seconds) for reaching a residual norm below \(10^{-3}\), on a machine with a 2.5 GHz Core i7-4870HQ CPU, for the example of Sect. 6.1 with \(d=16\)

\({{\mathrm{\textsc {STSolve2}}}}\):

| \(\nu \) | \(\theta =0.01\) | \(\theta =0.025\) | \(\theta =0.1\) | \(\theta =0.4\) |
|---|---|---|---|---|
| 0.9 | 218.6 | 86.2 | 77.9 | 175.1 |
| 0.3 | 227.6 | 95.9 | 90.1 | 244.4 |

\({{\mathrm{\textsc {BD15Solve}}}}\):

| \(\vartheta \) | \({\tilde{\kappa }}=0.01\) | \({\tilde{\kappa }}=0.1\) | \({\tilde{\kappa }}=1\) | \({\tilde{\kappa }}=100\) |
|---|---|---|---|---|
| 0.5 | 2233.3 | 27.6 | 11.7 | 23.6 |
| 0.1 | 1490.9 | 16.4 | 6.7 | 13.0 |

We also compare to the results obtained by \({{\mathrm{\textsc {BD15Solve}}}}\) as in Algorithm 4, using the same procedure for approximating residuals. As noted in Sect. 5.2, the rank bounds for the intermediate iterates \(\mathbf{w}_{k,j}\) are in general weaker than those for the iterates of \({{\mathrm{\textsc {STSolve2}}}}\), but one has additional flexibility in choosing the parameter \({\tilde{\kappa }}\) that controls the additional rank truncations in the inner iteration. The results for \(d=16\), with \({\tilde{\kappa }}=\frac{1}{10}\) and \(\vartheta =\frac{1}{2}\), are given in Fig. 4 and compared to those of \({{\mathrm{\textsc {STSolve2}}}}\) in Fig. 5. Here, the intermittent increases in the residuals observed for \({{\mathrm{\textsc {BD15Solve}}}}\) in Fig. 4 are caused by the rank truncations that restore near-optimal ranks at the end of each outer iteration. As can be seen in Fig. 5, the ranks of intermediate iterates are indeed slightly larger than those produced by \({{\mathrm{\textsc {STSolve2}}}}\) at the same residual bounds. Although the residual approximation tolerances used by both methods remain proportional to the current solution error, the residual tolerances used by \({{\mathrm{\textsc {STSolve2}}}}\) are lower by a certain factor, leading to larger ranks of the resulting approximate residuals \(\mathbf {r}_k\).

The latter difference also plays a role in our comparison of effective computational complexities in Table 1. Here, we juxtapose several choices of parameters in the algorithms, in each case within the ranges where the respective theoretical complexity estimates apply. Mainly as a consequence of the more economical residual approximation tolerances, \({{\mathrm{\textsc {BD15Solve}}}}\) can be substantially more efficient than \({{\mathrm{\textsc {STSolve2}}}}\). However, this depends strongly on a sufficiently large choice of \({\tilde{\kappa }}\), which enables \({{\mathrm{\textsc {BD15Solve}}}}\) to profit from rank truncations beyond the theoretical guarantees. For small \({\tilde{\kappa }}\), where one approaches the more pessimistic rank bounds ensured by the analysis, this advantage is evidently lost. In contrast, \({{\mathrm{\textsc {STSolve2}}}}\) shows fairly small variations in performance for the various choices of parameters.

### 6.2 A Parametric Diffusion Problem

*U*, such that \(u(y)\in V\) for each \(y\in U\) solves

*J* and introduce finite-dimensional subspaces \(V_h\subset V\) and \(W_L \subset L^2((-1,1),\frac{1}{2} d\lambda )\). Specifically, for \(V_h\), we take piecewise quadratic finite elements of uniform grid size *h*, and \(W_L := {\text {span}} \{ L_k \}_{k=0,\ldots ,L}\), where \(L_k\) are the orthonormal Legendre polynomials. We thus arrive at

The characteristics of this problem with respect to low-rank approximation are quite different from those of the discretized Poisson equation. In the latter case, one obtains similar exponential-type decay for the singular values of all matricizations. Here, one instead observes much slower algebraic decay of the singular values of the matricization \({\mathcal {\hat{M}}}_{\{1\}}(\mathbf{u}^*)\) as \(J\rightarrow \infty \), but exponential decay of those of \({\mathcal {\hat{M}}}_{\{i\}}(\mathbf{u}^*)\) for \(i>1\); note that \({\mathcal {\hat{M}}}_{\{1\}}(\mathbf{u}^*)\) plays a special role, since it corresponds to the separation between spatial and stochastic variables.
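As a small side check of the stochastic discretization above (an illustration; the quadrature order is an arbitrary choice), the orthonormality of the Legendre basis \(\{L_k\}\) in \(L^2((-1,1),\frac{1}{2} d\lambda )\) can be verified with Gauss–Legendre quadrature:

```python
import numpy as np

L = 8                                                 # polynomial degree cutoff
nodes, weights = np.polynomial.legendre.leggauss(20)  # exact for degree <= 39
# orthonormal w.r.t. (1/2) d(lambda): L_k = sqrt(2k+1) * P_k
vals = np.array([np.sqrt(2*k + 1)*np.polynomial.legendre.Legendre.basis(k)(nodes)
                 for k in range(L + 1)])
gram = (vals * weights) @ vals.T / 2.0                # entries: integral of L_j*L_k/2
assert np.allclose(gram, np.eye(L + 1), atol=1e-12)
```
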

The present example thus concerns on the one hand the estimates for algebraic decay in Theorem 5.1 and on the other hand the behavior of the iteration for such heterogeneous decay as noted in Remark 5.7. As a consequence of the results in [12], the sequence of singular values of \({\mathcal {\hat{M}}}_{\{1\}}(\mathbf{u}^*)\) is in \(\ell ^{p,\infty }\) for \(p=\frac{2}{7}\), corresponding to algebraic convergence with rate \(s=3\) in Theorem 5.1.

In our tests, we use again the linear dimension tree (2.1). We choose sufficiently large *J* and *L*, as well as sufficiently fine spatial discretizations, such that these discretizations do not influence the low-rank approximability at the considered solution residual sizes. The results thus reflect the behavior of the iteration on the full function space, where \({\mathcal {A}}\) has unbounded hierarchical rank.

In Fig. 6, we compare the results for \({{\mathrm{\textsc {STSolve2}}}}\) and \({{\mathrm{\textsc {BD15Solve}}}}\). Both yield a growth of maximum ranks at a rate consistent with the above value of *s*, but \({{\mathrm{\textsc {STSolve2}}}}\) leads to quantitatively larger ranks. This appears to be related to the larger truncation error incurred by soft thresholding. To illustrate this, we additionally consider a modification of \({{\mathrm{\textsc {STSolve2}}}}\) obtained by replacing all soft thresholding operations by hard thresholding with the same threshold levels, which leads to a purely heuristic method whose convergence is not ensured. Here, this modified scheme performs well, however, and indeed yields even smaller ranks than \({{\mathrm{\textsc {BD15Solve}}}}\). Thus, in this example, the general complexity bounds for soft thresholding come at the price of larger ranks than obtained with less conservative thresholding procedures.

Residual bounds and matricization ranks of iterates \(\mathbf{u}_k\) of \({{\mathrm{\textsc {STSolve2}}}}\) obtained for the example of Sect. 6.2

| \(k\) | \(||\mathbf {r}_k||+\delta _k\) | \({{\mathrm{rank}}}({\mathcal {\hat{M}}}_{\{i\}}(\mathbf{u}_k))\) for \(i=1,\ldots ,15\) |
|---|---|---|
| 6 | \(1.15\times 10^{-1}\) | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 |
| 9 | \(3.04\times 10^{-2}\) | 4 3 2 2 1 1 1 1 1 1 1 1 1 1 1 |
| 12 | \(7.50\times 10^{-3}\) | 9 3 2 2 2 1 1 1 1 1 1 1 1 1 1 |
| 15 | \(1.92\times 10^{-3}\) | 12 4 3 2 2 2 2 1 1 1 1 1 1 1 1 |
| 18 | \(4.81\times 10^{-4}\) | 18 5 3 2 2 2 2 2 2 2 1 1 1 1 1 |
| 21 | \(1.21\times 10^{-4}\) | 26 5 3 3 2 2 2 2 2 2 2 2 2 2 1 |

## 7 Conclusion

We have constructed an iterative scheme for solving linear elliptic operator equations in hierarchical tensor representations. This method guarantees linear convergence to the solution \(\mathbf{u}^*\) as well as quasi-optimality of the tensor ranks of all iterates and is universal in the sense that no a priori knowledge on the low-rank approximability of \(\mathbf{u}^*\) is required.

Since the given choices of thresholding parameters work for quite general contractive fixed-point mappings, the construction of schemes that make this choice a posteriori may be possible for more general cases than the linear elliptic one treated here. In this regard, note that although we have always assumed for ease of presentation that the considered operator \({\mathcal {A}}\) is also symmetric, this is not essential.

In this work, we have considered fixed discretizations, and we have obtained bounds that are robust with respect to these discretizations. Although such rank bounds constitute the central ingredient in the complexity analysis of such methods, to obtain meaningful estimates for the total complexity of solving operator equations, this still needs to be combined with adapted discretizations as in [2], which will be the subject of a forthcoming work.

We have seen that our approach generally yields stronger asymptotic rank bounds for each iterate than previous constructions. It also exhibits the expected robust behavior in practice, where it can, however, be quantitatively somewhat more expensive than existing approaches due to the comparably conservative control of the iteration. A further interesting question is therefore to what extent the quantitative performance can be enhanced, for instance by a modified soft thresholding procedure or by a different adjustment of error tolerances, while retaining the asymptotic bounds.

However, our analysis can also be interpreted as a rigorous justification of heuristic low-rank solvers that are based on similar operations. For instance, as can be seen in the test of Sect. 6.2, replacing soft thresholding by hard thresholding means that the general estimates no longer apply, but it may amount only to a small (or even favorable) perturbation in typical practical examples. The methods presented here can thus also serve as a reference point for heuristic schemes that are optimized for particular problems.


### Acknowledgments

This research was supported in part by DFG SPP 1324 and ERC AdG BREAD.

### References

- 1. Bachmayr, M., Cohen, A.: Kolmogorov widths and low-rank approximations of parametric elliptic PDEs. Math. Comp. (2016). In press
- 2. Bachmayr, M., Dahmen, W.: Adaptive near-optimal rank tensor approximation for high-dimensional operator equations. Found. Comput. Math. **15**(4), 839–898 (2015)
- 3. Bachmayr, M., Dahmen, W.: Adaptive low-rank methods: problems on Sobolev spaces. SIAM J. Numer. Anal. (2016). In press
- 4. Ballani, J., Grasedyck, L.: A projection method to solve linear systems in tensor format. Numer. Linear Algebra Appl. **20**(1), 27–43 (2013)
- 5. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. **2**, 182–202 (2009)
- 6. Beylkin, G., Mohlenkamp, M. J.: Algorithms for numerical analysis in high dimensions. SIAM J. Sci. Comput. **26**(6), 2133–2159 (2005)
- 7. Billaud-Friess, M., Nouy, A., Zahm, O.: A tensor approximation method based on ideal minimal residual formulations for the solution of high-dimensional problems. ESAIM Math. Model. Numer. Anal. **48**(6), 1777–1806 (2014)
- 8. Bredies, K., Lorenz, D. A.: Linear convergence of iterative soft-thresholding. J. Fourier Anal. Appl. **14**, 813–837 (2008)
- 9. Cai, J.-F., Candès, E. J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. **20**, 1956–1982 (2010)
- 10. Cancès, E., Ehrlacher, V., Lelièvre, T.: Convergence of a greedy algorithm for high-dimensional convex nonlinear problems. Math. Models Methods Appl. Sci. **21**(12), 2433–2467 (2011)
- 11. Cohen, A., Dahmen, W., DeVore, R.: Adaptive wavelet methods for elliptic operator equations: convergence rates. Math. Comput. **70**(233), 27–75 (2001)
- 12. Cohen, A., DeVore, R., Schwab, C.: Analytic regularity and polynomial approximation of parametric and stochastic PDE's. Anal. Appl. **9**, 1–37 (2011)
- 13. Da Silva, C., Herrmann, F. J.: Optimization on the hierarchical Tucker manifold—applications to tensor completion. Linear Algebra Appl. **481**, 131–173 (2015)
- 14. Dahlke, S., Fornasier, M., Raasch, T.: Multilevel preconditioning and adaptive sparse solution of inverse problems. Math. Comput. **81**, 419–446 (2012)
- 15. Dahmen, W., DeVore, R., Grasedyck, L., Süli, E.: Tensor-sparsity of solutions to high-dimensional elliptic partial differential equations. Found. Comput. Math. (2015). In press. doi:10.1007/s10208-015-9265-9
- 16.Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math.
**57**(11), 1413–1457 (2004)MathSciNetCrossRefMATHGoogle Scholar - 17.DeVore, R.: Nonlinear approximation. Acta Numer.
**7**, 51–150 (1998)CrossRefMATHGoogle Scholar - 18.Dolgov, S. V., Savostyanov, D. V.: Alternating minimal energy methods for linear systems in higher dimensions. SIAM J. Sci. Comput.
**36**(5), A2248–A2271 (2014)MathSciNetCrossRefMATHGoogle Scholar - 19.Donovan, G. C., Geronimo, J. S., Hardin, D. P.: Orthogonal polynomials and the construction of piecewise polynomial smooth wavelets. SIAM J. Math. Anal.
**30**, 1029–1056 (1999)MathSciNetCrossRefMATHGoogle Scholar - 20.Falcó, A., Nouy, A.: Proper generalized decomposition for nonlinear convex problems in tensor Banach spaces. Numer. Math.
**121**(3), 503–530 (2012)MathSciNetCrossRefMATHGoogle Scholar - 21.Gandy, S., Recht, B., Yamada, I.: Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems
**27**(2), 025010 (2011)MathSciNetCrossRefMATHGoogle Scholar - 22.Gantumur, T., Harbrecht, H., Stevenson, R.: An optimal adaptive wavelet method without coarsening of the iterands. Math. Comp.
**76**(258), 615–629 (2007)MathSciNetCrossRefMATHGoogle Scholar - 23.Grasedyck, L.: Existence and computation of low Kronecker-rank approximations for large linear systems of tensor product structure. Computing
**72**, 247–265 (2004)MathSciNetCrossRefMATHGoogle Scholar - 24.Grasedyck, L.: Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl.
**31**(4), 2029–2054 (2010)MathSciNetCrossRefMATHGoogle Scholar - 25.Grasedyck, L., Kressner, D., Tobler, C.: A literature survey of low-rank tensor approximation techniques. GAMM-Mitt.
**36**(1), 53–78 (2013)MathSciNetCrossRefMATHGoogle Scholar - 26.Hackbusch, W.: Entwicklungen nach Exponentialsummen. Tech. Rep. 4, MPI Leipzig (2005)Google Scholar
- 27.Hackbusch, W.: Tensor Spaces and Numerical Tensor Calculus,
*Springer Series in Computational Mathematics*, vol. 42. Springer-Verlag Berlin Heidelberg (2012)CrossRefMATHGoogle Scholar - 28.Hackbusch, W.: Numerical tensor calculus. Acta Numer.
**23**, 651–742 (2014)MathSciNetCrossRefGoogle Scholar - 29.Hackbusch, W., Khoromskij, B., Tyrtyshnikov, E.: Approximate iterations for structured matrices. Numerische Mathematik
**109**, 119–156 (2008)MathSciNetCrossRefMATHGoogle Scholar - 30.Hackbusch, W., Kühn, S.: A new scheme for the tensor representation. J. Fourier Anal. Appl.
**15**(5), 706–722 (2009)MathSciNetCrossRefMATHGoogle Scholar - 31.Holtz, S., Rohwedder, T., Schneider, R.: The alternating linear scheme for tensor optimization in the tensor train format. SIAM J. Sci. Comput.
**34**(2), A683–A713 (2012)MathSciNetCrossRefMATHGoogle Scholar - 32.Huang, B., Mu, C., Goldfarb, D., Wright, J.: Provable models for robust low-rank tensor completion. Pacific Journal of Optimization
**11**(2), 339–364 (2015)MathSciNetMATHGoogle Scholar - 33.Khoromskij, B. N.: Tensor-structured preconditioners and approximate inverse of elliptic operators in \({\mathbb{R}^{d}}\). Constr. Approx.
**30**(3), 599–620 (2009)MathSciNetCrossRefMATHGoogle Scholar - 34.Khoromskij, B. N.: Tensor numerical methods for multidimensional: PDEs theoretical analysis and initial applications. ESAIM: ProcS
**48**, 1–28 (2015)MathSciNetCrossRefMATHGoogle Scholar - 35.Khoromskij, B. N., Schwab, C.: Tensor-structured Galerkin approximation of parametric and stochastic elliptic PDEs. SIAM J. Sci. Comput.
**33**(1), 364–385 (2011)MathSciNetCrossRefMATHGoogle Scholar - 36.Kressner, D., Steinlechner, M., Vandereycken, B.: Preconditioned low-rank Riemannian optimization for linear systems with tensor product structure. preprint (2015)Google Scholar
- 37.Kressner, D., Tobler, C.: Low-rank tensor Krylov subspace methods for parametrized linear systems. SIAM J. Matrix Anal. Appl.
**32**, 1288–1316 (2011)MathSciNetCrossRefMATHGoogle Scholar - 38.Kressner, D., Uschmajew, A.: On low-rank approximability of solutions to high-dimensional operator equations and eigenvalue problems. Linear Algebra Appl.
**493**, 556–572 (2016)MathSciNetCrossRefMATHGoogle Scholar - 39.Kühn, S.: Hierarchische Tensordarstellung. Ph.D. thesis, Universität Leipzig (2012)Google Scholar
- 40.Lai, M.-J., Yin, W.: Augmented \(\ell _1\) and nuclear-norm models with a globally linearly convergent algorithm. SIAM Journal on Imaging Sciences
**6**(2), 1059–1091 (2013)MathSciNetCrossRefMATHGoogle Scholar - 41.Liu, J., Musialski, P., Wonka, P., Ye, J.: Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell.
**35**(1), 208–220 (2013)CrossRefGoogle Scholar - 42.Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Mathematical Programming
**128**(1-2), 321–353 (2011)MathSciNetCrossRefMATHGoogle Scholar - 43.Markus, A. S.: The eigen- and singular values of the sum and product of linear operators. Russian Mathematical Surveys
**19**(4), 91–120 (1964)CrossRefMATHGoogle Scholar - 44.Mirsky, L.: Symmetric gauge functions and unitarily invariant norms. Quarterly Journal of Mathematics
**11**, 50–59 (1960)MathSciNetCrossRefMATHGoogle Scholar - 45.Moreau, J.: Proximité et dualité dans un espace hilbertien. Bulletin de la Société Mathématique de France
**93**, 273–299 (1965)MathSciNetCrossRefMATHGoogle Scholar - 46.Oseledets, I., Tyrtyshnikov, E.: Breaking the curse of dimensionality, or how to use SVD in many dimensions. SIAM J. Sci. Comput.
**31**(5), 3744–3759 (2009)MathSciNetCrossRefMATHGoogle Scholar - 47.Oseledets, I. V.: Tensor-train decomposition. SIAM J. Sci. Comput.
**33**(5), 2295–2317 (2011)MathSciNetCrossRefMATHGoogle Scholar - 48.Recht, B., Fazel, M., Parrilo, P. A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM review
**52**(3), 471–501 (2010)MathSciNetCrossRefMATHGoogle Scholar - 49.Schneider, R., Uschmajew, A.: Approximation rates for the hierarchical tensor format in periodic Sobolev spaces. J. Complexity
**30**(2), 56–71 (2014)MathSciNetCrossRefMATHGoogle Scholar - 50.Vidal, G.: Efficient classical simulation of slightly entangled quantum computations. Phys. Rev. Lett.
**91**, 147902 (2003)CrossRefGoogle Scholar - 51.White, S. R.: Density matrix formulation for quantum renormalization groups. Phys. Rev. Lett.
**69**, 2863–2866 (1992)CrossRefGoogle Scholar - 52.Wright, S. J., Nowak, R. D., Figueiredo, M. A. T.: Sparse reconstruction by separable approximation. IEEE Trans Sig. Process.
**57**, 2479–2493 (2009)MathSciNetCrossRefGoogle Scholar - 53.Yuan, M., Zhang, C.-H.: On tensor completion via nuclear norm minimization. Found. Comput. Math. (2015). In press. DOI:10.1007/s10208-015-9269-5