1 Introduction

In this paper, we recapitulate the numerical techniques which are needed to handle high-dimensional problems. As a discussion starter, we use an example from quantum chemistry. The following function h is to be determined:

$$ h(x,z)={\int}_{\mathbb{R}^{3}}f(x,x-y)g(y,z)\mathrm{d}y\qquad(x,z\in\mathbb{R}^{3}) $$
(1)

(for instance, f and g describe the pair amplitude and the pair interaction; cf. Flad–Flad-Harutyunyan [5]). A discretisation by a uniform grid \(\{\mathbf{i}h=(i_{1}h,i_{2}h,i_{3}h):0\leq i_{1},i_{2},i_{3}\leq n-1\}\) (h: grid size) in a cube leads to the discrete problem

$$ h_{\mathbf{ik}}=h^{3}{\sum}_{\mathbf{j}}f_{\mathbf{i},\mathbf{i} - \mathbf{j}}g_{\mathbf{j},\mathbf{k}}\qquad(\mathbf{i}=(i_{1},i_{2},i_{3}),\mathbf{k}=(k_{1},k_{2},k_{3}),0\leq i_{\nu},k_{\nu}\leq n-1). $$
(2)

Equation (2) describes an unusual matrix multiplication of convolution type:

$$ H=F\star G\qquad(H=(h_{\mathbf{ik}}),F=(f_{\mathbf{i},\mathbf{j}}),G=(g_{\mathbf{j},\mathbf{k}})). $$
(3)

The size of the matrices (number of entries) is \(n^{6}\). Taking n of the size \(2^{10}\approx 10^{3}\) to \(2^{20}\approx 10^{6}\), it becomes obvious that naive methods cannot be used to perform the multiplication (3).

In Section 2, we shall consider the matrices in (3) as tensors of the space \(\mathbb {R}^{N}\otimes \mathbb {R}^{N}\) with

$$N=n^{3}. $$

Then, problem (3) reduces to operations on vectors in \(\mathbb {R}^{N}\).

In a second step (Section 3), \(\mathbb {R}^{N}\) is regarded as the tensor space \(\mathbb {R}^{n}\otimes \mathbb {R}^{n}\otimes \mathbb {R}^{n}\). For such tensors, we describe an efficient representation and show how operations are performed. In our example, we need two operations in \(\mathbb {R}^{n}\):

  • the Hadamard product \(v\odot w\) defined by the componentwise product \((v\odot w)_{i}=v_{i}w_{i}\), and

  • the convolution \(v\star w\) defined by \((v\star w)_{i}={\sum }_{\ell }v_{i-\ell }w_{\ell }\).

The convolution \(v\star w\) is a discretisation of the convolution of functions, \({\int }_{\mathbb {R}}v(x-y)w(y)\mathrm {d}y\), provided that \(v_{i}\) (\(w_{i}\)) are the nodal values of v (w) on an equidistant grid. For instance, the convolution in \(\mathbb {R}^{n}\) can be performed by the fast Fourier transform (FFT), requiring \(\mathcal{O}(n\log n)\) operations. However, as explained in Section 4, we can perform the convolution (as well as the Hadamard product) much faster using the tensorisation technique. Here, \(\mathbb {R}^{n}\) for \(n=2^{L}\) is replaced by the isomorphic tensor space \(\otimes ^{L}\mathbb {R}^{2}\). In many cases, grid functions in \(\mathbb {R}^{n}\) (in particular those from quantum chemistry) can be approximated by a tensor representation using only \(\mathcal {O}(\log ^{\ast }n)\) data. Then, the exact convolution \(v\star w\) requires no more than \(\mathcal {O}(\log ^{\ast }n)\) operations.
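
To fix notation, the following minimal NumPy sketch (not part of the original text; the data are arbitrary) shows both operations for small vectors in \(\mathbb{R}^{n}\):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.0, 1.0, 2.0])

hadamard = v * w              # (v ⊙ w)_i = v_i * w_i, again a vector of length n
conv = np.convolve(v, w)      # (v ⋆ w)_i = sum_l v_{i-l} w_l, a vector of length 2n-1

print(hadamard)               # [0.5 0.  3.  8. ]
print(conv)                   # [0.5 1.  2.5 6.  7.  10.  8. ]
```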

The convolution algorithm mentioned above is also of interest outside quantum chemistry. Often, the functions v and w in \({\int }_{\mathbb {R}}v(x-y)w(y)\mathrm {d}y\) are represented by finite elements using locally refined grids or even hp techniques in order to reduce the number of degrees of freedom. If FFT is used for the convolution, one must transfer the finite-element functions to a uniform grid corresponding to the minimal grid size and thus destroys the advantages of the nonuniform finite-element approach. The tensorisation technique is able to represent the data at least as efficiently as the finite-element approach. Then, the operation cost is determined by the data sizes of the representations. Moreover, it yields the optimal representation of the result \(v\star w\).

2 Low-Rank Techniques for Matrices

2.1 Low-Rank Representation

In quantum chemistry, it is more usual to write the integral (1) as

$$ h(x,z)={\int}_{\mathbb{R}^{3}}\tilde{f}(x,y)g(y,z)\mathrm{d}y \qquad(x,z\in\mathbb{R}^{3}) $$
(4)

by introducing \(\tilde {f}(x,y):=f(x,x-y)\) (cf. [5, (1.4)]). Then, the discrete analogue is the standard matrix product \(\tilde {F}G\) instead of (3). However, this notation is less appropriate since the properties of the function f and of the matrix F are swept under the carpet.

The function f has a (representation) rank r if \(f(x,y)={\sum }_{\nu = 1}^{r}a_{\nu }(x)b_{\nu }(y)\), where {aν} and {bν} are linearly independent univariate functions. The latter identity is also written in tensor form as

$$f=\sum\limits_{\nu= 1}^{r}a_{\nu}\otimes b_{\nu}. $$

For instance, the function \(f(x,y)=\varphi(x)/\|y-y_{0}\|\) (y0 the position of a nucleus) has rank r = 1. However, the function \(\tilde {f}(x,y):=\varphi (x)/\|x-y-y_{0}\|\) involved in (4) has infinite rank.

If the matrix \(F\in \mathbb {R}^{N\times N}\) has the rank r, it allows a representation \(F={\sum }_{\nu = 1}^{r}a_{\nu }b_{\nu }^{\mathsf {T}}~(a_{\nu },b_{\nu }\in \mathbb {R}^{N})\). Again, we write

$$ F={\sum}_{\nu= 1}^{r}a_{\nu}\otimes b_{\nu}. $$
(5)

The splitting of the tensor space \(\mathbb {R}^{N}\otimes \mathbb {R}^{N}\cong \mathbb {R}^{N\times N}\) (≅ denotes isomorphy) into the two factors \(\mathbb {R}^{N}\) is depicted in Fig. 1. In general, the tensor product \(\mathbf{v}=v^{(1)}\otimes v^{(2)}\otimes\cdots\otimes v^{(d)}\) with \(v^{(j)}\in \mathbb {R}^{n_{j}}\) is a quantity indexed by d-tuples \(\mathbf{i}=(i_{1},\ldots,i_{d})\) with the values

$$ \mathbf{v[i]}=v^{(1)}[i_{1}]\cdot v^{(2)}[i_{2}]\cdot\ldots\cdot v^{(d)} [i_{d}]\qquad(1\leq i_{j}\leq n_{j}). $$
(6)

Here and in the sequel, we use boldface letters for tensors and tensor spaces, while vectors, matrices, and vector spaces are denoted by standard letters.

Fig. 1  Tensor space \(\mathbb {R}^{N}\otimes \mathbb {R}^{N}\cong \mathbb {R}^{N\times N}\) and its factors \(\mathbb {R}^{N},\mathbb {R}^{N}\)

If r is much smaller than N, (5) describes the low-rank representation of F. Note that the right-hand side of (5) requires only \(2rN\ll N^{2}\) data.
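
As an illustration (a sketch with hypothetical sizes, not taken from the paper), the factored form (5) can be stored and applied to a vector without ever forming the full N × N array:

```python
import numpy as np

N, r = 1000, 5                       # hypothetical sizes with r << N
rng = np.random.default_rng(0)
a = rng.standard_normal((r, N))      # factors a_nu
b = rng.standard_normal((r, N))      # factors b_nu

# the representation (5) needs only the 2*r*N numbers stored in a and b;
# a matrix-vector product F x = sum_nu a_nu (b_nu^T x) costs O(rN) instead of O(N^2)
x = rng.standard_normal(N)
Fx = a.T @ (b @ x)
```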

\(v^{(1)}\otimes v^{(2)}\otimes\cdots\otimes v^{(d)}\) is called an elementary tensor. In general, the \(v^{(j)}\) may be elements of arbitrary vector spaces Vj. The (algebraic) tensor space \(\mathbf {V}=V_{1}\otimes V_{2}\otimes \cdots \otimes V_{d}=\bigotimes _{j = 1}^{d}V_{j}\) is defined as the span of all elementary tensors (cf. [10, Section 3.2]).

Remark 1

As a consequence, linear maps on V are uniquely defined by their values on elementary tensors. The same holds for bilinear maps on Cartesian products V × W of two tensor spaces.

2.2 SVD Truncation

Even if F has maximal rank N, it might be well approximated by a low-rank matrix Fε with rank rε. For the precise analysis, we need the singular-value decomposition (SVD) of F which is

$$F=\sum\limits_{\nu= 1}^{r}\sigma_{\nu}a_{\nu}\otimes b_{\nu},\qquad \{a_{\nu}\},\{b_{\nu}\}\text{ orthonormal systems}, $$

with the singular values \(\sigma_{1}\geq\sigma_{2}\geq\cdots\geq\sigma_{r}>0\). The traditional formulation is \(F=U{\Sigma}V^{\mathsf{T}}\), where the columns of U and V are defined by aν and bν, respectively, and Σ is the diagonal matrix containing the singular values.

If \(\sigma _{r_{\varepsilon }}\leq \varepsilon \) for some rε < r, the truncated matrix \(F_{\varepsilon }:={\sum }_{\nu = 1}^{r_{\varepsilon }}\sigma _{\nu }a_{\nu }\otimes b_{\nu }\) has rank rε and satisfies the spectral norm estimate ∥FFε2ε.
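
A short sketch of this truncation using the SVD routine of NumPy (the test matrix is a generic smooth kernel, chosen only for illustration):

```python
import numpy as np

def svd_truncate(F, eps):
    # keep all singular values larger than eps; then ||F - F_eps||_2 <= eps
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    r_eps = int(np.sum(s > eps))
    F_eps = U[:, :r_eps] @ np.diag(s[:r_eps]) @ Vt[:r_eps]
    return F_eps, r_eps

x = np.linspace(0.0, 1.0, 200)
F = np.exp(-(x[:, None] - x[None, :]) ** 2)      # analytic kernel: fast singular-value decay
F_eps, r_eps = svd_truncate(F, 1e-8)
print(r_eps, np.linalg.norm(F - F_eps, 2))       # moderate rank, spectral error below 1e-8
```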

Now, we assume

$$F=\sum\limits_{\nu= 1}^{r}a_{\nu}\otimes b_{\nu},\qquad G=\sum\limits_{\mu= 1}^{s}c_{\mu}\otimes d_{\mu} $$

for the matrices in (3). We denote the entries of the vectors aν,bν,… by aν[i],bν[i],…, where i abbreviates the triple (i1,i2,i3). Since \(F_{\mathbf {i},\mathbf {j}}={\sum }_{\nu = 1}^{r}a_{\nu }[\mathbf {i}]b_{\nu }[\mathbf {j}]\) etc., the operation described in (2) becomes

$$h_{\mathbf{ik}}=h^{3}\sum\limits_{\nu= 1}^{r}\sum\limits_{\mu= 1}^{s}\sum\limits_{\mathbf{j}}a_{\nu }[\mathbf{i}] b_{\nu}[\mathbf{i}-\mathbf{j}] c_{\mu}[\mathbf{j}] d_{\mu}[\mathbf{k}]. $$

\({\sum }_{\mathbf {j}}b_{\nu }[\mathbf {i}-\mathbf {j}] c_{\mu }[\mathbf {j}]\) is the component of the convolution \(b_{\nu}\star c_{\mu}\) at index \(\mathbf{i}\). Set \(q_{\nu\mu}:=b_{\nu}\star c_{\mu}\). Then, the expression \({\sum }_{\mathbf {j}}a_{\nu }[\mathbf {i}] b_{\nu }[\mathbf {i}-\mathbf {j}] c_{\mu }[\mathbf {j}]\) is the \(\mathbf{i}\)-component of the Hadamard product \(a_{\nu}\odot q_{\nu\mu}\). Together, we obtain the representation of the matrix H in (3) by

$$ H=\sum\limits_{\mu= 1}^{s}\left( h^{3}\sum\limits_{\nu= 1}^{r}\left[a_{\nu}\odot\left( b_{\nu}\star c_{\mu}\right)\right] \right) \otimes d_{\mu}. $$
(7)

Hence, the following has to be calculated:

  (a) determine the vectors \(q_{\nu \mu }:=b_{\nu }\star c_{\mu }\in \mathbb {R}^{N}\),

  (b) calculate the Hadamard products \(a_{\nu }\odot q_{\nu \mu }\in \mathbb {R}^{N}\),

  (c) determine the sum \(e_{\mu }:=h^{3}{\sum }_{\nu = 1}^{r}a_{\nu }\odot q_{\nu \mu }\).

Then, \(H={\sum }_{\mu = 1}^{s}e_{\mu }\otimes d_{\mu }\) is the representation of the resulting matrix. This shows that H is again a low-rank matrix if G is so. Nevertheless, one may apply a singular-value decomposition and truncate H to a lower rank.
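
The following sketch (illustrative only; tiny hypothetical sizes and random factors) carries out steps (a)–(c) and keeps H in the factored form \(H={\sum}_{\mu}e_{\mu}\otimes d_{\mu}\):

```python
import numpy as np

def conv3(b, c):
    # multivariate convolution of two (n,n,n) arrays via FFT, truncated to
    # the index range 0 <= i_nu <= n-1 used in (2)
    n = b.shape[0]
    m = 2 * n - 1
    full = np.fft.ifftn(np.fft.fftn(b, (m, m, m)) * np.fft.fftn(c, (m, m, m)))
    return np.real(full)[:n, :n, :n]

rng = np.random.default_rng(1)
n, r, s, h = 4, 2, 3, 0.25                     # hypothetical grid size, ranks, mesh width
a = rng.standard_normal((r, n, n, n))          # a_nu, stored as (n,n,n) grids
b = rng.standard_normal((r, n, n, n))          # b_nu
c = rng.standard_normal((s, n, n, n))          # c_mu
d = rng.standard_normal((s, n, n, n))          # d_mu

# (a) q_{nu,mu} = b_nu ⋆ c_mu, (b) Hadamard products, (c) sums e_mu
e = [h**3 * sum(a[nu] * conv3(b[nu], c[mu]) for nu in range(r)) for mu in range(s)]
# H = sum_mu e_mu ⊗ d_mu is kept in this factored form; an entry would be
# H[i, k] = sum_mu e_mu[i] * d_mu[k]
```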

Since \(N=n^{3}\) holds with a large value of n, even the simple Hadamard product in Step (b) is too costly when using the standard vector format. Instead, we shall exploit the tensor structure of \(\mathbb {R}^{N}\).

For later use, we return to the representation (5). Let

$$U:=\operatorname{span}\{a_{\nu}:1\leq\nu\leq r\},\qquad V:=\operatorname{span} \{b_{\nu}:1\leq\nu\leq r\}. $$

Then, the tensor (matrix) F satisfies

$$ F\in U\otimes V\qquad\text{with }\dim(U)=\dim(V)=r. $$
(8)

Comparing (8) with \(F\in \mathbb {R}^{N}\otimes \mathbb {R}^{N}\), we see that the full space \(\mathbb {R}^{N}\) of dimension N is replaced by subspaces of dimension \(r\ll N\).

3 The Hierarchical Tensor Format

3.1 Separation and Bilinear Operations

Here, we make use of the Cartesian product structure of the grid \(\{(i_{1}h,i_{2}h,i_{3}h):0\leq i_{1},i_{2},i_{3}\leq n-1\}\). The tensor product of three vectors \(a,b,c\in \mathbb {R}^{n}\) is defined in (6). These tensors span the tensor space \(\mathbb {R}^{n}\otimes \mathbb {R}^{n}\otimes \mathbb {R}^{n}\), which is isomorphic to \(\mathbb {R}^{N}\) (both spaces have dimension \(N=n^{3}\)).

The analogue of the decomposition (5) would be the representation of \(\mathbf {v}\in \mathbf {V}:=\mathbb {R}^{n}\otimes \mathbb {R}^{n}\otimes \mathbb {R}^{n}\) by

$$ \mathbf{v}=\sum\limits_{\nu= 1}^{r}a_{\nu}\otimes b_{\nu}\otimes c_{\nu}. $$
(9)

The smallest possible value of r is called the rank of the tensor \(\mathbf{v}\). The fact that in general the determination of this rank is NP-hard (cf. Håstad [12]) already shows that the case of tensors of order ≥ 3 is much more involved. In particular, there is no direct analogue of the singular-value decomposition. This leads to difficulties when one wants to truncate a tensor to lower rank (cf. Espig–Hackbusch [4]).

The Hadamard product (componentwise product) ⊙ is a bilinear operation \(\mathbf{V}\times\mathbf{V}\rightarrow\mathbf{V}\). Another bilinear map is the matrix-vector multiplication. For a unified approach, let \(\boxdot \) be the symbol of a general bilinear operation between two tensor spaces. An efficient computation of such a tensor operation \(\boxdot :\mathbf {X}\times \mathbf {Y}\rightarrow \mathbf {Z}\) (with \(\mathbf {X}=\bigotimes _{j = 1}^{d}X_{j}\), etc.) can be based on the following property (10), provided this property holds. Let \(\mathbf {x} = \bigotimes _{j = 1}^{d}x^{(j)}\) and \(\mathbf {y}=\bigotimes _{j = 1}^{d}y^{(j)}\) be elementary tensors with \(x^{(j)}\in X_{j}\), \(y^{(j)}\in Y_{j}\). Then,

$$ \left( \bigotimes_{j = 1}^{d}x^{(j)}\right) \boxdot \left( \bigotimes_{j = 1}^{d} y^{(j)}\right) = \bigotimes_{j = 1}^{d}\left( x^{(j)}\boxdot_{j}y^{(j)}\right) $$
(10)

reduces the operation \(\boxdot \) to simpler bilinear operations \(\boxdot _{j}:X_{j}\times Y_{j}\rightarrow Z_{j}\) on the individual vector spaces.

In the case of the Hadamard product, \(\boxdot =\odot \) is the componentwise product of tensors, while \(\boxdot _{j}=\odot \) is the componentwise product of vectors. In fact, the property

$$ \left( a\otimes b\otimes c\right) \odot \left( a^{\prime}\otimes b^{\prime}\otimes c^{\prime}\right) = \left( a\odot a^{\prime}\right) \otimes \left( b\odot b^{\prime}\right) \otimes\left( c\odot c^{\prime}\right) $$
(11)

follows since \(\{(a\otimes b\otimes c)\odot(a^{\prime}\otimes b^{\prime}\otimes c^{\prime})\}[\mathbf{i}]=(a\otimes b\otimes c)[\mathbf{i}]\cdot(a^{\prime}\otimes b^{\prime}\otimes c^{\prime})[\mathbf{i}]=a[i_{1}]b[i_{2}]c[i_{3}]\,a^{\prime}[i_{1}]b^{\prime}[i_{2}]c^{\prime}[i_{3}]\) and \(\{(a\odot a^{\prime})\otimes(b\odot b^{\prime})\otimes(c\odot c^{\prime})\}[\mathbf{i}]=(a\odot a^{\prime})[i_{1}]\,(b\odot b^{\prime})[i_{2}]\,(c\odot c^{\prime})[i_{3}]=a[i_{1}]a^{\prime}[i_{1}]\,b[i_{2}]b^{\prime}[i_{2}]\,c[i_{3}]c^{\prime}[i_{3}]\) coincide. Note that on the left-hand side of (11) ⊙ acts on \(\mathbf{V}\times\mathbf{V}\), whereas on the right-hand side ⊙ acts on \(\mathbb {R}^{n}\times \mathbb {R}^{n}\).

Another example is the canonical scalar product of a (pre-)Hilbert tensor space X satisfying

$$\left\langle \bigotimes_{j = 1}^{d} x^{(j)},~ \bigotimes_{j = 1}^{d} y^{(j)}\right\rangle = {\prod}_{j = 1}^{d} \left\langle x^{(j)},y^{(j)}\right\rangle. $$

This corresponds to (10) with Y = X and \(\mathbf {Z}=\mathbb {R}\) (the field \(\mathbb {R}\) is considered as a tensor space of order d = 0).

The notation \((\mathbf {x}\star \mathbf {y})[\mathbf {i}]={\sum }_{\mathbf {j}}\mathbf {x}[\mathbf {i}-\mathbf {j}]\mathbf {y}[\mathbf {j}]\) of the multivariate convolution involving multiindices \(\mathbf {i}\in \mathbb {N}_{0}^{d}\) shows that also \(\boxdot =\star \) satisfies (10). For d = 3, we have

$$ \left( a_{\nu}\otimes b_{\nu}\otimes c_{\nu}\right) \star \left( a_{\nu}^{\prime}\otimes b_{\nu}^{\prime}\otimes c_{\nu}^{\prime}\right) = \left( a_{\nu}\star a_{\nu}^{\prime}\right) \otimes\left( b_{\nu}\star b_{\nu}^{\prime}\right) \otimes\left( c_{\nu}\star c_{\nu}^{\prime}\right). $$
(12)

Hence, the Hadamard and convolution operations can be reduced to operations acting on vectors in \(\mathbb {R}^{n}\). If v and w are given in the form (9), all pairs of elementary terms can be treated by (11) or (12), respectively.
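
Both reduction rules are easy to check numerically; the sketch below (not from the paper) builds elementary tensors of order 3 and compares the two sides of (11) and (12):

```python
import numpy as np

def elem(x, y, z):
    # elementary tensor x ⊗ y ⊗ z with entries x[i1]*y[i2]*z[i3]
    return np.einsum('i,j,k->ijk', x, y, z)

def convn(X, Y):
    # full multivariate convolution via FFT with zero padding
    shape = tuple(sx + sy - 1 for sx, sy in zip(X.shape, Y.shape))
    return np.real(np.fft.ifftn(np.fft.fftn(X, shape) * np.fft.fftn(Y, shape)))

rng = np.random.default_rng(2)
a, b, c, a2, b2, c2 = rng.standard_normal((6, 5))

# (11): the Hadamard product acts factorwise
assert np.allclose(elem(a, b, c) * elem(a2, b2, c2), elem(a * a2, b * b2, c * c2))

# (12): the convolution acts factorwise
assert np.allclose(convn(elem(a, b, c), elem(a2, b2, c2)),
                   elem(np.convolve(a, a2), np.convolve(b, b2), np.convolve(c, c2)))
```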

3.2 Introduction of the Hierarchical Format

In the following, we use the hierarchical format, which has the additional advantage that an SVD truncation can be performed (cf. [10, Section 11]). For that purpose, we need tensors of order 2 (matrix case) and rewrite \(\mathbb {R}^{n}\otimes \mathbb {R}^{n}\otimes \mathbb {R}^{n}\) as \((\mathbb {R}^{n}\otimes \mathbb {R}^{n}) \otimes \mathbb {R}^{n}\cong \mathbb {R}^{n^{2}}\otimes \mathbb {R}^{n}\). In a second step, we split \(\mathbb {R}^{n^{2}}\) into \(\mathbb {R}^{n}\otimes \mathbb {R}^{n}\). This leads to the binary tree shown in Fig. 2.

Fig. 2  Decomposition of \(\mathbb {R}^{n}\otimes \mathbb {R}^{n}\otimes \mathbb {R}^{n}\)

In the first step, we regard the components v[i] = v[i1,i2,i3] of \(v\in \mathbb {R}^{N}\) as entries V [(i1,i2),i3] of the matrix \(V\in \mathbb {R}^{n^{2}\times n}\cong \mathbb {R}^{n^{2}}\otimes \mathbb {R}^{n}\). As in Section 2, we may write V as \({\sum }_{\nu = 1}^{s}v_{\nu }^{(12)}\otimes v_{\nu }^{(3)}\) (cf. (5)) with \(v_{\nu }^{(12)}\in \mathbb {R}^{n^{2}}\) and \(v_{\nu }^{(3)}\in \mathbb {R}^{n}\). In the second step, we regard \(v_{\nu }^{(12)}\) as n × n matrices or equivalently as tensors of \(\mathbb {R}^{n}\otimes \mathbb {R}^{n}\) of the form \({\sum }_{\nu = 1}^{r}v_{\nu } ^{(1)}\otimes v_{\nu }^{(2)}\).

Combining the structures of Figs. 1 and 2 yields the splitting depicted in Fig. 3. At the top of the tree, we see the matrix space \(\mathbb {R}^{N\times N}\cong \mathbb {R}^{N}\otimes \mathbb {R}^{N}\) with the sons \(\mathbb {R}^{N}\) on both sides. \(\mathbb {R}^{N}\cong \mathbb {R}^{n^{2}}\otimes \mathbb {R}^{n}\) is split into \(\mathbb {R}^{n^{2}}\) and \(\mathbb {R}^{n}\). Finally, \(\mathbb {R}^{n^{2}}\cong \mathbb {R}^{n}\otimes \mathbb {R}^{n}\) is split in two factors \(\mathbb {R}^{n}\).

Fig. 3  Decomposition of \(\mathbb {R}^{N\times N}\)

Following the construction (8), we associate each vertex of the tree with a subspace. The leaves of the tree correspond to \(\mathbb {R}^{n}\). Therefore, there are six subspaces \(U_{1},\ldots ,U_{6}\subset \mathbb {R}^{n}\). \(\mathbf{U}_{12}\) and \(\mathbf{U}_{45}\) are subspaces of \(\mathbb {R}^{n}\otimes \mathbb {R}^{n}\cong \mathbb {R}^{n^{2}}\), while \(\mathbf{U}_{123}\) and \(\mathbf{U}_{456}\) are subspaces of \(\mathbb {R}^{n}\otimes \mathbb {R}^{n}\otimes \mathbb {R}^{n}\cong \mathbb {R}^{N}\). Also, the root \(\mathbb {R}^{N\times N}\) carries a subspace \(\mathbf{U}_{1\text{-}6}\). The hierarchical structure is given by

$$ \mathbf{U}_{\alpha}\subset\mathbf{U}_{\alpha_{1}}\otimes\mathbf{U}_{\alpha_{2}}\qquad(\alpha_{1},\alpha_{2}\text{ sons of }\alpha), $$
(13)

where α belongs to the index set {12, 123, 45, 456, 1-6}, i.e., \(\mathbf{U}_{12}\subset U_{1}\otimes U_{2}\), \(\mathbf{U}_{123}\subset\mathbf{U}_{12}\otimes U_{3},\ldots,\mathbf{U}_{1\text{-}6}\subset\mathbf{U}_{123}\otimes\mathbf{U}_{456}\) (cf. Fig. 4). The condition (8) becomes

$$ F\in\mathbf{U}_{1\text{-}6}\qquad(1\text{-}6\text{ is the index of the root).} $$
(14)

The subspaces are (in principle) described by a basis (or at least a generating system). The bases of U1,…,U6 corresponding to the leaves must be given explicitly. For the other indices, we avoid an explicit description since the basis vectors of \(\mathbb {R}^{n^{2}}\), \(\mathbb {R}^{N}=\mathbb {R}^{n^{3}}\), etc. are too large. Instead, we make use of (13). Let α be an index of an inner vertex of the tree (no leaf) and α1, α2 its sons. Let \(\{\mathbf {b}_{i}^{(\alpha _{1})}:1\leq i\leq r_{\alpha _{1}}\}\) and \(\{\mathbf {b}_{j}^{(\alpha _{2})}:1\leq j\leq r_{\alpha _{2}}\}\) be the bases of \(\mathbf {U}_{\alpha _{1}}\) and \(\mathbf {U}_{\alpha _{2}}\). Then \(\{\mathbf {b}_{i}^{(\alpha _{1})}\otimes \mathbf {b}_{j}^{(\alpha _{2})}:1\leq i\leq r_{\alpha _{1}},1\leq j\leq r_{\alpha _{2}}\}\) is a basis of \(\mathbf {U}_{\alpha _{1}}\otimes \mathbf {U}_{\alpha _{2}}\). A basis vector \(\mathbf {b}_{\ell }^{(\alpha )}\in \mathbf {U}_{\alpha }\subset \mathbf {U}_{\alpha _{1}}\otimes \mathbf {U}_{\alpha _{2}}\) must have a representation

$$ \mathbf{b}_{\ell}^{(\alpha)}=\sum\limits_{i,j}c_{ij}^{(\alpha,\ell)}\mathbf{b}_{i}^{(\alpha_{1})}\otimes\mathbf{b}_{j}^{(\alpha_{2})} $$
(15)

with coefficients \(c_{ij}^{(\alpha ,\ell )}\) forming an \(r_{\alpha _{1}}\times r_{\alpha _{2}}\) matrix

$$ C^{(\alpha,\ell)}=(c_{ij}^{(\alpha,\ell)}). $$
(16)

It is sufficient to store C(α,ℓ) instead of \(\mathbf {b}_{\ell }^{(\alpha )}\). Note that the necessary memory is independent of the vector size n.
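
A small sketch (hypothetical sizes) of this bookkeeping: only the coefficient matrices C(α,ℓ) and the leaf bases are stored, and the expansion (15) is formed here merely to show what they encode:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r1, r2, r_alpha = 6, 3, 2, 2            # hypothetical dimensions and ranks
B1 = rng.standard_normal((r1, n))          # explicit basis of U_{alpha_1} (a leaf)
B2 = rng.standard_normal((r2, n))          # explicit basis of U_{alpha_2} (a leaf)
C = rng.standard_normal((r_alpha, r1, r2)) # stored coefficient matrices C^{(alpha,l)}

# b_l^{(alpha)} = sum_{i,j} c_{ij}^{(alpha,l)} b_i^{(alpha_1)} ⊗ b_j^{(alpha_2)};
# in the hierarchical format this expansion is never carried out explicitly
B_alpha = np.einsum('lij,ia,jb->lab', C, B1, B2)
print(B_alpha.shape)                       # (r_alpha, n, n)
print(C.nbytes, B_alpha.nbytes)            # stored data vs. size of the explicit expansion
```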

Fig. 4  Corresponding subspaces

If (14) holds, the subspace U1-6 can be reduced to the one-dimensional space Uroot = span{F}. Let \(\mathbf {b}_{1}^{(\text {root})}\) be the only basis vector. Then, only one additional factor \(c_{1}^{(\text {root})}\) is needed to characterise

$$ F=c_{1}^{(\text{root})}\mathbf{b}_{1}^{(\text{root})}. $$
(17)

Remark 2

  (a) In the given example, we have to store the bases of \(U_{1},\ldots,U_{6}\) with the memory size \({\sum }_{j = 1}^{6}n_{j}r_{j}\). The matrices C(α,ℓ) require the memory size \(r_{12}r_{1}r_{2}+r_{45}r_{4}r_{5}+r_{123}r_{12}r_{3}+r_{456}r_{45}r_{6}+1\cdot r_{123}r_{456}\). \(c_{1}^{(\text {root})}\) is only one real number. If \(n_{j}\leq n\) and \(r_{\alpha}\leq r\), the required memory size is bounded by \(6nr+4r^{3}+r^{2}+1\).

  (b) In the general case of tensors of order d (instead of 6 as above), the bound is \(dnr+(d-1)r^{3}+1\).

Below, we shall demonstrate that we can perform the required operations although we only have an indirect access to the bases.

3.3 Matricisation

The above construction gives rise to two questions: Do subspaces with the properties (13), (14) exist and what are their dimensions

$$r_{\alpha}=\dim(\mathbf{U}_{\alpha}) $$

in the best case? The answer is given by the matricisation which maps a tensor isomorphically into a matrix. We explain this isomorphism for the example α = 45. The tensor \(F\in \bigotimes _{j = 1}^{6}\mathbb {R}^{n}\) has six indices (we write F[i1,…,i6] instead of F[i1,i2,i3,j1,j2,j3] = F[i,j]). The matrix M(45) is of the size \(\mathbb {R}^{n^{2}\times n^{4}}\) and has the entries

$$M^{(45)}[(i_{4},i_{5}), (i_{1},i_{2},i_{3},i_{6})] := F[i_{1},i_{2},i_{3},i_{4},i_{5},i_{6}]. $$

The subspace

$$\mathbf{U}_{45}:=\text{range}(M^{(45)})\qquad\text{with }r_{45}= \dim(\mathbf{U}_{45})=\operatorname{rank}(M^{(45)}) $$

is the smallest subspace satisfying (13) and (14). For a more general description of the minimal subspaces see [10, Section 6].

For \(\mathbf {v}\in \bigotimes _{j = 1}^{d}\mathbb {R}^{n_{j}}\) let \(\emptyset \neq \alpha \subsetneqq \{1,\ldots ,d\}\) be a subset with the complement αc := {1,…,d}∖α. In general, the minimal subspace \(\mathbf {U}_{\alpha }^{\min }(\mathbf {v}):=\text {range}(M^{(\alpha )})\) involves the matricisation M(α) = M(α)(v) which is defined by \(M^{(\alpha )}[(i_{j})_{j\in \alpha },(i_{j})_{j\in \alpha ^{c}}]=v[i_{1},\ldots ,i_{d}]\). Note that the index sets need not be ordered, since we only use properties of M(α) which do not depend on the ordering. The (matrix) rank of M(α) is called the α-rank of v (cf. Hitchcock [13]):

$$\operatorname{rank}_{\alpha}(\mathbf{v}):=\operatorname{rank}(M^{(\alpha )}(\mathbf{v})). $$
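
With multidimensional arrays, the matricisation and the α-rank are obtained by a transpose and a reshape; a sketch (random tensor; the ordering inside the two index groups is irrelevant for the rank):

```python
import numpy as np

def matricisation(v, alpha):
    # M^(alpha): rows indexed by (i_j)_{j in alpha}, columns by the complement
    alpha_c = [j for j in range(v.ndim) if j not in alpha]
    M = np.transpose(v, axes=list(alpha) + alpha_c)
    rows = int(np.prod([v.shape[j] for j in alpha]))
    return M.reshape(rows, -1)

rng = np.random.default_rng(4)
v = rng.standard_normal((3, 3, 3, 3))        # a tensor of order d = 4
alpha = [1, 2]
r_alpha = np.linalg.matrix_rank(matricisation(v, alpha))   # rank_alpha(v)
print(r_alpha)                               # at most min(9, 9) = 9
```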

3.4 Hadamard Product and General Bilinear Operations

In the following, the Hadamard product ⊙ can be replaced by a general bilinear operation \(\boxdot \) (cf. (10)).

In (7), we need the Hadamard product \(\mathbf{v}\odot\mathbf{w}\) of two tensors in \(\bigotimes _{j = 1}^{3}\mathbb {R}^{n}\). We assume that both \(\mathbf{v}\) and \(\mathbf{w}\) are represented in the hierarchical format corresponding to the tree depicted in Fig. 2. \(\mathbf{v}\) uses the bases \(\{b_{i}^{(j)}:1\leq i\leq r_{j}\},~1\leq j\leq 3\), at the leaves and the coefficients \(c_{ij}^{(\alpha ,\ell )}\), \(c_{1}^{(\text {root})}\), whereas \(\mathbf{w}\) is represented by \(\{b_{i}^{\prime (j)}\}\), \(c_{ij}^{\prime (\alpha ,\ell )}\), \(c_{1}^{\prime (\text {root})}\). Also, the ranks rα and \(r_{\alpha }^{\prime }\) may be different.

We start at the leaves and determine the Hadamard product of the basis vectors explicitly:

$$b_{(i,i^{\prime})}^{\prime\prime(j)}:=b_{i}^{(j)}\odot b_{i^{\prime}}^{\prime(j)}\qquad(1\leq j\leq3,~ 1\leq i\leq r_{j},~ 1\leq i^{\prime}\leq r_{j}^{\prime}). $$

By induction, we assume that the products \(\mathbf {b}_{(i,i^{\prime })}^{\prime \prime (\alpha _{1})}\) and \(\mathbf {b}_{(j,j^{\prime })}^{\prime \prime (\alpha _{2})}\) are (directly or indirectly) determined. Then, (15) and (11) prove that

$$\begin{array}{@{}rcl@{}} \mathbf{b}_{(\ell,m)}^{\prime\prime(\alpha)} & :=&\mathbf{b}_{\ell}^{(\alpha)}\odot\mathbf{b}_{m}^{\prime(\alpha)}=\left( \sum\limits_{i,j}c_{ij}^{(\alpha,\ell)}\mathbf{b}_{i}^{(\alpha_{1})}\otimes\mathbf{b}_{j}^{(\alpha_{2})}\right) \odot\left( \sum\limits_{i^{\prime},j^{\prime}}c_{i^{\prime}j^{\prime}}^{\prime(\alpha,m)}\mathbf{b}_{i^{\prime}}^{\prime(\alpha_{1})}\otimes\mathbf{b}_{j^{\prime}}^{\prime(\alpha_{2})}\right)\\ &=&\sum\limits_{i,j}\sum\limits_{i^{\prime},j^{\prime}}c_{ij}^{(\alpha,\ell)}c_{i^{\prime}j^{\prime}}^{\prime(\alpha,m)}\left( \mathbf{b}_{i}^{(\alpha_{1})}\odot\mathbf{b}_{i^{\prime}}^{\prime(\alpha_{1})}\right) \otimes\left( \mathbf{b}_{j}^{(\alpha_{2})}\odot\mathbf{b}_{j^{\prime}}^{\prime(\alpha_{2})}\right) \\ &=&\sum\limits_{\left( i,i^{\prime}\right)}\sum\limits_{\left( j,j^{\prime}\right)}c_{ij}^{(\alpha,\ell)}c_{i^{\prime}j^{\prime}}^{\prime(\alpha,m)}\mathbf{b}_{(i,i^{\prime})}^{\prime\prime(\alpha_{1})}\otimes\mathbf{b}_{(j,j^{\prime})}^{\prime\prime(\alpha_{2})}. \end{array} $$
(18)

The result \(\mathbf{x}:=\mathbf{v}\odot\mathbf{w}\) is represented by the generating system \(\{b_{(i,i^{\prime })}^{\prime \prime (j)}\}\), 1 ≤ j ≤ 3, at the leaves. Here, the pairs \((i,i^{\prime})\) are the indices; thus, the index set has the size \(r_{j}^{\prime \prime }:=r_{j}r_{j}^{\prime }\). The equation (15) for the new vector contains the coefficients \(c_{(i,i^{\prime }),(j,j^{\prime })}^{\prime \prime (\alpha ,(\ell ,m))}:=c_{ij}^{(\alpha ,\ell )}c_{i^{\prime }j^{\prime }}^{\prime (\alpha ,m)}\). The coefficient \(c_{1}^{\prime \prime (\text {root})}\) is \(c_{1}^{(\text {root})}c_{1}^{\prime (\text {root})}\), since \(\mathbf {v}\odot \mathbf {w}=\left (c_{1}^{(\text {root})}\mathbf {b}_{1}^{(\text {root})}\right ) \odot \left (c_{1}^{\prime (\text {root})}\mathbf {b}_{1}^{\prime (\text {root})}\right ) =c_{1}^{(\text {root})}c_{1}^{\prime (\text {root})}\mathbf {b}_{1}^{(\text {root})}\odot \mathbf {b}_{1}^{\prime (\text {root})}=c_{1}^{(\text {root})}c_{1}^{\prime (\text {root})}\mathbf {b}_{(1,1)}^{\prime \prime (\text {root})}\).
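
If the index pairs \((i,i^{\prime})\) and \((j,j^{\prime})\) are ordered lexicographically, the new coefficient matrices are Kronecker products of the old ones; a one-line sketch (hypothetical sizes):

```python
import numpy as np

rng = np.random.default_rng(5)
C_l  = rng.standard_normal((3, 2))   # C^{(alpha,l)}  of size r_{alpha_1} x r_{alpha_2}
Cp_m = rng.standard_normal((2, 4))   # C'^{(alpha,m)} of size r'_{alpha_1} x r'_{alpha_2}

# coefficient matrix of b''_{(l,m)}: c''_{(i,i'),(j,j')} = c_{ij} * c'_{i'j'}
Cpp_lm = np.kron(C_l, Cp_m)
print(Cpp_lm.shape)                  # (r_{alpha_1} r'_{alpha_1}, r_{alpha_2} r'_{alpha_2}) = (6, 8)
```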

We call \(\{\mathbf {b}_{(i,i^{\prime })}^{\prime \prime (\alpha )}\}\) a generating system (or frame) since these vectors are not necessarily linearly independent. If they are linearly dependent, the system is larger than necessary and can be shortened. Even if \(\{\mathbf {b}_{(i,i^{\prime })}^{\prime \prime (\alpha )}\}\) forms a basis, the question remains whether we can truncate the basis within a given tolerance. This will be the subject of Section 3.6.

Remark 3

The computation of all \(b_{(i,i^{\prime })}^{\prime \prime (j)}\) requires \(3nr_{j}r_{j}^{\prime }\) multiplications. If all coefficients \(c_{(i,i^{\prime }),(j,j^{\prime })}^{\prime \prime (\alpha ,(\ell ,m))}\) are computed explicitly, we need \(r_{\alpha }r_{\alpha }^{\prime }r_{\alpha _{1}}r_{\alpha _{1}}^{\prime }r_{\alpha _{2}}r_{\alpha _{2}}^{\prime }\) multiplications. The resulting cost is the product of the data sizes of v and w.

In Section 4, the ranks \(r_{\alpha }^{\prime }\), \(r_{\alpha _{1}}^{\prime }\), \(r_{\alpha _{2}}^{\prime }\) will be equal to 2.

3.5 Scalar Product, Orthonormalisation, Transformations

As mentioned above, the linear independence of the new frame \(\{\mathbf {b}_{(i,i^{\prime })}^{\prime \prime (\alpha )}\}\) has to be checked. This can be done by a QR decomposition, provided we are able to determine the scalar products \(\left \langle \mathbf {b}_{(i,i^{\prime })}^{\prime \prime (j)},\mathbf {b}_{(m,m^{\prime })}^{\prime \prime (j)}\right \rangle \) of the vectors determined in (18). We simplify the notation (index i instead of \((\ell,m)\)) and consider the bases \(\{\mathbf {b}_{i}^{(\alpha )}\}\) at the vertex α and their connection by (15). We proceed from the leaves to the root as in Section 3.4.

At the leaves, the bases are explicitly given so that the scalar products

$$ \sigma_{ij}^{(\alpha)}:=\left\langle \mathbf{b}_{i}^{(\alpha)},\mathbf{b}_{j}^{(\alpha)}\right\rangle $$
(19)

can be determined as usual. As soon as \(\sigma _{ij}^{(\alpha _{1})}\) and \(\sigma _{ij}^{(\alpha _{2})}\) are known for the sons of α, \(\sigma _{\ell m}^{(\alpha )}\) can be determined by

$$\begin{array}{@{}rcl@{}} \sigma_{\ell m}^{(\alpha)} & =& \left\langle \mathbf{b}_{\ell}^{(\alpha)},\mathbf{b}_{m}^{(\alpha)}\right\rangle =\left\langle \sum\limits_{i,j} c_{ij}^{(\alpha,\ell)}\mathbf{b}_{i}^{(\alpha_{1})}\otimes\mathbf{b}_{j}^{(\alpha_{2})},\sum\limits_{i^{\prime},j^{\prime}}c_{i^{\prime}j^{\prime}}^{(\alpha,m)}\mathbf{b}_{i^{\prime}}^{(\alpha_{1})}\otimes\mathbf{b}_{j^{\prime}}^{(\alpha_{2})}\right\rangle\\ & =& {\sum}_{i,j}\sum\limits_{i^{\prime},j^{\prime}}c_{ij}^{(\alpha,\ell)}c_{i^{\prime}j^{\prime}}^{(\alpha,m)}\left\langle \mathbf{b}_{i}^{(\alpha_{1})},\mathbf{b}_{i^{\prime}}^{(\alpha_{1})}\right\rangle \left\langle \mathbf{b}_{j}^{(\alpha_{2})},\mathbf{b}_{j^{\prime}}^{(\alpha_{2})}\right\rangle ={\sum}_{i,j}\sum\limits_{i^{\prime},j^{\prime}}c_{ij}^{(\alpha,\ell)}c_{i^{\prime}j^{\prime}}^{(\alpha,m)}\sigma_{ii^{\prime}}^{(\alpha_{1})}\sigma_{jj^{\prime}}^{(\alpha_{2})}, \end{array} $$
(20)

since the Euclidean scalar product satisfies the rule \(\langle v\otimes w,x\otimes y\rangle=\langle v,x\rangle\langle w,y\rangle\). The induction (20) terminates at the vertex α, where the scalar products (19) are desired.
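
The recursion (20) involves only the small coefficient and Gram matrices; the following sketch (toy sizes) also checks the formula against the explicitly assembled basis vectors, which of course is never done in practice:

```python
import numpy as np

def gram_up(C, S1, S2):
    # (20): sigma^{(alpha)}_{lm} = sum c^{(alpha,l)}_{ij} c^{(alpha,m)}_{i'j'} S1_{ii'} S2_{jj'}
    return np.einsum('lij,mpq,ip,jq->lm', C, C, S1, S2)

rng = np.random.default_rng(6)
n, r1, r2, r = 4, 3, 3, 2
B1 = rng.standard_normal((r1, n))            # leaf basis of the first son
B2 = rng.standard_normal((r2, n))            # leaf basis of the second son
C = rng.standard_normal((r, r1, r2))         # coefficient matrices at alpha
S1, S2 = B1 @ B1.T, B2 @ B2.T                # Gram matrices (19) of the sons

B_alpha = np.einsum('lij,ia,jb->lab', C, B1, B2).reshape(r, -1)
assert np.allclose(gram_up(C, S1, S2), B_alpha @ B_alpha.T)
```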

Of particular interest are orthonormal bases: \(\sigma _{ij}^{(\alpha )}=\delta _{ij}\). Using (15), we obtain the following result.

Remark 4

Let α be a non-leaf vertex. The basis \(\{\mathbf {b}_{\ell }^{(\alpha )}\}\) is orthonormal, if (a) the bases \(\{\mathbf {b}_{i}^{(\alpha _{1})}\}\) and \(\{\mathbf {b}_{j}^{(\alpha _{2})}\}\) of the sons α1,α2 are orthonormal and (b) the matrices C(α,) in (16) are orthonormal with respect to the Frobenius scalar product: \(\langle C^{(\alpha ,\ell )},C^{(\alpha ,m)}\rangle _{\mathsf {F}}={\sum }_{ij}c_{ij}^{(\alpha ,\ell )}c_{ij}^{(\alpha ,m)}=\delta _{\ell m}\).

The bases (or frames) can be orthonormalised as follows. Orthonormalise the explicitly given bases at the leaves (e.g., by QR). As soon as \(\{\mathbf {b}_{i}^{(\alpha _{1})}\}\) and \(\{\mathbf {b}_{j}^{(\alpha _{2})}\}\) are orthonormal, orthonormalise the matrices C(α,ℓ). The new matrices \(C_{\text {new}}^{(\alpha ,\ell )}\) define a new orthonormal basis \(\{\mathbf {b}_{\ell ,\text {new}}^{(\alpha )}\}\). The cost is described in [10, Remark 11.32].

The above-mentioned calculations require basis transformations. Here, the following has to be taken into account (cf. [10, Section 11.3.1.4]).

  • Case A1. Let α1 be the first son of α. Assume that the basis \(\{\mathbf {b}_{i}^{(\alpha _{1})}\}\) is transformed into a new basis \(\{\mathbf {b}_{i,\text {new}}^{(\alpha _{1})}\}\) so that \(\textbf {b}_{i}^{(\alpha _{1})}={\sum }_{k}T_{ki} \mathbf {b}_{k,\text {new}}^{(\alpha _{1})}\). Changing C(α,ℓ) into \(C_{\text {new}}^{(\alpha ,\ell )}:=TC^{(\alpha ,\ell )}\), the basis \(\{\mathbf {b}_{\ell }^{(\alpha )}\}\) remains unchanged.

  • Case A2. If \(\textbf {b}_{i}^{(\alpha _{2})}={\sum }_{k}T_{ki}\mathbf {b}_{k,\text {new}}^{(\alpha _{2})}\) is a transformation of the second son of α, C(α,ℓ) must be changed into \(C^{(\alpha,\ell)}T^{\mathsf{T}}\).

  • Case B. Consider a non-leaf vertex α. If the basis \(\{\mathbf {b}_{\ell }^{(\alpha )}\}\) should be transformed into \(\mathbf {b}_{\ell ,\text {new}}^{(\alpha )}:={\sum }_{i}T_{\ell i}\mathbf {b}_{i}^{(\alpha )}\), one has to change the coefficient matrices C(α,ℓ) by \(C_{\text {new}}^{(\alpha ,\ell )}:={\sum }_{i}T_{\ell i}C^{(\alpha ,i)}\). (In addition, this transformation causes changes at the father vertex according to Case A1 or Case A2.)

3.6 SVD Truncation

The example in Section 3.4 shows that the Hadamard product is given by means of a generating system of increased size \(r_{j}^{\prime \prime }:=r_{j}r_{j}^{\prime }\). This size may be larger than necessary and should be truncated. The truncation is prepared by an orthonormalisation as described in Section 3.5.

In principle, the SVD truncation is based on the singular-value decompositions of the matricisations M(α) (cf. Section 3.3). However, the singular values and singular vectors can be determined without the explicit knowledge of the huge matrix M(α).

Having generated orthonormal bases at all nodes, the singular-value decomposition starts at the root and proceeds to the leaves. It produces a basis \(\{\mathbf {b}_{\ell ,\text {new}}^{(\alpha )}\}\) together with singular values \(\sigma _{\ell }^{(\alpha )}\) indicating the importance of \(\mathbf {b}_{\ell ,\text {new}}^{(\alpha )}\). At the start α = root there is only one (normalised) basis vector \(\mathbf {b}_{1}^{(\text {root})}=\mathbf {b}_{1,\text {new}}^{(\text {root})}\) which remains unchanged. The corresponding weight factor is \(\sigma _{1}^{(\text {root})}=|c_{1}^{(\text {root})}|\) (cf. (17)).

Assume that the new basis \(\{\mathbf {b}_{\ell ,\text {new}}^{(\alpha )}\}\) is already computed at the vertex α and that α is not a leaf but has sons α1, α2. The basis \(\{\mathbf {b}_{\ell }^{(\alpha )}\}\) is characterised by the matrices C(α,ℓ). Together with the given values \(\sigma _{\ell }^{(\alpha )}\), we define the matrices

$$\begin{array}{@{}rcl@{}} \mathbf{Z}_{1} & :=& \left[\sigma_{1}^{(\alpha)}C^{(\alpha,1)},\sigma_{2}^{(\alpha)}C^{(\alpha,2)},\ldots,\sigma_{r_{\alpha}}^{(\alpha)} C^{(\alpha,r_{\alpha})}\right] \in\mathbb{R}^{r_{\alpha_{1}}\times(r_{\alpha}r_{\alpha_{2}})},\\ \mathbf{Z}_{2} & :=& \left[\sigma_{1}^{(\alpha)}\left( C^{(\alpha,1)}\right)^{\mathsf{T}},\sigma_{2}^{(\alpha)}\left( C^{(\alpha,2)}\right)^{\mathsf{T}},\ldots,\sigma_{r_{\alpha}}^{(\alpha)}\left( C^{(\alpha,r_{\alpha})}\right)^{\mathsf{T}}\right]^{\mathsf{T}}\in\mathbb{R}^{(r_{\alpha}r_{\alpha_{1}}) \times r_{\alpha_{2}}}. \end{array} $$

The SVD of these matrices yields \(\mathbf {Z}_{1}={\sum }_{i}\sigma _{i}^{(\alpha _{1})}u_{i}^{(\alpha _{1})}\otimes v_{i}^{(\alpha _{1})}\) and \(\mathbf {Z}_{2}={\sum }_{i}\sigma _{i}^{(\alpha _{2})}u_{i}^{(\alpha _{2})}\otimes v_{i}^{(\alpha _{2})}\) with orthonormal vectors \(u_{i}^{(\alpha _{1})}\in \mathbb {R}^{r_{\alpha _{1}}}\) and \(v_{i}^{(\alpha _{2})}\in \mathbb {R}^{r_{\alpha _{2}}}\). Now, we have to transform the bases at the son nodes: \(\{\mathbf {b}_{i,\text {new}}^{(\alpha _{1})}\}:=\{u_{i}^{(\alpha _{1})}\}\) becomes the new basis for α1, and \(\{\mathbf {b}_{i,\text {new}}^{(\alpha _{2})}\}:=\{v_{i}^{(\alpha _{2})}\}\) becomes the new basis for α2. The new bases are called the HOSVD bases (cf. Footnote 5).

The procedure is repeated for the sons of α1, α2 until we reach the leaves. Then, at all vertices, HOSVD bases are introduced together with singular values \(\sigma _{\nu }^{(\alpha )}\). As in Section 2.2, the SVD truncation consists of omitting all basis vectors corresponding to small enough singular values. Let \(\sigma _{\nu }^{(\alpha )}\), \(1\leq\nu\leq r_{\alpha}\), be all singular values at α. Assume that we keep \(\sigma _{\nu }^{(\alpha )}\) for \(1\leq\nu\leq s_{\alpha}\) and omit those for ν > sα. This means that (15) is reduced to \(\mathbf {b}_{\ell }^{(\alpha )}\) with \(\ell\leq s_{\alpha}\) and that the double sum in (15) is taken over \(i\leq s_{\alpha _{1}}\) and \(j\leq s_{\alpha _{2}}\). Let \(\mathbf{v}\) be the input tensor, while \(\mathbf{v}_{\text{HOSVD}}\) denotes the truncated version. Then, the following estimate holds (cf. [10, Theorem 11.58]):

$$\|\mathbf{v}-\mathbf{v}_{\text{HOSVD}}\| \leq\sqrt{\sum\limits_{\alpha}\sum\limits_{\nu\geq s_{\alpha}+ 1}(\sigma_{\nu}^{(\alpha)})^{2}}\leq\sqrt{2d-3}\|\mathbf{v}-\mathbf{v}_{\text{best}}\|. $$

The first inequality allows us to explicitly control the error with respect to the Euclidean norm by the choice of the omitted singular values. The second inequality proves quasi-optimality of this truncation. Here, \(\mathbf{v}_{\text{best}}\) is the best approximation satisfying \(\operatorname{rank}_{\alpha}(\mathbf{v}_{\text{best}})\leq s_{\alpha}\). The parameter d is the order of the tensor, i.e., d = 6 in the case of Fig. 3 and d = 3 for Fig. 2. Only in the (matrix) case d = 2 does \(\mathbf{v}_{\text{HOSVD}}\) coincide with \(\mathbf{v}_{\text{best}}\).

3.7 Convolution

The treatment of Section 3.4 for the Hadamard operation ⊙ holds for any binary operation with the property (10). Because the multivariate convolution satisfies the analogous condition (12), the constructions of Section 3.4 also hold for the convolution ⋆ instead of ⊙. Therefore, we can perform the convolution in \(\mathbb {R}^{n}\otimes \mathbb {R}^{n}\otimes \mathbb {R}^{n}\cong \mathbb {R}^{N}\), provided that we are able to perform the convolution \((v\star w)_{i}={\sum }_{\ell }v_{i-\ell }w_{\ell }\) in \(\mathbb {R}^{n}\).

The standard approach is the use of the FFT (fast Fourier transform): first, the vectors v, w are mapped into their (discrete) Fourier images \(\hat {v},\hat {w}\); then, the Hadamard product \(x:=\hat {v}\odot \hat {w}\) is back-transformed into the convolution result \(\check {x}=v\star w\) (with suitable scaling and zero padding). As is well known, the corresponding work is \(\mathcal {O}(n\log n)\). For large n, this is still expensive. In the next section, we shall describe a much cheaper algorithm for \(v\star w\).
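
A minimal sketch of this FFT route (zero padding to length 2n − 1 yields the non-periodic convolution):

```python
import numpy as np

def conv_fft(v, w):
    # map v, w to their discrete Fourier images, take the Hadamard product,
    # and transform back; O(n log n) operations
    m = len(v) + len(w) - 1
    return np.real(np.fft.ifft(np.fft.fft(v, m) * np.fft.fft(w, m)))

rng = np.random.default_rng(7)
v, w = rng.standard_normal(8), rng.standard_normal(8)
assert np.allclose(conv_fft(v, w), np.convolve(v, w))   # agrees with the direct O(n^2) sum
```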

4 Tensorisation

The tensorisation technique has been introduced by Oseledets [17] (but for matrices instead of vectors). It is more natural to study this technique for vectors. The article by Khoromskij [15] is the first one in this direction and contains several examples of this technique. Tensorisation together with truncation can be considered as an algebraic data compression method which is at least as successful as particular analytical compressions, e.g., by means of wavelets or hp methods. The analysis by Grasedyck [6] shows that, under suitable conditions, a data size \(N(\tilde {\mathbf {v}}_{\varepsilon })=\mathcal {O}(\log n)\) can be expected. Compression by tensorisation can be seen as a quite general multi-scale approach.

Here, we consider operations between vectors. The crucial point is that the computational work of the operations should be related to the data size of the operands. Assuming a data size ≪ n, the cost should also be much smaller than the operation cost in the standard \(\mathbb {R}^{n}\) vector format. In particular, we discuss the Hadamard product and the (one-dimensional) convolution operation \(u:=v\star w\) with \(u_{i}={\sum }_{k}v_{k}w_{i-k}\). We shall show that the convolution procedure can be applied directly to the tensor approximations \(\tilde {\mathbf {v}}_{\varepsilon }\) and \(\tilde {\mathbf {w}}_{\varepsilon }\). The algorithm developed in Section 4.4 has a cost related to the data sizes \(N(\tilde {\mathbf {v}}_{\varepsilon })\), \(N(\tilde {\mathbf {w}}_{\varepsilon })\).

4.1 Grid Functions in \(\mathbb {R}^{n}\)

The following algorithms apply to vectors in \(\mathbb {R}^{n}\) with \(n=2^{L}\). The connection to the previous part is given by the fact that in Section 3 we have to perform various operations with the basis vectors \(b_{i}^{(j)}\in \mathbb {R}^{n}\). More generally, however, the techniques of this section can be used for computations in \(\mathbb {R}^{n}\) without any connection to the tensor problems of Sections 2 and 3.

Tensorisation is an interpretation of a usual \(\mathbb {R}^{n}\) vector as a tensor. Since \(n=2^{L}\), there is a representation of the indices \(0\leq k\leq n-1\) by the binary numeral \((i_{L},i_{L-1},\ldots,i_{1})_{2}\):

$$ k=\sum\limits_{\ell= 1}^{L}i_{\ell}2^{\ell-1},\qquad i_{\ell}\in\{0,1\}. $$
(21)

We map the vector \(v\in \mathbb {R}^{n}\) into the tensor \(\mathbf {v}\in \otimes ^{L}\mathbb {R}^{2}:=\bigotimes _{j = 1}^{L}\mathbb {R}^{2}\) of order L by means of

$$ \mathbf{v}[i_{1},\ldots,i_{L}]=v_{k}\qquad\text{with }k\text{ and }i_{j}\text{ as in (21)}. $$
(22)

Since \(n=\dim (\mathbb {R}^{n})=\dim (\otimes ^{L}\mathbb {R}^{2})= 2^{L}\), (22) describes an isomorphism

$$ {\Phi}:\otimes^{L}\mathbb{R}^{2}\rightarrow\mathbb{R}^{n},\quad\mathbf{v}\mapsto v. $$
(23)

On the side of tensors, we shall introduce a hierarchical tensor representation (cf. Section 3). This allows a simple truncation procedure \(\mathbf{v}\mapsto\mathbf{v}_{\varepsilon}\) (cf. Section 3.6). Often, the data size \(N(\mathbf{v}_{\varepsilon})\) of \(\mathbf{v}_{\varepsilon}\) is much smaller than n (see Example 2). As a consequence, the tensorisation together with the truncation yields a black-box compression method for vectors in \(\mathbb {R}^{n}\).
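
In NumPy, the isomorphism Φ from (22)–(23) is a mere reshape; a sketch (Fortran ordering makes i1 the fastest index, i.e., the least significant bit of (21)):

```python
import numpy as np

L = 4
n = 2 ** L
v = np.arange(n, dtype=float)                # an arbitrary vector in R^n

V = v.reshape((2,) * L, order='F')           # Phi^{-1}(v): tensor of order L
assert V[1, 0, 1, 0] == v[1 + 4]             # k = 1*1 + 0*2 + 1*4 + 0*8 = 5
assert np.array_equal(V.reshape(n, order='F'), v)   # Phi maps back to the vector
```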

4.2 TT Format

The underlying tree of the hierarchical representation is the linear tree depicted in Fig. 5. Hierarchical representations based on a linear tree were introduced by Oseledets [17] as the TT format (cf. Oseledets–Tyrtyshnikov [18]). In principle, the hierarchical format requires subspaces at the leaves. Since \(\mathbb {R}^{2}\) is extremely low-dimensional, we take the full space \(\mathbb {R}^{2}\) and fix the basis by \(b_{1}^{(j)}=\binom {1}{0}\) and \(b_{2}^{(j)}=\binom {0}{1}\). Figure 5 corresponds to L = 4 (i.e., n = 16). We replace the index α = {1,2,…,μ} for the inner vertices by μ ∈ {2,…,L}. The subspaces \(\mathbf{U}_{\mu}\) belong to \(\otimes ^{\mu }\mathbb {R}^{2}\cong \mathbb {R}^{2^{\mu }}\) (in particular \(\mathbf {U}_{1}=\mathbb {R}^{2})\).

Fig. 5  Linear tree for the TT format

Since the TT-rank rμ = rank(M(μ)) is the minimal dimension of the required subspace \(\mathbf {U}_{\mu }\subset \otimes ^{\mu }\mathbb {R}^{2}\), the matricisation M(μ) of a tensor v is of interest. In fact, M(μ) can be expressed by means of the corresponding vector v = Φ(v):

$$ M^{(\mu)}=\left[ \begin{array}[c]{cccc} v_{0} & v_{2^{\mu}} & {\ldots} & v_{2^{L-1}}\\ v_{1} & v_{2^{\mu}+ 1} & {\ldots} & v_{2^{L-1}+ 1}\\ {\vdots} & {\vdots} & {\ddots} & \vdots\\ v_{2^{\mu}-1} & v_{2^{\mu+ 1}-1} & {\ldots} & v_{2^{L}-1} \end{array} \right] $$
(24)
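
In terms of the vector v, (24) is again a reshape; the sketch below computes all TT-ranks for an exponential grid function (cf. Example 1 below), where every rank equals 1:

```python
import numpy as np

L = 6
n = 2 ** L
v = 1.25 ** np.arange(n)                     # grid values zeta^k of an exponential

def M(v, mu):
    # matricisation (24): M^{(mu)}[r, c] = v[r + 2^mu * c]
    return v.reshape((2 ** mu, -1), order='F')

print([np.linalg.matrix_rank(M(v, mu)) for mu in range(1, L)])   # [1, 1, 1, 1, 1]
```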

Since we use the spaces \(\mathbb {R}^{2}\) at the leaves, condition (13) becomes

$$ \mathbf{U}_{\mu+ 1}\subset\mathbf{U}_{\mu}\otimes\mathbb{R}^{2}\qquad(1\leq\mu\leq L-1), $$
(25)

while (15) is

$$ \mathbf{b}_{\ell}^{(\mu+ 1)}=\sum\limits_{i = 1}^{r_{\mu}}\left[c_{i1}^{(\mu+ 1,\ell)}\mathbf{b}_{i}^{(\mu)}\otimes\binom{1}{0}+c_{i2}^{(\mu+ 1,\ell)}\mathbf{b}_{i}^{(\mu)}\otimes\binom{0}{1}\right] \quad\text{for }1\leq \ell\leq r_{\mu+ 1}. $$
(26)

Before we discuss the operations, we want to show that grid functions appearing in practice may have ranks of the order \(\mathcal {O}(L)=\mathcal {O}(\log n)\ll n\).

Remark 5

Let f be an analytic function in (0,1] with a singularity at x = 0. An efficient approximation is given by the hp finite-element approach. In a simplified version, one uses polynomials of degree g to interpolate f in \([1/2,1],[1/4,1/2],\ldots,[2^{-L},2\cdot 2^{-L}],[0,2^{-L}]\). The data size is D = (L + 1)(g + 1) since there are L + 1 intervals and the polynomials have g + 1 coefficients. For the typical asymptotically smooth functions (cf. [11, Appendix E]), one obtains an error estimate decaying exponentially in D. Let F be the piecewise interpolation polynomial and evaluate F at the equidistant grid points: \(v_{i}:=F(i\cdot 2^{-L})\) for \(0\leq i\leq n-1\). Inspection of the matrix M(μ) shows that all columns except the first one contain grid values of a polynomial of degree g. Hence this part has at most the rank g + 1. The first column can increase the rank only by one, so that rμ = rank(M(μ)) ≤ g + 2. Therefore, the TT format representing \(\mathbf{v}={\Phi}^{-1}(v)\) has the same data size as the hp approach. The optimal approximation of f by the TT format with rank(M(μ)) ≤ g + 2 yields an error which is at most as large as the hp error, i.e., it is exponentially decreasing with g. More details can be found in Grasedyck [6].

Example 1

A particular function is the exponential \(z^{x}\), where z ≠ 0 may be any complex number. The grid values \(v_{i}\) are \(\zeta^{i}\) with \(\zeta =z^{2^{-L}}\). For this vector, the columns of M(μ) in (24) are linearly dependent so that rank(M(μ)) = 1. In fact, \(\mathbf{v}={\Phi}^{-1}(v)\) is the elementary tensor \(\mathbf {v}=\bigotimes _{j = 1}^{L}\left (\begin {array}[c]{c} 1\\ \zeta ^{2^{j-1}} \end {array} \right )\). Since \(\sin (ax)=\frac {\exp (\mathrm {i}ax)-\exp (-\mathrm {i}ax)}{2\mathrm {i}}\), any trigonometric function leads to rank(M(μ)) = 2.

This example (mentioned in [15]) implies the next remark.

Remark 6

All functions with a limited number of exponential terms lead to a constant bound of rank(M(μ)) (e.g., \(f(x)={\sum }_{\nu = 1}^{r}\alpha _{\nu }\exp (-\beta _{\nu }x)\) yields rank(M(μ)) ≤ r). A similar result holds for functions involving a fixed number of trigonometric terms (band-limited functions).

An example of a band-limited function can be found in Khoromskij–Veit [16].

The next example again shows that exponential sums can approximate functions with point singularities (Remark 5 is another approach to this problem). This fact is important for applications in quantum chemistry where singularities appear at the positions of the nuclei. This is an indication that the basis vectors appearing in Uj (1 ≤ j ≤ 6) for the problem (1) allow a tensorisation with moderate ranks.

Example 2

For \(n=2^{L}\) set \(v=(f(k\cdot 2^{-L}))_{k = 0}^{n-1}\in \mathbb {R}^{n}\) for the function f(x) = 1/(1 − x) in [0,1). For any \(r\in \mathbb {N}\), there is an approximation \(v_{(r)}\in \mathbb {R}^{n}\) such that \(\mathbf{v}_{(r)}:={\Phi}^{-1}(v_{(r)})\) yields ranks rμ = rank(M(μ)) ≤ r and satisfies the componentwise error estimate

$$\left| v[k]-v_{(r)}[k]\right| \leq C_{1}n\exp(-C_{2}r)\qquad\text{ with }C_{1},C_{2}>0\text{ for all }0\leq k<n. $$

Hence, for a given error bound ε > 0, the choice \(r=\mathcal {O}(\log (n)+\log \frac {1}{\varepsilon })\) is sufficient. The storage size of the tensor v(r) is \(\mathcal {O}(\log ^{2}(n)+\log (n)\log \frac {1}{\varepsilon })\).

Proof

The function 1/t can be approximated in \([2^{-L},1]\) by an expression of the form \({\sum }_{\nu = 1}^{r}\alpha _{\nu }\exp (-\beta _{\nu }t)\). The error estimates follow from Braess–Hackbusch [1]. □

4.3 Hadamard Product in \(\mathbb {R}^{n}\)

Since it does not matter whether the componentwise multiplication is realised via \(v_{k}w_{k}\) or \(\mathbf{v}[i_{1},\ldots,i_{L}]\cdot\mathbf{w}[i_{1},\ldots,i_{L}]\), the property (10) holds also in the case of the artificial tensor product \(\otimes ^{L}\mathbb {R}^{2}\); more precisely,

$${\Phi}\left( \bigotimes_{j = 1}^{L}v^{(j)}\right) \odot {\Phi}\left( \bigotimes_{j = 1}^{L}w^{(j)}\right) = {\Phi}\left( \bigotimes_{j = 1}^{L}\left( v^{(j)}\odot w^{(j)}\right)\right) = {\Phi}(\mathbf{v}\odot\mathbf{w}). $$

Conclusion 1

Assume \(v={\Phi}(\mathbf{v})\) and \(w={\Phi}(\mathbf{w})\). Let \(\mathbf{v},\mathbf{w}\) be represented by the TT format. Then the Hadamard product \(\mathbf{v}\odot\mathbf{w}\) can be computed as explained in Section 3.4. Since \({\Phi}(\mathbf{v}\odot\mathbf{w})=v\odot w\), the result is the tensorisation of \(v\odot w\). The computational cost is discussed in Section 3.4.
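
A quick numerical check of the identity displayed above for elementary tensorised vectors (toy data, L = 3):

```python
import numpy as np

def Phi(factors):
    # Phi: assemble the vector of length 2^L from factors in R^2, k = sum_j i_j 2^{j-1}
    v = factors[0]
    for f in factors[1:]:
        v = np.multiply.outer(v, f).reshape(-1, order='F')
    return v

vf = [np.array([1., 2.]), np.array([3., 4.]), np.array([5., 6.])]
wf = [np.array([.5, 1.]), np.array([2., 0.]), np.array([1., 3.])]

assert np.allclose(Phi(vf) * Phi(wf), Phi([x * y for x, y in zip(vf, wf)]))
```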

We return to the hierarchical format for true tensors as in Figs. 2 or 3. The subspaces at the leaves are described by bases containing \(\mathbb {R}^{n}\) vectors. The application of the tensorisation to these vectors corresponds to an extended tree as sketched in Fig. 6.

Fig. 6  Extended tree

The combination of the tree in Fig. 2 with the TT tree corresponds to \(\mathbb {R}^{N}\cong \otimes ^{3}(\otimes ^{L}\mathbb {R}^{2}) \cong \otimes ^{3L}\mathbb {R}^{2}\). For tensors represented in this format, we can again apply the algorithm in Section 3.4 to compute \(\mathbf{v}\odot\mathbf{w}\) for \(\mathbf {v},\mathbf {w}\in \mathbb {R}^{N}\).

4.4 Convolution in \(\mathbb {R}^{n}\)

4.4.1 Definition of the Convolution

We take a closer look at the convolution operation. The sum in \((v\star w)_{i}={\sum }_{\ell }v_{i-\ell }w_{\ell }\) is restricted to those ℓ with \(0\leq i-\ell,\ell\leq n-1\), i.e.,

$$ (v\star w)_{i}=\sum\nolimits_{\ell=\max\{0,i + 1-n\}}^{\min\{n-1,i\}}v_{i-\ell}w_{\ell}. $$
(27)

If i varies in \([0,n-1]\cap \mathbb {Z}\), the sum can be written as \({\sum }_{\ell = 0}^{i}\). For i < 0, the empty sum yields \((v\star w)_{i}=0\), but for \(n\leq i\leq 2n-2\), the sum in (27) is not empty. This shows the following remark.

Remark 7

The convolution of two \(\mathbb {R}^{n}\) vectors yields an \(\mathbb {R}^{2n-1}\) vector.

The notation becomes simpler if we replace the vector \(v\in \mathbb {R}^{n}\) by the infinite sequence \(v=(v_{i})_{i\in \mathbb {N}_{0}}\) with \(\mathbb {N}_{0}=\mathbb {N}\cup \{0\}\) and \(v_{i}:=0\) for all \(i\geq n\). The set \(\ell _{0}=\ell _{0}(\mathbb {N}_{0})\) consists of all sequences with only finitely many nonzero components. Now, the sum becomes

$$(v\star w)_{i}=\sum\limits_{\ell= 0}^{i}v_{i-\ell}w_{\ell}\qquad\text{for all }i\in\mathbb{N}_{0}\text{ and all }v,w\in\ell_{0}. $$

Remark 8

The n-periodic convolution is \((v\star _{\text {per}}w)_{i}={\sum }_{\ell = 0}^{n-1}v_{i-\ell }w_{\ell }~(0\leq i\leq n-1)\), where all indices are understood modulo n. These values can be obtained from the non-periodic convolution by \((v\star_{\text{per}}w)_{i}=(v\star w)_{i}+(v\star w)_{n+i}\) for \(0\leq i\leq n-1\).
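
The relation of Remark 8 can be verified directly; the circular convolution computed via the DFT serves as a reference (a sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
v, w = rng.standard_normal(n), rng.standard_normal(n)

full = np.convolve(v, w)                          # non-periodic result, length 2n-1
per = full[:n] + np.append(full[n:], 0.0)         # (v ⋆_per w)_i = (v⋆w)_i + (v⋆w)_{n+i}

ref = np.real(np.fft.ifft(np.fft.fft(v) * np.fft.fft(w)))   # n-periodic convolution
assert np.allclose(per, ref)
```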

4.4.2 Principal Idea of the Algorithm

For multivariate (grid) functions, the definition of the convolution implies the property (10): the convolution of elementary tensors can be reduced to the tensor product of one-dimensional convolutions.

Since now the vector v is replaced by the tensor \(\mathbf {v}\in \otimes ^{L}\mathbb {R}^{2}\), an obvious question is whether the product of \(\mathbf {v}=\otimes _{j = 1}^{L}v^{(j)}\) and \(\mathbf {w}=\otimes _{j = 1}^{L}w^{(j)}\) can be expressed by \(\mathbf {x}:=\otimes _{j = 1}^{L}(v^{(j)}\star w^{(j)})\) corresponding to (10), i.e., whether the corresponding vectors satisfy \({\Phi}(\mathbf{v})\star{\Phi}(\mathbf{w})={\Phi}(\mathbf{x})\). In the naive sense, this cannot be true for the simple reason that \(v^{(j)}\star w^{(j)}\) is a vector with three nontrivial components (cf. Remark 7). Therefore, the result does not belong to \(\otimes ^{L}\mathbb {R}^{2}\). Furthermore, we must expect a result in \(\otimes ^{L + 1}\mathbb {R}^{2}\) since \(v\star w\) has the length 2n − 1, which is \(>2^{L}\) and \(<2^{L+1}\).

4.4.3 Extension to \(\otimes ^{L}\mathbb {\ell }_{0}\)

According to Section 4.4.1, \(\mathbb {R}^{2}\) can be considered as a subspace of \(\mathbb {\ell }_{0}\). Hence, \(\otimes ^{L}\mathbb {R}^{2}\) is contained in \(\otimes ^{L}\mathbb {\ell }_{0}\). The linear map Φ defined in (23) can be extended to \({\Phi }:\otimes ^{L}\mathbb {\ell }_{0}\rightarrow \mathbb {\ell }_{0}\) by

$$ a={\Phi}\left( \bigotimes_{j = 1}^{L}v^{(j)}\right) \in\mathbb{\ell}_{0}\qquad\text{with }a_{k}=\underset{k={\sum}_{j = 1}^{L}i_{j}2^{j-1}}{\sum\limits_{i_{1},\ldots,i_{L}\in \mathbb{N}_{0}}}{\prod}_{j = 1}^{L}v^{(j)}[i_{j}] $$
(28)

(cf. Remark 1). In the case of \(v^{(j)}\in \mathbb {R}^{2}\), the sum on the right-hand side of (28) contains only one term for \(0\leq k\leq n-1\) and the product \({\prod }_{j = 1}^{L}v^{(j)}[i_{j}]\) coincides with \(\mathbf{v}[i_{1},\ldots,i_{L}]\) for \(\mathbf {v}:=\bigotimes _{j = 1}^{L}v^{(j)}\) (cf. (22)).

For a better understanding, we look at the case of L = 2.

Remark 9

Let \(e_{i}\in \mathbb {\ell }_{0}\) be the ith unit vector, i.e., \(e_{i}[j]=\delta_{ij}\) (\(i,j\in \mathbb {N}_{0}\)). Then, \(b:={\Phi}(a\otimes e_{i})\) is the vector \(a\in \mathbb {\ell }_{0}\) shifted by 2i positions: \(b_{k}:=0\) for \(0\leq k<2i\) and \(b_{k}=a_{k-2i}\) for \(k\geq 2i\).

The shift by p positions is denoted by \(S^{p}\). Thus, we can write \(b=S^{2i}a\).

4.4.4 Polynomials

Next, we use the isomorphism between \(\mathbb {\ell }_{0}\) and the space \(\mathbb {P}\) of polynomials described by

$$\pi:\ell_{0}\rightarrow\mathbb{P}\quad\text{ with }~v\mapsto\pi[v](x):=\sum\limits_{k\in\mathbb{N}_{0}}v_{k}x^{k}. $$

The connection with the convolution is given by the property that the product of two polynomials has the coefficients of the convolution product:

$$ \pi[v] \pi[w]=\pi[v\star w]\qquad\text{for }v,w\in\ell_{0}. $$
(29)

We define an extension of \(\pi :\ell _{0}\rightarrow \mathbb {P}\) to \(\hat {\pi }:\otimes ^{L}\mathbb {\ell }_{0}\rightarrow \mathbb {P}\) by

$$ \hat{\pi}:\otimes^{L}\mathbb{\ell}_{0}\rightarrow\mathbb{P}\qquad\text{with }~\hat{\pi}\left[\bigotimes_{j = 1}^{L}v^{(j)}\right](x):={\prod}_{j = 1}^{L}\pi[v^{(j)}](x^{2^{j-1}}). $$
(30)

A shift of v by i positions corresponds to the product \(\pi[S^{i}v](x)=\pi[v](x)\cdot x^{i}\). This result together with Remark 9 shows that

$$ \hat{\pi}\left[\bigotimes_{j = 1}^{L}v^{(j)}\right] =\pi\left[{\Phi}\left( \bigotimes_{j = 1}^{L}v^{(j)}\right) \right]. $$
(31)

The extended map \({\Phi }:\otimes ^{L}\mathbb {\ell }_{0}\rightarrow \mathbb {\ell }_{0}\) is not injective. Two tensors \(\mathbf {v}^{\prime },\mathbf {v}^{\prime \prime }\in \otimes ^{L}\mathbb {\ell }_{0}\) are called equivalent (denoted by \(\mathbf{v}^{\prime}\sim\mathbf{v}^{\prime\prime}\)) if they represent the same vector: \({\Phi}(\mathbf{v}^{\prime})={\Phi}(\mathbf{v}^{\prime\prime})\). From (31), we learn that the equivalence of \(\mathbf{v}^{\prime},\mathbf{v}^{\prime\prime}\) can also be expressed by \(\hat {\pi }[\mathbf {v}^{\prime }]=\hat {\pi }[\mathbf {v}^{\prime \prime }]\).
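
Both (29) and (31) are easily verified numerically; in the sketch below (toy data, L = 2), π is evaluated with np.polyval, and Φ of an elementary tensor with factors in \(\mathbb{R}^{2}\) is assembled as a Kronecker product:

```python
import numpy as np

def pi(u, x):
    # pi[u](x) = sum_k u_k x^k  (np.polyval expects the highest coefficient first)
    return np.polyval(u[::-1], x)

x = 0.7
v, w = np.array([1., 2., 0., 3.]), np.array([4., 0., 5.])
# (29): the product of the polynomials carries the convolution coefficients
assert np.isclose(pi(v, x) * pi(w, x), pi(np.convolve(v, w), x))

# (30)/(31) for L = 2: pi_hat[v1 ⊗ v2](x) = pi[v1](x) * pi[v2](x^2) = pi[Phi(v1 ⊗ v2)](x)
v1, v2 = np.array([1., 2.]), np.array([3., 4.])
phi = np.kron(v2, v1)                        # Phi(v1 ⊗ v2): entry k = i1 + 2*i2
assert np.isclose(pi(v1, x) * pi(v2, x ** 2), pi(phi, x))
```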

By comparing the values under the map \(\hat {\pi }\), we obtain the following result.

Lemma 1

\({\Phi }\left (\bigotimes _{j = 1}^{L}S^{m_{j}}v^{(j)}\right ) = S^{m}{\Phi }\left (\bigotimes _{j = 1}^{L}v^{(j)}\right )\) holds for \(m={\sum }_{j = 1}^{L}m_{j}2^{j-1}\) .

According to (10), we define the convolution of two (elementary) tensors in \(\otimes ^{L}\mathbb {\ell }_{0}\) by

$$ \left( \bigotimes_{j = 1}^{L}v^{(j)}\right) \star\left( \bigotimes_{j = 1}^{L}w^{(j)}\right) := \bigotimes_{j = 1}^{L}\left( v^{(j)}\star w^{(j)}\right). $$
(32)

Now, the product \(v^{(j)}\star w^{(j)}\) makes sense since it belongs to \(\mathbb {\ell }_{0}\). Next, we have to prove that the convolution introduced in (32) is consistent with the usual convolution of vectors.

Lemma 2

Let \(v={\Phi }\left (\bigotimes _{j = 1}^{L}v^{(j)}\right )\) and \(w={\Phi }\left (\bigotimes _{j = 1}^{L}w^{(j)}\right )\) be vectors in \(\mathbb {\ell }_{0}\). Then, (32) implies

$${\Phi}\left( \bigotimes_{j = 1}^{L}\left( v^{(j)}\star w^{(j)}\right)\right)=v\star w. $$

Proof

Since \(\pi :\ell _{0}\rightarrow \mathbb {P}\) is an isomorphism, the statement is equivalent to \(\pi [{\Phi }(\bigotimes _{j = 1}^{L}(v^{(j)}\star w^{(j)}))]=\pi [v\star w]\). The left-hand side of this equation is

$$\begin{array}{@{}rcl@{}} \pi\left[{\Phi}\left( \bigotimes_{j = 1}^{L}\left( v^{(j)}\star w^{(j)}\right)\right)\right](x) & \underset{(31)}{=}&\hat{\pi}\left[\bigotimes_{j = 1}^{L}\left( v^{(j)}\star w^{(j)}\right) \right](x)\\ &\underset{(30)}{=}& {\prod}_{j = 1}^{L}\pi[v^{(j)}\star w^{(j)}](x^{2^{j-1}})\\ &\underset{(29)}{=}&{\prod}_{j = 1}^{L}\pi[v^{(j)}](x^{2^{j-1}})\cdot\pi[w^{(j)}](x^{2^{j-1}})\\ &=&\left( {\prod}_{j = 1}^{L}\pi[v^{(j)}](x^{2^{j-1}})\right) \cdot \left( {\prod}_{j = 1}^{L}\pi[w^{(j)}](x^{2^{j-1}})\right)\\ &\underset{(30)}{=}&\hat{\pi}\left[\bigotimes_{j = 1}^{L}v^{(j)}\right] (x)\cdot\hat{\pi}\left[\bigotimes_{j = 1}^{L}w^{(j)}\right](x)\\ &\underset{(31)}{=}&\pi\lbrack v](x)\cdot\pi[w](x)\underset{(29)}{=}\pi[v\star w](x). \end{array} $$

4.5 Carry-over Procedure

The result \(\bigotimes _{j = 1}^{L}(v^{(j)}\star w^{(j)})\) is still unsatisfactory because \(v^{(j)},w^{(j)}\in \mathbb {R}^{2}\) produce \(v^{(j)}\star w^{(j)}\in \mathbb {R}^{3}\). A solution can be as follows. Let L = 2 as in Remark 9. Consider \(a\otimes b\) with \(a,b\in\ell_{0}\). We want to find an equivalent tensor with factors in \(\mathbb {R}^{2}\). Assume that \(a_{K}\neq 0\), but \(a_{i}=0\) for i > K, which implies \(a\in \mathbb {R}^{K + 1}\). If K = 1, a belongs to \(\mathbb {R}^{2}\) and nothing has to be done. If K > 1, set \(a^{\prime }\in \mathbb {R}^{2}\) with \(a_{i}^{\prime }=a_{i}\) for i = 0,1 and \(a^{\prime\prime}\in\ell_{0}\) with \(a_{i}^{\prime \prime }=a_{i + 2}\) for \(i\in \mathbb {N}_{0}\). Using Remark 9, one checks that \(a\otimes b\) represents the same vector as \(a^{\prime}\otimes b+a^{\prime\prime}\otimes Sb\), where Sb is the shifted version of b:

$${\Phi}(a\otimes b)={\Phi}(a^{\prime}\otimes b+a^{\prime\prime}\otimes Sb). $$

\(a^{\prime }\in \mathbb {R}^{2}\) is already of the desired form. \(a^{\prime\prime}\) belongs to \(\mathbb {R}^{K-1}\). This procedure can again be applied to \(a^{\prime\prime}\otimes Sb\) until all first factors belong to \(\mathbb {R}^{2}\).

In the case of a general tensor \(\bigotimes _{j = 1}^{L}v^{(j)}\), this procedure is applied to the first factor \(v^{(1)}\) and yields sums of elementary tensors of the form \(w^{(1)}\otimes \bigotimes _{j = 2}^{L}w^{(j)}\) with \(w^{(1)}\in \mathbb {R}^{2}\). Then, the procedure is repeated with the second factor resulting in sums of the terms \(x^{(1)}\otimes x^{(2)}\otimes \bigotimes _{j = 3}^{L}x^{(j)}\) with \(x^{(1)},x^{(2)} \in \mathbb {R}^{2}\), etc. In the case of the last factor, we may have to add an (L + 1)-th factor. Since we know that \(v\star w\) belongs to \(\mathbb {R}^{2n-1}\), the (L + 1)-th factor must belong to \(\mathbb {R}^{2}\).
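
A sketch of one carry-over step for L = 2 (here a has K = 3, so a single splitting suffices); Φ is assembled by hand for the check and S denotes the shift by one position:

```python
import numpy as np

def shift(b, p=1):
    # S^p b: prepend p zeros (b is a finite section of a sequence in l_0)
    return np.concatenate([np.zeros(p), b])

def phi2(a, b):
    # Phi(a ⊗ b) for L = 2: position k = i + 2*j receives a[i] * b[j]
    out = np.zeros(len(a) + 2 * (len(b) - 1))
    for j, bj in enumerate(b):
        out[2 * j: 2 * j + len(a)] += bj * a
    return out

def padded_sum(x, y):
    m = max(len(x), len(y))
    return np.pad(x, (0, m - len(x))) + np.pad(y, (0, m - len(y)))

a = np.array([1., 2., 3., 4.])          # a_K != 0 with K = 3 > 1
b = np.array([5., 6., 7.])
a1, a2 = a[:2], a[2:]                   # a' in R^2 and the carry part a''
assert np.allclose(phi2(a, b), padded_sum(phi2(a1, b), phi2(a2, shift(b))))
```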

4.6 Convolution Algorithm

We recall Remark 7: if \(\mathbf {v},\mathbf {w}\in \bigotimes _{j = 1}^{L}\mathbb {R}^{2}\), the result is a tensor \(\mathbf{u}:=\mathbf{v}\star\mathbf{w}\) in \(\bigotimes _{j = 1}^{L + 1}\mathbb {R}^{2}\). Lemma 3 describes the start at δ = 1, while Lemma 4 characterises the recursion. In the following, the vector notation \(v=\genfrac {[}{]}{0pt}{1}{\alpha }{\beta }\) means \(v_{0}=\alpha\), \(v_{1}=\beta\), i.e., the components must be read from top to bottom. By \(\mathbf{v}\sim\mathbf{w}\), we denote the equivalence \({\Phi}(\mathbf{v})={\Phi}(\mathbf{w})\).

Lemma 3

The convolution of \(v=\genfrac {[}{]}{0pt}{1}{\alpha }{\beta }\) and \(w=\genfrac {[}{]}{0pt}{1}{\gamma }{\delta } \in \mathbb {R}^{2}=\bigotimes _{j = 1}^{1}\mathbb {R}^{2}\) yields

$$ \genfrac{[}{]}{0pt}{1}{\alpha}{\beta} \star \genfrac{[}{]}{0pt}{1}{\gamma}{\delta} =\left[ \begin{array}[c]{c} \alpha\gamma\\ \alpha\delta+\beta\gamma\\ \beta\delta\\ 0\end{array} \right] \sim \genfrac{[}{]}{0pt}{1}{\alpha\gamma}{\alpha\delta+\beta\gamma} \otimes \genfrac{[}{]}{0pt}{1}{1}{0} + \genfrac{[}{]}{0pt}{1}{\beta\delta}{0} \otimes \genfrac{[}{]}{0pt}{1}{0}{1} \in\bigotimes_{j = 1}^{2}\mathbb{R}^{2}. $$
(33a)

Furthermore, the shifted vector has the tensor representation

$$ S\left[ \begin{array}[c]{c} \alpha\gamma\\ \alpha\delta+\beta\gamma\\ \beta\delta\\ 0\end{array} \right] =\left[ \begin{array}[c]{c} 0\\ \alpha\gamma\\ \alpha\delta+\beta\gamma\\ \beta\delta \end{array} \right] \sim \genfrac{[}{]}{0pt}{1}{0}{\alpha\gamma} \otimes \genfrac{[}{]}{0pt}{1}{1}{0} + \genfrac{[}{]}{0pt}{1}{\alpha\delta+\beta\gamma}{\beta\delta} \otimes \genfrac{[}{]}{0pt}{1}{0}{1} \in\bigotimes_{j = 1}^{2}\mathbb{R}^{2}. $$
(33b)

The basic identity is given in the next lemma.

Lemma 4

For given \(\mathbf {v},\mathbf {w}\in \bigotimes \nolimits _{j = 1}^{\delta -1}\mathbb {R}^{2}\) let the convolution result be

$$ \mathbf{v\star w}\sim\mathbf{a}=\mathbf{a}^{\prime}\otimes \genfrac{[}{]}{0pt}{1}{1}{0} +\mathbf{a}^{\prime\prime}\otimes \genfrac{[}{]}{0pt}{1}{0}{1} \in\bigotimes_{j = 1}^{\delta}\mathbb{R}^{2}. $$
(34a)

Then, the convolution of the tensors \(\mathbf{v}\otimes x\) and \(\mathbf{w}\otimes y\) with \(x=\genfrac {[}{]}{0pt}{1}{\alpha }{\beta }\), \(y= \genfrac {[}{]}{0pt}{1}{\gamma }{\delta } \in \mathbb {R}^{2}\) yields

$$\begin{array}{@{}rcl@{}} (\mathbf{v}\otimes x)\mathbf{\star}(\mathbf{w}\otimes y)\sim\mathbf{u}&=&\mathbf{u}^{\prime}\otimes \genfrac{[}{]}{0pt}{1}{1}{0} +\mathbf{u}^{\prime\prime}\otimes \genfrac{[}{]}{0pt}{1}{0}{1} \in\bigotimes_{j = 1}^{\delta+ 1}\mathbb{R}^{2}\\ \text{with }\quad\mathbf{u}^{\prime} & =& \mathbf{a}^{\prime}\otimes \genfrac{[}{]}{0pt}{1}{\alpha\gamma}{\alpha\delta+\beta\gamma} +\mathbf{a}^{\prime\prime}\otimes \genfrac{[}{]}{0pt}{1}{0}{\alpha\gamma} \in\bigotimes_{j = 1}^{\delta}\mathbb{R}^{2}\\ \text{and }\quad\mathbf{u}^{\prime\prime} & =& \mathbf{a}^{\prime}\otimes \genfrac{[}{]}{0pt}{1}{\beta\delta}{0} +\mathbf{a}^{\prime\prime}\otimes \genfrac{[}{]}{0pt}{1}{\alpha\delta+\beta\gamma}{\beta\delta} \in\bigotimes_{j = 1}^{\delta}\mathbb{R}^{2}. \end{array} $$
(34b)

Proof

Lemma 2 implies that

$$(\mathbf{v}\otimes x)\mathbf{\star}(\mathbf{w}\otimes y) \sim (\mathbf{v\star w}) \otimes z\qquad\text{ with }~z:=x \mathbf{\star}y\in\mathbb{R}^{3}\subset\ell_{0}. $$

Assumption (34a) yields

$$(\mathbf{v\star w}) \otimes z\sim\left( \mathbf{a}^{\prime}+S^{2^{\delta-1}}\mathbf{a}^{\prime\prime}\right)\otimes z. $$

Lemma 1 shows that

$$S^{2^{\delta-1}}\mathbf{a}^{\prime\prime}\otimes z=S^{2^{\delta-1}}(\mathbf{a}^{\prime\prime}\otimes z)\sim\mathbf{a}^{\prime\prime}\otimes(Sz). $$

Using (33a) and (33b), we obtain

$$\begin{array}{@{}rcl@{}} \mathbf{a}^{\prime}\otimes z & \sim& \mathbf{a}^{\prime}\otimes \genfrac{[}{]}{0pt}{1}{\alpha\gamma}{\alpha\delta+\beta\gamma} \otimes \genfrac{[}{]}{0pt}{1}{1}{0} +\mathbf{a}^{\prime}\otimes \genfrac{[}{]}{0pt}{1}{\beta\delta}{0} \otimes \genfrac{[}{]}{0pt}{1}{0}{1},\\ (S^{2^{\delta-1}}\mathbf{a}^{\prime\prime})\otimes z & \sim& \mathbf{a}^{\prime\prime}\otimes(Sz)\sim\mathbf{a}^{\prime\prime}\otimes \genfrac{[}{]}{0pt}{1}{0}{\alpha\gamma} \otimes \genfrac{[}{]}{0pt}{1}{1}{0} +\mathbf{a}^{\prime\prime}\otimes \genfrac{[}{]}{0pt}{1}{\alpha\delta+\beta\gamma}{\beta\delta} \otimes \genfrac{[}{]}{0pt}{1}{0}{1}. \end{array} $$

Summation of both identities yields the assertion of the lemma. □

If the vectors x,y in Lemma 4 belong to \(\{\genfrac {[}{]}{0pt}{1}{1}{0},\genfrac {[}{]}{0pt}{1}{0}{1}\}\), the vectors \(\genfrac {[}{]}{0pt}{1}{\alpha \gamma }{\alpha \delta +\beta \gamma }\), \(\genfrac {[}{]}{0pt}{1}{0}{\alpha \gamma }\), \(\genfrac {[}{]}{0pt}{1}{\beta \delta }{0}\), \(\genfrac {[}{]}{0pt}{1}{\alpha \delta +\beta \gamma }{\beta \delta }\) from (34b) belong to \(\{\genfrac {[}{]}{0pt}{1}{0}{0},\genfrac {[}{]}{0pt}{1}{1}{0},\genfrac {[}{]}{0pt}{1}{0}{1}\}\).

Lemma 3 proves assumption (34a) for δ = 2, while Lemma 4 shows that v ⊗ x and w ⊗ y satisfy the requirement (34a) (for δ + 1 instead of δ).
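To make the recursion of Lemmas 3 and 4 concrete, the following sketch applies (33a)–(34b) to elementary tensors \(\mathbf {v}=\bigotimes _{j}v^{(j)}\), \(\mathbf {w}=\bigotimes _{j}w^{(j)}\). It is an illustration only, under the following assumptions: the parts a′, a″ are stored as full vectors (so the cost is not logarithmic as in the actual algorithm), the j-th factor corresponds to the j-th binary digit as in (23), and the final comparison with numpy.convolve is merely a sanity check. All helper names are ours, not from the paper.

```python
import numpy as np

def conv_elementary(v_factors, w_factors):
    """Convolution of two elementary tensors with factors in R^2 (Lemmas 3 and 4).
    Returns the result as a full vector of length 2**(L+1); for illustration the
    parts a', a'' are kept as full vectors instead of tensors."""
    (al, be), (ga, de) = v_factors[0], w_factors[0]
    a1 = np.array([al * ga, al * de + be * ga])   # a'  from (33a)
    a2 = np.array([be * de, 0.0])                 # a'' from (33a)
    for (al, be), (ga, de) in zip(v_factors[1:], w_factors[1:]):
        # (34b); appending a new last factor = np.kron(new_factor, old_part)
        u1 = np.kron([al * ga, al * de + be * ga], a1) + np.kron([0.0, al * ga], a2)
        u2 = np.kron([be * de, 0.0], a1) + np.kron([al * de + be * ga, be * de], a2)
        a1, a2 = u1, u2
    return np.concatenate([a1, a2])               # u = u' x [1,0] + u'' x [0,1]

def full_vector(factors):
    """Expand an elementary tensor into its vector of length 2**L
    (factor 1 = least significant binary digit)."""
    out = np.ones(1)
    for f in factors:
        out = np.kron(f, out)
    return out

rng = np.random.default_rng(0)
L = 4
v_factors = [rng.standard_normal(2) for _ in range(L)]
w_factors = [rng.standard_normal(2) for _ in range(L)]

u = conv_elementary(v_factors, w_factors)
exact = np.convolve(full_vector(v_factors), full_vector(w_factors))  # length 2**(L+1) - 1
assert np.allclose(u, np.pad(exact, (0, 1)))                         # zero-padded to 2**(L+1)
```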

4.7 Convolution of Tensors in Hierarchical Format

We recall that the subspaces \(\mathbf {U}_{\delta }\subset \otimes ^{\delta }\mathbb {R}^{2}\) satisfy (25): \(\mathbf {U}_{\delta + 1}\subset \mathbf {U}_{\delta }\otimes \mathbb {R}^{2}\). The essential observation is that the results of the convolution again yield subspaces with this property.

Note that three different tensors v, w, u := v ⋆ w are involved, represented by three different subspace families \(\mathbf {U}_{\delta }^{\prime }\), \(\mathbf {U}_{\delta }^{\prime \prime }\), Uδ (1 ≤ δ ≤ L). The bases spanning these subspaces consist of the vectors \(\mathbf {b}_{i}^{\prime (\delta )}\), \(\mathbf {b}_{i}^{\prime \prime (\delta )}\), \(\mathbf {b}_{i}^{(\delta )}\); the dimensions of the subspaces are \(r_{\delta }^{\prime }\), \(r_{\delta }^{\prime \prime }\), rδ.

Any tensor \(\mathbf {a}\in \otimes ^{\delta }\mathbb {R}^{2}\) (δ ≥ 1) can be written as \(\mathbf {a}=\mathbf {a}^{\prime }\otimes \genfrac {[}{]}{0pt}{1}{1}{0}+\mathbf {a}^{\prime \prime }\otimes \genfrac {[}{]}{0pt}{1}{0}{1}\). Define the linear maps \(\phi _{\delta }^{\prime }\), \(\phi _{\delta }^{\prime \prime }:\otimes ^{\delta }\mathbb {R}^{2}\rightarrow \otimes ^{\delta -1}\mathbb {R}^{2}\) by \(\phi _{\delta }^{\prime }(\mathbf {a})=\mathbf {a}^{\prime }\), \(\phi _{\delta }^{\prime \prime }(\mathbf {a})=\mathbf {a}^{\prime \prime }\).
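For reference, in the full-vector picture (using the index convention from (23), where the last tensor factor carries the most significant binary digit), the maps φ′ and φ″ simply return the two halves of the vector. A minimal sketch of this splitting, with our own helper name:

```python
import numpy as np

def phi_split(a_full):
    """a = a' x [1,0] + a'' x [0,1]: since the last factor is the most significant
    bit, phi'(a) and phi''(a) are the first and second half of the full vector."""
    m = len(a_full) // 2
    return a_full[:m], a_full[m:]

a_prime, a_dprime = phi_split(np.arange(8.0))   # a_prime = [0,1,2,3], a_dprime = [4,5,6,7]
```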

Theorem 2

Let the tensors \(\mathbf {v},\mathbf {w}\in \bigotimes _{j = 1}^{L}\mathbb {R}^{2}\) be represented by (possibly different) hierarchical formats using the respective subspaces \(\mathbf {U}_{\delta }^{\prime }\) and \(\mathbf {U}_{\delta }^{\prime \prime }\), 1 ≤ δ ≤ L, satisfying

$$ \begin{array}[c]{lll} \mathbf{U}_{1}^{\prime}=\mathbb{R}^{2},\qquad & \mathbf{U}_{\delta}^{\prime}\subset\mathbf{U}_{\delta-1}^{\prime}\otimes\mathbb{R} ^{2},\qquad & \mathbf{v}\in\mathbf{U}_{L}^{\prime},\\ \mathbf{U}_{1}^{\prime\prime}=\mathbb{R}^{2}, & \mathbf{U}_{\delta} ^{\prime\prime}\subset\mathbf{U}_{\delta-1}^{\prime\prime}\otimes \mathbb{R}^{2}, & \mathbf{w}\in\mathbf{U}_{L}^{\prime\prime}. \end{array} $$
(35a)

The subspaces

$$ \mathbf{U}_{\delta}:=\operatorname*{span}\{\phi_{\delta+ 1}^{\prime} (\mathbf{x}\star\mathbf{y}),\phi_{\delta+ 1}^{\prime\prime}(\mathbf{x}\star\mathbf{y}):\mathbf{x}\in\mathbf{U}_{\delta}^{\prime},~\mathbf{y}\in\mathbf{U}_{\delta}^{\prime\prime}\}\qquad(1\leq\delta\leq L) $$
(35b)

satisfy

$$ \mathbf{U}_{1}=\mathbb{R}^{2},\quad\mathbf{U}_{\delta}\subset \mathbf{U}_{\delta-1}\otimes\mathbb{R}^{2},\quad\mathbf{v} \star \mathbf{w}\in\mathbf{U}_{L + 1}. $$
(35c)

The dimension of Uδ can be bounded by

$$ \dim(\mathbf{U}_{\delta})\leq\min\left\{2\dim(\mathbf{U}_{\delta}^{\prime})\dim(\mathbf{U}_{\delta}^{\prime\prime}),2^{\delta},2^{L + 1-\delta}\right\}. $$
(35d)

Proof

(i) \(\mathbf {U}_{1}=\mathbb {R}^{2}\) can be concluded from Lemma 3.

(ii) Write \(\mathbf {x}\in \mathbf {U}_{\delta }^{\prime }\subset \mathbf {U}_{\delta -1}^{\prime }\otimes \mathbb {R}^{2}\) and \(\mathbf {y}\in \mathbf {U}_{\delta }^{\prime \prime }\subset \mathbf {U}_{\delta -1}^{\prime \prime }\otimes \mathbb {R}^{2}\) as \(\mathbf {x}=\mathbf {x}^{\prime }\otimes \genfrac {[}{]}{0pt}{1}{1}{0}+\mathbf {x}^{\prime \prime }\otimes \genfrac {[}{]}{0pt}{1}{0}{1}\) and \(\mathbf {y}=\mathbf {y}^{\prime }\otimes \genfrac {[}{]}{0pt}{1}{1}{0}+\mathbf {y}^{\prime \prime }\otimes \genfrac {[}{]}{0pt}{1}{0}{1}\) with \(\mathbf {x}^{\prime },\mathbf {x}^{\prime \prime }\in \mathbf {U}_{\delta -1}^{\prime }\) and \(\mathbf {y}^{\prime },\mathbf {y}^{\prime \prime }\in \mathbf {U}_{\delta -1}^{\prime \prime }\). Expansion of the sums yields \(\mathbf {x}\star \mathbf {y}=(\mathbf {x}^{\prime }\otimes \genfrac {[}{]}{0pt}{1}{1}{0})\star (\mathbf {y}^{\prime }\otimes \genfrac {[}{]}{0pt}{1}{1}{0})+\cdots \) For each term z of this expansion, Lemma 4 (with v, w renamed x′, y′, etc.) states that \(\phi _{\delta + 1}^{\prime }(\mathbf {z}) =\mathbf {u}^{\prime }\) and \(\phi _{\delta + 1}^{\prime \prime }(\mathbf {z})=\mathbf {u}^{\prime \prime }\) belong to \(\mathbf {U}_{\delta -1}\otimes \mathbb {R}^{2}\) (cf. (34b)), since the tensors a′, a″ in (34a) belong to \(\mathbf {U}_{\delta -1}\) by the definition (35b). Hence, \(\phi _{\delta + 1}^{\prime }(\mathbf {x}\star \mathbf {y}),\phi _{\delta + 1}^{\prime \prime }(\mathbf {x}\star \mathbf {y})\in \mathbf {U}_{\delta -1}\otimes \mathbb {R}^{2}\) holds, and the definition of Uδ implies the inclusion \(\mathbf {U}_{\delta }\subset \mathbf {U}_{\delta -1}\otimes \mathbb {R}^{2}\).

(iii) \(\mathbf {v}\in \mathbf {U}_{L}^{\prime }\) and \(\mathbf {w}\in \mathbf {U}_{L}^{\prime \prime }\) together with the definition of UL lead to \(\mathbf {v}\star \mathbf {w}\in \mathbf {U}_{L}\otimes \mathbb {R}^{2}=:\mathbf {U}_{L + 1}\).

(iv) The first bound of dim(Uδ) follows directly from (35b). The bound \(\min \{2^{\delta },2^{L + 1-\delta }\}\) holds for the rank of any matricisation \(M^{(1,\ldots ,\delta )}(\mathbf {v})\) of \(\mathbf {v}\in \otimes ^{L + 1}\mathbb {R}^{2}\). □

The bound \(2\dim (\mathbf {U}_{\delta }^{\prime })\dim (\mathbf {U}_{\delta }^{\prime \prime })\) corresponds to the product mentioned in Remark 3.

For δ = 1,…,L, the numerical scheme has

  1. to introduce an orthonormal basis \(\{\mathbf {b}_{1}^{(\delta )},\ldots ,\mathbf {b}_{r_{\delta }}^{(\delta )}\}\) of Uδ, where rδ := dim(Uδ) (cf. Section 3.5),

  2. to represent the convolution \(\mathbf {b}_{i}^{\prime (\delta )}\star \mathbf {b}_{j}^{\prime \prime (\delta )}\) by

    $$ \mathbf{b}_{i}^{\prime(\delta)}\star\mathbf{b}_{j}^{\prime\prime(\delta)}= \sum\limits_{k = 1}^{r_{\delta}}\sum\limits_{m = 1}^{2}\beta_{ij,km}^{(\delta)}\mathbf{b}_{k}^{(\delta)}\otimes b_{m}. $$
    (36)

As soon as the β-coefficients from (36) are known, general products x ⋆ y of \(\mathbf {x}\in \mathbf {U}_{\delta }^{\prime }\) and \(\mathbf {y}\in \mathbf {U}_{\delta }^{\prime \prime }\) can be evaluated easily as shown in the next remark.

Remark 10

Let \(\mathbf {x}={\sum }_{i = 1}^{r_{\delta }^{\prime }}\xi _{i}\mathbf {b}_{i}^{\prime (\delta )}\in \mathbf {U}_{\delta }^{\prime }\) and \(\mathbf {y}={\sum }_{j = 1}^{r_{\delta }^{\prime \prime }}\eta _{j}\mathbf {b}_{j}^{\prime \prime (\delta )}\in \mathbf {U}_{\delta }^{\prime \prime }\). Then, convolution yields

$$\begin{array}{@{}rcl@{}} &&\mathbf{x}\star\mathbf{y}=\mathbf{z}=\mathbf{z}^{\prime}\otimes \genfrac{[}{]}{0pt}{1}{1}{0} +\mathbf{z}^{\prime\prime}\otimes \genfrac{[}{]}{0pt}{1}{0}{1} \quad\text{ with}\quad\mathbf{z}^{\prime}=\sum\limits_{k = 1}^{r_{\delta}}\zeta_{k}^{\prime}\mathbf{b}_{k}^{(\delta)},~~\mathbf{z}^{\prime\prime}=\sum\limits_{k = 1}^{r_{\delta}}\zeta_{k}^{\prime\prime}\mathbf{b}_{k}^{(\delta)},\\ \text{where}\quad&&\zeta_{k}^{\prime}=\sum\limits_{i = 1}^{r_{\delta}^{\prime}}\sum\limits_{j = 1}^{r_{\delta}^{\prime\prime}}\xi_{i}\eta_{j}\beta_{ij,k1}^{(\delta)}\quad\text{ and }\quad\zeta_{k}^{\prime\prime}=\sum\limits_{i = 1}^{r_{\delta}^{\prime}}\sum\limits_{j = 1}^{r_{\delta}^{\prime\prime}}\xi_{i}\eta_{j}\beta_{ij,k2}^{(\delta)} \end{array} $$

with \(\beta _{ij,km}^{(\delta )}\) from (36). The computation of \(\zeta _{k}^{\prime }\), \(\zeta _{k}^{\prime \prime }~~(1\leq k\leq r_{\delta })\) requires \(4r_{\delta }r_{\delta }^{\prime }(r_{\delta }^{\prime \prime }+ 1)\) operations.
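The following sketch illustrates, for a single level δ, how an orthonormal basis of Uδ can be obtained from the spanning set (35b), how the β-coefficients of (36) can be computed by orthogonal projection, and how x ⋆ y is then evaluated via the formulas of Remark 10. It is a simplification for illustration only: it works with full vectors of length 2^δ instead of the recursive hierarchical data, and the bases of \(\mathbf {U}_{\delta }^{\prime }\), \(\mathbf {U}_{\delta }^{\prime \prime }\) are chosen randomly.

```python
import numpy as np

d = 3                                            # level delta; vectors have length 2**d
rng = np.random.default_rng(1)

def conv_pad(x, y):
    """Full convolution, zero-padded to length 2 * len(x)."""
    c = np.convolve(x, y)
    return np.pad(c, (0, 2 * len(x) - len(c)))

# Orthonormal bases b'_i of U'_d and b''_j of U''_d (columns), chosen randomly here.
Bp  = np.linalg.qr(rng.standard_normal((2**d, 2)))[0]
Bpp = np.linalg.qr(rng.standard_normal((2**d, 2)))[0]

# (35b): U_d is spanned by the phi'- and phi''-parts (the two halves) of all b'_i * b''_j.
spanning = []
for i in range(Bp.shape[1]):
    for j in range(Bpp.shape[1]):
        c = conv_pad(Bp[:, i], Bpp[:, j])
        spanning += [c[:2**d], c[2**d:]]
U_svd, s, _ = np.linalg.svd(np.array(spanning).T, full_matrices=False)
r = int(np.sum(s > 1e-12 * s[0]))
B = U_svd[:, :r]                                 # orthonormal basis b_k of U_d

# (36): beta[i, j, k, m], where m = 0 is the [1,0]-part and m = 1 the [0,1]-part.
beta = np.zeros((Bp.shape[1], Bpp.shape[1], r, 2))
for i in range(Bp.shape[1]):
    for j in range(Bpp.shape[1]):
        c = conv_pad(Bp[:, i], Bpp[:, j])
        beta[i, j, :, 0] = B.T @ c[:2**d]
        beta[i, j, :, 1] = B.T @ c[2**d:]

# Remark 10: convolve x = sum_i xi_i b'_i and y = sum_j eta_j b''_j via the beta's.
xi, eta = rng.standard_normal(Bp.shape[1]), rng.standard_normal(Bpp.shape[1])
zeta1 = np.einsum('i,j,ijk->k', xi, eta, beta[:, :, :, 0])
zeta2 = np.einsum('i,j,ijk->k', xi, eta, beta[:, :, :, 1])
z = np.concatenate([B @ zeta1, B @ zeta2])       # z' part followed by z'' part

assert np.allclose(z, conv_pad(Bp @ xi, Bpp @ eta))
```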

The total cost is described in [9, p. 482]. It is the sum

$$ 8r_{\delta}^{\prime\prime}r_{\delta-1}^{\prime}r_{\delta-1}\left( r_{\delta-1}^{\prime\prime}+r_{\delta}^{\prime}\right) + 8\left( r_{\delta}^{\prime}r_{\delta}^{\prime\prime}\right)^{2}r_{\delta-1}+\frac{4}{3}\left( r_{\delta}^{\prime}r_{\delta}^{\prime\prime}\right)^{3} + 2r_{\delta-1}r_{\delta}^{2}\qquad\text{ for }2\leq\delta\leq L. $$
(37)

A rough estimate using \(r_{\delta }^{\prime },r_{\delta }^{\prime \prime }\leq r\) and \(r_{\delta }\leq 2r^{2}\) yields the asymptotic bound \(\frac {100}{3}(L-1)r^{6}\). The highest-order terms are caused by the orthonormalisation.

5 Toeplitz Matrices

5.1 Notation

A matrix (aij) is called a Toeplitz matrix if aij only depends on ij. A multiplication by a Toeplitz matrix and a convolution are almost equivalent (cf. Kazeev et al. [14]).

If we fix the vector x in x ⋆ y, this expression defines a linear map y ↦ x ⋆ y which may be expressed by a matrix T = Tx, i.e., Ty := x ⋆ y. In the case of \(x,y\in \mathbb {R}^{n}\) and \(x\star y\in \mathbb {R}^{2n-1}\), T is the (rectangular) Toeplitz matrix of size (2n − 1) × n with Ti0 = xi (0 ≤ i ≤ n − 1), Tn− 1 + i,0 = T0i = 0 (1 ≤ i ≤ n − 1).

A general n × n Toeplitz matrix is uniquely determined by the coefficient vector a = [a0,…,a2n− 2]:

$$ T(a):=\left[ \begin{array}{llll} a_{n-1} & a_{n-2} & {\cdots} & a_{0}\\ a_{n} & {\ddots} & {\ddots} & \vdots\\ {\vdots} & {\ddots} & {\ddots} & a_{n-2}\\ a_{2n-2} & {\cdots} & a_{n} & a_{n-1} \end{array} \right],\qquad \begin{array}{ll} \text{i.e., } T(a)_{i,j}=a_{n-1+i-j}\\ \text{for } 0\leq i,j\leq n-1. \end{array} $$
(38)

The product z := a ⋆ y belongs to \(\mathbb {R}^{3n-2}\). The part \(\hat {z}\) with \(\hat {z}_{i}:=z_{n-1+i}\) (0 ≤ i ≤ n − 1) coincides with \(T(a)y\in \mathbb {R}^{n}\).
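As a quick numerical check of this relation (a sketch with our own helper names; the construction of T(a) follows (38)):

```python
import numpy as np

def T_of_a(a, n):
    """Toeplitz matrix (38): T(a)[i, j] = a[n-1+i-j] for 0 <= i, j <= n-1."""
    return np.array([[a[n - 1 + i - j] for j in range(n)] for i in range(n)])

rng = np.random.default_rng(2)
n = 8
a = rng.standard_normal(2 * n - 1)               # coefficient vector a_0, ..., a_{2n-2}
y = rng.standard_normal(n)

z = np.convolve(a, y)                            # a * y, length 3n - 2
assert np.allclose(z[n - 1 : 2 * n - 1], T_of_a(a, n) @ y)   # middle part equals T(a) y
```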

5.2 Tensorisation for Matrices

The matrix space \(\mathbb {R}^{n\times n}\) for n = 2L is isomorphic to \(\bigotimes _{j = 1}^{L}\mathbb {R}^{2\times 2}\). As in (23), the isomorphism \(\mathbf {M}\in \bigotimes _{j = 1}^{L}\mathbb {R}^{2\times 2}\mapsto M\in \mathbb {R}^{n\times n}\) is defined by M[i,j] = M[(i1,j1),…,(iL,jL)] where \(i={\sum }_{\ell = 1}^{L}i_{\ell }2^{\ell -1},~j={\sum }_{\ell = 1}^{L}j_{\ell }2^{\ell -1},~i_{\ell },j_{\ell }\in \{0,1\}\) (cf. [17]). In particular, a block matrix \(\left [ \begin {array}[c]{cc} M_{11} & M_{12}\\ M_{21} & M_{22} \end {array} \right ]\) corresponds to the tensor \(M_{11}\otimes \left [ \begin {array}[c]{cc} 1 & 0\\ 0 & 0 \end {array} \right ] +M_{12}\otimes \left [ \begin {array}[c]{cc} 0 & 1\\ 0 & 0 \end {array} \right ] +M_{21}\otimes \left [ \begin {array}[c]{cc} 0 & 0\\ 1 & 0 \end {array} \right ] +M_{22}\otimes \left [ \begin {array}[c]{cc} 0 & 0\\ 0 & 1 \end {array} \right ]\).
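A small sketch verifying the block-matrix correspondence in the Kronecker picture: since the last tensor factor carries the most significant binary digits of i and j, it appears as the leftmost Kronecker factor. The helper names and the 4×4 block size are our own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
M11, M12, M21, M22 = (rng.standard_normal((4, 4)) for _ in range(4))
M = np.block([[M11, M12], [M21, M22]])

def E(p, q):
    """2x2 matrix with a single 1 at position (p, q)."""
    out = np.zeros((2, 2))
    out[p, q] = 1.0
    return out

# Block matrix <-> M11 x E(0,0) + M12 x E(0,1) + M21 x E(1,0) + M22 x E(1,1),
# with the 2x2 factor (the last one) as the leftmost Kronecker factor.
M_tensor = (np.kron(E(0, 0), M11) + np.kron(E(0, 1), M12)
            + np.kron(E(1, 0), M21) + np.kron(E(1, 1), M22))
assert np.allclose(M, M_tensor)
```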

In the case of a Toeplitz matrix, all submatrices are again Toeplitz. In the block splitting above, M11 = M22 follows. Therefore, a suitable subspace U of \(\mathbb {R}^{2\times 2}\) is spanned by \(b_{1}:=\left [ \begin {array}[c]{cc} 0 & 1\\ 0 & 0 \end {array} \right ]\), \(b_{2}:=\left [ \begin {array}[c]{cc} 1 & 0\\ 0 & 1 \end {array} \right ]\), \(b_{3}:=\left [ \begin {array}[c]{cc} 0 & 0\\ 1 & 0 \end {array} \right ]\). For the hierarchical representation, we use the linear tree of Fig. 5 with \(\mathbb {R}^{2}\) replaced by U.

The TT-rank rμ = dim(Uμ) is described next. Let \(T=T(a)\in \mathbb {R}^{n\times n}\) be a Toeplitz matrix defined by the coefficient vector \(a\in \mathbb {R}^{2n-1}\) (cf. (38)). Consider a regular block structure of T with blocks of size 2μ × 2μ. Denote these blocks by \(T^{\alpha \beta }=(T_{ij})_{\alpha 2^{\mu }\leq i\leq (\alpha + 1) 2^{\mu }-1,~ \beta 2^{\mu }\leq j\leq (\beta + 1) 2^{\mu }-1}\) for 0 ≤ α,β ≤ 2Lμ − 1. Then, the matricisation yields Uμ = span{Tαβ : 0 ≤ α,β ≤ 2Lμ − 1} and rμ = dim(Uμ).

A simpler description follows from the fact that

$$T^{\alpha\beta}=T\left( \left[a_{n+(\alpha-\beta-1)2^{\mu}},\dots,a_{n-2+(\alpha-\beta+ 1) 2^{\mu}}\right]\right)=T(a^{(\alpha-\beta)}), $$

where \(a^{(\gamma )}=[a_{n+(\gamma -1)2^{\mu }},\dots ,a_{n-2+(\gamma + 1) 2^{\mu }}]\in \mathbb {R}^{2^{\mu + 1}-1}\) is a part of the vector a defining T = T(a). Since the linear map aT(a) is an isomorphism, we obtain the TT-ranks

$$\begin{array}{@{}rcl@{}} r_{\mu} &=& \dim(\mathbf{U}_{\mu})=\dim\operatorname{span}\{a^{(\gamma)}:1-2^{L-\mu}\leq\gamma\leq2^{L-\mu}-1\}\\ & =&\operatorname{rank}\left[ \begin{array}[c]{cccc} a_{0} & a_{2^{\mu}} & {\ldots} & a_{2n-2\cdot2^{\mu}}\\ a_{1} & a_{2^{\mu}+ 1} & {\ldots} & a_{2n-2\cdot2^{\mu}+ 1}\\ {\vdots} & {\vdots} & {\ddots} & \vdots\\ a_{2\cdot2^{\mu}-2} & a_{3\cdot2^{\mu}-2} & {\ldots} & a_{2n-2} \end{array} \right]. \end{array} $$
(39)

The latter matrix looks similar to the matricisation \(M^{(\mu )}\) in (24). It can be used for the following bound (cf. [14]).

Lemma 5

The TT-rank rμ of T = T(a) is bounded by 2rμ(a), where rμ(a) is the TT-rank of the tensorisation of the vector \(a\in \mathbb {R}^{2n}\) (here a2n− 1 can be defined arbitrarily).

Proof

Split the matrix in (39) into the upper part \(\left [ \begin {array}[c]{ccc} a_{0} & {\ldots } & a_{2n-2\cdot 2^{\mu }}\\ {\vdots } & {\ddots } & \vdots \\ a_{2^{\mu }-1} & {\ldots } & a_{2n-2^{\mu }-1} \end {array} \right ]\) and the lower part \(\left [ \begin {array}[c]{ccc} a_{2^{\mu }} & {\ldots } & a_{2n-2^{\mu }}\\ {\vdots } & {\ddots } & \vdots \\ a_{2\cdot 2^{\mu }-1} & {\ldots } & a_{2n-1} \end {array}\right ]\), where the last row (containing the additional entry a2n− 1) is added. The rank in (39) is bounded by the sum of the ranks of the latter two matrices. These, however, are submatrices of the matricisation \(M^{(\mu )}\) belonging to the vector a. This proves the assertion. □
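The rank formula (39) and the bound of Lemma 5 can be checked numerically. The sketch below forms the matrix in (39) explicitly and compares its rank with twice the rank of the matricisation of the zero-padded vector a; the banded example vector is our own choice for illustration.

```python
import numpy as np

def tt_rank_toeplitz(a, L, mu):
    """Rank r_mu of T(a) via (39); a has length 2n-1 with n = 2**L, 1 <= mu <= L-1."""
    n = 2 ** L
    cols = [a[n + (g - 1) * 2**mu : n + (g - 1) * 2**mu + 2**(mu + 1) - 1]
            for g in range(1 - 2**(L - mu), 2**(L - mu))]
    return np.linalg.matrix_rank(np.array(cols).T)

def tt_rank_vector(a_padded, L, mu):
    """Rank r_mu(a) of the tensorised vector a in R^{2n}: rank of the matricisation
    separating the mu finest binary digits from the remaining ones."""
    return np.linalg.matrix_rank(a_padded.reshape(2 ** (L + 1 - mu), 2 ** mu))

L = 4
n = 2 ** L
a = np.zeros(2 * n - 1)
a[n - 2 : n + 2] = [1.0, -2.0, 1.0, 0.5]          # a banded Toeplitz matrix T(a)
a_padded = np.append(a, 0.0)                      # a_{2n-1} chosen arbitrarily (here 0)

for mu in range(1, L):
    assert tt_rank_toeplitz(a, L, mu) <= 2 * tt_rank_vector(a_padded, L, mu)   # Lemma 5
```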

5.3 Matrix-Vector Multiplication

For the evaluation of the product Ty, we assume that the Toeplitz matrix T is expressed by the tensorised analogue \(\mathbf {T}\in \bigotimes _{j = 1}^{L}\mathbb {R}^{2\times 2}\). Here, it is important that for the tensorised quantities \(\mathbf {T}=\bigotimes _{j = 1}^{L}T^{(j)}\) and \(\mathbf {y}=\bigotimes _{j = 1}^{L}y^{(j)}\) the directionwise product \(\mathbf {z}:=\bigotimes _{j = 1}^{L}(T^{(j)}y^{(j)})\) is the tensorisation of z = Ty.
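This directionwise rule is the standard Kronecker-product identity (A ⊗ B)(x ⊗ y) = (Ax) ⊗ (By). A minimal check for elementary tensors, expanded to full matrices and vectors only for the test (names are ours; under the index convention (23) the last factor is the leftmost Kronecker factor):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(4)
L = 3
Ts = [rng.standard_normal((2, 2)) for _ in range(L)]     # factors T^(j)
ys = [rng.standard_normal(2) for _ in range(L)]          # factors y^(j)

T_full = reduce(lambda acc, A: np.kron(A, acc), Ts, np.eye(1))
y_full = reduce(lambda acc, v: np.kron(v, acc), ys, np.ones(1))
z_full = reduce(lambda acc, Tv: np.kron(Tv[0] @ Tv[1], acc), zip(Ts, ys), np.ones(1))

# (x_j T^(j)) (x_j y^(j)) = x_j (T^(j) y^(j)): directionwise products tensorise T y
assert np.allclose(T_full @ y_full, z_full)
```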

The hierarchical representation of T uses the bases \({~}_{T}\mathbf {b}_{\ell }^{(\mu )}\) (1 ≤ ℓ ≤ rμ) of Uμ, while the leaves j are associated with the subspaces Uj = U spanned by the fixed basis \({b_{1}^{U}}:=\left [ \begin {array}[c]{cc} 0 & 1\\ 0 & 0 \end {array} \right ]\), \({b_{2}^{U}}:=\left [ \begin {array}[c]{cc} 1 & 0\\ 0 & 1 \end {array} \right ]\), \({b_{3}^{U}}:=\left [ \begin {array}[c]{cc} 0 & 0\\ 1 & 0 \end {array} \right ]\). The coefficient matrices are \({~}_{T}C^{(\mu ,\ell )}=\left ({~}_{T}c_{ij}^{(\mu ,\ell )}\right )\), i.e., \(_{T}\mathbf {b}_{\ell }^{(\mu )}={\sum }_{i = 1}^{r_{\mu -1}}{\sum }_{j = 1}^{3} {{~}_{T}c_{ij}^{(\mu ,\ell )}}{{~}_{T}\mathbf {b}_{i}^{(\mu -1)}} \otimes {b_{j}^{U}}\).

Let \(y\in \mathbb {R}^{n}\) have the tensorised analogue \(\mathbf {y}\in \bigotimes _{j = 1}^{L}\mathbb {R}^{2}\) represented via (26) with data \(_{y}c_{ij}^{(\mu + 1,\ell )}\) and \(_{y}\mathbf {b}_{i}^{(\mu )}\). At the leaves, the basis vectors \(b_{1}:=\genfrac {[}{]}{0pt}{1}{1}{0}\), \(b_{2}:=\genfrac {[}{]}{0pt}{1}{0}{1}\) are fixed.

Then, the product \(z:=Ty\in \mathbb {R}^{n}\) has the tensorised analogue \(\mathbf {z}\in \bigotimes _{j = 1}^{L}\mathbb {R}^{2}\) with data \(_{z}c_{(\ell ,m),j}^{(\mu + 1,\ell )}\) and \(_{z}\mathbf {b}_{(\ell ,m)}^{(\mu )}\) which are obtained as follows. The recursion

$$\begin{array}{@{}rcl@{}} {~}_{z}\mathbf{b}_{(\ell,m)}^{(\mu)} & :=&{{~}_{T}\mathbf{b}_{\ell}^{(\mu)}}~{{~}_{y}\mathbf{b}_{m}^{(\mu)}} =\left( \sum\limits_{i,j}~ {{~}_{T}c_{ij}^{(\mu,\ell)}} ~{{~}_{T}\mathbf{b}_{i}^{(\mu-1)}} \otimes {b_{j}^{U}}\right) \left( \sum\limits_{i^{\prime},j^{\prime}}~ {{~}_{y}c_{i^{\prime}j^{\prime}}^{(\mu,m)}}~ {{~}_{y}\mathbf{b}_{i^{\prime}}^{(\mu-1)}} \otimes b_{j^{\prime}}\right) \\ & =&\sum\limits_{i,j,i^{\prime},j^{\prime}} ~{{~}_{T}c_{ij}^{(\mu,\ell)}} {_{y}c_{i^{\prime}j^{\prime}}^{(\mu,m)}} \left( {{~}_{T}\mathbf{b}_{i}^{(\mu-1)}}~ {{~}_{y}\mathbf{b}_{i^{\prime}}^{(\mu-1)}}\right) \otimes\left( {b_{j}^{U}}b_{j^{\prime}}\right) \\ & =&\sum\limits_{i,i^{\prime}}\sum\limits_{(j,j^{\prime})\in\{(1,2),(2,1)\}} ~{{~}_{T}c_{ij}^{(\mu,\ell)}} ~{{~}_{y}c_{i^{\prime}j^{\prime}}^{(\mu,m)}} \left( {{~}_{T}\mathbf{b}_{i}^{(\mu-1)}}~{{~}_{y}\mathbf{b}_{i^{\prime}}^{(\mu-1)}}\right) \otimes b_{1}\\ &&+\sum\limits_{i,i^{\prime}}\sum\limits_{(j,j^{\prime})\in\{(2,2),(3,1)\}}~ {{~}_{T}c_{ij}^{(\mu,\ell)}} ~{{~}_{y}c_{i^{\prime}j^{\prime}}^{(\mu,m)}} \left( {{~}_{T}\mathbf{b}_{i}^{(\mu-1)}}~{{~}_{y}\mathbf{b}_{i^{\prime}}^{(\mu-1)}}\right) \otimes b_{2} \end{array} $$

corresponds to (18). Here, we use that at the leaves the products \({b_{i}^{U}}b_{j}\) (i = 1,2,3; j = 1,2) are either b1 or b2 or zero. At the root, we obtain the result \(\mathbf {z}=\mathbf {Ty}= {{~}_{T}c_{1}^{(L)}} ~{{~}_{y}c_{1}^{(L)}}~ {{~}_{z}\mathbf {b}_{(1,1)}^{(L)}}\).

The required number of operations is \(8{\sum }_{\mu = 1}^{L}r_{\mu }(T)r_{\mu }(y)r_{\mu -1}(T)r_{\mu -1}(y)\). Using Lemma 5 for T = T(a) and the bound r := maxμ{rμ(y),rμ(a)}, we obtain the work bound \(8{\sum }_{\mu = 1}^{L}r_{\mu }(T)r_{\mu }(y)r_{\mu -1}(T)r_{\mu -1}(y)\leq 32Lr^{4}\). As in (37), the main cost is caused by the orthonormalisation.

6 Additional Remarks

As mentioned above, the convolution can be computed via Fourier forward and backward transforms. As explained in [10, Section 14.4], the Fourier transform \(v\mapsto \hat {v}\) can be realised by using the TT format of the tensorisation of v. The algorithm in Section 4.4 yields the exact convolution. The exact Fourier transform of the tensorised v may produce intermediate results with increasing rank. Therefore, a statement as in (35d) cannot be obtained. Nevertheless, practical examples with intermediate truncation seem to give satisfactory results (cf. Dolgov et al. [3]).