Numerical Tensor Techniques for Multidimensional Convolution Products

In order to treat high-dimensional problems, one has to find data-sparse representations. Starting with a six-dimensional problem, we first introduce the low-rank approximation of matrices. One purpose is the reduction of memory requirements; another advantage is that vector operations instead of matrix operations can be applied. In the considered problem, the vectors correspond to grid functions defined on a three-dimensional grid. This leads to the next separation: these grid functions are tensors in R^n ⊗ R^n ⊗ R^n and can be represented by the hierarchical tensor format. Typical operations such as the Hadamard product and the convolution are now reduced to operations between vectors in R^n. Standard algorithms for operations with vectors from R^n are of order O(n) or larger. The tensorisation method is a representation method introducing additional data-sparsity. In many cases, the data size can be reduced from O(n) to O(log n). Even more importantly, operations such as the convolution can be performed with a cost corresponding to these data sizes.


Introduction
In this paper, we recapitulate the numerical techniques which are needed to handle high-dimensional problems. As a discussion starter, we use an example from quantum chemistry. The following function h is to be determined:

h(x, y) = ∫_{R³} f(x, x − z) g(z, y) dz for x, y ∈ R³.  (1)
In Section 2, we shall consider the matrices in (3) as tensors of the space R^N ⊗ R^N. Then, problem (3) reduces to operations of vectors in R^N. In a second step (Section 3), R^N is regarded as the tensor space R^n ⊗ R^n ⊗ R^n. For such tensors, we describe an efficient representation and show how operations are performed. In our example, we need two operations in R^n: the Hadamard product v ⊙ w, defined by the componentwise product (v ⊙ w)_i = v_i w_i, and the convolution v ⋆ w, defined by (v ⋆ w)_i = Σ_ℓ v_{i−ℓ} w_ℓ. The convolution v ⋆ w is a discretisation of the convolution of functions, ∫_R v(x − y) w(y) dy, provided that v_i (w_i) are the nodal values of v (w) in an equidistant grid. For instance, the convolution in R^n can be performed by the fast Fourier transform (FFT) requiring O(n log n) operations. However, as explained in Section 4, we can perform the convolution (as well as the Hadamard product) much faster using the tensorisation technique. Here, R^n for n = 2^L is replaced by the isomorphic tensor space ⊗^L R^2. In many cases, grid functions in R^n (in particular those from quantum chemistry) can be approximated by a tensor representation using only O(log* n) data. Then, the exact convolution v ⋆ w requires not more than O(log* n) operations.
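To make the two operations concrete, here is a minimal numpy sketch (the sizes, variable names, and the use of numpy are illustrative choices, not taken from the paper):

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
v = rng.standard_normal(n)
w = rng.standard_normal(n)

# Hadamard (componentwise) product: (v ⊙ w)_i = v_i * w_i
hadamard = v * w

# discrete convolution: (v ⋆ w)_i = sum_l v_{i-l} w_l, result has length 2n-1
conv = np.array([sum(v[i - l] * w[l]
                     for l in range(max(0, i - n + 1), min(i, n - 1) + 1))
                 for i in range(2 * n - 1)])
assert np.allclose(conv, np.convolve(v, w))
```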
The convolution algorithm mentioned above is also interesting outside of quantum chemistry applications. Often, the functions v and w in ∫_R v(x − y) w(y) dy are represented by finite elements using locally refined grids or even hp techniques to reduce the number of degrees of freedom. If FFT is used for the convolution, one must transfer the finite-element functions to a uniform grid corresponding to the minimal grid size, and thus one destroys the advantages of the nonuniform finite-element approach. The tensorisation technique is able to represent the data at least as efficiently as in the finite-element case. Then, the operation cost is determined by the data sizes of the representations. Moreover, it yields the optimal representation of the result v ⋆ w.

Low-Rank Representation
In quantum chemistry, it is more usual to write the integral (1) as

h(x, y) = ∫_{R³} f̃(x, z) g(z, y) dz  (4)

by introducing f̃(x, y) := f(x, x − y) (cf. [5, (1.4)]). Then, the discrete analogue is the standard matrix product F̃G instead of (3). However, this notation is less appropriate since the properties of the function f and of the matrix F are swept under the carpet.
The function f has a (representation) rank r if f(x, y) = Σ_{ν=1}^r a_ν(x) b_ν(y), where {a_ν} and {b_ν} are linearly independent univariate functions. The latter identity is also written in tensor form as f = Σ_{ν=1}^r a_ν ⊗ b_ν. For instance, the function f(x, y) = ϕ(x)/‖y − y_0‖ (y_0 the position of a nucleus) has rank r = 1. However, the function f̃(x, y) := f(x, x − y) = ϕ(x)/‖x − y − y_0‖ involved in (4) has infinite rank.
If the matrix F ∈ R^{N×N} has the rank r, it allows a representation

F = Σ_{ν=1}^r a_ν ⊗ b_ν with a_ν, b_ν ∈ R^N.  (5)

Here and in the sequel, we use boldface letters for tensors and tensor spaces, while vectors, matrices, and vector spaces are denoted by standard letters. If r is much smaller than N, (5) describes the low-rank representation of F. Note that the right-hand side of (5) requires only 2rN ≪ N² data. The splitting of the tensor space R^N ⊗ R^N ≅ R^{N×N} (≅ denotes isomorphy) into the two factors R^N is depicted in Fig. 1. In general, the tensor product v = v^(1) ⊗ v^(2) ⊗ ··· ⊗ v^(d) has the entries

v[i_1, ..., i_d] = v^(1)[i_1] v^(2)[i_2] ··· v^(d)[i_d];  (6)

such a tensor is called an elementary tensor. The v^(j) may be elements of arbitrary vector spaces V_j. The (algebraic) tensor space V_1 ⊗ V_2 ⊗ ··· ⊗ V_d is the span of all elementary tensors.

SVD Truncation
Even if F has maximal rank N, it might be well approximated by a low-rank matrix F_ε with rank r_ε. For the precise analysis, we need the singular-value decomposition (SVD) of F, which is

F = Σ_{ν=1}^r σ_ν a_ν ⊗ b_ν with orthonormal vectors a_ν, b_ν and the singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_r > 0.

The traditional formulation is F = U Σ V^T, where the columns of U and V are defined by the a_ν and b_ν, respectively, and Σ is the diagonal matrix containing the singular values. If σ_{r_ε+1} ≤ ε for some r_ε < r, the truncated matrix F_ε := Σ_{ν=1}^{r_ε} σ_ν a_ν ⊗ b_ν has rank r_ε and satisfies the spectral norm estimate ‖F − F_ε‖_2 ≤ ε.

Now, we assume representations of the form (5) for the matrices in (3):

F = Σ_{ν=1}^r a_ν ⊗ b_ν, G = Σ_{μ=1}^s c_μ ⊗ d_μ.

Denoting the entries of the vectors a_ν, b_ν, ... by a_ν[i] etc., the operation described in (2) becomes

H[i, j] = h³ Σ_{ν=1}^r Σ_{μ=1}^s a_ν[i] (b_ν ⋆ c_μ)[i] d_μ[j],  (7)

where a_ν[i] (b_ν ⋆ c_μ)[i] is the i-th component of the Hadamard product a_ν ⊙ (b_ν ⋆ c_μ). Hence, the following has to be calculated: (a) determine the vectors q_νμ := b_ν ⋆ c_μ ∈ R^N, (b) calculate the Hadamard products a_ν ⊙ q_νμ ∈ R^N, (c) determine the sums e_μ := h³ Σ_{ν=1}^r a_ν ⊙ q_νμ. Then, H = Σ_{μ=1}^s e_μ ⊗ d_μ is the representation of the resulting matrix. This shows that H is again a low-rank matrix if G is so. Nevertheless, one may apply a singular-value decomposition and truncate H to a lower rank.
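The following small numpy sketch illustrates the SVD truncation step for a matrix with decaying singular values; the test matrix and the tolerance are invented for illustration, and np.linalg.svd stands in for whatever SVD routine one would actually use:

```python
import numpy as np

# a test matrix with rapidly decaying singular values (illustrative only)
N = 200
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((N, N)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
sigma = 2.0 ** -np.arange(N)                 # sigma_1 >= sigma_2 >= ... > 0
F = (U * sigma) @ V.T

eps = 1e-6
Uf, s, Vt = np.linalg.svd(F)
r_eps = int(np.searchsorted(-s, -eps))       # first index with s[r_eps] <= eps
F_eps = (Uf[:, :r_eps] * s[:r_eps]) @ Vt[:r_eps, :]

# the spectral-norm error equals the largest omitted singular value, hence <= eps
print(r_eps, np.linalg.norm(F - F_eps, 2))
```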
Since N = n³ holds with a large value of n, even the simple Hadamard product in Step (b) is too costly when using the standard vector format. Instead, we shall exploit the tensor structure of R^N.
For later use, we return to the representation (5). Let U_1 := span{a_ν : 1 ≤ ν ≤ r} and U_2 := span{b_ν : 1 ≤ ν ≤ r}. Then, the tensor (matrix) F satisfies

F ∈ U_1 ⊗ U_2 with subspaces U_1, U_2 ⊂ R^N of dimension ≤ r.  (8)

Comparing (8) with F ∈ R^N ⊗ R^N, we see that the full space R^N of dimension N is replaced by subspaces of dimension r ≪ N.

Separation and Bilinear Operations
Here, we make use of the Cartesian product structure of the grid: the underlying three-dimensional grid is the threefold Cartesian product of a one-dimensional grid with n points, so that an index i of R^N corresponds to a triple (i_1, i_2, i_3). The tensor product of three vectors a, b, c ∈ R^n is defined in (6): (a ⊗ b ⊗ c)[i_1, i_2, i_3] = a[i_1] b[i_2] c[i_3]. These tensors span the tensor space V := R^n ⊗ R^n ⊗ R^n, which is isomorphic to R^N (both spaces have dimension N = n³).
The analogue of the decomposition (5) would be the representation of v ∈ V:

v = Σ_{ν=1}^r a_ν ⊗ b_ν ⊗ c_ν with a_ν, b_ν, c_ν ∈ R^n.  (9)

The smallest possible value of r is called the rank of the tensor v. The fact that in general the determination of this rank is NP-hard (cf. Håstad [12]) already shows that the case of tensors of order ≥ 3 is much more involved. In particular, there is no direct analogue of the singular-value decomposition. This leads to difficulties when one wants to truncate a tensor to lower rank (cf. Espig-Hackbusch [4]).
The Hadamard product (componentwise product) is a bilinear operation V × V → V. Another bilinear map is the matrix-vector multiplication. For a unified approach, let ⊛ be the symbol of a general bilinear operation between two tensor spaces. An efficient computation of such a tensor operation ⊛ : X × Y → Z (with X = ⊗_{j=1}^d X_j, etc.) can be based on the following property, provided it holds: for elementary tensors x = ⊗_{j=1}^d x^(j) and y = ⊗_{j=1}^d y^(j),

x ⊛ y = ⊗_{j=1}^d (x^(j) ⊛_j y^(j)).  (10)

Property (10) reduces the operation to simpler bilinear operations ⊛_j : X_j × Y_j → Z_j on the individual vector spaces.
In the case of the Hadamard product, ⊛ = ⊙ is the componentwise product of tensors, while ⊛_j = ⊙ is the componentwise product of vectors. In fact, property (10) holds:

(⊗_{j=1}^d v^(j)) ⊙ (⊗_{j=1}^d w^(j)) = ⊗_{j=1}^d (v^(j) ⊙ w^(j)).  (11)

Note that on the left-hand side of (11), ⊙ acts on V × V, whereas on the right-hand side ⊙ acts on R^n × R^n. Another example is the canonical scalar product of a (pre-)Hilbert tensor space X satisfying ⟨⊗_{j=1}^d x^(j), ⊗_{j=1}^d y^(j)⟩ = ∏_{j=1}^d ⟨x^(j), y^(j)⟩. This corresponds to (10) with Y = X and Z = R (the field R is considered as a tensor space of order d = 0).
The notation (x ⋆ y)[i] = Σ_j x[i − j] y[j] of the multivariate convolution involving multi-indices i, j ∈ N_0^d shows that also ⊛ = ⋆ satisfies (10). For d = 3, we have

(u^(1) ⊗ u^(2) ⊗ u^(3)) ⋆ (w^(1) ⊗ w^(2) ⊗ w^(3)) = (u^(1) ⋆ w^(1)) ⊗ (u^(2) ⋆ w^(2)) ⊗ (u^(3) ⋆ w^(3)).  (12)

Hence, the Hadamard and convolution operations can be reduced to operations acting on vectors in R^n. If v and w are given in the form (9), all pairs of elementary terms can be treated by (11) or (12), respectively.
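A brute-force check of (11) and (12) for d = 3 (purely illustrative; the tensors are stored as full numpy arrays, which is exactly what the later sections avoid):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
a1, a2, a3 = rng.standard_normal((3, n))
b1, b2, b3 = rng.standard_normal((3, n))

def elementary(x, y, z):
    # (x ⊗ y ⊗ z)[i, j, k] = x[i] y[j] z[k]
    return np.einsum('i,j,k->ijk', x, y, z)

v = elementary(a1, a2, a3)
w = elementary(b1, b2, b3)

# property (11): the Hadamard product acts factor-wise on elementary tensors
assert np.allclose(v * w, elementary(a1 * b1, a2 * b2, a3 * b3))

# property (12): the 3D convolution of elementary tensors is the
# elementary tensor of the 1D convolutions of the factors
conv3d = np.zeros((2 * n - 1,) * 3)
for i in range(n):
    for j in range(n):
        for k in range(n):
            conv3d[i:i + n, j:j + n, k:k + n] += v[i, j, k] * w
assert np.allclose(conv3d,
                   elementary(np.convolve(a1, b1),
                              np.convolve(a2, b2),
                              np.convolve(a3, b3)))
```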

Introduction of the Hierarchical Format
In the following, we use the hierarchical format, which has the additional advantage that an SVD truncation can be performed (cf. [10, Section 11]). For that purpose, we need tensors of order 2 (matrix case) and rewrite R^n ⊗ R^n ⊗ R^n as (R^n ⊗ R^n) ⊗ R^n ≅ R^{n²} ⊗ R^n. In a second step, we split R^{n²} into R^n ⊗ R^n. This leads to the binary tree shown in Fig. 2.
In the first step, we regard the components v[i_1, i_2, i_3] as the entries of a tensor V of the space R^{n²} ⊗ R^n (row index (i_1, i_2), column index i_3). As in Section 2, we may write V as Σ_{ν=1}^s v_ν^(12) ⊗ v_ν^(3) with v_ν^(12) ∈ R^{n²} and v_ν^(3) ∈ R^n. In the second step, we regard the v_ν^(12) as n × n matrices or, equivalently, as tensors of R^n ⊗ R^n of the form Σ_{μ=1}^r v_μ^(1) ⊗ v_μ^(2). Combining the structures of Figs. 1 and 2 yields the splitting depicted in Fig. 3. At the top of the tree, we see the matrix space R^{N×N} ≅ R^N ⊗ R^N with the sons R^N on both sides. R^N ≅ R^{n²} ⊗ R^n is split into R^{n²} and R^n. Finally, R^{n²} ≅ R^n ⊗ R^n is split into two factors R^n.
Following the construction (8), we associate each vertex of the tree with a subspace. The leaves of the tree correspond to R^n; therefore, there are six subspaces U_1, ..., U_6 ⊂ R^n. U_12 and U_45 are subspaces of R^n ⊗ R^n ≅ R^{n²}, while U_123 and U_456 are subspaces of R^n ⊗ R^n ⊗ R^n ≅ R^N. Also, the root R^{N×N} carries a subspace U_{1-6}. The hierarchical structure is given by

U_α ⊂ U_{α_1} ⊗ U_{α_2},  (13)

where α belongs to the index set {12, 123, 45, 456, 1-6} and α_1, α_2 denote the two sons of α (cf. Fig. 4). The condition (8) becomes

F ∈ U_{1-6}.  (14)

The subspaces are (in principle) described by a basis (or at least a generating system). The bases of U_1, ..., U_6 corresponding to the leaves must be given explicitly. For the other indices, we avoid an explicit description since the basis vectors of R^{n²}, R^N = R^{n³}, etc. are too large. Instead, we make use of (13). Let α be an index of an inner vertex of the tree (not a leaf) and α_1, α_2 its sons. Let {b_i^(α_1)} and {b_j^(α_2)} be the bases of U_{α_1} and U_{α_2}. Then each basis vector b_ℓ^(α) of U_α can be written as

b_ℓ^(α) = Σ_{i=1}^{r_{α_1}} Σ_{j=1}^{r_{α_2}} c_{ij}^(α,ℓ) b_i^(α_1) ⊗ b_j^(α_2)  (15)

with coefficients c_{ij}^(α,ℓ) forming an r_{α_1} × r_{α_2} matrix C^(α,ℓ). It is sufficient to store C^(α,ℓ) instead of b_ℓ^(α). Note that the necessary memory is independent of the vector size n.
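A minimal sketch of this data structure for the tree of Fig. 2 (three leaves, one inner vertex 12, root 123). All array names, sizes, and the choice of random data are assumptions made for illustration; only the leaf bases and the small coefficient arrays are stored:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r1, r2, r3, r12 = 6, 2, 3, 2, 3

B1 = rng.standard_normal((n, r1))          # columns: basis of U_1 ⊂ R^n
B2 = rng.standard_normal((n, r2))          # basis of U_2
B3 = rng.standard_normal((n, r3))          # basis of U_3
C12 = rng.standard_normal((r12, r1, r2))   # C^(12,l): r1 x r2 matrices, l = 1..r12
C123 = rng.standard_normal((1, r12, r3))   # root coefficients (root rank 1)

# (15): basis of U_12 expressed in the bases of the sons
B12 = np.einsum('lij,ai,bj->lab', C12, B1, B2)    # shape (r12, n, n)
# the represented tensor v in R^n ⊗ R^n ⊗ R^n
v = np.einsum('lmk,mab,ck->abc', C123, B12, B3)   # shape (n, n, n)

# only the leaf bases and the small coefficient arrays need to be kept
storage = B1.size + B2.size + B3.size + C12.size + C123.size
print(storage, "numbers instead of", n ** 3)
```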

Fig. 4 Corresponding subspaces
If (14) holds, the subspace U_{1-6} can be reduced to the one-dimensional space U_root = span{F}. Let b_1^(root) be the only basis vector. Then, only one additional factor c_1^(root) is needed to characterise F = c_1^(root) b_1^(root).

Remark 2 (a) In the given example, we have to store the bases of U_1, ..., U_6 with the memory size Σ_{j=1}^6 n_j r_j. The matrices C^(α,ℓ) require the memory size r_12 r_1 r_2 + r_45 r_4 r_5 + r_123 r_12 r_3 + r_456 r_45 r_6 + 1 · r_123 r_456; c_1^(root) is only one real number. If n_j ≤ n and r_j ≤ r, the required memory size is bounded by 6nr + 4r³ + r² + 1. (b) In the general case of tensors of order d (instead of 6 as above), the bound is dnr + (d − 2)r³ + r² + 1.

Below, we shall demonstrate that we can perform the required operations although we only have an indirect access to the bases.

Matricisation
The above construction gives rise to two questions: do subspaces with the properties (13), (14) exist, and what are their dimensions in the best case? The answer is given by the matricisation, which maps a tensor isomorphically onto a matrix. We explain this isomorphism for the example α = 45. The matrix M^(45) is of the size n² × n⁴ and has the entries

M^(45)[(i_4, i_5), (i_1, i_2, i_3, i_6)] = v[i_1, ..., i_6].

The subspace U_45^min spanned by the columns of M^(45) is the smallest subspace satisfying (13) and (14). For a more general description of the minimal subspaces see [10, Section 6].
In general, let α ⊂ {1, . . . , d} be a subset with the complement α^c := {1, . . . , d} \ α; the rows of M^(α) are indexed by the indices belonging to α and the columns by those belonging to α^c. Note that the index sets need not be ordered, since we only use properties of M^(α) which do not depend on the ordering. The (matrix) rank of M^(α) is called the α-rank of v (cf. Hitchcock [13]): rank_α(v) := rank(M^(α)).
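A possible numpy realisation of the matricisation M^(α) and of the α-rank (the function names are ours; for a large tensor one would of course never form the full array):

```python
import numpy as np

def matricisation(v, alpha):
    """M^(alpha): rows indexed by the modes in alpha, columns by the rest."""
    d = v.ndim
    comp = [j for j in range(d) if j not in alpha]
    M = np.transpose(v, list(alpha) + comp)
    rows = int(np.prod([v.shape[j] for j in alpha]))
    return M.reshape(rows, -1)

def alpha_rank(v, alpha):
    return np.linalg.matrix_rank(matricisation(v, alpha))

# example: an elementary tensor has all alpha-ranks equal to 1
rng = np.random.default_rng(5)
a, b, c = rng.standard_normal((3, 5))
v = np.einsum('i,j,k->ijk', a, b, c)
print(alpha_rank(v, (0,)), alpha_rank(v, (1, 2)))   # both 1
```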

Hadamard Product and General Bilinear Operations
In the following, the Hadamard product can be replaced by a general bilinear operation (cf. (10)). In (7), we need the Hadamard product v ⊙ w of two tensors in ⊗_{j=1}^3 R^n. We assume that both v and w are represented in the hierarchical format corresponding to the tree depicted in Fig. 2: v uses the bases {b_i^(j)} at the leaves and the coefficient matrices C^(α,ℓ), while w uses bases {b'_i^(j)} and coefficient matrices C'^(α,ℓ). Also, the ranks r_α and r'_α may be different.
We start at the leaves and determine the Hadamard products of the basis vectors explicitly:

b^(j)_{(i,i')} := b_i^(j) ⊙ b'_{i'}^(j) for the leaves j = 1, 2, 3.

By induction over the tree, we assume that the products b^(α_1)_{(i,i')} and b^(α_2)_{(j,j')} at the sons α_1, α_2 of α are (directly or indirectly) determined. Then, (15) and (11) yield

b^(α)_{(ℓ,ℓ')} := b_ℓ^(α) ⊙ b'_{ℓ'}^(α) = Σ_{i,j} Σ_{i',j'} c_{ij}^(α,ℓ) c'_{i'j'}^(α,ℓ') b^(α_1)_{(i,i')} ⊗ b^(α_2)_{(j,j')}.  (18)

The result x := v ⊙ w is represented by the generating systems {b^(α)_{(i,i')}}; at the root, x corresponds to b^(root)_{(1,1)}. We call {b^(α)_{(i,i')}} a generating system (or frame) since these vectors are not necessarily linearly independent. If not, the system {b^(α)_{(i,i')}} is larger than necessary and we can shorten the system. Even if {b^(α)_{(i,i')}} forms a basis, the question remains whether we can truncate the basis within a given tolerance. This will be the subject of Section 3.6.

Remark 3 Only the products b^(j)_{(i,i')} at the leaves are computed explicitly. At each inner vertex α, the computation of the new coefficients in (18) needs r_α r'_α r_{α_1} r'_{α_1} r_{α_2} r'_{α_2} multiplications. The resulting cost is the product of the data sizes of v and w.
In Section 4, the ranks r_α, r_{α_1}, r_{α_2} will be equal to 2.
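The following sketch checks relation (18) at a single inner vertex whose sons are leaves: the new coefficient matrix with respect to the product frame is the Kronecker product of the two coefficient matrices (all sizes and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, r1, r2, s1, s2 = 5, 2, 3, 2, 2

B1, B2 = rng.standard_normal((n, r1)), rng.standard_normal((n, r2))    # son bases of v
Bp1, Bp2 = rng.standard_normal((n, s1)), rng.standard_normal((n, s2))  # son bases of w
C = rng.standard_normal((r1, r2))      # coefficient matrix of one basis vector of v
Cp = rng.standard_normal((s1, s2))     # ... of one basis vector of w

bv = np.einsum('ij,ai,bj->ab', C, B1, B2)      # b^(alpha), an n x n "matrix tensor"
bw = np.einsum('ij,ai,bj->ab', Cp, Bp1, Bp2)   # b'^(alpha)

# frame at the sons: all pairwise Hadamard products of the son basis vectors
F1 = np.einsum('ai,ak->aik', B1, Bp1).reshape(n, r1 * s1)
F2 = np.einsum('aj,al->ajl', B2, Bp2).reshape(n, r2 * s2)
# new coefficient matrix = Kronecker product of C and Cp
C_new = np.kron(C, Cp)

assert np.allclose(bv * bw, np.einsum('ij,ai,bj->ab', C_new, F1, F2))
```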

Scalar Product, Orthonormalisation, Transformations
As mentioned above, the linear independence of the new frame {b^(α)_{(i,i')}} has to be checked. This can be done by the QR algorithm, provided we are able to determine scalar products ⟨b^(α)_{(ℓ,ℓ')}, b^(α)_{(m,m')}⟩ of the vectors determined in (18). We simplify the notation (a single index i instead of the pair (ℓ, ℓ')) and consider the bases {b_i^(α)} at the vertex α and their connection by (15). We proceed from the leaves to the root as in Section 3.4.
At the leaves, the bases are explicitly given, so that the scalar products can be determined as usual. As soon as the Gram matrices σ^(α_1)_{ij} := ⟨b_i^(α_1), b_j^(α_1)⟩ and σ^(α_2)_{ij} := ⟨b_i^(α_2), b_j^(α_2)⟩ are known for the sons of α, the entries σ^(α)_{ℓm} can be determined by means of (15):

σ^(α)_{ℓm} = Σ_{i,j} Σ_{i',j'} c_{ij}^(α,ℓ) c_{i'j'}^(α,m) σ^(α_1)_{ii'} σ^(α_2)_{jj'}.

Combining this with basis transformations expressed via (15), we obtain the following rule. Case A1: let α_1 be the first son of α and assume that the basis {b_i^(α_1)} of U_{α_1} is changed (e.g., orthonormalised); then only the coefficient matrices C^(α,ℓ) have to be transformed accordingly, while the data at the other vertices remain untouched. The case of the second son α_2 is analogous.
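A small numerical check of this Gram-matrix recursion at one inner vertex (sizes and data invented for illustration; np.einsum merely spells out the quadruple sum):

```python
import numpy as np

rng = np.random.default_rng(7)
n, r1, r2, r = 6, 2, 3, 2
B1 = rng.standard_normal((n, r1))          # basis of the first son
B2 = rng.standard_normal((n, r2))          # basis of the second son
C = rng.standard_normal((r, r1, r2))       # C^(alpha,l), l = 1..r

G1 = B1.T @ B1                             # Gram matrix sigma^(alpha_1)
G2 = B2.T @ B2                             # Gram matrix sigma^(alpha_2)
# recursion: sigma^(alpha)_{lm} = sum c^(l)_{ij} c^(m)_{kq} G1_{ik} G2_{jq}
G = np.einsum('lij,mkq,ik,jq->lm', C, C, G1, G2)

# check against the explicitly formed basis vectors b^(alpha)_l in R^n ⊗ R^n
B = np.einsum('lij,ai,bj->lab', C, B1, B2).reshape(r, -1)
assert np.allclose(G, B @ B.T)
```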

SVD Truncation
The example in Section 3.4 shows that the Hadamard product is given by means of a generating system of increased size r''_α := r_α r'_α. This size may be larger than necessary and should be truncated. The truncation is prepared by an orthonormalisation as described in Section 3.5.
In principle, the SVD truncation is based on the singular-value decompositions of the matricisations M^(α) (cf. Section 3.3). However, the singular values and singular vectors can be determined without the explicit knowledge of the huge matrix M^(α).
Having generated orthonormal bases at all vertices, the singular-value decomposition starts at the root and proceeds to the leaves. It produces a basis {b^(α)_{ℓ,new}} together with singular values σ^(α)_ℓ indicating the importance of b^(α)_{ℓ,new}. At the start, α = root, there is only one (normalised) basis vector b^(root)_{1,new}, which remains unchanged. The corresponding weight factor is σ^(root)_1 (cf. (17)).
Assume that the new basis {b^(α)_{ℓ,new}} is already computed at the vertex α and that α is not a leaf but has sons α_1, α_2. The basis {b^(α)_ℓ} is characterised by the matrices C^(α,ℓ). Together with the given values σ^(α)_ℓ, we define the matrices

M_{α_1} := [σ^(α)_1 C^(α,1), σ^(α)_2 C^(α,2), . . . , σ^(α)_{r_α} C^(α,r_α)] ∈ R^{r_{α_1} × (r_α r_{α_2})}

and, analogously, M_{α_2} built from the transposed matrices (C^(α,ℓ))^T. The SVD of these matrices yields the new bases and singular values at α_1 and α_2. The procedure is repeated for the sons of α_1, α_2 until we reach the leaves. Then, at all vertices, HOSVD bases are introduced together with singular values σ^(α)_ν. As in Section 2.2, the SVD truncation consists of omitting all basis vectors corresponding to sufficiently small singular values. Let σ^(α)_ν, 1 ≤ ν ≤ r_α, be all singular values at α. Assume that we keep σ^(α)_ν for 1 ≤ ν ≤ s_α and omit those for ν > s_α. This means that (15) is reduced to the b^(α)_ℓ with ℓ ≤ s_α and that the double sum in (15) is taken over i ≤ s_{α_1} and j ≤ s_{α_2}. Let v be the input tensor, while v_HOSVD denotes the truncated version. Then, the following estimate holds (cf. [10, Theorem 11.58]):

‖v − v_HOSVD‖ ≤ ( Σ_α Σ_{ν > s_α} (σ^(α)_ν)² )^{1/2} ≤ √(2d − 3) ‖v − v_best‖,

where the first sum is taken over the vertices of the tree. The first inequality allows us to explicitly control the error with respect to the Euclidean norm by the choice of the omitted singular values. The second inequality proves quasi-optimality of this truncation: v_best is the best approximation with the property that rank_α(v_best) ≤ s_α. The parameter d is the order of the tensor, i.e., d = 6 in the case of Fig. 3 and d = 3 for Fig. 2. Only in the (matrix) case d = 2 does v_HOSVD coincide with v_best.

Convolution
The treatment of Section 3.4 for the Hadamard operation holds for any binary operation with the property (10). Because the multivariate convolution satisfies the analogous condition (12), the constructions of Section 3.4 also hold for the convolution ⋆ instead of ⊙. Therefore, we can perform the convolution in R^n ⊗ R^n ⊗ R^n ≅ R^N, provided that we are able to perform the convolution (v ⋆ w)_i = Σ_ℓ v_{i−ℓ} w_ℓ in R^n. The standard approach is the use of the FFT (fast Fourier transform): first, the vectors v, w are mapped into their (discrete) Fourier images v̂, ŵ; then, the Hadamard product x̂ := v̂ ⊙ ŵ is back-transformed into the convolution result x = v ⋆ w (with suitable scaling). As is well known, the corresponding work is O(n log n). For large n, this is still expensive. In the next chapter, we shall describe a much cheaper algorithm for v ⋆ w.
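For reference, a minimal FFT-based convolution in R^n, realising exactly the transform-multiply-transform pattern described above (zero-padding and numpy's rfft/irfft are implementation choices):

```python
import numpy as np

def conv_fft(v, w):
    """Linear convolution of two vectors via FFT, cost O(n log n)."""
    m = len(v) + len(w) - 1
    return np.fft.irfft(np.fft.rfft(v, m) * np.fft.rfft(w, m), m)

rng = np.random.default_rng(8)
v, w = rng.standard_normal(2 ** 10), rng.standard_normal(2 ** 10)
assert np.allclose(conv_fft(v, w), np.convolve(v, w))
```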

Tensorisation
The tensorisation has been introduced by Oseledets [17] (but for matrices instead of vectors). It is more natural to study this technique for vectors. The article by Khoromskij [15] is the first one in this direction and contains several examples of this technique. Tensorisation together with truncation can be considered as an algebraic data compression method which is at least as successful as particular analytical compressions, e.g., by means of wavelets or hp methods. The analysis by Grasedyck [6] shows that, under suitable conditions, the data size N(ṽ_ε) = O(log n) can be expected. Compression by tensorisation can be seen as a quite general multi-scale approach.
Here, we consider operations between vectors. The crucial point is that the computational work of the operations should be related to the data size of the operands; in particular, for compressed data the cost should be much smaller than the operation cost in the standard R^n vector format. We discuss the Hadamard product and the (one-dimensional) convolution operation u := v ⋆ w with u_i = Σ_k v_k w_{i−k}. We shall show that the convolution procedure can be applied directly to the tensor approximations ṽ_ε and w̃_ε. The algorithm developed in Section 4.4 has a cost related to the data sizes N(ṽ_ε), N(w̃_ε).

Grid Functions in R^n
The following algorithms will apply to vectors in R^n with n = 2^L. The connection to the previous part is given by the fact that in Section 3 we have to perform various operations with the basis vectors b_i^(j) ∈ R^n. More generally, however, the techniques of this chapter can be used for computations in R^n without connection to the tensor problems in Sections 2 and 3.
Tensorisation is an interpretation of a usual R^n vector as a tensor. Since n = 2^L, there is a representation of the indices 0 ≤ k ≤ n − 1 by the binary numeral (i_L, i_{L−1}, . . . , i_1)_2:

k = Σ_{j=1}^L i_j 2^{j−1} with i_j ∈ {0, 1}.  (22)

A vector v ∈ R^n therefore corresponds to the tensor v ∈ ⊗^L R^2 with the entries v[i_1, . . . , i_L] := v_k. Since n = dim(R^n) = dim(⊗^L R^2) = 2^L, (22) describes an isomorphism

Φ : ⊗^L R^2 → R^n.  (23)

On the side of tensors, we shall introduce a hierarchical tensor representation (cf. Section 3). This allows a simple truncation procedure v ↦ v_ε (cf. Section 3.6). Often, the data size N(v_ε) of v_ε is much smaller than n (see Example 2). As a consequence, the tensorisation together with the truncation yields a black-box compression method for vectors in R^n.
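In code, the isomorphism (23) is essentially a reshape; the sketch below fixes one possible axis ordering (i_1 = least significant bit as the last numpy axis), which is our convention, not necessarily the paper's:

```python
import numpy as np

L = 4
n = 2 ** L

def tensorise(v):
    """Phi^{-1}: reinterpret v in R^n, n = 2^L, as a tensor in (R^2)^{⊗L}.
    Index convention as in (22): k = sum_j i_j 2^(j-1); numpy's last axis
    is the fastest one, so the axes are ordered (i_L, ..., i_1)."""
    return v.reshape((2,) * L)

def vectorise(t):
    """Phi: map the tensor back to the vector of length 2^L."""
    return t.reshape(-1)

v = np.arange(n, dtype=float)
t = tensorise(v)
assert np.allclose(vectorise(t), v)
# entry check: v[k] with k = (i_L ... i_1)_2 equals t[i_L, ..., i_1]
k = 0b1011
assert t[1, 0, 1, 1] == v[k]
```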

TT Format
The underlying tree of the hierarchical representation is the linear tree depicted in Fig. 5. Hierarchical representations based on a linear tree were introduced by Oseledets [17] as the TT format (cf. Oseledets-Tyrtyshnikov [18]). In principle, the hierarchical format requires subspaces at the leaves. Since R^2 is extremely low-dimensional, we take the full space R^2 and fix the basis by the unit vectors of R^2. Figure 5 corresponds to L = 4 (i.e., n = 16). We replace the index α = {1, 2, . . . , μ} for the inner vertices by μ ∈ {2, . . . , L}. The subspaces U_μ belong to ⊗^μ R^2 ≅ R^{2^μ} (in particular, U_1 = R^2).
Since the TT-rank r_μ = rank(M^(μ)) is the minimal dimension of the required subspace U_μ ⊂ ⊗^μ R^2, the matricisation M^(μ) of a tensor v is of interest. In fact, M^(μ) can be expressed by means of the corresponding vector v = Φ(v):

M^(μ)[k', k''] = v_{k' + 2^μ k''} for 0 ≤ k' ≤ 2^μ − 1, 0 ≤ k'' ≤ n/2^μ − 1,  (24)

i.e., the columns of M^(μ) are the consecutive blocks of v of length 2^μ. Since we use the spaces R^2 at the leaves, condition (13) becomes

U_{μ+1} ⊂ U_μ ⊗ R^2,  (25)

while (15) is

b^(μ+1)_ℓ = Σ_i Σ_{j∈{1,2}} c^(μ+1,ℓ)_{ij} b^(μ)_i ⊗ b^(j),  (26)

where b^(1), b^(2) denote the unit vectors of R^2. Consider, as an example, a vector F ∈ R^n of grid values of a function which, as in an hp approach, is a piecewise polynomial of degree g on a suitably refined partition. Inspection of the matrix M^(μ) shows that all columns except the first one contain grid values of a polynomial of degree g. Hence, this part has at most the rank g + 1. The first column can increase the rank only by one, so that r_μ = rank(M^(μ)) ≤ g + 2. Therefore, the TT format representing v = Φ^{-1}(F) is of the same size as the hp approach. The optimal approximation of f by the TT format with rank(M^(μ)) ≤ g + 2 yields an error which is at most as large as the hp error, i.e., it is exponentially decreasing with g. More details can be found in Grasedyck [6]. This example (mentioned in [15]) implies the next remark.

Remark 6
All functions with a limited number of exponential terms lead to a constant bound of rank(M^(μ)); e.g., f(x) = Σ_{ν=1}^r α_ν exp(−β_ν x) yields rank(M^(μ)) ≤ r. A similar result holds for functions involving a fixed number of trigonometric terms (band-limited functions).
An example of a band-limited function can be found in Khoromskij-Veit [16]. The next example again shows that exponential sums can approximate functions with point singularities (Remark 5 is another approach to this problem). This fact is important for applications in quantum chemistry, where singularities appear at the positions of the nuclei. This is an indication that the basis vectors appearing in U_j (1 ≤ j ≤ 6) for the problem (1) allow a tensorisation with moderate ranks. For such a grid function and any r ∈ N, there is an approximation v^(r) ∈ R^n whose tensorisation Φ^{-1}(v^(r)) yields ranks r_μ = rank(M^(μ)) ≤ r and which satisfies a componentwise error estimate decaying exponentially in r. Hence, for a given error bound ε > 0, the choice r = O(log(n) + log(1/ε)) is sufficient. The storage size of the tensor Φ^{-1}(v^(r)) is O(log²(n) + log(n) log(1/ε)).
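The rank statement of Remark 6 can be checked numerically: the sketch below tensorises the grid values of a short exponential sum and verifies that all matricisation ranks stay below the number of terms (grid, exponents, and the rank tolerance are arbitrary choices):

```python
import numpy as np

L, r = 10, 3
n = 2 ** L
x = np.arange(n) / n                       # equidistant grid on [0, 1)
beta = np.array([1.0, 7.0, 40.0])
alpha = np.array([0.5, 1.0, 2.0])
v = (alpha[None, :] * np.exp(-beta[None, :] * x[:, None])).sum(axis=1)

t = v.reshape((2,) * L)                    # tensorised vector (cf. Section 4.2)
for mu in range(1, L):
    # matricisation grouping the mu least significant bits as row index
    M = t.reshape(2 ** (L - mu), 2 ** mu).T
    s = np.linalg.svd(M, compute_uv=False)
    numerical_rank = int((s > 1e-12 * s[0]).sum())
    assert numerical_rank <= r             # Remark 6: rank(M^(mu)) <= r
```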

Hadamard Product in R^n
Since it does not matter whether the componentwise multiplication is realised via v_k · w_k or v[i_1, . . . , i_L] · w[i_1, . . . , i_L], the property (10) also holds in the case of the artificial tensor product ⊗^L R^2; more precisely,

(⊗_{j=1}^L v^(j)) ⊙ (⊗_{j=1}^L w^(j)) = ⊗_{j=1}^L (v^(j) ⊙ w^(j)).

Conclusion 1 Assume v = Φ(v) and w = Φ(w).
Let v, w be represented by the TT format. Then the Hadamard product v ⊙ w can be computed as explained in Section 3.4. Since Φ(v ⊙ w) = v ⊙ w, the resulting tensor is the tensorisation of the vector v ⊙ w. The computational cost is discussed in Section 3.4.
We return to the hierarchical format for true tensors as in Fig. 2 or Fig. 3. The subspaces at the leaves are described by bases consisting of R^n vectors. The application of the tensorisation to these vectors corresponds to an extended tree as sketched in Fig. 6.
The combination of the tree in Fig. 2 with the TT tree corresponds to R^N ≅ ⊗^3 (⊗^L R^2) ≅ ⊗^{3L} R^2. For tensors represented in this format, we can again apply the algorithm of Section 3.4 to compute v ⊙ w for v, w ∈ R^N.

Fig. 6 Extended tree

Definition of the Convolution
We take a closer look at the convolution operation. The sum in (v ⋆ w)_i = Σ_ℓ v_{i−ℓ} w_ℓ is restricted to those ℓ with 0 ≤ i − ℓ ≤ n − 1 and 0 ≤ ℓ ≤ n − 1, i.e.,

(v ⋆ w)_i = Σ_{ℓ=max{0, i−n+1}}^{min{i, n−1}} v_{i−ℓ} w_ℓ.  (27)

If i varies in [0, n − 1] ∩ Z, the sum can be written as Σ_{ℓ=0}^{i}. For i < 0, the empty sum yields (v ⋆ w)_i = 0, but for n ≤ i ≤ 2n − 2, the sum in (27) is not empty. This shows the following remark.

Remark 7
The convolution of two R^n vectors yields an R^{2n−1} vector.
The notation becomes simpler if we replace the vector v ∈ R^n by the infinite sequence v = (v_i)_{i∈N_0} with N_0 = N ∪ {0} and v_i := 0 for all i ≥ n. The set ℓ_0 = ℓ_0(N_0) consists of all sequences with only finitely many nonzero components. Now, the sum becomes (v ⋆ w)_i = Σ_{ℓ=0}^{i} v_{i−ℓ} w_ℓ for all i ∈ N_0. Occasionally, also the periodic convolution v ⋆_per w ∈ R^n is of interest, where all indices are understood modulo n. These values can be obtained by (v ⋆_per w)_i = (v ⋆ w)_i + (v ⋆ w)_{n+i} for 0 ≤ i ≤ n − 1.
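A quick numerical confirmation of Remark 7 and of the folding relation between the full and the periodic convolution (sizes and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 8
v, w = rng.standard_normal(n), rng.standard_normal(n)

full = np.convolve(v, w)                   # length 2n-1, cf. Remark 7
# periodic convolution: indices understood modulo n
periodic = np.array([sum(v[(i - l) % n] * w[l] for l in range(n)) for i in range(n)])
# relation (v ⋆_per w)_i = (v ⋆ w)_i + (v ⋆ w)_{n+i}
folded = full[:n].copy()
folded[: n - 1] += full[n:]
assert np.allclose(periodic, folded)
```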

Principal Idea of the Algorithm
For multivariate (grid) functions, the definition of the convolution implies the property (10): the convolution of elementary tensors can be reduced to the tensor product of one-dimensional convolutions.
Since now the vector v is replaced by the tensor v ∈ ⊗^L R^2, an obvious question is whether the convolution of v = ⊗_{j=1}^L v^(j) and w = ⊗_{j=1}^L w^(j) can be expressed by x := ⊗_{j=1}^L (v^(j) ⋆ w^(j)) corresponding to (10), i.e., whether the corresponding vectors satisfy Φ(v) ⋆ Φ(w) = Φ(x). In the naive sense, this cannot be true for the simple reason that v^(j) ⋆ w^(j) is a vector with three nontrivial components (cf. Remark 7); therefore, the result does not belong to ⊗^L R^2. Furthermore, we must expect a result in ⊗^{L+1} R^2, since v ⋆ w has the length 2n − 1 > 2^L and < 2^{L+1}.

Extension to ⊗^L ℓ_0
According to Section 4.4.1, R^2 can be considered as a subspace of ℓ_0. Hence, ⊗^L R^2 is contained in ⊗^L ℓ_0. The linear map Φ defined in (23) can be extended to Φ : ⊗^L ℓ_0 → ℓ_0 by

Φ(⊗_{j=1}^L v^(j))[k] := Σ v^(1)[i_1] v^(2)[i_2] ··· v^(L)[i_L],  (28)

where the sum is taken over all tuples (i_1, . . . , i_L) ∈ N_0^L with k = Σ_{j=1}^L i_j 2^{j−1} (cf. Remark 1). In the case of v^(j) ∈ R^2, the sum on the right-hand side of (28) contains only one term for 0 ≤ k ≤ n − 1, and the product coincides with the corresponding entry in (22). For a better understanding, we look at the case L = 2.

Remark 9
Let e_i ∈ ℓ_0 be the i-th unit vector, i.e., e_i[j] = δ_{ij} (i, j ∈ N_0). Then, b := Φ(a ⊗ e_i) is the vector a ∈ ℓ_0 shifted by 2i positions: b_k := 0 for 0 ≤ k < 2i and b_k = a_{k−2i} for k ≥ 2i.
The shift by p positions is denoted by S^p. Thus, we can write b = S^{2i} a.
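The shift property of Remark 9 can be verified with a few lines; phi2 below implements the extended map (28) for L = 2 under our index convention (first factor = least significant position):

```python
import numpy as np

def phi2(a, b):
    """Extended map (28) for L = 2: Phi(a ⊗ b)[k] = sum_{i + 2j = k} a[i] b[j]."""
    out = np.zeros(len(a) + 2 * len(b) - 2)
    for j, bj in enumerate(b):
        out[2 * j : 2 * j + len(a)] += bj * a
    return out

a = np.array([3.0, 1.0, 4.0, 1.0, 5.0])    # some a in l_0
i = 3
e_i = np.zeros(i + 1); e_i[i] = 1.0        # i-th unit vector
b = phi2(a, e_i)
# Remark 9: Phi(a ⊗ e_i) is a shifted by 2i positions
assert np.allclose(b[: 2 * i], 0.0) and np.allclose(b[2 * i :], a)
```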

Polynomials
Next, we use the isomorphism between ℓ_0 and the space P of polynomials described by

π : ℓ_0 → P, π(a) := Σ_{k∈N_0} a_k x^k.

The connection with the convolution is given by the property that the product of two polynomials has the coefficients of the convolution product:

π(a ⋆ b) = π(a) · π(b).

We define an extension of π : ℓ_0 → P to π̃ : ⊗^L ℓ_0 → P by

π̃(⊗_{j=1}^L v^(j)) := ∏_{j=1}^L π(v^(j))(x^{2^{j−1}}),

i.e., in the j-th factor the variable x is replaced by x^{2^{j−1}}. The extended map Φ : ⊗^L ℓ_0 → ℓ_0 from (28) satisfies π(Φ(v)) = π̃(v); this follows by comparing the values under the map π̃.
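The coefficient rule for polynomial products is easy to confirm (np.polynomial.polynomial.polymul multiplies coefficient sequences, lowest degree first):

```python
import numpy as np

# pi maps a sequence to the polynomial with these coefficients;
# the product of the polynomials has the coefficients of the convolution.
a = np.array([1.0, 2.0, 0.0, 3.0])
b = np.array([4.0, 5.0])
prod_coeffs = np.polynomial.polynomial.polymul(a, b)   # coefficients of pi(a)*pi(b)
assert np.allclose(prod_coeffs, np.convolve(a, b))
```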
According to (10), we define the convolution of two (elementary) tensors in ⊗^L ℓ_0 by

(⊗_{j=1}^L v^(j)) ⋆ (⊗_{j=1}^L w^(j)) := ⊗_{j=1}^L (v^(j) ⋆ w^(j)).  (32)

Now, the product v^(j) ⋆ w^(j) makes sense since it belongs to ℓ_0. Next, we have to prove that the convolution introduced in (32) is consistent with the usual convolution of vectors.

Carry-over Procedure
The result ⊗_{j=1}^L (v^(j) ⋆ w^(j)) is still unsatisfactory because v^(j), w^(j) ∈ R^2 produce v^(j) ⋆ w^(j) ∈ R^3. A solution can be as follows. Let L = 2 as in Remark 9. Consider a ⊗ b with a, b ∈ ℓ_0. We want to find an equivalent tensor with factors in R^2. Assume that a_K ≠ 0, but a_i = 0 for i > K, which implies a ∈ R^{K+1}. If K = 1, a belongs to R^2 and nothing has to be done. If K > 1, set a' ∈ R^2 with a'_i = a_i for i = 0, 1 and a'' ∈ ℓ_0 with a''_i = a_{i+2} for i ∈ N_0. Using Remark 9, one checks that a ⊗ b represents the same vector as a' ⊗ b + a'' ⊗ Sb, where Sb is the shifted version of b: a' ∈ R^2 is already of the desired form; a'' belongs to R^{K−1}. This procedure can again be applied to a'' ⊗ Sb until all first factors belong to R^2.
In the case of a general tensor ⊗_{j=1}^L v^(j), this procedure is applied to the first factor v^(1) and yields sums of elementary tensors of the form w^(1) ⊗ (⊗_{j=2}^L w^(j)) with w^(1) ∈ R^2. Then, the procedure is repeated with the second factor, resulting in sums of terms x^(1) ⊗ x^(2) ⊗ (⊗_{j=3}^L x^(j)) with x^(1), x^(2) ∈ R^2, etc. In the case of the last factor, we may have to add an (L + 1)-th factor. Since we know that v ⋆ w belongs to R^{2n−1}, the (L + 1)-th factor must belong to R^2.
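A sketch of one carry-over step for L = 2, checking that a ⊗ b and a' ⊗ b + a'' ⊗ Sb have the same image under the extended map (28); the helper names are ours:

```python
import numpy as np

def carry_over(a):
    """Split a into a' in R^2 and the remainder a'' with a''[i] = a[i+2]."""
    return a[:2], a[2:]

def phi2(a, b):
    """Phi for L = 2 (cf. (28)): Phi(a ⊗ b)[k] = sum_{i + 2j = k} a[i] b[j]."""
    out = np.zeros(len(a) + 2 * len(b) - 2)
    for j, bj in enumerate(b):
        out[2 * j : 2 * j + len(a)] += bj * a
    return out

def padded_sum(x, y):
    m = max(len(x), len(y))
    return np.pad(x, (0, m - len(x))) + np.pad(y, (0, m - len(y)))

a = np.array([1.0, 2.0, 3.0, 4.0])           # first factor not yet in R^2
b = np.array([5.0, 6.0, 7.0])
head, tail = carry_over(a)
Sb = np.concatenate(([0.0], b))              # shift of b by one position

lhs = phi2(a, b)
rhs = padded_sum(phi2(head, b), phi2(tail, Sb))
m = max(len(lhs), len(rhs))
assert np.allclose(np.pad(lhs, (0, m - len(lhs))), np.pad(rhs, (0, m - len(rhs))))
```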

Convolution Algorithm
We recall Remark 7: if v, w ∈ ⊗_{j=1}^L R^2, the result u := v ⋆ w is a tensor in ⊗_{j=1}^{L+1} R^2. Lemma 3 describes the start at δ = 1, while Lemma 4 characterises the recursion. In the following, the vector notation v = (α, β)^T means v_0 = α, v_1 = β, i.e., the components are read from the top to the bottom. By v ∼ w, we denote the equivalence Φ(v) = Φ(w).

Lemma 3 The convolution of two vectors x = (α, β)^T and y = (γ, δ)^T of R^2 is the vector x ⋆ y = (αγ, αδ + βγ, βδ)^T ∈ R^3 with the tensor representation

x ⋆ y ∼ (αγ, αδ + βγ)^T ⊗ (1, 0)^T + (βδ, 0)^T ⊗ (0, 1)^T.  (33a)

Furthermore, the shifted vector has the tensor representation

S(x ⋆ y) ∼ (0, αγ)^T ⊗ (1, 0)^T + (αδ + βγ, βδ)^T ⊗ (0, 1)^T.  (33b)

The basic identity is given in the next lemma.
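The two representations (33a) and (33b), as reconstructed above, can be verified directly; the helper phi2 evaluates Φ on R^2 ⊗ R^2 with the first factor as the least significant position:

```python
import numpy as np

def phi2(t):
    """Phi for a tensor t in R^2 ⊗ R^2 (entry t[i1, i2] goes to position i1 + 2*i2)."""
    v = np.zeros(4)
    for i1 in range(2):
        for i2 in range(2):
            v[i1 + 2 * i2] += t[i1, i2]
    return v

alpha, beta, gamma, delta = 2.0, 3.0, 5.0, 7.0
x = np.array([alpha, beta])
y = np.array([gamma, delta])
c = np.convolve(x, y)                      # (αγ, αδ+βγ, βδ), length 3

# tensor representation (33a) of x ⋆ y (padded to length 4 = 2^2)
t = np.outer([c[0], c[1]], [1.0, 0.0]) + np.outer([c[2], 0.0], [0.0, 1.0])
assert np.allclose(phi2(t), np.append(c, 0.0))

# representation (33b) of the shifted vector S(x ⋆ y)
t_shift = np.outer([0.0, c[0]], [1.0, 0.0]) + np.outer([c[1], c[2]], [0.0, 1.0])
assert np.allclose(phi2(t_shift), np.concatenate(([0.0], c)))
```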

Lemma 4 For given v, w ∈ ⊗_{j=1}^δ R^2, assume that v ⋆ w and its shifted version possess representations of the form

v ⋆ w ∼ p ⊗ (1, 0)^T + q ⊗ (0, 1)^T,  (34a)
S^{2^δ}(v ⋆ w) ∼ p̃ ⊗ (1, 0)^T + q̃ ⊗ (0, 1)^T  (34b)

with p, q, p̃, q̃ ∈ ⊗_{j=1}^δ R^2. Then, convolution of the tensors v ⊗ x and w ⊗ y with x = (α, β)^T, y = (γ, δ)^T ∈ R^2 yields representations of the same form for (v ⊗ x) ⋆ (w ⊗ y) and its shifted version, now with tensors of order δ + 1 in place of δ.

Proof By (32), (v ⊗ x) ⋆ (w ⊗ y) = (v ⋆ w) ⊗ (x ⋆ y) (cf. Lemma 2). Inserting assumption (34a), applying Lemma 1, and using the representations (33a) and (33b) of x ⋆ y and S(x ⋆ y), we obtain two identities, the second one ending in ⊗ (0, 1)^T. Summation of both identities yields the assertion of the lemma.

Convolution of Tensors in Hierarchical Format
We recall that the subspaces U_δ ⊂ ⊗^δ R^2 satisfy (25): U_{δ+1} ⊂ U_δ ⊗ R^2. The essential observation is that also the results of the convolution yield subspaces with this property.
Note that there are three different tensors v, w, u := v ⋆ w involving representations with three different subspace families U_δ, U'_δ, U''_δ (1 ≤ δ ≤ L). The bases spanning these subspaces consist of the vectors b^(δ)_i, b'^(δ)_i, b''^(δ)_i, respectively. The dimensions of the subspaces are r_δ, r'_δ, r''_δ.
Theorem 2 Let the tensors v, w ∈ ⊗_{j=1}^L R^2 be represented by (possibly different) hierarchical formats using the respective subspaces U_δ and U'_δ, 1 ≤ δ ≤ L, satisfying (25). Then the result u := v ⋆ w can be represented by subspaces U''_δ which again satisfy (25). The dimension of U''_δ can be bounded by r''_δ ≤ 2 r_δ r'_δ.

Proof (i) U''_1 = R^2 can be concluded from Lemma 3.
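As a numerical illustration of the rank bound stated in Theorem 2 (in the form reconstructed above), the sketch below convolves two vectors whose tensorisations are elementary (all TT-ranks 1) and checks that the matricisation ranks of the padded result stay ≤ 2:

```python
import numpy as np

L = 6
n = 2 ** L
k = np.arange(n)
v = np.exp(-0.3 * k)        # tensorised rank 1 (elementary tensor)
w = np.exp(-0.05 * k)       # tensorised rank 1

u = np.zeros(2 * n)
u[: 2 * n - 1] = np.convolve(v, w)          # u = v ⋆ w, padded to length 2^(L+1)

def tt_ranks(x):
    Lx = int(np.log2(len(x)))
    t = x.reshape((2,) * Lx)
    ranks = []
    for mu in range(1, Lx):
        M = t.reshape(2 ** (Lx - mu), 2 ** mu)
        s = np.linalg.svd(M, compute_uv=False)
        ranks.append(int((s > 1e-10 * s[0]).sum()))
    return ranks

rv, rw, ru = tt_ranks(v), tt_ranks(w), tt_ranks(u)
print(rv, rw, ru)
# bound of Theorem 2: rank of the result <= 2 * r_delta * r'_delta = 2
assert all(r <= 2 for r in ru)
```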

Lemma 5 The TT-rank r μ of T = T (a) is bounded by
Proof Split the matrix in (39)