MERACLE: Constructive layer-wise conversion of a Tensor Train into a MERA

In this article two new algorithms are presented that convert a given data tensor train into either a Tucker decomposition with orthogonal matrix factors or a multi-scale entanglement renormalization ansatz (MERA). The Tucker core tensor is never explicitly computed but stored as a tensor train instead, resulting in algorithms that are efficient in both computation and storage. Both the multilinear Tucker ranks and the MERA-ranks are automatically determined by the algorithms for a given upper bound on the relative approximation error. In addition, an iterative algorithm with low computational complexity based on solving an orthogonal Procrustes problem is proposed, for the first time, to retrieve optimal rank-lowering disentangler tensors, which are a crucial component in the construction of a low-rank MERA. Numerical experiments demonstrate the effectiveness of the proposed algorithms together with the potential storage benefit of a low-rank MERA over a tensor train.

1. Introduction. Tensor decompositions have played an important role over the past two decades in lifting the curse of dimensionality in a myriad of applications [2,3,4,17,25]. The key idea in lifting the curse of dimensionality with tensor decompositions is the use of a low-rank approximation. Many kinds of decompositions have consequently been developed, each with its own rank definition. The canonical polyadic decomposition (CPD) [1,14,15] and the Tucker decomposition [1,26] both generalize the notion of the matrix singular value decomposition (SVD) to higher-order tensors and have therefore received a lot of attention. More recent tensor decompositions are the Tensor Train (TT) [20,7,8,17] and the hierarchical Tucker decomposition [11,12]. It turns out that the latter two decompositions were already known in the quantum mechanics and condensed matter physics communities as the matrix product state (MPS) [22] and the tensor tree network (TTN) [24], respectively. The multi-scale entanglement renormalization ansatz (MERA) [9,29] is an extension of the TTN decomposition, proposed in quantum mechanics, that has so far not received much attention in the numerical linear algebra community. A key component of the MERA is the so-called disentangler tensor, responsible for limiting the growth of the TTN-ranks over consecutive levels. Although the computation of a MERA from a given tensor can be deduced from [9], the computations are intensive due to multiple contractions and do not allow for the discovery of optimal ranks of the decomposition. The contributions of this article address this area. Specifically, we 1. propose an algorithm that converts a given TT into a Tucker decomposition with guaranteed error bounds; 2. propose an algorithm that converts a given TT into a MERA with guaranteed error bounds, called MERA Constructive Layer-wise Expansion (MERACLE); 3. propose an iterative algorithm that computes a rank-lowering disentangler.
The resulting ranks of the computed Tucker and MERA approximations are completely determined by a given upper bound on the relative approximation error. The conversion of a TT into a Tucker decomposition was first suggested in [6], where the corresponding algorithm uses an iterative alternating least squares (ALS) approach. It will be shown in this article that no ALS procedure is necessary. In fact, for a D-th order tensor it is sufficient to perform D consecutive SVD computations, as described in Algorithm 3.1. It is then shown in Algorithm 4.1 that a TT can be converted into an L-layer MERA by applying Algorithm 3.1 2L times. The obtained MERA-ranks are, however, not optimal, and this is identified to be due to the disentangler tensor computation. An iterative orthogonal Procrustes algorithm is proposed that, to our knowledge for the first time, is able to compute optimal disentanglers that result in a minimal-rank MERA.
In Section 2 we introduce the notation and relevant tensor decompositions. The algorithm that converts a given TT into a Tucker decomposition with a guaranteed relative error bound is fully described in Section 3. The application of Algorithm 3.1 for the conversion of a given TT into a MERA with a guaranteed relative error bound is illustrated in Section 4. Section 5 discusses the problem of finding optimal disentangler tensors and the iterative Procrustes algorithm is proposed. Finally, in Section 6 numerical experiments demonstrate the effectiveness of the proposed algorithms.
2. Tensor basics. A D-way or Dth-order tensor A ∈ R^{I1×I2×···×ID} is a D-dimensional array where each entry is completely determined by D indices i1, . . . , iD. The scalar D is also often called the order of the tensor. The convention id = 1, 2, . . . , Id is used, together with MATLAB colon notation. Boldface capital calligraphic letters A, B, . . . are used to denote tensors, boldface capital letters A, B, . . . denote matrices, boldface letters a, b, . . . denote vectors, and Roman letters a, b, . . . denote scalars. The identity matrix of order N is denoted IN. The squared Frobenius norm ||A||F^2 of a tensor A is defined as the sum of squares of all tensor entries. The order of a tensor can be altered by grouping several indices together into a multi-index. The conversion of a multi-index [i1 i2 · · · iD] into a linear index i is per definition

i = i1 + (i2 − 1) I1 + (i3 − 1) I1 I2 + · · · + (iD − 1) I1 I2 · · · ID−1.

In what follows, we will introduce three important tensor operations. The first tensor operation is the "reshape" operation, which changes the order of a given tensor and is commonly used to flatten tensors into matrices and vice versa.
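The multi-index convention above is the column-major (MATLAB-style) ordering. The following NumPy sketch (our illustration, not code from the article; `multi_to_linear` is a hypothetical helper) verifies the formula numerically, written 0-based to match NumPy indexing:

```python
import numpy as np

# Illustrative check of the multi-index to linear index conversion
# i = i1 + (i2-1)*I1 + ... + (iD-1)*I1*...*I_{D-1}, written 0-based;
# order='F' selects the same column-major ordering in NumPy.
def multi_to_linear(idx, dims):
    lin, stride = 0, 1
    for i_d, I_d in zip(idx, dims):
        lin += i_d * stride   # i_d runs from 0 here
        stride *= I_d
    return lin

dims = (2, 3, 4)
idx = (1, 2, 3)
assert multi_to_linear(idx, dims) == 23
assert multi_to_linear(idx, dims) == np.ravel_multi_index(idx, dims, order='F')
```

The first index runs fastest, which is exactly the convention used by MATLAB's reshape and assumed throughout the article.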
Another important operation is the generalization of the matrix transpose to three or more indices.
Definition 2.2. The operator "permute(A, p)" rearranges the indices of A ∈ R I1×I2×···×I D so that they are in the order specified by the vector p. The resulting tensor has the same values of A but the order of the subscripts needed to access any particular element is rearranged as specified by p. All the elements of p must be unique, real, positive, integer values from 1 to D.
The definition of the "permute" operation allows one to write the transpose of a matrix A as permute(A, [2,1]). By combining both the reshape and permute operations, we can now introduce the mode-d matricization A <d> of a tensor.
The mode-d matricization A<d> is hence obtained from A as

A<d> = reshape(permute(A, [d, 1, . . . , d−1, d+1, . . . , D]), [Id, I1 · · · Id−1 Id+1 · · · ID]).

The third and final important tensor operation is the summation over indices, also called the contraction of indices. A particularly common operation in this regard is the d-mode product of a tensor with a matrix.
A very convenient graphical representation of D-way tensors is shown in Figure 1(a). Tensors are here represented by nodes and each edge denotes a particular index of the tensor. The order of the tensor is then easily determined by counting the number of edges. Since a scalar is a zeroth-order tensor, it is represented by a node without any edges. A summation over an index is represented graphically by connecting the corresponding edges of the two nodes in the diagram. For example, the two index summations of a 3-way tensor A ∈ R^{I1×I2×I3} with a matrix U1 ∈ R^{J1×I1} and a vector u3 ∈ R^{I3} are graphically depicted in Figure 1(b) by two connected edges between the nodes for A, U1 and u3. The result of these two summations is a J1 × I2 matrix, which can also be deduced from the two "free" edges in Figure 1(b). Three important tensor decompositions in this article are the Tucker decomposition, the TT and the MERA. Each of these decompositions will now be briefly discussed.
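In NumPy, such diagrammatic contractions map directly onto `einsum` strings, where every repeated index corresponds to a connected edge in the diagram. The following sketch (ours, with illustrative dimensions) reproduces the example of Figure 1(b):

```python
import numpy as np

# Contraction example from Figure 1(b): a 3-way tensor A summed with a
# matrix U1 over index i1 and with a vector u3 over index i3, leaving a
# J1 x I2 matrix. Dimensions below are illustrative.
I1, I2, I3, J1 = 4, 5, 6, 3
A  = np.random.rand(I1, I2, I3)
U1 = np.random.rand(J1, I1)
u3 = np.random.rand(I3)

# Each repeated index in the einsum string is a connected edge.
B = np.einsum('ji,iks,s->jk', U1, A, u3)
assert B.shape == (J1, I2)
```

The two "free" edges of the diagram are the unsummed indices `j` and `k` that appear in the output subscript.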

2.1. Tucker decomposition. The Tucker decomposition represents a tensor A ∈ R^{I1×···×ID} as

A = S ×1 U1 ×2 U2 · · · ×D UD,

where S ∈ R^{S1×···×SD} is called the Tucker core tensor and Ud ∈ R^{Id×Sd}, d = 1, . . . , D, are the Tucker factor matrices. The total storage complexity of the Tucker decomposition is therefore S1 S2 · · · SD + Σ_{d=1}^{D} Id Sd. These factor matrices are typically chosen to be orthogonal and can then be obtained as the left singular vectors of the corresponding unfolded matrices of A. A special case of the Tucker decomposition is the HOSVD [5], which has orthogonal matrices and where the Tucker core satisfies two additional properties. The dimensions S1, . . . , SD of the Tucker core are called the multilinear rank of A and are defined as Sd = rank(A<d>) for all values of d. A graphical representation of the Tucker decomposition is shown in Figure 2(a).

Fig. 2. (a) Diagram of a Tucker decomposition. (b) Diagram of a Tensor Train.
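As a concrete illustration of the definitions above (our sketch, not the article's code), a minimal HOSVD can be computed in NumPy by taking the left singular vectors of each mode-d unfolding and projecting the tensor onto them:

```python
import numpy as np

# Minimal HOSVD sketch: factor matrices from the left singular vectors of
# the mode-d unfoldings, core obtained by projecting A onto them.
def hosvd(A):
    D = A.ndim
    U = []
    for d in range(D):
        Ad = np.reshape(np.moveaxis(A, d, 0), (A.shape[d], -1))  # mode-d unfolding
        Ud, _, _ = np.linalg.svd(Ad, full_matrices=False)
        U.append(Ud)
    S = A.copy()
    for d in range(D):
        # d-mode product with Ud^T
        S = np.moveaxis(np.tensordot(U[d].T, np.moveaxis(S, d, 0), axes=1), 0, d)
    return S, U

A = np.random.rand(4, 5, 6)
S, U = hosvd(A)
# Reconstruct A = S x1 U1 x2 U2 x3 U3 (exact, since no truncation is done).
B = S
for d in range(3):
    B = np.moveaxis(np.tensordot(U[d], np.moveaxis(B, d, 0), axes=1), 0, d)
assert np.allclose(A, B)
```

Truncating the columns of each Ud (and the core accordingly) yields a truncated HOSVD with the corresponding multilinear rank.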

2.2. Tensor Train decomposition.
The TT decomposition was introduced into the scientific computing community in [20], but was known as a Matrix Product State in the field of condensed matter physics [22,23] a decade earlier.
Definition 2.5. The TT decomposition of a given tensor A ∈ R^{I1×I2×···×ID} is a set of 3-way tensors A^(d) ∈ R^{Rd×Id×Rd+1}, d = 1, . . . , D, such that

(2.4) A(i1, i2, . . . , iD) = A^(1)(:, i1, :) A^(2)(:, i2, :) · · · A^(D)(:, iD, :).

The 3-way tensors of the TT are also called the TT-cores, and the minimal values of R1, . . . , RD+1 for which (2.4) holds exactly for all tensor entries are called the TT-ranks. When R1 = RD+1 > 1 the decomposition is called a Tensor Ring (TR), for which the diagram is shown in Figure 2(b) with all dimensions of the TT-cores indicated. We will consider from now on only the TT case, with R1 = RD+1 = 1, and therefore the R1-link in Figure 2(b) that "closes the loop" will not be drawn in future diagrams anymore. The total storage complexity of a TT is Σ_{d=1}^{D} Rd Id Rd+1. The TT-ranks are upper bounded as described by the following theorem.

Theorem 2.6. The TT-ranks of a tensor A ∈ R^{I1×···×ID} satisfy

Rd ≤ min(I1 · · · Id−1, Id · · · ID), d = 2, . . . , D.

Suppose now that we have a Tucker core in TT form. The mode products of the Tucker factor matrices with this Tucker core in TT form do not alter its TT-ranks. Theorem 2.6 therefore reveals the connection between the upper bounds on the TT-ranks of a given tensor and its multilinear rank.
Corollary 2.7. Let A be a D-way tensor with multilinear rank S1, . . . , SD; then its TT-ranks R2, . . . , RD satisfy

Rd ≤ min(S1 · · · Sd−1, Sd · · · SD), d = 2, . . . , D.

The TT approximation of a given tensor with a prescribed relative error can be computed with either the TT-SVD algorithm [20, p. 2301] or the TT-cross algorithm [21]. Furthermore, through the TT-rounding procedure [20, p. 2305] the TT-ranks of a given TT can be truncated such that the computed approximation satisfies a prescribed relative error. The notion of a TT in site-d-mixed-canonical form will be very important in the development of the algorithms in this article and relies on both left-orthogonal and right-orthogonal TT-cores.
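For reference, the cited TT-SVD procedure can be sketched in a few lines of NumPy. The helper below is our own illustrative implementation (not the paper's code), with `eps` the requested relative accuracy and cores stored as (Rd, Id, Rd+1) arrays:

```python
import numpy as np

# TT-SVD sketch (cf. [20]): a sweep of truncated SVDs converts a dense
# tensor into TT-cores of size (R_d, I_d, R_{d+1}).
def tt_svd(A, eps=1e-12):
    dims, D = A.shape, A.ndim
    delta = eps * np.linalg.norm(A) / np.sqrt(max(D - 1, 1))
    cores, r_prev = [], 1
    C = A.reshape(1, -1)
    for d in range(D - 1):
        C = C.reshape(r_prev * dims[d], -1)
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        # smallest rank whose discarded singular-value tail stays below delta
        errs = np.sqrt(np.append(np.cumsum((s ** 2)[::-1])[::-1], 0.0))
        r = max(1, int(np.argmax(errs <= delta)))
        cores.append(U[:, :r].reshape(r_prev, dims[d], r))
        C = s[:r, None] * Vt[:r]
        r_prev = r
    cores.append(C.reshape(r_prev, dims[-1], 1))
    return cores

def tt_to_full(cores):
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=1)   # contract auxiliary indices
    return full[0, ..., 0]

A = np.random.rand(3, 4, 5)
cores = tt_svd(A)
assert np.allclose(A, tt_to_full(cores))
```

With a tiny `eps` the reconstruction is exact; increasing `eps` truncates the TT-ranks while keeping the relative error below the bound.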
A TT is in site-d-mixed-canonical form when all TT-cores A (1) up to A (d−1) are left-orthogonal and all TT-cores A (d+1) up to A (D) are right-orthogonal.
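The mixed-canonical form can be obtained by QR sweeps. The sketch below (ours, not the authors' code) left-orthogonalizes all cores, which yields the site-D-mixed-canonical form, and checks that the tensor itself is unchanged while its norm is carried entirely by the last core:

```python
import numpy as np

# Bringing a TT into site-D-mixed-canonical form with a left-to-right QR
# sweep; cores are stored as (R_d, I_d, R_{d+1}) arrays. Afterwards the
# Frobenius norm of the tensor equals that of the last core.
def left_orthogonalize(cores):
    cores = [G.copy() for G in cores]
    for d in range(len(cores) - 1):
        r0, n, r1 = cores[d].shape
        Q, R = np.linalg.qr(cores[d].reshape(r0 * n, r1))
        cores[d] = Q.reshape(r0, n, Q.shape[1])       # left-orthogonal core
        cores[d + 1] = np.tensordot(R, cores[d + 1], axes=1)  # absorb R
    return cores

def tt_to_full(cores):
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=1)
    return full[0, ..., 0]

cores = [np.random.rand(1, 4, 3), np.random.rand(3, 4, 3), np.random.rand(3, 4, 1)]
canon = left_orthogonalize(cores)
A = tt_to_full(cores)

assert np.allclose(A, tt_to_full(canon))                      # tensor unchanged
assert np.isclose(np.linalg.norm(A), np.linalg.norm(canon[-1]))
```

The same idea with a right-to-left sweep produces right-orthogonal cores, and combining both sweeps gives the site-d-mixed-canonical form for any d.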
Once a TT is in site-d-mixed-canonical form, it can be readily verified that its Frobenius norm is easily obtained from the dth core tensor as ||A||F = ||A^(d)||F.

2.3. MERA decomposition. The MERA decomposition is a generalization of the hierarchical Tucker decomposition and consists of three different building blocks. A common implementation of the hierarchical Tucker decomposition is the binary tree form, as shown in Figure 3(a). Reading such a diagram from the bottom to the top, one can interpret each row/layer in such a tree structure as a coarse-graining transformation where each tensor in a row/layer transforms two indices into one index. Such tensors W of size I1 × · · · × IK × S that reduce K > 1 indices to a single index are called isometries. An isometry can always be reshaped into a size I1I2 · · · IK × S matrix W with orthonormal columns, where S is the dimension of the "output" index. The minimal outgoing dimensions of all isometries such that the MERA represents a given tensor exactly are called the MERA-ranks. The diagram representation of an isometry is shown in Figure 4. The bottom layer of isometries with K = 2 in Figure 3(a) reduces the eight indices of a given tensor into four indices, as illustrated in Figure 5. Each application of a layer in the tree halves the resulting total number of indices. The coarse-graining with a hierarchical Tucker decomposition pairs two consecutive indices and sums over them, thereby ignoring possible correlations between neighbouring indices, which results in higher ranks during coarse-graining. This issue is resolved in the MERA through the introduction of additional disentangler tensors in the coarse-graining layers. Disentanglers, shown as shaded nodes in Figure 3(b), "bridge" neighbouring pairs before they are coarse-grained. A disentangler tensor is by definition a 4-way tensor V of size I1 × I2 × I1 × I2 that can be reshaped into an orthogonal I1I2 × I1I2 matrix V.
The reduction of an 8-way TT into a 4-way TT through a MERA layer is shown as a diagram in Figure 6. The third and final MERA building block is the top tensor. This tensor T is located at the top of the MERA structure and connects to all outgoing isometry indices of the highest layer. Since all disentanglers and isometries have their respective notion of orthogonality, it follows that the Frobenius norm of a tensor A that is represented by a MERA is given by ||A||F^2 = ||T||F^2. This easy computation of the norm due to orthogonality is very similar to the case of a TT in site-d-mixed-canonical form. The storage complexity of a MERA is simply the sum of the storage complexities of all disentanglers, isometries and the top tensor. In this respect, it is only meaningful from a data tensor compression perspective to have MERA-ranks that do not increase over consecutive layers. In the next section, we develop the main algorithm to convert a given TT into a Tucker decomposition; this algorithm will serve as the main computational building block to eventually convert a TT into a MERA.

3. Tensor train to Tucker decomposition. In this section, an algorithm is developed that converts a given TT into either a HOSVD or a truncated HOSVD with a guaranteed upper bound on the relative approximation error. The Tucker core S will be directly obtained in the TT format, avoiding its exponential storage complexity. The starting point of the algorithm is a TT in site-1-mixed-canonical form. Before stating the algorithm, we first introduce some additional notation together with an important lemma.
3.1. Tucker factor matrix from TT-core. To understand how a given TT can be converted into a Tucker decomposition, we need to know how the Tucker factor matrices can be computed from each TT-core. In order to describe this computation we first introduce the following convenient notation.
Definition 3.1. Let A^(1), . . . , A^(D) be the TT-cores of a D-way tensor A. We define A<d as the Rd × (I1 · · · Id−1) matrix obtained from summing over the auxiliary indices of A^(1) up to A^(d−1) and permuting and reshaping the result into the desired matrix. The Rd+1 × (Id+1 · · · ID) matrix A>d is defined similarly from the TT-cores A^(d+1) up to A^(D). The Id × RdRd+1 matrix Ad is defined from permuting and reshaping A^(d). Finally, both A<1 and A>D are defined to be unit scalars.
Note that if the TT of A is in site-d-mixed-canonical form, then the left- and right-orthogonality of the TT-cores implies that both A<d and A>d have orthonormal rows. The following lemma tells us how the unfolding matrix A<d> can be written in terms of the matrices from Definition 3.1.
Lemma 3.2. For a D-way tensor A in TT form we have the following relationship:

A<d> = Ad (A>d ⊗ A<d).

The Kronecker product in Lemma 3.2 is due to the rank-1 link of the TT. Also note that the rows of (A>d ⊗ A<d) are orthonormal when the TT is in site-d-mixed-canonical form, due to the preservation of orthonormality under the Kronecker product. Lemma 3.2 tells us that any unfolding matrix A<d> can be written as a product of Ad with (A>d ⊗ A<d), which leads to the following two corollaries.

Corollary 3.3. For a D-way tensor A with multilinear rank S1, . . . , SD we have that Sd = rank(A<d>) ≤ Rd Rd+1. In Corollary 3.3 we have tacitly assumed that Rd Rd+1 < Π_{k≠d} Ik is always satisfied.

Corollary 3.4. Let the TT of A be in site-d-mixed-canonical form and let Ad = U S V^T be the SVD of Ad. Then A<d> = U S V^T (A>d ⊗ A<d) is an SVD of A<d>.

Corollary 3.4 follows directly from the fact that the product of matrices with orthonormal rows also has orthonormal rows. The matrix (A>d^T ⊗ A<d^T) V therefore contains the right singular vectors of A<d> corresponding with the Sd largest singular values. Corollary 3.4 also implies that the HOSVD factor matrix Ud can be directly computed from the SVD of Ad. The dth component Sd of the multilinear rank can be determined by inspecting the singular values on the diagonal of the S matrix. If there is no need to know the exact multilinear rank, then a square orthogonal Ud can also be obtained through a QR decomposition of Ad.
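The relationship in Lemma 3.2 is easy to check numerically. The following NumPy sketch (ours; column-major reshapes mimic the MATLAB convention of the article) verifies it for a small 3-way TT at d = 2:

```python
import numpy as np

# Numeric check of Lemma 3.2 for a 3-way TT at d = 2.
I1, I2, I3, R2, R3 = 3, 4, 5, 2, 3
G1 = np.random.rand(1, I1, R2)
G2 = np.random.rand(R2, I2, R3)
G3 = np.random.rand(R3, I3, 1)

# Full tensor from the TT-cores.
A = np.einsum('aib,bjc,ckd->ijk', G1, G2, G3)

A_lt = G1[0].T                  # A_<2, size R2 x I1
A_gt = G3[:, :, 0]              # A_>2, size R3 x I3
A_2  = np.transpose(G2, (1, 0, 2)).reshape(I2, R2 * R3, order='F')  # I2 x R2R3

# Mode-2 matricization of A, size I2 x (I1 I3), first index running fastest.
A_m2 = np.transpose(A, (1, 0, 2)).reshape(I2, I1 * I3, order='F')

# Lemma 3.2: A_<2> = A_2 (A_>2 kron A_<2).
assert np.allclose(A_m2, A_2 @ np.kron(A_gt, A_lt))
```

Note that `np.kron(A_gt, A_lt)` makes the `A_lt` index run fastest, matching the column-major multi-index convention.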
3.2. The TT to Tucker conversion algorithm. Lemma 3.2 forms the basis of the proposed algorithm to convert a given TT into either a HOSVD or a truncated HOSVD. The algorithm to compute a truncated HOSVD is presented in pseudo-code as Algorithm 3.1. The algorithm assumes the TT is in site-1-mixed-canonical form but can be easily adjusted to work for any other starting site. The main idea of Algorithm 3.1 is to compute the orthogonal factor matrix Ud using Corollary 3.4 and then to bring the TT into site-(d+1)-mixed-canonical form. The conversion of the TT into site-(d+1)-mixed-canonical form is computed through a QR decomposition of the S V^T factor. The orthogonal Q matrix is then retained as the dth TT-core of the Tucker core S, while the norm of A is moved to the next TT-core A^(d+1) through the absorption of the R factor. The final TT of S will therefore be in site-D-mixed-canonical form. Both the SVD step and the QR decomposition step are graphically represented in Figure 7. During each run of the for-loop in Algorithm 3.1 we are working with a partially truncated core tensor, which is very reminiscent of the ST-HOSVD algorithm [27]. In fact, the approximation error induced by truncating the SVD in Algorithm 3.1 can also be expressed exactly in terms of the singular values.
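The sweep just described can be sketched in NumPy as follows. This is our illustrative reimplementation, not the authors' MATLAB code: `right_orthogonalize` brings the TT into site-1-mixed-canonical form, and each loop step performs the SVD of Ad and the QR push discussed above, with Fortran-order reshapes mimicking the MATLAB convention:

```python
import numpy as np

# Sketch of the TT-to-Tucker sweep: SVD of A_d gives the Tucker factor U_d;
# QR pushes the non-orthogonal part to the next core, so the Tucker core S
# is built directly in TT form.
def right_orthogonalize(cores):
    cores = [G.copy() for G in cores]
    for d in range(len(cores) - 1, 0, -1):
        r0, n, r1 = cores[d].shape
        Q, R = np.linalg.qr(cores[d].reshape(r0, n * r1, order='F').T)
        cores[d] = Q.T.reshape(Q.shape[1], n, r1, order='F')
        cores[d - 1] = np.einsum('inj,kj->ink', cores[d - 1], R)
    return cores

def tt_to_tucker(cores, delta=1e-12):
    cores = right_orthogonalize(cores)       # site-1-mixed-canonical form
    U_list = []
    for d in range(len(cores)):
        r0, n, r1 = cores[d].shape
        Ad = np.transpose(cores[d], (1, 0, 2)).reshape(n, r0 * r1, order='F')
        U, s, Vt = np.linalg.svd(Ad, full_matrices=False)
        errs = np.sqrt(np.append(np.cumsum((s ** 2)[::-1])[::-1], 0.0))
        rank = max(1, int(np.argmax(errs <= delta)))   # truncated rank S_d
        U_list.append(U[:, :rank])
        T = s[:rank, None] * Vt[:rank]                 # rank x (r0 r1)
        core = np.transpose(T.reshape(rank, r0, r1, order='F'), (1, 0, 2))
        if d < len(cores) - 1:
            Q, R = np.linalg.qr(core.reshape(r0 * rank, r1, order='F'))
            core = Q.reshape(r0, rank, Q.shape[1], order='F')
            cores[d + 1] = np.einsum('ij,jnk->ink', R, cores[d + 1])
        cores[d] = core
    return U_list, cores

def tt_full(cores):
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=1)
    return full[0, ..., 0]

tt = [np.random.rand(1, 4, 2), np.random.rand(2, 5, 2), np.random.rand(2, 6, 1)]
A = tt_full(tt)
U_list, S_tt = tt_to_tucker(tt)
# Reconstruct A from the Tucker core (in TT form) and the factor matrices.
B = tt_full(S_tt)
for d in range(3):
    B = np.moveaxis(np.tensordot(U_list[d], np.moveaxis(B, d, 0), axes=1), 0, d)
assert np.allclose(A, B)
```

With a larger `delta` the same loop yields a truncated HOSVD whose error is the root sum of squares of the discarded singular values.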
Given that the TT for the all-orthogonal Tucker core is in site-d-mixed-canonical form, and given the similarity of Algorithm 3.1 to the sequentially truncated HOSVD, the proof of Theorem 3.5 is completely identical to the one found in [27, p. A1039]. Theorem 3.5 allows us to compute the absolute approximation error of a truncated HOSVD during the execution of Algorithm 3.1 by simply adding the squares of the discarded singular values. In addition, Theorem 3.5 also allows us to compute a truncated HOSVD for a given upper bound ϵ on the relative approximation error. Since Algorithm 3.1 consists of D truncated SVDs, setting the tolerance δ for each of these SVDs to ϵ ||A||F / √D then effectively guarantees that the computed approximation B satisfies ||A − B||F ≤ ϵ ||A||F.

Algorithm 3.1 TT to truncated HOSVD conversion
Input: TT-cores A^(1), . . . , A^(D) of A in site-1-mixed-canonical form, tolerance δ.
Output: Factor matrices U1, . . . , UD and the TT-cores of the Tucker core S.
1: for d = 1, . . . , D do
2: Permute and reshape A^(d) into Ad ∈ R^{Id×RdRd+1}.
3: U, S, V ← SVD of Ad, truncated with tolerance δ.
4: Ud ← U.
5: Sd ← number of retained singular values.
6: if d = D then
7: Retain S V^T, reshaped into a 3-way tensor, as the Dth TT-core of S.
8: else
9: T ← S V^T, permuted and reshaped into an RdSd × Rd+1 matrix.
10: Q, R ← QR(T).
11: Retain Q, reshaped into a 3-way tensor, as the dth TT-core of S.
12: Absorb R into the next TT-core A^(d+1).
13: end if
14: end for

3.3. Computational complexity. In this subsection we briefly analyze the computational complexity of Algorithm 3.1. For notational convenience we assume that a D-way tensor A ∈ R^{I×···×I} is represented by a TT with uniform TT-rank R. An additional assumption is that I < R^2. The computation of the D thin SVDs of Ad ∈ R^{I×R^2} in line 3 then takes D (14 R^2 I^2 + 8 I^3) flops [10, p. 493]. The QR decompositions in line 10, required for the computation of the site-(d+1)-mixed-canonical form, require (D − 1) (2 I^2 (R^2 − I/3) + 4 (R^4 I − R^2 I^2 + I^3/3)) flops [10, p. 249] when performed with Householder transformations. In practical cases we have that I ≤ R^2, which implies that the total computational complexity of Algorithm 3.1 is dominated by the O(R^4 I) term of the QR decompositions. If instead of a guaranteed relative approximation error a Tucker decomposition with a given multilinear rank is desired, then the SVD in line 3 of Algorithm 3.1 can be replaced by a randomized SVD [13] or an implicitly restarted Arnoldi method [19].
Also note that the actual complexity will depend heavily on the order of the indices, which is also the case for the sequentially truncated HOSVD. In practice, a heuristic that reduces the computational complexity is to permute the dimensions of the tensor A in ascending order prior to computing its Tucker decomposition [27, p. A1041], as this permutation typically reduces the maximal value of R.
A Tucker decomposition where the Tucker core tensor is stored as a TT was first introduced in [6]. Algorithm 5 [6, p. 611] describes how such a decomposition can be obtained by means of an iterative ALS method. One disadvantage of an ALS approach, however, is that the desired TT-ranks need to be chosen a priori. An alternative DMRG approach that is able to retrieve the TT-ranks has also been proposed.

4. Tensor train to MERA. The conversion of a TT into a MERA can be done via a sequence of HOSVD and truncated HOSVD computations. The disentanglers are computed through a HOSVD, while the isometries are obtained through a truncated HOSVD. The conversion algorithm will be demonstrated through an illustrative example that consists of a TT with eight TT-cores with dimensions Rd × I × Rd+1 for d = 1, . . . , 8. The goal is to compute a MERA for which the isometries convert K = 2 indices into one. As demonstrated in Figure 6, the "action" of the first MERA layer is the application of three disentanglers. The diagram representation of the required operations to find these disentanglers is shown in Figure 8. The required disentanglers are orthogonal transformations on three index pairs. The relevant TT-cores are contracted over their auxiliary indices R3, R5, R7 to obtain so-called "supercores". For example, TT-cores A^(2) and A^(3) are combined into a supercore A^(2,3) ∈ R^{R2×I^2×R4}, where the two free indices of size I are combined into one multi-index of size I^2. Algorithm 3.1 is then applied to these supercores with a full SVD in order to obtain the desired disentanglers. The bottom row of Figure 8 shows the obtained partial Tucker core with the orthogonal factor matrices, which will serve as the transposes of the disentanglers. For example, Algorithm 3.1 allows us to write

A^(2,3) = S^(2,3) ×2 U2,3,

where S^(2,3) ∈ R^{R2×I^2×R4} is represented by the leftmost oval of the bottom row in Figure 8 and U2,3 ∈ R^{I^2×I^2} is an orthogonal matrix.
The desired disentangler is then obtained by reshaping U2,3^T into a cubical 4-way tensor of dimension I. The partial Tucker core is now used as the starting point for obtaining the isometries, as shown in the top row of Figure 9. The supercores, represented by the ovals in the top row of Figure 9, first need to be split back into separate TT-cores through an SVD. For example, the supercore S^(2,3) is reshaped into the R2I × IR4 matrix S2,3 with SVD U S V^T, and we set S2 := U and S3 := S V^T. The rank R3 is determined as the number of nonzero singular values, such that S2 ∈ R^{R2I×R3} and S3 ∈ R^{R3×IR4}. The desired TT-cores are obtained by reshaping S2, S3 into the desired 3-way tensors. In this way we arrive at the second row from the top of Figure 9. In a MERA with K = 2, the isometries are orthogonal transformations that convert two consecutive TT indices into one index. The next step is therefore to form new supercores by summing over the auxiliary indices R2, R4, R6 and R8. Applying Algorithm 3.1 with a truncated SVD then results in the desired isometries. Indeed, the first supercore Â^(1,2) can then be written as

Â^(1,2) = Ŝ^(1,2) ×2 U1,2,

with Ŝ^(1,2) ∈ R^{1×S×R3} and U1,2 ∈ R^{I^2×S}. The bottom row of Figure 9 shows the diagram of the truncated HOSVD in TT form. The desired isometry is obtained by reshaping U1,2 into an I × I × S tensor, where S is the truncated dimension. Theorem 3.5 allows us to quantify the absolute approximation error due to truncation at each isometry step in the formation of the MERA and to compute a MERA that approximates a given tensor with a guaranteed relative error. If sufficient MERA layers have been computed through this procedure, then the remaining Tucker core can be retained as the top tensor. This final step also ensures that the norm of the MERA is completely determined by the top tensor. The pseudocode for the whole algorithm is presented in Algorithm 4.1.
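The two supercore manipulations used here, contracting neighbouring cores over a shared auxiliary index and splitting them back with a rank-revealing SVD, can be sketched in NumPy. Dimensions below are illustrative, not those of the article's example:

```python
import numpy as np

# Two neighbouring TT-cores are contracted into a supercore over their
# shared auxiliary index; an SVD of its unfolding splits it back into two
# cores while revealing the connecting TT-rank.
R2, I, R3, R4 = 2, 4, 3, 2
G2 = np.random.rand(R2, I, R3)
G3 = np.random.rand(R3, I, R4)

# Supercore A^(2,3) of size R2 x I^2 x R4 (the two free indices merged).
S23 = np.einsum('aib,bjc->aijc', G2, G3).reshape(R2, I * I, R4)
assert S23.shape == (R2, I * I, R4)

# Split back: unfold to (R2 I) x (I R4) and take a rank-revealing SVD.
M = np.einsum('aib,bjc->aijc', G2, G3).reshape(R2 * I, I * R4)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
r = int(np.sum(s > 1e-12 * s[0]))          # numerical rank = TT-rank R3
G2_new = U[:, :r].reshape(R2, I, r)
G3_new = (s[:r, None] * Vt[:r]).reshape(r, I, R4)

assert r == R3
assert np.allclose(np.einsum('aib,bjc->aijc', G2_new, G3_new),
                   np.einsum('aib,bjc->aijc', G2, G3))
```

The recovered rank `r` equals the original auxiliary dimension because the contraction of two generic cores over a rank-R3 link has numerical rank exactly R3.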

5. Iterative algorithm for finding a rank-lowering disentangler. Assuming that an exact low-rank MERA exists for a given TT, Algorithm 4.1 will typically fail to find it. In practice, the output dimensions S of the isometries will simply be the product of the input dimensions I1 I2 · · · IK and no truncation is ever performed. This leads to an exponential growth of the isometry output dimensions as a function of the number of MERA layers. The problem with Algorithm 4.1 is that it fails to find the correct disentanglers. In order to explain the issue at hand, we first need to explain the workings of a disentangler in a bit more detail.
Algorithm 4.1 MERACLE: TT to MERA conversion
Input: TT-cores of A, number of layers L, isometry order K, tolerance δ.
Output: Disentanglers, isometries and top tensor of a MERA approximation of A.
1: for l = 1, . . . , L do
2: Bring the TT of the current layer into site-1-mixed-canonical form.
3: Determine the desired disentangler locations.
4: Compute supercores of the TT according to desired disentangler locations.
5: Apply Algorithm 3.1 with full SVD on all supercores.
6: Retain the transposed factor matrices as disentanglers.
7: Split supercores of the partial Tucker core with the SVD.
8: Compute new supercores according to the location and order K of the isometries.
9: Apply Algorithm 3.1 with a truncated SVD δ on all supercores.
10: Retain the truncated factor matrices as isometries.
11: if final MERA layer then
12: Retain single Tucker core tensor as the top tensor.
13: else
14: Split supercores of the Tucker core with an SVD.
15: end if
16: end for

Fig. 9. Diagram of isometry computation through a truncated HOSVD step in the TT format as described in Algorithm 4.1.

5.1. Disentangler. As mentioned earlier in Section 2.3, disentanglers were originally introduced to remove possible correlations between neighbouring indices and thereby avoid high TT-ranks after coarse-graining [29]. Figure 10 illustrates the key effect of a disentangler on a simple example of 4 TT-cores with dimensions I1 = I2 = I3 = I4 = I. Note that the maximal TT-rank R between the second and third TT-core is I^2. Suppose that R = I^2. It is straightforward to see that having no disentangler implies that the output dimensions of the two isometries need to be R = I^2, as two indices with dimension I are simply combined into one multi-index. Now suppose that, prior to the isometries, a disentangler can be applied to the second and third TT-cores such that the TT-rank R is reduced to R' < R. In this case, the two isometries can truncate the dimensions I^2 down to R' without the loss of any accuracy. Unfortunately, the disentanglers obtained from a HOSVD in Algorithm 4.1 do not reduce the TT-ranks, which implies that none of the isometries
can effectively truncate the dimensions. If we are able to develop an algorithm that can find a rank-lowering disentangler, then Corollary 3.3 automatically guarantees that the truncated HOSVD in line 9 of Algorithm 4.1 will find an optimal isometry. In the next subsection we propose an iterative algorithm that attempts to recover rank-lowering disentanglers.

5.2. Iterative orthogonal Procrustes algorithm. Before stating the problem of finding the optimal disentangler in a formal way, we first introduce some convenient notation.
Definition 5.1. For a supercore A^(d,d+1) ∈ R^{Rd×IdId+1×Rd+2} we define the matricizations A' and A''. These matrices are by definition related to one another via the shuffling operator shuf and its inverse shuf^{-1}. With these definitions the optimal disentangler problem can now be formulated.
Problem 5.2. Given a supercore A^(d,d+1) for which rank(A^(d,d+1)) = R, find an orthogonal matrix V ∈ R^{IdId+1×IdId+1} such that the rank of the disentangled supercore is lowered to R' < R.

Fig. 10. The disentangler reduces the TT-rank from R to R' with R' < R, allowing the two isometries to truncate to R' without any loss of accuracy.
Problem 5.2 is essentially an orthogonal Procrustes problem in A' with the additional constraint that the orthogonal transformation V lowers the rank of A^(d,d+1). The difficulty is that both A' and shuf^{-1}(A') are unknown. We therefore propose to solve the orthogonal Procrustes problem in an iterative manner, where we fix A'^(d,d+1) in every iteration to a low-rank approximation of A^(d,d+1). The computational complexity of solving the orthogonal Procrustes problem in every iteration is O((IdId+1)^3), as this amounts to computing the SVD of an IdId+1 × IdId+1 matrix. The proposed iterative algorithm is presented in pseudocode as Algorithm 5.1. The stopping criterion can be set to a fixed maximum number of iterations, or one can inspect the rank-gap σR'/σR'+1 of A^(d,d+1) and stop the iterations as soon as this gap has reached a certain order of magnitude. A low-rank approximation of A^(d,d+1) can be computed via its SVD. At this moment, there is no formal proof of convergence for Algorithm 5.1, nor is it known what the conditions for convergence are. The best that we are currently able to do is to empirically show the successful application of this algorithm and to explore its properties based on extensive numerical experiments.
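The Procrustes subproblem solved in each iteration is classical: the orthogonal V minimizing ||V A − B||F is V = W Z^T, where W S Z^T is the SVD of B A^T. A small NumPy sketch of just this step (our illustration, detached from the disentangler context):

```python
import numpy as np

# Classical orthogonal Procrustes step: given A and B of equal size, the
# orthogonal V minimizing ||V A - B||_F is W Z^T with B A^T = W S Z^T.
def procrustes(A, B):
    W, _, Zt = np.linalg.svd(B @ A.T)
    return W @ Zt

m, n = 6, 10
A = np.random.rand(m, n)
# Construct B as a known orthogonal transformation of A to check recovery.
Q, _ = np.linalg.qr(np.random.rand(m, m))
B = Q @ A

V = procrustes(A, B)
assert np.allclose(V @ A, B)          # the known rotation is recovered
assert np.allclose(V.T @ V, np.eye(m))
```

In Algorithm 5.1 the role of B is played by the fixed low-rank approximation of the supercore, which is updated in every iteration.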

Algorithm 5.1 Iterative disentangler computation
Input: Supercore A^(d,d+1), desired TT-rank R'.
Output: Disentangler V that reduces the TT-rank to R'.
1: V ← I
2: while stopping criterion not satisfied do
3: Compute a rank-R' approximation of the current supercore via its SVD.
4: V ← solve orthogonal Procrustes problem that minimizes ||V applied to the supercore minus the fixed rank-R' approximation||F.
5: Apply the updated V to the supercore.
6: end while
6. Experiments. In this section we demonstrate the computational efficiency of Algorithms 3.1, 4.1 and 5.1 through numerical experiments. All algorithms were implemented in MATLAB and the experiments were performed on a desktop computer with a 4-core processor running at 3.6 GHz with 16 GB RAM.

6.1. Converting a TT into Tucker: compression of simulation results.
In this experiment we demonstrate Algorithm 3.1 and how a Tucker decomposition with its core in TT form can benefit compression without loss of accuracy. Inspired by the example discussed in [27, p. A1047], a tensor decomposition is used for the compression of the solution u(x, y, t) of the heat equation

∂u/∂t = ∂^2u/∂x^2 + ∂^2u/∂y^2

on the unit square [0, 1]^2 with boundary condition 0.25 − |0.5 − x| · |0.5 − y|, which also describes the initial temperature distribution over the entire square. The PDE was discretized on a uniform mesh with cell size (∆s, ∆s, ∆t) and solved with the explicit Euler method, using a time step of 0.25 ∆s^2 to ensure numerical stability. We set ∆s = 10^{−2} and ∆t = 0.25 · 10^{−4} and simulate for about 0.25 seconds, resulting in a tensor of size 100 × 100 × 10000. The upper bound on the relative approximation error when computing tensor decompositions is set to 10^{−3}. We compare the sequentially truncated HOSVD (STHOSVD) with both the TT and the Tucker decomposition in TT form. The STHOSVD is computed with the mlsvd command of the Tensorlab toolbox [28], while the conversion of the original data tensor into a TT is done via the TT-SVD algorithm [20, p. 2301]. The TT is converted into a Tucker decomposition via Algorithm 3.1. We consider two cases. In the first case, we compute the three tensor decompositions on the original solution tensor, while in the second case we first reshape the original data into a 16-way tensor by factorizing all dimensions into their prime components. All results are shown in Table 1. The compression column contains the ratio between how many numbers are required to store the original tensor and how many numbers are required to store the decomposition. Little difference in total runtime, relative error or compression can be observed when the simulation solution is kept as a 3-way tensor. The computation of an STHOSVD of the 16-way tensor takes about 3 times longer than computing the corresponding TT.
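The data-generation step above can be sketched on a much coarser grid as follows. This is our illustration under the stated assumptions (the heat equation u_t = u_xx + u_yy implied by the 0.25 ∆s^2 stability limit, with boundary values held fixed); grid size and step count are reduced:

```python
import numpy as np

# Coarse sketch of the Section 6.1 setup: 2-D heat equation on the unit
# square, explicit Euler with dt = 0.25*ds^2 (stability limit), initial
# and boundary values given by 0.25 - |0.5 - x|*|0.5 - y|.
n, steps = 20, 50                     # coarse grid, few steps (illustrative)
ds = 1.0 / (n - 1)
dt = 0.25 * ds ** 2
x = np.linspace(0, 1, n)
X, Y = np.meshgrid(x, x, indexing='ij')
u = 0.25 - np.abs(0.5 - X) * np.abs(0.5 - Y)

snapshots = np.empty((n, n, steps))
for t in range(steps):
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:]
                       + u[1:-1, :-2] - 4 * u[1:-1, 1:-1]) / ds ** 2
    u = u + dt * lap                  # boundary rows/columns stay fixed
    snapshots[:, :, t] = u

assert snapshots.shape == (n, n, steps)
assert np.isfinite(snapshots).all()
```

Stacking the solution snapshots along the third axis produces the 3-way tensor that is subsequently compressed; in the article n = 100 and steps = 10000.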
The resulting decomposition is also not able to compress the data very much, as each of the dimensions of the 16-way tensor consists of (small) prime factors. The TT and the Tucker decomposition in TT form, however, result in a compression ratio of around 12,000, which is an improvement of more than 10 times compared to the 3-way case. The time required for Algorithm 3.1 to compute the Tucker decomposition was in both cases negligible compared to the runtime of the TT-SVD algorithm.

6.2. Retrieving a rank-lowering disentangler. In this experiment we demonstrate how Algorithm 5.1 retrieves a rank-lowering disentangler. For this we consider the MERA consisting of a single layer as depicted in Figure 11. The top tensor is taken to be an R × R grayscale image.¹

¹The image was taken from http://absfreepic.com/free-photos/download/landscape-with-lake-4412x2941 12692.html, cropped and scaled to appropriate dimensions.

Fig. 11. The disentangler reduces the TT-rank from R to R' with R > R', allowing the two isometries to truncate to R' without any loss of accuracy.

In this particular case we
set R = 128. Both the I × I × R isometry tensors W and the I × I × I × I disentangler tensor V are found from the orthogonalization of random matrices of appropriate sizes. For this experiment we set I = 19 and choose the isometries to be identical. Given the R × R grayscale image top tensor A shown in Figure 12(a), we can now apply the MERA 'backwards'. The application of the two isometries on A results in an I^2 × I^2 image B, shown in Figure 12(b). Applying the disentangler to B then results in the image C shown in Figure 12(c). The corresponding TT of the image C has TT-ranks 19 and 361 = I^2. This increase of the TT-rank is reflected in the image C being much more 'noisy', while the low-rank image of Figure 12(b) has a particular block structure pattern. We now compare the use of Algorithm 3.1 with Algorithm 5.1 for retrieving a disentangler that is able to reduce the maximal TT-rank from 361 down to 128. Algorithm 5.1 is run on the TT of C in site-4-mixed-canonical form and a rank-128 approximation of C^(2,3) is used. Each iteration of Algorithm 5.1 took 0.03 seconds and, as shown in Figure 13(a), about 16,000 iterations were required for the 233 smallest singular values to converge to values of about 10^{−15}. The computed disentanglers are then applied to the supercore C^(2,3). The singular value decay of each corresponding C^(2,3) is shown in Figure 13(b), where it can be clearly seen that Algorithm 5.1 is able to retrieve a disentangler that lowers the rank to the minimal value of 128.

6.3. Convergence of Algorithm 5.1. The convergence behaviour of Algorithm 5.1 requires further study, as it is yet unclear under which conditions we are able to retrieve an exact rank-lowering disentangler. If I is fixed, then the rank of C is I^2 for the particular MERA of Section 6.2, and it appears that there exists a minimal value Rmin such that Algorithm 5.1 does not converge for values R < Rmin. There is however an exception to this observation in that Algorithm 5.1 always converges if R = 1.
Table 2 lists all values of R′_min for values of I going from 2 up to 14, where convergence of Algorithm 5.1 was determined by inspecting the singular value decay as in Figure 13(a). A first observation is that R′_min grows slowly compared to R = I^2, which implies that the range of values of R′ for which Algorithm 5.1 converges gets larger as I grows. The reason for the existence of this R′_min is yet to be fully understood.
A second observation relates to the rate of convergence. It turns out that Algorithm 5.1 converges faster as the difference between R and R′ becomes smaller. This is illustrated in Figure 14, where the number of iterations required for Algorithm 5.1 to reach a rank gap of σ_R′/σ_(R′+1) = 10^12 is shown for varying R′ when I = 8. An approximately exponential growth in the number of required iterations can be seen as the difference of R′ with R = 8^2 = 64 grows larger. This exponential growth might explain the existence of R′_min as a value of R′ for which convergence becomes 'infinitely slow'. These observations will serve as a starting point for investigating the exact nature of why and when Algorithm 5.1 works, beyond the empirical study herein.

6.4. Comparison of compression capability between a TT and a MERA on a large-scale example. In this experiment we compare the compression capability of a TT and a MERA by applying Algorithm 4.1 to a large-scale example. A 12-way cubical tensor A of dimension 10 is generated that is exactly represented by a 2-layer MERA, where each of the isometries reduces K = 2 indices into 1 index of dimension S = 5. The first layer of the MERA coarse-grains 12 indices into 6 indices, and each of the isometries in this layer is a 10 × 10 × 5 tensor. The second layer of the MERA coarse-grains the remaining 6 indices of the first layer into 3 indices and therefore consists of 5 × 5 × 5 isometries. The top tensor of the MERA is a 3-way cubical tensor with dimension 5. All isometries and disentanglers are initialized as random matrices, drawn from a standard normal distribution, which are then made orthogonal or orthonormal through a QR decomposition. The top tensor is likewise initialized with random entries. A comparison of the TT and the MERA in terms of how well they compress the original 10^12 elements is given in Table 3.
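The random initialization step described above (standard-normal matrices orthogonalized through a QR decomposition) can be sketched as follows; the helper name `random_isometry` is our own, and the tensors shown are one first-layer isometry and one first-layer disentangler of the sizes stated in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_isometry(shape_in, dim_out):
    """Draw a standard-normal matrix and orthonormalize it via QR; the
    (prod(shape_in) x dim_out) unfolding of the result has orthonormal columns."""
    m = int(np.prod(shape_in))
    q, _ = np.linalg.qr(rng.standard_normal((m, dim_out)))
    return q.reshape(*shape_in, dim_out)

# a first-layer 10 x 10 x 5 isometry and a 10 x 10 x 10 x 10 disentangler
w1 = random_isometry((10, 10), 5)
v1 = random_isometry((10, 10), 100).reshape(10, 10, 10, 10)
```

For the isometry this yields orthonormal columns in the 100 × 5 unfolding, while the disentangler, being square in its 100 × 100 unfolding, comes out fully orthogonal.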
The corresponding TT has TT-ranks R_2 = 10, R_3 = 100, R_4 = 50, R_5 = 500, R_6 = 250, R_7 = 2500, R_8 = 250, R_9 = 500, R_10 = 50, R_11 = 100, R_12 = 10 and needs 15,620,200 elements. This constitutes a saving in storage space of 10^12/15,620,200 = 6.40 × 10^4. The MERA, on the other hand, consists of 54,750 elements, which results in a saving in storage space of 10^12/54,750 = 1.82 × 10^7. The MERA is therefore about 285 times smaller than the TT.
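Both storage counts follow from simple parameter counting and can be reproduced directly. For the MERA we assume a layout of five disentanglers and six isometries in the first layer and two disentanglers and three isometries in the second; this arrangement is our inference, but it is consistent with the stated total of 54,750 elements:

```python
# TT storage: sum over cores of I * R_k * R_{k+1}, with I = 10 and boundary ranks 1
ranks = [1, 10, 100, 50, 500, 250, 2500, 250, 500, 50, 100, 10, 1]
tt_elems = sum(10 * r1 * r2 for r1, r2 in zip(ranks[:-1], ranks[1:]))

# MERA storage (assumed layout: 5 + 6 tensors in layer 1, 2 + 3 in layer 2)
mera_elems = (5 * 10**4 + 6 * 10 * 10 * 5   # layer-1 disentanglers and isometries
              + 2 * 5**4 + 3 * 5**3         # layer-2 disentanglers and isometries
              + 5**3)                       # 5 x 5 x 5 top tensor

print(tt_elems, mera_elems, tt_elems / mera_elems)
```

This reproduces the 15,620,200 and 54,750 elements reported above, and their ratio of about 285.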
Using Algorithm 4.1 to convert the TT back into a MERA with a structure identical to the "true" MERA (K = 2 and S = 5) takes 32.74 seconds and results in a relative approximation error of 1.00. This large approximation error is explained by the truncated HOSVD step (line 9 in Algorithm 4.1) not being able to truncate the ranks without losing accuracy. Using Algorithm 4.1 to convert the TT back into a MERA while using Algorithm 5.1 for the disentangler computation takes 63.81 seconds. Setting the stopping criterion for Algorithm 5.1 to σ_R′/σ_(R′+1) > 10^13 guarantees that a tolerance of 10^-12 can be used for the truncated HOSVD, thus obtaining a MERA with a relative approximation error of that order.

Table 3. Comparison of storage requirement and compression capability between a TT and a MERA for a 12th-order cubical tensor.

7. Conclusions. This article has introduced two new algorithms for the conversion of a TT into a Tucker decomposition and into a MERA, respectively. The computation of a MERA layer was shown to consist of one HOSVD step for the computation of the disentanglers and one truncated-HOSVD step for the computation of the isometries. Using the HOSVD to compute disentanglers was shown to be sub-optimal in terms of reducing the rank, and an iterative orthogonal Procrustes algorithm was proposed that is able to find rank-lowering disentanglers. Numerical experiments have demonstrated the efficacy of the proposed algorithms. The TT-to-Tucker algorithm was demonstrated to be fast compared to the conventional HOSVD algorithm and improved the storage complexity by one order of magnitude. The MERA was shown to have even more potential for reducing storage complexity in an experiment involving a tensor of 10^12 elements, where a compression improvement of a factor of 285 compared to a TT was observed. The effectiveness and limitations of the orthogonal Procrustes algorithm were also explored in numerical experiments. The exact conditions under which this orthogonal Procrustes algorithm converges to a disentangler that retrieves an exact minimal-rank solution are still a topic for future research.