1 Introduction

The growth of new mobility services, such as automated driving and ride-sharing, requires huge amounts of data, for example, sensor images and information about the movement of various objects. These data often have multiple attributes and are stored as a multi-dimensional array called a tensor. Because the size of a tensor grows exponentially with the number of attributes (indexes), such tensors quickly become huge. Therefore, the growth of future mobility services depends on the high-speed and high-quality use of huge tensor data.

A tensor network (TN) is an effective tool for representing a huge composite tensor and has been extensively developed in quantum information and statistical physics research [2, 12, 13]. For example, it has been used to represent the ground state of quantum many-body systems, where the number of tensor indexes equals the number of quantum objects, such as qubits [13]. Since a tensor network can efficiently represent such high-dimensional data, it may also allow us to process general tensor data efficiently. Recently, various applications of TNs for general tensor data processing have been proposed [1, 5, 7, 10, 14, 18, 24].

In this chapter, we introduce three research topics that apply tensor-network formalism from statistical physics to tensor data processing: tree tensor networks (TTNs), tensor ring decomposition, and MERA, an extended tree tensor network. After briefly reviewing tensor-network formalism in statistical physics in Sect. 5.2, we explain generative modeling of a multi-dimensional probability distribution using a TTN [1, 5] and introduce a new optimization algorithm for the network structure of a TTN in Sect. 5.3. In Sect. 5.4, we outline tensor ring decomposition [24] for the compression of tensor data and explain our new approach for removing redundant information in tensor ring decomposition. In Sect. 5.5, we introduce an extended tree tensor network called MERA [21], which represents the compression of quantum information. We consider the underlying mechanisms of the success of MERA through the MERA representation of the ground state of a one-dimensional quantum model. Finally, in Sect. 5.6, we summarize these tensor-network approaches for tensor data processing.

2 Tensor-Network Formalism

2.1 Tensor Contraction and Tensor Network

Statistical causality between random variables is described by conditional probabilities. For example, consider five random variables, \(x_1, x_2, x_3, x_4,\) and \(x_5\), and assume that \(x_4\) and \(x_5\) statistically depend on \((x_1, x_2)\) and \((x_3,x_4)\), respectively. If all these random variables are discrete with finite support, then the conditional probability \(p(x_5|x_1, x_2, x_3)\) is given by the two conditional probabilities:

$$\begin{aligned} p(x_5 | x_1, x_2, x_3) = \sum _{x_4} p_b(x_5|x_3, x_4) p_a(x_4|x_1, x_2), \end{aligned}$$
(5.1)

where \(p_a(\cdot |\cdot )\) and \(p_b(\cdot |\cdot )\) are conditional probabilities. If we define the elements of tensors as conditional probabilities, that is,

$$\begin{aligned} & A_{x_1 x_2 x_4} \equiv p_a(x_4|x_1, x_2),\ B_{x_3 x_4 x_5} \equiv p_b(x_5|x_3, x_4), \\ & T_{x_1 x_2 x_3 x_5} \equiv p(x_5|x_1, x_2, x_3), \end{aligned}$$

then (5.1) can be rewritten as

$$\begin{aligned} T_{x_1 x_2 x_3 x_5} =\sum _{x_4} B_{x_3x_4x_5} A_{x_1x_2x_4}. \end{aligned}$$
(5.2)

The right-hand side of (5.2) is an example of a tensor contraction, which is a generalization of the matrix product. A tensor contraction is the summation, over indexes shared by two tensors, of the products of their elements: for example, \(x_4\) is shared by A and B in (5.2). The composite tensor T is defined by the tensor contraction of A and B over the index \(x_4\) in (5.2).
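
As a concrete illustration, the contraction in (5.2) can be written in a few lines of NumPy. The following is a minimal sketch under our own assumptions: the index dimension d and the random conditional probabilities are arbitrary choices for illustration.

```python
# Minimal sketch of the tensor contraction in (5.2) with NumPy.
import numpy as np

d = 3  # dimension of every index x_i (arbitrary illustrative choice)

# A[x1, x2, x4] = p_a(x4 | x1, x2): normalize over the last index
A = np.random.rand(d, d, d)
A /= A.sum(axis=2, keepdims=True)

# B[x3, x4, x5] = p_b(x5 | x3, x4): normalize over the last index
B = np.random.rand(d, d, d)
B /= B.sum(axis=2, keepdims=True)

# Contraction over the shared index x4, cf. (5.2): T[x1, x2, x3, x5]
T = np.einsum('cde,abd->abce', B, A)

# T is again a conditional probability p(x5 | x1, x2, x3)
assert np.allclose(T.sum(axis=3), 1.0)
```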

Using tensor contractions, we can define a large tensor T as

$$\begin{aligned} T_{x_1x_2x_3x_4x_5x_6} = \sum _{y_1} \sum _{y_2} \sum _{y_3} A_{x_1x_2y_1}B_{y_1x_3y_2}C_{y_2x_4y_3}D_{y_3x_5x_6}. \end{aligned}$$
(5.3)

It is difficult to grasp the relationship between the tensors in a contraction such as (5.3) from the formula alone; a diagram notation is useful for visualizing it. Figure 5.1a and b are diagram notations for the left-hand and right-hand sides of (5.3), respectively. As shown in Fig. 5.1a, a geometrical object (circle) represents a tensor T, and the open edges represent the indexes of the tensor. In Fig. 5.1b, an edge between two tensors denotes a tensor contraction; connecting two edges (indexes) corresponds to contracting them. In summary, a node and a connection between nodes in these diagrams represent a tensor and a tensor contraction, respectively. The diagram in Fig. 5.1b represents a composite tensor as a network of tensors; hence, we call this a tensor network.

Fig. 5.1

Diagram notation of a the composite tensor T in (5.3); b the tensor contractions in (5.3)

Since a tensor network defined by a diagram is a composite tensor, we can define the class of composite tensors as a tensor network. Several types of network structures for tensor networks have already been proposed. For example, a tensor network with the one-dimensional structure shown in Fig. 5.1b is called a matrix product state (MPS). Historically, tensor networks have been used in the field of quantum information. If quantum amplitudes in a wave function are defined by a tensor, the tensor can be mapped to a quantum state. For example,

$$\begin{aligned} T \leftrightarrow \vert \psi \rangle = \sum _{x_1, x_2, x_3} T_{x_1x_2x_3} \vert x_1x_2x_3\rangle , \end{aligned}$$
(5.4)

where \(\vert x_1x_2x_3\rangle \) is a basis state, that is, a tensor product of local basis states in the local Hilbert spaces. Thus, a tensor network can also be regarded as a quantum state. In addition, an MPS is called a tensor train (TT) in applied mathematics [14]. Generally, if all tensors have open edges and are connected to their neighboring tensors, the tensor network is called a tensor product state (TPS) [9]. The one-dimensional TPS is the MPS, and the two-dimensional TPS is called a projected entangled pair state (PEPS) [20]. A tensor network without a loop structure is called a tree, as shown in Fig. 5.2, and various tree tensor networks (TTNs) exist. Note that an MPS belongs to the TTN class.

Fig. 5.2

Diagram notation of a tree tensor network

2.2 Tensor Decomposition and Tensor Compression

The singular value decomposition (SVD) of an \(m \times n\) matrix A is \(A = U\Lambda V^\dagger \), where U and V are isometries satisfying \(U^\dagger U = V^\dagger V = I\), and \(\Lambda \) is a diagonal matrix whose diagonal elements are the non-negative singular values. The SVD exists for any matrix. Let k denote the number of retained singular values. If k is smaller than m and n, or if k is reduced by discarding small singular values, then the matrix A can be compressed: although the number of elements in A is \(m\times n\), the total number of independent elements in U, V, and \(\Lambda \) is \((m+n-k)k\). Therefore, if \(k < \min (m,n)\), the SVD compresses the original matrix. The SVD with the k largest singular values is the best approximation of A among rank-k matrices: \(\tilde{U} \tilde{\Lambda } \tilde{V}^\dagger = \text {arg}\min _{\tilde{A}: \text {rank-}k} |\tilde{A} - A|_F\), where \(|X|_F\) is the Frobenius norm, \(|X|_F \equiv \sqrt{\text {Tr}[X^\dagger X]}\).
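
The compression by truncated SVD can be illustrated with a short NumPy sketch; the matrix sizes, the test matrix, and the rank k below are our own illustrative choices.

```python
# Sketch of rank-k compression by truncated SVD (Eckart-Young).
import numpy as np

m, n, k = 200, 120, 10
rng = np.random.default_rng(0)

# A test matrix that is approximately rank k plus small noise
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n)) \
    + 1e-3 * rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values
A_k = U[:, :k] * s[:k] @ Vh[:k, :]

# Raw storage: m*n elements versus (m + n + 1)*k elements for the factors
# (the number of independent parameters is (m + n - k)*k after the
#  orthonormality constraints are taken into account).
print("elements of A      :", m * n)
print("elements of factors:", (m + n + 1) * k)
print("relative error     :", np.linalg.norm(A - A_k) / np.linalg.norm(A))
```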

Tensor decomposition is the transformation of a tensor into a composite tensor. Since a composite tensor can be defined by a tensor network, a tensor network represents a tensor decomposition. For example, we can regard a tensor as a matrix by splitting its indexes into two groups and combining the indexes in each group. Using the SVD, we can then decompose this matrix into a product of two matrices: \(T=U\Lambda V^\dagger = (U\sqrt{\Lambda })(\sqrt{\Lambda }V^\dagger )=AT'\). By splitting the combined indexes back into the original indexes, we transform the original tensor into a tensor contraction of two tensors. Repeatedly applying this decomposition to the right-hand factor, \(T'\), we can transform any tensor into an MPS without truncation, as shown in Fig. 5.1b. Therefore, an MPS (TT) is a tensor decomposition that is applicable to any tensor. If the dimension of each tensor index is m and the number of indexes is n, then the number of elements in T is \(m^n\), whereas the number of elements in the MPS is \(O(n \chi ^2 m)\), where the bond dimension \(\chi \) is the dimension of the indexes between neighboring tensors in the MPS. Therefore, an MPS with a fixed bond dimension yields substantial compression.
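
The successive-SVD construction of an MPS (TT) described above can be sketched as follows. This is our own minimal implementation for illustration; the core layout (left bond, physical index, right bond) and the test tensor are assumptions, not the reference code of the cited works.

```python
# Exact (truncation-free) MPS/TT decomposition of a tensor by successive SVDs.
import numpy as np

def tt_decompose(T):
    """Decompose T into MPS cores of shape (chi_left, m, chi_right)."""
    dims = T.shape
    cores = []
    chi = 1                        # current left bond dimension
    M = T
    for d in dims[:-1]:
        M = M.reshape(chi * d, -1)
        U, s, Vh = np.linalg.svd(M, full_matrices=False)
        r = len(s)                 # keep all singular values (no truncation)
        cores.append(U.reshape(chi, d, r))
        M = s[:, None] * Vh        # absorb singular values into the remainder
        chi = r
    cores.append(M.reshape(chi, dims[-1], 1))
    return cores

def tt_contract(cores):
    """Contract the cores back into the full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=(out.ndim - 1, 0))
    return out.reshape(out.shape[1:-1])       # drop the trivial boundary bonds

T = np.random.rand(2, 3, 4, 3, 2)
cores = tt_decompose(T)
print([c.shape for c in cores])               # core shapes (chi_l, m, chi_r)
print(np.allclose(tt_contract(cores), T))     # True: the decomposition is exact
```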

If only the largest singular values are kept at each decomposition step, the accuracy of the MPS approximation can be controlled. However, determining the best tensor network with a fixed number of parameters to approximate a given tensor is difficult. Thus, we use a tensor network as a variational ansatz.

The computational cost of a tensor contraction is the product of the dimensions of all the indexes involved in the contraction. For a tensor network that contains many tensor contractions, the total cost generally depends on the order in which the contractions are processed; however, determining the best order is an NP-hard problem. In addition, even for a compact tensor network such as a PEPS, the computational and memory costs of an exact contraction increase exponentially with the number of tensors. Therefore, various approximation methods for tensor contractions have been proposed.

Using a tensor-network structure, we can efficiently calculate tensor contractions in some cases. For example, the computational cost of the inner product of two MPSs is only proportional to the length of the MPS. Therefore, compressing a huge tensor using a proper tensor-network decomposition can significantly reduce the computational cost of processing the tensor data.
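
As a sketch of such an efficient contraction, the following code (our own core layout, with random MPSs) evaluates the inner product of two MPSs site by site, so that the cost grows only linearly with the chain length.

```python
# Inner product of two real MPSs by sequential ("transfer matrix") contraction.
import numpy as np

def random_mps(n, d, chi):
    shapes = [(1 if i == 0 else chi, d, 1 if i == n - 1 else chi)
              for i in range(n)]
    return [np.random.rand(*s) for s in shapes]

def mps_inner(A, B):
    """<A|B> for two real MPSs with cores of layout (left, physical, right)."""
    E = np.ones((1, 1))                       # accumulated left environment
    for a, b in zip(A, B):
        # E[l, l'], a[l, s, r], b[l', s, r'] -> E[r, r']
        E = np.einsum('ab,asr,bsq->rq', E, a, b)
    return E[0, 0]

n, d, chi = 50, 2, 8
A = random_mps(n, d, chi)
B = random_mps(n, d, chi)
print(mps_inner(A, B))
```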

3 Generative Model Using a Tree Tensor Network

3.1 Generative Modeling

Generative modeling finds a parametrized distribution \(p({\textbf {x}})\) to approximate a data distribution \(\pi ({\textbf {x}})\) [16]. Since the distance between these distributions can be defined by the Kullback–Leibler (KL) divergence,

$$\begin{aligned} D_{\text {KL}} (\pi ||p) = \sum _{{\textbf {x}}} \pi ({\textbf {x}}) \ln \left( \frac{\pi ({\textbf {x}})}{p({\textbf {x}})}\right) , \end{aligned}$$
(5.5)

we can optimize \(p({\textbf {x}})\) to minimize this KL divergence.

Suppose that a data sample is a vector in which each element takes a state from a finite set of states: \(\{{\textbf {x}}|({\textbf {x}})_i \in \{1, 2, \ldots , m\}\}\). A set of data samples, \({\mathcal M}=\{{\textbf {x}}_\mu \}_{\mu =1,\ldots ,M}\), defines an empirical distribution:

$$\begin{aligned} \pi ({\textbf {x}}) = \frac{1}{M} \sum _{\mu =1}^{M} \delta ({\textbf {x}}, {\textbf {x}}_\mu ), \end{aligned}$$
(5.6)

where M is the number of data samples and \(\delta ({\textbf {x}}, {\textbf {y}})\) is one if \({\textbf {x}}={\textbf {y}}\), otherwise it is zero. In practice, when the target distribution of generative modeling is the empirical data distribution, we minimize the negative log-likelihood (NLL) as the loss function during learning,

$$\begin{aligned} {\mathcal L} = -\frac{1}{M}\sum _{{\textbf {x}}\in {\mathcal M}}\ln [p({\textbf {x}})] = S(\pi ) + D_{\text {KL}}(\pi ||p), \end{aligned}$$
(5.7)

where \(S(\pi )\) is the entropy of the \(\pi \) distribution.
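
The identity in (5.7) can be checked numerically with a toy example. The following sketch uses a single discrete variable and an arbitrary model distribution, which is our own simplification for illustration.

```python
# Numerical check of (5.7): NLL = S(pi) + D_KL(pi || p).
import numpy as np

rng = np.random.default_rng(1)
states = 4
samples = rng.integers(0, states, size=100)          # data set (with repeats)

# Empirical distribution pi(x), cf. (5.6)
pi = np.bincount(samples, minlength=states) / len(samples)

# Some normalized model distribution p(x)
p = rng.random(states)
p /= p.sum()

nll = -np.mean(np.log(p[samples]))                    # left-hand side of (5.7)
support = pi > 0
S = -np.sum(pi[support] * np.log(pi[support]))        # entropy of pi
kl = np.sum(pi[support] * np.log(pi[support] / p[support]))
print(np.isclose(nll, S + kl))                        # True
```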

3.2 Tree Generative Model

Here, we consider a generative model based on a quantum state [1, 5]. Following the Born rule, we define \(p({\textbf {x}})\) as the square of the amplitude of a wave function:

$$\begin{aligned} p({\textbf {x}}) = \frac{|\psi ({\textbf {x}})|^2}{Z}, \end{aligned}$$
(5.8)

where \(\psi ({\textbf {x}})\) is a wave function and Z is the normalization factor.

MPSs [5] and TTNs [1] have been proposed to define the wave function for generative modeling. Figure 5.1b shows the network structure of an MPS: each tensor has three edges, and the tensors are sequentially connected. Figure 5.2 shows the network structure of a TTN. The number of indexes of each tensor in this TTN is equal to that in the MPS; the only difference is the topology of the network. All physical indexes \(x_i\) are attached to the network, which contains no loop structure. In this sense, an MPS is a special type of TTN. In the following, a generative model using a TTN is called a tree generative model.

3.3 Canonical Form of TTN

Using the gauge redundancy that allows us to insert a matrix and its inverse on any edge of a tensor network, we can construct a useful canonical form of a TTN [17]. Since a TTN has no loop, we can decompose the network into two trees by cutting an edge, as shown in Fig. 5.3a. If we regard the cut edge as the root edge of each tree, we can define an order of the nodes from the terminal nodes (leaves) to the root and apply the following tensor transformation in this order. Each tensor has one edge toward the root node and two remaining edges. By combining these two edges into a single index, the tensor is regarded as a matrix: \(T_{ijk} = M_{i(jk)}\). The SVD splits this matrix into an isometry and a matrix attached to the edge toward the root node (see Fig. 5.3b):

$$\begin{aligned} M_{i(jk)} = \sum _l M'_{il} U_{l,(jk)}, \end{aligned}$$
(5.9)

where U is an isometry as

$$\begin{aligned} \sum _{(jk)}U_{l, (jk)}{U}^*_{l',(jk)}=\delta _{l,l'}. \end{aligned}$$
(5.10)

The matrix \(M'\) is then absorbed by the next tensor (see Fig. 5.3c). This procedure is repeated from the leaf tensors to the root node. Then, two modified root tensors are obtained and combined into a tensor with four edges. Finally, the SVD of the top tensor obtains two isometries and a diagonal matrix that consists of the singular values (see Fig. 5.3d).
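
The elementary step in (5.9)–(5.10) can be sketched as follows; the tensor shapes and the index ordering are our own illustrative choices.

```python
# Splitting a 3-leg tensor into an isometry and a matrix, cf. (5.9)-(5.10).
import numpy as np

di, dj, dk = 4, 3, 3
T3 = np.random.rand(di, dj, dk)            # T[i, j, k], index i points to the root

M = T3.reshape(di, dj * dk)                # M[i, (jk)]
U_svd, s, Vh = np.linalg.svd(M, full_matrices=False)

Mp = U_svd * s                             # M'[i, l], absorbed by the parent tensor
U = Vh                                     # isometry U[l, (jk)]

# Isometry condition (5.10): sum_(jk) U[l, (jk)] U*[l', (jk)] = delta_{l, l'}
print(np.allclose(U @ U.conj().T, np.eye(U.shape[0])))   # True
# Decomposition check: M = M' U
print(np.allclose(Mp @ U, M))                            # True
```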

Fig. 5.3
Four tree diagrams with 2 branches and 8 outputs, x 1 to x 8. a. 2 shaded circles are divided into 4 similar circles. b. 2 circles are divided into 4 triangles via smaller circles. c. 2 circles are divided into 4 triangles. d. 2 triangles divided from a smaller triangle are divided into 4 triangles.

Constructing the canonical form of a TTN: a cutting an edge of the TTN; b splitting a tensor into an isometry (triangle) and a matrix (circle) using the SVD of a leaf; c matrices from the SVD are absorbed into an upper tensor; and d the canonical form of the TTN (the rhombus represents a diagonal matrix that consists of singular values)

The canonical form of a TTN is useful as it enables a direct calculation of the normalization factor in (5.8). Since almost all tensors in the canonical form are isometries with the property defined in (5.10), the normalization factor directly depends on the singular values of the canonical form of the TTN, \(\{\lambda _i\}\):

$$\begin{aligned} Z = \sum _{{\textbf {x}}} |\psi ({\textbf {x}})|^2 = \langle \psi | \psi \rangle = \sum _i (\lambda _i)^2. \end{aligned}$$
(5.11)

To calculate the NLL, we need to evaluate the quantum amplitudes for the data samples, and the network structure of a TTN makes this evaluation efficient. The data vector \({\textbf {x}}_\alpha \) of a sample \(\alpha \) can be decomposed into a direct product of local vectors: \(({\textbf {x}}_\alpha )^{(1)} \otimes ({\textbf {x}}_\alpha )^{(2)} \otimes \cdots \). The set of local vectors of all samples at a site i is represented by the matrix \(V_{k\alpha }^{(i)}\), where k is the index of the data at site i. Thus, the total data set can be represented as a TTN with the same network structure, built from delta tensors and with the matrices \(V^{(i)}\) as its leaves, as shown in Fig. 5.4a; the delta tensor is one if all its indexes are equal and zero otherwise. Due to the tree structure, we can efficiently calculate the contraction of the TTN with a data TTN using recursive steps (see Fig. 5.4a–c).

Fig. 5.4

Evaluation of quantum amplitude: a quantum amplitude \(\psi ({\textbf {x}}_\alpha )\) for a sample \(\alpha \); and b and c recursive calculation of \(\psi ({\textbf {x}}_\alpha )\), except for the part enclosed by the dotted line. The circles indicate data matrices at site i and the squares indicate a delta tensor, which is one if all indexes are the same, otherwise, it is zero


3.4 Learning Algorithm

Previous studies [1, 5] have used a learning algorithm that updates a single node (tensor) near the center of the canonical form, where the singular values are located, for example, the part enclosed by the dotted line in Fig. 5.4a. After combining the isometry and the singular values in the enclosed part into a single 3-leg tensor (node), this tensor is updated using the gradient of the NLL. The SVD of the updated tensor into an isometry, singular values, and a unitary recovers the canonical form of the TTN. Since the center of the canonical form of a TTN can be moved to a neighboring position in the network using the SVD [17], we sweep over all nodes in a TTN with single-node updates.

3.5 Network Optimization

The network structure of a TTN is important for generative modeling. The performance of the balanced-tree generative model is better than that of the MPS in several scenarios [1]. Both the MPS and the balanced tree belong to the class of tree networks; the difference between them is the structure of the network. Therefore, optimizing the network structure for given data is effective for generative modeling.

The density matrix renormalization group (DMRG) algorithm [22, 23], a variational method for an MPS, is often used to find the ground state of a one-dimensional quantum model. For a finite system, the DMRG algorithm sweeps and optimizes the tensors in the canonical form of an MPS to improve the variational energy for a target Hamiltonian. A two-site DMRG algorithm simultaneously updates two neighboring tensors in an MPS: the two directly connected tensors are combined into a 4-leg tensor, the combined tensor is updated, and finally it is decomposed into two 3-leg tensors using the SVD before the algorithm proceeds to the next pair. The corresponding algorithm for a tree generative model was proposed in [1]. In these studies [1, 22, 23], the network structure of the TTN does not change. However, the 4-leg tensor can be divided into two tensors in three possible ways, as shown in Fig. 5.5, and selecting a new division globally changes the network structure of the TTN.

Fig. 5.5

Decomposition of a 4-leg tensor. The triangles and squares represent isometries and diagonal matrices of singular values, respectively

We now propose a new algorithm that changes the network structure of a TTN during generative modeling. First, the two isometries and the singular values at the center of the canonical form are combined into a 4-leg tensor, which is updated to improve the NLL. The updated 4-leg tensor can then be reshaped into a matrix in three ways, corresponding to the three possible divisions of its legs into two pairs (Fig. 5.5). A better division is selected, and the SVD of the corresponding matrix produces a new canonical form. The center of the canonical form is then moved, and the updates are repeated. Several strategies can be used to select a better division, as sketched below; further details can be found in [6].
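
As a sketch of the division step, the following code tries the three leg pairings of Fig. 5.5 and selects the one with the smallest discarded singular-value weight for a fixed bond dimension. This criterion is only one plausible choice made here for illustration; the selection strategies actually used are described in [6].

```python
# Trying the three possible divisions of a 4-leg tensor (Fig. 5.5).
import numpy as np

def split_cost(phi, pair, chi):
    """Truncation error of splitting the 4-leg tensor phi along a pairing."""
    left = list(pair)                        # legs grouped on the left
    right = [ax for ax in range(4) if ax not in left]
    M = np.transpose(phi, left + right)
    M = M.reshape(np.prod([phi.shape[a] for a in left]), -1)
    s = np.linalg.svd(M, compute_uv=False)
    return np.sum(s[chi:] ** 2)              # discarded squared weight

phi = np.random.rand(4, 4, 4, 4)             # updated 4-leg center tensor
chi = 4
pairings = [(0, 1), (0, 2), (0, 3)]          # (ij)(kl), (ik)(jl), (il)(jk)
costs = [split_cost(phi, p, chi) for p in pairings]
best = pairings[int(np.argmin(costs))]
print(dict(zip(pairings, costs)), "-> chosen pairing:", best)
```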

Fig. 5.6

Network structure of a tree generative model: a initial MPS structure for random patterns and b network structure after optimization by our proposed algorithm. Circles indicate probability variables, and their color indicates their position in the line. Edges indicate tensor indexes; each vertex is a tensor with three legs

We tested the proposed algorithm on the empirical probability distribution of 10 random patterns, starting from the MPS structure shown in Fig. 5.6a, because the binary probability variables lie on a line; the length of the line is 256. We fixed the central part of the line to 0 and assigned random patterns to the left- and right-hand sides, as shown in the top and bottom rows in Fig. 5.6a. The variables on the left-hand side strongly correlate with those on the right. In this case, when we start with a randomly initialized MPS, we cannot reach the minimum NLL if the network structure is fixed as an MPS. However, by changing the structure of the network using our proposed algorithm, we can find a tree generative model with the minimum NLL. Figure 5.6b shows the optimized network structure; this interesting structure emerged spontaneously. Since the probability variables on the left- and right-hand sides are strongly correlated, they are embedded into a compact tree structure, shown in the upper right of Fig. 5.6b. In contrast, the lower part of Fig. 5.6b consists of the probability variables in the center, which are fixed to 0.

Fig. 5.7

Tensor ring decomposition (TRD) of a 4-leg tensor T into four 3-leg tensors, \(M^{(i)}\): a a symmetric TRD diagram and b an alternative representation of the TRD diagram

4 Tensor Ring Decomposition

4.1 Introduction to Tensor Ring Decompositions

As discussed in Sect. 5.2, we can consider a variety of tensor-network decompositions to represent a given tensor. When such a decomposition is approximated, the efficiency of the data compression depends strongly on the structure of the tensor network and the properties of the tensor data. For quantum many-body problems, the area law of entanglement entropy, which determines the scaling of the amount of correlation in the quantum state, plays an important role in selecting a better tensor network. However, the area law does not necessarily hold for general tensor data. Thus, it remains unclear which types of tensor networks, beyond the simplest form of MPS (TT), are suitable for decomposing general tensor data.

In this section, we consider tensor ring decomposition (TRD), which is a fundamental decomposition of multidimensional tensors [24]. In TRD, a tensor is decomposed into the form of a matrix product state with periodic boundary conditions (Fig. 5.7). For example, for the N-leg tensor \(T_{i_1,i_2,\dots ,i_N}\), the TRD is expressed as

$$\begin{aligned} T_{i_1,i_2,\dots ,i_N} & = \sum _{j_1,j_2,\dots ,j_N} M^{(1)}_{j_1,j_2}[i_1] M^{(2)}_{j_2,j_3}[i_2]\cdots M^{(N)}_{j_N,j_1}[i_N]\nonumber \\ & = \textrm{Tr} \prod _{n=1}^N M^{(n)}[i_n], \end{aligned}$$
(5.12)

where \(M^{(n)}_{j_n,j_{n+1}}[i_n]\) is a 3-leg tensor, \(i_n = 1,2, \ldots , d_n\), and \(j_n = 1,2, \ldots , D_n\). Note that in the last expression, \(M^{(n)}[i_n]\) is regarded as a matrix for a fixed \(i_n\). Hereafter, we denote a tensor defined by (5.12) as \(\textrm{tTr}\prod _{n=1}^N M^{(n)}\), where \(\textrm{tTr}\) indicates the trace of a tensor network. We can obtain a TRD of a given tensor, for example, by using successive SVDs; however, this yields an (open boundary) MPS, i.e., the bond dimension of the bond connecting the two boundary tensors becomes one. Such an open boundary MPS solution is usually not optimal if we want to minimize the maximum or average bond dimension in the TRD, because information must flow through the entire chain to represent correlations between the two edges, which increases the bond dimensions \(D_n\). Thus, an efficient algorithm is required to find the optimal TRD for a given tensor.
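
For a small example, the trace in (5.12) can be evaluated directly with NumPy; the core layout \(M[j_n, j_{n+1}, i_n]\) and the dimensions below are our own choices for illustration.

```python
# Evaluating tTr prod_n M^(n) in (5.12) for N = 4 cores.
import numpy as np

d, D = 3, 2                               # physical and virtual bond dimensions
M1, M2, M3, M4 = (np.random.rand(D, D, d) for _ in range(4))

# Ring contraction: the virtual index j1 appears in both M1 and M4 (the trace)
T = np.einsum('abi,bcj,cdk,dal->ijkl', M1, M2, M3, M4)
print(T.shape)                            # (3, 3, 3, 3)

# Equivalent element-wise form: T[i,j,k,l] = Tr( M1[:,:,i] M2[:,:,j] M3[:,:,k] M4[:,:,l] )
i, j, k, l = 0, 1, 2, 0
val = np.trace(M1[:, :, i] @ M2[:, :, j] @ M3[:, :, k] @ M4[:, :, l])
print(np.isclose(T[i, j, k, l], val))     # True
```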

We can also consider a related problem in which, instead of finding an exact TRD, we approximate an N-leg tensor by a TRD with given bond dimensions \(D_n\). The alternating least squares (ALS) method [24] is often used to find such an approximate TRD numerically. The aim of the ALS method is to find a TRD that minimizes the distance between the original tensor and the TRD representation, defined by the Frobenius norm:

$$\begin{aligned} F = \left\| T - \textrm{tTr}\prod _n M^{(n)} \right\| ^2. \end{aligned}$$
(5.13)

The optimal solution is iteratively searched for by solving local linear problems for \(M^{(n)}\) defined by fixing \(M^{(m)}\) for \(m \ne n\). When a 3-leg tensor \(M^{(n)}\) is regarded as a vector,

$$\begin{aligned} \left( {{\textbf {M}}}\right) _{(j_1,j_2,i)} = M^{(n)}_{j_1,j_2}[i], \end{aligned}$$
(5.14)

the squared norm F can be written as

$$\begin{aligned} F = \Vert T \Vert ^2 + {{\textbf {M}}}^\dagger \hat{N} {{\textbf {M}}} - {{\textbf {M}}}^\dagger {{\textbf {W}}} -{{\textbf {W}}}^\dagger {{\textbf {M}}}, \end{aligned}$$
(5.15)

where \(\hat{N}\) and \({{\textbf {W}}}\) are defined as in Fig. 5.8.

Fig. 5.8
A matrix and a vector structure. a. N hat = Two interconnected loops with three blocks in each, M 2, M 3, and M 4, and a blank space in between. b. W = A loop of three blocks, M 2, M 3, and M 4, and a blank space connected to a shaded rectangle T at the bottom with vertical lines.

Matrix \(\hat{N}\) and vector \({{\textbf {W}}}\) in (5.15) for \(n=1\)

Note that the matrix \(\hat{N}\) is positive-semidefinite by construction. Because F is a quadratic function of \({{\textbf {M}}}\), its extrema are given by solving the linear equations defined by

$$\begin{aligned} \hat{N} {{\textbf {M}}} = {{\textbf {W}}}. \end{aligned}$$
(5.16)

Thus, when the matrix \(\hat{N}\) is a regular matrix, we can obtain the optimal \({{\textbf {M}}}\):

$$\begin{aligned} {{\textbf {M}}} = \hat{N}^{-1}{{\textbf {W}}}. \end{aligned}$$
(5.17)

However, in general, \(\hat{N}\) can have zero eigenvalues and its inverse matrix \(\hat{N}^{-1}\) may not exist. In this case, the linear problem becomes underdetermined. To solve this problem, the pseudo inverse (PI), \(\hat{N}^{+}\), is often used instead of the inverse. Alternatively, the conjugate gradient (CG) method can be used to obtain one of the solutions.
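
The local ALS update can be sketched for a small \(N=4\) example by building \(\hat{N}\) and \({{\textbf {W}}}\) by brute-force contraction of the environment and solving (5.16) with the pseudo inverse. The layouts and names are our own; this illustrates the idea rather than reproducing the reference implementation of [24].

```python
# One ALS update of M^(1) for an N = 4 TRD with real tensors.
import numpy as np

d, D = 3, 2
rng = np.random.default_rng(0)
T = rng.random((d, d, d, d))                           # target tensor
M = [rng.random((D, D, d)) for _ in range(4)]          # cores M[j, j', i]

def trd(cores):
    return np.einsum('abi,bcj,cdk,dal->ijkl', *cores)

def cost(cores):
    return np.sum((T - trd(cores)) ** 2)               # F in (5.13)

# Environment of M^(1): contraction of all other cores, E[j2, j1, i2, i3, i4]
E = np.einsum('bcj,cdk,dal->bajkl', M[1], M[2], M[3])

# N-hat[(j1 j2 i), (j1' j2' i')] = delta_{i i'} * (environment overlap), Fig. 5.8a
G = np.einsum('bajkl,qpjkl->abpq', E, E)
Nhat = np.einsum('abpq,ir->abipqr', G, np.eye(d)).reshape(D * D * d, -1)

# W[(j1 j2 i)]: contraction of T with the environment, Fig. 5.8b
W = np.einsum('ijkl,bajkl->abi', T, E).reshape(D * D * d)

print("F before update:", cost(M))
M[0] = (np.linalg.pinv(Nhat) @ W).reshape(D, D, d)     # (5.17) with the PI
print("F after  update:", cost(M))                     # not larger (up to round-off)
```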

The ALS algorithm often gets trapped at a local minimum of F. In particular, if the initial estimate of \(M^{(n)}\) contains a “redundant loop”, removing such a contribution from the TRD is not easy; thus, it will not converge to the global minimum. We investigate such a situation in Sect. 5.4.2.

4.2 Redundant Loops

We now consider the numerically optimized TRD of T, starting from an initial estimate that includes a redundant loop. For a general TRD, \(\textrm{tTr}\prod _{n}M^{(n)}\), we define an ideal redundant loop by adding additional degrees of freedom in the virtual bond for all \(M^{(n)}\):

$$\begin{aligned} \tilde{M}^{(n)}_{(j_n,k_n),(j_{n+1},k_{n+1})}[i_n] \equiv \sigma ^{(n)}_{k_n}\delta _{k_n,k_{n+1}} M^{(n)}_{j_n,j_{n+1}}[i_n]. \end{aligned}$$
(5.18)

This is also shown in Fig. 5.9a. Clearly, the TRD represented by \(\tilde{M}^{(n)}\) is equivalent to that of \(M^{(n)}\) up to a constant \(\sum _{k}\prod _{n}\sigma _{k}^{(n)}\). Thus, no essential information is represented by the additional indices \(k_n\).

Fig. 5.9

TRD that contains a redundant loop: a the TRD of a tensor T, where the red line indicates the redundant loop; b the corresponding matrix \(\hat{N}\); and c the vector \({{\textbf {W}}}\) for \(n=1\)

When a TRD has redundant loops, the matrix \(\hat{N}\) for \(n=1\) is represented as in Fig. 5.9b. The redundant loops in the upper and lower parts of this figure are disconnected, indicating that \(\hat{N}\) has zero eigenvalues.

Similarly, the vector \({{\textbf {W}}}\) for \(n=1\) is represented as in Fig. 5.9c. We can easily confirm that the solution obtained by the pseudo inverse of \(\hat{N}\), \({{\textbf {M}}} = \hat{N}^{+} {{\textbf {W}}}\), maintains the redundant loop. However, the general solution of the linear equation also contains arbitrary contributions from the eigenvectors corresponding to the zero eigenvalues, and the redundant loop can disappear for a suitable choice of these contributions. Thus, to remove redundant loops, we must properly determine the contribution of the zero-eigenvalue eigenvectors.

4.3 Entanglement Penalty Algorithm

We now discuss an idea to improve the optimization of the TRD, inspired by quantum many-body problems. We consider the corner double line (CDL) tensor, which often appears in statistical physics [4], as a model tensor. Based on the CDL structure, we introduce a modified cost function that can avoid the previously discussed local minima.

Let a tensor T be represented by an exact TRD:

$$\begin{aligned} T_{i_1,i_2,\dots ,i_N} = \textrm{Tr} \prod _{n=1}^N C^{(n)}[i_n]. \end{aligned}$$
(5.19)

Assume that each index \(i_n\) is represented by a set of two indices \((x_n,y_n)\) and that the 3-leg tensor \(C^{(n)}\) has a CDL structure:

$$\begin{aligned} C^{(n)}_{j_n,j_{n+1}}[i_n = (x_n,y_n)] = \lambda _{j_n}^{(n)} \delta _{j_n,x_n}\delta _{j_{n+1},y_n}. \end{aligned}$$
(5.20)

T and \(C^{(n)}\) can be represented as in Fig. 5.10. More generally, we can consider unitary matrices that mix the indices \(x_n\) and \(y_n\). In this case, \(C^{(n)}\) is written as

$$\begin{aligned} C^{(n)}_{j_n,j_{n+1}}[i_n = (x_n,y_n)] = \lambda _{j_n}^{(n)} U^{(n)}_{(j_n,j_{n+1}),(x_n,y_n)}, \end{aligned}$$
(5.21)

where \(U^{(n)}\) is a unitary matrix.

Fig. 5.10

Ideal corner double line (CDL) tensors: a a local CDL tensor; b a 4-leg tensor consisting of general CDL tensors without unitary transformations; and c a 4-leg tensor consisting of general CDL tensors with unitary transformations. The blue squares indicate unitary matrices

As discussed in Sect. 5.4.2, when the ALS algorithm is used to find an optimal TRD of T, represented by (5.19), from an initial guess that includes redundant loops, it gets trapped at a local minimum. To avoid such local minima, we consider an additional term in the cost function.

One of the differences between the redundant loop and the CDL structure is the entanglement, or correlation, between the original indices, \(i_n\), and the virtual indices in the TRD. For redundant loops, the original and virtual indices have no connection, whereas in the CDL structure they do. When we regard a 3-leg tensor \(M_{i,j_1,j_2}\) as a matrix \(M_{i,(j_1,j_2)}\), the amount of such entanglement can be characterized by the entanglement entropy, which is often used in quantum information [3] and is defined using the singular values of M:

$$\begin{aligned} S_{\textrm{E}} \equiv -\sum _{i} \tilde{s}_i \log \tilde{s}_i, \end{aligned}$$
(5.22)

where \(\tilde{s}_i = s_i/\sum _i s_i\) is the normalized singular value. When M is an \(m \times n\) matrix and \(r=\min (m,n)\), \(S_{\textrm{E}}\) satisfies \(0 \le S_{\textrm{E}} \le \log r\): \(S_{\textrm{E}}=0\) for \(\tilde{s}_1 = 1\) and \(\tilde{s}_{i} = 0\ (i \ne 1)\), and \(S_{\textrm{E}} = \log r\) for \(\tilde{s}_i = 1/r\). Note that \(S_{\textrm{E}}=0\) corresponds to the redundant loop, whereas \(S_{\textrm{E}}\) takes larger finite values for the CDL. Thus, when we add a term (negatively) proportional to this entropy to the cost function, CDL-like solutions may be favored over the redundant loop. The new cost function is explicitly written as

$$\begin{aligned} F'(\epsilon ) = \left\| T - \textrm{tTr}\prod _n M^{(n)} \right\| ^2 -\epsilon \frac{1}{N}\sum _{n} S_{\textrm{E}}(M^{(n)}), \end{aligned}$$
(5.23)

where \(\epsilon \) is a positive constant. For a finite \(\epsilon \), the global minimum of \(F'\) yields a different TRD from that which minimizes F. Thus, in practice, we may begin the ALS algorithm with a sufficiently large \(\epsilon \) and adjust it toward \(\epsilon = 0\) with each iteration. Using this procedure, the TRD may escape the local minimum and redundant loop during the initial iterations; then, when \(\epsilon = 0\), the ALS algorithm is expected to converge to the global minimum of \(F'(\epsilon =0)=F\).
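
The penalized cost function can be evaluated as in the following sketch, where the entropy of each core is computed from the singular values of the matrix \(M_{i,(j_1 j_2)}\); the example tensors are arbitrary, and the code is only an illustration of (5.22)–(5.23).

```python
# Evaluating the entanglement-penalized cost function F'(eps) of (5.23).
import numpy as np

def core_entropy(M):
    """S_E of a 3-leg core M[j1, j2, i], regarded as the matrix M[i, (j1 j2)]."""
    mat = np.moveaxis(M, -1, 0).reshape(M.shape[-1], -1)
    s = np.linalg.svd(mat, compute_uv=False)
    s_t = s / s.sum()                           # normalized singular values
    s_t = s_t[s_t > 1e-15]
    return -np.sum(s_t * np.log(s_t))           # (5.22)

def penalized_cost(T, cores, eps):
    """F'(eps) for an N = 4 tensor ring."""
    T_trd = np.einsum('abi,bcj,cdk,dal->ijkl', *cores)
    F = np.sum((T - T_trd) ** 2)
    S_mean = np.mean([core_entropy(M) for M in cores])
    return F - eps * S_mean

d, D = 3, 2
rng = np.random.default_rng(2)
T = rng.random((d, d, d, d))
cores = [rng.random((D, D, d)) for _ in range(4)]
print(penalized_cost(T, cores, eps=0.1))
```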

4.4 Numerical Experiments

We will now demonstrate how the entanglement entropy penalty works using numerical experiments. In these experiments, we performed ALS-like site-wise optimization with the cost functions F and \(F'(\epsilon )\) to find the optimal TRD. Each iteration of the ALS algorithm involves updating N local 3-leg tensors. For \(F'\), the local problem becomes nonlinear; we typically use the CG method to solve such a problem. For F, we can use either the PI or CG method.

As the simplest example, we show the typical optimization dynamics for the ideal CDL represented by (5.21). Figure 5.11a shows the convergence of the ALS algorithm, starting from random tensors, for a 4-leg tensor T consisting of a CDL with a bond dimension of 16 for each outer index \(i_n\). For the entanglement cost function (5.23), we set \(\epsilon > 0\) for the first few steps and then \(\epsilon = 0\), i.e., \(F'(\epsilon = 0) = F\), to obtain the true global minimum. We use the PI and CG methods to minimize the standard ALS cost function (5.13) and the CG method for the entanglement cost function. The optimization with the standard cost function fails to converge to the global minimum with both the PI and CG methods, whereas the entanglement cost function (5.23) converges to the global minimum. When we consider initial tensors with the ideal redundant loops defined in (5.18), the standard ALS almost always gets trapped at a local minimum, as shown in Fig. 5.11b. In contrast, the entanglement cost function avoids getting trapped at local minima and smoothly converges to the global minimum. Further details and results of these numerical experiments are presented elsewhere [11].

Fig. 5.11

Optimization of TRD using the standard cost function with the PI and CG solvers versus the entanglement cost function for a random 4-leg tensor consisting of CDL tensors with a bond dimension of 16 for the outer indices: a the initial tensors are random dense tensors and b the initial tensors are random dense tensors with redundant loops with a bond dimension of 2, as in (5.18). For the entanglement cost function, \(\epsilon = 0.1\) for the first five iterations, which was minimized by the CG method; then, \(\epsilon = 0\), which was minimized by the PI. Each iteration optimizes N tensors. Here, \(N=4\)

These numerical experiments indicate that the entanglement cost function works very well for CDL-type tensors. However, in general, the target of TRD can be very different from the ideal CDL. Our other experiments using tensors constructed from the TRD of general tensors indicate that a naive entanglement cost function does not always escape local minima; it depends on the balance of the bond dimensions of the outer and virtual indices [11].

5 Exact MERA Network and Quantum Renormalization Group for Critical Spin Models

5.1 Introduction

In the previous sections of this chapter, we showed that tensor-network approaches are applicable to information-scientific problems. Thus, a deep understanding of these successes is highly desirable. In quantum physics, the approaches were originally proposed as efficient numerical methods for treating the ground and low-lying states of strongly interacting quantum many-body systems. In this section, we examine the underlying mechanisms of the success of a class of tensor networks called multiscale entanglement renormalization ansatz (MERA) [21].

Usually, two different types of tensor networks are considered depending on how close the target model is to quantum criticality, because the magnitude of nonlocal quantum correlation, or entanglement, determines the structure of the network. Away from criticality, the quantum state can be well represented by a PEPS. The well-known MPS is a one-dimensional realization of the PEPS, and the relevant matrix dimension characterizes the magnitude of the quantum entanglement. Alternatively, the algebraic Bethe ansatz is a mathematically exact method for one-dimensional solvable models, and it can also be transformed into an MPS; thus, this recent approach has a traditional root, and the underlying mathematics is clear. In quantum critical cases, however, the efficiency of the PEPS is greatly reduced because the tensor dimension must be increased to optimize the tensor-network wave function numerically with high precision. In this case, a better approach is to include an extra dimension, the so-called holographic dimension, in the network. This extra dimension corresponds to the flow of real-space renormalization and also plays a role in greatly reducing the tensor dimension. The corresponding network is the MERA network.

Surprisingly, the MERA concept stimulates string theorists because the structure of the network is quite similar to the spacetime concept that appears in gauge/gravity correspondence [8, 15, 19]. This correspondence is considered to be key to understanding the complementary relationship between quantum field theory and general relativity. Therefore, clarifying the mathematical structure of the MERA network is important interdisciplinary research beyond condensed matter and statistical physics.

Traditional renormalization group (RG) methods are based on the flow of the interaction parameters in the Hamiltonian, which is defined by repeating the RG transformation in real or momentum space. By transforming the Hamiltonian, we can determine how the system approaches the fixed point and what the dominant parameters are. However, recent tensor-network approaches are mainly based on the optimization of a variational ansatz, and their relationship with the traditional RG concept is not very clear. We aim to overcome this discrepancy by bridging the tensor network and the RG of the Hamiltonian.

5.2 Heisenberg Model and Quantum Entanglement

We start with the antiferromagnetic Heisenberg Hamiltonian in one spatial dimension:

$$\begin{aligned} H=J\sum _{i=1}^{N}{{\textbf {S}}}_{i}\cdot {{\textbf {S}}}_{i+1}. \end{aligned}$$
(5.24)

Here, \({{\textbf {S}}}\) is a quantum spin operator, \({{\textbf {S}}}=\frac{1}{2}{\boldsymbol{\sigma }}\) for Pauli matrix \({\boldsymbol{\sigma }}\), J is the exchange coupling, N is the number of lattice sites, and we assume the periodic boundary condition \({{\textbf {S}}}_{N+1}={{\textbf {S}}}_{1}\). The ground state of this Hamiltonian is \(\left| \psi \right\rangle =\sum _{s_{1},...,s_{N}}\psi ^{s_{1}...s_{N}}\left| s_{1}...s_{N}\right\rangle \), where \(\left| s_{1}...s_{N}\right\rangle \) is the abbreviation of \(\left| s_{1}\right\rangle \otimes \cdots \otimes \left| s_{N}\right\rangle \).

As an example, we consider the 4-site case (\(N=4\)), for which we can obtain the exact eigenstates (this is a very pedagogical example to demonstrate the nature of the MERA network). The eigenvalues in the \(S_{tot}^{z}=0\) sector are \(E=-2J, -J, 0, 0, 0, J\). In particular, the ground state (\(S_{tot}^{z}=0\)) is given by

$$\begin{aligned} \sqrt{12}\left| \psi \right\rangle =2\left( \left| \uparrow \downarrow \uparrow \downarrow \right\rangle +\left| \downarrow \uparrow \downarrow \uparrow \right\rangle \right) -\left( \left| \uparrow \uparrow \downarrow \downarrow \right\rangle +\left| \uparrow \downarrow \downarrow \uparrow \right\rangle +\left| \downarrow \uparrow \uparrow \downarrow \right\rangle +\left| \downarrow \downarrow \uparrow \uparrow \right\rangle \right) , \end{aligned}$$
(5.25)

where the coefficient \(\sqrt{12}\) corresponds to a normalization factor of \(\left| \psi \right\rangle \). In addition to the first term on the right-hand side of the equation, which is a classical antiferromagnetic configuration, a second term exists due to quantum fluctuation. The second term represents domain excitation, \(\left| \uparrow \downarrow \right\rangle _{i,i+1}\otimes \left| \downarrow \uparrow \right\rangle _{i+2,i+3}\). This state can also be written as

$$\begin{aligned} \sqrt{3}\left| \psi \right\rangle =\left| s\right\rangle _{12}\otimes \left| s\right\rangle _{34}+\left| s\right\rangle _{23}\otimes \left| s\right\rangle _{41}, \end{aligned}$$
(5.26)

where \(\left| s\right\rangle _{ij}=\left( \left| \uparrow \downarrow \right\rangle _{ij}-\left| \downarrow \uparrow \right\rangle _{ij}\right) /\sqrt{2}\). Thus, two singlets spatially fluctuate, and this state has a finite amount of quantum entanglement. To clarify this feature, we introduce a reduced density matrix by taking the partial trace of the density matrix \(\rho =\left| \psi \right\rangle \left\langle \psi \right| \) as \(\rho _{12}=tr_{34}\rho \). Then, the bipartite entanglement entropy is defined by

$$\begin{aligned} S_{12}=-tr_{12}\left( \rho _{12}\log \rho _{12}\right) =2\log 2 - \frac{1}{2}\log 3=0.83698\cdots . \end{aligned}$$
(5.27)

Because of the finite \(S_{12}\), simple treatments, such as mean-field theory, break down. Tensor-network methods can efficiently treat this entanglement. For a one-dimensional case, this model can be solved exactly, even for a general N. In this case, the wave function is represented by the Bethe ansatz. The algebraic version of this ansatz can be transformed into the matrix product state.
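
The numbers quoted above can be checked by a short exact-diagonalization script (our own, with \(J=1\)): the ground state energy is \(-2J\), and the entanglement entropy between sites (1, 2) and (3, 4) is \(2\log 2 - \frac{1}{2}\log 3 \approx 0.837\).

```python
# Exact diagonalization of the N = 4 Heisenberg ring (5.24), units of J = 1.
import numpy as np
from functools import reduce

sx = np.array([[0, 1], [1, 0]]) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]]) / 2
I2 = np.eye(2)

def site_op(op, i, N=4):
    ops = [I2] * N
    ops[i] = op
    return reduce(np.kron, ops)

N = 4
H = sum(site_op(s, i) @ site_op(s, (i + 1) % N)          # periodic boundary
        for i in range(N) for s in (sx, sy, sz))

E, V = np.linalg.eigh(H)
print("ground state energy:", E[0])                      # -2.0

# Entanglement entropy between sites (1, 2) and (3, 4), cf. (5.27)
psi = V[:, 0].reshape(4, 4)                              # rows (s1 s2), cols (s3 s4)
p = np.linalg.svd(psi, compute_uv=False) ** 2
S12 = -np.sum(p * np.log(p))
print("S_12:", S12, "=", 2 * np.log(2) - 0.5 * np.log(3))
```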

5.3 Construction of Exact MERA Network

We would like to represent \(\left| \psi \right\rangle \) by the hierarchical tensor network (MERA) in Fig. 5.12. The hierarchical network matches quantum critical systems because it can represent the presence of various energy scales through the real-space renormalization processes. This construction can greatly reduce the tensor dimension compared with the MPS representation, which is advantageous for numerical simulation. Furthermore, the hierarchical network contains disentangling tensors, whose presence is necessary for the success of the real-space RG in quantum systems. Because of quantum fluctuation, the spin correlation is essentially nonlocal, and thus maintaining a good approximation using only local transformations is difficult. Before the RG, the disentangling tensors remove the nonlocal entanglement, and thus the real-space RG becomes successful. This is a key ingredient of MERA optimization. Note that such a network without the disentangling tensors is a tree tensor network. The tree tensor network has various applications, as presented earlier in this chapter, even though it may not match the quantum RG for quantum critical systems.

Fig. 5.12

a MERA tensor network (\(N=4\), periodic boundary condition) and b graphical representation of the effective two-site Hamiltonian after the quantum RG processes represented by disentangling and isometry. Here, the two isometries correspond to the effective sites

We now consider a method for explicitly constructing the tensor elements of the MERA network. We first introduce the following representation in terms of singlet and triplet states on each of the two site pairs, (2, 3) and (4, 1):

$$\begin{aligned} \sqrt{12}\left| \psi \right\rangle =3\left| s\right\rangle _{23}\otimes \left| s\right\rangle _{41}+\left| t_{0}\right\rangle _{23}\otimes \left| t_{0}\right\rangle _{41}-\left| t_{+}\right\rangle _{23}\otimes \left| t_{-}\right\rangle _{41}-\left| t_{-}\right\rangle _{23}\otimes \left| t_{+}\right\rangle _{41}, \nonumber \\ \end{aligned}$$
(5.28)

where \(\left| s\right\rangle =\left( \left| \uparrow \downarrow \right\rangle -\left| \downarrow \uparrow \right\rangle \right) /\sqrt{2}\), \(\left| t_{0}\right\rangle =\left( \left| \uparrow \downarrow \right\rangle +\left| \downarrow \uparrow \right\rangle \right) /\sqrt{2}\), \(\left| t_{+}\right\rangle =\left| \uparrow \uparrow \right\rangle \), and \(\left| t_{-}\right\rangle =\left| \downarrow \downarrow \right\rangle \). The purpose of this representation is to introduce nonlocal bases that connect naturally to the disentangling tensors (see Fig. 5.12).

We now introduce the disentangling transformation \(\left| s\right\rangle \rightarrow \left| 00\right\rangle \), \(\left| t_{+}\right\rangle \rightarrow \left| 01\right\rangle \), \(\left| t_{-}\right\rangle \rightarrow \left| 10\right\rangle \), and \(\left| t_{0}\right\rangle \rightarrow \left| 11\right\rangle \). This is a unitary transformation that locally reduces the amount of entanglement. This change is quite powerful for understanding the properties of the MERA network. Then, we have

$$\begin{aligned} \sqrt{12}\left| \psi \right\rangle = & {} 3\left| 00\right\rangle _{23}\otimes \left| 00\right\rangle _{41}+\left| 11\right\rangle _{23}\otimes \left| 11\right\rangle _{41}-\left| 10\right\rangle _{23}\otimes \left| 01\right\rangle _{41}-\left| 01\right\rangle _{23}\otimes \left| 10\right\rangle _{41} \nonumber \\ = & {} 3\left| 0000\right\rangle _{1234}+\left| 1111\right\rangle _{1234}-\left| 1100\right\rangle _{1234}-\left| 0011\right\rangle _{1234} \nonumber \\ = & {} \left( \begin{matrix}\left| 00\right\rangle _{12}&\left| 11\right\rangle _{12}\end{matrix}\right) \left( \begin{matrix}3&{}-1\\ -1&{}1\end{matrix}\right) \left( \begin{matrix}\left| 00\right\rangle _{34}\\ \left| 11\right\rangle _{34}\end{matrix}\right) . \end{aligned}$$
(5.29)

The vector \(\left| aa\right\rangle \) (\(a=0,1\)) can be simply represented as \(\left| a\right\rangle \), which compresses the information. The MERA representation of the ground state corresponds to the decomposition of the coefficient \(\psi ^{s_{1}s_{2}s_{3}s_{4}}\) by a set of functional tensors:

$$\begin{aligned} \psi ^{s_{1}s_{2}s_{3}s_{4}}=\sum _{i_{1},i_{2}}\sum _{j_{1},j_{2},j_{3},j_{4}}T^{i_{1}i_{2}}W_{i_{1}}^{j_{1}j_{2}}W_{i_{2}}^{j_{3}j_{4}}U_{j_{2}j_{3}}^{s_{2}s_{3}}U_{j_{4}j_{1}}^{s_{4}s_{1}}. \end{aligned}$$
(5.30)

Here we assume a spatially uniform network. Thus, the top tensor is defined by

$$\begin{aligned} T=\frac{1}{\sqrt{12}}\left( \begin{matrix}3&{}-1\\ -1&{}1\end{matrix}\right) , \end{aligned}$$
(5.31)

and the isometry tensor is defined by

$$\begin{aligned} W_{i}^{jj^{\prime }}=\delta _{ij}\delta _{jj^{\prime }} \; , \; i=0,1. \end{aligned}$$
(5.32)

The disentangling tensor is defined by

$$\begin{aligned} \left( \begin{matrix}\left| 00\right\rangle \\ \left| 01\right\rangle \\ \left| 10\right\rangle \\ \left| 11\right\rangle \end{matrix}\right) =\left( \begin{matrix}\left| s\right\rangle \\ \left| t_{+}\right\rangle \\ \left| t_{-}\right\rangle \\ \left| t_{0}\right\rangle \end{matrix}\right) =U\left( \begin{matrix}\left| \uparrow \uparrow \right\rangle \\ \left| \uparrow \downarrow \right\rangle \\ \left| \downarrow \uparrow \right\rangle \\ \left| \downarrow \downarrow \right\rangle \end{matrix}\right) \; , \; U=\left( \begin{matrix}0&{}1/\sqrt{2}&{}-1/\sqrt{2}&{}0\\ 1&{}0&{}0&{}0\\ 0&{}0&{}0&{}1\\ 0&{}1/\sqrt{2}&{}1/\sqrt{2}&{}0\end{matrix}\right) . \end{aligned}$$
(5.33)

The matrix elements of U originate from the combination of \(\left| s\right\rangle \) and \(\left| t_{0}\right\rangle \), which resembles the Haar wavelet transformation. Thus, we conclude that the quantum RG can also be regarded as an extension of the scale control techniques that have been developed for classical systems.
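
The construction can be verified numerically by contracting the tensors defined in (5.31)–(5.33) according to (5.30) and comparing with the exact amplitudes in (5.25). The index ordering in the script below is our own choice; the basis convention is \(\uparrow = 0\), \(\downarrow = 1\).

```python
# Contracting the exact MERA network (5.30)-(5.33) and checking it against (5.25).
import numpy as np

r = 1 / np.sqrt(2)
# Disentangler U of (5.33): rows (j j'), columns (s s'), with up = 0, down = 1
U = np.array([[0, r, -r, 0],       # |00> <- |s>
              [1, 0,  0, 0],       # |01> <- |t+>
              [0, 0,  0, 1],       # |10> <- |t->
              [0, r,  r, 0]])      # |11> <- |t0>
U4 = U.reshape(2, 2, 2, 2)         # U4[j, j', s, s']

W = np.zeros((2, 2, 2))            # isometry (5.32): W[i, j, j'] = d_ij d_jj'
W[0, 0, 0] = W[1, 1, 1] = 1.0

T = np.array([[3, -1], [-1, 1]]) / np.sqrt(12)   # top tensor (5.31)

# psi[s1, s2, s3, s4], cf. (5.30)
psi = np.einsum('xy,xab,ycd,bcqr,dasp->pqrs', T, W, W, U4, U4)

# Exact ground state amplitudes from (5.25)
psi_exact = np.zeros((2, 2, 2, 2))
psi_exact[0, 1, 0, 1] = psi_exact[1, 0, 1, 0] = 2
for conf in [(0, 0, 1, 1), (0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 0, 0)]:
    psi_exact[conf] = -1
psi_exact /= np.sqrt(12)

print(np.allclose(psi, psi_exact))   # True
```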

5.4 RG Flow

The effective two-site Hamiltonian after the quantum RG flow using one layer of the disentangling and isometric transformations is obtained by taking the partial expectation value:

$$\begin{aligned} H^\textrm{eff}_{(i_{1}^{\prime }i_{2}^{\prime })(i_{1}i_{2})}=\sum _{s_{1},s_{2},s_{3},s_{4}}\sum _{s_{1}^{\prime },s_{2}^{\prime },s_{3}^{\prime },s_{4}^{\prime }}\left\langle s_{1}^{\prime }s_{2}^{\prime }s_{3}^{\prime }s_{4}^{\prime }\left| H\right| s_{1}s_{2}s_{3}s_{4}\right\rangle U_{i_{1}^{\prime }i_{2}^{\prime }}^{s_{2}^{\prime }s_{3}^{\prime }}U_{i_{2}^{\prime }i_{1}^{\prime }}^{s_{4}^{\prime }s_{1}^{\prime }}U_{i_{1}i_{2}}^{s_{2}s_{3}}U_{i_{2}i_{1}}^{s_{4}s_{1}}. \nonumber \\ \end{aligned}$$
(5.34)

The matrix representation of the effective Hamiltonian and vector representation of the top tensor are given by

$$\begin{aligned} H^\textrm{eff}=\left( \begin{matrix}-3/2&{}1/2&{}1/2&{}-1/2\\ 1/2&{}0&{}0&{}1/2\\ 1/2&{}0&{}0&{}1/2\\ -1/2&{}1/2&{}1/2&{}1/2\end{matrix}\right) \; , \; \left| T\right\rangle =\frac{1}{\sqrt{12}}\left( \begin{matrix}3\\ -1\\ -1\\ 1\end{matrix}\right) , \end{aligned}$$
(5.35)

respectively, where the basis set is \(\left| 00\right\rangle \), \(\left| 01\right\rangle \), \(\left| 10\right\rangle \), and \(\left| 11\right\rangle \). The ground state energy is given by

$$\begin{aligned} E=\left\langle T\left| H^\textrm{eff}\right| T\right\rangle =\left\langle \psi \left| H\right| \psi \right\rangle =-2J. \end{aligned}$$
(5.36)

Here, the eigenvalues of \(H^\textrm{eff}\) are \(E=-2J, 0, 0, J\).

We now calculate the bipartite entanglement entropy after the RG, which is defined by the entanglement entropy between the two effective sites (isometry tensors). The reduced density matrix is defined by

$$\begin{aligned} \rho _{R}=tr_{L}\left| T\right\rangle \left\langle T\right| =\frac{5}{6}\left| 0\right\rangle \left\langle 0\right| -\frac{1}{3}\left| 0\right\rangle \left\langle 1\right| -\frac{1}{3}\left| 1\right\rangle \left\langle 0\right| +\frac{1}{6}\left| 1\right\rangle \left\langle 1\right| =\left( \begin{matrix}5/6&{}-1/3\\ -1/3&{}1/6\end{matrix}\right) . \nonumber \\ \end{aligned}$$
(5.37)

Here, the eigenvalues of \(\rho _{R}\) are \(\left( 3\pm 2\sqrt{2}\right) /6\). The entropy is then calculated as

$$\begin{aligned} S_{R}=-\text {Tr}_{R}\left( \rho _{R}\log \rho _{R}\right) =\log 6 - \frac{2\sqrt{2}}{3}\log \left( 3+2\sqrt{2}\right) =0.12984\cdots . \end{aligned}$$
(5.38)

Thus, we determine that \(S_{R}<S_{12}\). Hence, the disentangling procedure actually reduces the entanglement entropy, as discussed in the original paper by Vidal [21].
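
This reduction can be confirmed with a few lines (our own check of (5.37)–(5.38)):

```python
# Reduced density matrix of the top tensor and the entropy after one RG step.
import numpy as np

T = np.array([[3, -1], [-1, 1]]) / np.sqrt(12)   # top tensor as a 2 x 2 matrix
rho_R = T.T @ T                                  # trace over the left effective site
print(rho_R)                                     # [[5/6, -1/3], [-1/3, 1/6]]

lam = np.linalg.eigvalsh(rho_R)                  # (3 -+ 2 sqrt(2)) / 6
S_R = -np.sum(lam * np.log(lam))
print(S_R)                                       # 0.12984..., smaller than S_12
```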

5.5 Concluding Remarks

In this section, we presented the analytic properties of the MERA network to better understand the nature of the quantum RG. Although general cases with a larger N are still difficult, the present toy model largely demonstrates how the RG occurs in quantum cases. The practical use of a nonlocal basis to decompose the wave function is key to constructing the tensor network.

6 Summary

In this chapter, we presented two applications of tensor networks for tensor data processing and a discussion of the underlying mechanism of the success of a tensor network related to the compression of quantum information (MERA). Regarding further applications, parameter compression of neural networks by tensor networks is also interesting [10]; however, we could not introduce this here due to space limitations. As discussed in this chapter, research on tensor data processing using tensor networks is promising, and its future development is necessary to support next-generation mobility technologies.