1 Introduction

In wavefunction methods of quantum chemistry, one aims to directly approximate the wavefunction of the electrons in a given molecular system. In the many-electron case, these are defined on extremely high-dimensional spaces. In addition, due to the fermionic nature of electrons, these wavefunctions need to respect certain antisymmetry requirements. Post–Hartree–Fock methods are an established class of wavefunction methods based on approximations of wavefunctions by Slater determinants, that is, by antisymmetrized tensor products of single-electron basis functions on \(\mathbb {R}^3\). With a judicious choice of such lower-dimensional basis functions, called orbitals, these methods can achieve high-accuracy approximations of the wavefunctions corresponding to the lowest-energy states of the system. This is achieved essentially by exploiting near-sparsity of wavefunctions in the basis of all Slater determinants formed from the orbitals.

However, for certain types of problems, for instance strongly correlated systems with several competing states of lowest energy, these classical methods typically fail to yield good approximations. Thus more flexible data-sparse parametrizations of the linear combinations of Slater determinants that can be formed from a given finite set of orbitals are of interest. An elegant way of representing such linear combinations of antisymmetric functions is the formalism of second quantization, where wavefunctions are represented in terms of the occupation of each orbital by a particle. With respect to a sequence of orthonormal orbitals \(\{ \phi _k\}_{k\in \mathbb {N}}\), this leads to a representation of the wavefunction by occupation numbers in \(({\mathbb {C}}^2)^\infty \), corresponding to an occupied and an unoccupied state for each orbital. The corresponding space of functions is called Fock space.

For electrons with Coulomb interaction in an external potential V, one has the corresponding representation of the Hamiltonian acting on occupation number tensors,

$$\begin{aligned} \varvec{H} = \sum _{i,j} t_{ij} \varvec{a}_{i}^* \varvec{a}_{j} + \sum _{i,j,k,l} v_{ijkl} \varvec{a}^*_{i} \varvec{a}^*_{j} \varvec{a}_{k} \varvec{a}_{l} \end{aligned}$$

with coefficient tensors \((t_{ij})\) and \((v_{ijkl})\) depending on the orbitals, in terms of the creation operators \(\varvec{a}^*_i\) and annihilation operators \(\varvec{a}_i\). These can be thought of as switching particles from the unoccupied to the occupied state of orbital i or back, respectively. The antisymmetry of wavefunctions corresponds to the anticommutation relations

$$\begin{aligned} \varvec{a}_{i} \varvec{a}^*_{j} + \varvec{a}^*_{j} \varvec{a}_{i} = \delta _{ij}, \quad \varvec{a}^*_{i} \varvec{a}^*_{j} + \varvec{a}^*_{j} \varvec{a}^*_{i} = \varvec{a}_{i} \varvec{a}_{j} + \varvec{a}_{j} \varvec{a}_{i} = 0\,. \end{aligned}$$

The second-quantized representation is particularly suitable for the application of low-rank tensor formats such as matrix product states (abbreviated MPS), also known as tensor trains (abbreviated TT) in the numerical analysis context, or the more general tree tensor networks (or hierarchical tensors); see [4, 16, 28]. Whereas the implementation of wavefunction antisymmetry in such tensor formats is problematic in the real-space representation of wavefunctions, this does not present a problem in the second-quantized representation: the antisymmetry properties are encoded in the representation of operators, and the corresponding occupation numbers describing the wavefunctions can be directly approximated in low-rank tensor formats. However, in contrast to real-space approximations of wavefunctions [15], where the number of electrons is tied to the spatial dimensionality of the problem, this particle number is not fixed in the second-quantized formulation and thus needs to be prescribed explicitly.

Prescribing a number of N particles amounts to restricting the eigenvalue problem for \(\varvec{H}\) to the subspace of those occupation numbers that are also eigenvectors of the particle number operator

$$\begin{aligned} \varvec{P} = \sum _{i} \varvec{a}_i^* \varvec{a}_i \end{aligned}$$

with eigenvalue N. The particle number constraint does not need to be implemented explicitly: as we investigate in detail in this work, every particle number eigenspace corresponds to a certain block-sparse structure in the cores of MPS. This fact has a long history in the physics literature (see, e.g., [5, 8, 26, 30, 33, 34]), where such block sparsity is usually derived from gauge symmetries, such as U(1) symmetry corresponding to particle number conservation. Here we use elementary linear algebra to arrive at this block structure, which to the best of our knowledge has not received any attention thus far in a mathematical context.

Block sparsity can not only be used to build the particle number constraint into the low-rank tensor representations, but it can also be exploited to reduce the costs of operations on MPS. We also consider the implications for analogous representations of linear operators acting on MPS, which are called matrix product operators (MPOs). As we show, these can be applied in a form that preserves their low-rank structure and at the same time maintains the block structure of MPS.

The existence of such block structures is commonly exploited in applications of MPS in physics for solving eigenvalue problems for general Hamiltonians with one- and two-particle interactions as in (1.1); see, for instance, [13, 17, 27, 31]. In these applications, the focus is mainly on density matrix renormalization group (DMRG) algorithms [33, 40], which operate locally on components of the MPS. The block structures of the MPS in this case appear as block diagonal structures of density matrices computed from the MPS. However, DMRG schemes are known to fail in certain circumstances [9], a fact that is related to their local mode of operation.

For designing eigenvalue solvers for MPS that can be guaranteed to converge, an important building block is the eigenvalue residual \(\varvec{H}\varvec{x} -\langle \varvec{H}\varvec{x},\varvec{x}\rangle \varvec{x}\). For its efficient evaluation for an MPO representation of \(\varvec{H}\) and an MPS representation of \(\varvec{x}\), the respective global block structures of these quantities that we consider here become important. In addition to describing the block structure-preserving action of \(\varvec{H}\), we also show that the representation rank of \(\varvec{H}\) can be substantially reduced if the tensor \((v_{ijkl})\) satisfies certain sparsity conditions that can be satisfied with a suitable choice of orbitals.

As one main contribution of this work, we thus consider from the point of view of numerical linear algebra the use of block-sparse MPS as a means of enforcing particle number constraints in solving eigenvalue problems as they arise in quantum chemistry. In particular, we consider the realization of basic operations on block-structured MPS, how Hamiltonians acting on MPS in a compatible block-structured form can be implemented, and their respective computational complexity. We also consider some basic effects of the block structure of MPS on the convergence of eigensolvers.

More generally, we show that a similar block structure is present whenever MPS (or tensor trains) are restricted to an eigenspace of a diagonal operator with a certain Laplacian-type structure, with the particle number operator as a particular example. Without explicitly enforcing the block structure, in exact arithmetic such a constraint is also preserved by many operations on MPS; due to issues of numerical stability, however, this is in general no longer true in numerical computations: when working on full MPS without explicit block structure, the particle number will in general accumulate numerical errors over the course of iterative schemes.

The outline of this paper is as follows: In Sect. 2, we introduce basic notions and notation of MPS. In Sect. 3, we consider the block structure of MPS under particle number constraints and some of their consequences, and we consider the realization of standard operations on MPS exploiting this block structure in Sect. 4. In Sect. 5, we consider low-rank representations of one- and two-electron operators in Hamiltonians as well as their interaction with block-structured MPS. Finally, in Sect. 6, we discuss basic implications for iterative eigensolvers and numerical illustrations.

2 Preliminaries

Since we are mainly interested in real-valued Hamiltonians as they arise in molecular systems, we restrict ourselves to real-valued occupation numbers in \((\mathbb {R}^2)^\infty \). However, the following considerations immediately generalize to the complex-valued case. We consider Fock space restricted to a fixed number \(K \in \mathbb {N}\) of orbitals, corresponding to occupation numbers in \({\mathcal {F}}^K {:}{=} (\mathbb {R}^2)^{K}\), which we regard as tensors of order K with indices \(\alpha \in \{ 0, 1\}^K\). This space is spanned by the unit vectors \(e^\alpha = e^{\alpha _1} \otimes \cdots \otimes e^{\alpha _K}\), where \(e^{\alpha _k}=(\delta _{\alpha _k,\beta })_{\beta =0,1}\) are Kronecker vectors for \(k = 1,\ldots , K\).

2.1 Matrix product states and operators

In our notation, we follow [3, 22] with some adaptations. The matrix product state (or tensor train) representation of \(\varvec{x} \in {\mathcal {F}}^K\) with ranks \(r_1, \ldots , r_{K-1} \in \mathbb {N}_0\) reads

$$\begin{aligned} \varvec{x}_\alpha = \varvec{x}_{\alpha _1, \ldots , \alpha _K} = \sum _{j_1=1}^{r_1} \cdots \sum _{j_{K-1}=1}^{r_{K-1}} X_1(j_0, \alpha _1, j_1) X_2(j_1, \alpha _2, j_2) \, \cdots \, X_K(j_{K-1}, \alpha _K, j_K), \end{aligned}$$

where for notational reasons we set \(j_0 = j_K = 1\), \(r_0 = r_K = 1\). For the third-order component tensors \(X_k\) in such a representation, called cores, we write

$$\begin{aligned} \mathsf {X} = (X_1,\ldots , X_K),\quad X_k = \bigl ( X_k(j_{k-1}, \alpha _k, j_k) \bigr )_{\begin{array}{c} j_{k-1} = 1,\ldots , r_{k-1}, \\ \alpha _k = 0,1, \\ j_k = 1,\ldots ,r_k \end{array}} \,. \end{aligned}$$

For linear mappings on \({\mathcal {F}}^K\), we have an analogous matrix product operator (MPO) representation

$$\begin{aligned} \varvec{M}_{\alpha _1, \ldots , \alpha _K,\beta _1, \ldots , \beta _K} = \sum _{j_1=1}^{r_1} \cdots \sum _{j_{K-1}=1}^{r_{K-1}} M_1(j_0, \alpha _1, \beta _1, j_1) \, \cdots \, M_K(j_{K-1}, \alpha _K, \beta _K , j_K),\nonumber \\ \end{aligned}$$

where we similarly write \(\mathsf {M} = (M_1,\ldots ,M_K)\).

For specifying the k-th component of an MPS or MPO explicitly, we use the notation

$$\begin{aligned} X^{ [j_{k-1}, j_k] }_k = \bigl ( X_k(j_{k-1}, \alpha _k, j_k) \bigr )_{\alpha _k=0,1}, \quad M^{ [j_{k-1}, j_k] }_k = \bigl ( M_k(j_{k-1}, \alpha _k, \beta _k, j_k) \bigr )^{\beta _k=0,1}_{\alpha _k=0,1}\,. \end{aligned}$$

Note that here and in the following, we write row indices in subscript and column indices in superscript, indicating that \(M_k^{ [ j_{k-1}, j_k ] }\) is a matrix. In terms of the vectors \(X_k^{[j_{k-1}, j_k]} \in \mathbb {R}^{\{0,1\}}\), a core \(X_k\) is then given by the rankwise block representation

$$\begin{aligned} X_k = \left[ \begin{array}{ccc} X_k^{[1,1]} &{} \cdots &{} X_k^{[1,r_k]} \\ \vdots &{} \ddots &{} \vdots \\ X_k^{[r_{k-1},1]} &{} \cdots &{} X_k^{[r_{k-1},r_k]} \end{array}\right] , \end{aligned}$$

with the analogous notation for the matrices \( M_k^{ [j_{k-1}, j_k] }\).

We define multiplication of a core \(X_k\) by a matrix G of appropriate size from the left or right on its indices \(j_{k-1}\) or \(j_k\), respectively, by

$$\begin{aligned} \begin{aligned} (GX_k)(j_{k-1}, \alpha _k, j_k)&= \sum _{j' = 1}^{r_{k-1}} G_{j_{k-1},j'} X_k(j', \alpha _k, j_k) , \\ (X_k G)(j_{k-1},\alpha _k,j_k)&= \sum _{j' = 1}^{r_{k}} X_k(j_{k-1}, \alpha _k, j') G_{j', j_k}, \end{aligned} \end{aligned}$$

with the analogous definition for the components \(M_k\) in the representation of operators.

Complementing (2.3), we also introduce

$$\begin{aligned} X^{ \{ \alpha _k \} }_k = \bigl ( X_k(j_{k-1}, \alpha _k, j_k) \bigr )_{j_{k-1}=1,\ldots ,r_{k-1}}^{j_k=1,\ldots ,r_k} , \quad M_k^{ \{ \alpha _k, \beta _k \} } = \bigl ( M_k(j_{k-1}, \alpha _k, \beta _k, j_k) \bigr )_{j_{k-1}=1,\ldots ,r_{k-1}}^{j_k=1,\ldots ,r_k} \,. \end{aligned}$$

Again, the subscripts and superscripts indicate that \(X^{ \{ \alpha _k \} }_k\) and \(M_k^{ \{ \alpha _k, \beta _k \} }\) are matrices.

For a compact way of writing (2.1) in terms of the cores (2.1), we introduce the strong Kronecker product,

$$\begin{aligned} ( X_1 {{\,\mathrm{\bowtie }\,}}X_2 )^{ \{ \alpha _1 \alpha _2 \} } = X_1^{ \{ \alpha _1 \} } X_2^{ \{ \alpha _2 \} }, \quad ( M_1 {{\,\mathrm{\bowtie }\,}}M_2 )^{ \{ \alpha _1 \alpha _2, \beta _1 \beta _2 \} } = M_1^{ \{ \alpha _1 , \beta _1\} } M_2^{ \{ \alpha _2, \beta _2 \} }. \end{aligned}$$

For example, for two cores X and Y of ranks \(2\times 2\), we obtain

$$\begin{aligned}&\left[ \begin{array}{cc} X^{[1,1]} &{} X^{[1,2]}\\ X^{[2,1]} &{} X^{[2,2]}\\ \end{array}\right] {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{cc} Y^{[1,1]} &{} Y^{[1,2]}\\ Y^{[2,1]} &{} Y^{[2,2]}\\ \end{array} \right] \qquad \\&\qquad = \left[ \begin{array}{cc} X^{[1,1]} \otimes Y^{[1,1]} + X^{[1,2]} \otimes Y^{[2,1]} &{} X^{[1,1]} \otimes Y^{[1,2]} + X^{[1,2]} \otimes Y^{[2,2]}\\ X^{[2,1]} \otimes Y^{[1,1]} + X^{[2,2]} \otimes Y^{[2,1]} &{} X^{[2,1]} \otimes Y^{[1,2]} + X^{[2,2]} \otimes Y^{[2,2]}\\ \end{array} \right] \,. \end{aligned}$$
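The strong Kronecker product can be illustrated by a short NumPy sketch. This is only an illustration under assumptions that are not part of the text: cores are stored as arrays of shape \((r_{k-1}, 2, r_k)\), and the names `strong_kron` and `assemble` as well as the example ranks are chosen for this example.

```python
import numpy as np

def strong_kron(X, Y):
    """Strong Kronecker product of two MPS cores.

    X has shape (r0, n1, r1) and Y has shape (r1, n2, r2). The result has
    shape (r0, n1 * n2, r2): rank indices are contracted as in a matrix
    product, mode indices are combined as in a Kronecker product.
    """
    r0, n1, r1 = X.shape
    r1_y, n2, r2 = Y.shape
    assert r1 == r1_y, "inner ranks must match"
    return np.einsum("iaj,jbk->iabk", X, Y).reshape(r0, n1 * n2, r2)

def assemble(cores):
    """Multiply all cores and drop the singleton boundary ranks."""
    Z = cores[0]
    for X in cores[1:]:
        Z = strong_kron(Z, X)
    return Z.reshape([X.shape[1] for X in cores])

rng = np.random.default_rng(0)
ranks = [1, 2, 3, 2, 1]                      # r_0, ..., r_K with K = 4
cores = [rng.standard_normal((ranks[k], 2, ranks[k + 1])) for k in range(4)]
x = assemble(cores)

# A single entry of x is the product of the matrices X_k^{alpha_k}.
alpha = (1, 0, 1, 1)
entry = np.eye(1)
for X, a in zip(cores, alpha):
    entry = entry @ X[:, a, :]
assert np.isclose(x[alpha], entry[0, 0])
```

Storing the mode index in the middle axis keeps the slices `X[:, a, :]` equal to the matrices \(X_k^{\{\alpha_k\}}\), so that each tensor entry is an ordinary matrix product.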

With this notation, we have

$$\begin{aligned} {[}\varvec{x}] = X_1 {{\,\mathrm{\bowtie }\,}}X_2 {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_K\,, \end{aligned}$$

where the block \([\varvec{x}] \in \mathbb {R}^{ 1 \times \{0,1\}^K \times 1}\) has leading and trailing dimensions of mode size 1. For simplicity, we ignore such singleton dimensions, that is, we identify \([\varvec{x}]\) with the tensor \(\varvec{x} \in \mathbb {R}^{\{0,1\}^K}\) of order K. With this identification, for the representation mapping \(\tau \), we write (2.1) as

$$\begin{aligned} \tau (\mathsf {X}) := X_1 {{\,\mathrm{\bowtie }\,}}X_2 {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_K, \end{aligned}$$

where we have \(\varvec{x} = \tau (\mathsf {X})\). We use partial representation mappings that assemble the first or last cores of a matrix product state. We set

$$\begin{aligned} \tau _{k,j}^{<}{(\mathsf {X})} = \Bigl ( ( X_1 {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_{k-1} )_{1,\alpha ,j} \Bigr )_{\alpha \in \{0,1\}^{k-1}}, \quad \tau ^{\le }_{k,j} (\mathsf {X}) = \Bigl ( ( X_1 {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_{k} )_{1,\alpha ,j} \Bigr )_{\alpha \in \{0,1\}^{k}}, \end{aligned}$$

and analogously

$$\begin{aligned} \tau _{k,j}^{>}{(\mathsf {X})} = \Bigl ( ( X_{k+1} {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_{K} )_{j,\alpha ,1} \Bigr )_{\alpha \in \{0,1\}^{K-k}}, \quad \tau ^{\ge }_{k,j} (\mathsf {X}) = \Bigl ( ( X_{k} {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_{K} )_{j,\alpha ,1} \Bigr )_{\alpha \in \{0,1\}^{K-k+1}} \,. \end{aligned}$$

The representation \(\mathsf {X}\) is called left-orthogonal if the vectors \(\tau _{K,j}^{\tiny {<}}{(\mathsf {X})}\) are orthonormal for \(j=1,\ldots ,r_{K-1}\), and right-orthogonal if \(\tau _{1,j}^{>}{(\mathsf {X})}\) are orthonormal for \(j=1,\ldots ,r_{1}\).

As a second product operation between an MPS core \(X_k\) of ranks r and an MPO core \(M_k\) of ranks \(r'\) (or analogously between two MPO cores), with \(j_k = 1,\ldots ,r_k\), \(j_k'=1,\ldots ,r_k'\) for each k, we introduce the mode core product,

$$\begin{aligned} \left( M_k \bullet X_k \right) ^{ [ r_{k-1}'(j_{k-1}-1) + j'_{k-1}, \,r_k' (j_k-1) + j_k'] } = M_k^{[j_{k-1}', j_k']} X_k^{ [j_{k-1}, j_k] } \,. \end{aligned}$$

For a matrix product operator \(\varvec{M} = \tau (\mathsf {M})\) and a matrix product state \(\varvec{x} = \tau (\mathsf {X})\), we have

$$\begin{aligned} \varvec{M} \varvec{x} = \tau ( M_1 \bullet X_1, \ldots , M_K \bullet X_K) \,. \end{aligned}$$
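The identity above can be checked numerically on small random examples. The following sketch assumes MPS cores of shape \((r_{k-1}, 2, r_k)\) and MPO cores of shape \((r'_{k-1}, 2, 2, r'_k)\); the helper names `mode_core_product`, `mps_to_vector` and `mpo_to_matrix` are hypothetical and only serve this illustration.

```python
import numpy as np

def mode_core_product(M, X):
    """M has shape (r0', 2, 2, r1'), X has shape (r0, 2, r1)."""
    rp0, n, _, rp1 = M.shape
    r0, _, r1 = X.shape
    # p, q: MPO rank indices; i, j: MPS rank indices; b: contracted mode index.
    Y = np.einsum("pabq,ibj->ipajq", M, X)
    # Combined rank index: MPS index slow, MPO index fast (0-based i * r' + p).
    return Y.reshape(r0 * rp0, n, r1 * rp1)

def mps_to_vector(cores):
    """Dense vector of length 2^K represented by the MPS cores."""
    Z = cores[0]
    for X in cores[1:]:
        Z = np.einsum("iaj,jbk->iabk", Z, X).reshape(Z.shape[0], -1, X.shape[2])
    return Z[0, :, 0]

def mpo_to_matrix(cores):
    """Dense 2^K x 2^K matrix represented by the MPO cores."""
    Z = cores[0]
    for M in cores[1:]:
        Z = np.einsum("pabq,qcdr->pacbdr", Z, M)
        p, a, c, b, d, r = Z.shape
        Z = Z.reshape(p, a * c, b * d, r)
    return Z[0, :, :, 0]

rng = np.random.default_rng(1)
K = 3
r = [1, 3, 2, 1]      # MPS ranks
rp = [1, 2, 2, 1]     # MPO ranks
mps = [rng.standard_normal((r[k], 2, r[k + 1])) for k in range(K)]
mpo = [rng.standard_normal((rp[k], 2, 2, rp[k + 1])) for k in range(K)]

product = [mode_core_product(M, X) for M, X in zip(mpo, mps)]
lhs = mpo_to_matrix(mpo) @ mps_to_vector(mps)   # dense reference
rhs = mps_to_vector(product)                    # core-wise product
assert np.allclose(lhs, rhs)
```

Note that the ranks of the result are the products \(r_k r_k'\), which is why applying an operator is usually followed by a rank truncation.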

Finally, we introduce a lift product of a matrix \(W \in \mathbb {R}^{r \times r'}\) with a matrix \(M \in \mathbb {R}^{2 \times 2}\) or vector \(x \in \mathbb {R}^2\), that is to be understood as a Kronecker product with reordered indices,

$$\begin{aligned}&\uparrow : \mathbb {R}^{r \times r'} \times \mathbb {R}^{2\times 2} \rightarrow \mathbb {R}^{r \times 2 \times 2 \times r'},&(W \uparrow M)_{j \alpha \beta j'}&= W_{j j'}M_{\alpha \beta }, \\&\uparrow : \mathbb {R}^{r \times r'} \times \mathbb {R}^{2} \rightarrow \mathbb {R}^{r \times 2 \times r'},&(W \uparrow x)_{j \alpha j'}&= W_{j j'}x_{\alpha }, \end{aligned}$$

thus resulting in an MPO core or an MPS core, respectively.
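As an illustration of the lift product, the following sketch assembles a rank-two MPO for a sum of single-site operators \(\sum_i I \otimes \cdots \otimes N \otimes \cdots \otimes I\) with \(N = \mathrm{diag}(0,1)\) and compares it with the dense matrix. The particular core layout and the names `lift`, `unit` and `site_sum_mpo` are assumptions made for this example and are not taken from the text.

```python
import numpy as np
from functools import reduce

def lift(W, M):
    """Lift product: result[j, a, b, j'] = W[j, j'] * M[a, b] (an MPO core)."""
    return np.einsum("jk,ab->jabk", W, M)

def unit(row, col, shape):
    """Matrix of the given shape with a single entry 1 at (row, col)."""
    E = np.zeros(shape)
    E[row, col] = 1.0
    return E

I2 = np.eye(2)
N = np.diag([0.0, 1.0])

def site_sum_mpo(K):
    """Cores [I, N], [[I, N], [0, I]], ..., [N; I] for K >= 2 sites."""
    first = lift(unit(0, 0, (1, 2)), I2) + lift(unit(0, 1, (1, 2)), N)
    mid = (lift(unit(0, 0, (2, 2)), I2) + lift(unit(0, 1, (2, 2)), N)
           + lift(unit(1, 1, (2, 2)), I2))
    last = lift(unit(0, 0, (2, 1)), N) + lift(unit(1, 0, (2, 1)), I2)
    return [first] + [mid] * (K - 2) + [last]

def mpo_to_matrix(cores):
    """Dense 2^K x 2^K matrix represented by the MPO cores."""
    Z = cores[0]
    for M in cores[1:]:
        Z = np.einsum("pabq,qcdr->pacbdr", Z, M)
        p, a, c, b, d, r = Z.shape
        Z = Z.reshape(p, a * c, b * d, r)
    return Z[0, :, :, 0]

K = 4
P_mpo = mpo_to_matrix(site_sum_mpo(K))
P_dense = sum(reduce(np.kron, [N if l == i else I2 for l in range(K)])
              for i in range(K))
assert np.allclose(P_mpo, P_dense)
```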

2.2 Singular value decomposition

For \(k=1,\ldots ,K\), using \((\alpha _1,\ldots ,\alpha _k)\) as row index and \((\alpha _{k+1},\ldots , \alpha _K)\) as column index, one obtains corresponding matricizations (or unfoldings) of a tensor \(\varvec{x} \in {\mathcal {F}}^K\). Any representation \(\mathsf {X}\) of \(\varvec{x}\) can be transformed by operations on its cores such that these matricizations are in SVD form. This representation is known as Vidal decomposition [38] or as tensor train SVD (TT-SVD) [28] and also arises as a special case of the hierarchical SVD [14] of more general tree tensor networks.

Specifically, \(\mathsf {X}\) with \(\varvec{x}=\tau (\mathsf {X})\) is in left-orthogonal TT-SVD form if for \(k = 1,\ldots , K-1\), \(\{ \tau ^{\le }_{k,j} (\mathsf {X}) \}_{j = 1,\ldots , r_{k}}\) are orthonormal and \(\{ \tau _{k,j}^{>}{(\mathsf {X})}\}_{j = 1,\ldots , r_{k}}\) are orthogonal, where \(\sigma _{k,j}(\varvec{x}) := \Vert \tau _{k,j}^{>}{(\mathsf {X})}\Vert _2\) with \(\sigma _{k,1}(\varvec{x}) \ge \ldots \ge \sigma _{k,r_k}(\varvec{x})\) are the singular values of the k-th matricization. Analogously, \(\mathsf {X}\) is in right-orthogonal TT-SVD form if for \(k = 1,\ldots , K-1\), \(\{ \tau ^{\le }_{k,j} (\mathsf {X}) \}_{j = 1,\ldots , r_{k}}\) are orthogonal with \(\Vert \tau ^{\le }_{k,1} (\mathsf {X})\Vert _2 \ge \ldots \ge \Vert \tau ^{\le }_{k,r_k} (\mathsf {X})\Vert _2\) and \(\{ \tau ^{>}_{k,j} (\mathsf {X}) \}_{j = 1,\ldots , r_{k}}\) are orthonormal. These forms can be obtained by the scheme given in [28, Algorithm 1].

The rank truncation of either SVD form yields quasi-optimal approximations of lower ranks [14, 28]: let \(\mathsf {X}\) be given in TT-SVD form with ranks \(r_1,\ldots ,r_{K-1}\), and denote by \({\text {trunc}}_{s_1,\ldots ,s_{K-1}}(\mathsf {X})\) its truncation to ranks \(s_k\le r_k\), \(k=1,\ldots ,K-1\), then

$$\begin{aligned} \begin{aligned}&\Vert \tau (\mathsf {X}) - \tau ({\text {trunc}}_{s_1,\ldots ,s_{K-1}}(\mathsf {X})) \Vert _2 \le \biggl (\sum _{k=1}^{K-1} \sum _{j=s_k+1}^{r_k} \sigma _{k,j}^2 \biggr )^{\frac{1}{2}} \\&\qquad \qquad \qquad \le \sqrt{K-1} \min \bigl \{ \Vert \tau (\mathsf {X}) - \tau (\mathsf {Y})\Vert _2 :\mathsf {Y} \text { of ranks } \, s_1,\ldots , s_{K-1} \bigr \}. \end{aligned} \end{aligned}$$

Using the above error bound in terms of the matricization singular values, one obtains an approximation \({\text {trunc}}_\varepsilon (\mathsf {X})\) with \(\Vert \tau (\mathsf {X}) - \tau ({\text {trunc}}_\varepsilon (\mathsf {X}))\Vert _2 \le \varepsilon \) for any \(\varepsilon >0\) by truncating ranks according to the smallest singular values.
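The first inequality of the error bound can be observed in a small experiment. The sketch below is an assumption-laden illustration rather than the procedure used in this paper: it starts from a full tensor and applies sequential truncated SVDs in the spirit of [28, Algorithm 1], and the names `tt_svd`, `assemble` and `discarded` are hypothetical.

```python
import numpy as np

def tt_svd(x, max_ranks):
    """Sequential truncated SVDs of a full tensor; returns MPS cores."""
    dims = x.shape
    cores, r_prev = [], 1
    C = x.reshape(1, -1)
    for k in range(len(dims) - 1):
        C = C.reshape(r_prev * dims[k], -1)
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        # Keep at most max_ranks[k] singular values, dropping numerical zeros.
        r = min(max_ranks[k], int(np.sum(s > 1e-14 * s[0])))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        C = s[:r, None] * Vt[:r]
        r_prev = r
    cores.append(C.reshape(r_prev, dims[-1], 1))
    return cores

def assemble(cores):
    """Full tensor represented by the cores."""
    Z = cores[0]
    for X in cores[1:]:
        Z = np.einsum("iaj,jbk->iabk", Z, X).reshape(1, -1, X.shape[2])
    return Z.reshape([X.shape[1] for X in cores])

def discarded(x, k, s_k):
    """Sum of squared singular values beyond s_k of the k-th matricization."""
    sigma = np.linalg.svd(x.reshape(2 ** k, -1), compute_uv=False)
    return np.sum(sigma[s_k:] ** 2)

rng = np.random.default_rng(2)
K = 6
x = rng.standard_normal((2,) * K)
s = [2, 3, 4, 3, 2]                       # target ranks s_1, ..., s_{K-1}

cores = tt_svd(x, s)
err = np.linalg.norm(x - assemble(cores))
bound = np.sqrt(sum(discarded(x, k, s[k - 1]) for k in range(1, K)))
assert err <= bound * (1 + 1e-10)
```

Working from the full tensor is only feasible for small K; for tensors already given in MPS form, the same truncation is carried out core by core.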

2.3 Tangent space projection

It is well known that MPS of fixed multilinear rank constitute an embedded smooth submanifold of the tensor space \({\mathcal {F}}^K\) [20]. The tangent space of this manifold can be explicitly characterized, and the ranks of the tangent vectors at \(\varvec{x}\) in MPS representation are at most twice the ranks of \(\varvec{x}\).

Let \(\varvec{x} = \tau (\mathsf {U}) = \tau (\mathsf {V})\) where \(\mathsf {U} = (U_1,\cdots ,U_K)\) is in left- and \(\mathsf {V} = (V_1,\cdots ,V_K)\) is in right-orthogonal form. The projection operator onto the tangent space at \(\varvec{x}\) is given by

$$\begin{aligned} \varvec{Q}_{\varvec{x}}= \sum _{k=1}^K \bigl ( \varvec{Q}_{\varvec{x}}^{k,1} - \varvec{Q}_{\varvec{x}}^{k,2} \bigr ), \end{aligned}$$

where, identifying mappings with their representation matrices, for \(k = 1,\ldots ,K\),

$$\begin{aligned} \varvec{Q}_{\varvec{x}}^{k,1} = \biggl (\sum _{j=1}^{r_{k-1}}\tau ^{<}_{k,j} (\mathsf {U})\, \langle \tau ^{<}_{k,j} (\mathsf {U}),\, \cdot \, \rangle \biggr ) \otimes I \otimes \biggl (\sum _{j=1}^{r_{k}}{\tau ^{>}_{k,j} (\mathsf {V})}\, \langle {\tau ^{>}_{k,j} (\mathsf {V})},\,\cdot \,\rangle \biggr ), \end{aligned}$$

for \(k = 1,\ldots ,K-1\),

$$\begin{aligned} \varvec{Q}_{\varvec{x}}^{k,2} = \biggl (\sum _{j=1}^{r_{k}}\tau ^{\le }_{k,j} (\mathsf {U})\,\langle \tau ^{\le }_{k,j} (\mathsf {U}),\,\cdot \,\rangle \biggr ) \otimes \biggl (\sum _{j=1}^{r_{k}}{\tau ^{>}_{k,j} (\mathsf {V})}\,\langle {\tau ^{>}_{k,j} (\mathsf {V})},\,\cdot \,\rangle \biggr ), \end{aligned}$$

and \(\varvec{Q}_{\varvec{x}}^{K,2} = 0\).

2.4 Second quantization

The operators of second quantization can be represented as mappings on \({\mathcal {F}}^K\) as follows: with the elementary components

$$\begin{aligned} S = \begin{pmatrix} 1 &{} 0 \\ 0 &{} -1 \end{pmatrix}, \quad A = \begin{pmatrix} 0 &{} 1 \\ 0 &{} 0 \end{pmatrix}, \quad I = \begin{pmatrix} 1 &{} 0 \\ 0 &{} 1 \end{pmatrix}, \end{aligned}$$

the annihilation operator \(\varvec{a}_i\) on \({\mathcal {F}}^K\) reads

$$\begin{aligned} \varvec{a}_i = \biggl ( \bigotimes _{k=1}^{i-1} S \biggr ) \otimes A \otimes \biggl ( \bigotimes _{k=i+1}^{K} I \biggr ), \end{aligned}$$

and the corresponding creation operator is its adjoint \(\varvec{a}_i^*\). The particle number operator on \({\mathcal {F}}^K\) is given by

$$\begin{aligned} \varvec{P} = \sum _{i=1}^K \varvec{a}_i^* \varvec{a}_i. \end{aligned}$$

In addition, we introduce the truncated versions

$$\begin{aligned} \varvec{P}^{\le }_{k} = \sum _{i = 1}^k \biggl ( \bigotimes _{\ell =1}^{i-1} I \biggr ) \otimes A^*A \otimes \biggl ( \bigotimes _{\ell =i+1}^{k} I \biggr ), \quad \varvec{P}^{>}_{k} = \sum _{i = k+1}^K \biggl ( \bigotimes _{\ell =k+1}^{i-1} I \biggr ) \otimes A^*A \otimes \biggl ( \bigotimes _{\ell =i+1}^{K} I \biggr ), \end{aligned}$$

which act only on the left and right sections, respectively, of a matrix product state.
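The operators of this subsection can be assembled as dense matrices for small K, which allows a direct check of the anticommutation relations from the introduction. This is a minimal sketch; the function names `kron_all` and `annihilation` are chosen for illustration only, and dense matrices of size \(2^K \times 2^K\) are of course only practical for very small K.

```python
import numpy as np
from functools import reduce

S = np.diag([1.0, -1.0])
A = np.array([[0.0, 1.0], [0.0, 0.0]])
I2 = np.eye(2)

def kron_all(factors):
    """Kronecker product of a non-empty list of matrices."""
    return reduce(np.kron, factors)

def annihilation(i, K):
    """Dense matrix of a_i for i = 1, ..., K."""
    return kron_all([S] * (i - 1) + [A] + [I2] * (K - i))

K = 4
a = [annihilation(i, K) for i in range(1, K + 1)]
identity = np.eye(2 ** K)

for i in range(K):
    for j in range(K):
        # a_i a_j^* + a_j^* a_i = delta_ij
        anti = a[i] @ a[j].T + a[j].T @ a[i]
        assert np.allclose(anti, identity if i == j else 0.0)
        # a_i a_j + a_j a_i = 0
        assert np.allclose(a[i] @ a[j] + a[j] @ a[i], 0.0)

# Particle number operator: diagonal, counting the occupied orbitals.
P = sum(ai.T @ ai for ai in a)
occupations = [bin(m).count("1") for m in range(2 ** K)]
assert np.allclose(P, np.diag(occupations))
```

Since \(S^2 = I\), the sign factors cancel in \(\varvec{a}_i^* \varvec{a}_i\), which is why \(\varvec{P}\) reduces to a sum of terms with \(A^*A\) in a single position.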

3 Block structure of matrix product states

In this section we characterize the block sparsity of an MPS \(\mathsf {X}\) such that \(\varvec{x}=\tau (\mathsf {X})\) is an eigenvector of the particle number operator \(\varvec{P}\), or in fact of any operator that shares a certain structural feature of \(\varvec{P}\). We first formulate the result for general tensors \(\varvec{x} \in \mathbb {R}^{n_1\times \cdots \times n_K}\) and then obtain the corresponding result for eigenvectors of \(\varvec{P}\) in \({\mathcal {F}}^K\) as a special case. While the definitions of Sect. 2 are given for tensors in \({\mathcal {F}}^K\) for simplicity, they immediately carry over to general tensors in \(\mathbb {R}^{n_1\times \cdots \times n_K}\) with indices in \(\mathcal {N} = \{0,\ldots ,n_1-1\} \times \cdots \times \{0,\ldots ,n_K-1\}\), where we abbreviate partial index sets for modes \(k_1\) to \(k_2\) by \(\mathcal {N}_{k_1}^{k_2} = \{0,\ldots ,n_{k_1}-1\} \times \cdots \times \{0,\ldots ,n_{k_2}-1\}\).

For \(\varvec{P}\), the block sparsity of \(\mathsf {X}\) that we obtain is of the following form: For each k, the matrices \(X^{\{0\}}_k\) and \(X^{\{1\}}_k\) have block structure with nonzero blocks only on the main diagonal for \(X^{\{0\}}_k\), and only on the first superdiagonal for \(X^{\{1\}}_k\). Intuitively, this can be interpreted as follows: each block corresponds to a certain number of occupied orbitals to the left of k. For \(X^{\{0\}}_k\), this number does not change. For the occupied state, in \(X^{\{1\}}_k\) the positions of the blocks correspond to increasing the number of particles by one.
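The block pattern just described can be tried out numerically. The following sketch builds random cores with exactly this pattern and checks that the resulting tensor is supported on index tuples with N occupied orbitals. It is an illustration under assumptions not made in the text: all interior blocks have the same size, the blocks are filled with random numbers, and the name `block_sparse_mps` is hypothetical.

```python
import numpy as np

def block_sparse_mps(K, N, block_size, rng):
    """Random MPS cores of shape (r_{k-1}, 2, r_k) with particle number N."""
    # Admissible numbers of particles to the left of bond k, for k = 0, ..., K.
    sectors = [list(range(max(0, N - (K - k)), min(k, N) + 1))
               for k in range(K + 1)]
    cores = []
    for k in range(1, K + 1):
        left, right = sectors[k - 1], sectors[k]
        dl = 1 if k == 1 else block_size     # boundary ranks are 1
        dr = 1 if k == K else block_size
        X = np.zeros((dl * len(left), 2, dr * len(right)))
        for a, mu in enumerate(left):
            for beta in (0, 1):
                # beta = 0 keeps the sector mu, beta = 1 moves to mu + 1.
                if mu + beta in right:
                    b = right.index(mu + beta)
                    X[a * dl:(a + 1) * dl, beta, b * dr:(b + 1) * dr] = \
                        rng.standard_normal((dl, dr))
        cores.append(X)
    return cores

def assemble(cores):
    """Full tensor represented by the cores."""
    Z = cores[0]
    for X in cores[1:]:
        Z = np.einsum("iaj,jbk->iabk", Z, X).reshape(1, -1, X.shape[2])
    return Z.reshape([X.shape[1] for X in cores])

K, N = 6, 3
cores = block_sparse_mps(K, N, block_size=2, rng=np.random.default_rng(3))
x = assemble(cores)

# Number of occupied orbitals for every index tuple alpha.
counts = np.indices(x.shape).sum(axis=0)
assert np.all(x[counts != N] == 0.0)     # x is supported on |alpha| = N
assert np.allclose(counts * x, N * x)    # hence P x = N x
```

Here "diagonal" and "superdiagonal" refer to the sector labels \(\mu\), the number of particles to the left of a bond; near the ends of the chain some sectors are inadmissible and the corresponding blocks are simply absent.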

More generally, we obtain such block sparsity for so-called Laplace-like operators [22] on \(\mathbb {R}^{n_1\times \cdots \times n_K}\) of the form

$$\begin{aligned} \varvec{L} = \sum _{k=1}^K \biggl ( \bigotimes _{\ell =1}^{k-1} I \biggr ) \otimes L_k \otimes \biggl ( \bigotimes _{\ell =k+1}^{K} I \biggr ) \end{aligned}$$

with diagonal matrices

$$\begin{aligned} L_k = \text {diag}(\lambda _{k,0},\lambda _{k,1},\ldots ,\lambda _{k,n_k-1}), \end{aligned}$$

where in the case of \(\varvec{P}\), we have \(L_k = A^*A\). The unit vectors \(e^{\alpha _1} \otimes \cdots \otimes e^{\alpha _K}\) are eigenvectors of such \(\varvec{L}\), with eigenvalues given by

$$\begin{aligned} \lambda _\alpha = \sum _{k=1}^K \lambda _{k,\alpha _k}, \quad {\alpha \in \mathcal {N}}. \end{aligned}$$

Remark 3.1

Note that for any \(\varvec{L}\) of the form (3.1a) with general symmetric matrices \(L_k\), there exists \(\varvec{U} = U_1 \otimes \cdots \otimes U_K\) with orthogonal matrices \(U_1,\ldots , U_K\) such that \(\varvec{{\tilde{L}}} = \varvec{U} \varvec{L} \varvec{U}^\top \) also satisfies (3.1b), and thus the following considerations apply to \(\varvec{ {\tilde{L}}}\).

Let \(\varvec{x} = \tau (\mathsf {X})\) as above satisfy \(\varvec{L}\varvec{x} = \lambda \varvec{x}\), \(\varvec{x} \ne 0\). For such \(\lambda \), we define the subset \(I_\lambda \subset \mathcal {N}\) of all \(\alpha \) such that \(\lambda = \lambda _\alpha \). Furthermore, for each \(\alpha \in I_\lambda \) and \(k=1,\ldots ,K-1\) we can split the eigenvalue \(\lambda \) in the form

$$\begin{aligned} \lambda = \lambda _{k,\alpha }^{\le } + \lambda _{k,\alpha }^{>} := \sum _{\ell =1}^k \lambda _{\ell ,\alpha _\ell } +\sum _{\ell =k+1}^K \lambda _{\ell ,\alpha _\ell }. \end{aligned}$$

Using the notation from (2.6), the summands \(\lambda _{k,\alpha }^{\le }, \lambda _{k,\alpha }^{>}\) are eigenvalues of the truncated versions \(\varvec{L}_k^{\le }\) and \(\varvec{L}_k^{>}\) of \(\varvec{L}\). We write \({\mathcal {K}}_{\lambda ,k}\) for the set of all \(\lambda _{k,\alpha }^{\le }\) for given \(\lambda \) and k, that is,

$$\begin{aligned} {\mathcal {K}}_{\lambda ,k} = \left\{ \sum _{\ell =1}^k \lambda _{\ell ,\alpha _\ell } :\alpha \in I_\lambda \right\} {,} \end{aligned}$$

where we have \(\mathcal {K}_{\lambda ,0} = \{0\}\). In full representation, \(\varvec{x}\) necessarily has a certain sparsity pattern, since \(\varvec{x}_\alpha = 0\) if \(\alpha \notin I_\lambda \). We can exploit the invariance

$$\begin{aligned} \begin{aligned} \varvec{x}&= X_1 {{\,\mathrm{\bowtie }\,}}X_2 {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_{K-1} {{\,\mathrm{\bowtie }\,}}X_K \\&= X_1 G_1 {{\,\mathrm{\bowtie }\,}}G_1^{-1} X_2 G_2 {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}G_{K-2}^{-1} X_{K-1} G_{K-1} {{\,\mathrm{\bowtie }\,}}G_{K-1}^{-1} X_K \end{aligned} \end{aligned}$$

for invertible \(G_k \in \mathbb {R}^{r_k \times r_k}\) for \(k = 1,\ldots ,K-1\) in order to obtain a block structure for the component tensors \(X_k\), which is the content of the following main result. Here and in what follows, for \({\mathcal {K}}\subset \mathbb {R}\) and \(\lambda \in \mathbb {R}\), we write \(\mathcal {K} - \lambda := \{\mu \in \mathbb {R}: \mu +\lambda \in \mathcal {K} \}\).
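For small examples, the index set \(I_\lambda \) and the sets \({\mathcal {K}}_{\lambda ,k}\) can be enumerated by brute force. The sketch below does this for the particle number operator with \(K=4\) and \(\lambda = 2\); the function name `index_sets` and the tolerance used to compare eigenvalues are assumptions made for this illustration.

```python
import itertools

def index_sets(lams, lam, tol=1e-12):
    """Enumerate I_lambda and K_{lambda,k} for k = 0, ..., K.

    lams[k][beta] holds the diagonal entry of the (k+1)-th factor L_k
    for mode index beta.
    """
    K = len(lams)
    all_alpha = itertools.product(*(range(len(l)) for l in lams))
    I_lam = [alpha for alpha in all_alpha
             if abs(sum(lams[k][alpha[k]] for k in range(K)) - lam) < tol]
    K_sets = [sorted({round(sum(lams[l][alpha[l]] for l in range(k)), 12)
                      for alpha in I_lam})
              for k in range(K + 1)]
    return I_lam, K_sets

# Particle number operator: L_k = A^* A = diag(0, 1) for each of K = 4 modes.
lams = [[0.0, 1.0]] * 4
I_lam, K_sets = index_sets(lams, 2.0)

for k, sector in enumerate(K_sets):
    print(k, sector)
```

In this example \(I_\lambda \) consists of the six index tuples with exactly two occupied orbitals, and the sets \({\mathcal {K}}_{\lambda ,k}\) list the possible numbers of particles among the first k orbitals.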

Theorem 3.2

Let \(\varvec{x} \in \mathbb {R}^{n_1\times \cdots \times n_K}\), \(\varvec{x}\ne 0\), have the representation \(\varvec{x} = \tau (\mathsf {X})\) with minimal ranks \(\mathsf {r} = (r_1,\ldots , r_{K-1})\). Then one has \(\varvec{L} \varvec{x} = \lambda \varvec{x}\) with \(\varvec{L}\) as in (3.1) precisely when \(\mathsf {X}\) can be chosen such that the following holds: for \(k=1,\ldots , K\) and for all \(\mu \in {\mathcal {K}}_{\lambda ,k}\), there exist \(\mathcal {S}_{k,\mu } \subseteq \{ 1,\ldots , r_k\}\) such that

$$\begin{aligned} \varvec{L}^{\le }_{k} \tau ^{\le }_{k,j} (\mathsf {X}) = \mu \tau ^{\le }_{k,j} (\mathsf {X}), \quad \varvec{L}_{k}^{>} \tau _{k,j}^{>}{(\mathsf {X})} = (\lambda -\mu ) \tau _{k,j}^{>}{(\mathsf {X})} , \quad j \in \mathcal {S}_{k,\mu } , \end{aligned}$$

and the matrices \(X^{\{\beta \}}_{k}\), \(\beta = 0,1,\ldots ,n_k-1\), have nonzero entries only in the blocks

$$\begin{aligned} \begin{aligned}&X^{\{\beta \}}_{k}\big |_{\mathcal {S}_{k-1,\mu } \times \mathcal {S}_{k,\mu +\lambda _{k,\beta }}}&\text {for }\, \mu \in {\mathcal {K}}_{\lambda ,k-1}\cap ( {\mathcal {K}}_{\lambda ,k}-\lambda _{k,\beta }), \end{aligned} \end{aligned}$$

where we set \(\mathcal {S}_{0,0} = \mathcal {S}_{K,\lambda } = \{1\}\).


Proof

We first show that \(\varvec{L} \varvec{x} = \lambda \varvec{x}\) implies that (3.3), (3.4) hold, proceeding by induction over k. Thus, let \(k = 1\). For fixed \(\beta \in \{ 0,\ldots , n_1-1 \}\) we define

$$\begin{aligned} \varvec{y}^\beta = \bigl ( \varvec{x}_{\beta ,{\hat{\alpha }}} \bigr )_{{\hat{\alpha }} \in \mathcal {N}_2^K}. \end{aligned}$$

Then, by the definition of \(\varvec{L}\) and by our assumption,

$$\begin{aligned} \sum _{\beta =0}^{n_1-1} e^\beta \otimes \varvec{L}^{>}_{1}\varvec{y}^\beta + \lambda _{1,\beta }e^\beta \otimes \varvec{y}^\beta = \varvec{L} \varvec{x} = \lambda \varvec{x} = \sum _{\beta =0}^{n_1-1} e^\beta \otimes \lambda \varvec{y}^\beta . \end{aligned}$$

Consequently \(\varvec{L}^{>}_{1}\varvec{y}^\beta = (\lambda - \lambda _{1,\beta })\varvec{y}^\beta \) for each \(\beta \), and thus either \(\varvec{y}^\beta = 0\) or \(\varvec{y}^\beta \) is an eigenvector of a self-adjoint linear mapping. Orthogonality of eigenvectors with distinct eigenvalues implies \(\langle \varvec{y}^{\beta }, \varvec{y}^{\beta '}\rangle = 0\) if \( \lambda _{1,\beta } \ne \lambda _{1,\beta '}\). Writing \(\varvec{y}^\beta = X_1^{\{\beta \}} X_2 {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_K\), we obtain

$$\begin{aligned} 0 = \langle \varvec{y}^{\beta }, \varvec{y}^{\beta '}\rangle = \langle X_1^{\{\beta \}} X_2 {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_K, X_1^{\{\beta '\}} X_2 {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_K \rangle = \langle G_1 X^{\{\beta \}}_1, X^{\{\beta '\}}_1\rangle \end{aligned}$$


with the Gram matrix

$$\begin{aligned} {G}_1 =\bigl ( \langle \tau ^{>}_{1,j}{(\mathsf {X})}, \tau ^{>}_{1,j'}{(\mathsf {X})} \rangle \bigr )_{j, j' = 1,\ldots ,r_1}, \end{aligned}$$

which is invertible since the ranks of \(\mathsf {X}\) are minimal. This means that \(X^{\{\beta \}}_1G_1^{1/2}, X^{\{\beta '\}}_1G_1^{1/2}\) are pairwise orthogonal. Thus using (3.2), replacing \(X_1\) by \(X_1 G_1^{1/2}\) and \(X_2\) by \(G_1^{-1/2}X_2\), we can ensure that \(\langle X^{\{\beta \}}_1, X^{\{\beta '\}}_1 \rangle = 0\) if \( \lambda _{1,\beta } \ne \lambda _{1,\beta '}\). By minimality of ranks, there exist precisely \(r_1\) different \(\beta _1,\ldots ,\beta _{r_1} \in \{ 0,\ldots ,n_1-1\}\) such that \(X^{\{\beta _1\}}_1,\ldots , X^{\{\beta _{r_1}\}}_1\) are linearly independent. For \(\mu \in {\mathcal {K}}_{\lambda , 1} = \{ \lambda _{1,\beta } :\beta = 0,\ldots ,n_1-1\}\), we now define

$$\begin{aligned} \mathcal {S}_{1,\mu } = \{ j :\lambda _{1,\beta _j} = \mu \}. \end{aligned}$$

Again making use of (3.2), by Householder reflectors we can construct an orthogonal transformation \(Q_1 \in \mathrm {O}(r_1)\) such that replacing \(X_1\) by \(X_1 Q_1\) and \(X_2\) by \(Q_1^\top X_2\), we have

$$\begin{aligned} X_1^{\{ \beta \}} \in {{\,\mathrm{span}\,}}\bigl \{ e^{j}\in \mathbb {R}^{r_1}:j\in \mathcal {S}_{1,\lambda _{1,\beta }} \bigr \}\text { for } \beta =0,\ldots ,n_1-1. \end{aligned}$$

This means that \(X_1^{[1,j]} \in {{\,\mathrm{span}\,}}\{ e^{\beta } :\lambda _{1,\beta } = \mu \}\) for \(j \in \mathcal {S}_{1,\mu }\). Noting that the eigenspaces of \(L_1\) are given by \({{\,\mathrm{span}\,}}\{ e^{\beta '} :\beta ' = 0,\ldots ,n_1-1 \text { with } \mu = \lambda _{1,\beta '} \}\) for \(\mu \in {\mathcal {K}}_{\lambda , 1}\), as well as \(X_1^{[1,j]} = \tau ^{\le }_{1,j} (\mathsf {X})\), we thus have

$$\begin{aligned} L_1 \tau ^{\le }_{1,j} (\mathsf {X}) = \mu \tau ^{\le }_{1,j} (\mathsf {X}) \;\text { for } \mu \in {\mathcal {K}}_{\lambda ,1},\, j \in \mathcal {S}_{1,\mu }. \end{aligned}$$

This shows (3.4) and the first statement in (3.3) for \(k=1\). Moreover, combining

$$\begin{aligned} \varvec{L} \varvec{x} = \sum _{\mu \in {\mathcal {K}}_{\lambda , 1}} \sum _{j \in \mathcal {S}_{1,\mu }} \bigl (\mu \tau ^{\le }_{1,j} (\mathsf {X}) \otimes \tau ^{>}_{1,j} (\mathsf {X}) + \tau ^{\le }_{1,j} (\mathsf {X}) \otimes \varvec{L}^{>}_{1} \tau ^{>}_{1,j} (\mathsf {X}) \bigr ) \end{aligned}$$

and \(\varvec{L}\varvec{x} = \lambda \varvec{x}\) we obtain

$$\begin{aligned} \sum _{\mu \in {\mathcal {K}}_{\lambda , 1}} \sum _{j \in \mathcal {S}_{1,\mu }} \tau ^{\le }_{1,j} (\mathsf {X}) \otimes \varvec{L}^{>}_{1} \tau ^{>}_{1,j} (\mathsf {X}) = \sum _{\mu \in {\mathcal {K}}_{\lambda , 1}} (\lambda - \mu ) \sum _{j \in \mathcal {S}_{1,\mu }} \tau ^{\le }_{1,j} (\mathsf {X}) \otimes \tau ^{>}_{1,j} (\mathsf {X}). \end{aligned}$$

Since \(\tau ^{\le }_{1,j} (\mathsf {X}) \), \(j = 1,\ldots , r_1\), are linearly independent by our assumption of minimal ranks, the second statement in (3.3) for \(k=1\) follows.

Suppose we have sets \(\mathcal {S}_{k,\mu }\) with \(\mu \in {\mathcal {K}}_{\lambda ,k}\) such that (3.3) and (3.4) hold for some k with \(1 \le k < K-1\), where \(0 \le \mu \le \lambda \) by construction. Then

$$\begin{aligned} \varvec{L}^{>}_{k} \tau ^{>}_{k,j} (\mathsf {X}) = (\lambda -\mu ) \tau ^{>}_{k,j} (\mathsf {X}) \quad \text {for all } \mu \in {\mathcal {K}}_{\lambda ,k} \text { and all } j \in \mathcal {S}_{k,\mu }. \end{aligned}$$

For \(\beta = 0,\ldots , n_{k+1}-1\), let

$$\begin{aligned} \varvec{y}^\beta _{k,j} = \bigl ( \tau ^{>}_{k,j} (\mathsf {X})_{\beta , {\hat{\alpha }}} \bigr )_{{\hat{\alpha }} \in \mathcal {N}_{k+2}^{K}}, \quad j = 1,\ldots , r_{k}. \end{aligned}$$

For each \(\mu \in {\mathcal {K}}_{\lambda ,k}\), we then have

$$\begin{aligned} \varvec{L}^{>}_{k+1} \varvec{y}^\beta _{k,j} = (\lambda -\mu - \lambda _{k+1,\beta }) \varvec{y}^\beta _{k,j}, \qquad j \in \mathcal {S}_{k,\mu }. \end{aligned}$$

By the orthogonality of eigenvectors corresponding to distinct eigenvalues, this implies

$$\begin{aligned} \langle \varvec{y}^\beta _{k,j}, \varvec{y}^{\beta '}_{k,j'} \rangle = 0 \quad \text {for } j \in \mathcal {S}_{k,\mu }, \, j' \in \mathcal {S}_{k,\mu '} \text { with } {\mu + \lambda _{k+1,\beta }\ne \mu ' + \lambda _{k+1,\beta '}} .\nonumber \\ \end{aligned}$$

We write \((X_{k+1}^{\{\beta \}})_j\) for the j-th row of \(X_{k+1}^{\{\beta \}}\), \(1\le k < K-1\). For \(\mu \in \mathcal {K}_{\lambda ,k+1}\), we define

$$\begin{aligned} \mathcal {Z}_{k+1,\mu }:= {\text {span}} \, \bigcup _{\beta =0}^{n_{k+1}-1} \Bigl \{ (X_{k+1}^{\{\beta \}})_j :j \in \mathcal {S}_{k,\mu -\lambda _{k{+1},\beta }} \Bigr \} , \end{aligned}$$

where for each k, we set \(\mathcal {S}_{k,\mu } = \emptyset \) for \(\mu \notin \mathcal {K}_{\lambda ,k}\). Since \(k < K-1\), we have \( \varvec{y}^\beta _{k,j} = (X_{k+1}^{\{\beta \}})_j X_{k+2} {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}X_K \). Under the conditions in (3.6), we thus have

$$\begin{aligned} \langle G_{k+1} (X^{\{\beta \}}_{k+1})_j , (X^{\{\beta '\}}_{k+1})_{j'} \rangle = 0, \quad G_{k+1} = \Bigl ( \langle \tau ^{>}_{k+1,\ell } (\mathsf {X}) , \tau ^{>}_{k+1,\ell '} (\mathsf {X}) \rangle \Bigr )_{\ell ,\ell ' = 1, \ldots , r_{k+1}} , \end{aligned}$$

where again, \(G_{k+1}\) is invertible since the ranks of \(\mathsf {X}\) are minimal. Consequently, for \(z\in \mathcal {Z}_{k+1,\mu }\), \(z' \in \mathcal {Z}_{k+1,\mu '}\) and \(\mu \ne \mu '\), (3.6) means that \(\langle G_{k+1} z, z'\rangle = 0\). Again, there exists an orthogonal transformation \(Q_{k+1}\in \mathrm {O}(r_{k+1})\) such that by replacing \(X_{k+1}\) by \(X_{k+1}G_{k+1}^{1/2}Q_{k+1}\) and \(X_{k+2}\) by \(Q_{k+1}^\top G_{k+1}^{-1/2}X_{k+2}\) as before, we can ensure pairwise orthogonality of the spaces \(\mathcal {Z}_{k+1,\mu }\), \(\mu \in \mathcal {K}_{\lambda ,k+1}\), in the Euclidean inner product. Additionally, for \(\mu \in {\mathcal {K}}_{\lambda ,k+1}\), we can define subsets \(\mathcal {S}_{k+1,\mu }\) that form a partition of \(\{1, \ldots , r_{k+1}\}\) with \(\# \mathcal {S}_{k+1,\mu } = \dim \mathcal {Z}_{k+1,\mu }\), such that for all \(z \in \mathcal {Z}_{k+1,\mu }\) we have \(\mathop {\mathrm {supp}}(z) \subseteq \mathcal {S}_{k+1,\mu }\). For \(k < K-1\), this implies the block structure (3.4) for \(X_{k+1}\), and \(\varvec{L}^{\le }_{k+1} \tau ^{\le }_{k+1,j} (\mathsf {X}) = \mu \tau ^{\le }_{k+1,j} (\mathsf {X})\) for \(j \in \mathcal {S}_{k+1,\mu }\), which is the first statement in (3.3) for \(k+1\), holds by construction. Thus we also have

$$\begin{aligned}&\sum _{\mu \in {\mathcal {K}}_{\lambda , k+1}} \sum _{j \in \mathcal {S}_{k+1,\mu }} \tau ^{\le }_{k+1,j} (\mathsf {X}) \otimes \varvec{L}^{>}_{k+1} \tau ^{>}_{k+1,j} (\mathsf {X}) \\&\quad = \sum _{\mu \in {\mathcal {K}}_{\lambda , k+1}} (\lambda -\mu )\sum _{j \in \mathcal {S}_{k+1,\mu }} \tau ^{\le }_{k+1,j} (\mathsf {X}) \otimes \tau ^{>}_{k+1,j} (\mathsf {X}), \end{aligned}$$

which by minimality of ranks yields the second statement in (3.3) for \(k+1\). By induction, the statement thus follows for all \(k \le K-1\).

Finally, for \(k=K-1\), if \(\mu \in {\mathcal {K}}_{\lambda ,K-1}\), then \(\lambda - \mu = \lambda _{K,\beta }\) for some \(\beta \in \{ 0,\ldots ,n_K-1\}\), and (3.5) becomes

$$\begin{aligned} L_K X_K^{[j, 1]} = (\lambda - \mu ) X_K^{[j, 1]} \quad \text {for } \mu \in {\mathcal {K}}_{\lambda ,K-1},\, j \in \mathcal {S}_{K-1,\mu }, \end{aligned}$$

noting that \(r_K = 1\). As \(L_K\) is a diagonal matrix, the eigenspaces in (3.7) are given by

$$\begin{aligned} {\text {span}}\{ e^{\beta '} :\beta ' = 0,\ldots ,n_K-1 \text { with } \mu = \lambda - \lambda _{K,\beta '} \}, \quad \mu \in {\mathcal {K}}_{\lambda , K-1}. \end{aligned}$$

For \(\beta \in \{ 0,\ldots ,n_K-1 \}\) with \(\mu = \lambda - \lambda _{K,\beta }\) let \(j' \in \{ 1, \ldots , r_{K-1} \}\). If \(j' \notin \mathcal {S}_{K-1,\mu }\), then \(j' \in \mathcal {S}_{K-1,\mu '}\) for some \(\mu ' \ne \lambda - \lambda _{K,\beta }\), and \(X_K^{[j',1]}\) is orthogonal to \(X_K^{[j,1]}\) for all \(j \in \mathcal {S}_{K-1,\mu }\), which by (3.8) implies \(X_K(j',\beta ,1) = 0\). Thus also \(X_K\) satisfies (3.4).

Conversely, suppose now that \(\varvec{x} \in \mathbb {R}^{n_1\times \cdots \times n_K}\), \(\varvec{x}\ne 0\), has the block structure (3.3), (3.4). This means that expanding the representation \(\mathsf {X}\) in terms of elementary tensors of order K, each of the resulting summands is an eigenfunction of \(\varvec{L}\) with eigenvalue \(\lambda \), and thus the same holds for \(\varvec{x}\). \(\square \)

Remark 3.3

The above proof uses the explicit tensor network structure of an MPS. Analogous block sparsity results can be derived for other tree tensor network representations of tensors that are eigenvectors of an operator with the structure (3.1). Similar use of block sparsity for enforcing physical symmetries has also been considered in [5, 34] for tensor networks without tree structure such as PEPS [37] and MERA [39]. In such cases, however, it is not clear whether any tensor with a given symmetry necessarily has a representation with a particular block-sparsity pattern. In other words, in the absence of tree structure we cannot establish an equivalence between membership in certain eigenspaces and representability with block structure as in Theorem 3.2. Block sparsity can then still be used for obtaining approximations, see [5, Sec. III.C].

We state the specific result for the particle number operator \(\varvec{P}\) as a corollary. This corresponds to the special case \(n_k = 2\) and \(\lambda = N\). For \(k=1,\ldots ,K\), we have \(\lambda _{k,0} = 0\) and \(\lambda _{k,1} = 1\), and hence \(\lambda _{k,\alpha }^{\le }, \lambda _{k,\alpha }^{>}\in \{ 0, \ldots , N\}\) in Theorem 3.2.

Corollary 3.4

Let \(\varvec{x} \in {\mathcal {F}}^K\), \(\varvec{x}\ne 0\), have the representation \(\varvec{x} = \tau (\mathsf {X})\) with minimal ranks \(\mathsf {r} = (r_1,\ldots , r_{K-1})\). Then for \(N=1,\ldots , K\), one has \(\varvec{P} \varvec{x} = N \varvec{x}\) precisely when \(\mathsf {X}\) can be chosen such that the following holds: for \(k=1,\ldots , K\) and for all

$$\begin{aligned} n \in {\mathcal {K}}_k := \bigl \{ \max \{ 0 , N - K + k \} ,\ldots , \min \{N, k\} \bigr \} \end{aligned}$$

there exist \(\mathcal {S}_{k,n} \subseteq \{ 1,\ldots , r_k\}\) such that

$$\begin{aligned} \varvec{P}^{\le }_{k} \tau ^{\le }_{k,j} (\mathsf {X}) = n \tau ^{\le }_{k,j} (\mathsf {X}), \quad \varvec{P}^{>}_{k} \tau ^{>}_{k,j} (\mathsf {X}) = (N-n) \tau ^{>}_{k,j} (\mathsf {X}) , \quad j \in \mathcal {S}_{k,n} , \end{aligned}$$

and the matrices \(X^{\{\beta \}}_{k}\), \(\beta = 0,1\), have nonzero entries only in the blocks

$$\begin{aligned} \begin{aligned}&X^{\{0\}}_{k}\big |_{\mathcal {S}_{k-1,n} \times \mathcal {S}_{k,n}}&\text {for } n \in {\mathcal {K}}_{k-1}\cap {\mathcal {K}}_{k}, \\&X^{\{1\}}_{k}\big |_{\mathcal {S}_{k-1,n} \times \mathcal {S}_{k,n+1}}&\text {for } n \in {\mathcal {K}}_{k-1} \cap ({\mathcal {K}}_{k}-1) , \end{aligned} \end{aligned}$$

where we set \(\mathcal {S}_{0,0} = \mathcal {S}_{K,N} = \{1\}\).

The block structure described by Corollary 3.4 is equivalent to the one used in the physics literature (see, e.g., [26, 33, 34]), where it is usually stated in terms of quantum numbers and derived via the U(1) symmetry of operators. For other symmetries considered in physics, such as SU(2), the linear algebraic structure is different and therefore cannot be described by Laplace-like operators (3.1) as done above.
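The block structure of Corollary 3.4 can be observed numerically. The following NumPy sketch is an illustration only: the function names `particle_numbers`, `admissible`, and `left_sectors` are hypothetical, and we assume that the entries of \(\varvec{x}\) are stored in row-major order of \((\alpha _1,\ldots ,\alpha _K)\). It projects a random vector onto the \(N\)-particle eigenspace of \(\varvec{P}\) and checks that each left singular vector of the \(k\)-th unfolding is supported on occupation patterns with a single particle number \(n \in {\mathcal {K}}_k\). The check relies on the nonzero singular values being distinct, which holds generically for random data.

```python
import itertools
import numpy as np

def particle_numbers(k):
    """Particle number of every occupation pattern of k orbitals (row-major order)."""
    return np.array([sum(occ) for occ in itertools.product((0, 1), repeat=k)], dtype=int)

def admissible(K, N, k):
    """The index set K_k = {max(0, N-K+k), ..., min(N, k)} from Corollary 3.4."""
    return set(range(max(0, N - K + k), min(N, k) + 1))

def left_sectors(x, K, k, tol=1e-8):
    """For each left singular vector of the k-th unfolding with nonzero singular
    value, return the particle numbers occurring in its support."""
    U, s, _ = np.linalg.svd(x.reshape(2 ** k, 2 ** (K - k)), full_matrices=False)
    counts = particle_numbers(k)
    sectors = []
    for j in np.flatnonzero(s > tol * s[0]):
        support = np.unique(counts[np.abs(U[:, j]) > tol])
        sectors.append(tuple(int(n) for n in support))
    return sectors

# Assumed test data: a random element of the N-particle eigenspace of P.
rng = np.random.default_rng(0)
K, N = 6, 3
x = rng.standard_normal(2 ** K)
x[particle_numbers(K) != N] = 0.0

for k in range(1, K):
    # Every singular vector should carry exactly one particle number n in K_k.
    print(k, sorted(set(left_sectors(x, K, k))))
```

Grouping the singular vectors by the particle number they carry yields index sets playing the role of \(\mathcal {S}_{k,n}\) for a left-orthogonal representation.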

In addition to the fermionic particle number operator \(\varvec{P}\), there is a variety of other settings where the structure described by Theorem 3.2 can be used.

Example 3.5

In quantum chemistry, not only the total particle number is conserved, but also the numbers of spin-up and spin-down particles. The MPS is thus an eigenvector of two associated Laplace-like operators \(\varvec{P}_\mathrm {up}\) and \(\varvec{P}_\mathrm {down}\). In both cases, we have \(n_k = 2\); for the up operator, \(\lambda _{k,1} = 1\) if k is even and \(\lambda _{k,1} = 0\) otherwise, whereas for the down operator, \(\lambda _{k,1} = 1\) if k is odd and \(\lambda _{k,1} = 0\) otherwise. We then have partial eigenvalues \({\mathcal {K}}_k^{\mathrm {up}}\) and \({\mathcal {K}}_k^{\mathrm {down}}\), and the blocks depend on two partial eigenvalues, i.e., we have sets \({\mathcal {S}}_{k,n_1,n_2}\subseteq \{1,\ldots ,r_k\}\) for \(n_1 \in {\mathcal {K}}_k^{\mathrm {up}}\) and \(n_2 \in {\mathcal {K}}_k^{\mathrm {down}}\). The blocks can then be ordered such that the blocks from the first operator have block structure themselves.

Alternatively, one can introduce spatial orbitals that can carry one spin-up and one spin-down electron [36]. In this case, all dimensions are \(n_k = 4\) and a similar block structure can be derived that also takes into account the antisymmetry of the particles.

Example 3.6

Another example is the bosonic particle number operator with \(n_k = n\) and \(\lambda _{k,\alpha _k} = \alpha _k\). Tensor trains have frequently been applied in the parametrization of elements of high-dimensional polynomial spaces such as

$$\begin{aligned} V_n^K = \left\{ \sum _{\alpha \in \{0, \ldots , n-1\}^K} c_\alpha \prod _{k =1}^{K}x_k^{\alpha _k}\right\} \,, \end{aligned}$$

see, e.g., [2, 10, 12, 29]. In this context, the bosonic particle number operator can be seen as a polynomial degree operator. That is, if a polynomial is a linear combination of homogeneous polynomials with the same degree, its coefficient vector is an eigenvector of the polynomial degree operator with the eigenvalue equal to the degree. In \(V_n^K\), the degree of the monomial with multi-index \(\alpha \) is precisely \(|\alpha | = \alpha _1 + \cdots + \alpha _K\).

Another interesting example is the case \(n_k = n\) and \(\lambda _{k,\alpha _k} = 1\) for \(\alpha _k > 0\), \(\lambda _{k,0} = 0\), which in the context of polynomials with \(V_n^K\) as above counts the number of variables that occur in a monomial. This means that eigenvectors of this operator are coefficient vectors whose nonzero entries \(c_\alpha \) belong to multi-indices \(\alpha \) with a fixed number (the associated eigenvalue) of nonzero components.
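Both operators of this example can be checked on a small case. The sketch below is a hedged illustration, with the helper name `laplace_like` and the sample polynomials chosen by us: it assembles the dense matrix of a Laplace-like operator with diagonal local factors via Kronecker products and applies it to coefficient tensors in \(V_3^3\), stored in row-major order.

```python
import numpy as np

def laplace_like(local_eigs):
    """Dense matrix of sum_k I x ... x diag(local_eigs[k]) x ... x I."""
    dims = [len(e) for e in local_eigs]
    total = int(np.prod(dims))
    L = np.zeros((total, total))
    for k, eigs in enumerate(local_eigs):
        left = int(np.prod(dims[:k]))        # product over an empty list is 1
        right = int(np.prod(dims[k + 1:]))
        L += np.kron(np.kron(np.eye(left), np.diag(eigs)), np.eye(right))
    return L

K, n = 3, 3
degree_op = laplace_like([[0, 1, 2]] * K)     # lambda_{k,alpha_k} = alpha_k
variables_op = laplace_like([[0, 1, 1]] * K)  # lambda_{k,alpha_k} = 1 for alpha_k > 0

# p = x1^2 + 3 x1 x2 - x2 x3 is homogeneous of degree 2.
p = np.zeros((n,) * K)
p[2, 0, 0], p[1, 1, 0], p[0, 1, 1] = 1.0, 3.0, -1.0

# q = 3 x1 x2 - x2 x3 + x1^2 x3: every monomial involves exactly two variables.
q = np.zeros((n,) * K)
q[1, 1, 0], q[0, 1, 1], q[2, 0, 1] = 3.0, -1.0, 1.0

print(np.allclose(degree_op @ p.ravel(), 2 * p.ravel()))     # eigenvalue = degree
print(np.allclose(variables_op @ q.ravel(), 2 * q.ravel()))  # eigenvalue = number of variables
```

Note that \(q\) is not homogeneous, so it is an eigenvector of the second operator but not of the degree operator.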

We define the block sizes \(\rho _{k,\mu } := \# \mathcal {S}_{k,\mu } \), where \(\sum _{\mu \in {\mathcal {K}}_{\lambda ,k}} \rho _{k,\mu } = r_k\), and derive the following upper bounds.

Lemma 3.7

Let \(\varvec{x} \in \mathbb {R}^{n_1\times \cdots \times n_K}\), \(\varvec{x}\ne 0\), have the representation \(\varvec{x} = \tau (\mathsf {X})\) with minimal ranks \(\mathsf {r} = (r_1,\ldots , r_{K-1})\) and \(\varvec{L} \varvec{x} = \lambda \varvec{x}\). Furthermore, let \(E_{\mu ,k}^{\le }\) and \(E_{\lambda -\mu ,k}^{>}\) be the eigenspaces of \(\varvec{L}^{\le }_{k}\) and \(\varvec{L}^{>}_{k}\) for the eigenvalues \(\mu \) and \(\lambda -\mu \), respectively. Then for \(k=1,\ldots , K\) and \(\mu \in {\mathcal {K}}_{\lambda ,k}\), we have

$$\begin{aligned} \rho _{k,\mu } \le \min \bigl \{ \dim E_{\mu ,k}^{\le },\dim E_{\lambda -\mu ,k}^{>} \bigr \} \,. \end{aligned}$$


With Theorem 3.2, for \(j \in \mathcal {S}_{k,\mu }\), we obtain \(\varvec{L}^{\le }_{k} \tau ^{\le }_{k,j} (\mathsf {X}) = \mu \tau ^{\le }_{k,j} (\mathsf {X})\) and \(\varvec{L}^{>}_{k} \tau ^{>}_{k,j} (\mathsf {X}) = (\lambda -\mu ) \tau ^{>}_{k,j} (\mathsf {X})\). The partial tensors \(\tau ^{\le }_{k,j} (\mathsf {X})\) are linearly independent, because if they were not, we could reduce the rank \(r_k\), which is assumed to be minimal. Therefore \(\rho _{k,\mu }\) is at most the dimension of the eigenspace of \(\varvec{L}^{\le }_{k}\) for the eigenvalue \(\mu \). The second bound follows analogously by considering the eigenspace of \(\varvec{L}^{>}_{k}\) for the eigenvalue \(\lambda -\mu \). \(\square \)

Note that for the particle number operator \(\varvec{P}\), where \(\rho _{k,n} = \# \mathcal {S}_{k,n} \) for \(n \in {\mathcal {K}}_k\), we have

$$\begin{aligned} \dim E_{n,k}^{\le } = \binom{k}{n}, \quad n \in \{0,\ldots ,N\}. \end{aligned}$$
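For the particle number operator, the bound of Lemma 3.7 is thus easy to tabulate. The following sketch uses a hypothetical helper `block_size_bounds`; it additionally assumes \(\dim E_{N-n,k}^{>} = \binom{K-k}{N-n}\), which follows from the same counting argument applied to the \(K-k\) orbitals to the right of position \(k\), but is not stated explicitly above.

```python
from math import comb

def block_size_bounds(K, N, k):
    """Upper bounds min{C(k, n), C(K-k, N-n)} on rho_{k,n} for all n in K_k."""
    lo, hi = max(0, N - K + k), min(N, k)
    return {n: min(comb(k, n), comb(K - k, N - n)) for n in range(lo, hi + 1)}

# Example: K = 5 orbitals, N = 2 particles.
for k in range(1, 5):
    bounds = block_size_bounds(5, 2, k)
    print(k, bounds, "max rank:", sum(bounds.values()))
```

Summing the bounds over \(n \in {\mathcal {K}}_k\) gives an upper bound on the rank \(r_k\) of any element of the corresponding eigenspace.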

As a final result in this section, we show that Laplace-like operators commute with the tangent space projection at any of their eigenvectors.

Corollary 3.8

For \(\varvec{x}\in \mathbb {R}^{n_1\times \cdots \times n_K}\) and \(\varvec{L}\varvec{x} = \lambda \varvec{x}\), we have \(\varvec{Q}_{\varvec{x}}\varvec{L} = \varvec{L}\varvec{Q}_{\varvec{x}}\).


We apply Theorem 3.2. For two arbitrary but fixed multi-indices \(\alpha ,\beta \in {\mathcal {N}}\) let \(\mathsf {e}^\alpha = (e^{\alpha _1},\ldots ,e^{\alpha _K})\) and \(\mathsf {e}^\beta = (e^{\beta _1},\ldots ,e^{\beta _K})\) be two representations such that \(\tau (\mathsf {e}^\alpha )\) and \(\tau (\mathsf {e}^\beta )\) are unit vectors in \(\mathbb {R}^{n_1\times \cdots \times n_K}\). Then it suffices to show that

$$\begin{aligned} \langle \varvec{Q}_{\varvec{x}}\varvec{L}\tau (\mathsf {e}^\beta ), \tau (\mathsf {e}^\alpha ) \rangle = \langle \varvec{L}\varvec{Q}_{\varvec{x}}\tau (\mathsf {e}^\beta ), \tau (\mathsf {e}^\alpha ) \rangle , \end{aligned}$$

and since \(\tau (\mathsf {e}^\alpha )\) and \(\tau (\mathsf {e}^\beta )\) are eigenvectors of \(\varvec{L}\) with eigenvalue \(\lambda _\alpha \) and \(\lambda _\beta \), this simplifies to showing that for \(\lambda _\alpha \ne \lambda _\beta \) we have \(\langle \varvec{Q}_{\varvec{x}}\tau (\mathsf {e}^\beta ), \tau (\mathsf {e}^\alpha ) \rangle = 0 \). Now for \(k=1,\ldots ,K\), we show that

$$\begin{aligned} \langle \varvec{Q}_{\varvec{x}}^{k,i} \tau (\mathsf {e}^\beta ), \tau (\mathsf {e}^\alpha ) \rangle = 0, \qquad i = 1,2. \end{aligned}$$

We give the proof for \(i=1\); the case \(i=2\) can be treated analogously. Note that if \(\lambda _\alpha \ne \lambda _\beta \), we can assume without loss of generality that \(\tau ^{<}_{k,1} (\mathsf {e}^\alpha )\) and \(\tau ^{<}_{k,1} (\mathsf {e}^\beta )\) are eigenvectors for different eigenvalues \(\lambda _{k-1,\alpha }^{\le }\ne \lambda _{k-1,\beta }^{\le }\) of \(\varvec{L}^{\le }_{k-1}\). But since \(\varvec{x}\) is also an eigenvector of \(\varvec{L}\), it has the properties shown in Theorem 3.2, and consequently,

$$\begin{aligned} \langle \tau ^{<}_{k,j_1} (\mathsf {U}), \tau ^{<}_{k,1} (\mathsf {e}^\alpha ) \rangle= & {} 0 \text { for } j_1\notin \mathcal {S}_{k-1,\lambda _{k-1,\alpha }^{\le }} \text { and } \langle \tau ^{<}_{k,j_2} (\mathsf {U}), \tau ^{<}_{k,1} (\mathsf {e}^\beta ) \rangle \\= & {} 0 \text { for }j_2\notin \mathcal {S}_{k-1,\lambda _{k-1,\beta }^{\le }}. \end{aligned}$$

As \(\mathcal {S}_{k-1,\lambda _{k-1,\alpha }^{\le }}\cap \mathcal {S}_{k-1, \lambda _{k-1,\beta }^{\le }} = \emptyset \), this implies

$$\begin{aligned} \langle \tau ^{<}_{k,j} (\mathsf {U}), \tau ^{<}_{k,1} (\mathsf {e}^\alpha )\rangle \langle \tau ^{<}_{k,j} (\mathsf {U}),\tau ^{<}_{k,1} (\mathsf {e}^\beta )\rangle = 0\quad \text {for all } j\in \{1,\ldots , r_{k-1}\}, \end{aligned}$$

concluding the proof. \(\square \)

4 Basic operations on block-structured matrix product states

In the remainder of this article, we restrict ourselves again to the case \(\varvec{L} = \varvec{P}\) and \({\mathcal {N}} = \{0,1\}^K\). For fixed \(N \le K\), we denote the space of all tensors \(\varvec{x} \in {\mathcal {F}}^K\) with \(\varvec{P} \varvec{x} = N \varvec{x}\) as

$$\begin{aligned} {\mathcal {F}}^K_N := \{ \varvec{x} \in {\mathcal {F}}^K : \varvec{P} \varvec{x} = N \varvec{x} \} \end{aligned}$$

and we represent them in the block-sparse MPS format. The block structure of the matrix product states leads to more efficient storage and computation, if exploited correctly. Furthermore, a restriction to one of the eigenspaces of the particle number operator eliminates redundancies in iterative minimization schemes.

We simplify notation by first noting that the sets \(\mathcal {S}_{k,n}\) are disjoint and that they can be ordered arbitrarily due to the invariance of the components. This means that the matrices \(X^{\{0\}}_k\) and \(X^{\{1\}}_k\) are either block-diagonal or they have blocks only just above or just below the diagonal. We denote the blocks representing an unoccupied k-th orbital by

and those representing an occupied orbital by

For k such that \(N< k < K-N+1\), which we refer to as the generic case, we have \({\mathcal {K}}_{k-1} = {\mathcal {K}}_k = \{0,\ldots ,N\}\); otherwise, the number of particles to the right and to the left of orbital k, and hence the elements of \({\mathcal {K}}_{k-1}\) and \({\mathcal {K}}_k\), are restricted according to (3.9). The corresponding block structure according to Corollary 3.4 has the form

Since nonzero blocks for the unoccupied orbital never occur in the same position as the ones for the occupied orbital, the two layers \(\alpha = 0\) and \(\alpha = 1\) can be summarized in the core representation

where each block is composed of vectors, and where we define


The cases where either \(k \le N\) or \(k \ge K-N+1\) have the last rows or first columns (and zero rows and columns) removed, respectively, as illustrated in the following example.

Example 4.1

A tensor \(\varvec{x} \in {\mathcal {F}}^5_2\) of order \(K=5\) and particle number \(N=2\), representing 5 orbitals and 2 particles, has the form

As an eigenspace of a linear operator, \({\mathcal {F}}^K_N\) is a linear subspace of \({\mathcal {F}}^K\). Addition and scalar multiplication of MPS in this subspace correspondingly work in the same way as for regular MPS: addition of two tensors in block-sparse MPS format amounts to the concatenation of corresponding blocks, and scalar multiplication to the multiplication of all blocks in one of the components. We now explicitly describe some more involved operations, as well as left- and right-orthogonalization and rank truncation procedures.

Many operations on MPS can be performed either left-to-right or right-to-left, and in general, both versions are required. Here we state right-to-left procedures, as their description is notationally more compact. Apart from the notation, however, all left-to-right procedures are performed analogously.


Alg. 1 describes the scheme for computing the inner product of two elements of \({\mathcal {F}}^K_N\) in block-sparse MPS format, each of which may be of arbitrary ranks. The scheme consists of a right-to-left procedure that successively contracts the blocks of corresponding size. Note that in each step, one ultimately constructs the partial representation mapping \(\tau ^{>}_{k,j} (\mathsf {X})\) where \(j \in \mathcal {S}_{k,n}, n \in {\mathcal {K}}_k\). This, and its left-to-right counterpart \(\tau ^{<}_{k,j} (\mathsf {X})\), is then given in a block-diagonal form, which can be exploited in the construction of subproblems in the DMRG algorithm discussed in Sec. 6.1.
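The right-to-left contraction can be sketched in a few lines of NumPy. The code below is not a transcription of Alg. 1 but a minimal illustration under an assumed storage convention: core \(k\) is a dictionary mapping a pair \((\beta , n)\) to the block \(X_k^{\{\beta \}}|_{\mathcal {S}_{k-1,n}\times \mathcal {S}_{k,n+\beta }}\) from Corollary 3.4, and all interior blocks have a common size `rho`. All function names are hypothetical.

```python
import itertools
import numpy as np

def sectors(K, N, k):
    """Admissible particle numbers K_k to the left of bond k (k = 0, ..., K)."""
    return range(max(0, N - K + k), min(N, k) + 1)

def random_block_mps(K, N, rho, rng):
    """Random block-sparse MPS; cores[k-1][(beta, n)] has shape
    (size of S_{k-1,n}) x (size of S_{k,n+beta})."""
    size = lambda k: 1 if k in (0, K) else rho
    cores = []
    for k in range(1, K + 1):
        core = {}
        for n in sectors(K, N, k - 1):
            for beta in (0, 1):
                if n + beta in sectors(K, N, k):
                    core[(beta, n)] = rng.standard_normal((size(k - 1), size(k)))
        cores.append(core)
    return cores

def block_inner(X, Y, K, N):
    """Inner product via right-to-left contraction of matching blocks."""
    G = {N: np.ones((1, 1))}  # Gram blocks of the right partial tensors, one per sector
    for k in range(K, 0, -1):
        G_new = {}
        for (beta, n), Xb in X[k - 1].items():
            G_new[n] = G_new.get(n, 0) + Xb @ G[n + beta] @ Y[k - 1][(beta, n)].T
        G = G_new
    return float(G[0][0, 0])

def to_dense(X, K, N):
    """Full tensor of shape (2, ..., 2); used only to check the result."""
    x = np.zeros((2,) * K)
    for alpha in itertools.product((0, 1), repeat=K):
        if sum(alpha) == N:
            M, n = np.ones((1, 1)), 0
            for k, beta in enumerate(alpha):
                M, n = M @ X[k][(beta, n)], n + beta
            x[alpha] = M[0, 0]
    return x

rng = np.random.default_rng(0)
X = random_block_mps(5, 2, 3, rng)
Y = random_block_mps(5, 2, 2, rng)  # the two arguments may have different ranks
print(block_inner(X, Y, 5, 2), np.vdot(to_dense(X, 5, 2), to_dense(Y, 5, 2)))
```

The dictionary `G` holds one Gram block per particle number, which reflects the block-diagonal form of the partial contractions mentioned above.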


In Alg. 2, we demonstrate the procedure for orthogonalizing from right to left, resulting in a right-orthogonal tensor. The method for bringing the tensor into its right-orthogonal TT-SVD representation, as described in Alg. 3, follows a similar pattern. Here, the input tensor needs to be given in left-orthogonal format (to which one transforms analogously to Alg. 2), singular value decompositions of joined blocks are computed from right to left, and the singular values in each step are stored. These singular values are used to truncate the ranks of the tensor based on the estimates in [14, 28]. Alg. 4 summarizes this procedure, where the smallest singular values are selected for removal such that the resulting error does not exceed the upper bound \(\varepsilon \) on the truncation error. The rows and columns of the corresponding blocks are then deleted. A similar procedure can be used for truncation to given ranks. Note that if the error threshold \(\varepsilon \) is chosen too large, it is possible that the whole tensor is truncated to zero. This means that zero is actually the best low-rank approximation to the given tensor. In the present context, we want to avoid this anomaly, since the zero tensor is physically meaningless (we emphasize that it does not represent the vacuum state). As long as \(\varepsilon < \Vert \varvec{x}\Vert \) in Alg. 4, this cannot occur.
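The selection of singular values across blocks can be illustrated as follows. This is a simplified sketch, not a transcription of Alg. 4: the function `truncate_sectors` is hypothetical, it treats a single index \(k\) with a local tolerance `delta` (how \(\varepsilon \) is distributed over the indices \(k\) is not modeled here), and it assumes that the singular values of each block are stored in decreasing order.

```python
import math

def truncate_sectors(sv_by_sector, delta):
    """Discard the globally smallest singular values as long as the Euclidean
    norm of the discarded values stays at most delta.

    sv_by_sector maps a particle number n to the singular values of the
    corresponding block. Returns the number of values kept per block and
    the truncation error actually incurred."""
    entries = sorted((s, n) for n, svals in sv_by_sector.items() for s in svals)
    keep = {n: len(svals) for n, svals in sv_by_sector.items()}
    discarded_sq = 0.0
    for s, n in entries:
        if discarded_sq + s ** 2 > delta ** 2:
            break
        discarded_sq += s ** 2
        keep[n] -= 1  # drop the current smallest value of block n
    return keep, math.sqrt(discarded_sq)

sv = {0: [3.0, 0.1], 1: [2.0, 0.5, 0.05]}
print(truncate_sectors(sv, 0.2))   # removes 0.05 and 0.1
print(truncate_sectors(sv, 10.0))  # tolerance above the norm: everything is removed
```

The second call shows the anomaly described above: once the tolerance reaches the norm of all singular values, every block is emptied and the result is the zero tensor.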


Remark 4.2

(Optimality) If the standard TT-SVD is unique, for each k, it differs from the block-sparse TT-SVD representation produced by Algorithm 3 only by the ordering of singular values. In particular, when the entries of the diagonal matrices \(\Sigma _{k,n}\), \(n \in {\mathcal {K}}_k\) are ordered by size, the optimality properties (2.4) of the TT-SVD truncation hold also in this setting.

Remark 4.3

(Particle number conservation) The above remark implies that rank truncation of the TT-SVD is a particle number preserving operation. Non-uniqueness of the TT-SVD can occur only if a matricization has a multiple singular value; in such a case, an arbitrary choice of the TT-SVD in general is not block-sparse, and truncation of the TT-SVD may change the particle number. As illustrated in Sect. 6.3, in cases with singular values that are distinct but close, the associated numerical instability of singular vectors can lead to deviations in the particle number when the numerically computed TT-SVD is truncated, unless the block structure is enforced explicitly.

Remark 4.4

(Number of operations) Depending on the relation between block sizes and total MPS ranks, the block-sparse representation can allow for a substantial reduction of computational costs. As an example, we consider the TT-SVD procedure as in Algorithm 3 with \(K\gg N\). For convenience, let \(\rho _{k,n} = \#{ \mathcal {S}_{k,n} }\) if \(n \in {\mathcal {K}}_k\) and \(\rho _{k,n} = 0\) otherwise. Then the number of operations of this algorithm is dominated by the SVDs for joined blocks for each k and n, and is thus in total of order

$$\begin{aligned} \mathcal {O}\biggl ( \sum _{k=2}^K \sum _{n\in {\mathcal {K}}_k} \min \{ \rho _{k-1,n}, \rho _{k,n} + \rho _{k,n+1}\}^2 \max \{ \rho _{k-1,n}, \rho _{k,n} + \rho _{k,n+1}\} \biggr ). \end{aligned}$$

If \(\rho _{k,n} = {\bar{\rho }}\) for all \(k\) and \(n\), the corresponding total ranks are \(r_k = (N+1){\bar{\rho }}\), with the exception of the 2N lowest and highest values of k, and thus the total operation costs scale as \(\mathcal {O}(K N {\bar{\rho }}^3)\). In comparison, the TT-SVD of a full MPS representation costs

$$\begin{aligned} \mathcal {O}\biggl (\sum _{k=2}^K \min \{r_{k-1},r_k\}^2 \max \{r_{k-1},r_k\} \biggr ), \end{aligned}$$

where \(\sum _{k=2}^K \min \{r_{k-1},r_k\}^2 \max \{r_{k-1},r_k\} \approx K N^3{\bar{\rho }}^3\). In such a case, the exploitation of block sparsity thus leads to a reduction of the costs approximately by a factor \(N^2\). However, if for each k, one has \(r_k = \rho _{k,n}\) for some n, corresponding to only a single block being nonzero for each k, there is no reduction of storage or operations costs compared to the full MPS representation.
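The two operation counts can be compared directly by evaluating the sums. The sketch below is for illustration only, with hypothetical function names; for simplicity it assigns the same size \({\bar{\rho }}\) to every admissible block with \(0< k < K\), including those near the ends, where Lemma 3.7 would in fact enforce smaller blocks.

```python
def block_size(K, N, k, n, rho_bar):
    """rho_{k,n}: rho_bar for n in K_k (1 at k = 0 and k = K), and 0 otherwise."""
    if not max(0, N - K + k) <= n <= min(N, k):
        return 0
    return 1 if k in (0, K) else rho_bar

def block_cost(K, N, rho_bar):
    """Sum of min{a, b}^2 max{a, b} over the joined blocks, as in the first estimate."""
    total = 0
    for k in range(2, K + 1):
        for n in range(max(0, N - K + k), min(N, k) + 1):
            a = block_size(K, N, k - 1, n, rho_bar)
            b = block_size(K, N, k, n, rho_bar) + block_size(K, N, k, n + 1, rho_bar)
            total += min(a, b) ** 2 * max(a, b)
    return total

def full_cost(K, N, rho_bar):
    """Sum of min{r_{k-1}, r_k}^2 max{r_{k-1}, r_k} with r_k the sum of all block sizes."""
    r = [sum(block_size(K, N, k, n, rho_bar) for n in range(N + 1)) for k in range(K + 1)]
    return sum(min(r[k - 1], r[k]) ** 2 * max(r[k - 1], r[k]) for k in range(2, K + 1))

K, N, rho_bar = 40, 4, 10
print(full_cost(K, N, rho_bar) / block_cost(K, N, rho_bar))  # reduction factor

# With a single block per core (here N = 0), nothing is gained.
print(full_cost(K, 0, rho_bar) == block_cost(K, 0, rho_bar))
```

The constants hidden in the \(\mathcal {O}\)-notation are ignored here, so the printed ratio indicates the order of the reduction rather than a precise speedup.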

Finally, we want to show how the tangent space of fixed-rank MPS manifolds (see [20]) can be handled algorithmically with the block-sparse MPS format. To this end, we state the parametrization in block-sparse MPS format of an arbitrary element \(\varvec{y}\) of the tangent space at a given \(\varvec{x} \in \mathbb {R}^{n_1\times \cdots \times n_K}\) of ranks \(r = (r_1,\cdots ,r_{K-1})\). For such \(\varvec{y}\), there exist cores \(\delta Y_k\in \mathbb {R}^{r_{k-1}\times n_k\times r_{k}}\) such that

$$\begin{aligned} \varvec{y} = \sum _{k=1}^K \tau (U_1,\cdots ,U_{k-1},\delta Y_k,V_{k+1},\cdots ,V_K), \end{aligned}$$

where as in Sec. 2.3, we assume that \(\varvec{x} = \tau (\mathsf {U}) = \tau (\mathsf {V})\) with \(\mathsf {U} = (U_1,\cdots ,U_K)\) in left- and \(\mathsf {V} = (V_1,\cdots ,V_K)\) in right-orthogonal form. Note that the cores \(\delta Y_k\) necessarily have the same block-sparse structure as \(U_k\) and \(V_k\).

The projection of \(\varvec{z} = \tau (\mathsf {Z}) \in {\mathcal {F}}^K_N\) to the tangent space at \(\varvec{x}\) can be obtained by first computing the components \(\delta Y_k\) of the projection \(\varvec{y}\) and then assembling the MPS representation of \(\varvec{y}\) using (4.1). The computation of the \(\delta Y_k\) can be performed in a similar fashion as the inner product in Algorithm 1, once from left to right and once from right to left. This means that for \(k=1,\ldots ,K-1\), one recursively evaluates

$$\begin{aligned} \left( \langle \tau ^{\le }_{k,j} (\mathsf {U}),\tau ^{\le }_{k,j'} (\mathsf {Z})\rangle \right) _{j,j'} \text { and } \left( \langle \tau ^{>}_{k,j} (\mathsf {V}),\tau ^{>}_{k,j'} (\mathsf {Z})\rangle \right) _{j,j'}. \end{aligned}$$

With these quantities at hand, one obtains \(\delta Y_k\) as described in [35]. Then one has the representation

$$\begin{aligned} \varvec{y} = \left[ \begin{array}{cc} U_1&\delta Y_1 \end{array}\right] {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{cc} U_2 &{} \delta Y_2\\ 0 &{} V_2 \end{array}\right] {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{c} \delta Y_K \\ V_K \end{array}\right] , \end{aligned}$$

where the rank indices in each core can be reordered to yield the same block-sparse structure as in \(\mathsf {U}\) and \(\mathsf {V}\) with at most doubled rank parameters.

5 Matrix product operators

It is well known that the Hamiltonian (1.1) commutes with the particle number operator \(\varvec{P}\) [18, §1.3.2]:

Lemma 5.1

We have that the Hamiltonian and the particle number operator commute, that is, \(\varvec{H}\varvec{P} = \varvec{P} \varvec{H}\). Furthermore, all eigenvectors of \(\varvec{H}\) are eigenvectors of \(\varvec{P}\).

Thus \(\varvec{H}\) preserves the particle number of a state as well as its block structure. In fact, we can show that every particle number-preserving operator can be written as a sum of rank-one particle number preserving operators of the form \(\varvec{a}_{D^+}^*\varvec{a}_{D^-}\) with subsets \({D^-,D^+\subseteq \{1,\dots ,K\}}\) such that \(\#{D^-}=\#{D^+}\), where \(\varvec{a}_D = \prod _{i\in D} \varvec{a}_i\). Note that we define \(\varvec{a}_{\emptyset }\) to be the identity mapping. Furthermore, we associate each \(D \subseteq \{ 1,\ldots , K\}\) with a unit vector \(\varvec{e}_D = \varvec{a}_D^*\varvec{e}_\mathrm {vac}\), where \(\varvec{e}_\mathrm {vac}\) is the vacuum state

$$\begin{aligned} \varvec{e}_\mathrm {vac}= \left( e^0\right) ^{\otimes K} \,. \end{aligned}$$

We then have the following result, which is shown in Appendix 1.

Lemma 5.2

Let \(\varvec{B}: {\mathcal {F}}^K \rightarrow {\mathcal {F}}^K\) be a particle number-preserving operator, that is, \(\varvec{B}\) maps each eigenspace of \(\varvec{P}\) to itself. Then there exist coefficients \(v_{D^+,D^-}\in \mathbb {R}\) such that

$$\begin{aligned} \varvec{B} = \sum _{\begin{array}{c} D^+,D^- \subseteq \{1,\dots ,K\} \\ \# D^+ = \# D^- \end{array} } v_{D^+,D^-} \varvec{a}_{D^+}^*\varvec{a}_{D^-}, \end{aligned}$$

in other words, \(\varvec{B}\) can be written as a sum of rank-one particle number-preserving operators.

Linear operators on matrix product states can be represented in the MPO format (2.2) with cores of order four. We will now investigate the ranks in the MPO representation of Hamiltonians of the form (1.1), that is, particle number-preserving operators with one- and two-particle terms. As shown below, the MPO ranks of such operators grow at most quadratically with the order K of the tensor. Furthermore, since both of these operators preserve the particle number, their effect on a block-sparse MPS can be expressed only in terms of the blocks. At the end of this section, we will show that each of the summands in (5.1) describes nothing but a shift and scalar multiplication of some of the blocks. This means that the application of the Hamiltonian to a block-sparse MPS can be expressed in a matrix-free way, leading to an elegant and efficient algorithmic treatment.

5.1 Compact forms of operators

We now turn to the ranks of Hamiltonians as in (1.1) in second quantization in MPO format. As shown in this section, compared to the number of rank-one terms in the representation (1.1), one can obtain substantially reduced ranks in MPO representations. The basic mechanism behind this rank reduction is described in [6] for projected Hamiltonians in the context of DMRG solvers and, in an MPO form for full Hamiltonians similar to the one given here, in [7, 23]. For one-particle operators, MPO representations are also given in the mathematical literature [11, 21]. Here we use similar considerations to construct an explicit MPO representation of the full Hamiltonian with near-minimal ranks and with a unified treatment of one- and two-particle operators. To avoid technicalities, in what follows we assume K to be even, but this is not essential for the construction.

In preparation for the MPO representations of the one- and two-particle operators, for illustrative purposes, we start with the case of Laplace-like operators

$$\begin{aligned} \varvec{F} = \sum _{i=1}^K\lambda _i\varvec{a}_i^*\varvec{a}_i. \end{aligned}$$

We denote this operator by \(\varvec{F}\) since the Fock operator is of this form when its eigenfunctions are used as orbitals. From (5.2) and (2.5), we immediately obtain a representation of \(\varvec{F}\) of MPO rank K. However, using the components in (2.5), we can write \(\varvec{F}\) in MPO format with rank 2. To this end, we define

$$\begin{aligned} F_1 = \left[ \begin{array}{cc} I&\lambda _1 A^*A \end{array}\right] , \qquad F_k = \left[ \begin{array}{cc} I &{} \lambda _k A^*A\\ 0 &{} I \end{array}\right] , \;\; k =2,\ldots , K-1, \qquad F_K = \left[ \begin{array}{cc} \lambda _K A^*A \\ I \end{array}\right] . \end{aligned}$$

With these blocks, as in [22] one immediately verifies that the following representation holds, so that the linear scaling with respect to K in the termwise representation (5.2) is reduced to a constant rank in the MPO format.

Lemma 5.3

We have \(\varvec{F} = F_1 {{\,\mathrm{\bowtie }\,}}F_2 {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}F_K\), that is, \(\varvec{F}\) has an MPO representation of rank 2.
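The statement of Lemma 5.3 can be checked numerically for small K. The following is a minimal sketch, assuming that \(A^*A\) is the matrix \(\mathrm{diag}(0,1)\) and that cores are stored as arrays of shape (left rank, right rank, 2, 2); the helper names `bowtie`, `fock_cores` and `fock_dense` are hypothetical and chosen only for this illustration.

```python
from functools import reduce
import numpy as np

I = np.eye(2)
N = np.diag([0.0, 1.0])   # A^*A, assumed to project onto the occupied state
Z = np.zeros((2, 2))

def bowtie(F, G):
    """Strong Kronecker product of two cores of shape (r_left, r_right, m, m)."""
    out = np.zeros((F.shape[0], G.shape[1],
                    F.shape[2] * G.shape[2], F.shape[3] * G.shape[3]))
    for a in range(F.shape[0]):
        for b in range(F.shape[1]):
            for c in range(G.shape[1]):
                out[a, c] += np.kron(F[a, b], G[b, c])
    return out

def fock_cores(lam):
    """Rank-2 cores F_1, ..., F_K as defined before Lemma 5.3 (requires K >= 2)."""
    first = np.array([[I, lam[0] * N]])                           # 1 x 2 block row
    middle = [np.array([[I, l * N], [Z, I]]) for l in lam[1:-1]]  # 2 x 2 blocks
    last = np.array([[lam[-1] * N], [I]])                         # 2 x 1 block column
    return [first] + middle + [last]

def fock_dense(lam):
    """Termwise reference: sum_i lambda_i I x ... x A^*A x ... x I."""
    K = len(lam)
    total = np.zeros((2**K, 2**K))
    for i, l in enumerate(lam):
        total += l * reduce(np.kron, [N if k == i else I for k in range(K)])
    return total

lam = np.array([0.3, -1.2, 0.7, 2.0, -0.5])
F_mpo = reduce(bowtie, fock_cores(lam))[0, 0]   # contract all cores
F_ref = fock_dense(lam)
max_rank = max(core.shape[1] for core in fock_cores(lam)[:-1])
print(np.allclose(F_mpo, F_ref), max_rank)
```

The dense contraction is only feasible for small K; its purpose here is to confirm that the two-by-two block structure reproduces all K terms.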

Before we turn to the one- and two-particle operators, we define some notation that will be needed in both cases. For abbreviating blocks with repeated components, we introduce the abbreviations

$$\begin{aligned} \mathsf {I}_k = I_k \uparrow I =\left[ \begin{array}{cccc} I &{} 0 &{} \cdots &{} 0\\ 0 &{} I &{} \cdots &{} 0\\ \vdots &{}\vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} I \end{array}\right] \in \mathbb {R}^{k\times 2\times 2\times k},\quad \mathsf {S}_k = I_k \uparrow S = \left[ \begin{array}{cccc} S &{} 0 &{} \cdots &{} 0\\ 0 &{} S &{} \cdots &{} 0\\ \vdots &{}\vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} S \end{array}\right] \in \mathbb {R}^{k\times 2\times 2\times k}, \end{aligned}$$

and analogously \(\mathsf {A}_k = I_k \uparrow A\) and \(\mathsf {A}^*_k = I_k \uparrow A^*\), where we write \(I_k\) for the identity matrix of size k.

With this, we can turn to the one-particle operator \(\varvec{S}\) given by

$$\begin{aligned} \varvec{S} =\sum _{i,j=1}^Kt_{ij}\varvec{a}_i^*\varvec{a}_j, \end{aligned}$$

with symmetric coefficient matrix \((t_{ij})_{i,j=1,\ldots , K}\). Naively, \(\varvec{S}\) can be written in MPO format with rank \(K^2\), but again, one can do better: we now show that \(\varvec{S}\) in fact can be written in MPO format with rank \(K+2\).

For each \(k \in \{1,\ldots ,K\}\), we define some slices of the coefficient matrix \(T = (t_{ij})_{i=1,\ldots , K}^{j=1,\ldots , K}\), where the subscript indices correspond to rows and the superscript indices to columns:

$$\begin{aligned} W_{T,k}^1&= (t_{ik})_{i=1,\ldots , k-1},\quad&W_{T,k}^2&= (t_{kj})_{j=1,\ldots , k-1} , \\ W_{T,k}^3&= (t_{kj})^{j=K,\ldots , k+1},\quad&W_{T,k}^4&= (t_{ik})^{i=K,\ldots , k+1}. \end{aligned}$$

Furthermore, the top-right and bottom-left blocks of T are given by

$$\begin{aligned} W_{T}^5 = (t_{ij})_{i=1,\ldots , K/2}^{j=K,\ldots , K/2+1},\qquad \qquad \qquad \qquad W_{T}^6 = (t_{ij})_{j=1,\ldots K/2}^{i=K,\ldots , K/2+1}. \end{aligned}$$

We define the components

$$\begin{aligned} T_1 = \left[ \begin{array}{cccc} I&A^*&A&t_{1,1}A^*A \end{array}\right] \,,\qquad T_K = \left[ \begin{array}{cc} I \\ A \\ A^* \\ t_{K,K}A^*A \end{array}\right] , \end{aligned}$$

for \(k = 2,\ldots ,\frac{K}{2}\),

$$\begin{aligned} T_k = \left[ \begin{array}{cccccc} I &{} 0 &{}A &{} 0 &{}A^*&{} t_{k,k}A^*A\\ 0 &{} \mathsf {S}_{k-1} &{} 0 &{} 0 &{}0 &{} W_{T,k}^1 \uparrow A^*\\ 0 &{} 0 &{}0 &{} \mathsf {S}_{k-1} &{} 0 &{} W_{T,k}^2 \uparrow A\\ 0 &{} 0 &{}0 &{} 0&{}0&{} I\\ \end{array}\right] , \end{aligned}$$

and for \(k = \frac{K}{2} + 1, \ldots ,K-1\),

$$\begin{aligned} T_k = \left[ \begin{array}{c@{\qquad }c@{\qquad }c@{\qquad }c} I &{} 0 &{}0 &{} 0\\ 0 &{} \mathsf {S}_{K-k} &{} 0 &{} 0 \\ A &{} 0 &{}0 &{} 0\\ 0 &{} 0 &{} \mathsf {S}_{K-k} &{} 0\\ A^* &{} 0 &{} 0 &{} 0\\ t_{k,k}A^*A &{} W_{T,k}^3\uparrow A^* &{} W_{T,k}^4\uparrow A&{} I\\ \end{array}\right] . \end{aligned}$$

Finally, let

$$\begin{aligned} M_T = \begin{pmatrix} 0 &{} 0 &{} 0 &{} 1\\ 0 &{} 0 &{}W_{T}^6 &{} 0 \\ 0 &{} W_{T}^5 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 0 \\ \end{pmatrix}. \end{aligned}$$

This allows us to state the one-particle operator explicitly and with (near-)minimal ranks. The same rank bounds can also be extracted from the alternative MPO representation in [11, Thm. 4.2] (see also [21, Lemma 3.2]); the construction we describe here, however, also serves as a preparation for our similar approach to the two-particle case.

Theorem 5.4

We have

$$\begin{aligned} \varvec{S} = T_1{{\,\mathrm{\bowtie }\,}}T_2{{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}T_{ K/2} M_T {{\,\mathrm{\bowtie }\,}}T_{ K/2+1} {{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}T_K. \end{aligned}$$

Furthermore, the MPO rank of \(\varvec{S}\) is bounded by \(K+2\). If for some \(d \in \mathbb {N}_0\), we have \( t_{ij} = 0\) whenever \(|i-j| > d\), then the MPO ranks of \(\varvec{S}\) are bounded by \(2d+2\).

For the proof, we proceed as follows: We divide the claim into two cases \(k \le \frac{K}{2}\) and \(k > \frac{K}{2}\) and show by induction over k that the rank of \(T_k\) for \(k\le \frac{K}{2}\) can be bounded by \(2+2k\). Consequently, for \(k=\frac{K}{2}\) we find that the rank of \(\varvec{S}\) can be bounded by \(K+2\). This is also the bound for the rank of the matrix \(M_T\). For the sparse coefficient matrix we directly consider \(M_T\), since the rank is maximized at the center of the representation, where the rank of \(W_{T}^5\) and the rank of \(W_{T}^6\) can be bounded by d in both cases. Thus the rank of \(M_T\) is bounded by \(2d+2\). The details of the proof are given in Appendix 2.
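The rank bounds of Theorem 5.4 can be illustrated for small K by computing the ranks of the operator unfoldings, which coincide with the minimal MPO ranks and are therefore bounded by the ranks of any explicit representation. The sketch below assumes the Jordan-Wigner form \(\varvec{a}_i = S \otimes \cdots \otimes S \otimes A \otimes I \otimes \cdots \otimes I\) with \(S = \mathrm{diag}(1,-1)\) for the matrices in (2.5); this convention and the helper names are assumptions made for illustration.

```python
from functools import reduce
import numpy as np

I = np.eye(2)
S = np.diag([1.0, -1.0])
A = np.array([[0.0, 1.0], [0.0, 0.0]])  # annihilation matrix (assumed convention)

def annihilator(i, K):
    """a_i = S x ... x S x A x I x ... x I for the 0-based site i (assumed form)."""
    return reduce(np.kron, [S] * i + [A] + [I] * (K - i - 1))

def one_particle_operator(T):
    """Dense matrix of sum_{i,j} t_ij a_i^* a_j."""
    K = T.shape[0]
    a = [annihilator(i, K) for i in range(K)]
    return sum(T[i, j] * a[i].T @ a[j] for i in range(K) for j in range(K))

def mpo_ranks(op, K):
    """Ranks of the unfoldings separating sites 1..k from sites k+1..K."""
    tensor = op.reshape((2,) * (2 * K))  # axes: row bits r_1..r_K, column bits c_1..c_K
    ranks = []
    for k in range(1, K):
        left = list(range(k)) + list(range(K, K + k))
        right = list(range(k, K)) + list(range(K + k, 2 * K))
        unfolding = tensor.transpose(left + right).reshape(4**k, 4**(K - k))
        ranks.append(int(np.linalg.matrix_rank(unfolding)))
    return ranks

K = 6
rng = np.random.default_rng(0)
T = rng.standard_normal((K, K))
T = T + T.T                                   # symmetric coefficient matrix
ranks_full = mpo_ranks(one_particle_operator(T), K)

d = 1                                         # bandwidth: t_ij = 0 for |i - j| > d
band = np.abs(np.subtract.outer(np.arange(K), np.arange(K))) <= d
ranks_band = mpo_ranks(one_particle_operator(T * band), K)
print(ranks_full, ranks_band)
```

For a generic symmetric T one expects the largest rank at the central cut, in line with the role of \(M_T\) in the proof.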

The case of two-electron operators \(\varvec{D}\) given by

$$\begin{aligned} \varvec{D} = \sum _{i_1,i_2,j_1,j_2=1}^K v_{i_1 i_2 j_1 j_2}\varvec{a}_{i_1}^* \varvec{a}_{i_2}^* \varvec{a}_{j_1} \varvec{a}_{j_2} \end{aligned}$$

is more involved but can be dealt with quite analogously. We briefly note that due to the anticommutation relations (1.2), one only needs to sum over \(i_1 < i_2\), \(j_1 < j_2\), which reduces the number of terms: By grouping together

$$\begin{aligned} {\tilde{v}}_{i_1 i_2 j_1 j_2} =&{\left\{ \begin{array}{ll} v_{i_1 i_2 j_1 j_2} + v_{i_2 i_1 j_2 j_1} - v_{i_2 i_1 j_1 j_2} - v_{i_1 i_2 j_2 j_1}, &{} i_1< i_2, j_1 < j_2, \\ 0 ,&{} \text {otherwise,} \end{array}\right. } \end{aligned}$$

we obtain

$$\begin{aligned} \varvec{D} = \sum _{i_2,j_2=1}^K \sum _{i_1=1}^{i_2-1} \sum _{j_1=1}^{j_2-1} {\tilde{v}}_{i_1 i_2 j_1 j_2} \varvec{a}_{i_1}^* \varvec{a}_{i_2}^* \varvec{a}_{j_1} \varvec{a}_{j_2}. \end{aligned}$$

We denote the tensor grouping the coefficients of V by \({\tilde{V}}=\left( {\tilde{v}}_{i_1 i_2 j_1 j_2}\right) _{i_1 i_2 j_1 j_2=1}^K\). Again, the Kronecker-rank of this operator is \(\left( {\begin{array}{c}K\\ 2\end{array}}\right) ^2 = O(K^4)\), so naively \(\varvec{D}\) could be written with MPO rank \(\left( {\begin{array}{c}K\\ 2\end{array}}\right) ^2\). But we can do better. With the help of the matrices in (2.5) we can write \(\varvec{D}\) in MPO format with rank \(\frac{1}{2} K^2 + \frac{3}{2}{K} + 2\).
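The grouping of coefficients into \({\tilde{V}}\) can be verified on a small example. The following sketch compares the dense operator built from all \(K^4\) coefficients with the one built from the \(\binom{K}{2}^2\) grouped coefficients. It assumes the Jordan-Wigner form of the annihilation operators with \(S = \mathrm{diag}(1,-1)\); the function names are hypothetical.

```python
from functools import reduce
from itertools import product
import numpy as np

I = np.eye(2)
S = np.diag([1.0, -1.0])
A = np.array([[0.0, 1.0], [0.0, 0.0]])  # annihilation matrix (assumed convention)

def annihilator(i, K):
    """a_i = S x ... x S x A x I x ... x I for the 0-based site i (assumed form)."""
    return reduce(np.kron, [S] * i + [A] + [I] * (K - i - 1))

def antisymmetrize(v):
    """Grouped coefficients v~, nonzero only for i1 < i2 and j1 < j2."""
    K = v.shape[0]
    vt = (v + v.transpose(1, 0, 3, 2)      # + v[i2, i1, j2, j1]
            - v.transpose(1, 0, 2, 3)      # - v[i2, i1, j1, j2]
            - v.transpose(0, 1, 3, 2))     # - v[i1, i2, j2, j1]
    idx = np.arange(K)
    upper = idx[:, None] < idx[None, :]    # upper[p, q] is True iff p < q
    mask = upper[:, :, None, None] & upper[None, None, :, :]
    return np.where(mask, vt, 0.0)

def two_particle_operator(coeff):
    """Dense matrix of sum coeff[i1,i2,j1,j2] a_i1^* a_i2^* a_j1 a_j2."""
    K = coeff.shape[0]
    a = [annihilator(i, K) for i in range(K)]
    D = np.zeros((2**K, 2**K))
    for i1, i2, j1, j2 in product(range(K), repeat=4):
        c = coeff[i1, i2, j1, j2]
        if c != 0.0:
            D += c * a[i1].T @ a[i2].T @ a[j1] @ a[j2]
    return D

K = 4
rng = np.random.default_rng(1)
v = rng.standard_normal((K, K, K, K))
v_tilde = antisymmetrize(v)
D_full = two_particle_operator(v)
D_reduced = two_particle_operator(v_tilde)
n_terms = int(np.count_nonzero(v_tilde))
print(np.allclose(D_full, D_reduced), n_terms)
```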

As before we need to extract different matrix slices from \({\tilde{V}}\), where again subscript indices correspond to rows and superscript indices to columns:

$$\begin{aligned} W_{V,k}^1&= ({\tilde{v}}_{i_1kkj_2})_{i_1=1,\ldots , k-1}^{j_2=k+1,\ldots , K} \quad&W_{V,k}^2&= ({\tilde{v}}_{ki_2kj_2})_{j_2=1,\ldots , k-1}^{i_2=k+1,\ldots , K}\\ W_{V,k}^3&= ({\tilde{v}}_{i_1kj_1j_2})_{i_1=1,\ldots , k-1;\,j_1=1,\ldots , k-1}^{j_2=k+1\cdots K} \quad&W_{V,k}^4&= ({\tilde{v}}_{i_1i_2j_1k})_{i_1=1,\ldots , k-1;\,j_1=1,\ldots , k-1}^{i_2=k+1,\ldots , K}\\ W_{V,k}^5&= ({\tilde{v}}_{i_1kj_1k})_{i_1=1,\ldots , k-1 ;\, j_1=1,\ldots , k-1} \quad&W_{V,k}^6&= ({\tilde{v}}_{i_1i_2kj_2})_{i_1=1,\ldots , i_2-1;\,i_2=2,\ldots , k-1}^{j_2=k+1,\ldots , K}\\ W_{V,k}^7&= ({\tilde{v}}_{ki_2j_1j_2})_{j_1=1,\ldots , j_2-1 ;\,j_2=2,\ldots , k-1}^{i_2=k+1,\ldots , K}\quad&W_{V,k}^8&= ({\tilde{v}}_{ki_2j_1j_2})_{i_1=1,\ldots , K/2 ;\, j_1=1,\ldots , K/2}^{i_2=K,\ldots , K/2+1 ;\, j_2=K,\ldots , K/2+1}\\ W_{V,k}^9&= ({\tilde{v}}_{ki_2j_1j_2})_{j_1=2,\ldots , K/2;\, j_2=1,\ldots , j_1-1}^{i_1=K,\ldots , i_2+1 ;\, i_2=K-1,\ldots , K/2+1}\quad&W_{V,k}^{10}&= ({\tilde{v}}_{ki_2j_1j_2})_{i_1=1,\ldots , i_2-1 ;\, i_2=2,\ldots , K/2}^{j_1=K-1,\ldots , K/2+1; \, j_2=K,\ldots , j_1+1} \end{aligned}$$

For \(k = 1,\ldots , \frac{K}{2}\) let us define the blocks

$$\begin{aligned} V_k^{1,1}&= \left[ \begin{array}{ccccccccccccc} I &{} 0 &{} A^* &{} 0 &{} A&{}0&{}A^*A&{}0&{}0&{}0&{}0&{}0&{}0\\ 0 &{} \mathsf {S}_{k-1} &{} 0 &{} 0 &{} 0&{}0&{}0&{}\mathsf {A}_{k-1}&{}0&{}0&{}\mathsf {A}_{k-1}^*&{}0&{}0\\ 0 &{} 0 &{} 0 &{} \mathsf {S}_{k-1} &{} 0&{}0&{}0&{}0&{}\mathsf {A}_{k-1}^*&{}0&{}0&{}0&{}\mathsf {A}_{k-1}\\ 0&{}0&{}0&{}0&{}0 &{} \mathsf {I}_{k^2}&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0 &{}0&{}0&{}0&{}0&{}\mathsf {I}_{\left( {\begin{array}{c}k-1\\ 2\end{array}}\right) }&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0 &{}0&{}0&{}0&{}0&{}0&{}0&{}\mathsf {I}_{\left( {\begin{array}{c}k-1\\ 2\end{array}}\right) }&0 \end{array}\right] \end{aligned}$$

as well as

$$\begin{aligned} V_k^{1,2}&= \left[ \begin{array}{ccc} 0 &{}0 &{}0\\ W_{V,k}^1\uparrow A^*A &{}0&{}0\\ 0&{}W_{V,k}^2\uparrow A^*A &{} 0\\ W_{V,k}^3\uparrow A^*&{}W_{V,k}^4\uparrow A&{}W_{V,k}^5\uparrow A^*A\\ W_{V,k}^6\uparrow A&{}0&{}0\\ 0&{}W_{V,k}^7\uparrow A^*&{}0\\ \end{array}\right] , \qquad \qquad V_k^{2,2} =\left[ \begin{array}{ccc} 0&{}0&{}A\\ \mathsf {S}_{K-k}&{}0&{}0 \\ 0&{}0&{}A^*\\ 0&{}\mathsf {S}_{K-k}&{}0\\ 0&{}0&{}I \end{array}\right] \,. \end{aligned}$$

With these blocks we have

$$\begin{aligned} V_1 = V_1^{1,1} \quad V_2 = \left[ \begin{array}{cc} V_2^{1,1}&V_2^{1,2}\end{array}\right] \quad \mathrm {and} \quad V_k = \left[ \begin{array}{cc} V_k^{1,1} &{} V_k^{1,2}\\ 0 &{}V_k^{2,2} \end{array}\right] , \, k = 3,\ldots , \frac{K}{2}. \end{aligned}$$

Furthermore, we set

where there are \(\frac{K}{2}+1\) ones on each of the two antidiagonals above and below \(V_{\mathrm {mid}}\). Finally, for \(k = \frac{K}{2}+1,\ldots , K\), we analogously obtain

$$\begin{aligned} V_K = V_K^{1,1}, \quad V_{K-1} = \left[ \begin{array}{ccc} V_{K-1}^{1,1} \\ V_{K-1}^{2,1}\end{array}\right] \quad \mathrm {and} \quad V_k = \left[ \begin{array}{ccc} V_k^{1,1} &{} 0\\ V_k^{2,1} &{}V_k^{2,2} \end{array}\right] , \, k = \frac{K}{2} +1,\ldots , K-2. \end{aligned}$$

where \(\left( V_k^{2,1}\right) ^T\) is similar to \(V_k^{1,2}\) with modified coefficients.

With the necessary notation out of the way, we immediately state the MPO representation of the two-particle operator with near-minimal ranks.

Theorem 5.5

We have

$$\begin{aligned} \varvec{D} = V_1{{\,\mathrm{\bowtie }\,}}V_2{{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}V_{K/2} M_V {{\,\mathrm{\bowtie }\,}}V_{ K/2 + 1}{{\,\mathrm{\bowtie }\,}}\cdots {{\,\mathrm{\bowtie }\,}}V_K, \end{aligned}$$

implying that \(\varvec{D}\) has an MPO representation of rank \(\frac{1}{2} {K^2}+ \frac{3}{2} {K}+2\). If there exists \(d\in \mathbb {N}_0\) such that \(v_{i_1 i_2 j_1 j_2} = 0\) whenever

$$\begin{aligned} \max \{ |i_1-i_2|,|i_1-j_1|,|i_1-j_2|,|i_2-j_1|,|i_2-j_2|,|j_1-j_2| \} > d, \end{aligned}$$

then the MPO ranks of \(\varvec{D}\) are bounded by \(d^2+3d-1\) if d is odd and by \(d^2+3d-2\) if d is even.


We can proceed as in the proof of Theorem 5.4. Again, a detailed proof can be found in Appendix 3. \(\square \)

Remark 5.6

It is possible to incorporate the one-electron operator into the two-electron operator, such that the MPO ranks of \(\varvec{S}+\varvec{D}\) are also bounded by \(\frac{1}{2} {K^2} + \frac{3}{2} {K} +2\). Note that the stated rank bounds are not sharp for the leading and trailing cores, where further reductions are possible with additional technical effort (see also Sect. 6.4).

5.2 Matrix-free operations on block structures

Corollary 3.4 implies that particle number-preserving operators must also preserve the block structure. In other words, if \(\varvec{x} \in {\mathcal {F}}^K_N\) and if \(\varvec{B}\) is any particle number-preserving operator, then \(\varvec{y} := \varvec{B} \varvec{x} \in {\mathcal {F}}^K_N\) has a representation with block structure according to Corollary 3.4. However, if \(\mathsf {B}\) is an MPO representation of \(\varvec{B}\) as derived in Sect. 5.1 and \(\varvec{x} = \tau (\mathsf {X})\) with block-sparse \(\mathsf {X}\), this leaves open the question of whether we can directly obtain the block-sparse representation of \(\varvec{y}\) from the standard representation \(\mathsf {Y} := \mathsf {B} \bullet \mathsf {X}\) of the matrix-vector product, or whether additional transformations of \(\mathsf {Y}\) are required to extract the block structure. We now describe how the blocks of \(\varvec{y}\) can be obtained directly by replacing the component matrices I, S, A, \(A^*\), \(A^* A\) in \(\mathsf {B}\) by certain matrix-free operations on the blocks of \(\mathsf {X}\). Furthermore, these operations are performed entirely componentwise, that is, each component can be computed separately from the others and even partial evaluations are possible.

It turns out that each summand in (5.1) acts on the block-sparse MPS by shifting and deleting some of the blocks. This can be visualized by considering the one-particle part of the Hamiltonian, specifically the case where \(i = j\):

$$\begin{aligned} \varvec{a}_i^* \varvec{a}_i = I \otimes \dots \otimes A^* A \otimes \dots \otimes I. \end{aligned}$$

Since this operator has Kronecker rank one, each matrix in the product acts only on the corresponding component in the block MPS. Clearly, the identity matrices leave their components and their respective block structure unchanged. Only the matrix

$$\begin{aligned} A^*A = \begin{pmatrix} 0 &{} 0 \\ 0 &{} 1 \end{pmatrix} \end{aligned}$$

has the immediate effect of assigning zero to all unoccupied blocks,


Thus in this case, the particle number is preserved locally at orbital i and therefore, the block structure remains otherwise unchanged.
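As a simplified illustration of this matrix-free action, the sketch below applies \(\varvec{a}_i^* \varvec{a}_i\) to a generic (not block-sparse) MPS by zeroing the unoccupied slice of the i-th core and compares the result with the dense operator. The core layout (left rank, 2, right rank) and the function names are assumptions for this example; in the block-sparse format the same operation acts on the blocks of the unoccupied layer.

```python
from functools import reduce
import numpy as np

def mps_to_vector(cores):
    """Contract cores of shape (r_prev, 2, r_next) into a vector of length 2**K."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape(-1)

def apply_number_operator(cores, i):
    """Matrix-free a_i^* a_i: set the unoccupied slice of core i (0-based) to zero."""
    new_cores = [c.copy() for c in cores]
    new_cores[i][:, 0, :] = 0.0
    return new_cores

K, r = 5, 3
rng = np.random.default_rng(2)
dims = [1] + [r] * (K - 1) + [1]
cores = [rng.standard_normal((dims[k], 2, dims[k + 1])) for k in range(K)]

i = 2
I = np.eye(2)
N = np.diag([0.0, 1.0])                      # A^*A as given in the text
dense_op = reduce(np.kron, [N if k == i else I for k in range(K)])

y_dense = dense_op @ mps_to_vector(cores)    # reference via the 2^K x 2^K matrix
y_free = mps_to_vector(apply_number_operator(cores, i))
print(np.allclose(y_dense, y_free))
```

No matrix is formed in `apply_number_operator`; the ranks of the representation are unchanged.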

Additional difficulties appear, however, when \(i\ne j\), since the particle number is then conserved only by the combination of operations on different modes. Let us first consider \(i<j\),

$$\begin{aligned} \varvec{a}_i^* \varvec{a}_j = I \otimes \dots \otimes A^* \otimes S \otimes \dots \otimes S \otimes A \otimes \dots \otimes I. \end{aligned}$$

To avoid technicalities, for the moment we assume \(N< i< j < K - N + 1\), corresponding to the generic case where all blocks appear in each core. Again, the identity matrices leave everything unchanged. The creation matrix \(A^*\) replaces the occupied layer \(X_i^{\{1\}}\) by the unoccupied layer \(X_i^{\{0\}}\). However, this clearly violates the block structure, because occupied blocks should only be located on the off-diagonal and because \(N \notin ({\mathcal {K}}_i - 1)\). This inconsistency can only be resolved by noting that the added particle will be removed further down in the j-th position of the tensor. Additionally, we have to take into account that a particle was added to the left of all following components, thus increasing the particle count n by one in each block. The solution to the block structure violation therefore lies in shifting the corresponding blocks and deleting the ones that violate particle number counts. We summarize the case \(N< i< j < K - N + 1\) as follows:

where application of \(A^*\) corresponds to the block operations


application of A to the block operations


and for \(i< k < j\), applying S amounts to the block operations


The case where \(j < i\) can be dealt with analogously, but with the opposite shift because a particle gets removed on the left and all particle counts n have to be decreased by one until a particle gets added again in component i. This means that we have to distinguish the different cases in the implementation of, for instance, the action of the matrix S.
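The Kronecker form of \(\varvec{a}_i^* \varvec{a}_j\) for \(i<j\) and the role of the particle count can be checked on dense matrices. The sketch below assumes the Jordan-Wigner form of \(\varvec{a}_i\) with \(S = \mathrm{diag}(1,-1)\) and builds the particle number operator as \(\varvec{P} = \sum_i \varvec{a}_i^* \varvec{a}_i\); it confirms that the combined operator commutes with \(\varvec{P}\), whereas the creation operator alone raises the particle count by one, which is the reason for the shift of blocks described above.

```python
from functools import reduce
import numpy as np

I = np.eye(2)
S = np.diag([1.0, -1.0])
A = np.array([[0.0, 1.0], [0.0, 0.0]])  # annihilation matrix (assumed convention)
A_star = A.T                             # creation matrix

def annihilator(i, K):
    """a_i = S x ... x S x A x I x ... x I for the 0-based site i (assumed form)."""
    return reduce(np.kron, [S] * i + [A] + [I] * (K - i - 1))

K = 6
a = [annihilator(k, K) for k in range(K)]
P = sum(ak.T @ ak for ak in a)           # particle number operator

i, j = 1, 4                              # 0-based indices with i < j
hop = a[i].T @ a[j]

# I x ... x A^* x S x ... x S x A x ... x I as in the displayed formula
factors = [I] * i + [A_star] + [S] * (j - i - 1) + [A] + [I] * (K - j - 1)
kron_form = reduce(np.kron, factors)

commutator = P @ hop - hop @ P                 # vanishes: particle number preserved
creation_shift = P @ a[i].T - a[i].T @ P       # equals a_i^*: count raised by one
print(np.allclose(hop, kron_form), np.allclose(commutator, 0.0))
```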

Remark 5.7

Some further technicalities need to be taken into account in an implementation:

  (i) The sizes of blocks that are set to zero in the above operations are dictated by the consistency of the representation: zero blocks of nontrivial size need to be kept, whereas redundancies due to zero blocks with a vanishing dimension need to be removed.

  (ii) For the border cases, such as \(i < N\) or \(j > K-N+1\), certain blocks do not occur. The accordingly modified sets \({\mathcal {K}}_k\) and the corresponding differences in the blocks that are present lead to modifications in the above operations.

Figure 1 shows the different cases in the implementation, including the blocks that need to be deleted in order to avoid irregularities. Here \(A^*A\) corresponds to the block operation (5.4); \(A^*_\ell \), \(A_r\) and \(S^+\) correspond to (5.5), (5.6), and (5.7), respectively; and \(A^*_r\), \(A_\ell \) and \(S^-\) are the analogous operations with opposite shifts. The figure assumes the generic block structure for \(i,j \in \{N+1,\ldots ,K-N\}\) and thus needs to be modified for the border cases mentioned in Remark 5.7(ii).

Fig. 1 An illustration of the matrix-free operations for the one-particle operator

In order to apply tensor representations of general operators of the form \(\varvec{a}_i^* \varvec{a}_j\) with a result given in the same block structure, the components need to be replaced by block operations with particle number semantics as in Fig. 1, depending on their position \(k = 1,\ldots ,K\):

$$\begin{aligned} A&\rightarrow {\left\{ \begin{array}{ll} A_l &{}\quad \text {if } k = j< i, \\ A_r &{}\quad \text {if } k = j> i, \end{array}\right. }&A^*&\rightarrow {\left\{ \begin{array}{ll} A_l^* &{}\quad \text {if } k = i< j, \\ A_r^* &{}\quad \text {if } k = i> j, \end{array}\right. } \\ I&\;\rightarrow \; {\left\{ \begin{array}{ll} I_l &{}\quad \text {if } k< \min (i,j), \\ I_r &{}\quad \text {if } k > \max (i,j), \end{array}\right. }&S&\rightarrow {\left\{ \begin{array}{ll} S^+ &{}\quad \text {if } i< k< j, \\ S^- &{}\quad \text {if } j< k < i. \end{array}\right. } \end{aligned}$$

In addition, we have the k-independent replacement of \(A^*A\) by the operation (5.4) and replace zero components by the operation Z that assigns zero to all blocks.
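The replacement rules can be written as a small lookup. The following sketch returns, for \(\varvec{a}_i^* \varvec{a}_j\) with 1-based indices, the symbol with particle number semantics at each position k; the string labels and the function name `block_symbols` are hypothetical and only mirror the notation of the rules above.

```python
def block_symbols(i, j, K):
    """Symbols with particle number semantics for a_i^* a_j at positions 1..K."""
    symbols = []
    for k in range(1, K + 1):
        if k < min(i, j):
            symbols.append("I_l")        # left of both operators
        elif k > max(i, j):
            symbols.append("I_r")        # right of both operators
        elif i == j:
            symbols.append("A*A")        # here k == i == j, no shift needed
        elif k == i:
            symbols.append("A*_l" if i < j else "A*_r")
        elif k == j:
            symbols.append("A_r" if j > i else "A_l")
        else:
            symbols.append("S+" if i < j else "S-")   # strictly between i and j
    return symbols

for i, j in [(2, 2), (2, 4), (4, 2)]:
    print(i, j, block_symbols(i, j, 5))
```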

These operations can be performed efficiently by exchange, removal or sign changes of blocks in the tensor representation. One can proceed analogously for two-particle operators \(\varvec{a}_{i_1}^* \varvec{a}_{i_2}^* \varvec{a}_{j_1} \varvec{a}_{j_2}\), as shown in Fig. 2, where again one needs to make appropriate adjustments for the border cases. In a similar manner, this can be generalized to interactions of three or more particles.

Fig. 2 An illustration of the matrix-free operations for the two-particle operator, assuming \(i_1<i_2\) and \(j_1 < j_2\) without loss of generality

5.3 Automatic rank reduction

The particle number semantics of the block operations according to Figs. 1 and 2 are compatible with forming linear combinations of operators. In particular, the full one- and two-particle operator representations constructed in Sect. 5.1 can in the same manner be applied entirely in terms of block operations: replacing \(A^* A\), \(A^*\), A, I, S, and 0 by the respective k-dependent block operations according to Figs. 1, 2 again leads to a consistent representation, and its application directly produces the correct block structure. For this, let \(\mathsf {S}_{\mathrm {b}}\) and \(\mathsf {D}_{\mathrm {b}}\) be the resulting representations of \(\varvec{S}\) and \(\varvec{D}\) with particle number semantics.

Proposition 5.8

Let \(\varvec{x} = \tau (\mathsf {X})\) be a block-sparse MPS representation, then \(\mathsf {U} = \mathsf {S}_{\mathrm {b}} \bullet \mathsf {X}\) and \(\mathsf {V} = \mathsf {D}_{\mathrm {b}} \bullet \mathsf {X}\) are block-sparse MPS representations with \(\varvec{S} \varvec{x} = \tau (\mathsf {U})\), \(\varvec{D} \varvec{x} = \tau (\mathsf {V})\).


By inspection of the proofs of Thm. 5.4 and 5.5, one finds that each rank index in the MPO representations constructed there corresponds to precisely one case in Fig. 1 or 2, respectively. The resulting representations operating on blocks thus directly yield a consistent block-sparse representation of the matrix-vector products. \(\square \)

The addition of such symbolic MPO representations can be done analogously to the addition of MPS and block-sparse MPS. In each core, these symbolic representations are composed of scalar multiples of the elementary matrix-free block operations discussed above, where the corresponding scalars can be collected in a separate matrix of coefficients. We say that a collection of columns of cores in rank-wise representation of matrix-free operators are linearly dependent if they contain the same symbols but the coefficient matrix is rank-deficient. In this case, the operator ranks can be reduced, provided that the corresponding rows in the next component are compatible, meaning that entrywise, their symbols can only differ if all but one of them are the zero symbol Z, which in turn means that they can be added. The resulting algorithm can be performed from left to right and the procedure can be repeated from right to left, where linearly dependent rows can be merged, see Algorithm 5. This automatically reduces the ranks of the sums of operators in the above symbolic representation. It can be very useful if the rank-reduced format for an operator is not known. In fact, as we show experimentally in Sect. 6.4, automatic rank reduction can even improve upon the operator representation derived in Sect. 5.1.

Example 5.9

Let \(K = 5\). The operators \(\varvec{a}_2^* \varvec{a}_2\), \(\varvec{a}_2^* \varvec{a}_4\) and \(\varvec{a}_4^* \varvec{a}_2\) can be represented with particle number semantics by

$$\begin{aligned} \varvec{a}_2^* \varvec{a}_2&= [ I_l ] {{\,\mathrm{\bowtie }\,}}[ A^* A ] {{\,\mathrm{\bowtie }\,}}[ I_r ] {{\,\mathrm{\bowtie }\,}}[ I_r ] {{\,\mathrm{\bowtie }\,}}[ I_r ], \\ \varvec{a}_2^* \varvec{a}_4&= [ I_l ] {{\,\mathrm{\bowtie }\,}}[ A^*_l ] {{\,\mathrm{\bowtie }\,}}[ S^+ ] {{\,\mathrm{\bowtie }\,}}[ A_r ] {{\,\mathrm{\bowtie }\,}}[ I_r ], \\ \varvec{a}_4^* \varvec{a}_2&= [ I_l ] {{\,\mathrm{\bowtie }\,}}[ A_l ] {{\,\mathrm{\bowtie }\,}}[ S^- ] {{\,\mathrm{\bowtie }\,}}[ A^*_r ] {{\,\mathrm{\bowtie }\,}}[ I_r ]. \end{aligned}$$

The sum of the operators is given by

$$\begin{aligned}&\varvec{a}_2^* \varvec{a}_2 + \varvec{a}_2^* \varvec{a}_4 + \varvec{a}_4^* \varvec{a}_2 \\&\quad =\left[ \begin{array}{ccc} I_l&I_l&I_l \end{array}\right] {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{ccc} A^*A &{} Z &{} Z \\ Z &{} A^*_l &{} Z \\ Z &{} Z &{} A_l \end{array}\right] {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{ccc} I_r &{} Z &{} Z \\ Z &{} S^+ &{} Z \\ Z &{} Z &{} S^- \end{array}\right] {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{ccc} I_r &{} Z &{} Z \\ Z &{} A_r &{} Z \\ Z &{} Z &{} A^*_r \end{array}\right] {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{cc} I_r \\ I_r \\ I_r \end{array}\right] . \end{aligned}$$

After the rank reduction, the operator has the form

$$\begin{aligned} \varvec{a}_2^* \varvec{a}_2 + \varvec{a}_2^* \varvec{a}_4 + \varvec{a}_4^* \varvec{a}_2 = \left[ \begin{array}{cc} I_l \end{array}\right] {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{ccc} A^*A&A^*_l&A_l \end{array}\right] {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{ccc} I_r &{} Z &{} Z \\ Z &{} S^+ &{} Z \\ Z &{} Z &{} S^- \end{array}\right] {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{cc} I_r \\ A_r \\ A^*_r \end{array}\right] {{\,\mathrm{\bowtie }\,}}\left[ \begin{array}{cc} I_r \end{array}\right] . \end{aligned}$$
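Example 5.9 can be checked numerically when the symbols are replaced by the underlying \(2\times 2\) matrices, dropping the annotations l, r and \(\pm \) that only matter for the block operations. The sketch below assumes the Jordan-Wigner form of \(\varvec{a}_i\) with \(S = \mathrm{diag}(1,-1)\) and stores cores as arrays of shape (left rank, right rank, 2, 2); the helper `bowtie` is a hypothetical name for the strong Kronecker product.

```python
from functools import reduce
import numpy as np

I = np.eye(2)
S = np.diag([1.0, -1.0])
A = np.array([[0.0, 1.0], [0.0, 0.0]])  # annihilation matrix (assumed convention)
As = A.T                                 # creation matrix A^*
N = As @ A                               # A^*A
Z = np.zeros((2, 2))

def bowtie(F, G):
    """Strong Kronecker product of two cores of shape (r_left, r_right, m, m)."""
    out = np.zeros((F.shape[0], G.shape[1],
                    F.shape[2] * G.shape[2], F.shape[3] * G.shape[3]))
    for a in range(F.shape[0]):
        for b in range(F.shape[1]):
            for c in range(G.shape[1]):
                out[a, c] += np.kron(F[a, b], G[b, c])
    return out

def annihilator(i, K):
    """a_i = S x ... x S x A x I x ... x I for the 0-based site i (assumed form)."""
    return reduce(np.kron, [S] * i + [A] + [I] * (K - i - 1))

middle = np.array([[I, Z, Z], [Z, S, Z], [Z, Z, S]])

# Direct sum of the three rank-one representations (all ranks equal to 3).
full = [np.array([[I, I, I]]),
        np.array([[N, Z, Z], [Z, As, Z], [Z, Z, A]]),
        middle,
        np.array([[I, Z, Z], [Z, A, Z], [Z, Z, As]]),
        np.array([[I], [I], [I]])]

# Representation after rank reduction (ranks 1, 3, 3, 1).
reduced = [np.array([[I]]),
           np.array([[N, As, A]]),
           middle,
           np.array([[I], [A], [As]]),
           np.array([[I]])]

K = 5
a = [annihilator(k, K) for k in range(K)]
reference = a[1].T @ a[1] + a[1].T @ a[3] + a[3].T @ a[1]   # orbitals 2 and 4, 0-based

op_full = reduce(bowtie, full)[0, 0]
op_reduced = reduce(bowtie, reduced)[0, 0]
ranks_reduced = [core.shape[1] for core in reduced[:-1]]
print(np.allclose(op_full, reference), np.allclose(op_reduced, reference))
```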

6 Numerical aspects

This section serves as an outlook on numerical solvers for the eigenvalue problem \( \varvec{H} \varvec{x} = \lambda \varvec{x} \) with the additional constraint \(\varvec{P} \varvec{x} = N \varvec{x}\) implemented by keeping \(\varvec{x}\) in block-sparse format. We comment on standard iterative solvers and discuss their relation to this representation format. Furthermore, we give an example of the effect of enforcing block sparsity on the numerical stability of particle numbers with respect to TT-SVD truncation. Finally, we show that the ranks of the one- and two-particle operators, as discussed in Sect. 5.1, are indeed near-optimal.

6.1 Iterative methods with fixed and variable ranks

A standard method for the computation with MPS is the DMRG algorithm. All modern implementations of this method (see, for instance, [13, 17, 27, 31]) exploit the block sparsity in some form. For the sake of completeness, we give a brief overview of both the one-site and the two-site DMRG. We then turn to methods using global eigenvalue residuals. These methods are nonstandard in physical computations, but may become competitive when block sparsity is taken into account. A detailed numerical comparison of the methods will be the subject of further research.

6.1.1 One-site DMRG/ALS

The one-site DMRG or ALS algorithm [19] optimizes one component of the MPS \(\varvec{x}\) at a time. With the appropriate orthogonalization, each subiteration consists of an optimization step on the linear part of the fixed-rank manifold, which coincides with its own tangent space. As such, the one-site DMRG can be formulated as a tangent space procedure: Let \(\varvec{x}_{k,\ell }\) be the current iterate after \(\ell \) sweeps and the k-th subiteration. That is, we have previously optimized the k-th component and orthogonalized accordingly. Now, we optimize the \((k+1)\)-st component by minimizing the energy

$$\begin{aligned} \varvec{E}_{k,\ell }(\varvec{x}_{k+1,\ell }) = {\frac{\langle \varvec{x}_{k+1,\ell },\varvec{Q}_{\varvec{x}_{k,\ell }}^{k+1,1}\varvec{H} \varvec{Q}_{\varvec{x}_{k,\ell }}^{k+1,1}\varvec{x}_{k+1,\ell } \rangle }{\langle \varvec{x}_{k+1,\ell }, \varvec{x}_{k+1,\ell }\rangle }}. \end{aligned}$$

If \(k = K\), we can go back to \(k = 1\) or do the sweep in reverse. We note that \(\varvec{Q}_{\varvec{x}_{k,\ell }}^{k+1,1}\) is exactly the projection onto the part of the tangent space at \(\varvec{x}_{k,\ell }\) that corresponds to the \((k+1)\)-st component. If \(\varvec{x}_{k,\ell }\) is an eigenvector of the particle number operator \(\varvec{P}\), then by Corollary 3.8, \(\varvec{Q}_{\varvec{x}_{k,\ell }}^{k+1,1}\) commutes with \(\varvec{P}\). By Lemma 5.1, so does the Hamiltonian \(\varvec{H}\). Thus, the next iterate \(\varvec{x}_{k+1,\ell }\) will be in the same eigenspace of \(\varvec{P}\). Therefore, if one initializes the one-site DMRG algorithm with a block-sparse MPS of fixed particle number, then the block sparsity will be preserved for each iterate and the algorithm can be performed by operating only on the nonzero blocks.

6.1.2 Two-site DMRG

The classical (two-site) DMRG [19, 40] optimizes two neighboring components at once. This allows for a certain rank-adaptivity in between these components. While this gives the algorithm more flexibility, it also means that the subiterates can leave the fixed-rank manifold and even the tangent space. Nevertheless, we can show that the particle number will be preserved. To this end, we define the operation \({\varvec{\tilde{Q}}}_{\varvec{x}}^{k,1}\) for \(k=1,\ldots ,K-1\) similarly to \( \varvec{Q}_{\varvec{x}}^{k,1}\) by

$$\begin{aligned} {\varvec{\tilde{Q}}}_{\varvec{x}}^{k,1} = \biggl (\sum _{j_{k-1}=1}^{r_{k-1}}\tau ^{<}_{k,j_{k-1}} (\mathsf {U})\, \langle \tau ^{<}_{k,j_{k-1}} (\mathsf {U}),\, \cdot \, \rangle \biggr ) \otimes I \otimes I \otimes \\ \biggl (\sum _{j_{k+1}=1}^{r_{k+1}}\tau ^{>}_{k+1,j_{k+1}} (\mathsf {V})\,\langle \tau ^{>}_{k+1,j_{k+1}} (\mathsf {V}),\,\cdot \,\rangle \biggr ). \end{aligned}$$

As in Corollary 3.8, it can be shown that \({\varvec{\tilde{Q}}}_{\varvec{x}}^{k,1}\) and \(\varvec{P}\) commute. Thus, with the same argument as above, if the first iterate is an eigenvector of \(\varvec{P}\), then all iterates are in the same eigenspace.

6.1.3 (Preconditioned) Gradient descent

An alternative to the DMRG algorithm are methods operating globally on the MPS representation, such as (preconditioned) gradient descent or more involved variants such as LOBPCG [25]. For basic gradient descent, one can control the ranks by defining a threshold \(\epsilon > 0\) and performing the update scheme

$$\begin{aligned} \varvec{x}_{\ell +1} = {\text {trunc}}_\epsilon \left( \varvec{x}_\ell - \alpha _\ell \left( \varvec{H}\varvec{x}_\ell - \frac{\langle \varvec{x}_\ell , \varvec{H} \varvec{x}_\ell \rangle }{\langle \varvec{x}_\ell ,\varvec{x}_\ell \rangle }\varvec{x}_\ell \right) \right) . \end{aligned}$$

Since all involved steps preserve the particle number, this scheme produces a sequence \(\varvec{x}_\ell \) with the same particle number if the initial value \(\varvec{x}_0\) has a fixed particle number. Convergence can be accelerated by using an optimized step size \(\alpha _\ell \) or by preconditioning the system [32].
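The preservation of the particle number under this update can be illustrated on dense vectors. The sketch below omits the truncation \({\text {trunc}}_\epsilon \) entirely and uses a one-particle Hamiltonian with random symmetric coefficients as a stand-in for \(\varvec{H}\); the Jordan-Wigner form of \(\varvec{a}_i\), the fixed step size and all names are assumptions for this example. The step size \(1/(\lambda _{\max } - \lambda _{\min })\) is chosen so that the Rayleigh quotient cannot increase.

```python
from functools import reduce
import numpy as np

I = np.eye(2)
S = np.diag([1.0, -1.0])
A = np.array([[0.0, 1.0], [0.0, 0.0]])  # annihilation matrix (assumed convention)

def annihilator(i, K):
    """a_i = S x ... x S x A x I x ... x I for the 0-based site i (assumed form)."""
    return reduce(np.kron, [S] * i + [A] + [I] * (K - i - 1))

K, n_particles = 6, 2
rng = np.random.default_rng(3)
a = [annihilator(k, K) for k in range(K)]
P = sum(ak.T @ ak for ak in a)                      # particle number operator
T = rng.standard_normal((K, K))
T = T + T.T
H = sum(T[i, j] * a[i].T @ a[j] for i in range(K) for j in range(K))

def rayleigh(x):
    return x @ H @ x / (x @ x)

# Start vector supported only on basis states with exactly n_particles particles.
occupation = np.diag(P)
x = rng.standard_normal(2**K) * (occupation == n_particles)
x /= np.linalg.norm(x)

eigs = np.linalg.eigvalsh(H)
alpha = 1.0 / (eigs[-1] - eigs[0])                  # fixed step size (assumption)

rho_start = rayleigh(x)
for _ in range(200):
    x = x - alpha * (H @ x - rayleigh(x) * x)       # residual step, no truncation
    x /= np.linalg.norm(x)

rho_final = rayleigh(x)
particle_residual = np.linalg.norm(P @ x - n_particles * x)
print(rho_start, rho_final, particle_residual)
```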

6.1.4 Riemannian gradient descent

One could also consider Riemannian methods, where the gradient is projected first onto the tangent space and the step is performed on the fixed-rank manifold [24]. Generalizations are possible that allow for rank adaptivity. This method is often used because the ranks can be fixed and because the projected gradient in the tangent space can be stated explicitly and compactly, thus reducing computational overhead. We construct a sequence \(\varvec{x}_\ell \) from an initial value \(\varvec{x}_0\) with initial rank r. If \(\varvec{x}_0\) has a fixed particle number, then so does the entire sequence

$$\begin{aligned} \varvec{x}_{\ell +1} = {\text {trunc}}_r \left( \varvec{x}_\ell - \alpha _\ell \varvec{Q}_{\varvec{x}_\ell }\left( \varvec{H}\varvec{x}_\ell - \frac{\langle \varvec{x}_\ell , \varvec{H} \varvec{x}_\ell \rangle }{\langle \varvec{x}_\ell ,\varvec{x}_\ell \rangle }\varvec{x}_\ell \right) \right) , \end{aligned}$$

where \(\alpha _\ell \) is the step size. In [35], it is shown that the truncation to fixed rank is a retraction, and thus the stated scheme can be regarded as a Riemannian optimization method. These methods can be accelerated by typical techniques for gradient descent, such as nonlinear conjugate gradient descent, see [1].

6.2 Blocks of zero size

We usually assume a tensor \(\varvec{x}\) to be represented with minimal ranks; otherwise, we can perform a TT-SVD truncation with a given error threshold or to a fixed multilinear rank as in Algorithm 4. This means that in the block-sparse format, all blocks that contain only zeros will be actually set to size zero, which has several implications.

First of all, as already mentioned in Sect. 3, we stress that truncating the ranks of a tensor \(\varvec{x}\) to a fixed multilinear rank \(r_1,\ldots ,r_{K-1}\) can lead to the tensor being set to zero, that is, \({\text {trunc}}_{r_1,\ldots ,r_{K-1}}(\varvec{x}) = 0\). The block-sparse format allows for a deeper understanding of this fact: Setting a block to zero can lead to the tensor as a whole being set to zero, if all other nonzero blocks depend on it.

Furthermore, the question arises whether the above iterative methods can recover a block after it has been momentarily set to zero during an iteration step. We know that the one-site DMRG is not rank-adaptive and the block sizes are fixed during the iteration. In the other three methods it is possible to increase the rank based on some threshold (in the Riemannian case this can be achieved by modification of the retraction onto the manifold).

If \(\rho _{k,n} = 0\) for some k and n, then there exists a basis element \(\varvec{e}^\alpha \) of the eigenspace of the particle number operator with eigenvalue N (that is, \(\varvec{P} \varvec{e}^\alpha = N \varvec{e}^\alpha \)), such that \(\langle \varvec{e}^\alpha , \varvec{x}\rangle = 0\). Then we have \(\varvec{Q}_{\varvec{x}}^{k,1} \varvec{e}^\alpha = 0\), since \(\tau ^{<}_{k,j} (\mathsf {U})\) is not present for \(j\in \mathcal {S}_{k,n} = \emptyset \). A similar argument can be made for \(\varvec{Q}_{\varvec{x}}^{k,2}\). This means that in Riemannian gradient descent, once \(\rho _{k,n} = 0\) in some iterative step, then also \(\rho _{k,n} = 0\) for all subsequent steps. One can overcome this problem by choosing the initial point \(\varvec{x}_0\) in such a way that all \(\rho _{k,n}\) are at least 1. However, when retracting back onto the manifold, care needs to be taken that blocks are not set to zero.

For the two-site DMRG, a similar argument implies that some blocks can be created in each substep, depending on neighboring block sizes. A thorough analysis shows that the rank adaptivity of the two-site DMRG is always local and thus not all points can necessarily be reached from a given starting point \(\varvec{x}_0\). However, if we start from a generic point (with some block sizes possibly zero), we can expect a favorable behavior of the method.

The general gradient case is in this regard the most versatile as it has the fewest restrictions on the update step. Starting in \(\varvec{x}_0 \ne 0\) will allow us to optimize on the whole linear space of fixed particle number N throughout the procedure. A more detailed investigation will be given elsewhere.

6.3 Numerical stability of rounding

Fig. 3 Rayleigh quotient of \({\varvec{x}}_{6,\epsilon }\) after rounding from rank 7 to rank 6. The plot shows the difference of the smallest two singular values over the error in the Rayleigh quotient

As mentioned in Remark 4.3, if the singular values in each matricization are distinct, the TT-SVD in Algorithm 4 is unique up to the signs of singular vectors. Therefore, performing a TT-SVD on a tensor with fixed particle number will automatically result in a (reordered) block-sparse format. However, when two singular values coincide, there is an additional rotational freedom in the corresponding subspaces that needs to be factored out in order to enforce block sparsity.

This fact has an important implication for the SVD truncation of a tensor of fixed particle number that is not in block-sparse format. If the TT-SVD is unique, the SVD truncation will result in a tensor of the same fixed particle number (or the zero tensor, which is still an element of the same eigenspace). If two or more singular values are equal, a general SVD truncation can destroy the natural block structure, as it could remove parts of two blocks simultaneously. Numerically, this already occurs when two singular values are merely close to each other: singular vectors become increasingly ill-conditioned as the difference of the corresponding singular values decreases, which results in numerical errors in the particle number upon truncation. We emphasize that this is ruled out if the block-sparse structure is enforced.
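The effect of this rotational freedom can be seen in a minimal matrix analogue. The following NumPy sketch is an illustration under simplifying assumptions, not the TT-SVD of Algorithm 4: a \(3\times 3\) matrix stands in for a single matricization, the hypothetical list `sectors` assigns a particle number to each row and column index, and the helper names `truncate` and `off_block_norm` are chosen here for illustration only. Since the singular value 1 occurs twice, rotating the corresponding singular vectors yields an equally valid SVD, but truncating that SVD to rank 2 produces entries that couple different sectors.

```python
import numpy as np

def truncate(U, s, Vt, r):
    """Keep the first r singular triplets of the factorization U diag(s) Vt."""
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def off_block_norm(A, sectors):
    """Norm of all entries of A that couple indices from different sectors."""
    sectors = np.asarray(sectors)
    mask = sectors[:, None] != sectors[None, :]
    return float(np.linalg.norm(A[mask]))

# Hypothetical sector labels: indices 0 and 1 share a sector, index 2 does not.
sectors = [0, 0, 1]

# Block-respecting SVD of diag(2, 1, 1); the singular value 1 is degenerate
# and its two singular vectors live in different sectors.
s = np.array([2.0, 1.0, 1.0])
U = np.eye(3)
Vt = np.eye(3)

# Rotation inside the degenerate subspace spanned by e_1 and e_2.
theta = np.pi / 4
c, sn = np.cos(theta), np.sin(theta)
R = np.eye(3)
R[1:, 1:] = [[c, -sn], [sn, c]]
U_rot = U @ R
Vt_rot = R.T @ Vt

# Both factorizations represent the same matrix ...
A_rot = (U_rot * s) @ Vt_rot

# ... but their rank-2 truncations differ.
trunc_plain = truncate(U, s, Vt, 2)        # stays block-diagonal
trunc_rot = truncate(U_rot, s, Vt_rot, 2)  # mixes the two sectors

print(off_block_norm(trunc_plain, sectors))
print(off_block_norm(trunc_rot, sectors))
```

The unrotated truncation stays block-diagonal, which is the behavior that enforcing the block-sparse structure guarantees. The rotated one has nonzero off-block entries and hence no well-defined particle number.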

Table 1 Ranks of the one- and two-particle operators for different numbers of orbitals K

Fig. 4 Ranks of the output after the one-particle operator (top) and the two-particle operator (bottom) have been applied to a random MPS of rank 1 in full format, \(K = 32\). The ranks are shown after a TT-SVD truncation with various values for \(\varepsilon \). The gradient from red to blue indicates a more substantial truncation, resulting in lower ranks

To illustrate this numerical issue, we conduct the following artificial experiment: Let \(K=20\) and \(N=6\). We pick a tensor with blocks of size 1, left-orthogonal components \(U_1,\ldots ,U_{10}\), and right-orthogonal components \(V_{11},\ldots ,V_{20}\). This tensor has rank at most 7 because there can be no more than 7 blocks of size 1 on the unoccupied or occupied layer, respectively. For \(\epsilon \ge 0\), we choose a diagonal matrix of singular values

$$\begin{aligned} \Sigma = \mathop \mathrm{diag}( \sigma _1, \sigma _2, \ldots , \sigma _7) = \mathop \mathrm{diag}(6,5,4,3,2,1,1 - \epsilon ) \end{aligned}$$

and construct the tensor

$$\begin{aligned} \varvec{x}_\epsilon = U_1 {{\,\mathrm{\bowtie }\,}}\dots {{\,\mathrm{\bowtie }\,}}U_{10} \Sigma {{\,\mathrm{\bowtie }\,}}V_{11} {{\,\mathrm{\bowtie }\,}}\ldots {{\,\mathrm{\bowtie }\,}}V_{20}, \end{aligned}$$

which has the singular values \(\sigma _1,\ldots ,\sigma _7\) between its middle components \(U_{10}\) and \(V_{11}\). When \(\epsilon = 0\), truncating the last singular value in full MPS format will therefore in general lead to a deviation from the particle number \(N=6\). This effect can also be observed when the smallest singular values are only roughly equal. We consider the Rayleigh quotient of \(\varvec{x}_{6,\epsilon } = {\text {trunc}}_{6,\ldots ,6}(\varvec{x}_\epsilon )\) with the particle number operator as \(\epsilon \rightarrow 0\), where \(\epsilon \) is exactly the difference of the smallest singular values, \(| \sigma _6 - \sigma _7 | = \epsilon \). This is shown in Fig. 3.

Ideally, this Rayleigh quotient should be constant and equal to \(N=6\). This is the case for \(\epsilon > 10^{-8}\). However, if we keep the MPS in its full format, we see that for smaller differences in the singular values the Rayleigh quotient deviates and the natural block structure is destroyed. This is due to the ill-conditioning of singular vectors in the TT-SVD when the smallest singular values of \(\varvec{x}_\epsilon \) are close. If we keep the tensor in block-sparse format, this problem cannot occur, as expected.
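The diagnostic used here can be reproduced on a small scale. The following NumPy sketch is a hedged illustration rather than the code behind Fig. 3. It assumes that a tensor of shape \((2,\ldots ,2)\) stores one entry per occupation pattern, so that the particle number operator is diagonal with the number of occupied orbitals on its diagonal. The names `particle_numbers` and `rayleigh_quotient` and the toy sizes \(K=4\), \(N=2\) are choices made for this example.

```python
import numpy as np

def particle_numbers(K):
    """Number of occupied orbitals for each of the 2**K occupation patterns."""
    return np.array([bin(i).count("1") for i in range(2**K)], dtype=float)

def rayleigh_quotient(x):
    """<x, P x> / <x, x> for a tensor x of shape (2,)*K and diagonal P."""
    x = np.asarray(x)
    v = x.reshape(-1)
    n = particle_numbers(x.ndim)
    return float(np.vdot(v, n * v).real / np.vdot(v, v).real)

K, N = 4, 2
n = particle_numbers(K)

# Normalized random tensor supported only on patterns with exactly N particles.
rng = np.random.default_rng(0)
v = np.where(n == N, rng.standard_normal(2**K), 0.0)
v /= np.linalg.norm(v)
x = v.reshape((2,) * K)

# Same tensor with a small component leaked onto a 3-particle pattern.
delta = 0.1
leak = np.zeros(2**K)
leak[0b0111] = delta
x_leaky = (v + leak).reshape((2,) * K)

rq_clean = rayleigh_quotient(x)        # equals N up to rounding
rq_leaky = rayleigh_quotient(x_leaky)  # (2 + 3*delta**2) / (1 + delta**2)
print(rq_clean, rq_leaky)
```

Any deviation of the quotient from \(N\) thus directly measures how much weight has leaked out of the \(N\)-particle eigenspace.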

6.4 Operator ranks

We have discussed in Sect. 5.1 that the one- and two-particle operators \(\varvec{S}\) and \(\varvec{D}\) can be explicitly stated in rank-compressed format. Here, we numerically show that these representations indeed have near-optimal ranks, in the sense that the rank compression procedure for operators that is outlined in Sect. 5.2 can reduce the ranks only for the border cases at the beginning and end of the MPO chain.

Table 1 shows the ranks of the two operators for different numbers of orbitals K and normally distributed random values for T and V. In the left column, we see the ranks of these operators when they are represented in the rank-reduced form discussed in Sect. 5.1. In the right column, we have performed an extra rank compression from left to right and one from right to left, as discussed in Sect. 5.2. One can see that the ranks in the left column are almost optimal. Only for the border cases of the two-particle operator can the ranks be reduced further.

Finally, for \(K=32\), we apply these operators \(\varvec{S}\) and \(\varvec{D}\) to a random MPS of rank 1 that is not in the block-sparse format. The entries in the components of this tensor are chosen to be \({\mathcal {N}}(0,1)\)-distributed and the tensor is then normalized. The operators are in the rank-reduced format, but they are applied explicitly rather than in their matrix-free versions from Sect. 5.2, since the latter are applicable only to tensors in block-sparse format.

Figure 4 shows the ranks of the output tensor for the two operators (top and bottom) and different truncation parameters \(\varepsilon \). One can see that with minimal truncation, the ranks of the output are about the same as the ranks of the operators. However, the output ranks can be further reduced by about a factor of 2 if one is willing to accept an error threshold of \(\varepsilon = 10^{-12}\). A more substantial truncation does not reduce the ranks further, indicating that the ranks of the two operators are indeed linear and quadratic in K, respectively.
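How the ranks after a TT-SVD truncation depend on \(\varepsilon \) can be explored with a small toy example. The following NumPy sketch uses a textbook TT-SVD and makes no claim to match Algorithm 4 or the setup of Fig. 4. In particular, discarding singular values below \(\varepsilon \) times the largest one is an assumption made here, and the truncation criterion used in the experiments may differ. The function names `tt_svd`, `tt_ranks` and `tt_to_full` are likewise illustrative.

```python
import numpy as np

def tt_svd(x, eps):
    """Left-to-right TT-SVD; drops singular values <= eps * (largest one)."""
    dims = x.shape
    cores, r_prev = [], 1
    mat = x.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        # Carry the remainder to the right and expose the next mode.
        mat = (s[:r, None] * Vt[:r, :]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_ranks(cores):
    """Ranks between neighboring components."""
    return [c.shape[2] for c in cores[:-1]]

def tt_to_full(cores):
    """Contract all components back into a full tensor."""
    full = cores[0]
    for c in cores[1:]:
        full = np.tensordot(full, c, axes=([-1], [0]))
    return full[0, ..., 0]

# Rank-1 tensor plus a perturbation of relative size about 1e-6.
rng = np.random.default_rng(0)
a = np.array([1.0, 0.5])
rank_one = np.einsum("i,j,k,l->ijkl", a, a, a, a)
x = rank_one + 1e-6 * rng.standard_normal((2, 2, 2, 2))

ranks_tight = tt_ranks(tt_svd(x, eps=1e-12))  # perturbation is kept
ranks_loose = tt_ranks(tt_svd(x, eps=1e-3))   # perturbation is truncated
err_loose = np.linalg.norm(tt_to_full(tt_svd(x, eps=1e-3)) - x) / np.linalg.norm(x)
print(ranks_tight, ranks_loose, err_loose)
```

In this toy setting, the ranks stay at their maximal values while the threshold is below the perturbation level. They drop to 1 once it exceeds that level, at the cost of an error of the order of the perturbation.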