1 Introduction

In the past two decades, singular value decomposition (SVD) based tensor network state (TNS) methods have become vital alternative approaches for treating strongly correlated, i.e., multireference problems in quantum chemistry [1,2,3,4,5,6,7,8,9,10,11]. They approximate an eigenstate of the ab initio Hamiltonian as a product of low-rank matrices or tensors, so their computational cost is governed by the ranks of the component tensors, also known as bond dimensions. During the optimization, the ranks of the tensors can either be kept fixed, i.e., the optimization is carried out on a fixed submanifold, or be adapted dynamically to meet an a priori set error margin [12,13,14]. In the latter case, the maximum rank depends strongly on the network topology [15,16,17] and on the properties of the component tensors [18]. The optimal choice of the modes (i.e., orbitals in quantum chemistry) is another key factor in the efficiency of TNS methods [19,20,21], because optimal modes can localize the correlation and entanglement in the system [22] and drastically reduce the bond dimension [23, 24]. Therefore, a joint strategy that optimizes the tensors and the modes simultaneously is expected to enable black-box application of TNS methods to strongly correlated multireference problems.

The optimization of orbitals is not a new concept: localized molecular orbitals (LMOs) have a long history in quantum chemistry. Orbital localization serves two purposes. On the one hand, it yields chemically intuitive orbitals for rationalizing the electronic structure of molecular systems. On the other hand, LMOs have proven useful in making high-level correlated quantum chemical methods computationally more tractable. Localization schemes are based on specially constructed unitary transformations and usually optimize the expectation value of a specific operator. Among many others, we recall Foster–Boys localization [25, 26], which minimizes the spatial extent of the localized orbitals, and Pipek–Mezey localization [27], which maximizes the sum of the squared Mulliken atomic charges of the orbitals.

In this work, we give a brief overview of the main aspects of the orbital optimization protocol, the quantum chemical application of the more general fermionic mode transformation [23, 24], and demonstrate that, by finding optimal MOs based on entanglement localization, it has the potential to compress the multireference character of wave functions. Numerical simulations are performed for the nitrogen dimer at the equilibrium and at stretched geometries in the cc-pVDZ basis, a common basis for benchmark computations developed by Dunning and co-workers [28]. Note that we use the term “basis” only for the atomic basis set cc-pVDZ. From this basis, the “canonical MOs” are obtained by Hartree–Fock SCF optimization; these form the “initial MOs” of the DMRG calculation. From the initial MOs, the “optimized MOs” are obtained by orbital optimization, which is our main concern. The more general term “modes” is used for orbitals only where general aspects of the theory are emphasized.

The organization of this work is as follows: in Sect. 2, we briefly recall the basics of matrix product states (MPS, a special case of TNS) and orbital optimization; in Sect. 3, we describe the numerical procedure applied, and present our numerical results; in Sect. 4, we draw the conclusions.

2 Theoretical background

In this section, we present a brief overview of the joint optimization procedure based on MPS and orbital optimizations, while a more detailed description can be found in the original works [20, 23].

In the context of nonrelativistic quantum chemistry, the Hamiltonian takes the second quantized form

$$\begin{aligned} H = \sum _{i,j=1}^d \sum _{\sigma \in \{\downarrow , \uparrow \}} t_{i,j} c_{i,\sigma }^\dag c_{j,\sigma } + \sum _{i,j,k,l=1}^d \sum _{\sigma ,\sigma '\in \{\downarrow , \uparrow \}} v_{i,j,k,l} c_{i,\sigma }^\dag c_{j,\sigma '}^\dag c_{l,\sigma '} c_{k,\sigma }, \end{aligned}$$
(1)

where i, j, k, l are spatial (orbital) indices, \(\sigma ,\sigma '\) are spin indices, and \(c_{i,\sigma }\) are the fermionic annihilation operators, satisfying the canonical anti-commutation relations \(\{c_{i,\sigma },c_{j,\sigma '}\}=0\) and \(\{c_{i,\sigma }^\dag ,c_{j,\sigma '}\}=\delta _{i,j}\delta _{\sigma ,\sigma '}\). The spin-independent integrals t and v obey symmetry constraints that ensure that H is Hermitian. The Hilbert space of N interacting electrons in d orbitals is the N-electron subspace of the fermionic Fock space \({\mathcal {F}}_d \cong \bigotimes _{i=1}^d {\mathcal {H}}_i\), which is spanned by the basis of all Slater determinants \(\{ \vert \alpha _1,\dots ,\alpha _d \rangle =\bigotimes _{i=1}^d \vert \alpha _i \rangle \}\), where the occupation indices \(\alpha _i\in \{0,\downarrow ,\uparrow ,\downarrow \uparrow \}\) label the basis of the occupation space \({\mathcal {H}}_i\) of orbital \(i\in \{1,2,\dots ,d\}\). The quantum many-body wave function can be written as a linear combination of all Slater determinants

$$\begin{aligned} \vert \psi \rangle = \sum _{\begin{array}{c} \alpha _1,\ldots ,\alpha _d\\ \in \{0,\downarrow ,\uparrow ,\downarrow \uparrow \} \end{array}} C_{\alpha _1,\ldots ,\alpha _d}\vert \alpha _1,\ldots ,\alpha _d \rangle , \end{aligned}$$
(2)

where the high-order coefficient tensor \(C\in ({\mathbb {C}}^4)^{\otimes d}\) is determined by the eigenvalue problem of the Hamiltonian given by Eq. (1) in the N-electron subspace. The full configuration interaction (full CI) wave function (2) can be expressed as a linear combination of wave functions corresponding to different excitation levels with respect to a reference determinant,

$$\begin{aligned} \vert \psi \rangle = \sum _I C_I\vert \psi _I \rangle , \end{aligned}$$
(3)

where the \(I=0\) term \(\vert \psi _0 \rangle \) refers to the reference determinant, and the \(I=1,2,3,\dots \) terms \(\vert \psi _1 \rangle ,\vert \psi _2 \rangle ,\vert \psi _3 \rangle ,\dots \) to the single, double, triple, and higher excitations, respectively. The coefficients \(C_0,C_1,C_2,C_3,\dots \) are the normalization factors of the corresponding excitation-level terms [29].
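If the excitation-level components \(\vert \psi _I \rangle \) in Eq. (3) are normalized, \(\vert C_I\vert ^2\) is simply the sum of \(\vert C_\alpha \vert ^2\) over all determinants of Eq. (2) that are I-fold excited with respect to the reference; this is how the norm squares of single- and double-excitation components can be read off a coefficient tensor. The Python sketch below assumes a hypothetical encoding of the occupation labels \(\alpha _i\) as 2-bit masks and a dictionary of determinant coefficients; it is an illustration, not the code used for the calculations reported here.

```python
from collections import defaultdict

# Occupation labels alpha_i in {0, down, up, down-up}, encoded (an assumption of
# this sketch) as 2-bit masks: bit 0 = spin down, bit 1 = spin up.
EMPTY, DOWN, UP, DOUBLE = 0b00, 0b01, 0b10, 0b11


def excitation_level(det, ref):
    """Number of spin orbitals occupied in `det` but empty in `ref`.

    Assumes both determinants hold the same number of electrons, so this
    count equals the excitation level of `det` relative to `ref`.
    """
    return sum(bin(a & ~r & 0b11).count("1") for a, r in zip(det, ref))


def weights_by_level(coeffs, ref):
    """Norm square |C_I|^2 of each excitation-level component, cf. Eq. (3).

    `coeffs` maps determinants (tuples of alpha_i masks) to CI coefficients.
    """
    weights = defaultdict(float)
    for det, c in coeffs.items():
        weights[excitation_level(det, ref)] += abs(c) ** 2
    return dict(weights)
```

For a toy two-orbital, two-electron state with reference (DOUBLE, EMPTY), the determinants (UP, DOWN) and (DOWN, UP) fall into the single-excitation class and (EMPTY, DOUBLE) into the double-excitation class.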

In the MPS representation, the wave function takes the form

$$\begin{aligned} \vert \psi \rangle = \sum _{\begin{array}{c} \alpha _1,\ldots ,\alpha _d\\ \in \{0,\downarrow ,\uparrow ,\downarrow \uparrow \} \end{array}} A^{\alpha _1}_{[1]}\cdots A^{\alpha _d}_{[d]}\vert \alpha _1,\dots ,\alpha _d \rangle , \end{aligned}$$
(4)

where the component tensors are \(A_{[i]}^{\alpha _i}\in {\mathbb {C}}^{D_{i-1}\times D_i}\), with bond dimensions \(D_i\), and \(D_0 = D_d = 1\). Every state vector can be written in MPS form by applying consecutive SVDs [30], provided the bond dimensions are sufficiently large; in the generic case, however, the required bond dimensions grow exponentially with d. Restricting the bond dimensions to a fixed value D restricts the full state space to a submanifold. An eigenstate of the Hamiltonian (1) can then be approximated within this submanifold by the density matrix renormalization group (DMRG) algorithm, which, being an alternating least-squares method, optimizes the entries of the MPS tensors \(A_{[i]}\) iteratively [7, 8, 31,32,33], leading to a variational treatment of the eigenvalue problem of the Hamiltonian (1).
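The construction by consecutive SVDs can be sketched in a few lines of NumPy. The function names and the tensor layout (left bond, physical index, right bond) are assumptions of this illustration. Without truncation, the bond dimensions of a generic state grow as \(4^{\min (i,d-i)}\); the optional `max_bond` argument cuts them to a fixed D in a single left-to-right sweep, which is a simple truncation and not the variational DMRG optimization described above.

```python
import numpy as np


def state_to_mps(psi, d, phys=4, max_bond=None):
    """Split a state vector on d sites (local dimension `phys`) into MPS tensors.

    Returns arrays A[i] of shape (D_{i-1}, phys, D_i) with D_0 = D_d = 1.
    If `max_bond` is given, each SVD keeps at most that many singular values.
    """
    tensors = []
    rest = np.asarray(psi, dtype=complex).reshape(1, -1)
    left = 1
    for _ in range(d - 1):
        # Group (left bond, current site) as rows, remaining sites as columns.
        mat = rest.reshape(left * phys, -1)
        u, s, vh = np.linalg.svd(mat, full_matrices=False)
        if max_bond is not None:
            u, s, vh = u[:, :max_bond], s[:max_bond], vh[:max_bond, :]
        bond = s.size
        tensors.append(u.reshape(left, phys, bond))
        rest = s[:, None] * vh  # carry the singular values to the right
        left = bond
    tensors.append(rest.reshape(left, phys, 1))
    return tensors


def mps_to_state(tensors):
    """Contract A[1] ... A[d] back into the full coefficient vector."""
    out = tensors[0]
    for a in tensors[1:]:
        out = np.tensordot(out, a, axes=([-1], [0]))
    return out.reshape(-1)
```

For a random state on d = 4 orbitals, the untruncated bond dimensions come out as 4, 16, 4, and contracting the tensors reproduces the original vector.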

Utilizing a unitary orbital transformation \(U \in \mathrm {U}(d)\), a set of fermionic annihilation operators \(\{c_{i,\sigma }\}\) can be linearly transformed to a new set \(\{d_{i,\sigma }\}\) that also satisfies the canonical anti-commutation relations, i.e., \(c_{i,\sigma } = \sum _{j=1}^d U_{i,j,\sigma } d_{j,\sigma }\). We note that for the system studied here it is not necessary to use different unitaries for spin up and down, \(U_{i,j,\uparrow }=U_{i,j,\downarrow }\); however, the implementation is applicable to the unrestricted case as well. Under this transformation, a representation G(U) can also be given on the Fock space [23], by which a fermionic wave function \(\vert \psi ({\mathbb {I}}) \rangle \) transforms to \(\vert \psi (U) \rangle = G(U)^\dagger \vert \psi ({\mathbb {I}}) \rangle \), and the Hamiltonian written in terms of the transformed orbitals becomes \(H(U) = G(U)^\dagger H G(U)\). In the course of the DMRG algorithm, the unitary U is constructed iteratively from two-orbital unitaries by sweeping through the network. At each micro-iteration step, a two-orbital rotation minimizes the half-Rényi block entropy \(S_{1/2}(\rho _{\{1,2,\dots ,k\}}) = 2\ln ({{\,\mathrm{Tr}\,}}\sqrt{\rho _{\{1,2,\dots ,k\}}})\), where \(\rho _{\{1,2,\dots ,k\}}\) is the density operator of the first k orbitals [34, 35]. In numerical simulations that include orbital optimization, it is favorable for robustness not to transform the operators themselves; rather, the orbital transformation is carried out on the parameters t and v of the Hamiltonian.
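Transforming the parameters t and v rather than the operators amounts to inserting \(c_{i,\sigma } = \sum _j U_{i,j} d_{j,\sigma }\) into Eq. (1). The NumPy sketch below does this for the spin-restricted case and builds a real two-orbital (Givens) rotation as one simple example of the two-orbital unitaries from which U is composed; the function names are hypothetical, and a general two-orbital unitary could also carry complex phases.

```python
import numpy as np


def rotate_integrals(t, v, U):
    """Integrals of Eq. (1) in the orbitals d_j defined by c_i = sum_j U[i, j] d_j.

    Substituting the expansion into Eq. (1) gives (spin-restricted case)
      t'[a, b]       = sum_ij   conj(U[i, a]) t[i, j] U[j, b]
      v'[a, b, c, e] = sum_ijkl conj(U[i, a]) conj(U[j, b]) U[k, c] U[l, e] v[i, j, k, l]
    with v' multiplying d^dag_a d^dag_b d_e d_c, as v does in Eq. (1).
    """
    Uc = U.conj()
    t_new = Uc.T @ t @ U
    v_new = np.einsum("ia,jb,kc,le,ijkl->abce", Uc, Uc, U, U, v, optimize=True)
    return t_new, v_new


def givens(d, p, q, theta):
    """Real rotation by angle theta that mixes only orbitals p and q."""
    U = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    U[p, p], U[p, q], U[q, p], U[q, q] = c, -s, s, c
    return U
```

Transforming twice with two rotations gives the same integrals as transforming once with their product, which is what allows U to be accumulated from two-orbital steps during the sweeps.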

At the end of the last DMRG sweep, the one-orbital entropies \(s_i\), the two-orbital mutual informations \(I_{i,j}:=s_i+s_j-s_{i,j}\), the total correlation \(I_\text {tot} = \sum _i s_i\), the correlation distance \(I_\text {dist} = \sum _{i,j} I_{i,j} \vert i - j\vert ^2\), the one-particle reduced density matrix \(\gamma _{i,j} = \langle c^\dagger _j c_i \rangle \), and the occupation number distribution \(\langle n_i\rangle \) are calculated (where \(i,j\in \{1,\ldots ,d\}\)). Here, \(s_i= -{{\,\mathrm{Tr}\,}}(\rho _i \ln \rho _i)\) and \(s_{i,j}= -{{\,\mathrm{Tr}\,}}(\rho _{i,j}\ln \rho _{i,j})\) are the von Neumann entropies of the one- and two-orbital reduced density operators \(\rho _i\) and \(\rho _{i,j}\) [34]. The mutual information and more general correlation measures [36, 37] can be used not only for the optimization but also for characterizing chemical properties [38, 39]. The eigenvalues and eigenvectors of the one-particle reduced density matrix \(\gamma \) define the natural occupation numbers \(\lambda _i\) and the natural orbitals (NOs), respectively. An optimized ordering of the orbitals along the tensor network is calculated from the mutual informations \(I_{i,j}\) using the Fiedler vector approach [40]; a new complete active space vector is calculated from the entropies \(s_i\) for the dynamically extended active space (DEAS) procedure [15]; and a new Hartree–Fock (HF) reference configuration is calculated from the occupations \(\langle n_i\rangle \). All of these, together with the final rotated interaction matrices, are used as inputs for the subsequent orbital transformation macro-iteration.
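The entropic quantities and the orbital ordering can be sketched as follows, assuming the one- and two-orbital reduced density matrices are already available as Hermitian NumPy arrays (4×4 and 16×16 for spatial orbitals). The ordering uses a common formulation of the Fiedler vector approach (the eigenvector of the second-smallest eigenvalue of the graph Laplacian of the mutual information matrix); the function names and the numerical cutoff are our own choices for this illustration.

```python
import numpy as np


def von_neumann(rho):
    """s = -Tr(rho ln rho), computed from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-14]  # drop zero eigenvalues, using 0 ln 0 = 0
    return float(-np.sum(w * np.log(w)))


def renyi_half(rho):
    """Half-Renyi entropy S_1/2 = 2 ln Tr sqrt(rho)."""
    w = np.clip(np.linalg.eigvalsh(rho), 0.0, None)  # remove round-off negatives
    return float(2.0 * np.log(np.sum(np.sqrt(w))))


def mutual_information(s_one, s_two):
    """I_ij = s_i + s_j - s_ij; s_two[i][j] holds the two-orbital entropies."""
    s_one = np.asarray(s_one, dtype=float)
    I = s_one[:, None] + s_one[None, :] - np.asarray(s_two, dtype=float)
    np.fill_diagonal(I, 0.0)  # the diagonal is not a two-orbital quantity
    return I


def fiedler_order(I):
    """Orbital ordering from the Fiedler vector of the Laplacian L = D - I."""
    L = np.diag(I.sum(axis=1)) - I
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return np.argsort(vecs[:, 1])
```

For a maximally mixed one-orbital state, both entropies equal \(\ln (4)\); for a mutual information matrix that is a scrambled chain, the Fiedler ordering recovers the chain (up to reversal).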

3 Numerical approach

In the numerical procedure, calculations are carried out for the nitrogen dimer in the cc-pVDZ basis [28] at various bond lengths. Systematically increasing the bond dimension \(D_\text {opt}=16,64,256,512,1024,2048,4096\), we have used nine DMRG sweeps and 20 orbital optimization macro-iterations, exploiting only \(\mathrm {U}(1)\) symmetries. For \(D_\text {opt}\le 512\), large-scale DMRG calculations with bond dimension \(D=4096\) are performed after convergence is reached. We note that, for large \(D_\text {opt}\), orbital optimization already converges after 4-5 macro-iterations.

Earlier works have demonstrated that DMRG calculations after orbital optimization can lead either to significantly more accurate results at the same computational cost (e.g., the same truncated bond dimension), or to much lower truncated bond dimensions for the same accuracy, owing to the tremendous reduction of entanglement in the system [23, 24]. Here we focus on the emerging MOs and on the structure of the wave function. Therefore, we first choose a small active space, namely 6 electrons in 14 orbitals, CAS(6,14), for which calculations can be performed in the full-CI limit. Here, a very large value of \(D_\text {opt}=4096\) is enforced. Fig. 1 shows a selected set of quantities, based on concepts of quantum information theory, that monitor the performance of the fermionic orbital optimization numerically for the equilibrium geometry with bond length \(r=2.118a_0\), in the initial MOs (first row) and in the optimized MOs (second row). It can clearly be seen that the orbital optimization has hardly any effect here, except that the MOs are reordered along the DMRG chain to reduce the correlation distance in the system: \(I_\text {dist}\), calculated from the two-orbital mutual informations also plotted in Fig. 1, changes from 18.8575 to 12.6095. The ground state energy \(E = -109.0931\mathrm {Ha}\) is invariant under the action of the unitary group, and the single-orbital entropy profiles (also plotted in Fig. 1) change only marginally, so \(I_\text {tot}\), calculated from the single-orbital entropies, decreases slightly from 1.3373 to 1.2696. Three MOs are almost doubly occupied, and, although canonical MOs have been used, \(\langle n_i\rangle \) and \(\lambda _i\) fall on top of each other for both the initial and the optimized MOs, resembling the characteristics of NO-like orbitals. The sharp Fermi edge indicates that the system is weakly correlated, i.e., it is a single-reference problem.
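The effect of reordering on \(I_\text {dist}\) can be seen in a toy example. In the sketch below, a hypothetical mutual information matrix couples only orbitals 0 and 3; because the definition sums over both (i, j) and (j, i), each pair is counted twice. Placing the two correlated orbitals next to each other along the chain lowers \(I_\text {dist}\) from 18 to 2, while \(I_\text {tot}\), which depends only on the single-orbital entropies, is unaffected by a pure reordering.

```python
def correlation_distance(I, order):
    """I_dist = sum_ij I_ij |pos(i) - pos(j)|^2, with pos(i) the chain site of orbital i.

    `I` is a symmetric mutual information matrix (list of lists) indexed by
    orbital labels; `order` lists the orbital labels along the DMRG chain.
    """
    pos = {orb: site for site, orb in enumerate(order)}
    n = len(order)
    return sum(I[i][j] * (pos[i] - pos[j]) ** 2 for i in range(n) for j in range(n))


# Hypothetical mutual information: only orbitals 0 and 3 are correlated.
I = [[0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0]]

print(correlation_distance(I, [0, 1, 2, 3]))  # 18.0
print(correlation_distance(I, [0, 3, 1, 2]))  # 2.0
```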

In contrast, for the stretched geometry \(r=4.200a_0\), the sharp drop in \(\langle n_i\rangle \) and \(\lambda _i\) at the Fermi edge disappears (see Fig. 2). The corresponding six partially occupied orbitals possess very large orbital entropies, indicating that these orbitals are in mixed states and are highly entangled with the rest of the system. The two orbitals with occupation numbers close to 1.5 and 0.5 are the \(\sigma \) bonding and anti-bonding orbitals, while the four orbitals with \(0.5 \le \langle n_i\rangle \le 1.5\) have \(\pi \) symmetry. The underlying bond-breaking process has already been analyzed in terms of entropies in Ref. [35]; however, such an analysis depends on the choice of basis and on the optimized modes, as addressed below. Carrying out MO optimization yields new MOs that are no longer NO-like (note the difference between the profiles of \(\langle n_i\rangle \) and \(\lambda _i\)). Here, \(\langle n_i\rangle =1\) and \(s_i\simeq 0.8\) for four orbitals, i.e., the electrons are uniformly distributed over the corresponding \(\pi \) orbitals. The orbital entropy of the two \(\sigma \) orbitals remains close to unity, which also signals that the \(\pi \) bonds break first. Note that the results of this quantitative analysis in terms of orbital entropies are opposite to those in Ref. [35], which demonstrates again that entropic analysis depends on the basis and on the mode transformation. Although the ground state energy, \(E=-108.7935\mathrm {Ha}\), does not change during the macro-iterations, the orbital entropies are reduced. Consequently, the overall quantum correlation encoded in the wave function, \(I_\text {tot}\), decreases from 7.3585 to 5.3188, and \(I_\text {dist}\) from 36.6702 to 28.3941.
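As a quick arithmetic check on the figures quoted above, the relative reductions can be computed directly from the reported values; the snippet only reuses numbers from the text.

```python
# Values quoted in the text for CAS(6,14): (initial MOs, optimized MOs).
reported = {
    ("r = 2.118 a0", "I_tot"): (1.3373, 1.2696),
    ("r = 2.118 a0", "I_dist"): (18.8575, 12.6095),
    ("r = 4.200 a0", "I_tot"): (7.3585, 5.3188),
    ("r = 4.200 a0", "I_dist"): (36.6702, 28.3941),
}


def relative_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return 100.0 * (before - after) / before


for (geometry, quantity), (before, after) in reported.items():
    print(f"{geometry}, {quantity}: {relative_reduction(before, after):.1f} % lower")
```

This gives reductions of about 5 % in \(I_\text {tot}\) and 33 % in \(I_\text {dist}\) at \(r=2.118a_0\), compared with about 28 % and 23 % at \(r=4.200a_0\), i.e., the orbital optimization removes a much larger share of the total correlation at the stretched geometry.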

Fig. 1

Orbital entropy profiles \(\{s_i\}\), sorted values of the natural orbital occupation numbers \(\{\lambda _i\}\), occupation numbers \(\{\langle n_i\rangle \}\), and two-orbital mutual informations \(\{I_{i,j}\}\) for the initial MOs in CAS(6,14) (first row) and after the 20th orbital optimization macro-iteration (second row) for the nitrogen dimer at bond length \(r=2.118a_0\), using bond dimension \(D=4096\)

Fig. 2

a Similar to Fig. 1, but for a stretched geometry at \(r=4.200a_0\)

Upon further stretching of the nitrogen dimer, the orbital entropies of the partially occupied orbitals approach \(\ln (4)\simeq 1.39\) in the initial MOs, while they are reduced to \(\ln (2)\simeq 0.69\) in the optimized MOs. For the optimized MOs, \(\langle n_i\rangle \) takes values very close to one or zero, i.e., the six electrons are distributed uniformly over the six partially occupied orbitals. Note that these six orbitals are almost uncorrelated with the rest of the orbitals, i.e., the problem reduces to a CAS(6,6), as expected. In this almost half-filled configuration, the empty and doubly occupied local states provide no contribution for the orbitals of CAS(6,6), giving \(s_i\simeq \ln (2)\) for these six orbitals and \(s_i\simeq 0\) for the remaining ones. For \(r=20.000a_0\), even the initial MOs lead to the latter configuration.
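The limiting values \(\ln (4)\) and \(\ln (2)\) follow from the one-orbital density matrix, which is diagonal in the occupation basis \(\{0,\downarrow ,\uparrow ,\downarrow \uparrow \}\) when particle number and spin projection are conserved. A minimal Python check, assuming idealized occupation probabilities: equal weight on all four local states gives \(\ln (4)\), a singly occupied, spin-unpolarized orbital gives \(\ln (2)\), and an orbital that is certainly empty gives zero.

```python
import math


def orbital_entropy(probs):
    """Entropy -sum_a p_a ln p_a of a diagonal one-orbital density matrix."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)


# Probabilities of the local states (empty, down, up, down-up) of one orbital.
four_fold = [0.25, 0.25, 0.25, 0.25]  # all four local states equally likely
singly = [0.0, 0.5, 0.5, 0.0]         # one electron, spin up or down with equal weight
empty = [1.0, 0.0, 0.0, 0.0]          # orbital certainly empty

print(orbital_entropy(four_fold), math.log(4))  # both about 1.386
print(orbital_entropy(singly), math.log(2))     # both about 0.693
print(orbital_entropy(empty))                   # 0.0
```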

Besides the entropic quantities, it is instructive to study the entries \(C_\alpha \) of the coefficient tensor (2), extracted from the MPS wave function obtained by the DMRG algorithm. Figure 3 shows the absolute values of the 50 largest \(C_\alpha \) elements of the coefficient tensor in decreasing order for various bond lengths. At the equilibrium geometry (\(r=2.118a_0\)) in the initial MOs (red), there is clearly one determinant with weight close to one, and the remaining coefficients are smaller by at least an order of magnitude. This single-reference property changes, however, as the nitrogen dimer is stretched: the leading coefficient gets smaller and smaller until degenerate plateaus appear. This multireference behavior is in accordance with the entropic analysis discussed above.
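A brute-force sketch of extracting the largest \(C_\alpha \) from an MPS with NumPy follows (the tensor layout and function name are assumptions of this illustration). It contracts the full \(4^d\) tensor, which is only practical for small d; for CAS(6,14) with \(S_z=0\), one would instead restrict the search to the \(\binom{14}{3}^2 = 132\,496\) determinants with three electrons of each spin.

```python
import numpy as np


def largest_coefficients(tensors, k=50):
    """Contract an MPS (tensors of shape (D_{i-1}, 4, D_i)) into C and return the
    k largest |C_alpha| in decreasing order, with their multi-indices alpha.

    Memory grows as 4^d, so this brute-force version suits small d only.
    """
    C = tensors[0]
    for a in tensors[1:]:
        C = np.tensordot(C, a, axes=([-1], [0]))
    C = C.reshape(-1)
    idx = np.argsort(-np.abs(C))[:k]  # positions of the k largest magnitudes
    d = len(tensors)
    labels = [np.unravel_index(i, (4,) * d) for i in idx]
    return np.abs(C[idx]), labels
```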

Fig. 3

Absolute value of the 50 largest \(C_a\) elements of the coefficient tensor in decreasing order for various bond lengths for the initial MOs (red) and for the optimized MOs (blue), obtained in the full-CI limit in CAS(6,14). Here the relabelling \(C_a\) for \(a=1,2,\dots ,4^d\) is used for the \(C_\alpha \) elements of the coefficient tensor, such that \(\vert C_a\vert \ge \vert C_b\vert \) if \(a<b\)

When orbital optimization is also utilized, the resulting profile of the \(C_\alpha \) entries of the coefficient tensor changes significantly with increasing bond length compared to the initial MOs, as shown in blue in Fig. 3. For the equilibrium geometry at \(r=2.118a_0\), the difference between the initial and optimized MOs is negligible, as the initial MOs already provide a single-reference description of the problem. In contrast, for \(r\ge 3.600a_0\), the effect of orbital optimization becomes much more drastic. For \(r\ge 4.200a_0\), the leading coefficients become two-fold degenerate, corresponding to a determinant and its spin-flipped counterpart, and their absolute values increase rapidly towards the saturation value of \(1/\sqrt{2}\) with increasing bond length. The plateau observed in the initial MOs for \(r\ge 10.000a_0\) disappears completely. Along these lines, the sum of the squared absolute values of the largest CI coefficients also converges more rapidly to unity in the optimized modes for the stretched geometries, as shown in Fig. 4. A more CI-based analysis is given in Appendix B. The fast decay of the \(C_\alpha \) coefficients in the optimized MOs makes them a more suitable basis for DMRG, so lower computational effort is needed to reach the same level of accuracy, as mentioned before.
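The saturation value \(1/\sqrt{2}\) means that two determinants carry the entire norm, since \(2\,(1/\sqrt{2})^2 = 1\). The short sketch below computes the cumulative weight of the k largest coefficients, the quantity plotted in Fig. 4, and contrasts such a two-fold degenerate state with a hypothetical state spread evenly over 16 determinants.

```python
import math


def cumulative_weight(abs_coeffs, k):
    """Sum of |C_a|^2 over the k largest |C_a| (the quantity shown in Fig. 4)."""
    top = sorted(abs_coeffs, reverse=True)[:k]
    return sum(c * c for c in top)


# Two degenerate leading determinants (a determinant and its spin-flipped partner).
degenerate = [1 / math.sqrt(2)] * 2
# Hypothetical multireference state: 16 determinants of equal weight.
spread = [0.25] * 16

print(cumulative_weight(degenerate, 2))  # close to 1.0
print(cumulative_weight(spread, 2))      # 0.125
```

The faster this sum approaches unity as k grows, the fewer determinants are needed to represent the wave function.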

Repeating the same analysis for the full space, i.e., for CAS(14,28), leads to similar conclusions. Here, however, the full-CI limit could not be reached, so the effects of bond dimension truncation also influence the results. The corresponding entropy plots are summarized in Appendix A, while Fig. 5a shows the absolute values of the 25 largest \(C_\alpha \) coefficients up to double excitation level in decreasing order for various bond lengths. The reference determinant was obtained from the occupation number profile \(\langle n_i \rangle \). For the equilibrium bond length \(r=2.118a_0\), a sharp Fermi edge again separates the seven almost doubly occupied orbitals from the remaining virtual orbitals for both the initial and the optimized MOs. For the stretched geometries, six electrons are again shared among six orbitals, which are mixed (superposed) by the orbital optimization. Since these orbitals are only marginally correlated with the rest of the orbitals, the previous analysis holds for the corresponding CAS(6,6) space.

Fig. 4

a Sum of the squared absolute values of the 1000 largest CI coefficients for the nitrogen dimer in CAS(6,14) for various bond lengths, extracted from the MPS wave function obtained by the DMRG algorithm with bond dimension \(D=4096\). b Similar to a, but for the optimized MOs

According to Fig. 5a, the absolute values of the leading \(C_\alpha \) coefficients, together with the norm squares of the wave function components corresponding to single and double excitations, decrease systematically as r increases from \(2.118a_0\) to \(10.000a_0\), as expected. Thus, in the large-r limit, excitations beyond singles and doubles gain more and more weight. Here, we have used the original HF determinant, in which the first seven orbitals are doubly occupied, as the reference determinant. For the optimized MOs, however, the leading coefficient increases drastically, and, again, a fast decay of the values is observed (see Fig. 5b). The two-fold degeneracy of the leading coefficient, on the other hand, is sensitive to the bond dimension. This is illustrated in Fig. 5c for \(r=4.200a_0\) and for the optimized MOs obtained with \(D_\text {opt}=512\), as a function of D. Here, two reference determinants, given by the occupation number profile and connected by a spin-flip transformation, have been used. For small bond dimensions, \(16\le D \le 256\), the problem appears to be single-reference in the optimized MOs, while for larger D values the correct degeneracy of the leading coefficient is recovered. In addition, for \(D=4096\), the norm square of the wave function component corresponding to single and double excitations gets close to unity (see Fig. 5b), indicating that orbital optimization has the potential to convert higher-level excitations into lower-level ones, i.e., to compress the multireference character of wave functions. This yields MOs that are significantly better suited for DMRG computations, which is also validated by the lower energies obtained for truncated bond dimensions.
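The spin-flip relation between the two reference determinants can be made concrete in the occupation-label picture of Eq. (2): it exchanges \(\uparrow \) and \(\downarrow \) on every orbital and leaves empty and doubly occupied orbitals unchanged. The sketch below uses a hypothetical 2-bit encoding of \(\alpha _i\); the example determinant is illustrative only and not taken from the calculations.

```python
# Occupation labels alpha_i in {0, down, up, down-up}, encoded (an assumption of
# this sketch) as 2-bit masks: bit 0 = spin down, bit 1 = spin up.
EMPTY, DOWN, UP, DOUBLE = 0b00, 0b01, 0b10, 0b11


def spin_flip(det):
    """Exchange spin-up and spin-down occupations on every orbital."""
    return tuple(((a & DOWN) << 1) | ((a & UP) >> 1) for a in det)


def n_electrons(det):
    """Total number of electrons in a determinant."""
    return sum(bin(a).count("1") for a in det)


ref_a = (DOUBLE, UP, DOWN, UP, DOWN, EMPTY)  # illustrative determinant
ref_b = spin_flip(ref_a)                      # its spin-flipped partner
print(ref_b)  # (3, 1, 2, 1, 2, 0), i.e. (DOUBLE, DOWN, UP, DOWN, UP, EMPTY)
```

The flip is an involution and conserves the electron number, so both references belong to the same N-electron sector.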
A similar conclusion has also been drawn for two-dimensional spinless fermionic lattice models commonly studied in solid-state physics, where a single determinant is suitable for describing the quantum many-body wave function both in the non-interacting case and for infinitely large interaction [24].

Fig. 5

a Absolute values of the 25 largest CI coefficients including single and double excitation levels for the nitrogen dimer in the full space CAS(14,28) for various bond lengths, extracted from the MPS wave function obtained by the DMRG algorithm with bond dimension \(D=4096\). The inscribed numbers are the norm squares of the wave function component corresponding to single and double excitations for the various bond lengths. b Similar to a, but for the optimized MOs with \(D_\text {opt}=512\). c Convergence of the absolute values of the 25 largest CI coefficients including single and double excitation levels for \(r=4.200a_0\) for the optimized MOs, as a function of the bond dimension D

4 Conclusions

In this work, we have presented a brief overview of the main aspects of a joint optimization procedure for tensor network state methods, in which the optimization is carried out both on a fixed-rank MPS manifold and on the manifold of MOs. Numerical illustrations were given for the nitrogen dimer in the cc-pVDZ basis at the equilibrium and at stretched geometries. We have analyzed the properties of the wave function based on various entropic quantities and on the profile of the coefficient tensor, highlighting that such quantities depend on the basis and on the orbital transformation. The corresponding method, dubbed orbital optimization, has the potential to reduce significantly the correlation and entanglement encoded in the quantum many-body wave function and to convert coefficients of higher-level excitations into those of lower-level ones, resulting in rapidly decaying entries of the wave function coefficient tensor. Together, these effects compress the multireference character of wave functions and provide MOs that are significantly better suited for TNS and conventional multireference methods.