1 Introduction

Understanding strongly interacting quantum many-body systems is one of the major challenges of contemporary physics. In order to achieve progress in that direction, it is often useful to apply the same principles as for weakly interacting systems. Following this strategy, the first step is to find or characterize the ground or thermal equilibrium state [1,2,3]. As the second step, one should identify the relevant quasi-particle excitations describing linearized perturbations around this equilibrium state and determine their properties, such as dispersion relations [4,5,6,7,8,9]. Going beyond this linearized level, one can then study the interactions of these quasi-particle excitations among each other and with other degrees of freedom [10,11,12,13,14,15,16,17,18,19,20].

Since exact solutions are typically limited to special cases or small systems [21,22,23], approximations are necessary in most cases. Ideally, these approximation schemes should be based, at least in principle, on a systematic expansion in powers of some small control parameter. In contrast to weakly interacting systems, the large coupling strength cannot itself serve as a perturbation parameter, but one could use its inverse [24,25,26] (strong-coupling perturbation theory) or the inverse of some other large number, such as spin \(S\gg 1\) [27,28,29] or coordination number \(Z\gg 1\) [30,31,32,33,34,35], which typically leads to some sort of mean-field theory.

In the following, we consider the Fermi-Hubbard model as the drosophila of strongly interacting quantum many-body systems [38,39,40] (\(\hbar =1\))

$$\begin{aligned} \hat{H}=-\frac{1}{Z}\sum _{\mu \nu s} T_{\mu \nu } \hat{c}_{\mu s}^\dagger \hat{c}_{\nu s} +U\sum _\mu \hat{n}_\mu ^\uparrow \hat{n}_\mu ^\downarrow \,, \end{aligned}$$
(1)

where \(\hat{c}_{\mu s}^\dagger \) and \(\hat{c}_{\nu s}\) denote the fermionic creation and annihilation operators at the lattice sites \(\mu \) and \(\nu \) with spin \(s\in \{\uparrow ,\downarrow \}\) while \(\hat{n}_\nu ^s\) are the associated number operators. The lattice structure is encoded in the hopping matrix \(T_{\mu \nu }\) which equals the tunneling strength T for nearest neighbors \(\mu \) and \(\nu \) and is zero otherwise. The coordination number Z counts the number of nearest neighbors \(\mu \) for a given lattice site \(\nu \) and is assumed to be large \(Z\gg 1\). Finally, U denotes the on-site repulsion and we focus on the strong-coupling limit \(U\gg T\) in the following.

Let us briefly recapitulate the relevant symmetries of the Fermi-Hubbard Hamiltonian (1). In addition to the total particle numbers \(\hat{N}^s=\sum _\mu \hat{n}_\mu ^s\), the total spin

$$\begin{aligned} \hat{\textbf{S}}=\sum _\mu \hat{\textbf{S}}_\mu =\frac{1}{2}\sum _{\mu ss'}\varvec{\sigma }_{ss'}\hat{c}^\dagger _{\mu s}\hat{c}_{\mu s'} \end{aligned}$$
(2)

is also conserved, where \(\varvec{\sigma }_{ss'}\) are the elements of the Pauli spin matrices reflecting the global SU(2)-invariance [41]. Here, bold-face symbols such as \(\hat{\textbf{S}}=(\hat{S}^x,\hat{S}^y,\hat{S}^z)\) represent vectors.

Another interesting symmetry is the particle-hole duality: If we exchange all creation and annihilation operators \(\hat{c}_{\mu s}^\dagger \leftrightarrow \hat{c}_{\mu s}\) which implies \(\hat{n}_\mu ^s \leftrightarrow 1-\hat{n}_\mu ^s\), we find that the Hamiltonian (1) is mapped to the same form with a negative hopping strength \(T \leftrightarrow -T\) up to an irrelevant shift containing the total particle number \(\hat{N}=\hat{N}^\uparrow +\hat{N}^\downarrow \). In order to avoid this shift, one could consider the grand-canonical Hamiltonian \(\hat{H}_\textrm{gc}=\hat{H}-\mu \hat{N}\) with the chemical potential \(\mu =U/2\) which is then mapped onto itself with \(T \leftrightarrow -T\).

For bi-partite lattices, where one can introduce a parity \((-1)^\mu \) which is alternating for neighboring lattice sites, the pseudo-spin \(\hat{\varvec{\eta }}=(\hat{\eta }^x,\hat{\eta }^y,\hat{\eta }^z)\) yields another conserved quantity (see Appendix C). In addition, for this case the staggered gauge transformation \(\hat{c}_{\mu s}\rightarrow (-1)^\mu \hat{c}_{\mu s}\) also maps the Hamiltonian (1) into the same form with a negative hopping strength \(T \leftrightarrow -T\) [22, 36, 37].

A prominent example for the crucial differences between weakly and strongly interacting systems is the Mott insulator [38, 42]. For weak interactions \(U\ll T\), the state at half filling for both spin species would be metallic, with the Fermi surface only slightly deformed by the coupling U. The Mott insulator [43] is realized in the opposite limit \(U \gg T\), where the ground state is insulating and essentially one particle occupies each lattice site, up to small virtual hopping corrections with probabilities of order \(T^2/U^2\). In order to facilitate transport, one has to excite a doublon-holon pair, which requires a minimum energy given by the Mott gap \(\Delta E_\textrm{Mott}\approx U\). Note that, in contrast to these real and long-lived doublon-holon pairs, the hopping corrections mentioned above are sometimes pictured as virtual and short-lived doublon-holon pairs, which do not require such an excitation energy and are present in the ground state.

In the following, we shall study the properties of these quasi-particle excitations on top of the Mott insulating state [44]. As explained above, this includes the single-particle characteristics such as their dispersion relation – but also two-particle properties describing their interaction among each other.

2 Hierarchy of Correlations

In order to pursue the strategy described in the Introduction, let us first employ the method of the hierarchy of correlations, see also [30, 33, 45]. To this end, we consider the reduced density matrices of one \(\hat{\rho }_\mu \), two \(\hat{\rho }_{\mu \nu }\), and three \(\hat{\rho }_{\mu \nu \lambda }\) lattice sites, etc., and split up the correlated parts via \(\hat{\rho }_{\mu \nu }^\textrm{corr}=\hat{\rho }_{\mu \nu }-\hat{\rho }_{\mu }\hat{\rho }_{\nu }\), and so on.

Now, based on the assumption \(Z\gg 1\), we may employ an expansion into powers of 1/Z where we find that higher-order correlators are successively suppressed as \(\hat{\rho }_{\mu \nu }^\textrm{corr}= \mathcal {O}(1/Z), \hat{\rho }_{\mu \nu \lambda }^{\text {corr}}=\mathcal {O}(1/Z^2)\), and so on. This hierarchy facilitates an iterative approximation scheme, where we may start from the exact evolution equations

$$\begin{aligned} i\partial _t \hat{\rho }_\mu= & {} F_1(\hat{\rho }_\mu ,\hat{\rho }_{\mu \nu }^\textrm{corr})\,, \nonumber \\ i\partial _t \hat{\rho }_{\mu \nu }^\textrm{corr}= & {} F_2(\hat{\rho }_\mu ,\hat{\rho }_{\mu \nu }^\textrm{corr},\hat{\rho }_{\mu \nu \lambda }^\textrm{corr})\,,\nonumber \\ i\partial _t \hat{\rho }_{\mu \nu \lambda }^\textrm{corr}= & {} F_3(\hat{\rho }_\mu ,\hat{\rho }_{\mu \nu }^\textrm{corr},\hat{\rho }_{\mu \nu \lambda }^\textrm{corr}, \hat{\rho }_{\mu \nu \lambda \kappa }^\textrm{corr}), \end{aligned}$$
(3)

and so on for even higher orders, where the functions \(F_n\) are determined by the Hamiltonian (1).

To lowest order \(\mathcal {O}(Z^0)\), we may approximate the first equation by \(i\partial _t \hat{\rho }_\mu =F_1(\hat{\rho }_\mu ,0)+\mathcal {O}(1/Z)\). The solution to this equation obeying the required boundary conditions then yields the mean-field ansatz \(\hat{\rho }_\mu ^0\) as the starting point for calculating the higher orders in 1/Z.

In order to describe the Mott insulator state at half filling in the strong-coupling limit \(U\gg T\), we use the simple mean-field ansatz (at zero temperature)

$$\begin{aligned} \hat{\rho }_\mu ^0=\frac{|{\uparrow }\rangle _\mu \langle {\uparrow }|+|{\downarrow }\rangle _\mu \langle {\downarrow }|}{2}. \end{aligned}$$
(4)

In principle, the aforementioned virtual hopping corrections with small probabilities \(\sim T^2/U^2\) for an empty \(|0\rangle _\mu \) or full lattice site \(|\uparrow \downarrow \rangle _\mu \) could be included as well, but we neglect them here.

Note that the above mean-field ansatz (4) is invariant under the particle-hole duality transformation mentioned after equation (2) and does not include any spin ordering to lowest order. A staggered mean-field ansatz which does display spin ordering could be introduced for bi-partite lattices via

$$\begin{aligned} \hat{\rho }_\mu ^\text {Ising}=\left\{ \begin{array}{lll} |{\uparrow }\rangle _\mu \langle {\uparrow }| &{} \text {for} &{} \mu \in \mathcal {A}\\ |{\downarrow }\rangle _\mu \langle {\downarrow }| &{} \text {for} &{} \mu \in \mathcal {B} \end{array} \right. \,, \end{aligned}$$
(5)

where \(\mathcal A\) and \(\mathcal B\) denote the two sub-lattices. This state describes an Ising type anti-ferromagnet where \(\langle \hat{S}^z_\mu \hat{S}^z_\nu \rangle \) is minimized and the \({\mathbb Z}_2\) symmetry \(\hat{c}_{\mu \uparrow } \leftrightarrow \hat{c}_{\mu \downarrow }\) is spontaneously broken.

Note, however, that the ground state of the Fermi-Hubbard model (1) does not display this Ising type but rather Heisenberg type anti-ferromagnetic order where \(\langle \hat{\textbf{S}}_\mu \cdot \hat{\textbf{S}}_\nu \rangle \) is minimized instead of \(\langle \hat{S}^z_\mu \hat{S}^z_\nu \rangle \) as in the Ising case. This ground state is invariant under the global SU(2) symmetry generated by the total spin (2), in contrast to the broken \({\mathbb Z}_2\) symmetry of the Ising case. Formally, the reduced density matrix of a single lattice site \(\mu \) is given by the ansatz (4). The correlations between neighboring lattice sites \(\mu \) and \(\nu \) can be taken into account via \(\hat{\rho }_{\mu \nu }^\textrm{corr}\). As a more intuitive picture, the Heisenberg type anti-ferromagnet can be visualized as lying somewhere in between the fully ordered Ising-type state (5) and the state (4) without any spin order, see also equation (30) below.
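The split into a single-site part and a correlated part can be illustrated with a small numerical sketch. It is not taken from the derivation above: two neighboring sites are restricted to their singly occupied spin states, and the spin singlet serves as a stand-in for Heisenberg type correlations. The singlet, the basis ordering, and all function names are illustrative assumptions.

```python
# Two-site toy model in the spin basis |uu>, |ud>, |du>, |dd> (assumed ordering).
from math import sqrt

def outer(v, w):
    """Projector-like matrix |v><w| for real vectors."""
    return [[a * b for b in w] for a in v]

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    size = n * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]

def reduce_first_site(rho):
    """Trace out the second site of a 4x4 two-spin density matrix."""
    return [[sum(rho[2 * i + k][2 * j + k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def reduce_second_site(rho):
    """Trace out the first site of a 4x4 two-spin density matrix."""
    return [[sum(rho[2 * k + i][2 * k + j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Spin singlet (|ud> - |du>)/sqrt(2) as an assumed example of a correlated pair
singlet = [0.0, 1 / sqrt(2), -1 / sqrt(2), 0.0]
rho_pair = outer(singlet, singlet)

rho_mu = reduce_first_site(rho_pair)    # equals the spin part of ansatz (4)
rho_nu = reduce_second_site(rho_pair)
product = kron(rho_mu, rho_nu)

# Correlated part: rho_corr = rho_pair - rho_mu (x) rho_nu
rho_corr = [[rho_pair[i][j] - product[i][j] for j in range(4)] for i in range(4)]

# S^z_mu S^z_nu is diagonal in this basis with entries (+1/4, -1/4, -1/4, +1/4)
szsz = [0.25, -0.25, -0.25, 0.25]
zz_corr = sum(szsz[i] * rho_corr[i][i] for i in range(4))
print("single-site matrix:", rho_mu)
print("<Sz Sz> carried by the correlated part:", zz_corr)
```

In this sketch the single-site matrices are both proportional to the identity, so the entire anti-ferromagnetic correlation sits in the correlated part. The toy has only one neighbor per site, so its correlated part is not small; the \(1/Z\) suppression stated above applies to lattices with many neighbors.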

The correlations \(\hat{\rho }_{\mu \nu }^\textrm{corr}\) can be further suppressed for finite temperatures. For example, if the temperature is much larger than the effective anti-ferromagnetic interaction \(\mathcal {O}(T^2/U)\) but still well below the Mott gap \(\Delta E=\mathcal {O}(U)\), the ansatz (4) would basically reproduce the exact thermal density matrix. As another possibility, the coupling to an environment can effectively steer the system towards the state (4), see, e.g., [32, 46].

2.1 Doublons and Holons

To next order in 1/Z, we may derive the quasi-particle excitations by approximating the second line of equation (3) via \(i\partial _t \hat{\rho }_{\mu \nu }^\textrm{corr}= F_2(\hat{\rho }_\mu ^0,\hat{\rho }_{\mu \nu }^\textrm{corr},0)+\mathcal {O}(1/Z^2)\) which yields a linear equation for \(\hat{\rho }_{\mu \nu }^\textrm{corr}\). To solve this linear equation, it is useful to split the original annihilation operator

$$\begin{aligned} \hat{c}_{\mu \uparrow }=|\downarrow \rangle _\mu \!\langle \uparrow \downarrow |+|0\rangle _\mu \!\langle \uparrow |=\hat{c}_{\mu \uparrow }\hat{n}_\mu ^\downarrow +\hat{c}_{\mu \uparrow }(1-\hat{n}_\mu ^\downarrow )=\hat{f}_{\mu \uparrow }+\hat{e}_{\mu \uparrow }^\dagger \end{aligned}$$
(6)

into the annihilation operator \(\hat{f}_{\mu \uparrow }\) of a full lattice site \(|\uparrow \downarrow \rangle _\mu \) and the creation operator \(\hat{e}_{\mu \uparrow }^\dagger \) of an empty lattice site \(|0\rangle _\mu \). After a spatial Fourier transform (assuming infinite-size lattices), the linear equation for \(\hat{\rho }_{\mu \nu }^\textrm{corr}\) can be mapped onto a set of linear equations for the operators \(\hat{f}_{\textbf{k}s}\) and \(\hat{e}_{\textbf{k}s}^\dagger \)

$$\begin{aligned} i\partial _t \left( \begin{array}{l} \hat{f}_{\textbf{k}s}\\ \hat{e}_{\textbf{k}s}^\dagger \end{array} \right) = \left( \begin{array}{ll} U-T_\textbf{k}/2 &{} -T_\textbf{k}/2 \\ -T_\textbf{k}/2 &{} -T_\textbf{k}/2 \end{array} \right) \cdot \left( \begin{array}{l} \hat{f}_{\textbf{k}s} \\ \hat{e}_{\textbf{k}s}^\dagger \end{array} \right) . \end{aligned}$$
(7)

For convenience, we have re-scaled all length scales with respect to the lattice spacing \(\ell \) and thus the wave-numbers used here are dimensionless. Note that the wave-numbers \(\textbf{k}\) are also vectors, but – depending on the dimensionality of the lattice – possibly in a vector space of different dimension than the three-dimensional vector \(\hat{\textbf{S}}\) in equation (2).

Diagonalizing the above \(2\times 2\)-matrix, the eigenvalues \(\lambda ^\pm _\textbf{k}\) yield the quasi-particle energies \(E^\pm _\textbf{k}\) via

$$\begin{aligned} \lambda ^\pm _\textbf{k}=\pm E^\pm _\textbf{k}=\frac{1}{2}\left( U-T_\textbf{k}\pm \sqrt{T_\textbf{k}^2+U^2}\right) \,, \end{aligned}$$
(8)

where \(T_\textbf{k}\) denotes the Fourier transform of the hopping matrix \(T_{\mu \nu }\). In the strong-coupling limit \(U\gg T\), these quasi-particle energies simplify to \(E^+_\textbf{k}\approx U-T_\textbf{k}/2\) and \(E^-_\textbf{k}\approx T_\textbf{k}/2\).
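As a quick numerical cross-check of equation (8), the following sketch evaluates the closed-form eigenvalues, verifies that they are roots of the characteristic polynomial of the matrix in equation (7), and compares them with the strong-coupling expressions. The function names and the sample values of \(U\) and \(T_\textbf{k}\) are illustrative assumptions.

```python
from math import sqrt

def quasi_particle_energies(U, Tk):
    """Return (E_plus, E_minus) from the eigenvalues of the matrix in equation (7)."""
    root = sqrt(Tk**2 + U**2)
    lam_plus = 0.5 * (U - Tk + root)
    lam_minus = 0.5 * (U - Tk - root)
    # Equation (8): lambda^+ = +E^+ and lambda^- = -E^-
    return lam_plus, -lam_minus

def characteristic_residual(U, Tk, lam):
    """det(M - lam) for M = [[U - Tk/2, -Tk/2], [-Tk/2, -Tk/2]]; zero for an eigenvalue."""
    return (U - Tk / 2 - lam) * (-Tk / 2 - lam) - (Tk / 2) ** 2

U, Tk = 20.0, 1.0  # sample values in the strong-coupling regime
E_plus, E_minus = quasi_particle_energies(U, Tk)

print("E+ exact vs. U - Tk/2:", E_plus, U - Tk / 2)
print("E- exact vs. Tk/2:    ", E_minus, Tk / 2)
print("residuals:", characteristic_residual(U, Tk, E_plus),
      characteristic_residual(U, Tk, -E_minus))
```

For these sample values, both exact energies differ from their strong-coupling limits by an amount of order \(T_\textbf{k}^2/U\).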

Starting from the grand-canonical Hamiltonian including the chemical potential, the matrix in equation (7) would contain \(\pm U/2\) on the diagonal and thus its eigenvalues would be lowered by U/2 such that the quasi-particle energies in (8) assume a more symmetric form \(E^\pm _\textbf{k}\approx U/2\mp T_\textbf{k}/2\). In the following, we shall use the convention (8).
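Written out, and assuming that the chemical potential term simply shifts both diagonal entries of the matrix in equation (7) by \(-U/2\) (our reading of the statement above rather than an independent derivation), the shifted matrix and its eigenvalues would read

$$\begin{aligned} \left( \begin{array}{ll} U/2-T_\textbf{k}/2 &{} -T_\textbf{k}/2 \\ -T_\textbf{k}/2 &{} -U/2-T_\textbf{k}/2 \end{array} \right) \,,\qquad \lambda ^\pm _\textbf{k}=\pm E^\pm _\textbf{k}=\frac{1}{2}\left( -T_\textbf{k}\pm \sqrt{T_\textbf{k}^2+U^2}\right) \,, \end{aligned}$$

so that \(E^\pm _\textbf{k}=(\sqrt{T_\textbf{k}^2+U^2}\mp T_\textbf{k})/2\approx U/2\mp T_\textbf{k}/2\) for \(U\gg T\).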

Note that, starting from the mean-field ansatz (5) reflecting the perfect Ising type anti-ferromagnetic order, the quasi-particle energies would not contain such a contribution linear in \(T_\textbf{k}\) but scale quadratically \(\mathcal {O}(T_\textbf{k}^2/U)\). As an intuitive picture, since neighboring sites always have opposite spins, the propagation of doublons or holons on such a perfectly spin-ordered background can only occur via second-order tunneling processes. In contrast, for Heisenberg type spin order (or an unordered state), there is a finite probability that neighboring lattice sites are occupied by particles with the same spin, such that doublons or holons can propagate via first-order tunneling processes.

The eigenvectors of the matrix in equation (7) determine the Bogoliubov transformation to the quasi-particle operators \(\hat{d}_{\textbf{k}s}\) and \(\hat{h}_{\textbf{k}s}^\dagger \)

$$\begin{aligned} \left( \begin{array}{c} \hat{d}_{\textbf{k}s} \\ \hat{h}_{\textbf{k}s}^\dagger \end{array} \right) = \left( \begin{array}{cc} \cos \varphi _\textbf{k} &{} \sin \varphi _\textbf{k} \\ -\sin \varphi _\textbf{k} &{} \cos \varphi _\textbf{k} \end{array} \right) \cdot \left( \begin{array}{c} \hat{f}_{\textbf{k}s} \\ \hat{e}_{\textbf{k}s}^\dagger \end{array} \right) \end{aligned}$$
(9)

with the rotation angle

$$\begin{aligned} \tan \varphi _\textbf{k}=\frac{\sqrt{T_\textbf{k}^2+U^2}+U}{T_\textbf{k}} \end{aligned}$$
(10)

where \(\hat{d}_{\textbf{k}s}\) is the annihilation operator for a doublon with energy \(E^+_\textbf{k}\) while \(\hat{h}_{\textbf{k}s}^\dagger \) is the creation operator of a holon with energy \(E^-_\textbf{k}\).

Of course, the above derivation of the quasi-particle picture is not unique; one can also derive it via other means, e.g., the Hubbard approximation [4, 38]. However, the 1/Z-expansion provides a clear and controlled path to incorporate higher orders consistently.

2.2 Boltzmann Equation

To first order in 1/Z, the time evolution of the operators \(\hat{d}_{\textbf{k}s}\) and \(\hat{h}_{\textbf{k}s}\) is simply governed by the trivial phase factors \(\exp \{-iE^\pm _\textbf{k}t\}\) such that their populations \(\langle \hat{d}_{\textbf{k}s}^\dagger \hat{d}_{\textbf{k}s}\rangle \) and \(\langle \hat{h}_{\textbf{k}s}^\dagger \hat{h}_{\textbf{k}s}\rangle \) remain constant. Interactions such as collisions between these quasi-particles leading to a finite energy and momentum transfer would induce a redistribution of these populations and are thus not described within this first-order approach. To incorporate such interactions, one has to include higher orders in 1/Z.

To second order \(1/Z^2\), one should take the three-point correlator \(\hat{\rho }_{\mu \nu \lambda }^\textrm{corr}\) in the second line of equation (3) into account. Its time-derivative also contains the four-point correlator \(\hat{\rho }_{\mu \nu \lambda \kappa }^\textrm{corr}\), which is of order \(1/Z^3\). Truncating the set of evolution equations (3) at this order, i.e., neglecting all terms scaling with \(1/Z^4\) or higher, we may apply basically the same steps (Markov approximation etc.) as for weakly interacting systems and arrive at a Boltzmann equation describing the redistribution of the quasi-particle populations \(\mathfrak {d}_\textbf{k}^s= \langle \hat{d}_{\textbf{k}s}^\dagger \hat{d}_{\textbf{k}s}\rangle \) and \(\mathfrak {h}_\textbf{k}^s= \langle \hat{h}_{\textbf{k}s}^\dagger \hat{h}_{\textbf{k}s}\rangle \). Focusing on the holon sector for simplicity, we find in the strong-coupling limit \(U\gg T\) (where \(E^-_\textbf{k}\approx T_\textbf{k}/2\)) for the mean-field background (4), see Appendices A and B [45, 47]

$$\begin{aligned} \partial _t \mathfrak {h}_\textbf{k}^\uparrow= & {} -2\pi \int \limits _{\textbf{p}\textbf{q}}\left( T_\textbf{k}+T_\textbf{p}\right) ^2\delta \left( E_\textbf{k}^-+E_\textbf{p}^- -E_\mathbf {k+q}^- -E_\mathbf {p-q}^- \right) \nonumber \\{} & {} \times \left[ \mathfrak {h}_\textbf{k}^\uparrow \mathfrak {h}_\textbf{p}^\downarrow \left( 1-\mathfrak {h}_\mathbf {k+q}^\uparrow \right) \left( 1-\mathfrak {h}_\mathbf {p-q}^\downarrow \right) -\mathfrak {h}_\mathbf {k+q}^\uparrow \mathfrak {h}_\mathbf {p-q}^\downarrow \left( 1-\mathfrak {h}_\textbf{k}^\uparrow \right) \left( 1-\mathfrak {h}_\textbf{p}^\downarrow \right) \right] . \end{aligned}$$
(11)

Thus, even in the strongly interacting limit, the quasi-particle distributions obey a Boltzmann equation which has the usual interpretation: Two holons with opposite spins and initial momenta \(\textbf{k}\) and \(\textbf{p}\) collide with each other and are scattered to the final momenta \(\mathbf {k+q}\) and \(\mathbf {p-q}\) where \(\textbf{q}\) is the momentum transfer. Note that the scattering cross section \(\propto (T_\textbf{k}+T_\textbf{p})^2\) is actually independent of the momentum transfer \(\textbf{q}\). For two holons with the same spin, we found a vanishing scattering cross section, i.e., they do not interact at this order.

The second term in the last line of equation (11) represents the inverse process and ensures the conservation of probability or total holon number. Energy conservation is implied by the Dirac delta distribution in the first line of (11). Since the above Boltzmann equation (11) assumes the standard form, it entails the usual consequences, such as the H-theorem describing thermalization etc.
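One of these usual consequences can be checked directly. The sketch below assumes Fermi-Dirac populations for the two holon spin species (a standard stationary form for Boltzmann equations with Pauli blocking, taken here as an assumption and not derived in the text) and evaluates the square bracket of equation (11) on and off the energy shell. All numerical values and function names are illustrative.

```python
from math import exp

def fermi(E, beta, mu):
    """Fermi-Dirac occupation for energy E, inverse temperature beta, chemical potential mu."""
    return 1.0 / (exp(beta * (E - mu)) + 1.0)

def collision_bracket(h_k, h_p, h_kq, h_pq):
    """Square bracket of equation (11): loss term minus gain term."""
    loss = h_k * h_p * (1 - h_kq) * (1 - h_pq)
    gain = h_kq * h_pq * (1 - h_k) * (1 - h_p)
    return loss - gain

beta, mu_up, mu_down = 2.0, 0.1, -0.2   # illustrative parameters
E_k, E_p, E_kq = 0.30, -0.10, 0.05      # illustrative holon energies
E_pq = E_k + E_p - E_kq                 # energy conservation from the delta distribution

on_shell = collision_bracket(
    fermi(E_k, beta, mu_up), fermi(E_p, beta, mu_down),
    fermi(E_kq, beta, mu_up), fermi(E_pq, beta, mu_down),
)
off_shell = collision_bracket(
    fermi(E_k, beta, mu_up), fermi(E_p, beta, mu_down),
    fermi(E_kq, beta, mu_up), fermi(E_pq + 0.3, beta, mu_down),
)
print("bracket on the energy shell: ", on_shell)
print("bracket off the energy shell:", off_shell)
```

On the energy shell the loss and gain terms cancel for any inverse temperature and any pair of spin-resolved chemical potentials, so such distributions are stationary under the collision term. Off the shell the bracket is finite, but those configurations are excluded by the delta distribution.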

Focusing on the doublon sector instead, one obtains precisely the same form of the Boltzmann equation (11) for \(\mathfrak {d}_\textbf{k}^s= \langle \hat{d}_{\textbf{k}s}^\dagger \hat{d}_{\textbf{k}s}\rangle \) instead of \(\mathfrak {h}_\textbf{k}^s= \langle \hat{h}_{\textbf{k}s}^\dagger \hat{h}_{\textbf{k}s}\rangle \), as expected from the particle-hole duality mentioned in the Introduction. Taking both sectors into account simultaneously also accounts for collisions between doublons and holons, see Appendices A and B.

Note that initial states which are spin polarized in the \(\sigma _x\) direction, for example, would also induce off-diagonal terms such as \(\langle \hat{h}_{\textbf{k}\uparrow }^\dagger \hat{h}_{\textbf{k}\downarrow }\rangle \), see also [48]. In the absence of such a spin polarization, however, these terms vanish initially and thus stay zero throughout the evolution because our equations of motion do not contain symmetry-breaking contributions such as magnetic fields. Thus, we omit these off-diagonal terms here.

3 Effective Hamiltonian

In order to compare the Boltzmann equation (11) obtained via the 1/Z-expansion with the standard derivation of Boltzmann equations for weakly interacting systems, let us construct an effective Hamiltonian which would reproduce equation (11) in this way. To this end, let us start with the usual fermionic creation and annihilation operators \(\hat{a}_{\textbf{k}s}^\dagger \) and \(\hat{a}_{\textbf{k}s}\) and the standard ansatz for such an effective Hamiltonian

$$\begin{aligned} \hat{H}_\textrm{eff}=\sum _s\int \limits _{\textbf{k}}E_\textbf{k} \hat{a}_{\textbf{k}s}^\dagger \hat{a}_{\textbf{k}s}+\int \limits _{\textbf{k}\textbf{p}\textbf{q}}V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }\hat{a}_{\mathbf {k+q}\uparrow }^\dagger \hat{a}_{\mathbf {p-q}\downarrow }^\dagger \hat{a}_{\textbf{p}\downarrow }\hat{a}_{\textbf{k}\uparrow }\,. \end{aligned}$$
(12)

If we now set \(V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }= -(T_\textbf{k}+T_\textbf{p}+T_\mathbf {k+q}+T_\mathbf {p-q})/2\) as well as \(E_\textbf{k}=T_\textbf{k}/2\), we would indeed recover equation (11) via the usual Born-Markov approximation.
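The link between this matrix element and the prefactor in equation (11) can be made explicit in one step, assuming (as usual in the Born-Markov treatment) that the scattering rate is governed by \(|V^{\uparrow \downarrow }_{\textbf{k}\textbf{p}\textbf{q}}|^2\) evaluated on the energy shell. With \(E_\textbf{k}=T_\textbf{k}/2\), the delta distribution in equation (11) enforces \(T_\mathbf {k+q}+T_\mathbf {p-q}=T_\textbf{k}+T_\textbf{p}\), and thus

$$\begin{aligned} V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }\Big |_\mathrm{on\,shell}=-\left( T_\textbf{k}+T_\textbf{p}\right) \,,\qquad \left| V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }\right| ^2\Big |_\mathrm{on\,shell}=\left( T_\textbf{k}+T_\textbf{p}\right) ^2\,, \end{aligned}$$

which is the \(\textbf{q}\)-independent cross section appearing in equation (11).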

However, a few cautionary remarks are in order. First, the standard derivation of equation (11) from equation (12) is based on the usual fermionic anti-commutation relations between the operators \(\hat{a}_{\textbf{k}s}^\dagger \) and \(\hat{a}_{\textbf{k}s}\). In contrast, neither the doublon \(\hat{d}_{\textbf{k}s}^\dagger \) and \(\hat{d}_{\textbf{k}s}\) nor the holon operators \(\hat{h}_{\textbf{k}s}^\dagger \) and \(\hat{h}_{\textbf{k}s}\) satisfy these anti-commutation relations, see equations (6) and (9). Second, in contrast to the weakly interacting case, both \(V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }\) and \(E_\textbf{k}\) scale with \(T_\textbf{k}\) and are thus not really independent, which requires special care when justifying the Born-Markov approximation. It should also be noted here that the insertion of the simple replacement \(\hat{a}_{\mu \uparrow }\rightarrow \hat{a}_{\mu \uparrow } (1-\hat{a}_{\mu \downarrow }^\dagger \hat{a}_{\mu \downarrow })\) into the free Hamiltonian \(\sum _{\mu \nu s} T_{\mu \nu } \hat{a}_{\mu s}^\dagger \hat{a}_{\nu s}\) does not yield the correct effective Hamiltonian (12).
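The first remark can be made concrete on a single lattice site. The sketch below represents \(\hat{c}_{\mu \uparrow }\) as a \(4\times 4\) matrix, performs the split of equation (6), and compares the resulting anti-commutators with the canonical one. The basis ordering, the sign convention for the doubly occupied state, and all helper names are assumptions of this illustration; the doublon and holon operators are momentum-space combinations of these building blocks via equation (9), which is not modeled here.

```python
# Single-site Fock basis, assumed ordering: |0>, |up>, |down>, |up,down> = c_up^dagger c_down^dagger |0>.
DIM = 4

def zeros():
    return [[0] * DIM for _ in range(DIM)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(DIM)) for j in range(DIM)]
            for i in range(DIM)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(DIM)] for i in range(DIM)]

def dagger(a):
    # All matrices here are real, so the adjoint is just the transpose.
    return [[a[j][i] for j in range(DIM)] for i in range(DIM)]

def anticommutator(a, b):
    return add(matmul(a, b), matmul(b, a))

identity = [[int(i == j) for j in range(DIM)] for i in range(DIM)]

c_up = zeros()
c_up[0][1] = 1   # c_up |up>      = |0>
c_up[2][3] = 1   # c_up |up,down> = |down>

n_down = zeros()
n_down[2][2] = n_down[3][3] = 1

one_minus_n_down = [[identity[i][j] - n_down[i][j] for j in range(DIM)]
                    for i in range(DIM)]

f_up = matmul(c_up, n_down)                 # annihilates a full site, equation (6)
e_up_dag = matmul(c_up, one_minus_n_down)   # creates an empty site, equation (6)

split_ok = add(f_up, e_up_dag) == c_up
canonical_c = anticommutator(c_up, dagger(c_up)) == identity
f_gives_n_down = anticommutator(f_up, dagger(f_up)) == n_down
print(split_ok, canonical_c, f_gives_n_down)
```

In this representation the full operator obeys the canonical relation, whereas the anti-commutator of \(\hat{f}_{\mu \uparrow }\) with its adjoint yields \(\hat{n}_\mu ^\downarrow \) and that of \(\hat{e}_{\mu \uparrow }\) yields \(1-\hat{n}_\mu ^\downarrow \), i.e., operators and not the identity.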

As another point, the scattering cross section in the Boltzmann equation (11) is given by the square of the interaction matrix element \(|V^{\uparrow \downarrow }_{\textbf{k}\textbf{p}\textbf{q}}|^2\) and thus does not uniquely determine the sign (or phase) of \(V^{\uparrow \downarrow }_{\textbf{k}\textbf{p}\textbf{q}}\), e.g., whether the interaction is attractive or repulsive. For example, for doublons one should insert \(E_\textbf{k}=U-T_\textbf{k}/2\) and \(V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }= (T_\textbf{k}+T_\textbf{p}+T_\mathbf {k+q}+T_\mathbf {p-q})/2\) into the effective Hamiltonian (12), which does, however, yield the same Boltzmann equation (11).

4 Perturbation Theory in T/U

In order to settle the sign ambiguity mentioned above, let us compare our results to strong-coupling perturbation theory, i.e., a power expansion in the small control parameter \(\epsilon =T/U\ll 1\). To this end, we split the Hamiltonian (1) via \(\hat{H}=\hat{H}_U+\hat{H}_T=\hat{H}_0+\hat{H}_1\) into an unperturbed part \(\hat{H}_0=\hat{H}_U=\mathcal {O}(\epsilon ^0)\) plus a perturbation \(\hat{H}_1=\hat{H}_T=\mathcal {O}(\epsilon ^1)\). For general matrix elements

$$\begin{aligned} \mathcal{M}=\langle \Psi _\textrm{out}|\hat{H}|\Psi _\textrm{in}\rangle \,, \end{aligned}$$
(13)

we employ the same power expansion of the states

$$\begin{aligned} |\Psi _\textrm{in}\rangle =|\Psi _\textrm{in}\rangle _0+\epsilon |\Psi _\textrm{in}\rangle _1+\mathcal {O}(\epsilon ^2)\,, \end{aligned}$$
(14)

and analogously for \(|\Psi _\textrm{out}\rangle \).

Because all the states considered in this section satisfy \(\hat{H}_0|\Psi _\textrm{in}\rangle _0=0\) and \(\hat{H}_0|\Psi _\textrm{out}\rangle _0=0\), the first-order matrix elements simplify to

$$\begin{aligned} \mathcal{M}=\langle \Psi _\textrm{out}|\hat{H}_1|\Psi _\textrm{in}\rangle _0+\mathcal {O}(\epsilon ^2)\,. \end{aligned}$$
(15)

Apart from the power expansion in \(\epsilon \), we have not made any assumptions regarding the states \(|\Psi _\textrm{in}\rangle \) and \(|\Psi _\textrm{out}\rangle \), e.g., regarding their degeneracy. They could be the same states, where \(\mathcal{M}\) would yield the energy expectation value, or they could be different states, where \(\mathcal{M}\) would describe a transition matrix element.
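The step from equation (13) to equation (15) can be spelled out as follows. The left subscripts on the bra vectors are a notation introduced only for this illustration and label the order of the expansion (14). Inserting this expansion for both states gives

$$\begin{aligned} \mathcal{M}= & {} {}_0\langle \Psi _\textrm{out}|\hat{H}_0|\Psi _\textrm{in}\rangle _0 +\epsilon \left( {}_1\langle \Psi _\textrm{out}|\hat{H}_0|\Psi _\textrm{in}\rangle _0 +{}_0\langle \Psi _\textrm{out}|\hat{H}_0|\Psi _\textrm{in}\rangle _1\right) \nonumber \\{} & {} +\,{}_0\langle \Psi _\textrm{out}|\hat{H}_1|\Psi _\textrm{in}\rangle _0+\mathcal {O}(\epsilon ^2)\,. \end{aligned}$$

The terms containing \(\hat{H}_0\) vanish because \(\hat{H}_0|\Psi _\textrm{in}\rangle _0=0\) and, since \(\hat{H}_0\) is Hermitian, \({}_0\langle \Psi _\textrm{out}|\hat{H}_0=0\). The remaining term with \(\hat{H}_1=\mathcal {O}(\epsilon )\) between the two zeroth-order states is equation (15).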

4.1 Mott State

Let us start with the Mott state \(|\textrm{Mott}\rangle \), which we take to be the ground state of the Fermi-Hubbard Hamiltonian (1) at half filling (but other choices would also be possible). In the quasi-particle picture, it describes the state without doublons \(\hat{d}_{\textbf{k}s}|\textrm{Mott}\rangle =0\) and holons \(\hat{h}_{\textbf{k}s}|\textrm{Mott}\rangle =0\). After a power expansion in \(\epsilon \)

$$\begin{aligned} |\textrm{Mott}\rangle =|\textrm{Mott}\rangle _0+\epsilon |\textrm{Mott}\rangle _1+\mathcal {O}(\epsilon ^2)\,, \end{aligned}$$
(16)

the zeroth order \(|\textrm{Mott}\rangle _0\) has exactly one particle per site, i.e., \(\hat{e}_{\textbf{k}s}|\textrm{Mott}\rangle _0=0\) and \(\hat{f}_{\textbf{k}s}|\textrm{Mott}\rangle _0=0\).

The virtual hopping corrections mentioned in the Introduction are included in the first-order correction

$$\begin{aligned} |\textrm{Mott}\rangle _1=-\frac{\hat{H}_1}{U}|\textrm{Mott}\rangle _0\,, \end{aligned}$$
(17)

consistent with the Bogoliubov transformation (9) between \(\hat{d}_{\textbf{k}s}\) and \(\hat{h}_{\textbf{k}s}\) on the one hand and \(\hat{f}_{\textbf{k}s}\) and \(\hat{e}_{\textbf{k}s}\) on the other hand. Obviously, the first-order energy shift vanishes

$$\begin{aligned} \langle \textrm{Mott}|\hat{H}_1|\textrm{Mott}\rangle _0=0\,, \end{aligned}$$
(18)

such that the ground-state energy is of order \(T^2/U\).
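These scalings can be illustrated with the exactly solvable two-site model at half filling, used here purely as a toy stand-in for the lattice (an assumption of this sketch, with \(Z=1\) and hopping \(T\)). In the singlet sector spanned by the state with one particle per site and the symmetric doublon-holon state, the Hamiltonian reduces to a \(2\times 2\) matrix whose off-diagonal sign depends on phase conventions, while its eigenvalues do not.

```python
from math import sqrt

def two_site_ground_state(U, T):
    """Ground-state energy and doublon-holon weight of the two-site model at half filling.

    Assumed singlet-sector basis: covalent singlet (one particle per site) and
    symmetric ionic state (doublon-holon pair), with H = [[0, -2T], [-2T, U]].
    """
    energy = 0.5 * (U - sqrt(U**2 + 16 * T**2))
    ratio = -energy / (2 * T)                 # ionic / covalent amplitude
    ionic_weight = ratio**2 / (1 + ratio**2)  # probability of an empty/full site pair
    return energy, ionic_weight

U, T = 20.0, 1.0  # sample values with T/U = 0.05
energy, ionic_weight = two_site_ground_state(U, T)
print("ground-state energy:", energy, " vs. -4 T^2/U   =", -4 * T**2 / U)
print("doublon-holon weight:", ionic_weight, " vs. 4 T^2/U^2 =", 4 * T**2 / U**2)
```

For small \(T/U\), the energy approaches \(-4T^2/U\) and the admixture of the doublon-holon state has an amplitude close to \(2T/U\), in line with the first-order correction (17) and the probabilities of order \(T^2/U^2\) mentioned in the Introduction.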

4.2 One-Holon State

The quasi-particle picture described above motivates the ansatz \(\hat{h}_{\textbf{k}s}^\dagger |\textrm{Mott}\rangle \) for the state containing one holon. However, one should be a bit careful because the operators \(\hat{h}_{\textbf{k}s}\) and \(\hat{h}_{\textbf{k}s}^\dagger \) do not obey the usual fermionic anti-commutation relations. Fortunately, the calculation of the first-order matrix elements (15) only requires the zeroth-order states

$$\begin{aligned} |\Psi _\textrm{in}\rangle _0 = \mathcal{N}_{\textbf{k}\uparrow }\hat{e}^\dagger _{\textbf{k}\uparrow }|\textrm{Mott}\rangle _0 = \mathcal{N}_{\textbf{k}\uparrow }\hat{c}_{\textbf{k}\uparrow }|\textrm{Mott}\rangle _0 = \mathcal{N}_{\textbf{k}\uparrow }\sum \limits _\alpha \hat{c}_{\alpha \uparrow }|\textrm{Mott}\rangle _0 \exp \{i\textbf{k}\cdot \textbf{r}_\alpha \}\,, \end{aligned}$$
(19)

where most of these difficulties are absent because the operators \(\hat{c}_{\textbf{k}s}\) and \(\hat{c}_{\textbf{k}s}^\dagger \) do satisfy the standard anti-commutation relations. The normalization \(\mathcal{N}_{\textbf{k}\uparrow }\) can be derived from \(\langle \textrm{Mott}| \hat{c}^\dagger _{\alpha \uparrow }\hat{c}_{\beta \uparrow } |\textrm{Mott}\rangle _0 =\delta _{\alpha \beta } \langle \textrm{Mott}|\hat{n}_{\alpha \uparrow }|\textrm{Mott}\rangle _0\) and is – independently of \(\textbf{k}\) – just determined by the total number of particles with spin \(\uparrow \).

In analogy, we use the same ansatz for \(|\Psi _\textrm{out}\rangle _0\) with \(\textbf{k}'\) and the same spin \(\uparrow \) (all other matrix elements vanish), which yields

$$\begin{aligned} \mathcal{M}= & {} -\frac{\left| \mathcal{N}_\uparrow \right| ^2}{Z}\sum \limits _{\alpha \beta \mu \nu s}T_{\mu \nu }\exp \{i\textbf{k}\cdot \textbf{r}_\alpha -i\textbf{k}'\cdot \textbf{r}_\beta \}\nonumber \\{} & {} \times \langle \textrm{Mott}|\hat{c}^\dagger _{\beta \uparrow }\hat{c}^\dagger _{\mu s}\hat{c}_{\nu s}\hat{c}_{\alpha \uparrow }|\textrm{Mott}\rangle _0 +\mathcal {O}(\epsilon ^2)\, . \end{aligned}$$
(20)

Since the hopping matrix \(T_{\mu \nu }\) is only non-zero for \(\mu \ne \nu \) and the state \(|\textrm{Mott}\rangle _0\) has exactly one particle per site, we may set \(\alpha =\mu \) and \(\beta =\nu \) or vice versa in the sum

$$\begin{aligned} \mathcal{M}= & {} \frac{\left| \mathcal{N}_\uparrow \right| ^2}{Z}\sum \limits _{\mu \nu }T_{\mu \nu }\exp \{i\textbf{k}\cdot \textbf{r}_\mu -i\textbf{k}'\cdot \textbf{r}_\nu \}\nonumber \\{} & {} \times \big (\langle \textrm{Mott}|\hat{n}_{\mu \uparrow }\hat{n}_{\nu \uparrow }|\textrm{Mott}\rangle _0-\langle \textrm{Mott}|\hat{c}^\dagger _{\mu \downarrow } \hat{c}_{\mu \uparrow }\hat{c}^\dagger _{\nu \uparrow }\hat{c}_{\nu \downarrow }|\textrm{Mott}\rangle _0\big )\nonumber \\{} & {} +\mathcal {O}(\epsilon ^2) \, . \end{aligned}$$
(21)

In addition to the number correlator, we obtain the spin-flip term \(\langle \hat{S}^-_\mu \hat{S}^+_\nu \rangle _0\).

If the lattice and the state \(|\textrm{Mott}\rangle _0\) obey translational invariance, the expectation values only depend on the relative coordinate \(\textbf{r}_\mu -\textbf{r}_\nu \) and thus the sum over the center-of-mass coordinate \(\textbf{r}_\mu +\textbf{r}_\nu \) corresponds to momentum conservation \(\delta _{\textbf{k}\textbf{k}'}\). In case of rotational invariance, the expectation values yield the same result for all pairs of neighbors \(\mu \) and \(\nu \) and thus the remaining sum over \(\textbf{r}_\mu -\textbf{r}_\nu \) just yields the Fourier transform \(T_\textbf{k}\) of the hopping matrix, i.e., \(\mathcal{M}\propto \delta _{\textbf{k}\textbf{k}'}T_\textbf{k}+\mathcal {O}(\epsilon ^2)\). For the mean-field ansatz (4), we find \(\mathcal{M}=\delta _{\textbf{k}\textbf{k}'}T_\textbf{k}/2+\mathcal {O}(\epsilon ^2)\) which reproduces the holon energy (8) to lowest order for \(\textbf{k}=\textbf{k}'\) and vanishes for \(\textbf{k}\ne \textbf{k}'\), reflecting momentum conservation. As an outlook, one could study the scattering of holons (i.e., \(\textbf{k}\ne \textbf{k}'\)) by spin inhomogeneities via inserting a mean-field ansatz which breaks translational invariance.
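The mean-field value quoted above can be reproduced in a few lines under the following assumptions, none of which is fixed explicitly in the text: the lattice has \(N_\textrm{s}\) sites, the Fourier transform is normalized as \(T_\textbf{k}=Z^{-1}\sum _\mu T_{\mu \nu }\exp \{i\textbf{k}\cdot (\textbf{r}_\mu -\textbf{r}_\nu )\}\), and the normalization is \(|\mathcal{N}_\uparrow |^2=2/N_\textrm{s}\), corresponding to \(N_\textrm{s}/2\) particles with spin \(\uparrow \). For the product state (4), the number correlator equals \(1/4\) and the spin-flip term vanishes, such that equation (21) becomes

$$\begin{aligned} \mathcal{M}=\frac{2}{N_\textrm{s}}\,\frac{1}{Z}\,\frac{1}{4}\sum \limits _{\mu \nu }T_{\mu \nu }\exp \{i\textbf{k}\cdot \textbf{r}_\mu -i\textbf{k}'\cdot \textbf{r}_\nu \}+\mathcal {O}(\epsilon ^2) =\frac{2}{N_\textrm{s}}\,\frac{1}{Z}\,\frac{1}{4}\,N_\textrm{s}\,\delta _{\textbf{k}\textbf{k}'}\,Z\,T_\textbf{k}+\mathcal {O}(\epsilon ^2) =\delta _{\textbf{k}\textbf{k}'}\,\frac{T_\textbf{k}}{2}+\mathcal {O}(\epsilon ^2)\,. \end{aligned}$$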

If we replace the mean-field ansatz (4) by the Ising type anti-ferromagnet (5), we find that the first-order matrix elements vanish \(\mathcal{M}=\mathcal {O}(\epsilon ^2)\). Again, this is consistent with the quasi-particle picture because the quasi-particle energies do not contain a linear contribution in this case, as discussed after equation (8).

4.3 Two-Holon State

Now let us consider initial \(|\Psi _\textrm{in}\rangle _0\) and final \(|\Psi _\textrm{out}\rangle _0\) states containing two holons, where we start with the case of opposite spins, as motivated by the Boltzmann equation (11). As usual in scattering theory, we envisage initial and final holon wave-packets which do not overlap but interact in an intermediate space-time region. Then, in straightforward generalization of the one-holon case, we use the following ansatz for their Fourier components

$$\begin{aligned} |\Psi _\textrm{in}\rangle _0=\mathcal{N}_{\textbf{k}_1\textbf{k}_2}^{\uparrow \downarrow }\hat{c}_{\textbf{k}_1\uparrow }\hat{c}_{\textbf{k}_2\downarrow }|\textrm{Mott}\rangle _0=\mathcal{N}_{\textbf{k}_1\textbf{k}_2}^{\uparrow \downarrow }\sum \limits _{\alpha \beta }\hat{c}_{\alpha \uparrow }\hat{c}_{\beta \downarrow }|\textrm{Mott}\rangle _0e^{i\textbf{k}_1\cdot \textbf{r}_\alpha +i\textbf{k}_2\cdot \textbf{r}_\beta }\,, \end{aligned}$$
(22)

and analogously for \(|\Psi _\textrm{out}\rangle _0\) with \(\textbf{k}_3\) and \(\textbf{k}_4\). The resulting matrix elements read

$$\begin{aligned} \mathcal{M}= & {} -\frac{\mathcal{N}_{\textbf{k}_1\textbf{k}_2}^{\uparrow \downarrow }(\mathcal{N}_{\textbf{k}_3\textbf{k}_4}^{\uparrow \downarrow })^*}{Z}\sum \limits _{\alpha \beta \gamma \delta \mu \nu s}e^{i\textbf{k}_1\cdot \textbf{r}_\alpha +i\textbf{k}_2\cdot \textbf{r}_\beta -i\textbf{k}_3\cdot \textbf{r}_\gamma -i\textbf{k}_4\cdot \textbf{r}_\delta }\nonumber \\{} & {} \times T_{\mu \nu }\langle \textrm{Mott}|\hat{c}_{\delta \downarrow }^\dagger \hat{c}_{\gamma \uparrow }^\dagger \hat{c}^\dagger _{\mu s}\hat{c}_{\nu s}\hat{c}_{\alpha \uparrow }\hat{c}_{\beta \downarrow }|\textrm{Mott}\rangle _0+\mathcal {O}(\epsilon ^2)\,. \end{aligned}$$
(23)

The expectation values in the second line are only non-zero if \(\alpha \), \(\beta \), and \(\nu \) are mutually different, and the same for \(\gamma \), \(\delta \), and \(\mu \). We only get non-vanishing contributions if the triple \(\{\alpha ,\beta ,\nu \}\) is a permutation of the triple \(\{\gamma ,\delta ,\mu \}\). In view of \(T_{\mu \nu }=0\) for \(\mu =\nu \), we are left with four permutations (and the sum over spin s).

Altogether, this yields expectation values of the number operators such as \(\langle \textrm{Mott}|\hat{n}_\alpha ^\uparrow \hat{n}_\mu ^\downarrow \hat{n}_\nu ^\downarrow |\textrm{Mott}\rangle _0\) and spin-flip terms of the form \(\langle \textrm{Mott}|\hat{n}_\alpha ^\uparrow \hat{S}_\mu ^+\hat{S}_\nu ^-|\textrm{Mott}\rangle _0\). For the mean-field ansatz (4), only the former contribute and the matrix element simplifies to

$$\begin{aligned} \mathcal{M}=\frac{T_{\textbf{k}_1}+T_{\textbf{k}_2}}{2}\, \delta _{\textbf{k}_1\textbf{k}_3}\delta _{\textbf{k}_2\textbf{k}_4} -\frac{T_{\textbf{k}_1}+T_{\textbf{k}_2}+T_{\textbf{k}_3}+T_{\textbf{k}_4}}{2}\,\delta _{\textbf{k}_1+\textbf{k}_2,\textbf{k}_3+\textbf{k}_4}\,. \end{aligned}$$
(24)

This result is consistent with the effective Hamiltonian (12) where the first term on the right-hand side of equation (24) corresponds to the free propagation of the two holons with their quasi-particle energies \(E_\textbf{k}\) while the second term describes their scattering with the effective interaction potential \(V^{\uparrow \downarrow }_{\textbf{k}\textbf{p}\textbf{q}}\).

The origin of this effective interaction potential is the fact that the sums are not independent of each other, e.g., the sum over \(\alpha =\gamma \) is not independent of the remaining sums over \(\mu =\beta \) and \(\nu =\delta \) because \(\alpha \), \(\beta \), and \(\nu \) must be mutually different to yield a non-zero expectation value (as explained above). As an intuitive picture, the presence of the \(\uparrow \)-holon at site \(\alpha \) may effectively inhibit the hopping of the \(\downarrow \)-holon from site \(\nu \) to \(\mu \) and thus changes its energy – which implies an effective interaction.

4.4 Two-Holon Triplet State

For comparison, let us consider the state of two holons with the same spin. In complete analogy to (22), we use the ansatz

$$\begin{aligned} |\Psi _\textrm{in}\rangle _0=\mathcal{N}_{\textbf{k}_1\textbf{k}_2}^{\uparrow \uparrow }\hat{c}_{\textbf{k}_1\uparrow }\hat{c}_{\textbf{k}_2\uparrow }|\textrm{Mott}\rangle _0\,. \end{aligned}$$
(25)

Following the same steps as in the previous subsection, including the insertion of the mean-field ansatz (4), we find that only the matrix elements corresponding to the free propagation survive

$$\begin{aligned} \mathcal{M}=\frac{T_{\textbf{k}_1}+T_{\textbf{k}_2}}{2}\left( \delta _{\textbf{k}_1\textbf{k}_3}\delta _{\textbf{k}_2\textbf{k}_4} -\delta _{\textbf{k}_1\textbf{k}_4}\delta _{\textbf{k}_2\textbf{k}_3}\right) \,. \end{aligned}$$
(26)

Again, this is consistent with the Boltzmann equation (11) which also did not contain scattering between holons of equal spin.

4.5 Spin Correlations

So far, our results were based on the zeroth-order mean-field ansatz (4) which neglects all correlations between the lattice sites. Including such correlations leads to corrections to these results. For the two-holon triplet state, for example, they generate an effective interaction \(V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \uparrow }\). Taking into account the correlations between two lattice sites \(\mu \) and \(\nu \) as encoded in \(\hat{\rho }_{\mu \nu }^\textrm{corr}=\mathcal {O}(1/Z)\), but neglecting all three-point correlators \(\hat{\rho }_{\mu \nu \lambda }^\textrm{corr}=\mathcal {O}(1/Z^2)\), we find

$$\begin{aligned} V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \uparrow }=\left( T_{\textbf{k}}+T_{\textbf{p}}+T_{\mathbf {k+q}}+T_{\mathbf {p-q}} \right) C_\textbf{q}^{\uparrow \uparrow }\,, \end{aligned}$$
(27)

where \(C_\textbf{q}^{\uparrow \uparrow }\) denotes the Fourier transform of the number correlations

$$\begin{aligned} \langle \textrm{Mott}|\hat{n}_\alpha ^\uparrow \hat{n}_\beta ^\uparrow |\textrm{Mott}\rangle _0^\textrm{corr} = \int \limits _\textbf{q}C_\textbf{q}^{\uparrow \uparrow }e^{i\textbf{q}\cdot (\textbf{r}_\alpha -\textbf{r}_\beta )}\,. \end{aligned}$$
(28)

The sign of the correlations depends on the spin order of the background state. For anti-ferromagnetic order, \(\langle \hat{n}_\alpha ^\uparrow \hat{n}_\beta ^\uparrow \rangle _0^\textrm{corr}\) is negative for nearest neighbors \(\alpha \) and \(\beta \) but positive for next-to-nearest neighbors – while for (locally) ferromagnetic order, it would also be positive for nearest neighbors.

In analogy, we may derive the correlation corrections to the interaction between two holons of opposite spin

$$\begin{aligned} V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }= -\frac{1}{2} \left( T_{\textbf{k}}+T_{\textbf{p}}+T_{\mathbf {k+q}}+T_{\mathbf {p-q}}\right) \left[ 1-4C_\textbf{q}^{\uparrow \downarrow }- 8C_\mathbf {p-k-q}^{\uparrow \downarrow }\right] \,, \end{aligned}$$
(29)

where we have used the SU(2)-symmetry [41] of the Mott state \(\langle \hat{S}^x_\mu \hat{S}^x_\nu \rangle _0= \langle \hat{S}^y_\mu \hat{S}^y_\nu \rangle _0= \langle \hat{S}^z_\mu \hat{S}^z_\nu \rangle _0\) in order to express \(\langle \hat{S}^-_\mu \hat{S}^+_\nu +\hat{S}^-_\nu \hat{S}^+_\mu \rangle _0 =2\langle \hat{S}^x_\mu \hat{S}^x_\nu +\hat{S}^y_\mu \hat{S}^y_\nu \rangle _0\) in terms of \(\langle \hat{S}^z_\mu \hat{S}^z_\nu \rangle _0\), i.e., the number correlations such as \(\langle \hat{n}_\mu ^\uparrow \hat{n}_\nu ^\uparrow \rangle _0^\textrm{corr}\) or \(\langle \hat{n}_\mu ^\uparrow \hat{n}_\nu ^\downarrow \rangle _0^\textrm{corr}= -\langle \hat{n}_\mu ^\uparrow \hat{n}_\nu ^\uparrow \rangle _0^\textrm{corr}\).
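To get a feeling for the signs in equation (29), the following minimal sketch evaluates it on a square lattice. It assumes the normalization \(T_\textbf{k}=T(\cos k_x+\cos k_y)/2\) (the text only fixes the functional form) and, optionally, a nearest-neighbor correlation function of the form \(C_\textbf{q}^{\uparrow \downarrow }=T_\textbf{q}/(16T)\). The function names are hypothetical and chosen only for illustration.

```python
from math import cos, pi

def hopping_ft(k, T=1.0):
    """T_k on a square lattice; the prefactor T/2 is an assumed normalization."""
    kx, ky = k
    return 0.5 * T * (cos(kx) + cos(ky))

def nn_correlation(q, T=1.0):
    """Assumed nearest-neighbor form C_q = T_q / (16 T)."""
    return hopping_ft(q, T) / (16.0 * T)

def v_up_down(k, p, q, T=1.0, corr=None):
    """Effective interaction of equation (29); corr=None drops the correlations."""
    k_plus_q = (k[0] + q[0], k[1] + q[1])
    p_minus_q = (p[0] - q[0], p[1] - q[1])
    exchange = (p[0] - k[0] - q[0], p[1] - k[1] - q[1])
    kinetic = (hopping_ft(k, T) + hopping_ft(p, T)
               + hopping_ft(k_plus_q, T) + hopping_ft(p_minus_q, T))
    bracket = 1.0
    if corr is not None:
        bracket -= 4.0 * corr(q, T) + 8.0 * corr(exchange, T)
    return -0.5 * kinetic * bracket

corner = (pi, pi)    # minimum of T_k (low-energy holons)
centre = (0.0, 0.0)  # maximum of T_k (high-energy holons)
zero = (0.0, 0.0)    # no momentum transfer

print(v_up_down(corner, corner, zero))                       # positive: repulsive
print(v_up_down(centre, centre, zero))                       # negative: attractive
print(v_up_down(corner, corner, zero, corr=nn_correlation))  # repulsion reduced
```

Under these assumptions, two low-energy holons repel each other on the mean-field level, and the correlation terms reduce this repulsion for the momentum configuration chosen here.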

5 Hubbard Tetramer

Let us exemplify the above results for an analytically solvable example, the Hubbard tetramer consisting of four lattice sites in the form of a square (\(Z=2\)) [46, 49]. Already in this simple case, the total Hilbert space has \(4^4=256\) dimensions and thus the Hamiltonian (1) can be represented by a \(256\times 256\)-matrix. However, by using the conserved quantities such as the particle numbers \(N^\uparrow \) and \(N^\downarrow \), the total spin (2) and pseudo-spin (see Appendix C), as well as the spatial symmetries, one may cast this Hamiltonian into a block-diagonal form consisting of matrices with maximum rank four – admitting analytic solutions.

In contrast to the previous sections, which were devoted to the case of half filling (only marginally disturbed by one or two holons), we shall now also consider filling factors of 3/8 (one holon) and 1/4 (two holons).

5.1 Mott State

In the sector \(N^\uparrow =N^\downarrow =2\), the ground state has an energy of \(-3T^2/U+\mathcal {O}(\epsilon ^3)\) and vanishing total spin \(S=0\) as well as pseudo-spin \(\eta =0\), see also [41]. We identify this state with the Mott state \(|\textrm{Mott}\rangle \), which then displays the lowest-order structure

$$\begin{aligned} |\textrm{Mott}\rangle _0= & {} \frac{|\uparrow ,\downarrow ,\uparrow ,\downarrow \rangle +|\downarrow ,\uparrow ,\downarrow ,\uparrow \rangle }{\sqrt{3}}\nonumber \\{} & {} -\frac{|\uparrow ,\uparrow ,\downarrow ,\downarrow \rangle +|\downarrow ,\uparrow ,\uparrow ,\downarrow \rangle +|\downarrow ,\downarrow ,\uparrow ,\uparrow \rangle +|\uparrow ,\downarrow ,\downarrow ,\uparrow \rangle }{\sqrt{12}}\,. \end{aligned}$$
(30)

The first-order hopping corrections can be obtained from equation (17). One should be careful with the above representation because the sign of the basis vectors such as \(|\uparrow ,\downarrow ,\uparrow ,\downarrow \rangle = \hat{c}_{4\downarrow }^\dagger \hat{c}_{3\uparrow }^\dagger \hat{c}_{2\downarrow }^\dagger \hat{c}_{1\uparrow }^\dagger |0\rangle \) depends on the chosen order of the fermionic operators. There is also another state in this singlet sector with \(S=\eta =0\) and \(N^\uparrow =N^\downarrow =2\), which has a slightly higher energy of \(-T^2/U+\mathcal {O}(\epsilon ^3)\).
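As a consistency check of equation (30), one may assume the standard second-order effective spin Hamiltonian \(J\sum _{\langle \mu \nu \rangle }(\hat{\textbf{S}}_\mu \cdot \hat{\textbf{S}}_\nu -1/4)\) on the four-site ring, with \(J=T^2/U\) for a hopping amplitude of \(T/Z=T/2\) per bond. This effective model is an assumption of the sketch and is not derived in this section. The spin operators are bilinear in the fermionic operators, so the ordering-dependent signs mentioned above do not enter here. The helper name below is hypothetical.

```python
from math import sqrt

def apply_heisenberg_ring(state, J=1.0):
    """Apply H = J * sum_bonds (S_i.S_j - 1/4) on a 4-site ring.

    `state` maps configuration strings such as "udud" to amplitudes.
    """
    n = 4
    out = {}
    for config, amp in state.items():
        for i in range(n):
            j = (i + 1) % n
            if config[i] == config[j]:
                # parallel spins: S^z S^z = +1/4 cancels the -1/4 shift
                continue
            # antiparallel spins: diagonal part S^z S^z - 1/4 = -1/2
            out[config] = out.get(config, 0.0) - 0.5 * J * amp
            # spin-flip part (S^+ S^- + S^- S^+)/2 exchanges the two spins
            flipped = list(config)
            flipped[i], flipped[j] = flipped[j], flipped[i]
            key = "".join(flipped)
            out[key] = out.get(key, 0.0) + 0.5 * J * amp
    return out

NEEL = ("udud", "dudu")
OTHERS = ("uudd", "duud", "dduu", "uddu")

# lowest-order Mott state of equation (30) in the spin basis
mott0 = {c: 1.0 / sqrt(3.0) for c in NEEL}
mott0.update({c: -1.0 / sqrt(12.0) for c in OTHERS})

image = apply_heisenberg_ring(mott0, J=1.0)
# every ratio equals the eigenvalue in units of J
ratios = {c: image.get(c, 0.0) / mott0[c] for c in mott0}
print(ratios)
```

Within this effective model, all ratios come out as \(-3\), which is consistent with the ground-state energy \(-3T^2/U\) quoted above.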

5.2 One-Holon State

The one-holon states – as single quasi-particle excitations around the Mott state – are then identified with the eigenstates in the doublet sector with \(S=1/2\) and \(N^\uparrow =2\) and \(N^\downarrow =1\) (or \(N^\uparrow =1\) and \(N^\downarrow =2\)). They have eigen-energies of \(\pm T/2+\mathcal {O}(\epsilon ^2)\) and \(\pm \sqrt{3}T/2+\mathcal {O}(\epsilon ^2)\). In the following, we shall omit the symbols \(\mathcal {O}(\epsilon ^2)\) for brevity and just state the energies to first order in T.

5.3 Two-Holon State

To study the two-holon states, let us first consider the case \(N^\uparrow =N^\downarrow =1\). These two-holon states lie in the singlet (\(S=0\)) or triplet (\(S=1\)) sector and have eigen-energies \(\pm \sqrt{2}T\), \(\pm T\), and zero. The fact that all eigen-energies obey the reflection symmetry \(T\rightarrow -T\) is a consequence of the staggered gauge transformation mentioned in the Introduction.

Already on the level of the eigen-energies, we find that not all two-holon energies can be written as a sum of two one-holon energies – which can be interpreted as a signature of their interactions. For example, adding one \(\uparrow \) holon with energy \(-\sqrt{3}T/2\) to another \(\downarrow \) holon with the same energy \(-\sqrt{3}T/2\), one would expect a total energy of \(-\sqrt{3}T\) in the non-interacting case. However, such an energy is not contained in the spectrum. Instead, the lowest two-holon energy is \(-\sqrt{2}T\). As an intuitive picture, the presence of the \(\downarrow \) holon reduces the options for the \(\uparrow \) holon to lower the energy via tunneling and vice versa. As a result, these two quasi-particles effectively repel each other in this case.
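The comparison of one-holon and two-holon energies can be reproduced numerically. The following sketch assumes that, to first order in T, it suffices to diagonalize the hopping term of the Hamiltonian (1) projected onto states without doubly occupied sites (degenerate first-order perturbation theory). The Jordan-Wigner construction and all function names are choices made here for illustration.

```python
import numpy as np
from functools import reduce

N_SITES = 4            # square plaquette, sites 0..3 on a ring
Z = 2                  # coordination number of the ring
N_MODES = 2 * N_SITES  # mode index m = 2*site + spin, spin 0 = up, 1 = down

def annihilator(m):
    """Jordan-Wigner matrix of the annihilation operator for mode m."""
    a = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps occupied |1> to empty |0>
    z = np.diag([1.0, -1.0])                 # parity string
    factors = [z] * m + [a] + [np.eye(2)] * (N_MODES - m - 1)
    return reduce(np.kron, factors)

def hopping_hamiltonian(T=1.0):
    """Hopping part of Hamiltonian (1): amplitude T/Z on every ring bond."""
    c = [annihilator(m) for m in range(N_MODES)]
    H = np.zeros((2 ** N_MODES, 2 ** N_MODES))
    for site in range(N_SITES):
        nxt = (site + 1) % N_SITES
        for spin in (0, 1):
            hop = c[2 * site + spin].T @ c[2 * nxt + spin]
            H -= (T / Z) * (hop + hop.T)
    return H

def sector_indices(n_up, n_down):
    """Basis states with given particle numbers and no double occupancy."""
    idx = []
    for b in range(2 ** N_MODES):
        occ = [(b >> (N_MODES - 1 - m)) & 1 for m in range(N_MODES)]
        up, down = occ[0::2], occ[1::2]
        no_double = all(u + d <= 1 for u, d in zip(up, down))
        if no_double and sum(up) == n_up and sum(down) == n_down:
            idx.append(b)
    return idx

def projected_energies(n_up, n_down, T=1.0):
    """Eigenvalues of the projected hopping term, sorted in ascending order."""
    H = hopping_hamiltonian(T)
    idx = sector_indices(n_up, n_down)
    return np.linalg.eigvalsh(H[np.ix_(idx, idx)])

for sector in [(2, 1), (1, 1), (2, 0)]:
    print(sector, np.round(projected_energies(*sector), 4))
```

In units of T, the sector \(N^\uparrow =N^\downarrow =1\) should show the levels \(\pm \sqrt{2}\), \(\pm 1\), and zero, so that the lowest two-holon energy lies above twice the lowest doublet one-holon energy \(-\sqrt{3}/2\). Note that the sector \(N^\uparrow =2\), \(N^\downarrow =1\) is not resolved by total spin in this sketch, so its spectrum may contain levels beyond the doublet energies quoted above.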

In addition to the eigen-energies, we may also consider the eigen-states. To this end, let us introduce the operators

$$\begin{aligned} \hat{c}_{s\pm }=\frac{1}{\sqrt{2}} \sum _{\mu =1}^4 (\pm 1)^\mu \hat{c}_{\mu s}\,. \end{aligned}$$
(31)

Acting on the lowest-order Mott state (30), these operators generate the lowest-order one-holon states with energies \(\mp T/2\). However, if we generate a two-holon state by applying two of these operators, \(\hat{c}_{\uparrow +}\hat{c}_{\downarrow +}|\text {Mott}\rangle _0\), we obtain an eigen-state with zero energy. This again supports the interpretation that one holon disturbs the hopping options for the other holon such that they repel each other. Note that the same zero-energy state can be obtained via \(\hat{c}_{\uparrow -}\hat{c}_{\downarrow -}|\text {Mott}\rangle _0\) which is consistent with the staggered gauge transformation mentioned in the Introduction and would then lead to the interpretation that these two holons attract each other. Another zero-energy eigenstate can be obtained by \(\hat{c}_{\uparrow +}\hat{c}_{\downarrow -}|\text {Mott}\rangle _0\) which would be consistent with the non-interacting case.

5.4 Two-Holon Triplet State

To complete the picture, let us discuss the two-holon states for the case \(N^\uparrow =2\) and \(N^\downarrow =0\). Obviously, they are in the triplet sector \(S=1\) and the repulsion U does not play any role in this case. Thus the eigen-energies are the same as in the non-interacting case, i.e., \(\pm T\) and zero. If we try to write these two-holon eigen-energies as the sum of two one-holon eigen-energies, we see that this works for some of the one-holon states (with energies \(\pm T/2\)), but not for the others (with energies \(\pm \sqrt{3}T/2\)), which can again be interpreted as a signature of their interactions. Even though the repulsion U does not play any role for the two-holon states with \(N^\uparrow =2\) and \(N^\downarrow =0\), it is important for the one-holon states.

As an example of such states, we can obtain a two-holon eigen-state via \(\hat{c}_{\downarrow +}\hat{c}_{\downarrow -}|\textrm{Mott}\rangle _0\) which has (exactly) zero eigen-energy. Consistent with the above considerations, this would correspond to a case where the two holons do not interact.

6 Bardeen-Cooper-Schrieffer (BCS) Theory

Having obtained repulsive as well as attractive contributions to the interaction between the quasi-particles such as holons, let us now investigate possible implications for Bardeen-Cooper-Schrieffer (BCS) like pairing, which might be relevant for our understanding of high-temperature superconductivity [50,51,52]. To this end, we assume a small but finite density of holons – corresponding to a filling factor slightly below half filling.

As one possible approach, one could start from an effective Hamiltonian such as in equation (12) and then follow a procedure closely analogous to the standard BCS theory of superconductivity [53], see Section 6.2 below. However, as already explained in Section 3, one might object that the effective quasi-particle operators do not obey the standard commutation relations.

6.1 Variational Ansatz

Thus, we shall first pursue a more conservative approach and employ a variational ansatz in order to see whether and when BCS like pairing could lead to a reduction of the energy. To this end, we use the following ansatz for the zeroth-order BCS state

$$\begin{aligned} |\text {BCS}\rangle _0 = \mathcal{{N}}{\exp }\left\{ \sum \limits _{\mu \nu }\xi _{\mu \nu } {\hat{c}}_{\mu \uparrow }\hat{c}_{\nu \downarrow } \right\} |\textrm{Mott}\rangle _0\,, \end{aligned}$$
(32)

with the pairing (squeezing) amplitudes \(\xi _{\mu \nu }\) and a normalization \(\mathcal N\) which is required because the above exponential is not unitary. At first glance, this ansatz may appear a bit unusual, but using translational invariance of the \(\xi _{\mu \nu }\), we may cast it into a more familiar form

$$\begin{aligned} |\text{ BCS }\rangle _0 = \mathcal{{N}}{\exp }\left\{ \int \limits _\textbf{k}\xi _\textbf{k} {\hat{c}}_{{\textbf {k}}\uparrow }{\hat{c}}_{-{\textbf {k}}\downarrow } \right\} |\text{ Mott }\rangle _0 = \prod \limits _{{\textbf {k}}} \left( u_{{\textbf {k}}}+v_{{\textbf {k}}}{\hat{c}}_{{\textbf {k}}\uparrow }{\hat{c}}_{-{\textbf {k}}\downarrow } \right) |\text{ Mott }\rangle _0\,, \end{aligned}$$
(33)

where we have used the fact that the exponential factorizes and its Taylor expansion terminates after the first order due to the Pauli principle.
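To make this step explicit (a short worked sketch which treats the momentum integral as a sum over discrete modes, an assumption made here only for illustration): the pair operators for different \(\textbf{k}\) commute with each other, and each of them squares to zero, \((\hat{c}_{\textbf{k}\uparrow }\hat{c}_{-\textbf{k}\downarrow })^2=0\), such that

$$\begin{aligned} \exp \left\{ \xi _\textbf{k}\hat{c}_{\textbf{k}\uparrow }\hat{c}_{-\textbf{k}\downarrow }\right\} =1+\xi _\textbf{k}\hat{c}_{\textbf{k}\uparrow }\hat{c}_{-\textbf{k}\downarrow }\,. \end{aligned}$$

Comparison with the product form in equation (33) then suggests the identification \(v_\textbf{k}/u_\textbf{k}=\xi _\textbf{k}\) for each mode.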

Unfortunately, the Mott state does not factorize in the \(\textbf{k}\) basis, rendering the calculation of expectation values difficult. Thus, we use a Taylor expansion for small \(\xi _\textbf{k}\) in order to study in which direction the energy could be reduced. In addition, we employ strong-coupling perturbation theory as in Section 4 which yields expectation values that we have already calculated there

$$\begin{aligned} \langle {\hat{H}}\rangle= & {} \langle \text{ BCS }|\hat{H}_1|\text{ BCS }\rangle _0 +\mathcal {O}(\epsilon ^2)\nonumber \\= & {} 2\int \limits _{{\textbf {k}}}|\xi _{{\textbf {k}}}|^2E_{{\textbf {k}}}^- -\int \limits _{{\textbf {k,p}}}\xi _{{\textbf {k}}}\xi _{{\textbf {p}}}^* \left( T_{\textbf{k}}+T_{\textbf{p}}\right) \left[ 1-12C_\mathbf {k+p}^{\uparrow \downarrow }\right] +\mathcal {O}(\epsilon ^2) +\mathcal {O}(|\xi _\textbf{k}|^4)\,. \end{aligned}$$
(34)

In the first term, \(|\xi _{{\textbf {k}}}|^2\) just gives the number of holon pairs with the holon eigen-energies \(E_{{\textbf {k}}}^-\) which are, up to small correlation induced corrections, given by \(T_\textbf{k}/2\). The second term corresponds to their interaction.

In order to avoid disturbing the Mott background too much, we consider a small number of holons, which is consistent with the assumption of small \(|\xi _\textbf{k}|^2\). Then, only states close to the minimum energies \(E_\textbf{k}^-\approx T_\textbf{k}/2\) should be occupied by holons. Assuming a square lattice where \(T_\textbf{k}\) behaves as \(\cos k_x+\cos k_y\), states around \((k_x,k_y)=(\pi ,\pi )\) are filled up first in order to minimize the energy. For these states, the lowest-order interaction term \((T_{\textbf{k}}+T_{\textbf{p}})\) is repulsive, such that the usual s-wave pairing mechanism would not lead to a reduced energy.

However, for d-wave order parameters \(\xi _\textbf{k}\), which behave as \(\cos k_x-\cos k_y\), both \(\int _\textbf{k}\xi _\textbf{k}\) and \(\int _\textbf{k}\xi _\textbf{k}T_\textbf{k}\) vanish due to the angular average and thus the lowest-order repulsion term \((T_{\textbf{k}}+T_{\textbf{p}})\) cancels. The remaining correlations \(C_\mathbf {k+p}^{\uparrow \downarrow }\) can then indeed favor d-wave pairing of holons since they correspond to an effectively attractive contribution. In order to see how such a d-wave pairing could lower the energy, let us Taylor expand \(C_\textbf{q}^{\uparrow \downarrow }\) for small momenta

$$\begin{aligned} C_\textbf{q}^{\uparrow \downarrow } =c_0+c_2{{\textbf {q}}}^2+c_4(q_x^4+q_y^4) +\tilde{c}_4q_x^2q_y^2 +\dots \end{aligned}$$
(35)

After insertion into the variational energy (34), the constant \(c_0\) and quadratic \(c_2\) contributions vanish after their convolution with the d-wave order parameters \(\xi _\textbf{k}\) and \(\xi _\textbf{p}^*\), but the quartic term \(c_4\) does indeed generate a reduction of the energy \(\langle \hat{H}\rangle \) provided that it is positive, \(c_4>0\). This condition is satisfied for anti-ferromagnetic nearest-neighbor correlations, for which \(C_\textbf{q}^{\uparrow \downarrow }\) behaves as \(\cos q_x+\cos q_y\) such that \(c_4>0\) and \(\tilde{c}_4=0\). On the other hand, a non-zero \(\tilde{c}_4\) could also support tilted d-wave pairing where \(\xi _\textbf{k}\) behaves as \(\sin k_x \sin k_y\).
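As a quick check of the nearest-neighbor case, assume \(C_\textbf{q}^{\uparrow \downarrow }=c\,(\cos q_x+\cos q_y)\) with some positive prefactor c (a symbol introduced here only for illustration). Expanding the cosines gives

$$\begin{aligned} C_\textbf{q}^{\uparrow \downarrow } =c\left[ 2-\frac{{{\textbf {q}}}^2}{2}+\frac{q_x^4+q_y^4}{24}+\mathcal {O}(q^6)\right] \,, \end{aligned}$$

and comparison with equation (35) yields \(c_0=2c\), \(c_2=-c/2\), \(c_4=c/24>0\), and \(\tilde{c}_4=0\).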

In summary, starting with the Mott state and adding a small amount of holon pairs suggests an instability towards d-wave pairing (but not s-wave pairing). This is generated by the effectively attractive contribution to the interaction between holons stemming from the correlation \(C_\textbf{q}^{\uparrow \downarrow }\).

6.2 Effective Hamiltonian

Of course, it would be desirable to go beyond lowest order in \(\xi _\textbf{k}\) and to include the chemical potential \(\mu \) etc. Note that \(\mu \) is now meant to describe the effective chemical potential associated with the finite density of holons, not the chemical potential \(\mu =U/2\) in the grand-canonical Hamiltonian for the original fermions as discussed after equation (2). Ignoring the problems associated with the effective Hamiltonian (12) for a moment, let us treat the holons as fundamental particles as described by the creation and annihilation operators \(\hat{a}_{\textbf{k},s}^\dagger \) and \(\hat{a}_{\textbf{k},s}\) which obey the usual fermionic anti-commutation relations. Then we may start from the effective Hamiltonian (12) together with the effective interaction (29) and perform the same steps as in standard BCS theory, including the derivation of a gap equation. To this end, we make an ansatz for the BCS state which is of the standard form [53]

$$\begin{aligned} |\textrm{BCS}\rangle = \prod _\textbf{k}\left( u_\textbf{k}+v_\textbf{k} \hat{a}^\dagger _{-\textbf{k},\uparrow }\hat{a}_{\textbf{k},\downarrow }^\dagger \right) |0\rangle \,. \end{aligned}$$
(36)

Here \(|0\rangle \) denotes the vacuum state \(\hat{a}_{\textbf{k},s}|0\rangle =0\) and the variational coefficients fulfil \(|u_\textbf{k}|^2+|v_\textbf{k}|^2=1\) which guarantees the normalization of the BCS state. The minimization of the energy \(\langle \textrm{BCS}|\hat{H}_\textrm{eff}|\textrm{BCS}\rangle \) leads to the self-consistency equation for the pairing amplitude

$$\begin{aligned} \triangle _\textbf{p}= -\int \limits _\textbf{k}V^{\uparrow \downarrow }_{\mathbf {k,-k,p-k}} \langle \textrm{BCS}|\hat{a}_{\textbf{k},\downarrow } \hat{a}_{-\textbf{k},\uparrow }|\textrm{BCS}\rangle =-\int \limits _\textbf{k}V^{\uparrow \downarrow }_\mathbf {k,-k,p-k} \frac{ \triangle _\textbf{k}}{2\sqrt{(E_\textbf{k}-\mu )^2+\triangle _\textbf{k}^2}}\,. \end{aligned}$$
(37)

As usual, non-trivial solutions \(\triangle _\textbf{p}\) are obtained if the interaction \(V^{\uparrow \downarrow }_{\mathbf {k,-k,p-k}}\) contains attractive contributions, which are the correlation terms \(C_\textbf{k}^{\uparrow \downarrow }\) in equation (29). In order to estimate these correlations, we exploit the SU(2)-symmetry of the Mott state and the Lieb theorem [41] which states that \(\hat{\textbf{S}}^2|\textrm{Mott}\rangle =0\). Then, neglecting correlations beyond neighboring sites implies \(C_\textbf{k}^{\uparrow \downarrow }\approx T_\textbf{k}/(16T)\) in two dimensions.

Again assuming a square lattice where \(T_\textbf{k}\) behaves as \(\cos k_x+\cos k_y\), holon states around the minimum at \((k_x,k_y)=(\pi ,\pi )\) are filled up first. Shifting the origin to the minimum \(k_{x,y}\rightarrow k_{x,y}+\pi \), the energies \(E_\textbf{k}\) scale quadratically for small \(\textbf{k}\). Then we may seek d-wave solutions of the gap equation (37)

$$\begin{aligned} \triangle _\textbf{k}=\triangle ^\textrm{d}\left( \cos k_x-\cos k_y\right) \,, \end{aligned}$$
(38)

which also scale quadratically \(k^2_y-k^2_x\) for small \(\textbf{k}\). As usual, pairing is expected to be most pronounced in the vicinity of the Fermi momentum \(k_\textrm{F}\) such that we restrict the integral (37) to the interval \(|k-k_\textrm{F}|<k_\textrm{cut}\) with some cut-off \(k_\textrm{cut}\le \mathcal {O}(k_\textrm{F})\). Linearizing the energies \(E_\textbf{k}\) in this interval, we find

$$\begin{aligned} \triangle ^{\text {d}} =\mathcal {O}(T)\exp \left\{ -\frac{32\pi }{3k_{\text {F}}^4}\right\} \,. \end{aligned}$$
(39)

Since we have re-scaled all length scales with respect to the lattice spacing \(\ell \), the above Fermi momentum \(k_\textrm{F}\) is dimensionless. Restoring physical units would correspond to the replacement \(k_\textrm{F}\rightarrow \ell k_\textrm{F}\).

As in the usual BCS theory, we obtain an exponential suppression of the gap, but now the exponent is not inversely proportional to the coupling strength (because the kinetic and the interaction energy both scale linearly in T) but to the fourth power of the Fermi momentum, i.e., the holon number density squared. Two powers of \(k_{\text {F}}\) stem from the volume element, the other two from the quadratic scaling of the d-wave order parameter. The strong exponential suppression for small \(k_{\text {F}}\) might indicate that a certain holon number density is required to observe d-wave pairing [54,55,56,57,58], but further investigations are needed to settle this issue.
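To illustrate how strongly the exponent in equation (39) depends on the holon density, the following minimal sketch evaluates the exponential factor for a few values of the dimensionless Fermi momentum. The function name and the sample values of \(k_\textrm{F}\) are hypothetical, and the unknown prefactor of order T is left out.

```python
from math import exp, pi

def dwave_suppression(k_f):
    """Exponential factor exp(-32*pi / (3 k_F^4)) of equation (39).

    k_f is the dimensionless Fermi momentum (in units of the inverse
    lattice spacing); the O(T) prefactor of the gap is not included.
    """
    if k_f <= 0:
        raise ValueError("k_f must be positive")
    return exp(-32.0 * pi / (3.0 * k_f ** 4))

for k_f in (0.5, 1.0, 1.5, 2.0):
    print(f"k_F = {k_f:.1f}: suppression factor = {dwave_suppression(k_f):.3e}")
```

The factor grows very rapidly with \(k_\textrm{F}\), in line with the expectation that a certain holon number density is required for the gap to become appreciable.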

Nonetheless, comparing our findings with the well-known phase diagram of cuprates, for example, we find qualitative consistency as superconductivity is usually associated with a region of finite holon doping at low temperatures. Of course, the range of applicability of the simple single-band Fermi-Hubbard model (1) must be taken into account in this regard. This becomes even more important for the opposite case of electron doping. The particle-hole duality discussed in the Introduction implies that BCS states for doublons should exist in the same way as for holons. However, the asymmetry of the phase diagram of cuprates with respect to electron doping versus hole doping already shows that these systems do not display this particle-hole duality (as is also well known) and thus require a description beyond the single-band Fermi-Hubbard model (1).

7 Conclusions

Via a combination of approaches, we studied the interaction between doublons or holons as quasi-particle excitations (i.e., charge modes) of the Mott insulator state in the strongly interacting Fermi-Hubbard model. Using the hierarchy of correlations and the simple mean-field ansatz (4), we derived a Boltzmann equation (11) with a scattering cross section which is quadratic in the hopping strength T for doublons or holons of opposite spin (and zero otherwise).

This motivates an effective interaction \(V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }\) whose strength is linear in T and which can be represented by an effective Hamiltonian of the form (12). Note that this effective Hamiltonian should be treated with special care: First, the doublon and holon quasi-particle operators \(\hat{d}_{\textbf{k}s}\) and \(\hat{h}_{\textbf{k}s}\) do not satisfy the standard commutation relations. Second, the Boltzmann equation contains only the absolute value squared of the interaction strength \(|V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }|^2\) and thus does not determine its sign (attractive or repulsive) uniquely.

Although one might use continuity arguments to demonstrate that the effective interaction \(V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }\) can be attractive as well as repulsive (depending on the momenta), we employed strong-coupling perturbation theory to infer \(V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }\) including its sign. Inserting the simple mean-field ansatz (4), which neglects the correlations between lattice sites, we indeed recover the interaction \(V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }\) in the effective Hamiltonian (12) and thus the Boltzmann equation (11) to lowest order in T/U.

These calculations motivate the following intuitive picture: In the Mott insulator state, hopping is suppressed due to the Mott gap, such that the tunneling probabilities scale with \(T^2/U^2\). Inserting a holon, however, the system can lower its energy by tunneling – which gives rise to the single-holon quasi-particle energies of order T. Two holons far away from each other lower the energy according to the sum of their quasi-particle energies. However, if they come too close, the presence of one holon can influence (suppress) the tunneling of the other holon and vice versa, such that the energy reduction changes – giving rise to an effective interaction. Obviously, starting from the Mott state (containing one particle per site), two holons cannot occupy the same lattice site.

Consistent with this picture, two holons with momenta \(\textbf{k}\) and \(\textbf{p}\) which both lower the energy separately \(T_\textbf{k}<0\) and \(T_\textbf{p}<0\) would repel each other while two holons which increase the energy separately \(T_\textbf{k}>0\) and \(T_\textbf{p}>0\) would attract each other. The above line of argument specifically applies to holons, but the particle-hole duality mentioned in the Introduction implies the analogous behavior for doublons after the substitution \(T\rightarrow -T\). (The interaction between a doublon and a holon is discussed in Appendix B.) As a result, if two holons with momenta \(\textbf{k}\) and \(\textbf{p}\) attract each other, two doublons with the same momenta would repel each other and vice versa.

For the simple example of the Fermi-Hubbard model on a square (Hubbard tetramer) admitting an analytic solution, we could confirm the above picture – at least qualitatively. While the energy of the Mott state (30) scales quadratically \(\mathcal {O}(T^2/U)\), the eigen-energies of the states corresponding to one and two holons are linear in T to lowest order. Furthermore, the lowest (highest) two-holon eigen-energies cannot be written as a sum of two one-holon eigen-energies, indicating an effective repulsion (attraction).

Going beyond the simple mean-field ansatz (4) and taking spin correlations between the lattice sites into account, we obtain corrections to the quasi-particle energies \(E_{{\textbf {k}}}\) as well as to their effective interaction. For example, these correlations also lead to an interaction (27) between holons of the same spin. Furthermore, for two holons of opposite spin, the effective interaction \(V_{\textbf{k}\textbf{p}\textbf{q}}^{\uparrow \downarrow }\), which is repulsive for low-energy holons, also acquires attractive corrections (29) due to the spin correlations. As an intuitive picture, the fact that two holons cannot occupy the same lattice site leads to an effective on-site repulsion whereas the spin correlations can induce a finite range attraction: If a holon with spin \(\uparrow \) occupies the lattice site \(\mu \), there must have been an electron with that spin \(\uparrow \) in the Mott state at that lattice site \(\mu \). Then, in the presence of (even short-ranged) anti-ferromagnetic order, the probability for having an electron with the other spin \(\downarrow \) in the Mott state at a neighboring lattice site \(\nu \) is larger than average. Thus, this neighboring lattice site \(\nu \) can support a holon with the other spin \(\downarrow \). In addition to the on-site repulsion explained above, one can visualize this as a nearest-neighbor attraction.

Note that the observed scale separation between the fast frequency scale T of the propagation and interaction of the doublons and holons on the one hand and the slow frequency scale \(T^2/U\) of the spin fluctuations on the other hand allows us to approximately treat the (fast) evolution of the doublons and holons as taking place on a background with a fixed spin structure.

Finally, we discussed the implications of our results for high-temperature superconductivity. Using a BCS-like variational ansatz, we found that the usual s-wave pairing would not lower the energy (due to the effective on-site repulsion) but d-wave pairing could actually reduce the energy as a result of the nearest-neighbor attraction. Within the effective Hamiltonian approach, we deduced a gap equation. Its solution for the d-wave gap displays the usual non-perturbative structure, but in terms of the Fermi momentum of the holons instead of a coupling strength.

Of course, the effective interaction between doublons and/or holons has already been discussed in many publications, see, e.g., [39] and references therein. By now it is commonly expected that the spin degrees of freedom play an important role in that respect. The major points specific to the present work are: First, the derivation of the Boltzmann equation (based on the 1/Z expansion) displaying scattering cross sections which scale quadratically in T and thus point to an effective interaction linear in T. Second, the derivation of this effective interaction (based on the 1/U expansion) which is indeed linear in T and contains attractive as well as repulsive contributions. Third, the resulting gap equation whose solution is also exponentially suppressed, but the exponent merely contains the holon density.

8 Outlook

There are many ways to generalize our results. As one example, we focused on the leading order (in 1/Z or T/U). Including higher orders would lead to modifications in several places. For instance, the lowest-order mean-field ansatz (5) could be modified by including small probabilities for an empty or doubly occupied lattice site or that this lattice site is occupied by the “wrong” spin. In this way, the back-reaction of the quantum or thermal fluctuations onto the mean field can be taken into account. This, in turn, would change the lowest-order quasi-particle energies a bit, which corresponds to a renormalization of the involved quantities, quite analogous to the case of weakly interacting systems, see, e.g., [59,60,61,62]. In a similar manner, one could include higher orders in T/U in Sec. 4.

As a somewhat related point, we considered the zero-temperature limit here. Finite temperatures can also be taken into account in the approach based on the hierarchy of correlations (as it deals with density matrices), for example via the double-time correlator, see, e.g., [63]. The expected impact of finite temperatures can be discussed in terms of general arguments. If the temperature is well below the typical spin energy of order \(T^2/U\), one would expect that our results are basically unaffected. Once the temperature is above this energy scale \(T^2/U\), it is expected to wash out the anti-ferromagnetic correlations and thus the finite-range attraction (responsible for d-wave pairing) is suppressed while the on-site repulsion remains. The next characteristic scale is reached when the temperature approaches the hopping rate T leading to thermal broadening of the holon distribution functions. Finally, once the temperature reaches or even exceeds the Mott gap of order U, thermal excitations in the form of real doublon-holon pairs change the background (4) considerably such that the insulating behavior of the Mott phase disappears.

In this context, one should also remember that we took the magnetic order of the Mott background as given, i.e., fixed. As explained above, the rationale behind that is the separation of scales between the scale T of propagation and interaction of the holons and the characteristic scale \(T^2/U\) of the spin fluctuations. However, a complete picture would also require a more detailed treatment of the spin fluctuations and the origin of the magnetic order. For example, even though T is much larger than \(T^2/U\) in the strong-coupling limit considered here, the superconducting gap (39) scales as \(T\exp \{-32\pi /(3k_\textrm{F}^4)\}\) and thus it could be smaller than \(T^2/U\) for a very low density of holons. In this case, the spin fluctuations might even destroy superconductivity.
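To make this comparison of scales concrete, the following minimal sketch evaluates the quoted scaling \(T\exp \{-32\pi /(3k_\textrm{F}^4)\}\) against \(T^2/U\) and solves for the Fermi momentum at which the two coincide. The function names are hypothetical, \(k_\textrm{F}\) is taken in units of the inverse lattice spacing, and all prefactors of order unity are set to one, so the numbers indicate orders of magnitude only.

```python
import math

GAP_EXPONENT = 32.0 * math.pi / 3.0  # coefficient in exp{-32 pi / (3 k_F^4)}


def gap_scale(k_F, T=1.0):
    """Scaling of the superconducting gap, T * exp(-32 pi / (3 k_F^4))."""
    if k_F <= 0:
        raise ValueError("k_F must be positive")
    return T * math.exp(-GAP_EXPONENT / k_F**4)


def spin_scale(T, U):
    """Characteristic energy scale T^2/U of the spin fluctuations."""
    return T**2 / U


def critical_k_F(T, U):
    """Fermi momentum at which gap_scale equals spin_scale.

    Setting exp(-a / k^4) = T/U with a = 32 pi / 3 gives
    k^4 = a / ln(U/T); below this value the gap is the smaller scale.
    """
    if not (U > T > 0):
        raise ValueError("strong-coupling limit requires U > T > 0")
    return (GAP_EXPONENT / math.log(U / T)) ** 0.25


if __name__ == "__main__":
    T, U = 1.0, 20.0
    k_c = critical_k_F(T, U)
    print("critical k_F:", k_c)
    for k_F in (0.5 * k_c, k_c, 1.2 * k_c):
        print(k_F, gap_scale(k_F, T), spin_scale(T, U))
```

Since the critical value depends on U/T only through \([\ln (U/T)]^{-1/4}\), it varies very slowly with the coupling strength.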

Closely related to the magnetic order is the lattice structure. Our approach can basically be applied to quite general lattices, as long as they obey the usual (discrete) translational symmetries. The pseudo-spin \(\hat{\varvec{\eta }}\) and the anti-ferromagnetic order such as in (5) require bipartite lattices. For lattices which are not bipartite (e.g., a triangular lattice), the anti-ferromagnetic order would be suppressed due to frustration, but short-range anti-ferromagnetic correlations should still persist (although on a weaker level) and thus the finite-range attraction (responsible for d-wave pairing) may survive. Apart from the discussion of the Hubbard tetramer in Section 5, which is obviously devoted to this specific example, we assumed a square lattice in Section 6. For other lattice structures (e.g., hexagonal), one should adapt the Fourier components \(T_{\textbf{k}}\) accordingly, which might then alter the rotational symmetries of the superconducting gap.
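How the lattice enters through \(T_{\textbf{k}}\) can be illustrated with a short sketch. It assumes the convention \(T_{\textbf{k}}=(T/Z)\sum _{\varvec{\delta }}e^{i\textbf{k}\cdot \varvec{\delta }}\) with the sum running over nearest-neighbor vectors \(\varvec{\delta }\) of a Bravais lattice with unit lattice spacing; this normalization, the function names and the choice of the triangular lattice as the comparison case are assumptions made for illustration. A honeycomb lattice has a two-site basis and would require a matrix-valued generalization that is not covered here.

```python
import numpy as np

# Nearest-neighbor vectors (unit lattice spacing), assumed for illustration.
SQUARE = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
TRIANGULAR = np.array(
    [[np.cos(n * np.pi / 3), np.sin(n * np.pi / 3)] for n in range(6)]
)


def hopping_fourier(k, neighbors, T=1.0):
    """T_k = (T/Z) * sum_delta exp(i k.delta) for a Bravais lattice.

    The neighbor vectors come in +/- pairs, so the imaginary parts
    cancel and the sum reduces to cosines.
    """
    neighbors = np.asarray(neighbors, dtype=float)
    Z = len(neighbors)  # coordination number
    return (T / Z) * np.cos(neighbors @ np.asarray(k, dtype=float)).sum()


def rotate(k, angle):
    """Rotate the two-dimensional wave vector k by `angle` (radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([c * k[0] - s * k[1], s * k[0] + c * k[1]])


if __name__ == "__main__":
    k = np.array([0.3, 1.1])
    for name, nn, angle in (("square", SQUARE, np.pi / 2),
                            ("triangular", TRIANGULAR, np.pi / 3)):
        # T_k is invariant under the rotation symmetry of the lattice
        print(name, hopping_fourier(k, nn), hopping_fourier(rotate(k, angle), nn))
```

In this sketch the square lattice yields a \(T_{\textbf{k}}\) invariant under rotations by 90 degrees, while the triangular lattice yields invariance under rotations by 60 degrees, which is the sense in which the lattice fixes the rotational symmetries available to the gap.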

It should also be illuminating to compare the results of our approach with other methods based on a large-Z expansion such as dynamical mean-field theory (DMFT) [64,65,66,67] or its time-dependent version (t-DMFT) [68,69,70,71]. As a first difference, these methods usually consider a different scaling limit, i.e., a factor of \(1/\sqrt{Z}\) instead of 1/Z in front of the hopping term in the Hamiltonian (1). As a consequence, already the limit \(Z\rightarrow \infty \) becomes non-trivial, while we are mostly interested in the corrections of order 1/Z or higher. Furthermore, such methods, which are based on the mapping to an effective single lattice site or a finite cluster of sites, are quite suitable for deriving frequency-dependent quantities such as the self-energy, but are less adapted to the problem considered here, where the spatial structures and the momentum dependence play an important role.
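The difference between the two scaling limits can be illustrated by a simple estimate. As an assumption made only for this illustration, consider a hypercubic lattice in d dimensions with \(Z=2d\) and unit lattice spacing, and average over the Brillouin zone using \(\langle \cos k_i\cos k_j\rangle _{\textbf{k}}=\delta _{ij}/2\). The superscripts below merely label the prefactor of the hopping term:

$$\begin{aligned} T_{\textbf{k}}^{(1/Z)}&=\frac{2T}{Z}\sum _{i=1}^{d}\cos k_i \,,&\left\langle \big (T_{\textbf{k}}^{(1/Z)}\big )^2\right\rangle _{\textbf{k}}&=\frac{4T^2}{Z^2}\,\frac{d}{2}=\frac{T^2}{Z} \,, \\ T_{\textbf{k}}^{(1/\sqrt{Z})}&=\frac{2T}{\sqrt{Z}}\sum _{i=1}^{d}\cos k_i \,,&\left\langle \big (T_{\textbf{k}}^{(1/\sqrt{Z})}\big )^2\right\rangle _{\textbf{k}}&=\frac{4T^2}{Z}\,\frac{d}{2}=T^2 \,. \end{aligned}$$

With the factor 1/Z, the band edges stay at \(\pm T\) while the typical band energy shrinks as \(T/\sqrt{Z}\). With the factor \(1/\sqrt{Z}\), the typical band energy remains of order T for \(Z\rightarrow \infty \), consistent with this limit being non-trivial.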

Note that our considerations are solely based on the Fermi-Hubbard model without invoking any effective descriptions (such as the t-J model). However, it would be interesting to generalize our findings to other model Hamiltonians (see, e.g., [72, 73]) and to compare the results.