1 Introduction

Photo-induced states of matter attract increasing attention for their exotic properties [1,2,3,4,5,6] and possible applications, e.g., in the context of energy conversion [7, 8]. The description of these states necessitates nonequilibrium approaches, which are particularly demanding in cases where light brings a strongly correlated electronic system out of equilibrium. Approximate theoretical approaches to correlated systems are being successfully adapted to treat systems out of equilibrium (e.g., nonequilibrium dynamical mean-field theory (DMFT) [9], the dynamical cluster approximation [10], the auxiliary master equation approach [11], GW [12]). Numerically exact approaches, such as exact diagonalization (ED) [13] or the density-matrix renormalization group [5, 14], where the error can be systematically controlled, are still limited to relatively small system sizes or short times [15]. They are, however, invaluable for benchmarking sophisticated approximate methods. The purpose of this paper is to present a straightforward implementation of the ED method using well-known data formats and algorithms in order to employ highly optimized libraries. The method currently allows for calculations with up to 14 sites.

We specifically focus on the application of the method to calculate electronic properties of a system that is described by a time-dependent Hubbard Hamiltonian. The time dependence is introduced by coupling the electronic system to an electromagnetic (EM) field pulse. The EM field is treated classically and enters the hoppings via Peierls’ substitution. We apply the method to study the time evolution of a Mott-insulator after interaction with a light pulse. By calculating the double occupation and the nonequilibrium spectral function, we show photo-doping of the original Mott-insulator [16, 17] as well as filling of the Mott gap [8, 18,19,20].

The paper is organized as follows. In Sect. 2, we introduce the Hamiltonian of the Hubbard model with Peierls’ substitution, the observables of our interest, as well as notation and units. In Sect. 3, we give a detailed description of the data formats and the time-stepping algorithm, as well as how observables are practically calculated. In Sect. 4, we present the time evolution of the double occupation and the nonequilibrium spectral function to illustrate the application of the method. In Sect. 5, we give a short summary and outlook.

2 Model

2.1 Hubbard model

We focus on the paradigm model for strongly correlated electrons, the Hubbard model [21], given by the following Hamiltonian:

$$\begin{aligned} \hat{H} = \sum _{i,j,\sigma } v_{ij}\, \hat{c}^\dagger _{i\sigma } \hat{c}_{j\sigma } + U \sum _{i} \hat{n}_{i\uparrow } \hat{n}_{i\downarrow }, \end{aligned}$$
(1)

where \(v_{ji}\) describes the relative probability amplitude of an electron hopping from site j to i without change of spin; \(U>0\) is the on-site Coulomb repulsion between two electrons if they reside at the same site (with opposite spins); \(\hat{c}^\dagger _{i\sigma }\) (\(\hat{c}_{i\sigma }\)) denote the fermionic creation (annihilation) operators at site i with spin \(\sigma \), and \(\hat{n}_{i\sigma } = \hat{c}^\dagger _{i\sigma }\hat{c}_{i\sigma }\) is the occupation number operator (for details on the second quantization formalism cf. [22]).

In the following, we restrict our considerations to finite-size systems of \({N_s}\in {\mathbb {N}}\) sites with hoppings explicitly given by a hopping matrix \(v = (v_{ij})_{i,j=1}^{N_s}\). The hopping matrix can be arbitrary, i.e., we can allow for finite hoppings between any two sites. This is where the geometry of the studied system is encoded and where periodic boundary conditions can also be introduced. Lattices of arbitrary dimension and shape can be studied with this approach. Additionally, we can introduce on-site potentials, which are added as diagonal elements \(v_{ii}\) of the hopping matrix. Figure 1 illustrates a \(2\times 3\) box geometry with open boundary conditions.

Fig. 1

Example geometry of a two-dimensional six-site lattice with lexicographical ordering of the sites. The energies \(v_{ii}\) describe an additional on-site potential, and the \(v_{ij}\) describe the hoppings between sites i and j

2.2 Time-dependent electron-light interaction

The interaction of electrons with light puts the system out of equilibrium. Here, the light is modeled as a classical electric field pulse

$$\begin{aligned} \vec {E}(t)=\vec {E}_0\sin (\omega (t-t_p))e^{-\frac{(t-t_p)^2}{2 \sigma ^2}} \end{aligned}$$
(2)

of width \(\sigma \), peaked around the time \(t_p\), and with frequency \(\omega \). We set the unit of frequency equal to the unit of energy (\(\hbar \equiv 1\)); the unit of time is then the inverse of the unit of energy. The EM field is included in the Hubbard Hamiltonian using Peierls’ substitution [23], which adds a time dependence to the hoppings:

$$\begin{aligned} v_{ij} \rightarrow v_{ij}(t)=v_{ij}\exp \left( {-\mathrm {i}e \int _{\vec {R}_i}^{\vec {R}_j}\vec {A}(\vec {r}',t)\cdot \mathrm{d}\vec {r}'}\right) . \end{aligned}$$
(3)

We use a gauge where the scalar potential vanishes and \(\vec {E}=-\partial _t\vec {A}(t)\). In general, the result of the integral in Eq. (3) depends on the direction of the \(\vec {E}\)-field and on the relative positions of sites i and j. The time-dependent phase factor must then be defined for each pair of sites i and j separately.

In the simpler case of only nearest-neighbor (NN) hopping and a box geometry, the integral only depends on whether the hopping between NN sites i and j is in the horizontal or vertical direction. By choosing the \(\vec {E}\)-field direction to be diagonal with respect to the box, we can describe the time dependence by a single function f(t) common to all non-zero elements of the hopping matrix v. For the sake of simplicity, we further approximate the integral in Eq. (3) to arrive at

$$\begin{aligned} f(t) = \exp \left( \mathrm {i}a\left[ \cos (\omega (t-t_p))-b\right] e^{-\frac{(t-t_p)^2}{2\sigma ^2}}\right) \end{aligned}$$
(4)

with dimensionless parameters a and b. The parameter a describes the strength of the EM field, whereas b can be used to set the initial phase factor of the hoppings to 1. Note that Peierls’ substitution introduces only a phase factor to the hoppings and does not change their absolute value. For all results presented, the NN hoppings are set to have equal absolute value, and this value is used as the unit of energy, i.e., \(|v_{ij}| = 1\).
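For concreteness, here is a minimal Python sketch of the pulse phase factor in Eq. (4); the function and parameter names are ours, not part of the described implementation. Setting \(b=\cos (\omega t_p)\) makes the initial phase factor equal to 1:

```python
import numpy as np

def peierls_phase(t, a, omega, t_p, sigma):
    """Phase factor f(t) of Eq. (4) for the time-dependent hoppings."""
    b = np.cos(omega * t_p)  # chosen such that f(0) = 1
    envelope = np.exp(-(t - t_p) ** 2 / (2.0 * sigma ** 2))
    return np.exp(1j * a * (np.cos(omega * (t - t_p)) - b) * envelope)
```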

2.3 Symmetries of the Hamiltonian

Allowing for at most two electrons (with different spins) per site, a state of the system can be represented by the state vector \(|\psi \rangle = |n_{1\uparrow } n_{1\downarrow } n_{2\uparrow } \ldots n_{{N_s}\downarrow }\rangle \), where \(n_{i\sigma } \in \{0,1\}\) is the number of electrons with spin \(\sigma \) at site i. All states of this form are orthonormal and span an abstract Hilbert space, which we denote by \({\mathcal {H}}({N_s})\). The subspace of all states with \({n_\uparrow }\) electrons with spin up and \({n_\downarrow }\) electrons with spin down is denoted by \({\mathcal {H}}^{n_\uparrow }_{n_\downarrow }({N_s})\). It is easy to see that (with \(\oplus \) denoting the direct sum)

$$\begin{aligned} {\mathcal {H}}({N_s}) = \bigoplus _{0\le {n_\uparrow }, {n_\downarrow }\le {N_s}} {\mathcal {H}}^{n_\uparrow }_{n_\downarrow }({N_s}). \end{aligned}$$
(5)

Any state in \({\mathcal {H}}({N_s})\) can be seen as an excitation of the vacuum state:

$$\begin{aligned} (\hat{c}^\dagger _{1\uparrow })^{n_{1\uparrow }} \cdot (\hat{c}^\dagger _{1\downarrow })^{n_{1\downarrow }} \cdot \ldots \cdot (\hat{c}^\dagger _{{N_s}\downarrow })^{n_{{N_s}\downarrow }} |00\ldots 0\rangle = |n_{1\uparrow } n_{1\downarrow } \ldots n_{{N_s}\downarrow }\rangle , \end{aligned}$$
(6)

where the action of the fermionic creation and annihilation operators \(\hat{c}^\dagger _{i\sigma }\) and \(\hat{c}_{i\sigma }\) on a particular state is given by

$$\begin{aligned} \hat{c}^\dagger _{i\sigma } |n_{1\uparrow } \ldots n_{i\sigma } \ldots n_{{N_s}\downarrow }\rangle = (-1)^{c_{i\sigma }}\, (1-n_{i\sigma })\, |n_{1\uparrow } \ldots (n_{i\sigma }+1) \ldots n_{{N_s}\downarrow }\rangle \end{aligned}$$
(7)
$$\begin{aligned} \hat{c}_{i\sigma } |n_{1\uparrow } \ldots n_{i\sigma } \ldots n_{{N_s}\downarrow }\rangle = (-1)^{c_{i\sigma }}\, n_{i\sigma }\, |n_{1\uparrow } \ldots (n_{i\sigma }-1) \ldots n_{{N_s}\downarrow }\rangle , \end{aligned}$$
(8)

respectively. Here, \(c_{i\sigma }\) is the number of electrons present in the entries preceding the \(i\sigma \)-entry, i.e., \(c_{i\sigma } = \sum _{(j\sigma ') < (i\sigma )} n_{j\sigma '}\). The sign \((-1)^{c_{i\sigma }}\) is due to the fermionic anticommutation relations: switching the order of two adjacent operators results in an additional minus sign. Note that the definition (6) is not consistent throughout the literature. Finally, the action of the number operator is given by the equation

$$\begin{aligned} \hat{n}_{i\sigma } |n_{1\uparrow } n_{1\downarrow } \ldots n_{{N_s}\downarrow }\rangle = n_{i\sigma } |n_{1\uparrow } n_{1\downarrow } \ldots n_{{N_s}\downarrow }\rangle , \end{aligned}$$
(9)

i.e., this operator counts the number of electrons on site i with spin \(\sigma \).

Because the Hubbard Hamiltonian commutes with the spin-resolved total occupation operators, the number of electrons of spin \(\sigma \) in the system, \(\sum _{i} \hat{n}_{i\sigma }\), is conserved under the Hamiltonian in Eq. (1). This means that, in the basis of all states in the Hilbert space \({\mathcal {H}}({N_s})\), the Hamiltonian takes a block-diagonal form according to the direct sum in Eq. (5). Our implementation exploits this block-diagonal form and generates the Hamiltonian only in the requested subspace \({\mathcal {H}}^{n_\uparrow }_{n_\downarrow }({N_s})\), with arbitrary \({N_s}\), \({n_\uparrow }\), and \({n_\downarrow }\). Since we are interested in Mott-insulators, we take the system to be half-filled, i.e., \({n_\uparrow }={n_\downarrow }={N_s}/2\), so that the total spin projection is zero.

2.4 Time evolution and observables

The Hubbard Hamiltonian with time-dependent hoppings is a time-dependent Hermitian operator that describes the evolution of a state \(|\psi _0\rangle \in {\mathcal {H}}({N_s})\) in terms of the Schrödinger equation (\(\hbar \equiv 1\))

$$\begin{aligned} \mathrm {i}\partial _t |\psi (t)\rangle = \hat{H}(t) |\psi (t)\rangle , \qquad |\psi (0)\rangle =|\psi _0\rangle . \end{aligned}$$
(10)

Exact diagonalization means that Eq. (10) is solved over the finite-dimensional Hilbert space \({\mathcal {H}}({N_s})\), which yields a large system of ordinary differential equations. The exact solution is given by

$$\begin{aligned} |\psi (t)\rangle = {\mathcal {T}} e^{-\mathrm {i}\int _0^t \hat{H}(\tau ) \text { d}\tau } |\psi _0\rangle , \end{aligned}$$
(11)

where \({\mathcal {T}}\) is the time ordering operator [22]. Once the state of the system at time t, \(|\psi (t)\rangle \), is known, the expectation value of an observable \({\hat{O}}\) can be calculated directly through

$$\begin{aligned} \langle {\hat{O}}(t)\rangle = \langle \psi (t)|{\hat{O}}|\psi (t)\rangle . \end{aligned}$$
(12)

Specifically, we are interested in the (time-dependent) average double occupation per site

$$\begin{aligned} \langle \hat{d}_{}(t)\rangle = \tfrac{1}{{N_s}} \sum _{i=1}^{{N_s}}\langle \psi (t)|\hat{d}_{i}|\psi (t)\rangle , \end{aligned}$$
(13)

with \(\hat{d}_{i}=\hat{n}_{i\uparrow }\hat{n}_{i\downarrow }\), and the average energy per site

$$\begin{aligned} E(t)=\langle \hat{H}(t)\rangle = \tfrac{1}{{N_s}} \langle \psi (t)|\hat{H}(t)|\psi (t)\rangle . \end{aligned}$$
(14)

In the following, we drop explicit time dependencies where they are clear from context. To numerically obtain these quantities of interest, one must assemble a matrix representation of \(\hat{H}\), compute the ground state, and then carry out the time stepping before building the expectation values. These tasks are computationally nontrivial, since the number of independent variables grows exponentially with the number of sites \({N_s}\). Note, however, that one can still treat the subspaces \({\mathcal {H}}_{n_\downarrow }^{n_\uparrow }({N_s})\) separately, because the time-dependent Hamiltonian \(\hat{H}(t)\) commutes with \(\sum _i \hat{n}_{i\sigma }\) (i.e., it preserves the number of electrons).

2.5 Nonequilibrium spectral function

The time-stepping algorithm also allows for the calculation of double-time correlation functions. For example, the nonequilibrium Green’s functions \(G^<\) and \(G^>\) are obtained through [9]

$$\begin{aligned} G^{<}_{ij\sigma }(t,t') = \mathrm {i}\, \langle \psi (t')|\, \hat{c}^\dagger _{j\sigma }\, \hat{U}(t',t)\, \hat{c}_{i\sigma }\, |\psi (t)\rangle , \qquad G^{>}_{ij\sigma }(t,t') = -\mathrm {i}\, \langle \psi (t)|\, \hat{c}_{i\sigma }\, \hat{U}(t,t')\, \hat{c}^\dagger _{j\sigma }\, |\psi (t')\rangle , \qquad \hat{U}(t,t') = {\mathcal {T}} e^{-\mathrm {i}\int _{t'}^{t} \hat{H}(\tau ) \text { d}\tau }, \end{aligned}$$
(15)

where \({\mathcal {T}}\) is the time ordering operator and \(|\psi (t)\rangle \) is the solution of Eq. (10). In order to obtain the correlation functions in Eq. (15), we must apply the annihilation (creation) operator \(\hat{c}_{i\sigma }\) (\(\hat{c}^\dagger _{j\sigma }\)) to \(|\psi (t)\rangle \) and then time-evolve the resulting state according to Eq. (10) in the subspace with one electron less (more), act again with \(\hat{c}^\dagger _{j\sigma }\) (\(\hat{c}_{j\sigma }\)) on the result, and finally build the expectation value.

The nonequilibrium spectral function [9] \(A(\nu ,t) = A^{<}(\nu ,t) + A^{>}(\nu ,t)\) can then be obtained by a forward Fourier transform of \(G^{\lessgtr }(t,t')\) [19]:

$$\begin{aligned} A^{\lessgtr }(\nu ,t) = \frac{1}{\pi } \text {Im} \int _0^{\infty } e^{\mathrm {i}\nu t_{\text {rel}}} G^{\lessgtr }(t,t') \text { d}t_{\text {rel}} \end{aligned}$$
(16)

with \(t_{\text {rel}} = t' - t\) (we omit spin and site indices for simplicity).

Lehmann representation. In equilibrium, the spectral function must remain time-independent and can be benchmarked against the Lehmann representation. The site-averaged spectral function is then given by

$$\begin{aligned} A(\nu ) = \frac{1}{{N_s}} \sum _{i=1}^{{N_s}} \sum _{|\phi \rangle } \Big ( |\langle \phi |\hat{c}^\dagger _{i\sigma }|\psi _0\rangle |^2\, \delta \big (\nu - (E_{|\phi \rangle } - E_0)\big ) + |\langle \phi |\hat{c}_{i\sigma }|\psi _0\rangle |^2\, \delta \big (\nu + (E_{|\phi \rangle } - E_0)\big ) \Big ). \end{aligned}$$
(17)

Here, \(\{|\phi \rangle \}\) is an eigenbasis of \(\hat{H}\) on \({\mathcal {H}}({N_s})\) with respective energy eigenvalues \(E_{|\phi \rangle }\), and \(|\psi _0\rangle \) is the ground state of \(\hat{H}\) with energy \(E_0\).

3 Implementation

The aim of the present paper is to provide efficient data structures and algorithms for assembling matrix representations of Hubbard Hamiltonians for arbitrary problems of the type introduced above, as well as a simple time-stepping algorithm for solving the arising time-dependent Schrödinger equation, Eq. (10). In the following, the key points of the implementation are discussed. For linear algebra subroutines, existing libraries such as Intel’s MKL and LAPACK/BLAS are used, as well as the matrix exponentiation library Expokit [24], which is an essential part of the time-stepping algorithm.

3.1 Discrete basis of subspace \({\mathcal {H}}^{n_\uparrow }_{n_\downarrow }({N_s})\)

Number of states. Since spin-up and spin-down electrons are independent, \({\mathcal {H}}^{n_\uparrow }_{n_\downarrow }({N_s})\) can be identified with \({\mathcal {H}}_{0}^{{n_\uparrow }}({N_s}) \otimes {\mathcal {H}}_{{n_\downarrow }}^{0}({N_s})\). The problem of how to place \({n_\uparrow }\) (\({n_\downarrow }\)) electrons on \({N_s}\) sites is well known in combinatorics. This leads to

$$\begin{aligned} {N_\psi }({n_\uparrow },{n_\downarrow }) = \dim \Big ( {\mathcal {H}}_{{n_\downarrow }}^{{n_\uparrow }}({N_s}) \Big ) = \dim \Big ({\mathcal {H}}_{0}^{{n_\uparrow }}({N_s})\Big ) \dim \Big ({\mathcal {H}}_{{n_\downarrow }}^{0}({N_s})\Big ) = \left( {\begin{array}{c}{N_s}\\ {n_\uparrow }\end{array}}\right) \left( {\begin{array}{c}{N_s}\\ {n_\downarrow }\end{array}}\right) . \end{aligned}$$
(18)

Note that \({N_\psi }= {N_\psi }({n_\uparrow },{n_\downarrow })\) attains its maximum for \({n_\uparrow }={n_\downarrow }={N_s}/2\). For the total number of states, one has

$$\begin{aligned} \dim ({\mathcal {H}}({N_s})) = 2^{2{N_s}} = 4^{N_s}, \end{aligned}$$
(19)

since each of the \(2 N_s\) spin-sites of the lattice can be either occupied or empty. This suggests that the general computational effort for assembling a Hamiltonian and for time-stepping on a system with \({N_s}\) sites scales at least like \({\mathcal {O}}(4^{N_s})\). For different \({N_s}\), the value of \({N_\psi }\) is shown in Table 1.

State representation. On a computer, the states can be represented by \(2{N_s}\) bits, which can be stored in integers of sufficient size. All actions like hopping, creation, and annihilation of electrons can then be implemented as bitwise operations.
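As an illustration, here is a minimal Python sketch of such bitwise operator actions, following Eqs. (7) and (8); the mapping of \(i\sigma \)-entries to bit positions is our assumption and need not coincide with the one used in the actual implementation:

```python
def apply_creation(state, pos):
    """c^dagger at bit position `pos` on a basis state encoded as an integer
    (bit set = occupied). Returns (sign, new_state); sign 0 encodes the zero vector."""
    if state & (1 << pos):            # Pauli principle: already occupied
        return 0, state
    mask = (1 << pos) - 1             # bits below `pos`
    sign = -1 if bin(state & mask).count("1") % 2 else 1  # (-1)**c, cf. Eq. (7)
    return sign, state | (1 << pos)

def apply_annihilation(state, pos):
    """c at bit position `pos`; the fermionic sign is computed as above."""
    if not state & (1 << pos):
        return 0, state
    mask = (1 << pos) - 1
    sign = -1 if bin(state & mask).count("1") % 2 else 1  # (-1)**c, cf. Eq. (8)
    return sign, state & ~(1 << pos)
```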

To obtain all states that constitute a basis \({\mathcal {B}}\) of \({\mathcal {H}}_{n_\downarrow }^{n_\uparrow }({N_s})\) for fixed numbers \({n_\uparrow }\), \({n_\downarrow }\), and \({N_s}\), the hopping of electrons is emulated. Starting from an initial state with the right number of electrons, all other states can be obtained by repeated hopping (i.e., flipping two bits). Due to the independence of spin-up and spin-down electrons, we can treat \({\mathcal {H}}_{n_\downarrow }^0({N_s})\) and \({\mathcal {H}}_0^{n_\uparrow }({N_s})\) separately and get all states from building the tensor-product of the respective bases \({\mathcal {B}}_\uparrow \) and \({\mathcal {B}}_\downarrow \). Therefore, we restrict the presentation to the case of spin-up electrons in the following.

Table 1 Number of states \({N_\psi }\), maximum number of non-zero elements of the Hubbard Hamiltonian \(N_{\text {nz}}\), and estimated memory consumption \(N_\text {mem}\) in gigabytes for a system with \({N_s}\) sites and \({n_\uparrow }={n_\downarrow }={N_s}/2\). Note that \(N_\text {nz}\), and hence also \(N_\text {mem}\), are only upper bounds; the actual memory consumption may be much lower

A multi-index \(\alpha \in \{1,\ldots ,{N_s}\}^{{n_\uparrow }}\) can be used to represent the (ordered) positions of electrons on the sites, i.e., \(\alpha _i = j\) means that the i-th electron resides at site j. From Pauli’s principle, we see that

$$\begin{aligned} 1 \le \alpha _1< \cdots < \alpha _{n_\uparrow }\le {N_s}. \end{aligned}$$
(20)

For such multi-indices, one can define a total ordering by

$$\begin{aligned} \alpha< (>)~{\hat{\alpha }} \, \Longleftrightarrow \, \alpha _j < (>)~{\hat{\alpha }}_j \quad \text {with } j = \min \left\{ i \in \{1,\ldots ,{n_\uparrow }\} ~\big |~ \alpha _i \ne {\hat{\alpha }}_i\right\} . \end{aligned}$$
(21)

This gives a natural meaning to increasing \(\alpha \) by one. From Eq. (20), we see further that the smallest admissible multi-index satisfies \(\alpha _i = i\) and the largest satisfies \(\alpha _i = {N_s}-{n_\uparrow }+i\) for all \(i = 1,\ldots ,{n_\uparrow }\). By iterating over all multi-indices subject to the constraint posed by Eq. (20), one obtains all possible configurations of electrons. This is shown in pseudo-code in Algorithm 1.

[Algorithm 1]
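A Python sketch with the same effect as Algorithm 1; it enumerates the admissible multi-indices of Eq. (20) via library combinations instead of the in-place increment of Algorithm 1 (names are ours):

```python
from itertools import combinations

def spin_basis(n_sites, n_el):
    """All states of one spin species as bit patterns (one bit per site),
    obtained from the ordered multi-indices of Eq. (20)."""
    states = []
    for alpha in combinations(range(n_sites), n_el):
        state = 0
        for site in alpha:
            state |= 1 << site
        states.append(state)
    return states

# the basis of H^{n_up}_{n_down}(N_s) is the tensor product of the spin bases
basis = [(up, dn) for up in spin_basis(6, 3) for dn in spin_basis(6, 3)]
assert len(basis) == 400  # = binom(6,3)**2, cf. Eq. (18)
```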

3.2 Sparse structure of the Hamiltonian

Because of Eq. (19), even for small \({N_s}\) the matrix representation of the Hamiltonian for most electron configurations requires vast amounts of memory if implemented as a dense double-precision complex matrix. However, since the Hamiltonian couples only few pairs of basis states, most elements of the matrix representation are zero. Exploiting this fact allows for using a well-known sparse matrix format, resulting in a much more memory-efficient implementation.

Non-zero elements. Consider fixed numbers \({N_s}\), \({n_\uparrow }\), and \({n_\downarrow }\). Conservation of the electron number implies that two states coupled by the Hamiltonian can differ only in an even number of entries. Furthermore, hopping between more than two sites is not accounted for by the Hubbard Hamiltonian in Eq. (1). This leaves only two cases that give a non-zero contribution.

First, a state differs from itself by zero entries, which gives a contribution to the diagonal of the Hamiltonian. Second, one of the \(n_\uparrow \) (\(n_\downarrow \)) spin up (spin down) electrons can hop to one of the \({N_s}- n_\uparrow \) (\({N_s}- n_\downarrow \)) unoccupied sites, creating a state differing in exactly two entries. There are \(n_\uparrow ({N_s}-n_\uparrow )\) (\(n_\downarrow ({N_s}-n_\downarrow )\)) possibilities for that process, each giving an off-diagonal non-zero contribution to the Hamiltonian. Due to the hermiticity of the Hamiltonian, only its upper triangular part, which consists of the diagonal and half of all off-diagonal non-zero elements, yields non-redundant information. Thus, the number of non-zero elements evaluates to

$$\begin{aligned} N_\text {nz} = {N_\psi }(1 + \tfrac{1}{2}{n_\uparrow }({N_s}-{n_\uparrow }) + \tfrac{1}{2}{n_\downarrow }({N_s}-{n_\downarrow })). \end{aligned}$$
(22)

Note that Eq. (22) is only a worst-case result. If some of the coefficients U, \(v_{ij}\), or combinations thereof are zero, this further reduces the number of nontrivial entries of the Hamiltonian’s matrix representation. For the case of half-filling, we get \(N_\text {nz} = {N_\psi }(1+\tfrac{{N_s}^2}{4})\). Together with \({N_\psi }= {\mathcal {O}}(4^{N_s})\), this means that the non-zero elements can be stored with \({\mathcal {O}}({N_\psi }\ln ^2({N_\psi }))\) memory, which is nearly linear, as opposed to quadratic memory \({\mathcal {O}}({N_\psi }^2)\) for dense matrices.

CSR-format. In light of the previous considerations, the most suitable storage format for the matrix representation H of the Hamiltonian is the Compressed-Sparse-Row (CSR) format [25]. This format stores only the non-zero elements and their positions within the matrix. For \(H \in {\mathbb {C}}^{{N_\psi }\times {N_\psi }}\), it consists of three arrays:

  • \({\mathcal {V}}\in {\mathbb {C}}^{N_\text {nz}}\) consists of all non-zero elements of H in the order they appear in H in a row-wise fashion.

  • \({\mathcal {J}}\in {\mathbb {N}}^{N_\text {nz}}\) consists of the column indices of all non-zero elements in the same order as in \({\mathcal {V}}\).

  • \({\mathcal {I}}\in {\mathbb {N}}^{{N_\psi }+ 1}\) stores where each row begins: its k-th element is the position in \({\mathcal {J}}\) (and \({\mathcal {V}}\)) at which the k-th row begins and the \((k-1)\)-th row ends.

If \(H_{ij}\) is the k-th non-zero element of H, it can thus be accessed via \({\mathcal {V}}_{k}\), and there holds \(j = {\mathcal {J}}_k\) as well as \({\mathcal {I}}_i \le k < {\mathcal {I}}_{i+1}\). Due to the Hamiltonian being Hermitian, its matrix representation satisfies \(H^\dagger =H\) and only the upper triangular part needs to be stored explicitly, i.e., \(H_{ij}\) for all \(i \le j\). For all other elements, there holds \(H_{ji} = {H^*_{ij}}\). The following example illustrates this concept:

$$\begin{aligned} H = \begin{pmatrix} H_{11} &{} H_{12} &{} 0 &{} H_{14} \\ H_{12}^* &{} H_{22} &{} H_{23} &{} 0 \\ 0 &{} H_{23}^* &{} 0 &{} 0 \\ H_{14}^* &{} 0 &{} 0 &{} H_{44} \end{pmatrix} \quad \Rightarrow \quad \begin{array}{l} {\mathcal {V}} = (H_{11}, H_{12}, H_{14}, H_{22}, H_{23}, H_{44}), \\ {\mathcal {J}} = (1, 2, 4, 2, 3, 4), \\ {\mathcal {I}} = (1, 4, 6, 6, 7) \end{array} \end{aligned}$$
(23)

Only the upper triangular part of H is considered and the non-zero elements are stored. Note that due to row 3 having no non-zero element above the diagonal, there holds \({\mathcal {I}}_3 = {\mathcal {I}}_4 = 6\).
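For illustration, the same example can be expressed with a standard sparse library (a sketch with placeholder numerical values; note the zero-based indexing, which shifts \({\mathcal {J}}\) and \({\mathcal {I}}\) by one relative to the text):

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

# upper triangular part of the example above, with placeholder values
V = np.array([4.0, 1 + 2j, 3.0, 5.0, 2 - 1j, 6.0])  # H11, H12, H14, H22, H23, H44
J = np.array([0, 1, 3, 1, 2, 3])                    # column indices
I = np.array([0, 3, 5, 5, 6])                       # row pointers; row 3 is empty

H_upper = csr_matrix((V, J, I), shape=(4, 4))
# restore the full Hermitian matrix H = H^dagger from the stored triangle
H = H_upper + H_upper.conj().T - diags(H_upper.diagonal())
```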

A comparison of the memory requirements of the naive (dense) and sparse (CSR) implementations of the Hamiltonian matrix for the worst case (half-filling) is shown in Table 1. Due to the small number of elements that have to be stored, the addition of matrices with the same sparsity structure can be carried out efficiently by adding the \({\mathcal {V}}\) arrays of both matrices. Furthermore, because of the row-wise storage, the CSR format is well suited for matrix-vector multiplication. Both operations can be done in \({\mathcal {O}}(N_\text {nz})\) operations. The drawbacks of this format lie in element access, for which a linear search of the \({\mathcal {J}}\) array must be carried out, and in changing the sparsity structure (i.e., setting a formerly zero element to a non-zero value), in which case all three arrays must be altered and possibly reallocated. The latter is avoided in our implementation.

3.3 Time-dependent Hamiltonian

We assume the hopping amplitudes to be time-dependent in the following way: We consider a Hermitian matrix \(v^\text {Re} \in {\mathbb {C}}^{{N_s}\times {N_s}}\) and an anti-Hermitian matrix \(v^\text {Im} \in {\mathbb {C}}^{{N_s}\times {N_s}}\), as well as a phase factor \(f(t) \in {\mathbb {C}}\) with \(|f(t)| = 1\) that becomes trivial at large times, i.e., \(f(t) \rightarrow 1\) as \(t \rightarrow \infty \). For each hopping pair (i, j), we can decide whether the corresponding hopping amplitude should explicitly depend on time or not. The time-dependent hopping amplitudes then read

$$\begin{aligned} v_{ij}(t) = {\left\{ \begin{array}{ll} v^\text {Re}_{ij} \text {Re}(f(t)) + \mathrm {i}v^\text {Im}_{ij} \text {Im}(f(t)) &{} \text {if hopping is time dependent,}\\ v^\text {Re}_{ij} &{} \text {else.} \end{array}\right. } \end{aligned}$$
(24)

Note that this definition renders the matrix v(t) Hermitian, i.e., \(v^\dagger (t) = v(t)\), and \(v_{ii}(t)\equiv v_{ii} \in {\mathbb {R}}\). The function f(t) in (24) can, e.g., describe the EM pulse as in Eq. (4).

By separating time-dependent and time-independent parts of the Hamiltonian according to (24), the full time-dependent matrix representation can be written as

$$\begin{aligned} H(t) = H^\text {(stat)} + \text {Re}(f(t)) H^\text {(Re)} + \mathrm {i}\text {Im}(f(t)) H^\text {(Im)}. \end{aligned}$$
(25)

Here, the matrix \(H^\text {(stat)}\) includes all time-independent contributions to the Hamiltonian: the Coulomb interaction U and the hopping amplitudes \(v_{ij}^\text {Re}\) of all pairs (i, j) whose hopping is modeled as time-independent. The matrices \(H^\text {(Re)}\) and \(H^\text {(Im)}\) include all hopping amplitudes \(v^\text {Re}_{ij}\) and \(v^\text {Im}_{ij}\), respectively, which are modeled as time-dependent. Since the function f(t) converges to one at large times t, the Hamiltonian then reduces to \(H(t) = H^\text {(stat)} + H^\text {(Re)}\), which describes the system in equilibrium.
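Since all three matrices in Eq. (25) can share their index arrays (as discussed below), evaluating H(t) at a given time reduces to a combination of the three value arrays at \({\mathcal {O}}(N_\text {nz})\) cost; a minimal sketch (array names are ours):

```python
import numpy as np

def values_at_time(V_stat, V_re, V_im, f_t):
    """Value array of H(t) = H_stat + Re(f) H_Re + i Im(f) H_Im, Eq. (25);
    the shared index arrays I, J need not be touched."""
    return V_stat + f_t.real * V_re + 1j * f_t.imag * V_im
```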

We require that the assembled \(H^\text {(full)}=H(t)\), \(H^\text {(stat)}\), \(H^\text {(Re)}\), and \(H^\text {(Im)}\) can all be described by a single pair of index arrays \({\mathcal {I}}\) and \({\mathcal {J}}\). The assembly of this structure is shown in Algorithm 2. Because of the nested for-loops, this costs \({\mathcal {O}}({N_\psi }^2)\) operations and is the only operation in our code with quadratic complexity. However, the assembly is done without knowledge of either U or v, so the structure is independent of the interaction between sites and hence of the geometry. Therefore, the structure for a specific set of \({N_s}\), \({n_\uparrow }\), and \({n_\downarrow }\) only needs to be computed once (which can be done in parallel); cf. the discussion in Sect. 3.8.

To improve the complexity of Algorithm 2, one needs to avoid the innermost for-loop, which can be done in the following way. For every state \(|\psi _i\rangle \), instead of comparing it to every other state in \({\mathcal {B}}\), one can simulate hopping of electrons as is done in Algorithm 1. Thus, one obtains all states that differ from \(|\psi _i\rangle \) by exactly two entries, i.e., all states that interact with \(|\psi _i\rangle \) and give a possibly non-zero contribution to the Hamiltonian, resulting in an overall cost of \({\mathcal {O}}(N_\text {nz})\). To achieve linear complexity in \(N_\text {nz}\), however, it is crucial that finding the position in \({\mathcal {B}}\) of a given state can be done in constant time, e.g., by a suitable hash function as described in Sect. 3.7.

[Algorithm 2]

Pre-assembling the structure allows for fast assembly of the Hamiltonian for specific coefficients U and v, which is shown in Algorithm 3. Note that the first entry in each row of the sparse representation of the Hamiltonian lies on the diagonal; thus, line 4 adds a diagonal contribution. Furthermore, as noted in Sect. 2.3, applying \(\hat{c}^\dagger _{i\sigma }\hat{c}_{j\sigma }\) to a state where hopping from \(j\sigma \) to \(i\sigma \) is possible results in a factor

$$\begin{aligned} (-1)^{\delta (i,j,\sigma )} = (-1)^{|c_{i\sigma } - c_{j\sigma }|}. \end{aligned}$$
(26)

Thus, \(\delta (i,j,\sigma ) = |c_{i\sigma } - c_{j\sigma }|\) is the number of electrons that lie between the \(i\sigma \)- and \(j\sigma \)-entries and can be computed by a simple for-loop. This explains the signs in lines 8 and 10.

It is apparent that the cost of Algorithm 3 is \({\mathcal {O}}(N_\text {nz})\) and that the memory consumption of the Hamiltonian is proportional to \(N_\text {nz}\). Upper bounds on the memory consumption for certain parameters \({N_s}\), \({n_\uparrow }\), and \({n_\downarrow }\) are shown in Table 1. To further reduce the memory consumption and the computational effort for the time-stepping, one can carry out the assignments in lines 6–12 only if at least one of the contributions to be assigned is non-vanishing. Afterwards, the structure can be updated to account only for the actual non-vanishing elements of the Hamiltonian.

[Algorithm 3]

3.4 Time-stepping algorithm

Each state in \({\mathcal {H}}_{n_\downarrow }^{n_\uparrow }({N_s})\) can be uniquely represented by a vector \(\vec {v} \in {\mathbb {C}}^{N_\psi }\). Hence, we can write the ODE system resulting from the time-dependent Schrödinger equation, Eq. (10), as

$$\begin{aligned} \mathrm {i}\tfrac{\text { d}}{\text { d}t}\vec {v}(t) = H(t) \vec {v}(t), ~~ \vec {v}(0) = \vec {v}_0, \end{aligned}$$
(27)

where \(H \in {\mathbb {C}}^{{N_\psi }\times {N_\psi }}\) is the matrix representation of \(\hat{H}\) and \(\vec {v}_0\) is the vector representing the ground state of the system. We now discuss how to solve (27) numerically.

Ground state. In order to obtain the initial state for the time-stepping, we consider the system described by Eq. (10) to be in thermal equilibrium at zero temperature. Then, the ground state \(|\psi _0\rangle \in {\mathcal {H}}_{n_\downarrow }^{n_\uparrow }({N_s})\) is defined as the eigenstate corresponding to the smallest eigenvalue of \(\hat{H}\):

$$\begin{aligned} \hat{H}|\psi _0\rangle = E_0 |\psi _0\rangle , ~~ E_0 = \min \left\{ E~|~E \text { is an eigenvalue of } \hat{H}\right\} . \end{aligned}$$
(28)

Numerically, we obtain a representation \((E_0, \vec {v}_0)\) of the eigenpair \((E_0, \psi _0)\) by a variant of the so-called power iteration method (see e.g., [26]).

The power iteration method iteratively computes the eigenvalue \(\lambda \) of largest absolute value and the corresponding eigenvector \(\vec {v}\) of a Hermitian matrix \(M \in {\mathbb {C}}^{N \times N}\) by the recursive formulae

$$\begin{aligned} \vec {v}^{(n)} = \frac{M \vec {v}^{(n-1)}}{\Vert M \vec {v}^{(n-1)}\Vert }, ~~ \lambda ^{(n)} = (\vec {v}^{(n-1)})^\dagger M \vec {v}^{(n-1)}, \end{aligned}$$
(29)

starting from an arbitrary vector \(\vec {v}^{(0)}\) not orthogonal to the desired eigenvector. The iteration stops when \(\lambda ^{(n)}\) and \(\vec {v}^{(n)}\) are sufficiently close to the true values, which is determined by an a-posteriori error estimate. This is shown in Algorithm 4.

[Algorithm 4]

Applying the power iteration to H gives an approximate eigenpair \((E,\vec {v})\). If E is negative, we have already found the smallest eigenvalue of H, and we set \(E_0 = E\) and \(\vec {v}_0= \vec {v}\). Otherwise, if \(E \ge 0\) and hence E is the largest eigenvalue of H, we apply the power iteration once more to the shifted matrix \(H-EI\), obtaining an approximate eigenpair \((E', \vec {v}')\). Since \(H-EI\) has only non-positive eigenvalues, \(E'\) approximates its smallest eigenvalue. We then set \(E_0 = E+E'\) and \(\vec {v}_0= \vec {v}'\), which approximate the smallest eigenvalue of H and the corresponding eigenvector, respectively. This is shown in Algorithm 5.

[Algorithm 5]
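A Python sketch of the procedure behind Algorithms 4 and 5 (the residual-based stopping criterion and the tolerances are our choices; the actual a-posteriori estimate may differ):

```python
import numpy as np

def power_iteration(matvec, n, tol=1e-10, max_iter=100_000):
    """Dominant eigenpair of a Hermitian operator via Eq. (29)."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(max_iter):
        w = matvec(v)
        lam = np.vdot(v, w).real          # Rayleigh quotient (real for Hermitian M)
        if np.linalg.norm(w - lam * v) <= tol * max(abs(lam), 1.0):
            break
        v = w / np.linalg.norm(w)
    return lam, v

def ground_state(matvec, n):
    """Smallest eigenpair of H via the spectral shift described above."""
    E, v = power_iteration(matvec, n)
    if E < 0:
        return E, v
    Es, vs = power_iteration(lambda x: matvec(x) - E * x, n)  # H - E*I has eigenvalues <= 0
    return E + Es, vs
```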

Exponential midpoint rule and Krylov subspace method. The continuous evolution solving Eq. (27) is the discrete analog of Eq. (11). For small times t, it can be approximated with sufficient accuracy by a Magnus expansion of order zero [27], which gives

$$\begin{aligned} \vec {v}(t) \approx \exp \left( -\mathrm {i}\int _0^t H(\tau ) \text { d}\tau \right) \vec {v}_0. \end{aligned}$$
(30)

By approximating the integral in the exponent via the midpoint rule

$$\begin{aligned} \int _{a}^{b} f(t) \text { d}t \approx (b-a)f\left( \frac{b+a}{2}\right) , \end{aligned}$$
(31)

the approximation in Eq. (30) can be further simplified. Considering consecutive intervals of length \(\tau \), on which the midpoint rule and the Magnus expansion are sufficiently accurate, yields a sequence of vectors defined by

$$\begin{aligned} \vec {v}^{(n+1)} = \exp \left( -\mathrm {i}H(n\tau + \tau /2) \tau \right) \vec {v}^{(n)}, \quad \vec {v}^{(0)} = \vec {v}_0, \end{aligned}$$
(32)

that approximate the solution at times \(n\tau \): \(\vec {v}^{(n)} \approx \vec {v}(n\tau )\). Note that these approximations are of lowest order; the resulting exponential midpoint rule cannot, in general, be expected to surpass second-order convergence. The discretization in Eq. (32) is the lowest-order representative of a family of numerical time propagation schemes called Magnus integrators [28]. Other representatives, which allow for higher-order time stepping methods, can be obtained by using higher-order expansions in Eq. (30) and Eq. (31) (cf. [29]). Of particular practical interest among these are so-called commutator-free exponential time-propagators (CFETs), which avoid commutator terms in the Magnus expansion Eq. (30) and thus optimize the number of necessary matrix-vector multiplications; see [30] for a fourth-order CFET for the Schrödinger equation (and a comparison to conventional Runge–Kutta-type integrators) and [31] for the derivation of general higher-order CFETs, which can be readily implemented with the data structures and methods presented in our work.
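For reference, one step of Eq. (32) can be written with SciPy’s general-purpose expm_multiply standing in for the Krylov propagator developed below (a sketch; names are ours):

```python
from scipy.sparse.linalg import expm_multiply

def midpoint_step(H_of_t, v, t, tau):
    """v(t + tau) ~= exp(-i tau H(t + tau/2)) v(t), cf. Eq. (32).
    `H_of_t` returns the sparse matrix H at a given time."""
    return expm_multiply(-1j * tau * H_of_t(t + 0.5 * tau), v)
```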

The main difficulty in the computation of Eq. (32) is evaluating the exponential of the large anti-Hermitian sparse matrix \(-\mathrm {i}H\tau \). To this end, we employ a so-called Krylov subspace method as described in [24] and references therein. For a matrix H and a vector \(\vec {v}\), the m-th Krylov subspace is defined as \({\mathcal {K}}_m(H,\vec {v}) = \text {span}\{\vec {v}, H\vec {v}, \ldots , H^{m-1}\vec {v}\}\). The space \({\mathcal {K}}_m(H,\vec {v})\) is thus spanned by vectors obtained by (sparse) matrix-vector multiplication only, which can be carried out efficiently in the CSR-format. Let the columns of \(V \in {\mathbb {C}}^{{N_\psi }\times m}\) form an orthonormal basis of \({\mathcal {K}}_m(H,\vec {v})\). Then, by projection, we can approximate H by a lower-dimensional matrix \(h \in {\mathbb {C}}^{m\times m}\):

$$\begin{aligned} H \approx V h {V}^\dagger . \end{aligned}$$
(33)

Hermiticity of H implies hermiticity of h, and basic orthogonality properties of the Krylov space (cf. [24]) cause h to be Hessenberg, i.e., \(h_{ij} = 0\) for \( i>j+1\). Together, this implies that h is tridiagonal; hence, the orthonormal basis V as well as h can be computed via a Lanczos algorithm in \({\mathcal {O}}(m{N_\psi })\) operations, as is done in Algorithm 6.

[Algorithm 6]

For the exponentiation of a small m-by-m tridiagonal matrix, numerically stable and efficient methods are implemented in the library Expokit [24], requiring \({\mathcal {O}}(m^3)\) operations. Together, this leads to an approximation of the exponential in (32):

$$\begin{aligned} \exp (-\mathrm {i}H\tau )\vec {v} \approx \sum _n \frac{1}{n!} (-\mathrm {i}V h {V}^\dagger \tau )^n\vec {v} = V \exp (-\mathrm {i}h\tau ) V^\dagger \vec {v}. \end{aligned}$$

The dimension m of the Krylov subspace should be chosen large enough to ensure small approximation errors, but small enough to limit the computational effort. In Algorithm 6, the dimension is chosen adaptively in each step via an a-posteriori error estimate, because the difficulty of computing a time step can vary according to H(t). We use a method suggested in [32], which uses the norm of the difference of two consecutive approximations of the solution in \({\mathcal {K}}_{m-1}(H,\vec {v})\) and \({\mathcal {K}}_m(H,\vec {v})\). The validity of this error estimate is shown in Fig. 2 for a problem where the exact solution is known. Although the error estimator underestimates the error by nearly an order of magnitude, it converges at the same rate as the error, rendering the estimate a good indicator of convergence.

Fig. 2

Comparison between actual error and a-posteriori error estimate for a diagonal problem with \({N_\psi }= 63504\)

Algorithm 6 shows one time step, summarizing this subsection. For the sake of brevity, \(V_{:,j}\) denotes the j-th column of V, and \(h|_{j\times j}\) denotes the restriction of h to the first j rows and columns. The whole time-stepping algorithm for solving Eq. (27) consists of applying Algorithm 6 iteratively.
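A compact Python sketch of the Lanczos propagation underlying Algorithm 6, here with a fixed Krylov dimension m for simplicity (Algorithm 6 additionally adapts m via the error estimate discussed above):

```python
import numpy as np
from scipy.linalg import expm

def krylov_expm(matvec, v, tau, m):
    """Approximate exp(-i tau H) v in K_m(H, v) via the Lanczos recursion."""
    n = v.shape[0]
    V = np.zeros((n, m), dtype=complex)   # orthonormal Krylov basis (columns)
    h = np.zeros((m, m), dtype=complex)   # tridiagonal projection of H
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = matvec(V[:, j])
        if j > 0:
            w = w - h[j - 1, j] * V[:, j - 1]
        h[j, j] = np.vdot(V[:, j], w)
        w = w - h[j, j] * V[:, j]
        if j + 1 < m:
            nrm = np.linalg.norm(w)
            if nrm < 1e-14:               # invariant subspace reached: truncate
                V, h = V[:, :j + 1], h[:j + 1, :j + 1]
                break
            h[j + 1, j] = h[j, j + 1] = nrm
            V[:, j + 1] = w / nrm
    # v = beta * V e_1, hence exp(-i H tau) v ~= beta * V exp(-i h tau) e_1
    return beta * (V @ expm(-1j * tau * h)[:, 0])
```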

3.5 Observables

To compute the discrete analog of the expectation value \(\langle \hat{O} \rangle = \langle \psi |\hat{O}|\psi \rangle \) of an observable \(\hat{O}\) with respect to a state \(\psi \in {\mathcal {H}}_{n_\downarrow }^{n_\uparrow }({N_s})\), we need to discretize the action of \(\hat{O}\) on \(\psi \). For the energy \(\langle \hat{H}(t) \rangle \), this is already achieved by the discrete Hamiltonian H(t).

For the double occupation, this can be done by computing a weight vector \(\vec {w}_{\hat{d}} \in {\mathbb {N}}^{N_\psi }\). The k-th element of \(\vec {w}_{\hat{d}}\) is the number of doubly occupied sites in the k-th state of the considered basis, i.e., \((\vec {w}_{\hat{d}})_k = \langle \psi _k|\hat{d}|\psi _k\rangle \). The desired expectation value can then be obtained as \(\langle \hat{d} \rangle = \vec {v}^\dagger (\vec {w}_{\hat{d}} \odot \vec {v})\), where \(\odot \) denotes element-wise multiplication of vectors. For other observables, like the double occupation at a specific site or the electron occupation, one can compute the corresponding weight vector and proceed analogously.
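A minimal sketch of this weight-vector construction, assuming the (up, down) bit-pattern basis of Sect. 3.1 (names are ours):

```python
import numpy as np

def double_occupation_weights(basis):
    """(w_d)_k = number of doubly occupied sites in the k-th basis state."""
    return np.array([bin(up & dn).count("1") for up, dn in basis])

def mean_double_occupation(v, w_d, n_sites):
    """<d>(t) per site, Eq. (13): v^dagger (w_d * v) / N_s."""
    return np.vdot(v, w_d * v).real / n_sites
```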

3.6 Equilibrium spectral function

The implemented tools can also be used to compute the spectral function of the system from the Lehmann representation. Equation (17) is not directly suitable for implementation because of the \(\delta \)-distributions. Approximating these by Lorentzians of width \(\varepsilon > 0\) leads to

$$\begin{aligned} A(\nu ) \approx \frac{1}{\pi {N_s}} \sum _{i=1}^{{N_s}} \sum _{|\phi \rangle } \left( \frac{\varepsilon \, |\langle \phi |\hat{c}^\dagger _{i\sigma }|\psi _0\rangle |^2}{\big (\nu - (E_{|\phi \rangle } - E_0)\big )^2 + \varepsilon ^2} + \frac{\varepsilon \, |\langle \phi |\hat{c}_{i\sigma }|\psi _0\rangle |^2}{\big (\nu + (E_{|\phi \rangle } - E_0)\big )^2 + \varepsilon ^2} \right) . \end{aligned}$$
(34)

This approximation preserves relative values of spectral weights.

To compute \(A(\nu )\) from Eq. (34), note that, e.g., the creation operator \(\hat{c}^\dagger _{i\uparrow }\) maps \({\mathcal {H}}_{n_\downarrow }^{n_\uparrow }({N_s})\) to \({\mathcal {H}}_{n_\downarrow }^{{n_\uparrow }+1}({N_s})\). From Eq. (5) it is then clear that only an eigenbasis of \({\mathcal {H}}_{n_\downarrow }^{{n_\uparrow }+1}({N_s})\) needs to be considered, which can be obtained by assembling the Hamiltonian in this subspace and computing an eigendecomposition. For the evaluation of \(\hat{c}^\dagger _{i\uparrow }|\psi _0\rangle \), the relation between states in \({\mathcal {H}}_{n_\downarrow }^{n_\uparrow }({N_s})\) and \({\mathcal {H}}_{n_\downarrow }^{{n_\uparrow }+1}({N_s})\) introduced by \(\hat{c}^\dagger _{i\uparrow }\) must be known. This relation can be found by a linear search over the states of \({\mathcal {H}}_{n_\downarrow }^{{n_\uparrow }+1}({N_s})\), or, more efficiently, via hash-maps (see the following subsection). Here, the anticommutation relations for creation and annihilation operators have to be taken into account.

These computations can be carried out analogously for the corresponding term with annihilation operators and for the spin-down case (which can be omitted for systems with spin symmetry).

3.7 Nonequilibrium spectral function

In order to evaluate Eq. (15), the action of a creation (annihilation) operator \(\hat{c}^\dagger _{i\sigma }\) (\(\hat{c}_{i\sigma }\)) on a state vector \(|\psi \rangle \) is implemented following the definitions in Eqs. (7) and (8), where the sign resulting from \(c_{i\sigma }\) is computed as in Eq. (26).

Since a general state \(|\psi \rangle \) is represented as a linear combination of basis states \(|\varphi _k\rangle \),

$$\begin{aligned} |\psi \rangle = \sum _{k=1}^{{N_\psi }} w_k\, |\varphi _k\rangle , \end{aligned}$$
(35)

it remains only to determine the ordering of states in the subspace with one electron added or removed, which is not identical to the ordering of the states obtained from \(\hat{c}^\dagger _{i\sigma } |\psi \rangle \) or \(\hat{c}_{i\sigma } |\psi \rangle \). Hence, after applying, e.g., \(\hat{c}^\dagger _{i\uparrow }\), we have to find the index of the resulting state in the subspace \({\mathcal {H}}_{n_\downarrow }^{{n_\uparrow }+1}({N_s})\). A simple linear search-and-match is very inefficient. To this end, we apply the fermionic hashing function from Ref. [33] given by

$$\begin{aligned} I = \sum _{i = 1}^{n} \left( {\begin{array}{c}p_i\\ i\end{array}}\right) , \end{aligned}$$
(36)

where I is the hashing index, n is the number of electrons, \(p_i\) is the spin-site that particle i occupies, and \( \left( {\begin{array}{c}m\\ n\end{array}}\right) = 0\) if \(n > m\). This function provides a unique mapping of a state-vector (in its binary representation as an integer) to an integer in the range \(0 \le I < N_{\text {states}}\), where \(N_{\text {states}}\) is the dimension of the corresponding subspace; the index also directly corresponds to the ordering of the states. Thus, if the action of a creation (annihilation) operator on a given state-vector is non-zero, the hashing index of the resulting state is calculated so that its amplitude can be correctly assigned.
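A sketch of Eq. (36) in Python; we assume zero-based spin-site positions, which maps the smallest state to \(I = 0\) (math.comb already returns 0 for \(n > m\)):

```python
from math import comb

def hash_index(occupied):
    """Hashing index of Eq. (36); `occupied` lists the zero-based spin-sites
    p_1 < p_2 < ... occupied by particles i = 1, 2, ..."""
    return sum(comb(p, i) for i, p in enumerate(occupied, start=1))

assert hash_index([0, 1, 2]) == 0   # smallest three-electron state
assert hash_index([1, 2, 3]) == 3   # C(1,1) + C(2,2) + C(3,3)
```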

The Fourier transform in Eq. (16) is performed as a post-processing step. As in the case of Eq. (34), we use broadening to numerically represent the \(\delta \)-distributions occurring for a finite system. This is achieved by modifying the Fourier transform in Eq. (16) with the additional factor \(e^{-\varepsilon t_\mathrm{rel}}\):

$$\begin{aligned} A^{\lessgtr }(\nu ,t) = \frac{1}{\pi } \text {Im} \int _0^{T_\mathrm{max}} e^{-\varepsilon t_{\text {rel}}} e^{\mathrm {i}\nu t_{\text {rel}}} G^{\lessgtr }(t,t') \text { d}t_{\text {rel}}, \end{aligned}$$
(37)

where we also limit the integration to a maximal time interval with a sufficiently large \(T_\mathrm{max}\).
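A post-processing sketch of Eq. (37) on a uniform \(t_{\text {rel}}\) grid with simple trapezoidal quadrature (grid and names are ours):

```python
import numpy as np

def spectral_function(G_rel, t_rel, nu_grid, eps=0.1):
    """Damped Fourier transform of Eq. (37); G_rel[k] = G(t, t + t_rel[k]),
    integrated up to T_max = t_rel[-1]."""
    damping = np.exp(-eps * t_rel)
    A = np.empty(len(nu_grid))
    for k, nu in enumerate(nu_grid):
        A[k] = np.trapz(np.exp(1j * nu * t_rel) * damping * G_rel, t_rel).imag / np.pi
    return A
```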

3.8 Numerical cost and limitations

The method presented in this paper can handle computations with up to 14 sites. For more than 14 sites, several issues must be resolved. First, indexing with a 32-bit integer format is no longer possible. Possible remedies include using long-integer indexing or splitting the arrays. Furthermore, computation time may become an issue. The computation necessary for obtaining the double occupations for a 14-site chain (presented in Fig. 6) took about seven hours on a single node of the VSC-3 computer cluster, which is equipped with a 16-core Intel Xeon processor and 256 GB of RAM. Multi-node computing may accelerate simulations, e.g., by offloading the computation of expectation values.

As seen in Sect. 3.2, the time needed for the time-stepping grows as \({\mathcal {O}}({N_\psi }\ln ^2({N_\psi }))\). Algorithm 2, however, requires \({\mathcal {O}}({N_\psi }^2)\) operations. For small numbers \({N_s}\), this makes up only a small fraction of the total runtime, since Algorithm 2 only needs to be run once for repeated simulations on the same geometry (e.g., parameter studies). For a large number of sites, however, computing the structure of the Hamiltonian may become the bottleneck (for 14 sites and half-filling, it takes about one day on 16 cores in parallel). To overcome this issue, one may use the alternative algorithm described in Sect. 3.3, which has almost linear complexity.

Memory can also become an issue for more than 14 sites, as can be seen from Table 1. For 16 sites, the upper bound for the memory needed to store the Hamiltonian is 680 GB. However, for typical geometries, many elements of v are zero, so that the effective memory consumption is about one-fourth of the upper bound.

Fig. 3

Errors (left scale) and expectation values of the double occupation operators (right scale) for the time evolution of the ground state of a 12-site time-independent chain with \(U=4\), \(v_{ij}=1\) for NN sites i and j, and half-filling. The gray lines correspond to the double occupation at individual sites; the black line marks their mean value. Note that there are only 6 light gray lines, because the chain geometry is symmetric with respect to its center (we use open boundary conditions). The farther a site is from the center of the chain, the lower its double occupation in the ground state

3.9 Benchmarking

For certain values of the parameters U and \(v_{ij}\), the eigenvalues of the Hamiltonian in Eq. (1) can be computed analytically. They were compared to the numerically obtained values, and agreement within machine precision was found. We also compared the eigenvalues of Hamiltonians that arise from different geometries for which the physics should be the same. Again, no significant deviation was found.

For the time-stepping algorithm, we tested whether stationary systems are described correctly. Figure 3 shows the double occupation and its error, as well as the error in the energy (the difference between expected and obtained values), as functions of time for a time-independent Hamiltonian. The time evolution was started from the ground state. From Fig. 3, we can infer that the ground state is indeed an eigenstate and that the time evolution of such a state is correctly integrated by our algorithm. The computation of the Green’s function was benchmarked against an analytical calculation for a two-site system.

4 Results

In the following, we present results obtained for chain and box geometries with nearest-neighbor (NN) hopping and open boundary conditions (OBC). We always start the time evolution from the ground state of the system and choose it to be a Mott-insulator with \(n_{\uparrow }=n_{\downarrow }={N_s}/2\). The time step in the time-stepping algorithm is set to \(\tau =0.005\), with the unit of time being 1/energy. The unit of energy (as already introduced in Sect. 2) is the absolute value of the NN hopping, \(|v_{ij}|=1\).

4.1 Time evolution of double occupation

The effect of the light pulse described by f(t) in Eq. (4) on the electronic system depends mainly on the relation between the pulse frequency \(\omega \) and the size of the gap. In Fig. 4, we show the time evolution of the average double occupation and energy per site for an 8-site chain with \(U=4\) for different pulse frequencies. The size of the gap for \(U=4\) is approximately 2 (as can be seen in Fig. 5, where we show the equilibrium spectral functions obtained from the Lehmann representation for the same chain and different values of U). For frequencies well below the gap or well above the highest possible excitation energy, the electrons cannot be excited across the Mott gap and thus the system cannot absorb energy: almost no electron-hole pairs are generated, and the double occupation and energy stay the same after the pulse. For \(\omega =11\), energy is absorbed only for the duration of the pulse (there is still some spectral weight in the tails of the Hubbard bands, see Fig. 5), but this energy is returned to the pulse (similar deexcitation effects are described in detail within the Boltzmann equation approach in Ref. [18]).

For frequencies between 3.5 and 8.5, we observe an increase in the double occupation during the pulse; the increase is strongest for \(\omega =3.5\), which approximately matches the distance between the centers of the Hubbard bands. The energy is absorbed and transformed into potential energy by creating doubly occupied sites (electrons in the upper Hubbard band and holes in the lower Hubbard band). For \(\omega =8.5\), only the lowest-energy electrons can be excited into the range of the upper Hubbard band, where they occupy its high-energy part. Thus, only a few electrons are excited, but with high energies. This causes the double occupation to barely rise, whereas the energy rises by a moderate amount.

Fig. 4

Average double occupation and energy per site for a half-filled 8-site chain, \(U=4\), and different pulse frequencies \(\omega \). The light gray area represents the envelope of the light pulse with \(t_p=6\), \(\sigma =2\) and \(a=0.8\)

Fig. 5

Average local spectral function for an 8-site chain with half-filling for different Coulomb interaction U (\(\varepsilon =0.1\))

Fig. 6

Time evolution of the double occupation for a half-filled 14-site chain with \(U=3.5\) and pulse parameters \(\omega =\tfrac{7}{4}\pi \), \(t_p=6\), \(\sigma =2\), and \(a=0.8\). Here, the colored lines represent the double occupation of the individual sites; the black line represents the average value. Sites at the same distance from the center of the chain share the same color. The vertical dashed lines mark the times at which snapshots are shown in Fig. 8

Fig. 7

The dependence of the frequency \(\varOmega \) of the double occupation oscillations on the length of the chain \({N_s}\). The gray line is an \(\alpha /{N_s}\) fit

Fig. 8

Snapshots of the time evolution depicted in Fig. 6 at different times. The darkest lines mark the value of the double occupation at the specified times \(t=6.1,\;17.2,\;18.9\); the other lines show the values of the three previous time steps with \(\varDelta t=0.05\), increasing in saturation and darkness with time, creating the effect of the values leaving a trace. The first snapshot is taken at the phase of steepest ascent of the double occupation during the pulse. The second and third snapshots are taken at a local maximum and minimum, respectively, of the total double occupation after the pulse

We see that, as a function of time, the double occupation oscillates with two distinct frequencies. This is even more visible in the site-resolved double occupation presented in Fig. 6 for a 14-site chain. The high-frequency oscillation matches the light pulse frequency \(\omega \); the oscillation at one site is typically compensated by another site oscillating in opposite phase. The lower frequency (\(\varOmega \)) is found to be inversely proportional to the length of the chain \({N_s}\) (see Fig. 7). It can be attributed to doublon and holon movements through the chain, which leave the overall number of doublons nearly constant. The site-averaged double occupation is almost constant in time after the pulse; we see only a slight oscillation, barely visible in Fig. 6.

In Fig. 8, we show the values of the double occupation along the chain at three different times: \(t=6.1\), during the steep rise of the double occupation during the pulse; \(t=17.2\), after the pulse at a local maximum of the total double occupation; and \(t=18.9\), at a local minimum. Initially, at \(t=6.1\), an alternating pattern is visible, with the double occupation rising on every second site. The contribution from states where electrons ’jump to the left’, creating a doublon and leaving a hole behind, is bigger than from states where electrons leave the doubly occupied sites, since in the ground state the double occupation is small. The rightmost site is different in this respect, since with the OBC there is no site to its right from which an electron could hop. At later times, this alternating pattern is replaced by a longer-range oscillation in space, corresponding to doublon and holon movements along the chain. The boundary sites remain different, with significantly lower double occupation due to the OBC.

Fig. 9

Local, site-averaged \(A^{<}(\nu ,t)\) (upper row) and \(A(\nu ,t)\) (lower row) for different times \(t = 8,\; 12,\; 20,\; 30\), and the equilibrium spectral function from the Lehmann representation (gray), for an 8-site chain with \(U=6\), pulse frequency \(\omega =9\), and two different strengths of the EM field: \(a=0.2\) (left column) and \(a=0.6\) (right column). The Gaussian envelope of the pulse is centered at \(t_p = 8\) with width \(\sigma = 2\)

4.2 Nonequilibrium spectral function

The functions \(A^<(\nu ,t)\) (obtained from the lesser Green’s function) and the spectral functions \(A(\nu ,t)\) shown in Figs. 9, 10 and 11 are calculated from Eq. (37) with broadening \(\varepsilon =0.1\) and \(T_\mathrm{max}\approx 80\). They are all local and site-averaged. Additionally, we show the equilibrium spectral function in the ground state (which is our initial state at \(t=0\)), calculated from the Lehmann representation (34) with the same broadening \(\varepsilon =0.1\).

In Fig. 9, we show \(A^<(\nu ,t)\) (upper row) and \(A(\nu ,t)\) (lower row) for an 8-site chain with \(U=6\) and pulse frequency \(\omega =9\) at different times during (\(t=8\)) and after the pulse. At \(t=0\), we start from the ground state, where \(A^{<}\) does not have any spectral weight above \(\nu =0\). For the smaller pulse strength \(a=0.2\) (left plots), we see only a few photo-induced excitations into the upper Hubbard band in \(A^{<}\), and the overall spectrum remains almost unchanged. Increasing the pulse strength to \(a=0.6\) leads to a stronger redistribution of the spectral weight. This effect, known as photo-doping of the Mott-insulator, has already been found in the Hubbard model with other methods: nonequilibrium DMFT [16, 19] and the quantum Boltzmann equation [18]. At later times, there is also an additional spectral weight shift inside both the lower and the upper Hubbard band, which corresponds to the first step of thermalization [18]. In the corresponding full spectral functions \(A(\nu ,t)\), we additionally see that spectral weight shifts into the Mott gap, causing a gap reduction (photo-melting). Such gap filling is also seen in the nonequilibrium DMFT studies [8, 16], but is missed by the quantum Boltzmann approach [18]. Both effects have also been reported in Refs. [34, 35].

The gap filling is stronger if we choose the smaller pulse frequency \(\omega =6\), which connects points of higher spectral weight than \(\omega =9\) does (as already noted in the discussion of Fig. 4, more energy can then be pumped into the system at the same pulse intensity). The spectral function and \(A^<\) of the same 8-site chain, but with \(\omega =6\), are shown in Fig. 10. Already for \(a=0.2\), there is a significant amount of spectral weight in the upper Hubbard band (upper left plot of Fig. 10), and we also see a slight gap filling. The shift of spectral weight into the gap already at the small pulse intensity \(a=0.2\) is even more pronounced for the \(4\times 2\) box geometry (see Fig. 11).

Fig. 10

The same as in Fig. 9 but for the pulse frequency \(\omega =6\)

Fig. 11

The same as in Fig. 10, but for the \(4\times 2\) box geometry

For both the \(8\times 1\) chain and the \(4\times 2\) box, increasing the pulse strength to \(a=0.6\) causes a significant redistribution of spectral weight. Although both systems absorb approximately the same amount of energy for \(a=0.6\) (cf. Fig. 12, where we show the double occupation and energy for different pulse strengths as functions of time), the gap filling is much stronger for the box geometry. In both geometries, there is more spectral weight in the gap at \(t=8\), i.e., during the pulse, than at later times. The systems initially absorb more energy (particularly the 8-site chain), but this energy cannot be stored and is returned to the pulse (cf. Fig. 12 and Ref. [18]). From Fig. 12, we also learn that further increasing the pulse strength a does not lead to a further increase in the double occupation at later times. The initial increase in energy and double occupation grows with the pulse strength, in the case of the chain even beyond the maximal equilibrium value of \(\langle d \rangle =0.25\), but at later times the spectral weight is redistributed and the double occupation is reduced. When we look at the maximal values of energy and double occupation at a later time after the pulse (e.g., \(t=20\)), the chain and box geometries do not differ significantly. The chain, however, can initially absorb more energy, and its rise in double occupation is bigger. This is likely related to the specific properties of the spectra, i.e., the distribution of the available states on the \(\nu \)-axis (the bandwidth is similar in both geometries). For the same reason, the increase in double occupation and energy at \(a=0.4\) also differs between the two systems.

Fig. 12

Average double occupation (upper plots) and energy (lower plots) per site for different pulse intensities a for the two geometries studied, the \(8\times 1\) chain (left) and the \(4\times 2\) box (right), with \(\omega =6\). Other parameters as in Figs. 10 and 11

5 Summary and outlook

We have presented a simple implementation scheme for solving the time-dependent Schrödinger equation for systems described by the Hubbard Hamiltonian with time-dependent hoppings. As an example application, we have shown the detailed time dependence of the double occupation after applying a light pulse to a 14-site chain with open boundary conditions. We have further studied the photo-induced doping and gap filling of a Mott-insulator and found similar behavior for 8-site clusters with chain and box geometries with open boundary conditions. The 8-site clusters are certainly too small to identify differences that could come from dimensionality (1d vs 2d), but it is an interesting future question whether the small chain/box differences shown here deepen and become more characteristic for larger systems.

The algorithms presented here are flexible and allow for arbitrary geometries, open and periodic boundary conditions, as well as for calculating any correlation function that can be built from creation and annihilation operators. Larger cluster sizes can become possible if one avoids storing the matrix elements of the Hamiltonian explicitly and instead generates them during the computation. The presented implementation allows for this change, since only matrix-vector multiplications are needed; these can be replaced by operators that act on the vector directly, without storing the Hamiltonian in matrix form.