Electron-light interaction in nonequilibrium: exact diagonalization for time-dependent Hubbard Hamiltonians

We present a straightforward implementation scheme for solving the time-dependent Schrödinger equation for systems described by the Hubbard Hamiltonian with time-dependent hoppings. The computations can be performed for clusters of up to 14 sites with, in principle, general geometry. For the time evolution, we use the exponential midpoint rule, where the exponentials are computed via a Krylov subspace method, which only uses matrix-vector multiplication. The presented implementation uses standard libraries for constructing sparse matrices and for linear algebra. Therefore, the approach is easy to use on both desktop computers and computational clusters. We apply the method to calculate the time evolution of the double occupation and the nonequilibrium spectral function of a photo-excited Mott-insulator. The results show that not only does the double occupation increase due to the creation of electron-hole pairs, but the Mott gap also becomes partially filled.


Introduction
Photo-induced states of matter gain increasing attention for their exotic properties [1-6] and possible applications, e.g., in the context of energy conversion [7,8]. The description of these states necessitates nonequilibrium approaches, which are particularly demanding in cases where light brings a strongly correlated electronic system out of equilibrium. Approximate theoretical approaches to correlated systems are being successfully adapted to treat systems out of equilibrium (e.g., nonequilibrium dynamical mean-field theory (DMFT) [9], dynamical cluster approximation [10], auxiliary master equation approach [11], GW [12]). The numerically exact approaches, exact diagonalization (ED) [13] or density-matrix renormalization group [5,14], where the error can be systematically controlled, are still limited to relatively small system sizes or short times [15]. They are, however, invaluable for benchmarking sophisticated approximate methods. The purpose of this paper is to present a straightforward implementation of the ED method using well-known data formats and algorithms in order to employ highly optimized libraries. The method currently allows for calculations with up to 14 sites.
We specifically focus on the application of the method to calculate electronic properties of a system that is described by a time-dependent Hubbard Hamiltonian. The time dependence is introduced by coupling of the electronic system to an electromagnetic (EM) field pulse. The EM field is treated classically and enters the hoppings via Peierls' substitution. We apply the method to study the time evolution of a Mott-insulator after interaction with a light-pulse. By calculating the double occupation and the nonequilibrium spectral function, we show photo-doping of the original Mott-insulator [16,17] as well as filling of the Mott gap [8,18-20].
E-mail: kauch@ifp.tuwien.ac.at (corresponding author)
The paper is organized as follows. In Sect. 2, we introduce the Hamiltonian of the Hubbard model with Peierls' substitution, the observables of our interest, as well as notation and units. In Sect. 3, we give a detailed description of data formats and the time stepping algorithm as well as how observables are practically calculated. In Sect. 4, we present the time evolution of double occupation and nonequilibrium spectral function to illustrate the application of the method. In Sect. 5, we give a short summary and outlook.

Hubbard model
We focus on the paradigm model for strongly correlated electrons, the Hubbard model [21], given by the following Hamiltonian:

Ĥ = Σ_{i,j,σ} v_{ji} ĉ†_{iσ} ĉ_{jσ} + U Σ_i n̂_{i↑} n̂_{i↓},   (1)

where v_{ji} describes the relative probability amplitude of an electron hopping from site j to i without change of spin; U > 0 is the on-site Coulomb repulsion between two electrons if they reside at the same site (with opposite spins); ĉ†_{iσ} (ĉ_{iσ}) denote the fermionic creation (annihilation) operators at site i with spin σ, and n̂_{iσ} = ĉ†_{iσ} ĉ_{iσ} is the occupation number operator (for details on the second quantization formalism cf. [22]).
In the following, we restrict our considerations to finite size systems of N s ∈ N sites with hoppings explicitly given by a hopping matrix v = (v i j ) N s i, j=1 . The hopping matrix can be arbitrary, i.e., we can allow for finite hoppings between any two sites. This is where the geometry of the studied system is encoded and where also periodic boundary conditions can be introduced. Lattices of arbitrary dimension and shape can be studied with this approach. Additionally, we can introduce on-site potentials, which can be added as diagonal elements v ii of the hopping matrix. Figure 1 illustrates a 2 × 3 box geometry with open boundary conditions.
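Encoding a geometry then amounts to filling the hopping matrix v. The following sketch (function name and site ordering are our own illustrative choices, not part of the paper's code) builds v for an nx × ny box with NN hopping and open boundary conditions, such as the 2 × 3 box of Fig. 1:

```python
import numpy as np

def hopping_matrix_box(nx, ny, v=1.0):
    """NN hopping matrix for an nx-by-ny box with open boundary conditions.

    Site (x, y) is flattened to index x + nx * y; only nearest-neighbor
    entries are non-zero. On-site potentials could be placed on the diagonal.
    """
    ns = nx * ny
    hop = np.zeros((ns, ns))
    for y in range(ny):
        for x in range(nx):
            i = x + nx * y
            if x + 1 < nx:                 # horizontal neighbor
                j = (x + 1) + nx * y
                hop[i, j] = hop[j, i] = v
            if y + 1 < ny:                 # vertical neighbor
                j = x + nx * (y + 1)
                hop[i, j] = hop[j, i] = v
    return hop

v = hopping_matrix_box(2, 3)
```

Periodic boundary conditions or longer-range hoppings would simply set additional entries of the same matrix.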

Time-dependent electron-light interaction
The interaction of electrons with light puts the system out of equilibrium. Here, the light is modeled as a classical electric field pulse

E(t) = E_0 sin(ω(t − t_p)) e^{−(t−t_p)²/(2σ²)}   (2)

of width σ, peaked around the time t_p, and with frequency ω. We set the units of frequency equal to the units of energy (ħ ≡ 1); the unit of time is then the inverse of the unit of energy. The EM field is included in the Hubbard Hamiltonian using Peierls' substitution [23], which adds a time dependence to the hoppings:

v_{ij}(t) = v_{ij} exp( −i ∫_{R_j}^{R_i} A(r, t) · dr ).   (3)

We use a gauge where the scalar potential vanishes and E = ∂_t A(t). In general, the result of the integral in Eq. (3) will depend on the direction of the E-field and on the relative position of sites i and j. The time-dependent phase factor must then be defined for each pair of sites i and j separately.
In the simpler case of only nearest-neighbor (NN) hopping and box geometry, the integral will only depend on whether the hopping between NN sites i and j is in the horizontal or vertical direction. By choosing the E-field direction to be diagonal with respect to the box, we can describe the time dependence by only one function f(t) for each non-zero element of the hopping matrix v. For the sake of simplicity, we further approximate the integral in Eq. (3) to arrive at

v_{ij}(t) = v_{ij} f(t),  f(t) = exp{ i [ a cos(ω(t − t_p)) e^{−(t−t_p)²/(2σ²)} + b ] },   (4)

with dimensionless parameters a and b. The parameter a describes the strength of the EM field, whereas b can be used to set the initial phase factor of the hoppings to 1. Please note that the Peierls substitution introduces only a phase factor to the hoppings and does not change their absolute value. For all the results presented, the NN hoppings are set to have equal absolute value, and this value is used as the unit of energy, i.e., |v_{ij}| = 1.
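The pulse of Eq. (2) and the resulting pure-phase factor f(t) are straightforward to evaluate numerically. The sketch below is hedged: the exact sign and normalization conventions of the approximated phase in Eq. (4) are an illustrative guess, chosen only so that |f(t)| = 1 always and f(t) → 1 away from the pulse (here with b = 0):

```python
import numpy as np

def pulse(t, e0=1.0, omega=4.0, tp=0.0, sigma=2.0):
    """Classical electric field pulse of Eq. (2)."""
    return e0 * np.sin(omega * (t - tp)) * np.exp(-(t - tp) ** 2 / (2 * sigma ** 2))

def peierls_phase(t, a=0.25, b=0.0, omega=4.0, tp=0.0, sigma=2.0):
    """Pure phase factor f(t): its argument follows the (approximate)
    time integral of the pulse; conventions are an assumption here."""
    arg = a * np.cos(omega * (t - tp)) * np.exp(-(t - tp) ** 2 / (2 * sigma ** 2)) + b
    return np.exp(1j * arg)
```

Since f(t) is a phase, the absolute values |v_{ij}| are indeed unchanged for all t.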

Symmetries of the Hamiltonian
Allowing for at most two electrons (with different spins) per site, a state of the system can be represented by the state vector |ψ⟩ = |n_{1↑} n_{1↓} n_{2↑} ... n_{N_s↓}⟩, where n_{iσ} ∈ {0, 1} is the number of electrons with spin σ at site i. All states of this form are orthonormal and form an abstract Hilbert space which we denote by H(N_s). The subspace of all states with n_↑ electrons with spin up and n_↓ electrons with spin down is denoted by H_{n↑n↓}(N_s), such that

H(N_s) = ⊕_{n↑,n↓=0}^{N_s} H_{n↑n↓}(N_s).   (5)

Any state in H(N_s) can be seen as an excitation of the vacuum state:

|n_{1↑} n_{1↓} ... n_{N_s↓}⟩ = (ĉ†_{1↑})^{n_{1↑}} (ĉ†_{1↓})^{n_{1↓}} ··· (ĉ†_{N_s↓})^{n_{N_s↓}} |0⟩,   (6)

where the action of the fermionic creation and annihilation operators ĉ† and ĉ on a particular state is given by

ĉ†_{iσ} |n_{1↑} ... n_{iσ} ...⟩ = (1 − n_{iσ}) (−1)^{c_{iσ}} |n_{1↑} ... (n_{iσ} + 1) ...⟩,   (7)
ĉ_{iσ} |n_{1↑} ... n_{iσ} ...⟩ = n_{iσ} (−1)^{c_{iσ}} |n_{1↑} ... (n_{iσ} − 1) ...⟩,   (8)

respectively. Here, c_{iσ} is the number of electrons present in the state up to the iσ-entry. This is due to the fermionic anticommutation relations, which cause switching the order of two adjacent operators to result in an additional negative sign. Note that the definition (6) is not consistent throughout the literature. Finally, the action of the number operator n̂_{iσ} = ĉ†_{iσ} ĉ_{iσ} is given by the equation

n̂_{iσ} |n_{1↑} ... n_{iσ} ...⟩ = n_{iσ} |n_{1↑} ... n_{iσ} ...⟩,   (9)

i.e., this operator counts the number of electrons on site i with spin σ. Because the Hubbard Hamiltonian commutes with the occupation number and spin operators, the number of electrons with spin σ, Σ_i n̂_{iσ}, is invariant under the Hamiltonian in Eq. (1). This means that, in the basis of all states in the Hilbert space H(N_s), the Hamiltonian takes a block-diagonal form, according to the direct sum in Eq. (5). Our implementation exploits this block-diagonal form and generates the Hamiltonian only in the requested subspace H_{n↑n↓}(N_s), with arbitrary N_s, n_↑ and n_↓. Since we are interested in Mott-insulators, we take the system to be half-filled, i.e., n_↑ = n_↓ = N_s/2, and the total spin is zero.

Time evolution and observables
The Hubbard Hamiltonian with time-dependent hoppings is a time-dependent Hermitian operator that describes the evolution of a state |ψ_0⟩ ∈ H(N_s) in terms of the Schrödinger equation

i ∂_t |ψ(t)⟩ = Ĥ(t) |ψ(t)⟩,  |ψ(0)⟩ = |ψ_0⟩.   (10)

Exact diagonalization means that Eq. (10) is solved over the finite-dimensional Hilbert space H(N_s), which yields a large system of ordinary differential equations. The exact solution is given by

|ψ(t)⟩ = T exp( −i ∫_0^t Ĥ(t') dt' ) |ψ_0⟩,   (11)

where T is the time ordering operator [22]. Once the state of the system at time t, |ψ(t)⟩, is known, the expectation value of an observable Ô can be calculated directly through

⟨Ô⟩(t) = ⟨ψ(t)| Ô |ψ(t)⟩.   (12)

Specifically, we are interested in the (time-dependent) average double occupation per site

d(t) = (1/N_s) Σ_i ⟨d̂_i⟩(t),  with d̂_i = n̂_{i↑} n̂_{i↓},   (13)

and the average energy per site

E(t) = (1/N_s) ⟨Ĥ(t)⟩(t).   (14)

In the following, we drop the explicit time dependencies if it is clear from context. To numerically obtain these quantities of interest, one must assemble a matrix representation of Ĥ, compute the ground state, and then carry out a time stepping before building the expectation values. These tasks are computationally non-trivial, since the number of independent variables grows exponentially with the number of sites N_s. Note, however, that one can still treat the subspaces H_{n↑n↓}(N_s) separately, because the time-dependent Hamiltonian Ĥ(t) commutes with Σ_i n̂_{iσ} (i.e., it preserves the number of electrons).

Nonequilibrium spectral function
The time-stepping algorithm also allows for the calculation of double-time correlation functions. For example, the nonequilibrium Green's functions G^< and G^> are obtained through [9]

G^<_{ijσ}(t, t') = i ⟨ψ_0| ĉ†_{jσ}(t') ĉ_{iσ}(t) |ψ_0⟩,
G^>_{ijσ}(t, t') = −i ⟨ψ_0| ĉ_{iσ}(t) ĉ†_{jσ}(t') |ψ_0⟩,   (15)

where the Heisenberg operators contain the time-ordered evolution of Eq. (11) and |ψ(t)⟩ is the solution of Eq. (10). In order to obtain the correlation functions in Eq. (15), we must apply the annihilation (creation) operator ĉ_{iσ} (ĉ†_{jσ}) to |ψ(t)⟩ and then time-evolve the resulting state according to Eq. (10) in the subspace with one electron less (more), act again with ĉ†_{jσ} (ĉ_{iσ}) on the result, and finally build the expectation value.
The nonequilibrium spectral function [9] A(ν, t) = A^<(ν, t) + A^>(ν, t) can then be obtained after performing a forward Fourier transform of G^≶(t, t') [19]:

A^≶(ν, t) = ± (1/π) Im ∫_0^∞ dt_rel e^{iν t_rel} G^≶(t + t_rel, t),   (16)

with t_rel the relative time between the two time arguments (we omitted spin and site indices for simplicity).

Lehmann representation In equilibrium, the spectral function must remain time-independent and can be benchmarked against the Lehmann representation. The site-averaged spectral function is then given by

A(ν) = (1/N_s) Σ_{iσ} Σ_{|φ⟩} [ |⟨φ| ĉ†_{iσ} |ψ_0⟩|² δ(ν − (E_{|φ⟩} − E_0)) + |⟨φ| ĉ_{iσ} |ψ_0⟩|² δ(ν + (E_{|φ⟩} − E_0)) ].   (17)

Here, {|φ⟩} is an eigenbasis of H(N_s) with respective energy eigenvalues E_{|φ⟩}, and |ψ_0⟩ is the ground state of Ĥ with energy E_0.

Implementation
The aim of the present paper is to provide efficient data structures and algorithms for assembling matrix representations of Hubbard Hamiltonians for arbitrary problems of the type introduced above, as well as a simple time-stepping algorithm for solving the arising time-dependent Schrödinger equation, Eq. (10). In the following, we present the key points of the implementation, including the Krylov subspace method of [24], which is an essential part of the time-stepping algorithm.

Discrete basis of subspace
Number of states Due to spin-up and spin-down electrons being independent, the subspace factorizes as H_{n↑n↓}(N_s) = H_{n↑0}(N_s) ⊗ H_{0n↓}(N_s). The problem of how to place n_↑ (n_↓) electrons on N_s sites is well known in combinatorics. This leads to

N_ψ = (N_s choose n_↑) (N_s choose n_↓).   (18)

Note that N_ψ = N_ψ(n_↑, n_↓) takes its maximum for n_↑ = n_↓ = N_s/2. For the number of all states, there holds

Σ_{n↑,n↓=0}^{N_s} N_ψ(n_↑, n_↓) = 2^{2N_s} = 4^{N_s},   (19)

since there are 2N_s vacancies in the lattice that can be occupied or not. This suggests that the general computational effort for assembling a Hamiltonian and for time-stepping on a system with N_s sites scales at least like O(4^{N_s}). For different N_s, the value of N_ψ is shown in Table 1.
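The subspace dimensions above are products of binomial coefficients and can be checked directly (the function name below is our own):

```python
from math import comb

def n_states(ns, n_up, n_dn):
    """Dimension of the subspace with n_up spin-up and n_dn spin-down
    electrons on ns sites: a product of two binomial coefficients."""
    return comb(ns, n_up) * comb(ns, n_dn)

# Half filling maximizes the subspace dimension; summing over all
# fillings recovers the full Hilbert-space dimension 4**ns.
total = sum(n_states(4, nu, nd) for nu in range(5) for nd in range(5))
```

For 14 sites at half filling this already gives 3432² ≈ 1.2 · 10⁷ basis states, illustrating the exponential growth.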
State representation On a computer, the states can be represented by 2N s bits, which can be stored internally in integers of sufficient size. All actions like hopping, creation, and annihilation of electrons can then be implemented as bitwise operations.
To obtain all states that constitute a basis B of H n ↑ n ↓ (N s ) for fixed numbers n ↑ , n ↓ , and N s , the hopping of electrons is emulated. Starting from an initial state with the right number of electrons, all other states can be obtained by repeated hopping (i.e., flipping two bits). Due to the independence of spin-up and spin-down electrons, we can treat H 0 n ↓ (N s ) and H n ↑ 0 (N s ) separately and get all states from building the tensor-product of the respective bases B ↑ and B ↓ . Therefore, we restrict the presentation to the case of spin-up electrons in the following.
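The integer-coded state representation and the bitwise emulation of hopping can be sketched as follows; the bit layout (bit 2i + s for site i and spin s) is one illustrative choice among several possible ones:

```python
def create(state, bit):
    """Set an occupation bit; return None if already occupied (Pauli)."""
    if (state >> bit) & 1:
        return None
    return state | (1 << bit)

def hop(state, frm, to):
    """Move an electron between spin-orbitals frm and to (bit indices);
    returns None if the move is Pauli-forbidden."""
    if not (state >> frm) & 1 or (state >> to) & 1:
        return None
    return state ^ (1 << frm) ^ (1 << to)

# a state of ns sites occupies 2*ns bits: bit (2*i + s) holds the
# occupation of site i with spin s (0 = up, 1 = down)
state = 0
state = create(state, 2 * 0 + 0)   # up electron on site 0
state = create(state, 2 * 2 + 1)   # down electron on site 2
```

All operators needed later (hopping terms, creation, annihilation, occupation counting) reduce to such constant-time bit manipulations.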
A multi-index α ∈ {1, ..., N_s}^{n_↑} can be used to represent the (ordered) positions of electrons on the sites, i.e., α_i = j means that the i-th electron resides at site j. From Pauli's principle, we see that

α_1 < α_2 < ... < α_{n_↑}.   (20)

For such multi-indices, one can define a total ordering by lexicographic comparison of the entries,

α < β  :⇔  α_i < β_i for the first index i at which α and β differ.   (21)

This gives a natural meaning to increasing α by one. From Eq. (20), we see further that the smallest admissible multi-index satisfies α_i = i and the largest satisfies α_i = N_s − n_↑ + i for all i = 1, ..., n_↑. By iterating over all multi-indices subject to the limitation posed by Eq. (20), one obtains all possible configurations of electrons. This is shown in pseudo-code in Algorithm 1.
As is apparent from Eq. (19), even for small N_s the matrix representation of the Hamiltonian for most electron configurations requires vast amounts of memory if implemented as a double-precision complex matrix. Due to the limited overlap of states, many elements of the matrix representation are zero. Utilizing this fact allows for using a well-known sparse matrix format, resulting in a much more memory-efficient implementation.

Algorithm 1 Generating states with only spin-up excitations
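In Python, the iteration over ordered multi-indices described above can be sketched as follows (a re-implementation of the idea, not the paper's own listing):

```python
def all_configurations(ns, n):
    """Generate all ordered multi-indices alpha with
    1 <= alpha_1 < ... < alpha_n <= ns by repeatedly moving the
    rightmost movable electron, in the spirit of Algorithm 1."""
    alpha = list(range(1, n + 1))              # smallest admissible index
    while True:
        yield tuple(alpha)
        # find the rightmost entry that can still be increased
        i = n - 1
        while i >= 0 and alpha[i] == ns - n + 1 + i:
            i -= 1
        if i < 0:
            return                              # largest index reached
        alpha[i] += 1
        for j in range(i + 1, n):               # reset the tail
            alpha[j] = alpha[j - 1] + 1

states = list(all_configurations(4, 2))
```

The output order coincides with that of itertools.combinations(range(1, ns + 1), n); writing the loop explicitly mirrors the pseudo-code structure of Algorithm 1.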
Non-zero elements Consider fixed numbers N s , n ↑ , and n ↓ . Conservation of electron number implies that two states in H n ↑ n ↓ (N s ) can only differ by an even number of entries. Furthermore, hopping between more than two sites is not accounted for by the Hubbard Hamiltonian in Eq. (1). This leaves only two cases that give a non-zero contribution.
First, a state differs from itself by zero entries, which gives a contribution to the diagonal of the Hamiltonian. Second, one of the n_↑ (n_↓) spin-up (spin-down) electrons can hop to one of the N_s − n_↑ (N_s − n_↓) unoccupied sites, creating a state differing in exactly two entries. There are n_↑(N_s − n_↑) (n_↓(N_s − n_↓)) possibilities for that process, each giving an off-diagonal non-zero contribution to the Hamiltonian. Due to the hermiticity of the Hamiltonian, only its upper triangular part, which consists of the diagonal and half of all off-diagonal non-zero elements, yields non-redundant information. Thus, the number of non-zero elements evaluates to

N_nz = N_ψ ( 1 + [n_↑(N_s − n_↑) + n_↓(N_s − n_↓)] / 2 ).   (22)

Note that Eq. (22) is only a worst-case result. If some of the coefficients U, v_{ij}, or combinations thereof are zero, this further reduces the number of non-trivial entries of the Hamiltonian's matrix representation. For the case of half-filling, we get N_nz = N_ψ (1 + N_s²/4). Together with N_ψ = O(4^{N_s}), this means that the non-zero elements can be stored with O(N_ψ ln²(N_ψ)) memory, which is nearly linear, as opposed to quadratic memory O(N_ψ²) for dense matrices.

CSR-format
In the light of the previous considerations, the most suitable storage format for the matrix representation H of the Hamiltonian is the Compressed-Sparse-Row (CSR) format [25]. This format stores only the non-zero elements and their positions within the matrix. For H ∈ C^{N_ψ×N_ψ}, it consists of three arrays: an array V of length N_nz holding the values of the non-zero elements row by row, an array J of length N_nz holding the corresponding column indices, and an array I of length N_ψ + 1 holding, for each row, the position of its first non-zero element within V. If H_{ij} is the k-th non-zero element of H, it can thus be accessed via V_k, and there holds j = J_k as well as I_i ≤ k < I_{i+1}. Only the upper triangular part of H is considered and only its non-zero elements are stored; if a row i has no non-zero element above the diagonal, then I_i = I_{i+1}. A comparison of the memory requirement between the naive (dense) and sparse (CSR) implementations of the Hamiltonian matrix for the worst case (half-filling) is shown in Table 1. Due to the small number of elements that have to be stored, the addition of matrices with the same sparsity structure can be carried out efficiently by adding the V arrays of both matrices. Furthermore, because of the row-wise storage of the matrix, the CSR format is well suited for matrix-vector multiplication. Both operations can be done in O(N_nz) operations. The drawbacks of this format, however, lie in element access, for which a linear search of the J array must be carried out, and in changing the sparsity structure (i.e., setting a former zero element to a value other than zero), in which case all three arrays must be altered and possibly reallocated. This is avoided in our implementation.
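The CSR layout and its O(N_nz) matrix-vector product can be illustrated with a small example (0-based indices here, while the text uses 1-based ones; the example matrix is our own):

```python
import numpy as np

# Upper-triangular part of the example matrix
#   H = [[4, 1, 0],
#        [0, 4, 0],
#        [0, 0, 4]]
V = np.array([4.0, 1.0, 4.0, 4.0])   # non-zero values, row by row
J = np.array([0, 1, 1, 2])           # column index of each value
I = np.array([0, 2, 3, 4])           # row i occupies V[I[i]:I[i+1]]

def csr_matvec(V, J, I, x):
    """y = H x, touching each stored element exactly once: O(N_nz)."""
    y = np.zeros_like(x)
    for i in range(len(I) - 1):
        for k in range(I[i], I[i + 1]):
            y[i] += V[k] * x[J[k]]
    return y

y = csr_matvec(V, J, I, np.ones(3))
```

Production code would use a library CSR type (e.g. scipy.sparse.csr_matrix, where the three arrays are called data, indices, and indptr) rather than a Python loop.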

Time-dependent Hamiltonian
We assume the hopping amplitudes to be time-dependent in the following way. We consider a Hermitian matrix v^Re ∈ C^{N_s×N_s} and an anti-Hermitian matrix v^Im ∈ C^{N_s×N_s}, as well as a phase factor f(t) ∈ C whose time dependence vanishes for large times, i.e., |f(t)| = 1 and f(t) → 1 as t → ∞. For each hopping pair (i, j), we can decide if the corresponding hopping amplitude should explicitly depend on time or not. Then, the time-dependent hopping amplitudes read

v_{ij}(t) = Re(f(t)) v^Re_{ij} + Im(f(t)) v^Im_{ij}   (24)

if the hopping between sites i and j is time-dependent, and v_{ij}(t) = v^Re_{ij} otherwise. Note that this definition renders the matrix v(t) Hermitian for all times t. The function f(t) in Eq. (24) can, e.g., describe the EM pulse as in Eq. (4). By separating time-dependent and time-independent parts of the Hamiltonian according to Eq. (24), the full time-dependent matrix representation can be written as

H(t) = H^(stat) + Re(f(t)) H^(Re) + Im(f(t)) H^(Im).   (25)

Here, the matrix H^(stat) includes all time-independent contributions to the Hamiltonian. These are the Coulomb interaction U and the hopping amplitudes v^Re_{ij} if the hopping between sites i and j is modeled as time-independent. The matrices H^(Re) and H^(Im) include all hopping amplitudes v^Re_{ij} and v^Im_{ij}, respectively, which are modeled as time-dependent. Due to the function f(t) converging to one at large times t, the Hamiltonian for such t is H(t) = H^(stat) + H^(Re), which describes the system in equilibrium.
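The point of this decomposition is that the three matrices are assembled once and only recombined with two scalar coefficients at every time step. A minimal sketch, with small dense 2 × 2 stand-ins for the CSR matrices and a toy pure-phase factor of our own choosing:

```python
import numpy as np

h_stat = np.diag([2.0, -2.0]).astype(complex)            # interaction part
h_re = np.array([[0, 1], [1, 0]], dtype=complex)          # Hermitian
h_im = 1j * np.array([[0, 1], [-1, 0]], dtype=complex)    # i * anti-Hermitian

def f(t):
    """Toy pure-phase factor with f -> 1 for large t (stand-in for Eq. (4))."""
    return np.exp(1j * np.exp(-t * t))

def hamiltonian_at(t):
    """Recombine the pre-assembled parts with two real scalars per step."""
    ft = f(t)
    return h_stat + ft.real * h_re + ft.imag * h_im
```

With matrices sharing one sparsity structure, this recombination is just an addition of the V arrays, as noted in the CSR discussion above.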
We suppose that the full Hamiltonian H(t) as well as H^(stat), H^(Re), and H^(Im) can be described by only one pair of index arrays I and J. The assembly of this structure is shown in Algorithm 2. Because of the nested for-loops, this costs O(N_ψ²) operations and is the only operation in our code that has quadratic complexity. However, the assembly is done without knowledge of either U or v, so the structure is independent of the interaction between sites and hence of the geometry. Therefore, the structure for a specific set of N_s, n_↑, and n_↓ only needs to be computed once (which can be done in parallel); cf. the discussion in Sect. 3.8.
To improve the complexity of Algorithm 2, one needs to avoid the innermost for-loop, which can be done in the following way. For every state |ψ_i⟩, instead of comparing it to every other state in B, one can simulate the hopping of electrons as is done in Algorithm 1. Thus, one obtains all states that differ from |ψ_i⟩ by exactly two entries, i.e., all states that interact with |ψ_i⟩ and give a possibly non-zero contribution to the Hamiltonian, resulting in an overall cost of O(N_nz). To achieve linear complexity in N_nz, however, it is crucial that finding the position in B of a given state can be done in constant time, e.g., by a suitable hash function as described in Sect. 3.7.

Algorithm 2 Assembling the index arrays I and J
1: k = 1
2: for i = 1, ..., N_ψ do
3:   I_i = k
4:   for j = i, ..., N_ψ do
5:     count the entries in which |ψ_i⟩ and |ψ_j⟩ differ
6:     if |ψ_i⟩ and |ψ_j⟩ differ by 2 entries or less then
7:       J_k = j
8:       k = k + 1
9:     end if
10:   end for
11: end for
12: I_{N_ψ+1} = k

Pre-assembling the structure allows for fast assembly of the Hamiltonian for specific coefficients U and v, which is shown in Algorithm 3. Note that the first entry in each row of the sparse representation of the Hamiltonian lies on the diagonal; thus, line 4 adds a diagonal contribution. Furthermore, as noted in Sect. 2.3, applying ĉ†_{iσ} ĉ_{jσ} to a state where hopping from jσ to iσ is possible results in a sign factor

(−1)^{δ(i,j,σ)},   (26)

where δ(i, j, σ) = |c_{iσ} − c_{jσ}| is the number of electrons that lie between the iσ and jσ entries and can be computed by a simple for-loop. This explains the signs in lines 8 and 10. It is apparent that the cost of Algorithm 3 is O(N_nz) and that the memory consumption of the Hamiltonian is proportional to N_nz. Upper bounds for the memory consumption for certain parameters N_s, n_↑, and n_↓ are shown in Table 1. To further reduce the memory consumption and computational effort for the time-stepping, one can carry out the assignments in lines 6-12 only if at least one of the contributions that would be assigned is non-vanishing. Afterwards, the structure can be updated to only account for the actual non-vanishing elements of the Hamiltonian.
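For integer-coded states, the counter δ(i, j, σ) of Eq. (26) does not even need a for-loop: it is a population count over the bits strictly between the two spin-orbital positions. A sketch (our own helper, not the paper's code):

```python
def fermi_sign(state, i, j):
    """Sign (-1)**delta for a hop between bit positions i and j of an
    integer-coded state; delta counts the electrons strictly between
    the two spin-orbitals."""
    lo, hi = (i, j) if i < j else (j, i)
    # mask selects bits lo+1 .. hi-1
    mask = ((1 << hi) - 1) & ~((1 << (lo + 1)) - 1)
    delta = bin(state & mask).count("1")
    return -1 if delta % 2 else 1
```

This constant-time sign evaluation is what keeps the per-element cost of the Hamiltonian assembly O(1).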

Algorithm 3 Assembling all parts of the Hamiltonian
6: determine sites α and β between which the hopping |ψ_{J_j}⟩ → |ψ_i⟩ happens
7: if hopping between sites α and β is time-dependent then
8: H ...

In the discrete setting, the Schrödinger equation, Eq. (10), reads

i (d/dt) v(t) = H(t) v(t),  v(0) = v_0,   (27)

where H(t) ∈ C^{N_ψ×N_ψ} is the matrix representation of Ĥ(t) and v_0 is the vector representing the ground state of the system. We now discuss how to solve Eq. (27) numerically.
Ground state In order to obtain the initial state for the time-stepping, we consider the system described by Eq. (10) to be in thermal equilibrium. Then, the ground state is defined as the eigenstate corresponding to the smallest eigenvalue of Ĥ:

Ĥ |ψ_0⟩ = E_0 |ψ_0⟩,  E_0 = min spec(Ĥ).   (28)

Numerically, we obtain a representation (E_0, v_0) of the eigenpair (E_0, ψ_0) by a variant of the so-called power iteration method (see, e.g., [26]). The power iteration method iteratively computes the eigenvalue λ of largest absolute value and the corresponding eigenvector v of a Hermitian matrix M ∈ C^{N×N} by the recursive formulae

v^(n) = M v^(n−1) / ||M v^(n−1)||,  λ^(n) = ⟨v^(n), M v^(n)⟩,   (29)

starting with an arbitrary vector v^(0) not orthogonal to the desired eigenvector. The iteration stops if λ^(n) and v^(n) are sufficiently near the true values, which is determined by an a-posteriori error estimate. This is shown in Algorithm 4.

Algorithm 4 Power iteration method
Input: Hermitian matrix M ∈ C^{N×N}, maximum number of iterations N_max, tolerance tol
Output: approximate eigenpair (λ, v) of M
1: initialize v^(0) randomly and normalize, λ^(0) = 0

Applying the power iteration to H gives an approximate eigenpair (E, v). If E is negative, we have already found the smallest eigenvalue of H and we can set E_0 = E and v_0 = v. Else, if E ≥ 0 and hence E is the largest eigenvalue of H, we apply the power iteration once again to the shifted matrix H − E·I, obtaining an approximate eigenpair (E', v'). Since H − E·I has only non-positive eigenvalues, E' approximates its smallest eigenvalue. Then, we set E_0 = E + E' and v_0 = v', as they approximate the smallest eigenvalue of H and the corresponding eigenvector, respectively. This is shown in Algorithm 5.

Exponential midpoint rule and Krylov subspace method

The continuous evolution solving Eq. (27) is the discrete analog of Eq. (11). For small time steps, it can be approximated with sufficient accuracy by a Magnus expansion of order zero [27], which gives

v(t + τ) ≈ exp( −i ∫_t^{t+τ} H(s) ds ) v(t).   (30)
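The power iteration with the spectral shift described above can be sketched compactly (dense matrices and fixed seed for illustration; the stopping criterion here is a simple eigenvalue increment rather than the a-posteriori estimate of Algorithm 4):

```python
import numpy as np

def power_iteration(m, tol=1e-12, max_iter=10000):
    """Largest-|eigenvalue| eigenpair of a Hermitian matrix (cf. Algorithm 4)."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(m.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(max_iter):
        w = m @ v
        lam_new = np.vdot(v, w).real      # Rayleigh quotient
        v = w / np.linalg.norm(w)
        if abs(lam_new - lam) < tol:
            return lam_new, v
        lam = lam_new
    return lam, v

def ground_state(h):
    """Smallest eigenpair via the shift H - E*I (cf. Algorithm 5)."""
    e, v = power_iteration(h)
    if e < 0:
        return e, v
    es, vs = power_iteration(h - e * np.eye(h.shape[0]))
    return e + es, vs
```

In the actual implementation, M is the sparse CSR Hamiltonian, so each iteration costs only one O(N_nz) matrix-vector product.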

Algorithm 5 Computing the ground state
By approximating the integral in the exponent via the midpoint rule,

∫_t^{t+τ} H(s) ds ≈ τ H(t + τ/2),   (31)

the approximation in Eq. (30) can be further simplified. Considering consecutive intervals of length τ, for which the midpoint rule and the Magnus expansion are sufficient approximations, yields a sequence of vectors defined by

v^(n+1) = exp( −i τ H((n + 1/2) τ) ) v^(n)   (32)

that approximate the solution at times nτ: v^(n) ≈ v(nτ). Note that these approximations are of lowest order and thus the time stepping cannot, in general, be expected to surpass first-order convergence. The discretization in Eq. (32) is the lowest-order representative of a family of numerical time propagation schemes called Magnus integrators [28]. Other representatives, which allow for higher-order time stepping methods, can be obtained by using higher-order expansions in Eq. (30) and Eq. (31) (cf. [29]). Of particular practical interest among these are so-called commutator-free exponential time-propagators (CFETs), which avoid commutator terms in the Magnus expansion Eq. (30) and thus optimize the number of necessary matrix-vector multiplications; see [30] for a fourth-order CFET for the Schrödinger equation (and a comparison to conventional Runge-Kutta-type integrators) and [31] for the derivation of general higher-order CFETs, which can be readily implemented with the data structures and methods presented in our work.

The main difficulty in the computation of Eq. (32) is evaluating the exponential of the large anti-Hermitian sparse matrix −iHτ. To this end, we employ a so-called Krylov subspace method as described in [24] and references therein. For a matrix H and a vector v, the m-th Krylov subspace is defined as

K_m(H, v) = span{ v, Hv, H²v, ..., H^{m−1}v }   (33)

and is thus spanned by vectors obtained by (sparse) matrix-vector multiplication only, which can be carried out efficiently in the CSR format. Let V ∈ C^{N_ψ×m} be a projection to an orthonormal basis of K_m(H, v).
Then, by projection, we can approximate H by a lower-dimensional matrix h ∈ C^{m×m}:

h = V† H V.

Hermiticity of H implies hermiticity of h, and basic orthogonality properties of a Krylov space (cf. [24]) cause h to be Hessenberg, i.e., h_{ij} = 0 for i > j + 1. Together, we can infer h to be tridiagonal; hence, the orthonormal basis V as well as h can be computed via a Lanczos algorithm in O(m N_ψ) operations, as is done in Algorithm 6. For the exponentiation of a small m-by-m tridiagonal matrix, numerically stable and efficient methods are implemented in the library Expokit [24], requiring O(m³) operations. Together, this leads to an approximation of the exponential in Eq. (32):

exp(−i τ H) v ≈ V exp(−i τ h) V† v = ||v|| V exp(−i τ h) e_1,

where e_1 is the first canonical unit vector. The dimension m of the Krylov subspace should be chosen sufficiently large to ensure small approximation errors, but small enough to limit the computational effort. In Algorithm 6, the dimension is chosen adaptively in each step via an a-posteriori error estimate, because the difficulty of computing a time step can vary according to H(t). We use a method suggested in [32], which uses the norm of the difference of two consecutive approximations of the solution in K_{m−1}(H, v) and K_m(H, v). The validity of this error estimate is shown in Fig. 2 for a problem where the exact solution is known. Although the error estimator underestimates the error by nearly an order of magnitude, its convergence has the same rate as the error, rendering the estimate a good indicator of convergence. Algorithm 6 shows one time step, summarizing this subsection. For the sake of brevity, V_{:,j} denotes the j-th column of V, and h|_{j×j} denotes the restriction of h to the first j rows and columns. The whole time-stepping algorithm to solve Eq. (27) consists of applying Algorithm 6 iteratively.
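A fixed-dimension sketch of one Lanczos-based exponential step follows; it replaces the adaptive choice of m and the Expokit exponentiation of Algorithm 6 by a fixed m and a dense eigendecomposition of the small tridiagonal matrix, so it illustrates the structure rather than the production method:

```python
import numpy as np

def lanczos_expm(h, v, tau, m=20):
    """Approximate exp(-1j*tau*H) @ v in the Krylov space K_m(H, v)."""
    n = len(v)
    beta0 = np.linalg.norm(v)
    V = np.zeros((n, m), dtype=complex)
    alpha = np.zeros(m)
    beta = np.zeros(max(m - 1, 1))
    V[:, 0] = v / beta0
    for j in range(m):
        w = h @ V[:, j]                       # the only large matvec
        alpha[j] = np.vdot(V[:, j], w).real
        w = w - alpha[j] * V[:, j]
        if j > 0:
            w = w - beta[j - 1] * V[:, j - 1]
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-10:               # Krylov space exhausted
                m = j + 1
                break
            V[:, j + 1] = w / beta[j]
    t = np.diag(alpha[:m]) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    ew, eU = np.linalg.eigh(t)                # exponentiate the small matrix
    y = eU @ (np.exp(-1j * tau * ew) * eU[0].conj())   # exp(-i*tau*t) e_1
    return beta0 * (V[:, :m] @ y)

# demonstration on a random Hermitian matrix
rng = np.random.default_rng(1)
a = rng.standard_normal((12, 12)) + 1j * rng.standard_normal((12, 12))
h_test = (a + a.conj().T) / 2
v_test = rng.standard_normal(12).astype(complex)
approx = lanczos_expm(h_test, v_test, tau=0.05, m=10)
```

Because exp(−iτh) is applied to e_1 only, the whole step needs just m sparse matrix-vector products plus O(m³) work on the small matrix.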

Observables
To compute the discrete analog of the expectation value ⟨Ô⟩ = ⟨ψ|Ô|ψ⟩ of an observable Ô with respect to a state ψ ∈ H_{n↑n↓}(N_s), we need to discretize the action of Ô on ψ. For the energy ⟨Ĥ(t)⟩, this is already achieved by the discrete Hamiltonian H(t).
For the double occupation, this can be done by computing a weight vector w_d ∈ N^{N_ψ}. The k-th element of w_d is the number of double occupations in the k-th state of the considered basis, i.e., (w_d)_k = ⟨ψ_k| d̂ |ψ_k⟩. The desired expectation value can then be obtained by

⟨d̂⟩ = v† (w_d ∘ v),

where ∘ denotes element-wise multiplication of vectors. For other observables like the double occupation at a specific site, or the electron occupation, one can compute the corresponding weight vector and proceed analogously.
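Since the basis states are occupation-number eigenstates, the weight vector is computed once and every later evaluation is an O(N_ψ) element-wise product. A sketch, with basis states represented as tuples of occupations (an illustrative encoding of our own) and the per-site average of Eq. (13) included:

```python
import numpy as np

def double_occupation(basis, psi, ns):
    """Average double occupation per site of the state vector psi.

    basis: list of occupation tuples (n_1up, n_1dn, ..., n_Ns_dn);
    the weight vector counts doubly occupied sites per basis state.
    """
    w = np.array([sum(s[2 * i] * s[2 * i + 1] for i in range(ns))
                  for s in basis])
    return np.vdot(psi, w * psi).real / ns
```

During the time evolution, only the element-wise product is re-evaluated at each step; w itself never changes.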

Equilibrium spectral function
The implemented tools can also be used to compute the spectral function of the system from the Lehmann representation. Equation (17) is not suitable for direct implementation because of the δ-distributions. Approximating these by Lorentzians of width ε > 0 leads to

A(ν) ≈ (1/(π N_s)) Σ_{iσ} Σ_{|φ⟩} [ |⟨φ| ĉ†_{iσ} |ψ_0⟩|² ε / ((ν − E_{|φ⟩} + E_0)² + ε²) + |⟨φ| ĉ_{iσ} |ψ_0⟩|² ε / ((ν + E_{|φ⟩} − E_0)² + ε²) ].   (34)

This approximation preserves the relative values of the spectral weights.
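Once the poles (excitation energies) and their weights (matrix elements squared) are known, evaluating such a broadened spectrum is a one-liner; a sketch (function and argument names are our own):

```python
import numpy as np

def lorentzian_spectrum(nu, poles, weights, eps=0.1):
    """Sum of Lorentzians of width eps: each delta(nu - E) of the Lehmann
    representation is replaced by (eps/pi) / ((nu - E)**2 + eps**2)."""
    nu = np.asarray(nu, dtype=float)[:, None]
    return (weights / np.pi * eps / ((nu - poles) ** 2 + eps ** 2)).sum(axis=1)

spec = lorentzian_spectrum(np.array([0.0, 1.0]),
                           poles=np.array([0.0]),
                           weights=np.array([1.0]))
```

Each Lorentzian integrates to its weight, so the total spectral weight is preserved up to the tails cut off by a finite frequency grid.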
To compute A(ν) from Eq. (34), note that, e.g., the creation operator ĉ†_{i↑} maps H_{n↑n↓}(N_s) to H_{n↑+1 n↓}(N_s). Then, from Eq. (5) it is clear that only an eigenbasis of H_{n↑+1 n↓}(N_s) needs to be considered, which can be obtained by assembling the Hamiltonian in this subspace and applying an eigendecomposition. For the evaluation of ĉ†_{i↑} |ψ_0⟩, the relation between states in H_{n↑n↓}(N_s) and H_{n↑+1 n↓}(N_s) introduced by ĉ†_{i↑} must be known. This relation can be found by a linear search on the states of H_{n↑+1 n↓}(N_s), or, more efficiently, via hash maps (see the following subsection). Here, the anti-commutation relations for creation and annihilation operators have to be taken into account.
These computations can be carried out analogously for the corresponding term with annihilation operators and for the spin-down case (which can be omitted for systems with spin symmetry).

Nonequilibrium spectral function
In order to evaluate Eq. (15), the action of a creation (annihilation) operator ĉ†_{iσ} (ĉ_{iσ}) on a state vector |ψ⟩ is implemented following the definition in Eq. (7), where the sign resulting from c_{iσ} is computed as in Eq. (26).
Since a general state |ψ⟩ is represented as a linear combination of basis states |φ_k⟩, it remains only to determine the ordering of states in the subspace with one added or one removed electron, which is not identical to the ordering of the states obtained from ĉ†_{iσ}|ψ⟩ or ĉ_{iσ}|ψ⟩. Hence, after applying, e.g., ĉ†_{i↑}, we have to find the index of the resulting state in the subspace H_{n↑+1 n↓}(N_s). A simple linear search and match is very inefficient. To this aim, we apply a fermionic hashing function from Ref. [33] given by

I = Σ_i (p_i choose i),

where I is the hashing index, p_i is the spin-site that particle i occupies, and (m choose n) = 0 if n > m. This function provides a unique mapping of a state vector (in its binary representation as an integer) to an integer in the range 0 ≤ I < N_states, which also directly corresponds to the ordering of the states. Thus, if the action of a creation (annihilation) operator on a given state vector is non-zero, the corresponding hashing index is calculated in order for the result to be correctly assigned. The Fourier transform in Eq. (16) is performed as post-processing. As in the case of Eq. (34), we also use broadening to numerically represent the δ-distributions occurring for a finite system. This is achieved by modifying the Fourier transform in Eq. (16) by adding the factor e^{−ε t_rel}:

A^≶(ν, t) ≈ ± (1/π) Im ∫_0^{T_max} dt_rel e^{iν t_rel} e^{−ε t_rel} G^≶(t + t_rel, t),

where we also limit the maximal time interval taken for the integration with a sufficiently large T_max.
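The hashing function above is the combinatorial number system: summing binomial coefficients of the (sorted) occupied positions yields consecutive indices. A sketch, assuming 0-based positions p_i in ascending order (conventions in Ref. [33] may differ by an offset):

```python
from math import comb
from itertools import combinations

def hash_index(positions):
    """Combinatorial rank of a set of occupied spin-sites (0-based):
    a bijection onto 0 <= I < C(n_orbitals, n_particles)."""
    return sum(comb(p, i + 1) for i, p in enumerate(sorted(positions)))

# all 2-particle states on 4 spin-sites receive consecutive indices 0..5
ranks = sorted(hash_index(c) for c in combinations(range(4), 2))
```

Because the map is a bijection onto a contiguous index range, the resulting state can be placed directly into the target subspace vector in constant time.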

Numerical cost and limitations
The method presented in this paper can handle systems of up to 14 sites. For more than 14 sites, several issues must be resolved. First, indexing with a 32-bit integer format is no longer possible. Possible remedies include using long-integer indexing or splitting the arrays. Furthermore, computation time may become an issue. The computation necessary for obtaining the double occupations for a 14-site chain (presented in Fig. 6) took about seven hours on a single node of the VSC-3 computer cluster, which is equipped with a 16-core Intel Xeon processor and 256 GB of RAM. Multi-node computing may accelerate simulations, e.g., by outsourcing computations of expectation values. As seen in Sect. 3.2, the time needed for the time-stepping rises with O(N_ψ ln²(N_ψ)). Algorithm 2, however, requires O(N_ψ²) operations. For small numbers N_s, this makes up only a small fraction of the total runtime, since it only needs to be run once, compared to repeated simulations on the same geometry (e.g., parameter studies). For a large number of sites, however, computing the structure of the Hamiltonian may become the bottleneck (for 14 sites and half-filling, it takes about one day to compute on 16 cores in parallel). To overcome this issue, one may use the alternative algorithm described in Sect. 3.3, which has almost linear complexity.
Also, memory can become an issue for more than 14 sites, as can be seen from Table 1. For 16 sites, the upper bound for the memory needed to store the Hamiltonian is 680 GB. However, for typical geometries, many elements of v are zero, so that the effective memory consumption is about one-fourth of the upper bound.

Benchmarking
For certain values of the parameters U and v_ij, the eigenvalues of the Hamiltonian from Eq. (1) can be computed analytically. They were compared to the numerically obtained values, and agreement within machine precision was found. Also the eigenvalues of Hamiltonians that emanate from different geometries, for which the physics should be the same, were compared. Again, no significant deviation was found. For the time-stepping algorithm, we tested whether stationary systems are described correctly. Figure 3 shows the double occupation and its error, as well as the error in the energy (the difference between the expected and obtained values), as a function of time for a time-independent Hamiltonian. The time evolution was started from a ground state. From Fig. 3, we can infer that the ground state is indeed an eigenstate and that the time evolution of such a state can be correctly integrated by our algorithm. The computation of the Green's function was benchmarked against an analytical computation for a two-site system.
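Such a stationarity check can be sketched with standard sparse linear algebra; the tight-binding chain below is a stand-in for the actual Hubbard Hamiltonian, and `expm_multiply` stands in for the paper's own Krylov propagator:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh, expm_multiply

# stand-in time-independent Hamiltonian: 64-site tight-binding chain
# with NN hopping v = -1 (sparse and Hermitian, like the Hubbard case)
n = 64
H = diags([-np.ones(n - 1), -np.ones(n - 1)], offsets=[-1, 1], format="csr")

# start the time evolution from the ground state
e0, v0 = eigsh(H, k=1, which="SA")
psi = v0[:, 0].astype(complex)

tau, n_steps = 0.005, 200
for _ in range(n_steps):
    # exponential midpoint rule: for a time-independent H the midpoint
    # Hamiltonian is H itself, so each step is exp(-i tau H) psi
    psi = expm_multiply(-1j * tau * H, psi)

energy = np.real(np.vdot(psi, H @ psi))
print(f"energy drift: {abs(energy - e0[0]):.2e}")
```

For an eigenstate the evolution is a pure phase, so any drift of the energy (or of the norm) directly measures the integration error.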

Results
In the following, we present results obtained for chain and box geometries with nearest-neighbor (NN) hopping and open boundary conditions (OBC). We always start the time evolution from the ground state of the system and choose it to be a Mott-insulator with n↑ = n↓ = N_s/2. The time step in the time-stepping algorithm is set to τ = 0.005, with the unit of time being 1/energy. The unit of energy (as already introduced in Sect. 2) is the absolute value of the NN hopping, |v_ij| = 1. For ω = 11, energy is absorbed only for the duration of the pulse (there is still some spectral weight in the tails of the Hubbard bands, see Fig. 5), but this energy is returned to the pulse (similar deexcitation effects are described in detail within the Boltzmann equation approach in Ref. [18]). For frequencies ω = 3.5-8.5, we observe an increase in double occupation during the pulse; the increase is strongest for ω = 3.5, which approximately matches the distance between the centers of the Hubbard bands. The energy is absorbed and transformed into potential energy by creating doubly occupied sites (electrons in the upper Hubbard band and holes in the lower Hubbard band). For ω = 8.5, only the lowest-energy electrons can be excited into the range of the upper Hubbard band, where they occupy its high-energy part. Thus, only a few electrons are excited, but with high energies. This causes the double occupation to barely rise, whereas the energy rises by a moderate amount.
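The pulse enters the Hamiltonian only through the hoppings; a Peierls phase with a Gaussian-envelope vector potential might be sketched as follows (the envelope shape and the parameters t0 and sigma are illustrative assumptions, not the paper's pulse definition):

```python
import numpy as np

def vector_potential(t, a, omega, t0=4.0, sigma=1.5):
    """Gaussian-envelope light pulse A(t) with amplitude a and frequency
    omega; the envelope center t0 and width sigma are illustrative."""
    return a * np.exp(-((t - t0) ** 2) / (2 * sigma**2)) * np.cos(omega * (t - t0))

def peierls_hopping(v, r_i, r_j, t, a, omega):
    """Time-dependent hopping via Peierls substitution,
    v_ij(t) = v_ij exp(-i A(t) (r_i - r_j)), for a 1d chain with scalar
    site positions (units e = hbar = c = 1)."""
    return v * np.exp(-1j * vector_potential(t, a, omega) * (r_i - r_j))

# NN bond of a chain with |v_ij| = 1, evaluated at the pulse maximum
v_t = peierls_hopping(1.0, 0, 1, t=4.0, a=0.6, omega=3.5)
print(abs(v_t))  # the Peierls phase only rotates v_ij, |v_ij| stays 1
```

Since the phase factor is unimodular, the sparsity structure of the Hamiltonian is unchanged in time; only the complex values of the hopping entries need to be updated at each midpoint.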
We see that, as a function of time, the double occupation oscillates with two different frequencies. This is even more visible in the site-resolved double occupation presented in Fig. 6 for a 14-site chain. The high-frequency oscillation equals the light-pulse frequency ω and is typically compensated by another site where the oscillation is in opposite phase. The lower frequency (Ω) is found to be inversely proportional to the length of the chain N_s (see Fig. 7). It can be viewed as originating from doublon and holon movements through the chain, which leave the overall number of doublons nearly constant. The site-averaged double occupation is almost constant in time after the pulse; we see only a slight oscillation, barely visible in Fig. 6.
In Fig. 8, we show the site-resolved double occupation at t = 6.1, at a local maximum of the total double occupation, and at t = 18.9, at a local minimum. Initially, at t = 6.1, an alternating pattern is visible, with the double occupation rising on every second site. The contribution from states where electrons 'jump to the left', creating a doublon and leaving a hole behind, is bigger than that from states where electrons leave the doubly occupied sites (in the ground state, the double occupation is small). The rightmost site is different in this respect, since with the OBC there is no site to the right from which an electron could hop. At later times, this alternating pattern is replaced by a longer-range oscillation in space, corresponding to doublon and holon movements along the chain. The boundary sites remain different, with significantly lower double occupation due to the OBC.

Nonequilibrium spectral function
The imaginary parts of the lesser Green's function, A<(ν, t), and the spectral functions A(ν, t) shown in Figs. 9, 10 and 11 are calculated from Eq. (37) with the broadening ε = 0.1 and T_max ≈ 80. They are all local and site-averaged. Additionally, we show the equilibrium spectral function in the ground state (which is our initial state at t = 0), calculated from the Lehmann representation, Eq. (34), with the same broadening ε = 0.1. In Fig. 9, we show A<(ν, t) (upper row) and A(ν, t) (lower row) for an 8-site chain with U = 6 and the pulse frequency ω = 9 at different times during (t = 8) and after the pulse. At t = 0, we start from the ground state, where A< does not have any spectral weight above ν = 0. For the smaller pulse strength a = 0.2 (left plots), we see only a few photo-induced excitations into the upper Hubbard band in A<, and the overall spectrum remains almost unchanged. Increasing the pulse strength to a = 0.6 leads to a stronger redistribution of the spectral weight. This effect, known as photo-doping of the Mott-insulator, has already been found in the Hubbard model with other methods: nonequilibrium DMFT [16,19] and the quantum Boltzmann equation [18]. At later times, there is also an additional spectral-weight shift inside both the lower and upper Hubbard bands, which corresponds to the first step of thermalization [18]. In the corresponding full spectral functions A(ν, t), we additionally see that spectral weight shifts into the Mott gap, causing a gap reduction (photo-melting). Such gap filling is also seen in nonequilibrium DMFT studies [8,16], but is missed by the quantum Boltzmann approach [18]. Both effects have also been reported in Refs. [34,35]. The gap filling is stronger if we choose a smaller pulse frequency, ω = 6, that connects points with more spectral weight than ω = 9 (as already noted in the discussion of Fig. 4, more energy can then be pumped into the system at the same pulse intensity).
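The effect of the broadening factor e^{−ε t_rel} and the cutoff T_max can be illustrated on a synthetic one-pole Green's function, where each δ-peak becomes a Lorentzian of half-width ε; the sign and prefactor conventions below are our assumptions (the paper's Eq. (16) fixes them precisely):

```python
import numpy as np

def broadened_ft(g_rel, t_rel, nu_grid, eps=0.1):
    """Fourier transform over relative time with damping e^{-eps t_rel}
    and a finite cutoff T_max = t_rel[-1] (simple Riemann sum)."""
    dt = t_rel[1] - t_rel[0]
    damped = g_rel * np.exp(-eps * t_rel)
    phases = np.exp(1j * np.outer(nu_grid, t_rel))
    return (phases * damped).sum(axis=1) * dt

# retarded Green's function of a single level at nu0 = 2:
# G(t_rel) = -i exp(-i nu0 t_rel) for t_rel >= 0
nu0, eps = 2.0, 0.1
t_rel = np.linspace(0.0, 80.0, 4001)
g_rel = -1j * np.exp(-1j * nu0 * t_rel)
nu = np.linspace(-6.0, 6.0, 241)
spec = -broadened_ft(g_rel, t_rel, nu, eps).imag / np.pi
print(nu[np.argmax(spec)])  # Lorentzian peaked at nu0
```

With T_max ≈ 80 and ε = 0.1, the damping factor at the cutoff is e^{−8} ≈ 3·10⁻⁴, so the truncation ringing is negligible compared to the broadening.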
The spectral function and A< of the same 8-site chain, but with ω = 6, are shown in Fig. 10. Already for a = 0.2, there is a significant amount of spectral weight in the upper Hubbard band (upper left plot of Fig. 10), and we also see a slight gap filling. The shift of spectral weight into the gap already for the small pulse intensity a = 0.2 is even more pronounced for the 4 × 2 box geometry (see Fig. 11).
For both the 8×1 chain and the 4×2 box, increasing the pulse strength to a = 0.6 causes a significant redistribution of spectral weight. Although both systems absorb approximately the same amount of energy for a = 0.6 (cf. Fig. 12, where we show the double occupation and energy for different pulse strengths as a function of time), the gap filling is much stronger for the box geometry. In both geometries, there is more spectral weight in the gap at t = 8, i.e., during the pulse, than at later times. The systems initially absorb more energy (particularly the 8-site chain), but it cannot be stored and is returned to the pulse (cf. Fig. 12 and Ref. [18]). From Fig. 12, we also learn that further increasing the pulse strength a does not lead to a further increase in double occupation at later times. The initial increase in energy and double occupation grows with increasing pulse strength, in the case of the chain even above the maximum equilibrium value of d = 0.25, but at later times the spectral weight is redistributed and the double occupation is reduced. When we look at the values of energy and double occupation at a later time after the pulse (e.g., t = 20), the chain and box geometries do not differ significantly. The chain, however, can initially absorb more energy, and the rise of double occupation is bigger. This is likely related to the specific properties of the spectra, i.e., the distribution of the available states on the ν-axis (the bandwidth is similar in both geometries). For the same reason, the increase in double occupation and energy at a = 0.4 also differs between the two systems.

Summary and outlook
We have presented a simple implementation scheme for solving the time-dependent Schrödinger equation for systems described by the Hubbard Hamiltonian with time-dependent hoppings. As an example application, we show the detailed time dependence of the double occupation after applying a light pulse to a 14-site chain with open boundary conditions. We further study the photo-induced doping and gap filling of a Mott-insulator and find similar behavior for 8-site clusters with chain and box geometry with open boundary conditions. The 8-site clusters are certainly too small to identify differences that could come from dimensionality (1d vs. 2d), but it is an interesting future question whether the small chain/box differences shown here deepen and become more characteristic for larger systems.
The algorithms presented here are flexible and allow for arbitrary geometry, open and periodic boundary conditions, as well as for calculating any correlation function that can be built from creation and annihilation operators. Larger cluster sizes can become possible if one avoids explicitly storing the matrix elements of the Hamiltonian and instead generates them during the computation. The presented implementation allows for this change, since only matrix-vector multiplications are needed; these can be replaced by operators that act on the vector directly, without storing the Hamiltonian in matrix form.
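A matrix-free variant can be sketched with a hand-rolled Krylov propagator that touches the Hamiltonian only through a matvec callback; the Arnoldi construction and the toy 4-site tight-binding chain below are our illustration, not the paper's code:

```python
import numpy as np
from scipy.linalg import expm

def krylov_expm(apply_h, psi, tau, m=20):
    """Approximate exp(-i tau H) psi in an m-dimensional Krylov space,
    using only matrix-vector products apply_h(v) (Arnoldi iteration)."""
    n = psi.size
    Q = np.zeros((n, m + 1), dtype=complex)
    h = np.zeros((m + 1, m), dtype=complex)
    beta = np.linalg.norm(psi)
    Q[:, 0] = psi / beta
    for j in range(m):
        w = apply_h(Q[:, j])
        for i in range(j + 1):           # Gram-Schmidt orthogonalization
            h[i, j] = np.vdot(Q[:, i], w)
            w = w - h[i, j] * Q[:, i]
        h[j + 1, j] = np.linalg.norm(w)
        if h[j + 1, j] < 1e-12:          # invariant subspace reached
            m = j + 1
            break
        Q[:, j + 1] = w / h[j + 1, j]
    small = expm(-1j * tau * h[:m, :m])  # exponential of the small matrix
    return beta * Q[:, :m] @ small[:, 0]

# toy stand-in: 4-site spinless tight-binding chain with v = -1,
# applied on the fly without ever forming the matrix
def apply_h(psi):
    out = np.zeros_like(psi)
    out[:-1] += -psi[1:]
    out[1:] += -psi[:-1]
    return out

psi0 = np.zeros(4, dtype=complex)
psi0[0] = 1.0
psi1 = krylov_expm(apply_h, psi0, tau=0.1)
print(np.linalg.norm(psi1))  # unitary evolution preserves the norm
```

Replacing `apply_h` by a routine that generates the Hubbard matrix elements on the fly removes the dominant memory cost at the price of recomputing the hopping structure in every matvec.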