Quantum Monte Carlo impurity solvers for multi-orbital problems and frequency-dependent interactions

The solution of an auxiliary quantum impurity system is the computationally expensive step in dynamical mean field theory simulations of lattice models and materials. In this review, we discuss Monte Carlo based impurity solvers, which are suitable for a wide range of applications. In particular, we present an efficient implementation of the hybridization expansion approach, which enables the simulation of multiorbital impurity problems with off-diagonal and complex hybridizations, and dynamically screened (retarded) density-density interactions. As a complementary approach, we discuss an impurity solver based on the determinant Monte Carlo method, which scales favorably with inverse temperature and hence provides access to the very low temperature regime. The usefulness of these state-of-the-art impurity solvers is demonstrated with applications to the downfolding problem, i.e., the systematic derivation of dynamically screened interactions for low-energy effective models, and to pyrochlore iridates, where the spin-orbit coupling leads to complex hybridization functions in a multi-orbital system. As a benchmark for cluster extensions of dynamical mean field theory, we also present results from lattice Monte Carlo simulations for the momentum dependence of the pseudo-gap in the half-filled two-dimensional Hubbard model.


Introduction
Dynamical mean field theory (DMFT) approximates the self-energy of a correlated lattice system by a momentum independent, but frequency dependent function, which can be obtained from the solution of a self-consistently determined impurity model [1]. While this simplifies the problem considerably, the numerical solution of these auxiliary quantum impurity models remains a nontrivial task. In the context of realistic a e-mail: philipp.werner@unifr.ch materials simulations, one has to confront several challenges, ranging from multiorbital problems and dynamically screened interactions to spin-orbit induced complex hybridizations and various types of symmetry breaking. Recently developed extensions of DMFT, such as the dynamical vertex approximation [2] and the dual fermion [3] or dual boson [4] approach also require the calculation of general fourpoint correlation functions. In materials studies, we are furthermore interested both in crossover phenomena at elevated temperatures, and quasi-particle formation or ordering phenomena at very low temperature, which requires a versatile and powerful computational approach.
In this section, we describe Monte Carlo based impurity solvers which have been developed or further improved in the context of the Research Unit FOR 1346. In Section 2 we will discuss the hybridization expansion approach [5], which is nowadays the most widely used method in DMFT based ab-initio calculations of correlated materials. We describe the treatment of retarded density-density interactions and complex off-diagonal hybridizations, as well as the efficient and flexible measurement of correlation functions based on worm sampling. The second class of impurity solvers [6], discussed in Section 3, is based on a determinantal Monte Carlo algorithm [7] that scales linearly in the inverse temperature, and which hence provides access to very low temperatures.
We illustrate these methods with three types of applications. First, we test the downfolding scheme based on the random phase approximation [8], which requires an impurity solver which can handle retarded interactions. Second, we present the phase diagram of pyrochlore iridates as an illustration of a material simulation with spin-orbit induced complex hybridizations [9]. In the third application we use the BSS algorithm to formulate a quasi-continuous-time impurity solver for dynamical meanfield theory calculations with linear scaling in the inverse temperature [6], and to investigate momentum-dependent pseudogaps in the half-filled Hubbard model [10]. Finally in the fourth application we discuss how to incorporate the BSS solver in the DMFT self-consistency loop with special emphasis on a multigrid approach, which extrapolates the Trotter error to zero at each iteration [11].

.1 Perturbation expansion of the partition function
Continuous-time impurity solvers [12] are based on an expansion of the partition function into a series of diagrams and the stochastic sampling of collections of these diagrams. We represent the partition function of the impurity model as a sum (or, more precisely, integral) of configurations C with weight w C , and implement a random walk C 1 → C 2 → C 3 → . . . in configuration space in such a way that each configuration can be reached from any other in a finite number of steps (ergodicity) and that detailed balance is satisfied, This assures that each configuration C is visited with a probability proportional to |w C |. One can thus obtain an estimate for some observable A from a finite number Dynamical Mean-Field Approach with Predictive Power 2501 N of measurements: where A C is the value of the observable associated with the configuration C, w C = |w C | × (phase C ), and we assumed that A C and w C can be obtained by sampling the same configuration space. The error on this Monte Carlo estimate decreases like 1/ √ N . To derive the weights, we write the partition function Z = Tr[e −βH ] of the impurity model as a time-ordered exponential in some interaction representation. We split the impurity Hamiltonian into two parts, H = H 1 + H 2 , and define the time-dependent operators in the interaction picture as O(τ ) = e τH1 Oe −τH1 . In this interaction representation, the partition function reads Z = Tr[e −βH1 T e − β 0 dτ H2(τ ) ], which may be expanded into a power series, In the rest of this section, we focus on impurity models with Hamiltonian where H loc describes the impurity, characterized by a small number of degrees of freedom (typically spin and orbital degrees of freedom denoted by a, b, . . .), and H bath describes an infinite reservoir of free electrons, labeled by a continuum of quantum numbers p and a discrete set of quantum numbers α. Finally, H mix describes the exchange of electrons between the impurity and the bath in terms of hybridization amplitudes V aα p . Explicitly, the three terms are The hybridization-expansion approach [5,13] is based on an expansion of Z in powers of the impurity-bath hybridization term H mix , and an interaction representation in which the time evolution is determined by the local part H loc + H bath . Since pα d a has two terms, corresponding to electrons hopping from the bath to the impurity and from the impurity back to the bath, only even perturbation orders contribute to the trace in equation (4). Furthermore, at perturbation order 2n, only the (2n)!/(n!) 2 terms corresponding to n creation operators d † and n annihilation operators d contribute. We therefore write the partition function as a sum over configurations {τ 1 , . . . , τ n ; τ 1 , . . . , τ n } that are collections of imaginary-time points corresponding to these n annihilation and n creation operators: Introducing the β-antiperiodic hybridization function Δ, which in the time-domain reads we re-express the trace over the bath states as Tr c e −βH bath T p1...pn p 1 ...p n a1...an a 1 ...a n α1...αn α 1 ...α n In the hybridization expansion approach, the configuration space consists of all sequences C = {τ a1 1 , . . . , τ an n ; τ a n 1 , . . . , τ a n n } of n creation and n annihilation operators (n = 0, 1, . . .), see illustration in Figure 1a. The weight of this configuration is The trace factor represents the contribution of the impurity, which fluctuates between different quantum states as electrons hop in and out. The determinant sums up all bath evolutions which are compatible with the given sequence of transitions.

Retarded interactions
Low energy models of correlated materials have dynamically screened (retarded) effective interactions, which reflect the screening effect of the high energy bands [14]. A systematic procedure for calculating these interactions is the constrained random phase approximation (cRPA) [8], which assumes that the screening effect of the high energy bands can be captured by a polarization evaluated at the random phase approximation level (bubble diagrams with LDA propagators). In practice, one finds that the monopole part of the interaction, UN tot N tot , is strongly reduced by the screening, while multipolar interactions, for example Hund's exchange terms, exhibit only little frequency dependence. An efficient formalism for treating dynamically screened monopole interactions within the hybridization expansion approach has been developed in references [15,16], Analogous configuration in a model with retarded monopole interaction. The arcs represent an interaction between operator pairs given by the function K, which has a different sign depending on whether it connects a creation and annihilation operator or a pair of creation or annihilation operators. and discussed in detail in a recent review paper [17]. In a Hamiltonian formulation, one can introduce such a screening effect by coupling the total charge on a given lattice site to a continuum of (fictitious) Holstein phonons, with frequency ω and a coupling strength g ω determined by the frequency-dependent interaction U (ω): The boson bath can be integrated out, which results in an additional factor in the Monte Carlo weight of the configuration C [15], where the fermionic part corresponds to equation (12), evaluated with the screened static interaction U scr = U (ω = 0), and is determined by the twice integrated retarded interaction [16], In equation (15) the sign factor s j associated with the operator j is +1 if it is a creation operator, and −1 if it is an annihilation operator. The bosonic contribution to the weight can thus be interpreted as arising from an "interaction" between operator pairs, which is −K(τ i − τ j ) if (i, j) is a pair of creation and annihilation operators, and +K(τ i − τ j ) if both are creation or annihilation operators (the extra minus sign comes from w C = e −SC ). The structure of the resulting diagrams is illustrated in Figure 1b, where we only plot the retarded interactions associated with one of the creation operators. Since the main computational bottleneck is the evaluation of the trace factor in w fermion,C , and the factor w boson,C is obviously real (i.e., does not generate an additional sign problem), the treatment of retarded monopole-monopole interactions is possible without a significant numerical overhead. If retarded interactions of non-density-density type have to be treated, the simulation becomes more complicated, because the retardation effects can no longer be expressed in the form of an exponential term similar to equation (15). One approach, which has recently been tested [18], is a double expansion in both the hybridization terms and the non-density-density terms. While this approach in principle allows to treat retarded spin-flips and pair-hoppings, it suffers from a severe sign problem.

Computation of the trace over the local degrees of freedom
A Monte Carlo update consists of elementary updates such as the insertion/removal of a pair of d and d † . The most costly part of an elementary update in the multi-orbital case is typically evaluating the trace for the given sequence of creation/annihilation operators and time-evolution operators. Equation (12) can be expressed as where 0 ≤ τ 1 < . . . < τ n < β, 0 ≤ τ 1 < . . . < τ n < β. The set {O 1 , · · · , O 2n } is a timeordered set of the impurity creation and annihilation operators, while P trace is the permutation of time ordering from {d an (τ an n ), d † a n (τ n a n ), · · · , d a1 (τ a1 The trace can be evaluated either by the matrix algorithm [13] or by the Krylov algorithm [19]. It was shown that the Krylov method is superior in performance for impurity problems involving more than 4 orbitals as local degrees of freedom. For the matrix algorithm, several optimized procedures have been proposed in which the recomputation of matrix products is minimized by storing intermediate products by means of tree algorithms [12] or skip lists [20]. However, these optimizations do not apply to the Krylov algorithm as they are based on storing intermediate matrix products. Here, we describe an alternative optimization, called "sliding-window update" [21], which can be applied to both algorithms. To be specific, we detail the procedure in the context of the matrix algorithm, but it naturally extends to the Krylov algorithm. As illustrated in Figure 2a, we define a narrow window on the imaginary time axis in which the updates are performed. The left and right end points of the window are denoted by τ L and τ R (τ L − τ R = τ win ). The idea is that we precompute the products of all matrices for τ > τ L and τ < τ R and keep them in memory. We choose τ win = β/ n MC , where n MC is the Monte Carlo average of the perturbation order estimated during the thermalization process. The value of τ win is fixed during the measurement process.
We define a ket as where n R is the number of operators on the interval (τ R ,0], E(τ ) is the matrix of the imaginary-time evolution operator e −τHimp and |l is the l-th eigenstate of H loc . Similarly, a bra is defined as where n L is the index of the operator with the smallest imaginary time on the interval (β, τ L ]. Using equations (18) and (19), the trace reads where Q(τ L , τ R ) is the product of matrices on the interval (τ L , τ R ]. The point of this optimization is that we do not have to recompute the bra and ket as long as updates are performed locally within the window. We move the window back and forth on the interval [0,β] in a sequential fashion as follows. At each position of the window, we propose a few elementary updates, whose number is proportional to that of flavors. After that, we move the window to the next position with a finite overlap with the previous position as illustrated in Figure 2b. In our current implementation, the step size is chosen as τ win /2. For the new position of the window, we update the bra and ket by applying creation, annihilation and time-evolution operators, or by loading cached data from memory. The important point is that we avoid evaluating the bra and ket from scratch thanks to the sequential move of the window on the imaginary-time axis. Note that all the above steps require only O(1) operations. This procedure is repeated in such a way that the window moves sequentially back and forth on the whole interval [0, β].
To make sure that the procedure is ergodic, we perform several global updates after each sweep of the window. We refer the readers to reference [21] for more details. For the global updates, we need to recompute the trace from scratch. However, the computational cost for these updates is typically not dominating because they are performed less frequently.

Complex and off-diagonal hybridizations
It is clear from the definition of the hybridization function (10) and the Monte Carlo weights (12), (11) that the formalism outlined in Section 2 is applicable also to models with off-diagonal hybridization functions (Δ ab = 0 for a = b). In the most general case, we get a single matrix M which is determined by the time positions of all the operators. The hybridization functions can in general be complex, which then leads to complex configuration weights w C , and to a sign problem (or more properly phase problem) in the Monte Carlo sampling. In this case, we implement the Monte Carlo sampling using the positive distribution of weights |w C |, and shift the phase to the observable. As shown in equation (3), the expectation values of observables are then obtained by the ratio of this phase weighted measurement and the average value of the phase. In general, w C /|w C | can be a complex number. But, the expectation value phase , which is usually refered to as the average sign, is always real because the partition funtion is real.
If the hybridization is diagonal in certain flavors, then the hybridiztation matrix acquires a block structure, and the determinant in equation (12) becomes the product of the determinants of these blocks [13]. For example, if the hybridization is diagonal in spin, we can separate the configuration of operators into spin-up and spin-down operators, and define separate matrices M σ for σ =↑, ↓, and det M becomes σ M σ .

Worm sampling
When it comes to the measurement of Green's functions and other correlation functions in general models, some care has to be taken, since the standard measurement procedure based on the removal of hybridization lines [5] may not always be ergodic. For instance, we cannot measure the off-diagonal elements of the singleparticle Green's function when off-diagonal hybridization functions are vanishing. Also, we cannot measure spin off-diagonal elements of correlation functions such as S + (τ )S − (0) by conventional sampling.
To overcome these ergodicity problems, worm-type measurements have been developed [12,[22][23][24]. In this section, we discuss an efficient implementation of the worm sampling and worm measurement procedure.
Our implementation provides measurements of correlation functions of the form For example, the single-particle Green's function, which is the most important observable for DMFT calculations, is defined as where τ, τ ∈ [0, β). Comparing equations (21) and (22), one finds that O Similarly to the partition function, the numerator of equation (21) is expanded as a1,··· ,an a 1 ,··· ,a n τ2 0 dτ 1 where {O 1 , · · · , O 2n+K } is a time-ordered set of all creation and annihilation operators. We denote the number of worm creation and annihilation operators by K. P trace is the permutation of time ordering from {c a1 (τ 1 ), c † a 1 (τ 1 ), · · · , c an (τ n ), c † a n (τ n ), Note that a single worm operator may contain multiple creation and annihilation operators. The set {τ 1 , · · · ,τ 2n+K } is the time-ordered set of the imaginary times of all creation and annihilation operators.
In a Monte Carlo simulation, we sample an extended configuration space [12,[22][23][24] We denote a configuration in the subspace S O by τ n , τ n ; a 1 , a 1 , · · · , a n , a n ; The total partition functionW now reads The overline means that we take the absolute values of the contributions of diagrams. We choose the coefficient η (> 0) such that the simulation spends approximately the same number of Monte Carlo steps in both spaces. When we sample the subspace S O , the Monte Carlo weight is given by We sample in both configuration spaces according to the weights |w(C)| and η|w(C O )|, respectively. In practice, we switch from S Z to S O by inserting worm operators as described later.
The estimator for the observable O is given by where N Z and N O are the number of Monte Carlo steps spent in S Z and S O , respectively. The brackets · · · Z and · · · O denote the Monte Carlo average in S Z and S O , respectively. We introduce N Z and N O into equation (29) to cancel out the ones included in · · · Z and · · · O , respectively. The symbols "phase" in the numerator and the denominator denote | is a complex number, but the expectation value phase Z is real, because the partition function Z is real. The above equation indicates that the importance sampling works efficiently only if phase Z is sufficiently large ( phase Z 0.1).
For instance, the estimator for the single-particle Green's function reads for 0 < Δτ < β. The factor β −1 comes from the extra degree of freedom τ for the sampling in the worm space. Here, we introduced where x(τ ) = 2τ /β − 1 and P l (x) is the l-th Legendre polynomial defined on the interval [-1,1]. In practice, instead of using the imaginary-time estimator in equation (30), we directly measure the coefficients of the Legendre polynomials using the estimator whereP The cutoff N l is a simulation parameter, whose typical values are N l = 50 -100. The above procedure naturally extends to the measurement of multiple observables, for instance, O, P , · · · . Correspondingly, the total partition function now reads The positive parameters η O , η P , · · · must be adjusted so that the number of Monte Carlo steps spent in the subspaces are of the same order of magnitude. We refer the reader to the Appendix of reference [26] for the technical details of the adjustment of the parameters. The estimators for O, P , · · · are defined in a way analogous to equation (29). We insert or remove a worm as {τ 1 , τ 1 , · · · , τ n , τ n ; a 1 , a 1 , · · · , a n , a n } {τ 1 , τ 1 , · · · , τ n , τ n ; a 1 , a 1 , · · · , a n , a n ; However, some care has to be taken because these updates may never be accepted when the local Hamiltonian has conserved quantum numbers other than the number of electrons. One example is the measurement of the equal-time single-particle Green's function c † i c j (i = j) in the case where H loc has conserved flavor quantum numbers. In this case, the insertion/removal of a worm breaks the conservation law of the quantum numbers from τ = 0 to β, leading to a vanishing expectation value of the trace. Thus, such updates are never accepted.
To avoid this problem, we also insert/remove a worm by detaching/attaching hybridization lines from/to impurity operators. For the single-/two-particle Green's functions, we follow the procedure described in reference [23]. However, this procedure does not directly apply to other correlation functions whose worm operators have the same imaginary time, e.g., the equal-time Green's functions. For such correlation functions, we need to gather/redistribute the worm operators on the imaginary time axis at the same time. In practice, we measure multiple observables simultaneously. We sketch the transitions in the extended configuration space in Figure 3.  3. Illustration of transitions in the extended configuration space. It is possible to measure multiple observables, namely "Z" (partition function), "G1" (single-particle Green's function), "G2" (two-particle Green's function), "Equal-time G1" (equal-time single-particle Green's function), "Two-time G2" (two-time two-particle Green's function), and "Equal-time G2" (equal-time two-particle Green's function) simultaneously. By default, we measure only "Z", "G1" and "Equal-time G1". We make transitions between the subspaces of the configuration space by inserting/removing a worm. The solid arrows denote transitions by adding/removing worm operators and those by detaching/attaching hybridization lines. The broken arrow represents transitions by adding/removing worm operators. We also perform direct transitions between "Two-time G2" and "Equal-time G2" to reduce autocorrelation times. The definitions of these observables are given in reference [26].

BSS impurity solver
The above described CT-HYB algorithm [5], as well as the weak-coupling CT-INT [41] algorithm are action based methods. They can easily handle retarded density-density interactions [42][43][44] but invariably scale with the cube of the inverse temperature. In contrast the BSS algorithm [45,46] is Hamiltonian based. It cannot handle retarded interactions but scales linearly with inverse temperature. Another caveat which has been extensively discussed in the literature is the systematic Trotter error inherently present in the BSS algorithm. This systematic error originates at the very first step, when considering a generic Anderson impurity type model relevant for applying the BSS algorithm in the context of quantum cluster theories [47]. Here, i labels orbitals, and M is a subset of orbitals on which a Hubbard interaction U is applied. The above form includes the single impurity Anderson model (SIAM) relevant for DMFT calculations. In the BSS algorithm, the first step is to isolate the interaction term. This is achieved using the Trotter decomposition: Strictly speaking the Trotter error is of order Δτ . However for real representable hopping and interaction terms, the coefficient of the linear term can be shown to vanish. The Trotter error is a high energy cutoff (i.e., lattice spacing in imaginary time) and should thereby not affect critical behavior such as the Ising universality class of the Mott metal-insulator transition. However, the location of the transition will be dependent on the Trotter error, thus making it hard to compare with other methods. Given this situation a number of efforts were initiated to formulate a so called continuous-time BSS algorithm [48]. To date, efforts in this direction are based 2510 The European Physical Journal Special Topics on a CT-AUX type formulation [49,50] and face two issues. The first issue is that they are restricted to a class of models with Hubbard-type interactions such that the basic CT-AUX equation [51] 1 holds. The second issue is that in the continuous-time approach it is hard to formulate a computationally efficient algorithm. Given this situation it turns out that the multigrid method [6] put forward by one of the authors is an efficient scheme to extrapolate to small imaginary time steps so as to eliminate the Trotter error.
We have developed a general open source package entitled Algorithims for Lattice Fermions available under http://alf.physik.uni-wuerzburg.de which provides a general implementation of the BSS algorithm. It is beyond the scope of this article to review the details of the BSS algorithm and the reader is referred to overview articles such as reference [52]. The ALF-package enables efficient simulations of models that can be written in the form Here O σ,k is a Hermitian sparse matrix, and k runs over a set of interaction terms. We note that the above form can account for various types of impurity models including Kondo impurities [53].

Tests of the cRPA downfolding
We have used the solver's ability to treat retarded density-density interactions in a systematic test of the constrained random phase approximation (cRPA) downfolding procedure [27]. The cRPA method [8] is commonly used to compute the dynamically screened interaction parameters of low-energy models from the LDA bandstructure. The idea is to separate the bandstructure into low-energy target bands and high-energy screening bands, and to calculate a polarization P r associated with single-particle excitations into or out of the target bands. If v is the bare Coulomb interaction, the effective interaction for the target bands is obtained by evaluating the matrix elements of in some appropriate localized basis. While this procedure is in principle exact, in cRPA the polarization P r is evaluated in the random phase approximation, i.e., by computing bubble diagrams made from LDA propagators. In order to test the accuracy of this scheme, we have studied a three-orbital Hubbard model on a cubic lattice with orbital-diagonal transfer t = 1 between nearest-neighbor sites [27]. The Hamiltonian is where i and j are site indices, while α and β denote orbital indices. The idea is that this model can be solved within DMFT, so that we can compare this solution of the full model with the predictions of the downfolded model. Since the low-energy model has retarded and long-ranged interactions, it has to be solved within the framework of extended DMFT (EDMFT) [28,29]. We choose α = 2 as the target band and α = 1, 3 as the screening bands. The orbital potentials E α are given by −Δ, 0, Δ for α = 1, 2, 3, so that Δ > 0 produces gaps between the target-and screening-band manifolds. The on-site repulsion U α is taken to be U α = U d /2, U d , U d /2 for α = 1, 2, 3, respectively, and we also include inter-orbital interactions U . The chemical potential μ is adjusted in the DMFT selfconsistent procedure such that the number of electrons is 3 (half filling). The Coulomb interaction breaks the particle-hole symmetry because it induces orbital-dependent mean fields. To recover this symmetry in the limit of t = 0, we take E dc α = U , 0, In Figure 4 we show the onsite (U ) and nearest-neighbor (V nn ) interactions produced by the cRPA method for Δ = 10, t = 4, U d = 10 and three different values of U /U d . The horizontal dashed lines indicate the bare value of the on-site interaction in the target manifold [= U (ω = ∞)], which is smaller than U d . This is because the Wannier functions of the target band extend to the screening orbitals with smaller onsite repulsions. This also explains the dependence of U (ω = ∞) on U . For the values of U /U d considered here, ImU (ω) exhibits a dominant negative peak around ω=15. Below this energy scale, which corresponds to transitions between the target band and the screening bands, the on-site interaction is reduced from the instantaneous value U (ω = ∞). Although the full model has only on-site interactions, a dynamic nearest-neighbor interaction V nn (ω) is generated. This interaction is substantially smaller than the on-site interaction, and almost vanishes at low frequencies.
The full ω dependence of the on-site and nearest-neighbor interaction is taken into account in the EDMFT solution of the low-energy model. The resulting phase diagram in the space of U d and U is shown in Figure 5 (charge-cRPA, red line). We have also tested a spin-dependent version of the cRPA procedure, which takes into account the self-interactions in Hubbard-type models resulting from the Pauli exclusion principle. In this formalism the Coulomb matrix is a 6 × 6 matrix of the form where and the screened interaction projected onto orbital 2 becomes This compares to in the conventional charge-cRPA, which contains a term O(P ) that violates the Pauli principle.
The spin-cRPA phase boundary is shown by the pink line. We also plot in green the phase boundary obtained by the unscreened model, i.e., the model with bare interactions projected onto the orbitals of the target space, as well as in blue the phase boundary predicted by the DMFT solution of the original three-band system.
The considerable disagreement between the phase boundaries shows that the cRPA downfolding procedure (both in its charge and spin variants) is not reliable, at least in systems with low-energy screening bands. For example, at U = 0 the charge-cRPA interaction considerably underestimates the correlation strength, while the bare interaction projected onto the targed band gives a good prediction for the critical U d . This suggests that the screening processes captured by the cRPA bubble diagrams are compensated by anti-screening contributions from neglected diagrams. An analysis and deeper understanding of these cancellation effects, which may result for example from a functional renormalization group study [30,31], is an important step in the development of more reliable downfolding procedures.

Pyrochlore iridates
We have used the multi-orbital impurity solver with complex hybridization functions to describe finite-temperature properties of pyrochlore iridates by means of the DMFT based ab-initio method [9]. The pyrochlore iridates A 2 Ir 2 O 7 (A = Pr, Nd, Y, etc.) are an ideal system to study novel phenomena induced by the competition and cooperation between spin-orbit coupling (SOC) and electron correlations because their magnetic and electronic states can be tuned by chemical substitution, pressure, and temperature (T ). These compounds show a crossover from metal to insulator with decreasing A 3+ ionic radii at high T [32,33]. The Ir magnetic moments form a socalled all-in/all-out non-collinear magnetic state at low T for small A 3+ ionic [35][36][37]. Y 2 Ir 2 O 7 is the insulating compound with the highest magnetic transition temperature and no f moments. A pioneering local density approximation (LDA)+U study suggested that a topological Weyl semimetal (with the coexisting all-in/all-out magnetic order) may be realized as the ground state of some compounds in this series [38].
To clarify the role of local electron correlations, we applied the so-called local density approximation + DMFT method to the prototype compound Y 2 Ir 2 O 7 . First, we constructed maximally localized Wannier functions for the t 2g manifold by band calculations based on the local density approximation. We angular momentum is defined as −L +Ŝ. The vectors of the eigenbasis are denoted by |j eff , j eff 111 . This basis diagonalizes the local SOC in the t 2g manifold. However, the trigonal crystal field introduces non-zero hybridization functions between vectors with the same value of j eff 111 , e.g., |1/2, 1/2 and |3/2, 1/2 . We took into account all the off-diagonal hybridization functions, taking advantage of our complex-number implementation of the solver.
We show the U -T phase diagram obtained by the DMFT calculations in the left panel of Figure 6. The calculations were done down to β = 150 (eV −1 ). We identify the magnetically ordered phase at low T as follows. At sufficiently high T = T 0 , we obtain a paramagnetic solution, which has vanishing magnetic moments on atoms within statistical errors. Then, we compute a self-consistent solution at a slightly lower T = T 1 using the solution at T 0 as an input of the self-consistent procedure. The procedure is repeated as T is lowered as T 0 , T 1 , T 2 , · · · . We observed that the all-in/all-out magnetic structure emerges below the transition temperature T c . As a function of U , T c rises up to values about three times higher than the experimental T exp c 150 K. This may be due to the neglect of the feedback of spatial fluctuations in the DMFT approximation.
In the left panel of Figure 6, we also show the first-order Mott transition line obtained by paramagnetic DMFT calculations. Apparently, the metal-insulator transition in the magnetic DMFT phase diagram is assisted by magnetic ordering. The metal-insulator transition is sticking out of the magnetically ordered phase as a crossover between a paramagnetic metal and a paramagnetic insulator, which is signaled by a sharp change in the spectral function A(ω = 0). The insulating compound Y 2 Ir 2 O 7 , which has the highest T c in the family, may be located at U 2.5 eV.
We plot the momentum-resolved spectral function computed at β = 40 (eV −1 ) 290 K. We compare the spectral function computed by LDA+DMFT and the LDA band structure (solid red lines). One can see that, even in the paramagnetic metallic phase, the band width is substantially reduced by strong correlation effects. This raises questions on the reliability of the LDA+U method [38], which takes into account correlation effects within the Hartree-Fock level, for discussing the stability of the Weyl semimetal [38] in this series. On the other hand, in the paramagnetic insulating  phase, we did not observe any signature of quasi-particle bands, indicating that the system is in a Mott insulating state. Figure 7 shows the average sign encountered in solving the quantum impurity problems. The calculations were done down to β = 150 (1/eV) 77 K. The average sign remains sufficiently large in the eigenbasis. Interestingly, the temperature dependence is not monotonic. At large U , the average sign shows a local minimum around 200 K. The average sign decreases again below 100 K. Although this basis may not be optimal in terms of average sign, it is practically good enough for the present purpose.
Recently, G. Prando et al. reported the magnetic properties of Eu 2 Ir 2 O 7 under hydrostatic pressure by macroscopic and local-probe techniques [40]. They found that the magnetic transition temperature increases up to P = 20 kbar and then drops as hydrostatic pressure is increased. They compared their results with our theoretical results under the assumption that the hydrostatic pressure corresponds to changing the U/W ratio (W is the band width). We show a similar plot in Figure 8, where the theoretical T c are scaled by a factor of 0.31 to correct the overestimation by the local approximation. The values of U used in our LDA+DMFT calculations are mapped to the values of hydrostatic pressure as suggested by G. Prando et al. One sees that the P dependence of the data are consistent with the theoretical curve, although experiments for higher P are needed for a more detailed comparison.

Momentum dependent pseudogaps in the half-filled Hubbard model
In the absence of a negative sign problem, the BSS algorithm offers the possibility of accessing large lattice sizes at high precision such that a detailed study of the interplay of spin and charge degrees of freedom can be carried out. Here, we will concentrate on how spin fluctuations lead to a momentum dependence of the opening of the single particle gap in the half-filled two dimensional Hubbard model. This is a non-local effect and the results we present in this section can act as benchmarks for extensions of the DMFT approach aiming at including spatial spin-fluctuations.
We consider the Hubbard model on the square lattice with hopping matrix element −t and Hubbard interaction U . In the non-interacting limit, the band width, W = 8t, sets the coherence scale T coh . At weak coupling, U/W < 1, we can expect the spin scale, T Spin , as defined by the energy scale below which spin-fluctuations emerge, to satisfy T Spin < T coh . In this regime, our numerical results describe how the nested Fermi surface develops a single particle gap due to the coupling to spin fluctuations.   [55]. (c) Size extrapolation of the imaginary time Green function at U/t = 4, T = 0.2t and k k k = (π/2, π/2). (d) The spectral functions at high symmetry points on a finite lattice and after extrapolation of the imaginary time data to the thermodynamic limit. This figure is adapted from reference [10].
In the opposite limit, U/W > 1, this picture does not hold. Here the charge gap originates from local interactions and is set by the energy scale U , and the magnetic energy scale by J = 4t 2 /U . From the point of view of magnetic fluctuations one expects both the strong and weak coupling limits to belong to the so-called renormalized classical phase of quantum anti-ferromagnets [54]. Thereby, below the spin scale, the spin correlation length will grow as ξ s ∝ vs kB T e 2πρs/kB T where v s is the spin velocity and ρ s the spin stiffness. This exponential growth of the spin correlation length defines the challenge of the calculation since strictly speaking we have to consider lattice sizes L > ξ s . Our calculations are summarized in detail in reference [10].
As mentioned above, the first challenge is the system size. The second is the Wick rotation from the imaginary time axis to the real time which we carried out with the Maximum Entropy method [56]. Finally the third issue is the systematic Trotter error. Figure 9 illustrates these issues. It turns out that given the accuracy of the Wick rotation, the data at Δτ t = 0.1 are good enough such that a systematic extrapolation to the continuous time was not carried out. The size dependence is on the other hand crucial. Figure 9c shows that for all considered imaginary times the finite size corrections of the Green function follow a 1 L 2 law which we use to carry out the extrapolation to the thermodynamic limit. The importance of taking the thermodynamic limit is underpinned in Figure 9d on the real frequency axis. Consider the temperature T = 0.2t. On the 8 × 8 lattice the magnetic correlation length is comparable to the lattice size. This statement is supported by Figure 3 of reference [46] which shows that the spin structure factor at the antiferromagnetic wave vector is not saturated at this temperature and lattice size. As a consequence, for the L = 8 lattice we expect the data to bear similarities to a long range ordered state and thereby be well described within a mean-field picture. Within this approximation the single particle gap at the M' and X points in the Brillouin zone are identical. As the system size grows and exceeds the magnetic correlation length, it becomes apparent that gap fills in at the M' point but remains robust at the X point. This momentum dependent opening of the single particle gap is at the origin of the pseudogap. In our calculation the Fermi surface is nested and crystal momentum conservation alone will not account for stronger scattering of quasiparticles off paramagnons between the antinodal points k k k = (0, π) and (π, 0). We hence attribute our observation to the Van-Hove singularity at the X and symmetry equivalent points in the Brillouin zone which provide an enhanced phase space for scattering off paramagnons. Figure 10 summarizes the temperature dependence of the single particle spectral function as well as the local density of states. At the highest temperature the data follow approximately the single particle dispersion relation (dashed green line). In particular between the X and M' points it is dispersionless. The observed momentum dependence along this line at slightly lower temperatures reflects the k-dependence of the self energy stemming from scattering off paramagnons. At lower temperatures, and as the correlation length increases, the gap becomes to a first approximation k-independent as expected in a weak coupling mean-field approximation. We also note that there is a break between high and low energy features which is especially apparent along the Γ-X line at low temperatures. This so-called waterfall feature reflects a fundamental property of the dynamics of a single hole in a antiferromagnetic background namely that a coherent quasiparticle band of width set by the magnetic scale [57,58] splits off from an incoherent high energy background.

Quasi-continuous-time impurity solver for DMFT with linear scaling in inverse temperature
As mentioned previously the BSS algorithm is, as opposed to the CT-INT and CT-HYB, Hamiltonian based. This is very similar to exact diagonalization and one consequence is that in the DMFT self-consistency loop, depicted in Figure 11, one has to solve an inverse problem: given the local Green function, we have to find a Hamiltonian form which reproduces it best. This step essentially corresponds to finding an optimal discretization of the fermionic bath of the single impurity Anderson model with a bath consisting of N b orbitals: To solveĤ And for N b bath sites, we can use the general implementation of the BSS algorithm described in Section 3, to produce the local Green function. To avoid any Trotter error in this step, we use the multigrid algorithm developed by one of the authors [55] so as to extrapolate the systematic error to zero. For each given choice of the Trotter time step, Δτ , the Green function is known only on a discrete grid. Prior to extrapolation, one has to generate a common grid. There are many ways of doing this. A simple approach that differs marginally from that used in reference [11] is to use the Maximum Entropy method to solve the inverse problem and then use the spectral function to interpolate G Δτ And (τ, {V i , i }) to a finer mesh. Extrapolation produces G And (τ, {V i , i }) or equivalently the Matsubara Green function, G And (iω m , {V i , i }). We note that rather large values of Δτ can be used for the extrapolation, thus rendering the scheme very efficient even though the BSS scales linearly with the number of Trotter slices. The ability to calculate the Matsubara Green function for the Anderson model of equation (48) is only part of the self-consistency since the model parameters {V i , i } have to be determined so as to at best account for the bath Green function G. To carry out this fitting procedure, we define the cost function with a cutoff Matsubara frequency iω nc and the weighting factor w n , which can be used to optimize the bath parametrization and which we set to w n = 1. The minimization is a tricky problem. Here we used a Newton method with an analytic expression for the gradient. Since this approach can be trapped in local minima we start the Newton search with many different initial configurations. Steps to go beyond DMFT, in particular applications to DCA, require a better fitting strategy. In the framework of this project we have developed a stochastic fit algorithm based on parallel tempering and simulated annealing. The algorithm is based on stochastic sampling of the target parameters, {V i , ε i }, and searches for the minimal value of the above defined cost function (see Eq. (49)). Details of the implementation may be found in reference [59].
In the following, we test the method by comparing to an exact diagonalization solver. First, we consider the extrapolation to zero imaginary time step at fixed number of bath sites, and then test the convergence as a function of the number of bath sites. Tests were carried out for the half-filled Hubbard model with semi-elliptic density of states with full band width W = 4 and within the paramagnetic phase. Figure 12a shows how important it is to take into account the Trotter time step extrapolation in the calculation of the exact location of the first order transition. As mentioned above, the transition points are not universal and depend upon the details of the model. On the other hand, the Ising universality class of the Mott transition at finite temperature is expected to be independent on the choice of the Trotter discretization. One notices that the multiscale BSS algorithm reproduces the exact diagonalization results for the special case of four bath sites. It is also to be noted that four bath sites at this temperature already reproduce very well the multigrid Hirsch-Fye result for a continuous bath both for the double occupancy and the quasiparticle residue (upper and lower panels of Fig. 12a). As the temperature deceases, the number of bath sites required to achieve convergence is however expected to grow. This is illustrated in Figures 12b and 12c. Upon inspection, one notices that at T = 0.04 four bath sites suffice to reproduce the continuous bath results. On the other hand, at the lower temperature T = 0.02 convergence is reached only with five bath sites. The convergence as a function of bath sites is remarkable. One should compare this to standard lattice calculations with dynamical exponent z = 1. In this case, the system size has to track the inverse temperature so as to reach convergence. On the basis of these calculations we believe that using the BSS algorithm with the muliti-grid approach provides a very efficient linear in β solver. The bottleneck will however clearly be the sign problem and the complexity of the interaction. However, within the A A Algorithims for L L Lattice F F Fermions package available under http://alf. physik.uni-wuerzburg.de quite general interactions (see Eq. (41)) can be treated, thereby allowing many possible applications including the Kondo lattice model. An account of the status of this project can be found in references [59,60].

Conclusions
In this chapter, we described the Monte Carlo based impurity solvers which have been developed or further improved in the context of the Reseach Unit FOR 1346. In Section 2, we reviewed the formulation of the hybridization expansion Monte Carlo impurity solver. In particular, we sketched the treatment of retarded interactions and complex hybridizations, optimized sampling procedures for multi-orbital models, and the worm-based measurement of general correlation functions. An open source code package providing most of the features described here has recently been published in reference [26].
In Section 3, we gave a brief description of the Hamiltonian-based BSS algorithm, whose computational complexity scales linearly with inverse temperature. We also mentioned the general open source package, Algorithms for Lattice Fermions, which provides a general implementation of the BSS algorithm.
In Section 4, we demonstrated the power of our impurity solvers by presenting four applications. First, we performed a systematic test of the constrained random phase approximation (cRPA) downfolding procedure in the context of a three-orbital Hubbard model. We derived a low-energy single-band model with retarded and longranged interactions from the three-orbital Hubbard model by means of cRPA. We solved both models within the DMFT and extended DMFT frameworks to compare the results, and to judge the reliablility of cRPA. This application used the impurity solver's ability to treat retarded interactions. Second, we presented the phase diagram of pyrochlore iridates obtained using DMFT based ab-initio calculation as an illustration of a material simulation with spin-orbit induced complex hybridizations. We revealed the substantial role of strong electron correlations in the low-temperature states. The solver's ability to treat complex hybridizations and the absence of a severe sign problem enabled these calculations at low temperatures. Third, we applied the BSS lattice model solver to the Hubbard model on the square lattice. We showed that magnetic fluctuations lead to a momentum dependent opening of the single particle gap. Such calculations provide benchmark results for approximate DMFT based methods aimed at describing spatial fluctuations. In the fourth application we incorporated the BSS solver within the DMFT self-consistency loop taking care of extrapolating the Trotter time step to zero by using a multigrid approach. This BSS-DMFT algorithm was tested extensively and provides an efficient linear-in-β solver for DMFT and beyond.
These examples illustrate how the efficient Monte Carlo impurity solvers developed in the context of the Reseach Unit FOR 1346 allow to go beyond the previous stateof-the-art model and ab-initio calculations, and contribute to the goal of materials simulations with predictive power.