Complex Langevin dynamics and zeroes of the fermion determinant

QCD at nonzero baryon chemical potential suffers from the sign problem, due to the complex quark determinant. Complex Langevin dynamics can provide a solution, provided certain conditions are met. One of these conditions, holomorphicity of the Langevin drift, is absent in QCD since zeroes of the determinant result in a meromorphic drift. We first derive how poles in the drift affect the formal justification of the approach and then explore the various possibilities in simple models. The lessons from these are subsequently applied to both heavy dense QCD and full QCD, and we find that the results obtained show a consistent picture. We conclude that with careful monitoring, the method can be justified a posteriori, even in the presence of meromorphicity.


Introduction
The determination of the QCD phase diagram in the plane of temperature and baryon chemical potential is one of the outstanding open questions in the theory of the strong interaction, as it is relevant for the early Universe, ongoing heavy-ion collision experiments at the Large Hadron Collider and the Relativistic Heavy Ion Collider, nuclear matter and compact objects such as neutron stars. Ample progress has been made along (or close to) the temperature axis, where lattice QCD can be used to solve the theory numerically, and in recent years it has been possible to simulate QCD with 2 + 1 flavours of light quarks using physical quark masses while taking the continuum limit [1,2]. This is directly relevant for ultrahigh-energy heavy-ion collisions. The remainder of the phase diagram has not yet been established from first principles. As is well-known [3,4], at nonzero baryon chemical potential, the quark determinant in the standard representation of the QCD partition function is complex, rather than real and positive, ruling out the immediate use of standard numerical methods based on importance sampling. This is generally referred to as the sign problem.
There are various proposals available to circumvent the sign problem, see e.g. the reviews [4][5][6][7][8] and lecture notes [9]. One approach which has generated substantial attention in the past years is the complex Langevin (CL) method, since it has so far proved to be quite successful in simulating systems with a complex action S, or complex weight ρ, from simple toy models to QCD [10][11][12][13][14][15][16][17][18][19][20][21]. While the method was suggested already in the 1980s [22,23], recent progress has come in several ways: the theoretical justification has been provided [24,25] (see also Refs. [26,27] for related theoretical developments); numerical instabilities can be eliminated using adaptive stepsizes [28]; explicit demonstrations that the sign problem can be solved in spin models and field theories have been given, even when it is severe [11,13,29]; and finally, for nonabelian theories, controlling the dynamics via gauge cooling [15], possibly adaptive [30] (see also Ref. [31]), has been shown to be necessary and effective, resulting in the first results for full QCD [16,18,20,21,32]. Promising steps beyond gauge cooling have also been taken [33].
There is, however, a serious conceptual problem that has to be faced. It is by now quite well established that when the weight ρ ∼ exp(−S) is free from zeroes in the whole complexified configuration space, the only remaining worry is the possibility of slow decay of the distribution in the imaginary directions [24,25], which can result in incorrect convergence. However, for theories which include fermions, such as QCD, integrating out the latter yields a determinant which will always have zeroes for some complexified configurations. These zeroes lead to a meromorphic drift; the formal justification of the CL method [24,25] requires holomorphicity, however (this will be reviewed below), and poles may cause convergence to wrong results. The relevance of this was first pointed out by Mollgaard and Splittorff [34,35] in the context of a random matrix model and has been further investigated in Refs. [36][37][38][39]. Possible consequences for the behaviour of the spectrum of the Dirac operator [40] have been studied in random matrix theory [41], as has the interplay with gauge cooling [42].
This problem has both theoretical and practical aspects. Concerning the former, it requires a re-analysis of the derivation and justification of the method, given for the holomorphic case in Refs. [24,25]. To do so is the first aim of this paper and is the topic of Sec. 2. In practice, it has been observed in a number of papers that a meromorphic drift will not necessarily cause convergence to wrong results, sometimes without this issue being explicitly flagged up (one example being when the meromorphicity is due to the Haar measure). However, this aspect is not yet properly understood; while there is a collection of results for a variety of models, an overall understanding is lacking. In Sec. 3 we address this issue using simple models, in which a detailed understanding can be obtained. Lessons from this analysis are summarised in Sec. 4. In Sec. 5 we then move to a more intricate SU(3) model and see how the lessons apply in that context. Finally, in Sec. 6 we turn to lattice QCD, namely heavy dense QCD and full QCD, and compare our findings with the understanding developed previously. A discussion of the results obtained in the various models is contained in Sec. 7. We conclude that an overall consistent picture can be extracted, applicable across all models considered, and give guidance on how to tackle this problem in future simulations. The Appendices contain some additional material, including proposals on how to handle poles in the drift in special cases. We note that partial results have already been presented in Refs. [43][44][45].

Formal justification in the presence of poles
We briefly recall the basic principles of the CL method, adapting the results for its justification [24,25] to include a meromorphic drift, i.e. a drift with a pole.
Given a holomorphic action S, we denote by ρ the (normalised) complex density on the original real field space. For simplicity we assume here a flat configuration space, i.e. Rⁿ. A complex drift K(x + iy) is defined by analytic continuation as

K(x + iy) = ∇ρ(x + iy)/ρ(x + iy) = −∇S(x + iy). (2.2)

The CL equation, a stochastic differential equation in the complexified field space with the drift given by the real and imaginary parts of K,

ẋ = K_x + η_R, K_x ≡ Re K, ⟨η_R(t)η_R(t′)⟩ = 2N_R δ(t − t′), (2.3)
ẏ = K_y + η_I, K_y ≡ Im K, ⟨η_I(t)η_I(t′)⟩ = 2N_I δ(t − t′), (2.4)

leads to the Fokker-Planck equation describing the evolution of the (positive) probability density P(x, y; t),

Ṗ(x, y; t) = L^T P(x, y; t), (2.5)

with

L^T = ∇_x (N_R ∇_x − K_x) + ∇_y (N_I ∇_y − K_y),

where N_R − N_I = 1 and N_I ≥ 0. We used here 'complex noise' (N_I > 0) for presentation purposes; below we specialise to real noise (N_I = 0), as advocated earlier [24,25]. Averaging over the noise, the evolution of holomorphic observables O(x + iy) is governed by the equation

Ȯ(x + iy; t) = L O(x + iy; t) = L̃ O(x + iy; t), (2.8)

with

L̃ = [∇_z − (∇_z S(z))] ∇_z, (2.9)

where in the last step we used the Cauchy-Riemann equations, i.e. holomorphy of O(x + iy; t), and z = x + iy.
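As a concrete illustration of the CL equations (2.3)-(2.4) with real noise (N_I = 0), the sketch below applies a simple Euler discretisation to a Gaussian toy action S(z) = σz²/2 with complex σ. This model is an assumption made here purely for illustration (it is not one of the models discussed in this paper); for Re σ > 0 the process is known to converge and the exact result is ⟨z²⟩ = 1/σ.

```python
import random

def complex_langevin_gaussian(sigma, dt=0.01, n_steps=1_000_000,
                              n_burn=10_000, seed=1):
    """Euler discretisation of the CL equations with real noise (N_I = 0)
    for S(z) = sigma*z^2/2, so the holomorphic drift is K = -sigma*z.
    The noise term has variance 2*dt, matching <eta eta'> = 2 delta(t-t')."""
    random.seed(seed)
    z = 0.0 + 0.0j
    acc = 0.0 + 0.0j
    count = 0
    for step in range(n_steps):
        K = -sigma * z                          # drift K = -dS/dz
        eta = random.gauss(0.0, 1.0)            # real Gaussian noise
        z = z + K * dt + (2.0 * dt) ** 0.5 * eta
        if step >= n_burn:
            acc += z * z
            count += 1
    return acc / count

sigma = 1.0 + 0.5j
estimate = complex_langevin_gaussian(sigma)
exact = 1.0 / sigma                              # = 0.8 - 0.4i
```

Note that the noise acts on the real part only; the imaginary part evolves deterministically under K_y, as in Eq. (2.4) with N_I = 0.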
For holomorphic actions, partial integration requires care in the imaginary directions, |y| → ∞. Slow decay, for instance power-law decay in polynomial models, does not allow partial integration to be carried out for all holomorphic observables z^n without picking up contributions at the boundary. On the other hand, if the distribution is strictly localised in a strip in the complex configuration space, no boundary terms appear and the results from the CL simulation can be justified; see for instance Ref. [46] for an explicit example.
In the case of a meromorphic drift, the topic of this paper, we have to introduce two boundaries: one at large |y| and one near the location(s) of the pole(s), which we denote generically as z_p. Let us first reconsider Eq. (2.12): by definition we have

F(t, t) = ∫ P(x, y; 0) O(x + iy; t) dxdy. (2.15)

We may consider a single trajectory starting at (x, y) = (x₀, 0), which means choosing

P(x, y; 0) = δ(x − x₀) δ(y). (2.16)

We then find

F(t, t) = O(x₀; t). (2.17)

This is well defined. Furthermore, provided x₀ ≠ z_p, the time-evolved observable O(z; t) is holomorphic for z ≠ z_p. However, according to Eq. (2.8), we have to expect that O(z; t) has an essential singularity at z = z_p, since formally

O(z; t) = e^{tL̃} O(z; 0) = Σ_n (tⁿ/n!) L̃ⁿ O(z; 0),

and each term of the series will in general produce a pole of higher order. This is the first finding. Now let us look at Eq. (2.14): for simplicity we assume that there is only a single pole at z = z_p and consider a one-dimensional configuration space. Integration by parts can be used at first only for the domain

G = {(x, y) : |y| ≤ Y, |z − z_p| ≥ ǫ}, (2.19)

in which the dynamics is nonsingular; later we have to take the limits Y → ∞ and ǫ → 0. The first integral in Eq. (2.14) is of the form

∫_G O(x + iy; t) ∇·J(x, y; t) dxdy,

where J is the 'probability current'

J = ((N_R ∇_x − K_x) P, (N_I ∇_y − K_y) P).

Using the divergence theorem (Gauss's theorem) one finds that the first integral is equal to

∮_{∂G} O(x + iy; t) J(x, y; t)·n ds − ∫_G ∇O(x + iy; t) · J(x, y; t) dxdy, (2.23)

where ds stands for the line element of the boundary ∂G and n denotes the outer normal. The boundary has three disconnected pieces: two straight lines at y = ±Y and a circle at |z − z_p| = ǫ. We assume now, as usual, that P has sufficient decay such that the contributions from y = ±Y disappear for Y → ∞. Then the question remains whether the circle around z_p gives a nonvanishing or even divergent contribution. Numerically it has been found that always

P(x_p, y_p) = 0, (2.24)

and furthermore that P vanishes at least linearly with the distance from z_p, with some angular dependence.
But the expected essential singularity of the evolved observable O(x + iy; t) at z_p could lead to a finite or even divergent contribution as ǫ → 0. Numerically, however, we never found divergent behaviour, so presumably the boundary terms are finite. But they may be nonzero, spoiling the proof of correctness. This is the second finding: the appearance of boundary terms near the pole, similar to the ones that may appear at |y| = Y. Let us apply integration by parts a second time, now to the bulk integral over ∇O · J. Green's first identity (also a consequence of the divergence theorem) converts this into a bulk term, in which the derivatives act on the observable, plus further boundary terms. The discussion of the new boundary terms is almost identical to the one above; again, what happens depends on the detailed behaviour of O(x + iy; t) near z_p. In practice we found (numerically) no indication of any divergence caused by the existence of an essential singularity of O(x + iy; t). The reason for this seems to be that both P(x, y; t) and O(x + iy; t) have nontrivial angular dependence. In Sec. 3.2 we discuss a probably typical situation in which P(x, y; t) vanishes identically in two opposite quadrants near the pole. So if O(x + iy; t) shows strong growth only in those quadrants, the product may well be integrable, i.e. the boundary terms near the pole remain bounded.
To summarise, we find that the time-evolved observable will generically have an essential singularity at the pole, which, however, is counteracted by the vanishing distribution. Concerning the justification, partial integration at the boundaries now also includes integration around the pole, which requires the distribution to vanish rapidly enough for partial integration to be possible without picking up boundary terms. In the following section, we will study this first in simple models, focussing on the essential elements.

One-pole model
The simplest model of a system with a pole is given by the density on R

ρ(x) = (x − z_p)^{n_p} e^{−βx²}, (3.1)

where we take β real. When z_p is real, the weight is real as well, but the model has a sign problem for odd n_p, while for even n_p the zero in the distribution may potentially lead to problems with ergodicity. When z_p is complex, the weight is complex of course. The complex drift appearing in the Langevin process is given by

K(z) = ρ′(z)/ρ(z) = n_p/(z − z_p) − 2βz. (3.2)

While the original weight vanishes at z_p, the drift diverges and is hence meromorphic.
We will refer to this model as the "one-pole model". Special cases (with n_p = 1) were considered long ago [47,48], while recently this model has been studied again, in particular for a large range of values of n_p [37]. Our focus is somewhat different; we are mostly interested in the interplay between the location of the pole and the distribution and, for real z_p, in the difference between n_p = 1 and n_p = 2. This model captures the presence of a meromorphic drift in QCD in a very rudimentary way, as follows. Consider the QCD partition function for n_f degenerate flavours,

Z = ∫ DU e^{−S_{YM}} (det M(U; µ))^{n_f} = ∫ DU e^{−S_{YM}} Π_i λ_i^{n_f}(U), (3.4)

where in the last expression we have written the fermion determinant in terms of the eigenvalues of the Dirac operator, λ_i(U), which depend on the gauge field configuration, as indicated with the U dependence. The drift contributing to the update of link U will now have a contribution from the fermion determinant as

K ∼ n_f Σ_i Dλ_i(U)/λ_i(U),

where D denotes the derivative. When λ_i goes to zero (and the determinant vanishes), the drift has a pole. In the one-pole model, the complicated dependence of λ_i on U is replaced by a simple pole located at z_p, i.e.

n_f Σ_i Dλ_i(U)/λ_i(U) → n_p/(z − z_p).
In QCD, the links U are of course fluctuating and the dependence is considerably more complicated. The relation between the number of flavours (n f ) and the order of the zero (n p ) depends on details of the fermion determinant.
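The one-pole dynamics can be simulated directly. The sketch below assumes the density ρ(x) = (x − z_p)^{n_p} e^{−βx²}, with drift K(z) = n_p/(z − z_p) − 2βz, and runs complex Langevin with real noise for n_p = 2, z_p = i, β = 4.8, a parameter choice for which (as discussed below) the pole lies outside the equilibrium distribution and CL is expected to converge. For this case the exact value ⟨z⟩ = 2i/(2β − 1) follows from Gaussian moments.

```python
import random

def one_pole_cl(beta, n_p, z_p, dt=5e-4, n_steps=1_000_000,
                n_burn=50_000, seed=7):
    """Complex Langevin for rho(x) = (x - z_p)^n_p * exp(-beta*x^2):
    drift K(z) = n_p/(z - z_p) - 2*beta*z, real noise only (N_I = 0)."""
    random.seed(seed)
    z = 0.1j                        # start inside the strip 0 < y < y_-
    acc = 0.0 + 0.0j
    count = 0
    for step in range(n_steps):
        K = n_p / (z - z_p) - 2.0 * beta * z
        z = z + K * dt + (2.0 * dt) ** 0.5 * random.gauss(0.0, 1.0)
        if step >= n_burn:
            acc += z
            count += 1
    return acc / count

beta = 4.8
est = one_pole_cl(beta, n_p=2, z_p=1j)
exact = 2j / (2 * beta - 1)         # ≈ 0.2326i
```

Since the trajectory never approaches the pole for these parameters, a fixed stepsize suffices; for parameter sets where the distribution touches the pole, an adaptive stepsize would be advisable.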

Strips in the complex plane
To continue, we allow the location z_p of the pole in the drift to be complex in general and take β real and positive. The drift has fixed points (K(z) = 0) at

z_{1,2} = (z_p ± √(z_p² + 2n_p/β))/2,

and hence, for real or imaginary z_p, this yields:
(a) z_p = x_p real ⇒ both fixed points z_{1,2} are real and attractive;
(b) z_p = iy_p imaginary, y_p² < 2n_p/β ⇒ z_{1,2} complex: both fixed points are attractive;
(c) z_p = iy_p imaginary, y_p² > 2n_p/β ⇒ z_{1,2} imaginary: the fixed point closer to the real axis is attractive, the other one repulsive.
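The three cases can be checked numerically. A minimal sketch, assuming the drift K(z) = n_p/(z − z_p) − 2βz, so that K(z) = 0 is the quadratic equation 2βz² − 2βz_p z − n_p = 0:

```python
import cmath

def fixed_points(z_p, n_p, beta):
    """Solve K(z) = n_p/(z - z_p) - 2*beta*z = 0, i.e.
    2*beta*z^2 - 2*beta*z_p*z - n_p = 0."""
    disc = cmath.sqrt(z_p ** 2 + 2.0 * n_p / beta)
    return (z_p + disc) / 2.0, (z_p - disc) / 2.0

# case (b): z_p = i, y_p^2 = 1 < 2*n_p/beta = 2.5 -> complex pair
zb1, zb2 = fixed_points(1j, 2, 1.6)

# case (c): z_p = i, y_p^2 = 1 > 2*n_p/beta = 5/6 -> both purely imaginary;
# the one closer to the real axis is the attractive one
zc1, zc2 = fixed_points(1j, 2, 4.8)
```

For β = 4.8 the fixed point closer to the real axis comes out at z ≈ 0.296i, matching the boundary y_− of the strip quoted later in the text.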
In order to find where the pole lies with respect to the equilibrium distribution P(x, y), and to be able to discuss the three cases above (pole outside, on the edge of, or inside the distribution), we note the following. The drift in the imaginary direction is given by

K_y(x, y) = Im K(x + iy) = −n_p (y − y_p)/[(x − x_p)² + (y − y_p)²] − 2βy.

Without loss of generality we take y_p ≥ 0. Hence it immediately follows that the drift is pointing downwards when y > y_p and upwards when y < 0. In the case of real noise (which we use from now on), this implies that the equilibrium distribution will be nonzero only in the strip 0 < y < y_p [24,46]. Hence generically in this model the pole will be on the edge of the distribution. Moreover, since the distribution is strictly zero outside the strip, partial integration at y → ±∞ is not a problem and therefore this aspect of the justification is under complete control. Following the analysis of Refs. [24,46], we can in fact derive a stronger result. It follows from the FPE that the equilibrium distribution has to satisfy the condition

∫_{−∞}^{∞} dx K_y(x, y) P(x, y) = 0. (3.10)

Since P(x, y) ≥ 0, it follows that if K_y(x, y) has a definite sign as a function of x for given y, P(x, y) has to vanish for this y value. Following exactly the same steps as in Sec. 4.2 of Ref. [46], we find the following. As a function of x, K_y(x, y) has an extremum at x = x_p and the value at the extremum is given by

F(y) ≡ K_y(x_p, y) = −n_p/(y − y_p) − 2βy.

The zeroes of F(y), at

y_± = (y_p ± √(y_p² − 2n_p/β))/2,

determine the presence of additional boundaries at y_±, provided they are real [46]. We find that
a) y_p² < 2n_p/β: no additional boundaries;
b) y_p² > 2n_p/β: additional boundaries at y_±, with P(x, y) = 0 when y_− < y < y_+.

This situation is sketched in Fig. 1.

Figure 1. One-pole model: strips where the equilibrium distribution P(x, y) is nonzero. The pole is located at z_p = x_p + iy_p, with y_p > 0 (red square). Left: y_p² < 2n_p/β: P(x, y) > 0 when 0 < y < y_p and the pole is on the edge. Right: y_p² > 2n_p/β: P(x, y) > 0 when 0 < y < y_− and the pole is outside the distribution. The strip y_+ < y < y_p can be visited during the Langevin process, provided that the process is initialised at y > y_+, but will eventually be abandoned (transient).
In the latter case, no conclusion from this argument can be drawn regarding the strips 0 < y < y − and y + < y < y p . However, an additional analysis of the classical flow pattern shows that the strip 0 < y < y − is an attractor, while the strip y + < y < y p can only be visited when the process starts at y > y + . The drift inside this strip is pointing mostly towards y = y − and hence this region will eventually be abandoned. It will therefore at most be present as a transient.
We conclude that in this model the pole is either on the edge of (case a) or outside (case b) the distribution. In the following we address each of these possibilities.

Ergodicity and bottlenecks for real poles
We first discuss the real case, with z_p = x_p, since this allows us to introduce the concept of a 'bottleneck', which will turn out to be relevant also for the complex case. In this case, the distribution ρ(x) is real, but with a sign problem for odd n_p. As follows from the analysis above, both fixed points are attractive and the equilibrium distribution lies on the real axis. This is illustrated in Fig. 2 for β = z_p = 1 and n_p = 1, 2. We note that close to the pole, the drift is repulsive along the real direction and attractive along the imaginary direction; it is easy to see that this is true in general.
In the limit of continuous Langevin time, trajectories of a real Langevin process will not cross the poles [49]. This leads to a 'separation phenomenon', a point made some time ago [50]. In an actual simulation, because of the finite step size, crossing of the poles may happen (depending on the step size) [48]. It is instructive to look at the corresponding stationary Fokker-Planck equation (on the real axis),

∂_x [∂_x − K(x)] P(x) = 0.

Clearly P(x) ∼ ρ(x) is a solution, but wherever there is a sign problem, it cannot be the stationary probability distribution, since P(x) should be nonnegative. Instead we find two linearly independent, nonnegative solutions, obtained by restricting |ρ(x)| to either side of the pole,

P_±(x) = |ρ(x)| θ(±(x − x_p));

any linear combination of P_+ and P_− with nonnegative coefficients is likewise a possible long-time average. Hence the Fokker-Planck Hamiltonian has two ground states.
If the simulation manages to slip through the barrier sufficiently easily, we expect to get P_q(x) = P_+(x) + P_−(x) = |ρ(x)|, i.e. the phase-quenched model, as already found in Ref. [48]. We have verified this numerically for n_p = 1. One way to cross the bottleneck and facilitate tunneling through the pole is by adding a small amount of imaginary noise. However, the drift (3.2) is insensitive to sign changes in ρ and the phase-quenched result is again recovered. We conclude that the Langevin process cannot give correct results for odd n_p. For even n_p > 0, there is no sign problem, but the lack of ergodicity persists. In this case, because of the stronger repulsion away from the pole, our simulations typically do not cross the pole, and hence produce incorrect results when started on one side of the pole. Here, adding a small imaginary noise term does facilitate the crossing and leads to correct results. In conclusion, we find that zeroes in the distribution lead to a bottleneck and hence to ergodicity problems. Whether the zero is crossed depends on its order: the higher the order, the more difficult the crossing. We will see that the same holds in the complex case, even though there it is easier to go around the pole in the complex plane.
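The statement that both P_± are stationary can be verified numerically. A minimal sketch, assuming the one-pole density ρ(x) = (x − x_p)^{n_p} e^{−βx²} with the parameters of Fig. 2 (β = x_p = 1, n_p = 1): it checks by finite differences that the probability flux J(x) = P′(x) − K(x)P(x) vanishes for P = |ρ| away from the pole, on both sides, so that each one-sided restriction is a stationary solution.

```python
import math

beta, n_p, x_p = 1.0, 1, 1.0    # parameters of Fig. 2: beta = z_p = 1, n_p = 1

def P(x):
    """Candidate stationary distribution P = |rho|,
    rho(x) = (x - x_p)^n_p * exp(-beta*x^2)."""
    return abs(x - x_p) ** n_p * math.exp(-beta * x * x)

def K(x):
    """Real Langevin drift K = rho'/rho = n_p/(x - x_p) - 2*beta*x."""
    return n_p / (x - x_p) - 2.0 * beta * x

def flux(x, h=1e-5):
    """Probability flux J = P'(x) - K(x)*P(x); the stationary FPE
    d/dx [d/dx - K(x)] P(x) = 0 with decay at infinity requires J = 0."""
    dP = (P(x + h) - P(x - h)) / (2.0 * h)
    return dP - K(x) * P(x)

# the flux vanishes on both sides of the pole at x_p = 1
samples = [-1.5, -0.5, 0.3, 1.7, 2.4]
fluxes = [flux(x) for x in samples]
```

Since |ρ| = ±ρ on each side of the pole and ρ′ = Kρ, the flux vanishes identically there; the finite-difference check simply confirms this.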

Poles outside the distribution
We now consider the complex case and take n_p = 2, z_p = i (y_p = 1) and three β values: β = 1.6, 3.2 and 4.8. The relevant parameter determining the distribution is 2n_p/(βy_p²), which takes the values 5/2, 5/4 and 5/6 respectively. Hence for β = 4.8 the distribution is confined to the strip 0 < y < y_− ≈ 0.296 and the pole is outside the strip, while for the other β values the distribution touches the pole and 0 < y < y_p = 1. The classical flow diagrams are shown in Fig. 3, for β = 1.6 and 4.8. It is easy to see from the flow patterns that the general conclusions apply. For completeness, the corresponding thimbles are shown in Fig. 4. Here the full (blue) lines are the stable, contributing thimbles and the dashed lines are the unstable, noncontributing thimbles. We note that at β = 4.8 the unstable thimble for the lower fixed point is the stable thimble for the upper fixed point. At the lower β value the thimbles meet at the pole, while at the higher value the stable thimble avoids the pole, consistent with the Langevin analysis.
We first consider the case β = 4.8. The histogram for P(x, y) is shown in Fig. 5 and is confined to 0 < y < y_− ≃ 0.296, as it should be. The results for the observables z^n (n = 1, 2, 3, 4) from a complex Langevin simulation are shown in Table 1 and agree with the exact values, in line with the formal derivation. We hence state the following Proposition: If the drift is such that the equilibrium distribution is confined to a simply connected region not containing any poles of the drift, the complex Langevin process converges to the exact results.
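The exact values entering such comparisons can be obtained by direct numerical quadrature of the complex density along the real axis. A minimal sketch, assuming ρ(x) = (x − z_p)^{n_p} e^{−βx²}; for n_p = 2, z_p = i, β = 4.8 the Gaussian moments give ⟨z⟩ = 2i/(2β − 1) in closed form, which the quadrature reproduces:

```python
import math

def expval_zn(n, beta, n_p, z_p, x_max=10.0, n_grid=200_000):
    """<z^n> = ∫ x^n rho(x) dx / ∫ rho(x) dx, with the complex density
    rho(x) = (x - z_p)^n_p * exp(-beta*x^2), by trapezoidal quadrature."""
    h = 2.0 * x_max / n_grid
    num = 0.0 + 0.0j
    den = 0.0 + 0.0j
    for i in range(n_grid + 1):
        x = -x_max + i * h
        w = 1.0 if 0 < i < n_grid else 0.5     # trapezoid endpoint weights
        rho = (x - z_p) ** n_p * math.exp(-beta * x * x)
        den += w * rho
        num += w * rho * x ** n
    return num / den

beta = 4.8
mean_z = expval_zn(1, beta, n_p=2, z_p=1j)
exact = 2j / (2 * beta - 1)
```

Note that the expectation value is complex even though the integration runs over the real axis, since the weight itself is complex.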

Poles on the edge of the distribution
We now turn to β = 1.6 and 3.2, with the pole at the edge of the distribution. The corresponding histograms for P (x, y) are shown in Fig. 6 and the results for z n are listed in Table 1. Here we note that the Langevin results for β = 1.6 are wrong, while the results for β = 3.2 appear to be correct (within the error). To understand this better we employ two methods.
First we note that the histograms look quite different. At β = 1.6 the distribution is nonzero very close to the pole, which one expects to yield boundary terms in the formal justification, invalidating the outcome. On the other hand, at β = 3.2 the distribution is peaked predominantly away from y = 1 and the pole appears to be avoided. We make this more precise by computing the partially integrated distribution

P_y(y) = ∫ dx P(x, y).

The results are shown in Fig. 7 on a linear scale (left) and on a logarithmic scale (right). On a linear scale it is easy to see that at β = 1.6, P_y(y) is nonzero up to y = 1 and goes to zero only linearly (at the pole itself the distribution is of course zero). Based on the formal justification, we conclude that this slow decay invalidates the applicability of the approach. On the other hand, at β = 3.2 the distribution appears to drop exponentially in an extended interval 0.5 < y ≲ 1, possibly with two exponentials. Hence expectation values of polynomials z^n can be computed safely, as illustrated in Table 1.
For β = 1.6, it can be seen that there is a nonvanishing boundary term around the pole. Instead of a small circle surrounding the pole at z = i, we may consider a horizontal line y = 1 − ǫ approaching the pole for ǫ → 0. Then the boundary term in Eq. (2.23) becomes (for N_I = 0) an integral of K_y P times the observable along this line. (3.17) The smooth terms can be replaced by their values at ǫ = 0, and a boundary term arises because

lim_{ǫ→0} ∫ K_y(x, 1 − ǫ) P(x, 1 − ǫ) dx ≠ 0.

To study this further, consider the partially integrated functions Q_k(x) = ∫ dy e^{−ky} P(x, y). These functions are related to the expectation values of exp(ikz) by

⟨exp(ikz)⟩ = ∫ dx e^{ikx} Q_k(x).

In Fig. 8 we show the functions log Q_k(x) for integer values k = 0, −1, . . . , −12. In all three cases the shape of the functions seems to stabilise with growing |k|, whereas there is an approximately constant shift upwards with |k|. This suggests the following asymptotic behaviour,

Q_k(x) ∼ e^{c|k|} f(x),

with some constant c > 0, so

⟨exp(ik(x + iy))⟩ ∼ e^{c|k|} ∫ dx f(x) e^{ikx} = e^{c|k|} f̃(k). (3.23)

How can this remain bounded for |k| → ∞? The only possibility is that the Fourier transform f̃(k) decays exponentially; this will be the case if f extends to a function f(x + iy) analytic in a strip |y| < const. In particular, f(x) has to be smooth. Looking at Fig. 8 one can clearly see that for β = 1.6, f(x) develops a kink, whereas in the other two cases it at least appears to be smooth and the effect of the pole appears to be negligible. Hence we may conclude that the incorrect convergence is due to the failure of the bound (3.19).
To summarise the findings in the one-pole model: if close to the pole the distribution drops to zero fast enough, e.g. exponentially in the case considered here, the meromorphicity of the Langevin drift is not necessarily an obstacle and correct results can still be obtained. When, on the other hand, the distribution does not fall off rapidly at the pole, incorrect convergence is observed.

U(1) one-link model
In order to analyse what happens when poles are inside the distribution, we switch to the following U(1) integral with a complex weight,

Z = ∫_{−π}^{π} (dx/2π) e^{β cos x} D^{n_p}(x; µ), D(x; µ) = 1 + κ cos(x − iµ).

This model was introduced in Ref. [10] (for n_p = 1) as a toy model for QCD, with a complex 'fermion determinant' satisfying [D(x; µ)]* = D(x; −µ*). Complex Langevin dynamics was studied extensively in Ref. [10] for κ < 1, while problems for κ > 1 were first reported in Ref. [34]. Subsequently thimbles were analysed in Ref. [52]. When κ < 1 the weight is positive at µ = 0, while for κ > 1 there is already a sign problem at µ = 0. Concerning Langevin dynamics, we note that good results are obtained when κ < 1 (and k not too large and negative), while problems emerge for κ > 1 and β not too large [34,52]. It should be noted that, in view of the later sections, even values of n_p ≥ 2 can be physical, as the QCD determinant has double zeroes when the Wilson fermion formulation is used.
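A minimal numerical sketch of this model, assuming the weight ρ(x) = e^{β cos x} D^{n_p}(x; µ) with D(x; µ) = 1 + κ cos(x − iµ) (our reading of the weight; the symmetry [D(x; µ)]* = D(x; −µ*) quoted above holds for this form). It also locates the zeroes of D in the complexified plane: for κ = 2, µ = 1 they sit at z = ±2π/3 + iµ, since D = 0 on the line Im z = µ reduces to cos(x) = −1/κ.

```python
import cmath, math

kappa, mu = 2.0, 1.0

def D(z, mu=mu, kappa=kappa):
    """'Fermion determinant' of the one-link model: D(z; mu) = 1 + kappa*cos(z - i*mu)."""
    return 1.0 + kappa * cmath.cos(z - 1j * mu)

# symmetry [D(x; mu)]* = D(x; -mu*) for real x and real mu
x = 0.7
sym_violation = abs(D(x, mu).conjugate() - D(x, -mu))

# zero of D in the complexified configuration space:
# D = 0 <=> cos(z - i*mu) = -1/kappa; on Im z = mu this is cos(x) = -1/2
z_pole = 2.0 * math.pi / 3.0 + 1j * mu
residual = abs(D(z_pole))
```

For κ < 1 the equation cos(x) = −1/κ has no solution on Im z = µ with real x, consistent with the absence of a sign problem at µ = 0 in that regime.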
Results of CL dynamics for the observables e^{ikz} (k = ±1, . . . , ±5) are shown in Fig. 9. For set (1), we observe good results, except when k is large and negative, k = −4, −5. For those values, fluctuations are large and increasing the simulation time does not improve this, a sign of poor convergence or no convergence at all. For set (2), excellent agreement with the exact results is obtained. For set (3), we observe agreement for large and positive k, but increasingly worse behaviour as k is reduced. The results for k = −4, −5 have larger errors, but the values of the averages are robust as the Langevin time is increased; hence here we find incorrect convergence. Since for our choice of parameters the poles are located at y_p > 0, we note that exponentials with k > 0 (k < 0) will be less (more) sensitive to the presence of the poles, as a suppression (enhancement) with e^{−ky} (e^{ky}) arises naturally. This is indeed supported by the data.
In the following we focus on the case where κ > 1 and β ≲ 1, since this is where complex Langevin dynamics converges, but possibly to an incorrect result. Moreover, we will compare n_p = 1, 2 and 4.

Poles inside the distribution
We consider parameter set (3), with β = 0.3, κ = 2, µ = 1 and n_p = 1, 2 and 4. Classical flow diagrams are given in Fig. 10 for n_p = 1, 2 (note the periodicity in x). Besides the attractive 'perturbative' fixed point at x = 0, there is an additional attractive fixed point at x = ±π. The other two fixed points are repulsive. It is easy to see from the flow diagrams, and can be confirmed by an analysis similar to the one above, that the equilibrium distribution will be contained in a horizontal strip between the two attractive fixed points. Finally, the pole is attractive in the imaginary direction and repulsive in the real direction (as always), making the pole an approximate bottleneck, just as in the real case considered in Sec. 3.1.2. Hence, as the attractive fixed points move closer together in the imaginary direction, the distribution gets narrower and narrower.
In Fig. 11 we show logarithmic contour plots of the equilibrium distribution sampled during the CL process in the complex plane for n p = 1 (left) and 2 (right). Note that the darker colours correspond to the most frequently visited regions. The position of the pole can clearly be identified as the place where the distribution is pinched, resulting in a bottleneck; this effect gets stronger with increasing n p . The distribution is strictly zero outside the strip set by the attractive fixed points.
To better understand this structure, we note that the approximately disconnected regions (i.e. the 'head' and the 'ears' in Fig. 11) are characterised by the sign of the real part of the determinant,

Re D ≷ 0, (3.27)

and hence we will refer to them as G_±, with G_+ the 'head' and G_− the 'ears'. For n_p = 1, we observed frequent crossings between the two regions. For n_p = 2, the crossings are rarer but still frequent enough that both regions are visited during long runs. This might, however, be due to the finite time step: in the continuous-time limit it is possible that the two regions are not connected by the process, i.e. the process might not be ergodic. Of course, rare crossings make it hard to collect good statistics. For n_p = 4 (not shown) no crossings were observed and the distribution only has support in G_+. To translate these findings to an observable easily accessible also in more complicated models and lattice theories, we consider the complex determinant. Logarithmic contour plots of D are shown in Fig. 12. We observe a similar structure, with the zero of D acting as the bottleneck. We will use this diagnostic in the more complicated models discussed below.

Figure 11. Logarithmic contour plots of the distribution in the xy plane, for β = 0.3, κ = 2, µ = 1, n_p = 1 (left) and n_p = 2 (right).
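The Re D ≷ 0 diagnostic is straightforward to implement on recorded trajectories. A sketch, using the same assumed form of the determinant as before, D(z; µ) = 1 + κ cos(z − iµ), and two hand-picked sample points chosen purely for illustration:

```python
import cmath, math

def region(z, kappa=2.0, mu=1.0):
    """Classify a configuration by the sign of Re D:
    'G+' (the 'head') or 'G-' (the 'ears')."""
    D = 1.0 + kappa * cmath.cos(z - 1j * mu)
    return 'G+' if D.real > 0.0 else 'G-'

# illustrative sample points: near x = 0 the real part of D is large
# and positive ('head'); near x = ±pi it is negative ('ears')
head = region(0.0 + 0.5j)
ear = region(math.pi + 1.0j)
```

In a simulation one would apply this classification to every recorded configuration and accumulate observables separately in the two regions.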
In view of the formal justification, see Sec. 2, it is important to know the rate at which the distribution goes to zero at the pole. This is shown in Fig. 13 for the partially integrated distribution P_x(x) (left) and the real part of D (right). For n_p = 1 we observe a linear decrease at the pole (recall that x_p = ±2π/3), while for n_p = 2 the decay is faster. For n_p = 4, the pole is not crossed and the entire dynamics takes place in G_+. Since the pole does not negatively influence the dynamics in this case, we expect good agreement with the exact results, although there may be problems with ergodicity, similar to the real case. This is demonstrated in Fig. 14, where Re⟨e^{ikz}⟩ is shown on a logarithmic scale, for 10 values of k. For n_p = 2 we find approximate agreement, especially for k close to 0. For n_p = 4, good agreement is seen for all k values considered. This is consistent with the formal derivation: for n_p = 4 the pole is avoided and only the region sufficiently far from D = 0 is relevant. Finally, we stress once more that further support for the validity of the formal arguments comes from the observed interplay between the observables and the pole: it is possible that for some observables good agreement is found, while for others it is not. This crucially depends on the region in configuration space most relevant for the observable under consideration, as exemplified in this model by the observables e^{ikz}, with k ≷ 0.

What does the CL simulation actually compute?
In order to further understand the relevance of the contributions from the nearly disconnected regions G ± , we have analysed the results from Langevin simulations for e ikz by separating the trajectories based on the sign of the real part of D. The results are summarised in Table 2 in the columns labelled CL[G ± ]. We note that the results obtained when restricted to G + are close to the exact results, listed in the first column, but not quite equal. We can understand this as follows: first we shift the contour of integration of the original integrals to pass through the zeroes of ρ(z); for set (3) this means Im z = µ. Next we split the integration into the two contributions coming from the two inequivalent paths connecting the zeroes, one living in G + and the other in G − , and define restricted partition functions Z ± and expectation values accordingly. The exact results, restricted to G ± , are shown in Table 2 in the columns labelled exact[G ± ]. The agreement between the restricted Langevin and exact results is convincing. This should not be surprising, since the formal proof of correctness provided earlier is directly applicable to the model restricted to G + or G − .

Table 2. Re e ikz for several values of k, when restricted to G ± (Re D ≷ 0), for β = 0.3, κ = 2, µ = 1, with n p = 2, comparing complex Langevin (CL) and exact results.

Since the exact values for the full model can be obtained as the weighted combination of the restricted results, with weights w ± = Z ± /Z (Z = Z + + Z − ), a way to obtain the correct results would be to combine the restricted simulation results with these weights. Note that since Z − /Z + ≃ 0.0281 ≪ 1, the deviation between the full results and those restricted to G + is on the order of a few percent as well, as illustrated in Table 2.
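In code, this reweighting of restricted results is elementary; the sketch below combines placeholder restricted expectation values with the weights w ± = Z ± /Z, using the quoted ratio Z − /Z + ≃ 0.0281 (the restricted values themselves are illustrative, not simulation data):

```python
# Combining expectation values restricted to G+ and G- with weights
# w± = Z±/Z (Z = Z+ + Z-).  The ratio Z-/Z+ ≈ 0.0281 is the value
# quoted in the text; the restricted expectation values below are
# placeholders, not simulation data.
def combine_restricted(O_plus, O_minus, ratio):
    """Full <O> from restricted results; ratio = Z-/Z+."""
    w_plus = 1.0 / (1.0 + ratio)
    w_minus = ratio / (1.0 + ratio)
    return w_plus * O_plus + w_minus * O_minus

ratio = 0.0281
O_full = combine_restricted(1.0, 0.5, ratio)
# deviation of the G+-restricted result from the full one: a few percent
deviation = abs(1.0 - O_full) / abs(O_full)
```

With restricted values differing at O(1), the suppressed weight w − translates directly into the few-percent deviation quoted above.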
The problem with this prescription is of course that in realistic models the weights are not known. However, below we will see that typically w − is tiny and can be approximated by zero; here it is nonnegligible because we chose the rather extreme value κ = 2.
In the case of n p = 4, the process never crosses into G − , which indicates a lack of ergodicity, similar to what was found in Sec. 3.1.2. The process simulates a version of the original integral restricted to a path running between the zeroes, which is not quite equal to the full integral. This causes a tiny systematic error which is, however, not visible in the data since it is highly suppressed; for n p = 4, Z − /Z + ≃ 0.00302 ≪ 1.

Lessons from simple models
The following lessons can be learned from the simple one-variable models.
Lesson 1: It has been suggested [34,35] that the winding of the Langevin paths around the pole is the source of the problem, because the pole corresponds to a logarithmic branch point in the action. However, in the one-pole model of Sec. 3.1 we have demonstrated explicitly that no such winding occurs, since the pole lies either on the edge of or outside the distribution. Nevertheless, wrong results can be encountered. Further indication that it is not the winding which matters has been given in Ref. [37]; see also Sec. 6 for the case of full QCD.
Lesson 2: It has been said (see for instance Ref. [37]) that it is sufficient for correctness if the distribution P is 'practically zero' at the pole. Using again evidence from the one-pole model, we note that this is not correct in general: for the small value β = 1.6 wrong results are obtained, even though P (x, y) vanishes at the pole. On the other hand, we have shown (and demonstrated numerically for β = 4.8) that it is sufficient for P to be nonzero only in a simply connected region whose closure does not contain the pole(s). The intermediate case β = 3.2 appears to have a distribution P vanishing to very high (possibly infinite) order at the pole, also leading to good results. All this can be understood in the light of the fact discussed in Sec. 2: the observables evolving according to Eq. (2.8) typically develop an essential singularity at the location of the pole of the drift.
Lesson 3: A strong attractive fixed point sufficiently far from any poles of the drift leads to correct results. This almost obvious fact has been observed already earlier, e.g. in QCD with static quarks [15].
Lesson 4: The existence of a 'bottleneck' between two regions G + and G − , such as in the U(1) one-link model of Sec. 3.2, is a signal for potential trouble. The best variable to analyse this is the determinant D (not raised to any power), because it can also be used in more complicated lattice models, as we will see below.
Lesson 5: It is possible that the relative weight of one of the two regions is suppressed, i.e. w − ≪ w + . Then a modification of the process which includes only trajectories with Re D > 0, i.e. those contained in G + , or the use of long runs such that the weight of runs in G − is naturally suppressed, seems to produce reasonably good results. On closer inspection, however, this only gives approximate results, since only one part of the original complex integral is represented, namely the part contained in G + . If indeed w − ≪ w + , this may nevertheless give a numerically accurate approximation to the complete problem.
Lesson 6: The effect of increasing the strength of the pole by increasing n p is twofold: On the one hand the 'pull' in the imaginary directions towards the pole is increased, which is bad; on the other hand the 'push' in the real directions away from the pole is strengthened, which is good.
In the one-pole model, with the pole on the imaginary axis, the first effect dominates: hence increasing n p makes the situation worse. We have checked that for n p = 2, to obtain correct results, a larger value of β is needed than for n p = 1.
For parameter set (3) in the U(1) model the second effect dominates: increasing n p makes the bottleneck between the two regions G + and G − narrower and inhibits transitions between the two regions; furthermore it reduces the relative weight of G − . For n p = 1 this bottleneck does not prevent the process from moving between the two regions; for n p = 2, transitions are already rarer and it seems that each of the regions around the two attractive fixed points supports an invariant measure by itself; for n p = 4 no transitions are observed even for extremely long runs. It should be noted that in lattice QCD with N f flavours of Wilson fermions the power n p is at least 2N f .
Lesson 7: The interplay between an observable and the distribution determines how close the expectation value of the former is to the correct one: if the observable is naturally suppressed/enhanced near the pole, it is possible to obtain, within the numerical error, correct/manifestly incorrect results. This explains why one can encounter both apparently correct and manifestly incorrect expectation values in a single analysis.
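Several of these lessons can be reproduced in a few lines. The sketch below runs complex Langevin for a one-pole model of the type discussed in Lessons 2 and 6, with the assumed weight ρ(z) = (z − z p ) n p e −βz² (pole at z p on the imaginary axis); the precise form and parameter values are illustrative, not taken from the text:

```python
import numpy as np

# Complex Langevin for a one-pole model with assumed weight
#   rho(z) = (z - z_p)^{n_p} exp(-beta z^2),
# so that the drift K(z) = n_p/(z - z_p) - 2 beta z has a pole of
# strength n_p at z_p.  Parameters are illustrative only.
def cl_trajectory(beta, z_p, n_p, eps=1e-4, n_steps=100_000, seed=1):
    rng = np.random.default_rng(seed)
    z = 1.0 + 0.0j                       # start away from the pole
    traj = np.empty(n_steps, dtype=complex)
    for i in range(n_steps):
        drift = n_p / (z - z_p) - 2.0 * beta * z
        z = z + eps * drift + np.sqrt(2.0 * eps) * rng.standard_normal()
        traj[i] = z
    return traj

# large beta: the distribution stays well separated from the pole
traj = cl_trajectory(beta=4.8, z_p=1j, n_p=1)
d_min = np.min(np.abs(traj - 1j))        # closest approach to the pole
```

Monitoring d_min, the distance of closest approach to the pole, mirrors the diagnostics used throughout this section.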
We will now take these lessons and see how they apply to more realistic models.

Effective SU(3) one-link model
In the following section we investigate the role of the zeroes and the ensuing lessons in a system with more degrees of freedom, which is however still exactly solvable, namely an effective SU(3) one-link model. Versions of this model have been considered before, see e.g. Refs. [10,14]. Here, the form of the model and the choice of parameters is motivated by QCD with heavy quarks (HDQCD), to be discussed in Sec. 6. The starting point is QCD with N f flavours of Wilson fermions. At leading order in the hopping expansion, the fermion determinant can be expressed as a product of factors involving Polyakov loops at each spatial site, see Sec. 6 below, where the remaining determinant is in colour space only. The parameters C = (2κe µ ) Nτ and C̄ = (2κe −µ ) Nτ arise from the hopping expansion, with N τ the number of time slices in the temporal direction. Employing the temporal gauge, the product of local factors is equivalent to having only one temporal link in each factor. Using standard relations, valid for both SU(N c ) and SL(N c , C), the remaining determinants can be expressed in terms of the traced Polyakov loops P = tr U/N c and P ′ = tr U −1 /N c . Explicitly, for N c = 2 this gives det(1 1 + CU) = 1 + 2CP + C 2 , and for N c = 3, det(1 1 + CU) = 1 + 3CP + 3C 2 P ′ + C 3 . For larger N c the relations become more complicated, but the determinant always includes a C Nc term which dominates at large µ (making the sign problem increasingly harmless towards the saturation regime). Notice that for SU(N c ), |P x |, |P ′ x | ≤ 1. In the following we concentrate on the N c = 3 case.

Effective one-link model for HDQCD
To define an effective model for HDQCD in four dimensions we consider the resulting fermion determinant on a single spatial lattice site, such that P = tr U/3 and P ′ = tr U −1 /3 are the only degrees of freedom. Here U is the remaining temporal link in the temporal gauge. To approximate the Yang-Mills integration of the lattice model we consider the temporal link U surrounded by its neighbours, see Fig. 15, and replace the contributions from the staples connected to U by a single matrix A. There are various ways to proceed [14]. Here we diagonalise U, with eigenvalues e iw k (Σ k w k = 0, k = 1, 2, 3). The group integral then includes the reduced Haar measure H(w) = ∏ k<l sin 2 ((w k − w l )/2). The complete one-link action combines the gauge and fermion contributions with this measure, where the diagonal elements of A are represented by the α k 's (where we took out a factor of 6). The Langevin drift is determined by K = −∇S and complex Langevin dynamics can be implemented for all three w k 's or after eliminating the constraint Σ k w k = 0 [14]. Zeroes in the Haar measure also lead to poles in the drift, but these generally do not lead to problems and, in fact, stabilise the dynamics; this has been discussed in Ref. [14]. More details concerning the distribution of zeroes of det M are given in App. C.
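The stabilising poles from the Haar measure can be made concrete: assuming the standard SU(3) reduced Haar measure H(w) = ∏ k<l sin 2 ((w k − w l )/2), the corresponding drift contribution is a sum of cotangents, with poles where two eigenvalues coincide. A short numerical cross-check:

```python
import numpy as np

# Drift contribution from the reduced Haar measure for diagonalised U,
# assuming the standard SU(3) form H(w) = prod_{k<l} sin^2((w_k - w_l)/2):
#   d/dw_k ln H(w) = sum_{l != k} cot((w_k - w_l)/2),
# with (stabilising) poles where two eigenvalues coincide.
def log_haar(w):
    return sum(np.log(np.sin((w[k] - w[l]) / 2.0) ** 2)
               for k in range(3) for l in range(k + 1, 3))

def haar_drift(w, k):
    return sum(1.0 / np.tan((w[k] - w[l]) / 2.0) for l in range(3) if l != k)

# cross-check the analytic drift against a central finite difference
w = np.array([0.7, -0.4, -0.3])          # obeys the constraint sum w_k = 0
h = 1e-6
e0 = np.array([1.0, 0.0, 0.0])
fd = (log_haar(w + h * e0) - log_haar(w - h * e0)) / (2.0 * h)
```

The cotangent terms repel the eigenvalues from one another, which is the stabilising effect referred to above.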
As observables we consider O ±n = tr U ±n . Exact results are obtained by numerically integrating over the angles w k ; when the action is real, the expectation values of O +n and O −n coincide. In order to determine reasonable parameter values, relevant for HDQCD, we write the fermion determinant in terms of the parameters a and b of Eq. (5.13), with the critical chemical potential for onset at zero temperature, µ 0 c = − ln(2κ), (5.16) i.e. the chemical potential at which the density changes from zero to nonzero [19]. This is illustrated in Fig. 16 (left), where the µ dependence of a and b is shown for given N τ and κ. With increasing κ, µ 0 c decreases and the peaks shift to the left, while with increasing N τ the peaks become narrower. We also note that the (anti-quark) contributionD becomes increasingly irrelevant as N τ increases, asC becomes exponentially small. Hence we will usually neglectD. In the following we use N τ = 8, unless stated otherwise, and N f = 1. Since in the one-link model there is no transition as β is varied at µ = 0, see Fig. 16 (right), we choose to work at β = 0.25, but we have also studied larger β values. We considered two κ values, κ = 0.120, µ 0 c = 1.427, and κ = 0.145, µ 0 c = 1.238, (5.17) where µ 0 c is the critical µ value (5.16) at which C = 1. The sign problem is (nearly) absent exactly at onset, where C = 1, a = b = 3/2, and D in Eq. (5.13) is real (D is exponentially close to 1). This will explain some of the results below and has been noted before [53,54]. The behaviour for the two κ values is rather similar; therefore we shall only show the results for κ = 0.120.

Ordered lattice
We first consider the ordered lattice (α k = 0). Fig. 17 contains results for the observables O ±n (n = 1, 2, 3), averaged over 100 trajectories, using random starting points. The runs are relatively short: the total Langevin time is around 130, with 20% thermalisation. Note that O +n and O −n are typically rather close together. We see very good agreement, except around µ ≃ µ 0 c = 1.427. The same behaviour is found at the larger κ = 0.145. We hence focus on three µ values: µ = 1.375 (below onset, CL fine), 1.425 (close to onset, CL problematic), and 1.475 (above onset, CL fine). In Fig. 18 we show results for each of these µ values, using 50 independent, relatively short, trajectories. The figures on the left show the observables against the trajectory index. When CL is fine, all trajectories fluctuate around the exact result. However, when CL is problematic (middle figure), the trajectories appear to split into two groups, indicated by the red and blue symbols. We identify these using the minimal absolute value of the determinant along the trajectory, d min = min t | det M|, compared to a threshold d c . To investigate these two types of trajectories further, we show in Fig. 18 (right) the corresponding scatter plots for the determinant. When CL works well, the points from all trajectories appear similarly distributed, even when d min gets very small. At the middle µ value of µ = 1.425 a different picture appears: the trajectories of the first group (d min > d c ) give a picture similar to that at the lower and higher µ values (the "red fish"), while the second group (d min < d c ) yields a very peculiar structure (the "blue whiskers"). The appearance of two essentially disjoint contributions in the determinant is very similar to what was observed in the U(1) one-link model. A red/blue code for identifying the disjoint ("regular"/"deviant") contributions is used in Figs. 18, 19, 22, and explained in the captions.
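The trajectory classification by d min described above amounts to a simple threshold test; a minimal sketch with synthetic determinant data (the threshold d c and the data are placeholders):

```python
import numpy as np

# Split trajectories into "regular" and "deviant" groups by the minimal
# |det| reached along each trajectory, as described above.  The
# determinant data are synthetic placeholders; the text finds that the
# threshold d_c needs no tuning (any value in 10^-8 .. 10^-5 works).
def classify(trajectories, d_c=1e-6):
    regular, deviant = [], []
    for dets in trajectories:
        d_min = np.min(np.abs(dets))
        (regular if d_min > d_c else deviant).append(dets)
    return regular, deviant

rng = np.random.default_rng(0)
# three trajectories staying near |det| ~ 1, one dipping to 1e-8
safe = [1.0 + 0.1 * rng.standard_normal(100) + 0j for _ in range(3)]
risky = [np.concatenate([safe[0], np.array([1e-8 + 0j])])]
regular, deviant = classify(safe + risky)
```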
In the scatter plot we showed results for the determinant det M = (DD) 2 which enters in the determination of the drift. More information, however, is provided by the unsquared factors DD ≃ D. In Fig. 19 (bottom left) we show the scatter plot of these unsquared factors at µ = 1.425. The "blue whiskers" have Re D < 0 and come from trajectories which approach the pole (zero of the determinant) with d min < d c . As in the simple models, a bottleneck separates them from the region with Re D > 0. Moreover, these contributions have a very small weight. Depending on the starting configuration, trajectories may run for a while in the region with negative Re D, before switching to the positive side. The contribution from the "whiskers" therefore practically fades out after enough thermalisation: already after t ≃ 1000 all configurations appear in the region Re D > 0, see Fig. 19 (top). Some of the results are summarised in Table 3.
Similarly to Sec. 3.2.2, we define here partition functions and weights restricted to the subsectors, namely Z ± = ∫ Re D≷0 DU ρ(U) and w ± = Z ± /Z, where ρ(U) is the original complex distribution. Table 3 also contains an estimate of the fraction of time spent in the region with Re D < 0, denoted by p − . It should be noted that p − depends on the details of transient behaviour and crossings, and is hence not immediately related to w ± = Z ± /Z. The exact relative weight of the region Re D < 0 is easily found and is O (10 −4 ). We find that the process, once arrived in the positive region, only rarely visits the negative region, and typically only very briefly. Hence random starts with Re D < 0 give that region an artificially large weight, and a judicious choice of the starting configuration will reduce the necessity of long thermalisation times.

Table 3. Simulation results at β = 0.25, κ = 0.12, N τ = 8 and µ = 1.375, 1.425, in the ordered case, from all trajectories (CL), from trajectories with d min > d c (CLD), and from all trajectories after dropping the points with Re D < 0 (CL+), using 100 or 50 trajectories with varying length of Langevin time t = t therm. + t meas. . Errors are not indicated but are at the permille level. Imaginary parts are zero within the error. Also indicated are exact results: full, restricted to Re D ≷ 0, and phase quenched (pq). w − is the relative weight of the Re D < 0 region in the partition function; for the simulation p − is given instead, estimated via the proportion of Re D < 0 points.

These findings suggest that it might be useful to discard trajectories with d min < d c and thus sample the Re D > 0 region only, to ensure nearly correct convergence. We find that the value of d c does not need tuning: in the above case any value between 10 −8 and 10 −5 is acceptable, see Fig. 20 (left). Alternatively, one can keep all trajectories but drop the contributions from configurations with Re D < 0, so as not to lose statistics. This is demonstrated in Fig. 20 (right). Keeping only trajectories with d min > d c leads to similar results. As already suggested by Fig. 18, the signal for deviant contributions also shows up in the observables, as can be seen in Fig. 19 (right). In this long and rather representative run, only three trajectories out of fifty contain configurations with Re D < 0, which lead to outlying contributions in the observables. These few contributions falsify the averages by nearly 1%, while the average over the other trajectories reproduces the exact result within the error (∼ 0.1%). The irregular signal is clear in the scatter plot of the observables, which also suggests that the deviant contributions are related to certain initial configurations, such that their weight will diminish in long runs.
The above effects become evident by plotting the correlation between the determinant and various other quantities. In Fig. 21 we show the histograms of the probability distributions of Re D and of one selected observable, O 2 = tr U 2 (top). The scatter plots (middle, bottom) show a clear correlation between Re D and the unitarity norm tr(U † U − 1 1), the drift, and O 2 .

Disordered lattice
Next we consider a disordered lattice, i.e. we take into account the nontrivial effect of the neighbours when the lattice theory is reduced to a one-link model, see Eq. (5.10). We take α k ≠ 0 and choose the values given at the end of Sec. 5.1. Fig. 22 demonstrates the behaviour of the unsquared determinant and of O 1 , see also Table 4, which is very similar to the ordered case. We find that trajectories starting with Re D < 0 ("blue whiskers") need much more time to switch to the region with Re D > 0 ("red fish"). Nevertheless, the weight of the former is only about 0.001 − 0.05, and discarding the contributions with Re D < 0 decisively improves the results. The results, however, deteriorate with increasing lattice disorder, which may indicate the effect of large excursions in the noncompact directions, for which the adaptive stepsize and gauge cooling become essential. Since this was studied in previous papers, we do not analyse this problem any further here.

Table 4. As in the previous Table,

Expansion
Finally we study the possibility to ameliorate the dynamics using an expansion of the determinant, which is also discussed in App. D for the simple models. We restrict ourselves here to the ordered case. The fermionic part of the drift involves the inverse of the determinant. Neglecting the factor 1 + C 3 , which cancels in the drift, we write D ∝ 1 + X, and similarly forD. The pole is at X = −1. We then write a Taylor expansion of 1/(1 + X) centred at the shifted point λ, 1/(1 + X) = Σ n (−1) n (X − λ) n /(1 + λ) n+1 , and again similarly forD, with parameterλ. Notice that the λ used here differs by one unit from the D 0 used in App. D. Since the pole is at X = −1, the expansion around X = λ with conveniently chosen λ has an increased radius of convergence compared to the expansion centred at X = 0, see Fig. 23.

Figure 23. Complex X plane, with the pole at X = −1. The expansion around X = λ has a larger radius of convergence than around X = 0.
The "regularisation parameters" λ andλ can be chosen conveniently, and can also be adapted during the simulation (dynamical analytic continuation, see App. D). We note here that this procedure can also be used for inverting matrices 1 1 + X, where a simple choice for the regularisation term is λ1 1. In lattice QCD, this can e.g. be applied to the fermion matrix. Notice that this shift is an exact procedure and does not represent an approximation for which a subsequent extrapolation is needed. In practice the expansions are of course truncated and their effectiveness depends on the radius of convergence, which is, however, improved by the λ-shift. In Fig. 24 results for this procedure are shown. We have tested λ real, imaginary and 0 (no regularisation). For the latter, the expansion shows runaways and does not converge. With imaginary λ = i(a + b), the convergence was found to be rather poor. On the other hand, for real λ = a + b the convergence and results are excellent. In Fig. 24 (left) the convergence of the expansion is shown at µ = 1.425, i.e. the µ value where the "whiskers" affect the result. The expansion is truncated, n ≤ N, and the dependence on N is shown. Excellent convergence is observed. In Fig. 24 (right), observables are shown as a function of µ, using a fixed N = 16, leading to good agreement with the exact results (cf. Fig. 17).
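The improved convergence of the shifted expansion can be checked directly: for 1/(1 + X) expanded around X = λ, the geometric series converges for |X − λ| < |1 + λ|. The sketch below (with illustrative values of X and λ) shows the shifted series converging at a point where the expansion around X = 0 diverges:

```python
# Shifted Taylor expansion of 1/(1 + X) around X = lam:
#   1/(1 + X) = sum_{n>=0} (-1)^n (X - lam)^n / (1 + lam)^(n+1),
# convergent for |X - lam| < |1 + lam|.  With lam = 0 this is the
# ordinary geometric series, convergent only for |X| < 1.
def inv_shifted(X, lam, N):
    """Partial sum up to order N of the shifted expansion of 1/(1+X)."""
    return sum((-1) ** n * (X - lam) ** n / (1 + lam) ** (n + 1)
               for n in range(N + 1))

X = 1.5 + 0.2j        # |X| > 1: the expansion around 0 diverges here
lam = 1.5             # real positive shift, in the spirit of lam = a + b
approx = inv_shifted(X, lam, N=16)
exact = 1.0 / (1.0 + X)
err = abs(approx - exact)                  # tiny: |X-lam|/|1+lam| = 0.08
diverging = abs(inv_shifted(X, 0.0, 50))   # huge: partial sums blow up
```

This mirrors the observation above that a real positive shift of order a + b gives excellent convergence, while no shift fails.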
We find therefore that meaningfully regularised expansions converge to the correct results, even at µ values for which the standard procedure does not work. This can be understood in the following way: Choosing the shift such that the expansion point lies in the "fish" region of the scatter plots, e.g. by choosing λ = a + b, the trajectories explore this region and never enter the "blue whiskers" region, due to the bottleneck. Hence it has the same effect as avoiding the Re D < 0 region altogether. A dynamical expansion which adapts λ when approaching the edge of the domain of convergence is easy to implement as well and will cover the full analyticity domain. In this case, however, it will also collect data from the "blue whiskers", leading to the same wrong results as when all trajectories are used.

Discussion and tentative conclusions
As long as the parameters a, b, ã,b are below 1, the Langevin process is found always to converge to the correct results, within the error. Significant discrepancies appear for C ≃ 1, where a, b ≃ 1.5. Although the measure is det M = (DD) 2N f ≃ D 2N f , the relevant factor for the analysis is DD ≃ D. The sign of Re D appears to identify two separate regions with two different contributions to expectation values. This conspicuous situation appears close to onset, where the determinant also becomes squeezed in the imaginary direction. These regions show up in the scatter plots as the "red fish" (Re D > 0) and the "blue whiskers" (Re D < 0), the former producing good expectation values for the observables but the latter leading to strongly deviant contributions. This outlying behaviour is also visible in the scatter plots of the observables directly sensitive to the pole or the fermionic degrees of freedom, which may provide a practical test in realistic lattice simulations at no cost, as the gauge invariant observables are calculated anyway. The clear correlation between the Re D < 0 region and the outlying contributions to various quantities is shown in Fig. 21, which also illustrates the small weight of these regions.
At least in the examples analysed here, the two regions appear separated by a bottleneck which can only be crossed by trajectories approaching |D| = 0 below a certain threshold. The approach to the pole can therefore signal the possible sampling of "deviant" contributions from the Re D < 0 region. The latter has a significantly smaller weight, and may only appear as a quasi-transient whose contribution for very long Langevin trajectories is extremely small. This suggests that discarding the contributions from the region with Re D < 0 will automatically lead to good results. Long enough thermalisation times, or adequate starting points in the Re D > 0 region, can help by inhibiting the development of these regions. The expansion, on the other hand, appears to do just that if we set the expansion centre in the region with Re D > 0, leading to good results via this approach. We note that the appearance of the bottleneck, separating the complex configuration space into two regions with w − ≪ w + , is consistent with the findings in the U(1) model. Larger β and/or larger N τ show a picture consistent with the above one. We conclude that the development of Re D < 0 regions signals failure in the simulation: although the weight of these regions is small, their contributions deviate strongly from the ones with Re D > 0 and hence they may affect the results by many standard deviations. Dropping, one way or the other, those contributions leads to results agreeing with the exact ones at the level of the statistical errors (at the permille level in these runs). The fact that the process fails to account correctly for the region with Re D < 0 at certain parameter values suggests, however, that we should attribute a possible systematic error proportional to the weight of this region, which is O (10 −2 − 10 −4 ) in the examples studied here.
We may try to quantify the relevance of the region Re D < 0 by calculating the logarithm of its relative weight, ∆F eff = − ln(w − /w + ), using the exact integral expressions. As can be seen in Fig. 26 (left), ∆F eff appears bounded from below, and even increases with N τ for some µ values far from µ 0 c . The bound is about 10% lower on the disordered lattice but shows similar behaviour. With an increasing number of fermion species, the difference in free energy scales approximately with the number of flavours, ∆F eff (N f ) ≃ N f ∆F eff (1), see Fig. 26 (right). Since in HDQCD the lattice determinant is a product of factors of the form D 2 over the spatial lattice, we may ask how the spatial volume manifests itself in ∆F eff . Fully correlated Polyakov loops would behave as if in the presence of many flavours, but the general case is not trivial and will be discussed in the next section.
Extrapolating the lessons from this discussion to realistic QCD lattice calculations, we conclude that one has to monitor the appearance of disconnected regions with a bottleneck at |D| ≃ 0. This can be done by monitoring various quantities, such as selected observables, the drift or the lowest determinant modes, avoiding time-consuming procedures. Dropping by hand the occasional contributions of regions of the "blue whiskers" type (Re D < 0) should already produce good results, while an estimate of the relative impact of such contributions provides a measure of the possible systematic error.

Lattice QCD
In the following section we aim to apply the lessons found above to the case of QCD at nonzero quark density, first in the case of heavy quarks (HDQCD) and then for full QCD, using the staggered fermion formulation.
In QCD the partition function is, after integrating out the quark fields, written as Z = ∫ DU e −S YM det M, where S YM is the Yang-Mills action, U are the gauge links, and M is the fermion matrix. The Langevin update for the gauge links U reads [55], in a first-order discretised scheme, U x,ν (t + ǫ) = exp [iλ a (ǫK ax,ν + √ǫ η ax,ν )] U x,ν (t), with Langevin time t = nǫ, where λ a are the Gell-Mann matrices, normalised as Trλ a λ b = 2δ ab , η ax,ν are Gaussian noise fields, and the sum over a = 1, . . . , 8 is not written explicitly. More details can be found in Refs. [10,15,16]. The drift K ax,ν is generated by the action S and reads K ax,ν = −D ax,ν S, where D ax,ν is the Lie derivative with respect to the link U x,ν .
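As an illustration, a single link update of the form quoted above can be sketched in a few lines; the Gell-Mann matrices below are the standard ones, the drift K a is left as a zero placeholder, and unit-variance noise is used (conventions may differ by a factor in the noise normalisation):

```python
import numpy as np

# One explicit first-order Langevin step for a gauge link,
#   U -> exp[i lambda_a (eps K_a + sqrt(eps) eta_a)] U.
# Gell-Mann matrices, normalised as Tr(lambda_a lambda_b) = 2 delta_ab.
GELL_MANN = np.array([
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[1 / np.sqrt(3), 0, 0], [0, 1 / np.sqrt(3), 0], [0, 0, -2 / np.sqrt(3)]],
], dtype=complex)

def expm_i_herm(H):
    """exp(iH) for Hermitian H, via the eigendecomposition of H."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(1j * vals)) @ vecs.conj().T

def langevin_step(U, K, eps, rng):
    eta = rng.standard_normal(8)              # real Gaussian noise
    H = np.einsum('a,aij->ij', eps * K + np.sqrt(eps) * eta, GELL_MANN)
    return expm_i_herm(H) @ U

rng = np.random.default_rng(0)
U = langevin_step(np.eye(3, dtype=complex), K=np.zeros(8), eps=0.01, rng=rng)
```

With a real drift the step stays in SU(3), since the exponent is Hermitian and traceless; a complex drift takes the link into SL(3, C), which is where the issues discussed in this paper arise.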

Heavy dense QCD
To assess the importance of these poles, we need to specify the fermion matrix. We consider first heavy dense QCD. This approximation to full QCD can be obtained by a systematic hopping-parameter expansion of the fermion determinant, preserving terms that cannot be ignored for large chemical potential, as well as the terms related by symmetry under µ → −µ. For Wilson quarks, this amounts to an expansion in terms of κ, keeping κe µ fixed and preserving terms that go as κe −µ . A detailed discussion can be found in Refs. [18,56,57], see also Refs. [58,59] for combined hopping and strong-coupling expansions.
Here we consider the resulting theory at leading order, using N f degenerate quark flavours, for which the fermion determinant reads [10] (see also Sec. 5) det M = ∏ x det (1 1 + he µ/T P x ) 2N f det (1 1 + he −µ/T P −1 x ) 2N f . The remaining determinant is in colour space only. The parameter h = (2κ) Nτ arises from the hopping expansion (with N τ the number of time slices in the temporal direction) and P x (P −1 x ) are the (inverse) Polyakov loops, see Eq. (5.1). Note that the gluon dynamics is included in Eq. (6.3) via the usual Wilson Yang-Mills lattice action, with gauge coupling β.
In order to study the zeroes of the determinant, we identify the basic building block of the determinant, the local determinant det M x , defined such that the full determinant in the path integral weight is written as det M = ∏ x (det M x ) 2N f . Here the local determinants are det M x = det (1 1 + zP x ) det (1 1 +zP −1 x ), where z = he µ/T andz = he −µ/T . We study the zeroes of the local determinants rather than the full determinant, since this is what is closest to the analysis carried out above and will allow us to focus on individual factors getting small. HDQCD has been used extensively to justify the results obtained with CL, e.g. via reweighting [15,19,30]. Since reweighting and CL have very different systematic uncertainties, the agreement of the results obtained by both methods is a strong argument for the correctness of either approach. In particular, since reweighting does not suffer from potential problems caused by zeroes of the determinant, agreement indicates that the latter do not cause problems for CL.
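That the local determinants depend on the link only through the traced (inverse) Polyakov loops rests on the SU(3) identity det(1 + zU) = 1 + z tr U + z 2 tr U −1 + z 3 , which can be verified numerically; the random SU(3) construction below is a generic choice, not taken from the text:

```python
import numpy as np

# Check of the SU(3) identity underlying the local determinants,
#   det(1 + z U) = 1 + z tr U + z^2 tr U^{-1} + z^3   for U in SU(3).
def random_su3(rng):
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    Q, _ = np.linalg.qr(A)                        # unitary factor
    return Q / np.linalg.det(Q) ** (1.0 / 3.0)    # project onto det = 1

rng = np.random.default_rng(42)
U = random_su3(rng)
z = 0.3 + 0.1j
lhs = np.linalg.det(np.eye(3) + z * U)
rhs = 1 + z * np.trace(U) + z ** 2 * np.trace(np.linalg.inv(U)) + z ** 3
```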
We note here that HDQCD has also been used to test and compare variations of the hopping parameter expansion at higher order [18], in particular with regard to the standard hopping expansion, obtained by expanding ln det M = Tr ln M in powers of κ [60], for which zeroes of the determinant do not appear. An expansion in spatial hopping terms only, for which HDQCD is the leading-order term, has been discussed and assessed in Ref. [18]. We now discuss some results in HDQCD, focussing on potential zeroes of the determinant. We mostly use the parameters β = 6, κ = 0.12. (6.9) Note that the lattice spacing (or gauge coupling β) and the spatial volume (N 3 s ) are fixed, but we consider two temperatures (N τ = 8, 16). In this theory, the quark number at zero temperature changes from 0 below onset to saturation above onset, with n sat = N spin × N colour × N f = 6N f , and the critical chemical potential is given by µ 0 c = − ln(2κ) = 1.427. (6.10) At nonzero temperature, this transition is smoothed out and the critical chemical potential µ c (T ) < µ 0 c , eventually connecting to the thermal confinement-deconfinement transition line at higher temperature and lower chemical potential. A study of the phase diagram at fixed lattice spacing can be found in Ref. [19]. In Fig. 27 we show the density, in units of the saturation density, for the parameters in Eq. (6.9) on the 8 4 lattice. The rapid rise around µ = µ 0 c is indeed observed. Writing the determinant as a product of its absolute value and phase, det M = | det M|e iϕ , (6.11) and using the symmetry [det M(µ)] ∗ = det M(−µ ∗ ), (6.12) we can extract the average phase factor in the full (i.e. not in the phase-quenched) theory via a computation of [10] e 2iϕ = det M(µ)/ det M(−µ), (6.13) which is accessible using CL dynamics. The result is shown in Fig. 27 as well. The sign problem is severe in the onset region. At the critical chemical potential, the fermions are at 'half-filling', i.e.
half of the available fermionic states are filled, and the theory becomes particle-hole symmetric [54]. Exactly at µ c , the sign problem becomes very mild, as the first dominating factor in Eq. (6.6) becomes real (z = 1). The sign problem due to the second factor is very small, sincez = (2κ) 2Nτ ≪ 1.
To investigate the zeroes of the measure in HDQCD, we have analysed det M x , see Eq. (6.6). Note that the corresponding factor in the effective SU(3) model was discussed in Sec. 5. We find that the simulations largely avoid the zeroes of the determinant, except in the vicinity of the critical chemical potential. This is illustrated in Fig. 28 (top), where a histogram of the absolute value of the local determinant det M x is shown, on a double-logarithmic scale, for two lattice sizes. For small chemical potential, the absolute value of the determinant is close to 1, as z,z ≪ 1. As µ is increased, the distribution widens and its maximum shifts towards larger values. However, we also observe that the distribution is nonzero for smaller values, with an apparent power decay towards zero. We have also considered the distribution of the phase Φ of the local determinant. Note that −π < Φ < π and that the distributions are symmetric around zero. Away from the critical chemical potential, the distributions drop to zero rapidly; around µ c we observe a decay ∼ 1/Φ 3 . The relation between this phase and the phase ϕ of the full determinant is not immediate. However, we note that in general it is expected that the latter will vary rapidly, as the full determinant is a product of 2N f N 3 s local determinants.
To compare with the results presented in the previous sections, we show in Fig. 29 the probability density of the local determinants on a logarithmic scale, for two values of the chemical potential close to µ 0 c = 1.427, namely µ = 1.3 (left) and 1.425 (right). Note the very different horizontal and vertical scales. These distributions look remarkably similar to those encountered in the simpler cases, although with only a very thin presence of the 'whiskers', if present at all. We find that at the critical point the distribution is highly elongated in the positive real direction, and that it shrinks again for µ > µ c . Exactly at µ c , the distribution also shrinks in the imaginary direction, leading to a much smaller typical phase of the determinant and hence the milder sign problem at µ = µ c observed via e 2iϕ in Fig. 27. There are some configurations with Re det M < 0, but these appear very infrequently and do not carry substantial weight.
However, at lower temperatures a clear sign of the whiskers appears, which indicates the possibility of contamination from configurations with Re det M < 0. This is illustrated in Fig. 30, where we show results at a fixed lattice spacing, on a 8 4 and 16 4 lattice, such that the temperature is twice as low on the 16 4 lattice. Close to µ c , the weight of the region with Re det M < 0 is approx. 0.005% on the 8 4 lattice and 0.08% on a 16 4 lattice, indicating a growing importance as the temperature is lowered. At the lower temperature, the "whiskers" are remarkably similar to those encountered in the SU(3) one-link model. In Fig. 31 we aim to reduce the lattice spacing, while keeping the physical volume and the temperature constant. Here we see that the power decay towards zero remains approximately the same and hence the role of configurations with a small absolute value of the local determinant does not change when going (somewhat) closer to the continuum limit. We also investigated whether changing N f influences the appearance of the whiskers. While using a larger N f is beneficial, configurations with determinants in the whiskers do appear at low temperatures. Finally we have studied the volume dependence of the weights w ± , signifying the importance of the regions G ± , as suggested by the analysis of the one-link models, but did not find a clear suppression of the region with Re D < 0.
We hence conclude that the zeroes of the determinant in HDQCD appear to have no effect except close to the critical chemical potential. Since for those chemical potentials the sign problem is quite mild, this observation is not related to the severity of the sign problem but to details of the CL process. Although the zeroes might influence the results in this region, the relative weight of configurations with Re det M < 0 is quite small and their influence is therefore suppressed.

Full QCD
Finally, we consider full QCD, with dynamical fermions. We note that full QCD is numerically much more costly than HDQCD, as the inversion of the fermion matrix has to be carried out numerically for every update. Similarly, an assessment of the zeroes of the determinant is harder, since it requires the computation of the full determinant. It should be noted that the determinant itself is not required for the Langevin update, only M^{-1} applied to a fixed vector [16].
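To make this point concrete, the determinant's contribution to the drift, which involves Tr[M^{-1} ∂M], can be estimated with noise vectors so that only solves of M x = b are needed, never det M or the full inverse. The following sketch uses a small random matrix as a stand-in for the fermion matrix; the matrix sizes, scales and noise count are illustrative choices of ours, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40

# Illustrative stand-ins: in QCD, M would be the fermion matrix and dM its
# derivative with respect to a link variable.
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))
dM = rng.standard_normal((n, n))

def drift_trace(M, dM, n_noise, rng):
    """Estimate Tr[M^{-1} dM] with Z_2 noise vectors: each term needs only
    one solve of M x = dM @ eta, never det M or the full inverse."""
    acc = 0.0
    for _ in range(n_noise):
        eta = rng.choice([-1.0, 1.0], size=M.shape[0])
        acc += eta @ np.linalg.solve(M, dM @ eta)
    return acc / n_noise

exact = np.trace(np.linalg.solve(M, dM))       # feasible only for tiny matrices
estimate = drift_trace(M, dM, 8000, rng)
```

Since E[η η^T] = 1 for Z_2 noise, the average of η^T M^{-1} (∂M) η converges to the trace; on the lattice each term costs one conjugate-gradient solve.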
Here we show results obtained using staggered fermions, with the (unimproved) staggered fermion matrix

M_{xy} = m δ_{xy} + (1/2) Σ_{ν=1}^{4} η_{νx} [ e^{µ δ_{ν,4}} U_{ν,x} δ_{y,x+ν̂} − e^{−µ δ_{ν,4}} U†_{ν,x−ν̂} δ_{y,x−ν̂} ],

where η_{νx} are the staggered sign functions, η_{1x} = 1, η_{2x} = (−1)^{x_1}, η_{3x} = (−1)^{x_1+x_2}, η_{4x} = (−1)^{x_1+x_2+x_3}. Note that this formulation describes four tastes, due to fermion doubling. There are several indications that, at least at high temperatures and for the quark masses considered, the CL simulations of full QCD are unaffected by poles in the drift: these include comparisons to systematic hopping-parameter expansions, which have holomorphic actions [18], comparisons to reweighting, which does not depend on the action being holomorphic [61], and observations of the spectral properties of the fermion matrix [7] (see also below). At low temperature, it is at present not known whether poles affect the CL results, partly because simulations are more expensive, due to the larger values of N_τ required, or are hindered by the ineffectiveness of gauge cooling on coarse lattices.

In order to study the phase of the determinant in full QCD, we start from an initial configuration on the SU(3) submanifold. After thermalisation we then follow the evolution of the phase. The results are shown in Fig. 32, for an 8^3 × 4 and a 12^3 × 4 lattice. On the smaller volume and for the smaller chemical potentials, µ/T = 0.4 and 1.2, one observes a very mild time dependence, with no winding of the phase around the origin. For the larger chemical potentials, µ/T = 2.0 and 3.2, however, we see frequent crossings of the negative real axis. This signals a sign problem which is hard to counter with reweighting, as the average phase factor gets close to zero. As expected, this behaviour gets worse as the volume is increased.
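The staggered sign functions above can be tabulated directly; a minimal sketch (the function name and the 2^4 block of sites are ours, purely for illustration):

```python
def staggered_phase(nu, x):
    """Staggered sign function eta_{nu,x}: eta_1 = 1 and, for nu > 1,
    eta_nu = (-1)^(x_1 + ... + x_{nu-1}), with x = (x1, x2, x3, x4)."""
    return (-1) ** (sum(x[:nu - 1]) % 2)

# Tabulate the four phases on a 2^4 hypercube of sites (illustration only).
sites = [(x1, x2, x3, x4)
         for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1) for x4 in (0, 1)]
table = {x: tuple(staggered_phase(nu, x) for nu in (1, 2, 3, 4)) for x in sites}
```

For example, at x = (1, 1, 0, 0) one finds η_2 = −1 and η_3 = +1, as the definitions require.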
The corresponding histograms are shown in Fig. 33, for three different spatial volumes, namely 8^3, 12^3 and 16^3, with fixed N_τ = 4. As expected, the distribution is localised on the smallest volume, but becomes increasingly wide as the volume and/or the chemical potential are increased. To relate this behaviour to possible zeroes of the determinant, we have computed the eigenvalue spectrum of the Dirac operator for typical configurations in the ensemble, using the same setup as in Fig. 32. Fig. 35 contains the spectra, while Fig. 34 contains histograms of the absolute values of the eigenvalues, obtained by averaging over 100 configurations. We note that, in spite of the frequent windings of the fermion determinant around the origin, there are typically no eigenvalues close to zero, suggesting that the probability density of configurations around the singularities of the drift is very small, as in the simpler models discussed above. Since the change of the total phase is the sum of the changes of the phases of all eigenvalues, the frequent crossing of the negative real axis does not, in contrast to the toy models, indicate that the poles affect the CL dynamics. We note that increasing the volume from 8^3 to 12^3 leaves the shape of the spectrum very similar, but increases the density of eigenvalues.
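The observation that the spectrum stays away from zero can be illustrated with a toy matrix model: for an anti-Hermitian 'hopping' term plus a mass, all eigenvalues lie at distance at least m from the origin, while a non-anti-Hermitian perturbation (standing in for µ ≠ 0) removes this bound. This is a schematic sketch of ours, not the actual lattice Dirac operator:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mass = 64, 0.05

# Toy stand-in: at mu = 0 the massless staggered operator is anti-Hermitian,
# so M = mass + A is normal with eigenvalues mass + i*lambda, all at
# distance >= mass from the origin.
H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = 0.5 * (H - H.conj().T)                    # anti-Hermitian part
ev_mu0 = np.linalg.eigvals(mass * np.eye(n) + A)

# A nonzero mu spoils anti-Hermiticity; B is a crude stand-in for that effect,
# allowing eigenvalues to drift towards (and around) the origin.
B = 0.2 * rng.standard_normal((n, n))
ev_mu = np.linalg.eigvals(mass * np.eye(n) + A + B)

min_dist_mu0 = np.abs(ev_mu0).min()           # bounded below by the mass
min_dist_mu = np.abs(ev_mu).min()             # no such bound at mu != 0
```

In the toy model at µ = 0 every eigenvalue has real part exactly equal to the mass; the diagnostic in the text is precisely the smallest |λ| once this protection is lost.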
To summarise our findings: in full QCD at high temperatures, the singularities of the drift appear to lie outside the support of the probability density of configurations. We therefore conclude that in this situation complex Langevin dynamics provides correct results, in line with the formal arguments and the lessons from the simple models. What happens at lower temperatures remains an open question.

Discussion
In this section we summarise the key findings and discuss them in the context of the various models. The main objective was to understand the role of zeroes in the path integral measure after analytic continuation in the context of complex Langevin dynamics, in which the zeroes show up as poles in the Langevin drift. Since the derivation of the justification of the complex Langevin approach for complex measures relies on holomorphicity of the drift, the presence of poles makes a re-analysis of this derivation necessary.
We have shown that a crucial role is again played by the behaviour of the observables considered and by the (real and nonnegative) distribution which solves the associated Fokker-Planck equation and is effectively sampled during the Langevin process. While for holomorphic drifts it is the behaviour at large imaginary directions in the complex configuration space that matters, for meromorphic drifts we have shown that an additional constraint arises from the behaviour close to the poles: to justify the method it must be possible to perform partial integration, without picking up finite boundary terms, both around the poles and at large imaginary directions. This condition requires fast decay of the distribution in those regions. In simple cases this requirement can be verified analytically, for instance when the distribution can be shown to be strictly zero in those regions, but in most models it has to be verified a posteriori by diagnosing the output of numerical simulations.
Besides this, we also found that time-evolved observables typically have an essential singularity at a pole. However, this is counteracted by the distribution going to zero at this pole, with nontrivial angular behaviour. This ensures that the contribution to expectation values from this region is finite, but not that boundary terms are necessarily absent.
In order to further understand and support these analytical considerations, we have subsequently analysed a number of models and theories with increasing complexity, from the one-pole model with one degree of freedom to full QCD at nonzero baryon density, with the aim of extracting common features. Logically a pole can be outside, on the edge of, or inside the distribution, and we have given examples of each of these. As expected, when the pole is outside, it does not interfere with the Langevin process. The possibility of a pole on the edge is important, since it indicates that it is not the winding around the pole that matters, but the decay of the distribution towards the pole. Indeed, we have encountered both correct and incorrect convergence in this case, and this can be traced back to the fast decay of the distribution, or lack thereof.
As a side remark, we note that further support for our analytical understanding comes from the observed interplay between observables, drift and distribution: if the observable is naturally suppressed (enhanced) near the pole, it is possible to obtain correct (incorrect) results. This explains why, within a single analysis, one can encounter correctly determined, incorrectly determined and nonconverging expectation values.
When the pole is inside the distribution, we have found that it typically leads to a bottleneck, i.e. a region in configuration space which is difficult to pass and which effectively divides the configuration space into two nearly disjoint regions. For QCD and QCD-like models, it is the zeroes of the determinant that correspond to poles and determine the location of the bottleneck. Hence the complex-valued determinant, preferably not raised to any power (such as N_f), or the real part of the determinant, provides a useful diagnostic observable to analyse the dynamics. We have studied a number of models in this way, namely the U(1) and SU(3) one-link models and heavy dense QCD (HDQCD) in four dimensions. In all of these we indeed observed similar behaviour: the emergence of two regions, denoted G_±, identified by the sign of the real part of the determinant (before raising it to a power). We found that it is typically the region with positive real part that dominates the dynamics, but that excursions to G_− can upset expectation values, even when their relative weight is suppressed. In some cases the bottleneck is particularly difficult to pass, which may occur when the order of the zero is increased. It is possible to analyse each region G_± separately. The error made by restricting the simulation to G_+ can then be estimated and was typically seen to be small, depending on the parameters used. In the SU(3) one-link model and HDQCD, the process is affected by zeroes close to the half-filling point, where the sign problem is milder. In this case the region G_− took the form of characteristic "whiskers": even though the relative weight of these regions was small, their contribution to expectation values could be large. Hence excluding this region improves the results, at the price of a systematic uncertainty.
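A minimal sketch of this diagnostic, assuming one has recorded the local determinant and an observable along the Langevin trajectory (the function name and the toy data stream below are ours, for illustration only):

```python
import numpy as np

def gpm_report(dets, obs):
    """Split samples by the sign of Re det (regions G_+ / G_-) and return
    the relative weights and the observable average in each region."""
    dets, obs = np.asarray(dets), np.asarray(obs)
    plus = dets.real > 0
    return {
        "w_plus": plus.mean(),
        "w_minus": 1.0 - plus.mean(),
        "obs_plus": obs[plus].mean() if plus.any() else float("nan"),
        "obs_minus": obs[~plus].mean() if (~plus).any() else float("nan"),
        "obs_all": obs.mean(),
    }

# Toy stream of local determinants: mostly G_+, with excursions into G_-.
rng = np.random.default_rng(2)
dets = (0.9 + rng.standard_normal(10000)) + 1j * rng.standard_normal(10000)
obs = 1.0 / np.abs(dets)        # an observable enhanced near the zero
report = gpm_report(dets, obs)
```

Comparing obs_plus, obs_minus and the weights then quantifies how much an excursion into G_− with small relative weight can still distort the full average.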
In HDQCD we found indications that the role of zeroes is unchanged when the lattice spacing is decreased, while keeping the physical volume approximately constant. Finally, we considered QCD with dynamical staggered quarks at high temperature and analysed the eigenvalues of the Dirac operator. Here we noted that there are typically no eigenvalues close to zero, which suggests that complex Langevin dynamics is applicable in this part of the phase diagram.

Summary and Outlook
We have given a detailed analysis of the role of poles in the Langevin drift, in the case of complex Langevin dynamics for theories with a sign problem. Since the standard derivation of the formal justification relies on holomorphicity of the drift, we have revisited the derivation and shown that, besides the requirement of a fast decay of the probability distribution at large imaginary directions, an additional requirement of fast decay near the pole(s) is present. The probability distribution is typically not known a priori, but its decay can be analysed a posteriori.
We then studied a number of models, from simple integrals to QCD at nonzero baryon chemical potential, and found support for the analytical considerations. In the cases when the simulation is affected by the pole(s), we found that typically the configuration space is divided into two regions, connected via a bottleneck. For theories with a complex fermion determinant, such as QCD, the bottleneck is determined by the zeroes of the determinant. In the simple models, and even for QCD in the presence of heavy (static) quarks, this understanding is sufficient to analyse the reliability of the Langevin simulation.
In full QCD, with dynamical quarks, such an analysis ideally requires knowledge of the (small) eigenvalues of the fermion matrix throughout the simulation, which is nontrivial. At high temperature, it was shown that the eigenvalues are typically not close to zero. Hence the most important outstanding question for QCD concerns the low-temperature region of the phase diagram. Here a number of hurdles remain to be overcome. When eigenvalues become very small, the conjugate gradient algorithm used in the fermion matrix inversion becomes ineffective. This is common to many problems at nonzero density, even in the absence of a sign problem, and requires e.g. a regulator; a successful approach in this case has not yet been developed. Besides this, on coarse lattices gauge cooling, which stabilises the Langevin process, is ineffective. The situation improves on finer lattices, which are however more expensive due to the larger values of N_τ required. A definite statement on the applicability of complex Langevin dynamics throughout the QCD phase diagram can only be made once further analysis of these issues has been brought to a conclusion.
www.lrz.de), as well as for providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS share of the supercomputer JU-RECA and JUQUEEN [62] at Jülich Supercomputing Centre (JSC). Some parts of the numerical calculation were done on the GPU cluster at Eötvös and Wuppertal Universities.
the boundary conditions at x = 0. The drift is strongly repulsive away from the origin along the real axis, the probability density ρ vanishes quadratically there, and the Langevin process does not cross the origin.
This means that mathematically we have to consider H with zero Dirichlet boundary conditions at the origin. To avoid confusion, we call the corresponding Hamiltonian H_D. All functions in the domain of definition of H_D have to have at least a square-integrable second derivative; this means that they are continuous and vanish at the origin. The ground state wave function of H (ignoring the boundary condition at 0), which superficially seems to belong to a negative eigenvalue of H_D, is not in the domain of definition (nor are any of the even eigenfunctions of H). Actually, because there is no communication between the two half-lines, we may as well consider the problem on one of the half-lines R_± only. From now on we choose R_+.
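For orientation, the real Langevin process on R_+ can be sketched directly. We assume the drift K(x) = −ωx + 2/x, whose equilibrium density ρ(x) ∝ x² e^{−ωx²/2} vanishes quadratically at the origin, consistent with the description above, so that ⟨x²⟩ = 3/ω on the half-line (the step size and trajectory length are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
omega, dt, nsteps = 1.0, 5e-4, 1_000_000

noise = np.sqrt(2.0 * dt) * rng.standard_normal(nsteps)
x = 1.0
xs = np.empty(nsteps)
for i in range(nsteps):
    # Drift -omega*x + 2/x: strongly repulsive away from the pole at x = 0,
    # so a trajectory started at x > 0 never crosses the origin.
    x += (-omega * x + 2.0 / x) * dt + noise[i]
    xs[i] = x

# Equilibrium density ~ x^2 exp(-omega x^2 / 2) on R_+, hence <x^2> = 3/omega.
x2_mean = (xs[nsteps // 10:] ** 2).mean()
x_min = xs.min()
```

With ω = 1 the long-time average of x² approaches 3 (up to statistical and step-size errors) and the trajectory stays strictly positive, in line with the Dirichlet picture above.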
The first excited state of L is φ_2 = N_3(x^2 − 3/ω); it belongs to the eigenvalue −2ω. We thus find that the time-evolved observable converges to the correct expectation value for t → ∞. Similarly, convergence to the correct value is found for all even functions. What about the superficially unstable mode 1/x, see Eq. (A.3)? It corresponds to the ground state of H on L^2(R), which is not in the domain of definition of H_D. The odd eigenfunctions of H are complete in the subspace of odd functions; hence the odd eigenfunctions restricted to R_+ are complete in L^2(R_+), and so the apparent eigenvector ψ_0 of H_D with a negative eigenvalue should be considered as an L^2(R_+)-convergent series. Instead of considering −H_D as a self-adjoint operator on L^2(R_+), we may consider L itself as a self-adjoint operator on the Hilbert space H obtained as (the completion of) the set of functions φ on R_+ with an appropriate scalar product. The eigenfunctions φ_n, n = 0, 1, . . ., are orthogonal with respect to this scalar product, so the expansion can equivalently be written with the convergence understood in the sense of H.

Since the models are real, one might attempt to treat them by the real Langevin method. But a simple consideration shows that for a real model with a sign problem this cannot produce correct results. The reason is that the real Langevin equation will have a positive equilibrium measure on the real axis and thus cannot reproduce all the averages which would be obtained with a signed measure. When we modify the process to allow it to move out into the complex plane, the story changes: it then seems a priori not impossible that a positive measure on C correctly reproduces the averages of holomorphic observables with a signed measure on R, and there are examples that bear this out. In a simple example it is easy to verify that the correct expectation values are reproduced by a positive density in C. This simple solution is, however, unrelated to the CL method.
B.2 One pole, n_p even

For n_p > 0 and even, there is no sign problem, but the lack of ergodicity persists.
In this case, because of the stronger repulsion away from the pole, our simulations typically do not cross the pole, and so produce incorrect results when started on one side of the pole. (If there is a symmetry x → −x, this defect can easily be remedied by starting the process with equal probability on either side of the pole.) Another way to facilitate the crossing and achieve correct results is to add a small imaginary noise term and correct for it, where the symbol ⟨·⟩ stands for the ordinary real or complex Langevin long-time average. But this cure, like any reweighting method, works for one-variable models yet is not very useful for lattice models. So we will not pursue it any further.
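The lack of ergodicity is easy to reproduce numerically; a sketch with the illustrative even-pole weight ρ(x) = (x − 1)^4 e^{−x²/2} (our choice of n_p = 4 and pole position, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(4)
dt, nsteps, xp = 2e-4, 200_000, 1.0

def run(x0):
    """Real Langevin for rho(x) = (x - xp)^4 exp(-x^2/2), i.e. n_p = 4 (even)."""
    noise = np.sqrt(2.0 * dt) * rng.standard_normal(nsteps)
    x = x0
    xs = np.empty(nsteps)
    for i in range(nsteps):
        # Drift d/dx log rho = -x + 4/(x - xp): repulsive on both sides of
        # the pole, so the process cannot cross it.
        x += (-x + 4.0 / (x - xp)) * dt + noise[i]
        xs[i] = x
    return xs

right = run(2.0)   # started to the right of the pole
left = run(0.0)    # started to the left of the pole
```

Each trajectory remains on the side of the pole where it started; combining runs started on both sides (with the appropriate relative weight) is then needed for correct averages.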

B.3 A cure for compact real models
The final cure we consider is for a real but nonpositive weight ρ. Let c be a constant such that

ρ + c > 0 (B.7)

and define σ ≡ ρ + c. A correct procedure is then to run real Langevin dynamics with the drift derived from the positive density σ and to correct the normalisation as shown above. We take the U(1) model with n_p = 1, for which the normalisation increases as c is increased (c = 0 corresponds to the original process). The observables are Re/Im e^{ikx} with k = 1, 2.
As expected, the numerical results start agreeing with the exact results as soon as c is large enough to make σ nonnegative. Therefore a plot like this can also serve to determine the minimal c for which correct results are obtained, and a priori knowledge of the zeroes of the density ρ is not required.
The advantage of this cure is that it can be easily generalised to the complex case µ ≠ 0; it turns out that it works reasonably well, though not perfectly, provided µ is not very large. But again, since the cure involves reweighting, it is not very useful for lattice systems.
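A sketch of the shifted-weight cure at µ = 0, with the illustrative signed weight ρ(x) = 1 + κ cos x (a stand-in of ours with the same structure as the model in the text; for this weight ⟨cos x⟩ = κ/2 and ⟨cos 2x⟩ = 0 exactly):

```python
import numpy as np

rng = np.random.default_rng(5)
kappa, c = 2.0, 2.0        # rho = 1 + 2 cos x turns negative; sigma = rho + 2 > 0
dt, nsteps = 5e-3, 600_000

def rho(x):
    return 1.0 + kappa * np.cos(x)

def sigma(x):
    return rho(x) + c      # shifted, strictly positive weight

# Real Langevin for the positive density sigma on the circle.
noise = np.sqrt(2.0 * dt) * rng.standard_normal(nsteps)
x = 0.0
xs = np.empty(nsteps)
for i in range(nsteps):
    drift = -kappa * np.sin(x) / sigma(x)        # d/dx log sigma(x)
    x = (x + drift * dt + noise[i]) % (2.0 * np.pi)
    xs[i] = x

# Reweight sigma-averages back to the signed density rho.
w = rho(xs) / sigma(xs)
cos1 = (np.cos(xs) * w).mean() / w.mean()        # exact value: kappa/2
cos2 = (np.cos(2.0 * xs) * w).mean() / w.mean()  # exact value: 0
```

Note that the Langevin process itself only ever sees the positive weight σ; the sign of ρ enters purely through the reweighting factor ρ/σ, which is why the cure works for any c making σ positive.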

C Zeroes of the HDQCD determinant
In this Appendix, we further consider the HDQCD determinant for gauge group SU(3) or SL(3,C), reduced to a single link U. We will demonstrate that zeroes of the determinant do not come as isolated points.
As discussed in Sec. 5, the zeroes of the determinant define three manifolds in terms of the eigenvalues of U. But there is a different way to think about this: define

u ≡ tr U = z_1 + z_2 + 1/(z_1 z_2),   v ≡ tr U^{−1} = 1/z_1 + 1/z_2 + z_1 z_2. (C.4)

u and v are algebraically independent and can be used instead of z_1 and z_2 to parametrise conjugacy classes of SL(3, C). The map from z_1, z_2 to u, v is not one-to-one, since interchanging z_1 and z_2 leaves u, v unchanged. The inverse map from u, v to z_1, z_2 will have branch points where any two eigenvalues coincide. The fact that u, v ignore permutations of the eigenvalues is actually an advantage, because D too remains unchanged. The zeroes of D are then determined by a single linear relation between u and v, Eq. (C.5). The three manifolds (1, 2, 3) in Eq. (C.3) are thus mapped into the single manifold given by Eq. (C.5). This manifold is an affine complex plane in the space C^2 (affine means that it does not go through the origin).
tr U^{−1} does not have to be computed by taking the inverse of U; one can instead use the identity, valid for U ∈ SL(3, C),

v = tr U^{−1} = ½ [(tr U)^2 − tr U^2].

So we get, using u_2 ≡ tr U^2, the zeroes of D as a relation between u and u_2, which vanishes on a complex parabola. The determinant factor D̄ is quite similar; proceeding as before, we find that its zeroes are again described either by a complex affine plane in u, v or by a complex parabola in u, u_2. The main point is that in both cases the zeroes form real manifolds of codimension two, so there are no isolated zeroes.
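The trace identity can be checked numerically for a random SL(3, C) element; the construction below (rescaling a random complex matrix to unit determinant) is an illustration of ours:

```python
import numpy as np

rng = np.random.default_rng(6)

# Random GL(3,C) matrix, rescaled so that det U = 1, i.e. U in SL(3,C).
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U = A / np.linalg.det(A) ** (1.0 / 3.0)

u = np.trace(U)                   # u   = tr U
u2 = np.trace(U @ U)              # u_2 = tr U^2
v = np.trace(np.linalg.inv(U))    # v   = tr U^{-1}, via the explicit inverse

# Cayley-Hamilton for det U = 1: tr U^{-1} = ((tr U)^2 - tr U^2) / 2.
v_from_traces = 0.5 * (u * u - u2)
```

The identity follows from the Cayley-Hamilton theorem for a 3×3 matrix with unit determinant, so no matrix inversion is needed in a simulation.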

D Expansion methods
One possibility to deal with the pole is to use power series expansions in order to approximate the meromorphic drift by polynomials. In QCD the pole in the drift is always due to a zero of the determinant. In the one-pole model the role of the determinant is played by the factor D(x) ≡ x − z_p.

D.1 One-pole model
We explain the approach in the one-pole model. We study two basic procedures.

(1) Fixed expansion: let D^{n_p} be the 'determinant' causing problems due to its zeroes, and consider the drift K_D caused by D. In order to obtain a holomorphic approximation to K_D, we choose a point (x_0, y_0) not too far from the peak of the distribution but far enough from the pole(s). We then expand 1/D around this point to order N as follows: let D_0 ≡ D(x_0, y_0) and replace 1/D by

1/D_N ≡ (1/D_0) Σ_{n=0}^{N} (1 − D/D_0)^n,

which converges to 1/D for N → ∞ if and only if |1 − D/D_0| < 1. There could be problems if the process goes outside the region of convergence, but experience shows that typically the drift will tend to keep the process inside it; in any case, by varying N one can check whether this is so. An illustrative example is shown in Fig. 37. Because the expansion point (x_0, y_0) is chosen once and for all, we call this the fixed expansion. Numerical studies using this fixed expansion are presented below for the U(1) one-link model and for the SU(3) one-link model in Sec. 5.
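A sketch of the fixed expansion for the one-pole model; the values of z_p, z_0, the test points and the orders are illustrative choices of ours:

```python
z_p = 1.0 + 1.0j                       # pole of the drift (zero of D)

def D(z):
    return z - z_p

def inv_D_fixed(z, z0, N):
    """Order-N approximation of 1/D(z) expanded about the fixed point z0:
    1/D = (1/D0) * sum_{n=0}^{N} (1 - D/D0)^n, convergent iff |1 - D/D0| < 1."""
    D0 = D(z0)
    r = 1.0 - D(z) / D0
    return sum(r ** n for n in range(N + 1)) / D0

z0 = 0.0 + 0.0j                        # expansion point, chosen once and for all

z_in = 0.3 + 0.2j                      # inside the disc |z - z0| < |z0 - z_p|
err10 = abs(inv_D_fixed(z_in, z0, 10) - 1.0 / D(z_in))
err20 = abs(inv_D_fixed(z_in, z0, 20) - 1.0 / D(z_in))

z_out = 2.0 + 2.0j                     # outside the disc: the series diverges
div10 = abs(inv_D_fixed(z_out, z0, 10) - 1.0 / D(z_out))
div20 = abs(inv_D_fixed(z_out, z0, 20) - 1.0 / D(z_out))
```

Inside the convergence disc the error shrinks geometrically with N, while outside it grows with N, which is exactly the check one performs by varying the order.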
(2) Dynamic expansion: one may choose different expansion points (x_i, y_i), with D_i ≡ D(x_i, y_i), in such a way that the domain of analyticity is covered, while always staying well inside the domain of convergence. The quality of the expansion can be controlled by changing the expansion point (x_i, y_i) to the actual configuration point whenever |1 − D/D_i|^{N+1} > ε, with some pre-chosen ε. If ε is chosen small enough, we should not find any appreciable difference between the results using D and D_N. By studying various situations we find that, as expected, the dynamically expanded drift generally performs just like the unexpanded one: it works where the latter works and it fails where the latter fails.

In the U(1) one-link model, the drift has two poles for κ > 1. As described for the one-pole model, we obtain a holomorphic approximation to K_D by choosing a point z_0 = x_0 + iy_0 somewhere near the centre of the equilibrium distribution; here the natural choice is z_0 = iµ, i.e. D(z_0) = 1 + κ. The Taylor expansion of 1/D around this point to order N then looks as in Eq. (D.2). Again the drift from the expansion tends to keep the process inside the region of convergence, as shown in Fig. 38.
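The re-expansion rule can be sketched as follows, again for the one-pole model (path, order and tolerance are illustrative choices of ours):

```python
z_p = 1.0 + 1.0j                       # pole of the drift (zero of D)

def D(z):
    return z - z_p

def inv_D_dynamic(path, z0, N, eps):
    """Evaluate the order-N expansion of 1/D along a path of configurations,
    moving the expansion point zi to the current z whenever the remainder
    estimate |1 - D(z)/D(zi)|^(N+1) exceeds eps."""
    zi = z0
    values, n_moves = [], 0
    for z in path:
        r = 1.0 - D(z) / D(zi)
        if abs(r) ** (N + 1) > eps:
            zi, n_moves = z, n_moves + 1   # re-expand about the current point
            r = 0.0
        values.append(sum(r ** n for n in range(N + 1)) / D(zi))
    return values, n_moves

# A path drifting from the origin towards the pole at 1 + i: the expansion
# point follows along, keeping the remainder below the tolerance.
path = [0.1 * k * (1.0 + 1.0j) for k in range(9)]
vals, n_moves = inv_D_dynamic(path, 0.0 + 0.0j, 6, 1e-3)
max_err = max(abs(a - 1.0 / D(z)) for a, z in zip(vals, path))
```

The expansion point is updated only when the remainder estimate demands it, so most steps reuse the current Taylor coefficients while the accuracy stays bounded by ε.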
We have tested this approach numerically, using expansions to order 10 and 20, making sure that the centre of the expansion was chosen reasonably far away from the poles and near the maximum of the distribution P(x, y) in the region with positive Re D, i.e. G_+. We found the results to be more or less comparable to the ones obtained by restricting the process to G_+, and no great difference between orders 10 and 20.
Hence we conclude that the fixed expansion is a potential cure for the ills of a meromorphic drift in those cases where restricting the process to G_+ works as well. It should be noted, though, that potentially significant, albeit much reduced, deviations from the exact results remain.