Testing the criterion for correct convergence in the complex Langevin method

Recently the complex Langevin method (CLM) has been attracting attention as a solution to the sign problem, which occurs in Monte Carlo calculations when the effective Boltzmann weight is not real positive. An undesirable feature of the method, however, was that it can happen in some parameter regions that the method yields wrong results even if the Langevin process reaches equilibrium without any problem. In our previous work, we proposed a practical criterion for correct convergence based on the probability distribution of the drift term that appears in the complex Langevin equation. Here we demonstrate the usefulness of this criterion in two solvable theories with many dynamical degrees of freedom, i.e., two-dimensional Yang-Mills theory with a complex coupling constant and the chiral Random Matrix Theory for finite density QCD, which were studied by the CLM before. Our criterion can indeed tell the parameter regions in which the CLM gives correct results.


Introduction
The sign problem is one of the most important issues in contemporary physics, which hinders theoretical developments in QCD at finite density, real-time dynamics of quantum many-body systems, strongly coupled electron systems, supersymmetric theories and so on. In the path-integral formulation, these theories typically have an effective Boltzmann weight which is not real positive, and hence the importance sampling used in conventional Monte Carlo methods does not work. The complex Langevin method (CLM) is a promising candidate of the methods that can be applied in such cases. It is based on the stochastic quantization [1,2], which uses a Langevin process associated with the Boltzmann weight. Since it does not rely on the probabilistic interpretation of the Boltzmann weight, it has a chance to be generalized to the case of a complex Boltzmann weight [3,4], which, however, necessarily requires the dynamical variables that are real in the original theory to be complexified. Accordingly, the observables and the drift term in the Langevin process are defined for complexified variables by analytic continuation.
While the Langevin method as applied to a system with a real positive Boltzmann weight yields correct results in general, it is known that the CLM does not always yield correct results, and this feature had not been understood for quite a long time. An important progress was made by refs. [5,6], in which the justification of the CLM was discussed based on an equality between the expectation value of observables defined in the CLM and the expectation value defined in the original path integral. It was noticed that the integration by parts used to prove the equality cannot be justified if the complexified variables make long excursions in the imaginary directions ("the excursion problem"). The same obstacle appears when the drift term has singularities and the complexified variables come close to these singularities frequently [7] ("the singular drift problem"). Thus the reasons for wrong convergence in the CLM was understood at least theoretically. For recent progress concerning the CLM, see refs. .
A crucial issue in the CLM is therefore how one can judge whether these problems are occurring or not during the simulation. In ref. [5], the Langevin-time evolution operatorL acting on an observable O was considered, and the identity L O = 0 in the long-time limit was proposed as a necessary condition for the validity of integration by parts used in the justification. While this criterion was shown to be useful in simple models, it is numerically demanding to apply it to models with many dynamical degrees of freedom since the quantitỹ LO fluctuates violently around zero, and it requires a tremendous amount of statistics in order to judge whether it averages to zero or not. One should also note that the integration by part is not fully justified even if this criterion is met because it is merely a necessary condition.
Recently, we have reconsidered the argument for justification of the CLM [17], and pointed out a subtlety in the use of time-evolved observables, which plays a crucial role in the argument. Our refined argument, which cures this subtlety, requires the probability distribution of the drift term to fall off exponentially or faster at large magnitude. The issue of the integration by parts can actually be reformulated in terms of the same probability distribution, and the corresponding condition turned out to be slightly weaker than the one above. Thus the above condition was proposed as a necessary and sufficient condition for correct convergence in the CLM under obvious assumptions such as the stability of the Langevin process 1 and the convergence of the observable itself. Since the drift term is a quantity that one has to calculate anyway at each Langevin step, probing its distribution costs almost nothing in addition. In the same paper, we have shown the validity of our criterion in simple one-variable models.
Our criterion may be viewed as a refinement of the theoretical understanding that the probability distribution of the dynamical variables should decay fast enough at infinity [5,6] and at the singularities of the drift term [7,22]. However, the statement based on the magnitude of the drift term has a big advantage that it enables us to claim how fast the distribution should decay for the correct convergence.
The purpose of this work is to demonstrate the usefulness of our criterion in models with many dynamical degrees of freedom. Here we study two solvable models, two-dimensional pure Yang-Mills theory (2dYM) [30][31][32] with a complex coupling constant and the chiral Random Matrix Theory (cRMT) for finite density QCD [33,34], which were studied by the CLM in refs. [35] and [36][37][38][39], respectively. In both models, the CLM reproduced the exact results correctly in some parameter region but not in the other, due to the excursion problem and the singular-drift problem, respectively. Since the results of the CLM depend smoothly on the parameter, it was not possible to identify precisely the parameter region in which the CLM is valid without knowing the exact results. Our results for the probability distribution of the drift term indeed show a drastic change of its behavior at large magnitude as expected depending on the parameter regions. This demonstrates the usefulness of our criterion for correct convergence in the CLM.
The rest of this paper is organized as follows. In section 2, we briefly review the CLM. In particular, we discuss the criterion for correct convergence proposed in ref. [17] and the gauge cooling technique used in the present work. In sections 3 and 4, we apply the CLM to the 2dYM with a complex coupling constant and the cRMT for finite density QCD, respectively. In particular, we provide numerical results which demonstrate the usefulness of our criterion. Section 5 is devoted to a summary and discussions.

Brief review of the complex Langevin method
In this section, we review the CLM and the criterion for correct convergence using a system of N real variables x k (k = 1, · · · , N ) as a simple example. Here the weight w(x) is a complex-valued function, which causes the sign problem.

complex Langevin method
In the CLM, the original real variables x k are complexified as x k → z k = x k + iy k ∈ C and one considers a fictitious time evolution of the complexified variables z k using the complex Langevin equation given, in its discretized form, by where t is the fictitious time with a stepsize ǫ. The second term v k (z) on the right-hand side is called the drift term, which is defined by holomorphic extension of the one for the real variables x k . The variables η k (t) appearing on the right-hand side of eq. (2.2) are a real Gaussian noise with the probability distribution ∝ e − 1 (η) The expectation values with respect to the noise η k (t) are denoted as · · · η in what follows.
Let us consider the expectation value of an observable O(x). In the CLM, one computes the expectation value of the holomorphically extended observable O(x + iy) as where P (x, y; t) is the probability distribution of x (η) (t) and y (η) (t) defined by Then, the correct convergence of the CLM implies the equality where the right-hand side is the expectation value of O(x) in the original theory (2.1). A proof of eq. (2.6) was given in refs. [5,6], where the notion of the time-evolved observable O(z; t) plays a crucial role. In particular, it was pointed out that the integration by parts used in the argument cannot be justified when the probability distribution (2.5) falls off slowly in the imaginary direction. In ref. [7], it was noticed that the wrong convergence associated with the zeroes of the fermion determinant [36] is actually due to the slow fall-off of the probability distribution (2.5) toward the singularities of the drift term. While this argument provided theoretical understanding of the cases in which the CLM gives wrong results, the precise condition on the probability distribution was not specified. Furthermore, there is actually a subtlety in defining the time-evolved observable O(z; t) as we discuss in the next subsection.

the condition for correct convergence
Here we review the refined argument for justification of the CLM, which leads to the condition for correct convergence [17].
The basic idea in proving the equality (2.6) is to consider the time evolution of the expectation value Φ(t), which is given by where we have defined the time-evolved observable . Expanding the right-hand side of (2.8) with respect to ǫ and integrating η out, one can rewrite (2.7) as where we have defined a differential operator acting on a holomorphic function of z k , and the symbol : · · · : implies that the derivatives are moved to the right, i.e., : Taking the ǫ → 0 limit in (2.9), one naively obtains and a finite time evolution of Φ(t) as Assuming that (2.12) is valid for finite τ at arbitrary t, one can derive the time evolution of an equivalent system of real variables by induction with respect to t, from which eq. (2.6) follows.
The expressions such as (2.9) and (2.12) need some care, though. In order for the ǫexpansion (2.9) to be valid, the integral on the right-hand side should be convergent for all n. In order for the expression (2.12) to be valid for finite τ , the integral on the right-hand side should be convergent for all n, and on top of that, the infinite sum over n should have a finite convergence radius, which may depend on t.
The issues raised above are nontrivial since the drift term v k (z) in the differential operator (2.10) can become large for some z = x + iy, which appears with the probability distribution P (x, y; t). Defining the magnitude of the drift term u(z) in a suitable manner, the most dominant contribution fromL n in (2.9) and (2.12) can be estimated asL n ∼ u(z) n . Therefore, the integral appearing in the infinite series can be estimated as where we have defined the probability distribution of u(z) by (2.14) In order for (2.11) to be valid, (2.13) should be finite for arbitrary n, which requires that p(u; t) should fall off faster than any power law. In order for the infinite series (2.12) to have a finite convergence radius, p(u; t) should fall off exponentially or faster. Since the latter condition is slightly stronger than the former, it can be regarded as a necessary and sufficient condition for correct convergence in the CLM.
In the previous argument [5,6], eq. (2.11) was derived in a continuous time formulation, where the time evolution of the probability distribution P (x, y; t) was converted to the time evolution of the observable using integration by parts. If the integration by parts can be justified and eq. (2.11) indeed holds, the right-hand side of (2.11) should vanish in the long time limit. This was proposed as a necessary condition for correct convergence in the CLM. On the other hand, it was implicitly assumed that the infinite series in (2.12) has an infinite convergence radius. This assumption is actually too strong, though, as we have discussed above. Our refined argument based on induction only requires that the infinite series in (2.12) should have a finite convergence radius at arbitrary t. This leads to a condition, which is slightly stronger than the condition required for justifying the integration by parts as one can see from our derivation of (2.11) based on the ǫ-expansion.
As we mentioned in the previous subsection, the situation in which the CLM fails can be classified into two cases. One is the case in which the complexified variables make long excursions in the imaginary directions (the excursion problem) [5,6], and the other is the case in which the drift term has singularities and the complexified variables come close to these points frequently (the singular-drift problem) [7]. In both these cases, the magnitude of the drift term tends to become large, and the probability distribution of the drift term can have a power-law behavior at large magnitude. Thus, our criterion can detect these two problems in a unified manner, and more importantly, it enables us to determine precisely the parameter region in which these problems occur. The usefulness of our criterion was demonstrated in ref. [17] for two simple one-variable models, which suffer from the excursion problem and the singular drift problem, respectively, in some parameter region. Whether it is useful also for systems with many degrees of freedom is the issue we address in what follows.

gauge cooling
In the present work, we use the so-called gauge cooling to reduce the excursion problem or the singular-drift problem as much as possible. Here we briefly review the basic idea of this technique using the system (2.1) as a simple example. Suppose the original system (2.1) has a symmetry under x ′ k = g kl x l , where g kl is an element of a Lie group. Upon complexification x k → z k , the symmetry of the action and the observables is enhanced to z ′ k = g kl z l , where g kl is an element of a Lie group obtained by complexifying the original Lie group. Using this fact, one can improve the Langevin process (2.2) asz (η) where the first line represents the gauge cooling. At each Langevin step, one chooses an appropriate transformation function g depending on the previous configuration in such a way that possible problems of the CLM are avoided. This gauge cooling was originally proposed as a technique to solve the excursion problem [8], but later it was shown to be useful also in solving the singular-drift problem [38,39]. Theoretical justification of this technique was given explicitly in refs. [11,17].

2d Yang-Mills theory with a complex coupling constant
In this section, we apply the CLM to 2dYM with a complex coupling constant, which suffers from the excursion problem in some parameter region [35].

the model
Let us consider 2dYM with an SU(N c ) gauge group, which is defined by where n represents a site on a N t × N s lattice with periodic boundary conditions and U 12 (n) represents the plaquette defined in terms of the link variables U nµ ∈ SU(N c ) by U 12 (n) = U n,1 U n+1,2 U † n+2,1 U † n,2 withμ being the unit vector in the µ direction (µ = 1, 2). This system can be solved analytically using the character expansion [30][31][32], and the partition function is given by where I n (x) is the modified Bessel function of the first kind and V = N t N s is the number of sites on the lattice. The parameter β in (3.2) is related to the gauge coupling constant g by β = 1/g 2 , and it is usually taken to be real positive, in which case the action is real and the sign problem does not occur. Note, however, that the model itself is well defined also for complex β, in which case the action is complex and the sign problem occurs. Since the analytic solution (3.3) remains valid for complex β, this simple gauge theory serves as a useful testing ground for methods which aim at solving the sign problem.
In order to apply the CLM to the 2dYM with complex β, we complexify the link variables as U nµ → U nµ ∈ SL(N c , C) and extend the action and the observables to holomorphic functions of the complexified variables. For instance, the plaquette is extended to

4)
and the action is extended to Note that the Hermitian conjugate of U nµ is replaced with the inverse of U nµ so that the action becomes a holomorphic function of the complexified link variables U nµ . A fictitious time evolution of the complexified link variables is defined by the complex Langevin equation with gauge cooling as where t is the discretized Langevin time with a stepsize ǫ. The SU(N c ) generators t a (a = 1, · · · , N 2 c − 1) are normalized as tr (t a t b ) = δ ab and the real Gaussian noise η anµ (t) is normalized as The gauge transformation in (3.6) represents the gauge cooling, where g n takes values in SL(N c , C). In this model, the complexified link variables can have large components in the non-compact direction of SL(N c , C) [35], which represents the excursion problem. We try to avoid this problem as much as possible by using the gauge cooling. For that purpose, we define the unitarity norm, which measures the deviation of the link variables from SU(N c ), and choose the gauge transformation in (3.6) in such a way that the norm N is minimized [8].
Let us define the magnitude u n (U ) of the drift term at site n by with v anµ (U ) being the drift term (3.8). Then, the probability distribution of the drift term can be defined as where P (U ) represents the probability distribution of U nµ (t) in the t → ∞ limit. This definition of p(u) respects the SU(N c ) gauge symmetry of the original theory.

results
Here we present our results obtained by the CLM. Following [35], we choose N c = 2, the lattice size N s = N t = 4 and β = 1.5 e iθ with various θ. The simulation was performed for the total Langevin time t ∼ 500 with a fixed stepsize ǫ = 10 −5 .
In Fig. 1 (Left), we show the expectation value of the average plaquette defined by as a function of θ, which agrees with the result obtained in ref. [35]. In particular, the CLM fails to reproduce the exact results for θ 0.6. As was reported in ref. [35], the unitarity norm (3.9) is under control for all the parameters investigated and, in particular, it does not blow up even for the cases in which the CLM gives wrong results. In ref. [35], the scatter plot of the average plaquette was also studied, but the results for θ 0.6 were not able to detect the existence of the excursion problem.
In Fig. 1 (Right), we show the probability distribution (3.11) of the drift term for various θ. We find that the distribution falls off exponentially or faster for θ 0.4, while it falls off only by a power law for θ 0.6. This implies that our criterion based on the probability distribution of the drift term can indeed tell the parameter region in which the CLM gives correct results.

Chiral Random Matrix Theory for finite density QCD
In this section, we consider the cRMT [33,34], which was proposed as a toy model for finite density QCD. This model was studied by the CLM in refs. [36][37][38][39], and it was found that the singular-drift problem occurs in some parameter region.

the model
The partition function of the model is given by [34] where N f is the number of flavors and m > 0 represents the degenerate quark mass. The dynamical variables consist of two general N × (N + ν) complex matrices Φ k (k = 1, 2), where the integer ν represents the topological index. The action S b in (4.1) is given by where Ψ k (k = 1, 2) are (N + ν) × N matrices defined by The reason for introducing new matrices representing the Hermitian conjugate of Φ k will be clear shortly. The Dirac operator D in (4.1) is given by where µ is the chemical potential. The effective action of this model reads When one tries to apply Monte Carlo methods to this model, the sign problem occurs for µ = 0 due to the complex fermion determinant det(D + m). To see that, let us note first that the Dirac operator D satisfies the relation for any µ. This implies that all the nonzero eigenvalues of D are paired with the ones with the sign flipped. When µ = 0, D is anti-Hermitian and its eigenvalues are purely imaginary, which implies that the determinant det(D + m) is real semi-positive. On the other hand, when µ = 0, D is no longer anti-Hermitian and its eigenvalues can take complex values. In this case, the determinant det(D + m) is complex in general, which causes the sign problem. Since the model is actually analytically solvable, it serves as a useful toy model for investigating the sign problem that occurs in finite density QCD. Let us apply the CLM to the cRMT with µ = 0. First we consider real variables corresponding to the real part and the imaginary part of (Φ k ) ij and complexify these variables. The action and the observables are extended to holomorphic functions of these complexified variables by analytic continuation. It is easy to convince oneself that this simply amounts to disregarding the constraint (4.3) and extending the action and the observables to holomorphic functions of Φ k and Ψ k (k = 1, 2).
A fictitious time evolution of the complex matrices Φ k and Ψ k (k = 1, 2) is given by the complex Langevin equation with gauge cooling as where W = m 2 −XY is an N ×N matrix. The N ×(N +ν) matrices η k (t) have components taken from complex Gaussian variables normalized by η k, Eq. (4.7) represents the gauge cooling with g ∈ GL(N, C) and h ∈ GL(N + ν, C), which are obtained by complexifying the U(N ) × U(N + ν) symmetry of the original model (4.1).
In ref. [36], the same model was studied by the CLM without gauge cooling, and it was found that one obtains wrong results for small quark mass or large chemical potential. The reason for this failure is the singular-drift problem [7], which occurs due to eigenvalues of (D+m) close to zero. In ref. [39], we proposed to use the gauge cooling to avoid the singular drift problem as much as possible. 2 There, three different types of "norm" were considered so that the gauge transformations g and h in (4.7) can be determined by minimizing them.
A counterpart of the unitarity norm (3.9) in gauge theory, which is called the Hermiticity norm, can be defined as which measures the violation of the relation (4.3). It turned out, however, that the gauge cooling with this norm does not reduce the singular drift problem. This led us to consider a norm that is related directly to the eigenvalue distribution of the Dirac operator. A simple choice is given by which measures the deviation of the Dirac operator (4.4) from an anti-Hermitian matrix. The gauge cooling with this norm has an effect of making the eigenvalue distribution of D + m narrower in the real direction. Another choice is given by where ξ is a real parameter and α a are the real semi-positive eigenvalues of M † M with M = D + m. In eq. (4.11), we take a sum over the n ev smallest eigenvalues of M † M . The gauge cooling with this norm has an effect of achieving α a 1/ξ and suppressing the appearance of small α a . Since α a 1/ξ implies |λ a | 2 1/ξ, where λ a are the eigenvalues of M , it is also expected to suppress the appearance of λ a close to zero. In some cases, the use of the norm (4.10) or (4.11) causes the excursion problem. In order to avoid this, we consider a combined norm where s (0 ≤ s ≤ 1) is a tunable parameter. Let us discuss how we define the probability distribution of the drift term. Here we set the topological index ν = 0 for simplicity so that the dynamical variables Φ k and Ψ k are N × N square matrices. We denote the drift terms of Φ k and Ψ k by F k and G k (k = 1, 2), and represent the eigenvalues of (F † k F k ) 1/2 and (G † k and w (a) k (a = 1, · · · , N ), respectively. Then we define the probability distribution of the drift term as where P (Φ, Ψ) is the probability distribution of Φ k (t) and Ψ k (t) in the t → ∞ limit. This definition of p(u) respects the U (N ) × U (N ) symmetry of the original theory.
In Fig. 2 (Left), we show the real part of the chiral condensate Σ = 1 N ∂ ∂m log Z (4.14) obtained by the CLM with or without gauge cooling and compare it with the exact result given in ref. [33]. In the case without gauge cooling (Top), we find that the result of the CLM deviates from the exact result form 11. On the other hand, in the case with gauge cooling using the normN 1 (Middle), we find that the result of the CLM deviates from the exact result form 2. When we use the normN 2 (Bottom) instead, the deviation occurs form 1. In Fig. 2 (Right), we show the probability distribution p(u) of the drift term obtained by the CLM with or without gauge cooling focusing on the parameter region in which the result starts to deviate from the exact result. In the case without gauge cooling (Top), we find that the probability distribution falls off exponentially or faster form 12, while it develops a power-law tail form 11. In the case with gauge cooling using the norm N 1 (Middle), we find that the probability distribution falls off exponentially or faster for m 3, while it develops a power-law tail form 2. When we use the normN 2 (Bottom) instead, the probability distribution falls off exponentially or faster form 2, while it develops a power-law tail form 1. This implies that the probability distribution of the drift term can indeed tell the parameter region in which the CLM gives correct results.

Summary and discussions
In this paper we have shown that the probability distribution of the drift term indeed provides a useful criterion for judging the reliability of results obtained by the CLM. According to our criterion, the CLM gives correct results when the probability distribution of the drift term falls off exponentially or faster. We have tested it in two solvable models with many dynamical degrees of freedom, i.e., 2dYM with a complex coupling and the cRMT for finite density QCD. While the CLM was known to fail in these models in some parameter region due to the excursion problem and the singular-drift problem, respectively, it was not possible to tell precisely in which region the CLM fails without knowing the exact results. Our criterion was able to determine this parameter region clearly. Note also that the two apparently different problems can be detected by a single criterion in a unified manner.
While it is widely appreciated that the CLM enables explicit calculations in various interesting models with the sign problem at least in certain parameter regions, its usefulness would be rather limited if there is no way to tell precisely in which parameter region it works. The establishment of such a criterion that enables this is therefore of particular importance. It should be also emphasized that our criterion requires essentially no additional cost since the drift term is calculated anyway at each step in the Langevin simulation. Indeed our criterion has played a crucial role in investigating the spontaneous symmetry breaking in matrix models [20,29] motivated by superstring theory. We consider that our criterion is indispensable in applying the CLM to many other interesting systems such as finite density QCD [25].