Large-$N$ $SU(N)$ Yang-Mills theories with milder topological freezing

We simulate $4d$ $SU(N)$ pure-gauge theories at large $N$ using a parallel tempering scheme that combines simulations with open and periodic boundary conditions, implementing the algorithm originally proposed by Martin Hasenbusch for $2d$ $CP^{N-1}$ models. This allows us to dramatically suppress the topological freezing suffered by standard local algorithms, reducing the autocorrelation time of $Q^2$ by up to two orders of magnitude. Using this algorithm in combination with simulations at non-zero imaginary $\theta$, we refine state-of-the-art results for the large-$N$ behavior of the quartic coefficient $b_2$ of the $\theta$-dependence of the vacuum energy, reaching an accuracy comparable with that of the large-$N$ limit of the topological susceptibility.


INTRODUCTION
The dependence of physical observables on the topological parameter θ is one of the most interesting properties of four-dimensional SU(N) pure-gauge theories. The parameter θ is coupled in the euclidean action to the topological charge

Q = (1/32π²) ∫ d⁴x ε_{μνρσ} Tr[F_{μν}(x) F_{ρσ}(x)].   (1)

The θ-dependence can be studied to achieve a better understanding of the non-perturbative features of Yang-Mills theories, but it also has direct phenomenological implications for hadron physics in the limit of a large number of colors [1][2][3][4][5][6][7]. A particularly interesting quantity, whose dependence on θ has been thoroughly investigated, is the vacuum energy density E(θ), which is formally defined by the relation

E(θ) = −(1/V) log Z(θ),   Z(θ) = ∫ [dA] e^{−S_YM[A] + iθ Q[A]},   (2)

where V is the four-dimensional euclidean space-time volume. The functional form of E(θ) is known only for very specific theories, like QCD close to the chiral limit [7]. It is customary to consider a Taylor expansion of E(θ) around θ = 0. Since Q is odd under a CP transformation, only even powers of θ appear in the expansion, which can be parametrized in the form

E(θ) − E(0) = (1/2) χ θ² (1 + b₂ θ² + b₄ θ⁴ + · · ·).   (3)

The coefficients of this expansion are related to the cumulants of the topological charge distribution at θ = 0; for instance χ = ⟨Q²⟩_{θ=0}/V, where χ is the topological susceptibility.

(* Electronic address: claudio.bonanno@pi.infn.it; † Electronic address: claudio.bonati@unipi.it; ‡ Electronic address: massimo.delia@unipi.it)
While the exact numerical values of the coefficients χ and b_{2n} are generically unknown, something is known about their dependence on the number of colors N, at least when N is large enough. Indeed, assuming that a non-trivial θ-dependence survives in the large-N limit, it is possible to fix the large-N scaling form of the energy density, E(θ, N) = N² Ē(θ/N) [4,6,8], implying

χ(N) = χ̄ + O(1/N²),   b_{2n}(N) = b̄_{2n}/N^{2n} + O(1/N^{2n+2}).   (4)

The value of χ̄ is related to the mass of the η′ meson by the Witten-Veneziano formula [4,5], which provides the estimate χ̄ ≃ (180 MeV)⁴. Analytic estimates of χ̄ and of the b̄_{2n} coefficients are available only for two-dimensional models [9][10][11][12][13]. Given the non-perturbative nature of θ-dependence, the numerical lattice approach is the natural tool to investigate such topics quantitatively, and in particular to test large-N predictions [11,[13][14][15][16][17][18][19][20][21][22][23][24][25][26][27]. There are, however, some non-trivial computational challenges that have to be faced, especially in the large-N regime. The first problem is related to the measurement of the coefficients b_{2n}. The task is challenging by itself since, contrary to what happens for χ, these coefficients approach zero as N is increased. In addition, the simplest estimators available for θ = 0 simulations are based on the cumulants of the topological charge distribution, which are not self-averaging quantities, leading to a bad signal-to-noise ratio for large volumes. This problem can be solved by introducing an explicit source term in the action, coupled to the topological charge density, which corresponds in practice to an imaginary-θ term [22,[28][29][30][31][32][33][34][35][36]. The method was exploited for the determination of the b_{2n} coefficients first in Ref. [22] and later developed and applied in several works to both SU(N) Yang-Mills theory [13,[25][26][27] and two-dimensional CP^{N−1} models [37,38].
The second problem is the so-called "freezing problem", a well-known issue in a wide range of theories sharing the presence of topological excitations [17,[39][40][41][42][43][44][45][46][47][48][49], and it will be the main topic of this work. When adopting local update algorithms on lattices with periodic boundary conditions, the topological modes experience a severe critical slowing down when approaching the continuum limit, with the autocorrelation time of the topological charge growing approximately exponentially as a function of 1/a [17,43]. As a consequence, gauge configurations stay fixed (frozen) in a given topological sector for an exceedingly large amount of Monte Carlo time, thus preventing a correct sampling of the path-integral distribution. This problem becomes even worse in the large-N limit, since at fixed lattice spacing a the autocorrelation time of the topological charge seems to grow exponentially with the number of colors [17,37,43].
Although a definitive solution to the topological freezing problem has been obtained only in toy models [48], several strategies have been proposed to reduce its severity, in particular by turning the exponential critical slowing down into a polynomial one [44,45,48,50], or by extracting information even from completely frozen configurations [51]. A popular method, proposed in Ref. [44], is to adopt open boundary conditions for the gauge fields in the time direction instead of the usual periodic ones. The presence of an open boundary eliminates the barriers among different topological sectors even in the continuum, and one expects the critical slowing down of the topological modes to be essentially diffusive, with an autocorrelation time increasing as 1/a² when approaching the continuum limit. Using this method, however, one also breaks translation invariance and completely loses any notion of global topological charge. Therefore χ and b_{2n} can only be estimated from the integrals of the (2n + 2)-point connected correlators of the topological charge density on the bulk of the lattice.
A different strategy, which keeps the advantages of open boundary conditions without breaking translation invariance, was proposed by M. Hasenbusch in Ref. [52], where it was tested for two-dimensional CP^{N−1} models. The basic idea of this method is to combine periodic and open boundary conditions in a parallel tempering framework, using the copies with open or partially open boundary conditions as sources of topological fluctuations for the copy with periodic boundary conditions, which is the one on which measurements are performed. To reduce the number of copies used in the parallel tempering, open boundary conditions are not enforced along the whole temporal boundary, but only in a limited spatial region, which will be referred to as the "defect" in the following.
In Ref. [38] it was shown that, in two-dimensional CP^{N−1} models at large N, the adoption of the Hasenbusch algorithm in combination with the imaginary-θ method achieves impressive improvements over previously available results for the θ-dependence of these models. The aim of the present work is to test the same setup in the case of four-dimensional SU(N) Yang-Mills theories at zero temperature, comparing its performance with that of standard simulations. In doing so we will also refine the state-of-the-art results on the large-N behavior of the b₂ coefficient.
This paper is organized as follows: in Sec. 2 we discuss our lattice setup, along with the parallel tempering algorithm and the imaginary-θ method, in Sec. 3 we show the numerical results obtained with the parallel tempering and finally in Sec. 4 we draw our conclusions.

LATTICE SETUP
In this section we introduce the discretizations adopted for the action and the topological charge, we present a summary of the imaginary-θ method and discuss the parallel tempering algorithm employed in our simulations.

A. Lattice action and lattice topological charge
We discretize the Yang-Mills action on a hyper-cubic lattice of size L, with periodic boundary conditions in every direction (see Sec. 2 C for the defect), using the standard Wilson action

S_W = −(β/N) Σ_{x, μ<ν} Re Tr Π_{μν}(x),   (5)

where β is the bare inverse coupling and

Π_{μν}(x) = U_μ(x) U_ν(x + μ̂) U_μ†(x + ν̂) U_ν†(x)   (6)

is the plaquette. For the topological charge (1), we adopt the simplest discretization with definite parity, the so-called clover discretization

Q_clov = (1/2⁹π²) Σ_x Σ_{μνρσ = ±1}^{±4} ε̃_{μνρσ} Tr[C_{μν}(x) C_{ρσ}(x)],   (7)

where ε̃_{μνρσ} is the Levi-Civita tensor extended to negative directions, and C_{μν}(x) is a discretization of the field strength given by the sum of all the 4 plaquettes centered in the site x and lying in the μ-ν plane. Q_clov is generically non-integer and it is related, configuration by configuration, to the physical charge Q by [53]

Q_clov = Z Q + η,   (8)

where Z is a finite renormalization constant that approaches 1 in the continuum limit, and η is a stochastic noise due to ultraviolet (UV) fluctuations at the scale of the lattice spacing. Using the variance of Q_clov to estimate the topological susceptibility would require taking into account both multiplicative and additive renormalizations; these can be avoided by using one of the several smoothing procedures that have been proposed in the literature, such as the gradient flow [54,55] or cooling [56][57][58][59][60][61][62], which are all known to agree with each other when properly matched [63,64]. In this work we use cooling, due to its simplicity and numerical effectiveness.
We denote by Q_clov^cool the topological charge obtained by measuring the observable in Eq. (7) on a configuration to which a certain number of cooling steps has been applied. To assign an integer topological charge Q_L to each configuration we follow Ref. [17], defining

Q_L = round(α Q_clov^cool),   (9)

where "round" denotes rounding to the closest integer, and the value of α is fixed by minimizing

⟨(α Q_clov^cool − round(α Q_clov^cool))²⟩,   (10)

so that the maxima of the distribution of α Q_clov^cool are located approximately at integer values; this fixing is performed at θ = 0 and then adopted also for θ ≠ 0. The topological susceptibility computed using Q_L becomes stable (i.e., independent of the number of cooling steps n_cool) after n_cool ∼ 10; moreover, this threshold turns out to be only weakly dependent on the lattice spacing. We therefore chose n_cool = 20 to define the topological charge in all simulations, verifying also the stability of all continuum extrapolations when a different value of n_cool is used.
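As an illustration, the rounding procedure just described can be sketched in a few lines of Python. This is a minimal sketch under our own conventions (the function name `round_charge` and the grid of trial α values are illustrative choices, not the authors' code):

```python
import numpy as np

def round_charge(q_cool, alphas=np.linspace(0.8, 1.2, 4001)):
    """Assign integer charges Q_L = round(alpha * Q_clov^cool).

    alpha is fixed by minimizing the mean squared distance of alpha*q
    from the nearest integer over a grid of trial values, so that the
    peaks of the alpha*q distribution sit near integer values."""
    q = np.asarray(q_cool, dtype=float)
    # mean squared deviation from the nearest integer, for each trial alpha
    cost = [np.mean((a * q - np.rint(a * q)) ** 2) for a in alphas]
    alpha = float(alphas[int(np.argmin(cost))])
    return np.rint(alpha * q).astype(int), alpha
```

In practice the minimization compensates the multiplicative renormalization Z of the cooled charge, so that the rounded values land on the integers of the physical charge distribution.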

B. Imaginary-θ method
As anticipated in the introduction, the imaginary-θ method is a technique useful to estimate the topological susceptibility and especially the coefficients b_{2n} introduced in Eq. (3). In this section we provide a short summary of this computational method, referring to Ref. [25] for more details. The idea of the method is to introduce an imaginary θ term, in order to avoid a sign problem, and to extract χ and b_{2n} from the dependence on θ of the cumulants of the topological charge distribution: the method wins over the standard computation at θ = 0 since the information on the b_{2n} parameters is now contained in all cumulants, including the lowest-order ones. The procedure is most conveniently explained by working formally in the continuum: the continuum euclidean action can be written in the form

S(θ_I) = S_YM − θ_I Q,   (11)

where θ_I = iθ. The dependence on θ_I of the cumulants of the topological charge distribution can be computed from the derivatives of E(θ) in Eq. (2), properly continued to the imaginary axis, and can be expressed in terms of χ and b_{2n} using Eq. (3). As an example, the explicit expressions of the first few cumulants as functions of θ_I read

k₁ = V χ θ_I (1 − 2 b₂ θ_I² + 3 b₄ θ_I⁴ + · · ·),
k₂ = V χ (1 − 6 b₂ θ_I² + 15 b₄ θ_I⁴ + · · ·),
k₃ = V χ (−12 b₂ θ_I + 60 b₄ θ_I³ + · · ·),
k₄ = V χ (−12 b₂ + 180 b₄ θ_I² + · · ·),   (12)

where V is the space-time volume and the first four cumulants of the topological charge are

k₁ = ⟨Q⟩,
k₂ = ⟨Q²⟩ − ⟨Q⟩²,
k₃ = ⟨Q³⟩ − 3⟨Q²⟩⟨Q⟩ + 2⟨Q⟩³,
k₄ = ⟨Q⁴⟩ − 4⟨Q³⟩⟨Q⟩ − 3⟨Q²⟩² + 12⟨Q²⟩⟨Q⟩² − 6⟨Q⟩⁴.   (13)

All these averages are computed using the weight e^{−S(θ_I)} in the path-integral.
Let us now describe what changes on the lattice: the lattice action is

S_L(θ_L) = S_W − θ_L Q_clov,   (14)

where the θ-term is discretized using the non-smoothed clover charge Q_clov defined in the previous subsection, and θ_L is the bare imaginary-θ coupling. The reason for using Q_clov is that with this choice standard heat-bath and overrelaxation algorithms can be used in the update. Relations analogous to Eq. (12) are then obtained with θ_I = Z θ_L, where Z is the renormalization constant appearing in Eq. (8). Measuring the cumulants for several values of θ_L, we can thus fit the values of χ, b_{2n} and Z. We explicitly note that the cumulants are not affected by the renormalization of Q_clov, since they are evaluated using the smoothed and rounded charge Q_L introduced in the previous section (see Ref. [38] for a more detailed discussion of this point).
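The fitting strategy just outlined can be sketched schematically: a joint fit of the first cumulants versus θ_L, with θ_I = Z θ_L, extracts Vχ, b₂ and Z simultaneously. The sketch below (truncated at b₄ = 0 and using only the first two cumulants; the function names are our own, not the authors' analysis code) illustrates the idea with an uncorrelated least-squares fit:

```python
import numpy as np
from scipy.optimize import curve_fit

def cumulant_model(x, chiV, b2, Z):
    """First two cumulants of the topological charge versus the bare
    coupling theta_L, truncated at b4 = 0:
        k1 = V*chi * tI * (1 - 2*b2*tI**2),   tI = Z*theta_L
        k2 = V*chi * (1 - 6*b2*tI**2)
    x stacks (theta_L values, cumulant index)."""
    theta_L, which = x
    tI = Z * theta_L
    k1 = chiV * tI * (1.0 - 2.0 * b2 * tI ** 2)
    k2 = chiV * (1.0 - 6.0 * b2 * tI ** 2)
    return np.where(which == 1, k1, k2)

def fit_theta_dependence(theta_L, k1_data, k2_data, p0=(10.0, -0.05, 1.0)):
    """Joint (uncorrelated) fit of k1 and k2 versus theta_L, returning
    estimates of V*chi, b2 and the renormalization constant Z."""
    x = (np.tile(theta_L, 2),
         np.repeat([1.0, 2.0], len(theta_L)))
    y = np.concatenate([k1_data, k2_data])
    popt, _ = curve_fit(cumulant_model, x, y, p0=p0)
    return popt
```

Note that Vχ and Z are separately identifiable because k₂ fixes Vχ at θ_L = 0, while the slope of k₁ fixes the product VχZ.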

C. Parallel tempering of volume defect
The lattice action in Eq. (14), like the standard Wilson action, is linear in each of the link variables; hence standard heat-bath [65,66] and overrelaxation [67] algorithms can be applied to update the gauge configurations when using the imaginary-θ method. However, as anticipated in the introduction, the local nature of this updating procedure results in a slowing down of the topological modes which is exponential both in the inverse lattice spacing and in the number N of colors. This makes it very difficult to perform controlled continuum extrapolations for large values of N, since even simulations at moderately small values of the lattice spacing become prohibitively expensive as N is increased.
To mitigate this problem, in this work we adopt the parallel tempering algorithm proposed for the CP^{N−1} models in Ref. [52], where it was shown to perform as well as simulations with open boundaries while bypassing their complications related to finite-size effects. Moreover, as shown for CP^{N−1} models in Ref. [38], parallel tempering can be easily applied in combination with the imaginary-θ method discussed in the previous subsection.
In this algorithm we consider N_r identical systems, differing only in the boundary conditions on a cuboid defect located on a given spatial slice. In particular, the boundary conditions imposed on the links that orthogonally cross the defect are chosen so that the different copies interpolate between periodic boundary conditions (pbc) and open boundary conditions (obc). Each system is evolved independently using standard local algorithms, and different copies are exchanged from time to time with a Metropolis step, so that the strong reduction of the autocorrelation time achieved in the obc copy is transferred to the pbc one, on which the measurement of the cumulants of the topological charge Q_L is performed. Since the injection (or ejection) of topological charge in the system is mainly triggered by the update of the links close to the defect, it is convenient [52] to alternate updating sweeps over the whole lattice with hierarchic updates over sub-regions of the lattice centered around the defect. In particular, we updated more frequently small hyper-rectangular regions centered around the defect.
In our simulations the defect is a three-dimensional cubic region lying on a fixed lattice slice, and the links affected by the tempered boundary conditions are those crossing it orthogonally; however, after every hierarchic update, we perform a random translation of the pbc copy by one lattice spacing, thus effectively moving the location of the defect. For the sake of simplicity we use a cubic defect, with all sides equal to a common length L_d; it is sufficient to choose L_d equal to a few lattice spacings to obtain satisfactory performance. For a discussion of how the choice of L_d affects the efficiency of the algorithm, see Sec. 3 A.
In order to specify how the different boundary conditions across the defect are implemented, it is convenient to rescale each link of every replica entering the Wilson action according to

U_μ(x) → K_μ^{(r)}(x) U_μ(x),   (15)

where, for the r-th replica,

K_μ^{(r)}(x) = c(r) if U_μ(x) orthogonally crosses the defect, and K_μ^{(r)}(x) = 1 otherwise,   (16)

so that only the links crossing the volume defect are affected by its presence. For the pbc replica (corresponding to r = 0) we have c(0) = 1; for the obc replica (corresponding to r = N_r − 1) we have c(N_r − 1) = 0; for 0 < r < N_r − 1 the value of c(r) interpolates between 0 and 1. With these notations the action of the r-th copy reads

S^{(r)}(θ_L) = S_W^{(r)} − θ_L Q_clov,   (17)

where S_W^{(r)} is the Wilson action (5) computed with the rescaled links of Eq. (15). Note that we chose to keep the θ term insensitive to the presence of the defect, which only affects the Wilson part of the action. Exploratory simulations performed by modifying also the θ term provided evidence that this choice does not significantly affect the performance of the algorithm, analogously to what was found for CP^{N−1} models in Ref. [38]. This is not surprising, since the barriers between the topological sectors, responsible for the critical slowing down, stem essentially from the Wilson term.
The swap of replicas was proposed after every step of hierarchic update, for every pair of adjacent copies r and r + 1 (alternating the cases of even and odd r in order to avoid synchronization problems), and was accepted with the Metropolis probability

p(r, r + 1) = min{1, e^{−ΔS}},   ΔS = S_W^{(r)}[U^{(r+1)}] + S_W^{(r+1)}[U^{(r)}] − S_W^{(r)}[U^{(r)}] − S_W^{(r+1)}[U^{(r+1)}],   (18)

where the θ-term, which is not affected by the defect, does not enter the acceptance probability. Since the boundary conditions of different replicas differ only on a sub-region of the lattice, to compute ΔS it is sufficient to sum the contributions to the action of the plaquettes centered on sites lying in a hyper-cuboid region centered around the defect and extending one lattice spacing from it. For parallel tempering to be effective in decorrelating the topological modes of the pbc copy, there must be no bottleneck for a configuration in the obc copy to be swapped toward the pbc one. In order to guarantee a barrier-free random walk of the configurations between the different replicas, it is thus convenient to choose the constants c(r) entering K_μ^{(r)}(x) in such a way that the acceptance is constant for all the proposed swaps: p(0, 1) = p(1, 2) = · · · = p(N_r − 2, N_r − 1).
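The swap step can be sketched in code. The toy below only fixes the bookkeeping of the exchange (the Metropolis test on the action difference, the alternation of even and odd pairs, and the tracking of each configuration's random walk through the replica chain); `action(r, U)` stands in for the Wilson action of configuration U evaluated with the boundary factor c(r), and all names are our own illustrative choices:

```python
import math
import random

def swap_sweep(configs, labels, action, even, rng):
    """One sweep of replica-swap proposals for adjacent pairs (r, r+1),
    accepted with probability min{1, exp(-Delta S)}.

    configs[r]  : configuration currently held by replica r
    labels[r]   : tag identifying that configuration (to track its walk)
    action(r,U) : action of configuration U with boundary factor c(r)
    even        : propose pairs with even r if True, odd r otherwise
    """
    accepted = 0
    for r in range(0 if even else 1, len(configs) - 1, 2):
        # action change under the exchange of the two configurations
        dS = (action(r, configs[r + 1]) + action(r + 1, configs[r])
              - action(r, configs[r]) - action(r + 1, configs[r + 1]))
        if dS <= 0 or rng.random() < math.exp(-dS):
            configs[r], configs[r + 1] = configs[r + 1], configs[r]
            labels[r], labels[r + 1] = labels[r + 1], labels[r]
            accepted += 1
    return accepted
```

In a real gauge code the swap exchanges only labels (or pointers), and ΔS is accumulated from the plaquettes touching the defect, as described above.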

NUMERICAL RESULTS
In order to compare the performance of parallel tempering with that of the standard algorithm, we performed simulations for N = 4 and 6 with both algorithms and for several values of β at θ_L = 0, measuring the autocorrelation time of Q²_L. We then performed simulations at non-zero θ_L with parallel tempering for each value of β, to estimate χ, b₂ and b₄ using the imaginary-θ method.
In Tab. I we summarize the parameters of the performed simulations. Lattice sizes have been chosen to ensure that L√σ ≳ 3, where σ ≃ (440 MeV)² is the string tension, so as to keep finite-size effects under control [13,68]. Statistics reported in Tab. I refer to the parallel tempering simulations and are given in number of parallel tempering steps. A single tempering step consists, first of all, of a complete update of each replica, using 5 lattice sweeps of overrelaxation [67] followed by 1 lattice sweep of heat-bath [65,66], both implemented à la Cabibbo-Marinari [69], i.e., updating all the N(N − 1)/2 diagonal SU(2) subgroups of SU(N). After this "global" update step we perform an iteration on the sub-lattices entering the hierarchic update (see Sec. 2 C), each iteration consisting of:

• a local update sweep of the sub-lattices for every replica, using the same combination of local algorithms adopted for the global update;

• the parallel tempering swap proposal;

• a random translation of the pbc copy by one lattice spacing.
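The scheduling of one tempering step can be sketched as follows. Only the control flow is fixed here; the actual update routines are passed in as callables, and the function names are our own illustrative assumptions:

```python
def tempering_step(replicas, global_sweep, hierarchic_sweep,
                   propose_swaps, translate_pbc, n_hier=3):
    """One parallel tempering step, following the scheme described above:
    a full update of every replica, then n_hier hierarchic iterations,
    each made of sub-lattice sweeps near the defect, a swap proposal
    and a random translation of the pbc copy (replicas[0])."""
    for r, cfg in enumerate(replicas):
        global_sweep(r, cfg)           # e.g. 5x overrelaxation + 1x heat-bath
    for _ in range(n_hier):
        for r, cfg in enumerate(replicas):
            hierarchic_sweep(r, cfg)   # local sweeps around the defect
        propose_swaps(replicas)        # Metropolis exchange of adjacent copies
        translate_pbc(replicas[0])     # shift the pbc copy by one spacing
```

Since every replica receives the same updates, the cost of one tempering step is ∼ N_r times that of a standard local update, which is the origin of the factor N_r in the effective autocorrelation time discussed below.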
Since each system is updated using the same procedure and the time required for hierarchic updates, swap and translations is negligible with respect to the time of the global update, the total numerical effort of a single parallel tempering step is ∼ N r times larger than the one required for the local update in the standard setup.

A. Parallel tempering: results and comparison
Simple inspection of the time histories of the topological charge shows that parallel tempering substantially reduces topological freezing, allowing us to perform simulations at values of the lattice spacing which would otherwise have been prohibitive with the standard algorithm. An example of the Monte Carlo time evolution of Q_L obtained with parallel tempering for N = 6, β = 25.75 and θ_L = 0 is shown in Fig. 1, where we compare it with the evolution obtained with the standard algorithm.
In order to quantitatively characterize the gain achieved with parallel tempering and to optimize its efficiency, it is useful to study the autocorrelation time of the topological susceptibility. As the definition of the autocorrelation time of a generic observable O we use the expression [71]

τ(O) = (1/2) (∆_binned O / ∆_naive O)²,   (19)

where ∆_binned O is the error assigned to O by a self-consistent binning analysis, while ∆_naive O is the usual standard error of the mean for independent identically distributed samples. The autocorrelation time of the topological susceptibility, however, does not take into account the increased computational effort of the parallel tempering algorithm with respect to standard local algorithms. As a figure of merit for the computational effectiveness of parallel tempering, it is thus convenient to introduce the effective autocorrelation time

τ_pt(O) = N_r τ(O).   (20)

As discussed in the previous section, in every parallel tempering simulation we tuned the parameters c(r) in such a way that the acceptance p(r, r + 1) of the Metropolis swap move between the replicas r and r + 1 is approximately independent of r. This tuning was performed using test simulations at θ_L = 0 (acceptances do not depend on θ_L), and in Fig. 2 we show an example of the behavior of c(r) for a run with N = 4 and β = 11.347. Deviations of the optimal c(r) values from linear behavior appear to be small; however, using these values τ_pt(Q²_L) is about half the one obtained with a simple linear interpolation.

[Tab. I caption: The last column refers to the total statistics accumulated for all imaginary-θ simulations. The defect length was, in all cases, L_d/a = 2. All simulations for N = 4 were performed using N_r = 10, corresponding to a constant swap probability p around 20%, while simulations for N = 6 used N_r = 17, corresponding to p ≈ 30%. Lattice spacings are taken from Ref. [70] or interpolations/extrapolations of data thereof, except for the one marked with *, which comes from Ref. [ ].]
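The binning definition of τ in Eq.-form above can be sketched directly. This is a generic implementation of the standard binning analysis (our own sketch, not the authors' analysis code); the bins must be much longer than τ for the estimate to plateau:

```python
import numpy as np

def tau_from_binning(series, n_bins=50):
    """Integrated autocorrelation time from a binning analysis,
        tau = 0.5 * (Delta_binned / Delta_naive)**2,
    where Delta_binned is the standard error computed from bin averages
    (bins much longer than tau) and Delta_naive is the naive i.i.d.
    standard error of the mean."""
    x = np.asarray(series, dtype=float)
    n = (len(x) // n_bins) * n_bins          # drop the incomplete last bin
    x = x[:n]
    naive_err = x.std(ddof=1) / np.sqrt(n)
    bin_means = x.reshape(n_bins, -1).mean(axis=1)
    binned_err = bin_means.std(ddof=1) / np.sqrt(n_bins)
    return 0.5 * (binned_err / naive_err) ** 2
```

For an uncorrelated series the two errors coincide and τ ≈ 1/2, while for a strongly autocorrelated series the binned error is inflated by the factor 2τ.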
Once p(r, r + 1) is almost independent of r, configurations move freely among the different replicas following a random walk, as shown in Fig. 3.
The constant value p reached by p(r, r + 1) after the tuning of c(r) obviously depends on the number N_r of replicas used. We did not perform a systematic investigation of the dependence of the numerical effectiveness of parallel tempering on p; however, this dependence seems to be quite mild. Indeed, by increasing the number of replicas N_r, the constant acceptance probability p grows and the autocorrelation time τ(Q²_L) is reduced, but the computational cost also increases with N_r. The net effect is that τ_pt(Q²_L) is largely insensitive to the specific value of p, at least as long as it is not too close to 0 or 1, as we verified in some test simulations. For example, simulations performed with N = 4 and β = 11.104 using p ≃ 20% (achieved with N_r = 10) or p ≃ 30% (corresponding to N_r = 12) provided consistent values of τ_pt(Q²_L): 72(10) and 78(18), respectively. The value of p was kept fixed while approaching the continuum limit, and in all cases we found that this could be achieved by using the same value of N_r for the different lattice spacings (at fixed physical volume).
Most of the simulations reported in this work used L_d/a = 2 for the size of the defect, a value that is sufficient to drastically reduce the freezing problem. To investigate the dependence of τ_pt on L_d, some additional simulations were performed with L_d/a = 3 and L_d/a = 1 for the case N = 6, always keeping the swap acceptance probability p fixed at about 30%, which requires scaling the number of replicas approximately as ∼ L_d³. A complete list of the obtained autocorrelation times τ_pt(Q²_L) is reported in Tab. II, where they are also compared with the results obtained in Ref. [13] using just local algorithms. The scaling of τ_pt(Q²_L) with 1/(a√σ) for the case N = 6 is shown in Fig. 4.
Simulations performed for N = 6 at β = 24.845 (corresponding to the points at 1/(a√σ) ≃ 3.57 in Fig. 4) using L_d/a = 1, 2, 3 show that L_d/a = 2 is the optimal choice for this value of the coupling. As can be seen from Fig. 4, autocorrelation times extracted from simulations performed at fixed L_d/a = 2 are much smaller than the corresponding ones obtained from simulations using local update algorithms, also for smaller values of the lattice spacing. However, τ_pt(Q²_L) still seems to scale exponentially with the inverse lattice spacing. This is due to the fact that, by approaching the continuum limit at fixed L_d/a, the size of the defect in physical units is reduced, and the mechanism of injection of topological charge through the defect becomes less and less efficient. If instead L_d is kept fixed in physical units while approaching the continuum limit, one generically expects a polynomial critical slowing down in 1/(a√σ). To investigate this point we performed additional simulations at β = 25.75 using L_d/a = 3, in order to have at this lattice spacing a defect of the same physical size as the one corresponding to L_d/a = 2 at β = 24.845 (in both cases L_d√σ ∼ 0.56). The outcome of this test is that, despite a ≈ 33% reduction of the lattice spacing, the effective autocorrelation time τ_pt(Q²_L) is compatible in the two cases, as reported in Tab. II and shown in Fig. 4.

[Tab. II caption: Quantities denoted with * are taken from Ref. [13], where the procedure used for the local update was the same as in the present work. The result denoted by ** is just a rough estimate, since it was impossible to obtain a reliable value even after ∼ 2.5M trajectories.]
These results do not yet permit a definite assessment of the scaling and of the optimal tuning of the parallel tempering algorithm towards the continuum limit; altogether, however, they give a strong indication that it works exceedingly well, compared to standard algorithms, in reducing topological freezing, and that the best scaling is obtained by keeping L_d in the range 0.2–0.3 fm. All this is consistent with what is observed in two-dimensional CP^{N−1} models [38,52], where the continuum limit is performed at fixed L_d/ξ, i.e. at fixed physical size of the defect.

B. Analytic continuation and continuum limit
In Tab. III we summarize the results for the topological observables χ, b₂ and b₄ at the different values of N and of the lattice spacing, obtained by fitting the θ-dependence of the cumulants as described in Sec. 2 B. An example of an imaginary-θ fit of the cumulants is shown in Fig. 5 for the case N = 6 and β = 25.75.
In all cases we found it sufficient to fit the first three cumulants, as the addition of the fourth one did not change the obtained results. Moreover, in all cases we found the O(θ_L⁶) term in E(θ) to be well compatible with zero; the results reported in Tab. III have thus been obtained by fixing b₄ = 0. Finally, we note that correlations between the different cumulants are small and do not significantly affect the result of the fit, as we explicitly checked by performing both correlated and uncorrelated fits. We used our data for N = 4 and N = 6, as well as data obtained at larger lattice spacings taken from Ref. [13], to extrapolate continuum results for χ/σ² and b₂. For the topological susceptibility the improvement with respect to previously available results is only marginal, since the dominant source of error comes from the string tension σ used to set the scale. For b₂, instead, we achieved a substantial improvement of the state of the art, both for N = 4 and N = 6. In particular, for N = 6 parallel tempering allowed us to reach much finer lattice spacings than those used in previous studies. In this way we could perform for the first time a controlled continuum extrapolation of b₂ in this case, while in Ref. [13] only a reasonable confidence interval was reported. In Tab. IV we summarize our continuum limits, while in Fig. 6 we report our continuum extrapolations.
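A continuum extrapolation linear in a²σ, as used for χ/σ² and b₂ here, amounts to a weighted linear fit whose intercept is the continuum value. A minimal sketch (our own illustrative code, not the authors' fitting machinery):

```python
import numpy as np

def continuum_limit(a2_sigma, y, y_err):
    """Weighted linear fit y = y_cont + k * (a^2 sigma); returns the
    continuum value y_cont and its error, read off from the covariance
    of the weighted least-squares solution."""
    a2_sigma = np.asarray(a2_sigma, float)
    w = 1.0 / np.asarray(y_err, float) ** 2
    A = np.column_stack([np.ones_like(a2_sigma), a2_sigma])
    cov = np.linalg.inv(A.T @ (w[:, None] * A))   # (A^T W A)^{-1}
    coef = cov @ A.T @ (w * np.asarray(y, float))
    return coef[0], np.sqrt(cov[0, 0])
```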

C. Large-N limit
In this section we revisit the large-N extrapolation on the basis of our improved results; in particular, we report our estimates of χ̄ and b̄₂ introduced in Eq. (4). Let us start from the topological susceptibility: following large-N expectations, we fitted our data for N ≥ 3 using the functional form

χ/σ² = χ̄/σ² + k/N².   (21)

Our data are in agreement with the expected large-N scaling and we find the result χ̄/σ² = 0.0199(10); the best fit is shown in Fig. 7.

[Fig. 6 caption: Continuum extrapolations of χ/σ² (above) and b₂ (below) for N = 4 and 6 (solid and dashed lines, respectively), obtained by fitting linear corrections in a²σ to the continuum limit. The reported best fits yield, respectively for N = 4 and 6, χ²/dof = 0.8/4 and 3.7/4 for χ/σ², and χ²/dof = 4.9/4 and 1.2/4 for b₂. The diamond points represent the determinations reported in Ref. [13] for N = 4 and 6.]

[Fig. 7 caption: Extrapolation of χ/σ² towards the large-N limit using the fit function χ/σ² = χ̄/σ² + k/N². The best fit yields χ̄/σ² = 0.0199(10) and k = 0.082(17).]

We now pass to the discussion of the large-N behavior of b₂. According to standard large-N arguments, we expect a behavior of the type

b₂(N) = b̄₂/N² + O(1/N⁴).   (22)

To test this prediction we performed a best fit of our data with N ≥ 3 using the power law b₂(N) = b̄₂/N^c, obtaining perfect agreement with expectations, since the fitted exponent is c = 2.17(26), which improves on the previous result c = 2.0(4) reported in Ref. [13]. The obtained best fit is shown in Fig. 8. By fixing the exponent c = 2 and fitting our data with just the leading behavior b₂ = b̄₂/N² in the ranges N ≥ 3 and N ≥ 4, we obtain the results b̄₂ = −0.1931(98) and b̄₂ = −0.192(14), respectively. Since the curve profiles obtained in these two cases are practically indistinguishable, we only show the former in Fig. 8. As our final result we quote the value b̄₂ = −0.193(10), which improves on the previous determination b̄₂ = −0.23(3) of Ref. [13].
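The two large-N fits just described (free exponent versus fixed c = 2) can be reproduced schematically as follows; this is a sketch of the fitting procedure with our own function names, not the analysis code actually used:

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_large_N(N, b2, b2_err):
    """Fit b2(N) = b2bar / N**c with a free exponent, and with the
    exponent fixed to the leading large-N prediction c = 2."""
    free = lambda n, b2bar, c: b2bar / n ** c
    fixed = lambda n, b2bar: b2bar / n ** 2
    p_free, _ = curve_fit(free, N, b2, sigma=b2_err,
                          absolute_sigma=True, p0=[-0.2, 2.0])
    p_fixed, _ = curve_fit(fixed, N, b2, sigma=b2_err,
                           absolute_sigma=True, p0=[-0.2])
    return p_free, p_fixed
```

Comparing the fitted exponent with c = 2, and the two b̄₂ determinations with each other, provides the consistency check of the leading large-N scaling discussed above.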

CONCLUSIONS
In this work we investigated the θ-dependence of SU(N) Yang-Mills theories at zero temperature using the parallel tempering algorithm proposed by Hasenbusch in Ref. [52]. This algorithm was originally tested in two-dimensional CP^{N−1} models, and a first extension of the original proposal was already performed in Ref. [38] (still for CP^{N−1} models), extending the parallel tempering approach to simulations at imaginary θ values. In the present work we implemented the same setup in SU(N) Yang-Mills theory, thus proving the feasibility of the approach also for computationally more demanding models, and improving the state-of-the-art results for the θ-dependence of these theories in the large-N limit.
The idea of the method is to simulate many independent identical systems, differing only in the boundary conditions imposed on a cubic defect L_d × L_d × L_d, which are chosen to interpolate between open (obc) and periodic (pbc) boundary conditions. Each replica evolves independently, and swaps among them are proposed from time to time in order to transfer configurations from the obc to the pbc replica. In this way a drastic reduction of the autocorrelation time of the topological charge is achieved, while avoiding the complications related to the breaking of translation invariance connected with the adoption of open boundary conditions, since measurements are performed on the pbc replica.

[Fig. 8 caption: Extrapolation of b₂ towards the large-N limit. The solid line represents the best fit obtained using the fit function b₂ = b̄₂/N^c in the whole range; the dashed line represents the best fit obtained using the same fit function but with fixed c = 2 in the whole range; the dotted line represents the best fit obtained in the whole range including a further b̄₂⁽¹⁾/N⁴ term in the fit function. The best fits yield, respectively, b̄₂ = −0.238(79), −0.1931(98) and −0.179(31). The free exponent results in c = 2.17(26), while the next-to-leading correction is b̄₂⁽¹⁾ = −0.17(35).]
By using the parallel tempering algorithm we obtained an impressive reduction of the autocorrelation times of topological observables, which for the smallest lattice spacing used amounted to at least two orders of magnitude, even taking into account the larger computational complexity. A nice feature of the algorithm is that this gain was obtained without optimally tuning all the possible parameters entering the update, which proves the robustness of the approach. The most relevant parameter to be fixed is clearly the size of the defect, and we verified that, for the cases studied in this paper, a size in the range 0.2–0.3 fm is sufficiently close to optimal to obtain a huge reduction of the critical slowing down.
The possibility of performing simulations at smaller lattice spacings than in previous studies allowed us to achieve a substantial improvement in the determination of the θ-dependence beyond the leading O(θ²) order. In particular, we improved the accuracy of the determination of the coefficient b₂ for both N = 4 and N = 6, in the latter case also performing a controlled continuum limit extrapolation, which was not possible with previously available results. These data confirm that b₂ scales with the number of colors in a way that is consistent with the leading behavior predicted by large-N arguments: data for N ≥ 3 are perfectly compatible with a scaling of the form b₂ = b̄₂/N², with b̄₂ = −0.193(10), while a best fit according to b₂ = b̄₂/N^c returns the value c = 2.17(26) for the free exponent. This shows that the scaling of our data is consistent with the leading expected behavior, and it is thus reasonable to neglect sub-leading corrections. We note explicitly, however, that the accuracy is still not high enough to exclude possible unconventional scenarios like the one put forward in Ref. [74], or to better investigate whether critical corrections emerge at small N [75].
A further refinement of present results, including also new values of N , would thus be welcome in the future: the algorithm proposed in Ref. [52], and extended to SU (N ) gauge theories in this study, will permit such systematic refinement.