Topology in full QCD at high temperature: a multicanonical approach

We investigate the topological properties of Nf = 2 + 1 QCD with physical quark masses, at temperatures around 500 MeV. With the aim of obtaining a reliable sampling of topological modes in a regime where fluctuations of the topological charge Q are very rare, we adopt a multicanonical approach, adding a bias potential to the action which enhances the probability of the suppressed topological sectors. This method yields a gain of up to three orders of magnitude in computational power in the explored temperature regime. Results at different lattice spacings and physical spatial volumes reveal no significant finite size effects but, instead, large finite cut-off effects, with the topological susceptibility decreasing by 3-4 orders of magnitude in moving from a ≃ 0.06 fm towards the continuum limit. The continuum extrapolation is in agreement with previous lattice determinations which quote smaller uncertainties but rely on ansatzes justified by several theoretical assumptions. The parameter b_2, related to the fourth order coefficient in the Taylor expansion of the free energy density f(θ), has instead a smooth continuum extrapolation which is in agreement with the dilute instanton gas approximation (DIGA); moreover, a direct measurement of the relative weights of the different topological sectors gives even stronger support to the validity of DIGA.


Introduction
The topological properties of QCD in the high temperature regime represent an essential input to axion cosmology. The QCD axion, originally introduced to explain the observed suppression of the topological CP-breaking θ-parameter in QCD [1][2][3][4], has also been recognized as a possible candidate for dark matter. Through its coupling to the topological charge density, the axion gets its effective (temperature dependent) mass m_a(T) from topological charge fluctuations, in particular m_a²(T) = χ(T)/f_a², where χ(T) is the topological susceptibility and f_a is the effective axion coupling. Assuming axions are the main source of dark matter, in particular through the so-called misalignment mechanism [5][6][7], a precise knowledge of χ(T) for temperatures at or above the GeV scale gives access to the coupling constant f_a and, in turn, to the value of the axion mass today, a quantity relevant to present and future experiments trying to detect it.
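The relation m_a²(T) = χ(T)/f_a² above is simple enough to make concrete in a few lines. The sketch below converts a value of χ^{1/4} into an axion mass; the input numbers are purely illustrative placeholders, not results of this work.

```python
# Sketch of the relation m_a^2(T) = chi(T) / f_a^2 quoted above.
# The input values are illustrative placeholders, not results of this work.

def axion_mass_eV(chi_fourth_root_MeV, f_a_GeV):
    """Axion mass in eV, from chi^{1/4} in MeV and f_a in GeV."""
    chi_MeV4 = chi_fourth_root_MeV ** 4      # chi in MeV^4
    f_a_MeV = f_a_GeV * 1e3                  # f_a in MeV
    m_a_MeV = chi_MeV4 ** 0.5 / f_a_MeV      # m_a = sqrt(chi) / f_a, in MeV
    return m_a_MeV * 1e6                     # MeV -> eV

# e.g. chi^{1/4} = 75 MeV (a ballpark zero-temperature figure) and
# f_a = 1e12 GeV give an axion mass of a few micro-eV:
print(axion_mass_eV(75.0, 1e12))
```

A smaller χ(T), as found at high temperature, correspondingly lowers the mass at fixed f_a, which is why the suppression rate of χ(T) feeds directly into the dark-matter bound discussed below.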

JHEP11(2018)170
i) The topological susceptibility is usually determined starting from the probability distribution P(Q) of the topological charge Q, in particular from its variance: χ(T) = ⟨Q²⟩/V_4, where V_4 is the space-time volume. The suppression of χ(T) at large T implies that, on volumes reasonably achievable in lattice simulations, one may have ⟨Q²⟩ = V_4 χ(T) ≪ 1. Since Q is integer valued, this means that configurations with Q ≠ 0 become very rare, leading to the need of unaffordably large statistics in order to achieve a correct sampling of the topological charge distribution.

ii) Topology in QCD with physical quark masses is strongly correlated with the chiral properties of the theory; indeed, θ-dependence would be strictly zero in the presence of massless quarks. This means that the explicit chiral symmetry breaking present in most fermion discretizations can lead to significant lattice artifacts, thus requiring very small lattice spacings to achieve a reliable continuum extrapolation.

iii) On the other hand, when trying to get closer to the continuum limit, a different challenge emerges: because of the topological nature of the problem, standard updating algorithms fail to correctly sample the distribution of Q and get trapped in path integral sectors with fixed topology. This freezing of the topological charge leads to a severe critical slowing down of numerical simulations [44][45][46][47][48].

iv) Finally, since finite temperature numerical simulations are usually performed on lattices L_s³ × N_t with a fixed aspect ratio L_s/N_t, as T = 1/(N_t a) becomes large the physical spatial volume a³L_s³ becomes small, so that the possible presence of finite volume effects should be checked.
These are four different, partially independent and all potentially lethal problems, which justify the discrepancies found in results from recent lattice studies. In particular, in refs. [38,39] the coefficient D_2 of the power law suppression in eq. (1.1) was found to be around 3, i.e. much smaller than the DIGA prediction, in a range of temperatures going up to around 600 MeV and exploring lattice spacings down to around 0.06 fm. Within the misalignment mechanism for cosmological axion production, a milder power law suppression implies a larger value of the coupling constant f_a in order to comply with the dark matter upper bound, hence a smaller predicted value for the axion mass today. However, later results found instead substantial agreement with the DIGA exponent, even if larger by about one order of magnitude in absolute value, in a range of temperatures starting very close to the pseudo-critical temperature T_c and going up to a few GeV [40,42,43].
In particular, in ref. [42], various strategies were devised to circumvent the problems exposed above. Problems i) and iii) were bypassed at the same time, in the high-T regime, by giving up sampling the complete distribution P(Q) of the topological charge from the path integral, and determining instead the ratio of probabilities between fixed topological sectors, in particular P(±1)/P(0), based on an integral method which exploits numerical simulations in which the value of Q is kept fixed on purpose (see also ref. [41]). This strategy is justified provided the main DIGA assumption is at work, i.e. that the distribution of instantons and anti-instantons is that of non-interacting topological objects. Indeed, in this case, the distribution is Poissonian, so that one can learn everything about the distribution expected in the infinite volume limit even by exploring small volumes where the topological sectors |Q| = 0, 1 are the only relevant ones; in the absence of this strong assumption, one could be misled in deducing the infinite volume limit of ⟨Q²⟩/V_4 from a single measurement of Z_1/Z_0. On the other hand, this requires knowing at least some properties of P(Q), like for instance the so-called b_2 coefficient, which is the fourth order coefficient in the Taylor expansion of the free energy density as a function of θ [49], and approaches −1/12 when the DIGA sets in [8,32]. However, as we will discuss later on, b_2 is still not the end of the story because, even when DIGA fails, it will turn out to be very close to −1/12 if the volume is so small that just the |Q| = 0, 1 sectors dominate and moreover Z_1/Z_0 ≪ 1. What one should really do is to check that the relative probability ratios of the different topological sectors (with |Q| > 1) follow the volume scaling predicted by DIGA.
This requires, on the typical volumes accessible with available computational power, the estimate of events (multiple instanton occurrences) whose probability is orders of magnitude smaller than that of the already rare single instanton events.
In addition, in ref. [42], trying to improve the convergence to the continuum limit, a procedure was adopted which tries to correct for the absence of exact zero modes in the lattice Dirac operator by suppressing gauge configurations with Q ≠ 0, using an ad hoc reweighting factor based on the actual lowest eigenvalues of the adopted lattice Dirac operator.
The main purpose of this paper is to make progress towards an independent determination of χ(T) in the continuum limit, trying to solve at least problem i) from first principles and without any extra assumption. To that purpose, we exploit a reweighting technique which combines ideas typical of multicanonical simulations [50] and of metadynamics [51][52][53], and has already proved to be extremely efficient in the toy model of the 1D quantum rotor [48], where it permits a correct sampling of the distribution P(Q) for ⟨Q²⟩ going down by several orders of magnitude, and more recently in pure gauge theories [54].
The main idea is to add a weight exp(−V(Q)) to the path integral distribution, where the potential V(Q) is chosen so as to enhance the probability of topological sectors which would otherwise be strongly suppressed. The bias is then corrected for when computing averages over the Monte-Carlo sample:

⟨O⟩ = ⟨O e^{V(Q)}⟩_V / ⟨e^{V(Q)}⟩_V ,    (1.3)

where O is a generic observable and ⟨·⟩_V stands for the average taken according to the modified distribution. Averages are left unchanged; however, fluctuations are modified, leading to an improved signal-to-noise ratio which, as we will show, permits a gain of orders of magnitude in terms of computational effort. As already mentioned above, the idea is very similar in spirit to metadynamics, where however the potential V(Q) is modified dynamically during the simulation, with the main purpose of enhancing tunneling between topological sectors, thus defeating also the critical slowing down of topological modes. Actually, our algorithmic framework was originally developed with a metadynamical approach in mind; however, since freezing has not turned out to be a severe problem in the explored range of lattice spacings and temperatures, we have opted for a static version of the potential V(Q), determined a priori. This framework is simpler because the problem reduces to a standard equilibrium simulation and, on the other hand, the choice of the potential turns out not to be critical, apart from possible unwise choices which are however easily avoided.
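As a concrete illustration of this reweighting idea, the following sketch (a toy model, not the QCD simulation itself) draws integer "topological sectors" from a distribution biased by a hypothetical quadratic potential V(Q) = −a_q Q², then recovers the unbiased ⟨Q²⟩ by weighting each sample with exp(V(Q)):

```python
import numpy as np

# Toy illustration of multicanonical reweighting: samples are drawn with an
# extra weight exp(-V(Q)), and unbiased averages are recovered as
# <O exp(V)>_V / <exp(V)>_V. All distributions here are synthetic.

rng = np.random.default_rng(1)

# Unbiased toy distribution over integer sectors Q in [-5, 5],
# with <Q^2> << 1 so that Q != 0 is a rare event:
Qs = np.arange(-5, 6)
P = np.exp(-Qs**2 / 0.1)
P /= P.sum()
exact_Q2 = np.sum(Qs**2 * P)

# Hypothetical bias potential V(Q) = -a_q Q^2: the extra weight exp(-V(Q))
# enhances the suppressed sectors by exp(a_q Q^2)
a_q = 4.6
P_biased = P * np.exp(a_q * Qs**2)
P_biased /= P_biased.sum()

# Draw from the biased distribution, then undo the bias with exp(V(Q))
samples = rng.choice(Qs, size=200_000, p=P_biased)
weights = np.exp(-a_q * samples.astype(float)**2)
est_Q2 = np.sum(weights * samples**2) / np.sum(weights)

print(exact_Q2, est_Q2)   # the reweighted estimate reproduces the rare <Q^2>
```

In the unbiased ensemble the Q ≠ 0 events would be far too rare to measure with this sample size; the bias makes them frequent, and the exp(V) weights restore the correct average.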
The method is applied to numerical simulations of N_f = 2 + 1 QCD with physical quark masses and for two different temperatures, T ≃ 430 and T ≃ 570 MeV, exploring lattice spacings down to a ∼ 0.03 fm. The improved signal-to-noise ratio permits us to perform reliable infinite volume and continuum limit extrapolations, which turn out to be in agreement with those of ref. [42]. Lattice artifacts turn out to be significant, so that, despite the improved precision reached in simulations performed at finite lattice spacing and the absence of any ad hoc assumption which could bias the result, the final error for the continuum extrapolation is still rather large. Therefore, efforts to suppress lattice artifacts or to reach finer lattice spacings by defeating the critical slowing down are still needed.
Another important achievement reached by our strategy is the possibility to obtain a direct measurement of the relative weights of the different topological sectors, even if they are extremely small quantities on the accessible lattice volumes, thus giving a solid support to the validity of DIGA in the explored temperature regime.
Finally, we would like to stress that, due to computer limitations, we had to use staggered fermions, which require the rooting of the fermion determinant. In this paper our aim was to verify whether the reweighting procedure used to enhance the lowest eigenvalues, and the restriction of the Monte-Carlo to the topological sectors with Q = 0, 1, might bias the final result. In the future, further progress in our understanding of the complex dynamics of the topological charge can be obtained by adopting different lattice actions that do not require any rooting of the fermion determinant.
The paper is organized as follows. In section 2 we describe our numerical setup, regarding both the lattice discretization and the introduction and choice of the bias potential. In section 3 we illustrate the improvement achieved by our approach and present results for the topological susceptibility and the b 2 coefficient. Finally, in section 4, we discuss perspectives for further improvement.

Numerical setup
As in ref. [39], we adopted a rooted stout staggered discretization of N_f = 2 + 1 QCD, with a tree level improved Symanzik action [55,56] for the pure gauge sector. The standard, finite temperature partition function reads

Z = ∫ [dU] e^{−S_YM} ∏_{f=u,d,s} det( M_st^f )^{1/4} ,    (2.1)

where the tree level improved gauge action is

S_YM = −(β/3) Σ_{i; μ≠ν} [ (5/6) W^{1×1}_{i; μν} − (1/12) W^{1×2}_{i; μν} ] ,

and W^{n×m}_{i; μν} denotes the trace of the n × m Wilson loop in the μ, ν plane starting at site i, constructed in terms of the original gauge links of the theory, which are the integration variables in eq. (2.1). The staggered fermion matrix M_st^f, instead, is constructed in terms of the modified link variables U^{(2)}_{i,μ}, which are obtained after two levels of the stout-smearing procedure introduced in ref. [57], with isotropic smearing parameter ρ = 0.15.
Stout smearing is a convenient smoothing procedure which shares with other smoothing techniques the benefit of suppressing lattice artifacts, in particular by reducing the taste violations present in the staggered discretization. However, at variance with other smoothing techniques, stout smeared links are analytic functions of the original gauge link variables [57], so that a standard Rational Hybrid Monte-Carlo (RHMC) algorithm [58][59][60] can be easily applied.
The bare parameters, β, m_s and m_l ≡ m_u = m_d, have been chosen so as to move on a line of constant physics (LCP), with a physical value of the pseudo-Goldstone pion mass, m_π ≈ 135 MeV, and of the strange-to-light mass ratio, m_s/m_l ≃ 28.15. The LCP parameters have been fixed according to the determinations reported in refs. [61][62][63] or to a cubic spline interpolation of them. In table 1 we provide a summary of our simulation points, including the lattice sizes L_s³ × N_t and the temperatures T = 1/(aN_t).

The multicanonical algorithm
As already outlined above, in order to efficiently sample the topological charge distribution P (Q) in a regime where fluctuations away from Q = 0 become very rare, we will modify

the Monte-Carlo weight of gauge configurations, adding to the distribution appearing in eq. (2.1) a Q-dependent bias, i.e. sampling according to the modified weight

e^{−S_eff} → e^{−S_eff − V(Q_mc)} ,    (2.4)

where S_eff denotes the full effective action appearing in eq. (2.1), V(Q_mc) is the bias potential and Q_mc is a proper discretization of the topological charge, which will in general be different from the observable Q measured to determine the cumulants of the topological charge distribution.
Monte-Carlo averages obtained in this way, denoted by ⟨·⟩_V, can then be combined to estimate standard averages based on eq. (2.1), using the reweighting formula

⟨O⟩ = ⟨O e^{V(Q_mc)}⟩_V / ⟨e^{V(Q_mc)}⟩_V .    (2.5)

In principle, the choice of V is not critical, since after reweighting one recovers the original averages anyway. However, on the one hand we would like to construct V so as to enhance topological sectors which would otherwise be strongly suppressed, in order to improve the statistical accuracy in the determination of ⟨Q²⟩. On the other hand, we would like to avoid a typical problem of reweighting, i.e. a possible bad overlap between the distribution in eq. (2.1) and that in eq. (2.4), which could result in a failure to sample important configurations. Both issues will be discussed in more detail in the next subsection, where we show how V has been chosen in practice.

A different, important issue regards the discretized topological charge Q_mc[U] entering the potential. Several different lattice implementations, gluonic or fermionic, are available in principle. However, the need for an easy implementation of the potential into the RHMC Molecular Dynamics equations constrains the choice. Indeed, the introduction of V(Q_mc) induces a new force term F related to the value of Q_mc. By using the chain rule one obtains

F = − ∂V(Q_mc)/∂U = − (∂V/∂Q_mc) (∂Q_mc/∂U) .

Therefore, the calculation of the new force term related to a given gauge link proceeds through the calculation of the scalar coefficient ∂V/∂Q_mc and of the derivative of Q_mc with respect to the given gauge link.
In order to keep the calculation of the derivative simple, an easy choice would be to consider the field-theoretical clover-based definition of the topological charge [64,65]:

Q_ft = −(1/2⁹π²) Σ_{i; μνρσ = ±1,…,±4} ε̃_{μνρσ} Tr[ Π_{i; μν} Π_{i; ρσ} ] ,

where Π_{i; μν} is the plaquette operator, and ε̃_{μνρσ} coincides with the standard Levi-Civita tensor for positive entries and is fixed for the other entries by complete antisymmetry, ε̃_{μνρσ} = −ε̃_{(−μ)νρσ}. With such a definition, the calculation of the force term driving the dynamics amounts to a simple modification of the usual Wilson gauge action force, in which the staples are decorated by clover insertions, just as in pure gauge simulations implementing an imaginary θ term [21].

However, despite the easy implementation, such a definition is not convenient for the purpose of this study, at least in the naive form above. Indeed, it must be remembered that Q_ft, like any field-theoretic definition, is related to the actual topological background Q, configuration by configuration, through the relation

Q_ft = Z Q + η ,

where Z is a multiplicative renormalization constant [66] and η is a noise term which is practically independent of Q and has zero expectation value. It is well known that the renormalization constant Z is typically quite small, while the variance of η is quite large, with ⟨η²⟩ ≫ Z²⟨Q²⟩, so that Q_ft has a small correlation with the actual topological background Q, i.e.

corr(Q, Q_ft) = ⟨Q Q_ft⟩ / ( ⟨Q²⟩⟨Q_ft²⟩ )^{1/2} ≪ 1 .    (2.9)
While this is not harmful to the reweighting procedure, which is exact anyway as long as one uses the same charge in the RHMC and in eq. (1.3), it makes it useless, since the bias potential is not able to increase the variance of the sampled distribution of the topological charge Q, because of the loose correlation with it. In order to circumvent this problem, an improved definition of Q_mc is needed. This can be easily achieved by standard "smoothing" techniques, i.e. computing Q_ft on smoothed gauge fields in place of the original ones. This has the effect of moving Z closer to 1 and of reducing the fluctuations of the noise η, thus leading to a substantial improvement of the correlation with Q. Many methods have been proposed to smooth gauge configurations, including the gradient flow [67,68], cooling [69][70][71][72][73] and several kinds of smearing techniques, all equivalent to each other [74][75][76][77][78][79]. Given the need to integrate the equations of motion induced by V(Q_mc), the most natural choice is again to make use of the same stout smearing procedure adopted to define the fermion matrix, which makes Q_mc a differentiable function of the original gauge links.
In particular, assuming Q_ft to be computed on n_st times stout-smeared links U^{(n_st)}_μ(i), the computation of the "topological force" related to the bias potential proceeds through the chain rule

∂V/∂U^{(0)} = (∂V/∂Q_mc) (∂Q_ft/∂U^{(n_st)}) (∂U^{(n_st)}/∂U^{(n_st−1)}) ⋯ (∂U^{(1)}/∂U^{(0)}) .    (2.10)

The actual implementation of this chain relation amounts to the same procedure used to compute the derivative of the pseudo-fermion contribution to the force, described in ref. [57] and already adopted in our fermion discretization. For small values of the stout smearing parameter ρ_st, n_st stout-smearing steps are equivalent to a gradient flow time τ = n_st ρ_st [78], hence to a number of cooling steps n_cool = 3τ = 3 n_st ρ_st [74].
One would like n_st ρ_st to be large enough to enhance Z and suppress η, so as to have a reasonable correlation between Q_mc and Q. However, using a large number of stout smearing steps implies a large numerical overhead in the implementation of the molecular dynamics equations while, on the other hand, stout smearing does not act as a smoothing if ρ_st is too large [57]. As a matter of fact, we found it a good compromise to fix ρ_st = 0.1 and to choose n_st in the range from 10 (for the finest lattice spacing) to 20 (for the coarsest), which corresponds to a number of equivalent cooling steps going from 3 to 6. For this choice, the total numerical overhead due to the introduction of the bias potential goes from a minimum of 30% to a maximum of 60%, measured with respect to the computational time needed in the absence of the bias potential. Notice that, despite the moderate overhead, the implementation of the bias potential in full QCD, through the RHMC equations, does not require a change of simulation paradigm as is needed in the pure gauge case [54].
Finally, the determination of the topological charge Q used for measurements was based on a standard cooling procedure, adopting in particular n_cool = 80, followed by rounding to the closest integer value, following the procedure originally proposed in ref. [45] (see also ref. [28] for more details). Such a definition is based on the expectation that the physical topological content of gauge configurations becomes stable under action minimization as the continuum limit is approached, while the ultraviolet (UV) fluctuations responsible for renormalizations are removed, and it has been proved to yield results equivalent to the gradient flow [74,77]. Results obtained in this work have been checked to be stable, within statistical errors, in a range of cooling steps going from 40 to 120; however, in order to correctly include possible systematics of the method, the observed small variations in this range have been added as a systematic error in all cases.
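The rounding step mentioned above can be sketched as follows. This is a minimal illustration of the procedure of ref. [45]: pick a rescaling factor α that minimizes the mean squared distance of αQ_ft from the nearest integers, then round. It is run on synthetic data (an assumed Z = 0.95 and small Gaussian noise), not on actual cooled configurations:

```python
import numpy as np

# Sketch of the integer-rounding of the measured charge: after cooling,
# Q_ft clusters near integers rescaled by Z slightly below 1; one picks
# alpha minimizing <(alpha*Q_ft - round(alpha*Q_ft))^2> and then assigns
# Q = round(alpha*Q_ft). Inputs here are synthetic.

def round_charge(q_ft, alphas=np.linspace(0.8, 1.2, 401)):
    q_ft = np.asarray(q_ft)
    spreads = [np.mean((a * q_ft - np.rint(a * q_ft)) ** 2) for a in alphas]
    alpha = alphas[int(np.argmin(spreads))]
    return np.rint(alpha * q_ft).astype(int), alpha

# Synthetic example: integer background charges measured with an assumed
# Z = 0.95 and small Gaussian noise
rng = np.random.default_rng(2)
Q_true = rng.integers(-2, 3, size=1000)
q_ft = 0.95 * Q_true + rng.normal(0, 0.05, size=1000)
Q, alpha = round_charge(q_ft)
print(np.mean(Q == Q_true), alpha)   # alpha lands near 1/Z
```

The optimal α compensates the multiplicative renormalization, so the assigned integers coincide with the underlying background charges as long as the residual noise stays well below 1/2.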

Practical implementation and choice of the potential
In this section, as an illustrative example, we discuss the way in which we fixed the form of the biasing potential for the case of the 32³ × 8 lattice at β = 4.140 (see table 1), corresponding to T ≃ 430 MeV.
In figure 1 we report the Monte-Carlo history of the topological charge, from which it is already clear that only rare fluctuations to the Q = ±1 topological sectors take place; this is also clear from figure 2, where we report the probability distribution of the topological charge measured after 20 steps of stout smearing. Indeed, we obtain ⟨Q²⟩ ≃ 0.01, meaning that the sectors with non-zero topological background are suppressed by about 2 orders of magnitude with respect to the Q = 0 sector. In particular, taking also autocorrelations into account, we obtain a⁴χ = ⟨Q²⟩/V_4 = (4.1 ± 1.6) × 10⁻⁸, which is compatible with the value obtained at the same temperature and lattice spacing in ref. [39].
In principle, the optimal choice for V(Q_mc) would be the one which makes the biased topological charge distribution flat, i.e. V(Q_mc) should be taken equal to minus the logarithm of the probability distribution of Q_mc determined in the absence of the potential: that would enhance the probability of configurations at the border between different topological sectors, thus defeating critical slowing down at the same time. However, this would require a precise a priori knowledge of P(Q_mc): were that available, the problem would have already been solved. On the other hand, we are not looking for the optimal choice but just for a substantial improvement, so we will explore some suitable smooth potentials.

A first possibility is to choose a quadratic potential,

V(Q_mc) = −a_q Q_mc² ,    (2.11)

since, at least for large enough volumes, the probability distribution of Q_mc is expected to become Gaussian-like. Since we want to enhance the Q = ±1 sectors by a factor O(100), as seems necessary from figure 2, one should choose a_q ∼ log(100) ≃ 4.6; however, even smaller magnitudes of a_q may lead to dangerous instabilities. Indeed, as an example, consider a potential like that in eq. (2.11), with a_q = 3.25, which is shown in figure 3, where the bias towards larger topological charges is stopped at a threshold charge Q_max = 3, fixing V(Q_mc) = V(Q_max) for |Q_mc| ≥ Q_max. The Monte-Carlo history of the topological charge obtained with this bias potential is shown in figure 4: Q_mc gets trapped around Q_max and as a result Q also assumes only positive values. That means that the bias towards large values of |Q_mc| is too strong, so that the biased system develops a sort of spontaneous breaking of CP symmetry: that clearly affects the validity of the reweighting procedure, since topological sectors which are relevant to the original path integral are now badly sampled.
It is of course possible to choose lower values of a_q for which this problem does not appear; however, in order to maintain a good enhancement of the Q ≠ 0 sectors, we have explored a different class of potentials with a less steep behavior at large Q, reported in eq. (2.12). In figure 3 we show one example of such a potential, which we have found to be a good compromise after a few trial runs, for which B = 6 and C = 2: also in this case the bias is cut at Q_max = 3. The corresponding Monte-Carlo history is shown in figure 5: the topological charge now fluctuates evenly around Q = 0, and sectors which are important in the original partition function are well sampled; however, contributions from Q ≠ 0 are also explored frequently, so that it is possible to obtain a more accurate determination of their contribution to ⟨Q²⟩ after using eq. (2.5). This is also visible from the probability distribution obtained for Q_mc during the biased run, shown in figure 6. It is interesting to notice, looking at figure 5, that the correlation between the two charges, Q and Q_mc, is visibly good; from a quantitative point of view one obtains, using the definition in eq. (2.9), a correlation of around 0.86, meaning that the bias on Q_mc is also a good bias for Q. For comparison, the correlation between Q and Q_ft before any stout smearing step is quite low, around 0.08, while topological charges obtained after prolonged cooling show of course a larger correlation: for instance, that between 60 and 80 cooling steps is 0.97.
The final estimate obtained for the susceptibility from this run is a⁴χ = ⟨Q²⟩/V_4 = (6.1 ± 1.1) × 10⁻⁸, where the error has been estimated by a binned jackknife analysis using eq. (2.5). Notice that, after reweighting, the relative contribution to ⟨Q²⟩ from the Q = 3 sector turns out to be well below 0.01, so that the cut at Q_max, which does not enhance the contribution from sectors with larger values of Q, is totally irrelevant.
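A binned jackknife for a reweighted ratio of the kind in eq. (2.5) can be sketched as follows; the time series used here are synthetic stand-ins for the Monte-Carlo data, and the bin count is an arbitrary choice:

```python
import numpy as np

# Sketch of a binned jackknife error for a reweighted ratio estimator such as
# <Q^2> = <Q^2 e^V>_V / <e^V>_V. Binning tames autocorrelations; the inputs
# below are synthetic stand-ins for the actual Monte-Carlo time series.

def jackknife_ratio(num, den, n_bins):
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    m = len(num) // n_bins                      # samples per bin
    num = num[: m * n_bins].reshape(n_bins, m).mean(axis=1)
    den = den[: m * n_bins].reshape(n_bins, m).mean(axis=1)
    full = num.mean() / den.mean()
    # leave-one-bin-out estimates of the ratio
    jk = np.array([(num.sum() - num[i]) / (den.sum() - den[i])
                   for i in range(n_bins)])
    err = np.sqrt((n_bins - 1) * np.mean((jk - full) ** 2))
    return full, err

rng = np.random.default_rng(3)
den = np.exp(rng.normal(0, 0.1, 10_000))            # stand-in for e^{V(Q)}
num = den * (0.01 + rng.normal(0, 0.05, 10_000))    # stand-in for Q^2 e^{V(Q)}
val, err = jackknife_ratio(num, den, 50)
print(val, err)
```

Because numerator and denominator share the same configurations, the jackknife automatically propagates their correlation into the quoted error, which a naive error propagation would miss.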
In the following we will show results from several runs performed for different lattice volumes and at different values of the lattice spacing. In all cases we have adopted a bias potential of the form shown in eq. (2.12), fixing the coefficients by reasonable guesses followed by some preliminary short runs. For instance, when changing the lattice volume at fixed temperature and lattice spacing, ⟨Q²⟩ is expected to scale proportionally to V_4, so that a good starting guess is to rescale B² proportionally to 1/V_4.
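The capping prescription V(Q_mc) = V(Q_max) for |Q_mc| ≥ Q_max can be written generically. Since the explicit functional form of eq. (2.12) is not reproduced here, the quadratic profile of eq. (2.11) is used as the example; the wrapper itself is independent of the chosen profile:

```python
import numpy as np

# Generic construction of a capped bias potential: a chosen profile v(Q) is
# frozen at |Q| >= Q_max, so sectors beyond the cap are not enhanced further.
# The quadratic profile below stands in for the actual eq. (2.12) form.

def capped_potential(v, q_max):
    def V(q):
        qc = np.clip(np.abs(q), 0.0, q_max)   # freeze the profile at the cap
        return v(qc)
    return V

a_q = 3.25
V_quad = capped_potential(lambda q: -a_q * q**2, q_max=3)
print(V_quad(2.0), V_quad(5.0))   # beyond the cap the potential stays flat
```

Keeping the potential flat beyond Q_max means configurations with |Q| > Q_max receive no extra enhancement, consistent with the observation above that their reweighted contribution is negligible anyway.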

Numerical results
In this section, after discussing the effectiveness of our numerical approach, as compared to standard Monte-Carlo simulations, we will illustrate results obtained for the topological susceptibility and for the b 2 coefficient, and discuss the relevance of finite size and finite discretization effects.

Efficiency of the method and gain over the standard approach
In the exploratory simulation discussed in some detail in the previous section, the relative error decreased from 39%, with the standard approach, to about 18% when using the bias potential in eq. (2.12), maintaining more or less the same number of RHMC trajectories. That is as if we had gained roughly a factor 4 in statistics which, when considering the 60% overhead required to integrate the equations of motion for V(Q_mc) with 20 stout smearing steps, reduces to about a factor 2.5 in terms of computational effort. Of course, one should also consider the additional moderate effort spent in the preliminary short runs required to find a reasonable bias potential. However, the gain is expected to grow rapidly as the value of ⟨Q²⟩ one has to determine decreases further: that could happen either by increasing the temperature or, according to the indications for a strong cut-off dependence of χ [40,42], by decreasing the lattice spacing.
Therefore, let us consider a run performed on a 48³ × 16 lattice at β = 4.592, corresponding to a ≃ 0.0286 fm and the same temperature T ≃ 430 MeV considered in the previous example. In this case we do not have a simulation adopting the standard approach to compare with, since no topological fluctuation at all would be expected over a standard run of affordable length. From the statistical accuracy reached in the biased run, which is around 30%, we can get a rough estimate of the computational effort that would have been required to reach the same accuracy with the standard approach. One should observe roughly O(10) fluctuations away from Q = 0: given the value of ⟨Q²⟩, that would require around 5 × 10⁴ independent draws of Q when using the standard run. We do not know what the autocorrelation time of Q in the standard run would have been, however we can assume it is at least of the same order of magnitude as that observed in figure 7, which is O(10³). Therefore, one estimates the number of RHMC trajectories that would have been necessary using the standard approach to be roughly between 10⁷ and 10⁸, implying a gain in computational effort of O(10³).

Results for χ and b_2
We now illustrate the numerical results obtained for the topological susceptibility and for the b_2 coefficient at two different temperatures and several lattice spacings, as summarized in table 1, with the aim of obtaining a continuum extrapolation. Given the small values of χ, one would also like to exclude that the limited spatial volume available induces significant distortions in the distribution of the topological charge: therefore, at least for the lower temperature, we have also explored, for every lattice spacing, different spatial volumes, in order to exclude the possible presence of finite size effects. In figure 8 we report the whole collection of results obtained for T ≃ 430 MeV, where results for a⁴χ are reported as a function of the inverse spatial volume for the different lattice spacings (see table 1), which appear in increasing order starting from the bottom. Finite size effects appear not to be significant, and the horizontal bands represent our infinite volume estimates for a⁴χ. It is interesting to compare the results obtained for a⁴χ with those that one would obtain by sampling just the topological sectors with |Q| = 0, 1: in this case the estimate of a⁴χ would be given by (2Z_1/Z_0)/V_4, where V_4 is the four-volume and Z_Q is the partition function restricted to the topological sector Q. In general, the ratio Z_{Q_1}/Z_{Q_2} can be simply obtained from our simulations by taking the ratio between the reweighted averages of the occurrences of Q_1 and Q_2. In figure 9 we report (2Z_1/Z_0)/V_4 computed for the same lattice spacings and volumes reported in figure 8: it is clear that, at least for the explored physical volumes, the two lowest topological sectors |Q| = 0, 1 contain practically the whole information relevant to the computation of a⁴χ, whose infinite volume extrapolation, already reported in figure 8, is reported again for a better comparison (horizontal bands).
However, it is important to stress that our simulations contain much more information than just Z_1/Z_0; indeed, we are able to estimate the ratios Z_{Q_1}/Z_{Q_2} with good accuracy also for higher values of Q_1 and Q_2. In figure 10 we report, as an example, the ratio Z_2/Z_0, normalized by V_4² and measured for the different lattice spacings and volumes. The reason for the normalization is to elucidate an important piece of information contained in these data: Z_2/Z_0 scales proportionally to V_4², as expected for the occurrence of two independent and non-interacting topological objects in the same volume, i.e. for the Poissonian distribution of the topological charge predicted by DIGA. A similar behaviour, i.e. a scaling with the appropriate power of the volume predicted by DIGA, is observed also for Z_3/Z_0.
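The volume scaling just described can be checked numerically within the DIGA picture itself: if instantons and anti-instantons are independent Poisson events with mean χV_4/2 each, then Z_1/Z_0 grows like V_4 and Z_2/Z_0 like V_4². The sketch below verifies this by direct sampling, with purely illustrative values of χ and of the volumes (arbitrary units):

```python
import numpy as np

# Numerical check of the DIGA volume scaling: for independent Poisson
# instanton and anti-instanton numbers with mean chi*V/2 each, the sector
# ratios obey Z_1/Z_0 ~ V and Z_2/Z_0 ~ V^2. Values of chi and V are
# illustrative, in arbitrary units.

rng = np.random.default_rng(4)
chi = 1e-3
n_draws = 2_000_000

results = {}
for V in (100.0, 200.0, 400.0):
    lam = chi * V / 2                       # mean number of (anti)instantons
    Q = rng.poisson(lam, n_draws) - rng.poisson(lam, n_draws)
    n0, n1, n2 = (np.sum(Q == q) for q in (0, 1, 2))
    results[V] = (n1 / n0 / V, n2 / n0 / V**2)
    print(V, results[V])   # both normalized ratios are roughly V-independent
```

At leading order in χV the two normalized ratios approach χ/2 and χ²/8 respectively, so their approximate constancy across volumes is the sampling analogue of the scaling observed in figure 10.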
The same estimates, expressed in physical units for χ^{1/4}, are reported in figure 11 as a function of a²; in the same figure, continuum extrapolations at the same temperature from ref. [39] and ref. [42] are also reported. As one can easily appreciate, finite cut-off effects are huge, with χ decreasing by a factor 40 when moving from a ≃ 0.0576 fm to a ≃ 0.0286 fm: that explains the discrepancy observed among previous literature results. In almost all cases the values of the fit χ² are somewhat large; however (because of the small number of degrees of freedom) these numbers are not incompatible with the hypothesis that χ^{1/4}(a) is linear in a² for the three smallest values of the lattice spacing. On the other hand, the large χ² value obtained for T = 430 MeV when using all data points, and the only marginal agreement between the results of the two fits performed at T = 530 MeV, are indications that the continuum limit systematics are still not completely under control. A fair account of our final results, based on the fits with lowest χ² but including such systematics, would be (3 ± 3 ± 2) MeV for T = 430 MeV and (7 ± 5 ± 6) MeV for T = 530 MeV. Moreover, the almost 100% relative errors of our continuum extrapolations prevent a reliable estimate of the power law coefficient in eq. (1.1). Let us however stress once more that our main purpose in this paper was not to provide a precise determination of the temperature dependence of χ(T), but to show that the obstacle represented by the suppression of the Q ≠ 0 sectors at high temperature can be overcome in a model independent way. This is obviously not the end of the story since we still face other obstacles and, as we discuss in the next section, smaller lattice spacings or a way to reduce lattice artifacts are needed for the future.
Figure 11. Fourth root of the topological susceptibility, as a function of a^2, for T ≃ 430 MeV. The two bands represent the results of two different continuum extrapolations, taking into account respectively O(a^2) or O(a^4) corrections. The two horizontal lines are the continuum extrapolations reported respectively in refs. [39] and [37] at the same temperature.
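The O(a^2) continuum extrapolation quoted above amounts to a weighted linear fit of χ^{1/4} against a^2, with the intercept giving the continuum value. A minimal sketch of such a fit follows; the data points are synthetic placeholders generated from a known line (only the two extreme lattice spacings match values quoted in the text), not the measured values:

```python
import numpy as np

# hypothetical lattice spacings in fm (only 0.0286 and 0.0576 appear in the text)
a = np.array([0.0286, 0.0357, 0.0466, 0.0576])
# synthetic chi^{1/4} values in MeV, generated from a known line c0 + c1*a^2
y = 5.0 + 3000.0 * a**2
err = np.full_like(y, 0.5)          # uniform (made-up) errors

# weighted least squares for chi^{1/4}(a) = c0 + c1 * a^2
A = np.vstack([np.ones_like(a), a**2]).T / err[:, None]
c, *_ = np.linalg.lstsq(A, y / err, rcond=None)
c0, c1 = c
print(c0, c1)   # recovers the input line; c0 is the continuum extrapolation
```

With real data the fit residuals also supply the χ² values discussed above; here the synthetic points lie exactly on the line, so the fit simply recovers the inputs.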
Despite the huge cut-off effects observed for χ, the values obtained for the b_2 coefficient defined in eq. (1.2), which are reported in figure 13 for T = 430 MeV, are practically independent of a within errors, and always compatible with the prediction from DIGA, b_2 = −1/12. A fit to all lattice spacings at this temperature assuming O(a^2) corrections yields −12 b_2 = 1.0006(10) (χ² = 0.75/3), while −12 b_2 = 0.9997(3) (χ² = 1.6/4) is obtained from a fit to a constant. These results for b_2 are consistent with the assumption of a dilute gas of independent topological objects; however, let us stress that we could not have expected anything different, given that, on the explored volumes, the topological sectors with |Q| = 0, 1 are largely dominant (see figure 9) and moreover Z_1/Z_0 ≪ 1: that alone inevitably leads to b_2 ≃ −1/12, and would happen even at T = 0 if one takes the volume small enough. Therefore, more substantial support to the validity of DIGA, which is assumed in ref. [42], is given by the analysis of the volume scaling of the ratios Z_Q/Z_0 that we have discussed above: that guarantees that, even when the volume is made large enough that |Q| = 1 no longer dominates, b_2 will remain consistent with DIGA and the susceptibility will maintain the value already determined on the smaller, accessible volumes.
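The mechanism just described can be made explicit. Assuming the standard cumulant definition b_2 = −(⟨Q⁴⟩ − 3⟨Q²⟩²)/(12⟨Q²⟩), any charge distribution dominated by the sectors |Q| ≤ 1 with Z_1/Z_0 ≪ 1 has ⟨Q⁴⟩ ≈ ⟨Q²⟩ ≪ 1, which forces b_2 ≈ −1/12 regardless of the underlying dynamics. A small illustrative computation (generic Python, not the analysis code):

```python
def b2_from_weights(weights):
    # weights[q] = Z_q / Z_0 for q = 0, 1, 2, ... (weights[0] = 1);
    # sectors +q and -q carry equal weight, hence the factors of 2
    norm = weights[0] + 2 * sum(weights[1:])
    q2 = 2 * sum(q**2 * w for q, w in enumerate(weights)) / norm
    q4 = 2 * sum(q**4 * w for q, w in enumerate(weights)) / norm
    # b_2 = -( <Q^4> - 3 <Q^2>^2 ) / (12 <Q^2>)
    return -(q4 - 3 * q2**2) / (12 * q2)

# with |Q| <= 1 dominating and Z_1/Z_0 = 1e-3, b_2 is forced close to -1/12,
# independently of whether DIGA actually holds
print(b2_from_weights([1.0, 1e-3]))
```

The residual deviation from −1/12 is of order ⟨Q²⟩/4 and vanishes as Z_1/Z_0 → 0, which is why the volume scaling of Z_Q/Z_0, rather than b_2 alone, is the discriminating test.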

Discussion
The determination of the topological susceptibility in the high temperature regime has to face various numerical challenges: topological fluctuations become very rare and difficult to sample, UV cut-off effects can be quite large because the adopted fermion discretization does not preserve chiral symmetry exactly on the lattice, and, finally, the freezing of topological modes leads to a critical slowing down when one gets too close to the continuum limit. These difficulties are at the origin of the discrepancies found among recent lattice determinations adopting different approximations or assumptions.
In this study, we have shown how rare topological events can be effectively sampled, in a controlled way, by inserting a Q-dependent bias potential in the probability distribution of gauge configurations, which is then reweighted away in the analysis. That permits one to gain orders of magnitude in terms of computational effort in situations in which ⟨Q²⟩ ≪ 1. This has given access not only to the topological susceptibility and to b_2, but also to quantities like Z_Q/Z_0 with |Q| > 1, which are extremely small on the accessible lattice volumes and would have been completely inaccessible otherwise: a careful check of the volume scaling has revealed that finite size effects are not relevant, and has given substantial support to the validity of the DIGA assumption in the explored temperature regime.
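A toy illustration of the bias-and-reweight procedure (not our production code): a one-variable Metropolis chain samples a sharply peaked integer-valued "charge" whose nonzero sectors are exponentially rare; a bias potential flattens the sectors |Q| ≤ 3, and the exact average ⟨Q²⟩ is then recovered by reweighting the bias away. All parameters are illustrative.

```python
import math, random

random.seed(1)

SIGMA2 = 0.05   # toy variance scale: makes the Q != 0 sectors extremely rare
QMAX = 3        # sectors whose suppression the bias cancels

def logp(q):
    # toy "canonical" log-weight for an integer-valued charge Q
    return -q * q / (2 * SIGMA2)

def bias(q):
    # bias potential cancelling the suppression for |Q| <= QMAX,
    # so the chain tunnels freely between these sectors
    return min(q * q, QMAX * QMAX) / (2 * SIGMA2)

def sample(nsteps=200_000):
    q, samples = 0, []
    for _ in range(nsteps):
        qn = q + random.choice((-1, 1))
        # Metropolis accept/reject with the biased weight exp(logp + bias)
        if math.log(random.random()) < (logp(qn) + bias(qn)) - (logp(q) + bias(q)):
            q = qn
        samples.append(q)
    return samples

samples = sample()
# reweight the bias away: <O> = <O e^{-bias}>_biased / <e^{-bias}>_biased
w = [math.exp(-bias(q)) for q in samples]
q2_rw = sum(wi * q * q for wi, q in zip(w, samples)) / sum(w)
print(q2_rw)   # should approach the exact <Q^2> ~ 2*exp(-10) for this toy
```

Without the bias the chain would essentially never leave Q = 0 in this many steps; with it, all sectors up to |Q| = 3 are visited with comparable frequency, which is what makes ratios like Z_Q/Z_0 measurable in the first place.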
Nevertheless, as we have seen, the large UV cut-off effects still represent a problem: χ drops by 3-4 orders of magnitude when moving from a ∼ 0.06 fm towards a = 0, meaning that, despite the huge improvement induced by the bias potential, the final continuum extrapolation still has considerable error bars. In particular, the present accuracy obtained at the two explored temperatures, with relative errors not far from 100%, does not permit us to extract a reliable estimate for the power law coefficient in eq. (1.1).
In ref. [42], UV effects have been suppressed by an ad hoc procedure which reweights gauge configurations with Q ≠ 0 by forcing the Q lowest eigenvalues of the discretized Dirac operator associated with them to be zero. This procedure becomes exact as one approaches the continuum limit, where the Dirac operator indeed develops exact zero modes; however, it induces non-local modifications of the discretized theory at finite lattice spacing. For that reason, an independent way of obtaining continuum extrapolations of χ with good statistical accuracy would be welcome. An obvious strategy is to push our efforts forward by exploring smaller lattice spacings. However, that will require facing problems related to critical slowing down, since at the smallest explored lattice spacing the autocorrelation time is already larger than O(10^3) unit-length RHMC trajectories. A natural solution could be represented by metadynamics [51][52][53], where the bias potential is dynamically tuned so as to enhance the tunneling between different topological sectors.
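The idea behind the metadynamics option can be sketched as follows: instead of fixing the bias beforehand, it is built up during the run by depositing small Gaussians at the visited values of Q, so that frequently visited sectors are progressively penalized and the chain is pushed elsewhere. This is a generic illustration with made-up parameters; the actual algorithms of refs. [51][52][53] differ in detail.

```python
import math

class MetaDBias:
    """History-dependent bias on Q, grown by Gaussian deposits (metadynamics).

    Toy sketch: height, width and the deposition schedule are illustrative."""

    def __init__(self, height=0.1, width=0.5):
        self.height, self.width = height, width
        self.centers = []          # history of deposited Gaussian centers

    def deposit(self, q):
        # called periodically along the trajectory at the current value of Q
        self.centers.append(q)

    def __call__(self, q):
        # total bias: a Gaussian hill for every past deposit
        return sum(self.height * math.exp(-(q - c)**2 / (2 * self.width**2))
                   for c in self.centers)

bias = MetaDBias()
for q in (0, 0, 1, 0, -1):         # Q values along a fictitious trajectory
    bias.deposit(q)
# sectors visited often acquire a larger penalty, enhancing tunneling away
print(bias(0), bias(2))
```

After long enough running, the accumulated bias approximately compensates the free-energy differences between sectors, which is exactly the flattening that our static multicanonical potential implements by hand.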
A different approach could be represented by a modification of the observable used to determine the topological susceptibility. The present results are based on a gluonic definition of the topological charge relying on the smoothing of gauge configurations. The adoption of a fermionic definition of the topological charge could be a much better choice: the point here is not so much the solid theoretical basis of such a definition, but rather the practical benefits that one could achieve if the same discretization of the Dirac operator is adopted both for the MC simulation and for the determination of the topological content. In particular, recent studies performed at zero temperature [80] have shown that definitions of χ based on spectral projectors [81] lead to finite cut-off effects which are significantly reduced with respect to those based on standard gluonic observables. We plan to consider both strategies in the near future.