Heavy Wilson Quarks and O( a ) Improvement: Nonperturbative Results for b g LPHA A

With Wilson quarks, on-shell O( a ) improvement of the lattice QCD action is achieved by including the Sheikholeslami-Wohlert term and two further operators of mass dimension 5, which amount to a mass-dependent rescaling of the bare parameters. We here focus on the rescaled bare coupling, ˜ g 20 = g 20 (1 + b g am q ) , and the determination of b g ( g 20 ) , which is currently only known to 1-loop order of perturbation theory. We derive suitable improvement conditions in the chiral limit and in a finite space-time volume and evaluate these for different gluonic observables, both with and without the gradient flow. The choice of β -values and the line of constant physics are motivated by the ALPHA collaboration’s decoupling strategy to determine α s ( m Z ) [1]. However, the improvement conditions and some insight into systematic effects may prove useful in other contexts, too.


Introduction
On-shell O(a) improvement of lattice QCD with Wilson quarks near the chiral limit requires a single new term in the action, the Sheikholeslami-Wohlert term with coefficient c sw [2].The latter is a function of the bare coupling g 2 0 and can be determined by imposing continuum chiral symmetry at finite lattice spacing [3].With N f degenerate quarks of subtracted bare mass m q = m 0 − m cr , there are 2 further counterterms which can be implemented as a rescaling of the bare parameters, i.e. g2 0 = g 2 0 (1 + b g (g 2 0 )am q ), m q = m q (1 + b m (g 2 0 )am q ) . (1.1) In practice, perturbative estimates of these b-coefficients are often sufficient, as am q is a small parameter for the light quark flavours up, down and strange.For the most common gauge actions, i.e.Wilson's plaquette or the tree-level O(a 2 ) improved Lüscher-Weisz actions, the one-loop result for b g reads [4], b g (g 2 0 ) = 0.01200 × N f g 2 0 + O(g 4 0 ). (1.2) For heavier quarks, am q is no longer small, and in the recent decoupling project of the ALPHA collaboration, values of am q up to 0.3 − 0.4 are included in the analysis [1,5].In such situations a non-perturbative determination of the relevant b-coefficients becomes very desirable as, otherwise, uncontrolled systematic errors may ensue.In particular, the decoupling result for α s (m Z ) [1] includes a sizeable systematic error due to an assumed 100 percent uncertainty on the truncated perturbative result for b g , Eq. (1.2).Various b-coefficients have been determined non-perturbatively over the past 25 years, including b m and some of the coefficients required for the O(a) improvement of quark bilinear composite operators (see [6][7][8][9][10][11][12][13][14] for an incomplete list of references).The coefficient b g is a notable exception.While there exist some qualitative ideas in the literature as to how b g could be determined by measurement (see e.g.[15]), no proof of concept regarding their viability and practicality has been given.This is somewhat surprising given the central rôle of b g for the consistency of O(a) improvement: it is required when the quark masses are varied at fixed lattice spacing such as needed for chiral extrapolations or in studies of decoupling [1,5].Furthermore, since constant lattice spacing amounts to keeping g2 0 constant, the renormalization constants in mass-independent schemes depend on g2 0 rather than g 2 0 .In fact, for composite operators this often leads to the determination of effective b-coefficients, which include the b g -contribution from the renormalization constant by a Taylor expansion to first order in am q [11], even if not mentioned explicitly [12].
In this paper we propose a class of improvement conditions to determine b g non-perturbatively and then proceed to evaluate these for N f = 3, non-perturbatively O(a) improved lattice QCD [16] with the tree-level O(a 2 ) improved Lüscher-Weisz gauge action [17].We perform consistency checks by looking at gradient flow observables as well as Creutz ratios in a finite space time volume and for different lines of constant physics.We also generated some data in the small coupling region in order to compare to perturbation theory.Our parameter choices, in particular the range of lattice spacings, are motivated by the decoupling project [1,5] and do not overlap with the parameters of CLS [18,19].
The paper is organized as follows.In Section 2 we discuss improvement conditions for b g .In particular we review the connection to the O(a) improvement term in the flavour singlet scalar density, which clarifies and corrects the discussion in [20].We then define our set of observables (Section 3) and present the chosen line of constant physics and the corresponding simulation parameters (Section 4).In Section 5, we discuss some details of the data analysis and various consistency checks before we present our results for b g (Section 6) and our conclusions (Section 7).A few technical details have been delegated to appendices: Appendix A gives some details pertaining to the derivation of the key equations in Section 2. Appendix B contains some additional information on the simulations and corresponding parameters, while the simulation results are tabulated in Appendix C.
2 Improvement conditions for b g

Chiral Ward identities and b g
While improvement coefficients such as c sw or c A , multiplying the O(a) counterterms in the action and to the axial current, respectively, can be determined by chiral Ward identities [3], this is less obvious for the b-coefficients multiplying am q .One problem is that, away from the chiral limit, the on-shell condition is potentially violated by unavoidable contact terms, at least for chiral Ward identities beyond the simple PCAC relation.A systematic analysis for the N f = 3 lattice QCD action and all quark bilinear composite fields has been carried out by Bhattacharya et al. in [20].By studying the general case of mass non-degenerate quarks, they show how violations of the on-shell condition can be by-passed and they list all combinations of renormalization constants and improvement coefficients which are, at least in principle, determined by chiral Ward identities.The coefficient b g is included in this list, through its relation to the O(a) counterterm of the flavour singlet scalar density, S 0 (x) = ψ(x)ψ(x).In the mass-degenerate limit, the corresponding renormalized and O(a) improved operator takes the form [20,21], and d S can, in prinicple, be determined by chiral Ward identities, together with c S and the scale independent ratio Z P /Z S 0 [21].A difficulty is posed by the fact that the chiral limit is only defined up to cutoff effects of O(a 2 ), which render the subtraction of the cubic power divergence ambiguous.More precisely, with c S ≡ c S (g 2 0 , am q ), an expansion in powers of am q yields a term ∝ am q /a 3 = m q /a 2 .Given the intrinsic O(a 2 ) ambiguity of m q , this results in an ambiguity of O(1) in the subtraction term rendering the definition of the renormalized scalar density impossible along these lines.It is therefore important to note that one may combine Ward identities such that only connected correlation functions remain, in which all power divergences cancel.A practical implementation using SF boundary conditions [22,23] can be obtained following the discussion in [21].Whether these observations lead to a practical method for a non-perturbative determination of b g remains to be seen.We emphasize that this strategy relies entirely on the identity (2.2).We would therefore like to point out a subtlety that was overlooked in [20].Given that one relates an improved operator to the improved action parameters, it is natural to expect that the O(a) counterterm in Eq. (2.1) cannot be discretized arbitrarily, but must be related to the derivative of the lattice action density with respect to g2 0 .For definiteness, the O(a) improved lattice action includes the lattice gauge action S g = a 4 x L g (x) with L g being the lattice version of the (Euclidean) Yang-Mills Lagrangian density in the continuum, We find that, due to the g 0 -dependence of c sw , a contribution from the fermionic part of the action must be included, so that the correct lattice discretization of the counterterm is given by, ) For future reference we provide a detailed derivation in Appendix A.

Restoration of chiral symmetry in a small volume
In this paper we pursue a different approach.It is based on the observation that, in the absence of spontaneous chiral symmetry breaking, physical quark mass effects in gluonic observables are quadratic or higher order in the quark mass.In contrast, the counterterm proportional to b g is designed to cancel terms that are linear in the quark mass, allowing us, in principle, to distinguish these effects.Let us first have a closer look at the physical quark mass dependence.If we consider a finite space time manifold without boundaries, the lattice QCD partition function with Ginsparg-Wilson type quarks [24][25][26][27][28] becomes a finite dimensional, mathematically well-defined ordinary and Grassmann-integral, with exact chiral and flavour symmetries.The absence of spontaneous symmetry breaking in a finite volume, implies that the partition function is an analytic function of m q .For even N f , a change of the fermionic variables by a discrete chiral field transformation 2 then establishes that this function must be even, implying the absence of terms linear in m q for any gluonic observable.With odd N f > 2, this is no longer true; however, by adapting the argument of appendix D in ref. [29] to the lattice regularization with Ginsparg-Wilson quarks, the discrete chiral field transformation, leads to a change of variables with unit Jacobian that leaves the massless action invariant.Under the same transformation, a single insertion of the flavour singlet scalar density, S 0 , into a gluonic correlation function is proportional to itself and must vanish, provided that both the QCD action and the gluonic observable are parity even. 3Although this does not exclude all odd powers in the quark mass, it means that contributions linear in the quark mass are indeed absent.We will denote a generic renormalized gluonic observable by O g .Gradient flow observables [31,32], such as the gauge action density at finite flow time [32], are natural candidates, but the gradient flow is not a necessary requirement.We choose a finite Euclidean space time volume L 4 with boundary conditions that do not break all the chiral symmetries.A hyper-torus geometry with some kind of periodic boundary conditions for all fields is an obvious possibility, however, other possible options include chirally rotated SF (χSF) boundary conditions (with even N f ) [33,34], or a mixture of SF [22,23] and χSF boundary conditions (with odd N f ) [35].For simplicity we also assume that O g has no mass dimension, which can always be achieved by multiplication with the appropriate power of L. In the continuum limit, the observable thus becomes a function of two dimensionless variables, which we may choose as z = M L and Λ (3) MS L, with M and Λ (3) MS ≡ Λ being the Renormalization Group Invariant (RGI) quark mass and Λ-parameter of the N f = 3 theory, respectively.From the previous discussion we then expect that the continuum limit of the lattice expectation value, ⟨O g ⟩, becomes a function of (z, ΛL), with an expansion in the RGI mass of the form (2.8) When considering this observable on the lattice, the mass dependence will have a linear term in z, unless b g is chosen correctly.Hence, a possible improvement condition for b g is given by In order to derive an explicit equation for b g , we will proceed in two steps.First we keep the lattice size L/a and spacing a fixed.Up to O(a 2 ) effects this can be achieved by working at fixed g2 0 .The choice for a line of constant physics only matters later, when g2 0 and thus the lattice spacing is changed.This will be discussed in Section 4. Since z is proportional to m q , we may change variables from (z, ΛL, L/a) to (a m q , g2 0 , L/a) and the improvement condition at a given lattice spacing then reads ∂⟨O g ⟩ ∂a m q mq=0,g 2 0 = 0 . (2.10) We would now like to perform a final change of variables from improved to unimproved bare parameters.In fact, it is technically convenient to do the reverse transformation and then solve a 2 × 2 linear system.For the sake of readability we have relegated this discussion to Appendix A. The improvement condition then implies, This is the desired equation for b g in terms of the bare parameters, which are the ones directly controlled in numerical simulations.Note that the condition is again formulated in the chiral limit, m q = 0.
At this point we recall that, with Wilson quarks, the massless limit is not unambiguously defined.What is required here is a definition up to an O(a 2 ) ambiguity, which can be achieved by tuning the improvement coefficients c sw in the action and c A in the improved axial current used for the definition of the PCAC mass [3].Note that any remnant O(a) ambiguity of the chiral limit would make the quadratic physical mass effect vanish only up to O(a) corrections, which, by virtue of the 1/a factor in the am q -derivative would contribute at O(1), leading to a wrong result for the b g -estimate.
A number of choices must still be made.First, one needs to pick suitable gluonic observables, O g .Second, one needs to find a practical way to implement the derivatives with respect to the bare parameters and then one needs to follow a line of constant physics as β = 6/g 2 0 and thus the lattice spacing is changed.

Choice of observables
We choose a space-time volume with linear extent L in all directions and a hyper-torus topology.We use periodic boundary conditions for the gauge fields, and antiperiodic boundary conditions for the fermions in all 4 space-time directions.The absence of a boundary avoids potential violations of chiral symmetry, while antiperiodic boundary conditions for the fermions allow us to perform simulations around the chiral limit (cf.Section 4.2 and Appendix B).To finish the description of our set-up it remains to specify our choices of gluonic observables, O g .

Gradient flow energy density
First, we consider the action density in terms of the gradient flow field tensor G µν (t, x).In a continuum language it is given by In the numerical evaluation translation invariance w.r.t.x is of course used.The projection to the sector of vanishing topological charge Q(t) is performed to avoid large autocorrelation times as discussed in [36,37].While the motivation is algorithmic, it is part of the definition of the observable.The gradient flow observable is finite: proven to all orders of perturbation theory [38] and well established in numerical simulations.The remaining parameter c fixes the ratio between the scale set by the flow time and the box length [39].On the lattice we discretize the gradient flow equations through the O(a 2 ) improved Zeuthen flow definition, while the action density of the flow fields is obtained from an O(a 2 ) improved combination of plaquette and clover discretizations (see [37,40] for the details).

Creutz ratios
To check for O(a) ambiguities in b g we also use a second observable.It is very independent from σ, namely it is defined without the gradient flow.We start from rectangular Wilson loops W (R, T ) and form Creutz ratios [41] where the lattice derivatives ∂y f (y) = [f (y + a) − f (y − a)]/(2a) are symmetric to avoid linear terms in a.Note that, analogously to the energy density at positive flow time in eq.(3.1), also the Wilson loops W (R, T ) defining the Creutz ratios are projected to the topologically trivial sector.These Creutz ratios are made dimensionless and restricted to the diagonal ones, We have fixed the ratio R/L to 1/4 since this is a practical number for lattices with L/a = 12, 16, . .., which we will use below.Also χ is a finite pure gauge observable for the following reasons.In continuous space-time, Wilson loops around smooth paths have been shown to be finite up to an overall factor Z W (l) which depends (apart from coupling and regularisation parameter) only on the length l of the path [42].When the path contains cusps, such as the rectangular Wilson loops, there is an additional renormalization factor depending on the angles of the cusps.Both factors are removed by the derivatives in eq.(3.2).In our implementation on the lattice, we apply one HYP2 smearing step with parameters α 1 = 1, α 2 = 1 and α 3 = 0.5 to the link variables which form the loop [43][44][45][46][47]. Since this does not change the symmetry properties of the links, χ remains finite and has the same continuum limit as the observable with no smearing.Finally we need also the absence of terms linear in a in the observable.Their absence for the static quark potential has been explained in [48].Analogously it holds for Creutz ratios defined with symmetric derivatives.
4 Line of constant physics (LCP) and simulations

LCP
The Symanzik expansion applies when the lattice spacing a is changed with all other scales being fixed.Consequently also improvement conditions need to be enforced along a LCP.Since vanishing quark masses are required for our improvement condition, it only remains to fix L in physical units. 4To this end we choose g 2 GF (L) = 3.949, where g 2 GF (L) is the gradient flow coupling defined in a finite volume with spatial extent L and SF boundary conditions (cf.[1] for a precise definition).For resolutions L/a ∈ {12, 16, 20, 24} the required β-values are found in Table 13 of [1] with sufficient precision.They are in the range β ∈ [4.302, 4.7141].For larger L/a, it turned out that a precise computation of b g becomes prohibitively expensive.We therefore deviate from the strict LCP and set L/a = 24 for β ≥ 4.9.At that point, the lattice spacing in units of the Λ-parameter is very small and also the only additional scales determining a-ambiguities, √ 8t as well as R, are large enough to make these ambiguities sufficiently small.We show numerical tests in section 5. Furthermore, the extracted values for b g approach the one-loop perturbative values smoothly.All-in-all it is sufficient to determine b g for the parameters listed in table 1.
where Z m is the quark mass renormalization in the SF-scheme at the renormalization scale µ dec (see section 4.1 for the details).

Simulations
We simulate QCD with a Lüscher-Weisz gauge action [17] and three flavours of O(a) improved Wilson quarks [2,16].To implement the strategy described above, simulations of QCD at zero quark mass or even at slightly negative masses are necessary.A finite volume with suitably chosen boundary conditions is indispensable to prevent negative or very small eigenvalues of the Wilson Dirac operator.With Schrödinger Functional boundary conditions, massless simulations of our action on our LCP were unproblematic [37].Here, in order to avoid chiral symmetry breaking by the boundaries, we work on a hyper-torus.Choosing anti-periodic boundary conditions for the fermionic fields in all four directions induces a sufficiently large spectral gap.
The simulation setup is similar to previous simulations with N f = 3 degenerate quarks.We use an even-odd preconditioned variant of the HMC algorithm [49] in which the three quarks are separated into a doublet and a third quark.The quark determinant for the doublet is factorized [50] and pseudo fermion fields are introduced for each factor.Here D denotes the even-odd preconditioned clover Wilson operator.For the third quark it is required that all eigenvalues of D have real parts that are larger than some r a > 0. The operator in the determinant det[ D] = det[ D † D] can then be approximated by a rational function of D † D [51,52].The inexactness of this approximation is accounted for by including a stochastically determined reweighting factor in the observables.The molecular dynamics equations derived from the various contributions to the action are solved using multi level integration schemes with two levels [53].The gauge action is integrated with the finest resolution while forces from different pseudo-fermion actions are evaluated less frequently.We use a modified version of openQCD-1.6[54] for our simulations and refer to its documentation for further details.Simulation parameters as well as tests concerning the spectral gap and the ergodicity of the simulations are collected in appendix B.
5 Data analysis 5.1 Estimates for the derivatives entering eq.(2.11)In principle the derivatives can be obtained from insertions of ∂ p S, where p denotes the bare quark mass and the bare coupling.However, such measurements have large statistical errors.Instead, for a given L/a, we simulate at neighboring values in g 2 0 and m q , fit the data as a function of m q at fixed g 2 0 and vice versa, and then obtain the derivative as the derivative of the fit functions.In detail we fit to the data in tables 5 and 6 and mostly take the results with n m = n g = 3 to three data points.Let us discuss this choice as well as the exceptions from it.For the determination of the quark mass derivative the fitted data are generically at am q ≈ 0, ±0.025.We need to check that this is small enough to obtain the first derivative ∂ amq ⟨O g ⟩ ≈ c 1 , namely that we are not affected by higher order terms in am q beyond n m = 3.Indeed, note that the variable am q in eq.(5.1) is natural for the linear term in am q that we are seeking.However, already the quadratic term in the mass exists in the continuum limit.It is then naturally written in the form ĉ2 (Lm) 2 = c 2 (am q ) 2 with some renormalized quark mass m and a coefficient ĉ2 (a/L) which has a finite limit ĉ2 (0).Therefore, as we increase L/a, the coefficient c 2 increases in magnitude roughly at the rate (L/a) 2 (up to logarithmic renormalization effects).At β = 4.7141 we simulated five quark mass points with am q ≈ 0, ±0.015, ±0.025.The corresponding data for σ(0.18) are shown in fig. 1.One sees significant curvature, but also that a purely quadratic fit works well.In table 1 we list both c 2 and ĉ2 = (a/L) 2 Z −2 m c 2 for the strict LCP. 5 Apart from the small statistics L/a = 20 results, the expected behavior is clearly seen.It implies that the largest contamination of our data by higher order effects in am q is present for L/a = 24.Therefore we checked explicitly for β = 4.7141, L/a = 24, that we obtain the same derivative by fitting the available five points with n m = 3 and also by taking just the symmetric derivative, ∂amq f = [f (am q )−f (−am q )]/(2am q ) with either am q = 0.015 or am q = 0.025.The resulting estimates for the derivative agree well within errors: ∂amq σ amq=0.025= 0.01786 (54) , ∂amq σ amq=0.015= 0.01806( 55) , ∂ amq σ fit 5pts = 0.01796 (38) . (5.3) Altogether we conclude that systematic errors due to this step are negligible and use the five-point fit at β = 4.7141 and the three-point fit otherwise.The g 2 0 derivatives are obtained from the fits with n g = 3 to the data at the three closest values of g 2 0 of the data compiled in table 5.For the case of L/a = 24, we compared these with the estimates from n g = 5, fitting all nine available values of g 2 0 in the range specified by 4.6 ≤ β ≤ 6.6 (cf.table 5).The comparison, listed in table 2 for σ(0.18), shows that the estimates for the g 2 0 derivative agree within uncertainties and the changes in the resulting b g are negligible compared to the overall errors, which are dominated by the ones of the quark mass derivative.The nine-point fit with n g = 5 is illustrated in fig. 1.Comparison of b g and corresponding g 2 0 derivatives of σ(0.18) from 3-point n g = 3 and 9-point n g = 5 fits, for L/a = 24 and different values of β (cf.table 5).
The data for χ are analyzed in the same way.Results for b g from both observables are listed in table 3.As checks on the dependence on c, R/L, and L/a-values deviating from the lines of constant physics, there are additional data not reported in the tables.

Relaxation of LCP condition for β ≥ 4.9
For our application to decoupling [1], we also need bare couplings below g 2 0 = 6/4.7.Strictly following the LCP then requires prohibitively large L/a, in particular as we seek the connection to perturbation theory below g 2 0 = 1.We therefore keep L/a = 24 fixed for all β ≥ 4.7.In principle this means that there is an ambiguity of order a/L = 1/24 which does not vanish as we increase β and therefore decrease a.We follow a skewed trajectory where L decreases.
The magnitude of the ambiguity can be tested by computing how b g varies as one changes L. At fixed a, we therefore added computations with L/a = 16, 32.In fig.2, we show b g from σ, for L/a = 16, 24, 32 and β = 4.9.Within a small margin, a universal behavior, independent of L/a, is seen when b g is considered at fixed √ 8t/a.Therefore it is actually not very relevant that we deviate from the fixed L trajectory.Rather we have to be concerned about the change in t (which is linked to L by t = c 2 L 2 along the strict trajectory).Fortunately, it is seen that b g becomes approximately independent of t for √ 8t > ∼ 3a.Since c = 0.18 puts us into the flat region    for L/a = 24, there are good indications that our deviation from the strict LCP is irrelevant at the level of our uncertainties.In fig.3, we show the same investigation for b g from R 2 χ(R, R), where the variable √ 8t is replaced by R. The precision is not as good as in fig.2, but the statements made before hold, exchanging √ 8t → R.  In summary, we may relax the strict LCP as long as we keep √ 8t and R larger than about three in lattice units and have L/a ≥ 24.The resulting ambiguity is small.We emphasize that above g 2 0 = 6/4.9,where the improvement by b g is most relevant, we keep to the strict LCP.

Results
Our central results from the gradient flow energy density σ(0.18) (cf.table 3) are displayed in fig. 4. As an interpolation of the data from the perturbative regime up to our maximum bare coupling, we fit the results to g = 0.036 .(6.1) A good fit is obtained already for n k = 2 with parameters This interpolation deviates little from others, obtained with e.g.n k = 3 or from ones where we add a term proportional to a/L.The maximum (absolute) difference is 4 × 10 −4 in the important range of 4.30 ≤ β ≤ 5.17, and it is 1.1 × 10 −3 overall.Apart from this interpolation uncertainty, it is relevant to check for the always present ambiguity of O(a).We display the difference to the determination from ⟨O g ⟩ = χ in fig. 4. The ambiguity is small and consistent with b σ g − b χ g = const.× a/L.The uncertainty of the difference is dominated by b χ g , but especially in the region of larger g 2 0 , the test is rather significant and reassuring.The large difference of b g from the one-loop approximation is not due to our particular choice of improvement condition.
The extra factor a/L is added because b g always enters into observables with an explicit factor of a; we also include a linear fit to all points constrained to go to zero.The values of a/L considered in the plot have been obtained by fitting the values of L/a as a function of β LCP in Table 1 of ref. [1], and enforcing the universal 2-loop running of the lattice spacing for g 2 0 → 0.

Conclusions
For the renormalization-by-decoupling strategy [1,5], massive and massless QCD have to be connected with high precision.In such problems the improved bare coupling g2 0 needs to be kept fixed as one approaches the continuum limit and the improvement coefficient b g is needed.The relevant combination am q b g reaches up to 0.06 in the continuum extrapolation of the massive coupling in [1].We have shown that b g can be determined requiring the absence of a term linear in the quark mass around m q = 0 when boundary conditions do not break chiral symmetry and one is not effectively in the large volume region where chiral symmetry is broken spontaneously.We have further seen that with L ≈ 0.25 fm, periodic boundary conditions for the gluon fields and anti-periodic ones for the quarks, the Dirac operator has a sufficiently large spectral gap such that the theory can be simulated around the chiral limit.Furthermore, the quark mass derivative could be extracted precisely and we obtained b g with good precision.The result, eqs.(6.1) and (6.2), can be used without uncertainty, since subsequent assumptions made on the form of the continuum limit will be more relevant than our errors.Still, we have cited a systematic error which can be used if desired.
We emphasize that eq. ( 6.1) has not been shown to be valid below β = 4.3.Firstly, we simulated only starting at that β-value, and secondly the LCP would quickly demand very small √ 8t/a where ambiguities are potentially very large.Extending the present method to the region of CLS simulations [18,19] is therefore not entirely straightforward.One alternative are higher order perturbative estimates; for recent steps towards a 2-loop result cf.[55], and one of us (MDB) has derived a preliminary result for the Wilson gauge action using the code developed for the 2-loop calculation of the SF coupling [56,57].Another alternative is the chiral Ward identity technique, applied to the O(a) improved flavour singlet scalar density, eq.(2.1), cf.[21].The connection to b g relies on the identity, eq.(2.2), which was first derived by Bhattacharya et al. in [20].In Section 2, we have pointed out an oversight in this original derivation.The identity can however be rescued, provided one uses the particular discretization of the O(a) counterterm to the scalar density, eq.(2.5), which includes a contribution from the Sheikholeslami-Wohlert term.the cubic divergence cancels in the connected correlation function, so that we can ignore the counterterm ∝ c S .
We now use the chain rule to relate the derivative of the action with respect to O(a) improved bare parameters (depending on b g and b m ) to the derivatives with respect to the bare parameters.One may think of this as a change of variables, (g 2 0 , am q ) → (g 2 0 , a m q ) , (A.5) where the latter are functions of the unimproved bare parameters.As is customary in the physics literature, we do not use different function names.Straightforward application of the chain rule gives, We thus obtain a 2 × 2 system of equations, in the unknowns X 1 = ∂S/∂ m q and X 2 = ∂S/g 2 0 .Working out the derivatives with respect to the unimproved bare parameters we further have, ∂ m q ∂m q g 2 0 = 1 + 2b m am q , (A.9) where we use the prime notation for derivatives w.r.t.g 2 0 , e.g.b ′ g ≡ db g /dg 2 0 .The derivatives of the lattice action are given by, Inverting the matrix M , solving for X 1 , and inserting the explicit expressions for the matrix elements, we obtain, Expanding this expression to first order in am q and neglecting O(a 2 ) terms we then get, Upon insertion into the connected correlation function, Eq. (A.4), and comparing with Eqs.(2.1,A.3)we thus see that we can identify, provided that the d S -counterterm is realized on the lattice in the particular form of Eq. (2.5).We conclude that it is this lattice discretized form of the O(a) counterterm that one must use in order to exploit the identity b g = −2g 2 0 d S .
A.2 Derivation of Eq. (2.11) We now have a closer look at Eq. (A.17) in the chiral limit, m q = 0.According to the discussion in Sect.2, we impose the additional requirement that terms linear in the quark mass are absent.
We thus obtain, in terms of derivatives of expectation values, Solving for b g yields Eq.

B Simulations
In the following we use the notation and conventions of [54] and the openQCD documentation.The simulation setup is very similar for all runs.The trajectory length is always 2 MDUs.Fermionic forces are integrated with a 4th order Omelyan-Mryglod-Folk (OMF4) [58] integrator with 8 integration steps per trajectory.The gauge force is evaluated five times more frequently, as every interval between fermionic force evaluations is integrated with 1 OMF4 step.The factorization of the determinant of the doublet into three determinants, as described in section 4.2, has in all cases the masses aµ 1 = 0.1 and aµ 2 = 1.5.The degrees and ranges of the rational approximations for the third quark's determinant vary and can be found in table 4.This table also shows the acceptance rates.For solving linear systems, either a standard conjugate gradient solver, or a variant that solves systems with different mass shifts simultaneously, is used and stopped when a relative residuum norm below 10 −12 is reached.
More elaborate solvers are not necessary, because the small volume together with our choice of boundary conditions results in quite well conditioned Dirac operators.We monitor the largest and the smallest eigenvalues of |γ 5 D| in order to decide on the ranges for the rational approximation, and in no case these were dangerously close to zero.Two examples are shown in Figure 5. Data analysis is performed using ADerrors.jl[59,60] for the gradient flow energy density and pyerrors [61] for the Wilson loops, keeping track of the correct errors and auto-correlations via the Γ-method [62,63].
At last a remark on the ergodicity of the HMC algorithm.There are two potential difficulties with our setup.First, it is known [63] that with decreasing lattice spacing the correct sampling of topological sectors becomes increasingly difficult.This does not affect our calculation because at the volume corresponding to our LCP the sectors with non-zero charge are suppressed so strongly, that they play virtually no rôle.The second is a difficulty related to the center symmetry.In pure gauge theory, in the large volume limit, the center symmetry is spontaneously broken at high enough temperature.In our small volume the symmetry is restored, but the trace of a Polyakov line exhibits a tri-modal distribution and many algorithms, including the HMC, tend to get stuck in one of the sectors.Although there is no center symmetry to break in the case of full QCD, similar algorithmic issues have been observed.We investigated the distribution of Polyakov lines and observed that at the volume corresponding to our LCP the distribution is unimodal, and in the perturbative region (e.g.L/a = 12, β = 16) there is enough movement between the "sectors" to ensure ergodicity.

C Tables of simulation results
In tables 5 and 6 we list the results of our simulations, which are used to obtain the derivatives of σ(c = 0.18) and χ with respect to the quark mass and bare coupling.Simulation parameters for the m q -fits.Results for σ(0.18) and χ used to obtain their m q -derivative.All measurements are carried out at am q ≈ ±0.025; for β = 4.7141 also at am q ≈ ±0.015.We carried out N rep independent simulations for a total of N ms measurements.The measurements with am q ≈ 0 are listed in table 5.

. 1 )
Using an argument based on Feynman-Hellmann type identities, Bhattacharya et al. derive the identity1

Figure 4 :
Figure 4: Final result for b g from σ(0.18) (left) and ambiguity [b σ g − b χ g ] × a L (right).The extra factor a/L is added because b g always enters into observables with an explicit factor of a; we also include a linear fit to all points constrained to go to zero.The values of a/L considered in the plot have been obtained by fitting the values of L/a as a function of β LCP in Table1of ref.[1], and enforcing the universal 2-loop running of the lattice spacing for g 2 0 → 0.
(2.11)  for gradient flow observables.In order to generalize this result to any O(a) improved gluonic observable, O g , we recall that the gradient flow simply removed the complication from contact terms in the insertion of the O(a) improved scalar density, thereby allowing us to relate b g to d S .However, to derive Eq. (2.11), we only need to know that derivatives of O(a) improved expectation values with respect to the improved bare parameters are again O(a) improved.Hence the very same steps as carried out above establish Eq. (2.11) for arbitrary O(a) improved gluonic observables.

Figure 5 :
Figure 5: The lowest eigenvalue of |γ 5 D| on several thermalized configurations at two different lattice spacings.The upper part corresponds to the finer lattice spacing (L/a = 24, β = 4.7141), while the lower one to the coarser one (L/a = 16, β = 4.4662).Both are at approximately zero quark mass.The vertical red lines correspond to the lower interval boundaries of the rational approximations that were used in the simulations.

Table 1 :
Bare parameters for which b g has been computed as well as the fit coefficients c 1 , c 2 for σ(0.18) (cf.eq.(5.1)) and ĉ2

Table 3 :
Results for b g , eq.(2.11), and corresponding derivatives w.r.t. the bare parameters from ⟨O g ⟩ = σ(0.18)and ⟨O g ⟩ = χ, in the first and second line for each value of L/a and β, respectively.

Table 4 :
The degrees and ranges of the rational approximations used in the simulations.The last column is the rounded average acceptance rate.

Table 5 :
Results for σ(0.18) and χ used to obtain their g 2 0 -derivative.All measurements are carried out at vanishing quark mass and originate from N rep independent simulations with a total of N ms measurements.