An analysis of systematic effects in finite size scaling studies using the gradient flow

We propose a new strategy for the determination of the step scaling function σ(u)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma (u)$$\end{document} in finite size scaling studies using the gradient flow. In this approach the determination of σ(u)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma (u)$$\end{document} is broken in two pieces: a change of the flow time at fixed physical size, and a change of the size of the system at fixed flow time. Using both perturbative arguments and a set of simulations in the pure gauge theory we show that this approach leads to a better control over the continuum extrapolations. Following this new proposal we determine the running coupling at high energies in the pure gauge theory and re-examine the determination of the Λ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Lambda $$\end{document}-parameter, with special care on the perturbative truncation uncertainties.


Introduction
In recent years the gradient flow [1,2] (GF) has found many successful applications. In particular, when combined with the ideas of finite-size scaling [3], it provides a powerful tool for the determination of the running coupling in asymptotically-free strongly-coupled gauge theories [4][5][6]. The applications include the determination of the strong coupling in QCD, and the study of near conformal systems (see [7] for a recent review).
Coupling definitions based on the gradient flow have some properties that make them very attractive. First, the relevant observables show a small variance. A modest numerical effort allows their determination with a sub-percent precision. Second, the gradient flow coupling is given directly as an expectation value. Its determination only involves the numerical integration of the flow equations, something that in practice can be done with arbitrary precision without having to perform any fits, or taking any limit. This means that finite-size scaling studies using the GF only have one source of systematic effect: the continuum extrapolation.
Nevertheless, it has been shown that these systematic effects are difficult to keep under control (see [8] for a review). It was soon noticed [9] that seemingly innocent extrapolations could cause large systematic effects. Although the same work suggested to simply use larger flow times as a way to have a better control of the continuum extrapolation, this comes with an increase in the variance of the observable. One of the strong points of GF studies, the high statistical precision, had to be sacrificed. Some efforts were made in order to understand the anatomy of these cutoff effects, first in a perturbative context to tree level [10], and later more systematically from an effective field theory point of view [11]. It became clear that the integration of the flow equations and the evaluation of the relevant observables at positive flow time could be performed in a way that no O(a 2 ) effects are produced: it is enough to use classically improved discretizations [11]. Still, the use of these improved observables did not reduce substantially the observed scaling violations (see for example [12]). Unavoidably large lattices have to be simulated in GF studies, and the continuum extrapolation will remain the main source of concern.
In this work we study the main sources of systematic effects in finite-size scaling studies using the GF. We will argue that changes in the flow time t are responsible for the scaling violations. On the contrary, when different finite volume couplings are determined at the same value of the flow time t, the scaling violations are very small. This observation will be supported by a non-perturbative study. Moreover it will allow us to propose a new strategy for the determination of the step scaling function by breaking it up into two pieces: first a change in the flow time, without any change in the volume, second a change in the volume without any change in the flow time. Since the first step can be performed without having to double the lattices, the continuum extrapolation can be performed much more accurately. We will discuss in detail the advantages of this approach. Finally we will apply this alternative strategy to the case of pure gauge SU (3). Using the same datasets of [13], we will revisit the most crucial part of this work: the running at high energies and the matching with the asymptotic perturbative regime.
We also include an analysis of the variance of flow observables, allowing us to predict the dependence of the statistical uncertainties with the flow time t.

Preliminaries
The gradient flow provides a set of renormalized observables with small variance that are easy to compute via numerical simulations. The key idea consists in adding an extra time coordinate to the gauge field, the flow time t. The dependence of the gauge field B μ (t, x) on the flow time is given by the first order diffusion equation (2.1) (2. 2) The flow time has dimensions of length squared, and therefore introduces a scale into the problem (see Fig. 1). Gaugeinvariant composite operators defined at positive flow time are automatically renormalized [14]. In particular, a renormalized coupling at scale μ = 1/ √ 8t can be defined by using the action density [2] In finite volume schemes where the invariance under Euclidean time translations is broken, like the Schrödinger functional (SF) or open-SF boundary conditions, the coupling is only measured at the time-slice x 0 = T /2 and usually only certain components of the field strength are used The coupling defined using E = E m is usually referred as magnetic and the one using E = E e as electric.
Most applications of the gradient flow until today derive from these coupling definitions. In infinite volume the scale μ 0 = 1/ √ 8t 0 at which the couplingḡ 2 (μ 0 ) takes some predefined value is used to define the reference scale t 0 . In the context of finite-size scaling, the renormalization scale μ = 1/ √ 8t is linked with the linear size of the system and therefore the coupling "runs" with the size L (see Fig. 1). The constant c defines a scheme by fixing the ratio of the length of the system and the flow time scale √ 8t. The full definition of the GF coupling in finite volume reads The constant N (c) has been determined for different choices of boundary conditions [4][5][6]15]. When computing the gradient flow coupling on the lattice there are several choices to make, beyond the action chosen for the simulation. First, the flow equation (2.1) has to be translated to the lattice. Second, there are many valid discretizations of the energy density E(t, x). It is well understood that using the Zeuthen flow to integrate the flow equations and using any classically improved discretization of the action density E(t, x) guarantees that no further O(a 2 ) effects are generated. The only remaining O(a 2 ) effects are those of the choice of action for the simulation and an additional counterterm, only affecting flow quantities [11].
On the lattice, where we can work only with dimensionless quantities, the finite volume gradient flow coupling (Eq. (2.6)) can be measured by determiningḡ 2 (μ) at the flow time (in lattice units) √ 8t/a = cL/a. It is also common to use a lattice version of the normalization factor N (c, a/ √ 8t) instead of the N (c) of Eq. (2.6), in order to ensure that the leading order perturbative relation is exact for any lattice size.
The key character in finite-size scaling studies is the step scaling function It measures how much the coupling changes under a variation in the renormalization scale of a factor two. 1 Its determination in lattice simulations is performed via a matching of the bare parameters. At a fixed value of the bare coupling g 2 0 (and therefore fixed value of the lattice spacing a), 2 one determines the GF coupling on a lattice of size L/a (resulting inḡ 2 (μ)), and on a lattice 2L/a (resulting inḡ 2 (μ/2)). This allows the determination of a lattice approximation of the step scaling function: (2.9) 1 Other scale factors, like 3/2, are less common, but obviously possible. The discussion in this paper also applies to any other choice. 2 One also needs to specify the value of the quark masses. Simulations on a finite volume make it possible to directly simulate at m = 0, and this is typically the additional condition that completes the line of constant physics.

Fig. 2
The determination of the step scaling function σ involves a change in the renormalization scale and in the size of the system of a factor two. These two steps do not need to be performed at the sime time. The figure shows that σ can be determined as the composition of the function J 1 that changes the renormalization scale μ → μ/2 at fixed size L, with the function J 2 that changes the size L → 2L at fixed renormalization scale μ In the rest of this section we will be concerned with the continuum extrapolation of (u, a/ √ 8t); before that however, let us insist on two points: • The direct determination of , as suggested above, requires the determination of the GF couplings at two different renormalization scales μ and μ/2, on lattices of two different sizes L/a and 2L/a. • It is clear that once c (Eq. (2.5)) is fixed, taking a/L → 0 is equivalent to taking a/ √ 8t → 0, so the usual notation that uses a/L as the variable to parametrize cutoff effects is fully justified. Nevertheless, it is a/ √ 8t = aμ the natural variable to measures the size of cutoff effects. Note that for the typical choices of c ∈ [0.2 − 0.5] we have √ 8t < L. With this choice of the renormalization scale, it is clear that, at fixed L/a, larger values of c would lead to smaller cutoff effects. This notation will also be convenient for the discussion that follows.

A new strategy for the determination of σ (u)
As already pointed out, the determination of the lattice step scaling function involves two steps: a change in the renormalization scale by a factor two, and change in the lattice size by the same factor. In previous works these two changes have been performed at the same time in a single step, but conceptually there is no need to do so. Figure 2 shows that the value of the step scaling function σ (u) can be determined by the composition of two functions. First we have (2.10) This function changes the renormalization scale by a factor two μ → μ/2 at fixed physical size L. Second, we need to determine . (2.11) This function changes the lattice size L → 2L keeping constant the renormalization scale μ −1 = √ 8t. The relation is now exact, and provides an alternative method to determine σ .
Our main assumption is that large scaling violations come with changes in the renormalization scale. We will later provide evidence that this is the case, but for the moment let us discuss why this opens up the possibility to improve the quality of the continuum extrapolations. Let us start by explaining how these functions are computed in practice: • The determination of J 1 involves measuring how much the coupling changes when the renormalization scale is varied as μ → μ/2 at constant physical size L. This is simply achieved by measuring on a lattice simulation the value of the GF coupling at two different flow times [i.e. t → 4t, see Eq. (2.5)]. Crucially, this determination does not require to double the lattice size, allowing precise results without the need of very large values of L/a. • The determination of J 2 requires to change the physical size L without varying the renormalization scale. In practice one fixes the bare coupling g 2 0 at a given value, and then measures the GF coupling on a L/a lattice at flow time √ 8t/a = cL/a and on a 2L/a lattice at the same t. This step requires to change the lattice size, but since the renormalization scale remains the same, one expects reduced cutoff effects. In some sense the determination of J 2 corresponds to measuring the finite volume effects inḡ 2 c (μ) (see Fig. 2).
In the rest of this section we will provide evidence of these statements, but before that let us further comment on two points: • The determination of the functions J 1 and J 2 can be done on the lattice just by applying the definitions equation (2.10) and (2.11). Note however that in this case the functions will carry a dependence on the cutoff. We will label these functionsĴ 1 (u, a/ √ 8t) andĴ 2 (u, a/ √ 8t) respectively.
• In this work all numerical results make use of the same discretizations used in [13]. We encourage the reader to consult this reference for more details. Here it is enough to say that we define the GF coupling with SF boundary conditions and that our preferred setup, based on theoretical expectations, uses the Zeuthen flow and an improved definition of E(t, x). This preferred setup will also be compared with the more common combination Wilson flow/Clover observable. Moreover, we will focus in the rest of the text on the magnetic definition of the GF coupling (see Eq. (2.4)). 3 Of course, our discussion is general, and does not depend on these particular choices.

Leading order perturbation theory
As a first look at the proposal we examine the leading order perturbative relation. We use the continuum norm N (c) in the evaluation of the finite volume couplingsḡ 2 c and examine, to leading order in perturbation theory, the quantitieŝ Note that since we are working at leading order inḡ 2 , and thanks to the normalization by the constant factor u, all these quantities are one in the continuum. In this example we will examine a typical case where we consider data for L/a = 8, 10, 12, 16, 18, 20, 24, 32, 36, 40, 48. (2.14) We will use c = 0.2 (see Eq. (2.5)). Let us note a few basic points: • The determination ofĈ 0 (a/ √ 8t) andĈ 2 (a/ √ 8t) requires to double the lattice size. This means that with our data, lattice estimates for these functions will be available only for a factor 3 change from the finest to the coarsest lattice spacings: 8 → 16, 10 → 20, 12 → 24, 16 → 32, 18 → 36, 20 → 40, 24 → 48.
• On the other hand, the determination ofĈ 1 (a/ √ 8t) only requires the measurement of the GF coupling at different values of the flow time t. This can be done on all lattices, and our dataset provides a factor 6 change from the coarsest (L/a = 8) to the finest (L/a = 48) lattice spacing. Fig. 3 Cutoff effects in the usual step scaling function (Ĉ 0 (a/ √ 8t)), compared to those in J 1 (seeĈ 1 (a/ √ 8t)) and J 2 (seeĈ 2 (a/ √ 8t)). See text for more details. Here we show the case of the Zeuthen flow/improved observable with plaquette gauge action (i.e. the same setup that will be used in our non-perturbative study) Figure 3 shows the perturbative results. As the reader can see, the cutoff effects inĈ 0 (a/ √ 8t) are very similar to those inĈ 1 (a/ √ 8t). This can be understood in a simple way, since both these functions involve a change in the renormalization scale by a factor two. The main difference between both cases is thatĈ 1 (a/ √ 8t) can be determined using lattice spacings that are a factor two smaller, since its determination does not require any change in the lattice size. The determination of C 2 (a/ √ 8t) does not involve any change in the renormalization scale, and it shows cutoff effects that are one order of magnitude smaller than eitherĈ 0 (a/ √ 8t) orĈ 1 (a/ √ 8t). In the next section we will show that indeed these properties hold non-perturbatively, and that they are not a coincidence of leading order perturbation theory.

Non-perturbative study
In this section we will describe the non-perturbative results used in this study, first to support the claim that the numerical determination ofĴ 2 has very small scaling violations, due to the fact that the renormalization scale is not changed (i.e. the determination ofĴ 2 amounts to measuring finite volume effects in the coupling). Then, we want to show that the determination of J 1 can be performed accurately even at values of c that are too small to allow for a conservative estimate of the step scaling function.
All the analysis have been performed using two different analysis codes: one [16] based on the -method [16][17][18][19], and the other using a jackknife resampling technique. Both analysis techniques take into account the correlations between observables measured on the same ensemble. This is crucial, both for the determination of J 1 that involves the measure-ments ofḡ 2 on the same configuration at different values of the flow time, and also to correctly determine the uncertainty in the composition J 2 • J 1 .

Description of the data set
For our non-perturbative study we are going to use exactly the same dataset of [13]. This setup includes simulations of the pure gauge theory with the Wilson plaquette action on a lattice of size L 4 and lattice spacing a. We have several resolutions, L/a = 8, 10,12,16,20,24,32,48, at a large range of lattice spacings a and with Schrödinger functional (SF) boundary conditions. The setup is the same as the one used for the perturbative study in Sect. 2.2.1.
We have measurements of the GF coupling at values of c = 0.2, 0.3, 0.4 with two different discretizations: the usual Wilson flow/Clover observable and the Zeuthen flow/improved observable (more details can be found in [13]). Our target will be to determine the running non-perturbatively in the scheme defined by c = 0.2 by computing the associated step scaling function σ (u). Note that in Ref. [13] the value c = 0.3 was used because the large scaling violations at c = 0.2 did not allow for a determination of σ (u).
We will revisit this attempt at a direct determination of σ (u) here. Moreover, together with the data at c = 0.4, we will be able to determine both J 1 , J 2 , and compare their composition with the direct determination of σ .
Finally, the data with c = 0.3 will be used in Sect. 4 to compare the results of [13] (where the parameter is obtained by using a direct determination of the step scaling function with c = 0.3) with our new strategy.
We will focus our investigations on the high energy regime, whereḡ 2 ∼ 1-3. Note that this region showed significant scaling violations, and in fact turns out to be the most delicate part of the analysis in the extraction of (see [13] for more details).

Scaling violations in J 2
Let us start by investigating the scaling violations of J 2 . It is convenient to study the combination The continuum limit of the right hand side, f 2 (u, 0), has an asymptotic expansion (in perturbation theory) as a polynomial in u, starting with a constant term. Note that our data set allows to determineĴ 2 (u, a/ √ 8t) for a factor three change in a/ √ 8t. The numerical raw data forĴ 2 (u, a/ √ 8t) is shown in Fig. 4a. We also include in the plot the continuum extrapolation. At this point we defer the discussion on how this continuum curve is determined to Sect. 4, and focus on the key element: the non-perturbative data forĴ 2 show very small scaling violations for all values of the coupling under study. The continuum curve is at most two standard deviations away from the coarser lattice data ( √ 8t/a ≈ 1.6), and the two finest lattices with √ 8t/a ≈ 3.2, 4.8 show no significant deviation from the continuum value.
One can look in more detail at the previous statement by interpolating the data with different L/a to a common value of u, and then look at the continuum extrapolation ofĴ 2 . We choose the value u = 1.5, where we have several points at each L/a, and therefore the necessary interpolations can be performed in the safest conditions available. Figure 4b shows that the Zeuthen flow data shows no significant scaling violations in the whole range of lattice spacing. The Wilson flow data show some scaling violations, but they are rather mild, with the finest lattice being almost compatible with the continuum value. Extrapolations of the data with both discretizations are in full agreement with each other.

Scaling violations in J 1
Once more, it is convenient to study the quantity The crucial difference with the previous case is that the determination of J 1 involves a change in renormalization scale μ → μ/2, so we expect significant scaling violations. On the other hand, its determination does not require to double the lattice sizes. In practice we have a range in lattice spacing that spans a factor 2 further. Figure 5a shows a comparison of the raw data with the continuum curve (see Sect. 4 for a discussion on its determination). In contrast with the case ofĴ 2 , we observe significant scaling violations, confirming our hypothesis that such violations are mainly a result of changes in the renormalization scale. Moreover, they show a complicated functional form: the three coarser lattices with √ 8t/a = 1.6, 2.0, 2.4 do not show a monotonous pattern. The data is several standard deviations away from the continuum curve. Figure 5b shows again the continuum extrapolation of J 1 (u) at the fixed value u = 1.5. The plot confirms that scaling violations are significant, with the different discretizations based on the Wilson/Zeuthen flow showing differences of several standard deviations for √ 8t/a < 3.2. Still, one can obtain an accurate extrapolation of J 1 . In order to do so, the large range of lattice spacings available to us, from L/a = 8 to L/a = 48, is crucial. This is of course possible only because the determination of J 1 does not require to double the lattice size. Figure 5a shows that the two finest lattices are in agreement with the continuum curve. Note however that these are very fine lattices with √ 8t/a ≈ 6.4, 9.6.
A detailed comparison with a direct determination of σ 0.2 (u) Finally, let us compare our new strategy with the direct determination of the step scaling function σ c=0.2 . Let's start by stating what is known. 4 The leading cutoff effects of the step scaling function can be described thanks to the Symanzik effective theory [22][23][24]. The asymptotic scaling violations have the form Here γ is related with an anomalous dimension. Its value depends both on the details of the gauge action simulated and the details of the observable . Only recently [20] the leading relevant anomalous dimension for the special case of spectral quantities (in the pure gauge theory) has been computed. Except in this case, the relevant values of the leading anomalous dimensions are unknown.  But this consistent picture is just hiding the assumptions that are behind such extrapolations. In particular, the left panel shows that using linear extrapolations in a 2 log(a) (for the Zeuthen flow data) or linear extrapolations in a 2 log 2 (a) (for the Wilson flow data), one obtains an even better agreement. Unfortunately the perfectly consistent extrapolations without logarithmic terms do not agree with the perfectly consistent extrapolations that include such terms. The largest difference is between the Zeuthen flow extrapolation in a 2 log(a) (with result 1.78341(88)), and the two-point linear extrapolation in a 2 of the Wilson flow data (with result 1.7744(10)), that differ by approximatley 7 combined sigmas. This just shows that the direct extrapolation of c=0.2 , although statistically very precise, has an uncontrolled systematic uncertainty unless very large lattices are simulated. Figure 7 shows the equivalent extrapolation for the case of J 1 . 5 In this case one can see that the extrapolations with or without logarithmic terms agree nicely. The largest deviation is found in the extrapolation of the Zeuthen flow data with term a 2 log(a) (with result 1.8309 (19)), and the a 2 linear extrapolation of the Wilson flow data (with result 1.8291 (15)). This discrepancy is less than one combined sigma. We conclude that the uncertainty in J 1 (u) is in fact dominated by the statistical uncertainty, and not by our prejudice on the unknown values of the anomalous dimensions.

Statistical uncertainties
It has been argued that a simple way to improve the scaling properties of GF couplings consists in using large values of c (see for example the discussion in [26]). Unfortunately, it is well known that this comes at the cost of increased statistical uncertainties. In this section we want to make this last statement more precise. We will present a simple model for the understanding of the scaling of the statistical uncertainties of the GF coupling and then we will show how the results of numerical simulations agree with this naive approach.
Let us first focus our discussion on schemes that fully preserve the translational invariance, like the case of periodic [4] or twisted [27] boundary conditions. The gradient flow smears the original gauge field A μ (x) over a distance d ∼ √ 8t. Due to the invariance under translations, each four dimensional ball of radius √ 8t provides an estimate of the quantity E(x, t) (see Fig. 9). Under this assumption the volume average on a lattice L 0 × L 1 × L 2 × L 3 will make the variance of the observable E(x, t) proportional to Note that in the common situation of an L 4 lattice with the same length in all directions one has F = c 4 (see Eq. (2.5)).  In schemes where the invariance under translations is broken in the time direction, like Schrödinger functional (SF) [5] or open-SF [15] boundary conditions on a box of sizes L 0 × L 1 × L 2 × L 3 , a similar argument applies, except that in these cases the coupling is only measured at a single timeslice x 0 = L 0 /2. Therefore in this case we expect a factor e.g. on a symmetric lattice F = c 3 . When do we expect this model to break down? For the volume average argument to make sense, the region that is smeared by the flow must be much smaller than the size of the lattice, so we require √ 8t L μ 1/2. If our hypothesis is correct, we expect this combination to be independent on the lattice size and on the values of √ 8t/L μ . Figure 8 shows this quantity in three data sets.
First, in orange we plot the usual magnetic coupling definition that we have been using for our non-perturbative study (Sect.   Second, in black, we have another definition of the coupling in the same datasets (in particular the lattice sizes and values of c are the same as in the previous case). The data corresponds to the coupling definition based on the space-time components of the Energy density (i.e. the average between the "magnetic" and the "electric" components). Despite the high correlation between the electric and magnetic energy densities, the average shows a significant smaller variance. In this the combination of Eq. (3.4) can also be reasonably well described by a linear function.
Finally, in blue, we have data with twisted boundary condition on an asymmetric lattice [25]. In this case the simulations are done on volumes L 2 × (L/3) 2 (see [28] for the theoretical motivation behind this particular geometry). We use data with √ 8t/L = 0.20, 0.25, . . . , 0.4 and lattice sizes L/a = 12, 24, 48. Note that in this particular case the condition equation (3.3) is flagrantly violated in the two short directions, since 3 √ 8t/L = 0.6-1.2. This may explain why this dataset shows a much larger dispersion. For this case the combination in Eq. (3.4) shows a larger dependence on details like the particular choices of L/a and √ 8t and not only onḡ 2 . Still, the variation is not large taking into account the disparity of scales (varying by several factors) involved in the data (note that naively the variance changes by more than two orders of magnitude).
It is also worth mentioning that the variance at weak coupling is very similar for the three datasets, differing by at most a factor three. Together with the observation that being a flow observable, the quantity in Eq. (3.4) has a well defined continuum limit, we can conclude that the function in Eq. (3.4) is universal. Details like the choice of boundary conditions or the choices of discretizations induce relatively small scaling violations, especially at the weakest couplings.

The high energy regime of Yang-Mills revisited
As a further test on our proposal, we will re-examine the high energy regime of Yang-Mills. Let us first recall the relevant points of the work [13]. Given that the pure gauge determination of MS has to face the very same challenges as the determination of the strong coupling α s (M Z ) in QCD, we think that revising the crucial part of the work [13] with the new method proposed in this work is fully justified. We recall that our dataset is exactly the same as the one used in [13] (see Sect. 2.2.2).

The continuum limit of J 1 and J 2
In order to obtain the functions J 1 and J 2 in the continuum, the best strategy consists in combining the continuum extrapolation with a parametrization of the function in the continuum. In particular we are going to use the parametrization Note that the coefficients c (i) n parametrize the continuum function J i (u), while the coefficients ρ (i) n parametrize the O(a 2 ) cutoff effects in the functionĴ i . There are several assumptions hidden in this parametrization. First we assume that the continuum function 1/Ĵ i (u, 0) − 1/u can be well described by a polynomial. This is certainly the case in perturbation theory to all orders 6 and we expect that the nonperturbative functions can be well described by a polynomial ansatz.
A more delicate assumption is that in our functional form of Eq. (4.2) all scaling violations are quadratic (i.e. a 2 /(8t)). First, due to the breaking of translational invariance in the Schrödinger Functional, we expect cutoff effects linear in the lattice spacing. They are expected to be small, due to the localization of the GF coupling at the timeslice x 0 = T /2. Moreover the extrapolations in Sect. 2.2.2 have completely ignored these effects, and our data in fact seem to scale like O(a 2 ) after dropping the coarser lattices. But due to the high precision of our data, these O(a) effects cannot be completely ignored, especially if we take into account the fact that our strategy uses data at large values of √ 8t/T = 0.4, where these effects are expected to be larger than in [13], where √ 8t/T = 0.3 was used. For this reason we include a generous estimate of these linear effects in the error. The details are explained in Appendix A Our data set has also higher order cutoff effects, of the form a n for n > 2, and logarithmic corrections as well [20]. The effect of these terms in our extrapolations will be estimated by changing the cuts used to fit the coefficients c

The case J 1
In the exploratory study in Sect. 2.2.2 the Zeuthen flow/improved observable discretization already displayed better scaling properties, but still we had to discard all lattices with L/a < 12. Since the Wilson flow/Clover combination would require even more stringent cuts, we will only use the improved setup to quote final results. This is confirmed by looking at Table 1, where the values in the continuum of J 1 (u, 0) are shown at a few representative values of u. As the reader can see, the effect of varying the number of fit parameters (n c and n ρ in Eq. (4.2)) is negligible. On the other hand, the cut in L/a has a small effect on the extrapolations. If lattices with L/a = 12 are included, the continuum value of J 1 seems to be systematically higher, but still compatible within errors. Also the statistical errors are smaller for these analysis. A conservative approach consists in just taking a fit with L/a ≥ 20 (i.e. a/ √ 8t < 1/4), so that 6 In fact the first two coefficients c (i) 0,1 can be computed with the help of the k i coefficients calculated in [30]: (6). c (2) 0 = −0.0461 (7), c (7), the continuum value has a larger uncertainty. Note that since the computation of J 1 does not require to double the lattice sizes, even with this stringent cut our dataset still offers more than a factor two in lattice spacing. Among these fits there is very little difference between different parametrizations. Moreover the fit quality is very similar in all cases. All in all we just choose one of these fits (n c = 3, n ρ = 2, bold in Table 1) as our final result.

The case J 2
The computation of J 2 requires to double the lattice sizes, and then our datasets offers only half the lever arm in lattice spacing for the continuum extrapolations. Our hypothesis is that the scaling violations are small for J 2 because its determination does not involve a change in renormalization scale. Our preliminary investigation of Sect. 2.2.2 has also confirmed this hypothesis. Table 2 shows that this is indeed the case. Even including the coarser lattices with L/a = 8 (corresponding to a/ √ 8t = 1/1.6), the results are in agreement within errors. It is clear that the choice of parametrization has very little effect. We just settle for one particular fit with L/a ≥ 12 (represented in bold in Table 2) that we will use for any further analysis. that has a very small uncertainty. 7 The other factor MS /μ ref was much more delicate to determine. It is precisely this last quantity that we want to determine once more with our new strategy. A first step consists in dealing with the factor μ ref .
This was defined in the scheme with c = 0.3 by the condition Since our new strategy provides the step scaling function σ (u) for c = 0.2, we must first determine the value of our couplingḡ c=0.2 (μ ref ). The procedure is completely analo- 7 The quantity √ 8t 0 μ ref was determined in two different schemes, and for each scheme, using two different strategies. All analysis resulted in completely negligible differences.   (28) gous to the determination of J 1 . We first definê We choose to fit our data to the model (4.6) The same considerations discussed in Sect. 4.1.1 apply to the determination, in the continuum, of the relation between g 2 c=0.2 (μ) andḡ 2 c=0.3 (μ). In this case, however, we expect the scaling violations to be smaller, since the change in renormalization scale is not a factor two, but only a factor 3/2.
We performed several fits, changing the number of fit parameters n c and n ρ , and using different cuts for our data, and the overall analysis results in a consistent value for g c=0.2 (μ ref ) as long as data with L/a ≥ 16 is used.
We choose to quote the result with n c = 3 and n ρ = 2 and L/a ≥ 20 Despite the high precision, the result should be actually considered conservative, as this particular fit has one of the largest uncertainties of all the combinations that we tried (see Fig. 10).

The extraction of MS
The s -parameter in the scheme defined by the couplingḡ 2 s is given by the expression Note that this expression is exact, and valid beyond perturbation theory, as long as the non-perturbative β-function, This last formula allows for a non-perturbative definition of MS , even if the MS scheme is intrinsically perturbative. All in all, the determination of MS requires the determination of the integral in Eq. (4.8) in a scheme that is non-perturbatively defined. The lower limit of the integral is zero, which requires to determine the β-function up to infinite energy. In practice this can only be achieved by a limit process. One first defines very similar to the previous function I s (ḡ s (μ)). The only difference is that the integral in Eq. (4.8) for values of the coupling smaller than g PT is determined by substituting the β s -function by its l-loop perturbative approximation β (l) (4.13) The first two coefficients b 0 = 11/(4π) 2 and b 1 = 102/(4π) 4 are scheme-independent, while the values b n for n > 1 depend on the chosen scheme. It is now clear that In practice for most finite volume schemes, the β s -function is known up to three loops, and therefore the corrections are O(g 4 PT ). The value of the coupling g PT delimits the energy region (from μ PT to ∞) where perturbation theory is used via the relation Ideally one would like to estimate the -parameter by taking the following limit Since the value of the couplingḡ 2 s (μ) runs logarithmically with μ, it is technically a challenge to probe a large range of energy scales so that the corrections O(g 2(l−1) PT ) vary substantially and the limit can be taken accurately.
Of course finite-size scaling was designed to explore such large ranges of energy scales. Starting from the scale μ ref (see Sect. 4.2.1), and with the knowledge of the step scaling function σ = J 2 • J 1 (see Sect. 4.1), one can define the sequence of couplings The energy scales reached by this procedure increase geometrically. Contact with perturbation theory can be made at each step by choosing g 2 PT = u n in Eq. (4.14), and one can indeed check that the corrections O(u 2 n ) are small and decrease as they should. For a long time, the challenge was mainly to maintain a high precision, but the most recent works [13,31] have shown that when one reaches a high precision, the corrections can be significant in some schemes even at very high energy scales.
For this reason reaching high energies alone is not enough. The limit in Eq. (4.16) has to be taken seriously and the systematics well estimated. Fortunately our dataset allows us to study the matching with perturbation theory at energy scales μ n = 2 n μ ref , (n = 0, . . . , 5) . (4.18) Moreover we will explore several options to match with perturbation theory: GF: This is just the direct application of Eq. (4.8) using g 2 PT = u n to determine I GF (ḡ(μ n )) (cf. Eq. (4.14)). Schematically: In this case the matching with perturbation theory is performed in the GF scheme at a scale μ PT = μ n = 2 n μ ref . SF: Reference [13] showed that schemes based on the GF show a very poor perturbative convergence. The same reference suggested to match non-perturbatively to the traditional SF coupling [32] with background field. The details of this matching are explained in appendix B. Schematically: In this case matching with perturbation theory is performed in the SF scheme at a scale μ PT = 0.3 × μ n+1 = 0.3 × 2 n+1 μ ref .

MS:
One can convert the values of the SF coupling to the MS scheme using the perturbative relation [33]  Once the value of the coupling in the MS scheme is known, one can use the known 5-loop β-function [34] to determine directly MS . Even if the running is known much more accurately in the MS scheme, this procedure carries the same parametric uncertainty O(ḡ 4 MS ) as the others, since the limiting factor is represented by the known orders in the perturbative relation between couplings, Eq. (4.19) (see [31]). Schematically we have In this case the scale of matching with perturbation theory is performed in the SF scheme at a scale μ PT = s 0.3 × 2 n+1 μ ref , but the RG evolution is done in the MS scheme. The value of s is in principle arbitrary, but if taken too large the perturbative coefficients of Eq. (4.20) become large, and one expects a bad asymptotic convergence of the perturbative series. We will explore two choices: first the simple s = 1, and then the value s = 2, that is very close to the value of fastest apparent convergence. 8 The values of MS /μ ref can be multiplied by the factor √ 8t 0 μ ref (cf. Eq. (4.3)) to produce the results for √ 8t 0 MS reported in Table 3 according to the different procedures. In the next section we will comment on the results.

Results and discussion
We refer the reader once more to Table 3. The values for √ 8t 0 MS differ for the different treatments of perturbation theory. There are two important points worth mentioning: Table 3 Sequence of couplings in different schemes and at different scales (μ n = 2 n μ ref ) and the corresponding values of √ 8t 0 MS (see text for more details). The values ofḡ(μ n ) are obtained via a recursive application of the step scaling function in the GF scheme (Eq. (4.17)). The conversion from the GF scheme to the SF scheme is performed non-perturbatively and detailed in Appendix B. The conversion to the MS scheme is done by using the perturbative relation with the SF scheme (Eq. (4.19)). The last two rows show possible extrapolations of √ 8t 0 MS using the last four values (n = 2, 3, 4, 5). We show both an extrapolation linear in theḡ 4 and a extrapolation to a constant for the cases that this behavior is compatible with the data 1. Even at scales whereḡ 2 ≈ 1 (corresponding to α ≈ 0.08), different treatments of perturbation theory produce values of √ 8t 0 MS that vary as much as 3%. 2. There are two particular treatments of perturbation theory (labeled SF and MS(s = 2)), where the value of √ 8t 0 MS is constant within errors when extracted over a range of energy scales that vary by a factor 32.
These results are also plotted in Fig. 11. Qualitatively we see that the variations between different treatments of perturbation theory roughly scale as expected (i.e. decrease proportionally to α 2 ).
A more quantitative picture is obtained by looking at the two last rows of Table 3. They show possible extrapolations of the quantity √ 8t 0 MS (see Eq. (4.16)): the deviation from the final result of Ref. [13] √ 8t 0 MS = 0.6227(98) of any of the possible extrapolations is below the statistical uncertainties (about 1.5%). This is half of the differences present at scales whereḡ 2 ≈ 1.
All extrapolations g PT → 0 agree well (last two rows of Table 3). In particular even the extrapolations that assume that the higher order terms proportional to g 4 PT are negligible show quite a small uncertainty. Still, the error band covers the central values of all other extrapolations. Note however that the size of the uncertainties depends strongly on how much data one decides to include. A very conservative approach (such as the one used in Ref. [13]) would consist in just quoting as final result the value at the most perturbative point. This is justified since the data labeled SF and MS (s = 2) shows basically no dependence on the value of g PT .
Note however that the methodology in these two works is very different. In particular they deal with the systematic associated with the continuum extrapolations in a very different way. Reference [13] uses the GF coupling with c = 0.3 in order to perform the non-perturbative running. On the other hand we use the step scaling function with c = 0.2, determined as the composition of the functions J 1 and J 2 as described in Sect. 2.2 order to do the non-perturbative running. Even with the highly conservative approach that we used in Sect. 4.1 to perform the continuum extrapolations of J 1 andĴ 2 , we find a final uncertainty on √ 8t 0 MS of the same size of the one obtained in Ref. [13]. Moreover, the fact that the central values are in perfect agreement in the two calculations provides very strong evidence that the systematic effects associated with the continuum extrapolations are completely under control, and well below our statistical uncertainties.
Finally, the extrapolations that assume a linear dependence in α 2 PT show larger uncertainties. Still, it is very important to see that the g PT → 0 extrapolation substantially improves the agreement between all treatments of the perturbative matching.
It is also worth mentioning that the propagation of the linear O(a) effects (see appendix A) represent about a 15% of the final error squared in √ 8t 0 MS . This is about a 50% larger than in the extraction of Ref. [13] and can be understood noting that making use of values of the coupling at c = 0.4, we increase the boundary effects.
All in all our approach shows a remarkable agreement between very different treatments of the matching with perturbation theory, and thanks to our new proposal, we are able to also show a very good agreement with previous works that have rather different systematics associated with the continuum extrapolation.

Scale uncertainties
The approach labeled MS in Sect. 4.2.2 is very close to many phenomenological extractions of the strong coupling. The value ofḡ 2 MS is extracted from a measurement, in this case from the value of the SF coupling obtained in a simulation, thanks to its perturbative expansion Different renormalization scales sμ can be used for each value of the physical scale μ. The differences between different renormalization scales are an estimate of the truncation errors (i.e. an estimate of the O(g 2(l−1) PT ) effects in Eq. (4.14)). In particular, in phenomenology, it is very common to vary the renormalization scale a factor two above/below some chosen value. Figure 12 shows such estimate of the uncertainties propagated to the -parameter. δ is obtained by varying the renormalization scale of a factor two above/below the scale of fastest apparent convergence (i.e. the average difference between the values of obtained after using s = 1 and s = 2 and then s = 2 and s = 4).
From Fig. 12 it is clear that that scales uncertainties are rather large in the pure gauge theory. 9 Even at α ≈ 0.1, corresponding to the highest scales reached in our study, they are around 2%. One might question the results of our works, that claim a significantly smaller uncertainty. The key to claim smaller errors than the scale uncertainties lies in the limit definition of , Eq. (4.16). Once the limit g PT → 0 is properly taken, and its systematic estimated, one does not need to talk about the uncertainties at non-zero g PT . Of course taking such a limit is hard: data at different values of g PT is required. Due to the logarithmic running of the coupling with the physical energy scale, the apparently innocent limit of Eq. (4.16) requires to solve a hard multi-scale problem. Even with our datasets, that spans a factor 32 in energy scales (a change in the coupling g 2 PT by more than a factor two), we have seen that some assumptions on the scaling as g PT → 0 are needed in order to reach the 1.4% precision on . We consider our approach to treat perturbative uncertainties very conservative. Still, future works in the pure gauge theory might want to explore even larger energy scales.

Conclusions
In this work we have examined the main sources of uncertainties present in finite-size scaling studies using the Gradient Flow: the continuum extrapolation and the statistical uncertainties. We have argued that scaling violations are a result of exploring changes in the flow time. This observation has been supported both by a perturbative study and by non-perturbative numerical results.
The determination of the step scaling function σ (u), the crucial observable in step scaling studies, involves both a change in the flow time and a change in the size of the system. We propose to divide the determination of σ (u) in two pieces: first, a change in the renormalization scale at a constant size of the system (the function J 1 ), followed by a change of the size of the system at constant renormalization scale (the function J 2 ). The advantage is that, according to our hypothesis, only the first step shows significant cutoff effects. By breaking up the determination in two pieces, the scaling violations can be studied much more accurately. Modest datasets allow to explore a change of the renormalization scale at constant physical volume with lattice spacings varying by factors 4-6. In Sect. 2.2 we have seen that this strategy usually comes at the cost of larger statistical uncertainties, especially in schemes like the Schrödinger Functional that break translation invariance and have to deal with O(a) systematic effects. Thus, the proposal trades the large systematic associated with the continuum extrapolations present in many GF studies with a statistical uncertainty. Since the latter are much easier to control, we think that the proposed strategy shows a clear advantage. In Sect. 3 we have shown that statistical uncertainties can be well understood and predicted with a simple model.
We think that this strategy can shed some light in many problems that are currently being studied where systematic effects of the continuum extrapolation are relevant (see for example the recent discussion in [7]). A detailed study of the scaling violations of J 1 , that according to our hypothesis are very similar to those of the step scaling function σ , should become a standard way to assess the quality of the continuum determination of the step scaling function. This is specially relevant for studies of the conformal window, since much less is known about the logarithmic corrections to scaling in These models, and as we have shown in Sect. 2.2.2 they can have a large effect in the extrapolations.
We have also re-examined the determination of theparameter in the pure gauge theory. The most crucial step is the high energy region and the matching with the asymptotic perturbative regime. We have used the step scaling function with c = 0.2, determined using our new proposal. The matching with perturbation theory is performed in different schemes and using different procedures. Our datasets allows to match with perturbation theory at energy scales μ PT where α(μ PT ) 0.1. Even at this large energy scales the perturbative truncation effects are large, corresponding to about a 2% uncertainty in . The size of this uncertainty is also confirmed by a scale variation analysis. All in all, perturbative errors are large in the pure gauge theory, making the determination of rather challenging in this aspect, in particular when compared with the corresponding determination in QCD. 10 Fortunately, our dataset explores a large range of energy scales, and gives us the possibility to explore the limit α(μ PT ) → 0 (corresponding to μ PT → ∞). Once this limit is properly taken, the perturbative uncertainties at α(μ PT ) estimated using scale variation or any other procedure are irrelevant. Of course taking such a limit is very challenging. The corrections, O(α 2 (μ PT )), decrease very slowly due to the logarithmic running of the coupling at high energies.
Taking the limit α PT → 0 is very challenging in a large volume setup due to the finite range of scales that any lattice simulation can probe. This might explain the difference with some precise results in large volume [29], although a more detailed study is necessary.
Depending on the assumptions made in the extrapolation α(μ PT ) → 0, the uncertainty in varies in the range 1−2%. All in all, our results show a perfect agreement with the final result of [13] ( √ 8t 0 × MS = 0.6227(98)), obtained with c = 0.3, that quotes an uncertainty ≈ 1.4%. We stress once more that the method that we used in this work provides a careful control on the continuum extrapolations, leading to a final uncertainty on √ 8t 0 × MS of the same size of [13] even when using a very conservative approach. Due to the different treatments of perturbation theory and the use of different schemes, it seems clear that, despite some discrepancies with other works (see discussion in [13]), the systematic effects in [13] are well under control. Data Availability Statement This manuscript has no associated data or the data will not be deposited. [Authors' comment: The data used is the same as in reference [13]. The data will be made available upon request.] Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/. Funded by SCOAP 3 .

Appendix A: Boundary O(a) effects
The procedure to estimate the O(a) cutoff effects is completely analogous as in [13]. In fact we use the same datasets. We determine numerically the dependence of the GF coupling at different values of c = 0.2, 0.3, 0.4 with the boundary parameter c t . This is done using simulations at different values of the improvement parameter c t close to its 2-loop value  Table 4 shows the result of the parameters a 0 (c), a 1 (c). Now we need an estimate of how much the true, nonperturbative, value of c t differs from its 2-loop value. There is no information available, but the extrapolations at constant u of Sect. 2.2.2 completely ignored this linear effects and the data showed no significant deviation from a O(a 2 ) scaling. This suggests that these effects are smaller than our statistical uncertainties. With this insight, a conservative approach consists in using the full 2-loop contribution as an estimate of the difference between the 2-loop value and the correct non-perturbative one.
In summary we add to all our data, in quadratures, the error

Appendix B: Matching with the SF scheme
Since the strategy is the same as the one used in [13], we refer the interested reader to this reference for more details.
Here it is enough to say that we performed a set of SF simulations with background field and lattice sizes L/a = 6, 8, 10, 12, 16 and at the same values of the bare coupling g 2 0 = 6/β as the available measurements of the GF coupling in twice the lattice size L/a = 12, 16, 20, 24, 32. We collect between 9 × 10 5 and 2 × 10 6 measurements of the SF couplingḡ 2 SF . We further remove all cutoff effects up to 2 loops fromḡ 2 SF (i.e. the leading cutoff effects are O(g 6 0 )). The non-perturbative data is fitted to a functional form of the type To quote all our results we use the same fit used in Ref. [13], where the coarser lattice is dropped from the fit and the functions f (u),ρ(u) are degree two polynomials.