An analysis of systematic effects in finite size scaling studies using the gradient flow

We propose a new strategy for the determination of the step scaling function $\sigma(u)$ in finite size scaling studies using the Gradient Flow. In this approach the determination of $\sigma(u)$ is broken in two pieces: a change of the flow time at fixed physical size, and a change of the size of the system at fixed flow time. Using both perturbative arguments and a set of simulations in the pure gauge theory we show that this approach leads to a better control over the continuum extrapolations. Following this new proposal we determine the running coupling at high energies in the pure gauge theory and re-examine the determination of the $\Lambda$-parameter, with special care on the perturbative truncation uncertainties.


Introduction
In recent years the gradient flow [1,2] (GF) has found many successful applications. In particular, when combined with the ideas of finite-size scaling [3], it provides a powerful tool for the determination of the running coupling in asymptotically-free strongly-coupled gauge theories [4,5,6]. The applications include the determination of the strong coupling in QCD, and the study of near conformal systems (see [7] for a recent review).
Coupling definitions based on the gradient flow have some properties that make them very attractive. First, the relevant observables show a small variance. A modest numerical effort allows their determination with a sub-percent precision. Second, the gradient flow coupling is given directly as an expectation value. Its determination only involves the numerical integration of the flow equations, something that in practice can be done with arbitrary precision without having to perform any fits, or taking any limit. This means that finite-size scaling studies using the GF only have one source of systematic effect: the continuum extrapolation.
Nevertheless, it has been shown that these systematic effects are difficult to keep under control (see [8] for a review). It was soon noticed [9] that seemingly innocent extrapolations could cause large systematic effects. Although the same work suggested to simply use larger flow times as a way to have a better control of the continuum extrapolation, this comes with an increase in the variance of the observable. One of the strong points of GF studies, the high statistical precision, had to be sacrificed. Some efforts were made in order to understand the anatomy of these cutoff effects, first in a perturbative context to tree level [10], and later more systematically from an effective field theory point of view [11]. It became clear that the integration of the flow equations and the evaluation of the relevant observables at positive flow time could be performed in a way that no O(a 2 ) effects are produced: it is enough to use classically improved discretizations [11]. Still, the use of these improved observables did not reduce substantially the observed scaling violations (see for example [12]). Unavoidably large lattices have to be simulated in GF studies, and the continuum extrapolation will remain the main source of concern.
In this work we study the main sources of systematic effects in finite-size scaling studies using the GF. We will argue that changes in the flow time t are responsible for the scaling violations. On the contrary, when different finite volume couplings are determined at the same value of the flow time t, the scaling violations are very small. This observation will be supported by a non-perturbative study. Moreover it will allow us to propose a new strategy for the determination of the step scaling function by breaking it up into two pieces: first a change in the flow time, without any change in the volume, second a change in the volume without any change in the flow time. Since the first step can be performed without having to double the lattices, the continuum extrapolation can be performed much more accurately. We will discuss in detail the advantages of this approach. Finally we will apply this alternative strategy to the case of pure gauge SU(3). Using the same datasets of [13], we will revisit the most crucial part of this work: the running at high energies and the matching with the asymptotic perturbative regime.
We also include an analysis of the variance of flow observables, allowing us to predict the dependence of the statistical uncertainties with the flow time t.

Preliminaries
The gradient flow provides a set of renormalized observables with small variance that are easy to compute via numerical simulations. The key idea consists in adding an extra time coordinate to the gauge field, the flow time t. The dependence of the gauge field B µ (t, x) on the flow time is given by L √ 8t Figure 1: The GF coupling in finite volumeḡ 2 (µ) is measured by computing the action density of the flow field B µ (t, x) smeared over a distance √ 8t = µ −1 (see eq. (2.3)). The renormalization scale µ and the size of the system L are linked by the relation √ 8t = cL (with c a constant).
the first order diffusion equation (2. 2) The flow time has dimensions of length squared, and therefore introduces a scale into the problem (see figure 1). Gauge-invariant composite operators defined at positive flow time are automatically renormalized [14]. In particular, a renormalized coupling at scale µ = 1/ √ 8t can be defined by using the action density [2] g 2 (µ) The coupling defined using E = E m is usually referred as magnetic and the one using E = E e as electric.
Most applications of the gradient flow until today derive from these coupling definitions. In infinite volume the scale µ 0 = 1/ √ 8t 0 at which the couplingḡ 2 (µ 0 ) takes some pre-defined value is used to define the reference scale t 0 . In the context of finite-size scaling, the renormalization scale µ = 1/ √ 8t is linked with the linear size of the system and therefore the coupling "runs" with the size L (see fig 1). The constant c defines a scheme by fixing the ratio of the length of the system and the flow time scale √ 8t. The full definition of the GF coupling in finite volume readsḡ The constant N (c) has been determined for different choices of boundary conditions [4,5,6,15]. When computing the gradient flow coupling on the lattice there are several choices to make, beyond the action chosen for the simulation. First, the flow equation (2.1) has to be translated to the lattice. Second, there are many valid discretizations of the energy density E(t, x). It is well understood that using the Zeuthen flow to integrate the flow equations and using any classically improved discretization of the action density E(t, x) guarantees that no further O(a 2 ) effects are generated. The only remaining O(a 2 ) effects are those of the choice of action for the simulation and an additional counterterm, only affecting flow quantities [11].
On the lattice, where we can work only with dimensionless quantities, the finite volume gradient flow coupling (equation (2.6)) can be measured by determiningḡ 2 (µ) at the flow time (in lattice units) √ 8t/a = cL/a. It is also common to use a lattice version of the normalization factor N (c, a/ √ 8t) instead of the N (c) of eq. (2.6), in order to ensure that the leading order perturbative relation is exact for any lattice size.
The key character in finite-size scaling studies is the step scaling function It measures how much the coupling changes under a variation in the renormalization scale of a factor two 1 . Its determination in lattice simulations is performed via a matching of the bare parameters. At a fixed value of the bare coupling g 2 0 (and therefore fixed value of the lattice spacing a) 2 , one determines the GF coupling on a lattice of size L/a (resulting inḡ 2 (µ)), and on a lattice 2L/a (resulting in g 2 (µ/2)). This allows the determination of a lattice approximation of the step scaling function: In the rest of this section we will be concerned with the continuum extrapolation of Σ(u, a/ √ 8t); before that however, let us insist on two points: • The direct determination of Σ, as suggested above, requires the determination of the GF couplings at two different renormalization scales µ and µ/2, on lattices of two different sizes L/a and 2L/a.
• It is clear that once c (eq. (2.5)) is fixed, taking a/L → 0 is equivalent to taking a/ √ 8t → 0, so the usual notation that uses a/L as the variable to parametrize cutoff effects is fully justified. Nevertheless, it is a/ √ 8t = aµ the natural variable to measures the size of cutoff effects. Note that for the typical choices of c ∈ [0.2 − 0.5] we have scale √ 8t < L. With this choice of the renormalization scale, it is clear that, at fixed L/a, larger values of c would lead to smaller cutoff effects. This notation will also be convenient for the discussion that follows.

A new strategy for the determination of σ(u)
As already pointed out, the determination of the lattice step scaling function involves two steps: a change in the renormalization scale by a factor two, and change in the lattice size by the same factor. σ J 1 J 2 Figure 2: The determination of the step scaling function σ involves a change in the renormalization scale and in the size of the system of a factor two. These two steps do not need to be performed at the sime time. The figure shows that σ can be determined as the composition of the function J 1 that changes the renormalization scale µ → µ/2 at fixed size L, with the function J 2 that changes the size L → 2L at fixed renormalization scale µ.
In previous works these two changes have been performed at the same time in a single step, but conceptually there is no need to do so. Figure 2 shows that the value of the step scaling function σ(u) can be determined by the composition of two functions. First we have This function changes the renormalization scale by a factor two µ → µ/2 at fixed physical size L. Second, we need to determine This function changes the lattice size L → 2L keeping constant the renormalization scale is now exact, and provides an alternative method to determine σ.
Our main assumption is that large scaling violations come with changes in the renormalization scale. We will later provide evidence that this is the case, but for the moment let us discuss why this opens up the possibility to improve the quality of the continuum extrapolations. Let us start by explaining how these functions are computed in practice: • The determination of J 1 involves measuring how much the coupling changes when the renormalization scale is varied as µ → µ/2 at constant physical size L. This is simply achieved by measuring on a lattice simulation the value of the GF coupling at two different flow times (i.e. t → 4t, see eq. (2.5)). Crucially, this determination does not require to double the lattice size, allowing precise results without the need of very large values of L/a.
• The determination of J 2 requires to change the physical size L without varying the renormalization scale. In practice one fixes the bare coupling g 2 0 at a given value, and then measures the GF coupling on a L/a lattice at flow time √ 8t/a = cL/a and on a 2L/a lattice at the same t. This step requires to change the lattice size, but since the renormalization scale remains the same, one expects reduced cutoff effects. In some sense the determination of J 2 corresponds to measuring the finite volume effects inḡ 2 c (µ) (see figure 2). In the rest of this section we will provide evidence of these statements, but before that let us further comment on two points: • The determination of the functions J 1 and J 2 can be done on the lattice just by applying the definitions equation (2.10) and (2.11). Note however that in this case the functions will carry a dependence on the cutoff. We will label these functionsĴ 1 (u, a/ √ 8t) andĴ 2 (u, a/ √ 8t) respectively.
• In this work all numerical results make use of the same discretizations used in [13]. We encourage the reader to consult this reference for more details. Here it is enough to say that we define the GF coupling with SF boundary conditions and that our preferred setup, based on theoretical expectations, uses the Zeuthen flow and an improved definition of E(t, x). This preferred setup will also be compared with the more common combination Wilson flow/Clover observable.
Moreover, we will focus in the rest of the text on the magnetic definition of the GF coupling (see equations (2.4)) 3 . Of course, our discussion is general, and does not depend on these particular choices.

Leading order perturbation theory
As a first look at the proposal we examine the leading order perturbative relation. We use the continuum norm N (c) in the evaluation of the finite volume couplingsḡ 2 c and examine, to leading order in perturbation theory, the quantitieŝ Note that since we are working at leading order inḡ 2 , and thanks to the normalization by the constant factor u, all these quantities are one in the continuum. In this example we will examine a typical case where we consider data for L/a = 8, 10, 12, 16, 18, 20, 24, 32, 36, 40, 48 . (2.14) We will use c = 0.2 (see eq. (2.5)). Let us note a few basic points: • The determination ofĈ 0 (a/ √ 8t) andĈ 2 (a/ √ 8t) requires to double the lattice size. This means that with our data, lattice estimates for these functions will be available only for a factor 3 change from the finest to the coarsest lattice spacings: 8 → 16, 10 → 20, 12 → 24, 16 → 32, 18 → 36, 20 → 40, 24 → 48.
• On the other hand, the determination ofĈ 1 (a/ √ 8t) only requires the measurement of the GF coupling at different values of the flow time t. This can be done on all lattices, and our dataset provides a factor 6 change from the coarsest (L/a = 8) to the finest (L/a = 48) lattice spacing. Figure 3 shows the perturbative results. As the reader can see, the cutoff effects inĈ 0 (a/ √ 8t) are very similar to those inĈ 1 (a/ √ 8t). This can be understood in a simple way, since both these functions involve a change in the renormalization scale by a factor two. The main difference between both cases is thatĈ 1 (a/ √ 8t) can be determined using lattice spacings that are a factor two smaller, since its determination does not require any change in the lattice size. The determination ofĈ 2 (a/ √ 8t) does not involve any change in the renormalization scale, and it shows cutoff effects that are one order of magnitude smaller than eitherĈ 0 (a/ √ 8t) orĈ 1 (a/ √ 8t). In the next section we will show that indeed these properties hold non-perturbatively, and that they are not a coincidence of leading order perturbation theory. C 0 (a/L)Ĉ 1 (a/L)Ĉ 2 (a/L) Figure 3: Cutoff effects in the usual step scaling function (Ĉ 0 (a/ √ 8t)), compared to those in J 1 (seê C 1 (a/ √ 8t)) and J 2 (seeĈ 2 (a/ √ 8t)). See text for more details. Here we show the case of the Zeuthen flow/improved observable with plaquette gauge action (i.e. the same setup that will be used in our nonperturbative study).

Non-perturbative study
In this section we will describe the non-perturbative results used in this study, first to support the claim that the numerical determination ofĴ 2 has very small scaling violations, due to the fact that the renormalization scale is not changed (i.e. the determination ofĴ 2 amounts to measuring finite volume effects in the coupling). Then, we want to show that the determination of J 1 can be performed accurately even at values of c that are too small to allow for a conservative estimate of the step scaling function.
All the analysis have been performed using two different analysis codes: one [16] based on the Γ-method [16,17,18,19], and the other using a jackknife resampling technique.

Description of the data set
For our non-perturbative study we are going to use exactly the same dataset of [13]. This setup includes simulations of the pure gauge theory with the Wilson plaquette action on a lattice of size L 4 and lattice spacing a. We have several resolutions, L/a = 8, 10, 12, 16, 20, 24, 32, 48, at a large range of lattice spacings a and with Schrödinger Functional (SF) boundary conditions. The setup is the same as the one used for the perturbative study in section 2.2.1.
We have measurements of the GF coupling at values of c = 0.2, 0.3, 0.4 with two different discretizations: the usual Wilson flow/Clover observable and the Zeuthen flow/improved observable (more details can be found in [13]). Our target will be to determine the running non-perturbatively in the scheme defined by c = 0.2 by computing the associated step scaling function σ(u). Note that in reference [13] the value c = 0.3 was used because the large scaling violations at c = 0.2 did not allow for a determination of σ(u).
We will revisit this attempt at a direct determination of σ(u) here. Moreover, together with the data at c = 0.4, we will be able to determine both J 1 , J 2 , and compare their composition with the  direct determination of σ.
Finally, the data with c = 0.3 will be used in section 4 to compare the results of [13] (where the Λ parameter is obtained by using a direct determination of the step scaling function with c = 0.3) with our new strategy.
We will focus our investigations on the high energy regime, whereḡ 2 ∼ 1 − 3. Note that this region showed significant scaling violations, and in fact turns out to be the most delicate part of the analysis in the extraction of Λ (see [13] for more details).

Scaling violations in J 2
Let us start by investigating the scaling violations of J 2 . It is convenient to study the combination The continuum limit of the right hand side, f 2 (u, 0), has an asymptotic expansion (in perturbation theory) as a polynomial in u, starting with a constant term. Note that our data set allows to determinê J 2 (u, a/ √ 8t) for a factor three change in a/ √ 8t. The numerical raw data forĴ 2 (u, a/ √ 8t) is shown in figure 4a. We also include in the plot the continuum extrapolation. At this point we defer the discussion on how this continuum curve is determined to section 4, and focus on the key element: the non-perturbative data forĴ 2 show very small scaling violations for all values of the coupling under study. The continuum curve is at most two standard deviations away from the coarser lattice data ( √ 8t/a ≈ 1.6), and the two finest lattices with √ 8t/a ≈ 3.2, 4.8 show no significant deviation from the continuum value. One can look in more detail at the previous statement by interpolating the data with different L/a to a common value of u, and then look at the continuum extrapolation ofĴ 2 . We choose the value u = 1.5, where we have several points at each L/a, and therefore the necessary interpolations can be performed in the safest conditions available. Figure 4b shows that the Zeuthen flow data shows no significant scaling violations in the whole range of lattice spacing. The Wilson flow data show some scaling violations, but they are rather mild, with the finest lattice being almost compatible with the continuum value. Extrapolations of the data with both discretizations are in full agreement with each other.

Scaling violations in
(a) Results for the combination of eq. (2.16) and continuum extrapolation from global fit. (b) Results for theĴ 1 (u, a/ √ 8t)/u ratio for u = 1.5 for different discretizations. The continuum extrapolation is also included. Once more, it is convenient to study the quantity The crucial difference with the previous case is that the determination of J 1 involves a change in renormalization scale µ → µ/2, so we expect significant scaling violations. On the other hand, its determination does not require to double the lattice sizes. In practice we have a range in lattice spacing that spans a factor 2 further. Figure 5a shows a comparison of the raw data with the continuum curve (see section 4 for a discussion on its determination). In contrast with the case ofĴ 2 , we observe significant scaling violations, confirming our hypothesis that such violations are mainly a result of changes in the renormalization scale. Moreover, they show a complicated functional form: the three coarser lattices with √ 8t/a = 1.6, 2.0, 2.4 do not show a monotonous pattern. The data is several standard deviations away from the continuum curve. Figure 5b shows again the continuum extrapolation of J 1 (u) at the fixed value u = 1.5. The plot confirms that scaling violations are significant, with the different discretizations based on the Wilson/Zeuthen flow showing differences of several standard deviations for √ 8t/a < 3.2. Still, one can obtain an accurate extrapolation of J 1 . In order to do so, the large range of lattice spacings available to us, from L/a = 8 to L/a = 48, is crucial. This is of course possible only because the determination of J 1 does not require to double the lattice size. Figure 5a shows that the two finest lattices are in agreement with the continuum curve. Note however that these are very fine lattices with √ 8t/a ≈ 6.4, 9.6.
A comparison with a direct determination of σ 0.2 (u) Finally, let us compare our new strategy with the direct determination of the step scaling function σ c=0.2 . As an illustration we choose the value u = 1.5 as in the previous cases. Figure 6 shows that the data for the direct determination of σ c=0.2 (u) is more precise, but it also displays large scaling violations, similar to the case of J 1 . However, since the lattice size has to be doubled, we lack the data at √ 8t/a = 9.6, 6.4 and the finest lattice has √ 8t/a = 4.8, a factor two smaller. The final uncertainty in the continuum value of σ c=0.2 (u) is very difficult to assess due to the missing data at finer lattice spacing. This findings are consistent with earlier works: reference [12] already showed that controlling the continuum extrapolation is difficult if one relies on data with √ 8t/a ≤ 2.4. Here we see that  Orange symbols are for the "magnetic" definition of the GF coupling; black symbols are for the average of "magnetic" and "electric" definitions of the GF coupling; the blue symbols are from the dataset of ref. [30].
there is poor agreement between the continuum value obtained by the two different discretizations of the gradient flow with respect to the excellent agreement found forĴ 1 andĴ 2 in figures 5 and 4 respectively. Despite the higher statistical accuracy, the large systematic effects overshadow the better statistical precision.

Statistical uncertainties
It has been argued that a simple way to improve the scaling properties of GF couplings consists in using large values of c (see for example the discussion in [20]). Unfortunately, it is well known that this comes at the cost of increased statistical uncertainties. In this section we want to make this last statement more precise. We will present a simple model for the understanding of the scaling of the statistical uncertainties of the GF coupling and then we will show how the results of numerical simulations agree with this naive approach. Let us first focus our discussion on schemes that fully preserve the translational invariance, like the case of periodic [4] or twisted [21] boundary conditions. The gradient flow smears the original gauge field A µ (x) over a distance d ∼ √ 8t. Due to the invariance under translations, each four dimensional ball of radius √ 8t provides an estimate of the quantity E(x, t) (see figure 8). Under this assumption the volume average on a lattice L 0 × L 1 × L 2 × L 3 will make the variance of the observable E(x, t) proportional to Note that in the common situation of an L 4 lattice with the same length in all directions one has F = c 4 (see equation (2.5)). This gives a quantitative explanation to the fact that the statistical uncertainties are large at large values of c.
In schemes where the invariance under translations is broken in the time direction, like Schrödinger Functional (SF) [5] or open-SF [15] boundary conditions on a box of sizes L 0 × L 1 × L 2 × L 3 , a similar argument applies, except that in these cases the coupling is only measured at a single time-slice x 0 = L 0 /2. Therefore in this case we expect a factor e.g. on a symmetric lattice F = c 3 . When do we expect this model to break down? For the volume average argument to make sense, the region that is smeared by the flow must be much smaller than the size of the lattice, so we require Note that for the case of an L 4 lattice this condition just means c 0.5. The typical values used in the literature are c = 0.2 − 0.4, so we can only expect the scaling of the variance to be approximate. In order to see how good this approximation is, it is useful to have a look at the quantity If our hypothesis is correct, we expect this combination to be independent on the lattice size and on the values of √ 8t/L µ . Figure 7 shows this quantity in three data sets. First, in orange we plot the usual magnetic coupling definition that we have been using for our non-perturbative study (section 2. Note that, despite the fact that the lattice size changes by a factor of four, and that the values of c change by a factor two, the combination in equation (3.4) shows a very mild variation in all the range of couplingsḡ 2 = 1 − 4. The plot shows some variation, but to a reasonably good approximation we can say that Q(ḡ 2 ) ≈ 0.25. An even better description of the data can be obtained by using a simple linear approximation.
Second, in black, we have another definition of the coupling in the same datasets (in particular the lattice sizes and values of c are the same as in the previous case). The data corresponds to the coupling definition based on the space-time components of the Energy density (i.e. the average between the "magnetic" and the "electric" components). Despite the high correlation between the electric and magnetic energy densities, the average shows a significant smaller variance. In this the combination of equation (3.4) can also be reasonably well described by a linear function.
Finally, in blue, we have data with twisted boundary condition on an asymmetric lattice [30]. In this case the simulations are done on volumes L 2 × (L/3) 2 (see [22] for the theoretical motivation behind this particular geometry). We use data with and not only onḡ 2 . Still, the variation is not large taking into account the disparity of scales (varying by several factors) involved in the data (note that naively the variance changes by more than two orders of magnitude).
It is also worth mentioning that the variance at weak coupling is very similar for the three datasets, differing by at most a factor three. Together with the observation that being a flow observable, the quantity in equation (3.4) has a well defined continuum limit, we can conclude that the function in equation

The high energy regime of Yang-Mills revisited
As a further test on our proposal, we will re-examine the high energy regime of Yang-Mills. Let us first recall the relevant points of the work [13]. • Most of the error in the dimensionless ratio comes from the first piece (i.e. the high energy part). The total uncertainty in Λ MS × √ 8t 0 is 1.57%, while the uncertainty in Λ MS /µ ref is already 1.37%.
Given that the pure gauge determination of Λ MS has to face the very same challenges as the determination of the strong coupling α s (M Z ) in QCD, we think that revising the crucial part of the work [13] with the new method proposed in this work is fully justified. We recall that our dataset is exactly the same as the one used in [13] (see section 2.2.2).

The continuum limit of J 1 and J 2
In order to obtain the functions J 1 and J 2 in the continuum, the best strategy consists in combining the continuum extrapolation with a parametrization of the function in the continuum. In particular we are going to use the parametrization Note that the coefficients c (i) n parametrize the continuum function J i (u), while the coefficients ρ (i) n parametrize the O(a 2 ) cutoff effects in the functionĴ i . There are several assumptions hidden in this parametrization. First we assume that the continuum function 1/Ĵ i (u, 0) − 1/u can be well described by a polynomial. This is certainly the case in perturbation theory to all orders 4 , and we expect that the non-perturbative functions can be well described by a polynomial ansatz. A more delicate assumption is that in our functional form of equation (4.2) all scaling violations are quadratic (i.e. a 2 /(8t)). First, due to the breaking of translational invariance in the Schrödinger Functional, we expect cutoff effects linear in the lattice spacing. They are expected to be small, due to the localization of the GF coupling at the timeslice x 0 = T /2. Moreover the extrapolations in section 2.2.2 have completely ignored these effects, and our data in fact seem to scale like O(a 2 ) after dropping the coarser lattices. But due to the high precision of our data, these O(a) effects cannot be completely ignored, especially if we take into account the fact that our strategy uses data at large 4 In fact the first two coefficients c 1 = −0.0063 (6) . c (2) 0 = −0.0461 (7) , c  values of √ 8t/T = 0.4, where these effects are expected to be larger than in [13], where √ 8t/T = 0.3 was used. For this reason we include a generous estimate of these linear effects in the error. The details are explained in Appendix A.
Our data set has also higher order cutoff effects, of the form a n for n > 2, and logarithmic corrections as well [25]. The effect of these terms in our extrapolations will be estimated by changing the cuts used to fit the coefficients c

The case J 1
In the exploratory study in section 2.2.2 the Zeuthen flow/improved observable discretization already displayed better scaling properties, but still we had to discard all lattices with L/a < 12. Since the Wilson flow/Clover combination would require even more stringent cuts, we will only use the improved setup to quote final results. This is confirmed by looking at table 1, where the values in the continuum of J 1 (u, 0) are shown at a few representative values of u. As the reader can see, the effect of varying the number of fit parameters (n c and n ρ in equation (4.2)) is negligible. On the other hand, the cut in L/a has a small effect on the extrapolations. If lattices with L/a = 12 are included, the continuum value of J 1 seems to be systematically higher, but still compatible within errors. Also the statistical errors are smaller for these analysis. A conservative approach consists in just taking a fit with L/a ≥ 20 (i.e. a/ √ 8t < 1/4), so that the continuum value has a larger uncertainty. Note that since the computation of J 1 does not require to double the lattice sizes, even with this stringent cut our dataset still offers more than a factor two in lattice spacing. Among these fits there is very little difference between different parametrizations. Moreover the fit quality is very similar in all cases. All in all we just choose one of these fits (n c = 3, n ρ = 2, bold in table 1) as our final result.

The case J 2
The computation of J 2 requires to double the lattice sizes, and then our datasets offers only half the lever arm in lattice spacing for the continuum extrapolations. Our hypothesis is that the scaling violations are small for J 2 because its determination does not involve a change in renormalization scale. Our preliminary investigation of section 2.2.2 has also confirmed this hypothesis. Table 2 shows that this is indeed the case. Even including the coarser lattices with L/a = 8 (corresponding to a/ √ 8t = 1/1.6), the results are in agreement within errors. It is clear that the choice of parametrization has very little effect. We just settle for one particular fit with L/a ≥ 12 (represented in bold in table 2) that we will use for any further analysis.  Since our new strategy provides the step scaling function σ(u) for c = 0.2, we must first determine the value of our couplingḡ c=0.2 (µ ref ). The procedure is completely analogous to the determination of J 1 . We first defineĴ We choose to fit our data to the model The same considerations discussed in section 4.1.1 apply to the determination, in the continuum, of the relation betweenḡ 2 c=0.2 (µ) andḡ 2 c=0.3 (µ). In this case, however, we expect the scaling violations to be smaller, since the change in renormalization scale is not a factor two, but only a factor 3/2.
We performed several fits, changing the number of fit parameters n c and n ρ , and using different cuts for our data, and the overall analysis results in a consistent value forḡ c=0.2 (µ ref ) as long as data with L/a ≥ 16 is used.
We choose to quote the result with n c = 3 and n ρ = 2 and L/a ≥ 20 Despite the high precision, the result should be actually considered conservative, as this particular fit has one of the largest uncertainties of all the combinations that we tried (see figure 9). 5 The quantity √ 8t0µ ref was determined in two different schemes, and for each scheme, using two different strategies. All analysis resulted in completely negligible differences.  (4.7)).

The extraction of Λ MS
The Λ s -parameter in the scheme defined by the couplingḡ 2 s is given by the expression Note that this expression is exact, and valid beyond perturbation theory, as long as the non-perturbative β-function, defined by This last formula allows for a non-perturbative definition of Λ MS , even if the MS scheme is intrinsically perturbative. All in all, the determination of Λ MS requires the determination of the integral in equation (4.8) in a scheme that is non-perturbatively defined. The lower limit of the integral is zero, which requires to determine the β-function up to infinite energy. In practice this can only be achieved by a limit process. One first defines very similar to the previous function I s (ḡ s (µ)). The only difference is that the integral in equation (4.8) for values of the coupling smaller than g PT is determined by substituting the β s -function by its l-loop perturbative approximation β (l)    (4.17)). The conversion from the GF scheme to the SF scheme is performed non-perturbatively and detailed in appendix B. The conversion to the MS scheme is done by using the perturbative relation with the SF scheme (equation (4.19)). The last two rows show possible extrapolations of √ 8t 0 Λ MS using the last four values (n = 2, 3, 4, 5). We show both an extrapolation linear in theḡ 4 and a extrapolation to a constant for the cases that this behavior is compatible with the data.
The first two coefficients b 0 = 11/(4π) 2 and b 1 = 102/(4π) 4 are scheme-independent, while the values b n for n > 1 depend on the chosen scheme. It is now clear that (4.14) In practice for most finite volume schemes, the β s -function is known up to three loops, and therefore the corrections are O(g 4 PT ). The value of the coupling g PT delimits the energy region (from µ PT to ∞) where perturbation theory is used via the relation Ideally one would like to estimate the Λ-parameter by taking the following limit Since the value of the couplingḡ 2 s (µ) runs logarithmically with µ, it is technically a challenge to probe a large range of energy scales so that the corrections O(g 2(l−1) PT ) vary substantially and the limit can be taken accurately.
Of course finite-size scaling was designed to explore such large ranges of energy scales. Starting from the scale µ ref (see section 4.2.1), and with the knowledge of the step scaling function σ = J 2 • J 1 (see section 4.1), one can define the sequence of couplings The energy scales reached by this procedure increase geometrically. Contact with perturbation theory can be made at each step by choosing g 2 PT = u n in equation (4.14), and one can indeed check that the corrections O(u 2 n ) are small and decrease as they should. For a long time, the challenge was mainly to maintain a high precision, but the most recent works [26,13] have shown that when one reaches a high precision, the corrections can be significant in some schemes even at very high energy scales.
For this reason reaching high energies alone is not enough. The limit in equation (4.16) has to be taken seriously and the systematics well estimated. Fortunately our dataset allows us to study the matching with perturbation theory at energy scales µ n = 2 n µ ref , (n = 0, . . . , 5) . Moreover we will explore several options to match with perturbation theory: GF: This is just the direct application of equation (4.8) using g 2 PT = u n to determine I GF (ḡ(µ n )) (cf. equation (4.14)). Schematically: In this case the matching with perturbation theory is performed in the GF scheme at a scale µ PT = µ n = 2 n µ ref .
SF: Reference [13] showed that schemes based on the GF show a very poor perturbative convergence. The same reference suggested to match non-perturbatively to the traditional SF coupling [27] with background field. The details of this matching are explained in appendix B. Schematically: In this case matching with perturbation theory is performed in the SF scheme at a scale µ PT = 0.3 × µ n+1 = 0.3 × 2 n+1 µ ref .  Once the value of the coupling in the MS scheme is known, one can use the known 5-loop βfunction [29] to determine directly Λ MS . Even if the running is known much more accurately in the MS scheme, this procedure carries the same parametric uncertainty O(ḡ 4 MS ) as the others, since the limiting factor is represented by the known orders in the perturbative relation between couplings, equation (4.19) (see [26]). Schematically we have

MS
In this case the scale of matching with perturbation theory is performed in the SF scheme at a scale µ PT = s 0.3 × 2 n+1 µ ref , but the RG evolution is done in the MS scheme. The value of s is in principle arbitrary, but if taken too large the perturbative coefficients of equation (4.20) become large, and one expects a bad asymptotic convergence of the perturbative series. We will explore two choices: first the simple s = 1, and then the value s = 2, that is very close to the value of fastest apparent convergence 6 .  Figure 10: The dimensionless product √ 8t 0 × Λ MS as a function of g PT (see equation (4.16)). The empty symbols represent the data of table 3 for n = 0, . . . , 5, while the filled symbols are extrapolations g PT → 0 (shifted for better visibility) of the different approaches to the perturbative matching (see text for more details). The gray band is the result of reference [13], while the data point labeled FlowQCD is the result of reference [23].

Results and discussion
We refer the reader once more to table 3. The values for √ 8t 0 Λ MS differ for the different treatments of perturbation theory. There are two important points worth mentioning: 1. Even at scales whereḡ 2 ≈ 1 (corresponding to α ≈ 0.08), different treatments of perturbation theory produce values of √ 8t 0 Λ MS that vary as much as 3%.
2. There are two particular treatments of perturbation theory (labeled SF and MS(s = 2)), where the value of √ 8t 0 Λ MS is constant within errors when extracted over a range of energy scales that vary by a factor 32.
These results are also plotted in figure 10. Qualitatively we see that the variations between different treatments of perturbation theory roughly scale as expected (i.e. decrease proportionally to α 2 ).
A more quantitative picture is obtained by looking at the two last rows of table 3. They show possible extrapolations of the quantity √ 8t 0 Λ MS (see equation (4.16)): the deviation from the final result of reference [13] √ 8t 0 Λ MS = 0.6227(98) of any of the possible extrapolations is below the statistical uncertainties (about 1.5%). This is half of the differences present at scales whereḡ 2 ≈ 1.
All extrapolations g PT → 0 agree well (last two rows of table 3). In particular even the extrapolations that assume that the higher order terms proportional to g 4 PT are negligible show quite a small uncertainty. Still, the error band covers the central values of all other extrapolations. Note however that the size of the uncertainties depends strongly on how much data one decides to include. A very conservative approach (such as the one used in reference [13]) would consist in just quoting as final result the value at the most perturbative point. This is justified since the data labeled SF and MS (s = 2) shows basically no dependence on the value of g PT .
Note however that the methodology in these two works is very different. In particular they deal with the systematic associated with the continuum extrapolations in a very different way. Reference [13] uses the GF coupling with c = 0.3 in order to perform the non-perturbative running. On the other hand we use the step scaling function with c = 0.2, determined as the composition of the functions J 1 and J 2 as described in section 2.2 order to do the non-perturbative running. Even with the highly conservative approach that we used in section 4.1 to perform the continuum extrapolations ofĴ 1 andĴ 2 , we find a final uncertainty on √ 8t 0 Λ MS of the same size of the one obtained in reference [13]. Moreover, the fact that the central values are in perfect agreement in the two calculations provides very strong evidence that the systematic effects associated with the continuum extrapolations are completely under control, and well below our statistical uncertainties.
Finally, the extrapolations that assume a linear dependence in α 2 PT show larger uncertainties. Still, it is very important to see that the g PT → 0 extrapolation substantially improves the agreement between all treatments of the perturbative matching.
It is also worth mentioning that the propagation of the linear O(a) effects (see appendix A) represent about a 15% of the final error squared in √ 8t 0 Λ MS . This is about a 50% larger than in the extraction of reference [13] and can be understood noting that making use of values of the coupling at c = 0.4, we increase the boundary effects.
All in all our approach shows a remarkable agreement between very different treatments of the matching with perturbation theory, and thanks to our new proposal, we are able to also show a very good agreement with previous works that have rather different systematics associated with the continuum extrapolation. The approach labeled MS in section 4.2.2 is very close to many phenomenological extractions of the strong coupling. The value ofḡ 2 MS is extracted from a measurement, in this case from the value of the SF coupling obtained in a simulation, thanks to its perturbative expansion ) effects in equation (4.14)). In particular, in phenomenology, it is very common to vary the renormalization scale a factor two above/below some chosen value. Figure 11 shows such estimate of the uncertainties propagated to the Λ-parameter. δ is obtained by varying the renormalization scale of a factor two above/below the scale of fastest apparent convergence (i.e. the average difference between the values of Λ obtained after using s = 1 and s = 2 and then s = 2 and s = 4).

Scale uncertainties
From figure 11 it is clear that that scales uncertainties are rather large in the pure gauge theory 7 . Even at α ≈ 0.1, corresponding to the highest scales reached in our study, they are around 2%. One might question the results of our works, that claim a significantly smaller uncertainty. The key to claim smaller errors than the scale uncertainties lies in the limit definition of Λ, equation (4.16). Once the limit g PT → 0 is properly taken, and its systematic estimated, one does not need to talk about the uncertainties at non-zero g PT . Of course taking such a limit is hard: data at different values of g PT is required. Due to the logarithmic running of the coupling with the physical energy scale, the apparently innocent limit of equation (4.16) requires to solve a hard multi-scale problem. Even with our datasets, that spans a factor 32 in energy scales (a change in the coupling g 2 PT by more than a factor two), we have seen that some assumptions on the scaling as g PT → 0 are needed in order to reach the 1.4% precision on Λ.
We consider our approach to treat perturbative uncertainties very conservative. Still, future works in the pure gauge theory might want to explore even larger energy scales.

Conclusions
In this work we have examined the main sources of uncertainties present in finite-size scaling studies using the Gradient Flow: the continuum extrapolation and the statistical uncertainties. We have argued that scaling violations are a result of exploring changes in the flow time. This observation has been supported both by a perturbative study and by non-perturbative numerical results.
The determination of the step scaling function σ(u), the crucial observable in step scaling studies, involves both a change in the flow time and a change in the size of the system. We propose to divide the determination of σ(u) in two pieces: first, a change in the renormalization scale at a constant size of the system (the function J 1 ), followed by a change of the size of the system at constant renormalization scale (the function J 2 ). The advantage is that, according to our hypothesis, only the first step shows significant cutoff effects. By breaking up the determination in two pieces, the scaling violations can be studied much more accurately. Modest datasets allow to explore a change of the renormalization scale at constant physical volume with lattice spacings varying by factors 4-6. In section 2.2 we have seen that this strategy usually comes at the cost of larger statistical uncertainties, especially in schemes like the Schrödinger Functional that break translation invariance and have to deal with O(a) systematic effects. Thus, the proposal trades the large systematic associated with the continuum extrapolations present in many GF studies with a statistical uncertainty. Since the latter are much easier to control, we think that the proposed strategy shows a clear advantage. In section 3 we have shown that statistical uncertainties can be well understood and predicted with a simple model.
We think that this strategy can shed some light in many problems that are currently being studied where systematic effects of the continuum extrapolation are relevant (see for example the recent discussion in [7]). A detailed study of the scaling violations of J 1 , that according to our hypothesis are very similar to those of the step scaling function σ, should become a standard way to assess the quality of the continuum determination of the step scaling function.
We have also re-examined the determination of the Λ-parameter in the pure gauge theory. The most crucial step is the high energy region and the matching with the asymptotic perturbative regime. We have used the step scaling function with c = 0.2, determined using our new proposal. The matching with perturbation theory is performed in different schemes and using different procedures. Our datasets allows to match with perturbation theory at energy scales µ PT where α(µ PT ) 0.1. Even at this large energy scales the perturbative truncation effects are large, corresponding to about a 2% uncertainty in Λ. The size of this uncertainty is also confirmed by a scale variation analysis. All in all, perturbative errors are large in the pure gauge theory, making the determination of Λ rather challenging in this aspect, in particular when compared with the corresponding determination in QCD 8 . Fortunately, our dataset explores a large range of energy scales, and gives us the possibility to explore the limit α(µ PT ) → 0 (corresponding to µ PT → ∞). Once this limit is properly taken, the perturbative uncertainties at α(µ PT ) estimated using scale variation or any other procedure are irrelevant. Of course taking such a limit is very challenging. The corrections, O(α 2 (µ PT )), decrease very slowly due to the logarithmic running of the coupling at high energies.
Depending on the assumptions made in the extrapolation α(µ PT ) → 0, the uncertainty in Λ varies in the range 1 − 2%. All in all, our results show a perfect agreement with the final result of [13] ( √ 8t 0 × Λ MS = 0.6227(98)), obtained with c = 0.3, that quotes an uncertainty ≈ 1.4%. We stress once more that the method that we used in this work provides a careful control on the continuum extrapolations, leading to a final uncertainty on √ 8t 0 × Λ MS of the same size of [13] even when using a very conservative approach. Due to the different treatments of perturbation theory and the use of different schemes, it seems clear that, despite some discrepancies with other works (see discussion in [13]), the systematic effects in [13] are well under control.  Table 4: Values of the fit coefficients a 0 (c) and a 1 (c) that parameterize the sensitivity ofḡ 2 c to the boundary improvement coefficient c t .

A Boundary O(a) effects
The procedure to estimate the O(a) cutoff effects is completely analogous as in [13]. In fact we use the same datasets. We determine numerically the dependence of the GF coupling at different values of c = 0.2, 0.3, 0.4 with the boundary parameter c t . This is done using simulations at different values of the improvement parameter c t close to its 2-loop value  Table 4 shows the result of the parameters a 0 (c), a 1 (c). Now we need an estimate of how much the true, non-perturbative, value of c t differs from its 2loop value. There is no information available, but the extrapolations at constant u of section 2.2.2 completely ignored this linear effects and the data showed no significant deviation from a O(a 2 ) scaling. This suggests that these effects are smaller than our statistical uncertainties. With this insight, a conservative approach consists in using the full 2-loop contribution as an estimate of the difference between the 2-loop value and the correct non-perturbative one.
In summary we add to all our data, in quadratures, the error

B Matching with the SF scheme
Since the strategy is the same as the one used in [13], we refer the interested reader to this reference for more details.
Here it is enough to say that we performed a set of SF simulations with background field and lattice sizes L/a = 6, 8, 10, 12, 16 and at the same values of the bare coupling g 2 0 = 6/β as the available measurements of the GF coupling in twice the lattice size L/a = 12, 16, 20, 24, 32. We collect between 9 × 10 5 and 2 × 10 6 measurements of the SF couplingḡ 2 SF . We further remove all cutoff effects up to 2 loops fromḡ 2 SF (i.e. the leading cutoff effects are O(g 6 0 )). The non-perturbative data is fitted to a functional form of the type To quote all our results we use the same fit used in reference [13], where the coarser lattice is dropped from the fit and the functions f (u),ρ(u) are degree two polynomials.