1 Introduction

In recent years, the level of precision achieved at the LHC has reached far beyond what was once thought possible. This has initiated a new era of high precision phenomenology that has pushed the need for a robust understanding of theoretical uncertainty to new levels. Due to the perturbative nature of calculations in Quantum Chromodynamics (QCD), with respect to the strong coupling constant \(\alpha _{s}\), a leading theoretical uncertainty arises from the truncation of perturbative expansions [1, 2]. The current state of the art for parton distribution functions (PDFs) is next-to-next-to-leading order (NNLO) [3,4,5,6,7,8,9,10]. However, these PDF sets do not generally include theoretical uncertainties arising from the truncation of the perturbative calculations that enter the fit. How to account for these so-called Missing Higher Order Uncertainties (MHOUs), and how to estimate them, is the topic of much discussion among groups involved in fitting PDFs [11,12,13,14].

More recently, a scale variation approach to estimating these uncertainties has been included in an NLO PDF fit [11]. This approach is based upon the fact that, to all orders, a physical observable must not depend on any unphysical scales introduced into the calculation. Varying the factorisation and renormalisation scales therefore provides, in principle, a first estimate of the theory uncertainty from missing higher orders (MHOs). Motivated by the renormalisation group invariance of physical observables, this method is theoretically grounded to all orders. However, scale variations have been shown to be less than ideal in practice [12, 15]. An obvious difficulty is the arbitrary choice of the range of the scale variation, as well as of the central scale. Moreover, even if a universal treatment of scale variations were agreed upon, these variations cannot predict the effect of various classes of logarithms (e.g. small-x, mass threshold and leading large-x contributions) present at higher orders. As an example, recent studies of fits including small-x resummation [16, 17] show significant PDF changes. Since contributions of this type are often dominant at higher orders, this is an especially concerning pitfall in the use of scale variations to estimate MHOUs. More subtle are the challenges encountered in accounting for correlations between the fit and the predictions made with the resulting PDFs [12, 14]. An alternative to the above is to parameterise the missing higher orders with a set of nuisance parameters, using the available (albeit incomplete) current knowledge [18, 19].

In this paper we present the first study of an approximate \({\textrm{N}}^3{\textrm{LO}}\) (aN\(^{3}\)LO) PDF fit. In particular, we first consider approximations to the N\(^{3}\)LO structure functions and DGLAP evolution of the PDFs, including the relevant heavy flavour transition matrix elements. We make use of all available knowledge to constrain an approximate parameterisation of the N\(^{3}\)LO theory, including the calculated Mellin moments, low-x logarithmic behaviour and the full results where they exist. Then for the case of hadronic observables (where less N\(^{3}\)LO information is available), we include approximate N\(^{3}\)LO K-factors which are guided by the size of known NLO and NNLO corrections. Based on the uncertainty in our knowledge of each N\(^3\)LO function, we obtain a theoretical confidence level (C.L.) constrained by a prior. The corresponding theoretical uncertainties are therefore regulated by our theoretical understanding or lack thereof. Applying the above procedure, we have performed a full global fit at approximate N\(^{3}\)LO, with a corresponding theoretical uncertainty included within a nuisance parameter framework. As we will show, adopting this procedure allows the correlations and sources of uncertainties to be easily controlled. The preferred form of the aN\(^{3}\)LO corrections is determined from the fit quality to data, subject to theoretical constraints from the known information about higher orders.

We note that the above uncertainty is due to the (currently unknown) missing ingredients at N\(^{3}\)LO, and hence, to be precise, it corresponds to a ‘missing N\(^{3}\)LO’ uncertainty. However, under the common assumption that the dominant uncertainty from missing higher orders (MHOs) is due to the next, not fully known, order (here N\(^{3}\)LO), one can also expect this to provide a reasonable estimate of the MHOU in the fit. Indeed, by allowing the unknown theory parameters to be determined by the fit to data, sensitivity to orders beyond N\(^{3}\)LO is explicitly introduced. As we will see, this is particularly transparent in the case of the hadronic K-factors, which are more directly interpreted as giving a full MHO uncertainty, although a similar sensitivity to higher orders (in particular at low x) is observed in the DGLAP evolution of the PDFs. Therefore, while we assume that the majority of this uncertainty is due to the missing information at N\(^{3}\)LO, some is associated with orders even beyond this, most obviously further effects due to small-x logarithms (i.e. small-x resummation). Nonetheless, there is in general a distinction between the missing N\(^{3}\)LO uncertainty we explicitly include and the uncertainty from MHOs beyond this, and we will therefore take care throughout this paper to distinguish the two where appropriate, even if the separation is not always clear cut. We will find that the results indeed support this interpretation.

The outline of this paper is as follows. In Sect. 2 we present the theoretical framework, describing the method and conventions used for the rest of the paper. Section 3 describes the structure functions and their role in QCD calculations. In Sects. 4, 5 and 6 we present our approximations for the N\(^{3}\)LO DIS theory functions, while in Sect. 7 we present the K-factors at aN\(^{3}\)LO. In Sect. 8 we present the MSHT aN\(^{3}\)LO PDFs with theoretical uncertainties and analyse the implications of the approximations in terms of a full MSHT global fit. Section 9 contains examples of using these aN\(^{3}\)LO PDFs in predictions up to N\(^{3}\)LO. Finally in Sects. 10 and 11 we present recommendations for how to best utilise these PDFs and summarise our results.

2 Theoretical procedures

In this section we describe the mathematical procedures used to implement N\(^{3}\)LO approximations into the MSHT PDF framework. These procedures are discussed in terms of the Hessian minimisation method employed by the MSHT fit and extended by theoretically grounded arguments to accommodate theoretical uncertainties.

The above will be achieved by adapting the underlying theory description of the data from NNLO to N\(^{3}\)LO (a formal description of how this will be done for the \(F_{2}\) structure function is given in Sect. 3). Not all the ingredients necessary for full N\(^{3}\)LO theory predictions are known. Where information is missing, the N\(^{3}\)LO theory predictions therefore include additional theoretical nuisance parameters, which allow specific theoretical pieces to vary through an additional degree of freedom. These theoretical nuisance parameters are constrained via an additional \(\chi ^{2}\) penalty in the global fit and accommodate a level of uncertainty for each added approximate N\(^{3}\)LO ingredient (more information on how these prior variations are decided is included in Sects. 4.16.1 and 7.1). From this point, the fitting procedure remains similar to previous MSHT fits, with a number of extra theory nuisance parameters which are treated in the same manner as the experimental nuisance parameters inherent in PDF fits, i.e. they can be fit to the data via an expanded Hessian matrix.

2.1 Hessian method with nuisance parameters

Following the notation and description from [14], in the Hessian prescription, the Bayesian probability can be written as

$$\begin{aligned} P(T|D) \propto \exp \left( -\frac{1}{2}(T - D)^{T} H_{0} (T - D)\right) \end{aligned}$$
(2.1)

where \(H_{0}\) is the Hessian matrix and \(T = \{T_{i}\}\) is the set of theoretical predictions fit to N experimental data points \(D = \{D_{i}\}\) with \(i=1,\ldots , N\). In this section we explicitly show the adaptation of this equation to accommodate extra theoretical parameters (with penalties) into the total \(\chi ^{2}\) and Hessian matrices.
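As a minimal numerical sketch of Eq. (2.1), the snippet below evaluates the quadratic form in the exponent and the unnormalised probability. The three data points, their uncorrelated errors and the helper names `chi2` and `unnormalised_posterior` are invented for illustration; in this data-space form, \(H_{0}\) simply plays the role of an inverse data covariance.

```python
import numpy as np

def chi2(T, D, H0):
    """Quadratic form (T - D)^T H0 (T - D) appearing in the exponent of Eq. (2.1)."""
    r = np.asarray(T, dtype=float) - np.asarray(D, dtype=float)
    return float(r @ H0 @ r)

def unnormalised_posterior(T, D, H0):
    """P(T|D) up to an overall constant: exp(-chi2 / 2)."""
    return float(np.exp(-0.5 * chi2(T, D, H0)))

# Hypothetical inputs: three data points with uncorrelated errors 0.1, 0.2, 0.1
D = np.array([1.00, 2.00, 3.00])
sigma = np.array([0.1, 0.2, 0.1])
H0 = np.diag(1.0 / sigma**2)        # inverse covariance for uncorrelated errors
T = np.array([1.05, 1.90, 3.00])    # hypothetical theory predictions

print(chi2(T, D, H0))  # ≈ 0.5, i.e. (0.05/0.1)^2 + (0.1/0.2)^2 + 0
```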

To adapt this equation to include a single extra theory parameter, we can make the transformation \(T \rightarrow T + t u = T^{\prime }\), where t is the chosen central value of the theory parameter considered and u is some non-zero vector such that \(u u^{T}\) is the theory covariance matrix for t. In defining this new theoretical prescription \(T^{\prime }\), we are making the general assumption that the underlying theory is now not necessarily identical to our initial NNLO theory T.Footnote 1

We now seek to include a nuisance parameter \(\theta \), centered around t, to allow the fit to control this extra theory addition. We demand that when \(\theta = t\), \(T^{\prime }\) remains unaffected with the theory addition unaltered from its central value t. This leads us to the expression,

$$\begin{aligned} T^{\prime } + (\theta - t)u = T + t u + (\theta - t)u. \end{aligned}$$
(2.2)

Redefining the nuisance parameter as the shift from its central value t (\(\theta ^{\prime } = \theta - t\)) we define \(\theta ^{\prime }\) centered around 0. To constrain \(\theta ^{\prime }\) within the fitting procedure, we must also define a prior probability distribution \(P(\theta ^{\prime })\) centered around zero and characterised by some standard deviation \(\sigma _{\theta ^{\prime }}\),

$$\begin{aligned} P(\theta ^{\prime }) = \frac{1}{\sqrt{2\pi }\sigma _{\theta ^{\prime }}}\exp (-\theta ^{\prime \ 2} / 2\sigma _{\theta ^{\prime }}^{2}). \end{aligned}$$
(2.3)

Throughout this paper, we refer to the chosen variation of theory predictions in the language of the standard deviation \(\sigma _{\theta ^{\prime }}\) presented here. A caveat, however, is that, strictly speaking, this standard deviation is chosen with some arbitrariness, based on general assumptions and known information about the theory (we will show how this is done in more detail in Sects. 4.16.1 and 7.1). Although this definition of \(\sigma _{\theta ^{\prime }}\) lacks the full statistical meaning of a true standard deviation, the same is also true for scale variations as well as for various experimental systematic uncertainties, which are often not strictly Gaussian. Furthermore, a more robust statistical meaning is recovered for the constraints on the various theoretical parameters after a fit is performed, when we become less sensitive to the prior. Using this information and making the redefinition \(u \rightarrow u / \sigma _{\theta ^{\prime }}\) (in order to normalise the covariance matrix), we can update Eq. (2.1) to be

$$\begin{aligned} P(T|D\theta )&\propto \exp \left( -\frac{1}{2}\left( T^{\prime } + \frac{(\theta - t)}{\sigma _{\theta ^{\prime }}} u - D\right) ^{T} \right. \nonumber \\&\quad \times \left. H_{0} \left( T^{\prime } + \frac{(\theta - t)}{\sigma _{\theta ^{\prime }}} u - D\right) \right) \end{aligned}$$
(2.4)
$$\begin{aligned} P(T^{\prime }|D\theta ^{\prime })&\propto \exp \left( -\frac{1}{2}\left( T^{\prime } + \frac{\theta ^{\prime }}{\sigma _{\theta ^{\prime }}} u - D\right) ^{T} \right. \nonumber \\&\quad \times \left. H_{0} \left( T^{\prime } + \frac{\theta ^{\prime }}{\sigma _{\theta ^{\prime }}} u - D\right) \right) . \end{aligned}$$
(2.5)

From here, Bayes' theorem tells us

$$\begin{aligned} P(T^{\prime }|D\theta ^{\prime })P(\theta ^{\prime }|D) = P(\theta ^{\prime }|T^{\prime }D)P(T^{\prime }|D) \end{aligned}$$
(2.6)

where our nuisance parameter \(\theta ^{\prime }\) is assumed to be independent of the data i.e. \(P(\theta ^{\prime }|D) = P(\theta ^{\prime })\). Integrating over \(\theta ^{\prime }\) gives

$$\begin{aligned} P(T^{\prime }|D)= & {} \underbrace{\int d\theta ^{\prime } P(\theta ^{\prime }|T^{\prime }D)}_{=1} P(T^{\prime }|D) \nonumber \\= & {} \int d\theta ^{\prime } P(T^{\prime }|D\theta ^{\prime }) P(\theta ^{\prime }). \end{aligned}$$
(2.7)

Combining Eqs. (2.3), (2.5) and (2.7) it is possible to show that,

$$\begin{aligned} P(T^{\prime }|D)&\propto \int d\theta ^{\prime } \exp \left( -\frac{1}{2}\left[ \left( T^{\prime } + \frac{\theta ^{\prime }}{\sigma _{\theta ^{\prime }}} u - D\right) ^{T} \right. \right. \nonumber \\&\quad \times \left. \left. H_{0} \left( T^{\prime } + \frac{\theta ^{\prime }}{\sigma _{\theta ^{\prime }}} u - D\right) + \theta ^{\prime \ 2}/\sigma _{\theta ^{\prime }}^{2}\right] \right) . \end{aligned}$$
(2.8)

To make progress with this equation we consider the exponent and refactor terms in powers of \(\theta ^{\prime }\),

$$\begin{aligned}{} & {} \left( u^{T}H_{0} u + 1\right) \frac{\theta ^{\prime \ 2}}{\sigma _{\theta ^{\prime }}^{2}} + 2u^{T} H_{0} (T^{\prime }-D) \frac{\theta ^{\prime }}{\sigma _{\theta ^{\prime }}} \nonumber \\{} & {} \quad +\, (T^{\prime }-D)^{T}H_{0}(T^{\prime }-D). \end{aligned}$$
(2.9)
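The regrouping in Eq. (2.9) is easy to check numerically. The sketch below, assuming a random symmetric positive-definite \(H_{0}\), random vectors standing in for \(T^{\prime }\), D and u, and an arbitrary prior width (all hypothetical), evaluates the bracketed exponent of Eq. (2.8) literally and in its expanded form and confirms that they agree.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
A = rng.normal(size=(N, N))
H0 = A @ A.T + N * np.eye(N)     # random symmetric positive-definite "Hessian"
T_p = rng.normal(size=N)         # stands in for T'
D = rng.normal(size=N)           # stands in for the data
u = rng.normal(size=N)           # theory-shift direction (already normalised)
sigma = 0.7                      # hypothetical prior width sigma_theta'

def exponent_direct(theta):
    """Bracketed exponent of Eq. (2.8), evaluated literally."""
    r = T_p + (theta / sigma) * u - D
    return r @ H0 @ r + theta**2 / sigma**2

def exponent_eq29(theta):
    """The same exponent regrouped in powers of theta', as in Eq. (2.9)."""
    d = T_p - D
    return ((u @ H0 @ u + 1.0) * theta**2 / sigma**2
            + 2.0 * (u @ H0 @ d) * theta / sigma
            + d @ H0 @ d)

for theta in (-1.3, 0.0, 0.4, 2.5):
    assert np.isclose(exponent_direct(theta), exponent_eq29(theta))
```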

Defining \(M^{-1} = \frac{1}{\sigma _{\theta ^{\prime }}^{2}}\left( u^{T}H_{0}u + 1\right) \) and completing the square gives,

$$\begin{aligned}{} & {} M^{-1}\left[ \theta ^{\prime } + \frac{1}{\sigma _{\theta ^{\prime }}}M u^{T} H_{0} (T^{\prime }-D) \right] ^{2} \nonumber \\{} & {} \quad - \frac{1}{\sigma _{\theta ^{\prime }}^{2}}M\left( u^{T} H_{0} (T^{\prime }-D)\right) ^{2}+ (T^{\prime }-D)^{T}H_{0}(T^{\prime }-D). \nonumber \\ \end{aligned}$$
(2.10)

In Eq. (2.10), we are able to simplify the first term by defining,

$$\begin{aligned} {\overline{\theta }}^{\prime }(T,D) = \frac{1}{\sigma _{\theta ^{\prime }}} M u^{T} H_{0} (D-T^{\prime }). \end{aligned}$$
(2.11)
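A quick way to see what Eq. (2.11) means is that \({\overline{\theta }}^{\prime }\) is the value of the nuisance parameter that minimises the exponent, and \(M^{-1}\) sets its curvature. The sketch below checks both statements with a brute-force grid scan and a finite-difference second derivative; the random inputs and the prior width are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
A = rng.normal(size=(N, N))
H0 = A @ A.T + N * np.eye(N)
T_p, D, u = rng.normal(size=N), rng.normal(size=N), rng.normal(size=N)
sigma = 1.5   # hypothetical prior width

def exponent(theta):
    """Exponent of Eq. (2.8), vectorised over an array of theta' values."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    r = T_p[None, :] + np.outer(theta / sigma, u) - D[None, :]
    return np.einsum("ki,ij,kj->k", r, H0, r) + theta**2 / sigma**2

# Eqs. (2.10)-(2.11): M^{-1} = (u^T H0 u + 1) / sigma^2 and the shift theta_bar'
M = sigma**2 / (u @ H0 @ u + 1.0)
theta_bar = M * (u @ H0 @ (D - T_p)) / sigma

# theta_bar' should minimise the exponent: compare with a brute-force grid scan
grid = np.linspace(theta_bar - 3.0, theta_bar + 3.0, 60001)
theta_grid_min = grid[np.argmin(exponent(grid))]
print(theta_bar, theta_grid_min)
```

Because the exponent is exactly quadratic in \(\theta ^{\prime }\), its second derivative equals \(2M^{-1}\) everywhere, which is what the curvature check below relies on.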

Expanding the second term leaves us with,

$$\begin{aligned} \left( u^{T} H_{0} (T^{\prime }-D)\right) ^{2} = (T^{\prime }-D)^{T} H_{0} u u^{T} H_{0} (T^{\prime }-D). \nonumber \\ \end{aligned}$$
(2.12)

The second and third terms in Eq. (2.10) can then be combined to give,

$$\begin{aligned} (T^{\prime }-D)^{T}\left( H_{0}-\frac{1}{\sigma _{\theta ^{\prime }}^{2}}M H_{0} u u^{T} H_{0}\right) (T^{\prime }-D). \end{aligned}$$
(2.13)

Further to this we note that the following is true:

$$\begin{aligned}{} & {} (H_{0}^{-1} + u u^{T})\left( H_{0}-\frac{1}{\sigma _{\theta ^{\prime }}^{2}}M H_{0} u u^{T} H_{0}\right) \nonumber \\{} & {} \quad = 1 + uu^{T}H_{0} - \frac{1}{\sigma _{\theta ^{\prime }}^{2}}Muu^{T}H_{0} \nonumber \\{} & {} \qquad - \frac{1}{\sigma _{\theta ^{\prime }}^{2}}Mu u^{T}H_{0}u u^{T} H_{0} \nonumber \\{} & {} \quad = 1 + uu^{T}H_{0} - \frac{1}{\sigma _{\theta ^{\prime }}^{2}}Muu^{T}H_{0}\nonumber \\{} & {} \qquad - \frac{1}{\sigma _{\theta ^{\prime }}^{2}}Mu (\sigma _{\theta ^{\prime }}^{2}M^{-1} - 1) u^{T} H_{0} = 1. \end{aligned}$$
(2.14)
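Since \(M/\sigma _{\theta ^{\prime }}^{2} = 1/(u^{T}H_{0}u + 1)\), the identity in Eq. (2.14) is a special case of the standard Sherman–Morrison rank-one update formula. The sketch below, assuming a random symmetric positive-definite \(H_{0}\) and a random u (both hypothetical), compares a direct matrix inversion with the rank-one form.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
A = rng.normal(size=(N, N))
H0 = A @ A.T + N * np.eye(N)   # symmetric positive-definite
u = rng.normal(size=N)

# Left-hand side: invert the augmented covariance H0^{-1} + u u^T directly
H_direct = np.linalg.inv(np.linalg.inv(H0) + np.outer(u, u))

# Rank-one form from Eq. (2.14): H0 - H0 u u^T H0 / (u^T H0 u + 1)
# (np.outer(H0 @ u, H0 @ u) equals H0 u u^T H0 because H0 is symmetric)
H_rank1 = H0 - np.outer(H0 @ u, H0 @ u) / (u @ H0 @ u + 1.0)

print(np.allclose(H_direct, H_rank1))   # True
```

The rank-one form avoids a second full matrix inversion, which is one practical reason to write the updated Hessian this way.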

Using Eq. (2.14) we are finally able to rewrite Eq. (2.8) as,

$$\begin{aligned} P(T^{\prime }|D)&\propto \int d\theta ^{\prime } \exp \left( -\frac{1}{2} M^{-1} (\theta ^{\prime } - {\overline{\theta }}^{\prime })^{2} \right. \nonumber \\&\left. \quad - \frac{1}{2} (T^{\prime }-D)^{T}(H_{0}^{-1} + u u^{T})^{-1} (T^{\prime }-D)\right) . \end{aligned}$$
(2.15)

At this point we can choose whether to redefine our Hessian matrix as \(H = (H_{0}^{-1} + u u^{T})^{-1}\), or to keep the contributions completely separate. By redefining the Hessian we can include correlations between the standard set of MSHT parameters included in \(H_{0}\) and the new theoretical parameter \(\theta ^{\prime }\) contained within \(u u^{T}\). However, by doing so we lose information about the specific contributions to the total uncertainty, i.e. we cannot then decorrelate the theoretical and standard PDF uncertainties a posteriori. For the decorrelated choice, by contrast, although we sacrifice knowledge of the correlations between the separate sources of uncertainty, we are able to treat the sources completely separately. Interpreting Eq. (2.15) as in Eq. (2.1) we can write down the two \(\chi ^{2}\) contributions,

$$\begin{aligned} \chi ^{2}_{1}&= (T^{\prime } - D)^{T} (H_{0}^{-1} + u u^{T})^{-1} (T^{\prime }-D) \nonumber \\&= (T^{\prime } - D)^{T} H (T^{\prime }-D), \end{aligned}$$
(2.16)
$$\begin{aligned} \chi ^{2}_{2}&= M^{-1} (\theta ^{\prime } - {\overline{\theta }}^{\prime })^{2}, \end{aligned}$$
(2.17)

where \(\chi ^{2}_{1}\) is the contribution from the fitting procedure, \(\chi ^{2}_{2}\) is the posterior penalty contribution applied when the theory addition strays too far from its fitted central value and M is the posterior error matrix for this contribution. This will be discussed further in following sections.
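Integrating Eq. (2.15) over \(\theta ^{\prime }\) only produces the constant Gaussian factor \(\sqrt{2\pi M}\), so the marginal probability is governed by \(\chi ^{2}_{1}\) alone. The following sketch checks this numerically by integrating the full exponent of Eq. (2.8) over a wide grid; the inputs are random and hypothetical, and a simple uniform Riemann sum is used because it converges very quickly for a smooth Gaussian integrand.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 3
A = rng.normal(size=(N, N))
H0 = A @ A.T + np.eye(N)
T_p, D, u = rng.normal(size=N), rng.normal(size=N), rng.normal(size=N)
sigma = 0.8   # hypothetical prior width

def exponent(theta):
    """Exponent of Eq. (2.8) for an array of theta' values."""
    r = T_p[None, :] + np.outer(theta / sigma, u) - D[None, :]
    return np.einsum("ki,ij,kj->k", r, H0, r) + theta**2 / sigma**2

M = sigma**2 / (u @ H0 @ u + 1.0)
theta_bar = M * (u @ H0 @ (D - T_p)) / sigma
H = np.linalg.inv(np.linalg.inv(H0) + np.outer(u, u))
chi2_1 = (T_p - D) @ H @ (T_p - D)          # Eq. (2.16)

# Numerical marginalisation over theta' on a wide uniform grid (Riemann sum)
grid, step = np.linspace(theta_bar - 12 * np.sqrt(M), theta_bar + 12 * np.sqrt(M),
                         4001, retstep=True)
numeric = np.sum(np.exp(-0.5 * exponent(grid))) * step
closed_form = np.sqrt(2 * np.pi * M) * np.exp(-0.5 * chi2_1)
print(numeric / closed_form)   # close to 1
```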

2.2 Multiple theory parameters

In the case of \(N_{\theta ^{\prime }}\) theory parameters, Eq. (2.5) becomes

$$\begin{aligned} P(T^{\prime }|D\theta ^{\prime })&\propto \exp \left( -\frac{1}{2}\sum _{i,j}^{N_{\textrm{pts}}}\bigg (T^{\prime }_{i} + \sum _{\alpha = 1}^{N_{\theta ^{\prime }}}\frac{\theta ^{\prime }_{\alpha }}{\sigma _{\theta ^{\prime }_{\alpha }}} u_{\alpha , i} - D_{i}\bigg ) \right. \nonumber \\&\quad \times \left. H^{0}_{ij} \bigg (T^{\prime }_{j} + \sum _{\beta = 1}^{N_{\theta ^{\prime }}}\frac{\theta ^{\prime }_{\beta }}{\sigma _{\theta ^{\prime }_{\beta }}} u_{\beta ,j} - D_{j}\bigg )\right) \end{aligned}$$
(2.18)

where we have explicitly included the sum over the number of data points \(N_{\textrm{pts}}\) in the matrix calculation for completeness.

The prior probability for all N\(^{3}\)LO nuisance parameters also becomes

$$\begin{aligned} P(\theta ^{\prime }) =\prod _{\alpha = 1}^{N_{\theta ^{\prime }}} \frac{1}{\sqrt{2\pi }\sigma _{\theta _{\alpha }^{\prime }}}\exp (-\theta _{\alpha }^{\prime \ 2} / 2\sigma _{\theta _{\alpha }^{\prime }}^{2}). \end{aligned}$$
(2.19)

Constructing \(P(T^{\prime }|D)\) using Bayes' theorem as before results in the expression,

$$\begin{aligned}&P(T^{\prime }|D) \nonumber \\&\quad \propto \int d^{N_{\theta ^{\prime }}}\theta ^{\prime } \exp \Bigg ( -\frac{1}{2}\Bigg [\sum _{i,j}^{N_{\textrm{pts}}}\bigg (T^{\prime }_{i} + \sum _{\alpha = 1}^{N_{\theta ^{\prime }}}\frac{\theta ^{\prime }_{\alpha }}{\sigma _{\theta ^{\prime }_{\alpha }}} u_{\alpha , i} - D_{i}\bigg ) H^{0}_{ij} \nonumber \\&\qquad \times \bigg (T^{\prime }_{j} + \sum _{\beta = 1}^{N_{\theta ^{\prime }}}\frac{\theta ^{\prime }_{\beta }}{\sigma _{\theta ^{\prime }_{\beta }}} u_{\beta ,j} - D_{j}\bigg ) + \sum _{\alpha , \beta }^{N_{\theta ^{\prime }}}\frac{\theta ^{\prime }_{\alpha }}{\sigma _{\theta ^{\prime }_{\alpha }}}\frac{\theta ^{\prime }_{\beta }}{\sigma _{\theta ^{\prime }_{\beta }}}\delta _{\alpha \beta }\Bigg ]\Bigg ). \end{aligned}$$
(2.20)

Following the same procedure as laid out in the previous section, defining \(M_{\alpha \beta }^{-1} = (\delta _{\alpha \beta } + u_{\alpha , i}H^{0}_{ij}u_{\beta , j}) / \sigma _{\theta _{\alpha }^{\prime }}\sigma _{\theta _{\beta }^{\prime }}\) and completing the square leaves us with,

$$\begin{aligned}{} & {} (T^{\prime }_{i} - D_{i}) H^{0}_{ij} (T^{\prime }_{j} - D_{j}) \nonumber \\{} & {} \quad + \sum _{\alpha ,\beta }^{N_{\theta ^{\prime }}}M_{\alpha \beta }^{-1}\left[ \left( \theta ^{\prime }_{\alpha } + \sum _{i,j}^{N_{\textrm{pts}}}\sum _{\delta =1}^{N_{\theta ^{\prime }}}\frac{1}{\sigma _{\theta _{\delta }^{\prime }}}M_{\alpha \delta }u_{\delta ,i}H^{0}_{ij} (T^{\prime }_{j} - D_{j}) \right) ^{2} \right. \nonumber \\{} & {} \quad \left. - \left( \sum _{i,j}^{N_{\textrm{pts}}}\sum _{\delta =1}^{N_{\theta ^{\prime }}}\frac{1}{\sigma _{\theta _{\delta }^{\prime }}}M_{\alpha \delta }u_{\delta ,i}H^{0}_{ij} (T^{\prime }_{j} - D_{j})\right) ^{2}\right] , \end{aligned}$$
(2.21)

where each squared round bracket is shorthand for the product of the \(\alpha \)-indexed expression with the corresponding \(\beta \)-indexed one, so that the summation over the \(\beta \) index in \(M^{-1}_{\alpha \beta }\) is implicit.

As in the previous section for a single parameter, we can define,

$$\begin{aligned} {\overline{\theta }}_{\alpha }^{\prime }(T^{\prime },D)&= \sum _{i,j}^{N_{\textrm{pts}}}\sum _{\delta = 1}^{N_{\theta ^{\prime }}}\frac{1}{\sigma _{\theta _{\delta }^{\prime }}}M_{\alpha \delta }u_{\delta ,i}H^{0}_{ij}(D_{j} - T^{\prime }_{j}) \end{aligned}$$
(2.22)
$$\begin{aligned} H_{ij}&= \left( \left( H^{0}_{ij}\right) ^{-1} + \sum ^{N_{\theta ^{\prime }}}_{\alpha = 1} u_{\alpha ,i}u_{\alpha , j}\right) ^{-1} \end{aligned}$$
(2.23)

which leads to the final expression for \(P(T^{\prime }|D)\),

$$\begin{aligned}&P(T^{\prime }|D) \nonumber \\&\quad \propto \int d^{N_{\theta ^{\prime }}}\theta ^{\prime } \exp \left( -\frac{1}{2}\left[ \sum _{\alpha , \beta }^{N_{\theta ^{\prime }}} \left( \theta ^{\prime }_{\alpha } - {\overline{\theta }}_{\alpha }^{\prime }\right) M_{\alpha \beta }^{-1}\big (\theta ^{\prime }_{\beta } - {\overline{\theta }}_{\beta }^{\prime }\big ) \right. \right. \nonumber \\&\qquad \left. \left. + \sum _{i,j}^{N_{\textrm{pts}}}\left( T^{\prime }_{i} - D_{i}\right) H_{ij} \big (T^{\prime }_{j} - D_{j}\big )\right] \right) . \end{aligned}$$
(2.24)

which can be interpreted analogously to the single parameter case in Eq. (2.15).
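The multi-parameter decomposition can be checked in the same way as the single-parameter case. The sketch below, assuming random inputs and deliberately unequal hypothetical prior widths, builds \(M^{-1}_{\alpha \beta }\), the shifts \({\overline{\theta }}^{\prime }_{\alpha }\) and the Hessian \(H_{ij}\) of Eq. (2.23), and confirms that the exponent of Eq. (2.20) equals the bracket of Eq. (2.24) for arbitrary \(\theta ^{\prime }\). When the widths differ, completing the square attaches the \(1/\sigma \) factor in \({\overline{\theta }}^{\prime }_{\alpha }\) to the summed index \(\delta \), and that is what the code does.

```python
import numpy as np

rng = np.random.default_rng(4)
n_pts, n_theta = 6, 3
A = rng.normal(size=(n_pts, n_pts))
H0 = A @ A.T + n_pts * np.eye(n_pts)
T_p, D = rng.normal(size=n_pts), rng.normal(size=n_pts)
U = rng.normal(size=(n_theta, n_pts))   # row alpha is the shift vector u_alpha
sig = np.array([0.5, 1.0, 2.0])         # hypothetical, deliberately unequal prior widths

def exponent(theta):
    """Bracketed exponent of Eq. (2.20) for one vector of nuisance parameters."""
    r = T_p + (theta / sig) @ U - D
    return r @ H0 @ r + np.sum((theta / sig) ** 2)

# M^{-1}_{ab} = (delta_ab + u_a H0 u_b) / (sigma_a sigma_b)
M_inv = (np.eye(n_theta) + U @ H0 @ U.T) / np.outer(sig, sig)
M = np.linalg.inv(M_inv)
# theta_bar_a = sum_d M_ad (1/sigma_d) u_d H0 (D - T')
theta_bar = M @ ((U @ H0 @ (D - T_p)) / sig)
# Eq. (2.23): H = (H0^{-1} + sum_a u_a u_a^T)^{-1}
H = np.linalg.inv(np.linalg.inv(H0) + U.T @ U)

def exponent_eq224(theta):
    """Bracket of Eq. (2.24): nuisance-parameter penalty plus chi^2 with H."""
    dt = theta - theta_bar
    d = T_p - D
    return dt @ M_inv @ dt + d @ H @ d

theta_test = rng.normal(size=n_theta)
print(np.isclose(exponent(theta_test), exponent_eq224(theta_test)))   # True
```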

2.3 Decorrelated parameters

In the treatment above we investigated the case of correlated parameters, whereby the Hessian matrix was redefined in Eq. (2.23). In performing this redefinition we sacrifice the information contained within \(u_{\alpha ,i}u_{\alpha , j}\) in order to gain information about the correlations between the original PDF parameters making up \(H^{0}_{ij}\) and any new N\(^{3}\)LO nuisance parameters. As stated earlier, in this case we can perform a fit to find \(H_{ij}\), but we are then unable to separate this Hessian matrix into its individual contributions.

As will be discussed in later sections, the K-factors we include in the N\(^{3}\)LO additions are somewhat more distinct from the other N\(^{3}\)LO parameters considered. The reason is that not only do they act directly on the cross section data, but they are also included for processes other than inclusive DIS.Footnote 2

Hence, we have some justification to include the aN\(^{3}\)LO K-factors’ nuisance parameters as completely decorrelated from other PDF parameters (including other N\(^{3}\)LO theory parameters). To do this we rewrite Eq. (2.23) as,

$$\begin{aligned}{} & {} \left( \left( H^{0}_{ij}\right) ^{-1} + \sum ^{N_{\theta ^{\prime }}}_{\alpha = 1} u_{\alpha ,i}u_{\alpha , j} + \sum _{p = 1}^{N_{p}}\sum ^{N_{\theta _{K}}}_{\delta = 1} u_{\delta ,i}^{p}u_{\delta , j}^{p}\right) ^{-1} \nonumber \\{} & {} \quad = \left( H_{ij}^{-1} + \sum _{p = 1}^{N_{p}}K_{ij,p}^{-1} \right) ^{-1}= H_{ij}^{\prime } \end{aligned}$$
(2.25)

where \(N_{\theta ^{\prime }} \rightarrow N_{\theta ^{\prime }} + N_{\theta _{K}}\); \(K_{ij,p}\) defines the extra decorrelated contributions from the N\(^{3}\)LO K-factors’ parameters, stemming from \(N_{p}\) processes; \(H_{ij}\) is the Hessian matrix including correlations with the parameters associated with the N\(^{3}\)LO structure function theory; and \(H_{ij}^{\prime }\) is the fully correlated Hessian matrix. It is therefore possible to construct these matrices separately and perform the normal Hessian eigenvector analysis (described in Sect. 8.3) on each matrix in turn. In doing this, we maintain a high level of flexibility in our description by assuming the sets of parameters (contained in \(H_{ij}^{-1}\) and \(K_{ij,p}\)) to be suitably orthogonal.
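The structure of Eq. (2.25) means the structure-function block and the per-process K-factor blocks can be accumulated separately and only combined at the end. The sketch below illustrates this with random, hypothetical shift vectors grouped under two invented process labels (`proc_a`, `proc_b`); it only checks the matrix algebra, not the eigenvector analysis of Sect. 8.3.

```python
import numpy as np

rng = np.random.default_rng(5)
n_pts = 5
A = rng.normal(size=(n_pts, n_pts))
H0 = A @ A.T + n_pts * np.eye(n_pts)

U_sf = rng.normal(size=(2, n_pts))            # hypothetical structure-function theory shifts
U_K = {"proc_a": rng.normal(size=(1, n_pts)), # hypothetical K-factor shifts, grouped by process p
       "proc_b": rng.normal(size=(2, n_pts))}

# H: correlated with the structure-function nuisance parameters, Eq. (2.23)
H = np.linalg.inv(np.linalg.inv(H0) + U_sf.T @ U_sf)
# K_p^{-1} = sum_delta u^p_delta (u^p_delta)^T for each process p (K_p itself is never inverted)
K_inv = {p: Up.T @ Up for p, Up in U_K.items()}
# Eq. (2.25): H' = (H^{-1} + sum_p K_p^{-1})^{-1}
H_prime = np.linalg.inv(np.linalg.inv(H) + sum(K_inv.values()))

# Same result as folding every shift vector into one augmented covariance at once
U_all = np.vstack([U_sf] + list(U_K.values()))
H_prime_direct = np.linalg.inv(np.linalg.inv(H0) + U_all.T @ U_all)
print(np.allclose(H_prime, H_prime_direct))   # True
```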

3 Structure functions at N\(^{3}\)LO

The general form of a structure function \(F(x,Q^{2})\) is a convolution between the PDFs \(f_{i}(x, Q^{2})\) and a process-dependent coefficient function \(C(x,\alpha _{s}(Q^{2}))\),

$$\begin{aligned} F(x,Q^{2}) = \sum _{i=q,{\bar{q}},g}\left[ C_{i}(\alpha _{s}(Q^{2}))\otimes f_{i}(Q^{2})\right] (x) \end{aligned}$$
(3.1)

where we have the sum over all partons i and implicitly set the factorisation and renormalisation scales as \(\mu _{f}^{2} = \mu _{r}^{2} = Q^{2}\), a choice that will be used throughout this paper for DIS scales. We also note that the relevant charge weightings are implicit in the definition of the coefficient function for each parton.
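For readers less familiar with the \(\otimes \) notation, the sketch below implements the standard Mellin convolution \([C\otimes f](x) = \int _{x}^{1} \frac{dz}{z}\, C(z)\, f(x/z)\) for regular functions only. Real coefficient functions also contain \(\delta (1-x)\) terms and plus-distributions, which this toy version does not handle; the function name and the midpoint quadrature in \(\ln (1/z)\) are illustrative choices, not the MSHT implementation.

```python
import math

def mellin_convolution(C, f, x, n=2000):
    """[C (x) f](x) = int_x^1 (dz/z) C(z) f(x/z) for regular functions C and f.

    Uses the substitution z = exp(-y), so dz/z = -dy and y runs over [0, ln(1/x)],
    followed by a midpoint rule with n intervals.
    """
    y_max = math.log(1.0 / x)
    h = y_max / n
    total = 0.0
    for k in range(n):
        z = math.exp(-(k + 0.5) * h)
        total += C(z) * f(x / z)
    return total * h

# With C(z) = 1 and f(y) = 1 the convolution is ln(1/x)
print(mellin_convolution(lambda z: 1.0, lambda y: 1.0, 0.1))   # ≈ ln(10)
```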

In Eq. (3.1), the perturbative and non-perturbative regimes are separated out into coefficient functions \(C_{i}\) and PDFs \(f_{i}\) respectively. Since these coefficient functions are perturbative quantities, they are an important aspect to consider when transitioning to N\(^{3}\)LO.

The PDFs \(f_{i}(x, Q^{2})\) in Eq. (3.1) are non-perturbative quantities. However, their evolution in \(Q^{2}\) is perturbatively calculable. In a PDF fit, the PDFs are parameterised at a chosen starting scale \(Q^{2}_{0}\), which is in general different to the scale \(Q^{2}\) at which an observable (such as \(F(x,Q^{2})\)) is calculated. It is therefore important that we are able to accurately evolve the PDFs from \(Q_{0}^{2}\) to the required \(Q^{2}\) to ensure a fully consistent and physical calculation. To permit this evolution, we introduce the standard factorisation scale \(\mu _{f}\).

The flavour singlet distribution is defined as,

$$\begin{aligned} \Sigma (x,\mu _{f}^{2})=\sum ^{n_{f}}_{i=1}\left[ q_{i}(x,\mu _{f}^{2}) +{\overline{q}}_{i}(x,\mu _{f}^{2})\right] , \end{aligned}$$
(3.2)

where \(q_{i}(x,\mu _{f}^{2})\) and \({\overline{q}}_{i}(x,\mu _{f}^{2})\) are the quark and anti-quark distributions respectively, as a function of Bjorken x and the factorisation scale \(\mu _{f}^{2}\). The summation in Eq. (3.2) runs over all flavours of (anti-)quarks i up to the number of available flavours \(n_{f}\).

This singlet distribution is inherently coupled to the gluon density. Because of this, we must consider the gluon carefully when describing the evolution of the flavour singlet distribution with the energy scale \(\mu _{f}\). The Dokshitzer–Gribov–Lipatov–Altarelli–Parisi (DGLAP) [20] equations that govern this evolution are:

$$\begin{aligned} \frac{d {\varvec{f}}}{d \ln \mu _{f}^{2}}\equiv & {} \frac{d}{d \ln \mu _{f}^{2}}\left( \begin{array}{c}{\Sigma } \\ {g}\end{array}\right) \nonumber \\= & {} \left( \begin{array}{cc}{P_{q q}} &{} {\ n_{f}P_{q g}} \\ {P_{g q}} &{} {P_{g g}}\end{array}\right) \otimes \left( \begin{array}{c}{\Sigma } \\ {g}\end{array}\right) \equiv {\varvec{P}} \otimes {\varvec{f}} \end{aligned}$$
(3.3)

where \(P_{ij}: i,j \in q,g\) are the splitting functions and the factorisation scale \(\mu _{f}\) allows the required evolution up to the physical scale \(Q^{2}\). The matrix of splitting functions \({\varvec{P}}\) couples the singlet and gluon distributions by means of a convolution in the momentum fraction x. We note here that \(P_{qq}\equiv P_{q \rightarrow gq}\) is decomposed into non-singlet (NS) and pure-singlet (PS) parts defined by,

$$\begin{aligned} P_{qq}(x)=P^{+}_{{\textrm{NS}}}(x)+P_{{\textrm{PS}}}(x), \end{aligned}$$
(3.4)

where \(P^{+}_{NS}\) is a non-singlet splitting function, which has been calculated approximately to four loops in [21].Footnote 3 The non-singlet part of \(P_{qq}\) dominates at large x, but as \(x\rightarrow 0\) this contribution is highly suppressed due to the relevant QCD sum rules. On the other hand, due to the involvement of the gluon in the pure-singlet splitting function (as described above), this contribution grows towards small x and therefore begins to dominate.

Turning to the splitting function matrix, each element can be expanded perturbatively as a function of \(\alpha _s\) up to N\(^{3}\)LO as,

$$\begin{aligned} {\varvec{P}}(x,\alpha _{s})= & {} \alpha _{s}{\varvec{P}}^{(0)}(x)+\alpha _{s}^{2}{\varvec{P}}^{(1)}(x)\nonumber \\{} & {} +\,\alpha _{s}^{3}{\varvec{P}}^{(2)}(x)+\alpha _{s}^{4}{\varvec{P}}^{(3)}(x)+\cdots , \end{aligned}$$
(3.5)

where we have omitted the scale argument of \(\alpha _{s}(\mu _{r}^{2}=\mu _{f}^{2})\equiv \alpha _{s}\) for brevity and \({\varvec{P}}^{(0)}\), \({\varvec{P}}^{(1)}\), \({\varvec{P}}^{(2)}\) are known [20, 22,23,24,25,26,27]. \({\varvec{P}}^{(3)}\) are the four-loop quantities which we approximate in Sect. 4 using information from [21, 28,29,30,31,32,33,34,35,36].
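In Mellin space the convolution in Eq. (3.3) becomes an ordinary matrix product, so each moment N evolves as a coupled 2×2 linear system. The toy sketch below evolves the N = 2 (momentum) moments of \(\Sigma \) and g at LO only, with a fixed coupling, using textbook LO splitting-function moments normalised to \(\alpha _{s}/(2\pi )\) (a different normalisation from the \(\alpha _{s}\) expansion in Eq. (3.5)), with the \(n_{f}\) and quark-plus-antiquark factors folded into the qg entry. The starting fractions and coupling value are hypothetical. The columns of this matrix sum to zero, which is momentum conservation, so \(\Sigma + g\) should stay fixed.

```python
import numpy as np

def lo_matrix_n2(nf):
    """LO singlet splitting-function matrix at Mellin moment N = 2, normalised to alpha_s/(2 pi).

    Entries: P_qq(2) = -4 C_F/3, P_qg(2) = nf/3 (quark + antiquark, T_R = 1/2),
    P_gq(2) = 4 C_F/3, P_gg(2) = -nf/3.
    """
    CF = 4.0 / 3.0
    return np.array([[-4.0 * CF / 3.0, nf / 3.0],
                     [4.0 * CF / 3.0, -nf / 3.0]])

def evolve(f0, P, a, t_max, steps=1000):
    """Integrate df/dt = a * P f with t = ln(mu_f^2 / mu_0^2) and fixed a = alpha_s/(2 pi), via RK4."""
    f = np.array(f0, dtype=float)
    h = t_max / steps
    for _ in range(steps):
        k1 = a * P @ f
        k2 = a * P @ (f + 0.5 * h * k1)
        k3 = a * P @ (f + 0.5 * h * k2)
        k4 = a * P @ (f + h * k3)
        f = f + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return f

P = lo_matrix_n2(nf=4)
f0 = np.array([0.55, 0.45])   # hypothetical momentum fractions of Sigma and g at Q0^2
f1 = evolve(f0, P, a=0.2 / (2 * np.pi), t_max=np.log(100.0))
print(f1, f1.sum())           # total momentum stays 1.0
```

For a constant matrix the exact solution is a matrix exponential, so the RK4 result can be cross-checked against an eigen-decomposition, as in the test below.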

Considering Eq. (3.1), \(\Sigma (Q^{2})\) and \(g(Q^{2})\) are the singlet and gluon PDFs respectively, evolved to the required \(Q^{2}\) energy of the process via Eq. (3.3). For more information on the relevant formulae used in this convolution, the reader is referred to [37].

Thus far, we have limited our discussion to only light quark flavours. However, as we move through the full range of \(Q^{2}\) values, the number of partons which are kinematically accessible increases. More specifically, as we pass over the charm and bottom mass thresholds (where \(Q^{2} = m_{c, b}^{2}\)) we must account for the heavy quark PDFs and their corresponding contributions.

To deal with the heavy quark contributions to the total structure function, whilst remaining consistent with the light quark picture described above, we consider

$$\begin{aligned} f_{\alpha }^{n_{f} + 1}(x, Q^{2}) = \left[ A_{\alpha i}(Q^{2}/m_{h}^{2}) \otimes f_{i}^{n_{f}}(Q^{2})\right] (x), \end{aligned}$$
(3.6)

where we have an implied summation over partons i and \(A_{\alpha i}\) are the heavy flavour transition matrix elements [38, 39] which explicitly depend on the heavy flavour mass threshold \(m_{h}\), where these contributions are activated.Footnote 4 We also denote the PDFs as \(f_{i}^{n_{f}}\) and \(f_{i}^{n_{f} + 1}\) to indicate whether the PDF has been evolved with only light flavours (\(n_{f}\)) or also with heavy flavours (\(n_{f} + 1\)). In this work we only consider contributions at heavy flavour threshold i.e. where \(Q^{2} = m_{h}^{2}\). We then define the PDFs:

$$\begin{aligned} f_{q}^{n_{f} + 1}(x, Q^{2})= & {} \left[ A_{qq,H}(Q^{2}/m_{h}^{2}) \otimes f_{q}^{n_{f}}(Q^{2}) \right. \nonumber \\{} & {} \left. + A_{qg,H}(Q^{2}/m_{h}^{2}) \otimes f_{g}^{n_{f}}(Q^{2})\right] (x) \end{aligned}$$
(3.7a)
$$\begin{aligned} f_{g}^{n_{f} + 1}(x, Q^{2})= & {} \left[ A_{gq,H}(Q^{2}/m_{h}^{2}) \otimes f_{q}^{n_{f}}(Q^{2}) \right. \nonumber \\{} & {} \left. + A_{gg,H}(Q^{2}/m_{h}^{2}) \otimes f_{g}^{n_{f}}(Q^{2})\right] (x) \end{aligned}$$
(3.7b)
$$\begin{aligned} f_{H}^{n_{f} + 1}(x, Q^{2})= & {} \left[ A_{Hq}(Q^{2}/m_{h}^{2}) \otimes f_{q}^{n_{f}}(Q^{2}) \right. \nonumber \\{} & {} \left. + A_{Hg}(Q^{2}/m_{h}^{2}) \otimes f_{g}^{n_{f}}(Q^{2})\right] (x) \end{aligned}$$
(3.7c)

where there is an implicit summation over the light quark flavours q, and the theoretical description has been generalised to include heavy flavour contributions.Footnote 5 Equations (3.7a) and (3.7b) are the light quark and gluon PDFs defined earlier, modified to include contributions mediated by heavy flavour loops, while Eq. (3.7c) describes the heavy flavour PDF, calculated perturbatively from the light quark and gluon PDFs.

By considering the number of vertices (and hence orders of \(\alpha _{s}\)) required for each of these transition matrix elements to contribute to their relevant ‘output’ partons, we are immediately able to show:

$$\begin{aligned} A_{qq,H}&= \delta (1-x)\ +\ {\mathcal {O}}(\alpha _{s}^{2}) \quad A_{gg,H} = \delta (1-x)\ +\ {\mathcal {O}}(\alpha _{s}) \nonumber \\ A_{qg,H}&= {\mathcal {O}}(\alpha _{s}^{2}) \quad \qquad \qquad \qquad \quad A_{Hq} = {\mathcal {O}}(\alpha _{s}^{2}) \nonumber \\ A_{gq,H}&= {\mathcal {O}}(\alpha _{s}^{2}) \quad \qquad \qquad \qquad \quad A_{Hg}= {\mathcal {O}}(\alpha _{s}) \end{aligned}$$
(3.8)

where \(A_{qq,H}\) and \(A_{gg,H}\) include LO \(\delta \)-functions, which ensure that this description is consistent with the light quark picture discussed earlier. It is therefore the \(A_{Hg}\) transition matrix element that provides the lowest-order contribution to the heavy flavour sector (i.e. \(g \rightarrow H {\overline{H}}\)).

The scale-independent contributions to \(A_{\alpha i}\) introduce unwanted discontinuities into the PDF evolution at NNLO. To ensure that the structure functions remain smooth and valid across \((x, Q^{2})\), these discontinuities must be compensated elsewhere in the structure function picture. By equating the description above the mass threshold \(m_{h}^{2}\) (in terms of the total number of flavours, including heavy quarks) with that below it, the discontinuities can be absorbed by a suitable redefinition of the coefficient functions. This procedure provides the foundation for the different flavour number schemes.

There are two flavour number schemes, each preferred in a different region of \(Q^2\). For \(Q^2 \le m_{h}^2\) we adopt the Fixed Flavour Number Scheme (FFNS). As \(\frac{Q^2}{m_{h}^2} \rightarrow \infty \), the heavy quarks can be treated as massless, and the Zero Mass Variable Flavour Number Scheme (ZM-VFNS) is appropriate. To join the FFNS and ZM-VFNS seamlessly, we ultimately use the General Mass Variable Flavour Number Scheme (GM-VFNS) [40], which is valid across all \(Q^2\). This scheme accounts for the discontinuities from the transition matrix elements and re-establishes a smooth description of the structure functions.

In [41] an ambiguity in the definition of the GM-VFNS was pointed out, namely the freedom to shuffle \({\mathcal {O}}(m_{h}^{2}/Q^{2})\) terms without violating the definition of the scheme. Since [42], MSHT PDFs have employed the TR scheme to fix the distribution of \({\mathcal {O}}(m_{h}^{2}/Q^{2})\) terms, the details of which can be found in [41, 43, 44]. The general method for relating the FFNS and GM-VFNS is to compare their predictions for an observable, e.g. the \(F_2\) structure function, which in the FFNS reads:

$$\begin{aligned} F_{2}(x,Q^2)&= F_{2,q}(x, Q^{2})\ +\ F_{2, H}(x, Q^{2}) \nonumber \\&= C_{q,i}^{{\textrm{FF}},\ n_{f}}\otimes f_{i}^{n_{f}}(Q^2) + C_{H,i}^{{\textrm{FF}},\ n_{f}}\otimes f_{i}^{n_{f}}(Q^2) \nonumber \\&= C_{q,q}^{{\textrm{FF}},\ n_{f}}\otimes f_{q}^{n_{f}}(Q^2) + C_{q,g}^{{\textrm{FF}},\ n_{f}}\otimes f_{g}^{n_{f}}(Q^2)\nonumber \\&\quad + C_{H,q}^{{\textrm{FF}},\ n_{f}}\otimes f_{q}^{n_{f}}(Q^2) + C_{H,g}^{{\textrm{FF}},\ n_{f}}\otimes f_{g}^{n_{f}}(Q^2) \end{aligned}$$
(3.9)

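and in the GM-VFNS,

$$\begin{aligned} F_{2}(x,Q^2)&= F_{2,q}(x, Q^{2})\ +\ F_{2, H}(x, Q^{2}) \nonumber \\&= C_{q,\alpha }^{{\textrm{VF}},\ n_{f}+1}\otimes f_{\alpha }^{n_{f}+1}(Q^2) + C_{H,\alpha }^{{\textrm{VF}},\ n_{f}+1}\otimes f_{\alpha }^{n_{f}+1}(Q^2) \nonumber \\&= C_{q,\alpha }^{{\textrm{VF}},\ n_{f}+1}\otimes A_{\alpha i}(Q^2/m_{h}^2)\otimes f_{i}^{n_{f}}(Q^2) \nonumber \\&\quad + C_{H,\alpha }^{{\textrm{VF}},\ n_{f}+1}\otimes A_{\alpha i}(Q^2/m_{h}^2)\otimes f_{i}^{n_{f}}(Q^2) \end{aligned}$$
(3.10)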

where \(F_{2,q}\) and \(F_{2,H}\) are the light and heavy flavour structure functions, respectively.Footnote 6 Here \(C^{{\textrm{FF}}, n_{f}}\) and \(C^{{\textrm{VF}}, n_{f}+1}\) are the FFNS and GM-VFNS coefficient functions, respectively, and \(A_{\alpha i}(Q^2/m_{h}^2)\) are the transition matrix elements. The FFNS coefficient functions are known up to NLO [45, 46], with some information at NNLO [47,48,49], including the high-\(Q^{2}\) transition matrix elements at \({\mathcal {O}}(\alpha _{s}^{3})\) [49,50,51,52,53,54,55]. The above also applies to other structure functions; for clarity, in the following we consider the light and heavy flavour structure functions separately.

3.1 \(F_{2,q}\)

Expanding the first term in Eq. (3.10) in terms of the transition matrix elements results in,

$$\begin{aligned} F_{2,q}(x,Q^2)&= C_{q, H}^{{\textrm{VF}},\ n_{f}+1}\otimes \bigg [A_{Hq}(Q^2/m_{h}^2)\otimes f_{q}^{n_{f}}(Q^2) \nonumber \\&\quad + A_{Hg}(Q^2/m_{h}^2)\otimes f_{g}^{n_{f}}(Q^2)\bigg ] \nonumber \\&\quad + C_{q, q}^{{\textrm{VF}},\ n_{f}+1}\otimes \bigg [A_{qq, H}(Q^2/m_{h}^2)\otimes f_{q}^{n_{f}}(Q^2) \nonumber \\&\quad + A_{qg, H}(Q^2/m_{h}^2)\otimes f_{g}^{n_{f}}(Q^2)\bigg ] \nonumber \\&\quad + C_{q, g}^{{\textrm{VF}},\ n_{f}+1}\otimes \bigg [A_{gq, H}(Q^2/m_{h}^2)\otimes f_{q}^{n_{f}}(Q^2) \nonumber \\&\quad + A_{gg, H}(Q^2/m_{h}^2)\otimes f_{g}^{n_{f}}(Q^2)\bigg ], \end{aligned}$$
(3.11)

which is valid to all orders. The first term in Eq. (3.11) is the contribution to the light quark structure function from the heavy quark PDF, since the term in square brackets is exactly our definition in Eq. (3.7c). Consequently, the coefficient function \(C_{q,H}\) describes the transition of a heavy quark to a light quark via a gluon, and therefore first appears at NNLO. The second and third terms are the purely light quark and gluon contributions, with additional corrections from heavy quarks at higher orders.

Using the definitions in Eq. (3.8) we can obtain an equation for \(F_{2,q}(x, Q^{2})\) up to \({\mathcal {O}}(\alpha _{s}^{3})\) as,

$$\begin{aligned}{} & {} F_{2,q}(x, Q^{2}) = C_{q,q}^{{\textrm{VF}},\ (0)} \otimes f_{q}(Q^{2})\nonumber \\{} & {} \quad + \frac{\alpha _s}{4\pi }\ \bigg \{C_{q,q,\ n_{f}+1}^{{\textrm{VF}},\ (1)}\otimes f_{q}(Q^{2}) + C_{q,g,\ n_{f}+1}^{{\textrm{VF}},\ (1)}\otimes f_{g}(Q^{2})\bigg \} \nonumber \\{} & {} \quad + \left( \frac{\alpha _s}{4\pi }\right) ^{2}\ \bigg \{\bigg [C_{q,q,\ n_{f}+1}^{{\textrm{VF}},\ (2)} + C_{q,q}^{{\textrm{VF}},\ (0)} \otimes A_{qq,H}^{(2)}\bigg ]\otimes f_{q}(Q^{2}) \nonumber \\{} & {} \quad + \bigg [C_{q,g,\ n_{f}+1}^{{\textrm{VF}},\ (2)}+ C_{q,g,\ n_{f}+1}^{{\textrm{VF}},\ (1)}\otimes A_{gg, H}^{(1)} \nonumber \\{} & {} \quad + C_{q,q}^{{\textrm{VF}},\ (0)} \otimes A_{qg,H}^{(2)}\bigg ]\otimes f_{g}(Q^{2})\bigg \} \nonumber \\{} & {} \quad + \left( \frac{\alpha _s}{4\pi }\right) ^{3}\ \bigg \{\bigg [C_{q,q,\ n_{f}+1}^{{\textrm{VF}},\ (3)} + C_{q,q,\ n_{f}+1}^{{\textrm{VF}},\ (1)}\otimes A_{qq, H}^{(2)} \nonumber \\{} & {} \quad + C_{q,g,\ n_{f}+1}^{{\textrm{VF}},\ (1)}\otimes A_{gq, H}^{(2)}+ C_{q,q}^{{\textrm{VF}},\ (0)} \otimes A_{qq,H}^{(3)}\bigg ]\otimes f_{q}(Q^{2}) \nonumber \\{} & {} \quad + \bigg [C_{q,g,\ n_{f}+1}^{{\textrm{VF}},\ (3)} + C_{q,g,\ n_{f}+1}^{{\textrm{VF}},\ (1)}\otimes A_{gg, H}^{(2)} + C_{q,q,\ n_{f}+1}^{{\textrm{VF}},\ (1)}\otimes A_{qg, H}^{(2)} \nonumber \\{} & {} \quad + C_{q,g,\ n_{f}+1}^{{\textrm{VF}},\ (2)}\otimes A_{gg, H}^{(1)}+ C_{q,q}^{{\textrm{VF}},\ (0)} \otimes A_{qg,H}^{(3)}\bigg ]\otimes f_{g}(Q^{2}) \nonumber \\{} & {} \quad + C_{q,H}^{{\textrm{VF}},\ (2)}\otimes A_{Hg}^{(1)}\otimes f_{g}(Q^{2})\bigg \} + {\mathcal {O}}(\alpha _{s}^{4}) \end{aligned}$$
(3.12)

where \(C_{q,q}^{{\textrm{VF}},\ (0)} = \delta (1-x)\) up to charge weighting. Eq. (3.12) defines the light quark structure function to N\(^{3}\)LO including heavy flavour corrections.Footnote 7
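The bracket structure of Eq. (3.12) follows from order counting alone: a product \(C_{q,\alpha }^{(i)} \otimes A_{\alpha k}^{(j)}\) multiplies \((\alpha _s/4\pi )^{n} f_{k}\) only if \(i + j = n\) and both factors are non-zero at those orders. The bookkeeping sketch below (Python) encodes the starting orders of Eq. (3.8), including \(A_{qq,H}^{(1)} = 0\), together with the lowest orders of the coefficient functions as they appear in Eqs. (3.12) and (3.14), and lists the contributing products. It manipulates labels only and assumes every function is non-zero from its starting order onwards; the names `A_ORDERS`, `C_START` and `terms` are illustrative.

```python
# Orders n of (alpha_s/4pi)^n at which each transition matrix element
# A_{alpha k} is non-zero, read off Eq. (3.8) and truncated at n = 3.
# Order 0 is the delta(1-x) term, which acts as the identity.
A_ORDERS = {
    ("q", "q"): {0, 2, 3},     # A_{qq,H} = delta(1-x) + O(alpha_s^2)
    ("g", "g"): {0, 1, 2, 3},  # A_{gg,H} = delta(1-x) + O(alpha_s)
    ("q", "g"): {2, 3},        # A_{qg,H}
    ("g", "q"): {2, 3},        # A_{gq,H}
    ("H", "q"): {2, 3},        # A_{Hq}
    ("H", "g"): {1, 2, 3},     # A_{Hg}
}

# Lowest order of each GM-VFNS coefficient function C_{s,alpha}, as read
# off Eqs. (3.12) and (3.14); higher orders are assumed to be non-zero.
C_START = {
    "q": {"q": 0, "g": 1, "H": 2},  # C_{q,q}, C_{q,g}, C_{q,H}
    "H": {"H": 0, "g": 1, "q": 2},  # C_{H,H}, C_{H,g}, C_{H,q}
}


def terms(s, k, n):
    """Products C_{s,alpha}^(i) ⊗ A_{alpha k}^(j) with i + j = n, multiplying f_k."""
    out = []
    for alpha, i_min in C_START[s].items():
        for j in sorted(A_ORDERS.get((alpha, k), set())):
            i = n - j
            if i >= i_min:
                tail = "" if j == 0 else f" ⊗ A_{alpha}{k}^({j})"
                out.append(f"C_{s}{alpha}^({i}){tail}")
    return out


for s in ("q", "H"):
    for n in range(4):
        for k in ("q", "g"):
            found = terms(s, k, n)
            if found:
                print(f"F_2{s}  (as/4pi)^{n}  f_{k}:  " + " + ".join(found))
```

Running it lists, bracket by bracket, the same products as Eq. (3.12) for \(F_{2,q}\) and Eq. (3.14) for \(F_{2,H}\), with \(C_{q,q}^{(0)}\) and \(C_{H,H}^{(0)}\) left as explicit factors.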

3.2 \(F_{2,H}\)

Turning to the heavy quark structure function in Eq. (3.9), the second term in Eq. (3.10) can be expanded in terms of the transition matrix elements in the same way to obtain,

$$\begin{aligned} F_{2,H}(x,Q^2)&= C_{H, H}^{{\textrm{VF}},\ n_{f}+1}\otimes \bigg [A_{Hq}(Q^2/m_{h}^2)\otimes f_{q}^{n_{f}}(Q^2)\nonumber \\&\quad + A_{Hg}(Q^2/m_{h}^2)\otimes f_{g}^{n_{f}}(Q^2)\bigg ] \nonumber \\&\quad + C_{H, q}^{{\textrm{VF}},\ n_{f}+1}\otimes \bigg [A_{qq, H}(Q^2/m_{h}^2)\otimes f_{q}^{n_{f}}(Q^2) \nonumber \\&\quad + A_{qg, H}(Q^2/m_{h}^2)\otimes f_{g}^{n_{f}}(Q^2)\bigg ] \nonumber \\&\quad + C_{H, g}^{{\textrm{VF}},\ n_{f}+1}\otimes \bigg [A_{gq, H}(Q^2/m_{h}^2)\otimes f_{q}^{n_{f}}(Q^2) \nonumber \\&\quad + A_{gg, H}(Q^2/m_{h}^2)\otimes f_{g}^{n_{f}}(Q^2)\bigg ], \end{aligned}$$
(3.13)

which is valid to all orders. As in Eq. (3.11), the three terms are the contributions from the heavy flavour quarks, the light quarks and the gluon, respectively. In this case, however, the light quark coefficient function \(C_{H,q}\) requires a gluon intermediary and therefore first appears at NNLO, whereas \(C_{H,g}\) already contributes at NLO via \(g \rightarrow H {\overline{H}}\), as the \({\mathcal {O}}(\alpha _{s})\) term of Eq. (3.14) shows. As for \(C_{H,H}\), we may choose it to be identical to the ZM-VFNS light quark coefficient function \(C_{q,q}\) up to kinematic suppression factors, since these functions must coincide as \(Q^{2} \rightarrow \infty \) [40, 44, 56].

The full heavy flavour structure function then reads as,

$$\begin{aligned}{} & {} F_{2,H}(x, Q^{2}) = \frac{\alpha _s}{4\pi }\bigg [C_{H,g}^{{\textrm{VF}},\ (1)} + C_{H,H}^{{\textrm{VF}},\ (0)}\otimes A_{Hg}^{(1)}\bigg ]\otimes f_{g}(Q^{2}) \nonumber \\{} & {} \quad + \bigg (\frac{\alpha _s}{4\pi }\bigg )^{2} \bigg \{\bigg [C_{H,q}^{{\textrm{VF}},\ (2)} + C_{H,H}^{{\textrm{VF}},\ (0)}\otimes A_{Hq}^{(2)}\bigg ]\otimes f_{q}(Q^{2}) \nonumber \\{} & {} \quad + \bigg [C_{H,g}^{{\textrm{VF}},\ (2)} + C_{H,g}^{{\textrm{VF}},\ (1)}\otimes A_{gg,H}^{(1)} + C_{H,H}^{{\textrm{VF}},\ (1)}\otimes A_{Hg}^{(1)} \nonumber \\{} & {} \quad + C_{H,H}^{{\textrm{VF}},\ (0)}\otimes A_{Hg}^{(2)}\bigg ]\otimes f_{g}(Q^{2})\bigg \} \nonumber \\{} & {} \quad + \bigg (\frac{\alpha _s}{4\pi }\bigg )^{3} \bigg \{\bigg [C_{H,q}^{{\textrm{VF}},\ (3)} + C_{H,g}^{{\textrm{VF}},\ (1)}\otimes A_{gq,H}^{(2)} \nonumber \\{} & {} \quad + C_{H,H}^{{\textrm{VF}},\ (1)}\otimes A_{Hq}^{(2)} + C_{H,H}^{{\textrm{VF}},\ (0)}\otimes A_{Hq}^{(3)}\bigg ]\otimes f_{q}(Q^{2}) \nonumber \\{} & {} \quad + \bigg [C_{H,g}^{{\textrm{VF}},\ (3)} + C_{H,g}^{{\textrm{VF}},\ (2)}\otimes A_{gg,H}^{(1)} + C_{H,g}^{{\textrm{VF}},\ (1)}\otimes A_{gg,H}^{(2)}\nonumber \\{} & {} \quad + C_{H,H}^{{\textrm{VF}},\ (2)}\otimes A_{Hg}^{(1)} \nonumber \\{} & {} \quad + C_{H,H}^{{\textrm{VF}},\ (1)}\otimes A_{Hg}^{(2)} + C_{H,H}^{{\textrm{VF}},\ (0)}\otimes A_{Hg}^{(3)}\bigg ]\otimes f_{g}(Q^{2})\bigg \} \end{aligned}$$
(3.14)

Combining Eqs. (3.12) and (3.14) gives the full structure function \(F_{2}(x, Q^{2})\). Equating the FFNS expansion of Eq. (3.9) with the above GM-VFNS expressions then yields relationships between the two pictures. In Sect. 6 we use this equivalence to derive the GM-VFNS functions at N\(^{3}\)LO.
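As a simple illustration of such relationships, consider the lowest orders of \(F_{2,H}\). In the FFNS the heavy quark is not a parton, so at \({\mathcal {O}}(\alpha _{s})\) Eq. (3.9) gives \(F_{2,H} = \frac{\alpha _s}{4\pi } C_{H,g}^{{\textrm{FF}},\ (1)} \otimes f_{g}\), whereas the first line of Eq. (3.14) gives the GM-VFNS result. Equating the coefficients of \(f_{g}\) at \({\mathcal {O}}(\alpha _{s})\), and likewise those of \(f_{q}\) at \({\mathcal {O}}(\alpha _{s}^{2})\), where \(C_{H,q}^{{\textrm{FF}}}\) first enters, gives

$$\begin{aligned} C_{H,g}^{{\textrm{VF}},\ (1)}&= C_{H,g}^{{\textrm{FF}},\ (1)} - C_{H,H}^{{\textrm{VF}},\ (0)}\otimes A_{Hg}^{(1)}, \nonumber \\ C_{H,q}^{{\textrm{VF}},\ (2)}&= C_{H,q}^{{\textrm{FF}},\ (2)} - C_{H,H}^{{\textrm{VF}},\ (0)}\otimes A_{Hq}^{(2)}, \end{aligned}$$

i.e. at these orders the GM-VFNS coefficient functions are the FFNS ones with the contribution already generated through the heavy quark PDF subtracted. Higher orders follow in the same way from Eqs. (3.12) and (3.14), with more terms.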

To summarise, we have identified the leading theoretical ingredients entering the structure functions and described how these affect the PDFs. As we discuss further below, some knowledge is already available when these equations are pushed to N\(^{3}\)LO. For example, the N\(^{3}\)LO ZM-VFNS coefficient functions are known exactly for \(n_{f}=3\) [57], while a handful of Mellin moments [21, 35, 36, 50] and the leading small and large-x terms [28,29,30,31,32,33,34, 49, 51,52,53,54] are known for the splitting functions and transition matrix elements at N\(^{3}\)LO. Using this information, we approximate these functions at N\(^{3}\)LO and incorporate the results into the first approximate N\(^{3}\)LO global PDF fit.

4 N\(^{3}\)LO splitting functions

Splitting functions at N\(^{3}\)LO allow a more accurate description of the evolution of the PDFs. These functions are approximated here, and the resulting approximations are included within the framework described in Sect. 2 and in Sect. 4.1 below. In all singlet cases we set \(n_{f} = 4\) before constructing our approximations and ignore any corrections arising from further changes in the number of flavours.Footnote 8 In the non-singlet case, we also calculate the approximate parts of \(P_{qq}^{NS\ (3)}\) with \(n_{f}=4\); however, a relatively large amount of information about the \(n_{f}\)-dependence is available from [21]. In the final result we therefore retain the full \(n_{f}\)-dependence of the non-singlet splitting function.

4.1 Approximation framework: discrete moments

To estimate the missing N\(^3\)LO uncertainty in the splitting functions (and in the transition matrix elements considered in Sect. 5), and ultimately include it in the framework described in Sect. 2.2, we first need an approximation of each function at N\(^{3}\)LO. Here we use the available sets of discrete Mellin moments for each function, together with any exact leading terms already calculated, to obtain N\(^{3}\)LO estimates. To parameterise the unknown N\(^{3}\)LO quantities, we follow an estimation procedure similar to that of [58, 59], using the form,

$$\begin{aligned} F(x)=\sum _{i=1}^{N_{m}}A_{i}f_{i}(x) + f_{e}(x). \end{aligned}$$
(4.1)

In Eq. (4.1), \(N_{m}\) is the number of available moments, \(A_{i}\) are calculable coefficients, \(f_i(x)\) are functions chosen on the basis of our intuition and theoretical understanding of the full function, and \(f_{e}(x)\) encapsulates all the currently known leading exact contributions at either large or small-x. To illustrate this, consider a toy situation in which we are given four data points described by some unknown degree-9 polynomial, and are told in addition that the dominant term at small-x is 3x. One may then attempt to approximate this function by solving a set of 4 simultaneous equations, formed by equating Eq. (4.1) to each of the four data points (or constraints). This gives a unique solution for each chosen set of functions \(\{f_{i}(x)\}\). As a consequence, however, for a given \(\{f_{i}(x)\}\) one has no means of controlling the uncertainty in the approximate solution. To allow a controllable level of uncertainty, one must introduce an extra degree of freedom. We introduce it through an unknown coefficient \(a \equiv A_{N_{m} + 1}\), which for convenience is absorbed into the definition \(f_{e}(x) \rightarrow f_{e}(x, a)\). In this toy example one may then choose the functions \(f_{i}(x)\) as,

$$\begin{aligned} f_{1}(x)&= x^{3} \quad \text {or} \quad x^{4}, \nonumber \\ f_{2}(x)&= x^{5} \quad \text {or} \quad x^{6}, \nonumber \\ f_{3}(x)&= x^{7} \quad \text {or} \quad x^{8}, \nonumber \\ f_{4}(x)&= x^{9}, \nonumber \\ f_{e}(x, a)&= 3x + ax^{2}, \end{aligned}$$
(4.2)

where we have prioritised approximating the small-x behaviour more precisely than the large-x behaviour. This could easily be adapted, or even reversed, depending on which region of x we are most sensitive to; in this paper we focus mainly on small-x. There is also an inherent functional uncertainty from the ambiguity in the choice of \(f_{1,2,3}(x)\) in this toy example. In principle the number of functions in the functional variation can be larger than shown here, and indeed a larger choice of functions is used for all \(f_i(x)\) when we apply the method in subsequent sections. Using these functions, one can assemble a set of potential approximations to the overall polynomial, each uniquely defined by a set of functions and corresponding coefficients \(\{A_{i}, f_{i}\}\) for each value of a.
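The toy construction above can be made concrete. In the minimal sketch below (Python with NumPy), the coefficients of the "unknown" polynomial, the four data points, the grid of a values and the evaluation point are all hypothetical choices for illustration. For every combination of candidate functions in Eq. (4.2) and every a, it solves the \(4 \times 4\) system obtained by equating Eq. (4.1) to the data, and records the spread of the resulting approximations at one value of x.

```python
import itertools
import numpy as np
from numpy.polynomial.polynomial import polyval

# Hypothetical "unknown" degree-9 polynomial whose small-x behaviour is 3x;
# the coefficients c_0..c_9 are arbitrary and only generate the toy data.
TRUE_COEFFS = [0.0, 3.0, -1.0, 2.0, 0.5, -0.7, 1.1, 0.3, -0.4, 0.2]
X_DATA = np.array([0.2, 0.4, 0.6, 0.8])      # four hypothetical data points
Y_DATA = polyval(X_DATA, TRUE_COEFFS)

# Candidate powers of x for f_1, ..., f_4, as in Eq. (4.2).
CANDIDATES = [(3, 4), (5, 6), (7, 8), (9,)]


def f_e(x, a):
    """Known small-x term 3x plus the extra degree of freedom a x^2."""
    return 3.0 * x + a * x**2


def approximation(powers, a):
    """Solve sum_i A_i f_i(x_d) = y_d - f_e(x_d, a) and return F(x)."""
    p = np.array(powers, dtype=float)
    M = X_DATA[:, None] ** p                 # M[d, i] = f_i(x_d)
    A = np.linalg.solve(M, Y_DATA - f_e(X_DATA, a))
    return lambda x: (np.asarray(x, dtype=float)[..., None] ** p) @ A + f_e(x, a)


a_grid = np.linspace(-2.0, 2.0, 5)           # hypothetical prior range for a
x0 = 0.1
values = [approximation(powers, a)(x0)
          for powers in itertools.product(*CANDIDATES) for a in a_grid]
print(f"truth {polyval(x0, TRUE_COEFFS):.5f}; approximations span "
      f"[{min(values):.5f}, {max(values):.5f}]")
```

Since the right-hand side of the linear system is affine in a, each approximation with a fixed choice of functions is an affine function of a at every x. The spread over a range of a is therefore attained at its end points, which is what makes it cheap to absorb the functional ambiguity into an enlarged range of a for a single fixed functional form.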

As mentioned, for the N\(^{3}\)LO additions considered in this framework we use the available calculated moments as constraints in the corresponding simultaneous equations. A summary of all the known ingredients used in the N\(^{3}\)LO approximations is provided in Appendix A, and these quantities are discussed in detail in Sects. 4.2 and 5.1. We also note that at small-x, the leading terms in the splitting functions and transition matrix elements exhibit the relations,

$$\begin{aligned} F_{gg}(x\rightarrow 0) \simeq&\frac{C_{A}}{C_{F}}F_{gq}(x\rightarrow 0), \end{aligned}$$
(4.3a)
$$\begin{aligned} F_{qq}(x\rightarrow 0) \simeq&\frac{C_{F}}{C_{A}}F_{qg}(x\rightarrow 0), \end{aligned}$$
(4.3b)

where \(F_{ij} \in \{P_{ij}, A_{ij,H}\}\) and \(C_{A}, C_{F}\) are the usual QCD colour factors. Although Eq. (4.3) holds exactly at leading order, it is known to break down at higher orders due to large sub-leading logarithms. We therefore do not impose this relation as a constraint in our approximations; instead, we discuss the validity of Eq. (4.3) by comparison with the aN\(^{3}\)LO functions.
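At leading-logarithmic accuracy, the exact small-x terms used in this section satisfy Eq. (4.3) identically. A minimal check in Python, assuming the SU(3) values \(C_{A} = 3\) and \(C_{F} = 4/3\) and taking the LL coefficients as they appear in \(f_{e}\) in Eqs. (4.4), (4.11), (4.13) and (4.15):

```python
import math

CA, CF = 3.0, 4.0 / 3.0            # SU(3) colour factors (assumed)
ZETA3 = 1.2020569031595942

# LL coefficients multiplying ln^2(1/x)/x (quark row) or ln^3(1/x)/x (gluon row).
ll_qg = CA**3 / (3 * math.pi**4) * (82 / 81 + 2 * ZETA3) / 2          # Eq. (4.4)
ll_qq_ps = CA**2 * CF / (3 * math.pi**4) * (82 / 81 + 2 * ZETA3) / 2  # Eq. (4.11)
ll_gq = CA**3 * CF / (3 * math.pi**4) * ZETA3                         # Eq. (4.13)
ll_gg = CA**4 / (3 * math.pi**4) * ZETA3                              # Eq. (4.15)

print("gg/gq:", ll_gg / ll_gq, "vs CA/CF =", CA / CF)     # Eq. (4.3a)
print("qq/qg:", ll_qq_ps / ll_qg, "vs CF/CA =", CF / CA)  # Eq. (4.3b)
```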

Following [58, 59], we must choose a set of candidate functions for each \(f_{i}(x)\). Our convention is to assign these functions such that \(f_{1}(x)\) is dominant at small-x and \(f_{N_{m}}(x)\) is dominant at large-x, while the \(f_{i}(x)\) with \(i \in \{2,\ldots , N_{m} - 1\}\) dominate in the region between. The sets of functions assigned to each \(f_{i}(x)\) are determined separately for each N\(^{3}\)LO function, based on knowledge from lower orders and our expectations at N\(^{3}\)LO.

As in our toy polynomial example, we include an unknown next-to-leading small-x logarithm (NLL) term (NNLL in the \(P_{gg}\) case) in the \(f_{e}\) function of our parameterisation. The coefficient of this NLL (NNLL) term is controlled by a variational parameter a, which uniquely defines the solution to the set of simultaneous equations, i.e. for each set of functions \(f_{i}(x)\) there is a unique solution for every choice of a. The final step in this approximation is to choose a sensible prior range of variation for a for each N\(^{3}\)LO approximation. To do this, we consider the criteria outlined below:

Criterion 1:

At sufficiently small-x (\(x < 10^{-5}\)), for a fixed value of a, we require \(f_{e}(x, a)\) to lie within the range of variation of F(x) predicted by the combinations of candidate functions (as in (4.2)). That is, after fixing a, \(f_{e}(x, a)\) should lie within the spread of F(x) obtained from the entire set of potential approximations. In practice this means that we require the small-x behaviour not to be in strong tension with the large-x description.

Criterion 2:

At large-x (\(x > 10^{-2}\)) the N\(^3\)LO contribution should have relatively little effect; in particular, we do not expect as large a divergence as at small-x. We therefore require the N\(^{3}\)LO approximation to follow the general trend of the NNLO function at large-x.

The allowed variation in a gives an uncertainty that is, at its foundation, a conservative estimate based on all the available prior knowledge about the function and the lower orders being considered. Note that, since we include known information about the higher order, a value of \(a=0\) is not guaranteed to satisfy either criterion 1 or 2. Indeed, the NLL coefficient in the splitting functions is typically of opposite sign to, and larger than, the LL contribution, as in the NNLO splitting functions and for the known NLL term in the N\(^3\)LO splitting function \(P_{gg}\). To obtain a full predicted uncertainty for each function while allowing a computationally efficient fixed functional form, the variation of a can absorb the uncertainty from the ambiguity in the choice of functions \(f_{i}(x)\), essentially by expanding the allowed range of a, as shown in the following sections. Since the functions are themselves approximations, enlarging the allowed variation of a to encapsulate the total uncertainty predicted by the initial treatment described above is a valid simplification.

A worked example following this procedure is provided for the \(P_{qg}^{(3)}\) and \(A_{Hg}^{(3)}\) functions in Sects. 4.2 and 5.1 respectively.

4.2 4-Loop approximations

4.3 \(P_{qg}^{(3)}\)

We begin by considering the four-loop quark-gluon splitting function. Here we provide a more detailed explanation of the method described in Sect. 4.1 which will then be applied to the remaining splitting functions considered in this section. Four even-integer moments are known for \(P_{qg}^{(3)}(n_{f} = 4)\) from [35, 36], along with the LL small-x term from [28].

The functions made available for the \(P_{qg}\) analysis are,

$$\begin{aligned} f_{1}(x)&= \frac{1}{x} \quad \text {or} \quad \ln ^{4}x \quad \text {or} \quad \ln ^{3} x \quad \text {or} \quad \ln ^{2} x,\nonumber \\ f_{2}(x)&=\ln x, \nonumber \\ f_{3}(x)&= 1 \quad \text {or} \quad x \quad \text {or} \quad x^{2},\nonumber \\ f_{4}(x)&= \ln ^{4}(1-x) \quad \text {or} \quad \ln ^{3}(1-x) \quad \text {or} \nonumber \\ {}&\qquad \qquad \qquad \qquad \qquad \ln ^{2}(1-x) \quad \text {or} \quad \ln (1-x),\nonumber \\ f_{e}(x,\rho _{qg})&=\frac{C_{A}^{3}}{3\pi ^{4}}\bigg (\frac{82}{81} + 2\zeta _{3}\bigg ) \nonumber \\&\quad \times \frac{1}{2}\frac{\ln ^{2}1/x}{x}\ + \rho _{qg}\ \frac{\ln 1/x}{x}, \end{aligned}$$
(4.4)

where \(\rho _{qg}\) is the variational parameter, varied in the range \(-2.5< \rho _{qg} < -0.9\) chosen to satisfy the criteria described in Sect. 4.1. The set of functions in Eq. (4.4) is chosen from an analysis of lower orders: following the pattern of functions at lower orders, we expect the leading large-x term at this order to be \(\ln ^{4}(1-x)\), and the highest power of \(\ln x\) at small-x to be \(\ln ^{4}x\).

Fig. 1

Combinations of functions with an added variational factor (\(\rho _{qg}\)) controlling the NLL term. Combinations of functions at the upper (left) and lower (right) bounds of the variation are shown. The solid lines indicate the upper and lower bounds for this function chosen from the relevant criteria

Figure 1 displays an example of the variation obtained from the different choices of functions across the chosen range of \(\rho _{qg}\). We also show the upper (A) and lower (B) bounds (at small-x) of the entire uncertainty (solid lines), combining the variation from the choice of functions with that from \(\rho _{qg}\). The upper (\(P_{qg}^{(3),A}\)) and lower (\(P_{qg}^{(3), B}\)) bounds are given by,

$$\begin{aligned} P_{qg}^{(3), A}= & {} 1.6699\ \frac{1}{x} + 2.4167\ \ln x \nonumber \\{} & {} - 2.2011\ x^{2} + 0.0024228\ \ln ^{4}(1-x) \nonumber \\{} & {} +\frac{C_{A}^{3}}{3\pi ^{4}}\bigg (\frac{82}{81}\ + 2\zeta _{3}\bigg )\frac{1}{2}\frac{\ln ^{2}1/x}{x}\ -0.9\ \frac{\ln 1/x}{x}, \nonumber \\ \end{aligned}$$
(4.5)
$$\begin{aligned} P_{qg}^{(3), B}= & {} 12.582\ \ln ^{2} x + 5.3065\ \ln x \nonumber \\{} & {} + 1.7957\ x^{2} - 0.0041296\ \ln ^{4}(1-x) \nonumber \\{} & {} +\frac{C_{A}^{3}}{3\pi ^{4}}\bigg (\frac{82}{81}\ + 2\zeta _{3}\bigg )\frac{1}{2}\frac{\ln ^{2}1/x}{x}-2.5\ \frac{\ln 1/x}{x}. \nonumber \\ \end{aligned}$$
(4.6)

Using this information, a fixed functional form is chosen to be,

$$\begin{aligned} P_{qg}^{(3)}= & {} A_{1}\ \ln ^{2} x + A_{2}\ \ln x + A_{3}\ x^{2} + A_{4}\ \ln ^{4}(1-x) \nonumber \\{} & {} +\frac{C_{A}^{3}}{3\pi ^{4}}\bigg (\frac{82}{81}\ + 2\zeta _{3}\bigg )\frac{1}{2}\frac{\ln ^{2}1/x}{x}+ \rho _{qg}\ \frac{\ln 1/x}{x} \end{aligned}$$
(4.7)

and \(\rho _{qg}\) is allowed to vary as \(-2.5< \rho _{qg} < -0.8\). This fixed functional form coincides with the lower bound \(P_{qg}^{(3),B}\), and the extended range of \(\rho _{qg}\) allows the small-x upper bound uncertainty (predicted from \(P_{qg}^{(3),A}\)) to be absorbed into the variation to within \(\sim 1\%\).Footnote 9 In other regions of x this convenient fixed functional form deviates more from the upper bound (\(\sim 10\%\)). However, the function is already relatively small in these regions, so these larger percentage deviations are negligible in practice. Moreover, since the heuristic choice of variation found earlier is intended as a guide, we are not bound to reconstruct it precisely with the fixed functional form, and it is justified to adapt the shape of the variation slightly in less dominant regions.
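In practice the constraints are Mellin moments rather than function values: for each known moment, \(\int _0^1 dx\, x^{N-1}\big [\sum _i A_i f_i(x) + f_e(x, \rho _{qg})\big ]\) must reproduce the known value. The sketch below (Python with NumPy) sets up this linear system for the fixed functional form of Eq. (4.7), assuming the four even moments are \(N = 2, 4, 6, 8\) and computing the moments of each basis function by tanh-sinh quadrature. The target values are placeholders generated from arbitrary coefficients, not the moments of [35, 36]; the function names and the quadrature choice are our own assumptions, and the example only demonstrates that the solver recovers the coefficients it was given.

```python
import math
import numpy as np

CA = 3.0
ZETA3 = 1.2020569031595942
# Known LL coefficient multiplying ln^2(1/x)/x in Eqs. (4.4) and (4.7).
LL_QG = CA**3 / (3 * math.pi**4) * (82 / 81 + 2 * ZETA3) / 2

# Tanh-sinh (double-exponential) quadrature on (0, 1).  x and the weight
# factor x(1-x) are built without cancellation, so the integrable log
# singularities at x = 0 and x = 1 are handled accurately.
H_STEP = 0.01
_t = np.arange(-3.0, 3.0 + H_STEP / 2, H_STEP)
_s = 0.5 * np.pi * np.sinh(_t)
X_NODES = 1.0 / (1.0 + np.exp(-2.0 * _s))
WEIGHTS = H_STEP * np.pi * np.cosh(_t) * X_NODES / (1.0 + np.exp(2.0 * _s))


def moment(g, n):
    """Mellin moment int_0^1 x^(n-1) g(x) dx for a vectorised g."""
    return float(np.sum(WEIGHTS * X_NODES ** (n - 1) * g(X_NODES)))


# Basis of the fixed functional form, Eq. (4.7).
BASIS = [
    lambda x: np.log(x) ** 2,
    lambda x: np.log(x),
    lambda x: x**2,
    lambda x: np.log1p(-x) ** 4,
]


def f_e(x, rho):
    """Known LL term plus the NLL term with variational coefficient rho."""
    L = np.log(1.0 / x)
    return LL_QG * L**2 / x + rho * L / x


def fit_coefficients(targets, rho, ns=(2, 4, 6, 8)):
    """Solve sum_i A_i M[f_i](N) = target_N - M[f_e](N) for the A_i."""
    M = np.array([[moment(f, n) for f in BASIS] for n in ns])
    rhs = np.array([t - moment(lambda x: f_e(x, rho), n)
                    for t, n in zip(targets, ns)])
    return np.linalg.solve(M, rhs)


# Placeholder targets built from arbitrary coefficients (NOT physical values);
# replace them with the published moments to use the sketch in earnest.
A_ARBITRARY = np.array([1.0, -2.0, 0.5, 0.003])
RHO = -1.7
NS = (2, 4, 6, 8)
targets = [sum(c * moment(f, n) for c, f in zip(A_ARBITRARY, BASIS))
           + moment(lambda x: f_e(x, RHO), n) for n in NS]
print(fit_coefficients(targets, RHO, NS))
```

Solving at the two ends of the allowed range, \(\rho _{qg} = -2.5\) and \(-0.8\), gives the two edges of the band. Anomalous dimensions are often quoted with the opposite sign, \(\gamma (N) = -\int _0^1 dx\, x^{N-1} P(x)\), so the sign convention of any input moments should be checked before use.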

4.4 \(P_{qq}^{{\textrm{NS}},\ (3)}\)

As discussed in Sect. 3, the quark-quark splitting function comprises pure-singlet and non-singlet contributions. We approximate each part independently, although the final quark-quark singlet function will be almost completely dominated by the pure-singlet part, except at very high-x.

The four-loop non-singlet splitting function has been the subject of relatively extensive research and is known exactly in a number of regimes. For example, [21] presents some important exact contributions to the four-loop non-singlet splitting functions, along with 8 even-integer moments for each of the \(+\) and − distributions. In this discussion we approximate only the non-singlet \(+\)-distribution, as this is the part that contributes to the full singlet quark-quark splitting function. The other relevant non-singlet distributions \(P_{{\textrm{NS}}}^{(3),\ -}\) and \(P_{{\textrm{NS}}}^{(3),\ {\textrm{sea}}}\) (described in [26]) are set to the central values predicted in [21], since any variation in these functions is negligible. All presently known information is used in this approximation, giving results similar to those of [21] but with our own choice of functions:

$$\begin{aligned} f_{1}(x)&= \frac{1}{(1-x)_{+}}, \quad f_{2}(x) =(1-x)\ \ln (1-x), \nonumber \\ f_{3}(x)&= (1-x)\ \ln ^{2}(1-x), \nonumber \\ f_{4}(x)&= (1-x)\ \ln ^{3}(1-x), \quad f_{5}(x) = 1, \nonumber \\ f_{6}(x)&= x, \quad f_{7}(x) = x^{2}, \quad f_{8}(x)= \ln ^{2} x, \nonumber \\ f_{e}(x, \rho _{qq}^{{\textrm{NS}}})&= C_{F} n_{c}^{3} P_{{\textrm{L}}, 0}^{(3)}(x)+C_{F} n_{c}^{2} n_{f} P_{{\textrm{L}}, 1}^{(3)}(x)\nonumber \\&\quad +P_{L n_{f}}^{(3)+}(x) + \rho _{qq}^{{\textrm{NS}}}\ \ln ^{3} x \nonumber \\&\quad -55.876\ \ln ^{4}x-2.8313\ \ln ^{5}x-0.14883\ \ln ^{6}x\nonumber \\&\quad -2601.7-2118.9\ \ln (1-x) \nonumber \\&\quad + n_{f} \left( 4.6584\ \ln ^{4}x+0.2798\ \ln ^{5}x\right. \nonumber \\&\quad \left. +312.16+337.93\ \ln (1-x)\right) \end{aligned}$$
(4.8)

where the functions \(C_{F} n_{c}^{3} P_{{\textrm{L}}, 0}^{(3)}(x)+C_{F} n_{c}^{2} n_{f} P_{{\textrm{L}}, 1}^{(3)}(x)\) and \(P_{L n_{f}}^{(3)+}(x)\) can be found in Eqs. (4.11) and (4.14) of [21], respectively, and \(\rho _{qq}^{{\textrm{NS}}}\) is our variational parameter. Note that here the ansatz of Eq. (4.1) contains 8 pairs of functions and coefficients, to accommodate the 8 known moments. Within the \(f_{e}(x, \rho _{qq}^{{\textrm{NS}}})\) part of Eq. (4.8), we vary the coefficient of the most divergent unknown small-x term (\(\ln ^{3}x\)) across \(0< \rho _{qq}^{{\textrm{NS}}} < 0.014\). Given the large amount of available information and the correspondingly larger number of functions, we neglect any functional uncertainty and fix each function explicitly. The only uncertainty considered is therefore that from the variation of \(\rho _{qq}^{{\textrm{NS}}}\).

The resulting approximation is then,

$$\begin{aligned} P_{{\textrm{NS}}}^{(3),\ +}= & {} A_{1}\ \frac{1}{(1-x)_{+}} + A_{2}\ (1-x)\ \ln (1-x) \nonumber \\{} & {} + A_{3}\ (1-x)\ \ln ^{2}(1-x) \nonumber \\{} & {} + A_{4}\ (1-x)\ \ln ^{3}(1-x) + A_{5} + A_{6}\ x \nonumber \\{} & {} + A_{7}\ x^{2} + A_{8}\ \ln ^{2}x + f_{e}(x,\rho _{qq}^{{\textrm{NS}}}), \end{aligned}$$
(4.9)

where no alterations are made to the allowed range of \(0< \rho _{qq}^{{\textrm{NS}}} < 0.014\).

4.5 \(P_{qq}^{{\textrm{PS}},\ (3)}\)

We now turn to the pure-singlet part of \(P_{qq}^{(3)}\), with a set of functions chosen for accuracy in the small-x regime. To ensure that \(P_{qq}^{{\textrm{PS}}\ (3)}\) does not interfere with the large-x regime, where the non-singlet contribution dominates, the ansatz of Eq. (4.1) is adapted to:

$$\begin{aligned} P_{ij}^{(3)}(x)= & {} \bigg \{A_{1}f_{1}(x)+A_{2}f_{2}(x)+A_{3}f_{3}(x)\nonumber \\{} & {} +A_{4}f_{4}(x)\bigg \}(1-x)+f_{e}(x, \rho _{qq}^{{\textrm{PS}}}). \end{aligned}$$
(4.10)

The overall factor of \((1-x)\) in this modified parameterisation ensures that any instabilities in the pure-singlet approximation do not wash out the non-singlet behaviour at large-x.
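For the moment constraints, this modification has a simple effect, since for any candidate function

$$\begin{aligned} \int _{0}^{1}dx\ x^{N-1}(1-x)f_{i}(x) = \int _{0}^{1}dx\ x^{N-1}f_{i}(x) - \int _{0}^{1}dx\ x^{N}f_{i}(x), \end{aligned}$$

so the \(N\)-th moment of each damped function is the difference between the \(N\)-th and \((N+1)\)-th moments of the undamped one. Each damped term also vanishes as \(x \rightarrow 1\) whenever \(f_{i}(x)\) grows more slowly than \(1/(1-x)\), which holds for all the candidate functions used below.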

Using four available even-integer moments for \(n_{f} = 4\) [35, 36] and the exact small-x information [28], the chosen set of functions for this approximation is,

$$\begin{aligned} f_{1}(x)&=\frac{1}{x} \quad \text {or} \quad \ln ^{4} x,\nonumber \\ f_{2}(x)&=\ln ^{3} x \quad \text {or} \quad \ln ^{2} x \quad \text {or} \quad \ln x,\nonumber \\ f_{3}(x)&=1 \quad \text {or} \quad x \quad \text {or} \quad x^{2},\nonumber \\ f_{4}(x)&=\ln ^{4}(1-x) \quad \text {or} \quad \ln ^{3}(1-x) \quad \text {or} \quad \ln ^{2}(1-x) \nonumber \\ {}&\qquad \qquad \qquad \qquad \qquad \text {or} \quad \ln (1-x),\nonumber \\ f_{e}(x, \rho _{qq}^{{\textrm{PS}}})&=\frac{C_{A}^{2}C_{F}}{3\pi ^{4}}\bigg (\frac{82}{81}+2\zeta _{3}\bigg )\frac{1}{2}\frac{\ln ^{2}1/x}{x}\nonumber \\&\quad + \rho _{qq}^{{\textrm{PS}}}\ \frac{\ln 1/x}{x}, \end{aligned}$$
(4.11)

where \(\rho _{qq}^{{\textrm{PS}}}\) is varied as \(-0.7< \rho _{qq}^{{\textrm{PS}}} < 0\). Among the stable combinations of these functions, the upper \(P_{{\textrm{PS}}}^{(3),\ A}\) and lower \(P_{{\textrm{PS}}}^{(3),\ B}\) bounds happen to share the same functional form, which we therefore adopt directly as the fixed functional form:

$$\begin{aligned} P_{{\textrm{PS}}}^{(3)}= & {} \bigg \{A_{1}\ \frac{1}{x} + A_{2}\ \ln ^{2}x + A_{3}\ x^{2} \nonumber \\{} & {} + A_{4}\ \ln ^{2}(1-x)\bigg \}(1-x) \nonumber \\{} & {} +\frac{C_{A}^{2}C_{F}}{3\pi ^{4}}\bigg (\frac{82}{81}+2\zeta _{3}\bigg )\frac{1}{2}\frac{\ln ^{2}1/x}{x}\nonumber \\{} & {} +\ \rho _{qq}^{{\textrm{PS}}}\ \frac{\ln 1/x}{x}(1-x) \end{aligned}$$
(4.12)

where the variation of \(\rho _{qq}^{{\textrm{PS}}}\) is unchanged and the entire predicted variation is encapsulated in this form.

4.6 \(P_{gq}^{(3)}\)

As with the previous singlet splitting functions, four even-integer moments for \(n_{f} = 4\) are known [35, 36], along with the LL small-x information [29,30,31]. The candidate functions for this approximation are,

$$\begin{aligned} f_{1}(x)&=\frac{\ln 1/x}{x} \quad \text {or} \quad \frac{1}{x},\nonumber \\ f_{2}(x)&= \ln ^{3}x, \nonumber \\ f_{3}(x)&=x \quad \text {or} \quad x^{2},\nonumber \\ f_{4}(x)&=\ln ^{4}(1-x) \quad \text {or} \quad \ln ^{3}(1-x) \quad \text {or} \nonumber \\ {}&\qquad \qquad \qquad \qquad \qquad \ln ^{2}(1-x) \quad \text {or} \quad \ln (1-x),\nonumber \\ f_{e}(x, \rho _{gq})&=\frac{C_{A}^{3}C_{F}}{3\pi ^{4}}\zeta _{3}\frac{\ln ^{3}1/x}{x} \nonumber \\&\quad + \rho _{gq}\ \frac{\ln ^{2} 1/x}{x}, \end{aligned}$$
(4.13)

where \(\rho _{gq}\) is fixed to \(\rho _{gq} = -1.8\). In this case, the variation from the choice of functions alone is large enough to satisfy the criteria in Sect. 4.1 and to encapsulate a sensible \(\pm 1\sigma \) variation without any further variation in \(\rho _{gq}\). As for the previous approximations, we describe the variation from the stable combinations with the fixed functional form,

$$\begin{aligned} P_{gq}^{(3)}= & {} A_{1}\ \frac{\ln 1/x}{x} + A_{2}\ \ln ^{3} x + A_{3}\ x \nonumber \\{} & {} + A_{4}\ \ln (1-x) + \frac{C_{A}^{3}C_{F}}{3\pi ^{4}}\zeta _{3}\frac{\ln ^{3}1/x}{x}\nonumber \\{} & {} +\ \rho _{gq}\ \frac{\ln ^{2} 1/x}{x} \end{aligned}$$
(4.14)

where the allowed range of \(\rho _{gq}\) is expanded to \(-1.8< \rho _{gq} < -1.5\) to approximate the variation from the choice of functions. As with the \(P_{qg}^{(3)}\) fixed functional form, this new range recovers a variation which is within \(\sim 1\%\) of the original, in the dominant areas of x.

4.7 \(P_{gg}^{(3)}\)

Finally, we turn to the gluon-gluon splitting function, for which four even-integer moments of \(P_{gg}^{(3)}(n_{f} = 4)\) are known from [35, 36]. The list of functions used for the approximation (including the known small-x LL and NLL terms from [29,30,31,32,33]) is,

$$\begin{aligned} f_{1}(x)&=\frac{1}{x} \quad \text {or} \quad \ln ^{3} x \quad \text {or} \quad \ln ^{2} x,\nonumber \\ f_{2}(x)&=\ln x,\nonumber \\ f_{3}(x)&= 1 \quad \text {or} \quad x \quad \text {or} \quad x^{2},\nonumber \\ f_{4}(x)&=\frac{1}{(1-x)_{+}} \quad \text {or} \quad \ln ^{2}(1-x) \quad \text {or} \quad \ln (1-x),\nonumber \\ f_{e}(x, \rho _{gg})&= \frac{C_{A}^{4}}{3\pi ^{4}}\zeta _{3}\frac{\ln ^{3}1/x}{x} +\frac{1}{\pi ^{4}}\bigg [C_{A}^{4}\bigg (-\frac{1205}{162}\nonumber \\&\quad +\frac{67}{36} \zeta _{2}+\frac{1}{4} \zeta _{2}^{2}-\frac{11}{2} \zeta _{3}\bigg ) \nonumber \\&\quad +n_{f} C_{A}^{3}\bigg (-\frac{233}{162}+\frac{13}{36} \zeta _{2}-\frac{1}{3}\zeta _{3}\bigg ) \nonumber \\&\quad +n_{f} C_{A}^{2} C_{F}\bigg (\frac{617}{243} -\frac{13}{18} \zeta _{2}+\frac{2}{3} \zeta _{3}\bigg )\bigg ]\frac{1}{2}\frac{\ln ^{2}1/x}{x}\nonumber \\&\quad +\ \rho _{gg}\ \frac{\ln 1/x}{x}, \end{aligned}$$
(4.15)

where \(\rho _{gg}\) is varied as \(-5< \rho _{gg} < 15\) and \(n_{f} = 4\). The fixed functional form is then chosen to be,

$$\begin{aligned} P_{gg}^{(3)}= & {} A_{1}\ \ln ^{2} x + A_{2}\ \ln x + A_{3}\ x^{2} \nonumber \\{} & {} + A_{4}\ \ln ^{2}(1-x) + \frac{C_{A}^{4}}{3\pi ^{4}}\zeta _{3}\frac{\ln ^{3}1/x}{x} \nonumber \\{} & {} +\frac{1}{\pi ^{4}}\bigg [C_{A}^{4}\bigg (-\frac{1205}{162}+\frac{67}{36} \zeta _{2}+\frac{1}{4} \zeta _{2}^{2}-\frac{11}{2} \zeta _{3}\bigg )\nonumber \\{} & {} +n_{f} C_{A}^{3}\bigg (-\frac{233}{162}+\frac{13}{36} \zeta _{2}-\frac{1}{3}\zeta _{3}\bigg ) \nonumber \\{} & {} +n_{f} C_{A}^{2} C_{F}\bigg (\frac{617}{243} -\frac{13}{18} \zeta _{2}+\frac{2}{3} \zeta _{3}\bigg )\bigg ]\frac{1}{2}\frac{\ln ^{2}1/x}{x}\nonumber \\{} & {} +\ \rho _{gg}\ \frac{\ln 1/x}{x}, \quad (n_{f} = 4) \end{aligned}$$
(4.16)

where we keep the variation of \(\rho _{gg}\) from above, since the fixed functional form already encapsulates the predicted variation without any extra allowed variation in \(\rho _{gg}\).
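The known small-x terms in Eq. (4.16) reduce to fixed numbers once the colour factors are specified. A short Python evaluation, assuming the QCD values \(C_A = 3\) and \(C_F = 4/3\) with \(n_f = 4\), gives an LL coefficient of \(\ln ^{3}(1/x)/x\) of about 0.333 and an NLL coefficient of \(\ln ^{2}(1/x)/x\) of about \(-4.45\); the \(\rho _{gg}\) term then supplies the remaining \(\ln (1/x)/x\) freedom over \(-5< \rho _{gg} < 15\). The function name is illustrative.

```python
import math

C_A, C_F, N_F = 3.0, 4.0 / 3.0, 4
ZETA2 = math.pi**2 / 6.0
ZETA3 = 1.2020569031595942
PI4 = math.pi**4

# Coefficient of ln^3(1/x)/x in Eq. (4.16) (LL term)
LL = C_A**4 / (3.0 * PI4) * ZETA3

# Coefficient of ln^2(1/x)/x in Eq. (4.16), including the explicit factor 1/2
BRACKET = (
    C_A**4 * (-1205.0 / 162 + 67.0 / 36 * ZETA2 + 0.25 * ZETA2**2 - 5.5 * ZETA3)
    + N_F * C_A**3 * (-233.0 / 162 + 13.0 / 36 * ZETA2 - ZETA3 / 3.0)
    + N_F * C_A**2 * C_F * (617.0 / 243 - 13.0 / 18 * ZETA2 + 2.0 / 3.0 * ZETA3)
)
NLL = 0.5 * BRACKET / PI4

def small_x_terms(x, rho_gg):
    """Fixed LL and NLL terms plus rho_gg * ln(1/x)/x, at n_f = 4."""
    lx = math.log(1.0 / x)
    return (LL * lx**3 + NLL * lx**2 + rho_gg * lx) / x

print(f"LL = {LL:.4f}, NLL = {NLL:.4f}")
for x in (1e-5, 1e-3, 1e-1):
    print(x, small_x_terms(x, -5.0), small_x_terms(x, 15.0))
```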

4.8 Predicted aN\(^{3}\)LO splitting functions

Fig. 2 Perturbative expansion up to aN\(^{3}\)LO for the non-singlet splitting function \(P_{qq}^{{\textrm{NS}},\ +}\) including any corresponding allowed \(\pm 1\sigma \) variation (shaded green region). The best fit value (blue dashed line) displays the prediction for this function determined from a global PDF fit.

Fig. 3 Perturbative expansions up to aN\(^{3}\)LO for the quark singlet splitting functions \(P_{qq}^{{\textrm{PS}}}\) (top) and \(P_{qg}\) (bottom) including any corresponding allowed \(\pm 1\sigma \) variation (shaded green region). The best fit values (blue dashed line) display the predictions for each function determined from a global PDF fit.

Fig. 4 Perturbative expansions for the gluon splitting functions \(P_{gq}\) (top) and \(P_{gg}\) (bottom) including any corresponding allowed \(\pm 1\sigma \) variation (shaded green region). The best fit values (blue dashed line) display the predictions for each function determined from a global PDF fit.

Figures 2, 3 and 4 show the perturbative expansions for each splitting function up to approximate N\(^{3}\)LO. Included with these expansions are the predicted variations (\(\pm 1 \sigma \)) from Sect. 4.2 (shown in green) and the aN\(^{3}\)LO best fits (shown in blue and discussed further in Sect. 8). As a general feature, we observe that the singlet N\(^3\)LO approximations are much more divergent than the lower orders due to the presence of higher order logarithms at small-x. This further highlights the need to understand MHOUs beyond the NNLO default of current PDF sets, in a way that does not rely on the NNLO central value.

Considering the non-singlet case shown in Fig. 2, we see close agreement at large-x between \(P_{qq}^{{\textrm{NS}}}\) expanded to NNLO and to aN\(^{3}\)LO. This is a general feature of the non-singlet distribution, which by construction is largely unaffected by small-x contributions. The ratio plot in Fig. 2 makes this clearer: only towards small-x (where the non-singlet distribution tends towards 0) can any noticeable difference between NNLO and aN\(^{3}\)LO be seen.

The contributions to \(P_{qq}^{{\textrm{PS}}}\), \(P_{qg}\), \(P_{gq}\) and \(P_{gg}\), shown in Figs. 3 and 4, display much richer structure at aN\(^{3}\)LO. In all cases, the terms in the approximations that diverge as \(x \rightarrow 0\) have a large effect from intermediate x (\(\sim 10^{-2}\)) down to very small x. The asymptotic relations of Eq. (4.3) (red line), evaluated using the best fit values of the aN\(^{3}\)LO expansions (i.e. comparable to the blue dashed line), are also shown in Figs. 3 and 4. As discussed earlier, these relations are violated by large sub-leading small-x terms and are therefore provided here only as a qualitative comparison. We also observe a close resemblance to the N\(^{3}\)LO asymptotic results in Fig. 4 of [34]. Specifically, for quark evolution (\(P_{qq}^{{\textrm{PS}}}\) and \(P_{qg}\)) the data prefer a form similar to the resummed splitting function results in [34], whereas for gluon evolution this agreement is less prominent.

Superimposed onto these variations in Figs. 3 and 4 are the best fit values for the splitting functions, as predicted from a global fit of the full MSHT approximate N\(^{3}\)LO PDFs. The full fit results are discussed in more detail in Sect. 8; here we note that the fit produces relatively good agreement with the prior allowed variations for each of the splitting functions. For all functions except \(P_{gg}\), the best fit results lie within their \(\pm 1 \sigma \) variation range. This implies that the constraints from the data included in the global fit are in good agreement with the penalties describing quark evolution (i.e. \(P_{qq}^{{\textrm{PS}}}\) and \(P_{qg}\) in Fig. 3). For the gluon evolution in Fig. 4 we observe a small level of tension, with the data pushing towards a slightly harder small-x gluon than preferred by the penalty constraints for \(P_{gg}\). An important caveat to these best fit results is that the data included in the fit are sensitive to all orders in \(\alpha _{s}\). By proxy, the best fit predictions are therefore also sensitive to corrections at all orders, which will certainly be a driving factor for any departure from the expected N\(^{3}\)LO behaviour. However, since the ultimate goal of this investigation is to provide a theoretical uncertainty, any effect from higher orders is absorbed into the defined penalties and therefore accounted for in the fit as a source of MHOU.

Finally, an important feature visible across all these splitting function plots is the presence of points of zero aN\(^{3}\)LO uncertainty in the high-x region. These occur where the moments constrain the chosen fixed functional forms very tightly. In particular, for \(N_{m}\) moments (constraints) in Eq. (4.1), we are left with \(N_{m}-1\) points of zero uncertainty in our approximations. These points depend on the choice of fixed functional form and are therefore regions where the uncertainty is underestimated relative to the functional uncertainty that the fixed form approximates. A more complete estimate of the uncertainty in these regions would require smoothing the uncertainty band across them (or taking several fixed functional forms into account). However, this shortcoming only occurs towards large-x, where the uncertainty is naturally smaller for all of these functions, so smoothing would have a negligible effect on the theoretical uncertainty this work aims to include in a global PDF fit. Moreover, these functions enter the DGLAP equations through convolutions with PDFs, in which such small details are washed out by more dominant features. For these reasons, we opt for computational efficiency and leave these points as shown.
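The origin of these zero-width points can be illustrated with a toy version of the procedure. Two approximations that share the same target moments but differ in the varied small-x parameter must differ by a function whose constrained moments all vanish; because the weights \(x^{N-1}\) are positive, that difference has to change sign somewhere in \(0<x<1\), and at each crossing the band collapses. The Python sketch below uses a hypothetical three-term basis, an illustrative \(\rho \ln (1/x)/x\) term and placeholder moments, none of which are the actual inputs of this work, and locates the crossings numerically.

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical toy setup: three coefficients fixed by three moments,
# plus a small-x term rho * ln(1/x)/x that is varied by hand.
BASIS = [lambda x: np.log(x), lambda x: x, lambda x: np.log(1.0 - x)]
NS = [2, 4, 6]
TARGET = np.array([1.0, 0.5, 0.3])  # placeholder moments, not physical values

def extra(x):
    return np.log(1.0 / x) / x

def mellin(f, n):
    val, _ = quad(lambda x: x ** (n - 1) * f(x), 0.0, 1.0, limit=200)
    return val

MAT = np.array([[mellin(f, n) for f in BASIS] for n in NS])
M_EXTRA = np.array([mellin(extra, n) for n in NS])

def approximation(rho):
    """Return P(x) whose N = 2, 4, 6 moments match TARGET for this rho."""
    coeffs = np.linalg.solve(MAT, TARGET - rho * M_EXTRA)
    return lambda x: sum(c * f(x) for c, f in zip(coeffs, BASIS)) + rho * extra(x)

low, high = approximation(-1.0), approximation(1.0)
x = np.logspace(-5, np.log10(0.999), 4000)
width = high(x) - low(x)
# Grid points where the difference between the two band edges changes sign
crossings = x[1:][np.sign(width[1:]) != np.sign(width[:-1])]
print("band collapses near x =", np.round(crossings, 3))
```

The positions of the crossings depend on the basis chosen, which mirrors the statement above that these points are a property of the fixed functional form rather than of the underlying function.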

4.8.1 Moment analysis

Table 1 Numerical Mellin moments of the singlet and gluon splitting functions up to N\(^{3}\)LO for \(\alpha _{s}=0.2\) and \(n_{f}=4\)

Fig. 5 The low even-integer numerical Mellin moments of the relevant singlet splitting functions (excluding \(P_{qq}^{{\textrm{NS}},\ +}\)) as a ratio between orders. In all cases the expected perturbative convergence is demonstrated.

Returning to the known moments of the splitting functions [35, 36] (listed in Table 1 and shown as ratios in Fig. 5), we observe the expected convergence of the perturbative expansions up to N\(^{3}\)LO. Figure 5 illustrates the relative size of the NNLO and N\(^{3}\)LO contributions to the low even-integer moments.

Until recently, only three moments were available for the functions \(P_{gq}\) and \(P_{gg}\) approximated here. However, an additional moment for these two gluon splitting functions was published in [36]. This extra information brought our small-x predictions more in line with the resummation results in [34] mentioned earlier. It is an example of how new information can be incorporated as and when it becomes available, updating the approximations to make full use of our knowledge of the next order. By adopting this procedure, we immediately benefit from slightly increased precision (with a corresponding theoretical uncertainty), rather than delaying the inclusion of higher order theory (potentially for decades) until a complete analytical calculation of the next order in \(\alpha _{s}\) is known.

4.9 Numerical results

We now consider the DGLAP evolution equations for the singlet and gluon given in Eq. (3.3). We expand these equations up to \(\alpha ^{4}_{s}\) and investigate the effects of the variation in the N\(^{3}\)LO contributions.

For the purposes of this analysis, the approximate functions of Eq. (4.17), taken from [27], are used as sample distributions at \(\mu _{f}^{2} \simeq 30~{\textrm{GeV}}^{2}\), a scale chosen for its relevance to the DIS processes included in the MSHT global fit.

$$\begin{aligned} x\Sigma (x,\mu _{f}^{2}=30~{\textrm{GeV}}^{2})\ =&\quad 0.6\ x^{-0.3}(1-x)^{3.5}(1+5x^{0.8}) \end{aligned}$$
(4.17a)
$$\begin{aligned} xg(x,\mu _{f}^{2}=30~{\textrm{GeV}}^{2})\ =&\quad 1.6\ x^{-0.3}(1-x)^{4.5}(1-0.6x^{0.3}) \end{aligned}$$
(4.17b)

The expressions above are order-independent and so provide a robust means of isolating the effects arising from higher orders in the splitting functions. For convenience we also assume

$$\begin{aligned} \alpha _{s}(\mu _{r}^{2}=\mu _{f}^{2}=30~{\textrm{GeV}}^{2})\simeq 0.2. \end{aligned}$$
(4.18)

where \(\mu _{r}\) and \(\mu _{f}\) are the renormalisation and factorisation scales respectively.
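For reference, Eqs. (4.17) and (4.18) are straightforward to encode. The Python sketch below (the function names are illustrative) evaluates the toy momentum densities; since both carry the same \(x^{-0.3}\) behaviour, the gluon exceeds the singlet by the factor \(1.6/0.6 \approx 2.7\) as \(x \rightarrow 0\).

```python
import numpy as np

ALPHA_S = 0.2  # Eq. (4.18), at mu_r^2 = mu_f^2 = 30 GeV^2

def x_sigma(x):
    """Toy singlet x*Sigma(x) at mu_f^2 = 30 GeV^2, Eq. (4.17a)."""
    x = np.asarray(x, dtype=float)
    return 0.6 * x**-0.3 * (1 - x) ** 3.5 * (1 + 5 * x**0.8)

def x_gluon(x):
    """Toy gluon x*g(x) at mu_f^2 = 30 GeV^2, Eq. (4.17b)."""
    x = np.asarray(x, dtype=float)
    return 1.6 * x**-0.3 * (1 - x) ** 4.5 * (1 - 0.6 * x**0.3)

x = np.logspace(-5, -0.01, 6)
for xi, s, g in zip(x, x_sigma(x), x_gluon(x)):
    print(f"x = {xi:.2e}   xSigma = {s:.4f}   xg = {g:.4f}")
```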

4.9.1 Singlet evolution

Fig. 6 The flavour singlet quark distribution evolution equation, Eq. (3.3), shown for orders up to approximate N\(^3\)LO (left). The relative shift between subsequent orders of the flavour singlet evolution (right), where \({\dot{\Sigma }} = d \ln \Sigma /d \ln \mu _{f}^{2}\).

Figure 6 shows the result of including the respective N\(^3\)LO expansions from Sect. 4.2 in the singlet evolution equation. Towards small-x the N\(^3\)LO variation increases, due to the larger uncertainty in the \(P_{qq}^{{\textrm{PS}}}\) and \(P_{qg}\) splitting functions at aN\(^{3}\)LO. On the right of Fig. 6, the difference plot displays the shift from the previous order at each order up to N\(^3\)LO. These results predict a reduction in the evolution of the singlet towards small-x relative to NNLO. Inspecting Fig. 3, we can see that this reduction stems from the gluon contribution via the 4-loop \(P_{qg}\) function, which dominates the evolution. Towards larger x values (\(10^{-2}< x < 10^{-1}\)) we see a fractional increase in the quark evolution, again following the shape of the \(P_{qg}\) function. These results therefore give some indication of how we expect the gluon PDF to behave at N\(^{3}\)LO: since the structure functions are directly related to the quarks (at LO), the singlet evolution should remain fairly constant. We can therefore expect the fit to prefer a slightly harder gluon at small-x and a softer gluon for \(10^{-2}< x < 10^{-1}\), relative to NNLO.
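The gluon-driven term discussed above enters through a Mellin convolution. Working with momentum densities, one has \(x\,(P \otimes f)(x) = \int _x^1 dz\, P(z)\, [x f](x/z)\), which avoids dividing by x explicitly. The Python sketch below evaluates the \(P_{qg} \otimes g\) contribution to \({\dot{\Sigma }}\) for the toy PDFs of Eq. (4.17). Because the aN\(^{3}\)LO coefficients are not reproduced here, it uses as a stand-in the LO kernel \(2 n_f T_R [z^2 + (1-z)^2]\) with an \(\alpha _s/(2\pi )\) normalisation; this is an assumption for illustration only. Kernels containing plus-distributions or \(\delta \)-functions (as in \(P_{qq}\)) would need subtraction terms that `x_conv` does not handle.

```python
import numpy as np
from scipy.integrate import quad

N_F, T_R = 4, 0.5

def x_sigma(x):
    return 0.6 * x**-0.3 * (1 - x) ** 3.5 * (1 + 5 * x**0.8)   # Eq. (4.17a)

def x_gluon(x):
    return 1.6 * x**-0.3 * (1 - x) ** 4.5 * (1 - 0.6 * x**0.3)  # Eq. (4.17b)

def p_qg_lo(z):
    """Stand-in LO singlet kernel (assumed alpha_s/2pi normalisation)."""
    return 2 * N_F * T_R * (z**2 + (1 - z) ** 2)

def x_conv(kernel, xf, x):
    """x (P conv f)(x) = int_x^1 dz P(z) [x f](x/z), for a regular kernel P."""
    val, _ = quad(lambda z: kernel(z) * xf(x / z), x, 1.0, limit=200)
    return val

def gluon_driven_rate(x, alpha_s=0.2):
    """Gluon-driven part of d ln Sigma / d ln mu_f^2 with the stand-in kernel."""
    return alpha_s / (2 * np.pi) * x_conv(p_qg_lo, x_gluon, x) / x_sigma(x)

for x in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"x = {x:g}: gluon-driven rate = {gluon_driven_rate(x):.4f}")
```

Replacing `p_qg_lo` with higher-order kernels, each multiplied by the appropriate power of \(\alpha _s\), would build up the order-by-order shifts of the kind shown on the right of Fig. 6.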

Figure 6 displays a good level of agreement between the allowed N\(^{3}\)LO shift and the evolution at NLO and NNLO (within the \(\pm 1\sigma \) variation bands from theoretical uncertainties). Also shown in Fig. 6 is the evolution predicted using the best fit results for \(P_{qq}^{(3)}\) and \(P_{qg}^{(3)}\) (red dashed). This prediction lies slightly below the centre of the \(1\sigma \) uncertainty band, where the data have balanced the two variations, and, owing to a negative contribution below \(10^{-2}\), is closer to the NLO evolution than to NNLO. Considering the magnitude of the shifts at each order, the predicted shift from NNLO to aN\(^{3}\)LO is slightly larger than that from NLO to NNLO, contrary to what may be expected from perturbation theory. However, we remind the reader that these best fit results are, to some degree, sensitive to all orders in perturbation theory through the data constraint. The resulting best fit can therefore be thought of as an approximation to the all-order result. Interpreted in this way, the larger shift is consistent with perturbation theory and becomes an entirely plausible estimate of the missing higher orders.

Figure 6 also shows how the points of zero uncertainty (discussed in Sect. 4.8) can affect the evolution predictions. At most, the uncertainty is underestimated by \(<1\%\) and therefore, for the reasons discussed earlier, we do not consider these regions further here.

4.9.2 Gluon evolution

Fig. 7 The gluon distribution evolution equation, Eq. (3.3), shown for orders up to approximate N\(^3\)LO (left). The relative shift between subsequent orders of the gluon evolution (right), where \({\dot{g}} = d \ln g/d \ln \mu _{f}^{2}\).

Figure 7 shows the result of including the aN\(^{3}\)LO splitting function contributions in the gluon evolution equation. As in the singlet case, this extra contribution induces a notable variation at N\(^3\)LO. The general trend at small-x is a reduction in the evolution, due to the N\(^3\)LO prediction for \(P_{gg}\). On the right hand side of Fig. 7 we show the shift from the previous order at each order up to N\(^3\)LO.

In the gluon evolution, there is a large variation coming from the uncertainty in the \(P_{gg}^{(3)}\) function. Therefore, when \(P_{gg}^{(3)}\) is convolved with the gluon PDF at small-x, one could expect a potentially large shift from NNLO. The best fit gluon evolution prediction in Fig. 7 (red dashed) is obtained using the best fit results for the \(P_{gq}^{(3)}\) and \(P_{gg}^{(3)}\) functions. In this prediction the fit prefers a reduction in the evolution relative to NNLO, which is contained within the \(\pm 1 \sigma \) band down to \(x \sim 10^{-4}\). Since the quark and gluon are comparable at small-x and low-\(Q^{2}\), this reduction is likely driven by the form of \(P_{gq}\) in Fig. 4. Combined with the smaller gluon PDF at low-\(Q^{2}\), this acts to slow the gluon evolution despite \(P_{gg}\) increasing. Furthermore, the best fit appears more in line with the perturbative expectation of the evolution than the chosen variation. Since this variation is chosen from the known information about the perturbative expansions, this illustrates how the framework presented here can capture the relevant sources of theoretical uncertainty (and account for them via a penalty in a PDF fit). This is encouraging: even with the large amount of freedom in the gluon evolution, the data appear to constrain and balance the two splitting function contributions in a sensible fashion. As in the singlet case, the relative shift from NNLO to N\(^{3}\)LO is slightly larger than one might hope for in a perturbative expansion. However, since this best fit is affected at all orders by the experimental data (up to the leading logarithms at N\(^{3}\)LO, i.e. even higher orders involve more divergent logarithms which are missed in this theoretical description), we can interpret this shift as an approximate all-order shift, which again restores its consistency with perturbation theory.

As in the singlet case above, a few points of zero uncertainty, with negligible impact, are visible in Fig. 7. For the reasons discussed in the singlet case and in Sect. 4.8, these are not a concern at the current level of desired uncertainty and are therefore not considered further.

5 N\(^{3}\)LO transition matrix elements

Heavy flavour transition matrix elements, \(A_{ij}\), as described in Sect. 3, are exact quantities that describe the transition of all PDFs with \(n_{f}\) active flavours into a scheme with \(n_{f} + 1\) active flavours. Due to the discontinuous nature of \(A_{ij}\) at the heavy flavour mass thresholds, they are also present in the coefficient functions to ensure an exact cancellation of this discontinuity in physical quantities. This combination preserves the smooth behaviour of the structure functions, as demanded by the renormalisation group.

The general expansion of the heavy-quark transition matrix elements in powers of \(\alpha _s\) reads,

$$\begin{aligned} A_{i j}=\delta _{i j}+\sum _{\ell =1}^{\infty } \alpha _{\textrm{s}}^{\ell } A_{i j}^{(\ell )}=\delta _{i j}+\sum _{\ell =1}^{\infty } \alpha _{\textrm{s}}^{\ell } \sum _{k=0}^{\ell } L_{\mu }^{k} a_{i j}^{(\ell , k)}, \end{aligned}$$
(5.1)

where at each order the terms proportional to powers of \({\textrm{L}}_{\mu } = \ln (m_{h}^{2} / \mu ^{2})\) are determined by lower order transition matrix elements and splitting functions. We therefore need only focus on the \(a_{ij}^{(\ell ,0)}\) terms: the others are not only known [38, 39], but also vanish at the mass threshold \(\mu = m_{h}\), where \(L_{\mu } = 0\). These \(\mu \)-independent terms can be decomposed in powers of \(n_{f}\) as

$$\begin{aligned} a_{ij}^{(3,0)} =a_{ij}^{(3,0),\ 0}+n_{f} a_{ij}^{(3,0),\ 1}, \end{aligned}$$
(5.2)

where a number of the \(n_f\)-dependent and \(n_f\)-independent terms are known exactly. The \(n_{f}\)-dependent parts are, however, sub-leading, and as a first approximation are set to zero in this work. In keeping with the framework set out in Sect. 4.1 for the N\(^{3}\)LO splitting functions, we make use of the available information (even-integer Mellin moments [50] and leading small and large-x behaviour [49, 51,52,53,54,55]) about the heavy flavour transition matrix elements to approximate the \(\mu \)-independent contributions \(a_{ij}^{(3,0)}\). As discussed above, we choose to ignore all terms that do not contribute at the mass threshold: not only are they sub-leading, but they can also be removed explicitly by setting \(\mu ^{2} = m_{h}^{2}\).
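The structure of Eq. (5.1), and why only the \(a_{ij}^{(\ell ,0)}\) terms matter at threshold, can be seen in a few lines. In the Python sketch below the coefficients are hypothetical placeholder numbers at a single fixed x (in practice each \(a_{ij}^{(\ell ,k)}\) is a function of x), and the heavy-quark mass is purely illustrative; setting \(\mu = m_{h}\) gives \(L_{\mu } = 0\), so every \(k > 0\) term drops out.

```python
import math

def transition_element(delta_ij, coeffs, alpha_s, m_h, mu):
    """Truncated Eq. (5.1): delta_ij + sum_l alpha_s^l sum_k L_mu^k a^{(l,k)}.

    `coeffs` maps each order l to the list [a^{(l,0)}, ..., a^{(l,l)}]
    evaluated at one fixed x (hypothetical numbers in the example).
    """
    L_mu = math.log(m_h**2 / mu**2)
    total = float(delta_ij)
    for order, a_lk in coeffs.items():
        total += alpha_s**order * sum(a * L_mu**k for k, a in enumerate(a_lk))
    return total

# Hypothetical coefficients up to l = 3, NOT physical values
coeffs = {1: [0.0, 0.8], 2: [1.5, -0.4, 0.3], 3: [-2.0, 0.7, 0.1, -0.05]}

m_b = 4.75  # illustrative heavy-quark mass in GeV
at_threshold = transition_element(0, coeffs, 0.2, m_b, m_b)
off_threshold = transition_element(0, coeffs, 0.2, m_b, 2 * m_b)
print(at_threshold, off_threshold)
```

At threshold only \(\alpha _s^2 a^{(2,0)} + \alpha _s^3 a^{(3,0)}\) survives here, which is why the approximations below target the \(\mu \)-independent \(a_{ij}^{(3,0)}\) alone.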

5.1 3-Loop approximations

5.2 \(A_{Hg}\)

The \(A_{Hg}^{(3)}\) function is still under calculation at the time of writing. Currently the first five even-integer moments are known for the \(\overline{\textrm{MS}}\) scheme \(A_{Hg}^{(3)}\) [50], along with the leading small-x terms [49].

The \(n_{f}\)-dependent contribution to the 3-loop unrenormalised \(A_{Hg}\) transition matrix element has also been approximated in [49], while all other contributions to \(A_{Hg}^{(3)}(n_{f} = 0)\) were already known. For this approximation we work in the \(\overline{\textrm{MS}}\) scheme using the framework set out in Sect. 4.1. We then approximate the function using the set of functions,

$$\begin{aligned} f_{1,2}(x)&=\ln ^{5}(1-x) \quad \text {or} \quad \ln ^{4}(1-x) \nonumber \\&\qquad \qquad \qquad \qquad \text {or} \quad \ln ^{3}(1-x) \nonumber \\&\qquad \qquad \qquad \qquad \text {or} \quad \ln ^{2}(1-x) \nonumber \\&\qquad \qquad \qquad \qquad \text {or} \quad \ln (1-x), \nonumber \\ f_{3,4}(x)&= 2 - x \quad \text {or} \quad 1 \quad \text {or} \quad x \quad \text {or} \quad x^{2}, \nonumber \\ f_{5}(x)&= \ln x \quad \text {or} \quad \ln ^{2} x,\nonumber \\ f_{e}(x, a_{Hg})&= \bigg (224\ \zeta _{3} - \frac{41984}{27} - 160 \frac{\pi ^{2}}{6} \bigg )\nonumber \\&\quad \times \frac{\ln 1/x}{x} + a_{Hg} \frac{1}{x} \end{aligned}$$
(5.3)

where \(a_{Hg}\) is varied as \(6000< a_{Hg} < 12000\). This variation is chosen according to the criteria outlined in Sect. 4.1 and is comparable to that chosen in [49].

Fig. 8 Combinations of functions with an added variational factor (\(a_{Hg}\)) controlling the NLL term. Combinations of functions at the upper (left) and lower (right) bounds of the variation are shown. The solid lines indicate the upper and lower bounds for this function chosen from the relevant criteria.

Figure 8 displays the approximation of the \(\overline{\textrm{MS}}\) \(A_{Hg}^{(3)}\), with the variation from different combinations of functions in Eq. (5.3) at the chosen limits of \(a_{Hg}\). Compared with Fig. 3 in [49], we find a slightly larger range of allowed variation. A small part of this difference can be attributed to the different renormalisation schemes, with the majority arising from differences in the criteria of Sect. 4.1. The upper (\(A_{Hg}^{(3), A}\)) and lower (\(A_{Hg}^{(3), B}\)) bounds in the small-x region (shown in Fig. 8) are given by,

$$\begin{aligned} A_{Hg}^{(3), A}= & {} 44.1703\ \ln ^{5}(1-x) + 268.024\ \ln ^{4}(1-x) \nonumber \\{} & {} + 45271.0\ x - 68401.4\ x^{2} \nonumber \\{} & {} + 36029.8\ \ln x + \bigg (224\ \zeta _{3} - \frac{41984}{27} \nonumber \\{} & {} - 160\ \frac{\pi ^{2}}{6}\bigg )\ \frac{\ln 1/x}{x} + 12000\ \frac{1}{x} \end{aligned}$$
(5.4)
$$\begin{aligned} A_{Hg}^{(3), B}= & {} -18.9493\ \ln ^{5}(1-x) - 138.763\ \ln ^{4}(1-x) \nonumber \\{} & {} - 31692.1\ x + 33282.3\ x^{2} \nonumber \\{} & {} - 3088.75\ \ln ^{2} x + \bigg (224\ \zeta _{3} - \frac{41984}{27} \nonumber \\{} & {} - 160\ \frac{\pi ^{2}}{6}\bigg )\ \frac{\ln 1/x}{x} + 6000\ \frac{1}{x}. \end{aligned}$$
(5.5)

Using this information, we then choose the fixed functional form,

$$\begin{aligned} A_{Hg}^{(3)}= & {} A_{1}\ \ln ^{5}(1-x) + A_{2}\ \ln ^{4}(1-x) \nonumber \\{} & {} + A_{3}\ x + A_{4}\ x^{2} + A_{5}\ \ln x \nonumber \\{} & {} + \bigg (224\ \zeta _{3} - \frac{41984}{27} - 160\ \frac{\pi ^{2}}{6}\bigg )\ \frac{\ln 1/x}{x} +a_{Hg}\ \frac{1}{x}\nonumber \\ \end{aligned}$$
(5.6)

where the variation of \(a_{Hg}\) remains unchanged as it already encapsulates the predicted variation to within the \(\sim 1\%\) level.
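For numerical checks, Eqs. (5.4) and (5.5) can be transcribed directly. The Python sketch below copies the coefficients from those equations (the function names are illustrative). The common coefficient of \(\ln (1/x)/x\), \(224\zeta _3 - 41984/27 - 160\pi ^2/6\), evaluates to about \(-1548.9\), so at small x the positive \(a_{Hg}/x\) term partially cancels this large negative leading term.

```python
import math

ZETA3 = 1.2020569031595942
# Coefficient of ln(1/x)/x common to Eqs. (5.4)-(5.6)
C_LL = 224.0 * ZETA3 - 41984.0 / 27.0 - 160.0 * math.pi**2 / 6.0

def a_hg_upper(x):
    """Upper small-x bound A_Hg^{(3),A}, Eq. (5.4)."""
    l1 = math.log(1.0 - x)
    return (44.1703 * l1**5 + 268.024 * l1**4 + 45271.0 * x - 68401.4 * x**2
            + 36029.8 * math.log(x) + C_LL * math.log(1.0 / x) / x + 12000.0 / x)

def a_hg_lower(x):
    """Lower small-x bound A_Hg^{(3),B}, Eq. (5.5)."""
    l1 = math.log(1.0 - x)
    return (-18.9493 * l1**5 - 138.763 * l1**4 - 31692.1 * x + 33282.3 * x**2
            - 3088.75 * math.log(x) ** 2 + C_LL * math.log(1.0 / x) / x + 6000.0 / x)

print(f"C_LL = {C_LL:.4f}")
for x in (1e-4, 1e-3, 1e-2, 0.1, 0.5):
    print(f"x = {x:g}: A = {a_hg_upper(x):.4g}, B = {a_hg_lower(x):.4g}")
```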

5.3 \(A_{Hq}^{{\textrm{PS}}}\)

The \(A_{Hq}^{{\textrm{PS}}}\) transition matrix element has been calculated exactly in [53]. Here we approximate this result with an efficient parameterisation of appropriate precision.

Using the expressions for the small and large-x limits [53] and the first six known even-integer moments converted into \(\overline{\textrm{MS}}\) [50], we provide a user-friendly approximation,

$$\begin{aligned}{} & {} A_{Hq}^{{\textrm{PS}}, (3)} =\ (1-x)^{2} \bigg \{-152.523\ \ln ^{3}(1-x)\nonumber \\{} & {} \quad -107.241\ \ln ^{2}(1-x)\bigg \}- 4986.09\ x \nonumber \\{} & {} \quad + 582.421\ x^{2} - 1393.50\ x\ln ^{2}x -4609.79\ x\ln x\nonumber \\{} & {} \quad - 688.396\ \frac{\ln 1/x}{x} + (1-x)\ 3812.90\ \frac{1}{x} \nonumber \\{} & {} \quad + 1.6\ \ln ^{5}x - 20.3457\ \ln ^{4} x \nonumber \\{} & {} \quad + 165.115\ \ln ^{3}x - 604.636\ \ln ^{2}x + 3525.00\ \ln x \nonumber \\{} & {} \quad + (1 - x) \bigg \{0.246914 \ln ^{4}(1-x) - 4.44444 \ln ^{3}(1-x) \nonumber \\{} & {} \quad - 2.28231 \ln ^{2}(1-x) - 357.427 \ln (1-x) + 116.478\bigg \} \nonumber \\ \end{aligned}$$
(5.7)

where the first two lines have been approximated and the last four lines are the exact leading small and large-x terms. We note here that the approximated part of this parameterisation is in a much less important region of x than the exact parts, therefore any small differences in the approximated part from the exact function are unimportant.

5.4 \(A_{qq, H}^{{\textrm{NS}}}\)

Moving to the non-singlet \(A_{qq, H}^{{\textrm{NS}}}\) function, we parameterise the results of [51, 52]. Specifically, we make use of the known even-integer moments up to \(N=14\) [50], converted into the \(\overline{\textrm{MS}}\) scheme, where the even moments correspond to the (\(+\)) non-singlet distribution.

As for \(A_{Hg}^{(3)}\), the approximation is performed using the set of functions,

$$\begin{aligned} f_{1}(x)&= \ln x, \quad f_{2}(x) =\ln ^{2}x, \nonumber \\ f_{3,4}(x)&=1 \quad \text {or} \quad x \quad \text {or} \quad x^{2} \quad \text {or} \quad \ln (1-x),\nonumber \\ f_{5}(x)&= 1/x, \quad f_{6}(x) =\ln ^{3}(1-x), \nonumber \\ f_{7}(x)&=\ln ^{2}(1-x), \nonumber \\ f_{e}(x, a_{qq,H}^{{\textrm{NS}}})&= a_{qq,H}^{{\textrm{NS}}}\ \ln ^{3}x \end{aligned}$$
(5.8)

where \(a_{qq,H}^{{\textrm{NS}}}\) is varied as \(-90< a_{qq,H}^{{\textrm{NS}}} < -37\). To contain this variation in a fixed functional form we employ:

$$\begin{aligned} A_{qq,H}^{{\textrm{NS}},\ (3)\ +}= & {} A_{1}\ \frac{1}{(1-x)_{+}} + A_{2}\ \ln ^{3}(1-x) \nonumber \\{} & {} + A_{3}\ \ln ^{2}(1-x) + A_{4}\ \ln (1-x) + A_{5}\nonumber \\{} & {} + A_{6}\ x + A_{7}\ \ln ^{2} x + a_{qq,H}^{{\textrm{NS}}}\ \ln ^{3} x \end{aligned}$$
(5.9)

where the variation of \(a_{qq,H}^{{\textrm{NS}}}\) is unchanged.

5.5 \(A_{gq, H}\)

The 3-loop \(A_{gq, H}\) function has been calculated exactly in [54]. As with the \(A_{Hq}^{{\textrm{PS}}}\) function above, we aim to provide a simple and computationally efficient approximation to this exact form. To do this, we use the known even-integer moments (converted to the \(\overline{\textrm{MS}}\) scheme) and the small and large-x information from [50, 54]. Choosing a fixed set of functions \(f_{i}(x)\), and omitting any variational parameter \(a_{gq,H}\) since more information is available, the resulting approximation to the \(\overline{\textrm{MS}}\) \(A_{gq, H}^{(3)}\) is:

$$\begin{aligned} A_{gq, H}^{(3)}= & {} -237.172\ \ln ^{3}(1-x) - 201.497\ \ln ^{2}(1-x) \nonumber \\{} & {} + 7247.70\ \ln (1-x) + 39967.3\ x^{2} \nonumber \\{} & {} - 22017.7 - 28459.1\ \ln x - 14511.5\ \ln ^{2} x \nonumber \\{} & {} + 341.543\ \frac{\ln 1 / x}{x} + 1814.73\ \frac{1}{x} \nonumber \\{} & {} - \frac{580}{243}\ \ln ^{4}(1 - x) - \frac{17624}{729}\ \ln ^{3}(1 - x) \nonumber \\{} & {} - 135.699\ \ln ^{2}(1 - x) \end{aligned}$$
(5.10)

where the first two lines have been approximated and the last two lines are the exact small and large-x limits.

5.6 \(A_{gg, H}\)

Work is ongoing on the 3-loop contribution to \(A_{gg, H}\) [60, 61]. As a result, the approximation of \(A_{gg, H}^{(3)}\) presented here is based entirely on the first five even-integer Mellin moments [50]. To tame the wild behaviour that results from using only the Mellin moment information (converted into the \(\overline{\textrm{MS}}\) scheme), we introduce a second, mild constraint in the form of the relations in Eq. (4.3). These relations are closely followed by the gluon-gluon functions up to NNLO, but there is no guarantee that this behaviour will continue at N\(^{3}\)LO. The constraint is given as,

$$\begin{aligned} A_{gg,H}(x\rightarrow 0) \simeq \frac{C_{A}}{C_{F}}A_{gq, H}(x\rightarrow 0). \end{aligned}$$
(5.11)

Even though this relation may not be followed exactly, we expect the function not to stray too far from this general ‘rule of thumb’. We therefore allow a generous contingency of \(\pm 50\%\) when using this rule. Furthermore, to ensure the relation is used only as a guide, we allow the variation to move beyond it as long as the criteria in Sect. 4.1 are still satisfied. Because of this change in prescription, and because the allowed variation is now much larger than any functional uncertainty, we choose a fixed functional form from the start and use the criteria described above to guide our choice of variation.

$$\begin{aligned} A_{gg,H}^{(3)}= & {} A_{1}\ \ln ^{2}(1-x) + A_{2}\ \ln (1-x) \nonumber \\{} & {} + A_{3}\ x^{2} + A_{4}\ \ln x + A_{5}\ x + a_{gg,H}\ \frac{\ln x}{x} \end{aligned}$$
(5.12)

where \(-2000< a_{gg,H} < -700\).
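One simple way to see what the \(\pm 50\%\) window around Eq. (5.11) implies for \(a_{gg,H}\) is to match \(a_{gg,H} \ln x/x\) to \(C_A/C_F\) times the leading small-x terms of Eq. (5.10) at a chosen small x. This is only an illustrative reading, not necessarily the prescription used here: it neglects the \(A_{1}\) to \(A_{5}\) terms of Eq. (5.12) and the sub-leading terms of Eq. (5.10), and the result depends on the matching point. At \(x = 10^{-4}\) it gives \(a_{gg,H} \approx -1212\), i.e. a window of roughly \(-1818\) to \(-606\); the chosen range need not coincide with this window, since the variation is allowed to move beyond the rule.

```python
import math

C_A, C_F = 3.0, 4.0 / 3.0

def a_gq_small_x(x):
    """Leading small-x terms of A_gq,H^(3) taken from Eq. (5.10)."""
    return 341.543 * math.log(1.0 / x) / x + 1814.73 / x

def implied_a_gg(x):
    """a_gg,H such that a_gg,H * ln(x)/x equals (C_A/C_F) * A_gq,H at this x.

    Illustrative matching only: sub-leading terms of Eq. (5.12) are ignored.
    """
    return (C_A / C_F) * a_gq_small_x(x) / (math.log(x) / x)

def rule_of_thumb_window(x, tolerance=0.5):
    """+/- tolerance band around the implied value, returned as (low, high)."""
    centre = implied_a_gg(x)
    edges = (centre * (1 - tolerance), centre * (1 + tolerance))
    return min(edges), max(edges)

for x in (1e-6, 1e-5, 1e-4, 1e-3):
    lo, hi = rule_of_thumb_window(x)
    print(f"x = {x:g}: implied a_gg,H = {implied_a_gg(x):.0f}, window [{lo:.0f}, {hi:.0f}]")
```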

5.7 Predicted aN\(^{3}\)LO transition matrix elements

Fig. 9 Perturbative expansions for the transition matrix element \(A^{{\textrm{NS}}}_{qq,\ H}\) including any corresponding allowed \(\pm 1\sigma \) variation (shaded green region). This function is shown at the mass threshold value of \(\mu = m_{h}\). The best fit value (blue dashed line) displays the prediction for this function determined from a global PDF fit.

Fig. 10 Perturbative expansions for the transition matrix elements \(A^{{\textrm{PS}}}_{Hq}\) and \(A_{Hg}\) including any corresponding allowed \(\pm 1\sigma \) variation (shaded green region). These functions are shown at the mass threshold value of \(\mu = m_{h}\). The best fit values (blue dashed line) display the predictions for these functions determined from a global PDF fit.

Fig. 11 Perturbative expansions for the transition matrix elements \(A_{gq,H}\) and \(A_{gg,H}\) including any corresponding allowed \(\pm 1\sigma \) variation (shaded green region). These functions are shown at the mass threshold value of \(\mu = m_{h}\). The best fit values (blue dashed line) display the predictions for these functions determined from a global PDF fit.

Figures 9, 10 and 11 show the perturbative expansions for each of the \(n_{f}\)-independent contributions to the transition matrix elements at the mass threshold value of \(\mu = m_{h}\). Included with these expansions are the predicted variations (\(\pm 1 \sigma \)) from Sect. 5.1 (shown in green) and the approximate N\(^{3}\)LO best fits (shown in blue and discussed further in Sect. 8).

\(A_{qq,H}^{{\textrm{NS}}}\) in Fig. 9 behaves as expected, with little variation from NNLO until the magnitude of this function is very small. The approximations for the more dominant \(A^{{\textrm{PS}}}_{Hq}\) and \(A_{Hg}\) functions in Fig. 10 exhibit some slightly erratic behaviour towards large-x due to the increased logarithmic influence. However, since the magnitude of these functions becomes small in this region, any instabilities will have a minimal effect on the overall result. The major feature common to both functions is the large deviation away from the NNLO behaviour, especially at small-x (and also at mid-x for \(A_{Hg}\)).

Similarly, for \(A_{gq,H}\) in Fig. 11 (upper), we see some irregular behaviour towards large-x. As with \(A_{Hq}^{{\textrm{PS}}}\) and \(A_{Hg}\), this behaviour is in a region where the magnitude of \(A_{gq,H}\) is small. As discussed in Sect. 5.1, \(A_{gq, H}^{(3)}\) is approximated without any variation because a large amount of information is available. For this reason, and because the region of potential instability (large-x) is highly suppressed, we can accept this function with negligible effect on any results. As more information becomes available about all of these functions, it will be interesting to observe how their behaviour across x changes.

The \(A_{gg,H}\) function shown in Fig. 11 (lower) displays the \(\pm 50\%\) bounds of violation we allow for the relation in Eq. (4.3). The allowed variation is therefore conservative enough to include a generous violation of Eq. (4.3) at N\(^{3}\)LO, with the prediction that the function is positive at small-x. This is an area where small-x information would clearly be very beneficial. With this calculation currently in progress, it will be very interesting to compare how well this variation captures the true small-x behaviour of \(A_{gg,H}\).

The final best fit values shown in Figs. 9, 10 and 11 are determined from a global PDF fit, with various datasets seen to constrain these functions within the \(\pm \ 1 \sigma \) variations. As observed, there is good agreement between the allowed variations and the best fit predictions. The perturbative expansion predicted for \(A_{gg, H}\) is the least well constrained, and it also violates its expected relation with \(A_{gq,H}\) more than one might expect. Since the small-x region changes dramatically at N\(^{3}\)LO in all cases, one potential explanation is that this function is compensating for an inaccuracy in another area of the theory. However, the relation of Eq. (4.3) between \(A_{Hg}\) and \(A_{Hq}^{{\textrm{PS}}}\) is also significantly violated at this order. This could suggest that, for the N\(^{3}\)LO transition matrix elements, this relation may not be the best indicator of precision or consistency. Finally, we recall that the best fit in this case may be feeling a larger effect from higher orders, especially as these functions only start at NNLO. For example, in Sect. 4.8 we observed a high level of divergence introduced at 4 loops in the splitting functions. The best fit results shown here may therefore be sensitive to a similar level of divergence further along their perturbative expansions.

As previously discussed, this lack of knowledge is accounted for by our choice of the allowed variations of these functions. This treatment therefore only adds to the predicted level of theoretical uncertainty from missing N\(^{3}\)LO contributions, as one would expect.

5.8 Numerical results

Fig. 12

Heavy flavour evolution contributions to the heavy quark (\(H + {\overline{H}}\) (left)) and gluon (right) PDFs provided at \(\mu ^{2} \simeq 30~{\textrm{GeV}}^{2}\). These results include the \(\mu = m_{h}\) contributions from the \(A^{{\textrm{PS}}}_{Hq}\), \(A_{Hg}\), \(A_{gq,H}\) and \(A_{gg,H}\) transition matrix elements up to aN\(^{3}\)LO

For these results we employ the same toy PDFs presented in Sect. 4.4, which approximate the general order-independent PDF features at \(Q^{2} \simeq 30\ {\textrm{GeV}}^{2}\). Note that due to the higher \(Q^{2}\), these results are more representative of the b-quark. The left plot in Fig. 12 shows the result of including our N\(^{3}\)LO transition matrix element approximations in Eq. (3.7c), which describes the heavy quark distribution \((H + {\overline{H}})(x, Q^{2} = m_{h}^{2})\). The right plot in Fig. 12 shows the heavy flavour contribution to the gluon at \((x, Q^{2} = m_{h}^{2})\) from Eq. (3.7b), where the delta function describing the leading order contribution to \(A_{gg,H}\) has been subtracted. The dominant contribution to the heavy quark (left plot) stems from the \(A_{Hg}\) function, whereas the dominant contribution to the gluon (right plot) is from the \(A_{gg,H}\) function. As one might expect, the predictions at N\(^{3}\)LO are more divergent at small-x; however, they also follow the general trend from NNLO across most values of x.

The best fit functions predicted from a global fit show the preferred aN\(^{3}\)LO contributions for both scenarios. The predicted behaviour from the global fit follows the results for the perturbative expansions in Sect. 5.2. For the \((H + {\overline{H}})(x, Q^{2} = m_{h}^{2})\) result (Fig. 12 left), the aN\(^{3}\)LO result is positive across a much wider range of x. Since this is a perturbatively calculated PDF, this is an encouraging result that could potentially eliminate some of the more unphysical shortcomings at NNLO without demanding positivity of the PDF a priori.

6 N\(^{3}\)LO heavy coefficient functions

The final set of functions considered are the Neutral Current (NC) DIS coefficient functions which, when combined with the PDFs, form the structure functions discussed in Sect. 3 (Footnote 12). We approximate the N\(^{3}\)LO heavy quark coefficient functions which accompany the heavy flavour transition matrix elements from Sect. 5, and also the N\(^{3}\)LO light quark coefficient functions. We note that our standard definition of the order of coefficient functions includes the longitudinal coefficient functions at order \(\alpha _s\) at LO, at order \(\alpha _s^2\) at NLO, and so on. This means we already include order \(\alpha _s^3\) longitudinal coefficient functions at NNLO, whereas many groups only consider order \(\alpha _s^2\) at NNLO. Since little is known about the longitudinal coefficient functions at order \(\alpha _s^4\), and the data constraints from \(F_L(x,Q^2)\) are much less precise than those from \(F_2(x,Q^2)\), we simply remain at the precisely known order \(\alpha _s^3\) in this study.

6.1 Approximation framework: continuous information

In Sect. 4.1 we described the approximation framework employed for functions with discrete Mellin moment information, combined with any available exact information. For the N\(^{3}\)LO coefficient function approximations, we have access to a somewhat richer vein of information than the discrete moments discussed for the framework used in approximating the N\(^{3}\)LO splitting functions and transition matrix elements in Sects. 4 and 5. More specifically, approximations of the FFNS coefficient functions at \({\mathcal {O}}(\alpha _{s}^{3})\) are known for the heavy quark contributions to the heavy flavour structure function \(F_{2,H}(x, Q^{2})\) at \(Q^{2} < m_{c,b}^{2}\) [47,48,49]. These approximations include the exact LL and mass threshold contributions, with an approximated NLL term (the details of this are described in Sect. 6.2). Furthermore, the N\(^{3}\)LO ZM-VFNS coefficient functions are known exactly [57]. Both of these contributions can then be combined with the transition matrix element approximations to define the GM-VFNS functions in the \(Q^{2} \le m_{c}^{2}, m_{b}^{2}\) and \(Q^{2} \rightarrow \infty \) regimes. Due to this, we base our approximations for the \(C_{H,\{q,g\}}^{(3)}\) functions on the known continuous information in the low and high-\(Q^{2}\) regimes.

To achieve a reliable approximation for \(C_{H,\{q,g\}}^{(3)}\), we first fit a regression model with a large number of functions in \((x,Q^{2})\) space made available to the model (in order to reduce the level of functional bias in the parameterisation). This produces an unstable result at the extremes of the parameterisation (large-x and low-\(Q^{2}\)). However, it provides a basis for manually choosing a stable parameterisation to move between the two known regimes (low-\(Q^{2}\) and high-\(Q^{2}\)).
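The basis and weighting of this regression model are not specified here, so the following is only a hypothetical sketch of the general idea: a weighted linear least-squares fit over a large set of basis functions in \((x, Q^{2})\). The basis (powers of x, \(\ln (1/x)\) and \(\ln (Q^{2}/m_{h}^{2})\)) and the names `design_matrix`, `fit_regression` and `predict` are illustrative assumptions, not the model used in the fit. The optional weights show how individual target points could be made to pull the fit less than others.

```python
import numpy as np

def design_matrix(x, q2_over_m2):
    """Hypothetical basis: x^a * ln^b(1/x) * ln^c(Q^2/m_h^2) for small a, b, c."""
    x = np.asarray(x, dtype=float)
    lx = np.log(1.0 / x)
    lq = np.log(np.asarray(q2_over_m2, dtype=float))
    columns = [x**a * lx**b * lq**c
               for a in range(3) for b in range(3) for c in range(2)]
    return np.column_stack(columns)

def fit_regression(x, q2_over_m2, targets, weights=None):
    """Weighted linear least squares for the basis coefficients.

    Points with small weights (e.g. a soft target) pull the fit less.
    """
    A = design_matrix(x, q2_over_m2)
    b = np.asarray(targets, dtype=float)
    if weights is not None:
        w = np.sqrt(np.asarray(weights, dtype=float))
        A = A * w[:, None]
        b = b * w
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

def predict(coeffs, x, q2_over_m2):
    """Evaluate the fitted model at (x, Q^2/m_h^2) points."""
    return design_matrix(x, q2_over_m2) @ coeffs
```

With many basis functions such a fit can follow the targets closely where data exist, but it is free to oscillate where they do not, which is the kind of instability at the extremes of the parameterisation described above.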

Using the regression model predictions as a qualitative guide, we choose a stable and smooth interpolation between the two \(Q^{2}\) regimes (low-\(Q^{2}\) and high-\(Q^{2}\)), given in Eq. (6.1). This interpolation mirrors the behaviour expected from lower orders, even though the qualitative prediction of the regression model was obtained independently of both the lower orders and the best fit quality to data. By construction, we also ensure an exact cancellation between the coefficient functions and the transition matrix elements at the mass thresholds, as demanded by the theoretical description in Sect. 3.

For the contributions to the heavy flavour structure function \(F_{2,H}\) the final interpolations in the FFNS regime are defined as,

$$\begin{aligned}{} & {} C_{H,\ \{q,g\}}^{{\textrm{FF}},\ (3)} = \nonumber \\ {}{} & {} {\left\{ \begin{array}{ll} C_{H,\ \{q,g\},\ \text {low-}Q^{2}}^{{\textrm{FF}},\ (3)}(x, Q^{2} = m_{h}^{2})\ e^{0.3\ (1-Q^{2}/m_{h}^{2})} \\ \quad +\ C_{H,\ \{q,g\}}^{{\textrm{FF}},\ (3)}(x, Q^{2} \rightarrow \infty )\big (1 - e^{0.3\ (1-Q^{2}/m_{h}^{2})}\big ), &{} \text {if}\ Q^{2} \ge m_{h}^{2}, \\ C_{H,\ \{q,g\},\ \text {low-}Q^{2}}^{{\textrm{FF}},\ (3)}(x, Q^{2}),&{} \text {if}\ Q^{2} < m_{h}^{2}. \end{array}\right. } \nonumber \\ \end{aligned}$$
(6.1)

where \(C_{H,\ \{q,g\},\ \text {low-}Q^{2}}^{{\textrm{FF}},\ (3)}\) are the already calculated approximate heavy flavour FFNS coefficient functions at \(Q^{2} \le m_{h}^{2}\), and \(C_{H,\ \{q,g\}}^{{\textrm{FF}},\ (3)}(Q^{2} \rightarrow \infty )\) is the high-\(Q^{2}\) limit obtained from the known ZM-VFNS coefficient functions and the relevant subtraction terms, which are themselves found from Eq. (3.14). Both of these limits are discussed in detail on a case-by-case basis in the remainder of Sect. 6.
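As a minimal sketch of how Eq. (6.1) blends the two limits at fixed x, the function below takes two hypothetical callables, `c_low(x, q2)` for the low-\(Q^{2}\) FFNS approximation and `c_high(x)` for the \(Q^{2} \rightarrow \infty \) limit; only the weight \(e^{0.3\,(1-Q^{2}/m_{h}^{2})}\) and the threshold switch are taken from Eq. (6.1).

```python
import math
from typing import Callable

def interpolate_ffns(x: float, q2: float, m2: float,
                     c_low: Callable[[float, float], float],
                     c_high: Callable[[float], float],
                     rate: float = 0.3) -> float:
    """Eq. (6.1): blend the low-Q^2 FFNS approximation into the Q^2 -> inf limit.

    c_low(x, q2): hypothetical low-Q^2 approximation, used as-is below m2.
    c_high(x):   hypothetical high-Q^2 limit of the FFNS function.
    """
    if q2 < m2:
        return c_low(x, q2)
    w = math.exp(rate * (1.0 - q2 / m2))  # 1 at threshold, -> 0 as Q^2 grows
    return c_low(x, m2) * w + c_high(x) * (1.0 - w)
```

At \(Q^{2} = m_{h}^{2}\) the weight equals one, so the interpolation joins continuously onto the low-\(Q^{2}\) branch, and it decays to zero as \(Q^{2} \rightarrow \infty \), leaving only the high-\(Q^{2}\) limit.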

For the heavy flavour contributions to \(F_{2,q}\), we have no information about the low-\(Q^{2}\) N\(^{3}\)LO FFNS coefficient functions. In this case, we use intuition from lower orders to provide a soft (lightly weighted) low-\(Q^{2}\) target for our regression model in \((x,Q^{2})\). However, since these functions contribute very little overall, their exact form is not phenomenologically important at present. Furthermore, lower orders indicate that these functions depend only weakly on \(Q^{2}\), so the form of the low-\(Q^{2}\) description is even less important. As with the \(C_{H,\ \{q,g\}}^{(3)}\) coefficient functions, the regression results provide an initial qualitative guide, which exhibits instabilities at the extremes of \((x,Q^{2})\). We therefore employ a similar technique to ensure a smooth extrapolation across all \((x, Q^{2})\) into the unknown low-\(Q^{2}\) region. For these functions, the ansatz used is given by,

$$\begin{aligned}{} & {} C_{q,\ \{q,g\}}^{{\textrm{FF}},\ (3)} \nonumber \\ {}{} & {} \quad = {\left\{ \begin{array}{ll} C_{q,\ q}^{{\textrm{FF}},\ {\textrm{NS}},\ (3)}(x, Q^{2} \rightarrow \infty ) \\ \qquad \qquad \qquad \big (1 + e^{-0.5\ (Q^{2}/m_{h}^{2}) - 3.5}\big ), \qquad \qquad \qquad \\ \\ C_{q,\ q}^{{\textrm{FF}},\ {\textrm{PS}},\ (3)}(x, Q^{2} \rightarrow \infty ) \\ \qquad \qquad \qquad \big (1 - e^{-0.25\ (Q^{2}/m_{h}^{2}) - 0.3}\big ),\qquad \qquad \qquad \\ \\ C_{q,\ g}^{{\textrm{FF}},\ (3)}(x, Q^{2} \rightarrow \infty )\big (1 - e^{-0.05\ (Q^{2}/m_{h}^{2}) + 0.35}\big ), \qquad \qquad \qquad \end{array}\right. } \end{aligned}$$
(6.2)

where \(C_{q,\{q,g\}}^{{\textrm{FF}},\ (3)}(x, Q^{2} \rightarrow \infty )\) is the known limit at high-\(Q^{2}\).
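The three cases of Eq. (6.2) share one structure: the known high-\(Q^{2}\) limit multiplied by \(1 \pm e^{-a\,Q^{2}/m_{h}^{2} + b}\). The sketch below encodes the constants as read from Eq. (6.2); the channel labels and function names are hypothetical.

```python
import math

# Eq. (6.2) written as factor = 1 + sign * exp(-slope * Q^2/m_h^2 + offset)
EQ_6_2_FACTORS = {
    "qq_NS": (+1.0, 0.5, -3.5),
    "qq_PS": (-1.0, 0.25, -0.3),
    "qg": (-1.0, 0.05, 0.35),
}

def eq62_factor(channel: str, q2_over_m2: float) -> float:
    """Q^2-dependent multiplier applied to the high-Q^2 limit in Eq. (6.2)."""
    sign, slope, offset = EQ_6_2_FACTORS[channel]
    return 1.0 + sign * math.exp(-slope * q2_over_m2 + offset)

def c_light_ffns(channel: str, c_high_x: float, q2_over_m2: float) -> float:
    """FFNS light-quark coefficient at fixed x from its high-Q^2 value."""
    return c_high_x * eq62_factor(channel, q2_over_m2)
```

Every factor tends to one at large \(Q^{2}\). At \(Q^{2} = 0\) the NS factor differs from unity by only about 3% (\(e^{-3.5} \approx 0.03\)), reflecting the weak \(Q^{2}\) dependence expected from lower orders, while the PS factor reduces the low-\(Q^{2}\) value to about a quarter of the high-\(Q^{2}\) limit.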

6.2 Low-\(Q^{2}\) N\(^{3}\)LO heavy flavour coefficient functions

As previously mentioned in Sect. 3, the standard MSHT theoretical description of NNLO structure functions includes approximations to the low-\(Q^{2}\) FFNS coefficient functions \(C_{H,\{q,g\}}^{(3),{\textrm{FF}}}\) from [47,48,49]. These functions contain the precisely known LL small-x terms and mass threshold information, along with an approximate NLL small-x term added in the MSHT fit. In the NNLO fit these approximate NLL parameters play a very small role, since they are not only subleading but also only affect the FFNS below the mass thresholds. At NNLO they are therefore set heuristically to a value that is theoretically justified and suits the NNLO best fit. At N\(^{3}\)LO, however, these functions begin to directly affect the form of the full GM-VFNS across all \((x, Q^{2})\). For this reason, these NLL parameters need to be considered as an independent source of theoretical uncertainty. In the aN\(^{3}\)LO fit, the NLL parameters are left free and included in the framework set out in Sect. 2.1.

The standard NNLO MSHT fit contains terms of the form,

$$\begin{aligned}&C_{H,i}^{(3),\ {\textrm{NLL}}}(Q^{2} \rightarrow 0) \propto -4\ \frac{1}{x}\ + c_{i}^{\textrm{LL}}\ \frac{\ln 1/x}{x}, \nonumber \\&\quad \left( c_{g}^{\textrm{LL}} = \frac{C_{F}}{C_{A}}\ c_{q}^{\textrm{LL}}\right) , \end{aligned}$$
(6.3)

where \(i = q, g\) and \(c_{i}^{\textrm{LL}}\) is the precisely known leading small-x log coefficient. In the aN\(^{3}\)LO fit, the NLL coefficient is allowed to vary by \(\pm 50\%\) (\(\pm 1\sigma \) variation). This conservative range is chosen to relieve tension with the variational parameters associated with the N\(^{3}\)LO transition matrix elements. We stress that this quantity is already set heuristically at NNLO, so our treatment is justified, with the added benefit of now accounting for an uncertainty in this choice.
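To illustrate how the \(\pm 50\%\) range can be read as a \(\pm 1\sigma \) nuisance parameter, the sketch below assumes a Gaussian prior on a hypothetical parameter `theta`, so that \(\theta = \pm 1\) shifts the central NLL coefficient \(-4\) of Eq. (6.3) by \(\pm 50\%\), with a \(\theta ^{2}\) penalty added to the \(\chi ^{2}\). The value of \(c_{i}^{\textrm{LL}}\) passed in is a placeholder, and the overall normalisation implied by the proportionality in Eq. (6.3) is omitted.

```python
import math

A_NLL_CENTRAL = -4.0  # central NLL coefficient of the 1/x term in Eq. (6.3)
REL_WIDTH = 0.5       # +-50% variation taken as +-1 sigma

def nll_coefficient(theta: float) -> float:
    """NLL coefficient for a nuisance parameter theta (theta = +-1 is +-1 sigma)."""
    return A_NLL_CENTRAL * (1.0 + REL_WIDTH * theta)

def small_x_form(x: float, c_ll: float, theta: float = 0.0) -> float:
    """Small-x shape of Eq. (6.3), without its overall normalisation."""
    return nll_coefficient(theta) / x + c_ll * math.log(1.0 / x) / x

def prior_penalty(theta: float) -> float:
    """Assumed Gaussian chi^2 penalty for the nuisance parameter."""
    return theta ** 2
```

Under this reading, \(\theta = +1\) gives an NLL coefficient of \(-6\) and \(\theta = -1\) gives \(-2\), each costing one unit of \(\chi ^{2}\).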

6.3 3-Loop approximations

6.4 \(C_{H, q}\)

In this section the \(C_{H, q}\) coefficient function is investigated. As discussed in Sect. 3, \(C_{H, q}\) contributes to the heavy flavour structure function \(F_{2, H}\). We begin by isolating this function from Eq. (3.14) and relating the FFNS and GM-VFNS schemes at all orders from Eq. (3.9) and Eq. (3.13),

$$\begin{aligned} C_{H, q}^{{\textrm{FF}}}= & {} \left[ C_{H, H}^{{\textrm{VF}},\ {\textrm{NS}}} +C_{H, H}^{{\textrm{VF}},\ {\textrm{PS}}}\right] \otimes A_{H q}^{\textrm{PS}} \nonumber \\{} & {} +\ C_{H, q}^{{\textrm{VF}}} \otimes \left[ A_{q q, H}^{{\textrm{NS}}} + A_{qq, H}^{{\textrm{PS}}}\right] \ +\ C_{H, g}^{{\textrm{VF}}} \otimes A_{g q, H}. \nonumber \\ \end{aligned}$$
(6.4)

Expanding this function we obtain:

$$\begin{aligned} {\mathcal {O}}(\alpha _{s}): {\quad }&C_{H,q}^{{\textrm{FF}},\ (1)} = 0 \end{aligned}$$
(6.5)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{2}): {\quad }&C_{H,q}^{{\textrm{FF}},\ (2)} = C_{H,H}^{{\textrm{VF}},\ (0)} \otimes A_{Hq}^{{\textrm{PS}},\ (2)} \nonumber \\&\quad \qquad \qquad + C_{H, q}^{{\textrm{VF}},\ (2)} \otimes A_{qq, H}^{{\textrm{NS}},\ (0)} \end{aligned}$$
(6.6)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{3}): {\quad }&C_{H,q}^{{\textrm{FF}},\ (3)} = C_{H, H}^{{\textrm{VF}},\ (1)}\otimes A_{Hq}^{{\textrm{PS}},\ (2)}\nonumber \\&\quad \qquad \qquad + C_{H, H}^{{\textrm{VF}},\ (0)}\otimes A_{Hq}^{{\textrm{PS}},\ (3)} \nonumber \\&\quad \qquad \qquad +\ C_{H, q}^{{\textrm{VF}},\ (3)}\otimes A_{qq,H}^{{\textrm{NS}},\ (0)}\nonumber \\&\quad \qquad \qquad + C_{H, g}^{{\textrm{VF}},\ (1)} \otimes A_{gq, H}^{(2)} \end{aligned}$$
(6.7)

where we recall that \(A_{qq,H}^{{\textrm{NS}},\ (0)} = \delta (1 - x)\).
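The convolution \(\otimes \) used throughout Eqs. (6.4) to (6.7) is the standard Mellin convolution, \((f \otimes g)(x) = \int _{x}^{1} \frac{dz}{z}\, f(z)\, g(x/z)\). A minimal numerical sketch for regular functions follows; it does not handle distributions such as \(\delta (1-x)\) or plus-distributions, which must be treated analytically (the delta function acts as the identity, which is why convolving with \(A_{qq,H}^{{\textrm{NS}},\ (0)}\) simply returns the other factor). The function name and the use of Simpson's rule in \(\ln z\) are illustrative choices.

```python
import math
from typing import Callable

def mellin_convolution(f: Callable[[float], float],
                       g: Callable[[float], float],
                       x: float, n: int = 400) -> float:
    """Numerically evaluate (f (x) g)(x) = int_x^1 dz/z f(z) g(x/z).

    With u = ln z the measure dz/z becomes du on [ln x, 0], which is
    integrated with the composite Simpson rule on n (even) intervals.
    Only valid for regular f and g, not for delta or plus distributions.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie in (0, 1)")
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    a = math.log(x)
    h = -a / n
    total = 0.0
    for i in range(n + 1):
        z = math.exp(a + i * h)
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * f(z) * g(x / z)
    return total * h / 3.0
```

For example, with \(f = g = 1\) the result is \(\ln (1/x)\), and the convolution is symmetric under exchanging f and g.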

6.5 NNLO

The first contribution from the heavy quarks appears at the \({\mathcal {O}}(\alpha _{s}^{2})\) level. Fortunately there is a complete picture of this order [43] which provides some experience with the behaviour of these functions before moving into unknown territory. Figure 13 shows the case for \(C_{H, q}^{{\textrm{VF}},\ (2)}\) converging onto \(C_{H, q}^{{\textrm{ZM}},\ (2)}\) at high-\(Q^2\), as required by the definition of the GM-VFNS scheme outlined in Sect. 3.

Fig. 13

The NNLO GM-VFNS function \(C_{H, q}^{{\textrm{VF}},\ (2)}\) compared with the NNLO ZM-VFNS function \(C_{H, q}^{{\textrm{ZM}},\ (2)}\) across a variety of x and \(Q^{2}\) values. Mass threshold is set at the charm quark level (\(m_{h}^{2} = m_{c}^{2} = 1.4~\text {GeV}^{2}\))

Fig. 13 immediately provides some intuition about the form of these functions. The GM-VFNS function is consistently more positive at low-\(Q^{2}\) than at high-\(Q^{2}\). However, the values at low and high-\(Q^{2}\) are of the same order of magnitude, suggesting that the behaviour should not be substantially different across values of \(Q^{2}\) when estimating our N\(^{3}\)LO quantities. Furthermore, as \(x \rightarrow 0\) the overall magnitude of \(C_{H,q}^{(2)}\) becomes much larger, which is consistent with an inherently pure-singlet quantity.

6.6 N\(^{3}\)LO

At \({\mathcal {O}}(\alpha _{s}^{3})\) the N\(^{3}\)LO ZM-VFNS and low-\(Q^{2}\) FFNS functions are known [47,48,49, 57], and parameterisations/approximations are available (up to the level of precision discussed in Sect. 6.2). Nevertheless, there is no direct information on how the full GM-VFNS function behaves at this order, which is required for a full treatment of the heavy flavour coefficient functions. Rearranging Eq. (6.7) to estimate the N\(^{3}\)LO contribution, we have

$$\begin{aligned} C_{H, q}^{{\textrm{VF}},\ (3)}= & {} C_{H,q}^{{\textrm{FF}},\ (3)} - C_{H, H}^{{\textrm{VF}},\ (1)}\otimes A_{Hq}^{{\textrm{PS}},\ (2)} \nonumber \\{} & {} - C_{H, g}^{{\textrm{VF}},\ (1)} \otimes A_{gq, H}^{(2)} - A_{Hq}^{{\textrm{PS}},\ (3)}, \end{aligned}$$
(6.8)

where \(A_{Hq}^{{\textrm{PS}},\ (3)}\) is the N\(^{3}\)LO transition matrix element approximated in Sect. 5.1.
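Eqs. (6.5) to (6.8) amount to solving \(C^{\textrm{FF}} = C^{\textrm{VF}} \otimes A\) order by order in \(\alpha _{s}\). Since Mellin moments turn convolutions into products, at a fixed moment N and for a simplified single channel this becomes ordinary power-series division. The sketch below assumes that single-channel simplification (the real relations mix channels, e.g. through \(C_{H, g}^{{\textrm{VF}}} \otimes A_{gq, H}\)) and a leading term \(A^{(0)} = 1\), the Mellin transform of \(\delta (1-x)\); the function names are hypothetical and the numbers in the check are arbitrary.

```python
from typing import List

def multiply_series(c: List[float], a: List[float]) -> List[float]:
    """Coefficients of (sum_i c_i as^i)(sum_j a_j as^j), truncated to len(c)."""
    return [sum(c[i] * a[k - i] for i in range(k + 1) if k - i < len(a))
            for k in range(len(c))]

def invert_matching(c_ff: List[float], a: List[float]) -> List[float]:
    """Solve C_FF = C_VF * A order by order for C_VF.

    Assumes a[0] == 1, i.e. a delta-function leading term in A.
    """
    if a[0] != 1.0:
        raise ValueError("expected a[0] == 1")
    c_vf: List[float] = []
    for k in range(len(c_ff)):
        # subtract every lower-order product, as in Eq. (6.8)
        sub = sum(c_vf[i] * a[k - i] for i in range(k) if k - i < len(a))
        c_vf.append(c_ff[k] - sub)
    return c_vf
```

The \(\alpha _{s}^{3}\) step of the loop is the single-channel analogue of Eq. (6.8): the GM-VFNS coefficient is the FFNS coefficient minus every lower-order product of coefficient function and transition matrix element.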

The discontinuities introduced into the heavy flavour PDF by the transition matrix elements (at the threshold value \(Q^{2} = m_{h}^{2}\)) must be cancelled exactly in the structure function. The cancellation of \(A_{Hq}^{{\textrm{PS}},\ (3)}\) is therefore guaranteed by its inclusion in the GM-VFNS coefficient function in Eq. (6.8). Since in practice the transition matrix elements are convoluted with the PDFs separately from the coefficient functions, we perform the parameterisation in the FFNS to ensure that this cancellation still holds. We can then explicitly switch to the GM-VFNS by including the subtraction term in Eq. (6.8). This procedure ensures that \(A_{Hq}^{{\textrm{PS}},\ (3)}\) is subtracted exactly, with no unphysical discontinuity.

Following the methodology set out in Sect. 6.1, the two regimes we wish to interpolate between are the approximate \(C_{H, q}^{{\textrm{FF}},\ (3)}(Q^{2} \rightarrow 0)\) limit and

$$\begin{aligned} C_{H, q}^{{\textrm{FF}},\ (3)}(Q^{2} \rightarrow \infty )= & {} C_{H,q}^{{\textrm{ZM}},\ (3)} + C_{H, H}^{{\textrm{VF}},\ (1)}\otimes A_{Hq}^{{\textrm{PS}},\ (2)} \nonumber \\{} & {} + C_{H, g}^{{\textrm{VF}},\ (1)} \otimes A_{gq, H}^{(2)} + A_{Hq}^{{\textrm{PS}},\ (3)}, \end{aligned}$$
(6.9)

where \(C_{H,q}^{{\textrm{VF}},\ (3)}\) is replaced with \(C_{H,q}^{{\textrm{ZM}},\ (3)}\) in the high-\(Q^{2}\) limit. Eq. (6.1) then gives a parameterisation that is stable across all \((x, Q^{2})\), exactly cancels any discontinuity that would violate the RG flow, and follows the known FFNS approximation for \(Q^{2} < m^{2}_{h}\) (Footnote 13).

Fig. 14

N\(^{3}\)LO GM-VFNS function \(C_{H, q}^{{\textrm{VF}},\ (3)}\) compared with the N\(^{3}\)LO ZM-VFNS function \(C_{H, q}^{{\textrm{ZM}},\ (3)}\) across a variety of x and \(Q^{2}\) values (shown without the variation from the low-\(Q^{2}\) NLL term discussed in Sect. 6.2). \(C_{H, q}^{{\textrm{VF}},\ (3)}\) is parameterised via Eqs. (6.8), (6.9) and (6.1). Mass threshold is set at the charm quark level (\(m_{h}^{2} = m_{c}^{2} = 1.4~\text {GeV}^{2}\))

Figure 14 shows the result of estimating \(C_{H, q}^{{\textrm{VF}},\ (3)}\) using the above approximation for \(C_{H, q}^{{\textrm{FF}},\ (3)}\) and the relevant subtraction terms from Eq. (6.8). Note that this plot ignores any variation from the low-\(Q^{2}\) NLL term discussed in Sect. 6.2, which is fixed to its central value.

6.7 \(C_{H, g}\)

As with \(C_{H,q}\), using Eq. (3.9) and Eq. (3.13) to isolate \(C_{H, g}\) and relate the FFNS and GM-VFNS schemes,

$$\begin{aligned} C_{H,g}^{{\textrm{FF}}}&= C_{H,g}^{{\textrm{VF}}} \otimes A_{gg, H} + C_{H,q}^{{\textrm{VF}},\ {\textrm{PS}}} \otimes A_{qg, H} \nonumber \\&\quad + \left[ C_{H,H}^{{\textrm{VF}},\ {\textrm{NS}}}+C_{H,H}^{{\textrm{VF}},\ {\textrm{PS}}}\right] \otimes A_{H g} \end{aligned}$$
(6.10)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}): {\quad }&C_{H,g}^{{\textrm{FF}},\ (1)} = C_{H,g}^{{\textrm{VF}},\ (1)} + C_{H, H}^{{\textrm{VF}},\ (0)}\otimes A_{Hg}^{(1)} \end{aligned}$$
(6.11)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{2}): {\quad }&C_{H,g}^{{\textrm{FF}},\ (2)} = C_{H,g}^{{\textrm{VF}},\ (2)} + C_{H,g}^{{\textrm{VF}},\ (1)}\otimes A_{gg,H}^{(1)} \nonumber \\&\qquad \qquad \quad + C_{H,H}^{{\textrm{VF}},\ (0)} \otimes A_{Hg}^{(2)}+ C_{H, H}^{{\textrm{VF}},\ (1)}\otimes A_{Hg}^{(1)} \end{aligned}$$
(6.12)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{3}): {\quad }&C_{H,g}^{{\textrm{FF}},\ (3)} = C_{H,g}^{{\textrm{VF}},\ (3)} + C_{H, g}^{{\textrm{VF}},\ (2)}\otimes A_{gg, H}^{(1)} \nonumber \\&\qquad \qquad \quad + C_{H, g}^{{\textrm{VF}},\ (1)}\otimes A_{gg, H}^{(2)} \nonumber \\&\qquad \qquad \quad + C_{H,H}^{{\textrm{VF}},\ {\textrm{NS}}+{\textrm{PS}},\ (2)} \otimes A_{Hg}^{(1)}\nonumber \\&\qquad \qquad \quad + C_{H, H}^{{\textrm{VF}},\ (1)}\otimes A_{Hg}^{(2)}+ C_{H, H}^{{\textrm{VF}},\ (0)}\otimes A_{Hg}^{(3)} \end{aligned}$$
(6.13)

we uncover an NLO contribution to the heavy flavour structure function. This lower-order contribution arises because the gluon can directly probe the heavy flavour quarks, whereas a light quark must interact via a secondary interaction (hence the \(C_{H,q}\) coefficient function begins at NNLO).

6.8 NLO and NNLO

The NLO and NNLO contributions to \(C_{H,g}\) are known exactly [43]. To build some experience and check our understanding, we can observe how the lower order GM-VFNS functions converge onto their ZM-VFNS counterparts in Figs. 15 and 16.

Fig. 15

The NLO GM-VFNS function \(C_{H, g}^{{\textrm{VF}},\ (1)}\) compared with the NLO ZM-VFNS function \(C_{H, g}^{{\textrm{ZM}},\ (1)}\) across a variety of x and \(Q^{2}\) values. Mass threshold is set at the charm quark level (\(m_{h}^{2} = m_{c}^{2} = 1.4~\text {GeV}^{2}\))

Fig. 16

The NNLO GM-VFNS function \(C_{H, g}^{{\textrm{VF}},\ (2)}\) compared with the NNLO ZM-VFNS function \(C_{H, g}^{{\textrm{ZM}},\ (2)}\) across a variety of x and \(Q^{2}\) values. Mass threshold is set at the charm quark level (\(m_{h}^{2} = m_{c}^{2} = 1.4~\text {GeV}^{2}\))

At NLO and NNLO the magnitude of the functions is generally higher in the low-\(Q^{2}\) limit than at high-\(Q^{2}\). In both cases, the function remains of the same order of magnitude across all \(Q^{2}\). However, the relative change across \(Q^{2}\) is smaller at NLO, and similar to that seen for \(C_{H,q}^{(2)}\) at NNLO. We can therefore again expect that, although a stronger \(Q^{2}\) dependence may be present at N\(^{3}\)LO, it should not be substantial across the range of \(Q^{2}\).

6.9 N\(^{3}\)LO

As with the \(C_{H, q}^{(3)}\) function at \({\mathcal {O}}(\alpha _{s}^{3})\), the FFNS result at low-\(Q^{2}\) is known (up to the level of precision discussed in Sect. 6.2), as is the exact ZM-VFNS function at high-\(Q^{2}\) [47,48,49, 57]. For \(C^{{\textrm{VF}},\ (3)}_{H,g}\), there is an extra complication coming from the transition matrix element \(A_{Hg}^{(3)}\). As discussed in Sect. 5.1, \(A_{Hg}^{(3)}\) is not as well known as the \(A_{Hq}^{{\textrm{PS}},\ (3)}\) function considered earlier and is accompanied by the variational parameter \(a_{Hg}\). Since \(C_{H, g}^{(3)}\) is required to cancel exactly the PDF discontinuity introduced by \(A_{Hg}^{(3)}\), this variation must be compensated for and included in the description,

$$\begin{aligned} C_{H,g}^{{\textrm{VF}},\ (3)}= & {} C_{H,g}^{{\textrm{FF}},\ (3)} - C_{H, g}^{{\textrm{VF}},\ (2)}\otimes A_{gg, H}^{(1)} \nonumber \\{} & {} - C_{H, g}^{{\textrm{VF}},\ (1)}\otimes A_{gg, H}^{(2)} - C_{H,H}^{{\textrm{VF}},\ {\textrm{NS}}+{\textrm{PS}},\ (2)} \otimes A_{Hg}^{(1)} \nonumber \\{} & {} - C_{H, H}^{{\textrm{VF}},\ (1)}\otimes A_{Hg}^{(2)} - A_{Hg}^{(3)}. \end{aligned}$$
(6.14)

As in Sect. 6.3, performing the parameterisation in the FFNS ensures an exact cancellation via the subtraction term in Eq. (6.14). Using the available low-\(Q^{2}\) approximation for \(C_{H,g}^{{\textrm{FF}},\ (3)}(Q^{2} \rightarrow 0)\) and the known high-\(Q^{2}\) limit,

$$\begin{aligned} C_{H,g}^{{\textrm{FF}},\ (3)}(Q^{2} \rightarrow \infty )= & {} C_{H,g}^{{\textrm{ZM}},\ (3)} + C_{H, g}^{{\textrm{VF}},\ (2)}\otimes A_{gg, H}^{(1)} \nonumber \\{} & {} + C_{H, g}^{{\textrm{VF}},\ (1)}\otimes A_{gg, H}^{(2)} \nonumber \\{} & {} + C_{H,H}^{{\textrm{VF}},\ {\textrm{NS}}+{\textrm{PS}},\ (2)} \otimes A_{Hg}^{(1)} \nonumber \\{} & {} + C_{H, H}^{{\textrm{VF}},\ (1)}\otimes A_{Hg}^{(2)} + A_{Hg}^{(3)} \end{aligned}$$
(6.15)

where \(C_{H,g}^{{\textrm{VF}},\ (3)}\) is replaced with \(C_{H,g}^{{\textrm{ZM}},\ (3)}\) in the high-\(Q^{2}\) limit. Applying the framework set out in Eq. (6.1), the resulting parameterisation is stable across all \((x,Q^{2})\). Since \(A_{Hg}^{(3)}\) and its variation are explicitly included in Eq. (6.14), the structure function remains continuous, with discontinuities cancelling exactly at the mass thresholds.

Fig. 17

The N\(^{3}\)LO GM-VFNS function \(C_{H, g}^{{\textrm{VF}},\ (3)}\) compared with the N\(^{3}\)LO ZM-VFNS function \(C_{H, g}^{{\textrm{ZM}},\ (3)}\) across a variety of x and \(Q^{2}\) values (shown without the variation from the low-\(Q^{2}\) NLL term discussed in Sect. 6.2). \(C_{H, g}^{{\textrm{VF}},\ (3)}\) is parameterised via Eqs. (6.14), (6.15) and (6.1). Mass threshold is set at the charm quark level (\(m_{h}^{2} = m_{c}^{2} = 1.4~\text {GeV}^{2}\))

Figure 17 displays our approximation for the GM-VFNS coefficient function across a range of \((x, Q^{2})\), obtained via a parameterisation for \(C_{H, g}^{{\textrm{FF}},\ (3)}\) and the relevant subtraction terms in Eq. (6.14). Figure 17 also shows the uncertainty in this approximation stemming from \(A_{Hg}^{(3)}\) (see Sect. 5.1). Note that Fig. 17 ignores any variation from the low-\(Q^{2}\) NLL term discussed in Sect. 6.2, which is fixed to its central value. The uncertainty shown in Fig. 17 is suppressed as we move to high-\(Q^{2}\), owing to the required convergence of the GM-VFNS function onto the corresponding ZM-VFNS gluon coefficient function at N\(^{3}\)LO.
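The suppression of this uncertainty at high-\(Q^{2}\) can be made explicit by combining Eqs. (6.1), (6.14) and (6.15), assuming that the low-\(Q^{2}\) FFNS input does not depend on the variation of \(A_{Hg}^{(3)}\). Writing \(w(Q^{2}) = e^{0.3\,(1-Q^{2}/m_{h}^{2})}\) and using S as a shorthand (introduced here only for illustration) for the sum of the other subtraction terms in Eq. (6.14), for \(Q^{2} \ge m_{h}^{2}\):

$$\begin{aligned} C_{H,g}^{{\textrm{VF}},\ (3)}&= w\, C_{H,g,\ \text {low-}Q^{2}}^{{\textrm{FF}},\ (3)}(x, m_{h}^{2}) + (1-w)\left[ C_{H,g}^{{\textrm{ZM}},\ (3)} + S + A_{Hg}^{(3)}\right] - S - A_{Hg}^{(3)} \nonumber \\&= w\, C_{H,g,\ \text {low-}Q^{2}}^{{\textrm{FF}},\ (3)}(x, m_{h}^{2}) + (1-w)\, C_{H,g}^{{\textrm{ZM}},\ (3)} - w\left[ S + A_{Hg}^{(3)}\right] . \end{aligned}$$

A shift \(\delta A_{Hg}^{(3)}\) therefore changes the GM-VFNS function by \(-w(Q^{2})\,\delta A_{Hg}^{(3)}\). Below threshold the full shift \(-\delta A_{Hg}^{(3)}\) survives, while above it the weight falls from one at \(Q^{2} = m_{h}^{2}\) to \(e^{-2.7} \approx 0.07\) at \(Q^{2} = 10\,m_{h}^{2}\), and to zero as \(Q^{2} \rightarrow \infty \).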

Also included in Fig. 17 is the best fit prediction for \(C_{H, g}^{{\textrm{VF}},\ (3)}\) (corresponding to the best fit of \(A_{Hg}^{(3)}\) approximated in Sect. 5). Overall, the resulting \(C_{H,g}^{(3)}\) lies within our predicted range and has a sensible shape that matches the known high-\(Q^{2}\) FFNS behaviour. Compared with NNLO, the shape is less consistent across the range of x values shown. There is no guarantee that the N\(^{3}\)LO shape should mirror NNLO, since we do not know in advance how the perturbative expansion behaves at this order. However, the order of magnitude remains relatively consistent across the evolution in \(Q^{2}\), so the exact form of the shape across \(Q^{2}\) will be less important for the resulting structure function.

6.10 \(C_{q,q}^{{\textrm{NS}}}\)

The light quark coefficient functions involve small heavy flavour contributions at higher orders from heavy quarks produced away from the photon vertex. As discussed in Sect. 6.1, the low-\(Q^{2}\) FFNS function in this case is unknown. However, since the heavy flavour contributions to the light quark structure function \(F_{2,q}(x,Q^{2})\) are very small, any sensible choice of \(Q^{2}\) dependence has a near negligible effect on the overall structure function. Furthermore, lower-order examples suggest that the light quark coefficient functions remain relatively constant across \(Q^{2}\).

Using Eqs. (3.9) and (3.11), the non-singlet coefficient function is stated as,

$$\begin{aligned} C_{q,q}^{{\textrm{FF}},\ {\textrm{NS}}}= & {} A_{qq,H}^{{\textrm{NS}}}\otimes C_{q,q}^{{\textrm{VF}},\ {\textrm{NS}}}, \end{aligned}$$
(6.16)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{0}):&{\quad } C_{q,q,\ {\textrm{NS}}}^{{\textrm{FF}},\ (0)}\ =\ C_{q,q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (0)} \end{aligned}$$
(6.17a)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{1}):&{\quad } C_{q,q,\ {\textrm{NS}}}^{{\textrm{FF}},\ (1)}\ =\ C_{q,q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (1)} \end{aligned}$$
(6.17b)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{2}):&{\quad } C_{q,q,\ {\textrm{NS}}}^{{\textrm{FF}},\ (2)}\ =\ C_{q,q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (2)} + A_{qq,H}^{{\textrm{NS}},\ (2)} \end{aligned}$$
(6.17c)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{3}):&{\quad } C_{q,q,\ {\textrm{NS}}}^{{\textrm{FF}},\ (3)}\ =\ C_{q,q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (3)} + A_{qq,H}^{{\textrm{NS}},\ (3)} \nonumber \\&\qquad \quad \qquad \qquad + C_{q,q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (1)}\otimes A_{qq,H}^{{\textrm{NS}},\ (2)}. \end{aligned}$$
(6.17d)

From Eq. (6.17), the FFNS function at LO and NLO is identical to the GM-VFNS function, and hence to the ZM-VFNS function at high-\(Q^{2}\). Physically, for heavy quarks to affect light quarks, more vertices are needed than are available at LO and NLO. We therefore begin our discussion at NNLO.

6.10.1 NNLO

At NNLO the functions included in Eq. (6.17c) are known exactly [39, 58]. Assembling these, we show how the GM-VFNS function converges to the familiar ZM-VFNS function for the light quark. This exercise also builds expectations for how \(C_{q,q}^{{\textrm{NS}}}\) will behave at N\(^{3}\)LO.

Fig. 18

The NNLO GM-VFNS function \(C_{q, q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (2)}\) compared with the NNLO ZM-VFNS function \(C_{q, q,\ {\textrm{NS}}}^{{\textrm{ZM}},\ (2)}\) across a variety of x and \(Q^{2}\) values. Mass threshold is set at the charm quark level (\(m_{h}^{2} = m_{c}^{2} = 1.4~\text {GeV}^{2}\))

From Fig. 18, \(C_{q,q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (2)}\) quickly converges onto the ZM-VFNS function, with the difference between low and high-\(Q^{2}\) being within \(10\%\) at large-x and within \(0.01\%\) at small-x. This weak dependence on \(Q^{2}\) reinforces the statement that it is possible to approximate the N\(^{3}\)LO function relatively well without extensive low-\(Q^{2}\) information.

6.10.2 N\(^{3}\)LO

Equation (6.17d) involves a mixture of functions known exactly (the ZM-VFNS high-\(Q^{2}\) limit [57]) and functions that are completely unknown (\(C_{q,q,\ {\textrm{NS}}}^{{\textrm{FF}},\ (3)}\)). This presents an issue, as we can no longer rely on \(C_{q,q,\ {\textrm{NS}}}^{{\textrm{FF}},\ (3)}\) to constrain the low-\(Q^{2}\) limit. Nevertheless, using the experience gained at NNLO, any sensible choice can be made for the low-\(Q^{2}\) limit. In practice, due to the observed weak dependence on \(Q^{2}\), the exact form at low-\(Q^{2}\) will not make any noticeable difference.

A naive heuristic choice for \(C_{q,q,\ {\textrm{NS}}}^{{\textrm{FF}},\ (3)}(Q^{2}\rightarrow 0)\) would be a constant, i.e. no dependence on \(Q^{2}\). Instead, we use the intuition from NNLO and the overall fit quality to arrive at a potentially more sensible and viable choice for the GM-VFNS approximation (Footnote 14). Inserting the high-\(Q^{2}\) limit into the NS part of Eq. (6.2) gives a crude approximation to \(C_{q,q,\ {\textrm{NS}}}^{{\textrm{FF}},\ (3)}(Q^{2}\rightarrow 0)\). Combining this with Eq. (6.17d), we obtain a GM-VFNS parameterisation which is relatively constant across \(Q^{2}\) (similar to the NNLO behaviour), with any differences arising from the known subtraction terms.

Fig. 19

The N\(^{3}\)LO GM-VFNS function \(C_{q, q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (3)}\) compared with the N\(^{3}\)LO ZM-VFNS function \(C_{q, q,\ {\textrm{NS}}}^{{\textrm{ZM}},\ (3)}\) across a variety of x and \(Q^{2}\) values. \(C_{q, q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (3)}\) is parameterised via Eqs. (6.17d) and (6.2). Mass threshold is set at the charm quark level (\(m_{h}^{2} = m_{c}^{2} = 1.4~\text {GeV}^{2}\))

Figure 19 shows the result of this approximation for the full \(C_{q, q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (3)}\) function. The behaviour is similar to that at NNLO across all \((x, Q^{2})\), and appropriately larger in magnitude to account for the extra contributions obtained at N\(^{3}\)LO. By construction, the parameterisation converges well to the ZM-VFNS, with the magnitude at high-\(Q^{2}\) (ZM-VFNS regime) remaining similar to that at low-\(Q^{2}\) for each specific value of x. This final point reassures us that, even if this low-\(Q^{2}\) guess is not entirely representative of the actual N\(^{3}\)LO function, the effects of including this approximation in a PDF fit are virtually negligible. Also shown in Fig. 19 is the variation in \(C_{q, q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (3)}\) stemming solely from \(A_{qq,H}^{{\textrm{NS}},\ (3)}\).

6.11 \(C_{q,q}^{{\textrm{PS}}}\)

To complete the light-quark GM-VFNS coefficient function picture, the pure-singlet contribution from Eqs. (3.9) and (3.11) is described by,

$$\begin{aligned} C_{q,q}^{{\textrm{FF}},\ {\textrm{PS}}}= & {} C_{q,q}^{{\textrm{VF}},\ {\textrm{PS}}} \otimes A_{qq,H}^{{\textrm{PS}}}\ +\ C_{q,g}^{{\textrm{VF}}}\otimes A_{gq,H} \nonumber \\{} & {} +\ C_{q,H}^{{\textrm{VF}},\ {\textrm{PS}}}\otimes A_{Hq} \end{aligned}$$
(6.18)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{0}):&{\quad } C_{q,q}^{{\textrm{FF}},\ {\textrm{PS}},\ (0)}= 0 \end{aligned}$$
(6.19a)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{1}):&{\quad } C_{q,q}^{{\textrm{FF}},\ {\textrm{PS}},\ (1)}= 0 \end{aligned}$$
(6.19b)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{2}):&{\quad } C_{q,q}^{{\textrm{FF}},\ {\textrm{PS}},\ (2)}= C_{q,q,\ {\textrm{PS}}}^{{\textrm{VF}},\ (2)} \end{aligned}$$
(6.19c)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{3}):&{\quad } C_{q,q}^{{\textrm{FF}},\ {\textrm{PS}},\ (3)}= C_{q,q,\ {\textrm{PS}}}^{{\textrm{VF}},\ (3)} + C_{q,g}^{{\textrm{VF}},\ (1)}\otimes A_{gq,H}^{(2)}. \end{aligned}$$
(6.19d)

As in the non-singlet case, heavy flavour contributions to the pure-singlet only appear at higher orders, where enough vertices are available for heavy quarks to contribute. In the pure-singlet case, these contributions are pushed one order higher than in the non-singlet, due to the requirement for an extra intermediate gluon.

6.11.1 N\(^{3}\)LO

In the pure-singlet case, the FFNS function receives no heavy flavour contribution until N\(^{3}\)LO. We therefore choose to parameterise the pure-singlet function with a weak constraint that suppresses the FFNS function \(C_{q,q}^{{\textrm{FF}},\ {\textrm{PS}},\ (3)}\) across all x at very low-\(Q^{2}\). The reasoning is that coefficient functions acquire more contributions at successively higher orders: if \(C_{q,q}^{{\textrm{FF}},\ {\textrm{PS}},\ (3)}\) only begins at this order, one could expect its low-\(Q^{2}\) form to be relatively small compared with the known ZM-VFNS function [57]. This is partly justified by the low-\(Q^{2}\) kinematic restrictions on the singlet distribution, which broadly manifest as a suppression at low-\(Q^{2}\). We reiterate that the resulting function is still of roughly the same magnitude across all \(Q^{2}\). Therefore, as with \(C_{q,q}^{{\textrm{FF}},\ {\textrm{NS}},\ (3)}\), its precise low-\(Q^{2}\) form will have a virtually negligible effect on the overall structure function.

After constructing the approximation for \(C_{q,q}^{{\textrm{FF}},\ {\textrm{PS}},\ (3)}\) with Eq. (6.2), Eq. (6.19d) is used to approximate the GM-VFNS function. The exact form of the parameterisation is chosen based on intuition and on where the best fit quality can be achieved (Footnote 15).

Fig. 20

The N\(^{3}\)LO GM-VFNS function \(C_{q, q,\ {\textrm{PS}}}^{{\textrm{VF}},\ (3)}\) compared with the N\(^{3}\)LO ZM-VFNS function \(C_{q, q,\ {\textrm{PS}}}^{{\textrm{ZM}},\ (3)}\) across a variety of x and \(Q^{2}\) values. \(C_{q, q,\ {\textrm{PS}}}^{{\textrm{VF}},\ (3)}\) is parameterised via Eqs. (6.19d) and (6.2). Mass threshold is set at the charm quark level (\(m_{h}^{2} = m_{c}^{2} = 1.4~\text {GeV}^{2}\))

It can be seen from Fig. 20 that the overall magnitude of \(C_{q, q,\ {\textrm{PS}}}^{{\textrm{VF}},\ (3)}\) decreases substantially towards large-x, as one would expect for a pure-singlet function. Inspecting the predicted values of \(C_{q, q,\ {\textrm{PS}}}^{{\textrm{VF}},\ (3)}\), we confirm that the non-singlet function from Fig. 19 begins to dominate at large-x. Conversely, towards small-x, \(C_{q, q,\ {\textrm{PS}}}^{{\textrm{VF}},\ (3)}\) is much larger than \(C_{q, q,\ {\textrm{NS}}}^{{\textrm{VF}},\ (3)}\), preserving the familiar interplay between the pure-singlet and non-singlet quark contributions. The suppression of the FFNS parameterisation towards low-\(Q^{2}\) also gives sensible results in terms of the expected percentage change in magnitude across the range of \(Q^{2}\) values: the magnitude differs by \(< 10\%\) between low and high-\(Q^{2}\). Since scaling-violating terms become more dominant at higher orders, and we are essentially at leading order in terms of heavy flavour contributions, a strong dependence on \(Q^{2}\) is not expected at this order.

6.12 \(C_{q,g}\)

Finally, the gluon–light-quark coefficient function is constructed from Eqs. (3.9) and (3.11) as,

$$\begin{aligned} C_{q,g}^{{\textrm{FF}}}&= C_{q,q}^{{\textrm{VF}}} \otimes A_{qg,H}\ +\ C_{q,g}^{{\textrm{VF}}}\otimes A_{gg,H} \nonumber \\&\quad +\ C_{q,H}^{{\textrm{VF}},\ {\textrm{PS}}}\otimes A_{Hg} \end{aligned}$$
(6.20)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{0}):&{\quad } C_{q,g}^{{\textrm{FF}},\ (0)}\ =\ 0 \end{aligned}$$
(6.21a)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{1}):&{\quad } C_{q,g}^{{\textrm{FF}},\ (1)}\ =\ C_{q,g}^{{\textrm{VF}},\ (1)} \end{aligned}$$
(6.21b)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{2}):&{\quad } C_{q,g}^{{\textrm{FF}},\ (2)}\ =\ C_{q,g}^{{\textrm{VF}},\ (2)} + C_{q,g}^{{\textrm{VF}},\ (1)}\otimes A_{gg, H}^{(1)} + A_{qg, H}^{(2)} \end{aligned}$$
(6.21c)
$$\begin{aligned} {\mathcal {O}}(\alpha _{s}^{3}):&{\quad } C_{q,g}^{{\textrm{FF}},\ (3)}\ =\ C_{q,g}^{{\textrm{VF}},\ (3)} + A_{qg,H}^{(3)} + C_{q,q}^{{\textrm{VF}},\ (1)}\otimes A_{qg,H}^{(2)} \nonumber \\&\qquad \qquad \qquad + C_{q,g}^{{\textrm{VF}},\ (2)}\otimes A_{gg,H}^{(1)} + C_{q,g}^{{\textrm{VF}},\ (1)}\otimes A_{gg,H}^{(2)} \nonumber \\&\qquad \qquad \qquad + C_{q,H,\ {\textrm{PS}}}^{{\textrm{VF}},\ (2)}\otimes A_{Hg}^{(1)}. \end{aligned}$$
(6.21d)

For \(C_{q,g}\), the FFNS function is non-existent up to NNLO, similar to \(C_{q,q,\ {\textrm{NS}}}^{{\textrm{FF}}, (3)}\). However, the \(A_{qg,H}\) contribution at NNLO is sub-leading in \(n_{f}\) [39] and is therefore not considered here.

6.12.1 N\(^{3}\)LO

Fig. 21

The N\(^{3}\)LO GM-VFNS function \(C_{q, g}^{{\textrm{VF}},\ (3)}\) compared with the N\(^{3}\)LO ZM-VFNS function \(C_{q, g}^{{\textrm{ZM}},\ (3)}\) across a variety of x and \(Q^{2}\) values. \(C_{q, g}^{{\textrm{VF}},\ (3)}\) is parameterised via Eqs. (6.21d) and (6.2). Mass threshold is set at the charm quark level (\(m_{h}^{2} = m_{c}^{2} = 1.4~\text {GeV}^{2}\))

At N\(^{3}\)LO in Eq. (6.21d), no information is available on \(C_{q, g}^{{\textrm{FF}},\ (3)}\) at low-\(Q^{2}\), whereas at high-\(Q^{2}\) the ZM-VFNS function is known [57]. To construct the parameterisation, we apply the same method as described for \(C_{q,q,\ {\textrm{PS}}}^{{\textrm{FF}},\ (3)}\), namely a suppression of the FFNS parameterisation in the low-\(Q^{2}\) limit. After constructing the parameterisation for \(C_{q, g}^{{\textrm{FF}},\ (3)}\) with Eq. (6.2), Eq. (6.21d) is used to approximate the GM-VFNS function. Since there is no information in the low-\(Q^{2}\) limit, the parameterisation in Eq. (6.2) is chosen roughly according to how the fit prefers the \(Q^{2}\) evolution to behave.

Figure 21 illustrates the GM-VFNS function of Eq. (6.21d), with Eq. (6.2) used for \(C_{q, g}^{{\textrm{FF}},\ (3)}\), across a range of x and \(Q^{2}\). \(C_{q, g}^{{\textrm{VF}},\ (3)}\) increases in magnitude towards smaller x and by construction converges onto the ZM-VFNS function. The convergence in this case is chosen to be less steep than for the light quark functions, due to some minor tensions in the fit (Footnote 16). The magnitude of \(C_{q, g}^{{\textrm{VF}},\ (3)}\) is still relatively constant across the entire range of \(Q^{2}\), although less flat than the behaviour predicted for \(C_{q,q,\ {\textrm{PS}}/{\textrm{NS}}}^{{\textrm{VF}},\ (3)}\). Equation (6.21d) offers some justification for this behaviour. When comparing the contributions to the FFNS functions in the NS, PS and gluon cases (Eqs. (6.17d), (6.19d) and (6.21d) respectively), the \(A_{Hg}\) and \(A_{gg,H}\) contributions entering \(C_{q,g}^{{\textrm{FF}}}\) are much larger than the \(A_{gq,H}\), \(A_{Hq}\) and \(A^{{\textrm{NS}}}_{qq,H}\) contributions. We can therefore expect a larger variation across \(Q^{2}\) for the \(C_{q, g}^{{\textrm{VF}},\ (3)}\) function. That said, the specific form at low-\(Q^{2}\) is not very important in current PDF fits, provided the form is continuous and valid.

7 N\(^{3}\)LO K-factors

Thus far the primary concern has been the N\(^{3}\)LO additions to the theoretical description of the DIS cross section. To complement these changes, the predictions for the other cross section data must be extended to the same order, so that a consistent approximate N\(^{3}\)LO treatment is maintained across all datasets. At the time of writing, K-factors which provide exact transformations for each dataset up to NNLO are available (Footnote 17). Although there has been progress in N\(^{3}\)LO calculations for various processes, including Drell–Yan (DY), top production and Higgs processes [63,64,65,66,67,68,69,70,71,72,73,74,75,76], information on how these K-factors behave beyond NNLO is still missing. In this section we investigate the effects of the K-factors for each dataset when extended to N\(^{3}\)LO. Five process categories are considered separately: Drell–Yan, Jets, \(p_{T}\) jets, \(t{\bar{t}}\) production and Dimuon data. Within each process category we assume a perfect positive correlation between the behaviour of the datasets, i.e. all Drell–Yan K-factor shifts from NNLO are fully correlated. This treatment is clearly a simplification, based on the expectation of a high degree of correlation between datasets for the same process. In practice, the uncertainty introduced by these K-factors is relatively small compared with the other sources of MHOU already discussed, so any correction to this treatment is expected to be small (this will be shown more clearly in Sect. 8).

7.1 Extension to aN\(^{3}\)LO

The extension to aN\(^{3}\)LO is parameterised with a mixture of the NLO and NNLO K-factors. This allows control of the magnitude and shape of the transformation from NNLO to aN\(^{3}\)LO, using the known shifts from lower orders.

The basic idea is presented as,

$$\begin{aligned} K^{{\textrm{N}}^{3}{\textrm{LO}}/{\textrm{LO}}} = a_{\textrm{NNLO}}\ K^{{\textrm{NNLO}}/{\textrm{LO}}} + a_{\textrm{NLO}}\ K^{{\textrm{NLO}}/{\textrm{LO}}}, \end{aligned}$$
(7.1)

where \(K^{{\textrm{N}}^{3}{\textrm{LO}}/{\textrm{LO}}}, K^{{\textrm{NNLO}}/{\textrm{LO}}}\ {\textrm{and}}\ K^{{\textrm{NLO}}/{\textrm{LO}}}\) are the relevant K-factors with respect to the LO cross section, and \(a_{{\textrm{N}}({\textrm{N}}){\textrm{LO}}}\) are variational parameters controlling the mixture of NNLO and NLO K-factors in the N\(^{3}\)LO K-factor approximation. This gives 2 parameters for each of the five process categories, and hence 20 theory nuisance parameters in total: 10 controlling the aN\(^{3}\)LO K-factors, 5 controlling the aN\(^{3}\)LO splitting functions and 5 controlling the heavy flavour aN\(^{3}\)LO contributions.

Fig. 22

K-factor expansion up to aN\(^{3}\)LO shown for the LHCb 2015 \(W,\ Z\) dataset [77, 78]. The K-factors shown here are absolute i.e. all with respect to LO (\(K^{{\textrm{N}}^{\textrm{m}}{\textrm{LO}}/{\textrm{LO}}}\ \forall \ m \in \{1, 2, 3\}\))

Fig. 23

K-factor expansion up to aN\(^{3}\)LO shown for the ATLAS 7 TeV high precision \(W,\ Z\) dataset [79]. The K-factors shown here are absolute i.e. all with respect to LO (\(K^{{\textrm{N}}^{\textrm{m}}{\textrm{LO}}/{\textrm{LO}}}\ \forall \ m \in \{1, 2, 3\}\))

To describe this formalism in terms of physical observables we consider the cross section,

$$\begin{aligned} \sigma = \sigma _{0} + \sigma _{1} + \sigma _{2} + \dots \equiv \sigma _{\textrm{NNLO}} + \dots , \end{aligned}$$
(7.2)

where \(\sigma _{i}\) implicitly contains a factor \(\alpha _{s}^{p+i}\), with p set by the LO of each process, e.g. \(p = 0\) for DY.

\(K^{{\textrm{NLO}}/{\textrm{LO}}}\) is then the relative shift from \(\sigma _{\textrm{LO}}\) to \(\sigma _{\textrm{NLO}}\),

$$\begin{aligned} K^{{\textrm{NLO}}/{\textrm{LO}}} = \frac{\sigma _{0} + \sigma _{1}}{\sigma _{0}} = 1 + \frac{\sigma _{1}}{\sigma _{0}}. \end{aligned}$$
(7.3)

Similarly for NNLO we have,

$$\begin{aligned} K^{{\textrm{NNLO}}/{\textrm{LO}}} = \frac{\sigma _{0} + \sigma _{1} + \sigma _{2}}{\sigma _{0}} = 1 + \frac{\sigma _{1}}{\sigma _{0}} + \frac{\sigma _{2}}{\sigma _{0}}. \end{aligned}$$
(7.4)

Moving to N\(^{3}\)LO, we write

$$\begin{aligned} \sigma = \sigma _{0} + \sigma _{1} + \sigma _{2} + \sigma _{3} + \dots \equiv \sigma _{{\textrm{N}}^{3}{\textrm{LO}}} + \dots , \end{aligned}$$
(7.5)

where \(\sigma _{3} = a_{1} \sigma _{1} + a_{2} \sigma _{2}\) is approximated as some superposition of the two lower orders, with \((a_{1}, a_{2}) = (0, 0)\) reproducing the NNLO case.

Proceeding with this approximation and using the expressions for \(\sigma _{1,2}\) in terms of K-factors (Eqs. (7.3) and (7.4)), we have,

$$\begin{aligned} \sigma _{{\textrm{N}}^3{\textrm{LO}}}&= \sigma _{\textrm{NNLO}} + a_{1}\sigma _{1} + a_{2}\sigma _{2} \end{aligned}$$
(7.6)
$$\begin{aligned}&= \sigma _{\textrm{NNLO}} + a_{1}\sigma _{0} (K^{{\textrm{NLO}}/{\textrm{LO}}} - 1) \nonumber \\&\quad + a_{2} \sigma _{0} (K^{{\textrm{NNLO}}/{\textrm{LO}}} - K^{{\textrm{NLO}}/{\textrm{LO}}}) \end{aligned}$$
(7.7)

since,

$$\begin{aligned} \sigma _{1}&= \sigma _{0}\left( K^{\mathrm {NLO/LO}} - 1\right) \end{aligned}$$
(7.8)
$$\begin{aligned} \sigma _{2}&= \sigma _{0} K^{\mathrm {NNLO/LO}} - \sigma _{1} - \sigma _{0} \nonumber \\&= \sigma _{0}\left( K^{\mathrm {NNLO/LO}} - K^{\mathrm {NLO/LO}}\right) . \end{aligned}$$
(7.9)

From here one can obtain,

$$\begin{aligned} K^{\mathrm {NNLO/LO}} - K^{\mathrm {NLO/LO}}&= \frac{\sigma _{2}}{\sigma _{0}} = \frac{\sigma _{2} + \sigma _{0}}{\sigma _{0}} - 1 \nonumber \\&\approx \frac{\sigma _{2} + \sigma _{1} + \sigma _{0}}{\sigma _{1} + \sigma _{0}} - 1 \nonumber \\&= K^{\mathrm {NNLO/NLO}} - 1, \end{aligned}$$
(7.10)

assuming \(\sigma _{1} \ll \sigma _{0}\), which is in general true for a valid perturbative expansion. Using (7.10) \(\sigma _{{\textrm{N}}^{3}{\textrm{LO}}}\) can be expressed by,

$$\begin{aligned} \sigma _{{\textrm{N}}^3{\textrm{LO}}}\simeq & {} \sigma _{\textrm{NNLO}}\left( 1 + a_{1}(K^{\mathrm {NLO/LO}} - 1) \right. \nonumber \\{} & {} \left. + a_{2} (K^{\mathrm {NNLO/NLO}} - 1)\right) , \end{aligned}$$
(7.11)

where \(\sigma _{2} \ll \sigma _{1} \ll \sigma _{0}\).
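The chain from Eq. (7.7) to Eq. (7.11) can be checked numerically. The minimal Python sketch below uses purely illustrative values for \(\sigma _{0}, \sigma _{1}, \sigma _{2}\) (hypothetical numbers, not taken from any dataset in the fit), rebuilds \(\sigma _{1}\) and \(\sigma _{2}\) from the K-factors via Eqs. (7.8) and (7.9), and compares the superposition of Eq. (7.7) with the approximate form of Eq. (7.11). The function names are our own and do not correspond to any code used for the fit.

```python
def k_factors(s0, s1, s2):
    """K^{NLO/LO}, K^{NNLO/LO} (Eqs. 7.3, 7.4) and K^{NNLO/NLO} from order-by-order pieces."""
    k_nlo_lo = (s0 + s1) / s0
    k_nnlo_lo = (s0 + s1 + s2) / s0
    k_nnlo_nlo = (s0 + s1 + s2) / (s0 + s1)
    return k_nlo_lo, k_nnlo_lo, k_nnlo_nlo


def sigma_n3lo_eq77(s0, s1, s2, a1, a2):
    """Eq. (7.7): sigma_NNLO + a1*sigma_1 + a2*sigma_2, with sigma_1,2 rebuilt from K-factors."""
    k_nlo_lo, k_nnlo_lo, _ = k_factors(s0, s1, s2)
    sigma_nnlo = s0 * k_nnlo_lo
    sigma_1 = s0 * (k_nlo_lo - 1.0)         # Eq. (7.8)
    sigma_2 = s0 * (k_nnlo_lo - k_nlo_lo)   # Eq. (7.9)
    return sigma_nnlo + a1 * sigma_1 + a2 * sigma_2


def sigma_n3lo_eq711(s0, s1, s2, a1, a2):
    """Eq. (7.11): relies on sigma_2 << sigma_1 << sigma_0 (via Eq. 7.10)."""
    k_nlo_lo, k_nnlo_lo, k_nnlo_nlo = k_factors(s0, s1, s2)
    sigma_nnlo = s0 * k_nnlo_lo
    return sigma_nnlo * (1.0 + a1 * (k_nlo_lo - 1.0) + a2 * (k_nnlo_nlo - 1.0))


if __name__ == "__main__":
    # Hypothetical hierarchy sigma_2 << sigma_1 << sigma_0 (illustrative numbers only).
    s0, s1, s2 = 1.0, 0.05, 0.01
    full = sigma_n3lo_eq77(s0, s1, s2, a1=0.1, a2=0.3)
    approx = sigma_n3lo_eq711(s0, s1, s2, a1=0.1, a2=0.3)
    print(full, approx, (approx - full) / full)
```

With \((a_{1}, a_{2}) = (0, 0)\) both forms return \(\sigma _{\textrm{NNLO}}\). For the inputs above they differ by about 0.03%; the difference, \(a_{1}\sigma _{1}(\sigma _{1}+\sigma _{2})/\sigma _{0} + a_{2}\sigma _{2}^{2}/(\sigma _{0}+\sigma _{1})\), is second order in the small ratios, which is why Eq. (7.11) requires \(\sigma _{2} \ll \sigma _{1} \ll \sigma _{0}\).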

This defines the proposed approximate N\(^{3}\)LO cross section, given in terms of extra contributions from lower-order shifts controlled by the variational parameters \(a_{1}\) and \(a_{2}\). The N\(^{3}\)LO contributions are also expected to be suppressed by \(\alpha _{s} / \pi \) relative to the NNLO shift and by \((\alpha _{s}/\pi )^{2}\) relative to the NLO shift, reflecting the strength of each contribution. As written, this suppression is absorbed into the variational parameters \(a_{1}, a_{2}\). For the purpose of this description, however, it is more appropriate to make it explicit by redefining \((a_{1}, a_{2}) = (a_{s}^{2}\hat{a}_{1}, a_{s}\hat{a}_{2})\), where \(a_{s} = \mathcal {N}\alpha _{s}\) and \(\mathcal {N}\) is a normalisation factor. This then results in,

$$\begin{aligned} K^{{\textrm{N}}^3{\mathrm {LO/LO}}}= & {} K^{\mathrm {NNLO/LO}}\left( 1 + \hat{a}_{1}\mathcal {N}^{2}\alpha _{s}^{2}(K^{\mathrm {NLO/LO}} - 1) \right. \nonumber \\{} & {} \left. + \hat{a}_{2} \mathcal {N}\alpha _{s} (K^{\mathrm {NNLO/NLO}} - 1)\right) , \end{aligned}$$
(7.12)

where the LO cross section \(\sigma _{0}\) cancels and Eq. (7.12) is written in terms of the K-factor shifts only. Equation (7.12) also implicitly includes the correct order, \({\mathcal {O}}(\alpha _{s}^{3})\), in the parameterisation through (7.3) and (7.4). We can then choose \(\mathcal {N}\) to set the approximate magnitude of the variational parameters \(\hat{a}_{1}, \hat{a}_{2}\). Given \(\alpha _{s} \sim 0.1\) for the processes considered, neglecting \(\mathcal {N}\) (i.e. choosing \(\mathcal {N} \sim 1\)) would imply an order-by-order reduction of \(\sim 10\%\) in the K-factor shifts for \({\mathcal {O}}(1)\) variational parameters. However, previous orders show that each K-factor shift is typically \(30{-}40\%\) of the one at the previous order, so we instead choose \(\mathcal {N} = 3\). This ensures that the natural scale of variation allowed is also of this order, with \({\mathcal {O}}(1)\) variational parameters describing the admixture of NLO and NNLO K-factors and conservative penalties applied accordingly.
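A minimal Python sketch of Eq. (7.12) is given below, assuming the per-point K-factors \(K^{\mathrm {NLO/LO}}\) and \(K^{\mathrm {NNLO/LO}}\) are already available as plain numbers; \(K^{\mathrm {NNLO/NLO}}\) is obtained as their ratio, which follows directly from Eqs. (7.3) and (7.4). The function and argument names are hypothetical, \(\mathcal {N} = 3\) is taken from the text, and the input values in the usage lines are illustrative only. Setting \(K^{\mathrm {NNLO/LO}} = K^{\mathrm {NLO/LO}}\), as one might do for a dataset without an NNLO K-factor, gives \(K^{\mathrm {NNLO/NLO}} = 1\) and removes the \(\hat{a}_{2}\) term.

```python
def k_n3lo_over_lo(k_nlo_lo, k_nnlo_lo, a1_hat, a2_hat, alpha_s, norm=3.0):
    """Approximate N3LO/LO K-factor following Eq. (7.12).

    k_nlo_lo, k_nnlo_lo : K^{NLO/LO} and K^{NNLO/LO} for one data point.
    a1_hat, a2_hat      : O(1) nuisance parameters for the NLO and NNLO shifts.
    alpha_s             : strong coupling for the process.
    norm                : normalisation N, so that a_s = N * alpha_s (N = 3 in the text).
    """
    # K^{NNLO/NLO} = K^{NNLO/LO} / K^{NLO/LO}, from Eqs. (7.3) and (7.4).
    k_nnlo_nlo = k_nnlo_lo / k_nlo_lo
    a_s = norm * alpha_s
    # The NLO shift is weighted by a_s^2 and the NNLO shift by a_s.
    return k_nnlo_lo * (1.0
                        + a1_hat * a_s**2 * (k_nlo_lo - 1.0)
                        + a2_hat * a_s * (k_nnlo_nlo - 1.0))


if __name__ == "__main__":
    # Illustrative K-factors for a single bin (hypothetical values).
    print(k_n3lo_over_lo(1.2, 1.26, a1_hat=0.0, a2_hat=0.0, alpha_s=0.1))  # reproduces K^{NNLO/LO}
    print(k_n3lo_over_lo(1.2, 1.26, a1_hat=1.0, a2_hat=1.0, alpha_s=0.1))
```

With \(\mathcal {N}\alpha _{s} \approx 0.3\), choosing \(\hat{a}_{1} = \hat{a}_{2} = 1\) for these illustrative inputs raises the K-factor by about 3% relative to \(K^{\mathrm {NNLO/LO}}\), in line with the intended natural scale of variation.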

Table 2 Table showing the relevant DY datasets and how the individual \(\chi ^{2}\) changes from NNLO by including the N\(^{3}\)LO treatment of K-factors, and theoretical N\(^{3}\)LO additions discussed earlier. The result with purely NNLO K-factors included for all data in the fit is also given
Fig. 24

K-factor expansion up to aN\(^{3}\)LO shown for the CMS 7 TeV jets dataset (\(R=0.7\)) [101]. The K-factors shown here are absolute i.e. all with respect to LO (\(K^{{\textrm{N}}^{\textrm{m}}{\textrm{LO}}/{\textrm{LO}}}\ \forall \ m \in \{1, 2, 3\}\))

It is worth noting that these fitted K-factors will be sensitive to all orders, not just N\(^{3}\)LO. If these K-factors are regarded as approximating the all-orders asymptotic behaviour when assessing the stability of predictions, we can be less concerned by somewhat large shifts from NNLO to aN\(^{3}\)LO, such as those seen in Figs. 25 and 26. Finally, we remind the reader that new terms with more divergent leading logarithms appear at higher orders, and these are missed by the current theoretical description. The all-orders asymptotic description therefore remains approximate until such more divergent leading logarithms in the relevant \((x,Q^{2})\) limits at even higher orders are included.

7.2 Numerical results

Using this formalism for the aN\(^{3}\)LO K-factors, we present the global fit results for each of the five process categories considered.

7.2.1 Drell–Yan processes

For the Drell–Yan processes (all calculated at \(\mu _{r,f}=m_{ll}/2\)), a reduction of \(\sim 1{-}2\%\) in the K-factor shift is predicted across most of the corresponding datasets at aN\(^{3}\)LO. This is in agreement with recent work [64]. An example of this reduction is shown in Fig. 22.

Conversely, Fig. 23 displays an example where the N\(^{3}\)LO contribution to the K-factor shift is much smaller. This is a feature of the ATLAS datasets included in the fit, owing to the chosen \(p_{T}\) cuts, which reduce the sensitivity to higher orders.

Table 3 Table showing the relevant jet datasets and how the individual \(\chi ^{2}\) changes from NNLO by including the N\(^{3}\)LO treatment of K-factors. The result with purely NNLO K-factors included for all data in the fit is also given

Table 2 demonstrates that in most cases the new fitted DY aN\(^{3}\)LO K-factors produce a slightly better fit, with a moderate cumulative effect. We remind the reader that a total of 20 extra parameters have been included in the fit. These 20 parameters are fitted across all datasets and multiple processes, whereas the decrease here is for the subset of datasets corresponding to the DY processes in the global fit.

Across these datasets, the K-factors act to extend the description of these processes to approximate N\(^{3}\)LO. The result of including this procedure is a better fit in the DY regime while also relaxing tensions with other processes included in the fit. Comparing the \(\Delta \chi ^{2}\) results with and without aN\(^{3}\)LO K-factors, we can see the extent to which the K-factors and all other N\(^{3}\)LO additions are reducing the overall \(\chi ^{2}\).

In some individual cases the dataset \(\chi ^{2}\) becomes somewhat worse relative to NNLO, most notably for the ATLAS 8 TeV double differential Z data [100]. In a few others, the \(\chi ^{2}\) improvement obtained from the aN\(^{3}\)LO splitting functions, transition matrix elements and coefficient function pieces deteriorates upon addition of the aN\(^{3}\)LO K-factors; an example is the LHCb 2015 W, Z data [77, 78], which exhibit a mild preference for the N\(^{3}\)LO theory with NNLO K-factors. The addition of the aN\(^{3}\)LO K-factors nonetheless results in a net reduction in \(\chi ^{2}\), and for a large number of datasets the aN\(^{3}\)LO K-factors allow a slight reduction in the individual \(\chi ^{2}\). Of particular note are the ATLAS 7 TeV high precision W, Z [79] data, which improve by over 20 points. This may indicate that the difficulty, observed across all NNLO PDF fits, in achieving a statistically good fit to these high precision data is partly related to the lack of higher order corrections in the theory. On the other hand, for arguably the other most precise (and multi-differential) data, namely the ATLAS 8 TeV double differential Z [100], the fit quality deteriorates. This may be due to the differing mass binning and/or cuts, but in general it is difficult to draw firm conclusions here, at least with the current K-factor treatment. The CMS double diff. Drell–Yan data [94] also show a particularly large reduction when the K-factors are added on top of the aN\(^{3}\)LO theory; this dataset shows some tension with the DIS N\(^{3}\)LO additions, which is then eased by the addition of the aN\(^{3}\)LO K-factors.

7.2.2 Jet production processes

The jet processes (all calculated with \(\mu _{r,f}=p_{T}^{jet}\)) show a general increase in the K-factor shifts from NNLO, as seen in Fig. 24, which displays the K-factor expansion up to aN\(^{3}\)LO for the CMS 7 TeV jets dataset [101]. The shift from the NNLO K-factor to N\(^{3}\)LO is mild. This behaviour is what one might expect of a perturbative expansion, given the forms of the NLO and NNLO K-factors.

A \(\chi ^{2}\) summary of the Jets datasets is provided in Table 3. Combining the N\(^{3}\)LO structure function and DGLAP additions (Sects. 4–6) with NNLO K-factors, the fit exhibits a substantial increase in the \(\chi ^{2}\) from Jets data. Including aN\(^{3}\)LO K-factors reduces some of this tension, with around half of the initial overall \(\chi ^{2}\) increase remaining. We note that for the ATLAS \(7~\text {TeV}\) jets [104] it is well known that there are issues in achieving a good fit quality across all rapidity bins (see [107] for a detailed study, as well as [108], where the 8 TeV data are presented and the same issues are observed). In [107, 108] the possibility of decorrelating some of the systematic error sources, where the degree of correlation is less well established, was considered, and in our study we follow such a procedure, as described in [3]. Alternatively, however, the issues in fit quality could be at least in part due to deficiencies in the theoretical predictions, such as MHOs. To assess this, we revert to the default ATLAS correlation scenario and repeat the global fit. We find that the \(\chi ^{2}\) deteriorates by \(+40.7\) points to 256.6, which is very close to the result found in a pure NNLO fit [3]. In other words, in our framework the inclusion of MHOUs does not resolve this issue.

The \(\chi ^{2}\) results in Table 3 show evidence for some tension with the N\(^{3}\)LO form of the high-x gluon. It is also apparent that the CMS data are in more tension than the ATLAS datasets with the N\(^{3}\)LO structure function and DGLAP theory. It will therefore be interesting to see how this behaviour changes when these data are considered as dijets in the global fit [109]. We do not consider the dijet data here; this will be addressed in a future publication.

7.2.3 \(Z\ p_{T}\) and vector boson \(+\) jets processes

In the case of \(Z\ p_{T}\) and vector boson \(+\) jet processes (all calculated at \(\mu _{r,f}=\sqrt{p_{T,ll}^2 + m_{ll}^2}\)), the K-factor shift is almost completely dominated by the ATLAS 8 TeV \(Z\ p_{T}\) dataset [110] shown in Fig. 25, due to the larger number of data points in this dataset. The gluon is less directly constrained than the quarks in a global fit. It can therefore be expected that the significant modifications at small-x will indirectly affect the high-x gluon, to which these processes are most sensitive. Considering the jet production processes in Table 3, when performing separate PDF fits excluding the ATLAS 8 TeV \(Z\ p_{T}\) data [110], we find a reduction of \(\Delta \chi ^{2} = - 7.0\) in the CMS \(8~\text {TeV}\) jets data [105], eliminating most of the tension for this dataset (similar to the MSHT20 NNLO results in Table 16 of [3]). Further, when excluding both the HERA and ATLAS 8 TeV \(Z\ p_{T}\) data, we find reductions of \(\Delta \chi ^{2} = - 26.4\) in the CMS \(8~\text {TeV}\) jet data [105] and \(\Delta \chi ^{2} = - 12.7\) in the CMS \(2.76~\text {TeV}\) jet data [106].

Although the overall magnitude of the K-factor in Fig. 25 may seem large, this new shift is contained within a \(15\%\) increase from NNLO (due to the NLO and NNLO K-factors also being significant). Moreover, not only does the size of this shift have some dependence on the central scale, but this shift may be more correctly interpreted as the preferred all-orders cross section rather than simply the pure \({\textrm{N}}^3{\textrm{LO}}\) result.

The extent of the \(\chi ^{2}\) reduction in the \(Z\ p_{T}\) datasets is shown in Table 4. Note that \(\sim 68\%\) of the improvement in the ATLAS \(8~\text {TeV}\ Z\ p_{T}\) [110] \(\chi ^{2}\) is due to the extra N\(^{3}\)LO theory included in the DGLAP and DIS descriptions. It is also known that the ATLAS \(8~\text {TeV}\ Z\ p_{T}\) data [110] previously exhibited a significant level of tension with many datasets (including HERA data) at NNLO [3]. This was investigated by performing global PDF fits with and without HERA data and comparing the individual \(\chi ^{2}\) values for each dataset. At NNLO, the \(\chi ^{2}\) of the ATLAS \(8~\text {TeV}\ Z\ p_{T}\) dataset [110] decreased by \(\Delta \chi ^{2} = -39.2\) when fitting to all non-HERA data (see Table 19). At aN\(^{3}\)LO, by contrast, its \(\chi ^{2}\) increased by \(\Delta \chi ^{2} = +12.8\) when fitting to all non-HERA data (see Table 20). The aN\(^{3}\)LO additions therefore eliminate the tension previously observed at NNLO, suggesting that the difficulty in fitting the ATLAS \(8~\text {TeV}\ Z\ p_{T}\) dataset [110] at NNLO was a sign of MHOs. This contrasts with the result for the ATLAS \(7~\text {TeV}\) jets [104], where the issues with fit quality were not alleviated by including the known N\(^{3}\)LO information and approximations for the remaining missing pieces.

Finally, we remind the reader that the CMS \(7~\text {TeV}\ W + c\) dataset [62] does not include a K-factor at NNLO. To overcome this, we tie the overall N\(^{3}\)LO K-factor shift to the NLO value (\(K^{\mathrm {NNLO/NLO}} = 1\) in Eq. (7.12)), so that it acts as an overall normalisation effect.

Fig. 25

K-factor expansion up to aN\(^{3}\)LO shown for the ATLAS 8 TeV \(Z\ p_{T}\) dataset [110]. The K-factors shown here are absolute i.e. all with respect to LO (\(K^{{\textrm{N}}^{\textrm{m}}{\textrm{LO}}/{\textrm{LO}}}\ \forall \ m \in \{1, 2, 3\}\))

7.2.4 Top quark processes

Table 4 Table showing the relevant \(Z\ p_{T}\) and Vector Boson jet datasets and how the individual \(\chi ^{2}\) changes from NNLO by including the N\(^{3}\)LO treatment of K-factors, and theoretical N\(^{3}\)LO additions discussed earlier. The result with purely NNLO K-factors included for all data in the fit is also given
Fig. 26

K-factor expansion up to aN\(^{3}\)LO shown for the CMS 8 TeV single diff. \(t{\bar{t}}\) dataset [112]. The K-factors shown here are absolute i.e. all with respect to LO (\(K^{{\textrm{N}}^{\textrm{m}}{\textrm{LO}}/{\textrm{LO}}}\ \forall \ m \in \{1, 2, 3\}\))

Table 5 Table showing the relevant Top Quark datasets and how the individual \(\chi ^{2}\) changes from NNLO by including the N\(^{3}\)LO treatment of K-factors, and theoretical N\(^{3}\)LO additions discussed earlier. The result with purely NNLO K-factors included for all data in the fit is also given
Fig. 27

K-factor expansion up to aN\(^{3}\)LO shown for the NuTeV \(\nu N \rightarrow \mu \mu X\) dataset [133]. The K-factors shown here are absolute i.e. all with respect to LO (\(K^{{\textrm{N}}^{\textrm{m}}{\textrm{LO}}/{\textrm{LO}}}\ \forall \ m \in \{1, 2, 3\}\))

Moving to top quark processes, for the single differential datasets the scale choice for \(\mu _{r,f}\) is \(H_T/4\) with the exception of data differential in the average transverse momentum of the top or anti-top, \(p_T^t,p_T^{{\bar{t}}}\), for which \(m_T/2\) is used. For the double diff. dataset the scale choice is \(H_T/4\) and for the inclusive top \(\sigma _{t{\bar{t}}}\) a scale of \(m_{t}\) is chosen. Figure 26 displays the K-factor shifts up to N\(^{3}\)LO for the CMS 8 TeV single diff. \(t{\bar{t}}\) dataset [112], which shows the greatest reduction in its \(\chi ^2\). A familiar perturbative pattern can be seen for this process’s K-factors, with the shift at aN\(^{3}\)LO increasing by around 3–4% from NNLO. This is in agreement with a recent \(\sim 3.5\%\) predicted increase in the N\(^3\)LO \(t{\bar{t}}\) production K-factor at \(8~\text {TeV}\) in [66], whereby an approximate N\(^{3}\)LO cross section for \(t{\bar{t}}\) production in proton-proton collisions has been calculated employing a resummation formalism [113,114,115,116].

The \(\chi ^{2}\) results in Table 5 display a mildly better fit for top processes, with most datasets not feeling a large overall effect from the N\(^{3}\)LO additions. Comparing with and without aN\(^{3}\)LO K-factors, we see a slightly better fit overall, with most of the reduction in overall \(\chi ^{2}\) stemming from CMS \(8~\text {TeV}\) double diff. \(t{\bar{t}}\) data [132].

7.2.5 Semi-inclusive DIS dimuon processes

The final set of results in this section concerns the aN\(^{3}\)LO K-factors associated with semi-inclusive DIS dimuon cross sections (with \(\mu _{r,f}^2=Q^{2}\)). Although the dimuon cross section is related to the DIS process described by our approximate N\(^{3}\)LO structure function picture, it is a semi-inclusive DIS process, so it is sensible to treat it entirely separately from DIS. The NNLO cross sections used in this case are a general-mass variable flavour number scheme extension of the results in [134], as described in more detail in [3]. The K-factors shown in Fig. 27 (for the NuTeV \(\nu N \rightarrow \mu \mu X\) data [133]) are broadly similar to those at NNLO. This is mostly because these datasets also include a branching ratio (\(\textrm{BR}(D\rightarrow \mu )\)), which absorbs any overall normalisation shifts. This behaviour is not a concern, since in practice the two work in tandem, and in combination it makes no difference where the normalisation is absorbed.

Table 6 shows the change in the BRs with the addition of N\(^{3}\)LO contributions: the BR decreases substantially from NNLO to N\(^{3}\)LO, with little difference from the addition of aN\(^{3}\)LO K-factors. The predicted dimuon BR at aN\(^{3}\)LO lies inside the allowed \(\pm 1 \sigma \) range of \(0.092 \pm 0.010\). Performing a fit with the BR fixed at its central value (BR = 0.092) shows the effect of forcing the normalisation into the K-factor variation alone. The result is a worse global fit quality, \(\Delta \chi ^{2} = + 11.2\), of which \(+3.9\) units arise from an increased penalty for the Dimuon K-factor description and \(+2.3\) units from a slightly worse fit to the Dimuon datasets listed in Table 7. The rest of the increase in \(\chi ^{2}_{\textrm{global}}\) is dominated by a \(+4.1\) increase for the ATLAS \(7~\text {TeV}\) high prec. WZ [79] data, due to a smaller strange quark PDF (compensating for the higher BR in the dimuon datasets). Returning to the case where the K-factors and BR are both free, the predicted effect on the dimuon datasets is very similar. However, because the uncertainties allow a larger shift in the BR than in the K-factors, the fit favours moving the BR by a larger amount to reduce the penalty \(\chi ^{2}\) contribution from the K-factors, which explains the results shown in Table 6.

Table 7 further confirms the expectation that the Dimuon datasets are not very sensitive to N\(^{3}\)LO additions. The results with and without a full treatment of aN\(^{3}\)LO K-factors are also similar in magnitude. This indicates that the dimuon BRs, rather than the aN\(^{3}\)LO K-factors, compensate for any indirect normalisation effects from the form of the PDFs in the full aN\(^{3}\)LO fit.

8 MSHT20 approximate N\(^{3}\)LO global analysis

With the inclusion of all the N\(^{3}\)LO approximations discussed in earlier sections, which introduce 20 extra free parameters relative to the NNLO MSHT20 fit, we now present the results of the first approximate N\(^{3}\)LO global PDF fit with theoretical uncertainties from missing N\(^{3}\)LO contributions, and implicitly from some MHOs beyond this. These include the best-fit values of the nuisance parameters describing the theoretical uncertainty. We remind the reader that these parameters are designed specifically to represent the missing N\(^{3}\)LO contributions, which are currently the dominant source of uncertainty due to missing higher orders. However, the fit will also be influenced, to a limited extent, by effects at even higher orders, as discussed in more detail later in this section.

8.1 \(\chi ^{2}\) breakdown

Table 8 shows the global \(\chi ^{2}\) results for an aN\(^{3}\)LO best fit, inclusive of penalties associated with the new theory variational parameters (from Eq. (2.17)). The theory parameters are labelled as: \(A_{Hg}(a_{Hg})\), \(A_{gg,H}(a_{gg,H})\), \(A_{qq,H}^{{\textrm{NS}}}(a_{qq,H}^{{\textrm{NS}}})\) for the transition matrix elements; \(P_{qq}^{{\textrm{NS}}}(\rho _{qq}^{{\textrm{NS}}})\), \(P_{qq}^{{\textrm{PS}}}(\rho _{qq}^{{\textrm{PS}}})\), \(P_{qg}(\rho _{qg})\), \(P_{gq}(\rho _{gq})\) and \(P_{gg}(\rho _{gg})\) for the splitting functions; and \(c_{q}^{\textrm{NLL}}\) and \(c_{g}^{\textrm{NLL}}\) correspond to the NLL parameters discussed in Sect. 6.2. These are supplemented by the 10 additional nuisance parameters for the NLO and NNLO K-factors for the five process categories. These 20 additional parameters and their associated penalties are also shown in Table 8.

The additional N\(^{3}\)LO theory, and the extra freedom it introduces, allow the fit to achieve a total \(\Delta \chi ^{2} = - 150.4\) relative to the MSHT20 NNLO total \(\chi ^{2}\) (Table 7 of [3]). Comparing with lower order PDF fits, we find a smooth convergence in fit quality, as one may expect from increasing the accuracy of a perturbative expansion (\(\chi ^{2} / N_{\textrm{pts}}\) = LO: 2.57, NLO: 1.33, NNLO: 1.17, N\(^{3}\)LO: 1.14). In part, this is due to the extra freedom in the K-factors, which will almost always act to reduce the \(\chi ^{2}\) through the minimisation procedure. However, even with this freedom, in most cases the N\(^{3}\)LO theory (non-K-factor) contributions deviate substantially from NNLO. With this in mind, we conclude that the fit prefers a description different from the current NNLO standard.
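The convergence claim rests on the successive improvements in \(\chi ^{2} / N_{\textrm{pts}}\) shrinking from one order to the next. Computing the step-by-step gains from the values quoted above makes this explicit; this is simple arithmetic, not part of the fitting code, and the final step also includes the extra K-factor freedom noted above.

```python
# chi^2 per data point at each perturbative order, as quoted in the text.
chi2_per_point = {"LO": 2.57, "NLO": 1.33, "NNLO": 1.17, "aN3LO": 1.14}

orders = list(chi2_per_point)
# Improvement gained at each step up in perturbative order.
steps = {
    f"{lower}->{higher}": chi2_per_point[lower] - chi2_per_point[higher]
    for lower, higher in zip(orders, orders[1:])
}
for label, gain in steps.items():
    print(f"{label:12s} improves chi2/N_pts by {gain:.2f}")
```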

Table 6 Table displaying dimuon branching ratios (BRs) at NLO, NNLO, aN\(^{3}\)LO and aN\(^{3}\)LO with NNLO K-factors

At NNLO (Table 19), removing the HERA data reduced the \(\chi ^{2}\) of the remaining data by \(\Delta \chi ^{2} = -61.6\), reflecting the tension between HERA and non-HERA datasets, with the majority of this tension between HERA and ATLAS \(8~\text {TeV}\ Z\ p_{T}\) [110] data. Comparing fits with and without HERA data at N\(^{3}\)LO, we instead find \(\Delta \chi ^{2} = -49.0\). Although the overall difference is modest, the leading tensions shift substantially: most of the tension with HERA data now resides with NMC \(F_{2}\) [136] and CMS \(8~\text {TeV}\) jets [105] data. Tensions with NMC \(F_{2}\) [136] data are also seen to some extent at NNLO, where a fit omitting HERA data gives \(\Delta \chi ^{2} = - 20.6\) (combining the NMC \(F_{2}\) datasets shown in Table 19). At N\(^{3}\)LO, Table 20 shows a reduction of \(\Delta \chi ^{2} = - 23.4\) from NMC \(F_{2}\) data in a fit omitting HERA data. Therefore, whilst the N\(^{3}\)LO additions remove the tension with \(Z\ p_{T}\) data, the HERA data still prefer the high-x quarks to be lower than favoured by NMC data. This is suggestive of higher twist effects for NMC data at low-\(Q^{2}\) (as we observe a worse fit to low-\(Q^{2}\) data). We also emphasise that a fit at NNLO with \(Z\ p_{T}\) data removed improves the rest of the data by \(\Delta \chi ^{2} = -41.3\), whereas at N\(^{3}\)LO an improvement of \(\Delta \chi ^{2} = -65.2\) is observed in all other datasets without removing \(Z\ p_{T}\); these results are therefore not purely an effect of removing the \(Z\ p_{T}\) tension. Regarding tensions with CMS \(8~\text {TeV}\) jets [105] data, as discussed in Sect. 7, the jet datasets generally show tensions with the N\(^{3}\)LO description (especially CMS \(8~\text {TeV}\) jets [105]), so it will be interesting to see how this picture evolves when these data are considered in the form of dijets.

Since a naturally richer description of the small-x regime is included at N\(^{3}\)LO, which directly affects the HERA datasets, the reduction of important tensions relative to NNLO provides further justification for the inclusion of the N\(^{3}\)LO theory. The extra N\(^{3}\)LO additions allow the large-x behaviour of the PDFs to be less dominated by data at small-x, while also producing a better fit quality at small-x (i.e. for HERA data). Some of the above observations are also made in [16, 17], where studies including small-x resummation in a PDF fit have been reported.

Table 7 Table showing the relevant Dimuon datasets and how the individual \(\chi ^{2}\) changes from NNLO by including the N\(^{3}\)LO treatment of K-factors, and theoretical N\(^{3}\)LO additions discussed earlier. The result with purely NNLO K-factors included for all data in the fit is also given
Table 8 Full breakdown of \(\chi ^{2}\) results for the aN\(^{3}\)LO PDF fit. The global fit includes the N\(^{3}\)LO treatment for transition matrix elements, coefficient functions, splitting functions and K-factor additions with their variational parameters determined by the fit

Regarding the prior distributions chosen for each source of missing N\(^{3}\)LO uncertainty, Table 8 confirms that no especially large penalties are incurred in this new description. These results therefore indicate that the fit succeeds in leveraging contributions (such as \(P_{qq}^{(3)}\) and \(P_{qg}^{(3)}\) in the quark evolution part of Eq. (3.3)) to produce a better overall fit.

8.1.1 DIS processes

Table 9 Table showing the relevant DIS datasets and how the individual \(\chi ^{2}\) changes from NNLO by including the N\(^{3}\)LO contributions to the structure function \(F_{2}(x, Q^{2})\). The result with purely NNLO K-factors included for all data in the fit is also given

To complement the discussions in Sect. 7, we isolate the \(\chi ^{2}\) results from DIS data in Table 9. These data are directly affected by the approximate N\(^{3}\)LO structure functions constructed in Sects. 3–6. A substantial decrease in the total \(\chi ^{2}\) from NNLO is observed across the DIS datasets. Considering the results in Table 9 in the context of Tables 2, 3, 4, 5, 6 and 7, the inclusion of N\(^{3}\)LO contributions gives a better overall fit quality than at NNLO for both the DIS and the non-DIS data. As the DIS data make up over half of the total data included in a global fit, they are the dominant force in determining the overall form of the PDFs, especially at small-x (discussed further in Sect. 8.4). Table 9 further reinforces the point that the N\(^{3}\)LO description is flexible enough to fit both HERA and non-HERA data, without being strongly constrained by tensions between the small-x (HERA dominated) and large-x (non-HERA dominated) regions.

8.2 Correlation results

The correlation matrix shown in Fig. 28 illustrates the correlations between the extra N\(^{3}\)LO theory parameters and the subset of the MSHT20 parameters included in the construction of the Hessian eigenvectors (see Sect. 8.3 and [3] for details). The correlations between the K-factor parameters for each process (shown in green) and the other PDF and theory parameters are usually small, with some exceptions, e.g. for the Top\(_{\text {NLO}}\) parameter. There is therefore an argument that each process's K-factor parameters could be treated separately from all other parameters in the Hessian prescription (see Sect. 4.1), which allows for a more flexible PDF set that can be decorrelated from a process. By using the decorrelated Hessian results for a process, NNLO hard cross sections can be transformed to aN\(^{3}\)LO, thereby providing more reliable predictions (more details in Sect. 10). This is a fairly intuitive result, since most correlations show a natural separation between the process dependent and process independent physics in the DIS picture (Footnote 18). Mathematically, the K-factors are directly associated with the hard cross section, whereas the other N\(^{3}\)LO theory parameters (\(\rho _{ij}\) and \(a_{ij}\)) directly affect the PDFs. Figure 28 therefore motivates including the ‘pure’ theory (splitting function and transition matrix element) parameters within the standard MSHT eigenvector analysis [3], while decorrelating the K-factor parameters, as discussed in Sect. 2.3. We investigate and compare both treatments (complete correlation and K-factor decorrelation) throughout the rest of this section. We show in Sect. 8.4 that, while the decorrelation of K-factors is not complete, both treatments result in similar uncertainty bands, confirming that the effect of assuming full decorrelation is minimal in practice.
Note that although the \(c_{i}^{\textrm{NLL}}\) parameters also show minimal correlation with other parameters, we include these within the ‘pure’ theory group of parameters (i.e. correlated with \(\rho _{ij}\) and \(a_{ij}\)) as they are essential ingredients in the underlying DIS theory.
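A correlation matrix such as the one in Fig. 28 follows from normalising the covariance matrix, i.e. the inverse Hessian. The sketch below assumes the convention \(\chi ^{2} - \chi ^{2}_{\textrm{min}} = \Delta a^{T} H \Delta a\) and uses a made-up 5-parameter Hessian in which one process's K-factor pair couples only weakly to the other parameters; it illustrates why small cross-correlations make decorrelation a reasonable approximation, and is not the MSHT code.

```python
import numpy as np

def correlation_from_hessian(hessian):
    """Correlation matrix implied by a Hessian H, assuming
    chi^2 - chi^2_min = da^T H da so that the covariance is H^{-1}."""
    cov = np.linalg.inv(hessian)
    sigma = np.sqrt(np.diag(cov))
    return cov / np.outer(sigma, sigma)

# Hypothetical 5-parameter Hessian: three 'PDF/theory' parameters followed by
# one process's (NLO, NNLO) K-factor pair, weakly coupled to the rest.
hessian = np.array([
    [4.0, 1.0, 0.5, 0.05, 0.0],
    [1.0, 3.0, 0.2, 0.0, 0.05],
    [0.5, 0.2, 2.0, 0.0, 0.0],
    [0.05, 0.0, 0.0, 5.0, 2.0],
    [0.0, 0.05, 0.0, 2.0, 5.0],
])
rho = correlation_from_hessian(hessian)

# Largest correlation between the K-factor block and everything else.
cross = np.abs(rho[:3, 3:]).max()
print(f"max |rho| between K-factors and other parameters: {cross:.3f}")
print(f"rho(K_NLO, K_NNLO) = {rho[3, 4]:+.3f}")
```

The two K-factor parameters remain strongly correlated with each other, while their correlations with the remaining parameters stay at the percent level, which is the pattern that justifies treating each process's pair as a separate block.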

8.3 Eigenvector results

In the MSHT fitting procedure (described in [3]) the eigenvectors of a Hessian matrix are found, which encapsulate the sources of uncertainty and the corresponding correlations. Combining these with the central PDFs forms the complete PDF set with uncertainties. In this eigenvector analysis, a dynamical rescaling of each eigenvector \(e_{i}\) is performed via a tolerance factor t to encapsulate the \(68\%\) confidence limit (C.L.),

$$\begin{aligned} a_{i} = a_{i}^{0} \pm t e_{i}, \end{aligned}$$
(8.1)

where \(a_{i}^{0}\) is the best fit parameter. The factor t is then adjusted to give the desired tolerance T for the required confidence interval, defined as \(T = \sqrt{\Delta \chi _{\textrm{global}}^{2}}\) (for \(68\%\) C.L.). In a quadratic approximation, \(t = T\) holds for suitably well-behaved eigenvectors; however, for eigenvectors with larger eigenvalues, significant deviations from \(t=T\) can be observed. The standard MSHT fitting procedure allows all relevant parameters from [3] to vary when finding the best fit, now including all N\(^{3}\)LO theory parameters (\(\rho _{ij}, a_{ij}, c_{i}^{\textrm{NLL}}, K_{\mathrm {NLO/NNLO}}\)) discussed in this work. After accounting for high degrees of correlation between parameters (described in [42]), the result is a Hessian matrix which, in general, depends on a subset of the parameters allowed to vary in the best fit and provides a set of suitably well-behaved eigenvectors. The standard MSHT NNLO PDF eigenvectors are based on a set of 32 parameters, reduced from the 52 parameters allowed to vary in the full fit. In the following analysis we are therefore concerned with a reduced set of 52 parameters: the 32 parameters from the standard MSHT fitting procedure plus the 20 extra N\(^{3}\)LO parameters (shown in Fig. 28).
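In a purely quadratic approximation the relation \(t = T\) in Eq. (8.1) is exact once each eigenvector is normalised by the square root of its Hessian eigenvalue. The sketch below (numpy, with a made-up three-parameter Hessian, illustrative best-fit values and the convention \(\Delta \chi ^{2} = \Delta a^{T} H \Delta a\)) demonstrates this; deviations from \(t = T\) in the real fit signal departures from this quadratic picture.

```python
import numpy as np

def scaled_eigenvectors(hessian):
    """Columns e_k = v_k / sqrt(lambda_k), so that a step t*e_k raises
    chi^2 by exactly t^2 when chi^2 is quadratic in the parameters."""
    eigvals, eigvecs = np.linalg.eigh(hessian)
    return eigvecs / np.sqrt(eigvals)

def delta_chi2(hessian, a, a0):
    d = a - a0
    return d @ hessian @ d

# Hypothetical quadratic chi^2 surface in three parameters.
hessian = np.array([[4.0, 1.0, 0.5],
                    [1.0, 3.0, 0.2],
                    [0.5, 0.2, 2.0]])
a0 = np.array([0.3, -1.2, 2.0])        # illustrative best-fit parameters
e = scaled_eigenvectors(hessian)

for k in range(e.shape[1]):
    for t in (1.0, 3.0):
        a = a0 + t * e[:, k]           # Eq. (8.1) with the + sign
        T = np.sqrt(delta_chi2(hessian, a, a0))
        print(f"eigenvector {k + 1}, t = {t}: T = {T:.6f}")
```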

Fig. 28

Correlation matrix for all N\(^{3}\)LO theory parameters included in the fit against the subset of the MSHT20 parameters (shown in black) used in constructing the Hessian eigenvectors. This is shown for the case where the K-factor correlations with the first 42 parameters are included. N\(^{3}\)LO theory parameters associated with the splitting functions are coloured blue, the parameters affecting the transition matrix elements and coefficient functions are in red and the K-factor parameters are in green

A standard choice of tolerance is \(T = \sqrt{\Delta \chi ^{2}_{\textrm{global}}} = 1\) for a 68% C.L. limit. However, this assumes that all datasets are consistent with each other within Gaussian errors. In practice, due to incomplete theory, tensions between datasets and parameterisation inflexibility, this is known not to be the case in a global PDF fit. To overcome this, a 68% C.L. region is defined for each dataset. Then, for each eigenvector, the value of \(T = \sqrt{\Delta \chi ^{2}_{\textrm{global}}}\) is recorded for each chosen t (ideally following the quadratic expectation \(T = t\)). Finally, a value of T is chosen to ensure that all datasets are described within their 68% C.L. in each eigenvector direction. For a fuller mathematical description of the dynamical tolerance procedure used in MSHT PDF fits, the reader is referred to [42]. In this section we demonstrate how well the resulting eigenvectors follow the quadratic assumption \(t=T\), including the specific choices of dynamical tolerances and which dataset or penalty constrains the tolerance in each eigenvector direction.
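The logic of the dynamical tolerance can be sketched as a scan along one eigenvector direction that stops just before any dataset leaves its allowed region. This is heavily simplified and is not the procedure of [42]: there the per-dataset 68% C.L. regions are derived from rescaled \(\chi ^{2}\) distributions, whereas here they are replaced by hypothetical fixed thresholds, penalty terms are ignored, and the toy datasets are exactly quadratic, so \(T = t\) by construction. The function name and all inputs are our own.

```python
import numpy as np

def dynamic_tolerance(t_grid, delta_by_dataset, allowed_increase):
    """Scan one eigenvector direction and stop just before any dataset
    leaves its (hypothetical) 68% C.L. region.

    t_grid           : increasing step sizes t, starting at 0
    delta_by_dataset : array (n_datasets, n_t) of chi^2_n(t) - chi^2_n(0)
    allowed_increase : array (n_datasets,) of allowed chi^2 increases
    Returns (t, T, index of the limiting dataset)."""
    inside = np.all(delta_by_dataset <= allowed_increase[:, None], axis=0)
    if inside.all():
        i = len(t_grid) - 1
    else:
        i = max(int(np.argmin(inside)) - 1, 0)  # last t before first violation
    limiting = int(np.argmax(delta_by_dataset[:, i] / allowed_increase))
    T = np.sqrt(delta_by_dataset[:, i].sum())  # global Delta chi^2 = sum over datasets
    return t_grid[i], T, limiting

# Toy example: three datasets whose chi^2 rise quadratically along the direction.
t_grid = np.linspace(0.0, 6.0, 601)
curvature = np.array([0.3, 0.5, 0.2])            # sums to 1, so T = t exactly
delta = curvature[:, None] * t_grid[None, :] ** 2
allowed = np.array([4.0, 5.0, 10.0])             # hypothetical thresholds

t, T, limiting = dynamic_tolerance(t_grid, delta, allowed)
print(f"t = {t:.2f}, T = {T:.2f}, limited by dataset {limiting}")
```

In this toy the second dataset is the first to reach its threshold, so it plays the role of the "limiting dataset" reported per eigenvector direction in Tables 11 and 12.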

8.3.1 PDF + N\(^{3}\)LO DIS theory + N\(^{3}\)LO K-factor (decorrelated) parameters

As discussed in Sect. 8.2, when determining the eigenvectors and therefore the PDF uncertainties, we can choose either to include the correlations between the 10 added K-factor parameters and the other 42 parameters (encompassing the standard 32 MSHT eigenvector parameters and the 10 new theory parameters from the splitting functions, transition matrix elements and coefficient functions) or to decorrelate the 10 K-factor parameters.

In this section we address the scenario where we decorrelate the K-factors as

$$\begin{aligned} H_{ij}^{-1} + \sum _{p = 1}^{N_{p}}K_{ij,p}^{-1} \end{aligned}$$
(8.2)

and consider each term individually.
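One plausible reading of Eq. (8.2), adopted here purely as an assumption, is that the Hessian is approximated as block diagonal, with one block \(H_{ij}\) for the PDF and DIS theory parameters and one \(2\times 2\) block \(K_{ij,p}\) per process for its K-factors; the eigenvectors of each block are then found independently and embedded in the full parameter space. The sketch uses a made-up 7-parameter Hessian in place of the 52-parameter one, and the function name is our own.

```python
import numpy as np

def decorrelated_eigenvectors(hessian, blocks):
    """Scaled eigenvectors of each diagonal block of `hessian`, embedded in the
    full parameter space; cross-block entries are ignored (decorrelation)."""
    n = hessian.shape[0]
    columns = []
    for idx in blocks:
        sub = hessian[np.ix_(idx, idx)]
        vals, vecs = np.linalg.eigh(sub)
        for k in range(len(idx)):
            e = np.zeros(n)
            e[idx] = vecs[:, k] / np.sqrt(vals[k])
            columns.append(e)
    return np.column_stack(columns)

# Hypothetical 7-parameter Hessian: parameters 0-2 play the role of the
# PDF + DIS theory block H, and (3, 4), (5, 6) are two processes' K-factor pairs.
hessian = np.array([
    [4.0, 1.0, 0.5, 0.05, 0.0, 0.0, 0.05],
    [1.0, 3.0, 0.2, 0.0, 0.05, 0.0, 0.0],
    [0.5, 0.2, 2.0, 0.0, 0.0, 0.05, 0.0],
    [0.05, 0.0, 0.0, 5.0, 2.0, 0.0, 0.0],
    [0.0, 0.05, 0.0, 2.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 0.05, 0.0, 0.0, 6.0, -1.0],
    [0.05, 0.0, 0.0, 0.0, 0.0, -1.0, 6.0],
])
blocks = [[0, 1, 2], [3, 4], [5, 6]]
E = decorrelated_eigenvectors(hessian, blocks)

# Block-diagonal Hessian implied by the decorrelated treatment.
h_bd = np.zeros_like(hessian)
for idx in blocks:
    h_bd[np.ix_(idx, idx)] = hessian[np.ix_(idx, idx)]
print(E.shape)  # 7 eigenvectors in total, analogous to 42 + 10 = 52 in the text
```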

Fig. 29

Correlation matrix of the first 42 (total 52) eigenvectors found with the N\(^{3}\)LO parameters added into the analysis in the case where the K-factors are decorrelated from these first 42 parameters. Parameters associated with the splitting functions are coloured blue, those affecting the transition matrix elements and coefficient functions are in red

Fig. 30

Map of the 10 K-factor eigenvectors found with the N\(^{3}\)LO parameters added into the analysis in the case where the K-factors are decorrelated from the first 42 parameters. Combined with the 42 eigenvectors shown in Fig. 29, these form the total of 52 eigenvectors in the decorrelated case. K-factor parameters are shown in green

Figure 29 shows the map of eigenvectors produced from \(H_{ij}\) in Eq. (8.2), where we have included the new N\(^{3}\)LO DIS theory parameters (splitting functions in blue and coefficient functions/transition matrix elements in red) correlated with the PDF parameters. Eigenvectors 35 and 36 are prime examples of eigenvectors that specifically encompass the correlation/anti-correlation between the two NLL FFNS coefficient function parameters \(c_{i}^{\textrm{NLL}}\) (\(i \in \{q,g\}\)). By contrast, the splitting functions naturally give rise to a much more complicated mixing with other PDF parameters, as they directly affect the evolution of the PDFs. Given the direct impact of the \(\rho _{ij}\) on the PDFs (via DGLAP evolution), combined with the large contributions to the evolution at N\(^{3}\)LO, this result is as expected.

Another somewhat pleasing aspect is the recovery of a natural separation between eigenvectors associated with the N\(^{3}\)LO coefficient function/transition matrix elements and our original PDF parameters (incl. N\(^{3}\)LO splitting functions). This separation is reminiscent of our DIS picture, whereby the splitting functions are much more intertwined with the raw PDFs and the transition matrix elements have a symbiotic relationship with the coefficient functions (see GM-VFNS description in Sect. 3). Due to this, the form of these eigenvectors has not only some level of physical interpretation inherited from our underlying theory, but also offers a useful way to access the different sources of N\(^{3}\)LO additions within the PDF set.

In Fig. 30 the eigenvectors resulting from the \(\sum _{p = 1}^{N_{p}}K_{ij,p}^{-1}\) terms in Eq. (8.2) are shown. These eigenvectors are constructed in pairs, describing the correlation and anti-correlation of the two K-factor parameters (controlling the NLO and NNLO contributions to N\(^{3}\)LO) for each process p contained within the corresponding \(K_{ij,p}\) correlation matrix.
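Because each process contributes exactly two K-factor parameters, each \(K_{ij,p}\) block is \(2\times 2\), and unless it is already diagonal it yields one eigenvector moving the two parameters in the same direction and one moving them in opposite directions. The sketch below uses a made-up block with equal diagonal entries, for which the eigenvector directions are exactly \((1, \pm 1)/\sqrt{2}\) before rescaling; the function name and the labels are our own.

```python
import numpy as np

def k_factor_pair_eigenvectors(k_block):
    """Eigen-decompose one process's 2x2 K-factor Hessian block and label each
    eigenvector by whether it moves K_NLO and K_NNLO together or oppositely."""
    vals, vecs = np.linalg.eigh(k_block)
    labels = ["correlated" if vecs[0, k] * vecs[1, k] > 0 else "anti-correlated"
              for k in range(2)]
    return vals, vecs / np.sqrt(vals), labels

# Hypothetical block for one process: equal curvature in both parameters and a
# positive off-diagonal term.
vals, e, labels = k_factor_pair_eigenvectors(np.array([[5.0, 2.0],
                                                       [2.0, 5.0]]))
for val, label, col in zip(vals, labels, e.T):
    print(f"eigenvalue {val:.1f}: {label}, direction {col.round(3)}")
```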

Table 10 shows further information regarding the K-factor parameter limits from each eigenvector. In most cases the parameter limits are well within the allowed variation (\(-1< a < 1\)), which is an indication that the data included in the fit is constraining these parameters rather than the individual penalties for each parameter.Footnote 19

To assess whether the eigenvectors violate the quadratic treatment, four examples of the tolerance behaviour are shown in Fig. 31, with a full analysis provided in Appendix 1. Additionally, Table 11 summarises all tolerances found within the eigenvector scans.

Table 10 Limiting values for specific K-factor parameters for each of the processes considered in the decorrelated case. Parameter values are shown in the positive and negative limits for each eigenvector. The scale choices for top quark processes are described in Sect. 7.2 to be \(H_T/4\) for the single differential datasets with the exception of data differential in the average transverse momentum of the top or antitop, \(p_T^t,p_T^{{\bar{t}}}\), for which \(m_T/2\) is used. For the double diff. dataset the scale choice is \(H_T/4\) and for the inclusive top \(\sigma _{t{\bar{t}}}\) a scale of \(m_{t}\) is chosen
Fig. 31

Dynamic tolerance behaviour for 4 selected eigenvectors in the case of decorrelated K-factor parameters. The black dots show the fixed tolerance relations found for integer values of t, whereas the red triangles show the final chosen dynamical tolerances for each eigenvector direction. For an exhaustive analysis of all eigenvectors see Fig. 50

There is relatively consistent agreement between t and T across all eigenvectors with later eigenvectors (i.e. higher #) generally becoming less quadratic (a feature which is built into the fit). Eigenvectors 31, 41 and 42 displayed in Fig. 31 are shown in Table 11 to be either dominated or limited by at least one new N\(^{3}\)LO parameter. Conversely, eigenvector 26 is much more dominated by the original PDF parameters from MSHT20 NNLO. Comparing these cases, the eigenvectors associated more strongly with the N\(^{3}\)LO parameters exhibit a similar level of agreement (and occasionally better) with the desired quadratic behaviour as eigenvectors more closely associated with the original PDF parameters.

The last 5 pairs of eigenvectors in Table 11 (i.e. the last 10 eigenvectors, with one pair per process) are the decorrelated K-factor eigenvectors, consisting of a correlated and an anti-correlated eigenvector for each process. For all K-factor cases, Table 11 provides sensible results, with either the dominant datasets or the parameter penalties constraining each eigenvector direction. One interesting feature is a sign of tension between the ATLAS \(8~\text {TeV}\ Z\ p_{T}\) [110] and ATLAS \(8~\text {TeV}\ W + \text {jets}\) [111] datasets: the limiting factors in Table 11 for eigenvectors 49 and 50 show that these datasets prefer slightly different K-factors.

To provide some extra level of comparison between the eigenvectors shown here and the eigenvectors found in the NNLO case, the average tolerance T for aN\(^{3}\)LO (decorrelated K-factors) set is 3.34, compared to the NNLO average T of 3.37.

Table 11 Tolerances resulting from eigenvector scans with decorrelated K-factors for each process. The average tolerance for this set of eigenvectors is \(T=3.34\)

8.3.2 PDF + N\(^{3}\)LO DIS theory + N\(^{3}\)LO K-factor (correlated) parameters

Fig. 32

Map of eigenvectors found with the N\(^{3}\)LO theory and K-factor parameters added into the analysis. Parameters associated with the splitting functions are coloured blue, those affecting the transition matrix elements and coefficient functions are in red and the K-factor parameters are in green

Fig. 33

Dynamic tolerance behaviour for 4 selected eigenvectors in the case of correlated K-factor parameters. The black dots show the fixed tolerance relations found for integer values of t, whereas the red triangles show the final chosen dynamical tolerances for each eigenvector direction. For an exhaustive analysis of all eigenvectors see Fig. 51

In this section we address the scenario,

$$\begin{aligned} H_{ij}^{\prime } = \left( H_{ij}^{-1} + \sum _{p = 1}^{N_{p}}K_{ij,p}^{-1} \right) ^{-1}. \end{aligned}$$
(8.3)

Here the aN\(^{3}\)LO K-factors are treated as parameters correlated with the PDF and other N\(^{3}\)LO theory parameters. This provides a comparison with the case of decorrelated K-factors and a test of the justification for treating the cross section behaviour separately from the PDF theory behaviour.

Figure 32 shows a map of eigenvectors with the extra 10 N\(^{3}\)LO K-factor parameters (shown in green) included in the correlations considered. As expected, including the correlations between the PDF parameters and the aN\(^{3}\)LO K-factors produces a slightly more intertwined set of eigenvectors (although a high level of decorrelation remains). Specifically, due to the much larger number of DY datasets included in the global fit, the DY N\(^{3}\)LO K-factor parameters tend to be spread across more of the eigenvectors. On the other hand, the Dimuon K-factors are almost entirely isolated within two eigenvectors, similar to the decorrelated case.

Once again, to investigate deviations from the quadratic behaviour, Fig. 33 illustrates examples of the tolerance behaviours of selected eigenvectors, with a full analysis provided in Appendix 1. Further to this, Table 12 displays the tolerances and limiting datasets/parameters for the 52 correlated eigenvectors. It is difficult to compare and contrast these results with the decorrelated case, since the eigenvectors are inherently different. However in both cases, the eigenvectors are similarly well behaved, exhibit relatively good consistency between t and T and are therefore providing valid descriptions for a PDF fit.

For most of the 12 eigenvectors with N\(^{3}\)LO K-factors as primary parameters, the behaviour is as expected, with the eigenvectors constrained either by their own penalties or by the dominant datasets for the associated process. However, due to the extra correlations considered, a small number of eigenvector directions are less straightforward to explain (e.g. eigenvector 31). We therefore recover, in the set of correlated PDF eigenvectors presented here, the lack of correlation between K-factor parameters seen in Fig. 28. Further, comparing the t and T values found for eigenvectors associated with N\(^{3}\)LO K-factors in Tables 11 and 12, one can observe clear similarities. This suggests that even when the K-factor parameters are correlated, the fit succeeds in decorrelating the individual processes, supporting our original assumption that the correlations with K-factors can be ignored. Another similarity between Tables 11 and 12 is the suggestion of some tension between the ATLAS \(8~\text {TeV}\ Z\ p_{T}\) [110] and ATLAS \(8~\text {TeV}\ W + \text {jets}\) [111] datasets, seen in the limiting factors of eigenvector 39 in the correlated case.

Eigenvectors 27, 29 and 52, displayed in Fig. 33, can be seen from Table 12 to be associated with the new N\(^{3}\)LO theory parameters, whereas eigenvector 37 is primarily associated with an original PDF parameter. A similar level of quadratic behaviour can be observed across all four of these eigenvector tolerances. Comparing all eigenvectors in the decorrelated and correlated cases, both sets are similarly well behaved. The average tolerance T for the aN\(^{3}\)LO (correlated K-factors) case is 3.57, slightly higher than the NNLO average of 3.37 and the aN\(^{3}\)LO (decorrelated K-factors) average of 3.34.

8.4 PDF results

Figure 34 displays the overall shape of the PDFs including the N\(^{3}\)LO additions compared to the standard NNLO set. We provide this comparison to accompany the results described in earlier sections. At small-x and low-\(Q^{2}\) the gluon exhibits a marked enhancement due to the large small-x logarithms inserted at N\(^{3}\)LO. The changes induced from specific N\(^{3}\)LO contributions are investigated in Sect. 8.8.

Figures 35 and 36 show the ratios of each flavour of aN\(^{3}\)LO PDF to the NNLO set, with their 68% confidence intervals, at low and high \(Q^{2}\) respectively. The shaded aN\(^{3}\)LO regions indicate the PDF uncertainty obtained with the decorrelated aN\(^{3}\)LO K-factors for each process (\(H^{-1}_{ij} + \sum _{p=1}^{N_{p}}K^{-1}_{ij,\ p}\), Eq. (8.2)). For comparison with these shaded regions, the uncertainty bounds for the fully correlated (\(H_{ij}^{\prime }\), Eq. (8.3)) N\(^{3}\)LO K-factor parameters are also provided (red dashed line).
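Bands such as those in Figs. 35 and 36 are typically built from the eigenvector members of a Hessian set using the asymmetric prescription common to MSTW/MSHT-style sets. We assume that prescription here without claiming it reproduces the plotting code behind these figures; the arrays are hypothetical stand-ins for a PDF evaluated on the central member and on the \(S_{k}^{\pm }\) members, and the reference values play the role of the NNLO central PDF.

```python
import numpy as np

def asymmetric_hessian_errors(central, plus, minus):
    """Asymmetric Hessian uncertainties on a quantity F (e.g. a PDF at fixed x).

    central : array (n_x,) with F evaluated on the central member
    plus    : array (n_eig, n_x) with F on the S_k^+ members
    minus   : array (n_eig, n_x) with F on the S_k^- members
    """
    up = np.maximum(np.maximum(plus - central, minus - central), 0.0)
    down = np.maximum(np.maximum(central - plus, central - minus), 0.0)
    return np.sqrt((up ** 2).sum(axis=0)), np.sqrt((down ** 2).sum(axis=0))

# Toy numbers for two x values and two eigenvector pairs (all hypothetical).
central = np.array([1.0, 2.0])
plus = np.array([[1.1, 2.2], [1.0, 1.9]])
minus = np.array([[0.95, 1.8], [1.05, 2.0]])
err_up, err_down = asymmetric_hessian_errors(central, plus, minus)

# Band edges expressed as a ratio to a reference (e.g. NNLO) central value.
reference = np.array([0.9, 2.1])
upper_ratio = (central + err_up) / reference
lower_ratio = (central - err_down) / reference
```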

Figure 35 presents the aN\(^{3}\)LO PDF set at \(Q^{2} = 10\ {\textrm{GeV}}^{2}\), with the bottom quark PDF at \(Q^{2} = 25\ {\textrm{GeV}}^{2}\). These PDF ratios better display the substantial increase in the gluon at small-x, reminiscent of the gluon PDF presented in [16, 17]. The predicted harder small-x gluon is then accommodated by reductions in the PDFs at large and small-x (particularly the gluon near \(x=10^{-2}\)) relative to NNLO. Another prominent feature is the enhanced charm and bottom quark at N\(^{3}\)LO. Since the heavy flavour quarks are calculated perturbatively in the MSHT framework, this enhancement arises from the transition matrix element \(A_{Hg}^{(3)}\) at high-x combined with the increase in the gluon PDF at small-x (as these two ingredients are convolved together). Comparing with Fig. 95 in [3], we observe that the approximate N\(^{3}\)LO charm quark now follows a much closer trend to the CT18 PDF, and is therefore even more different from the NNPDF NNLO fitted charm at large-x than MSHT20 at NNLO. In the high-\(Q^{2}\) setting shown in Fig. 36 we observe similar, albeit less drastic, effects.
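The statement that \(A_{Hg}^{(3)}\) and the gluon are convolved together refers to the standard Mellin convolution \((f \otimes g)(x) = \int _x^1 \frac{dz}{z} f(z)\, g(x/z)\), so a change in either ingredient feeds directly into the heavy-quark PDF. A minimal numerical sketch, assuming smooth inputs and using the substitution \(z = e^{u}\) with the trapezoidal rule, is below; the test functions are toys with known closed forms, not the actual \(A_{Hg}^{(3)}\) or gluon.

```python
import numpy as np

def mellin_convolution(f, g, x, n=2001):
    """(f ⊗ g)(x) = ∫_x^1 dz/z f(z) g(x/z), evaluated with z = exp(u),
    so dz/z = du, and the trapezoidal rule on a uniform grid in u."""
    u = np.linspace(np.log(x), 0.0, n)
    z = np.exp(u)
    integrand = f(z) * g(x / z)
    h = u[1] - u[0]
    return h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def one(y):
    return np.ones_like(y)

def z_squared(z):
    return z ** 2

# Checks against closed forms: 1 ⊗ 1 = -ln x and z^2 ⊗ 1 = (1 - x^2) / 2.
x = 0.1
print(mellin_convolution(one, one, x), -np.log(x))
print(mellin_convolution(z_squared, one, x), (1 - x ** 2) / 2)
```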

Table 12 Tolerances resulting from eigenvector scans with correlated K-factors for each process. The average tolerance for this set of eigenvectors is \(T=3.57\)

Figures 35 and 36 also contain the central values of NNLO PDFs fit to all non-HERA data (full \(\chi ^{2}\) results are provided in Appendix B). Comparing the non-HERA NNLO PDFs with the aN\(^{3}\)LO PDFs, there are some similarities in the shapes and magnitudes of a handful of PDFs in the intermediate to large-x regime, most noticeably the light quarks. At small-x the HERA data heavily constrain the PDF fit and these similarities rapidly break down. However, this analysis provides further evidence that including N\(^{3}\)LO contributions, even approximately, reduces tensions between the HERA and non-HERA data (considering the reduction in tension seen in Table 19). The aN\(^{3}\)LO PDFs appear able to fit both HERA and non-HERA datasets with greater flexibility than at NNLO.

While negative quark distributions are in principle possible in the \({\overline{MS}}\) scheme, they are unlikely to be correct at very high scales and can lead to negative cross section predictions [148, 149]. The central value of the \({\overline{d}}\) PDF is negative above \(x \sim 0.5\), with a minimum of \(\sim -0.001\) at \(x \sim 0.6\). Although the \({\overline{d}}\) central value becomes negative in this region, it is still positive within PDF uncertainties. These features are not uncommon in PDF analyses and are discussed in detail in [8]. The smoothing of parameterisations proposed and employed in [8] ensures positive-definite PDFs in the high-x region. Compared with the \({\overline{d}}\) PDF in [8], the approximate N\(^{3}\)LO \({\overline{d}}\) PDF presented here is much less negative and remains positive within PDF uncertainties. For this reason, and because this effect is only apparent in the \({\overline{d}}\), we present these PDFs as they are. We also note that the recent \({\overline{d}}/{\overline{u}}\) results from the SeaQuest collaboration [150] are not included in the current MSHT20 fit at the time of writing. Within the fit, this ratio is therefore constrained only by the E866 / NuSea pd/pp DY dataset [81], which is less precise than the more recent results. However, the SeaQuest results suggest a preference for a higher \({\overline{d}}\) at large-x, so including these data may in fact help constrain the high-x \({\overline{d}}\) behaviour seen here.

Fig. 34

General forms of NNLO (top) and aN\(^{3}\)LO (bottom) PDFs at low (left) and high (right) \(Q^{2}\). Several main features can be compared and contrasted such as the marked increase in the gluon and charm at small-x (note the difference in y-axis scale between NNLO (top) and aN\(^{3}\)LO (bottom))

Figures 37 and 38 show the aN\(^{3}\)LO PDFs with decorrelated (green shaded region) and correlated (red dashed lines) aN\(^{3}\)LO K-factors at low and high \(Q^{2}\) respectively (again with the bottom quark at \(Q^{2} = 25\ {\textrm{GeV}}^{2}\) in the low-\(Q^{2}\) case), as a ratio to the N\(^{3}\)LO central value. For comparison we also include the uncertainty obtained with all N\(^{3}\)LO theory fixed (blue shaded region), i.e. only the variation without N\(^{3}\)LO theoretical uncertainty.

Comparing the two aN\(^{3}\)LO sets in Figs. 37 and 38, there is generally good agreement between the total uncertainties with correlated (red dashed) and decorrelated (green shaded) aN\(^{3}\)LO K-factors. The differences between the two aN\(^{3}\)LO cases are relatively small across all PDFs, with slightly larger effects only where the PDF itself tends towards zero, i.e. the valence quarks at small-x.

Fig. 35

Low-\(Q^{2}\) ratio plots showing the aN\(^{3}\)LO 68% confidence intervals with decorrelated (\(H_{ij}^{-1} + \sum _{p}K_{ij,p}^{-1}\), Eq. (8.2)) and correlated (\(H_{ij}^{\prime }\), Eq. (8.3)) K-factor parameters, compared to NNLO 68% confidence intervals. Also shown are the central values at NNLO when fit to all non-HERA datasets, which show similarities with N\(^{3}\)LO in the large-x region of selected PDF flavours. All plots are shown for \(Q^{2} = 10~{\textrm{GeV}}^{2}\) with the exception of the bottom quark shown for \(Q^{2} = 25~{\textrm{GeV}}^{2}\)

Fig. 36

High-\(Q^{2}\) ratio plots showing the aN\(^{3}\)LO 68% confidence intervals with decorrelated (\(H_{ij}^{-1} + \sum _{p}K_{ij,p}^{-1}\), Eq. (8.2)) and correlated (\(H_{ij}^{\prime }\), Eq. (8.3)) K-factor parameters, compared to NNLO 68% confidence intervals. Also shown are the central values at NNLO when fit to all non-HERA datasets, which show similarities with N\(^{3}\)LO in the large-x region of selected PDF flavours. All plots are shown for \(Q^{2} = 10^{4}~{\textrm{GeV}}^{2}\)

Fig. 37

Low-\(Q^{2}\) ratio plots showing the aN\(^{3}\)LO 68% confidence intervals with decorrelated and correlated K-factor parameters, compared to the aN\(^{3}\)LO central value. Also shown are the central values at aN\(^{3}\)LO when fit to all non-HERA datasets and the central values with all K-factors set at NNLO. All plots are shown for \(Q^{2} = 10~{\textrm{GeV}}^{2}\) with the exception of the bottom quark shown for \(Q^{2} = 25~{\textrm{GeV}}^{2}\)

Fig. 38

High-\(Q^{2}\) ratio plots showing the aN\(^{3}\)LO 68% confidence intervals with decorrelated and correlated K-factor parameters, compared to the aN\(^{3}\)LO central value. Also shown are the central values at aN\(^{3}\)LO when fit to all non-HERA datasets and the central values with all K-factors set at NNLO. All plots are shown for \(Q^{2} = 10^{4}~{\textrm{GeV}}^{2}\)

A larger distinction is observed when comparing the sets with and without theoretical uncertainty (where the N\(^{3}\)LO theory is fixed at its best fit value). As expected, the PDF uncertainty on the gluon (and therefore on the heavy quarks) increases substantially when the missing N\(^{3}\)LO uncertainty is taken into account. In particular, the form of the N\(^{3}\)LO bottom quark uncertainty is reminiscent of the \((H+{\overline{H}})\) prediction from Fig. 12, so one can directly observe the effect of the \(A_{Hg}\) theoretical uncertainty on the bottom quark just above its mass threshold. In other areas, the PDF set without theoretical uncertainty exhibits an uncertainty comparable to aN\(^{3}\)LO, and in certain regions of \((x,Q^{2})\) its 68% confidence intervals are even larger when the N\(^{3}\)LO parameters are fixed (e.g. the \(u_{v}\) and \(d_{v}\) PDFs in Figs. 37 and 38). As the fit now resides in a different \(\chi ^{2}\) landscape, where a best fit has been achieved by fitting the N\(^{3}\)LO theory, fixing the aN\(^{3}\)LO theory parameters is likely to have a substantial effect across all PDFs.
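In an idealised quadratic fit with a fixed tolerance, fixing nuisance parameters can only shrink parameter uncertainties: the marginal covariance \((H^{-1})_{pp}\) of the PDF parameters always exceeds the conditional one \((H_{pp})^{-1}\). The sketch below (a made-up \(3\times 3\) Hessian, with the convention \(\Delta \chi ^{2} = \Delta a^{T} H \Delta a\)) illustrates this. The larger valence bands seen when the N\(^{3}\)LO theory is fixed must therefore come from effects outside this idealisation, such as the fit residing in a different \(\chi ^{2}\) landscape and the eigenvector-dependent dynamical tolerances.

```python
import numpy as np

# Hypothetical Hessian for two PDF parameters (indices 0, 1) and one N3LO
# theory nuisance parameter (index 2), with chi^2 - chi^2_min = da^T H da.
hessian = np.array([[4.0, 1.0, 1.5],
                    [1.0, 3.0, 0.8],
                    [1.5, 0.8, 3.0]])
pdf = [0, 1]

# Theory parameter free: marginal covariance of the PDF parameters.
marginal = np.linalg.inv(hessian)[np.ix_(pdf, pdf)]
# Theory parameter fixed at its best fit: invert only the PDF block.
conditional = np.linalg.inv(hessian[np.ix_(pdf, pdf)])

print("sigma (theory free): ", np.sqrt(np.diag(marginal)))
print("sigma (theory fixed):", np.sqrt(np.diag(conditional)))
```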

An important point made by Figs. 37 and 38 is that the difference between the decorrelated and correlated cases is much smaller than the effect of omitting theoretical uncertainties altogether (blue shaded region). This analysis therefore supports the original assumption that the cross section theory (aN\(^{3}\)LO K-factors) can be decorrelated from the PDF theory (including the other N\(^{3}\)LO theory).

Along with the separate cases of uncertainty illustrated in Figs. 37 and 38, we also display the central values of an aN\(^{3}\)LO fit to all non-HERA data and of an aN\(^{3}\)LO fit with NNLO K-factors. For \(x > 10^{-2}\), the no-HERA aN\(^{3}\)LO PDFs show some agreement with the standard aN\(^{3}\)LO central value across most PDFs (more so at high-\(Q^{2}\) than at low-\(Q^{2}\)), whereas their form at small-x gives some insight into the importance of HERA data in constraining PDFs in this region. The aN\(^{3}\)LO PDFs with NNLO K-factors are in slightly better agreement across all x. This complements the \(\chi ^{2}\) results in Sects. 7 and 8.1, which indicate that the form (and fit results) of the aN\(^{3}\)LO PDFs is mostly determined by the additional N\(^{3}\)LO PDF and DIS coefficient function contributions, rather than by the aN\(^{3}\)LO K-factors, which prefer a softer high-x gluon (similar to the N\(^{3}\)LO no-HERA case, also shown in Figs. 37 and 38).

8.5 MSHT20aN\(^{3}\)LO PDFs at \(Q^{2} = 2~{\textrm{GeV}}^{2}\)

Fig. 39

General forms of NNLO (left) and aN\(^{3}\)LO (right) PDFs at \(Q^{2} = 2~{\textrm{GeV}}^{2}\). Axes are set to the same scale to highlight the main differences between NNLO and aN\(^{3}\)LO, specifically in the gluon and heavy flavour sectors

Figure 39 compares the MSHT NNLO and aN\(^{3}\)LO PDF sets at \(Q^{2} = 2\ {\textrm{GeV}}^{2}\). In this very low-\(Q^{2}\) regime, some major differences between the two sets are evident, especially towards small-x. For example, the aN\(^{3}\)LO gluon PDF is predicted to be much harder across this region, such that it is now positive for all x values considered here. The effect of this can be seen immediately in the sea and heavy quarks.

Since the charm quark is directly coupled to the gluon PDF (through a convolution with \(A_{Hg}\)), the charm PDF receives a notable enhancement at small-x and also remains positive across all x values considered.Footnote 20 Another interesting feature is the reduction in uncertainty of the strange quark at small-x. It may seem counter-intuitive for the uncertainty to decrease when sources of theoretical uncertainty are added; however, we should recall that the underlying theory has also been altered. Although one can expect an increase in PDF uncertainties across \((x,Q^{2})\), there are exceptions, e.g. where tensions are relieved by introducing the N\(^{3}\)LO theory. The resulting shift in the \(\chi ^{2}\) landscape can then make some regions of \((x, Q^{2})\) more precise (in this case manifesting as an uncertainty reduction for the strange quark towards small-x).
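For readers less familiar with the notation, the coupling of the charm PDF to the gluon through \(A_{Hg}\) is a Mellin convolution, \((A_{Hg} \otimes g)(x) = \int _x^1 \frac{dz}{z} A_{Hg}(z)\, g(x/z)\). The Python sketch below evaluates a convolution of this form numerically with the midpoint rule in \(\ln z\). It is illustrative only: the function name `mellin_convolution` is our own, and the kernel and PDF in the check are hypothetical toy functions, not the aN\(^{3}\)LO \(A_{Hg}\) or the MSHT20 gluon.

```python
import math
from typing import Callable


def mellin_convolution(kernel: Callable[[float], float],
                       pdf: Callable[[float], float],
                       x: float, n: int = 2000) -> float:
    """Approximate (kernel ⊗ pdf)(x) = ∫_x^1 dz/z kernel(z) pdf(x/z).

    Substituting t = ln z turns dz/z into dt, so the integral runs over
    t in [ln x, 0]; it is evaluated with the composite midpoint rule.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie in (0, 1)")
    t_lo = math.log(x)
    h = -t_lo / n  # step size in t = ln z
    total = 0.0
    for i in range(n):
        z = math.exp(t_lo + (i + 0.5) * h)  # midpoint of the i-th interval
        total += kernel(z) * pdf(x / z)
    return total * h


# Toy check with hypothetical functions: kernel = pdf = 1 gives exactly -ln x.
x = 1e-3
approx = mellin_convolution(lambda z: 1.0, lambda y: 1.0, x)
print(approx, -math.log(x))
```

The constant-kernel check grows like \(\ln (1/x)\), illustrating how the convolution accumulates contributions from all momentum fractions between x and 1.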

Fig. 40

Very low-\(Q^{2}\) ratio plots showing the aN\(^{3}\)LO 68% confidence intervals with decorrelated and correlated K-factor parameters, compared to NNLO 68% confidence intervals. All plots are shown for \(Q^{2} = 2~{\textrm{GeV}}^{2}\)

Figure 40 displays the ratios of the aN\(^{3}\)LO MSHT PDFs to their NNLO counterparts at \(Q^{2} = 2\ {\textrm{GeV}}^{2}\), where the specific shifts of each PDF are displayed more clearly. Many of the features shown here are similar to those discussed for Figs. 35 and 36. Even in this very low-\(Q^{2}\) regime, the difference in uncertainty between the correlated and decorrelated aN\(^{3}\)LO K-factor PDF sets is minimal in all relevant regions of x.

8.6 Effect of an \(x < 10^{-3}\) cut at aN\(^{3}\)LO

In this section we include results from a global PDF fit with small-x (\(x < 10^{-3}\)) data omitted. This analysis is provided to shed some light on the tensions between different regions of x at aN\(^{3}\)LO, while also providing some context regarding the form of the PDFs in different regions of x.

Fig. 41

Low-\(Q^{2}\) PDF ratios showing aN\(^{3}\)LO PDFs fitted with and without small-x (\(<10^{-3}\)) data included in a global fit. All plots are shown for \(Q^{2} = 10~{\textrm{GeV}}^{2}\) with the exception of the bottom quark shown for \(Q^{2} = 25~{\textrm{GeV}}^{2}\)

Fig. 41 shows immediately that omitting all small-x data results in less constrained PDFs for \(x < 10^{-3}\) (most notably the gluon). However, Fig. 41 also shows that, overall, the large-x behaviour of these PDFs is very similar in both fits, indicating that the full fit is able to describe the large- and small-x regions simultaneously. We provide this analysis as a cross-check to further support the reliability of our procedure, showing that the small-x behaviour is not overwhelmingly attempting to fit any all-order result (such as small-x resummation) at the expense of the large-x description. We also note that while there is a definite change in the central values of the PDFs at small x, in most cases this is well within uncertainties and at most at the level of about one standard deviation, particularly for the gluon for x just below \(10^{-3}\). This distinct but limited shift in the best fit PDFs suggests that, while effects beyond N\(^3\)LO are certainly not insignificant at very low x, they do not dominate the pull on the fit.

8.7 Posterior N\(^{3}\)LO theory parameters

Following from the previous section, it is also interesting to examine where the aN\(^{3}\)LO theory contributions and their uncertainties reside after a global PDF fit.

Table 13 Posterior predicted \(\pm 1\sigma \) limits on aN\(^{3}\)LO theoretical nuisance parameters for splitting functions, transition matrix elements and coefficient functions

Displayed in Table 13 are the predicted posterior limits on each (non-K-factor) aN\(^{3}\)LO theory parameter, which can be compared directly with the prior variations decided in earlier sections. Also provided is a comparison of these posterior limits between fits with and without small-x (\(x < 10^{-3}\)) data. This comparison complements the previous section by showing a similar trend in the central values predicted in both cases (i.e. with overlapping uncertainties). As with the PDFs, there is some significant evidence of tensions, but these are not severe, and the central values of many parameters are extremely stable. Furthermore, this is evidence that the small-x behaviour influences, but is unlikely to dominate, the aN\(^{3}\)LO parameters in a manner significantly adversarial to the preference of data at \(x > 10^{-3}\).

Fig. 42

Posterior variations of the aN\(^{3}\)LO splitting functions and transition matrix elements predicted from a full global fit (blue shaded band) compared to the prior variations in each case (green shaded band)

Figure 42 compares the prior and posterior variations predicted for the perturbative expansions of the relevant splitting functions and transition matrix elements discussed in Sects. 4 and 5. We exclude the non-singlet quantities from this comparison, as their posterior variations are very similar to their priors (as can be seen in Table 13) and have a small overall effect on the PDFs. Once a fit is performed, the variation of the aN\(^{3}\)LO theoretical nuisance parameters becomes less sensitive to the prior variation, suggesting that the initial uncertainty estimate was conservative. Nevertheless, Fig. 42 shows that all posterior variations overlap with their corresponding priors, in most cases considerably. We also note that the largest differences between prior and posterior variations relate, as expected, to the gluon PDF.

Fig. 43

Posterior variations of the aN\(^{3}\)LO splitting functions and transition matrix elements predicted from a full global fit (blue shaded band) compared to a fit with small-x (\(x < 10^{-3}\)) data removed (red shaded band)

Figure 43 compares the posterior variations of the aN\(^{3}\)LO functions with and without small-x (\(x < 10^{-3}\)) data included in a global fit. These results accompany those presented in Table 13 and further display the reasonable agreement between the two fits, although there is some degree of tension, mainly in the cases of \(P_{gg}\), \(A_{Hg}\) and \(A_{gg,H}\). In all cases the predicted variations overlap, with most central values being stable (i.e. contained well within the uncertainty predictions). However, for \(P_{gg}\), \(A_{Hg}\) and \(A_{gg,H}\) the fit with the small-x cut results in posterior functions which are more consistent with the prior functions, again suggesting that for these functions the posterior values are influenced to a significant, but not overwhelming, extent by terms beyond N\(^3\)LO, most likely those associated with small-x resummation. Hence, as with the PDFs, this provides evidence that the aN\(^{3}\)LO predictions are reasonably consistent across all values of x but are influenced to a limited extent by the small-x region. This supports our view that, while we are explicitly determining the missing N\(^3\)LO corrections, which are indeed overall the dominant part of the missing higher order corrections, the fit is also probing some even higher order corrections, particularly at small x.

8.8 N\(^{3}\)LO contributions

In this section, all but one class of N\(^{3}\)LO contributions is switched off, i.e. we include only the splitting functions, or only the heavy or light flavour coefficient functions together with their relevant transition matrix elements. In all cases the aN\(^{3}\)LO K-factors are left free, giving the fit some freedom in manipulating the cross sections of the other datasets. In practice, however, fixing these K-factors at their NNLO values has a minimal effect on the shape of the PDFs in all cases (as demonstrated in Figs. 37 and 38).

Fig. 44

Low-\(Q^{2}\) PDF ratios showing aN\(^{3}\)LO (with decorrelated K-factors) 68% confidence intervals compared to NNLO 68% confidence intervals with varying theory contributions. All plots are shown for \(Q^{2} = 10~{\textrm{GeV}}^{2}\) with the exception of the bottom quark shown for \(Q^{2} = 25~{\textrm{GeV}}^{2}\). The PDFs included are: NNLO (green shaded), All N\(^{3}\)LO contributions (blue shaded), only splitting functions (green dashed), only heavy flavour coefficient functions and transition matrix elements (dark grey dash-dot) and only light flavour coefficient functions and transition matrix elements (red dotted)

The deconstructed aN\(^{3}\)LO PDFs as a ratio to the NNLO MSHT PDFs for various flavours at \(Q^{2} = 10\ {\textrm{GeV}}^{2}\) (with the bottom quark given at \(Q^{2} = 25\ {\textrm{GeV}}^{2}\)) are shown in Fig. 44. Across the more tightly constrained light quark PDFs, all contributions lie very close to the aN\(^{3}\)LO \(\pm 1\sigma \) uncertainty bands (blue shaded region and solid line). The additive and compensating nature of these contributions is also clear in a handful of the ratios in Fig. 44. In other areas the full description is biased towards a single contribution; for example, the charm and bottom quarks follow the heavy flavour contribution, as one may expect. Conversely, to some extent the gluon follows the splitting functions much more closely, as these contributions indirectly couple the gluon to the more constraining data.Footnote 21

8.9 \(\alpha _{s}\) variation

Fig. 45

Quadratic fit to the total \(\chi ^{2}\) results from various \(\alpha _{s}(m_{Z})\) starting scales. The minimum of the quadratic fit provides a rough estimate of \(\alpha _{s}(m_{Z}) = 0.1170\) at aN\(^{3}\)LO

As in the standard MSHT20 NNLO PDF fit, we present the best fit aN\(^{3}\)LO PDFs with \(\alpha _{s}(m_{Z}) = 0.118\), the common value chosen in the PDF4LHC combination [8]. However, when investigating the true minimum in \(\alpha _{s}(m_{Z})\), the \(\chi ^{2}\) profile in Fig. 45 prefers a value of around \(\alpha _{s}(m_{Z}) = 0.1170\). This result follows the trend from lower orders, where the best fit values are \(\alpha _{s}(m_{Z}) = 0.1174 \pm 0.0013\) at NNLO and \(\alpha _{s}(m_{Z}) = 0.1203 \pm 0.0015\) at NLO [151]. As at NNLO, the aN\(^{3}\)LO \(\alpha _{s}(m_{Z})\) value is also slightly lower than the central value of the NNLO world average, \(\alpha _{s}(m_{Z}) = 0.1179 \pm 0.0010\) [152]. In any case, the preferred aN\(^{3}\)LO \(\alpha _{s}(m_{Z})\) value stated here agrees with the MSHT20 NNLO result and with the world average within uncertainties. A full analysis is left for a future publication.
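The rough \(\alpha _{s}(m_{Z})\) estimate quoted in Fig. 45 comes from the minimum of a quadratic fit to total \(\chi ^{2}\) values, and the same procedure underlies the \(m_{c}\) estimate in Fig. 46. A minimal least-squares sketch is given below. The function name is our own, and the \(\chi ^{2}\) values in the example are synthetic numbers generated from an exact parabola for illustration only; they are not MSHT20 results.

```python
def quadratic_minimum(xs, chi2s):
    """Least-squares fit chi2 = a*u**2 + b*u + c with u = (x - x0)/w.

    Returns (x_min, chi2_min) of the fitted parabola.
    """
    n = len(xs)
    if n != len(chi2s) or n < 3:
        raise ValueError("need at least three matching (x, chi2) points")
    # Centre and scale x so that the normal equations are well conditioned.
    x0 = sum(xs) / n
    w = max(abs(x - x0) for x in xs)
    if w == 0.0:
        raise ValueError("x values must not all be equal")
    us = [(x - x0) / w for x in xs]
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(y * u ** k for u, y in zip(us, chi2s)) for k in range(3)]
    # Augmented normal-equation matrix for the unknowns (a, b, c).
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        rest = sum(m[r][k] * coef[k] for k in range(r + 1, 3))
        coef[r] = (m[r][3] - rest) / m[r][r]
    a, b, c = coef
    if a <= 0.0:
        raise ValueError("fitted parabola has no minimum")
    u_min = -b / (2.0 * a)
    return x0 + w * u_min, c - b * b / (4.0 * a)


# Synthetic scan (illustration only): an exact parabola with minimum at 0.1172.
alphas = [0.114, 0.116, 0.118, 0.120, 0.122]
chi2 = [2500.0 + 4.0e5 * (a - 0.1172) ** 2 for a in alphas]
a_min, chi2_min = quadratic_minimum(alphas, chi2)
print(f"alpha_s(mZ) at minimum ~ {a_min:.4f}, chi2_min ~ {chi2_min:.1f}")
```

Centring and rescaling the scan variable keeps the normal equations well conditioned, which matters because \(\alpha _{s}\) or \(m_{c}\) steps are small compared with the \(\chi ^{2}\) values.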

8.10 Charm mass dependence

Fig. 46

Quadratic fit to the total \(\chi ^{2}\) results from various charm masses (\(m_{c}\)). The minimum of the quadratic fit provides a rough estimate of \(m_{c} = 1.45~{\textrm{GeV}}\) at aN\(^{3}\)LO

Fig. 47

Low-\(Q^{2}\) PDF ratios showing aN\(^{3}\)LO (with decorrelated K-factors) 68% confidence intervals compared to NNLO 68% confidence intervals with varying fixed values for the charm mass. All plots are shown for \(Q^{2} = 10~{\textrm{GeV}}^{2}\) with the exception of the bottom quark shown for \(Q^{2} = 25~{\textrm{GeV}}^{2}\). The PDFs included are: \(m_{c} = 1.40~{\textrm{GeV}}\) (standard MSHT20 choice) (blue solid), \(m_{c} = 1.30~{\textrm{GeV}}\) (green dashed), \(m_{c} = 1.45~{\textrm{GeV}}\) (grey dot-dashed) and \(m_{c} = 1.50~{\textrm{GeV}}\) (red dotted)

As in the standard MSHT20 fit [3], the aN\(^{3}\)LO PDFs are produced with the charm pole mass \(m_{c} = 1.40\ {\textrm{GeV}}\). Figure 46 displays the \(\chi ^{2}\) results when varying this charm mass. The predicted minimum at NNLO (for MSHT20 PDFs) lies in the range \(m_{c} = 1.35{-}1.40~{\textrm{GeV}}\) [151], whereas at aN\(^{3}\)LO we find a minimum in the region of \(m_{c} = 1.42{-}1.47\ {\textrm{GeV}}\). This aN\(^{3}\)LO result therefore shows slightly better agreement with the world average [152] (Footnote 22) of \(m_{c} = 1.5 \pm 0.2\ {\textrm{GeV}}\).

Figure 47 then shows the effect of this slightly higher charm mass on the form of the PDFs. As expected, the charm PDF is subject to the largest difference and is suppressed by a higher \(m_{c}\). This reduces the \(c + {\bar{c}}\) contribution to the sea, which is compensated by an increase in the \({\bar{u}}\) and \({\bar{d}}\) distributions that stabilises the overall sea contribution.

9 N\(^{3}\)LO predictions

With the increasing number of hard cross section calculations at N\(^{3}\)LO, there is a growing demand for N\(^{3}\)LO accuracy in PDFs. In this section we investigate the effect of the MSHT approximate N\(^{3}\)LO PDFs on Higgs production via gluon fusion and vector boson fusion (VBF). The hard cross sections for these processes have been calculated to N\(^{3}\)LO accuracy [67,68,69,70,71,72,73,74,75,76, 153, 154]. We present a full N\(^{3}\)LO computation for each prediction with our approximate N\(^{3}\)LO PDFs, including theoretical uncertainties. In future work, we intend to extend this analysis to N\(^{3}\)LO DY [63] and approximate N\(^{3}\)LO top production [66] cross sections.

Note that in this section we follow the notation used previously and denote the aN\(^{3}\)LO results with decorrelated K-factors as \((H_{ij} + K_{ij})^{-1}\) and those with correlated K-factors as \(H_{ij}^{\prime \ -1}\). For results with NNLO PDFs, scale variations are obtained via the 9-point prescription [11]. For aN\(^{3}\)LO PDFs, however, although the extra information introduced is at N\(^{3}\)LO, the data (and therefore all relevant theory nuisance parameters) included in the global fit are sensitive to all orders. In particular, the theoretical uncertainties included in our aN\(^3\)LO fit incorporate MHO effects on the PDFs. We therefore argue (and in these cases demonstrate) that the factorisation scale variation is contained within the PDF uncertainties, so that only the renormalisation scale requires variation in predictions involving aN\(^{3}\)LO PDFs.Footnote 23
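To make the distinction concrete, the sketch below enumerates the scale choices for the 9-point prescription (\(\mu _{f}\) and \(\mu _{r}\) varied independently by factors of 2 around the central scale \(\mu \)) and for a renormalisation-scale-only variation, as used here with aN\(^{3}\)LO PDFs. It assumes that the scale uncertainty is taken as the envelope (maximum and minimum) of the varied predictions, a common convention but not necessarily the exact construction behind the bands in this section. The toy cross section `toy_sigma` and the prescription labels are hypothetical.

```python
import math
from itertools import product

FACTORS = (0.5, 1.0, 2.0)


def scale_points(prescription: str) -> list:
    """Return the (k_f, k_r) multipliers applied to the central scale mu."""
    if prescription == "9-point":
        # mu_f and mu_r varied independently: all 3 x 3 combinations.
        return list(product(FACTORS, FACTORS))
    if prescription == "mur-only":
        # mu_f held at the central scale; only mu_r varied (aN3LO PDF case).
        return [(1.0, k) for k in FACTORS]
    raise ValueError(f"unknown prescription: {prescription}")


def scale_envelope(sigma, mu: float, prescription: str):
    """Central value and (up, down) envelope of sigma(mu_f, mu_r)."""
    central = sigma(mu, mu)
    values = [sigma(kf * mu, kr * mu) for kf, kr in scale_points(prescription)]
    return central, max(values) - central, central - min(values)


def toy_sigma(mu_f: float, mu_r: float, mu0: float = 62.5) -> float:
    """Hypothetical cross section (pb) with mild log dependence on both scales."""
    return 40.0 + 1.0 * math.log(mu_r / mu0) + 2.0 * math.log(mu_f / mu0)


for p in ("9-point", "mur-only"):
    print(p, scale_envelope(toy_sigma, 62.5, p))
```

With this toy function the \(\mu _{r}\)-only band is narrower because the \(\mu _{f}\) dependence is not sampled, which mirrors why dropping the factorisation scale variation is only justified when the PDF uncertainty already covers it.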

9.1 Higgs production – gluon fusion: \(gg \rightarrow H\)

Table 14 Higgs production cross section results via gluon fusion (with \(\sqrt{s} = 13~\text {TeV}\)) using N\(^{3}\)LO and NNLO hard cross sections combined with NNLO and aN\(^{3}\)LO PDFs. All PDFs are at the standard choice \(\alpha _{s}(m_{Z}) = 0.118\). These results are found with \(\mu = m_{H}/2\) unless stated otherwise, with the values for \(\mu = m_{H}\) supplied in Table 21
Fig. 48

Higgs production cross section results via gluon fusion (with \(\sqrt{s} = 13~\text {TeV}\)) at two central scales: \(\mu = m_{H}/2\) (left) and \(\mu = m_{H}\) (right). Displayed are the results for aN\(^{3}\)LO PDFs with decorrelated K-factors (\((H_{ij} + K_{ij})^{-1}\)), correlated K-factors (\(H_{ij}^{\prime \ -1} = (H_{ij} + K_{ij})^{-1}\)) each with a scale variation band from varying \(\mu _{r}\) by a factor of 2. In the NNLO and NLO PDF cases, both scales \(\mu _{f}\) and \(\mu _{r}\) are varied by a factor of 2 following the 9-point convention [11]

Table 14 and Fig. 48 (left) show predictions for the Higgs production cross section via gluon fusion (Footnote 24) at the LHC for \(\sqrt{s} = 13~\text {TeV}\) at a central scale of \(\mu = \mu _{f}=\mu _{r}=m_{H}/2\), where \(m_{H} = 125~\text {GeV}\) is the Higgs mass and no fiducial cuts are applied. Figure 48 (right) displays the same analysis with \(\mu = \mu _{f}=\mu _{r}=m_{H}\) (numerical results are provided in Table 21).

Fig. 49

Higgs production cross section results via vector boson fusion (with \(\sqrt{s} = 13~\text {TeV}\)) at a central scale set to the vector boson momentum. Displayed are the results for aN\(^{3}\)LO PDFs with decorrelated K-factors (\((H_{ij} + K_{ij})^{-1}\)), correlated K-factors (\(H_{ij}^{\prime \ -1} = (H_{ij} + K_{ij})^{-1}\)) each with a scale variation band from varying \(\mu _{r}\) by a factor of 2. In the NNLO and NLO PDF cases, both scales \(\mu _{f}\) and \(\mu _{r}\) are varied by a factor of 2 following the 9-point convention [11]

Considering the \(\mu = m_{H}/2\) and \(\mu = m_{H}\) central values displayed in Table 14 and Fig. 48, it can be observed that aN\(^{3}\)LO PDFs predict a lower central value than NNLO PDFs at all orders of the hard cross section. The predictions from NNLO and aN\(^{3}\)LO PDFs also overlap in all cases. However, for \(\mu = m_{H}/2\), whilst the error bands of the predictions combining the N\(^{3}\)LO hard cross section with NNLO and aN\(^{3}\)LO PDFs overlap, their central values lie outside each other's error bands. Since estimating MHOUs via scale variations is a somewhat ambiguous procedure (and the estimate is therefore made conservatively to reflect this), these results highlight the benefit of a higher level of control over MHOUs, i.e. via nuisance parameters. By shifting the central value, the aN\(^{3}\)LO PDFs provide a more accurate estimate of higher order effects, which may not be contained within scale variations, especially when orders in perturbation theory are unmatched.

Examining the predicted central values further, Fig. 48 indicates that the increase in the cross section theory at N\(^{3}\)LO is compensated by the PDF theory at N\(^{3}\)LO, pointing to a cancellation between N\(^{3}\)LO terms in the PDF and cross section theory. This point is important when combining unmatched orders in physical calculations, since unmatched cancellations can lead to inaccurate predictions, as our results suggest here.

Further to this, the change in the gluon PDF is largely driven by the predicted form of \(P_{qg}\) at aN\(^{3}\)LO and by DIS data. The relevant changes in the gluon at aN\(^{3}\)LO are therefore most likely due to indirect effects, i.e. not directly related to gluon fusion predictions. Consequently, there is no a priori reason to expect the observed level of convergence at aN\(^{3}\)LO for both choices of \(\mu \). However, owing to the inclusion of known higher order information, one can be confident that the prediction is more accurate than at NNLO, whichever way it moves.

Table 15 Higgs production cross section results via the vector boson fusion process (with \(\sqrt{s} = 13~\text {TeV}\)) using N\(^{3}\)LO and NNLO hard cross sections combined with NNLO and aN\(^{3}\)LO PDFs. All PDFs are at the standard choice \(\alpha _{s}(m_{Z}) = 0.118\). These results are found with \(\mu ^{2} = Q^{2}\), where \(Q^{2}\) is the squared vector boson momentum

Comparing PDF uncertainties calculated using NNLO and aN\(^{3}\)LO PDFs, another prominent feature in Table 14 is an increase in the PDF uncertainties. We find that the PDF uncertainty without N\(^{3}\)LO theory uncertainties included (i.e. using only the first 32 eigenvectors, with the N\(^{3}\)LO parameters fixed at the best fit) also shows a marginal increase in the positive direction compared to NNLO. Mathematically, this is because the best fit is inherently different from the NNLO one and resides in a different \(\chi ^{2}\) landscape, so it is not guaranteed that the PDF uncertainty will remain the same across the distinct PDF sets.Footnote 25 In the case of gluon fusion, the leading contribution in the positive direction comes from an eigenvector primarily dominated by PDF parameters, while in the negative direction an eigenvector dominated by an N\(^{3}\)LO splitting function parameter contributes most (eigenvectors 9 and 31 respectively, in the \((H_{ij} + K_{ij})^{-1}\) case; see Table 11). As discussed in Sect. 8.8, the gluon predominantly follows the splitting function contributions, so it is not surprising that this eigenvector has a noticeable effect. Phenomenologically, the increase in predicted uncertainties from the inclusion of theoretical uncertainties reflects the estimated PDF MHOUs in this particular cross section, and acts to replace factorisation scale variation. As a consistency check, we find that when performing a 9-point scale variation with aN\(^{3}\)LO PDFs, the resulting values (for both choices of \(\mu \)) lie within the predicted PDF uncertainties. This further verifies our MHOUs and shows that the \(\mu _{f}\) variation is intrinsic to the PDF uncertainties.

Finally, Fig. 48 also demonstrates the increased stability of the predictions for the two different central scales \(\mu \) at N\(^{3}\)LO. As expected from perturbation theory, the scale dependence is reduced and the central values agree more closely when the order of either the PDFs or the hard cross section is increased. Furthermore, the aN\(^{3}\)LO \(\sigma \) central predictions for both choices of \(\mu \) lie within each other's uncertainty bands. For the NNLO PDFs this is true by construction, since the factorisation scale \(\mu _{f}\) variation spans both choices of \(\mu \), whereas for aN\(^{3}\)LO PDFs it is not guaranteed and instead emerges from the PDF (and renormalisation scale \(\mu _{r}\) variation) uncertainty.

9.2 Higgs production – vector boson fusion: \(qq \rightarrow H\)

Table 16 Higgs production cross section results via the vector boson fusion process (with \(\sqrt{s} = 13~\text {TeV}\)) using N\(^{3}\)LO and NNLO hard cross sections combined with NNLO and decorrelated aN\(^{3}\)LO PDFs whilst varying the number of active flavours \(n_{f}\). All PDFs are at the standard choice \(\alpha _{s}(m_{Z}) = 0.118\). These results are found with \(\mu ^{2} = Q^{2}\), where \(Q^{2}\) is the squared vector boson momentum

Table 15 and Fig. 49 show the predictions at various orders in \(\alpha _{s}\) for Higgs production via vector boson fusion (Footnote 26) at the LHC for \(\sqrt{s} = 13~\text {TeV}\) up to N\(^{3}\)LO [153, 154]; again, no fiducial cuts are applied in this comparison. The predictions are calculated with \(\mu _{f}^2 = \mu _{r}^2 = Q^{2}\) as the central scale, where \(Q^{2}\) is the squared vector boson momentum.

For this process, the cross section increases as higher order PDFs are used. In contrast with gluon fusion, Fig. 49 shows little cancellation between the terms added in the aN\(^{3}\)LO PDF description and the N\(^{3}\)LO cross section. However, the VBF cross section changes by only around \(3{-}4 \%\) from order to order and is therefore fairly stable. Given this relatively small difference between orders, the lack of cancellation is not a major concern. Further to this, VBF relies much more on the quark sector which, compared to the gluon, is relatively stable from order to order (see Sect. 8.4), owing to the more direct data constraints on the shape of the quark PDFs.

Comparing the aN\(^{3}\)LO VBF cross section (with MHO theoretical uncertainties) with the NNLO result (with NNLO PDFs) including MHOUs via scale variations, we see that the scale variation MHOUs are negligible compared with the PDF uncertainties at aN\(^{3}\)LO. This is partly because, at aN\(^{3}\)LO, the scale variation includes only the renormalisation scale. However, even at NNLO the additional MHOU from scale variations is only a small contribution. The effects of higher orders are therefore expected to be small in both cases, which supports the argument that there is little scope for cancellation between orders for VBF. As for the gluon fusion prediction in Sect. 9.1, we confirm that any further factorisation scale variation (i.e. using the 9-point prescription) is contained within the predicted PDF uncertainties, further motivating our argument that factorisation scale variation is not necessary with aN\(^{3}\)LO PDFs.

Another feature of the VBF results is that the uncertainty at full aN\(^{3}\)LO is only slightly larger than in the calculation with NNLO PDFs. Compared with the gluon fusion results, where the uncertainty increased more noticeably in both directions, it is evident that the approximate N\(^{3}\)LO additions have a smaller effect on the VBF calculation. Once again, this originates from the nature of the process: VBF mostly involves the quark sector, which is much less affected by the extra N\(^{3}\)LO theory we have introduced (due to direct constraints from data). As presented in previous sections, most of the uncertainty in the N\(^{3}\)LO theory resides in the small-x regime, which is probed more directly by the gluon than by the quarks.

Lastly, we briefly discuss the \(n_{f}\) dependence of the VBF cross section. In VBF the number of contributions scales as \(n_{f}^{2}\), owing to the presence of two incoming quark flavours in the process. Table 16 shows that, because of this scaling, the VBF cross section receives a large contribution when the charm quark is included (\(n_{f} = 3 \rightarrow 4\)). At aN\(^{3}\)LO, this is also where most of the difference from NNLO in the central value and uncertainty arises, as a consequence of the predicted enhancement of the charm PDF at aN\(^{3}\)LO discussed in Sect. 8.4. Beyond \(n_{f} = 4\), the bottom contribution to VBF in the \(W^{\pm }\) channel (the dominant channel) is heavily suppressed since, owing to the CKM matrix elements, a b quark must transition to a t quark most of the time. The VBF cross section therefore receives only a small contribution from moving from \(n_{f}=4\) to \(n_{f}=5\).

10 Availability and recommended usage of MSHT20 aN\(^{3}\)LO PDFs

We provide the MSHT20 aN\(^{3}\)LO PDFs in LHAPDF format [157]:

http://lhapdf.hepforge.org/

as well as on the repository:

http://www.hep.ucl.ac.uk/msht/

The approximate N\(^{3}\)LO functions (for \(P_{ij}(x)\) and \(A_{ij}(x)\)) are provided as lightweight FORTRAN functions or as part of a Python framework in the repository:

https://github.com/MSHTPDF/N3LO_additions

We present the aN\(^{3}\)LO eigenvector sets with and without correlated K-factors as discussed in Sect. 8, with the default set being provided with decorrelated K-factors.Footnote 27

\(\texttt {MSHT20an3lo}\_\texttt {as118}\)

\(\texttt {MSHT20an3lo}\_\texttt {as118}\_\texttt {Kcorr}\)

Both PDF sets contain a central PDF accompanied by 104 eigenvector directions (i.e. 52 eigenvectors) and can be used in exactly the same way as previous MSHT PDF sets, e.g. the MSHT20 NNLO PDFs with 64 eigenvector directions.
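As with other MSHT Hessian sets, the PDF uncertainty on an observable is built from its values on the central member and on the paired eigenvector members. The sketch below assumes the LHAPDF Hessian member ordering (member 0 central, then a (+, -) pair for each eigenvector, so 1 + 104 members here) and the standard asymmetric Hessian formulae \(\delta F^{+} = \sqrt{\sum _k [\max (F_k^{+}-F_0,\, F_k^{-}-F_0,\, 0)]^2}\) and \(\delta F^{-} = \sqrt{\sum _k [\max (F_0-F_k^{+},\, F_0-F_k^{-},\, 0)]^2}\), at the confidence level of the set. It also reports which eigenvector contributes most in each direction, as in the gluon fusion discussion of Sect. 9.1. In practice the member values would come from LHAPDF (e.g. its Python `mkPDFs` and `xfxQ2` functions), and LHAPDF's own `PDFSet.uncertainty` performs this combination; the standalone function, its name and the toy input below are hypothetical.

```python
import math


def hessian_asymmetric(values):
    """Asymmetric Hessian uncertainty from member values.

    values = [F0, F1+, F1-, F2+, F2-, ...]: central member first, then one
    (+, -) pair per eigenvector. Returns (F0, err_plus, err_minus,
    eig_plus, eig_minus), where eig_* is the 1-based index of the
    eigenvector giving the largest contribution in that direction.
    """
    if len(values) < 3 or len(values) % 2 == 0:
        raise ValueError("expected 1 + 2N member values")
    f0 = values[0]
    up_sq, down_sq = [], []
    for k in range(1, len(values), 2):
        fp, fm = values[k], values[k + 1]
        up = max(fp - f0, fm - f0, 0.0)    # largest upward shift of the pair
        down = max(f0 - fp, f0 - fm, 0.0)  # largest downward shift of the pair
        up_sq.append(up * up)
        down_sq.append(down * down)
    eig_plus = max(range(len(up_sq)), key=up_sq.__getitem__) + 1
    eig_minus = max(range(len(down_sq)), key=down_sq.__getitem__) + 1
    return f0, math.sqrt(sum(up_sq)), math.sqrt(sum(down_sq)), eig_plus, eig_minus


# Toy input with 3 eigenvectors (hypothetical numbers, not MSHT20 output).
print(hessian_asymmetric([10.0, 10.3, 9.8, 10.1, 9.6, 10.0, 10.0]))
```

Taking the larger shift of each pair in each direction (rather than a symmetric half-difference) is what allows the asymmetric uncertainties quoted for the Higgs cross sections.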

As presented in this work, the aN\(^{3}\)LO PDFs include within their PDF uncertainties an estimate of the missing N\(^{3}\)LO contributions (the leading theoretical uncertainty) and, implicitly, some MHOU beyond this. For this reason, as argued in Sect. 9, factorisation scale variations are no longer necessary in calculations involving aN\(^{3}\)LO PDFs. However, the renormalisation scale should still be varied to estimate MHOUs in the hard cross section piece of physical calculations.

If the hard cross section for a process is available up to N\(^{3}\)LO, we recommend using the aN\(^{3}\)LO PDFs, since combining unmatched ingredients in cross section calculations can miss important cancellations between the PDFs and the hard cross section.

If a process is included in the global fit and its hard cross section is known only up to NNLO (i.e. those discussed in Sect. 7), we recommend the decorrelated version of the aN\(^{3}\)LO PDF set. Using these PDFs and the details provided in Table 10, the hard cross section can be transformed from NNLO to approximate N\(^{3}\)LO, and the two approximate N\(^{3}\)LO ingredients can then be combined to give a full approximate N\(^{3}\)LO result.

If a process is not included in the global PDF fit and its hard cross section is known only up to NNLO, the standard NNLO PDF set remains the default choice. However, we recommend using the aN\(^{3}\)LO PDFs to estimate potential MHOUs: the prediction from the aN\(^{3}\)LO PDF set combined with the NNLO hard cross section should be reflected in any MHOU estimate for the full NNLO prediction. For example, when the hard cross section is known only up to NNLO, Equation (3.13) from [75] can be adapted to

$$\begin{aligned} \delta (\textrm{PDF} - \textrm{TH}) = \frac{1}{2}\left|{\frac{\sigma _{\textrm{aN}^{3}\textrm{LO}}^{(2)} - \sigma _{\textrm{NNLO}}^{(2)}}{\sigma _{\textrm{aN}^{3}\textrm{LO}}^{(2)}}}\right|\end{aligned}$$
(10.1)

where \(\delta (\textrm{PDF} - \textrm{TH})\) is the predicted PDF theory uncertainty on the \(\sigma \) prediction, \(\sigma _{\textrm{aN}^{3}\textrm{LO}}^{(2)}\) is the NNLO hard cross section with aN\(^{3}\)LO PDFs and \(\sigma _{\textrm{NNLO}}^{(2)}\) is the full NNLO result. A caveat to this treatment is that the estimated theory uncertainty is sensitive to unmatched cancellations and should therefore be used with caution; for this reason, the NNLO set remains the default for evaluating PDF uncertainties.
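Equation (10.1) is straightforward arithmetic once the two predictions with the NNLO hard cross section are available. A minimal sketch follows; the function name and the example cross sections are hypothetical, and the result is a relative uncertainty, to be multiplied by the prediction for an absolute value.

```python
def delta_pdf_th(sigma_an3lo_pdfs: float, sigma_nnlo_pdfs: float) -> float:
    """Relative PDF theory uncertainty of Eq. (10.1).

    sigma_an3lo_pdfs: NNLO hard cross section combined with aN3LO PDFs.
    sigma_nnlo_pdfs:  full NNLO result (NNLO hard cross section, NNLO PDFs).
    """
    if sigma_an3lo_pdfs == 0.0:
        raise ValueError("the aN3LO-PDF prediction must be non-zero")
    return 0.5 * abs((sigma_an3lo_pdfs - sigma_nnlo_pdfs) / sigma_an3lo_pdfs)


# Hypothetical cross sections in pb, for illustration only.
rel = delta_pdf_th(40.0, 42.0)
print(f"delta(PDF-TH) = {rel:.3%}")  # relative; multiply by sigma for an absolute error
```

The absolute value makes the estimate symmetric: it does not matter whether the aN\(^{3}\)LO PDFs shift the prediction up or down.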

11 Conclusions

In this paper we have presented the first approximate N\(^{3}\)LO global PDF fit. It follows the MSHT20 framework [3], and the resulting aN\(^{3}\)LO PDF set also incorporates estimates of the theoretical uncertainties from missing N\(^{3}\)LO contributions and, implicitly, some MHOU beyond this. In addition, the framework presented for obtaining these PDFs allows higher order information to be used as and when it becomes available, whereas previously complete information at the next order was required for theoretical calculations in PDF fits. This provides a significant advantage for precision phenomenology, since at each successive order this information takes increasingly long to calculate. We have analysed the resulting set of PDFs, denoted MSHT20aN\(^{3}\)LO, and made two sets available as described in Sect. 10. The aN\(^{3}\)LO PDF fits have been performed to the same set of global hard scattering data, with the same PDF parameterisations, as the MSHT20 NNLO PDF fits.

The NNLO theoretical framework for MSHT20 PDFs has been extended in Sect. 2 to include general N\(^{3}\)LO theory parameters in the fit. We have then outlined how those N\(^{3}\)LO theory parameters that are not yet known can be included in the Hessian procedure as controllable nuisance parameters. Two methods of handling subsets of the N\(^{3}\)LO theory parameters in the Hessian matrix have also been discussed: including or ignoring the correlations of the aN\(^{3}\)LO K-factors across distinct processes.
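To illustrate how theory parameters can enter a Hessian fit as nuisance parameters, the following Python sketch adds a quadratic penalty for each parameter around a prior central value with a prior width, for a toy model whose predictions are linear in the parameters. This is a generic sketch under these assumptions, not the MSHT20 fitting code: the function names, the linear model and the numbers are hypothetical, and the real fit is non-linear and uses dynamical tolerance rather than a fixed \(\Delta \chi ^{2} = 1\).

```python
import numpy as np


def penalised_chi2(theta, data, base, jac, inv_cov, prior_mean, prior_width):
    """Data chi^2 plus a Gaussian penalty for each theory nuisance parameter."""
    residual = data - (base + jac @ theta)  # data minus toy linear prediction
    penalty = np.sum(((theta - prior_mean) / prior_width) ** 2)
    return residual @ inv_cov @ residual + penalty


def fit_linear(data, base, jac, inv_cov, prior_mean, prior_width):
    """Exact minimum and Hessian H = (1/2) d^2 chi^2 / d theta^2 for the linear model."""
    prior_prec = np.diag(1.0 / np.asarray(prior_width) ** 2)  # penalty adds 1/width^2
    hessian = jac.T @ inv_cov @ jac + prior_prec
    rhs = jac.T @ inv_cov @ (data - base) + prior_prec @ prior_mean
    theta_best = np.linalg.solve(hessian, rhs)  # solves d chi^2 / d theta = 0
    return theta_best, hessian


# Hypothetical toy inputs: two data points, one nuisance parameter.
data = np.array([1.0, 2.0])
base = np.zeros(2)
jac = np.array([[1.0], [2.0]])  # d(prediction) / d(theta)
inv_cov = np.eye(2)             # uncorrelated unit experimental uncertainties
prior_mean = np.array([0.0])
prior_width = np.array([1.0])

theta_best, hessian = fit_linear(data, base, jac, inv_cov, prior_mean, prior_width)
sigma_theta = np.sqrt(np.diag(np.linalg.inv(hessian)))  # Delta chi^2 = 1 uncertainty
print(theta_best, sigma_theta)
```

With a quadratic penalty, the prior acts like one extra pseudo-data point per parameter, so the Hessian gains \(1/\textrm{width}^{2}\) on its diagonal: a tighter prior pulls the parameter towards its prior value and reduces its uncertainty, while data with strong sensitivity can still move it.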

In Sects. 4–7 we have presented the N\(^{3}\)LO additions to the relevant splitting functions, transition matrix elements, heavy coefficient functions and K-factors. We present usable and computationally efficient approximations to these N\(^{3}\)LO functions, based on the known information in the small and large-x regimes and the available Mellin moments (and make these available as described in Sect. 10). In all cases the best fit prediction for each N\(^{3}\)LO function is in good agreement with the expected behaviour from the prior. Also in Sect. 7, we find good agreement with recent progress towards N\(^{3}\)LO DY and top production K-factors [64, 66]. As more information about each of these functions becomes available, the framework we present here can easily be adapted, helping to reduce the MHOUs from N\(^{3}\)LO. As we have stressed, we interpret our theoretical uncertainty as being mainly due to the remaining uncertainty at N\(^3\)LO, but with some small but significant contribution from even higher orders, particularly at small-x. Our results seem consistent with this interpretation. However, in the future we expect the N\(^{3}\)LO description to become more exact. At some point, therefore, the remaining N\(^3\)LO uncertainty will become comparable to, or smaller than, effects beyond N\(^3\)LO, and we would then have to modify our procedure. We expect, however, that by the time the N\(^3\)LO theory is largely known, more will also be known about even higher orders (i.e. N\(^{4}\)LO), which could then be incorporated in a similar manner to maintain an estimate of MHOUs. Alternatively, if the available information is not suitable for constructing approximations (or indeed to complement these approximations), a treatment similar in principle to that of the K-factors, but more sophisticated in practice, may be adopted for DIS quantities. On this note, we acknowledge that the method presented here for constructing aN\(^{3}\)LO K-factors for processes other than inclusive DIS is a first step towards a more robust and flexible procedure, which is left for future work.
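As a rough illustration of fixing an approximation with Mellin moments, the Python sketch below takes a hypothetical ansatz \(P(x) = A/x + \sum _{k} c_{k} x^{k}\), holds the small-x coefficient \(A\) fixed as if it were known, and solves for the free coefficients \(c_{k}\) so that the ansatz reproduces a given set of moments \(M(N) = \int _{0}^{1} x^{N-1} P(x)\, dx\). The basis functions, moment values and function names are assumptions chosen only because their moments are analytic; the approximations in this paper use different, more elaborate functional forms.

```python
import numpy as np


def mellin_moment(N, small_x_coeff, coeffs):
    """M(N) = int_0^1 x^(N-1) P(x) dx for P(x) = A/x + sum_k c_k x^k, valid for N > 1."""
    return small_x_coeff / (N - 1) + sum(c / (N + k) for k, c in enumerate(coeffs))


def fit_to_moments(moments, Ns, small_x_coeff):
    """Choose c_k so that the ansatz reproduces the supplied moments exactly."""
    Ns = np.asarray(Ns, dtype=float)
    k = np.arange(len(Ns))
    basis_moments = 1.0 / (Ns[:, None] + k[None, :])  # moment N_i of basis function x^k
    rhs = np.asarray(moments, dtype=float) - small_x_coeff / (Ns - 1.0)  # remove known A/x part
    return np.linalg.solve(basis_moments, rhs)


# Hypothetical inputs: three "known" moments generated from a toy P(x).
A = 0.3
c_true = [1.0, -2.0, 0.5]
Ns = [2, 4, 6]
moments = [mellin_moment(N, A, c_true) for N in Ns]

c_fit = fit_to_moments(moments, Ns, A)
print(c_fit)
```

With as many free coefficients as known moments the linear system is square and the moments are reproduced exactly; the shape of the function between the constrained regimes still depends on the choice of basis, which is one reason such approximations carry an uncertainty.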

Combining all of the N\(^{3}\)LO information, in Sect. 8 we presented the results of an approximate N\(^{3}\)LO global PDF fit. The new MSHT20 approximate N\(^{3}\)LO PDFs show a significant reduction in \(\chi ^{2}\) relative to the MSHT20 NNLO PDF set, with the leading NNLO tensions between HERA and non-HERA datasets heavily reduced at aN\(^{3}\)LO (most notably with the ATLAS 8 TeV \(Z\ p_{T}\) dataset [110]). That said, some jet datasets are fitted worse in the aN\(^3\)LO global fit than at NNLO, although these are exceptions to the behaviour seen for the other datasets. By performing a fit without the ATLAS 8 TeV \(Z\ p_{T}\) data, we provide evidence that the tensions between this dataset and the jet production data seen at NNLO (see [3]) remain at aN\(^{3}\)LO. Furthermore, since the HERA and ATLAS 8 TeV \(Z\ p_{T}\) data are in better agreement on the form of the high-x gluon at aN\(^{3}\)LO, we show that the tension with the jet production data is shared between the HERA and ATLAS 8 TeV \(Z\ p_{T}\) data. Finally, as discussed, it will be interesting in future work to see whether this increased tension is alleviated when these jet data are instead considered as dijet cross sections.

Investigating the correlations within an aN\(^{3}\)LO PDF fit, we observe a natural separation between process-independent and process-dependent parameters. Motivated by this, we construct a PDF set with decorrelated aN\(^{3}\)LO K-factor eigenvectors. The validity of this approach is verified by comparison with a second PDF set which includes correlations between all parameters. Both sets exhibit similarly well-behaved eigenvectors and levels of dynamical tolerance.

Considering the form of the individual PDFs, the aN\(^{3}\)LO PDFs include a much harder gluon at small-x due to contributions from the splitting functions, as discussed in Sect. 8.8. This enhancement then translates into an increase in the charm and bottom PDFs, since the gluon feeds into the heavy flavour sector via the transition matrix elements. At very low \(Q^{2}\), the N\(^{3}\)LO additions result in non-negative charm and gluon PDFs at small-x. As a consistency check, the dependence of the fit on \(\alpha _{s}\) and \(m_{c}\) has been investigated. In both cases the fit prefers values which suppress the heavy flavour contributions (slightly lower \(\alpha _{s}\) and slightly higher \(m_{c}\) than at NNLO). The preferred aN\(^{3}\)LO \(\alpha _{s}\) differs from the NNLO world average by slightly less than \(1\sigma \). An extensive analysis of the aN\(^{3}\)LO \(\alpha _{s}\) value is left for further study; however, since the world average is determined from NNLO results, a small systematic shift from moving to N\(^{3}\)LO could be expected.

Taking this analysis further and using the approximate N\(^{3}\)LO PDFs as input to N\(^{3}\)LO cross section calculations, we have considered Higgs production via gluon fusion and via vector boson fusion (VBF). We present the first aN\(^{3}\)LO calculations of these cross sections and show how the aN\(^{3}\)LO prediction differs from that obtained with NNLO PDFs including scale variations, highlighting the importance of matching orders in calculations. In VBF we provide an example where cancellation between orders is not realised. However, in this case the quark sector is much more constrained and, owing to the smaller variation between orders, there is naturally less scope for cancellation.

In summary, we have presented a set of approximate N\(^{3}\)LO PDFs that are able to predict physical quantities involving PDFs more accurately (provided that all ingredients in these calculations are included at N\(^{3}\)LO or aN\(^{3}\)LO). In producing these PDFs, we have provided a more controllable method than scale variations for estimating theoretical uncertainties from MHOs in a PDF fit. While some ambiguity remains in how the prior variations are chosen, we argue that the current knowledge and intuition surrounding each source of uncertainty can be used as and when available. This is therefore much more in line with what one expects a theoretical uncertainty to encompass. Another potential shortcoming is the possibility of fitting to sources of uncertainty other than missing higher orders (or to higher order corrections elsewhere in the theory calculations included in a PDF fit). Although this cannot be excluded, the position of the considered sources of uncertainty in the underlying theory, combined with the prior variations and penalties, should act to minimise this effect. In any case, if a separate source of uncertainty is significantly affecting the fit, it will appear as a tension with the N\(^{3}\)LO penalties, and the \(\chi ^{2}\) (and PDF uncertainty) will be adjusted accordingly.

Table 17 List of all the N\(^{3}\)LO ingredients used to construct the approximate N\(^{3}\)LO splitting functions and transition matrix elements. Where only a citation is provided, extensive knowledge (i.e. beyond NLL) is used. This table is a non-exhaustive list of the current knowledge about these functions; however, information beyond that provided here is not currently in a usable format for phenomenological studies.

In future work it will be interesting to investigate the aN\(^{3}\)LO effects on the high-x gluon, a region of phenomenological importance where the interpretation of LHC constraints is not always straightforward. We also note that N\(^{3}\)LO results are available for the di-lepton rapidity distribution in DY processes [64]. Given the agreement with these recent results found in Sect. 7, we hope that these approximate N\(^{3}\)LO PDFs may be of interest in such analyses, and similarly for recent results on top production [66]. Furthermore, any approximate information from these results could be included in the N\(^{3}\)LO K-factor priors, which was not done for this iteration of the aN\(^{3}\)LO PDFs. Finally, to continue improving the aN\(^{3}\)LO PDFs, more sub-leading sources of MHOUs could be included. With the upcoming wealth of experimental data from future colliders such as the HL-LHC and the EIC, it will be of interest to gain a better understanding of the transition matrix elements and to improve the description of the charged current and longitudinal structure functions, where theoretical uncertainties are currently much smaller than the experimental uncertainties.