Extraction of the strong coupling with HERA and EIC inclusive data

Sensitivity to the strong coupling $\alpha_S(M^2_Z)$ is investigated using existing Deep Inelastic Scattering data from HERA in combination with projected future measurements from the Electron Ion Collider (EIC) in a next-to-next-to-leading order QCD analysis. A potentially world-leading level of precision is achievable when combining simulated inclusive neutral current EIC data with inclusive charged and neutral current measurements from HERA, with or without the addition of HERA inclusive jet and dijet data. The result can be obtained with substantially less than one year of projected EIC data at the lower end of the EIC centre-of-mass energy range. Some questions remain over the magnitude of uncertainties due to missing higher orders in the theoretical framework.


Introduction
Of the coupling strengths of the fundamental forces, the strong coupling α s is by far the least well constrained.At the same time, it is an essential ingredient of Standard Model cross section calculations, as well as constraints on new physics and grand unification scenarios [1,2].It has previously been measured in a wide range of processes [3,4].In Deep Inelastic Scattering (DIS), recent studies from HERA have shown limited sensitivity when using only inclusive data [5], but much more competitive precision when additionally including jet production cross sections [5][6][7].In recent years, the advances in QCD theory from next-to-leading order (NLO) to next-to-next-to-leading order (NNLO) have resulted in a substantial reduction in the uncertainties on α s extractions due to missing higher order corrections, usually expressed in terms of a QCD scale uncertainty, though they remain by far the largest single source of uncertainty in the best HERA extractions.The Electron Ion Collider (EIC) [8], currently under preparation at Brookhaven National Laboratory in partnership with the Thomas Jefferson National Accelerator Facility is expected to begin taking data around 2030.The EIC will collide highly polarised electrons with highly polarised protons and light/heavy nuclei.In ep mode, the expected luminosity is of order 10 33 −10 34 cm −2 s −1 and the centre-of-mass energy √ s will range from 29 GeV to 141 GeV.As part of the extensive program of EIC physics [9], inclusive DIS cross sections will be measured to high precision in a phase space region that will be complementary to HERA, in particular improving the sensitivity to the large Bjorken-x kinematic region.In this work, the expected experimental uncertainty on the strong coupling at the scale of the Z-pole mass α s (M 2 Z ) is estimated when adding simulated EIC inclusive data to analyses very similar to those performed on HERA data.An earlier study of the impact of inclusive EIC data on α s precision can be found in [9].

Data Samples
The HERA data used in this analysis are the final combined H1 and ZEUS inclusive DIS neutral current (NC) and charged current (CC) cross sections [5] and, where appropriate, the H1 and ZEUS inclusive and dijet measurements used in a recent study of parton distribution function (PDF) sensitivity at NNLO, as summarised in Table 1 of [6].The HERA cross sections correspond to unpolarised beam configurations at proton beam energies of 920, 820, 575 and 460 GeV and an electron beam energy of 27.5 GeV.The data correspond to an integrated luminosity of about 1 fb −1 and span six orders of magnitude in the modules of the four-momentum-transfer squared, Q 2 , and in Bjorken x.
The detailed experimental apparatus configurations for the EIC are currently under intense development.However, the broad requirements are well established, as documented for example in [9].In this paper, the simulated EIC data are taken from the studies performed in the framework for ATHENA detector proposal [10].The ATHENA configuration has since been combined with ECCE [11] in the framework of a new and fast-evolving ePIC design.Whilst the details of the apparatus are different, the overall kinematic range and achievable precision are expected to be very similar.As summarised in Table 1, neutral current EIC simulated measurements ('pseudodata') are produced with integrated luminosities corresponding to expectations for one year of data taking with each of the five different beam configurations expected at the EIC.Charged current pseudodata are also available at the highest √ s.The neutral current pseudodata are produced in a grid of five logarithmically-spaced x and Q 2 values per decade over the range 1 0.001 < y < 0.95, which is well-justified by the expected resolutions.The central values are taken from predictions using HERAPDF2.0NNLO[5], 2 randomly smeared based on Gaussian distributions with standard deviations given by the projected uncertainties as estimated by the ATHENA collaboration and as previously used to study collinear PDF sensitivities in [10,12].The systematic precision is based on experience from HERA and further considerations in [9] and is rather conservative in the context of the more modern detector technologies and larger data sets at the EIC.Most data points have a point-to-point uncorrelated systematic uncertainty of 1.9%, extending to 2.75% at the lowest y values.An additional normalisation uncertainty of 3.4% is ascribed, which is taken to be fully correlated between data at each √ s, and fully uncorrelated between data sets with different √ s.For the purposes of the QCD fits (section 3), the point-to-point systematic uncertainties are added in quadrature with the statistical uncertainties and the normalisation uncertainties are treated as nuisance parameters, as in [5].Table 1: Beam energies, centre-of-mass energies and integrated luminosities of the different configurations considered for the EIC.
The kinematic plane of the inclusive data used in this analysis is shown in Fig. 1.The EIC pseudodata overlap in their coverage with the HERA data and extend the kinematic reach in the high x, moderate Q 2 region.Their impact at large x is significant since the large x HERA data are relatively imprecise due to their kinematic correlation with large Q 2 , the 1/Q 4 photon propagator term in the cross section and the limited integrated luminosity.
HERA and EIC kinematic phase-space

Theoretical Framework and Fitting Procedure
The analysis is based on a QCD fit that follows the HERAPDF [5] theoretical framework, PDF parameterisations and model parameter choices.In the fit, the proton PDFs and α S (M 2 Z ) are constrained simultaneously in a χ 2 -minimisation procedure in which the Q 2 evolution is performed according to the NNLO DGLAP evolution equations [13][14][15][16][17][18][19][20][21][22].The xFitter framework [23][24][25] is used, with the light quark coefficient functions calculated to NNLO as implemented in QCDNUM [26].The MINUIT program [27] is used for the minimisation.The general-mass variable-flavor-number scheme [28,29] is used for the contributions of heavy quarks.The renormalisation and factorisation scales are taken to be µ r = µ f = Q 2 for the inclusive DIS data, while for dijets, where < p T > 2 is the average of the transverse momenta of the two jets.The charm and beauty quark masses (M c , M b ) follow the choices in [5].The minimum Q 2 of the inclusive data included in the fits is Q 2 min = 3.5 GeV 2 .As well as avoiding complications associated with low Q 2 , this requirement also reduces the possible influence of ln(1/x) resummation [30].An additional cut is applied on the squared hadronic final state invariant mass, W 2 = Q 2 (1 − x)/x > 10 GeV 2 , which removes data points with low Q 2 and high x that are most likely to be influenced by power-like higher twist or resummation effects.This cut influences the EIC data sets at the lowest √ s.For the central fit, the PDFs are parameterised at a starting scale for QCD evolution of µ f 0 = 1.9 GeV 2 , as in the HERAPDF2.0fits [5].
The PDFs are parameterised at the starting scale in terms of the gluon distribution (xg), the valence quark distributions (xu v , xd v ), and the u-type and d-type anti-quark distributions (x Ū , x D), where x Ū = xū corresponds to anti-up quarks only and x D = x d + xs is the sum of anti-down and anti-strange quarks.Symmetry is assumed between the sea quarks and antiquarks for each flavour.Strange quarks are suppressed relative to light quarks through a factor f s = 0.4 whereby xs = f s x D for all x.The nominal parameterisation is (1) (2) The parameters A uv and A dv are fixed using the quark counting rules and A g is fixed using the momentum sum rule.The requirement xū = x d is imposed as x → 0 through corresponding conditions on A Ū , A D, B Ū , B D and f s .There are thus a total of 14 PDF free parameters.The experimental, model, and parameterisation uncertainties on α s (M 2 Z ) are evaluated as described in [6].The modelling uncertainties are obtained through variations of Q 2 min , f s , M c and M b as shown in Table 2; the parameters are altered independently and the changes relative to the central value of α s (M 2 Z ) are added in quadrature.For the PDF parameterisation uncertainties, the procedure of [6] is followed, based on variations resulting from the addition of further D and E parameters to the expressions in Eqs. 1 -5 and changes in the starting scale µ 2 f 0 by ±0.3 GeV 2 .The fits are repeated with each of these variations and the largest difference relative to the nominal α s (M 2 Z ) is taken to be the uncertainty.The model and parameterisation uncertainties are added in quadrature in quoting the final results.For the jet data, the uncertainties in the hadronisation corrections are treated in the same manner as the experimental correlated systematic uncertainties.

Parameter
Central value Downwards variation Upwards variation Table 2: Central values of model input parameters and their one-sigma variations.It was not possible to implement the variations marked * because µ f0 < M c is required, see Ref. [6].In these cases, the uncertainty on the PDF obtained from the other variation was symmetrised.
The influence on the extracted α s (M 2 Z ) of missing orders in the perturbation series beyond NNLO is estimated via a scale uncertainty, in which the the renormalisation µ r and factorisation µ f scales are varied up and down by a factor of two.Combinations are considered in which µ r and µ f are changed together or separately and the largest resulting positive and negative deviations on α s (M 2 Z ) (with the exclusion of the two extreme combinations of the scales) is taken as the scale uncertainty.As is currently customary in global QCD fits, 3 no scale variations are made in the treatment of the inclusive data.This topic is further discussed in section 3.4.2.

Fits with EIC Inclusive Data and HERA Inclusive and Jet Data
A simultaneous NNLO fit to extract the PDFs and α s (M 2 Z ) from HERA inclusive and jet data and EIC simulated inclusive data at all five √ s values is performed as described in section 2. The result is α s (M 2 Z ) = 0.1160 ± 0.0004 (exp) +0.0003 −0.0002 (model + parameterisation) ± 0.0005 (scale).
By construction of the EIC simulated data, α s (M 2 Z ) must be close to 0.116.As expected, the PDF parameters obtained in the fits are also fully compatible with those from the HERAPDF2.0set.The uncertainties from the joint fit to HERA and EIC data can be compared with those from the HERA-PDF2.0JetsNNLO result [6]: The results and uncertainties with and without the inclusion of EIC data are shown in the form of a χ 2 scan as a function of α s (M 2 Z ) in Fig. 2. Each point in the figure corresponds to a full QCD fit, with all 14 PDF parameters free and a fixed strong coupling value.The result without EIC data corresponds exactly to the most recent HERA result [6].
Adding the simulated inclusive EIC data leads to a remarkable improvement in the estimated experimental and scale uncertainties.The source of the improvement in experimental precision is discussed in section 3.4.1.The scale uncertainty is reduced to a similar level to the combined model and parameterisation uncertainties and becomes smaller than the experimental uncertainty.This is a consequence of the reduced dependence of the fit on the jet data.The scale uncertainty is not yet evaluated for the inclusive data, as further discussed in section 3.4.2.

Fits with EIC and HERA Inclusive Data Only
The very significant impact of the EIC inclusive data on the α s (M 2 Z ) precision naturally raises the question of whether a similar result can be obtained without the HERA jet data, i.e. using only inclusive DIS measurements.A further question of interest is how important a role is played by the multiple √ s values available at the EIC.Correspondingly, further fits are performed to the following inclusive data sets with the fit procedures otherwise unchanged: • HERA inclusive data only, as already published in the HERAPDF2 paper [5]; • HERA inclusive data and the EIC simulated inclusive data described in 2.1, including all five different √ s values in Table 1; • HERA inclusive data and the EIC simulated inclusive data, separately for each of the five √ s values.
Figure 3 shows the results of this investigation.The fits to HERA data alone show only a limited dependence of the fit χ 2 on the strong coupling α s (M 2 Z ), corresponding to a relatively poor constraint [5].In contrast, the χ 2 minimum around α s (M 2 Z ) = 0.116 is very well pronounced for fits that additionally include EIC data.Although the best result is obtained when including all EIC √ s values together, the precision degrades only slightly when restricting the EIC data to only one EIC √ s value.In the latter case, the precision improves as the √ s value of the chosen EIC data decreases.The second lowest √ s value, corresponding to E e × E p = 5 × 100 GeV, is shown in Fig. 3.
The strong coupling extracted from the simultaneous fit for the PDFs and α s (M 2 Z ), using the full set of EIC pseudodata together with the HERA inclusive data, is corresponding to a total precision of better than 0.4%.As discussed in section 2.2, no scale uncertainty is quoted here.It is expected to be significantly reduced in a fit to inclusive data only relative to the result quoted in section 3.1.Section 3.4.2contains a discussion of possible ways of estimating the scale uncertainties in this case.The fit using inclusive data only is further extended to investigate the influence of the integrated luminosity of the EIC data on the α s (M 2 Z ) precision.The statistical uncertainties of the EIC data are scaled such that the pseudodata samples at each beam energy correspond to 1 fb −1 , approximately ) for the NNLO fits to HERA and jets data in addition to the simulated EIC inclusive data (top) and without the EIC data as published in [6] (bottom).The experimental, model, parameterisation, and scale uncertainties are displayed.matching the integrated luminosity of the HERA data.This results in only a small change compared with the results shown in Fig. 3.

HERA inclusive and jets + EIC inclusive
The relative importance of the different EIC beam energy configurations has also been investigated.When including only a single EIC data set with √ s = 45 GeV, the experimental uncertainty is approximately ±0.0010,only slightly more than a factor of two larger than that obtained when including all EIC √ s values and significantly better than current results from DIS data.Given that the earliest EIC data are expected to be at low √ s, this result might be obtainable after significantly less than one year of EIC data taking.If only the very lowest energy EIC dataset ( √ s = 29 GeV) is used, the uncertainty grows considerably, due to the influence of the W 2 cut.
To further test the influence of the EIC systematic uncertainty assumptions, the fit is repeated with the correlated systematic uncertainties increased by a factor of two.The uncertainty on the extracted α s (M 2 Z ) is barely influenced.Conversely, if the uncorrelated systematic uncertainties are increased by a factor of two, the uncertainty on α s (M 2 Z ) increases to around 1.7%.The uncorrelated systematic uncertainties are thus the most closely correlated with the precision on α s (M 2 Z ).

Variations in Analysis Procedure
The robustness of the extracted α s (M 2 Z ) and PDF results and their uncertainties is tested by varying the details of the fits in a number of ways.The relative sensitivity to α s of different kinematic regions within the simulated EIC data is also investigated.
To check for a possible bias from the data simulation procedure, the HERA data were replaced with pseudodata obtained using the same method as for the EIC samples.The α s (M 2 Z ) scan using the HERA pseudodata alone (Fig. 3) closely follows that of the real HERA data, with no distinct minimum observed.
The established technique for including correlated systematic uncertainties in global QCD fits treats each source of correlated uncertainty separately, whereas the EIC estimate is in terms of only a single normalisation uncertainty for each √ s, corresponding to the sum of all such sources.Studies are therefore conducted in which the correlated EIC systematic uncertainty is decomposed into the separate sources, following Table 10.5 in the EIC Yellow Report [9].The changes to the results are negligible.
The lowest Q 2 data are most likely to be influenced by missing higher orders, higher twist effects and ln(1/x) resummation effects [30].To check that the precision is not dramatically altered by excluding these data, the analysis is repeated with the Q 2 min cut increased from 3.5 GeV 2 to 10 GeV 2 or 20 GeV 2 .The distinct minima shown in Figures 2 and 3 are still observed, with only a small dependence (up to 0.2%) on Q 2 min .Excluding the lowest x EIC and HERA data such that the analysis is restricted to x > 0.001 only increases the uncertainty on the extracted α s to 0.0005, although precision is lost in the PDF determinations.If all data below x = 0.01 are excluded, the precision on α s remains at a similar level, though the PDF determination becomes over-parameterised, leading to instabilities and biases.
The restriction to W 2 > 10 GeV 2 applied here is necessary to avoid theoretical complications associated with higher twists or resummations.It removes data points with the highest x values at low Q 2 for the EIC data sets with the lowest √ s values, and has no influence on the largest √ s EIC data or the HERA data.When only the lowest energy EIC data √ s = 29 GeV 2 are included in the fit, a systematic dependence on the W 2 cut is observed, which is diluted when the higher √ s data are also included.Nonetheless, in the fit to the full HERA and EIC inclusive data, the experimental uncertainty increases from 0.34% to 0.52% when the restriction is altered to W 2 > 15 GeV 2 .This kinematic region is therefore observed to be important to the EIC α s sensitivity, motivating a full understanding of the range of validity of the theoretical framework as W becomes small.

Discussion
The precision on α s (M 2 Z ) obtained in the fits using only inclusive HERA and EIC data, and also additionally using HERA jet data, are compared in Fig. 4 with results from previous DIS studies and with extractions using a wide range of other processes.The world average of experimental measurements according to the Particle Data Group (PDG) [3] and an average from lattice QCD calculations [31] are also shown.The projected results from the current analyses show a level of precision that is significantly better than both the world average and the lattice average.This very encouraging result is subject to the caveat that no uncertainty has been included due to missing higher orders beyond NNLO in the QCD analysis.
0.115 0.12 0.125 0.  Z ) estimated using HERA and simulated EIC data, compared with extractions using other data sets and methods [6,7,[32][33][34][35][36][37], with the world average according to the PDG [3] and with an average from lattice QCD calculations [31].Scale uncertainties are not yet included in the treatment of inclusive DIS data for any of the results shown.The plotting style follows [32].

Origin of the EIC Sensitivity
The variations in the kinematic range of the fit described in section 3.3 show that the improvement in experimental precision is attributable to the addition of precise EIC pseudodata in the large x, moderate Q 2 region, complementing the kinematic coverage of the HERA data.This additional phase space coverage leads to improved precision on the Q 2 dependence of the inclusive cross section, corresponding to the logarithmic derivative of the inclusive structure function dF 2 /d ln Q 2 .At the highest x values, this quantity is driven primarily by the q → qg splitting, and therefore samples the product of α s and the large x quark densities.Since F 2 is itself a measure of the quark densities and other components of the splitting functions are known exactly at a given order, the logarithmic Q 2 derivative at large x essentially depends only on α s .This contrasts with the scaling violations at lower x, as well as the longitudinal structure function F L [38,39] and DIS jet data, all of which have leading contributions that are proportional to the product of the gluon parton distribution and α s [40][41][42].The improvement in precision can thus be traced to the decoupling of α s from the gluon density, enabled by the high x simulated EIC data.This interpretation is supported by the correlation coefficients between α s and the other free parameters in the fit.

Missing Higher Order Uncertainty
Analyses that are sensitive to strong interactions commonly include estimates of the missing higher order uncertainty (MHOU) in the perturbative QCD framework through the variation of the renormalisation and factorisation scales (often referred to as a 'scale uncertainty').As in many other dedicated and global analyses, the approach used for the jet data included here is to obtain a scale uncertainty by varying the scales by factors of two.However, in the global QCD fits to extract PDFs [43][44][45][46], MHOUs have routinely not been included in the treatment of inclusive DIS data, since they are expected to be relatively small in comparison with other PDF uncertainties.Since the present analysis adopts a perturbative QCD treatment at NNLO, it is reasonable to assume that it also has relatively small MHOUs, and the situation is expected to improve further as the state-of-the-art moves towards N 3 LO [47].Nevertheless, the MHOU associated with inclusive data must clearly be finite and cannot be ignored completely at the very high level of precision suggested in the present analysis.First studies of their influence on PDFs have been performed by the NNPDF collaboration [30], though the impact on the strong coupling was not included.There is as yet no consensus how to estimate MHOUs for inclusive DIS.Some discussion of possible methods is supplied in the following.
In a previous analysis at NLO accuracy [48], the H1 collaboration made a combined fit of inclusiveonly DIS data from HERA and from the fixed target BCDMS experiment.The strong coupling was found to be well-constrained, with the BCDMS data playing a similar role to the EIC pseuododata here.A MHOU was obtained by varying the factorisation and renormalisation scales in the usual way, resulting in a large uncertainty at the level of 4%.However, as shown in the context of PDF uncertainties in [49] (Appendix B), applying this method in an NLO analysis results in an estimate of the MHOU that is larger than the difference between NLO and NNLO results by a factor as large as 20 − 50.A similarly conservative approach might be to fit pseudodata simulated using QCD evolution at NNLO using an NLO framework and vice versa, taking the MHOU on α s (M 2 Z ) to be the deviation of the extracted α s (M 2 Z ) from the input.Applying this method to the present analysis also results in an uncertainty of around 4%, but is also likely to be a very significant over-estimate.A potentially promising approach is suggested by the NNPDF group [49].First a theory covariance matrix is computed, typically using scale variations to include missing higher order uncertainties.Including the covariance matrix explicitly in the PDF fit ensures that the theory uncertainties propagate properly, including those associated with α s (M 2 Z ) if it is a fit parameter.However, until a consensus around a welldeveloped method for including inclusive data such as this emerges, the MHOU in the present α s (M 2 Z ) extraction remains to be evaluated.

Conclusions and Outlook
This work shows that the strong coupling can be determined with potentially world-leading precision in a simultaneous fit of PDFs and α s (M 2 Z ) at NNLO in perturbative QCD, using only inclusive DIS data from HERA and simulated data from the EIC.The estimated uncertainty on the strong coupling when including one year's data at each of the five expected EIC √ s values is better than 0.4%, substantially improving on the precision of the present world experimental and lattice averages.If the EIC pseudodata are restricted to a small fraction of a standard expected year of running at a centre-of-mass energy of 45 GeV, as expected in an early phase of operation, the estimated total uncertainty is at the level of 0.9%.The improvement in precision is traceable to the large x, intermediate Q 2 region that was not accessed at HERA, but is well covered by the EIC.The constraint arises primarily from the evolution of the quark densities in this region and is largely decoupled from the uncertainty on the gluon density.It still remains to assign a meaningful uncertainty due to missing higher order contributions beyond NNLO in the theory.
Further improvements of the α s (M 2 Z ) precision may be obtainable by adding inclusive jet and dijet EIC data to the QCD analysis, for example using theory grids for the EIC energies in the fastNLO framework [50].Other observables carrying information on the strong coupling that may be measured at the EIC include event shapes, jet substructure and jet radius parameters.As well as a DIS-only approach, it would also be interesting to investigate the impact of EIC data on α s determinations in global QCD fits that also include data from the LHC and elsewhere [43][44][45][46].In the time before the start of the EIC, it is hoped that new light will be shed on the issue of higher order uncertainties, leading to a consensus on how they should be treated in α s (M 2 Z ) determinations relying on EIC data.

Figure 1 :
Figure 1: The locations in the (x, Q 2 ) plane of the HERA and EIC neutral current inclusive DIS data points included in the analysis.

Figure 2 :
Figure 2: ∆χ 2 = χ 2 − χ 2min vs. α s (M 2 Z ) for the NNLO fits to HERA and jets data in addition to the simulated EIC inclusive data (top) and without the EIC data as published in[6] (bottom).The experimental, model, parameterisation, and scale uncertainties are displayed.

Figure 3 :
Figure 3: ∆χ 2 = χ 2 − χ 2 min vs. α s (M 2 Z ) for the NNLO fits to HERA data on inclusive ep scattering only (black), and also with the addition of simulated EIC inclusive data for all five √ s values together (red) or for only √ s = 45 GeV (blue).The black full points are taken from [5].

Figure 4 :
Figure 4: Projected total uncertainties on the strong coupling constant α s (M 2Z ) estimated using HERA and simulated EIC data, compared with extractions using other data sets and methods[6,7,[32][33][34][35][36][37], with the world average according to the PDG[3] and with an average from lattice QCD calculations[31].Scale uncertainties are not yet included in the treatment of inclusive DIS data for any of the results shown.The plotting style follows[32].

HERA and EIC Inclusive Data HERA Incl + Jet and EIC Incl Data World Average Lattice Average H1 and ZEUS Inclusive + Jet Data H1 Inclusive Jet/Dijet Data ZEUS Incl. Jet Data Electroweak Fit Jets and Shapes
s α