1 Introduction

More than a century after its discovery by Rutherford [1], the proton is still at the core of intense research activity. Among other aspects, its mechanical properties, described through the quantum-chromodynamics (QCD) energy–momentum tensor (EMT), have attracted significant attention in the last decade, yielding many theoretical (see for instance [2,3,4]), phenomenological (for instance [5,6,7]), lattice (as in [8,9,10]) and continuum studies (see e.g. [11, 12]). For a recent review, see [13]. The reason for this interest is that the macroscopic properties of the proton, such as its mass or its spin, are expected to emerge from the microscopic interactions between quarks and gluons. The computation of the quark and gluon contributions to the macroscopic properties of the proton, and the comparison with the experimental extraction of these contributions, has become one of the main objectives of modern hadron physics.

The connection to experimental data is certainly one of the main factors explaining the recent interest in the EMT. Indeed, it was shown almost three decades ago [14] that one can gain indirect experimental access to the EMT via generalised parton distributions (GPDs) [14,15,16,17,18]. The latter enter the description of deep exclusive processes according to QCD factorisation theorems [19, 20]. One can for instance highlight deeply virtual Compton scattering (DVCS) [16], timelike Compton scattering (TCS) [21], deep virtual meson production (DVMP) [22], multiparticle production [23,24,25] or single diffractive hard exclusive processes [26,27,28]. However, the main source of experimental information today remains DVCS, which was measured at several facilities over the last two decades (see e.g. [29,30,31,32,33,34,35,36]) and which is currently the core of an intense experimental program at the Thomas Jefferson Laboratory in the USA and at COMPASS at CERN.

This experimental effort has triggered important theoretical interest in DVCS, which is today the deep exclusive process with the clearest theoretical framework. Higher-order corrections up to next-to-next-to-leading order (NNLO) have been derived [37]. Higher-twist kinematic corrections are also available [38]. However, despite its mature theoretical description, DVCS does not allow an unambiguous extraction of GPDs. The reason is to be found in the so-called deconvolution problem [39, 40], that is, the ill-posedness of the inverse problem relating DVCS form factors to GPDs, embodied by the notion of shadow GPDs. One form factor of the EMT has been particularly studied, since it can be accessed from a dispersive analysis of DVCS [41] without requiring the full extraction of the GPDs. Indeed, this form factor depends only on the Polyakov–Weiss D-term [42], accessible directly from the real and imaginary parts of the Compton Form Factors. However, this extraction is also plagued by shadow D-term contributions, hinted at in [7] and presented in greater detail here.

In this paper, we assess the feasibility of independently extracting the genuine quark and gluon contributions to the pressure and shear forces inside the proton from DVCS data, taking into account shadow D-terms and evolution. After introducing our notations and conventions in Sect. 2, we propose in Sect. 3 a new derivation of the dispersion relations of DVCS amplitudes at any order of perturbation theory. This presentation highlights that dispersion relations can provide more information than solely the canonical one concerning the D-term. We present the deconvolution problem and introduce the notion of shadow D-terms in Sect. 4. Then, we apply our formalism to an existing global fit in Sect. 5 and present the results of the first next-to-leading order extraction. In Sect. 6, we investigate more closely the impact of shadow D-terms on a kinematic range relevant to future facilities, in view of the electron-ion collider (EIC).

2 EMT experimental access through GPDs

The proton matrix element of the local gauge-invariant QCD energy–momentum tensor (EMT) operator can be parameterised in terms of five gravitational form factors (GFFs) \(A_{a}(t),\) \(B_{a}(t),\) \(C_{a}(t),\) \({\bar{C}}_{a}(t)\) and \(D_{a}(t)\) [43, 44] where \(t=\varDelta ^2\) with \(\varDelta =p'-p\) the four-momentum transfer to the proton (see Ref. [7]):

$$\begin{aligned}&{\left\langle p', s'\right| } T^{\mu \nu }_a(0){\left| p, s \right\rangle }\nonumber \\&\quad ={\bar{u}}(p', s') \Bigg \{ \frac{P^\mu P^\nu }{M}\,A_a(t) + M \eta ^{\mu \nu }{\bar{C}}_a(t) \nonumber \\&\qquad + \frac{\varDelta ^\mu \varDelta ^\nu - \eta ^{\mu \nu }\varDelta ^2}{M}\, C_a(t)+ \frac{P^{[\mu } i\sigma ^{\nu ]\rho }\varDelta _\rho }{4M}\,D_a(t) \nonumber \\&\qquad + \frac{P^{\{\mu } i\sigma ^{\nu \}\rho }\varDelta _\rho }{4M}\left[ A_a(t)+B_a(t)\right] \Bigg \} u(p, s). \end{aligned}$$
(1)

The label a denotes either the quark flavour \((a=q)\) or the gluon \((a=g)\) contribution to the EMT. These GFFs allow one to define various distributions of so-called mechanical properties of the proton, like the distributions of pressure and shear stress induced by the nucleon’s partonic structure [2, 3, 45]. Some of these GFFs are accessible thanks to their remarkable relation to generalised parton distributions (GPDs), introduced in [14,15,16,17,18], such as [46]:

$$\begin{aligned} \int _{-1}^{1}\textrm{d}x\,x^{1-p_a}\,H^a(x,\xi ,t)&=A_a(t)+4\xi ^2C_a(t) , \end{aligned}$$
(2)
$$\begin{aligned} \int _{-1}^{1}\textrm{d}x\,x^{1-p_a}\,E^a(x,\xi ,t)&=B_a(t)-4\xi ^2C_a(t) , \end{aligned}$$
(3)

where \(p_q = 0,\) \(p_g = 1,\) and \(H^a(x, \xi , t)\) and \(E^a(x, \xi , t)\) are leading-twist chiral-even GPDs depending on x, the average longitudinal light-front momentum fraction of the active parton, and \(\xi ,\) the skewness variable describing the transfer of longitudinal light-front momentum to the system. We use the conventions of [46], where the skewness is defined as \(\xi = -\varDelta ^+/2P^+ = (p^+ - p'^+) / (p^+ + p'^+).\) Then \(\xi \) is bounded in \([-1, 1].\) The link between GFFs and GPDs offers a unique opportunity for experimental access to the mechanical properties of hadron matter, thanks to the sensitivity to GPDs of a wide class of exclusive experimental channels. We also highlight that the specific polynomial \(\xi \) dependence of the Mellin moments of Eqs. (2) and (3) is in fact a general property called polynomiality [47, 48], which is generalised to higher moments as:

$$\begin{aligned} \int _{-1}^1\textrm{d}x\,x^{m-p_a} H^a(x,\xi ,t)&= \sum _{j=0}^{\left[ \frac{m}{2}\right] } A_{m;2j}^a(t) (2\xi )^{2j}\nonumber \\&\quad +\text {mod}(2,m)(2\xi )^{m+1}C_m^a(t), \end{aligned}$$
(4)
$$\begin{aligned} \int _{-1}^1\textrm{d}x\,x^{m-p_a} E^a(x,\xi ,t)&= \sum _{j=0}^{\left[ \frac{m}{2}\right] } B_{m;2j}^a(t) (2\xi )^{2j}\nonumber \\&\quad -\text {mod}(2,m)(2\xi )^{m+1}C_m^a(t), \end{aligned}$$
(5)

where \([\dots ]\) is the floor function and \(\text {mod}(2,m)\) is 0 for m even and 1 for m odd. The polynomiality property is equivalent to the so-called double distribution formalism introduced independently in [15, 18] (see also [49, 50] for a modern picture of the connection between the two). The double distributions \(F^a\) and \(K^a\) are connected to the GPDs through:

$$\begin{aligned} H^a(x,\xi ,t)&= \int _\varOmega \text {d}\beta \text {d}\alpha \bigg [\beta ^{p_a}F^a(\beta ,\alpha ,t) \nonumber \\&\quad +\xi ^{1+p_a} D^a(\alpha ,t)\delta (\beta ) \bigg ] \times \delta (x-\beta -\alpha \xi ), \end{aligned}$$
(6)
$$\begin{aligned} E^a(x,\xi ,t)&= \int _\varOmega \text {d}\beta \text {d}\alpha \bigg [\beta ^{p_a} K^a(\beta ,\alpha ,t)\nonumber \\&\quad -\xi ^{1+p_a} D^a(\alpha ,t)\delta (\beta ) \bigg ] \times \delta (x-\beta -\alpha \xi ), \end{aligned}$$
(7)

with \(\varOmega = \{(\alpha ,\beta )\,|\, |\alpha |+|\beta |\le 1\}\) and where \(D^a\) is the so-called Polyakov–Weiss D-term, whose first Mellin moment yields the GFF \(C_a(t)\) of Eq. (2):

$$\begin{aligned} C_a(t) = \frac{1}{4}\int _{-1}^{1}\textrm{d}\alpha \,\alpha ^{1-p_a} D^a(\alpha ,t). \end{aligned}$$
(8)
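As a quick numerical illustration of Eq. (8), the sketch below evaluates \(C_q\) for a toy D-term truncated to its first Gegenbauer mode; the coefficient \(d_1\) is a purely hypothetical input, not a fit result.

```python
from scipy.integrate import quad

# Toy D-term truncated to its first Gegenbauer mode (hypothetical d1):
# D^q(alpha) = d1 * (1 - alpha^2) * C_1^{3/2}(alpha), with C_1^{3/2}(a) = 3a.
d1 = -2.0
D = lambda a: d1 * (1.0 - a * a) * 3.0 * a

# Eq. (8) for quarks (p_q = 0): C_q = (1/4) * int_{-1}^{1} alpha * D^q(alpha) dalpha
Cq, _ = quad(lambda a: 0.25 * a * D(a), -1.0, 1.0)
# for this one-mode model the integral gives C_q = d1 / 5 analytically
```

For this one-mode ansatz the quadrature reproduces the analytic value \(C_q = d_1/5\) to machine precision.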

However, the unambiguous model-independent extraction of GPDs from one of the most promising current channels, namely deeply virtual Compton scattering (DVCS), has already been demonstrated to be practically unfeasible in Ref. [39]. The reason is to be found in the relation between the DVCS amplitude, parametrised by the Compton Form Factors \(({\mathcal {H}},\) \({\mathcal {E}},\ldots ),\) and the GPDs:

$$\begin{aligned} {\mathcal {H}}^a(\xi ,t,Q^2) = \int _{-1}^1 \frac{\text {d}x}{\xi } T^a\left( \frac{x}{\xi },\frac{Q^2}{\mu ^2},\alpha _s \right) \frac{H^a(x,\xi ,t,\mu ^2)}{\xi ^{p_a}}, \end{aligned}$$
(9)

where \(T^a\) is the DVCS coefficient function. The convolution turns out not to be numerically invertible on any relevant range in \(Q^2,\) the virtuality of the photon mediating the interaction between the lepton beam and the proton target in DVCS, yielding out-of-control uncertainties. This situation can partly be tamed by exploiting the theoretical constraints applied on GPDs [40] (see also [51] for an exhaustive list of these properties), but theoretical uncertainties remain significant.

In this context, the GFF \(C_a(t)\) has attracted specific interest since it does not require a full extraction of GPDs, but is instead sensitive to the D-term only. The latter can be probed more specifically in a dispersive formalism of DVCS [41, 52, 53]. A careful analysis of the world DVCS data using this dispersive formalism at leading order (LO) was performed in Ref. [7], using realistic uncertainties on DVCS form factors coming from a neural network analysis [54]. The dependence on \(Q^2\) was taken into account through the use of evolution equations for the scale dependence of the D-term, but the LO approach did not take into account any direct gluon contribution to the subtraction constant. Our objective in this paper is to propose a full next-to-leading order (NLO) treatment, whose relevance becomes pressing given the lever arm in \(Q^2\) promised by future collider experiments, in particular those to be conducted at the electron-ion collider (EIC) [55, 56], the Chinese electron-ion collider (EicC) [57, 58] or the large hadron-electron collider (LHeC) [59].

3 DVCS dispersion relations beyond leading order

In this section, we provide a brief summary of the state of the art regarding DVCS dispersion relations and provide an alternative proof beyond leading order. We also generalise the dispersion relation to an arbitrary number of subtractions, allowing a new way to connect moments of GPDs with experimental data. Note however that we focus only on s- and u-channel dispersion relations, which constrain the x and \(\xi \) dependence of GPDs. t-channel dispersion relations have been used in [60, 61], and are beyond the scope of this paper. In [62], dispersion relations for a spin-0 particle were verified in an analytical calculation of the D-term at one-loop order in \(\varPhi ^4\) theory.

3.1 State of the art and dispersion relations at LO and NLO

Dispersion relations at Born order were first derived in Refs. [41, 52]. The authors took advantage of the explicit leading-order expression of the CFF in terms of GPDs to derive the dispersion relations. In a nutshell, at LO, the connection between GPDs and CFFs is given by:

$$\begin{aligned} \Re {\mathcal {H}}^q(\xi ,t)&\overset{\textrm{LO}}{=}e_q^2\,\mathrm {P}\!\!\int _{-1}^{1}\textrm{d}x\,H^q(x,\xi ,t)\left[ \frac{1}{\xi -x}-\frac{1}{\xi +x}\right] , \end{aligned}$$
(10)
$$\begin{aligned} \Im {\mathcal {H}}^q(\xi ,t)&\overset{\textrm{LO}}{=}\pi e_q^2\left[ H^q(\xi ,\xi ,t)-H^q(-\xi ,\xi ,t)\right] , \end{aligned}$$
(11)

where \(\mathrm {P}\!\!\int \) indicates that the integrals are regularised through the Cauchy principal value prescription. Using the dispersion relation relating the real and imaginary parts of the Compton Form Factor:

$$\begin{aligned} \Re {\mathcal {H}}^q(\xi ,t) = {\mathcal {S}}^q(t) + \frac{1}{\pi }\,\mathrm {P}\!\!\int _{0}^{1}\textrm{d}\xi '\,\Im {\mathcal {H}}^q(\xi ',t)\left[ \frac{1}{\xi -\xi '}-\frac{1}{\xi +\xi '}\right] , \end{aligned}$$
(12)

one introduces \({\mathcal {S}}^q,\) the so-called subtraction constant associated with a flavour q. Combining Eqs. (10), (11) and (12), one gets the following expression for the subtraction constant at LO:

$$\begin{aligned} {\mathcal {S}}^q(t) \overset{\textrm{LO}}{=}e_q^2\,\mathrm {P}\!\!\int _{-1}^{1}\textrm{d}x\left[ H^q(x,\xi ,t)-H^q(x,x,t)\right] \left[ \frac{1}{\xi -x}-\frac{1}{\xi +x}\right] . \end{aligned}$$
(13)

Expanding \(H^q(x,\xi )\) as a Taylor series around \(x=\xi ,\) and using the polynomiality condition (4) or equivalently the DD representation (6), one recovers the well-known leading-order relation between the subtraction constant and the D-term:

$$\begin{aligned} {\mathcal {S}}^q(t) \overset{\textrm{LO}}{=}2e_q^2\int _{-1}^1 \text {d}z \frac{D^q(z,t)}{1-z}. \end{aligned}$$
(14)
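To make Eq. (14) concrete, here is a minimal numerical sketch with the same kind of one-mode toy D-term; the values of \(d_1\) and of the squared quark charge are illustrative assumptions.

```python
from scipy.integrate import quad

# One-mode toy D-term (hypothetical normalisation d1):
# D^q(z) = 3 * d1 * z * (1 - z^2). Since D(+-1) = 0, D(z)/(1-z) is regular at z = 1.
d1 = -2.0
e_q2 = 4.0 / 9.0  # squared fractional charge of an up quark, (2/3)^2
D = lambda z: 3.0 * d1 * z * (1.0 - z * z)

# LO subtraction constant, Eq. (14)
Sq, _ = quad(lambda z: 2.0 * e_q2 * D(z) / (1.0 - z), -1.0, 1.0)
# for this model, S^q = 4 * e_q2 * d1 analytically
```

The integrand is regular at the endpoint because the D-term vanishes at \(z = \pm 1,\) so an ordinary quadrature suffices.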

DVCS dispersion relations beyond the Born order were considered in Ref. [53], whose analysis is based on the dispersion relation of Eq. (12). Then, introducing GPDs through the leading-twist all-order factorisation theorem (Eq. (9)) as well as the polynomiality property, the authors derive the general form of the subtraction constant as:

$$\begin{aligned} {\mathcal {S}}^q(t) = \frac{2}{\pi }\int _{1}^{\infty }\textrm{d}\omega \,\Im T^q(\omega )\int _{-1}^{1}\textrm{d}\alpha \,\frac{D^q(\alpha ,t)}{\omega -\alpha }. \end{aligned}$$
(15)

To the best of our knowledge, this is to date the sole study on the topic of dispersion relations beyond Born order.

This integral generalises the expression of the subtraction constant at higher order in perturbation theory, at the price of introducing a second integration variable, to be integrated to infinity. In the following, we present another approach to generalise dispersion relations at higher orders, allowing us to write the subtraction constant as a single integral over the D-term.

3.2 Analytic properties of the Compton Form Factors

Fig. 1

Diagram for the abstract 2-particle process considered

In order to derive an alternative expression for the DVCS dispersion relation at higher pQCD orders, we need to recall the analytic properties of the CFF, which is a special case of a 2-particle scattering amplitude (see Fig. 1). These types of amplitudes are fully characterised by the three Mandelstam variables s, t and u defined as:

$$\begin{aligned} s&= (p +q )^2 = (p'+q')^2 \end{aligned}$$
(16)
$$\begin{aligned} t&= (p'-p)^2 = (q'-q)^2 \end{aligned}$$
(17)
$$\begin{aligned} u&= (p'-q)^2 = (p-q')^2 \end{aligned}$$
(18)

such that \(s+t+u = p^2+q^2+q'^2+p'^2.\) In the following, we will work at fixed t and fixed \(q^2\) for a real outgoing photon, such that the amplitude can be fully described by either s or u. To distinguish between the two, we will write \({\mathcal {F}}_s(s)\) and \({\mathcal {F}}_u(u)\) respectively. From that point, several postulates allow us to define an analytic continuation to the complex plane. We briefly give them here, but more details can be found in Ref. [63].

  1.

    Causality allows us to extend the physical amplitude to the upper-half complex plane \({\mathcal {P}}^+_s,\) such that:

    $$\begin{aligned} {\mathcal {F}}_s(s) = \lim _{\epsilon \rightarrow 0} {\mathcal {F}}_s (s+i \epsilon ). \end{aligned}$$
    (19)
  2.

    The Schwarz reflection principle allows us to extend the upper-half-plane analytic continuation to the lower one following:

    $$\begin{aligned} {\mathcal {F}}_s(s^*) = {\mathcal {F}}^*_s(s), \end{aligned}$$
    (20)

    and provided that \({\mathcal {F}}_s\) is real on at least a segment of the real axis. It immediately follows:

    $$\begin{aligned} \lim _{\epsilon \rightarrow 0}\text {Re}\left( {\mathcal {F}}_s(s+i\epsilon )\right)&= \lim _{\epsilon \rightarrow 0}\text {Re}\left( {\mathcal {F}}_s(s-i\epsilon )\right) , \end{aligned}$$
    (21)
    $$\begin{aligned} \lim _{\epsilon \rightarrow 0}\text {Im}\left( {\mathcal {F}}_s(s+i\epsilon )\right)&= -\lim _{\epsilon \rightarrow 0}\text {Im}\left( {\mathcal {F}}_s(s-i\epsilon )\right) , \end{aligned}$$
    (22)

    and thus \({\mathcal {F}}\) is discontinuous on the real axis wherever its imaginary part does not vanish.

  3.

    The imaginary part of the amplitude can be computed from the optical theorem, and corresponds to the sum of all possible on-shell intermediate states. Single-particle stable states are responsible for poles on the real axis, while multi-particle states give rise to branch cuts. Importantly, if s, t and u are all space-like, then no on-shell intermediate state is allowed and the amplitude is real and continuous on the real axis.

Until now we have discussed the analytic continuation in the complex plane of \({\mathcal {F}}_s(s),\) but the same procedure could be applied to \({\mathcal {F}}_u(u).\) The two variables are connected through:

$$\begin{aligned} s+u = 2p^2 +q^2 -t = \varSigma \end{aligned}$$
(23)

and therefore, \({\mathcal {F}}_s(s)\) and \({\mathcal {F}}_u(u)\) are expected to be related by crossing symmetry and thus by analytic continuation. However, in the case of CFFs, because one of the outgoing particles is massless (a real photon), to the best of our knowledge there is no formal proof showing how to connect \({\mathcal {F}}_s(s)\) and \({\mathcal {F}}_u(u).\) We therefore stick to the standard assumption stating that the analytic continuations of the amplitudes \({\mathcal {F}}_s(s)\) and \({\mathcal {F}}_u(u)\) on their respective physical upper half-planes connect through a real interval between the two thresholds, as highlighted in Fig. 2.

Fig. 2

Representation of half-planes \({\mathcal {P}}_s^+\) and \({\mathcal {P}}_u^+\) aligned according to the crossing condition. Singularities for the u and s channel can appear only on the highlighted intervals

In the specific case of DVCS, we are able to easily identify such an interval. Indeed, since \(q^2\) is negative and much larger in absolute value than the hadron mass squared and \(|t|,\) \(\varSigma \) is negative. We can thus identify an interval between \((s=0,u=\varSigma )\) and \((s=\varSigma ,u=0)\) where both \({\mathcal {F}}_s(s)\) and \({\mathcal {F}}_u(u)\) are real, allowing us to define the analytic continuation between the amplitudes. We also highlight that in the case \(\varSigma > 0,\) the crossing structure is more complicated, as the connection has to be done through the respective cuts of the amplitudes. This situation is expected to arise in the case of timelike Compton scattering (TCS), where the hard scale is provided by a deeply timelike outgoing virtual photon. We do not consider this case in the present analysis.

The last step to characterise the properties of the CFF is to describe their singularity structure in the Bjorken limit. We recall that in this limit, all masses, thresholds, and |t| are very small compared to \(Q^2 = -q^2.\) Let us introduce the variable \(\nu \) (differing by a factor 2 from the one of [53])

$$\begin{aligned} \nu = \frac{s-u}{\varSigma } \overset{\textrm{Bj}}{\approx } \frac{u-s}{Q^2} \overset{\textrm{Bj}}{\approx } \frac{1}{\xi }, \end{aligned}$$
(24)

such that for \(\nu = 1,\) \((s=\varSigma , u=0)\) and for \(\nu =-1,\) \((s=0,u=\varSigma ).\) Consequently, the CFF \({\mathcal {F}}(\nu )\) is real and continuous for \(\nu \in ]-1,1[,\) and analytic on the entire complex plane except for \(\nu \in ]-\infty ,-1] \cup [1,\infty [ = {\mathcal {P}}.\) This structure is simple and is illustrated in Fig. 3. The direct and crucial consequence is that for any \(|\nu | < 1,\) one can write the CFFs as:

$$\begin{aligned} {\mathcal {F}}(\nu ) = \sum _{j=0}^{\infty }f_j \nu ^j \end{aligned}$$
(25)

or in terms of the variable \(\xi ,\) for \(|\xi |>1\):

$$\begin{aligned} {\mathcal {F}}(\xi ) = \sum _{j=0}^{\infty }f_j \frac{1}{\xi ^j}. \end{aligned}$$
(26)

Note that because of the Schwarz principle, the \(f_j\) are all real and uniquely define the analytic continuation in the complex plane. This generalises the proof of analyticity provided in Ref. [52], which was based on a Taylor expansion of the LO DVCS kernel in the unphysical region \(|\xi |>1.\)

Fig. 3

Analytic structure of the amplitude in the \(\nu \)-plane

3.3 All order dispersion relation with arbitrary subtraction

The analyticity of the CFFs in Eq. (26) is a major result, and our next goal is to connect the \(f_j\) coefficients with the associated GPDs. To derive these relations, we first go back to Eq. (9), where the factorisation theorem is applied to write the CFF as a convolution of the coefficient function T with the GPD H. The \(\xi \) dependence of this formula can be simplified by using the double distribution representation of GPDs. Thus, introducing Eq. (6) into (9), one obtains:

$$\begin{aligned} {\mathcal {H}}^q = \frac{1}{\xi }\int _{\varOmega }T^q\left( \alpha +\frac{\beta }{\xi }\right) F^q(\beta , \alpha ) \,\textrm{d}\beta \textrm{d}\alpha + h_0, \end{aligned}$$
(27)

where we define:

$$\begin{aligned} h_0 = \int _{-1}^1T^q(\omega )D^q(\omega )\,\textrm{d}\omega . \end{aligned}$$
(28)
Fig. 4

The range of the argument inside function T

Note that we have assumed that the Radon transform and the convolution over x can be interchanged. From Eq. (27), one sees that the analytic properties of \({\mathcal {H}}\) are a direct consequence of the analytic properties of T. In fact, these properties are the same, since both \({\mathcal {H}}\) and T describe the scattering of two particles. T(z) is analytic for \(|z|< 1,\) and in our case for \(|\xi | > 1\):

$$\begin{aligned} -1< \alpha - \left| \frac{\beta }{\xi }\right| \le \alpha +\frac{\beta }{\xi }\le \alpha + \left| \frac{\beta }{\xi }\right| < 1 , \end{aligned}$$
(29)

as the support of the DD is limited to \(\{(\alpha ,\beta )\,|\, |\alpha |+ |\beta |\le 1\}.\) Consequently, for every value of \(\alpha \) in the DD support, \(T(\alpha + \beta /\xi )\) is analytic in the unphysical region (see Fig. 4), and can thus be Taylor expanded around \(\alpha \) into:

$$\begin{aligned} T\left( \alpha +\frac{\beta }{\xi }\right) = \sum _{n=0}^\infty \frac{1}{n!} \frac{\partial ^n T}{(\partial \alpha )^n}(\alpha ) \left( \frac{\beta }{\xi }\right) ^n, \quad \text {for}\ |\xi | >1. \end{aligned}$$
(30)

Injecting Eq. (30) into (27), we can compute the coefficients introduced in Eq. (26) in terms of moments of the DD \(F^q\) and derivatives of the coefficient function T:

$$\begin{aligned} {\mathcal {H}}(\xi )&= \sum _{j=0}^\infty h_j \frac{1}{\xi ^j} \quad \text {for } |\xi | >1, \end{aligned}$$
(31)
$$\begin{aligned} h_0&= \int _{-1}^1 T^q(\omega ) D^q(\omega )\,\textrm{d}\omega \end{aligned}$$
(32)
$$\begin{aligned} h_{j+1}&= \frac{1}{j!}\int _{-1}^1\text {d}\alpha \frac{\partial ^j T}{(\partial \alpha )^j}(\alpha ) \int _{-1+|\alpha |}^{1-|\alpha |}\beta ^j F^q(\beta ,\alpha )\,\textrm{d}\beta . \end{aligned}$$
(33)
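Equations (27) and (31)–(33) can be cross-checked numerically on a toy example. We assume a Born-like kernel \(T(z) = 1/(1-z) - 1/(1+z)\) and the hypothetical double distribution \(F^q(\beta ,\alpha ) = \beta \) on the support \(|\alpha |+|\beta |\le 1\) (neither is meant to be realistic); the direct convolution at \(|\xi |>1\) should then match the truncated series built from Eq. (33).

```python
import numpy as np
from scipy.integrate import quad, dblquad

# Toy ingredients (illustrative assumptions, not the actual pQCD kernel):
# a Born-like kernel T and a quark DD F(beta, alpha) = beta on |alpha| + |beta| <= 1.
T = lambda z: 1.0 / (1.0 - z) - 1.0 / (1.0 + z)

xi = 3.0  # |xi| > 1: unphysical region, where the Taylor expansion of T holds

# Direct convolution of Eq. (27), with a vanishing D-term (h_0 = 0):
direct, _ = dblquad(
    lambda beta, alpha: T(alpha + beta / xi) * beta / xi,
    -1.0, 1.0,                      # alpha range
    lambda a: -(1.0 - abs(a)),      # beta lower limit on the DD support
    lambda a: 1.0 - abs(a),         # beta upper limit
)

# Coefficients h_{j+1} of Eq. (33). For this DD the beta-moment vanishes for
# even j, and for odd j one has d^j T/d alpha^j = j! [(1-a)^{-j-1} + (1+a)^{-j-1}].
def h(jp1):
    j = jp1 - 1
    if j % 2 == 0:
        return 0.0
    integrand = lambda a: ((1.0 - a) ** (-j - 1) + (1.0 + a) ** (-j - 1)) \
        * 2.0 * (1.0 - abs(a)) ** (j + 2) / (j + 2)
    return quad(integrand, -1.0, 1.0)[0]

series = sum(h(k) / xi**k for k in range(1, 10))
```

Both evaluations agree at the level of the series truncation error. For this choice the odd coefficients vanish, consistently with the \(\xi \)-evenness of the CFF.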

This completes our first goal, but these results are restricted to the unphysical region. One needs to use the dispersion relation to derive relations between these coefficients in the physical region. To do this, we define the following \({\mathcal {I}}_n\) integrals in the complex plane:

$$\begin{aligned} {\mathcal {I}}_n(\xi )&= \oint _{\varGamma _R} \frac{{\mathcal {H}}(\xi ')}{\xi '-\xi }\left( \frac{\xi '}{\xi }\right) ^n\,\textrm{d}\xi ' \nonumber \\&= \sum _j h_j \oint \frac{\xi '^{-j}}{\xi '-\xi }\left( \frac{\xi '}{\xi }\right) ^n\,\textrm{d}\xi ', \end{aligned}$$
(34)

where the contour \(\varGamma _R\) is illustrated in Fig. 5 and chosen such that the CFF is analytic all along it. Note that in Eq. (34), \(\xi '\) is in the unphysical region, allowing us to expand the CFF \({\mathcal {H}},\) but \(\xi \) can be safely chosen in the physical region. Indeed, taking \(\xi \in [-1,1],\) one gets:

$$\begin{aligned} \oint _{\varGamma _R} \frac{\xi '^{-k}}{\xi '-\xi } \,\textrm{d}\xi ' ={\left\{ \begin{array}{ll} 0 & \text {if}\ k > 0 \\ 2i\pi \xi ^{-k} & \text {if}\ k \le 0 \end{array}\right. } , \end{aligned}$$
(35)

as in the case \(k > 0,\) the contributions of the two poles exactly compensate each other. We deduce:

$$\begin{aligned} {\mathcal {I}}_n(\xi ) = 2\pi i \sum _{j=0}^{n} h_j \frac{1}{\xi ^j}, \end{aligned}$$
(36)

showing that the \({\mathcal {I}}_n\) truncate Eq. (31) to order n.
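The contour identities of Eqs. (34)–(36) can be verified numerically. We use the toy CFF \({\mathcal {H}}(z) = z\log ((z+1)/(z-1)),\) chosen purely for illustration because it is analytic for \(|z|>1\) and its expansion coefficients are known in closed form, \(h_{2j} = 2/(2j+1)\) with vanishing odd coefficients:

```python
import numpy as np

# Toy CFF (illustration only): H(z) = z*log((z+1)/(z-1)) = sum_j [2/(2j+1)] z^(-2j),
# analytic for |z| > 1, so Gamma_R can be taken as a circle of radius R = 2.
H = lambda z: z * np.log((z + 1.0) / (z - 1.0))

R, N = 2.0, 4096
theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
zp = R * np.exp(1j * theta)         # points on the contour Gamma_R
dz = 1j * zp * (2.0 * np.pi / N)    # dz = i*z*dtheta (trapezoidal rule)

xi = 0.5  # evaluation point in the physical region
I_vals = [np.sum(H(zp) / (zp - xi) * (zp / xi) ** n * dz) for n in (0, 1, 2)]

# Eq. (36): I_n = 2*pi*i * sum_{j <= n} h_j / xi^j, with h_j = 2/(j+1) for even j
expected = [2j * np.pi * sum(2.0 / (k + 1.0) * xi ** (-k) for k in range(0, n + 1, 2))
            for n in (0, 1, 2)]
```

Here `2j` is Python's literal for the imaginary number \(2i.\) The trapezoidal rule converges geometrically for periodic analytic integrands, so the truncation of the series at order n is reproduced to machine precision; note also that \(n=0\) and \(n=1\) give the same value since \(h_1 = 0.\)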

The next step is to connect \({\mathcal {I}}_n\) with the value of the CFF within the physical region. To do so, we can deform the integration contour from \(\varGamma _R\) to \(\varGamma '\) such that:

$$\begin{aligned} {\mathcal {I}}_n(\xi )&= \lim _{\epsilon \rightarrow 0^+}\int _{-1}^{1}\textrm{d}\xi '\left( \frac{\xi '}{\xi }\right) ^n\bigg \{ {\mathcal {H}}(\xi '-i\epsilon )\left[ \mathrm {P}\frac{1}{\xi '-\xi }+i\pi \delta (\xi '-\xi )\right] \nonumber \\&\quad -{\mathcal {H}}(\xi '+i\epsilon )\left[ \mathrm {P}\frac{1}{\xi '-\xi }-i\pi \delta (\xi '-\xi )\right] \bigg \} , \end{aligned}$$
(37)

where we used the Sokhotski–Plemelj formula. The Schwarz reflection principle of Eq. (20) allows us to rewrite the formula in terms of the real and imaginary parts of the CFF:

$$\begin{aligned} {\mathcal {I}}_n(\xi ) = 2i\pi \,\Re {\mathcal {H}}(\xi ) + 2i\,\mathrm {P}\!\!\int _{-1}^{1}\textrm{d}\xi '\left( \frac{\xi '}{\xi }\right) ^n\frac{\Im {\mathcal {H}}(\xi ')}{\xi '-\xi } , \end{aligned}$$
(38)

after safely taking the limit \(\epsilon \rightarrow 0^+.\) An additional subtlety comes from the fact that \(\Im [{\mathcal {H}}]\) is defined as the limit coming from the upper half-plane (see Eq. (19)) in terms of the Mandelstam variables, and thus of \(\nu .\) Since \(\xi = 1/\nu ,\) we have:

$$\begin{aligned} \Im {\mathcal {H}} = \lim _{\epsilon \rightarrow 0^+} \Im \,{\mathcal {H}}(\nu +i\epsilon ) = \lim _{\epsilon \rightarrow 0^+}\Im \,{\mathcal {H}}(\xi - i\epsilon ), \end{aligned}$$
(39)

triggering a plus sign in front of the integral.

One should realise that the result in Eq. (38) is conditional on the behaviour of \({\mathcal {H}}\) within the integration contour, and in particular for \(\xi \rightarrow 0.\) Indeed, the contour deformation of Eq. (37) can be performed only if the integrand remains integrable on the contour, especially for \(\xi \rightarrow 0\) (or equivalently \(\nu \rightarrow \infty ).\) We expect the CFF to present a Regge behaviour in \(\xi ^{-\alpha }\) for \(\xi \rightarrow 0.\) Then expressions (37) and (38) for \({\mathcal {I}}_n\) are only valid for \(n>\alpha -1.\)

Fig. 5

The contours used in the proof of dispersion relations. In red, the singularities and branch cuts

Combining Eqs. (36) and (38) we can deduce the general expression for the n-times subtracted dispersion relation at any order of perturbation theory:

$$\begin{aligned} \Re {\mathcal {H}}(\xi ) = \sum _{j=0}^{n} h_j\frac{1}{\xi ^j} - \frac{1}{\pi }\,\mathrm {P}\!\!\int _{-1}^{1}\textrm{d}\xi '\left( \frac{\xi '}{\xi }\right) ^n\frac{\Im {\mathcal {H}}(\xi ')}{\xi '-\xi } , \end{aligned}$$
(40)

with the \(h_j\) given in Eqs. (32) and (33). It is easy to verify that \(h_0\) is in fact the subtraction constant \({\mathcal {S}}^q\) we introduced in Eq. (12).

3.4 New expression for the subtraction constant and consistency with previous results

Equation (40) is the key result of this section, allowing us to connect the real and imaginary parts of the CFF to the D-term and the double distribution. Yet, it can be simplified using the symmetries of DVCS and GPDs. Importantly, the CFFs are even in \(\xi ,\) and thus \({\mathcal {H}}(\xi ) = {\mathcal {H}}(-\xi ).\) When combining this parity argument with the Schwarz reflection principle \({\mathcal {H}}(\xi ^*) = {\mathcal {H}}^*(\xi ),\) we obtain the following constraint:

$$\begin{aligned} {\mathcal {H}}(-\xi ^*) = {\mathcal {H}}^*(\xi ). \end{aligned}$$
(41)

From this we deduce that the real part of the CFF must be even in \(\xi ,\) while the imaginary one must be odd along the real axis, i.e. as a function of the real part of \(\xi .\) This restricts our expansion in Eq. (31) to

$$\begin{aligned} {\mathcal {H}}(\xi ) = \sum _{j \textrm{even}}^\infty h_j \frac{1}{\xi ^j}. \end{aligned}$$
(42)

As a consequence, restricting ourselves to \(\xi \in ]0,1[,\) the dispersion relation (40) can be simplified into:

$$\begin{aligned} \Re {\mathcal {H}}(\xi )&= \sum _{j\ \textrm{even}}^{k} h_j\frac{1}{\xi ^j} + \frac{1}{\pi }\,\mathrm {P}\!\!\int _{0}^{1}\textrm{d}\xi '\left( \frac{\xi '}{\xi }\right) ^n\Im {\mathcal {H}}(\xi ')\nonumber \\&\quad \times \left[ \frac{1}{\xi -\xi '}-\frac{(-1)^n}{\xi +\xi '}\right] , \end{aligned}$$
(43)
$$\begin{aligned} \Re {\mathcal {H}}(\xi )&\overset{n=0,1}{=}{\mathcal {S}}+ \frac{1}{\pi }\,\mathrm {P}\!\!\int _{0}^{1}\textrm{d}\xi '\,\Im {\mathcal {H}}(\xi ')\left[ \frac{1}{\xi -\xi '}-\frac{1}{\xi +\xi '}\right] , \end{aligned}$$
(44)

where \(k = 2 [\frac{n}{2}]\) is the largest even number less than or equal to n. The relations for \(n = 0\) and \(n = 1\) are therefore identical, and coincide with the usual formula of Eq. (12). For phenomenological CFFs, for which \(x \Im {\mathcal {H}}(x)\) is integrable, this expression converges. Note that the terminology can be misleading: \({\mathcal {S}}\) is usually called the subtraction constant, while we show here that it is actually extracted from an unsubtracted dispersion relation.
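The lowest-subtraction relation, Eq. (12), can be checked numerically on a toy CFF for which everything is known in closed form: \({\mathcal {H}}(\xi ) = \xi \log ((\xi +1)/(\xi -1)),\) with \(\Im {\mathcal {H}}(\xi ) = \pi \xi \) on the cut and \({\mathcal {S}} = h_0 = 2\) (an illustrative model, not a fit to data). The principal value is handled by SciPy's Cauchy-weight quadrature.

```python
import numpy as np
from scipy.integrate import quad

# Toy CFF (illustration only): even in xi, analytic off the cut [-1, 1],
# Re H(xi) = xi*log((1+xi)/(1-xi)) on (0,1), Im H(xi) = pi*xi, S = h_0 = 2.
S = 2.0
re_cff = lambda x: x * np.log((1.0 + x) / (1.0 - x))
im_cff = lambda x: np.pi * x

xi = 0.5
# (1/pi) * PV int_0^1 Im H(x) [1/(xi - x) - 1/(xi + x)] dx, with the principal
# value computed by quad's 'cauchy' weight (which evaluates PV int f(x)/(x - wvar) dx)
pv = quad(im_cff, 0.0, 1.0, weight="cauchy", wvar=xi)[0]
reg = quad(lambda x: im_cff(x) / (x + xi), 0.0, 1.0)[0]
rhs = S + (-pv - reg) / np.pi
# rhs should reproduce Re H(0.5) = 0.5 * log(3)
```

The dispersive reconstruction agrees with the direct real part to quadrature accuracy.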

On top of Eq. (28) for the quark contribution, the above discussion can be generalised to the gluon contribution with:

$$\begin{aligned} {\mathcal {S}}^g = \int _{-1}^1 T^{g}(\omega ) D^g(\omega ) \text {d}\omega . \end{aligned}$$
(45)

Because of the Schwarz principle, \(h_0\) (and in fact all the \(h_j)\) is real, which means that only the real part of \(T^{q,g}\) contributes to Eqs. (28) and (45). We are thus left with the following result:

$$\begin{aligned} {\mathcal {S}}^a = \int _{-1}^1 \Re T^{a}(\omega ) D^a(\omega ) \text {d}\omega . \end{aligned}$$
(46)

Equation (46) can be recovered from the results presented in Ref. [53] and recalled in Eq. (15). There, the subtraction constant is expressed as a double convolution involving the D-term and the imaginary part of the perturbative kernel. Reshuffling the expression using the odd parity of the D-term:

$$\begin{aligned} {\mathcal {S}}^q&= \frac{2}{\pi } \int _1^\infty \text {d}\omega \Im T^q(\omega ) \int _{-1}^1 \text {d} \alpha \frac{D^q(\alpha )}{\omega -\alpha } \nonumber \\&= \int _{-1}^1 \text {d} \alpha D^q(\alpha ) \frac{1}{\pi } \int _1^\infty \text {d}\omega \Im T^q(\omega ) \left[ \frac{1}{\omega -\alpha } - \frac{1}{\omega + \alpha } \right] \end{aligned}$$
(47)

and injecting now the dispersion relation of the hard scattering kernel obtained in [53]:

$$\begin{aligned} \Re T^q(\nu ) = \frac{1}{\pi } \int _1^\infty \text {d}\omega \Im T^q(\omega )\left[ \frac{1}{\omega -\nu }-\frac{1}{\omega +\nu } \right] , \end{aligned}$$
(48)

we recover our Eq. (46).
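This equivalence between the single-convolution form, Eq. (46), and the double-convolution form of Eq. (15) can also be illustrated numerically with toy inputs: a hypothetical \(\Im T(\omega ) = \pi /\omega ^2\) and a one-mode D-term, neither of which is the actual NLO kernel or a realistic D-term.

```python
import numpy as np
from scipy.integrate import quad

# Toy inputs, for illustration only:
im_T = lambda w: np.pi / w**2              # hypothetical Im T on [1, inf)
D = lambda a: 3.0 * a * (1.0 - a * a)      # odd in alpha, with D(+-1) = 0

# Double convolution, Eqs. (15)/(47)
inner = lambda w: quad(lambda a: D(a) / (w - a), -1.0, 1.0, limit=200)[0]
S_double = (2.0 / np.pi) * quad(lambda w: im_T(w) * inner(w), 1.0, np.inf)[0]

# Re T on (-1, 1) reconstructed from the kernel dispersion relation, Eq. (48),
# then the single convolution of Eq. (46)
re_T = lambda v: quad(lambda w: im_T(w) * (1.0 / (w - v) - 1.0 / (w + v)),
                      1.0, np.inf, limit=200)[0] / np.pi
S_single = quad(lambda v: re_T(v) * D(v), -1.0, 1.0, limit=200)[0]
```

By Fubini's theorem and the odd parity of the D-term the two evaluations are identical analytically, and the nested quadratures reproduce this within their tolerances.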

3.5 Higher subtractions and connection with the DDs

An unexpected result of this new derivation of the higher-order connection between the subtraction constant and the D-term is that the dispersion relation formalism allows us to connect the Mellin moments of the double distribution with higher-order subtraction constants. Indeed, going back to Eqs. (31)–(33), one realises that we have so far only exploited the connection provided by Eq. (32). It is possible to isolate the \(h_j\) for \(j\ne 0\) by subtracting two consecutive terms of Eq. (43) (we recall that \(\xi \in ]0,1[)\):

$$\begin{aligned} h_{2j}\frac{1}{\xi ^{2j}}&= \frac{1}{\pi }\,\mathrm {P}\!\!\int _{0}^{1}\textrm{d}\xi '\,\Im {\mathcal {H}}(\xi ')\bigg \{ \left( \frac{\xi '}{\xi }\right) ^{2j-1}\left[ \frac{1}{\xi -\xi '}+\frac{1}{\xi +\xi '}\right] \nonumber \\&\quad -\left( \frac{\xi '}{\xi }\right) ^{2j}\left[ \frac{1}{\xi -\xi '}-\frac{1}{\xi +\xi '}\right] \bigg \} . \end{aligned}$$
(49)

After simplification, this can be simply written for \(j \ge 1\) as:

$$\begin{aligned} h_{2j} = \frac{2}{\pi }\int _{0}^1\Im {\mathcal {H}}(\xi ')\left( \xi '\right) ^{2j-1}\,\textrm{d}\xi '. \end{aligned}$$
(50)

Reinjecting Eq. (33), we get for \(\ell \) odd:

$$\begin{aligned}&\frac{2}{\pi }\int _{0}^1\Im {\mathcal {H}}(\xi ')\left( \xi '\right) ^{\ell }\,\textrm{d}\xi ' \nonumber \\&\quad = \frac{1}{\ell !}\int _{-1}^1\text {d}\alpha \frac{\partial ^{\ell } T}{(\partial \alpha )^{\ell }}(\alpha ) \int _{-1+|\alpha |}^{1-|\alpha |}\text {d}\beta \beta ^{\ell } F^q(\beta ,\alpha ). \end{aligned}$$
(51)

Rephrasing this equation, the \(\ell \)th Mellin moment of the imaginary part of the CFF is connected with the \(\beta \)-moment of the double distribution, convoluted with the derivative of the hard scattering kernel. This equation highlights well the deconvolution problem of GPDs and DDs from DVCS data [39]. Indeed, while two indices are required to independently deconvolute the \((\alpha ,\beta )\) dependence of the DDs, only a single index, \(\ell ,\) appears here. The impact of such a relation on the characterisation of shadow GPDs is left for future work.
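As a sanity check of Eq. (50), consider again a toy CFF with known coefficients, \({\mathcal {H}}(\xi ) = \xi \log ((\xi +1)/(\xi -1)) = 2\sum _j \xi ^{-2j}/(2j+1),\) whose imaginary part on the cut is \(\Im {\mathcal {H}}(x) = \pi x\) (an illustrative closed-form model); the moments of the imaginary part must then reproduce \(h_{2j} = 2/(2j+1)\):

```python
from math import pi
from scipy.integrate import quad

# Toy model (illustration only): Im H(x) = pi * x on (0, 1),
# with known expansion coefficients h_{2j} = 2/(2j+1).
im_cff = lambda x: pi * x

# Eq. (50): h_{2j} = (2/pi) * int_0^1 Im H(x) x^(2j-1) dx
h_vals = [quad(lambda x: (2.0 / pi) * im_cff(x) * x ** (2 * j - 1), 0.0, 1.0)[0]
          for j in range(4)]
# h_vals should be [2, 2/3, 2/5, 2/7]
```

For this toy model the \(j=0\) moment also reproduces \(h_0,\) since the unsubtracted relation converges here.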

Finally, let us mention that implementing dispersion relations directly at the level of the CFFs may have a significant impact already at the level of CFF extraction (see figure 2 of [65] for an illustration of the impact of the once-subtracted dispersion relation for DVCS). Assessing the impact of higher-subtracted dispersion relations on CFF extraction is also left for future studies.

4 Extraction of the pressure distribution on collider kinematics: an inverse problem

If we limit ourselves to the lowest subtraction, which only involves the D-term, we have demonstrated that the CFFs give us in principle access to the subtraction constant:

$$\begin{aligned} {\mathcal {S}}^a(t, Q^2) = \int _{-1}^1 \textrm{d}\omega \,\Re T^a\left( \omega , \frac{Q^2}{\mu ^2}, \alpha _s\right) D^a(\omega , t, \mu ^2). \nonumber \\ \end{aligned}$$
(52)

We will elaborate on the challenges related to the characterisation of \({\mathcal {S}}^a\) from experimental data in the next section. For now, we are interested in the fact that the GFF \(C_a(t, \mu ^2),\) which is related to the pressure distribution in the proton, can be written as another integral of the D-term, Eq. (8), which we recall with its full variable dependence:

$$\begin{aligned} C_a(t, \mu ^2) = \frac{1}{4}\int _{-1}^1 \textrm{d}\alpha \,\alpha ^{1-p_a} D^a(\alpha , t, \mu ^2). \end{aligned}$$
(53)

An obvious question is whether the knowledge of the subtraction constant \({\mathcal {S}}^a(t, Q^2)\) allows an unambiguous extraction of the GFF \(C_a(t, \mu ^2).\) This question is very similar to the deconvolution problem [39], which aims at determining whether the measurement of CFFs – i.e. the convolution of the perturbative coefficient function with the full GPD – allows the unambiguous reconstruction of the GPD. In fact, the problem at hand in this paper is exactly the restriction of the general deconvolution problem to the D-term.

As we have already hinted at, the root of the deconvolution problem is that DVCS experimental data offer one kinematic variable fewer than the parton distributions we want to extract. CFFs are functions of \((\xi , t, Q^2),\) whereas GPDs are functions of \((x, \xi , t, \mu ^2);\) the subtraction constant is a function of \((t, Q^2)\) whereas the D-term is a function of \((\alpha , t, \mu ^2).\) One might argue that the GFF \(C_a,\) which we are fundamentally interested in, is just a function of \((t, \mu ^2)\) – the same kinematic dependence as the subtraction constant \({\mathcal {S}}^a(t, Q^2).\) However, one cannot write a straightforward relation between the subtraction constant \({\mathcal {S}}^a(t, Q^2)\) and the GFF \(C_a(t, \mu ^2)\) that does not in practice involve the extraction of the D-term \(D^a(\alpha , t, \mu ^2).\)

There is however a theoretical solution to the missing-variable problem. At a given order in perturbation theory, the scale dependence of \(D^a(\alpha , t, \mu ^2)\) is given by renormalization group equations, removing in principle one degree of freedom. In practice, evolution equations entangle the \((\alpha , \mu ^2)\) dependence of the D-term (and the \((x, \xi , \mu ^2)\) dependence of GPDs). However, although this solves the issue on paper, in practice the effects of QCD evolution are rather weak over the range of \(Q^2\) accessible to exclusive processes. Reference [39] offered an explicit construction of very different GPDs (with vanishing D-terms) whose CFFs would be indistinguishable in experimental data. These shadow GPDs illustrate objects that are particularly poorly constrained by DVCS in any foreseeable data. Solutions to this issue involve, on the one hand, the introduction of more theoretical constraints to reduce the functional space accessible to GPDs [40], and on the other hand an ambitious program of global fits over a variety of exclusive processes. In particular, processes which, unlike DVCS, TCS or DVMP, are not predominantly sensitive to the pole at \(x = \xi ,\) but rather to a pole where x and \(\xi \) are entangled with an external kinematic variable, are very desirable, like DDVCS [66,67,68] or two-to-three exclusive processes [24,25,26,27,28, 69,70,71].

In a similar fashion to the study of the deconvolution problem carried out in Ref. [39], there exist shadow D-terms, which bring barely any contribution to the subtraction constant over current ranges in \(Q^2\) and are therefore extremely hard to discern in the data. Reference [7] gave a hint of such shadow D-terms when it highlighted a tremendous increase of uncertainty as soon as the parametrization of the D-term was made slightly more flexible. In practice, it is common to parametrize the D-term through an expansion on Gegenbauer polynomials due to their convenient LO evolution properties:

$$\begin{aligned} D^q(\alpha , t, \mu ^2)&= (1-\alpha ^2) \sum _{\text {odd}\ n} d_n^q(t, \mu ^2) C^{(3/2)}_n(\alpha ), \end{aligned}$$
(54)
$$\begin{aligned} D^g(\alpha , t, \mu ^2)&= \frac{3}{2}(1-\alpha ^2)^2 \sum _{\text {odd}\ n} d_n^g(t, \mu ^2) C^{(5/2)}_{n-1}(\alpha ). \end{aligned}$$
(55)

As Gegenbauer polynomials form a complete orthogonal family, this representation is fairly general – but comes with the drawback that fixed-order truncations are usually oscillating functions. We refer to Ref. [7] for an account of the LO scale dependence of \(d_n^a(t, \mu ^2).\) In this representation, the LO subtraction constant reads:

$$\begin{aligned} {\mathcal {S}}(t, Q^2) \overset{\textrm{LO}}{=}4\sum _q e_q^2 \sum _{\text {odd}\ n} d_n^q(t, \mu ^2), \end{aligned}$$
(56)

where, by convention, we will set \(\mu ^2 \equiv Q^2\) in the following. On the other hand, the GFF \(C_a(t, \mu ^2)\) reads:

$$\begin{aligned} C_a(t, \mu ^2) = \frac{1}{5}\,d^a_1(t, \mu ^2). \end{aligned}$$
(57)
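As a quick numerical cross-check of this relation, the quark case of Eq. (53) can be evaluated for a hypothetical truncated D-term. The coefficients below are arbitrary illustrative values, and we read the quark weight \(\alpha ^{1-p_a}\) as simply \(\alpha ,\) which is the reading consistent with Eq. (57); only \(d_1\) then survives the integration by Gegenbauer orthogonality:

```python
import numpy as np
from scipy.special import eval_gegenbauer
from scipy.integrate import quad

# Hypothetical Gegenbauer coefficients of a quark D-term, Eq. (54)
d1, d3 = 2.0, -1.5

def D_quark(a):
    # Truncated expansion (1 - a^2) * sum over odd n of d_n * C_n^{3/2}(a)
    return (1 - a**2) * (d1 * eval_gegenbauer(1, 1.5, a)
                         + d3 * eval_gegenbauer(3, 1.5, a))

# Quark case of Eq. (53), with the weight alpha^{1-p_a} read as alpha
C_q, _ = quad(lambda a: 0.25 * a * D_quark(a), -1, 1)
print(C_q)  # equals d1/5 = 0.4: the d3 term drops out by orthogonality, Eq. (57)
```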

The problem of relating the subtraction constant to the pressure inside the proton turns into the question of extracting \(d_1^a\) from the sum of all \(d_n^a\) at LO (and from more complicated infinite linear combinations of the \(d_n^a\) at higher orders). A simple solution to the ill-definedness of this extraction is to assume that only a finite number of coefficients \(d_n^a\) actually contribute to the subtraction constant. In fact, the study of [5] used only \(n = 1,\) and evaluated the systematic uncertainty caused by such a rigid modelling of the D-term using inputs from the chiral quark soliton model [72, 73]. In [7], effects of a truncation at \(n= 1\) and \(n=3\) were compared. It was observed that the – already large – uncertainty on \(d_1\) inflated by a factor of 20 when \(d_3\) was allowed to be non-zero. In fact, the reason is fairly simple to understand. Since evolution effects are relatively small over the narrow range in \(Q^2\) available to the current precise DVCS data, \(d_1(t, \mu ^2)\) and \(d_3(t, \mu ^2)\) do not exhibit a significantly different behavior in \(\mu ^2.\) Therefore, parasitic contributions such that

$$\begin{aligned} d_1^q(t, \mu ^2) \approx -d_3^q(t, \mu ^2) \end{aligned}$$
(58)

amount to almost no contribution to the LO subtraction constant of Eq. (56), and are virtually unconstrained. An object which brings exactly no contribution to the subtraction constant at a given scale \(\mu _0^2\) will be called a shadow D-term, and we have already highlighted a very simple example at LO:

$$\begin{aligned} d_1^q(\mu _0^2) = \lambda ; \ d_3^q(\mu _0^2) = -\lambda , \end{aligned}$$
(59)

or equivalently

$$\begin{aligned} D_{S,LO}^q(\alpha , \mu _0^2) = \lambda (1-\alpha ^2)[C_1^{(3/2)}(\alpha ) - C_3^{(3/2)}(\alpha )]. \end{aligned}$$
(60)

The space of shadow D-terms at a fixed scale is a vector space (it is the kernel, or null-space, of the integral transform), and there exist shadow D-terms of arbitrary size. Under evolution to another scale \(\mu ^2 \ne \mu _0^2,\) the contribution of a shadow D-term to the subtraction constant becomes non-zero. Indeed, (59) can only be true at one scale, since the \(\mu ^2\) dependence of the two sides of the equation is ruled by different anomalous dimensions. Therefore, the range in scales over which DVCS is measured precisely directly constrains the maximal size of shadow D-terms, and the uncertainty of the deconvolution procedure.

To give a simple approximate example, if there were no mixing between quarks and gluons, the evolution of \(d_n^q\) would be entirely dictated by the anomalous dimension \(\gamma _n\) following

$$\begin{aligned} d_n^q(\mu ^2)&= \varGamma _n^{qq}(\mu ^2, \mu _0^2) d_n^q(\mu _0^2),\nonumber \\&\quad \text {where}\ \varGamma _n^{qq}(\mu ^2, \mu _0^2) = \left( \frac{\alpha _s(\mu ^2)}{\alpha _s(\mu _0^2)}\right) ^{2\gamma _n / \beta _0}, \end{aligned}$$
(61)

where \(\beta _0\) is the first coefficient in the \(\beta \) function of \(\alpha _s.\) Using \(\gamma _1 = 16/9,\) \(\gamma _3 = 157/45\) and \(\beta _0 = 11 - 2 n_f / 3 = 9,\) we find that the contribution to the subtraction constant of the simple shadow D-term of Eq. (59) is:

$$\begin{aligned} {\mathcal {S}}_S^q(Q^2)&= \frac{8}{3}(\varGamma _1^{qq}(Q^2, \mu _0^2) d_1^q(\mu _0^2) + \varGamma _3^{qq}(Q^2, \mu _0^2) d_3^q(\mu _0^2)) , \end{aligned}$$
(62)
$$\begin{aligned}&\approx \frac{8}{3} \lambda \bigg [\left( \frac{\alpha _s(Q^2)}{\alpha _s(\mu _0^2)}\right) ^{0.395} -\left( \frac{\alpha _s(Q^2)}{\alpha _s(\mu _0^2)}\right) ^{0.775}\bigg ]. \end{aligned}$$
(63)
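A minimal numerical sketch of Eqs. (61)–(63) can be written as follows, assuming a one-loop running coupling with \(\varLambda \approx 300\) MeV (an illustrative choice, as is the normalisation \(\lambda = 1\)); it shows that the residual contribution of the shadow D-term over a typical experimental range stays well below the \(8\lambda /3\) size of either individual term:

```python
import math

def alpha_s(Q2, Lambda2=0.09, beta0=9.0):
    # One-loop running coupling for n_f = 3; Lambda^2 = 0.09 GeV^2 is an assumption
    return 4 * math.pi / (beta0 * math.log(Q2 / Lambda2))

def Gamma_qq(Q2, mu02, gamma_n, beta0=9.0):
    # LO non-singlet evolution factor of Eq. (61)
    return (alpha_s(Q2) / alpha_s(mu02)) ** (2 * gamma_n / beta0)

gamma1, gamma3 = 16 / 9, 157 / 45
mu02, Q2 = 1.0, 2.5          # GeV^2, typical range of the precise DVCS data
lam = 1.0                    # shadow D-term normalisation of Eq. (59)

# Eq. (62): residual contribution of the shadow D-term after evolution
S_shadow = (8 / 3) * lam * (Gamma_qq(Q2, mu02, gamma1)
                            - Gamma_qq(Q2, mu02, gamma3))
print(S_shadow)              # small compared to 8*lam/3
```

One can also check that the result is close to the linearized estimate \(\lambda (1 - \alpha _s(Q^2)/\alpha _s(\mu _0^2))\) of Eq. (64).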

This gives of course 0 if \(Q^2 = \mu _0^2\) by definition of the shadow D-term. Linearizing the previous relation yields this approximate contribution of the shadow D-term to the subtraction constant:

$$\begin{aligned} {\mathcal {S}}_S^q(Q^2) \approx \lambda \bigg [1-\frac{\alpha _s(Q^2)}{\alpha _s(\mu _0^2)}\bigg ]. \end{aligned}$$
(64)

If the experimental uncertainty of the subtraction constant is characterized by a quantity \(\varDelta S,\) and the measurements have been performed over a range in scales \([Q^2_{min}, Q^2_{max}],\) our approximate approach tells us that the shadow D-term of Eq. (59) will typically bring a dispersion:

$$\begin{aligned} \sigma _{S,d1q}&\approx \sigma _{S,d3q} \nonumber \\&\approx \frac{3}{8} \frac{\varDelta S}{\varGamma _1^{qq}(Q^2_{max}, Q^2_{min}) - \varGamma _3^{qq}(Q^2_{max}, Q^2_{min})} \end{aligned}$$
(65)
$$\begin{aligned}&\approx \frac{\varDelta S}{\displaystyle \left( 1-\frac{\alpha _s(Q_{max}^2)}{\alpha _s(Q^2_{min})}\right) }. \end{aligned}$$
(66)

The approximate form of the last line should only be used for scales close to the charm mass, as it is derived with the anomalous dimensions of \(n_f = 3.\) Equation (65) is general on the other hand, provided the true evolution operator is used. Of course, there exist many more ways to model the shadow D-term if the parametrization in terms of \(d_n\) is made more flexible than solely \(d_1^q\) and \(d_3^q,\) in particular if explicit gluon contributions are included. The simple Eq. (66) represents a typical estimate of the uncertainty of the deconvolution procedure within the parametric space which we have chosen. Despite the simplifying assumptions that we have made, this result captures the essence of the propagation of uncertainty: first a dependence on the experimental uncertainty of the data through \(\varDelta S,\) and then a characterization of how differently the various parameters evolve with the scale.

Formally, since the fit of the \(d_n^a\) coefficients is linear, it is straightforward to write the exact solution of the fit. In particular, the covariance matrix of \(d_1^q\) and \(d_3^q\) that we are interested in for this simple example reads:

$$\begin{aligned} \begin{pmatrix} \sigma _{d1q}^2 & \text {cov}[d_1^q, d_3^q] \\ \text {cov}[d_1^q, d_3^q] & \sigma _{d3q}^2 \end{pmatrix} = (C^T \varOmega ^{-1} C)^{-1}, \end{aligned}$$
(67)

where \(\varOmega \) is the covariance matrix of the fitted dataset and C is the so-called design matrix which contains the values of the fitted functions at the fitted kinematics, here:

$$\begin{aligned} C = \frac{8}{3}\begin{pmatrix} \varGamma _1^{qq}(Q_1^2, \mu _0^2) & \varGamma _3^{qq}(Q_1^2, \mu _0^2) \\ \varGamma _1^{qq}(Q_2^2, \mu _0^2) & \varGamma _3^{qq}(Q_2^2, \mu _0^2) \\ \vdots \end{pmatrix}. \end{aligned}$$
(68)

One can draw a parallel between this exact general formula and the approximate shadow D-term uncertainty that we have derived in Eq. (66). If there were only two measurements in \(Q^2,\) one at \(Q^2_{min}\) and one at \(Q^2_{max},\) if we chose \(\mu _0^2 = Q^2_{min},\) and if the experimental dataset were uncorrelated with standard deviation \(\varDelta S,\) we would find:

$$\begin{aligned} \sigma _{d1q}= & \frac{3}{8}\varDelta S \frac{\sqrt{1 + [\varGamma _3^{qq}(Q^2_{max}, Q^2_{min})]^2}}{\varGamma _1^{qq}(Q^2_{max}, Q^2_{min}) - \varGamma _3^{qq}(Q^2_{max}, Q^2_{min})} \nonumber \\ \end{aligned}$$
(69)
$$\begin{aligned} \sigma _{d3q}= & \frac{3}{8}\varDelta S \frac{\sqrt{1 + [\varGamma _1^{qq}(Q^2_{max}, Q^2_{min})]^2}}{\varGamma _1^{qq}(Q^2_{max}, Q^2_{min}) - \varGamma _3^{qq}(Q^2_{max}, Q^2_{min})}. \nonumber \\ \end{aligned}$$
(70)

Using that \(\varGamma ^{qq}(Q^2_{max}, Q^2_{min}) < 1,\) we find a very similar estimate to the one we have derived using the notion of shadow D-term, without needing the concept at all. Shadow D-terms are after all merely an attempt at simplifying, or making more intuitive, the analysis of the inverse linear problem by identifying obvious directions that dominate the uncertainty propagation. The concept is truly useful when it comes to making broad predictions based on general characteristics of the data without going through the process of generating pseudo-data and fitting them. We perform such an exercise to broadly evaluate the plausible impact of the EIC in the last section of this paper, where we construct NLO shadow D-terms and treat the evolution equations properly.

Let us note in passing that, with the same assumptions that were used to derive Eqs. (69)–(70), we find:

$$\begin{aligned} \text {corr}[d_1^q, d_3^q] = \frac{-1-\varGamma _1^{qq} \varGamma _3^{qq}}{\sqrt{(1+[\varGamma _1^{qq}]^2)(1+[\varGamma _3^{qq}]^2)}}, \end{aligned}$$
(71)

where we have omitted the argument \((Q^2_{max}, Q^2_{min}).\) If evolution is very weak, \(\varGamma _1^{qq} \approx \varGamma _3^{qq} \approx 1,\) which gives as expected \(\text {corr}[d_1^q, d_3^q] \approx -1,\) with \(\sigma _{d1q} \approx \sigma _{d3q} \approx +\infty .\)
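The closed forms of Eqs. (69)–(71) can be cross-checked numerically against the general covariance formula of Eqs. (67)–(68); the evolution factors and the uncertainty \(\varDelta S\) below are illustrative placeholders, not fitted values:

```python
import numpy as np

# Two measurements, at Q2_min (= mu0^2) and Q2_max; assumed evolution factors
G1, G3 = 0.88, 0.78      # Gamma_1^qq and Gamma_3^qq over [Q2_min, Q2_max]
dS = 0.1                 # uncorrelated experimental uncertainty Delta S

C = (8 / 3) * np.array([[1.0, 1.0],   # Gamma(Q2_min, Q2_min) = 1
                        [G1,  G3]])   # design matrix of Eq. (68)
Omega = dS**2 * np.eye(2)
cov = np.linalg.inv(C.T @ np.linalg.inv(Omega) @ C)   # Eq. (67)

sigma_d1 = np.sqrt(cov[0, 0])
sigma_d3 = np.sqrt(cov[1, 1])
corr = cov[0, 1] / (sigma_d1 * sigma_d3)

# Closed forms of Eqs. (69)-(71)
assert np.isclose(sigma_d1, (3 / 8) * dS * np.sqrt(1 + G3**2) / (G1 - G3))
assert np.isclose(sigma_d3, (3 / 8) * dS * np.sqrt(1 + G1**2) / (G1 - G3))
assert np.isclose(corr, -(1 + G1 * G3) / np.sqrt((1 + G1**2) * (1 + G3**2)))
```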

5 Results on current experimental data

We now conduct a re-analysis of the LO extraction of the GFF \(C_a(t)\) of Ref. [7] using the NLO DVCS coefficient function, and putting to use the understanding of the deconvolution uncertainty stemming from shadow D-terms. A neural network analysis of the global DVCS dataset was conducted in 2019 [54], leveraging 30 observables over 2500 kinematic configurations acquired during 17 years of measurements. The real and imaginary parts of the four leading-twist CFFs were modelled independently. The result of the fit was 100 sets of CFFs which represent a sample of the functional distribution of the CFFs.

The computation of the subtraction constant from Eq. (12) requires the evaluation of the imaginary part of the CFF on the full range of \(\xi \in ]0,1[.\) Since the skewness \(\xi \) is related to the plus component of the four-momentum transfer \(\varDelta ,\) it is bound kinematically by the value of t according to:

$$\begin{aligned} |\xi | \le \frac{\sqrt{-t}}{\sqrt{-t + 4M_p^2}} \end{aligned}$$
(72)

where \(M_p\) is the proton mass. This means that part of the integral must be evaluated over a domain where the CFF is continued analytically and where experimental measurements are impossible. This does not represent a theoretical issue per se. For instance, the Double Distribution can be characterized from a limited range in \(\xi \) as highlighted in [74], and then used to construct the CFF in the range where it is not measured. Likewise, the extraction of GPDs on the lattice, which is performed in Euclidean space using a space-like definition of the kinematic variables, gives access to any \(\xi \) at any t [75]. Yet, in the context of an analysis based on experimental data, the severe kinematic limitation on the information on the CFFs represents a challenge. The flexibility of the neural network parametrization attempts to introduce as little bias as possible in the analytic continuation of the CFFs outside of their experimental determination.

Fig. 6

Subtraction constant as a function of \(\xi \) at a given value of \((t, Q^2)\) obtained from the dispersion relation applied to a neural network extraction of \({\mathcal {H}}\) from the world DVCS dataset in 2019. We show a subset of 50 replicas and the uncertainty computed using the ordinary sample standard deviation and the MAD robust estimator

From the 100 sets of CFFs stemming from the neural network analysis, we compute 100 samples of the functional distribution of the subtraction constant which we will call replicas in the following. The result at one kinematic point \((t, Q^2)\) as a function of \(\xi \) is presented in Fig. 6. We present the result using both the traditional sample standard deviation estimate:

$$\begin{aligned} \sigma _{sample} = \frac{1}{\sqrt{N-1}} \sqrt{\sum _{i=1}^{N} (X_i - \text {mean}(X))^2} \end{aligned}$$
(73)

and using the outlier-robust estimate of the standard deviation known as the median absolute deviation (MAD):

$$\begin{aligned} \sigma _{MAD}(X) = \lambda \, \text {med}\bigg (|X - \text {med}(X)|\bigg ), \end{aligned}$$
(74)

where med stands for the median and \(\lambda = 1 / \varPhi ^{-1}(3/4) = 1.4826,\) \(\varPhi (x)\) being the standard normal cumulative distribution function. This constant rescaling allows the MAD operator to coincide with the standard deviation in the infinite-statistics limit under the assumption that the distribution is Gaussian. The large difference between the two estimates in Fig. 6 highlights the important contamination by outliers, which we elaborate on in the next paragraph.
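A minimal implementation of the robust scale estimate of Eq. (74), compared with the sample standard deviation of Eq. (73) on clean Gaussian data, could read:

```python
import numpy as np

def sigma_mad(x, scale=1.4826):
    # Robust scale estimate of Eq. (74); scale = 1 / Phi^{-1}(3/4)
    # makes it coincide with the standard deviation for Gaussian data
    x = np.asarray(x)
    return scale * np.median(np.abs(x - np.median(x)))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)
# For outlier-free Gaussian data the two estimates agree (both close to 1)
print(x.std(ddof=1), sigma_mad(x))
```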

Since the real and imaginary parts of the CFFs have been modelled independently, without enforcing the connection between them induced by the polynomiality of GPDs, there is in principle no expectation that the subtraction constant will end up independent of \(\xi .\) However, the result is indeed globally compatible with a constant. We exclude from the analysis the subtraction constant for \(\xi > 0.5,\) as there seems to be a slight systematic downward shift of the subtraction constant at large \(\xi .\) This may stem from the fact that \(\text {Im}\,{\mathcal {H}}(\xi )\) is not constrained to go to 0 as \(\xi \rightarrow 1,\) leading to less well-behaved subtraction constant integrals in this limit. At the value of \((t, Q^2)\) presented in Fig. 6 – one of the most precise since it corresponds to a region well explored by the JLab 6 GeV data – the subtraction constant is still only one standard deviation away from 0 when evaluated using outlier-robust statistics. We present in Fig. 7 the strength of the signal of the subtraction constant over the kinematic domain. The fact that neural network analyses of the DVCS dataset available before the JLab 12 GeV upgrade lead to subtraction constants compatible with 0 was also established in Ref. [6]. This results to a large extent from the poor constraints on the real part of the CFF \({\mathcal {H}}\) within the current experimental dataset, and highlights the interest of a better determination of this quantity, achievable for instance by measuring unpolarised beam charge asymmetry observables with a positron beam at JLab [76, 77].

Fig. 7

Strength of the signal of the subtraction constant (robust estimates), expressed in number of standard deviations from 0. We use the most precise value of \(\xi \) for each kinematic \((t, Q^2),\) excluding \(\xi > 0.5\) (see text). Only the best kinematics allow a characterization at \(1 \sigma \) from 0

5.1 Treatment of outliers

As we have already noticed, our data suffers from noticeable outliers. Therefore, instead of the sample means, variances and covariances, we should use outlier robust estimates. In all this analysis, we will use the sample median instead of the sample mean and the MAD instead of the sample standard deviation.

There exists a considerable literature devoted to the question of a robust estimate of the covariance/correlation matrix (see among the most popular suggestions [78,79,80]). We will use in this paper the straightforward generalization of the MAD estimator to the correlation, which we have not seen used in the existing literature:

$$\begin{aligned} {\widetilde{r}} \equiv \frac{\text {med}\bigg ((X-\text {med}(X))(Y-\text {med}(Y))\bigg )}{\sqrt{\text {med}\bigg ((X - \text {med}(X))^2\bigg )\text {med}\bigg ((Y - \text {med}(Y))^2\bigg )}}. \nonumber \\ \end{aligned}$$
(75)

A comparison of this estimator to the many others presented in the literature would divert us from the physics purpose of this paper, and is conducted in a separate, more statistically focused paper [81]. We therefore stick to a succinct presentation. In effect, this estimator does not directly relate to the correlation \(r \equiv \text {corr}[X,Y] = \text {cov}[X,Y] / (\sigma _X \sigma _Y)\): just like a factor \(\lambda \) was necessary to relate the MAD estimator to the standard deviation for a normal distribution, some procedure is needed to match (75) to \(r = \text {corr}[X,Y].\) It is likely that there exists no closed-form matching formula in the case of a bivariate normal distribution. However, as presented in [81] and demonstrated empirically in Fig. 8, the following approximation is accurate beyond the per mille level and fully satisfactory considering the precision of this study:

$$\begin{aligned} r / {\widetilde{r}} = 1 - 0.5635 \ln |r|. \end{aligned}$$
(76)

Isolating properly the correlation coefficient yields:

$$\begin{aligned} r = 0.5635 \, {\widetilde{r}}\, W\left( \frac{10.47}{ |{\widetilde{r}}|}\right) \end{aligned}$$
(77)

where W is the Lambert-W function, defined as the inverse function of \(f(W) = W e^W.\)
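A short sketch, using scipy's implementation of the Lambert-W function, confirms that Eq. (77) inverts the matching relation of Eq. (76) to the stated accuracy:

```python
import numpy as np
from scipy.special import lambertw

def match_correlation(r_tilde):
    # Eq. (77): recover r from the robust estimate r_tilde
    if r_tilde == 0.0:
        return 0.0
    w = lambertw(10.47 / abs(r_tilde)).real
    return 0.5635 * r_tilde * w

# Consistency check against the matching relation of Eq. (76)
for r in (0.1, 0.5, 0.9):
    r_tilde = r / (1 - 0.5635 * np.log(r))
    assert abs(match_correlation(r_tilde) - r) < 1e-3
```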

Fig. 8

Empirical measurements of \(r / {\widetilde{r}}\) from extensive samplings of correlated normal distributions, and their fit by the functional form of Eq. (76)

One should note that this definition does not guarantee that the resulting covariance/correlation matrix is positive definite. Only in the limit of infinite statistics of a true Gaussian distribution do we formally expect this property to be fulfilled. This may however not be as much of a drawback as it appears at first sight. The maximal size of the negative eigenvalues in the spectrum provides clear evidence of the magnitude of poorly estimated correlations. Instead of using the inverse of the covariance matrix to compute the \(\chi ^2,\) one could therefore use a singular value decomposition (SVD), keeping only the singular values larger than the absolute value of the most negative eigenvalue.

To demonstrate practically the interest of our outlier-resilient covariance matrix estimate, we deliberately create an outlier-ridden distribution, made of a mixture of a narrow normal distribution and a fraction of samples from a wider normal distribution. The result is depicted in Fig. 9 and shows that our robust estimators are less affected by the outliers. Note that in general, our robust estimators have a larger dispersion than the sample ones (in statistical terms, they are less efficient). In the example presented here, the dispersion of our robust operators is typically 20–30% larger than that of the sample ones. But although the sample operators are less dispersed, their expectation is further from the value of interest in the presence of strong outliers.
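The construction of such a contaminated sample, and the contrast between the two scale estimators, can be sketched as follows (the mixture parameters are illustrative and not necessarily those used for Fig. 9):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
core = rng.normal(0.0, 1.0, n)    # narrow "good" distribution
wide = rng.normal(0.0, 10.0, n)   # wide outlier distribution
mask = rng.random(n) < 0.10       # 10% contamination
x = np.where(mask, wide, core)

sigma_sample = x.std(ddof=1)      # inflated by the outliers (~3.3 here)
sigma_mad = 1.4826 * np.median(np.abs(x - np.median(x)))  # stays near 1

print(sigma_sample, sigma_mad)
```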

Fig. 9

We generate a high-statistics sample with 10% of outliers. The median based estimators are less contaminated by the outliers than the ordinary sample estimators

5.2 A model of the D-term

As we have already noted, our data-driven neural network extraction of the subtraction constant is largely unconstrained as |t| and \(Q^2\) increase. We have stressed in Sect. 4 that the scale dependence is instrumental to perform a model-independent extraction of the D-term or the C(t) GFF from the experimental data. To proceed forward and obtain sensible results at large |t|,  we are forced to use a model of the D-term. We choose the following usual strategy:

  • The scale dependence of the D-term is given by the LO renormalization group equation resummed at leading-logarithmic accuracy. We use the LO running of \(\alpha _s\) with \(\alpha _s(M_Z^2) = 0.118\) and threshold crossing at \(m_c = 1.27\) GeV and \(m_b = 4.18\) GeV.

  • We truncate the Gegenbauer expansions (54)–(55) to either \(n = 1\) or \(n = 3,\) which allows us to probe a part of the shadow D-term uncertainty as we have explained in the previous section.

  • We assume an equal contribution of the light quarks \(d_n^{u} = d_n^{d} = d_n^{s} = d_n^{uds}\) and a purely radiatively generated charm contribution, that is \(d_n^c(\mu ^2 = m_c^2) = 0.\)

  • We enforce a factorized t-dependence under the form of a tripole Ansatz:

    $$\begin{aligned} D^a(t) = D^a(t = 0) \left( 1-\frac{t}{M^2}\right) ^{-3}, \end{aligned}$$
    (78)

    where \(M = 0.8\) GeV. The lack of distinctive t-dependence in the CFF extraction makes this fully constrained Ansatz satisfactory.

Our model is therefore entirely defined by the coefficients \(d_n^{uds}(t = 0, \mu _0^2)\) and \(d_n^g(t = 0, \mu _0^2)\) at a conventionally fixed scale \(\mu _0 = 2\) GeV. We want to obtain the distribution of those parameters so that

$$\begin{aligned} \int _{-1}^1 \textrm{d}\omega \,T^a(\omega , \alpha _s(Q^2)) \varSigma ^{ab}(Q^2, \mu _0^2) \otimes D^b(\omega , t, \mu _0^2) \end{aligned}$$
(79)

approximates as much as possible the distribution of the 100 replicas \(S(\xi , t, Q^2).\) \(\varSigma ^{ab}(Q^2, \mu _0^2)\) represents the leading-logarithmic evolution operator of the D-term. Summation over repeated indices is implied.

The problem presents itself as finding the best fit of a target function by a parametrized function. A simple way to proceed is to sample the target function and perform a least-squares fit. However, the answer may depend on the choice of kinematics where the sampling is performed. In order to perform a correlated fit, we need to select far fewer kinematic values than the number of available replicas. With 100 replicas, we will use \(N_{kin}\) kinematics, selected because they represent the strongest signal of the subtraction constant. Precisely, we decompose the \((\xi , t, Q^2)\) phase-space in a regular grid with a logarithmic spacing in \(\xi \) characterized by a multiplicative factor of 1.5, a uniform spacing in t characterized by a step of 0.1 \(\hbox {GeV}^2,\) and a logarithmic spacing in \(Q^2\) (multiplicative factor of 1.35). Then we select the \(N_{kin}\) kinematics where the ratio of the median of the replicas to the MAD is the largest. This prevents overfitting our model in a region where the neural network is left largely unconstrained.
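The selection step can be sketched as follows, with a synthetic replica array standing in for the actual subtraction constant replicas on the flattened \((\xi , t, Q^2)\) grid:

```python
import numpy as np

# Hypothetical replica array: S[j, k] = subtraction constant of replica j
# at grid point k (the real grid is the (xi, t, Q^2) grid described above)
rng = np.random.default_rng(2)
n_rep, n_grid, n_kin = 100, 500, 20
S = rng.normal(0.0, 1.0, (n_rep, n_grid)) + rng.normal(0.0, 2.0, n_grid)

med = np.median(S, axis=0)
mad = 1.4826 * np.median(np.abs(S - med), axis=0)
signal = np.abs(med) / mad                 # signal strength in robust sigmas

selected = np.argsort(signal)[-n_kin:]     # N_kin strongest kinematics
```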

Let us call \((\xi _i, t_i, Q^2_i)\) the set of kinematics we have just described, on which the replicas of \(S(\xi , t, Q^2)\) are sampled. Once this choice is fixed, several options present themselves to determine the best parameters.

  1.

    The most natural option is to determine the distribution of our free parameters so as to minimize the correlated least squares:

    $$\begin{aligned}&\sum _{i, i'} (\text {model}(\xi _i, t_i, Q^2_i) - \bar{S}(\xi _i, t_i, Q^2_i)) \times \text {cov}^{-1} [S]_{i,i'} \nonumber \\&\quad \times (\text {model}(\xi _{i'}, t_{i'}, Q^2_{i'}) - \bar{S}(\xi _{i'}, t_{i'}, Q^2_{i'})). \end{aligned}$$
    (80)

    \(\bar{S}\) is the sample median of the dataset and \(\text {cov}[S]\) the robust covariance matrix. Since the fit is linear, the best-fit parameters are normally distributed and we do not need to use individual replicas.

  2.

    If we used an uncorrelated least-squares fit (only the diagonal terms of the covariance matrix), then we would not be limited in the number of kinematics at which to perform the fit. However, neglecting outright the statistical information on the correlations of the target function seems unjustified.

  3.

    The LO study in Ref. [7] used a hybrid approach: for each replica \(S_j(\xi , t, Q^2),\) where \(1 \le j \le 100\) labels the replica, the best-fit value was found with an uncorrelated least squares:

    $$\begin{aligned} \sum _i \frac{(\text {model}(\xi _i, t_i, Q^2_i) - S_j(\xi _i, t_i, Q^2_i))^2}{(\varDelta S(\xi _i, t_i, Q^2_i))^2} \end{aligned}$$
    (81)

    where \(\varDelta S(\xi _i, t_i, Q^2_i)\) is the outlier robust standard deviation computed from the 100 replicas. The result is then made of the distribution of the best-fit value over each replica. This strategy can in principle be applied to an arbitrary number of kinematics, and yet encompasses part of the correlated information through the use of the distribution of replicas. In the absence of outliers distorting the distribution, if we used \(\text {cov}^{-1}[S]\) instead of \(1 / (\varDelta S)^2\) in Eq. (81), we would find exactly the first method, while if we used \(\bar{S}\) instead of the individual \(S_j\) replicas, we would recover exactly the second method.
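Since the model is linear in its parameters, the correlated least squares of option 1 admits a closed-form generalised least-squares solution; a self-contained sketch with a toy design matrix (all values below are illustrative) reads:

```python
import numpy as np

def correlated_linear_fit(C, y, Omega):
    # Generalised least squares: minimise (C p - y)^T Omega^{-1} (C p - y),
    # as in Eq. (80). C: design matrix (n_kin x n_par), y: median of the
    # replicas at the selected kinematics, Omega: robust covariance matrix.
    Oinv = np.linalg.inv(Omega)
    cov_p = np.linalg.inv(C.T @ Oinv @ C)   # covariance of the parameters
    p_hat = cov_p @ C.T @ Oinv @ y          # best-fit parameters
    return p_hat, cov_p

# Toy check: noiseless data generated from a known linear model
rng = np.random.default_rng(3)
C = rng.normal(size=(20, 2))
p_true = np.array([0.5, -0.3])
Omega = 0.01 * np.eye(20)
y = C @ p_true
p_hat, cov_p = correlated_linear_fit(C, y, Omega)
assert np.allclose(p_hat, p_true)           # exact recovery without noise
```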

Fig. 10

Fit by a constant function of a functional distribution represented by the grey replicas. The fit is performed on the sampled kinematics represented by the blue points using (1) a fully correlated fit, (2) a fully uncorrelated fit and (3) the hybrid method of uncorrelated fit replica-by-replica

To appreciate the difference between the three methods independently from the question of outlier suppression, we construct in Fig. 10 a fictitious normal distribution of replicas (grey curves), sample it on some kinematics (blue points), and apply the different methods in the absence of outliers. The correlated fit (top plot) exhibits a smaller variance than the uncorrelated one (middle plot), a reflection of the fact that the grey replicas exhibit a long-distance anti-correlation: the replicas that go down at small x tend to go up at large x and vice-versa. This has the effect of pinning down the best constant more precisely than when this information is simply neglected. The hybrid method (bottom plot) exhibits a significantly larger uncertainty than the other two. In the following, we will present results using the correlated method. It is clear that the most reliable strategy would be to perform a full refit of the experimental data – which is outside the scope of this study, which only aims at giving a qualitative understanding of the effect of switching from a LO to a NLO analysis.

Finally, to stress once again the importance of our robust estimate of the covariance, we plot in Fig. 11 a comparison between the spectrum of eigenvalues of the sample covariance matrix and that of our robust estimate with \(N_{kin} = 20\) on the subtraction constant dataset. The difference between the largest eigenvalues is mainly driven by the fact that \(\sigma _{MAD}\) is smaller than the sample standard deviation, which is inflated by outliers. The eigenvalues of the robust estimator then decrease more slowly than those of the sample one, which means that they will produce increased stability in a \(\chi ^2\) fit. Six eigenvalues of the robust estimator are negative. The dotted line represents the largest of them in absolute value, and provides a physical criterion to discard smaller eigenvalues as unreliable. In the end, at most 7 eigenvalues of the covariance matrix are reliably estimated.

The fits presented in the following section are stable when \(N_{kin}\) varies in the interval 10 to 30. Below this interval, we under-sample the phase-space available to the study, resulting in larger uncertainties. Above it, the quality of inference of the covariance matrix decreases sharply since \(N_{kin}\) becomes fairly large compared to the 100 replicas. When we use the unreliable sample covariance estimator, we do not find such an extended region of stability. We will show all results of the fits for \(N_{kin} = 20\) and the robust estimators. \(N_{kin} = 20\) corresponds to probing the subtraction constant in the region \(\xi \in [0.1, 0.4],\) \(-t \in [0.2, 0.4]\) \(\hbox {GeV}^2\) and \(Q^2 \in [1, 2.5]\) \(\hbox {GeV}^2,\) which corresponds to the bulk of the most constraining dataset, JLab 6 GeV.

Fig. 11

Comparison of the spectrum of eigenvalues of the covariance matrix for \(N_{kin} = 20\) on the subtraction constant dataset on which this study is performed. We compare the sample covariance to our robust estimator. The dotted line represents the largest negative eigenvalue in the spectrum of the robust estimate

5.3 Results of the fits at LO

5.3.1 LO radiative gluons and \(n = 1\)

At first, we consider only terms with \(n = 1\) in the Gegenbauer expansions (54)–(55). Furthermore, we assume a radiative gluon generation, that is, \(d_1^g(t = 0, \mu _g^2) = 0\) for some low-lying scale \(\mu _g^2.\) Therefore, \(d_1^{uds}(t = 0, \mu _0^2)\) is really the only free parameter. Using \(\mu _g = 300\) MeV, we obtain the LO result:

(82)

In spite of the different fitting methodology compared to Ref. [7], the results are very similar: there, it was determined that \(d_1^{uds} = -0.5 \pm 1.2\) and \(d_1^g = -0.6 \pm 1.6.\) It was also noticed in Ref. [7] that the threshold \(\mu _g\) where gluons are introduced has barely any impact on the fitted value of \(d_1^{uds}.\) For instance, if we use \(\mu _g = 1\) GeV, we still obtain \(d_1^{uds}(t = 0, 2~\text {GeV}^2) = -0.6 \pm 1.1,\) while on the other hand \(d_1^g = -0.1 \pm 0.2.\)

In order to understand this interesting observation, we need to remind ourselves that at LO, there is no direct contribution of the gluons to the subtraction constant. The only contribution is indirect, through the radiation of quarks by gluons in the perturbative evolution. On the range of \(Q^2\) relevant for this analysis, that is \(Q^2 \in [1, 2.5]\) \(\hbox {GeV}^2,\) the evolution operator resummed to leading logarithmic accuracy reads:

$$\begin{aligned} \begin{pmatrix} d_1^{uds}(2.5~\text {GeV}^2) \\ d_1^{g}(2.5~\text {GeV}^2) \\ d_1^{c}(2.5~\text {GeV}^2) \end{pmatrix} = \begin{pmatrix} 0.92 & 0.015 \\ 0.23 & 0.95 \\ 0.001 & 0.007 \end{pmatrix} \begin{pmatrix} d_1^{uds}(1~\text {GeV}^2) \\ d_1^{g}(1~\text {GeV}^2) \end{pmatrix}. \end{aligned}$$
(83)

We label the coefficients of the evolution matrix from \(\mu _0^2\) to \(\mu ^2\) as:

$$\begin{aligned} \begin{pmatrix} \varGamma _1^{qq}(\mu ^2, \mu _0^2) & \varGamma _1^{qg}(\mu ^2, \mu _0^2) \\ \varGamma _1^{gq}(\mu ^2, \mu _0^2) & \varGamma _1^{gg}(\mu ^2, \mu _0^2) \\ \varGamma _1^{cq}(\mu ^2, \mu _0^2) & \varGamma _1^{cg}(\mu ^2, \mu _0^2) \end{pmatrix}. \end{aligned}$$
(84)

Notice how small \(\varGamma _1^{qg},\) the radiation of light quarks by gluons, is in the range of scales covered by the bulk of the experimental data. This means that the gluon contribution to the subtraction constant at LO is heavily suppressed. Introducing the gluon radiation threshold, we obtain:

$$\begin{aligned} d_1^{uds}(\mu ^2) = \left[ \varGamma _1^{qq}(\mu ^2, \mu _0^2) + \varGamma _1^{qg}(\mu ^2, \mu _0^2)\,\frac{\varGamma _1^{gq}(\mu _0^2, \mu _g^2)}{\varGamma _1^{qq}(\mu _0^2, \mu _g^2)} \right] d_1^{uds}(\mu _0^2). \end{aligned}$$
(85)

The maximal effect of gluons on the fit of \(d_1^{uds}\) is obtained when \(\mu _0^2\) and \(\mu ^2\) are taken at the extreme values covered reliably by the experimental data, here 1 and 2.5 \(\hbox {GeV}^2.\) We therefore find that the features of the fitted data in the interval [1, 2.5] \(\hbox {GeV}^2\) imputable to gluons are typically of the order of \(\varGamma _1^{qg}(2.5, 1) / \varGamma _1^{qq}(2.5,1) \times \varGamma _1^{gq}(1, 0.09) / \varGamma _1^{qq}(1, 0.09) = 0.015 / 0.92 \times 1.21 \approx 2\)% of the contribution imputable to \(d_1^{uds}.\) If \(\mu _g\) increases, the gluonic contribution decreases even further, but it is in any case completely imperceptible. In other words, in a LO analysis, radiative gluons might as well be no gluons at all. Although the quark contribution to the GFF \(\sum _q C_q(t)\) is thus quite independent of the choice of radiative threshold, the overall GFF \(C(t) = \sum _q C_q(t) + C_g(t)\) is extremely unreliable.
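As a quick numerical cross-check, this contamination estimate can be recomputed from the evolution-matrix entries of Eq. (83) and the below-threshold ratio quoted above (a sketch; the numerical values are those quoted in the text, the variable names are ours):

```python
# LO contamination of the fitted d1^{uds} signal by radiative gluons over
# Q^2 in [1, 2.5] GeV^2, using the evolution-matrix entries of Eq. (83).
gamma_qq = 0.92   # Gamma_1^{qq}(2.5, 1): quark -> quark
gamma_qg = 0.015  # Gamma_1^{qg}(2.5, 1): radiation of light quarks by gluons

# Ratio Gamma_1^{gq}(1, 0.09) / Gamma_1^{qq}(1, 0.09), i.e. evolution from
# the radiative threshold mu_g = 300 MeV up to 1 GeV^2 (value from the text):
threshold_ratio = 1.21

contamination = gamma_qg / gamma_qq * threshold_ratio
print(f"{contamination:.1%}")  # ~2.0%
```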

5.3.2 LO radiative gluons and \(n = 3\)

Still using a radiative gluon generation with \(\mu _g = 300\) MeV, we now allow both \(d_1^{uds}(t = 0, 2\) \(\hbox {GeV}^2)\) and \(d_3^{uds}(t = 0, 2\) \(\hbox {GeV}^2)\) to be fitted. At LO, we find:

(86)

\(d_1\) and \(d_3\) are anti-correlated in excess of 99%, as already identified in Ref. [7]. One observes that, within uncertainty, \(d_1^{uds} \approx - d_3^{uds}.\) In other words, the uncertainty of the extraction stems almost entirely from the contamination by LO shadow D-terms. In Sect. 4, using a simplified evolution kernel, we derived an approximate estimator of the uncertainty linked precisely to this shadow D-term \(d_1^{uds} \approx - d_3^{uds}.\) We found (66):

$$\begin{aligned} \sigma _{d1q} \approx \sigma _{d3q} \approx \frac{\varDelta S}{\left( 1-\frac{\alpha _s(Q_{max}^2)}{\alpha _s(Q^2_{min})}\right) } \end{aligned}$$
(87)

\(\varDelta S\) can be obtained by noting that the fit with \(d_1\) alone reads \(S = 8 / 3 \times d_1^{uds},\) and therefore \(\varDelta S \approx 8 / 3 \times 1.1.\) This is probably an underestimation, since it takes the uncertainty of the simplest fit as a measure of the uncertainty of the full quantity. Then using \(Q^2_{max} = 2.5\) \(\hbox {GeV}^2\) and \(Q^2_{min} = 1\) \(\hbox {GeV}^2,\) we find

$$\begin{aligned} \sigma _{d1q} \approx \sigma _{d3q} \approx 16, \end{aligned}$$
(88)

to compare with the value of 26.5 that our fit produced. Besides the likely underestimation of \(\varDelta S,\) the main drawback of this approximation is the reduction of the information contained in the scale dependence to a single interval \([Q^2_{min}, Q^2_{max}]\) where we assume that the data is uncorrelated and uniformly constraining. The result is then of course sensitive to the choice of this interval. For instance, simply raising \(Q^2_{min}\) to 1.4 \(\hbox {GeV}^2\) (and therefore reducing the range in scales where we believe the data to be truly constraining) would produce an estimate of \(\sigma _{d1q} \approx \sigma _{d3q} \approx 25.5,\) very similar to the one truly observed. The approximate evolution used to derive our estimate (66) only introduces a minor imprecision, owing to the negligible effect of radiative gluons and the fact that our scales are close to the charm mass.
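The magnitudes above can be reproduced in a few lines. The sketch below implements the estimator (87) with a one-loop running coupling; the value of \(\varLambda_{\mathrm{QCD}}\) is our own illustrative assumption (the paper's actual coupling comes from APFEL), so only the approximate magnitudes matter:

```python
import math

LAMBDA_QCD2 = 0.13**2    # GeV^2; assumed one-loop Lambda_QCD (nf = 4), illustrative
BETA0 = 11 - 2 * 4 / 3   # LO beta-function coefficient for nf = 4

def alpha_s(q2):
    """One-loop running coupling (sketch)."""
    return 4 * math.pi / (BETA0 * math.log(q2 / LAMBDA_QCD2))

def sigma_shadow(delta_s, q2_min, q2_max):
    """Shadow D-term uncertainty estimate of Eq. (87)."""
    return delta_s / (1 - alpha_s(q2_max) / alpha_s(q2_min))

delta_s = 8 / 3 * 1.1  # Delta S ~ 8/3 x 1.1, as argued in the text
print(sigma_shadow(delta_s, 1.0, 2.5))  # ~16
print(sigma_shadow(delta_s, 1.4, 2.5))  # ~25
```

Raising \(Q^2_{min}\) from 1 to 1.4 \(\hbox {GeV}^2\) in this sketch indeed moves the estimate from about 16 to about 25, illustrating the sensitivity to the chosen interval.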

5.3.3 LO unconstrained gluons and \(n = 1\)

This time, we allow \(d_1^g(t = 0, \mu _0^2)\) to also be a free parameter. We find at LO:

(89)

There again, the results are in good agreement with the results of Ref. [7]. Two interesting features are noticeable: the extraction of \(d_1^{uds}\) has been left unchanged by the addition of unconstrained gluons, and the uncertainty of the gluon term has increased by a factor 90 compared to the radiative gluons. This factor of 90 can be understood as being related to \(\varGamma _1^{qq}(2.5, 1) / \varGamma _1^{qg}(2.5, 1) = 0.92/0.015 \approx 60,\) the factor by which \(d_1^g(1\) \(\hbox {GeV}^2)\) must be larger than \(d_1^{uds}(1\) \(\hbox {GeV}^2)\) so that both terms contribute with the same order of magnitude to the fitting of the data. As for the reason why \(d_1^{uds}\) is unchanged by unconstrained gluons at LO, it comes from the fact that the fitted distributions of \(d_1^{uds}\) and \(d_1^g\) are largely uncorrelated (correlation coefficient of \(-0.10).\) As we have stressed before, the large similarity of \(\varGamma _1^{qq}\) and \(\varGamma _3^{qq}\) is the root of the very large correlation between \(d_1^q\) and \(d_3^q.\) On the other hand, \(\varGamma _1^{qq}\) and \(\varGamma _1^{qg}\) present very different functional forms, and therefore result in far less correlated fits, as can be observed in Fig. 12.

Fig. 12

Comparison of the functional form of the operators \(\varGamma _1^{qq},\) \(\varGamma _3^{qq},\) \(\alpha _s \varGamma _1^{gg}\) and \(\varGamma _1^{qg}\) as a function of the scale. The essence of the deconvolution problem is that, when the fitted functional forms are too similar to one another, the associated parameters become extremely difficult to differentiate, which considerably inflates the uncertainty

5.4 Results of the fits at NLO

At NLO, for \(\mu ^2 = Q^2\) and with a truncation up to Gegenbauer moments of order \(n = 3,\) the subtraction constant reads:

$$\begin{aligned}&S = \sum _q e^2_q S^q + S^g, \end{aligned}$$
(90)
$$\begin{aligned}&S^q \overset{\textrm{NLO}}{=}d_1^q \left( 4 - \frac{4}{9}\frac{\alpha _s C_F}{4 \pi }\right) + d_3^q \left( 4 + \frac{14759}{450}\frac{\alpha _s C_F}{4 \pi }\right) , \end{aligned}$$
(91)
$$\begin{aligned}&S^g \overset{\textrm{NLO}}{=}\frac{\sum _q e^2_q \alpha _s T_F}{4\pi } \left( -\frac{172}{9}d_1^g - \frac{3317}{150} d_3^g\right) , \end{aligned}$$
(92)

where \(C_F = 4/3\) and \(T_F = 1/2.\) We use the LO running of \(\alpha _s\) from APFEL [82, 83], which remains continuous at the heavy quark mass thresholds. However, a naive implementation of the gluon coefficient function is discontinuous at threshold due to the factor \(\sum _q e^2_q.\) This makes no practical numerical difference for radiative-gluon fits, but becomes important for the fit results with unconstrained gluons. The appropriate course of action would be to include heavy quark mass effects in the coefficient function, but this extends beyond the scope of this paper. To avoid spurious effects, we will therefore consider for the rest of the paper that the factor \(\sum _q e^2_q\) in the gluonic contribution to the subtraction constant is fixed to 10/9, its naive value for \(n_f = 4.\)
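For concreteness, Eqs. (90)–(92) can be encoded directly. The sketch below assumes, as in the text, a common \(d_n^{uds}\) for the three light flavours and the factor \(\sum _q e^2_q\) frozen at 10/9 in the gluon term; the function names are ours:

```python
import math

CF, TF = 4 / 3, 1 / 2
SUM_EQ2_LIGHT = 2 / 3  # e_u^2 + e_d^2 + e_s^2
SUM_EQ2 = 10 / 9       # frozen at the naive nf = 4 value, as in the text

def s_quark(d1q, d3q, alpha_s):
    """NLO light-quark contribution S^q per flavour, Eq. (91)."""
    a = alpha_s * CF / (4 * math.pi)
    return d1q * (4 - 4 / 9 * a) + d3q * (4 + 14759 / 450 * a)

def s_gluon(d1g, d3g, alpha_s):
    """NLO gluon contribution S^g, Eq. (92), with sum_q e_q^2 frozen at 10/9."""
    b = SUM_EQ2 * alpha_s * TF / (4 * math.pi)
    return b * (-172 / 9 * d1g - 3317 / 150 * d3g)

def subtraction_constant(d1q, d3q, d1g, d3g, alpha_s):
    """Eq. (90): S = sum_q e_q^2 S^q + S^g, light flavours sharing d_n^{uds}."""
    return SUM_EQ2_LIGHT * s_quark(d1q, d3q, alpha_s) + s_gluon(d1g, d3g, alpha_s)
```

At \(\alpha _s = 0.35\) this reproduces the quark coefficient \(\approx 2.65\) of Eq. (94) and the gluon coefficient \(\approx -0.30\) of Eq. (95).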

5.4.1 NLO radiative gluons and \(n = 1\)

With a threshold of radiative gluon generation at 300 MeV, we find:

(93)

The results are almost identical to the LO results of the same fit. There again, changing the threshold for gluon production has only a small effect on \(d_1^{uds},\) although larger than at LO. With a threshold \(\mu _g^2 = 1~\hbox {GeV}^2,\) we find \(d_1^{uds} = -0.6 \pm 1.1\) and \(d_1^g = -0.1 \pm 0.2.\)

To understand the similarity between the LO and NLO fit in the case where only \(d_1^{uds}\) is fitted, we observe that \(\alpha _s\) is at most \(\alpha _{s, max} = 0.35\) in the fitted range. Then the NLO quark term reads (91):

$$\begin{aligned} \sum _q e^2_q\left( 4 - \frac{4}{9}\frac{\alpha _{s, max} C_F}{4 \pi }\right) \approx 3.98 \sum _q e^2_q \approx 2.65, \end{aligned}$$
(94)

where we neglected the charm contribution to the subtraction constant, and for gluons (92):

$$\begin{aligned} -\frac{172}{9}\frac{\sum _q e^2_q \alpha _{s, max} T_F}{4\pi } \approx -0.30, \end{aligned}$$
(95)

which is of the order of 10% of the quark contribution. Due to its negative sign, it causes a slight increase in the fitted value of \(d_1^{uds}\) compared to the situation at LO when the gluon threshold is small enough. Let us note that although gluons still play a minor role in the extraction, it is a much bigger one than at LO, where we estimated it at 2% because of the smallness of \(\varGamma _1^{qg}.\)

5.4.2 NLO radiative gluons and \(n = 3\)

Still with a radiative gluon threshold at \(\mu _g = 300\) MeV, and allowing \(d_1^{uds}\) and \(d_3^{uds}\) to be fitted, we obtain:

(96)

The general uncertainty is of the same order of magnitude as the one obtained at LO, and the anti-correlation remains in excess of 99%. However, whereas at LO we had \(d_1^{uds} \approx - d_3^{uds},\) the situation has changed slightly. Since the subtraction constant has been modified at NLO, it does not admit exactly the same shadow D-terms. Indeed, the \(d_3^{uds}\) term in (91) gives with \(\alpha _s(2~\text {GeV}^2) \approx 0.3\):

$$\begin{aligned} \sum _q e^2_q\left( 4 + \frac{14759}{450}\frac{\alpha _s C_F}{4 \pi }\right) \approx 3.36. \end{aligned}$$
(97)

An NLO shadow D-term must now also cancel the contribution stemming from the gluons, which we can no longer ignore even in the radiative approximation. Using the reference scale of 2 \(\hbox {GeV}^2\) and the explicit relation between \(d_n^g\) and \(d_n^{uds}\) offered by the radiation threshold, the gluonic contribution of \(n = 1\) reads:

$$\begin{aligned} -\frac{172}{9}\frac{\sum _q e^2_q \alpha _s T_F}{4\pi } \frac{\varGamma _1^{gq}(2, 0.09)}{\varGamma _1^{qq}(2, 0.09)} d_1^{uds} \approx -0.36 d_1^{uds}, \end{aligned}$$
(98)

whereas for \(n = 3\):

$$\begin{aligned} \frac{3317}{150}\frac{\sum _q e^2_q \alpha _s T_F}{4\pi } \frac{\varGamma _3^{gq}(2, 0.09)}{\varGamma _3^{qq}(2, 0.09)} d_3^{uds}\approx 0.05 d_3^{uds}. \end{aligned}$$
(99)

We note in passing that the effect of gluons at NLO on \(d_3^{uds}\) is much smaller than on \(d_1^{uds}.\) Finally, the NLO subtraction constant at 2 \(\hbox {GeV}^2\) reads approximately as:

$$\begin{aligned} S(2~\text {GeV}^2) \overset{\textrm{NLO}}{=}(2.65 - 0.36) d_1^{uds} + (3.36 + 0.05) d_3^{uds}, \nonumber \\ \end{aligned}$$
(100)

to compare with the LO:

$$\begin{aligned} S \overset{\textrm{LO}}{=}2.65 d_1^{uds} + 2.65 d_3^{uds}. \end{aligned}$$
(101)

An expectation of NLO shadow D-term with radiative gluons at a 300 MeV threshold is therefore:

$$\begin{aligned} d_1^{uds} \approx -\frac{3.36 + 0.05}{2.65 - 0.36} d_3^{uds} \approx -1.5 d_3^{uds}. \end{aligned}$$
(102)

Using the observed value of \(\sigma _{d3q} = 15,\) this would predict \(\sigma _{d1q} \approx 22.5,\) which is close to the true value of 21. The estimator is made somewhat more complicated than at LO by the necessary inclusion of the gluon term and \(\alpha _s.\) It nevertheless provides reliable results, demonstrating that the interpretation of the uncertainty in terms of a simple shadow D-term is a valuable tool.
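The arithmetic of Eqs. (100)–(102) and the resulting uncertainty propagation is small enough to script (a sketch; all numbers are the approximate coefficients derived above):

```python
# Coefficients multiplying d1^{uds} and d3^{uds} in the NLO subtraction
# constant at 2 GeV^2, Eq. (100): quark pieces from Eqs. (94) and (97),
# radiative-gluon corrections from Eqs. (98) and (99).
c1 = 2.65 - 0.36   # coefficient of d1^{uds}
c3 = 3.36 + 0.05   # coefficient of d3^{uds}

# NLO shadow D-term direction, Eq. (102): S = 0 along d1 = -(c3 / c1) d3.
ratio = -c3 / c1
print(f"d1 ~ {ratio:.2f} d3")  # ratio ~ -1.5

# Propagating the observed sigma_{d3q} = 15 along this flat direction:
sigma_d3q = 15
sigma_d1q = abs(ratio) * sigma_d3q
print(f"sigma_d1q ~ {sigma_d1q:.1f}")  # ~22, close to the fitted value of 21
```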

5.4.3 NLO unconstrained gluons and \(n = 1\)

If we now allow the \(d_1^g\) term to be freely fitted alongside \(d_1^{uds},\) we find:

(103)

One will notice that the uncertainty on \(d_1^{uds}\) has increased by a large factor compared to the case where \(d_1^g\) was not a free parameter. This indicates a very large correlation between \(d_1^{uds}\) and \(d_1^g\) at NLO, and the impact of an underlying shadow D-term.

The reason why a shadow D-term produces a large effect at NLO in the joint fit of \(d_1^{uds}\) and \(d_1^g,\) whereas it was not visible at LO, is that gluons now contribute to the subtraction constant in their own right, mostly through \(\alpha _s(Q^2)\varGamma ^{gg}_1(Q^2, \mu _0^2).\) At LO they could only contribute through the radiation term \(\varGamma ^{qg}_1(Q^2, \mu _0^2).\) But \(\varGamma ^{gg}_1\) is a diagonal term in the evolution matrix, whose functional dependence is very similar to that of \(\varGamma _1^{qq},\) and fundamentally different from the off-diagonal term \(\varGamma _1^{qg}.\) The operators can be compared in Fig. 12, where it is apparent that the impact of the shadow D-term related to \(d_1^g\) at NLO remains smaller than that of \(d_3^{uds}.\) Therefore \(\sigma _{d1q}\) remains less affected by the inclusion of a free NLO \(d_1^g\) than by a LO or NLO \(d_3^{uds}.\)

6 An EIC perspective

In Sect. 4, we derived a simple estimate of the uncertainty of \(d_1^q\) and \(d_3^q\) when they are fitted jointly in a LO framework with simplified evolution over a range \([Q^2_{min}, Q^2_{max}].\) As we studied NLO fits in Sect. 5, we extended the concept and started to consider the case of explicit gluonic degrees of freedom. Let us give here final general expressions and apply them on a kinematic range relevant for the EIC.

We assume that the contribution of heavy quarks always remains negligible in the subtraction constant and that all three light flavors contribute with the same \(d_n^{uds}.\) In the absence of appropriate heavy quark mass effects in the gluon coefficient function, we fix \(n_e = \sum _q e^2_q = 10/9.\) Then, at a given value of t, we recall that the subtraction constant at NLO truncated to the Gegenbauer moments \(n = 3\) reads:

$$\begin{aligned}&S \overset{\textrm{NLO}}{=}\frac{2}{3} d_1^{uds} \left( 4 - \frac{4}{9}\frac{\alpha _s C_F}{4 \pi }\right) -\frac{172}{9}\frac{n_e \alpha _s T_F}{4\pi } d_1^g \nonumber \\&\quad + \frac{2}{3} d_3^{uds} \left( 4 + \frac{14759}{450}\frac{\alpha _s C_F}{4 \pi }\right) - \frac{3317}{150} \frac{n_e \alpha _s T_F}{4\pi } d_3^g \end{aligned}$$
(104)

where \(S,\) \(d_n^a\) and \(\alpha _s\) all have an implicit dependence on \(Q^2.\)

First let us study the impact of EIC kinematics on the free extraction of a gluon contribution with \(n=1\) only. Following the reasoning of Sect. 4, we cancel the subtraction constant at some reference scale:

$$\begin{aligned} 0 = a(\mu _0^2) d_1^{uds}(\mu _0^2) + b(\mu _0^2) d_1^g(\mu _0^2), \end{aligned}$$
(105)

where the coefficients a and b are read straightforwardly from the first line of Eq. (104). We choose \(\mu _0^2 = m_c^2,\) which will also serve as the \(Q^2_{min}\) of our data. Then at any scale, we have:

$$\begin{aligned} S(Q^2)&= a(Q^2) (\varGamma _1^{qq}(Q^2, \mu _0^2)d_1^{uds}(\mu _0^2)\nonumber \\&\quad + \varGamma _1^{qg}(Q^2, \mu _0^2) d_1^{g}(\mu _0^2)) \nonumber \\&\quad + b(Q^2) (\varGamma _1^{gq}(Q^2, \mu _0^2)d_1^{uds}(\mu _0^2) \nonumber \\&\quad + \varGamma _1^{gg}(Q^2, \mu _0^2)d_1^{g}(\mu _0^2)). \end{aligned}$$
(106)

Assuming that a quantity \(\varDelta S\) represents the typical experimental uncertainty on the subtraction constant for \(Q^2 \in [\mu _0^2, Q^2_{max}]\) where \(Q^2_{max}\) is the largest scale where the subtraction constant is extracted reliably, we find:

$$\begin{aligned} \sigma _{d1q}(\mu _0^2)&= \varDelta S \times \bigg | a(Q_{max}^2) \varGamma _1^{qq} - \frac{a(Q_{max}^2)a(\mu _0^2)}{b(\mu _0^2)} \varGamma _1^{qg} \nonumber \\&\quad + b(Q_{max}^2) \varGamma _1^{gq} - \frac{a(\mu _0^2)b(Q_{max}^2)}{b(\mu _0^2)} \varGamma _1^{gg}\bigg | ^{-1}, \end{aligned}$$
(107)

where all the \(\varGamma _n\) operators have an implicit argument \((Q_{max}^2, \mu _0^2).\) Straightforwardly,

$$\begin{aligned} \sigma _{d1g}(\mu _0^2) = \left| \frac{a(\mu _0^2)}{b(\mu _0^2)}\right| \sigma _{d1q}(\mu _0^2) . \end{aligned}$$
(108)

We depict in Fig. 13 the value of this estimate of the precision using \(\varDelta S = 3,\) that is considering that the current precision on the subtraction constant is extended to a much larger range in \(Q^2.\) The current estimated precision with the CFF extraction used in this study is depicted by the red star. One observes that constraining the CFFs up to \(Q^2 = 20~\hbox {GeV}^2\) could reduce the uncertainty by a factor 3 to 4. Let us also notice that our estimator predicts very accurately that \(\sigma _{d1g} \approx 10\sigma _{d1q},\) a relation that is indeed accurately verified in our actual fit.
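Eqs. (107)–(108) are straightforward to implement once the coefficients a, b of Eq. (104) and the evolution operators \(\varGamma _1\) are available; in the sketch below they are plain function arguments, since their actual values would come from a numerical evolution code:

```python
def sigma_d1q(delta_s, a0, b0, a_max, b_max, g_qq, g_qg, g_gq, g_gg):
    """Eq. (107): uncertainty on d1^{uds}(mu_0^2) induced by a shadow D-term
    with a free gluon n = 1 moment.  a0, b0 are a(mu_0^2), b(mu_0^2);
    a_max, b_max are a(Q_max^2), b(Q_max^2); the g_* arguments are the
    evolution operators Gamma_1(Q_max^2, mu_0^2)."""
    denom = abs(a_max * g_qq - a_max * a0 / b0 * g_qg
                + b_max * g_gq - a0 * b_max / b0 * g_gg)
    return delta_s / denom

def sigma_d1g(delta_s, a0, b0, a_max, b_max, g_qq, g_qg, g_gq, g_gg):
    """Eq. (108): sigma_{d1g} = |a(mu_0^2) / b(mu_0^2)| sigma_{d1q}."""
    return abs(a0 / b0) * sigma_d1q(delta_s, a0, b0, a_max, b_max,
                                    g_qq, g_qg, g_gq, g_gg)
```

With the NLO coefficients estimated in Sect. 5.4, \(a \approx 2.65\) and \(b \approx -0.27,\) Eq. (108) gives \(|a/b| \approx 10,\) which is precisely the relation \(\sigma _{d1g} \approx 10\,\sigma _{d1q}\) observed above.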

Fig. 13

Evolution of the uncertainty of \(d_1^{uds}\) and \(d_1^g,\) when the latter is a free parameter, as a function of the range of \(Q^2\) available for the extraction of CFFs. The red star denotes approximately the current situation

Now we study the impact of EIC kinematics on the extraction of both \(d_1^{uds}\) and \(d_3^{uds}.\) We will assume that the gluon part of the shadow D-term is 0 at the reference scale. Then:

$$\begin{aligned} 0 = a(\mu _0^2) d_1^{uds}(\mu _0^2) + c(\mu _0^2) d_3^{uds}(\mu _0^2), \end{aligned}$$
(109)

and

$$\begin{aligned} S(Q^2)&= [a(Q^2)\varGamma _1^{qq}(Q^2, \mu _0^2) +b(Q^2)\varGamma _1^{gq}(Q^2, \mu _0^2)] \nonumber \\&\quad \times d_1^{uds}(\mu _0^2) + [c(Q^2)\varGamma _3^{qq}(Q^2, \mu _0^2)\nonumber \\&\quad +d(Q^2)\varGamma _3^{gq}(Q^2, \mu _0^2)] \nonumber \\&\quad \times d_3^{uds}(\mu _0^2). \end{aligned}$$
(110)

Combining the two expressions gives the estimator:

$$\begin{aligned} \sigma _{d1q}(\mu _0^2)&= \varDelta S \times \bigg |a(Q^2_{max})\varGamma _1^{qq} + b(Q^2_{max}) \varGamma _1^{gq} \nonumber \\&\quad - \frac{c(Q^2_{max})a(\mu _0^2)}{c(\mu _0^2)} \varGamma _3^{qq} - \frac{d(Q^2_{max})a(\mu _0^2)}{c(\mu _0^2)} \varGamma _3^{gq} \bigg |^{-1}. \end{aligned}$$
(111)

We produce the corresponding plot in Fig. 14. Constraining the CFFs up to \(Q^2 = 20\) \(\hbox {GeV}^2\) would now result in a decrease of uncertainty by a factor 2 to 3. We recall that we have only considered here shadow D-terms with no explicit gluon contribution, which corresponds to the uncertainty of the fits we performed earlier with radiative gluons. Adding the freedom of explicit gluon contributions would increase the uncertainty yet further, combining the effects of Figs. 13 and 14.
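The corresponding estimator (111) for the joint \(d_1^{uds}\)–\(d_3^{uds}\) fit follows the same pattern (again a sketch, with the coefficients of Eq. (104) and the evolution operators as hypothetical inputs):

```python
def sigma_d1q_n3(delta_s, a0, c0, a_max, b_max, c_max, d_max,
                 g1_qq, g1_gq, g3_qq, g3_gq):
    """Eq. (111): uncertainty on d1^{uds}(mu_0^2) from a quark-only shadow
    D-term mixing the n = 1 and n = 3 moments.  a, b, c, d are the four
    coefficients of Eq. (104) at mu_0^2 (suffix 0) or Q_max^2 (suffix max);
    the g arguments are the operators Gamma_n(Q_max^2, mu_0^2)."""
    denom = abs(a_max * g1_qq + b_max * g1_gq
                - c_max * a0 / c0 * g3_qq - d_max * a0 / c0 * g3_gq)
    return delta_s / denom
```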

Fig. 14

Evolution of the uncertainty of \(d_1^{uds}\) and \(d_3^{uds}\) depending on the range of \(Q^2\) available for the extraction of CFFs, considering only the role of shadow D-terms with no explicit gluon contribution. The red and green stars denote approximately the current situation obtained when fitting with radiative gluons

However, let us stress that our estimator only focuses on the impact of the measured range of scales \(Q^2.\) The EIC will also bring precious high-quality data in regions of \(\xi \) which are poorly constrained so far, which will likely decrease the value of \(\varDelta S.\) More work remains to be done to estimate this impact.

7 Conclusion

We have provided a re-derivation of dispersion relations for DVCS at all orders in perturbation theory. Our presentation highlights that dispersion relations contain far more information than the commonly acknowledged restriction to the D-term. In fact, Eq. (43) along with Eq. (31) gives a very synthetic picture of the information that can possibly be extracted from an arbitrary knowledge of CFFs at a given scale. We have then stressed that the scale dependence of measurements is crucial to mitigate the issues related to the deconvolution problem. An intuitive presentation in terms of shadow D-terms allows one to construct simple estimates of the impact on the uncertainty of the range in scales over which DVCS is measured.

Using those tools, we have re-considered the dispersion relation global analysis of Ref. [7]. We have proposed a new statistical method aimed at giving a sounder account of correlated fits in the presence of outliers. Our re-analysis of the LO D-term however gives results very similar to those of the previous publication. We extend the analysis to NLO. We find that the NLO results are generally fairly similar to the LO ones. The major modification comes when an explicit gluon D-term is allowed to be freely fitted on the data. Then we find at NLO a very large correlation between the fitted quarks and gluons that does not exist at LO. We provide an explanation for this fact, linked to the similarity of the scale dependence of the evolution operators.

Using the simple formalism of shadow D-terms, we finally establish generic estimates of the reduction of uncertainty that one could expect from the range of scales probed at an EIC. We find that extending the region of measurements, with a statistical accuracy similar to the current best measurements, up to 20 \(\hbox {GeV}^2\) could bring a reduction of uncertainty in the deconvolution problem by a factor 2 to 5 depending on the quantity of interest. One should however keep in mind that the EIC will additionally reduce the statistical uncertainty on the subtraction constant, by measuring CFFs at values of Bjorken-x where they are poorly constrained so far. Therefore, the potential EIC impact on our experimental knowledge of \(d_1\) could be larger and remains to be fully estimated. In the meantime, we expect the knowledge of the pressure and shear forces within the nucleon to be mostly driven by lattice-QCD calculations.