1 Introduction

In the framework of the Standard Model, the fact that the proton and the neutron have nearly the same mass is explained as a consequence of an approximate symmetry: isospin [1]. The symmetry is broken explicitly because the two lightest quarks have neither the same charge nor the same mass. The violation of the symmetry is very weak, because the e.m. coupling e as well as the difference between \(m_u\) and \(m_d\) are small. The weak interaction provides the neutron mass with an imaginary part and generates a shift of the real part as well, but these effects are tiny and will be neglected. The Standard Model then reduces to QED + QCD. In that framework, the expansion of the mass difference between proton and neutron in powers of e starts with

$$\begin{aligned} m^p-m^n=m_{{\mathrm{QCD}}}+m_{\mathrm{QED}}+O(e^4) , \end{aligned}$$
(1)

where \(m_{{\mathrm{QCD}}}\) is what remains if e is turned off and is proportional to \(m_u-m_d\), while \(m_{\mathrm{QED}}\) stands for the term of order \(e^2\). It is well known that the splitting of the physical masses into an electromagnetic and a strong part is not unique. In our analysis, the ambiguity shows up through the scale of the logarithm occurring in the e.m. renormalization of the quark masses and will be discussed in detail.

As shown by Cottingham [2], the leading contribution of the e.m. interaction to the mass of a particle is given by an integral over the spin averaged forward Compton scattering amplitude,

$$\begin{aligned} T^{\mu \nu }(p,q)= {\small \frac{i}{2}}\!\int \!\! d^4x\,e^{i q\cdot x} \langle p|Tj^\mu (x) j^\nu (0)|p\rangle . \end{aligned}$$
(2)

This amplitude is determined by QCD. If the mass of a particle is expanded in powers of the e.m. coupling constant e, the explicit expression for the term of order \(e^2\) is formally given by

$$\begin{aligned} m_\gamma =\frac{ie^2}{2m(2\pi )^4}\int \!\!d^4q \,\frac{1}{q^2+i\epsilon } \,T^\mu _{\;\mu }(p,q) . \end{aligned}$$
(3)

There are two problems with this formula: (i) The short distance properties of QCD imply that the amplitude \(T^\mu _{\;\mu }(p,q)\) does not fall off rapidly enough at large values of q for the integral to converge. (ii) \(T^\mu _{\;\mu }(p,q)\) does not obey an unsubtracted dispersion relation – causality alone determines the Compton amplitude through the structure functions of lepton-nucleon scattering only up to a subtraction function. The asymptotic behaviour in the deep inelastic region is now fully understood on the basis of asymptotic freedom, but the properties of the subtraction function are still under debate.

Elitzur and Harari [3] pointed out that if the exchange of Reggeons correctly describes the asymptotic behaviour in the limit \(\nu \rightarrow \infty \) at fixed \(q^2\) – an assumption we refer to as Reggeon dominance – then the subtraction function obeys a sum rule which fully determines it through the cross section of lepton-nucleon scattering. Their paper appeared in 1970, at a time when the origin of the \(\varDelta I=1\) mass differences within the isospin multiplets was totally mysterious: evaluations of the Cottingham formula invariably led to the conclusion that the proton should be heavier than the neutron and hence unstable.

Gasser and Leutwyler [4] then showed that the mystery disappears if the popular conviction, according to which the strong interaction conserves isospin, is dismissed. They showed that a coherent picture of isospin breaking can be reached within the Quark Model, provided the masses of the two lightest quarks are not only very small but also very different. At that time, the experimental results on deep inelastic scattering were consistent with the scaling laws of Bjorken [5]. The implications of Reggeon dominance were worked out in this framework, using models to compensate for the lack of experimental information in part of phase space, with the result \(m_{\mathrm{QED}}=0.7\pm 0.3\,\text {MeV}\) [4].

Lattice calculations of the proton–neutron mass difference are very demanding and became feasible only in the 21st century. Early calculations were consistent with the result obtained from the Cottingham formula, but more recent evaluations indicate higher values for \(m_{\mathrm{QED}}\) – we will compare the available results with the outcome of our calculation in Sect. 23.

Walker-Loud et al. [6] performed a new evaluation of the Cottingham formula. They claimed that the analysis in [4] is inconsistent and replaced our sum rule by a model where the subtraction function \(T_1(0,q^2)\) is parametrized with a simple algebraic formula. This paper triggered renewed interest and several authors investigated the matter [7,8,9,10]. We will discuss these works in Sect. 24. A critical examination of some of the claims made in [6] can be found in appendix E of [11] and in [12, 13].

The discovery of QCD and asymptotic freedom led to a fully transparent picture for the properties of the Compton amplitude in the region where both \(\nu \) and \(q^2\) are large and where the divergence of the Cottingham formula arises [14,15,16,17]. In Ref. [18], we showed that the formal relation (3) can be rewritten in such a way that the divergences are under full theoretical control, exclusively concern the contribution from the subtraction function and are absorbed in the e.m. renormalization of quark masses and QCD coupling constant. The aim of the present paper is to describe the analysis underlying these statements in detail.

The presentation is organized as follows. In a first part, Sects. 2–7, we discuss the mathematical underpinnings: decomposition of the Compton amplitude, dispersion relations, sum rule for the subtraction function, Wick rotation, mass formulae. The second part, Sects. 8–13, deals with the operator product expansion, which governs the behaviour of the amplitudes at large momenta. The renormalization of the mass difference is discussed in Sects. 14 and 15, whereas the data concerning the structure functions used in our work and the numerical determination of the subtraction function and of the mass difference are described in Sects. 16–22. Sections 23 and 24 compare the outcome of our analysis with results obtained on the lattice and with other recent evaluations of the Cottingham formula. A summary and conclusions are provided in Sect. 25. The Appendices contain material concerning the operator product expansion as well as a detailed derivation of the sum rule for the subtraction function that plays a central role in the present work.

2 Lorentz invariance, kinematic zeros

Causality ensures that the time-ordered amplitude is unique up to contact terms and the ambiguity can be fixed in such a manner that \(T^{\mu \nu }(p,q)\) is Lorentz covariant. Together with symmetry under space reflections, this property implies that the Compton amplitude can be decomposed as

$$\begin{aligned} T^{\mu \nu }(p,q)= & {} A\, g^{\mu \nu }+B\, p^\mu p^\nu +C p^\mu q^\nu +C' p^\nu q^\mu \nonumber \\&+D\, q^\mu q^\nu , \end{aligned}$$
(4)

where \(A,B,C,C',D\) only depend on the two variables \(q^2\) and \(\nu =p\cdot q/m\). Current conservation imposes the constraints

$$\begin{aligned} A+ m \nu \, C+q^2D=0 ,\quad m \nu \,B+q^2C=0 ,\quad C'=C . \end{aligned}$$
(5)

Since the physical spectrum of QCD does not contain massless particles, the amplitude \(T^{\mu \nu }(p,q)\) cannot have a pole at \(q^2=0\). Hence the second relation shows that B vanishes for \(q^2=0\) and can therefore be represented as \(B=-q^2 T_2/m^2\). Setting \(T_1=D\) and solving the constraints (5) for A and C, this leads to the decomposition

$$\begin{aligned} T^{\mu \nu }(p,q)= & {} T_1(\nu ,q^2)K_1^{\mu \nu } +T_2(\nu ,q^2)K_2^{\mu \nu } ,\nonumber \\ K_1^{\mu \nu }= & {} q^\mu q^\nu -g^{\mu \nu }q^2,\nonumber \\ K_2^{\mu \nu }= & {} \frac{1}{m^2}\{(p^\mu q^\nu +p^\nu q^\mu )p\cdot q -g^{\mu \nu }(p\cdot q)^2\nonumber \\&-p^\mu p^\nu q^2\}. \end{aligned}$$
(6)
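
The transversality of the two kinematic tensors, \(q_\mu K_1^{\mu \nu }=q_\mu K_2^{\mu \nu }=0\), which is what current conservation requires of the decomposition (6), can be checked numerically. The following is a minimal sketch, assuming the metric signature \((+,-,-,-)\) and an arbitrary photon momentum; the helper names `dot`, `kinematic_tensors` and `contract_with_q` are introduced here for illustration only.

```python
import random

G = [1.0, -1.0, -1.0, -1.0]  # diagonal of the metric, signature (+,-,-,-) assumed

def dot(a, b):
    """Minkowski product a.b of two contravariant four-vectors."""
    return sum(G[i] * a[i] * b[i] for i in range(4))

def kinematic_tensors(p, q):
    """Return K1^{mu nu} and K2^{mu nu} of Eq. (6) as 4x4 nested lists."""
    m2, q2, pq = dot(p, p), dot(q, q), dot(p, q)
    g = [[G[i] if i == j else 0.0 for j in range(4)] for i in range(4)]
    K1 = [[q[i] * q[j] - g[i][j] * q2 for j in range(4)] for i in range(4)]
    K2 = [[((p[i] * q[j] + p[j] * q[i]) * pq - g[i][j] * pq ** 2
            - p[i] * p[j] * q2) / m2 for j in range(4)] for i in range(4)]
    return K1, K2

def contract_with_q(K, q):
    """q_mu K^{mu nu}: lower the index of q with the metric, then sum over mu."""
    return [sum(G[i] * q[i] * K[i][j] for i in range(4)) for j in range(4)]

random.seed(0)
m = 0.938                      # nucleon mass in GeV, illustrative value
p = [m, 0.0, 0.0, 0.0]         # nucleon at rest
q = [random.uniform(-2.0, 2.0) for _ in range(4)]  # arbitrary photon momentum
K1, K2 = kinematic_tensors(p, q)
residual = max(abs(x) for K in (K1, K2) for x in contract_with_q(K, q))
print(residual)                # vanishes up to rounding errors
```

Since both tensors are separately transverse, no constraint remains on the invariant amplitudes \(T_1\) and \(T_2\).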

Crossing symmetry, \(T^{\mu \nu }(p,q)=T^{\nu \mu }(p,-q)\), implies that \(T_1\) and \(T_2\) are even in \(\nu \).

A popular alternative decomposition identifies the two independent amplitudes instead with \(\hat{T}_1=-A\) and \(\hat{T}_2=m^2B\). It is related to the one specified above by

$$\begin{aligned} \hat{T}_1=q^2T_1+\nu ^2 T_2 ,\quad \hat{T}_2=-q^2 T_2 . \end{aligned}$$
(7)

The problem with this choice is that, in contrast to \(T_1, T_2\), the amplitudes \(\hat{T}_1,\hat{T}_2\) contain kinematic zeros. This makes it difficult to determine their asymptotic behaviour. That is important because analytic functions are fully determined by their singularities only if the asymptotic behaviour is known. In dispersion theory, theoretical constraints are needed to determine the asymptotic properties of the amplitudes.

To illustrate the problems encountered when working with amplitudes that are not free of kinematic zeros, consider the “Born terms”, i.e. the poles generated by the one-particle intermediate states. Their residues are determined by the elastic form factors of the nucleon. The Cauchy formula implies that an analytic function of the variable z is determined uniquely by its singularities (poles, cuts) and by its behaviour for \(z\rightarrow \infty \). The amplitudes \(T_1\) and \(T_2\) are analytic in \(\nu \) at fixed \(q^2\). The Born terms concern the contributions from the nucleon poles at \(\nu =\pm \,q^2/2m\). They are fixed uniquely by the requirement that they disappear for \(\nu \rightarrow \infty \) [11]:

$$\begin{aligned} T_1^{\text {el}}(\nu ,q^2)= & {} \frac{4m^2q^2\{G_E^2(q^2)-G_M^2(q^2)\}}{\{4m^2\nu ^2-(q^2+i\epsilon )^2\}(4m^2-q^2)} , \nonumber \\ T_2^{\text {el}}(\nu ,q^2)= & {} -\frac{4m^2\{4m^2G_E^2(q^2)-q^2G_M^2(q^2)\}}{\{4m^2\nu ^2-(q^2+i \epsilon )^2\}(4m^2-q^2)} . \end{aligned}$$
(8)

For notation, in particular also for the definition of the Sachs form factors \(G_E\) and \(G_M\), we refer to [11].

For the alternative decomposition (7), the elastic part of \(\hat{T}_1\) does not disappear when \(\nu \rightarrow \infty \). In terms of Regge poles, the elastic part of \(\hat{T}_1\) contains a fixed pole at \(\alpha = 0\), with a residue that is determined by the nucleon form factors: \(\hat{T}_1\) picks up asymptotic contributions that do not have anything to do with the phenomena that dominate the high energy behaviour of the Compton amplitude – they merely reflect the fact that the amplitude \(\hat{T}_1\) contains kinematic zeros. It is not advisable to work with such amplitudes – for further discussion of the problems encountered in the presence of kinematic zeros, we refer to [13, 19, 20].

As pointed out in the letter [18], the operator product expansion shows that, up to normalization, the leading spin 2 contributions in \(T_1\) and \(T_2\) are the same: in the combination

$$\begin{aligned} \bar{T}(\nu ,q^2)=T_1(\nu ,q^2)+ {\frac{1}{2}}T_2(\nu ,q^2) , \end{aligned}$$
(9)

these contributions drop out. For this reason, the analysis of the asymptotic behaviour simplifies considerably if the pair \(T_1\), \(T_2\) is replaced by the pair \(\bar{T}\), \(T_2\), which is also free of kinematic zeros.

3 Dispersion relations

The dispersion relations express the Compton amplitude in terms of the structure functions. These represent the Fourier transform of the current commutator:

$$\begin{aligned} V^{\mu \nu }(p,q)= & {} \frac{1}{4\pi }\!\int \!\!d^4 x e^{iq\cdot x} \langle p|[j^\mu (x),j^\nu (0)]|p\rangle ,\nonumber \\ V^{\mu \nu }(p,q)= & {} V_1(\nu ,q^2)K_1^{\mu \nu } +V_2(\nu ,q^2)K_2^{\mu \nu } . \end{aligned}$$
(10)

The structure functions are experimentally accessible only for \(q^2\le 0\) and it is customary to replace \(q^2\) by \(Q^2\equiv -q^2\). In the standard notation, where the structure functions are denoted by \(F_1(x,Q^2)\), \(F_2(x,Q^2)\) with \(x=Q^2/(2m\nu )\), \(V_1\) and \(V_2\) are given by:

$$\begin{aligned} V_1= & {} \frac{F_L}{2x Q^2} ,\quad V_2=\frac{F_2}{2x \nu ^2} ,\nonumber \\ F_L= & {} F_2-2x F_1 . \end{aligned}$$
(11)

For \(\bar{T}\), the structure function \(\bar{V}=V_1+\frac{1}{2}V_2\) is relevant:

$$\begin{aligned} \bar{V}= & {} \frac{\bar{F}}{2x Q^2} ,\nonumber \\ \bar{F}= & {} F_L+\frac{2m^2 x^2}{Q^2}F_2 . \end{aligned}$$
(12)
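
The relation \(\bar{V}=V_1+\frac{1}{2}V_2\) follows from Eqs. (11) and (12) once \(\nu \) is eliminated in favour of x. A minimal sketch of this bookkeeping is given below; the function name `structure_functions` and the input values are arbitrary test numbers chosen for illustration, not data.

```python
def structure_functions(F1, F2, x, Q2, m):
    """Map F_1, F_2 at given x and Q^2 onto V_1, V_2 and V-bar, Eqs. (11), (12)."""
    nu = Q2 / (2.0 * m * x)               # from x = Q^2 / (2 m nu)
    FL = F2 - 2.0 * x * F1                # longitudinal structure function
    V1 = FL / (2.0 * x * Q2)
    V2 = F2 / (2.0 * x * nu ** 2)
    Fbar = FL + 2.0 * m ** 2 * x ** 2 / Q2 * F2
    Vbar = Fbar / (2.0 * x * Q2)
    return V1, V2, Vbar

# Arbitrary test values (assumption): m in GeV, Q2 in GeV^2, F1 and F2 dimensionless.
V1, V2, Vbar = structure_functions(F1=0.8, F2=0.35, x=0.2, Q2=2.5, m=0.938)
mismatch = abs(Vbar - (V1 + 0.5 * V2))
print(mismatch)                           # vanishes up to rounding errors
```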

We assume that the Compton amplitude exhibits Regge behaviour for \(\nu \rightarrow \infty \): \(\bar{T}\propto \nu ^\alpha \), \(T_2\propto \nu ^{\alpha -2}\). Accordingly, the dispersion relation for \(\bar{T}\) requires a subtraction while \(T_2\) obeys an unsubtracted dispersion relation:

$$\begin{aligned} \bar{T}(\nu ,q^2)= & {} \bar{S}(q^2)+\bar{T}^{\text {el}}(\nu ,q^2)\nonumber \\&+2(\nu ^2-\nu _0^2) \!\int _{\nu _\mathrm{th}}^\infty \nu 'd\nu '\frac{\bar{V}(\nu ',q^2)}{(\nu '^2-\nu _0^2)(\nu '^2-\nu ^2-i\epsilon )} ,\nonumber \\ \end{aligned}$$
(13)
$$\begin{aligned} T_2(\nu ,q^2)= & {} T_2^{\text {el}}(\nu ,q^2)+2\!\int _{\nu _\mathrm{th}}^\infty d\nu ' \nu ' \frac{ V_2(\nu ',q^2)}{\nu '^2-\nu ^2-i\epsilon } . \end{aligned}$$
(14)

\(\bar{S}(q^2)\) represents the subtraction function, \(\nu _0^2\) is the subtraction point in the variable \(\nu ^2\) and the lower limit corresponds to the threshold for inelastic reactions, \(\nu _\mathrm{th}=(2m M_\pi +M_\pi ^2-q^2)/(2m)\). The elastic part of \(\bar{T}\) is given by \(\bar{T}^{\text {el}}=T_1^{\text {el}}+\frac{1}{2}T_2^{\text {el}}\).

As such, the choice of the subtraction point is arbitrary (provided that \(\nu _0^2<\nu _\mathrm{th}^2\)), but as pointed out in [18], it is convenient to set \(\nu _0^2=-\frac{1}{4}Q^2\) rather than to subtract at \(\nu _0=0\). As will be seen below, this choice simplifies the asymptotic behaviour of the subtraction function for \(Q^2\rightarrow \infty \). Replacing the variable of integration \(\nu '\) by \(x=Q^2/(2m \nu ')\), the dispersive representation then takes the form

$$\begin{aligned}&\bar{T}(\nu ,-Q^2)=\bar{S}(-Q^2)+\bar{T}^{\text {el}}(\nu ,-Q^2) \nonumber \\&\qquad +\, ( Q^2+ 4\nu ^2)\!\int _0^{x_\mathrm{th}}dx\, \frac{m^2 \bar{F}(x,Q^2)}{(Q^2+m^2 x^2) (Q^4-4m^2 x^2 \nu ^2-i\epsilon )} ,\nonumber \\&T_2(\nu ,-Q^2)\nonumber \\&\quad = T_2^{\text {el}}(\nu ,-Q^2)+\!\int _{0}^{x_\mathrm{th}}dx\frac{4m^2F_2(x,Q^2)}{Q^4-4m^2 x^2\nu ^2-i\epsilon } . \end{aligned}$$
(15)

with \(x_\mathrm{th}=Q^2/(Q^2+2m M_\pi +M_\pi ^2)\).
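
The change of variables leading from Eq. (13) to Eq. (15) can be verified pointwise: the integrand in \(\nu '\), multiplied by \(|d\nu '/dx|=Q^2/(2m x^2)\), must coincide with the integrand in x. The sketch below does this for a space-like value of \(\nu ^2\) and also checks that \(x_\mathrm{th}\) is the image of \(\nu _\mathrm{th}\). The numerical values of the masses and of \(\bar{F}\) are illustrative assumptions, and the function names are ours.

```python
M_N = 0.938272   # nucleon mass in GeV (illustrative value)
M_PI = 0.13957   # pion mass in GeV (illustrative value)

def nu_threshold(Q2, m=M_N, mpi=M_PI):
    """Inelastic threshold in nu for q^2 = -Q^2."""
    return (2.0 * m * mpi + mpi ** 2 + Q2) / (2.0 * m)

def x_threshold(Q2, m=M_N, mpi=M_PI):
    """Upper limit of the x integration in Eq. (15)."""
    return Q2 / (Q2 + 2.0 * m * mpi + mpi ** 2)

def integrand_nu(nup, nu2, Q2, Fbar, m=M_N):
    """Integrand of Eq. (13) in nu', subtracted at nu_0^2 = -Q^2/4."""
    x = Q2 / (2.0 * m * nup)
    Vbar = Fbar / (2.0 * x * Q2)          # Eq. (12)
    nu02 = -0.25 * Q2
    return 2.0 * (nu2 - nu02) * nup * Vbar / ((nup ** 2 - nu02) * (nup ** 2 - nu2))

def integrand_x(x, nu2, Q2, Fbar, m=M_N):
    """Integrand of the first relation in Eq. (15)."""
    return (Q2 + 4.0 * nu2) * m ** 2 * Fbar / (
        (Q2 + m ** 2 * x ** 2) * (Q2 ** 2 - 4.0 * m ** 2 * x ** 2 * nu2))

Q2, nu2, Fbar = 2.0, -0.3, 0.1            # arbitrary test point, nu^2 < 0
x = 0.5 * x_threshold(Q2)
nup = Q2 / (2.0 * M_N * x)
jacobian = Q2 / (2.0 * M_N * x ** 2)      # |d nu' / d x|
lhs = integrand_nu(nup, nu2, Q2, Fbar) * jacobian
rhs = integrand_x(x, nu2, Q2, Fbar)
rel_diff = abs(lhs - rhs) / abs(rhs)
threshold_gap = abs(x_threshold(Q2) - Q2 / (2.0 * M_N * nu_threshold(Q2)))
print(rel_diff, threshold_gap)
```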

4 Reggeon dominance

While \(T_2\) is fully determined by the form factors and the structure functions because it obeys an unsubtracted dispersion relation, the representation for \(\bar{T}\) involves a subtraction function, which causality alone leaves undetermined. This illustrates a venerable theorem which concerns the implications of causality for the structure functions [21,22,23]. The theorem states that the values of \(\bar{V}(\nu ,q^2)\), \(V_2(\nu ,q^2)\) in the space-like region \(q^2\le 0\) determine these functions in the time-like region, up to a polynomial in the variable \(\nu \). The implications for the dispersive analysis of the Compton amplitude are discussed in [24].

In Regge language, integer powers of \(\nu \) are called fixed poles: the continuation from the space-like to the time-like region is unique up to fixed poles. Regge asymptotics excludes such contributions in \(V_2\), but the continuation of \(\bar{V}\) into the time-like region is unique only up to a term that depends on \(\nu \) exclusively through the sign function \(\epsilon (\nu )\):

$$\begin{aligned} \bar{V}^{\mathrm{fp}}(\nu ,q^2)=\epsilon (\nu )\sigma (q^2) , \end{aligned}$$
(16)

where \(\sigma (s)\) vanishes for \(s<0\). In \(\bar{T}\), the ambiguity shows up in the form

$$\begin{aligned} \bar{T}^{\mathrm{fp}}(q^2)=\int _0^\infty \!\! ds\,\frac{\sigma (s)}{s-q^2-i\epsilon } , \end{aligned}$$
(17)

which is independent of \(\nu \) and thus only affects the subtraction function. Since the ambiguity amounts to a superposition of free propagators, it is evident that the Fourier transform of \(\bar{V}^{\mathrm{fp}}\) vanishes outside the light-cone. The nontrivial part of the theorem is that the values of the structure functions in the space-like region, where they can be measured, fully determine the amplitude in the time-like region up to contributions of this particular type.

In QED, the electrons reggeize [25], but the photon remains elementary [26]. QCD, however, does satisfy the criteria for reggeization formulated in [26]: the gluons as well as the quarks have this property [27, 28]. In the meantime, the graphs that need to be summed up to study the high energy properties of the scattering amplitudes within QCD perturbation theory have been identified and Reggeon field theory has been developed for the analysis of exchanges of more than one Reggeon, in particular also of the cuts generated by these [29,30,31,32,33,34,35].

It is generally assumed that the asymptotic behaviour of the Compton amplitude is indeed governed by Reggeon exchange. The exchange of Reggeons generates contributions which at high energies are of the form

$$\begin{aligned} \bar{T}^{\mathrm{R}}(\nu ,q^2)=-\sum _{\alpha >0}\frac{\pi \beta _\alpha (q^2)}{\sin \pi \alpha }\{(-s)^{\alpha }+(-u)^\alpha \} , \end{aligned}$$
(18)

where \(s =m^2+ 2m \nu +q^2\) and \(u=m^2-2m \nu +q^2\) represent the square of the centre of mass energy in the s- and u-channels, respectively. In general, the power \(\alpha \) depends on t: \(\alpha (t)\) moves on a Regge trajectory. In our context, however, only the forward scattering amplitude is of interest, so that only the intercept \(\alpha =\alpha (0)\) is relevant. For the Compton amplitude of the proton or the neutron, the Pomeron yields the dominant contribution; it involves a superposition of terms of the above form with intercepts in the vicinity of \(\alpha =1\). The Reggeon with the quantum numbers \(I^C=1^+\) and an intercept in the vicinity of \(\alpha =\frac{1}{2}\) represents the most important non-leading contribution. We refer to this Reggeon as the \(a_2\). In the difference between the amplitudes relevant for proton and neutron, the Pomeron drops out – the \(a_2\) represents the leading term.

We assume that the Reggeons dominate the asymptotic behaviour [4]:

$$\begin{aligned} \lim _{\nu \rightarrow \infty } (\bar{T}-\bar{T}^{\mathrm{R}})= 0 . \end{aligned}$$
(19)

This amounts to the assumption that the amplitude \(\bar{T}\) does not contain a fixed pole at \(\alpha =0\). We do not know of a physical phenomenon that could produce a fixed pole at \(\alpha =0\) in \(\bar{T}\). Neither causality nor the short-distance singularities nor the Reggeons generate terms of this sort.

The constraints imposed on the subtraction function by causality and unitarity have been analyzed within the alternative dispersive framework set up in [36]. Model-independent bounds for the subtraction function \(S_1(q^2)\) are derived and it is shown that the results obtained in [11] from Reggeon dominance at low values of \(Q^2\) are consistent with these. An extension of this work to the higher values of \(Q^2\) investigated in the present paper would be most welcome, as it would allow Reggeon dominance to be subjected to a further test. Model-independent bounds on the subtraction function \(\bar{S}(q^2)\) would be particularly interesting, because the operator product expansion of this quantity is free of the short distance singularities associated with operators of spin 2 – asymptotic freedom fully determines the asymptotic behaviour of \(\bar{S}\).

5 Sum rule for the subtraction function

As pointed out in [4], Reggeon dominance determines the subtraction function in terms of the cross sections of inelastic scattering. In [11], a sum rule for the subtraction function \(S_1(-Q^2)\equiv T_1(0,-Q^2)\) was derived that represents the inelastic part of this quantity in terms of integrals over the cross sections. We now derive an analogous sum rule that expresses the subtraction function \(\bar{S}(-Q^2)\) as an integral over the structure function \(\bar{F}(x,Q^2)\) – an immediate consequence of the dispersion relation (15) and Reggeon dominance (19). The derivation is not trivial, however, because the Reggeons generate singularities at \(x=0\). The leading singularity in \(\bar{F}\) is of the form:

$$\begin{aligned} \bar{F}^{\mathrm{R}}(x,Q^2)= & {} \sum _{\alpha >0} b_\alpha (Q^2)x^{1-\alpha } ,\nonumber \\ b_\alpha (Q^2)= & {} 2Q^{2(\alpha +1)}\beta _\alpha (-Q^2) . \end{aligned}$$
(20)

For this reason, taking the limit \(\nu \rightarrow \infty \) in the dispersion relation (15) requires some care: the limit cannot simply be exchanged with the dispersion integral

$$\begin{aligned}&\bar{T}^{\text {disp}}(\nu ,-Q^2) \nonumber \\&\quad =(Q^2+ 4\nu ^2)\!\int _0^{x_\mathrm{th}}dx\, \frac{m^2 \bar{F}(x,Q^2)}{(Q^2+m^2 x^2) (Q^4-4m^2 x^2 \nu ^2-i\epsilon )} ,\nonumber \\ \end{aligned}$$
(21)

because, in the vicinity of \(x=0\), the term \(4m^2 x^2\nu ^2\) fails to dominate over \(Q^4\). Since the elastic part of the amplitude tends to zero for \(\nu \rightarrow \infty \), the Reggeon dominance hypothesis (19) amounts to the requirement that the subtraction function cancels the limiting value of the difference between the dispersion integral and the asymptotic representation:

$$\begin{aligned} \bar{S}=-\lim _{\nu \rightarrow \infty } (\bar{T}^{\text {disp}}-\bar{T}^{\mathrm{R}}) . \end{aligned}$$
(22)

The limit is worked out in Appendix C. The result takes the form of a sum rule that expresses the subtraction function \(\bar{S}\) in terms of the structure function \(\bar{F}\):

$$\begin{aligned} Q^2\bar{S}(-Q^2)= & {} \int _0^{x_\mathrm{th}}dx\, \frac{\bar{F}(x,Q^2)-\bar{F}^{\mathrm{R}}(x,Q^2)}{x^2}\nonumber \\&- \sum _{\alpha >0}\frac{b_\alpha (Q^2)}{\alpha \, x_\mathrm{th}^\alpha }- m^2 \int _0^{x_\mathrm{th}}dx\, \frac{\bar{F}(x,Q^2)}{Q^2+m^2 x^2} . \end{aligned}$$
(23)

A finite energy sum rule variant of this relation was proposed by Elitzur and Harari [3]. The above formulation shows that the sum rule is perfectly consistent with the scaling violations required by QCD – contrary to statements made in [17].
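
The structure of the sum rule (23) can be illustrated with a toy model. In the sketch below we assume, purely for illustration, a single Reggeon with intercept \(\alpha =\frac{1}{2}\) plus a regular remainder, \(\bar{F}=b\,x^{1-\alpha }+c\,x^2\); the parameters b and c, the masses and the helper names are hypothetical and do not represent data. The point of the example is that the first integral converges at \(x=0\) only because the Reggeon term is subtracted, while the explicit term \(-b/(\alpha \,x_\mathrm{th}^\alpha )\) accounts for what has been removed.

```python
def midpoint(func, a, b, n=20000):
    """Composite midpoint rule; never evaluates func at the end points."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def sum_rule(Q2, m, x_th, alpha, b, c):
    """Three pieces of the right-hand side of Eq. (23) for the toy model."""
    Fbar = lambda x: b * x ** (1.0 - alpha) + c * x ** 2   # toy structure function
    FbarR = lambda x: b * x ** (1.0 - alpha)               # its Reggeon part
    subtracted = midpoint(lambda x: (Fbar(x) - FbarR(x)) / x ** 2, 0.0, x_th)
    regge = -b / (alpha * x_th ** alpha)
    last = -m ** 2 * midpoint(lambda x: Fbar(x) / (Q2 + m ** 2 * x ** 2), 0.0, x_th)
    return subtracted, regge, last

# Hypothetical inputs: masses in GeV, Q2 in GeV^2, b and c dimensionless.
m, mpi, Q2 = 0.938, 0.140, 2.0
alpha, b, c = 0.5, 0.1, 0.3
x_th = Q2 / (Q2 + 2.0 * m * mpi + mpi ** 2)
subtracted, regge, last = sum_rule(Q2, m, x_th, alpha, b, c)
Q2_Sbar = subtracted + regge + last       # toy value of Q^2 * S-bar(-Q^2)
print(subtracted, regge, last, Q2_Sbar)
```

In this toy model the subtracted integral reduces to \(c\,x_\mathrm{th}\), which provides a simple check of the quadrature.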

6 Wick rotation

We now return to the mass formula (3). In the decomposition introduced above, the trace of the Compton amplitude is given by

$$\begin{aligned} T^\mu _{\;\mu }(p,q)=-\left\{ 3 q^2\bar{T}(\nu ,q^2)+2\left( \nu ^2- {\frac{1}{4}}q^2\right) T_2(\nu ,q^2)\right\} .\nonumber \\ \end{aligned}$$
(24)
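
Relation (24) can be cross-checked by contracting the tensors of Eq. (6) with the metric for arbitrary momenta and arbitrary values of the invariant amplitudes. The sketch below assumes the signature \((+,-,-,-)\); the numerical inputs are random test values and the function names are ours.

```python
import math
import random

G = [1.0, -1.0, -1.0, -1.0]  # diagonal of the metric, signature (+,-,-,-) assumed

def dot(a, b):
    """Minkowski product a.b of two contravariant four-vectors."""
    return sum(G[i] * a[i] * b[i] for i in range(4))

def trace_check(p, q, T1, T2):
    """Return the direct trace of T^{mu nu} and the right-hand side of Eq. (24)."""
    m2, q2, pq = dot(p, p), dot(q, q), dot(p, q)
    nu = pq / math.sqrt(m2)
    # g_{mu nu} K^{mu nu} only involves the diagonal elements K^{mu mu}
    trK1 = sum(G[i] * (q[i] * q[i] - G[i] * q2) for i in range(4))
    trK2 = sum(G[i] * (2.0 * p[i] * q[i] * pq - G[i] * pq ** 2
                       - p[i] * p[i] * q2) for i in range(4)) / m2
    direct = T1 * trK1 + T2 * trK2
    Tbar = T1 + 0.5 * T2                  # Eq. (9)
    formula = -(3.0 * q2 * Tbar + 2.0 * (nu ** 2 - 0.25 * q2) * T2)
    return direct, formula

random.seed(1)
m = 0.938                                 # nucleon mass in GeV, illustrative value
p3 = [random.uniform(-1.0, 1.0) for _ in range(3)]
p = [math.sqrt(m ** 2 + sum(c * c for c in p3))] + p3   # on-shell nucleon
q = [random.uniform(-2.0, 2.0) for _ in range(4)]       # arbitrary photon momentum
direct, formula = trace_check(p, q, T1=0.7, T2=-1.3)
print(direct, formula)
```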

The integration in (3) runs over all \(q^2\), space-like as well as time-like. For the integral to converge, it needs to be regularized, for instance by replacing the photon propagator \(1/q^2\) with \(\varLambda ^2/\{q^2(\varLambda ^2-q^2)\}\).

In the rest frame of the particle, \(\nu \) coincides with the component \(q^0\) of the photon momentum. Cottingham [2] observed that the time-ordered amplitude is analytic in \(q^0\) and that the path of integration over this variable may be rotated into the imaginary axis, at fixed three momentum \(\varvec{q}\) – without crossing any singularities of the integrand in (3) (Wick rotation).

Setting \(q^0=i Q_4\) and identifying \(Q_1,Q_2,Q_3\) with the space components of the physical momentum, \(m_\gamma \) takes the form of a euclidean integral extending over the four-vector \(Q_\mu \):

$$\begin{aligned} m_\gamma= & {} \frac{e^2}{2m(2\pi )^4} \int \frac{d^4Q}{Q^2}\frac{\varLambda ^2}{\varLambda ^2+Q^2}\nonumber \\&\quad \times \left\{ 3Q^2\bar{T} +2\left( Q_4^2- {\frac{1}{4}}Q^2\right) T_2\right\} . \end{aligned}$$
(25)

The result for the renormalized mass difference is independent of the form used for the regularization. It is customary to use a cutoff in momentum space: restrict the integration to the euclidean sphere \(Q^2\le \varLambda ^2\) and write the regularized Cottingham formula as

$$\begin{aligned} m_\gamma ^\varLambda =\frac{e^2}{2m(2\pi )^4} \int \limits _{Q^2\le \varLambda ^2} \frac{d^4Q}{Q^2}\left\{ 3Q^2\bar{T}+2\left( Q_4^2- {\frac{1}{4}}Q^2\right) T_2\right\} .\nonumber \\ \end{aligned}$$
(26)

In this formula, the amplitudes \(\bar{T}\) and \(T_2\) are to be evaluated at \(\nu =iQ_4\), \(q^2=-Q^2\).

7 Decomposition of the mass shift

In the framework of QCD+QED, the mass of a particle is determined by the bare parameters that occur in the Lagrangian and the cutoff used to regularize the theory. If the electromagnetic interaction is turned off, only the QCD coupling constant, the quark masses and the cutoff are relevant. To order \(e^2\), the e.m. interaction changes the mass not only by the above integral, but in addition by the contribution \(\varDelta m^\varLambda \), which arises from the change in the bare parameters needed for the mass of the particle to stay finite when the cutoff \(\varLambda \) is removed – the bare quantities depend on \(\varLambda \) as well as on e. The e.m. part of the mass is obtained by adding this contribution to the integral in Eq. (26):

$$\begin{aligned} m_{\mathrm{QED}}= \lim _{\varLambda \rightarrow \infty }\{m_\gamma ^\varLambda +\varDelta m^\varLambda \} . \end{aligned}$$
(27)

Inserting the dispersion relations (15) in formula (26), we obtain a representation of the e.m. part of the mass as a sum of four terms:

$$\begin{aligned} m_{\mathrm{QED}}= m_{\text {el}}+m_{\bar{F}}+m_{F_2}+m_{\bar{S}} . \end{aligned}$$
(28)

In each one of these, the integrals over the direction of the vector \(Q_\mu \) can be done explicitly.

In the first term, which collects the contributions from \(\bar{T}^{\text {el}}\) and \(T_2^{\text {el}}\), this leads to a set of integrals over the form factors of the nucleon, which are known very accurately – an explicit expression for \(m_{\text {el}}\) is given, for instance, in [11].

The second and third terms arise from the dispersion integrals over the structure functions \(\bar{F}\) and \(F_2\), respectively. With the above choice of the subtraction point, the integrands are proportional to the factor \(Q^2-4Q_4^2\), in both cases. Taken by itself, the angular integral of this factor over the directions of the vector \(Q_\mu \) vanishes. Moreover, when \(Q^2\) becomes large, the remainder of the integrand becomes independent of \(Q_4\). Hence the angular integration suppresses the contributions arising from large values of \(Q^2\) – these integrals approach finite limits when the cutoff is removed:

$$\begin{aligned} m_{\bar{F}}= & {} 4N\!\!\int _0^\infty dQ^2 \!\int _0^{x_\mathrm{th}}dx \; \frac{yf(y)\bar{F}(x,Q^2)}{x^2(1+4y)} , \end{aligned}$$
(29)
$$\begin{aligned} m_{F_2}= & {} - {\frac{2}{3}}N\!\!\int _0^\infty dQ^2\!\int _0^{x_\mathrm{th}}dx\;\frac{f(y)F_2(x,Q^2)}{x^2} .\end{aligned}$$
(30)

The normalization constant is given by

$$\begin{aligned} N=\frac{3\alpha _\mathrm{em}}{8\pi m} , \end{aligned}$$
(31)

the variable y stands for \(y=Q^2/(4m^2 x^2)\) and the explicit expression for the function f(y) reads

$$\begin{aligned} f(y)= \frac{1+4y}{2}\sqrt{1+\frac{1}{y}}-\frac{3+4y}{2} . \end{aligned}$$
(32)

The suppression of the angular integrals relevant for \(m_{\bar{F}}\) and \(m_{F_2}\) manifests itself in the fact that the function f(y) rapidly falls off when y becomes large:

$$\begin{aligned} f(y)=\frac{1}{16y^2}+O(y^{-3}) .\end{aligned}$$
(33)
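
The closed form (32) and its asymptotic behaviour (33) can be checked numerically. The sketch below assumes the parametrization \(Q_4=Q\cos \chi \) with the measure \(\sin ^2\chi \,d\chi \) on the euclidean three-sphere; with this assumption, inserting the dispersive representation of \(T_2\) into Eq. (26) leads to \(f(y)=\frac{2}{\pi }\int _0^\pi d\chi \,\sin ^2\chi \,(\frac{1}{4}-\cos ^2\chi )/(y+\cos ^2\chi )\). This integral representation is our own reading of the angular integration for the \(F_2\) term and is not quoted from the text; the function names are likewise ours.

```python
import math

def f_closed(y):
    """Closed form of Eq. (32)."""
    return 0.5 * (1.0 + 4.0 * y) * math.sqrt(1.0 + 1.0 / y) - 0.5 * (3.0 + 4.0 * y)

def f_angular(y, n=4000):
    """Angular integral over chi, evaluated with the midpoint rule."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        c2 = math.cos((k + 0.5) * h) ** 2          # cos^2(chi)
        total += (1.0 - c2) * (0.25 - c2) / (y + c2)  # sin^2(chi) = 1 - cos^2(chi)
    return 2.0 / math.pi * h * total

for y in (0.5, 2.0, 10.0):
    print(y, f_closed(y), f_angular(y))

# Ratio of f(y) to the leading asymptotic term 1/(16 y^2) at a large value of y.
ratio = 16.0 * 100.0 ** 2 * f_closed(100.0)
print(ratio)
```

The numerator \(\frac{1}{4}-\cos ^2\chi \) averages to zero over the sphere, which is why the term of order \(1/y\) is absent in Eq. (33).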

The angular integral can be done explicitly in the fourth term as well, but there, it does not suppress the contributions from large values of \(Q^2\), so that the cutoff must be retained:

$$\begin{aligned} m_{\bar{S}}= & {} \lim _{\varLambda \rightarrow \infty } \left\{ N\!\!\int _0^{\varLambda ^2} dQ^2 Q^2 \bar{S} (-Q^2)+\varDelta m^\varLambda \right\} . \end{aligned}$$
(34)

Together with the sum rule (23), the above formulae fully specify the e.m. part of the mass difference between proton and neutron, in terms of measurable quantities. The next four sections concern asymptotic properties of the Compton amplitude that are not of direct relevance for the Cottingham formula – the contributions generated by short distance singularities of spin 2, for instance. Readers who are more interested in the numerical outcome of our analysis for the mass difference may go directly to Sect. 12.

8 Operator product expansion

The behaviour of the amplitudes \(\bar{T}\) and \(T_2\) at large momenta is determined by the short-distance properties of the matrix element \(\langle p|Tj^\mu (x) j^\nu (0)|p\rangle \), which can be analyzed by means of the operator product expansion (OPE) [37]. The asymptotic freedom of QCD implies that perturbation theory can be used to work out the leading terms of this expansion [14,15,16,17]. For the time-ordered product of two currents, the behaviour at short distances \(z=x-y\) is of the form

$$\begin{aligned} Tj_\mu (x) j_\nu (y)\mathop {\rightarrow }\limits _{z\rightarrow 0}\quad \sum _n \tilde{C}_{\mu \nu }^n(z)O_n(X) , \end{aligned}$$
(35)

where \(O_n\) enumerates the renormalized gauge invariant local operators of QCD and \(X=\frac{1}{2}(x+y)\). The expansion starts with the operators of lowest dimension. The Wilson coefficients \(\tilde{C}_{\mu \nu }^n(z)\) vanish unless \(O_n\) has the same flavour quantum numbers as the product of two e.m. currents. The symmetry of QCD under P, T and C also prevents some operators from contributing to the expansion. The coefficients depend in a nontrivial manner on z only through \(z^2\): they are polynomials in the components of the vector z, with coefficients that depend on \(z^2\).

In momentum space, the OPE governs the behaviour at large momenta. We denote the Fourier transform with respect to z by \(\tilde{T}_{\mu \nu }\):

$$\begin{aligned} \tilde{T}_{\mu \nu }(q,X)= {\small \frac{i}{2}}\!\int \!\! d^4z\,e^{i q\cdot z} Tj_\mu (x) j_\nu (y) . \end{aligned}$$
(36)

The limit \(z=\lambda \bar{z}\), \(\lambda \rightarrow 0\) in coordinate space corresponds to the limit \(q=\lambda \bar{q}\), \(\lambda \rightarrow \infty \) in momentum space. We refer to this limit as \(q\rightarrow \infty \). In this notation, we have

$$\begin{aligned} \tilde{T}_{\mu \nu }(q,X) \mathop {\rightarrow }\limits _{q\rightarrow \infty }\quad \sum _n C_{\mu \nu }^n(q) O_n(X) . \end{aligned}$$
(37)

The coefficients \(C_{\mu \nu }^n(q)\) represent the Fourier transforms of those in Eq. (35) and are polynomials in the components of q, with coefficients that depend on \(q^2\). This immediately implies that the coefficients occurring in the expansion of the invariant amplitudes \(T_1(\nu ,q^2)\), \(T_2(\nu ,q^2)\) are polynomials in \(\nu \). The expansion thus also holds for imaginary values of \(\nu \).

A contribution from the unit operator only occurs in the disconnected part and does not show up in the scattering amplitude. In QCD, the relevant operators of lowest dimension are \(\bar{f}f\) with \(f=u, d, \ldots \). They have spin zero and are of engineering dimension 3. Chiral symmetry, however, suppresses the contributions from these operators: their Wilson coefficients are proportional to the masses \(m_u, m_d,\ldots \). It is convenient to include the mass factor and to work with the operator \(O^{f_0}=m_f\bar{f}f\), which is of dimension 4. Lorentz invariance implies that the spin averaged matrix elements of the operators of lowest dimension can be expressed in terms of the following linearly independent operators, which either have spin 0 or spin 2:

(38)

Accordingly, the leading terms in the operator product expansion of \(\tilde{T}_{\mu \nu }(q,X)\) are given by

$$\begin{aligned}&\tilde{T}_{\mu \nu }(q,X)\mathop {\rightarrow }\limits _{q\rightarrow \infty }\quad {\sum _f}\left\{ C_{\mu \nu }^{f_0} O^{f_0} +C_{\mu \nu }^{f_2\,\alpha \beta } O^{f_2}_{\alpha \beta } \right\} \nonumber \\&\qquad +\,C_{\mu \nu }^{G_0}O^{G_0}+C_{\mu \nu }^{G_2\,\alpha \beta }O^{G_2}_{\alpha \beta } . \end{aligned}$$
(39)

While the dependence on q resides in the Wilson coefficients, the operators only depend on X.

9 Leading Wilson coefficients

Lorentz invariance and current conservation imply that the Wilson coefficient of the scalar operator \(O^{f_0}\) is proportional to the kinematic tensor \(K_{1\,\mu \nu }\) specified in Eq. (6): the contribution from this operator is of the form

$$\begin{aligned} C_{\mu \nu }^{f_0}\,O^{f_0}=K_{1\,\mu \nu }\,c_1^{f}O^{f_0} , \end{aligned}$$
(40)

where the coefficient \(c_1^f\) depends on \(q^2\). The contribution from \(O^{G_0}\) is of the same structure.

For operators with spin 2, the situation is not that simple. As shown in Appendix A, Lorentz invariance and current conservation determine the form of their Wilson coefficients only up to two functions of \(q^2\), which we denote by \(c_2(q^2)\) and \(c_3(q^2)\). According to Eq. (A.3), the contribution from \(O_{\alpha \beta }^{f_2}\) is of the form:

$$\begin{aligned}&C_{\mu \nu }^{f_2\,\alpha \beta }O_{\alpha \beta }^{f_2}=c_3^f\,(q_\mu q_\nu -g_{\mu \nu }q^2) O_{\alpha \beta }^{f_2}q^\alpha q^\beta \nonumber \\&\qquad +\,c_2^f\, (g_{\mu \alpha }O_{\nu \beta }^{f_2}+g_{\nu \alpha } O_{\mu \beta }^{f_2}-g_{\mu \nu }O_{\alpha \beta }^{f_2}) \left( q^\alpha q^\beta -{\frac{1}{2}}g^{\alpha \beta }q^2\right) ,\nonumber \\ \end{aligned}$$
(41)

and analogously for the contribution from the lowest dimensional gluonic operator of spin 2. This shows that kinematics determines the Wilson coefficients of the lowest dimensional operators in terms of six functions, \(c_1^{f},\ldots ,c^{G}_3\), that depend on \(q^2\).

The amplitude we are interested in represents the spin average of the one-particle matrix element

$$\begin{aligned} T_{\mu \nu }(p,q)=\langle p |\tilde{T}_{\mu \nu }(q,X)|p\rangle . \end{aligned}$$
(42)

(Since the initial and final momenta are the same, the matrix element is independent of X). Inserting the expansion (39), we obtain:

$$\begin{aligned}&T_{\mu \nu }(p,q)\,\, \mathop {\rightarrow }\limits _{q\rightarrow \infty }\quad {\sum _f}\left\{ C_{\mu \nu }^{f_0}\langle p| O^{f_0}|p\rangle +C_{\mu \nu }^{f_2\,\alpha \beta }\langle p|O^{f_2}_{\alpha \beta }|p\rangle \right\} \nonumber \\&\quad +\,C_{\mu \nu }^{G_0}\langle p| O^{G_0}|p\rangle +C_{\mu \nu }^{G_2\,\alpha \beta }\langle p| O^{G_2}_{\alpha \beta }|p\rangle . \end{aligned}$$
(43)

For scalar operators, the matrix element is a constant, while for spin 2, it depends on the momentum of the particle:

$$\begin{aligned} \langle p|O^{f_0}|p\rangle= & {} \langle O^{f_0}\rangle ,\nonumber \\ \langle p| O_{\alpha \beta }^{f_2}|p\rangle= & {} \left( \hat{p}_\alpha \hat{p}_\beta -{\frac{1}{4}}g_{\alpha \beta }\right) \langle O^{f_2}\rangle ,\quad \hat{p}_\alpha =\frac{p_\alpha }{m} . \end{aligned}$$
(44)

According to Eq. (40), the contributions from the spin zero operators are proportional to the kinematic tensor \(K_{1\,\mu \nu }\). For the spin 2 terms, a short calculation is needed to verify that they can be expressed as a linear combination of \(K_{1\,\mu \nu }\) and \(K_{2\,\mu \nu }\):

$$\begin{aligned} C_{\mu \nu }^{f_2\,\alpha \beta }\langle O^{f_2}_{\alpha \beta }\rangle= & {} \left( - {\frac{1}{2}}K_{1\,\mu \nu }+K_{2\,\mu \nu }\right) \,c_2^{f}\langle O^{f_2}\rangle \nonumber \\&+ K_{1\,\mu \nu } \left( \nu ^2-{\frac{1}{4}}q^2\right) \,c_3^{f}\langle O^{f_2} \rangle . \end{aligned}$$
(45)

The terms arising from the lowest dimensional gluonic operator of spin 2 are of the same form.

The corresponding asymptotic representations for \(T_1(\nu ,q^2)\) and \(T_2(\nu ,q^2)\) are given by the coefficients of \(K_{1\,\mu \nu }\) and \(K_{2\,\mu \nu }\), respectively:

$$\begin{aligned}&T_1(\nu ,q^2) \mathop {\rightarrow }\limits _{q\rightarrow \infty } {\sum _f}c_1^{f}(q^2)\langle O^{f_0}\rangle +c_1^{G}(q^2)\langle O^{G_0}\rangle \nonumber \\&\quad -{\frac{1}{2}} {\sum _f}c_2^{f}(q^2)\langle O^{f_2}\rangle - {\frac{1}{2}}c_2^{G}(q^2)\langle O^{G_2}\rangle \nonumber \\&\quad +\left( \nu ^2-{\frac{1}{4}}q^2\right) \left\{ {\sum _f}c_3^{f}(q^2)\langle O^{f_2}\rangle +c_3^{G}(q^2)\langle O^{G_2}\rangle \right\} ,\nonumber \\ \end{aligned}$$
(46)
$$\begin{aligned}&T_2(\nu ,q^2) \mathop {\rightarrow }\limits _{q\rightarrow \infty } {\sum _f}c_2^f (q^2)\langle O^{f_2}\rangle +c_2^{G}(q^2)\langle O^{G_2}\rangle . \end{aligned}$$
(47)

While the leading term in the asymptotic behaviour of \(T_2(\nu ,q^2)\) only depends on \(q^2\), \(T_1(\nu ,q^2)\) contains a term proportional to \(\nu ^2\).

The advantage of working with the amplitude \(\bar{T}\) now becomes visible: \(T_1\) and \(T_2\) contain a common spin 2 contribution. In the combination \(\bar{T}=T_1+\frac{1}{2}T_2\), this term drops out – only the one proportional to the factor \(\nu ^2-\frac{1}{4}q^2\) remains:

$$\begin{aligned}&\bar{T}(\nu ,q^2) \mathop {\rightarrow }\limits _{q\rightarrow \infty } {\sum _f}c_1^{f}(q^2)\langle O^{f_0}\rangle + c_1^{G}(q^2)\langle O^{G_0}\rangle \nonumber \\&\quad +(\nu ^2-{\frac{1}{4}}q^2)\left\{ {\sum _f}c_3^{f}(q^2)\langle O^{f_2}\rangle +c_3^{G}(q^2)\langle O^{G_2}\rangle \right\} .\nonumber \\ \end{aligned}$$
(48)

As noted above, the angular integration suppresses contributions that are proportional to this factor – this is the reason why in the decomposition (28) only \(m_{\bar{S}}\) contains a divergence.

10 Difference between proton and neutron

For the mass difference between proton and neutron, only the difference between the Compton amplitudes of the two particles is needed. As far as the asymptotic behaviour is concerned, we thus only need the difference between the spin averaged matrix elements of proton and neutron. In the isospin limit, the neutron matrix elements of the gluonic operators \(O^{G_0}\) and \(O^{G_2}_{\alpha \beta }\) coincide with those of the proton. In reality, since \(m_u\) differs from \(m_d\), the proton and neutron matrix elements of the gluonic operators are slightly different, but in the mass difference between proton and neutron, this generates an effect of second order in isospin breaking and will be neglected. This simplifies matters considerably. Only the matrix elements of non-singlet operators are relevant – operator mixing does not affect these.

Throughout the remainder of this paper, we focus on the difference between the Compton amplitudes of proton and neutron, without explicitly indicating this in the notation: in the following, \(\bar{T}\) and \(T_2\) stand for \(\bar{T}^{p-n}\) and \(T_2^{p-n}\), respectively.

11 Perturbation theory

To leading order of the QCD perturbation series, the Wilson coefficients are the same as for free quarks. The explicit expressions are readily obtained by simply replacing the nucleon in the above relations with a free quark of charge \(Q_f=\frac{2}{3}\) or \(-\frac{1}{3}\). If the strong interaction is turned off and the e.m. interaction is accounted for only to leading order, the Compton scattering on a quark is elastic. The Sachs form factors are given by \(G_E = G_M = Q_f\), so that the formulae (8) reduce to

$$\begin{aligned} T_1^f=0 ,\quad T_2^f=\frac{4m^2_fQ_f^2}{(q^2+i\epsilon )^2-4m_f^2\nu ^2} . \end{aligned}$$
(49)

In the limit \(q\rightarrow \infty \) relevant for the OPE, the second term in the denominator becomes negligible compared to the first: for space-like momenta, \(T_2^f\) tends to \(4m_f^2Q_f^2/Q^4\). The spin averaged quark matrix elements of the operators \(O^{f_0}\) and \(O_{\alpha \beta }^{f_2}\) are readily worked out; they yield \(\langle O^{f_0}\rangle =2m_f^2\) and \(\langle O^{f_2}\rangle =4m_f^2\). The leading terms in the expansion of the coefficients \(c_1^f, c_2^f,c_3^f\) in powers of \(g^2\) can then be read off from the asymptotic relations (46) and (47) [17, 38]:

$$\begin{aligned} c_1^f(-Q^2)= & {} \frac{Q_f^2}{Q^4}+O(g^2) ,\nonumber \\ c_2^f(-Q^2)= & {} \frac{Q_f^2}{Q^4}+O(g^2) ,\nonumber \\ c_3^f(-Q^2)= & {} O(g^2) . \end{aligned}$$
(50)

In this calculation, the spin of the operators occurring in the OPE does not play an important role. Appendix B contains an alternative derivation of these relations, which is based on the short distance expansion of the quark propagator and explicitly exhibits the spin structure.
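As a cross-check of Eq. (50), the matching can be reproduced with exact rational arithmetic. The following Python sketch is only an illustration: the function name `leading_coefficients` and the numerical inputs are arbitrary choices that do not appear in the text, and \(c_3^f\) is set to zero, as appropriate at leading order.

```python
from fractions import Fraction

def leading_coefficients(charge, m_f, Q):
    """Solve Eqs. (46), (47) for c_1^f and c_2^f at leading order (c_3^f = 0).

    Inputs are the free-quark limits quoted in the text:
    T_1^f = 0, T_2^f -> 4 m_f^2 Q_f^2 / Q^4,
    <O^{f_0}> = 2 m_f^2, <O^{f_2}> = 4 m_f^2.
    """
    t1 = Fraction(0)
    t2 = 4 * m_f**2 * charge**2 / Q**4
    o0 = 2 * m_f**2
    o2 = 4 * m_f**2
    c2 = t2 / o2                  # Eq. (47): T_2 = c_2 <O^{f_2}>
    c1 = (t1 + c2 * o2 / 2) / o0  # Eq. (46): T_1 = c_1 <O^{f_0}> - c_2 <O^{f_2}> / 2
    return c1, c2

# Arbitrary test values (hypothetical): up-type charge, small mass, large Q
charge, m_f, Q = Fraction(2, 3), Fraction(1, 200), Fraction(10)
c1, c2 = leading_coefficients(charge, m_f, Q)
print(c1, c2, charge**2 / Q**4)
```

Both coefficients come out equal to \(Q_f^2/Q^4\), independently of the value chosen for \(m_f\).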

The higher order terms in the expansion of the Wilson coefficients have been studied in detail, also for the gluonic operators [14,15,16,17, 38] – for a thorough review, we refer to [39]. The qualitative features of the asymptotic structure are intimately related to the fact that the dimension of the spin 2 operators is anomalous. The correction of order \(g^2\) in the perturbative expansion of the quantity \(Q^4c_2^f(-Q^2)\) grows logarithmically if \(Q^2\) becomes large. With the renormalization group, the leading logarithms can be summed up to all orders. The contributions from the singlet operators undergo mixing, but as noted above, for the difference between proton and neutron, only the nonsinglet operators are relevant. The matrix element of the term involving \(c_2^f\) falls off with

$$\begin{aligned} \sum _f c_2^f(-Q^2)\langle O^{f_2}\rangle ^{p-n}\mathop {\rightarrow }\limits _{Q\rightarrow \infty }\quad \frac{C_2 }{Q^4}\left( \!\ln \frac{Q^2}{\varLambda _{{\mathrm{QCD}}}^2}\!\right) ^{\!-d_2} , \end{aligned}$$
(51)

where \(\varLambda _{{\mathrm{QCD}}}\) is the renormalization group invariant scale of QCD, while \(d_2\) is related to the anomalous dimension of the operator \(O_{\alpha \beta }^{f_2}\) and depends on the number of flavours:

$$\begin{aligned} d_2= \frac{32}{3(33-2 N_f)} .\end{aligned}$$
(52)

The formula (51) holds provided Q is large, not only compared to \(\varLambda _{{\mathrm{QCD}}}\), but compared to all of the quark masses. In the intermediate range where Q is large compared to \(m_s\), but not large enough to activate the degrees of freedom of the heavy quarks, it should hold approximately, with \(N_f = 3\).
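To get a feeling for the size of the exponent in Eq. (51), the expression (52) can be evaluated for the relevant numbers of flavours. A minimal Python sketch (the helper name `d2` is hypothetical):

```python
from fractions import Fraction

def d2(n_f):
    """Exponent d_2 of Eq. (52) for n_f active quark flavours."""
    return Fraction(32, 3 * (33 - 2 * n_f))

for n_f in (3, 4, 5, 6):
    print(n_f, d2(n_f), round(float(d2(n_f)), 3))
```

For three flavours this gives \(d_2=\frac{32}{81}\), so the logarithmic suppression in Eq. (51) sets in only slowly.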

Since the perturbation series of \(c_3^f(-Q^2)\) only starts at order \(g^2\), the asymptotic behaviour is suppressed by a factor of \(\ln Q^2/\varLambda _{{\mathrm{QCD}}}^2\):

$$\begin{aligned} \sum _f c_3^f(-Q^2)\langle O^{f_2}\rangle ^{p-n}\mathop {\rightarrow }\limits _{Q\rightarrow \infty }\quad \frac{C_3 }{Q^6}\left( \!\ln \frac{Q^2}{\varLambda _{{\mathrm{QCD}}}^2}\!\right) ^{\!-1-d_2} . \end{aligned}$$
(53)

The scalar operator \(\bar{f}f\) is of anomalous dimension as well, but the same is true of \(m_f\) and the anomalies cancel: the operator \(m_f \bar{f}f\) is renormalization group invariant. This implies that, in the Wilson coefficient \(c_1^f(-Q^2)\), the correction of order \(g^2\) does not pick up a logarithmic enhancement if \(Q^2\) becomes large and there is nothing to be summed up:

$$\begin{aligned} \sum _f c_1^f(-Q^2)\langle O^{f_0}\rangle ^{p-n}\mathop {\rightarrow }\limits _{Q\rightarrow \infty }\quad \frac{C }{Q^4} . \end{aligned}$$
(54)

Note that the above relations only account for the leading logarithms. The perturbation series of the coefficient \(c_1^f(-Q^2)\) does contain contributions of order \(g^2\) that are not enhanced by a logarithm – their role in the context of the Cottingham formula will be discussed in Sect. 15.

Inserting the asymptotic expressions for the Wilson coefficients in Eqs. (47) and (48), we obtain:

$$\begin{aligned}&\bar{T}(\nu ,-Q^2)\mathop {\rightarrow }\limits _{Q\rightarrow \infty }\quad \frac{C}{Q^4} +\frac{C_3({\frac{1}{4}}Q^2+\nu ^2)}{Q^6}\left( \!\ln \frac{Q^2}{\varLambda _{{\mathrm{QCD}}}^2}\!\right) ^{\!-1-d_2},\nonumber \\&T_2(\nu ,-Q^2)\mathop {\rightarrow }\limits _{Q\rightarrow \infty }\quad \frac{C_2}{Q^4}\left( \!\ln \frac{Q^2}{\varLambda _{{\mathrm{QCD}}}^2}\!\right) ^{\!-d_2}. \end{aligned}$$
(55)

While the coefficient C is determined by the spin averaged matrix elements of a renormalization group invariant operator,

$$\begin{aligned} C ={\sum _f}\, Q_f^2\,\langle m_f\bar{f}f\rangle ^{p-n} , \end{aligned}$$
(56)

\(C_2\) and \(C_3\) are related to the matrix elements of the spin 2 operator \(O_{\alpha \beta }^{f_2}\), which do depend on the renormalization convention used.

12 Moments of the structure functions

Let us now compare the dispersive representation (15) with the asymptotic formulae (55) obtained from perturbation theory. Since the form factors rapidly tend to zero when \(Q^2\) becomes large, the elastic part of the amplitudes does not show up in the asymptotic behaviour. The dispersion integrals approach moments of the structure functions:

$$\begin{aligned} M_k(Q^2)= & {} \int _0^{x_\mathrm{th}}dx F_k(x,Q^2) ,\quad k=2,L\nonumber \\ \bar{M}(Q^2)= & {} \int _0^{x_\mathrm{th}}dx \bar{F}(x,Q^2) . \end{aligned}$$
(57)

In our decomposition of the dispersive representation, the asymptotic behaviour (55) boils down to a set of conditions on the subtraction function and on the lowest moments of \(\bar{F}\), \(F_2\) and \(F_L\):

$$\begin{aligned}&\bar{S}(-Q^2) \quad \mathop {\rightarrow }\limits _{Q^2\rightarrow \infty }\quad \frac{C}{Q^4} , \end{aligned}$$
(58)
$$\begin{aligned}&\bar{M}(Q^2) \quad \mathop {\rightarrow }\limits _{Q^2\rightarrow \infty }\quad \frac{C_3}{4m^2}\left( \!\ln \frac{Q^2}{\varLambda _{{\mathrm{QCD}}}^2}\!\right) ^{\!-1-d_2}, \end{aligned}$$
(59)
$$\begin{aligned}&M_2(Q^2) \quad \mathop {\rightarrow }\limits _{Q^2\rightarrow \infty }\quad \frac{C_2}{4m^2}\left( \!\ln \frac{Q^2}{\varLambda _{{\mathrm{QCD}}}^2}\!\right) ^{\!-d_2}, \end{aligned}$$
(60)
$$\begin{aligned}&M_L(Q^2) \quad \mathop {\rightarrow }\limits _{Q^2\rightarrow \infty }\quad \frac{C_3}{4m^2}\left( \!\ln \frac{Q^2}{\varLambda _{{\mathrm{QCD}}}^2}\!\right) ^{\!-1-d_2}. \end{aligned}$$
(61)

In the literature, the perturbative predictions for the moments have been compared in detail with experiment [39]. The parametrization we will be using for the structure functions is based on the DGLAP equations [40,41,42,43,44]. These ensure that the behaviour of the Compton amplitude in the deep inelastic region is consistent with perturbation theory.

13 Prediction for the constant C

Neglecting isospin breaking effects of second order, the neutron matrix elements of \(e^2\bar{d}d\) agree with the proton matrix elements of \(e^2\bar{u}u\) and vice versa. The constant C can thus be expressed in terms of proton matrix elements:

$$\begin{aligned} C= \frac{4m_u-m_d}{9}\langle p|\bar{u}u-\bar{d}d|p\rangle .\end{aligned}$$
(62)

The matrix element of the operator \(\bar{u}u-\bar{d}d\) also determines the leading contribution to the QCD part of the proton–neutron mass difference (see e.g. [45]):

$$\begin{aligned} m_{{\mathrm{QCD}}}=\frac{m_u-m_d}{2m}\langle p|\bar{u}u-\bar{d}d|p\rangle \left\{ 1+O(m_u-m_d)\right\} .\nonumber \\ \end{aligned}$$
(63)

This shows that the constant C is related to the value of the proton–neutron mass difference in the absence of the e.m. interaction:

$$\begin{aligned} C= \frac{2(4m_u-m_d)}{9(m_u-m_d)}\,m\, m_{{\mathrm{QCD}}} +O(m_u-m_d) . \end{aligned}$$
(64)

Once we have determined the e.m. part, the observed mass difference will provide us with a value of \(m_{{\mathrm{QCD}}}\) and hence also with a value of the constant C.

Actually, however, the precise value of C is not crucial in our context. For our purpose, the crude estimate \(m_{{\mathrm{QCD}}}\approx -2\,\text {MeV}\) is good enough. The quark mass ratio \(r=(4m_u-m_d)/(m_d-m_u)\) is determined by \(m_u/m_d\), but is not yet known very firmly. The FLAG result \(m_u/m_d=0.513(31)\) (for \(N_f=2+1+1\)) [46] implies \(r=2.16(42)\). There is a totally independent determination of the mass ratio Q, based on a low energy theorem for the decay \(\eta \rightarrow 3\pi \) [47,48,49]. A recent analysis of the data on this basis leads to \(Q=22.1(7)\) [50]. Combining this result with the well-determined ratio \(m_s/m_{ud}=27.23(10)\) [46], we obtain \(m_u/m_d=0.450(25)\) and \(r=1.46(25)\). As pointed out in [50], the origin of the difference could be identified by calculating the corrections to the low energy theorem on the lattice, but this remains to be done. The outcome for the constant C is tiny. With the value \(r=1.46\), we obtain:

$$\begin{aligned} C\approx 6\times 10^{-4} {\,\text {GeV}}^2 .\end{aligned}$$
(65)

The reason why the value turns out to be so small is that C vanishes in the chiral limit. It implies that C is small compared to \(C_2\) and \(C_3\). Accordingly, it takes very large values of \(Q^2\) for the singularities generated by the operators of spin 0 to finally dominate over those associated with operators of spin 2.
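The numbers entering the estimate (65) can be reproduced with a few lines of Python. The sketch below is an illustration under stated assumptions: it uses the standard definition \(Q^2=(m_s^2-m_{ud}^2)/(m_d^2-m_u^2)\) with \(m_{ud}=\frac{1}{2}(m_u+m_d)\), which is not spelled out in the text, takes \(m=0.9389\) GeV for the nucleon mass, and all function names are hypothetical.

```python
M_NUCLEON = 0.9389   # GeV, average nucleon mass (assumed input)
M_QCD = -2.0e-3      # GeV, crude estimate quoted in the text

def r_from_ratio(x):
    """r = (4 m_u - m_d) / (m_d - m_u) as a function of x = m_u / m_d."""
    return (4.0 * x - 1.0) / (1.0 - x)

def ratio_from_Q(Q, S):
    """m_u/m_d from Q and S = m_s/m_ud, assuming
    Q^2 = (m_s^2 - m_ud^2) / (m_d^2 - m_u^2) with m_ud = (m_u + m_d) / 2."""
    delta = (S**2 - 1.0) / (2.0 * Q**2)   # equals (m_d - m_u) / m_ud
    return (2.0 - delta) / (2.0 + delta)

def constant_C(r, m=M_NUCLEON, m_qcd=M_QCD):
    """Eq. (64) rewritten in terms of r: C = -(2/9) r m m_QCD, in GeV^2."""
    return -2.0 / 9.0 * r * m * m_qcd

x = ratio_from_Q(22.1, 27.23)
r = r_from_ratio(x)
C = constant_C(r)
print(round(x, 3), round(r, 2), C)
```

With these inputs the central values \(m_u/m_d\approx 0.450\), \(r\approx 1.46\) and \(C\approx 6\times 10^{-4}\,\text{GeV}^2\) are recovered; the uncertainties are not propagated in this sketch.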

14 Renormalization

The asymptotic behaviour of the subtraction function in Eq. (58) implies that the corresponding contribution to \(m_{\mathrm{QED}}\) is logarithmically divergent. The leading divergence is determined by the coefficient C:

$$\begin{aligned} \int _0^{\varLambda ^2} dQ^2 Q^2 \bar{S}(-Q^2)\mathop {\rightarrow }\limits _{\varLambda \rightarrow \infty }\quad C \ln \varLambda ^2 .\end{aligned}$$
(66)

A logarithm also occurs in the electromagnetic renormalization of the bare QCD coupling constant and of the bare quark masses (see e.g. [45]):

$$\begin{aligned} \varDelta g= & {} -\frac{e^2g^3}{256\pi ^4}{\sum _f}Q_f^2 \ln \frac{\varLambda ^2}{\mu ^2} , \end{aligned}$$
(67)
$$\begin{aligned} \varDelta m_f= & {} \frac{3e^2}{16\pi ^2}Q_f^2m_f\ln \frac{\varLambda ^2}{\mu ^2} . \end{aligned}$$
(68)

The scale \(\mu \) of the logarithm is a matter of convention – picking a value for \(\mu \) amounts to fixing the ambiguity in the decomposition (1) of the mass difference into contributions arising from the e.m. and strong interactions, respectively.

In the difference between the masses of proton and neutron, the e.m. renormalization of the coupling constant only yields a contribution of second order in isospin breaking – we are neglecting such effects. The renormalization of the quark masses, on the other hand, does not drop out in the difference. In the Lagrangian, the corresponding counter term reads

$$\begin{aligned} \varDelta \mathcal {L}={\sum _f}\varDelta m_f\bar{f}f .\end{aligned}$$
(69)

The corresponding shift in the mass of a particle is given by \(-\langle p|\varDelta \mathcal {L}|p\rangle /2m\). Accordingly, the change in the proton mass generated by the renormalization of \(m_u\) and \(m_d\) is given by

$$\begin{aligned} \varDelta m^p =- \dfrac{1}{2m^p}\{\langle p|\bar{u}u|p\rangle \varDelta m_u+\langle p|\bar{d}d|p\rangle \varDelta m_d\} . \end{aligned}$$
(70)

Again neglecting effects of second order in isospin breaking, the neutron matrix elements can be expressed in terms of those of the proton:

$$\begin{aligned} \varDelta m^n =- \dfrac{1}{2m^{n}} \{\langle p|\bar{d}d|p\rangle \varDelta m_u+\langle p|\bar{u}u|p\rangle \varDelta m_d\} . \end{aligned}$$
(71)

Collecting terms and neglecting second order isospin breaking effects, we obtain the following expression for the counter term \(\varDelta m^\varLambda =\varDelta m^p-\varDelta m^n\):

$$\begin{aligned} \varDelta m^\varLambda =- \frac{\alpha _\mathrm{em}}{24\pi m}(4m_u-m_d)\langle p|\bar{u}u-\bar{d}d|p\rangle \ln \frac{\varLambda ^2}{\mu ^2} .\end{aligned}$$
(72)
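For convenience, the intermediate step can be spelled out. The following lines are a sketch that assumes the convention \(e^2=4\pi \alpha _\mathrm{em}\) and replaces \(m^p\) and \(m^n\) by the common mass m, which only affects terms of second order in isospin breaking. Subtracting (71) from (70) and inserting (68) with \(Q_u=\frac{2}{3}\), \(Q_d=-\frac{1}{3}\) gives

$$\begin{aligned} \varDelta m^p-\varDelta m^n&= -\frac{1}{2m}\,\langle p|\bar{u}u-\bar{d}d|p\rangle \,(\varDelta m_u-\varDelta m_d) \\&= -\frac{1}{2m}\,\frac{3e^2}{16\pi ^2}\,\frac{4m_u-m_d}{9}\,\langle p|\bar{u}u-\bar{d}d|p\rangle \ln \frac{\varLambda ^2}{\mu ^2} , \end{aligned}$$

and the prefactor \(\frac{1}{2}\cdot \frac{3e^2}{16\pi ^2}\cdot \frac{1}{9}\) reduces to \(\alpha _\mathrm{em}/24\pi \), in agreement with (72).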

Comparison with the expression (62) for the constant C that determines the asymptotic behaviour of the subtraction function shows that the two quantities are related by

$$\begin{aligned} \varDelta m^\varLambda =-N C\ln \frac{\varLambda ^2}{\mu ^2} ,\end{aligned}$$
(73)

where the normalization factor N is specified in Eq. (31). As it should be, the logarithm in the integral (66) over the subtraction function thus cancels the one in the renormalization (68) of the quark masses: the leading divergences occurring in the expression (34) for \(m_{\bar{S}}\) drop out.

15 Subleading divergence

As mentioned above, the asymptotic formulae (55) only account for the leading logarithms – they are valid only up to corrections of order \(g^2\). This applies in particular to the Wilson coefficient of the spin 0 operator that is responsible for the logarithmic divergence of the Cottingham formula. The correction of order \(g^2\) gives rise to a theoretical issue, which does not appear to be covered in the literature and which we wish to address briefly.

When \(Q^2\) becomes large, the effective strength of the interaction decreases in proportion to \(1/\ln Q^2\). Those corrections of order \(g^2\) in the Wilson coefficients or in the counter term \(\varDelta m^\varLambda \) that do not pick up a logarithmic enhancement are asymptotically small compared to the leading terms. This does not ensure, however, that the corresponding contributions to \(m_{\bar{S}}\) remain finite when the cutoff is removed: the corresponding contributions instead grow in proportion to

$$\begin{aligned} \int ^{\varLambda ^2}dQ^2\;\frac{1}{Q^2\ln Q^2}= \ln \ln \varLambda ^2 .\end{aligned}$$
(74)

The coefficient of order \(e^2g^2\) in the renormalization of the quark masses in QCD+QED is known [51, 52]:

$$\begin{aligned}&\mu \frac{d m_f}{d\mu } = -\gamma _m(g,e)\, m_f , \end{aligned}$$
(75)
$$\begin{aligned}&\gamma _m(g,e) =\gamma _0\frac{g^2}{4\pi ^2}+\gamma _1\frac{g^4}{16\pi ^4}+O(g^6)\nonumber \\&\quad +\frac{3e^2Q_f^2}{8\pi ^2}\left\{ 1+\frac{g^2}{12\pi ^2}+O(g^4)\right\} +O(e^4) , \end{aligned}$$
(76)

where \(\gamma _0=2\) and \(\gamma _1=\frac{101}{12}-\frac{5}{18}N_f\) are the well-known coefficients relevant for mass renormalization in QCD. The counter term \(\varDelta m^\varLambda \) considered above is related to the term of order \(e^2\) in Eq. (76).

On the other hand, the contributions of order \(g^2\) in the Wilson coefficient were considered by Shifman, Vainshtein and Zakharov, more than 40 years ago [53]. Equations (4.15) and (4.18) in this reference indicate that, in the notation used above, the coefficient C picks up the same correction as the counter term,

$$\begin{aligned} C\rightarrow C\left\{ 1+\frac{g^2}{12\pi ^2}+O(g^4)\right\} .\end{aligned}$$
(77)

As this amounts to combining results obtained within two different regularization schemes (cutoff in Euclidean momentum space, dimensional regularization), it must be taken with a grain of salt, but it does indicate that the divergences of the type \(\ln \ln \varLambda ^2\) cancel. Irrespective of the regularization used, the renormalization of coupling constants and quark masses must remove the divergences also at the subleading level.

Numerically, the perturbative corrections are not important, because, as pointed out above, chiral symmetry suppresses the entire contribution to the mass difference from the region where perturbation theory applies. In that region, the corrections are even smaller than the leading terms – they are in the noise of our calculation and we neglect them. The limit \(\varLambda \rightarrow \infty \) in formula (34) can then be done explicitly. The result can be written in the form

$$\begin{aligned} m_{\bar{S}}=N\int _0^\infty dQ^2 Q^2 \left\{ \bar{S}(-Q^2)-\frac{C}{(\bar{\mu }^2+Q^2)^2}\right\} ,\end{aligned}$$
(78)

with \(\bar{\mu }=\mu \exp (-\frac{1}{2})\).
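That the shifted scale \(\bar{\mu }\) reproduces the logarithm of the counter term (73) can be verified directly: the integral of the subtracted term over \(0<Q^2<\varLambda ^2\) must approach \(C\ln (\varLambda ^2/\mu ^2)\) when the cutoff becomes large. The Python sketch below checks this for \(C=1\); the function names and the choice \(\mu ^2=1\) are illustrative assumptions.

```python
import math

def subtraction_integral(lam2, mu_bar2):
    """Closed form of the integral of Q^2 / (mu_bar2 + Q^2)^2 over 0 < Q^2 < lam2."""
    return math.log((lam2 + mu_bar2) / mu_bar2) - lam2 / (lam2 + mu_bar2)

def subtraction_integral_numeric(lam2, mu_bar2, steps=20000):
    """Midpoint rule in t = ln(1 + Q^2/mu_bar2), where the integrand becomes 1 - exp(-t)."""
    t_max = math.log(1.0 + lam2 / mu_bar2)
    h = t_max / steps
    return h * sum(1.0 - math.exp(-(i + 0.5) * h) for i in range(steps))

mu2 = 1.0                       # mu^2 in GeV^2 (arbitrary choice)
mu_bar2 = mu2 * math.exp(-1.0)  # mu_bar = mu * exp(-1/2)

for lam2 in (1.0e2, 1.0e4, 1.0e8):
    exact = subtraction_integral(lam2, mu_bar2)
    # difference to ln(Lambda^2 / mu^2); shrinks like mu_bar^2 / Lambda^2
    print(lam2, exact - math.log(lam2 / mu2))
```

The printed differences decrease in proportion to \(\bar{\mu }^2/\varLambda ^2\), which is why the limit \(\varLambda \rightarrow \infty \) can be taken in the form (78).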

Fig. 1 Structure function \(\bar{F}\) versus W at \(Q^2=1\) (GeV units for \(Q^2\) and W). For \(W<1.3\) and \(1.3<W<3\), we use the representations labeled MD [54,55,56,57] and BC [58, 59], respectively. In the region \(W>3\) we rely on two different parametrizations: the Regge representation AI [60] and the ABM table. For further explanations, see text

16 Input used for the structure functions

For the numerical evaluation of the inelastic contributions, we need a representation for the difference between the structure functions of proton and neutron, and not only for the relatively well explored quantity \(F_2\), but also for the longitudinal component \(F_L\), which is known less well. At low values of \(Q^2\), we closely follow the analysis of [11] and distinguish three different regions in the centre of mass energy \(W=\sqrt{s}\) (numerical values for W and \(Q^2\) are given in GeV units):

  (i) For the range \(W<1.3\), we rely on the parametrizations of the structure functions of MAID and DMT [54,55,56] – we refer to these as MD. Both of them are accessible on the MAID home page [57]. We identify the central values of the structure functions in this region with the mean of the two parametrizations and use the difference as an error estimate (half of the difference would suffice to cover the two). The green error band in Fig. 1 shows the corresponding representation of the structure function \(\bar{F}\) for \(Q^2=1\).

  (ii) In the interval \(1.3< W < 3 \), we make use of the representation due to Bosted and Christy (BC) [58, 59]. It contains a wealth of information, but suffers from several shortcomings that are discussed in detail in section 5.1 of [11]. Part of the problem originates in the fact that the longitudinal cross section is more difficult to measure than the transverse one. In [58, 59] it is assumed that the ratio \(R=\sigma _L/\sigma _T\) of the neutron cross sections is the same as for the proton. In the region where the Pomeron dominates, this holds to good accuracy, but we need the difference between the two, where Pomeron exchange drops out. The assumption amounts to using an approximation and introduces a systematic error that is not easy to estimate.

    In our opinion, the procedure used in [11] to cope with the uncertainties in the region \(1.3< W < 3 \) is on the conservative side and we adopt it in the present work: we treat the transverse and longitudinal cross sections as independent and assign an uncertainty in \(\sigma _T^{p-n}\) and \(\sigma _L^{p-n}\) of \(8\%\) of \(\sigma _T^p\) and \(8\%\) of \(\sigma _L^p\), respectively. In part of phase space, this may well overestimate the uncertainties considerably – a reanalysis of the data in the resonance region would be most welcome. The structure of the brown error band reflects the resonances occurring in this region.

  (iii) For \(W> 3,\,Q^2<1\), we rely on the parametrization of the proton structure functions due to Alwall and Ingelman (AI) [60]. It represents the amplitude as a sum of a contribution from the Pomeron and one from the \(a_2\). In the difference between the proton and neutron amplitudes, the Pomeron drops out. We assume that the couplings of the \(a_2\) to proton and neutron are approximately SU(3)-symmetric and attach an uncertainty of \(30\%\) to the representation for the difference between proton and neutron obtained on this basis. The blue band in Fig. 1 shows the corresponding uncertainty range at \(Q^2=1\). For details, we refer to section 5.1 in [11].

  (iv) In the region \(W>3\), \(Q^2>1\), we use the solution of the DGLAP equations constructed by Alekhin, Blümlein and Moch, who obtained numerical values for the structure functions over a wide range: \(1 \le Q^2\le 2\times 10^5 \) and \(10^{-7}\le x\le 0.99\). The values of \(F_2(x,Q^2)\) and \(F_L(x,Q^2)\) are listed for the proton as well as for the neutron on a grid of \(60\times 98\) points. We thank Johannes Blümlein for providing us with this table, which we refer to with the acronym ABM. The underlying analysis is described in [61,62,63]. In the region where we make use of these results (see below), we estimate the uncertainty in the values obtained from ABM for \(\bar{F}^{p-n}\) and \(F_2^{p-n}\) at 30%.

    In the deep inelastic region, asymptotic freedom leads to very strong constraints, particularly for the structure function \(F_L\). The strength of these constraints is clearly visible at leading order of the perturbative expansion, where \(F_L\) is given by an integral over \(F_2\). The DGLAP equations extend this relationship to higher orders, by means of the renormalization group. In our framework, the properties of \(F_L\) play a crucial role in the evaluation of the sum rule for the subtraction function. The theoretical constraints on this quantity are very important for our analysis, particularly also because the raw experimental information for \(F_L\) is much weaker than the one for \(F_2\).

The black dots in Fig. 1 show the numbers obtained for W and \(\bar{F}\) from the entries for x, \(F_2\) and \(F_L\), at the lowest value of \(Q^2\) listed in the ABM table, \(Q^2=1\), and \(W>3\). While the result agrees very well with AI for \(W>5\), the two representations do differ at lower values of W. Since the DGLAP equations rely on perturbation theory, we should not be surprised to find deviations at low momenta, i.e. in the region where \(Q^2\) as well as W are small.
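The conversion between the table variable x and the centre of mass energy W used for these points is elementary. The Python sketch below assumes the standard kinematic relation \(W^2=m^2+Q^2(1-x)/x\), which follows from \(x=Q^2/2m\nu \), and a nucleon mass of \(m=0.9389\) GeV; neither the function names nor the mass value are taken from the text.

```python
import math

M = 0.9389  # GeV, nucleon mass (assumed input)

def w_from_x(x, q2, m=M):
    """Centre of mass energy W for given x and Q^2, from W^2 = m^2 + Q^2 (1 - x) / x."""
    return math.sqrt(m * m + q2 * (1.0 - x) / x)

def x_from_w(w, q2, m=M):
    """Inverse relation: x = Q^2 / (W^2 - m^2 + Q^2)."""
    return q2 / (w * w - m * m + q2)

q2 = 1.0                      # lowest Q^2 listed in the ABM table (GeV units)
x_w13 = x_from_w(1.3, q2)     # x at W = 1.3, boundary between MD and BC
x_w30 = x_from_w(3.0, q2)     # x at W = 3, boundary between BC and AI/ABM
print(x_w13, x_w30)
```

At \(Q^2=1\), the region \(W>3\) thus corresponds to values of x below roughly 0.11 in this approximation.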

17 Polarizabilities, \(\bar{S}(0)\)

Two low energy theorems relate the values of \(T_1\) and \(T_2\) at \(q=0\) to the polarizabilities of proton and neutron:Footnote 3

$$\begin{aligned} T_1= & {} T_1^{\text {el}} -\frac{\kappa ^2}{4m^2}-\frac{m}{\alpha _\mathrm{em}}\beta _M+O(\nu ^2,q^2) ,\nonumber \\ T_2= & {} T_2^{\text {el}} +\frac{m}{\alpha _\mathrm{em}}(\alpha _E+\beta _M)+O(\nu ^2,q^2) , \end{aligned}$$
(79)

where \(\kappa \) is the anomalous magnetic moment of the particle (these relations hold separately for proton and neutron). The dispersion relation for \(T_2\) converts the second one into the Baldin sum rule [64], which expresses the sum \(\alpha _E+\beta _M\) as an integral over the cross section for photoproduction. For the combination \(\bar{T}=T_1+\frac{1}{2}T_2\) we are working with, the low energy theorem involves the difference between the electric and magnetic polarizabilities:

$$\begin{aligned} \bar{T}=\bar{T}^{\text {el}} -\frac{\kappa ^2}{4m^2}+\frac{m}{2\alpha _\mathrm{em}}(\alpha _E-\beta _M)+O(\nu ^2,q^2) .\end{aligned}$$
(80)

It fixes the value of the subtraction function \(\bar{S}(q^2)\) at \(q^2=0\) in terms of the polarizabilities:

$$\begin{aligned} \bar{S}(0)=-\frac{1}{4m^2}\kappa ^2+\frac{m}{2\alpha _\mathrm{em}}(\alpha _E-\beta _M) .\end{aligned}$$
(81)

For \(q^2=0\), our sum rule for the subtraction function thus represents an analog of the Baldin sum rule: it determines the value of the difference between the electric and magnetic polarizabilities rather than their sum, in terms of the structure functions. While the Baldin sum rule directly follows from the unsubtracted dispersion relation for \(T_2\), the one for \(\alpha _E-\beta _M\) relies on Reggeon dominance.

The integrals over the structure functions relevant for the evaluation of the subtraction function in the dispersion relation for \(T_1\) at small values of \(Q^2\) were analyzed in detail in section 5 of [11]. As shown there, the prediction for the electric polarizability comes with comparatively small uncertainties:Footnote 4

$$\begin{aligned} \alpha _E^{p-n}=-\,1.7(4)\,[12] . \end{aligned}$$
(82)

The averages for proton and neutron quoted by the Particle Data Group yield \(\alpha _E^{p-n}=-\,0.6(1.2)\) [65]. The fact that experiment agrees with the prediction within errors provides a test of Reggeon dominance.

For a review of the currently available information about the polarizabilities, in particular also of the analysis based on chiral effective theories, we refer to [66,67,68]. In the framework of \(\chi \)PT, the representation of the virtual Compton scattering amplitude has been worked out to first nonleading order [69]. In this reference, the low energy singularity generated by the \(\varDelta (1232)\) resonance is explicitly accounted for. It will be of considerable interest to compare the result of this analysis for the slope of the subtraction function at \(Q^2=0\) with the solution of the sum rule that follows from Reggeon dominance constructed in the present paper.

The Baldin sum rule and the data on photoproduction imply that the sum \(\alpha _E+\beta _M\) is known more accurately than the individual terms. For this reason, it is useful to treat \(\alpha _E\pm \beta _M\) as the two independent quantities. The results quoted for proton and neutron in the compilation of Melendez et al. lead to

$$\begin{aligned} (\alpha _E+\beta _M)^{p-n}= & {} -1.20(45)\, [69] , \end{aligned}$$
(83)
$$\begin{aligned} (\alpha _E-\beta _M)^{p-n}= & {} -0.4(3.1)\,[69] . \end{aligned}$$
(84)

Combining the prediction (82), which is based on Reggeon dominance, with the result (83) obtained from photoproduction, we obtain a prediction for the magnetic polarizability, which is slightly more accurate than the one given in [11]:

$$\begin{aligned} \beta _M^{p-n}=0.5(6) .\end{aligned}$$
(85)

There were early attempts at calculating the electric polarizabilities on the lattice [71,72,73], based on turning on a constant external electric field, but they did not reach a level where the results could be compared with the experimental determinations in a meaningful way. The very recent lattice determination of the magnetic polarizabilities, however, which makes use of a constant external magnetic field, does yield a remarkably precise value for \(\beta _M^{p-n}\),

$$\begin{aligned} \beta _M^{p-n}=0.80(28)(4)\, [74], \end{aligned}$$
(86)

in good agreement with our prediction in equation (85).

In connection with the proton–neutron mass difference, the polarizabilities are of interest because they determine the value of the subtraction function \(\bar{S}(q^2)\) at \(q^2=0\), according to (81). The prediction (82) for \(\alpha _E^{p-n}\) and the experimental value (83) of \((\alpha _E+\beta _M)^{p-n}\) imply

$$\begin{aligned} \bar{S}(0)=-\,1.71(77)\,\text {GeV}^{-2} . \end{aligned}$$
(87)

The uncertainty is significantly smaller than the one obtained from the experimental value of \((\alpha _E-\beta _M)^{p-n}\):

$$\begin{aligned} \bar{S}(0)=-\,0.2(2.6)\,\text {GeV}^{-2} [69] . \end{aligned}$$
(88)

On the other hand, combining the lattice result (86) for the magnetic polarizability with the experimental value (83) of \((\alpha _E+\beta _M)^{p-n}\), we obtain a result for the subtraction function at the origin that is even slightly more precise than the prediction:

$$\begin{aligned} \bar{S}(0)=-\,2.22(60)\,\text {GeV}^{-2} .\end{aligned}$$
(89)

The fact that, within errors, this result agrees with the prediction (87) amounts to a more stringent test of the Reggeon dominance hypothesis than the one discussed above. It is important to pursue the determination of the polarizabilities; in particular, the pioneering lattice result which provides such a test calls for confirmation.
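As a rough illustration of the two comparisons made above, the following Python sketch expresses the difference between each pair of results in units of the combined uncertainty. The helper `pull` is a hypothetical name, and treating the errors as independent and Gaussian is an assumption made only for this illustration: the values (87) and (89) both rely on the input (83), so the second number is merely indicative.

```python
import math

def pull(value_a, err_a, value_b, err_b):
    """Difference of two results in units of the combined uncertainty.

    Assumption: independent Gaussian errors, added in quadrature.
    """
    return (value_a - value_b) / math.hypot(err_a, err_b)

# Magnetic polarizability difference: lattice result (86) versus prediction (85).
# The two lattice errors, 0.28 and 0.04, are combined in quadrature (an assumption).
lattice_err = math.hypot(0.28, 0.04)
pull_beta = pull(0.80, lattice_err, 0.5, 0.6)

# Subtraction function at the origin: Eq. (89) versus Eq. (87), in GeV^-2.
# Both rely on (83), so the errors are correlated and this is only indicative.
pull_s0 = pull(-2.22, 0.60, -1.71, 0.77)

print(f"beta_M: {pull_beta:+.2f} sigma, S(0): {pull_s0:+.2f} sigma")
```

With these inputs, both differences amount to roughly half a standard deviation.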

18 Subtraction function at low \(Q^2\)

Next, we discuss the solution of the sum rule (23) for \(Q^2<1\), where the parametrizations listed in (i)–(iii) suffice. Figure 2 displays the contributions arising from the various regions of phase space.

The interval of integration in (23) is split into three parts that correspond to the regions where we are using the representations MD, BC and AI, respectively. The values of x where \(W=1.3\) and \(W=3\) are denoted by \(x_a\) and \(x_b\), respectively. In the first two parts, the integration over the term \(\bar{F}^{\mathrm{R}}/x^2\) can be done explicitly – we book these contributions together with the term involving the Reggeon residues in \(\bar{S}_{\mathrm{AI}}\). For \(Q^2<1\), the solution of the sum rule then takes the form

$$\begin{aligned} \bar{S}= & {} \bar{S}_{\mathrm{MD}}+\bar{S}_{\mathrm{BC}}+ \bar{S}_{\mathrm{AI}} ,\nonumber \\ \bar{S}_{\mathrm{MD}}= & {} \int _{x_a}^{x_\mathrm{th}}dx\,\frac{\bar{F}(x,Q^2)}{x^2(Q^2+m^2 x^2)} ,\nonumber \\ \bar{S}_{\mathrm{BC}}= & {} \int _{x_b}^{x_a}dx\,\frac{\bar{F}(x,Q^2)}{x^2(Q^2+m^2 x^2)} ,\nonumber \\ Q^2\bar{S}_{\mathrm{AI}}= & {} \int _0^{x_b}dx\,\frac{\bar{F}(x,Q^2)-\bar{F}^{\mathrm{R}}(x,Q^2)}{x^2}\nonumber \\&- \sum _{\alpha >0}\frac{b_\alpha (Q^2)}{\alpha \, x_b^\alpha }-m^2 \int _0^{x_b}dx\, \frac{\bar{F}(x,Q^2)}{Q^2+m^2 x^2} . \end{aligned}$$
(90)
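The limits of the three integration regions in (90) follow from kinematics. A minimal sketch, assuming the standard relation \(W^2=m^2+Q^2(1-x)/x\) between the invariant mass W and the variable x (this relation is not restated in the present section) and with nucleon and pion masses entered by hand as illustrative inputs; the function names are hypothetical:

```python
M_NUCLEON = 0.938  # GeV, illustrative value (assumption)
M_PION = 0.135     # GeV, illustrative value used for the threshold (assumption)

def x_of_w(w, q2, m=M_NUCLEON):
    """Value of x at invariant mass w (GeV) and momentum transfer q2 = Q^2 (GeV^2).

    Follows from W^2 = m^2 + Q^2 (1 - x) / x.
    """
    return q2 / (w**2 - m**2 + q2)

def region_boundaries(q2):
    """Return (x_b, x_a, x_th): the values of x at W = 3, W = 1.3 and at threshold."""
    x_b = x_of_w(3.0, q2)
    x_a = x_of_w(1.3, q2)
    x_th = x_of_w(M_NUCLEON + M_PION, q2)
    return x_b, x_a, x_th

x_b, x_a, x_th = region_boundaries(0.5)
print(f"x_b = {x_b:.4f}, x_a = {x_a:.4f}, x_th = {x_th:.4f}")
```

For any \(Q^2>0\) the ordering \(0<x_b<x_a<x_{\mathrm{th}}<1\) holds, so the three integrals cover the interval from 0 to \(x_{\mathrm{th}}\) without overlap.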

The term \(\bar{S}_{\mathrm{MD}}(-Q^2)\) includes the most prominent low energy phenomenon, the resonance \(\varDelta (1232)\). Isospin conservation ensures that the couplings of this state to proton and neutron are the same, so that the resonance does not show up at all in the subtraction function relevant for the difference between the two. Indeed, as shown by the green band, the contributions from this region are small.

Fig. 2

Subtraction function at low values of \(Q^2\) (GeV units, \(Q^2\bar{S}\) is dimensionless). The black line and the gray region depict central value and error band attached to our result for \(Q^2\le 1\). It represents the sum of the contributions from the regions \(W\le 1.3\) (MD), \(1.3\le W\le 3\) (BC) and \(3\le W\) (AI), which are discussed in the text. This part of our representation for the subtraction function stops at \(Q^2=1\) because it relies on a Regge representation that is not valid beyond this point. The cyan-coloured wedge labeled A represents the tangent at \(Q^2=0\) obtained with the magnetic polarizability of [74], see Eq. (89)

In the region of the higher resonances, we rely on the BC representation of the structure functions. The brown error band indicates the price to pay with the error estimate specified in Sect. 16: the largest uncertainty in our evaluation of the mass difference stems from there.

The blue band depicts the function \(\bar{S}_{\mathrm{AI}}\). Since the Regge representation AI we are using in this region is restricted to \(Q^2< 1\), the band stops at \(Q^2=1\).

The plot shows that the contributions from MD and AI are negative, while the one from BC is predominantly positive. The net central value of \(\bar{S}(-Q^2)\), which is indicated by the black line, is rather small and negative, but the uncertainty attached to it (gray error band) excludes positive values only in the vicinity of \(Q^2=0\).

The cyan-coloured wedge labeled A represents the tangent \(Q^2\bar{S}(-Q^2)=Q^2 \bar{S}(0)+\cdots \) calculated with the value of \(\bar{S}(0)\) in Eq. (89) (lattice result for magnetic polarizability plus Baldin sum rule). It confirms that at small values of \(Q^2\), the subtraction function is negative.

19 Intermediate values of \(Q^2\)

The representations of MD and BC are valid also for \(Q^2>1\), but for AI, this is not the case. We instead rely on ABM. The formula for the corresponding contribution to the subtraction function is the same as for AI, but the representation for \(\bar{F}\) consists of a numerical table rather than an algebraic parametrization like the one of AI.

The leading term in Eq. (20) stems from the Reggeon \(a_2\) and has \(\alpha \simeq 0.55\). In order to determine the corresponding coefficient \(b_\alpha \), we focus on small values of x and fit the numbers for \(\bar{F}\) obtained from the ABM table at a given value of \(Q^2\) with a parametrization of the form

$$\begin{aligned} \hat{F}=x^{1-\alpha }(b_\alpha +b_\alpha ' x + b_\alpha '' x^2) . \end{aligned}$$
(91)

The coefficients \(b_\alpha ,b_\alpha ',b_\alpha ''\) depend on \(Q^2\). We determine them by minimizing the sum of the squares of the differences between the parametrization and the ABM values over a suitable interval. At very small values of x, the numerical noise in the entries of the table hides the signal while if x is too large, the approximation used breaks down – we find that \(10^{-4}<x<x_1\) with \(x_1=3 \times 10^{-2}\) represents a suitable range. In the grid of x-values used in the ABM table, this range contains points # 15 to 25. We fix the parameter \(b_\alpha ''\) with continuity at point # 24 and treat the coefficients \(b_\alpha ,b_\alpha '\) as free parameters. For a given value of \(Q^2\), the minimization then fixes these. In particular, the procedure determines the Reggeon residue, which according to (20) is given by \(\beta _\alpha =\frac{1}{2}Q^{-2(\alpha +1)}b_\alpha \).
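Because the ansatz (91) is linear in its coefficients once \(\alpha \) is fixed, the minimization reduces to a linear least-squares problem. The sketch below illustrates this on synthetic numbers, not on the ABM table. It is a simplification: all three coefficients are fitted, whereas the procedure described above fixes \(b_\alpha ''\) by continuity. The function names and the made-up coefficients are assumptions introduced for the example.

```python
import numpy as np

ALPHA = 0.55  # intercept of the leading Reggeon quoted in the text

def fit_small_x(x, f_values, alpha=ALPHA):
    """Least-squares fit of x^(1-alpha) * (b + b1*x + b2*x^2) to f_values.

    The ansatz is linear in (b, b1, b2), so ordinary linear least squares suffices.
    Simplification: all three coefficients are fitted, whereas the text fixes b''
    by a continuity condition.
    """
    x = np.asarray(x, dtype=float)
    basis = np.column_stack([x**(1 - alpha), x**(2 - alpha), x**(3 - alpha)])
    coeffs, *_ = np.linalg.lstsq(basis, np.asarray(f_values, dtype=float), rcond=None)
    return coeffs

def reggeon_residue(b_alpha, q2, alpha=ALPHA):
    """beta_alpha = (1/2) * Q^(-2(alpha+1)) * b_alpha, with q2 = Q^2 in GeV^2."""
    return 0.5 * q2**(-(alpha + 1)) * b_alpha

# Synthetic stand-in for one row of the table (NOT ABM data): 11 points in the fit range.
x_grid = np.geomspace(1e-4, 3e-2, 11)
b_true = np.array([0.2, -1.0, 5.0])  # made-up coefficients
f_synthetic = x_grid**(1 - ALPHA) * (b_true[0] + b_true[1] * x_grid + b_true[2] * x_grid**2)

b_fit = fit_small_x(x_grid, f_synthetic)
print(b_fit, reggeon_residue(b_fit[0], q2=2.0))
```

On noise-free synthetic input the fit returns the coefficients that were put in; with real table entries the quality of the fit depends on the numerical noise discussed above.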

Fig. 3

Behaviour of the structure function \(\bar{F}\) for small x. The black dots represent the values extracted from the ABM table while the red curves show the polynomial fits (91)

Figure 3 compares the fit (red curves) with the values of \(\bar{F}\) obtained from the ABM data (black dots), for various values of \(Q^2\), including the lowest and highest ones listed in the table. In the region where the Pomeron dominates, the values of \(\bar{F}^p\) and \(\bar{F}^n\) are nearly the same. It is difficult to reliably determine the difference between the two from the data on inelastic scattering, even if the DGLAP equations provide a strong theoretical constraint. In the ABM table, the problem also manifests itself directly: for \(Q^2>3.5\), the results for \(b_\alpha \) exhibit fluctuations which are generated by the limited numerical accuracy of the entries and are visible in Fig. 3. On the other hand, it is questionable whether the ABM data can be trusted down to \(Q^2=1\), because the DGLAP equations rely on perturbation theory. For these reasons we assign an overall relative error of 30% to the numbers for the difference between the structure functions of proton and neutron obtained from ABM.

Fig. 4

Residue of the leading Reggeon. The plot shows the results obtained for the function \(\beta _\alpha \), in GeV units. Below \(Q^2=1\), the values are based on AI [60], while above that point, they rely on ABM. To make them visible despite the very rapid fall-off, a logarithmic scale is used for \(\beta _\alpha \)

Figure 4 compares the Reggeon residue extracted from the ABM table with the values for this quantity obtained from the parametrization of AI. For better visibility, the value of \(\beta _\alpha \) is plotted on a logarithmic scale. The figure shows that, at \(Q^2=1\), where the two representations meet, the results agree within errors: the two entirely different sources match, both in sign and in size.

Fig. 5

Subtraction function at intermediate values of \(Q^2\). The black dots represent the values for \(Q^2\bar{S}\) obtained from MD, BC and ABM for \(Q^2>1\) and the error bars indicate the uncertainty estimates we attach to these. The shaded red band shows the Vector Meson Dominance parametrization of our results in that region, while the dashed red lines represent the extrapolation of this band for \(Q^2<1\). The significance of the remaining entries is indicated in the caption of Fig. 2

Concerning the evaluation of the sum rule for the subtraction function, the only difference compared to the preceding section is that the AI representations for \(\bar{F}\) and \(b_\alpha \) are replaced by those obtained on the basis of ABM. The black dots in Fig. 5 show the outcome – the error bars are obtained by adding those of the contributions from \(W<1.3\) (MD), \(1.3<W<3\) (BC) and \(W>3\) (ABM) in quadrature. For comparison, the figure also shows the behaviour of the subtraction function for \(Q^2<1\), taken over from Fig. 2.

20 Vector meson dominance

As discussed in detail above, the asymptotic freedom of QCD implies that the subtraction function obeys the asymptotic condition (58): \(\bar{S}\rightarrow C/ Q^{4}\) when \(Q^2\) becomes large. The constant C does not represent an unknown, but can be expressed in terms of the mass difference in QCD. Since C is suppressed by chiral symmetry, it is tiny: \(C\approx 6\times 10^{-4}\,\text {GeV}^2\).

In the subtraction function, the numerical noise mentioned above starts becoming visible at \(Q^2\approx 3.5\) and, for \(Q^2>6\), it hides the signal completely: there, \(\bar{S}\) vanishes within errors.

In order to interpolate between the values of \(Q^2\) where the ABM table provides significant information and the region where asymptotics sets in, we make use of the generalized vector dominance model of Sakurai and Schildknecht [75], parametrizing the subtraction function in terms of the contributions from \(\rho \), \(\omega \) and \(\phi \). In the difference between proton and neutron, only the off-diagonal terms survive:

$$\begin{aligned} \bar{S}_{\mathrm{VMD}}(-Q^2)=\frac{1}{m_\rho ^2+Q^2}\left\{ \frac{c_\omega }{m_\omega ^2+Q^2}+\frac{c_\phi }{m_\phi ^2+Q^2}\right\} . \end{aligned}$$
(92)

The asymptotic condition requires the two terms in the bracket to nearly cancel:

$$\begin{aligned} c_\omega + c_\phi =C . \end{aligned}$$
(93)

This leaves a single parameter free, say \(c_\omega \). We determine this parameter by fitting the model to the values obtained from MD + BC + ABM in the region \(2<Q^2<3.5\). This range excludes values of \(Q^2\) below 2, where the validity of the DGLAP equations is questionable, as well as the region \(Q^2>3.5\), where the fluctuations show up. The minimum occurs at

$$\begin{aligned} c_\omega =-\,0.74(49)\,\text {GeV}^2 . \end{aligned}$$
(94)

The red band in Fig. 5 shows this fit.
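The way the constraint (93) enters can be made explicit with a few lines of Python. For large \(Q^2\), the parametrization (92) gives \(Q^4\bar{S}_{\mathrm{VMD}}=c_\omega +c_\phi +O(1/Q^2)\), so that eliminating \(c_\phi \) in favour of C enforces the asymptotic condition. The sketch below is an illustration only: the vector meson masses are entered by hand and are assumptions, C is set to the rough value quoted above, and the function name `s_vmd` is hypothetical.

```python
# Vector meson masses in GeV: illustrative values entered by hand (assumption).
M_RHO, M_OMEGA, M_PHI = 0.775, 0.783, 1.019

C_WILSON = 6e-4  # GeV^2, rough value of C quoted in the text
C_OMEGA = -0.74  # GeV^2, central value of Eq. (94)

def s_vmd(q2, c_omega=C_OMEGA, c=C_WILSON):
    """VMD parametrization (92) of the subtraction function at Q^2 = q2 (GeV^2).

    c_phi is eliminated with the asymptotic condition (93): c_phi = C - c_omega.
    """
    c_phi = c - c_omega
    bracket = c_omega / (M_OMEGA**2 + q2) + c_phi / (M_PHI**2 + q2)
    return bracket / (M_RHO**2 + q2)

for q2 in (0.0, 2.0, 3.5, 1.0e4):
    print(f"Q^2 = {q2:g}: S = {s_vmd(q2):+.3e} GeV^-2, Q^4 S = {q2**2 * s_vmd(q2):+.3e} GeV^2")
```

With these inputs the parametrization is negative over the fit range and at \(Q^2=0\), while \(Q^4\bar{S}_{\mathrm{VMD}}\) tends to C from below as \(Q^2\) grows.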

Since the \(Q^2\)-dependence of the VMD parametrization reproduces our results very well, the outcome for \(m_{\bar{S}}\) is not sensitive to the range used in the fit – as long as it does not extend into the region \(Q^2>6\), where the numerical fluctuations take over. The dashed red lines indicate the behaviour of the VMD parametrization at low values of \(Q^2\). Remarkably, although only input for \(Q^2>2\) was used, it shows a reasonable behaviour also at low energies. In fact, the central VMD parametrization runs within the error band obtained from the experimental information in the region \(Q^2<1\). Evaluating the representation (92) at \(Q^2 = 0\), for instance, and using the relation (81) between \(\bar{S}(0)\) and the polarizabilities, we obtain \(\alpha _E^{p - n} - \beta _M^{p - n} = -1.1(7) \). This is about four times more accurate than the available experimental information (84) and perfectly consistent with it.

We emphasize, however, that the particular form of the parametrization used to interpolate between low and high values of \(Q^2\) does not play a significant role. A parametrization of the form proposed by Erben et al. [7],

$$\begin{aligned} S_{\mathrm{ESTY}}(-Q^2)=\frac{c_0+C Q^2 }{(m_0^2+Q^2)^3} , \end{aligned}$$
(95)

is adequate as well, because it does have the proper asymptotic behaviour. Fixing \(m_0\) at the central value used in that reference, treating \(c_0\) as a free parameter and fitting it to the values of \(\bar{S}\) obtained from MD + BC + ABM in the region \(2<Q^2<3.5\), the result for the subtraction function can barely be distinguished from the one obtained with the VMD parametrization.

Fig. 6

Moments of the structure functions \(F_2\) and \(F_L\). The full lines represent the two moments specified in Eq. (57), while the dashed ones correspond to the asymptotic formulae (60), (61) obtained from the operator product expansion. For better visibility, the entries for \(M_L\) are stretched with a factor of 10

21 Asymptotics

Figure 6 shows the moments \(M_2\) and \(M_L\) obtained from the representation of the structure functions we are using – on a logarithmic scale, so that the entire range covered by the ABM data can be seen. Visibly, the moment \(M_L\) is significantly smaller than \(M_2\) – this is to be expected, because the structure function \(F_L\) violates Bjorken scaling (at leading order of the perturbative expansion, the structure functions obey the Callan–Gross relation \(F_L=0\) [76]). The dashed lines show the asymptotic behaviour predicted by the operator product expansion. The relations (60) and (61) fix the momentum dependence of \(M_2\) and \(M_L\) up to the Wilson coefficients \(C_2\) and \(C_3\), which represent matrix elements of a spin 2 operator. The results obtained from the ABM analysis are well described by setting \(N_f=3\) and using the value \(\varLambda _{{\mathrm{QCD}}}=247\,\text {MeV}\), for which the leading order expression for the running coupling constant agrees with observation at \(\mu =M_Z\). Fitting the numerical results for the moments in the range between \(Q^2=5\times 10^3\) and the upper end of the table provided by ABM, we obtain

$$\begin{aligned} C_2=0.34 \,\text {GeV}^2 ,\quad C_3=0.072 \,\text {GeV}^2 . \end{aligned}$$
(96)

Figure 6 shows that the asymptotic formulae indeed yield a good approximation all the way down to \(Q^2\approx 100\). This property is built in: the ABM analysis is based on the DGLAP equations which in turn rely on perturbation theory. In the region where the effective coupling constant becomes small, the leading terms must dominate. The figure also confirms that \(M_L\) disappears more rapidly than \(M_2\) by one power of the logarithm, but both moments fall off only very slowly.

Figure 7 shows the behaviour of the subtraction function \(\bar{S}\) at large values of \(Q^2\), on a logarithmic scale. The red line represents the VMD parametrization (92) of our central result. To make the asymptotic behaviour visible, the vertical axis is stretched with the factor \(Q^4\). The quantity \(Q^4\bar{S}\) approaches the Wilson coefficient C, which is determined by the proton matrix elements of the spin 0 operator \(\frac{1}{9}(4m_u-m_d)(\bar{u}u- \bar{d}d)\) and is indicated by the dashed red line. As discussed in Sect. 15, C picks up a correction of \(O(g^2)\). The red dots represent the values of the function

$$\begin{aligned} Q^4\bar{S}^{\text {as}}=C\left\{ 1+\frac{g^2}{12 \pi ^2}\right\} . \end{aligned}$$
(97)

The correction is too small to make a visible difference (at the mass of the Z-boson, which is marked with a star, it increases the value of C by about 1%).
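The size of the correction in (97) can be checked with the leading order formula for the running coupling. The sketch below assumes the usual identification \(g^2=4\pi \alpha _s\) and the one-loop expression \(\alpha _s=4\pi /(\beta _0\ln (Q^2/\varLambda _{{\mathrm{QCD}}}^2))\) with \(\beta _0=11-\frac{2}{3}N_f\); the value of \(M_Z\) is entered by hand and the function names are hypothetical.

```python
import math

LAMBDA_QCD = 0.247  # GeV, value quoted in the text
N_F = 3             # number of flavours used in the text
M_Z = 91.19         # GeV, illustrative value of the Z mass (assumption)

def alpha_s_leading_order(q2, n_f=N_F, lam=LAMBDA_QCD):
    """One-loop running coupling alpha_s = 4 pi / (beta0 ln(Q^2 / Lambda^2))."""
    beta0 = 11.0 - 2.0 * n_f / 3.0
    return 4.0 * math.pi / (beta0 * math.log(q2 / lam**2))

def relative_correction(q2):
    """g^2 / (12 pi^2) from Eq. (97), assuming g^2 = 4 pi alpha_s."""
    return alpha_s_leading_order(q2) / (3.0 * math.pi)

alpha_mz = alpha_s_leading_order(M_Z**2)
print(f"alpha_s(M_Z) = {alpha_mz:.4f}, correction = {100 * relative_correction(M_Z**2):.2f}%")
```

Under these assumptions the coupling at \(M_Z\) comes out close to 0.118 and the relative correction is slightly above one percent, in line with the statement above.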

Fig. 7

Asymptotic behaviour of the subtraction function. The red line shows the VMD parametrization of our results for \(Q^4 \bar{S}\), while the red dots indicate the asymptotic behaviour that follows from the OPE. The blue lines represent the corresponding results for the quantity \(Q^4S_1^{\text {inel}}\) that plays the same role in traditional analyses of the Cottingham formula (\(Q^2\) as well as \(\bar{S}\) and \(S_1\) are given in GeV units). The star indicates the point where \(Q=M_Z\)

Traditionally, the subtraction function is identified with a multiple of \(S_1(-Q^2)\equiv T_1(0,-Q^2)\). The relation between this object and the subtraction function we are working with is readily established by comparing the dispersion relations obeyed by \(\bar{T}\) and \(T_1\). The quantity to compare \(\bar{S}\) with is the inelastic part of \(S_1\),

$$\begin{aligned} S_1^{\text {inel}}(-Q^2)\equiv T_1(0,-Q^2)-T_1^{\text {el}}(0,-Q^2) . \end{aligned}$$
(98)

The comparison of the two dispersion relations yields

$$\begin{aligned} S_1^{\text {inel}}(-Q^2)= & {} \bar{S}(-Q^2)-\varDelta S(-Q^2) ,\nonumber \\ \varDelta S(-Q^2)= & {} \frac{m^2}{Q^2}\int _0^{x_\mathrm{th}}dx \frac{2F_2(x,Q^2)-F_L(x,Q^2)}{Q^2+m^2x^2} . \end{aligned}$$
(99)

In Fig. 7, our result for \(Q^4S_1^{\text {inel}}\) (obtained by subtracting the term \(Q^4\varDelta S\) from the result for \(Q^4 \bar{S}\)) is shown as a blue line. For large values of \(Q^2\), the integral over \(2F_2-F_L\) becomes proportional to \(2M_2(Q^2)-M_L(Q^2)\). With the asymptotic formulae for the moments, the asymptotic behaviour of \(S_1^{\text {inel}}\) thus takes the form

$$\begin{aligned} Q^4S_1^{\text {as}}=C-{\frac{1}{2}}C_2 \left( \!\ln \frac{Q^2}{\varLambda _{{\mathrm{QCD}}}^2}\!\right) ^{\!-d_2} +{\frac{1}{4}}C_3\left( \!\ln \frac{Q^2}{\varLambda _{{\mathrm{QCD}}}^2}\!\right) ^{\!-1-d_2} . \end{aligned}$$
(100)

This shows that, while the asymptotics of \(\bar{S}\) is governed by the matrix elements of a scalar operator, \(S_1^{\text {inel}}\) picks up additional contributions proportional to the Wilson coefficients \(C_2\) and \(C_3\), which represent matrix elements of a spin 2 operator.

The qualitative difference in the asymptotic behaviour of \(Q^4\bar{S}\) and \(Q^4S_1^{\text {inel}}\) originates in the fact that

(i) the approximate chiral symmetry of QCD suppresses the coefficient C, while \(C_2\), \(C_3\) are not suppressed – they are larger than C by two to three orders of magnitude;

(ii) while the contribution proportional to C is independent of \(Q^2\), those from \(C_2\) and \(C_3\) fall off logarithmically.

Although, eventually, C dominates \(Q^4 S_1^{\text {inel}}\) as well, asymptopia is reached only if \(Q^2\) is so large that the logarithmic suppression of the spin 2 contributions wins over the chiral suppression of those with spin 0 – from \(Q^2=10^2\) to \(Q^2=10^3\), the value of \(Q^4 S_1^{\text {inel}}\) only shrinks by about \(10\%\).
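The slow approach to the asymptotic regime can be illustrated by evaluating the formula (100) numerically. The exponent \(d_2\) is not specified in the present section; the sketch below assumes \(d_2=32/81\), the standard leading order value for a non-singlet spin 2 operator with three flavours, and this choice should be regarded as an assumption of the example rather than as part of the analysis. C is set to the rough value quoted in Sect. 20 and the function name is hypothetical.

```python
import math

LAMBDA_QCD = 0.247            # GeV, value quoted in the text
C = 6e-4                      # GeV^2, rough value of the spin 0 coefficient
C2, C3 = 0.34, 0.072          # GeV^2, Eq. (96)
D2_ASSUMED = 32.0 / 81.0      # ASSUMPTION: exponent d_2 is not given in this section

def q4_s1_asymptotic(q2, d2=D2_ASSUMED):
    """Asymptotic formula (100) for Q^4 S_1^inel at Q^2 = q2 (GeV^2)."""
    log = math.log(q2 / LAMBDA_QCD**2)
    return C - 0.5 * C2 * log**(-d2) + 0.25 * C3 * log**(-1.0 - d2)

low, high = q4_s1_asymptotic(1.0e2), q4_s1_asymptotic(1.0e3)
print(f"Q^4 S_1(10^2) = {low:.4f}, Q^4 S_1(10^3) = {high:.4f}, ratio = {high / low:.3f}")
```

With this exponent, the magnitude of \(Q^4S_1^{\text {as}}\) decreases by about a tenth over the decade considered and stays two orders of magnitude above C.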

The counter term \(\varDelta m^\varLambda \) only removes the leading and subleading divergences associated with C. The additional divergence proportional to \(C_2\) does not have anything to do with renormalization and is of purely technical nature: Eq. (60) shows that the same divergence also shows up in the asymptotic behaviour of \(T_2\). In the sum of the contributions from \(S_1^{\text {inel}}\) and \(T_2\), the spin 2 divergences cancel [17]. It is difficult, however, to specify the contribution from \(S_1^{\text {inel}}\) by itself: the asymptotic formula (100) shows that this contribution diverges unless the non-leading term proportional to \(C_2\) is removed as well as the leading one. Our framework avoids these problems.

22 Numerical evaluation of the mass difference

22.1 Form factors, \(m_{\text {el}}\)

The elastic contribution to the e.m. part of the mass difference is determined by the form factors. In early work, the experimental information about these was adequately described by the dipole formulae (see e.g. appendix A of [11]). They yield \(0.63\,\text {MeV}\) for the proton and \(-0.13\,\text {MeV}\) for the neutron, so that the elastic contribution to the self-energy difference amounts to \(m_{\text {el}}=0.76\,\text {MeV}\) [4]. In the meantime, the precision to which the form factors are known has increased significantly [77,78,79,80]. Using this information, we obtain

$$\begin{aligned} m_{\text {el}}=0.75\pm 0.02\,\text {MeV}.\end{aligned}$$
(101)

The error bar covers the results obtained with the three parametrizations of [77,78,79]. This indicates that, in the difference between the e.m. self-energies of proton and neutron, the departures from the dipole formulae only generate a change of the order of a percent. The uncertainties in the result for the mass difference generated by the elastic part are totally negligible compared to those from the inelastic contributions.

22.2 Contribution from the subtraction function

The contribution to \(m_{\bar{S}}\) depends on the scale \(\mu \) used in the e.m. renormalization of the quark masses. For definiteness, we use \(\mu =\mu _2\equiv 2\,\text {GeV}\). If \(\mu \) is taken differently, the mass difference changes by \(2NC\ln (\mu /\mu _2)\).

In the region \(0<Q^2<1\) our representation of the subtraction function is based on the parametrizations MD, BC and AI (gray band in Fig. 5). Inserting this representation in formula (78) we obtain

$$\begin{aligned} m_{\bar{S}} {\small {(0<\!Q^2\!<\!1)}}=-\,0.034(68)\,\text {MeV}. \end{aligned}$$
(102)

The central value is negative and reduces the elastic contribution by about \(5\%\). The error is twice as large, however, so that small positive contributions from this region are not excluded.

Since the integrand of \(m_{\bar{S}}\) is proportional to \(Q^2\bar{S}\), small values of \(Q^2\) are suppressed; the fictitious spike occurring there in the parametrization of BC (see Figs. 3–5 in [11]) does not affect the result very strongly, but an improved analysis of the structure functions in the resonance region above the \(\varDelta (1232)\) would allow reducing the quoted uncertainty.

In the region \(1<Q^2<\infty \), we use the VMD parametrization of \(\bar{S}\) and obtain

$$\begin{aligned}&m_{\bar{S}} {\small (1\!<\!Q^2\!<\!2)}=-\,0.040(27)\,\text {MeV},\nonumber \\&m_{\bar{S}} {\small (2\!<\!Q^2\!<\!\infty )}= -\,0.092(61)\,\text {MeV}. \end{aligned}$$
(103)

To account for the correlations between the contributions from the various regions, we determine the net error in \(m_{\bar{S}}\) by evaluating the integral in Eq. (34) for the upper and lower edges of the error band. This leads to

$$\begin{aligned} m_{\bar{S}}=-\,0.17(16)\,\text {MeV}. \end{aligned}$$
(104)
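The bookkeeping behind (104) can be checked with a short script. The net error quoted above is obtained from the edges of the error band; adding the errors of the three regions linearly, as done below, is only a proxy for that procedure (it corresponds to fully correlated errors) and is an assumption of this sketch. The quadrature sum is shown for comparison.

```python
# Central values and errors in MeV from Eqs. (102) and (103).
regions = {
    "0 < Q^2 < 1": (-0.034, 0.068),
    "1 < Q^2 < 2": (-0.040, 0.027),
    "2 < Q^2 < inf": (-0.092, 0.061),
}

central = sum(value for value, _ in regions.values())
err_linear = sum(err for _, err in regions.values())                 # fully correlated
err_quadrature = sum(err**2 for _, err in regions.values()) ** 0.5   # uncorrelated

print(f"m_S = {central:.3f} MeV, linear error {err_linear:.3f}, quadrature error {err_quadrature:.3f}")
```

The central values add up to the number in (104), and the linear sum of the errors reproduces the quoted uncertainty after rounding, whereas a quadrature sum would give a visibly smaller error.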

22.3 Contributions from the dispersion integrals

Finally, we evaluate the convergent integrals \(m_{\bar{F}}\), \(m_{F_2}\) in Eqs. (29) and (30). In these integrals, the small x region does not require special care. As mentioned above, the angular integration suppresses the contributions from the deep inelastic region. In fact, a very strong suppression also occurs at low values of \(Q^2\). Numerically, these integrals are tiny:

$$\begin{aligned} m_{\bar{F}}= & {} -0.0004(4) \,\text {MeV},\nonumber \\ m_{F_2}= & {} -0.0039(10)\,\text {MeV}. \end{aligned}$$
(105)

22.4 Result for \(m_{\mathrm{QED}}\) and \(m_{{\mathrm{QCD}}}\)

Collecting the various contributions, the part of the proton–neutron mass difference that is due to the e.m. interaction becomes

$$\begin{aligned} m_{\mathrm{QED}}= 0.58\pm 0.16\,\text {MeV}. \end{aligned}$$
(106)

The observed mass difference then yields

$$\begin{aligned} m_{{\mathrm{QCD}}}=-\,1.87\mp 0.16\,\text {MeV}. \end{aligned}$$
(107)
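The arithmetic leading to (106) and (107) is summarized in the following sketch, which assumes that the contributions listed in this section simply add up. The observed mass difference is entered by hand, since its value is not quoted here; this input is an assumption of the example.

```python
# Contributions to m_QED in MeV, taken from Eqs. (101), (104) and (105).
m_el = 0.75
m_sbar = -0.17
m_fbar = -0.0004
m_f2 = -0.0039

# Observed m_p - m_n in MeV: entered by hand, not quoted in this section (assumption).
M_P_MINUS_M_N = -1.2933

m_qed = m_el + m_sbar + m_fbar + m_f2
m_qcd = M_P_MINUS_M_N - m_qed

print(f"m_QED = {m_qed:.2f} MeV, m_QCD = {m_qcd:.2f} MeV")
```

The uncertainty of both numbers is dominated by the one of \(m_{\bar{S}}\); since \(m_{{\mathrm{QCD}}}\) is obtained by subtraction, its error is anticorrelated with the one of \(m_{\mathrm{QED}}\), as indicated by the sign \(\mp \) in (107).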

The result for \(m_{{\mathrm{QCD}}}\) provides a more precise estimate for the leading Wilson coefficient:

$$\begin{aligned} C= 5.7(1.1)\times 10^{-4}\,\text {GeV}^2 . \end{aligned}$$
(108)

We have repeated the entire calculation with this input instead of the crude estimate used for this constant. At the quoted accuracy, the results do not change.

23 Comparison with lattice calculations

Within QCD, the lattice approach allows a determination of the mass spectrum with steadily increasing precision, not only for the mesons but also for the more difficult case of the baryons. The inclusion of the e.m. interaction gives rise to a serious problem, however, because this interaction is of long range – enclosing the system in a box distorts the results through finite size effects that need to be sorted out carefully. In comparison with the extensive documentation available for lattice determinations of the quark masses within QCD, the literature containing numerical results for \(m_{\mathrm{QED}}\) is rather scarce. Figure 8 collects the results we found. Visibly, the likelihood for the results listed to represent statistically independent measurements of the same physical quantity is quite small. Indeed, not all of the errors shown include an estimate for the systematic uncertainties. Also, not all of the listed papers have appeared in print. Some of the results are obtained from a calculation that simulates QCD+QED; others stay within QCD, calculate the part due to the difference between \(m_u\) and \(m_d\) and determine the part that comes from the e.m. interaction by comparing the calculated part with the experimental value. It is well-known that the splitting into two parts depends on the convention used, but this is a theoretical problem that does not require numerical simulations.

Our numerical result for \(m_{\mathrm{QED}}\) is dominated by the elastic contribution; the remainder is significantly smaller and negative. The most recent lattice results listed in Fig. 8 are instead larger than the elastic contribution: the remainder is positive and comparable to the elastic term. Clearly, our result is not consistent with that.

Fig. 8

Electromagnetic part of the mass difference between proton and neutron (MeV units). The upper part shows lattice results [81,82,83,84,85,86,87,88,89], the lower part contains results obtained with the Cottingham formula [4, 6,7,8, 10, 45], including the outcome of our analysis

24 Comparison with other evaluations of the Cottingham formula

We are aware of four recent estimates for the proton–neutron mass difference based on evaluations of the Cottingham formula: Walker-Loud, Carlson and Miller (WCM) [6, 9], Erben, Shanahan, Thomas and Young (ESTY) [7], Thomas, Wang and Young (TWY) [8] and Tomalak [10]. The first three propose models for the subtraction function \(S_1(-Q^2)\), using the experimental information concerning the difference between the magnetic polarizabilities of proton and neutron to determine the value of \(S_1(0)\) and making a simple algebraic ansatz for the momentum dependence. A detailed comparison of the models proposed by WCM and ESTY with the results obtained from Reggeon dominance at low values of \(Q^2\) can be found in [11].

Tomalak [10] also uses the available experimental information about the magnetic polarizabilities, but instead of making an ansatz for the momentum dependence of the subtraction function, he calculates it on the basis of the assumption that – once the contributions from the Reggeons are removed – the amplitude \(\hat{T}_1=q^2T_1+\nu ^2T_2\) obeys an unsubtracted dispersion relation [90]. Although this assumption resembles Reggeon dominance, we consider it very unlikely that it is correct. For \(q^2 = 0\), for instance, the amplitude \(\hat{T}_1\) reduces to \(\nu ^2T_2\). The asymptotic behaviour of this quantity was investigated by Damashek and Gilman [91] and, independently, by Dominguez, Ferro Fontan and Suaya [92]. Their work indicates that \(f=\nu ^2 (T_2-T_2^{\mathrm{R}})\) tends to a nonzero constant when \(\nu \) becomes large. The assumption used in [10] instead implies that f tends to zero. At any rate, this hypothesis implies a constraint on the imaginary part of \(T_2\) at \(q^2=0\), i.e. on the cross section of photoproduction: it leads to a sum rule that requires an integral over the cross section to cancel the Thomson term. We do not know of an argument that would support this assumption.

Incidentally, the assumption used in [10] corresponds to a special case of the universality hypothesis of Brodsky, Llanes-Estrada and Szczepaniak [93,94,95], who do not impose the condition that the difference \(\hat{T}_1-\hat{T}_1^R\) tends to zero for \(\nu \rightarrow \infty \), but postulate that it becomes independent of \(q^2\). We cannot see any reason for this to be the case in QCD (see also [96,97,98]).

24.1 Contributions from the elastic part

Since \(T_2\) obeys an unsubtracted dispersion relation, the corresponding Born term is readily obtained by saturating the dispersion integral with the contributions from the nucleon poles. For \(T_1\), however, the Born term is not unique – various different expressions are used in the literature. They all obey a subtracted dispersion relation, but differ in the choice of the subtraction function.

Dispersion theory offers a unique solution: since analytic functions are determined by their singularities and their behaviour at infinity, it suffices to impose the condition that the Born term vanishes for \(\nu \rightarrow \infty \). We refer to the resulting expression as the elastic part of the amplitude. It is explicitly given in formula (8) (the unsubtracted dispersion relation used to specify the Born term for \(T_2\) automatically ensures that it disappears if \(\nu \) becomes large). Accordingly, the elastic part of \(m_{\mathrm{QED}}\), which we denote by \(m_{\text {el}}\), is an unambiguous notion as well. It is obtained by replacing the amplitudes in (26) by their elastic parts and removing the cutoff – the elastic contributions are convergent.

WCM [6] instead represent the elastic part of the mass difference with two terms (see footnote 5). The sum of the two, \(\delta M_{\text {el}}+\delta M_{\text {el}}^{\mathrm{sub}}\), differs from \(m_{\text {el}}\) by

$$\begin{aligned} \varDelta m_{\text {el}} =-\frac{3\alpha _\mathrm{em}m}{2\pi }\int _0^\infty dQ^2Q^2\,\frac{(G_E-G_M)^2}{(4m^2+Q^2)^2} . \end{aligned}$$
(109)

Numerically, \(\varDelta m_{\text {el}}\) is small: using the parametrization of Kelly [78], we obtain \(\varDelta m_{\text {el}}^p=-\,0.051\,\text {MeV}\), \(\varDelta m_{\text {el}}^n=-\,0.064\,\text {MeV}\). In the difference between proton and neutron, these numbers even partly cancel.
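The order of magnitude of \(\varDelta m_{\text {el}}\) can be reproduced with a simple numerical integration of (109). The sketch below does not use the parametrization of Kelly; it assumes dipole form factors, \(G_E=e_N G_D\) and \(G_M=\mu _N G_D\) with \(G_D=(1+Q^2/0.71\,\text {GeV}^2)^{-2}\), together with hand-entered values for the nucleon mass and the magnetic moments. The numbers it returns are therefore only expected to be of the same size as those quoted above, not to agree with them in detail.

```python
ALPHA_EM = 1.0 / 137.036   # fine structure constant
M_N = 0.938                # GeV, nucleon mass (illustrative value)
DIPOLE_SCALE = 0.71        # GeV^2, standard dipole parameter (assumption)
MU_P, MU_N = 2.793, -1.913 # magnetic moments in nuclear magnetons

def dipole(q2):
    """Dipole form factor G_D at Q^2 = q2 (GeV^2)."""
    return (1.0 + q2 / DIPOLE_SCALE) ** -2

def delta_m_el(charge, mu, n_steps=20000):
    """Midpoint-rule estimate of Eq. (109) in MeV with dipole form factors.

    Assumption: G_E = charge * G_D and G_M = mu * G_D, not the Kelly fit of the text.
    The substitution Q^2 = s / (1 - s) maps the infinite range onto 0 < s < 1.
    """
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) / n_steps
        q2 = s / (1.0 - s)
        jacobian = 1.0 / (1.0 - s) ** 2
        g_diff = (charge - mu) * dipole(q2)
        total += q2 * g_diff**2 / (4.0 * M_N**2 + q2) ** 2 * jacobian
    integral = total / n_steps
    return -3.0 * ALPHA_EM * (1000.0 * M_N) / (2.0 * 3.141592653589793) * integral

dm_p = delta_m_el(charge=1.0, mu=MU_P)
dm_n = delta_m_el(charge=0.0, mu=MU_N)
print(f"proton: {dm_p:.3f} MeV, neutron: {dm_n:.3f} MeV, difference: {dm_p - dm_n:+.3f} MeV")
```

In the dipole approximation both terms are negative and of order a few hundredths of an MeV, and the difference between proton and neutron is smaller still.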

At the precision at which the nucleon form factors can nowadays be measured, it matters whether the standard expression for \(m_{\text {el}}\) or the quantity \(m_{\text {el}}+\varDelta m_{\text {el}}\) is determined. For the decomposition (28) to be valid, it is essential that the nucleon form factors exclusively occur in \(m_{\text {el}}\) – any other representation of the elastic part must be compensated by a corresponding correction in the term arising from the subtraction function.

24.2 Contributions from the subtraction function

As demonstrated in the preceding sections, the inelastic contributions to \(m_{\mathrm{QED}}\) are totally dominated by the one from the subtraction function \(\bar{S}\). The differences in the values quoted for the elastic contributions are small compared to those from inelastic processes. Hence we can compare the various determinations of the mass difference that rely on dispersion theory by comparing the corresponding representations for \(\bar{S}\).

The bands labeled B and C in Fig. 9 show the models for the subtraction function of WCM [6] and ESTY [7], respectively. They are obtained from the representations proposed for \(S_1\) in these references, merely converting numbers for \(S_1^{\text {inel}}\) into numbers for \(\bar{S}\) by means of Eq. (99). The width of the bands exclusively shows the uncertainties arising from the experimental information used for the magnetic polarizabilities – those associated with the freedom in the choice of the model would widen it further. In the \(Q^2\) range shown in the figure, both models are consistent with our analysis, but come with significantly larger errors (as the lower edge of band C runs within our band of uncertainties, it cannot be seen in Fig. 9).

Fig. 9

Subtraction function versus \(Q^2\), in GeV units. The bands labeled MD+BC+AI and VMD indicate our representation in the region below and above \(Q^2=1\), respectively. The wedge A depicts the tangent at \(Q^2=0\) obtained with the magnetic polarizability of [74], B: model [6], C: model [7], D: parametrization of [7] applied to \(\bar{S}\)

The input used in models B and C for the value of the subtraction function at \(Q^2=0\) is the same – it is based on the experimental determination of the polarizabilities of the nucleon. At small values of \(Q^2\), our uncertainties are smaller because the predictions obtained from Reggeon dominance for the polarizabilities of the neutron [11] are more precise than the experimental values. An improved measurement of the polarizabilities would be most welcome as it would subject Reggeon dominance to an important test. In this connection, we also refer to the new lattice results on the magnetic polarizabilities discussed in Sect. 17.

At large values of \(Q^2\), the uncertainty band attached to model C is narrower than that of B, because the parametrization is improved: asymptotically, model C does reproduce the leading term in the operator product expansion of \(S_1^{\text {inel}}\). As can be seen in Fig. 7, however, the nonleading spin 2 contributions disappear only extremely slowly. In the parametrization of model C, these are neglected.

The net result obtained with model C for \(\beta _M^{p-n}=-\,0.5(1.6)\) is \(m_{\mathrm{QED}}^C=0.95(25)\,\text {MeV}\) [7]. The corresponding outcome for the contribution from the subtraction function is obtained by removing the elastic part as well as the contributions from the convergent dispersion integrals. With the entries for the elastic contributions listed in Table I of [7] and the values given in Eq. (105) for the tiny terms \(m_{F_2}\) and \(m_{\bar{F}}\), this yields

$$\begin{aligned} m_{\bar{S}}^C=0.19(25)\,\text {MeV}. \end{aligned}$$
(110)

The value obtained by integrating the subtraction function of model C only over the low energy region is nearly the same: \(m_{\bar{S}}^C(Q^2\!<\!2)=0.20(29)\,\text {MeV}\). This indicates that in the evaluation of model C in [7], the contribution from \(Q^2>2\) is nearly cancelled by the counter term, but we cannot verify this within our own framework. Since the parametrization of \(S_1\) used in model C neglects the non-leading contributions in the asymptotic formula (100), it does not make sense to insert the corresponding representation for \(\bar{S}\) in the expression (78) for \(m_{\bar{S}}\) – the integral diverges. Also, the blue line in Fig. 7 shows that for \(S_1^{\text {inel}}\), asymptopia sets in extremely slowly, because the contributions generated by the short distance singularities of spin 2 fall off only logarithmically.

The numerical results for the subtraction function used by TWY and Tomalak are very similar to model C and they also lead to similar results for the e.m. part of the mass difference: \(m_{\mathrm{QED}}=1.04(11)\,\text {MeV}\) (TWY) and \(m_{\mathrm{QED}}=1.09(30)\,\text {MeV}\) (Tomalak). The difference mainly arises from the input used for \(\beta _M^{p-n}\). Note that the value \(\beta _M^{p-n}=-1.12(40)\) used by TWY comes with a remarkably small error and disagrees with the Reggeon dominance prediction (85) by 2.5 \(\sigma \). This is puzzling, because the determination of \(\beta _M^{p-n}\) in TWY is based on the lattice data of Blum et al. [82] – as shown in Fig. 8, these data are perfectly consistent with the range for \(m_{\mathrm{QED}}\) obtained from Reggeon dominance.

The ambiguities related to the fact that the function \(S_1^{\text {inel}}\) approaches asymptotics only very slowly do not arise if the parametrization of model C is used to represent \(\bar{S}\) rather than \(S_1^{\text {inel}}\). We refer to this option as model D: the momentum dependence of \(\bar{S}\) is described by the function specified in Eq. (95), \(m_0\) is identified with the scale \(m_0^2=0.71\,\text {GeV}^2\) occurring in the dipole representation of the nucleon form factors [9] and the parameter \(c_0\) is fixed with the experimental value \(\bar{S}(0)=-0.2(2.6)\,\text {GeV}^{-2}\) given in equation (88), which is based on the determination of the polarizabilities in [70]. The blue shaded region in Fig. 9 shows that the subtraction function obtained with this variant of the models proposed in [6,7,8] agrees perfectly well with our analysis, but comes with a much larger error. Inserting the parametrization of model D in formula (78), we obtain

$$\begin{aligned} m_{\bar{S}}^D=-\,0.05(61)\,\text {MeV}. \end{aligned}$$
(111)

The region \(Q^2>2\) does not contribute much to the central value, but is responsible for a substantial fraction of the error: \(m_{\bar{S}}^D(Q^2\!>\!2)=-\,0.02(28)\,\text {MeV}\).

C and D have the same behaviour at very small and very large values of \(Q^2\) – they only differ in the form of the interpolation used in between. The example shows that, if only the leading terms in the OPE of the subtraction function are accounted for, the outcome is very sensitive to the form of the interpolation: replacing C by D lowers the central value of \(m_{\bar{S}}\) by \(0.24 \,\text {MeV}\) and thus lowers the outcome for the central value of the mass difference to \(m_{\mathrm{QED}}=0.71\,\text {MeV}\). This is within the uncertainty range attached to our result (106). The sensitivity to the form chosen for the interpolation arises because the subtraction function \(S_1\) reaches asymptotics only very slowly.
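The arithmetic behind this comparison can be sketched as follows. The sketch uses only the central values quoted in Eqs. (110) and (111) and in the result of [7], and it assumes that all terms other than the one from the subtraction function are held fixed when C is replaced by D; the function and variable names are ours.

```python
# Central values quoted in the text, all in MeV.
m_qed_C = 0.95    # full e.m. mass difference obtained with model C [7]
m_sbar_C = 0.19   # subtraction-function contribution, Eq. (110)
m_sbar_D = -0.05  # subtraction-function contribution, Eq. (111)


def switch_interpolation(m_qed, m_sbar_old, m_sbar_new):
    """Swap the subtraction-function term, keeping all other terms fixed (assumption)."""
    # Elastic part plus the tiny contributions from the convergent dispersion integrals.
    rest = m_qed - m_sbar_old
    return rest + m_sbar_new, rest


m_qed_D, rest = switch_interpolation(m_qed_C, m_sbar_C, m_sbar_D)
shift = m_sbar_C - m_sbar_D

print(f"shift in m_Sbar      = {shift:.2f} MeV")
print(f"remaining terms      = {rest:.2f} MeV")
print(f"m_QED with model D   = {m_qed_D:.2f} MeV")
```

The remaining terms, which are common to both variants, amount to the difference between the quoted values of \(m_{\mathrm{QED}}^C\) and \(m_{\bar{S}}^C\).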

Our analysis is not affected by this ambiguity, because we calculate the subtraction function in the region \(Q^2<3.5\) on the basis of the experimental information about the structure function \(\bar{F}\) and rely on the theoretical information about the asymptotics only at higher energies. As pointed out in Sect. 20, the contribution arising from the region \(Q^2>2\) is nearly independent of the form of the parametrization used there, provided only that it obeys the theoretical constraints imposed by asymptotic freedom.

25 Summary and conclusions

  1.

    Dispersion theory determines the amplitude in terms of its physical singularities (poles, cuts), provided the asymptotic behaviour is known. The use of amplitudes that contain kinematic zeros is best avoided, because these make it very difficult to sort out the asymptotic behaviour. We work with the invariant amplitudes introduced by Cottingham, which are free of such zeros and which we denote by \(T_1,T_2\).

  2.

    In the framework of dispersion theory, the elastic part of \(T_1,T_2\) is an unambiguous notion, determined by the requirement that it is analytic except for the poles generated by the elastic reaction and disappears when \(\nu \rightarrow \infty \). Accordingly, the elastic contribution to the Cottingham formula is unambiguous.

  3.

    As we do not know the error matrix occurring in the determinations of the form factors, we are not in a position to give a reliable estimate for the uncertainties in \(m_{\text {el}}\). We instead rely on the results obtained with the three different parametrizations in [77,78,79], which are covered by

    $$\begin{aligned} m_{\text {el}}=0.75\pm 0.02\,\text {MeV}.\end{aligned}$$
    (112)

    A determination of \(m_{\text {el}}^p\), \(m_{\text {el}}^n\) and \(m_{\text {el}}^{p-n}\) on the basis of the information about the nucleon form factors available today would reduce the error considerably, but at the precision to which the inelastic contributions can currently be determined, the uncertainty quoted in (112) is too small to affect the error estimate attached to our result for \(m_{\mathrm{QED}}\).

  4.

    The leading terms of the operator product expansion of the Compton amplitude involve contributions arising from short distance singularities related to operators of spin 0 as well as spin 2. We make use of the fact that the leading spin 2 contributions to \(T_1\) and \(T_2\) only differ in normalization: in the combination \(\bar{T}\equiv T_1+\frac{1}{2}T_2\), they drop out. Replacing the pair \(\{T_1,T_2\}\) by \(\{\bar{T},T_2\}\) simplifies the analysis considerably.

  5.

    A further simplification occurs if the dispersion relation for \(\bar{T}\) is not subtracted at \(\nu =0\), but at \(\nu =\frac{1}{2}\sqrt{q^2}\). This ensures that the contributions from the dispersion integrals over \(\text {Im}\,\bar{T}\) and \(\text {Im}\,T_2\) both contain the factor \(\nu ^2-\frac{1}{4}q^2\). The point here is that in the Cottingham formula, only the angular average matters. Since the angular average of \(\nu ^2-\frac{1}{4}q^2\) vanishes, the contributions from the dispersion integrals are suppressed – numerically, these contributions are tiny. In our decomposition of the amplitude, only the elastic term and the integral over the subtraction function can generate significant contributions to the mass difference.

  6.

    The quarks and gluons reggeize. The exchange of Reggeons generates moving poles. For large values of \(\nu \) at fixed \(q^2\), a Reggeon contributes with \(\bar{T}\propto \nu ^\alpha \) and \(T_2\propto \nu ^{\alpha -2}\), where \(\alpha \) is the value of the trajectory \(\alpha (t)\) at \(t=0\). Since there are trajectories with \(\alpha >0\), the dispersion relation for \(\bar{T}\) must be subtracted. The one for \(T_2\) does not require a subtraction.

  7.

    We assume that the asymptotic behaviour of \(\bar{T}\) is dominated by the contributions from the Reggeons, which we denote by \(\bar{T}^{\mathrm{R}}\). More precisely, we require that \(\bar{T} -\bar{T}^{\mathrm{R}}\) tends to zero when \(\nu \rightarrow \infty \) and refer to this assumption as Reggeon dominance. A nonzero limiting value would represent a fixed pole – we are thus assuming that reggeization is complete and only moving poles occur. Note that the dispersion relations for \(\bar{T}\) and \(T_2\) imply the presence of contributions that fall off with the power \(\nu ^{-2}\). In \(T_2\), these contributions correspond to a fixed pole at \(\alpha =0\) – Reggeon dominance is perfectly consistent with fixed poles of this sort.

  8.

    Reggeon dominance implies a sum rule that determines the subtraction function \(\bar{S}\) in terms of the structure function \(\bar{F}\). The explicit expression given in (23) shows that neither the nucleon form factors nor the structure function \(F_2\) enter. A variant of this sum rule was proposed long ago by Elitzur and Harari [3], on the basis of duality and finite energy sum rules.

  9.

    The value of \(\bar{S}(q^2)\) at \(q^2=0\) is related to the polarizabilities of the nucleon. As is well known, the sum of the electric and magnetic polarizabilities is determined by a sum rule involving the cross section for photoproduction. Reggeon dominance implies separate sum rules for the electric and magnetic polarizabilities. The prediction obtained for the difference between the magnetic polarizability of proton and neutron [11] is in agreement with experiment, but this represents only a weak test of Reggeon dominance, because the uncertainties in the experimental result are rather large. The errors attached to the recent lattice result of [74] are much smaller – it is encouraging that Reggeon dominance passes this more stringent test as well. More work on the polarizabilities, particularly those of the neutron, would be most welcome.

  10.

    Theory fixes the asymptotic behaviour of the subtraction function: if \(Q^2\) becomes large, \(\bar{S}\) tends to \(C/Q^4\), where the constant C is given by the proton matrix element of the operator \(\frac{1}{9}(4m_u-m_d)(\bar{u}u-\bar{d}d)\). This also holds for \(S_1(q^2)=T_1(0,q^2)\), the subtraction function commonly used in dispersive analyses of the Compton amplitude, but the short distance singularities related to operators of spin 2 generate a significant difference in the asymptotic behaviour. Figure 7 compares the momentum dependence of \(S_1\) and \(\bar{S}\) on a logarithmic scale and shows that, in contrast to \(\bar{S}\), the asymptotics of \(S_1\) sets in only very slowly.

  11.

    An important part of the calculation concerns the determination of the residue of the Reggeon \(a_2\), which dominates the asymptotic behaviour of the difference between the amplitudes of proton and neutron. Figure 4 shows that the result obtained at low values of \(Q^2\) from the Regge representation of [60] matches the outcome of the Regge fit to the numerical ABM table remarkably well.

  12.

    With the values for the subtraction function obtained from the solution of the sum rule, our net result for the e.m. part of the mass difference between proton and neutron reads

    $$\begin{aligned} m_{\mathrm{QED}}^{p-n}= 0.58\pm 0.16\,\text {MeV}. \end{aligned}$$
    (113)

    The conclusions reached in Ref. [4] are thus confirmed: \(m_{\mathrm{QED}}\) is dominated by the elastic contribution. The uncertainty in the result obtained forty-five years ago, \(m_{\mathrm{QED}}= 0.7(3)\,\text {MeV}\) [4], is reduced by about a factor of two. In the present analysis, the uncertainty is predominantly due to the contributions from the resonance region above the \(\varDelta (1232)\). It could be reduced by an improved experimental determination of the structure functions in that region, particularly for the neutron.

  13.

    It is by no means puzzling that the inelastic contributions are so small: (a) the angular integration suppresses the contributions from the dispersion integrals, (b) at large values of \(Q^2\), the subtraction function is nearly the same for proton and neutron – in the chiral limit, there is no difference, (c) in the region where Reggeon exchange dominates, the leading term, the Pomeron, is the same, (d) isospin symmetry ensures that the most important resonance, the \(\varDelta (1232)\), contributes equally to proton and neutron and (e) the leading terms of the chiral perturbation series are also the same.

    With the experimental value of the mass difference, the above result implies that the part due to the difference between \(m_u\) and \(m_d\) is given by

    $$\begin{aligned} m_{{\mathrm{QCD}}}^{p-n}=-\,1.87\mp 0.16\,\text {MeV}.\end{aligned}$$
    (114)

  14.

    The lattice results for these quantities have not yet reached the level of coherence required to be covered by the FLAG report, but the method is steadily being improved and, in the long run, should provide reliable numbers. Figure 8 indicates that the most recent lattice values are larger than the outcome of the present work. If the value of \(m_{\mathrm{QED}}^{p-n}\) should turn out to be larger than \(1\,\text {MeV}\), we would have to conclude that the Compton amplitude does not fully reggeize: the amplitude \(T_1\) would then contain a fixed pole that invalidates the Reggeon dominance hypothesis. We would then be left with a puzzle: what is the physical origin of this fixed pole?

  15.

    The evaluations of the Cottingham formula in [6,7,8] lead to values for \(m_{\mathrm{QED}}\) around \(1\,\text {MeV}\). In these references, a simple algebraic ansatz is used to parametrize \(S_1^{\text {inel}}\), the inelastic part of the subtraction function \(T_1(0,q^2)\). Figure 7 shows that, in contrast to these parametrizations, \(S_1^{\text {inel}}\) approaches asymptotics only extremely slowly.

    The mismatch with the asymptotics disappears if the ansatz is assumed to be valid for \(\bar{S}\) rather than \(S_1^{\text {inel}}\). The central value obtained for \(m_{\mathrm{QED}}\) then drops by \(0.24\,\text {MeV}\) and winds up slightly below the elastic contribution, in agreement with what we find. On the other hand, quite apart from the sensitivity to the precise form of the assumptions underlying those models, the uncertainties in the result for \(m_{\mathrm{QED}}\) are much larger than ours, because the experimental determination of \(\beta _M^{p-n}\), which plays a key role in that approach, is subject to large uncertainties.
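Two of the statements in the summary lend themselves to a quick numerical check: the vanishing of the angular average of \(\nu ^2-\frac{1}{4}q^2\) (item 5) and the step from \(m_{\mathrm{QED}}^{p-n}\) to \(m_{{\mathrm{QCD}}}^{p-n}\) (item 13). The sketch below is only illustrative and rests on assumptions that are not spelled out in this section: we take the standard Euclidean parametrization \(\nu =iQ\cos \theta \), \(q^2=-Q^2\), for which the angular measure in four dimensions is \(\sin ^2\theta \,d\theta \), and we insert the rounded value \(-1.2933\,\text {MeV}\) for the physical mass difference \(m^p-m^n\). All function and variable names are ours.

```python
import math


def angular_average_4d(f, n=2000):
    """Average f(cos(theta)) over directions in four Euclidean dimensions.

    The polar angle carries the weight sin^2(theta) on 0 <= theta <= pi.
    A plain midpoint rule is used; n is an illustrative choice.
    """
    h = math.pi / n
    num = 0.0
    den = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        w = math.sin(theta) ** 2
        num += w * f(math.cos(theta))
        den += w
    return num / den


# With nu = i*Q*cos(theta) and q^2 = -Q^2 (assumed parametrization),
# nu^2 - q^2/4 = -Q^2 * (cos^2(theta) - 1/4); we average the bracket.
avg = angular_average_4d(lambda c: c * c - 0.25)
print(f"angular average of cos^2(theta) - 1/4: {avg:.1e}")

# Item 13: strong part of the mass difference, all numbers in MeV.
M_P_MINUS_M_N = -1.2933   # physical m^p - m^n, rounded (input not quoted in the text)
m_qed, m_qed_err = 0.58, 0.16   # Eq. (113)
m_qcd = M_P_MINUS_M_N - m_qed   # the error of m_QED carries over unchanged
print(f"m_QCD = {m_qcd:.2f} -/+ {m_qed_err:.2f} MeV")

# Item 12: reduction of the uncertainty with respect to 0.7(3) MeV [4].
reduction = 0.3 / m_qed_err
print(f"error reduction factor: {reduction:.2f}")
```

The angular average vanishes up to rounding errors because the mean of \(\cos ^2\theta \) over directions in four dimensions equals \(\frac{1}{4}\); the remaining two numbers reproduce Eq. (114) and the factor of about two quoted for the reduction of the uncertainty.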