1 Introduction

The muon anomalous magnetic moment, characterized by the gyromagnetic anomaly, \(a_\mu =(g-2)/2\), is the subject of notable current interest. Recent measurements at Fermilab [1, 2] with increased precision exhibit a spectacular excess compared to the Standard Model (SM) prediction [3], whose lowest-order (LO) hadronic vacuum polarization (HVP) contribution is evaluated with dispersion integrals involving \(e^+e^-\rightarrow \textrm{hadrons}\) cross-section data [4,5,6,7,8,9]. Other contributions are from quantum electrodynamics [10, 11], electroweak interactions [12, 13], NLO HVP [9], NNLO HVP [14], and hadronic light-by-light scattering [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]. While the observed \(5\sigma \) deviation could be regarded as a serious clue for physics beyond the SM, it must be taken with extreme caution in view of significant tensions among the data sets entering the HVP calculations. Although the decade-long discrepancy between the two most precise results of the \(e^+e^-\rightarrow \pi ^+\pi ^-(\gamma )\) cross section (footnote 1) from KLOE [30,31,32,33] and BABAR [34, 35] was already taken into account in the systematic uncertainty assigned to the prediction [3, 8], the recent measurement of the same process by CMD-3 [36, 37] is in conflict with all previous determinations, thus requiring close scrutiny of all the \(e^+e^-\) input data.

A tension of a different nature arose almost four years ago with the first precise HVP calculation using QCD on the lattice [38] that resulted in a \(2.1\sigma \) larger lowest-order contribution than the dispersive analysis. The tension is exacerbated to \(3.7\sigma \) if the comparison is restricted to an intermediate HVP window in Euclidean time [39], which can be calculated more precisely on the lattice. Confirmation of this discrepancy has since then been obtained by several independent lattice groups [40,41,42,43]. This situation calls again for specific studies cross-checking both approaches [44, 45].

This paper reviews the existing tensions among the \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) cross-section measurements, and discusses in detail systematic uncertainties related to higher-order (HO) effects [46] in the measurements relying on initial state photon radiation. In view of the results obtained we reappraise the use of \(\tau \) hadronic spectral functions in the dispersive approach, and discuss the discrepancies of the dispersive HVP calculations with lattice QCD and the \(a_\mu \) experimental result.

2 Tensions among the \(e^+e^-\rightarrow \pi ^+\pi ^-(\gamma )\) data sets

The \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) channel accounts for 73% of the lowest-order HVP contribution to \(a_\mu \) in the dispersive approach, and for 58% of its squared uncertainty. It also exhibits the largest observed discrepancies among some of the most precise data sets. The studies in this paper therefore focus on that process.

The longest known and most critical tensions occur between precise cross section measurements from KLOE and BABAR. Although heavily discussed in the framework of the Muon g – 2 Theory Initiative [47], the difference could not be understood, and consequently no solution to the problem emerged. The discrepancy was bridged by inflating the uncertainties of the corresponding HVP contribution.

The available \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) cross-section measurements, zoomed into the \(\rho \) peak region, are shown in Fig. 1. Their combination and \(1\sigma \) uncertainty, obtained using the DHMZ methodology implemented in the HVPTools software [48, 49], are indicated by the green band. The spline-based combination procedure (footnote 2) takes into account all known correlations and accounts for measurement tensions. It has been thoroughly validated through closure tests [48]. Compared to our last update [8], we added the more recent SND20 [50] and CMD-3 [36, 37] data, while also employing an updated version of the covariance matrix provided by BESIII [51].

Fig. 1. Bare \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) cross section versus centre-of-mass energy in the \(\rho \) peak region. The error bars of the data points include statistical and systematic uncertainties added in quadrature. The green band shows the HVPTools combination within its \(1\sigma \) uncertainty.

Relative comparisons between the most precise individual measurements and the combination are shown for the \(\rho \) resonance region in Fig. 2, and for the BABAR and CMD-3 data in a wider window in Fig. 3. A large tension arises between CMD-3 and KLOE, which provide the largest and smallest cross-section measurements, respectively. Tensions are also observed between BABAR and CMD-3 in the central \(\rho \) resonance region, while the two agree at low and high energies. The CMD-3 data also exhibit a 2.8\(\sigma \) discrepancy with the older CMD-2 results by the same collaboration [52]. Extensive discussions with CMD-3/2 physicists in the framework of the Muon g – 2 Theory Initiative [53] did not reveal any obvious problem in the new results. A summary of these discussions is available [54].

Fig. 2. Comparison between \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) cross-section measurements from BABAR [34, 35], KLOE 08 [30], KLOE 10 [31], KLOE 12 [32], BESIII [51], CMD-2 03 [55], CMD-2 06 [52], SND [56], SND20 [50], CMD-3 [36, 37], and the HVPTools combination. The error bars include statistical and systematic uncertainties added in quadrature.

Figure 4 (top) shows the local combination weights versus \(\sqrt{s}\) for each data set. They take into account the uncertainties of the measurements and their correlations, as well as the corresponding point-spacing and binning [48, 49]. While previously the BABAR and KLOE measurements dominated the combination over the entire energy range, the more recent CMD-3 and SND20 data receive important weights, too. The group of experiments labelled “Other exp” corresponds to older data, often with incomplete radiative corrections, which receive small weights throughout.

The bottom panel of Fig. 4 displays the uncertainty scale factor versus \(\sqrt{s}\), derived based on the local compatibility among the measurements [48, 49] (footnote 3). Large scale factors due to tensions indicate the presence of systematic effects that are not included in the measurement uncertainties. They require a conservative uncertainty treatment in the combination [3, 8].
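As a rough illustration of how a local scale factor can follow from the compatibility of measurements, the sketch below applies a PDG-style prescription, \(S=\sqrt{\chi ^2/n_\textrm{dof}}\) whenever this exceeds one, to a few uncorrelated measurements in a single energy bin. The prescription, the function name `local_scale_factor`, and the input numbers are assumptions made for illustration only; the HVPTools procedure [48, 49] additionally treats correlations, point spacing and binning, which are ignored here.

```python
import math

def local_scale_factor(values, sigmas):
    """Return max(1, sqrt(chi2/ndof)) for uncorrelated measurements of one quantity."""
    weights = [1.0 / s**2 for s in sigmas]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    chi2 = sum(((v - mean) / s) ** 2 for v, s in zip(values, sigmas))
    ndof = len(values) - 1
    return max(1.0, math.sqrt(chi2 / ndof))

# Hypothetical cross-section values (arbitrary units) in one energy bin.
consistent = local_scale_factor([100.0, 101.0, 99.5], [1.0, 1.0, 1.0])
in_tension = local_scale_factor([100.0, 104.0], [1.0, 1.0])
print(consistent, in_tension)
```

In this toy, compatible inputs leave the combined uncertainty unchanged, while two measurements separated by four standard deviations inflate it by a factor of \(\sqrt{8}\).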

Figure 5 shows the pull magnitude (significance) between pairs of the three most precise \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) experiments, computed as the absolute value of the difference of the contributions to \(a_\mu \) divided by its uncertainty, in various energy intervals. The three KLOE measurements [30,31,32] have been combined into one data set [33]. The difference between BABAR and CMD-3 rises to a significance of 2–3\(\sigma \) on the \(\rho \) peak, while reasonable agreement is seen at lower and higher energies. The differences between BABAR and KLOE are also at the 2–3\(\sigma \) level in the \(\rho \) peak region, reaching up to \(4\sigma \) at higher energy, while good agreement is seen at lower energy. The largest differences are observed between CMD-3 and KLOE, with significance above \(5\sigma \) around the \(\rho \) peak. When probing the broader energy interval 0.6\(-\)0.975\(\mathrm {\;Ge\hspace{-1.00006pt}V} \), covering the \(\rho \) peak, the significance of the difference between BABAR and CMD-3 is \(2.2\sigma \), that between BABAR and KLOE is \(3.0\sigma \), while CMD-3 and KLOE differ by \(5.1\sigma \) (Fig. 5, bottom). When extending the comparisons to the maximal regions of overlap between pairs of experiments, the differences are diluted to \(2.1\sigma \) between BABAR and CMD-3, \(1.5\sigma \) between BABAR and KLOE, and \(3.3\sigma \) between CMD-3 and KLOE, respectively, owing to the better inter-experiment agreement and larger KLOE uncertainties below and above the peak of the resonance.
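The pull defined above can be sketched in a few lines. The function name `pull`, the optional correlation coefficient `rho`, and the numerical inputs are hypothetical and serve only to illustrate the definition; the actual evaluation uses the full covariance information of the experiments.

```python
import math

def pull(a1, sigma1, a2, sigma2, rho=0.0):
    """|a1 - a2| divided by the uncertainty of the difference.

    rho is an assumed correlation coefficient between the two results.
    """
    variance = sigma1**2 + sigma2**2 - 2.0 * rho * sigma1 * sigma2
    return abs(a1 - a2) / math.sqrt(variance)

# Hypothetical a_mu contributions (arbitrary units) from two experiments
# in the same energy interval.
example_pull = pull(10.0, 3.0, 5.0, 4.0)
print(example_pull)
```

With uncorrelated uncertainties of 3 and 4 units, the uncertainty of the difference is 5 units, so a difference of 5 units corresponds to a pull of one.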

Fig. 3. Comparison between \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) cross-section measurements from BABAR [34, 35] (top panel), CMD-3 [36, 37] (bottom), and the HVPTools combination of all available data in the \(\sqrt{s}\) range covered by CMD-3. The error bars include statistical and systematic uncertainties added in quadrature.

Fig. 4. Top: relative local weight per measurement contributing to the \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) cross-section combination versus centre-of-mass energy. Bottom: local uncertainty scale factor versus centre-of-mass energy applied to the combined \(\pi ^+\pi ^-\) cross-section uncertainty to account for inconsistencies among the measurements.

Fig. 5. Significance of the difference between pairs of the three most precise \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) experiments for narrow energy intervals of \(50\mathrm {\;Me\hspace{-1.00006pt}V} \) or less (top) and larger energy intervals (bottom) indicated by the horizontal lines.

The interesting possibility of resolving the tensions between different data sets using basic theoretical constraints on the pion form factor from analyticity and unitarity has been investigated [57]. However, the theory-constrained fits are loose enough to accommodate even the extreme cases of KLOE and CMD-3 [58].

3 BABAR study of additional photon radiation

The BABAR collaboration performed unique measurements of additional photon radiation in the initial state radiation (ISR) processes \(e^+e^-\rightarrow \mu ^+\mu ^-\gamma \) and \(e^+e^-\rightarrow \pi ^+\pi ^-\gamma \). Hard NLO radiation with one additional photon was studied in Refs. [34, 35]. A new analysis [46] based on the full available data set extended that study and included for the first time the measurement of hard NNLO processes with two additional photons from either initial or final state radiation (FSR). The paper also includes comparisons with predictions from the NLO Phokhara and the partial NNLO AfkQed [59,60,61,62] Monte Carlo generators.

In the following we use the notation LO, NLO, NNLO to specify the true QED order defined with respect to the lowest-order ISR process, while the same symbols taken within quotes, ‘LO’, ‘NLO’, ‘NNLO’, refer to reconstructed topologies with various photon multiplicities. We summarise here the main findings of the BABAR study [46]:

  • ‘NNLO’ contributions with additional photon energies above 200\(\mathrm {\;Me\hspace{-1.00006pt}V}\) (100\(\mathrm {\;Me\hspace{-1.00006pt}V}\)) for the most (least) energetic one (representing (1.9–3.8)% of the beam energy in the centre-of-mass frame) are observed in \((3.47\pm 0.38)\)% and \((3.36\pm 0.39)\)% of the dimuon and dipion events, respectively. These events are dominated by small-angle additional ISR photons.

  • The ‘NLO’ event fractions, with one additional detected or kinematically reconstructed photon above 200\(\mathrm {\;Me\hspace{-1.00006pt}V}\), predicted by the Phokhara generator exceed the BABAR data, particularly for additional ISR photons at small angle. Over the full measured phase space, including additional ISR and FSR photons, Phokhara over-predicts the hard ‘NLO’ contribution by a factor of \(1.25\pm 0.05\).

  • The BABAR cross-section measurements [34, 35] are found to be insensitive to the missing NNLO contributions in (and hard ‘NLO’ excess of) the Phokhara generator.

  • The AfkQed generator approximates real and virtual higher-order (HO) corrections by resumming the leading logarithms (footnote 4). It provides a reasonable description of the rates and energy distributions of the measured ‘NLO’ and ‘NNLO’ topologies.

4 Cancellation between soft/virtual and hard photon corrections

4.1 Expected behaviour of NLO events and experimental procedures

To assess the effect of higher-order radiative corrections one needs to evaluate the sensitivity of a given measurement to the presence of additional photon radiation, particularly ISR. An analysis that rejects part of the additional ISR requires a compensating correction from an NLO Monte Carlo generator to be consistent, at that order, with the corresponding ISR luminosity computed with the same generator. The Phokhara event generator version 9.1 [64] incorporates all contributions from NLO QED and thus provides a complete prediction of the ISR process \(e^+e^-\rightarrow \mu ^+\mu ^-\gamma (\gamma )\). It includes the lowest-order (LO) ISR and FSR processes and NLO contributions from real photon emission by the \(e^\pm \) beams and the outgoing muons, as well as soft photon emission and virtual corrections. The sum of the soft and virtual terms is infrared finite and the transition energy between soft and hard emission is chosen within a safe range (5\(\mathrm {\;Me\hspace{-1.00006pt}V}\) for BABAR simulations) so that both contributions are under control. From an experimental point of view, both LO and soft plus virtual NLO lead to event configurations that are reconstructed in the ‘LO’ topology and kinematics, whereas sufficiently hard NLO radiation necessitates a different kinematic treatment. The lowest energy for NLO photon contributions is experiment dependent. In BABAR a value of 50\(\mathrm {\;Me\hspace{-1.00006pt}V}\), the energy threshold for a detected photon included in kinematic fits, is representative, although a higher threshold (200\(\mathrm {\;Me\hspace{-1.00006pt}V}\)) is applied to detected or kinematically reconstructed photons to separate the ‘NLO’ from the ‘LO’ topologies for the final results.

The effects of HO radiative corrections are evaluated using samples of ISR muon-pair events generated with Phokhara in the BABAR conditions: ISR (or FSR) photon at large polar angle (\(20^\circ \)–\(160^\circ \)) in the \(e^+e^-\) centre-of-mass (CM) system; two-charged-particle mass from threshold to 1.4\(\mathrm {\;Ge\hspace{-1.00006pt}V}\); \(\sqrt{s}=10.58\) \(\mathrm {\;Ge\hspace{-1.00006pt}V}\) CM energy. Soft and virtual corrections are studied with the use of samples generated at LO with either ISR only or with ISR and FSR, and samples generated at NLO with either ISR only or the full NLO configuration with ISR, FSR, and their interference. The fraction of hard photon radiation turns out to be rather large because NLO ISR is enhanced by a factor \(\ln (s/m_e^2)\). It strongly depends on the energy threshold of the additional photon: a fraction of 60% for \(E^*_\gamma \) above 5\(\mathrm {\;Me\hspace{-1.00006pt}V}\) in the centre-of-mass decreases to 38% above 50\(\mathrm {\;Me\hspace{-1.00006pt}V}\) and to 25% above 200\(\mathrm {\;Me\hspace{-1.00006pt}V}\). All contributions are dominated by NLO ISR at small angle with respect to the beam axis. For example, with a 50\(\mathrm {\;Me\hspace{-1.00006pt}V}\) photon energy threshold the NLO ISR fraction at small angle outside the BABAR acceptance is 27%, NLO ISR at large angle 8%, and NLO FSR 3%. These values illustrate the importance of a thorough understanding and robust correction of effects from HO radiative corrections. The situation is very similar for the \(e^+e^-\rightarrow \pi ^+\pi ^-\gamma (\gamma )\) ISR process (footnote 5).
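The quoted fractions can be collected in a small bookkeeping sketch, which checks that the three components given for the 50 MeV threshold add up to the total hard fraction at that threshold. The variable names are illustrative; all numbers are those quoted in the text.

```python
# Hard NLO fraction (in %) versus the CM energy threshold (in MeV) of the additional photon.
hard_fraction_pct = {5: 60, 50: 38, 200: 25}

# Breakdown (in %) quoted for the 50 MeV threshold.
components_50mev_pct = {
    "NLO ISR, small angle (outside acceptance)": 27,
    "NLO ISR, large angle": 8,
    "NLO FSR": 3,
}

total_50mev_pct = sum(components_50mev_pct.values())
small_angle_share = components_50mev_pct["NLO ISR, small angle (outside acceptance)"] / total_50mev_pct
print(total_50mev_pct, round(small_angle_share, 2))
```

Small-angle ISR alone thus makes up about 70% of the hard radiation above 50 MeV.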

It is instructive to compare the Phokhara predictions at different orders. For the BABAR conditions the full NLO (LO) cross section for \(e^+e^-\rightarrow \mu ^+\mu ^-\gamma (\gamma )\) amounts to 17.16 pb (17.45 pb), a reduction of 1.7% at NLO. Since the NLO cross-section contribution with an additional photon above \(50\mathrm {\;Me\hspace{-1.00006pt}V} \) corresponds to \(38\%\times 17.16/17.45\simeq 37\)% of the LO cross section, it is almost compensated by a reduction of 39% due to the soft and virtual contribution. This large cancellation between hard and soft/virtual effects is well known in QED [66, 67]. It requires a careful assessment of the measured and theoretically corrected cross-section fractions.
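The arithmetic behind this cancellation can be reproduced as follows. This is a minimal sketch using only the numbers quoted above; the variable names are illustrative, and the soft/virtual term is inferred as the difference between the net NLO change and the hard contribution.

```python
sigma_lo_pb = 17.45   # LO cross section in the BABAR conditions
sigma_nlo_pb = 17.16  # full NLO cross section
hard_fraction = 0.38  # fraction of NLO events with an additional photon above 50 MeV

# Net relative change from LO to NLO: about -1.7%.
net_change = (sigma_nlo_pb - sigma_lo_pb) / sigma_lo_pb

# Hard contribution expressed relative to the LO cross section: about +37%.
hard_contribution = hard_fraction * sigma_nlo_pb / sigma_lo_pb

# Soft plus virtual contribution needed to obtain the net change: about -39%.
soft_virtual_contribution = net_change - hard_contribution

print(f"{100 * net_change:.1f}% = {100 * hard_contribution:.0f}% "
      f"+ ({100 * soft_virtual_contribution:.0f}%)")
```

Two terms of nearly 40% each therefore combine into a net effect below 2%, which is why a mis-modelled hard fraction is not automatically harmless.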

Fig. 6. Generic Feynman diagrams for the ISR \(e^+e^-\rightarrow \mu \mu \gamma (\gamma )\) process at LO, NLO and NNLO. At each specified QED order indicated on the left, generic diagrams, ignoring specific topologies with different particle permutations, are drawn for the virtual (loop) and real photon emission processes. For each order, the latter processes are given on the left-hand side, while the interference contributions are specified on the right together with their experimental topology labelled with quotes. \(N_\mathrm {\gamma ,add}\) refers to the number of real photons emitted, beyond the main ISR photon: \(N_\mathrm {\gamma , add} =0\) for the ‘LO’ topology, \(N_\mathrm {\gamma , add} =1\) for the ‘NLO’ topology. In the case of NNLO, the two interference contributions labelled (1) and (2) lead to ‘LO’ and ‘NLO’ topologies, respectively.

4.2 Going from NLO to NNLO processes

At present there exists no complete NNLO calculation of the \(e^+e^-\rightarrow \mu ^+\mu ^-\gamma (\gamma )(\gamma )\) process. A behaviour similar to NLO is expected, i.e., an overall small effect on the cross section, possibly at the level of a few per mil, and significantly larger contributions from hard radiation, which may affect the fiducial acceptance of the analyses.

The investigation of hard and soft/virtual radiative corrections at NNLO is more intricate than at NLO. The situation is illustrated in Fig. 6, which shows the relevant generic Feynman diagrams. For each order in QED, positive contributions with one to three real photons are separated from contributions from interfering amplitudes involving soft/virtual photons. The first two rows correspond to the diagrams considered in the NLO generator Phokhara. They illustrate the large cancellation occurring at this level as the result of the interference term within the ‘LO’ topology.

At NNLO, the cancellation occurs between the positive three real photon emission contribution and the generic interference contributions leading to an ‘LO’ topology, for the processes labelled (1), or to an ‘NLO’ topology for those in part (2). The interpretation of the results from the radiative BABAR study [46] depends on the relative importance of these two components. Two extreme scenarios may be considered:

  • Scenario 1: the processes labelled (1) dominate the NNLO interference term. Since they fall into the ‘LO’ topology, the ‘NLO’ contribution is unaffected by NNLO and the large excess of events predicted by Phokhara compared to the data for the ‘NLO’ topology would need to be interpreted as a generator issue at NLO.

  • Scenario 2: the processes labelled (2) are the dominant NNLO interference contribution. Being negative, it will affect the ‘NLO’ photon energy distribution in a way uncorrected by the NLO generator. In this situation, the observed deficit in data would arise from NNLO virtual contributions and Phokhara is safe.

The true situation likely lies between these two extreme scenarios. Only complete NNLO calculations, at fixed order or in an event generator, will help resolve this ambiguity, and they should be a high priority for the field. Since the interference contribution listed in the second row of part (1) is obviously positive, it will tend to reduce the negative contribution originating from the first row, perhaps to an overall level smaller than part (2). Also, the interpretation of the BABAR results appears more natural in the second scenario as the NNLO contributions, real and virtual, would explain all the observed features without having to question the validity of the Phokhara generator at NLO.

5 Impact of higher-order radiative effects

The observation that Phokhara does not correctly predict the ‘NLO’ contribution raises potential issues for ISR experiments measuring only part of the cross section because of event selection criteria. The fiducial acceptance of an analysis is evaluated with a Monte Carlo generator interfaced with a simulation of the detector response. KLOE, BESIII and CLEOc [68] rely on Phokhara to estimate the unselected ‘NLO’ part. Hard NNLO contributions are ignored. As explained in Sect. 4, a mis-evaluation of hard NLO and NNLO contributions is not compensated by soft/virtual contributions at the same order since the latter are included in the selection of lower-order-like events. This imbalance will generate a bias in the cross section measurement.

5.1 Procedure

It is not possible to accurately compute the bias without full knowledge of the respective analyses and associated detector performance. The purpose of the following study is limited to estimating the possible extent of the bias by reproducing the kinematic conditions of the published analyses with a simplified generic detector. The study is further restricted to two configurations: ‘KLOE08’ with small-angle undetected ISR [30] and ‘BESIII’ with large-angle measured ISR [51], the quotes indicating their generic nature. In both cases, Monte Carlo samples are generated with Phokhara in the kinematic conditions of the experiments with a fast simulation of the tracking and calorimeter performance. The 4-vectors from the event generator are converted into pseudo “reconstructed” data using the acceptances and resolutions of the detector, as found in the papers published by the experiment. Analysis steps are then applied to these pseudo data to reproduce the overall acceptance and efficiency, taking into account the analysis cuts used by the experiment, e.g., the fiducial acceptance in the (\(M_\textrm{trk}\), \(M_{\pi \pi }\)) plane for KLOE or the \(\chi ^2_\textrm{LO}\) selection for BESIII.

Moreover, two assumptions are made: first, the hard NNLO fraction is taken from the BABAR measurements and assumed to hold independently of the experiment’s CM energy. Secondly, real hard NNLO and soft/virtual NNLO radiative corrections are assumed to cancel in the cross section. In the absence of a complete NNLO calculation, the effect of the observed hard NNLO contribution on the ‘NLO’ spectrum is not known. This ambiguity is related to the relative importance of parts (1) and (2) in Fig. 6, which will be approached by considering the extreme scenarios introduced above.

While scenario 2 can be readily transposed to any ISR experiment by estimating the effect of missing NNLO corrections, evaluating the impact of scenario 1 is more delicate without knowing the origin of the issue in the Phokhara generator. It is worth mentioning in this context that all tests documented in the Phokhara publications to evaluate the impact of NLO versus LO corrections relate to the integrated cross section as a function of the two-pion mass [69]. Although Phokhara was used by the experiments to evaluate the fiducial acceptance and efficiency of energy and angular selection requirements on additional ISR photons, the modelling accuracy was to our knowledge never tested. Experiments exploiting ISR measure the \(M_{\pi \pi }\) spectrum of the selected \(\pi ^+\pi ^-\gamma \) sample and correct it for acceptance and selection efficiencies to determine the Born-level \(e^+e^-\rightarrow \pi ^+\pi ^-\) cross section

$$\begin{aligned} \sigma _{\pi \pi } = \frac{dN_{\pi \pi \gamma }}{dM_{\pi \pi }}\cdot \frac{s}{2M_{\pi \pi }H(M_{\pi \pi })\varepsilon _\textrm{acc}\varepsilon _\textrm{sel}L_{ee}}\,, \end{aligned}$$
(1)

where s is the CM energy squared, \(H(M_{\pi \pi })\) the ISR radiation function (radiator), and \(L_{ee}\) the \(e^+e^-\) luminosity. The acceptance \(\varepsilon _\textrm{acc}\), selection efficiency \(\varepsilon _\textrm{sel}\), and \(H(M_{\pi \pi })\) are evaluated with Phokhara. Although we have studied NNLO effects for all three variables, results will only be reported for the selection efficiency which is affected most.
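Equation (1) shows that the extracted cross section scales inversely with the selection efficiency, so that a relative overestimate \(\delta \) of \(\varepsilon _\textrm{sel}\) in the simulation biases \(\sigma _{\pi \pi }\) by \(-\delta /(1+\delta )\). The sketch below makes this explicit; the function name `born_cross_section` and all numerical inputs are hypothetical placeholders in arbitrary units, chosen only to exercise the formula.

```python
def born_cross_section(dn_dm, m_pipi, s, radiator, eps_acc, eps_sel, lumi_ee):
    """Direct transcription of Eq. (1)."""
    return dn_dm * s / (2.0 * m_pipi * radiator * eps_acc * eps_sel * lumi_ee)

# Hypothetical inputs (arbitrary units), for illustration only.
inputs = dict(dn_dm=1.0e5, m_pipi=0.77, s=1.02**2, radiator=0.05,
              eps_acc=0.3, lumi_ee=2.4e5)

sigma_reference = born_cross_section(eps_sel=0.90, **inputs)
# Same data, but with a selection efficiency overestimated by 2% in the simulation.
sigma_biased = born_cross_section(eps_sel=0.90 * 1.02, **inputs)

relative_change = sigma_biased / sigma_reference - 1.0
print(f"relative cross-section change: {100 * relative_change:.2f}%")
```

The relative change is independent of the other inputs, which cancel in the ratio.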

5.2 Generic ‘KLOE08’ configuration

In the experimental configuration with \(\sqrt{s}=1.02\) \(\mathrm {\;Ge\hspace{-1.00006pt}V}\), only the two charged particles are detected in a polar angle range between \(50^\circ \) and \(130^\circ \). Their three-momenta are measured accurately, while the energy and polar angle of the putative ISR photon are calculated assuming LO kinematics for the ISR process. The ISR photon is required to be emitted in a dead cone of 15\(^\circ \) around the beams. The common track mass \(M_\textrm{trk}\) of the two charged particles, computed under the LO assumption, allows dimuon and dipion processes to be separated. The selection of \(\pi ^+ \pi ^- \gamma \) events is defined in the (\(M_\textrm{trk}\), \(M_{\pi \pi }\)) plane in a region avoiding background from \(\phi \rightarrow \pi ^+\pi ^-\pi ^0\) and the muon band [30]. Fast simulation follows the KLOE performance for charged particle reconstruction [70]. Since the selection is very sensitive to the NLO radiative tail, only events satisfying the acceptance cuts are considered here.

Fig. 7. Distributions of simulated \(e^+e^-\rightarrow \pi ^+\pi ^-\gamma (\gamma )\) events within the fiducial acceptance for the ‘KLOE08’ configuration in the (\(M_\textrm{trk}\), \(M_{\pi \pi }\)) plane. Top: events with an additional NLO photon with energy larger than 5\(\mathrm {\;Me\hspace{-1.00006pt}V}\) emitted in the same hemisphere as the ISR photon (left) and in the opposite hemisphere (right). Bottom: the corresponding distributions after applying selection requirements.

Fig. 8. Distributions of \(M_\textrm{trk}\) for simulated \(e^+e^-\rightarrow \pi ^+\pi ^-\gamma (\gamma )\) events within the fiducial acceptance for the ‘KLOE08’ configuration. Top: events with an additional NLO photon with energy larger than 10\(\mathrm {\;Me\hspace{-1.00006pt}V}\) emitted in the same hemisphere as the ISR photon (left) and in the opposite hemisphere (right). Bottom: the corresponding distributions after applying selection requirements.

Despite the LO-like selection, half of the NLO events are automatically kept in the selected sample when the additional ISR photon is emitted in the same hemisphere as the primary ISR photon, thereby reducing the dependence of the selection efficiency on the Phokhara generator. This situation occurs since all ISR emissions are sharply peaked in the beam direction, resulting in a small invariant mass of the diphoton system consistent with the zero-mass assumption in the \(M_\textrm{trk}\) calculation. However, when the additional photon is emitted along the opposite beam direction, the diphoton mass can be relatively large, introducing a long tail in the \(M_\textrm{trk}\) distribution. In that case, the selection efficiency depends on the validity of the photon distribution predicted by Phokhara. The simulated event distributions within the fiducial acceptance in the (\(M_\textrm{trk}\), \(M_{\pi \pi }\)) plane for the same-side and opposite-side samples are shown for all events (top) and the selected events (bottom) in Fig. 7. Figure 8 shows the corresponding \(M_\textrm{trk}\) distributions integrated over \(0.6<M_{\pi \pi }<0.95\mathrm {\;Ge\hspace{-1.00006pt}V} \), for events with additional photon energies larger than 10\(\mathrm {\;Me\hspace{-1.00006pt}V}\). Here the opposite-side configuration leads to \(M_\textrm{trk}\) values above the pion peak, which are selected with an average efficiency \(\varepsilon _\textrm{sel}^\textrm{oppo}\) of only 25%.
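The sensitivity of \(M_\textrm{trk}\) to the diphoton mass can be illustrated with a toy calculation. The sketch assumes the usual KLOE-style definition, in which \(M_\textrm{trk}\) solves \(\sqrt{s}-\sqrt{|\vec p_1|^2+M_\textrm{trk}^2}-\sqrt{|\vec p_2|^2+M_\textrm{trk}^2}=|\vec p_1+\vec p_2|\), i.e. energy-momentum conservation with a single massless photon; this equation is not spelled out in the text above and is an assumption here. The event configuration (photons exactly along the beam axis, pions symmetric about it) and the helper names `track_mass` and `toy_pion_momenta` are likewise hypothetical, with no detector resolution applied.

```python
import math

M_PI = 0.13957  # charged-pion mass in GeV

def track_mass(p1, p2, sqrt_s):
    """Solve sqrt_s - E1(M) - E2(M) = |p1 + p2| for the common mass M by bisection."""
    p1_sq = sum(c * c for c in p1)
    p2_sq = sum(c * c for c in p2)
    p_miss = math.sqrt(sum((a + b) ** 2 for a, b in zip(p1, p2)))

    def f(m):
        # Missing energy minus missing momentum under the one-massless-photon hypothesis.
        return sqrt_s - math.sqrt(p1_sq + m * m) - math.sqrt(p2_sq + m * m) - p_miss

    lo, hi = 0.0, 0.5 * sqrt_s
    if f(lo) < 0.0:
        raise ValueError("no non-negative track-mass solution")
    for _ in range(100):  # f decreases monotonically with m
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def toy_pion_momenta(sqrt_s, e_plus, e_minus):
    """Pion momenta (GeV) recoiling against photons along +z (e_plus) and -z (e_minus)."""
    e_pi = 0.5 * (sqrt_s - e_plus - e_minus)  # each pion carries half the remaining energy
    pz = -0.5 * (e_plus - e_minus)            # and half the recoil momentum
    px = math.sqrt(e_pi**2 - pz**2 - M_PI**2)
    return (px, 0.0, pz), (-px, 0.0, pz)

SQRT_S = 1.02  # GeV
# Two collinear same-side photons act like one photon of the summed energy.
m_same = track_mass(*toy_pion_momenta(SQRT_S, 0.19 + 0.01, 0.0), SQRT_S)
# A 10 MeV photon along the opposite beam direction gives the diphoton system a mass.
m_oppo = track_mass(*toy_pion_momenta(SQRT_S, 0.20, 0.01), SQRT_S)
print(f"same side: {1e3 * m_same:.1f} MeV, opposite side: {1e3 * m_oppo:.1f} MeV")
```

In this idealised configuration the same-side event reconstructs exactly at the pion mass, whereas an opposite-side photon of only 10 MeV already shifts \(M_\textrm{trk}\) upwards by more than 25 MeV, in line with the tail above the pion peak described in the text.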

To estimate the effect of missing hard NNLO radiation in Phokhara, the fraction of \((3.5\pm 0.4)\)% observed by BABAR is assumed, as the relative thresholds for the additional photon, 100\(\mathrm {\;Me\hspace{-1.00006pt}V}\) in BABAR versus 10\(\mathrm {\;Me\hspace{-1.00006pt}V}\) in KLOE, scaled by the beam energies, are comparable (footnote 6). Taking further the NNLO contribution as a perturbation of the much larger hard NLO component, the selection efficiency is assumed to be unaffected. Following the previous discussion, three out of four configurations feature the emission of at least one of the two additional photons opposite to the ISR photon and thus contribute to the radiative tail of the \(M_\textrm{trk}\) distribution. Averaging over the KLOE08 mass range (dominated by the larger statistics at high mass), the resulting cross section change from the reduced selection efficiency amounts to roughly \(-3.5\cdot 3/4 (1 - \varepsilon _\textrm{sel}^\textrm{oppo})\% = -2.0\)%. The \(M_{\pi \pi }\) dependence is small across the \(\rho \) mass region, with a value of \(-2.3\)% at the peak.

In scenario 1 the NLO excess in Phokhara is assumed to be a generator issue. Were the NLO fractional excess at the same level as that observed by BABAR, the resulting effect on the selection efficiency would partially cancel the bias from missing NNLO radiation, with a residual effect of order \(-1\%\) at the \(\rho \) peak.

In scenario 2, the use of Phokhara is safe and the only bias originates from missing NNLO corrections. Hard NNLO radiation contributes as in scenario 1, but its effect is reduced by the negative interference contributions with an ‘NLO’ topology (cf. part (2) in Fig. 6), of which only one half with opposite-side radiation contributes to the \(M_\textrm{trk}\) radiative tail. The resulting cross section change is estimated to be \(-3.5\cdot (3/4-1/2) (1 - \varepsilon _\textrm{sel}^\textrm{oppo})\% = -0.7\)% for the average over \(M_{\pi \pi }\), and \(-0.8\)% at the \(\rho \) peak.

Both scenarios lead to cross section changes that exceed the 0.5% uncertainty assigned by KLOE08 [30] to radiative corrections.
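The mass-averaged estimates for the ‘KLOE08’ configuration can be reproduced with the short sketch below, which also checks that the BABAR and KLOE photon thresholds are comparable once scaled by the beam energies. The function names are illustrative; all numbers are taken from the text, and the mass-dependent values at the \(\rho \) peak are not reproduced by this simple average.

```python
def scaled_threshold_pct(e_gamma_gev, sqrt_s_gev):
    """Photon energy threshold as a percentage of the beam energy (sqrt_s / 2)."""
    return 100.0 * e_gamma_gev / (0.5 * sqrt_s_gev)

def efficiency_bias_pct(hard_nnlo_pct, tail_fraction, eps_sel_tail):
    """Cross-section change (in %) from hard radiation falling into the M_trk tail."""
    return -hard_nnlo_pct * tail_fraction * (1.0 - eps_sel_tail)

babar_threshold = scaled_threshold_pct(0.100, 10.58)  # about 1.9% of the beam energy
kloe_threshold = scaled_threshold_pct(0.010, 1.02)    # about 2.0% of the beam energy

# Missing hard NNLO radiation: three out of four configurations reach the tail.
missing_hard_nnlo = efficiency_bias_pct(3.5, 3 / 4, 0.25)
# Scenario 2: negative 'NLO'-topology interference, half of it opposite side.
scenario_2 = efficiency_bias_pct(3.5, 3 / 4 - 1 / 2, 0.25)

kloe08_radiative_uncertainty_pct = 0.5
print(f"thresholds: {babar_threshold:.2f}% vs {kloe_threshold:.2f}%")
print(f"missing hard NNLO: {missing_hard_nnlo:.1f}%, scenario 2: {scenario_2:.1f}%")
```

Both values exceed the 0.5% radiative-correction uncertainty in magnitude; the scenario 1 residual of order \(-1\%\) additionally depends on the assumed NLO excess and is not computed here.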

5.3 Other KLOE measurements

The ‘KLOE10’ configuration with the ISR photon detected at large angle and the two pions in the same range as ‘KLOE08’ may be treated in a similar way. Because additional ISR photons predominantly emitted along the beams are well separated from the detected ISR photon, one expects both same and opposite sides to contribute to the \(M_\textrm{trk}\) radiative tail. The cross-section change in scenario 1 is therefore expected to be larger than for the ‘KLOE08’ configuration. In scenario 2, however, since the NNLO positive real photon and negative virtual/soft interference contributions approximately cancel, and lead to photon topologies in the rejected radiative tail, there is no bias for KLOE10.

In the KLOE12 measurement [32] the cross section was directly obtained from the ratio of the \(\pi ^+\pi ^-\gamma \) to \(\mu ^+\mu ^-\gamma \) mass spectra, protecting the result against modelling biases. However, in practice, the protection is incomplete as pion and muon selection requirements differ in the (\(M_\textrm{trk}\), \(M_{\pi \pi /\mu \mu }\)) plane. While part of the pion radiative tail is retained, it is almost entirely removed in the selected muon sample by a tight \(80<M_\textrm{trk}<115\) \(\mathrm {\;Me\hspace{-1.00006pt}V}\) requirement applied to reduce the pion background. This asymmetry in the selection of the pion and muon samples reintroduces a modelling dependence.

Fig. 9. Distributions of simulated \(e^+e^-\rightarrow \pi ^+\pi ^-\gamma (\gamma )\) events in the ‘BESIII’ configuration. Left: \(\chi ^2_\textrm{LO}\) of the \(\pi ^+\pi ^-\gamma \) kinematic fit as a function of the true energy of the NLO additional photon. The dashed horizontal line indicates the \(\chi ^2_\textrm{LO}<60\) selection requirement. Right: the energy spectrum of the additional NLO photon with (blue line) and without (dots) applying the compatibility requirement.

KLOE12 features a comparison of the measured muon ISR cross section with the QED NLO prediction by Phokhara. The results show agreement within the quoted systematic uncertainty of 1%, which is however insufficient to validate the 0.5% uncertainty assigned to radiative corrections in the two-pion cross-section measurement. A newer dimuon study based on a much larger data set does not improve the precision [71]. We may proceed as in the pion case to estimate the effect of missing NNLO corrections in Phokhara. The cross section change under scenario 1 is found to be of order \(-2.6\)%, while the increase from a potential Phokhara NLO excess cannot be estimated. As for the pions, an NLO excess like the one observed in BABAR would essentially cancel that cross-section change. In scenario 2, the muon cross-section change is found to be reduced, as for the pions, by a factor of three to \(-0.9\)%. Such a bias would amount to nearly twice the quoted radiative correction uncertainty, but would not be detectable given the 1% systematic uncertainty of the test.

5.4 Generic ‘BESIII’ configuration

BESIII reported ISR-based \(e^+e^-\rightarrow \pi ^+\pi ^-\gamma \) cross-section results [51] using data taken at \(\sqrt{s}=3.773\) \(\mathrm {\;Ge\hspace{-1.00006pt}V}\), a factor of three below (above) the BABAR (KLOE) CM energy. The analysis requires detection of the two pions and a large-angle ISR photon, while additional photons are ignored. A kinematic fit using the \(\pi ^+\pi ^-\gamma \) hypothesis selects LO and NLO soft/virtual events with the requirement \(\chi ^2_\textrm{LO}<60\). A fast simulation of the ‘BESIII’ configuration and detector performance [72], using the Phokhara generator and the same assumptions as in the ‘KLOE08’ study, allows the effects of additional photon radiation to be investigated.

Figure 9 (left) shows the distribution of \(\chi ^2_\textrm{LO}\) as a function of the true NLO additional photon energy \(E_{\gamma ,\textrm{add}}\), exhibiting a strong correlation. Events with \(E_{\gamma ,\textrm{add}}\) above about \(50\mathrm {\;Me\hspace{-1.00006pt}V} \) are subject to rejection. As this maximum accepted \(E_{\gamma ,\textrm{add}}\) is consistent with the BABAR threshold of 100–\(200\mathrm {\;Me\hspace{-1.00006pt}V} \) for NLO/NNLO photons when normalized to the respective beam energies, one expects very low selection efficiencies of NLO/NNLO events in BESIII. The distributions of \(E_{\gamma ,\textrm{add}}\) for all radiative events and after the \(\chi ^2_\textrm{LO}<60\) selection are shown in the right panel of Fig. 9. Of the events with \(E_{\gamma ,\textrm{add}}>50\mathrm {\;Me\hspace{-1.00006pt}V} \), 92% are rejected.
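The beam-energy normalisation invoked above can be checked with simple arithmetic. The following sketch assumes a BABAR CM energy of 10.58 GeV (the nominal \(\Upsilon (4S)\) value, which is not quoted in this section) and rescales the approximate BESIII cutoff of 50 MeV linearly with the CM energy; the function name is purely illustrative.

```python
# Rescale an additional-photon energy threshold between two experiments
# in proportion to their CM (equivalently, beam) energies.

SQRT_S_BESIII_GEV = 3.773  # quoted in the text
SQRT_S_BABAR_GEV = 10.58   # assumption: nominal Upsilon(4S) CM energy, not quoted here


def scale_threshold(e_mev, sqrt_s_from, sqrt_s_to):
    """Scale a photon energy threshold linearly with the CM energy."""
    return e_mev * sqrt_s_to / sqrt_s_from


babar_equivalent_mev = scale_threshold(50.0, SQRT_S_BESIII_GEV, SQRT_S_BABAR_GEV)
print(f"50 MeV at BESIII corresponds to about {babar_equivalent_mev:.0f} MeV at BABAR")
```

Under this assumption the rescaled value falls inside the 100–200 MeV range quoted for BABAR, in line with the consistency statement in the text.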

The fractional cross-section change due to missing NNLO in Phokhara in scenario 1 amounts to approximately \(-3.5\cdot 0.92\)% = \(-3.2\)%, again significantly exceeding the assigned systematic uncertainty of 0.5%. As in the case of KLOE, this large effect might be partially cancelled by an NLO excess in Phokhara under scenario 1 that we are unable to propagate to CM energies lower than BABAR.
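As a minimal illustration of the scenario-1 estimate, the sketch below multiplies the \(-3.5\)% input by the 92% rejection fraction and compares the result with the assigned 0.5% uncertainty. All three numbers are taken from the text; the variable names are ours.

```python
# Scenario-1 estimate for the 'BESIII' configuration (all values in percent).
NNLO_SHIFT_PERCENT = -3.5     # scenario-1 input quoted in the text
REJECTED_FRACTION = 0.92      # rejected fraction of events with E_add > 50 MeV
ASSIGNED_SYST_PERCENT = 0.5   # quoted radiative-correction uncertainty

shift_percent = NNLO_SHIFT_PERCENT * REJECTED_FRACTION
ratio_to_syst = abs(shift_percent) / ASSIGNED_SYST_PERCENT

print(f"cross-section change: {shift_percent:.1f}%")
print(f"ratio to assigned uncertainty: {ratio_to_syst:.1f}")
```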

Similarly to KLOE10, the tight ‘LO’ selected topology preserves BESIII from any bias under scenario 2 as the NNLO positive and negative contributions approximately cancel in the rejected radiative tail.

5.5 Additional remarks

The quantitative effects of higher-order radiative corrections on the KLOE and BESIII two-pion cross-section results estimated here cannot be taken at face value. Rather, they indicate the potential size of systematic effects encountered from the use of Phokhara in view of the findings reported by BABAR [46]. According to our study, effects from neglected NNLO contributions may suggest upward cross-section corrections that exceed the quoted systematic uncertainties, potentially reducing the difference seen with BABAR. The concomitant effect of the hard NLO excess in Phokhara [46] is more speculative and may depend on CM energy. Investigations by the generator authors should shed light on this issue [73]. Any definitive assessment needs to be carried out by the KLOE and BESIII collaborations with the full machinery of their analyses.

In this context we also performed a test comparing dimuon samples generated with Phokhara and KKMC [63] in the ‘KLOE08’ configuration. Differences in the energy distributions of additional photons at ‘NLO’ level lead to different acceptance predictions between the two generators. By construction, KKMC produces higher photon multiplicities, but it predicts the fraction of events with three or more photons above 10 MeV at KLOE energies to be 1.3%, which is lower than the corresponding ‘NNLO’ rate found by BABAR. Of course, the two generators operate in different ways. Whereas Phokhara is designed for ISR with an NLO matrix element, KKMC works from the Born level up with multiple ISR photon emission approximating higher orders. It is beyond the scope of this paper to conduct a detailed evaluation of these generators, but we note their different predictions. Contrary to our above estimates for Phokhara, KKMC would predict a downward shift of the measured cross sections, albeit again larger than the quoted systematic uncertainty assigned to radiative corrections.

Fig. 10

The minimum angle between the additional large-angle (LA) photon and the two pions within the detector acceptance for simulated \(e^+e^-\rightarrow \pi ^+\pi ^-\gamma (\gamma )\) events in the BABAR, BESIII, and KLOE10 conditions (left to right panels). The separation between FSR and LA ISR events is pronounced at high CM energy (BABAR), still visible at intermediate CM energy (BESIII), and vanishes at low CM energy (KLOE)

In a recent paper [74], Belle II confirms the BABAR finding of an ‘NLO’ excess by Phokhara with respect to their \(\pi ^+\pi ^-\pi ^0\) data. To account for this excess and the missing NNLO contributions, they assign a 1.2% systematic error for the generator. It is critical that future (re-)analyses of ISR-based cross-section measurements perform data-driven tests of the kinematic properties of additional photons as Belle II has done. Such tests make it possible to investigate the sensitivity to mismodelling and higher-order radiative effects, and help design robust selection criteria. The loose selection used by BABAR could be implemented rather straightforwardly in the BESIII analysis since the setup follows the same topology with a large-angle ISR photon, and the detector allows the measurement of large-angle additional photons. The situation is more complicated for KLOE as the selection method, at least in the small-angle ISR topology used in KLOE08 and KLOE12, lacks kinematic constraints, preventing the reconstruction of additional small-angle photons. Such an approach would be possible in the KLOE10 topology, but would require independent charged particle identification. Another difficulty for KLOE arises from the low centre-of-mass energy and the proximity of the \(\rho \) resonance leading to low ISR photon energies that are not as well separated from additional photons as in the case of BESIII and BABAR. This also presents an obstacle to the experimental separation of additional large-angle ISR and FSR photons as was done by BABAR and would be possible with BESIII, as seen from Fig. 10.

6 Reappraisal of \(\tau \) spectral functions

Spectral functions derived from measurements of mass spectra in hadronic \(\tau \) decays provide a complementary input, under isospin symmetry and accounting for isospin-breaking corrections, to compute HVP integrals [75]. In the late 1990s, thanks to LEP experiments (particularly ALEPH), \(\tau \) spectral functions in the two-pion channel were more precise than the available \(e^+e^-\) cross sections. In the following decade, both \(\tau \) and \(e^+e^-\) data were therefore used by our group [49, 76,77,78,79,80,81].

The \(\tau \) spectral function \(v_{\pi ^-\pi ^0}(s)\) in the \(\pi ^-\pi ^0\) channel is defined by

$$\begin{aligned} v_{\pi ^-\pi ^0}(s)= & {} \frac{m_\tau ^2}{6\,|V_{ud}|^2}\, \frac{B_{\pi ^-\pi ^0}}{B_{e}}\, \frac{1}{N_{\pi ^-\pi ^0}}\frac{d N_{\pi ^-\pi ^0}}{ds} \nonumber \\{} & {} \times \, \left( 1-\frac{s}{m_\tau ^2}\right) ^{\!\!-2}\! \left( 1+\frac{2s}{m_\tau ^2}\right) ^{\!\!-1} \frac{R_\textrm{IB}(s)}{S_\textrm{EW}} , \end{aligned}$$
(2)

with

$$\begin{aligned} R_\textrm{IB}(s)=\frac{\textrm{FSR}(s)}{G_\textrm{EM}(s)} \frac{\beta ^3_0(s)}{\beta ^3_-(s)} \left| \frac{F_0(s)}{F_-(s)}\right| ^2\,, \end{aligned}$$
(3)

and where \((1/N_{\pi ^-\pi ^0})dN_{\pi ^-\pi ^0}/ds\) is the normalised invariant mass-squared (s) spectrum of the \(\pi ^-\pi ^0\) final state obtained from the combination of spectra from several experiments, \(B_{\pi ^-\pi ^0}\) (\(B_{e}\)) are the corresponding \(\tau \) branching fractions (final state photon radiation implied), and \(S_\textrm{EW}\) is an electroweak radiative correction. The s-dependent isospin-breaking (IB) corrections are included in \(R_\textrm{IB}(s)\). In Eq. (3), \(\beta _{0,-}\) denote the pion velocities in the two-pion CM system for the \(\pi ^+\pi ^-\) and \(\pi ^-\pi ^0\) final states, respectively. \(G_\textrm{EM}(s)\) is the radiative function, correcting from the \(\pi ^-\pi ^0(\gamma )\) to the \(\pi ^+\pi ^-\) final states, requiring the addition of the specific FSR contribution to the neutral case. Several model-dependent approaches exist for the small long-distance radiative correction \(G_\textrm{EM}(s)\). The pioneering work of Cirigliano-Ecker-Neufeld [82, 83] used Chiral Perturbation Theory (ChPT), while vector dominance was the basis of further work by Lopez Castro et al. [84, 85]. The two methods have been known to be in good agreement. More recently, other studies extended the order in ChPT while satisfying short-distance constraints [86]. Additional free parameters, however, deteriorate the precision of the prediction. In the longer term, lattice QCD based estimates are expected to become available [87] and will provide an important cross check. The form factor ratio \({F_0}/{F_-}\) takes into account the different masses and widths of the charged and neutral \(\rho \) mesons and the \(\rho \) – \(\omega \) interference only present in the neutral final state.

The idea of significant \(\rho \) – \(\gamma \) mixing, motivated by the well-founded Z – \(\gamma \) mixing in high-energy \(e^+e^-\) collisions, was put forward by Jegerlehner and Szafron [88] and introduced large IB corrections on top of what had been previously estimated. However, a justification for applying the same Z – \(\gamma \) formalism to the composite \(\rho \) meson was never given. The problem was exacerbated by the proposal by Jegerlehner to reverse the correction and apply it to \(e^+e^-\) rather than \(\tau \) data [89]. In a consistent dispersive approach to the pion form factor there is no room for \(\rho \) – \(\gamma \) mixing as differences between charged and neutral \(\rho \) line shapes are embedded in their respective resonance parameters (mass, width) [90]. The consideration of \(\rho \) – \(\gamma \) mixing is therefore dropped.

The use of \(\tau \) spectral functions was at some point discontinued owing to the improved \(e^+e^-\) cross-section data from KLOE and BABAR not requiring IB corrections. Given the discrepancies among the \(e^+e^-\) data sets and the progress on the understanding of IB corrections, we reconsider them here and present an update of the \(2\pi \) HVP contribution to the muon g – 2 from \(\tau \) decays. The combined \(\tau \) mass spectrum, after an update of the ALEPH data, is unchanged from Ref. [81]. A small change is introduced by updated IB corrections, essentially the \(\rho \) – \(\omega \) contribution. The parameters used in Eq. (2) are \(m_\tau =(1776.84\pm 0.17)\) \(\mathrm {\;Me\hspace{-1.00006pt}V}\), the CKM matrix element \(|V_{ud}|=0.97418\pm 0.00019\), and \(B_{e}=(17.818 \pm 0.032)\%\). Short-distance electroweak radiative effects [91,92,93,94], relevant for the \(\pi \pi \) decay, give \(S_\textrm{EW}=1.0235\pm 0.0003\) [78].
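To make the structure of Eq. (2) concrete, the following sketch evaluates the factor that multiplies the normalised spectrum \((1/N_{\pi ^-\pi ^0})dN_{\pi ^-\pi ^0}/ds\), using the central values of the parameters listed above. The branching fraction \(B_{\pi ^-\pi ^0}\) and the correction \(R_\textrm{IB}(s)\) are not given numerically in this section, so they are left as caller-supplied inputs; the function name and interface are illustrative assumptions, not the code used for the analysis.

```python
# Central values quoted in the text (uncertainties are ignored in this sketch).
M_TAU_GEV = 1.77684  # tau mass in GeV
V_UD = 0.97418       # CKM matrix element |V_ud|
B_E = 0.17818        # electronic branching fraction B_e
S_EW = 1.0235        # short-distance electroweak correction


def spectral_prefactor(s_gev2, b_pipi, r_ib=1.0):
    """Factor multiplying (1/N) dN/ds in Eq. (2), in GeV^2.

    s_gev2 : squared pi- pi0 invariant mass in GeV^2
    b_pipi : pi- pi0 branching fraction (hypothetical input, supplied by the caller)
    r_ib   : isospin-breaking factor R_IB(s); 1.0 means no correction applied
    """
    m2 = M_TAU_GEV ** 2
    if not 0.0 <= s_gev2 < m2:
        raise ValueError("s must lie in [0, m_tau^2)")
    x = s_gev2 / m2
    # Kinematic weight (1 - s/m^2)^-2 (1 + 2s/m^2)^-1 from Eq. (2).
    kinematic = (1.0 - x) ** -2 * (1.0 + 2.0 * x) ** -1
    return m2 / (6.0 * V_UD ** 2) * (b_pipi / B_E) * kinematic * r_ib / S_EW
```

The kinematic weight grows rapidly as \(s\) approaches \(m_\tau ^2\), which is why the high-mass end of the \(\tau \) spectrum is amplified in the spectral function.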

Most corrections to the \(\tau \)-based \(2\pi \) contribution to \(a_\mu \) are unchanged from our previous work [49, 81]. They amount to (all in \(10^{-10}\) units): \(-12.21\pm 0.15\) from \(S_\textrm{EW}\), \(-1.92\pm 0.90\) from \(G_\textrm{EM}\), \(+4.67\pm 0.47\) from FSR, \(-7.88\) from \(m_{\pi ^-}-m_{\pi ^0}\) in the cross section, \(+4.09\) from \(m_{\pi ^-}-m_{\pi ^0}\) in \(\Gamma _\rho \), \(+0.20^{+0.27}_{-0.19}\) from \(m_{\rho ^-}-m_{\rho ^0}\), \(-5.91\pm 0.59\) from \(\pi \pi \gamma \) and other electromagnetic \(\rho \) decays. The last four corrections are affected by a systematic uncertainty from the choice of the analytic model for the \(\rho \) lineshape, which we estimate from the difference between the Gounaris-Sakurai and Kühn-Santamaria resonance parameterisations and add linearly.

Due to its fast bipolar dependence on mass, the contribution of \(\rho \) – \(\omega \) interference to the dispersion integral is relatively small. It depends on the \(\omega \) mass, the mixing amplitude \(\varepsilon _{\rho \omega }\) and its phase \(\phi _{\rho \omega }\), all determined from fits to the pion form factor in \(e^+e^-\) data. The value for \(\phi _{\rho \omega }\) used in our previous analyses [49, 81] was unexpectedly large [95]. Here, we use updated results from a fit to the combined \(e^+e^-\) data before CMD-3 [57, 96] giving \(m_\omega =782.07\pm 0.15\) \(\mathrm {\;Me\hspace{-1.00006pt}V}\), \(\varepsilon _{\rho \omega }=(1.99\pm 0.03)\times 10^{-3}\), and \(\phi _{\rho \omega }=(3.8\pm 1.8)^\circ \). Including CMD-3 [58, 96] gives similar results, with the full difference added as systematic uncertainty. The resulting IB correction from \(\rho \) – \(\omega \) mixing is \(+(4.0\pm 0.4)\times 10^{-10}\).

Summing up all the effects, the total IB correction to the \(\tau \)-based \(2\pi \) contribution is estimated to be \(-(14.9\pm 1.9)\times 10^{-10}\) to be compared to our previous estimate of \(-(16.1\pm 1.9)\times 10^{-10}\) [49, 81]. Finally the contribution to \(a_\mu \) from the combined \(\tau \) data reads

$$\begin{aligned} a_\mu ^\tau [2\pi ] = (517.3 \pm 1.9 \pm 2.2 \pm 1.9)\times 10^{-10}\,, \end{aligned}$$
(4)

where the uncertainties are from the combined mass spectrum, the branching fractions, and the IB corrections, respectively.
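The bookkeeping behind the total IB correction and Eq. (4) can be cross-checked with the short sketch below, which uses only central values quoted in this section. Two caveats apply. First, the naive sum of the rounded central values gives about \(-15.0\times 10^{-10}\), within 0.1 of the quoted \(-14.9\times 10^{-10}\); the small difference presumably reflects rounding or the treatment of the \(\rho \) lineshape models, which this sketch does not attempt to reproduce. Second, adding the three uncertainties of Eq. (4) in quadrature assumes they are uncorrelated, which is our assumption rather than a statement from the text.

```python
import math

# Central values of the IB corrections in units of 1e-10, as quoted in the text.
IB_CORRECTIONS = {
    "S_EW": -12.21,
    "G_EM": -1.92,
    "FSR": +4.67,
    "pion mass difference (cross section)": -7.88,
    "pion mass difference (rho width)": +4.09,
    "rho mass difference": +0.20,
    "pi pi gamma and other EM rho decays": -5.91,
    "rho-omega interference": +4.0,
}

ib_total = sum(IB_CORRECTIONS.values())

# Uncertainties of Eq. (4): mass spectrum, branching fractions, IB corrections.
EQ4_UNCERTAINTIES = (1.9, 2.2, 1.9)
# Assumption: the three sources are treated as uncorrelated.
total_uncertainty = math.sqrt(sum(u * u for u in EQ4_UNCERTAINTIES))

print(f"naive IB sum: {ib_total:.2f} x 1e-10")
print(f"total uncertainty of Eq. (4): {total_uncertainty:.1f} x 1e-10")
```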

The result (4) differs from that obtained in Ref. [86], \((519.6 \pm 2.8[\textrm{exp}] ^{+1.9} _{-2.1} [\textrm{IB}])\times 10^{-10}\) using \(\mathcal{O}(p^4)\) ChPT. Most of the difference is accounted for by their \(S_\textrm{EW}\) value (1.0201), which does not take into account double counting between \(S_\textrm{EW}\) and \(G_\textrm{EM}\) for the subleading non-logarithmic short-distance correction for quarks. This effect is responsible for a shift of \(1.7\times 10^{-10}\) in \(a_\mu ^\tau [2\pi ]\). The remaining difference of \(0.6\times 10^{-10}\) (see Footnote 7) originates mostly from the \(\rho \) width corrections in the pion form factor.

7 A new perspective on the muon g – 2 HVP contribution from the dispersive method

Having discussed the tensions among the \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) cross-section measurements and their possible origins, and reappraised the use of the complementary \(\tau \) spectral functions, we proceed with a quantitative study of the dominant HVP contributions to \(a_\mu \). We consider here only the most precise results. We include neither the CMD-2 measurements [52, 55], whose discrepancy with CMD-3 is currently under investigation [97], nor the SND results, which are in a state of flux from the older [56] to the new measurements [50] that are still being updated [98].

For the following exercise, we consider the LO HVP contributions from the \(\pi ^+\pi ^-\) channel in the wide mass range from threshold to 1.8\(\mathrm {\;Ge\hspace{-1.00006pt}V}\) for each experiment. BABAR and the \(\tau \) spectral functions extend over the entire interval, while the other experiments cover a more restricted range and are completed near threshold and at large mass with the combination discussed in Sect. 2. For KLOE, we use the original combined data from Ref. [33] and consider two cases: the full available range and a restricted range of 0.6–0.975\(\mathrm {\;Ge\hspace{-1.00006pt}V} \), where the data are most precise and KLOE’s weight in the combination is largest (cf. top panel of Fig. 4). The two-pion contributions are complemented by the remaining LO HVP, NLO and NNLO HVP, hadronic light-by-light, as well as QED and electroweak contributions, all taken from Ref. [3]. The differences in the resulting \(a_\mu \) predictions therefore reflect the differences in the two-pion contributions from each experiment, whose uncertainties correspond to the original ones, that is without rescaling to accommodate inconsistencies among data sets.

Fig. 11

Compilation of \(a_\mu \) predictions subtracted by the central value of the experimental world average [2]. The predictions (circles) are computed from the individual \(\pi ^+\pi ^-\) contributions between threshold and 1.8\(\mathrm {\;Ge\hspace{-1.00006pt}V}\), complemented by common non-\(\pi ^+\pi ^-\) contributions taken from Ref. [3]. The quoted uncertainties correspond to the two contributions and do not include that of the subtracted experimental value shown by the vertical band. The error bars indicate the \(\pi ^+\pi ^-\) and total uncertainties, respectively. The percentage given for each experiment represents the fraction of \(a_\mu \)[\(\pi ^+\pi ^-\), threshold\(-\)1.8\(\mathrm {\;Ge\hspace{-1.00006pt}V}\)] used from a given experiment (see text for details, particularly concerning the two values for KLOE). The lattice result from BMW [38] is shown as a filled square

Fig. 12

Compilation of LO HVP \(a_\mu ^\textrm{win}\) predictions in the intermediate Euclidean time window (0.4 – 1.0 fm) [39], computed from the individual \(\pi ^+\pi ^-\) measurements between threshold and 1.8\(\mathrm {\;Ge\hspace{-1.00006pt}V}\) (when only part of this interval is available it is extended to the full range using Ref. [8]), complemented by non-\(\pi ^+\pi ^-\) combined spectra taken from Ref. [8]. Also shown is the average of the available lattice QCD results [45]

The results are shown in Fig. 11 as differences between the \(a_\mu \) predictions and experiment [2]. The uncertainties drawn are from the \(\pi ^+\pi ^-\) measurements (inner bars) and the total contributions (outer bars). The quoted uncertainties are separated into the \(\pi ^+\pi ^-\) and remaining non-\(\pi ^+\pi ^-\) contributions.

The BABAR and \(\tau \) based results are in agreement. Combining both with CMD-3 gives \(a_\mu ^\mathrm {had,\,LO}=(7057 \pm 33 \pm 22)\times 10^{-11}\), where the first uncertainty is from the \(\pi ^+\pi ^-\) contribution, scaled by a factor 1.5 according to the \(\chi ^2\) value of 4.5 for 2 degrees of freedom, and the second from the non-\(\pi ^+\pi ^-\) contribution. This average results in \(\Delta a_\mu =a_\mu ^\textrm{SM}-a_\mu ^\textrm{exp}=-(123 \pm 33 \pm 29 \pm 22)\times 10^{-11}\), where the first uncertainty is from the \(\pi ^+\pi ^-\) contribution, the second from all the other terms in the \(a_\mu \) prediction, and the third from the g – 2 experimental world average [2]. The significance of a non-zero \(\Delta a_\mu \) is 2.5\(\sigma \). As expected from the known tensions, the \(a_\mu \) value for KLOE in the restricted range lies well below (3.8\(\sigma \)) the above combination.
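The two numbers quoted above, the scale factor of 1.5 and the 2.5\(\sigma \) significance, follow from standard arithmetic. The sketch below assumes the scale factor is defined as \(\sqrt{\chi ^2/n_\textrm{dof}}\) and that the three uncertainties on \(\Delta a_\mu \) are combined in quadrature; both are our reading of the text rather than an explicit statement in it.

```python
import math

# Scale factor applied to the pi+ pi- uncertainty of the combination.
chi2, ndf = 4.5, 2
scale_factor = math.sqrt(chi2 / ndf)  # assumed definition: sqrt(chi2 / ndf)

# Delta a_mu = a_mu(SM) - a_mu(exp), in units of 1e-11, as quoted in the text.
delta_a_mu = -123.0
uncertainties = (33.0, 29.0, 22.0)  # pi+ pi-, other SM terms, experiment

# Assumption: the three uncertainties are uncorrelated.
total_uncertainty = math.sqrt(sum(u * u for u in uncertainties))
significance = abs(delta_a_mu) / total_uncertainty

print(f"scale factor: {scale_factor:.1f}")
print(f"total uncertainty: {total_uncertainty:.0f} x 1e-11")
print(f"significance: {significance:.1f} sigma")
```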

The BABAR, \(\tau \), CMD-3 combination agrees with the only result available so far from lattice QCD for the full \(a_\mu \) prediction, BMW [38], who find \(\Delta a_\mu =-(105\pm 55\pm 22)\times 10^{-11}\), shedding new light on the apparent discrepancy between BMW and the dispersive approach. Combining the values of BABAR, \(\tau \), CMD-3 and BMW, the difference with experiment is \(2.8\sigma \).

In the light of these results, we extend the study to the intermediate window 0.4 – 1.0 fm in Euclidean time, which is favourable for lattice QCD. The corresponding \(a_\mu ^\textrm{win}\) values are displayed in Fig. 12, where the quoted uncertainties are again separated into \(\pi ^+\pi ^-\) and non-\(\pi ^+\pi ^-\) contributions, the latter contribution using the combined spectra from Ref. [8] (see Footnote 8). All dispersive predictions are found below that from lattice QCD with significances of \(1.1\sigma \) for CMD-3, \(2.5\sigma \) for \(\tau \), \(3.1\sigma \) for BABAR, \(5.4\sigma \) for full KLOE, and \(5.8\sigma \) for restricted-range KLOE, exacerbating the pattern seen for \(a_\mu \). The weighted average of BABAR, CMD-3 and \(\tau \) gives \((232.0 \pm 1.1)\times 10^{-10}\), to be compared with \((236.1 \pm 0.9)\times 10^{-10}\) from lattice QCD. To further understand the discrepancy, additional lattice QCD studies, splitting the range of the lattice window into smaller intervals, possibly around the present optimal window, could be helpful (see Footnote 9).
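For orientation, the size of the remaining window discrepancy between the BABAR, CMD-3 and \(\tau \) average and lattice QCD can be estimated from the two values just quoted. The sketch below treats the two uncertainties as uncorrelated and Gaussian, which is our simplifying assumption; the resulting significance is not a number quoted in the text and should be read as indicative only.

```python
import math

# Intermediate-window values in units of 1e-10, as quoted in the text.
dispersive_value, dispersive_unc = 232.0, 1.1  # BABAR, CMD-3 and tau average
lattice_value, lattice_unc = 236.1, 0.9        # lattice QCD average

difference = lattice_value - dispersive_value
# Assumption: uncorrelated uncertainties, combined in quadrature.
combined_unc = math.sqrt(dispersive_unc ** 2 + lattice_unc ** 2)
window_significance = difference / combined_unc

print(f"difference: {difference:.1f} +- {combined_unc:.1f} (x 1e-10)")
print(f"indicative significance: {window_significance:.1f} sigma")
```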

8 Conclusions

This paper reviewed existing tensions among the most precise \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) cross-section measurements used in the dispersive evaluation of the hadronic vacuum polarization (HVP) contribution to the anomalous magnetic moment of the muon. Local discrepancies between KLOE on one hand and BABAR and CMD-3 on the other hand exceed significances of \(3\sigma \) and \(5\sigma \), respectively, while that between BABAR and CMD-3 is generally at the \(2\sigma \) level. CMD-3 data lie systematically above all other data, while KLOE data lie below.

A dedicated analysis of radiative processes in \(e^+e^- \rightarrow \mu ^+\mu ^-\gamma \) and \(e^+e^- \rightarrow \pi ^+\pi ^-\gamma \) at NLO and NNLO by BABAR [46] prompted a study of related systematic uncertainties in the measurements using initial state photon radiation. In the absence of an NNLO Monte Carlo generator, the studies relied on approximate assumptions and fast simulation. They indicate potential problems for radiative event acceptances in the KLOE and BESIII measurements, not covered by the quoted systematic uncertainties.

In view of these difficulties with \(e^+e^-\) results we reappraised the use of \(\tau \) hadronic spectral functions in the dispersive approach with an updated treatment of isospin-breaking corrections. The \(\tau \)-based HVP contribution comes out close to the larger values provided by BABAR and CMD-3.

We reevaluated the compatibility of the dispersive HVP calculations with lattice QCD and with the g – 2 experiment. Combining BABAR, \(\tau \), and CMD-3 measurements for the \(e^+e^-\!\rightarrow \pi ^+\pi ^-\) HVP contribution, and adding all other contributions, the dispersive calculation of \(a_\mu \) agrees with the lattice QCD result from BMW [38], while a discrepancy in the restricted observable \(a_\mu ^\textrm{win}\) persists.

The discrepancy of the dispersive prediction with the g – 2 experimental world average reduces from more than \(5\sigma \), when KLOE measurements are included but neither CMD-3 nor \(\tau \) data, as in [3], to \(2.5\sigma \) for the new prediction, in which CMD-3 and \(\tau \) measurements are included but not KLOE.