Complete QED NLO contributions to the reaction $e^+e^- \to \mu^+\mu^-\gamma$ and their implementation in the event generator PHOKHARA

The KLOE and BaBar measurements of the pion pair production cross section show a discrepancy of 2% to 5%. These measurements are based on approximate NLO $ \mu^+ \mu^- \gamma $ cross section predictions of the Monte Carlo event generator PHOKHARA7.0. In this article, the complete NLO radiative corrections to $ \mu^+ \mu^- \gamma $ production are calculated and implemented in the Monte Carlo event generator PHOKHARA9.0. Numerical reliability is guaranteed by two independent approaches to the real and the virtual corrections. The novel features include the contribution of pentagon diagrams in the virtual corrections, which form a gauge-invariant set when combined with their box diagram partners. They may contribute to certain distributions at the percent level. The real emission was also complemented with two-photon final-state emission contributions not included in the generator PHOKHARA7.0. We demonstrate that, for realistic charge-averaged experimental setups, their numerical influence does not exceed 0.1% at KLOE and 0.3% at BaBar energies. As a result, we exclude the approximations in earlier versions of PHOKHARA as the origin of the observed experimental discrepancy.


Introduction
The total cross section for electron-positron annihilation into hadrons is a very important physical observable, contributing significantly to the theory error of the muon anomalous magnetic moment and of the running electromagnetic fine structure constant used in tests of the Standard Model and its extensions. For recent reviews see e.g. [1][2][3][4][5]. At high energies, the cross section can be calculated using perturbative QCD; at low energies, however, one has to rely on experimental measurements.
One of the methods used to extract the hadronic cross section is the "radiative return", exploiting the fact that the cross section of the process with initial state photon radiation can be factorised into a known perturbative factor and the hadronic cross section without initial state radiation at the energy lowered by the emitted photons. Due to the complexity of the experimental setup, the extraction of the hadronic cross section within a realistic experimental framework can be achieved only by means of an event generator.
The most important contribution to the hadronic cross section is the pion pair production channel. Its accuracy is an issue, as it provides the main source of error in the evaluation of the muon anomalous magnetic moment [2]. In view of the planned improvement of the direct measurement of the muon anomalous magnetic moment [6], with an expected error four times smaller than the present one, it is crucial to reduce the theoretical error as much as possible. The pion production cross section in e + e − scattering was measured, using the radiative return method, by BaBar [7,8] and KLOE [9][10][11]. Both experiments quote individual errors at the level of a fraction of a percent, while the discrepancy between them is up to 2% at the ρ peak and 5% when approaching the energy of 1 GeV. The origin of the discrepancy remains unclear. Since both experiments use the PHOKHARA [12,13] event generator to extract the hadronic cross section, it is necessary to check its physical content carefully. The PHOKHARA event generator was used to generate the reactions e + e − → π + π − + photons and e + e − → µ + µ − + photons; the latter process is used for luminosity monitoring. So far, the version of PHOKHARA used by BaBar and KLOE included only the dominant next-to-leading order (NLO) radiative corrections. In view of the above mentioned discrepancy between BaBar and KLOE, it is essential to perform a full NLO calculation and to establish the importance of the missing contributions.
In this article, the complete NLO Quantum Electrodynamics (QED) radiative corrections to the reaction e + e − → µ + µ − γ are calculated, tested and implemented into the event generator PHOKHARA. From a technical point of view, the pentagon diagrams are the most challenging. Because there is no scale available which might lead to logarithmically enhanced contributions, they are expected to be small. However, it is known that logarithmic enhancements can be generated in some regions of the phase space within complicated experimental setups. The fact that we cannot neglect the small electron mass for the same reason poses an additional challenge in the calculation of the virtual amplitudes, since ratios of the order of s/m_e^2 can appear, where √s is the centre-of-mass energy of the collider. This demands good control of the numerical accuracy of our amplitudes. Consequently, detailed Monte Carlo studies are mandatory. First studies were presented some time ago by the Katowice/Zeuthen group [15], in PhD theses [16,17] and also in an independent alternative approach [18,19]. This enabled us to compare specific phase space points with Refs. [18,19] with high precision as a first numerical test; see Ref. [17] for details. Here, we perform a complete calculation in the frame of a realistic Monte Carlo environment, PHOKHARA9.0. Because there are several known sources of numerical instabilities, and because we have no external cross-check available, we prepared two independent implementations of the QED virtual corrections. Having implemented the complete radiative corrections, detailed physics studies became possible, and their results are presented here.
The article is organised as follows: In Section 2 and Appendices B and C we give a detailed description of the calculation of the radiative corrections. Section 3 sketches the implementation of the radiative corrections into PHOKHARA9.0. In Section 4, the main tests of the correctness and the numerical stability of the code are presented. Further, the relevance of the NLO radiative corrections, which were missing in PHOKHARA7.0, is investigated. In Section 5, the possible impact of the radiative corrections, analogous to those studied here, on the pion form factor measurements of BaBar and KLOE is discussed. Section 6 contains the conclusions.
2 Radiative corrections to e + e − → µ + µ − γ

The tree level diagrams contributing to the leading order (LO) amplitude are shown in Fig. 2.1. There are two types of contributions, those with initial state photon emission (ISR) and those with final state photon emission (FSR). The ISR and FSR pairs of diagrams are separately gauge invariant.
At NLO QED, there are the virtual and the real corrections resulting in three types of contributions to the cross section, the ISR and the FSR contributions, and their ISR-FSR interference terms.
We use dimensional regularization [20] to regularize the ultraviolet (UV) and infrared (IR) divergences. The UV divergences of the virtual amplitude are removed by renormalization counter-terms. Both the virtual and the real corrections are infrared divergent; these divergences cancel in the sum for infrared-safe observables. Once the IR divergences are canceled, both the virtual plus soft real corrections and the hard real corrections become separately numerically integrable. Details of the real emission calculation are given in Section 2.2. In the following, we describe the method used to compute the virtual amplitudes.

Virtual corrections
Besides photonic self-energy corrections, there are 32 diagrams contributing to e + e − → µ + µ − γ at NLO QED. They can be classified into several independent gauge-invariant subsets, which we will call the Penta-Box, Box-Triangle-Bubble and Triangle contributions. The first class involves loop corrections with both lepton lines attached to the loop. The most challenging diagrams are the four pentagon diagrams, shown in Fig. 2.2, where a real photon is emitted from an internal line.
They do not constitute a gauge-invariant class of diagrams by themselves. Gauge-invariant groups are formed when a pentagon is combined with the two box diagrams in which a photon is radiated from the same external (electron or muon) line. This is shown schematically in Fig. 2.3. The contribution of these twelve Penta-Box diagrams, interfering with the tree level diagrams of Fig. 2.1, will be discussed in detail in Section 4. Finally, we mention the diagrams with external photon emission and self-energy insertions in the photon propagator. They constitute a gauge-invariant universal correction which can be accounted for in any QED calculation by simply running the fine structure constant to the appropriate scale [21,22]. These self-energies are treated separately and have been omitted from our fixed-order loop amplitude definition in Fig. 2.4. The treatment of vacuum polarisation in the PHOKHARA event generator, together with narrow resonance contributions, is described in detail in Ref. [23] and will not be discussed here.
In the present article, two independent programs using two different methods are used. One is based on the trace method and the calculations are done using double precision numerical routines, including the PJFry libraries [24]. We refer to it as "Double precision -Trace method" (DT-method). The other one is based on the helicity formalism as described in Ref. [25], and we refer to it as "Quadruple precision -Helicity method" (QH-method) because numerical calculations are done partially using quadruple precision. Such independent implementations are necessary to gain sufficient numerical reliability.

The DT-method
With the DT-method, topologies are generated by QGRAF [26] and then dressed with particles and momenta by the DIANA program [27] according to the QED model description file. The resulting output contains a list of Feynman diagrams in a textual representation, defined by the TML markup language script [28]. Next, the diagrams are passed through a FORM [29] script, which substitutes the Feynman rules according to the selected model. Further manipulations are done with FORM. In addition, some general simplifications can be enabled by setting configuration parameters. These include gamma algebra identities like γ µ γ ν γ µ = (2 − d)γ ν , the transversality condition, e.g. p 1 · ε(p 1 ) = 0, and the use of the Dirac equation and momentum conservation. The resulting expressions are written into a FORM tablebase. We use it as input to the squaring program, which sums the diagrams and multiplies them by the complex conjugated set of Born diagrams. The fermion lines are connected by the completeness relation, and then the Dirac traces are taken. For the calculation of the newly added one-loop pentagon contributions, one has to calculate 5-point tensor Feynman integrals up to rank R = 3. We reduce the tensor integrals in d = 4 − 2ε dimensions to scalar 1- to 4-point functions. They depend on the reduction basis chosen. Often one uses the external momenta of the diagram as basis momenta, as in Refs. [25,30]. Our choice (with a one-to-one correspondence) is the so-called chords, the shifts of the internal momenta with respect to the loop momentum [24].
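As a toy illustration (not part of the PHOKHARA tool chain), the gamma-algebra contraction identity quoted above, γ µ γ ν γ µ = (2 − d) γ ν , can be checked numerically for d = 4 with an explicit Dirac representation; all names below are illustrative:

```python
import numpy as np

# Dirac matrices in the Dirac representation (4x4, mostly-minus metric)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

gamma = [np.block([[I2, Z2], [Z2, -I2]])]                # gamma^0
gamma += [np.block([[Z2, s], [-s, Z2]]) for s in sigma]  # gamma^1..3

g = np.diag([1.0, -1.0, -1.0, -1.0])  # metric tensor g_{mu nu}

d = 4
for nu in range(4):
    # contraction: sum_{mu,rho} g_{mu rho} gamma^mu gamma^nu gamma^rho
    lhs = sum(g[mu, rho] * gamma[mu] @ gamma[nu] @ gamma[rho]
              for mu in range(4) for rho in range(4))
    assert np.allclose(lhs, (2 - d) * gamma[nu])  # equals -2 gamma^nu in d = 4
```

In d = 4 − 2ε dimensions the identity acquires an ε-dependent factor, which in the actual calculation is handled analytically by FORM rather than numerically.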
Advanced tensor integral calculations became a standard task in recent years, mainly triggered by LHC physics. Nevertheless, ensuring sufficient numerical stability is demanding for several reasons. An often discussed issue is the treatment (or avoidance) of small or vanishing inverse Gram determinants. Another one is the extreme spread of scales in our physical process, because we cannot neglect the electron mass m e ≈ 1/2000 GeV as an independent parameter. With √s = 1 − 10 GeV, one faces e.g. a ratio m_e^2/s ∼ 10^-7 − 10^-9. The DT implementation of the tensor integral calculation relies on the approach developed in Refs. [31][32][33][34][35][36][37] and uses the PJFry tensor reduction package [17,38], combined with QCDLoop/FF [39,40] or OneLOop [41] for the scalar integrals. More technical details can be found in Ref. [17].

The QH-method
The second implementation (QH-method) uses the helicity formalism as described in Ref. [25]. To build the virtual amplitude, four building blocks are used. Corrections to a lepton line with two real (on-shell or off-shell) photons attached in a fixed order of the external bosons, Fig. 2.6, constitute the first building block, which we call Boxline; it also includes the corresponding counter-terms, which are not shown in Fig. 2.6. We use the effective current approach; thus, V 1 and V 2 should be understood as generic off-shell currents, which can be, in this case, an on-shell photon or an off-shell photon forming the second lepton line. The physical amplitude is built by considering all physical permutations and contractions with external currents, yielding the Box-Triangle-Bubble gauge-invariant subsets of Fig. 2.4. In addition, we use the vertex corrections to a lepton line with one real photon attached to it. All possible contributions result in the Triangle contributions of Fig. 2.5. The third building block is formed by the Penta-Box diagrams, depicted in Fig. 2.3, which involve the pentagon diagrams. The last building block is obtained by crossing the two initial fermion lines in Fig. 2.3 and constitutes an independent gauge-invariant group. To compute them, we generalized the software developed in Ref. [25] to handle diagrams with two fermion lines. This includes the use of Chisholm identities [45], which reduces the CPU time required to evaluate the Penta-Box contributions by a factor of ten. At the same time, this improves the stability of the code, since it makes explicit the terms proportional to the small electron mass. The calculation of tensor integrals is done using Passarino-Veltman reduction [30] for up to 4-point diagrams and the method of Refs. [46,47], following the conventions of Ref. [25], for higher-point tensor integrals. The scalar integrals are calculated as in Refs. [22,48].
We use a cache system in all the building blocks such that the information of the loop-dependent parts is stored. This is particularly important for this process, since up to 32 different helicity amplitudes exist, corresponding to the different helicity and polarization combinations of the external particles. After the first helicity amplitude is computed, which includes the evaluation of the loop-dependent parts, any additional helicity amplitude is computed in less than 10% of the CPU time, reducing the overall CPU time of the code by a factor of ten.
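The idea of the cache system can be sketched in a few lines: the expensive, helicity-independent loop quantities are computed once per phase space point and reused for all helicity combinations. The sketch below is a schematic stand-in, not the actual PHOKHARA code; all names and the placeholder "coefficients" are illustrative:

```python
import itertools
from functools import lru_cache

calls = {"loop": 0}  # counts how often the expensive loop part is evaluated

@lru_cache(maxsize=None)
def loop_coefficients(point_id):
    """Stand-in for the helicity-independent tensor-integral coefficients."""
    calls["loop"] += 1          # the costly reduction would happen here
    return (1.23, 4.56, 7.89)   # placeholder values

def helicity_amplitude(point_id, helicities):
    # cheap part: contract cached coefficients with helicity-dependent factors
    coeffs = loop_coefficients(point_id)  # served from the cache after call 1
    return sum(h * c for h, c in zip(helicities, coeffs))

# all 8 helicity combinations of 3 external particles at one phase space point
amps = [helicity_amplitude(42, h) for h in itertools.product((-1, 1), repeat=3)]
assert calls["loop"] == 1  # the loop-dependent part was evaluated only once
```

The same pattern lets the gauge tests described below run cheaply: they reuse the cached loop parts instead of recomputing them.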
The building blocks do not use special properties of the real photons attached to the lepton lines, like transversality or being on-shell; instead, we assume external effective currents attached to them. This allows us to use Ward identities, obtained by replacing an effective current with the corresponding momentum, to check the accuracy of the computed amplitudes. We classified our contributions into gauge-invariant subsets so that the Ward identities are fulfilled. These identities are called gauge tests and are checked at a small additional computing cost, using the cache system. They are checked for every phase space point and for each gauge-invariant subset, distinguishing between FSR and ISR contributions. This is important because the phase space integration of the virtual contribution shows numerical instabilities in the calculation of the one-loop tensor integrals [25].
We have implemented a rescue system for phase space points where the Ward identities are not satisfied with an accuracy of at least three digits. First, we calculate the amplitudes applying quadruple precision only to the scalar integrals and tensor reduction routines. This requires reconstructing the external momenta in quadruple precision, so that global energy-momentum conservation is still fulfilled at the higher numerical accuracy retaining the external particles on their mass-shell. If the Ward identities are not satisfied, the rescue system evaluates the amplitude using quadruple precision in all parts of the code. With this system, we find that the proportion of phase space points that do not pass the Ward identities for a requested accuracy of ε = 10 −3 is well below one in ten thousand. The rescue system adds an additional 10% to the CPU time.
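The two-stage rescue logic can be illustrated with a toy example: two mathematically equivalent evaluations play the role of the gauge test, and a higher-precision re-evaluation (here the stdlib decimal module stands in for quadruple precision) is triggered only when they disagree beyond three digits. This is a sketch of the idea, not the actual PHOKHARA routines:

```python
import math
from decimal import Decimal, getcontext

def f_naive(x):
    # double precision; suffers catastrophic cancellation for small x
    return (1.0 - math.cos(x)) / (x * x)

def f_check(x):
    # mathematically equivalent half-angle form, plays the "gauge test" partner
    s = math.sin(0.5 * x)
    return 2.0 * s * s / (x * x)

def f_highprec(x):
    # re-evaluation at ~quadruple precision using Decimal arithmetic
    getcontext().prec = 34
    xd = Decimal(x)
    acc, term, x2 = Decimal(1), Decimal(1), xd * xd
    for n in range(1, 25):  # Taylor series of cos, ample for |x| < 1
        term *= -x2 / (Decimal(2 * n - 1) * Decimal(2 * n))
        acc += term
    return float((1 - acc) / x2)

def evaluate(x, tol=1e-3):
    a, b = f_naive(x), f_check(x)
    if abs(a - b) <= tol * abs(b):  # "gauge test" passed to ~3 digits
        return a
    return f_highprec(x)            # rescue: redo at higher precision

assert abs(evaluate(1e-9) - 0.5) < 1e-6  # rescued (f_naive returns 0.0 here)
```

In the real code the comparison is a Ward identity on each gauge-invariant subset, and the higher-precision pass is applied first to the scalar integrals and tensor reduction only, then, if needed, to the full amplitude.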
Despite the cache system and the use of Chisholm identities, this implementation is still seven times slower than the DT-method, which can be traced back to the evaluation of the 32 helicity amplitudes and the parts evaluated in quadruple precision.
We have tried to evaluate the amplitudes only in double precision, to improve the speed of the code by a factor of two, using dedicated subroutines for small Gram determinants. These involve the evaluation of three- and four-point functions up to rank 11 and 9, respectively, following the notation of Ref. [25]. The high rank of the rescue system allows one to obtain full double precision for mild cancellations in the Gram determinants. However, two problems arise. First, for the failing phase space points, there was always some internal combination for which the expansion breaks down due to the presence of additional cancellations in sub-Cayley determinants. Thus, full double precision is not achieved for all tensor integral coefficients. Second, within the helicity formalism, there exist extremely large cancellations between the different helicity amplitudes, resulting in an additional loss of precision which can be larger than the one due to small Gram determinants. These numerical cancellations are related to the fact that the mass of the electron has to be retained. For example, in the numerical evaluation of the Dirac equation, p̸ u(p_e) = m_e u(p_e), cancellations of the order of s/m_e^2 would appear, where √s is the centre-of-mass energy of the collider, if the Dirac equation is not applied or not treated carefully.
These two problems are reflected in a poor accuracy of the gauge test and, therefore, in a large number of identified unstable points when using only double precision. The second problem is naturally solved by the DT-method, since the summation and averaging over spins are done analytically and many of these numerical cancellations are avoided. We decided to implement the DT-method in PHOKHARA9.0 and to compare it with the code implemented in full quadruple precision, where the gauge test ensures the numerical accuracy of the code.

Real photon emission
The real two-photon emission, which contributes to the e + e − → µ + µ − γ cross section at NLO, is now included in the PHOKHARA code completely, in contrast to the implementation of Refs. [49,50], where subleading contributions were neglected. We distinguish between soft and two-hard-photon emission.

Soft photon emission
In the soft photon contribution, the phase space of one of the photons (k 1 ) is integrated out analytically. The integrals to be performed, which factorise in front of the square of the full amplitude describing the e + e − → µ + µ − γ reaction, depend on p 1 , p 2 , q 1 and q 2 , the momenta of the positron, electron, antimuon and muon, respectively. The dependence on the infrared regulator (the photon mass λ) and on the photon energy cut-off E max can be cast into a single parameter r. In principle, this is a well known formula, which can be found in the literature [51,52]. However, as the integral over the photon energy is performed only up to a given cut-off E max , its form depends on the frame in which this cut-off is applied. We found it convenient to give formulae valid in any frame in which the cut-off is defined. They are somewhat longer than the usual ones, as we express everything through the four-momenta p 1 , p 2 , q 1 , q 2 given in the frame where the cut-off is defined, in contrast to the usual expressions, which use invariants but are less universal. The explicit expression for F(p 1 , p 2 , q 1 , q 2 , r) is given in Appendix C.

Two-hard photon emission
For the two-hard photon emission, the helicity method for the calculation of the amplitudes was used and cross checked with a dedicated code based on the trace method of spin summation. The convention for the helicity amplitude method introduced in Ref. [49] was adopted.
The only amplitude missing in earlier versions of PHOKHARA was the two-photon FSR amplitude. Interferences between the coded amplitudes, with infrared divergences matching the ones from the Penta-Box diagrams, were also not included. After some algebra, similar to Refs. [49,50], a very compact form for the two-photon FSR amplitude is obtained; the matrices A and B and the conventions used to define the spinors are given in Appendix B.
The energy of one of the photons has to be larger than the cut-off E max , and the sum of the soft and hard contributions should not depend on this cut-off, up to terms ∼ E max , which are neglected in the analytic calculation.
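The cut-off independence can be illustrated with a one-dimensional toy model of the photon energy spectrum, f(E)/E with f(0) = 1: the "soft" part is integrated analytically with f frozen at its soft limit, the "hard" part numerically above the cut-off, and their sum depends on the cut-off only through terms of order E max. All numbers here are illustrative, not PHOKHARA output:

```python
import math

E_beam, lam = 1.0, 1e-10  # toy beam energy and IR regulator (photon mass analogue)

def soft_analytic(e_cut):
    # soft approximation f(E) -> f(0) = 1, integrated analytically from lam to e_cut
    return math.log(e_cut / lam)

def hard_numeric(e_cut, n=20000):
    # midpoint rule in t = ln E for the hard spectrum f(E)/E, with f(E) = 1 - E/E_beam
    a, b = math.log(e_cut), math.log(E_beam)
    h = (b - a) / n
    return sum(1.0 - math.exp(a + (i + 0.5) * h) / E_beam for i in range(n)) * h

def total(e_cut):
    return soft_analytic(e_cut) + hard_numeric(e_cut)

# the sum is independent of the separation cut-off up to O(e_cut) terms
t1, t2 = total(1e-4), total(1e-5)
assert abs(t1 - t2) < 2e-4
```

In PHOKHARA the same check is performed on the full cross section and on differential distributions, with the hard part generated by Monte Carlo.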
3 Implementation of the radiative corrections in the event generator PHOKHARA9.0

PHOKHARA9.0 is available from the webpage http://ific.uv.es/~rodrigo/phokhara/. As stated already, all new parts of the released computer code were calculated independently by two methods and/or groups of the authors of this article. To ensure the stability of the virtual corrections, we use the two independent codes described in Section 2. The faster of the two, the code based on the DT-method, is used in the released version of PHOKHARA9.0; the other one can be obtained on request. We sketch here briefly the new ingredients of the released code. The listed changes concern only the e + e − → µ + µ − γ mode, when it is run with the complete NLO radiative corrections:
• The virtual corrections are calculated in double precision and the sum over polarisations is done with the trace method. The software PJFry [24] is used for this purpose and the relevant parts of the libraries developed there are distributed with PHOKHARA9.0.
• The soft photon emission is calculated using the formulae discussed in Section 2.2.1 and in Appendix C.
• The two-hard-photon emission part uses the helicity amplitudes as defined in Ref. [50]. The newly added part, the two-photon FSR, is described in Section 2.2.2 and coded accordingly.

Tests of the code
The released code was tested very extensively, both for the real and the virtual contributions, to ensure that the technical accuracy of the code is much better than that required for the experimental measurements. The necessity of retaining a finite electron mass poses a potential threat of numerical instabilities for both the real and the virtual contributions, since cancellations of the order of s/m_e^2 can appear. The virtual corrections constitute the most challenging part. The presence of Gram determinants can be an additional source of instabilities, which can become more severe for some realistic experimental selection cuts where forward photon emission is favoured, resulting in collinear photon emission.
The checks clearly show the control on the numerical accuracy of the virtual corrections.
Additionally, using the DT-method and realistic cuts (see Appendix A for their definition) for KLOE and BaBar energies, we have studied the relevance of the most challenging contribution in this article. For this purpose, we compare the contributions from the one-loop Penta-Box diagrams defined in Section 2 with the Born contributions, in Fig. 4.1 for the muon angular distributions and in Fig. 4.2 for the µ + µ − invariant mass distribution. As can be clearly seen from the muon and antimuon angular distributions, the size of the Penta-Box contributions can reach the percent level, and they cannot be neglected for the charge-odd observables. We confirm here the expectation that the corrections neglected in Ref. [50] are indeed small for the charge-even distributions. For a classification of the charge-odd and charge-even contributions, we refer the reader to Ref. [54], where it was given for charged pions in the final state; replacing pions with muons does not change the classification presented there.
Fig. 4.1: Relevance of the NLO Penta-Box contributions at KLOE (above) and BaBar (below) energies for muon angular distributions: θ µ + is the µ + polar angle, while θ is the angle of µ + or µ − for the charge-'blind' observable. These definitions are the same in all the figures and will not be repeated in the captions.
To appreciate these results, in Fig. 4.3 we plot the Penta-Box contributions obtained with the PJFry package with (left) and without (right) the expansions for small Gram determinants. The right panel reveals discrepancies only after increasing the number of Monte Carlo events to 10^9 [16,55]. PJFry treats small Gram determinants properly, as discussed in detail in [17]. With the PJFry package, the leading inverse Gram determinants |G^(5)| are eliminated in the tensor reduction, and small inverse Gram determinants |G^(4)| are avoided using asymptotic expansions and Pade approximants. For more details concerning the numerical stability of the tensor reductions, see Refs. [17,56]. The results are completely stable and well under control.
The soft real emission analytic formulae were also checked. First, by comparing with the integral obtained by means of Monte Carlo methods; good agreement was found, even if this method is limited in some cases to an accuracy of 2 · 10^-4. Second, the numerical stability of the code was tested by comparing the quadruple- and the double-precision versions of the same code. The relative accuracy of the double-precision version used in the released code of the generator is at the level of 10^-7 at 1 GeV, while at 10 GeV it was only about 10^-3 in some corners of the phase space. However, since those phase space regions do not affect the relevant observables (invariant mass and angular distributions), the code was not changed to cure this behaviour by means of appropriate expansions.
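The benefit of such expansions can be seen in a one-line analogue: a function that is numerically unstable when evaluated naively for a small argument (the role played here by the small Gram determinant) is replaced by its truncated series below a threshold. This only illustrates the strategy, not the PJFry algorithm itself:

```python
import math

def f_naive(g):
    # unstable for |g| -> 0: exp(g) - 1 suffers catastrophic cancellation
    return (math.exp(g) - 1.0) / g

def f_series(g):
    # truncated expansion, the analogue of expanding in the small determinant
    return 1.0 + g / 2.0 + g * g / 6.0 + g ** 3 / 24.0

def f(g, threshold=1e-4):
    # switch to the expansion when the "determinant" is small
    return f_series(g) if abs(g) < threshold else f_naive(g)

assert abs(f(1e-16) - 1.0) < 1e-12  # the naive form returns 0.0 here
assert abs(f(0.5) - (math.exp(0.5) - 1.0) / 0.5) < 1e-15
```

For this particular function one would simply use math.expm1; the point is the threshold-controlled switch to an expansion, mirroring the treatment of small |G^(4)| by asymptotic expansions and Pade approximants.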
The new contributions of the real two-photon emission were tested comparing two completely independent codes. In one, the trace method (T) and FORM [57] was used to obtain an analytic result, in the second one, the helicity amplitude method (H), described in [49] and in Appendix B, was applied.
The biggest observed relative difference of the codes was at the level of 10 −4 even if both codes were using double precision only. Additionally, in both cases gauge invariance was checked. For the T-method analytically, and for the H-method numerically, obtaining a relative accuracy of 10 −15 .
Both the soft and the hard parts were tested by checking the independence of the cross section and of the differential distributions of the separation parameter between the soft part, where the integral over the one-photon phase space is performed analytically, and the hard part, where the integral is obtained by Monte Carlo generation. The accuracy of this test was 2 · 10^-4, and perfect agreement at this level of accuracy was observed.
For the charge-sensitive observables, which were available in PHOKHARA7.0 only at LO (the ISR-FSR interference was present at LO only), the new corrections are relatively larger and typically reach a few percent, as expected for NLO corrections. The KLOE event selection was designed to diminish the FSR radiative corrections and, as such, it also suppresses most of the asymmetry coming from one-photon emission. The asymmetry coming from two-photon emission, however, survives the cuts, as shown in Fig. 5

Conclusions
The studies presented here allowed for the development of a numerically stable Monte Carlo event generator, PHOKHARA9.0, simulating the reaction e + e − → µ + µ − γ with full NLO QED accuracy. The radiative corrections which were missing in the previous versions of the generator can reach a few percent. However, it was shown that the charge-blind observables used by the BaBar and KLOE collaborations are affected only at the level of 0.1% for KLOE and 0.3% for BaBar. We conclude that the observed discrepancies between these experiments cannot be attributed to the corrections missing in the reaction e + e − → µ + µ − γ in PHOKHARA4.0 [50,58].