$\eta$ and $\eta'$ decays into lepton pairs

In this work, we calculate the branching ratios for the $\eta(\eta')\rightarrow\bar{\ell}\ell$ decays, where $\ell = e,\mu$. These processes have tiny rates in the Standard Model due to spin-flip, loop, and electromagnetic suppression, which is why they could be sensitive to New Physics effects. In order to provide a reliable Standard Model input, we exploit the general analytical properties of the amplitude. For that purpose, we invoke the machinery of Canterbury approximants, which provides a systematic description of the underlying hadronic physics in a data-driven fashion. Given the current experimental discrepancies, we discuss in detail the role of the resonant region and comment on the reliability of $\chi$PT calculations. Finally, we discuss the kind of new physics which we think would be relevant to account for them.


Introduction
Pseudoscalar decays into lepton pairs have been the subject of continuous study, starting in the fifties with the first paper by Drell [1] on π0 → e+e−. The special interest in these processes lies in the fact that, given the small rates predicted within the Standard Model (SM), they could be sensitive to New Physics effects [2-5]. The η and η′ cases are of particular interest since they give access to the µ+µ− channel as well, opening the possibility of investigating lepton flavor violation. Given the current puzzles in the low-energy precision frontier of particle physics - specifically, the long-standing discrepancy in the electron and muon anomalous magnetic moments [6,7], the recent proton radius puzzle arising from the different values obtained in electronic- and muonic-hydrogen experiments [8], and the R_K and R_{D(*)} anomalies [6], where R_X = Γ(B → Xµ+µ−)/Γ(B → Xe+e−) - it is of interest to study whether similar puzzles appear in these processes as well.
Table 1. State-of-the-art experimental results for the branching ratios:

  η → e+e−:   ≤ 2.3 × 10−6 [15]      η → µ+µ−:   5.8(8) × 10−6 [16]
  η′ → e+e−:  ≤ 5.6 × 10−9 [17,18]   η′ → µ+µ−:  −
  π0 → e+e−:  7.48(38) × 10−8 [19]

Entering this precision region requires a precise description of the underlying hadronic process fully driving this electromagnetic decay (see Fig. 1), as well as an accurate calculation of the loop integral involved, which itself requires a hadronic description at all energies. Moreover, in the loop-integral calculation it has been common to neglect mass effects [9,10], which become relevant at a certain precision. In Ref. [11], we showed that a possible approach to avoid the inherent uncertainties of the widely used hadronic models in this calculation is provided by Canterbury approximants, which allow for a data-driven description of the hadronic process in a systematic way; this is because our method exploits the analytic properties of the underlying function. Moreover, we used an exact numerical evaluation of the loop integral. In the present work, beyond summarizing the main properties of the method [11] in order to extend it to the η and η′ cases, we shall discuss a set of novel features arising from the appearance of intermediate hadronic states absent in π0 → e+e−, and show how our method is able to deal with them. Our results, together with the latest radiative corrections [12-14], pave the way for the precision low-energy frontier of the Standard Model and New Physics searches in these decays.
The article is organized as follows: in Sec. 2, we report on the state-of-the-art experimental measurements and theoretical predictions, together with a general description of the main features of the process under discussion. In Sec. 3, we review the basics of our approach based on Canterbury approximants. In Sec. 4, we discuss the role of intermediate hadronic states and the performance of our approach with the help of a toy model. We present our results in Sec. 5. In Sec. 6, we discuss, in the light of our results, the role of chiral perturbation theory (χPT) in calculating such decays, together with a parameterization to obtain the π0-exchange contribution to the 2S hyperfine splitting in muonic hydrogen; we also derive a general formula to account for the difference between the electronic and muonic channels in the SM beyond the leading-order χPT estimate. Finally, with our results at hand, we discuss the implications of the experimental results for New Physics searches in Sec. 7. We provide our conclusions and summarize our results in Sec. 8.
2 Pseudoscalar decays into lepton pairs: state of the art

Experimental status
Pseudoscalar decays into lepton pairs are considered rare, since their branching ratios (BR) range from 10−9 to 10−6. The state-of-the-art experimental measurements of BR(η, η′ → ℓ̄ℓ) with ℓ = e, µ are collected in Table 1, where we have also included π0 → e+e− for completeness.
The impact of the numerical corrections above shows that approximate calculations do not yield a reliable result for the η and η′ cases. Consequently, we will employ an exact numerical evaluation in what follows. For completeness, we also include the Z-boson contribution in our final results, since its size is similar to some of the hadronic uncertainties.

A dissection of the P → ℓ̄ℓ process
The dominant contribution to P → ℓ̄ℓ, illustrated in Fig. 1, is mediated by an intermediate two-photon state, a loop process with the momentum k running through the photons.⁷ The main vertex of the process (the blob in Fig. 1) describes the P → γ*γ* transition, which is of electromagnetic nature and proceeds through the Adler-Bell-Jackiw anomaly [37,38]. Since the photons are virtual, the corresponding amplitude probes the meson structure at all energies in terms of the pseudoscalar-photon transition form factors

⁷ There is an additional Z-boson contribution to these processes which, as shown in Sec. 7, is subleading but included in our final results.

Figure 1. Leading contribution to η(η′) decays into lepton pairs. The blob represents the TFF.
(TFFs). The TFF, a function of the photon virtualities q1² and q2², is denoted F_{Pγ*γ*}(q1², q2²). A description of the TFF at all energies is a formidable task, because it not only involves different scales, but also requires knowledge of the intermediate particles produced by the photons. No complete description of both the time-like (TL) and space-like (SL) regions is available so far. Fortunately, due to the kinematics of the process, the required information belongs (mostly) to the SL region, for which some information is available. For example, |F_{Pγγ}(0, 0)|² ∝ Γ_{P→γγ}, which are measured quantities and are also calculable in the chiral limit; the behavior at large photon virtualities is known in turn from perturbative QCD [39]. The interpolation between these two regimes still represents a challenge in QCD and demands a model. Since the problem is how to perform the loop calculation in the SL region, with a scarce set of theoretical inputs but a great deal of experimental data on the TFF at low and intermediate energies, our goal in the present work is to perform an interpolation that satisfies all possible constraints. From the mathematical point of view, this is known as the general rational Hermite interpolation problem, and its solution lies within the mathematical theory of Padé approximants [40]. Padé approximants (PA) are rational functions R_N(x)/Q_M(x) with a contact of order O(x^{N+M+1}) with the function one wants to approximate; that is, their Taylor expansion matches the first N+M+1 terms of the original function. Only by satisfying these accuracy-through-order conditions can one claim that the analytical properties of the original function are retained. The blob in Fig. 1 represents, however, the doubly virtual TFF, and the appropriate approximation must be of a bivariate kind. The extension of PAs to two variables is called Canterbury approximants [11].
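As a minimal numerical sketch of these accuracy-through-order conditions (illustrative only: the function and coefficients below are a generic Stieltjes example, not the TFF), one can build the [0/1] and [1/1] Padé approximants to f(x) = ln(1+x)/x directly from its Taylor coefficients:

```python
import math

# Taylor coefficients of f(x) = ln(1+x)/x = sum_k (-1)^k x^k / (k+1)
c = [(-1) ** k / (k + 1) for k in range(6)]   # [1, -1/2, 1/3, -1/4, ...]

def f(x):
    return math.log1p(x) / x

# [0/1] Pade: a0 / (1 + b1 x), matched to c0, c1 -> a0 = c0, b1 = -c1/c0
p01 = lambda x: c[0] / (1 - (c[1] / c[0]) * x)

# [1/1] Pade: (a0 + a1 x) / (1 + b1 x), matched to c0, c1, c2
b1 = -c[2] / c[1]                  # accuracy-through-order condition at O(x^2)
a0, a1 = c[0], c[1] + c[0] * b1
p11 = lambda x: (a0 + a1 * x) / (1 + b1 * x)

for name, p in (("[0/1]", p01), ("[1/1]", p11)):
    print(name, "error at x=1:", abs(p(1.0) - f(1.0)))
```

The [1/1] element, built from just one more Taylor coefficient, reduces the error at x = 1 by roughly a factor of four, which is the matching-not-fitting philosophy exploited below for the TFF.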
In order to perform the calculation in Eq. (2.2) - as we have already said - several models for the TFF have been proposed. They started with the advent of VMD ideas [42]. Their theoretical limitations, however, arising from the use of a finite set of resonances in a narrow-width approximation [43] and from the lack of a connection with pQCD - a fact circumvented with the use of Padé theory [44,45] - limited their reliability. To remedy this, it was proposed to use high-energy QCD constraints [26] and, later, well-known phenomenological models constrained by data, which alleviate the model dependence [35]. Moreover, given the lack of double-virtual data on the TFF (which is still an experimental challenge), all the model parameters had to be fixed from high-energy QCD constraints [35] alone. Whereas these constraints should be considered, they might not account for the low-energy behavior which, in turn, dominates the contribution to this process, and they may spoil the global description [11].
An alternative idea to circumvent these problems is to use χPT. We only note for the moment that this approach is not well suited to describe the η−η′ system; we will comment on its accuracy at the end of Sec. 6.

Canterbury approximants
In this work, we use the method of Canterbury approximants (CAs) to implement the TFF low-energy behavior together with the QCD constraints in a data-driven approach. Indeed, assuming the relevant intermediate hadronic states to be the ρ resonance (through ππ rescattering) and the ω and φ narrow-width resonances, all of them with positive imaginary part and therefore belonging to the class of Stieltjes functions, the convergence of CAs is guaranteed, as follows from Refs. [46,47]. Given a symmetric bivariate function F_{Pγ*γ*}(Q1², Q2²) = F_{Pγ*γ*}(Q2², Q1²) with known Taylor expansion

F_{Pγ*γ*}(Q1², Q2²) = a0 (1 − b_P (Q1² + Q2²)/m_P² + a_{P;1,1} Q1²Q2²/m_P⁴ + ...),   (3.1)

CAs [11,46,48] are rational functions

C^N_M(Q1², Q2²) = R_N(Q1², Q2²) / Q_M(Q1², Q2²),   (3.2)

of bivariate polynomials of degrees N and M, respectively, whose coefficients are defined so as to match the low-energy expansion of the original F_{Pγ*γ*}(Q1², Q2²) function, fulfilling the conditions specified by Eqs. (A.3, A.4). Only with this definition is our rational function guaranteed to converge to the original function, provided the latter fulfills certain analytical properties (for instance, if it is a meromorphic [40,44,45,49-52] or a Stieltjes [40,47,49,50,53,54] function). This construction allows one to describe the TFF with the correct low-energy implementation, which is known to play the main role in these processes, a fact often overlooked (see the discussion in [11]). Moreover, it has the ability to implement at the same time the high-energy QCD behavior, which is relevant for the high-energy tail of the integration. As discussed in Ref. [11], the simplest element of the C^N_M(Q1², Q2²) sequence reads [11]

C^0_1(Q1², Q2²) = a0 / [1 + b_P (Q1² + Q2²)/m_P² + (2b_P² − a_{P;1,1}) Q1²Q2²/m_P⁴],   (3.3)

where a0 = F_{Pγγ}(0, 0) is related to Γ_{P→γγ} and m_P is the pseudoscalar mass. We obtain F_{ηγγ}(0, 0) = 0.274(5) GeV⁻¹ and F_{η′γγ}(0, 0) = 0.344(6) GeV⁻¹ using the Γ_{P→γγ} from the PDG [6]. The parameter b_P is the slope of the single-virtual TFF [51,55,56] and ensures the appropriate low-energy behavior up to O(Q⁴) corrections.
The most precise determinations of the slopes of the η and η′ TFFs have been obtained from a data-driven procedure and read b_η = 0.576(12) [55] and b_η′ = 1.31(4) [56], respectively. In addition, for one virtuality, C^0_1(Q², 0) satisfies the Brodsky-Lepage (BL) [39] high-energy Q²-behavior by construction. For the π0, the BL asymptotic limit reads lim_{Q²→∞} Q² F_{Pγ*γ*}(Q², 0) = 2F_π, with F_π = 92.21 MeV [6] the pion decay constant. For the η and η′, a similar formula exists which involves both the η−η′ mixing parameters and the running effects of the singlet axial current [55,56]. In our case, when we state that we fulfill the correct BL Q²-behavior, we refer to the fact that our approximant satisfies lim_{Q²→∞} Q² F_{Pγ*γ*}(Q², 0) = C, but without fixing the precise C coefficient (i.e., C = 2F_π for the π0 and its counterpart for the η and η′). Not fixing C is extremely important in our method, given the relevance of the low energies when using the simplest C^0_1(Q1², Q2²) approximant to perform the numerical evaluation of Eq. (2.2). Matching b_P/m_P² to C instead of to the TFF's slope would spoil the TFF low-energy expansion, which dominates these decays - see the discussion in Appendix A and Table 8. Not only that, but the resulting description can be checked to reproduce the low-energy data on the single-virtual TFF to excellent precision, and the (less relevant) intermediate- and high-energy data to 10% accuracy; see Figs. 6 and 7 in this respect as well.
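To make the role of the approximant parameters concrete, the sketch below implements a C^0_1-type element with the η numbers quoted above (a0 = 0.274 GeV⁻¹, b_η = 0.576). The bilinear denominator used here is an assumption, chosen so that a_{P;1,1} = b_P² gives exact factorization and a_{P;1,1} = 2b_P² gives a 1/Q² diagonal fall-off; m_η = 0.548 GeV is a PDG-like value, not quoted in this section:

```python
# Sketch of a C^0_1-type element. ASSUMED denominator form, chosen so that
# a11 = b^2 reproduces exact factorization and a11 = 2 b^2 gives an OPE-like
# 1/Q^2 diagonal fall-off. Inputs: a0 = 0.274 GeV^-1, b = 0.576 (text),
# m_eta = 0.548 GeV (PDG-like value, assumed).
A0, B, M2 = 0.274, 0.576, 0.548 ** 2

def c01(q1sq, q2sq, a11):
    """C^0_1(Q1^2, Q2^2) with double-virtual parameter a11 = a_{P;1,1}."""
    den = (1 + B * (q1sq + q2sq) / M2
           + (2 * B**2 - a11) * q1sq * q2sq / M2**2)
    return A0 / den

def single(qsq):
    """Single-virtual factor F(Q^2, 0) = a0 / (1 + b Q^2 / m^2)."""
    return A0 / (1 + B * qsq / M2)

# a11 = b^2: the approximant factorizes exactly into single-virtual factors
q1sq, q2sq = 0.3, 1.7
print(c01(q1sq, q2sq, B**2), single(q1sq) * single(q2sq) / A0)

# a11 = 2 b^2: the bilinear term cancels, so the diagonal falls as ~1/Q^2
qsq = 1.0e4
print(qsq * c01(qsq, qsq, 2 * B**2), A0 * M2 / (2 * B))

# BL-like behaviour of the single-virtual direction: Q^2 F(Q^2,0) -> a0 m^2 / b
print(qsq * single(qsq), A0 * M2 / B)
```

With a_{P;1,1} anywhere between the two choices, the diagonal interpolates between the factorized 1/Q⁴ and the OPE-like 1/Q² fall-offs, which is precisely the freedom parameterized by the range adopted below.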
Finally, a_{P;1,1} is the double-virtual slope of the TFF. Since there are no experimental data on the double-virtual TFF so far, the method of fitting the experimental data with a PA sequence considered in Refs. [51,55-58] cannot be used to obtain the double-virtual parameters. For this reason, we suggested in Ref. [11] to choose a generous range for the a_{P;1,1} parameter that covers the well-known theoretical constraints at low and high energies, including the yet-unknown real value. It turns out that, at low energies - the most important region in our calculation - χPT calculations favor the so-called factorization approach [59], namely, that F̃_{Pγ*γ*}(Q1², Q2²) = F̃_{Pγ*γ}(Q1², 0) × F̃_{Pγγ*}(0, Q2²) for the normalized TFFs, implying a_{P;1,1} = b_P² in Eq. (3.3). On the other hand, it is known that at high energies the operator product expansion (OPE) dictates F_{Pγ*γ*}(Q², Q²) ∼ Q^{−2} [26,35], implying a_{P;1,1} = 2b_P² in Eq. (3.3). Deviations from factorization should translate into a_{P;1,1} > b_P². Hence, from now on and given our ignorance of the precise value of a_{P;1,1}, we will assume b_P² ≤ a_{P;1,1} ≤ 2b_P² for the rest of this work. In addition, we note that, from Eq. (3.3), a_{P;1,1} ≤ 2b_P² is required to avoid poles in the SL region.
To scrutinize the interplay of these two regimes a bit more, let us look briefly at the π0 → e+e− decay, for which it is possible to employ a reliable approximate version of Eq. (2.2) [11] thanks to the smallness of the π0 mass. This allows one to split the integral (I) in Eq. (2.2) into two parts, I = ∫₀^Λ (Fact) + ∫_Λ^∞ (OPE), in which the low-energy part below the scale Λ is integrated using the factorization approach (i.e., a_{P;1,1} = b_P²), whereas the high-energy part above Λ is integrated using the OPE (a_{P;1,1} = 2b_P²). We obtain BR(π0 → e+e−) = {6.15, 6.21, 6.26, 6.29} × 10−8 for Λ = {0, 1, 2, ∞} GeV. This shows that, even if factorization represents a good approximation at low energies, the freedom in choosing where the OPE should take over implies a non-negligible systematic error at the precision we are aiming for - of the order of the percent level - and vice versa. It is reasonable to assume, then, that such a deviation is covered by choosing the conservative range a_{P;1,1} ∈ (b_P², 2b_P²). In addition, we recall that the mathematical theory of CAs does not dictate which a_{P;1,1} to take, and both choices are equally correct as they merely correspond to different reconstructions. We illustrate these discussions, e.g. the relevant range for the a_{P;1,1} parameter and the relevance of using the low-energy parameters, in Table 8 of Appendix A with the help of two well-motivated toy models. Again, we emphasize that the reconstructed element, C^0_1(Q1², Q2²), fulfills the different Q²-behaviors implied by QCD for a_{P;1,1} = 2b_P², but not the corresponding coefficients - these could be implemented in higher elements, which unfortunately involve at the moment too many unknown double-virtual parameters. Still, from our studies in [11], together with Appendix A and the reasons presented above, these effects are seen to be tiny and the quoted value already provides a reliable result.
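The Λ-split above can be mimicked with a toy integral. The weight function below is an arbitrary stand-in for the true kernel of Eq. (2.2), so only the qualitative behavior - a smooth, monotonic drift between the two pure choices as Λ is varied - is meaningful:

```python
# Toy version of the Lambda-split: factorized TFF below Lambda, OPE-like
# above. The weight w(Q) is an ARBITRARY stand-in for the true kernel of
# Eq. (2.2); only the monotonic drift with Lambda is meaningful here.
B, M2 = 0.576, 0.548 ** 2   # slope and mass^2 used to build the toy TFFs

def f_fact(qsq):            # factorized diagonal TFF (a11 = b^2), ~1/Q^4
    return 1.0 / (1 + B * qsq / M2) ** 2

def f_ope(qsq):             # OPE-like diagonal TFF (a11 = 2 b^2), ~1/Q^2
    return 1.0 / (1 + 2 * B * qsq / M2)

def w(q):                   # assumed toy weight, peaked at low Q [GeV]
    return q**3 / (q**2 + 0.1) ** 3

def integral(lam, qmax=50.0, n=20000):
    """Midpoint rule for I(L) = int_0^L w*f_fact + int_L^qmax w*f_ope."""
    dq = qmax / n
    total = 0.0
    for i in range(n):
        q = (i + 0.5) * dq
        total += w(q) * (f_fact(q * q) if q < lam else f_ope(q * q)) * dq
    return total

for lam in (0.0, 1.0, 2.0, 50.0):
    print(lam, integral(lam))   # drifts monotonically with Lambda
```

Since the factorized diagonal lies pointwise below the OPE-like one, the toy integral decreases monotonically with Λ; the physical lesson is only that the matching point introduces a systematic spread that the a_{P;1,1} range must cover.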
The last ingredient in our approach is the estimation of the systematic error induced by the truncation of the CA sequence, i.e., the fact that, even though the CA sequence converges to the TFF, at any given finite CA order the difference between the function and its approximant is not zero. The inclusion of a systematic error marks a difference with respect to previous studies and is the goal of the next section.

Assessing the systematic error
The η and η′ masses are large enough to allow intermediate hadronic states in the P → ℓ̄ℓ processes, as sketched in Fig. 2, which implies an additional imaginary part beyond that of the γγ contribution via Cutkosky rules. As we have already said, this novel feature - not present for the π0 - diminishes the imaginary part with respect to the γγ intermediate state, invalidating the unitarity bound [1], |A(m_P²)|² ≥ |Im A_{γγ}(m_P²)|², which tacitly assumes the absence of intermediate states beyond the γγ one. This effect can be understood from the loop integral as well. After Wick rotation, the TFF integration domain lies in the region −m_P² ≤ Q² ≤ ∞, which for the η and η′ involves threshold production (m_η > 2m_π) and resonances (m_η′ > m_ρ,ω > 2m_π).
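For reference, the γγ unitarity bound itself is straightforward to evaluate. The sketch below does so for π0 → e+e−, where no hadronic channel opens below m_P, using the standard Drell absorptive part; the masses and BR(π0 → γγ) are PDG-like inputs assumed here, not taken from the text:

```python
import math

# Unitarity (Drell) bound from the gamma-gamma cut: the absorptive part
# Im A_gg(m_P^2) = (pi / (2 beta)) * ln((1 - beta)/(1 + beta)) is model
# independent, and |A|^2 >= (Im A_gg)^2 bounds the rate. This holds only
# when no hadronic intermediate state opens below m_P (true for the pi0).
alpha = 1 / 137.036
m_e, m_pi = 0.000511, 0.13498      # GeV (PDG-like values, assumed)
br_gg = 0.9882                     # BR(pi0 -> gamma gamma), assumed

beta = math.sqrt(1 - 4 * m_e**2 / m_pi**2)
im_a = math.pi / (2 * beta) * math.log((1 - beta) / (1 + beta))

# BR(P -> ll) = 2 beta (alpha m_l / (pi m_P))^2 |A|^2 BR(P -> gg)
bound = 2 * beta * (alpha * m_e / (math.pi * m_pi)) ** 2 * im_a**2 * br_gg
print(f"unitarity bound: BR(pi0 -> e+e-) >= {bound:.3e}")   # ~4.7e-8
```

For the η and η′ this simple lower bound no longer applies, which is precisely the point of this section: the hadronic cuts interfere destructively with the γγ absorptive part.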
This effect has never been considered before when calculating these rare decays, and it must be taken into account as well when evaluating the systematic error. As we will show below, the uncertainty from the resonant part of Fig. 2 becomes the dominant source of error. To study this effect quantitatively, we take a toy model for the TFF that includes both a two-pion production threshold and a vector resonance. The model is conceived in such a way that the time-like region contains all the required features of the physical TFF up to the η′ mass. The first ingredient in our toy model is factorization, which, as explained before, seems a reasonable choice at low energies. The second ingredient is the use of vector-meson-dominance ideas [61], allowing us to express the single-virtual TFF as

F_{Pγ*γ}(s, 0) = F_{Pγγ}(0, 0) Σ_V c_{PV} G_V(s),   (4.1)

where the G_V(s) are the different resonance contributions weighted by the dimensionless couplings c_{PV}, which are obtained from a quark model. For the ρ, we dress the propagator with the two-pion loop,

G_ρ(s) = M_ρ² / [M_ρ² − s + (s M_ρ²)/(96π²F_π²) (ln(m_π²/µ²) + 8m_π²/s − 5/3 + σ³(s) ln((σ(s)+1)/(σ(s)−1)))],   (4.2)

with σ(s) = √(1 − 4m_π²/s), and the parameters M_ρ = 0.815 GeV, F_π = 0.115 GeV, µ = 0.775 GeV, and m_π = 0.139 GeV chosen to reproduce the pole position s_ρ = (M − iΓ/2)², with M = 0.764 GeV and Γ = 0.144 GeV from [65]; for the (narrow-width) ω and φ resonances, we take

G_V(s) = M_V² / (M_V² − s − i M_V Γ_V),   (4.3)

whose parameters are fixed from the PDG masses and widths [6]. This choice makes our model very similar to the dispersive approach formulated in [62].
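A minimal check of the narrow-width ingredient: a Breit-Wigner propagator of the type assumed for the ω and φ pieces has a positive spectral function on the real axis, consistent with the Stieltjes requirement, and its complex pole reproduces the s_V = (M − iΓ/2)² convention quoted above. For definiteness, the ρ-pole numbers are used here purely as illustration; the full two-pion dressing is not implemented:

```python
import cmath

# Narrow-width vector propagator of the Breit-Wigner type,
# G_V(s) = M^2 / (M^2 - s - i M Gamma).
# Its exact pole sits at s = M^2 - i M Gamma, i.e. sqrt(s) ~ M - i Gamma/2
# for Gamma << M, matching the pole convention s_V = (M - i Gamma/2)^2.
def G(s, M, Gamma):
    return M**2 / (M**2 - s - 1j * M * Gamma)

M, Gamma = 0.764, 0.144            # GeV: rho-pole numbers, for illustration
s_pole = M**2 - 1j * M * Gamma     # exact pole of the Breit-Wigner
print(cmath.sqrt(s_pole))          # close to 0.764 - 0.072j

# Positive spectral function on the real axis (Stieltjes-like behaviour):
# Im G_V(s) = M^3 Gamma / ((M^2 - s)^2 + M^2 Gamma^2) > 0
for s in (0.2, 0.6, 1.0):
    assert G(s, M, Gamma).imag > 0
print("Im G_V(s) > 0 on the real axis")
```

The positive imaginary part is what places the toy model (and, by assumption, the physical TFF) in the Stieltjes class for which the convergence theorems quoted below apply.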
To evaluate the branching ratio of the decay, we have to calculate the diagram in Fig. 1. The blob there stands for the TFF, which includes, among others, the contributions from Fig. 2 and is accounted for by the model in Eq. (4.1). Employing the Cauchy integral representation for the (factorized) TFF and exchanging the integration order in the integral (2.2) allows us to express the loop amplitude as in Eq. (4.5). This procedure results in an easy evaluation of the loop amplitude, denoted K(M1², M2²), through standard one-loop techniques [66] or a numerical evaluation using LoopTools [67]. Now the threshold effects are clear and easier to handle. To illustrate them, we plot the imaginary part of the integrand in (4.5) in terms of Im[F_{Pγ*γ}(M_V²)] and Im[K(M_V²)] - the latter contains both γγ and vector contributions - dispersing only one virtuality in (4.5) for simplicity (i.e., we neglect the q² dependence of the second virtuality). The resulting Im[K(M_V²)] is shown in Fig. 3 as a solid-black (dashed-purple) line for the η (η′) in terms of the dispersive variable M_V, once the d⁴k integration has been performed to give K(M_V²) in the last line of Eq. (4.5). These lines have to be convoluted with Im[F_{Pγ*γ}(M_V²)], represented in Fig. 3 by the bluish area. For clarity, we only plot there Im[G_ρ(s)], since the ρ resonance is the only one relevant for discussing ππ threshold effects. The overall contribution would thereby resemble that in Fig. 3, but with additional sharp peaks at the M_V = m_ω, m_φ locations, each of them weighted by the corresponding c_{η(η′)V} coefficient. For M_V > m_P, the γγ contribution dominates (the bluish region below m_η in Fig. 3 is almost negligible); it is slightly modified when m_P > 2m_π by the tail of the resonance contribution, and, as unitarity implies, the imaginary part is shifted with respect to the γγ contribution whenever an intermediate hadronic channel appears below m_P.

Figure 3. Im[K(M_V²)] (black and dashed-purple lines for the η and η′, respectively), which has then to be convoluted with Im[F_{Pγ*γ}(M_V²)] (bluish area).
For completeness, we illustrate in Table 2 the numerical shift in the imaginary part induced by the vector contributions with respect to the γγ one using our toy model (4.1); we stress that all three (V = ρ, ω, φ) channels have been included in Table 2.

Table 2. Imaginary part of A(q²) (total) compared to the imaginary part calculated from the γγ channel alone. The hadronic contributions lower the total value of the imaginary part with respect to the γγ contribution, invalidating the unitarity bound.
Given that our model is a Stieltjes function, it is well known that the C^N_{N+1}(Q1², Q2²) sequence is guaranteed to converge in the whole complex plane except along the cut [40,49], where zeros and poles of our CA will accumulate to reproduce the discontinuity [40,49,54]. Remarkably, even if the sequence does not converge along the cut, the integral over its imaginary part converges globally to that of the real function, which follows from Cauchy's integral theorem.

Table 3. Comparison between our toy-model result and the simplest C^0_1(Q1², Q2²) approximation for each channel. The Error represents the relative deviation between the model and the approximation. The left table collects the BRs, whereas the right table contains the loop amplitude A(m_P²). See details in the main text.

This means in practice that the approximant's poles will be responsible for effectively generating an imaginary part in our integral via the iε prescription, mimicking the cut contribution. As an illustration, we collect in Table 3 the results for both the BR and A(m_P²) from our simplest approximant, the C^0_1(Q1², Q2²), and compare them with the toy model. The comparison of the BRs reveals a systematic error induced by the fact that we have truncated the CA sequence. For the η, such an error is almost negligible (the role of the vector resonances is very mild there), whereas for the η′ it reaches almost 20%. These percentages will be used as an estimate of our systematic error in our final results for the C^0_1(Q1², Q2²) element. We would like to remark at this point that, using a VMD model with the ρ mass - which was standard in the past for performing this calculation - we would have found BR(η → ee) = 5.30 × 10−9, which implies a larger systematic uncertainty compared to our result with the C^0_1(Q1², Q2²) in Table 3, i.e., BR(η → ee) = 5.42 × 10−9. Using experimental data from the space-like region to fit the VMD model does not improve the result; in this case, we would have obtained BR(η → ee) = 5.26 × 10−9. These results illustrate the potentially large systematic error coming from the use of VMD data-fitting procedures from high energies for processes which are low-energy dominated, even if the quality of the fit is good and the errors tiny.
As we have said, the convergence of the CA sequence to our toy model is guaranteed in advance [40,49,54]. To fully display this feature beyond the simplest C^0_1(Q1², Q2²) discussed in Table 3, we now consider the C^N_{N+1}(Q1², Q2²) sequence up to N = 13 and calculate the corresponding amplitude in Eq. (2.2). The relative distance in the complex plane of the energy squared, defined as |1 − A(m_P²)_CA/A(m_P²)|, is shown in Fig. 4, where, for simplicity, we employ only the G_ρ contribution in (4.1) (without the ω, φ contributions).
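The global-versus-local convergence statement can be illustrated with a generic Stieltjes toy, f(x) = ln(1+x)/x, standing in for the TFF: the poles of the low-order approximants all land on the cut x ≤ −1, mimicking the discontinuity, while the error at a regular point shrinks as the order increases (the [0/1] and [1/2] coefficients below follow from the accuracy-through-order conditions):

```python
import math

# Proof-of-concept of [N/N+1]-type convergence for a Stieltjes function
# with a cut: f(x) = ln(1+x)/x has a cut along x <= -1. The approximants'
# poles fall on the cut, while the accuracy at a regular point improves
# with the order. (Illustrative stand-in, not the TFF itself.)
f = lambda x: math.log1p(x) / x

# [0/1]: 1 / (1 + x/2)                (matches the Taylor series to O(x))
p01 = lambda x: 1 / (1 + x / 2)
# [1/2]: (1 + x/2) / (1 + x + x^2/6)  (matches the Taylor series to O(x^3))
p12 = lambda x: (1 + x / 2) / (1 + x + x**2 / 6)

x = 1.0
e01, e12 = abs(p01(x) - f(x)), abs(p12(x) - f(x))
print(e01, e12)        # the error shrinks as the order increases

# Poles: x = -2 for [0/1]; roots of 1 + x + x^2/6 for [1/2]
r = math.sqrt(1 - 4 / 6)
poles12 = [(-1 + r) * 3, (-1 - r) * 3]
print(poles12)         # both on the cut x <= -1
```

Even though no finite-order approximant reproduces the discontinuity pointwise, its poles accumulate on the cut and the integrated (global) behavior converges, which is the mechanism exploited in the text.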
The results in Fig. 4 show the ability of our approximants to systematically account for the TFF to arbitrary precision, since the relative distance decreases as the order of the CA increases, even in the presence of the non-trivial behavior of the branch cut from the intermediate hadronic states. This exercise is a proof of concept that, even though the CAs cannot reproduce the imaginary part locally, they are able to approximate it globally with great precision. Note the a priori irregular convergence for the η case in Fig. 4 (top panel). This is just an accident due to the combination of the particular TFF we are using together with the particular value of the η mass: whenever some pole is located close to the η mass, it leads to a bad determination, which is compensated in higher approximants. We find then that, for the C^N_{N+1}(Q1², Q2²) element, the systematic error can be accounted for, essentially, by the difference in the BR with respect to the C^{N−1}_N(Q1², Q2²) result. As in our case of study we only reach the C^0_1(Q1², Q2²) approximant, this procedure does not apply, and we take as the systematic error for the BR the one displayed in the fourth column of Table 3 (based on our toy model). This possibly overestimates the systematic error (see comments in Sec. 5), but we opt for this to remain on the conservative side. As soon as experimental data on the double-virtual TFF become available, we will be able to extend our CA sequence and reduce the systematic error.
In Ref. [11] we considered a similar method for establishing the systematic error in the π0 → e+e− case. There, we used the large-Nc Regge model from Ref. [68], which roughly factorizes at low energies, obeys the OPE, and for which a_{P;1,1} ≈ b_P², and we obtained rather small (O(1%)) systematic errors for the C^0_1(Q1², Q2²) element, which were accounted for by the chosen a_{P;1,1} range. As remarked before, the large systematic error in Table 3 comes from the presence of the resonance region and the difficulty of approximating it with a single pole.

Final results
Our final results for the pseudoscalar decays into lepton pairs using the method described here have been calculated using Eq. (2.2), supplied with the data-driven form-factor parameterization of Eq. (3.3).
The loop-integral evaluation has been performed using FeynCalc [69] to obtain the decomposition into Passarino-Veltman functions; the latter have been evaluated numerically using LoopTools [67]. In the factorization case (a_{P;1,1} = b_P²), this is possible via a partial-fraction decomposition, allowing one to express the product of the TFF and the photon propagators in Eq. (2.2) as a sum of single poles; the corresponding result involves one-, two- and three-point functions. For the OPE choice (a_{P;1,1} = 2b_P²), the loop integration is possible after rewriting the TFF appropriately, which allows one to express the loop integral in terms of one-, two-, three- and four-point functions.
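The factorization-case reduction rests on a standard partial-fraction identity; a minimal numerical check follows, where the mass parameters are arbitrary placeholders rather than physical inputs:

```python
# The factorized TFF times the photon propagators produces products like
# 1/((Q^2 + M1^2)(Q^2 + M2^2)); partial fractions split these into single
# poles, so the loop reduces to standard one-, two- and three-point
# functions. Minimal numerical check of the identity (arbitrary masses):
def product(q2, m1sq, m2sq):
    return 1.0 / ((q2 + m1sq) * (q2 + m2sq))

def partial_fractions(q2, m1sq, m2sq):
    return (1.0 / (q2 + m1sq) - 1.0 / (q2 + m2sq)) / (m2sq - m1sq)

m1sq, m2sq = 0.30, 0.95   # illustrative placeholder masses squared [GeV^2]
for q2 in (0.0, 0.17, 2.5, 40.0):
    assert abs(product(q2, m1sq, m2sq) - partial_fractions(q2, m1sq, m2sq)) < 1e-12
print("partial-fraction identity verified")
```

Each resulting single-pole term then maps onto a standard scalar loop function, which is what makes the Passarino-Veltman reduction straightforward in this case.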
The branching ratios have then been calculated with the formula given in Eq. (2.1). The results for the amplitude A(m_P²) and for the BRs are collected separately, for illustrative purposes, in the second columns of Tables 4 and 5, respectively, and represent the main result of this work. In addition, we have included for completeness the Z-boson contribution (third column in Table 4) - we refer the interested reader to Sec. 7 or to Ref. [70]. Finally, these tables also include results obtained when the kernel in Eq. (2.2) is expanded in terms of m_ℓ/m_P as well as m_ℓ/Λ and m_P/Λ, with Λ a cut-off for the loop integral (fourth column in Tables 4 and 5).
To report a final result, several sources of error must be taken into account:

• Our ignorance of the value of the double-virtual slope a_{P;1,1}, Eq. (3.3), is captured by the range a_{P;1,1} ∈ (b_P² ÷ 2b_P²). With this range, we obtain the results for the loop amplitude A(m_P²), Eq. (2.2), collected in Table 4, and the associated BRs displayed in Table 5. The values, which are reported as a range corresponding to imposing the OPE or factorization, include statistical errors for the amplitudes, and statistical and systematic errors for the branching ratios.
• The single error given for the amplitude results in the second column of Table 4 corresponds to the error on the parameter b_P in Eq. (3.3) used for the TFF reconstruction. This parameter already includes a systematic error on top of the statistical one, which was estimated in Refs. [55,56] for the η and η′, respectively.
• The results collected in Table 5 contain three different sources of error. The first corresponds to the often-neglected uncertainty from BR(P → γγ) [6] in Eq. (2.1); this error is relevant for the η and very important for the η′. The second source is the one coming from b_P, as discussed in the previous item. The third is the systematic error from our method, discussed in the previous section and given in the fourth column of Table 3.

Table 4. Our results for the range a_{P;1,1} ∈ (2b_P² ÷ b_P²), corresponding to (OPE÷Factorization). The error refers to the statistical error alone. We quote the Z-boson contribution A_Z(m_P²) separately, as well as the approximate A_app(m_P²) calculation obtained after expanding Eq. (2.2) in terms of m_ℓ/m_P as well as m_ℓ/Λ and m_P/Λ. See details in the main text.

Table 5. Our results for the branching ratios for the range a_{P;1,1} ∈ (2b_P² ÷ b_P²), corresponding to (OPE÷Factorization). The errors refer to the statistical error from BR(P → γγ), the error from b_P, and the systematic error, respectively. We compare to the results either neglecting the Z-boson contribution (BR w/o Z) or after expanding Eq. (2.2) in terms of m_ℓ/m_P as well as m_ℓ/Λ and m_P/Λ (BR app), as discussed in the main text.
For completeness, Tables 4 and 5 also include the results for the π0 → e+e− process [11].
From those values it is clear that, for the η decays, the most pressing task is improving the double-virtual description, which currently limits the theoretical uncertainty (the difference in the BRs between OPE and factorization is about 0.1 × 10−9 for the ee channel and about 0.2 × 10−6 for the µµ channel, far beyond the impact of the errors quoted). Notice that the systematic error of our method is of the order of the other error sources. To reduce our ignorance of the double-virtual TFF, experimental data (or, eventually, lattice simulations) on the double-virtual TFF would be required.
Concerning the η′, the situation is more complicated. In Table 5, third row, the largest errors arise from our systematic-error estimation in Table 3, while the errors coming from the normalization or the slope of the TFF are milder. The systematic uncertainty could be dramatically reduced if, instead of estimating it with the toy model of Sec. 4, we looked at the difference between the C^0_1(Q1², Q2²) and C^1_2(Q1², Q2²) elements in the factorization approach (where no knowledge of the double virtuality is required). This is not surprising since, as emphasized previously, our toy model is not realistic enough and fails to describe the space-like region, which may hint at an overestimation of the systematic uncertainty.
In this respect, a more elaborate investigation including not only the low- and high-energy behaviors, but also the information about the time-like region, such as physical resonances and threshold discontinuities, would be of interest in order to reach a precision similar to that achieved in the η case. Investigations in this direction are ongoing.
Our results may be compared to the experimental values given in Table 1. We find an interesting deviation in the η → µ+µ− channel. Still, the experimental accuracy prevents us from drawing any conclusion, and a new experiment would be very welcome; it has recently been suggested that this could be possible at LHCb [23]. The result becomes even more interesting when compared to the analogous π0 → e+e− anomaly (cf. our results in Table 5 and the experimental value in Table 1): whereas the π0 case required a strongly damped TFF at large energies to reproduce the experimental value [11], the η case demands a smoothly falling TFF instead, which points to a puzzling situation. Very interesting as well is the current bound on η′ → e+e−, which is getting closer to the theoretical expectations. In this respect, it would be very stimulating to push for a new measurement. Such an effort is currently ongoing at the VEPP-2000 e+e− collider at Novosibirsk, where the plan is to increase the statistics by a factor of ten. Additionally, the possibility of measuring η′ → µ+µ− at LHCb has been suggested [23].
The results above represent an important improvement with respect to previous studies. See, for instance, the comparison in Tables 4 and 5 between our exact results (second column) and the results that would have been obtained (last column) with the standard approximate calculation widely used in the literature [26,35,42], which amounts to expanding the kernel in Eq. (2.2) in powers of $m_\ell/m_P$ as well as $m_\ell/\Lambda$ and $m_P/\Lambda$, with $\Lambda$ a cut-off of the loop integral (the hadronic scale driving the TFF); see Eq. (3) in Ref. [11]. Indeed, the missing lepton-mass corrections along the lines of Ref. [36] are relevant for the $\mu^+\mu^-$ channel; regretfully, they were not included in the recent Ref. [10].
Equally important is the correct implementation of both the low- and high-energy behavior of the TFF (i.e., using $b_P$ instead of the commonly employed resonance or VMD fit parameters, as well as accounting for the OPE behavior), which induces the largest effect for the $\eta$ case. Beyond that, the systematic error estimation is discussed here for the first time and is by no means negligible.
On the contrary, the role of the $Z$ boson is almost negligible, though comparable to some hadronic uncertainties. This is already a sign that new physics at the electroweak scale will not alter the results presented here, and any eventual deviation from the Standard Model in these decays will demand a compelling New Physics benchmark.
6 The low-energy description: the χPT approach

After reporting our final results, we would like to address the comments raised in the Introduction about the calculation of the pseudoscalar decays into lepton pairs within the framework of χPT. In this framework, the value of $A(m_P^2)$ at leading order is given, in the $\overline{\rm MS}$ renormalization scheme, as [30,34]

Table 6. Comparison between the χPT counterterm $\chi(\mu)$ defined in Eq. (6.1), its equal-mass version $\chi(\mu)_{m_\pi}$, and the $U(3)$-symmetric version $\chi(\mu)_{UV}$. Results are always reported for the range (OPE÷Fact). See the description in the main text.
To illustrate the accuracy of χPT at leading order, we quantify the role of the different approximations within χPT when calculating the parameter $\chi(\mu)$ from Eq. (6.1). For that purpose, we equate our final results for $A(m_P^2)$ (see Table 4) to the χPT result, Eq. (6.1), from which $\chi(\mu)$ can be determined. The values obtained are displayed as $\chi(\mu)$ in Table 6 for $\mu = 0.77$ GeV and represent the second main result of this work. In each entry of this table, we quote the extreme solutions given by the boundaries of the (OPE÷Fact) choices for $a_{P;1,1}$. The spread of values obtained for $\chi(\mu)$ in each channel shows that the χPT predictions at this order are subject to significant corrections, whose origin is detailed below.
First, there is a large splitting due to $U(3)$-breaking effects coming from the fact that $m_\pi < m_\eta < m_{\eta'}$. To gauge their magnitude, we illustrate in Table 6, row $\chi(\mu)_{m_\pi}$, what would have been obtained for $\chi(\mu)$ had we imposed $m_{\eta'} = m_\eta = m_\pi$ in our $A(m_P^2)$ calculations as well as in Eq. (6.1). In this scenario, since $m_\pi < 2m_\mu$, the decays into muon pairs cannot be calculated and are marked with a hyphen.
An additional difference arises from the different pseudoscalar-meson transition form factors. Imposing that all of them equal the $\pi^0$ TFF, while restoring the masses to their physical values, yields the results collected in the row labeled $\chi(\mu)_{UV}$ in Table 6.
Finally, $m_\ell$ plays a relevant role in our decays as well. Corrections of order $\mathcal{O}(m_\ell/m_P, m_\ell/\Lambda)$ are expected to arise dynamically, for instance through pion-loop effects as in Fig. 2, and imply that next-to-leading-order calculations are relevant in χPT. This is especially important for the $\eta'$, as this is the only mechanism able to generate an additional imaginary contribution within the χPT framework. Moreover, this puts a word of caution on naive χPT analyses of these processes when looking for New Physics: a different experimental extraction of $\chi(\mu)$ in the electronic and muonic channels (for the same pseudoscalar) is expected within the Standard Model and does not necessarily imply lepton flavor violation.

6.1 The $\pi^0$-exchange contribution to the 2S hyperfine splitting in muonic hydrogen
The results collected in Table 6, first row, are also relevant for the $\pi^0$-exchange contribution to the 2S hyperfine splitting in muonic hydrogen [71]. Such a calculation can be performed within χPT [72], which involves Eq. (6.1) again. However, the kinematics of the process involve a vanishingly small space-like momentum $Q^2$ for the $\pi^0$, since the $\pi^0$-exchange contribution to the 2S hyperfine splitting is a $t$-channel exchange. As such, it is $A(Q^2\to0)$ rather than $A(m_\pi^2)$ which is relevant now [71], shifting the values obtained in Table 6. To illustrate this, we recalculate $A(Q^2)$ from Eq. (2.2) in the limit $Q^2\to0$ and obtain the new subtraction constant which should be used in Eq. (6.1) to reproduce our results. Using the $\pi^0$ TFF [51], we obtain, for $\ell = e$, a value which is smaller than its counterpart collected in Table 6. However, for the 2S hyperfine splitting in muonic hydrogen, what is needed is the coupling to muons ($\ell = \mu$).
In that case, we obtain$^{15}$ a value which is even lower than that in Eq. (6.2). Note that the shift is of the order of the uncertainties quoted in Table 4 and arises, again, from the full $q^2$ and $m_\ell^2$ dependence in Eq. (2.2), which is not accounted for at LO in χPT. We note that the relation between $\chi^{ee}_{\pi^0}(\mu)$ in Eq. (6.2) and $\chi^{\mu\mu}_{\pi^0}(\mu)$ in Eq. (6.3) and the one extracted from the experimental results is nontrivial, as it is TFF dependent. In quoting our results, we implicitly assume that there is no New Physics contribution. However, if the current discrepancies between theory and experiment persist, indicating a New Physics contribution (which we will discuss in Sec. 7), the connection between the experimental $\chi(\mu)$ and that in Eqs. (6.2) and (6.3) will depend on the particular scenario and will have to be reanalyzed.
The results above are illustrative as well regarding the $(g-2)_\mu$ hadronic contributions, which in χPT involve $\chi(\mu)$ together with an additional counterterm, $C(\mu)$, as input [73]. If we were able to determine $C(\mu)$ somehow, from $(g-2)_e$ for example, and $\chi(\mu)$ were taken from the experimental $\pi^0\to e^+e^-$ result, extrapolating to the $\mu$ case may imply a non-negligible error, as illustrated above; similar effects may arise for $C(\mu)$ as well [11,74].

6.2 Higher-order corrections
As discussed above, the precision reached at LO in χPT for processes involving a $P\bar\ell\ell$ vertex may not be sufficient, a feature which manifests itself when comparing the same process in the different $\ell = e,\mu$ channels. This suggests looking at the next-to-leading order.
In this respect, χPT would yield a power-series expansion for the TFF$^{16}$. Then, we could calculate the result of Eq. (2.2) for the TFF in Eq. (6.4), where $A^{\rm LO}(q^2)$ has been given in Eq. (6.1) and where $L = \ln(m_\ell^2/\Lambda^2)$ $^{17}$. We notice that the LO leading logs $L$ correspond (not surprisingly, as they arise from a power-like expansion as well) to the corrections found in [9,10] if $\Lambda$ is taken as the VMD scale. We then adopt a much more modest approach and retain the leading logs alone, which represents a good approximation. This provides a straightforward generalization to higher orders, as well as a tool to estimate the convergence of the chiral expansion. Of particular relevance is the difference $A(q^2, m_e^2) - A(q^2, m_\mu^2)$, where one expects a better convergence of Eq. (6.5) due to partial cancellations. Taking into account the smallness of the electron mass, we find that such a shift is given by Eq. (6.8). Whereas our theoretical results for the electronic and muonic channels in Table 4 could not be reproduced at LO with a unique counterterm, the observed differences in Table 6 and Section 6.1 can be accounted for to a good approximation by the additional terms in Eq. (6.8); an exception is the $\eta'$ case, for which the pion loops cannot be neglected in order to extract an imaginary part. The expansion above, Eq. (6.8), proves extremely useful to relate different leptonic channels, which is relevant not only for the cases discussed above, but also for χPT studies of lepton flavor violation in $K_L\to\bar\ell\ell'$ decays [75].

$^{16}$ For simplicity, we have assumed a single scale for the TFF, inspired by typical VMD models. Note that logarithmic terms coming from loops are of course present too; however, they are subleading compared to the power expansion and may be Taylor expanded for the $\pi^0$ and $\eta$ cases.

$^{17}$ We note that, prior to the renormalization procedure, the loop integral produces $L = \ln(m_\ell^2/\mu^2)$ terms.
It is only after including the (non-explicitly shown) counterterms of the theory that the (scale-independent) $L = \ln(m_\ell^2/\Lambda^2)$ result would appear. We emphasize that our aim is not to perform a detailed higher-order χPT evaluation, but rather to illustrate the effects which are expected to appear.

Figure 5. Left (right): additional tree-level contributions from an axial (pseudoscalar) field. The $P$ within the blob stands for the pseudoscalar meson; $A$ ($\mathcal{P}$) stands for the axial (pseudoscalar) field with momentum $q$; $\ell$ ($\bar\ell$) for the (anti)lepton with momentum $p$ ($p'$).

7 New Physics contributions
At this point, we are finally on firm footing to discuss possible New Physics (NP) contributions, given the current discrepancies in the two existing measured decays. As discussed in [11], any additional contribution will always manifest itself, after Fierz rearrangement, only through effective pseudoscalar (P) and axial (A) contributions, which, given the existing well-motivated models, are conveniently expressed in the effective Lagrangian, where $g$ and $m_W$ are the standard electroweak parameters and $c^{A,P}_f$ are dimensionless couplings to the fermions $f = \{u,d,s,e,\mu\}$. These interactions yield additional tree-level contributions, as shown in Fig. 5. Their corresponding amplitudes, for the axial and pseudoscalar contributions with masses $m_A$ and $M_P$ respectively, read as in Eqs. (7.1) and (7.2). The hadronic matrix element $\langle 0|J^{\rm NP}_{\mu5}|P(q)\rangle$ in Eq. (7.1) may be easily obtained using the octet-singlet basis or the flavor basis, in analogy to the SM axial current. From the pseudoscalar decay constants, defined in terms of the $U(3)$ axial currents$^{18}$, where $a = 8,1$ and $P = \eta,\eta'$, we find the corresponding decomposition. Inserting this back into Eq. (7.1) and using the spinor projector for the singlet state from Ref. [76], we obtain the corresponding amplitude, which produces an additional contribution to Eq. (2.2). As an example, the $Z$-boson contribution is obtained after taking $c^Z_u = -c^Z_{d,s,e,\mu} = 1$. Alternatively, we could have used the flavor basis instead [56]$^{19}$. For the pseudoscalar contribution, the determination of the $\langle 0|\mathcal{P}^{\rm NP}|P(q)\rangle$ hadronic matrix element in Eq. (7.2) is more complicated whenever the singlet component is involved. This is the case for both the $\eta$ and $\eta'$, as they are an admixture of the octet and singlet states.
To illustrate this, let us calculate such matrix elements at leading order (LO) in χPT, which amounts to retaining the leading $P\times\mathcal{P}$ term arising from the interaction between the pseudoscalar field $P$ and the pseudoscalar current $\mathcal{P}$, defined in χPT through the building block $\chi \equiv 2B_0 i\mathcal{P}$ ($B_0$ is the low-energy constant related to the scalar singlet quark condensate $\langle\bar{q}q\rangle_0$ in the chiral limit). Then, in the presence of NP of pseudoscalar type, $\chi \to 2B_0 i\mathcal{P}^{\rm NP}$. For the $\pi^0$, such a term is straightforward. For the $\eta$ and $\eta'$, the term is more involved at LO. After relabeling, introducing $g_8 \equiv (c^P_u + c^P_d - 2c^P_s)/\sqrt{3}$ and $g_1 \equiv \sqrt{2}(c^P_u + c^P_d + c^P_s)/\sqrt{3}$, it can be expressed in terms of $M^2_8$, $M^2_1$ and $M^2_{81}$, which are defined in Refs. [77,78]. Finally, using the $\eta$-$\eta'$ masses, mixing and decay constants at LO$^{20}$, we obtain the matrix element, where $g_a$ has been defined above and $M^2_0 = 6\tau/F^2_0$ is the topological mass term [78]. After some algebra, we have obtained a relation which is very similar to the $\pi^0$ result (except for the singlet $a = 1$ term) and which resembles that of the axial-current matrix element. The previous calculation applies only at LO. However, such precision is not enough to reproduce the observed $\eta$-$\eta'$ mixing, which requires higher-order calculations. In order to provide a general result, valid at any order and related to known parameters, it is convenient to recall the Ward identity [78], where $\mathcal{M} = {\rm diag}(\hat{m},\hat{m},m_s)$ is the quark mass matrix. In this way, the pseudoscalar current may be expressed in terms of the axial current and the winding-number density $\omega$. Then, using the same algebra as before, the matrix element can be expressed in terms of $\Delta \equiv \sqrt{3/2}\,\langle 0|\omega|P\rangle/(m_P^2 F^1_P)$. Still, $\Delta$ needs to be determined. Neglecting the $u$ and $d$ quark masses, $\omega$ may be expressed as in Ref. [79]. Plugging this relation into Eq. (7.4), we obtain $\Delta = 1 + F^8_P/(\sqrt{2}F^1_P)$, so the pseudoscalar contribution to $P\to\bar\ell\ell$ can finally be expressed accordingly.
This induces an additional contribution to the $A(q^2)$ loop amplitude in Eq. (2.2). We note that the approximation taken for calculating the $\langle 0|\omega|P\rangle$ matrix element has been used with great success in $J/\Psi\to\gamma\eta(\eta')$ decays [79]. Actually, at LO in χPT$^{21}$, the difference between Eq. (7.3) and Eq. (7.4) is of 8% (1%) for the $\eta$ ($\eta'$), precise enough for our study. The nice feature of this approach is that it provides an appropriate description of the singlet component, valid at any order in the chiral expansion and based on known parameters, as long as the approximation employed holds to the required precision.
In the flavor basis, neglecting the $u$ and $d$ quark masses, only the strange part contributes, and an analogous procedure applies. All in all, both contributions may be summarized to yield an additional term modifying Eq. (2.2), where $G_F$ is the Fermi coupling constant and $F_\pi \simeq 92$ MeV is the pion decay constant. The $\lambda$-terms depend on the pseudoscalar-meson structure, which for the $\eta$ and $\eta'$ involves the mixing parameters; in the flavor-mixing scheme, they are given as in Eqs. (7.7) and (7.8)$^{22}$. Taking the mixing parameters from Ref. [55] to numerically evaluate Eqs. (7.7) and (7.8), Eq. (7.6) can be evaluated. To discuss the sensitivity of each particular channel to NP, it is convenient to cast a very approximate result for $A(m_P^2)$, namely Eq. (7.9),

$^{21}$ At this order, the $\eta$ and $\eta'$ masses are [78] $M^2_\eta = 0.244\ {\rm GeV}^2$, $M^2_{\eta'} = 0.917\ {\rm GeV}^2$ and $M^2_0 = 0.673\ {\rm GeV}^2$.

$^{22}$ By definition, $F^8_{\pi^0} = F^1_{\pi^0} \equiv 0$ and $F^3_{\pi^0} \equiv F_\pi$. From Ref. [56], $F^q_{\eta(\eta')} = 0.84\,(0.72)\,F_\pi$, $F^s_{\eta(\eta')} = -0.90\,(1.14)\,F_\pi$ and $F^3_{\eta(\eta')} \equiv 0$. See [56] for the errors on these parameters.
where $\Lambda$ is an effective hadronic scale characterizing the TFF and $\delta_{\rm NP}$ is the NP contribution in Eq. (7.6). From Eq. (7.9), we see that, as the lepton mass gets lighter, the amplitude becomes dominated by the $\ln(m_\ell/m_P)$ terms, which become large and make the NP contribution harder to see. Indeed, for $\ell = e$, the relative NP contribution to the BR is approximately given by $2\delta_{\rm NP}(\ln^2(m_e/m_P) + \pi^2)^{-1}$. If we aim to find contributions from NP, it is therefore much easier to look at the $\ell = \mu$ channel, as the NP part is insensitive to $m_\ell$ (see Eq. (7.6)).
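As a rough numerical illustration of the suppression quoted above for $\ell = e$, the leading-log prefactor can be evaluated directly; the sketch below uses approximate meson masses (our inputs, in GeV) and is only the leading-log estimate, not the full amplitude:

```python
import math

def e_channel_suppression(m_p, m_e=0.000511):
    """Prefactor multiplying delta_NP in the relative BR shift for l = e:
    2 / (ln^2(m_e/m_P) + pi^2), the leading-log estimate quoted in the text."""
    return 2.0 / (math.log(m_e / m_p) ** 2 + math.pi ** 2)

# Approximate masses in GeV (illustrative inputs)
for name, m in (("pi0", 0.134977), ("eta", 0.547862), ("eta'", 0.95778)):
    print(name, e_channel_suppression(m))
```

The prefactor comes out at the few-percent level for all three pseudoscalars, illustrating why the $\ell = e$ channels have comparably weak NP sensitivity.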
With respect to $m_P$, the logarithmic scaling implies that, within the SM, there is no big difference between choosing the $\pi^0$, $\eta$, or $\eta'$, as their masses are of the same order. Furthermore, the axial NP contribution is insensitive to $m_P$, meaning that it is equally likely to appear in any case. This contrasts with the pseudoscalar NP contribution, which strongly depends on $m_P$ (cf. Eq. (7.8)) and grows as $m_P$ and $M_P$ (the mass of the new pseudoscalar particle) approach each other. Still, this is a priori irrelevant unless there is a well-motivated NP scale close to either the $\pi^0$, $\eta$, or $\eta'$ masses.
From this discussion, we conclude that the $\eta(\eta')\to\mu^+\mu^-$ decays are the best candidates to look for NP effects (the $\pi^0$ cannot decay into muons). To illustrate the statements above, we give the approximate NP contribution to the branching ratio for each particular process. We see that, as stated above, the $\ell = e$ channel has the same sensitivity for every pseudoscalar. For $\ell = \mu$, we find that the $\eta$ ($\eta'$) is two (three) times more sensitive than the $\ell = e$ channel. Together with the experimental precision reached for the $\pi^0$ ($\eta$) decay (we do not consider the central value, but the precision obtained), these numbers imply bounds on the $c^A$ parameters of order $c^A \sim 7\,(8)$. As an example, for the $Z$ boson, the $c^A_f$ combination is $-2\,(-1.27)\,(1.61)$ for the $\pi^0$ ($\eta$) ($\eta'$). Interestingly enough, a typical $Z$-like contribution has the opposite sign for $\pi^0\to e^+e^-$ with respect to $\eta\to\mu^+\mu^-$, contrary to the experimental implications. This would suggest either different couplings (necessarily $SU(2)$ breaking) or lepton-flavor-violating (LFV) models, which would couple differently to distinct generations of quarks, leptons, or both. Moreover, in order to avoid $(g-2)_\mu$ problems, we would need either some balance from an additional vector-like contribution$^{23}$ or, again, LFV models in which the coupling to the muon is suppressed.
For a pseudoscalar contribution, as in Ref. [5], the effective couplings may become even larger as the new particle's mass approaches the $\pi^0$, $\eta$, $\eta'$ masses, meaning that it could be visible for one of the pseudoscalars alone. Finally, we comment on the existing correlations given the pseudoscalar structure. We see, for instance, that $\pi^0\to e^+e^-$ and $\eta\to\mu^+\mu^-$ are, in general, anti-correlated, unless there is a pseudoscalar particle with mass $M_P$ such that $m_{\pi^0} < M_P < m_\eta$ (or a different structure for distinct generations). Again, $(g-2)_\mu$ would provide an important constraint for the pseudoscalar case as well.
To conclude, there is still room to look for NP contributions, especially in the $\ell = \mu$ channel, and a variety of phenomenologies is possible depending on the kind of interaction chosen. Still, our study suggests going beyond simple scenarios. This is nevertheless the standard in high-energy physics nowadays, and scenarios of this kind have been and are still being studied. In this discussion, we have omitted a detailed account of the available physical constraints on these scenarios, which constitutes a field of study by itself [5,80-83].

8 Conclusions
In this work, we discussed a novel approach for evaluating the Standard Model prediction for the $\eta(\eta')\to\bar\ell\ell$ decays, which are mainly driven by the non-perturbative regime of QCD. This was made possible using the machinery of Canterbury approximants (CAs), an extension of Padé approximants to the double-virtual case. This approach is data-driven, systematic, and allows for the correct implementation of the low- and high-energy QCD requirements, which are key points in the calculation. In addition, our method implements a systematic error, and the results come from a full numerical evaluation of the loop integral; neither of these was included in most previous approaches. From our experience in [11], we expect higher elements in the approximant sequence to yield a value very similar to the OPE choice, $a_{P;1,1} = 2b_P^2$, which also encodes the appropriate high-energy behavior (that is, the correct $Q^{-2}$ dependence, though not its exact coefficient). We quote these as our final numbers, incorporating the difference with respect to the factorization choice, $a_{P;1,1} = b_P^2$, as an asymmetric systematic error in Table 7. Given the current experimental errors, the achieved precision is sufficient and does not require the computation of higher elements in the CA sequence, which would pose a major complication with respect to the $\pi^0$ case. We postpone such a calculation for the near future, in case new experiments prove to require higher precision.
In this work, we have found a remarkable feature of our approach: by abandoning the old conception of rational approaches, where the poles are associated with physical resonances and fixed in advance, in favor of the Padé-theory point of view, where poles are effective parameters accounting for the analytic structure of the underlying function, it is possible to effectively reproduce non-trivial effects encoded in the TFF, such as branch cuts. This is the reason why we can perform a calculation for the $\eta(\eta')\to\bar\ell\ell$ decays, beyond the possibilities of resonance approaches.
We have found that, similar to the $\pi^0\to e^+e^-$ case, there is an anomaly in the $\eta\to\mu^+\mu^-$ decay, which would make a new measurement very interesting.
In addition, we have discussed the difficulties in achieving a precise description of these processes within χPT, even though it would a priori be an adequate tool for this purpose; in this respect, we have parametrized the expected higher-order corrections for further studies, which could be useful for LFV studies in $K_L\to\bar\ell\ell'$ decays. As an outcome, we have parametrized the impact of the $\pi^0$-exchange contribution to the 2S hyperfine splitting in muonic hydrogen based on our data-driven results.
Finally, we have discussed the implications of New Physics scenarios, which should effectively be of either axial or pseudoscalar nature. Our results suggest that such contributions should arise either from $SU(2)$ isospin-breaking couplings or from lepton-flavor-violating scenarios.

A Generalities of Canterbury approximants
In this appendix, we define and illustrate the performance of Canterbury approximants, specializing to the case of symmetric functions $f(x,y) = f(y,x)$. Given the formal expansion of the original function, the Canterbury approximants $C^N_M(x,y)$ are ratios of bivariate polynomials of degrees $N$ and $M$, $R_N(x,y)$ and $Q_M(x,y)$ $^{24}$, respectively, where the numerator and denominator coefficients ($i \geq j$), $a_{i,j}$ and $b_{i,j}$, are determined by the accuracy-through-order conditions ($b_{0,0} = 1$ as part of the definition).

$^{24}$ Note that the polynomials are constructed so as to have the maximum power in each variable, rather than a total maximum power $x^i y^j$ with $i + j \leq N\,(M)$ [46]. In other words, $R_N(x,y)$ contains a term $x^N y^N$ and $(N+1)(N+2)/2$ terms in total.
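For the lowest element, $C^0_1(x,y) = a_{0,0}/(1 + b_{1,0}(x+y) + b_{1,1}xy)$, the accuracy-through-order conditions can be solved in closed form. The sketch below (function names are ours, for illustration) matches a symmetric Taylor series through the $xy$ term and checks the construction on a factorized toy function:

```python
def c01_coeffs(a00, a10, a11):
    """Denominator coefficients of C^0_1(x, y) = a00 / (1 + b10*(x+y) + b11*x*y),
    fixed by the accuracy-through-order conditions against the Taylor series
    f(x, y) = a00 + a10*(x + y) + a11*x*y + ... of a symmetric function."""
    b10 = -a10 / a00
    b11 = 2.0 * (a10 / a00) ** 2 - a11 / a00
    return b10, b11

def c01_eval(a00, b10, b11, x, y):
    return a00 / (1.0 + b10 * (x + y) + b11 * x * y)

# Toy check: the factorized function f(x, y) = 1/((1+x)(1+y)) has
# a00 = 1, a10 = -1, a11 = 1; its C^0_1 reproduces it exactly, since
# 1 + x + y + x*y = (1+x)*(1+y).
b10, b11 = c01_coeffs(1.0, -1.0, 1.0)
print(b10, b11)  # -> 1.0 1.0
```

In the TFF case, $a_{1,0} = -b_P/m_P^2$ and $a_{1,1} = a_{P;1,1}/m_P^4$ (with $a_{0,0}$ the normalization), so the OPE choice $a_{P;1,1} = 2b_P^2$ makes $b_{1,1}$ vanish, which is precisely what removes the $Q_1^2Q_2^2$ denominator term discussed below.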

A.1 Performance of Canterbury approximants with the help of toy models
To illustrate the performance of Canterbury approximants, we take as examples two well-motivated models: the Regge model from Ref. [68] and a logarithmic model which naturally arises for flat distribution amplitudes [84]. These models have been employed in Ref. [68] and Refs. [51,84] to study the space-like TFF data, and are selected here given their sophisticated analytic structure. The first one is expressed in terms of the polygamma function $\psi_n(z) = \partial_z^{n+1}\ln\Gamma(z)$, where $M$ and $a$ are the parameters of the Regge trajectory $M_n^2 = M^2 + na$; for our case of study, we take $M = 0.8$ GeV and $a = 1.3\ {\rm GeV}^2$ [85]. The lowest $C^0_1(Q_1^2,Q_2^2)$ approximant is given, for this model, in terms of $\psi^{(n)} \equiv \psi^{(n)}(M^2/a)$. The second, logarithmic model depends on a single scale, for which we take $M^2 = 0.6\ {\rm GeV}^2$ in analogy to the single-virtual case [51,84]; its $C^0_1(Q_1^2,Q_2^2)$ approximant follows accordingly. To show the convergence, we use the $C^N_{N+1}$ sequence. The reason for this particular choice is not mathematical but physical, namely, that the TFF should vanish as $Q^2_{1,2}\to\infty$. The results are shown in terms of the relative error $C^N_{N+1}(Q_1^2,Q_2^2)/F^{\rm Regge,log}_{P\gamma^*\gamma^*}(Q_1^2,Q_2^2) - 1$ in Figs. 6 and 7 for the Regge and logarithmic models, respectively. The observed convergence pattern is excellent and lies within the expectations from PA results.
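As a numerical cross-check of the logarithmic model and its lowest approximant, the sketch below matches $C^0_1$ to the model's Taylor expansion, $F/F_{P\gamma\gamma} = 1 - (Q_1^2+Q_2^2)/(2M^2) + (Q_1^4+Q_1^2Q_2^2+Q_2^4)/(3M^4) - \ldots$; the denominator coefficients $1/(2M^2)$ and $1/(6M^4)$ are derived here from that matching (they are our reconstruction, not copied from the paper's formulas):

```python
import math

M2 = 0.6  # GeV^2, the scale used for the logarithmic model in the text

def f_log(q1sq, q2sq):
    """Logarithmic (flat-DA) model TFF, normalized to 1 at the origin."""
    if abs(q1sq - q2sq) < 1e-12:
        return M2 / (M2 + q1sq)
    return M2 / (q1sq - q2sq) * math.log((M2 + q1sq) / (M2 + q2sq))

def c01_log(q1sq, q2sq):
    """C^0_1 approximant obtained by matching the model's Taylor series
    through the Q1^2*Q2^2 term."""
    return 1.0 / (1.0 + (q1sq + q2sq) / (2.0 * M2)
                  + q1sq * q2sq / (6.0 * M2 ** 2))

# Agreement is good at low virtualities and degrades as Q^2 grows.
for q in (0.05, 0.2, 1.0):
    print(q, f_log(q, q), c01_log(q, q))
```

At $Q^2 = 0.05\ {\rm GeV}^2$ the two agree at the permille level, while at $Q^2 = 1\ {\rm GeV}^2$ the mismatch is sizable, consistent with the low-energy weighting emphasized in the text.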
We warn, however, that fixing the poles in advance to lie at the physical resonances (which holds strictly in the large-$N_c$ limit alone) results in a slower convergence pattern, in analogy to PAs [44]. Consequently, we do not advise this practice when only a small set of derivatives is known.
Finally, similar to Padé approximants, the high-energy behavior may be constrained as well. Both of our models fulfill the OPE condition, meaning that, for $Q_1^2 = Q_2^2 \equiv Q^2 \to \infty$, the TFF falls as $Q^{-2}$. As an example, for our previous $C^0_1(Q_1^2,Q_2^2)$ case, this amounts to removing the $Q_1^2 Q_2^2$ term in the denominator. The performance is then greatly improved for $Q_1^2 = Q_2^2$ up to very large values, at least for those approximants beyond $C^0_1(Q_1^2,Q_2^2)$, as illustrated in Fig. 8.
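The effect of removing the $Q_1^2Q_2^2$ denominator term can be made concrete at equal virtualities: the constrained version falls as $Q^{-2}$ (so $Q^2$ times the approximant approaches a constant), while a generic $b_{1,1} \neq 0$ gives a $Q^{-4}$ fall-off. A sketch with assumed illustrative numbers (slope $1/(2M^2)$, $M^2 = 0.6\ {\rm GeV}^2$):

```python
M2 = 0.6  # GeV^2, illustrative scale (the logarithmic-model value)

def c01_diag(qsq, b11):
    """C^0_1 at equal virtualities Q1^2 = Q2^2 = qsq, with an assumed
    slope b10 = 1/(2*M2); b11 multiplies the Q1^2*Q2^2 denominator term."""
    b10 = 1.0 / (2.0 * M2)
    return 1.0 / (1.0 + 2.0 * b10 * qsq + b11 * qsq * qsq)

for qsq in (10.0, 100.0, 1000.0):
    ope = qsq * c01_diag(qsq, 0.0)                    # OPE-like: term removed, Q^-2 fall-off
    gen = qsq * c01_diag(qsq, 1.0 / (6.0 * M2 * M2))  # generic b11: Q^-4 fall-off
    print(qsq, ope, gen)  # ope approaches M2; gen goes to zero
```

This is only the mechanism; in the full analysis the OPE choice corresponds to $a_{P;1,1} = 2b_P^2$.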
In our case of study, we could only reconstruct the $C^0_1(Q_1^2,Q_2^2)$ approximant, given the scarce information on the double-virtual regime. In particular, we suggested in Ref. [11] that, based on low- and high-energy constraints, the only double-virtual parameter involved in this approximant, $a_{P;1,1}$, should lie within the $(b_P^2, 2b_P^2)$ range, which then provides a conservative estimate for the BR($P\to\bar\ell\ell$) calculation.

Figure 6. Convergence of the $C^N_{N+1}(Q_1^2,Q_2^2)$ sequence to the Regge model for the lowest elements. The first, second, third, and fourth contours, from light to dark red, stand for relative $-1$, $-5$, $-10$ and $-20\%$ deviations. Both axes have been scaled as $Q^2/(1+Q^2)$.

Figure 7. Analogous results to those in Fig. 6, but for the logarithmic model.

In order to illustrate this statement, we display in Table 8 the range implied by these limiting cases (Fact and OPE columns) for our two models, together with the exact model results. When calculating the model results, it is convenient to use alternative expressions equivalent to Eq. (A.5) and Eq. (A.7). In particular, for the Regge model we use a representation which expresses the result as a summation of individual terms, each of which can be calculated in analogy to the factorization case; we find good convergence when summing up to $10^4$ terms, after comparison with the $10^5$-term summation. For the logarithmic model,

Figure 8. Convergence of the $C^N_{N+1}(Q_1^2,Q_2^2)$ sequence to the Regge model for the first elements, as in Fig. 6, but now with the appropriate high-energy behavior imposed. The first, second, third, and fourth outer (inner) contours, from light to dark red, stand for relative $\mp1$, $\mp5$, $\mp10$ and $\mp20\%$ deviations. Both axes have been scaled as $Q^2/(1+Q^2)$.

                                  Regge                         Log
BR($P\to\bar\ell\ell$)            Fact    OPE     OPEc    Exact    Fact    OPE    Exact
$\pi^0\to e^+e^-\times10^8$       6.218   6.080   6.266   6.

Table 8. Our Regge and logarithmic model results for BR($P\to\bar\ell\ell$) (Exact columns), together with their $C^0_1(Q_1^2,Q_2^2)$ approximant results. The Fact (OPE) column stands for our choice $a_{P;1,1} = b_P^2$ ($2b_P^2$) from the main text, whereas OPEc is a choice in which the OPE coefficient is imposed in the $C^0_1(Q_1^2,Q_2^2)$ reconstruction. See details in the text.
we employ the integral representation
$$F^{\log}_{P\gamma^*\gamma^*}(Q_1^2,Q_2^2) = F_{P\gamma\gamma}\, M^2 \int_0^1 \frac{dx}{x Q_1^2 + (1-x) Q_2^2 + M^2}. \quad {\rm (A.10)}$$
The loop integration can be performed using the same technique as in our OPE case; the resulting value is then integrated over $x$. From the results in Table 8, the reader can verify that, for all cases except the $\eta'$ ones, the model value lies within the suggested $a_{P;1,1}$ band. The differences in the $\eta'$ case are due to the thoroughly discussed appearance of hadronic-induced imaginary parts and are within the $\sim 20\%$ error that was estimated in Table 3. Summarizing, our error estimations applied to our final results in Table 5 work excellently for the models investigated here.
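The integral representation (A.10) is straightforward to evaluate numerically; the sketch below compares a simple midpoint-rule quadrature of the $x$-integral against its closed form (normalization set to one, $M^2 = 0.6\ {\rm GeV}^2$ as in the text):

```python
import math

M2 = 0.6  # GeV^2, logarithmic-model scale (F_Pgg set to 1)

def tff_integral(q1sq, q2sq, n=10_000):
    """Normalized log-model TFF from the integral representation (A.10),
    evaluated with a simple midpoint rule over x."""
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        total += 1.0 / (x * q1sq + (1.0 - x) * q2sq + M2)
    return M2 * total / n

def tff_closed(q1sq, q2sq):
    """Closed form of the same x-integral."""
    if abs(q1sq - q2sq) < 1e-12:
        return M2 / (M2 + q1sq)
    return M2 / (q1sq - q2sq) * math.log((M2 + q1sq) / (M2 + q2sq))

print(tff_integral(1.0, 0.5), tff_closed(1.0, 0.5))
```

In the full calculation the loop integration is carried out first at fixed $x$ and the $x$-integration is done last, as described in the text; the sketch only validates the $x$-integral itself.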
In addition, the reader may have realized that our $C^0_1(Q_1^2,Q_2^2)$ description can reproduce the BL and OPE $Q^2$-behaviors, but not their associated coefficients at the same time. The latter would be possible with a higher approximant [11], which is however difficult to reconstruct given our ignorance of the double-virtual parameters. Still, given the weight of the low energies in this process, we argued that this is of no relevance, since the low-energy parameters play the main role. Our particular models chosen here, Eq. (A.5) and Eq. (A.7), do not fulfill the BL $Q^2$-behavior (at large $Q^2$ they behave as $Q^{-2}\ln Q^2$), but they do fulfill the OPE one. As a result, we can illustrate what would have been obtained if our $C^0_1(Q_1^2,Q_2^2)$ approximant fulfilling the OPE $Q^2$-behavior had been constrained to reproduce the OPE coefficient as well; that is, we trade the slope parameter, $b_P/m_P^2$, for the OPE coefficient. Unfortunately, for the logarithmic model this exercise is trivial, since $F_{P\gamma^*\gamma^*}(Q^2,Q^2) = M^2(M^2+Q^2)^{-1}$ and a single parameter accounts for both the low- and high-energy behaviors. This is not the case for the Regge model, whose result when constraining the OPE coefficient is given in the "OPEc" column of Table 8. From this exercise we conclude that, given the weight of the low energies, the most efficient strategy is to rely on the low-energy expansion and the correct high-energy behavior, without imposing the high-energy coefficients; the OPEc choice always represents the worst scenario. This strategy is also supported from a mathematical point of view: whereas the Taylor expansion is a convergent series with a finite radius of convergence, the high-energy expansion is only an asymptotic one, for which the convergence of PAs is much slower [44,45]. Let us finally remark that none of the single entries collected in Table 8 guides toward the exact result.
It is only the range given by the two bounds (Fact÷OPE) that tells where the result lies.