1 Introduction

The pseudoscalar transition form factors (TFFs) describe the effect of the strong interaction on the \(\gamma ^*\gamma ^*P\) vertex, where \(P=\pi ^0, \eta , \eta ^\prime , \eta _c\ldots \), and is represented by \(F_{P\gamma ^*\gamma ^*} (q_1^2,q_2^2)\), a function of the photon virtualities \(q_1^2\), and \(q_2^2\). From the experimental point of view, one can study such TFFs from both space- and time-like energy regions. The time-like region of the TFF can be accessed at meson facilities either through the double Dalitz decay processes \(P \rightarrow l^+l^- l^+l^-\), which give access to both photon virtualities \((q_1^2,q_2^2)\) in the range \(4m_l^2<(q_1^2,q_2^2)<(m_P-2m_l)^2\), or the single Dalitz decay processes \(P \rightarrow l^+l^- \gamma \), which contains a single virtual photon with transferred momentum in the range \(4m_l^2<q_1^2<m_P^2\), thus simplifying the TFF to \(F_{P\gamma ^*\gamma ^*}(q_1^2,0)\equiv F_{P\gamma ^*\gamma }(q^2)\). To complete the time-like region, \(e^+e^-\) colliders access the values \(q^2>m_P^2\) through the \(e^+e^- \rightarrow P\gamma \) annihilation processes. The space-like region of the TFFs are accessed in \(e^+e^-\) colliders by the two-photon-fusion reaction \(e^+e^-\rightarrow e^+e^-P\), where at the moment the measurement of both virtualities is still an experimental challenge. The common practice is then to extract the TFF when one of the outgoing leptons is tagged and the other is not, that is, the single-tag method. The tagged lepton emits a highly off-shell photon with transferred momentum \(q_1^2\equiv -Q^2\) and is detected, while the other, untagged, is scattered at a small angle with \(q_2^2\simeq 0\). The form factor extracted from the single-tag experiment is then \(F_{P\gamma ^*\gamma ^*}(-Q^2,0)\) \(\equiv F_{P\gamma ^*\gamma }(Q^2)\).

At low-momentum transfer, the TFF can be described by the expansion

$$\begin{aligned} F_{P\gamma ^*\gamma }(Q^2)=F_{P\gamma \gamma }(0) \left( 1-b_P\frac{Q^2}{m_P^2}+c_{P}\frac{Q^4}{m_P^4}-d_{P}\frac{Q^6}{m_P^6}+\cdots \right) \ , \end{aligned}$$
(1)

where \(F_{P\gamma \gamma }(0)\) is the normalization, the low-energy parameters (LEPs) \(b_P\), \(c_P\), and \(d_P\) are the slope, the curvature, and the third derivative of the TFF, respectively, and \(m_P\) is the pseudoscalar meson mass. \(F_{P\gamma \gamma }(0)\) can be obtained either from the measured two-photon partial width of the meson P,

$$\begin{aligned} |F_{P\gamma \gamma }(0)|^2= \frac{64 \pi }{(4\pi \alpha )^2}\frac{\Gamma (P\rightarrow \gamma \gamma )}{m_P^3}\ , \end{aligned}$$
(2)

or, in the case of \(\pi ^0\), \(\eta \), and \(\eta ^\prime \), from the prediction of the axial anomaly in the chiral limit of QCD.

In this work we shall focus on the \(\eta \) TFF exclusively. Its slope parameter has been extensively discussed from both theoretical analyses [1, 35, 7] and experimental measurements [814]. On the theory side, chiral perturbation theory (ChPT) predicts \(b_{\eta }=0.51\) at \(\mu ^2=0.69\) GeV\(^2\) and for \(\sin \theta _P=-1/3\) [1], being \(\mu \) the renormalization scale and \(\theta _P\) the \(\eta \)\(\eta ^\prime \) mixing angle. Other theoretical predictions are [1]: \(b_{\eta }=0.53\) from vector meson dominance (VMD), \(b_{\eta }=0.51\) from constituent-quark loops (QL), and \(b_{\eta }=0.36\) from the Brodsky–Lepage (BL) interpolation formula [2]. Recently, the slope has been predicted to be \(b_{\eta }=0.546(9)\) and \(b_{\eta }=0.521(2)\) from a chiral theory with one and two octets of vector resonances [3], respectively, \(b_{\eta }=0.60(6)_\mathrm{stat}(3)_\mathrm{sys}\) from rational approximants [4], \(b_{\eta }=0.62^{+0.07}_{-0.03}\) [5] and \(b_{\eta }=0.57^{+0.06}_{-0.03}\) [6] from dispersive analyses, and \(b_{\eta }=0.51\) or 0.54, depending on the data set used as input, from anomaly sum rules [7]. With respect to the experimental determinations, the values for the slope are usually obtained after a fit to data using a normalized, single-pole term with an associated mass \(\Lambda _P\), i.e.

$$\begin{aligned} F_{P\gamma ^*\gamma }(Q^2)=\frac{F_{P\gamma \gamma }(0)}{1+Q^2/\Lambda _P^2}\ . \end{aligned}$$
(3)

The results are \(b_{\eta }=0.428(89)\) from CELLO [9] and \(b_{\eta }=0.501(38)\) from CLEO [10], both from space-like data, and \(b_{\eta }=0.57(12)\) from lepton-G [8], \(b_{\eta }=0.585(51)\) from NA60 [11] \(b_{\eta }=0.58(11)\) from A2 [12], and \(b_{\eta }=0.68(26)\) from WASA [13], all of them from time-like data. More recently, the A2 Collaboration reported \(b_{\eta }=0.59(5)\) [14], the most precise experimental determination up to date. The curvature was for the first time reported in Ref. [4] with the value \(c_{\eta }=0.37(10)_\mathrm{stat}(7)_\mathrm{sys}\). Nothing is yet reported about the third derivative of the TFF, \(d_{\eta }\), although its role on hadronic quantities where TFFs are important suggests also to look at it (see Ref. [4] for its role on the \(\eta \) contribution to the hadronic light-by-light scattering piece to the muon \((g-2)\)).

Several attempts to describe the \(\eta \) TFF are available in the literature at present [3, 5, 7, 1529] but none of them tries for a unique description of both space- and time-like experimental data, specially at low energies. In Ref. [30], it was suggested for the \(\pi ^0\) case that a model-independent approach to the space-like TFF can be achieved using a sequence of rational functions, the Padé approximants (PAs), to fit the data. Later on, in Ref. [4], the same method was applied to the \(\eta \) and \(\eta ^\prime \) TFFs. More recently, the A2 Collaboration reported a new measurement of the \(\eta \rightarrow e^+e^-\gamma \) Dalitz decay process with the best statistical accuracy up to date [14]. A comparison with different theoretical approaches was also performed. In particular, the results from Ref. [4], based on space-like data, were extrapolated to the time-like region and agreed perfectly with their measurement. Triggered by these new A2 results, we explore in the present work a combined description of both space- and time-like regions of the \(\eta \) TFF within our method of rational approximants. This will provide, for the first time, a determination of the energy dependence of the \(\eta \) TFF in both regions together with a unified extraction of its LEPs.

Our approach makes use of PAs as fitting functions to all the experimental data. PAs are rational functions \(P^N_M(Q^2)\) (ratio of a polynomial \(T_N(Q^2)\) of order N and a polynomial \(R_M(Q^2)\) of order M) constructed in such a way that they have the same Taylor expansion as the function to be approximated up to order \(\mathcal{O}(Q^2)^{N+M+1}\) [31]. Since PAs are built in our case from the unknown low-energy parameters (LEPs) of the TFF, once the fit to the experimental data is done, the reexpansion of the PAs yields the desired coefficients. Being rational functions the PAs are analytic everywhere except where the poles are located. Branch cuts cannot in principle be described by PAs, however, if the function to be approximated is of a certain kind, for instance a Stieltjes function, it can be proven mathematically that an infinite-order PA is able to reproduce the cut [31].Footnote 1 Another interesting issue is the implementation of chiral logarithms of the kind \(\log (Q^2/M^2)\), appearing for instance in chiral expansions at next-to-leading order, in the PAs method. These chiral logs admit a Taylor expansion which can be seen as an infinite-order diagonal PA and is convergent for any value of \(Q^2>0\). Therefore, in the case the approximated function includes chiral logs their effects are incorporated in the PAs to a good extent (more precise as the order of the PA increases).Footnote 2 The advantage of PAs over Taylor expansions is their ability to enlarge the domain of convergence. However, to prove the convergence of a given PA sequence is a difficult task and only for certain classes of functions this can be done rigorously. In practice, the success of PAs in the description of experimental data can only be seen a posteriori in the sense that the pattern of convergence can be shown but unfortunately not proven mathematically. We refer the interested reader to Refs. [32, 33] for details on this technique.

In this work, we resume our method [4] for fitting the \(\eta \) TFF experimental data after including all the recent available time-like measurements from \(\eta \rightarrow l^+ l^- \gamma \) decays (\(l=e,\mu \)). Besides recapitulating the main features of the method we will address the following issues:

  • A reevaluation of the systematic errors considered in our previous work is demanded by the inclusion of time-like data at these low energies. This new set of data being more precise than the space-like one, its incorporation will allow for an improved systematic error associated with each element of a given PA sequence and the increase in order of the sequence itself.

  • The better description of the low-energy region of the TFF allows for an improved determination of its value at zero momentum transfer, which is related to the two-photon decay width of the \(\eta \). The impact of the recent measurement of this width by the KLOE collaboration [34] and of older measurements based on Primakoff techniques is commented.

  • The role played by high-energy space-like data in view of the fact that in such region only data is available. Related to this, the existing puzzle between the precise mixing scenario derived from the TFF in contrast to the measured time-like cross section by at \(q^2=112\,\) GeV\(^2\) [35] is discussed. The possibility for the Belle Collaboration to measure the time-like \(\eta \) TFF is also mentioned;

  • The extraction of the \(\eta \)\(\eta ^\prime \) mixing parameters from the TFFs and the two-photon decays after discussing the role of the renormalization scale dependence of the singlet decay constant \(F_0\). The new results are much better constrained with the inclusion of time-like experimental data and turn out to be competitive with standard determinations, such as for instance the analysis of \(V\rightarrow P\gamma \) \((V=\rho , \omega , \phi )\) and \(P\rightarrow V\gamma \) decays [36].

  • The determination of these \(VP\gamma \) coupling constants from the former mixing parameters and its comparison with current experimental values. The effect of OZI-violating parameters and higher-order effects is also discussed.

The paper is organized as follows. In Sect. 2, a reanalysis of the systematic error related to our method when taking into account both space- and time-like experimental data is performed. In Sect. 3, a brief description of the general method for extracting the low-energy parameters of the \(\eta \) TFF using rational approximants is presented and then the impact on them of both the \(\eta \rightarrow \gamma \gamma \) latest measurement and the high-energy space-like data are discussed. In Sect. 4, the implications of our new results for the determination of the \(\eta \)\(\eta ^\prime \) mixing parameters, the understanding of the puzzle, and the prediction of the \(VP\gamma \) couplings are examined. Finally, in Sect. 5 the conclusions of the present analysis are given.

2 A new systematic error

In the context of Padé approximants, by systematic error is meant the difference between the function to be approximated and the highest approximant reached after the fit procedure. If there is seen convergence, the larger the PA order, the smaller the systematic error. Therefore, any finite-order PA should have a definite systematic error. In this section, we discuss how to obtain such an error for a scenario containing both time- and space-like data.

In order to illustrate the utility of the PA as fitting functions, Ref. [30] simulates the real situation of the experimental data on the space-like region by generating with different models a set of pseudodata. Such data were then fitted with a \(P^L_1(Q^2)\) (single-pole approximants) sequence and the LEPs where extracted. This exercise was twofold: first it was meant to show the ability of the PA sequence to extract the LEPs and, second, also provided a systematic error for the extraction of each LEP at each value of L. In Ref. [4], more examples were worked out and further discussed and we refer the interested reader to such references.

Dealing now with a larger set of data, such systematic errors should be reanalyzed, specially because the amount of time-like data, which covers the lowest-energy region—and is most important for LEPs extraction—is larger than the space-like one.

Following the strategy presented in Refs. [4, 30], we simulate with an holographic model (see Appendix B for details on the model together with on the simulation) the situation of the experimental data from both space- and time-like data; see, respectively, [9, 10, 37] and [11, 12, 14]. The results obtained with the holographic model described in Appendix B are collected in Table 1 where the relative errors for the first three derivatives for each element on the \(P^L_1(Q^2)\) sequence are reported. These results are model dependent. Using, instead, the quark model considered in Ref. [30], we find faster convergence and we reach systematic errors one order of magnitude better for the higher PA of the sequence than the holographic model. We chose to use the results with the holographic one to be on the conservative side.

Table 1 Collection of systematic errors (in percentage \(\,\%\)) of the first three derivatives \(b_{\eta },c_{\eta }\) and \(d_{\eta }\) of the \(Q^2 F_{\eta \gamma ^* \gamma }(Q^2)\) for a \(P^L_1(Q^2)\) sequence fit

The strategy is then to generate pseudodata for both regions trying to emulate the real experimental situation. In the space-like region, we evaluated the model at 10 points in the region \(0.6\le Q^2 \le 2.2\) GeV\(^2\), 15 points in the region \(2.7\le Q^2 \le 7.5\) GeV\(^2\), and 9 more points in the region \(9\le Q^2 \le 34\) GeV\(^2\). In the time-like region, the model is evaluated at 8 points in the region \((0.045)^2\le Q^2 \le (0.100)^2\) GeV\(^2\), 15 points in the region \((0.115)^2\le Q^2 \le (0.220)^2\) GeV\(^2\), and 31 more points in the region \((0.230)^2\le Q^2 \le (0.470)^2\) GeV\(^2\). On top of these set of data points we add the value of \(F_{\eta \gamma \gamma }(0,0)\). All these data points have zero error because we want to obtain a pure systematic error on our fitting functions. Notice that the majority of points lie in the low-energy region. This simple exercise also prevents us against over-fitting problems. The very same study can be performed to evaluate the \(P^N_N(Q^2)\) sequence. The results are, however, an order of magnitude better than for the \(P^L_1(Q^2)\) one (see the comparison in Appendix B). From now on, we consider only the systematic errors from the latter to be on the conservative side.

The forthcoming BESIII data on the \(\eta \) TFF at space-like region below 9 GeV\(^2\) might demand a reanalysis of our systematic errors, although we think that including them would not really modify our percentages beyond the precision we are reporting them in Table 1. Their data will be, nevertheless, crucial to reduce our statistical errors which is by now the dominant source.

In passing, we also study what would be the systematic error done by a VMD fit to only the time-like data set. From the three models considered in Refs. [4, 30], the most conservative systematic error found is around \(5\,\%\) (details are presented in Appendix B). Notice that when fitting space-like data with a VMD such an error is around \(40\,\%\). The reason of such a difference is simple because available time-like data is much closer to the origin of energies than the space-like one and less sensible to higher-order effects.

3 \(\varvec{\eta }\) transition form factor: a space- and time-like description

To extract the \(\eta \) TFF low-energy parameters \(b_{\eta }\), \(c_{\eta }\), and \(d_{\eta }\) (slope, curvature, and third derivative respectively) from the available data, we start with a \(P^L_1(Q^2)\) sequence. However, according to Ref. [38], the pseudoscalar TFFs behave as \(1/Q^2\) for \(Q^2\rightarrow \infty \), which means that, for any value of L, one will obtain in principle a good fit only up to a finite value of \(Q^2\) but not for \(Q^2\rightarrow \infty \). Therefore, it would be desirable to incorporate this asymptotic-limit information in the fits to \(Q^2 F_{\eta \gamma ^*\gamma }(Q^2)\) by considering also a \(P^N_{N}(Q^2)\) sequence.

This method, which makes use of experimental data and theoretical framework for fitting them, cannot access the second Riemann sheet where the resonance poles are supposed to be located [39]. One cannot extract resonance poles parameters with such methods, and that poses a word of caution on the interpretation of fits such as Eq. (3) to relate its pole parameters with effective masses. Our method does not contain a branch cut and all the analytical structure is built to reproduce only the first Riemann sheet. The effective pole we obtain should lie outside the range where data are. The main advantage of the method of PAs is indeed to provide the \(Q^2\) dependence of the TFF over the whole space- and time-like region up to the first resonance in an easy and systematic way, without the need of a model for the resonance poles appearing in the amplitude [30, 32]. For how to extract resonance pole parameters using PA, see Refs. [40, 41].

Experimental data from the space-like region is obtained from CELLO, CLEO, and Collaborations [9, 10, 37], together with the time-like experimental data from NA60 and A2 Collaborations [11, 12, 14]. We also include the value \(\Gamma _{\eta \rightarrow \gamma \gamma }=0.516(18)\) keV [42] (which is basically dominated by the recent KLOE-2 measurement [34]) in our fits.

3.1 A remark on experimental systematic errors

When comparing time-like data results from different collaborations it is common to report, together with the experimental data, the result of a fit with a single-pole function Eq. (3). Although such data contain only statistical errors, systematic errors are incorporated in the result of the fit. When using these data in our fits one must incorporate the systematic error information into the fitted data.Footnote 3

The A2 Collaboration reported in 2011 on the Dalitz decay \(\eta \rightarrow e^+e^- \gamma \) [12]. Their fit yielded \(\Lambda ^{-2} = (1.92\pm 0.35_\mathrm{stat} \pm 0.13_\mathrm{syst}) \) GeV\(^{-2}\). Combining both statistical and systematic error one obtains \(\Lambda ^{-2} = (1.92\pm 0.39_\mathrm{comb}) \) GeV\(^{-2}\). In order to obtain the combined error from a direct fit to the published data one can include a new source of error defined in the following way: \(\Delta _\mathrm{final}=\sqrt{ \Delta _\mathrm{stat}^2 + (\epsilon |F(Q_i^2)|^2)^2}\) for each \(Q^2_i\) datum, with \(\epsilon \) a percentage. For the A2 2011 data we find that \(\epsilon = 6.8\,\%\) will allow us to reproduce, with Eq. (3), the combined result \(\Lambda ^{-2} = (1.92\pm 0.39_\mathrm{comb}) \) GeV\(^{-2}\).

At the same time, an analysis of the \(\eta \rightarrow \mu ^+ \mu ^- \gamma \) Dalitz decay by the NA60 Collaboration allowed a determination of \(\Lambda ^{-2}\) with significantly better statistical accuracy. In 2009, they reported the value \(\Lambda ^{-2} = (1.95\pm 0.17_\mathrm{stat} \pm 0.05_\mathrm{syst}) \) GeV\(^{-2}\) [11]Footnote 4, which implies a factor \(\epsilon =1.9\,\%\) for our \(\Delta _\mathrm{final}\).

In 2013, the A2 Collaboration reported a new measurement of the same Dalitz decay \(\eta \rightarrow e^+e^- \gamma \) with larger statistics with a fitted value \(\Lambda ^{-2} = (1.95\pm 0.15_\mathrm{stat} \pm 0.10_\mathrm{syst}) \) GeV\(^{-2}\) [14], which leads to \(\epsilon = 4.8\,\%\).

Published space-like data contains both error sources separately. The exception is the CELLO Collaboration which does not report a systematic error for each bin of data. Only a \(12\,\%\) for the two-photon \(\eta \)-decay channel is reported. Accounting for all the different systematic sources we could find in their publication, we ascribe a \(12\,\%\) of systematic error for the hadronic \(\eta \) decay which leads to a \(6\,\%\) error for the global number of events (implying a \(12\,\%\) of systematic error for each bin). We expect that the forthcoming space-like measurements at BES-III will provide the accurate description of such energy region and the role of the unknown systematic effects in the CELLO data would not be important.

3.2 Results

After defining the set of data we will use, we report on our results. We start fitting with a \(P^L_1(Q^2)\) sequence. We reach \(L=7\) and we show it in Fig. 1 as a green-dashed line. The smaller plot in Fig. 1 is a zoom into the time-like region. The obtained LEPs are collected in Table 2 and shown in Fig. 2 together with our previous results (empty orange) when only space-like data were included in our fits [4]. The stability observed for the LEPs with the \(P^L_1(Q^2)\) sequence is remarkable, and the impact of the inclusion of time-like data is clear since not only allows us to reach higher precision on each PA but also to enlarge our PA sequence by 2 elements. The stability of the result is also clearer and reached earlier, reduces our systematic error, and shows the ability of our method to extract, for the first time, the LEPs from a combined fit to all the available data. The coefficients of the best fitted \(P^L_1(Q^2)\) can be found in Appendix A.

Fig. 1
figure 1

\(\eta \)-TFF best fits. Green-dashed line shows our best \(P^L_{1}(Q^2)\) fit and black line our best \(P^N_N(Q^2)\) fit. Experimental data points in the space-like region are from CELLO (red circles) [9], CLEO (purple triangles) [10], and (orange squares) [37] Collaborations. Experimental data points in the time-like region are from NA60 (blue stars) [11], A2 2011 (dark-green squares) [12], and A2 2013 (empty-green circles) [14]. The inner plot shows a zoom into the time-like region

Table 2 Low-energy parameters for the \(\eta \) TFF obtained from the PA fits to experimental data
Fig. 2
figure 2

Slope (top-left panel), curvature (top-right panel), and third derivative (bottom panel) predictions for the \(\eta \) TFF using the \(P^L_1(Q^2)\) up to \(L=7\) (blue points). Previous results considering only space-like data from Ref. [4] are also shown (empty-orange squares) as a way to stress the role of the time-like data in our fits. Only statistical errors are shown

To reproduce the asymptotic behavior of the TFF, we have also considered the \(P^N_{N}(Q^2)\) sequence (second row in Table 2). The results obtained are in very nice agreement with our previous determinations. The best fit is shown as black-solid line in Fig. 1. We reach \(N=2\). Since these approximants contain the correct high-energy behavior built-in, they can be extrapolated up to infinity (black-dashed line in Fig. 1) and then predict the leading \(1/Q^2\) coefficient:

$$\begin{aligned} \lim _{Q^2\rightarrow \infty }Q^2F_{\eta \gamma ^*\gamma }(Q^2)&=0.177^{+0.020}_{-0.009}\ \text {GeV}\ . \end{aligned}$$
(4)

This prediction, although larger than in our previous work [4], still cannot be satisfactorily compared with the time-like measurement at \(q^2=112\) GeV\(^2\), \(F_{\eta \gamma ^*\gamma }(112 \text { GeV}^2)=0.229(30)(8)\) GeV [35]. The impact of such a discrepancy on \(\eta \)\(\eta ^\prime \) mixing is discussed in the next section.

Our combined weighted average results from Table 2, taking into account both types of PA sequences, give

$$\begin{aligned} {\left\{ \begin{array}{ll} b_{\eta } = 0.576(11)_\mathrm{stat}(4)_\mathrm{sys}\\ c_{\eta } = 0.339(15)_\mathrm{stat}(5)_\mathrm{sys}\\ d_{\eta } = 0.200(14)_\mathrm{stat}(18)_\mathrm{sys} \end{array}\right. } \end{aligned}$$
(5)

where the second error is systematic (around 0.7, 1.5 , and \(9\,\%\) for \(b_P\), \(c_P\), and \(d_P\), respectively, from Table 1).

Equation (5) can be compared with \(b_{\eta } = 0.60(6)_\mathrm{stat}(3)_\mathrm{sys}\), \(c_{\eta } = 0.37(10)_\mathrm{stat}(7)_\mathrm{sys}\) using space-like data exclusively [4]. As expected, not only statistical results have been improved but also systematics, both by an order of magnitude, yielding the most precise slope determination ever.

Our slope is compared with experimental determinations from [814] together with theoretical extraction from [17] in Fig. 3.

Fig. 3
figure 3

Slope determinations for \(\eta \) TFF from different theoretical (red circles) and experimental (blue squares) references discussed in the text. Inner error is the statistical one and larger error is the combination of statistical and systematic errors

One should notice that all the previous collaborations used a VMD model fit to extract the slope. In order to be consistent when comparing with our results, a systematic error of about \(40\,\%\) should be added to the experimental determinations based on space-like data [4, 30], and a systematic error of about \(5\,\%\) should be added to the experimental determinations based on time-like data (see Appendix B for further details).

When comparing different theoretical extractions of the slope of the \(\eta \) TFF with our result in Fig. 3, we find a pretty good agreement with the exception of the results in Ref. [3] that reported \(b_{\eta }=0.546(9)\) and \(b_{\eta }=0.521(2)\) using resonance chiral theory with one- or two-octet ansätze. The disagreement is between 2 and 5 standard deviations. Reference [3] uses resonance chiral theory, which is based on large-\(N_c\) arguments, to extract LEPs. Going from large-\(N_c\) to \(N_c=3\) imposes a systematic error [33, 4446]. Since Ref. [3] considered two approximations for fitting the \(\eta \) TFF (with one and two octets), one could consider the difference between them as a way to estimate such an error [4, 40, 47]. In such a way, the \(\eta \) TFF slope would read \(b_{\eta }=0.53(1)\), at 2.5 standard deviation from our result.

Eventually, we want to comment on the effective single-pole mass determination \(\Lambda _P\) from Eq. (3). Using \(b_P=m_P^2/\Lambda _{P}^2\) and the values in Eq. (5), we obtain \(\Lambda _{\eta }=0.722(7)\) GeV or \(\Lambda _{\eta }^{-2}=1.919(39)\) GeV\(^{-2}\).

The fits shown in Fig. 1 use the experimental value of the two-photon decay width as an experimental datum to be fitted. Such a fit could be repeated without including that decay. In such a way, we reach again a \(P^7_1(Q^2)\) and a \(P^2_2(Q^2)\) as our best PA with the advantage now that the value \(F_{\eta \gamma \gamma }(0)\) is a prediction of our fits. We find \(F_{\eta \gamma \gamma }(0)|_\mathrm{fit}=0.250\,(38)\) GeV\(^{-1}\) for the \(P^7_1(Q^2)\) and \(F_{\eta \gamma \gamma }(0)|_\mathrm{fit}=0.248\,(28)\) GeV\(^{-1}\) for the \(P^2_2(Q^2)\), which translates into \(\Gamma _{\eta \gamma \gamma }|_\mathrm{fit}=0.4\,(13)\) keV and \(\Gamma _{\eta \gamma \gamma }|_\mathrm{fit}=0.42\,(10)\) keV, respectively. Comparing with the experimental value \(\Gamma _{\eta \gamma \gamma }|_\mathrm{exp}=0.516\,(18)\) keV such predictions are at 0.66 and 0.94 standard deviation each.

3.3 The impact of \(\varvec{\eta }\rightarrow \gamma \gamma \) measurements

Our results in Eq. (5) are, by far, the most precise to date. Particularly, we believe that the precision achieved for \(b_{\eta }\) will be hard to improve even if new data becomes available. Nevertheless, the values obtained mildly depend on \(\Gamma _{\eta \gamma \gamma }\). For instance, if we would have used the value measured through the Primakoff mechanism omitted in the PDG average [42] (i.e., \(\Gamma _{\eta \gamma \gamma }^{\mathrm {Primakoff}} = 0.476(62)\) keV [48]), we would find \(b_{\eta }=0.570(13)\) for the SL+TL extraction with a \(20\,\%\) larger error, still in nice agreement with our aforementioned results. Notice that this result is pretty similar to the one obtained by our fits when the decay into two photons is not used in the data set. This fact does not lead to a puzzle, everything seems to agree within uncertainties, but it may suggest to look again for a Primakoff measurement.

3.4 The role played by high-energy space-like data

Low-energy parameters are defined at zero momentum transfer. When extracting them from our fits, one would expect the low-energy data to dominate. We noticed, however, that in order to reach large PA sequences (leading to more precise extractions), the high-energy data is also important as can be seen in Fig. 1. From 5 to 35 GeV\(^2\) data are basically dominated by the measurement [37] and that has a clear implication on the extraction of the asymptotic \(\lim _{Q^2\rightarrow \infty }Q^2F_{\eta \gamma ^*\gamma }(Q^2)\) value as can be seen in Table 3, where the role of data from each collaboration is reported. Indeed, a fit exclusively to data yields similar results for both slope and asymptotic value than when considering the full set of space-like data. However, a fit to the data from the CELLO [9] Collaboration which range only up to 2.23 GeV\(^2\) yields much larger asymptotic value (although statistically compatible). Considering only data from the CLEO [10] Collaboration, which ranges up to 12.74 GeV\(^2\), reduces the asymptotic value by about \(20\,\%\) compared to CELLO. The role of data is then twofold, allowing first to reach \(N=2\) in the \(P^N_N(Q^2)\) sequence and determining basically the asymptotic value.

Table 3 Role of the different sets of experimental data in determining slope and asymptotic values of the \(\eta \) TFF

In view of the puzzle of the \(\pi ^0\) TFF between  [49] and Belle [50] results, a second experimental measurement covering the high-energy region would be very welcome here. We find the Belle Collaboration suited for such purpose and we would like to encourage them to go ahead with such measurements.

On the other side, time-like data can also be used to predict the asymptotic value, even though the range of data is much shorter and much closer to \(Q^2=0\). From the three sets of time-like data used in our fits, A2-11 [12] and A2-13 [14] are based on the \(\eta \rightarrow e^+e^-\gamma \) and covers larger range of phase space. The NA60 [11] Collaboration, based on the \(\eta \rightarrow \mu ^+ \mu ^- \gamma \), covers a shorter range but in the higher-energy region. The asymptotic values extracted from the difference time-like sets of data agree rather well but disagree with the results obtained from the space-like data (although overlapping within errors). Whatever the combination of different data sets selected, data always decides on the asymptotic value. In passing, we notice that any of the configurations considered so far agrees with the results of the \(\eta \) TFF measurement from [35].

4 \(\varvec{\eta }\) transition form factor: applications

As stated in the introduction, TFF are not also interesting by themselves but also for the range of scenarios where they play a crucial role. In this section we consider a few of such applications.

4.1 Reanalysis of the \(\eta \)\(\eta ^\prime \) mixing parameters

In this subsection we briefly summarize the main elements to extract the mixing parameters exclusively from our fits to the form factor data.

As was done in Ref. [4], we analyze \(\eta \)\(\eta ^\prime \) mixing using the quark-flavor basis. In this basis, the \(\eta \) and \(\eta ^\prime \) decay constants are parametrized as

$$\begin{aligned} \left( \begin{array}{cc} F^{q}_{\eta } &{} F^{s}_{\eta }\\ F^{q}_{\eta ^\prime } &{} F^{s}_{\eta ^\prime }\\ \end{array} \right) = \left( \begin{array}{cc} F_{q}\cos \phi _{q} &{} -F_{s}\sin \phi _{s}\\ F_{q}\sin \phi _{q} &{} F_{s}\cos \phi _{s}\\ \end{array} \right) \ , \end{aligned}$$
(6)

where \(F_{q,s}\) are the light-quark and strange pseudoscalar decay constants, respectively, and \(\phi _{q,s}\) the related mixing angles. Several phenomenological analyses find \(\phi _q\simeq \phi _s\), which is also supported by large-\(N_c\) ChPT calculations where the difference between these two angles is seen to be proportional to an OZI-rule violating parameter and hence small [36, 51].

Within this approximation, the asymptotic limits of the TFFs take the form

$$\begin{aligned} \lim _{Q^2\rightarrow \infty }Q^2F_{\eta \gamma ^*\gamma }(Q^2)&=2(\hat{c}_q F^q_{\eta }+\hat{c}_s F^s_{\eta }) \nonumber \\&=2(\hat{c}_q F_q\cos \phi -\hat{c}_s F_s\sin \phi ),\nonumber \\ \lim _{Q^2\rightarrow \infty }Q^2F_{\eta ^\prime \gamma ^*\gamma }(Q^2)&=2(\hat{c}_q F^q_{\eta ^\prime }+\hat{c}_s F^s_{\eta ^\prime })\nonumber \\&=2(\hat{c}_q F_q\sin \phi +\hat{c}_s F_s\cos \phi ), \end{aligned}$$
(7)

and their normalization at zero (from the chiral anomaly and Eq. (2))

$$\begin{aligned} \begin{aligned} F_{\eta \gamma \gamma }(0)&= \frac{1}{4\pi ^2}\left( \frac{\hat{c}_q F^s_{\eta ^\prime }-\hat{c}_s F^q_{\eta ^\prime }}{F^s_{\eta ^\prime }F^q_{\eta }-F^q_{\eta ^\prime }F^s_{\eta }}\right) \\&=\frac{1}{4\pi ^2}\left( \frac{\hat{c}_q}{F_q}\cos \phi -\frac{\hat{c}_s}{F_s}\sin \phi \right) ,\\ F_{\eta ^\prime \gamma \gamma }(0)&= \frac{1}{4\pi ^2}\left( \frac{\hat{c}_q F^s_{\eta }-\hat{c}_s F^q_{\eta }}{F^q_{\eta }F^s_{\eta ^\prime }-F^s_{\eta }F^q_{\eta ^\prime }}\right) \\&=\frac{1}{4\pi ^2}\left( \frac{\hat{c}_q}{F_q}\sin \phi +\frac{\hat{c}_s}{F_s}\cos \phi \right) , \end{aligned} \end{aligned}$$
(8)

with \(\hat{c}_q=5/3\) and \(\hat{c}_s=\sqrt{2}/3\).

Experimental information provides \(|F_{\eta \gamma \gamma }(0)|_\mathrm{exp}=0.274 (5)\) GeV\(^{-1}\) and \(|F_{\eta ^\prime \gamma \gamma }(0)|_\mathrm{exp}=0.344\,(6)\) GeV\(^{-1}\) and for the asymptotic value of the \(\eta \) TFF we take the value shown in Eq. (4) with symmetrical errors, \(\lim _{Q^2\rightarrow \infty }Q^2F_{\eta \gamma ^*\gamma }(Q^2)=0.177\,(15)\) GeV. With these values, the mixing parameters are predicted to be

$$\begin{aligned} F_q/F_\pi =1.07\,(1),\quad F_s/F_\pi =1.39\,(14), \quad \phi =39.3\,(1.2)^{\circ }, \end{aligned}$$
(9)

with \(F_\pi =92.21\,(14)\) MeV [42]. The uncertainties are dominated by the error from the asymptotic value prediction.

One can translate the mixing parameters obtained in the flavor bases into the octet–singlet one by the following recipe [52]:

$$\begin{aligned}&F_8^2=\frac{F_q^2+2 F_s^2}{3}, \quad F_0^2 = \frac{2 F_q^2 + F_s^2}{3},\nonumber \\&\theta _8=\phi - \arctan \left( \frac{\sqrt{2} F_s}{F_q}\right) ,\quad \theta _0=\phi - \arctan \left( \frac{\sqrt{2} F_q}{F_s}\right) . \end{aligned}$$
(10)

where

$$\begin{aligned} \left( \begin{array}{cc} F^{8}_{\eta } &{} F^{0}_{\eta }\\ F^{8}_{\eta ^\prime } &{} F^{0}_{\eta ^\prime }\\ \end{array} \right) = \left( \begin{array}{cc} F_{8}\cos \theta _{8} &{} -F_{0}\sin \theta _{0}\\ F_{8}\sin \theta _{8} &{} F_{0}\cos \theta _{0}\\ \end{array} \right) \ , \end{aligned}$$
(11)

represents the admixture of the \(\eta \) and \(\eta '\) decay constants in terms of the octet and singlet one.

These relations, Eqs. (10), are very useful since, as observed in Ref. [53] and recently discussed in [54], the singlet decay constant \(F_0\) is renormalization-scale dependent:

$$\begin{aligned} \begin{array}{ll} \mu \frac{\mathrm{d}F_0}{\mathrm{d}\mu } = - N_F &{}\left( \frac{\alpha _s(\mu )}{\pi }\right) ^2 F_0\\ \longrightarrow \quad F_0(\mu ) &{} = F_0(\mu _0)\left( 1 + \frac{2N_F}{\beta _0}\left( \frac{\alpha _s(\mu )}{\pi } -\frac{ \alpha _s(\mu _0)}{\pi } \right) \right) \\ &{} = F_0(\mu _0)\left( 1 + \delta \right) , \end{array} \end{aligned}$$
(12)

with \(\beta _0 = \frac{11N_c}{3}-\frac{2}{3}N_F\), \(N_c\) the number of colors, \(N_F\) the number of active flavors at each scale, and \(\mu \) the renormalization scale, with \(\mu _0=1\) GeV a reference point close to the \(\eta '\) mass.

To include this effect in our results it is convenient to work it out in the singlet–octet basis for later on translate it into the flavor one using Eq. (10). As such, the asymptotic behavior equations (7) shift to

$$\begin{aligned}&\lim _{Q^2\rightarrow \infty }Q^2F_{\eta \gamma ^*\gamma }(Q^2) \nonumber \\&\quad =2(\hat{c}_q(1+4\delta /5) F_q\cos \phi - \hat{c}_s(1 + 2\delta ) F_s\sin \phi ), \nonumber \\&\lim _{Q^2\rightarrow \infty }Q^2F_{\eta ^\prime \gamma ^*\gamma }(Q^2) \nonumber \\&\quad =2(\hat{c}_q (1+4\delta /5)F_q\sin \phi + \hat{c}_s (1 + 2\delta )F_s\cos \phi ). \end{aligned}$$
(13)

Assuming asymptotic freedom for \(\alpha _s(\mu )\), the phenomenological input \(\alpha _s(M_z)=0.1185\) [42], and the renormalization group equation for \(\alpha _s(\mu )\), we determine \(\alpha _s(\mu _0=1\,\text {GeV})=0.48\), including up to four loop corrections and threshold effects for its runningFootnote 5. With such values and Eq. (12) we determine \(\delta = -0.17\). Using (10) to go back to the flavor basis we obtain as our final mixing parameters, representing one of the main results of this work:

$$\begin{aligned} \begin{array}{l} \mathbf{inputs:}\, F_{\eta \gamma \gamma }(0), F_{\eta ' \gamma \gamma }(0), \mathrm {asymp }\, \eta \, \\ \Rightarrow \ F_q/F_\pi =1.07(2) , \ F_s/F_\pi =1.29(16) ,\ \phi =38.3(1.6)^{\circ }, \\ \mathbf{inputs:}\, F_{\eta \gamma \gamma }(0), F_{\eta ' \gamma \gamma }(0), \mathrm {asymp }\, \eta '\, \\ \Rightarrow \ F_q/F_\pi =1.06(1) , \ F_s/F_\pi =1.63(8) ,\ \phi =41.1(0.8)^{\circ } , \end{array} \end{aligned}$$
(14)

when taking the \(\eta (\eta ')\) asymptotic behavior, respectively, as part of the subset of equations to be solved (8,13). We stress that corrections from \(\delta \) are bigger for the \(\eta '\) case, as the singlet admixture is more important there. Comparing to our old results (Sect. III in [4]), we find a better agreement among both solutions in (14). As explained in Ref. [4], the mixing equations are not independent, there is a relation between them:

$$\begin{aligned}&\lim _{Q^2\rightarrow \infty }Q^2( F_{\eta \gamma ^*\gamma }(Q^2)F_{\eta \gamma \gamma }(0) + F_{\eta '\gamma ^*\gamma }(Q^2) F_{\eta '\gamma \gamma }(0)) \nonumber \\&\quad =\left( 1+\frac{8}{9}\delta \right) \frac{3}{2\pi ^2} , \end{aligned}$$
(15)

where in the last step we have used the exact expressions (8,13). Numerically, using our \(\delta =-0.17\), one would obtain \(0.85\frac{3}{2\pi ^2}\) for the r.h.s. of (15). Our numerical predictions for the asymptotic form factors together with their experimental normalization yield \(0.89(3) \frac{3}{2\pi ^{2}}\) for its l.h.s., in nice agreement, but this would not be the case without the \(\delta \) correction.

This result contrast with determinations, which, taking the running from \(\mu _0\) up to their scale \(Q^2=112\) GeV\(^2\) instead of at \(\infty \) (resulting in \(\delta \,BABAR=-0.09\)), yields

$$\begin{aligned} \begin{array}{c} F_q/F_\pi =1.10(3)\ , \ \ F_s/F_\pi =0.91(21)\ ,\ \ \phi =33(4)^{\circ }, \\ F_q/F_\pi =1.08(2)\ , \ \ F_s/F_\pi =1.17(23)\ ,\ \ \phi =37(3)^{\circ }, \end{array} \end{aligned}$$
(16)

using again the two subsets of equations in (13). Equation (15), l.h.s., would read \(0.98(7)\frac{3}{2\pi ^2}\) where we are neglecting any \(1/Q^2\) dependence in it. In other words, assuming that \(Q^2=112\,\)GeV\(^2\) plays the role of \(\infty \). The r.h.s. of (15) using \(\delta \) B A B A R results in \(0.92\frac{3}{2\pi ^2}\), then compatible with its l.h.s. This comparison is somewhat false since we do not assume \(\delta \) B A B A R to be at \(\infty \), otherwise we would have used \(\delta =-0.17\), and then we would have found a contradiction between l.h.s. and r.h.s. of (15). This discrepancy is what we call the puzzle; it is depicted in Fig. 4. This figure includes both our mixing results and the determination at \(Q^2=112\)GeV\(^2\) with the corresponding mixing parameters obtained using Eqs. (8) and (13).

We remark that our results are tied to \(\alpha _s^2\) corrections in Eq. (12) and non-negligible systematic effects for the \(\eta '\) asymptotic behavior from our fits. Since this assertion relies on the high-energy TFF-behavior, where only data are available, a second measurement by Belle Collaboration would be a very useful crosscheck.

Fig. 4
figure 4

Mixing parameters as a function of the \(\eta \) TFF asymptotic value when the errors coming from the normalization of the TFFs is set to zero. For comparison, we show the mixing parameters extracted from the measurement of the time-like \(\eta \) TFF at \(q^2=112\) GeV\(^2\) by Collaboration [35]. This figure exemplifies the puzzle between the standard mixing parameters and the measurement

The mixing parameters obtained with our fits are precise enough to be competitive with the standard approaches with the advantage of using much less input information. Figure 5 compares our results from Eq. (14) (blue squares) with well-established phenomenological determinations, the one from Feldmann, Kroll, and Stech (FKS) from Ref. [51, 52], and the one from Escribano and Frere (EF) from Ref. [36] (updated in Ref. [4]). The agreement among the three approaches for both \(F_q\) and \(\phi \) is impressive. Less agreement is found for \(F_s\). This parameter is more sensible to meson decays where the strange quark plays an important role, such as the \(\phi \)-meson decays. In fact, such decays where included in the EF approach but not in the FKS or in the present work.

Fig. 5
figure 5

Mixing parameters of the \(\eta \)\(\eta ^\prime \) system in the flavor basis from different references

Figure 6 compares different singlet octet determinations. In this basis, our results turn out to be

$$\begin{aligned} \begin{array}{ll} F_8/F_{\pi }=1.22\,(11), &{}\quad F_0/F_{\pi }=1.15\,(6),\\ \theta _8 =-21.3\,(3.5)^\circ , &{}\quad \theta _0=-11.3\,(3.9)^\circ , \end{array} \end{aligned}$$
(17)

and they can be compared with the FKS and EF as before together with the results by Leutwyler (L) from Ref. [53] (no errors were given), and the results from Benayoun, DelBuono, and O’Connell (BDO) [55]. Again, the agreement between our results and FKS and EF are remarkable, and also in agreement with the results of Leutwyler. Our results slightly disagree for \(\theta _0\) and \(F_0\) with BDO. The reason is because in the BDO the OZI-violating piece is not set to zero. Since such piece mixes with the singlet component of the mixing, their \(\theta _0\) and \(F_0\) are slightly shifted.

Fig. 6
figure 6

Mixing parameters of the \(\eta \)\(\eta ^\prime \) system in the octet–singlet basis from different references

4.2 A comment on the BABAR high-energy time-like measurement

Our mixing parameters in Eq. (14) disagree with the ones obtained using time-like result as shown in Fig. 4 which, including the results in Figs. 5 and 6 suggests a puzzle between the measurement and the standard phenomenology [4]. Assuming that the measurement is correct, the difference could be explained if the assumption of duality between time- and space-like regions at high energy would not be yet valid at 112 GeV\(^2\) or that the asymptotic limit of Brodsky and Lepage is not yet reached at such energies and a systematic error is done in assuming duality [35].

In Ref. [35], collaboration studied the process \(e^+e^- \rightarrow \gamma ^* \rightarrow \eta ^{(')} \gamma \) at the center-of-mass energy \(\sqrt{s}=10.58\) GeV. They measured its cross section and using its relation with the TFF, obtained the absolute value of the time-like TFF at \(Q^2=-s=-112\,\)GeV\(^2\), \(|Q^2 F_{\eta \gamma ^*\gamma }(Q^2)|=(0.229\pm 0.031)\) GeV and \(|Q^2 F_{\eta '\gamma ^*\gamma }(Q^2)|=(0.251\pm 0.021)\) GeV, where statical and systematic uncertainties are added in quadrature.

A kinematic factor \(K_P^3\) with \(K_P=1-\frac{M_P^2}{s}\) (see [56]) was missing in expressions. This correction leaves published results almost untouched. This small shift together with the duality argument [35] results in a prediction of the TFF at \(Q^2=112\) GeV\(^2\):

$$\begin{aligned} |Q^2 F_{\eta \gamma ^*\gamma }(Q^2)|_{Q^2=+112\,\mathrm {GeV}^2}=(0.231\pm 0.031)~\mathrm {GeV},\nonumber \\ |Q^2 F_{\eta ^{\prime }\gamma ^*\gamma }(Q^2)|_{Q^2=+112\,\mathrm {GeV}^2}=(0.254\pm 0.021)~\mathrm {GeV}. \end{aligned}$$
(18)

One is tempted to include these time-like measurements transformed into space-like predictions Eq. (18) into our fits, after assuming that at this high momentum transfer, the duality between time- and space-like holds and no extra error should be included. For the \(\eta \) TFF fits, its inclusion will mainly modify the asymptotic prediction growing up its value up to \(\lim _{Q^2\rightarrow \infty }Q^2F_{\eta \gamma ^*\gamma }(Q^2)=0.247\) GeV, higher than the result, with a good reduced \(\chi ^2<1\). This, by itself, already indicates that at \(Q^2=112\) GeV\(^2\) the asymptotic regime is not yet reached. Curiously enough, the value of the fit function at \(Q^2=-112\) GeV\(^2\) is \(|Q^2 F_{\eta \gamma ^*\gamma }(Q^2)|_{Q^2=+112\,\mathrm {GeV}^2}=0.219\) GeV, below (18). Even worse is the prediction of our fit function for the time-like counterpart, i.e., \(|Q^2 F_{\eta \gamma ^*\gamma }(Q^2)\) \(|_{Q^2=-112\,\mathrm {GeV}^2} =0.307\) GeV. This exercise shows that the assumption of asymptotic regime at 112 GeV\(^2\) has an error of about \(15\,\%\) in our fits, a theoretical error that should be added to results when used in the space-like region. A recent analysis of the pseudoscalar TFF based on perturbative corrections [54] concludes that the difference between the time- and space-like form factors at \(|Q^2|=112\) GeV\(^2\) can be of the order of 5–\(13\,\%\) for different pseudoscalar distribution amplitudes, and can be enhanced by Sudakov-type corrections (see [57] for details). The Regge model defined in Ref. [4] also suggests a departure from duality of about 15 to \(20\,\%\) at \(|Q^2|=112\) GeV\(^2\). It is, however, difficult to calculate that error and hence difficult to ascribe it to the determination.

Interestingly enough, to check the eventual departure arising from duality violations, one could artificially enhance error just to cross-check its order of magnitude. Increasing the error in Eq. (18) from 0.031 GeV\(^2\) to 0.051 GeV\(^2\) (adding in quadrature a \(1.3\sigma \)) and refitting again, we obtain, with somewhat better \(\chi ^2\), the result that the asymptotic predicted value would then be \(\lim _{Q^2\rightarrow \infty }Q^2F_{\eta \gamma ^*\gamma }(Q^2)=0.193\) GeV, the fit value at \(Q^2=-112\) GeV\(^2\) would read 0.187 GeV, but also our time-like prediction at \(Q^2=112\) GeV\(^2\) would read 0.199 GeV, essentially satisfying the initial assumption that time- and space-like TFF coincide at 112 GeV\(^2\). The error we had to artificially add to reach at that conclusion is around a \(20\,\%\), which agrees with our previous statements and also with [54]. Of course, adding this \(20\,\%\) error in Eq. (16) solve what we call puzzle.

A \(15\,\%\) departure from the asymptotic limit may seem too large for that high momentum transfer. Notice [54, 57] that due to its nature, TFF are a convolution of a perturbative hard-scattering amplitude and a gauge-invariant meson distribution amplitude (DA) [58] which incorporates the nonperturbative dynamics of the QCD bound-state [38]. That means that even for large \(Q^2\) well inside the asymptotic region, soft scales coming from the Fock decomposition can enhance the TFF. These soft corrections depend on the broadness of the DA. At low energies, our fits suggest the typical hadronic scale for the \(\eta \) TFF to be lower than the \(\eta '\) counterpart. Being the \(\eta '\) more contaminated by \(s\bar{s}\) content (and less from other Fock states), one would expect its hadronic scale to be close to the \(\phi \) meson mass, around 1 GeV. This is in fact what we find, and indicates a narrower DA for the \(\eta '\), dominated by a \(q\bar{q}\) state, explaining at once why the duality arguments hold better than in the \(\eta \) case. This argument complements the one discussed in [54] from the perturbative study of the TFFs.

Even larger error should be added to duality arguments at lower energies, such as the measurement of the CLEO Collaboration of the same cross section but at \(\sqrt{s}=3.773\) GeV, and forthcoming measurements by the BES-III Collaboration at \(\sqrt{s}=4.26\) GeV.

For all these reasons, we chose not to use these measurements in the time-like region in our fits.

4.3 A prediction for the \(VP\gamma \) couplings

In this subsection, we extend our analysis to the vector–pseudoscalar electromagnetic form factors. In particular, we are interested in the couplings of the radiative decays of lowest-lying vector mesons into \(\eta \) or \(\eta '\), i.e., \(V \rightarrow (\eta ,\eta ')\gamma \), and of the radiative decays \(\eta ' \rightarrow V\gamma \), with \(V=\rho ,\omega ,\phi \).

We follow closely the method presented in Refs. [36, 59], and we make use of the equations in Appendix A in Ref. [36] to relate the form factors with the mixing angle and the decay constants in the flavor basis. To account for the \(\phi \)\(\omega \) mixing we use \(\phi _V=3.4^\circ \). The form factors, saturated with the lowest-lying resonance and then assuming VMD, can be expressed by

$$\begin{aligned} F_{VP\gamma }(0,0)=\frac{f_V}{m_V}g_{VP\gamma }, \end{aligned}$$
(19)

where \(g_{VP\gamma }\) are the couplings we are interested in, and \(f_V\) are the leptonic decay constants of the vector mesons and are determined from the experimental decay rates via

$$\begin{aligned} \Gamma (V\rightarrow e^+e^-)=\frac{4\pi }{3}\alpha ^2 \frac{f_V^2}{m_V}c_V^2, \end{aligned}$$
(20)

with \(c_V\) an electric charge factor of the quarks that make up the vector, \(c_V=(\frac{1}{\sqrt{2}}, \frac{\sin \theta _V}{\sqrt{6}},\frac{\cos \theta _V}{\sqrt{6}})\) for \(V=\rho ,\omega ,\phi \), respectively. Here \(\theta _V = \phi _V+\arctan (1/\sqrt{2})\). Experimentally we find

$$\begin{aligned} \begin{aligned}&f_{\rho ^0}=(221.2\pm 0.9)\,\mathrm {MeV},\\&f_{\omega }=(179.9\pm 3.1)\,\mathrm {MeV},\\&f_{\phi }=(239.0\pm 3.8)\,\mathrm {MeV}. \end{aligned} \end{aligned}$$
(21)

using \(\Gamma (\rho \rightarrow e^+e^-) = 7.04(6)\) keV, \(\Gamma (\omega \rightarrow e^+e^-) = 0.60(2)\) keV, and \(\Gamma (\phi \rightarrow e^+e^-) = 1.27(4)\) keV from [42].

The couplings in this flavor basis are

$$\begin{aligned}&g_{\rho \eta \gamma }=\frac{3m_{\rho }}{4\pi ^2f_{\rho ^0}}\frac{\cos \phi }{\sqrt{2}F_q}, \quad g_{\rho \eta '\gamma }=\frac{3m_{\rho }}{4\pi ^2f_{\rho ^0}}\frac{\sin \phi }{\sqrt{2}F_q},\nonumber \\&g_{\omega \eta \gamma }=\frac{m_{\omega }}{4\pi ^2f_{\omega }}\left( \cos \phi _V \frac{\cos \phi }{\sqrt{2}F_q} - 2 \sin \phi _V \frac{\sin \phi }{\sqrt{2}F_s} \right) ,\nonumber \\&g_{\omega \eta '\gamma }=\frac{m_{\omega }}{4\pi ^2f_{\omega }}\left( \cos \phi _V \frac{\sin \phi }{\sqrt{2}F_q} + 2 \sin \phi _V \frac{\cos \phi }{\sqrt{2}F_s} \right) ,\nonumber \\&g_{\phi \eta \gamma }=-\frac{m_{\phi }}{4\pi ^2f_{\phi }}\left( \sin \phi _V \frac{\cos \phi }{\sqrt{2}F_q} + 2 \cos \phi _V \frac{\sin \phi }{\sqrt{2}F_s} \right) ,\nonumber \\&g_{\phi \eta '\gamma }=-\frac{m_{\phi }}{4\pi ^2f_{\phi }}\left( \sin \phi _V \frac{\sin \phi }{\sqrt{2}F_q} - 2 \cos \phi _V \frac{\cos \phi }{\sqrt{2}F_s} \right) . \end{aligned}$$
(22)

where we have assumed \(\phi _q=\phi _s=\phi \). Table 4 collects our predictions in its second column. Corrections due to \(\phi _q \ne \phi _s\) to these formulas can be found in Appendix A, Eq. (A.5) of Ref. [36].

Table 4 Summary of VP\(\gamma \) couplings

The decay widths of \(P\rightarrow V\gamma \) and \(V\rightarrow P\gamma \) are

$$\begin{aligned} \Gamma (P\rightarrow V\gamma )&=\frac{\alpha }{8} g^2_{VP\gamma }\left( \frac{m_P^2-m_V^2}{m_P}\right) ^3,\nonumber \\ \Gamma (V\rightarrow P\gamma )&=\frac{\alpha }{24} g^2_{VP\gamma }\left( \frac{m_V^2-m_P^2}{m_V}\right) ^3. \end{aligned}$$
(23)

The experimental decay widths from [42] allow us to extract an experimental value for \(g_{VP\gamma }\), which are collected in the last column on Table 4.

Our predictions compare well with the experimental determinations, see Table 4, specially considering the simplicity of the approach. The differences are always below 2 standard deviations, excepting the \(\omega \) couplings. Our prediction for the ratio of \(J/\Psi \) decays is in that respect remarkable.

The observed deviations hint toward a somehow oversimplified approach. Even though our goal is just to show the relevance of TFF in other decays, and we do not pretend an exhaustive study of higher-order contributions in our scheme, we still want to remark two possible ways to improve our approach.

On the one hand, the fact that \(F_q\) departs from \(F_{\pi }\) in Eq. (14) may imply a correction through an OZI-violating parameter \(\Lambda _1\) that appears at next-to-leading order in the Lagrangian of \(\chi \)PT Large-\(N_c\) used to define the mixing equations, \(F_q=F_{\pi }(1+\Lambda _1/3)\) [52, 60] which in turns imply \(\phi _q \ne \phi _s\), since \(\phi _q-\phi _s \sim \Lambda _1/3\). With the result in Eq. (14), we estimate \(\Lambda _1 \sim 0.2\), in agreement with the naive \(1/N_c\) counting (i.e., \(\Lambda _1 \sim 1/N_c \sim 0.3\)), and then \(\phi _q-\phi _s \sim 3.8^\circ \).

The ratio \(R_{J/\Psi }\) provides direct information on the angle \(\phi _q\) since

$$\begin{aligned} R_{J/\Psi }=\tan ^2(\phi _q)\left( \frac{m_{\eta '}}{m_{\eta }}\right) ^4\left( \frac{M^2_{J/\Psi }- m_{\eta '}^2 }{M^2_{J/\Psi }- m_{\eta }^2 } \right) ^3\ , \end{aligned}$$
(24)

with \(M_{J/\Psi }\) the \(J/\Psi \) meson mass. The experimental \(R_{J/\Psi }\) ratio defined in last the row in Table 4, results in \(\phi _q=(38.1\pm 0.6)^{\circ }\), which implies \(\phi _s=(38.1+3.8 \pm 1.6)^\circ = (41.9\pm 1.6)^\circ \) with the error coming from our determination of \(\phi \) in Eq. (14). Even though both angles are distinguishable, their impact on the \(g_{VP\gamma }\) is a shift of the form \(g_{VP\gamma } \rightarrow g_{VP\gamma } /(\cos \phi _q \cos \phi _s + \sin \phi _q \sin \phi _s)\) [36]. For the \(\phi _q = \phi _s\) limit, such a shift is exactly 1. Using the \(3.8^\circ \) difference, such a shift translates into 0.998, a 2 per mil effect—negligible. Our assumption \(\phi _q = \phi _s=\phi \) is supported phenomenologically.

On the other hand, as discussed in detail in Ref. [61], in the flavor singlet channel one has to allow for another OZI-rule violating correction, which essentially corresponds to replacing \(F_0 \rightarrow F_0/(1 + \Lambda _3)\). This shifts both the \(P\rightarrow \gamma \gamma \) decays and the formulas for \(g_{VP\gamma }\) predictions [60]. The parameter \(\Lambda _3\) is, however, still unknown, although expected to be \(\sim 1/N_c \sim 0.3\). We can make use of Eq. (15) to estimate it. The shift on \(F_0\) can be translated into a shift in \(F_{q,s}\) recalling that both are related to \(F_{\pi }\) following Eqs. (17) and (9), respectively, and find \(F_{q,s}\rightarrow F_{q,s}/(1 \pm \Lambda _3)\) as well. Going then to Eq. (8), \(F_{\eta (')\gamma \gamma }(0) \rightarrow F_{\eta (')\gamma \gamma }(0) (1-\Lambda _3)\).

Then Eq. (15) transforms into

$$\begin{aligned}&\lim _{Q^2\rightarrow \infty }Q^2( F_{\eta \gamma ^*\gamma }(Q^2)F_{\eta \gamma \gamma }(0) + F_{\eta '\gamma ^*\gamma }(Q^2) F_{\eta '\gamma \gamma }(0)) \nonumber \\&\quad \times (1-\Lambda _3)= \left( 1+\frac{8}{9}\delta \right) \frac{3}{2\pi ^2} , \end{aligned}$$
(25)

which, after expanding and reorganizing in such a way that in the l.h.s. remain only experimental quantities, results in

$$\begin{aligned}&\lim _{Q^2\rightarrow \infty }Q^2(F_{\eta \gamma ^*\gamma }(Q^2)F_{\eta \gamma \gamma }(0) + F_{\eta '\gamma ^*\gamma }(Q^2) F_{\eta '\gamma \gamma }(0)) \nonumber \\&\quad = \left( 1+\frac{8}{9}\delta + \Lambda _3 +\frac{8}{9}\delta \Lambda _3 \right) \frac{3}{2\pi ^2}. \end{aligned}$$
(26)

We recall that l.h.s., experimentally, reads \(0.89 \frac{3}{2\pi ^2} \), and \(\delta = -0.17\). With (26) we find \(\Lambda _3=0.05\), smaller than expected and with positive sign.

The VP\(\gamma \) couplings are also shifted by \(\Lambda _3\). The expressions can be found in Eq. (42) in Ref. [60] which, after expanding, can be expressed as a shift on the couplings in our Eq. (22): \(g_{V\eta \gamma } \rightarrow (g_{V\eta \gamma } + |g_{V\eta \gamma }|\Lambda _3/2)\) and \(g_{V\eta '\gamma } \rightarrow (g_{V\eta '\gamma } + |g_{V\eta '\gamma }|\Lambda _3)\), always increasing the coupling. For some of them, the \(\Lambda _3\) correction goes on the right direction (the \(\rho \) case), but for others it is not conclusive (the \(\phi \) case where for \(\eta \) goes well and for \(\eta '\) wrong). The result of the shift is, then, ambiguous.

Discarding OZI-violating effects, Padé approximants can then be the avenue to follow since the vector mass that should be used in Eq. (19) it should not correspond to a physical observable, but an effective scale provided by the pole of a PA assuming the philosophy of the present work. For the \(\eta \) TFF, the \(\Lambda _{\eta }^2\) from Eq. (3) is smaller than the VMD mediator. If the same would happen with the \(\rho ,\omega \) form factors, one would expect, then, different \(g_{VP\gamma }\) couplings. Since this study is beyond the scope of the present analysis, we postpone it for future work. A naive estimate of these effects could be accounted for within the half-width-rule [46], i.e., instead of using \(m_V\) in Eq. (19), we use \(m_V \pm \Gamma _V/2\), with \(\Gamma \) the full width of the vector. This provides a way to assess the error of neglecting the width of the resonance in using \(m_V\). For example, for the \(\rho \) case, within the half-width-rule, the errors of the \(g_{\rho P\gamma }\) would be enlarged by a factor 3, well compatible with the experimental determinations.

Further studies along these lines are postponed for future work.

5 Conclusions

In the present work, the \(\eta \) transition form factor has been analyzed for the first time in both space- and time-like regions at low and intermediate energies making use of a model-independent approach based on the use of rational approximants of Padé type. The model independence of our approach is achieved trough a detailed and conservative evaluation of the systematic error associated to it. The new set of experimental data on the \(\eta \rightarrow e^+e^-\gamma \) reaction provided by the A2 Collaboration in the very low-energy part of the time-like region allows for a much better determination of the slope and curvature parameters of the form factor, as compared to the predictions obtained in our previous work only using space-like data, which constitute the most precise values up-to-date of these low-energy parameters. Our method is also able to predict for the first time the third derivative of the form factor. In addition, the new analysis has served to further constrain its values at zero momentum transfer and infinity. We have seen that our results, in particular for the case of the slope parameter, are quite insensitive to the values used in the fits for the two-photon decay width of the \(\eta \), thus showing that the collection of space- and time-like experimental data is more than enough to fix a value for the normalization of the form factor compatible with current measurements. We have also seen that the role played by the high-energy space-like data is crucial to get accurate predictions for the low-energy parameters of the form factor and its asymptotic value. As a consequence of these new results, we have fully reanalyzed the \(\eta \)\(\eta ^\prime \) mixing parameters this time also considering renormalization-scale dependent effects of the singlet decay constant \(F_0\). The new values obtained are already competitive with standard results having the advantage of requiring much less input information. Related to this, we have also obtained predictions for the \(VP\gamma \) couplings which are in the ballpark of present-day determinations.

In summary, the method of Padé approximants has been shown to be very powerful for fixing the low-energy properties of the \(\eta \) transition form factor making their predictions more accurate and well established. This fact opens the door to a more exhaustive analysis of the single Dalitz decay processes \(P\rightarrow l^+l^-\gamma \), with \(P=\pi ^0, \eta , \eta ^\prime \) and \(l=e, \mu \), the double Dalitz ones \(P\rightarrow l^+l^-l^+l^-\) (in all possible kinematically allowed configurations) [62], and the rare lepton-pair decays \(P\rightarrow l^+ l^-\)—see the \(\pi ^0\rightarrow e^+ e^-\) application in Ref. [63], which are usually discussed only in terms of monopole approximations. Indeed, when this work was being concluded the BESIII Collaboration reported a first observation of the \(\eta ^\prime \rightarrow e^+e^-\gamma \) process measuring the branching ratio and extracting the \(\eta ^\prime \) transition form factor [64]. This new measurement may put our approach with its back to the wall. However, a very preliminary analysis of this recent data in comparison with our prediction for this form factor in the time-like region exhibits a nice agreement but reveals the necessity of going beyond the VMD model used in the experimental analysis [65].