Dispersive analysis of η→3π\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\eta \rightarrow 3 \pi $$\end{document}

The dispersive analysis of the decay η→3π\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\eta \rightarrow 3\pi $$\end{document} is reviewed and thoroughly updated with the aim of determining the quark mass ratio Q2=(ms2-mud2)/(md2-mu2)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Q^2=(m_s^2-m_{ud}^2)/(m_d^2-m_u^2)$$\end{document}. With the number of subtractions we are using, the effects generated by the final state interaction are dominated by low energy ππ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi \pi $$\end{document} scattering. Since the corresponding phase shifts are now accurately known, causality and unitarity determine the decay amplitude within small uncertainties – except for the values of the subtraction constants. Our determination of these constants relies on the Dalitz plot distribution of the charged channel, which is now measured with good accuracy. The theoretical constraints that follow from the fact that the particles involved in the transition represent Nambu–Goldstone bosons of a hidden approximate symmetry play an equally important role. The ensuing predictions for the Dalitz plot distribution of the neutral channel and for the branching ratio Γη→3π0/Γη→π+π-π0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varGamma _{\eta \rightarrow 3\pi ^0}/ \varGamma _{\eta \rightarrow \pi ^+\pi ^-\pi ^0}$$\end{document} are in very good agreement with experiment. Relying on a known low-energy theorem that relates the meson masses to the masses of the three lightest quarks, our analysis leads to Q=22.1(7)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Q=22.1(7)$$\end{document}, where the error covers all of the uncertainties encountered in the course of the calculation: experimental uncertainties in decay rates and Dalitz plot distributions, noise in the input used for the phase shifts, as well as theoretical uncertainties in the constraints imposed by chiral symmetry and in the evaluation of isospin breaking effects. Our result indicates that the current algebra formulae for the meson masses only receive small corrections from higher orders of the chiral expansion, but not all of the recent lattice results are consistent with this conclusion.


Introduction
Our world is almost isospin symmetric: The up and the down quarks can be freely interchanged (or replaced by any linear combination of them) inside hadrons almost without any observable consequence. Of course the charge of the two quarks is different, so that after an isospin transformation the charge of the hadronic state might change, but since the electromagnetic interactions are much weaker than the strong ones, we can classify this as a small effect. Besides the charge, the only difference between the two quarks is their mass. In relative terms their mass difference is large, but very small when compared to the mass of a typical hadron: If we interchange the up and down quarks inside a hadron, the mass of the latter barely changes. Observables which are sensitive to isospin violations are therefore particularly interesting, as they offer us rare insights into the sector of the Standard Model Lagrangian which breaks the isospin symmetry. One of them is the decay of the η-meson into three pions. This decay would be forbidden by isospin symmetry and moreover it is mainly due to purely strong isospin violations [1,2]: Among the already rare observables sensitive to isospin breaking, this is even more special as it allows to clearly separate the two sources, which are otherwise mostly present at a similar level. To a good approximation the decay rate is proportional to the square of the up and down mass difference. If one were able to accurately calculate the proportionality factor -the modulus squared of the transition amplitude between the η and a three-pion state mediated by the third component of the scalar isovector quark bilinear -a measurement of the decay rate would provide a determination of this quark mass difference. This approach has been adopted before, but both, recent improved measurements of the differential decay rates as well as progress on the theory side call for an updated and improved analysis. This is the aim of the present paper, where we give a detailed account of the work reported in Ref. [3].
The calculation of hadronic matrix elements is not an easy task, especially if the aim is high precision. Several methods are available and can be applied with varying degree of success, depending on the circumstances: They range from lattice QCD to Chiral Perturbation Theory (χPT), to dispersive approaches. Decays into three particles are not accessible to lattice calculations yet, 1 but both the effective field theory approach and dispersion relations can be and have been used to analyze these processes. As it turns out, the main difficulty concerns the evaluation of rescattering effects among the pions in the final state. In particular, the lowest resonance occurring in QCD, the f 0 (500), strongly amplifies the final state interaction in the S-wave with I = 0. For this reason, the first few terms of the chiral pertubation series do not provide a good description of the momentum dependence of the amplitude, even if the one-loop representation [11] is extended to two loops [12]. We will discuss the limitations of the effective theory in the present case in Sec. 6. Dispersion relations, on the other hand, are perfectly suited to evaluate rescattering effects to all orders [13][14][15]. They express the amplitude in terms of a few subtraction constants, which play a role analogous to the low-energy constants (LEC) of χPT. Those relevant for the momentum dependence of the amplitude can be determined very well on the basis of the experimental information on the Dalitz plot distribution. Theory is needed only for the analogs of those LECs that describe the dependence on the quark masses.
In the literature there are already a few papers which follow essentially the same approach, but there are several compelling reasons for redoing this analysis: 1. Until recently, the dispersive analyses relied on a rather crude input for the ππ phase shifts, which is the essential ingredient in the dispersive calculation. Today a much more accurate representation for this amplitude is available [16,17].
2. Improved calculations of the electromagnetic effects in this decay are available [18] and it is impossible to use these in combination with old dispersive calculations. 3. There have been recent, more accurate experimental measurements of the Dalitz plot in the charged channel [19][20][21][22], which challenge the theory to correctly describe this momentum dependence. 4. The experimental information concerning the momentum dependence in the neutral channel also improved very significantly [23][24][25][26], but represents a theoretical puzzle, because Chiral Perturbation Theory does not predict the slope correctly, in fact, not even the sign.
In the following we take up this challenge and apply and combine all theoretical improvements listed above to come up with a representation for the η → 3π amplitude which can be used to describe the data. The most challenging aspects concern: i) obtaining numerical solutions of the integral equations which follow from the dispersion relations; ii) the dispersion relations are analyzed in the isospin limit -isospin breaking effects must be accounted for; iii) formulate and impose the constraints that follow from the fact that the particles involved in this decay are Nambu-Goldstone bosons of a hidden approximate symmetry.
As we will show, we have been able to successfully address all these challenges and have set up a framework which allows us to describe the data well with values of the subtraction constants -the input parameters in the dispersion relations -which agree well with the prediction of χPT. A proper treatment of isospin breaking corrections is essential, at the current level of precision, to simultaneously describe experimental data in both the charged and the neutral channel of the decay. The plan of the paper is as follows. We set up our dispersive framework in Sec. 2 and review χPT calculations and predictions on this process in Sec. 3. Our dispersive analysis is performed in the isospin limit -the approach used to account for isospin breaking effects is discussed in Sec. 4. In Sec. 5, we describe our fits to the KLOE measurements of the Dalitz plot for η → π + π − π 0 and discuss the importance of the theoretical constraints in this context. The results of the dispersive analysis are compared with the χPT twoloop representation of the decay amplitude in Sec. 6, whereas, in Sec. 7, we analyze the consequences for the decay η → 3π 0 . In Sec. 8, the results are compared with the recent update of the MAMI data on this decay [25]. Sec. 9 discusses our determination of the kaon mass difference in QCD and of the quark mass ratios Q and m u /m d . Finally, in Sec. 10, we compare our analysis with related work. Our conclusions in Sec. 11 are followed by a number of appendices containing details of our calculation.

Theoretical framework 2.1. Isospin
The transition η → 3π proceeds exclusively through isospin breaking operators since three pions cannot be in a state where isospin and angular momentum vanish at the same time. Indeed, the three-pion isoscalar state has odd (and therefore non-zero) angular momentum according to Bose statistics. In the Standard Model, isospin breaking contributions can arise either from the electromagnetic or the strong interaction. However, according to a theorem by Sutherland [1,2], the electromagnetic (e.m.) contribution to the decay η → 3π vanishes at leading order of the chiral perturbation series: The transition is mainly due to the fact that QCD does not conserve isospin. The isospin breaking part of the QCD Lagrangian, carries I = 1 and can indeed generate transitions between the η and three-pion states with I = 1. Up to contributions from the e.m. interaction and higher orders in m u − m d , the transition amplitude is given by the matrix element of the perturbation L ∆m QCD between the unperturbed, stable initial and final states, 2 A c (s, t, u) = π + π − π 0 out|L ∆m QCD |η in .
The Mandelstam variables stand for The quantity A c (s, t, u) is dimensionless like the amplitude of ππ scattering and is proportional to the quark mass difference m d − m u . As pointed out in [11], it is convenient to (i) decompose the amplitude into a momentum-independent term N that breaks isospin symmetry times a remainder M c (s, t, u) that is isospin-invariant and (ii) define N in terms of the kaon mass difference in QCD and the pion decay constant F π : A c (s, t, u) = −N M c (s, t, u) , N ≡M (4) We follow the notation used by FLAG:M K 0 andM K + stand for the masses of the kaons in QCD [27]. The amplitude M c (s, t, u) concerns the isospin limit of QCD, where the charged and neutral pions and kaons carry the common mass M π and M K , respectively. The normalization (4) implies that, in current algebra approximation [28,29], the amplitude M c exclusively involves the meson masses: 3 M c (s, t, u) = (3s − 4M 2 π )/(M 2 η − M 2 π ). 2 The relative phase of the amplitudes for the charged and neutral channels depends on the convention used to specify the phase of the one-particle states. We are working with |π ± = (|π 1 ±i|π 2 )/ √ 2, |π 0 = |π 3 . 3 The mass of the η is protected from isospin breaking: The e.m. self-energy vanishes at leading order of the chiral expansion and the expansion of Mη in powers of the difference m d −mu only starts at O(m d −mu) 2 . The difference between the physical mass of the η and its value in the isospin limit is beyond the accuracy of our calculation.
In this notation, the rate of the decay η → π + π − π 0 is given by Γ η→π + π − π 0 = (2π) 4 N 2 2M η dµ(p π + )dµ(p π − )dµ(p π 0 )δ 4 (p η − p π + − p π − − p π 0 )|M c (s, t, u)| 2 , (5) with dµ(p) = d 3 p/(2p 0 )/(2π) 3 . Since only two of the Mandelstam variables are independent, the rate can be expressed as an integral over two of these: In the entire first part of the present paper, we will limit ourselves to an analysis of the transition amplitude M c (s, t, u) in the isospin limit. The neglected contributions of order e 2 and (m u − m d ) 2 do not respect isospin symmetry and are referred to as isospin breaking corrections. We will analyze these in detail in Sec. 4.
Charge conjugation symmetry requires the amplitude to be invariant under the exchange of the two charged pions, M c (s, t, u) = M c (s, u, t) , and isospin symmetry implies that the amplitude for the transition η → π i π j π k is determined by the one relevant for the charged decay mode: In particular, the transition amplitude for the decay η → 3π 0 , which we denote by M n (s, t, u), is represented as: The formula explicitly shows that the amplitude for the neutral mode is symmetric in all three Mandelstam variables. Note that the indistinguishability of the pions generated in the decay η → 3π 0 implies that the corresponding Mandelstam variables are not unique. While an event occurring in the decay η → π + π − π 0 corresponds to a unique set of values for s, t, u, the six different permutations of s, t, u belonging to a configuration of three neutral pions correspond to six different points in the physical region, but describe the same event. If the phase space integral is extended over the entire physical region, the result must be divided by six:

Branch cuts, discontinuities
The consequences of causality and unitarity for transitions with three particles in the final state were investigated long ago [30][31][32][33][34] and many papers concerning the decays K → 3π and η → 3π have appeared since then. In particular, as shown in [13][14][15]35], the final state interaction can reliably be accounted for with dispersion relations. Since the publication of these papers, the ππ phase shifts have been determined to remarkable precision [16,17,36] and the quality of the experimental information about these decays is now also much better.
Moreover, the nonrelativistic effective field theory has been set up for these transitions. The application of this method to K → 3π turned out to be very successful [37][38][39][40]. These developments have triggered renewed interest in theoretical studies of η → 3π [41][42][43][44][45][46][47][48][49][50][51][52][53]. We briefly summarize the main properties of the transition amplitude at low energies. On account of causality, the function M c (s, t, u) is analytic in the Mandelstam variables s, t, u. At low energies, the final state interaction among the pions generates the most important singularities. The branch cut due to the interaction between π + and π − starts at s = 4M 2 π ('s-channel'), while the cuts associated with the interactions in the t-and u-channels stem from the pairs π + π 0 and π − π 0 and start at t = 4M 2 π and u = 4M 2 π , respectively. The strength of these singularities can be characterized with the discontinuity across the cut, that is with the difference between the values of the amplitude at the upper and lower rim of the cuts. The discontinuity across the branch cut in the s-channel, for instance, is defined by Since the angular momentum barrier strongly suppresses the discontinuities due to the Dand higher partial waves, the low-energy structure is dominated by those from the S-and P-waves. This also manifests itself in χPT: Discontinuities due to partial waves with ≥ 2 start showing up only at O(p 8 ) of the chiral expansion. The discontinuity generated by the S-wave with isospin I = 0 only shows up in the schannel, with a term that does not depend on the scattering angle, i.e. exclusively involves the variable s. We denote the discontinuity due to this partial wave by disc M 0 (s): In the t-channel, the interaction in the S-wave with I = 2 generates a discontinuity that only depends on t: disc M 2 (t). Since the transition amplitude is symmetric with respect to the exchange of t and u, the corresponding discontinuity in the u-channel is determined by the same function: disc M 2 (u). The interaction in the exotic wave also manifests itself in the s-channel, with a discontinuity proportional to disc M 2 (s). The proportionality factor must be such that the projection onto the isoscalar S-wave vanishes. This projection is given by the sum over i = j of the matrix element π i π j π k out|qλ 3 q|η , i.e. by 3f (s, t, u) + f (t, u, s) + f (u, s, t). With f (s, t, u) ∝ disc M 2 (t) + disc M 2 (u) + λ disc M 2 (s), this reduces to (3λ + 2) disc M 2 (s) + · · · , where the ellipsis stands for terms that only depend on t or u. Hence λ = − 2 3 , so that: Since the P-wave carries I = 1, it cannot show up in the s-channel, but generates a t-channel contribution of the form f (t) cos θ t , where θ t is the scattering angle. Expressed in terms of the Mandelstam variables, cos θ t is proportional to s − u. Together with the analogous term in the u-channel the P-wave discontinuity thus takes the form disc P M c (s, t, u) = (s − u) disc M 1 (t) + (s − t) disc M 1 (u) . (14) This shows that the suppression of the higher partial waves simplifies the analytic structure of the transition amplitude considerably: Retaining only the discontinuities due to the leading partial waves with isospin I = 0, 1, 2, those of the full amplitude can be decomposed into three functions of a single variable: disc M c (s, t, u) = disc M 0 (s) + (s − u) disc M 1 (t) + (s − t) disc M 1 (u) The functions disc M 0 (x), disc M 1 (x) and disc M 2 (x) describe the discontinuities in the lowest partial waves with I = 0, 1 and 2, respectively.

Dispersion relations, subtractions
We denote the contribution to the transition amplitude generated by the discontinuity from the leading partial wave with isospin I by M I (s) and refer to the functions M 0 (s), M 1 (s), M 2 (s) as the isospin components of the amplitude. These functions only have a right hand cut for 4M 2 π < s < ∞ and, as suggested by the notation, the discontinuity of M I (s) across this cut is given by disc M I (s). Accordingly, M I (s) obeys a dispersion relation of the form where we have allowed for subtractions, collecting the subtraction constants in the polynomial P I (s). The representation illustrates the fact that analytic functions are fully determined by their singularities. In the present context, not only those occurring at finite values of the Mandelstam variables, but also those at infinity matter. Although we are not interested in the asymptotic behaviour of the amplitude as such, it provides a convenient handle on the subtractions: The singularities unambiguously determine the amplitude provided the asymptotic behaviour is known. The Mandelstam variables are not independent, but obey the constraint s + t + u = M 2 η + 3M 2 π . We use the two independent variables s and τ ≡ t − u (the constraint then fixes all three variables in terms of these two). The condition that the amplitude M c (s, t, u) does not grow more rapidly than with the square of λ if s and τ grow in proportion to λ turns out to lead to a suitable framework that allows sufficiently many subtractions, so that the poorly known high energy behaviour of the amplitude and inelastic contributions do not play a significant role. The general polynomial that is even in τ and obeys this asymptotic condition is of the form p 0 + p 1 s + p 2 s 2 + p 3 τ 2 and it is easy to see that a polynomial of this form can be absorbed in the functions M 0 (s), M 1 (s), M 2 (s). Hence, if the discontinuities are of the form (15), then the asymptotic condition ensures that the amplitude itself can be decomposed into three functions of a single variable, Inserting this in (9), the analogous decomposition of the neutral transition takes the remarkably simple form: In the approximation we are using, only the combination M n (s) ≡ M 0 (s) + 4 3 M 2 (s) (19) of the S-waves is relevant for the neutral decay mode -the P-wave drops out altogether. We expect that, in the physical region of the decay, the representations (17), (18) constitute an excellent approximation to the isospin limit of the transition amplitudes. In χPT, the approximation holds up to and including next-to-next-to-leading order (NNLO) -in that framework, the decomposition (17) is referred to as the 'reconstruction theorem' [54].

Polynomial ambiguities
There is a problem of technical nature with the approximation (17) This demonstrates that the decomposition (17) is unique up to a five-parameter family of polynomials. The transformations specified in (20), (21) form a Lie group, which we denote by G 5 . Under this group, the isospin components M 0 (s), M 1 (s) and M 2 (s) transform in a non-trivial manner, but their sum, M c (s, t, u) is invariant.
The above calculation also shows that the component M 1 (t) cannot grow more rapidly than with the square of t: Otherwise, the function M 1 (t) would not tend to zero when t is sent to infinity, as required by the asymptotic condition. We exploit the freedom inherent in the polynomial ambiguities as follows. First, we choose the parameter a in (20), (21) such that the term in M 1 (t) which asymptotically grows with t 2 is cancelled, such that M 1 (t) ∝ t. For large values of t, the derivative (∂ t − ∂ u )∂ 2 t M c (s, t, u) is then dominated by the contribution from M 2 (t), which is proportional to M 2 (t). The asymptotic condition on M c (s, t, u) thus implies that M 2 (t) must tend to zero when t → ∞, so that M 2 (t) grows at most quadratically. The leading term can again be removed: With a suitable choice of the parameter b, we arrive at a decomposition for which both M 1 and M 2 at most grow linearly. The ambiguities in the decomposition then reduce to a three-parameter family of polynomials, labeled with c, d, e. We fix c with the condition M 1 (0) = 0 and, finally, choose d, e such that M 2 (0) = M 2 (0) = 0. This shows that the decomposition can be made unique by imposing the five constraints With this choice, the asymptotic condition is obeyed by the individual isospin components, not only by their sum. In particular, M 0 (s) then grows at most quadratically: M 0 (s) ∝ s 2 .

Elastic unitarity
The occurrence of ππ branch cuts is a consequence of unitarity, but an amplitude of the simple form (17) can obey the unitarity condition only approximately. The relevant approximation is referred to as elastic unitarity. For ππ scattering, the Roy equations [55] provide a rigorous framework, within which the singularities due to the final state interaction in the S-and P-waves can be sorted out explicitly. For the decay of an η or a kaon into three pions, however, the constraints imposed by elastic unitarity are more subtle. For a detailed discussion, we refer to the literature quoted above. In the following, we rely on the framework developed in [13,15,32], where the final state interaction effects are analyzed by means of analytic continuation in M η . The net result of that analysis is the following expression for the leading discontinuities: The first term in the curly bracket stems from collisions in the s-channel, the second accounts for those in the t-and u-channels and δ 0 (s), δ 1 (s), δ 2 (s) denote the phase shifts of the leading partial waves of ππ scattering with isospin I = 0, 1, 2, respectively (in the standard notation, the phase shifts are denoted by δ I (s), where I and indicate the isospin and angular momentum quantum numbers of the partial wave, respectively; as only the lowest value of is relevant in our approximation, we drop the upper index). The contributions from the t-and u-channels are given by averages over the functions M 0 (s), M 1 (s), M 2 (s): withM 0 =M 0 (s), M 0 = M 0 (s), etc. The quantities s 0 and κ = κ(s) stand for and the averages are defined by The complications occurring with elastic unitarity in the decay into three pions concern the specification of these averages. They arise because the η is an unstable particle. We use the standard method proposed in the pioneering papers on the subject and define the angular averages by means of analytic continuation in the square of the mass of the η. Reserving the symbol M η for the physical value of the mass, we denote the corresponding complex variable by M . Starting with a real value of M 2 below 9M 2 π , where the η is stable, the physical mass is approached with M 2 = M 2 η + iδ, where δ is positive and tends to zero. For M 2 < 9M 2 π , the integral over z in (26) runs over values that are in the analyticity domain of the integrand, so that the integral is meaningful as it stands. Since the integrand is an analytic function of z, the path of integration can be deformed without changing the value of the integral, as long as the path stays within the domain of analyticity. Indeed, if M 2 is increased above 9M 2 π , such a deformation is necessary to avoid the singularities of the integrand. The matter is discussed in some detail in Appendix A.
Gasser and Rusetsky [56] very recently found a more efficient method for the solution of the integral equations. Their approach relies on a formulation of these equations for complex values of the Mandelstam variables and avoids the numerical problems altogether, which are encountered in the method we are using to evaluate the angular averages and are described in Appendix A. They kindly made their numerical results available to us. In the vicinity of the critical points, these are significantly more accurate than those obtained with our numerical procedure, while away from these points, their results offer a very welcome check on our numerics. The numerical results given in the present paper are based on their fundamental solutions -some of our numerical results differ from those quoted in the letter version [3], but in all cases, the difference amounts to a small fraction of the quoted error.
Analytic continuation in the mass of the η fully specifies the elastic unitarity approximation used in the present work. As mentioned in Sec. 2.2, the approximation (17), which represents the amplitude in terms of three functions of a single variable, is valid in χPT, up to and including NNLO. This statement holds within the effective theory based on SU(3)×SU(3), i.e. includes loops involving kaons or η-mesons. Our treatment of elastic unitarity, however, only accounts for the discontinuities generated by elastic collisions among the pions and does not include intermediate states containing heavy members of the Nambu-Goldstone octet.
Albaladejo and Moussallam [48,49] have set up a dispersive framework for the analysis of the decay η → 3π which extends elastic unitarity to the quasi-elastic collisions among the members of the pseudoscalar octet. We compare our approach with theirs in Sec. 10.1. In the range of energies of interest to us and in view of the fact that we use dispersion relations with many subtractions, the polynomial approximation for the contributions from the heavy intermediate states is perfectly adequate. What is important, however, is that the singularities generated by the final state interaction among the pions are properly accounted for and we have checked that this is the case: The elastic unitarity approximation specified above does account for the pionic singularities contained in the chiral representation of the transition amplitude, up to and including two loops.

Phase shifts
The Roy equations [55] very strongly constrain the behaviour of the ππ scattering amplitude at low energies. In particular, these equations fully determine the amplitude in terms of its imaginary part, up to the two S-wave scattering lengths, which enter as subtraction constants. Together with the predictions for the scattering lengths obtained on the basis of χPT, this framework offers a remarkably precise representation for the scattering amplitude at low energies [16,57]. In the meantime, the experimental work on kaon decays [58][59][60][61] and pionic or kaonic atoms [62,63] has tested the predictions for the scattering lengths to high accuracy and the dispersive analysis is also confirmed within errors [17,64].
We use the representations for the three phase shifts δ 0 (s), δ 1 (s), δ 2 (s) given in [16]. In that analysis, the values of the phase shifts at √ s 1 = 0.8 GeV are used to control the uncertainties in the low-energy region. We vary these in the range Fig. 1 shows the energy dependence below KK-threshold. Above that energy, dispersion theory does not impose strong constraints on the behaviour of the phase shifts, but since we are using dispersion relations with many subtractions, the uncertainties in the input used there do not play a significant role. For definiteness, we use a parametrization where, above 1.7 GeV, δ 0 (s) and δ 1 (s) are set equal to 180 • , while the exotic phase δ 2 (s) is set equal to zero. By far the most important contribution stems from δ 0 (s). In order to test the sensitivity to the behaviour of this phase shift in the region between KK-threshold and 1.7 GeV, we generously varied the parametrization used in that region, but found that this barely affects any of the results (see the detailed discussion of our numerical results in Appendix E).

Integral equations
For our method it is crucial that the dispersion relations used uniquely determine the amplitude in terms of the subtraction constants. With the form (16) of these relations, that is not the case, however. There, the subtraction constants are collected in the polynomials P I (s). The problem is that the homogeneous equations obtained if these polynomials are set equal to zero admit non-trivial solutions. In its simplest form, the problem shows up if the contributions to the discontinuities from the crossed channels are dropped. The elastic unitarity relation (23) then reduces to three independent constraints of the form disc M I (s) = sin δ I (s) e −iδ I (s) M I (s), or, equivalently, M I (s + i ) = e 2iδ I (s) M I (s − i ). This condition is well-known from the dispersive analysis of form factors and can be solved explicitly: The Omnès function [65], defined by obeys Ω I (s + i ) = e 2iδ I (s) Ω I (s − i ), so that the ratio m I (s) = M I (s)/Ω I (s) is continuous across the cut. Since Ω I (s) does not have any zeros, m I (s) is an entire function. With the asymptotic behaviour of the phase shifts specified in the preceding section, Ω 0 (s), Ω 1 (s) tend to zero in inverse proportion to s, while Ω 2 (s) approaches a constant: As shown in Sec. Bookkeeping then shows, however, that the dispersion relation (16) cannot determine the solution uniquely: The asymptotic behaviour M 0 (s) ∝ s 2 allows a cubic polynomial for m 0 (s), but only a quadratic one for P 0 (s). Hence the general solution involves four free parameters while the dispersion relation only contains three subtraction constants. Evidently, the phenomenon occurs because the Omnès factor Ω 0 (s) tends to zero if s becomes large. This is the case also for Ω 1 (s), while the solution of the dispersion relation for M 2 (s) is determined uniquely by the subtraction constants.
The problem also occurs if the functionsM I (s) are retained. The preceding discussion points the way towards a solution of the problem: It suffices to replace the dispersion relation for M I (s) with the one for the ratio m I (s) ≡ M I (s)/Ω I (s). The corresponding discontinuity is given by With the relation M I (s − i ) = M I (s + i ) − 2i disc M I (s) and the expression (23) for the discontinuity, this becomes Since the functions M I (s) and Ω I (s) only have a right hand cut and Ω I (s) does not have a zero, the dispersion relations can be rewritten in the form In the simplified situation considered above, these equations indeed unambiguously fix the solution in terms of the polynomialsP I (s). Our numerical results indicate that the same is true also for the full set of coupled integral equations, but we do not have an analytic proof of this statement.

Subtraction constants, fundamental solutions
For the phase shift parametrizations we are using, the integrands vanish above 1.7 GeV. Hence convergence is not an issue -we could use unsubtracted dispersion integrals, i.e. set n I = 0 in (32). It is more convenient, however, to instead work with n 0 = 2, n 1 = 1, n 2 = 2, for two reasons: (i) Although the manifold of solutions is exactly the same, for the solutions obtained with n I = 0, the dispersion integrals are quite sensitive to the behaviour of the phase shifts above 0.8 GeV, which is poorly known -the sensitivity is compensated by a corresponding sensitivity of the subtraction constants, but the correlation leads to a clumsy error analysis. (ii) The choice is also more convenient for comparison with earlier work where the dispersion integrals were written in subtracted form. We now impose the constraints introduced in Sec. 2.4 to make the decomposition unique. Since M 0 (s) then grows only quadratically,P 0 (s) is of the form α 0 + β 0 s + γ 0 s 2 + δ 0 s 3 . The linear growth of M 1 (s) leads toP 1 (s) = α 1 + β 1 s + γ 1 s 2 and the condition M 1 (0) = 0 implies α 1 = 0. Finally, the asymptotic behaviour M 2 (s) ∝ s impliesP 2 (s) = α 2 + β 2 s and the condition M 2 (0) = M 2 (0) = 0 yields α 2 = β 2 = 0. The dispersion relations thus take the following final form: where the integration measure stands for The general solution of the constraints imposed by elastic unitarity and the asymptotic conditions thus involves altogether six subtraction constants: α 0 , β 0 , . . . , γ 1 . Note that these constraints are linear. The general solution of our system of integral equations is a linear combination of six fundamental solutions: The fundamental solutions only depend on the ππ phase shifts, are uniquely determined by these and can be calculated once and for all. The first one, M α 0 I (s), for instance, represents the solution of our integral equations for α 0 = 1, β 0 = . . . = γ 1 = 0. It can be calculated iteratively. As a starting point of the iteration, one may use the solution obtained if the phase shifts are set equal to zero, so that the dispersion integrals in (33) vanish and Ω I (s) = 1. In the case of M α 0 I (s), the starting point of the iteration is M α 0 0 (s) = 1, M α 0 1 (s) = M α 0 2 (s) = 0. Inserting the corresponding angular averages in the integrals in (26), the evaluation of (33) yields the result of the first iteration. The procedure can then be repeated, using this result as a new start. From the second iteration on, the complications in the evaluation of the angular averages discussed in Sec. 2.5 must be accounted for -they do affect the computing   time, but the iteration only requires a few steps to converge. Fig. 2 shows the result for this particular fundamental solution. The comparison of the first and last panels shows that the neutral component of the solution is dominated by the contribution from M 0 (s).

Taylor invariants
The subtraction constants are closely related to the coefficients of the Taylor expansion of the functions M 0 (s), M 1 (s), M 2 (s) in powers of s: In the form (33) of the dispersion relations, the six coefficients A 0 , B 0 , C 0 , D 0 , B 1 , C 1 uniquely determine the six subtraction constants α 0 , β 0 , γ 0 , δ 0 , β 1 , γ 1 and vice versa, but this only holds for the particular choice made, where some of the subtraction constants are set equal to zero.
The polynomial ambiguities in the isospin components amount to corresponding ambiguities in the Taylor coefficients. In the case of M 1 (s), for instance, the transformation law (20) amounts to a linear transformation of the Taylor coefficients belonging to this component: The sum over the isospin components remains the same, provided the coefficients of M 0 (s) and M 2 (s) are subject to corresponding transformations. The Taylor coefficients thus transform in a non-trivial manner under G 5 , but it is a simple matter to check that the six combinations are invariant. We refer to these quantities as Taylor invariants. They fully characterize the representation in a manner that does not depend on the choices made when decomposing M c (s, t, u) into the isospin components M 0 (s), M 1 (s), M 2 (s): Knowledge of the invariants K 0 , . . . , K 5 determines the isospin components up to polynomials that are irrelevant because they drop out in the sum. Instead of specifying the six subtraction constants, we can equally well specify the six Taylor invariants. This will be useful when comparing the dispersive solutions with the representations obtained from χPT.
For K 0 , the expression in terms of the subtraction constants is particularly simple. In the form (33) used for the dispersion relations, the coefficients A 2 and B 2 vanish, so that this invariant is determined by the first two coefficients of the Taylor expansion of the function M 0 (s): where ω 0 is the first derivative of the Omnès factor Ω 0 (s) at s = 0. Hence K 0 is related to the subtraction constants by K 0 = (1 + ω 0 s 0 )α 0 + s 0 β 0 . While α 0 is dimensionless, β 0 is of dimension 1/Energy 2 . Expressing the value of β 0 in GeV units, the relation takes the form

Nonrelativistic expansion
The nonrelativistic region concerns the behaviour of the functions M 0 (s), M 1 (s), M 2 (s) in the vicinity of s = 4M 2 π . The structure of the amplitude in that region is governed by the fact that the branch cut singularity generated by elastic final state interactions among two of the pions is of the square-root type: Below the inelastic thresholds, the amplitude has only two sheets -the functions M 0 (s), M 1 (s), M 2 (s) are analytic in the variable q = s/4M 2 π − 1. They can be expanded in a Taylor series: and likewise for M 1 (s) and M 2 (s). The velocity of the two particles in their center-of-mass system is given by v = q/ 1 + q 2 . Accordingly the series (39) essentially amounts to an expansion in powers of the velocity.
At a given value of s, the two sheets only differ in the sign of q. Hence the discontinuity is given by the contributions from the odd powers Our integral equations fully determine the amplitude as a linear combination of the subtraction constants and the coefficients of the nonrelativistic expansion inherit this property. This implies that only six of the coefficients are independent, m 0 0 , m 2 0 , m 4 0 , m 6 0 , m 2 1 , m 4 1 , for instance. All other coefficients of the nonrelativistic expansion can explicitly be expressed as linear combinations of these. In the nonrelativistic expansion, the integral equations thus boil down to an infinite set of linear relations among the expansion coefficients.
The nonrelativistic effective theory [37][38][39][40][41][42] represents an alternative framework for the analysis of the decay η → 3π. In the two-loop representation of the amplitude given in [38], the ππ phase shifts only enter via the first few terms of the effective range expansion. Indeed, the values do provide a rather accurate representation of the ππ scattering amplitude, throughout the physical region of η → 3π. They determine the coefficients of the loop integrals occurring in the NREFT representation of the functions M 0 (s), M 1 (s), M 2 (s). The representation of Ref. [38] does account for the mass difference between the charged and neutral pions, but otherwise neglects the electromagnetic interaction. It involves six low-energy-constants, To compare this framework with ours, we consider the isospin limit. In this limit, the pion mass difference disappears and only four of the LECs are independent: In the isospin limit, the one-loop integrals of the nonrelativistic effective theory are described by the function J(q) = i q/ 1 + q 2 , which only involves odd powers of q. At two loops, there are contributions proportional to the two-loop integral F (q) as well as terms proportional to J(q) 2 . The nonrelativistic expansion of F (q) involves odd as well as even powers of q. Chopping the expansion off at O(q 4 ) yields a very accurate representation of this function, throughout the physical region. If the loop contributions are dropped, M 0 (s) reduces to a quadratic polynomial in s, M 1 (s) becomes proportional to s, while M 2 (s) vanishes. The LECs L 0 , . . . , L 3 play a role analogous to the subtraction constants α 0 , . . . γ 1 of the dispersive framework, but there is a qualitative difference: While the LECs are real, the subtraction constants can be complex. Note also that the decomposition of the amplitude into isospin components is unique only up to polynomials. When comparing the components of the NREFT representation with those of dispersion theory, the polynomial ambiguities must be taken into account. This can be done with the method used when matching the dispersive and chiral representations. The polynomial ambiguities only affect the coefficients of the even powers of q. There are analogs of the Taylor invariants -suitable linear combinations of the coefficients m 2k 0 , m 2k 1 , m 2k 2 -that do not depend on the choice made when decomposing the amplitude into isospin components. Four such invariants are within reach of the two-loop representation. Hence there is a unique dispersive solution with four subtraction constants that matches the generic two-loop representation in the isospin limit. Alternatively, one may compare the dispersive and nonrelativistic amplitudes in the physical region and minimize the difference between the two. We will carry this out for one particular nonrelativistic representation in Sec. 5.9.

Current algebra, Adler zero
The leading term in the chiral expansion of the transition amplitude was worked out from current algebra, long before the formulation of χPT [28,29]. In the normalization (4), it exclusively involves s, M π and M η : The formula exhibits an Adler zero at s = 4 3 M 2 π . The zero is outside the physical region, where s is confined to 4M 2 π < s < (M η −M π ) 2 . The rapid growth of the observed Dalitz plot distribution does show that the square of the amplitude grows with s, but the leading term represents a decent approximation to the full amplitude only at small values of s. Already at s = 4M 2 π , the final state interaction generates a pronounced momentum dependence which in the chiral expansion starts showing up at NLO.

χPT to one loop
The chiral perturbation series of the transition amplitude was worked out to NLO in the framework of SU(3)×SU(3) in [11]. In this framework, the final state interaction manifests itself through one-loop graphs involving pions as well as kaons or η-mesons. The amplitude can be expressed in terms of the meson masses M π , M K , M η , the decay constants F π , F K and the low-energy constant L 3 . We use the numerical values F π = 92.28(9)MeV [66], F K /F π = 1.193(3) [27] and rely on the recently improved determination of L 3 from K 4 decay, L 3 = −2.63(46) · 10 −3 [67], so that the one-loop representation does not contain any unknowns.
While the dispersive representation yields an accurate description of the momentum dependence in the entire range from s = 0 to the physical region and even beyond, the truncated chiral expansion is useful only at small values of s, where it can be characterized by the lowest few coefficients of the Taylor series (36). The contributions from the loop graphs are determined by the masses of the Nambu-Goldstone bosons and the pion decay constant. The tree graphs, on the other hand, yield polynomials of up to O(p 4 ) in the momenta. The coefficients of these polynomials are in one-to-one correspondence with the Taylor coefficients A 0 , B 0 , C 0 , A 1 , B 1 , A 2 , B 2 , C 2 . Together with F π , these coefficients thus uniquely determine the one-loop representation.
The polynomial ambiguities also show up in the decomposition of the chiral representation. At one loop, the polynomial parts of M 0 (s), M 2 (s) are quadratic in s, while M 1 (s) is linear in s. The transformations (20), (21) retain this property only if a is set equal to zero. This shows that the polynomial ambiguities of the one-loop representation form a four-dimensional subgroup G 4 of the general invariance group G 5 associated with the decomposition (17). Only 8 − 4 = 4 combinations of the eight Taylor coefficients listed above are invariant under this group of transformations. We may identify these with what remains of the Taylor invariants K 0 , K 1 , K 2 , K 3 if the coefficients D 0 , C 1 , D 2 are dropped: Since K 0 does not contain D 0 , C 1 or D 2 , the quantity H 0 is identical with it -this combination is invariant under the full group G 5 . For H 1 , however, this is not the case: involves the coefficient C 1 , which is beyond reach at one loop, but is needed for K 1 to be invariant under the full group. The situation with K 2 and K 3 is similar: The constants H 0 , H 1 , H 2 , H 3 contain the essence of the one-loop representation: If they are known, the transition amplitude is uniquely determined by unitarity, to NLO of the chiral expansion (an explicit proof of this statement can be found in Appendix B). In this sense, the momentum dependence of the chiral representation is not of interest -dispersion theory provides better control over that. The general principles that underly dispersion theory, however, do not determine the subtraction constants. That is where χPT can offer useful information.
In the following, we will make use of the remarkably accurate experimental determination of the Dalitz plot distribution [22], which subjects the Taylor invariants to strong constraints. More precisely, since the distribution is normalized to 1 at the center, these data concern their relative size rather than the constants themselves. We use the invariant H 0 to parametrize the normalization of the amplitude and describe the relative size of the Taylor invariants by means of the variables While experiment yields strong constraints on h 1 , h 2 , h 3 , it cannot shed any light on the value of H 0 , because this term fixes the normalization of the amplitude M c (s, t, u) rather than A c (s, t, u), which is what can be measured. We need to rely on χPT to determine H 0 . At leading order of the chiral expansion, the normalization (4) implies H LO 0 = 1. Working out the Taylor coefficients of the one-loop representation, which is given explicitly in Appendix B, one readily verifies the representation The constants ∆ GMO and ∆ F stand for and the remainder contains the chiral logarithms typical of χPT -in the present case, it involves contributions proportional to M 2 π ln(M 2 π /M 2 η ) and to ln(M 2 K /M 2 η ). The relation (46) amounts to a low energy theorem: Up to contributions of next-to-next-to-leading order, the invariant H 0 is determined by the masses and decay constants of the Nambu-Goldstone bosons.
Remarkably, despite the fact that the η undergoes mixing with the η , the formula (46) only contains M η , while M η does not occur. The role played by the η in the low-energy structure of QCD is well understood. It can be studied in a systematic manner by invoking the large N c limit, where the η becomes massless and can be treated on the same footing as the Nambu-Goldstone bosons [68]. This framework gives a good understanding of the size of the LEC L 7 , which determines the deviation from the Gell-Mann-Okubo formula and enters the low-energy theorem via the term ∆ GMO . Indeed, as shown in Ref. [69], the contribution from this term in the low energy theorem (46) fully accounts for the effects generated by η-η -mixing at O(m quark ) -it would be wrong to supplement χPT with an extra wheel to account for η-η -mixing.
Note that the dependence on the decay constants is suppressed by a factor of M 2 π -if the two lightest quarks are taken massless, H 0 is fully determined by the masses of the Nambu-Goldstone bosons, up to NNLO contributions. At the physical values of the masses and decay constants, the term proportional to ∆ F amounts to 0.036. The contribution from the chiral logarithms is also small: chilogs = 0.037. The dominating contribution stems from the term ∆ GMO and amounts to 0. 103 Numerically, the singular term dominates the difference between h NLO 1 and h LO 1 . We conclude that it is meaningful to truncate the chiral expansion of the Taylor coefficients at NLO. The invariant X is approximated with the one-loop result X NLO and the uncertainties from the omitted higher orders are estimated at 0.3 |X NLO − X LO |. This is on the conservative side of the rule mentioned above and yields the following theoretical estimate for the four Taylor invariants: The estimate used for h 3 in particular also covers the comparatively small uncertainty in the value of L 3 .

χPT to two loops
Bijnens and Ghorbani [12] have worked out the chiral perturbation series of the transition amplitude to NNLO. The amplitude retains the form (17), but the isospin components M 0 (s), M 1 (s), M 2 (s) pick up additional contributions, which can be expressed in terms of the meson masses and the LECs that occur in the effective Lagrangian. As discussed above, elastic unitarity determines the one-loop representation in terms of the tree graph amplitude up to a polynomial, which can be characterized by the four Taylor invariants H 0 , . . . , H 3 . The situation at NNLO is analogous: Elastic unitarity determines the amplitude in terms of the one-loop representation up to a polynomial. Since the amplitude now includes terms of O(p 6 ), the polynomial is of higher degree and now contains six independent terms rather than four: Hence there are six combinations of Taylor coefficients that are independent of the choice of the decomposition. At two loops, all of the six Taylor invariants K 0 , . . . , K 5 are needed to characterize the representation.
The invariants K 0 , . . . , K 5 can also be used to characterize the solutions of our system of integral equations. The Taylor coefficients of the dispersive representation are given by linear combinations of the six subtraction constants and uniquely determined by these. Knowledge of the subtraction constants thus fixes the Taylor invariants K 0 , . . . , K 5 and vice versa: The degrees of freedom inherent in the two-loop representation are in one-to-one correspondence with the degrees of freedom occurring in our integral equations.
The Taylor coefficients of the representation specified in [12] can be worked out with the code provided by Bijnens and collaborators [70]. For the numerical values of the corresponding invariants K 0 , . . . , K 5 , we then obtain: The main problem with the two-loop representation is that it involves new low-energy constants. These arise from the effective Lagrangian of O(p 6 ) and are not known to a precision comparable to the parameters that enter the one-loop representation. They show up in the real parts of K 0 , . . . , K 5 . There is a parameter free prediction only for one of these: The invariant K 4 does not get a contribution from the low-energy constants of NNLO. 6 Estimating the uncertainties in the prediction for Re K 4 with the rule of Sec. 3.2, we obtain Re K 4 = 113 (34) .
As we will see in Sec. 6, where we compare the representation of Bijnens and Ghorbani with the outcome of our dispersive analysis, this prediction is perfectly consistent with experiment.

Imaginary parts at two loops
The coefficients of the Taylor expansion of the Omnès factors are real, but the expansion of the dispersion integrals in (33) in powers of s yields complex coefficients. Accordingly, the linear relations between the Taylor invariants and the subtraction constants involve complex coefficients. As the dispersion integrals arise from the discontinuities in the crossed channels, they are small: If the subtraction constants are real, the imaginary parts of the Taylor invariants are small. Indeed, in the chiral expansion, the Taylor invariants start picking up an imaginary part only at two loops. Unitarity implies that the leading terms in the chiral expansion of the imaginary parts only involve those low-energy constants that occur already in the one-loop representation of the transition amplitude, which are known: The imaginary parts of K 0 , . . . , K 5 represent parameter free predictions. Applying the rule given in Sec. 3.2 to estimate the uncertainties, we obtain As they are small, the imaginary parts of the subtraction constants do not play an important role in our analysis. In the letter version of our work [3], we shortened the presentation by simply setting the imaginary parts of the subtraction constants equal to zero and we stick to this approximation throughout the first part of the present paper. We will return to the issue in Sec. 5.7 and determine the changes occurring if we do not take the subtraction constants real, but instead fix the imaginary parts of the Taylor invariants with Eq. (52). As we will see, the modification barely affects our results.

Matching the dispersive and one-loop representations
At one loop, the Taylor invariants are known within rather small uncertainties. We now work out the dispersive representation that matches the one-loop representation in the sense that the behaviour of the functions M 0 (s), M 1 (s), M 2 (s) at small values of s is the same: the dispersive solution that possesses the same Taylor invariants. More precisely, as we are working with real subtraction constants, we can match only the real parts of the Taylor invariants.
Since only four of the invariants are within reach of the one-loop representation, fixing these does not suffice to determine the solution uniquely. We therefore consider a simplified setting by imposing stronger asymptotic conditions on the dispersive representation: The amplitude M c (s, t, u) is allowed to grow at most linearly when the Mandelstam variables become large. The subtraction constants δ 0 and γ 1 must then be set to zero because the fundamental solutions belonging to them violate the stronger form of the asymptotic condition. We fix the remaining four subtraction constants by requiring that the real parts of the four Taylor invariants of the dispersive representation agree with those obtained at one loop. With the central values in (49), this gives (GeV units) We refer to this solution of our integral equations as the matching solution. Although it does not represent a fit to data, we denote it by fitχ 4 , to simplify the notation used when comparing the various solutions to be discussed below. The label χ indicates that this solution makes use of the constraints imposed by chiral symmetry and 4 is the number of subtraction constants used. In order to compare the isospin components of the matching solution with those of the one-loop representation, we need to fix the decomposition of the latter. This can be done in such a way that the two representations match not only in the real parts of the Taylor invariants within reach of the one-loop representation, but in the real parts of the Taylor coefficients themselves. With this choice of the decomposition, the two representations for Re M 0 (s), Re M 1 (s), Re M 2 (s) agree at small values of s. Fig. 3 compares the matching solution with the chiral representation. By construction, the real parts of the two versions of the amplitude are very close at small values of s. The figure shows that, for the dominating contribution, Re M 0 (s), the more precise treatment of the final state interaction only generates a rather modest change in the physical region. In the small components, M 1 (s), M 2 (s), the changes are more pronounced. The relative size of the corrections is larger because these components vanish altogether at LO, so that the one-loop representation only gives the leading term of the chiral series -in M 0 (s), the one-loop representation is more accurate because it contains the leading as well as the first non-leading order of the series. The imaginary parts of the chiral representation vanish for s < 4M 2 π . Those of the dispersive representation are different from zero in that region, but are very small there because they exclusively arise from the crossed channels. Above threshold, however, the one-loop representation strongly underestimates the imaginary parts. It is not difficult to see why that is so: The dominating contribution to Im M 0 is the one proportional to sin 2 δ 0 . At one-loop, the representation for the ππ phase shifts enters at LO, where the scattering length of the I = 0 S-wave is given by Weinberg's current algebra result [71]: a LO 0 = 0.16 in pion mass units, below the prediction a 0 = 0.220(5) [16] by the factor 1.38. The one-loop representation underestimates the imaginary part of M 0 roughly by the square of this factor. √ t A 550 MeV. As far as the isospin components M 0 (s) and M 1 (s) are concerned, only their behaviour at small arguments of order s s A matters, but M 2 (s) is needed for s t A as well as for s s A . Adler's low-energy theorem thus concerns the behaviour of the amplitude not only at small values of s and u, but also in the vicinity of t = t A . In particular, the contributions from kaon loops to M 2 (t A ) are relevant. The fact that these do not move the position of the zero far away from the place where it occurs in current algebra shows that they do obey the constraints imposed by chiral symmetry.

Adler zero at one loop
For the matching solution, the Adler zero occurs in the same ball park: s A = 1.36M 2 π . By construction, the behaviour at small arguments is the same as for the one-loop representation, but Fig. 3 shows that the chiral and dispersive representations for Re M 2 (s) differ significantly in the physical region. The graph for Re M 2 in Fig. 3 is drawn on a sufficiently wide range to show that the two representations approach one another above the physical region and intersect at s 16.8M 2 π -this ensures that the two solutions have the Adler zero at approximately the same place.

Neutral decay mode
The plot for the neutral isospin component M n (s) in Fig. 3 can again barely be distinguished from the one for M 0 (s), because the exotic component M 2 (s) is small (in particular, the final state interaction in the channel with I = 2 is repulsive, so that the amplification seen in the channel with I = 0 does not occur.) The picture gives the impression that, in the physical region, the one-loop and dispersive representations of the transition amplitude of the neutral mode are practically the same. This is not the case, however. Fig. 5 shows that the corresponding Dalitz plot distributions are qualitatively different. At leading order, the Dalitz plot distribution of the neutral decay mode is flat, D LO n (s, t, u) = 1. At NLO, the distribution picks up a positive curvature: The parameter-free one-loop prediction for the slope of the Z-distribution [14] is positive and hence disagrees with experiment, even in sign (the definition and the properties of that distribution will be discussed in detail in Sec. 7.5). The more accurate account of the final state interaction provided by the matching solution (fitχ 4 ) makes a qualitative difference here: The curvature of this solution is negative. This points to a resolution of the puzzle mentioned in point 4. of the introduction. Indeed, as shown in [3] and discussed in detail in Sec. 7.3, the value of the slope predicted within our framework is in excellent agreement with experiment. Fig. 3 shows that at NLO, the neutral component M n (s) is quite close to the matching solution: In the physical region, the difference does not exceed 15 %. Fig. 5 shows, however, that in the corresponding Dalitz plot distributions, a difference of this size generates a qualitative change. To see why that is so, we expand the neutral component around the center of the Dalitz plot: In the total amplitude M n (s) + M n (t) + M n (u), the linear term drops out. For the Dalitz plot distribution, the expansion starts with the quadratic term: The dimensionless quantity α = 2 9 M 2 η (M η − 3M π ) 2 Re b n is referred to as the slope of the distribution. In the one-loop approximation, the quadratic term is so small that it can barely be seen in Fig. 3. In the matching solution, this term is more than twice as large and of opposite sign.
As noted above, in connection with the imaginary parts, the chiral representation only offers a crude, semi-quantitative description of the final state interaction. The comparison of the LO and NLO representations for M n (s) shows that, at the center of the Dalitz plot, the effects generated by this interaction are large: The one-loop contributions modify the tree level amplitude by more than 50 %. We conclude that the truncated chiral series does not have the accuracy required to make a meaningful statement about the slope.

Isospin breaking corrections
The decay η → 3π violates isospin conservation. As discussed in Sec. 2.1, the dominating contribution to the transition amplitude can be represented in the form (4), as a product of the factor (M 2 K 0 − M 2 K + ) QCD which breaks isospin symmetry and the factor M c (s, t, u) which is invariant under isospin rotations. The basic properties of the amplitude M c (s, t, u) were discussed in the preceding sections -we now turn to the remainder, which is of order . While the effects due to (m u − m d ) 2 are tiny, those from the electromagnetic interaction must properly be taken into account when comparing theory with experiment. In particular, the e.m. self-energy of the charged pion generates a mass difference to the neutral pion which affects the phase space integrals quite significantly.
In the literature, the corrections of order O[e 2 , (m u −m d ) 2 ] have been calculated by several groups, to different levels of accuracyi.e. to different orders of the expansion in the isospin breaking parameters. In the present paper we will rely on the work of Ditsche, Kubis and Meißner (DKM) [18], who evaluated the transition amplitude within the effective theory relevant for QCD+QED, to first non-leading order of the chiral expansion and to order e 2 in the electromagnetic interaction, with unequal up and down quark masses and in the presence of real as well as virtual photons. An earlier calculation by Baur, Kambor and Wyler [72], performed in the same framework, did not include effects of order e 2 (m u − m d ). These are of second order in isospin breaking and were deemed to be negligible. Ditsche, Kubis and Meißner, however, correctly observe that while terms of order (m u − m d ) 2 are indeed negligible, there are a number of effects which scale as e 2 (m u − m d ) and should be taken into account, like real and virtual photon corrections to the purely strong amplitude, and also, and most importantly, effects related to the pion mass difference, which are in particular responsible for the presence of cusps in the Dalitz plot of η → 3π 0 .  Figure 6: The left panel shows the Dalitz plot geometry for the decay η → π + π − π 0 in the plane of the two independent variables X c , Y c . The shaded area indicates the physical region, the full lines that are tangent to this region represent singularities generated by the final state interaction. In addition to the branch cut at s c = 4M 2 π + (full), the s-channel contains a further such singularity outside the physical region, at s c = 4M 2 π 0 (dash-dotted). The right panel shows the kinematics of the decay η → 3π 0 . In this channel, Bose statistics implies that the amplitude is invariant under rotations by 120 • as well as under reflections at the lines where t n = u n or s n = u n or s n = t n , which divide the physical region into six physically identical sextants -the data points in one of these determine the entire distribution. The branch cut singularities where s n , t n or u n are equal to 4M 2 π 0 are tangent to the boundary while those at 4M 2 π + are visible as cusps in the physical region.
Isospin breaking also affects the phase shifts of ππ scattering. We take these from the solution of the Roy equations reported in [16], which is done in the isospin limit. Our dispersive analysis is also carried out in that limit. In order to correct our results for isospin breaking effects, we make use of Chiral Perturbation Theory. We first study the effects of isospin breaking in this framework, comparing the representation of Ditsche, Kubis and Meißner [18], which does account for isospin breaking, with the one of Gasser and Leutwyler [11], which concerns the isospin limit. Our estimates for the size of the isospin breaking effects in the physical amplitudes rely on the assumption that these effects factorize, at least approximately. The branching ratio B = Γ η→3π 0 /Γ η→π + π − π 0 provides a strong test of the assumptions that underly our analysis.

Kinematics
The Mandelstam variables are not independent. We work with s and τ ≡ t − u. The value of the sum s + t + u depends on the masses of the particles occurring in the final state. We reserve the symbols s, t, u for the isospin symmetric world, use the variables s c , t c , u c for the charged decay mode and s n , t n , u n for the neutral mode. The constraints determine all of the Mandelstam variables in terms of (s, τ ), (s c , τ c ), (s n , τ n ).
Note that, up to normalization, τ coincides with the standard Dalitz plot variable X, while s is linear in Y . In the case of the charged decay mode, the relations read In these variables, the physical region is characterized by Since the masses of π 0 and π + differ, the final state interaction among the pions generates several different branch points. The left panel of Fig. 6 shows the location of these singularities for the charged decay mode, in the plane spanned by X c and Y c . They represent straight lines that touch the boundary of the physical region. The s-channel contains two branch points, one at 4M 2 π 0 , the other at 4M 2 π + . The straight line s c = 4M 2 π + also touches the boundary, while the line s c = 4M 2 π 0 runs outside the physical region. The singularities in the t-and u-channels occur at t c = (M π 0 + M π + ) 2 and u c = (M π 0 + M π + ) 2 , respectively.
The Adler zero discussed in Sec. 3.6 occurs along the line s c = u c , which is indicated as a dashed line, but the relevant value of s c is around 4 3 M 2 π , which is outside the range shown in this figure. The symmetry with respect to t ↔ u implies that an Adler zero also occurs along the line s c = t c , at the same value of s c .
The amplitude relevant for the decay into 3π 0 is invariant under the exchange of the three Mandelstam variables also in the presence of isospin breaking. Each of the three channels contains a pair of branch points at 4M 2 π 0 and 4M 2 π + . The right panel of Fig. 6 shows that the three straight lines with s n , t n or u n equal to 4M 2 π 0 touch the boundary of the physical region, while the other three branch cuts run across this region and manifest themselves as cusps in the Dalitz plot distribution. The relations between s n , τ n and the variables X n , Y n used in the figure are obtained from (58) by replacing M π + with M π 0 , while those among the variables s, τ and X, Y of the isospin symmetric world are reached with the substitutions

Isospin breaking at one loop
We denote the representations given in [18] for the amplitudes of the decays η → π + π − π 0 and η → 3π 0 by A DKM c (s c , t c , u c ) and A DKM n (s n , t n , u n ), respectively. In addition to the constants F π , F K , L 3 that occur in the one-loop representation already in the isospin limit, the expressions involve the two isospin breaking parameters δ = m d − m u and e, the meson masses M π + , M π 0 , M K + , M K 0 , M η , and a set of low-energy constants, K 1 , . . . , K 11 , which stem from the effective Lagrangian for the electromagnetic interaction. The infrared singularities occurring in loops that involve virtual photons are regularized by giving these a nonzero mass m γ . We work in the normalization (the constant N is specified in Eq. (4)): We have checked that, in the limit e → 0, m u → m d , these quantities indeed reduce to the isospin symmetric amplitudes M GL c (s, t, u), M GL n (s, t, u) of Gasser and Leutwyler [11]. Photon exchange generates poles in M DKM c (s c , t c , u c ) at s c = 0, t c = 0 and u c = 0. Moreover, the exchange of a photon between the charged pions in the final state gives rise to the so-called Coulomb pole, which in the one-loop representation is described by a triangle graph. It only shows up in the amplitude for the charged decay mode in the form of a contribution to the s-channel discontinuity, where T (s c ) stands for the current algebra approximation to the transition amplitude specified in (43). This contribution diverges at the boundary of the Dalitz plot, where s c → 4M 2 π + . Remarkably, despite these additional singularities, the one-loop representation obeys elastic unitarity also in the presence of photons: The amplitude M DKM c (s c , t c , u c ) can be expressed in terms of three functions of a single variable according to (17) and M DKM n (s n , t n , u n ) retains the form (18). Only the explicit expressions for the components are modified and the relation (19) between the components relevant for the charged and neutral decay modes is lost. As it is the case without isospin breaking, for the charged decay mode one function of a single variable is needed for the s-channel (S-wave) and two functions (S-wave and P-wave) for the t-and u-channels. The decay is necessarily accompanied by the emission of real photons and the comparison with the data must properly account for that. The main features of the phenomenon are universal and are thoroughly discussed in the literature [73]. Up to and including O(e 2 ), the rate of the decay η → π + π − π 0 contains two contributions, one from the square of the amplitude relevant for the decay without real photons in the final state, the other from the square of the amplitude for the emission of one real photon. It is well-known that both of these contributions are infrared divergent and that, in the sum of the two, the infinities cancel. The only physical remnant of the infrared divergences is that the probability for generating a real photon depends logarithmically on the upper limit set for the energy of the emitted photon. In the comparison with the data, the maximal photon energy in the rest frame of the η, which is denoted by E max , is determined by the experimental resolution.
The DKM-representation is regularized by giving the virtual photons a mass m γ . The explicit expression for the amplitude M DKM c (s c , t c , u c ), which represents the transition without real photons, diverges logarithmically if m γ is sent to zero. To leading order in the chiral expansion, the divergent part is given by while the divergence of the soft-photon contribution is of the form To leading order of the chiral expansion, where the finite part in (62) is given by T (s c ), the divergences thus cancel as they should: In effect, adding the contribution from the production of real photons converts the divergent term ln(m 2 γ /M 2 π ) into the finite expression ln(4E 2 max /M 2 π ). At leading order of the chiral expansion, the production of real photons with E < E max can therefore be accounted for in a very simple manner: Stick to the amplitude relevant for the decay without emission of real photons, equip the virtual photons with a mass m γ and set m γ = 2E max . This also provides us with an estimate of the sensitivity to E max : Replacing m γ by 2E max in the one-loop representation of [18] and varying E max in the range M π < 2E max < M η , the quantity |M DKM c (s c , t c , u c )| 2 only changes by half a permille. We conclude that, at the present accuracy, the sensitivity to the experimental resolution is an academic problem and set 2E max = M π . Apart from that, we follow the prescriptions used by Ditsche, Kubis and Meißner [18] to compare the calculated amplitudes with the experimental results (see the discussion in Sect. 3.2.6 therein). In particular, we assume that the Coulomb pole specified in (61) is accounted for in the data analysis and replace the amplitude of [18] Neither photon emission nor the Coulomb pole enter the amplitude M DKM n (s n , t n , u n ), which we take over from Ref. [18] as it is.

Self-energy effects
In the decay η → π + π − π 0 , the self-energy of the charged pion directly affects the kinematics, as it is relevant for the size of the physical region and for the value of s c + t c + u c . The selfenergy of the charged pion increases its mass and hence reduces the phase space available in the charged decay mode -since phase space is small, this makes a significant difference, which must be accounted for. In early work on η-decay, this was done only very crudely: In the calculation of the decay rate, the square of the isospin symmetric amplitude was simply integrated over the physical phase space rather than the isospin symmetric one.
The one-loop representation allows us to separate the self-energy effects from the re-maining contributions generated by the electromagnetic interaction: The amplitude can be evaluated at the physical masses of the mesons even if e is set equal to zero. The left panel of Fig. 7 depicts the square of the ratio , along the lines s = u and t = u. It shows that the remaining electromagnetic contributions vary in the narrow range 0.997 < |K e c | 2 < 1.022. As seen in the right panel, the square of the correction factor K e n = M DKM n (s, t, u)/M DKM n (s, t, u) e=0 relevant for the neutral channel is also of the order of 1 %, but nearly constant over the entire physical region: 0.98757 < |K e n | 2 < 0.98765. This implies that in the Dalitz plot distribution of the decay η → 3π 0 , the corrections generated by the electromagnetic interaction are totally dominated by the self-energy effects.

Kinematic map for
Any comparison of an isospin symmetric transition amplitude with experiment requires that the values of s and τ that correspond to a given point s c and τ c of physical phase space are specified -a map from the physical world into the space spanned by the variables s and τ is needed: The map is all but unique, but not any choice is acceptable. The simplest possible one, for instance, the trivial map s = s c , τ = τ c , fails because it generates fictitious singularities: The branch point t = 4M 2 π is mapped into a line of constant t c , but the value 7 of the constant, 1 2 Hence the image of the singularity crosses the physical region: The trivial map produces a fictitious cusp in the Dalitz plot distribution.
In current algebra approximation, the amplitude only depends on s and the one-loop representation shows that the variable τ does not play an important role at NLO, either. The representation of Ditsche, Kubis and Meißner [18] indicates that this remains true even in the presence of isospin breaking: The leading terms 8 of the Taylor series of the map (64) in powers of τ c , suffice to obtain a good understanding of the deformation of phase space generated by the electromagnetic interaction. The coefficients f c [s c ], g c [s c ] can be chosen such that the map does not generate any fictitious singularities in the physical region: It suffices to impose the condition that the boundary of physical phase space is taken into the boundary of isospin symmetric phase space. We refer to such maps as boundary preserving. Since the branch points of the isospin symmetric amplitude relevant for the charged mode do not pass through the physical region, their image will automatically also have this property. The requirement amounts to the condition which fixes one of the coefficients of the map in terms of the other: The function τ max c (s c ) is specified in (59), while τ max (s) is obtained from this one with We choose a parabola that goes through these two points and, in addition, maps the center of the physical Dalitz plot into the center of the isospin symmetric one. We adopt the definition used in phenomenological analyses of the data, where the center is specified in terms of the standard Dalitz plot variables of Eq. (58), as the point with the coordinates X c = Y c = 0. It sits at , slightly to the right of the place where s c = t c = u c , i.e. where the dashed lines in Fig. 6 intersect. The explicit expression for f c [s c ] involves M π + , M π 0 as well as M π , M η and is rather clumsy. In the convention we are using, where the isospin limit is taken such that M π + stays put (M π = M π + ), it simplifies to The deformation of the trivial map s = s c needed to preserve the boundary is measured by the coefficients p c , q c , which are proportional to M π + − M π 0 . This difference is dominated almost totally by the self-energy of the charged pion. Numerically, the deformation is small throughout the physical region: The difference between s c and s reaches the maximum at the upper end of the range of interest and amounts to 2.2 % there, but this suffices to ensure that the lines s = 4M 2 π , t = 4M 2 π and u = 4M 2 π , where the amplitude is singular, do not enter the physical region. Note that the map is fully specified by the meson masses -in this sense, the deformation of phase space discussed in the present section represents a purely kinematic effect. As will be shown in the next section, the full modification brought about by isospin breaking at one loop includes a second, qualitatively different contribution that is approximately constant over phase space. Hence it affects the Dalitz plot distribution only little, but has an important effect on the rate of the decay.
The extension to the decay η → 3π 0 meets with a technical problem: The map obtained by applying the above construction to the corresponding transition amplitude does take the physical region of the neutral Dalitz plot onto the isospin symmetric one, but does not respect Bose statistics, because it does not treat s on equal footing with t and u. As shown in Appendix C, this shortcoming is easily cured -the kinematic map specified in (172)-(176) does preserve the symmetry under exchange of s, tand u as well as the boundary and the center of the physical region. In the following, we use this map to analyze isospin breaking effects in the neutral channel.

Applying the kinematic map to the one-loop representation
We now apply the map constructed in the preceding section to the one-loop representation. At that level, the isospin symmetric amplitude is given by M GL c (s, t, u). The boundary preserving map defined in (65), (67), (68) expresses the variables s and τ = t − u in terms of those relevant for the physical phase space of the charged decay mode. With the constraint (57) for s + t + u, the variables t and u can also be expressed in terms of s and then lives on physical phase space and has the three branch points that occur at the boundary of the physical region, s c = 4M 2 π + , t c = (M π 0 +M π + ) 2 , u c = (M π 0 +M π + ) 2 , at the proper place. The only qualitative difference with the full one-loop amplitude M DKM c (s c , t c , u c ) is that the branch cut due to π + π − → π 0 π 0 → π + π − , which occurs outside the physical region at s c = 4M 2 π 0 , is missing. We use the ratio to account for the difference between the full amplitude and the one obtained from the isospin symmetric representation with a purely kinematic map. The left panel of Fig. 8 shows that, in the physical region and along the line t c = u c , this ratio is roughly constant at one loop. The same is true along the line s c = u c . Indeed, in the entire physical region, the factor |K c (s c , t c , u c )] 2 only varies in the range 1.031 < |K c | 2 < 1.078. The right panel of Fig. 8 shows the square of the analogous factor relevant in the neutral channel, It describes those effects in the one-loop representation of the decay η → 3π 0 that are not already accounted for by the kinematic map (the explicit expression for M ∼ GL n is given in Appendix C). Visibly, in the neutral decay mode, the residual corrections are even smaller than in the charged mode: Their square only varies in the range 0.972 < |K n | 2 < 0.978. The Dalitz plot distribution of the decay η → 3π 0 is affected by less than half a percent. For t n = u n , the physical region is characterized by 4M 2 π 0 ≤ s n ≤ (M η − M π 0 ) 2 . The small cusp generated by the virtual transition π 0 π 0 → π + π − → π 0 π 0 occurs within that range, at s n = 4M 2 π + . In the right panel of Fig. 8, it shows up near the vertical line that marks the lower end of the physical region.

Correcting the dispersive solutions for isospin breaking effects
In order to clearly distinguish the isospin symmetric dispersive representations M c (s, t, u), M n (s n , t n , u n ) from those that include isospin breaking effects, we denote the physical amplitudes by M phys c (s, t, u), M phys n (s, t, u) and work in the normalization The approximation we are using to account for isospin breaking applies two steps:  (69). Since this operation takes the constraint π -this is where they are uniquely defined. Moreover, the map takes center and boundary of the physical Dalitz plot into center and boundary of the isospin symmetric phase space. Analogous statements hold for the neutral channel -the kinematic map relevant in that case is specified in Appendix C.
(ii) We assume that the remaining isospin breaking effects can be estimated with the one-loop representation and approximate the physical amplitude with Note that we are treating the residual corrections multiplicatively. We expect this prescription to provide a decent estimate even in the physical region: While Fig. 4 shows that the one-loop representation as such has a pronounced momentum dependence and reproduces the curvature of the dispersive solution only semi-quantitatively, the ratios K c , K n vary comparatively slowly and stay close to unity throughout the physical region. The main difference between the two decay modes is that, for η → π + π − π 0 , the residual corrections increase the square of the amplitude at the center by 7.6 % and hence increase the decay rate, while for η → 3π 0 , the opposite is the case: At the center, the square of the amplitude is reduced by 2.6 %. As will be discussed in Sec. 7.1, the comparison of the results obtained for the branching ratio B = Γ η→3π 0 /Γ η→π + π − π 0 with the experimental results offers a strong test of the approximations used to account for isospin breaking.
While in the neutral channel, the residual corrections affect the Dalitz plot distribution only very little, the momentum dependence of the amplitude relevant for the charged decay mode is not properly accounted for by the kinematic map. The contribution from the triangle graph is singular at s = 4M 2 π + , but we have removed that singularity by subtracting the Coulomb pole specified in (61). As shown in Appendix B.4, the spike occurring there does not arise from the triangle graph, but from the interference between the contributions generated by the branch cuts in the s-channel (final state interaction among the pairs π + π − and π 0 π 0 ) with those in the t-and u-channels due to π ± π 0 pairs. We assume that the oneloop approximation does provide a decent estimate for the distortion of the discontinuities generated by the electromagnetic interaction and expect that multiplying the amplitudes of the charged and neutral decay modes with the ratios yields a good approximation of the physical distribution. This implies, in particular, that we are accounting for the cusps that run through the physical region of the decay η → 3π 0 only in one-loop approximation. We will compare the resulting parameter free prediction for the Dalitz plot distribution of the decay η → 3π 0 with experiment in Sec. 7 -this comparison offers another good check on the internal consistency of our framework.

Experiment
The most precise measurement of the Dalitz plot of η → π + π − π 0 and the one on which our analysis has been based is the recent one by KLOE [22], but the experimental measurements of this decay in the charged and neutral channel have a long history, which we are going to briefly review here. The first measurements of the Dalitz plot of η → π + π − π 0 have been performed already in the seventies [74,75] and led to a rough determination of the leading coefficients occurring in the standard parametrization of the distribution, 9 as quoted in Table 1. The same measurement was performed by Crystal Barrel at LEAR in 1998 [76], with less precise (because of the low statistics) but compatible results.
Only more recently has the interest in such a measurement been revived again and thanks to the existence of experimental facilities, like DAΦNE, MAMI or COSY, and detectors like KLOE and WASA, a new series of more precise measurements has been performed. KLOE made a first measurement in 2008 [19], with a much more precise determination of the three parameters a, b and c and for the first time of the parameter f . This measurement has been repeated by the WASA-at-COSY collaboration [20] and more recently by the BESIII collaboration [21]. The latest measurement is again due to KLOE [22], and is based on the largest statistic sample of about 5 million decays (for comparison, WASA has 30 and BESIII 60 times less events). The values of the individual Dalitz plot parameters, all shown in Table 1, seem to differ somewhat among these recent measurements but it is difficult to draw conclusions about a possible discrepancy by just looking at central values and errors, because there are strong correlations among the parameters.
A more effective way to judge the compatibility of the different measurements is to fit them with the same parametrization and calculate the χ 2 for each of the data sets. Unfortunately this is only possible for the latest KLOE data [22] and for those of WASA [20], because only these have published unfolded data in the form of a bidimensional bin distribution. For these two data sets, we find: [75] 1.080 (14) 0.34 (27) (7) −4.4(9) Table 1: Experimental values of the Dalitz plot parameters of η → π + π − π 0 . The two entries for KLOE (16) correspond to their fits with 4 and 5 free coefficients, respectively (fit#3 and fit#4).
• In view of the much larger statistics, KLOE data dominate any common fit; the inclusion of the WASA data barely shifts the parameters and any outcome of the fit.
• The compatibility among the two data sets is marginal: A common fit (with six subtraction constants, i.e. five fit parameters) gives χ 2 K = 371 for 371 data points and χ 2 W = 84 for 59 data points. • Fitting WASA data by themselves gives a much better χ 2 : χ 2 W = 49, but this would be totally incompatible with KLOE, as the corresponding χ 2 is huge.

Fitting the KLOE distribution for
In our analysis, the recent KLOE data [22,77] play the central role. In this experiment, the Dalitz plot distribution of the decay η → π + π − π 0 is determined to high accuracy, splitting phase space into altogether 371 bins. The binning is based on the Dalitz plot variables X c , Y c specified in Eq. (58). We denote the values of X c , Y c at the center of bin #i by Y c = 0.05 Figure 9: Fits to the KLOE data on the Dalitz plot distribution of η → π + π − π 0 . To make the different entries visible, the distribution obtained from current algebra is subtracted.

Dispersive fits to the KLOE data without theoretical constraints
In Sec. 3.5, we determined the dispersive solution that matches the one-loop representation at low energies, allowing for only four subtraction constants. We now consider the opposite: Ignore the information obtained from χPT and exclusively make use of the data on the Dalitz plot distribution. Again, we only allow for four subtraction constants, setting δ 0 = γ 1 = 0.
The minimum occurs at . We refer to this fit to KLOE with 4 subtraction constants as fitK 4 . It is of remarkably good quality: χ 2 K = 390 for 371 data points and 4 free parameters. Fig. 9 compares various fits with the KLOE data. Since the value of Λ K depends on the fit, we leave the data as they are and divide the dispersive representations by this factor -instead of showing the normalized observed distribution. Moreover, for better visibility, the leading term of the chiral expansion, The left panel of Fig. 9 corresponds to the one on the left of Fig. 8: Fig. 8 concerns the correction factor |K c | 2 used to account for some of the isospin breaking effects, we are now considering the Dalitz plot distribution of the full amplitude. The comparison shows that the spike occurring in |K c | 2 near s c = 4M 2 π + also manifests itself in the Dalitz plot distribution near Y c = 0.895, but in rather modest form. For the reasons given in Sec. 4.3, the spikes in |K c | 2 and in D c are of opposite sign. A dedicated experimental study is required to resolve the structure in the vicinity of s c = 4M 2 π + . The most important aspect of the solution obtained by fitting the measured Dalitz plot distribution concerns the comparison with the matching solution discussed earlier. The two solutions exclusively differ in the values of the subtraction constants: While those relevant for the matching solution are given in Eq. (53), the fit to the KLOE data is characterized by Eq. (78). In order to compare fitK 4 with the estimates obtained from χPT, we work out the real parts of the Taylor invariants belonging to this fit. The result reads: Remarkably, these numbers are within the range estimated in (49): Although chiral symmetry was not made use of in the derivation of fitK 4 , the resulting transition amplitude is consistent with the estimates based on the low-energy theorems that follow from it. This neatly confirms that the uncertainty estimates we are attaching to the Taylor invariants are on the conservative side. Moreover, the solution fitK 4 does contain an Adler zero along the where it was predicted long ago, on the basis of current algebra [28]. This provides a good check on the internal consistency of our framework.

Theoretical constraints
Since the experimental and theoretical sources of information are consistent with one another, it is meaningful to combine them. We do this by introducing a discrepancy function that measures the deviation from the theoretical estimates: The quantities H NLO Let us first treat all six subtraction constants as well as the normalization Λ K of the Dalitz plot distribution as free parameters. We use the symbol fitKχ 6 for this fit, to indicate that it relies both on the KLOE data and on the theoretical constraints obtained from χPT and involves 6 subtraction constants. The fit represents a compromise between the minima of the experimental and theoretical discrepancies: The quality of the fit to the data is slightly better than in the case of fitK 4 -not a surprise: We are allowing for six rather than only four subtraction constants. The price to pay is that the theoretical discrepancy increases. By construction χ 2 th vanishes for fitχ 4 , takes the value χ 2 th = 0.67 for fitK 4 and reaches χ 2 th = 1.47 for fitKχ 6 . Fig. 10 displays the behaviour of the real parts belonging to the various dispersive solutions all the way down to s = 0 (while the curves for the Dalitz plot distribution shown in Fig. 9 account for the corrections due to isospin breaking, those for ReM represent the isospin symmetric solutions as they are). Remarkably, in the entire range shown, fitKχ 6 runs close to fitχ 4 , the matching solution specified in Sec. 3.5.
In addition to the representations fitχ 4 , fitK 4 and fitKχ 6 we discussed above, Fig. 10 shows a fourth solution, fitK 6 . The only difference between this solution and fitK 4 is that δ 0 and γ 1 are not set equal to 0, but are treated as free parameters. Accordingly, this fit follows the data even more closely: χ 2 K = 371 for 371 data points and 6 free parameters. Fig. 9 shows that, in the physical region, the Dalitz plot distributions belonging to fitK 4 and fitK 6 are nearly the same. Outside the physical region, however, fitK 6 goes astray: This solution of our system of integral equations is not acceptable, because it does not have an Adler zero at all. The clash with chiral symmetry also manifests itself in the Taylor invariants: fitK 6 yields Re h K 5 3 = 59.8, for instance, which differs from the theoretical estimate h 3 = 6.3(2.0) in (49) by 28 σ. This indicates that -with six subtraction constants -there is too much freedom in the space of solutions for the experimental information about the Dalitz plot distribution to control the behaviour of the transition amplitude outside the physical region.
The fact that fitKχ 6 does have an Adler zero at s A = 1.39 M 2 π shows that the theoretical constraints do provide the missing information: The only difference between fitK 6 and fitKχ 6 is that the latter accounts for these while the former does not. The theoretical constraints barely matter in the physical region, but play an important role in the extrapolation to small values of s. The properties of the amplitude at small values of s are essential, because theory is needed to determine the normalization of the amplitude. Since the relevant Taylor invariant, H 0 , represents a linear combination of the subtraction constants α 0 and β 0 , it concerns the value and the first derivative of the component M 0 (s) at s = 0.

Error analysis
The uncertainties in our results are dominated by the statistical errors. These are determined by the behaviour of the discrepancy function in the vicinity of the minimum. In connection with the fits to the measured Dalitz plot distribution of the charged decay mode, the normalization constant H 0 is irrelevant -we keep it fixed at the value found at one loop. Also, since none of the observables of interest in the present context depends on Λ K , we fix this parameter at the minimum, which is nearly the same for all fits: Λ K 0.938. The discrepancy function χ 2 tot then depends on five independent real variables, which can, for instance, be identified with β 0 , γ 0 , δ 0 , β 1 , γ 1 . We rely on the Gaussian approximation, which exploits the fact that, in the vicinity of the minimum, the discrepancy function can be approximated by the truncated Taylor series in all five variables. The calculation is described in detail in Appendix D.
The uncertainties inherent in the input used for the ππ phase shifts must also be accounted for. These were discussed in Sec. 2.6. We have worked out the response of the dispersive representation to variations in the Roy solutions of [16], not only below 800 MeV where the uncertainties are small, but also at higher energies where dispersion theory does not provide strong constraints -for details see Appendix E. The resulting uncertainties in the subtraction constants are small compared to the Gaussian errors discussed above, except for γ 0 : This term is relatively sensitive to the high energy tail of the dispersion integralsthe corresponding uncertainty is comparable to the Gaussian error.
The kinematic map we are using to embed the isospin symmetric dispersive representation in the physical world accounts for the effects due to the mass difference between the charged and neutral pions only rather crudely. We rely on the one-loop approximation of Ditsche, Kubis and Meißner [18] to correct for all other effects that (i) are generated by the e.m. interaction and (ii) are not taken care of when applying radiative corrections to the data. We consider the difference between our results and those obtained by neglecting the isospin breaking effects altogether and estimate the uncertainty of our treatment of these effects at 30 % of that difference.
The errors listed in Table 2 are obtained by adding the Gaussian errors, those from the ππ phase shifts and those related to isospin breaking in quadrature, Table 2 compares central values and errors obtained for various fits to the KLOE data with the matching solution fitχ 4 , which exclusively involves theoretical ingredients. Fits that do not carry the label χ are obtained by ignoring the theoretical constraints altogether.  Table 2: Comparison of the matching solution fitχ 4 with fits to the KLOE Dalitz plot distribution for η → π + π − π 0 . The presence or absence of the label χ indicates whether or not the theoretical discrepancy (80) is included in the minimization procedure and the index specifies whether four, five, or six subtraction constants are taken different from zero (in the chosen normalization, α 0 is tied to β 0 according to α 0 = 0.8594 − 0.08736 β 0 ).

Number of subtraction constants, significance of theoretical constraints
In this sense, the first two lines represent two extremes: While fitχ 4 only relies on theory, fitK 4 only relies on experiment. For a detailed comparison of these two solutions, we refer to the end of Sec. 5.3. Table 2 shows that the central values of all of the subtraction constants of fitK 4 are within the uncertainty range of fitχ 4 and vice versa. In other words, the fit to the data automatically satisfies the theoretical constraints. This can also be seen in the value χ 2 th = 0.67 obtained with fitK 4 : The central values of h 1 , h 2 , h 3 obtained from the KLOE data are all in the predicted range.
The entries for χ 2 K , on the other hand, show that fitχ 4 differs strongly from fitK 4 : While the latter represents an excellent fit of the 371 data points with χ 2 K = 390, the former yields χ 2 K = 799. Superficially, this may give the impression that the matching solution is ruled out by experiment, but this is by no means the case. In view of the uncertainties attached to the predictions for h 1 , h 2 , h 3 , the matching procedure leads to an entire family of solutionsfitχ 4 merely represents the central one of these. The very fact that fitK 4 is a member of this family shows that the KLOE data on the Dalitz plot distribution of η → π + π − π 0 confirm the theoretical estimates based on the assumption that the strong interaction possesses a hidden approximate symmetry.
In the derivation of fitKχ 4 , both the KLOE data and the theoretical constraints are made use of. The comparison with fitK 4 shows, however, that this barely makes any difference. In particular, the values of χ 2 th and χ 2 K obtained with these two fits are nearly the same. The solution fitK 5 differs from fitK 4 in that the subtraction constant δ 0 is not set equal to zero, but is treated as a free parameter. Table 2 shows that the solution then changes quite drastically: (1) the minimum occurs at a value of δ 0 that differs from zero by about two standard deviations, (2) the quantities β 0 , γ 0 and β 1 are also pushed outside the range found with fitK 4 or fitKχ 4 and (3) the value of χ 2 th becomes very large. This shows that fitK 5 very strongly violates the theoretical constraints. The situation is similar to the one encountered with fitK 6 in Sec. 5.4: The data are not accurate enough to pin down more than four parameters. Both fitK 5 and fitK 6 must be discarded -they represent unphysical solutions of our integral equations.
The theoretical constraints domesticate the manifold of solutions if more than four subtraction constants are treated as free parameters. In fact, it does then not make much of a difference whether five or six subtraction constants are treated as free parameters. In either case, the solution is consistent with the theoretical constraints and the common subtraction constants agree within errors. Moreover, fitKχ 6 , which treats γ 1 as a free parameter, yields a result with a broad uncertainty range -the value γ 1 = 0 that corresponds to fitKχ 5 is within that range. The discrepancy function  (49) shows that, within errors, these numbers are consistent with the estimates based on χPT.
The shape of the Dalitz plot distribution is tightly constrained by experiment. Indeed, Fig. 9 shows that for the behaviour in the physical region, it barely makes a difference whether four or six subtraction constants are treated as free parameters. The numbers for χ 2 K in Table 2 confirm this: The fits fitKχ 4 , fitKχ 5 and fitKχ 6 all describe the data very well. We conclude that, as far as the momentum dependence in the physical region is concerned, the description of the observed behaviour does not require more than four subtraction constants.
In order to establish contact with QCD and with the quark mass ratio Q, however, we need to be able to calculate the decay rate. In this connection, the normalization of the amplitude plays a key role -it is not accessible experimentally because it drops out in the Dalitz plot distribution. As discussed above, we specify the normalization of the dispersive representation with the Taylor invariant H 0 , which only concerns the behaviour of the component M 0 (s) at small values of s. For the rate, the value of the amplitude instead counts at the center of the Dalitz plot. We need to understand the relation between the two. For this purpose, we consider the quantity which compares the value of the dispersive representation at the center of the Dalitz plot (X c = Y c = 0) with the Taylor invariant H 0 . Qualitatively, N 1 represents the amplification generated by the final state interaction at the center of the physical region. At tree level, the final state interaction is ignored: N 1 = 1. The one loop representation yields N 1 = 1.33.
For those fits to the KLOE data that are physically meaningful, the value of N 1 is listed in  To discuss the implications of this result, we consider the correlation between N 1 and the slope a of the Dalitz plot distribution at the center, that is, the term linear in Y c in (75). Fig. 11 shows that it makes a significant difference whether the subtraction constant δ 0 is set equal to zero (fitK 4 , fitKχ 4 ) or treated as a free parameter (fitKχ 5 , fitKχ 6 ). If δ 0 is set equal to zero then N 1 is determined very sharply. In fact, the solution then becomes so stiff that the result for N 1 is outside the range obtained if δ 0 is allowed to float. In somewhat milder form, the problem also manifests itself in Table 2: The value δ 0 = 0 is about two standard deviations away from the results obtained with fitKχ 5 or fitKχ 6 . This shows that setting δ 0 = 0 amounts to introducing a systematic theoretical error, which pulls the amplitude down by about 9 percent.
Four subtraction constants do suffice to properly describe the momentum dependence in the physical region of the decay, but to cope with the theoretical constraints that follow from the fact that the particles involved in this decay are Nambu-Goldstone bosons of a hidden approximate symmetry, an extrapolation from the physical region all the way down to the Adler zero is required. We conclude that with only four subtractions, the dispersive representation does not provide a controlled extrapolation: δ 0 cannot simply be set equal to zero, but needs to be determined by experiment.
For γ 1 , the situation is different: Since the value γ 1 = 0 is close to the center of the range obtained if this parameter is allowed to float, it does not make much of a difference whether or not we keep it fixed at zero. The advantage of using six subtractions rather than five is that the uncertainties associated with the contributions from the high energy tails of the dispersion integrals are then reduced. For this reason, we identify our central solution with fitKχ 6 .

Imaginary parts of the subtraction constants
As discussed in Sec. 3.4, the subtraction constants pick up an imaginary part at NNLO of the chiral expansion. In fact, at two loops, the imaginary part is fully determined by the one-loop representation and does therefore not involve any unknowns. The imaginary parts of the Taylor coefficients depend on the choice of the decomposition, but those of the invariants K 0 , . . . , K 5 are unambiguous. In the present section, we investigate the changes occurring in our central solution if instead of taking the subtraction constants to be real, the values of ImK 0 , . . . , ImK 5 are taken from the two-loop representation of Bijnens and Ghorbani [12], which are listed in Eq. (52). We denote this version of the central solution by FitKχ 6 , to distinguish it from the solution fitKχ 6 considered above, for which the subtraction constants are real. For the Dalitz plot distribution, the normalization of the amplitude is irrelevant. We fix it by using the one-loop result for the real part of K 0 ≡ H 0 . Table 4 compares the real parts of the subtraction constants belonging to FitKχ 6 with those of fitKχ 6 , which are real by construction. It shows that the differences between the two versions of our central solution are negligibly small compared to the uncertainties therein.   Table 5 shows that the same conclusion is reached if instead of the real parts of the subtraction constants we compare the real parts of the Taylor invariants Re K 1 , . . . , Re K 5 or the position of the Adler zero for the two variants of our central solution. The Adler zero is determined to an accuracy of about 8 % and occurs in the immediate vicinity of the current algebra prediction,  Table 5: Taylor invariants and position of the Adler zero for the two variants of the central solution.
Since the difference between the two versions of the central solution is in the noise of our calculation, we do not pursue it further. In Sec. 6, where we discuss the difference between the two-loop representation of χPT and the dispersive representation that matches it at low energies, we consider the version FitKχ 6 , because it matches the imaginary parts as well as the real parts. Throughout the remainder of the paper, however, where we draw the conclusions from our analysis, we stick to real subtraction constants and work with the version fitKχ 6 of the central solution.

Dalitz plot coefficients of our central solution
To complete this discussion of the dispersive representation in the charged channel, we approximate our central solution with a polynomial of the form (75). The result reads It is not surprising that these numbers are close to those obtained by KLOE (last row in Table 1) -the two representations of the Dalitz plot distribution differ by less than 1.2 %, in the entire physical region. The difference arises because we are imposing theoretical constraints. Indeed, dropping these, i.e. replacing our central solution by fitK 6 , the coefficients of the polynomial approximation reproduce those obtained by KLOE within errors. This shows that (i) with 6 subtraction constants, the dispersive framework is flexible enough to describe the KLOE data well and (ii) the available experimental information is consistent with the theoretical constraints.
The parametrization (75) amounts to a polynomial in the Mandelstam variables s, t, u. Unitarity generates branch points at the boundary of the physical region (the corresponding cusps in the real part of the amplitude can be seen e.g. in Fig. 10). Outside the physical region, a polynomial parametrization of the Dalitz plot distribution cannot provide a reliable improvement of the current algebra formula, The dispersive framework we are using does account for the singularities required by unitarity, but as discussed in Sec. 5.6, a fit to the KLOE distribution that simply treats the subtraction constants as free parameters leads to solutions that violate chiral symmetry. We are exploiting the fact that this symmetry imposes strong conditions on the amplitude at small values of s, in particular also near the Adler zero. Although these conditions do not significantly constrain the amplitude in the physical region, they are essential for the interpretation of the experimental results in the framework of the Standard Model.

Comparison with the nonrelativistic effective theory
As discussed above, the Dalitz plot distribution is well described by the dispersive representation with four real subtraction constants. The fit to the KLOE data obtained in that framework, fitK 4 , does have an Adler zero in the vicinity of the current algebra prediction and also yields values for the Taylor invariants h 1 , h 2 , h 3 that are consistent with the theoretical constraints. We now compare the dispersive solutions with the two-loop representation of the nonrelativistic effective theory for the transition η → 3π set up in Ref. [38]. As this representation does not account for the electromagnetic interaction, we consider the isospin limit, setting M π 0 = M π ± and fixing the low-energy constants K 0 , K 1 with (42). Since the Dalitz plot distribution does not fix the normalization of the amplitude, we set L 0 = 1. The fit to the KLOE data then yields the following values in GeV units: With χ 2 K = 370.3 for 371 data points, the fit is of excellent quality, even better than fitK 4 . Next, we look for a solution of our integral equations that matches the nonrelativistic representation. Instead of matching the coefficients of the nonrelativistic expansion as discussed in Sec. 2.10, we minimize the difference between the nonrelativistic and relativistic representations of the amplitude in the physical region. To do this, we allow for four subtraction constants and treat these as complex free parameters. The minimum occurs at We denote this solution of our integral equations by fitNRK 4 . The figure shows that the Dalitz plot distributions of the two representations can barely be distinguished, in the entire physical region and for η → π + π − π 0 as well as for η → 3π 0 . Note the difference in the scale used in the two panels. In the left panel, the difference between the nonrelativistic fit to KLOE and our central solution can barely be seen, but it does show up in the right panel: The cusps generated by the final state interaction represent an isospin breaking effect, which is clearly seen in the band belonging to fitKχ 6 , but is absent in the other Dalitz plot distributions, because these are shown in the isospin limit. Visibly, D n = 1 + 2α(X 2 n + Y 2 n ) + . . . stays close to 1, with a negative value of the slope parameter α. The dispersive solution may be viewed as a relativistic extension of the NR representation: In contrast to the latter, it is meaningful also at small values of s. Indeed, fitNRK 4 does have an Adler zero at s A = 1.36 M 2 π . Moreover, the real parts of the Taylor invariants h 1 , h 2 , h 3 are given by 4.4, 12.3, 7.1, respectively -these values are consistent with the theoretical constraints.
We conclude that the two-loop representation of NREFT yields a decent approximation of the momentum dependence also for η-decay. In the case of kaon-decay, the contributions due to the electromagnetic interaction were worked out in the framework of NREFT and the cusps generated by the transition π 0 π 0 → π + π − → π 0 π 0 were studied in detail. The two-loop representation of Ref. [38] does properly account for the mass difference between the charged and neutral pions -an evident advantage compared to our analysis, which takes care of the mass difference only in a purely kinematic way. What is missing, however, is a representation for those electromagnetic effects that do not show up in the self-energies of the pions. We are relying on the relativistic one-loop representation for these -an extension of the work done in Refs. [39,40] to η-decay would be of considerable interest as it would allow a more satisfactory treatment of the isospin breaking effects.
The numerical values found for the subtraction constants of fitNRK 4 are very different from those of the dispersive solutions listed in Table 2. One of the reasons is that the normalization differs: While the nonrelativistic two-loop representation is normalized by setting L 0 = 1, the solutions in Table 2 are normalized by fixing the Taylor invariant H 0 at the value found at one loop. The Taylor invariants are outside the reach of the nonrelativistic effective theory. We can instead fix the normalization such that the magnitude of the amplitude at the center of the Dalitz plot is the same as for our central solution, fitKχ 6 . This is achieved by simply stretching all of the LECs: L n → λL n , with λ = 2.353. The subtraction constants of fitNRK 4 must be stretched by the same factor.
There is a further difference: For the dispersive solution to match the NR representation, the subtraction constants must be allowed to have an imaginary part -those of the solutions listed in Table 2 are real. We investigated the sensitivity of our results to the imaginary parts of the subtraction constants in Sec. 5.7. There, we observed that, in the chiral expansion, the Taylor invariants become complex at NNLO. We worked out the dispersive solution obtained if the imaginary part of the Taylor invariants are taken from the two-loop representation of the relativistic effective theory and found that the imaginary parts do not significantly affect our results. Matching with the NR effective theory at two loops confirms this experience: Although the subtraction constants of fitNRK 4 have sizeable imaginary parts while those of the solutions listed in Table 2 are real, the results obtained for quantities of physical interest are in the same ballpark. As we are not in a position to properly account for isospin breaking effects, we do not continue the comparison with the nonrelativistic framework further, but will briefly return to related work in Sec. 10.2.

Anatomy of the two-loop representation
As discussed in Sec. 3.3, elastic unitarity determines the NNLO representation of χPT in terms of the one valid at NLO, up to a polynomial. The non-polynomial part does not contain any unknowns, but the polynomial does, in the form of the low-energy constants that occur in the effective Lagrangian at O(p 6 ) -for some of these, only crude theoretical estimates are available. Note that the two-loop representation is unique up to a real polynomial. To consistently compare the dispersive and chiral representations at O(p 6 ) of the chiral expansion, the subtraction constants must be given the proper imaginary part. In particular, for the central solution, we need to consider the version FitKχ 6 , so that the imaginary parts of the Taylor invariants do agree with those of the two-loop representation.

Final state interaction at two loops
We first investigate the non-polynomial part: How well does the two-loop representation account for the final state interaction? To answer this question, we construct the two-loop representation that matches our central solution for the functions M 0 (s), M 1 (s), M 2 (s) at low values of s -the only difference between the two representations then arises from the fact that the dispersive one describes the final state interaction effects more accurately. Finally, we will compare the chiral representation obtained in this way with the one of Bijnens and Ghorbani [12] -these two only differ in the LECs of O(p 6 ).
In Sec. 3.5, we determined the solution of our integral equations which matches the oneloop representation of χPT at low energies: fitχ 4 . We now extend this to the two-loop level, exploiting the fact that the contributions from the loop graphs are determined by the oneloop representation and do not involve any unknowns. For the explicit numerical evaluation of these contributions, we rely on the work of Bijnens and Ghorbani, more precisely on the code provided by these authors [70]. Concerning the tree graph contributions, we make use of the fact that these are polynomials in the momenta. Instead of calculating the coefficients of the polynomials with the effective Lagrangian and then inserting the available estimates for the LECs contained therein, we determine the polynomial part in such a way that the amplitude matches our central solution at low energies. In the sum over the isospin components, the polynomial part contains six independent coefficients, which are in oneto-one correspondence with the Taylor invariants K 0 , . . . , K 5 . In order to construct the two-loop representation that matches FitKχ 6 , we simply need to match these invariants.
In contrast to the one-loop representation, where the Taylor coefficients are real, those of the two-loop representation have an imaginary part, which can only be matched if we allow the subtraction constants of the dispersive representation to be complex. Indeed, in the construction of the solution FitKχ 6 , we pinned the imaginary parts of the subtraction constants down with the requirement that the imaginary parts of the Taylor invariants agree with those obtained from the code [70], which are listed in Eq. (52). The two-loop representations of the functions M 0 (s), M 1 (s), M 2 (s) that match the solution FitKχ 6 differ from those of Ref. [12] only by a polynomial: The coefficients of the polynomial are given by the difference between the Taylor coefficients of the two representations, for instance: and likewise for the remaining coefficients. Note that the differences are complex -only for the Taylor invariants, the imaginary parts are the same. This property ensures that the quantity of physical interest, M NNLO c (s, t, u), which is given by the sum over the components, differs from M BG c (s, t, u) only by a real polynomial in the Mandelstam variables. The polynomial reflects the fact that the LECs of O(p 6 ) are not the same for the two versions of the two-loop representation -the contributions from these constants are real. Fig. 13 compares the isospin components of the two-loop representation with those of FitKχ 6 . Below threshold, the two representations can barely be distinguished from one another. The components with I = 1 and I = 2 of the two-loop representation closely follow those of the central solution even for s > 4M 2 π (note that the range shown for M 2 (s) is substantially wider than for the other components, because this is of interest in connection with the position of the Adler zero -see below). In M 0 (s), however, a significant difference can be seen in the physical region. It implies that the real part of the isospin combination relevant for the transition η → 3π 0 , M NNLO n (s) = M NNLO 0 (s) + 4 3 M NNLO 2 (s) nearly follows a straight line. This answers the question raised above: The two-loop representation accounts sufficiently well for the final state interaction only for s < ∼ 5M 2 π . Above that energy, the lowest resonance of QCD, the f 0 (500), manifests itself. The corresponding pole occurs on the second sheet, in the vicinity of s pole (441 − i 272 MeV) 2 6.2 − i 12.3 M 2 π [64,78] (the arrows in Fig. 13 indicate the real part of the pole position). Although the resonance is very broad -the pole is far away from the real axis -the truncated expansion in powers of momentum cannot properly cope with it above 5M 2 π , not even at NNLO. As discussed in Sec. 3.7, the curvature of the function M n (s) determines the slope parameter α of the neutral decay mode. Since the curvature of M NNLO n (s) nearly vanishes, the slope of this representation is very small -numerically, we obtain α NNLO = +0.002. In the neutral channel, the NNLO representation of the Dalitz plot distribution can thus barely be distinguished from the horizontal line in Fig. 5, which indicates the tree level result. This is lower than the value α = +0.011 that belongs to the NLO curve, which is also shown in Fig. 5, or the two-loop estimate α = +0.013(32) given in [12], but the discrepancy with the experimental value α = −0.0318 (15) [66] is not removed. We conclude that a substantial part of the discrepancy is due to the fact that the two-loop result does not fully account for the enhancement of the final state interaction generated by the resonance f 0 (500). Closely related aspects of the same problem were discussed already earlier, by Schneider, Kubis, and Ditsche (see in particular Sec. 4.3 of Ref. [42]).
The Adler zero of Re M NNLO n (s, t, u) occurs at s A = 1.35(11) M 2 π , remarkably close to the value s A = 1.37 (11) where the real part of FitKχ 6 has its zero. By construction, the isospin components belonging to the two-loop approximation M NNLO (s, t, u) agree with those of the dispersive representation at small values of s = u, but as discussed in Sec. 3.6, the behaviour of the sum over the isospin components at small values of s = u is not controlled exclusively by their behaviour in that region, but also depends on the properties of the comparatively small component Re M 2 (s) in the vicinity of s = 16M 2 π . Fig. 13 shows that even there, the two-loop approximation follows the dispersive representation for M 2 (s) rather well. This explains why that approximation is rather accurate also in the vicinity of the Adler zero.
The differences between the curves labeled Fitχ 6 and NNLO in Fig. 13 yield an estimate for the size of those uncertainties of the two-loop representation that arise solely from the fact that it describes the final state interaction very well only at low energies. In particular, the two-loop representation for the dominating contribution, M 0 (s), represents an accurate approximation only in part of the physical region -the Dalitz plot distribution is not reproduced well, neither in the charged channel, nor in the neutral one.

Contribution from the low-energy constants at NNLO
Finally, we compare the polynomial part of the amplitude of Bijnens and Ghorbani [12] with the two-loop representation constructed in the preceding section. The numbers in the row NNLO of Table 6 represent central values and uncertainties of the Taylor invariants belonging to that representation -by construction, these coincide with the invariants of the dispersive solution FitKχ 6 . The values in the row BG are obtained with the code [70] mentioned earlier.
We recall that the experimental information about the Dalitz plot distribution exclusively concerns the relative size of the invariants, not the invariants themselves. The value quoted  Table 6: Comparison of the Taylor invariants belonging to the two-loop representation constructed in Sec. 6.1 with those of the two-loop representation of Bijnens and Ghorbani [12].
for Re K 0 relies on theory, more precisely on the expansion of K 0 in powers of the masses of the three lightest quarks. This expansion starts with K 0 = 1 + O(m quark ). As discussed in Sec. 3.2, the coefficient of the next-to-leading term of the expansion can be worked out from the one-loop representation of the transition amplitude, which does not involve any unknowns. Numerically, the correction is of typical size: K 0 = 1 + 0.176 + O(m 2 quark ). The error quoted in Table 6 is based on the estimate of the higher order contributions described in Sec. 3.2. The table shows that the value obtained for Re K 0 from the estimates used for the LECs in [12] is outside our range (disregarding the uncertainty in the number 1.27, the difference amounts to 1.7σ). Since K 0 is not plagued by infrared singularities -in particular, this invariant remains finite in the limit M π → 0 -we see no reason why it should pick up unusually large corrections from higher orders and stick to the value quoted in the table.
The value of K 0 is important for the determination of the kaon mass difference and of the quark mass ratio Q, to be discussed in Sec. 9, but in the present section, we compare the chiral and dispersive representations for the Dalitz plot distribution of the charged channel, the slope α of the Z-distribution in the neutral channel and the position of the Adler zero with our central solution -these quantities only involve the ratios K 1 /K 0 , . . . , K 5 /K 0 . We set Re K 0 = 1.176 and fix the imaginary parts with the two-loop representation of Bijnens and Ghorbani [12].
As pointed out in Sec. 3.3, the Taylor invariant K 4 does not get any contribution from the LECs of O(p 6 ). The corresponding entry for Re K 4 in the table includes our uncertainty estimate from Eq. (51). The value obtained with our central solution is indeed within the range of this prediction (the imaginary parts are identical by construction). Re K 3 also agrees within the uncertainties attached to our central solution, but for Re K 1 , Re K 2 and Re K 5 , the two results differ by up to 2σ. We conclude that the values of some of the LECs used in [12] are not consistent with the experimental information on η → 3π available today.
As discussed in Sec. 6.1, a direct comparison of the two-loop representation with the data in the physical region is not meaningful -the f 0 (500) is the stumbling block. Dispersion theory is needed to establish a controlled connection between the region that is accessible to experiment and the domain s < ∼ 5M 2 π , where the two-loop approximation for M 0 (s) is sufficiently accurate.
The Taylor invariants provide the bridge. The dispersive representation reliably determines the behaviour of the amplitude in the physical region in terms of these. Their imaginary parts are known to NNLO of the chiral expansion. Using this, and keeping Re K 0 fixed at the central value, the KLOE data on the Dalitz plot distribution of η → π + π − π 0 imply that the real parts of the remaining five invariants are in the range indicated in the row NNLO of Table 6.
As already mentioned, unitarity fixes the two-loop representation for M c (s, t, u) in terms of known quantities up to a real polynomial. The polynomial contains six independent coefficients that are in one-to-one correspondence with the real parts of the Taylor invariants K 0 , . . . , K 5 . In the representation of the amplitude obtained with χPT, the Taylor invariants represent linear combinations of some of the LECs of O(p 6 ). In particular, those relevant for the scalar channel with I = 0 contribute, which are notoriously difficult to estimate because the contribution from the f 0 (500) to the corresponding spectral functions is not easily accounted for. The experimental information about the Taylor invariants and their correlations obtained from our analysis should make it possible to reliably determine these particular couplings, which also enter in many other applications of χPT. An update of the two-loop representation with the current knowledge of the LECs (for a recent review, see [79]) would be of considerable interest, but is beyond the scope of the present work. Fig. 14 compares our central solution, fitKχ 6 , with the results obtained on the basis of χPT (real part, along the line s = u and in the isospin limit: m u = m d , e = 0). The error band attached to the NNLO representation is obtained with the calculation described in Sec. 6.1, which relies on the KLOE data. It concerns the two-loop representation as such -the contributions from higher orders, which grow with the energy, are not accounted for. The green solid line corresponds to the amplitude of Bijnens and Ghorbani [12], which exclusively differs in the values of the LECs.

Branching ratio
The rates Γ η→π + π − π 0 and Γ η→3π 0 involve the overall normalization factor N , as well as the constant K 0 that normalizes the amplitudes M c (s, t, u) and M n (s, t, u), but in the branching ratio, these quantities drop out. Hence we obtain a parameter free prediction for B.
In the branching ratio, the uncertainties of the dispersive representation also cancel out almost completely -not only the errors occurring in the determination of the subtraction constants, but also those generated by the uncertainties in the phase shifts. The main source of error in B arises from isospin breaking. In particular, the mass difference between the charged and neutral pions generates a substantial difference in shape and size of the region over which the square of the amplitude must be integrated to calculate the rate. As the corrections for the charged and neutral decay modes are of opposite sign, the branching ratio is affected quite strongly -they dominate our estimate of the error: The experimental values given by the Particle Data Group are B = 1.426 (26) ['our fit'] and B = 1.48 (5) ['our average'] [66]. The comparison with our result in (90) shows that the value predicted for the decay rate of the neutral mode (on the basis of Dalitz plot distribution and decay rate of the charged mode) is in good agreement with experiment. This provides a very strong test of the approximations used to account for isospin breaking.

Dispersive representation of the Dalitz plot distribution
Equation (9) shows that, in the isospin limit, the amplitude for the neutral decay mode is determined by the one for the charged mode. With the approximate formulae (74), this statement remains true even in the presence of isospin breaking. The physical amplitude M phys n (s n , t n , u n ) is expressed as the product of a factor K n (s n , t n , u n ) that stems from the one-loop representation and a factor M ∼ n (s n , t n , u n ), that represents the isospin symmetric dispersive amplitude, evaluated with the kinematic map. In this approximation, the Dalitz plot distribution of the neutral mode is given by where M phys n (X n , Y n ) is obtained from M phys n (s n , t n , u n ) by expressing the independent Mandelstam variables s n and τ n = t n − u n in terms of the Dalitz variables X n and Y n : This implies that the central solution fitKχ 6 , which we constructed in Sec. 5, yields a parameter free prediction for the Dalitz plot distribution of the decay η → 3π 0 , together with an estimate of the uncertainties to be attached to this prediction. The main difference compared to the charged channel is that the Dalitz plot distribution is nearly flat: The experimental values differ from the current algebra prediction, D n = 1, only by a few percent. This limits the precision not only of the experimental determination, but also of the theoretical prediction for the parameters that describe the deviation from unity. A further difference compared to the charged channel arises from the fact that a single physical decay into three neutral pions is mapped into six distinct points of the physical region, so that the values of D n on a sextant of phase space fully determine the distribution (compare Sec. 4.1). Accordingly, the Dalitz plot distribution of the decay η → 3π 0 is invariant under 120 • rotations around the center of the (X n , Y n ) plane as well as under reflections at the Y n -axis. Expressed in terms of radial coordinates, the transition amplitude is periodic in ϕ with period 2π/3 and even under ϕ → π−ϕ.  As discussed in Sec. 3.7, the symmetry of the transition amplitude with respect to interchange of the Mandelstam variables implies that the expansion around the center of the physical region starts with a quadratic term. Expressed in the variables X n and Y n , this term is proportional to X 2 n + Y 2 n = Z: Only the real part of the coefficient, α = Re α, shows up in the Dalitz plot distribution:
The uncertainty is dominated by the Gaussian error, but includes our estimates for the noise generated by all sources that play a role in our analysis. The result is consistent with the experimental value α = −0.0318(15) quoted by the Particle Data Group [66]. This solves a long-standing puzzle: Our dispersive framework not only yields the proper sign of the slope, but predicts a value that is consistent with experiment.
Since α is very small, details of the evaluation matter. In particular, as demonstrated in Sec. 3.7, α is very sensitive to the final state interaction. As an example, consider isospin breaking. Although the isospin breaking effects in the decay η → 3π 0 are small, dropping them in the calculation of the slope changes the central value of the prediction from −0.0303 to −0.0327. Details of the evaluation also matter in the analysis of the data: The number quoted in (96) is the derivative of the Z-distribution at Z = 0. In the past, the experimental determination of the slope was instead determined by fitting the data with the linear formula 1 + 2αZ on a finite range of Z-values. The sensitivity of the result to this range and to the fact that -at the accuracy reached -the curvature of the distribution cannot be neglected will be discussed in Sec. 7.7.

Experiment
The experimental determination of the slope α has an even longer recent history than that of the measurement of the Dalitz plot in the charged channel: A list of all the measurements and the references can be found in Table 7.
The most precise determination of the Dalitz plot distribution and its slope parameter α is based on the data collected at the Mainz Microtron: 1.8 million events were analyzed at MAMI-B [23], another three million η → 3π 0 decays were collected at MAMI-C [24] and, very recently, the A2 Collaboration came up with an update based on altogether 7 million events [25]. KLOE has performed such a measurement too [26], on the basis of about half a million events. The PDG average α = −0.0318 (15) [66] is largely dominated by the MAMI measurements. As discussed in the preceding section, the result for α is sensitive to the range over which the data are approximated with the linear formula 1 + 2αZ. A more controlled determination that does not rely on this approximation became possible only very recently [25]. We will discuss it in detail in Sec. 8.

Z-distribution
The Z-distribution is obtained by averaging the Dalitz plot distribution over the angle ϕ. As mentioned above, the events collected in one sextant of phase space fully determine the distribution. We consider the sextant with 30 • < ϕ < 90 • , i.e. the upper one of the two sectors between the lines s = t and t = u (these are shown as dashed red and black lines in the right panel of Fig. 6). If Z is below the value the circle Z = constant runs inside the physical region, so that the average is given by with ϕ 1 = 1 6 π and ϕ 2 = 1 2 π. For Z > Z crit , the interval relevant for the average shrinks. The lower end stays at ϕ 1 = 1 6 π , but the upper end is lowered to the value of ϕ, where the circle Z = constant intersects the boundary of the physical region, which is determined by The band in Fig. 16 shows the result obtained for the Z-distribution from our central solution, fitKχ 6 . The width of the band represents the uncertainties in d Z n , which are worked out as described in Sec. 5.5. The data points represent the Z-distribution obtained by the A2 collaboration at MAMI [25]. In earlier accounts of the data collected at MAMI, the normalization of the Z-distribution was fixed by fitting the data with the linear approximation, d Z n = 1 + 2αZ, but at the accuracy reached, this is not legitimate any more, because the curvature cannot be neglected. In Ref. [25], the normalization of the Z-distribution is left open. When comparing these data with our prediction, we multiply the observed distribution by the factor Λ, which is treated as a free parameter. Visibly, the resulting normalized distribution, Λ d Z exp n , is in excellent agreement with the prediction. Quantitatively, we obtain Λ = 0.974, χ 2 = 24.9 for 30 data points and one free parameter. Fig. 17 shows the distribution over the center-of-mass energy of one of the pion pairs in the final state, which we denote by M ππ . It is given by the mean value of D phys n (X n , Y n ) over the variable X n at the fixed value of Y n that belongs to M ππ = √ s:

M -distribution
We refer to d M n as the M -distribution. The data points represent the MAMI results (Runs I and II combined) [25], while the band indicates the prediction obtained on the basis of the KLOE data for the decay η → π + π − π 0 . In contrast to the distribution in the variable Z, which barely shows any structure at all, the prediction for the M -distribution clearly exhibits a cusp at M ππ = 2M π + . The data, however, do not show any sign of such a cusp. We return to this discrepancy in Sec. 8, where we discuss various fits to the MAMI data. The figure also indicates the M -distribution obtained in Ref. [41] on the basis of the nonrelativistic effective theory. For a brief discussion of this approach, we refer to Sec. 10.2.

Polynomial approximation
Bose statistics interrelates the coefficients of the expansion in powers of X n and Y n : Up to and including quartic terms, the expansion takes the form 10 The analogous approximation relevant for the charged decay mode was discussed in Sec. 5.1. There is a significant difference between the two channels: Instead of the 5 independent coefficients a, b, d, f , g needed if all terms up to third order are retained in the charged channel, the two coefficients α, β suffice in the neutral channel. At the next order of the expansion, D c contains the three independent terms X 4 c , X 2 c Y 2 c , Y 4 c , while the symmetry under exchange of the three particles only allows a single contribution in D n : γ (X 2 n + Y 2 n ) 2 . In the neutral channel, the presence of cusps in the physical region implies that a parametrization of the Dalitz plot distribution in terms of a polynomial in the variables X n , Y n is limited to values of Z below For Z > Z cusp , the square root singularities generated by the virtual transition η → π + π − π 0 → 3π 0 need to be accounted for, but below this value of Z, only the coefficients α and γ contribute to the Z-distribution -the angular average of the term proportional to β sin(3 ϕ) vanishes below Z cusp : In Fig. 16, the left shaded region corresponds to the range 0 < Z < Z cusp . In this region, the Z-distribution is very well described by a straight line: Evidently, the coefficient γ, which measures the curvature, is very small. The same figure also shows that the slope changes at Z = Z cusp 0.597, on account of the contributions from the cusps. In the Z-distribution, the term proportional to β only manifests itself for Z > Z crit 0.756, but it does affect the M -distribution, even in the region above the cusp, 2M π + < M ππ < 0.338 GeV.
Minimizing the square of the difference between the polynomial (101) and the Dalitz plot distribution of our central solution on the disk Z < Z cusp , we obtain the following polynomial approximation: where the errors cover all sources of uncertainty encountered in the dispersive analysis. The polynomial approximation represents our result remarkably well: In the region Z < Z cusp , the difference between D poly n and the Dalitz plot distribution obtained from our central solution of the dispersion relations (corrected for isospin breaking effects) is below 0.2 permille. Within errors, the result for α agrees with the one obtained for the quadratic term of the Taylor series in the variables X n , Y n in (96). This demonstrates that the slope of the Z-distribution at Z = 0 can accurately be measured by fitting the observed Dalitz plot distribution on the disk Z ≤ Z cusp with the formula (101).

Strength of the cusps
The polynomial approximation (101) is adequate only in the singularity-free part of the physical region. We now turn to the remainder, Z > Z cusp , where the cusps do manifest themselves. The pioneering work of Budini, Fonda and Cabibbo [87,88] on the physics of the cusps occurring in the decays K + → π + π 0 π 0 and K L → 3π 0 and the subsequent thorough analysis in [37-40, 89, 90] led to a very satisfactory understanding of the phenomenon. As shown in [37][38][39][40], it can be analyzed by means of nonrelativistic effective theory. Indeed, the precision of the data on kaon decays even allows a determination of ππ scattering lengths [37-40, 59, 88, 89]. The situation for η → 3π 0 is essentially the same as for K L → 3π 0 , but the knowledge is much more limited, both experimentally and theoretically. The work reported in two theoretical investigations [41,42] will briefly be discussed in Sec. 10.2.
The branch cut required by unitarity is of the square-root type: The expansion of the function M n (s) around the point s = 4M 2 π + contains a term proportional to 4M 2 π + − s, which changes from real to imaginary when s passes through this point. In the M -distribution, this term is responsible for the discontinuity in the derivative at M ππ = 2M π + , as well as for the rapid fall-off below this point seen in Fig. 17. In the Dalitz plot distribution, the leading term generated by the branch cut in the s-channel only shows up in the narrow strip between the line s = 4M 2 π + and the boundary of the physical region. We approximate the contributions from the cusps with the leading term: The parameter δ measures the strength of the cusps; θ(x) is the Heaviside step function. For the background underneath the cusps, we simply extrapolate the terms of the Taylor series listed in Eq. (101) and use the approximation on the entire phase space. Although the formula now involves square roots as well as powers of the Mandelstam variables, we continue using the term 'polynomial approximation'. While this approximation is very accurate on the disk Z < Z cusp , where the Taylor expansion converges and D cusp n vanishes, it describes the contributions from the cusps comparatively crudely. For this reason, we do not simply minimize the difference between this approximation and our dispersive representation over the entire physical region, but fix the coefficients α, β, γ at the values listed in Eq. (104) and determine δ by minimizing the discrepancy over the remainder of the physical region, Z > Z cusp . The minimum occurs at With the values of the coefficients in (104), (107), the parametrization (106) reproduces our dispersive representation of the Dalitz plot distribution within 0.6 permille, throughout the physical region. It does not quite reach the remarkable precision of the polynomial representation on the disk Z < Z cusp , presumably because the extrapolation of the first few terms of the Taylor series does not describe the background underneath the cusps very accurately -the presence of the resonance f 0 (500) may accurately be accounted for only in the dispersive representation. The error in the result for δ reflects the uncertainties of the dispersive representation. These subject the coefficients α, β, γ to the errors listed in (104) and also lead to correlations among them. When minimizing the discrepancy in the region Z > Z cusp , the errors then propagate into δ. The evaluation shows that the strength of the cusps is rather sensitive to the uncertainties in the isospin breaking corrections -the corresponding contribution to the error budget is even slightly larger than the Gaussian error, while the one from the noise in the phase shifts is negligible.
The prediction for the slope mainly relies on the experimental information concerning the Dalitz plot distribution of η → π + π − π 0 -the theoretical constraints are not important in this connection. This can be seen by comparing the polynomial approximations for the two dispersive solutions obtained if either the data on this decay or the theoretical constraints are ignored: fitχ 4 versus fitK 4 -the first represents the matching solution, which exclusively relies on theory, while the second is instead based on the KLOE data alone. The coefficients of the corresponding polynomial approximations are listed in Table 8. The comparison shows that the two representations of the Dalitz plot distribution in the neutral channel are consistent with one another. Concerning δ, the results are even the same and for β, there is not much of a difference, either. For fitχ 4 , however, the uncertainties in α and γ are much larger than for fitK 4 : In this regard, the theoretical constraints are much weaker than the experimental ones.  give the contributions to the discrepancy function from these bins). The values of δ are obtained by fitting the remaining 140 bins of the Dalitz plot distribution, varying α, β, γ in the range found in the first step. The asterisks mark values used as input.

Z-distribution
Next, we compare the experimental information with the polynomial parametrization in the region where the Taylor series converges, Z < Z cusp . The simplest way to determine the slope experimentally is to measure the Z-distribution. In the singularity-free region, only the coefficients α and γ of the polynomial approximation show up in this distributionα specifies the slope, while γ measures the curvature. In the recent update of the MAMI data (Runs I and II combined) [25], the Z-distribution is not normalized. Allowing for a free normalization factor Λ M and fitting the data with the polynomial representation (103), we obtain a fit of excellent quality, which we denote by fitMZ: Λ M = 0.9762 (15), χ 2 = 10.2 for 18 data points and 3 parameters. The corresponding values for α and γ are listed in Table 8. The allowed range is represented by the largest ellipse in the left panel of Fig, 19. The central value of α is somewhat smaller than our prediction, fitKχ 6 , which is based on the KLOE data for η → π + π − π 0 , while the result for γ is close to what we obtain on this basis. The uncertainties are large, however -the data on the Z-distribution do not provide an accurate determination of α or γ, but impose a strong correlation between these two coefficients. If γ is not treated as a free parameter, but is held fixed at the value in fitKχ 6 , we obtain fitMZ 1 . The quality remains excellent: χ 2 = 10.2, and the central value of α nearly stays the same, but the uncertainty drops by a factor of four. If we extend the range and fit the data on the entire physical region, 0 < Z < 1, the coefficients β and δ do show up, but the Z-distribution does not determine them well and the result for α and γ barely changes. For comparison, we also show the polynomial fit#10 of Ref. [25]. The right panel concerns band #28 (0.955 < λ < 1), which is located at the boundary of the physical region.

Dalitz plot distribution on the disk Z < Z cusp
Next, we consider the MAMI data on the Dalitz plot distribution. As noted above, each event is represented by 6 different points in the physical region. The binning in the variables X n , Y n does preserve the symmetry under X n → −X n , but not the one under reflections at the lines ϕ = ± 30 • . Accordingly, a subset of bins that contains each event exactly once does not exist. This problem is readily solved by sampling the data in the radial coordinates Z, ϕ defined in Eq. (93) rather than in X n , Y n : The sextant 30 • < ϕ < 90 • contains each event exactly once. At the boundary of the physical region, however, the pair Z, ϕ is no better than X n , Y n , because the boundary value of Z depends on the angle: Z = Z b (ϕ). We propose to instead use the coordinates λ, ϕ, where λ stands for In these variables, each event gives rise exactly to one point in the sextant 0 < λ < 1, 30 • < ϕ < 90 • , so that the binning is easy to implement, not only at the boundaries of the sextant, but also at the boundary of the physical region -for a detailed account of the procedure, we refer to Appendix F. We thank Sergey Prakhov for providing us with the corresponding sampling of the MAMI data. All of the fits to the Dalitz plot distribution discussed in the following are based on this data set (Runs I and II combined). Fig. 18 compares the angular dependence of two subsets of these data with our prediction (fitKχ 6 ). The difference between the prediction and the polynomial approximation to it is too small to be visible in this figure.
A polynomial fit to the MAMI data on the Dalitz plot distribution that does not invoke dispersion theory at all is listed in the entry fitMD of  Figure 19: Correlation between slope and curvature. The polynomial fits to the MAMI data for the decay η → 3π 0 correspond to the large, slightly tilted ellipses in the left panel. They are compared with the results of Schneider, Kubis and Ditsche [42], Albaladejo and Moussalam [49], the A2 collaboration at MAMI [25] and the Particle Data Group [66]. The latter three neglect the curvature and are shown at γ = 0. The matching solution fitχ 4 , which exclusively relies on theory, is indicated by the large yellow ellipse. All other representations obtained within our dispersive framework cluster around the comparatively small cyan ellipse, which represents our prediction, fitKχ 6 . The right panel focuses on these and compares the dispersive representations fitK 4 and fitKχ 6 based on the KLOE data for the decay η → π + π − π 0 alone with the common fits to the KLOE and MAMI data, denoted by fitKM 4 and fitKMχ 6 , respectively.
values for α, β, γ listed in the table, together with Λ M = 0.976 and χ 2 = 343.3 for 266 data points and 4 parameters. The errors are obtained in the same way as for the subtraction constants of the dispersive representation, except that the discrepancy function now contains an additional parameter, Λ M . The result for α and γ confirms what we found when fitting the Z-distribution: fitMD and fitMZ agree within errors. The uncertainties are large, but the values are strongly correlated. In contrast to fitMZ, however, the likelihood of fitMD is not satisfactory: χ 2 /dof = 1.31. Since the polynomial approximation of the dispersive representation is very accurate in the disk Z < Z cusp , we consider it very unlikely that the problem originates in the lack of flexibility of the parametrization.

Cusps
Next we study the behaviour of the data in the remainder of the physical region, where the final state interaction generates cusps. The problem encountered at the boundary of the disk X 2 n + Y 2 n = Z cusp repeats itself at the boundary of the physical region. We have checked, however, that restricting the fit to those bins that are entirely contained in the physical region does not significantly modify the result. In the following, we determine the strength of the cusp with a fit to all of the bins for which D cusp n contributes. To evaluate the strength of the cusps for fitMD, we use the same procedure as in the construction of an approximate representation for our central dispersive solution: Keep the values of α, β and γ fixed at fitMD, vary δ and minimize the difference between the parametrization (106) and the data in the region Z > Z cusp . The quality of the fit is worse than for the bins contained in the disk Z < Z cusp : χ 2 = 233 for 140 data points and 1 free parameter, χ 2 /dof = 1.68. The error calculation follows the same steps: First determine δ for prescribed values of α, β, γ, Λ M , then vary these within the range obtained when minimizing the discrepancy in the disk Z < Z cusp , accounting for the correlations among them. Finally, the additional uncertainty arising from the statistical fluctuations in the region Z > Z cusp is added in quadrature. For δ, the error is dominated by the contribution from the uncertainties and correlations encountered in the first step. Table 8 shows that the result for fitMD is consistent with our prediction, also concerning δ. Although the cusps do not stick out from the fluctuations visible in Fig. 17, the quantitative analysis on the basis of formula (106) does confirm their presence.
For the dispersive representation of the amplitude, it does not make much of a difference whether the slope is determined with a fit in the disk Z < Z cusp or in the entire physical region. Fitting the parametrization (106) to our central solution fitKχ 6 in the entire physical region, we obtain α = −0.0307 (18), β = −0.0049(5), γ = 0.0018(3), δ = −0.016(4). These numbers barely differ from those quoted in Table 8 for the polynomial approximation to fitKχ 6 . This shows that the dispersive representation provides a stable extrapolation from the region below Z cusp to the region where the cusps occur.
When fitting data with the polynomial approximation, the situation is very different, because the correlation between the behaviour at small values of Z and in the region where the cusps manifest themselves is then absent. This is illustrated with two fits taken from Table I of Ref. [25], which are also based on the combined data of Runs I and II, but use all three sextants with X n > 0. Apart from that, the analysis differs from ours only in one respect: While we determine the coefficients α, β, γ with a fit to the data in the disk Z < Z cusp and make use of those in the remaining bins exclusively to estimate the strength of the cusps, fit#9 and fit#10 treat all coefficients on the same footing (except that in the case of fit#9 γ is set to zero). The comparison of the two illustrates the strong correlation between α and γ: The uncertainty in the result for the slope becomes much smaller if γ can be taken as known. Note that for all of the entries in Table 8, the values quoted for χ 2 M refer to the 266 independent bins in the disk Z < Z cusp .
The three polynomial representations fitMD, fit#9 and fit#10 agree within uncertainties, but the latter two have substantially smaller errors. The left panel of Fig. 19 illustrates the difference, which arises because the polynomial terms grow with Z; extending the region over which the approximation is fit to the data leads to smaller errors in the coefficients. While fitMD is consistent with our prediction (104), (107), the values obtained for α andβ with fits #9 and #10 are not. In fact, the entries for χ 2 M show that, in the region Z < Z cusp , the polynomial approximation to our prediction follows the data more closely than these two fits. Concerning the parameter δ, which measures the strength of the cusps, however, they are in very good agreement with our prediction.
The main problem we are facing here is that one is dealing with small effects. In current algebra approximation, the Dalitz plot distribution is flat, D LO n (X n , Y n ) = 1. The MAMI data do allow an accurate measurement of the slope α of the distribution, but what remains is tiny: For our prediction, the difference D phys n (X n , Y n ) − 1 − 2 α Z stays below 7 permille, throughout the region Z < Z cusp , where the Taylor series converges. Although the set we are analyzing is based on more than 7 million events, the statistical errors in the mean value of the Dalitz plot distribution for a given bin are of order 8 permille and the systematic ones must be small compared to this for the measurement to be sound. Isospin breaking effects are by no means negligible at this level of accuracy. In the approximation we are using, they yield a positive contribution to the slope: δα = +0.0024 (7). At Z = Z cusp , it affects the value of the Dalitz plot distribution by about 3 permille. Note also that the cusps are visible in the physical region only because the physical masses of the charged and neutral pions differ -isospin breaking is crucial for an accurate analysis of the Dalitz plot distribution in the region Z > Z cusp . The fact that the result obtained for the branching ratio agrees with experiment gives us confidence that our estimates for the effects due to isospin breaking in the integrals over the square of the amplitude are adequate, but resolving the Dalitz plot distribution at the level of accuracy needed to reliably determine small quanitites like β and γ and to measure the strength of the cusps is a different matter.

Dispersive analysis of the MAMI data
The errors attached to the values of γ listed in the lower half of Table 8 are much smaller than those in the upper half: Dispersion theory fixes the curvature term much more accurately than the data on the Dalitz plot distribution in the neutral channel -even the theoretical constraints alone (fitχ 4 ) yield a rather sharp value for this coefficient. We now investigate the impact of the MAMI data on the dispersive analysis. The discrepancy function relevant for these data is of the same form as the one for the KLOE data in Eq. (77): Taken by themselves, the data on the neutral channel do not suffice to pin down the subtraction constants. In particular, as evidenced by the current algebra approximation, the neutral channel does not contain information about the slope of the amplitude in the charged channel or about the position of the Adler zero. We combine the experimental information available in the charged and neutral channels, first ignore the theoretical constraints and look for the minimum of χ 2 K + χ 2 M . The normalization of the dispersive representation plays no role here -we again fix it with H 0 = H NLO 0 and restrict the fits to the data contained in the disk Z < Z cusp . As noted above, the correlations present in the dispersive representation imply that the results are essentially the same if that restriction is dropped.
We first allow for only four subtraction constants, set δ 0 = γ 1 = 0 and denote the simultaneous fit to the KLOE and MAMI data by fitKM4. Table 8 shows that the inclusion of the MAMI data lowers the value of the slope α from −0.0310(17) (fitK 4 ) to −0.0303(13) (fitKM 4 ), while the coefficients β, γ, δ nearly stay put. The ratio χ 2 M /dof = 1.34 shows that the quality of the fit is not satisfactory, even slightly worse than for the polynomial representation fitMD, where χ 2 M /dof = 1.31. On the other hand, the value χ 2 th = 0.46 indicates that, although the theoretical constraints that follow from the presence of a hidden approximate symmetry are not made use of in the derivation of fitKM 4 , the MAMI data for η → 3π 0 are consistent with these, as well as with the KLOE data for η → π + π − π 0 .
If more than four subtraction constants are treated as free parameters, the minimization again goes astray. When analyzing the KLOE data we found that simply adding the term χ 2 th to the discrepancy function suffices to ensure that the theoretical constraints are respected. In the present case, this is not the case, however: The contributions from the 371 and 406 data points of KLOE and MAMI, respectively, overwhelm the one from the theoretical part of the discrepancy function. The minimum occurs at χ 2 th = 5.12, indicating that the constraints are still violated -fitKMχ 6 does not represent a physically acceptable solution of our integral equations. For the determination of Q, the extrapolation below threshold is needed and the theoretical constraints do play an essential role in this connection.
As far as the behaviour in the physical region is concerned, however, fitKMχ 6 does represent an acceptable parametrization of the amplitude. The violation of the theoretical constraints can be cured without significantly changing the behaviour of the amplitude there. It suffices, for instance, to give the theoretical discrepancy in χ 2 tot = χ 2 K + χ 2 M + χ 2 th more weight. If we multiply that term by 3, the value of χ 2 th falls to 1.20 while α, β, γ, δ nearly stay put at the values obtained for fitKMχ 6 listed in Table 8. The white ellipse in the right panel of Fig. 19 illustrates the result. The comparison shows that fitKMχ 6 is close to fitKM 4 , consistent with fitMZ and fitMD (MAMI data alone) as well as with our prediction, fitKχ 6 (KLOE data plus theoretical constraints). The result for β, γ and δ can barely be distinguished from the prediction. The inclusion of the MAMI data reduces the value of the slope, irrespective of whether four or six subtraction constants are allowed. As emphasized in Ref. [25], these data imply a smaller value than the average α = −0.0318 (15) quoted by the Particle Data Group [66].

Mass difference between charged and neutral kaons
According to Eqs. (6) and (10), the rates of the charged and neutral decay modes are proportional to integrals over the square of the transition amplitude, denoted by J c and J n , respectively. Solving forM 2 K 0 −M 2 K + , the relations can be rewritten in the form: with Γ c ≡ Γ η→π + π − π 0 and Γ n ≡ Γ η→3π 0 . The constant N a does not involve any unknowns. The phase space integrals are quadratic in the subtraction constants {k 1 , . . . , k 6 } = {α 0 , β 0 , γ 0 , δ 0 , β 1 , γ 1 }: The coefficients J ab c and J ab n represent integrals over our fundamental solutions, which only depend on the input used for the phase shifts. They can be worked out once and for all, but to evaluate the uncertainties due to the noise in the phase shifts, the calculation needs to be done separately for the eight different phase shift configurations specified in Appendix E.
For our central solution, fitKχ 6 , we obtain Note that, in contrast to the Dalitz plot distribution and the branching ratio, where the normalization of the amplitude drops out, the integrals J c and J n do depend on it. While the relative size of the subtraction constants is strongly constrained by experiment, the overall normalization is not. We fix it with the theoretical estimate H 0 = 1.176(53) derived in Sec. 3.2. The uncertainty therein and the Gaussian errors contribute about equally to the uncertainties in the integrals J c , J n , while those associated with the phase shifts and with the estimates used for isospin breaking barely affect the result (for more details concerning the error budget, we refer to Sec. 9.3).
With the experimental values Γ c = 299(11) eV and Γ n = 427(15) eV [66], the relations (110) lead to two independent determinations of the kaon mass difference in QCD: 11 Since our prediction for the branching ratio agrees with experiment, the two results are nearly the same, but they are statistically independent only with regard to the uncertainties in the experimental values of the rates, which are responsible for only a small fraction of the error. Combining the two, we can determine the mass difference to an accuracy of 6 %: As discussed in the introduction, η → 3π is uniquely sensitive to isospin breaking due to the quark masses. This is thanks to Sutherland's theorem which proves the suppression of electromagnetic isospin breaking in this decay. In most other quantities which are sensitive to isospin breaking there is a competition of effects of strong and electromagnetic origin and it is difficult to disentangle the two. It is for this reason that lattice calculations, which in principle would be ideally suited to determine the size of the light quark mass difference, only recently have become able to determine this quantity: This task had to wait for simulations of QCD and QED close to the physical point, which have become possible only in the current decade. A detailed understanding of the systematic effects related to the inclusion of QED in the lattice action is still ongoing, but the latest results on strong isospin breaking from the lattice are already of significant precision. A comparison with our results is therefore highly relevant.
There are two recent lattice calculations which have evaluated the kaon mass difference in QCD in a simulation where both QCD and QED were included: one by the BMW collaboration [91] and one by the RM123 collaboration [92]. The details of the calculations differ, of course, but the outcomes are in very good agreement, not only with one another: but also with our determination from η-decay in Eq. (114).

Electromagnetic contributions to the meson masses, Dashen theorem
Theoretical determinations of the meson self-energies started in the sixties of the last century [93][94][95]. The difference between M π + and M π 0 is well understood and is due almost exclusively to the electromagnetic self-energy of the π + . Estimating the small contribution proportional to (m u − m d ) 2 with χPT yieldsM π + −M π 0 = 0.17(3) MeV [68]. We denote the electromagnetic contribution to the square of the mass of a particle by ∆ γ P ≡ M 2 P −M 2 P [27]. Together with the observed mass difference, the above estimate for the mass difference in QCD implies ∆ γ π + − ∆ γ π 0 = 1.21(1)10 −3 GeV 2 .
Dashen's theorem [95] states that, at leading order of χPT, the electromagnetic selfenergies of the neutral pions and kaons vanish, while the contributions to M 2 π + and M 2 K + are the same. The comparison of our result (114) with the observed mass difference yields a result that is about twice as large: Indeed, Langacker and Pagels had pointed out that the chiral perturbation series of the meson self-energies contains unusually large logarithmic infrared singularities [96]. The numerical estimates based on the 1/N c -expansion [97] or on the Cottingham formula [98] indicated that the Dashen theorem is strongly violated. The effective Lagrangian relevant for the evaluation of the contributions generated by virtual photons was set up [99,100], but the evaluation of the self-energies on that basis [101] did not confirm the picture -the numerical estimates used for the LECs of order e 2 p 2 led to corrections of rather modest size. The corrections to the Dashen theorem from higher orders of the chiral expansion can be characterized with the dimensionless parameter , which is defined by [27] In this notation, our results for the electromagnetic self-energy differences amount to = 0.9(3) .
We emphasize that our calculation of the difference ∆ γ K + − ∆ γ K 0 does not face the problem with the strong infrared singularities encountered in direct evaluations of the self-energies and conclude that the Dashen theorem does receive large corrections from higher orders of the chiral expansion.
The lattice results in Eq. (115) lead to the same conclusion. For comparison we include other recent determinations as well as the value quoted in the FLAG review 12 : .
Except for the marginal disagreement with QCDSF, where the quoted error is statistical only, all of these values are consistent with our result in Eq. (119).

Determination of the quark mass ratio Q
Finally, we invoke the low-energy theorem that relates the quark mass ratio Q to a ratio of meson masses [11]: (M K 0 ,M K + denote the mass of the neutral and charged kaons in QCD, while M π , M K represent the mass of the pions and kaons in the isospin limit, respectively.) The lowenergy theorem states that the chiral expansion of the left hand side in powers of m u , m d , m s starts with Q 2 and does not contain terms of next-to-leading order: This implies that, instead of normalizing the amplitude with the kaon mass difference in QCD, we can equally well normalize it with the quark mass ratio Q. The analog of the formula (110) forM 2 In either case, the relations only hold modulo corrections of next-to-next-to-leading order in the chiral expansion. Apart from the phase space integrals J c , J n and the decay rates, they only contain the isospin limit of the meson masses and the pion decay constant.
Concerning M π , we rely on the estimates given in section 3.1.1 of the FLAG review [27], which lead to M π = 134.8(3) MeV .
The result M K = 494.2(3), on the other hand, must be reexamined, because it is based on the FLAG estimate = 0.7(3) for the violation of the Dashen theorem. The change occurring if we instead use our own determination of in Eq. (119) is tiny: The value of M K is lowered to Using our central solution, fitKχ 6 , the experimental values of the two decay rates then yield The uncertainty in the theoretical estimate for H 0 contributes δ 1 Q = 0.49 to the error in the result for Q. The Gaussian error in the fit to the data is of similar size: δ 2 Q = 0.44 (this includes the uncertainties used for the theoretical part of the discrepancy function). The noise in the representation used for the phase shifts only generates an uncertainty of δ 3 Q = 0.05. While the error arising from our treatment of the isospin breaking effects in the charged channel is more important, δ 4 Q c = 0.12, the corresponding uncertainty in the neutral channel is even smaller: δ 4 Q n = 0.04. Finally, the experimental uncertainties in the decay rates of the charged and neutral channels yield an error of δ 5 Q c = 0.20 and δ 5 Q n = 0.19, respectively. The errors quoted in (127) are obtained by adding these contributions up in quadrature. Combining the results obtained in the two channels, we obtain Note that the value of the amplitude at the center of the Dalitz plot plays an important role here. As discussed in Sec. 5.6, this value is sensitive to the number of subtractions made. The systematic theoretical error introduced by setting γ 1 = δ 0 = 0 reduces the value of the amplitude at the center of the Dalitz plot by the factor 1.483/1.366, so that Q is lowered by almost one unit.   Table 9 compares our value of Q with results found in the literature. The numbers listed are either given in the quoted papers or are calculated from the estimates for the quark masses or mass ratios given therein. The first crude estimate for the masses of the three lightest quarks within QCD, m u 4 MeV, m d 6 MeV, m s 135 MeV [105] appeared in 1975 -the entry in the first line is calculated from these numbers. The value given in the second line is obtained from the current algebra formulae for M 2 π + , M 2 K + and M 2 K 0 , corrected for electromagnetic self-energies with Dashen's theorem [106] (tree approximation of χPT).
The significance of the quark mass ratio Q for the chiral expansion of the meson masses was noticed only in 1985 [68]. The third line represents the result of a χPT calculation to one loop [11], where the quantity κ ≡ 1/Q 2 was determined from the experimental decay rate. Note that, at that time, the rate was still subject to substantial uncertainties -since then, the value of Γ η→π + π − π 0 quoted by the Particle Data Group increased by more than three standard deviations: from 197(29) eV to 299(11) eV. As the result for Q is inversely proportional to the fourth root of the rate, the one-loop result 23.3(1.8) quoted in Ref. [11] drops to Q = 20.9(1.6) if the erroneous input used for the width is corrected.

Chiral expansion of the meson masses
As mentioned above, the correction term ∆ Q is beyond the accuracy of our calculation. Our result relies on the assumption that this term is too small to matter at the precision reached. This assumption concerns the properties of the strong interaction and could be examined with the same methods that are used in lattice determinations of the quark mass ratio The lattice results for this quantity have reached remarkable precision [27]. In particular, it has been shown that the result is not sensitive to the heavy quarks. FLAG quotes the values 27.34(31) and 27.30 (34) for simulations of QCD with three and four dynamical flavours, respectively. Since the most recent lattice results on the light quark masses are obtained with four dynamical flavours, we work with the second number, S = 27.30 (34) .
The quark mass ratio S also represents the leading term in the chiral expansion of a ratio of meson masses. The formula analogous to the low-energy theorem (122) reads [68] 13 but there is an important difference. While ∆ Q is of second order in the breaking of chiral symmetry, ∆ S is of first order and involves the low-energy constants L 5 and L 8 of χPT: The lattice result in (130) implies that the correction ∆ S is rather small: The situation with the quark mass ratio ) 13 In the notation used in that reference, ∆S stands for ∆M .
is very similar. It compares the breaking of SU(3)-symmetry with the breaking of isospin symmetry; in current algebra approximation, R is given by the ratio of the mass differences is of the same order as in the case of S: To evaluate R numerically, we make use of the fact that only two of the three ratios Q, R and S are algebraically independent: With our result (122) for Q and the lattice determination for S in (130), we obtain The correction in the low-energy theorem (135) is of about the same size as for S, but of opposite sign: ∆ R = 0.053 (14) .
It is not difficult to understand why that is so. The above formulae show that the higher order contributions in Q, S and R are related by For the first order contributions on the right hand side of this relation to cancel one another, the corrections ∆ R and ∆ S must be of opposite sign and comparable in size. There is no reason for this cancellation to be complete, but we expect ∆ Q to be too small to significantly affect our result for Q. We conclude that, together with the lattice value of S, our result for Q leads to a coherent picture for the chiral expansion of the meson masses. The corrections of first order in the breaking of chiral symmetry are small. The well-known fact that the Gell-Mann-Okubo formula holds to good accuracy corroborates this picture further. The formula predicts the value of M K in terms of M η and M π : 14 The correction ∆ M K is comparable with those in S and R, algebraically, ∆ M K = O(m quark ), as well as numerically, ∆ M K = 0.063 (1). Since the ratio m u /m d is also determined by S and Q, our framework leads to an estimate for the relative size of m u and m d as well. Neglecting ∆ Q also here, we obtain For a while, the theoretical possibility of a massless u-quark was taken seriously as a solution of the strong CP-problem [110,111], but as pointed out long ago [112], that idea is not 14 In the notation of Ref. [68], ∆M K stands for (M 2 η + M 2 π )/(3M 2 η + M 2 π )∆ GMO and involves the LECs L5, L6 and L7. consistent with the observed pattern of chiral symmetry breaking. Our calculation fully confirms this, as it excludes the value m u = 0 by about 16 standard deviations.
The upshot of the above discussion is that, in QCD, the chiral expansion of the squares of the Nambu-Goldstone masses is dominated by the leading terms. At the physical values of m u , m d , m s , the corrections ∆ S , ∆ R , ∆ M K from the higher order terms were found to be remarkably small and the low-energy theorem (123) suggests that ∆ Q is even smaller. We emphasize that these statements concern the dependence of the meson masses on the masses of the quarks and do not apply to the expansion in powers of the momenta. The example of ππ scattering shows that even within SU(2)×SU(2), the expansion in powers of the momenta picks up sizeable contributions from the final state interaction already at threshold. It is essential that our analysis relies on dispersion theory for the momentum dependence -as discussed in detail in Sec. 6, χPT does not describe the momentum dependence of the transition amplitude sufficiently well in the physical region of the decay, even if the contributions arising at NNLO of the chiral perturbation series are taken into account.

Comparison with the lattice results for Q
Finally, we compare our results for Q with the most recent determinations on the lattice. Table 9 shows that, while the results reviewed in the FLAG report [27] for simulations with 3 or 4 flavours are quite consistent with ours, the most recent determinations, BMW (N f = 2 + 1) [91] and RM123 (N f = 2 + 1 + 1) [92] are higher than our value (121) by 1.5 and 1.4 standard deviations, respectively. As mentioned in Sec. 9.1, the results obtained in these references for the kaon mass difference are consistent with ours. Also, the uncertainties in the values of the isospin limits M π and M K are much too small to explain the discrepancy. Hence the difference must arise from the correction term ∆ Q in the low-energy theorem (122), which is beyond the accuracy of our calculation.
To identify the core of the problem, we stick to the central values for M π and M K in (125), (126). Also, in order to respect the identity (136), we fix the value of S with those for R and Q given in the two references. Using the values for the mass differenceM   the quantitiesM 2 K 0 −M 2 K + , R and Q are strongly correlated, a meaningful error estimate requires knowledge of the correlations and is thus beyond our reach. The outcome for ∆ S and ∆ R confirms that the first order corrections are small, but ∆ R is of the same sign as ∆ S : on the right hand side of (139), the two contributions cannot possibly cancel. Hence the result for ∆ Q is in conflict with the expectation that effects of second order are smaller than those of first order.
The lattice approach is ideally suited to resolve this conundrum. At least in principle, it should be possible to determine ∆ Q with the same accuracy as m s /m ud -the issue concerns QCD and is not plagued by the long range contributions from QED, which are difficult to account for at finite volume. The calculation requires the simulation of QCD with three (or more) quark flavours of unequal mass. More precisely, one needs to calculate the meson massesM π + ,M K + ,M K 0 in this theory as a function of the quark masses m u , m d , m s . The scale Λ QCD can be pinned down with the pion decay constant, for instance, and if the simulation includes charmed quarks, the corresponding mass can be fixed withM D + . The quantities of interest are the following combinations of meson and quark masses (M π , M K represent the massesM π + ,M K + in the limit m u , m d → m ud ): If the pion decay constant as well as the relative size of the quark masses are held fixed, ∆ S and ∆ R grow in proportion to m s while ∆ Q is proportional to m 2 s . For sufficiently small quark masses, chiral symmetry guarantees that ∆ Q is small compared to ∆ S and ∆ R , but if the breaking of chiral symmetry becomes comparable to the scale of the theory, there is no reason for this to be so. We know that, for quark masses in the vicinity of the physical values, ∆ S amounts to about 6 %. What is the size of ∆ Q there?
While completing the present work, the Fermilab Lattice, MILC & TUMQCD collaborations came up with a new lattice determination of the quark masses [113]. Unfortunately, the paper does not contain a result for the ratio Q, but neglecting correlations and adding errors in quadrature, the mass ratios which are given therein, S = 27.182 (46)(56)(1) 121) and (137). Accordingly, the outcome of this calculation appears to be consistent with a coherent chiral expansion of the meson masses and to confirm that the corrections to the current algebra formulae are small. Although the paper focuses on the determination of the masses of the heavy quarks, the ratios m u /m d and m s /m ud are given to remarkable accuracy. In particular, the precision claimed for S is breathtaking -the quoted uncertainty is about four times smaller than for the FLAG value (130) we are relying on and the uncertainty in the outcome for Q is smaller than ours by more than a factor of two. Concerning the comparison with [91,92], the main difference is that the calculation is done within QCD rather than QCD + QED. The outcome for the masses m u , m d and m s is corrected for e.m. effects, but for details of the procedure used, the reader is referred to a forthcoming paper by the MILC collaboration.

Dispersive approaches
Early papers on η → 3π which have followed a similar approach to the one presented here are [14,15]. Indeed, in spirit, the calculations are very similar, but there are significant differences which make a detailed comparison of the results difficult: • The phase shifts adopted in [14,15] were taken from [114], whereas we are now able to use solutions of Roy equations matched to χPT [16,57].
• At that time, accurate data on the Dalitz plot in the charged channel were not available yet, so that the best one could do to fix the subtraction constants was to match them to χPT.
• The available χPT calculation was at one loop, and therefore there was no possibility to go beyond four subtraction constants.
• The treatment of isospin breaking corrections available at that time [72] was not yet as complete as the one provided in [18].
The result Q = 22.4(9) obtained by Kambor, Wiesendanger and Wyler [14] and the value Q = 22.7(8) of Anisovich and Leutwyler [15] are slightly higher than ours, but the difference is mainly due to the fact that, in the meantime, the experimental value of the decay rate quoted by the Particle Data Group increased (updating the calculation of [15] with Ref. [107], the result is lowered to Q = 22.3(8) [115]). The formulae derived by Kambor et al. have been used later to fit KLOE data by Martemyanov and Sopov [108]. The paper is very short and does not give any detail about the calculation -other than a formula of Kambor et al., on which the authors based their analysis [14]. All the differences pointed out above between the present analysis and the one by Kambor et al. apply also to this calculation -in particular that isospin breaking effects have not been accounted for. For completeness we nonetheless quote the value of Q they obtained: Q = 22.8 (4). The central value is the same as the one quoted by Walker [107] and therefore higher than the one obtained by Kambor et al., but the error much reduced. It is difficult to understand why the effect of the KLOE data is to increase the value obtained for Q, with respect to what Kambor et al. obtained by doing a matching to χPT to one loop. In his PhD thesis [116] one of the authors of the present paper (S.L.) showed that if one applies the same formulae and simply replaces χPT with data to fix the subtraction constants, the value obtained for Q decreases (see also [44]). Fig. 20 amounts to an update of a picture drawn by Anisovich and Leutwyler, more than twenty years ago, in order to illustrate the effects generated by the final state interaction [15]. The framework underlying that paper is essentially the same as the one used in the construction of the matching solution fitχ 4 in Sec. 3.5: a dispersive analysis with four subtraction constants, which are determined by imposing theoretical constraints derived from χPT. The figure concerns the behaviour of the real part of the amplitude M c (s, t, u) along the line s = u, in the isospin limit.
In the present work, the convention used for the value of the pion mass in the isospin limit is irrelevant, because we account for isospin breaking when comparing our calculation with experiment. In Fig. 20, however, it does matter: The straight line that shows the behaviour at leading order (LO), for instance, depends on it. We identify the isospin limit of the pion mass with the mass of the charged pion, while in [15], the mass of the neutral pion was used. If isospin breaking corrections are not applied, that choice is preferable because isospin breaking in the masses of the pions is dominated by electromagnetism, which barely affects the mass of the neutral pion. We correct for the difference in the same way as for the isospin breaking corrections, using χPT. At LO, the transformation of the amplitude from one convention to the other amounts to a mere rescaling of the vertical axis, by the 1.074. At one-loop, the isospin limit of the chiral representation is given by M GL c (s, t, u) and the real parts are readily worked out for M π = M π 0 as well as for M π = M π + . The ratio of the real parts remains roughly constant, but at a slightly larger value. We expect this to be the case for the dispersive representation as well -the red curve in Fig. 20 is obtained from the one shown in the old figure by stretching the values with the one-loop result for the ratio of the real parts. For comparison, the open circles in Fig. 20 show the real part of the amplitude belonging to the matching solution, fitχ 4 . The main difference between this representation and the one obtained in Ref. [15] is that the ππ phase shifts are now known much more precisely. The figure shows that the old calculation underestimates the amplification of the amplitude by the final state interaction at threshold, but overestimates its growth with the energy.
The figure also shows the outcome of two more recent calculations [43,47]. Kampf, Knecht, Novotný and Zdrádhal [43] have adopted a dispersive approach as well, but instead of solving the dispersion relations numerically, they have solved them analytically by iterations, stopping at the second iteration. This corresponds to a two-loop χPT representation from the analytic point of view, but the subtraction constants are not exactly related to the LEC of χPT, as the authors explain in their paper. In this connection, we refer to the detailed comparison of the dispersive approach with the two-loop representation of χPT given above (Sec. 6). Their approach also differs from ours in the way the normalization of the amplitude is fixed from theory: While we use the value of the Taylor invariant K 0 , they use the imaginary part of the amplitude along the line t = u.  Fig. 10 shows that our calculation also goes astray if we allow for 6 subtraction constants and fit the data on the Dalitz plot distribution by treating these as free parameters. According to Martin Zdráhal [117], this deficiency can be repaired without affecting significantly the rest of the calculation and in particular the fit to data, but detailed results for this improved analysis within their approach have not been published. Note also that their work does not account for isospin breaking corrections. The published value Q = 23.3(8) is significantly higher than ours, but in view of the shortcomings of the underlying analysis, this does not come as a surprise.
More recently, the JPAC collaboration [45][46][47] has also analyzed η → 3π decays, and in particular KLOE data, with a dispersive approach and the aim to determine the value of Q. The spirit is similar to the one adopted here, but the way in which the dispersion relations for this process are solved differs significantly from ours and isospin breaking corrections are not applied. The authors make an approximate treatment of the left-hand cut for the partial wave amplitudes, and assume that it can be well described by a polynomial. As we have demonstrated here (following [15]), the iterative procedure for deriving solutions of the dispersion relation converges fast and takes into account crossed channels (responsible for the left-hand cut) exactly. It is possible that the polynomial approximation adopted in [46,47] works reasonably well, but having the exact solution available, this becomes an academic question. We are indebted to Igor Danilkin for providing us with the numerical values shown in Fig. 20. In the physical region of the decay, their results are consistent with ours and the same holds for the value obtained for the quark mass ratio, Q = 21.6(1.1). Unfortunately, the method used does not work below the physical region, so that the behaviour in the vicinity of the Adler zero cannot be compared.
In Refs. [50][51][52][53] Kolesár and Novotný take a very different point of view from the one adopted here -namely that the reason for the bad convergence of χPT for this decay is understood and has to do with large final-state rescattering effects -and try to identify the reasons for the bad convergence within the framework of the so-called resummed Chiral Perturbation Theory (rChPT) [118,119]. In this approach, vacuum fluctuations ofss pairs are treated in a special way and their effect resummed. Their size is left unconstrained, which implies that both the SU(3) condensate and decay constant are treated as free parameters, having possibly a very different value than their SU(2) counterparts. The idea is very intriguing and if one could find a way to rigorously determine the size of these SU(3) parameters, this would be a very interesting result.
The present work shows that rescattering effects can be accounted for in a systematic, nonperturbative manner. Causality and unitarity determine the momentum dependence of the transition amplitude up to a set of subtraction constants -χPT is used exclusively to work out the constraints on these constants arising from chiral symmetry. Our analysis, in particular, does not rely on the chiral expansion for quantities that contain strong infrared singularities and are notoriously difficult to deal with in χPT.
Very recently, Albaladejo and Moussallam [48] have shown how to extend the dispersive formalism we have used in the present work to include the effect of inelastic two-body effects, likeKK and ηπ. This remarkable and very useful technical advance allowed them to explicitly take into account effects related to narrow resonances in the one-GeV region, like the a 0 (980) and the f 0 (980). From their numerical analysis, they conclude that the effect on the determination of Q are of the order of 0.2 units, and therefore much smaller than the error. They also invoke the KLOE data on the Dalitz plot distribution in the charged channel to constrain their representation and to predict the coefficients of the distribution in the neutral channel. Setting γ = 0, they obtain α = −0.0337 (12), β = −00054(1), to be compared with our result (104). While our value for α is smaller than theirs by about 2 σ, we do confirm their value of β. The difference may in part arise because their analysis does not account for isospin breaking corrections, in part because the terms proportional to α and β in the Taylor series (101) provide a decent approximation only in the immediate vicinity of Z = 0. As discussed in Sec. 7, the curvature term γ affects the behaviour away from the center of the physical region -setting it to zero distorts the result for α. At any rate, we consider it very unlikely that the difference has to do with the presence of inelastic channels. The plots shown in [48] indicate that -in the physical region of the decaythe effects generated by these are well described by a polynomial. In our calculation, such contributions are absorbed in the subtraction constants. We do therefore not expect that explicitly accounting for inelastic channels would lead to a significant change in our results.

Nonrelativistic effective field theory
A different approach which has been applied to η → 3π decays is the one relying on a nonrelativistic Lagrangian. This has been very successful in describing K → 3π decays and in particular the cusp structure at the opening of the π + π − channel in the 2π 0 spectrum of the K ± → π ± 2π 0 decay [37][38][39][40]. In this framework one makes a nonrelativistic expansion both at the level of the Lagrangian as well as in the calculation of rescattering effects. The importance of the latter is controlled by the scattering lengths, which happen to be small (as a consequence of the Nambu-Goldstone-boson nature of the pions): Technically, the NREFT also relies on an expansion in the scattering lengths. From the calculation point of view, rescattering effects are taken care of automatically by the loop expansion of quantum field theory. A significant advantage of this approach is that one does not rely on an expansion in the quark masses: The tree-level decay amplitude near to threshold is expanded in the spatial momentum squared, and the coefficients of this expansion are treated as free parameters. Which means that in this approach one does not have to worry about the slow convergence of χPT for the scattering lengths, for example, because these are by definition the physical values. The only question that matters in this case is whether one is close enough to threshold that the nonrelativistic expansion works.
The nonrelativistic approach is applied to the decay η → 3π in Refs. [41,42]. The mass difference between the charged and neutral pions is accounted for and the cusp due to the opening of the π + π − channel in the π 0 π 0 spectrum of the decay η → 3π 0 is analyzed in detail. Moreover, fitting the free parameters in the nonrelativistic representation of the transition amplitude to the KLOE data available at the time, the authors of Ref. [41] did obtain a negative value for the slope α in the neutral channel, as observed. A comparison of the predicted Dalitz plot in the neutral channel with the data by MAMI-C shows that the calculation is in reasonable agreement with the data: In particular that, as one moves from tree-level to one and then to two loops (in the NR expansion), the curves obtained move towards the data and show a good convergent behaviour.
It is worth emphasizing here the difference between our approach and the NR expansion: While in a dispersive treatment rescattering effects (in the S and P waves) are treated exactly, the NR expansion applies a perturbative scheme to account for these. However, the treatment of isospin breaking effects can be done in a theoretically much cleaner way within the NR approach. We have relied on one-loop χPT and a factorization hypothesis, which can only be approximately correct. To exemplify the difference between the two approaches it is useful to compare the Dalitz plot in the neutral channel: In the NR approach the strength of the cusp effect is exactly described in terms of the S-wave scattering lengths, according to a venerable low-energy theorem [87]. If these were taken from experiment, then the strength of the cusp would be correct by definition.
In Ref. [42] this approach has been further refined and extended to include isospin breaking corrections beyond the π + − π 0 mass difference, and a complete set of formulae describing these decays in the NR expansion have been provided. In this paper the question whether fitting the Dalitz plot data in the charged channel correctly reproduces the Dalitz plot in the neutral channel has been addressed thoroughly. The conclusion is similar to the one obtained by Gullström et al. [41], namely that the agreement with the data in the neutral channel is marginal. In particular, only at the two loop level does the value of α become negative, and only after a partial resummation of rescattering effects does it get close to the measured value. For the coefficients of the Dalitz plot distribution in the neutral channel, Schneider et al. [42] obtain α = −0.0246 (49), β = −0.0042 (7), γ = 0.0013(4), based on matching to χPT and resummation of bubble graphs. Although the ingredients of this calculation are quite different from ours, the comparison with the numbers in (101) shows that the qualitative properties of the prediction for the Dalitz plot distribution in the neutral channel are the same.
Ref. [42] also proposes a different approach to the determination of α within the NREFT formalism: The authors derive an exact relation (in the isospin limit) between the Dalitz plot parameters in the charged channel and the slope α in the neutral channel and show that if one inputs the parameters measured by KLOE and estimates the imaginary part of a combination of Dalitz plot parameters (defined as Imā) within the NR expansion, one obtains a value for α which is only in marginal agreement with the measured value. This remains true even after calculating isospin breaking corrections. We have analyzed this apparent clash in some detail and came to the conclusion that the estimate of the parameter Imā within the NR expansion does not seem to be reliable. The reasoning is as follows: If we fit the KLOE data and calculate the slope at Z = 0 with our dispersive representation we get α = −0.0302 (13), in agreement with the PDG value. This evaluation accounts for isospin breaking effects. As discussed in Sec. 5.8, the polynomial approximation to our central solution agrees well with the experimental determination by KLOE. If we now insert these numbers in Eq. (6.9) of Ref. [42] and rely on their estimate of Imā we get α = −0.0474, in substantial disagreement with our own direct determination. Since Eq. (6.2) of Ref. [42] is algebraically exact, and the estimate of the isospin breaking effects (leading to Eq. (6.9)) only gives a small correction, the problematic step must be in the estimate of Imā.
An even better test of the NREFT approach would be to analyze the data along the lines of Sec. 5.9

Summary and Conclusions
1. The essential properties of the framework we are using to analyze the transition amplitude of the decay η → 3π were derived long ago [30][31][32]. The decay violates the conservation of isospin. Since chiral symmetry suppresses the electromagnetic interaction in this transi-tion [2], the dominating contribution arises from QCD and is proportional to the difference m d − m u of quark masses. It is convenient to normalize the amplitude with whereM K 0 andM K + denote the kaon masses in QCD.
2. The first part of the present paper reviews the dispersion theory of the amplitude M c (s, t, u) in the isospin limit (e → 0, m u → m d ), where this function also determines the amplitude relevant for the transition η → 3π 0 . We follow the dispersive analysis set up in [15], which exploits the fact that, at low energies, the angular momentum barrier suppresses the imaginary parts of the D-and higher partial waves. Neglecting these, the amplitude can be decomposed into three isospin components, which only depend on a single variable: M 0 (s), M 1 (s), M 2 (s) -see Eq. (17).
3. Elastic unitarity determines the discontinuities of the isospin components across the branch cuts associated with collisions among pairs of pions, in terms of the S-and P-wave ππ phase shifts. We write the corresponding dispersion relations in the form (33), allowing for six subtraction constants: α 0 , β 0 , γ 0 , δ 0 , β 1 , γ 1 . These relations represent a set of integral equations that uniquely determine the amplitude in terms of the subtraction constants. Moreover, since the equations are linear in the subtraction constants, the general solution is given by a linear combination of six fundamental solutions that can be determined once and for all. 4. At the experimental accuracy reached, the electromagnetic interaction cannot be ignored. In particular, the e.m. self-energy of the charged pion modifies the amplitude obtained from QCD quite significantly. We rely on the representation of Ditsche, Kubis and Meißner [18], who evaluated the transition amplitude within the effective theory of QCD+QED, to first non-leading order of the chiral expansion and to order e 2 in the electromagnetic interaction. Their analysis in particular also accounts for the emission of the soft photons that necessarily accompany the decay as well as for the Coulomb pole generated by the attraction among the charged pions in the final state. We assume that the data are radiatively corrected in accordance with their analysis. 5. A substantial part of the e.m. interaction can be accounted for with a purely kinematic map that takes the physical phase space of the decay η → π + π − π 0 onto the phase space of the isospin symmetric world. Applying this map and removing the Coulomb pole, the isospin breaking corrections reduce to an approximately constant numerical factor, except near s = 4M 2 π + , where a visible structure due to the interference of the branch cuts from π + π − and π 0 π 0 intermediate states remains (left panel of Fig. 8). Isospin breaking in the decay η → 3π 0 can be treated analogously. In that case, a Coulomb pole does not occur. Instead there is a small cusp due to the virtual transition π 0 π 0 → π + π − → π 0 π 0 (right panel of Fig. 8). Those isospin breaking effects that are not taken care of by the kinematic map are accounted for only in one-loop approximation.
6. The theoretical constraints that follow from the fact that the pions are Nambu-Goldstone bosons of a hidden approximate symmetry can be worked out by means of Chiral Perturbation Theory. The representation of the amplitude obtained on this basis does have the structure of Eq. (17), up to and including NNLO. The only qualitative difference compared to the dispersive framework we are using is that the chiral representation corresponds to an extended version of elastic unitarity, which also accounts for the discontinuities generated by KK, ηη and πη intermediate states. In the region relevant for η decay, the contributions generated by these singularities are very small and well described by their Taylor expansion in powers of s. As we are working with sufficiently many subtractions, they can be absorbed in the subtraction constants.
7. At leading order of the chiral expansion (current algebra), the transition amplitude of the decay η → π + π − π 0 is independent of t and u, grows linearly with s and has an Adler zero at s = 4 . Although the zero occurs outside the physical region, the data on the Dalitz plot distribution beautifully confirm its presence: Ignoring the theoretical constraints altogether and allowing only four subtraction constants, the dispersive representation yields a very good fit of the data (Sec. 5.3, fitK 4 ). Along the line s = u, the real part of this representation indeed passes through zero at s = 1.43M 2 π , close to the place where current algebra predicts this to happen.
8. The information provided by χPT is essential, because the Dalitz plot distribution leaves the normalization of the amplitude open. To establish contact between the dispersive and chiral representations, we consider the region where the uncertainties in the latter are smallest, i.e. focus on small values of s in M 0 (s), M 1 (s), M 2 (s) and compare Taylor coefficients. The requirement that the one-loop representation, which does not involve any unknowns, yields an acceptable approximation at low energies allows us to consistently combine the two. In particular, we normalize the dispersive representation with the oneloop value of the coefficient H 0 , accounting for the higher order contributions merely by attaching an uncertainty estimate to this value.
9. There is an alternative to fitK 4 , which we denote by fitχ 4 : A dispersive representation that also uses only four subtraction constants, but incorporates the theoretical information instead of the one obtained at KLOE. It is uniquely determined by the requirement that the isospin components of the dispersive representation match those of the one-loop representation at small values of s. Fig. 3 shows that the one-loop approximation accurately follows the dispersive representation only below threshold -in the physical region, it underestimates the strength of the final state interaction. This manifests itself particularly clearly in the Dalitz plot distribution of the neutral decay mode: Fig. 5 shows that the curvature of the two representations differs even in sign. 10. The same deficiency also shows up at two loops: The lowest resonance of QCD, the f 0 (500), is not described well enough even at NNLO of the chiral expansion. This implies that the two-loop representation does not have the necessary accuracy in the physical region -a meaningful comparison of theory and experiment is possible only in the framework of dispersion theory. The problem is illustrated in Fig. 13, which compares our central solution with the two-loop representation that matches it at low energies. 11. We emphasize that the analysis reported here became possible only very recently, with the accurate measurement of the Dalitz plot distribution for the decay η → π + π − π 0 at KLOE [22]. For the central solution of our system of equations, the errors arising from the experimental and theoretical uncertainties are of comparable sizeη-decay is a showcase for a fruitful interplay between theory and experiment.
12. As discussed in detail in Sec. 5.5, the simpler framework obtained by dropping the subtraction constants δ 0 and γ 1 is too stiff -doing this amounts to imposing constraints that distort the transition amplitude. The need for the term δ 0 s 3 in the subtraction polynomial of M 0 (s) also shows up in connection with the polynomial approximation of the kaon loops: The contributions from the KK cuts to M 0 (s) are not accounted for sufficiently well by a quadratic polynomial, but a cubic one does suffice. Moreover, working with six subtraction constants has the advantage that -in the region of interest -the solutions are then not sensitive to the high energy tails of the dispersion integrals, where elastic unitarity does not represent a good approximation. In the error analysis, the uncertainties associated with the high energy tails are booked together with those in the phase shifts at low energies, where the Roy equations provide very good control -with six subtraction constants, the net uncertainty from these sources is very small. 13. The decomposition of the amplitude M c (s, t, u) into its isospin components M 0 (s), M 1 (s), M 2 (s) is unique only up to polynomials [see Eqs. (20), (21)]. For the dispersive representation, the ambiguity is disposed of when bringing the dispersion relations to the form (33). Alternatively, the solutions can be characterized by invariant combinations of Taylor coefficients: Two solutions yield the same representation M c (s, t, u) if and only if these invariants are the same. This allows us to unambiguously characterize the twoloop representation that matches our central solution at low energies (see Sec. 6.2). A corresponding update of the low-energy constants occurring in the effective Lagrangian at O(p 6 ) would be of considerable interest but is beyond the scope of the present work.
14. Isospin symmetry leads to a prediction for the branching ratio of the neutral and charged decay modes, B = Γ η→3π 0 /Γ η→π + π − π 0 . The result of our calculation, B = 1.44 (4) is in good agreement with the values B = 1.426 (26) and B = 1.48(5) quoted by the Particle Data Group [66]. 15. The Dalitz plot distribution of the decay η → 3π 0 can be expanded in powers of the variables X n , Y n . In the region where the series converges, X 2 n + Y 2 n < 0.6, our prediction is remarkably well approximated by the polynomial (101) -the coefficients are specified in (104). In the remainder of the physical region, the singularities generated by the final state interaction manifest themselves as cusps. The dominating contribution from these is described by the formula (105). Although they are too weak to stick out from the fluctuations in the data, the quantitative analysis does confirm their presence at the strength required by dispersion theory. 16. The MAMI data on the decay η → 3π 0 [23][24][25] allow a strong test of our calculation. Isospin symmetry implies that the amplitude of this transition is described by the combination M n (s) ≡ M 0 (s) + 4 3 M 2 (s) of the isospin components relevant for the charged channelthe KLOE data thus lead to a parameter free prediction for this decay. Fig. 16 shows that the calculated distribution is in excellent agreement with the MAMI results.
17. The recent update provided by the A2 collaboration [25] now allows an analysis of the Dalitz plot distribution that goes beyond the linear approximation. The data in the neutral channel do not by themselves determine the slope very accurately, but impose a strong correlation between the slope α and the curvature γ. Dispersion theory provides the missing element as it determines the curvature within narrow limits. Our analysis, which relies on the KLOE data for η → π + π − π 0 and on the theoretical constraints that follow from the presence of a hidden approximate symmetry, predicts both the slope and the curvature rather precisely: α = −0.0303 (12), γ = 0.0019(3). The slope is somewhat smaller than the average α = −0.0318(15) quoted by the Particle Data Group [66]. Including the MAMI data [25] in the dispersive analysis, we obtain a result that is even a little smaller: α = −0.0294 (10). Unfortunately, the likelihood of the fits to the MAMI results is not satisfactory: χ 2 M /dof = 1.25 for the polynomial fit to these data alone and χ 2 M /dof = 1.27 for the dispersive fit, which combines them with the data from KLOE. 18. Our resultM 2 K 0 −M 2 K + = 6.3(4)10 −3 GeV 2 for the kaon mass difference in QCD agrees with recent determinations of the electromagnetic self-energies on the lattice [91,92]. We thus confirm that the strong infrared singularities occurring in the chiral expansion of the kaon self-energies subject the Dashen theorem to a large correction from higher orders. For the parameter which measures the size of this correction, we find = 0.9(3).
19. Finally, we invoke the low-energy theorem which relates the kaon mass difference to the ratio Q 2 ≡ (m 2 s − m 2 ud )/(m 2 d − m 2 u ) of quark masses [68]. The theorem can be compared with the Gell-Mann-Okubo formula, but there is an important difference: While that formula only holds at leading order of the chiral expansion and picks up corrections of first non-leading order, the relation relevant for Q receives corrections only at next-tonext-to-leading order. This implies that, instead of expressing the decay rate in terms of the kaon mass difference, we can just as well express it in terms of the quark mass ratio Q. Conversely, the measured decay rates in the charged and neutral channels yield two independent determinations of this mass ratio. The two results agree very well with one another -combining them, we obtain Q = 22.1 (7), where the error includes all sources of uncertainty encountered in the calculation, including an estimate for the neglected higher order contributions in the chiral series. These numbers indicate that, within QCD, the chiral expansion of the square of the Nambu-Goldstone masses is dominated by the leading terms, i.e. by the linear formulae of current algebra. At the physical values of m u , m d , m s , the higher order contributions amount to remarkably small corrections. 21. While the outcome of our calculation for the kaon mass difference in QCD agrees with the lattice results within errors, the values obtained for the isospin breaking quantities Q, R and m u /m d in two of the three most recent lattice calculations [91,92] do not. We point out that the discrepancy concerns the size of the corrections arising in the low-energy theorems for the corresponding ratios of meson masses. While the pattern obtained with our result for Q leads to a coherent picture, these lattice results imply that the corrections in R and S, which are of first order in chiral symmetry breaking are smaller than those in Q, despite the fact that the latter represent contributions of second order. In Sec. 9.5, we indicate a way to resolve this conundrum by means of a lattice simulation within QCD. 22. In the plane of the quark mass ratios m u /m d and m s /m d , a given value of Q corresponds to an ellipse, while a given value of S corresponds to a straight line. The yellow band in the left panel of Fig. 21 represents the region allowed by our result for Q, while the grey band represents the region allowed by the lattice result for S quoted by FLAG. For comparison, the figure also indicates the first estimates of the three lightest quark masses [105,106], which appeared shortly after the discovery of QCD. The hexagon represents the rough estimates for the range in the variables S, R and m u /m d where the chiral expansion yields a coherent picture, obtained many years ago [120].
The right panel focuses on the region of physical interest and includes recent results obtained on the lattice. In particular, it compares the outcome of our work with the region allowed by the lattice results according to FLAG [27] and to the Particle Data Group [66]. The outcome of the three most recent lattice calculations (BMW [91], RM123 [92], Bazavov   et al. [113]) is also indicated -the regions shown are obtained by treating the values obtained for S and m u /m d as statistically independent. 15 23. In Sec. 10, our analysis is compared with related work. There are two significant improvements compared to the early dispersive analyses in Refs. [14,15]: The experimental information about η-decay improved very substantially and the phase shifts of ππ scattering are now under much better control. Concerning the properties of the Dalitz plot distribution, the various investigations are now in reasonable agreement. In order to establish contact with QCD and to extract information about the quark masses from η-decay, however, the theoretical constraints that follow from the fact that the pions and the η-meson are Nambu-Goldstone bosons of a hidden approximate symmetry play a crucial role. These constraints can be analyzed in a controlled manner in the framework of χPT, but care must be taken not to leave the region where the first few terms of the chiral perturbation series provide a decent approximation. Some of the analyses found in the literature, for instance, rely on matching the dispersive and chiral representations directly in the physical region of the decay. Since the first few terms of the chiral perturbation series do not represent a good approximation there, this leads to incorrect conclusions. 24. The nonrelativistic effective theory provides a representation of the transition amplitude for the decay K → 3π that works very well [37][38][39][40]. The method even leads to a coherent analysis of the contributions from the electromagnetic interaction. Since M η is not much larger than M K , this approach can be expected to work for η → 3π as well. We have verified that the amplitude of Ref. [38] indeed fits the KLOE data perfectly well. Moreover, in the isospin limit and in the physical region, the NR framework yields an excellent approximation of our solutions. The subtraction constants of the dispersive solutions that match the NR amplitude have a sizeable imaginary part, but, throughout the physical region, the difference between the two representations is very small, for the imaginary part as well as for the real part. This demonstrates that the NR effective theory provides a suitable framework for the analysis of η-decay. An extension of the analysis of the e.m. contributions from Kto η-decay would be of considerable interest. 25. It is not a straightforward matter to establish contact between the nonrelativistic effective theory and the quark masses which occur in the QCD Lagrangian. Our approach relies on the assumption that, in the vicinity of the Adler zero, the one-loop representation of χPT provides a good approximation. The Adler zero is outside the region where the truncated expansion of the nonrelativistic effective theory represents a good approximation, but the link can be established by matching the dispersive and nonrelativistic representations in the isospin limit: (i) Determine the Dalitz plot distributions in the charged and neutral channels within the nonrelativistic framework. (ii) Take the isospin limit of the transition amplitude and expand it in powers of the spatial momenta of the three pions in the rest frame of the η. (iii) Match the coefficients of this expansion -the analogues of the scattering lengths -to those of the generic dispersive representation. It would be most interesting to carry this out, but we leave this for the future. In the following discussion, four values of s play a special role and we introduce corresponding symbols s 1 < s 2 < s 3 < s 4 to simplify the notation: For s > s 4 , both s + (s ) and s − (s ) are real and smaller than 4M 2 π , so that the integral over z runs along the real axis, to the left of the branch cut (Fig. 22 E). In this case, the situation is essentially the same as for ππ scattering, where elastic unitarity also implies a representation of the form (16) - (26), except that M η is replaced by M π (the expression for κ(s ) then simplifies to κ(s ) = s − 4M 2 π , so that the integration extends over the interval 4M 2 π − s ≤s ≤ 0 of the negative real axis). If s falls below s 4 , the term κ(s ) becomes imaginary: The integration runs along a line that is parallel to the imaginary axis (Fig. 22D). Note that, in view of the square roots, κ(s ) is defined only up to a sign. Since the integrand in (26) only contains the product z κ(s ), the average z n M I (s ) picks up the factor (−1) n if κ(s ) changes sign. The expressions for the functionsM I (s ) in (24), however, remain invariant. Hence the representation for the discontinuities is independent of the sign chosen for κ(s ). In Fig. 22, we have chosen the sign such that Im κ(s ) ≥ 0.
The straight line from s − (s ) to s + (s ) crosses the real axis ats = 1 2 (3s 0 − s ). As long as s stays above s 3 , this point is to the left of the branch cut, so that the path of integration avoids the singularity, but if s falls below that value, there is a problem: The straight line connecting s − (s ) with s + (s ) then crosses the singularity. The problem would not arise if M η were smaller than 3M π : The quantity M 2 η − 5M 2 π would then stay below 4M 2 π , so that, in the entire range over which s varies, the integral in (26) stays away from the branch cut. The very fact that the η does decay into three pions, however, implies that M η is larger than 3M π : The straight path of integration in (26) necessarily runs across the singularity generated by the interaction among one of the pion pairs.
The way out is to deform the path of integration. The right hand side of (26) represents an integral of the analytic function M I (s) over its argument: the term κ(s ) vanishes, so that the path shrinks to a point. This completes the specification of the angular averages. We emphasize that, in the above procedure, the complex parameter M merely serves to determine the proper path of integration. Once this path is identified, the limit δ → 0 can be taken -the numerical evaluation of the angular averages only involves the physical masses.

A.2. Contribution from the horseshoe
The representation (148) involves inverse powers of κ(s ). In the limit δ → 0, κ(s ) vanishes at s = 4M 2 π , s = s 2 and s = s 4 , so that the integrands of the dispersion integrals (33) become singular at these points. The first zero sits at the lower end of region A. The path of integration shrinks to zero there, s ± → s 1 = 1 2 (M 2 η − M 2 π ). Indeed the formula (26) shows that the angular averages tend to a finite limit, proportional to the value of the amplitude there: z n M I (4M 2 π ) = c n M I (s 1 ), with c 0 = 1, c 1 = 0, c 2 = 1 3 . The third zero sits at the boundary between the regions D and E, where the path also shrinks to a point: s ± →s 3 ≡ −M π (M η − M π ). The angular averages tend to a finite limit, given by an analogous formula: z n M I (s 3 ) = c n M I (s 3 ).
For the zero at s = s 2 , however, the situation is not that simple. This value is at the boundary between the regions B and C. For values of s in these regions, it is convenient to decompose the path into three segments: (i) a straight line from s − (s ) to the point s = 1 sion relation (33) are singular there. In the case ofM 0 (s ) andM 2 (s ), the singularity is integrable, but forM 1 (s ) this is not the case.
A horseshoe occurs in the angular averages only if s is in the interval s 1 < s < s 3 . Moreover, at the endpoints of that interval, the functions H n I (s ) vanish. In the case of M 1 (s), the contribution from the horseshoe thus takes the form Here, the imaginary part of the mass plays the role of a regulator: For positive values of δ, the function κ(s ) does not have any zeros on the real axis, so that the dispersion integral is perfectly well-defined, but the limit δ → 0 cannot be interchanged with this integral. The regulator is needed only in the immediate vicinity of the singularity. Only one of the three zeros of κ(s ) is in the range relevant here -in the corresponding square root, the regulator must be retained, but in the remainder, the limit can be taken: κ(s ) can be replaced by without changing the limiting value of the integral. Also, if φ(s )H 1 (s ) is replaced by φ(s )H 1 (s )−φ(s 2 )H 1 (s 2 ), the limit can be interchanged with the integration. The operation, however, generates fictitious logarithmic singularities at s = s 1 , s 3 because the modified integrand is discontinuous there. Since the function H 1 (s) vanishes at the endpoints, the integral (152) does not contain such singularities. The artefact is avoided if the subtracted term is multiplied with a factor h(s ) that is equal to 1 at s = s 2 , but vanishes at s = s 1 , s 3 . The singular part of the integral then boils down to G(s) = The profile of the factor h(s ) is irrelevant. We find it convenient to work with a parabola, h(s ) = (s − s 1 )(s 3 − s )/(s 2 − s 1 )(s 3 − s 2 ) -for this choice, the function G(s) can be given explicitly. Moreover, the integrand in (154) is then analytic in the lower half of the s -plane. Hence the path of integration can be moved away from the real axis into the lower half-plane without changing the value of the integral. The limit δ → 0 can then be interchanged with the integration: On the real axis, G(s) does approach a finite limit and we now remove the regularization.
The dispersion relation only involves real values of s. The representation (154) shows that the function G(s) admits a unique analytic continuation into the upper half of the s-plane. In view of the branch points at s = s 1 and s = s 3 , the continuation into the lower half-plane is ambiguous. We identify the first sheet with the values reached by continuing analytically across the interval s 1 < s < s 3 of the real axis, while the second sheet corresponds to continuation to the left of s 1 . The difference between the values of G(s) on the first and second sheets is given by −2πih(s)(s 2 − s) −3/2 . Since G(s) is regular on the first sheet, it must be singular on the second: The singularity ∝ 1/κ(s ) 3 encountered if the angular average is evaluated with M = M η sits on the second sheet. The i -prescription implies that the value of G(s) on the first sheet is relevant -the presence of a singularity on the second sheet does not affect it.
The integral (154) can be done explicitly. For real values of s below s 1 , the result reads: with w ≡ √ s 2 − s, w 1 = √ s 2 − s 1 , w 3 = √ s 2 − s 3 . In this expression, the branch point at s = s 1 is described by the term arctanh(w 1 /w). As long as s stays below s 2 , both w and w 1 are real and positive -the branch point occurs at the place where the ratio w 1 /w passes through 1. On the first sheet, the analytic continuation of the function arctanh(w 1 /w) yields a constant imaginary part equal to π/2. When s crosses the point s = s 2 , the real part of the function arctanh(w 1 /w) goes through zero. The expansion in powers of w starts with arctanh This formula also holds for arctanh(w 3 /w), where w 1 is replaced by w 3 . Inserting the expansion in the first bracket of (156), the leading terms cancel, but those linear in w do not -they generate a simple pole at s = s 2 , with a residue proportional to h(s 2 ) = 1. The second bracket, however, contains exactly the same pole with the opposite sign, so that the explicit expression for G(s) is indeed singularity free. Accordingly, the numerical representation in Fig. 23 does not show any trace of a singularity at the point s = s 2 .
The complications concerning the behaviour in the vicinity of the zeros of κ(s ) also require extra work in the iterative procedure used to determine the fundamental solutions. For a detailed discussion, we refer to [107,116]. The numerical evaluation of the dispersion relations is carried out on a lattice of points. An interpolation between these is required to calculate the integrands relevant for the next step of the iteration. At those places where the integrand varies rapidly -in the vicinity of the threshold, for instance -the lattice must be fine enough to arrive at an accurate result for the principal value integral. Remnants of the difficulties encountered can be seen in Fig. 2: The fundamental solution belonging to α 0 shows a small wiggle near s = 4M 2 π in M 0 (s) (as well as in the amplitude M n (s) relevant for the neutral channel) and the plot of the component M 1 (s) reveals a spike at the point s = (M η − M π ) 2 8.6M 2 π , which is also due to the limited accuracy of the numerical evaluation. On the other hand, in the vicinity of the third zero of κ(s), s = (M η + M π ) 2 24.3M 2 π , our results do not indicate numerical deficits. For the determination of the quark mass ratio Q, we need the integral over the square of the amplitude over the entire physical region. This integral is not sensitive to the numerical shortcomings mentioned above. The M -distribution in the neutral channel, however, is affected. As seen in Fig. 17, that distribution is nearly flat and high resolution is needed to resolve the structure of the cusps generated by the final state interaction π 0 π 0 → π + π − → π 0 π 0 . The prediction shown in that figure relies on the fundamental solutions of the integral equations obtained by Gasser and Rusetsky [56].
for which the discontinuity is proportional to σ(s): discJ(s) = σ(s)/16π. The result is of further singularity, the one generated by the Coulomb attraction between the charged pions where τ max The Gaussian approximation exploits the fact that, in the vicinity of the minimum, the discrepancy function χ 2 tot can be approximated by the truncated Taylor series in the variables ∆x i = x i − x i min : The probability distribution in the space of the variables x 1 , x 2 , . . . then takes the form dp = N exp Accordingly, the mean values are given by where C ik is the matrix inverse of D ik . In particular, the Gaussian errors in the variables Note that the errors are correlated and this must be accounted for when calculating the uncertainties in the various quantities of physical interest. The correlations concern the offdiagonal elements of the matrix C ik . Table 11 lists the entries of the normalized correlation matrixC ik = C ik / C ii C kk for fitKχ 6 . It shows, for instance, that the results for δ 0 and γ 1 are strongly correlated with γ 0 and β 1 , respectively.

E. Sensitivity to the ππ phase shifts
As discussed in Sec. 2.6, the Roy solutions of [16] are characterized by the values of the phase shifts at 800 MeV. We vary them independently in the range given in (27). In addition, in order to study the sensitivity of our results to the high energy tail of the dispersion integrals, we consider the change occurring if the dispersion integral for the dominating component is chopped off at 1 GeV. Because of the narrow resonance f 0 (980), the phase shift δ 0 (s) rapidly passes through π around 1 GeV. For our central input, the factor sin δ 0 (s) occurring in (34) thus passes through zero there, takes negative values above 1 GeV and then returns to zero at 1.7 GeV. Chopping the integral off amounts to replacing the factor sin δ 0 (s) by zero. The corresponding variation of the phase shift follows the central representation for δ 0 (s) only up to the energy at which it reaches the value δ 0 (s) = π and remains at that value from there on. In order to estimate the sensitivity to the uncertainties in δ 0 (s), we evaluate the various quantities of interest for the two different configurations obtained by identifying δ 0 (s) with the lower or upper boundary of the band shown in Fig. 1, while δ 1 (s) and δ 2 (s) are kept fixed at the central values. We take half of the difference between the two results as an estimate for the error due to uncertainties in the low-energy behaviour of δ 0 (s). The same procedure is applied to the variations in δ 1 (s) and δ 2 (s), as well as to the one in the contributions from above 1 GeV. The net uncertainty due to the noise in the ππ phase shifts is obtained by adding the four individual errors in quadrature.
The comparison with (180) shows that the errors generated by the noise in the phase shifts are significantly smaller than the Gaussian errors -except for γ 0 , where they are of comparable size. Table 2 lists the full uncertainties obtained by adding all errors in quadrature. The numbers indicate that, in the case of the subtraction constants, the error budget is dominated by the contributions from the Gaussian errors and from the uncertainties in the input used for the phase shifts -those associated with the isospin breaking corrections barely matter.

F. Sampling data in the neutral channel
Binning data on the decay η → 3π 0 in the standard Dalitz plot variables X n and Y n is in conflict with Bose statistics -only one sextant of phase space contains independent events, but some of the bins necessarily reach out of this sextant. Fig. 24 shows an alternative sampling of the data that does respect the fact that the three pions in the final state are indistinguishable. It is obtained by binning in the variables λ, ϕ, which are defined by X n = λ Z b (ϕ) cos ϕ , Y n = λ Z b (ϕ) sin ϕ .
The function Z b (ϕ) represents the value of Z ≡ X 2 n + Y 2 n at the boundary of the physical region, which depends on the angle ϕ = arctan Y n /X n . Fig. 25 shows that Z b (ϕ) decreases from the maximum at ϕ = 1 6 π to the minimum at ϕ = 1 2 π. A constant value of λ corresponds to a curve that represents a shrunk version of the boundary, while a fixed value of ϕ corresponds to a ray emanating from the origin. In these coordinates, the area element becomes dX n dY n = Z b (ϕ)λ dλ dϕ .
The binning divides the range 0 ≤ λ ≤ 1 up into a set of curved bands: Λ(n − 1) ≤ λ ≤ Λ(n) n = 1, . . . , n max . Band #n is divided into n bins -the sextant contains altogether 1 2 n max (n max + 1) bins (the figure corresponds to n max = 28 and 406 bins). For the bins to be of the same size, the area of the band must be proportional to n. This determines the binning in the variable λ: Λ(n) = n(n + 1) n max (n max + 1) .
The requirement that the bins are of the same size determines the binning in the variable ϕ as well. The first m bins of band #n must cover the fraction m/n of the area of this band. We denote the area spanned by the events in the range π 6 ≤ ϕ ≤ ϕ by The fraction of the area of a band that contains the events in the above range is given by The binning in the variable ϕ must therefore satisfy the condition f [φ(n, m)] = m/n , where φ(n, m) is the value of ϕ at the upper end of bin #m in band #n. To solve this equation, the function f needs to be inverted. We denote the inverse by g: g[f (x)] ≡ x. In this notation, the explicit expression for the quantity φ(n, m) reads φ(n, m) = g(m/n). Hence the binning in the variable ϕ is given by