Non-perturbative Test of the Witten-Veneziano Formula from Lattice QCD

We compute both sides of the Witten-Veneziano formula using lattice techniques. For the one side we perform dedicated quenched simulations and use the spectral projector method to determine the topological susceptibility in the pure Yang-Mills theory. The other side we determine in lattice QCD with $N_f=2+1+1$ dynamical Wilson twisted mass fermions including for the first time also the flavour singlet decay constant. The Witten-Veneziano formula represents a leading order expression in the framework of chiral perturbation theory and we also employ leading order chiral perturbation theory to relate the flavor singlet decay constant to the relevant decay constant parameters in the quark flavor basis and flavor non-singlet decay constants. After taking the continuum and the SU$(2)$ chiral limits we compare both sides and find good agreement within uncertainties.


Introduction
The Witten-Veneziano formula [1,2] was first derived by Witten for massless quarks It aims at providing an explanation for the unexpectedly large mass of the η meson, by relating its mass in the chiral limitM η to non-trivial topological fluctuations of the gauge fields, which are encoded in the topological susceptibility computed in pure Yang-Mills (YM) theory χ ∞ . Furthermore, the formula involves the singlet decay constant f 0 and the number of quark flavors N f . In order to obtain Eq. (1.1) one considers the limit of a large number of colors N c , for which simplifications of the theory occur that allow one to address a variety of problems which are otherwise impossible to investigate. In particular, in the 't Hooft limit (N c → ∞, while g 2 N c and N f are kept fixed, with g denoting the gauge coupling) the anomalously broken axial U (1) symmetry is restored and the η becomes a Goldstone boson, i.e.M η → 0 [1,3]. The formula itself is valid up to corrections of O(1/N 2 c ) whileM 2 η and 1/f 2 0 vanish as O(1/N c ). However, the topological susceptibility which appears on the right-hand side of Eqs. (1.1) remains finite in the large N c limit, i.e. χ ∞ = O(1), such that in the N c → ∞ limit both sides of Eq. (1.1) vanish. The question remains whether N c = 3 in QCD is large enough in practice to sufficiently suppress corrections to this formula to higher order in 1/N c .
An alternative way to derive the Witten-Veneziano formula is to expand the anomalous flavor-singlet Ward-Takahashi identities of the theory order by order in u = N f /N c around u = 0 [2,4]. Again, the Witten-Veneziano formula corresponds to the lowest order relation in this expansion. Besides, we remark that for lattice QCD it is also possible to obtain an unambiguous, theoretical sound implementation of the Witten-Veneziano formula through the study of anomalous flavor-singlet Ward-Takahashi identities in the limit u → 0 [5,6].
However, the very first attempts to compute that topological susceptibility in YM theory date back to the early 80s [7,8] and a first reasonable number was found in [9].
Moving away from the chiral limit leads to corrections to the formula which are linear in the quadratic meson masses This result was first derived in [2] from the aforementioned expansion in u. In the above expression terms have been reshuffled compared to Eq. (1.1) to isolate the topological susceptibility in the YM theory on the r.h.s., whereas the l.h.s. of the formula contains only quantities that need to be computed in full QCD. From a modern point of view, the Witten-Veneziano formula can also be derived from effective field theory employing a combined power counting scheme in quark masses m q , momenta p and 1/N c , given by m q = O(δ), p 2 O(δ) and 1/N c = O(δ), where δ is a small expansion parameter [10][11][12][13]. In principle, this approach allows one to systematically calculate higher order corrections to the standard form of Eq. (1.2) which represents the leading order expression with respect to the expansion in δ used in chiral perturbation theory (χPT). Note that in general f 0 = f π holds, as the singlet decay constant f 0 in Eqs. (1.1), (1.2) becomes equal to the octet decay constant f 8 only when simultaneously taking the chiral limit and dropping all corrections in 1/N c ) [11].
In this paper, we compute both the topological susceptibility in the pure YM theory (or quenched QCD) and the meson masses and the singlet decay constant f 0 in full QCD. After discussing the lattice actions, we will first discuss the determination of χ ∞ using the socalled spectral projector method [14] based on dedicated quenched simulations 1 . Thereafter, we discuss the determination of M η , M η , M K and in particular f 0 using N f = 2 + 1 + 1 lattice QCD. Finally, we compare both results for χ ∞ , finding good agreement between quenched and dynamic computation.
2 Lattice Actions 2.1 N f = 2 + 1 + 1 lattice QCD The calculation of the masses presented in this work was performed using configurations with N f = 2 + 1 + 1 dynamical Wilson twisted mass fermions at maximal twist generated by the European Twisted Mass Collaboration (ETMC) [15][16][17].
The lattice Wilson twisted mass fermion action 2 for the light sector [19][20][21], i.e. u and d quarks, is given, in the twisted mass basis, by: where m 0 and µ l are the bare untwisted and twisted quark masses, respectively. χ l = (χ u χ d ) T is a flavor doublet and τ 3 acts in flavor space. The massless Wilson Dirac operator D W is defined as where ∇ µ and ∇ µ are the forward and backward covariant derivatives. The lattice Wilson twisted mass action for the heavy doublet χ h = (χ c χ s ) T [21,22], i.e. strange and charm quarks, is given by: The bare twisted mass parameters µ δ and µ σ are related to the bare strange and charm quark masses through the following relation: where Z P /Z S defines the ratio of pseudoscalar and scalar flavor non-singlet renormalization factors. Both doublets, χ l and χ h , are related to their counterparts in the physical basis via chiral rotations. For the gauge sector the Iwasaki [23,24] gauge action was used which is defined as: where b 1 = −0.331 and b 0 = 1 − 8b 1 . The dynamical ensembles used in this work are compiled in Tab. 1 with the labeling adopted from Ref. [15]. For each β-value, there are several quark masses available, which allows one to study the chiral extrapolation. The input parameters of our simulations are supplemented by Tab. 2, which contains the chirally extrapolated values of the Sommer parameter r 0 /a for each value of β, which we use to set the scale in our dynamical simulations. In addition, we quote in this table the chirally extrapolated values of the renormalization constant ratio Z ≡ Z P /Z S from the two definitions discussed in Ref. [25]. These two definitions differ by lattice artefacts and are, therefore, helpful in understanding the corresponding systematic effects.  Table 1. The dynamical N f = 2 + 1 + 1 simulations that are included in our investigations. The notation that is used to label the ensembles is the same as in [15]. In addition to the value of β, the lattice volumes and the bare quark masses µ l , µ σ , µ δ we give the number of configurations N , the number of stochastic samples N s and the bootstrap block length N b that were used in the determination of flavor singlet quantities [26][27][28].  Table 2. Chirally extrapolated values of r 0 /a and Z P /Z S from the two methods M1 and M2 as discussed in [25] for our dynamical simulations.

Quenched Action
For the determination of χ ∞ in the pure YM theory, we have generated four quenched ensembles at four different values of the lattice spacing. For the gauge action, we used again the Iwasaki action -Eq. (2.5).
We emphasize that, in order to test the Witten-Veneziano formula, we tried to achieve a very similar setup for the quenched and dynamical situation -we took the same gauge action, we matched the physical volume and took a fixed value of r 0 µ in the valence Dirac operator used for spectral projectors, equal to r 0 µ of the dynamical simulations 3  details of determining κ c needed for O(a) improvement can be found in A. All the details of the quenched simulations are summarized in Tab. 3. The quenched simulations have been performed using the HMC algorithm implemented in the tmLQCD package [29]. The usage of this algorithm might introduce somewhat larger autocorrelation times compared to e.g. the heatbath with overrelaxation algorithm. However, compared to the dynamical simulations, the generation of quenched ensembles was still only a small effort.
3 Topological susceptibility in the pure Yang-Mills theory

The spectral projectors method
In order to compute the topological susceptibility (χ top ) in the pure YM theory (for this case, we will denote it specifically by χ ∞ ), we used the method of spectral projectors [14]. This method was originally applied to the pure gauge theory in Ref. [30] and then to the dynamical case for twisted mass fermions in Ref. [31].
We introduce the definition of the topological susceptibility in terms of the spectral projector R M and refer to the original papers for further details about the method [14,30]: where the ratio of renormalization constants Z P /Z S differs from unity due to the use of non-chirally symmetric Wilson-type fermions. The spectral projector R M is an orthogonal projector to the subspace of fermion fields spanned by the eigenvectors of the massive Hermitian Dirac operator D † D (with D = D W + m 0 + iµ l γ 5 τ 3 in our case) with eigenvalues not larger than M 2 . To achieve O(a 2 ) scaling towards the continuum limit, the renormalized value of the threshold M , denoted by M R , has to be fixed for all ensembles. The value of M R can be chosen arbitrarily, but it is advisable to avoid too small values (close to the renormalized quark mass) and choose aM R 1 to avoid enhanced cut-off effects [14]. In practice, we compute the following spectral observables: 4) where N is the number of stochastic sources η k (k = 1, . . . , N ) used for the construction of the spectral projector, solving the Dirac equation (D † D + M 2 )ψ = η k an appropriate number of times. The above observables are directly related to the right hand side of Eq. (3.1) through the following expression: Since the use of two independent sets of sources to compute the square of C would duplicate the cost of the calculation, we need to introduce the correction given by B /N to correct for the bias introduced by the usage of the same set of stochastic sources to compute the square of C.
In order to compute the ratio of renormalization constants Z P /Z S , we also use the spectral projector method, since the low cost of the calculation allows us to obtain reliable estimates of Z P /Z S not available a priori, since the quenched ensembles were generated specifically for this project. The ratio can be obtained through the following expression: This application was first proposed in Ref. [14] and applied for dynamical simulations of twisted mass fermions in Ref. [32]. In the latter reference, a comparison of the result given by spectral projectors and more standard methods like RI-MOM [33] and x-space [34] was done and compatible results were found. A study of discretization effects and finite volume effects was also presented.
3.2 Computation of Z P /Z S As we mentioned in previous section, in order to compute the topological susceptibility using twisted mass fermions, the final value has to be renormalized using the ratio Z P /Z S , which is only equal to unity in a chirally symmetric theory. Table 4. Results for Z P /Z S values using spectral projectors for the quenched ensembles. The errors quoted correspond to, first, statistical and, second, systematic uncertainty (coming from the dependence on M ).
In Fig. 1, the value of Z P /Z S for different values of r 0 M for the four quenched ensembles listed in Tab. 3 is shown. Following the strategy presented in Ref. [31], we take the central value at r 0 M = 1.5 and consider the spread of the results in the range r 0 M ∈ [1, 2] as the systematic uncertainty. Thus, we obtain the results presented in Tab. 4. The first error corresponds to the statistical error given by the spectral projectors method, where autocorrelations were taken into account. The second error corresponds to the systematic error introduced in our calculation through the residual dependence of Z P /Z S on M . Note that the systematic error is strongly decreasing as the value of the lattice spacing is lowered.

Continuum limit of χ ∞
The main aim of this work is to test the Witten-Veneziano formula. To this end, we need to compute a reliable continuum limit of the topological susceptibility in the quenched case. The physical conditions, such as the volume or the action used, were matched to the situation of the dynamical simulations used to compute the masses.
Following this strategy, we generated the first ensemble using the Iwasaki gauge action at β = 2.67 with a value of r 0 /a matching the one of the dynamical ensemble at β = 1.95. The other characteristics, such as the volume was matched to the ensemble B55.32 (see table 1) with a 32 3 × 64 lattice and the value of the valence quark mass for the spectral projector method was set to aµ = 0.0055.
To obtain the continuum limit we needed to extend the simulations to different lattice spacings. Thus, we kept the physical volume constant by imposing a constant the physical extent of the lattice. In particular, if we assume r 0 = 0.5 fm, the physical volume was kept at L ≈ 2.8 fm. Similarly, for the quark mass, we demanded the product r 0 µ to remain invariant. As already pointed out in the previous section, we do not have any estimate of Z P and hence we can only match the bare product r 0 µ, assuming that the changes in Z P are small. All the exact values for each ensemble can be found in Tab. 3.
As we mentioned in the introduction, the spectral projector method requires a mass input parameter M . In principle, the continuum limit result should be independent of this value, as long as it is kept fixed for all the ensembles entering the calculation. As the value   Table 6. Results of χ top for the continuum limit in the pure gauge theory. N s represents the number of stochastic sources whereas N meas is the number of evaluated independent configurations and N traj gives the corresponding number of thermalized MC trajectories. The errors quoted for χ top are statistical, coming from Z P /Z S and from r 0 /a, respectively.
of the renormalization constant Z P needed to renormalize the input parameter M is not available, an alternative strategy was followed to match the values of M R . We used the fact that the mode number remains constant at a fixed value of M R and in a constant physical volume [14], i.e. a 1 a 2 = ν 1 n 2 ν 2 n 1 , where a i is the lattice spacing and n i the number of lattice points of the ensembles. Hence, the condition In Tab. 5, the values of the mode number given by the spectral observable A are shown.
Once we know the value of the ratio Z P /Z S for each β value, we are prepared to compute the topological susceptibility χ top .
The continuum limit is a crucial aspect of lattice QCD. In fact, the rate at which a quantity approaches the continuum limit plays a fundamental role in lattice calculations, since it is directly related to the accuracy of the final computation of physical observables.
The twisted mass action at maximal twist guarantees the O(a) improvement for on shell observables [20]. The problem can arise if our observable is affected by short distance singularities, since it can spoil the, otherwise guaranteed, O(a 2 ) scaling.
Our observable, as discussed in detail in Refs. [35,36], is affected by short distance singularities. However, all terms linear in the lattice spacing that are consequently arising in the Symanzik expansion vanish at maximal twist [35]. Thus, our observable remains O(a) improved, guaranteeing an O(a 2 ) scaling towards the continuum limit. Consequently, we have performed a linear extrapolation in (a/r 0 ) 2 to extract the continuum limit which leads to the following continuum result: where the error is dominated by statistical uncertainty, but takes into account also the systematic ones (combined in quadrature with the statistical errors). In Fig. 2, the results of the topological susceptibility for different values of the lattice spacing together with the continuum limit extrapolation are plotted. All the intermediate and final results are compiled in Tab. 5 and Tab. 6, respectively, as well as other relevant details. Our continuum value for the topological susceptibility can be compared to the result of an earlier study [37], where a value of r 4 0 χ ∞ = 0.059(3) stat was given, using the massless Neuberger-Dirac operator on quenched ensembles. Taking into account that in [37] only statistical errors are quoted the two values are in reasonable agreement. Besides, the lattices in the present study exhibit a significantly larger volume (the physical value of L is about a factor two larger), while the range of lattice spacings is similar.

Autocorrelations
We close this section by a discussion of autocorrelation effects in the quenched simulations, which might affect significantly the errors. The topological charge is a quantity highly affected by autocorrelations when the continuum limit is approached due to appearance of topological barriers, i.e. the transitions between topological sectors are suppressed. For this reason, particularly long Monte Carlo simulations are needed in order to guarantee that all topological sectors are sampled adequately.
The topological charge is expected to follow a Gaussian distribution centered at zero in the large volume regime. Consequently, due to the fact that the stochastic observable C is closely related to the topological charge Q, we expect the same behavior for the distribution of C and, equally, C = 0.
In Fig. 3, the histograms of the stochastic observable C are shown for all the quenched ensembles at four different values of the lattice spacing. In all cases we were able to construct a histogram compatible with a Gaussian distribution within errors. Fig. 4 shows the Monte Carlo history of the observable C again for all the ensembles. It is clear, even from visual inspection of the plots, that autocorrelations become significant for the finest lattice spacing. Therefore, even though we generated a few times more trajectories for this ensemble than for the other ones, we only obtained an estimate of the topological susceptibility with a much higher statistical error, since we had to take a step of around  3200 trajectories between measurements to obtain τ int compatible with 0.5, i.e. we had fewer than 120 independent gauge field configurations (see also Tab. 6).

Fermionic contributions to the Witten-Veneziano formula from dynamical simulations
Having established a reliable way to extract χ ∞ from the pure Yang-Mills theory, we still need to tackle the l.h.s. of Eq. (1.2). In this section, we discuss how to compute the relevant masses and the flavor singlet decay constant parameter f 0 . The computations in the flavor singlet sector are very demanding, because of large contributions from quark disconnected diagrams. Moreover, the computation of f 0 on the lattice turns out to be technically involved, due to the fact that the axial vector matrix elements cannot be computed with sufficient signal-to-noise ratio. Therefore, we need to consider pseudoscalar matrix elements and resort to χPT to relate them to the desired axial vector ones. Further complications arise from the fact that the interpolating operators on the lattice are defined in the socalled quark flavor basis, whereas decay constants and the corresponding matrix elements are commonly defined in the singlet-octet flavor basis. This issue can be circumvented in the framework of χPT as well.

Computations in the flavor singlet sector
The masses for η and η mesons are computed as described in [26,28]. In particular, we employed the method of subtracting excited states in the quark connected part of the correlation functions [38]. Together with the application of the one-end trick for the evaluation of quark disconnected diagrams in the light quark sector [39], this yields a substantial improvement of the resulting statistical errors [28,40]. Here, we will only outline some basic details that are relevant for the discussion of the flavor singlet decay constant parameter f 0 in the next section. We use pseudoscalar operators in the physical basis: whereψ l , ψ l andψ h , ψ h refer to degenerate light and non-degenerate heavy quark doublets, respectively. The doublet structure and the flavor projector (1 ± τ 3 )/2 is required due to the twisted mass formulation and the flavor projector in the second lines allows to consider the non-degenerate charm and strange components separately. At maximal twist the corresponding operators read with S the scalar density. In the following, we will neglect the charm operator, as it does neither contribute to the η nor the η within errors, i.e. we consider only the strange component P −,tm h ≡ P tm s . The mixing between the flavor non-singlet scalar and pseudoscalar currents in the heavy sector of the twisted basis introduces a relative factor of Z S /Z P under renormalization: Similarly, we apply a factor Z S /Z P instead of Z S to the light operator S 3,tm l allowing to pull out a global factor of Z 2 P from the resulting, renormalized correlation function matrix: is build from operatorsS 3,tm l ,P tm s that are renormalized up to Z P and apart from that correspond to the ones in Eqs. (4.3,4.5).
When building this matrix, we subtract the excited states in the connected contributions and apply the required factors of Z S /Z P , as given in Tab. 2. We ignore the global factor Z 2 P as it will cancel analytically in all observables. Then, we proceed by solving the generalized eigenvalue problem [41][42][43]: which gives access to masses through and amplitudes respectively. Since masses are renormalization group invariants, they are not affected by the actual choice of Z S /Z P , which is only relevant for the computation of amplitudes leading to different lattice artifacts for the methods M1 and M2. Note that the amplitudes computed in this way are only renormalized up to a factor of Z P as well.
In Tab. 1, we give the number of configurations N used on each gauge ensemble to extract observables in the flavor singlet sector. Errors are computed using bootstrapping with 1000 samples and we have used blocking to deal with autocorrelations. The numbers of configurations per block N b are given in Tab. 1 and have been chosen to correspond to a block length of at least 20 HMC trajectories. In addition, we give the number of stochastic samples that has been employed in the computation of quark disconnected diagrams. It has been chosen such that the resulting statistical errors are dominated by gauge noise.
The results for all ensembles are given in Tab. 7. Clearly, the statistical error for a test of the Witten-Veneziano formula will be dominated by flavor singlet related quantities as can be inferred from the relative errors on the masses. This is caused by large contributions  from quark disconnected diagrams in the flavor singlet sector which are intrinsically more noisy compared to connected contributions and also introduce sizable autocorrelations in case of the η mass. For the computation of the kaon masses, we refer to Ref. [15,26]. The values are also listed in Tab. 7, together with the corresponding charged pion masses M PS . Note that some of these values were calculated at smaller statistics than the flavor singlet sector (c.f. Tab. 1). Nevertheless, their relative statistical error is still at least one order of magnitude smaller than for quantities in the flavor singlet sector.

Treatment of f 0
Consider the general definition for the decay constant f P 4 of any pseudoscalar meson P : where A a µ (0) denotes the axial vector current with flavor structure denoted by the index a. In the charged sector for mass-degenerate, light quarks, the axial vector current in the physical basis transforms into the vector current in the twisted basis at maximal twist. This feature can be exploited together with the PCVC relation to derive an expression for the charged pion decay constant f PS [44][45][46]: where a = 1, 2. Since this formula depends on a pseudoscalar matrix element, it can be calculated with very high statistical accuracy. Moreover, this relation allows one to compute f PS in the twisted mass formulation without the need for any renormalization factor at all, because a factor Z −1 P for the bare quark mass µ l cancels the factor for the pseudoscalar matrix element in the twisted basis.
Similar considerations apply for the kaon sector, although a complication arises from the fact that one has to employ interpolating operators made of light and heavy quarks. The relation for the kaon decay constant f K in the twisted mass formulation is given by: In this case the factor Z S /Z P is required again, because it enters µ s (c.f. Eq. (2.4)) as well as relative factors due to mixing between scalar and pseudoscalar currents.
Assuming exact isospin symmetry and neglecting possible contributions from the charm quark and mixing with further states such as glueballs, the most general parametrization for decay constants of the η, η system reads 14) The choice of A 0 µ , A 8 µ together with P = η, η in Eq. (4.11) defines the so-called singlet-octet basis.
Moreover, employing χPT, it is possible to relate the decay constant parameters in the η-η system to the remaining octet decay constants f π , f K and a low energy constant Λ 1 = O(1/N c ) occurring at next-to-leading order in the chiral expansion [11,12,47] by: In the chosen basis and to the given chiral order, the additional, OZI-violating corrections are specific to the singlet sector, i.e. the term ∼ Λ 1 = O N −1 c affects neither f 8 , nor any of the angles φ 0 , φ 8 , but only the parameter f 0 . From the last relation, it can be inferred that in the octet singlet basis, the difference between the two angles φ 0 , φ is given by SU(3) F -violating effects, leading to the expectation The fact that there are different types of contributions to the mixing, i.e. SU (3) Fbreaking and OZI-violating effects ∼ Λ 1 , can be exploited in order to choose a basis in which the two resulting mixing angles do not exhibit a sizable splitting. To this end, one introduces the quark flavor basis with the axial vector currents A 0 µ and A 8 µ replaced by the combinations 20) in which the light quarks and the strange quark contributions are disentangled. This is the reason why this basis is more convenient in lattice simulations, as this is also the preferred flavor structure for interpolating operators on the lattice, allowing one to directly access the corresponding matrix elements. In exact analogy to the singlet-octet basis, this basis again allows for a parametrization in terms of two decay constants and two mixing angles where the mixing matrix Ξ has the same form as the one defined in Eq. (4.14). In this basis, the relations between mixing parameters in the η-η system and f π , f K , Λ 1 read: The most important feature of the quark flavor basis becomes manifest in the last expression, which is now entirely given by an OZI-violating contribution ∼ Λ 1 = O (δ), amounting to additional suppression for the difference |φ l − φ s | compared to |φ 0 − φ 8 |, which is given by SU(3) F -breaking effects. Besides, in the SU (3) F symmetric case, the angles φ l ≈ φ s take the value φ SU(3) F = arctan √ 2, and, hence, their numerical value is not expected to be small. Therefore, one expects: in the quark flavor basis, which has been numerically confirmed in a previous lattice study [28]. This feature allows one to consider a simplified mixing scheme in the quark flavor basis with only one angle φ where Ξ (φ) ≡ Ξ (φ, φ).
As mentioned at the beginning of this section, we are restricted in our simulations to pseudoscalar operators for practical purposes, because the signal-to-noise ratio for axial vector operators turns out too small for a direct computation of the relevant observables. However, one may instead consider pseudoscalar matrix elements in order to retrieve information on the mixing parameters. This is possible due to the relation between axial vector and pseudoscalar matrix elements, which is given non-perturbatively by where for a = 0 we have T 0 = 1/(2N f ) 1 N f ×N f and ω (x) denotes the winding number density. Nonetheless, the anomaly equation of QCD itself is not sufficient for any practical purposes here, mainly because it requires knowledge of an additional matrix element involving the topological charge density. Therefore, one needs to gain further insight on how the pseudoscalar matrix elements are linked to the mixing parameters. This is again achieved by the use of χPT. Consider pseudoscalar currents in the quark flavor basis in analogy to Eqs. (4.19), (4.20):

28)
P s =siγ 5 s , (4.29) and the corresponding matrix elements for pseudoscalar mesons P that are given by To leading order, one can make contact with the quark flavor basis parametrization for axial vector matrix elements in Eq. (4.21), i.e. obtain an expression for h i P in terms of decay constants f l , f s , the mixing angle φ and octet meson masses [12]: Again, formally higher order, OZI-violating contributions are neglected in this expression, as demanded by the so-called FKS-scheme, which allows for the determination of processindependent mixing parameters [12,47,48]. Finally, we can combine Eqs. (4.15,4.16) and Eqs. (4.22,4.23) to write down leading order relations for the desired singlet decay constant f 0 in terms of the parameters f PS , f K , f l and f s , which we compute on the lattice: In general, these relations receive corrections of O(δ 2 ). However, since they were derived from continuum χPT, they also differ by lattice artifacts of O(a 2 ), if applied to our lattice data. We will in the following exploit this ambiguity to choose a definition for f 0 that exhibits particularly small systematic effects. Although the use of pseudoscalar matrix elements and changing the flavor basis requires to resort to χPT, we point out that this is not a serious drawback, as the above expressions are in general of the same order in the chiral expansion as the Witten-Veneziano formula in Eq. (1.2). In the following, we will refer to the three definitions Eqs. (4.32-4.34) of f 0 as D1, D2 and D3, respectively.
Regarding the computation of decay constants in the charged meson sector, we point out that while the values of f PS have been recalculated with the current statistics (they were first published in [15]), we used less configurations for f K . However, by far the largest contribution to the overall statistical error in our analysis stems again from the flavor singlet sector. Finally, we remark that a dedicated study of η,η -related decay constant parameters is currently in preparation [49].

Chiral extrapolations
Since our lattice simulations employ unphysical quark masses, we need to perform chiral extrapolations of our lattice data when computing the l.h.s. of Eq. (1.2). In principle, to the given order in χPT, this simply corresponds to a constant fit in (r 0 M PS ) 2 . However, we have to take into account lattice artifacts which might be different for the three definitions Eqs. (4.32-4.34).
Moreover, the dependence on the choice of Z P /Z S is non-trivial, as it affects the relevant decay constants (besides f PS ) directly through the renormalization of the corresponding matrix elements, as well as through the relative renormalization factor that enters the quark mass µ s in Eq. (2.4), which appears in the definition of f K in Eq. (4.13).
In Fig. 5 we show results for r 4 0 χ ∞ calculated from using our results for the relevant meson masses as given in Tab. 7. For the plots a.), c.) and e.) in the left column we used Z P /Z S values from method M1 and for the right ones b.), d.) and f.) those from M2; c.f. Tab. 2. The three rows in Fig. 5 correspond to the three definitions of f 0 D1, D2 and D3, respectively.
Clearly, the three definitions of f 0 show different lattice artifacts and systematic effects regarding their Z P /Z S (and hence m s ) dependence. Besides these systematic effects, their relative statistical errors differ as well. Within errors, the first definition D1 shows the largest lattice artifacts, as well as the most significant dependence on the choice of Z P /Z S . Applying a constant fit in (r 0 M PS ) 2 does not provide a good description of the data in this case, as can be seen from panel a.). The fitted value is much lower than for any other definition and most points are incompatible with the fitted line. However, the data points show a trend towards larger values at smaller lattice spacings, i.e. by fitting only to the data at the finest lattice spacing value we obtain r 4 0 χ D ∞ = 0.043(12) stat , significantly higher than the result of the fit to all data points.
The data points extracted from definitions D2 and D3 lead to a reasonable agreement with a constant extrapolation in (r 0 M PS ) 2 , with clearly the best fits stemming from definition D3. In particular, with definition D3, the extrapolated value is merely independent on the choice of Z. We have also tried to add a term of O(a 2 ) to our fit. It turns out that apart from the most extreme case shown in panel a.) of Fig. 5, the data does not allow to resolve such a dependence within errors, i.e. the results for the corresponding coefficient are compatible with zero. Regarding the dependence on the strange quark mass, we point out that to the given chiral order, the computed χ ∞ is expected to be a constant function of (r 0 M K ) 2 as well. Although the data shows a residual dependence on the strange quark mass, no clear picture arises with respect to its functional form.
We have collected numerical results from our constant fits for χ ∞ in Tab. 8 for the three definitions of f 0 and the two sets of values for Z P /Z S . In addition, we give the respective χ 2 /dof values. Clearly, the fit M1D1 is by far worst, which is expected due to the large cutoff effects in this case.
In order to obtain our final result for χ ∞ from the dynamic simulations, we apply a weight to each fit: where p denotes the p-values corresponding to the values of χ 2 /dof given in Tab. 8 and take the average over all six fits, leading to: The systematic error has been chosen as the mean absolute deviation from the central value and should reflect the uncertainties from residual cutoff and strange quark mass effects. Since we included the fit M1D1, which suffers from particularly large lattice artifacts, the value of the systematic error should be considered a conservative estimate.   The solid black line represent the final, p-value weighted average from dynamical simulations and its statistical error is indicated by the gray band. In addition, the dotted lines correspond to its systematic error; see text.

Discussion
In Fig. 6, we compare the pure YM topological susceptibility to the left hand side of Eq. (1.2) computed in N f = 2 + 1 + 1 lattice QCD. All results are given in units of the Sommer parameter r 0 . The weighted mean of the dynamical results and its statistical error is indicated by the gray band. The systematic uncertainty of the weighted average is shown by the dotted horizontal lines. Apart from the outlier M1D1, which is affected by sizable cutoff effects, the results from the dynamical simulations are very close to the one from the pure YM theory. Also the agreement of the quenched and the averaged dynamical result appears to be good within statistical and systematic uncertainties.
A complication arises when one attempts to convert the results to physical units. It stems from the value of r 0 in the YM theory: while in N f = 2 + 1 + 1 QCD direct contact to physical quantities is natural, in the YM theory such a relation is not obvious. In particular, the question arises whether or not the values of r 0 in the YM theory and in N f = 2 + 1 + 1 QCD are expected to be equal. From a general point of view, we do not see a reason for this to be true. Therefore, one may use the standard value of r 0 = 0.5 fm to convert the YM result to physical units and obtain: χ YM ∞ = (185.3(5.6) stat+sys MeV) 4 .
For the dynamical simulations, on the other hand, we can use the value r 0 = 0.474 (14) fm computed in Ref. [25] to convert to physical units. Taking the aforementioned weighted average, we obtain: χ dyn ∞ = (193.5(6.2) stat (13.4) sys MeV) 4 , where we have included the error on the physical value of r 0 in the statistical uncertainty. We find rather close agreement between quenched (χ YM ∞ ) and dynamical (χ dyn ∞ ) results with deviations of only up to O(10%). This confirms the validity of the Witten-Veneziano formula to the given order and for the assumed value of the quenched r 0 also in physical units.

Summary
In this paper, we have presented a non-perturbative test of the famous Witten-Veneziano formula. This formula relates the large mass value obtained for the η meson to the anomalously broken axial U (1) symmetry in QCD. It, therefore, provides important insights for our understanding of QCD and the generation of masses.
We have computed the topological susceptibility in the pure YM theory with dedicated quenched lattice simulations using the so-called spectral projector method. In particular, by using four values of the lattice spacing, we were able to perform a reliable continuum extrapolation and, hence, control this major systematic uncertainty.
For the first time, we have computed the flavour singlet decay constant f 0 using N f = 2 + 1 + 1 lattice QCD. Together with the η, η and kaon meson masses determined in Ref [28], this allowed us to compute also the l.h.s. of the Witten-Veneziano formula. Again, lattice artifacts are controlled by using three values of the lattice spacing. By using a wide range of light quark mass values, also the SU(2) chiral extrapolation was performed in a controlled way. The strange quark mass dependence was found to be not important for the Witten-Veneziano formula.
The comparison of the pure YM topological susceptibility and its Witten-Veneziano counterpart from full QCD leads to agreement within errors. This finding provides clear evidence for the hypothesis that the large mass of the η meson is due the anomalously broken axial U (1) symmetry in QCD.  In order to guarantee the O(a) improvement, we need to tune κ to its critical value for the valence quarks. To do so, we followed the strategy introduced in [50]. Thus, we computed the of κ through the evaluation of m pcac for different values of the quark mass aµ. In particular, we imposed m pcac < 0.1aµ. In Fig. 7, we show a particular example of the chiral behavior of κ c . In all cases, we perform a chiral fit considering only the lowest masses aµ < 0.01 since the larger masses deviate from the linear behavior. Notice that the data is highly correlated and the fit needed to take this correlation into account.