Hadronic light-by-light contribution to $(g-2)_\mu$ from lattice QCD: a complete calculation

We compute the hadronic light-by-light scattering contribution to the muon $g-2$ from the up, down, and strange-quark sector directly using lattice QCD. Our calculation features evaluations of all possible Wick-contractions of the relevant hadronic four-point function and incorporates several different pion masses, volumes, and lattice-spacings. We obtain a value of $a_\mu^{\text{Hlbl}} = 106.8(14.7) \times 10^{-11}$ (adding statistical and systematic errors in quadrature), which is consistent with current phenomenological estimates and a previous lattice determination. It now appears conclusive that the hadronic light-by-light contribution cannot explain the current tension between theory and experiment for the muon $g-2$.


I. INTRODUCTION
The anomalous magnetic moment of the muon, a µ ≡ (g − 2) µ /2, is one of the most precisely measured quantities of the Standard Model (SM) of particle physics. Its value is of considerable interest to the physics community as, currently, there exists a 3.7σ tension between the experimental determination of Ref.
[1] and the current theoretical evaluation (see Ref. [2] and references therein). Although the central value of the theoretical prediction is overwhelmingly dominated by QED effects, its uncertainty is dominated by low-energy QCD contributions. If the tension persists under more precise scrutiny, it is possible that a 5σ discrepancy could appear, heralding an indirect determination of Beyond the Standard Model (BSM) physics.
A new series of experimental results (E989 at Fermilab [3] and E34 at J-PARC [4]) intend to increase the precision of the experimental determination by a factor of about four; as it stands, the error on a µ is at the level of 63 × 10 −11 . Similarly, the theory community is striving to reduce the error of their determination to match the upcoming experimental precision. One of the contributions that is of specific interest is the hadronic vacuum polarisation (HVP), which enters at O(α 2 QED ). Being a QCD quantity dominated by hadronic scales, this contribution can be directly obtained from first-priciples lattice QCD calculations, although currently its most precise estimate [2,[5][6][7][8][9][10][11] comes from dispersive methods and is 6931(40) × 10 −11 . Significant progress has been made in recent years within the lattice approach [12][13][14][15][16][17][18][19][20][21][22], and these determinations are quickly becoming competitive with the dispersive approach.
A much smaller contribution to the overall (g − 2) µ comes from hadronic light-by-light arXiv:2104.02632v1 [hep-lat] 6 Apr 2021 scattering (Hlbl), entering at O(α 3 QED ). However, this quantity is currently only known at the 20% level: the recent evaluation of Ref. [2], omitting an estimate of the small charmquark contribution, amounts to [23][24][25][26][27][28][29][30][31][32][33][34][35] 89.0(19.0) × 10 −11 . Thus the absolute uncertainty of the Hlbl contribution is only about half that of the recent average [2] for the HVP. To match the expected experimental precision, it is thought that the Hlbl contribution a Hlbl µ needs to be known with a precision of around 10%. The task of directly computing this contribution using lattice QCD methods is quite daunting, as it requires the computation of connected and disconnected four-point functions. Few lattice groups have even performed measurements of the leading contributions, and none with the desired precision. The most-precise lattice determination to date [36] uses the finite-volume QED L prescription and quotes a value of (adding their statistical and systematic errors in quadrature) 78.7(35.4) × 10 −11 . In [37], we provided an estimate at the physical pion mass, starting from our SU(3) f -symmetric point result and correcting for the neutral-pion exchange [27,38], of 104.0(20.8) × 10 −11 .
We extend our previous determination of the Hlbl contribution to (g − 2) µ at the SU(3) fsymmetric point [37] by incorporating data from simulations at pion masses as low as 200 MeV. We also provide estimates for the sub-leading (3 + 1), (2 + 1 + 1), and (1 + 1 + 1 + 1) contributions, providing a full first-principles calculation using lattice QCD with a competitive overall error.
This work is organised as follows: first we introduce our approach and formalism for measuring a Hlbl µ using lattice QCD and infinite-volume perturbative QED in Sec. II. In Sec. III, we discuss the numerical techniques and effort for our determination. Section IV contains a comparison of the integrand to the predictions of hadronic models. We then present results for the leading fully-connected and (2 + 2) diagram contributions with light (Sec. V) and strange (Sec. VI) quark content. In Sec. VII we discuss the determination of the higher-order (3 + 1), (2 + 1 + 1), and (1 + 1 + 1 + 1) contributions. We finally discuss the systematics of our largest contribution in Sec. VIII, and combine all of our determinations and draw conclusions in Sec. IX.

II. FORMALISM
In order to have a better control over the long-distance QED effects, we use a positionspace approach, which consists in treating the QED part perturbatively, in infinite-volume and in the continuum, and the hadronic part non-perturbatively on the lattice [39][40][41]. Due to the O(4) symmetry in the Euclidean continuum, the hadronic light-by-light contribution to the anomalous magnetic moment of the muon, a Hlbl µ , admits the following integral representation where f (Topology) (|y|), henceforth called the integrand (for a fixed diagrammatic topology), is itself obtained as an integral over spacetime (in our notation x = d 4 x), Topology f (Topology) (|y|) = m µ e 6 3 2π 2 |y| 3 xL [ρ,σ];µνλ (x, y) i Π ρ;µνλσ (x, y).
and i Π is the first moment of the connected, Euclidean, hadronic four-point function, i Π ρ;µνλσ (x, y) = − z z ρ Π µνσλ (x, y, z), The field j µ (x) appearing above is the hadronic component of the electromagnetic current, As for the QCD four-point function Π µνσλ , it consists of five classes of Wick-contractions, illustrated in Fig. 1: the fully-connected, the (2 + 2), the (3 + 1), the (2 + 1 + 1) and the (1 + 1 + 1 + 1). It can be shown that the contribution to a Hlbl µ of each topology is itself a gauge-independent observable, therefore it is legitimate to focus on each independently.
According to large-N c arguments and some numerical evidence provided by the RBC/UKQCD collaboration [36] on the (3 + 1) topology, only the first two (the fully-connected and (2 + 2)) of the aforementioned classes are believed to be dominant, however no direct calculations of the subleading classes have been performed until now. In addition, the last three classes, which we refer to as higher-order topologies, are suppressed by powers of the light-minusstrange quark-mass difference around the SU(3) f -symmetric point, and necessarily vanish exactly at that point.
As the integrand (f (|y|)) is a scalar function in |y|, our computational strategy consists in calculating the integrand, the inner integrals over x and z being replaced by sums, averaged over many equivalent instances of the origin and the y-vector for a given |y| to enhance statistics, and then applying the trapezoidal rule to approximate the integral over |y| of Eq. (1), in order to finally obtain a Hlbl µ for each gauge ensemble. We then take the appropriate infinite-volume and continuum limits and extrapolate our result to physical quark masses.
In addition to showing the integrand, often we will find it useful to present the partiallyintegrated quantity, This quantity is typically less sensitive to point-by-point fluctuations in f (|y|) and adequately illustrates the salient features of the calculation. Our expectation is that the partially-integrated quantity admits plateau as |y| Max. is increased, indicating the integral has saturated within the uncertainties. Exploiting the Ward identities associated with current conservation, the QED kernel can be modified by adding to it terms which do not contribute to a Hlbl µ in the infinite volume limit [41,42]. To mitigate the signal-to-noise problem of vector-current lattice correlation functions at large seperations, one would like to choose a QED kernel which guarantees a rapid fall-off of the integrand f (|y|) at large |y|, without picking up large discretisation effects by making it too-peaked at short distances. Due to the gauge-invariance of each topology, one can even work with different choices of kernel for each topology individually.
In our previous work at the SU(3) f -symmetric point [37], we have shown the effectiveness of a certain one-parameter family of kernels,L [ρ,σ];µνλ , with positive, real Λ. Our preferred choice for this parameter is Λ = 0.4; this was motivated by several studies of the shape of the integrand: a continuum and infinite volume QED calculation of the lepton loop contribution, a study of the pion-pole contribution with a Vector Meson Dominance (VMD) parametrisation for the transition form factor in the continuum and finite volume, and our direct lattice calculations at the SU(3) f -symmetric point. While Eq. (2) represents our general master formula for the integrand f (|y|), for computational reasons it can be beneficial to exploit the translational invariance of the QCD correlation function to re-arrange the integrand in different ways, such that only the most favourable diagrams within each topology class have to be explicitly computed. In our previous work, we showed that for the fully-connected contribution such an approach reduced the computational cost significantly without introducing undesirable effects when used in conjunction with the kernelL (Λ) [ρ,σ];µνλ [37]. In the subsections below we present the specific integral representations that we use for each of the five topologies.
For notational simplicity, we will find the following QED kernel combination useful, Also, in the equations below we anticipate our use of the local vector current on the lattice, which requires a multiplicative renormalisation factorẐ V .

A. The fully-connected contribution
For the fully-connected calculation we use the following master equation for the integrand: with hadronic contribution Here S j (x, y) is the flavour j-quark propagator from source y to sink x, Q j is the charge , and · U denotes the ensemble average.

B. The (2 + 2) contribution
We start by defining the two-point function "meson-field" which must have its vacuum expectation value (VEV) subtracted: We use the following integral representation for a Note that the VEV subtraction is necessary to guarantee that the two quark loops are connected by gluons, in the perturbative picture. In this representation, the factorisation of the x-and z-integrations makes the lattice computation easier. Similar patterns can also be found in our choice of representation for the higher order topologies for the same reason. We call light-light contribution the set of diagrams consisting exclusively of light quarks. Likewise, the strange-strange contribution contains only strange quark loops. Finally, the light-strange case covers all diagrams containing one light and one strange quark loop. These sub-contributions can easily be constructed by combining different terms in Eq. (11). As the integral is constructed as a post-processing step, the light-quark and strange-quark loops can easily be combined.
C. The (3 + 1) contribution As we work with N f = 2 + 1 lattice ensembles, we assume the mass-degeneracy between the u-and d-quark from here on to simplify our expressions. We begin by defining the two hadronic building blocks (here l and s refer to light and strange quarks respectively), and The quantity R i will be referred to as a triangle with quark species i, and T will be called disconnected loop.
Our expression for the integrand for this contribution reads It is worth noting that unlike in the (2+2) case, no VEV-subtraction is needed for the (3+1) contribution, because the VEV of the three-point function and the one-point function vanish due to the charge conjugation symmetry of the QCD action. In later sections, we will call (3 + 1) light and (3 + 1) strange the sub-contribution with light and strange quark triangle respectively.
D. The (2 + 1 + 1) contribution We can derive a representation for f (2+1+1) from the expression for the (3 + 1) topology; the idea is to split the triangles appearing in the expression of the (3+1) integrand into a sum of products of two-and one-point functions, and then correct the diagram double-counting. In doing so, the terms involving the disconnected quark loop T in Eq. (14) can be reused for the (3 + 1) calculation, as we perform more self-averages for this noisy, more-disconnected quantity (see Sect. III). Moreover, we apply a change of variables to avoid the case where a disconnected loop is located at the origin to increase the number of available samples per |y|.
More explicitly, we define the two quantities and we write Here we finally give our parametrisation of the fully-disconnected (1 + 1 + 1 + 1) contribution. This again takes advantage of the quantities computed in the previous cases. Here, one needs to carefully subtract the non-vanishing VEVs appearing in different pieces in this contribution. We define the following quantity: With this definition in place, we can write down the expression we used for the integrand for this topology, after correcting the triple-counting of the diagrams, As a concluding remark for this section, in some of the provided expressions, terms with a z-integral without a z-dependent weight factor appear. These could be reduced and in some cases vanish in the infinite-volume limit due to the Ward-identity associated with current conservation. Such a modification would in general change the shape of the integrand, as well as its statistical variance. For definiteness, our lattice calculations are done precisely with the expressions given in this section.

III. NUMERICAL SETUP
This section presents the gauge ensembles used in our calculation of a Hlbl µ , as well as the different strategies we applied to compute the contributions of the different topology classes.

A. Ensemble details
In this work we use N f = 2 + 1 O(a)-improved Wilson fermion ensembles generated by the CLS initiative [43], for which the improvement coefficient c SW was determined nonperturbatively in [44]. We extend our previous work at the SU(3) f -symmetric point to ensembles with m l < m s down to pion masses of 200 MeV, while maintaining Tr[M ] = Constant, with M = diag(m u , m d , m s ) the quark mass matrix. In addition, we make use of two ensembles at a further, coarser lattice spacing. We combine the symmetric-point results of our previous determination [37] with measurements taken from nine other ensembles to create a large data set from which all sources of systematic error can be estimated. Table I summarises the gauge ensembles used, their pion and kaon masses, the lattice spacings, as well as the quark-mass dependent renormalisation factors,Ẑ V . The latter is either measured   [46], apart from the "A" ensembles, where the lattice spacing was estimated from ratios of the Wilson flow parameter t 0 at the flavour-symmetric point. Pion and kaon masses primarily come from [18] unless directly measured as part of this work (indicated in bold) or in a recent project [47] (underlined). Likewise, values ofẐ V can be obtained from [45] unless also measured as part of this project, using the same approach. Columns two through six indicate the flavour content computed for each class of diagrams: fully connected (4), leading disconnected (2 + 2), and subleading (3 + 1), (2 + 1 + 1), and (1 + 1 + 1 + 1), where "+" has been omitted for space reasons. Zeros indicate diagrams that vanish by SU(3) flavour symmetry. directly as part of this work or taken from [45], here we only use un-improved local vector currents in this work 1 . The coverage of the lattice spacing and pion mass variables by the gauge ensembles used in this work is illustrated in Fig. 2.

B. Computational strategies and numerical cost
Tab. II illustrates the number of gauge configurations used for our study and the number of point-source propagator inversions per configuration performed for both the fullyconnected and (2 + 2) disconnected. For the disconnected, over an order of magnitude higher statistics was used in comparison to the connected. Typically we favor having larger multiplicities of |y| per configuration, as opposed to a larger number of configurations for the disconnected piece, since it is more effective at reducing the noise. Ideally the number of self-averages per-|y| range in the thousands per configuration, overall the total statistics (configurations×sources×self-averages) lies in the low millions per point. We follow the same setup as in [37], building a grid of point sources in such a way as to maximise the number of self-averages available per |y|.
Although the total number of propagator solves per ensemble are comparable, the computational cost per solve as the pion mass is reduced grows significantly, as the lattice volume increases such that m π L ≥ 4, and generically the cost of a solve grows like V n m m π with m and n both being greater than unity. Although we used a particularly sophisticated propagator-solving routine [48], this prohibitive growth in cost is presently unavoidable.
To partially ameliorate the overhead from propagator solves, a truncated solver/AMA technique [49,50] was used for all of the (2 + 2) contributions on ensembles away from the SU(3) f -symmetric point, with a sloppy stopping criteria of 10 −3 on the norm of the residual. As the propagator solve cost was dominant in the (2 + 2) calculation, a sloppy solve on one of our most expensive ensembles (D200) was approximately 6× faster than a high-precision solve to 10 −10 .

C. Higher-order contributions
For the quark loops containing a single (local) vector current insertion, we make use of an extensive general-purpose data set generated as part of a different project [47]. Therefore we restrict our description of the computational aspects related to these loops to those directly relevant to the Hlbl calculation. Since we are dealing exclusively with the electromagnetic current, it is always the difference of a light and a strange quark loop that is needed. To compute this difference, the "one-end trick", which has been applied extensively in twistedmass fermion calculations [51,52], is used as proposed in Ref. [53]. The one-end trick yields Fully-connected (2 + 2) Ensemble Fully-connected (2 + 2)  [37], although an update for the (2 + 2) on N202 has been performed here and the coarse ensemble A653 has been added.
an efficient estimator for the required difference of Wilson-quark loops based on the identity The right-hand side of this equation is evaluated using stochastic volume sources, inserted between the two propagators, without spin or color dilution. In this way, gauge noise is reached after a few hundred sources at most. The stochastic estimate of the quantity (19) is averaged over blocks of individual volume sources, leading to four "effective" sources that are stored separately as entire fields. Having access to four effective sources is sufficient to compute all higher-order disconnected diagrams for a Hlbl µ without introducing any bias into the final result. For further technical details of the general computational setup we refer to the description in Ref. [47].
The parametrisations of Eq. (14), Eq. (16), and Eq. (18) share certain x-or z-integrals, which allows us to precompute and recycle these terms for the different contributions. For the (3 + 1) contribution, the triangle term defined in Eq. (13) and the terms derived from it can be conveniently obtained from the intermediate quantities in the calculation of the fully-connected contribution. Based on this observation, we choose for the (3 + 1) topology the same set of points for our origin and y-vector as for the fully-connected. Once the factorised terms in Eq. (14) are computed, the Lorentz contraction with the terms which contain a disconnected loop can be performed off-line as a post-processing step.
For the (2 + 1 + 1) contribution, we first compute and save the lattice-wide two-point functions, Eq. (9), for each source position, and then do the VEV subtraction and construct Eq. (16) again off-line. The sources are chosen to be the same set of points as for the fully-connected case. Nonetheless, after setting the origins at these source points, our parametrisation Eq. (16) still allows us to have many choices for the y-vector for a given |y|, because we have at our disposal the two-point function Eq. (9) and the disconnected loop Eq. (12) as entire lattice fields.
A good choice for the y-vectors is hence to pick from the elements on the same orbit under the cubic group. As an example, to obtain (|y|/a) 2 = 12, one can choose the 4-vector , if all these points fit in a range where boundary effects can be neglected. A summary of the choices of the y-vector for the ensembles used for the (2+1+1) computation is given in Table III. Likewise, the (1+1+1+1) calculation also benefits from this strategy because of the reuse of the data generated for the (2 + 1 + 1) integrand.

IV. THE INTEGRAND OF THE TWO DOMINANT CONTRIBUTIONS
In this section, we describe the integrands of the light connected and light (2+2) disconnected contributions obtained in our lattice QCD calculations. Our goal is on the one hand to present some of the available data at small pion masses, and on the other to compare it to the predictions of hadronic models, such as the π 0 exchange contribution. Finally, an observation on the approximate analytic form of the integrand for the latter contribution motivates the analysis of the lattice data presented in the next section. The light connected contribution on the three most chiral ensembles. The solid curve represents the π 0 exchange in infinite volume, computed with the parameters directly determined on ensemble D200 [27]. Right: The light connected contribution on ensemble D200, compared to the predictions of the π 0 exchange (with a VMD transition form factor), the constituent quark loop, as well as the charged pion loop. The latter two contributions are computed within spinor and scalar QED, respectively.
We begin with the left panel of Figure 3, showing an overview of the integrand of the light connected contribution for our three most chiral ensembles (C101, D450, D200), for which the pion mass lies in the interval 200 to 220 MeV. These three ensembles have different lattice spacings and different volumes, nevertheless the corresponding data points fall within one recognisable band. The maxima of these integrands, which lie between 0.7 and 0.9 fm, are followed by a slow fall-off. Beyond |y| = 2 fm, the integrand vanishes within the uncertainties. The height of the maximum is about 20% higher than at the SU(3)-flavour symmetric point [37], m π = m K ≈ 420 MeV. Figure 4 focuses on the data of ensemble C101. The connected and (2+2) disconnected data are displayed separately in the two panels. The disconnected integrand is negative and admits a minimum at |y| ≈ 1.2 fm. The signal degrades sooner than in the connected case, and is lost around 1.5 fm. The ordinate of the minimum is about twice as large as the one found on ensemble H101 at the SU(3)-flavour symmetric point [37], despite the fact that the latter case includes the strange quark, so that this contribution is weighted with the electric-charge factor 36/81 rather than 25/81. Thus we anticipate a very strong chiral dependence of the (2+2) disconnected contribution to a Hlbl µ . Figure 4 also compares the integrand to pseudoscalar-exchange predictions based on the vector-meson dominance (VMD) parametrisation of the corresponding transition form factor. As weight factors with which (π 0 , η, η ) contribute to the connected diagrams, we have used (34/9, 0, 0); the weight factors we have applied for the disconnected diagrams are (−25/9, 1, 1). While these weight factors are certainly the expected ones for the π 0 , the issue of which weight factors are appropriate for the isoscalar mesons is more complicated and depends in particular on their mixing; see Tab. X and the corresponding analysis presented in appendix, as well as Refs. [54,55]. For the π 0 exchange, the contribution has also been computed in finite volume. As can be seen on the left panel, the finite-volume connected integrand is predicted to dive towards negative values at long distances. Whether the lattice data does the same is uncertain due to the growing statistical errors. The lattice data points lie below the π 0 -exchange prediction. A very similar observation was made at the SU(3)-flavour symmetric point [37]. We do not have a clear explanation for the difference, but note that for ensemble D200, we observe a somewhat better agreement (see the right panel of figure 3 discussed below). For the disconnected contribution, the finite-size effect on the integrand are predicted to become significant only around |y| = 2 fm, which is beyond the useful range of our lattice data. The η and η contributions have been estimated very roughly by using the parameters indicated in the figure. The η mass estimate comes from using the Gell-Mann-Okubo formula, knowing the pion and kaon masses, while the same η parameters are used as in [37]. The two isoscalar mesons contribute comparably to a Hlbl µ . In the region between 0.8 and 1.2 fm, the π 0 -exchange prediction is consistent with the lattice data.
While the disconnected contribution does not have a strong short-distance contribution, the connected contribution does. Following [37], we attempt to explain the integrand semiquantitatively by combining a constituent quark loop with the long-distance contributions, i.e. the π 0 exchange and the charged pion loop. The right panel of figure 3 illustrates the comparison of this rough hadronic model with the lattice data. The quark loop as well as the pion loop are calculated in the spinor and scalar QED frameworks respectively, i.e. without the inclusion of form factors. Including the quark loop leads to a satisfactory description of the shape of the integrand, even though the total prediction overshoots the data at distances |y| 0.6 fm. This difference can partly be explained by the cutoff effects present in the data, which tend to reduce the lattice integrand, and it is likely that including a form factor for the constituent quarks would improve the agreement. At distances |y| 0.9 fm, the model prediction is consistent with the lattice data.
In summary, both in the connected and the disconnected case, the prediction for the π 0 exchange alone gives a good first estimate of the magnitude of the integrand. It also predicts the approximate shape of the integrand in the disconnected case. Hence it is worth asking whether the integrand for the π 0 exchange can be approximated by a simple analytic function. Figure 4 shows that the infinite-volume π 0 -exchange integrand can be approximated very well at its extremum and beyond with the form f (|y|) = A|y| 3 exp(−B|y|), displayed as a dashed line. In the connected case, the approximation holds to an excellent degree even at short distances. These observations, which apply to our specific choice of kernelL (Λ) , form part of our motivation to use this functional form in the next section to extend the integrand obtained in lattice QCD to long distances.

V. LIGHT-QUARK FULLY-CONNECTED AND (2 + 2) CONTRIBUTIONS
In this section, we describe the extraction of the dominant contributions, namely the light-quark fully-connected and (2+2) contributions. In the previous section, the integrands are illustrated and compared semi-quantitatively to the main known hadronic contributions. The rapid increase of the relative error on the integrand with growing |y| leads us to employ a method to extend the useful range of the data. In our previous calculation [37], the long-distance tail was assumed to come from π 0 , η, and η -exchange contributions, with the dominant π 0 + η part determined from a lattice calculation of the π 0 γγ transition form factor [27]. The fact that the integrand of the π 0 exchange itself is well described by a simple functional form has led us to adopt a more self-contained and data-driven approach, which relies on extending the tail via a fit to the data. In both the connected and (2 + 2) contributions, we perform a fully-correlated fit to the data from each ensemble using the ansatz where A and B are free parameters. In the intermediate |y| regime, this fit form describes all of our data well, with χ 2 /dof close to 1. As our data become noisy at large |y|, the fit significantly reduces the error for the integral of the long-distance tail. In our approach, we will choose a cutoff distance: below it, we will numerically integrate the lattice data using the trapezoid rule; above it, we will switch to integrating Eq. (20). The cutoff is chosen to be a point where the integrated a µ exhibits stability. The values of a µ from all ensembles will then be used in a combined chiral, infinite-volume, and continuum extrapolation. In particular, while in [37] the volume effects were corrected for on each ensemble using the prediction for the π 0 + η exchange, here we rely on a global fit to all ensembles to eliminate these effects, with an ansatz for the L-dependence motivated by the same meson exchange. Fig. 5 shows an example of our ability to describe the lattice data with Eq. (20) for the fully-connected contribution. The displayed data [37], corresponding to ensemble N202, are among the most precise at the SU(3) f -symmetric point, and the correlated fit has a χ 2 /dof of 1.1. The figure illustrates that the data is very well described by our ansatz all the way to the point where the data fluctuates around zero and the signal is likely lost.
Table IV summarises our results for the two leading sets of diagrams containing only light quarks. For the extrapolation to the infinite-volume, physical pion mass, and continuum limit for both the fully-connected and (2+2) disconnected contributions we use the following ansatz, where we have identified several candidates for the non-analytic function S(m 2 π ), Pole :: 1 m 2 π Log :: log m 2 π Log2 :: log 2 m 2 π m2Log :: m 2 π log m 2 π .
These functions are inspired by the divergent chiral behaviors at the large-N c limit of the pion-pole exchange and the charged-pion loop contribution [56].

A. Light-quark fully-connected results
The left plot of Fig. 6 uses the partially-integrated a µ (|y| Max. ) defined in Eq. (5) to illustrate the growth in the size of the connected contribution with decreasing pion mass.
Here we consider a constant lattice spacing (a = 0.0864 fm) and include data from similar m π L to help isolate the chiral behaviour. The curves begin at the cutoff where we switch to integrating the fitted Eq. (20), with the trapezoidal-rule integrals of the lattice data up to the cutoff added to them. The fit adequately reproduces the lattice data above the cutoff, and saturates where the trapezoidal-rule integral does, within the uncertainties. At large |y| Max. , some of the lattice points begin to drop below where the fit asymptotes; this is probably a mixture of finite volume effects and loss of signal in the integrand.
The right plot of Fig. 6 shows a combined chiral, infinite-volume, and continuum limit extrapolation based on the "Pole" ansatz. The horizontal axis is m 2 π , and we illustrate the dependence of the global fit on a and m π L via curves that show the fit at various fixed (a, m π L). The result increases along all three dimensions of the extrapolation (larger volumes, finer lattice spacings, and lighter pion masses), which produces a large combined effect.

B. Leading light-quark disconnected results
The (2 + 2) disconnected analogue of Fig. 6 can be found in Fig. 7. It is clear that much like the connected data, the size of the contribution increases with decreasing pion mass and so a very significant cancellation will occur at the physical pion mass between these two contributions with opposite signs. This cancellation was predicted in Ref. [54] on the basis of the π 0 exchange contribution and is illustrated in Fig. 4. It is also worth noting that the statistical precision of the disconnected data is significantly worse than the connected, even though almost an order of magnitude more measurements were performed.
On the right-hand side of Fig. 7, we show the chiral-continuum-infinite-volume extrapolation with different m π L at fixed lattice spacing. Much like in our previous work [37], we find the lattice-spacing dependence to have a slope of the same sign as the connected contribution. It is also evident that an accidental partial cancellation occurs between the finite-volume and lattice-spacing terms, with the approach to the infinite volume making the (2 + 2) contribution more negative and the approach to the continuum limit making it less negative.

C. Combined light-quark results
Due to the significant cancellation between the connected and the (2+2) contribution, we find it useful to take the ensemble-by-ensemble sum of the contributions and then perform the extrapolation. For this sum, our data cannot resolve any of the terms non-analytic in m 2 π of Eq. (22), and it appears that these contributions largely cancel. We find that the fit ansatz a µ (m 2 π , m π L, a 2 ) = a µ (0, ∞, 0)(1 + Am 2 describes our data very well. This ansatz assumes that any potential singular terms in our data cancel to a large extent, an assumption that we address along with the discussion of systematics in Section VIII. Here we simply note that, in addition to the cancellation between the connected and the (2+2) contribution, the chirally singular behaviour expected in a Conn+(2+2) µ from the π 0 exchange and the charged pion loop is numerically suppressed over the pion-mass interval 135 to 200 MeV, due to a partial cancellation between these two long-distance contributions.   8 shows an extrapolation for the sum of the light-quark, fully-connected and (2 + 2) disconnected contributions. It appears in the plot that no chiral curvature is present in this combination and the error grows at lighter pion masses; this is due to the large cancellation between the connected and disconnected contributions. Considering the final column in Tab. IV, we do not appear to benefit from a cancellation of statistical errors due to correlations between the two measurements. It is also clear that the approach to the infinite volume is less severe in the combination of these two quantities compared to fitting them individually; this is likely due to large cancellations in the long-distance contributions such as the pion pole. We still see significant discretisation effects in this fit, but fortunately we have several lattice spacings to constrain this behavior; nevertheless this will form our largest systematic as is discussed later on in Sec. VIII.

D. Consistency checks
It is useful to compare the present analysis to our previous work at the SU(3) f -symmetric point [37]. In that work, we combined light and strange contributions and obtained 98.9(2.5) × 10 −11 for the connected contribution in the infinite-volume and continuum limit, and −33.5(4.2) × 10 −11 for the disconnected. Combining these with charge factors adjusted to isolate the (u, d) quark contribution yields 70.1(3.8) × 10 −11 . The extrapolation in Fig. 8 at m 2 π ≈ 0.173 GeV 2 is 72.5(4.3) × 10 −11 , so these results are in good agreement, even though the underlying methodology is considerably different.  9. Consistency of the infinite-volume estimate for the light connected contribution on ensemble C101 between the analysis performed here (horizontal band) and the analysis method of Ref. [37] (blue points), which corrects the lattice data (black points) using the π 0 exchange prediction. The horizontal band is obtained by adding the finite-size correction from the global fit displayed in Fig. 6 (right panel) to the a (Conn) µ value obtained on ensemble C101 using the tail extension parametrisation Eq. (20). Note that the band does not include the systematic uncertainty of varying the fit ansatz in the global fit, which is addressed in section VIII. Fig. 9 illustrates the consistency between our previous 'tail and finite-size correction' methodology (blue and black points have been interpolated) and the result of our global fit with the Pole ansatz. The two are consistent within the combined statistical and systematic error of the blue data points, although the black line does lie a bit lower than the central value. In the C101 data, there is a strong upward fluctuation (visible in Fig. 4) that pushes both the black and the blue points up and likely hides a stable plateau region.

VI. STRANGE CONTRIBUTIONS
For the fully-connected strange, the (2 + 2)-light-strange (ls) and the (2 + 2)-strangestrange (ss) contributions, we use results from a subset of the ensembles (listed in Tab. V) to cut down on cost for what turns out to be a very small contribution to the overall a Hlbl µ . Here we can reuse the results from the symmetric point with the appropriate charge factors.        The left plot of Fig. 10 illustrates the magnitude of light and strange quark contributions to the fully-connected diagrams for ensemble C101, which has a light pion mass. The lightquark contribution is much longer ranged than the strange and statistically far noisier. The peak of the strange integrand for this ensemble is about 35 times smaller than the light one, and the overall integrated contribution is calculated to be about 55 times smaller; as one approaches the physical pion mass this difference will only grow.
The right plot of Fig. 10 illustrates the size of the contributions from different flavour combinations within the (2 + 2) calculation: light-light (ll), light-strange (ls), and strangestrange (ss). We again show ensemble C101 and use the same statistics for all flavours. For this ensemble the ls contribution is a bit larger than 10% of the ll, and the ss contribution is about 0.6%. It can be seen that the integrated a µ plateaus earlier for heavier quark content and the statistical precision is better too. As the majority of the data in this analysis comes from the previous SU(3) f -symmetric work, the same conclusions apply; finite-volume effects and cut-off effects are still sizeable even for the contributions including strange quarks.
We choose to extrapolate the sum of all the strange and light-strange contributions to the infinite-volume, physical quark mass, continuum limit using the Ansatz a µ (m 2 K , m π L, a 2 ) = a µ (0, ∞, 0)(1 + Am 2 K + Be −mπL + Ca 2 ).
It is worth noting that the exponential volume factor here is m π L instead of m π L/2 for the light-quark contribution as there is no π 0 -exchange. A plot of this extrapolation can be found in Fig. 11. The fit of Eq. (24) gives a χ 2 /dof = 0.6. Again, we see the (2 + 2) contribution approaching the continuum limit with the same sign as the fully-connected contribution; in the continuum, these two contributions effectively cancel. Our final result at the physical point is
Our goal is thus to compute the (3 + 1) class of diagrams as well as we can, and provide evidence that the (2 + 1 + 1) and (1 + 1 + 1 + 1) are small enough to be neglected from our targeted error budget. In particular, we will give details about how we treat the |y|integration of our (3 + 1) data.

A. The (3 + 1) contribution
The charge factor of (3 + 1) strange is −1/7 of that of (3 + 1) light . (Recall that here the flavour label corresponds to the triangle and that for the disconnected loop, we always use the combined light and strange contributions.) Furthermore, because of the larger mass of the strange quark, the (3 + 1) strange contribution is expected to be much smaller compared to (3 + 1) light . We will thus put our main effort on the (3 + 1) light contribution. Numerical evidence of the smallness of the (3 + 1) strange contribution is given in Section VII A 3; it turns out that this quantity is at least ten times smaller than the contribution with light triangle. At large distances, the physics is dictacted by the lightest particles, i.e., the pseudoscalar mesons. The most relevant contributions are neutral pseudoscalar-meson poles and charged pseudoscalar-meson loops ( [2] and the references therein). The computation based on Partially-Quenched Chiral Perturbation Theory (PQChPT) in Appendix A shows that there is no contribution at leading order coming from pseudoscalar-meson poles. Nevertheless, the (3 + 1) light receives contributions from pseudoscalar-meson loops (cf. Fig. 20 and Table IX). As the signal of the integrand degrades rapidly with increasing |y|, we decide to use the knowledge from light pseudoscalar contributions in infinite volume as a guideline for cutting the integral at some |y| = |y| cut . The procedure is as follows. We split the |y|-integral for a (3+1) µ into two intervals, below and above |y| cut , so that a

Treatment of the tail of the (3 + 1) light integrand
Below |y| cut , we numerically integrate the lattice data to obtain a < µ . Above |y| cut , we take the central value of a > µ to be zero and assign an uncertainty to this omitted tail contribution based on a calculation of the charged pseudoscalar-meson loop contribution in scalar QED. Final we add the two uncertainties (statistical for a < µ and systematic for a > µ ) in quadrature. For the tail uncertainty, we compute the integrand in infinite-volume scalar QED and integrate from |y| cut to infinity, which gives an order-of-magnitude estimate of the possible missing mesonic contributions in the tail. We then assign a very conservative systematic error, namely w sys. = 120% of this contribution. (This choice of w sys. will be justified in the next subsection.) As scalar QED corresponds to pointlike photon-pseudoscalar-pseudoscalar vertices, which tend to overestimate the pseudoscalar-meson loop contributions to a µ , we assume that the assigned systematic error also covers the possible finite-size effects in the |y| < |y| cut region. Finally, we determine |y| cut for each lattice ensemble by finding the value that minimises the total error. An example of this procedure is shown in Fig. 12. 2. Numerical results for the (3 + 1) light contribution Table VI shows our choice of |y| cut and the value of a (3+1) µ computed on each ensemble, with w sys. = 120%; these results are also plotted in Fig. 13 (left). It is already clear from this plot that there is no distinguishable O(a)-dependence in the data at our level of precision. The same is true of volume effects. This leads us to parameterise our data in a very simple form, namely Such a mass-dependence is motivated by the fact that this contribution must vanish at the SU(3) f -symmetric point. This fit describes our data well and we investigate the stability of our final fit result through applying several cuts in our data, as is shown in Fig. 13 (right). Also shown in the same figure are the results obtained by applying the same procedure with a different choice of the weight, w sys. = 200%, for the estimate of the systematic error of the omitted tail contribution. Note that a bigger value of w sys. implies that one cuts the lattice data at larger |y|. As the lattice data become noisier with increasing |y|, the fluctuations of the central value determined from the lattice data also become larger, especially for our ensembles with lighter pion mass. However, from the consistency between fits with different cuts in the data, it seems that our choice of w sys. = 120% is reasonable enough without being too conservative. For our final determination, including our fit-systematic, we choose to quote the determination excluding the coarsest ensemble A654 (a 2 < 0.2 GeV −2 ):  for ensembles C101 (m π ≈ 220 MeV, m K ≈ 470 MeV) and H105 (m π ≈ 280 MeV, m K ≈ 460 MeV), compared to the (3+1) light for the ensemble H105. The (3 + 1) strange data are multiplied by 10 for visibility. As for the statistics for the (3 + 1) strange , it is 50% compared to the (3 + 1) light for C101 and about 15% for H105.

The (3 + 1) strange contribution
We have computed the (3 + 1) strange contribution using two ensembles: C101 and H105. For both ensembles, the partially-integrated a µ is shown in Fig. 14, and this is compared with the (3 + 1) light for ensemble H105. It is clear from the lattice data that (3 + 1) strange is at least ten times smaller than our bound on (3 + 1) light and can be entirely neglected for our target precision compared to the leading contributions. As it involves the strange-quark triangle, we expect that this quantity depends only on hadronic states which are at least as heavy as the kaon. From this point of view, because the kaon masses on the used ensembles are somewhat lighter than the physical one, we find it exceedingly unlikely that it would grow significantly as the quark masses approach their physical values.
B. The (2 + 1 + 1) and the (1 + 1 + 1 + 1) results Due to the much-higher computational cost of the lattice-wide object (Eq. (9)) used in our computational strategy, we only determined the light-quark, (2 + 1 + 1) contribution for the ensembles N203 and C101. Also, we only computed the (1 + 1 + 1 + 1) contribution for the ensemble C101 because of our expectation for its insignificance to the final error. The results for the partially integrated a µ (|y|) for both of these ensembles are shown in Fig. 15.
A computation from PQChPT (see Appendix A) shows that at leading order, these two topologies receive neither contributions from the neutral pseudoscalar-meson poles, nor from charged pseudoscalar-meson loops. It is therefore hard to decide at which value of |y| one can cut the lattice data and apply a model prediction afterwards. However, one can see from Fig. 15 that the (2 + 1 + 1)-contribution for both ensembles is smaller than the (3 + 1) at small |y|. Although the rapid degradation of the signal of the (2 + 1 + 1)-contribution is expected, our strategy of averaging over possible ways of constructing the vector y for a given |y| appears to work well at suppressing the statistical noise of this quantity at short distances. In the end, we conservatively estimate this quantity to be zero with half the error of the (3 + 1) contribution. From the smallness of the light-quark contribution of the (2 + 1 + 1) topology, we deem it legitimate to assign the value of zero to the strange contribution. This comes with no contribution to the error budget, as this will be irrelevant compared to our overall level of precision for a Hlbl µ . Note that the mere charge factor suppresses the strange (2 + 1 + 1) contribution relative to the light (2 + 1 + 1) by a factor of five. As for the (1 + 1 + 1 + 1) contribution, its observed smallness on the right panel of Fig. 15 does not come as a surprise, in particular since its charge factor weights it five times less than the already-small (2 + 1 + 1). Any improvement to either of these quantities would have a completely negligible effect on the final result for a Hlbl µ , at our current level of precision.
VIII. THE TOTAL a Hlbl µ In this section we investigate two approaches to determining the contribution of the two leading light-quark contributions to a Hlbl µ : the first consists of fitting the sum of the two contributions and the second consists of adding the results of individual fits to the fully-connected and (2 + 2) contributions using various ansätze. We investigate possible systematics in our approach by comparing the results with terms in a or a 2 and by performing cuts in m π L, a 2 , and m 2 π .
A. Sum and fit We find that the fit Ansatz of Eq. (23) describes our data well. At the same time, a term linear in a instead of a 2 also gives a good fit (χ 2 /dof < 1 for all fits in this section). The results for these fits can be found in Fig. 16 and are listed in Tab. XI in Appendix B. There is a systematic difference between the fits in a and a 2 , with the former pulling the final value up a little. Applying the various cuts has little impact on the central value, and only the cut on the pion mass removing the SU(3) f -symmetric-point data leads to a significant increase of the statistical error. This is not surprising, as the larger the volume and the closer the pion mass to its physical value, the larger the cancellation between the fully-connected and (2 + 2) contributions becomes, and therefore the relative error on their sum.
It is thus clear from Fig. 16 that our main systematic in this approach comes from the continuum extrapolation, as was the case in our previous work [37]. For the final result from this analysis, we treat the two ansätze for the parametrization of cutoff effects on an even footing and perform a fit to a constant to all the possibilities. As an estimate of the systematic error, we compute the root-mean-squared deviation of the fit results y i compared to the average resultȳ, i.e. ( N i=1 (y i −ȳ) 2 /N ) 1/2 . We finally end up with a value of a (Conn.+(2+2))-l µ = 107.4(11.3)(9.2) × 10 −11 , with the first error being statistical and the second systematic.

B. Individual fits
Considering Fig. 6, it does appear that our data exhibits some curvature in m 2 π going towards the physical pion mass, however the underlying functional form is unclear. We have identified several ansätze to describe this non-analytic term in Eq. (22), all of which provide acceptable descriptions (χ 2 /dof ≈ 1) of our data. A plot summarising the values obtained for a µ at the physical point is shown in Fig. 17 and these values can be found in Tab. XII in Appendix B. A fit without some kind of curvature term poorly describes the connected data (χ 2 /dof ≈ 2.5), but for the disconnected a good fit is still possible without such a term due to the relatively low statistical precision of the data. It is clear from Fig. 17 that there is considerable ambiguity on the resulting individual contributions from choosing the functional form of this curvature term, although this somewhat "washes out" when the sum is taken. Due to the difficulty of resolving this term precisely using a global fit, we view combining fits to the individual contributions as a suboptimal procedure, especially without a result very close to the physical pion mass to help constrain this possible curvature. It is, however, reassuring that the two approaches have good agreement. In conclusion, we choose to quote the fit to the sum of the two contributions for our final result.
This result includes all systematics (added in quadrature) as well as previously-unmeasured (2 + 1 + 1) and (1 + 1 + 1 + 1) higher-order contributions. The overall precision is about 14%. A breakdown of the individual contributions to this result can be found in Tab. VII.  We find that, as we approach the physical pion mass, the two leading contributions to the total a Hlbl µ , the light-quark fully-connected and (2 + 2) disconnected, yield significant cancellations. This makes a precise measurement at low pion mass and large-m π L extremely challenging. In fact, without the data from the SU(3) f -symmetric-point data our determination would be considerably less precise (see Fig. 16). It is also clear that the only quantities really needed in the determination of a Hlbl µ are the fully-connected and (2 + 2) light-quark contributions. We find that all of the sub-leading contributions are consistent with zero within the desired precision. For the first time, we have performed a direct calculation of the (2 + 1 + 1) and (1 + 1 + 1 + 1) contributions and again find these contributions to be consistent with zero and smaller than the (3 + 1), which is expected to be the case from large-N c arguments, and naively from the magnitude of their charge factors.
As suggested in previous lattice determinations [36,42] and several phenomenological predictions (e.g. [32,56,60,61] and discussion/references in [2]), the hadronic light-bylight contribution is in no way large enough to bridge the current gap between theory and experiment for the overall (g − 2) µ . In Fig. 18, we illustrate that there is excellent agreement between our determination and the literature. An uncorrelated fit to a constant of the upper three values of Fig. 18 yields a Hlbl µ = 98.9(11.1) × 10 −11 . We remind the reader that we consistently omit the contribution of the charm quark, which in [2] is estimated to be 3(1) × 10 −11 . Whether one performs an average of different a Hlbl µ determinations or not, with the literature. The results in circles are the two available lattice determinations (this work and [36], above the horizontal dashed line). The results in squares are phenomenological predictions from [2], [32], [60,61], and [56]. All errors have been added in quadrature.
with the level of precision and consistency achieved, the highest priority in improving the overall (g − 2) µ theory prediction is now to sharpen the HVP determination.
Still, further improvements in the lattice determination of a Hlbl µ are clearly possible with the formalism we have employed. It is worth reiterating once more that a lattice determination of a Hlbl µ needs only to focus on the light fully-connected and (2 + 2) contributions as, at the required accuracy to make an impact on the theory prediction of (g − 2) µ , these are the only parts that matter. The project leading to this publication has also received funding from the Excellence Initiative of Aix-Marseille University -A*MIDEX, a French "Investissements d'Avenir" programme, AMX-18-ACE-005. Calculations for this project were partly performed on the HPC clusters "Clover" and "HIMster II" at the Helmholtz-Institut Mainz and "Mogon II" at JGU Mainz. The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer HAWK at Höchstleistungsrechenzentrum Stuttgart (www.hlrs.de) (Project ID GCS-HQCD). Our programs use the deflated SAP+GCR solver from the openQCD package [62], as well as the QDP++ library [63]. We are grateful to our colleagues in the CLS initiative for sharing ensembles. Partially-Quenched Chiral Perturbation Theory (PQChPT), an effective field theory (EFT) of Partially-Quenched Quantum Chromodynamics (PQQCD), is a commonly-used tool for matching different QCD Wick-contractions to Feynman diagrams in an EFT (see e.g. [64]).
In our previous study at the SU(3) f -symmetric point [37], we have shown how one applies PQChPT to obtain the corresponding contributions for the finite size effect correction. For the purpose of this paper, we shall consider PQChPT with SU(3) f -breaking because our calculations are performed much closer to the physical-quark-mass point. In particular, we will focus on two hadronic contributions from the EFT which are considered to be dominant at large distances for the light-by-light scattering: the neutral pseudoscalar meson exchange and the charged pseudoscalar meson loop.
In all of the cases discussed below, we will consider N f = 2 + 1 QCD as the underlying theory. We will study the matching for the pseudoscalar meson loop using SU(N |M )-theories because only the charged mesons are expected to contribute. On the other hand, we will also consider U(N |M )-theories for the neutral pseudoscalar meson exchange, because it allows one to investigate how the flavour-singlet meson, η , contributes to each individual QCD Wick-contraction, as well as how η/η -mixing might arise in diagram matching. Since the purpose of this study is to understand how different Feynman diagrams are matched between theories, we will only consider the lowest order terms for both contributions and renormalisation will not be taken into account.
At lowest order in perturbation theory, the flavour-breaking effect is due to the mass term. This manifests itself in the mixing of the propagators if we consider the Goldstone boson fields as living in the adjoint representation of the symmetry group; the interaction vertices are not affected at this level. Denoting m l the light quark mass and m s the strange quark mass, and starting from N f = 2 + 1 QCD, we will need two different PQChPTs to single-out different Wick contractions, those in which quenched quarks/ghosts either have mass values of m l or m s . We enumerate these as: In the following, we will first give the expressions for the propagators, and the relevant interaction vertices will then be given in dedicated sub-sections for the neutral pseudoscalar exchange and the charged pseudoscalar loop. Then, we apply the usual routine as described in [37] to build different four-point functions which give exactly the desired diagrams from an adequate choice of PQChPT. The expressions will be given in momentum space in Euclidean spacetime.
Note that we give the matching relations between different contraction topologies in QCD and PQChPT: one has to include the correct charge factors in order to recover the full light-by-light scattering result.
The graded group SU(N |M ) is generated by super-traceless matrices. A convenient choice of generator basis {T a } is a set of super-traceless Hermitian matrices, such that where The kinetic part of the PQChPT Lagrangian can then be arranged as where the φ a s are Goldstone bosons/fermions and Here S is an (N + M ) × (N + M ) diagonal matrix which is related to the quark species of the underlying PQQCD, where the quark fields live in the fundamental representation of SU(N |M ). For our purpose, we require the underlying PQQCD to be an extension of N f = 2 + 1 QCD by setting S = diag(0, 0, 1, · · · ). Under this convention, the first two positions in the fundamental representation correspond to the u-and d-quark, the third to the s-quark and the rests are quenched quarks and their ghost counterparts. The remaining elements of S are set in the following way: S ii = 0 if the i-th quark in the underlying PQQCD is of mass m l and S ii = 1 if it is of mass m s . We shall specify the generator basis before proceeding further, as this affects the definition of K ab and the form of the propagators that we give later. For SU(N |M ), we can write down a basis for the generators with N + M − 1 that are diagonal, and (N + M )(N + M − 1) generators with 0's in the diagonal. We will call the first category neutral and the second charged.
Analogously to the SU(3) Gell-Mann matrices, the set of charged generators can be written as a union of two sets of matrices I ∪ J, where As for the neutral generators, the j-th of them, A j , is defined by It is straightforward to see that, upon re-enumeration, the basis of generators constructed in this way satisfies Eq. (A2). Note that the neutral generator defined by j = 1 in Eq. (A7) has a special rôle because it does not mix with other neutral generators under flavour-breaking. We will name it as neutral pion (π 0 ). For both PQChPT theories (i ) and (ii ) in which we are interested, the propagator takes the form where .

(A9)
The mass parameters, M ab andM ab , and the matrixH depend on the PQChPT and only the neutral non-pion sector is affected by the propagator mixing induced by the second term on the right-hand side of Eq. (A8), due to flavour-symmetry breaking.
In the charged meson sector, for both (i) and (ii ), we have H ab = 0 and ss ≡ M 2 π + 2∆M 2 if the underlying PQQCD content is strange-strange.
(A10) For the neutral sector, the results for (i ) and (ii ) are as follows: (ii ) To make connection with the physical parameters in U(3) ChPT, we define where δ is the η/η -mixing angle. One can build a partially-quenched U(N |M ) theory from a U(N − M ) theory in the same fashion as illustrated in the previous section; the only difference is the presence of an extra generator with a non-vanishing super-trace. This has already been done in the literature but in a different generator basis [65]. For our purpose, we stick to our already-built generator basis for SU(N |M ) plus the diagonal generator Λ −1 I for our generator basis. We extend the definition of the metric defined for SU(N |M ) in Eq. (A3) tõ where the top-left corner of the matrix on the right-hand side represents the flavour-singlet sector. We will use the number 0 to indicate this sector. Instead of introducing new notation, we extend naturally the definition of the matrix K defined in Eq. (A5) to include the flavoursinglet sector. After matching to the U(N − M ) theory, one obtains for the leading order of the chiral Lagrangian: whereφ a is the same field as φ a in the SU(N |M )-theory if a = 0 andφ 0 = η , and The derivation of the propagators for the Lagragian Eq. (A15) can be performed following a similar procedure as in [66]. We introduce here a matrixH, two constants, λ and D 00 , and mass parameters, M ab andM ab , which will be specified later according to the partiallyquenched theory considered.
We define .
The U(N |M )-propagator is then given by where For the charged sector, only the first term on the right-hand side of Eq. (A20) does not vanish and the mass M ab parameter is the same as defined in Eq. (A10); for the neutral sector, we have for the cases (i) and (ii): K ab − 2δ ab (a = π 0 and has no ghost constituent) or (a = η ), K ab + 2δ ab a = π 0 , η and has ghost contituents, K ab else, (1 + 2Λ 2 ) , D 00 = 2 , H ab = (g −1Kg−1 ) ab . (A23)

Charged pseudoscalar meson loop
Consider here an SU(N |M ) theory. Up to O(p 2 ), the interaction Lagrangian is [64] where and [·, ·] acts as commutator or anti-commutator according to the generators that it applies to (for definition cf. [64]). The summed indices run over the whole basis of generators.
In Fig. 19, we define two functions: Ω and Ω c . These are linear combinations of different diagrammatic classes appearing in the charged pseudoscalar meson loop computation (cf .  Table VIII), and Ω and Ω c are computed with a given pseudoscalar meson mass. Different QCD Wick-contractions are matched to the charged pseudoscalar loop in the way described in Fig. 20 with coefficients given in Table IX. The first column of Table IX indicates the pseudoscalar meson mass that Ω or Ω c should take, and only the non-vanishing contributions are listed. 2 One can check that, with the correct charge factors, the total a Hlbl µ receives contributions from one charged pion-loop and one charged kaon-loop.   , e i (for (3+1)) and f i (for (2+2)) for the contribution of different charged pseudoscalar mesons defined in Fig. 20. Here, the same convention as for our lattice computation described in the main text is used. In particular, "(4),l/s" refers to the fully-connected light/strange contribution, "(2+2)-ls" refers to the sum of the 2+2 diagrams with one light and one strange quark, and "(3+1)-l/s" refers to the light/strange quark triangle correlated with light minus strange disconnected loop.

Neutral pseudoscalar meson exchange
The neutral pseudoscalar meson exchange is possible due to the chiral anomaly. To make the analysis simpler, we only consider the coupling of a pseudoscalar meson with two photons, P γγ, via the Wess-Zumino-Witten term [67,68], whose Feynman rule up to an irrelevant factor for our analysis is given by: We compute the matching coefficients for the cases with and without the flavour-singlet pseudoscalar meson η . A priori, the P γγ vertex receives contributions from flavour symmetry breaking effects already at tree-level. In addition, if one is to include the flavour-singlet pseudoscalar meson, additional terms that violate the OZI rule also have an impact at treelevel [69]. As our goal is to have a qualitative idea of the diagram matching and for a more quantitative analysis, a more realistic transition form factor is required anyway, these effects are not included in our analysis.