1 Introduction

The anomalous magnetic moment of the muon, \(a_\mu \equiv (g-2)_\mu /2\), is one of the most precisely measured quantities of the Standard Model (SM) of particle physics. Its value is of considerable interest to the physics community as, currently, there exists a \(3.7\sigma \) tension between the experimental determination of Ref. [1] and the current theoretical evaluation (see Ref. [2] and references therein). Although the central value of the theoretical prediction is overwhelmingly dominated by QED effects, its uncertainty is dominated by low-energy QCD contributions. If the tension persists under more precise scrutiny, it is possible that a \(5\sigma \) discrepancy could appear, heralding an indirect determination of Beyond the Standard Model (BSM) physics.

A new series of experiments (E989 at Fermilab [3] and E34 at J-PARC [4]) aims to increase the precision of the experimental determination by a factor of about four; as it stands, the error on \(a_\mu \) is at the level of \(63\times 10^{-11}\). Similarly, the theory community is striving to reduce the error of its determination to match the upcoming experimental precision. One of the contributions that is of specific interest is the hadronic vacuum polarisation (HVP), which enters at \(O(\alpha _\text {QED}^2)\). Being a QCD quantity dominated by hadronic scales, this contribution can be directly obtained from first-principles lattice QCD calculations, although currently its most precise estimate [2, 5,6,7,8,9,10,11] comes from dispersive methods and is \(6931(40)\times 10^{-11}\). Significant progress has been made in recent years within the lattice approach [12,13,14,15,16,17,18,19,20,21,22], and these determinations are quickly becoming competitive with the dispersive approach.

A much smaller contribution to the overall \((g-2)_\mu \) comes from hadronic light-by-light scattering (Hlbl), entering at \(O(\alpha _\text {QED}^3)\). However, this quantity is currently only known at the 20% level: the recent evaluation of Ref. [2], omitting an estimate of the small charm-quark contribution, amounts to [23,24,25,26,27,28,29,30,31,32,33,34,35] \(89.0(19.0)\times 10^{-11}\). Thus the absolute uncertainty of the Hlbl contribution is only about half that of the recent average [2] for the HVP. To match the expected experimental precision, it is thought that the Hlbl contribution \(a_\mu ^{\text {Hlbl}}\) needs to be known with a precision of around \(10\%\). The task of directly computing this contribution using lattice QCD methods is quite daunting, as it requires the computation of connected and disconnected four-point functions. Few lattice groups have even performed measurements of the leading contributions, and none with the desired precision. The most-precise lattice determination to date [36] uses the finite-volume \(\text {QED}_L\) prescription and quotes a value of (adding their statistical and systematic errors in quadrature) \(78.7(35.4)\times 10^{-11}\). In [37], we provided an estimate at the physical pion mass, starting from our \(\text {SU}(3)_f\)-symmetric point result and correcting for the neutral-pion exchange [27, 38], of \(104.0(20.8)\times 10^{-11}\).

We extend our previous determination of the Hlbl contribution to \((g-2)_\mu \) at the \(\text {SU}(3)_f\)-symmetric point [37] by incorporating data from simulations at pion masses as low as 200 MeV. We also provide estimates for the sub-leading \((3+1)\), \((2+1+1)\), and \((1+1+1+1)\) contributions, providing a full first-principles calculation using lattice QCD with a competitive overall error.

This work is organised as follows: first we introduce our approach and formalism for measuring \(a_\mu ^{\text {Hlbl}}\) using lattice QCD and infinite-volume perturbative QED in Sect. 2. In Sect. 3, we discuss the numerical techniques and effort for our determination. Section 4 contains a comparison of the integrand to the predictions of hadronic models. We then present results for the leading fully-connected and \((2+2)\) diagram contributions with light (Sect. 5) and strange (Sect. 6) quark content. In Sect. 7 we discuss the determination of the higher-order \((3+1)\), \((2+1+1)\), and \((1+1+1+1)\) contributions. We finally discuss the systematics of our largest contribution in Sect. 8, and combine all of our determinations and draw conclusions in Sect. 9.

2 Formalism

In order to have a better control over the long-distance QED effects, we use a position-space approach, which consists in treating the QED part perturbatively, in infinite-volume and in the continuum, and the hadronic part non-perturbatively on the lattice [39,40,41]. Due to the O(4) symmetry in the Euclidean continuum, the hadronic light-by-light contribution to the anomalous magnetic moment of the muon, \(a_\mu ^{\text {Hlbl}}\), admits the following integral representation

$$\begin{aligned} a_\mu ^{\text {Hlbl}}= \sum _{\text {Topology}}\int _0^\infty d|y|\, f^{(\text {Topology})}(|y|), \end{aligned}$$

where \(f^{(\text {Topology})}(|y|)\), henceforth called the integrand (for a fixed diagrammatic topology), is itself obtained as an integral over spacetime (in our notation \(\int _x = \int d^4x\)),

$$\begin{aligned}&\sum _{\text {Topology}}f^{(\text {Topology})}(|y|)\nonumber \\&\quad = \frac{m_\mu e^6}{3} 2\pi ^2 |y|^3 \int _x \; \mathcal {{{\bar{L}}}}_{[\rho ,\sigma ];\mu \nu \lambda }(x,y)\;i{{\widehat{\Pi }}}_{\rho ;\mu \nu \lambda \sigma }(x,y).\nonumber \\ \end{aligned}$$

Here \(e^2/(4\pi )=\alpha _{\mathrm{QED}}\) is the fine-structure constant and \(m_\mu \) the muon mass. The QED kernel \(\mathcal {{{\bar{L}}}}\) represents the contributions of the photon and muon propagators and vertices (see Fig. 1), and \(i{{\widehat{\Pi }}}\) is the first moment of the connected, Euclidean, hadronic four-point function,

$$\begin{aligned} \begin{aligned} i{{\widehat{\Pi }}}_{\rho ;\mu \nu \lambda \sigma }( x, y)&= -\int _z z_\rho \, {{\widetilde{\Pi }}}_{\mu \nu \sigma \lambda }(x,y,z), \\ {{\widetilde{\Pi }}}_{\mu \nu \sigma \lambda }(x,y,z)&\equiv \Big \langle \,j_\mu (x)\,j_\nu (y)\,j_\sigma (z)\, j_\lambda (0)\Big \rangle _{\mathrm{QCD}}. \end{aligned} \end{aligned}$$

The field \(j_\mu (x)\) appearing above is the hadronic component of the electromagnetic current,

$$\begin{aligned} j_\mu (x) = \frac{2}{3} ({\overline{u}} \gamma _{\mu } u)(x) - \frac{1}{3} ({\overline{d}} \gamma _{\mu } d)(x) - \frac{1}{3} ({\overline{s}} \gamma _{\mu } s)(x). \end{aligned}$$

As for the QCD four-point function \({\widetilde{\Pi }}_{\mu \nu \sigma \lambda }\), it consists of five classes of Wick-contractions, illustrated in Fig. 1: the fully-connected, the \((2+2)\), the \((3+1)\), the \((2+1+1)\) and the \((1+1+1+1)\). It can be shown that the contribution to \(a_\mu ^{\text {Hlbl}}\) of each topology is itself a gauge-independent observable, therefore it is legitimate to focus on each independently.

According to large-\(N_c\) arguments and some numerical evidence provided by the RBC/UKQCD collaboration [36] on the \((3+1)\) topology, only the first two (the fully-connected and \((2+2)\)) of the aforementioned classes are believed to be dominant; however, no direct calculations of the subleading classes have been performed until now. In addition, the last three classes, which we refer to as higher-order topologies, are suppressed by powers of the light-minus-strange quark-mass difference around the \(\text {SU}(3)_f\)-symmetric point, and necessarily vanish exactly at that point.

Fig. 1

Different quark Wick-contraction classes appearing in the computation of the QCD four-point correlation function. The straight horizontal lines represent muon propagators, wavy lines represent photon propagators. From left to right, top to bottom, they are the fully-connected, \((2+2)\), \((3+1)\), \((2+1+1)\) and \((1+1+1+1)\). Each class contains the diagrams obtained from all the possible permutations of the four points attached to photons

As the integrand \(f(|y|)\) is a scalar function of |y|, our computational strategy is as follows. We calculate the integrand, with the inner integrals over x and z replaced by sums, and average it over many equivalent instances of the origin and the y-vector for a given |y| to enhance statistics. We then apply the trapezoidal rule to approximate the integral over |y| in Eq. (1), which yields \(a_\mu ^{\text {Hlbl}}\) for each gauge ensemble. Finally, we take the appropriate infinite-volume and continuum limits and extrapolate our result to physical quark masses.

In addition to showing the integrand, often we will find it useful to present the partially-integrated quantity,

$$\begin{aligned} a_\mu (|y|_\text {Max.}) = \int _0^{|y|_\text {Max.}} d|y| f(|y|). \end{aligned}$$

This quantity is typically less sensitive to point-by-point fluctuations in f(|y|) and adequately illustrates the salient features of the calculation. Our expectation is that the partially-integrated quantity admits a plateau as \(|y|_\text {Max.}\) is increased, indicating the integral has saturated within the uncertainties. Although Eqs. (1)–(3) represent our starting point for computing \(a_\mu ^{\mathrm{Hlbl}}\), it is computationally advantageous to rearrange the contributions of individual Wick contraction diagrams within one topology class. In the five subsections below, we present the specific integrand functions \(f^{\mathrm{(Topology)}}\) that we actually use for each of the five topologies.
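The partial integration can be sketched numerically as follows. This is a minimal illustration, not the analysis code: the integrand is a smooth stand-in of the form \(A|y|^3\exp (-B|y|)\) with the parameters quoted in the caption of Fig. 4 for the connected \(\pi ^0\)-exchange fit, and the grid (spacing 0.05 fm, cut at 8 fm) is an assumption made for the example.

```python
import numpy as np

# Illustrative integrand only: the functional form and the values of A and B
# are those quoted in the caption of Fig. 4 (connected pi0-exchange fit).
# Real lattice data would carry statistical errors on every point.
A, B = 840.0, 2.34  # A in fm^-4, B in fm^-1


def integrand(y):
    """Model integrand f(|y|) evaluated on an array of |y| values (in fm)."""
    return A * y**3 * np.exp(-B * y)


def partial_integral(y, f):
    """Cumulative trapezoidal estimate of a_mu(|y|_Max) at every grid point."""
    steps = 0.5 * (f[1:] + f[:-1]) * np.diff(y)
    return np.concatenate(([0.0], np.cumsum(steps)))


y = np.linspace(0.0, 8.0, 161)  # hypothetical grid, spacing 0.05 fm
a_mu_partial = partial_integral(y, integrand(y))

# For this particular model the full integral is known in closed form.
exact = 6.0 * A / B**4
print(a_mu_partial[-1], exact)
```

For this model the closed form \(6A/B^4\) provides a check on the trapezoidal sum. With lattice data no such check exists, and the plateau of the partial sums has to be judged within the statistical errors.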

Exploiting the Ward identities associated with current conservation, the QED kernel can be modified by adding to it terms which do not contribute to \(a_\mu ^{\text {Hlbl}}\) in the infinite volume limit [41, 42]. To mitigate the signal-to-noise problem of vector-current lattice correlation functions at large separations, one would like to choose a QED kernel which guarantees a rapid fall-off of the integrand f(|y|) at large |y|, without picking up large discretisation effects by making it too-peaked at short distances. Due to the gauge-invariance of each topology, one can even work with different choices of kernel for each topology individually.

In our previous work at the \(\text {SU}(3)_f\)-symmetric point [37], we have shown the effectiveness of a certain one-parameter family of kernels, \(\mathcal {{{\bar{L}}}}^{(\Lambda )}_{[\rho ,\sigma ];\mu \nu \lambda }\), with positive, real \(\Lambda \). Our preferred choice for this parameter is \(\Lambda =0.4\); this was motivated by several studies of the shape of the integrand: a continuum and infinite volume QED calculation of the lepton loop contribution, a study of the pion-pole contribution with a Vector Meson Dominance (VMD) parametrisation for the transition form factor in the continuum and finite volume, and our direct lattice calculations at the \(\text {SU}(3)_f\)-symmetric point.

While Eq. (2) represents our general master formula for the integrand f(|y|), for computational reasons it can be beneficial to exploit the translational invariance of the QCD correlation function to re-arrange the integrand in different ways, such that only the most favourable diagrams within each topology class have to be explicitly computed. In our previous work, we showed that for the fully-connected contribution such an approach reduced the computational cost significantly without introducing undesirable effects when used in conjunction with the kernel \(\mathcal {{{\bar{L}}}}^{(\Lambda )}_{[\rho ,\sigma ];\mu \nu \lambda }\) [37]. In the subsections below we present the specific integral representations that we use for each of the five topologies.

For notational simplicity, we will find the following QED kernel combination useful

$$\begin{aligned} {\mathcal {L}}^\prime _{[\rho ,\sigma ];\mu \nu \lambda }(x,y)= & {} \bar{{\mathcal {L}}}^{(\Lambda )}_{[\rho ,\sigma ];\mu \nu \lambda }(x,y) + \bar{{\mathcal {L}}}^{(\Lambda )}_{[\rho ,\sigma ];\nu \mu \lambda }(y,x)\nonumber \\&- \bar{{\mathcal {L}}}^{(\Lambda )}_{[\rho ,\sigma ];\lambda \nu \mu }(x, x-y). \end{aligned}$$

Also, in the equations below we anticipate our use of the local vector current on the lattice, which requires a multiplicative renormalisation factor \({{\hat{Z}}}_{\mathrm{V}}\).

2.1 The fully-connected contribution

For the fully-connected calculation we use the following master equation for the integrand:

$$\begin{aligned} \begin{aligned} f^{\text {(Conn.)}}(|y|)&= -\sum _{j\in u,d,s}{\hat{Z}}_{\mathrm{V}}^4 Q_j^4 \frac{m_\mu e^6}{3}2\pi ^2|y|^3 \\&\quad \times \int _x \bigg ( {\mathcal {L}}^\prime _{[\rho ,\sigma ]\mu \nu \lambda }(x,y) \int _z z_\rho {{\widetilde{\Pi }}}^{(1),j}_{\mu \nu \sigma \lambda }(x,y,z)\\&\quad +\mathcal {{{\bar{L}}}}^{(\Lambda )}_{[\rho ,\sigma ];\lambda \nu \mu }(x,x-y)x_\rho \int _z {{\widetilde{\Pi }}}^{(1),j}_{\mu \nu \sigma \lambda }(x,y,z)\bigg ), \end{aligned} \end{aligned}$$

with hadronic contribution

$$\begin{aligned}&{{\widetilde{\Pi }}}_{\mu \nu \sigma \lambda }^{(1),j}(x,y,z) \nonumber \\&\quad =-2\text {Re}\left\langle \,\mathrm{Tr}\left[ S^j(0,x) \gamma _{\mu } S^j(x,y) \gamma _{\nu } S^j(y,z) \gamma _{\sigma } S^j(z,0) \gamma _{\lambda } \right] \right\rangle _U.\nonumber \\ \end{aligned}$$

Here \(S^{j}(x,y)\) is the flavour j-quark propagator from source y to sink x, \(Q_j\) is the charge factor (\(Q_u = \frac{2}{3}\), \(Q_d = -\frac{1}{3}\), \(Q_s = -\frac{1}{3}\)), and \(\langle \cdot \rangle _U\) denotes the ensemble average.

2.2 The \((2+2)\) contribution

We start by defining the two-point function “meson-field”

$$\begin{aligned} \Pi _{\mu \nu }^{j}(x,y)= -\text {Re}\left( \text {Tr}[S^j(y,x)\gamma _\mu S^j(x,y)\gamma _\nu ]\right) , \end{aligned}$$

which must have its vacuum expectation value (VEV) subtracted:

$$\begin{aligned} {\hat{\Pi }}^j_{\mu \nu }(x,y) = \Pi ^j_{\mu \nu }(x,y) - \langle \Pi ^j_{\mu \nu }(x,y) \rangle _U. \end{aligned}$$

We use the following integral representation for \(a_\mu ^{(2+2)}\)

$$\begin{aligned}&f^{(2+2)}(|y|) = -\sum _{i,j\in u,d,s} Q_i^2 Q_j^2 {\hat{Z}}_{\mathrm{V}}^4\frac{m_\mu e^6}{3}2\pi ^2 |y|^3\nonumber \\&\quad \times \biggl \langle \int _x \bigg ( (\mathcal {{{\bar{L}}}}^{(\Lambda )}_{[\rho ,\sigma ];\mu \nu \lambda }(x,y)\nonumber \\&\quad +\mathcal {{{\bar{L}}}}^{(\Lambda )}_{[\rho ,\sigma ];\nu \mu \lambda }(y,x)) {\hat{\Pi }}^i_{\mu \lambda }(x,0)\int _z z_\rho {\hat{\Pi }}^j_{\sigma \nu }(z,y)\nonumber \\&\quad +\mathcal {{{\bar{L}}}}^{(\Lambda )}_{[\rho ,\sigma ];\mu \nu \lambda }(x,y){\hat{\Pi }}^i_{\mu \nu }(x,y) \int _z z_\rho {\hat{\Pi }}^j_{\sigma \lambda }(z,0)\bigg ) \biggr \rangle _U. \end{aligned}$$

Note that the VEV subtraction is necessary to guarantee that the two quark loops are connected by gluons, in the perturbative picture. In this representation, the factorisation of the x- and z-integrations makes the lattice computation easier. Similar patterns can also be found in our choice of representation for the higher order topologies for the same reason.

We call light-light contribution the set of diagrams consisting exclusively of light quarks. Likewise, the strange-strange contribution contains only strange quark loops. Finally, the light-strange case covers all diagrams containing one light and one strange quark loop. These sub-contributions can easily be constructed by combining different terms in Eq. (11). As the integral is constructed as a post-processing step, the light-quark and strange-quark loops can easily be combined.
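The charge weights of these three sub-contributions follow directly from the factor \(Q_i^2 Q_j^2\) in the \((2+2)\) integrand. The short sketch below only tabulates these weights with exact fractions; the function and variable names are illustrative, and it assumes that each ordered pair (i, j) of the double sum is counted once.

```python
from fractions import Fraction

# Electric charges of the three quark flavours, as given in the text.
Q = {"u": Fraction(2, 3), "d": Fraction(-1, 3), "s": Fraction(-1, 3)}
LIGHT, STRANGE = ("u", "d"), ("s",)


def weight(flavours_i, flavours_j):
    """Sum of Q_i^2 Q_j^2 over the flavours of the two quark loops."""
    return sum(Q[i] ** 2 * Q[j] ** 2 for i in flavours_i for j in flavours_j)


light_light = weight(LIGHT, LIGHT)
# Both orderings (i light, j strange) and (i strange, j light) appear in the sum.
light_strange = weight(LIGHT, STRANGE) + weight(STRANGE, LIGHT)
strange_strange = weight(STRANGE, STRANGE)
total = light_light + light_strange + strange_strange

print(light_light, light_strange, strange_strange, total)
```

The light-light weight evaluates to 25/81 and the total to 36/81, the two charge factors that are compared in Sect. 4.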

2.3 The \((3+1)\) contribution

As we work with \(N_f=2+1\) lattice ensembles, we assume the mass-degeneracy between the u- and d-quark from here on to simplify our expressions. We begin by defining the two hadronic building blocks (here l and s refer to light and strange quarks respectively),

$$\begin{aligned} T_\mu (x) = \text {Im} \Big ( \mathrm{Tr\,}[\gamma _\mu S^{l}(x,x)] - \mathrm{Tr\,}[\gamma _\mu S^{s}(x,x)] \Big ), \end{aligned}$$


$$\begin{aligned} R^i_{\mu \nu \lambda }(x,y,z) = \text {Im}\Big (\mathrm{Tr\,}[\gamma _\mu S^{i}(x,y)\gamma _\nu S^{i}(y,z)\gamma _\lambda S^{i}(z,x)] \Big ). \end{aligned}$$

The quantity \(R^{i}\) will be referred to as a triangle with quark species i, and T will be called the disconnected loop.

Our expression for the integrand for this contribution reads

$$\begin{aligned}&f^{(3+1)}(|y|) = \frac{2m_\mu e^6}{9}\sum _{j\in {u,d,s}} {\hat{Z}}^4_{\mathrm{V}} Q_j^3 2\pi ^2 |y|^3\nonumber \\&\quad \times \Big \langle \int _{x} {\mathcal {L}}^\prime _{[\rho ,\sigma ]\mu \nu \lambda }(x,y) T_\mu (x)\int _z z_\rho R^j_{\lambda \nu \sigma }(0,y,z) \nonumber \\&\quad +\int _x\bar{{\mathcal {L}}}^{(\Lambda )}_{[\rho ,\sigma ]\lambda \nu \mu }(x,x-y) x_\rho T_{\mu }(x)\int _z R^j_{\lambda \nu \sigma }(0,y,z)\nonumber \\&\quad +\int _{x}\bar{{\mathcal {L}}}^{(\Lambda )}_{[\rho ,\sigma ]\mu \nu \lambda }(x,y) R^j_{\mu \nu \lambda }(x,y,0)\int _{z}z_\rho T_\sigma (z)\Big \rangle _U. \end{aligned}$$

It is worth noting that, unlike in the \((2+2)\) case, no VEV-subtraction is needed for the \((3+1)\) contribution, because the VEVs of the three-point function and of the one-point function vanish due to the charge conjugation symmetry of the QCD action. In later sections, we will call \((3+1)_{{\mathrm{light}}}\) and \((3+1)_{\mathrm{strange}}\) the sub-contributions with a light and a strange quark triangle respectively.

2.4 The \((2+1+1)\) contribution

We can derive a representation for \(f^{(2+1+1)}\) from the expression for the \((3+1)\) topology; the idea is to split the triangles appearing in the expression of the \((3+1)\) integrand into a sum of products of two- and one-point functions, and then correct the diagram double-counting. In doing so, the terms involving the disconnected quark loop T in Eq. (14) can be reused for the \((3+1)\) calculation, as we perform more self-averages for this noisy, more-disconnected quantity (see Sect. 3). Moreover, we apply a change of variables to avoid the case where a disconnected loop is located at the origin to increase the number of available samples per |y|.

More explicitly, we define the two quantities

$$\begin{aligned}&h_{\mu \nu \lambda }^{i}(x,y) = {\hat{\Pi }}^{i}_{\mu \lambda }(x,0)T_\nu (y),\nonumber \\&g_{\mu \nu \lambda }^{i}(x,y) = h^{i}_{\mu \nu \lambda }(x,y) + 2h^{i}_{\nu \mu \lambda }(y,x), \end{aligned}$$

and we write

$$\begin{aligned}&f^{(2+1+1)}(|y|)\nonumber \\&\quad = \frac{m_\mu e^6}{54}{\hat{Z}}_\mathrm{V}^4 \sum _{i\in u,d,s}Q_i^2 2\pi ^2 |y|^3 \nonumber \\&\qquad \times \Big \langle -\int _{x}{\mathcal {L}}^{\prime }_{[\rho ,\sigma ]\mu \nu \lambda }(y-x,y)T_\mu (x)\int _z (z_\rho - y_\rho ) h^{i}_{\sigma \lambda \nu }(z,y)\nonumber \\&\qquad + \int _{x} (x-y)_\rho \bar{{\mathcal {L}}}^{(\Lambda )}_{[\rho ,\sigma ]\lambda \nu \mu }(x-y,x) T_\mu (x)\int _{z} h^{i}_{\sigma \lambda \nu }(z,y)\nonumber \\&\qquad + \int _{x} {\mathcal {L}}^\prime _{[\rho ,\sigma ]\mu \nu \lambda }(x,y)T_\mu (x) \int _z z_\rho g^i_{\sigma \nu \lambda }(z,y) \nonumber \\&\qquad + \int _x \bar{{\mathcal {L}}}^{(\Lambda )}_{[\rho ,\sigma ]\lambda \nu \mu }(x,x-y)x_\rho T_\mu (x)\int _z g^i_{\sigma \nu \lambda }(z,y) \Big \rangle _U.\qquad \end{aligned}$$

2.5 The \((1+1+1+1)\) contribution

Here we finally give our parametrisation of the fully-disconnected \((1+1+1+1)\) contribution. This again takes advantage of the quantities computed in the previous cases. Here, one needs to carefully subtract the non-vanishing VEVs appearing in different pieces in this contribution. We define the following quantity:

$$\begin{aligned}&\langle T_\mu (x)T_\nu (y)T_\sigma (z)T_\lambda (0) \rangle _U^c \nonumber \\&\quad = +\langle T_\mu (x)T_\nu (y)T_\sigma (z)T_\lambda (0) \rangle _U \nonumber \\&\qquad -\langle T_\mu (x)T_\nu (y) \rangle _U \langle T_\sigma (z)T_\lambda (0) \rangle _U \nonumber \\&\qquad -\langle T_\mu (x)T_\sigma (z) \rangle _U \langle T_\nu (y) T_\lambda (0) \rangle _U \nonumber \\&\qquad -\langle T_\mu (x) T_\lambda (0) \rangle _U \langle T_\nu (y)T_\sigma (z) \rangle _U. \end{aligned}$$

With this definition in place, we can write down the expression we used for the integrand for this topology, after correcting the triple-counting of the diagrams,

$$\begin{aligned}&f^{(1+1+1+1)}(|y|) = - \frac{m_\mu e^6}{729} {\hat{Z}}_\mathrm{V}^4 2\pi ^2 |y|^3\nonumber \\&\quad \times \Big \langle \int _x \bar{{\mathcal {L}}}^{(\Lambda )}_{[\rho ,\sigma ]\lambda \nu \mu }(x,x-y)x_\rho T_\mu (x)\int _z T_{\nu }(y) T_{\lambda }(0) T_{\sigma }(z)\nonumber \\&\quad + \int _{x} {\mathcal {L}}^\prime _{[\rho ,\sigma ]\mu \nu \lambda }(x,y) T_\mu (x) \int _z z_\rho T_{\nu }(y) T_{\lambda }(0) T_{\sigma }(z)\Big \rangle _U^c.\nonumber \\ \end{aligned}$$
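The connected average \(\langle \cdot \rangle _U^c\) of four loop insertions used above can be sketched as follows. This is a toy illustration under stated assumptions: each argument is a hypothetical one-dimensional array holding one number per gauge configuration (a single fixed choice of positions and Lorentz indices), and the one-point VEVs are taken to vanish, so only the three pairings are subtracted.

```python
import numpy as np


def _avg(*fields):
    """Ensemble average of the configuration-by-configuration product."""
    return np.mean(np.prod(np.stack(fields), axis=0))


def connected_four_point(t_x, t_y, t_z, t_0):
    """<T(x)T(y)T(z)T(0)> minus the three products of two-point averages."""
    return (
        _avg(t_x, t_y, t_z, t_0)
        - _avg(t_x, t_y) * _avg(t_z, t_0)
        - _avg(t_x, t_z) * _avg(t_y, t_0)
        - _avg(t_x, t_0) * _avg(t_y, t_z)
    )


# Toy data: the same +-1 value at all four points on each 'configuration'.
toy = np.array([1.0, -1.0, 1.0, -1.0])
print(connected_four_point(toy, toy, toy, toy))
```

In the toy data the full four-point average is 1 and each of the three pairings also gives 1, so the connected part is 1 - 3 = -2.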

As a concluding remark for this section, in some of the provided expressions, terms with a z-integral without a z-dependent weight factor appear. These could be reduced and in some cases vanish in the infinite-volume limit due to the Ward-identity associated with current conservation. Such a modification would in general change the shape of the integrand, as well as its statistical variance. For definiteness, our lattice calculations are done precisely with the expressions given in this section.

3 Numerical setup

This section presents the gauge ensembles used in our calculation of \(a_\mu ^{\text {Hlbl}}\), as well as the different strategies we applied to compute the contributions of the different topology classes.

3.1 Ensemble details

In this work we use \(N_f=2+1\) O(a)-improved Wilson fermion ensembles generated by the CLS initiative [43], for which the improvement coefficient \(c_{SW}\) was determined non-perturbatively in [44]. We extend our previous work at the \(\text {SU}(3)_f\)-symmetric point to ensembles with \(m_l< m_s\) down to pion masses of \(200~\text {MeV}\), while maintaining \(\,\mathrm{Tr}[M]=\text {Constant}\), with \(M=\text {diag}(m_u, m_d, m_s)\) the quark mass matrix. In addition, we make use of two ensembles at a further, coarser lattice spacing. We combine the symmetric-point results of our previous determination [37] with measurements taken from nine other ensembles to create a large data set from which all sources of systematic error can be estimated. Table 1 summarises the gauge ensembles used, their pion and kaon masses, the lattice spacings, as well as the quark-mass dependent renormalisation factors, \({\hat{Z}}_V\). The latter are either measured directly as part of this work or taken from [45]; we use only un-improved local vector currents in this work (see footnote 1). The coverage of the lattice spacing and pion mass variables by the gauge ensembles used in this work is illustrated in Fig. 2.

Table 1 Details of the ensembles used to compute the various contributions to \(a_\mu ^{\text {Hlbl}}\). Lattice spacings were determined in [46], apart from the “A” ensembles, where the lattice spacing was estimated from ratios of the Wilson flow parameter \(t_0\) at the flavour-symmetric point. Pion and kaon masses primarily come from [18] unless directly measured as part of this work (indicated in bold) or in a recent project [47] (underlined). Likewise, values of \({\hat{Z}}_\mathrm{V}\) can be obtained from [45] unless also measured as part of this project, using the same approach. Columns two through six indicate the flavour content computed for each class of diagrams: fully connected (4), leading disconnected \((2+2)\), and subleading \((3+1)\), \((2+1+1)\), and \((1+1+1+1)\), where “\(+\)” has been omitted for space reasons. Zeros indicate diagrams that vanish by SU(3) flavour symmetry

3.2 Computational strategies and numerical cost

Table 2 lists the number of gauge configurations used for our study and the number of point-source propagator inversions per configuration performed for both the fully-connected and \((2+2)\) disconnected contributions. For the disconnected contribution, over an order of magnitude higher statistics was used than for the connected one. Typically we favour having larger multiplicities of |y| per configuration, as opposed to a larger number of configurations, for the disconnected piece, since this is more effective at reducing the noise. Ideally the number of self-averages per |y| ranges in the thousands per configuration; overall, the total statistics (configurations\(\times \)sources\(\times \)self-averages) lies in the low millions per point. We follow the same setup as in [37], building a grid of point sources in such a way as to maximise the number of self-averages available per |y|.

Although the total number of propagator solves per ensemble is comparable, the computational cost per solve grows significantly as the pion mass is reduced: the lattice volume is increased such that \(m_\pi L \ge 4\), and generically the cost of a solve grows like \(\frac{V^n}{m_\pi ^m}\) with m and n both being greater than unity. Although we used a particularly sophisticated propagator-solving routine [48], this prohibitive growth in cost is presently unavoidable.

Fig. 2

Ensembles used in this work, shown on the \((m_\pi ^2,a^2)\) plane. The colour depends smoothly on \(m_\pi L\), with darker colours corresponding to larger values, and the symbols have \(\lfloor m_\pi L \rfloor \) sides. The cross indicates the physical point

To partially ameliorate the overhead from propagator solves, a truncated solver/AMA technique [49, 50] was used for all of the \((2+2)\) contributions on ensembles away from the \(\text {SU}(3)_f\)-symmetric point, with a sloppy stopping criterion of \(10^{-3}\) on the norm of the residual. As the propagator solve cost was dominant in the \((2+2)\) calculation, a sloppy solve on one of our most expensive ensembles (D200) was approximately \(6\times \) faster than a high-precision solve to \(10^{-10}\).
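The generic structure of a truncated-solver (AMA-type) estimator can be sketched as below. The text does not spell out how the bias correction was formed in this calculation, so the function, its arguments and the toy numbers are assumptions for illustration only: many cheap measurements are averaged, and their bias is corrected with the difference between exact and sloppy measurements on a smaller subset of sources.

```python
import numpy as np


def ama_estimate(sloppy_all, sloppy_subset, exact_subset):
    """Average of sloppy measurements plus a bias correction from a subset."""
    correction = np.mean(np.asarray(exact_subset) - np.asarray(sloppy_subset))
    return np.mean(sloppy_all) + correction


# Toy numbers: 'sloppy' measurements differ from the 'exact' ones by a
# constant offset, which the correction term removes.
exact = np.array([1.0, 2.0, 3.0, 4.0])
sloppy = exact + 0.1
estimate = ama_estimate(sloppy, sloppy[:2], exact[:2])
print(estimate, exact.mean())
```

In this toy case the correction cancels the offset exactly. In a real calculation the correction fluctuates, and the balance between the number of sloppy and exact solves is chosen so that it does not dominate the total error.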

Table 2 Statistics gathered for the fully-connected and \((2+2)\) disconnected contributions. For the fully connected different hypercubically-equivalent orientations were used (hence the \(\times \)). \(N_{\text {conf}}\) indicates the number of gauge configurations used and \(N_{\text {solve}}\) indicates the number of propagator inversions performed per \(N_{\text {conf}}\). The \(\text {SU}(3)_f\)-symmetric point ensembles’ data was already used in [37], although an update for the \((2+2)\) on N202 has been performed here and the coarse ensemble A653 has been added

3.3 Higher-order contributions

For the quark loops containing a single (local) vector current insertion, we make use of an extensive general-purpose data set generated as part of a different project [47]. Therefore we restrict our description of the computational aspects related to these loops to those directly relevant to the Hlbl calculation. Since we are dealing exclusively with the electromagnetic current, it is always the difference of a light and a strange quark loop that is needed. To compute this difference, the “one-end trick”, which has been applied extensively in twisted-mass fermion calculations [51, 52], is used as proposed in Ref. [53]. The one-end trick yields an efficient estimator for the required difference of Wilson-quark loops based on the identity

$$\begin{aligned}&{\mathrm {tr}}\left[ \gamma _\mu (S^{l}(x,x) - S^{s}(x,x))\right] \nonumber \\&\quad = (m_{s}-m_l) \sum _y\mathrm {tr} \left[ \gamma _\mu S^{l}(x,y) S^{s}(y,x)\right] . \end{aligned}$$

The right-hand side of this equation is evaluated using stochastic volume sources, inserted between the two propagators, without spin or colour dilution. In this way, gauge noise is reached after a few hundred sources at most. The stochastic estimate of the quantity (19) is averaged over blocks of individual volume sources, leading to four “effective” sources that are stored separately as entire fields. Having access to four effective sources is sufficient to compute all higher-order disconnected diagrams for \(a_\mu ^{\text {Hlbl}}\) without introducing any bias into the final result. For further technical details of the general computational setup we refer to the description in Ref. [47].
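The algebra behind the one-end-trick identity can be checked on a small matrix example. The sketch assumes that the quark mass enters the Dirac operator additively, \(D + m\), so that \(S^l - S^s = S^l (D_s - D_l) S^s = (m_s-m_l)\, S^l S^s\). The matrix D below is a random stand-in rather than a Wilson-Dirac operator, and the masses and the matrix `gamma` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12               # toy dimension standing in for site x spin x colour
m_l, m_s = 0.5, 0.8  # hypothetical bare quark masses

# Random stand-in for the massless operator, shifted so that D + m is
# comfortably invertible in this toy example.
D = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)) + 10.0 * np.eye(n)
gamma = rng.normal(size=(n, n))  # stand-in for gamma_mu; any fixed matrix works

S_l = np.linalg.inv(D + m_l * np.eye(n))
S_s = np.linalg.inv(D + m_s * np.eye(n))

lhs = S_l - S_s                 # difference of the two propagators
rhs = (m_s - m_l) * S_l @ S_s   # product form used by the one-end trick

print(np.allclose(lhs, rhs))
print(np.allclose(np.trace(gamma @ lhs), np.trace(gamma @ rhs)))
```

The identity is exact, so the benefit lies entirely in the variance: the product form is estimated stochastically with the noise source placed between the two propagators, and the small prefactor \(m_s-m_l\) multiplies the stochastic noise.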

The parametrisations of Eqs. (14), (16), and (18) share certain x- or z-integrals, which allows us to precompute and recycle these terms for the different contributions. For the \((3+1)\) contribution, the triangle term defined in Eq. (13) and the terms derived from it can be conveniently obtained from the intermediate quantities in the calculation of the fully-connected contribution. Based on this observation, we choose for the \((3+1)\) topology the same set of points for our origin and y-vector as for the fully-connected. Once the factorised terms in Eq. (14) are computed, the Lorentz contraction with the terms which contain a disconnected loop can be performed off-line as a post-processing step.

For the \((2+1+1)\) contribution, we first compute and save the lattice-wide two-point functions, Eq. (9), for each source position, and then do the VEV subtraction and construct Eq. (16) again off-line. The sources are chosen to be the same set of points as for the fully-connected case. Nonetheless, after setting the origins at these source points, our parametrisation Eq. (16) still allows us to have many choices for the y-vector for a given |y|, because we have at our disposal the two-point function Eq. (9) and the disconnected loop Eq. (12) as entire lattice fields.

A good choice for the y-vectors is hence to pick from the elements on the same orbit under the cubic group. As an example, to obtain \((|y|/a)^2 = 12\), one can choose the 4-vector y to be (a, b, c, d) with \(a,b,c\in \{-1,1\}\) and \(d\in \{-3, 3\}\), if all these points fit in a range where boundary effects can be neglected. A summary of the choices of the y-vector for the ensembles used for the \((2+1+1)\) computation is given in Table 3. Likewise, the \((1+1+1+1)\) calculation also benefits from this strategy because of the reuse of the data generated for the \((2+1+1)\) integrand.
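The set of equivalent y-vectors in this example can be enumerated with a few lines of code. The sketch lists the 16 vectors described in the text and, for comparison, all images of (1, 1, 1, 3) under permutations and sign flips of the four coordinates. Whether the larger set is usable depends on the lattice geometry and boundary conditions, which the sketch does not model; the function name is illustrative.

```python
from itertools import permutations, product


def hypercubic_orbit(y):
    """Distinct images of y under coordinate permutations and sign flips."""
    orbit = set()
    for perm in permutations(y):
        for signs in product((-1, 1), repeat=4):
            orbit.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(orbit)


def norm_sq(y):
    """Squared length (|y|/a)^2 of an integer 4-vector."""
    return sum(c * c for c in y)


# Subset quoted in the text: a, b, c in {-1, 1} and d in {-3, 3}.
subset = [
    (a, b, c, d)
    for a, b, c in product((-1, 1), repeat=3)
    for d in (-3, 3)
]
orbit = hypercubic_orbit((1, 1, 1, 3))

print(len(subset), len(orbit))
```

Every vector in both lists has \((|y|/a)^2 = 12\), so averaging the integrand over them increases the statistics at fixed |y|.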

Table 3 Choice of the y-vectors and statistics for the \((2+1+1).\) Here, n is an integer

4 The integrand of the two dominant contributions

In this section, we describe the integrands of the light connected and light \((2+2)\) disconnected contributions obtained in our lattice QCD calculations. Our goal is on the one hand to present some of the available data at small pion masses, and on the other to compare it to the predictions of hadronic models, such as the \(\pi ^0\) exchange contribution. Finally, an observation on the approximate analytic form of the integrand for the latter contribution motivates the analysis of the lattice data presented in the next section.

Fig. 3

Left: The light connected contribution on the three most chiral ensembles. The solid curve represents the \(\pi ^0\) exchange in infinite volume, computed with the parameters directly determined on ensemble D200 [27]. Right: The light connected contribution on ensemble D200, compared to the predictions of the \(\pi ^0\) exchange (with a VMD transition form factor), the constituent quark loop, as well as the charged pion loop. The latter two contributions are computed within spinor and scalar QED, respectively

Fig. 4

The connected (left) and the \((2+2)\) disconnected (right) contributions on ensemble C101, compared to pseudoscalar-meson exchange contributions in infinite volume (continuous curves), as well as the \(\pi ^0\) exchange contribution in finite volume (blue points). The dashed curve shows an approximate representation of the infinite-volume \(\pi ^0\) exchange integrand by the function \(A|y|^3\exp (-B|y|)\), with \((A\,[\mathrm{fm}^{-4}],B\, [\mathrm{fm}^{-1}])=(840,\;2.34)\) in the connected and \((-582,\;2.27)\) in the disconnected case

We begin with the left panel of Fig. 3, showing an overview of the integrand of the light connected contribution for our three most chiral ensembles (C101, D450, D200), for which the pion mass lies in the interval 200 to 220 MeV. These three ensembles have different lattice spacings and different volumes; nevertheless, the corresponding data points fall within one recognisable band. The maxima of these integrands, which lie between 0.7 and 0.9 fm, are followed by a slow fall-off. Beyond \(|y|=2\) fm, the integrand vanishes within the uncertainties. The height of the maximum is about 20% higher than at the SU(3)-flavour symmetric point [37], \(m_\pi =m_K\approx 420\) MeV.

Figure 4 focuses on the data of ensemble C101. The connected and \((2+2)\) disconnected data are displayed separately in the two panels. The disconnected integrand is negative and has a minimum at \(|y|\approx 1.2\) fm. The signal degrades sooner than in the connected case, and is lost around 1.5 fm. The value at the minimum is about twice as large in magnitude as the one found on ensemble H101 at the SU(3)-flavour symmetric point [37], despite the fact that the latter case includes the strange quark, so that this contribution is weighted with the electric-charge factor 36/81 rather than 25/81. Thus we anticipate a very strong chiral dependence of the \((2+2)\) disconnected contribution to \(a_\mu ^{\text {Hlbl}}\).

Figure 4 also compares the integrand to pseudoscalar-exchange predictions based on the vector-meson dominance (VMD) parametrisation of the corresponding transition form factor. As weight factors with which \((\pi ^0,\eta ,\eta ')\) contribute to the connected diagrams, we have used (34/9, 0, 0); the weight factors we have applied for the disconnected diagrams are \((-25/9,1,1)\). While these weight factors are certainly the expected ones for the \(\pi ^0\), the issue of which weight factors are appropriate for the isoscalar mesons is more complicated and depends in particular on their mixing; see Table 10 and the corresponding analysis presented in the appendix, as well as Refs. [54, 55]. For the \(\pi ^0\) exchange, the contribution has also been computed in finite volume. As can be seen in the left panel, the finite-volume connected integrand is predicted to dive towards negative values at long distances. Whether the lattice data do the same is uncertain due to the growing statistical errors. The lattice data points lie below the \(\pi ^0\)-exchange prediction. A very similar observation was made at the SU(3)-flavour symmetric point [37]. We do not have a clear explanation for the difference, but note that for ensemble D200, we observe a somewhat better agreement (see the right panel of Fig. 3 discussed below). For the disconnected contribution, the finite-size effects on the integrand are predicted to become significant only around \(|y|=2\) fm, which is beyond the useful range of our lattice data. The \(\eta \) and \(\eta '\) contributions have been estimated very roughly by using the parameters indicated in the figure. The \(\eta \) mass estimate comes from using the Gell-Mann–Okubo formula, knowing the pion and kaon masses, while the same \(\eta '\) parameters are used as in [37]. The two isoscalar mesons contribute comparably to \(a_\mu ^{\text {Hlbl}}\). In the region between 0.8 and 1.2 fm, the \(\pi ^0\)-exchange prediction is consistent with the lattice data.
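As a quick arithmetic check on the weight factors quoted above (a remark added for orientation, assuming that the remaining topologies receive no pseudoscalar-pole contribution at leading order), the connected and \((2+2)\) disconnected weights add up to unity for each meson, so that the sum of the two topologies corresponds to the full exchange contribution:

$$\begin{aligned} \left( \tfrac{34}{9},\,0,\,0\right) + \left( -\tfrac{25}{9},\,1,\,1\right) = (1,\,1,\,1). \end{aligned}$$

For the \(\pi ^0\), each topology on its own therefore carries a weight roughly three to four times larger in magnitude than the total, which is one way to see why a strong cancellation between the two is expected.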

While the disconnected contribution does not have a strong short-distance contribution, the connected contribution does. Following [37], we attempt to explain the integrand semi-quantitatively by combining a constituent quark loop with the long-distance contributions, i.e. the \(\pi ^0\) exchange and the charged pion loop. The right panel of Fig. 3 illustrates the comparison of this rough hadronic model with the lattice data. The quark loop as well as the pion loop are calculated in the spinor and scalar QED frameworks respectively, i.e. without the inclusion of form factors. Including the quark loop leads to a satisfactory description of the shape of the integrand, even though the total prediction overshoots the data at distances \(|y|\lesssim 0.6\,\)fm. This difference can partly be explained by the cutoff effects present in the data, which tend to reduce the lattice integrand, and it is likely that including a form factor for the constituent quarks would improve the agreement. At distances \(|y|\gtrsim 0.9\) fm, the model prediction is consistent with the lattice data.

In summary, both in the connected and the disconnected case, the prediction for the \(\pi ^0\) exchange alone gives a good first estimate of the magnitude of the integrand. It also predicts the approximate shape of the integrand in the disconnected case. Hence it is worth asking whether the integrand for the \(\pi ^0\) exchange can be approximated by a simple analytic function. Figure 4 shows that the infinite-volume \(\pi ^0\)-exchange integrand can be approximated very well at its extremum and beyond with the form \(f(|y|) = A|y|^3 \exp (-B|y|)\), displayed as a dashed line. In the connected case, the approximation holds to an excellent degree even at short distances. These observations, which apply to our specific choice of kernel \(\mathcal {{{\bar{L}}}}^{(\Lambda )}\), form part of our motivation to use this functional form in the next section to extend the integrand obtained in lattice QCD to long distances.
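Two elementary properties of the form \(f(|y|) = A|y|^3 \exp (-B|y|)\) are useful when reading Fig. 4: its extremum lies at \(|y|=3/B\), and its integral over \(|y|\) from zero to infinity is \(6A/B^4\). The following minimal sketch evaluates both for the \((A,B)\) values quoted in the caption of Fig. 4; the function names are illustrative choices, and the overall normalisation of the integrand is whatever is used in that figure.

```python
import math

def ansatz(y, A, B):
    """Model integrand f(|y|) = A |y|^3 exp(-B |y|), with |y| in fm."""
    return A * y**3 * math.exp(-B * y)

def extremum_position(B):
    """Setting df/d|y| = 0 gives |y| = 3/B."""
    return 3.0 / B

def full_integral(A, B):
    """Integral over |y| from 0 to infinity: A * Gamma(4) / B^4 = 6 A / B^4."""
    return 6.0 * A / B**4

# (A [fm^-4], B [fm^-1]) as quoted in the caption of Fig. 4 (ensemble C101)
params = {"connected": (840.0, 2.34), "disconnected": (-582.0, 2.27)}

for label, (A, B) in params.items():
    y_ext = extremum_position(B)
    print(f"{label}: extremum at |y| = {y_ext:.2f} fm, "
          f"f = {ansatz(y_ext, A, B):.1f} fm^-1, "
          f"integral = {full_integral(A, B):.1f}")
```

Because the two \(B\) values are close, the connected and disconnected model curves peak at nearly the same distance, and their integrals enter with opposite signs.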

5 Light-quark fully-connected and \((2+2)\) contributions

In this section, we describe the extraction of the dominant contributions, namely the light-quark fully-connected and (2+2) contributions. In the previous section, the integrands are illustrated and compared semi-quantitatively to the main known hadronic contributions. The rapid increase of the relative error on the integrand with growing |y| leads us to employ a method to extend the useful range of the data. In our previous calculation [37], the long-distance tail was assumed to come from \(\pi ^0\), \(\eta \), and \(\eta '\)-exchange contributions, with the dominant \(\pi ^0+\eta \) part determined from a lattice calculation of the \(\pi ^0\gamma \gamma \) transition form factor [27]. The fact that the integrand of the \(\pi ^0\) exchange itself is well described by a simple functional form has led us to adopt a more self-contained and data-driven approach, which relies on extending the tail via a fit to the data. In both the connected and \((2+2)\) contributions, we perform a fully-correlated fit to the data from each ensemble using the ansatz

$$\begin{aligned} f(|y|) = |y|^3 Ae^{-B|y|}, \end{aligned}$$

where A and B are free parameters. In the intermediate |y| regime, this fit form describes all of our data well, with \(\chi ^2/\text {dof}\) close to 1. As our data become noisy at large |y|, the fit significantly reduces the error for the integral of the long-distance tail. In our approach, we will choose a cutoff distance: below it, we will numerically integrate the lattice data using the trapezoid rule; above it, we will switch to integrating Eq. (20). The cutoff is chosen to be a point where the integrated \(a_\mu \) exhibits stability. The values of \(a_\mu \) from all ensembles will then be used in a combined chiral, infinite-volume, and continuum extrapolation. In particular, while in [37] the volume effects were corrected for on each ensemble using the prediction for the \(\pi ^0+\eta \) exchange, here we rely on a global fit to all ensembles to eliminate these effects, with an ansatz for the L-dependence motivated by the same meson exchange.
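The split of the \(|y|\)-integral at a cutoff can be sketched as follows. This is a minimal illustration rather than the analysis code: the "data" are synthetic samples of the ansatz itself, the parameter values and grid spacing are hypothetical, and statistical errors and correlations are ignored. The tail uses the closed-form integral \(\int _{c}^{\infty } A\,y^3 e^{-By}\,dy = A\,e^{-Bc}\,(c^3/B + 3c^2/B^2 + 6c/B^3 + 6/B^4)\).

```python
import math

def tail_integral(A, B, y_cut):
    """Closed-form integral of A y^3 exp(-B y) from y_cut to infinity."""
    poly = (y_cut**3 / B + 3 * y_cut**2 / B**2
            + 6 * y_cut / B**3 + 6 / B**4)
    return A * math.exp(-B * y_cut) * poly

def trapezoid(ys, fs):
    """Trapezoid-rule integral of samples fs taken at the points ys."""
    return sum(0.5 * (fs[i] + fs[i + 1]) * (ys[i + 1] - ys[i])
               for i in range(len(ys) - 1))

def integrate_with_tail(ys, fs, A, B, i_cut):
    """Sampled data up to ys[i_cut], fitted ansatz beyond it."""
    below = trapezoid(ys[:i_cut + 1], fs[:i_cut + 1])
    return below + tail_integral(A, B, ys[i_cut])

# Synthetic stand-in for lattice data (hypothetical values)
A, B, step = 840.0, 2.34, 0.05
ys = [step * i for i in range(61)]              # 0 to 3 fm
fs = [A * y**3 * math.exp(-B * y) for y in ys]

# Stability scan: the total should barely move as the cutoff is varied
for i_cut in (20, 30, 40):
    total = integrate_with_tail(ys, fs, A, B, i_cut)
    print(f"cutoff {ys[i_cut]:.2f} fm -> integral {total:.2f}")
```

With real data the scan over the cutoff is the relevant diagnostic: the cutoff is taken where the total stops depending on it within errors.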

Fig. 5

An example of our fit ansatz Eq. (20) for the fully-connected contribution from ensemble N202

Figure 5 shows an example of our ability to describe the lattice data with Eq. (20) for the fully-connected contribution. The displayed data [37], corresponding to ensemble N202, are among the most precise at the \(\text {SU}(3)_f\)-symmetric point, and the correlated fit has a \(\chi ^2/\mathrm{dof}\) of 1.1. The figure illustrates that the data are very well described by our ansatz all the way to the point where they fluctuate around zero and the signal is likely lost.

Table 4 The two leading light-quark contributions to \(a_\mu ^{\text {Hlbl}}\) for each gauge ensemble

Table 4 summarises our results for the two leading sets of diagrams containing only light quarks. For the extrapolation to the infinite-volume, physical pion mass, and continuum limit for both the fully-connected and \((2+2)\) disconnected contributions we use the following ansatz (see footnote 2),

$$\begin{aligned}&a_\mu (m_\pi ^2,m_\pi L,a^2)\nonumber \\&\quad = A\, e^{-m_\pi L/2} + B\, a^2 + C\; S(m_\pi ^2) + D + E\, m_\pi ^2, \end{aligned}$$

where we have identified several candidates for the non-analytic function \(S(m_\pi ^2)\),

$$\begin{aligned}&\text {Pole }{::} \ \frac{1}{m_\pi ^2}\nonumber \\&\text {Log }{::} \ \log {m_\pi ^2} \nonumber \\&\text {Log2 } {::} \ \log ^2{\left( m_\pi ^2\right) } \nonumber \\&\text {m2Log }{::} \ m_\pi ^2 \log {\left( m_\pi ^2\right) }. \end{aligned}$$

These functions are inspired by the divergent chiral behaviour in the large-\(N_c\) limit of the pion-pole exchange and the charged-pion loop contribution [56].
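Since the extrapolation ansatz is linear in the coefficients \(A,\dots ,E\) once a candidate \(S(m_\pi ^2)\) has been chosen, it can be written as a dot product of coefficients with basis functions. The sketch below illustrates this structure only; the coefficient values are hypothetical placeholders, and expressing \(m_\pi ^2\) in GeV\(^2\) inside the logarithms is an assumption.

```python
import math

# Candidate non-analytic terms S(m_pi^2); m_pi^2 in GeV^2 (assumed units)
S_CANDIDATES = {
    "Pole":  lambda m2: 1.0 / m2,
    "Log":   lambda m2: math.log(m2),
    "Log2":  lambda m2: math.log(m2) ** 2,
    "m2Log": lambda m2: m2 * math.log(m2),
}

def basis(m2, mL, a2, s_name):
    """Basis functions multiplying the coefficients (A, B, C, D, E)."""
    return [math.exp(-mL / 2.0), a2, S_CANDIDATES[s_name](m2), 1.0, m2]

def a_mu_model(m2, mL, a2, coeffs, s_name):
    """A e^{-m_pi L/2} + B a^2 + C S(m_pi^2) + D + E m_pi^2."""
    return sum(c * b for c, b in zip(coeffs, basis(m2, mL, a2, s_name)))

# Hypothetical coefficients, for illustration only (not fit results)
coeffs = (-50.0, -100.0, 1.0, 80.0, -150.0)
m2_phys = 0.135 ** 2   # approximate physical pion mass squared in GeV^2

for name in S_CANDIDATES:
    # Infinite volume (m_pi L -> infinity) and continuum (a^2 -> 0)
    value = a_mu_model(m2_phys, math.inf, 0.0, coeffs, name)
    print(f"{name}: {value:.2f}")
```

In the infinite-volume, continuum limit only the terms \(C\,S(m_\pi ^2) + D + E\,m_\pi ^2\) survive, which is what the loop evaluates.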

Fig. 6

Left: Partially-integrated light-quark connected contribution to \(a_\mu ^{\text {Hlbl}}\) versus \(|y|_{\text {Max.}}/a\) for ensembles C101, H105, U102, and U103, which have a broad range of pion masses but the same lattice spacing and similar \(m_\pi L\). The points are the numerically integrated lattice data and the curves result from switching the integrand to the fit of Eq. (20) above a cutoff. Right: Chiral, continuum, and infinite-volume extrapolation of the light-quark connected contribution using the Pole ansatz, shown versus \(m_\pi ^2\). The points are lattice data at finite L and nonzero a, and the cross indicates the extrapolated result at physical pion mass. Curves show the dependence on \(m_\pi ^2\) for fixed a and \(m_\pi L\), with the black curve corresponding to the continuum and infinite volume. The four red curves have different values of \(m_\pi L\) but the same lattice spacing corresponding to \(\beta =3.40\); three of them correspond to \((a,m_\pi L)\) of ensembles H101, C101, and H105

Fig. 7

Light-quark \((2+2)\) disconnected contribution to \(a_\mu ^{\text {Hlbl}}\). See the caption of Fig. 6

5.1 Light-quark fully-connected results

The left plot of Fig. 6 uses the partially-integrated \(a_\mu (|y|_\text {Max.})\) defined in Eq. (5) to illustrate the growth in the size of the connected contribution with decreasing pion mass. Here we consider a constant lattice spacing (\(a=0.0864\) fm) and include data from similar \(m_\pi L\) to help isolate the chiral behaviour. The curves begin at the cutoff where we switch to integrating the fitted Eq. (20), with the trapezoidal-rule integrals of the lattice data up to the cutoff added to them. The fit adequately reproduces the lattice data above the cutoff, and saturates where the trapezoidal-rule integral does, within the uncertainties. At large \(|y|_\text {Max.}\), some of the lattice points begin to drop below the asymptote of the fit; this is probably a mixture of finite-volume effects and loss of signal in the integrand.

The right plot of Fig. 6 shows a combined chiral, infinite-volume, and continuum limit extrapolation based on the “Pole” ansatz. The horizontal axis is \(m_\pi ^2\), and we illustrate the dependence of the global fit on a and \(m_\pi L\) via curves that show the fit at various fixed \((a,m_\pi L)\). The result increases along all three dimensions of the extrapolation (larger volumes, finer lattice spacings, and lighter pion masses), which produces a large combined effect.

5.2 Leading light-quark disconnected results

The \((2+2)\) disconnected analogue of Fig. 6 can be found in Fig. 7. It is clear that much like the connected data, the size of the contribution increases with decreasing pion mass and so a very significant cancellation will occur at the physical pion mass between these two contributions with opposite signs. This cancellation was predicted in Ref. [54] on the basis of the \(\pi ^0\) exchange contribution and is illustrated in Fig. 4. It is also worth noting that the statistical precision of the disconnected data is significantly worse than that of the connected data, even though almost an order of magnitude more measurements were performed.

On the right-hand side of Fig. 7, we show the chiral-continuum-infinite-volume extrapolation with different \(m_\pi L\) at fixed lattice spacing. Much like in our previous work [37], we find the lattice-spacing dependence to have a slope of the same sign as the connected contribution. It is also evident that an accidental partial cancellation occurs between the finite-volume and lattice-spacing terms, with the approach to the infinite volume making the \((2+2)\) contribution more negative and the approach to the continuum limit making it less negative.

5.3 Combined light-quark results

Due to the significant cancellation between the connected and the \((2+2)\) contribution, we find it useful to take the ensemble-by-ensemble sum of the contributions and then perform the extrapolation. For this sum, our data cannot resolve any of the terms non-analytic in \(m_\pi ^2\) of Eq. (22), and it appears that these contributions largely cancel. We find that the fit ansatz

$$\begin{aligned}&a_\mu (m_\pi ^2,m_\pi L,a^2)\nonumber \\&\quad = a_\mu (0,\infty ,0)(1+Am_\pi ^2 + Be^{-m_\pi L/2} + Ca^2), \end{aligned}$$

describes our data very well. This ansatz assumes that any potential singular terms in our data cancel to a large extent, an assumption that we address along with the discussion of systematics in Sect. 8. Here we simply note that, in addition to the cancellation between the connected and the \((2+2)\) contribution, the chirally singular behaviour expected in \(a_\mu ^{\mathrm{Conn}+ (2+2)}\) from the \(\pi ^0\) exchange and the charged pion loop is numerically suppressed over the pion-mass interval 135–200 MeV, due to a partial cancellation between these two long-distance contributions.

Fig. 8

Chiral-continuum-infinite-volume extrapolation of the sum of the light-quark, fully-connected and \((2+2)\) contributions to \(a_\mu ^{\text {Hlbl}}\). See the caption of Fig. 6 (right panel)

Fig. 9

Left: Consistency of the infinite-volume estimate for the light connected contribution on ensemble C101 between the analysis performed here (horizontal band) and the analysis method of Ref. [37] (blue points), which corrects the lattice data (black points) using the \(\pi ^0\) exchange prediction. The horizontal band is obtained by adding the finite-size correction from the global fit displayed in Fig. 6 (right panel) to the \(a_\mu ^{\mathrm{(Conn)}}\) value obtained on ensemble C101 using the tail extension parametrisation Eq. (20). Note that the band does not include the systematic uncertainty of varying the fit ansatz in the global fit, which is addressed in Sect. 8. Right: A study of the matching point where we switch to integrating the fit parameters for the ensemble C101

Figure 8 shows an extrapolation for the sum of the light-quark, fully-connected and \((2+2)\) disconnected contributions. It appears in the plot that no chiral curvature is present in this combination and the error grows at lighter pion masses; this is due to the large cancellation between the connected and disconnected contributions. Considering the final column in Table 4, we do not appear to benefit from a cancellation of statistical errors due to correlations between the two measurements. It is also clear that the approach to the infinite volume is less severe in the combination of these two quantities compared to fitting them individually; this is likely due to large cancellations in the long-distance contributions such as the pion pole. We still see significant discretisation effects in this fit, but fortunately we have several lattice spacings to constrain this behaviour; nevertheless, this forms our largest systematic uncertainty, as discussed in Sect. 8.

5.4 Consistency checks

It is useful to compare the present analysis to our previous work at the \(\text {SU}(3)_f\)-symmetric point [37]. In that work, we combined light and strange contributions and obtained \(98.9(2.5)\times 10^{-11}\) for the connected contribution in the infinite-volume and continuum limit, and \(-33.5(4.2)\times 10^{-11}\) for the disconnected. Combining these with charge factors adjusted to isolate the (ud) quark contribution yields \(70.1(3.8)\times 10^{-11}\). The extrapolation in Fig. 8 at \(m_\pi ^2 \approx 0.173 \text { GeV}^2\) is \(72.5(4.3)\times 10^{-11}\), so these results are in good agreement, even though the underlying methodology is considerably different.
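The charge-factor adjustment can be reproduced with a few lines of arithmetic. The sketch below assumes that the connected contribution scales with \(\sum _f Q_f^4\) and the \((2+2)\) contribution with \((\sum _f Q_f^2)^2\), which at the \(\text {SU}(3)_f\)-symmetric point (degenerate quark masses) gives rescaling factors of 17/18 and 25/36 respectively, and that the two errors are uncorrelated and can be added in quadrature. Under these assumptions it reproduces the quoted \(70.1(3.8)\times 10^{-11}\).

```python
from fractions import Fraction
import math

# Quark electric charges in units of e
Q = {"u": Fraction(2, 3), "d": Fraction(-1, 3), "s": Fraction(-1, 3)}

def connected_factor(flavours):
    """Fully-connected diagrams: sum over flavours of Q_f^4."""
    return sum(Q[f] ** 4 for f in flavours)

def disconnected_factor(flavours):
    """(2+2) diagrams: square of the sum over flavours of Q_f^2."""
    return sum(Q[f] ** 2 for f in flavours) ** 2

# Ratios that isolate the (ud) part when all quark masses are degenerate
r_conn = connected_factor("ud") / connected_factor("uds")         # 17/18
r_disc = disconnected_factor("ud") / disconnected_factor("uds")   # 25/36

# Values quoted in the text, in units of 10^-11
conn, conn_err = 98.9, 2.5
disc, disc_err = -33.5, 4.2

total = float(r_conn) * conn + float(r_disc) * disc
err = math.hypot(float(r_conn) * conn_err, float(r_disc) * disc_err)
print(f"(ud) estimate: {total:.1f} +/- {err:.1f}")   # prints 70.1 +/- 3.8
```

The factor 25/36 is the same ratio of 25/81 to 36/81 that appears in the discussion of the disconnected integrand on ensemble H101.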

The left plot of Fig. 9 illustrates the consistency between our previous ‘tail and finite-size correction’ methodology (blue and black points have been interpolated) and the result of our global fit with the Pole ansatz. The two are consistent within the combined statistical and systematic error of the blue data points, although the black line does lie a bit lower than the central value. In the C101 data, there is a strong upward fluctuation (visible in Fig. 4) that pushes both the black and the blue points up and likely hides a stable plateau region. The right plot of Fig. 9 shows an investigation of the point where we switch to integrating the fitted integrand (for a fixed fit range) rather than the lattice data for the ensemble C101; the leftmost point of this plot is the result we quote in Table 4. There is good consistency between these points and we consider any systematic due to this variation to be sub-dominant compared to other systematics, in particular from discretisation effects.

Table 5 Fully-connected and the combined \((2+2)\) ls and ss contributions to \(a_\mu ^{\text {Hlbl}}\)
Fig. 10

Strange contributions for ensemble C101. Left: Strange and light connected integrands, in lattice units. The strange integrand has been multiplied by 35 for visibility. Right: Partially integrated \((2+2)\) contributions with three different flavour combinations

6 Strange contributions

For the fully-connected strange, the \((2+2)\)-light-strange (ls) and the \((2+2)\)-strange-strange (ss) contributions, we use results from a subset of the ensembles (listed in Table 5) to cut down on cost for what turns out to be a very small contribution to the overall \(a_\mu ^{\text {Hlbl}}\). Here we can reuse the results from the symmetric point with the appropriate charge factors.

The left plot of Fig. 10 illustrates the magnitude of light and strange quark contributions to the fully-connected diagrams for ensemble C101, which has a light pion mass. The light-quark contribution is much longer ranged than the strange and statistically far noisier. The peak of the strange integrand for this ensemble is about 35 times smaller than the light one, and the overall integrated contribution is calculated to be about 55 times smaller; as one approaches the physical pion mass this difference will only grow.

The right plot of Fig. 10 illustrates the size of the contributions from different flavour combinations within the \((2+2)\) calculation: light-light (ll), light-strange (ls), and strange-strange (ss). We again show ensemble C101 and use the same statistics for all flavours. For this ensemble the ls contribution is a bit larger than \(10\%\) of the ll, and the ss contribution is about \(0.6\%\). It can be seen that the integrated \(a_\mu \) plateaus earlier for heavier quark content and the statistical precision is better too. As the majority of the data in this analysis comes from the previous \(\text {SU}(3)_f\)-symmetric work, the same conclusions apply; finite-volume effects and cut-off effects are still sizeable even for the contributions including strange quarks.

We choose to extrapolate the sum of all the strange and light-strange contributions to the infinite-volume, physical quark mass, continuum limit using the ansatz

$$\begin{aligned}&a_\mu (m_K^2,m_\pi L,a^2)\nonumber \\&\quad = a_\mu (0,\infty ,0)(1+Am_K^2 + Be^{-m_\pi L} + Ca^2). \end{aligned}$$

It is worth noting that the argument of the exponential volume factor here is \(m_\pi L\), rather than the \(m_\pi L/2\) used for the light-quark contribution, as there is no \(\pi ^0\) exchange.

A plot of this extrapolation can be found in Fig. 11. The fit of Eq. (24) gives a \(\chi ^2/\text {dof} = 0.6\). Again, we see the \((2+2)\) contribution approaching the continuum limit with the same sign as the fully-connected contribution; in the continuum, these two contributions effectively cancel. Our final result at the physical point is

$$\begin{aligned} a_\mu ^{(\text {Conn.} + (2+2))\text {-}s} = -0.6(2.0)\times 10^{-11}. \end{aligned}$$
Fig. 11

An extrapolation of the sum of the fully-connected strange and \((2+2)\) ls and ss contributions to \(a_\mu ^{\text {Hlbl}}\)

7 Higher-order contributions

The remaining three topologies, \((3+1)\), \((2+1+1)\), and \((1+1+1+1)\), contain one, two, and four disconnected loops, respectively. Empirically, diagrams containing a disconnected loop with a vector current have been found to be suppressed in QCD correlation functions [27, 57,58,59]. Furthermore, these loops vanish at the SU\((3)_f\)-symmetric point where light and strange quarks are degenerate. Finally, diagrams containing more loops are suppressed at large \(N_c\). These considerations lead to the expectation that the \((2+1+1)\) and \((1+1+1+1)\) topologies are suppressed relative to \((3+1)\), which is itself suppressed relative to the two leading topologies.

Our goal is thus to compute the \((3+1)\) class of diagrams as well as we can, and provide evidence that the \((2+1+1)\) and \((1+1+1+1)\) are small enough to be neglected from our targeted error budget. In particular, we will give details about how we treat the |y|-integration of our \((3+1)\) data.

7.1 The \((3+1)\) contribution

The charge factor of \((3+1)_{\mathrm{strange}}\) is \(-1/7\) of that of \((3+1)_{\mathrm{light}}\). (Recall that here the flavour label corresponds to the triangle and that for the disconnected loop, we always use the combined light and strange contributions.) Furthermore, because of the larger mass of the strange quark, the \((3+1)_{\mathrm{strange}}\) contribution is expected to be much smaller compared to \((3+1)_{\mathrm{light}}\). We will thus put our main effort on the \((3+1)_{\mathrm{light}}\) contribution. Numerical evidence of the smallness of the \((3+1)_{\mathrm{strange}}\) contribution is given in Sect. 7.1.3; it turns out that this quantity is at least ten times smaller than the contribution with light triangle.
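The relative charge factor can be checked directly, assuming that each of the three vertices on the triangle contributes one power of the quark charge and that the disconnected loop is common to both cases:

$$\begin{aligned} \frac{Q_s^3}{Q_u^3+Q_d^3} = \frac{(-1/3)^3}{(2/3)^3+(-1/3)^3} = \frac{-1/27}{7/27} = -\frac{1}{7}. \end{aligned}$$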

Fig. 12

Determination of the \((3+1)_{\mathrm{light}}\) contribution to \(a_\mu ^{\text {Hlbl}}\) on ensemble C101 using the tail treatment procedure. Horizontal offsets are applied for visibility. One can see that the total error is minimised for \(|y|_{\mathrm{cut}}\) between 1.2 and 1.7 fm

7.1.1 Treatment of the tail of the \((3+1)_{\mathrm{light}}\) integrand

At large distances, the physics is dictated by the lightest particles, i.e., the pseudoscalar mesons. The most relevant contributions are neutral pseudoscalar-meson poles and charged pseudoscalar-meson loops ([2] and the references therein). The computation based on Partially-Quenched Chiral Perturbation Theory (PQChPT) in Appendix A shows that there is no contribution at leading order coming from pseudoscalar-meson poles. Nevertheless, the \((3+1)_{\mathrm{light}}\) receives contributions from pseudoscalar-meson loops (cf. Fig. 20 and Table 9). As the signal of the integrand degrades rapidly with increasing |y|, we decide to use the knowledge from light pseudoscalar contributions in infinite volume as a guideline for cutting the integral at some \(|y|=|y|_{\mathrm{cut}}\).

The procedure is as follows. We split the |y|-integral for \(a_\mu ^{(3+1)}\) into two intervals, below and above \(|y|_{\mathrm{cut}}\), so that \(a_\mu ^{(3+1)}=a_\mu ^<+a_\mu ^>\). Below \(|y|_{\mathrm{cut}}\), we numerically integrate the lattice data to obtain \(a_\mu ^<\). Above \(|y|_{\mathrm{cut}}\), we take the central value of \(a_\mu ^>\) to be zero and assign an uncertainty to this omitted tail contribution based on a calculation of the charged pseudoscalar-meson loop contribution in scalar QED. Finally, we add the two uncertainties (statistical for \(a_\mu ^<\) and systematic for \(a_\mu ^>\)) in quadrature.

For the tail uncertainty, we compute the integrand in infinite-volume scalar QED and integrate from \(|y|_{\mathrm{cut}}\) to infinity, which gives an order-of-magnitude estimate of the possible missing mesonic contributions in the tail. We then assign a very conservative systematic error, namely \(w_{\mathrm{sys.}}=120\%\) of this contribution. (This choice of \(w_{\mathrm{sys.}}\) will be justified in the next subsection.) As scalar QED corresponds to pointlike photon-pseudoscalar-pseudoscalar vertices, which tend to overestimate the pseudoscalar-meson loop contributions to \(a_\mu \), we assume that the assigned systematic error also covers the possible finite-size effects in the \(|y|<|y|_{\mathrm{cut}}\) region. Finally, we determine \(|y|_{\mathrm{cut}}\) for each lattice ensemble by finding the value that minimises the total error. An example of this procedure is shown in Fig. 12.
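The selection of \(|y|_{\mathrm{cut}}\) described above can be sketched as a scan over candidate cuts. All numbers below are hypothetical and merely mimic the qualitative behaviour (a statistical error on \(a_\mu ^<\) that grows with the cut and a scalar-QED tail estimate that shrinks with it); the function name and signature are illustrative.

```python
import math

def choose_cut(y_cuts, stat_errs, tail_estimates, w_sys=1.2):
    """Return (y_cut, total_error) minimising sqrt(stat^2 + (w_sys*tail)^2)."""
    best = None
    for y, stat, tail in zip(y_cuts, stat_errs, tail_estimates):
        total = math.hypot(stat, w_sys * abs(tail))
        if best is None or total < best[1]:
            best = (y, total)
    return best

# Hypothetical inputs, arbitrary units for the errors
y_cuts = [0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]                 # fm
stat_errs = [0.10, 0.15, 0.22, 0.32, 0.47, 0.70, 1.05]        # error of a_mu^<
tail_estimates = [0.90, 0.60, 0.40, 0.26, 0.17, 0.11, 0.07]   # scalar-QED tail

for w in (1.2, 2.0):
    y_best, err_best = choose_cut(y_cuts, stat_errs, tail_estimates, w_sys=w)
    print(f"w_sys = {w:.0%}: cut at {y_best} fm, total error {err_best:.2f}")
```

In this toy example the larger weight moves the optimal cut to a larger distance, since a heavier penalty on the omitted tail makes it worthwhile to integrate noisier data further out.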

7.1.2 Numerical results for the \((3+1)_{\mathrm{light}}\) contribution

Table 6 Results for \((3+1)_{\mathrm{light}}\) on each ensemble (using \(w_{\mathrm{sys.}}=120\%\)), along with the choice of \(|y|_{\mathrm{cut}}\) and the contributions to \(a_\mu \) below and above the cut
Fig. 13

Left: Extrapolation of the \((3+1)_{\mathrm{light}}\) contribution to \(a_\mu ^{\text {Hlbl}}\), determined using \(w_{\mathrm{sys.}}=120\%\). The points show the results from each ensemble and the vertical green line indicates physical meson masses. The orange dashed lines show the extrapolation to the physical point with Eq. (26), excluding the coarsest ensemble A654. Right: Fit results of Eq. (26) after applying cuts to the data, with two choices of \(w_{\mathrm{sys.}}\)

Table 6 shows our choice of \(|y|_{\mathrm{cut}}\) and the value of \(a_\mu ^{(3+1)}\) computed on each ensemble, with \(w_{\mathrm{sys.}}=120\%\); these results are also plotted in Fig. 13 (left). It is already clear from this plot that there is no distinguishable O(a)-dependence in the data at our level of precision. The same is true of volume effects. This leads us to parameterise our data in a very simple form, namely

$$\begin{aligned} a_\mu ^{\text {(3+1)-}l} = A (m_K^2 - m_\pi ^2). \end{aligned}$$

Such a mass-dependence is motivated by the fact that this contribution must vanish at the \(\text {SU}(3)_f\)-symmetric point. This fit describes our data well and we investigate the stability of our final fit result through applying several cuts in our data, as is shown in Fig. 13 (right). Also shown in the same figure are the results obtained by applying the same procedure with a different choice of the weight, \(w_{\mathrm{sys.}}=200\%\), for the estimate of the systematic error of the omitted tail contribution. Note that a bigger value of \(w_{\mathrm{sys.}}\) implies that one cuts the lattice data at larger |y|. As the lattice data become noisier with increasing |y|, the fluctuations of the central value determined from the lattice data also become larger, especially for our ensembles with lighter pion mass. However, from the consistency between fits with different cuts in the data, it seems that our choice of \(w_{\mathrm{sys.}}=120 \%\) is reasonable enough without being too conservative. For our final determination, including our fit-systematic, we choose to quote the determination excluding the coarsest ensemble A654 (\(a^2<0.2~\text {GeV}^{-2}\)):
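A one-parameter fit of the form \(A\,(m_K^2-m_\pi ^2)\) has a closed-form weighted least-squares solution. The sketch below assumes uncorrelated errors between ensembles, uses purely hypothetical data points, and takes approximate physical meson masses of 495 MeV and 135 MeV as an illustrative choice; none of these are taken from the analysis itself.

```python
def fit_su3_breaking(points):
    """Weighted least squares for a = A (m_K^2 - m_pi^2).

    points: iterable of (m_K^2, m_pi^2, value, error), errors uncorrelated.
    Returns (A, sigma_A).
    """
    sxx = sxy = 0.0
    for mk2, mpi2, val, err in points:
        x = mk2 - mpi2
        w = 1.0 / err**2
        sxx += w * x * x
        sxy += w * x * val
    return sxy / sxx, sxx ** -0.5

# Hypothetical ensemble results: (m_K^2, m_pi^2) in GeV^2, values in 10^-11
points = [
    (0.221, 0.048, 0.3, 1.0),
    (0.212, 0.078, -0.4, 0.8),
    (0.196, 0.116, 0.2, 0.5),
    (0.173, 0.173, 0.1, 0.3),   # symmetric point: x = 0, no weight in the fit
]

A_fit, A_err = fit_su3_breaking(points)
x_phys = 0.495**2 - 0.135**2    # approximate physical m_K^2 - m_pi^2 in GeV^2
print(f"A = {A_fit:.2f} +/- {A_err:.2f}")
print(f"physical point: {A_fit * x_phys:.2f} +/- {A_err * x_phys:.2f}")
```

Points at the \(\text {SU}(3)_f\)-symmetric point have \(m_K^2-m_\pi ^2=0\) and therefore do not constrain \(A\); they only test whether the data are compatible with the assumed vanishing there.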

$$\begin{aligned} a_\mu ^{(3+1)\text {-}l} = 0.0(0.6)\times 10^{-11}. \end{aligned}$$
Fig. 14

Partially-integrated \((3+1)_{\mathrm{strange}}\) contribution to \(a_\mu ^{\text {Hlbl}}\) for ensembles C101 (\(m_\pi \approx 220\) MeV, \(m_K\approx 470\) MeV) and H105 (\(m_\pi \approx 280\) MeV, \(m_K\approx 460\) MeV), compared to the \((3+1)_{\mathrm{light}}\) for the ensemble H105. The \((3+1)_{\mathrm{strange}}\) data are multiplied by 10 for visibility. The statistics for the \((3+1)_{\mathrm{strange}}\) amount to \(50\%\) of those for the \((3+1)_{\mathrm{light}}\) on C101 and about \(15\%\) on H105

Fig. 15

Partially-integrated higher-order contributions to \(a_\mu ^{\text {Hlbl}}\), in comparison with \((3+1)_{\mathrm{light}}\). Left: Ensemble N203 (\(m_\pi \approx 340\) MeV). Right: Ensemble C101 (\(m_\pi \approx 220\) MeV)

7.1.3 The \((3+1)_{\mathrm{strange}}\) contribution

We have computed the \((3+1)_{\mathrm{strange}}\) contribution using two ensembles: C101 and H105. For both ensembles, the partially-integrated \(a_\mu \) is shown in Fig. 14, and this is compared with the \((3+1)_{\mathrm{light}}\) for ensemble H105. It is clear from the lattice data that \((3+1)_{\mathrm{strange}}\) is at least ten times smaller than our bound on \((3+1)_{\mathrm{light}}\) and can be entirely neglected for our target precision compared to the leading contributions. As it involves the strange-quark triangle, we expect that this quantity depends only on hadronic states which are at least as heavy as the kaon. From this point of view, because the kaon masses on the used ensembles are somewhat lighter than the physical one, we find it exceedingly unlikely that it would grow significantly as the quark masses approach their physical values.

7.2 The \((2+1+1)\) and the \((1+1+1+1)\) results

Due to the much higher computational cost of the lattice-wide object (Eq. (9)) used in our computational strategy, we only determined the light-quark \((2+1+1)\) contribution for the ensembles N203 and C101. Also, we only computed the \((1+1+1+1)\) contribution for the ensemble C101, because we expect it to be insignificant for the final error. The results for the partially integrated \(a_\mu (|y|)\) for both of these ensembles are shown in Fig. 15.

A computation from PQChPT (see Appendix A) shows that at leading order, these two topologies receive neither contributions from the neutral pseudoscalar-meson poles, nor from charged pseudoscalar-meson loops. It is therefore hard to decide at which value of |y| one can cut the lattice data and apply a model prediction afterwards. However, one can see from Fig. 15 that the \((2+1+1)\)-contribution for both ensembles is smaller than the \((3+1)\) at small |y|.

Although the rapid degradation of the signal of the \((2+1+1)\)-contribution is expected, our strategy of averaging over possible ways of constructing the vector y for a given |y| appears to work well at suppressing the statistical noise of this quantity at short distances. In the end, we conservatively estimate this quantity to be zero with half the error of the \((3+1)\) contribution. From the smallness of the light-quark contribution of the \((2+1+1)\) topology, we deem it legitimate to assign the value of zero to the strange contribution. This comes with no contribution to the error budget, as this will be irrelevant compared to our overall level of precision for \(a_\mu ^{\text {Hlbl}}\). Note that the mere charge factor suppresses the strange \((2+1+1)\) contribution relative to the light \((2+1+1)\) by a factor of five. As for the \((1+1+1+1)\) contribution, its observed smallness on the right panel of Fig. 15 does not come as a surprise, in particular since its charge factor weights it five times less than the already-small \((2+1+1)\). Any improvement to either of these quantities would have a completely negligible effect on the final result for \(a_\mu ^{\text {Hlbl}}\), at our current level of precision.
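Both factors of five quoted above follow from the quark charges alone. This is a sketch of the counting, assuming that each single-current loop enters as the combined light-minus-strange difference with weight \(Q_u+Q_d=-Q_s=1/3\), and leaving aside any combinatorial multiplicities of the topologies:

$$\begin{aligned} \frac{Q_s^2}{Q_u^2+Q_d^2} = \frac{1/9}{4/9+1/9} = \frac{1}{5}, \qquad \frac{(Q_u+Q_d)^2}{Q_u^2+Q_d^2} = \frac{1/9}{5/9} = \frac{1}{5}. \end{aligned}$$

The first ratio compares the strange to the light two-point loop in the \((2+1+1)\) topology; the second compares replacing the light two-point loop by two further single-current loops, as in the \((1+1+1+1)\) topology.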

8 The total \(a_\mu ^{\text {Hlbl}}\)

In this section we investigate two approaches to determining the sum of the two leading light-quark contributions to \(a_\mu ^{\text {Hlbl}}\): the first consists of fitting the sum of the two contributions and the second consists of adding the results of individual fits to the fully-connected and \((2+2)\) contributions using various ansätze. We investigate possible systematics in our approach by comparing the results with terms in a or \(a^2\) and by performing cuts in \(m_\pi L\), \(a^2\), and \(m_\pi ^2\).

8.1 Sum and fit

Fig. 16

Chiral-continuum-infinite-volume fits to the sum of the light-quark fully-connected and \((2+2)\) disconnected contributions. The vertical lines represent the result given in Eq. (28) with its statistical (dashed lines) and full uncertainty (solid lines)

We find that the fit ansatz of Eq. (23) describes our data well. At the same time, a term linear in \(a\) instead of \(a^2\) also gives a good fit (\(\chi ^2/\text {dof} < 1\) for all fits in this section). The results for these fits can be found in Fig. 16 and are listed in Table 11 in Appendix B. There is a systematic difference between the fits in \(a\) and \(a^2\), with the former pulling the final value up a little. Applying the various cuts has little impact on the central value, and only the cut on the pion mass removing the \(\text {SU}(3)_f\)-symmetric-point data leads to a significant increase of the statistical error. This is not surprising: the larger the volume and the closer the pion mass to its physical value, the larger the cancellation between the fully-connected and \((2+2)\) contributions becomes, and therefore the larger the relative error on their sum.

It is thus clear from Fig. 16 that our main systematic uncertainty in this approach comes from the continuum extrapolation, as was the case in our previous work [37]. For the final result from this analysis, we treat the two ansätze for the parametrization of cutoff effects on an equal footing and fit a constant to all the possibilities, yielding \((107.4\pm 11.3)\times 10^{-11}\). As an estimate of the systematic error associated with the eight variations of Fig. 16, we compute the root-mean-squared deviation of the fit results \(y_i\) from their average \({\bar{y}}\), i.e. \((\sum _{i=1}^N (y_i-{\bar{y}})^2/N)^{1/2}=9.2\times 10^{-11}\). However, because the central value is stable under the pion-mass cut \(m_\pi ^2<0.165\,\mathrm{GeV}^2\), this systematic error hardly receives a contribution from the uncertainty of the chiral extrapolation. To quantify the latter more conservatively, we perform an auxiliary fit to all data using Eq. (23), except for replacing the term \(Am_\pi ^2\) with the steeper \(A_l \log (m_\pi ^2/\mathrm{GeV}^2)\). The fit yields \((111.0\pm 12.5)\times 10^{-11}\) with \(\chi ^2/\mathrm{dof}=0.9\), and we take the half-difference between the outcomes of this fit and the fit based on Eq. (23) as an estimate of the uncertainty of the chiral extrapolation. The motivation for employing this auxiliary ansatz is that it describes in a satisfactory manner the chiral dependence of the sum of the neutral-pion exchange and the charged-pion loop between the physical and the \(\text {SU}(3)_f\)-symmetric point (see Footnote 3). Attributing the entire pion-mass dependence of the light-quark contribution to the sum of these two most chirally singular contributions should provide a conservative error estimate for the chiral extrapolation. We finally end up with the result

$$\begin{aligned} a_\mu ^{(\text {Conn.}+(2+2))\text {-}l} = 107.4(11.3)(9.2)(6.0) \times 10^{-11}, \end{aligned}$$

where the first error is statistical, and the second and third represent the systematic uncertainties of the continuum and chiral extrapolations, respectively.
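The error budget just quoted can be checked with a few lines of Python. The following is a minimal sketch, not the analysis code used for this work: the function names `rms_deviation` and `add_in_quadrature` are our own illustrative choices, and the individual fit results \(y_i\) (listed in Table 11) are not reproduced here, so the root-mean-squared deviation is only exercised on a toy input. The quadrature sum uses the three uncertainties quoted in the equation above, in units of \(10^{-11}\).

```python
import math


def rms_deviation(values):
    """Root-mean-squared deviation about the mean, normalised by N (not N - 1),
    matching the expression (sum_i (y_i - ybar)^2 / N)^(1/2) used in the text."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def add_in_quadrature(*errors):
    """Combine independent uncertainties in quadrature."""
    return math.sqrt(sum(e * e for e in errors))


# Toy input only (NOT the fit results of Table 11): two values 1 and 3
# scatter by exactly 1 about their mean of 2.
toy_rms = rms_deviation([1.0, 3.0])

# Quoted light-quark result, in units of 1e-11:
# central value, statistical, continuum-systematic, chiral-systematic errors.
central = 107.4
stat, sys_continuum, sys_chiral = 11.3, 9.2, 6.0

total = add_in_quadrature(stat, sys_continuum, sys_chiral)
relative = total / central

print(f"total = {total:.1f}e-11, relative = {relative:.1%}")
# prints "total = 15.8e-11, relative = 14.7%"
```

The combined uncertainty of about \(15.8\times 10^{-11}\) refers to the light-quark sum alone; the final result of Sect. 9 additionally contains the sub-leading contributions listed in Table 7.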

8.2 Individual fits

Considering Fig. 6, it does appear that our data exhibit some curvature in \(m_\pi ^2\) going towards the physical pion mass; however, the underlying functional form is unclear. We have identified several ansätze to describe this non-analytic term in Eq. (22), all of which provide acceptable descriptions (\(\chi ^2/\text {dof}\approx 1\)) of our data. A plot summarising the values obtained for \(a_\mu \) at the physical point is shown in Fig. 17, and these values can be found in Table 12 in Appendix B. A fit without some kind of curvature term describes the connected data poorly (\(\chi ^2/\text {dof}\approx 2.5\)), but for the disconnected contribution a good fit is still possible without such a term, owing to the relatively low statistical precision of the data.

Fig. 17 Individual fit results to Eq. (21) for different choices of the curvature function \(S(m_\pi ^2)\). Also shown is the result from the previous section

It is clear from Fig. 17 that the choice of functional form for this curvature term introduces considerable ambiguity in the resulting individual contributions, although this somewhat “washes out” when the sum is taken. Due to the difficulty of resolving this term precisely using a global fit, we view combining fits to the individual contributions as a suboptimal procedure, especially without a result very close to the physical pion mass to help constrain this possible curvature. It is, however, reassuring that the two approaches agree well. In conclusion, we choose to quote the fit to the sum of the two contributions for our final result.

9 Conclusions

Our final estimate of the light and strange quark contributions to \(a_\mu ^{\text {Hlbl}}\) is

$$\begin{aligned} a_\mu ^{\text {Hlbl}}= 106.8(15.9)\times 10^{-11}. \end{aligned}$$

This result includes all systematics (added in quadrature) as well as previously-unmeasured \((2+1+1)\) and \((1+1+1+1)\) higher-order contributions. The overall precision is about \(15\%\). A breakdown of the individual contributions to this result can be found in Table 7.

Table 7 A breakdown of our result for \(a_\mu ^{\text {Hlbl}}\)

We find that, as we approach the physical pion mass, the two leading contributions to the total \(a_\mu ^{\text {Hlbl}}\), the light-quark fully-connected and \((2+2)\) disconnected, cancel to a significant degree. This makes a precise measurement at low pion mass and large \(m_\pi L\) extremely challenging. In fact, without the \(\text {SU}(3)_f\)-symmetric-point data our determination would be considerably less precise (see Fig. 16). It is also clear that the only quantities really needed in the determination of \(a_\mu ^{\text {Hlbl}}\) are the fully-connected and \((2+2)\) light-quark contributions. We find that all of the sub-leading contributions are consistent with zero within the desired precision. For the first time, we have performed a direct calculation of the \((2+1+1)\) and \((1+1+1+1)\) contributions and again find these contributions to be consistent with zero and smaller than the \((3+1)\), which is expected to be the case from large-\(N_c\) arguments, and naively from the magnitude of their charge factors.

Fig. 18 A comparison of our result for the u, d, and s contributions to \(a_\mu ^{\text {Hlbl}}\) with the literature. The results in circles are the two available lattice determinations (this work and [36], above the horizontal dashed line). The results in squares are phenomenological predictions from [2, 32, 56, 61, 62]. All errors have been added in quadrature

As suggested in previous lattice determinations [36, 42] and several phenomenological predictions (e.g. [32, 56, 61, 62] and discussion/references in [2]), the hadronic light-by-light contribution is in no way large enough to bridge the current gap between theory and experiment for the overall \((g-2)_\mu \). In Fig. 18, we illustrate that there is excellent agreement between our determination and the literature. An uncorrelated fit to a constant of the upper three values of Fig. 18 yields \(a_\mu ^{\text {Hlbl}}=97.5(11.6)\times 10^{-11}\). We remind the reader that we consistently omit the contribution of the charm quark, which in [2] is estimated to be \(3(1)\times 10^{-11}\). Whether one performs an average of different \(a_\mu ^{\text {Hlbl}}\) determinations or not, with the level of precision and consistency achieved, the highest priority in improving the overall \((g-2)_\mu \) theory prediction is now to sharpen the HVP determination.
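An uncorrelated fit to a constant is equivalent to an inverse-variance weighted average. The sketch below illustrates the procedure; it is not the code used for the fit above, and the helper name `constant_fit` is our own. Since the three inputs entering the fit are shown only in Fig. 18, the example uses just the two lattice values quoted in the text (this work and Ref. [36], in units of \(10^{-11}\)), so its output is not expected to reproduce \(97.5(11.6)\times 10^{-11}\).

```python
import math


def constant_fit(values, errors):
    """Uncorrelated least-squares fit of a constant to (value, error) pairs.

    Minimising chi^2 = sum_i (y_i - c)^2 / sigma_i^2 gives the
    inverse-variance weighted mean, with error 1 / sqrt(sum_i 1/sigma_i^2).
    Returns (mean, error, chi2).
    """
    weights = [1.0 / (e * e) for e in errors]
    weight_sum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / weight_sum
    error = math.sqrt(1.0 / weight_sum)
    chi2 = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return mean, error, chi2


# Two lattice determinations quoted in the text, in units of 1e-11:
# this work, 106.8(15.9), and Ref. [36], 78.7(35.4).
values = [106.8, 78.7]
errors = [15.9, 35.4]

mean, error, chi2 = constant_fit(values, errors)
dof = len(values) - 1  # one fitted parameter

print(f"average = {mean:.1f}({error:.1f})e-11, chi2/dof = {chi2 / dof:.2f}")
```

With only these two inputs the average is dominated by the more precise determination, as expected for inverse-variance weights.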

Still, further improvements in the lattice determination of \(a_\mu ^{\text {Hlbl}}\) are clearly possible with the formalism we have employed. It is worth reiterating that a lattice determination of \(a_\mu ^{\text {Hlbl}}\) needs only to focus on the light fully-connected and \((2+2)\) contributions, since, at the accuracy required to make an impact on the theory prediction of \((g-2)_\mu \), these are the only parts that matter.

Note added in proof: On the day the preprint of this article appeared, the Fermilab Muon \(g-2\) collaboration announced its first result for the direct measurement of \(a_\mu \) [72], confirming the determination of Ref. [1] and reducing the world-average uncertainty to the level of \(41\times 10^{-11}\).