On the computation of hadron-to-hadron transition matrix elements in lattice QCD

We discuss the accurate determination of matrix elementswhere neither |i>nor |f>is the vacuum state and h_w is some operator. Using solutions of the Generalized Eigenvalue Problem (GEVP) we construct estimators for matrix elements which converge rapidly as a function of the Euclidean time separations involved. |i>and |f>may be either the ground state in a given hadron channel or an excited state. Apart from a model calculation, the estimators are demonstrated to work well for the computation of the B*B pi-coupling in the quenched approximation. They are also compared to a standard ratio as well as to the"summed ratio method"of [1,2,3]. In the model, we also illustrate the ordinary use of the GEVP for energy levels.


Introduction
In lattice QCD, masses, energies and vacuum-to-hadron matrix elements are extracted from the large time asymptotics of Euclidean two-point correlation functions. Convergence to ground state matrix elements and energies proceeds with a rate of order exp(−∆ (A) t) where ∆ (A) is the energy difference of the first excited state and the ground state in the hadron channel characterized by a set of quantum numbers A.
Hadron-to-hadron matrix elements of the type A|ĥ w |B require us to consider in addition three-point functions, which contain two time separations, (1.1) and the corrections to asymptotic behavior are O( exp(−∆ (A) t 2 ), exp(−∆ (B) t 1 ) ). Hence one wants both t 1 and t 2 to be large. On the other hand, at large times the statistical errors of the Monte Carlo estimates typically increase: the noise-to-signal ratio grows with a rate exp(δ (A) t 2 + δ (B) t 1 ), where δ (i) is a positive energy difference. It is therefore necessary to compromise between the two sources of error. This compromise represents an important limitation to the achievable overall precision (statistical and systematic).
As an example, consider the nucleon axial coupling g A . For this application, ∆ (A) = ∆ (B) due to isospin symmetry and δ (A) = δ (B) ≈ m nucleon − 3 2 m π [4,5]. 1 Due to the symmetry, t 1 = t 2 is optimal in this simple but relevant example and • t 1 = t 2 1/∆ (A) ≈ 0.5 fm is required to keep systematic corrections due to excited states small • but statistical errors become too large beyond a time t = O(1/δ (A) ) = O(1 fm). 2 Numerical results have been shown for this particular example in the reviews [8,9] and recently in Refs. [3,10].
As a remedy one may try to reduce either the statistical uncertainties or the contamination by excited states. A general idea for reducing statistical fluctuations is to integrate over part of the configuration space analytically or by a multilevel algorithm [11][12][13]. In the pure gauge theory, a reduction of the growth of statistical errors as a function of time has successfully been achieved by multilevel algorithms [11,13] as well as by symmetry constrained Monte Carlo [14,15], but it appears difficult to make further progress in this direction for QCD with dynamical fermions. More radically, in lower-dimensional models a complete rewriting of the path integral led to simulation methods where errors can be kept constant at large time in specific channels [16]. Returning to more moderate gains, in the Heavy Quark Effective Theory (HQET) one is in a special situation because δ (A) is power divergent and depends strongly on the discretization. An optimization of the discretization of HQET yielded a much reduced δ (A) [7]. Despite these advances, we do not have a true solution of the signal-to-noise problem either in QCD or in HQET. It is therefore important to efficiently exploit the information present in an available set of gauge fields. In particular, due to translational invariance it is possible to construct volume-averaged estimators which should have reduced variance compared to those in which one of the fields has a fixed position. This requires the stochastic estimation of the all-to-all quark propagator [17][18][19][20] rather than the traditional calculation of a point-to-all propagator, and given the aforementioned exponential growth of signal-to-noise in Euclidean time, it is essential to "dilute the noise sources" (notation of [19]) such that each has support only on a single time-slice.
It is then natural to try to reduce the systematic corrections due to excited states. As a first step, one improves the interpolating fields O (A) , usually by smearing (see Sect. 2). In this way one may reduce the pref-actor of exp(−∆ (A) t 2 ).
But in Ref. [21] it has been pointed out that by considering N interpolating fields and the Generalized Eigenvalue Problem (GEVP), one can construct a time-dependent effective GEVP-optimized interpolating field where the gap ∆ (A) is enhanced to Besides raising the relevant gap for the ground state n = 1 by a considerable amount, excited states n > 1 can then also be reached in each channel! This was demonstrated to work very well for a decay constant in HQET, i.e. a matrix element of the type f |ĥ w |0 . In this work we show that it is also very advantageous for non-vacuum matrix elements. Moreover, we present a new formula, which involves the 3-point matrix correlation function and the GEVP eigenpairs. There is a summation over the intermediate time t 1 with t = t 1 + t 2 held fixed. We consider now the "symmetric case" when initial and final states are related by a symmetry transformation (e.g. for g A ), since the general case without the symmetry is more complicated as we will explain in the following sections. When ∆ (A) t 1, the corrections to the matrix element are reduced with some coefficients C, C given by matrix elements which are usually unknown. The N = 1 case of the general formula reduces to the summed 3-point function that has previously been used in early investigations of the nucleon sigma term [1] and g A [22]. The improvement of the convergence rate to the ground state has recently been emphasized in [2,3]. Let us leave aside the pref-actors C, C , about which little can be said on general grounds. The remaining time-dependent factors satisfy Furthermore, when one is in the asymptotic regime ∆ (A) t 1 the gain becomes significant: the "summed" GEVP method requires approximately half the total time separation for the same size systematic corrections.
The derivation of the formula for the matrix element, as well as the associated correction terms, proceeds roughly as follows. We start from the GEVP expression for the energy levels, in a theory with degeneracy E (A) = E (B) n , and augment the theory by a "source" term ĥ w in the Hamiltonian. The matrix element A, n|ĥ w |B, n is then obtained as a derivative with respect to of the effective (time dependent) energy level at = 0. This idea is worked out in Sect. 3. In Sect. 4, we report tests of the method for a toy model and also in a quenched QCD/HQET calculation. There we revisit the GEVP for energy levels, using the overlaps computed in the quenched case to fix the parameters of the toy model and examining the convergence of energies in the model. As we will discuss in the conclusions, we expect our method to be advantageous in a number of applications. First we set up the notation and describe the standard GEVP method of Ref. [21].

Matrix elements from Euclidean correlators
In this section we define the problem more precisely and describe the "standard" solution as well as the one using the GEVP.
We want to compute a matrix element of a local operatorĥ w (x), m |A, m . We take the finite (space-) volume normalization of states A, m|A, m = 1, which is easily related to the relativistic one.
The matrix elements are computed from correlation functions j (t) are interpolating fields localized on a time-slice t with j enumerating different fields. They carry the quantum numbers B in the usual way. We now turn to different ways of reanalyzing the correlation functions.

Standard ratios
We consider m = n = 1 for describing the "standard" method, since it is largely restricted to ground states. One defines a ratio Our lattice derivative is defined as . When the sectors (A) and (B) are related by a symmetry of the theory, the exponential factor in eq. (2.4) is unity, as E eff A (t) = E eff B (t). Many variations of the ratio are possible, e.g. replacing E eff . The ratio has a quantum mechanical representation (based on the transfer matrix of the lattice theory) 3 These correction terms have already been mentioned in the introduction. Note that replacing O with a specific choice of fixed coefficients α does not change anything in this formula except for modifying the prefactors c 1 , c 2 . Instead, in the following section we turn to the use of the GEVP in order to change the exponential rates of the correction terms.

Summed ratios
An improved asymptotic convergence is provided by the effective matrix element Eq. (2.7) can be seen by explicit summation over t 1 of the transfer matrix representation of eq. (2.4) and it is the N = 1 case of eq. (2.15) (taking the limit t 0 → t). For the degenerate case E it has been used long ago [1,22] and its improved convergence rate has recently been emphasized in Refs. [2,3]. In Ref. [3] the generalization to non-degenerate spectra was introduced.

GEVP improvement
We here summarize Ref. [21] and apply it to the present case. We assume that we have N linearly independent fields O j , with couplings to the low lying states. The labels A, B are dropped where statements independent of the channel are made. The GEVP [23] constructed from the matrices C (A) , C (B) at times t > t 0 yields effective energies [24] E eff n (t, t 0 ) = −∂ t log(λ n (t, t 0 )) (2.9) which converge as [21] E eff provided one takes t 0 ≥ t/2, which we use here. 4 The starting point for computing matrix elements is an operator (in each channel) which satisfies [21] With respect to [21] we have here made a specific choice for the relation of t 0 and t, denoting the resulting v n as v n (t) with a single argument. We can then obtain the desired matrix element Here we have reintroduced the labels A, B. Eq. (2.16) reduces to eq. (2.4) for N A = 1 = N B , but taking N A , N B larger improves the convergence and enables access to excited states. As before, one can formulate a simpler effective matrix element when (A) and (B) are related by a symmetry and only the m = n matrix elements are required. The symmetry means for all t, t 0 and n. The ratio (remember that we use the shorthand v satisfies eq. (2.15) as well but may have reduced statistical errors. The leading error is minimized by the choice t 2 = t 1 .

Improved method: sGEVP
Here we combine the improvement by summation of Sect. 2.2 with the GEVP of Sect. 2.3.

Symmetric case
We consider the symmetric case eq. (2.17) and drop the labels A and B. As derived in App. A, the effective matrix element converges to the exact matrix element as The formula assumes t 0 ≥ t/2 and the exact size of the corrections does in general depend on how we choose t 0 , e.g. t 0 = t − a vs. t 0 /t = fixed. We shall demonstrate in Sect. 4 that the corrections in eq. (3.1) are very small generically. The label "s" stands for summed, since K ij (t) is a 3-point function summed over one argument.

Asymmetric case
In the situation when eq. (2.17) is not satisfied or if we want a matrix element M mn with n = m, we first define the estimator for the difference E as well as the energy shifted correlation function The summed three-point function is then defined by Everywhere we take t 0 ≥ t/2. An approximation to the matrix element is We have observed numerically that in the case of Σ(t, t 0 ) = 0, it converges as see Sect. 4. The gap ∆ is given by the minimum one in the two channels, Since the exponential convergence is now governed by t 0 , there is no obvious advantage compared to eq. (2.16) unless one takes t 0 ≈ t. If statistical precision is good enough to allow for such large t 0 , 3-point functions and 2-point functions with a maximal time extent of t are sufficient to obtain a convergence rate of O(∆ t exp(−∆ t)) as in the symmetric case. Eq. (3.10) has not been proved formally, but our numerical investigation of toy models leaves little doubt that it is correct.

Demonstrations
We carry out two sets of demonstrations of how the various estimators for matrix elements work. First we consider toy models, prescribing spectra and matrix elements and do not take statistical errors into account. We construct "difficult" (large corrections due to excited states) and "easy" toy models. The second set of experiments is a quenched computation of the B * Bπ-coupling, where realistic statistical errors are present.

Definition of the models
We first specify the spectra in dimensionless form. Two different ones are used below, The length factor r 0 is in principle arbitrary, setting the overall scale of the theory, but we think of it as r 0 ≈ 0.5fm. Level splittings of around 1/r 0 are realistic in QCD, as the particle data book and lattice computations show. Next the overlaps need to be fixed. In our HQET applications (see Sect. 4.2), we use spatially smeared quark fields to construct the fields O i . We computed their overlaps ψ in for n = 1, . . . , 5 and i = 1, . . . 7 using the GEVP "creation operator" [Â eff n (t)] † . For details we refer to the following section. Here we just take the approximate matrix corresponding to smearing levels 1, 4, 7 which is typically done in practice [25]. We observed a strong decay of the overlaps ψ S in with increasing n which suggests that a truncation with ψ in = 0 for n > 5 is realistic at reasonable time separations of the correlation functions, say t > r 0 /2. In any case, what we discuss here remains a model, but we expect it to be quite realistic.
The matrix ψ S represents a relatively comfortable situation which we may not always have. For that reason we also construct a more challenging case with a slow decay in n. We set ψ in = 0 for n > 20.
With the model matrix elements (again we note that these are not completely unrealistic) and assuming the sectors A, B to be related by a symmetry, the model is completely defined. In particular we have We refer to this model as SlSl. Replacing ψ S by ψ C defines the model ClCl and finally with ψ S , E (l) n for channel A and ψ C , E n for channel B we define the model SlCh. In other words we have the following table.

Energies from the GEVP
The corrections of the effective energies extracted from the GEVP, eq. (2.9), compared to the exact energies is shown in Fig. 1. We see how t 0 ≥ t/2 accelerates the convergence. As expected Cl is a more challenging situation with larger corrections. One also sees that at short time (t/r 0 ≤ 2) the dependence on t 0 is typically not very dramatic. This feature has been observed in a number of practical applications [25,26]. Still, it appears dangerous to rely on this in general. In the left hand plot, we also observe the difference of the GEVP and a standard effective mass (dashed-dotted line). Here C 22 is shown (the corrections for C 11 are quite a bit smaller).

Matrix elements
Let us start with the easiest situation, the extraction of ground state matrix elements m = n = 1 in the symmetric case. These are shown for two models in Fig. 2. The labeling of the different estimates is as follows: "ratio" dotted, black eq. (2.4) with t 1 = t 2 = t/2 "summed" dashed, black eq. (2.6) "GEVP" dashed-dotted, red eq. (2.18) with t 1 = t 2 = t/2 "sGEVP" blue eq. (3.1) The scale of the y-axis covers a variation of 10%. On the x-axis in this and the following figures we have for each method considered the total time extent of the 3-point functions since in a MC computation this generically governs the statistical accuracy. The graphs illustrate that the improved asymptotics of the sGEVP estimate (compared to the GEVP and the single operators) (N = 1) go hand in hand with smaller corrections at moderate time separations, t ≈ r 0 . . . 2r 0 . 5 Among the estimates which do not use a GEVP, the summed method is generically better, at least when t is not too small. Diagonal (m = n) matrix elements for the degenerate case are shown in Fig. 3 on the left. For n > 1 only GEVP and sGEVP can be used for a systematic computation. Even though the scale of the y-axis is enlarged, we observe that sGEVP also works rather well for determining excited states. Note that with a GEVP with N = 3 states (as is used here), the convergence of the m = n = 3 matrix elements is rather slow, but we show them anyway for illustration. It is strongly recommended to use a larger N in a real computation of M 33 if statistical errors allow.  On the right of Fig. 3, we show the matrix elements M 12 . Here the sGEVP means eq. (3.7) with the energy shifts. The improvement compared to the standard application of the GEVP, eq. (2.18), is present but is not as impressive as on the left side, where no energy shifts are needed. We do not show levels above n = 2 since there a larger GEVP would be recommended as we discussed for the diagonal case.
Finally, consider the situation where the spectra of the A sector and the B sector are different, as in the model SlCh. Example applications are B → π transitions or elastic form factors with momentum transfer. On the left side of Fig. 4 the M 11 matrix element is shown. We again observe an impressive advantage of the GEVP methods, in particular of sGEVP over the standard ratio eq. (2.4). On the right side of the figure we study M 12 , where eq. (2.4) is not applicable. In this particular case, the amplitudes of the corrections of the GEVP effective matrix elements are relatively small and interfere destructively. It therefore happens to be more accurate than sGEVP for a range of t.
In conclusion, the study of the models shows that the asymptotic convergence formulae also provide a very good estimate of the relative advantages of the different methods at intermediate t.
suggests that t 0 ≈ t is needed to reach similar accuracy in the two cases and indeed we find this generically to be the case. One then needs the 3-point functions at twice the total time separation in GEVP compared to sGEVP. In the non-degenerate case, the convergence is governed by t 0 in both GEVP and sGEVP. Here a very significant improvement is the change from a standard ratio eq. (2.4) to GEVP or sGEVP, see the left of Fig. 4. The right side of that figure shows that sGEVP yields considerable further improvement over GEVP when a large t 0 is chosen.

The B * Bπ-coupling in the quenched approximation
In the static approximation for the b-quark, the B * Bπ-coupling is denoted byĝ. It is a leading order low energy constant in the heavy meson chiral Lagrangian [27][28][29] . As such, it is of considerable interest for chiral extrapolations of lattice results, employing a systematic expansion in 1/m b and m 2 π /(8π 2 F 2 π ). The bare matrix element is 6 with |B * k (0) polarized along the k-axis, see also [2,30]. Note that here we use the normalization of states B(p)|B(p) = B * k (p)|B * k (p) = 2L 3 , which corresponds to the non-relativistic one in the infinite volume limit. We do not include the renormalization factor of the axial current anywhere.
Our interpolating fields for B and B * are related by the exact spin symmetry of the static approximation and are generated by gauge-covariant Gaussian wave functions inserted between the static and the light quark field. Such gauge invariant interpolating fields were introduced in Ref. [22]. We use exactly the ones of Ref. [2] with width r wf /r 0 = 0.36, 0.51, 0.62, 0.71, 0.87, 1.01, 1.13 (eq. (2.5) of Ref. [2]). For the present demonstration we work in the quenched approximation and the light quark mass is set to the mass of the strange as in [25]. An ensemble of one hundred gauge configurations is used on a 32 × 16 3 lattice with spacing a ≈ 0.1fm and statistical errors are kept small by an all-to-all method [19] in combination with the static action "HYP2" [7] as done previously [2]. Here we use one hundred fully time-diluted noise sources per configuration.

Approximate overlaps
We first pick five fields O i from our set with r wf /r 0 = 0.36, 0.51, 0.62, 0.71, 1.13. With the operator eq. (2.14) we can then compute the overlaps where n = 1, . . . , 5 labels the excitations. The normalization of the fields O i is irrelevant for all applications, but in order to have the interpretation of an overlap, we choose the normalization such that C ii (0) = 1. In this case a value of one for ψ 2 in means that O i |0 = |n without corrections, i.e. 100% overlap. Furthermore, we fix the signs by the convention ψ in > 0 for the value i which maximizes |ψ in | at fixed n. Figure 5 shows examples of ψ in (t). Even if these are not precision determinations of the overlaps, they show interesting features. The local field shown in the first row has considerable overlap with all states considered. It is a bad interpolating field for ψ(t) ground state physics. However the other fields with reasonable radii display a rather strong decay of the overlaps with growing n, indicating that the smeared fields provide a good basis of interpolating fields which couple little to excited states. Indeed, this figure demonstrates that these wave functions considerably reduce the overlaps to high excited states. Conversely, this also means that high excited states are difficult to access with these fields. Standard plateau plot forĝ. The ratio R(t − t 1 , t 1 ) is considered as a function of t 1 /r 0 with t fixed at t = 2.14r 0 . The latter is the value used in the so far most complete determination [31], while earlier t/r 0 ≈ 4 was used on a lattice with a = 0.2 fm [32]. Left: wave-function with r wf = 1.13 r 0 , right: r wf = 0.62 r 0 . model yields a qualitative understanding of the corrections. We believe that computing overlaps as done here may also be very useful for understanding the systematic errors in present extractions of nucleon matrix elements, namely the question of the magnitude of excited state contamination. Given an approximate knowledge of the spectrum, this contamination can be roughly estimated when the overlaps are known. Indeed, let us apply our approximate knowledge of ψ 1n < 0.1 for n > 1, together with the plausible assumption that matrix elements M mn are of roughly the same magnitude as M 11 . Then, at time separations t = r 0 and for the best wave function, excited states make rather small corrections of order 0.1e −1 , i.e. of the order of a few per cent. Therefore the matrix elementĝ is a rather easy test case and all methods should be successful.

The matrix elementĝ
In this section we show numerical results forĝ, computed with the various methods introduced above. The GEVP estimates for the ground state, displayed in Fig. 6, exhibit no corrections exceeding 2% once t = r 0 has been reached. For smaller t, sGEVP has smaller corrections than GEVP. However, at large times the statistical errors are increasing faster for sGEVP. For this particular matrix element and for the best interpolating field, the corrections for the summed ratio (Fig. 7, top) are somewhat larger than those of the standard ratio and again the summed method suffers from larger statistical errors at large times. However, in the case of a less optimal interpolating field (bottom of Fig. 7), the summed ratio exhibits its superiority.
For comparison, we also show the frequently used analysis where t is kept fixed (here at a value used previously in determinations ofĝ [31]) and one looks for a plateau as a function of t 1 . With our precision one can observe the lack of a plateau in Fig. 8 for r wf = 0.62 r 0 , but with errors at a 1% level a false "plateau" would be observed for t/4 ≤ t 1 ≤ 3t 1 /4. This demonstrates the danger inherent in this method. The left hand side of the figure shows that forĝ a plateau with the correct height is obtained for a larger smearing radius.

Excited state matrix elementsĝ nm
For excited state matrix elements, Fig. 9, only the GEVP estimates are applicable. They appear to work quite well for the first excitation and also for the second excitation a reasonable estimate can be obtained. The sGEVP again seems superior, as the deviations from our estimated asymptotic values are smaller. Figure 10 demonstrates these same features for an off-diagonal matrix element.

Conclusions
In this paper we have introduced the GEVP method with summation, denoted sGEVP, and we have examined several alternative methods for computing hadron-to-hadron ma- In the last case, ∆ is given by eq. (3.11) and one will typically use t 0 = t/2. The form of the leading correction term of sGEVP is derived in the appendix for the equal energy case, while for the general one we deduced it from the numerical investigation of toy models. The GEVP correction term is known from [21] and for "ratio" and "summed ratio" it follows directly from the transfer matrix representation. We investigated two toy models constructed to be quite representative for heavy-light meson matrix elements. In these models, the asymptotic forms of the corrections have been found to be a good guideline for the behavior at intermediate values of t, of the order t = (2 − 3)r 0 . In particular we found that generically sGEVP has the smallest systematic errors, followed by GEVP. As a rule of thumb, sGEVP requires half the time separation of GEVP for the same systematic accuracy. A Monte Carlo computation of the B * Bπ couplingĝ confirms our findings in the models concerning the systematic errors. In addition it allows us to make statements about the statistical errors which have to be balanced with systematic ones due to excited states. Statistical errors grow more quickly as a function of t for sGEVP compared to GEVP, but a comparison at roughly the same amount of excited state contamination corresponds to a factor two between the values of t. A comparison at roughly fixed systematic error is shown in Table 1. We observe a minor difference for the ground state in advantage for sGEVP and the ratio of errors grows up to a factor five in the error for M 33 for the considered matrix element.  In the comparison of the different methods, one also has to consider the numerical effort to compute the effective matrix elements. We assume that one wants to control the corrections by computing the t-dependence of the estimators. In the summed cases, eq. (2.6) and eq. (3.1), this can often be done with a fixed number of quark propagator computations yielding a result for all t, by computing sequential propagators. The computation ofĝ is such a case. In fact, since we have used a full all-to-all computation with "time dilution" (in the notation of [19]), also the GEVP estimate is obtained at the same expense. In contrast, if one only uses translation invariance on a time slice ("time-slice-to-all"), and for example varies t, keeping t 1 = t 2 = t/2 in eq. (2.4) or eq. (2.18), then the required number of propagator computations is proportional to the number of t-values considered. In this situation the sGEVP method has an additional advantage.
Taking statistical and excited state errors as well as the effort into account, sGEVP seems to be the overall most accurate, safe and efficient method. Given the difficulty in evaluating relevant correlation functions at large time separations and assessing the systematic errors, it still appears advisable to compare the different approaches in most cases.
In our opinion the sGEVP method (and maybe the GEVP method) should be applied to nucleon matrix elements such as g A or moments of structure functions, where large time separations are difficult to reach [3,[8][9][10] and it is non-trivial to estimate possible contamination by excited states. In order to appreciate the last point, recall that in the standard ratio method, the systematic error drops like exp(−∆ 2,1 t/2). In order to see such a term, one has to change t to t such that the error term changes appreciably, say by a factor of three. One then needs t − t ≈ 2/∆ 2,1 ≈ 1 fm 7 . The summed ratio reduces this requirement by a factor of about two and the GEVP methods by a larger factor. This gains security in the detection of possible systematic errors.

A.2 GEVP in the augmented theory
The desired E n (0) is efficiently computed with a GEVP method as follows. We combine the interpolating fields O It remains to give an explicit expression for E eff n (t, t 0 ) in terms of the correlation functions, which is equivalent to a solution of the GEVP to first order in . The 2N ×2N GEVP equation, C(t, )v n (t, t 0 , ) = λ n (t, t 0 , )C(t 0 , )v n (t, t 0 , ), separates into the two independent ones The expansion of such a GEVP in was written down in [21] with the intention that is given by the HQET expansion parameter. We here just use the solution. Its first order term in yields the desired matrix element in the form eq. (3.1) in terms of the generalized eigenvectors u n of the lowest order ( = 0) GEVP of size N × N in a single channel A.