On the generalised eigenvalue method and its relation to Prony and generalised pencil of function methods

We discuss the relation of three methods to determine energy levels in lattice QCD simulations: the generalised eigenvalue, the Prony and the generalised pencil of function methods. All three can be understood as special cases of a generalised eigenvalue problem. We show analytically that the leading corrections to an energy $E_l$ in all three methods due to unresolved states decay asymptotically exponentially like $\exp(-(E_{n}-E_l)t)$. Using synthetic data we show that these corrections behave as expected also in practice. We propose a novel combination of the generalised eigenvalue and the Prony method, denoted as GEVM/PGEVM, which helps to increase the energy gap $E_{n}-E_l$. We illustrate its usage and performance using lattice QCD examples.


I. INTRODUCTION
In lattice field theories one is often confronted with the task to extract energy levels from noisy Monte Carlo data for Euclidean correlation functions, which have the theoretical form with real and distinct energy levels E k+1 > E k and real coefficients c k . It is well known that this task represents an ill-posed problem because the exponential functions do not form an orthogonal system of functions. Still, as long as one is only interested in the ground state E 0 and the statistical accuracy is high enough to be able to work at large enough values of t, the task can be accomplished by making use of the fact that with corrections exponentially suppressed with increasing t due to ground state dominance. However, in lattice quantum chromodynamics, the non-perturbative approach to quantum chromodynamics (QCD), the signal to noise ratio for C(t) deteriorates exponentially with increasing t [1]. Moreover, at large Euclidean times there can be so-called thermal pollutions (see e.g. Ref. [2]) to the correlation functions, which, if not accounted for, render the data at large t useless. And, once one is interested in excited energy levels E k , k > 0, alternatives to the ground state dominance principle need to be found. The latter problem can be tackled applying the so-called generalised eigenvalue method (GEVM) -originally proposed in Ref. [3] and further developed in Ref. [4]. It is by now well established in lattice QCD applications and allows one to estimate ground and excited states for the price that a correlator matrix needs to be computed instead of a single correlation function. Moreover, the systematics of this method are well understood [4,5]. An alternative method, originally proposed by de Prony [6], represents an algebraic method to determine in principle all the energy levels from a single correlation function. However, it is well known that the Prony method can become unstable in the presence of noise. The Prony method was first used for lattice QCD in Refs. [7,8]. For more recent references see Refs. [9][10][11] and also Appendix A. For an application of the Prony method in real time dynamics with Tensor networks see Ref. [12]. In this paper we discuss the relation among generalised eigenvalue, Prony and generalised pencil of function (GPOF) methods and trace them all back to a generalised eigenvalue problem. This allows us to derive the systematic effects due to so-called excited state contributions for the Prony and GPOF methods using perturbation theory invented for the GEVM [5]. In addition, we propose a combination of the GEVM and the Prony method, the latter of which we also formulate as a generalised eigenvalue method and denote it as Prony GEVM (PGEVM). The combination we propose is to apply first the GEVM to a correlator matrix and extract the socalled principal correlators, which are again of the form Eq.
(1). Then we apply the PGEVM to the principal correlators and extract the energy levels. In essence: the GEVM is used to separate the contributing exponentials in distinct principal correlators with reduced pollutions compared to the original correlators. Then the PGEVM is applied only to obtain the ground state in each principal correlator, the case where it works best.
By means of synthetic data we verify that the PGEVM works as expected and that the systematic corrections are of the expected form. Moreover, we demonstrate that with the combination GEVM/PGEVM example data from lattice QCD simulations can be analysed: we study the pion first, where we are in the situation that the ground state can be determined with other methods with high confidence. Thereafter we also look at the η-meson and I = 1, π = π scattering, both of which require the usage of the GEVM in the first place, but where also noise is significant. The paper is organised as follows: in the next section we introduce the GEVM and PGEVM and discuss the systematic errors of PGEVM. After briefly explaining possible numerical implementations, we present example applications using both synthetic data and data obtained from lattice QCD simulations. In the end we discuss the advantages and disadvantages of our new method, also giving an insight into when it is most useful.

II. METHODS
Maybe the most straightforward approach to analysing the correlation function Eq. (1) for the ground state energy E 0 is to use the so-called effective mass defined as In the limit of large t 0 and fixed δt, M eff converges to E 0 . The correction due to the first excited state E 1 is readily computed: It is exponentially suppressed in t 0 and the energy difference between first excited and ground state. It is also clear from this formula that taking the limit δt → ∞ while keeping t 0 fixed leads to a worse convergence behaviour than keeping δt fixed and changing t 0 . In this section we will discuss how both of the two above equations generalise.
A. The generalised eigenvalue method (GEVM) We first introduce the GEVM. The method is important for being able to determine ground and excited energy levels in a given channel. Moreover, it helps to reduce excited state contaminations to low lying energy levels.
Using the notation of Ref. [5], one considers correlator matrices of the form with energy levels E k > 0 and E k+1 > E k for all values of k. The ψ ki = 0|Ô i |k are matrix elements of n suitably chosen operatorsÔ i with i = 0, ..., n − 1. Then, the eigenvalues or so-called principal correlators λ(t, t 0 ) of the generalised eigenvalue problem (GEVP) can be shown to read for t 0 fixed and t → ∞. Clearly, the correlator matrix C(t) will for every practical application always be square but finite with dimension n. This will induce corrections to Eq. (7). The corresponding corrections were derived in Ref. [4,5] and read to leading order with b k > 0 and Most notably, the principal correlators λ k (t 0 , t) are at fixed t 0 again a sum of exponentials. As was shown in Ref. [5], for t 0 > t/2 the leading corrections are different compared to Eq. (8), namely of order

B. The Prony method
For the original Prony method [6], we restrict ourselves first to a finite number n of exponentials in an Euclidean correlation function C 0 The c k are real, but not necessarily positive constants and t is integer-valued. Thus, we focus on one matrix element of the correlator matrix Eq. (5) from above or other correlators with the appropriate form. We assume now E k = 0 for all k ∈ {0, . . . , n − 1} and that all the E k are distinct. Moreover, we assume the order E k+1 > E k for all k. Then, Prony's method is a generalisation of the effective mass Eq. (3) in the form of a matrix equation with an (n + 1) × (n + 1) Hankel matrix H . . . C 0 (t + n + 1) . . . . . . . . . . . .
C. The Prony GEVM (PGEVM) Next we formulate Prony's method Eq. (12) as a generalised eigenvalue problem (see also Ref. [13]). Let H 0 (t) be a n×n Hankel matrix for i, j = 0, 1, 2, . . . , n−1 defined by with integer ∆ > 0. H 0 (t) is symmetric, but not necessarily positive definite. We are going to show that the energies E 0 , . . . , E n−1 can be determined from the generalised eigenvalue problem The following is completely analogous to the corresponding proof of the GEVM in Ref. [5]. Define a square matrix and re-write H 0 (t) as Note that χ is a square Vandermonde matrix with all coefficients distinct and, thus, invertible. Now, like in Ref. [5], introduce the dual vectors u k with for k, l ∈ {0, . . . , n − 1}. With these we can write Thus, the GEVP Eq. (14) is solved by Moreover, much like in the case of the GEVM we get the orthogonality (u l , H 0 (t)u k ) = c l e −E l t δ lk , k, l ∈ {0, . . . , n−1} (18) for all t-values, because H 0 (t) u k ∝ χ k .

Global PGEVM
In practice, there are two distinct ways to solve the GEVP Eq. (14): one can fix τ 0 and determine Λ 0 k (τ 0 , t) as a function of t. In this case the solution Eq. (17) indicates that for each k the eigenvalues decay exponentially in time. On the other hand, one can fix δt = t − τ 0 and determine Λ 0 k (τ 0 , δt) as a function of τ 0 . In this case the solution Eq. (17) reads because δt is fixed. The latter approach allows to formulate a global PGEVM. Observing that the matrices χ do not depend on τ 0 , one can reformulate the GEVP Eq. (14) as follows since Λ 0 k does not depend on τ 0 . However, this works only as long as there are only n states contributing and all these n states are resolved by the PGEVM, as will become clear below. If this is not the case, pollutions and resolved states will change roles at some intermediate τ 0 -value.

Effects of Additional States
Next, we ask the question what corrections to the above result we expect if there are more than n states contributing, i.e. a correction term to the correlator and a corresponding correction to the Hankel matrix (We have set ∆ = 1 for simplicity.) We assume that we work at large enough t such that these corrections can be considered as a small perturbation. Then it turns out that the results of Refs. [4,5] apply directly to the PGEVM and all systematics are identical (Eq. (8) or Eq. (10)). However, there is one key difference between GEVM and PGEVM. The GEVM with periodic boundary conditions is not able to distinguish the forward and backward propagating terms in as long as they come with the same amplitude. In fact, the eigenvalue λ 0 will in this case also be a cosh or sinh [14]. In contrast, the PGEVM can distinguish these two terms. As a consequence, the backward propagating part needs to be treated as a perturbation like excited states and Λ 0 is no longer expected to have a cosh or sinh functional form in the presence of periodic boundary conditions. This might seem to be a disadvantage at first sight. However, we will see that this does not necessarily need to be the case.
Concerning the size of corrections there are two regimes to consider [5]: when τ 0 is fixed at small or moderately large values and Λ is studied as a function of t → ∞ the corrections of the form Eq. (8) apply [4]. When, on the other hand, τ 0 is fixed but τ 0 ≥ t/2 is chosen and the effective masses Eq. (3) of the eigenvalues are studied, corrections are reduced to O(e −∆E n,l t ) with ∆E m,n = E m − E n [5]. τ 0 ≥ t/2 is certainly fulfilled if we fix δt to some (small) value. However, for this case Λ 0 (t, τ 0 ) is expected to be independent of both, t and τ 0 when ground state dominance is reached and M eff is, thus, not applicable. Therefore, we define alternative effective masses M eff,l (δt, τ 0 ) = − log (Λ l (δt, τ 0 )) δt (21) and apply the framework from Ref. [5] to determine deviations ofM eff,l from the true E l . The authors of Ref. [5] define = e −(En−En−1)τ0 and expand where we denote the eigenvalues of the full problem as Λ(t, τ 0 ). Already from here it is clear that in the situation with δt fixed and τ 0 → ∞ the expansion parameter becomes arbitrarily small. Simultaneously with τ 0 also t → ∞. The first order correction (which is dominant for × e −∆E n,l δt − 1 c l,n with the definition of ∆E m,n from above and constant coefficients c l,n = (v 0 l , χ n )(χ n , v 0 l ) .
These corrections are decaying exponentially in τ 0 with a decay rate determined by ∆E n,l as expected from Ref. [5].
For the effective energies we find likewise with corrections decaying exponentially in τ 0 , again with a rate set by ∆E n,l .

D. Combining GEVM and PGEVM
There is one straightforward way to combine GEVM and PGEVM: we noted already above that the principal correlators of the GEVM are again a sum of exponentials, and, hence, the PGEVM can be applied to them. This means a sequential application of first the GEVM with a correlator matrix of size n 0 to determine principal correlators λ k and then of the PGEVM with size n 1 and the λ k 's as input. This combination allows us to work with two relatively small matrices, which might help to stabilise the method numerically. Moreover, the PGEVM is applied only for the respective ground states in the principal correlators and only relatively small values of n 1 are needed. An additional advantage lies in the fact that λ k is a sum of exponentials with only positive coefficients, because it represents a correlation function with identical operators at source and sink. As a consequence, the Hankel matrix H 0 is positive definite.

E. Generalised Pencil of Function (GPOF)
For certain cases, the PGEVM can actually be understood as a special case of the generalised pencil-offunction (GPOF) method, see Refs. [15][16][17][18] and references therein. Making use of the time evolution operator, we can define a new operator which is the same as C ij (t + ∆t). Using i = j and the . . one defines the PGEVM based on a single correlation function. Note, however, that the PGEVM is more general as it is also applicable to sum of exponentials not stemming from a two-point function.
The generalisation is now straightforward by combininĝ O i andÔ m∆t,i for i = 0, . . . , n 0 −1 and m = 0, . . . , n 1 −1. These operators define a Hankel matrix H 0 with size n 1 of correlator matrices of size n 0 as follows (∆ = 1 for simplicity) with for j = 0, . . . , n 0 − 1 and i = 0, . . . , n 1 − 1. Then n = n 0 · n 1 is the number of energies that can be resolved. H is hermitian, positive definite and the same derivation as the one from the previous subsection leads to the GEVP In this case the matrix H 0 is positive definite, but potentially large, which might lead to numerical instabilities. This can be alleviated by using only for a limited subset of operatorsÔ i their shifted versionsÔ m∆t,i , preferably for thoseÔ i contributing the least noise.

III. NUMERICAL IMPLEMENTATION
In case the Hankel matrix H 0 is positive definite, one can compute the Cholesky decomposition C(t 0 ) = L·L T . Then one solves the ordinary eigenvalue problem If this is not the case, the numerical solution of the PGEVM can proceed along two lines. The first is to compute the inverse of H 0 (τ 0 ) for instance using a QRdecomposition and then solve the ordinary eigenvalue problem for the matrix Alternatively, one may take advantage of the symmetry of both H 0 (t) and H 0 (τ 0 ). One diagonalises both H 0 (t) and H 0 (τ 0 ) with diagonal eigenvalue matrices Λ t and Λ τ0 and orthogonal eigenvector matrices U t and U τ0 . Then, the eigenvectors of the generalised problem are given by the matrix and the generalised eigenvalues read Note that U is in contrast to U t and U τ0 not orthogonal.

A. Algorithms for sorting GEVP states
Solving the generalized eigenvalue problem in Eq. (6) for an n × n correlation function matrix C(t) (or Hankel matrix H) with t > t 0 , results in an a priori unsorted set {s k (t)|k ∈ [0, ..., n − 1]} of states s k (t) = (λ k (t, t 0 ), v k (t, t 0 )) on each timeslice t defined by an eigenvalue λ k (t, t 0 ) and an eigenvector v k (t, t 0 ). In the following discussion we assume that the initial order of states is always fixed on the very first timeslice t 0 + 1 by sorting the states by eigenvalues, i.e. choosing the label n by requiring the vector of states reads (s 0 (t 0 + 1), ..., s n−1 (t 0 + 1)). After defining the initial ordering of states, there are many different possibilities to sort the remaining states for t > t 0 . In general, this requires a prescription that for any unsorted vector of states (s (k=0) (t), ..., s (k=n−1) (t)) yields a re-ordering s (k) (t) of its elements. The permutation (k) may depend on some set of reference states (s 0 (t), ..., s n−1 (t)) at timet which we assume to be in the desired order. However, for the algorithms discussed here, such explicit dependence on a previously determined ordering at a reference timet is only required for eigenvector-based sorting algorithms. Moreover,t does not necessarily have to equal t 0 + 1. In fact, the algorithms discussed below are in practice often more stable for choosing e.g. the previous timeslice t − 1 to determine the order of states at t while moving through the available set of timeslices in increasing order.

Sorting by eigenvalues
This is arguably the most basic way of sorting states; it simply consists of repeating the ordering by eigenvalues that is done at t 0 for all other values of t, i.e. one chooses (k) independent of any reference state and ignoring any information encoded in the eigenvectors, s.t.
The obvious advantage of this method is that it is computationally fast and trivial to implement. However, it is not stable under noise which can lead to a rather large bias and errors in the large-t tail of the correlator due to incorrect tracking of states. This is an issue for systems with a strong exponential signal-to-noise problem (e.g. the η,η -system) as well as for large system sizes n. Moreover, the algorithm fails by design to correctly track crossing states, which causes a flipping of states at least in an unsupervised setup and tends to give large point errors around their crossing point in t.

Simple sorting by eigenvectors
Sorting algorithms relying on eigenvectors instead of eigenvalues generally make use of orthogonality properties. A simple method is based on computing the scalar product where v l (t) denote eigenvectors of some (sorted) reference states s l (t) att < t and v k (t) belongs to a state s k (t) that is part of the set which is to be sorted. For all values of k one assigns k → (k), s.t. |c kl | ! = max. If the resulting map (k) is a permutation the state indexing at t is assigned according to s k (t) → s (k) (t). Otherwise sorting by eigenvalues is used as a fallback. This method has some advantages over eigenvalue-based sorting methods: It can in principle track crossing states and flipping or mixing of states in the presence of noise are less likely to occur. The latter is especially an issue for resampling (e.g. bootstrap or jackknife), i.e. if state assignment fails only on a subset of samples for some value(s) of t, leading to large point errors and potentially introducing a bias. On the downside, the resulting order of states from this method is in general not unambiguous for systems with n > 2 and the algorithm is not even guaranteed to yield a valid permutation (k) for such systems in the presence of noise, hence requiring a fallback.

Exact sorting by eigenvectors
Any of the shortcomings of the aforementioned methods are readily avoided by an approach that uses volume elements instead of scalar products. This allows to obtain an unambiguous state assignment based on (globally) maximized orthogonality. The idea is to consider the set of all possible permutations { (k)} for a given n × n problem and compute for each . This can be understood as assigning a score for how well each individual vector v k (t) fits into the set of vectors at the reference timeslicet at a chosen position (k) and computing a global score for the current permutation by taking the product of the individual scores for all vectors v k (t). The final permutation is then Unlike the method using the scalar product, this method is guaranteed to always give a unique solution, which is optimal in the sense that it tests all possible permutations and picks the global optimum. Therefore, the algorithm is most stable under noise and well suited for systems with crossing states. Empirically, this results in e.g. the smallest bootstrap bias at larger values of t compared to any other method described here. A minor drawback of the approach is that it is numerically more expensive due to the required evaluations of (products of) volume elements instead of simple scalar products. However, this becomes only an issue for large system sizes and a large number of bootstrap (jackknife) samples.

Sorting by minimal distance
While the methods discussed above work all fine for the standard case where the GEVP is solved with fixed time t 0 (or τ 0 ) and δt is varied, the situation is different for M eff with δt fixed: there are t-values for which it is numerically not easy to separate wanted states from pollutions, because they are of very similar size in the elements of the sum of exponentials entering at these specific t-values. However, when looking at the bootstrap histogram of all eigenvalues, there is usually a quite clear peak at the expected energy value for all t-values with not too much noise. Therefore, we implemented an alternative sorting for this situation which goes by specifying a target value ξ. Then we chose among all eigenvalues for a bootstrap replicate the one which is closest to ξ. The error is computed from half of the 16% to 84% quantile distance of the bootstrap distribution and the central value as the mean of 16% and 84% quantiles. For the central value one could also use the median, however, we made the above choice to have symmetric errors. This procedure is much less susceptible to large outliers in the bootstrap distribution, which appear because of the problem discussed at the beginning of this sub-section. For the numerical experiments shown below we found little to no difference in between sorting by eigenvalues and any of the sorting by vectors. Thus, we will work with sorting by eigenvalues for all cases where we study Λ l (t, τ 0 ) with τ 0 fixed. On the other hand, specifying a target value ξ and sort by minimal distance turns out to be very useful for the case Λ l (δt, τ 0 ) with δt fixed. As it works much more reliably than the other two approaches, we use this sorting by minimal distance for the δt fixed case throughout this paper. The methods used in this paper are fully implemented in a R package called hadron [19], which is freely available software.

IV. NUMERICAL EXPERIMENTS
In this section we first apply the PGEVM to synthetic data. With this we investigate whether additional states not accounted for by the size of the Prony GEVP lead to the expected distortions in the principal correlators and effective masses. At this stage the energy levels and am- plitudes are not necessarily chosen realistically, because we would first like to understand the systematics. In a next step we apply the combination of GEVM and PGEVM to correlator matrices from lattice QCD simulations. After applying the framework to the pion, we have chosen two realistic examples, the η-meson and the ρ-meson.

A. Synthetic Data
As a first test we apply the PGEVM alone to synthetic data. We generate a correlator containing three states with E k = (0.125, 0.3, 0.5), k = 0, 1, 2 and t ∈ {0, . . . , 48}. The amplitudes c k have been chosen all equal to 1. We apply the PGEVM to this correlator C s with n = 2. This allows us to resolve only two states and we would like to see how much the third state affects the two extracted states. The result is plotted in Figure 1. We plotM eff of Eq. In Eq. (24) we have discussed that we expect corrections inM eff and M eff to decay exponentially in t = δt + τ 0 . We can test this by subtracting the exactly known energy E k from the PGEVM results. Therefore, we plot in Figure 2 effective masses minus the exact E k values as a function of t. Filled symbols correspond toM eff with δt = 1 and open symbols (only k = 0) to M eff with τ 0 = 10. The asymptotically exponential convergence in t is nicely visible for both effective mass definitions and also for k = 0 and k = 1. ForM eff the decay rate is to a good approximation E 2 − E 0 for k = 0 and E 2 − E 1 for k = 1, respectively, as expected from Eq. (24). For M eff the asymptotic logarithmic decay rate is approximately E 1 − E 0 and, thus, worse as expected from Eq. (8). So far we have worked solely with ∆ = 1. In Figure 3 we investigate the dependence ofM eff and M eff on ∆: we plot E − E 0 on a logarithmic scale as a function of t for ∆ = 1 and ∆ = 4. While ∆ has no influence on the convergence rate, it reduces the amplitude of the pollution for bothM eff and M eff by shifting the data points to the left. The reason is that a larger ∆ allows to reach larger times in the Hankel matrices at the same t. A smaller ∆ on the other hand allows to go to larger t, thus the advantage of increased ∆ is negligible. In order to see the effect of so-called back-propagating states, we next investigate a correlator

B. Lattice QCD Examples
As a first lattice QCD example we start with the charged pion, which gives rise to one of the cleanest signals in any correlation function extracted from lattice QCD simulations. In particular, the signal to noise ratio is independent of t. From now on quantities are given in units of the lattice spacing a, i.e. aE, aM , t/a, . . . are dimensionless real numbers. However, for simplicity we set a = 1. The example we consider is the B55.32 ensemble generated with N f = 2 + 1 + 1 dynamical quark flavours by ETMC [20] at a pion mass of about 350 MeV. For details on the ensemble we refer to Ref. [20]. The correlation functions for the pion have been computed with the so-called one-end-trick and spin dilution, see Ref. [21] on 4996 gauge configurations. The time extent is T = 64 lattice points, the spatial one L = T /2.

Pion
We look at the single pion two-point correlation function C ll π (t) computed with local sink and local source using the standard operatorū iγ 5 d projected to zero momentum. Since the pion is relatively light, the backpropagating state due to periodic boundary conditions is important. For this reason, we compute the cosh effective mass from the ratio C ll π (t + 1) C ll π (t) = e −Eπ(t+1) + e −Eπ(T −(t+1)) e −Eπt + e −Eπ(T −t) by solving numerically for E π . The corresponding result is shown as red circles in Figure 5 as a function of t. The effective masses M eff computed from the PGEVM principal correlator with τ 0 = 2, n = 2 and ∆ = 1 fixed are shown as blue squares. One observes that excited states are reduced but the pollution by the backward propagating state ruins the plateau. As green diamonds we show theM eff for the principal correlator with δt = 1, n = 2 and ∆ = 2 fixed. Here, we used a target value ξ = 0.16 to identify the appropriate state during resampling, see section III A. The plateau starts as early as t = 5, there is an intermediate region where forward and backward propagating states contribute similarly, and there is a region for large t, where again the ground state is identified. The apparent jump in the data at t = 11 is related to coupling to a different state than on previous timeslices and is accompanied by a large error because the sorting of states is performed for each bootstrap sample. Coupling to a different state is allowed for the method with fixed δt as the τ 0 of the GEVP changes for every timeslice. In fact, this feature is a key difference to the methods with fixed τ 0 for which the set of states is unambigously determined by the initial choice of τ 0 , see the discussion in section III A 4. Once all the excited states have become negligible, the PGEVM can also resolve both forward and backward propagating states (see also Ref. [17]). For the example at hand this is shown in Figure 6 with τ 0 = 17 and n = 2 fixed. For this to work it is important to chose τ 0 large enough, such that excited states have decayed sufficiently. Interestingly, the noise is mainly projected into the state with negative energy. In Figure 7 we visualise the improvement realised by combining GEVM with PGEVM. Starting with a 2 × 2 correlator matrix built from local and fuzzed operators, we determine the GEVM principal correlator λ 0 (t) using t 0 = 1. The cosh effective mass of λ 0 is shown as red circles in Figure 7. In green we showM eff of the PGEVM principal correlator Λ 0 obtained with δt = 1, n 1 = 2 and ∆ = 2 fixed. Compared to Figure 7, the plateau inM eff starts as early as t = 3. However, in particular at larger t-values the noise is also increased compared to the PGEVM directly applied to the original correlator. It should be clear that the pion is not the target system for an analysis combining GEVM and PGEVM, because its energy levels can be   Figure 5 for the first two rows and green diamonds of Figure 6 for the third row. The fit ranges are [t1, t2].
extracted without much systematic uncertainty directly from the original correlator. However, it serves as a useful benchmark system, where one can also easily check for correctness. In Figure 8 we plot the (interpolated) bootstrap sample densities ofM eff for three t-values: t = 4, t = 10 and t = 15. They correspond to the green diamonds in Figure 7. One observes that at t = 4 the distribution is approximately Gaussian. At t = 15 the situation is similar, just that the distribution is a bit skew towards larger M eff -values. In the intermediate region with t = 10 there is a two peak structure visible, which is responsible for the large error. It is explained -see above -by the inability of the method with δt = 1 to distinguish the different exponentials contributing to λ 0 . In Table I we have compiled fit results obtained for the pion: the first row corresponds to a fit to the effective mass of the correlator C ll π in the fit range indicated by t 1 , t 2 . The second row represents the fit toM eff with δt = 1 fixed obtained with PGEVM on C ll π directly (green diamonds in Figure 5). The last row is the same, but for the combination of GEVM/PGEVM (green diamonds in Figure 7). The agreement is very good, even though the PGEVM and GEVM/PGEVM errors are larger than the ones obtained from the correlator directly.

η-meson
As a next example we study the η/η system, where due to mixing of flavour singlet and octet states the GEVM cannot be avoided in the first place. In addition, due to large contributions by fermionic disconnected diagrams the correlators are noisy making the extraction of energy levels at late Euclidean times difficult. The η/η analysis on the B55.32 ensemble was first carried out in Refs. [22][23][24] using a powerful method to subtract excited states we can compare to. However, this excited state subtraction method is based on some (well founded) assumptions. The starting point is a 3×3 correlator matrix C η ij (t) with light, strange and charm flavour singlet operators and local operators only. We apply the GEVM with t 0 = 1 and extract the first principal correlator λ 0 (t) corresponding to the η-state, which is then input to the PGEVM. In Figure 9 we show the effective mass of the η-meson for this GEVM principal correlator λ 0 (t) as black circles. In addition we show as red squares the effective masses of Λ 0 obtained from the PGEVM applied to this principal correlator with n 1 = 2, τ 0 = 1 and ∆ = 1. The blue diamonds representM eff of Λ 0 obtained with n 1 = 3, δt = 1 and ∆ = 1 fixed. The dashed horizontal line indicates the results obtained using excited state subtraction [22]. For better legibility we show the effective masses for each of the three cases only up to a certain t max after which errors become too large. Moreover, the two PGEVM results are slightly displaced horizontally. One observes two things: excited state pollutions are significantly reduced by the application of the PGEVM to the GEVM principal correlator λ 0 . However, also noise increases. But, since in the effective masses of λ 0 there are only 5 points which can be interpreted as a plateau, the usage of PGEVM significantly increases the confidence in the analysis.
In the corresponding η principal correlator the noise is too large to be able to identify a plateau for any of the cases studied for the η.
In table Table II we present fit results to the different η effective masses from Figure 9. The agreement among the different definitions, but also with the literature value is reasonable within errors.
3. I = 1, π − π-scattering Finally, we investigate correlator matrices for the I = 1, π − π-scattering. The corresponding correlator matrices were determined as part of a Lüscher analysis including moving frames and all relevant lattice irreducible representations (irreps). A detailed discussion of the framework and the theory can be found in Ref. [25]. Here we use the N f = 2 flavour ensemble cA2.30.48 generated by ETMC [26,27], to which we apply the same methodology as discussed in Ref. [25]. The first example corresponds to the ground state in the A 1 irreducible representation with total squared momentum equal to 1 in units of 4π 2 /L 2 , for which the results are shown in the left panel of Figure 10. In this case the effective mass computed from the GEVM principal correlator λ 0 shows a reasonable plateau (black circles). The red squares show M eff of Λ 0 with n 1 = 2, τ 0 = 1 and ∆ = 2 fixed. Even though the plateau starts at earlier times, noise is increasing quickly. Actually, we no longer display the energies from t > 17 due to too large error bars for better legibility. When usingM eff with n 1 = 3, δt = 1 and ∆ = 1, a plateau can be identified from t = 1 on and with a very reasonable signal to noise ratio. Fit results to the effective masses for the A 1 irrep are compiled in Table III. Here one notices that, despite the visually much longer plateau range, the error on the fitted mass is significantly larger forM eff than for the other two methods. The overall agreement is very good, though. The same can be observed in the right panel of Figure 10 for the T 1u irrep. However, this time it is not straightforward to identify a plateau in M eff of λ 0 shown as black circles. UsingM eff instead with n 1 = 3, δt = 1 and ∆ = 1 fixed improves significantly over the traditional effective masses and give much higher confidence to the extracted energy levels.
Fit results for the T 1u irrep are compiled in Table IV. The conclusion is similar to the one from the A 1 irrep.  Tables III and IV.

V. DISCUSSION
In this paper we have first discussed the relation among the generalised eigenvalue, the Prony and the generalised pencil of function methods: they are all special cases of a generalised eigenvalue method. This fact allows one to discuss systematic effects stemming from finite matrix sizes used to resolve the infinite tower of states. The results previously derived for the generalised eigenvalue method [4,5] can be transferred and generalised to the other methods. In particular, pollutions due to unresolved states decay exponentially in time.
At the beginning of the previous section we have demonstrated with synthetic data that the PGEVM works as expected. In particular, we could confirm that pollutions due to unresolved excited states vanish exponentially in t. This exponential convergence to the wanted state is faster ifM eff Eq. (21) with δt fixed is used, as expected from the perturbative description. Increasing the footprint of the Hankel matrix by increasing the parameter ∆ helps in reducing the amplitude of the polluting terms. Still using synthetic data, we have shown that backward propagating states affect PGEVM effective energies at large times. But, PGEVM makes it also possible to distinguish forward from backward propagating states. As a first example for data with noise we have looked at the pion. There are three important conclusions to be drawn here: first, the PGEVM can also resolve forward and backward propagating states in the presence of noise. Second,M eff computed for fixed δt is advantageous compared to M eff at fixed t 0 , because in this case strong effects from the backward propagating pion can be avoided. And finally, combining GEVM and PGEVM sequentially leads to a reduction of excited state contributions. The next two QCD examples are for the η meson and the ρ meson where one must rely on the variational method. Moreover, the signal to noise ratio decays exponentially such that excited state reduction is imperative. For the case of the η meson the combined GEVM/PGEVM leads to significantly larger confidence in the extracted energy levels. For the I = 1, π − πscattering a strong improvement is visible. The latter is likely due to the large input correlator matrix to the GEVM. This leads to a large gap relevant for the corrections due to excited states and, therefore, to small excited states in the PGEVM principal correlator. Interestingly, for the ρ-meson example studied here also the signal to noise ratio in the PGEVM principal correlator at fixed δt is competitive if not favourable compared to the effective mass of the GEVM principal correlator. Last but not least let us emphasise that the novel method presented here is not always advantageous and many other methods have been developed for the analysis of multi-exponential signals, each with their own strengths and weaknesses. We are especially referring to the recent developments of techniques based on the use of ordinary differential equations [28] and the Gardner method [29], for the latter see appendix A. Both methods are in principle capable of extracting the full energy spectrum. However, the Gardner method becomes unreliable in the case of insufficient data and precision, while we have not tested the ODE method here. But the results in Ref. [28] look promising.

VI. SUMMARY
In this paper we have clarified the relation among different methods for the extraction of energy levels in lattice QCD available in the literature. We have proposed and tested a new combination of generalised eigenvalue and Prony method (GEVM/PGEVM), which helps to reduce excited state contaminations. We have first discussed the systematic effects in the PGEVM stemming from states not resolved by the method. They decay exponentially fast in time with exp(−∆E n,l t 0 ) with ∆E n,l = E n − E l the difference of the first not resolved energy level E n and the level of interest E l . Using synthetic data we have shown that this is indeed the leading correction.
Next we have applied the method to a pion system and discussed its ability to also determine backward propagating states, given high enough statistical accuracy, see also Ref. [17]. Together with the results from the synthetic data we could also conclude that working at fixed δt is clearly advantageous compared to working at fixed t 0 , at least for data with little noise. Finally, looking at lattice QCD examples for the η-meson and the ρ-meson, we find that excited state contaminations can be reduced significantly by using the combined GEVM/PGEVM. While it is not clear whether also the statistical precision can be improved, GEVM/PGEVM can significantly improve the confidence in the extraction of energy levels, because plateaus start early enough in Euclidean time. This is very much in line with the findings for the Prony method in the version applied by the NPLQCD collaboration [8].
The GEVM/PGEVM works particularly well, if in the first step the GEVM removes as many intermediate states as possible and, thus, the gap ∆E n,l becomes as large as possible in the PGEVM with moderately small n. The latter is important to avoid numerical instabilities in the PGEVM.

The algorithm
The most general form of a multicomponent exponential decaying function f (t) is with some integrable function g(λ) and t bound from below, WLOG t ≥ 0. In the common discrete case we get where the A i ∈ R are the amplitudes, the E i are the decay constants, often identified with energy levels, and δ denotes the Dirac-Delta distribution. Gardner et al. [29] proposed to multiply equation (A1) by t = exp(x) and substitute λ = exp(−y) in order to obtain the convolution This equation can now easily be solved for g(λ) using Fourier transformations. We define and obtain The Fourier transformation in equation (A5) has been solved analytically, yielding the complex Gamma function Γ. The peaks of g(e −y ) indicate the values of the E i by their positions and the normalised amplitudes A i E i by their heights. The normalisation is due to the substitution g(λ) → e −y g(e −y ).

Numerical Precision
The Fourier integrals (A4) and (A7) have to be solved numerically. We used the extremely efficient algorithms double exponential formulas [41] for low frequencies ≤ 2π and double exponential transformation for Fourier-type integrals [42] for high frequencies ≥ 2π. These techniques allow to achieve machine precision of floating point double precision arithmetics with 100 function evaluations. This however can only work as long as the result of the integral has the same order of magnitude as the maximum of the integrated function. It turns out that this is not the case for the given integrals. F (µ) decays exponentially in O exp(− π 2 |µ|) (at the same rate as K(µ)) if f (t) follows equation (A1). Thus, as |µ| grows, the sum of values e x f (e x ) ∈ O (1) approaches zero more and more, loosing significant digits. To avoid this effect one would have to employ higher precision arithmetics. With double precision arithmetics the values of F (µ) become completely unreliable in the region |µ| 20 where F (µ) approaches machine precision. In practice we find that only F (|µ| 10) is precise enough to be trusted.

Limited data
In the case relevant for this work the data is limited to a noisy time series f (t) + ν(t), t ∈ {0, . . . , n}, where ν(t) is an error. Thus we have to deal with three difficulties, namely a discrete set, a finite range and noise. Additional problems are the aforementioned limitation in precision for high frequencies and possible small gaps between decay constants E i that cannot be resolved. Ref. [43] summarises a large number of improvements to the Gardner method and we are going to mention the relevant ones explicitly below. a. Limited precision of F (µ) at high frequencies leads to a divergence of F (µ) K(µ) and thus to a divergent integral in equation (A7). If one does not have or want to spend the resources for arbitrary precision arithmetics, one is therefore forced to dampen the integrand in (A7). Gardner et al. [29] originally proposed to simply introduce a cut off to the integral. It turns out that this cut off leads to sinc-like oscillations of g(e −y ), i.e. a high number of slowly decaying spurious peaks. These oscillations can be removed by introducing a convergence factor of the form exp(− µ 2 2w 2 ) instead of the cut off [44]. The effective convolution of the exact result g(e −y ) with a Gaussian only smoothes g(e −y ) but does not introduce oscillations. We chose w = 2 for our test runs. This choice does not always yield optimal results, but it is very stable. b. Discrete data is probably easiest to compensate. The exponential of a cubic spline of log(f (t)) yields a very precise interpolation of the data. Typically for test functions the relative error is less than 10 −4 . Usually this is far below noise level. c. Finite time range is a much more severe problem. The exponential tail of f (t) for t → ∞ carries a lot of information, especially about the lowest decay modes. Thus extrapolation of the data essentially fixes the ground state energy which we are usually most inter- ested in. An extrapolation of some kind is necessary, as a cut off completely obscures the result (see Figure 11). For a proper extrapolation one would need to know at least the smallest E i in advance, removing the necessity to apply the Gardner method in the first place. In our test runs we used a linear extrapolation of the splines to the log-data. Provencher [45] proposes to multiply the complete time series by a damping term of the form t α e −βt with α, β > 0 instead of t. This leads to a suppression of the region beyond the data range, but it also moves the peaks of g(e −y ) closer together, thus decreasing the resolution. Still, Provencher does not remove the necessity of an extrapolation completely. In addition the method introduces two parameters that have to be tuned. Let us remark here that, given a reliable extrapolation or very long measurement, the inverse of Provencher's method can be used to improve resolution: Choose min(E i ) < β < 0 and so separate the lowest lying peak from the others. We show the advantage of such a shift of the decay constants in Figure 12. d. Noisy data is not a significant problem by itself, as long as the magnitude is known. Fluctuations can be captured by the bootstrap or other error propagating methods. Severe problems arise if noise is combined with the aforementioned finite range. Then extrapolations based on the last few points (e.g. with the spline method) become very unreliable. We show this effect in Figure 13 where we slightly increased the value of the very last data point.

Applicability in practice
We applied the method to some data obtained from lattice QCD simulations. With some fine tuning of β and a sensible truncation of the data (we removed points below noise level and regions not falling monotonously) one can obtain very good results. Note especially the high resolution of the ground state in Figure 14, but the relevant exited states can be resolved as well. Nevertheless we have to conclude that the Gardner method is not broadly applicable to real data commonly obtained from lattice simulations. One reason is that it requires fine tuning of several parameters to obtain good results. The main problem however is the absence of a reliable extrapolation of noisy data from the limited time range. The algorithm does not fail gracefully, i.e. there is no obvious check if the result for g(e −y ) is correct or not. Thus even though the Gardner method can yield very precise results, one cannot automatise it and rely on the correctness of the output. As a last remark we would like to add that the Gardner method is also orders of magnitude costlier in terms of computing resources than simpler methods like χ 2 -fits.