# On the generalised eigenvalue method and its relation to Prony and generalised pencil of function methods

## Abstract

We discuss the relation of a variety of different methods to determine energy levels in lattice QCD simulations: the generalised eigenvalue, the Prony, the generalised pencil of function and the Gardner methods. The three former methods can be understood as special cases of a generalised eigenvalue problem. We show analytically that the leading corrections to an energy $$E_l$$ in all three methods due to unresolved states decay asymptotically exponentially like $$\exp (-(E_{n}-E_l)t)$$. Using synthetic data we show that these corrections behave as expected also in practice. We propose a novel combination of the generalised eigenvalue and the Prony method, denoted as GEVM/PGEVM, which helps to increase the energy gap $$E_{n}-E_l$$. We illustrate its usage and performance using lattice QCD examples. The Gardner method, on the other hand, turns out to be less applicable to realistic noisy data.

## Introduction

In lattice field theories one is often confronted with the task to extract energy levels from noisy Monte Carlo data for Euclidean correlation functions, which have the theoretical form

\begin{aligned} C(t)\ =\ \sum _{k=0}^\infty \ c_k\, e^{-E_k t} \end{aligned}
(1)

with real and distinct energy levels $$E_{k+1}>E_k$$ and real coefficients $$c_k$$. It is well known that this task represents an ill-posed problem because the exponential functions do not form an orthogonal system of functions.

Still, as long as one is only interested in the ground state $$E_0$$ and the statistical accuracy is high enough to be able to work at large enough values of t, the task can be accomplished by making use of the fact that

\begin{aligned} \lim _{t\rightarrow \infty } C(t)\ \approx \ c_0 e^{-E_0 t}, \end{aligned}
(2)

with corrections exponentially suppressed with increasing t due to ground state dominance. However, in lattice quantum chromodynamics, the non-perturbative approach to quantum chromodynamics (QCD), the signal to noise ratio for C(t) deteriorates exponentially with increasing t [1]. Moreover, at large Euclidean times there can be so-called thermal pollutions (see e.g. Ref. [2]) to the correlation functions, which, if not accounted for, render the data at large t useless. And, once one is interested in excited energy levels $$E_k\,,\ k>0$$, alternatives to the ground state dominance principle need to be found.

The latter problem can be tackled applying the so-called generalised eigenvalue method (GEVM) – originally proposed in Ref. [3] and further developed in Ref. [4]. It is by now well established in lattice QCD applications and allows one to estimate ground and excited states for the price that a correlator matrix needs to be computed instead of a single correlation function. Moreover, the systematics of this method are well understood [4, 5].

An alternative method, originally proposed by de Prony [6], represents an algebraic method to determine in principle all the energy levels from a single correlation function. However, it is well known that the Prony method can become unstable in the presence of noise. The Prony method was first used for lattice QCD in Refs. [7, 8]. For more recent references see Refs. [9,10,11] and also Appendix A. For an application of the Prony method in real time dynamics with Tensor networks see Ref. [12].

In this paper we discuss the relation among generalised eigenvalue, Prony and generalised pencil of function (GPOF) methods and trace them all back to a generalised eigenvalue problem. This allows us to derive the systematic effects due to so-called excited state contributions for the Prony and GPOF methods using perturbation theory invented for the GEVM [5]. In addition, we propose a combination of the GEVM and the Prony method, the latter of which we also formulate as a generalised eigenvalue method and denote it as Prony GEVM (PGEVM). The combination we propose is to apply first the GEVM to a correlator matrix and extract the so-called principal correlators, which are again of the form Eq. (1). Then we apply the PGEVM to the principal correlators and extract the energy levels. In essence: the GEVM is used to separate the contributing exponentials in distinct principal correlators with reduced pollutions compared to the original correlators. Then the PGEVM is applied only to obtain the ground state in each principal correlator, the case where it works best.

By means of synthetic data we verify that the PGEVM works as expected and that the systematic corrections are of the expected form. Moreover, we demonstrate that the combination GEVM/PGEVM can be used to analyse example data from lattice QCD simulations: we study the pion first, where we are in the situation that the ground state can be determined with high confidence by other methods. Thereafter we also look at the $$\eta$$-meson and $$I=1$$, $$\pi \pi$$ scattering, both of which require the usage of the GEVM in the first place, but where the noise is also significant.

The paper is organised as follows: in the next section we introduce the GEVM and PGEVM and discuss the systematic errors of PGEVM. After briefly explaining possible numerical implementations, we present example applications using both synthetic data and data obtained from lattice QCD simulations. In the end we discuss the advantages and disadvantages of our new method, also giving an insight into when it is most useful.

## Methods

Perhaps the most straightforward approach to analysing the correlation function Eq. (1) for the ground state energy $$E_0$$ is to use the so-called effective mass, defined as

\begin{aligned} M_\text {eff}(t_0, \delta t)\ =\ -\frac{1}{\delta t}\log \left( \frac{C(t_0+\delta t)}{C(t_0)}\right) . \end{aligned}
(3)

In the limit of large $$t_0$$ and fixed $$\delta t$$, $$M_\text {eff}$$ converges to $$E_0$$. The correction due to the first excited state $$E_1$$ is readily computed:

\begin{aligned} M_\text {eff}(t_0, \delta t)\ \approx \ E_0 + \frac{c_1}{c_0}\, e^{-(E_1-E_0)t_0} \left( 1 - e^{-(E_1-E_0)\delta t}\right) \frac{1}{\delta t}. \end{aligned}
(4)

It is exponentially suppressed in $$t_0$$ with a rate set by the energy difference between the first excited and the ground state. It is also clear from this formula that taking the limit $$\delta t\rightarrow \infty$$ while keeping $$t_0$$ fixed leads to a worse convergence behaviour than keeping $$\delta t$$ fixed and increasing $$t_0$$. In this section we will discuss how the two equations above generalise.
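As a minimal numerical illustration of Eqs. (3) and (4), consider the following Python sketch (the two-state correlator and its parameters are made up for this example; this is not the hadron implementation):

```python
import numpy as np

# Synthetic two-state correlator; energies and amplitudes are made up
E0, E1 = 0.5, 1.2
c0, c1 = 1.0, 0.8
t = np.arange(0, 25)
C = c0 * np.exp(-E0 * t) + c1 * np.exp(-E1 * t)

def m_eff(C, t0, dt=1):
    """Effective mass Eq. (3): -log(C(t0 + dt)/C(t0)) / dt."""
    return -np.log(C[t0 + dt] / C[t0]) / dt

# at large t0 the excited-state correction Eq. (4) has died out
print(m_eff(C, 2), m_eff(C, 20))
```

At $$t_0=2$$ the effective mass still lies visibly above $$E_0$$, while at $$t_0=20$$ it has converged to $$E_0$$ for all practical purposes.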

### The generalised eigenvalue method (GEVM)

We first introduce the GEVM, which allows one to determine ground and excited energy levels in a given channel. Moreover, it helps to reduce excited state contaminations in the low lying energy levels.

Using the notation of Ref. [5], one considers correlator matrices of the form

\begin{aligned} C_{ij}(t)\ =\ \langle {{\hat{O}}}_i^{}(t')\ {{\hat{O}}}_j^\dagger (t'+t)\rangle \ =\ \sum _{k=0}^\infty e^{-E_k t} \psi _{ki}^*\psi _{kj}, \end{aligned}
(5)

with energy levels $$E_k > 0$$ and $$E_{k+1} > E_k$$ for all values of k. The $$\psi _{ki}=\langle 0|{{\hat{O}}}_i|k\rangle$$ are matrix elements of n suitably chosen operators $${{\hat{O}}}_i$$ with $$i=0, \ldots , n-1$$. Then, the eigenvalues or so-called principal correlators $$\lambda (t, t_0)$$ of the generalised eigenvalue problem (GEVP)

\begin{aligned} C(t)\, v_k(t, t_0)\ =\ \lambda ^0_k(t, t_0)\, C(t_0)\, v_k(t, t_0), \end{aligned}
(6)

can be shown to read

\begin{aligned} \lambda ^0_k(t, t_0)\ =\ e^{-E_k(t - t_0)} \end{aligned}
(7)

for $$t_0$$ fixed and $$t\rightarrow \infty$$. Clearly, in every practical application the correlator matrix C(t) will be finite with dimension n. This induces corrections to Eq. (7), which were derived in Refs. [4, 5] and read to leading order

\begin{aligned} \lambda _k(t, t_0) = b_k \lambda ^0_k(1+{\mathcal {O}}(e^{-\varDelta E_k t})) \end{aligned}
(8)

with $$b_k>0$$ and

\begin{aligned} \varDelta E_k\ =\ \min _{l\ne k}|E_l -E_k|. \end{aligned}
(9)

Most notably, the principal correlators $$\lambda _k(t, t_0)$$ are at fixed $$t_0$$ again a sum of exponentials. As was shown in Ref. [5], for $$t_0\ge t/2$$ the leading corrections are different compared to Eq. (8), namely of order

\begin{aligned} \exp [-(E_n - E_k)t]. \end{aligned}
(10)
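A short sketch of the GEVM, Eqs. (6) and (7), in Python with SciPy (a synthetic $$2\times 2$$ correlator matrix with three made-up states, so that one state remains unresolved; not the hadron implementation):

```python
import numpy as np
from scipy.linalg import eigh

# Synthetic 2x2 correlator matrix Eq. (5): three made-up states,
# so one state remains unresolved by the GEVP
E = np.array([0.5, 0.9, 1.5])
psi = np.array([[1.0, 0.3],
                [0.4, 1.0],
                [0.2, 0.5]])           # psi[k, i] = <0|O_i|k>

def corr_matrix(t):
    return sum(np.exp(-Ek * t) * np.outer(p, p) for Ek, p in zip(E, psi))

t0, t = 3, 8
# GEVP Eq. (6): C(t) v = lambda C(t0) v; C(t0) is positive definite
lam, v = eigh(corr_matrix(t), corr_matrix(t0))
lam = np.sort(lam)[::-1]               # principal correlators, largest first
energies = -np.log(lam) / (t - t0)     # compare with Eq. (7)
print(energies)
```

The two extracted energies approximate $$E_0$$ and $$E_1$$ up to corrections of the form Eq. (8) from the third, unresolved state.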

### The Prony method

For the original Prony method [6], we first restrict ourselves to a finite number n of exponentials in a Euclidean correlation function $$C^0$$

\begin{aligned} C^0(t)\ =\ \sum _{k=0}^{n-1} c_k\, e^{-E_k t}. \end{aligned}
(11)

The $$c_k$$ are real, but not necessarily positive constants and t is integer-valued. Thus, we focus on one matrix element of the correlator matrix Eq. (5) from above or other correlators with the appropriate form. We assume now $$E_k\ne 0$$ for all $$k\in \{0, \ldots , n-1\}$$ and that all the $$E_k$$ are distinct. Moreover, we assume the order $$E_{k+1} > E_k$$ for all k. Then, Prony’s method is a generalisation of the effective mass Eq. (3) in the form of a matrix equation

\begin{aligned} H\cdot x = 0, \end{aligned}
(12)

with an $$n\times (n+1)$$ Hankel matrix H

\begin{aligned} H= \begin{pmatrix} C^0(t) & C^0(t+1) & \ldots & C^0(t+n) \\ C^0(t+1) & C^0(t+2) & \ldots & C^0(t+n+1) \\ \vdots & \vdots & \ddots & \vdots \\ C^0(t+n-1) & C^0(t+n) & \ldots & C^0(t+2n-1) \\ \end{pmatrix} \end{aligned}

and a coefficient vector $$x = (x_0, \ldots , x_{n-1}, 1)$$ of length $$n+1$$. After solving for x, the exponentials are obtained from x by the roots of

\begin{aligned} x_0 + x_1\left( e^{-E_l}\right) + x_2 \left( e^{-E_l}\right) ^2 + \ldots + \left( e^{-E_l}\right) ^n = 0. \end{aligned}

For a further generalisation see Ref. [8] and references therein.
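A sketch of the steps above in Python with NumPy (a noise-free, made-up two-state correlator; in the noise-free case the method is exact):

```python
import numpy as np

# Noise-free correlator with n = 2 exponentials (made-up parameters)
E_true = np.array([0.4, 1.1])
c = np.array([1.0, 0.5])
C = lambda t: float((c * np.exp(-E_true * t)).sum())

n, t = 2, 1
# n x (n+1) Hankel matrix H of Eq. (12)
H = np.array([[C(t + i + j) for j in range(n + 1)] for i in range(n)])

# Solve H.x = 0 with x_n = 1: move the last column to the right-hand side
x = np.linalg.solve(H[:, :n], -H[:, n])

# Energies from the roots z = exp(-E_l) of the Prony polynomial;
# np.roots expects the highest-degree coefficient first
z = np.roots(np.concatenate(([1.0], x[::-1])))
energies = np.sort(-np.log(z.real))
print(energies)
```

For noisy data this direct solve becomes unstable, which is the instability of the Prony method mentioned above.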

### The Prony GEVM (PGEVM)

Next we formulate Prony’s method Eq. (12) as a generalised eigenvalue problem (see also Ref. [13]). Let $$H^0(t)$$ be an $$n\times n$$ Hankel matrix with indices $$i,j=0, 1, \ldots , n-1$$ defined by

\begin{aligned} H^0_{ij}(t)\ =\ C^0(t + i\varDelta + j\varDelta )\ =\ \sum _{k=0}^{n-1} c_k\, e^{-E_kt}\, e^{-E_ki\varDelta }\, e^{-E_kj\varDelta }, \end{aligned}
(13)

with integer $$\varDelta >0$$. $$H^0(t)$$ is symmetric, but not necessarily positive definite. We are going to show that the energies $$E_0, \ldots , E_{n-1}$$ can be determined from the generalised eigenvalue problem

\begin{aligned} H^0(t)\, v_l\ =\ \varLambda ^0_l(t, \tau _0)\, H^0(\tau _0)\, v_l. \end{aligned}
(14)

The following is completely analogous to the corresponding proof for the GEVM in Ref. [5]. Define a square matrix $$\chi$$ with elements

\begin{aligned} \chi _{ki}\ =\ e^{-E_k i\varDelta } \end{aligned}
(15)

and re-write $$H^0(t)$$ as

\begin{aligned} H^0_{ij}(t)\ =\ \sum _{k=0}^{n-1} c_k e^{-E_k t}\chi _{ki}\chi _{kj}. \end{aligned}

Note that $$\chi$$ is a square Vandermonde matrix

\begin{aligned} \chi \ =\ \begin{pmatrix} 1 & e^{-E_0\varDelta } & e^{-2E_0\varDelta } & \ldots & e^{-(n-1)E_{0}\varDelta }\\ 1 & e^{-E_1\varDelta } & e^{-2E_1\varDelta } & \ldots & e^{-(n-1)E_{1}\varDelta }\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & e^{-E_{n-1}\varDelta } & e^{-2E_{n-1}\varDelta } & \ldots & e^{-(n-1)E_{n-1}\varDelta }\\ \end{pmatrix} \end{aligned}

with all nodes $$e^{-E_k\varDelta }$$ distinct and, thus, invertible. Now, like in Ref. [5], introduce the dual vectors $$u_k$$ with

\begin{aligned} (u_k, \chi _l)\ =\ \sum _{i=0}^{n-1} (u_{k}^*)_i \chi _{li}\ =\ \delta _{kl} \end{aligned}

for $$k,l \in \{0, \ldots , n-1\}$$. With these we can write

\begin{aligned} H^0(t)\, u_l\ =\ \sum _{k=0}^{n-1} c_k e^{-E_k t}\chi _k\, (\chi _k, u_l)\ =\ c_l e^{-E_l t} \chi _l\ =\ e^{-E_l(t-\tau _0)}\, c_l e^{-E_l \tau _0} \chi _l\ =\ e^{-E_l(t-\tau _0)}\, H^0(\tau _0)\, u_l. \end{aligned}
(16)

Thus, the GEVP Eq. (14) is solved by

\begin{aligned} \varLambda ^0_k(t, \tau _0)\ =\ e^{-E_k(t-\tau _0)},\quad v_k\ \propto \ u_k\,. \end{aligned}
(17)

Moreover, much like in the case of the GEVM we get the orthogonality

\begin{aligned} (u_l,\, H^0(t) u_k)\ =\ c_l e^{-E_l t}\delta _{lk}\,,\quad k,l\in \{0, \ldots , n-1\} \end{aligned}
(18)

for all t-values, because $$H^0(t)\, u_k\propto \chi _k$$.
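The PGEVM of Eqs. (13) and (14) can be sketched in a few lines of Python with SciPy (a made-up two-state correlator; with exactly n resolved states and no noise the solution Eq. (17) is exact):

```python
import numpy as np
from scipy.linalg import eig

# Made-up two-state correlator
E_true = np.array([0.4, 1.1])
c = np.array([1.0, 0.5])
C = lambda t: float((c * np.exp(-E_true * t)).sum())

n, Delta = 2, 1
def hankel(t):
    # H^0_ij(t) = C(t + i*Delta + j*Delta), Eq. (13)
    return np.array([[C(t + (i + j) * Delta) for j in range(n)]
                     for i in range(n)])

tau0, t = 2, 5
# GEVP Eq. (14): H^0(t) v = Lambda H^0(tau0) v
Lam, v = eig(hankel(t), hankel(tau0))
energies = np.sort(-np.log(Lam.real) / (t - tau0))
print(energies)
```

Since $$H^0$$ is symmetric but not necessarily positive definite, the general solver `scipy.linalg.eig` is used here rather than `eigh`.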

#### Global PGEVM

In practice, there are two distinct ways to solve the GEVP Eq. (14): one can fix $$\tau _0$$ and determine $$\varLambda ^0_k(t, \tau _0)$$ as a function of t. In this case the solution Eq. (17) indicates that for each k the eigenvalues decay exponentially in time. Alternatively, one can fix $$\delta t = t-\tau _0$$ and determine $$\varLambda ^0_k(\delta t, \tau _0)$$ as a function of $$\tau _0$$. In this case the solution Eq. (17) reads

\begin{aligned} \varLambda ^0_k (\delta t, \tau _0)\ =\ e^{-E_k\,\delta t}\ =\ \text {const}, \end{aligned}

because $$\delta t$$ is fixed.

The latter approach allows one to formulate a global PGEVM. Observing that the matrices $$\chi$$ do not depend on $$\tau _0$$, one can reformulate the GEVP Eq. (14) as

\begin{aligned} \sum _{\tau _0} H^0(\tau _0+\delta t)\, v_l\ =\ \varLambda ^0_l(\delta t)\, \sum _{\tau _0} H^0(\tau _0)\, v_l, \end{aligned}
(19)

since $$\varLambda ^0_k$$ does not depend on $$\tau _0$$. However, this works only as long as there are only n states contributing and all these n states are resolved by the PGEVM, as will become clear below. If this is not the case, pollutions and resolved states will change roles at some intermediate $$\tau _0$$-value.
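Eq. (19) amounts to summing the Hankel matrices on both sides before solving a single GEVP, as in this sketch (a made-up correlator with exactly n resolved states, where the summed problem is exact):

```python
import numpy as np
from scipy.linalg import eig

# Made-up correlator with exactly n = 2 resolved states
E_true = np.array([0.4, 1.1])
C = lambda t: float((np.exp(-E_true * t)).sum())
n, dt = 2, 1

def hankel(t):
    return np.array([[C(t + i + j) for j in range(n)] for i in range(n)])

# Sum both sides of the GEVP over tau_0, Eq. (19)
A = sum(hankel(tau0 + dt) for tau0 in range(1, 10))
B = sum(hankel(tau0) for tau0 in range(1, 10))
Lam, _ = eig(A, B)
energies = np.sort(-np.log(Lam.real) / dt)
print(energies)
```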

#### Effects of additional states

Next, we ask what corrections to the above result are to be expected if there are more than n states contributing, i.e. a correction term

\begin{aligned} C^1(t)\ =\ \sum _{k=n}^{\infty } c_k\, e^{-E_k t} \end{aligned}
(20)

to the correlator and a corresponding correction to the Hankel matrix

\begin{aligned} \epsilon H_{ij}^1(t)\ =\ C^1(t + i + j)\ =\ \sum _{k=n}^\infty c_k\, e^{-E_k (t + i + j)}. \end{aligned}

(We have set $$\varDelta =1$$ for simplicity.) We assume that we work at large enough t such that these corrections can be considered as a small perturbation. Then it turns out that the results of Refs. [4, 5] apply directly to the PGEVM and all systematics are identical (Eq. (8) or (10)).

However, there is one key difference between GEVM and PGEVM. The GEVM with periodic boundary conditions is not able to distinguish the forward and backward propagating terms in

\begin{aligned} c\left( e^{-E t} \pm e^{-E(T-t)}\right) , \end{aligned}

as long as they come with the same amplitude. In fact, the eigenvalue $$\lambda ^0$$ will in this case also be a $$\cosh$$ or $$\sinh$$ [14]. In contrast, the PGEVM can distinguish these two terms. As a consequence, the backward propagating part needs to be treated as a perturbation like excited states and $$\varLambda ^0$$ is no longer expected to have a $$\cosh$$ or $$\sinh$$ functional form in the presence of periodic boundary conditions.

This might seem to be a disadvantage at first sight. However, we will see that this does not necessarily need to be the case.

Concerning the size of the corrections there are two regimes to consider [5]: when $$\tau _0$$ is fixed at small or moderately large values and $$\varLambda$$ is studied as a function of $$t\rightarrow \infty$$, the corrections of the form Eq. (8) apply [4]. When, on the other hand, $$\tau _0\ge t/2$$ is chosen and the effective masses Eq. (3) of the eigenvalues are studied, the corrections are reduced to $${\mathcal {O}}(e^{-\varDelta E_{n,l}t})$$ with $$\varDelta E_{m,n} = E_m - E_n$$ [5].

$$\tau _0\ge t/2$$ is certainly fulfilled if we fix $$\delta t$$ to some (small) value. However, in this case $$\varLambda ^0(t, \tau _0)$$ is expected to be independent of both t and $$\tau _0$$ once ground state dominance is reached, and $$M_\text {eff}$$ is, thus, not applicable. Therefore, we define alternative effective masses

\begin{aligned} {\tilde{M}}_{\text {eff},l}(\delta t,\tau _0)\ =\ -\frac{\log \left( \varLambda _l(\delta t, \tau _0)\right) }{\delta t} \end{aligned}
(21)

and apply the framework from Ref. [5] to determine deviations of $${\tilde{M}}_{\text {eff},l}$$ from the true $$E_l$$. The authors of Ref. [5] define $$\epsilon = e^{-(E_{n}-E_{n-1})\tau _0}$$ and expand

\begin{aligned} \varLambda _l = \varLambda ^0_l + \epsilon \varLambda ^1_l + \epsilon ^2\varLambda ^2_l + \cdots , \end{aligned}
(22)

where we denote the eigenvalues of the full problem as $$\varLambda (t, \tau _0)$$. Already from here it is clear that in the situation with $$\delta t$$ fixed and $$\tau _0\rightarrow \infty$$ the expansion parameter $$\epsilon$$ becomes arbitrarily small. Simultaneously with $$\tau _0$$ also $$t\rightarrow \infty$$. The first order correction (which is dominant for $$\tau _0\ge t/2$$) to $$\varLambda _l$$ reads

\begin{aligned} \varLambda _l(\delta t, \tau _0)\ =\ e^{-E_l\delta t} + \frac{c_n}{c_l}\, e^{-\varDelta E_{n,l}\,\tau _0} \left( e^{-\varDelta E_{n,l}\,\delta t}-1\right) c_{l, n} \end{aligned}
(23)

with the definition of $$\varDelta E_{m,n}$$ from above and constant coefficients

\begin{aligned} c_{l, n}\ =\ (v_l^0, \chi _{n})(\chi _{n}, v_l^0). \end{aligned}

These corrections are decaying exponentially in $$\tau _0$$ with a decay rate determined by $$\varDelta E_{n,l}$$ as expected from Ref. [5]. For the effective energies we find

\begin{aligned} {\tilde{M}}_{\text {eff},l}(\delta t,\tau _0)\ \approx \ E_l + \frac{c_n}{c_l}\, e^{-\varDelta E_{n,l}\,\tau _0} \left( e^{E_l\delta t}-e^{-E_{n}\delta t} \right) \frac{c_{l, n}}{\delta t}, \end{aligned}
(24)

likewise with corrections decaying exponentially in $$\tau _0$$, again with a rate set by $$\varDelta E_{n,l}$$.
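The exponential decay of these corrections can be checked numerically with a small Python sketch (three made-up states, $$n=2$$ Hankel matrix, so the third state acts as the unresolved perturbation; not the hadron implementation):

```python
import numpy as np
from scipy.linalg import eig

# Three states, but only a 2x2 Hankel matrix: the state E_2 acts as the
# unresolved perturbation (all energies and amplitudes are made up)
E = np.array([0.125, 0.3, 0.5])
C = lambda t: float(np.exp(-E * t).sum())   # c_k = 1 for all k
n, dt = 2, 1

def hankel(t):
    # H^0_ij(t) = C(t + i + j), Eq. (13) with Delta = 1
    return np.array([[C(t + i + j) for j in range(n)] for i in range(n)])

def m_eff_tilde(tau0):
    # Eq. (21) for the ground state, with delta t fixed
    Lam, _ = eig(hankel(tau0 + dt), hankel(tau0))
    return -np.log(np.max(Lam.real)) / dt

# the deviation from E_0 should shrink by roughly exp(-(E_2 - E_0))
# per unit of tau_0, cf. Eq. (24)
devs = [abs(m_eff_tilde(tau0) - E[0]) for tau0 in (10, 20)]
print(devs)
```

The deviation at $$\tau _0=20$$ is smaller than the one at $$\tau _0=10$$ by roughly the expected factor $$e^{-(E_2-E_0)\cdot 10}$$.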

### Combining GEVM and PGEVM

There is one straightforward way to combine GEVM and PGEVM: we noted already above that the principal correlators of the GEVM are again a sum of exponentials, and, hence, the PGEVM can be applied to them. This means a sequential application of first the GEVM with a correlator matrix of size $$n_0$$ to determine principal correlators $$\lambda _k$$ and then of the PGEVM with size $$n_1$$ and the $$\lambda _k$$’s as input, which we denote as GEVM/PGEVM. This combination allows us to work with two relatively small matrices, which might help to stabilise the method numerically. Moreover, the PGEVM is applied only for the respective ground states in the principal correlators and only relatively small values of $$n_1$$ are needed.

An additional advantage lies in the fact that $$\lambda _k$$ is a sum of exponentials with only positive coefficients, because it represents a correlation function with identical operators at source and sink. As a consequence, the Hankel matrix $$H^0$$ is positive definite.
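The sequential GEVM/PGEVM can be sketched as follows (a made-up $$2\times 2$$ correlator matrix with three states; the ground-state principal correlator from step 1 is fed into the PGEVM in step 2; not the hadron implementation):

```python
import numpy as np
from scipy.linalg import eigh, eig

# Made-up 2x2 correlator matrix with three states, Eq. (5)
E = np.array([0.5, 0.9, 1.5])
psi = np.array([[1.0, 0.3],
                [0.4, 1.0],
                [0.2, 0.5]])           # psi[k, i] = <0|O_i|k>

def corr(t):
    return sum(np.exp(-Ek * t) * np.outer(p, p) for Ek, p in zip(E, psi))

# Step 1: GEVM, Eq. (6) -- principal correlator of the ground state
t0 = 2
ts = np.arange(t0 + 1, 20)
lam0 = np.array([np.max(eigh(corr(t), corr(t0), eigvals_only=True))
                 for t in ts])

# Step 2: PGEVM with n_1 = 2 applied to the principal correlator
n1, tau0, t = 2, 3, 8                  # indices into lam0
H = lambda s: np.array([[lam0[s + i + j] for j in range(n1)]
                        for i in range(n1)])
Lam, _ = eig(H(t), H(tau0))
E0_est = -np.log(np.max(Lam.real)) / (t - tau0)
print(E0_est)
```

The PGEVM step removes the leading pollution left in the principal correlator, so the ground state estimate is closer to $$E_0$$ than the principal correlator's effective mass at the same times.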

### Generalised pencil of function (GPOF)

For certain cases, the PGEVM can actually be understood as a special case of the generalised pencil-of-function (GPOF) method, see Refs. [15,16,17,18] and references therein. Making use of the time evolution operator, we can define a new operator

\begin{aligned} {{\hat{O}}}_{\varDelta t}(t')\ \equiv \ {{\hat{O}}}(t'+ \varDelta t)\ =\ \exp (H \varDelta t)\ {{\hat{O}}}(t')\ \exp (-H \varDelta t). \end{aligned}
(25)

This allows us to write

\begin{aligned} \langle {{\hat{O}}}_i^{}(t')\ {{\hat{O}}}_j^\dagger (t'+t + \varDelta t)\rangle \ =\ \langle {{\hat{O}}}_i^{}(t')\ {{\hat{O}}}_{\varDelta t,j}^\dagger (t'+t)\rangle , \end{aligned}
(26)

which is the same as $$C_{ij}(t + \varDelta t)$$. Using $$i=j$$ and the operators $$O_i$$, $$O_{\varDelta t, i}$$, $$O_{2\varDelta t, i}, \ldots$$ one can define the PGEVM based on a single correlation function. Note, however, that the PGEVM is more general, as it is also applicable to sums of exponentials not stemming from a two-point function.

The generalisation is now straightforward by combining $${{\hat{O}}}_i$$ and $${{\hat{O}}}_{m\varDelta t, i}$$ for $$i=0, \ldots , n_0-1$$ and $$m=0, \ldots , n_1-1$$. These operators define a Hankel matrix $${\mathcal {H}}^0$$ with size $$n_1$$ of correlator matrices of size $$n_0$$ as follows ($$\varDelta =1$$ for simplicity)

\begin{aligned} {\mathcal {H}}^0_{\alpha \beta }(t)\ =\ \sum _{k=0}^{n'-1} e^{-E_k t}\, \eta _{k\alpha }\eta _{k\beta }^*, \end{aligned}
(27)

with

\begin{aligned} (\eta _{k})_{i n_0 + j}\ =\ e^{-E_k i}\psi _{kj}, \end{aligned}
(28)

for $$j=0, \ldots ,n_0-1$$ and $$i=0, \ldots , n_1-1$$. Then $$n'=n_0\cdot n_1$$ is the number of energies that can be resolved. $${\mathcal {H}}^0$$ is Hermitian and positive definite, and the same derivation as in the previous subsection leads to the GEVP

\begin{aligned} {\mathcal {H}}^0(t)\, v_k\ =\ \varLambda _k(t, \tau _0)\, {\mathcal {H}}^0(\tau _0)\, v_k \end{aligned}

with solutions

\begin{aligned} \varLambda ^0_k(t, \tau _0) = e^{-E_k(t - \tau _0)}. \end{aligned}

In this case the matrix $${\mathcal {H}}^0$$ is positive definite, but potentially large, which might lead to numerical instabilities. This can be alleviated by including the shifted versions $${{\hat{O}}}_{m\varDelta t,i}$$ only for a limited subset of the operators $${{\hat{O}}}_i$$, preferably for those contributing the least noise.

Note that with $$\varDelta t=0$$ GPOF reduces to the GEVM. Therefore, GPOF is a generalisation of both GEVM and PGEVM. We also remark that without noise, GEVM/PGEVM with $$n_0$$ and $$n_1$$ is exactly equivalent to GPOF with $$n'=n_0\cdot n_1$$. However, with noise this is no longer the case.

## Numerical implementation

In case the matrix at the reference time, $$C(t_0)$$ for the GEVM or $$H^0(\tau _0)$$ for the PGEVM, is positive definite, one can compute the Cholesky decomposition $$C(t_0)=L\cdot L^T$$. Then one solves the ordinary eigenvalue problem

\begin{aligned} L^{-1}\, C(t)\, L^{-T}\, w_k = \lambda _k w_k \end{aligned}

with $$w_k= L^T v_k$$.
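A sketch of the Cholesky path in Python with SciPy (toy symmetric matrices standing in for C(t) and $$C(t_0)$$; the triangular solves avoid forming $$L^{-1}$$ explicitly):

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

# Toy symmetric matrices standing in for C(t) and C(t0) (made-up numbers)
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
Ct = A @ A.T + np.eye(3)              # symmetric "C(t)"
Ct0 = B @ B.T + 3 * np.eye(3)         # positive definite "C(t0)"

L = cholesky(Ct0, lower=True)         # C(t0) = L L^T
# form L^{-1} C(t) L^{-T} without explicit inverses
M = solve_triangular(L, solve_triangular(L, Ct, lower=True).T, lower=True)
lam, w = eigh(M)                      # ordinary symmetric eigenvalue problem
v = solve_triangular(L.T, w, lower=False)   # back-transform: v_k = L^{-T} w_k
# the columns of v now solve C(t) v_k = lam_k C(t0) v_k
print(lam)
```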

If this is not the case, the numerical solution of the PGEVM can proceed along two lines. The first is to compute the inverse of $$H^0(\tau _0)$$, for instance using a QR-decomposition, and then to solve the ordinary eigenvalue problem for the matrix $$A=H^0(\tau _0)^{-1}H^0(t)$$. Alternatively, one may take advantage of the symmetry of both $$H^0(t)$$ and $$H^0(\tau _0)$$. One first diagonalises $$H^0(\tau _0)$$ with diagonal eigenvalue matrix $$\varLambda _{\tau _0}$$ and orthogonal eigenvector matrix $$U_{\tau _0}$$, and then the transformed matrix $$\varLambda _{\tau _0}^{-1/2}\,U_{\tau _0}^T\, H^0(t)\, U_{\tau _0}\,\varLambda _{\tau _0}^{-1/2}$$ with orthogonal eigenvector matrix $$U_t$$. Then, the eigenvectors of the generalised problem are given by the matrix

\begin{aligned} U\ =\ U_{\tau _0}\, \varLambda _{\tau _0}^{-1/2}\, U_t \end{aligned}

and the generalised eigenvalues read

\begin{aligned} \varLambda \ =\ U^T\, H^0(t)\, U. \end{aligned}

Note that U is in contrast to $$U_t$$ and $$U_{\tau _0}$$ not orthogonal.
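A sketch of this second approach in Python with NumPy (toy symmetric matrices; for simplicity $$H^0(\tau _0)$$ is taken positive definite here, so $$\varLambda _{\tau _0}^{-1/2}$$ stays real, while for an indefinite matrix the square root becomes complex):

```python
import numpy as np

# Toy symmetric matrices standing in for H^0(t) and H^0(tau0)
rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
Ht = A @ A.T + np.eye(3)               # symmetric H^0(t)
Htau0 = B @ B.T + 2 * np.eye(3)        # H^0(tau0), positive definite here

# diagonalise H^0(tau0), whiten with Lambda^{-1/2}, then diagonalise
# the transformed H^0(t)
w0, U0 = np.linalg.eigh(Htau0)
W = U0 / np.sqrt(w0)                   # U_{tau0} Lambda_{tau0}^{-1/2}
wt, Ut = np.linalg.eigh(W.T @ Ht @ W)
U = W @ Ut                             # generalised eigenvectors (not orthogonal)
Lam = np.diag(U.T @ Ht @ U)            # generalised eigenvalues
print(Lam)
```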

### Algorithms for sorting GEVP states

Solving the generalised eigenvalue problem in Eq. (6) for an $$n\times n$$ correlation function matrix C(t) (or Hankel matrix H) with $$t>t_0$$ results in an a priori unsorted set $$\left\{ s_k(t)\, |\, k\in [0,\ldots ,n-1]\right\}$$ of states $$s_k(t)=(\lambda _k(t,t_0), \mathrm{v}_k(t,t_0))$$ on each timeslice t, defined by an eigenvalue $$\lambda _k(t,t_0)$$ and an eigenvector $$\mathrm{v}_k(t,t_0)$$. In the following discussion we assume that the initial order of states is always fixed on the very first timeslice $$t_0+1$$ by sorting the states by eigenvalues, i.e. choosing the label k by requiring $$\lambda _0(t_0+1,t_0)> \lambda _1(t_0+1,t_0)> \cdots > \lambda _{n-1}(t_0+1,t_0)$$, s.t. the vector of states reads $$(s_{0}(t_0+1), \ldots , s_{n-1}(t_0+1) )$$.

After defining the initial ordering of states, there are many different possibilities to sort the remaining states for $$t>t_0$$. In general, this requires a prescription that for any unsorted vector of states $$(s_{(k=0)}(t), \ldots , s_{(k=n-1)}(t) )$$ yields a re-ordering $$s_{\epsilon (k)}(t)$$ of its elements. The permutation $$\epsilon (k)$$ may depend on some set of reference states $$(s_{0}({\tilde{t}}), \ldots , s_{n-1}({\tilde{t}}))$$ at time $${\tilde{t}}$$, which we assume to be in the desired order. However, for the algorithms discussed here, such an explicit dependence on a previously determined ordering at a reference time $${\tilde{t}}$$ is only required for eigenvector-based sorting algorithms. Moreover, $${\tilde{t}}$$ does not necessarily have to equal $$t_0+1$$. In fact, the algorithms discussed below are in practice often more stable when choosing e.g. the previous timeslice $$t-1$$ to determine the order of states at t while moving through the available set of timeslices in increasing order.

#### Sorting by eigenvalues

This is arguably the most basic way of sorting states; it simply consists of repeating the ordering by eigenvalues that is done at $$t_0+1$$ for all other values of t, i.e. one chooses $$\epsilon (k)$$ independent of any reference state, ignoring any information encoded in the eigenvectors, s.t.

\begin{aligned} \lambda _0(t,t_0)> \lambda _1(t,t_0)> \cdots > \lambda _{n-1}(t,t_0). \end{aligned}
(29)

The obvious advantage of this method is that it is computationally fast and trivial to implement. However, it is not stable under noise which can lead to a rather large bias and errors in the large-t tail of the correlator due to incorrect tracking of states. This is an issue for systems with a strong exponential signal-to-noise problem (e.g. the $$\eta$$,$$\eta '$$-system) as well as for large system sizes n. Moreover, the algorithm fails by design to correctly track crossing states, which causes a flipping of states at least in an unsupervised setup and tends to give large point errors around their crossing point in t.

#### Simple sorting by eigenvectors

Sorting algorithms relying on eigenvectors instead of eigenvalues generally make use of orthogonality properties. A simple method is based on computing the scalar product

\begin{aligned} c_{kl} = \langle \mathrm{v}_l({\tilde{t}}), \mathrm{v}_k(t)\rangle , \end{aligned}
(30)

where $$\mathrm{v}_l({\tilde{t}})$$ denote eigenvectors of some (sorted) reference states $$s_l({\tilde{t}})$$ at $${\tilde{t}}<t$$ and $$\mathrm{v}_k(t)$$ belongs to a state $$s_k(t)$$ that is part of the set which is to be sorted. For all values of k one assigns $$k\rightarrow \epsilon (k)$$, s.t. $$\left| c_{kl} \right| {\mathop {=}\limits ^{!}} \text {max}$$. If the resulting map $$\epsilon (k)$$ is a permutation the state indexing at t is assigned according to $$s_{k}(t) \rightarrow s_{\epsilon (k)}(t)$$. Otherwise sorting by eigenvalues is used as a fallback.

This method has some advantages over eigenvalue-based sorting methods: It can in principle track crossing states and flipping or mixing of states in the presence of noise are less likely to occur. The latter is especially an issue for resampling (e.g. bootstrap or jackknife), i.e. if state assignment fails only on a subset of samples for some value(s) of t, leading to large point errors and potentially introducing a bias. On the downside, the resulting order of states from this method is in general not unambiguous for systems with $$n>2$$ and the algorithm is not even guaranteed to yield a valid permutation $$\epsilon (k)$$ for such systems in the presence of noise, hence requiring a fallback.
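A sketch of this scalar-product sorting in Python with NumPy (the two-state example data are made up; the helper function and its fallback are illustrative, not the hadron implementation):

```python
import numpy as np

def sort_by_scalar_product(v_ref, v_t, lam_t):
    """Assign each state at t to the reference state with maximal |overlap|,
    Eq. (30); fall back to eigenvalue ordering if the assignment is not a
    valid permutation."""
    n = v_t.shape[1]
    eps = [int(np.argmax(np.abs(v_ref.T @ v_t[:, k]))) for k in range(n)]
    if sorted(eps) != list(range(n)):
        eps = list(np.argsort(np.argsort(-lam_t)))   # fallback: by eigenvalue
    order = np.argsort(eps)        # position l receives the state with eps(k) = l
    return v_t[:, order], lam_t[order]

# small made-up example: the solver returns the two states in swapped order
v_ref = np.eye(2)
v_t = np.array([[0.1, 1.0],
                [1.0, 0.1]])       # columns are eigenvectors at t
lam_t = np.array([0.2, 0.7])
v_sorted, lam_sorted = sort_by_scalar_product(v_ref, v_t, lam_t)
print(lam_sorted)
```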

#### Exact sorting by eigenvectors

The shortcomings of the aforementioned methods are readily avoided by an approach that uses volume elements instead of scalar products. This allows one to obtain an unambiguous state assignment based on (globally) maximised orthogonality. The idea is to consider the set of all possible permutations $$\{\epsilon (k)\}$$ for a given $$n\times n$$ problem and compute

\begin{aligned} c_{\epsilon }\ =\ \prod _k \left| \det \left( v_0({\tilde{t}}), \ldots , v_{k-1}({\tilde{t}}),\ v_{\epsilon (k)}(t),\ v_{k+1}({\tilde{t}}), \ldots , v_{n-1}({\tilde{t}})\right) \right| \end{aligned}
(31)

for each $$\epsilon$$. This can be understood as assigning a score for how well each individual vector $$v_k(t)$$ fits into the set of vectors at the reference timeslice $${\tilde{t}}$$ at a chosen position $$\epsilon (k)$$ and computing a global score for the current permutation $$\epsilon$$ by taking the product of the individual scores for all vectors $$v_k(t)$$. The final permutation is then chosen s.t. $$c_{\epsilon }{\mathop {=}\limits ^{!}}\text {max}$$.

Unlike the method using the scalar product, this method is guaranteed to always give a unique solution, which is optimal in the sense that it tests all possible permutations and picks the global optimum. Therefore, the algorithm is most stable under noise and well suited for systems with crossing states. Empirically, this results in e.g. the smallest bootstrap bias at larger values of t compared to any other method described here. A minor drawback of the approach is that it is numerically more expensive due to the required evaluations of (products of) volume elements instead of simple scalar products. However, this becomes only an issue for large system sizes and a large number of bootstrap (jackknife) samples.

We remark that another, numerically cheaper method leading to a definite state assignment can be obtained by replacing the score function in Eq. (31) by

\begin{aligned} c_{\epsilon } = \prod _k \left| (v_k({\tilde{t}}) \cdot v_{\epsilon (k)}(t)) \right|. \end{aligned}
(32)

For a $$2\times 2$$ problem both methods give identical results. However, for the general $$n\times n$$ case with $$n>2$$ they are no longer equivalent. This is because the method based on Eq. (31) uses more information, i.e. the individual score for each $$\epsilon (k)$$ entering the product on the r.h.s. is computed against $$(n-1)$$ vectors on the reference time-slice instead of just a single vector as it is the case for the method based on the scalar product.
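A brute-force sketch of the determinant-based score described above (Python with NumPy; the $$3\times 3$$ example data are made up, and the function is an illustration of the idea, not the hadron implementation):

```python
import numpy as np
from itertools import permutations

def sort_exact(v_ref, v_t):
    """Globally optimal assignment: score each candidate v_t[:, eps(k)] by the
    volume element it spans together with the other n-1 reference vectors and
    maximise the product of scores over all permutations."""
    n = v_ref.shape[1]
    best, best_score = None, -1.0
    for eps in permutations(range(n)):
        score = 1.0
        for k in range(n):
            M = v_ref.copy()
            M[:, k] = v_t[:, eps[k]]          # candidate placed in slot k
            score *= abs(np.linalg.det(M))
        if score > best_score:
            best, best_score = eps, score
    return list(best)

v_ref = np.eye(3)
# states at t slightly distorted and returned in the order (2, 0, 1)
v_t = v_ref[:, [2, 0, 1]] + 0.05 * np.ones((3, 3))
print(sort_exact(v_ref, v_t))
```

The factorial loop over permutations is affordable for the small n typical of GEVP applications.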

#### Sorting by minimal distance

While the methods discussed above all work fine for the standard case where the GEVP is solved with fixed time $$t_0$$ (or $$\tau _0$$) and $$\delta t$$ is varied, the situation is different for $${\tilde{M}}_\text {eff}$$ with $$\delta t$$ fixed: there are t-values for which it is numerically not easy to separate wanted states from pollutions, because the two are of very similar size in the elements of the sum of exponentials entering at these specific t-values. However, when looking at the bootstrap histogram of all eigenvalues, there is usually a quite clear peak at the expected energy value for all t-values with not too much noise.

Therefore, we implemented an alternative sorting for this situation, which works by specifying a target value $$\xi$$. Then we choose, among all eigenvalues of a bootstrap replicate, the one which is closest to $$\xi$$. The error is computed as half the distance between the 16% and 84% quantiles of the bootstrap distribution, and the central value as the mean of the 16% and 84% quantiles. For the central value one could also use the median; however, we made the above choice to obtain symmetric errors.

This procedure is much less susceptible to large outliers in the bootstrap distribution, which appear because of the problem discussed at the beginning of this sub-section.
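The minimal-distance selection and the quantile-based estimates can be sketched as follows (Python with NumPy; the bootstrap distribution is made up, with one wanted eigenvalue near 0.3 and spurious ones per replicate; not the hadron implementation):

```python
import numpy as np

def closest_eigenvalue(lams, xi):
    """Pick, per bootstrap replicate, the eigenvalue closest to the target xi."""
    lams = np.asarray(lams)
    return lams[np.argmin(np.abs(lams - xi))]

rng = np.random.default_rng(0)
# made-up bootstrap distribution: the wanted state near 0.3 plus
# three spurious eigenvalues per replicate
samples = np.array([
    closest_eigenvalue(
        np.concatenate(([rng.normal(0.3, 0.01)],
                        rng.uniform(0.0, 2.0, size=3))),
        xi=0.3)
    for _ in range(500)])

q16, q84 = np.quantile(samples, [0.16, 0.84])
central = 0.5 * (q16 + q84)    # central value: mean of the two quantiles
error = 0.5 * (q84 - q16)      # error: half the 16%-84% quantile distance
print(central, error)
```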

For the numerical experiments shown below we found little to no difference between sorting by eigenvalues and any of the eigenvector-based sortings. Thus, we will work with sorting by eigenvalues for all cases where we study $$\varLambda _l(t, \tau _0)$$ with $$\tau _0$$ fixed. On the other hand, specifying a target value $$\xi$$ and sorting by minimal distance turns out to be very useful for the case $$\varLambda _l(\delta t, \tau _0)$$ with $$\delta t$$ fixed. As it works much more reliably than the other two approaches, we use sorting by minimal distance for the $$\delta t$$ fixed case throughout this paper.

The methods used in this paper are fully implemented in an R package called hadron [19], which is freely available software.

## Numerical experiments

In this section we first apply the PGEVM to synthetic data. With this we investigate whether additional states not accounted for by the size of the Prony GEVP lead to the expected distortions in the principal correlators and effective masses. At this stage the energy levels and amplitudes are not necessarily chosen realistically, because we would first like to understand the systematics.

In the next step we apply the combination of GEVM and PGEVM to correlator matrices from lattice QCD simulations. After applying the framework to the pion, we study two realistic examples, the $$\eta$$-meson and the $$\rho$$-meson.

### Synthetic data

As a first test we apply the PGEVM alone to synthetic data. We generate a correlator

\begin{aligned} C_s(t)\ =\ \sum _{k=0}^{2} c_k\, e^{-E_k t} \end{aligned}
(33)

containing three states with $$E_k=(0.125, 0.3, 0.5), k=0,1,2$$ and $$t\in \{0, \ldots , 48\}$$. The amplitudes $$c_k$$ have been chosen all equal to 1.

We apply the PGEVM to this correlator $$C_s$$ with $$n=2$$. This allows us to resolve only two states, and we would like to see how much the third state affects the two extracted states. The result is plotted in Fig. 1. We plot $${\tilde{M}}_\text {eff}$$ of Eq. (21) as a function of t; filled symbols correspond to $$\delta t=1$$ fixed. Open symbols correspond to $$\tau _0$$ fixed with values $$\tau _0=1, 5$$ and $$\tau _0=10$$. In the left panel we show the ground state $$k=0$$, in the right one the second state $$k=1$$ resolved by the PGEVM. The solid lines represent the input values for $$E_0$$ and $$E_1$$, respectively.
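The $$n=2$$ Prony GEVP on this synthetic correlator can be sketched as follows (a minimal Python illustration, not the hadron implementation):

```python
import numpy as np
from scipy.linalg import eig

# synthetic correlator of Eq. (33): three states, all amplitudes equal to 1
E = np.array([0.125, 0.3, 0.5])
def C(t):
    return np.sum(np.exp(-E * t))

def hankel(t, n, Delta=1):
    """n x n Hankel matrix H(t)_{ij} = C(t + (i + j) * Delta)."""
    return np.array([[C(t + (i + j) * Delta) for j in range(n)]
                     for i in range(n)])

def pgevm_energies(t, dt=1, n=2, Delta=1):
    """Solve the Prony GEVP H(t + dt) v = lambda H(t) v and convert
    the eigenvalues to energies via E = -log(lambda) / dt."""
    lam = eig(hankel(t + dt, n, Delta), hankel(t, n, Delta))[0].real
    return np.sort(-np.log(lam) / dt)

# at large t the unresolved third state is exponentially suppressed
print(pgevm_energies(20))   # approximately [0.125, 0.3]
```

With only two exponentials in the data the two eigenvalues would reproduce $$e^{-E_k\,\delta t}$$ exactly; the third state induces the exponentially suppressed pollution discussed in the text.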

One observes that the third state not resolved by the PGEVM leads to pollutions at small values of t. These pollutions are clearly larger for the case of fixed $$\tau _0$$, as expected from our discussion in Sect. 2. The relative size of the pollutions is much larger in the second state with $$k=1$$ than in the state with $$k=0$$, which is also in line with the expected pollution.

We remark in passing that the values (not shown) of $$M_\text {eff}$$ of Eq. (3) of the eigenvalue $$\varLambda _k(t, \tau _0)$$ at fixed $$\tau _0$$ are almost indistinguishable on the scale of Fig. 1 from $${\tilde{M}}_\text {eff}$$ with $$\delta t$$ fixed. For the tiny differences and the influence of $$\tau _0$$ thereon see Figs. 2 and 3.

In Eq. (24) we have discussed that we expect corrections in $${\tilde{M}}_\text {eff}$$ and $$M_\text {eff}$$ to decay exponentially in $$t = \delta t + \tau _0$$. We can test this by subtracting the exactly known energy $$E_k$$ from the PGEVM results. To this end, we plot in Fig. 2 effective masses minus the exact $$E_k$$ values as a function of t. Filled symbols correspond to $${\tilde{M}}_\text {eff}$$ with $$\delta t=1$$ and open symbols (only $$k=0$$) to $$M_\text {eff}$$ with $$\tau _0=10$$. The asymptotically exponential convergence in t is nicely visible for both effective mass definitions, for $$k=0$$ as well as $$k=1$$. For $${\tilde{M}}_\text {eff}$$ the decay rate is to a good approximation $$E_2-E_0$$ for $$k=0$$ and $$E_2-E_1$$ for $$k=1$$, respectively, as expected from Eq. (24). For $$M_\text {eff}$$ the asymptotic logarithmic decay rate is approximately $$E_1-E_0$$ and, thus, worse, as expected from Eq. (8).
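The decay rate of the correction can be checked directly on the synthetic correlator of Eq. (33); the following sketch (illustrative, exact data without noise) estimates the logarithmic decay rate of the ground state correction for $${\tilde{M}}_\text {eff}$$ with $$\delta t=1$$:

```python
import numpy as np
from scipy.linalg import eig

# same synthetic three-state correlator as in Eq. (33)
E = np.array([0.125, 0.3, 0.5])
C = lambda t: np.sum(np.exp(-E * t))

def ground_state(t, n=2):
    """Ground state energy from the n = 2 Prony GEVP with delta_t = 1."""
    H = lambda s: np.array([[C(s + i + j) for j in range(n)] for i in range(n)])
    lam = eig(H(t + 1), H(t))[0].real
    return np.min(-np.log(lam))

# the correction to E_0 = 0.125 should decay like exp(-(E_2 - E_0) t)
d16 = ground_state(16) - 0.125
d24 = ground_state(24) - 0.125
rate = np.log(abs(d16 / d24)) / 8.0
print(rate)   # close to E_2 - E_0 = 0.375
```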

So far we have worked solely with $$\varDelta =1$$. In Fig. 3 we investigate the dependence of $${\tilde{M}}_\text {eff}$$ and $$M_\text {eff}$$ on $$\varDelta$$: we plot $$E-E_0$$ on a logarithmic scale as a function of t for $$\varDelta =1$$ and $$\varDelta =4$$. While $$\varDelta$$ has no influence on the convergence rate, it reduces the amplitude of the pollution for both $${\tilde{M}}_\text {eff}$$ and $$M_\text {eff}$$ by shifting the data points to the left. The reason is that a larger $$\varDelta$$ allows one to reach larger times in the Hankel matrices at the same t. A smaller $$\varDelta$$, on the other hand, allows one to go to larger t, such that the advantage of an increased $$\varDelta$$ is negligible.

In order to see the effect of so-called back-propagating states, we next investigate a correlator

\begin{aligned} C_s(t)\ =\ \sum _{k=0}^{2} c_k\, \left( e^{-E_k t}+ \delta _{k0}e^{-E_k (T-t)}\right) \end{aligned}
(34)

with a back-propagating contribution to the ground state $$E_0$$ only (see also Ref. [20]). Energies are chosen as $$E_i=(0.45,0.6,0.8)$$ and the amplitudes as $$c_i=(1, 0.1, 0.01)$$, with $$T=96$$. The result for the ground state effective energy determined from the PGEVM principal correlator is shown in Fig. 4. We show $$M_\text {eff}$$ from the principal correlator for $$\tau _0=10$$ fixed as open red symbols. The filled symbols correspond to $${\tilde{M}}_\text {eff}$$ for $$k=0$$ and $$k=1$$ with $$\delta t=1$$ fixed. Both are again for $$\varDelta =1$$.

One observes a downward bending of the two $$k=0$$ effective masses starting around $$t=28$$. The difference between $$\tau _0$$ fixed and $$\delta t$$ fixed is only visible in the t-range where the bending becomes significant. Obviously, in this region the contributions of the forward and backward propagating states become comparable in size, while the state with $$k=2$$ becomes negligible. Interestingly, for $$\delta t=1$$ fixed the state of interest is then contained in the $$k=1$$ state, while the $$k=0$$ state drops to the state with energy $$-E_0$$ (not visible in the figure). The (not fully shown) $$k=1$$ state decays from a value of 0.65 towards 0.6 in the time range from 0 to about 30, after which it abruptly drops to a value of about 0.45, which is visible in the figure.

It becomes clear that there is an intermediate region in t, in this case from $$t=28$$ to $$t=38$$, where the different contributions to the correlator cannot be clearly distinguished by the PGEVM using $${\tilde{M}}_\text {eff}$$. Around $$t=28$$ contributions by the $$k=2$$ state have become negligible, while the backward propagating state becomes important. At this point the state with $$k=1$$ becomes the pollution and the PGEVM resolves forward and backward propagating states. This transition will also be visible for the lattice QCD examples discussed next.
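That the $$n=2$$ PGEVM resolves the forward and backward propagating states once the excited states have decayed can be illustrated directly on the correlator of Eq. (34) (a minimal sketch with the parameters given above, exact data without noise):

```python
import numpy as np
from scipy.linalg import eig

# correlator of Eq. (34): back-propagating contribution to the ground state
E = np.array([0.45, 0.6, 0.8])
c = np.array([1.0, 0.1, 0.01])
T = 96
def C(t):
    return np.sum(c * np.exp(-E * t)) + c[0] * np.exp(-E[0] * (T - t))

def pgevm(t, n=2):
    H = lambda s: np.array([[C(s + i + j) for j in range(n)] for i in range(n)])
    lam = eig(H(t + 1), H(t))[0].real
    return np.sort(-np.log(lam))

# for large t the excited states are negligible and the n = 2 PGEVM
# resolves the forward (+E_0) and backward (-E_0) propagating states
print(pgevm(44))   # approximately [-0.45, 0.45]
```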

### Lattice QCD examples

As a first lattice QCD example we start with the charged pion, which gives rise to one of the cleanest signals in any correlation function extracted from lattice QCD simulations. In particular, the signal to noise ratio is independent of t. From now on quantities are given in units of the lattice spacing a, i.e. aE, aM, t/a, ... are dimensionless real numbers. However, for simplicity we set $$a=1$$.

The example we consider is the B55.32 ensemble generated with $$N_f=2+1+1$$ dynamical quark flavours by ETMC [21] at a pion mass of about $$350\ \text {MeV}$$. For details on the ensemble we refer to Ref. [21]. The correlation functions for the pion have been computed with the so-called one-end-trick and spin dilution, see Ref. [22] on 4996 gauge configurations. The time extent is $$T=64$$ lattice points, the spatial one $$L=T/2$$.

#### Pion

We look at the single pion two-point correlation function $$C_\pi ^{ll}(t)$$ computed with local sink and local source using the standard operator $${\bar{u}}\, i\gamma _5 d$$ projected to zero momentum. Since the pion is relatively light, the back-propagating state due to periodic boundary conditions is important. For this reason, we compute the cosh effective mass from the ratio

\begin{aligned} \frac{C_\pi ^{ll}(t+1)}{C_\pi ^{ll}(t)} = \frac{e^{-E_\pi (t+1)} + e^{-E_\pi (T-(t+1))}}{e^{-E_\pi t} + e^{-E_\pi (T-t)}} \end{aligned}
(35)

by solving numerically for $$E_\pi$$. The corresponding result is shown as red circles in Fig. 5 as a function of t. The effective masses $$M_\text {eff}$$ computed from the PGEVM principal correlator with $$\tau _0=2$$, $$n=2$$ and $$\varDelta =1$$ fixed are shown as blue squares. One observes that excited states are reduced but the pollution by the backward propagating state ruins the plateau. As green diamonds we show the $${\tilde{M}}_\text {eff}$$ for the principal correlator with $$\delta t=1$$, $$n=2$$ and $$\varDelta =2$$ fixed. Here, we used a target value $$\xi =0.16$$ – chosen by eye – to identify the appropriate state during resampling, see Sect. 3.1. The plateau starts as early as $$t=5$$, there is an intermediate region where forward and backward propagating states contribute similarly, and there is a region for large t, where again the ground state is identified. The apparent jump in the data at $$t=11$$ is related to coupling to a different state than on previous timeslices and is accompanied by a large error because the sorting of states is performed for each bootstrap sample. Coupling to a different state is allowed for the method with fixed $$\delta t$$ as the $$\tau _0$$ of the GEVP changes for every timeslice. In fact, this feature is a key difference to the methods with fixed $$\tau _0$$ for which the set of states is unambiguously determined by the initial choice of $$\tau _0$$, see the discussion in Sect. 3.1.4.
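The numerical solution of Eq. (35) can be sketched as follows (self-consistency check on a pure cosh signal; the root finder and the test mass $$M=0.16$$ are illustrative choices):

```python
import numpy as np
from scipy.optimize import brentq

T = 64  # time extent of the lattice

def cosh_effective_mass(c_t, c_tp1, t, T=T):
    """Solve Eq. (35) numerically for E_pi at timeslice t."""
    def f(E):
        num = np.exp(-E * (t + 1)) + np.exp(-E * (T - (t + 1)))
        den = np.exp(-E * t) + np.exp(-E * (T - t))
        return num / den - c_tp1 / c_t
    return brentq(f, 1e-6, 5.0)

# self-consistency check on a pure cosh signal with illustrative mass M = 0.16
M = 0.16
Cpi = lambda t: np.exp(-M * t) + np.exp(-M * (T - t))
print(cosh_effective_mass(Cpi(10), Cpi(11), 10))   # recovers 0.16
```

The model ratio is monotonically decreasing in $$E_\pi$$, so a simple bracketing root finder suffices.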

Once all the excited states have become negligible, the PGEVM can also resolve both forward and backward propagating states (see also Ref. [17]). For the example at hand this is shown in Fig. 6 with $$\tau _0=17$$ and $$n=2$$ fixed. For this to work it is important to choose $$\tau _0$$ large enough, such that excited states have decayed sufficiently. Interestingly, the noise is mainly projected into the state with negative energy.

In Fig. 7 we visualise the improvement realised by combining GEVM with PGEVM. Starting with a $$2\times 2$$ correlator matrix built from local and fuzzed operators, we determine the GEVM principal correlator $$\lambda _0(t)$$ using $$t_0=1$$. The $$\cosh$$ effective mass of $$\lambda _0$$ is shown as red circles in Fig. 7. In green we show $${\tilde{M}}_\text {eff}$$ of the PGEVM principal correlator $$\varLambda _0$$ obtained with $$\delta t= 1$$, $$n_1=2$$ and $$\varDelta =2$$ fixed.

Compared to Fig. 5, the plateau in $${\tilde{M}}_\text {eff}$$ starts as early as $$t=3$$. However, in particular at larger t-values the noise is also increased compared to the PGEVM applied directly to the original correlator. It should be clear that the pion is not the target system for an analysis combining GEVM and PGEVM, because its energy levels can be extracted without much systematic uncertainty directly from the original correlator. However, it serves as a useful benchmark system, for which one can also easily check for correctness.
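The sequential GEVM/PGEVM procedure can be sketched on synthetic data (the energies and operator overlaps below are illustrative, not the actual lattice correlators):

```python
import numpy as np
from scipy.linalg import eig

# hypothetical 2x2 correlator matrix with three states; energies and
# overlaps are illustrative choices for this sketch only
E = np.array([0.16, 0.5, 0.9])
A = np.array([[1.0, 0.8, 0.5],
              [0.9, -0.7, 0.6]])
def Cmat(t):
    return np.einsum('ik,jk,k->ij', A, A, np.exp(-E * t))

# step 1: GEVM with t0 = 1; the largest eigenvalue yields lambda_0(t)
t0 = 1
def lambda0(t):
    return np.max(eig(Cmat(t), Cmat(t0))[0].real)

# step 2: PGEVM with n = 2 and delta_t = 1 on the principal correlator
def pgevm_ground(t, n=2):
    H = lambda s: np.array([[lambda0(s + i + j) for j in range(n)]
                            for i in range(n)])
    lam = eig(H(t + 1), H(t))[0].real
    return -np.log(np.max(lam))

print(pgevm_ground(10))   # approximately E_0 = 0.16
```

The GEVM removes the first excited state, so the PGEVM only has to deal with the remaining, exponentially suppressed pollution in $$\lambda _0$$.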

In Fig. 8 we plot the (interpolated) bootstrap sample densities of $${\tilde{M}}_\text {eff}$$ for three t-values: $$t=4$$, $$t=10$$ and $$t=15$$. They correspond to the green diamonds in Fig. 7. One observes that at $$t=4$$ the distribution is approximately Gaussian. At $$t=15$$ the situation is similar, except that the distribution is slightly skewed towards larger $${\tilde{M}}_\text {eff}$$-values. In the intermediate region with $$t=10$$ a two-peak structure is visible, which is responsible for the large error. It is explained – see above – by the inability of the method with $$\delta t=1$$ to distinguish the different exponentials contributing to $$\lambda _0$$.

In Table 1 we have compiled fit results obtained for the pion: the first row corresponds to a fit to the effective mass of the correlator $$C_\pi ^{ll}$$ in the fit range indicated by $$t_1, t_2$$, which was chosen by eye. The second row represents the fit to $${\tilde{M}}_\text {eff}$$ with $$\delta t=1$$ fixed obtained with the PGEVM on $$C_\pi ^{ll}$$ directly (green diamonds in Fig. 5). The last row is the same, but for the combination GEVM/PGEVM (green diamonds in Fig. 7). The agreement is very good, even though the PGEVM and GEVM/PGEVM errors are larger than the ones obtained from the correlator directly. In the last column we give the value of the correlated $$\chi ^2_\text {red}=\chi ^2/\text {dof}$$, from which one observes that the fits are roughly comparable in terms of fit quality.

#### $$\eta$$-meson

As a next example we study the $$\eta /\eta ^\prime$$ system, where due to the mixing of flavour singlet and octet states the GEVM cannot be avoided in the first place. In addition, due to large contributions from fermionic disconnected diagrams the correlators are noisy, making the extraction of energy levels at late Euclidean times difficult. The $$\eta /\eta ^\prime$$ analysis on the B55.32 ensemble was first carried out in Refs. [23,24,25] using a powerful method to subtract excited states, to which we can compare. However, this excited state subtraction method is based on some (well founded) assumptions.

The starting point is a $$3\times 3$$ correlator matrix $$C_{ij}^\eta (t)$$ with light, strange and charm flavour singlet operators and local operators only. We apply the GEVM with $$t_0=1$$ and extract the first principal correlator $$\lambda _0(t)$$ corresponding to the $$\eta$$-state, which is then input to the PGEVM.

In Fig. 9 we show the effective mass of the $$\eta$$-meson for this GEVM principal correlator $$\lambda _0(t)$$ as black circles. In addition we show as red squares the effective masses of $$\varLambda _0$$ obtained from the PGEVM applied to this principal correlator with $$n_1=2$$, $$\tau _0=1$$ and $$\varDelta =1$$. The blue diamonds represent $${\tilde{M}}_\text {eff}$$ of $$\varLambda _0$$ obtained with $$n_1=3$$, $$\delta t=1$$ and $$\varDelta =1$$ fixed. The dashed horizontal line indicates the results obtained using excited state subtraction [23]. For better legibility we show the effective masses for each of the three cases only up to a certain $$t_\text {max}$$ after which errors become too large. Moreover, the two PGEVM results are slightly displaced horizontally. Note that $${\tilde{M}}_\text {eff}$$ with $$n_1=2$$ is in between $${\tilde{M}}_\text {eff}$$ with $$n_1=3$$ and $$M_\text {eff}$$ with $$n_1=2$$. We did not attempt a comparison here, but wanted to show the potential of the method.

One observes two things: excited state pollutions are significantly reduced by the application of the PGEVM to the GEVM principal correlator $$\lambda _0$$. However, the noise also increases. But, since in the effective masses of $$\lambda _0$$ there are only five points which can be interpreted as a plateau, the usage of the PGEVM can increase the confidence in the analysis.

In the corresponding $$\eta ^\prime$$ principal correlator the noise is too large to be able to identify a plateau for any of the cases studied for the $$\eta$$.

In Table 2 we present fit results to the different $$\eta$$ effective masses from Fig. 9. The agreement among the different definitions, but also with the literature value is reasonable within errors.

#### $$I=1, \pi -\pi$$-scattering

Finally, we investigate correlator matrices for the $$I=1, \pi -\pi$$-scattering. The corresponding correlator matrices were determined as part of a Lüscher analysis including moving frames and all relevant lattice irreducible representations (irreps). A detailed discussion of the framework and the theory can be found in Ref. [26]. Here we use the $$N_f=2$$ flavour ensemble cA2.30.48 generated by ETMC [27, 28], to which we apply in Ref. [29] the same methodology as discussed in Ref. [26].

The first example corresponds to the ground state in the $$A_1$$ irreducible representation with total squared momentum equal to 1 in units of $$4\pi ^2/L^2$$, for which the results are shown in the left panel of Fig. 10. In this case the effective mass computed from the GEVM principal correlator $$\lambda _0$$ shows a reasonable plateau (black circles). The red squares show $$M_\text {eff}$$ of $$\varLambda _0$$ with $$n_1=2$$, $$\tau _0=1$$ and $$\varDelta =2$$ fixed. Even though the plateau starts at earlier times, the noise increases quickly. For better legibility we therefore no longer display the energies for $$t>17$$, where the error bars become too large. When using $${\tilde{M}}_\text {eff}$$ with $$n_1=3$$, $$\delta t=1$$ and $$\varDelta =1$$, a plateau can be identified from $$t=1$$ on, with a very reasonable signal to noise ratio.

Fit results to the effective masses for the $$A_1$$ irrep are compiled in Table 3. Here one notices that, despite the visually much longer plateau range, the error on the fitted mass is significantly larger for $${\tilde{M}}_\text {eff}$$ than for the other two methods. The overall agreement is very good, though.

The same can be observed in the right panel of Fig. 10 for the $$T_{1u}$$ irrep. However, this time it is not straightforward to identify a plateau in $$M_\text {eff}$$ of $$\lambda _0$$ shown as black circles. Using $${\tilde{M}}_\text {eff}$$ instead with $$n_1=3$$, $$\delta t=1$$ and $$\varDelta = 1$$ fixed improves significantly over the traditional effective masses.

Fit results for the $$T_{1u}$$ irrep are compiled in Table 4. The conclusion is similar to the one from the $$A_1$$ irrep.

### GPOF versus GEVM/PGEVM

As discussed in Sect. 2, the sequential application of GEVM/PGEVM is equivalent to GPOF when applied to noise-free data. For data with noise, however, we experienced an advantage of GEVM/PGEVM over GPOF, which becomes larger with increasing matrix size.

While a systematic comparison is beyond the scope of this article, we show in Fig. 11 a comparison of GPOF versus GEVM/PGEVM for the case of the $$\eta$$-meson. We compare $$M_\text {eff}$$ for GEVM/PGEVM with $$n_0=3$$ and $$n_1=2$$ and GPOF with $$n'=6$$. For small t-values the agreement is very good. From $$t=12$$ onwards, however, the errors of GPOF are significantly larger than the ones from GEVM/PGEVM and from $$t=15$$ on the mean values start to differ significantly.

When doubling the matrix size, the difference becomes more pronounced: while the GEVM/PGEVM effective mass stays almost unchanged compared to Fig. 11, GPOF shows large statistical uncertainties and fluctuating mean values from small t-values on.

## Discussion

In this paper we have first discussed the relation among the generalised eigenvalue, the Prony and the generalised pencil of function methods: they are all special cases of a generalised eigenvalue method. This fact allows one to discuss systematic effects stemming from finite matrix sizes used to resolve the infinite tower of states. The results previously derived for the generalised eigenvalue method [4, 5] can be transferred and generalised to the other methods. In particular, pollutions due to unresolved states decay exponentially in time.

At the beginning of the previous section we have demonstrated with synthetic data that the PGEVM works as expected. In particular, we could confirm that pollutions due to unresolved excited states vanish exponentially in t. This exponential convergence to the wanted state is faster if $${\tilde{M}}_\text {eff}$$ Eq. (21) with $$\delta t$$ fixed is used, as expected from the perturbative description. Increasing the footprint of the Hankel matrix by increasing the parameter $$\varDelta$$ helps in reducing the amplitude of the polluting terms.

Still using synthetic data, we have shown that backward propagating states affect PGEVM effective energies at large times. But the PGEVM also makes it possible to distinguish forward from backward propagating states.

As a first example for data with noise we have looked at the pion. There are three important conclusions to be drawn here: first, the PGEVM can also resolve forward and backward propagating states in the presence of noise. Second, $${\tilde{M}}_\text {eff}$$ computed for fixed $$\delta t$$ is advantageous compared to $$M_\text {eff}$$ at fixed $$t_0$$, because in this case strong effects from the backward propagating pion can be avoided. And finally, combining GEVM and PGEVM sequentially leads to a reduction of excited state contributions.

The next two QCD examples are the $$\eta$$-meson and the $$\rho$$-meson, for which one must rely on the variational method. Moreover, the signal to noise ratio decays exponentially, such that excited state reduction is imperative.

For the case of the $$\eta$$ meson the combined GEVM/PGEVM leads to increased confidence in the extracted energy levels. For the $$I=1, \pi -\pi$$-scattering a strong improvement is visible. The latter is likely due to the large input correlator matrix to the GEVM. This leads to a large gap relevant for the corrections due to excited states and, therefore, to small excited states in the PGEVM principal correlator.

Interestingly, for the $$\rho$$-meson example studied here the signal to noise ratio in the PGEVM principal correlator at fixed $$\delta t$$ is also competitive with, if not favourable to, that of the effective mass of the GEVM principal correlator.

We have also introduced sorting by minimal distance, requiring an input parameter value $$\xi$$. It is important to have in mind that the choice of $$\xi$$ will influence the result and potentially introduce a bias. Moreover, this sorting is, like sorting by value, susceptible to misidentifications during resampling procedures.

Last but not least let us emphasise that the novel method presented here is not always advantageous and many other methods have been developed for the analysis of multi-exponential signals, each with their own strengths and weaknesses. We are especially referring to the recent developments of techniques based on the use of ordinary differential equations [30] and the Gardner method [31], for the latter see Appendix A. Both methods are in principle capable of extracting the full energy spectrum. However, the Gardner method becomes unreliable in the case of insufficient data and precision, while we have not tested the ODE method here. But the results in Ref. [30] look promising.

## Summary

In this paper we have clarified the relation among different methods for the extraction of energy levels in lattice QCD available in the literature. We have proposed and tested a new combination of generalised eigenvalue and Prony method (GEVM/PGEVM), which helps to reduce excited state contaminations.

We have first discussed the systematic effects in the Prony GEVM stemming from states not resolved by the method. They decay exponentially fast in time with $$\exp (-\varDelta E_{n, l} t_0)$$ with $$\varDelta E_{n, l} =E_{n} -E_l$$ the difference of the first not resolved energy level $$E_{n}$$ and the level of interest $$E_l$$. Using synthetic data we have shown that this is indeed the leading correction.

Next we have applied the method to a pion system and discussed its ability to also determine backward propagating states, given high enough statistical accuracy, see also Ref. [17]. Together with the results from the synthetic data we could also conclude that working at fixed $$\delta t$$ is clearly advantageous compared to working at fixed $$t_0$$, at least for data with little noise.

Finally, looking at lattice QCD examples for the $$\eta$$-meson and the $$\rho$$-meson, we find that excited state contaminations can be reduced significantly by using the combined GEVM/PGEVM. While it is not clear whether also the statistical precision can be improved, GEVM/PGEVM can significantly improve the confidence in the extraction of energy levels, because plateaus start early enough in Euclidean time. This is very much in line with the findings for the Prony method in the version applied by the NPLQCD collaboration [8].

The GEVM/PGEVM works particularly well, if in the first step the GEVM removes as many intermediate states as possible and, thus, the gap $$\varDelta E_{n, l}$$ becomes as large as possible in the PGEVM with moderately small n. The latter is important to avoid numerical instabilities in the PGEVM.

## Data Availability Statement

This manuscript has no associated data or the data will not be deposited. [Authors’ comment: The example data used in this paper is available upon request.]

## References

1. G.P. Lepage, The analysis of algorithms for lattice field theory. Invited lectures given at TASI'89 Summer School, Boulder, CO, June 4–30, 1989. Published in Boulder ASI 1989:97–120 (QCD161:T45:1989)

2. X. Feng, K. Jansen, D.B. Renner, Phys. Rev. D 83, 094505 (2011). arXiv:1011.5288 [hep-lat]

3. C. Michael, I. Teasdale, Nucl. Phys. B 215, 433 (1983)

4. M. Lüscher, U. Wolff, Nucl. Phys. B 339, 222 (1990)

5. B. Blossier, M. Della Morte, G. von Hippel, T. Mendes, R. Sommer, JHEP 04, 094 (2009). arXiv:0902.1265 [hep-lat]

6. G.R. de Prony, J. de l'École Polytech. 1, 24 (1795)

7. G.T. Fleming, What can lattice QCD theorists learn from NMR spectroscopists?, in QCD and Numerical Analysis III. Proceedings, 3rd International Workshop, Edinburgh, UK, June 30–July 4, 2003, pp. 143–152 (2004). arXiv:hep-lat/0403023 [hep-lat]

8. S.R. Beane et al., Phys. Rev. D 79, 114502 (2009). arXiv:0903.2990 [hep-lat]

9. G.T. Fleming, S.D. Cohen, H.-W. Lin, V. Pereyra, PoS LATTICE2007, 096 (2007)

10. E. Berkowitz et al., EPJ Web Conf. 175, 05029 (2018). arXiv:1710.05642 [hep-lat]

11. K.K. Cushman, G.T. Fleming, arXiv:1912.08205 [hep-lat]

12. M.C. Banuls, M.P. Heller, K. Jansen, J. Knaute, V. Svensson, arXiv:1912.08836 [hep-th]

13. B.C. Sauer, Approaches to improving $$\eta ^\prime$$ mass calculations. Master's thesis, University of Bonn (2013)

14. N. Irges, F. Knechtli, Nucl. Phys. B 775, 283 (2007). arXiv:hep-lat/0609045 [hep-lat]

15. C. Aubin, K. Orginos, AIP Conf. Proc. 1374, 621 (2011). arXiv:1010.0202 [hep-lat]

16. C. Aubin, K. Orginos, PoS LATTICE2011, 148 (2011)

17. R.W. Schiel, Phys. Rev. D 92, 034512 (2015). arXiv:1503.02588 [hep-lat]

18. K. Ottnad et al., EPJ Web Conf. 175, 06026 (2018). arXiv:1710.07816 [hep-lat]

19. B. Kostrzewa, J. Ostmeyer, M. Ueding, C. Urbach, hadron: package to extract hadronic quantities. https://github.com/HISKP-LQCD/hadron (2020). R package version 3.0.1

20. G. Bailas, B. Blossier, V. Morénas, Eur. Phys. J. C 78, 1018 (2018). arXiv:1803.09673 [hep-lat]

21. ETM Collaboration, R. Baron et al., JHEP 06, 111 (2010). arXiv:1004.5284 [hep-lat]

22. ETM Collaboration, P. Boucaud et al., Comput. Phys. Commun. 179, 695 (2008). arXiv:0803.0224 [hep-lat]

23. ETM Collaboration, K. Ottnad, C. Urbach, Phys. Rev. D 97, 054508 (2018). arXiv:1710.07986 [hep-lat]

24. ETM Collaboration, K. Ottnad et al., JHEP 11, 048 (2012). arXiv:1206.6719 [hep-lat]

25. ETM Collaboration, C. Michael, K. Ottnad, C. Urbach, Phys. Rev. Lett. 111, 181602 (2013). arXiv:1310.1207 [hep-lat]

26. M. Werner et al., Eur. Phys. J. A 56, 61 (2020). arXiv:1907.01237 [hep-lat]

27. ETM Collaboration, A. Abdel-Rehim et al., Phys. Rev. D 95, 094515 (2017). arXiv:1507.05068 [hep-lat]

28. L. Liu et al., Phys. Rev. D 96, 054516 (2017). arXiv:1612.02061 [hep-lat]

29. ETM Collaboration, M. Fischer et al., arXiv:2006.13805 [hep-lat]

30. S. Romiti, S. Simula, Phys. Rev. D 100, 054515 (2019). https://doi.org/10.1103/PhysRevD.100.054515

31. D.G. Gardner, J.C. Gardner, G. Laush, W.W. Meinke, J. Chem. Phys. 31, 978 (1959). https://doi.org/10.1063/1.1730560

32. Jülich Supercomputing Centre, J. Large Scale Res. Facilities 1 (2015). https://doi.org/10.17815/jlsrf-1-18

33. Jülich Supercomputing Centre, J. Large Scale Res. Facilities 4 (2018). https://doi.org/10.17815/jlsrf-4-121-1

34. Jülich Supercomputing Centre, J. Large Scale Res. Facilities 5 (2019). https://doi.org/10.17815/jlsrf-5-171

35. K. Jansen, C. Urbach, Comput. Phys. Commun. 180, 2717 (2009). arXiv:0905.3331 [hep-lat]

36. A. Abdel-Rehim et al., PoS LATTICE2013, 414 (2014). arXiv:1311.5495 [hep-lat]

37. A. Deuzeman, K. Jansen, B. Kostrzewa, C. Urbach, PoS LATTICE2013, 416 (2013). arXiv:1311.4521 [hep-lat]

38. ETM Collaboration, A. Deuzeman, S. Reker, C. Urbach, Comput. Phys. Commun. 183, 1321 (2012). arXiv:1106.4177 [hep-lat]

39. M.A. Clark, R. Babich, K. Barros, R.C. Brower, C. Rebbi, Comput. Phys. Commun. 181, 1517 (2010). arXiv:0911.3191 [hep-lat]

40. R. Babich et al., Scaling lattice QCD beyond 100 GPUs, in SC11 International Conference for High Performance Computing, Networking, Storage and Analysis, Seattle, Washington, November 12–18, 2011 (2011). arXiv:1109.2935 [hep-lat]

41. M.A. Clark et al., arXiv:1612.07873 [hep-lat]

42. R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2019)

43. H. Takahasi, M. Mori, Publ. Res. Inst. Math. Sci. 9, 721 (1973)

44. T. Ooura, M. Mori, J. Comput. Appl. Math. 112, 229 (1999). http://www.sciencedirect.com/science/article/pii/S037704279900223X

45. A. Jibia, M. Salami, Int. J. Comput. Theory Eng. 4, 16 (2012)

46. S. Cohn-Sfetcu, M.R. Smith, S.T. Nichols, D.L. Henry, Proc. IEEE 63, 1460 (1975)

47. S. Provencher, Biophys. J. 16, 27 (1976). http://www.sciencedirect.com/science/article/pii/S0006349576856603

## Acknowledgements

Open Access funding provided by Projekt DEAL. The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer JUQUEEN [32] and the John von Neumann Institute for Computing (NIC) for computing time provided on the supercomputers JURECA [33] and JUWELS [34] at Jülich Supercomputing Centre (JSC). This project was funded in part by the DFG as a project in the Sino-German CRC110. The open source software packages tmLQCD [35,36,37], Lemon [38], QUDA [39,40,41] and R [42] have been used.

## Author information


### Corresponding author

Correspondence to C. Urbach.

Communicated by William Detmold.

## The Gardner method


The Gardner method is a tool for the analysis of multicomponent exponential decays. It completely avoids fits and uses Fourier transformations instead. This global approach makes it extremely powerful, but also unstable. In this section we discuss why we do not find the Gardner method applicable to the analysis of correlators in lattice field theories.

### The algorithm

The most general form of a multicomponent exponential decaying function f(t) is

\begin{aligned} f(t)&= \int _0^\infty g(\lambda )\text {e}^{-\lambda t}\,\text {d}\lambda \end{aligned}
(36)

with some integrable function $$g(\lambda )$$ and t bound from below, WLOG $$t\ge 0$$. In the common discrete case we get

\begin{aligned} g(\lambda )&= \sum _{i=0}^{\infty }A_i\delta (\lambda -E_i) \end{aligned}
(37)

where the $$A_i\in {\mathbb {R}}$$ are the amplitudes, the $$E_i$$ are the decay constants, often identified with energy levels, and $$\delta$$ denotes the Dirac-Delta distribution. Gardner et al. [31] proposed to multiply Eq. (36) by $$t=\exp (x)$$ and substitute $$\lambda =\exp (-y)$$ in order to obtain the convolution

\begin{aligned} \text {e}^{x}f\left( \text {e}^{x}\right)&= \int _{-\infty }^{\infty }g\left( \text {e}^{-y}\right) \exp \left( -\text {e}^{x-y}\right) \text {e}^{x-y}\,\text {d}y. \end{aligned}
(38)

This equation can now easily be solved for $$g(\lambda )$$ using Fourier transformations. We define

\begin{aligned} F(\mu )&:=\frac{1}{\sqrt{2\pi }}\int _{-\infty }^{\infty }\text {e}^{x}f\left( \text {e}^{x}\right) \text {e}^{{{\,\mathrm{i}\,}}\mu x}\,\text {d}x\,, \end{aligned}
(39)
\begin{aligned} K(\mu )&:=\frac{1}{\sqrt{2\pi }}\int _{-\infty }^{\infty }\exp \left( -\text {e}^{x}\right) \text {e}^{x}\text {e}^{{{\,\mathrm{i}\,}}\mu x}\,\text {d}x \end{aligned}
(40)
\begin{aligned}&= \frac{1}{\sqrt{2\pi }}\varGamma (1+{{\,\mathrm{i}\,}}\mu ) \end{aligned}
(41)

and obtain

\begin{aligned} g(\text {e}^{-y})&= \frac{1}{2\pi }\int _{-\infty }^\infty \frac{F(\mu )}{K(\mu )}\text {e}^{-{{\,\mathrm{i}\,}}y\mu }\,\text {d}\mu . \end{aligned}
(42)

The Fourier transformation in Eq. (40) has been solved analytically, yielding the complex Gamma function $$\varGamma$$.
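This analytic result is easily verified numerically; the following small quadrature check compares Eq. (40) with Eq. (41):

```python
import numpy as np
from scipy.special import gamma

def K_numeric(mu, xmin=-30.0, xmax=5.0, npts=70001):
    """Evaluate the Fourier integral of Eq. (40) by direct summation
    on a uniform x grid (the integrand vanishes outside [xmin, xmax])."""
    x = np.linspace(xmin, xmax, npts)
    dx = x[1] - x[0]
    integrand = np.exp(-np.exp(x)) * np.exp(x) * np.exp(1j * mu * x)
    return integrand.sum() * dx / np.sqrt(2.0 * np.pi)

# compare with the analytic result of Eq. (41)
mu = 3.0
print(abs(K_numeric(mu) - gamma(1.0 + 1j * mu) / np.sqrt(2.0 * np.pi)))   # small
```

The substitution $$u=\text {e}^x$$ maps Eq. (40) onto the defining integral of $$\varGamma (1+{{\,\mathrm{i}\,}}\mu )$$, which is what the quadrature confirms.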

The peaks of $$g(\text {e}^{-y})$$ indicate the values of the $$E_i$$ by their positions and the normalised amplitudes $$A_i E_i$$ by their heights. The normalisation is due to the substitution $$g(\lambda )\mapsto \text {e}^{-y}g(\text {e}^{-y})$$.

### Numerical precision

The Fourier integrals (39) and (42) have to be solved numerically. We used the highly efficient double exponential formulas [43] for low frequencies $$\le 2\pi$$ and the double exponential transformation for Fourier-type integrals [44] for high frequencies $$\ge 2\pi$$.

These double exponential techniques achieve double precision floating point machine accuracy with $$\lesssim 100$$ function evaluations. This, however, only works as long as the result of the integral has the same order of magnitude as the maximum of the integrand. It turns out that this is not the case for the given integrals: $$F(\mu )$$ decays exponentially as $${{\,\mathrm{{\mathcal {O}}}\,}}\left( \exp (-\frac{\pi }{2}|\mu |)\right)$$ (at the same rate as $$K(\mu )$$) if f(t) follows Eq. (36). Thus, as $$|\mu |$$ grows, the sum over the values $$\text {e}^{x}f\left( \text {e}^{x}\right) \in {{\,\mathrm{{\mathcal {O}}}\,}}\left( 1\right)$$ approaches zero, losing more and more significant digits. To avoid this effect one would have to employ higher precision arithmetics.

With double precision arithmetics the values of $$F(\mu )$$ become completely unreliable in the region $$|\mu | \gtrsim 20$$, where $$F(\mu )$$ approaches machine precision. In practice we find that only $$F(|\mu | \lesssim 10)$$ is precise enough to be trusted.
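The decay rate, and hence the precision barrier, can be read off from the closed form of $$K(\mu )$$, since $$|\varGamma (1+{{\,\mathrm{i}\,}}\mu )|$$ falls off like $$\exp (-\frac{\pi }{2}|\mu |)$$. A small sketch (the sample values of $$\mu$$ are ours):

```python
import numpy as np
from scipy.special import gamma

# successive magnitudes fall by roughly exp(-pi/2) ~ 0.21 per unit in mu
ratio = abs(gamma(1.0 + 16j)) / abs(gamma(1.0 + 15j))

# at mu = 20 the kernel is already of order 1e-13, i.e. at the level of
# double precision round-off relative to O(1) input data
K20 = abs(gamma(1.0 + 20j)) / np.sqrt(2.0 * np.pi)
```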

### Limited data

In the case relevant for this work the data is limited to a noisy time series $$f(t)+\nu (t)$$, $$t\in \{0,\ldots ,n\}$$, where $$\nu (t)$$ is an error term. Thus we have to deal with three difficulties, namely discrete sampling, a finite time range and noise. Additional problems are the aforementioned limitation in precision at high frequencies and possibly small gaps between decay constants $$E_i$$ that cannot be resolved. Ref. [45] summarises a large number of improvements to the Gardner method; we mention the relevant ones explicitly below.

Limited precision of $$F(\mu )$$ at high frequencies leads to a divergence of $$\frac{F(\mu )}{K(\mu )}$$ and thus to a divergent integral in Eq. (42). Unless one can and wants to spend the resources for arbitrary precision arithmetics, one is therefore forced to dampen the integrand in Eq. (42). Gardner et al. [31] originally proposed to simply introduce a cut off in the integral. It turns out that this cut off leads to sinc-like oscillations of $$g(\text {e}^{-y})$$, i.e. a large number of slowly decaying spurious peaks. These oscillations can be removed by replacing the cut off with a convergence factor of the form $$\exp (-\frac{\mu ^2}{2w^2})$$ [46]. The effective convolution of the exact result $$g(\text {e}^{-y})$$ with a Gaussian only smoothes $$g(\text {e}^{-y})$$ but does not introduce oscillations. We chose $$w=2$$ for our test runs. This choice does not always yield optimal results, but it is very stable.
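For a single exponential $$f(t)=\text {e}^{-Et}$$ the ratio $$F(\mu )/K(\mu )=E^{-1-{{\,\mathrm{i}\,}}\mu }$$ is known in closed form, which permits a compact sketch of Eq. (42) with the Gaussian convergence factor; all parameter values below are illustrative:

```python
import numpy as np

E, w = 0.5, 2.0                          # decay constant and damping width
mu = np.linspace(-40.0, 40.0, 4001)
dmu = mu[1] - mu[0]
FK = E ** (-1.0 - 1j * mu)               # F(mu)/K(mu) for f(t) = exp(-E t)
damp = np.exp(-mu**2 / (2.0 * w**2))     # Gaussian convergence factor [46]

# Eq. (42) with the damping factor, evaluated on a grid in y
y = np.linspace(-3.0, 3.0, 601)
g = np.array([(FK * damp * np.exp(-1j * yy * mu)).sum().real for yy in y])
g *= dmu / (2.0 * np.pi)

# instead of sinc-like oscillations, a single Gaussian-smeared peak
# appears at e^{-y} = E
y_peak = y[np.argmax(g)]
```

The recovered peak position satisfies $$\text {e}^{-y_{\mathrm {peak}}}\approx E$$, smeared by a Gaussian of width set by w.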

Discrete data is probably the easiest difficulty to compensate for. The exponential of a cubic spline of $$\log (f(t))$$ yields a very precise interpolation of the data. For typical test functions the relative error is less than $$10^{-4}$$, which is usually far below the noise level.
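This interpolation step can be sketched on synthetic two-state data (the decay constants are our test values; we assert only a conservative accuracy bound, since the $$10^{-4}$$ figure depends on the test function):

```python
import numpy as np
from scipy.interpolate import CubicSpline

t = np.arange(0, 31)                             # discrete time slices
f = np.exp(-0.5 * t) + 0.3 * np.exp(-1.2 * t)    # synthetic correlator

spline = CubicSpline(t, np.log(f))               # spline of the log-data

t_fine = t[:-1] + 0.5                            # midpoints between samples
f_interp = np.exp(spline(t_fine))
f_exact = np.exp(-0.5 * t_fine) + 0.3 * np.exp(-1.2 * t_fine)
rel_err = np.max(np.abs(f_interp / f_exact - 1.0))
```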

The finite time range is a much more severe problem. The exponential tail of f(t) for $$t\rightarrow \infty$$ carries a lot of information, especially about the lowest decay modes. Thus an extrapolation of the data essentially fixes the ground state energy, in which we are usually most interested. An extrapolation of some kind is necessary, as a cut off completely obscures the result (see Fig. 12). For a proper extrapolation, however, one would need to know at least the smallest $$E_i$$ in advance, removing the necessity to apply the Gardner method in the first place. In our test runs we used a linear extrapolation of the spline of the log-data.

Provencher [47] proposes to multiply the complete time series by a damping term of the form $$t^\alpha \text {e}^{-\beta t}$$ with $$\alpha ,\beta >0$$ instead of t. This suppresses the region beyond the data range, but it also moves the peaks of $$g(\text {e}^{-y})$$ closer together, thus decreasing the resolution. Still, Provencher's method does not completely remove the need for an extrapolation. In addition it introduces two parameters that have to be tuned.

Let us remark here that, given a reliable extrapolation or a very long measurement, the inverse of Provencher's method can be used to improve the resolution: choose $$\beta$$ with $$-\min (E_i)< \beta < 0$$, thereby separating the lowest lying peak from the others. We show the advantage of such a shift of the decay constants in Fig. 13.
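A sketch of this inverse damping on synthetic data (decay constants and $$\beta$$ are illustrative): multiplying the series by $$\text {e}^{-\beta t}$$ shifts every $$E_i$$ to $$E_i+\beta$$, and with negative $$\beta$$ close to $$-\min (E_i)$$ the relative gap between the two lowest states grows substantially.

```python
import numpy as np

E = np.array([0.5, 0.8, 1.3])        # hypothetical decay constants
beta = -0.45                         # -min(E) < beta < 0
t = np.arange(0, 32)

f = np.exp(-np.outer(E, t)).sum(axis=0)
f_shift = f * np.exp(-beta * t)      # shifts every E_i to E_i + beta

ratio_before = E[1] / E[0]                   # relative gap 1.6
ratio_after = (E[1] + beta) / (E[0] + beta)  # relative gap 7.0
# the effective mass of the shifted series approaches E[0] + beta = 0.05
m_eff = np.log(f_shift[30] / f_shift[31])
```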

Noisy data is not a significant problem by itself, as long as its magnitude is known. Fluctuations can be captured by the bootstrap or other error propagation methods. Severe problems arise when noise is combined with the aforementioned finite range: extrapolations based on the last few points (e.g. with the spline method) then become very unreliable. We show this effect in Fig. 14, where we slightly increased the value of the very last data point.

### Applicability in practice

We applied the method to data obtained from lattice QCD simulations. With some fine tuning of $$\beta$$ and a sensible truncation of the data (we removed points below the noise level and regions not decaying monotonically) one can obtain very good results. Note especially the high resolution of the ground state in Fig. 15; the relevant excited states can be resolved as well.

Nevertheless we have to conclude that the Gardner method is not broadly applicable to real data commonly obtained from lattice simulations. One reason is that it requires fine tuning of several parameters to obtain good results. The main problem, however, is the absence of a reliable extrapolation of noisy data from the limited time range. The algorithm does not fail gracefully, i.e. there is no obvious check whether the result for $$g(\text {e}^{-y})$$ is correct or not. Thus, even though the Gardner method can yield very precise results, one cannot automatise it and rely on the correctness of the output.

As a last remark we would like to add that the Gardner method is also orders of magnitude costlier in terms of computing resources than simpler methods like $$\chi ^2$$-fits.
