Turing's method for the Selberg zeta-function

In one of his final research papers, Alan Turing introduced a method to certify the completeness of a purported list of zeros of the Riemann zeta-function. In this paper we consider Turing's method in the analogous setting of Selberg zeta-functions, and we demonstrate that it can be carried out rigorously in the prototypical case of the modular surface.


Introduction
In [28], Turing described and implemented a numerical procedure for verifying the Riemann Hypothesis (RH) up to a given height T in the critical strip. Turing's procedure was similar to earlier numerical investigations of RH by Gram [11], Backlund [2], Hutchinson [16] and Titchmarsh [27], in that they were all based on isolating zeros on the critical line by finding sign changes in the Hardy function Z(t), and then confirming that no zeros had been missed. Turing's approach differs only in the latter step; where the earlier authors used ad hoc procedures that are valid only for small values of T , Turing introduced a method for certifying the completeness of a purported list of zeros of Z(t) that is guaranteed to work (when the list is in fact complete). Turing's method has remained a small but essential ingredient in all subsequent verifications of RH and its many generalizations; see [5] and [6] for more on Turing's method and its historical background.
Meanwhile, researchers in the high energy physics community have since the early 1990s applied the same idea to certifying lists of zeros of Selberg zeta-functions for hyperbolic manifolds, albeit at a heuristic level (without explicit error estimates) and without attribution to Turing; see for instance [25], where it was used in one of the first investigations of large eigenvalues of the Laplacian for the modular group, PSL (2, Z). In this paper we show, much in the spirit of Turing's computations, that the method can be made rigorous in the case of the modular group.
We begin by describing Turing's method in greater detail, in the context of the Selberg zeta-function. Let H = {z ∈ C : ℑ(z) > 0} denote the hyperbolic plane, and let {f j } ∞ j=1 be a complete sequence of Hecke-Maass cuspforms on PSL(2, Z)\H, with Laplacian eigenvalues 1 4 + r 2 j satisfying 0 < r 1 ≤ r 2 ≤ · · · . Then the associated Selberg zeta-function Z PSL(2,Z) (s) has zeros at s = 1 2 ± ir j . Let N(t) = #{j : r j ≤ t} denote the counting function of zeros in the upper half plane, and suppose that we have accurately computed several r j up to some height T , so we can construct a minorant N − (t) of the step function N(t), for t ≤ T . Suppose hypothetically that we miss an r j at T − H for some H > 0, so that N(t) ≥ N − (t) + 1 for t ≥ T − H.
2010 Mathematics Subject Classification. 11M36, 11F72. The authors were partially supported by EPSRC Grant EP/K034383/1. 1 Integrating this inequality, we get where h 0 (r) = max(0, T −|r|) and Tr(h) = ∞ j=1 h(r j ) denotes the trace of h over the cuspidal spectrum. Unfortunately, although h 0 has a trace in this sense, it is not suitable for applying the Selberg trace formula, but we can get around that by replacing h 0 by a majorant h + 0 with Fourier transform of compact support. Thus, we have T 0 N − (t) dt + H ≤ Tr(h + 0 ). Applying the trace formula, the right-hand side will be the expected main term (described by Weyl's law) plus the error that arises from truncating the support of the Fourier transform. If it happens that we have not actually missed any zeros and we know them precisely enough then we can expect N − (t) dt to be close to the main term, and in fact it may even exceed the main term sometimes. This gives us an upper bound H ≤ H + , i.e. we can provably show that there are no missing zeros up to height T − H + . Moreover, using both upper and lower bounds for Tr(h 0 ) we can estimate T 2 T 1 N(t) dt, which would allow one to carry out the procedure using only the zeros in an interval around T .
This approach is guaranteed to work for large enough T because the error term S(t) in Weyl's law has mean value 0; precisely, it is known that 1 The remainder of the paper is devoted to obtaining such a bound with explicit (and practical!) constants. Our precise result is the following.
Then for T > 1, To demonstrate the usefulness of this bound in practice, we applied it to the list of zeros r j ≤ 178, which are shown to 20 decimal place accuracy at [8]. This list was kindly provided to us by Andreas Strömbergsson, who computed the r j using Hejhal's algorithm [13] and certified them using the program from [10]. Using Theorem 1.1, we obtain the following: The Selberg zeta-function for PSL(2, Z)\H has exactly 2 184 zeros with imaginary part in (0, 177.75]. All of them are simple.

Remarks.
(1) Our method can also be used to show the lower bound though we do not prove that here. The factor of 2 difference between the upper and lower bounds is due to an asymmetry when approximating h 0 above and below by band-limited functions, as we make explicit in the next section.

2
(2) Following Selberg, Hejhal proved the analogue of the estimate 1 T T 0 S(t) dt ≪ (log T ) −2 for a general cofinite Fuchsian group, using the theory of the Selberg zeta-function; see [15, Ch. 2, Thm. 9.7] and [14, Ch. 10, Thm. 2.29]. Sarnak has suggested that this estimate could be obtained directly via the trace formula, and our work realizes that goal in the case of PSL(2, Z). Our method could be generalized to congruence subgroups, and we expect the implied constants that it produces to compare favorably to the Selberg-Hejhal method (which has not been made explicit, to our knowledge).
In the specific case of the modular group, the full asymptotic for N(t) appearing in Theorem 1.1 was computed by Steil [25]. A detailed proof was given by Jorgenson, Smajlović and Then [19], who also computed the lower-order terms of the asymptotic in the case of moonshine groups.
(3) The leading order constant (π/12) 2 could in principle be divided by 4 by using the Kuznetsov formula and the method of Li and Sarnak [20]; however, it would be quite cumbersome to work out an explicit error term in that setting. A more practical means of achieving a factor of 2 savings in the asymptotic result would be to split the spectrum into even and odd parts and derive a bound for each separately; even there, however, the secondary error terms would be worse, and it would likely not result in a savings for T of any practical size. (4) The estimate (1.2) is substantially weaker than Turing's estimate in the context of the Riemann zeta-function, which is O( log T T ). One key reason for this is that the Selberg zeta-function has a much higher density of zeros at large height (∼ 1 6 t vs. ∼ 1 2π log t for Riemann zeta), which in turn makes S(t) noisier; see Figures 1 and 2 for a comparison of the two over the range t ∈ [0, 100]. In fact it is only by virtue of the fact that the analogue of RH is known to hold in this context that we can prove that 1 T T 0 S(t) dt = o(1) as T → ∞. The true rate of decay of 1 T T 0 S(t) dt is not known, but extensive numerics of Then [26] suggest that it should be o(T − 1 2 ). The outline of the paper is as follows. In Section 1.1, we show how the optimization of our upper bound leads naturally to an extremal problem in Fourier analysis, and we sketch a solution that explains the leading-order constants that we can expect to achieve. In Section 2, we recall the Selberg trace formula for PSL(2, Z), which will be our main tool. We will then deduce an asymptotic for the main term of the trace formula applied to h 0 (Section 3) which we can use to estimate the average of S(t) for various ranges of T (Section 4). Section 5 describes the computation to bound the constant B introduced Acknowledgement. We thank Peter Sarnak for helpful comments and the anonymous referees for their valuable feedback and corrections. 1.
1. An extremal problem. The main error in our estimate comes from approximating h 0 by h + 0 for the upper bound, and similarly by a minorant for the lower bound. To control these errors, we have two contrary objectives. First, we want h + 0 to be a good approximation to h 0 , in order to minimize the contribution from the main terms of the trace formula, which for large T are essentially of the form 1 12 R |r|h + 0 (r) dr. Second, we want the support of the Fourier transform of h + 0 to be as small as possible, in order to control the contribution from the hyperbolic terms. Thus, we are led naturally to the following extremal problem.
There are functions f ± as above with Remark. It seems likely that this result is asymptotically best possible. However, we do not attempt to prove that assertion here.
Clearly we want to minimize R F (x) dx. Equivalently, V should be a majorant of v of exponential type 2π such that R (V (x) − v(x)) dx is minimal. The unique such function, found recently by Littmann [21], 12 . Unfortunately, we cannot simply set V = V 0 in our application, for then the integral over |x| > ∆ in (1.5) would diverge. However, it is not hard to see that one can come arbitrarily close to the constant 1 12 using an approximation argument (e.g., we can take V as defined in Proposition 6.4, with X = 1 and δ sufficiently small); we carry out the full details of this in §6.1.

The trace formula for PSL(2, Z)
Our main tool will be the following version of the Selberg trace formula.

Then,
In the definition of D, we write Remark. We refer to I, E, P , D and C as the identity, elliptic, parabolic, discrete and continuous terms, respectively. For "wide" test functions (those of the sort used to measure Weyl's Law), one can think of M and R as the main term and remainder, respectively. Although the terms of the trace formula are only defined for analytic test functions, both Tr(h) and M(h) can be interpreted for any even, continuous function h : R → C satisfying h(t) ≪ (1 + |t|) −2−δ . In turn, for any such h, we may define R(h) by the equality Proof. Theorem 3 of [9] gives the trace formula for Γ ± 0 (N) with character χ. Thus we need to set N = 1, take χ to be the trivial character of modulus q(χ) = 1 and then sum over ǫ ∈ {0, 1} to account for both even and odd eigenfunctions. We also define χ d and t 2 − 4 = dl 2 as in the statement of the theorem and d(n) to be the usual divisor function. Now, considering the term we note that the product is empty so ǫ ∈ {0, 1} together contribute 1 12 R rh(r) tanh(r) dr = I(h).
Turning now to since χ(−1) = 1 (or, in the language of [9], χ is "pure"), we have C χ,ǫ = 2, 0 for ǫ = 0, 1, respectively. Thus we get The next line, and combining this with the previous term we get The final line contributes nothing as the first sum therein is empty and the first term of the penultimate line does not apply either as χ = 1. The second term, i.e.
This leaves us with the term spanning lines 2 and 3. [9, (2.60)] gives us χ(δ) δ 2 −tδ+n≡0 = 1 and the argument on page 142 of the same shows that . This leads us to and |r[1] 1 | is the size of its norm one unit group. Considering the sum for |t| ≥ 3, where t 2 − 4 > 0, we get by Dirichlet's class number formula. We now combine (2.1) and (2.2) to get D(ĥ). Turning where we have used the fact that h is even. Thus the A(±1, 1) terms together with the A(0, 1) term make up E(h). Clearly the t = 2 term contributes nothing as √ 4 − 4 ∈ Q. Finally, we have the contribution from the constant function with eigenvalue λ = 1 4 +r 2 = 0, which leads to and we have the first form of the trace formula as stated. We now use the results of [9, p. 141] (adjusted for our definition of the Fourier transform) to convert from h toĥ, to whit: dt.
where N is the counting function of Theorem 1.1. Motivated by this, we establish the following asymptotic upper bound for M(h 0 ): The proof proceeds by examining the contributions from the identity term, the elliptic terms, the parabolic terms and the constant eigenfunction in turn, as follows.
Proof. We have 3.2. The elliptic terms.
10 Thus we can write 3.3. The parabolic terms. We will need the following preparatory lemma. and Proof.
This leads us to which, by the miracle that is Maple TM , gives In addition, by [30, (A.14)] we have and we are done.
We can now handle the parabolic term as follows: Then Proof. We write We now write log Γ(z) = A(z) + R(z) as in Lemma 3.4. Then we have The integrals involving A evaluate to and since T > 1 we have so we can maximise the contribution from A with 1 π We can now use Lemma 3.4 to handle the integrals of R from σ to σ + i∞ and then using [12, (4.1)] to bound the error in Stirling's approximation we have

The constant eigenfunction.
Lemma 3.6. The contribution from the constant eigenfunction is trivially 3.5. Proof of Proposition 3.1. We now combine Lemmas 3.2, 3.3, 3.5 and 3.6, observing that is negative for T ≥ 4. 13 4. An upper bound on S(t) dt.
We will now use the trace formula to derive an upper bound for S(t) dt. We start with some preparatory lemmas.
Lemma 4.1. Let ϕ : R → R be a smooth, even function such that x 2 ϕ(x) is absolutely integrable and R ϕ(x) dx = 1. Define Then F is even and absolutely integrable, with Fourier transform Proof. Define Then U(y) − u(y) is an odd function of y, and Hence, for r ≤ 0, we have Since U − u is odd, F is even, and therefore absolutely integrable, by the above. Hence, F has a continuous Fourier transform.
Proof. By the convolution theorem, we have A direct computation shows thatĥ 0 (t) = T 2 sinc 2 (πT t), where  Then, for any T ∈ R, we have where C(·) is as defined in Proposition 2.1.
Proof. Define f (T ) := C cos(2πT t) 2(πt) 2φ (t) . We first observe that cosh(πt)−1 2(πt) 2φ (t) is absolutely integrable, so by the Riemann-Lebesgue lemma, lim T →∞ f (T ) = 0. Now differentiating twice with respect to T gives us On the other hand, if we start from and differentiate twice with respect to T , we get g ′′ (T ) = f ′′ (T ). Furthermore, Lemma 4.1 shows that V (r) − max(0, r) is even, and we get and this vanishes in the limit as T → ∞. Therefore, f = g. Then for T ≥ 4 we have with the constant C 0 defined as in Proposition 3.1 above.
Proof. We wish to derive an upper bound for Tr(h 0 ). As established by Proposition 3.1, we can handle M(h 0 ), and we now proceed by considering R(h 0 ) = R(h 1 ) + R(h 0 − h 1 ), where h 1 majorizes h 0 and has a trace we can actually compute. By hypothesis and Lemma 4.1, F is even and non-negative, so if we set h 1 = h 0 * ϕ + 2F then h 1 majorizes h 0 . Further, by Lemma 4.2, we have Returning to h 1 , by Lemma 4.1 we havê We are almost there, but we must cater for the 1 2(πt) 2 term. To that end, we have where B is the constant defined in the statement of the theorem. Finally, Lemma 4.3 gives the formula for the continuous part.

4.1.
Bound for large T . For relatively small T , we can engineer things so that computing the discrete term is tractable. For large T , we use the following: Proposition 4.5. Let the notation be as in Proposition 4.4 and assume thatφ(t) ≥ 0 for t ∈ R. Then for T ≥ 4, Proof. We bound the contribution from the discrete part trivially, using the assumption that ϕ is non-negative: and we are done.

Bounding the constant B
The first task is to derive a rigorous bound for the constant B, which we recall is defined via
Proof. This is a messy but straightforward application of known Fourier transforms.
A priori, there is no guarantee that Strömbergsson's list of r j is complete. However, we will choose b so that there are no unknown r j ≤ b (see §5.3 below). This makes h 2 (r j ) non-positive for any unknown r j , so that where Tr * is the trace over the known r j and Tr † is the trace over the rest. Thus

5.2.
Procedure. We aim to rigorously compute

5.2.1.
Computing Tr * (h 2 ). We have The first integral we compute numerically for each r j in our database using Theorem A.1.
The second integral becomes which again we compute for each of our known r j . The function Si above is the sine integral Si(x) = x 0 sin y y dy.

5.2.2.
Computing I(h 2 ). We have When t is small, computing the x k for Theorem A.1 will produce an interval that straddles zero. To avoid this, we work instead withĥ .
Proof. We majorize the tail of the Taylor expansion with the obvious geometric series.
Proof. We use the trivial estimate

5.2.4.
Computing P (h 2 ). As a reminder, we have Using Maple TM we getĥ To compute a 0 log(4 sinh(πt/2))ĥ ′ 2 (t) dt 20 we must handle the singularity at t = 0. We compute a 0 (log(4 sinh(πt/2)) − log t)ĥ ′ where the second integral can be computed analytically to yield Once past 4a we compute the three further integrals and then bound the tail using the following lemma.

Computing D β(t)
2π 2 t 2 . This is straightforward other than the evaluations of L(1, χ d ). To this end we produced a database containing class numbers h(d) and fundamental units (u and v such that u 2 − dv 2 = ±4 with u, v ∈ Z >0 and v minimal). The database covered each fundamental discriminant d such that dl 2 = t 2 − 4 with t ∈ [3, 10 5 ]. We used the PARI function qfbclassno 1 to compute the class numbers [4] and the PQA algorithm due to Lagrange [24] to compute the fundamental units. This then allows us to compute L(1, χ d ) for this range of d rapidly and rigorously using Dirichlet's class number formula Since β(t) is zero for t / ∈ [−4a, 4a], we can take a = 1 4π log 10 5 + √ 10 10 − 4 2 = 0.916169 . . .
without running out of pre-computed class group data. We also need 1 π ∞ n=1 Λ(n) n h log n π , so for this value of a we need to sum over primes and prime powers ≤ 99 991.
1 quadclassuint is faster but the correctness of its results is conditional on GRH. 21 5.2.6. Computing C β(t) 2π 2 t 2 . This reduces to the finite integral The only issue here is the computation when t is small, when we resort to the series expansion of t −2 (cosh(t) − 1) using the following lemma.
. We will initially take b = The computation takes about twenty minutes on a single core, the time being dominated by computing the trace of the known r j ≤ 177.75.

5.4.
Improving the bound for B further. Even though the bound on B derived above will more than suffice for our immediate needs, it is possible to improve on it further. This could be done by increasing the exponent 8 used in the definition of β (equation (5.1)). The true value of B is more like but we do not prove that here.
6. Verifying Theorem 1.1 Our verification proceeds in four stages. We will first prove the theorem for T ∈ [100, 27 400] via interval arithmetic and use this to verify Corollary 1.2. We will then rigorously verify that Theorem 1.1 holds for T ∈ (1, 100] by direct computation. Next we will check, again using interval arithmetic, that the theorem holds for T ∈ [27 400, 10 6 ]. Finally we will show that it holds for all T ≥ 10 6 .
First, however, we must define the function ϕ used in Proposition 4.4.
We will need to bound the term involving the trigamma function. The following will suffice.
Lemma 6.2. For z > 0 we have Also, for z = σ + it with σ > 0 and t ∈ R, we have Proof. We use the multiplication formula to write is the Hurwitz zeta-function. We now apply Euler-Maclaurin summation with We now take N = 0 and K = 2. In the case z > 0 the k = 2 term is − 7 240z 5 and the final integral is trivially less in absolute terms than that. In the second case, the k = 2 term is bounded by 7 240|z| 5 and the final integral contributes less than In the final case, the k = 2 term yields 7 240|z| 5 . The integral can be bounded by 7 240σ 5 and the result follows.
6.2. Verifying Theorem 1.1 for T ∈ [100, 27 400]. For T in this range, we proceed computationally. We chose δ = 0.1 and X = 2.55 and kept them fixed throughout. If we computed all the terms of Proposition 4.4 using interval arithmetic, we would be forced to use a relatively narrow width, and the cost of computing using Theorem A.1 would be prohibitive. Fortunately, the part which determines the maximum width we can get away with is the cos(2πtT ) term within D(·). The rest, including the expensive integrals, are much less susceptible, and we can use a width of between 2 and 128 depending on the size of T .
Having computed everything else with a relatively coarse interval, we then compute D(·) many times with a much narrower interval (typically about 2 −8 ). Even here, most of the terms making up D(·) do not depend on T so they can be pre-computed once and stored.
The entire computation coded in C++ using interval arithmetic takes less than an hour on a 16-core node of Bluecrystal Phase III [1]. Despite the foregoing, this time was still dominated by the rigorous computation of R [k(T + r) + k(T − r)]F (r) dr. Proof. The lemma holds trivially for T ∈ (1, r 1 ) where r 1 = 9.533 . . . is the location of the first zero as T 0 N(t) dt = 0 and T 0 N (t) dt+T ·E(T ) is positive. Letting N + be the majorant of N given by our list of r j , we check that T 0 N + (t) − N (t) dt − T · E(T ) < 0 at each r j and divide the interval between the r j 's into small enough sub-intervals so that the inequality works there too. In all, we checked a little under 500 000 intervals covering (r 1 , 100] in a little under 10 seconds to verify the theorem. The nearest miss in absolute terms was around T = 20.6862978, where 6.5. Verifying Theorem 1.1 for T ∈ [27 400, 10 6 ]. Once T is large enough, we can dispense with D(·) and appeal to Proposition 4.5 instead. We start with T ∈ [27 400, 27 402] and check that Theorem 1.1 holds, and then move on to the next interval for T . After every 20 iterations, we try to double the width of the interval and continue. If at any point we fail (presumably because our interval for T grew too wide too quickly) we halve the width of the interval and repeat. Coded in C++ using interval arithmetic, the computation takes less than a couple of hours on a single core (and we could have parallelized it trivially). By the end, the width of the interval had increased to 16 384 and we have: Lemma 6.7. Theorem 1.1 holds for T ∈ [27 400, 10 6 ]. 6.6. Verifying Theorem 1.1 for T ≥ 10 6 . We now look at each of the contributions to our bound for S(t) dt coming from Proposition 4.5. We will need several preparatory lemmas. We assume the notation and hypotheses of Propositions 4.4 and 6.4, and we fix δ = 0.842 throughout.
Proof. Since k is an even function of r, we need only consider r ≥ 0. We first check that k(r) ≤ 4 5 for r ∈ [0, 2] by interval arithmetic. Then since r tanh(πr) 12 ≤ r 12 for all r ≥ 0, we need only check that at r = 2 and observe that the ℜψ(1 + 2ir) term is an increasing function of r whereas the cosh term is decreasing.
Proof. We have 2 R k(r)F (r) dr ≤ 2 2 −2 4 5 F (r) dr + 4 and appeal to Theorem A.1 directly providing the rescaled function has no poles inside the circle |z| = 2. The most common awkward situation is where f does have a single pole on the real line at t = ρ and t 1 > t 0 > ρ. In this case, we are still in the clear so long as t 1 < 3t 0 − 2ρ which may force us to perform the integral piecewise to cover the required interval.