Modular invariance, tauberian theorems and microcanonical entropy

We analyze modular invariance drawing inspiration from tauberian theorems. Given a modular invariant partition function with a positive spectral density, we derive lower and upper bounds on the number of operators within a given energy interval. They are most revealing at high energies. In this limit we rigorously derive the Cardy formula for the microcanonical entropy together with optimal error estimates for various widths of the averaging energy shell. We identify a new universal contribution to the microcanonical entropy controlled by the central charge and the width of the shell. We derive an upper bound on the spacings between Virasoro primaries. Analogous results are obtained in holographic 2d CFTs. We also study partition functions with a UV cutoff. Control over error estimates allows us to probe operators beyond the unity in the modularity condition. We check our results in the 2d Ising model and the Monster CFT and find perfect agreement.


JHEP10(2019)261 1 Introduction
High energy estimates on various physical quantities are commonly stated locally even though they are only true on average. A few famous examples are: the Froissart bound on the growth of the cross-section [1,2], high-frequency expansion of conductivity at finite temperature [3,4], high energy asymptotics of the electromagnetic current spectral density in the context of the so-called quark-hadron duality [5], and finally the Cardy formula for two-dimensional CFTs [6]. The latter is particularly interesting because of its importance for the problem of the black hole microstate counting [7][8][9][10]. It is then a natural question to ask: how do these estimates depend on the details of the averaging?
Let us review the standard derivation of the Cardy formula. We consider a thermal partition function $Z(\beta)$ of a unitary 2d CFT on a Euclidean torus. The partition function is modular invariant, $Z(\beta) = Z(4\pi^2/\beta)$. This implies that the high-temperature limit $\beta \to 0$ of the partition function is captured by the contribution of the vacuum in the dual channel, (1.1). In the $\beta \to 0$ limit the energy of the system $\Delta = -\partial_\beta \log Z = \frac{\pi^2 c}{3\beta^2} + \dots$ goes to infinity. The $\Delta \to \infty$ limit being the thermodynamic limit, see e.g. [11], one obtains from the usual thermodynamic arguments that the extensive part of the entropy, which is given by (1.1), also correctly captures the leading behavior of the microcanonical entropy $S_\delta(\Delta)$ defined by (1.2) as soon as $\delta$ is large enough to include many energy levels. That is, if we express the temperature as a function of the average energy, $\beta = \pi\sqrt{c/(3\Delta)}$, and plug it into (1.1), we arrive at the famous Cardy formula for the microcanonical entropy, (1.3).
In discussions and applications of the Cardy formula the averaging width parameter $\delta$ is usually kept implicit. Moreover, the rigorous transition from (1.1) to (1.3) requires some extra work which is usually left to the reader. Indeed, the spectral density $\rho(\Delta)$ is related to the partition function $Z(\beta)$ by the inverse Laplace transform. It is sometimes argued that this Laplace transform can be evaluated by a saddle point approximation, from which a statement about $\rho(\Delta)$, and therefore $S_\delta(\Delta)$, can be made. A more accurate description of this procedure would be to say that one can easily find the crossing kernel of the vacuum contribution $e^{\frac{\pi^2 c}{3\beta}}$, or, in other words, a spectral density $\rho_0(\Delta)$ that correctly reproduces the vacuum in the dual channel. The question then remains: what is the precise relation between the naive spectral density $\rho_0(\Delta)$ and the actual physical density $\rho(\Delta)$? This relation cannot be too literal. Indeed, the former is a smooth function of $\Delta$, whereas the latter is a sum of delta-functions. Once again the physical intuition is that they are related on average, but establishing this rigorously is a nontrivial task. The issue of making the argument precise becomes even more important if one considers "finite size" or $1/\Delta$ corrections to the Cardy formula.
The purpose of this paper is to close this gap in the usual discussions of the Cardy formula and to develop further techniques that allow us to study $1/\Delta$ corrections to it. The physical question of going from the finite temperature partition function to the microcanonical entropy can be addressed in a mathematically rigorous way using the methods of tauberian theory [12], as explained in [13,14]. From the conformal/modular bootstrap point of view tauberian theory provides a natural set of linear functionals with which we act on the crossing/modularity condition to derive optimal estimates on $S_\delta(\Delta)$ or other spectral density averages.
As further noticed in [15], the optimal error estimates can be obtained using the so-called complex tauberian theorems, which exploit the fact that physical quantities of interest are very often analytic functions in a complex domain. This is indeed the case for the modularity condition of 2d CFTs. In this note we apply methods of tauberian theory to modular invariance in 2d CFTs and rigorously derive the Cardy formula and corrections to it, where we explicitly keep track of the dependence on $\delta$. Furthermore, combining these ideas with bounds on the partition function of Hartman, Keller and Stoica (HKS) [16], we find lower and upper bounds on the number of operators within a given window of finite conformal dimensions $(\Delta - \delta, \Delta + \delta)$. Though true at finite $\Delta$, they are most revealing in the limit $\Delta \to \infty$.
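The thermodynamic step in the standard derivation reviewed above can be checked numerically. The following is a minimal sketch, assuming only the vacuum-dominated form $\log Z \simeq \pi^2 c/(3\beta)$ and the saddle $\beta = \pi\sqrt{c/(3\Delta)}$ quoted in the text; the function names are illustrative and not taken from the paper.

```python
import math

def vacuum_log_z(beta, c):
    """Vacuum-dominated high-temperature form: log Z ~ pi^2 c / (3 beta)."""
    return math.pi ** 2 * c / (3.0 * beta)

def mean_energy(beta, c):
    """Energy Delta = -d(log Z)/d(beta) for the vacuum-dominated form."""
    return math.pi ** 2 * c / (3.0 * beta ** 2)

def saddle_beta(energy, c):
    """Inverse temperature at which the mean energy equals `energy`."""
    return math.pi * math.sqrt(c / (3.0 * energy))

def legendre_entropy(energy, c):
    """Thermodynamic entropy S = log Z + beta * Delta evaluated at the saddle."""
    beta = saddle_beta(energy, c)
    return vacuum_log_z(beta, c) + beta * energy

def cardy_entropy(energy, c):
    """Leading Cardy formula S = 2 pi sqrt(c Delta / 3)."""
    return 2.0 * math.pi * math.sqrt(c * energy / 3.0)

if __name__ == "__main__":
    c, energy = 0.5, 100.0
    print(legendre_entropy(energy, c), cardy_entropy(energy, c))
```

This only reproduces the leading exponent; it says nothing about the averaging width or about the error terms, which are the actual subject of the paper.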

Review of the results
We consider a modular invariant partition function with zero angular potential and positive spectral density. We derive a set of rigorous results about S δ (∆) (1.2). These concern either all operators present in the theory or only Virasoro primaries in CFTs with c > 1.
• Let us first discuss densities of all operators, both primaries and descendants. We derive a rigorous asymptotic for the microcanonical entropy, (1.4), where, depending on the size of the averaging energy shell $\delta$, we show that (1.5) holds.
1 By $a \sim b$ we mean $\lim a/b = \text{const} \neq 0$ in the corresponding limit.
Figure 1. The vertical line is $\delta = \delta_{\rm gap} = \frac{\sqrt{3}}{\pi}$, below which we do not have a lower bound. The divergence of $s_+(\delta)$ at $\delta = 0$ is spurious and is cancelled by $\log \delta$ in (1.4).
The first two terms in the r.h.s. of (1.4) are the Cardy formula (1.3) and the leading log correction to it discussed for example in [10,17]. The results for $s(\delta, \Delta)$ are new to the best of our knowledge. In particular, we see that for $\delta \sim \Delta^\alpha$ there is yet another universal correction to the microcanonical entropy that is controlled by the central charge $c$ and the width of the energy shell $\delta$ and given by the first line in (1.5). Note that for any $\alpha > 0$ the error decays at large $\Delta$ and the non-decaying contribution to the entropy is fully captured by the $\frac{1}{4}$ term. For $\delta = O(1)$ the functions $s_\pm(\delta)$ are plotted in figure 1. In particular, the lower bound diverges logarithmically as we approach $\delta \to \delta_{\rm gap}$. The interpretation of this is that the asymptotic (1.4) is only applicable for $\delta > \delta_{\rm gap}$, for which the leading behavior of the microcanonical entropy $S_\delta(\Delta)$ takes the form (1.4). Note that the lower bound implies that there have to be operators in an energy shell of size $\delta > \delta_{\rm gap}$.
For $\delta < \delta_{\rm gap}$ we can only prove an upper bound on the microcanonical entropy, which is given by (1.4) and $s_+(\delta)$ in (1.5). For a fixed $\delta = O(1)$ the function $s(\delta, \Delta)$ is in general not a constant and can oscillate as we change $\Delta$, but always between the values $s_\pm(\delta)$. In fact, we will explicitly see these oscillations in the 2d Ising model in section 7.
• For CFTs with $c > 1$ we can derive analogous formulae for Virasoro primary operators, (1.6), where $\Delta \to \infty$ and $s_\pm(\delta)$ are the same as in (1.5). For finite width energy shells, or $\alpha = 0$, we can again write the lower and upper bounds on the entropy as in (1.5), as soon as $\delta > \delta_{\rm gap}$. A simple consequence of this result is the existence of a maximal sparseness of Virasoro primaries. In other words, it follows that
• At large scaling dimensions $\Delta$ the spacings between Virasoro primary operators in CFTs with $c > 1$ cannot be larger than $2\delta_{\rm gap} = \frac{2\sqrt{3}}{\pi} \approx 1.1$. This bound is not necessarily optimal. Nevertheless, it is close to optimal since there are many examples of theories with spacings equal to 1.
• We derive an asymptotic of the microcanonical entropy in holographic 2d CFTs in the limit $c \to \infty$ with $\Delta/c$ fixed and $\Delta > c$, (1.7), where $\delta \sim c^\alpha$, $0 \le \alpha < 1$ and $\delta > \delta_{\rm gap}$. This relies on the sparseness condition of Hartman, Keller and Stoica (HKS) [16] and extends their result for the microcanonical entropy, which is (1.7) with an extra constraint $\frac{1}{2} < \alpha < 1$.
As we will explain later on, an important ingredient in the derivation of $s_\pm(\delta)$ and $\delta_{\rm gap}$ is the existence of functions $\phi_\pm(\Delta')$ with the following properties: 1) $\phi_+(\Delta')$ and $\phi_-(\Delta')$ bound the indicator function of the interval $(\Delta - \delta, \Delta + \delta)$ from above and below respectively; 2) their Fourier transform has bounded support.
We make an explicit choice of such functions to arrive at the particular value of δ gap and the bounding curves in s ± (δ). Nevertheless, the method is completely general and we leave open the question of finding the functions φ ± (∆ ′ ) giving optimal bounds.
• Above we stated our results at asymptotically high energies. They follow from more general bounds on the number of operators at finite $\Delta$, $c$, that we derive in section 4. Specifically, given the data about operators with $\Delta \le c/12$ we derive rigorous upper and lower bounds on the number of operators in a given window of scaling dimensions. We emphasize that all parameters can be kept finite. In particular, these bounds can be easily implemented numerically. For example, we can derive numerical bounds on the gap above the vacuum, though these turn out to be weaker than [20,21]. On the other hand, we can also bound the number of operators in any window of scaling dimensions at any $\Delta$ above the first excited state as well.

• We consider partition functions with a UV cutoff. We start by proving a generalized Ingham's theorem: Theorem. Consider a positive spectral density ρ(∆), such that the partition function The r.h.s. of (1.8) comes from the unit operator in the dual modular channel, which dominates the partition function at high temperatures. The average of the physical density of states in the l.h.s. side of (1.8) is a discontinuous "staircase-like" function. It is approximated by a smooth function in the r.h.s. of (1.8) with a bounded error term. The discontinuities of the l.h.s. of (1.8) are hidden in the non-universal 7 error term in the r.h.s. In particular, it does not make sense to write further smooth power suppressed terms in the r.h.s. of (1.8). We will see it explicitly in the example of 2d Ising model that the error term is a highly oscillating function and cannot be approximated by a smooth function. This example will also demonstrate that the error estimate is optimal.
• The results (1.4), (1.5), (1.8) essentially follow from the high-temperature asymptotic of the partition function dictated by the vacuum operator. A natural question arises whether operators above the vacuum give any constraints on the averaged density of states. At first glance one might hope to see the effects of such operators in the remainder terms in (1.8). However, these remainder terms are in general erratic oscillating functions and do not necessarily have a smooth asymptotic. The resolution is that instead we can consider averages with a kernel, $\int_0^\Delta d\Delta\, K(\Delta)\rho(\Delta)$, for some function $K(\Delta)$. One particularly nice example of this type is, of course, the partition function itself with a UV cutoff, i.e. $K(\Delta) = e^{-\beta\Delta}$. Therefore, using the theorem (1.8), we derive a bound on the cutoff partition function at finite temperature, (1.9), where $\rho_0$ is the vacuum crossing kernel defined below. Depending on the temperature some operators in the dual channel in the r.h.s. of (1.9) dominate over the error term and therefore are captured by the cutoff partition function in the l.h.s.
7 Everywhere in this paper by "non-universal terms" we mean the terms that are not controlled by light operators in the dual channel.
8 And a similar bound for $\beta < \pi\sqrt{c/(3\Delta)}$.

Related works
The averaging procedure (1.8) was first pointed out in the context of CFTs in [13]. In the mathematical literature the asymptotic (1.8) without the error estimate is known as Ingham's tauberian theorem for large Laplace transform [23]. For a nice exposition of this result see [12], section IV.21. The relevance of Ingham's theorem for Cardy formula was also emphasized in [24], appendix C. We give a derivation of (1.8), which is different from the original proof [23]. The novelty of (1.8) is the error estimate which is absent in the Ingham's theorem. In the proof we use the methods of [25], section 2.3, extensively discussed in [15]. In particular, the error estimate allows us to access subleading operators in the cutoff partition function (1.9).

Setup
Consider a unitary 2d CFT on a torus with the modular parameter $\tau = \frac{1}{2\pi}(\theta + i\beta)$ and the coordinate on the torus $z = \frac{1}{2\pi}(\phi + i t_E)$ with standard identifications $z \sim z + 1 \sim z + \tau$. In these conventions the spatial circle $\phi$ has length $2\pi$ and the Euclidean time circle $t_E$ has length $\beta$. The partition function is invariant under the modular transformation $\tau \to -1/\tau$. In what follows we restrict to zero angular potential $\theta = 0$ so that $q = e^{-\beta}$. However, we consider complex $\beta$ with $\mathrm{Re}[\beta] > 0$. This is possible due to unitarity. In this case the modular invariance is expressed by (2.2) or, equivalently, (2.3), where the density of states is defined by (2.4) and the sum is over all operators in the theory, both primaries and descendants. We will be interested in exploring consequences of (2.2). In the high-temperature limit $|\beta| \to 0$ the r.h.s. of (2.2) is dominated by the unit operator, (2.5), where $\Delta_1$ is the first operator above the vacuum.
9 Unitarity implies that degeneracies of operators are positive. Therefore, for complex $\beta$ the trace in (2.1) converges even better than for real $\beta$ and is, hence, finite.
10 For some rational CFTs the solutions to (2.2) were classified [26].

To write the asymptotic of the spectral density it will be convenient to introduce a "naive" spectral density $\rho_0(\Delta)$ which correctly reproduces the contribution of the vacuum in the partition function. The correct expression takes the form (2.6), where $\theta(x)$ is the Heaviside step function. This, of course, cannot be literally an approximation of the physical density of states (2.4), as the latter is a sum of delta functions. The index "0" in the l.h.s. of (2.6) is reminding us of that. Nevertheless, the Laplace transform of (2.6) coincides with the unit operator contribution to the partition function, (2.7). The function $\rho_0(\Delta)$ can be naturally called the "crossing kernel" in analogy with [27].
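The defining property of the crossing kernel can be made explicit with a short worked computation. The block below is a sketch, not a quotation of (2.6) and (2.7): it assumes the conventions used later in the text, namely that the Laplace transform is $\int d\Delta\, \rho(\Delta) e^{-\beta\Delta} = e^{-\beta c/12} Z(\beta)$ and that the vacuum contributes $e^{\pi^2 c/(3\beta)}$ in the dual channel, and it uses the standard series of the modified Bessel function $I_1$. The delta function at the lower limit of integration is taken to contribute with full weight.

```latex
\begin{align}
% a density whose Laplace transform is the dual-channel vacuum:
\rho_0(\Delta) &= \pi\sqrt{\frac{c}{3}}\,
  \frac{I_1\!\left(4\pi\sqrt{\frac{c}{12}\left(\Delta-\frac{c}{12}\right)}\right)}
       {\sqrt{\Delta-\frac{c}{12}}}\,
  \theta\!\left(\Delta-\frac{c}{12}\right)
  + \delta\!\left(\Delta-\frac{c}{12}\right), \\
% term-by-term check, with a = \pi^2 c/3 and x = \Delta - c/12:
\int_0^\infty dx\, e^{-\beta x}
  \left[\sqrt{\frac{a}{x}}\, I_1\!\left(2\sqrt{a x}\right) + \delta(x)\right]
 &= 1 + \sum_{k=0}^{\infty} \frac{a^{k+1}}{k!\,(k+1)!}\,\frac{k!}{\beta^{k+1}}
  = e^{a/\beta}, \\
% hence
\int_0^\infty d\Delta\, \rho_0(\Delta)\, e^{-\beta\Delta}
 &= e^{-\beta c/12}\, e^{\frac{\pi^2 c}{3\beta}}, \\
% large-\Delta behaviour from I_1(z) \simeq e^{z}/\sqrt{2\pi z}:
\rho_0(\Delta) &\simeq \left(\frac{c}{48\,\Delta^{3}}\right)^{1/4}
  e^{2\pi\sqrt{c\Delta/3}}, \qquad \Delta \to \infty .
\end{align}
```

Multiplying the last line by the width $2\delta$ of an energy shell and taking the logarithm gives $2\pi\sqrt{c\Delta/3} + \frac{1}{4}\log\frac{c\,\delta^4}{3\Delta^3}$, which is the naive origin of the Cardy term and of the $\log\delta$ dependence mentioned in the introduction.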

HKS bound on heavy operators
An important result for obtaining bounds on the spectral density at finite ∆ will be the bound of Hartman, Keller, Stoica (HKS bound) [16] on the contribution of heavy operators into the partition function. We review its derivation in this section.
We split the partition function as

Modular invariance states that
where by primes we denote the dual channel, $\beta' = 4\pi^2/\beta$. Suppose $\beta \ge 2\pi$. We would like to estimate $Z_H$.
This also implies a bound on $Z'_H$ via modular invariance

Exchanging β and β ′ in (3.5) we can turn it into a bound at high temperatures Depending on the temperature the bound on the heavy operators is either (3.4) or (3.6).
Everywhere we assume that $\Delta_H > c/12$. Finally, (3.4), (3.6) lead to bounds on the full partition function.
Note that the bounds (3.4), (3.6) stay finite if we take $\beta \to 2\pi$. Indeed, $Z_L - Z'_L$ is zero and cancels the zero of the denominator, whereas $\Delta_H$ is strictly above the BTZ threshold $\frac{c}{12}$.

Local bound on the number of operators
We can use modular invariance together with the HKS bound to derive a local bound on the density of operators. To that end let us consider two functions φ ± (∆) such that We can multiply this inequality by e −β∆ ′ and use Integrating both sides of (4.2) with the spectral density ∞ 0 dF (∆ ′ ) we finally obtain an estimate In the inequality above β and δ are free parameters. We will fix β below by making the bound optimal.
Next the idea is to do the Fourier transform

) into a bound in terms of the partition function
where we introduced the Laplace transform L of a density ρ (4.5)

As a next step we apply a modular transformation to $L(\beta + it)$ and separate the contribution of light and heavy operators in the dual channel. We write $L(\beta + it) = e^{-(\beta+it)c/12} Z(\beta + it) = e^{-(\beta+it)c/12} Z\!\left(\frac{4\pi^2}{\beta+it}\right)$ and split $Z = Z_L + Z_H$. As in (2.7) we can rewrite $e^{-(\beta+it)c/12} Z_L\!\left(\frac{4\pi^2}{\beta+it}\right) = L^{\rho_0,L}(\beta + it)$, where the superscript $\rho_0$ refers to the fact that the Laplace transform is computed with the crossing kernel rather than the density of actual physical operators. In this way we get (4.6). We will see below that the light contribution produces the expected Cardy behavior, whereas the contribution of the heavy operators can be estimated using the HKS bound.
by removing phases. Then the r.h.s. of the HKS bound (3.6) diverges exponentially as $t \to \infty$ when applied to $Z_H\!\left(\frac{4\pi^2\beta}{\beta^2+t^2}\right)$. Therefore we require that $\phi_\pm(t)$ decays sufficiently rapidly at $t \to \infty$ so that the integrals in (4.6) converge.
One simple choice is to take φ ± (t) with support in a bounded region t ∈ [−Λ ± , Λ ± ]. We then have where it was absolutely crucial that the theory under consideration is unitary. The contribution of the heavy operators can be bounded using the HKS bound (3.6) or (3.4). Also rewriting the first term in (4.4) back in ∆-space we have We do not know what is the best choice of φ ± (∆ ′ ) within the class of functions with the Fourier transform of finite support and satisfying (4.1) that make the bounds optimal.

A simple and convenient choice is (4.9) Note that these functions indeed satisfy (4.1) and their Fourier transform has a bounded support. Moreover, for this particular choice we have These are the values relevant for our finite ∆ results in the 2d Ising section.

Bounds at large ∆
The bound (4.8) substantially simplifies in the limit $\Delta \gg 1$. Below we will see that in this case the optimal choice is $\beta = \pi\sqrt{c/(3\Delta)} \ll 1$. Using the HKS bound we can show that the second terms in (4.8) proportional to $Z_H$ are subleading for $\Lambda_\pm < 2\pi$. Indeed we get
which will be subleading for $\Lambda_\pm < 2\pi$ (we will see it momentarily below). Therefore we get the bound at large $\Delta$
The integrals can be computed by the saddle point approximation and give
We see that dropping the terms (8.6) is indeed justified for $\Lambda_\pm < 2\pi$. The explicit integration of (4.9) gives
Note that for $\delta$ such that $c_- > 0$ we have to have at least one operator in the interval (4.14)

This happens if
where we also used the assumption $\Lambda_- < 2\pi$ to drop the term (8.6). That is, for the simple choice of functions (4.9) we get $\delta_{\rm gap}^2 = \frac{3}{\pi^2}$, which is to say that every modular invariant partition function has to have at least one operator within the window of size $2\delta_{\rm gap} = \frac{2\sqrt{3}}{\pi} \approx 1.1$ at large $\Delta$. Of course, this is completely trivial in 2d CFTs due to the Virasoro descendants. However, in section 6 we will see that the same argument applies to Virasoro primaries as well, provided $c > 1$, and with the same result. It is natural to conjecture that the maximum allowed spacing between Virasoro primary operators is in fact 1.
Similarly, keeping $\delta$ arbitrary we can optimize over $0 < \Lambda_\pm < 2\pi$ to get the tightest possible bound (4.12). For the lower bound the result is (4.16), and for the upper bound (4.17), where $a_*$ is the positive solution of the equation
This bounds the number of "outliers" and shows what is the maximal local deviation of the density of operators from the Cardy distribution. Note that (4.16), (4.17) already imply the Cardy formula in the sense of entropies, (4.19), where $s$ is of $O(1)$ and can be bounded from (4.16), (4.17). We find

The formula (4.19) is valid up to corrections suppressed at large $\Delta$. The $O(1)$ contribution $s(\delta, \Delta)$ is generically an oscillating function of $\Delta$. We will observe this explicitly in the 2d Ising model. The bounds (4.20) are plotted in figure 1. It would be interesting to find the optimal bounds on the local density of operators by a better choice of $\phi_\pm$. To reiterate, in our argument these obey two defining properties: they satisfy (4.1); they have a finite support in Fourier space (8.6). Let us also emphasize that the bounds (4.4), (4.8) are applicable at finite $\Delta$ as well. In this case we should simply keep the terms (4.7), which we can estimate using the HKS bound.
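The arithmetic behind the quoted window size can be spelled out in a few lines. This is only a sketch of the numbers stated in the text ($\delta_{\rm gap}^2 = 3/\pi^2$ for the particular choice of functions (4.9)); the names are illustrative.

```python
import math

def delta_gap():
    """delta_gap for the simple choice (4.9): delta_gap^2 = 3 / pi^2."""
    return math.sqrt(3.0) / math.pi

def max_window():
    """Size 2 * delta_gap of the window that must contain at least one operator."""
    return 2.0 * delta_gap()

if __name__ == "__main__":
    print(delta_gap(), max_window())
```

The window comes out slightly above 1, which is why the text describes the bound as close to, but not necessarily equal to, the conjectured optimal spacing of 1.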

Proof of the theorem
In the previous section we investigated a local bound on the number of operators in a 2d CFT. In this section we derive a better bound for the case $\delta \gg 1$. In particular we show that if $\Delta \gg 1$ then averaging $\rho(\Delta)$ over operators in the region $[\Delta - \delta, \Delta + \delta]$ with $\delta \sim \Delta^\alpha$ for some $\alpha > 0$ produces the fixed asymptotic identical to the one given by the crossing kernel $\rho_0(\Delta)$ with the controlled error (1.5). As mentioned in the introduction, it follows from the theorem (1.8). We prove (1.8) in this section, which we repeat for convenience here as (5.1).
A few comments are in order. Note that by doing a naive inverse Laplace transform of the vacuum contribution, using the saddle point approximation, and integrating over $\Delta$ one would arrive at the correct estimate for $F(\Delta)$, namely (5.1). Using the saddle point approximation to make a statement about $\rho(\Delta)$ itself, however, is not correct. It would also be incorrect to use the saddle point approximation to compute further corrections to $F(\Delta)$, beyond (5.1).
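The naive estimate mentioned in the comments above can be written out as a one-line computation. The following is a sketch of the leading term only, assuming the large-$\Delta$ form of the vacuum crossing kernel obtained from the standard Bessel asymptotics; the symbol $F_0$ is introduced here for illustration, and the error term of (5.1), which is the actual content of the theorem, is not reproduced.

```latex
\begin{align}
\rho_0(\Delta') &\simeq \left(\frac{c}{48\,\Delta'^{3}}\right)^{1/4}
   e^{2\pi\sqrt{c\Delta'/3}}, \\
F_0(\Delta) \equiv \int_0^{\Delta} d\Delta'\, \rho_0(\Delta')
 &\simeq \frac{\rho_0(\Delta)}{\partial_\Delta\!\left(2\pi\sqrt{c\Delta/3}\right)}
  = \rho_0(\Delta)\,\frac{1}{\pi}\sqrt{\frac{3\Delta}{c}}
  = \frac{1}{2\pi}\left(\frac{3}{c\,\Delta}\right)^{1/4}
    e^{2\pi\sqrt{c\Delta/3}} .
\end{align}
```

The integral is dominated by its upper limit, so to leading order it equals the integrand divided by the derivative of the exponent. The staircase nature of the physical $F(\Delta)$ is invisible in this smooth expression, which is why further smooth corrections to it are not meaningful.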
Let us introduce the difference between the Laplace transform of the physical density of states ρ(∆) and the crossing kernel ρ 0 (∆) The main idea is to apply a linear functional to the modular invariance equation (3.2) that produces the theta-function θ(∆ − ∆ ′ ) that we want plus terms which we can easily estimate. A convenient choice of the functional is where the integration contour is the interval C + = {Re z = β, −Λ < Im z < Λ} as indicated on the figure 2. The parameters Λ, β, ∆ are so far arbitrary in (5.3). The polynomial in the numerator of (5.3) is chosen to be such that it vanishes at the ends of the interval C, which will be helpful in estimates below.

Figure 2. Integration contour in the complex temperature z-plane. We integrate the modular invariance equation (3.2) along the vertical segment $C_+$ to derive the bound on the integrated spectral density.
On the one hand we can estimate (5.3) using modular invariance. Inserting the definition of the Laplace transform and swapping the order of integrations we have (5.4). Now the idea is to deform the contour $C_+$ in the last integral in (5.4) either to the left or to the right, for $\Delta' < \Delta$ or $\Delta' > \Delta$ respectively, in order to make the exponential factor $e^{(\Delta-\Delta')z}$ smaller. When we deform to the left we also pick up the residue at $z = 0$. We have (5.5), where $G_\pm(\nu)$ refer to the integrals over the arcs $C_\pm$, see figure 2.
We can use (5.5) to rewrite the equation (5.4) as follows In appendix A we show that 14

Therefore we can bound (5.6) as follows where we used the fact that |δρ(∆ ′ )| ≤ ρ(∆ ′ ) + ρ 0 (∆ ′ ) . In the formula above β is an arbitrary parameter. We would like to choose it to optimize the bound. We will show below that to prove (1.8) the correct choice is to set Let us emphasize that the bound (5.8) is valid for finite ∆. In particular, we can use the HKS bound to estimate the first term in the r.h.s. of (5.8) and the local bound from the previous section to bound the second term. Below we investigate (5.8) in the large ∆ limit.
To estimate the third integral in the r.h.s. of (5.8) we use the asymptotic (2.6).
To estimate the second integral in the r.h.s. of (5.8) we split it into three parts $I_1$, $I_2$, $I_3$,
where we used monotonicity of $(\Delta-\Delta')^{-2}$ in the first line and $L_\rho(\beta) = O\!\left(e^{\frac{\pi^2 c}{3\beta}}\right)$ and (5.9) in the third line. In particular, (5.12) shows that we chose to split the integral as in (5.11) in order to produce the correct prefactor in (5.12), $(\Delta - \Delta')^{-2}\big|_{\Delta' = \Delta - \Delta^{3/8}} = \Delta^{-3/4}$. Similarly, $I_3$ is estimated to be of the same order, (5.13). Finally, we need to estimate $I_2$. We will do so using a local bound from the previous section, (5.14). We further split the integral $I_2$ into
To estimate $i_{1,2,3}$ we split the integrals into small windows of $\Delta'$ in each of which we can apply (5.14),
where we used (5.14) and (5.9). The integral $i_3$ is estimated in a similar fashion. Finally,
This finishes the estimate of (5.11).
The last step is to estimate the first term in the r.h.s. of (5.8),
where in the second line we used monotonicity of $Z_H$ and therefore assumed that $\Delta_H > \frac{c}{12}$. In the third line we estimated $Z_H$ using the vacuum contribution in the dual channel. Choosing $\Lambda < 2\pi$ we see that this term is sub-leading. This finishes the proof of (5.1).

Virasoro primaries
The analysis in previous sections can be readily generalized to the density of Virasoro primary operators. Let's consider $c > 1$ so that there are infinitely many such operators. In this case Virasoro characters are simply related to the Dedekind eta function and the partition function takes the form, see e.g. [29],
where $\tau = i\beta/2\pi$, $d^{\rm Vir}_n$ is the degeneracy of a Virasoro primary $\Delta_n$ and the sum goes over all primaries except the vacuum, $\Delta_n > 0$. Let's define the density of Virasoro primaries
The crossing kernel for the vacuum is given by
so that it reproduces the vacuum contribution in the dual channel
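Since the Virasoro characters enter through the Dedekind eta function, the modular transformation of the primary-counting function inherits the standard property $\eta(-1/\tau) = \sqrt{-i\tau}\,\eta(\tau)$. The sketch below checks this numerically on the imaginary axis, $\tau = i\beta/2\pi$, where it reads $\eta(\beta') = \sqrt{\beta/2\pi}\,\eta(\beta)$ with $\beta' = 4\pi^2/\beta$. The function name and the truncation parameter are illustrative choices, not taken from the paper.

```python
import math

def dedekind_eta(beta, terms=200):
    """eta(tau) at tau = i*beta/(2*pi), i.e. q = exp(-beta).

    Uses the product formula eta = q^(1/24) * prod_{n>=1} (1 - q^n),
    truncated after `terms` factors.
    """
    q = math.exp(-beta)
    product = 1.0
    for n in range(1, terms + 1):
        product *= 1.0 - q ** n
    return q ** (1.0 / 24.0) * product

def eta_modular_defect(beta):
    """Difference eta(4 pi^2 / beta) - sqrt(beta / (2 pi)) * eta(beta)."""
    dual_beta = 4.0 * math.pi ** 2 / beta
    return dedekind_eta(dual_beta) - math.sqrt(beta / (2.0 * math.pi)) * dedekind_eta(beta)

if __name__ == "__main__":
    for beta in (3.0, 2.0 * math.pi, 10.0):
        print(beta, eta_modular_defect(beta))
```

The extra factor $\sqrt{\beta/2\pi}$ is what distinguishes the modular property of the primary partition function from that of the full partition function.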

Local bounds on the number of Virasoro primaries
We can derive bounds analogous to (4.4), (4.8). Essentially the same argument gives
The HKS bound for Virasoro primaries can also be derived and takes the form
where $\Delta_H > \frac{c-1}{12}$ and we split the partition function into light and heavy contributions
The large $\Delta$ analysis is identical to section 4, with essentially the same results. Namely we get
with the choice (4.9). That is, the gap between Virasoro primaries at large scaling dimensions must be no larger than $2\sqrt{\frac{3}{\pi^2}} \approx 1.1$. Repeating the rest of the argument from section 4 we obtain the asymptotic of the microcanonical entropy for energy shells $\delta = O(1)$,
where $s^{\rm Vir}(\delta, \Delta)$ is again bounded as in figure 1.
16 Here, as in (4.4), it is implied that the crossing kernel $\rho^{\rm Vir}_0$ is for all light operators entering $Z_L$. But again in the large $\Delta$ analysis below the vacuum contribution will be dominant.

Cardy formula for Virasoro primaries
The modular invariance dictates the behavior at high temperatures (6.10) Then the tauberian theorem similar to (1.8) takes the form Its proof is completely analogous to the proof of (1.8) given in the section 5. From here we derive that the microcanonical entropy has the asymptotic (6.9) with s(δ, ∆) given by for any 0 < α ≤ 1/2.

Holographic CFTs
In this section we consider holographic 2d CFTs with a sparse spectrum [16] in the limit $\Delta \sim c \to \infty$. The HKS sparseness condition [16] states that $Z_L(\beta)$ is dominated by the vacuum state for $\beta > 2\pi$ and $c \to \infty$ in the sense that (7.1) holds. We again start with (4.8) and consider the limit (7.2). In this limit the asymptotic of the vacuum crossing kernel is (7.3). To optimize the first term in (4.8) we choose (7.4). As before we find that the second $Z_H$ term in (4.8) is suppressed if $\Lambda_\pm < 2\pi\sqrt{1 - \frac{1}{12\epsilon}}$. Therefore the bound (4.8) can be dominated by the first term only for $\epsilon > \frac{1}{12}$, i.e. for states with $\Delta > \frac{c}{6}$. In this case we drop the $Z_H$ terms, compute the first term in (4.8) by the saddle point approximation and get

In (7.5) we tacitly assumed that the first term in (4.8) is dominated by the vacuum. For the r.h.s. of (7.5) this relies on the sparseness condition, and we give more detail in appendix C. In particular, this means that we cannot compute the precise value of $c_+$ because it depends on the bound (7.1). On the other hand, in the l.h.s. of (7.5) we can simply drop operators above the vacuum since they give a positive contribution. The conclusion is that we have the asymptotic of the microcanonical entropy of states with energy of $O(c)$.
We can also consider large widths $\delta \sim c^\alpha$, $0 < \alpha < 1$. We estimate the number of states in the shell by splitting it into smaller windows and applying the bound (7.5) to each term. Both the upper and lower bounds are dominated by the largest exponent $k = 2\delta - 1$ and are estimated to be
Therefore we have for the microcanonical entropy (7.9), where $\delta \sim c^\alpha$, $0 < \alpha < 1$. For $0 < \alpha \le \frac{1}{2}$ only the first term in the expansion of the square root dominates the error. For $1/2 < \alpha < 1$ more terms in the expansion of the square root give a contribution. Essentially, the formula (7.9) states that the entropy is dominated by the states in an $O(1)$ window near the upper limit. In [16] it was derived that $S_\delta(\Delta) = 2\pi c\sqrt{\frac{\epsilon}{3}} + O(c^\alpha)$, $1/2 < \alpha < 1$. The formula (7.9) extends their result to all $0 < \alpha < 1$ and computes corrections to it.
It would be interesting to reproduce our result for the microcanonical entropy (7.9) from the direct bulk computation. The leading contribution to the on-shell action is insensitive to the ensemble choice, however, the state of the quantum fields in the black hole background changes which should be taken into account when computing the corrections to the leading Cardy formula, see e.g. [30,31].
Note that the logarithmic correction to the microcanonical entropy (7.9) is completely universal. This feature of black holes in AdS was observed in [10] (see section 5 in that paper) and is due to the absence of translational zero modes in AdS. The situation is drastically different from flat space, where logarithmic corrections to the black hole entropy are sensitive to the low energy spectrum of the theory.

Operators above the vacuum
All of the results that we discussed so far essentially came from the asymptotic (2.5), which is dictated by the vacuum. A natural question to ask is whether the effect of operators above the vacuum in the r.h.s. of (2.5) can be seen in the asymptotic of the averaged density of states. Equivalently, we can consider a "global" average (5.1). However, the effect of operators above the vacuum cannot be seen directly in $\int_0^\Delta d\Delta\, \rho(\Delta)$ because the remainder term in (5.1) is, in general, an oscillating erratic function and might not have a smooth asymptotic. Instead, the resolution is to consider averages with a kernel, $\int_0^\Delta d\Delta\, K(\Delta)\rho(\Delta)$, for some function $K(\Delta)$. One particularly nice quantity of this type is, of course, the partition function itself with a UV cutoff, i.e. $K(\Delta) = e^{-\beta\Delta}$. In this section we derive its asymptotic and investigate conditions under which it captures the information about operators above the vacuum in the r.h.s. of the modularity condition (2.5).
Consider the modular condition written as where we split the partition function into the contribution of light ∆ ′ ≤ ∆ and heavy ∆ ′ > ∆ operators. Intuitively, it is clear that the partition function is dominated by light (heavy) operators at small (high) temperatures, i.e. by either the first or the second term in (8.1). More precisely, we claim that at small temperatures modular invariance (8.1) can be written as Equivalently, at small (high) temperatures heavy (light) operators are suppressed Formulae (8.2)-(8.4) hold in the limit ∆ → ∞. We derive them below. But first a few comments are in order. As we take β → ∞ in (8.2) the l.h.s. is dominated by a few light operators, while the r.h.s. , i.e. the dual channel, receives contribution from a large number of

heavy operators entering $Z(4\pi^2/\beta)$. The error term is exponentially small in this case. Similarly in (8.3) as we take $\beta \to 0$ an infinite number of heavy operators dominate the l.h.s., while a small number of light operators dominate in the r.h.s. Both cases are therefore consistent with the intuition that a light operator in one channel is reproduced by a large number of heavy operators in the dual channel. The most interesting case is the intermediate regime $\beta \sim \Delta^{-1/2}$, when both channels are dominated by light operators in the following sense. In this case we can tune $\beta$ so that a finite number of light operators beyond the vacuum contribute in the r.h.s. of (8.2). Their effect is then reflected in the density of "light" states $\Delta' < \Delta$ in the l.h.s. of (8.2). This can be thought of as "non-perturbative corrections" to the Cardy formula from operators beyond the vacuum and is discussed in more detail below.
Now let us derive (8.2)-(8.4). Consider for example $\beta \ge \pi\sqrt{c/(3\Delta)}$ and the first estimate in (8.4). We write $\delta\rho(\Delta) = \partial_\Delta \delta F(\Delta)$ and integrate by parts to get (8.5). We can estimate this using the error term in the Cardy formula (1.8). We get (8.6). For $\beta > \pi\sqrt{c/(3\Delta)}$ the saddle point in the last integral is outside of the integration range and therefore it is dominated close to the lower limit. As a result we get the first estimate in (8.4).
Similarly, the second estimate in (8.4) is obtained by integration by parts and using Cardy formula (1.8).
The formulae (8.2), (8.3) allow us to probe subleading operators in the dual channel. In particular, one might hope to test (8.2) numerically at finite ∆. We will do so in the 2d Ising model in the next section. Let us see which operators give contributions larger than the error term. Consider an operator with dimension $\Delta_*$ in the r.h.s. of (8.2). Its contribution to the partition function in the dual channel takes the form $e^{\frac{\pi^2 c}{3\beta} - \frac{4\pi^2}{\beta}\Delta_*}$. The condition that it is greater than the error term is (8.7). In particular, (8.7) implies that we have to scale $\beta \sim \Delta^{-1/2}$ if we would like to access a finite number of operators in the dual channel in the limit ∆ → ∞.
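As a short worked step, the dual-channel contribution quoted above can be traced back to the vacuum term. The sketch below assumes the normalization $Z(\beta) = \sum_{\Delta'} e^{-\beta(\Delta' - c/12)}$, which is the one implied by the vacuum contribution $e^{\pi^2 c/(3\beta)}$ in the introduction. The constant $b$ in $\beta = b\,\Delta^{-1/2}$ is a hypothetical parameter introduced only for illustration.

```latex
\begin{align}
Z\!\left(\tfrac{4\pi^2}{\beta}\right) \supset
e^{-\frac{4\pi^2}{\beta}\left(\Delta_* - \frac{c}{12}\right)}
&= e^{\frac{\pi^2 c}{3\beta}}\, e^{-\frac{4\pi^2}{\beta}\Delta_*} , \\
\beta = b\,\Delta^{-1/2}: \qquad
e^{-\frac{4\pi^2}{\beta}\Delta_*}
&= e^{-\frac{4\pi^2 \Delta_*}{b}\sqrt{\Delta}} .
\end{align}
```

Relative to the vacuum, an operator of dimension $\Delta_*$ is thus suppressed by a factor that is exponential in $\Delta_*\sqrt{\Delta}$, the same $\sqrt{\Delta}$ scale that controls the Cardy growth.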

Example: 2d Ising
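In this section we check our results in the 2d Ising model. In particular, we will see that the error estimates are optimal. The partition function is given by [32]

$$Z(\beta) = \frac{1}{2}\left( \left|\frac{\theta_2(\tau)}{\eta(\tau)}\right| + \left|\frac{\theta_3(\tau)}{\eta(\tau)}\right| + \left|\frac{\theta_4(\tau)}{\eta(\tau)}\right| \right),$$

where we restrict to zero angular potential, $q = e^{-\beta}$, as before, and the central charge is $c = \frac{1}{2}$. Expanding the partition function in q we can find the degeneracies of operators.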
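The closed form above is easy to evaluate numerically. The following is a minimal sketch, not the authors' code: it assumes the standard q-series for the Jacobi theta functions and the Dedekind eta function with $q = e^{-\beta}$ real, for which all three ratios are positive and the absolute values can be dropped. The function names and truncation orders are our own choices. It checks modular invariance $Z(\beta) = Z(4\pi^2/\beta)$ and the dominance of the dual-channel vacuum $e^{\pi^2 c/(3\beta)}$ at high temperature.

```python
import math

C_ISING = 0.5  # central charge of the 2d Ising model


def eta(q, terms=400):
    """Dedekind eta, q^(1/24) * prod_{n>=1} (1 - q^n), for real 0 < q < 1."""
    prod = 1.0
    for n in range(1, terms + 1):
        prod *= 1.0 - q ** n
    return q ** (1.0 / 24.0) * prod


def theta2(q, terms=80):
    """theta_2 = 2 * sum_{n>=0} q^((n + 1/2)^2 / 2)."""
    return 2.0 * sum(q ** ((n + 0.5) ** 2 / 2.0) for n in range(terms))


def theta3(q, terms=80):
    """theta_3 = 1 + 2 * sum_{n>=1} q^(n^2 / 2)."""
    return 1.0 + 2.0 * sum(q ** (n * n / 2.0) for n in range(1, terms))


def theta4(q, terms=80):
    """theta_4 = 1 + 2 * sum_{n>=1} (-1)^n q^(n^2 / 2)."""
    return 1.0 + 2.0 * sum((-1) ** n * q ** (n * n / 2.0) for n in range(1, terms))


def z_ising(beta):
    """Ising partition function at zero angular potential, q = exp(-beta)."""
    q = math.exp(-beta)
    return (theta2(q) + theta3(q) + theta4(q)) / (2.0 * eta(q))


for beta in (0.5, 1.0, 3.0):
    dual = 4.0 * math.pi ** 2 / beta
    vacuum = math.exp(math.pi ** 2 * C_ISING / (3.0 * beta))
    # Print Z in both channels and the dual-channel vacuum term for comparison.
    print(beta, z_ising(beta), z_ising(dual), vacuum)
```

The truncation orders are adequate for the values of β used here; much smaller β would need more terms in the sums and in the product.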

Unit operator
In figure 3 we plot the leading order and the error term for the moment $F_\rho(\Delta)$.

σ operator
Now let us see how the effect of the first operator above the vacuum, σ with $\Delta_\sigma = \frac{1}{8}$, can be seen from formula (8.2). According to (8.7), β is constrained from both sides: σ should give a contribution larger than the error term, while the next operator, ε with $\Delta_\varepsilon = 1$, should give a contribution smaller than the error term. Inserting the numerical values, we find that β must be chosen in a finite window. Below we plot the contribution of the σ operator to (9.5) and find perfect agreement. One can also plot the error term similarly to figure 3.

Microcanonical entropy
As discussed in the main text the microcanonical entropy S δ (∆) takes the universal form (1.4) at high energies ∆ ≫ 1.
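The counting behind $S_\delta(\Delta)$ can be illustrated directly in the 2d Ising model. The sketch below rests on two assumptions that are not taken from this section: it reads $S_\delta(\Delta)$ as the logarithm of the number of states with $|\Delta' - \Delta| \leq \delta$ (our reading of (1.2)), and it obtains the degeneracies from the standard free-fermion product forms of the Ising characters rather than from the theta-function expression. All function names are ours. The result is compared with the leading Cardy term $2\pi\sqrt{c\Delta/3}$ only; the subleading terms in (1.4) are not included.

```python
import math

C_ISING = 0.5  # central charge of the 2d Ising model


def mul_trunc(a, b, order):
    """Multiply two integer power series, dropping terms above x^order."""
    out = [0] * (order + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j > order:
                break
            out[i + j] += ai * bj
    return out


def ising_degeneracies(order):
    """Degeneracies at Delta = n and at Delta = n + 1/8, for n = 0..order."""
    # Vacuum and epsilon families: even part of prod_{k odd} (1 + x^k)^2,
    # where x = q^(1/2); the coefficient of x^(2n) counts states at Delta = n.
    a = [1]
    for k in range(1, 2 * order + 1, 2):
        a = mul_trunc(a, [1] + [0] * (k - 1) + [1], 2 * order)
    d_int = mul_trunc(a, a, 2 * order)[0::2]
    # Sigma family: q^(1/8) * prod_{k>=1} (1 + q^k)^2.
    s = [1]
    for k in range(1, order + 1):
        s = mul_trunc(s, [1] + [0] * (k - 1) + [1], order)
    d_sig = mul_trunc(s, s, order)
    return d_int, d_sig


def shell_entropy(energy, half_width, d_int, d_sig):
    """log of the number of states with |Delta' - energy| <= half_width."""
    count = sum(d for n, d in enumerate(d_int) if abs(n - energy) <= half_width)
    count += sum(d for n, d in enumerate(d_sig)
                 if abs(n + 0.125 - energy) <= half_width)
    return math.log(count)


d_int, d_sig = ising_degeneracies(60)
for energy in (10.0, 20.0, 40.0):
    s_micro = shell_entropy(energy, 1.0, d_int, d_sig)
    s_cardy = 2.0 * math.pi * math.sqrt(C_ISING * energy / 3.0)
    # The difference contains the logarithmic and O(1) corrections.
    print(energy, s_micro, s_cardy, s_micro - s_cardy)
```

The truncation order must exceed the upper edge of the largest shell, otherwise the count is incomplete.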
Here we explicitly plot the O(1) correction s(δ, ∆) to the leading behavior of the entropy in the 2d Ising model, see figure 5. In agreement with the general discussion we find that s(δ, ∆) is an oscillating function whose oscillations satisfy the general bounds (1.5). Note that, strictly speaking, the bound (1.5) was derived in the large ∆ limit and here we plot it at finite ∆. We present the finite ∆ version of the bound (4.8) in figure 5 as well. Since for the 2d Ising model the vacuum is the only operator with $\Delta < \frac{c}{12}$, we use only the vacuum contribution in $Z_L$ that enters the HKS bound. We then use the HKS bound to estimate $Z_H$.

Example: monster CFT
Let us apply our bounds for the microcanonical entropy to the monster CFT [18,19]. Recall that it describes a chiral CFT with c = 24 and a partition function that takes the form (10.1). In principle, nothing prevents us from deriving (6.12) for chiral CFTs. We do not do this here. Instead, at zero angular potential and without imposing invariance under τ → τ + 1, we can interpret (10.1) as the partition function of a non-chiral CFT with c = 12 that satisfies (3.2). Therefore we can apply the asymptotic (1.6) to it directly. In figure 6 we see that for finite δ the difference between the actual microcanonical entropy and the large ∆ expansion satisfies the expected bounds. We can also probe a subleading universal correction by taking $\delta = \Delta^{\epsilon}$. The result is presented in figure 7.
In the box we plot s Vir that the non-universal difference between the two curves is consistent with (1.6).

Discussion
In this paper we studied modular invariance of unitary 2d partition functions. Its most famous consequence is the Cardy formula (1.3), which states that the density of states of unitary CFTs at high energies takes a simple, universal form. Needless to say, there is a vast amount of work where the modular invariance of 2d partition functions is explored and variations of the Cardy formula are discussed. However, we have not found a rigorous derivation of the microcanonical entropy S δ (∆) (1.2) from the canonical entropy (1.1) in the literature. We closed this gap by considering a set of linear functionals applied to the modularity condition that naturally appear in tauberian theory [12]. The corresponding tauberian theorems are very general and not limited to the discussion of partition functions in 2d CFTs. In particular, they are applicable to the higher-dimensional discussions of modular invariance [33], warped 2d CFTs [34,35], as well as to the thermal two-point function [36,37] and the vacuum four-point functions [24,27,38–41] (all of which can be studied using modular bootstrap tools in 2d). It would be especially interesting to see if the methods used in this paper could shed light on the Eigenstate Thermalization Hypothesis [11,42–44] either in 2d [36,37] or in higher d using the approach of [15].

We also analyzed our bounds in the large c theories with gravity duals [16,19,45,46] and found, see (1.7), that the HKS result [16] for the microcanonical entropy can be rigorously extended to include the logarithmic correction with a bounded error of O(1). In this case the microcanonical entropy counts black hole microstates in AdS. In contrast to the situation for supersymmetric, extremal black holes, see e.g. [47–49], the logarithmic correction to the black hole entropy of a non-extremal black hole in AdS 3 is universal and given by (7.9), as explained for example by Sen [10].
Our results for the logarithmic correction also agree with the old results of Carlip [17] (after averaging!) and constitute a rigorous derivation thereof.

It is important to emphasize that the techniques used in this paper require positivity of the spectral density ρ(∆). None of what we derived here holds if ρ(∆) is not positive-definite. Even if the asymptotic of the partition function is fixed, one might imagine that many different spectral densities lead to the same asymptotic due to possible cancellations for a non-positive density. It is therefore not clear how to make rigorous the results of [50], which involve (not necessarily positive) three-point functions.

Another important feature of our analysis is that the bounds that we obtained are in principle applicable at finite ∆. To derive them we used the so-called HKS bound [16], which allows one to estimate the contribution of heavy operators to the partition function. We have not fully explored these bounds and it would be interesting to do so, e.g. numerically. Recently a bound analogous to HKS was derived in the context of the four-point functions [52]. Therefore, it should be possible to repeat the large ∆ conformal bootstrap analysis of [15] at finite ∆.
One obvious extension of our analysis is to allow for non-zero angular potential. Since the combined spectral density ρ(∆, J) is positive we should be able to derive the corresponding asymptotic results. The corresponding tauberian problem, however, becomes two-dimensional. It will be interesting to extend our results to this case.
Most naturally our work should be thought of as a part of the modular bootstrap program [21,29,53–55] that systematically studies modular invariance by applying the most general set of linear functionals, both numerically and analytically. The functionals that appear in our work are particularly handy in deriving high-energy bounds. They are optimal in the sense that they give optimal scaling of error terms with ∆ in the limit ∆ → ∞, but not necessarily with optimal coefficients. In particular, it might be possible to improve the bounds (1.4), (1.5) on s(δ, ∆). It would be very interesting to find functionals that optimize these bounds in the spirit of [22,56,57]. We leave these tasks for the future.

Figure 8. Contour deformation for $G_+$.

B Power corrections
We can consider multiple integrals of the density of states [15]  The coefficients c i can be computed explicitly using the crossing kernel (2.6). Again, all the spectral density moments F m ρ (∆) are controlled by the unit operator in the dual channel. The intuition behind (B.1) is that each integration enhances smooth power-like terms while keeping intact oscillating non-universal terms.
The derivation of (B.2) is analogous to the one in section 5. We consider (5.3) with a higher order pole 1 2πi where we used the sparseness condition (7.1). 20 Plus lower orders of ∆ − ∆ ′ due to the expansion of the polynomial in (B.3).

Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.