Modular Invariance, Tauberian Theorems, and Microcanonical Entropy

We analyze modular invariance drawing inspiration from tauberian theorems. Given a modular invariant partition function with a positive spectral density, we derive lower and upper bounds on the number of operators within a given energy interval. They are most revealing at high energies. In this limit we rigorously derive the Cardy formula for the microcanonical entropy together with optimal error estimates for various widths of the averaging energy shell. We identify a new universal contribution to the microcanonical entropy controlled by the central charge and the width of the shell. We derive an upper bound on the spacings between Virasoro primaries. Analogous results are obtained in holographic 2d CFTs. We also study partition functions with a UV cutoff. Control over error estimates allows us to probe operators beyond the unity in the modularity condition. We check our results in the 2d Ising model and the Monster CFT and find perfect agreement.


Introduction
High energy estimates on various physical quantities are commonly stated locally even though they are only true on average. A few famous examples are: the Froissart bound on the growth of the cross section [1,2], high frequency expansion of conductivity at finite temperature [3,4], high energy asymptotic of the electromagnetic current spectral density in the context of the so-called quark-hadron duality [5], and finally the Cardy formula for two-dimensional CFTs [6]. The latter is particularly interesting because of its importance for the problem of the black hole microstate counting [7][8][9][10]. It is then a natural question to ask: how do these estimates depend on the details of the averaging?
Let us review the standard derivation of the Cardy formula. We consider a thermal partition function $Z(\beta)$ of a unitary 2d CFT on a Euclidean torus. The partition function is modular invariant, $Z(\beta) = Z(4\pi^2/\beta)$. This implies that the high-temperature limit $\beta \to 0$ of the partition function is captured by the contribution of the vacuum in the dual channel, $Z(\beta) \sim e^{\pi^2 c/(3\beta)}$, where $c$ is the central charge. Using the standard thermodynamic formula $S(\beta) = (1 - \beta\partial_\beta)\log Z$ we can compute the entropy $S(\beta)$ at high temperatures
$$S(\beta) = \frac{2\pi^2 c}{3\beta} + \ldots \qquad (1.1)$$
In the $\beta \to 0$ limit the energy of the system $\Delta = -\partial_\beta \log Z = \frac{\pi^2 c}{3\beta^2} + \ldots$ goes to infinity. The $\Delta \to \infty$ limit being the thermodynamic limit, see e.g. [11], one obtains from the usual thermodynamic arguments that the extensive part of the entropy, which is given by (1.1), also correctly captures the leading behavior of the microcanonical entropy $S_\delta(\Delta)$ defined by
$$S_\delta(\Delta) \equiv \log \int_{\Delta-\delta}^{\Delta+\delta} d\Delta'\, \rho(\Delta'), \qquad (1.2)$$
as soon as $\delta$ is large enough to include many energy levels. That is, if we express the temperature as a function of the average energy, $\beta = \pi\sqrt{c/(3\Delta)}$, and plug it in (1.1) we arrive at the famous Cardy formula for the microcanonical entropy
$$S_\delta(\Delta) = 2\pi\sqrt{\frac{c\Delta}{3}} + \ldots \qquad (1.3)$$
Indeed, the spectral density $\rho(\Delta)$ is related to the partition function $Z(\beta)$ by the inverse Laplace transform. It is sometimes argued that this Laplace transform can be evaluated by a saddle point approximation from which the statement about $\rho(\Delta)$ and therefore $S_\delta(\Delta)$ can be made. A more accurate description of this procedure would be to say that one can easily find the crossing kernel of the vacuum contribution $e^{\pi^2 c/(3\beta)}$, or, in other words, a spectral density $\rho_0(\Delta)$ that correctly reproduces the vacuum in the dual channel. The question then remains: what is the precise relation between the naive spectral density $\rho_0(\Delta)$ and the actual physical density $\rho(\Delta)$? This relation cannot be too literal.
Indeed, the former is a smooth function of ∆, whereas the latter is a sum of delta-functions.
Once again the physical intuition is that they are related on average, but establishing this rigorously is a nontrivial task. The issue of making the argument precise becomes even more important if one considers "finite size" or $1/\Delta$ corrections to the Cardy formula. The purpose of this paper is to close this gap in the usual discussions of the Cardy formula and to develop further techniques that allow us to study $1/\Delta$ corrections to it. The physical question of going from the finite temperature partition function to the microcanonical entropy can be addressed in a mathematically rigorous way using the methods of tauberian theory [12], as explained in [13,14]. From the conformal/modular bootstrap point of view tauberian theory provides a natural set of linear functionals with which we act on the crossing/modularity condition to derive optimal estimates on $S_\delta(\Delta)$ or other spectral density averages.
As further noticed in [15] the optimal error estimates can be obtained using the so-called complex tauberian theorems, which exploit the fact that physical quantities of interest are very often analytic functions in a complex domain. This is indeed the case for the modularity condition of 2d CFTs. In this note we apply methods of tauberian theory to modular invariance in 2d CFTs and rigorously derive the Cardy formula and corrections to it, where we explicitly keep track of the dependence on $\delta$. Furthermore, combining these ideas with bounds on the partition function of Hartman, Keller and Stoica (HKS) [16] we find lower and upper bounds on the number of operators within a given window of finite conformal dimensions $(\Delta - \delta, \Delta + \delta)$. Though true at finite $\Delta$, they are most revealing in the limit $\Delta \to \infty$.
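The thermodynamic steps of the standard derivation reviewed above can be checked numerically. The following is a minimal sketch that keeps only the leading vacuum term $\log Z \approx \pi^2 c/(3\beta)$; the function names are illustrative and do not come from the paper.

```python
import math

def log_z_vacuum(beta, c):
    """Leading high-temperature behaviour: log Z ~ pi^2 c / (3 beta)."""
    return math.pi**2 * c / (3.0 * beta)

def canonical_entropy(beta, c):
    """S = (1 - beta d/dbeta) log Z, evaluated on the vacuum term only."""
    dlogz_dbeta = -math.pi**2 * c / (3.0 * beta**2)
    return log_z_vacuum(beta, c) - beta * dlogz_dbeta

def saddle_beta(delta, c):
    """Inverse temperature solving Delta = pi^2 c / (3 beta^2)."""
    return math.pi * math.sqrt(c / (3.0 * delta))

def cardy_entropy(delta, c):
    """Leading Cardy growth 2 pi sqrt(c Delta / 3)."""
    return 2.0 * math.pi * math.sqrt(c * delta / 3.0)

# Compare the two expressions for a few sample values of (c, Delta).
for c, delta in [(0.5, 50.0), (12.0, 200.0), (24.0, 1.0e4)]:
    s_canonical = canonical_entropy(saddle_beta(delta, c), c)
    print(c, delta, s_canonical, cardy_entropy(delta, c))
```

The two columns agree identically, which is only the statement that the Legendre transform of the vacuum term reproduces the Cardy exponent. It says nothing about the averaging over the energy shell, which is the actual subject of the paper.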

Review of the Results
We consider a modular invariant partition function with zero angular potential and positive spectral density. We derive a set of rigorous results about S δ (∆) (1.2). These concern either all operators present in the theory, or only Virasoro primaries in CFTs with c > 1.
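• Let us first discuss densities of all operators, both primaries and descendants. We derive a rigorous asymptotic for the microcanonical entropy
$$S_\delta(\Delta) = 2\pi\sqrt{\frac{c\Delta}{3}} + \frac{1}{4}\log\left(\frac{c\,\delta^4}{3\Delta^3}\right) + s(\delta, \Delta), \qquad \Delta \to \infty, \qquad (1.4)$$
where depending on the size of the averaging energy shell $\delta$ we show (1.5) that: for $\delta \sim \Delta^\alpha$ (footnote 1) the function $s(\delta,\Delta)$ equals $\log\frac{\sinh\left(\pi\sqrt{c/(3\Delta)}\,\delta\right)}{\pi\sqrt{c/(3\Delta)}\,\delta}$ up to an error that decays at large $\Delta$; for $\delta = O(1)$ it is bounded as $s_-(\delta) \le s(\delta,\Delta) \le s_+(\delta)$ (see fig. 1). There is a minimal width $\delta_{gap} = \frac{\sqrt{3}}{\pi}$, below which we do not have a lower bound. The divergence of $s_+(\delta)$ at $\delta = 0$ is spurious and is cancelled by $\log\delta$ in (1.4).
Footnote 1: By $a \sim b$ we mean $\lim a/b = \mathrm{const} \neq 0$ in the corresponding limit.
The first two terms in the RHS of (1.4) are the Cardy formula (1.3) and the leading log correction to it discussed for example in [17,10]. The results for $s(\delta, \Delta)$ are new to the best of our knowledge. In particular, we see that for $\delta \sim \Delta^\alpha$ there is yet another universal correction to the microcanonical entropy that is controlled by the central charge $c$ and the width of the energy shell $\delta$ and given by the first line in (1.5). Note that for any $\alpha > 0$ the error decays at large $\Delta$ and the non-decaying contribution to the entropy is fully captured by the universal terms. For $\delta = O(1)$ the function $s(\delta,\Delta)$ is not a constant and can oscillate as we change $\Delta$, but always between the values $s_\pm(\delta)$.
In fact, we will explicitly see these oscillations in the 2d Ising model in section 9.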
• For CFTs with $c > 1$ we can derive analogous formulae (1.6) for Virasoro primary operators, where $\Delta \to \infty$ and $s_\pm(\delta)$ are the same as in (1.5). In particular, we obtain an upper bound $2\sqrt{3}/\pi \approx 1.1$ on the spacing between Virasoro primaries at large $\Delta$ (see section 6). This bound is not necessarily optimal. Nevertheless, it is close to the optimal since there are many examples of theories with the spacings equal to 1 (footnote 4).
• We derive an asymptotic (1.7) of the microcanonical entropy in holographic 2d CFTs in the limit $c \to \infty$ with $\Delta/c$ fixed and $\Delta > c/6$, where $\delta \sim c^\alpha$, $0 \le \alpha < 1$ and $\delta > \delta_{gap}$. This relies on the sparseness condition of Hartman, Keller and Stoica (HKS) [16] and extends their result (footnote 5) for the microcanonical entropy, which is (1.7) with an extra constraint $\frac{1}{2} < \alpha < 1$.
As we will explain later on, an important ingredient in the derivation of $s_\pm(\delta)$ and $\delta_{gap}$ relies on the existence of functions $\phi_\pm(\Delta')$ with the following properties: 1) $\phi_+(\Delta')$ and $\phi_-(\Delta')$ bound the indicator function of the interval $(\Delta - \delta, \Delta + \delta)$ from above and below respectively; 2) Their Fourier transform has a bounded support.
We make an explicit choice of such functions to arrive at the particular value of δ gap and the bounding curves in s ± (δ). Nevertheless, the method is completely general and we leave open the question of finding the functions φ ± (∆ ′ ) giving optimal bounds.
• Above we stated our results at asymptotically high energies. They follow from more general bounds on the number of operators at finite $\Delta$, $c$, that we derive in section 4. Specifically, given the data about operators $\Delta \le c/12$ we derive rigorous upper and lower bounds on the number of operators in a given window of scaling dimensions. We emphasize that all parameters can be kept finite. In particular, these bounds can be easily implemented numerically. For example, we can derive numerical bounds on the gap above the vacuum, though these turn out to be weaker than [20,21]. On the other hand we can also bound the number of operators in any window of scaling dimensions at any $\Delta$ above the first excited state as well (footnote 6).
Footnote 4: A famous example is the monster CFT [18,19]. The monster CFT is chiral with $(c_L, c_R) = (24, 0)$. However, for zero angular potential its partition function can be interpreted as the one of a non-chiral theory with $(c_L, c_R) = (12, 12)$, see e.g. [20]. It, therefore, satisfies the modularity constraint imposed in this paper.
Footnote 5: See appendix A in their paper.
Footnote 6: Analogous bounds for the spectral density weighted by the squares of OPE coefficients in 1d CFTs were recently derived in [22].
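• We consider partition functions with a UV cutoff. We start by proving a generalized Ingham's theorem:
Theorem: Consider a positive spectral density $\rho(\Delta)$, such that the partition function $Z(\beta) = \int_0^\infty d\Delta\, \rho(\Delta)\, e^{-\beta(\Delta - c/12)}$ is modular invariant, $Z(\beta) = Z(4\pi^2/\beta)$. Then the integrated spectral density $F(\Delta) \equiv \int_0^\Delta d\Delta'\, \rho(\Delta')$ satisfies
$$F(\Delta) = \frac{1}{2\pi}\left(\frac{3}{c\Delta}\right)^{1/4} e^{2\pi\sqrt{\frac{c\Delta}{3}}}\left[1 + O\left(\Delta^{-1/2}\right)\right], \qquad \Delta \to \infty. \qquad (1.8)$$
The RHS of (1.8) comes from the unit operator in the dual modular channel, which dominates the partition function at high temperatures. The average of the physical density of states on the LHS of (1.8) is a discontinuous "staircase-like" function. It is approximated by a smooth function on the RHS of (1.8) with a bounded error term. The discontinuities of the LHS of (1.8) are hidden in the non-universal (footnote 7) error term on the RHS.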
In particular, it does not make sense to write further smooth power suppressed terms in the RHS of (1.8). We will see it explicitly in the example of 2d Ising model that the error term is a highly oscillating function and cannot be approximated by a smooth function.
This example will also demonstrate that the error estimate is optimal.
The asymptotic (1.4), (1.5) of the microcanonical entropy for energy shells $\delta \sim \Delta^\alpha$, $0 < \alpha < 1$, follows directly from (1.8). Further, using this theorem we derive a bound (1.9) on the cutoff partition function at finite temperature (footnote 8), where $\rho_0$ is the vacuum crossing kernel defined below. Depending on the temperature some operators in the dual channel on the RHS of (1.9) dominate over the error term and therefore are captured by the cutoff partition function on the LHS.
Footnote 7: Everywhere in this paper by "non-universal terms" we mean the terms that are not controlled by light operators in the dual channel.
Footnote 8: And a similar bound for $\beta < \pi\sqrt{c/(3\Delta)}$.

Related Works
The averaging procedure (1.8) was first pointed out in the context of CFTs in [13].
In the mathematical literature the asymptotic (1.8) without the error estimate is known as Ingham's tauberian theorem for large Laplace transform [23]. For a nice exposition of this result see [12], Section IV.21. The relevance of Ingham's theorem for the Cardy formula was also emphasized in [24], Appendix C. We give a derivation of (1.8), which is different from the original proof [23]. The novelty of (1.8) is the error estimate, which is absent in Ingham's theorem. In the proof we use the methods of [25], Section 2.3, extensively discussed in [15]. In particular, the error estimate allows us to access subleading operators in the cutoff partition function (1.9).

Setup
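Consider a unitary 2d CFT on a torus with the modular parameter $\tau = \frac{1}{2\pi}(\theta + i\beta)$ and the coordinate on the torus $z = \frac{1}{2\pi}(\varphi + i t_E)$ with standard identifications $z \sim z + 1 \sim z + \tau$. In these conventions the spatial circle $\varphi$ has length $2\pi$ and the Euclidean time circle $t_E$ has length $\beta$. The partition function is invariant under the modular transformation $\tau \to -1/\tau$. In what follows we restrict to zero angular potential $\theta = 0$ so that $q = e^{-\beta}$. However, we consider complex $\beta$ with $\mathrm{Re}[\beta] > 0$. This is possible due to unitarity. In this case the modular invariance is expressed by
$$Z(\beta) = Z\left(\frac{4\pi^2}{\beta}\right) \qquad (2.2)$$
or, equivalently,
$$\int_0^\infty d\Delta\, \rho(\Delta)\, e^{-\beta\left(\Delta - \frac{c}{12}\right)} = \int_0^\infty d\Delta\, \rho(\Delta)\, e^{-\frac{4\pi^2}{\beta}\left(\Delta - \frac{c}{12}\right)}, \qquad (2.3)$$
where the density of states is defined by
$$\rho(\Delta) = \sum_n \delta(\Delta - \Delta_n) \qquad (2.4)$$
and the sum is over all operators in the theory, both primaries and descendants. We will be interested in exploring consequences of (2.2). In the high-temperature limit $|\beta| \to 0$ the RHS of (2.2) is dominated by the unit operator
$$Z(\beta) = e^{\frac{\pi^2 c}{3\beta}}\left[1 + O\left(e^{-\frac{4\pi^2}{\beta}\Delta_1}\right)\right], \qquad (2.5)$$
where $\Delta_1$ is the first operator above the vacuum.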
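To write the asymptotic of the spectral density it will be convenient to introduce a "naive" spectral density $\rho_0(\Delta)$ which correctly reproduces the contribution of the vacuum in the partition function. The correct expression takes the form
$$\rho_0(\Delta) = \pi\sqrt{\frac{c}{3}}\, \frac{I_1\left(2\pi\sqrt{\frac{c}{3}\left(\Delta - \frac{c}{12}\right)}\right)}{\sqrt{\Delta - \frac{c}{12}}}\, \theta\left(\Delta - \frac{c}{12}\right) + \delta\left(\Delta - \frac{c}{12}\right), \qquad (2.6)$$
where $\theta(x)$ is the Heaviside step function. This, of course, cannot be literally an approximation of the physical density of states (2.4), as the latter is a sum of delta functions. The index "0" in the LHS of (2.6) is reminding us of that. Nevertheless, the Laplace transform of (2.6) coincides with the unit operator contribution to the partition function
$$\int_0^\infty d\Delta\, \rho_0(\Delta)\, e^{-\beta\left(\Delta - \frac{c}{12}\right)} = e^{\frac{\pi^2 c}{3\beta}}. \qquad (2.7)$$
The function $\rho_0(\Delta)$ can be naturally called "crossing kernel" in analogy with [27].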
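The statement that the Laplace transform of the crossing kernel reproduces the vacuum term $e^{\pi^2 c/(3\beta)}$ can be tested numerically. The sketch below assumes that, in the shifted variable $x = \Delta - c/12$, the smooth part of the kernel is $\sqrt{a/x}\, I_1(2\sqrt{a x})$ with $a = \pi^2 c/3$, accompanied by a delta function at $x = 0$. This form is an assumption of the sketch, and the helper names are illustrative. The Bessel function is summed from its power series so that only the standard library is needed.

```python
import math

def rho0_smooth(x, a):
    """sqrt(a/x) * I_1(2 sqrt(a x)) from its power series, for x >= 0.

    The k-th term is a^(k+1) x^k / (k! (k+1)!); all terms are non-negative.
    """
    term = a
    total = term
    k = 0
    while True:
        k += 1
        term *= a * x / (k * (k + 1))
        total += term
        if term <= 1e-17 * total:
            return total

def laplace_of_kernel(beta, c, x_max=80.0, n=8000):
    """Simpson estimate of the Laplace transform of the assumed kernel."""
    a = math.pi**2 * c / 3.0
    h = x_max / n
    acc = 0.0
    for i in range(n + 1):
        x = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        acc += weight * math.exp(-beta * x) * rho0_smooth(x, a)
    # The final "+ 1.0" is the contribution of the delta function at x = 0.
    return acc * h / 3.0 + 1.0

c, beta = 0.5, 1.0
lhs = laplace_of_kernel(beta, c)
rhs = math.exp(math.pi**2 * c / (3.0 * beta))
print(lhs, rhs)
```

The cutoff `x_max` and the number of Simpson panels are chosen for the sample values $c = 1/2$, $\beta = 1$; smaller $\beta$ needs a larger integration range because the integrand then peaks at larger $x$.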

HKS Bound on Heavy Operators
An important result for obtaining bounds on the spectral density at finite ∆ will be the bound of Hartman, Keller, Stoica (HKS bound) [16] on the contribution of heavy operators into the partition function. We review its derivation in this section.
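We split the partition function as $Z(\beta) = Z_L(\beta) + Z_H(\beta)$, where $Z_L$ and $Z_H$ collect the contributions of light operators (below a threshold $\Delta_H$) and heavy operators (above it).

Modular invariance states that $Z_L + Z_H = Z'_L + Z'_H$,
where by primes we denote the dual channel, $Z' \equiv Z(\beta')$ with $\beta' = 4\pi^2/\beta$. At low temperatures this leads to the bound (3.4) on $Z_H$. This also implies a bound (3.5) on $Z'_H$ via modular invariance. Exchanging $\beta$ and $\beta'$ in (3.5) we can turn it into a bound (3.6) at high temperatures. Depending on the temperature the bound on the heavy operators is either (3.4) or (3.6).
Everywhere we assume that $\Delta_H > c/12$.
Finally, (3.4), (3.6) lead to bounds on the full partition function. Note that the bounds (3.4), (3.6) stay finite if we take $\beta \to 2\pi$. Indeed, $Z_L - Z'_L$ is zero and cancels the zero of the denominator, whereas $\Delta_H$ is strictly above the BTZ threshold $\frac{c}{12}$.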
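For orientation, the standard HKS argument behind these bounds can be sketched as follows. This is a reconstruction under stated assumptions, not a transcription of the original displayed equations: we take $\beta > 2\pi$, so that $\beta' = 4\pi^2/\beta < \beta$, define $Z_H$ as the sum over operators with $\Delta \ge \Delta_H$, and use $\Delta_H > c/12$ together with positivity of the spectrum.

```latex
\begin{align*}
Z_H(\beta)
  &= \sum_{\Delta \ge \Delta_H} e^{(\beta' - \beta)\left(\Delta - \frac{c}{12}\right)}\,
     e^{-\beta'\left(\Delta - \frac{c}{12}\right)}
   \;\le\; e^{(\beta' - \beta)\left(\Delta_H - \frac{c}{12}\right)}\, Z_H(\beta') , \\
Z_H(\beta')
  &= Z_L(\beta) + Z_H(\beta) - Z_L(\beta')
   \qquad \text{(modular invariance)} , \\
\Longrightarrow \quad
Z_H(\beta)
  &\le \frac{Z_L(\beta) - Z_L(\beta')}
            {e^{(\beta - \beta')\left(\Delta_H - \frac{c}{12}\right)} - 1} .
\end{align*}
```

In this form both the numerator and the denominator vanish as $\beta \to 2\pi$, which is the cancellation mentioned in the text. The high-temperature version follows by exchanging $\beta$ and $\beta'$.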

Local Bound on the Number of Operators
We can use modular invariance together with the HKS bound to derive a local bound on the density of operators. To that end let us consider two functions $\phi_\pm(\Delta')$ that bound the indicator function of the interval $(\Delta - \delta, \Delta + \delta)$ from above and below, (4.1). We can multiply this inequality by $e^{-\beta\Delta'}$ and bound the exponential on the interval to obtain (4.2). Integrating both sides of (4.2) with the spectral density $\int_0^\infty dF(\Delta')$ we finally obtain an estimate (4.3) on the number of operators in the interval. In the inequality above $\beta$ and $\delta$ are free parameters. We will fix $\beta$ below by making the bound optimal.
Next the idea is to do the Fourier transform $\phi_\pm(\Delta) = \int_{-\infty}^{\infty} dt\, \hat\phi_\pm(t)\, e^{-i\Delta t}$, which turns (4.3) into a bound (4.4) in terms of the partition function, where we introduced the Laplace transform $\mathcal{L}_\rho$ of a density $\rho$,
$$\mathcal{L}_\rho(\beta) = \int_0^\infty d\Delta\, \rho(\Delta)\, e^{-\beta\Delta}. \qquad (4.5)$$
As a next step we apply a modular transformation to $\mathcal{L}_\rho(\beta + it)$ and separate the contribution of light and heavy operators in the dual channel. We write $\mathcal{L}_\rho(\beta + it) = e^{-(\beta+it)c/12} Z(\beta + it) = e^{-(\beta+it)c/12} Z\left(\frac{4\pi^2}{\beta+it}\right)$ and split $Z = Z_L + Z_H$. As in (2.7) we can rewrite $e^{-(\beta+it)c/12} Z_L\left(\frac{4\pi^2}{\beta+it}\right) = \mathcal{L}_{\rho_0,L}(\beta + it)$, where the label $\rho_0$ refers to the fact that the Laplace transform is computed with the crossing kernel rather than the density of actual physical operators (footnote 11). In this way we get (4.6). We will see below that the light contribution produces the expected Cardy behavior, whereas the contribution of the heavy operators we can estimate using the HKS bound.
The heavy contribution can be bounded as $\left|Z_H\left(\frac{4\pi^2}{\beta+it}\right)\right| \le Z_H\left(\frac{4\pi^2\beta}{\beta^2+t^2}\right)$ by removing phases. Then the RHS of the HKS bound (3.6) diverges exponentially as $t \to \infty$ when applied to $Z_H\left(\frac{4\pi^2\beta}{\beta^2+t^2}\right)$. Therefore we require that $\hat\phi_\pm(t)$ is decaying sufficiently rapidly at $t \to \infty$ so that the integrals in (4.6) converge. In what follows we take $\hat\phi_\pm(t)$ to have finite support $|t| < \Lambda_\pm$. We then have (4.7), where it was absolutely crucial that the theory under consideration is unitary. The contribution of the heavy operators can be bounded using the HKS bound (3.6) or (3.4). Also rewriting the first term in (4.4) back in $\Delta$-space we have (4.8).
Footnote 11: Here it is implied that the crossing kernel $\rho_0$ is not only for the vacuum (2.7), but for all light operators entering $Z_L$. Though in the large $\Delta$ analysis below the vacuum contribution will be dominant.
We do not know what is the best choice of φ ± (∆ ′ ) within the class of functions with the Fourier transform of finite support and satisfying (4.1) that make the bounds optimal.
A simple and convenient choice is (4.9) Note that these functions indeed satisfy (4.1) and their Fourier transform has a bounded support. Moreover, for this particular choice we have These are the values relevant for our finite ∆ results in the 2d Ising section.
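Written out, the first steps of this section amount to the chain of inequalities below. This is a reconstruction under the assumptions that $\phi_\pm$ bound the indicator function $\theta_{[\Delta-\delta,\Delta+\delta]}$ of the interval from above and below, that $\beta > 0$, and that the measure $dF$ is non-negative. The equation numbers of the original are not reproduced.

```latex
\begin{align*}
& \phi_-(\Delta') \;\le\; \theta_{[\Delta-\delta,\,\Delta+\delta]}(\Delta') \;\le\; \phi_+(\Delta') , \\[4pt]
& e^{\beta(\Delta-\delta)}\, e^{-\beta\Delta'}\, \phi_-(\Delta')
  \;\le\; \theta_{[\Delta-\delta,\,\Delta+\delta]}(\Delta')
  \;\le\; e^{\beta(\Delta+\delta)}\, e^{-\beta\Delta'}\, \phi_+(\Delta') , \\[4pt]
& e^{\beta(\Delta-\delta)} \int_0^\infty dF(\Delta')\, e^{-\beta\Delta'} \phi_-(\Delta')
  \;\le\; \int_{\Delta-\delta}^{\Delta+\delta} dF(\Delta')
  \;\le\; e^{\beta(\Delta+\delta)} \int_0^\infty dF(\Delta')\, e^{-\beta\Delta'} \phi_+(\Delta') .
\end{align*}
```

The second line holds pointwise. Inside the interval $e^{\beta(\Delta-\delta-\Delta')} \le 1 \le e^{\beta(\Delta+\delta-\Delta')}$, while $\phi_- \le 1 \le \phi_+$. Outside the interval the indicator function vanishes, and there $\phi_- \le 0 \le \phi_+$.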

Bounds at large ∆
The bound (4.8) substantially simplifies in the limit $\Delta \gg 1$. Below we will see that in this case the optimal choice is $\beta = \pi\sqrt{c/(3\Delta)} \ll 1$. Using the HKS bound we can show that the second terms in (4.8) proportional to $Z_H$ are subleading for $\Lambda_\pm < 2\pi$. Indeed we get the estimate (4.10), which will be subleading for $\Lambda_\pm < 2\pi$ (we will see it momentarily below). Therefore at large $\Delta$ only the first terms in (4.8) remain. The integrals can be computed by the saddle point approximation and give the bound (4.12) with coefficients $c_\pm$. We see that dropping the terms (4.10) is indeed justified for $\Lambda_\pm < 2\pi$. The explicit integration of (4.9) gives (4.13). Note that for $\delta$ such that $c_- > 0$ we have to have at least one operator in the interval $(\Delta - \delta, \Delta + \delta)$. This happens if $\delta > \delta_{gap}$, where we also used the assumption $\Lambda_- < 2\pi$ to drop the term (4.10). That is, for the simple choice of functions (4.9) we get $\delta_{gap}^2 = \frac{3}{\pi^2}$, which is to say that every modular invariant partition function has to have at least one operator within the window of size $2\delta_{gap} = \frac{2\sqrt{3}}{\pi} \approx 1.1$. Of course, this is completely trivial in 2d CFTs due to the Virasoro descendants. However, in section 6 we will see that the same argument applies to Virasoro primaries as well provided $c > 1$ and with the same result. It is natural to conjecture that the maximum allowed spacing between Virasoro primary operators is in fact 1.
Similarly, keeping $\delta$ arbitrary we can optimize over $0 < \Lambda_\pm < 2\pi$ to get the tightest possible bound (4.12). For the lower bound the result is (4.16) and for the upper bound (4.17), where $a_*$ is the positive solution of the equation $a_* = 3\tan(a_*/4)$, $a_* \approx 3.38$.
In terms of the microcanonical entropy this gives (1.4), where $s$ is of $O(1)$ and can be bounded from (4.16), (4.17). We find the bounds $s_-(\delta) \le s(\delta,\Delta) \le s_+(\delta)$ quoted in (1.5). It would be interesting to find the optimal bounds on the local density of operators by a better choice of $\phi_\pm$. To reiterate, in our argument these obey two defining properties: they satisfy (4.1); they have a finite support in Fourier space (4.10). Let us also emphasize that the bounds (4.4), (4.8) are applicable at finite $\Delta$ as well. In this case we should simply keep the terms (4.7) which we can estimate using the HKS bound.
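The two numerical constants quoted in this section are easy to reproduce. The sketch below solves $a_* = 3\tan(a_*/4)$ by bisection and evaluates the window size $2\delta_{gap} = 2\sqrt{3}/\pi$. The bracketing interval $[3, 3.7]$ is an assumption chosen by inspection, and the helper `bisect_root` is illustrative.

```python
import math

def bisect_root(f, lo, hi, tol=1e-12):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Positive solution of a = 3 tan(a / 4); tan(a/4) is regular on [3, 3.7].
a_star = bisect_root(lambda a: a - 3.0 * math.tan(a / 4.0), 3.0, 3.7)

# Minimal half-width and the corresponding window size.
delta_gap = math.sqrt(3.0) / math.pi
window = 2.0 * delta_gap

print(a_star, delta_gap, window)
```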

Proof of the Theorem
In the previous section we investigated a local bound on the number of operators in a 2d CFT. In this section we derive a better bound for the case $\delta \gg 1$. In particular we show that if $\Delta \gg 1$ then averaging $\rho(\Delta)$ over operators in the region $[\Delta - \delta, \Delta + \delta]$ with $\delta \sim \Delta^\alpha$ for some $\alpha > 0$ produces the fixed asymptotic identical to the one given by the crossing kernel $\rho_0(\Delta)$ with the controlled error (1.5). As mentioned in the introduction it follows from the theorem (1.8). We prove (1.8) in this section, which we repeat for convenience as (5.1). A few comments are in order. Note that by doing a naive inverse Laplace transform of the vacuum contribution, using the saddle point approximation, and integrating over $\Delta$ one would arrive at the correct estimate for $F(\Delta)$, namely (5.1). Using the saddle point approximation to make a statement about $\rho(\Delta)$ itself however is not correct. It would be also incorrect to use the saddle point approximation to compute further corrections to $F(\Delta)$, beyond (5.1).
Let us introduce the difference (5.2) between the Laplace transform of the physical density of states $\rho(\Delta)$ and the crossing kernel $\rho_0(\Delta)$.
Fig. 2: Integration contour in the complex temperature $z$-plane. We integrate the modular invariance equation (3.2) along the vertical segment $C_+$ to derive the bound on the integrated spectral density.
The main idea is to apply a linear functional to the modular invariance equation (3.2) that produces the theta-function $\theta(\Delta - \Delta')$ that we want plus terms which we can easily estimate. A convenient choice of the functional is (5.3), where the integration contour is the interval $C_+ = \{\mathrm{Re}\, z = \beta,\ -\Lambda < \mathrm{Im}\, z < \Lambda\}$ as indicated in fig. 2. The parameters $\Lambda$, $\beta$, $\Delta$ are so far arbitrary in (5.3). The polynomial in the numerator of (5.3) is chosen to be such that it vanishes at the ends of the interval $C_+$, which will be helpful in estimates below.
On the one hand we can estimate (5.3) using modular invariance. Inserting the definition of the Laplace transform and swapping the order of integrations we have Now the idea is to deform the contour C + in the last integral in (5.4) either to the left or to the right for ∆ ′ < ∆ or ∆ ′ > ∆ respectively in order to make the exponential factor e (∆−∆ ′ )z smaller. When we deform to the left we also pick up the residue at z = 0.
We have where G ± (ν) refer to the integrals over the arcs C ± , see fig. 2.
We can use (5.5) to rewrite the equation (5.4) as (5.6). In appendix A we show the estimate (5.7) (footnote 13). Therefore we can bound (5.6) as in (5.8), where we used the fact that $|\delta\rho(\Delta')| \le \rho(\Delta') + \rho_0(\Delta')$.
Footnote 13: The overall coefficient in this estimate is not optimal and can be improved, but it will be enough for our purposes.
In the formula above $\beta$ is an arbitrary parameter. We would like to choose it to optimize the bound. We will show below that in order to prove (1.8) the correct choice is to set $\beta = \pi\sqrt{c/(3\Delta)}$ (5.9). Let us emphasize that the bound (5.8) is valid for finite $\Delta$. In particular we can use the HKS bound to estimate the first term in the RHS of (5.8) and the local bound from the previous section to bound the second term. Below we investigate (5.8) in the large $\Delta$ limit.
To estimate the third integral in the RHS of (5.8) we use the asymptotic of (2.6). To estimate the second integral in the RHS of (5.8) we split it into three parts $I_1$, $I_2$, $I_3$ (5.11). We would like to show that all three terms are of $O\left(\Delta^{-3/4} e^{2\pi\sqrt{c\Delta/3}}\right)$ separately. For $I_1$ we have (5.12), where we used monotonicity of $(\Delta - \Delta')^{-2}$ in the first line and $\mathcal{L}_\rho(\beta) = O\left(e^{\pi^2 c/(3\beta)}\right)$ and (5.9) in the third line. In particular, (5.12) shows that we chose to split the integral as in (5.11) in order to produce the correct prefactor in (5.12). Similarly, $I_3$ is estimated to be of the same order. Finally, we need to estimate $I_2$. We will do so using a local bound from the previous section (5.14). We further split the integral $I_2$ into $i_1$, $i_2$, $i_3$. To estimate $i_{1,2,3}$ we split the integrals into small windows of $\Delta'$ in each of which we can apply (5.14), where we used (5.14) and (5.9). The integral $i_3$ is estimated in a similar fashion. This finishes the estimate of (5.11).
The last step is to estimate the first term in the RHS of (5.8), which we can estimate using the vacuum contribution in the dual channel. In the second line of that estimate we used monotonicity of $Z_H$ and therefore assumed that $\Delta_H > \frac{c}{12}$. In the third line we estimated $Z_H$ using the vacuum contribution in the dual channel. Choosing $\Lambda < 2\pi$ we see that this term is sub-leading. This finishes the proof of (5.1).

Virasoro Primaries
The analysis in previous sections can be readily generalized to the density of Virasoro primary operators. Let's consider $c > 1$ so that there are infinitely many such operators. In this case Virasoro characters are simply related to the Dedekind eta function and the partition function takes the form (6.1), see e.g. [28], where $\tau = i\beta/2\pi$, $d^{Vir}_n$ is the degeneracy of a Virasoro primary $\Delta_n$ and the sum goes over all primaries except the vacuum, $\Delta_n > 0$. Let's define the density of Virasoro primaries $\rho^{Vir}(\Delta)$. The crossing kernel $\rho^{Vir}_0(\Delta)$ for the vacuum is chosen so that it reproduces the vacuum contribution in the dual channel (6.4).

Local bounds on the number of Virasoro primaries
We can derive bounds analogous to (4.4), (4.8). Essentially the same argument gives the corresponding inequalities with $\rho \to \rho^{Vir}$ (footnote 15). The HKS bound for Virasoro primaries can also be derived in the same form, where $\Delta_H > \frac{c-1}{12}$ and we split the partition function into light and heavy contributions. The large $\Delta$ analysis is identical to section 4, with essentially the same results.
Namely we get $\delta_{gap}^2 = \frac{3}{\pi^2}$ with the choice (4.9). That is, the gap between Virasoro primaries at large scaling dimensions must be no larger than $\frac{2\sqrt{3}}{\pi} \approx 1.1$.
Footnote 15: Here, as in (4.4), it is implied that the crossing kernel $\rho^{Vir}_0$ is for all light operators entering $Z_L$. But again in the large $\Delta$ analysis below the vacuum contribution will be dominant.
Repeating the rest of the argument from section 4 we obtain the asymptotic of the microcanonical entropy for energy shells $\delta = O(1)$, where $s^{Vir}(\delta, \Delta)$ is again bounded as in fig. 1.

Cardy formula for Virasoro primaries
Modular invariance dictates the behavior at high temperatures. Then the tauberian theorem similar to (1.8) takes an analogous form for the integrated density of Virasoro primaries. Its proof is completely analogous to the proof of (1.8) given in section 5. From here we derive that the microcanonical entropy has the asymptotic (6.9), with the stated form of $s(\delta, \Delta)$ valid for any $0 < \alpha \le 1/2$.

Holographic CFTs
In this section we consider holographic 2d CFTs with a sparse spectrum [16] in the limit $\Delta \sim c \to \infty$. The HKS sparseness condition [16] states that $Z_L(\beta)$ is dominated by the vacuum state for $\beta > 2\pi$ and $c \to \infty$ in the sense of (7.1). We again start with (4.8) and consider the limit of large $c$ with $\epsilon$ fixed, where $\epsilon$ measures the energy above the threshold in units of $c$. In this limit we use the corresponding asymptotic of the vacuum crossing kernel and choose $\beta$ to optimize the first term in (4.8). As before we find that the second $Z_H$ term in (4.8) is suppressed if $\Lambda_\pm < 2\pi\sqrt{1 - \frac{1}{12\epsilon}}$. Therefore the bound (4.8) can be dominated by the first term only for $\epsilon > \frac{1}{12}$, i.e. for states with $\Delta > \frac{c}{6}$. In this case we drop the $Z_H$ terms, compute the first term in (4.8) by the saddle approximation and get (7.5). In (7.5) we tacitly assumed that the first term in (4.8) is dominated by the vacuum. For the RHS of (7.5) this relies on the sparseness condition and we give more detail in appendix C. In particular, this means that we cannot compute the precise value of $c_+$ because it depends on the bound (7.1). On the other hand, on the LHS of (7.5) we can simply drop operators above the vacuum since they give a positive contribution.
The conclusion is that we have the asymptotic of the microcanonical entropy of states in a window of width $\delta = O(1)$ with $\delta > \delta_{gap} = \frac{\sqrt{3}}{\pi}$. We can also consider large widths $\delta \sim c^\alpha$, $0 < \alpha < 1$. We estimate the integral over the shell by splitting it into $O(1)$ windows and applying the bound (7.5) to each term. Both the upper and lower bounds are dominated by the largest exponent $k = 2\delta - 1$. Therefore we have the asymptotic (7.9) for the microcanonical entropy, where $\delta \sim c^\alpha$, $0 < \alpha < 1$. For $0 < \alpha \le \frac{1}{2}$ only the first term in the expansion of the square root dominates the error. For $1/2 < \alpha < 1$ more terms in the expansion of the square root give a contribution. Essentially, the formula (7.9) states that the entropy is dominated by the states in an $O(1)$ window near the upper limit. In [16] it was derived that $S_\delta(\Delta) = 2\pi c\sqrt{\frac{\epsilon}{3}} + O(c^\alpha)$, $1/2 < \alpha < 1$. The formula (7.9) extends their result to all $0 < \alpha < 1$ and computes corrections to it.
It would be interesting to reproduce our result for the microcanonical entropy (7.9) from the direct bulk computation. The leading contribution to the on-shell action is insensitive to the ensemble choice, however the state of the quantum fields in the black hole background changes which should be taken into account when computing the corrections to the leading Cardy formula, see e.g. [29,30].
Note that the logarithmic correction to the microcanonical entropy (7.9) is completely universal. This feature of black holes in AdS was observed in [10] (see section 5 in that paper) and is due to the fact that there are no translational zero modes in AdS. The situation is drastically different from flat space, where logarithmic corrections to the black hole entropy are sensitive to the low energy spectrum of the theory.

Accessing Subleading Operators
One way to access the subleading operators in the dual channel is the following. Consider the modular condition written as (8.1), where we split the partition function into the contribution of light $\Delta' \le \Delta$ and heavy $\Delta' > \Delta$ operators. Intuitively, it is clear that the partition function is dominated by light (heavy) operators at low (high) temperatures. More precisely, we claim that at low temperatures modular invariance (8.1) can be written as (8.2), while at high temperatures it takes the form (8.3), with the error estimates (8.4). Equivalently, at low (high) temperatures heavy (light) operators are suppressed beyond the vacuum contribution in the RHS of (8.2). Their effect is then reflected in the density of "light" states $\Delta' < \Delta$ in the LHS of (8.2). This can be thought of as "nonperturbative corrections" to the Cardy formula from operators beyond the vacuum and is discussed in more detail below. Now let us derive (8.2) - (8.4). Consider for example $\beta \ge \pi\sqrt{c/(3\Delta)}$ and the first estimate in (8.4). We write $\delta\rho(\Delta) = \partial_\Delta \delta F(\Delta)$ and integrate by parts. We can estimate the result using the error term in the Cardy formula (1.8). For $\beta > \pi\sqrt{c/(3\Delta)}$ the saddle point in the last integral is outside of the integration range and therefore the integral is dominated close to the lower limit. As a result we get the first estimate in (8.4).
Similarly, the second estimate in (8.4) is obtained by integration by parts and using Cardy formula (1.8).
The formulae (8.2), (8.3) allow us to probe subleading operators in the dual channel.
In particular, one might hope to test (8.2) numerically for finite $\Delta$. We will do so in the 2d Ising model in the next section. Let's see what operators give contributions larger than the error term. Consider an operator with dimension $\Delta_*$ in the RHS of (8.2). Its contribution to the partition function in the dual channel takes the form $e^{\frac{\pi^2 c}{3\beta} - \frac{4\pi^2}{\beta}\Delta_*}$. The condition that it is greater than the error term is (8.7). In particular, (8.7) implies that we have to scale $\beta \sim \Delta^{-1/2}$ if we would like to access a finite number of operators in the dual channel in the limit $\Delta \to \infty$.
To summarize, the partition function (8.2) with the UV cut-off $\Delta$ and temperature $\beta > \pi\sqrt{c/(3\Delta)}$ allows us to systematically probe the operators in the dual channel satisfying (8.7). We will test (8.2) numerically in the 2d Ising model in section 9.

Example: 2d Ising
In this section we check our results in the 2d Ising model. In particular, we will see that the error estimates are optimal. The partition function is given by [31]
$$Z(\beta) = \frac{1}{2}\left(\left|\frac{\theta_2(\tau)}{\eta(\tau)}\right| + \left|\frac{\theta_3(\tau)}{\eta(\tau)}\right| + \left|\frac{\theta_4(\tau)}{\eta(\tau)}\right|\right),$$
where we restrict to zero angular potential $q = e^{-\beta}$ as before and the central charge is $c = \frac{1}{2}$. Expanding the partition function in $q$ we can find degeneracies of operators. In fig. 3 we plot the leading order and the error term for the moment $F_\rho(\Delta) = \int_0^\Delta d\Delta'\, \rho(\Delta')$ and find perfect agreement with (5.1). In particular, it is clear from fig. 3 that the error estimate is, in fact, optimal.
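The expansion in $q$ described above can be sketched in a few lines. The code assumes the standard free-fermion product forms of the Ising characters, $\chi_0 \pm \chi_{1/2} = q^{-1/48}\prod_{n\ge1}(1 \pm q^{n-1/2})$ and $\chi_{1/16} = q^{1/24}\prod_{n\ge1}(1+q^n)$, and the diagonal combination $Z = \chi_0^2 + \chi_{1/2}^2 + \chi_{1/16}^2$ at zero angular potential. The leading estimate used for comparison, $\frac{1}{2\pi}(3/(c\Delta))^{1/4} e^{2\pi\sqrt{c\Delta/3}}$, is obtained by integrating the large-$\Delta$ form of the crossing kernel. All function names are illustrative.

```python
import math

def poly_mul(a, b, n):
    """Product of two coefficient lists, truncated to the first n coefficients."""
    out = [0] * n
    for i, ai in enumerate(a):
        if ai:
            for j in range(n - i):
                out[i + j] += ai * b[j]
    return out

def ising_degeneracies(n_max):
    """Return (d_int, d_sigma) for k = 0..n_max.

    d_int[k]   counts states with Delta = k,
    d_sigma[k] counts states with Delta = 1/8 + k.
    """
    n = 2 * n_max + 1                 # work in x = q^(1/2), powers x^0..x^(2 n_max)
    A = [1] + [0] * (n - 1)           # prod over odd m of (1 + x^m)
    for m in range(1, n, 2):
        for i in range(n - 1, m - 1, -1):
            A[i] += A[i - m]
    A2 = poly_mul(A, A, n)
    # chi_0^2 + chi_{1/2}^2 = (A(x)^2 + A(-x)^2)/2 keeps the even powers of x.
    d_int = [A2[2 * k] for k in range(n_max + 1)]
    C = [1] + [0] * n_max             # prod over m >= 1 of (1 + q^m)
    for m in range(1, n_max + 1):
        for i in range(n_max, m - 1, -1):
            C[i] += C[i - m]
    d_sigma = poly_mul(C, C, n_max + 1)
    return d_int, d_sigma

def count_below(delta, d_int, d_sigma):
    """F(Delta): number of operators with dimension <= Delta."""
    total = sum(d for k, d in enumerate(d_int) if k <= delta)
    total += sum(d for k, d in enumerate(d_sigma) if k + 0.125 <= delta)
    return total

def cardy_leading(delta, c=0.5):
    """Integrated Cardy growth (leading term only)."""
    return ((3.0 / (c * delta)) ** 0.25
            * math.exp(2.0 * math.pi * math.sqrt(c * delta / 3.0))
            / (2.0 * math.pi))

d_int, d_sigma = ising_degeneracies(60)
for delta in (10.5, 20.5, 40.5, 60.0):
    print(delta, count_below(delta, d_int, d_sigma) / cardy_leading(delta))
```

The printed ratio is not expected to converge smoothly: the staircase structure of $F(\Delta)$ makes it oscillate around 1, with an amplitude that shrinks only slowly as $\Delta$ grows.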

σ operator
Now let's see how the effect of the first operator above the vacuum, $\Delta_\sigma = \frac{1}{8}$, can be seen from the formula (8.2). According to (8.7), for $\sigma$ to give a contribution bigger than the error term and for $\Delta_\epsilon = 1$ to be smaller than the error term, $\beta$ must be chosen in a finite window. Below we plot the contribution of the $\sigma$-operator to (9.5) and find perfect agreement. One can also plot the error term similarly to fig. 3.

Microcanonical Entropy
As discussed in the main text the microcanonical entropy $S_\delta(\Delta)$ takes the universal form (1.4) at high energies $\Delta \gg 1$. Note that, strictly speaking, the bound (1.5) was derived in the large $\Delta$ limit and here we plot it at finite $\Delta$. We present the finite $\Delta$ version of the bound (4.8) in fig. 5 as well.
Since for the 2d Ising model the vacuum is the only operator with $\Delta < \frac{c}{12}$ we use only the vacuum contribution in $Z_L$ that enters the HKS bound. We then use the HKS bound to estimate $Z_H$.

Example: Monster CFT
Let us apply our bounds for the microcanonical entropy to monster CFT [18,19].
Recall that it describes a chiral CFT with c = 24 whose partition function takes the form $Z(q) = J(q) = \frac{1}{q} + 196884\,q + 21493760\,q^2 + \ldots$ (10.1). In principle, nothing prevents us from deriving (6.12) for chiral CFTs. We do not do this here. Instead, at zero angular potential and without imposing invariance under $\tau \to \tau + 1$, we can interpret (10.1) as the partition function of a non-chiral CFT with c = 12 that satisfies (3.2). Therefore we can apply the asymptotic (1.6) to it directly. In fig. 6 we see that for finite δ the difference between the actual microcanonical entropy and the large ∆ expansion satisfies the expected bounds. We can also probe a subleading universal correction by taking $\delta = \Delta^{\epsilon}$. The result is presented in fig. 7.
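The comparison described above can be reproduced in outline with a few lines of code. The sketch below is an illustration under stated assumptions rather than the computation behind the figures: it obtains the coefficients of J(q) from the classical identity $j = E_4^3/\Delta$ with $J = j - 744$, reads them as degeneracies at integer dimension in the non-chiral c = 12 interpretation, and compares the microcanonical entropy only with the leading Cardy term $2\pi\sqrt{c\Delta/3}$, not with the full expansion (1.6). All function names are hypothetical.

```python
import math


def sigma3(n):
    """Sum of the cubes of the divisors of n."""
    return sum(d ** 3 for d in range(1, n + 1) if n % d == 0)


def series_mul(a, b, n_terms):
    """Product of two power series truncated to n_terms coefficients."""
    out = [0] * n_terms
    for i, ai in enumerate(a[:n_terms]):
        for j, bj in enumerate(b[:n_terms - i]):
            out[i + j] += ai * bj
    return out


def monster_degeneracies(max_dim):
    """Return rho with rho[D] = coefficient of q^(D-1) in J(q), for D = 0..max_dim.

    In the non-chiral c = 12 reading, Z = sum_D rho[D] exp(-beta (D - 1)).
    """
    n_terms = max_dim + 1
    e4 = [1] + [240 * sigma3(n) for n in range(1, n_terms)]
    e4_cubed = series_mul(series_mul(e4, e4, n_terms), e4, n_terms)
    # Build prod_n (1 - q^n)^24 by repeated in-place multiplication by (1 - q^n).
    prod = [1] + [0] * (n_terms - 1)
    for n in range(1, n_terms):
        for _ in range(24):
            for k in range(n_terms - 1, n - 1, -1):
                prod[k] -= prod[k - n]
    # Series division: q * j(q) = E4^3 / prod, using prod[0] = 1.
    rho = []
    for k in range(n_terms):
        rho.append(e4_cubed[k] - sum(rho[i] * prod[k - i] for i in range(k)))
    if n_terms > 1:
        rho[1] -= 744  # J = j - 744 removes the constant term
    return rho


def microcanonical_entropy(Delta, delta, rho):
    """S_delta(Delta) = log of the number of states with |Delta' - Delta| < delta."""
    total = sum(r for D, r in enumerate(rho) if abs(D - Delta) < delta)
    return math.log(total)


def cardy_entropy(Delta, c=12):
    """Leading Cardy term 2 pi sqrt(c Delta / 3)."""
    return 2 * math.pi * math.sqrt(c * Delta / 3)


if __name__ == "__main__":
    rho = monster_degeneracies(60)
    for Delta in (10, 30, 50):
        print(Delta, microcanonical_entropy(Delta, 1, rho), cardy_entropy(Delta))
```

Integer arithmetic keeps the coefficients exact, so the only approximation in the comparison is the use of the leading Cardy term in place of the full large ∆ expansion.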

Discussion
In this paper we studied modular invariance of unitary 2d partition functions. Our main tool was a set of functionals applied to the modularity condition that naturally appear in tauberian theory [12]. The corresponding tauberian theorems are very general and not limited to the discussion of partition functions in 2d CFTs. In particular, they are applicable to the higher-dimensional discussions of modular invariance [32], warped 2d CFTs [33,34], as well as to the thermal two-point function [35] and the vacuum four-point functions [24,27,36-39] (all of which can be studied using modular bootstrap tools in 2d). It would be especially interesting to see if the methods used in this paper could shed light on the Eigenstate Thermalization Hypothesis [11,40-42] either in 2d [35] or in higher d using the approach of [15].
We also analyzed our bounds in the large c theories with gravity duals [16,19,43,44] and found (1.7) that the HKS result [16] for the microcanonical entropy can be rigorously extended to include the logarithmic correction with a bounded error of O(1). In this case the microcanonical entropy counts black hole microstates in AdS. In contrast to the situation in flat space, see e.g. [45,46], the logarithmic correction to the black hole entropy in this case is universal and given by (7.9), as explained by Sen [10]. Our results for the logarithmic correction also agree with the old results of Carlip [17] (after averaging!) and constitute a rigorous derivation thereof.
It is important to emphasize that the techniques used in this paper require positivity of the spectral density ρ(∆). None of what we derived here holds if ρ(∆) is not positive-definite. Even if the asymptotic of the partition function is fixed, one might imagine that many different spectral densities lead to the same asymptotic due to possible cancellations for a non-positive density. It is therefore not clear how to make rigorous the results of [47], which involve (not necessarily positive) three-point functions.
Another important feature of our analysis is that the bounds that we obtained are in principle applicable at finite ∆. To derive them we used the so-called HKS bound [16] which allows one to estimate the contribution of heavy operators to the partition function. We have not fully explored these bounds and it would be interesting to do so, e.g. numerically. Recently a bound analogous to HKS was derived in the context of the four-point functions [48]. Therefore, it should be possible to repeat the large ∆ conformal bootstrap analysis of [15] at finite ∆.
One obvious extension of our analysis is to allow for non-zero angular potential. Since the combined spectral density ρ(∆, J) is positive, we should be able to derive the corresponding asymptotic results. The corresponding tauberian problem, however, becomes two-dimensional. It will be interesting to extend our results to this case.
Most naturally our work should be thought of as a part of the modular bootstrap program [28,49-51] that systematically studies modular invariance by applying the most general set of linear functionals, both numerically and analytically. The functionals that appear in our work are particularly handy in deriving high-energy bounds. They are optimal in the sense that they give optimal scaling of error terms with ∆ in the limit ∆ → ∞, but not necessarily with optimal coefficients. In particular, it might be possible to improve the bounds (1.4), (1.5) on s(δ, ∆). It would be very interesting to find functionals that optimize these bounds in the spirit of . We leave these tasks for the future.

Appendix B. Power Corrections
We can consider multiple integrals of the density of states [15]. The coefficients $c_i$ can be computed explicitly using the crossing kernel (2.6). Again, all the spectral density moments $F^{m}_{\rho}(\Delta)$ are controlled by the unit operator in the dual channel. The intuition behind (B.1) is that each integration enhances smooth power-like terms while keeping intact oscillating non-universal terms.
The derivation of (B.2) is analogous to the one in section 5. We consider (5. where we used the sparseness condition (7.1).