Quantum corrections to the BTZ black hole extremality bound from the conformal bootstrap

Any unitary compact two-dimensional CFT with $c>1$ and no symmetries beyond Virasoro has a parametrically large density of primary states at large spin for $\bar{h}>\bar{h}_\text{extr}\sim \frac{c-1}{24}$, of a universal form determined by modular invariance. By including the contribution of light primary operators and multi-twist composites constructed from them in the modular bootstrap, we find that $\bar{h}_\text{extr}$ receives corrections in a large spin expansion, which we compute at finite $c$. The analysis uses a formulation of the modular S-transform as a Fourier transform acting on the density of primary states. For theories with gravitational duals, $\bar{h}_\text{extr}$ is interpreted as the extremality bound of rotating BTZ black holes, receiving quantum corrections which we compute at one loop by prohibiting naked singularities in the quantum-corrected geometry. This gravity result is reproduced by modular bootstrap in a semiclassical $c\to\infty$ limit.


Introduction
The bootstrap program applied to two-dimensional conformal field theory has been remarkably successful for the classification and solution of rational theories [1], but far less progress has been made towards understanding the space of theories more generally. Irrational theories, aside from presumably being the generic case, are of particular interest as duals to quantum gravity in $\mathrm{AdS}_3$, and are more similar to CFTs in higher dimensions, for which the bootstrap has enjoyed enormous recent progress [2]. Exact analytic solution or classification of such theories is almost certainly out of reach (with notable exceptions), so our aims must be more modest. Despite these limitations, the constraints of crossing symmetry and modular invariance impose a great deal of rigidity on the spectrum and couplings, even for irrational theories. A wealth of information about infinitely many primary operators can be deduced, practically and not just in principle, from very little data about the light spectrum.
In this paper, we will explore one aspect of this, by combining modular invariance of the partition function with recent results concerning the existence and spectrum of composite operators. The results are most universal, and most interesting, for the region of operator dimensions we call the 'near-extremal' spectrum. For theories with $\mathrm{AdS}_3$ gravitational duals, this corresponds to rapidly rotating BTZ black holes, close to their extremality bound. This bound marks the edge of a range of energy and spin with a parametrically large density of states, as determined by the Bekenstein-Hawking formula for black hole entropy: a part of the spectrum that can be treated as effectively continuous for most purposes. Even in generic theories at small central charge, such a continuum of states emerges at large spin, along with a characteristic 'extremality bound' marking its edge. Our main results determine the spin dependence of this bound, which is set by the dimensions of light operators, both for generic theories in a large spin expansion and for holographic theories at large central charge.
From these results, it is natural to propose a picture of the spectrum of an irrational CFT, illustrated in figure 1, consisting of a small number of 'fundamental' operators (single trace in holographic theories), multi-twist composites built from them, and an exponentially large density above an extremality bound. The properties of each of these parts of the spectrum are not independent, but are bound together through the bootstrap.

Figure 1: A conjectural cartoon of the spectrum of primaries of an irrational CFT. This paper addresses two features of the spectrum, namely multi-twist operators and the extremality bound, and their relation through modular invariance. The points represent composite multi-twist operators, which grow polynomially in number at large spin. The region above the red dashed line (the extremality bound) represents the 'continuum' of operators, with entropy growing as $\sqrt{c\ell}$ at large spin $\ell$ or central charge $c$. The twist of light operators determines the shape of the extremality bound. Generically, we can only trust this picture in an asymptotic expansion in spin, so the region marked '?', without perturbative control, is large. In a theory with a weakly coupled $\mathrm{AdS}_3$ dual, perturbation theory is instead controlled by large central charge, its validity is extended, and the region '?' is small compared to $c$.

Universal results for unitary compact CFTs
The first part of the paper concerns the spectrum of a generic unitary, compact CFT (see footnote 2 for our definition of compactness). We build up a picture of the complete spectrum based on minimal information about a few light states, leveraged with the strong constraints of conformal symmetry.
For this, we take a new approach to the analytic modular bootstrap, formulating modular invariance directly as a condition on the density of states and thereafter dispensing with the partition function. Our strategy will be to choose an appropriate collection of known states of the theory, construct a partition function from them, take a modular S-transform, and express the result as a density of states in the new S-transformed channel.
In section 2, we show that, in appropriate variables, this acts simply as a Fourier transform on the density of primary states (following [7,8]). With a judicious choice of the states we start with, there will be a regime of operator dimensions in which the resulting transformed density is indicative of the actual CFT spectrum, because corrections from including more states in the input are parametrically suppressed in that regime.
The basic result of this approach, taking the modular S-transform of the vacuum only, is the venerable Cardy formula (2.15) [9], giving the density of states at large energy. The same argument (with an additional assumption) gives a formula (2.16) for the asymptotic density of states at large spin, in the limit $h \to \infty$ with $\bar h$ held fixed. Including a primary operator besides the vacuum leads to a correction which is suppressed in this limit, as long as the operator in question has $h > 0$. The required assumption is therefore a twist gap, meaning in particular that the theory contains no conserved currents (operators with $h = 0$) beyond those associated with local conformal invariance.
The most striking aspect of the large spin asymptotic formula is the 'extremality bound' alluded to above. This is equivalent to the lower bound on the twist gap derived in [5,10] by a closely related method. Fixing $\bar h > \frac{c-1}{24}$, the formula implies a large microcanonical entropy growing with spin $\ell = h - \bar h$ as $S \sim 2\pi\sqrt{\frac{(c-1)\ell}{6}}$, but for $\bar h < \frac{c-1}{24}$ the density of states grows more slowly, if there are any states at all. If we take $\bar h$ in the 'near-extremal' regime, lying slightly above the bound, we find that the density of states at large spin has a square-root edge, vanishing as $\sqrt{\bar h - \frac{c-1}{24}}$ as $\bar h$ approaches the bound from above.

Footnote 2: A compact theory is defined to have a discrete spectrum, including a ground state invariant under the $sl(2) \oplus sl(2)$ global conformal symmetries.

In section 4, we discuss the corrections to this large spin Cardy formula arising from a light operator $\mathcal{O}$ in the theory (initially, the operator of smallest positive twist). We could simply include the operator $\mathcal{O}$ itself (and descendants) along with the identity before performing the modular transformation on the spectrum, but we can do better: there is also a tower of infinitely many composite operators built from $\mathcal{O}$, discussed in section 3. We infer the existence and properties of these 'multi-trace' operators at large spin from recent results on 'double-twist' operators, which used a bootstrap analysis of four-point function crossing [11,12]. While the precise spectrum of multi-twist operators depends on details of the theory, at large spin the main features are determined universally, by only the central charge and the dimensions of $\mathcal{O}$.
The modular transform of this universal piece of the multi-twist spectrum has its most important effect in the 'near-extremal' regime discussed above, of large spin with $\bar h$ close to $\frac{c-1}{24}$. There it gives an asymptotic expansion for the density of states, which can be summed into a shift of the 'extremality bound' $\bar h_\text{extr}$ with a particular spin dependence. For large spin $\ell \gg 1$ and $\bar h$ close to $\frac{c-1}{24}$, this shift is given by the result (1.2). The parameters in this formula are defined in equations (2.1) and (2.2): $P$ is roughly $\sqrt{\ell}$, $\alpha$ is determined by the left-moving dimension $h_\mathcal{O}$ of the low-twist operator in question, and $b$ is determined by the central charge. We generalise the result to include the contribution of multiple independent light operators, as well as fermions.

Theories with $\mathrm{AdS}_3$ gravitational duals
A particularly important class of irrational CFTs are those with a dual description in terms of gravity in asymptotically $\mathrm{AdS}_3$ spacetimes. The formulas for the density of states in the previous section have an interpretation as the Bekenstein-Hawking entropy (with loop corrections) of BTZ black holes [13,14]. In particular, the bound $\bar h \geq \frac{c-1}{24}$ corresponds to the extremality bound $M \geq J$, for which there is a classical geometry without naked singularities.
To make contact with our large spin results, the relevant comparison is to account for the effect of light bulk matter fields on the black hole extremality bound. In section 5, we compute the one-loop expectation value of the stress tensor of a scalar field in the Hartle-Hawking state on the BTZ geometry. Treating this as a source for the linearised Einstein equations, the scalar provides a quantum correction to the geometry, and hence to the 'cosmic censorship' criterion that singularities must be hidden behind a horizon. The result is the modified extremality bound (1.3). Here, $\Delta$ is the conformal dimension of the operator dual to the scalar, and the ratio $\ell/c$, which sets the horizon radius of the extremal black hole in AdS units, is taken to be of order one, so that $\ell$ is of order $c$.
In section 6, we derive the same result (generalised to operators with spin) from a bootstrap argument, applying modular invariance to a spectrum with a non-interacting Bose gas of multi-trace operators, in the limit $c \to \infty$ with $\ell/c$ fixed. The Bose gas includes a larger collection of operators than we used to obtain (1.2) for a generic theory. There, we included only the leading order in large spin, since this piece does not receive corrections from interactions. This difference accounts for the additional terms in the sum over $n$ appearing in (1.3), but these extra terms are correct only in an approximation where the anomalous dimensions of multi-trace operators can be neglected.

Organisation of the paper
In section 2, we review and expand upon the modular S-transform as a Fourier transform acting on the density of states. In section 2.3, we derive the Cardy formula and its large spin version from this point of view. In section 3, we discuss the spectrum of multi-twist operators. We then apply this to the modular bootstrap in section 4, finding the result (1.2).
The last two sections, concerned with the semiclassical limit, can be read independently. Section 5 describes the calculation of one-loop gravitational corrections to the BTZ extremality bound. Finally, we reproduce this result from modular invariance in the semiclassical limit in section 6.
In appendix A, we give some additional technical discussion of the Fourier transform. In appendix B, we count multi-particle states. Appendix C reviews the Wald formalism for conserved quantities, relevant for section 5.
Acknowledgements I would like to thank Nathan Benjamin, Scott Collier, Don Marolf and Eric Perlmutter for helpful discussions and comments, as well as the participants of the 'Chaos and Order' program at the KITP, where this work was initiated, for stimulating discussions. I am grateful to be funded by a Len DeBenedictis Postdoctoral Fellowship, and for additional support received from the University of California. This research was supported in part by the National Science Foundation under Grant No. NSF PHY-1748958.

The modular S-transform
A key tool in our analysis is a formulation of modular invariance of c > 1 theories directly as a property of the spectrum, with no direct reference to the partition function itself. This involves casting the τ → −1/τ S-transform of the torus into the form of an operation on the density of states, which we review and expand upon in this section.

The modular S-matrix
The traditional parameters in CFT, namely the central charge $c$ and dimensions $h, \bar h$, are not the most natural for describing Virasoro representation theory, so we will find it convenient to introduce different variables. In place of the central charge, we use $Q$ or $b$, defined by
$$c = 1 + 6Q^2, \qquad Q = b + b^{-1}, \tag{2.1}$$
where we choose $0 < b < 1$ when $c > 25$, so in particular $c \to \infty$ corresponds to $b \to 0$. For operator dimensions, we use $\alpha$ or $P$,
$$h = \frac{Q^2}{4} + P^2 = \alpha(Q - \alpha), \qquad P = i\left(\frac{Q}{2} - \alpha\right), \tag{2.2}$$
along with similar definitions for $\bar P, \bar\alpha$. Note that this splits the spectrum into two ranges: $h \geq \frac{c-1}{24}$, corresponding to real $P$, and $h < \frac{c-1}{24}$, corresponding to imaginary $P$, in which case we will usually use $\alpha \in [0, \frac{Q}{2})$. These were referred to respectively as 'continuum' and 'discrete' ranges of operator dimensions in [11]. We will sometimes refer to $P$ as a 'momentum', a terminology from the linear dilaton and related theories, where it is a momentum in target space.

Now, the operator content of the theory is encoded by a density of primary states $\rho(P,\bar P)$, which is a sum of delta functions supported at the locations of primary operators (in a theory with a discrete spectrum). In terms of this density, the partition function is
$$Z(\tau,\bar\tau) = \sum_{\text{primaries}} \chi_h(\tau)\,\bar\chi_{\bar h}(\bar\tau) = \int_{-\infty}^{\infty} dP \int_{-\infty}^{\infty} d\bar P\; \rho(P,\bar P)\, \chi_P(\tau)\, \bar\chi_{\bar P}(\bar\tau), \tag{2.3}$$
where the latter equation defines $\rho$ as being even in $P$ and $\bar P$. The characters
$$\chi_P(\tau) = \frac{q^{P^2}}{\eta(\tau)}, \qquad q = e^{2\pi i \tau}, \tag{2.4}$$
encode the contribution of all descendants of a nondegenerate Virasoro primary. In particular, this notation does not take into account the special case of the degenerate vacuum; to compensate, the density of states includes negative delta functions to subtract the null descendants.
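As a quick sanity check on these variables, the sketch below converts between $(c, b, Q)$ and $(h, P, \alpha)$. It assumes the conventions $c = 1 + 6Q^2$, $Q = b + b^{-1}$ with $0 < b < 1$, and $h = Q^2/4 + P^2 = \alpha(Q - \alpha)$. These reproduce the statement that $\frac{c-1}{24} = Q^2/4$ separates real from imaginary $P$. The function names are our own illustrative choices.

```python
import cmath
import math


def b_and_Q(c):
    """Return (b, Q) for c > 25, assuming c = 1 + 6 Q^2 and Q = b + 1/b with 0 < b < 1."""
    if c <= 25:
        raise ValueError("real 0 < b < 1 requires c > 25")
    Q = math.sqrt((c - 1) / 6)
    # b is the smaller root of b^2 - Q b + 1 = 0.
    b = (Q - math.sqrt(Q * Q - 4)) / 2
    return b, Q


def momentum(h, Q):
    """P with h = Q^2/4 + P^2: real for h >= (c-1)/24, positive imaginary below it."""
    return cmath.sqrt(h - Q * Q / 4)


def alpha_of(h, Q):
    """alpha in [0, Q/2] solving h = alpha (Q - alpha); only defined for h <= Q^2/4."""
    return Q / 2 - math.sqrt(Q * Q / 4 - h)


if __name__ == "__main__":
    b, Q = b_and_Q(37.0)
    h = 0.3  # a light operator, below the threshold (c-1)/24 = 1.5
    print(b, Q, alpha_of(h, Q), momentum(h, Q))
```

For a light operator the computed momentum is purely imaginary and equals $i(Q/2 - \alpha)$, which is the relation used throughout the rest of this section.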
The vacuum contribution to $\rho(P,\bar P)$ is $\rho_1(P)\rho_1(\bar P)$, where
$$\rho_1(P) = \frac{1}{2}\left[\delta\!\left(P - \tfrac{iQ}{2}\right) + \delta\!\left(P + \tfrac{iQ}{2}\right) - \delta\!\left(P - \tfrac{i(b^{-1}-b)}{2}\right) - \delta\!\left(P + \tfrac{i(b^{-1}-b)}{2}\right)\right], \tag{2.5}$$
with the negative terms subtracting the null descendant of the vacuum at level one. Our results will be based on modular invariance of the partition function $Z$, specifically invariance under the S-transform:
$$Z(-1/\tau, -1/\bar\tau) = Z(\tau, \bar\tau).$$
We will study this by asking the following question: given a particular set of primary states of given energy and angular momentum, if we construct a corresponding partition function, make a modular transformation, and decompose the result into Virasoro primaries in the 'dual channel', what is the resulting spectrum? This procedure is enacted directly as a linear operator acting on the density $\rho$: the 'modular S-matrix' $S$, an extension of the object familiar from rational CFTs, for which $S$ is a finite-dimensional matrix. The S-matrix we will derive was written down in [7] (in the context of open/closed duality for boundary CFT on the annulus), and applied in [8] to construct candidate modular invariant spectra.
To find the S-matrix, begin with an alternative definition of the kernel $S$, acting as a modular transform on the characters:
$$\chi_P(-1/\tau) = \int_{-\infty}^{\infty} dP'\, S_{PP'}\, \chi_{P'}(\tau).$$
We can decompose a partition function $Z$ into characters in these two different channels, with spectrum $\rho$ in the 'original' channel, and $\tilde\rho$ in the 'dual' channel. Suppressing dependence on barred variables for now, we have
$$\begin{aligned} \int dP'\, \tilde\rho(P')\, \chi_{P'}(\tau) &= Z(-1/\tau) = \int dP\, \rho(P)\, \chi_P(-1/\tau) \\ &= \int dP\, \rho(P) \int dP'\, S_{PP'}\, \chi_{P'}(\tau) \\ &= \int dP' \left[\int dP\, S_{PP'}\, \rho(P)\right] \chi_{P'}(\tau), \end{aligned}$$
where we have used the S-matrix to transform the characters, before exchanging the order of integration. We can now equate the first and final lines and drop the $P'$ integral to find that the modular transform acts as an integral transform with kernel $S$, as desired (now restoring the barred sector):
$$\tilde\rho(P',\bar P') = \int_{-\infty}^{\infty} dP\, d\bar P\; S_{PP'}\, S_{\bar P \bar P'}\, \rho(P,\bar P). \tag{2.12}$$
We now need only compute the kernel. In terms of momentum $P$, the characters (2.4) are just Gaussians, so we are looking for a transform that maps Gaussians to Gaussians with inverse width. This is nothing but the Fourier transform, written with a slightly unfamiliar normalisation, having the following kernel:
$$S_{PP'} = \sqrt{2}\, e^{-4\pi i P P'}. \tag{2.13}$$
Since we are always taking $\rho$ to be even and real, we get the same answer using $\sqrt{2}\, e^{4\pi i P P'}$, which is the kernel for the inverse transform. This means that, on the space of functions we are interested in, the Fourier transform squares to unity, as expected for the modular S-transform. We could equivalently use a cosine transform, with kernel $\sqrt{2}\cos(4\pi P P')$. An important special case is the transform acting on the vacuum density (2.5):
$$\tilde\rho_1(P) = 2\sqrt{2}\, \sinh(2\pi b P)\, \sinh(2\pi b^{-1} P). \tag{2.14}$$
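The Fourier-transform form of the kernel can be checked numerically on the Gaussian characters. The sketch below takes $\tau = i\beta$ and uses the cosine kernel $\sqrt{2}\cos(4\pi P P')$ together with the standard identity $\eta(i/\beta) = \sqrt{\beta}\,\eta(i\beta)$. With these, the S-transform of a character reduces to $e^{-2\pi P^2/\beta} = \sqrt{\beta}\int dP'\,\sqrt{2}\cos(4\pi P P')\,e^{-2\pi\beta P'^2}$, and the $\eta$ factors never need to be computed.

It then applies the same kernel to the vacuum density and compares the result with $2\sqrt{2}\sinh(2\pi b P)\sinh(2\pi P/b)$. For this step we assume, consistent with evenness and integration over the whole real line, that the vacuum density has weight $\tfrac12$ at $P = \pm iQ/2$ and weight $-\tfrac12$ at $P = \pm i(b^{-1}-b)/2$. The function names and quadrature parameters are illustrative.

```python
import cmath
import math


def s_kernel(P, Pp):
    """Cosine form of the modular S kernel, sqrt(2) cos(4 pi P P'); complex momenta allowed."""
    return math.sqrt(2) * cmath.cos(4 * math.pi * P * Pp)


def transformed_gaussian(P, beta, cutoff=10.0, n=4001):
    """sqrt(beta) * int dP' S(P, P') exp(-2 pi beta P'^2), trapezoid rule on [-cutoff, cutoff]."""
    step = 2 * cutoff / (n - 1)
    total = 0.0
    for k in range(n):
        x = -cutoff + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * s_kernel(P, x).real * math.exp(-2 * math.pi * beta * x * x)
    return math.sqrt(beta) * total * step


def vacuum_transform(P, b):
    """Apply the kernel to the assumed vacuum density: four delta functions at imaginary momenta."""
    Q = b + 1 / b
    deltas = [(0.5j * Q, 0.5), (-0.5j * Q, 0.5),
              (0.5j * (1 / b - b), -0.5), (-0.5j * (1 / b - b), -0.5)]
    # cos of an imaginary argument is a cosh, so each delta gives exponential growth in P.
    return sum(w * s_kernel(P0, P) for P0, w in deltas).real


def vacuum_closed_form(P, b):
    return 2 * math.sqrt(2) * math.sinh(2 * math.pi * b * P) * math.sinh(2 * math.pi * P / b)


if __name__ == "__main__":
    for P in (0.0, 0.3, 0.7):
        print(P, transformed_gaussian(P, 0.5), math.exp(-2 * math.pi * P ** 2 / 0.5))
```

The trapezoid rule is used because it is extremely accurate for smooth, rapidly decaying integrands, so no external quadrature library is needed.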

Mathematics of the S-transform
Some features of the density of states $\rho$, in particular delta functions supported at imaginary momentum from states with $h < \frac{c-1}{24}$, do not appear in conventional discussions of distributions and the Fourier transform. We must allow $\rho$ to live in a more general space of distributions than those commonly considered.
Distributions are defined as duals to some space of well-behaved test functions, that is, as continuous linear functionals: a distribution $\rho$ maps a test function $\psi$ to a number $\langle \rho, \psi \rangle$, given by $\int dP\, \rho(P)\psi(P)$ when $\rho$ is an ordinary function. The Fourier transform is usually defined on the space of tempered distributions, dual to smooth and rapidly decaying (Schwartz) test functions (see [15], for example). However, this does not encompass delta functions at imaginary points (a Schwartz function need not extend to complex arguments), nor exponentially growing densities like (2.14) (a Schwartz function $\psi$ may not decay rapidly enough for $\int dP\, \psi(P)e^{\lambda P}$ to converge). To enlarge our space of distributions sufficiently, we must choose a more restrictive space of test functions, which are entire analytic and decay faster than any exponential. For us, this space need only be large enough to contain the characters as functions of $P$ for any fixed $\tau$ (that is, Gaussians).
For most of our purposes such mathematical details will not worry us, and it will be sufficient to manipulate the distributions naïvely, but with the confidence that there is a rigorous theory underlying our results. Where more care is required, we will refer to appendix A, which contains further mathematical discussion.

Cardy formulas
The modular S-matrix (2.13) determines how a single operator of dimension $h'$ contributes to the density of states at dimension $h$ after performing a modular S-transform. Fixing $h'$ and taking $h$ to be large, the S-matrix oscillates if $h' \geq \frac{c-1}{24}$ (real $P'$), but grows exponentially as $e^{2\pi P(Q - 2\alpha')}$ for input operator dimension $h' < \frac{c-1}{24}$ (imaginary $P'$, real $\alpha' < \frac{Q}{2}$). The spectrum at large $P$ is therefore determined predominantly, after S-transform, by the operators of smallest $h'$.
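To see concretely why the smallest input dimension wins, the sketch below evaluates the size of the kernel $\sqrt{2}\cos(4\pi P P')$ for an input operator with imaginary $P' = i(Q/2 - \alpha')$. It compares this with the value at $\alpha' = 0$. At large $P$ the ratio approaches $e^{-4\pi\alpha' P}$, so every unit of $\alpha'$ costs an exponential factor in $P$. The sample values of $Q$, $\alpha'$ and $P$ are arbitrary, and the comparison ignores the subtracted null-state term of the true vacuum density.

```python
import cmath
import math


def kernel_size(P, alpha_in, Q):
    """|sqrt(2) cos(4 pi P P')| for an input operator with P' = i(Q/2 - alpha_in)."""
    P_in = 1j * (Q / 2 - alpha_in)
    return abs(math.sqrt(2) * cmath.cos(4 * math.pi * P_in * P))


def suppression(P, alpha_in, Q):
    """Contribution relative to alpha_in = 0; approaches exp(-4 pi alpha_in P) at large P."""
    return kernel_size(P, alpha_in, Q) / kernel_size(P, 0.0, Q)


if __name__ == "__main__":
    Q = 2.5
    for P in (1.0, 3.0, 5.0):
        print(P, suppression(P, 0.3, Q), math.exp(-4 * math.pi * 0.3 * P))
```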
For a unitary compact theory, this means that the density of states at large energy is dominated by the identity in the S-transform decomposition. For all states besides the vacuum, $\alpha + \bar\alpha$ is bounded below by a positive number, so their contribution is exponentially suppressed at large $P, \bar P$. The spectrum at large energy therefore approaches the S-transform of the vacuum state (see footnote 10):
$$\rho(P,\bar P) \sim \tilde\rho_1(P)\,\tilde\rho_1(\bar P), \qquad P, \bar P \to \infty \quad \text{(unitary, compact)}. \tag{2.15}$$
This is the Cardy formula for the asymptotic density of primary states [9], along with some correction terms (see footnote 11). Further corrections to (2.15) are exponentially suppressed in at least one of $P \sim \sqrt{h}$ and $\bar P \sim \sqrt{\bar h}$.

From this argument, it is a small extension to consider the large spin limit, where we take only $P \to \infty$, fixing $\bar P$. In this case, for the vacuum alone to dominate, it must be the only Virasoro primary with $h = 0$, meaning that there is no extended current algebra. More precisely, we demand a twist gap, meaning that there is a positive lower bound on $h$ for all Virasoro primaries besides the vacuum. With this restriction on the theory in question, we obtain a large spin version of the Cardy formula:
$$\rho(P,\bar P) \sim \tilde\rho_1(P)\,\tilde\rho_1(\bar P), \qquad P \to \infty,\ \bar P \text{ fixed} \quad \text{(twist gap)}. \tag{2.16}$$
In essence, this argument is equivalent to the 'modular lightcone bootstrap' in [5,10] (see also appendix B of [12]), which considers the partition function with $\tau, \bar\tau$ independent and imaginary, in the limit $\tau \to i\infty$ with $\bar\tau$ fixed. However, we use the density of states directly, without reference to the partition function. This perspective is more useful when the energy and spin of interest never dominate the canonical ensemble. That can occur in a 'near-extremal' limit where we take $\bar P \to 0$ sufficiently fast as $P \to \infty$: the large spin density (2.16) at $P\bar P < \frac{c-1}{24}$ has greater free energy than the vacuum alone, so it is always subdominant to the identity in the partition function (see [20] in the context of large $c$ gapped theories). This makes it subtle to use the asymptotic behaviour of the partition function to isolate the 'continuum' piece of the spectrum in this regime from the identity.

Footnote 10: This argument is too fast, because it does not bound the contribution of a sum of infinitely many operators. Indeed, (2.15) interpreted literally must be false, since $\rho$ is a sum of delta functions at integer spins. However, contributions from heavy states are oscillatory, so they tend to cancel from an appropriately smeared density of states. Refining the argument to make more precise statements is an interesting problem which we do not attempt to address in this paper. See [16] for related discussion.

Footnote 11: To express the density in terms of $h, \bar h$ or $\Delta, \ell$, a Jacobian for the change of variables must be included. This gives a logarithmic contribution to the entropy of primary states. To compare to the entropy of all states, such as in [17,18], we should add the 'primary entropy' at $h_p \sim (1 - c^{-1})h$ to the 'descendant entropy' at $h_d \sim c^{-1}h$. The latter is given by the Hardy-Ramanujan asymptotic formula for partitions, and the allocation of energies in $h = h_p + h_d$ between primaries and descendants maximises the sum of these entropies. Formulas which are insensitive to spin, such as in [19,16], are obtained by integrating over $\ell$ at fixed $\Delta$.
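The large spin Cardy formula can be unpacked numerically. The sketch below assumes the vacuum transform $\tilde\rho_1(P) = 2\sqrt{2}\sinh(2\pi b P)\sinh(2\pi P/b)$, obtained by applying the cosine kernel to the vacuum density, together with $P^2 = h - \frac{c-1}{24}$. It checks three statements made in this section and in the introduction:

- at fixed $\bar h$, $\log\tilde\rho_1(P)$ approaches $2\pi\sqrt{(c-1)\ell/6}$ with $\ell = h - \bar h$;
- near the bound, the density per unit $\bar h$ (including the Jacobian $d\bar h = 2\bar P\, d\bar P$, with overall normalisation factors ignored) vanishes like $\sqrt{\bar h - \frac{c-1}{24}}$;
- for footnote 11, choosing $h_d$ to maximise $2\pi\sqrt{(c-1)(h-h_d)/6} + 2\pi\sqrt{h_d/6}$ gives $h_d \approx h/c$ and total entropy $2\pi\sqrt{ch/6}$.

The helper names and the choice of a ternary search are our own.

```python
import math


def log_sinh(x):
    """Numerically stable log(sinh(x)) for x > 0."""
    return x + math.log(-math.expm1(-2 * x)) - math.log(2)


def b_and_Q(c):
    """(b, Q) from c = 1 + 6 Q^2, Q = b + 1/b, 0 < b < 1 (requires c > 25)."""
    Q = math.sqrt((c - 1) / 6)
    return (Q - math.sqrt(Q * Q - 4)) / 2, Q


def log_vacuum_transform(P, b):
    """log of 2 sqrt(2) sinh(2 pi b P) sinh(2 pi P / b), for P > 0."""
    return (math.log(2 * math.sqrt(2))
            + log_sinh(2 * math.pi * b * P) + log_sinh(2 * math.pi * P / b))


def large_spin_entropy_ratio(spin, hbar, c):
    """log rho~_1(P) at h = hbar + spin, divided by 2 pi sqrt((c - 1) spin / 6)."""
    b, Q = b_and_Q(c)
    P = math.sqrt(hbar + spin - Q * Q / 4)
    return log_vacuum_transform(P, b) / (2 * math.pi * math.sqrt((c - 1) * spin / 6))


def edge_ratio(hbar, c):
    """Density per unit hbar, rho~_1(Pbar) / (2 Pbar), divided by sqrt(hbar - (c-1)/24)."""
    b, Q = b_and_Q(c)
    Pbar = math.sqrt(hbar - Q * Q / 4)
    per_unit_hbar = math.exp(log_vacuum_transform(Pbar, b)) / (2 * Pbar)
    return per_unit_hbar / Pbar


def best_split(h, c, iters=200):
    """Ternary search for the h_d maximising the (concave) primary plus descendant entropy."""
    def entropy(hd):
        return 2 * math.pi * (math.sqrt((c - 1) * (h - hd) / 6) + math.sqrt(hd / 6))

    lo, hi = 0.0, h
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if entropy(m1) < entropy(m2):
            lo = m1
        else:
            hi = m2
    hd = (lo + hi) / 2
    return hd, entropy(hd)


if __name__ == "__main__":
    print(large_spin_entropy_ratio(1e6, 2.0, 37.0))
    print(edge_ratio(1.5 + 1e-6, 37.0), 4 * math.sqrt(2) * math.pi ** 2)
    print(best_split(100.0, 10.0))
```

Working with logarithms avoids overflow, since $\sinh(2\pi P/b)$ is astronomically large at the spins of interest.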

The modular transform of large spin growth
It is familiar that the behaviour of a function at large arguments is related to the smoothness of its Fourier transform, by results going under the general name of Paley-Wiener theorems. For example, if $\rho$ decays as a power $P^{-2-n}$, then $\tilde\rho$ is $n$ times continuously differentiable, and if $\rho$ decays exponentially, then $\tilde\rho$ has an analytic extension to a strip. Some further results in this tradition, involving distributions which grow at infinity, will be a useful heuristic for us to understand how different regimes of operator dimensions are related by modular transform.
Firstly, we have already met an exponentially growing distribution, the modular transform of the vacuum (2.14), which behaves as $e^{2\pi Q|P|}$ in the $|P| \to \infty$ limit. Under Fourier transform, this is related to the fact that the vacuum has support at imaginary momentum $P$ (dual to the result that exponential decay transforms to analyticity in a strip). In the large spin limit of (2.16) ($P \to \infty$), this means that exponential growth comes from operators with low twist ($h < \frac{c-1}{24}$) in the modular transformed channel, and the fastest growth comes from the lowest twist. The asymptotic expansion at large spin is therefore organised by including operators in increasing order of twist.
Secondly, we will see that operators of low twist do not appear in isolation; rather, infinite families of multi-twist operators accumulate at a given twist, with density of states growing polynomially in spin. After Fourier transform, this shows up in the twist dependence of corrections to the large spin Cardy formula (2.16): the transform of a distribution of power-law growth is supported on real $\bar P$, but with a particular singular part at $\bar P = 0$. Roughly, since a power can be removed by repeated differentiation, the singularity at $\bar P = 0$ is sufficiently weak that it is removed by multiplication by a power of $\bar P$.
A concrete example is the following S-transform, which will be used later: We derive this result in appendix A, and give a precise definition of the distributionρ k (in particular the resolution of the singularity implied by the principal value symbol p.v.). This leading singularity remains correct even for a density supported at discrete values ofP (in particular, integer spins), as long as the integrated density (that is, the total number of operators up to some spin) is asymptotic to 2k .

The spectrum of multi-twist operators
In this section, we discuss the multi-twist composite operators built from a light primary. We begin by reviewing the properties of double-twist operators in section 3.1, before using this to infer the universal properties of multi-twists in section 3.2. The key results for the sequel are (3.4) and (3.6), which give, respectively, the asymptotic twist and the number of operators in multi-twist Regge trajectories. A reader impatient to apply the modular bootstrap could take these results on trust and proceed directly to section 4. In section 3.3, we discuss a useful intuition and motivation for the multi-twist operators as multi-particle states in an $\mathrm{AdS}_3$ dual. In section 3.4, we discuss the non-universal properties of the multi-twist spectrum, subject to corrections which depend on details of the theory in question.

Double-twist Regge trajectories from the fusion kernel
The existence of double-twist composite operators follows from crossing symmetry of a four point correlation function, much as the large spin Cardy formula follows from modular invariance. In this section we sketch the argument of [11], leading to the result (3.2) about the spectrum.
For any two primary operators $\mathcal{O}_1, \mathcal{O}_2$, consider a four-point correlation function $\langle \mathcal{O}_1 \mathcal{O}_2 \mathcal{O}_2 \mathcal{O}_1 \rangle$, and use the OPE to write it in terms of the basic data of the theory, namely the central charge, spectrum, and OPE coefficients. The same correlation function can be decomposed in several different ways, in particular the 'S-channel', taking the OPE of $\mathcal{O}_1$ and $\mathcal{O}_2$, and the 'T-channel', taking the OPE between pairs of identical operators. These two decompositions must arrive at the same result, which gives the crossing equations relating the S- and T-channel expansions.
In the previous section, modular invariance was recast as invariance of the density of states under the modular S-matrix integral transform (2.12). The crossing equation can be recast in an analogous way, giving an expression for S-channel spectral density (a sum of delta functions supported at the momenta (P,P ) of operators appearing in the OPE, weighted by OPE coefficients) as an integral transform acting on T-channel spectral density. The kernel of this transform is the fusion kernel, or the Virasoro 6j-symbol [21,22]. Table 1 summarises the analogous objects appearing in the partition function and the four-point function.

| Modular invariance | Crossing equation |
|---|---|
| Characters $\chi_P(\tau)$ | Conformal blocks |
| Modular S-matrix $S$ | Fusion kernel |
| $\tilde\rho_1(P)$ (see (2.14)) | Identity fusion kernel |

Table 1: The analysis of partition functions and modular invariance is closely parallel to four-point functions and crossing.

We highlight one qualitative difference. For modular invariance, the same data (the spectrum) appears in both channels, but for crossing the S- and T-channel data are different (unless $\mathcal{O}_1 = \mathcal{O}_2$), since an operator labelled by $p$ is weighted by OPE coefficients $C_{12p}^2$ or $C_{11p}C_{22p}$ in the S- and T-channel respectively. One of these may vanish; most importantly, the identity appears in the T-channel but not the S-channel.
For the modular bootstrap (2.16), we argued that at large spin, the contribution from the identity in the cross-channel is parametrically larger than the contribution from any operator of positive twist. The same argument applies for the four-point function: for a theory with a twist gap, the S-channel spectral density at large spin is dominated by the contribution from the identity in the T-channel, so it is given by the identity fusion kernel to leading order in an asymptotic $\ell \to \infty$ expansion.
At this point the fusion kernel has a crucial new feature absent from the modular S-matrix: it can have support on some additional, discrete operator dimensions. The modular S-transform of the identity (or any single operator) gives a continuous density of states supported only on real $P$, that is, dimensions $h \geq \frac{c-1}{24}$. The fusion transform of the identity includes a similar continuous spectral density for $h \geq \frac{c-1}{24}$. It also includes discrete delta function contributions (see footnote 12) at a finite set of dimensions $h_m < \frac{c-1}{24}$, determined by the external operator dimensions. These are simple to express in the parameterisation (2.2), where they correspond to real $\alpha = \alpha_m < \frac{Q}{2}$:
$$\alpha_m = \alpha_1 + \alpha_2 + m b, \qquad m = 0, 1, 2, \ldots \tag{3.1}$$
Note that these discrete contributions only exist if $\alpha_1 + \alpha_2 < \frac{Q}{2}$, and for $m \geq 1$ only if $c > 25$, in which case we have chosen $0 < b < 1$.

The upshot concerns the twist dependence of the large spin S-channel spectral density, the analogue of $\tilde\rho_1(P)$ in the large spin (here, $\bar P \to \infty$) Cardy formula (2.16). This twist dependence includes delta functions supported at imaginary $P = \pm i(\frac{Q}{2} - \alpha_m)$. For each $\alpha_m$, there must be a family of operators (a Regge trajectory) labelled by spin $\ell$, approaching the corresponding twist as $\ell \to \infty$. This argument alone does not imply that there is an operator on this trajectory for every spin, but the Lorentzian inversion formula [23,24,25], implying analyticity of spectral data in spin, supplies this missing link. The result is that, for each $m$ such that $\alpha_m < \frac{Q}{2}$, there are 'double-twist' primary operators $[\mathcal{O}_1\mathcal{O}_2]_{m,\ell}$ for all spins $\ell$, with twists approaching $\alpha_m$ as $\ell \to \infty$. (3.2)

Here 'all' means every integer spin starting at the sum of the spins of the component operators (see footnote 13), $\ell \geq \ell_1 + \ell_2$, with the possible exception of small spins, for which the inversion formula may not converge. If $\mathcal{O}_1 = \mathcal{O}_2$, we have only even spins. In the $c \to \infty$ limit with dimensions of external operators held fixed, the number of double-twist trajectories (the range of allowed $m$) is of order $c$, and the asymptotic twists of each trajectory are $h_{m,\ell} \sim h_1 + h_2 + m + O(c^{-1})$. The double-twist spectrum approaches that of mean field theory (generically requiring $\ell \gg c$ to reach the large spin regime).

Footnote 12: These come from poles in the fusion kernel, so evaluating the residue gives a delta function; c.f. analytic representations of distributions as described in appendix A.

Footnote 13: We are here organising by representations of the connected part of the conformal group, so spin can be any integer; including parity (if a symmetry of the theory) would combine positive and negative spins in a single representation. There are also double-twist Regge trajectories for states spinning the opposite way, with $\ell \leq \ell_1 + \ell_2$ and $\bar\alpha$ approaching the universal values as $\ell \to -\infty$ (this double counts the operators with $\ell = \ell_1 + \ell_2$ at the start of the trajectories). The precise counting here is based on expectations from MFT.
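To make the counting concrete, the sketch below lists the discrete twists $\alpha_m = \alpha_1 + \alpha_2 + m b$ allowed by $\alpha_m < Q/2$ and converts them back to $h_m = \alpha_m(Q - \alpha_m)$. It assumes the parameterisation $c = 1 + 6Q^2$, $Q = b + b^{-1}$, $h = \alpha(Q - \alpha)$ with $c > 25$. The function names and the sample values $c = 1000$, $h_1 = h_2 = 0.2$ are illustrative. At large $c$ it exhibits the two features quoted above: a number of trajectories of order $c$ (about $(c-1)/12$ for light external operators, since $Q/2b \approx Q^2/2$), and $h_m$ close to $h_1 + h_2 + m$ for small $m$.

```python
import math


def b_and_Q(c):
    """(b, Q) from c = 1 + 6 Q^2, Q = b + 1/b, 0 < b < 1 (requires c > 25)."""
    Q = math.sqrt((c - 1) / 6)
    return (Q - math.sqrt(Q * Q - 4)) / 2, Q


def alpha_of(h, Q):
    """alpha in [0, Q/2] with h = alpha (Q - alpha)."""
    return Q / 2 - math.sqrt(Q * Q / 4 - h)


def double_twist_trajectories(c, h1, h2):
    """List of (m, alpha_m, h_m) for the discrete twists alpha_m = alpha_1 + alpha_2 + m b < Q/2."""
    b, Q = b_and_Q(c)
    base = alpha_of(h1, Q) + alpha_of(h2, Q)
    out = []
    m = 0
    while base + m * b < Q / 2:
        a = base + m * b
        out.append((m, a, a * (Q - a)))
        m += 1
    return out


if __name__ == "__main__":
    traj = double_twist_trajectories(1000.0, 0.2, 0.2)
    print(len(traj), (1000.0 - 1) / 12)
    for m, a, h in traj[:4]:
        print(m, round(h, 4))
```

An empty list signals that $\alpha_1 + \alpha_2 \geq Q/2$, in which case there are no discrete double-twist trajectories at all.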
The rate at which the Regge trajectories approach their asymptotic twist $\alpha_m$ at large spin (the 'anomalous twist' $\gamma_{m,\ell}$) is determined by the operator of lowest twist $\bar\alpha_t$ exchanged in the T-channel besides the identity. The coefficient $\gamma_m$ controlling this approach (given in [11]) is proportional to the OPE coefficients $C_{11t}C_{22t}$.

Extending to multi-twists
Given the existence of double-twist operators, it is only a small extension to deduce that there are also higher composite 'multi-twist' operators -though counting them is slightly more challenging. In this section, we describe the most naive proposal for the spectrum, and motivate its correctness (at least to leading order in large spin) and universality.
To establish the existence of multi-twist operators and determine certain properties, it is enough to apply the argument for double-twist operators recursively. Here we will start with a single operator $\mathcal{O}$; the extension to multiple species is straightforward. For the first step, constructing triple-twists, simply run the argument of section 3.1 taking one of the external operators to be a double-twist, $\mathcal{O}_1 = [\mathcal{O}\mathcal{O}]_{m_\text{DT},\ell_\text{DT}}$ and $\mathcal{O}_2 = \mathcal{O}$; the resulting composites $[\mathcal{O}[\mathcal{O}\mathcal{O}]_{m_\text{DT},\ell_\text{DT}}]_{m,\ell}$ are the triple-twists. The large spin bootstrap is valid if the spin of the double-twist we started with is large ($\ell_\text{DT} \gg 1$) and the spin of the resulting triple-twist is much larger still ($\ell - \ell_\text{DT} \gg 1$). The twist of the composite must then approach the universal value $3\alpha + (m_\text{DT} + m)b$. We can then repeat the exercise taking $\mathcal{O}_1$ to be a triple-twist and so forth, to give $p$-fold composites approaching the following twists at large spin:
$$\alpha_{p,m} = p\,\alpha + m\,b, \qquad m = 0, 1, 2, \ldots \tag{3.4}$$
This argument applies until $p$ is large enough that $p\alpha > \frac{Q}{2}$, at which point we can no longer separate distinct, discrete Regge trajectories from the $h > \frac{c-1}{24}$ large spin 'continuum'. Note that not all multi-twist operators will approach these universal twists at large spin; however, in section 3.4 we argue that most of them do, in a sense to be made precise momentarily.
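The recursion can be tabulated directly from the asymptotic twists $\alpha_{p,m} = p\alpha + m b$. The sketch below, for a single operator of dimension $h_\mathcal{O}$, counts for each $p \geq 2$ how many values of $m$ keep $\alpha_{p,m}$ below $Q/2$. The number of nonnegative integers $m$ with $m < x$ is $\lceil x \rceil$. The tabulation stops at the first $p$ with $p\alpha \geq Q/2$. It assumes $c > 25$, $h_\mathcal{O} < \frac{c-1}{24}$ and the parameterisation $h = \alpha(Q - \alpha)$; the function name is hypothetical.

```python
import math


def b_and_Q(c):
    """(b, Q) from c = 1 + 6 Q^2, Q = b + 1/b, 0 < b < 1 (requires c > 25)."""
    Q = math.sqrt((c - 1) / 6)
    return (Q - math.sqrt(Q * Q - 4)) / 2, Q


def multi_twist_families(c, h_O):
    """Map p -> number of m >= 0 with alpha_{p,m} = p*alpha + m*b < Q/2, for p >= 2."""
    b, Q = b_and_Q(c)
    alpha = Q / 2 - math.sqrt(Q * Q / 4 - h_O)
    families = {}
    p = 2
    while p * alpha < Q / 2:
        # m runs over 0, 1, ... while p*alpha + m*b stays below Q/2.
        families[p] = math.ceil((Q / 2 - p * alpha) / b)
        p += 1
    return families


if __name__ == "__main__":
    print(multi_twist_families(37.0, 0.3))
```

The number of discrete trajectories decreases with $p$ and vanishes once $p\alpha$ reaches $Q/2$, matching the statement that the recursion stops there.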
This recursive argument alone is not, however, sufficient to count the multi-twists correctly. Enumerating all operators constructed in this way will overcount, because the same multi-twist may be built in several ways, in particular due to the Bose symmetry permuting copies of $\mathcal{O}$. At this point, we make the very natural suggestion that the multi-twist spectrum can be regarded (at least to leading order at large spin) as a deformation of mean field theory (MFT, also called a generalised free field). MFT is not a true CFT, but a set of correlation functions on the plane solving the crossing equations. The correlators of $\mathcal{O}$ are Gaussian, given by Wick contractions using a conformal two-point function, and the stress tensor decouples$^{15}$ ($c \to \infty$). The spectrum of 'multi-trace' operators in MFT is given by a non-interacting Bose gas built from $\mathcal{O}$ and its global descendants. We can write the $p$-trace operators as normal-ordered products of $p$ copies of $\mathcal{O}$ and its derivatives, carrying spins $\ell_1, \ldots, \ell_p$ (3.5). These operators are mostly sl(2) descendants, and only particular linear combinations are primary. These sl(2) primaries are in fact full Virasoro primaries, with Virasoro descendants built by dressing (3.5) with $T, \bar T$.$^{16}$ We enumerate the multi-trace states of MFT in appendix B. The main result we will require is the large spin degeneracy of primaries at any given twist (that is, fixed $m$). For each $p, m$, the total number $N_{p,m}(\ell)$ of primaries with spin at most $\ell$ grows as $\ell^{p-1}$, with the coefficient given in (3.6). For example, for $p = 2$ we get $N_{2,m}(\ell) \sim \frac{\ell}{2}$ for every $m$, as expected for a state on each double-twist Regge trajectory for each even spin. Our assumption is that this formula remains true in our generic, finite $c$ CFT for multi-twist Virasoro primary operators with $\alpha \sim \alpha_{p,m}$.
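The leading growth can be checked in the simplest setting with a short counting sketch. For illustration only, it assumes a single bosonic $\mathcal{O}$ at fixed $m = 0$ and counts global sl(2) primaries in one chiral direction. The $p$-fold symmetric product of the tower $\partial^k\mathcal{O}$ has, at level $n$, as many states as partitions of $n$ into at most $p$ parts. Subtracting the level $n-1$ count leaves the partitions of $n$ into parts $2, \ldots, p$. The estimate $\ell^{p-1}/(p!\,(p-1)!)$ is our own leading-order evaluation of this count and is not quoted from (3.6). It reproduces the $p = 2$ value $\frac{\ell}{2}$ given above. The helper names are hypothetical.

```python
import math

def primaries_by_level(p, max_level):
    """sl(2) primaries at each level n <= max_level in the p-fold symmetric product.

    Level-n states are partitions of n into at most p parts; subtracting the level
    n-1 count (L_{-1} acts injectively) leaves partitions of n into parts 2..p,
    counted here by the standard coin-change dynamic programme.
    """
    counts = [0] * (max_level + 1)
    counts[0] = 1
    for part in range(2, p + 1):
        for n in range(part, max_level + 1):
            counts[n] += counts[n - part]
    return counts

def cumulative_primaries(p, ell):
    """Number of primaries with level (spin) at most ell, at fixed m = 0."""
    return sum(primaries_by_level(p, ell))

def leading_estimate(p, ell):
    """Leading large-ell growth ell^(p-1) / (p! (p-1)!) of the cumulative count."""
    return ell ** (p - 1) / (math.factorial(p) * math.factorial(p - 1))
```

For example, `cumulative_primaries(2, 100)` returns 51 (one primary at each even level from 0 to 100), against the estimate 50. For $p = 3, 4$ the ratio to the estimate approaches 1 with $O(\ell^{-1})$ corrections.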
It is straightforward to extend these results to include several independent 'single-twist' operators, and construct mixed multi-twists from them; we can also allow for fermionic operators. These generalisations are discussed in appendix B, and the relevant results introduced when required in section 4.

Multi-twists as multi-particle states in AdS
A strong motivation and intuition for the multi-twist spectrum comes from an interpretation in an AdS dual, which is useful even at finite c and when no local, semiclassical dual exists [26,27,28].
In this context, it is simplest to use the state-operator correspondence to talk not of local primary operators, but of the spectrum of primary states on $S^{d-1}$ (for a CFT in $d$ dimensions). Each such state corresponds to an excitation in global AdS$_{d+1}$, with 'centre of mass' wavefunction in the ground state. For Virasoro primaries, the boundary gravitons are also in their lowest energy state, which in particular means that the boundary stress-tensor expectation value is constant. Taking global descendants boosts the configuration so the excitation orbits AdS, confined by the potential induced by the cosmological constant. At large spin, the excitation orbits at a large proper distance from the centre of AdS.
Multi-twist states now come from making several such excitations, each with large angular momentum, such that they are well separated. A 'cluster decomposition' principle in AdS then suggests that the excitations become independent in the limit of large separation, forming a non-interacting Fock space with appropriate statistics. This does not require the excitations to be 'elementary' or localised in any sense; each may occupy any region of finite size. At small spin, there is no longer a parametric separation between the excitations, so we may not neglect interactions, and the spectrum depends on details of the theory and the excitations.
It is not obvious that this should make sense for a generic CFT, for which there is no obvious bulk description. Even if one exists, it will not be local on AdS scales (for example, Planck or string scales will be of order $\ell_\text{AdS}$). Nonetheless, the bootstrap analysis shows that a notion of locality emerges at very large distances, on scales parametrically larger than the curvature scale of AdS. The notion of a gravitational dual is therefore useful in far more generality than might have been expected, in the context of a large spin analysis corresponding to a bulk long distance expansion. For certain quantities, any CFT is describable by a low-energy bulk effective field theory with an AdS scale cutoff (this is related to, but distinct from, the 'effective conformal theory' notion of [29]).
The above description is only really valid in $d > 2$, in which case all unitary interactions fall off at long distance. The situation for $d = 2$ is complicated by infinite-range interactions mediated by massless particles, which cannot be neglected even at very large separation. This includes the effect of gravity, by which a localised excitation induces a conical defect that remains important at all distances. Such interactions correspond to currents in the CFT, with vanishing twist ($\bar h = 0$); our assumption of a twist gap ensures that it suffices to include only the effect of gravity. We do not currently have a satisfactory bulk description of the gravitational effects at finite $c$, but the Virasoro bootstrap [11] shows how to account for them in the spectrum.

Corrections to the multi-twist spectrum
The previous section identified the leading order large spin spectrum of multi-twists. There are families of operators approaching each of the twists in (3.4) as $\ell \to \infty$, and their number is given by (3.6) up to corrections of relative order $\ell^{-1}$. This is the only data we can determine universally, that is, without additional detailed information about other operators and their couplings to $\mathcal{O}$.
Double-twist Regge trajectories have anomalous twists (3.3), which are small at large spin; the recursive construction extends this to estimate the anomalous twists of many of the multi-twist operators. Suppose each of the $p - 1$ spins added at successive stages of the construction is large (an order one fraction of the final total spin $\ell$, say). Then we can apply the large spin bootstrap at each stage, and the multi-trace anomalous dimension is suppressed exponentially in $\sqrt{\ell}$. There are certainly multi-twist operators to which this does not apply. The anomalous twists of double-twist operators need not be suppressed in any sense at finite $\ell$. Furthermore, when we build higher multi-traces, large anomalous twists will appear even in the large spin spectrum, accumulating as $\ell \to \infty$ at values of the twist different from those in (3.4). However, such operators are relatively few. Roughly speaking, the spins added at each stage in the recursive construction correspond to the $\ell_i$ in (3.5) (except that one $\ell_i$ is dropped, since we are forming primaries). The $\ell^{p-1}$ growth in (3.6) comes from the number of ways to choose $p - 1$ component spins summing to less than $\ell$ (up to a factor from permutation symmetry). For most of these, each $\ell_i$ contributes an order one fraction of the total $\ell$, and hence the corresponding operator has small anomalous twist. The proportion of decompositions for which at least one of the $\ell_i$ is less than any fixed number, allowing for large anomalous twist, is of order $\ell^{-1}$. Hence, the number of $p$-twist operators with large anomalous dimension grows only as $\ell^{p-2}$, smaller than the total (3.6) by a factor of $\ell^{-1}$.
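The $\ell^{-1}$ suppression is elementary combinatorics. Ignoring permutation symmetry, as in the rough argument above, consider ordered $(p-1)$-tuples of non-negative spins with sum at most $\ell$. There are $\binom{\ell + p - 1}{p - 1}$ of them, and $\binom{\ell - (p-1)K + p - 1}{p - 1}$ have every entry at least $K$. The sketch below uses a hypothetical fixed cutoff $K$ and computes the fraction of tuples with some entry below $K$. For large $\ell$ this fraction behaves as $(p-1)^2K/\ell$.

```python
from math import comb

def small_spin_fraction(ell, p, K):
    """Fraction of ordered (p-1)-tuples of spins with sum <= ell having some entry < K."""
    k = p - 1
    total = comb(ell + k, k)          # all tuples, by stars and bars with a slack variable
    rest = ell - k * K
    if rest < 0:
        return 1.0                    # every tuple must contain an entry below K
    all_large = comb(rest + k, k)     # shift each entry down by K and recount
    return 1 - all_large / total
```

For $p = 3$ and $K = 5$, $\ell \times$ `small_spin_fraction(ell, 3, 5)` stays close to $(p-1)^2K = 20$ as $\ell$ grows from $10^3$ to $10^4$. This is the $\ell^{-1}$ behaviour used in the text.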
Finally, we expect that there can be fewer multi-twists than anticipated from the Bose gas of MFT, because the inversion formula [23] does not converge for all spins. For scalar $\mathcal{O}$, the double-twist trajectories need only extend down to spin two, which excludes scalar double-twists. This applies to sl(2) primaries, and it is not entirely clear how it extends to Virasoro primaries. For higher multi-twists, this can lead to a number of 'missing operators', which is again suppressed by a factor of $\ell^{-1}$ relative to (3.6).

The S-transform of multi-twist operators
In this section, we combine the large spin spectrum of multi-twist operators, (3.4) and (3.6), with modular invariance in the framework of section 2. From this, we will recover corrections to (2.16) for the 'near extremal' regime of large $h$ with $\bar h$ close to $\frac{c-1}{24}$, which we interpret as shifts of the 'extremality bound'.
Before the calculations, we motivate why the S-transform of multi-twists should tell us about the near extremal spectrum, using the Paley-Wiener type results of section 2.4. Starting with the operator $\mathcal{O}$ of lowest twist, we are looking at some of the pieces of the spectrum with the largest $|\operatorname{Im} P|$. After the modular transform, these become some of the most important terms in the $P \to \infty$ large spin expansion, from (2.17). We have also seen that, for a given twist, the simplest and most universal piece of the multi-twist spectrum is the leading power law growth of states with spin. After the S-transform, this leading power of $\bar P$ leads to the most singular function of twist from (2.18), and hence to the terms that are most important for small $\bar P$.
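The link between power-law growth and singular behaviour can be made explicit with a schematic estimate. It does not track normalisations or the $m$ dependence, and extends the density evenly to negative $\bar P'$. At large spin in the multi-twist channel $\ell \simeq \bar P'^2$, so $N_{p,m}(\ell) \propto \ell^{p-1}$ corresponds to a density proportional to $|\bar P'|^{2p-3}$. The standard Fourier transform of a power law, with the kernel $e^{4\pi i \bar P \bar P'}$ used in (4.12), then gives:

```latex
\int_{-\infty}^{\infty} d\bar P'\, e^{4\pi i \bar P \bar P'}\, |\bar P'|^{\lambda}
  = -2\,\Gamma(\lambda+1)\,\sin\!\left(\frac{\pi\lambda}{2}\right)(4\pi|\bar P|)^{-\lambda-1},
\qquad \lambda = 2p - 3,
\qquad\Longrightarrow\qquad
\rho_{p,m}(\bar P) \propto |\bar P|^{\,2-2p}.
```

This holds as a generalised function away from $\bar P = 0$ and for $p \geq 2$; $\lambda = -1$ is a logarithmic special case. For $p = 2$ the transform is $-\frac{1}{8\pi^2\bar P^2}$, the $\bar P^{-2}$ behaviour of (4.3). Each additional particle makes the term two powers more singular, which is why large $p$ matters as $\bar P \to 0$.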
The higher multi-twist (large $p$) operators have greater twist, and hence give contributions that are more suppressed in the large spin $P \to \infty$ limit. However, they also grow faster in number, so these terms are enhanced for small $\bar P$. Their importance is thus highlighted in a particular combined limit $P \to \infty$, $\bar P \to 0$, which focusses on the 'near extremal' spectrum.

Shift in extremality bound from universal multi-twists
Begin by taking the $p$-twist operators with asymptotic twist (3.4), and write the growth of states (3.6) at large $\ell$ in terms of a density$^{17}$ in $\bar P$ (4.2). The results of section 2, in particular (2.15), then give the spectrum corresponding to the modular transform of these states (4.3). Here we keep only the term growing exponentially at large $P$, drop the decaying term, and leave the p.v. symbol implicit. A nice feature of this formula is that it has a useful interpretation even for $p = 0$ and $p = 1$, which was not evident from the starting point (4.2). At $p = 1$, only the $m = 0$ term is nonzero, and it gives $4e^{2\pi(Q-2\alpha)P}$. This is the small $\bar P$ limit of the S-transform of the operator $\mathcal{O}$ alone (keeping only the growing exponential in $P$). The $p = 0$ terms are nonzero for $m = 0, 1$. They give the leading term in a small $\bar P$ expansion of the S-transform of the vacuum (2.14), again keeping only growing exponentials in $P$; the $m = 1$ term serves to subtract the left-moving null descendants of the vacuum. Now for each of the finitely many $p, m$ such that $p\alpha + mb < \frac{Q}{2}$, the density of states includes a contribution $\rho_{p,m}$. Further, it is likely that all other contributions are either more suppressed at large spin $P \to \infty$ (coming from operators of higher twist), or less singular as $\bar P \to 0$ (coming from slower growth in spin for a given twist). In this case, the expressions (4.3) for $\rho_{p,m}$ are the most important terms in a near-extremal asymptotic expansion, taking $P \to \infty$ and $\bar P \to 0$ (perhaps with $\bar P e^{2\pi\alpha P}$ fixed).$^{18}$ However, individual terms in the expansion do not have a particularly satisfactory interpretation, not least since they give densities which are singular as $\bar P \to 0$. To remedy this, we will sum the terms (4.3) into a simple closed form. This requires extrapolating (4.3) to all integers $p, m \geq 0$, and extending to an infinite sum.
The terms we add are all exponentially small at large spin, so they are not even meaningful as corrections to a discrete spectrum, which becomes a smooth density only in an approximation where there are parametrically many relevant states. A conservative interpretation of the resummed density is as a convenient repackaging of the near-extremal expansion, valid to the highest order which still contributes exponentially growing terms.
Once we have extended (4.3) to include all $m, p \in \mathbb{N}$ (including $p = 0, 1$ for the vacuum and $\mathcal{O}$ itself), the sums over $m$ and then $p$ are simply binomial series, and we can write the answer in the closed form (4.4). However, this manipulation (as indicated by the interrogative equality) is somewhat too naïve, since we were taking an infinite sum over singular distributions and treating them as ordinary functions. A more careful analysis, described in appendix A, shows that the correct sum is not simply the smooth function of $\bar P$ in (4.4). It is a distribution with additional support at imaginary values of $\bar P$, extending to the point where the square root vanishes. We rewrite (4.4) as a density in terms of the twist $\bar h = \frac{c-1}{24} + \bar P^2$ by including the Jacobian factor $\frac{d\bar P}{d\bar h}$. The resulting density is supported where the argument of the square root is positive. The corrections from this universal piece of the multi-twist spectrum therefore have a simple interpretation. They provide a spin-dependent shift (4.6) of the 'extremality bound' $\bar h_\text{extr}$, the edge of the effective continuum of states appearing at large spin. This captures only the leading terms in a large spin expansion of the extremality bound. In fact, some of the terms that come from expanding the denominator (from $m \geq 1$ trajectories) may be less important than the omitted corrections.
We should be clear that the argument here does not 'prove' (4.6); we have simply derived a finite number of terms in an asymptotic expansion. Indeed, in a generic CFT there is no sharp notion of $\bar h_\text{extr}$ at finite spin, so it is unclear how to define (4.6) precisely beyond consistency with an asymptotic expansion. The interpretation of the series as a shift of the square-root edge of the spectrum is nonetheless extremely compelling, because it requires a very rigid structure of the expansion. A single operator contribution (the $p = 1$ term here) uniquely determines the leading singularity at all subsequent orders (all $m$ for all $p > 1$). The truncation of the expansion after a finite number of terms is an expected feature when $\bar h_\text{extr}$ is only an approximate notion at large spin. We will see further evidence for this interpretation in section 6, in a semiclassical $c \to \infty$ limit where there is a sharp extremality bound for spins of order $c$. In that context, the shift of $\bar h_\text{extr}$ appears more directly and explicitly. We have so far included only a particular leading order set of terms. We expect further corrections to have two effects. Firstly, there can be additional terms in the asymptotic expansion of $\bar h_\text{extr}$, either from additional operators or from including more details about multi-twists. Secondly, there can be terms which are less singular in the $\bar P \to 0$ limit. These will not affect the extremality bound, but instead correct the functional form $\bar\rho_1(\bar P)$ of the density of states in a large spin expansion.
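Why a square-root edge forces this rigid structure can be seen from a generic expansion that does not use the specific coefficients of (4.4). Write $y = \bar h - \frac{c-1}{24} = \bar P^2$ and let $\delta$ stand for a hypothetical small shift of the edge. Then $\sqrt{y - \delta} = \sum_{p \geq 0}\binom{1/2}{p}(-\delta)^p y^{1/2-p}$. The order-$\delta^p$ term becomes more singular as $y \to 0$ with each order, and once the first-order term fixes $\delta$, every higher coefficient is determined. The sketch below checks this expansion numerically; the function names are hypothetical.

```python
import math

def binom_half(p):
    """Generalised binomial coefficient C(1/2, p)."""
    coeff = 1.0
    for j in range(p):
        coeff *= (0.5 - j) / (j + 1)
    return coeff

def shifted_edge_series(y, delta, order):
    """Partial sum, through order delta^order, of sqrt(y - delta) expanded about delta = 0.

    Converges for y > delta > 0; the p-th term scales as y^(1/2 - p).
    """
    return sum(binom_half(p) * (-delta) ** p * y ** (0.5 - p) for p in range(order + 1))
```

`shifted_edge_series(1.0, 0.3, 40)` agrees with `math.sqrt(0.7)` to near machine precision. For $y < \delta$ the series diverges, mirroring the fact that the resummed density has its edge at $y = \delta$ rather than at $y = 0$.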

Multiple operators
The result of our analysis came from considering the modular transform of a single light operator and the large spin multi-twist descendants constructed from it. Here, we add more light operators, along with the multi-twists constructed from all possible combinations thereof. We show that this is consistent with the same interpretation and changes the result simply by summing the corrections to $\bar h_\text{extr}$.
Start with operators $\mathcal{O}_i$, with dimensions given by $(\alpha_i, \bar\alpha_i)$, where $i$ runs from 1 to $N$. The multi-twist trajectories are labelled by a particle number $p_i$ for each species, as well as by $m$. The asymptotic degeneracies are computed in appendix B. They are given by (3.6) for composites of $p = \sum_i p_i$ operators of a single type, times a multinomial coefficient $\frac{p!}{\prod_i p_i!}$. For example, the double-twist Regge trajectories built out of distinguishable particles ($p_1 = p_2 = 1$) are a factor of two more numerous than those for identical particles, since there are double-twist primaries at both even and odd spin. The derivation then continues as before, with an extra step of summing over all $p_i$ with $\sum_i p_i = p$ (a multinomial series) before summing over $p$. The result is that the contributions from different operators inside the square root in the density of states (4.4) simply add.
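The multinomial step can be checked in a few lines. Take a schematic per-species weight $x_i$, a hypothetical stand-in for the factor each $\mathcal{O}_i$ contributes at fixed $p$. Summing the multinomial coefficients $\frac{p!}{\prod_i p_i!}$ over all $p_i$ with $\sum_i p_i = p$ reproduces $(\sum_i x_i)^p$ (the multinomial theorem). This is why the species enter the subsequent sum over $p$ only through their total.

```python
import math
from itertools import product

def multinomial_sum(xs, p):
    """Sum over p_1 + ... + p_N = p of p!/(p_1! ... p_N!) * prod_i xs[i]**p_i."""
    total = 0.0
    for ps in product(range(p + 1), repeat=len(xs)):
        if sum(ps) != p:
            continue  # keep only compositions of the total particle number p
        coeff = math.factorial(p)
        for k in ps:
            coeff //= math.factorial(k)  # exact integer division at every step
        term = float(coeff)
        for x, k in zip(xs, ps):
            term *= x ** k
        total += term
    return total
```

For instance, `multinomial_sum([0.1, 0.2, 0.3], 4)` equals $0.6^4$ up to rounding. The $p_1 = p_2 = 1$ coefficient is $2$, the doubling of distinguishable double-twists noted above.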

Fermions
It is straightforward to repeat the analysis for a fermionic operator $\mathcal{O}$. The main difference is the statistics of multi-traces, which form a Fermi gas (more details are in appendix B). However, this does not change the count at leading order in large spin, so (3.6) remains valid. In addition to the statistics, we also have a choice of boundary conditions when fermions are involved. In a path integral, this is a choice of antiperiodic (Neveu-Schwarz) or periodic (Ramond) identification going round each of the two independent cycles of the torus. For the Hilbert space interpretation, the spatial boundary condition determines which states (living in the NS or R Hilbert spaces) are being counted. The Euclidean time boundary condition determines whether fermions are counted with signs (inserting $(-1)^F$ in the trace for periodic boundary conditions).
We choose the multi-trace operators always to belong to the NS sector Hilbert space, as does the vacuum. We can then perform our modular bootstrap for either choice of the Euclidean time boundary condition (NS-NS or NS-R). After the S-transform, this means we are counting NS or R states without a $(-1)^F$ insertion (NS-NS or R-NS). For the NS sector, since the large spin growth of degeneracies is unaltered, the result (4.6) for bosons carries over immediately to fermions. For the R sector, the insertion of $(-1)^F$ in the trace becomes a factor of $(-1)^p$ included before summing over (4.3). This simply flips the sign inside the square root (4.4), so fermion multi-trace contributions shift the threshold in the opposite direction for the R sector.

Anomalous dimensions
The analysis of this section so far used multi-twist operators with exactly linear Regge trajectories, that is, with twist (or $\alpha$) independent of spin (or $\bar P$). This is true only in a leading order approximation at large spin, and the operators receive anomalous dimensions as discussed in section 3.4. Here, we characterise how this correction contributes to the S-transformed spectrum for the double-twist trajectories, and argue that it is of subleading importance for the extremality bound. Something similar should be true for higher multi-twists, though we have not explicitly checked. For double-twist families, including the anomalous dimension (3.3) modifies the asymptotic density (4.2) as follows (where a conjugate delta function has been dropped):$^{19}$

$$\rho_{2,m}(P, \bar P) \sim |\bar P|\,\delta\!\left(P + i\left(\frac{Q}{2} - 2\alpha - mb - \gamma_m e^{-2\pi\bar\alpha_t|\bar P|}\right)\right) - \cdots \qquad (4.11)$$

We now estimate how this deviates from the spectrum (4.3) without anomalous twists, after taking the S-transform. The Fourier transform in $P$ is straightforward because of the delta function, so we are left with the following integral for the $\bar P$ transform:

$$\operatorname{Re}\int^{\infty} d\bar P'\, e^{4\pi i\bar P\bar P'}\,\bar P'\left[\exp\!\left(-4\pi\gamma_m e^{-2\pi\bar\alpha_t\bar P'}P\right) - 1\right] \qquad (4.12)$$

We leave the lower limit of the integral unspecified, since we are only looking at the contribution from $\bar P' \to \infty$, where the large spin expansion of the anomalous twist is valid. Low spin operators are discussed at the end of the section. For any fixed $P$, the integrand decays exponentially as $\bar P' \to \infty$, so the resulting transform is an analytic function of $\bar P$. In particular, we do not have a singularity like the $\bar P^{-2}$ in (4.3). However, it is less clear what happens in a simultaneous limit taking $P \to \infty$ along with $\bar P \to 0$. For this, it is easiest to make a change of variable which factors out the $P$ dependence:

$$\bar P' = x + \frac{\log(4\pi|\gamma_m|P)}{2\pi\bar\alpha_t} \qquad (4.13)$$

This is designed to remove the $P$ dependence from the square bracket in the integrand. The other factors then give a $\log P$ enhancement, along with an oscillatory exponential in $\frac{2\bar P}{\bar\alpha_t}\log P$.
What remains is the Fourier transform of an exponentially decaying function of $x$, and hence an analytic function of $\bar P$:

$$\log P \times (\text{analytic in } \bar P) \qquad (4.14)$$

This correction competes with $\rho_{2,m}$ only if $\log P \gtrsim \bar P^{-2}$. It is therefore unimportant in the regime of the shift of the extremality bound (4.6), where $\bar P$ scales as an exponential of $P$.
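To see concretely that (4.12) has no $\bar P^{-2}$ singularity at fixed $P$, the sketch below evaluates it numerically. It sets the lower limit to zero and uses hypothetical values $\gamma_m = 1$, $\bar\alpha_t = \frac12$, $P = 10$, none taken from the text. It takes the real part of the kernel, $\cos(4\pi\bar P\bar P')$, applies a plain composite Simpson rule, and truncates at `upper`, beyond which the integrand is negligible for these values.

```python
import math

def anomalous_transform(pbar, P, gamma_m, alpha_t, upper=20.0, n=20000):
    """Simpson-rule estimate of (4.12) with lower limit 0 (an arbitrary choice).

    Integrand: cos(4 pi pbar x) * x * [exp(-4 pi gamma_m e^{-2 pi alpha_t x} P) - 1].
    """
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    step = upper / n

    def integrand(x):
        shift = 4 * math.pi * gamma_m * math.exp(-2 * math.pi * alpha_t * x) * P
        return math.cos(4 * math.pi * pbar * x) * x * (math.exp(-shift) - 1.0)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * step)
    return total * step / 3
```

With these values the result changes by less than $10^{-3}$ between $\bar P = 0$ and $\bar P = 10^{-3}$, so it is smooth at the origin. At $\bar P = 0$ the bracket is minus the survival function of a Gumbel distribution centred at $\log(4\pi\gamma_m P)/(2\pi\bar\alpha_t)$. The integral is therefore minus half its second moment and grows like $(\log P)^2$, because here the integration runs all the way down to $\bar P' = 0$.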
If we perform the change of variable to $x$ and integrate over $x > 0$, this corresponds to $\bar P' \gtrsim \log P$, which includes double-twists with spin of order $(\log P)^2$ and greater. We can include lower spins in the above argument by a simple bound on their contribution to the S-transformed density of states. This bound is the number of such operators times the largest modular S-matrix element for any one of them. The largest possible contribution comes from the operator with the largest negative anomalous dimension, and hence the lowest twist $\alpha_\text{min}$. Multiplying by the number of neglected operators (of order $(\log P)^2$), we have a bound $(\log P)^2 e^{4\pi\left(\frac{Q}{2} - \alpha_\text{min}\right)P}$. This is of course nonsingular at $\bar P = 0$, and is a small correction as long as $\alpha_\text{min} > \alpha$ (so that the single operator we started with indeed had the lowest twist).
To go to higher orders in the large $P$ expansion of $\bar h_\text{extr}$ (4.6), we can first treat any double-traces of particularly low twist (large negative anomalous dimension) as independent operators, using the result of section 4.2. Even if anomalous twists were entirely absent, there would still be corrections to the number of multi-twist operators beyond the leading power of spin in (3.6). A reasonable guess is that these corrections take the form of terms added to (4.6), starting at order $e^{-8\pi\alpha P}$ from including a constant term in the number of double-traces. We will see an example of additional terms of this form in the semiclassical limit.

Semiclassical AdS 3 gravity
We now explore the gravitational interpretation of the results of section 4, as a quantum shift of the extremality bound of rotating BTZ black holes when AdS 3 gravity is coupled to matter.

BTZ and the extremality bound
We begin by discussing the extremality bound for rotating black holes in pure gravity, and its relation to the large spin bound $\bar h > \bar h_\text{extr} \sim \frac{c-1}{24}$. A general stationary axisymmetric metric in three dimensions can be written in the form (5.1). Solving the vacuum Einstein equations with cosmological constant $\Lambda = -1$ (choosing units with $\ell_\text{AdS} = 1$) gives the BTZ metric [13,14]. The parameters $r_\pm$ are determined (classically) by the mass and angular momentum through $M = \frac{r_+^2 + r_-^2}{8G_N}$ and $J = \frac{r_+ r_-}{4G_N}$. The mass here is defined such that empty AdS$_3$ has energy $-\frac{1}{8G_N}$ (again, classically). In terms of scaling dimensions of the corresponding CFT operators, we have $M = h + \bar h - \frac{c-1}{12}$ and $J = \ell = h - \bar h$, which fixes $r_\pm$ in terms of $P$ and $\bar P$. The causal structure is determined largely by the zeros of $f$. The fastest outgoing null geodesics follow $\frac{dr}{dt} = n(r)f(r)$, so there is a horizon at the largest value of $r$ for which this vanishes ($r = r_+$ for BTZ). Under the cosmic censorship assumption that the singularity at $r = 0$ is shrouded by a horizon, $f$ must have a positive real root. This implies the extremality bound $\bar h \geq \frac{c-1}{24}$ (for $J \geq 0$). The bound is saturated by the extremal black hole $r_- = r_+$, so $P = Qr_+$ and $\bar P = 0$. In fact, BTZ above the extremality bound and empty AdS$_3$ exhaust all$^{20}$ exterior$^{21}$ solutions of pure AdS$_3$ gravity without naked singularities. The shift of $c$ to $c - 1$ is a one-loop effect from metric fluctuations. It can be interpreted simply as a $-\frac{1}{12}$ contribution to $M$ from the Casimir energy of gravitons.$^{22}$ A less direct way to see this is from a Euclidean partition function for fluctuations around BTZ, which is a modular transform of Euclidean AdS$_3$ with thermal identifications. The one-loop graviton partition function on the latter (which is exact to all orders in perturbation theory) gives the CFT vacuum character [32,33], with ground state energy determined as $-\frac{c}{12}$ by conformal invariance. Performing the modular transform to go back to BTZ, we find a spectrum supported on $h, \bar h \geq \frac{c-1}{24}$ (with density of primary states (2.14)).
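The classical dictionary can be sketched as follows. We assume the standard BTZ relations for $M$ and $J$ with $\frac{1}{4G_N}$ replaced by $Q^2 = \frac{c-1}{6}$. Combined with $M = h + \bar h - \frac{c-1}{12}$, $J = h - \bar h$ and $h = \frac{c-1}{24} + P^2$, these give $r_\pm = (P \pm \bar P)/Q$. This is consistent with $P = Qr_+$, $\bar P = 0$ at extremality, and with $r_+ \sim bP$ when $Q \approx b^{-1}$. The function names are hypothetical.

```python
import cmath
import math

def btz_radii(h, hbar, c):
    """Return (r_plus, r_minus) for weights (h, hbar), as complex numbers.

    Assumes M = Q^2 (r_+^2 + r_-^2)/2 and J = Q^2 r_+ r_- with c - 1 = 6 Q^2,
    so r_pm = (P +- Pbar)/Q where h = Q^2/4 + P^2 and hbar = Q^2/4 + Pbar^2.
    Below extremality Pbar is imaginary and r_pm become complex (no horizon).
    """
    Q = math.sqrt((c - 1) / 6)
    P = cmath.sqrt(h - Q * Q / 4)
    Pbar = cmath.sqrt(hbar - Q * Q / 4)
    return (P + Pbar) / Q, (P - Pbar) / Q

def has_horizon(h, hbar, c):
    """Classical extremality bound: both weights at least (c-1)/24."""
    return min(h, hbar) >= (c - 1) / 24
```

For $c = 25$ (so $Q = 2$), $h = 5$ and $\bar h = 1$ sit exactly at extremality, giving $r_+ = r_- = 1 = P/Q$. Lowering $\bar h$ below $1$ makes the radii complex, signalling a naked singularity.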
While this closely resembles the CFT derivation of the Cardy formula, it is a calculation purely in semiclassical gravity and requires no CFT dual.

Including a scalar field at one loop
To include the effect of matter, take the simplest example of a free scalar minimally coupled to Einstein gravity:

$$I = \frac{1}{16\pi G_N}\int d^3x\,\sqrt{-g}\,(R + 2) - \frac{1}{2}\int d^3x\,\sqrt{-g}\left(g^{ab}\partial_a\Phi\,\partial_b\Phi + m^2\Phi^2\right), \qquad m^2 = \Delta(\Delta - 2).$$

To one-loop order, any weakly interacting theory of a scalar coupled to the metric can be brought to this form; for example, a curvature coupling $R\Phi^2$ can be absorbed by a Weyl transformation of the metric. Integrating out the scalar at one loop sources the Einstein equation at order $G_N$ with the expectation value of the stress tensor, giving a quantum correction to the geometry. We will find the range of mass and angular momentum for which the backreacted black hole has a horizon. Our analysis generalises previous work for a massless conformally coupled scalar with 'transparent' boundary conditions,$^{23}$ for which the stress tensor expectation value was computed in [34]. In this special case, the linearised Einstein equations were solved in [35,36].

$^{20}$ This classification is up to diffeomorphism, but some 'large' diffeomorphisms are physical, acting as asymptotic symmetries; this is interpreted as dressing with a coherent superposition of Virasoro descendants.

$^{21}$ Other solutions exist, but they are always isometric to BTZ in any region outside a horizon that is causally connected to an asymptotic boundary.

$^{22}$ There is a similar Casimir energy for empty AdS, which should be interpreted as a one-loop renormalisation of the Brown-Henneaux relation [30] between $c$ and $G_N$. AdS$_3$ has classical energy $-\frac{c_\text{bare}}{12}$ with $c_\text{bare} = \frac{3}{2G_N}$, and graviton Casimir energy $-\frac{1}{12} - 1$. The subtraction of unity is due to the invariance of the vacuum under $L_{-1}, \bar L_{-1}$. These add up to $-\frac{c}{12}$ with $c = c_\text{bare} + 13$ (see [31] for a useful perspective).
Computation of the expectation value $\langle T_{ab}\rangle$ is relatively straightforward since BTZ is locally isometric to AdS$_3$. It is obtained as a quotient AdS$_3/\Gamma$ by a subgroup $\Gamma$ of the $SO(2,2)$ isometry group. For BTZ, $\Gamma \cong \mathbb{Z}$ as a group, generated by a single element, which acts to identify the angular coordinate as $\phi \sim \phi + 2\pi$. The Hartle-Hawking state of the free scalar $\Phi$ on such a geometry is characterised as the Gaussian state on which the one-point function $\langle\Phi\rangle$ vanishes. Its two-point function is given by the method of images, using the AdS$_3$ propagator $G_\Delta$ (discussed in a moment):

$$\langle\Phi(x)\Phi(x')\rangle = \sum_{\gamma\in\Gamma} G_\Delta(x, \gamma\cdot x').$$

Variation of the action gives the classical stress tensor:

$$T_{ab} = \partial_a\Phi\,\partial_b\Phi - \frac{1}{2}g_{ab}\left(g^{cd}\partial_c\Phi\,\partial_d\Phi + m^2\Phi^2\right). \qquad (5.10)$$

The expectation value $\langle T_{ab}\rangle$ is therefore given by a differential operator acting on the two-point function, with an appropriate regularisation to take the limit of coincident points. The description as a quotient makes regularisation straightforward; we may simply drop the term in the sum over images where $\gamma$ is the identity. This is equivalent to regularising the divergence and adding a counterterm which renormalises the cosmological constant, chosen such that the renormalised stress tensor expectation value vanishes in pure AdS$_3$.
The AdS$_3$ propagator $G_\Delta$ solves the equation of motion with a $\delta$-function source at coincident points; its explicit form is (5.13). We now take the expression (5.10) and write $G$ as a function of the proper distance $s_\gamma(x, x') = s(x, \gamma\cdot x')$. We eliminate the mass term using the equation of motion satisfied by $G_\Delta$, which gives (5.14). Written in this form, the stress tensor is conserved as an identity for any function $G_\Delta$, and we will not need to use any information about the propagator until the very end. The final required ingredient is an expression for the proper distance $s_\gamma(x, x')$. A convenient way to calculate this is to express the geometry as a quotient of the $SL(2,\mathbb{R})$ group manifold. For extremal BTZ, an explicit form is (5.15), with metric $ds^2 = -\det(dg)$, and the quotient acts by imposing $2\pi$ periodicity on $\phi$. We can now use a simple expression (5.16) for the proper distance in $SL(2,\mathbb{R})$ (see [37], for example). The distances $s_\gamma(x, x')$ for different preimages in the quotient are obtained by adding integer multiples of $2\pi$ to $\Delta\phi$. With all these ingredients, we could simply push ahead and solve the linearised Einstein equations with source (5.14). This is possible (in fact, the solution is algebraic in $G_\Delta$ and its derivatives for a general function $G_\Delta$), but we can extract the information of interest much more simply using conservation laws.

Conserved quantities
We are interested in the relationship between the variation of the metric at infinity and at the horizon. At infinity, this variation is the shift of energy and angular momentum due to the matter source. At the horizon, we impose 'cosmic censorship' in the form of existence of the horizon. In pure gravity (or any diffeomorphism invariant theory, on-shell), such a relationship is provided by the first law of black hole thermodynamics, $dM - \Omega\,dJ = T\,dS$. We will make use of a formulation of the first law in Einstein gravity derived from the covariant phase space methods of Wald et al. [38,39,40] (see [41] for application to asymptotically AdS spacetimes). This formulation allows for an arbitrary conserved source, which adds an additional term given by an integral of the stress tensor over a Cauchy surface.$^{24}$ We review the relevant constructions of the covariant phase space formalism in appendix C. The most important object for us is the 'Hamiltonian variation' $\delta H_\xi$ corresponding to the vector field $\xi$. This is a $(d-1)$-form (in $(d+1)$-dimensional spacetime) depending on the background metric and a variation. If $\xi$ generates an asymptotic symmetry, then the integral of $\delta H_\xi$ over a spatial surface at infinity gives the variation of the corresponding ADM conserved quantity $H_\xi$.
Suppose that both the background and the variation solve the equations of motion, and that $\xi$ is a Killing field for the background. Then $\delta H_\xi$ is a closed form. If we generalise to allow for any variation (for us, one sourced by the one-loop stress tensor of the scalar), then $d\,\delta H_\xi$ is proportional to the linearised equations of motion. The result is a conservation equation relating $d\,\delta H_\xi$ to the stress tensor of the source. We now integrate this equation over a Cauchy surface $\Sigma$, for us a slice of constant $t$ between the horizon of BTZ and the AdS boundary, and use Stokes' theorem ($n$ is the unit vector normal to $\Sigma$). To evaluate the boundary terms at infinity and on the horizon bifurcation surface, we use explicit expressions for $\delta H_\xi$ derived for Einstein gravity in appendix C. For stationary axisymmetric variations around BTZ, in the gauge (5.1), we need these expressions for the two Killing fields $\partial_t$ and $\partial_\phi$, keeping only the $d\phi$ component. Imposing asymptotically AdS boundary conditions, the integrals at infinity are the variations $\delta M$ and $\delta J$ of the conserved quantities, as expected. Now we choose the particular linear combination of $\partial_t$ and $\partial_\phi$ which is normal to the horizon, the field $\xi_K$ for which the event horizon is a Killing horizon. For this choice, the horizon integral $\int_\text{Hor.}\delta H_{\xi_K}$ is expressed in terms of $\delta f(r_+)$. This is precisely the data which determines whether a horizon is present for variations around the extremal geometry.

$^{24}$ I would like to thank Don Marolf for helpful discussions regarding this section.

The modified extremality bound
The extremality bound gives the set of conserved quantities for which the corresponding geometry has an event horizon. For variations around extremal BTZ, this is determined by the sign of $\delta f(r_+)$. For $\delta f(r_+) > 0$, $f + \delta f$ does not have a root near $r = r_+$, and so the singularity at $r = 0$ becomes causally connected to the boundary. The linearised extremality bound is therefore $\delta f(r_+) \leq 0$. Using the conservation equations, we can rewrite this as a condition on $\delta M$, $\delta J$ and the integral of the stress tensor over $\Sigma$.
It remains only to evaluate the integral, using (5.14) for the stress tensor expectation value. For extremal BTZ and sources respecting the symmetries, we obtain (5.25). In its second line we have substituted using (5.14), and combined the terms corresponding to an element of the quotient group and its inverse. From (5.16), $s_n(r)$ is the proper length of a geodesic to and from a point at radius $r$, wrapping $n$ times round the horizon:

$$\cosh s_n(r) = \cosh(2\pi n r_+) + n\pi\,\frac{r^2 - r_+^2}{r_+}\,\sinh(2\pi n r_+) \qquad (5.26)$$

The expressions for $A_n$ and $B_n$ are rather complicated, but they can be written in a form allowing an enormous simplification of the integral:

$$A_n(r) = \frac{\left(\cosh s_n(r) - \cosh(2\pi n r_+)\right)\left(2\pi n r_+\cosh s_n(r) - \sinh(2\pi n r_+)\right)}{\pi n^2\,\sinh(2\pi n r_+)\,\sinh s_n(r)}\;s_n'(r), \qquad B_n(r) = \frac{d}{dr}\left(\frac{A_n(r)}{s_n'(r)}\right) + \frac{s_n'(r)}{\pi n^2} \qquad (5.27)$$

Now we may integrate the $A_n$ term by parts, which cancels the first term in the above expression for $B_n$. What remains is a total derivative, expressible in terms of $G_\Delta$ at the horizon (requiring only that $G_\Delta(s) \to 0$ as $s \to \infty$) (5.28). Only at this stage do we need the explicit expression (5.13) for the propagator, with which we write the one-loop bound in terms of the twist $\bar h$ (5.29). Matching parameters to CFT variables in the semiclassical limit, we have $r_+ \sim bP$ and $\Delta \sim 2b^{-1}\alpha$. The $n = 1$ term of the sum, which dominates in the large spin limit, matches (4.6). From linearity of the one-loop calculation, the contributions to $\delta\bar h_\text{extr}$ from multiple fields add. We expect this result to be valid for finite $r_+$ in the large $c$ limit, with weakly interacting bulk fields; in particular, for small black holes with $r_+ \sim c^{-1}$ the loop corrections are not suppressed.
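The image sum can be made concrete with the geodesic length (5.26). The sketch below assumes the standard AdS$_3$ scalar propagator $G_\Delta(s) = \frac{e^{-(\Delta-1)s}}{4\pi\sinh s}$, which we take to agree with (5.13) up to conventions. It drops the identity image as described above, and pairs each $\gamma^n$ with $\gamma^{-n}$, since (5.26) is even in $n$. The function names are hypothetical, and `n_max` is an arbitrary truncation.

```python
import math

def geodesic_length(n, r, r_plus):
    """s_n(r) from (5.26): geodesic from radius r back to itself, winding n times."""
    a = 2 * math.pi * n * r_plus
    return math.acosh(math.cosh(a) + n * math.pi * (r * r - r_plus ** 2) / r_plus * math.sinh(a))

def propagator_ads3(s, delta):
    """Assumed AdS3 scalar propagator G(s) = e^{-(Delta-1)s} / (4 pi sinh s), s > 0."""
    return math.exp(-(delta - 1) * s) / (4 * math.pi * math.sinh(s))

def phi_squared_images(r, r_plus, delta, n_max=10):
    """Regularised <Phi^2>(r): sum over images gamma^n, n != 0 (identity dropped)."""
    return 2 * sum(propagator_ads3(geodesic_length(n, r, r_plus), delta)
                   for n in range(1, n_max + 1))
```

At the horizon, $s_n(r_+) = 2\pi n r_+$, so successive images are suppressed by roughly $e^{-2\pi\Delta r_+}$. With $r_+ \sim bP$ and $\Delta \sim 2b^{-1}\alpha$ this factor is $e^{-4\pi\alpha P}$, consistent with the $n = 1$ term dominating at large spin.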

Semiclassical bootstrap
We now reproduce the result (5.29) of the gravity calculation from CFT considerations, using modular invariance in a large central charge limit and assuming that the spectrum of light states is given by MFT, or equivalently a Bose gas of free particles in AdS$_3$.
In general, the partition function for a gas of noninteracting bosons is
$$Z_\text{Bose} = \exp\left(\sum_{n=1}^{\infty}\frac{1}{n}\,Z_\text{SP}\big|_{\beta\to n\beta}\right),$$
where $Z_\text{SP}$ is the partition function for single-particle states. For a free particle in AdS$_3$, including independent left- and right-moving temperatures,²⁵ $Z_\text{SP}$ is a character of the global conformal $\mathfrak{sl}(2)\oplus\mathfrak{sl}(2)$ algebra. If we include the Virasoro descendants, which account for states with gravitons, as well as the shift from the ground state Casimir energy, we find the contribution (6.3) to the CFT partition function. The factors of $1 - e^{-\beta}$ cancel an overcounting of descendants, since $Z_\text{Bose}$ already includes global descendants (generated by $L_{-1}$, $\bar L_{-1}$). The exception is the vacuum, but in that case the same factors are required to subtract the null descendants ($L_{-1}$ and $\bar L_{-1}$ annihilate the vacuum).
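As a sketch of the structure of $Z_\text{Bose}$, the code below checks that exponentiating the single-particle partition function summed over windings, $\exp\big(\sum_n \tfrac1n Z_\text{SP}(q^n,\bar q^n)\big)$, reproduces the Fock-space product over the modes $\partial^k\bar\partial^{\bar k}\Phi$. It assumes the global character $Z_\text{SP} = q^{h}\bar q^{\bar h}/((1-q)(1-\bar q))$ at real nomes. The numerical values are illustrative only.

```python
import math


def z_single_particle(q, qbar, h, hbar):
    """Assumed global sl(2) + sl(2) character of one particle: q^h qbar^hbar / ((1-q)(1-qbar))."""
    return q**h * qbar**hbar / ((1 - q) * (1 - qbar))


def z_bose_exponential(q, qbar, h, hbar, windings=200):
    """exp(sum_n Z_SP(q^n, qbar^n) / n): the multi-particle gas built from the single-particle one."""
    log_z = sum(z_single_particle(q**n, qbar**n, h, hbar) / n for n in range(1, windings + 1))
    return math.exp(log_z)


def z_bose_product(q, qbar, h, hbar, levels=200):
    """Direct Fock-space product over modes: prod_{k, kbar} 1 / (1 - q^(h+k) qbar^(hbar+kbar))."""
    log_z = 0.0
    for k in range(levels):
        for kbar in range(levels):
            log_z -= math.log1p(-(q ** (h + k)) * qbar ** (hbar + kbar))
    return math.exp(log_z)


args = (0.3, 0.5, 0.7, 0.7)  # illustrative nomes and weights, not values from the text
print(z_bose_exponential(*args), z_bose_product(*args))
```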
We now take a modular S-transform and find the density of states corresponding to the gas of free particles in the dual decomposition. For real $\beta_{L,R}$, the density of primary states is related to the partition function by a two-variable Laplace transform (6.4). Here, $\rho$ denotes the density of Virasoro primary states with respect to the left- and right-moving energies $E_{L,R}$ defined above, so it differs by a factor of $4P\bar P$ from the density used in earlier sections. Putting equations (6.3) and (6.4) together allows us to extract the density of states by an inverse Laplace transform. In particular, the resulting ratios of $\eta$-functions simplify using the modular property $\eta\big(\tfrac{i\beta}{2\pi}\big) = \sqrt{\tfrac{2\pi}{\beta}}\,\eta\big(\tfrac{2\pi i}{\beta}\big)$. If we ignore the gas of particles, setting $Z_\text{Bose}$ to unity in (6.3), the left- and right-moving pieces factorise, and we can perform the inverse Laplace transforms in closed form. This amounts to an alternative derivation of the density S-dual to the vacuum state (2.14).
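The modular property quoted above is the case $\tau = 2\pi i/\beta$ of $\eta(-1/\tau) = \sqrt{-i\tau}\,\eta(\tau)$. The following is a short numerical check on the imaginary axis, using the product formula $\eta(\tau) = q^{1/24}\prod_{m\ge1}(1-q^m)$ with $q = e^{2\pi i\tau}$. The truncation rule is our own choice.

```python
import math


def eta_on_imaginary_axis(t, max_terms=100000):
    """Dedekind eta(i t) for t > 0, from eta = q^(1/24) prod_{m>=1} (1 - q^m) with q = exp(-2 pi t)."""
    q = math.exp(-2 * math.pi * t)
    log_eta = -2 * math.pi * t / 24
    qm = q
    for _ in range(max_terms):
        if qm < 1e-18 * (1 - q):  # neglected tail of sum log(1 - q^m) is below ~1e-18
            break
        log_eta += math.log1p(-qm)
        qm *= q
    return math.exp(log_eta)


def modular_sides(beta):
    """Both sides of eta(i beta / 2 pi) = sqrt(2 pi / beta) eta(2 pi i / beta)."""
    lhs = eta_on_imaginary_axis(beta / (2 * math.pi))
    rhs = math.sqrt(2 * math.pi / beta) * eta_on_imaginary_axis(2 * math.pi / beta)
    return lhs, rhs


for beta in (0.5, 1.0, 2 * math.pi, 10.0):
    print(beta, *modular_sides(beta))
```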
Including the factor of $Z_\text{Bose}$, it is not so simple to perform the inverse Laplace transforms, but our purposes do not require an exact result. We are interested in the near-extremal spectrum, which means taking $E_L$ to be of order $c$ but $E_R$ to be small. The inverse Laplace transform in the left-moving variables can then be performed by saddle point in the large $c$ limit, and for the right-moving dependence we only need $Z_\text{Bose}$ at large $\beta_R$. The linear dependence on $\beta_R$ in the exponential simply shifts $E_R$ in the resulting spectral density. The left-moving temperature is evaluated at the saddle point, which gives $q = e^{-4\pi b\sqrt{E_L}}$. Writing this in terms of the momentum $P$ and the twist $\bar h$, we find a shift in the edge of the spectrum. Taking $\Phi$ to be a scalar, $h_\Phi = \bar h_\Phi = \Delta_\Phi/2$, this precisely matches the result (5.29) for the extremality bound of the quantum-corrected geometry.
If we have multiple species of particle, the shift in the extremality bound is simply a sum over the species: the Bose partition functions for each species multiply, so their contributions add in (6.6).
The $n = 1$ term matches (4.6) in the appropriate semiclassical limit. The additional terms arise from including the full multi-trace spectrum, rather than just the leading piece at large spin. If we were to include non-gravitational interactions that shift the energies of the multi-particle states, we would expect the $n = 1$ term to remain unchanged, but the higher terms in the sum to receive corrections.

A The mathematical appendix
In this appendix, we describe the main mathematical ideas for defining and manipulating the distributions we encounter, and their Fourier transforms. We will not attempt to be mathematically rigorous. In particular, we leave out details of topologies, completeness, convergence and so forth.

A.1 Distributions
First, recall Schwartz's definition of distributions. The idea is to generalise the notion of a function by noting that an integrable function $\rho$ is characterised by its integrals against some well-behaved 'test functions' $\psi$, $\langle\rho,\psi\rangle := \int_{-\infty}^{\infty}dP\,\rho(P)\psi(P)$. This defines a linear functional on the space of test functions, which determines $\rho$ uniquely almost everywhere (if that space is large enough). This notion is then simple to generalise to nice linear functionals that do not correspond to any integrable function. For example, evaluation at a point $P_0$ defines the Dirac distribution $\delta_{P_0}$: $\langle\delta_{P_0},\psi\rangle := \psi(P_0)$.
A standard choice for the space of test functions is the Schwartz space, consisting of smooth functions all of whose derivatives decay faster than any polynomial (leading to the 'tempered distributions'). We will require a smaller space of test functions, which allows us to define a correspondingly larger space of distributions.²⁶ To formulate a well-behaved theory of distributions requires an extra technical ingredient, namely a topology on the space of test functions, and distributions are required to be continuous functionals with respect to this topology. We will not address this aspect here, but it will be important for a more complete and rigorous treatment.

²⁶ Another standard choice takes test functions to be smooth with compact support. We require analytic test functions, which can never have compact support unless they vanish identically.
We can define various operations on distributions by formal manipulations of the heuristic $\langle\rho,\psi\rangle$ "=" $\int_{-\infty}^{\infty}dP\,\rho(P)\psi(P)$. Derivatives of distributions are defined by a formal integration by parts. Multiplication by sufficiently nice functions $f$ and the Fourier transform $\rho\mapsto\hat\rho$ are defined by a formal exchange of the order of integration:
$$\langle\rho',\psi\rangle := -\langle\rho,\psi'\rangle, \qquad \langle f\rho,\psi\rangle := \langle\rho,f\psi\rangle, \qquad \langle\hat\rho,\psi\rangle := \langle\rho,\hat\psi\rangle.$$
These implicitly require that the operation on the right maps test functions to test functions. The last of these in particular requires the space of test functions to be invariant under the Fourier transform, which holds, for example, for the space of Schwartz functions.
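For ordinary functions, these defining identities reduce to integration by parts and Fubini's theorem, so they can be sanity-checked numerically. The sketch below checks $\langle\rho',\psi\rangle = -\langle\rho,\psi'\rangle$ for $\rho = |P|$, whose distributional derivative is $\operatorname{sgn}(P)$. It also checks $\langle\hat\rho,\psi\rangle = \langle\rho,\hat\psi\rangle$ for two Gaussians. It assumes the convention $\hat f(x) = \int dP\,f(P)e^{-2\pi iPx}$, which need not match the Fourier kernel used in the main text; the identity itself holds for any symmetric kernel.

```python
import math


def simpson(f, a, b, n=12000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pair(rho, psi, cutoff=12.0):
    """<rho, psi> as an ordinary integral, split at P = 0 so a kink of rho there does no harm."""
    integrand = lambda P: rho(P) * psi(P)
    return simpson(integrand, -cutoff, 0.0) + simpson(integrand, 0.0, cutoff)


# Derivative: rho = |P| has distributional derivative sgn(P).
psi = lambda P: math.exp(-(P - 0.3) ** 2)  # off-centre Gaussian test function
dpsi = lambda P: -2 * (P - 0.3) * psi(P)
lhs_derivative = simpson(psi, 0.0, 12.0) - simpson(psi, -12.0, 0.0)  # <sgn, psi>
rhs_derivative = -pair(abs, dpsi)  # -<|P|, psi'>

# Fourier transform with f_hat(x) = int f(P) exp(-2 pi i P x) dP (assumed convention):
# closed-form transforms of rho = exp(-P^2) and psi = exp(-2 P^2).
rho_g = lambda P: math.exp(-P * P)
psi_g = lambda P: math.exp(-2 * P * P)
rho_g_hat = lambda x: math.sqrt(math.pi) * math.exp(-math.pi**2 * x * x)
psi_g_hat = lambda x: math.sqrt(math.pi / 2) * math.exp(-math.pi**2 * x * x / 2)
lhs_fourier = pair(rho_g_hat, psi_g)  # <rho_hat, psi>
rhs_fourier = pair(rho_g, psi_g_hat)  # <rho, psi_hat>
print(lhs_derivative, rhs_derivative, lhs_fourier, rhs_fourier)
```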
As indicated in the text, to define the distributions we are interested in requires a more restricted space of test functions than is standard. Some properties we might like of the test functions (and their topology) are the following:
1. Test functions are entire analytic.
2. Test functions decay faster than any exponential on the real axis.
3. The Fourier transform maps test functions to test functions.
4. The Gaussians $\chi(\tau)$ are test functions, and their linear span is dense in the subspace of even functions.
The Gaussians in the last requirement are, for us, the characters $\chi(\tau): P\mapsto\chi_P(\tau)\propto e^{2\pi i\tau P^2}$ for $\tau$ in the upper half-plane. Requiring them to be dense in the chosen topology ensures that an even distribution $\rho$ can be determined uniquely from its partition function $Z(\tau) = \langle\rho,\chi(\tau)\rangle$. Given the third requirement, the first two are somewhat redundant, since the Fourier transform of a superexponentially decaying function is entire.
The salient example of a distribution is a delta function supported at $P$, denoted $\delta_P$ and defined by $\langle\delta_P,\psi\rangle = \psi(P)$. While this is familiar from the more conventional spaces of distributions and tempered distributions, here we can take $P$ to be any complex number, since the test functions have entire analytic extensions. More generally, we can define distributions that integrate the test function over curves or regions in $\mathbb{C}$. The other important property of the test functions is their decay rate, which enlarges the space of distributions to include exponentially growing ones.

A.2 A Fourier transform
We here take the Fourier transform of the distribution $\rho_k(P) = 2k|P|^{2k-1}$, which we use in the text to describe the growth of degeneracies of multi-twist operators at large spin. We can also define this for $k = 0$ by taking the limit $k\to0$ of these distributions, finding $\rho_0(P) = 2\delta(P)$.
A simple way to work out the Fourier transform of such a distribution is by differentiating and using the usual property of the Fourier transform under derivatives, recursively in $k$. If we now naïvely divide by the factors of $P^2$, we obtain a candidate for the Fourier transform. This is subtle for two related reasons. Firstly, the resulting distribution is singular, so it needs a 'regularisation' to define it properly (hence the inclusion of the principal value symbol p.v.). Secondly, the division by $P^2$ introduces ambiguities supported at the origin (which must be $\delta$-functions and their derivatives), which we must fix. The result is correct with the following definition of integration against an even test function $\psi$ (and vanishing for odd test functions):
$$\Big\langle\mathrm{p.v.}\,\frac{1}{P^{2k}},\psi\Big\rangle := \int_{-\infty}^{\infty}\frac{dP}{P^{2k}}\left(\psi(P) - \sum_{j=0}^{k-1}\frac{\psi^{(2j)}(0)}{(2j)!}P^{2j}\right).$$
This means that we subtract enough terms of the Taylor expansion of the test function for the integrand to be finite at $P = 0$ before integrating. To prove that this is correct, and in particular to show that this definition does not leave out any $\delta$-function pieces, it suffices to check with a Gaussian test function. Note that, because we are subtracting the 'zero mode' $\psi(0)$ of the test function, adding these terms should not be interpreted as changing the total number of states. For example, adding $-\mathrm{p.v.}\,\frac{1}{P^2}$ removes some states at positive $P$, but in some sense adds them back in at $P = 0$.
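For a Gaussian test function $\psi(P) = e^{-aP^2}$ both statements can be made explicit. Pairing with $\rho_k$ gives $\langle\rho_k,\psi\rangle = 2\Gamma(k+1)\,a^{-k}$, which tends to $\langle2\delta,\psi\rangle = 2$ as $k\to0$. For $k=1$, the subtracted definition $\langle\mathrm{p.v.}\,P^{-2},\psi\rangle = \int dP\,(\psi(P)-\psi(0))/P^2$ gives $-2\sqrt{\pi a}$. This value is negative even though $P^{-2}>0$, which is the zero-mode effect just described. The quadrature sketch below uses grid sizes and cutoffs of our own choosing.

```python
import math


def simpson(f, a, b, n=40000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pair_rho_k(k, a):
    """<rho_k, exp(-a P^2)> for rho_k(P) = 2k|P|^(2k-1), with 0 < k <= 1.

    With u = |P|^(2k) the pairing becomes 2 * int_0^inf exp(-a u^(1/k)) du, free of the
    singularity at P = 0; we stop where a u^(1/k) = 40, beyond which the integrand is below e^-40.
    """
    upper = (40.0 / a) ** k
    return 2 * simpson(lambda u: math.exp(-a * u ** (1.0 / k)), 0.0, upper)


def pair_pv_inverse_square(a, cutoff=20.0):
    """<p.v. 1/P^2, exp(-a P^2)> from the subtracted integrand (psi(P) - psi(0)) / P^2.

    Outside [-cutoff, cutoff] the subtracted term -1/P^2 contributes exactly -2/cutoff;
    the Gaussian part of that tail is negligible when a * cutoff^2 is large.
    """
    integrand = lambda P: -a if P == 0.0 else math.expm1(-a * P * P) / (P * P)
    return 2 * simpson(integrand, 0.0, cutoff) - 2.0 / cutoff


a = 1.5
for k in (1.0, 0.5, 0.1, 0.02):
    # closed form 2 Gamma(k + 1) a^(-k), which tends to <2 delta, psi> = 2 as k -> 0
    print(k, pair_rho_k(k, a), 2 * math.gamma(k + 1) * a ** (-k))
print(pair_pv_inverse_square(a), -2 * math.sqrt(math.pi * a))
```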

A.3 Analytic representations
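Since we are using analytic test functions, there is a nice alternative definition of the principal value distribution (A.2) encountered above, obtained by deforming the integral into the complex plane away from the singularity at $P = 0$:
$$\Big\langle\mathrm{p.v.}\,\frac{1}{P^{2k}},\psi\Big\rangle = \frac12\int_{\mathbb{R}-i\epsilon}dP\,\frac{\psi(P)}{P^{2k}} + \frac12\int_{\mathbb{R}+i\epsilon}dP\,\frac{\psi(P)}{P^{2k}}, \qquad \epsilon > 0.$$
This is an example of a more general tool, an analytic representation of a distribution (closely related to Sato's notion of hyperfunctions). Namely, for a given distribution $\rho$, there is a function $\Omega$, analytic everywhere except on the real axis, such that $\rho$ is the discontinuity of $\Omega$ across the real axis, in the sense that
$$\langle\rho,\psi\rangle = \lim_{\epsilon\to0^+}\int_{-\infty}^{\infty}dP\,\big[\Omega(P-i\epsilon) - \Omega(P+i\epsilon)\big]\,\psi(P).$$
In the instance above, we have $\Omega(P) = -\tfrac12 P^{-2k}\operatorname{sgn}\operatorname{Im}P$. Another nice example is $\Omega(P) = \frac{1}{2\pi iP}$, corresponding to $\delta(P)$. For more background, details, examples and applications see [42].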
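Here is a numerical sketch of the contour definition for $k=1$. It assumes the contour convention of the next paragraph (left to right below the real axis, right to left above it) applied to $\Omega(P) = -\tfrac12 P^{-2}\operatorname{sgn}\operatorname{Im}P$. For a test function that is real on the real axis, the two line integrals are complex conjugates, so the pairing is $\operatorname{Re}\int_{\mathbb{R}+i\epsilon}dP\,\psi(P)P^{-2}$. For $\psi = e^{-aP^2}$ this should return the subtracted value $-2\sqrt{\pi a}$ for every $\epsilon > 0$.

```python
import cmath
import math


def pv_via_shifted_contour(a, eps, half_width=12.0, n=24000):
    """Pair Omega(P) = -(1/2) P^-2 sgn(Im P) with psi(P) = exp(-a P^2) along Im P = -eps and +eps.

    The two line integrals are complex conjugates, so the pairing equals
    Re int_{R + i eps} psi(P) / P^2 dP, computed with the trapezoid rule, which is very accurate
    for an analytic, rapidly decaying integrand on a line kept away from the pole at P = 0.
    """
    h = 2 * half_width / n
    total = 0j
    for i in range(n + 1):
        P = complex(-half_width + i * h, eps)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * cmath.exp(-a * P * P) / (P * P)
    return (total * h).real


a = 1.5
for eps in (0.3, 1.0):
    print(eps, pv_via_shifted_contour(a, eps), -2 * math.sqrt(math.pi * a))
```

The independence from $\epsilon$ reflects the vanishing residue of $\psi(P)/P^2$ at the origin for even $\psi$.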
For us, having analytic test functions allows us to generalise the notion of analytic representations, in particular giving us representations of distributions with support away from the real axis. We only require that $\Omega$ is analytic outside a strip, for sufficiently large $|\operatorname{Im}P|$. We then say that $\Omega$ is an analytic representation of $\rho$ if $\int_\Gamma dP\,\Omega(P)\psi(P) = \langle\rho,\psi\rangle$ for all $\psi$, where $\Gamma$ is a contour running from left to right in the region of analyticity in the lower half-plane, and from right to left in the upper half-plane. For a square integrable function, an analytic representation can be determined by a Cauchy integral, which is convolution with the analytic representation of the $\delta$-function:
$$\Omega(P) = \int_{-\infty}^{\infty}dP'\,\frac{\rho(P')}{2\pi i(P-P')}.$$
We can also rewrite this in momentum space, which shows that in the upper (lower) half-plane, $\Omega$ is given by the positive (negative) frequency part of $\rho$. This follows from the Fourier transforms
$$h_P(P') = \frac{1}{2\pi i(P-P')} \implies \hat h_P(P') = \pm\Theta(\mp P')\,S_{PP'}, \qquad \operatorname{Im}P \gtrless 0, \qquad (A.7)$$
where $S$ is the Fourier kernel (2.13) and $\Theta$ is the Heaviside step function.

We can also perform the Fourier transform directly on analytic representations. If $\Omega$ is an analytic representation of $\rho$, then, by a slight abuse of notation, we denote an analytic representation of $\hat\rho$ by $\hat\Omega$, given by (A.8). The contours $\Gamma_\pm$ each consist of two pieces. One lies in the upper half-plane, going from $\operatorname{Re}P\to\mp\infty$ to an arbitrary point $P_U$ (in the domain of definition of $\Omega$). The other lies in the lower half-plane, running similarly from an arbitrary point $P_L$ to $\operatorname{Re}P\to\mp\infty$. Changing $P_U$ or $P_L$ amounts to adding identical contours to $\Gamma_\pm$, which adds an entire function to $\hat\Omega$, leaving $\hat\rho$ invariant as required. The splitting of the contour into pieces with $\operatorname{Im}P > 0$ and $\operatorname{Im}P < 0$ is designed to avoid exponential growth in the integral from the Fourier kernel $S$.
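As a concrete instance of the Cauchy integral, take the square-integrable Lorentzian $\rho(P) = 1/(\pi(1+P^2))$. We use the sign convention that matches $\Omega = \frac{1}{2\pi iP}$ for $\delta(P)$, namely $\Omega(P) = \int dP'\,\rho(P')/(2\pi i(P-P'))$. Closing the contour gives $\Omega(P) = 1/(2\pi i(P + i\operatorname{sgn}\operatorname{Im}P))$. The upper-half-plane branch continues analytically all the way down to its pole at $P=-i$, which illustrates why $\Omega$ is only required to be analytic outside a strip. The sketch compares quadrature with this closed form, and checks that the jump across the real axis returns $\rho$.

```python
import math


def omega_numeric(P, n=4000):
    """Omega(P) = (1 / 2 pi i) int rho(x) / (P - x) dx for the Lorentzian rho(x) = 1 / (pi (1 + x^2)).

    With x = tan(theta), rho(x) dx = d(theta) / pi and 1 / (P - x) = cos / (P cos - sin), a smooth
    pi-periodic integrand for P off the real axis, so the midpoint rule converges rapidly.
    """
    h = math.pi / n
    total = 0j
    for i in range(n):
        theta = -math.pi / 2 + (i + 0.5) * h
        total += math.cos(theta) / (P * math.cos(theta) - math.sin(theta))
    return total * h / math.pi / (2j * math.pi)


def omega_closed(P):
    """Residue evaluation: Omega(P) = 1 / (2 pi i (P + i sgn Im P))."""
    sign = 1.0 if P.imag > 0 else -1.0
    return 1 / (2j * math.pi * (P + 1j * sign))


def lorentzian(x):
    return 1 / (math.pi * (1 + x * x))


for P in (0.4 + 0.7j, -1.3 - 0.2j, 2.0 + 3.0j):
    print(P, omega_numeric(P), omega_closed(P))
# The jump Omega(x - i0) - Omega(x + i0) reproduces rho(x):
x = 0.8
print(omega_closed(complex(x, -1e-9)) - omega_closed(complex(x, 1e-9)), lorentzian(x))
```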
To use analytic representations, instead of choosing $\rho$ to be even, it may be convenient to choose it to have no support at negative $P$. Then $\Omega$ will be analytic in the whole left half-plane. The contours in the integrals (A.8) can then be closed, with $\Gamma_+$ becoming empty and $\Gamma_-$ going from $\operatorname{Re}P\to\infty$ in the upper half-plane to $\operatorname{Re}P\to\infty$ in the lower half-plane, looping around all singularities. For application to the S-transform this must then be combined with its conjugate, since there we assumed that $\rho$ was even.
We note that the analytic function C(∆, J) appearing in [23], which encodes the OPE coefficients of a correlation function in its poles, is an analytic representation of the spectral density in the sense described here. Perhaps existing mathematical results can be helpful for uncovering the physical consequences of analyticity in spin.

A.4 A distributional binomial theorem
In section 4.4, we used a binomial theorem applied to distributions. In this section, we derive this sum and clarify the meaning of the distribution on the right-hand side. This serves as an example of the power of the analytic representations of the previous subsection: we will perform the sum using the analytic representation
$$-\tfrac12 P^{-(2n-2)}\operatorname{sgn}\operatorname{Im}P \qquad (A.10)$$
of $\mathrm{p.v.}\,\frac{1}{P^{2n-2}}$. In particular, this automatically takes into account the regularisation implied by the principal value symbol. We can now perform the sum on this analytic function. It converges (uniformly on compacta) for $|P| > \eta$, to $\frac{1}{2i}P\sqrt{-(\eta^2+P^2)}$, where we take the principal branch of the square root, with cut along the negative real axis. Note that the terms in the sum are conventional (tempered) distributions, but the series does not converge in that space. This is indicated by the fact that the sum of analytic representations is not analytic everywhere in the upper and lower half-planes. Nonetheless, because our test functions are entire and decay rapidly, convergence in the given region is enough to imply that the sum of distributions converges to $\rho_\eta$, with
$$\langle\rho_\eta,\psi\rangle := \frac{1}{2i}\int_\Gamma dP\,\psi(P)\,P\sqrt{-(\eta^2+P^2)}. \qquad (A.11)$$
The contour $\Gamma$ runs from left to right in the lower half-plane and from right to left in the upper half-plane, as above, here in particular staying in the analytic region $|\operatorname{Im}P| > \eta$ to avoid the branch cuts. We can deform the contour to run along the cuts, so that it just picks up the discontinuity. For example, on the positive real axis the integrand is $-i\psi(P)P\sqrt{P^2+\eta^2}$ just above the cut and the negative of this just below it, so this contributes $\int_0^\infty dP\,\psi(P)P\sqrt{\eta^2+P^2}$. But there is also a contribution from the jumps across the branch cut on the imaginary axis between $\pm i\eta$. After translating from $P$ variables to $h$, this gives a density of states going like $\sqrt{h - \frac{c-1}{24} + \eta^2}$, starting at the zero of the square root rather than at $h = \frac{c-1}{24}$.
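The branch choice can be checked numerically. The sketch assumes the summed coefficients are $\binom{1/2}{n}\eta^{2n}$, i.e. that the series is the large-$P$ expansion of $P^2\sqrt{1+\eta^2/P^2}$, as the square root in (A.11) suggests. It compares partial sums of the analytic representations with $\frac{1}{2i}P\sqrt{-(\eta^2+P^2)}$, using Python's principal-branch `cmath.sqrt`, for $|P| > \eta$ off the real axis. It also checks that the jump across the real axis is $|P|\sqrt{P^2+\eta^2}$, the even extension of the real-axis contribution found above.

```python
import cmath
import math


def omega_closed(P, eta):
    """(1/2i) P sqrt(-(eta^2 + P^2)) with the principal branch (cut on the negative real axis)."""
    return P * cmath.sqrt(-(eta**2 + P**2)) / 2j


def omega_series(P, eta, terms=200):
    """-(1/2) sgn(Im P) sum_n binom(1/2, n) eta^(2n) P^(2-2n), for |P| > eta (assumed coefficients)."""
    coeff, total = 1.0, 0j  # coeff = binom(1/2, n), starting at n = 0
    for n in range(terms):
        total += coeff * eta ** (2 * n) * P ** (2 - 2 * n)
        coeff *= (0.5 - n) / (n + 1)  # binom(1/2, n + 1) from binom(1/2, n)
    sign = 1.0 if P.imag > 0 else -1.0
    return -0.5 * sign * total


def jump(x, eta, eps=1e-9):
    """Omega(x - i eps) - Omega(x + i eps) across the real axis; expected |x| sqrt(x^2 + eta^2)."""
    return omega_closed(complex(x, -eps), eta) - omega_closed(complex(x, eps), eta)


eta = 1.0
for P in (2.0 + 1.5j, -1.7 - 2.2j):
    print(P, omega_series(P, eta), omega_closed(P, eta))
for x in (1.5, -0.8):
    print(x, jump(x, eta), abs(x) * math.sqrt(x * x + eta * eta))
```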

A.5 Asymptotics of distributions
An important part of our analysis is a characterisation of the asymptotic behaviour of distributions at large $|P|$, corresponding to large spin. Since we are interested in distributions supported on discrete sets of points, it is not immediately clear how to make sense of this, in particular because the usual definition of an asymptotic series fails for such distributions. What we want to know is which conditions on a distribution suffice to determine the most singular terms in its Fourier transform. We illustrate various possible approaches to this problem with a simple example, taking a distribution similar to the density of states on double-twist Regge trajectories.

One approach is to integrate the distribution several times, until it becomes a smooth enough function to use classical results on Fourier transforms. The strategy is to integrate $\rho$ repeatedly, constructing $\rho_k$ whose $k$th derivative is $\rho$. If $\rho$ is sufficiently well behaved, we will find that, for large enough $k$, $\rho_k$ is a simple function $p_k$ ($\operatorname{sgn}(P)$ times a polynomial in the cases of interest here) plus a remainder $r_k$ which decays sufficiently quickly at infinity. In particular, if $r_k$ is absolutely integrable ($r_k\in L^1(\mathbb{R})$, i.e. $\int_{-\infty}^{\infty}dP\,|r_k(P)| < \infty$), then it has a bounded and continuous Fourier transform. We then have the asymptotic estimate (A.14), and the difference of distributions on its left-hand side is in fact a continuous function.
In our example, we can integrate once to get the total number of states below a given $P$,
$$\rho_1(P) = \operatorname{sgn}(P)\left(\left\lfloor\frac{P^2}{2}\right\rfloor + 1\right), \qquad (A.15)$$
but subtracting simple functions like $\operatorname{sgn}(P)\left(\frac{P^2}{2} + c\right)$ does not result in a decaying function, since there are oscillations of a constant size as $P\to\infty$.
Integrating again improves things, with oscillations decaying like $P^{-1}$ at large $P$, but still $r_2\notin L^1$. In fact, we must integrate three times, to get $\rho_3 = p_3 + r_3$ with $p_3$ equal to $\operatorname{sgn}(P)$ times a polynomial (A.16). Here (with the appropriate choice of constants of integration) $P^2|r_3(P)|$ is bounded, so $r_3\in L^1(\mathbb{R})$. This gives the leading singular terms of $\hat\rho$. We could integrate yet more times to find higher orders in the expansion, but the calculations quickly become unwieldy. The size of the fluctuations in the remainders $r_k$ of the integrated densities is directly related to the typical spacing between states as $P\to\infty$. A very uneven spectrum, with clumps of many nearby or degenerate states interspersed with large gaps, will tend to require larger $k$ for the arguments outlined here to succeed. It would be interesting to pursue this more systematically.
In some circumstances, analytic representations can be a powerful tool for computing asymptotic expansions of the discretely supported distributions of interest. For our example distribution, we find an analytic representation by summing the poles representing delta functions, subtracting some entire analytic pieces for convergence and convenience. Here, $\psi$ denotes the digamma function (not a test function), which has the asymptotic expansion
$$\psi(z)\sim\log z - \frac{1}{2z} - \sum_{m=1}^{\infty}\frac{B_{2m}}{2m\,z^{2m}},$$
valid for $|z|\to\infty$ anywhere away from the negative real $z$ axis, where the poles of $\psi$, which condense into the branch cut of the logarithm, are located. This gives an asymptotic expansion for our analytic representation of $\rho$, valid in the limit $|P|\to\infty$ as long as $\operatorname{Im}P\to\pm\infty$ (it fails when taking $|P|\to\infty$ along lines parallel to the real axis). Naïvely interpreting it as a distribution term by term, we can write a formal asymptotic expansion of $\rho$:
$$\rho(P)\sim|P| + \delta(P) + \cdots \qquad (A.21)$$
The first term is very reasonable, giving the density of states at large $P$.
The $\delta$-function also has a nice interpretation: it gives the average discrepancy between the total number of states below a given $P$ (A.15) and the number $P^2/2$ deduced from the leading term. We now take the Fourier transform. Computationally, it is simplest to interpret the terms of the expansion as derivatives of delta functions and transform term by term. However, this hides the fact that we are rigorously computing a small-$P$ expansion of the analytic representation $\hat\Omega$ of $\hat\rho$, using (A.8). Although the sum in intermediate steps is asymptotic, the resulting expansion in fact has infinite radius of convergence and really equals the Fourier transform. We can check this directly by using the Gaussian test functions $\chi_P$ in this case (which amounts to constructing the partition functions for $\rho$ and $\hat\rho$ and checking that they are related by $\tau\to-1/\tau$).
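The interpretation of the $\delta(P)$ term can be checked numerically, assuming the counting function (A.15) is $\rho_1(P) = \operatorname{sgn}(P)(\lfloor P^2/2\rfloor + 1)$. On average at large $P$, $\rho_1$ exceeds the leading term $P^2/2$ by $1/2$, which is the weight that $\delta(P)$ contributes on $P > 0$ with the symmetric convention. Averaging uniformly in $P$ over a window is an illustrative choice.

```python
import math


def counting_function(P):
    """Assumed rho_1(P) = sgn(P) (floor(P^2 / 2) + 1): counts the states with 2m <= P^2, m = 0, 1, 2, ..."""
    return math.copysign(math.floor(P * P / 2) + 1, P)


def average_discrepancy(p_min=100.0, p_max=200.0, samples=200_000):
    """Mean of rho_1(P) - P^2/2 over a uniform grid of P in [p_min, p_max]."""
    step = (p_max - p_min) / samples
    total = 0.0
    for i in range(samples):
        P = p_min + (i + 0.5) * step
        total += counting_function(P) - P * P / 2
    return total / samples


print(average_discrepancy())  # close to 1/2, the weight delta(P) contributes for P > 0
```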

B Counting multi-trace states
In this appendix, we enumerate the operators of mean field theory, which form a Fock space built from the primary $\phi$ and its global descendants $\partial^k\bar\partial^{\bar k}\phi$. These results are applied in the text to count multi-twist operators.

B.1 Bosons
Begin with the Bose-Einstein partition function:
$$Z = \prod_{k,\bar k\ge0}\frac{1}{1 - q^{h_\phi+k}\bar q^{\bar h_\phi+\bar k}}.$$
To massage this into a more useful form, we take the log to write the product as a sum, expand the terms using the Taylor series for the log, and perform the sum over $k,\bar k$:
$$\log Z = \sum_{n=1}^{\infty}\frac{1}{n}\,\frac{q^{nh_\phi}\bar q^{n\bar h_\phi}}{(1-q^n)(1-\bar q^n)}.$$
This has a nice bulk worldline QFT interpretation: we are exponentiating all connected diagrams, which for free particles just consist of closed loops. The index $n$ counts the number of times a loop winds around the thermal circle (the $n = 0$ term is cancelled by renormalisation of the cosmological constant). The $1/n$ is a symmetry factor, multiplying the single-particle partition function with $\beta\to n\beta$. Now, exponentiate and Taylor expand each term. The contribution from a given particle number $p$ will be a sum of terms labelled by partitions of $p$: for each $n$, we pick the $k_n$th term in the Taylor expansion, where $k_n$ is the number of times $n$ appears in the partition. To count primaries only, multiply by $(1-q)(1-\bar q)$ to subtract descendants. For example, for two-particle states this gives (B.5). The first term gives an operator of every spin for each twist, and the second cancels the odd spins to give a single operator at each even spin. We observe empirically that the leading Regge trajectory for $p$ particles is given by a simple generating function. As a result, the number of primaries in the leading $p$-particle Regge trajectory at spin $\ell$ is the number of partitions of $\ell$ into parts $2, 3, 4, \dots, p$.
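The empirical generating function can be cross-checked by brute force. On the leading trajectory, each particle sits at the lowest level in one chirality, so a $p$-particle state is a multiset of levels $\bar k_i\ge0$ in the other, and multiplying by $(1-\bar q)$ subtracts the count at spin $\ell-1$. The sketch assumes that spin is carried by $\bar k$ (the conventions in the text may swap $q$ and $\bar q$), and compares the result with the number of partitions of $\ell$ into parts $2,\dots,p$.

```python
from itertools import combinations_with_replacement


def fock_states(spin, p):
    """Symmetric p-particle states on the leading trajectory: multisets of levels kbar_i >= 0 summing to spin."""
    if spin < 0:
        return 0
    return sum(1 for levels in combinations_with_replacement(range(spin + 1), p) if sum(levels) == spin)


def leading_primaries(spin, p):
    """Multiplying by (1 - qbar) subtracts descendants: count at spin minus count at spin - 1."""
    return fock_states(spin, p) - fock_states(spin - 1, p)


def partitions_into_parts(spin, parts):
    """Number of ways to write spin as an unordered sum of the given part sizes."""
    ways = [1] + [0] * spin
    for part in parts:
        for total in range(part, spin + 1):
            ways[total] += ways[total - part]
    return ways[spin]


for p in (2, 3, 4):
    print(p, [leading_primaries(spin, p) for spin in range(12)])
```

For $p = 2$ this gives one primary at each even spin and none at odd spin, matching the two-particle discussion above.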

B.2 Counting at large spin
The growth of degeneracies at large spin is determined by the $\bar q\to1$ limit, which for a given Young tableau is controlled by the height (the number of elements in the partition). The leading order is therefore given entirely by the partition of $p$ into $p$ ones. To prove this, we can apply a Tauberian theorem to the coefficients of $\bar q^m$ in the $p$-particle partition function $(1-q)(1-\bar q)Z_p(q,\bar q)$. The $\bar q\to1$ asymptotics are determined by the term in the sum over partitions described above. The Hardy-Littlewood Tauberian theorem²⁷ then implies that the sum of the coefficients of $\bar q^m$ for $m\le\ell$ obeys the given asymptotic formula.
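Concretely, with the factor $(1-\bar q)$ included, the all-ones partition contributes $\frac{1}{p!}(1-\bar q)^{1-p}$ at leading twist. Its $\bar q^\ell$ coefficient, $\binom{\ell+p-2}{p-2}/p!$, grows like $\ell^{p-2}/(p!\,(p-2)!)$. The sketch below compares this with the exact leading-trajectory count (partitions of $\ell$ into parts $2,\dots,p$, as in appendix B.1), under the same assumption that spin is carried by $\bar q$. The ratio should approach 1 with $O(1/\ell)$ corrections.

```python
import math


def partitions_into_parts(spin, parts):
    """Number of ways to write spin as an unordered sum of the given part sizes."""
    ways = [1] + [0] * spin
    for part in parts:
        for total in range(part, spin + 1):
            ways[total] += ways[total - part]
    return ways[spin]


def all_ones_estimate(spin, p):
    """Coefficient of qbar^spin in (1 - qbar)^(1 - p) / p!, i.e. C(spin + p - 2, p - 2) / p!."""
    return math.comb(spin + p - 2, p - 2) / math.factorial(p)


spin = 4000
for p in (3, 4, 5):
    exact = partitions_into_parts(spin, range(2, p + 1))
    print(p, exact, all_ones_estimate(spin, p), exact / all_ones_estimate(spin, p))
```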

B.3 Multiple species
For multiple species of particles, we just take the product of the Bose partition functions for each. For the asymptotic large spin piece of the partition function with particle numbers $p_i$, this gives (B.10), where $p = \sum_i p_i$. This is just the multinomial coefficient $p!/\prod_i p_i!$ times the single-species, $p$-particle answer. This simple result only holds to leading order at large spin.
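Below is a brute-force check of the multinomial factor for a hypothetical pair of species $A$ and $B$ with particle numbers $(p_A, p_B) = (1, 2)$. Both species are assumed to have equal conformal weights, so that they sit on the same leading trajectory. The leading-twist primary count should approach $3!/(1!\,2!) = 3$ times the single-species, three-particle count, which is the number of partitions into parts 2 and 3.

```python
def partitions_into_parts(spin, parts):
    """Number of ways to write spin as an unordered sum of the given part sizes."""
    ways = [1] + [0] * spin
    for part in parts:
        for total in range(part, spin + 1):
            ways[total] += ways[total - part]
    return ways[spin]


def two_species_states(spin):
    """Leading-twist states of one A particle and two identical B particles (equal weights assumed):
    A at level a, the B's at levels b1 <= b2, with a + b1 + b2 = spin."""
    if spin < 0:
        return 0
    return sum((spin - a) // 2 + 1 for a in range(spin + 1))


def two_species_primaries(spin):
    """Subtract descendants with the (1 - qbar) factor."""
    return two_species_states(spin) - two_species_states(spin - 1)


spin = 6000
ratio = two_species_primaries(spin) / partitions_into_parts(spin, range(2, 4))
print(ratio)  # approaches the multinomial coefficient 3! / (1! 2!) = 3
```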

B.4 Fermions
We here look at Fermi statistics. The Fock space of 'light' fermions we are considering will always be in the NS (antiperiodic) sector. But we can still put periodic or antiperiodic boundary conditions around the thermal cycle; the former means we include a $(-1)^F$ insertion. We therefore look at two partition functions,
$$Z_\pm = \operatorname{Tr}\big((\pm1)^F e^{-\beta H}\big) = \prod_{k,\bar k\ge0}\big(1\pm q^{h_\phi+k}\bar q^{\bar h_\phi+\bar k}\big). \qquad (B.11)$$
Following the same steps, we get (B.12), and so (B.13). Now the $p$-particle sector will involve the same sum over partitions, but with extra signs. Writing
$$Z_\pm = \sum_p(\pm1)^p Z^F_p\,q^{ph_\phi}\bar q^{p\bar h_\phi}, \qquad (B.14)$$
we have (B.15). Compared to the Bose case, we have extra signs $(-1)^{p+\sum_j k_j}$ weighting different partitions. For example, for $p = 2$ we get (B.16), which projects onto even spin primaries (since the absent $q^0\bar q^0$ term would correspond to an odd spin). The leading order piece at large spin is given by the same result as for bosons. The partition into $p$ ones has $\sum_j k_j = p$, so that contribution to $Z^F_p$ comes without a sign, as it must for positive degeneracies.

C The covariant phase space formalism

C.1 The formalism
We consider a general theory in $d+1$ dimensions with dynamical fields $\phi$ (including the metric) and a diffeomorphism-invariant Lagrangian, which we express as a $(d+1)$-form $L$. Under a general variation, we have
$$\delta L = E(\delta\phi) + d\Theta(\delta\phi). \qquad (C.1)$$
The two terms are the equations of motion $E$ for the background, expressed as a linear function of the field variations, and the symplectic potential $\Theta$, a $d$-form depending linearly on the field variations and perhaps their derivatives. The diffeomorphism invariance of the theory means that, for any infinitesimal diffeomorphism labelled by a vector field $\xi$, we have $\delta_\xi\phi = \mathcal{L}_\xi\phi \implies \delta_\xi L = \mathcal{L}_\xi L$. From this, the Noether current $J_\xi = \Theta(\mathcal{L}_\xi\phi) - \iota_\xi L$ is closed on-shell, and there is a $(d-1)$-form $Q_\xi$, constructed locally from the fields and $\xi$, such that $dQ_\xi = J_\xi$ (on-shell).
We now wish to construct generators of symmetries and conserved quantities from $Q_\xi$. To find the correct object, we consider $J_\xi$ under field variations. A short calculation gives
$$\delta J_\xi = d\big(\iota_\xi\Theta(\delta\phi)\big) - \iota_\xi E(\delta\phi) + \delta\big[\Theta(\mathcal{L}_\xi\phi)\big] - \mathcal{L}_\xi\big[\Theta(\delta\phi)\big],$$
and the last two terms can be written in terms of the symplectic current
$$\omega(\delta_1\phi,\delta_2\phi) = \delta_1\Theta(\delta_2\phi) - \delta_2\Theta(\delta_1\phi) \qquad (C.5)$$
as $\omega(\delta\phi,\mathcal{L}_\xi\phi)$ (a pair of terms with $\mathcal{L}_\xi\delta\phi$ cancel). The symplectic form is the integral of $\omega$ over a Cauchy surface. This motivates the construction of the $(d-1)$-form
$$\delta H_\xi = \delta Q_\xi - \iota_\xi\Theta(\delta\phi), \qquad (C.6)$$
which is the variation of the Hamiltonian density generating translation by $\xi$. When the background and $\delta\phi$ are on-shell, it is related through the symplectic form to $\delta\phi$ (regarded as a vector field on phase space) by Hamilton's equations, $d\,\delta H_\xi = \omega(\delta\phi,\mathcal{L}_\xi\phi)$. In particular, this vanishes if $\xi$ is a symmetry of the background configuration, $\mathcal{L}_\xi\phi = 0$. The integral of $\delta H_\xi$ over a closed $(d-1)$-dimensional surface at spatial infinity defines the variation of the conserved charges associated with $\xi$, in particular the energy and angular momentum for $\xi = \partial_t$ and $\partial_\phi$ respectively. Since $\delta H_\xi$ is then a closed form, it can be evaluated on any homologous $(d-1)$-dimensional submanifold. In particular, comparing the evaluation at infinity with that at a horizon gives the first law of black hole thermodynamics.

C.2 Einstein-Hilbert
We now follow this construction for pure Einstein-Hilbert gravity (in $d+1$ spacetime dimensions), taking
$$L = (R - 2\Lambda)\,\epsilon,$$
where $\epsilon$ is the volume form. We have excluded the normalisation factor $\frac{1}{16\pi G_N}$ here to reduce clutter; it is restored in the main text.
Computing the variation with respect to the metric, $\delta g_{ab} = h_{ab}$, we find
$$E(h) = -E^{ab}h_{ab}\,\epsilon, \qquad E_{ab} = R_{ab} - \tfrac12 Rg_{ab} + \Lambda g_{ab}, \qquad (C.8)$$
$$\Theta = \iota_X\epsilon, \qquad X^a = (g^{ac}g^{bd} - g^{ab}g^{cd})\nabla_b h_{cd}. \qquad (C.9)$$
From this, we construct the Noether current for a vector field $\xi$,²⁹
$$J_\xi = 2\star(E\cdot\xi) - d\star d\xi, \qquad (E\cdot\xi = E_{ab}\xi^b dx^a), \qquad (C.10)$$
and read off the Noether charge
$$Q_\xi = -\star d\xi, \qquad (C.11)$$
which satisfies $dQ_\xi = J_\xi - 2\star(E\cdot\xi)$ (C.12). We now consider the Hamiltonian variation (C.13) associated with a Killing field $\xi$. In a background satisfying the Einstein equations, but with an arbitrary (off-shell) first-order variation with linearised Einstein tensor $E_{ab}$, we have
$$d\,\delta H_\xi = -2\star(E\cdot\xi). \qquad (C.14)$$
Restoring the factor of $16\pi G_N$, this gives the conservation equation (5.17).