't Hooft suppression and holographic entropy

Recent works have related the bulk first law of black hole mechanics to the first law of entanglement in a dual CFT. These are first order relations, and receive corrections for finite changes. In particular, the latter is naively expected to be accurate only for small changes in the quantum state. But when Newton's constant is small relative to the AdS scale, the former holds to good approximation even for classical perturbations that contain many quanta. This suggests that, for appropriate states, corrections to the first law of entanglement are suppressed by powers of $N$ in CFTs whose correlators satisfy 't Hooft large-$N$ power counting. We take first steps toward verifying that this is so by studying the large-$N$ structure of the entropy of spatial regions for a class of CFT states motivated by those created from the vacuum by acting with real-time single-trace sources. We show that $1/N$ counting matches bulk predictions, though we require the effect of the source on the modular Hamiltonian to be non-singular. The magnitude of our sources is $\epsilon N$ with $\epsilon$ fixed-but-small as $N\rightarrow \infty$. Our results also provide a perturbative derivation, without relying on the replica trick, of the subleading Faulkner-Lewkowycz-Maldacena correction to the Ryu-Takayanagi and Hubeny-Rangamani-Takayanagi conjectures at all orders in $1/N$.


Introduction
There has been much activity exploring the intriguing connection between entanglement in holographic field theories and the gravitational field equations of the bulk dual. This program traces its roots to Jacobson's seminal paper [1], which proposed the Einstein equation to be a thermodynamic equation of state for some unknown quantum mechanical system in which the area of surfaces measures entanglement entropy across causal horizons. Several groups [2][3][4][5] have now used related arguments to derive the linearized gravitational field equations in the context of the anti-de Sitter/conformal field theory (AdS/CFT) correspondence, where the underlying quantum system is well understood; see also the related works [6][7][8]. A key assumption in these more recent derivations is the Ryu-Takayanagi (RT) conjecture [9,10] and its covariant generalization by Hubeny-Rangamani-Takayanagi (HRT) [11], both of which relate entanglement entropy of subregions of the field theory to the geometry of bulk surfaces. A partial converse of this result (that the bulk field equations imply aspects of the Ryu-Takayanagi conjecture) has also been argued by Lewkowycz and Maldacena [12], though see [13] for a discussion of the so-called homology constraint, [14] for questions about the possible role of complex bulk surfaces, and [15] for concerns regarding strong time-dependence.
Much of the recent discussion has centered on the so-called first law of entanglement, $\delta S_A = \delta \langle H_A \rangle$ (1.1). Here $A$ is a subregion of some Cauchy surface for the CFT, $S_A := -\mathrm{Tr}(\rho_A \log \rho_A)$ is the von Neumann entropy of the associated reduced density matrix $\rho_A$, $H_A := -\log \rho_A$ is the modular Hamiltonian, and $\delta$ denotes the first variation with respect to the state when the operator $H_A$ is held fixed on the right hand side. The relation (1.1) holds at first order, but receives corrections for finite changes. According to the HRT conjecture, in holographic theories the left hand side of (1.1) can be extracted from the bulk geometry. The right hand side is typically difficult to evaluate, though it reduces to a simple integral of the CFT stress tensor [16] (see also [17]) when $A$ is a ball-shaped region and the system is in its global vacuum state. The combination of these two results makes (1.1) a useful formula for studying the relationship between entanglement and geometry. In particular, in this context (1.1) coincides with the first law of black hole mechanics in the bulk applied to the Rindler-like Killing horizon defined by a ball-shaped region $A$ on the AdS boundary [17]. This argument for the first law can then be inverted to derive linearized bulk equations of motion from HRT [2][3][4][5].
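As a brief worked check of the first law (1.1), the following sketch varies the von Neumann entropy to first order. It assumes a finite-dimensional (or suitably regulated) density matrix and a perturbation $\delta\rho_A$ with $\mathrm{Tr}\,\delta\rho_A = 0$; these regularity assumptions are made here for illustration only.

```latex
\begin{align}
\delta S_A
  &= -\mathrm{Tr}\big(\delta\rho_A \,\log\rho_A\big)
     - \mathrm{Tr}\big(\rho_A \,\delta(\log\rho_A)\big) \\
  &= -\mathrm{Tr}\big(\delta\rho_A \,\log\rho_A\big)
     - \mathrm{Tr}\big(\delta\rho_A\big)
     && \text{(cyclicity of the trace)} \\
  &= \mathrm{Tr}\big(\delta\rho_A \, H_A\big)
     && \text{(using } \mathrm{Tr}\,\delta\rho_A = 0,\ H_A = -\log\rho_A\text{)} \\
  &= \delta\langle H_A\rangle
     && \text{(with the operator } H_A \text{ held fixed)} .
\end{align}
```

The second-order remainder of this expansion is what the relative entropy measures, which is why corrections to (1.1) are generically non-negligible for finite changes of state.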
The starting point for our work is the observation (see e.g. [4]) that the entanglement first law (1.1) is generally useful only for infinitesimal changes in the quantum state; higher order corrections tend to make significant contributions when the state instead undergoes any substantial change. In contrast, the bulk first law of black hole mechanics accurately describes classical deformations (typically involving very large numbers of quanta) so long as the changes in entropy and energy are small in comparison with their background values. In particular, corrections to the bulk first law stem from gravitational back-reaction and are thus suppressed by powers of the bulk Newton constant $G_N$. This suggests that corrections to the entanglement first law (1.1) will be correspondingly suppressed, at least for states that would be appropriately semi-classical with respect to the bulk. Our goal below is to show suppressions by powers of $N$ in similar computations involving the entropy of spatial regions for CFTs whose correlators satisfy 't Hooft power counting and whose spectrum of light operators is sufficiently sparse. We consider a class of CFT states motivated by those created from the vacuum by acting with real-time single-trace sources, though we require the source to have non-singular effects on the modular Hamiltonian. In order to model sources that would produce semi-classical coherent states in any bulk dual, the magnitude of our sources is taken to be $\epsilon N$ with $\epsilon$ fixed-but-small as $N \to \infty$. See also [18] for other investigations of entanglement in large $N$ gauge theories.
For such states, we verify bulk predictions for the powers of $N$ via a direct calculation in the CFT. Though it differs in detail, the suppression found here is analogous to the large $N$ suppression of such corrections found previously in [19,20]. Our analysis also provides additional benefits. First, our explicit formula for second order relative entropy makes manifest the agreement with appropriately-integrated bulk stress tensors (here defined to include contributions from the stress tensor of bulk gravitons) required by comparing corrections to the bulk and boundary first laws [21,22]. As a result, it again demonstrates that a consistent holographic theory of gravity must couple universally to all forms of bulk stress-energy. It also provides a perturbative derivation of the Faulkner-Lewkowycz-Maldacena subleading correction to the RT and HRT conjectures at all orders in $1/N$ [23].
Before proceeding, we should elaborate on the above restriction to sources with non-singular action on the modular Hamiltonian. As explained in section 2 below, from a dual bulk point of view this requires our perturbations to vanish in some neighborhood of the bifurcation surface of our bulk Rindler-like horizon. So the leading-order large-$N$ RT or HRT entanglement cannot change. Such perturbations do, however, affect the above-mentioned order $N^0$ entanglement. They also change $\langle H_A \rangle$ and thus the relative entropy $R_A = \Delta\langle H_A \rangle - \Delta S_A$ at order $N^2$. We will show that these powers of $N$ are correct at all orders in $\epsilon$; the fact that $\Delta S_A$ is of smaller order in $N$ than $S_A$ itself is the $1/N$ suppression advertised above.
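For orientation, the identification of $R_A$ with a relative entropy can be sketched as follows. The sketch assumes $H_A = -\log\rho_{A0}$ is the modular Hamiltonian of the unperturbed reduced state, written here as $\rho_{A0}$, and uses the standard definition $S(\rho\|\sigma) = \mathrm{Tr}(\rho\log\rho) - \mathrm{Tr}(\rho\log\sigma)$.

```latex
\begin{align}
S(\rho_A \| \rho_{A0})
  &= \mathrm{Tr}(\rho_A \log\rho_A) - \mathrm{Tr}(\rho_A \log\rho_{A0}) \\
  &= -S_A[\rho_A] + \langle H_A\rangle_{\rho_A} , \\
0 = S(\rho_{A0} \| \rho_{A0})
  &= -S_A[\rho_{A0}] + \langle H_A\rangle_{\rho_{A0}} , \\
\Rightarrow\quad
S(\rho_A \| \rho_{A0})
  &= \Delta\langle H_A\rangle - \Delta S_A = R_A \;\geq\; 0 .
\end{align}
```

In particular $R_A$ vanishes at first order in the perturbation, which is the statement (1.1), so its leading contribution is of second order.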
The rest of the paper is organized as follows. We begin with a brief description of our setup in section 2. Section 3 then computes $\Delta\langle H_A \rangle$, $R_A$ and $\Delta S_A$ to second order for our family of states. It also argues to all orders that the powers of $N$ in $\Delta\langle H_A \rangle$, $R_A$ and $\Delta S_A$ are precisely those predicted by intuition from a bulk dual. We conclude in section 4 with a brief discussion of our results as well as comments on possible extensions. Appendix A derives a simple and completely general formula (A.14) for the second order change in $R_A$ in bipartite quantum systems that gives the above-mentioned explicit formulas for $R_A$ and $\Delta S_A$.

Setting and assumptions
We wish to study excitations of the vacuum $|0\rangle$ of a large $N$ CFT in $d$ spacetime dimensions on $\mathbb{R} \times S^{d-1}$. Since we take our inspiration from a possible bulk dual, we impose assumptions similar to those in e.g. [24] and [25,26], taking our CFT to satisfy 't Hooft large-$N$ factorization [27] and to have a sparse spectrum of light operators. As usual, light operators $O_i(x)$ are those whose scaling dimension $\Delta_i$ remains finite as we take $N \to \infty$, and the sparse spectrum condition requires that for any fixed $\Delta$ the number of such operators with $\Delta_i < \Delta$ remains finite at large $N$.
The factorization condition states that the set of light single-trace gauge-invariant local operators should admit a basis $\{O_i(x)\}$ for which $\langle O_{i_1}(x_1) \cdots O_{i_n}(x_n) \rangle_c \sim N^{2-n}$ (2.1), where $\langle \ldots \rangle_c$ is the connected vacuum correlator and similar notation without the subscript $c$ will also be employed for the full correlator. Furthermore, heavy operators (those that do not remain light as $N \to \infty$) decouple in the sense that connected correlators involving both heavy and light operators are much smaller. In fact, we assume that we study a low energy process from which operators of finite-but-large dimension are sufficiently decoupled that sums over the $O_i$ below always converge. Indeed, our only use of the sparse spectrum condition will be to assume that such sums with coefficients of order $N^p$ converge to a result of the same order in $N$. It will be convenient to take the basis operators $O_i$ to have definite scaling dimension $\Delta_i$ at large $N$ and in fact to diagonalize the order $N^0$ term in the connected two-point function. We also require $\langle O_i(x) \rangle = 0$ (2.2) for all $i$, which can be achieved by subtracting appropriate expectation values. Note that we have not required the $O_i$ to be scalars; we have merely suppressed any tensor or spinor indices. Thus, up to the above subtractions, one member (say, $O_0$) of our basis is $1/N$ times the CFT stress tensor, which necessarily satisfies $\Delta = d$; i.e. $O_0 = \frac{1}{N}(T - \langle T \rangle)$, where we have suppressed the spacetime indices $ab$.
Our expectation that corrections to the first law of entanglement are suppressed arises from considering the semi-classical behavior of a supposed bulk gravitational dual. Semi-classical bulk states can be created from the vacuum through the action of large classical sources for the perturbative bulk fields. As usual, we may choose to locate these sources at the boundary where they may be translated into sources for the local single-trace $O_i$ above. Now, single and multi-trace sources mix under time-evolution, but this mixing is again controlled by the $1/N$ expansion: since the stress tensor generates time evolution, to any order in $1/N$ a light operator $O(x)$ can be replaced inside correlators with an operator at another time that is a sum over $k$ of $k$-trace terms weighted by $1/N^{k-1}$. Semi-classical behavior is preserved in time, so it should suffice to restrict attention to states of the form $|\alpha\rangle = U|0\rangle$ with $U := T e^{-i\alpha J}$ (2.3a) and $J = \sum_{k \geq 1} N^{-(k-1)} \int dx_1 \cdots dx_k \, j_{i_1 \ldots i_k}(x_1, \ldots, x_k) \, O_{i_1}(x_1) \cdots O_{i_k}(x_k)$ (2.3b), where the classical sources $j_{i_1 \ldots i_k}$ are fixed smooth c-number functions of order $N^0$ and we allow distributional terms so that terms in the $k$-trace contributions to (2.3b) may effectively include only $m < k$ integrals over operator location.
We have introduced the real number $\alpha$ to be used as an expansion parameter. The symbol $T$ denotes time-ordering, and we employ the convenient abuse of notation that defines $T e^{-i\alpha J}$ to be the natural time-ordered exponential associated with the particular representation of $J$ given above as an integral over the CFT spacetime. Using the standard AdS/CFT dictionary, the above normalizations would give the bulk field $\phi_i$ dual to $O_i$ an expectation value $\langle \phi_i \rangle_\alpha \sim \alpha$.
We wish to choose $\alpha$ so that (2.3) would behave semi-classically in a bulk gravitational dual. Since the bulk is perturbative at large $N$, bulk quantum fluctuations become negligible in the limit $\alpha \gg 1$ and gravitational back-reaction scales like $G_N \alpha^2 \sim \alpha^2/N^2$ (2.4). To allow this effect to be as large as possible consistent with a perturbative treatment we take $\alpha = \epsilon N$ (2.5), where $\epsilon$ is a small parameter to be held constant in the limit $N \to \infty$.
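The scalings just described can be collected in a short worked estimate. This is only a sketch: it assumes $G_N \sim 1/N^2$ in AdS units and order-one quantum fluctuations for a canonically normalized bulk field, and the symbol $\delta\phi$ for those fluctuations is introduced here purely for illustration.

```latex
\begin{align}
\langle\phi\rangle &\sim \alpha ,
  & \frac{\delta\phi}{\langle\phi\rangle} &\sim \frac{1}{\alpha} \;\to\; 0
  \quad (\alpha \gg 1) , \\
G_N \langle\phi\rangle^2 &\sim \frac{\alpha^2}{N^2} ,
  & \alpha = \epsilon N \;&\Rightarrow\;
  G_N \langle\phi\rangle^2 \sim \epsilon^2 .
\end{align}
```

With this choice the back-reaction stays finite but small as $N \to \infty$, so $\epsilon$ rather than $1/N$ controls the strength of the classical perturbation.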
Our particular interest is in the effect of such sources on a region $A$ of some Cauchy surface $\Sigma$ in the CFT spacetime, or more generally on the associated domain of dependence $D(A)$. The region $A$ is held fixed as we take $N \to \infty$. We denote the complementary region on this Cauchy surface by $A^c$, so that $\Sigma = A \cup A^c$. As a density matrix, the state $|\alpha\rangle$ is $\sigma_\alpha := |\alpha\rangle\langle\alpha|$. The associated reduced density matrices and the unperturbed modular Hamiltonians for $A$ and $A^c$ are then $\rho_A := \mathrm{Tr}_{A^c}\, \sigma_\alpha$, $\rho_{A^c} := \mathrm{Tr}_A\, \sigma_\alpha$, $H_A := -\log \rho_{A0}$, $H_{A^c} := -\log \rho_{A^c 0}$ (2.6), where $\rho_{A0}$, $\rho_{A^c 0}$ denote the reduced density matrices at $\alpha = 0$. As implied above, we will often suppress the label $\alpha$ on $\rho_A$, $\rho_{A^c}$ when the meaning is clear, though $H_A$, $H_{A^c}$ will always represent the modular Hamiltonians at $\alpha = 0$. An important property of these objects is that $[H_A, H_{A^c}] = 0$ (2.7), and that the combination $K := H_A - H_{A^c}$ annihilates the vacuum, $K|0\rangle = 0$ (2.8). For ball-shaped regions $A$ in a constant-time slice, $K$ generates the conformal isometry of $\mathbb{R} \times S^{d-1}$ that moves $A$ orthogonally to itself while preserving the domain of dependence $D(A)$; i.e., it generates the natural forward "time-translation" on $D(A)$ and the natural backward "time-translation" on $D(A^c)$ [16]. For lack of a better name, we will refer to $K$ as the boost Hamiltonian, though this term is only an accurate description of $K$ in the large-sphere limit when $A$ becomes a half-plane. Finally, we introduce a basis for the space of light local gauge-invariant single-trace operators on $D(A)$ and $D(A^c)$, denoted $O^i_A(x)$ and $O^i_{A^c}(x)$, that again satisfy (2.1) and (2.2). Below we will consider ball-shaped regions in a constant-time Cauchy surface $\Sigma = S^{d-1}$ so that the modular Hamiltonians $H_A$, $H_{A^c}$ can be expressed as integrals of the stress tensor over $A$, $A^c$ in any CFT [16].

Adapting the source
A key step in our argument will be to write the source-operator $J$ from (2.3b) in a manner adapted to the decomposition $\Sigma = A \cup A^c$. The basic idea is to use the Heisenberg equations of motion for the CFT to express $J$ in terms of operators on $\Sigma$. Thinking of the CFT Hilbert space as a tensor product of separate Hilbert spaces for $A$ and $A^c$ would then allow $J$ to be written as a sum of terms, each of which is the tensor product of a (possibly trivial) operator on $A$ with one on $A^c$: $J = \sum_a J^a_A \otimes J^a_{A^c}$.
We would then like to expand the $J^a_A$, $J^a_{A^c}$ as sums of products of the $O^i_A$, $O^i_{A^c}$. There are, however, several potential obstacles to consider. First, evolving the source to the Cauchy surface $\Sigma$ will generally mix the $O_i$ with both heavy operators and highly nonlocal expressions such as Wilson loops that cannot be expanded in terms of the $O^i_A$, $O^i_{A^c}$. However, our calculations below will involve only correlation functions of sources with $H_A$, $H_{A^c}$. For the ball-shaped regions to be considered, up to the choice of zero-point our $H_A$, $H_{A^c}$ are integrals of the stress tensor, which is one of our light operators. Since the stress tensor also generates time evolution, to any order in $1/N$ a light operator $O(x)$ can be replaced inside such correlators with an appropriately-smeared sum of products of the $O^i_A$, $O^i_{A^c}$. So in this sense we may write $J = O_A + O_{A^c} + (\text{multi-trace terms})$ (2.10), where we have used the fact that operator mixing is suppressed by powers of $1/N$ and defined $O_A$, $O_{A^c}$ to be the single-trace parts of the source, given by smeared sums of the $O^i_A$ and of the $O^i_{A^c}$ respectively. The combinations of operators represented by $O_A$, $O_{A^c}$ will play important roles below.
Although not explicitly indicated, we will make use of a similar decomposition of the multi-trace parts of (2.10) into sums of products of operators associated separately with $A$ and $A^c$. Since (2.3) requires time-ordered products of such operators, we should really perform an expansion of the form (2.10) to express the source $J(t)$ at each time in terms of operators that evolve to the regions $A$, $A^c$, but to avoid clutter in our notation we will not explicitly indicate the time at which given operators are to act. In terms of a holographic bulk dual, the rewriting of (2.3) as (2.10) could be described as evolving sources to a bulk Cauchy surface intersecting the boundary at $A \cup A^c$ and then using e.g. the methods of [28][29][30][31][32][33][34] to write the corresponding bulk operators in terms of boundary operators in $D(A)$, $D(A^c)$. At least for operators $O_i$ not associated with bulk gauge fields, this interpretation nicely side-steps issues (see e.g. [35][36][37][38][39][40][41][42][43][44]) associated with the expectation that our CFT will be a gauge theory so that its Gauss-law constraint forbids factorization of its Hilbert space into separate Hilbert spaces for $A$ and $A^c$.
However, even without a Gauss-law constraint, quantum field theory Hilbert spaces do not admit a precise tensor product structure. This fact is associated with singularities in various $n$-point correlation functions which can give contributions localized precisely on the boundary $\partial A$ where $A$ and $A^c$ meet. The issue could be ignored if we were to consider only correlators smeared with smooth functions; we could simply deform the smearing functions so that they vanish in some small neighborhood of $\partial A$ and then recover the original undeformed correlators by continuity as these neighborhoods shrink to zero size. But the fact that $H_A$, $H_{A^c}$ are integrals of the stress tensor against non-smooth functions means that more care will be required. In section 3 below we simply restrict to sources that induce changes in energy $\Delta\langle H_A \rangle$ and entropy $\Delta S_A$ that can be approximated by replacing each term in (2.10) with a source that differs from the original in some neighborhood $A_{\rm collar}$ of $\partial A$ and vanishes smoothly in a smaller neighborhood $A_{\text{source-free}}$ of $\partial A$, and then letting the width of $A_{\rm collar}$ vanish. Roughly speaking, these are the sources that do not produce distributional terms localized at $\partial A$. We refer to such sources as having non-singular action on the modular Hamiltonian $H_A$.
In practice, rather than working through the above limit explicitly, we will simply consider the regulated operators mentioned above which we take to vanish in a common neighborhood $A_{\text{source-free}}$ of $\partial A$. The limit $A_{\rm collar}, A_{\text{source-free}} \to \emptyset$ will be left implicit. Such sources may for example be obtained by choosing the original stress-tensor sources to be supported in the interior of $D(A) \cup D(A^c)$. Nevertheless, the fact that the multi-trace terms in either (2.3) or (2.10) are multi-local means that they can include products of operators in $D(A)$ with those in $D(A^c)$. Thus $U$ is generally not a product of separate unitary transformations on $D(A)$ and $D(A^c)$.

Results with no singular terms
We now study the changes in energy and entropy associated with applying $U$ to $|0\rangle$ for the above sources.

Energy
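Changes in the energy are straightforward to evaluate and take the form $\Delta\langle H_A \rangle = \langle 0| \bar{T} e^{i\alpha J} \, H_A \, T e^{-i\alpha J} |0\rangle - \langle 0| H_A |0\rangle = i\alpha \langle 0| [J, H_A] |0\rangle - \frac{\alpha^2}{2} \langle 0| [J, [J, H_A]]_T |0\rangle + O(\epsilon^3 N^2)$ (3.1), where $\bar{T}$ denotes anti-time-ordering and $[J, [J, H_A]]_T$ is the operation defined by the second order expansion of the first line; as indicated by the notation, one may think of this term as an appropriately time-ordered version of a double commutator.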
Since the first order term also clearly follows from the first line, it remains only to show that the omitted terms are of order $\epsilon^3 N^2$ and in particular involve no higher powers of $N$. We expect that this argument is also standard, but we state it here for completeness. The point is to note the repeated commutator structure of $\bar{T} e^{i\alpha J} H_A T e^{-i\alpha J} - H_A$, which requires that at order $\alpha^n$ all $n$ sources must be connected to each other and also to $H_A$. Considering for the moment only the single-trace contribution $O_A + O_{A^c}$ to all sources $J$ from (2.10) and using (2.1) then gives only terms of order $\epsilon^n N^2$.
One may then show that multi-trace contributions are further suppressed by at least an additional $N^{-2}$. Since the coefficient of each multi-trace contribution to (2.10) contains an explicit factor of $1/N$, the only possible exception could come from including a single double-trace contribution $O_{i_1} O_{i_2}$ to one of the sources. But since $\langle O_{i_1} \rangle = \langle O_{i_2} \rangle = 0$, the repeated commutator structure again means that all non-zero contributions are fully connected. From (2.1), the replacement of a single-trace operator by $O_{i_1} O_{i_2}$ thus yields an extra $1/N$ in this connected correlator, giving the stated suppression by two powers of $1/N$.
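The counting in the last two paragraphs can be made explicit. The sketch below assumes that (2.1) takes the standard 't Hooft form $\langle O_{i_1} \cdots O_{i_n} \rangle_c \sim N^{2-n}$, that $H_A$ is an integral of $N O_0$, and that $\alpha = \epsilon N$; smearing functions and order-one coefficients are dropped.

```latex
\begin{align}
\text{single-trace sources only:}\quad
  & \alpha^n \cdot N \cdot
    \langle O_{i_1} \cdots O_{i_n} \, O_0 \rangle_c
    \;\sim\; (\epsilon N)^n \cdot N \cdot N^{2-(n+1)}
    \;=\; \epsilon^n N^2 , \\
\text{one double-trace insertion:}\quad
  & \alpha^n \cdot \frac{1}{N} \cdot N \cdot
    \langle O_{i_1} \cdots O_{i_{n+1}} \, O_0 \rangle_c
    \;\sim\; (\epsilon N)^n \cdot N^{2-(n+2)}
    \;=\; \epsilon^n N^0 .
\end{align}
```

The ratio of the two lines is $N^{-2}$: one factor of $1/N$ from the explicit coefficient of the double-trace term and one from the extra operator in the connected correlator.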
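We should also comment further on the linear term in (3.1). Recall that we take sources in (2.10) to vanish near $\partial A$. This must in particular be true of the single-trace term, which can thus be written as $O_A + O_{A^c}$ with $O_A$ and $O_{A^c}$ each supported away from $\partial A$. Writing $H_A$ as an integral of $N O_0$ (plus a constant) and using (2.1) would suggest that this single-trace part makes the linear term of order $\epsilon N^2$. However, $O_{A^c}$ commutes with $H_A$ and $O_A$ commutes with $H_{A^c}$, so this contribution is $i\alpha \langle 0| [O_A, K] |0\rangle$, which vanishes by $K|0\rangle = 0 = \langle 0|K$. As a result, the linear term in (3.1) receives contributions only from multi-trace terms in (2.10) and is thus suppressed by an extra $N^{-2}$ as described above. Thus the linear term is in fact of order $\epsilon N^0$.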

Entropy
Computing changes in entropy and relative entropy is more complicated than computing $\Delta\langle H_A \rangle$. The main issue is that expanding the logarithm in $S_A$ requires using the highly nontrivial Baker-Campbell-Hausdorff (BCH) formula. Nevertheless, subject to the assumption that all sources in (2.10) vanish near $\partial A$, we will argue below that changes in the entropy $S_A$ and relative entropy $R_A$ take the form given in (3.2a)-(3.2c), with the function $f$ defined in (3.3). This $f$ appears because it is the generating function of the Bernoulli numbers, which play an important role in the BCH formula [45]. While it is generally redundant to describe all of $\Delta\langle H_A \rangle$, $\Delta S_A$, and $R_A := \Delta\langle H_A \rangle - \Delta S_A$, we choose to do so (and in fact give two expressions for $R_A$) both in order to state definite results and to describe the relative sizes of various contributions. For example, the explicit term in (3.2a) is a general result for the second order term in the relative entropy $R_A$ of any bipartite quantum system, but the large $N$ structure is more apparent from (3.2b). Since our current interest focuses on the latter, and since (3.2b) may be derived from more general arguments, we relegate the calculation leading to (3.2a) to appendix A. Equation (3.2a) may nevertheless be useful for applications that require subleading terms of order $\epsilon^2 N^0$ as well as the leading term of order $\epsilon^2 N^2$. Although deriving these results will require some work, the intuition behind (3.2b) and (3.2c) is easy to understand. Note that keeping only the single-trace term would make $U$ a product of separate (commuting) unitary transformations $U_A$ and $U_{A^c}$ on $A$ and $A^c$. This would require $\Delta S_A = 0$ and thus $R_A = \Delta\langle H_A \rangle$. So contributions to $\Delta S_A$ require the multi-trace terms in (2.10), which are of order $N^0$. Deriving (3.2c) thus amounts to controlling cross-terms involving both single-trace and multi-trace source terms. This is done in section 3.3 below.
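The statement that a product unitary cannot change $S_A$ can be checked in a few lines. The sketch assumes an exact tensor factorization of the Hilbert space (which the text notes holds only in a regulated sense) and writes $\sigma_0 = |0\rangle\langle 0|$ and $\rho_{A0} = \mathrm{Tr}_{A^c}\,\sigma_0$.

```latex
\begin{align}
\rho_A
  &= \mathrm{Tr}_{A^c}\!\Big[ (U_A \otimes U_{A^c}) \, \sigma_0 \,
       (U_A \otimes U_{A^c})^\dagger \Big] \\
  &= U_A \, \mathrm{Tr}_{A^c}\!\Big[ (1 \otimes U_{A^c}) \, \sigma_0 \,
       (1 \otimes U_{A^c}^\dagger) \Big] \, U_A^\dagger
   \;=\; U_A \, \rho_{A0} \, U_A^\dagger , \\
S_A[\rho_A]
  &= -\mathrm{Tr}\big( U_A \rho_{A0} U_A^\dagger \,
       \log( U_A \rho_{A0} U_A^\dagger ) \big)
   \;=\; -\mathrm{Tr}\big( \rho_{A0} \log \rho_{A0} \big)
   \;=\; S_A[\rho_{A0}] .
\end{align}
```

The second line uses cyclicity of the partial trace for operators acting only on $A^c$, and the last uses the fact that unitary conjugation preserves the spectrum of $\rho_{A0}$.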
The explicit term in (3.2b) is the second-order effect of purely single-trace sources on $R_A$, or equivalently on $\langle H_A \rangle$. Since either can depend only on the restriction of the state to $A$, they are unchanged by the action of the $U_{A^c}$ defined above. We may thus consider only $U_A$, which commutes with $H_{A^c}$ and thus induces identical changes in both $\langle H_A \rangle$ and $\langle K \rangle$. Indeed, since the linear term in (3.1) vanishes for single-trace sources that vanish near $\partial A$, it suffices to compute only the second-order term in (3.1) with $J$ replaced by $O_A$ and $H_A$ replaced by $K$. Using $K|0\rangle = 0 = \langle 0|K$ then shows the only non-zero term to be the one displayed in (3.2b). Along with the fact that the first order change in $R_A$ vanishes identically by the first law, the errors in (3.1) and (3.2c) then imply those in (3.2b).

Multi-trace contributions to ∆S A
The task that remains is to show that including multi-trace contributions to (2.10) can change $S_A$ only by terms of order $N^0$. The argument is somewhat lengthy, so we break it into several parts. We first reorganize the action of the single-trace source-terms in order to show that they have little impact. We then work to write a series expansion of $S_A$ to which we can usefully apply the large-$N$ counting rule (2.1). Here there are two difficulties, one of which is associated with the fact that $S_A$ is defined as a non-linear function of the reduced density matrix $\rho_A$ on region $A$ and not as a vacuum expectation value. Any power series expansion of $S_A$ thus naturally involves many traces over $A^c$ that must be eliminated in order to use (2.1). The other involves controlling contributions from possible disconnected correlators. After completing these tasks, we combine the results and count powers of $N$.

Reorganizing the action of single-trace sources
To begin, recall the definitions $U = T e^{-i\alpha J}$ and $\sigma_\alpha = U |0\rangle\langle 0| U^{-1}$ from section 2. Recall also that dropping the multi-trace contributions and computing only $U_{\rm single} := T e^{-i\alpha (O_A + O_{A^c})}$ would give a product of separate unitary transformations on $A$ and $A^c$ that do not change $S_A$. Similarly, we may note that the entropy of $\rho_A$ must be identical to the entropy of $\rho^{\rm conj}_A := (U^A_{\rm single})^{-1} \rho_A \, U^A_{\rm single}$, where $U^A_{\rm single}$ is the part of $U_{\rm single}$ that acts on $A$. We thus have $S_A = -\mathrm{Tr}(\rho^{\rm conj}_A \log \rho^{\rm conj}_A)$ (3.8). Now, $\sigma_\alpha$ itself is defined (see (2.3)) by conjugating $|0\rangle\langle 0|$ with a time-ordered unitary $U$. This unitary is the product of many factors of the form $e^{-i\alpha (J_{\rm single}(t) + J_{\rm multi}(t)) \delta t}$ (3.9), where we have separated the single- and multi-trace parts of the source at time $t$. Note that our new conjugation by $U_{\rm single}$ merely replaces the $U$ in (2.3) by $U^{-1}_{\rm single} U$, or equivalently replaces each factor (3.9) with $e^{-i\alpha J^{\rm conj}_{\rm multi}(t) \delta t}$ (3.10), where $J^{\rm conj}_{\rm multi}(t)$ is a multi-trace term conjugated by a time-ordered exponential built from single-trace sources at earlier times; there is no effect on (3.9) from sources at later times as these merely cancel between $U^{-1}_{\rm single}$ and $U$. One may make an analogy between $U$ and the time-ordered exponential that implements Heisenberg-picture time-evolution in some quantum system, with $J_{\rm single}$ playing the role of the free Hamiltonian and $J_{\rm multi}$ playing the role of the interaction terms. Our conjugation by $U_{\rm single}$ then plays the role of passing to the interaction picture, where the new time evolution is a product of factors like (3.10).
We wish to write $\rho^{\rm conj}_A = \rho_{A0} + \Delta\rho^{\rm conj}_A$ (3.12) and to expand (3.8) in powers of $\rho^{-1}_{A0} \Delta\rho^{\rm conj}_A$. We then expand $\Delta\rho^{\rm conj}_A$ in powers of $\alpha$. In this latter step we will write each $\Delta\rho^{\rm conj}_A$ as a sum of terms of the form (3.13), namely repeated commutators $[J^{\rm conj}_{\text{multi-}1}, [J^{\rm conj}_{\text{multi-}2}, \ldots, [J^{\rm conj}_{\text{multi-}n}, |0\rangle\langle 0|] \ldots ]]$ traced over $A^c$, where $J^{\rm conj}_{\text{multi-}j}$ is a time integral of $J^{\rm conj}_{\rm multi}(t)$ and the factors are appropriately time-ordered. It is important to note that $|0\rangle\langle 0|$ is not conjugated by any such unitary; the privileged position of this operator at the end of the chain of repeated commutators means that, in our analogy with transforming between the Heisenberg and interaction pictures, this operator can be thought of as labeled by the earliest possible time so that the picture-changing transformation acts on it trivially. The point of the form (3.13) is that we will shortly (see section 3.3.2) transform the expansion of (3.8) into a form involving only vacuum correlators of products of operators. The repeated commutator structure of $J^{\rm conj}_{\text{multi-}j}$ will then forbid single-trace terms from appearing in correlators unless they are appropriately connected to multi-trace terms.

The entropy as a correlator
The next step in our argument is to show how $S_A$ can be written as a sum of vacuum correlators of products of the $O_i$ with functions of the boost Hamiltonian $K$, and where the remaining $N$-dependence of each term follows directly from powers of $\alpha = \epsilon N$ and (2.10). In particular, all extra traces over $A^c$ will be removed, and the modular Hamiltonians $H_A$, $H_{A^c}$ will not appear except in the combination $K = H_A - H_{A^c}$. This last feature will be critical in controlling contributions from disconnected correlators.
As a first step toward this goal, we may use (3.11) to express $S_A$ as the vacuum correlator (3.14), where operators on region $A$ are to be interpreted as operators on the full CFT Hilbert space by tensoring them with the identity on $A^c$. Note that $\log \rho^{\rm conj}_A \otimes 1_{A^c} = \log(\rho^{\rm conj}_A \otimes 1_{A^c})$, so that we may interpret $\log \rho^{\rm conj}_A$ as the logarithm of an operator on the full Hilbert space.
We next remove the explicit traces over $A^c$. These enter through the definition (2.6), and are potentially problematic because the sources involve products of operators on $A$ with operators on $A^c$. The trick to proceeding is to use the assumption that each term in (2.10) is supported away from $\partial A$ (so that contributions to each source-term from $A$ and $A^c$ commute with each other) to write each term (3.13) as a sum of terms in which all operators on $A^c$ have been commuted to act directly either on $|0\rangle$ from the left or on $\langle 0|$ from the right.
Using the entanglement properties of $|0\rangle$, we may now replace each operator in $A^c$ by a so-called "mirror operator" on $A$; see e.g. [46], though the particular terminology is from the more recent [47]. To explain how this works, let us for the moment take $A$ to be the "southern" hemisphere of our $S^{d-1}$ at $t = 0$. In any relativistic theory, given an operator $O$, not necessarily a scalar or even local, we may study the CPT conjugate operator $O^{CPT}$. Since the parity operation exchanges the north and south hemispheres, the CPT conjugate of any northern hemisphere operator at $t = 0$ (i.e., on $A^c$) is an operator on the $t = 0$ southern hemisphere $A$. Furthermore, these operators satisfy $O|0\rangle = e^{-K/2} O^{CPT} |0\rangle$ and $\langle 0| O^\dagger = \langle 0| (O^{CPT})^\dagger e^{-K/2}$, where the second relation is just the Hermitian conjugate of the first. These are just Kubo-Martin-Schwinger (KMS) relations in terms of the imaginary time evolution generated by our $K$, so they encode the thermal nature of $|0\rangle$ with respect to $H_A$. Thus the mirror operator $\tilde{O} := e^{-K/2} O^{CPT} e^{+K/2}$ satisfies $\tilde{O}|0\rangle = O|0\rangle$. The key point is that the CPT conjugate of an $O_i$ is just another $O_j$ (or perhaps a linear combination thereof). So at the expense of introducing additional factors of $e^{-K/2} = e^{-H_A/2} e^{H_{A^c}/2}$, we may replace the $A^c$ operators acting on $\sigma_0$ by CPT-conjugate operators acting on $A$. Conformal invariance then guarantees that we can again perform a corresponding operation to replace operators on $A^c$ with those on $A$ for more general ball-shaped regions. The new operators $e^{H_{A^c}/2}$ may then be commuted past $A$-operators to act on $\sigma_0$, where they may be replaced by factors of $e^{H_A/2}$ using (2.8). The net result is thus to write $\Delta\rho_A$ as a sum of terms of the form (3.17). Operators supported away from $A^c$ can now be pulled outside the trace over $A^c$. By assumption these include all $O^i_A$, but we should take care with the factors of $e^{H_A/2}$ which include support near $\partial A$.
We may do so in each term by using the Zassenhaus formula $e^{X+Y} = e^X e^Y e^{-[X,Y]/2} e^{(2[Y,[X,Y]] + [X,[X,Y]])/6} \cdots$ (3.18) for $H_A/2 = X + Y$, with $X = H^{\rm far}_A/2$ a Hermitian integral of the subtracted stress tensor $N O^0_A$ weighted by a smooth function supported away from $\partial A$ and $Y$ a similar integral supported close enough to $\partial A$ to avoid overlap with the support of any $O^i_A$ appearing in the given term. Since both are smooth integrals, the supports of $X$ and $Y$ will overlap with each other and thus yield non-zero commutators in (3.18). This leads to the relations (3.19a)-(3.19d), where (3.19b) is the inverse of (3.19a) and the final two relations (3.19c), (3.19d) are the adjoints of (3.19a), (3.19b). The expressions (3.19) allow us to safely reformulate (3.17).
We are now close to achieving our goal. If $\rho^{-1}_{A0} \Delta\rho^{\rm conj}_A$ commuted with $\rho_{A0}$, we could use $\log \rho^{\rm conj}_A - \log \rho_{A0} = \log(1 + \rho^{-1}_{A0} \Delta\rho^{\rm conj}_A)$ and expand (3.14) in a standard Taylor series. Combined with our results above, this would give an infinite sum of terms with each being a vacuum correlator of a product of $O_i$ and exponentials of $K$, consistent with the form required above (footnote 5). But since $\rho^{-1}_{A0} \Delta\rho^{\rm conj}_A$ and $\rho_{A0}$ generally do not commute we must use the Baker-Campbell-Hausdorff formula to evaluate $\log \rho^{\rm conj}_A = \log(e^X e^Y)$. An explicit formulation of this identity due to Dynkin takes the form (see e.g. [48]) $\log(e^X e^Y) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} \sum_{r_i + s_i > 0} \frac{[X^{r_1} Y^{s_1} \cdots X^{r_n} Y^{s_n}]}{\left(\sum_{i=1}^n (r_i + s_i)\right) \prod_{i=1}^n r_i! \, s_i!}$, where $[X^{r_1} Y^{s_1} \cdots X^{r_n} Y^{s_n}]$ denotes the corresponding right-nested repeated commutator. We wish to set $X = -H_A = \log \rho_{A0}$ and $Y = \log(1 + \rho^{-1}_{A0} \Delta\rho^{\rm conj}_A)$. The important observation is that the form of $\rho^{-1}_{A0} \Delta\rho_A$ found above clearly commutes with $H_{A^c}$, so $[H_{A^c}, Y] = 0$ as well. Recalling from (2.7) that $X = -H_A$ also commutes with $H_{A^c}$ allows us to replace each $X$ in the above repeated commutators with $-K = H_{A^c} - H_A$. The result is an expression for $S_A$ as an infinite sum over terms, each of which is a vacuum correlator involving only products of $O^i_A$ and functions of $K$, and with all further explicit dependence on $N$ coming from (2.10) as desired.
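The replacement of $X$ by $-K$ inside repeated commutators can be spelled out as a short induction. The sketch assumes only the two commutation properties quoted in the text, $[H_{A^c}, X] = 0$ and $[H_{A^c}, Y] = 0$; the symbol $Z$ for a generic nested commutator is introduced here for illustration.

```latex
\begin{align}
&\text{Let } Z \text{ be } X,\ Y, \text{ or any nested commutator built from them.} \nonumber \\
&\text{Step 1 (Jacobi):}\quad
  [H_{A^c}, [Z_1, Z_2]]
  = [[H_{A^c}, Z_1], Z_2] + [Z_1, [H_{A^c}, Z_2]] = 0
  \quad\text{if } [H_{A^c}, Z_{1,2}] = 0 , \\
&\text{so } [H_{A^c}, Z] = 0 \text{ for every such } Z \text{ by induction on depth.} \nonumber \\
&\text{Step 2:}\quad
  [X, Z] = [-H_A, Z] = [-H_A + H_{A^c}, Z] = [-K, Z] ,
  \qquad
  [Z, X] = [Z, -K] .
\end{align}
```

Only the $X$'s that appear inside commutators are affected; a bare term linear in $X$ is still $-H_A$, consistent with the exception noted in footnote 5.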

Counting powers of N
We are now ready to count powers of $N$. Consider then contributions at order $\alpha^p$ involving $r$ single-trace operators $O^{\text{from single}}$ that come from single-trace source-terms and $s$ single-trace operators $O^{\text{from multi}}$ that come from multi-trace source-terms. Such a contribution is equal to a product of connected correlators each with $r_i$ of the $O^{\text{from single}}$ and $s_i$ of the $O^{\text{from multi}}$ such that $r = \sum_i r_i$ and $s = \sum_i s_i$. We need not keep track of the factors of $K$ since each contributes an explicit factor of $N$ that is cancelled by the extra $1/N$ from (2.1) associated with the $O_0$ it contributes to any correlator. Each connected correlator gives $r_i + s_i - 2$ factors of $1/N$ by (2.1). The single-trace sources provide an additional $r$ explicit powers of $N$. For double-trace sources the explicit $N$ from $\alpha = \epsilon N$ cancels against the $1/N$ in (2.10) and for higher-trace sources the contribution is more suppressed. So the total number of factors of $1/N$ is greater than or equal to $\sum_i (s_i - 2)$. We wish to show that this sum is non-negative.

Footnote 5: With the exception of the term $\langle \rho^{-1}_{A0} \Delta\rho^{\rm conj}_A H_A \rangle$. This term resembles the multi-trace contributions to $\Delta\langle \log \rho_{A0} \rangle$ considered in section 3.1 and is also of order $N^0$ for similar reasons.
Recall that the reason multi-trace sources affect the entropy is that they can contain products of operators in $A$ with operators in $A^c$. But a given term will only be sensitive to the correlations created by these sources if at least one operator from $A$ and one from $A^c$ appear in the same connected correlator. So it is natural to expect that the entropy will only receive contributions from terms with $s_i \geq 2$ for all $i$, which would imply the desired result.
To show that this is so, we note that there are no contributions from terms with $s_i = 0$. This is because all of the $O_{\text{from single}}$ occur in nested commutators inside some $J^{\rm conj}_{\text{multi-}j}$ and therefore must be connected to at least one $O_{\text{from multi}}$ in order to contribute. Now suppose $s_i = 1$, where we take the relevant $O_{\text{from multi-}j}$ to come from an $m$-trace source term living inside some $J^{\rm conj}_{\text{multi-}j}$. Since $\langle O_{\text{from multi-}j} \rangle = 0$, a non-vanishing connected correlator must involve other operators. None of these can come from single-trace source terms in other $J^{\rm conj}_{\text{multi-}k}$ for $k \neq j$, which must instead stay attached to their own multi-trace sources. So since $s_i = 1$, if any of these operators come from a final $K$ then the correlator must vanish due to $K|0\rangle = 0 = \langle 0|K$ and the fact that (3.13) prohibits a final $K$ from intervening between factors coming from any given $J^{\rm conj}_{\text{multi-}j}$. So the correlator consists only of $O_{\text{from multi-}j}$, single-trace source terms, and $K$'s coming from a single $J^{\rm conj}_{\text{multi-}j}$. Terms of this form exponentiate, and have the effect of replacing the operator $O_{\text{from multi-}j}$ in $J^{\rm conj}_{\text{multi-}j}$ with a classical source of order $N$, demoting $J^{\rm conj}_{\text{multi-}j}$ to an $(m-1)$-trace source of order $N^{-(m-3)}$ (i.e., the same order as the $(m-1)$-trace sources already appearing in (2.10)). So we need not consider such terms separately when counting powers of $N$. In particular, double-trace sources demoted to single-trace sources in this way can be absorbed into the $U_{\rm single}$ of section 3.3.1 and do not contribute to the entropy. So all relevant terms have $s_i \geq 2$ and the largest possible contribution to the entropy is of order $N^0$ as anticipated above.
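The bookkeeping in this counting argument can be sketched in a few lines. The function below is only an illustration: its name and the representation of a term as a list of pairs $(r_i, s_i)$, one per connected correlator, are assumptions made here. It returns the lower bound $\sum_i (s_i - 2)$ on the number of factors of $1/N$, which is saturated when all multi-trace sources are double-trace.

```python
def inverse_N_powers(clusters):
    """Lower bound on the net power of 1/N for one term.

    clusters: list of (r_i, s_i), the numbers of operators from
    single-trace and multi-trace sources in each connected correlator.
    """
    # Each connected correlator of n operators carries n - 2 factors of 1/N.
    from_correlators = sum(r + s - 2 for r, s in clusters)
    # Each single-trace source carries one explicit factor of N.
    explicit_N = sum(r for r, _ in clusters)
    return from_correlators - explicit_N   # equals sum_i (s_i - 2)

# Two correlators, each with s_i = 2: order N^0.
order_zero = inverse_N_powers([(3, 2), (0, 2)])
# One correlator with s_i = 3: suppressed by one power of 1/N.
suppressed = inverse_N_powers([(1, 3)])
# A correlator with s_i = 1 would naively scale like N; the text argues
# that such terms are absorbed into lower-trace sources instead.
naive_growth = inverse_N_powers([(2, 1)])
```

The number $r_i$ of single-trace insertions drops out of the result, which is why only the $s_i$ need to be constrained.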

Discussion
The work above contains first steps toward studying the von Neumann entropy of excited states in CFTs satisfying the 't Hooft large-$N$ counting rule (2.1). We considered the entropy of ball-shaped regions for states produced by real-time sources with non-singular action on the modular Hamiltonian and of the form (2.10). From a dual bulk point of view, such sources produce small classical waves (or, more properly, quantum coherent states with large amplitude of order $N$) on both sides of a Rindler horizon but which do not disturb the bifurcation surface itself. They also add $O(N^0)$ entangled sets of particles across the horizon. As the former does not change the HRT entropy, one expects that the CFT entanglement changes only at order $N^0$. We have verified that this is indeed the case by a direct argument in the CFT. While our results are directly formulated in terms of a $1/N$ expansion, it may be interesting to follow [5] and attempt to formulate a version of our results that would hold in arbitrary CFTs, whether or not they have a large-$N$ counting rule like (2.1).$^{6}$ One consequence of our work is the explicit formula (3.2b) for the order $\epsilon^2 N^2$ relative entropy. When interpreted in the bulk theory, this formula is precisely the bulk stress tensor on the $A$ side of the bulk horizon, integrated so as to give the associated change in the boost energy $K$. This may be seen from the fact that (3.2b) is just the contribution to $K$ from the excitations on the $A$ side of the bifurcation surface, and from bulk causality, which implies that it gives the value of the boost Hamiltonian $K$ in a related bulk solution produced from the vacuum using only the sources $O^A$ in the causal past of $A$. The above claim then follows by writing $K$ in terms of a standard 'bulk stress tensor' that includes quadratic contributions from gravitons as well.
For concreteness, one may choose the graviton contribution to be given by the AdS analogue of the Landau-Lifshitz 'pseudo-tensor' expression [49], though our requirement that perturbations vanish near the bifurcation surface means that many other choices give equivalent results.$^{7}$
The above observation means that our analysis provides a new argument for the universal coupling of gravity to all classical forms of stress-energy. Though it differs in detail from [4], this interpretation of our classical ($O(N^2)$) result is inspired by the quantum ($O(N^0)$) argument of that reference and reinforces the connection found there between the universal structure of variations in $S_A$ and the universal coupling of bulk gravity.
Another consequence is to give a derivation of the order $N^0$ Faulkner-Lewkowycz-Maldacena correction [23] to the Ryu-Takayanagi and Hubeny-Rangamani-Takayanagi conjectures (for ball-shaped regions $A$). In the semi-classical bulk, this argument is perturbative in departures from empty AdS. It thus complements the original reasoning in [23] in that it does not rely on the Lewkowycz-Maldacena argument [12], or on any other use of the replica trick. At first order in $\epsilon$ the result follows from the first law, but the arguments of section 3 also give this result at higher orders. In general, we may decompose $S_A$ into a first-law piece and $R_A$. Recall that at second order $R_A$ is quadratic in the first order contribution to $\Delta\rho_A$ and in particular is given by
where $J_{\rm double}$ is the double-trace part of (2.10). As before, the only terms in $J_{\rm double}$ that can contribute to $\Delta S_A$ are those that involve one operator in $A$ and one in $A^c$. Such terms throw entangled pairs into the bulk, with one member of each pair on each side of the bulk horizon. Given the agreement of bulk and CFT vacuum correlators, it is manifest that the corresponding change in bulk entanglement at this order can be computed just as we have done in the CFT and that the results agree. A similar argument holds at higher orders in $\epsilon$, where the repeated commutator structure of (3.13) reproduces the effect of propagating small effects from the multi-trace sources through the large semi-classical coherent state produced by the single-trace parts of (2.10). It also extends to higher orders in $1/N$ to argue that bulk entanglement gives the full series of $1/N$ corrections to HRT. Note, however, that since our sources are confined to $D(A) \cup D(A^c)$, all perturbations vanish at the bulk HRT surface. Our derivation is thus insensitive to possible perturbative shifts of the HRT surface of the kind predicted in [52].
Conversely, extending our results to allow sources supported on $\partial A$ would in principle allow us to test the conjecture of [52] that the full CFT entropy is given by the generalized entropy of a quantum extremal surface, defined as the bulk surface that extremizes the bulk generalized entropy.
We have focussed on ball-shaped regions for simplicity, but we expect our arguments to generalize to arbitrary regions $A$. Indeed, as described in [53], one may address perturbative deformations of ball-shaped regions by inserting additional factors of the stress tensor (or, equivalently, of $N O_0$). These insertions tend to add another operator to each connected correlator, giving an extra $1/N$ from (2.1) that cancels the explicit new factor of $N$. A similar argument can be used to compute corrections to (3.15) and construct the relevant "mirror operators" for these deformed regions. So the only ingredients of our analysis that remain to be checked are that $H_A$, $H_{A^c}$ can be thought of as operators on the full CFT Hilbert space that commute both with each other and with local operators supported away from $\partial A$. It should be possible to analyze these assumptions perturbatively as well. We expect that these will indeed hold at this level, but verifying this is beyond the scope of our work.
The extension to include sources supported near $\partial A$ would clearly be of great interest. At linear order in $\epsilon$ the contribution to $S_A$ from such sources is governed by the first law and was effectively studied in [2,3]. But at this order there is no displacement of the bulk extremal surface. In particular, for ball-shaped regions $A$ the bulk HRT surface continues to coincide with the bifurcation surface of the corresponding event horizon; i.e., with the causal holographic information surface of [54], which is expected to compute some coarse-grained version of the CFT entropy [54][55][56]. In contrast, the two are distinguished at second order, so comparing bulk and CFT computations may give insight into the nature of the relevant coarse-graining. In particular, one might hope to either support or falsify the conjecture [56] that it corresponds to maximizing the entropy over all states for which certain one-point functions coincide with those of the original state.
In addition, comparing bulk and CFT computations to second order would derive or falsify the HRT conjecture at a non-trivial level. Even for static perturbations the fact that it avoids the replica trick would make this a useful complement to the Lewkowycz-Maldacena argument [12], and the perturbative method should be able to address general time-dependence to which [12] does not apply. And since second-order results are sensitive to the displacement of the HRT surface, they could in particular detect any possible motion of this surface in imaginary directions within the complexified AdS spacetime; i.e., they could help diagnose the possible role of complex extremal surfaces as explored in [14].
One reason that we have avoided such singular terms here is that they are in principle sensitive to the particular way that entanglement is to be defined in the CFT. Since the CFT is a gauge theory, this can involve a number of subtle issues [35][36][37][38][39][40][41][42][43][44]. Lewkowycz-Maldacena [12] suggests that the correct notion of CFT entropy is defined by the replica trick, which is precisely the computational tool we wish to avoid. However, the results of [35][36][37][38][39][40][41][42][43][44] also suggest that the various definitions of entropy differ only by a local boundary term that will cancel in computing the mutual information between pairs of regions $A$, $B$. One should thus be able to ignore such concerns in this context. We hope to compute the 'singular' second order terms and to explore the above issues in the near future, perhaps using a suitably-generalized version of the calculation in appendix A.

Acknowledgments
It is a pleasure to thank David Berenstein, William Donnelly, Eric Dzienkowski, Monica Guica, Tom Hartman, Veronika Hubeny, Ted Jacobson, Nima Lashkari, Juan Maldacena, Mukund Rangamani, Vladimir Rosenhaus, Mark Van Raamsdonk, and Aron Wall for helpful discussions and feedback. We especially thank Tom Faulkner for his comments on an early draft of this paper. This work was supported by the National Science Foundation under grant numbers PHY12-05500 and PHY15-04541, and by funds from the University of California. In addition, K.K. is supported by the NSF GRFP under Grant No. DGE-1144085. D.M. thanks the Aspen Center and its NSF Grant #1066293 for their hospitality during the discussions where certain aspects of this project were conceived. He also thanks the KITP for their hospitality during the final stages of the project, where his work was further supported in part by National Science Foundation grant number PHY11-25915.

A Computing the relative entropy
This appendix derives an explicit formula for the second order change $\delta^2 R_A$ in the relative entropy $R_A$ corresponding to an arbitrary change $\delta\rho_A$ in the reduced density matrix for $A$. This result is not directly used in the main text, other than in writing (3.2a). Our final expression (A.14) bears a striking resemblance to Eq. (C7) of [5]. In fact, (A.14) can also be derived from a straightforward generalization of the calculation leading to (C7). We present a different, somewhat more involved calculation here because we hope that this approach will be useful for analyzing the additional terms at order $\epsilon^2$ that arise when the sources in (2.10) do not vanish in a neighborhood of $\partial A$.
The setting for the calculation below is an arbitrary bipartite quantum system, meaning that it is the tensor product of a system on $A$ and one on $A^c$. As discussed in the main text, the actual system we study is not strictly of this form, though it can be treated as such at least under our assumption that each term in (2.10) vanishes in a neighborhood of $\partial A$. So a critical step in realizing the hope expressed in the paragraph above is understanding modifications that arise when this assumption fails.

A.1 The Baker-Campbell-Hausdorff Formula
We will compute $\delta^2 R_A$ using the BCH formula in the form (3.22). Since we work only to order $Y^2$, it will be useful to rewrite (3.22) as
where $C_n$ and $D_{n,k}$ are rational numbers. It was shown in [45] that
where the sum on the right hand side converges for $y \in (-2\pi, 2\pi)$. Note that $B_{2m+1} = 0$ for $m \geq 1$.
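The two facts about Bernoulli numbers quoted above can be checked directly. The following is a minimal sketch that does not attempt to reproduce the specific generating function of [45]; it assumes the standard convention $y/(e^{y}-1) = \sum_n B_n y^n/n!$ (so that $B_1 = -1/2$), and the function name `bernoulli` is chosen here for illustration.

```python
from fractions import Fraction
from math import comb, exp, factorial

def bernoulli(n_max):
    """Exact Bernoulli numbers B_0..B_{n_max} from the recurrence
    sum_{k=0}^{n} C(n+1, k) B_k = 0 for n >= 1, with B_0 = 1."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B

B = bernoulli(30)

# Odd Bernoulli numbers beyond B_1 vanish.
odd_vanish = all(B[2 * m + 1] == 0 for m in range(1, 15))

# Inside |y| < 2*pi the series converges to y / (e^y - 1).
y = 1.0
partial_sum = sum(float(B[n]) * y**n / factorial(n) for n in range(31))
closed_form = y / (exp(y) - 1.0)
```

The radius of convergence $2\pi$ reflects the nearest singularities of $y/(e^{y}-1)$ at $y = \pm 2\pi i$, so the partial sums settle quickly at $y = 1$ but would fail to converge for $|y| > 2\pi$.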
To compute $\delta^2 R_A$ we will also need the coefficients $D_{n,0}$. To our knowledge these coefficients have not previously been explicitly computed. But a straightforward application of the recursive technique developed in [45]
To obtain the second equality first note that for even $n$ the only non-vanishing terms in the sum are $k = 1$ and $k = n$, and that these terms cancel. For $n = 1$ the equality is easily checked by hand. For odd $n \geq 3$, only the even $k$ terms survive so $(-1)^{n-k} = -1$. The remaining sum is easily evaluated using a well known identity due to Euler and independently rediscovered by Ramanujan (see for example Eq.
None of the $D_{n,k\geq 1}$ terms from (A.1) contribute to (A.7) because they can only appear in the second term of (A.7) and
where the first line is straightforward to prove by induction. In the second equality we have inserted a factor of $e^{H_A} \rho_A = 1$ into the trace to convert it to an expectation value with respect to the reduced density matrix $\rho_A$. Since all of the operators inside the expectation value live on $A$, this is equivalent to the expectation value with respect to the full density matrix $\rho = |0\rangle\langle 0|$, which, as in the main text, is denoted by $\langle \ldots \rangle$. The last equality follows from $H_A |0\rangle = H_{A^c} |0\rangle$, the definition $K := H_A - H_{A^c}$, and the binomial expansion.
Similarly, the second trace in (A.11) can be rewritten in the form
where $f$ is the smooth function defined in (3.3). Note that here (and in previous expressions involving correlators) the operators $H_A$, $\delta\rho_A$ should be understood as tensor products with the identity operator on $A^c$. We computed the above sums using
The representation of $\log \rho$ used in e.g. appendix C of [5] may be useful for this purpose.
Note that since $K$ is Hermitian and $f(y) - f'(y) = \frac{-1 + y + e^{-y}}{(1 - e^{-y})^2} > 0$ (A.16) for real $y$, the eigenvalues of $f(K) - f'(K)$ are real and positive. This fact implies $\delta^2 R_A \geq 0$, with equality if and only if $\delta\rho_A e^{H_A}$ annihilates the vacuum. But since the restriction of $|0\rangle$ to $A$ is non-degenerate, this can occur only if $\delta\rho_A = 0$. Thus (A.14) is consistent with positivity of the relative entropy.
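The positivity claim in (A.16) is easy to probe numerically. The sketch below evaluates the right-hand side $(-1 + y + e^{-y})/(1 - e^{-y})^2$ on a grid of real $y$, using its small-$y$ expansion $1/2 + y/3$ near the removable singularity at $y = 0$. As a separate and clearly labeled assumption, it also checks that the choice $f(y) = y/(1 - e^{-y})$ reproduces this right-hand side as $f - f'$; that form of $f$ is a guess consistent with (A.16) and is not taken from (3.3).

```python
import math

def rhs_A16(y):
    """(-1 + y + exp(-y)) / (1 - exp(-y))**2 for real y."""
    if abs(y) < 1e-6:
        return 0.5 + y / 3.0          # leading terms of the Taylor expansion
    one_minus_exp = -math.expm1(-y)   # 1 - exp(-y), accurate for small y
    numerator = y + math.expm1(-y)    # -1 + y + exp(-y)
    return numerator / one_minus_exp**2

# Positivity on a grid of real y in [-50, 50].
grid = [k / 10.0 for k in range(-500, 501)]
all_positive = all(rhs_A16(y) > 0.0 for y in grid)

# ASSUMPTION (not from the source): f(y) = y / (1 - exp(-y)).
def f_assumed(y):
    return y / (-math.expm1(-y))

def f_prime_numeric(y, h=1e-5):
    # Central finite difference for f'(y).
    return (f_assumed(y + h) - f_assumed(y - h)) / (2.0 * h)

mismatch = abs(f_assumed(1.0) - f_prime_numeric(1.0) - rhs_A16(1.0))
```

Positivity can also be seen analytically: the numerator $e^{-y} + y - 1$ is a convex function of $y$ whose minimum value $0$ is attained only at $y = 0$, where the ratio tends to $1/2$.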