Linearity of Holographic Entanglement Entropy

We consider the question of whether the leading contribution to the entanglement entropy in holographic CFTs is truly given by the expectation value of a linear operator as is suggested by the Ryu-Takayanagi formula. We investigate this property by computing the entanglement entropy, via the replica trick, in states dual to superpositions of macroscopically distinct geometries and find it consistent with evaluating the expectation value of the area operator within such states. However, we find that this fails once the number of semi-classical states in the superposition grows exponentially in the central charge of the CFT. Moreover, in certain such scenarios we find that the choice of surface on which to evaluate the area operator depends on the density matrix of the entire CFT. This nonlinearity is enforced in the bulk via the homology prescription of Ryu-Takayanagi. We thus conclude that the homology constraint is not a linear property in the CFT. We also discuss the existence of entropy operators in general systems with a large number of degrees of freedom.


Introduction
Entropy is not a linear operator while area is, yet in gravity these two quantities are usually equated.
This was first observed in the context of black hole thermodynamics where it was shown that the entropy of a black hole is given by the expectation value of the area operator evaluated on its event horizon [1]. This operator is a nonlinear functional of the canonical variables of quantum gravity (the metric and conjugate momentum) and is understood to be a linear operator which maps states to states. Given that this entropy is a coarse-grained thermodynamic quantity, it seems plausible that it can be represented by a linear operator very much in the same way that the entropy of a gas can be represented by its energy. One should probably expect this property in systems with a thermodynamic limit and which are known to thermalize.
A more paradoxical relationship between entropy and area arises in the context of the AdS/CFT correspondence. This correspondence is a duality between string theories living in $(d+1)$-dimensional asymptotically Anti-de Sitter (AdS) space and certain $d$-dimensional conformal field theories (CFTs) which can be thought of as living on the boundary of AdS [2]. One way in which these two descriptions are connected is via the identification of the central charge of the CFT with the ratio of the AdS length to the Planck length raised to some positive power, $c \sim (L_{\rm AdS}/l_P)^{\#}$. This duality provides a nonperturbative definition of a certain class of theories of quantum gravity in asymptotically AdS spacetimes in terms of a certain class of CFTs.
An outcome of this duality is that the strong coupling and $c \to \infty$ limit of the CFT is described on the AdS (bulk) side by classical gravity with a gravitational constant $G_N \sim 1/c^{\#}$, demonstrating the strong/weak nature of the AdS/CFT duality. It is in this limit that a remarkably simple, albeit confusing, formula for the entanglement entropy of any region of the CFT was proposed [3]. It was suggested that, in static situations, the entanglement entropy of a subregion $R$ of the CFT is given by the area of the minimal area bulk surface $X$ anchored to the boundary of $R$, $\partial X = \partial R$, and homologous to $R$, denoted by $X \overset{h}{\sim} R$,
$$S_R = \min_{X \overset{h}{\sim} R} \frac{\mathrm{Area}(X)}{4 G_N}. \qquad (1.1)$$
We shall refer to this henceforth as the RT formula. This formula was proven in [4] under certain reasonable assumptions including the extension of the replica symmetry into the dominant bulk solution. It was also extended to the time dependent case in [5] where the minimal surface generalizes to a spacelike extremal surface. Another proposal for the time dependent case was presented in [6]; their prescription was to find the minimum area $X$ on every possible spatial slice containing the interval $R$, and then to pick out from this set the one with maximal area. Our focus here will be mostly on the static case.
In the same way that the Bekenstein-Hawking entropy receives corrections from entanglement of quantum fields across the horizon [1,7], the entanglement entropy of a region of the CFT also gets corrected [8]. This analogy was spelled out more generally in [9] which discusses further corrections to the entanglement entropy. However, all these corrections JHEP02(2017)074 are subleading in c, and are manifestly not given by the expectation value of linear operators. At leading order in c there will also be higher derivative corrections involving various curvature invariants which are linear operators like the area. Our focus in this paper is solely on the linear nature of the leading area term of the entanglement entropy, and so will center the discussion mainly on the RT formula.
In contrast to the notion of entropy in black hole thermodynamics, formula (1.1) equates the expectation value of the area operator with a truly microscopic measure of information. This microscopic measure, or entanglement entropy, is given by
$$S_R(|\psi\rangle) = -\mathrm{tr}\,\rho_R \ln \rho_R = \langle\psi|\,(-\ln\rho_R)\,|\psi\rangle, \qquad (1.2)$$
where $-\ln\rho_R$ is an operator on $R$ alone, and returns the correct entropy only for the specific state $|\psi\rangle$. One can try to extend this definition to apply to a basis of states $|\psi_i\rangle$ and construct the following entropy operator,
$$\hat S_R = \sum_i S_R(|\psi_i\rangle)\,|\psi_i\rangle\langle\psi_i|. \qquad (1.3)$$
This operator would produce the correct result for any element of the chosen basis. However, it will in general fail to do so for linear combinations. Take for example the two-qubit Hilbert space spanned by the product states $|ij\rangle$, with $i, j \in \{0, 1\}$. The entropy operator for a single qubit should have zero expectation value for any state in this basis. Since this statement is also true for any other product basis, we conclude (erroneously) that the entropy operator is zero, even though entangled superpositions of these product states have nonzero entropy. The confusing aspect of the RT formula is that it seems to make the replacement
$$-\ln\rho_R \;\to\; \frac{\hat A(X_{\min})}{4 G_N}, \qquad (1.4)$$
thus identifying the area operator with the modular Hamiltonian [10,11]. Just as $-\ln\rho_R$ was state dependent, the surface on which the area operator is evaluated, $X_{\min}$, depends on the dual geometry and consequently on the state. One might then be tempted to generalize the RT proposal to
$$S_R(|\psi\rangle) = \langle\psi|\hat{\mathcal A}|\psi\rangle, \qquad (1.5)$$
$$\hat{\mathcal A} \equiv \sum_i P_i\,\frac{A(X^i_{\min})}{4 G_N}\,P_i, \qquad (1.6)$$
where the projection operators, $P_i$, project onto subspaces of states with the same classical geometry, and $X^i_{\min}$ is the extremal surface in that geometry. We also removed the operator symbol from the area term due to the presence of the projection operators. Moreover, this construction assumes that we are working at leading order in the $1/c$ expansion. To this accuracy, different semi-classical states are orthogonal and this operator is block diagonal in this basis, allowing for no off-diagonal terms between states of different geometries.
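The two-qubit argument above can be checked numerically. The following is a minimal sketch in Python with NumPy; the helper names (`reduced_density_matrix`, `entanglement_entropy`, `S_op`) are illustrative choices and do not appear in the text. It builds the candidate entropy operator $\sum_i S_R(|\psi_i\rangle)\,|\psi_i\rangle\langle\psi_i|$ on the product basis $|ij\rangle$ and compares its expectation value in a Bell state with the true entanglement entropy of one qubit.

```python
import numpy as np

def reduced_density_matrix(psi):
    """Reduced state of the first qubit for a two-qubit pure state (length-4 vector)."""
    m = psi.reshape(2, 2)      # m[i, j] is the amplitude of |ij>
    return m @ m.conj().T      # partial trace over the second qubit

def entanglement_entropy(psi):
    """Von Neumann entropy (in nats) of the first qubit."""
    evals = np.linalg.eigvalsh(reduced_density_matrix(psi))
    evals = evals[evals > 1e-12]           # drop zero eigenvalues (0 ln 0 = 0)
    return float(-np.sum(evals * np.log(evals)))

# Product basis |00>, |01>, |10>, |11>.
basis = np.eye(4)

# Candidate linear "entropy operator": correct on every basis element by construction.
S_op = sum(entanglement_entropy(b) * np.outer(b, b.conj()) for b in basis)

# Bell state (|00> + |11>)/sqrt(2), a superposition of two basis elements.
bell = (basis[0] + basis[3]) / np.sqrt(2)

linear_guess = float(np.real(bell.conj() @ S_op @ bell))   # expectation value of S_op
true_entropy = entanglement_entropy(bell)                  # actual entropy of one qubit

print("expectation value of candidate operator:", linear_guess)
print("true entanglement entropy:", true_entropy, "(ln 2 =", np.log(2), ")")
```

The candidate operator vanishes identically on the product basis, so its expectation value in the Bell state is zero, while the true entropy is $\ln 2$. This is the sense in which no basis-independent linear operator reproduces the entropy on a small Hilbert space.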

However, as we will argue below, a minimal area operator can be constructed as a gauge invariant linear operator in the Hilbert space. Thus, it is sufficient to generalize RT by simply writing the area operator as
$$\hat{\mathcal A} \equiv \frac{\hat A(\hat X_{\min})}{4 G_N}, \qquad (1.7)$$
where now $\hat X_{\min}$ is operator valued and specifies the location of the minimal area surface in any geometry. We will investigate how we expect the off-diagonal elements of this operator to behave. The goal of this paper is to study the applicability of this interpretation of RT beyond semi-classical states. Since the question in focus is about linearity, we investigate what both sides of equation (1.5) produce for states dual to macroscopic superpositions of distinct bulk geometries. We are thus considering an 'extended RT proposal' which asserts that the entanglement entropy of a subregion in the CFT is still given by the expectation value of the area operator within states dual to superpositions of geometric states. This extended RT proposal does not follow trivially from the RT formula (or its derivations) for a single geometry; we will nevertheless provide evidence in favor of the extended RT proposal within certain limits. Since it is crucial to compare the calculation on both sides of the duality, we will focus on the context of AdS$_3$/CFT$_2$, relying heavily on computational techniques of holographic 1+1 dimensional CFTs. Our main probe will be the entanglement entropy of a single interval on the cylinder. What we will find is that the entropy behaves like a linear operator within a large class of subspaces of semi-classical states of dimension less than $e^{O(c)}$.
Is this result general? We argue yes. By analogy with thermodynamics, where changes in entropy are related to fluxes of energy (a manifest observable), our proposal is that a thermodynamic or large N limit is sufficient to have entropies which behave as the approximate expectation value of a linear operator. Our result that entropies average in two dimensional holographic CFTs supports this proposal. We also exhibit an information theoretic setting where entropy behaves as the expectation of a linear operator. The key idea is that in an appropriate thermodynamic limit the entropy can be determined by performing a measurement which only weakly disturbs the state. Finally, we discuss a number of related issues including the non-linearity of the Renyi entropy, the precise limits of linearity, and the role of strong coupling.
A similar proposal has been sketched by [12]; we discuss in more detail the relationship between their proposal and our work in the discussion.

The area operator of the Ryu-Takayanagi proposal
In this section, we define the quantities that appear on the right hand side of equation (1.5) in some more detail. Firstly, this formula was checked in 1+1 dimensional holographic CFTs for many states dual to semi-classical geometries, i.e. small quantum fluctuations on a fixed gravitational background, where the notion of area is unambiguously well defined. The natural interpretation of these states is as coherent states constructed from the metric and conjugate momenta which are highly peaked about some classical solution of Einstein's equations. The fluctuations about the classical solution are suppressed by some negative power of the central charge, $c$.

A gauge invariant area operator
The area operator $\hat A(X_{\min})$ is an operator in quantum gravity and needs to be defined in a gauge invariant way [14]. This is usually ensured by defining the operator with respect to something fixed under gauge transformations [15]. In the specific case of AdS, pure gauge diffeomorphisms are those that keep the boundary conditions of AdS fixed [16,17]. Thus, $\hat A(X_{\min})$ needs to be completely specified by boundary data to be gauge invariant. In particular, the curve $X_{\min}$ needs to be localized in a gauge invariant way, i.e. determinable purely from some boundary data.
The bulk interpretation of the area operator (1.7) acting on a state can be achieved with the following prescription. Starting with a state $|\psi\rangle$ of the entire CFT, one can, in principle, construct the background geometry with something like the HKLL formalism [18][19][20] as expectation values of geometric bulk fields. We can then consider all co-dimension 2 surfaces on this geometry that are anchored on the boundary of $R$ and homologous to it, and find the specific one that extremizes the area. If there are multiple such surfaces we simply take the one with the smallest area. This locates $X_{\min}$ in a gauge invariant way. Since the area operator is then evaluated on this surface, it is also gauge invariant.
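As a concrete illustration of the "scan over anchored surfaces and keep the smallest area" step, here is a minimal numerical sketch. It makes several assumptions that are not taken from the text: it uses pure AdS$_3$ in Poincaré coordinates, $ds^2 = (dx^2 + dz^2)/z^2$ with unit AdS radius, rather than the cylinder considered in the rest of the paper; it restricts to a hypothetical one-parameter family of half-ellipses anchored at the endpoints of an interval of length `l`; and the function name `curve_length` and the cutoff `eps` are illustrative.

```python
import numpy as np

def curve_length(b, l=1.0, eps=1e-3, n_pts=200001):
    """Length of the half-ellipse x = (l/2) cos(t), z = b sin(t) in the metric
    ds^2 = (dx^2 + dz^2) / z^2, with the curve cut off at z = eps."""
    a = l / 2.0
    t0 = np.arcsin(eps / b)                     # parameter value where z = eps
    t = np.linspace(t0, np.pi / 2.0, n_pts)     # half of the curve, by reflection symmetry
    integrand = np.sqrt(a**2 * np.sin(t)**2 + b**2 * np.cos(t)**2) / (b * np.sin(t))
    half = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))  # trapezoid rule
    return 2.0 * half

l, eps = 1.0, 1e-3
heights = np.linspace(0.25, 1.0, 31)            # trial values of b, including b = l/2
lengths = np.array([curve_length(b, l, eps) for b in heights])

best_b = heights[np.argmin(lengths)]            # height of the shortest trial curve
min_length = lengths.min()

print("minimizing height b:", best_b, "(semicircle has b = l/2 =", l / 2.0, ")")
print("minimal length:", min_length, " 2 ln(l/eps):", 2.0 * np.log(l / eps))
```

Within this family the shortest curve is the semicircle $b = l/2$, which is the geodesic, and its regulated length is $2\ln(l/\epsilon)$ up to corrections that vanish with the cutoff. Combined with the Brown-Henneaux relation $c = 3L_{\rm AdS}/2G_N$, dividing by $4G_N$ gives the familiar $\frac{c}{3}\ln(l/\epsilon)$.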

The boundary support of the area operator
The next thing to determine is the support of $\hat A(X_{\min})$ in the CFT. In the bulk, this operator lies on the edge of the entanglement wedge of the region $R$. The entanglement wedge is defined as the bulk domain of dependence of a spacelike surface bounded by $R$ and $X_{\min}$ [21]. That all operators within the entanglement wedge have representations only on $R$ has been argued for in [22], and proven recently in [23]. We will take this point of view for the rest of this paper. One might worry about $\hat A(X_{\min})$ acting on the edge of the entanglement wedge and whether that really should be considered as part of the wedge. This can be dealt with by defining $\hat A(X_{\min})$ in a limiting sense: follow the same prescription for a slightly smaller interval, whose area operator is guaranteed to lie within the entanglement wedge of $R$ [13], and then take the limit as the two intervals become the same size.

The linearity of the area operator
A commonly raised question about the minimal area operator of the RT proposal is whether it is state dependent in the same way as the entanglement entropy. We argue that while it is certainly state dependent with regards to picking a different surface for each state it nevertheless is still a linear operator. This is a mild form of state dependence, otherwise known as background dependence [24], unlike what is found with the entanglement entropy.

We described in the introduction, around equation (1.6), how the minimal area operator can be constructed to leading order in the 1/c expansion by using projection operators that project onto subspaces with the same background geometry. This operator is by construction a linear operator and is block diagonal in a semi-classical basis. Thus, it contains no off-diagonal terms between states of different background geometry.
Here we want to present a different definition of the minimal area operator which allows for the presence of off-diagonal terms. We make no statement about the uniqueness of this construction but believe that all definitions will behave more or less in the same way. In particular, they should all agree with 1.6 in the infinite c limit.
First, let us write the area operator that is evaluated on some surface $\hat X$. We will shortly specify more carefully how $\hat X$ is defined. The area is
$$\hat A(\hat g, \hat X) = \int d\hat y\, \sqrt{\hat g(\hat y)}. \qquad (2.1)$$
The measure $d\hat y$ should be understood as determining the domain of the integral as localized on the surface $\hat X$. In order for this quantity to be well defined, we need to specify $\hat X$ in a gauge invariant way. This can be achieved by pinning down its location relationally in terms of proper distances to some boundary points $b$ via an operator relation $d_{\hat g}\big(\hat y, b; \hat\theta(b)\big) = \hat f(b)$; the metric dependence comes in from the definition of the proper distance between $\hat y$ and $b$. The operator $\hat\theta(b)$ specifies along which geodesic to travel into the bulk and the operator $\hat f(b)$ determines the amount of proper distance required to reach the surface. Together, they determine the shape and location of the bulk surface and will be determined shortly by the minimization condition. Inverting this relation gives
$$\hat y = \hat y\big(\hat g, \hat f(b), \hat\theta(b)\big). \qquad (2.2)$$
This process is only consistent if $\hat y$ is an operator since it depends on the metric field operator, $\hat f$, and $\hat\theta$. Plugging this back into (2.1) we obtain
$$\hat A(\hat g, \hat f, \hat\theta) = \int db\; J_{\hat g}\, \sqrt{\hat g\big(\hat y(b)\big)}, \qquad (2.3)$$
where $J_{\hat g}$ is a Jacobian factor which depends on $\hat g$, $\hat f$, and $\hat\theta$. Thus, we can think of the area operator as a function of the background metric $\hat g$ and of $\hat f$ and $\hat\theta$ which specify the surface. To get the minimal area operator we simply require that it is minimal with respect to $\hat f$ and $\hat\theta$, as an operator equation. The next step would be to solve for $\hat f$ and $\hat\theta$ in terms of $\hat g$ and plug the result back into (2.3). This procedure would ultimately produce a minimal area operator as a function purely of the metric.
To be clear, the construction above is formal and, for example, involves non-polynomial functions of the metric. Although it is beyond the scope of our work, it is possible that there is a fully non-perturbative definition of the bulk quantum gravity, in which case the formal manipulations above might yield a non-perturbatively defined linear area operator. Alternatively, we may work perturbatively around semi-classical solutions, in which case the above formal manipulations can be used to define an operator order-by-order in perturbation theory. Such a perturbative construction is sufficient for most of our statements and amounts to working with a space of states defined on top of the quantum state describing the classical solution. In the context of AdS/CFT, an intriguing possibility is that there exists a CFT operator defined on the whole Hilbert space which approximately reduces to the perturbatively defined area operator around any given saddle.

The area operator on superpositions - A prediction
Having shown that the RT area operator computes a physical gauge invariant quantity in the bulk, it is plausible to assume that it is given by a linear operator when acting on any subspace spanned by semi-classical states. The goal of this subsection is to understand the structure of the off-diagonal components of the area operator within such a subspace.
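The correct way to think of the area term appearing on the right hand side of RT is as the saddle point evaluation of the expectation value of the area operator,
$$\langle \hat A \rangle = \frac{1}{Z}\int \mathcal{D}g\; A[g]\, e^{-I[g]} \approx A[g_s],$$
where $g_s$ is the dominant saddle point of the partition function $Z = \int \mathcal{D}g\, e^{-I[g]}$. Note that this is an $O(c^0)$ number; here we are evaluating the expectation value of the area operator without the factor of $1/G_N$. One way to see this is via the generating function of moments of $A$,
$$Z[J] = \int \mathcal{D}g\; e^{-I[g] + J A[g]}, \qquad \langle \hat A^k \rangle = \frac{1}{Z}\,\partial_J^k Z[J]\Big|_{J=0} \approx A[g_s]^k,$$
so that the variance $\langle \hat A^2\rangle - \langle \hat A\rangle^2$ vanishes at leading order. Thus, semi-classical states that can be prepared using the path integral become eigenstates of the area operator in the infinite $c$ limit.² The rest of this section will investigate the nature of the suppression of the off-diagonal elements of the area operator.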
States with energy $O(c^0)$. Consider first the subspace of states of energy $O(c^0)$. All of these states are dual to pure AdS with a very diffuse gas of particles or possibly black holes whose mass does not scale with $c$. Einstein's equations predict that the deformation of the geometry away from pure AdS caused by this stress energy will be suppressed by $1/c$. To see this, consider the linearized form of Einstein's equations
$$\mathcal{D}^{\mu\nu}{}_{\alpha\beta}\, h_{\mu\nu} = G_N\, T_{\alpha\beta}, \qquad (2.9)$$
where $h_{\mu\nu}$ is a perturbation of the metric, $T_{\alpha\beta}$ is the stress energy of matter in AdS, and $\mathcal{D}$ is some differential operator. The background spacetime is determined by the sourceless Einstein equations. The deformation of the area of a surface away from the background value is controlled by $h$ as
$$\delta A \sim \int_X dy\; h(y), \qquad (2.10)$$
$$h_{\mu\nu}(y) = G_N \int dx\; K_{\mu\nu}{}^{\alpha\beta}(y, x)\, T_{\alpha\beta}(x), \qquad (2.11)$$
where $K$ is the Green's function solving (2.9). To estimate the cross terms, we first promote equations (2.9)-(2.11) to operator equations. Then the area operator will have the schematic form
$$\hat A = A_0 + G_N \int_X dy \int dx\; K(y, x)\, \hat T(x), \qquad (2.12)$$
with $A_0$ the area in the background geometry, and we can directly compute its matrix elements. These will be
$$\langle i|\hat A|j\rangle = A_0\,\delta_{ij} + G_N \int_X dy \int dx\; K(y, x)\, \langle i|\hat T(x)|j\rangle. \qquad (2.13)$$
Since the matrix elements of the stress tensor are $O(c^0)$ within this subspace, we conclude that the off-diagonal elements in this subspace are suppressed by $G_N \sim 1/c$. Notice also that the eigenvalues become degenerate in this limit. If we consider an arbitrary state within this low energy subspace, one might worry that the small off-diagonal terms could potentially add up and compete with the diagonal terms. However, due to the sparseness condition on the CFT, the dimension of this subspace is not large enough to ever make the off-diagonal terms matter; the number of states of energy $O(c^0)$ is $O(c)$ and so the off-diagonal contribution will always be $O(c^{-1})$. Thus we conclude that the area of the minimal surface for any state in this subspace is the same to leading order in $c$.
² The same result can be obtained by taking $c \to \infty$ with $J/c$ fixed, differentiating with respect to $J/c$, and then taking $J/c \to 0$. If we differentiate but do not set $J/c \to 0$, then we are computing the fluctuation of the area operator around a different saddle point. These two ways of computing the fluctuation around the original saddle point will agree provided the limits $c \to \infty$ and $J \to 0$ commute.
States with energy scaling with $c$. At energies scaling with the central charge, it is characteristic of holographic theories to have a fairly dense spectrum possibly admitting a statistical description. Here we have in mind using the eigenstate thermalization hypothesis (ETH) [25,26] to conjecture a form for the area operator at high energies. ETH states that the expectation value of a suitably coarse operator in an energy eigenstate is given by its microcanonical average. This statement is supposed to hold for states with a finite energy density, meaning
$$\lim_{c\to\infty} \frac{E - E_g}{c} > 0, \qquad (2.14)$$
where $E_g$ is the ground state energy. Since the notion of geometry is expected to be an emergent coarse phenomenon of holographic theories, one would expect the spectrum of the operator which probes this geometry to be dictated by ETH. Assuming ETH, the form of the area operator in an energy eigenstate basis at high energies will be
$$\hat A_{\alpha\beta} = A(E_{\alpha\beta})\,\delta_{\alpha\beta} + e^{-S(E_{\alpha\beta})/2}\, f(E_{\alpha\beta})\, R_{\alpha\beta}, \qquad (2.15)$$
where $A$ and $f$ are smooth functions of the average energy, $R_{\alpha\beta}$ is an erratic function of $\alpha$ and $\beta$ with zero mean and unit average magnitude, and $S(E_{\alpha\beta})$ is the logarithm of the number of states between $E_\alpha$ and $E_\beta$.
To get a sense of the structure of energy eigenstates, we follow Hawking and Page [27] and consider a system composed of thermal gas and black holes in more than three dimensions. Let us focus on a microcanonical ensemble of states centered around an energy $E$ with width of order $c^0$. The dominant state within this ensemble can be determined by comparing the number of states, or the entropy, of the possible configurations with energy $\sim E$. In comparing a thermal gas of light particles in AdS and a black hole, one finds four possible phases. Below some energy $E_0$ all black holes evaporate and the dominant state is a thermal gas. Above a higher energy $E_2$ all configurations of gas collapse to form a black hole. Between these two energies there exist stable configurations of either a gas or a small black hole, but which configuration dominates depends on the energy. Across some energy $E_1$ within this window the dominant configuration switches from gas to black hole as the energy is increased. The restriction to greater than three dimensions arises because there are no small black holes in AdS$_3$, but in more complicated examples coming from string theory the phase structure can be much richer and can include small "enigmatic" black holes [28,29].
However, we must be cautious in applying ETH reasoning to microcanonical phases with both gas and black hole states because the energies involved scale like $c^a$ with $a < 1$, so these states do not lie within the traditional regime of validity of ETH. Very large black holes, with energy scaling like $c$, always have $E > E_2$ and hence reside in a regime where the only stable solutions are black holes. For such energies it is plausible that all microstates "look the same" geometrically and have small off-diagonal matrix elements for the area operator in accord with ETH. Assuming also that the area operator is a coarse operator, the off-diagonal matrix elements of the area operator can be neglected until we consider an exponentially large superposition of microstates.
For states of intermediate energy we cannot make as strong a statement. We would expect matrix elements of the area operator between different energy eigenstates to be at least of order $O(c^{-1})$, so that one can still superpose a small number of microstates while neglecting off-diagonal matrix elements. It is not even clear if the microstates are geometric at intermediate energies, say between $E_0$ and $E_2$. It is possible that ETH could still apply with a different notion of energy density, i.e. keeping $(E - E_0)/c^a > 0$ as $c \to \infty$. It might also be possible to construct sets of wave packets, each consisting of many microstates, such that the corresponding states are approximately stationary (on shorter than exponential times) and are approximately geometrical, being either approximately a black hole or approximately a thermal gas. Within sets of such approximate black hole states, say, we might again suspect that the matrix elements of the area operator are exponentially small.
Summary and prediction. We have presented plausible reasons for thinking that the area operator behaves as a coarse operator and should have suppressed off-diagonal components. We discussed how the area operator maintains the same result for any state within a low energy subspace of energy $O(c^0)$. At higher energies something different happens. Consider the expectation value of the area operator in an arbitrary state $|\psi\rangle = \sum_\alpha c_\alpha |\alpha\rangle$ within a small shell of high energy well above the Hawking-Page transition. Using the ETH form of the area operator (2.15) this is
$$\langle\psi|\hat A|\psi\rangle = A(E) + e^{-S(E)/2} f(E) \sum_{\alpha,\beta} c_\alpha^* c_\beta\, R_{\alpha\beta},$$
where we have assumed that the functions $A$, $S$, and $f$ are more or less constant within the considered energy window. Recall that the matrix $R_{\alpha\beta}$ oscillates wildly as a function of its indices. Thus, for an arbitrary state with random $c_\alpha$'s the sum above will be highly suppressed. In fact, even if we pick all the $c_\alpha$'s to be equal, it would still not contribute. The only way to deviate from the microcanonical average is by carefully choosing the coefficients to correlate with the fluctuations in $R_{\alpha\beta}$. Even with this fine tuning, this sum can be at most of order $M^{1/2}$, where $M$ is the number of states in the superposition. In order to deviate by an order one amount from the microcanonical average, the state must therefore consist of a finely tuned superposition involving $e^{S(E)}$ states.
The result of this subsection is that as long as we don't consider finely tuned states of $e^{O(c)}$ terms, the expectation value of the area operator will simply be the average of the area in each branch of the wavefunction. Therefore, we can combine this with the RT proposal and make a prediction for the behavior of entropy within such a superposition. In particular we predict that
$$S_R\Big(\sum_i \alpha_i |\psi_i\rangle\Big) = \sum_i |\alpha_i|^2\, S_R(|\psi_i\rangle)$$
for a superposition of semiclassical states $|\psi_i\rangle$. We will confirm this prediction in the following sections.
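The $M^{1/2}$ scaling of the fine-tuned sum can be illustrated with a toy random-matrix model. The sketch below is an assumption-laden stand-in, not the CFT computation: it models $R_{\alpha\beta}$ as a real symmetric Gaussian random matrix with zero mean and unit-variance off-diagonal entries, and the names `R`, `quad` and the choice $M = 400$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 400                                    # number of microstates in the superposition

# Toy model for R_{alpha beta}: real symmetric, zero mean, unit-variance off-diagonal entries.
G = rng.standard_normal((M, M))
R = (G + G.T) / np.sqrt(2.0)

def quad(c):
    """sum_{alpha, beta} c_alpha^* c_beta R_{alpha beta} for the normalized state c."""
    c = c / np.linalg.norm(c)
    return float(c @ R @ c)

random_state = quad(rng.standard_normal(M))   # generic coefficients
uniform_state = quad(np.ones(M))              # all coefficients equal

# Fine-tuned coefficients: the eigenvector of R with the largest eigenvalue.
evals, evecs = np.linalg.eigh(R)
tuned_state = quad(evecs[:, -1])

print("random coefficients :", random_state)
print("equal coefficients  :", uniform_state)
print("fine-tuned          :", tuned_state, " vs sqrt(M) =", np.sqrt(M))
```

In this model the generic and equal-coefficient sums stay of order one, while the optimally tuned sum is the largest eigenvalue of `R`, which grows like $\sqrt{M}$. Multiplying by the $e^{-S(E)/2}$ prefactor, an order one deviation needs $M \sim e^{S(E)}$, in line with the estimate above.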

How to compute entanglement entropy in 1 + 1 CFTs
Let us review how entanglement entropy of subregions is computed in 1+1 dimensional CFTs. We describe the procedure for arbitrary subregions in general states and discuss the simplifications which occur in holographic CFTs. We will explicitly perform the calculation for a single interval in a primary state. This will mostly be a summary of [32,33].

Entanglement entropy and the replica trick
The entanglement entropy, also called the von Neumann entropy, of a subsystem $R$ of a quantum system is given by
$$S_R = -\mathrm{tr}\,\rho_R \ln \rho_R, \qquad (3.1)$$
where $\rho_R$ is the density matrix of $R$ obtained by tracing over the rest of the system, $\rho_R = \mathrm{tr}_{\bar R}\,|\psi\rangle\langle\psi|$. This quantity is usually technically difficult to compute in a quantum field theory due to the logarithm, but can be simplified by using the so-called 'replica trick' to re-express it as
$$S_R = \lim_{n\to 1} S_R^n, \qquad S_R^n = \frac{1}{1-n}\ln \mathrm{tr}\,\rho_R^n, \qquad (3.2)$$
where $S_R^n$ is called the $n$th Renyi entropy of $R$. Since the trace of any density matrix is one and all its eigenvalues are non-negative, one can show that the Renyi entropies are absolutely convergent and analytic for all $\mathrm{Re}[n] > 1$. This justifies the continuation of $n$ and allows one to represent the entropy as
$$S_R = -\partial_n\, \mathrm{tr}\,\rho_R^n\Big|_{n=1}. \qquad (3.3)$$
Thus, the problem of finding the entropy has been reduced to computing the trace of the $n$th power of the density matrix as an analytic function of $n$. This latter task can be implemented by evaluating the partition function of the theory on the replicated manifold $C_n$ with the different sheets identified across the interval $R$ [32]. With the appropriate normalization this is
$$\mathrm{tr}\,\rho_R^n = \frac{Z_n}{Z_1^n},$$
where $Z_1$ is the partition function of the CFT in question. When computing the entropy in an arbitrary state $|\psi\rangle$, $Z_1$ is given by $\langle\psi|\psi\rangle$. $Z_n$ is the 'replicated' partition function obtained by gluing $n$ copies of the original CFT along the region $R$. Note that the replicated density matrix satisfies the condition that $\mathrm{tr}\,\rho_R^n \to 1$ as $n \to 1$.
It turns out there is a further simplification for computing this quantity. Let us consider the case where $R$ is a subregion composed of $N$ disjoint intervals. By considering the expectation value of the stress tensor within the replicated partition function [32], one can show that $Z_n$ can be written as the $2N$-point function of so-called twist operators,
$$Z_n = \langle\psi|^{\otimes n}\, \prod_{i=1}^{N} \sigma_n(u_i)\,\tilde\sigma_n(v_i)\, |\psi\rangle^{\otimes n},$$
where this expectation value is evaluated in the orbifold theory on $C_n$ and $|\psi\rangle^{\otimes n} = \bigotimes_{i=1}^{n} |\psi\rangle$. The coordinates $u_i$ and $v_i$ are the endpoints of the intervals. In this theory, the twist operators behave as primary operators of dimensions
$$h_{\sigma_n} = \bar h_{\sigma_n} = \frac{c}{24}\left(n - \frac{1}{n}\right)$$
and vanishing spin.
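The replica representation of the entropy can be sanity-checked on a finite-dimensional example. The sketch below assumes nothing about the CFT: it takes a hypothetical random spectrum `p` for a density matrix and checks that the Renyi entropies $\frac{1}{1-n}\ln \mathrm{tr}\,\rho^n$ approach the von Neumann entropy as $n \to 1$, and that $-\partial_n \mathrm{tr}\,\rho^n$ at $n = 1$ gives the same number. The function names are illustrative.

```python
import numpy as np

def renyi_entropy(p, n):
    """n-th Renyi entropy ln(tr rho^n) / (1 - n) from the eigenvalues p of rho (n != 1)."""
    return float(np.log(np.sum(p ** n)) / (1.0 - n))

def von_neumann_entropy(p):
    """-tr(rho ln rho) from the eigenvalues p of rho."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(1)
p = rng.random(8)
p /= p.sum()                      # eigenvalues of a normalized density matrix

S = von_neumann_entropy(p)
for n in (2.0, 1.1, 1.01, 1.001):
    print(f"n = {n:<6} Renyi entropy = {renyi_entropy(p, n):.6f}")
print(f"von Neumann entropy  = {S:.6f}")

# Derivative form: S = -d/dn tr(rho^n) at n = 1, via a central finite difference.
step = 1e-6
S_from_derivative = float(-(np.sum(p ** (1 + step)) - np.sum(p ** (1 - step))) / (2 * step))
print(f"-d/dn tr rho^n |_(n=1) = {S_from_derivative:.6f}")
```

The Renyi entropies decrease monotonically in $n$ and converge to the von Neumann entropy from below as $n \to 1^+$, which is the finite-dimensional version of the analytic continuation used in the text.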

A single interval example
Now we specialize to computing the entanglement entropy of a single interval on the cylinder in an excited state. We will consider an arbitrary primary state prepared in the usual way using the state-operator correspondence,
$$|\psi\rangle = O(0)\,|0\rangle,$$
where $O$ is an arbitrary primary operator of dimensions $(h, \bar h)$. The conjugate of this state is defined as
$$\langle\psi| = \lim_{z,\bar z\to\infty} z^{2h}\,\bar z^{2\bar h}\,\langle 0|\,O^\dagger(z,\bar z),$$
which ensures the state is normalized to one. The trace of the replicated density matrix on an interval $R$ in this state is
$$\mathrm{tr}\,\rho_R^n = \langle O_n^\dagger(\infty)\,\sigma_n(z,\bar z)\,\tilde\sigma_n(1)\,O_n(0)\rangle, \qquad O_n \equiv O^{\otimes n}. \qquad (3.11)$$
The location of $z$ and $\bar z$ will be restricted to the unit circle on the x-plane; this chooses a preferred, and natural, time slicing of the CFT on the cylinder. The locations of these operators are shown in figure 1. This is a four point function of primary operators in the orbifold theory on $C_n$. We can use the techniques of conformal blocks to compute this expression. By performing an operator product expansion (OPE) in the t-channel, fusing the two tensor product operators together and the two twist operators together, we get
$$\langle O_n^\dagger(\infty)\,\sigma_n(z,\bar z)\,\tilde\sigma_n(1)\,O_n(0)\rangle = \sum_p C^{p}_{O_n O_n^\dagger}\, C^{p}_{\sigma_n \tilde\sigma_n}\, \mathcal{F}(c, h_p, h_i, 1-z)\, \bar{\mathcal{F}}(c, \bar h_p, \bar h_i, 1-\bar z),$$
where the sum over $p$ runs over all the primary operators of the theory. Conformal invariance fixes the contribution from all the descendant operators, which are implicitly resummed to give the functions $\mathcal{F}$ and $\bar{\mathcal{F}}$. These functions are known as 'conformal blocks' and are functions of the dimensions $h_i$ of all the operators appearing in the four point function and of the internal primary operator, $h_p$. We see that the entanglement entropy depends on the details of the theory through the values of the OPE coefficients $C^{k}_{ij}$. In holographic theories, those with large central charge and a sparse spectrum of light operators, such a four point function is dominated by the identity block contribution. The OPE coefficient of this contribution is simply 1, giving a universal result for holographic theories. We should note, however, that this dominance of the identity block fails for states composed of many, $O(c)$, light operators as first observed in [34] in the context of supersymmetric CFTs. In this case the OPE coefficients between light operators and the highly composite operator will be proportional to the number of light operators in the composite and will scale as some positive power of $c$; one can think of this as simply the expectation value of the light operator in the state created by the composite. These non-identity contributions can then potentially compete with the identity block. We will assume in this paper that we are working with states for which the identity block dominates.

Let us specialize to the case where $O$ is a heavy operator of no spin, i.e. $h = \bar h \sim c$. In the bulk, this dimension translates to the total mass of the spacetime, up to a factor of the AdS radius. As discussed in section 2.4, for large enough operator dimension the dominant configuration in the bulk is a black hole [27]. Since the state is pure, this is more precisely a black hole microstate. The exterior of this black hole is described to a very good approximation by the standard BTZ geometry.
For $n$ greater than one, (3.11) is a four point function of heavy operators. The form of the identity block in this case is actually not known in closed form, but a perturbative expansion in $1-z$ can be performed [36]. However, a nonperturbative result can be obtained in the limit as $n \to 1$ [37]. Because the dimension of the twist operators is proportional to $n-1$, the four point function in this limit becomes that of two heavy and two light operators,
$$\langle O_n^\dagger(\infty)\,\sigma_n(z,\bar z)\,\tilde\sigma_n(1)\,O_n(0)\rangle \approx \mathcal{F}(c, 0, h_i, 1-z)\,\bar{\mathcal{F}}(c, 0, \bar h_i, 1-\bar z),$$
where we took $n = 1 + \epsilon$, and restricted to the identity block term. Remember that the blocks are functions of the dimensions of the $O$'s and the twist operators. As discussed in [36,37], this can be obtained in closed form by solving a differential equation with nontrivial monodromy. The leading contribution to this four point function is
$$\mathcal{F}(c, 0, h_i, 1-z) \approx \left(\frac{\alpha\, z^{\frac{\alpha-1}{2}}}{1 - z^{\alpha}}\right)^{2 h_{\sigma_n}}, \qquad (3.13)$$
where $\alpha = \sqrt{1 - 24h/c}$ and $h_{\sigma_n} \approx c\,\epsilon/12$ is the dimension of the twist operators. Using eq. (3.3) gives the entanglement entropy
$$S_R = \frac{c}{3}\ln\left[\frac{\beta}{\pi\,\epsilon_{UV}}\sinh\left(\frac{\pi l}{\beta}\right)\right], \qquad (3.14)$$
where $\beta \equiv 2\pi/\sqrt{24h/c - 1}$, $l$ is the size of the interval, and $\epsilon_{UV}$ is the UV cut-off. For $l < \pi$, this is precisely the answer one would get for the entanglement entropy of an interval in the thermal state with inverse temperature $\beta$. This is a manifestation of the fact that the geometry outside this BTZ microstate is almost identical to that in the BTZ geometry. Naively continuing this expression to $l > \pi$ actually gives the wrong result for the entropy in that regime. In fact, since the state we considered is a rotationally symmetric pure state we should expect the entropy to be symmetric under $l \leftrightarrow 2\pi - l$. Since the state is pure, the entropy should start to decrease once the interval encompasses more than half of the system. This is not the case for (3.14).
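The step from the identity block to the thermal-looking entropy can be spelled out as follows. This is a sketch under stated assumptions: it takes the standard heavy-heavy-light-light form of the vacuum block, $\big(\alpha z^{(\alpha-1)/2}/(1-z^\alpha)\big)^{2h_\sigma}$ with its antiholomorphic counterpart, places the twist operator at $z = e^{il}$, and uses $h_\sigma \approx c\,\epsilon/12$ for $n = 1+\epsilon$; the normalization conventions may differ from those of [36,37].

```latex
\begin{align*}
% heavy operator: alpha is purely imaginary, alpha = 2 pi i / beta
\alpha &= \sqrt{1 - \tfrac{24h}{c}} = \frac{2\pi i}{\beta},
\qquad z = e^{il} \;\Rightarrow\; z^{\alpha/2} = e^{-\pi l/\beta}, \\
% rewrite the block in terms of sinh
\frac{\alpha\, z^{\frac{\alpha-1}{2}}}{1 - z^{\alpha}}
 &= \frac{\alpha\, z^{-1/2}}{z^{-\alpha/2} - z^{\alpha/2}}
  = \frac{\alpha\, e^{-il/2}}{2\sinh(\pi l/\beta)}, \\
% holomorphic times antiholomorphic block: phases cancel and alpha^2 (-1) = 4 pi^2 / beta^2
\mathcal{F}\,\bar{\mathcal{F}}
 &\approx \left[\frac{\pi}{\beta\,\sinh(\pi l/\beta)}\right]^{4h_\sigma}
 \;\Rightarrow\;
 \mathrm{tr}\,\rho_R^{\,1+\epsilon} \approx
 \left[\frac{\beta}{\pi\,\epsilon_{UV}}\sinh\!\left(\frac{\pi l}{\beta}\right)\right]^{-c\,\epsilon/3}, \\
% entropy from the n-derivative at n = 1
S_R &= -\partial_n\, \mathrm{tr}\,\rho_R^{\,n}\Big|_{n=1}
 = \frac{c}{3}\ln\left[\frac{\beta}{\pi\,\epsilon_{UV}}\sinh\!\left(\frac{\pi l}{\beta}\right)\right].
\end{align*}
```

The UV cut-off $\epsilon_{UV}$ is restored on dimensional grounds in the third line. The result is manifestly not invariant under $l \to 2\pi - l$, which is the issue addressed next.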
The resolution of this issue was discussed in [33], where the authors point out that the identity block contribution in 3.13 is not analytic. In particular, it is not invariant under $l \to l + 2\pi$. Due to this monodromy, the result is sensitive to how $\sigma_n(z, \bar{z})$ is wound around the origin where $O^n$ is located. Since there is more than one way to get to any point on the unit circle, there can be many different identity block 'channels'. [33] notes, however, that since the exact four point function is analytic, the dominant identity block channel must be equivalent to any subdominant identity block channel plus contributions from other non-identity blocks. Thus, the four point function is well approximated by the dominant identity block contribution across all channels. In this case, this is the channel which involves no winding around the origin and is taken along an arc of angle less than π. This is shown in figure 2. With this understanding the Renyi entropy and, thus, the von Neumann entropy are both symmetric under $l \leftrightarrow 2\pi - l$.
There is actually a clearer way to see that 3.11 is manifestly symmetric under $l \leftrightarrow 2\pi - l$. Consider performing a uniformizing coordinate transformation that removes the twist operators and puts all the operators on a single complex plane. Under this transformation the coordinates map to for k an integer ∈ [0, n). k labels which branch an operator came from. In this coordinate system the four-point function becomes, up to a proportionality constant that depends on l and is symmetric under $l \leftrightarrow 2\pi - l$, the following This is a 2n point function of O's located at $e^{i 2\pi k/n}$ and $O^\dagger$'s placed in between at $e^{i(2\pi k + l)/n}$. This is shown in figure 3. This representation makes it clear that the result will be symmetric under $l \leftrightarrow 2\pi - l$. When $l < \pi$ the dominant contribution will be from the identity block.
To conclude, the entanglement entropy of an interval in a heavy state of zero spin is given by As noted in [33], this result can be obtained from the bulk using the RT prescription but without imposing the homology constraint. It is not actually clear what imposing this constraint would mean given that the interior of a black hole microstate is not really well understood. Finally, one can also extract the answer for a light state, $h/c \to 0$ as $c \to \infty$, by simply continuing $h/c \to 0$. In this limit $\beta \to 2\pi i$, giving which works for all l.
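As a rough numerical illustration of the symmetry argument, the sketch below assumes that the single-channel result takes the standard thermal form $S(l) = \frac{c}{3}\ln\left[\frac{\beta}{\pi\epsilon_{UV}}\sinh(\pi l/\beta)\right]$. This form is an assumption, chosen because it is consistent with the statement that the result matches the thermal answer for $l < \pi$; the function names and parameter values are purely illustrative. The code contrasts the naive continuation to $l > \pi$ with the dominant-channel prescription, which evaluates the same expression on the arc of angle less than π.

```python
import math

def thermal_form_entropy(l, c, beta, eps_uv):
    """Assumed single-channel result: (c/3) ln[(beta / (pi eps)) sinh(pi l / beta)]."""
    return (c / 3.0) * math.log(beta / (math.pi * eps_uv) * math.sinh(math.pi * l / beta))

def heavy_state_entropy(l, c, beta, eps_uv):
    """Dominant identity-block channel: evaluate on the arc of angle less than pi."""
    return thermal_form_entropy(min(l, 2.0 * math.pi - l), c, beta, eps_uv)

def light_state_entropy(l, c, eps_uv):
    """Continuation beta -> 2 pi i of the assumed form: (c/3) ln[(2 / eps) sin(l / 2)]."""
    return (c / 3.0) * math.log(2.0 / eps_uv * math.sin(l / 2.0))

if __name__ == "__main__":
    c, beta, eps_uv = 24.0, 1.0, 1e-3  # illustrative values only
    for l in (0.5, 2.0, 4.0, 5.5):
        naive = thermal_form_entropy(l, c, beta, eps_uv)
        symmetric = heavy_state_entropy(l, c, beta, eps_uv)
        print(f"l = {l:.1f}: naive = {naive:.3f}, dominant channel = {symmetric:.3f}")
```

The naive continuation keeps growing past $l = \pi$, while the dominant-channel version is symmetric under $l \leftrightarrow 2\pi - l$ by construction. The light-state expression is symmetric without any channel selection because $\sin(l/2) = \sin((2\pi - l)/2)$.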

Entanglement entropy for superpositions of semi-classical states
We present in this section the computation of entanglement entropy for states dual to macroscopic superpositions of semi-classical geometries. We focus mainly on two classes of such states: superpositions of pure one-sided states considered in section 3.2 and superpositions of thermofield doubles of different temperatures. This will mostly be a summary and the explicit details will be left to appendix A.

Superpositions of one-sided AdS spacetimes
Let us begin by considering superpositions of pure one-sided states constructed from the orthogonal basis where O i are primary operators. States with low dimension correspond to perturbations of pure AdS, while those of high dimension correspond to black hole microstates. We want to compute the entanglement entropy of an interval in states of the form where O i are orthogonal primary operators. Following the techniques of section 3, we can compute the entanglement entropy of an interval using the replica trick. Just as before, we need to compute the replicated density matrix of the interval. This is given by These are orbifold symmetric primary operators belonging to the orbifold CFT on C n .
The replicated density matrix 4.5 is thus a sum of four point functions of heavy operators for $n > 1$. We are interested in computing this quantity for a holographic theory, so we assume that all the four point functions are well approximated by their identity block contribution. Restricting to the identity block in the t-channel offers an immediate simplification of the above expression. Since the identity block can only appear in the expansion of two non-orthogonal operators of the same dimension, only terms with $a_i = b_i$ contribute. Thus, 4.5 reduces to (4.7)

This expression is also symmetric under $l \leftrightarrow 2\pi - l$ for the very same reasons that 3.10 is, as explained in figure 4. Let us see how this works explicitly. Let us call the terms where any $a_i = n$ the 'diagonal' terms and everything else the 'off-diagonal' terms. It is clear that the diagonal terms have exactly the same form as 3.10, and so this symmetry follows by the same reasoning. There is an interesting twist for the off-diagonal terms. The different operator orderings in 4.6 for the operator and its complex conjugate pair up in just the right way as l is changed. For simplicity, let us focus on the $n = 2$ case for a superposition of only two primary states. The off-diagonal term of the $n = 2$ replicated density matrix is given by Notice the difference in the operator orderings in the last equation. After uniformizing, we find that the channel which fuses each operator with its own conjugate in the first two terms fuses it with the conjugate of the other operator in the latter two, and vice versa. Thus, the identity block exists in either the first pair of terms or the second, but not in both. We are forced to apply the same channel for all the terms since that choice is inherited from picking a channel of the orbifold symmetric operators in the four point function before breaking it up into its components. It turns out that the identity block from the first pair of terms dominates for $l < \pi$ and from the second pair for $l > \pi$. This exchange ensures the result has the required symmetry. The same line of reasoning applies for arbitrary n and for arbitrary superpositions.
To finally compute the entropy, we need to evaluate the four point functions appearing in 4.7, and then perform the sum over the $a_i$'s. As discussed previously, these four point functions are not known in closed form for $n > 1$, except as a perturbative expansion in l. We evaluate this expression with the following series of manipulations:
1. Introduce an auxiliary quantity $\mathrm{tr}\rho^n_m$ that differs from 4.7 only in the upper limit of the $a_i$ sums. It is clear that $\lim_{m \to n} \mathrm{tr}\rho^n_m = \mathrm{tr}\rho^n$.
2. Take the limit of n approaching 1, holding m fixed, where we know the explicit forms of the four point functions appearing in the sum.
3. It turns out that even after plugging in these forms, it is still not easy to perform the sum over a. We get around this by first performing an expansion in l and then doing the a sum term by term.

4. Then take the limit as m → n and act with lim n→1 ∂ n to obtain the entropy.

5. Finally, resum the series in l.
We believe this procedure gives the correct entanglement entropy based on two strong pieces of evidence. First, it reproduces the result from the perturbative expansion of the identity block in the size of the interval. Second, it maintains the requirement that $\lim_{n \to 1} \mathrm{tr}\rho^n = 1$. The details of the calculation are presented in appendix A. The result we find is that the identity block contribution to the entanglement entropy in the superposition 4.2 is exactly where $S_i$ is the entanglement entropy of an interval in the state $O_i|0\rangle$. This will be a good approximation to the entropy as long as the identity block contribution to the replicated density matrix remains dominant. However, as M is increased the number of non-identity block contributions proliferates faster than the number of identity block terms; there are $M^{2n}$ terms of the former and $M^n$ of the latter. The magnitude of an individual identity block term is larger than that of a typical non-identity block term by a factor of $e^{\#nc}$. Thus, we expect that the identity block approximation fails once $M \sim e^{O(c)}$.
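The counting argument can be made concrete with a small sketch. It compares the total weight of the $M^{2n}$ non-identity terms, each suppressed by $e^{-\kappa n c}$, against the $M^n$ identity block terms. The constant `kappa` is a hypothetical stand-in for the unspecified order-one coefficient written as # in the text, and the function names are illustrative.

```python
import math

def log_ratio_non_identity_to_identity(M, n, c, kappa):
    """ln[(M^{2n} e^{-kappa n c}) / M^n] = n (ln M - kappa c)."""
    return n * (math.log(M) - kappa * c)

def critical_log_M(c, kappa):
    """Value of ln M at which the two classes of terms become comparable."""
    return kappa * c

if __name__ == "__main__":
    c, kappa, n = 100.0, 1.0, 2  # kappa is an assumed order-one constant
    for log10_M in (1, 10, 40, 50):
        M = 10.0 ** log10_M
        print(log10_M, log_ratio_non_identity_to_identity(M, n, c, kappa))
    print("crossover at ln M =", critical_log_M(c, kappa))
```

The log ratio changes sign at $\ln M = \kappa c$ independently of n, which is the sense in which the approximation is expected to fail once $M \sim e^{O(c)}$.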

Superpositions of eternal black holes
Next, let us consider superpositions of thermofield double states of different temperature. These are states defined on a product Hilbert space of two CFTs each living on $S^1 \times R$, and are dual to macroscopic superpositions of eternal black holes of different masses. Such states are given by where and $Z(\beta) = e^{\pi^2 c/3\beta}$ is the partition function of the theory. This state corresponds to a bulk superposition of eternal black holes of different mass $M_i = \pi^2 c/3\beta_i^2$. Say we want to compute the entanglement entropy of the right CFT. For a single TFD this computes the Bekenstein-Hawking entropy of the dual black hole. To obtain the entropy in the superposition, we first compute the reduced density matrix of the right CFT and find (4.14) where $\rho_i = e^{-\beta_i E}/Z(\beta_i)$. Immediately, the entanglement entropy is computed by where D(E) is the density of states, which for a holographic CFT on a cylinder scales as $e^{2\pi\sqrt{cE/3}}$ for large E. This expression can be evaluated term by term via saddle point. Focusing on a term in the first sum of the above expression, we find that the saddle point evaluates to (4.18) It can be easily checked that $\rho_R \to |\alpha_i|^2 \rho_i$ as $E \to \pi^2 c/3\beta_i^2$; $\rho_i$ is always picked out as the dominant term in the logarithm. Thus, the contribution of these terms to the entropy is given by (4.20) The first term is simply the average of the entropies of the different branches of the wavefunction, while the second is a classical Shannon entropy known also as the entropy of mixing [38]. The second sum in 4.17 can also be evaluated via saddle point, and we find which is exponentially suppressed in c unless $\beta_i = \beta_j$. This result essentially follows from the near orthogonality of the thermofield double states of different temperature; their overlap is suppressed by the same exponential factor. Putting these results together, we find that the entanglement entropy of the right CFT is
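The structure described above, an average of branch entropies plus an entropy of mixing, can be sketched numerically. The code assumes the final result has the form $\sum_i |\alpha_i|^2 S(\rho_i) - \sum_i |\alpha_i|^2 \ln|\alpha_i|^2$ with $S(\rho_i) = 2\pi^2 c/3\beta_i$, as the surrounding description suggests. It also assumes the standard form of the thermofield double, for which the overlap is $Z((\beta_i + \beta_j)/2)/\sqrt{Z(\beta_i) Z(\beta_j)}$. All function names are illustrative.

```python
import math

def thermal_entropy(c, beta):
    """Leading-order thermal entropy 2 pi^2 c / (3 beta)."""
    return 2.0 * math.pi ** 2 * c / (3.0 * beta)

def superposition_entropy(weights, betas, c):
    """Return (average of branch entropies, entropy of mixing) for weights |alpha_i|^2."""
    average = sum(p * thermal_entropy(c, b) for p, b in zip(weights, betas))
    mixing = -sum(p * math.log(p) for p in weights if p > 0.0)
    return average, mixing

def log_tfd_overlap(beta_i, beta_j, c):
    """ln of Z((beta_i + beta_j)/2) / sqrt(Z(beta_i) Z(beta_j)), with ln Z = pi^2 c / (3 beta)."""
    log_z = lambda b: math.pi ** 2 * c / (3.0 * b)
    return log_z(0.5 * (beta_i + beta_j)) - 0.5 * (log_z(beta_i) + log_z(beta_j))

if __name__ == "__main__":
    c = 12.0
    average, mixing = superposition_entropy([0.5, 0.5], [1.0, 2.0], c)
    print("average:", average, "mixing:", mixing)
    print("log overlap:", log_tfd_overlap(1.0, 2.0, c))
```

The mixing term is bounded by the logarithm of the number of branches, so it is subleading in c for a small number of branches, while the log overlap is negative and proportional to c whenever the temperatures differ.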

Linearity vs. homology
We showed in the previous section that, to leading order in c, the entanglement entropy of an interval in states dual to macroscopic superpositions of a small number of distinct classical geometries is given by the average of the entropy in each branch of the wavefunction, thus confirming the prediction 2.18. This is consistent with the statement that the entropy is approximately represented as the expectation value of a linear operator. This linear operator must have small off-diagonal matrix elements between semi-classical states, consistent with the structure of the area operator. As before, all statements are valid for superpositions of far fewer than $e^{O(c)}$ semi-classical states. Moreover, we identified a new correction to the RT formula, the entropy of mixing, which we expect to appear when the density matrices of the CFT subregion, and its complement, in the different branches are distinguishable. In the regime where the leading contribution to RT is the average of the areas of the different branches of the wavefunction, this mixing term is subleading compared to the area term.
It seems thus far that the leading contribution of the RT proposal is given by the expectation value of a linear operator, namely the area operator. However, in this section we identify another nonlinearity associated with the area contribution which arises when considering $e^{O(c)}$ states but which manifests in a different way. In contrast to the failure of linearity discussed in the previous section, this contribution we will be able to compute exactly.

A failure of linearity: homology
In order to see this nonlinearity, we restrict the RT formula to the area term which is always the leading order in c contribution in any semi-classical state. For simplicity we continue to work in the context of 1+1 holographic CFTs. Then, the prescription for computing the entanglement entropy of an interval I in the state $|\Psi\rangle$ is $S(I, |\Psi\rangle) = \langle\Psi|\hat{A}_I|\Psi\rangle$. (5.1) We saw in the previous section that when dealing with single sided pure states the entanglement entropy truly behaved like the expectation value of $\hat{A}_I$ within subspaces of semi-classical states spanned by $O_i|0\rangle$ and of dimension much less than $e^{O(c)}$. These are pure states of one CFT on one connected manifold, specifically $S^1$. One can ask whether this same operator continues to work for mixed states of this CFT, or more specifically, for pure states of two copies of the same CFT. We will focus on the latter case of a CFT living on $S^1_L \cup S^1_R$, which we label as left, L, and right, R. The question now is whether $\hat{A}_I$ applied to, say, the right CFT correctly computes the entanglement entropy of an interval on states composed of the basis elements $O^L_i|0\rangle_L \otimes O^R_j|0\rangle_R$. If the leading contribution of RT is truly represented by a linear operator then this must be the case.

It is clear that it would do so for any single element of this basis, and also for any superposition that produces a pure density matrix for both CFTs. To see the failure of linearity, we need to consider a highly entangled state between the two CFTs. The most convenient such state to consider is the thermofield double which contains order c entanglement between the two CFTs. We choose one where the inverse temperature β is small enough such that the dominant configuration is an eternal black hole. Using the operator $\hat{A}_I$, the entropy of an interval on the right CFT is (5.3) The first equality is simply the application of the RT formula. The second comes from the fact that $\hat{A}_I$ is an operator purely on the right CFT as is suggested from the entanglement wedge reconstruction proposal discussed above in section 2.2. Equation 5.3 says that the entropy in the thermal state is simply the thermal average of the entropy in the eigenstates. We can evaluate this sum via saddle point methods while keeping in mind that the area operator is a coarse operator and will not shift the saddle point to leading order in c as discussed in section 2.4. We find that where $E_s = \pi^2 c/3\beta^2$ is the average energy of the canonical ensemble at inverse temperature β. Thus, we have found that the entanglement entropy in the thermal state can be approximated by that of the pure state at the average energy of that ensemble. The state $|E_s\rangle$ is a pure black hole of the right CFT whose exterior geometry agrees with that of the thermal state to leading order in c. This result is immediately problematic; consider the situation where we are computing the entropy of the entire CFT, i.e. $I = 2\pi$. This implies $S(2\pi, |\beta\rangle) \approx S(2\pi, |E_s\rangle) = 0$ (5.6) which is obviously wrong! This should compute the entanglement between the two CFTs in the thermal state which is proportional to c, reproducing the holographic result of computing the area of the eternal black hole.
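The saddle point quoted above can be checked numerically. The sketch below maximizes the logarithm of the Boltzmann-weighted density of states, $2\pi\sqrt{cE/3} - \beta E$, using the Cardy growth $e^{2\pi\sqrt{cE/3}}$ from section 4, and compares the maximizer with $E_s = \pi^2 c/3\beta^2$. The ternary search and the function names are illustrative choices, not part of the original derivation.

```python
import math

def log_boltzmann_weight(E, c, beta):
    """ln[D(E) e^{-beta E}] with the Cardy density of states D(E) ~ exp(2 pi sqrt(c E / 3))."""
    return 2.0 * math.pi * math.sqrt(c * E / 3.0) - beta * E

def saddle_energy(c, beta):
    """Analytic saddle point E_s = pi^2 c / (3 beta^2)."""
    return math.pi ** 2 * c / (3.0 * beta ** 2)

def numerical_saddle(c, beta, iterations=200):
    """Ternary search for the maximum of the concave function log_boltzmann_weight."""
    lo, hi = 0.0, 10.0 * saddle_energy(c, beta)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_boltzmann_weight(m1, c, beta) < log_boltzmann_weight(m2, c, beta):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    c, beta = 100.0, 1.0  # illustrative values
    E_s = saddle_energy(c, beta)
    print("analytic:", E_s, "numerical:", numerical_saddle(c, beta))
    print("Cardy entropy at E_s:", 2.0 * math.pi * math.sqrt(c * E_s / 3.0))
    print("thermal entropy:", 2.0 * math.pi ** 2 * c / (3.0 * beta))
```

Evaluating the Cardy entropy at $E_s$ gives $2\pi^2 c/3\beta$, the thermal entropy that the pure-state formula fails to reproduce at $I = 2\pi$.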
This issue is very reminiscent of the earlier objection using qubits discussed in the introduction; the entanglement entropy operator which computes the entropy of the entire CFT is the zero operator when constructed in a basis of pure states. Note also that this is different from the problem of cross terms in the area operator adding up and changing the answer when there are too many states in the superposition. The reason for this distinction is that the thermal density matrix is diagonal and thus the cross terms $\langle E'|\hat{A}|E\rangle$ do not appear. We will discuss this issue and its relation to the CFT calculation more carefully in the next section. Surprisingly, however, the formula does not fail for all interval sizes. Let us consider the bulk prescriptions for computing the entropy as a function of the size of the interval for the thermal state and the pure state. Starting with a small interval, we find that formula 5.5 gives the correct answer to leading order in c up until $I = \pi$. The discrepancy begins as soon as $I > \pi$ and gets worse as we make the interval larger. As noted, while 5.5 falls to zero, as it must, the thermal answer saturates at the thermal entropy $2\pi^2 c/3\beta$.
Figure 5. Minimal area surfaces which compute the entanglement entropy of various intervals of the boundary CFT. The entropies of intervals smaller than π are the same for both a pure and an eternal black hole, and are given by the green and blue curves. The two cases begin to differ once the interval is larger than π, as those are given by different bulk surfaces as shown by the red and magenta curves. We note that the difference for intervals that cover almost the entire boundary is exactly the black hole entropy.
The holographic reason for this discrepancy is clear, and is presented pictorially in figure 5. From the bulk perspective, the difference stems from the differing bulk prescriptions for picking out the minimal area extremal surface in the single sided black hole geometry versus the two sided eternal black hole. Recall that these geometries agree in the exterior of the black hole. As shown in figure 5, the extremal surface that computes the entropy for small intervals is the same for both cases up until $I = \pi$. Beyond this point, the extremal surface for the single sided case jumps across to the other side of the black hole, while it remains on the same side for the thermal case. Even though the surface on the other side has smaller area than the surface on the same side, the homology constraint forces the surface in the thermal case to stay on the same side. As previously discussed in section 3.2 and [33], it is not clear what it means to impose the homology constraint in the pure case as there might not be a geometric interior to these black holes [39,40]. Nevertheless, the CFT result requires the jump to the other side. This can be interpreted loosely as not imposing the homology constraint in the pure case.
We conclude that there is not a single entropy operator $\hat{A}_I$ which gives the correct entropy for pure and highly mixed states for all intervals I. From a bulk perspective, the nonlinearity was introduced by the requirement of imposing the homology constraint in one case but not in the other. One can thus think of the homology prescription as specifying the set of surfaces $X$ that we are allowed to extremize the area operator over. Thus, the homology constraint precludes the entropy from being the expectation value of a single linear operator defined on only one CFT.
We note that homology being the source of nonlinearity of entanglement entropy is very reminiscent of the recently discussed 'wormhole' operator that measures whether two separate AdS bulk spacetimes are connected via an Einstein-Rosen bridge. This is clearly a nonlinear property of a state since the thermofield double, while dual to a wormhole, is a superposition of states with manifestly no geometric connection [42].

The source of the homology constraint in the CFT
From the bulk perspective, the discrepancy found in the previous section was due to imposing the homology constraint. The considered thermal state is a two-sided superposition of $e^{O(c)}$ states; the number of states is actually infinite, but the relevant terms which dominate the canonical ensemble are those with energy roughly the average energy at the considered temperature, and these number around $e^{O(c)}$. Therefore, we see that linearity fails once we have a large number of terms in the superposition. However, in contrast to the previous issue of non-identity block contributions becoming important, we will see that the homology constraint can be explained via an exchange of identity block channel dominance.
Let us compare the entanglement entropy computation of two states with the same bulk dual, at least from one side. First, consider an approximate form of the thermofield double. The TFD state can be approximated by terms within an energy shell of width $O(c^0)$ around the average energy, $E_s = \pi^2 c/3\beta^2$. This defines a microcanonical ensemble. Let us assume β is small enough to be above the Hawking-Page transition, so that each term in this state is dual to a large black hole in AdS. We can then estimate the number of terms in the considered energy shell to be given by the Cardy formula, $e^{2\pi\sqrt{cE_s/3}} = e^{2\pi^2 c/3\beta}$. This approximate state is where we take the $O^{L,R}_i$ to be primary operators of dimension roughly $E_s$. The restriction to primary operators is a further approximation, since the number of descendant states in the considered energy shell is an order one fraction of the total number of states. However, this approximate state is expected to be accurate when studying coarse-grained observables, namely those that satisfy ETH. As we will momentarily show, this state reproduces the RT result of the entanglement entropy of an interval in a state dual to an eternal black hole.

We will actually first consider a truncated version of 5.9 to any M terms, The specific choice will not matter since all of these operators have roughly the same dimension. The state we want to compare this to is a pure state on the right CFT constructed from the right operators appearing in 5.10. This is The bulk dual of this state is a pure black hole, or a microstate, whose exterior geometry is given by that of BTZ. Recall from our discussion below equation (2.17) that such a state is not atypical enough from the perspective of coarse observables. Both states 5.10 and 5.11 describe the same right exterior geometry. Let us compute the entanglement entropy of an interval in these states. Their replicated density matrices are (5.14) and the contraction symbol between the operators indicates pairing of the same permutation. This is the only difference between the two replicated density matrices. Also, the pure replicated density matrix obtains the presented form only after restricting to terms with identity block contributions. Let us compare the contributions to the entropy term by term, starting with the $a_i = n$ terms. These are equal in both cases and produce the result Just as in figure 4, the $l < \pi$ channel corresponds to uniformizing and expanding the operators on $e^{i 2\pi k/n}$ and $e^{i(2\pi+l)k/n}$ together, and the $l > \pi$ channel corresponds to expanding the operators on $e^{i 2\pi k/n}$ and $e^{i(2\pi-l)k/n}$ together. Things are a bit trickier for the $a_i \neq n$ terms. Let us do this channel by channel. Due to how the operators are arranged after uniformizing, the $l < \pi$ channel will only involve expansions of the form $O^\dagger_i \to O_i$ for both states, and will definitely have an identity block contribution. On the other hand, the states differ in their contribution in the $l > \pi$ channel.
Since the permutations in the mixed case are matched, this channel will involve at least one expansion of the form $O^\dagger_i \to O_j$ with $i \neq j$, and will not receive an identity block contribution; orthogonal operators cannot fuse into the identity and its descendants. As for the pure case, the sum over permutations ensures there will always be a combination such that the $l > \pi$ channel expands $O^\dagger_i \to O_i$, and so will have an identity block contribution. Using the techniques of appendix A, we find these contributions to be Combining the contributions from both channels we get The minimizing prescription comes from the rule that the correct identity block approximation to the replicated density matrix is the one which dominates over all other identity block contributions across all channels. Notice that both of these entropies have a discontinuous first derivative at some value of l. From the bulk perspective, this corresponds to a transition between different RT surfaces. The transition for the pure case occurs at $l = \pi$, ensuring that the entropy goes to zero as the interval encompasses the entire CFT. The homology constraint is manifestly not imposed, as there is no way to continuously deform the RT surface through the black hole. For the mixed state, the discontinuity occurs at some $l > \pi$ set by the mixing term $\ln M$. Choosing $M = e^{2\pi^2 c/3\beta}$, $S_{Mixed}$ reproduces the entropy of an interval in the thermal state. The transition found here is exactly the bulk RT transition from a single surface into two disconnected surfaces in the eternal black hole geometry. We see that for $l = 2\pi$, we get the area of the horizon, a result consistent with the homology constraint. This behavior is displayed pictorially in figure 6.
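The two minimization prescriptions can be explored with a short sketch. It assumes that each channel contributes the thermal form $S_{th}(l) = \frac{c}{3}\ln\left[\frac{\beta}{\pi\epsilon_{UV}}\sinh(\pi l/\beta)\right]$, that the pure-state entropy is $\min\{S_{th}(l), S_{th}(2\pi - l)\}$, and that the mixed-state entropy is $\min\{S_{th}(l), \ln M + S_{th}(2\pi - l)\}$. These forms are assumptions inferred from the description of the two transitions, and the function names are illustrative.

```python
import math

def thermal_form_entropy(l, c, beta, eps_uv):
    """Assumed single-channel contribution (c/3) ln[(beta / (pi eps)) sinh(pi l / beta)]."""
    return (c / 3.0) * math.log(beta / (math.pi * eps_uv) * math.sinh(math.pi * l / beta))

def pure_entropy(l, c, beta, eps_uv):
    """Assumed pure-state result: both channels available, no mixing term."""
    return min(thermal_form_entropy(l, c, beta, eps_uv),
               thermal_form_entropy(2.0 * math.pi - l, c, beta, eps_uv))

def mixed_entropy(l, c, beta, eps_uv, ln_M):
    """Assumed mixed-state result: the l > pi channel is offset by ln M."""
    return min(thermal_form_entropy(l, c, beta, eps_uv),
               ln_M + thermal_form_entropy(2.0 * math.pi - l, c, beta, eps_uv))

def mixed_transition_angle(c, beta, ln_M, iterations=200):
    """Bisection for the l in [pi, 2 pi) where the two mixed-state channels exchange dominance."""
    def gap(l):
        ratio = math.sinh(math.pi * l / beta) / math.sinh(math.pi * (2.0 * math.pi - l) / beta)
        return (c / 3.0) * math.log(ratio) - ln_M
    lo, hi = math.pi, 2.0 * math.pi - 1e-9
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    c, beta, eps_uv = 24.0, 1.0, 1e-3  # illustrative values
    for ln_M in (0.0, 10.0, 50.0):
        print("ln M =", ln_M, "transition at l =", mixed_transition_angle(c, beta, ln_M))
```

With $\ln M = 0$ the transition sits at $l = \pi$, as in the pure case, and it moves toward $l = 2\pi$ as $\ln M$ grows. The values of β used here are moderate so that `math.sinh` does not overflow.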
Let us consider intermediate values of $\ln M$. For $\ln M \ll c$, the transition occurs at almost $l \sim \pi$ for the mixed case, and so there is almost no difference between the two states. It then seems that the homology constraint is not imposed. (In this case, one can perhaps continue to assume that the homology constraint was imposed but that the circumference of the 'wormhole' was too small, O(c) in Planck units, to cause a significant shift in the jump of the RT surface.) An interesting case to consider is when $\ln M = 2\pi^2 c/3\beta'$, with $\beta' > \beta$ and independent of c. In this case, there will be an appreciable distinction between the two states. After the transition, the mixed state entropy will be

The second piece of this expression describes the usual RT surface anchored on the complement of the interval. For $\beta' = \beta$, the first term is the horizon area of the black hole. However, for $\beta' > \beta$, this contribution is smaller than the area, and there is no closed bulk minimal surface outside the black hole that can reproduce it. Naively, this would say that there is no bulk prescription for such a state, and this 'thermal' piece needs to be added in by hand. Moreover, it would seem that the homology constraint is not satisfied. Incidentally, we know of two sided states which behave very much like |Mixed⟩ for $\beta' > \beta$. These are the Shenker-Stanford wormholes constructed in [44,45]. By acting on the TFD with a series of anti-time-ordered shockwaves, they produce a state dual to an elongated wormhole. If the shock waves are sent in symmetrically from the two sides, then the original eternal black hole bifurcate horizon migrates into the wormhole. This region inside the wormhole is known as the causal shadow of the two boundaries. The sizes of the black hole event horizons, those seen by the CFTs, are not representative of the entanglement between them; they only measure how much energy was sent into the wormhole. The bifurcate horizon continues to be the extremal surface and its area correctly measures the entanglement between the CFTs. This is expected since sending in the shock waves amounts to acting on $H_L \otimes H_R$ with a factorizable unitary which does not modify the entanglement entropy.
We could also have considered states which behave like asymmetric wormholes, by considering an asymmetric entangled state with the dimensions of the left and right operators differing by O(c). Note, the maximum number of terms in such a state will be given by the density of states of the side with smaller dimension. From the bulk perspective, the two exterior horizons have different sizes, and again the entanglement is given by the original bifurcate horizon. Figure 7 shows what a spatial slice in this geometry looks like. In the situation where the number of terms saturates the density of states of one side, the entanglement entropy will be the horizon area of that same side. In the Shenker-Stanford construction, this is a state produced by sending in shockwaves from a single side.
Figure 7. The spatial geometry that passes through the bifurcate horizon of an elongated wormhole. $2\pi^2 c/3\beta'$ represents the entanglement entropy between the two CFTs. This surface is hidden behind both the left and right horizons, and will not be visible in any RT prescription restricted to their exteriors.
In both of these cases, the constant piece in the entropy, $2\pi^2 c/3\beta'$, plays the role of the area of the original horizon and will not be visible from any of the exteriors. We should stress that the comparison between the state |Mixed⟩ and the Shenker-Stanford wormholes is merely an analogy; the state could instead be dual to a bulk with no geometric description behind the horizons. Perhaps one can get to the Shenker-Stanford states by acting on |Mixed⟩ with a factorizable unitary on the two sides. One can view this large degree of entanglement between the two CFTs as being large enough to possibly describe a geometric connection between the two sides [46]. This re-emphasizes the statement that not just any entanglement is enough to have a geometry; a specific kind is required [39,40].

Entropy operators more generally
The preceding discussion established that, for 1+1 CFTs dual to three dimensional Einstein gravity, the entanglement entropy of an interval in subspaces of dimension much less than e O(c) could be interpreted as the expectation value of a linear operator acting within that subspace. The approximate linearity of the entropy was established under the assumption of Virasoro identity block dominance. But, as we now discuss, approximate linearity is expected to hold much more generally. It should certainly hold for Einstein gravity in any dimension. In fact, a version of it should hold in any large N theory with many local degrees of freedom.

In the large N limit certain quantum variables become non-fluctuating and a preferred set of "classical" states is selected. Moreover, the entropy of a subsystem R of a state $|\psi\rangle$ typically becomes large: if N denotes the extensive parameter, then (6.1) in terms of the entropy density $s_R$. In this sense, the main point of large N is that it defines a small parameter, $1/N$, such that the leading contribution to the entropy can be interpreted, within some bounds, as a linear operator.
To illustrate the broad ideas, consider the following general setup. Given a bi-partite system AB, we can choose a set of states $D = \{|\psi_1\rangle, \ldots, |\psi_K\rangle\}$ and a set of projective measurements on A, $M_A = \{P_1, \ldots, P_K\}$, and on B, $M_B = \{Q_1, \ldots, Q_K\}$ such that That is, the projectors $P_i$ and $Q_i$ serve to distinguish the states in D on both A and B.
The largest K can be (if we demand perfect distinguishability) is the smaller of the two Hilbert space dimensions of A and B. The equivalent statement in the holographic setup is that states of different entropy can be distinguished using the area operator, i.e. if $\hat{A} = \sum_A A\,|A\rangle\langle A|$ is a spectral decomposition of the area operator, then we may take the projectors $P = |A\rangle\langle A|$. What (6.2) says is that the states in D are perfectly distinguished by the measurements in $M_A$ and $M_B$. Furthermore, the measurements are non-destructive or gentle in the sense that the final state after the measurement is the same as the initial state. More importantly, even if the set of states of interest only satisfy (6.2) approximately, it can still be true that the large N part of the entropy is correctly reproduced by a linear operator. In the holographic setup, these statements are a reflection of the fact that the area operator becomes non-fluctuating at large c. Given this data, as well as the list of entropies $S_A = S_B$ of the $|\psi_i\rangle \in D$, we can form the operators It follows immediately from (6.2) that Now suppose we take a superposition of states in D, e.g.

To simplify the form of $\psi_A$ we use the existence of the projective measurements $M_B$. Without changing the value of the trace, we may insert a resolution of the identity on B which contains the projectors $Q_1$ and $Q_2$. Since $\mathrm{tr}(Q_1 |\psi_2\rangle\langle\psi_1|) = \mathrm{tr}(Q_2 |\psi_2\rangle\langle\psi_1|) = 0$, it follows that Hence superpositions on the full system reduce to mixtures on a subsystem. This statement is also approximately true given an appropriate approximate form of (6.2). Consider now a general mixture, On the other hand, the entropy of $\sigma_A$ is Inserting a resolution of the identity on A that includes the $P_i$, we can write $\sigma_A = \sum_i p_i \psi_{Ai} = \sum_i p_i P_i \psi_{Ai} P_i$; using (6.2) the entropy formula collapses to a single sum Provided the second term, called the entropy of mixing and seen earlier in 4.22, is small, the entropy is approximately the average of the entropies of the individual terms. We will now demonstrate the above logic using three concrete models. Besides illustrating the general discussion, these models will allow us to elucidate the physics of approximate distinguishability.
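The collapse of the entropy of a mixture into an average plus an entropy of mixing can be verified directly at the level of spectra. The sketch below assumes perfectly distinguishable branches, so that the reduced states $\psi_{Ai}$ have mutually orthogonal supports and the spectrum of $\sigma_A = \sum_i p_i \psi_{Ai}$ is the union of the rescaled branch spectra. It also assumes the entropy operator has the form $\sum_i S_i P_i$, whose expectation value in $\sigma_A$ is then the weighted average $\sum_i p_i S_i$. The function names and sample spectra are illustrative.

```python
import math

def entropy(spectrum):
    """Von Neumann entropy (in nats) from a list of eigenvalues."""
    return -sum(x * math.log(x) for x in spectrum if x > 0.0)

def mixture_spectrum(weights, spectra):
    """Spectrum of sum_i p_i psi_i when the psi_i have mutually orthogonal supports."""
    return [p * lam for p, spec in zip(weights, spectra) for lam in spec]

def entropy_operator_expectation(weights, spectra):
    """tr(S_hat sigma) for the assumed operator S_hat = sum_i S_i P_i."""
    return sum(p * entropy(spec) for p, spec in zip(weights, spectra))

def entropy_of_mixing(weights):
    """Shannon entropy of the branch probabilities."""
    return entropy(weights)

if __name__ == "__main__":
    weights = [0.3, 0.7]
    spectra = [[0.5, 0.5], [0.9, 0.1]]  # illustrative branch spectra
    exact = entropy(mixture_spectrum(weights, spectra))
    linear = entropy_operator_expectation(weights, spectra)
    print("exact:", exact)
    print("linear part:", linear, "mixing:", entropy_of_mixing(weights))
```

The exact entropy equals the linear part plus the entropy of mixing, and the latter is bounded by the logarithm of the number of branches.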

N copies of a qubit
As a first toy example, consider states of N qubits of the form $\rho = \psi^{\otimes N}$ (6.12) with $\psi = \frac{1 + \vec{r}\cdot\vec{\sigma}}{2}$ (6.13) where $\vec{\sigma} = (\sigma^x, \sigma^y, \sigma^z)$ are Pauli matrices. Such states arise as follows: consider a bipartite system AB where each of A and B consists of N qubits. We restrict attention to states of AB of the form $|\psi\rangle^{\otimes N}$ where $|\psi\rangle$ is an arbitrary pure state of two qubits. Upon tracing out B the resulting state on A is of the form $\rho = \psi^{\otimes N}$. We will construct a linear operator $\hat{S}$ independent of ψ with the property that $\mathrm{tr}(\hat{S}\psi^{\otimes N}) \approx N S(\psi)$ (6.14) as $N \to \infty$. For notational simplicity we will also drop the subsystem index. The idea is that with many copies of the state ρ we can measure $r = |\vec{r}|$ without knowing the eigenvalue basis of ρ and without substantially disturbing the state. In essence, the states $\psi^{\otimes N}$ and $\psi'^{\otimes N}$ are approximately distinguishable for any fixed $\psi \neq \psi'$ in the large N limit. Think of the N qubits as spin-1/2 operators and introduce the total spin where $\sigma^\alpha_i$ is the α = x, y, z Pauli matrix of qubit i. It is straightforward to calculate $\mathrm{tr}(\psi^{\otimes N} J^\alpha) = N r^\alpha/2$ (6.16) and The last line suggests that measuring $J^2 = \sum_\alpha J^\alpha J^\alpha$ effectively measures r. Indeed, if the eigenvalues of $J^2$ are $j(j+1)$ then at large N the approximate relation $j = Nr/2$ holds. Furthermore, the distribution of j is tightly peaked in the state $\psi^{\otimes N}$ so that j is effectively a semi-classical variable (this can be seen by computing the variance of $|\vec{J}|^2/N^2$).
Let $P_j$ denote the projector onto the eigenspace with $J^2 = j(j+1)$ and let $H(x) = -x \ln x - (1-x)\ln(1-x)$ be the Shannon entropy of the probability distribution $\{x, 1-x\}$. Then the operator $\hat S$ may be taken to be $\hat S = \sum_j N H\!\left(\frac{1 + 2j/N}{2}\right) P_j$, where in essence we measure $J^2$ and then return the entropy that would arise for that value of $r = 2j/N$. One easily checks that to leading order in large $N$ we have $\mathrm{tr}(\hat S \psi^{\otimes N}) = N S(\psi)$ (6.20). We can also show that the measurement of $\hat S$ hardly disturbs the state. Indeed, suppose we measure $\hat S$ with relative precision $\epsilon$. Then the question of disturbance amounts to computing $1 - \delta = \mathrm{tr}\left(P_{|\hat S - \langle\hat S\rangle| < \epsilon \langle\hat S\rangle}\, \psi^{\otimes N}\right)$ (6.21). The distribution of $j$ in state $\psi^{\otimes N}$ is tightly peaked and Gaussian, $p(j) \propto \exp\left(-\frac{(j - Nr/2)^2}{2 N \sigma^2}\right)$, where the variance per qubit $\sigma^2$ depends on $r$ but is order one. Converting from $j$ to $S$ (eigenvalue of $\hat S$) is accomplished by expanding $H((1 + 2j/N)/2)$ near $j = N r/2$. Writing $j = N r/2 + N y$ we find $S - \langle\hat S\rangle = N y H' = (j - N r/2) H'$, so deviations of the entropy correspond to deviations of $j$ up to a factor of $H'$. Using the distribution for $j$ and the linear change of variables from $j$ to $S$ yields $\delta \sim \exp(-N k \epsilon^2)$ (6.23)

JHEP02(2017)074
for some $r$-dependent constant $k$. For any fixed $\epsilon$ the disturbance caused by the measurement rapidly goes to zero as $N \to \infty$. The fact that $\psi^{\otimes N}$ is mostly supported on a projector $P_{|\hat S - \langle\hat S\rangle| < \epsilon\langle\hat S\rangle}$ which is independent of the direction of $\vec r$, and hence of the basis which diagonalizes $\psi$, has significance. It means that there is a "universal compression algorithm" [47][48][49][50] depending on the spectrum of $\psi$ but not the eigenbasis of $\psi$ which compresses $\psi^{\otimes N}$ into $N S(\psi)$ qubits. In other words, the same compression procedure works for all $\psi$ of the form $\psi = u \psi_0 u^\dagger$ with $u \in \mathrm{SU}(2)$. Explicitly, the algorithm instructs us to make a coarse measurement of $j$, after which the state is approximately contained in one of the $j$ eigenspaces of dimension of order $e^{N S(\psi)}$. Furthermore, the probability to obtain a result for $j$ corresponding to a significantly larger than expected dimension is very small. For our purposes, these observations amount to the statement that one does not need to know the basis to measure the entropy.
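The claims of this subsection can be checked at finite $N$. The sketch below assumes the standard decomposition of $N$ qubits into total-spin sectors (spin $j = N/2 - k$ appears with multiplicity $\binom{N}{k} - \binom{N}{k-1}$), which is not spelled out in the text; the function names are hypothetical. It computes the distribution of $j$ in $\psi^{\otimes N}$, the expectation value of the entropy operator, and the logarithm of the dimension of the sector selected by a coarse measurement of $j$.

```python
import math

def binary_entropy(x):
    """H(x) = -x ln x - (1-x) ln(1-x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def multiplicity(n, k):
    """Number of copies of the spin j = n/2 - k multiplet inside n qubits."""
    return math.comb(n, k) - (math.comb(n, k - 1) if k > 0 else 0)

def spin_distribution(n, r):
    """Probability of each total spin j in psi^{(x) n}, psi = (1 + r.sigma)/2."""
    lam_plus, lam_minus = (1.0 + r) / 2.0, (1.0 - r) / 2.0
    dist = {}
    for k in range(n // 2 + 1):
        j = n / 2 - k
        # Trace of psi^{(x) n} over one spin-j multiplet (sum over m = -j..j).
        character = sum(lam_plus ** (n - k - i) * lam_minus ** (k + i)
                        for i in range(n - 2 * k + 1))
        dist[j] = multiplicity(n, k) * character
    return dist

def entropy_operator_expectation(n, r):
    """tr(S_hat psi^{(x) n}) with S_hat = sum_j n H((1 + 2j/n)/2) P_j."""
    return sum(p * n * binary_entropy((1.0 + 2.0 * j / n) / 2.0)
               for j, p in spin_distribution(n, r).items())

def log_sector_dimension(n, j):
    """ln of the dimension (2j + 1) * multiplicity of the J^2 = j(j+1) sector."""
    k = round(n / 2 - j)
    return math.log((n - 2 * k + 1) * multiplicity(n, k))

n_qubits, r = 200, 0.6
print(entropy_operator_expectation(n_qubits, r) / n_qubits,
      log_sector_dimension(n_qubits, n_qubits * r / 2) / n_qubits,
      binary_entropy((1.0 + r) / 2.0))
```

Both the expectation value per qubit and the log-dimension per qubit of the typical sector approach $S(\psi) = H((1+r)/2)$, with corrections that shrink as $N$ grows; nothing in the computation refers to the direction of $\vec r$.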
Hence for this class of states with $N$ taken large there is a linear operator $\hat S$ which is independent of the state, semi-classical, and whose expectation value gives the entropy. Furthermore, the linear span of states of the form $|\psi\rangle^{\otimes N}_{AB}$ is the totally symmetric subspace of $N$ copies of a two qubit system, of dimension $\binom{N+3}{3}$ (recall that each $|\psi\rangle_{AB}$ is a two qubit state). Hence we obtain a polynomially large in $N$ number of states for which an entropy operator exists.

Thermal states
The previous subsection dealt with $N$ copies of a state and showed that in the large $N$ limit an entropy operator existed. However, the copies were strictly non-interacting with each other; in other words, if we view $\psi^{\otimes N}$ as the thermal state of a Hamiltonian, then that Hamiltonian would have no interactions between the qubits. Thus it is useful to give a more general example. There is again a large $N$ limit, a thermodynamic limit, but the "copies" are no longer non-interacting.
Consider the thermal state of a local Hamiltonian on $N$ qubits in the limit $N \to \infty$. The state is $\rho(T) = e^{-H/T}/Z(T)$, where $Z(T)$ is the partition function. This state describes one half of the thermofield double, and can also be viewed as modeling the coarse grained state of an old black hole.
Constructing an entropy operator for this set of states (indexed by temperature) thus improves on the $N$ qubit model described above and shows that the strict independence of the $N$ copies is not required. Let $S(E, \Delta E)$ denote the microcanonical entropy, the logarithm of the number of energy eigenstates with energy between $E - \Delta E$ and $E + \Delta E$, and let $P_{E,\Delta E}$ denote the projector onto energy eigenstates with energy between $E - \Delta E$ and $E + \Delta E$. Denote by bins a set of such energy windows which completely cover the spectrum of $H$. The entropy operator is then $\hat S = \sum_{E \in \text{bins}} S(E, \Delta E)\, P_{E,\Delta E}$, and in a thermal state it obeys $\mathrm{tr}(\hat S \rho(T)) \approx S(E(T), \Delta E) \approx S(T)$ (6.26), where $E(T)$ is the average energy at temperature $T$. As usual, the bin width $\Delta E$ does not play a crucial role; the above identification is valid to leading order in large $N$, with for example $\Delta E = O(1)$. Henceforth we suppress the bin width and distinguish between $S(E)$ and $S(T)$ by context. One can also argue that to leading order in $N$ the measurement of $\hat S$ does not disturb the state so that $\hat S$ is a semi-classical variable.
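As a concrete check of the binned construction, consider the following sketch. It assumes a deliberately simple solvable Hamiltonian, $H = \sum_i n_i$ with $n_i \in \{0, 1\}$, chosen only because its degeneracies $\binom{N}{E}$ are known in closed form (it is non-interacting, unlike the general case discussed in the text), and it takes each energy level as its own bin. The function names are hypothetical.

```python
import math

def level_probabilities(n, temperature):
    """Thermal probability of each energy level E = 0..n, and ln Z."""
    beta = 1.0 / temperature
    # ln of (degeneracy * Boltzmann factor); logs avoid overflow at large n.
    log_weights = [math.log(math.comb(n, e)) - beta * e for e in range(n + 1)]
    top = max(log_weights)
    log_z = top + math.log(sum(math.exp(w - top) for w in log_weights))
    return [math.exp(w - log_z) for w in log_weights], log_z

def entropy_operator_vs_thermal(n, temperature):
    """Return (tr(S_hat rho(T)), S(T)) for the toy Hamiltonian."""
    probs, log_z = level_probabilities(n, temperature)
    # S_hat has eigenvalue ln C(n, E), the microcanonical entropy, on level E.
    operator_value = sum(p * math.log(math.comb(n, e))
                         for e, p in enumerate(probs))
    mean_energy = sum(p * e for e, p in enumerate(probs))
    thermal_entropy = log_z + mean_energy / temperature  # S = ln Z + <E>/T
    return operator_value, thermal_entropy

def relative_gap(n, temperature):
    """Fractional difference between S(T) and tr(S_hat rho(T))."""
    operator_value, thermal_entropy = entropy_operator_vs_thermal(n, temperature)
    return (thermal_entropy - operator_value) / thermal_entropy

for n in (100, 400, 1600):
    print(n, relative_gap(n, 1.0))
```

In this toy model the difference between $S(T)$ and $\mathrm{tr}(\hat S\rho(T))$ is exactly the Shannon entropy of the distribution over energy levels, which grows only logarithmically with $N$, so the relative gap shrinks as $N$ increases.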
Thus there exists a linear operator $\hat S$ independent of $T$ which measures the entropy of the family of states $\rho(T)$. It is tempting to identify this operator with the extremal area operator (in Planck units) of the black hole horizon. This identification seems almost trivial, but it is in principle no different from what we did above since $N H((1 + 2j/N)/2)$ effectively counts the logarithm of the number of states with eigenvalue $j$.
Two important properties of $\hat S$ are that it is a coarse-grained observable and that it behaves properly under superposition. We first address the behavior of $\hat S$ on mixtures, then we discuss the coarse-grained properties, and finally return to general superpositions.
Consider a mixture of two thermal states, $\rho = p\,\rho(T) + (1-p)\,\rho(T')$. By linearity the expectation value of the entropy operator is $\mathrm{tr}(\hat S \rho) = p\,S(T) + (1-p)\,S(T')$, which agrees with the entropy of the mixture up to an entropy of mixing of at most $H(p) \le \ln 2$, negligible at leading order in $N$. To show that $\hat S$ is a coarse-grained observable we should consider the case where $\rho$ is not a thermal state but is instead a microstate of the thermal ensemble, $\rho = |E\rangle\langle E|$. Then we compute $\mathrm{tr}(\hat S |E\rangle\langle E|) = S(E)$ (6.30), so in a microstate the entropy operator nevertheless returns the microcanonical entropy; hence it is a coarse-grained observable which returns the coarse-grained entropy, not the fine-grained entropy. Furthermore, it is clear that in these cases what $\hat S$ is doing is measuring the energy density and returning the appropriate entropy, a manifestly linear operation. That the entropy operator is a coarse-grained observable also matches holographic expectations since the extremal area operator is built from the metric which is in turn constructed from the stress tensor.
Finally, consider a general superposition of the form $\rho = \left(\sqrt{p}\,|E\rangle + \sqrt{1-p}\,|E'\rangle\right)\left(\sqrt{p}\,\langle E| + \sqrt{1-p}\,\langle E'|\right)$. The expectation of the entropy operator is $\mathrm{tr}(\hat S \rho) = p\,S(E) + (1-p)\,S(E')$. Similarly, for the mixed state $\rho = p|E\rangle\langle E| + (1-p)|E'\rangle\langle E'|$ the fine grained entropy is $H(p)$ (the binary Shannon entropy) while the expectation of the entropy operator is the same as for the superposition. Note, however, that measurement of the entropy operator collapses the coherent superposition and leaves the mixed state behind if $E$ and $E'$ are in different bins. From these calculations one sees that the entropy operator can only behave as expected when acting on thermal states, microstates, and mild superpositions or mild mixtures of these. If we begin to make generic superpositions of substantial numbers of microstates then the entropy operator will no longer capture the coarse-grained entropy to leading order in $N$.
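The comparison between a superposition and a mixture of two energy eigenstates can be made explicit in the two-dimensional subspace they span. This is a minimal sketch: the microcanonical entropies 120 and 80 and the weight $p = 0.3$ are hypothetical example values, the relative phase of the superposition is taken to be trivial, and the helper names are introduced here for illustration.

```python
import math

def binary_entropy(x):
    """H(x) = -x ln x - (1-x) ln(1-x), with H = 0 outside (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def entropy_operator_expectation(rho, entropies):
    """tr(S_hat rho) for S_hat diagonal in the basis {|E>, |E'>}."""
    return sum(rho[i][i] * entropies[i] for i in range(2))

def fine_grained_entropy(rho):
    """von Neumann entropy of a real symmetric 2x2 density matrix."""
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    root = math.sqrt(max(0.0, 1.0 - 4.0 * det))  # eigenvalues are (1 +- root)/2
    return binary_entropy((1.0 + root) / 2.0)

p = 0.3
entropies = [120.0, 80.0]  # hypothetical S(E) and S(E')
coherence = math.sqrt(p * (1.0 - p))
superposition = [[p, coherence], [coherence, 1.0 - p]]
mixture = [[p, 0.0], [0.0, 1.0 - p]]

for name, rho in (("superposition", superposition), ("mixture", mixture)):
    print(name, entropy_operator_expectation(rho, entropies),
          fine_grained_entropy(rho))
```

The operator expectation value is identical in the two states because $\hat S$ is diagonal in the energy basis and therefore blind to the off-diagonal coherence, while the fine-grained entropies differ by $H(p)$.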

N copies of a free field theory
In this subsection we present one final example of the general construction outlined in the introduction to this section; essentially we study the free limit of a large N vector model.
For simplicity, consider a general bipartite system $AB$ consisting of $k_A + k_B$ free fermion modes with creation and annihilation operators $c^\dagger_\alpha$ and $c_\alpha$ ($\alpha = 1, \ldots, k_A + k_B$) obeying the algebra $\{c_\alpha, c^\dagger_\beta\} = \delta_{\alpha,\beta}$. Then take $N$ copies of these modes, labelled $c_{\alpha i}$, to give the full algebra $\{c_{\alpha i}, c^\dagger_{\beta j}\} = \delta_{\alpha,\beta}\delta_{i,j}$ defined on the composite system $A^N B^N$. The state of $A^N B^N$ is assumed to be $N$ copies of a single pure Gaussian state of the original $k_A + k_B$ fermion modes.
Upon tracing out subsystem $B^N$, the state on subsystem $A^N$ has the form $\rho_{A^N} \propto \exp\left(-\sum_{i=1}^N c^\dagger_i h c_i\right)$, where $c^\dagger_i h c_i = c^\dagger_{\alpha i} h_{\alpha\beta} c_{\beta i}$ with the restricted label set $\alpha, \beta = 1, \ldots, k_A$. The quadratic form of the reduced density matrix is guaranteed by the initial Gaussian pure state, i.e. by Wick's theorem. Now we would like to construct a linear operator that measures the entropy of $\rho_{A^N}$ for any $h$. First, note that if we knew the basis of fermion modes in which $h$ was diagonal, then this problem would be trivial because it reduces to decoupled two-level systems, i.e. to qubits. The challenge, as with the $N$ qubit model, is to find a way to measure the spectrum without knowing the basis. To accomplish this measurement, we use a little group theory. To the best of our knowledge our result is new, but we note that similar technology has been used on free bosonic models [51,52] as part of the "quantum marginal problem" [53][54][55][56].

Let $k \equiv k_A$ denote the number of modes in the subsystem. The correlation matrix of a single copy, defined as $G_{\alpha\beta} = \langle c^\dagger_\alpha c_\beta \rangle$, is a $k \times k$ matrix which is one-to-one with the matrix $h$, $G = (e^{h^T} + 1)^{-1}$. The entropy of a single copy can be written in terms of $G$ using the well-known formula $S = -\mathrm{tr}\left[G \ln G + (1 - G)\ln(1 - G)\right]$ (6.34). Clearly then if we knew the spectrum of $G$ we could determine the entropy of the $N$ copy system. However, $G$ itself is not an ideal object to study since it is basis dependent. The basis independent spectrum of $G$ can be obtained from the $k$ numbers $\mathrm{tr}(G^\ell)$ for $\ell = 1, \ldots, k$. To construct suitable observables consider the group $\mathrm{U}(k)$ of unitary transformations acting on the modes $c_\alpha$. The generators of this group are $q = \sum_\alpha c^\dagger_\alpha c_\alpha$ and $j^A = \sum_{\alpha\beta} c^\dagger_\alpha t^A_{\alpha\beta} c_\beta$, where $t^A$ are the analogs of the Pauli matrices for $\mathrm{SU}(k)$ and $q$ generates the global phase rotation in $\mathrm{U}(k)$. Under $\mathrm{U}(k)$ transformations $q$ is invariant and $j^A$ transforms in the adjoint representation. Now on the $N$ copy system we have the corresponding observables $Q = \frac{1}{N}\sum_{i=1}^N \sum_\alpha c^\dagger_{\alpha i} c_{\alpha i}$ and $J^A = \frac{1}{N}\sum_{i=1}^N \sum_{\alpha\beta} c^\dagger_{\alpha i} t^A_{\alpha\beta} c_{\beta i}$. With the factor of $1/N$ these observables are normalized so that their fluctuations vanish in the large $N$ limit. Essentially, this is the addition of angular momentum, generalized from $\mathrm{SU}(2)$ to $\mathrm{U}(k)$. From these observables we construct the $k$ Casimir invariants of $\mathrm{U}(k)$, $C_1 = Q$, $C_2 = \sum_A J^A J^A$, and so on up to the $k$-th Casimir containing $k$ factors of $J^A$ (see footnote 11).
Footnote 11: The $k$-th Casimir may be obtained from the invariant tensor in the $k$-fold tensor product of adjoint representations, i.e. the fusion to the identity of a product of $k$ $J^A$s.

The Casimirs, being invariant operators, are not sensitive to the basis which diagonalizes $G$, but they do reveal the spectrum of $G$. For example, if $\lambda_i$ are the eigenvalues of $G$, then the expectation value of $C_1 = Q$ is $\langle C_1 \rangle_N = \sum_i \lambda_i$. The expectation value of a general $C_\ell$ contains terms of the form $\mathrm{tr}(G^n)$ with $n \le \ell$. Taken together, the expectation values of all the $C_\ell$ suffice to determine the spectrum of $G$. Furthermore, as already mentioned, the fluctuations of the $C_\ell$ vanish in the large $N$ limit, so the spectrum of $G$ becomes in essence a classical variable which can simply be read off from the state without disturbing it.
Since the spectrum determines the single copy entropy via (6.34), the entropy operator may be taken to be of the form (6.3) where the projectors are projective measurements of the k Casimir operators constructed above.
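To give one simple example of this construction, consider the case $k = 2$. Then we are dealing with $\mathrm{U}(2)$ and the $t^A$ may be taken to be the Pauli matrices $\sigma^x$, $\sigma^y$, and $\sigma^z$. The correlation matrix $G$ has two eigenvalues, $\lambda_1$ and $\lambda_2$. The expectation values of the Casimirs are $\langle C_1 \rangle_N = \lambda_1 + \lambda_2$ (6.42) and a second relation, (6.43), which fixes $\langle C_2 \rangle_N$ in terms of $\lambda_1$ and $\lambda_2$; together they determine both eigenvalues. Let $P_{c_1,c_2}$ denote the projector onto joint eigenspaces of $C_1$ and $C_2$ labelled by $c_1$ and $c_2$, and let $\lambda_{1,2}(c_1, c_2)$ denote the eigenvalues of $G$ obtained by solving (6.42) and (6.43) for the measured values. It is also useful to define the function $\chi(x)$ to be $0$ for $x < 0$, $x$ for $x \in [0, 1]$, and $1$ for $x > 1$. The binary entropy is again $H(p) = -p \ln p - (1 - p)\ln(1 - p)$. The entropy operator may then be taken to be $\hat S = \sum_{c_1, c_2} N \left[ H\big(\chi(\lambda_1(c_1, c_2))\big) + H\big(\chi(\lambda_2(c_1, c_2))\big) \right] P_{c_1, c_2}$ (6.44).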
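The basis independence of this construction can be illustrated for $k = 2$ at the level of the single-copy correlation matrix. The sketch below is an assumption-laden stand-in for the Casimir measurement: instead of the operators $C_1$ and $C_2$ it uses the two invariants $\mathrm{tr}\,G$ and $\mathrm{tr}\,G^2$, together with the identity $(\lambda_1 - \lambda_2)^2 = 2\,\mathrm{tr}\,G^2 - (\mathrm{tr}\,G)^2$, to recover the spectrum. The example eigenvalues, the rotation angles, and all function names are hypothetical.

```python
import cmath
import math

def binary_entropy(x):
    """H(x) = -x ln x - (1-x) ln(1-x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def clamp(x):
    """The function chi(x): 0 below 0, x on [0, 1], 1 above 1."""
    return min(1.0, max(0.0, x))

def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def invariants(g):
    """Basis-independent data (tr G, tr G^2) of a 2x2 Hermitian matrix."""
    g2 = matmul(g, g)
    return (g[0][0] + g[1][1]).real, (g2[0][0] + g2[1][1]).real

def entropy_from_invariants(tr_g, tr_g2):
    """Single-copy entropy H(lambda_1) + H(lambda_2) from the invariants."""
    gap = math.sqrt(max(0.0, 2.0 * tr_g2 - tr_g ** 2))  # |lambda_1 - lambda_2|
    return (binary_entropy(clamp((tr_g + gap) / 2.0))
            + binary_entropy(clamp((tr_g - gap) / 2.0)))

# Hypothetical correlation matrix with eigenvalues 0.9 and 0.2, then rotated
# by a U(2) element so that its eigenbasis is no longer the mode basis.
diagonal_g = [[0.9, 0.0], [0.0, 0.2]]
theta, phase = 0.7, cmath.exp(1j * 1.3)
u = [[math.cos(theta), -math.sin(theta) * phase],
     [math.sin(theta) * phase.conjugate(), math.cos(theta)]]
rotated_g = matmul(matmul(u, diagonal_g), dagger(u))

print(entropy_from_invariants(*invariants(diagonal_g)),
      entropy_from_invariants(*invariants(rotated_g)))
```

The entropy of the $N$ copy state is $N$ times the single-copy value, and the clamp $\chi$ only matters when measured invariants fluctuate to values that would put an eigenvalue outside $[0, 1]$.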

Different sets of states
As discussed in the homology section and alluded to generally above, one can choose different sets of states to define an entropy operator. For example, one can consider the entropies of subsystems of one side of a two-sided black hole. In this case the homology constraint has an effect because there is a wormhole. We may define an entropy operator which measures the classical geometry outside the black hole and returns the appropriate area in Planck units as the entropy. However, this same entropy operator, when applied to black hole microstates, will still give entropies appropriate to the corresponding two-sided state. In particular, the homology constraint will not be properly implemented and the entropy of a region and its one-sided complement will not agree. By the same token, an entropy operator defined for black hole microstates will also not in general function correctly when applied to two-sided states. Of course, this is consistent with everything we said above because these two sets of states are related by superpositions of exponentially many elements. One and two-sided black holes do agree when we restrict to subsystems of less than half the system size. This did not have to be so (it does not follow from just large $N$) but is a consequence of strong coupling (dominance of the identity block). More generally, we would only expect sufficiently small subsystems to agree between one and two-sided black holes.

Still another interesting class of states is black holes formed by collapse. We can define another linear entropy operator appropriate to these time dependent states, and this operator only sometimes agrees with the operator for two-sided black holes.

Recap
What the results of this section establish is that effective linear entropy operators exist for simple non-interacting large N systems. Moreover, the thermodynamic analysis showed that strictly non-interacting copies were not essential; only something analogous to a thermodynamic limit need exist. The preceding sections established that a linear entropy operator also exists for very strongly interacting large N theories. These data points are suggestive of a more general picture in which the key physics is simply large N . Indeed, in the beginning of this section we gave a general argument, framed in terms of gentle distinguishing measurements, that large N was sufficient. The physics is that large N renders appropriate sets of states semi-classical and hence distinguishable. Large N also gives us leave to neglect small entropies of mixing, as in (6.11).
In the case of thermal states indexed by temperature, one could simply measure the energy to gently distinguish different temperatures. For a conformal field theory, this amounts to a measurement of the field theory stress tensor averaged over some region.
For theories that are furthermore holographic and described by Einstein gravity, the stress tensor again plays a privileged role. This is true both for thermal states and more generally. This is because the dual geometry is a natural semi-classical variable that distinguishes different states. Furthermore, in Einstein gravity the geometry is closely related to the field theory stress tensor; a fact reflected in the dominance of the Virasoro identity block in conformal field theories dual to Einstein gravity.
Hence holographic duality has two remarkable aspects: the entropy is a linear operator on certain classes of states (true for all large N theories) and the entropy operator has an incredibly simple interpretation in the dual geometry.

Considerations and future directions
In this paper we have analyzed in some detail the entropy of macroscopic superpositions in semi-classical states within the context of AdS 3 /CFT 2 . The main technical tool used was the dominance of the Virasoro identity block in computations of the entropy, a technique that relies on large central charge c and strong coupling (sparse spectrum). We also gave arguments that the same results would be obtained in Einstein gravity in higher dimensions and in fact in a wide variety of systems with an appropriate large N or thermodynamic limit. In this final section we investigate some consequences of our results for certain aspects of quantum information and quantum gravity.
First we note that our extended RT proposal is the same as the recent independent proposal of [57]. They reviewed the standard argument that entropy cannot be a linear operator and argued that the entropy of mild superpositions would approximately average, assuming that at large $N$ different Schmidt bases are uncorrelated. Our distinguishability arguments include this assumption as a special case and provide a more general information theoretic understanding of entropy as a linear operator. We have also explicitly demonstrated that entropies average for holographic CFT$_2$s and shown how to construct entropy operators for the non-interacting limit of a large $N$ vector model. Thus our analysis includes both weak and strong coupling. Our investigation also considered a number of additional features including the interplay of linearity and homology, the non-linearity of Renyi entropies, and the precise limits of linearity.

Conditions for a semi-classical spacetime
Our results also bear on the entropic approach to bulk reconstruction. For example, it has been found that the leading order in $N^2$ contribution to the tripartite information for any three subregions is nonpositive for any semi-classical holographic state [12]. However, since the tripartite information is linear in the entropies, the inequality $I_3 \le 0$ will continue to hold even for superpositions of semi-classical states. In fact, this conclusion holds for the entropy cone of [58] since it is closed under averaging.

Quantum error correction and superpositions
Our results imply that we can enlarge the code subspace of states employed in the interpretation of holography as a quantum error correcting code [22]. There the code subspace is defined as a space of states perturbatively close to a single reference state (such as the vacuum) that has a semi-classical description in the bulk. Bulk operators in the entanglement wedge of some boundary region therefore have representations that are supported in that region and act within the code subspace [23]. Our results suggest that the code subspace can actually be enlarged to a direct sum of such subspaces, each of which is defined around a different reference state. Perturbative bulk operators therefore have representations that are block diagonal in the code subspace.
One can prove this last statement provided that the different semi-classical states are distinguishable within the entanglement wedge. Consider a code subspace composed of a direct sum of such distinguishable code subspaces $H_i$ each defined around a different reference state. Here distinguishable means the states have different geometries and obey (6.2) to a high degree of approximation. These code subspaces are not perturbatively connected. Next, consider an operator $\phi$ defined in such a way with respect to the boundary that it acts within the entanglement wedge of some region $A$ in all states in the full code subspace. We now show that if the operator $\phi$ satisfies the condition for operator algebra quantum error correction (OAQEC) proved in [22] for a set of code subspaces $H_i$ distinguishable within $A$, then it is also satisfied within $\oplus_i H_i$. In particular, we will show that $\langle\psi| [\phi, X_{\bar A}] |\psi'\rangle = 0$ (7.1) for arbitrary $|\psi\rangle$ and $|\psi'\rangle$ within $\oplus_i H_i$, and for any operator $X_{\bar A}$ on the complement region $\bar A$. We can decompose the states under the direct sum as $|\psi\rangle = \sum_i |c_i\rangle$ and $|\psi'\rangle = \sum_i |c'_i\rangle$ with $|c_i\rangle, |c'_i\rangle \in H_i$, and find $\langle\psi| [\phi, X_{\bar A}] |\psi'\rangle = \sum_i \langle c_i| [\phi, X_{\bar A}] |c'_i\rangle + \sum_{i \neq j} \langle c_i| [\phi, X_{\bar A}] |c'_j\rangle$.

The first sum vanishes by virtue of $\phi$ satisfying the OAQEC condition within any $H_i$. We finally argue that the second sum is also zero. Since $\phi$ acts perturbatively within the code subspace, the second term is a sum of terms of the form $\langle c_i| X_{\bar A} |c_j\rangle$ with $i \neq j$, where the two states are distinguishable within the region $A$. As discussed earlier around (6.2), this entails the existence of a projection operator purely on $A$ which projects on either of the two code subspaces. Since this operator commutes with any $X_{\bar A}$, these matrix elements of $X_{\bar A}$ must vanish.
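The last step can be written out explicitly. In the following worked step, $P_i$ is our notation (an assumption, not fixed by the text) for the projector supported on $A$ that selects the code subspace $H_i$, taken here to be exact rather than approximate.

```latex
% Assumptions: P_i acts only on A, with P_i |c_i> = |c_i> and P_i |c_j> = 0
% for j != i; X_{\bar A} acts only on the complement, so [P_i, X_{\bar A}] = 0.
\begin{align}
\langle c_i | X_{\bar A} | c_j \rangle
  &= \langle c_i | P_i \, X_{\bar A} | c_j \rangle
   && \text{since } \langle c_i | P_i = \langle c_i | \\
  &= \langle c_i | X_{\bar A} \, P_i | c_j \rangle
   && \text{since } [P_i, X_{\bar A}] = 0 \\
  &= 0
   && \text{since } P_i | c_j \rangle = 0 \text{ for } j \neq i .
\end{align}
```

With only approximate distinguishability, the same chain of equalities holds up to errors controlled by how well (6.2) is satisfied.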

A nonlinearity for single sided pure states
We demonstrated in section 5 that there cannot be a linear entropy operator for all semiclassical states with two asymptotic boundaries. This was primarily due to topology change induced by superposing exponentially many semiclassical states. We argue here that the same obstruction applies in semiclassical states with a single asymptotic boundary. Building on [59], consider a state that describes two black holes in pure microstates, separated by some large distance in global AdS (see footnote 12). Moreover, consider the setup where the two black holes have non-overlapping gravitational dressing to two different CFT regions $A$ and $A^c$, as is shown in figure 8. Such a state can be created by acting on the vacuum with a product unitary as $|\psi_i\rangle = U^i_A U^i_{A^c} |0\rangle$, where $i$ labels the black hole microstate. These unitaries are chosen such that the states $|\psi_i\rangle$ are distinguishable both on $A$ and $A^c$, satisfying the distinguishability condition (7.4) on $A^c$ and similarly on $A$. Since the state is prepared by a product unitary on $A$ and $A^c$, the entanglement entropy of either of those regions will be exactly that of the vacuum. As shown in figure 8, the RT surface is simply that of vacuum AdS. Since by the entanglement wedge reconstruction proposal the area operator can be viewed as supported either on $A$ or $A^c$, it will be degenerate within the subspace spanned by $|\psi_i\rangle$ with its eigenvalue given by that in the vacuum. So within this subspace we can write $\hat S = S_A(|0\rangle)\,\mathbb{1}$ (7.5), where $S_A(|0\rangle)$ is the entanglement entropy in the vacuum state. Consider now the superposition of the states $|\psi_i\rangle$ involving all the microstates of the black hole. Here we are restricting to some energy window that involves summing over an exponential number of states. This is $|w\rangle = \frac{1}{\sqrt{M}} \sum_{i=1}^{M} |\psi_i\rangle$, where $M$ is the number of microstates. This state is expected to be dual to a wormhole connecting the two black holes in global AdS. This is motivated by ER=EPR [46] ideas and is also supported by explicit constructions involving pair creation of black holes via tunneling [60,61].
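The trace of the replicated density matrix of $A$ in the state $|w\rangle$ is $\mathrm{tr}\big(\rho_A(w)^n\big) = M^{-n} \sum_{i=1}^{M} \mathrm{tr}\big((\rho^i_A)^n\big) = M^{1-n}\, \mathrm{tr}(\rho_A^n)$, where $\rho_A$ is the vacuum reduced density matrix, $\rho^i_A = U^i_A \rho_A U^{i\dagger}_A$, and (7.4) implies that $\rho^i \rho^j = \delta_{ij} (\rho^i)^2$ as an operator statement. This gives the von Neumann entropy $S_A(|w\rangle) = S_A(|0\rangle) + \ln M$ (7.8). Had we used (7.5) to compute the entropy we would completely miss the $\ln M$ contribution coming from the area of the wormhole, which captures the entanglement between the black holes.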
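For completeness, the step from the replicated trace to (7.8) can be spelled out as follows. This is a short derivation under the assumptions stated in the text, namely that the cross terms between different microstates vanish and that each $\rho^i_A$ is unitarily equivalent to the vacuum reduced density matrix $\rho_A$; the symbol $\rho_A(w)$ for the reduced density matrix of $|w\rangle$ is our notation.

```latex
% rho_A(w) = (1/M) sum_i rho_A^i, with rho_A^i rho_A^j = 0 for i != j and
% tr (rho_A^i)^n = tr rho_A^n because rho_A^i = U_A^i rho_A U_A^{i dagger}.
\begin{align}
\mathrm{tr}\,\rho_A(w)^n
  &= M^{-n} \sum_{i=1}^{M} \mathrm{tr}\,(\rho_A^i)^n
   = M^{1-n}\, \mathrm{tr}\,\rho_A^n , \\
S_A(|w\rangle)
  &= -\,\partial_n \ln \mathrm{tr}\,\rho_A(w)^n \Big|_{n=1}
   = \ln M \;-\; \partial_n \ln \mathrm{tr}\,\rho_A^n \Big|_{n=1}
   = \ln M + S_A(|0\rangle) .
\end{align}
```

The $\ln M$ term is exactly the entropy of mixing for $M$ equally weighted, mutually distinguishable states, which is of order $N$ once $M$ is exponentially large.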

Connections to one-shot information theory
Because our arguments relied on a kind of thermodynamic limit, they are related to recent studies of the so-called one-shot information theory of quantum field theories [62]. Standard many-copy information theory deals with operational tasks like compression in the limit where the states of interest consist of many independent copies of a single state, the model considered in section 6.1. One speaks about compression rates, for example: the resources needed per copy to compress many copies of a state. The resources needed in the single copy limit are typically different, but in many cases the existence of a thermodynamic limit in a single copy setting is sufficient to effectively be in the many copy limit. It would be interesting to further explore these connections as part of the burgeoning one-shot information theory of quantum fields. A concrete question concerns the possibility of universal compression, similar to known results in the $N$ qubit model, but perhaps based on representations of the conformal group instead of the permutation group. One application of these ideas is the justification of the oft made assumption that one may reason about holographic entanglement by simply "counting Bell pairs".

Tensor networks for superpositions
Another interesting direction relates to tensor network models of AdS/CFT. Because of the connection between tensor networks and geometry, it is a prediction of our work that superpositions of macroscopically distinct tensor networks obey the extended RT proposal in its network form. One setting where this prediction can be tested is the random tensor network models introduced in [63] and generalized and studied in detail in [64]. Some care must be exercised, since the simplest random tensor network calculations involve not the entanglement entropy but the second Renyi entropy (which does not behave as a linear operator as we show below). However, the general distinguishability arguments should apply to random tensor networks, so we expect that the extended RT proposal does apply to them. One simplified setting where this could be explicitly checked consists of so-called random stabilizer tensor networks. Every subsystem density matrix of a stabilizer network has a flat spectrum, so the analysis of random stabilizers is considerably simpler than for generic random tensors. It would also be interesting to explore the construction of a more elaborate single tensor network which encodes a superposition of simpler tensor networks.

Comment on (non)linearity of Renyi entropies
Finally, we briefly comment on the inherent non-linearity of the Renyi entropy. Recall that the Renyi entropy $S_n$ is defined as $S_n = \frac{1}{1 - n} \log \mathrm{tr}(\rho^n)$ (7.9). It is usually assumed in field theory calculations that the limit $n \to 1$, which recovers the von Neumann entropy, is smooth. In fact, the identity block calculations above are only really controlled in the limit $n \to 1$ with $c(n - 1)$ kept large.
Here we show that for superpositions of the type we have been considering the Renyi entropy is badly discontinuous as a function of $n$ if the large $N$ limit is taken first. For simplicity consider two states $\rho_a$ and $\rho_b$ with no overlap, $\rho_a \rho_b = 0$, and the mixture $\rho = p\,\rho_a + (1 - p)\,\rho_b$. Its Renyi entropy is $S_n = \frac{1}{1 - n} \log\left( p^n e^{(1-n) S_a} + (1 - p)^n e^{(1-n) S_b} \right)$, where $S_a = N s_a$ and $S_b = N s_b$ are the Renyi entropies of $\rho_a$ and $\rho_b$. To gain intuition set $p = 1/2$; the expression inside the logarithm has drastically different behavior depending on whether $n < 1$ or $n > 1$. Suppose without loss of generality that $S_a \ge S_b$. Then for $n < 1$ the Renyi entropy is $S_{n<1} = S_a - \frac{n}{1 - n} \log 2 + O\left(e^{-N(1-n)(s_a - s_b)}\right)$ (7.12). Hence for fixed $n < 1$ the limit $N \to \infty$ gives $S_{n<1} = S_a$ to leading order. For $n > 1$ the $S_a$ term inside the logarithm is now exponentially smaller than the $S_b$ term. Hence $S_{n>1} = S_b - \frac{n}{1 - n} \log 2 + O\left(e^{-N(n-1)(s_a - s_b)}\right)$ (7.13), and the large $N$ limit again produces a discontinuity. These results are not an artifact of setting $p = 1/2$. For any $p \in (\epsilon, 1 - \epsilon)$ with $\epsilon$ fixed, the Renyi entropy is discontinuous as $N \to \infty$.
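The jump across $n = 1$ is easy to see numerically. The sketch below assumes, for simplicity, that $\rho_a$ and $\rho_b$ have flat spectra, so that $\mathrm{tr}(\rho_x^n) = e^{(1-n) S_x}$ with an $n$-independent $S_x$; the entropy densities $s_a = 1.0$ and $s_b = 0.5$ and the function name are hypothetical.

```python
import math

def renyi_entropy_of_mixture(n, p, s_a, s_b):
    """S_n of rho = p rho_a + (1-p) rho_b for orthogonal rho_a and rho_b.

    Assumption: both states have flat spectra, so tr(rho_x^n) = exp((1-n) S_x).
    The logarithm of the sum is evaluated stably (log-sum-exp).
    """
    terms = [n * math.log(p) + (1.0 - n) * s_a,
             n * math.log(1.0 - p) + (1.0 - n) * s_b]
    top = max(terms)
    log_trace = top + math.log(sum(math.exp(t - top) for t in terms))
    return log_trace / (1.0 - n)

s_density_a, s_density_b = 1.0, 0.5  # hypothetical entropies per degree of freedom
for big_n in (10, 100, 10000):
    below = renyi_entropy_of_mixture(0.9, 0.5, big_n * s_density_a, big_n * s_density_b)
    above = renyi_entropy_of_mixture(1.1, 0.5, big_n * s_density_a, big_n * s_density_b)
    print(big_n, below / big_n, above / big_n)
```

As $N$ grows, the entropy per degree of freedom approaches $s_a$ just below $n = 1$ and $s_b$ just above it, so the leading-order Renyi entropy jumps by $N(s_a - s_b)$ across $n = 1$.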
Before proceeding we note that the operators O h({a i }) are sums over primary operators with canonically normalized two point functions, 1/x h+h , and so will not be canonically normalized themselves. We fix this with the following rescaling Before taking the derivative with respect to n we need to perform the sum over the a i 's. For n > 1 the summand involves four point functions of heavy operators whose Virasoro identity block contribution is not known in closed form. To get around this, we first consider a modified form of the above equation where we replaced n in the upper limit of the sum over a i and in the combinatoric factor with a new variable m. We will tune m and n separately in the meantime and then take the m → n limit before differentiating. Next, we take n close to 1 and use the known closed form expression of the identity block for this four point function. These are where l is the size of the interval. Note, that here we have specialized to the case of operators with no spin. This function is unfortunately sufficiently complicated that we cannot perform the sum directly. Instead, we perform a Taylor expansion of the function

where the sum over $k$ runs only over the evens because of $\sum_{j=1}^{k} j \times c_j = k$ and equations (A.16) and (A.17); since all the $c_{\text{odd}}$ terms vanish, $k$ must be even.
Plugging this into the formula for the entropy, we have that Thus, we can perform the differentiation and continuation in $n$ before summing over $k$. Let us consider the terms with $k = 0$ and $k \neq 0$ separately. Before differentiation, the $k = 0$ term is

Finally, we perform the check that our method of expanding and resumming preserves the requirement that $\mathrm{Tr}\,\rho^n \to 1$ as $n \to 1$. From equation (A.21), we can read off the form of $\mathrm{Tr}\,\rho^n$, implying that $\lim_{n \to 1} G_k(n) = \delta_{k,0}$. Therefore $\mathrm{Tr}\,\rho^n \to 1$ as $n \to 1$, as required.

Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.