Leading order corrections to the quantum extremal surface prescription

We show that a naïve application of the quantum extremal surface (QES) prescription can lead to paradoxical results and must be corrected at leading order. The corrections arise when there is a second QES (with strictly larger generalized entropy at leading order than the minimal QES), together with a large amount of highly incompressible bulk entropy between the two surfaces. We trace the source of the corrections to a failure of the assumptions used in the replica trick derivation of the QES prescription, and show that a more careful derivation correctly computes the corrections. Using tools from one-shot quantum Shannon theory (smooth min- and max-entropies), we generalize these results to a set of refined conditions that determine whether the QES prescription holds. We find similar refinements to the conditions needed for entanglement wedge reconstruction (EWR), and show how EWR can be reinterpreted as the task of one-shot quantum state merging (using zero-bits rather than classical bits), a task gravity is able to achieve optimally efficiently.


Introduction
The quantum extremal surface (QES) prescription [1] says that the entropy of a boundary region B in AdS/CFT is given by

$$S(B) = \min_{\gamma}\ \operatorname{ext}_{\gamma} \left[\frac{A(\gamma)}{4G} + S_{\text{bulk}}(\gamma)\right]. \tag{1.1}$$

Here we are extremizing over bulk surfaces γ that are homologous to B, $A(\gamma)$ is the area of the surface γ, G is Newton's constant, and $S_{\text{bulk}}(\gamma) = -\operatorname{tr}(\rho \ln \rho)$ is the von Neumann entropy of the state ρ of the fields in the bulk region (known as the entanglement wedge) bounded by the surface γ and the boundary region B.¹ The combination of the two terms is known as the generalized entropy. The perceived role of the $S_{\text{bulk}}$ term has shifted over time. Because of the explicit factor of 1/G, the area term becomes very large in the semiclassical limit G → 0. The bulk entropy term, which has no such factor, was therefore initially regarded as a small, perturbative correction. In the last year or two, this view has changed. It has become clear that the QES prescription is still valid - and indeed plays a crucial role - even in situations where $S_{\text{bulk}}$ is very large, and so competes with the area term. In particular, it was shown in [2,3] that the QES prescription gives a unitary Page curve for the entanglement entropy of an evaporating black hole. This Page transition happens when the bulk entropy of the trivial 'empty' QES becomes larger than the area term for a non-trivial QES that lies near the horizon.
However, as we shall see, in such situations considerable care is needed when applying the QES prescription. For many states, a naïve application of the QES prescription gives contradictory answers, which are incompatible with basic properties of von Neumann entropies, even at leading order in 1/G.
The primary aim of this paper is to (1) show that such contradictions exist, (2) show how the contradictions are resolved by more careful calculations, producing leading order corrections to the QES prescription, and (3) give general conditions for when the naïve QES prescription is valid, and when it needs to be replaced by a more refined version.
The contradictions can arise whenever there are two extremal surfaces, with O(1/G) bulk entropy in the intermediate region between the two. While common enough in AdS/CFT, this situation is also central to the phase transition that produces the Page curve of the evaporating black hole. Indeed, in Section 2 we show that black hole evaporation can still lead to entropies inconsistent with unitarity, when using the naïve QES prescription.

Figure 1. Setup in which we derive a contradiction from a naïve application of the QES prescription. The boundary is divided into two subregions, B and $\bar{B}$. For both, there are two competing quantum extremal surfaces, $\gamma_1$ and $\gamma_2$, with $\gamma_1$ homotopic to $\bar{B}$ and $\gamma_2$ to B. We take B to be larger, such that the area of $\gamma_2$ is bigger than that of $\gamma_1$ at O(1). Between these surfaces is a large amount of matter (the "dustball"), such that some states of the matter have entropy much larger than the difference in areas of the two surfaces.
Returning to AdS/CFT, a useful example setup was given in [4]. Consider 2+1d AdS, with the boundary divided into four regions as shown in Figure 1. Let two diametrically opposed regions be slightly larger than the other two, such that the union of those two, named B, has a connected entanglement wedge in the absence of bulk entropy. The complement of the boundary region B shall be labelled $\bar{B}$. There are two extremal surfaces homologous to B: one homotopic to $\bar{B}$ and labelled $\gamma_1$, and one homotopic to B and labelled $\gamma_2$. These surfaces divide the bulk into three regions: one named b that neighbours B, one named $\bar{b}$ that neighbours $\bar{B}$, and a central region labelled $b'$ that is bounded by the two extremal surfaces. Let there be matter in $b'$ with energy O(ε/G), for some ε ≪ 1. The backreaction is under control for small enough ε, and the bulk matter can have entropy roughly equal to its energy. We can therefore easily dial the size of B such that some bulk states have a bulk entropy larger than the area difference, while all states have the same approximate classical geometry.
Consider two states. In the first, the bulk matter is in a pure energy eigenstate. The matter therefore does not contribute to $S_{\text{bulk}}$, and the entanglement wedge of B is connected. In the second, the matter is in a thermal state with the same average energy. We tune the region B such that the large entropy of the thermal state causes its entanglement wedge to be disconnected. Hence, the von Neumann entropies are

$$\text{Pure:} \qquad S(B) = \frac{A_1}{4G} + \text{subleading}, \tag{1.2}$$
$$\text{Thermal:} \qquad S(B) = \frac{A_2}{4G} + \text{subleading}. \tag{1.3}$$

Here $A_1$ and $A_2$ are the areas of $\gamma_1$ and $\gamma_2$ respectively. Now we can formulate the contradiction. What is S(B) for a state that is a mixture of the pure state and the thermal state? In other words, for

$$\rho_{\text{matter}} = p\,|\psi\rangle\langle\psi| + (1-p)\,\rho_{\text{thermal}}\,. \tag{1.4}$$
A naïve application of the QES prescription tells us that, at leading order, the answer is

$$\text{Mixture:} \qquad S(B)_{\text{naïve}} = \min\left(\frac{A_1}{4G} + (1-p)S_{\text{thermal}},\ \frac{A_2}{4G}\right). \tag{1.5}$$

However, this can't be correct. The AdS/CFT bulk-to-boundary map is linear, so the global boundary state must also be a mixture of the two boundary states. And, if the global state is a mixture of the two states, the reduced state will also be a mixture of the two reduced states. In general, the von Neumann entropy S(ρ) of a mixture of quantum states with density matrices $\rho_i$ is bounded from above and below by

$$S\Big(\sum_i p_i \rho_i\Big) \geq \sum_i p_i\, S(\rho_i)\,, \tag{1.6}$$
$$S\Big(\sum_i p_i \rho_i\Big) \leq \sum_i p_i\, S(\rho_i) - \sum_i p_i \ln p_i\,; \tag{1.7}$$

see e.g. [5].² Together, the bounds (1.6) and (1.7) are quite restrictive, forcing a mixture of k states to have entropy at most O(ln k) different from the average entropy of those states. In particular, the entropy of a mixture of O(1) states is within O(1) of the average entropy within the mixture.

² The lower bound formalizes an intuitive fact: the uncertainty of a mixture of states must be at least as large as the average uncertainty of each of those states. The upper bound holds because ρ must have less entropy than a state that includes a correlated reference system with orthonormal basis $|i\rangle$: $S(\sum_i p_i \rho_i) \leq S(\sum_i p_i \rho_i \otimes |i\rangle\langle i|) = \sum_i p_i S(\rho_i) - \sum_i p_i \ln p_i$.

Why does the naïve QES prescription fail for the mixture when (we claim) it gives the correct answer for both the pure state and the thermal state individually? Intuitively, this is because it is a mixture of states that lie on different sides of the phase transition. But this notion is not very precise: the thermal state can itself be written as a mixture of states that are on either side of the transition (admittedly in this case one either needs a large number of states or some probabilities in the mixture to be very small), and yet it doesn't receive large corrections.
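The bounds (1.7) on the entropy of a mixture are easy to confirm numerically. The following sketch is our own illustration (not from the paper): randomly generated density matrices stand in for the pure and thermal states.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -tr(rho ln rho), computed from the eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop numerical zeros
    return float(-np.sum(evals * np.log(evals)))

def random_density_matrix(d, rank, rng):
    """A random rank-`rank` density matrix on a d-dimensional space."""
    G = rng.normal(size=(d, rank)) + 1j * rng.normal(size=(d, rank))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
d, p = 16, 0.3
rho_pure = random_density_matrix(d, 1, rng)   # stand-in for |psi><psi|
rho_mixed = random_density_matrix(d, d, rng)  # stand-in for rho_thermal
rho = p * rho_pure + (1 - p) * rho_mixed

avg = p * von_neumann_entropy(rho_pure) + (1 - p) * von_neumann_entropy(rho_mixed)
shannon = -(p * np.log(p) + (1 - p) * np.log(1 - p))  # mixing (Shannon) entropy
S = von_neumann_entropy(rho)

# Two-sided bound: average entropy <= S(mixture) <= average + Shannon term
assert avg - 1e-8 <= S <= avg + shannon + 1e-8
```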
A more precise answer is that, unlike the pure state and the thermal state, the mixture of the two is not perfectly compressible. We say a state ρ is perfectly compressible if we can throw away all but $e^{S(\rho)}$ of the states in its support without changing the state very much. More precisely, there must exist another state σ close to ρ such that $\ln \operatorname{rank}(\sigma) = S(\rho) + \text{subleading}$. In the thermodynamic limit, thermal states are dominated by energies close to the saddle point energy, and are therefore perfectly compressible. A pure state has rank one and hence is trivially perfectly compressible.
A general mixture of the two is not: any state σ close to $\rho_{\text{matter}}$ will have almost the same rank as an approximation to the thermal state itself, because $\rho_{\text{matter}} \geq (1-p)\,\rho_{\text{thermal}}$, so discarding part of its support also discards the corresponding part of the thermal state's support. In general, the compressibility of a quantum state is characterized not by its von Neumann entropy, but by a quantity known as the smooth max-entropy $H^\varepsilon_{\max}(\rho)$ [6,7]. This is defined by the fact that you can throw away at most all but $e^{H^\varepsilon_{\max}(\rho)}$ of the states in the support of ρ, without changing ρ very much. For thermal and pure states, the smooth max-entropy is approximately equal to the von Neumann entropy - implying those states are perfectly compressible - but in general it can be much larger. For example, in the mixture of a thermal and pure state, we have³

$$H^\varepsilon_{\max}(\rho_{\text{matter}}) \approx S_{\text{thermal}} > (1-p)\,S_{\text{thermal}} \approx S(\rho_{\text{matter}})\,. \tag{1.10}$$

To understand why this should be relevant to the QES prescription, we need to introduce the concept of entanglement wedge reconstruction (EWR) [10-12]. This says that the bulk matter in the entanglement wedge is encoded in the boundary state on the boundary subregion B. "Encoded," here, means that the set of bulk operators local to the entanglement wedge has a representation on B that acts faithfully on a "code" subspace of $B\bar{B}$. It turns out EWR is implied by the QES prescription [8,12].⁴ EWR (and hence the QES prescription) is deeply connected to compressibility. The intuition is that the number of degrees of freedom available in B to describe the bulk state in region $b'$ is given by the difference in areas $(A_2 - A_1)/4G$ between the two extremal surfaces. If the bulk state in region $b'$ cannot be compressed into these degrees of freedom, EWR for region $b'$ cannot be possible, and hence the QES prescription, with $\gamma_1$ the minimal QES, cannot be valid, even if $\gamma_1$ is the surface with the smallest generalized entropy.
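The gap between the von Neumann entropy and the smooth max-entropy of the mixture is easy to see in a toy model. In the sketch below (our own simplification, not from the paper), the 'thermal' state has an exactly flat spectrum, and the smooth max-entropy is estimated by counting how many eigenvalues are needed to retain weight 1 − ε.

```python
import numpy as np

def eps_rank(evals, eps):
    """Smallest number of eigenvalues whose total weight is >= 1 - eps:
    the exponential of (a simple estimate of) the smooth max-entropy."""
    evals = np.sort(evals)[::-1]
    return int(np.searchsorted(np.cumsum(evals), 1 - eps) + 1)

d = 2 ** 12                   # toy 'dustball' Hilbert space dimension
flat_spec = np.ones(d) / d    # flat 'thermal' spectrum, S_thermal = ln d
p, eps = 0.3, 1e-3

# Spectrum of p|psi><psi| + (1-p) rho_thermal, with |psi> an eigenvector
# of rho_thermal, so the two terms share an eigenbasis
evals = (1 - p) * flat_spec
evals[0] += p

S_vN = -np.sum(evals * np.log(evals))
H_max_eps = np.log(eps_rank(evals, eps))

# Von Neumann entropy is close to (1-p) S_thermal ...
assert abs(S_vN - (1 - p) * np.log(d)) < 1.0
# ... but the smooth max-entropy stays close to the full S_thermal = ln d
assert abs(H_max_eps - np.log(d)) < 0.1
```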
One of the main goals of this paper will be to formalize this intuition by showing that EWR can be reinterpreted as a particular information-theoretic task, called one-shot quantum state merging, where Alice has to communicate a compressed version of a quantum state to Bob.
To make a precise statement, detailing how the QES prescription needs to be modified given the discussion above, it is helpful to write the naïve QES prescription in the following form:

$$S(B)_{\text{naïve}} = \min\left(\frac{A_1}{4G} + S(b'|b),\ \frac{A_2}{4G}\right) + S(b)\,. \tag{1.11}$$

The quantity $S(b'|b) = S(bb') - S(b)$ is the conditional von Neumann entropy. We will argue that this naïve prescription only works when bulk states are perfectly compressible, because it implies that the inclusion (or not) of $b'$ in the bulk entropy term depends only on the von Neumann entropy $S(b'|b)$. In reality, the information from $b'$ is only accessible in B (and hence its entropy is only included in S(B)) when the quantum information in $b'$ can be compressed into $(A_2 - A_1)/(4G\ln 2)$ qubits. The relevant bulk entropy is therefore not the conditional von Neumann entropy $S(b'|b)$, but the conditional smooth max-entropy $H^\varepsilon_{\max}(b'|b)$. We'll explain $H^\varepsilon_{\max}(b'|b)$ in detail in Section 3 (along with the smooth conditional min-entropy $H^\varepsilon_{\min}(b'|b)$), but, roughly speaking, $H^\varepsilon_{\max}(b'|b)$ characterizes the compressibility of $b'$ when there is entanglement between $b'$ and $b$ (and $H^\varepsilon_{\min}(b'|b)$ is complementary to $H^\varepsilon_{\max}(b'|b)$). For all states, we have $H^\varepsilon_{\max}(b'|b) \geq S(b'|b) \geq H^\varepsilon_{\min}(b'|b)$. A central result of this paper will be to refine the conditions for the QES prescription (1.11), replacing it by⁵

$$S(B) = \begin{cases} \dfrac{A_1}{4G} + S(bb') + \text{subleading} & \text{if } H^\varepsilon_{\max}(b'|b) < \dfrac{A_2 - A_1}{4G}\,,\\[2ex] \dfrac{A_2}{4G} + S(b) + \text{subleading} & \text{if } H^\varepsilon_{\min}(b'|b) > \dfrac{A_2 - A_1}{4G}\,. \end{cases} \tag{1.12}$$

We will not give a "one answer fits all" description of the middle regime; it does not admit one as convenient as the naïve QES prescription. The entropy there depends on the details of the bulk entanglement. (That said, one can often estimate the answer by finding the average entropy of a set of constituent states, up to a Shannon term.) This refinement can be derived using replica trick calculations, and resolves the contradictions discussed above.
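The case structure of the refined conditions (1.12) can be summarized as a small decision sketch. This is illustrative logic of our own, not from the paper: the function name and interface are hypothetical, and the intermediate regime deliberately returns no answer.

```python
def refined_qes_entropy(A1, A2, G, H_max_eps, H_min_eps, S_bbprime, S_b):
    """Schematic decision logic for the refined QES conditions (1.12).

    All entropies are in nats. H_max_eps and H_min_eps stand for the
    conditional smooth max-/min-entropies of b' given b. Returns None in
    the intermediate regime, where no universal formula applies.
    """
    dA = (A2 - A1) / (4 * G)
    if H_max_eps < dA:
        # b' is compressible into the available degrees of freedom:
        # gamma_1 is the relevant QES and b' counts toward the bulk entropy
        return A1 / (4 * G) + S_bbprime
    if H_min_eps > dA:
        # essentially all of the wavefunction is past the transition:
        # gamma_2 is the relevant QES and b' is excluded
        return A2 / (4 * G) + S_b
    return None  # intermediate regime: depends on details of the bulk state

# Toy numbers with (A2 - A1)/4G = 2:
# pure matter (all entropies ~ 0) -> gamma_1 answer A1/4G
assert refined_qes_entropy(4.0, 12.0, 1.0, 0.0, 0.0, 0.0, 0.0) == 1.0
# thermal matter with H_min = H_max = 5 > 2 -> gamma_2 answer A2/4G
assert refined_qes_entropy(4.0, 12.0, 1.0, 5.0, 5.0, 5.0, 0.0) == 3.0
# mixture with H_max > 2 > H_min -> no universal answer
assert refined_qes_entropy(4.0, 12.0, 1.0, 5.0, 0.5, 3.5, 0.0) is None
```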
A heuristic way to understand the difference between these two prescriptions is that our refinement of the QES prescription recognizes that different parts of the wavefunction might be on different sides of the phase transition, whereas the naïve prescription assumes that the entire state has to be on one side or the other. The min-/max-entropies appear because they describe the largest/smallest parts of the wavefunction respectively. If the smooth max-entropy is less than $(A_2 - A_1)/4G$, we can be sure that no significant part of the wavefunction has undergone the transition. Similarly, if the smooth min-entropy is greater than $(A_2 - A_1)/4G$, we know that almost the entire wavefunction has undergone the transition. If they straddle $(A_2 - A_1)/4G$, then the entropy will depend on which parts of the wavefunction have crossed the transition.

Overview of paper
The paper is organized as follows.
In Section 2, we illustrate the problem with a naïve application of the QES prescription in more detail. We give several closely related examples of the naïve QES prescription violating the bounds on the von Neumann entropy of mixtures of states.
In Section 3, we review two quantities that are crucial for understanding the refined QES prescription: the smooth conditional min-entropy $H^\varepsilon_{\min}(A|B)$ and max-entropy $H^\varepsilon_{\max}(A|B)$.

In Section 4, we return to the simple examples from Sections 1 and 2 and carefully calculate their entropies using the replica trick. By avoiding the Lewkowycz-Maldacena assumption, we find an answer that disagrees with the naïve QES prescription but is consistent with the bounds on the entropy of mixtures. This answer depends on the relative sizes of three quantities: the smooth conditional min- and max-entropy, and the difference in area of the two competing quantum extremal surfaces.
In Section 5, we present general arguments that justify the conditions given in (1.12) for the existence of large corrections to the naïve QES prescription. We start by arguing this for so-called fixed-area states, and then argue that this extends to general holographic states, up to subleading corrections. A key tool is the connection between gravity calculations in fixed-area states and calculations in random tensor networks.
In Section 6, we update the conditions for entanglement wedge reconstruction (EWR), explaining how to generalize the results of Dong, Harlow, Wall [12] and Hayden, Penington [8], given this refinement of the QES prescription. These updated conditions clarify the relationship between EWR and a well-known quantum information task, one-shot quantum state merging. Our results demonstrate that EWR can be a maximally efficient form of one-shot quantum state merging, using zero-bits instead of the usual classical bits.
In Section 7, we present a more general refinement of the QES prescription conditions, applying in situations where there are more than two competing extremal surfaces. To do so, we first introduce two interesting new physically relevant subregions of the bulk: the min-entanglement wedge (min-EW) and max-entanglement wedge (max-EW). The naïve QES prescription applies if and only if the min-EW and max-EW are the same.
In Section 8, we mention some further implications of these results. In particular, we discuss how the smooth min-and max-entropies should be renormalized to get a UV-finite quantity.

Related work
This paper has some technical overlap with the recent papers [13,14]. They too find corrections to the QES prescription by carefully including more than one saddle in the replica trick, and they too use fixed-area states to simplify the calculation enough to do so.
There are three key differences between our corrections and theirs. One, the corrections we discuss can be $O(1/G)$, not just $O(1/\sqrt{G})$. Two, our corrections can exist for an O(1) range of $A_2 - A_1$, a window that does not vanish as G → 0. Finally, our corrections do not arise from fluctuations in the geometry, but rather from the bulk state affecting the boundary entropy in a different way than previously expected. In particular, the corrections in [13,14] are correctly computed by taking the expectation of the naïve QES prescription over all the classical geometries that can be created by the fluctuations.
We also provide a different argument justifying the use of fixed-area states to draw lessons about general states. Our argument also applies to the setups in [13,14], bounding the error in some of their assumptions.

Mixtures and contradictions
In this section, we further illustrate the need for a careful, refined application of the QES prescription, first generalizing the prior example by adding entanglement, then discussing the importance of the refinement for black hole entropy and the unitarity of black hole evaporation.

Contradiction 1: Dustball
Our first example is the dustball geometry, which was already presented in the introduction. However, we emphasize that many of the details, as presented there, were unimportant. The contradiction can easily be generalized to higher dimensions, to mixtures where neither state is pure, or to mixtures of a larger number of states (so long as the number is not exponential in 1/G).
We also note that we can easily adapt this example to find a similar contradiction where the state in $b'$ is highly entangled with the state in $b$, eventually illustrating the necessity of using conditional min- and max-entropies in (1.12).
The first step is to consider a purification of the mixed dustball state, where the dustball is entangled with a second, identical dustball in a different bulk spacetime, as in Figure 3. In other words, the bulk state is

$$\rho = p\,|\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi| + (1-p)\,|\Phi\rangle\langle\Phi|\,, \tag{2.1}$$

where $|\Phi\rangle$ is a purification of $\rho_{\text{thermal}}$. From a boundary perspective, the mixed CFT state is purified by a second identical CFT, which we shall call the reference system R. Consistent with our notational conventions, we use r to denote the bulk Hilbert space associated to the second CFT. Introducing the reference system R does not change the entropy S(B). However, since the overall state is pure, we have S(BR) = S(B). The entropy S(BR) can also be calculated using the naïve QES prescription. This time, the degrees of freedom in the homology region shared by both extremal surfaces (in this case $b \otimes r$) are entangled with the degrees of freedom between the two surfaces (region $b'$ as before). Unsurprisingly, the naïve QES prescription gives the same answers as before, and hence we again find a contradiction.

Contradiction 2: Black hole
A practically identical setup reaches the same contradiction, if we replace the dustballs with black holes [8]. See Figure 4 for the setup with a mixed state black hole (though we also consider two entangled black holes, which would look very similar to Figure 3).
The advantage of this setup is that black holes with entropy growing like 1/G are familiar. We can, for example, consider all states in an energy band of width ∆E ∼ O(1), centered on some high energy E. There are $e^{O(1/G)}$ states in this subspace, and generic density matrices in this band are expected to be black holes. Additionally, unlike the dustball, we can also use a single interval (in AdS₃/CFT₂) for our boundary region, because the black hole geometry has extremal surfaces on either side of the black hole.
Figure 3. As in Figure 1, but now we consider the entropy of BR, where R is an entire extra copy of the boundary, dual to its own dustball. The two dustballs are in a mixture of entangled states, given by (2.1). A naïve application of the QES prescription gives the wrong answer for the entropy S(BR).

The big disadvantage - indeed the reason we did not lead with this example - is that black hole microstates seem somewhat mysterious. One might worry that mixtures of black hole states, like (2.1), are secretly mixtures of classically distinct geometries, mixtures which people already expected to give averaged answers in the QES prescription. For example, a special case of the mixture of entangled black holes is the mixture of an energy eigenstate and the thermofield double (TFD) state.⁶ The TFD state is

$$|\text{TFD}\rangle = \frac{1}{\sqrt{Z}} \sum_i e^{-\beta E_i/2}\, |E_i\rangle |E_i\rangle \tag{2.2}$$

for some inverse temperature β and energy eigenstates $|E_i\rangle$. Two black holes entangled like this are connected by a wormhole [15], and hence there is a nontrivial homology constraint. This is very different from a factorized energy eigenstate, which has trivial homology constraints. The mixture of the two,

$$\rho = p\,|E\rangle\langle E| \otimes |E\rangle\langle E| + (1-p)\,|\text{TFD}\rangle\langle\text{TFD}|\,, \tag{2.3}$$

must therefore have the QES prescription applied to it with care, since it is a mixture of two distinct classical geometries. There is a history of speculating that - for this state - the naïve QES prescription gives an S(B) that is indeed the average entropy $(1-p)A/4G$ (see e.g. [16]). The argument was that the area operator is linear, and so its expectation value in this mixture of states must be the average of its expectation value in each. While that argument is fine, we emphasize that it does not explain away the contradictions we are pointing out. This can be made sharp using the insights from quantum error-correction in [17].⁷ From the quantum error-correction point of view, it is not necessary to count the black hole entropy as part of the "area." A choice of code subspace that includes the black hole microstates will regard the black hole entropy as part of the matter entropy. This would be inconsistent, giving an answer that does not equal $S(B) = (1-p)A/4G$, if the entropy of a mixture of black hole states were given by the naïve QES answer.

Contradiction 3: Hawking radiation
Our final contradiction appears in evaporating black holes. It was shown last year that using the QES prescription allows a gravity calculation of the decrease in entropy of Hawking radiation after the Page time [2,3]. This goes a long way towards resolving the famous black hole information paradox. However, there's a lingering paradox in those calculations, if the QES prescription is applied in the naïve way. We demonstrate this now.
Consider a post-Page time black hole B, having already emitted radiation R in state $\rho_R$. Introduce an ancilla qubit q, and entangle it with R in the following way. First, put q in a superposition

$$|q\rangle = \sqrt{1-p}\,|0\rangle + \sqrt{p}\,|1\rangle\,. \tag{2.4}$$

Then, perform a joint operation on qR, measuring the radiation if q is in state $|1\rangle$, and otherwise doing nothing. This measurement need not be complicated - a factorized measurement on each Hawking photon is simple and will suffice. Given measured state $|\psi\rangle_R$, the reduced state of the radiation becomes

$$\rho'_R = (1-p)\,\rho_R + p\,|\psi\rangle\langle\psi|_R\,. \tag{2.5}$$

Assuming that the evaporating black hole was following the Page curve, the entropy of the radiation, at leading order, will then be $(1-p)A_{\text{hor}}/4G$ (plus subleading corrections), where $A_{\text{hor}}$ is the area of the black hole horizon. What does the naïve QES prescription say that the entropy will be? As long as we don't measure the most recent Hawking quanta to escape into R, the locations of the quantum extremal surfaces will be unchanged. The generalized entropy of the empty surface will be $(1-p)S_{\text{rad}}$, where $S_{\text{rad}}$ is the semiclassical, thermal entropy of the radiation. The generalized entropy of the nonempty surface near the horizon will be $A_{\text{hor}}/4G$ as before.
As with our previous contradictions, this is just incorrect (assuming unitarity), even at leading order. The naïve QES prescription is giving an answer that is qualitatively just as wrong as the Hawking, information-loss answer. Indeed, for small values of (1 − p), the naïve QES prescription answer and the Hawking answer are the same.
A very similar contradiction can be created using purely unitary processes, without any measurements. One just creates an ancilla system A, in the state |0 , that is a copy of the radiation Hilbert space R. Then one applies a conditional swap operator (which again factorizes into a product of local interactions) that swaps A and R if and only if the qubit q is in the state |1 . Assuming unitarity, the form of the reduced state on R will again be given by (2.5). The generalized entropy of the empty surface will again be (1 − p)S rad , while the generalized entropy of the nonempty surface will be A hor /4G + pS rad . Again, we find a contradiction with unitarity at leading order.
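The unitarity claim here (the entropy of the post-measurement radiation, given by (2.5), sits within O(1) of (1 − p) times the Page-curve value) follows from the mixture bounds of Section 1, and can be confirmed in a toy model. This sketch is our own, with a random spectrum standing in for the radiation density matrix.

```python
import numpy as np

def vn(evals):
    """Von Neumann entropy from a spectrum (nats)."""
    evals = evals[evals > 0]
    return float(-np.sum(evals * np.log(evals)))

rng = np.random.default_rng(4)
d, p = 2048, 0.2
spec = rng.random(d)
spec /= spec.sum()          # toy spectrum of the radiation state rho_R

# rho'_R = (1-p) rho_R + p |psi><psi|, with |psi> the leading eigenvector
# of rho_R, so the mixture is diagonal in the same basis
evals = (1 - p) * np.sort(spec)[::-1]
evals[0] += p

S_R, S_mix = vn(spec), vn(evals)
h2 = -(p * np.log(p) + (1 - p) * np.log(1 - p))  # binary Shannon term

# Mixture bounds: (1-p) S_R <= S(rho'_R) <= (1-p) S_R + h2, an O(1) window,
# in contrast to the naive QES answer discussed in the text
assert (1 - p) * S_R - 1e-9 <= S_mix <= (1 - p) * S_R + h2 + 1e-9
```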

Summary
This section showed classes of examples in which a naïve application of the QES prescription gets the entropy wrong at leading order. In Section 4 we do a careful calculation that gets the entropy in these examples right, and then in Section 5 we describe more generally when and why there are corrections. First, however, we need to introduce two quantities that will characterize when the naïve QES prescription receives these large corrections.

Smooth min-and max-entropies
While the naïve QES prescription compares only von Neumann entropies to areas, we will find that a more careful prescription compares two other quantities to the area: the smooth conditional min-entropy and the smooth conditional max-entropy. These information-theoretic quantities have historically found use in "one-shot" protocols: settings in which only a single copy of a quantum state is used or transferred.
We explain these quantities now, and in all future sections refer to them heavily. We start with the simplest version, the classical min-and max-entropy, and gradually work up to what we really want, the (quantum) smooth conditional min-and max-entropy.

Non-conditional versions
To introduce the idea of one-shot entropies, it is helpful to temporarily forget about quantum mechanics and simply consider classical probability distributions.
Let us first recall the information-theoretic role of the Shannon entropy S(p) of a classical probability distribution p(x) (analogous to the von Neumann entropy in quantum mechanics). Imagine you randomly sample from a large number of copies n of the probability distribution, getting outcomes {x i }. You, Alice, now want to communicate those outcomes to your friend Bob.
How much information do you need to send to Bob to do this? To always be successful, for any {x_i}, you need to send at least $n \log_2 d$ bits, where d is the number of values x can take with nonzero probability. However, if you only insist that the communication succeed with high probability (i.e. succeed for a variety of possible outcomes {x_i} that collectively have probability p > 1 − ε for some small ε), the task becomes much easier. One can show that, at leading order for large n, you only need to send $nS(p)/\ln(2)$ bits. Essentially, this comes from the law of large numbers ensuring that 'typical' samples from many copies of the distribution have a probability $p(\{x_i\})$ such that⁸

$$-\ln p(\{x_i\}) = -n \sum_x p(x) \ln p(x) + o(n) = n\,S(p) + o(n)\,. \tag{3.1}$$

Hence, you and Bob simply need to agree on a code, in which the $nS(p)/\ln(2)$ bits you send tell Bob which of the $e^{nS(p)+o(n)}$ "typical strings" you sampled. The story in quantum mechanics is very similar: given any density matrix ρ, we can project $\rho^{\otimes n}$ into a 'code subspace', while only changing the state a small amount. This code subspace is just built out of products of states in the Schmidt decomposition of ρ that have typical entropy, as in the classical case. Such states dominate the Schmidt decomposition of $\rho^{\otimes n}$ at large n.
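The law-of-large-numbers concentration in (3.1) is easy to check by direct sampling; the following is a small illustration of our own, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.5, 0.25, 0.125, 0.125])
S = -np.sum(p * np.log(p))             # Shannon entropy in nats

# Sample a long string from n independent copies of the distribution
n = 100_000
samples = rng.choice(len(p), size=n, p=p)
log_prob = np.sum(np.log(p[samples]))  # ln of the probability of this exact string

# Concentration: -ln p({x_i}) = n S(p) + o(n)
assert abs(-log_prob / n - S) < 0.02
```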
The number of qubits needed for the code subspace grows, in the limit of large n, as nS(ρ)/ ln(2), where S(ρ) is the von Neumann entropy. If Alice has a pure state randomly sampled from ρ ⊗n , she can therefore communicate that state to Bob with high success probability, just by sending nS(ρ)/ ln(2) qubits.
However, both in classical probability and in quantum mechanics, we often (perhaps even typically) encounter situations where we don't have a large number of copies of a single density matrix or distribution. Instead, we only have a single state or distribution, which may still be very large in size. An example, of course, is holography. In the limit G → 0, the boundary Hilbert space dimension blows up exponentially, but this does not mean we have a large number of independent copies of the same state.
In this 'one-shot' setting, the von Neumann entropy does not have an important operational role.⁹ It is therefore somewhat surprising that the von Neumann entropy has been playing such a central role in holography, for example in determining whether entanglement wedge reconstruction is possible! As we shall see, the resolution is that the real quantities that are important in holography are smooth max- and min-entropies, which do have a natural operational interpretation in one-shot quantum Shannon theory. It just so happens that these 'one-shot entropies' have been approximately equal to the von Neumann entropy, in most of the situations that have been considered in the literature until now.
Suppose we consider the same task as above (sending the outcome of sampling a probability distribution from Alice to Bob), but now we only sample from a single copy of the distribution. How many bits do we need to send to communicate the outcome with high probability? We need to be able to send a distinct message for each outcome that we want to be successfully communicated, and our success probability is maximized by choosing the outcomes with the highest probability of occurring. So the number of bits that need to be sent is $\log_2 N(\varepsilon)$, where $N(\varepsilon)$ is the smallest integer such that

$$\sum_{i=1}^{N(\varepsilon)} p_i \geq 1 - \varepsilon\,, \tag{3.2}$$

with the probabilities $p_i$ ordered from largest to smallest. Again, there is an obvious quantum mechanical generalization, which gives the minimum number of qubits needed to send a quantum state, sampled from a single copy of a density matrix ρ, from Alice to Bob. This is given by

$$H^\varepsilon_{\max}(\rho) = \min_{\tilde\rho\,:\,\frac{1}{2}\|\tilde\rho - \rho\|_1 \leq \varepsilon} H_0(\tilde\rho)\,. \tag{3.3}$$

Let's unpack this for a moment. We first defined the Rényi 0-entropy (also known as the Hartley entropy) as

$$H_0(\rho) = \ln \operatorname{rank}(\rho)\,, \tag{3.4}$$

and then we 'smoothed' this quantity by minimizing it over all $\tilde\rho$ close to ρ (which in this case just meant throwing away small eigenvalues). We measured this distance with the trace distance, or Schatten 1-norm, $\|X\|_1 = \operatorname{tr}\sqrt{X^\dagger X}$. In fact, (3.3) is the original definition of the smooth max-entropy [7]. It turns out, however, that $H_0(\tilde\rho)$ can be replaced [19] by the Rényi entropy

$$H_\alpha(\rho) = \frac{1}{1-\alpha} \ln \operatorname{tr}(\rho^\alpha) \tag{3.5}$$

for any α < 1, while only changing the smooth entropy by a small amount. Specifically, the smoothed versions differ by at most $O(\ln(1/\varepsilon))$. As we shall see below, $H_{1/2}(\rho)$ generalises better to conditional entropies. It is therefore conventionally used in the modern definition of the smooth max-entropy [20],

$$H^\varepsilon_{\max}(\rho) = \inf_{\tilde\rho \in B_\varepsilon(\rho)} H_{1/2}(\tilde\rho)\,, \tag{3.7}$$

where we are taking an infimum over all states $\tilde\rho$ within an ε-ball $B_\varepsilon(\rho)$ of ρ.¹⁰ In summary, the number of qubits needed to send Bob your quantum state with high fidelity, if you only sample the distribution one time, is the smooth max-entropy (3.7), up to the factor of ln(2).
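The interchangeability of $H_0$ and $H_{1/2}$ after smoothing can be illustrated numerically. The sketch below is our own, using a crude eigenvalue-truncation smoothing rather than an optimal one, and checks that the two quantities agree well within an $O(\ln(1/\varepsilon))$ tolerance.

```python
import numpy as np

def h0(evals):
    """Hartley entropy: ln of the rank."""
    return np.log(np.count_nonzero(evals > 0))

def h_half(evals):
    """Renyi 1/2-entropy: 2 ln tr(rho^{1/2})."""
    evals = evals[evals > 0]
    return 2 * np.log(np.sum(np.sqrt(evals)))

def truncate(evals, eps):
    """Keep the largest eigenvalues carrying weight >= 1 - eps, renormalized:
    a simple stand-in for optimal smoothing."""
    evals = np.sort(evals)[::-1]
    k = np.searchsorted(np.cumsum(evals), 1 - eps) + 1
    kept = evals[:k]
    return kept / kept.sum()

rng = np.random.default_rng(2)
evals = rng.exponential(size=4096)
evals /= evals.sum()          # a non-flat toy spectrum
eps = 0.01

trunc = truncate(evals, eps)
# H_{1/2} <= H_0 always, and after smoothing they are close
assert h_half(trunc) <= h0(trunc) + 1e-9
assert abs(h0(trunc) - h_half(trunc)) < 5 * np.log(1 / eps)
```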
The smooth max-entropy is always greater than or equal to the von Neumann entropy; sending many samples from the distribution can only improve the efficiency of the communication rate.
We can also define a complementary quantity, the smooth min-entropy, as

$$H^\varepsilon_{\min}(\rho) = \sup_{\tilde\rho \in B_\varepsilon(\rho)} H_\infty(\tilde\rho)\,, \qquad H_\infty(\rho) = -\ln \|\rho\|_\infty\,, \tag{3.8}$$

where $\|\rho\|_\infty$ is the largest eigenvalue of ρ. Again, $H_\infty(\tilde\rho)$ could be replaced by $H_\alpha(\tilde\rho)$ for any α > 1 while changing the definition by at most $O(\ln(1/\varepsilon))$. Its operational interpretation is less intuitive than that of the smooth max-entropy, so we motivate it simply by its relationship to the conditional max-entropy, as we'll explain. Note that the smooth min-entropy is always less than or equal to the von Neumann entropy. Together, these two quantities establish upper and lower bounds on the confidence interval for the values of the (non-negligible) eigenvalues of ρ. The smooth max-entropy encodes the size of the smallest eigenvalues in the density matrix (which cannot be thrown away with small error), while the smooth min-entropy captures the size of the largest eigenvalues (that cannot be thrown away).
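The effect of smoothing on the min-entropy is easiest to see with a single anomalously large eigenvalue carrying negligible total weight. This is a toy example of our own, with a deliberately crude smoothing (simply dropping the outlier).

```python
import numpy as np

def h_inf(evals):
    """Min-entropy H_infinity(rho) = -ln ||rho||_inf (spectrum renormalized)."""
    return -np.log(np.max(evals) / np.sum(evals))

d, eps = 1024, 0.01
evals = np.full(d + 1, 0.0)
evals[0] = 0.005               # a single large outlier eigenvalue...
evals[1:] = (1 - 0.005) / d    # ...on top of a flat spectrum

# Unsmoothed: dominated by the outlier
assert abs(h_inf(evals) + np.log(0.005)) < 1e-9

# Smoothed: the outlier carries weight 0.005 < eps, so it can be removed,
# and the min-entropy jumps to ~ln d, the 'typical' eigenvalue scale
smoothed = evals[1:]
assert h_inf(smoothed) > np.log(d) - 0.1
```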
If the spectrum is close to flat (i.e. is dominated by a small range of eigenvalues) then the smooth min- and max-entropies will be close to the von Neumann entropy (which characterizes the average (log-)eigenvalue). In particular, thanks to the law of large numbers, this happens at leading order in n when you take a large number of copies $\rho^{\otimes n}$ of a state ρ. This explains the importance of the von Neumann entropy in traditional asymptotic quantum Shannon theory, which deals with exactly this limit.
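For tensor powers this convergence can be checked exactly: for a qubit ρ = diag(q, 1 − q), the spectrum of $\rho^{\otimes n}$ is binomial, so the per-copy smooth min- and max-entropies can be computed without ever building the $2^n$-dimensional state. The sketch is our own, with simple truncation smoothing.

```python
import numpy as np

# Spectrum of rho^{(x) n} for rho = diag(q, 1-q): eigenvalue q^k (1-q)^(n-k)
# with multiplicity C(n, k)
q, n, eps = 0.3, 2000, 1e-3
S = -(q * np.log(q) + (1 - q) * np.log(1 - q))   # von Neumann entropy per copy

k = np.arange(n + 1)
log_fact = np.concatenate([[0.0], np.cumsum(np.log(np.arange(1, n + 1)))])
log_choose = log_fact[n] - log_fact[k] - log_fact[n - k]
log_eval = k * np.log(q) + (n - k) * np.log(1 - q)   # ln of each eigenvalue
weight = np.exp(log_choose + log_eval)               # binomial pmf

# Sort eigenvalues from largest to smallest and accumulate their weight
order = np.argsort(log_eval)[::-1]
cum = np.cumsum(weight[order])

# H^eps_max ~ ln(number of eigenvalues needed to reach weight 1 - eps)
idx = np.searchsorted(cum, 1 - eps)
H_max = np.logaddexp.reduce(log_choose[order][: idx + 1])

# H^eps_min ~ -ln(largest eigenvalue left after removing eps of top weight)
jdx = np.searchsorted(cum, eps)
H_min = -log_eval[order][jdx]

# Per copy, both converge to the von Neumann entropy as n grows
assert H_min / n <= S <= H_max / n
assert abs(H_max / n - S) < 0.05 and abs(H_min / n - S) < 0.05
```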
It is also what has led to the success (so far) of the naïve QES prescription: it has been used for bulk states with an (approximately) flat spectrum, where the smooth min- and max-entropies are roughly the same as the von Neumann entropy.
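To see this flattening quantitatively, we can compute smoothed min- and max-entropy rates for n independent copies of a simple two-level state and compare them to the von Neumann entropy. This is an illustrative sketch with our own discretization: H_max is smoothed by discarding an ε-tail of the smallest eigenvalues, and H_min by discarding an ε mass of the largest ones:

```python
import numpy as np
from math import comb, log

p, n, eps = 0.7, 200, 0.01
H_vn = -(p * log(p) + (1 - p) * log(1 - p))   # per-copy von Neumann entropy

# Spectrum of rho^(x n) for rho = diag(p, 1-p): eigenvalue p^k (1-p)^(n-k)
# occurs with multiplicity C(n, k).
ks = np.arange(n + 1)
vals = p ** ks * (1 - p) ** (n - ks)
counts = np.array([comb(n, k) for k in ks], dtype=float)

order = np.argsort(vals)[::-1]                # largest eigenvalues first
vals, counts = vals[order], counts[order]
cum = np.cumsum(counts * vals)                # cumulative probability mass

# Smooth max-entropy rate: ln(# eigenvalues kept)/n after discarding an
# eps-tail of the smallest eigenvalues.
idx = np.searchsorted(cum, 1 - eps)
rate_max = log(counts[: idx + 1].sum()) / n

# Smooth min-entropy rate: -ln(largest kept eigenvalue)/n after discarding
# an eps mass of the largest eigenvalues.
idx_min = np.searchsorted(cum, eps)
rate_min = -log(vals[idx_min]) / n
```

Without smoothing, the max- and min-entropy rates would be stuck at ln 2 and −ln 0.7 for every n; with smoothing, both rates approach the von Neumann entropy as n grows, as the main text describes.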

Conditional versions
The most general quantities we will need are the smooth conditional min- and max-entropies, which generalize the conditional von Neumann entropy. Unfortunately, the definition of these quantities is somewhat more technical, and somewhat less intuitive, than their unconditional counterparts.
The operational spirit of these quantities is the following. Let us return to the example in which Alice is trying to send a quantum state on A to Bob. However, now the state is sampled from a density matrix ρ AB , where subsystem B is already held by Bob and the two subsystems may be entangled. Can this entanglement help Alice send her part of the state to Bob? It can! For a particular version of this task, called quantum state merging [21], the number of qubits that need to be sent from Alice to Bob is the smooth conditional max-entropy H ε max (A|B), which is generally less than H ε max (A). We discuss quantum state merging in detail in Section 6.
Here are the technical definitions. The conditional von Neumann entropy, which the conditional min- and max-entropy generalize, is normally defined as
$$S(A|B) = S(AB) - S(B)\,. \tag{3.9}$$
However, this definition does not generalize well to smooth entropies. Instead, our starting point will be a definition of the conditional entropy in terms of the relative entropy,
$$S(A|B) = -\inf_{\sigma_B} D\!\left(\rho_{AB}\,\big\|\, 1_A\otimes\sigma_B\right). \tag{3.10}$$
To see that this is equivalent to (3.9), note that
$$D\!\left(\rho_{AB}\,\big\|\,1_A\otimes\sigma_B\right) = -S(AB) - \operatorname{tr}(\rho_B\ln\sigma_B) \geq -S(AB) + S(B)\,,$$
with equality if σ_B = ρ_B.
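The equivalence of (3.9) and (3.10) is easy to check numerically. A small sketch (assuming numpy and scipy; random_density and rel_entropy are our own illustrative helpers):

```python
import numpy as np
from scipy.linalg import logm

def random_density(d, rng):
    m = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = m @ m.conj().T
    return rho / np.trace(rho).real

def vn_entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log(ev)).sum())

def rel_entropy(rho, sigma):
    # D(rho||sigma) = tr[rho (ln rho - ln sigma)], for full-rank arguments
    return float(np.trace(rho @ (logm(rho) - logm(sigma))).real)

rng = np.random.default_rng(0)
dA, dB = 2, 3
rho_AB = random_density(dA * dB, rng)
rho_B = np.trace(rho_AB.reshape(dA, dB, dA, dB), axis1=0, axis2=2)

S_cond = vn_entropy(rho_AB) - vn_entropy(rho_B)                 # (3.9)
D_at_rhoB = rel_entropy(rho_AB, np.kron(np.eye(dA), rho_B))     # attains the inf
D_other = rel_entropy(rho_AB, np.kron(np.eye(dA), random_density(dB, rng)))
```

The value at σ_B = ρ_B reproduces −S(A|B), while any other σ_B gives a larger relative entropy, consistent with the infimum in (3.10).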

Smooth conditional min-entropy
To generalize (3.10) to a smooth conditional min-entropy, we use the fact that there is a unique quantum generalization of the classical Rényi max-divergence $D_\infty(\rho\|\sigma)$ which satisfies the data-processing inequality and additivity. This is given by
$$D_\infty(\rho\|\sigma) = \inf\left\{\lambda : \rho \leq e^\lambda\sigma\right\}.$$
In words, the quantum max-divergence of ρ relative to σ is the smallest number λ such that $e^\lambda\sigma - \rho$ is positive semi-definite. We then define the conditional min-entropy as
$$H_{\min}(A|B) = -\inf_{\sigma_B} D_\infty\!\left(\rho_{AB}\,\big\|\,1_A\otimes\sigma_B\right), \tag{3.13}$$
and the smooth conditional min-entropy as
$$H^\varepsilon_{\min}(A|B) = \sup_{\tilde\rho\in B_\varepsilon(\rho)} H_{\min}(A|B)_{\tilde\rho}\,. \tag{3.14}$$
We can gain some intuition by rewriting the conditional min-entropy as [22]
$$H_{\min}(A|B) = -\ln\Big(|A|\,\sup_{\Phi_B} F\big((1_A\otimes\Phi_B)(\rho_{AB})\,,\;\tau_{AA'}\big)\Big)\,,$$
where F is the fidelity $F(\rho,\tau) = \big(\operatorname{tr}\sqrt{\rho^{1/2}\tau\rho^{1/2}}\big)^2$, $\tau_{AA'}$ is a maximally entangled state on two copies of A, and $\Phi_B$ is a completely positive trace-preserving map from B to A'. This illustrates that $H_{\min}(A|B)$, in a sense, quantifies how close ρ_AB is to a maximally entangled state, equaling its minimum −ln|A| when A is maximally entangled with B, and its maximum ln|A| when it is completely decoupled.
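The two extremes just mentioned can be checked directly from the max-divergence definition. A numerical sketch (assuming numpy; we evaluate (3.13) at σ_B = 1/2, which is the known optimizer for these particular two-qubit states, so the values below are exact rather than mere bounds):

```python
import numpy as np

def d_max(rho, sigma):
    # D_inf(rho||sigma): ln of the largest eigenvalue of sigma^{-1/2} rho sigma^{-1/2}
    w, v = np.linalg.eigh(sigma)
    s = v @ np.diag(w ** -0.5) @ v.conj().T
    return float(np.log(np.linalg.eigvalsh(s @ rho @ s).max()))

phi = np.zeros(4); phi[0] = phi[3] = 2 ** -0.5       # (|00> + |11>)/sqrt(2)
rho_ent = np.outer(phi, phi)                          # maximally entangled on AB
rho_dec = np.eye(4) / 4                               # maximally mixed, decoupled

sigma_B = np.eye(2) / 2   # known optimizer of (3.13) for both of these states
H_min_ent = -d_max(rho_ent, np.kron(np.eye(2), sigma_B))   # -> -ln 2 = -ln|A|
H_min_dec = -d_max(rho_dec, np.kron(np.eye(2), sigma_B))   # -> +ln 2 = +ln|A|
```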

Smooth conditional max-entropy
The smooth conditional max-entropy is most cleanly defined as a complement to the smooth conditional min-entropy. Recall that, for any purification $|\rho\rangle_{ABC}$ of ρ_AB, we have
$$S(A|B) = -S(A|C)\,. \tag{3.16}$$
A generalization of this equality will define the smooth conditional max-entropy. One can show that
$$-H_{\min}(A|C) = \ln|A| + \sup_{\sigma_B} \ln F\!\left(\rho_{AB}\,,\;\tfrac{1_A}{|A|}\otimes\sigma_B\right).$$
This right-hand side is a natural candidate definition for $H_{\max}(A|B)$. We can test this by considering the special case where subsystem B is trivial (i.e. the state on AC is pure). We then have
$$-H_{\min}(A|C) = 2\ln\operatorname{tr}\rho_A^{1/2} = H_{1/2}(A)\,.$$
Recall that we previously used $H_{1/2}(A)$ in our formal definition of the smooth max-entropy. It is indeed therefore natural to define the smooth conditional max-entropy $H^\varepsilon_{\max}(A|B)$ as
$$H^\varepsilon_{\max}(A|B) = \inf_{\tilde\rho\in B_\varepsilon(\rho)}\,\sup_{\sigma_B} \ln F\!\left(\tilde\rho_{AB}\,,\;1_A\otimes\sigma_B\right).$$
This definition provides some intuition for the smooth conditional max-entropy, as quantifying, in a sense, how close ρ_AB is to a decoupled state 1_A/|A| ⊗ σ_B, equaling ln|A| when A is completely decoupled from B, and −ln|A| when it is maximally entangled with B.
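The two extremes of this fidelity-based quantity can again be checked numerically. A sketch (assuming numpy; as before we evaluate at σ_B = 1/2, the known optimizer for these two states, so no optimization over σ_B is needed):

```python
import numpy as np

def fidelity(rho, tau):
    # F(rho, tau) = (tr sqrt(sqrt(rho) tau sqrt(rho)))^2, via eigendecompositions
    w, v = np.linalg.eigh(rho)
    r = (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T     # sqrt(rho)
    ev = np.clip(np.linalg.eigvalsh(r @ tau @ r), 0, None)
    return float(np.sum(np.sqrt(ev)) ** 2)

phi = np.zeros(4); phi[0] = phi[3] = 2 ** -0.5
rho_ent = np.outer(phi, phi)          # maximally entangled on AB
rho_dec = np.eye(4) / 4               # decoupled: 1_A/|A| x (1_B/2)

one_sigma = np.kron(np.eye(2), np.eye(2) / 2)     # 1_A x sigma_B, sigma_B = 1/2
H_max_ent = np.log(fidelity(rho_ent, one_sigma))  # -> -ln 2
H_max_dec = np.log(fidelity(rho_dec, one_sigma))  # -> +ln 2
```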

Replica trick calculations
The replica trick is a standard technique for computing the von Neumann entropy S(B), based on interpreting tr(ρ_B^n) as a certain observable in n copies (or replicas) of the system [23]. We first illustrate the standard technique for computing tr(ρ_B^n) holographically (using the saddle point approximation and analytically continuing the dominant saddle), which is well known to give the naïve QES prescription [9, 24-27]. We then do a more careful calculation, where we analytically continue a sum over an entire family of saddles. To make this calculation analytically tractable, we make use of the fixed-area states of [28, 29]. This more careful calculation gives results that differ from the naïve QES answer and avoid any contradictions.

Replica trick in holography
Given a state ρ_{B B̄}, the goal is to compute
$$S(B) = -\operatorname{tr}(\rho_B\ln\rho_B) = -\lim_{n\to 1}\partial_n \ln\operatorname{tr}(\rho_B^n)\,. \tag{4.1}$$
The last equality is useful because tr(ρ_B^n) can be computed using a path integral. Schematically, the Euclidean path integral preparing ρ_{B B̄} is depicted in (4.2). The final picture is just a more schematic version of the first. The orange dots, with index i, label a basis of states, prepared by different boundary conditions, that are summed over. This represents the fact that a general density matrix is not just a product of a ket and a bra, but a sum of such products. In future diagrams, we suppress this index and the sum.
To construct the reduced density matrix ρ_B, we glue together B in the bras and kets, as in (4.3). Then the path integral for e.g. tr(ρ_B^3) involves gluing together the different copies of B "cyclically", as in (4.4). These boundary conditions can be applied with a "twist operator" τ, which acts on n copies of the B Hilbert space to cyclically permute the state on each copy. We call this n-replica geometry M_n. By evaluating the path integral on M_n for arbitrary n, and analytically continuing the answer to the limit n → 1, we can compute the entanglement entropy. We can map this boundary path integral to a bulk computation using the AdS/CFT dictionary,
$$\operatorname{tr}(\rho_B^n) = \frac{Z_{B,n}}{(Z_{B,1})^n}\,,$$
where Z_{B,n} is the bulk partition function, defined by integrating over all bulk geometries with boundary M_n. In the semiclassical limit, this can be approximated by a sum over classical saddles. Crucially, the saddle-point geometries are not simply n copies of the original geometry glued together. They are whatever the equations of motion provide, given that boundary data.
Partially for this reason, and partially because the number of saddles depends on n, this sum over saddles is generally too difficult to evaluate, let alone analytically continue. So historically, the following trick was used [24]. Assume that a replica-symmetric configuration dominates the sum, and that all other contributions to the path integral can be ignored, such that
$$Z_{B,n} \approx e^{-I_{\mathrm{grav}}[g_{s,n}]}\, Z^{\mathrm{mat}}_{B,n}[g_{s,n}]\,,$$
where g_{s,n} is the saddle-point metric, I_grav[g_{s,n}] is the gravitational action, and Z^mat_{B,n}[g_{s,n}] is the matter partition function on this semiclassical background. We shall call this the Lewkowycz-Maldacena (LM) assumption. Because the saddle is replica-symmetric, we can equivalently consider the quotient of the saddle-point geometry by the Z_n replica symmetry. This is also a solution to the equations of motion, except at the fixed points of the Z_n action, where there is a conical singularity with opening angle 2π/n.
We can now analytically continue the quotiented geometry to non-integer values of n. In particular, in the limit n → 1, the geometry approaches the original unbackreacted geometry, with a weak conical singularity at the Z n fixed-points. The entanglement entropy ends up being the generalized entropy of the Z n fixed-points [25], which is forced to be a quantum extremal surface by the equations of motion [9,26,27]. The dominant semiclassical saddle is the one where the QES has the smallest generalized entropy, leading to the naïve QES prescription.
Since this traditional derivation reaches a conclusion that we have shown is contradictory, the obvious next step is to do the replica trick more carefully, without the weak link of the LM assumption. This requires us to introduce some other simplifying trick in order to analytically continue the sum over saddles. This trick will involve the use of fixed-area states, which we now explain.

Fixed-area states and their use
The fixed-area states of [28,29] are (approximate) eigenstates of certain area operators. To define such a state, consider the Euclidean path integral that prepares a particular bulk geometry, then insert into that path integral a delta function that fixes the area of some gauge-invariantly defined surfaces. 11 We might physically prepare such a state by measuring the area of these surfaces. Saddle-points of this restricted path integral must satisfy the bulk equations of motion everywhere except at the fixed-area surfaces, where they may have a conical singularity. This is because the conical deficit angle is conjugate to the area operator and is therefore undetermined in fixed-area states, due to the uncertainty principle. 12

Replica trick for fixed-area states
Consider a state ρ_{B B̄} with two fixed-area surfaces, γ_1 and γ_2. We depict its path integral as (4.8). Fixing the areas in the initial state is a boundary condition, and so also fixes the areas of those surfaces in path integrals featuring any number of replicas of that geometry. This is what makes the sum over geometries in Z_{B,n} doable.
Indeed, we can form geometries that satisfy all boundary conditions of Z_{B,n} (asymptotic boundary M_n, plus fixed areas of all fixed-area surfaces) simply by gluing together n copies of the original n = 1 bulk around the fixed-area surfaces.
Since we glue the boundary region B together in the bra and the ket path integral to make the density matrix ρ_B, the neighbouring bulk region b (shown in orange) is also always glued together, as in (4.9). Similarly, because we glue the boundary regions B together cyclically, the bulk regions b get glued together cyclically. However, because we can have conical singularities at γ_1 and γ_2, the different copies of the region b' can be glued together using an arbitrary permutation π ∈ S_n. To evaluate the full path integral, we sum over all saddles, and hence sum over all permutations π; see (4.10) for an example. Since the replica geometry consists of n copies of the original unbackreacted geometry, the gravitational action away from the fixed-area surfaces cancels between the numerator Z_{B,n} and the denominator Z_{B,1}^n. The only contribution to the gravitational action that doesn't cancel out is the contribution to the Einstein-Hilbert action from the conical singularities, which is different in the numerator and the denominator. Each conical singularity gives a contribution equaling (φ − 2π)A/(8πG), where φ is the opening angle of the conical singularity.
If the b' regions are glued together using a permutation π, the full contribution to the action from the conical singularities in the replica geometry is therefore
$$-I_{\text{con}} = \frac{\big(2\pi C(\pi) - n\varphi_1\big)A_1}{8\pi G} + \frac{\big(2\pi C(\tau^{-1}\circ\pi) - n\varphi_2\big)A_2}{8\pi G}\,, \tag{4.11}$$
where C(g) is the number of cycles in the permutation g, and φ_1, φ_2 are the conical singularity angles associated to γ_1, γ_2 in the unreplicated geometry. After normalization, the dependence on φ cancels. Including the matter partition function, we are then left with
$$\frac{Z_{B,n}}{(Z_{B,1})^n} = \sum_{\pi\in S_n} e^{\frac{A_1}{4G}\left(C(\pi)-n\right)}\, e^{\frac{A_2}{4G}\left(C(\tau^{-1}\circ\pi)-n\right)}\, \operatorname{tr}\!\left(\rho_{b'b}^{\otimes n}\,\tau_b\,\pi_{b'}\right). \tag{4.12}$$
This further simplifies because we do not need to sum over all of S_n. Any permutation that does not maximize C(π) + C(τ^{−1}∘π) corresponds to an action that is subleading by factors of the areas. The areas A_1 and A_2 are IR divergent, so those permutations are infinitely suppressed. The remaining permutations lie on the geodesic in the Cayley graph (i.e. the shortest path in permutation space, where each step is a transposition) connecting τ and the identity. These are the so-called "non-crossing" permutations NC_n, which all satisfy C(π) + C(τ^{−1}∘π) = n + 1 (see e.g. [32]). Without the tr(ρ_{b'b}^{⊗n} τ_b π_{b'}) factor, we could evaluate this sum explicitly. The number of non-crossing permutations with C(τ^{−1}∘π) = k is the Narayana number N(n, k). With that, we could organize the terms into a sum over k and get an analytic answer in terms of hypergeometric functions.
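The combinatorial facts quoted above (non-crossing permutations maximize C(π) + C(τ^{−1}∘π) at n + 1, are counted by the Catalan number, and are refined by the Narayana numbers) can be checked by brute force for small n. An illustrative sketch:

```python
from itertools import permutations
from math import comb

n = 5
tau = tuple(range(1, n)) + (0,)       # the cyclic permutation i -> i + 1 (mod n)

def compose(a, b):                    # (a o b)(i) = a[b[i]]
    return tuple(a[b[i]] for i in range(n))

def inverse(a):
    inv = [0] * n
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

def n_cycles(a):
    seen, c = set(), 0
    for i in range(n):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = a[j]
    return c

tau_inv = inverse(tau)
stats = [(n_cycles(pi), n_cycles(compose(tau_inv, pi)))
         for pi in permutations(range(n))]
max_total = max(c1 + c2 for c1, c2 in stats)                   # expect n + 1
non_crossing = [(c1, c2) for c1, c2 in stats if c1 + c2 == n + 1]
catalan = comb(2 * n, n) // (n + 1)
narayana = [sum(1 for c1, c2 in non_crossing if c2 == k) for k in range(1, n + 1)]
```

For n = 5 this finds 42 = Catalan(5) non-crossing permutations, refined as the Narayana numbers (1, 10, 20, 10, 1) by the cycle count C(τ^{−1}∘π).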
The bulk term interferes because it depends not just on the number of cycles C(τ −1 • π), but also on the number of elements per cycle. Fortunately, there is a way, presented in [9], to reorganize this sum into one over the number of elements per cycle.

Resolvent method
To make the calculation tractable, we will need to assume that the entropy of b is small and can be ignored, 13 so that (4.12) becomes
$$\operatorname{tr}(\rho_B^n) = e^{\frac{A_2}{4G}(1-n)} \sum_{\pi\in NC_n} e^{\frac{A_2-A_1}{4G}\left(C(\tau^{-1}\circ\pi)-1\right)}\, \operatorname{tr}\!\left(\rho_{b'}^{\otimes n}\,\pi_{b'}\right). \tag{4.13}$$
Here, we have used the fact that C(π) + C(τ^{−1}∘π) = n + 1 for non-crossing permutations to rewrite the formula without any dependence on C(π). To extract the eigenvalues of ρ_B from this formula, we introduce the resolvent
$$R(\lambda) \equiv \operatorname{tr}\!\left(\frac{1}{\lambda - \rho_B}\right). \tag{4.14}$$
This contains all the data about the eigenvalues of ρ_B. For example, the density of eigenvalues is
$$D_B(\lambda) = -\frac{1}{\pi}\lim_{\epsilon\to 0^+}\operatorname{Im} R(\lambda + i\epsilon)\,,$$
where R is the trace of the resolvent. We compute this as follows, heavily using the fact that ρ_{B B̄} is a fixed-area state. First, Taylor expand (4.14) around ρ_B = 0 to obtain
$$R(\lambda) = \sum_{n=0}^{\infty} \frac{\operatorname{tr}(\rho_B^n)}{\lambda^{n+1}}\,.$$
We can visualize this as (4.17), where each dashed line comes with a factor of 1/λ. Then substitute equation (4.9) for ρ_B^n, as in (4.18). Taking the trace of this quantity (visualized as simply connecting the dangling blue arrows into a closed loop) gives the equation (4.19). We can reorganize these sums in a convenient way, to get a Schwinger-Dyson equation, (4.20). On the right hand side of (4.20), the second term sums all non-crossing geometries in which the first replica of b' is glued to no other replicas. The third term sums all non-crossing geometries in which the first replica of b' is glued to exactly one other replica. And so on.
We now formally explain the diagrammatic expansion (4.20) in terms of equations. Starting with (4.19), decompose the sums into a sum over the number of elements m in the cycle of π that includes the first element (the "primary cycle"), as well as the numbers of elements n_i between the i-th and (i+1)-th elements of the primary cycle, as in (4.21). The primary cycle is always cyclic, but the other permutations π_i may not be. We note that
$$n = m + \sum_i n_i \qquad\text{and}\qquad C(\pi) = 1 + \sum_i C(\pi_i)\,.$$
Also, the bulk factor factorizes across the primary cycle and the π_i, as in (4.22). Plugging these into the formula (4.19) for the resolvent gives (4.23). The part in parentheses is R itself, from (4.19), assuming rank(ρ_B) = e^{A_2/4G}. 14 Therefore, we arrive at the recursion relation (4.24) for the resolvent. We are now ready to work out some specific examples.
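As a sanity check on the resolvent technology, the following sketch verifies numerically, for a random density matrix, that the moment expansion converges to tr(1/(λ − ρ)) and that the imaginary part just above the real axis recovers the eigenvalue density (which integrates to the rank). The matrix sizes and tolerances are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 60
m = rng.standard_normal((d, 4 * d))
rho = m @ m.T
rho /= np.trace(rho)                        # a random (Wishart) density matrix
eigs = np.linalg.eigvalsh(rho)

# Moment expansion of the resolvent, R = sum_n tr(rho^n)/lambda^(n+1),
# converges for |lambda| > max eigenvalue:
lam0 = 5 * eigs.max()
moments = sum(np.sum(eigs ** k) / lam0 ** (k + 1) for k in range(60))
R_direct = np.sum(1.0 / (lam0 - eigs))      # tr 1/(lambda - rho) directly

# Eigenvalue density from just above the real axis: D = -Im R(l + i eps)/pi.
eps_im = 2e-4
grid = np.linspace(0.0, 1.5 * eigs.max(), 4000)
D = np.array([-np.imag(np.sum(1.0 / (l + 1j * eps_im - eigs))) / np.pi
              for l in grid])
total = float(np.sum(D) * (grid[1] - grid[0]))   # should integrate to ~ d
```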

Example 1: Mixed states Setup
Consider the setup with a dustball or a black hole, from Sections 1 or 2, depicted in Figures 1 and 4 respectively. We will simultaneously compute S(B) in both cases, first modifying the setups slightly by fixing the areas of γ_1 and γ_2 to A_1 and A_2. All parameters we mention below apply equally well to both: e.g. ρ_{b'} is the state of either the dustball or the black hole. The same calculations also give the entropy of the black hole in our third contradiction from Section 2. A particularly concrete example, where the full non-perturbative path integral can be evaluated and agrees with the answer that we find below, is the JT gravity plus end-of-the-world (EOW) brane model of [9]. In this case, working with fixed-area states is equivalent to working in the microcanonical ensemble (note that A_2 here is the horizon area of the black hole, while A_1 = 0), and the only bulk degrees of freedom are on the EOW brane in region b', which is the assumption that we needed above to make the resolvent calculation possible.
Consider two bulk states: ρ_{b',1} = |ψ⟩⟨ψ|, which is pure, and ρ_{b',2}, an arbitrary orthogonal mixed state of entropy S. We will assume that the state ρ_{b',2} has a flat spectrum, and hence is perfectly compressible. We will compute the entropy S(B) for their mixture,
$$\rho_{b'} = p\,|\psi\rangle\langle\psi| + (1-p)\,\rho_{b',2}\,. \tag{4.25}$$
To keep the example as simple as possible, we assume
$$p \gg (1-p)\,e^{-S}\,. \tag{4.26}$$
This ensures that the bulk density matrix eigenvalues p and (1 − p)e^{−S} are separated by a large multiplicative factor. 15 The von Neumann, min-, and max-entropies of this state are, up to O(1) (e.g. Shannon) corrections,
$$S(b') \approx (1-p)\,S\,, \qquad H^\varepsilon_{\min}(b') \approx -\ln p\,, \qquad H^\varepsilon_{\max}(b') \approx S\,. \tag{4.27}$$
The naïve QES prescription says
$$S(B) = \min\left(\frac{A_1}{4G} + S(b')\,,\;\frac{A_2}{4G}\right). \tag{4.28}$$
We will see that the correct answer, up to O(1) corrections, is
$$S(B) = p\,\frac{A_1}{4G} + (1-p)\min\left(\frac{A_1}{4G} + S\,,\;\frac{A_2}{4G}\right). \tag{4.29}$$

Calculation

Plug (4.25) into (4.24) and evaluate the two geometric sums to arrive at a cubic equation, (4.30), whose roots give the function R(λ) that we seek. The roots of a cubic can be written analytically, but they are difficult to integrate to compute the entropy. Fortunately, we can find a simple approximate solution by using the assumption (4.26). We expand (4.30) in two different ways, which are valid at large R (and hence small λ) and small R (large λ) respectively. A full treatment, including proofs of all claims, is in Appendix A. The two expansions are as follows.
Expansion 1 is valid for sufficiently large R, and Expansion 2 for sufficiently small R. The condition (4.26) ensures there is overlap between the ranges of validity of the two expansions, implying that at least one expansion is valid for every value of R and λ. Each expansion gives a quadratic equation for the resolvent R(λ), which can be easily solved and has a single branch cut, where the eigenvalue density D_B(λ) is nonzero. Both branch cuts are within the respective regimes of validity of the corresponding expansions, and so we find two distinct sets of eigenvalues. The eigenvalues in Expansion 1 come from the ρ_{b',2} part of the state, while the eigenvalues in Expansion 2 come from the |ψ⟩⟨ψ| part.
The entropy is given by
$$S(B) = -\int d\lambda\, D_B(\lambda)\,\lambda\ln\lambda\,,$$
where we include both sets of eigenvalues in the integral. The answer depends on how the bulk entropies compare to the area difference (A_2 − A_1)/4G, and splits into three regimes.

Regime 1: S ≪ (A_2 − A_1)/4G. In this regime, the naïve QES prescription gives the right answer. Expansion 1 has a peak of eigenvalues at λ ≈ (1 − p)e^{−A_1/4G − S}, and Expansion 2 has a peak of eigenvalues at λ ≈ p e^{−A_1/4G}. Both are within the regime of validity of their expansion. See the top plot of Figure 5. Combined, these peaks give entropy
$$S(B) = \frac{A_1}{4G} + (1-p)\,S + \ldots\,, \tag{4.34}$$
where "..." represents terms subleading at large S, A_1, and A_2. This includes the Shannon entropy term −p ln p − (1 − p) ln(1 − p).

Regime 2: −ln p ≪ (A_2 − A_1)/4G ≪ S. Here there are large corrections to the naïve QES prescription. Expansion 2 describes the same peak it did in Regime 1, giving eigenvalues at λ ≈ p e^{−A_1/4G}. Expansion 1 now describes eigenvalues that have crossed the phase transition, which are therefore at λ ≈ (1 − p)e^{−A_2/4G}. Both peaks are still well-separated, and the expansions continue to be valid at the peaks. See the middle plot of Figure 5. The entropy comes out to
$$S(B) = p\,\frac{A_1}{4G} + (1-p)\,\frac{A_2}{4G} + \ldots\,, \tag{4.35}$$
and again we have dropped subleading terms, including the Shannon term. Note that this entropy is different from the naïve QES answer. While the naïve answer only cares about the relative sizes of S(b') and ∆A/4G, this answer is independent of those relative sizes! Indeed, by dialing S, we can place S(b') on either side of ∆A/4G, as we please. The entropy S(B) equals (4.35) in both cases, while the naïve QES prescription gives totally different formulas for the two cases! In both cases, the naïve QES prescription gives an answer that is larger (at leading order) than the correct answer. The naïve QES prescription failed because it treated the bulk eigenvalues in an all-or-nothing way, stubbornly refusing to acknowledge that some of the eigenvalues are much larger than the phase transition value e^{−A_2/4G}, even though many others are small enough to have crossed the phase transition.
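Following the random tensor network analogy (cf. the numerical comparison in Appendix C), we can model this fixed-area calculation by pushing the bulk spectrum through a single random tensor and check the Regime 2 behaviour numerically. This is a rough sketch with our own choice of dimensions, seed, and tolerances, taking A_1 = 0 so that e^{A_2/4G} is the bond dimension d2; the "refined" prediction below is (4.35) together with its Shannon terms:

```python
import numpy as np

rng = np.random.default_rng(7)
p, S_dim, d2 = 0.3, 30000, 60       # bulk spectrum: p (pure) + (1-p) flat, rank S_dim
q = np.concatenate([[p], np.full(S_dim, (1 - p) / S_dim)])

def entropy(ev):
    ev = ev[ev > 1e-15]
    return float(-(ev * np.log(ev)).sum())

# Push the bulk state through a d2-dimensional random tensor (A1 = 0):
samples = []
for _ in range(6):
    G = rng.standard_normal((d2, q.size)) * np.sqrt(q)
    w = np.linalg.eigvalsh(G @ G.T)
    samples.append(entropy(w / w.sum()))
S_sim = float(np.mean(samples))

S_bulk = -(p * np.log(p) + (1 - p) * np.log((1 - p) / S_dim))   # S(b')
S_naive = float(min(S_bulk, np.log(d2)))                        # naive QES, A1 = 0
# Refined answer, (4.35) plus its Shannon terms, with A1 = 0 and A2/4G = ln d2:
S_refined = float(-p * np.log(p) + (1 - p) * (np.log(d2) - np.log(1 - p)))
```

With these parameters −ln p < ln d2 < S, so we are in Regime 2: the simulated entropy tracks the refined answer and sits well below the naïve QES value.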
Regime 3: −ln p ≫ (A_2 − A_1)/4G. The naïve QES prescription is back to receiving no corrections. In this regime, Expansion 1 is never valid, while Expansion 2 describes a peak of eigenvalues that have crossed the phase transition, sitting at λ ≈ e^{−A_2/4G}. See the bottom plot of Figure 5.
We obtain an entropy
$$S(B) = \frac{A_2}{4G} + \ldots\,,$$
again letting "..." represent subleading terms. There is no Shannon term in this regime. This agrees with the answer from the naïve QES prescription because, in this regime, (4.26) forces S(b') > (A_2 − A_1)/4G, so the naïve prescription also selects the surface γ_2.

Higher Rényis
So far we have computed the von Neumann entropy in each regime, finding large corrections to the naïve holographic prescription in Regime 2. What about the higher Rényi entropies? There is a holographic way to compute them as well [33]; does that also have large corrections in some regime?
The answer is that their corrections are generally much smaller, even in Regime 2. This is straightforward to derive with the resolvent approximations we have given. For integer Rényi entropies with n > 1, this is fairly self-explanatory. These can be computed directly using n replicas without the need for any analytic continuation, and so can always be computed in the semiclassical limit using a saddle point approximation.
More interestingly, the corrections are also nonperturbatively small for non-integer Renyi entropies with n > 1 (and n < 1), so long as (n − 1) is finite in the semiclassical limit. The large corrections to the von Neumann entropy come from the limit n → 1 not commuting with the semiclassical limit G → 0, unless we keep track of nonperturbatively small corrections.

Example 2: Entangled states Setup
This next example demonstrates the role of conditional min-/max-entropy. It closely resembles the previous one, but now the dustball or black hole b is entangled with another dustball or black hole that is always in the entanglement wedge. For evaporating black holes (or their JT + EOW brane cousins), it calculates the entropy of the Hawking radiation rather than the black hole. The setups are detailed in Section 2 and the dustball version is depicted in Figure 3. Again those setups are simplified by fixing the areas of γ 1 and γ 2 to A 1 and A 2 .
We emphasize again that this setup is, quite literally, the complement of the first one. In that example, while we imagined a CFT BB in a mixed state, we could have instead imagined it purified by some reference system R. Introducing R changes nothing about that calculation. Nonetheless, it is useful because S(BR) = S(B) regardless of the makeup of R. Here we imagine R to be an identical copy of BB, with the same size dustball or black hole in its bulk dual r. For notational simplicity, we shall combine R into B, so that we are just computing S(B). Similarly in the bulk we combine r into b.
We consider the following two states of the dustballs or black holes. One is a pure, factorized state, ρ_{b'b,1}. The other is a pure, maximally entangled state, ρ_{b'b,2}, with entanglement entropy S. We will compute the entropy of their mixture,
$$\rho_{b'b} = p\,\rho_{b'b,1} + (1-p)\,\rho_{b'b,2}\,. \tag{4.38}$$
Again, we keep this example simple by assuming (4.26). The conditional von Neumann, min-, and max-entropies of this state are given in (4.39).

Figure 5: Eigenvalue density for the three regimes in Section 4.3. In Regime 1, there are two peaks of eigenvalues, each associated to one of the two states in the mixture, and each much greater than the critical value e^{−A_2/4G}. Hence the naïve QES prescription is correct. In Regime 2, one of the peaks has shifted to the critical value, while the other remained where it was, leading to large corrections to the naïve QES prescription. In Regime 3, both peaks have moved to the critical value, and the naïve prescription is valid again. Note the agreement with numerical results for the analogous random tensor network in Appendix C.
The naïve QES prescription gives its usual minimization; the correct answer, up to O(1) corrections, we will see, is (4.41).

Calculation
Rather than write out a resolvent like we did before, we will use a trick to compute the entropy in each of the three regimes. Notice that the smooth conditional min- and max-entropies (4.39) equal minus the max- and min-entropies (4.27), respectively, from Example 1. This was the general rule from Section 3: for a pure state on ABC,
$$H^\varepsilon_{\min}(A|B) = -H^\varepsilon_{\max}(A|C)\,. \tag{4.42}$$
Since the system b that we conditioned on in Example 1 was trivial, we have
$$H^\varepsilon_{\min}(b'|b) = -H^\varepsilon_{\max}(b')\,, \qquad H^\varepsilon_{\max}(b'|b) = -H^\varepsilon_{\min}(b')\,.$$
So we can compute the entropy in the three regimes as follows. Consider, for example, the regime in which both the conditional min- and max-entropy are less than ∆A/4G ≡ (A_1 − A_2)/4G. This corresponds exactly to the regime in Example 1 where both the min- and max-entropy were greater than (A_2 − A_1)/4G. So, using purity of the state on B B̄, the entropy S(B) in this regime equals the entropy computed in that regime of Example 1. Thus the entropy S(B) is completely deducible from the results of Example 1.
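The pure-state complementarity used here is easy to verify numerically for the von Neumann version of the rule, S(A|B) = −S(A|C) on a random pure tripartite state. A quick sketch (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
dA, dB, dC = 2, 3, 4
psi = rng.standard_normal((dA, dB, dC)) + 1j * rng.standard_normal((dA, dB, dC))
psi /= np.linalg.norm(psi)                    # a random pure state on ABC

def vn_entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log(ev)).sum())

rho_AB = np.einsum('abc,xyc->abxy', psi, psi.conj()).reshape(dA * dB, dA * dB)
rho_B = np.einsum('abc,ayc->by', psi, psi.conj())
rho_AC = np.einsum('abc,xbz->acxz', psi, psi.conj()).reshape(dA * dC, dA * dC)
rho_C = np.einsum('abc,abz->cz', psi, psi.conj())

S_A_given_B = vn_entropy(rho_AB) - vn_entropy(rho_B)
S_A_given_C = vn_entropy(rho_AC) - vn_entropy(rho_C)   # expect the negative
```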
The key lesson is this: there is an important role played by bulk entanglement, encapsulated by the conditional min-and max-entropy. That's the only way this setup would be consistent with the complementary answers from the previous example.
This is Regime 2 of Example 1, so again here there are large corrections to the naïve QES prescription. The entropy comes out to the value found in the corresponding regime of Example 1, and again we have dropped subleading terms, including the O(1) Shannon term. This entropy is different from the naïve QES answer.
This is Regime 1 of Example 1, and hence the naïve QES prescription is back to receiving no corrections. The entropy equals the corresponding answer from Example 1, up to subleading terms that again include the Shannon term. This matches the naïve QES answer because, in this regime, the smooth conditional min- and max-entropies lie on the same side of ∆A/4G.

Example 3: Arbitrary Entanglement Spectra
What about more general bulk states ρ_{b'}, which aren't simply the mixture of two states with (approximately) flat entanglement spectra (again forgetting about bulk entanglement, for now)? For an arbitrary bulk state ρ_{b'} with eigenvalue density D_{b'}(λ_{b'}), the resolvent recursion relation (4.24) becomes an integral equation, (4.48). We will not be able to solve this equation as precisely as we were able to calculate the resolvents in the preceding examples, but we will still have sufficient control to calculate the von Neumann entropy up to O(1) corrections.

Calculation
Our strategy will closely mirror the strategy used to calculate corrections to the von Neumann entropy near the Page transition in [9], and we refer the reader to that paper (and in particular Appendix F) for more detailed justifications. We will perturbatively approximate the resolvent for λ ≫ e^{−A_2/4G}, and argue that there are no eigenvalues with λ ≪ e^{−A_2/4G} whenever the smooth max-entropy is sufficiently large. Combined, these two results will enable us to calculate the von Neumann entropy up to O(1) corrections.
For λ ≫ e^{−A_2/4G}, we treat the second term in (4.48) as a small perturbation. At leading order, the resolvent is given by R_0(λ) = e^{A_2/4G}/λ. The leading contribution to the density of states comes from the first perturbative correction, (4.49). To justify this perturbative approximation, we assume λ has a small imaginary part iϵ, with ϵ ≪ e^{−A_2/4G}. 16 For λ ≪ e^{−A_2/4G}, we can ignore the first term in the denominator of (4.48) to get a self-consistent approximation, (4.51). Note that going to higher orders in perturbation theory will not introduce a nonzero eigenvalue density, because there are no poles in (4.48) for these values of λ. We conclude that in Regime 1 we recover the naïve QES prescription. Next, we want to argue that there are no eigenvalues with λ ≪ ε e^{−A_2/4G}, and hence that R(λ) is negative and real there. To do so, we rewrite (4.48) to give λ as a function of R, (4.52). For small negative R, λ is large and negative, since the first term dominates. When R is very large and negative, however, the second term dominates (thanks to our assumptions about the smooth max-entropy), and so λ is positive. There will be some intermediate R where λ is maximal, which gives the bottom of the entanglement spectrum.
To lower bound this maximum, we choose some R ≫ e^{A_2/2G}. Then the second term in (4.52) dominates and we find (4.53). The last approximation again follows from our assumption about the size of the smooth max-entropy. We therefore conclude that there are no eigenvalues with λ ≪ ε e^{−A_2/4G}, as expected.
We can now calculate the entropy S(B). Since we know the eigenvalue density for both λ ≫ e^{−A_2/4G} and λ ≪ ε e^{−A_2/4G}, we know the remaining eigenvalues must all have ε e^{−A_2/4G} ≲ λ ≲ e^{−A_2/4G}. Up to O(ln ε) corrections, this determines the entropy, (4.54). When H^ε_min(b') ≫ (A_2 − A_1)/4G, we can ignore the first term in the minimization and we recover the naïve QES prescription result S(B) = A_2/4G. However, when H^ε_min(b') ≪ (A_2 − A_1)/4G, we find leading order corrections. Our results agree with the refined QES prescription in all three regimes.

Summary
Let us summarize what we learned in this section. Doing a careful calculation, without the LM assumption, reveals a refinement of the naïve QES formula, which can differ from the naïve one at leading order. This refined QES prescription compares the smooth conditional min- and max-entropies to the difference in areas. We have seen this in three examples, all of which used fixed-area states, and all of which had particularly simple bulk states. Unfortunately, states where there is a large amount of entropy in all the bulk regions with an arbitrary entanglement structure, and states where the areas are not fixed, are beyond our current technology for computing the replica trick without the LM assumption. However, in the next section, we will derive the QES refinement more generally, beyond these particular bulk states and beyond fixed-area states, by using a more indirect approach.

Corrections in general holographic states
We start by arguing that the naïve QES prescription is valid whenever the smooth conditional min- and max-entropy are safely on the same side of (A_2 − A_1)/4G. This generalizes half the pattern from our examples, now showing that for any state on b b' b̄ the naïve QES prescription can be trusted when the min- and max-entropy are on the same side of the area difference, though we emphasize that we still limit ourselves to two competing fixed-area QESs, as in Figure 1.
Then in Section 5.2 we prove that there are generally large corrections to the naïve QES prescription in the regime where we did not prove the corrections are small. That is, there are large corrections when the min- and max-entropy are on different sides of the area difference.
Finally, in Section 5.3 we remove the fixed-area requirement, demonstrating that more general geometries follow the same pattern, up to a relatively small difference, O(ln G).
Altogether, our argument shows that there are large corrections in general holographic states if and only if the bulk min-and max-entropy straddle the difference in areas between the two competing QES.

The regime of validity of naïve QES in fixed-area states
We first argue that the naïve QES prescription is valid, up to o(1) corrections, for general fixed-area states with two extremal surfaces, so long as either
$$H^\varepsilon_{\min}(b'|b) \gg \frac{A_2 - A_1}{4G}\,,$$
in which case the minimal QES is the surface γ_2, or
$$H^\varepsilon_{\max}(b'|b) \ll \frac{A_2 - A_1}{4G}\,,$$
in which case the minimal QES is the surface γ_1. By "much greater than", ≫, we mean a difference that is much larger than O(ln G). Therefore, large corrections can only exist if
$$H^\varepsilon_{\min}(b'|b) \lesssim \frac{A_2 - A_1}{4G} \lesssim H^\varepsilon_{\max}(b'|b)\,.$$
We will later argue in Section 5.2 that significant corrections (at least O(1) in size) in fact always exist when the smooth min- and max-entropies straddle the area difference in this way. Our strategy will be to make use of the correspondence between the nonperturbative corrections to the replica trick entropy in a) fixed-area states in gravity and b) single-tensor random tensor networks (RTNs) [35], a nonperturbative equivalence first noted in [9].
Let us start by reviewing that correspondence. We have already evaluated tr(ρ_B^n) for fixed-area states in Section 4. We found that for a general normalized bulk state ρ_{b b' b̄}, the dual normalized boundary state ρ_{B B̄} satisfied
$$\operatorname{tr}(\rho_B^n) = \sum_{\pi} e^{\frac{A_1}{4G}\left(C(\pi)-n\right)}\, e^{\frac{A_2}{4G}\left(C(\tau^{-1}\circ\pi)-n\right)}\, \operatorname{tr}\!\left(\rho_{b b'}^{\otimes n}\,\tau_b\,\pi_{b'}\right),$$
where the fixed permutation τ is cyclic, the sum is over permutations π that maximize C(τ^{−1}∘π) + C(π), and the operators τ_b, π_{b'} permute the n copies of their respective subsystems.
We want to show that one finds the same formula in RTNs, where the boundary state is prepared by acting on the bulk state with a single random tensor V. This is shown graphically in Figure 6. Here the subsystems B and B̄ have dimensions d_B = exp(A_2/4G) and d_{B̄} = exp(A_1/4G) respectively. We can write tr(ρ_B^n) as a Haar average [35] by writing V = U V_0 for a fixed isometry V_0 and a Haar-random unitary U. Now, we can use the standard formula [36] for the Haar average of n copies of U and U^†, which produces a sum over permutations π, where π(i) ∈ {1, ..., n} represents the arbitrary permutation π acting on the i-th element of an n-element set.
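As a minimal sanity check of Haar averaging (at n = 1, where the formula reduces to the exact statement E_U[U X U†] = tr(X) 1/d), one can average over numerically sampled Haar unitaries. A sketch, with Haar sampling via a phase-fixed QR decomposition and our own choice of sample size:

```python
import numpy as np

rng = np.random.default_rng(11)
d = 8
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

def haar_unitary(d, rng):
    # QR of a complex Gaussian matrix (with a phase fix) is Haar distributed
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

N = 4000
avg = np.zeros((d, d), dtype=complex)
for _ in range(N):
    U = haar_unitary(d, rng)
    avg += U @ X @ U.conj().T
avg /= N

expected = np.trace(X) / d * np.eye(d)     # the n = 1 Haar average
err = float(np.abs(avg - expected).max())
```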
This is exactly the result that we found in the gravity calculation.
Since we can (in principle if not in practice) calculate the entropy S(B) simply by analytically continuing tr(ρ_B^n), the RTN must have the same entanglement entropy as the gravity calculation. Armed with this knowledge, we can calculate the entanglement entropy in the RTN, using any technique we want, and thereby find the gravitational answer as well.
The key result that we will use is the one-shot decoupling theorem, Theorem III.1 of [37] (see Appendix D for our summary of the proof), which says that, for the RTN state constructed from V_0 ρ_{bb'} V_0^†, the reduced state ρ_B is close in trace norm to a decoupled state, as in (5.11), so long as the min-entropy condition (5.10) holds. What does this theorem mean? It states that (5.10) is a sufficient condition to ensure the decoupling (5.11). Moreover, the condition (5.10) can be weakened, replacing the min-entropy H_min(b'|b)_ρ by its smooth version H^ε_min(b'|b)_ρ, with only a small degradation in the quality of the approximation, as follows from the definition of smoothing.
The state on the right hand side of (5.12) has two essential features. The first is that it depends only on the reduced state ρ_b, and is completely independent of b'. The second is that its entropy corresponds in gravity to the generalized entropy of the surface γ_2. We want to use this to bound the entropy of the state ρ_B itself. To do so, we need Fannes' inequality [34]: for any two states ρ, σ on a Hilbert space of dimension d,
$$|S(\rho) - S(\sigma)| \leq T\ln(d-1) + S_2(T)\,, \qquad T = \tfrac{1}{2}\|\rho - \sigma\|_1\,.$$
Here S_2(p) = −p ln p − (1 − p) ln(1 − p) is the Shannon entropy of the probability distribution (p, 1 − p).
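This inequality (we have quoted it in its sharpened, Audenaert form) can be stress-tested numerically on random states. A small sketch (assuming numpy; dimensions and sample count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 6

def random_density(rng, d):
    m = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    r = m @ m.conj().T
    return r / np.trace(r).real

def vn_entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-14]
    return float(-(ev * np.log(ev)).sum())

def S2(t):
    return float(-t * np.log(t) - (1 - t) * np.log(1 - t))

violations = 0
for _ in range(200):
    rho, sig = random_density(rng, d), random_density(rng, d)
    T = 0.5 * np.abs(np.linalg.eigvalsh(rho - sig)).sum()    # trace distance
    bound = T * np.log(d - 1) + S2(T)                        # Fannes-Audenaert
    if abs(vn_entropy(rho) - vn_entropy(sig)) > bound + 1e-9:
        violations += 1
```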
Applying this inequality to the states in (5.11), we find (5.15). If we take ε (in both (5.11) and the smooth min-entropy) to be polynomially small in G (say O(G²)), then (5.10) is satisfied whenever the relevant entropy difference grows faster than ln(1/G) in the semiclassical limit G → 0. Moreover, (5.15) says that S(ρ_B) is given by the generalized entropy of the quantum extremal surface γ_2, up to a perturbatively small (O(G)) correction.
We also want to show that the QES prescription is valid, this time with minimal QES γ_1, so long as (5.17) holds. It turns out that this follows from the same arguments used above, applied to the complementary boundary region. We first consider some arbitrary purification |ψ⟩_{bb̄b'R} of the bulk state ρ_{bb̄b'}. We now want to calculate the entanglement entropy S(B̄R) of the corresponding pure boundary state. This is, of course, equal to the entropy S(ρ_B) that we are really interested in.
As can be seen immediately from the tensor network picture, this is exactly the same situation that we considered before, except that B has been replaced by B̄ ⊗ R, b has been replaced by b̄ ⊗ R, and the areas A_1 and A_2 have been exchanged. It follows that the analogous decoupling holds so long as the corresponding min-entropy condition is satisfied. Since |ψ⟩_{bb̄b'R} is pure, H^ε_min(b'|b̄R)_ψ = −H^ε_max(b'|b)_ρ, and so this is exactly (5.17). It is worth briefly commenting on whether an equivalent formula to (5.11) could be shown directly in gravity. The proof of (5.11) is reviewed in Appendix D and involves calculating the Hilbert-Schmidt distance between the entangled and product states, and then using the Hilbert-Schmidt norm to bound the trace norm. Since the Hilbert-Schmidt norm can be computed directly using a path integral (without analytic continuation), it should in principle be possible to evaluate it in gravity. Just as for Rényi entropy calculations, for fixed-area states the gravity answer should agree with the random tensor network answer. However, in order to derive a gravitational decoupling theorem, one would still need to use some quantum information tricks (basically a clever application of the Cauchy-Schwarz inequality) in order to eventually bound the trace norm. The derivation would therefore still not be a completely direct gravity calculation.

The regime where naïve QES fails
So far we have only argued that there are no significant corrections to the QES prescription, so long as (5.21) fails to hold. In this section, we argue that there do exist significant corrections (at least O(1) in size) when (5.21) holds. Here, ε can be relatively small, but should be parametrically O(1) in the semiclassical limit. Note that we do not have a general argument that these corrections need to be leading order, although we strongly expect that this is the case so long as (5.21) holds at leading order, as we found in the simple examples in Section 4. Finding a proof that this is true in full generality is an important task for future work.
Our main tool will be the converse one-shot decoupling theorem of [38]. Applied to the random tensor network in Figure 6, this says that, if (5.21) holds, then the boundary state is far from the corresponding product state. Using Pinsker's inequality, these bounds lower bound the relative entropies, and hence the mutual informations. Since there is O(1) mutual information in each case, we find that the naïve QES prescription receives at least O(1) corrections. Since the same replica trick calculation gives the entropy of both the random tensor network and the corresponding fixed-area state, the same is true for fixed-area states.

From fixed-area states to general holographic states

In this section, we argue that, up to small O(ln G) corrections, the entropies of general holographic states (with unfixed areas) can be calculated by expanding the state as a superposition of fixed-area states, and then taking an expectation of the entropies of the states in the superposition. 17 From our point of view, the primary importance of this result is that, at leading order, the entanglement entropy of a generic holographic state is the same as the entropy of a fixed-area state with the same classical area. Therefore general holographic states inherit the leading order corrections we found for fixed-area states. It also shows that the corrections to the naïve QES prescription are small, for general holographic states, so long as
and similarly for H ε min (b |b). That said, our argument also has other technical applications, for example bounding the error in the assumptions used in [14] to calculate the O(1/ √ G) corrections to the entanglement entropy near a QES phase transition.
Our starting point is that the general holographic state |ψ⟩ can be written as a superposition over fixed-area states |A_1, A_2⟩. The fluctuations in the area are Gaussian (in the semiclassical limit) with width O(√G) (see [14] for detailed calculations), so we can approximate the state up to any polynomially small error (w.r.t. G) by a state with support only on an O(√(G ln G)) range of values. 18 As with any continuously valued observable, it is not well defined to measure the area exactly. Instead, the area operator should be viewed as a projection-valued measure (PVM), and the states |A_1, A_2⟩ should be viewed as the outcome of measuring the area to some precision δ. We shall take δ to be polynomially small with respect to G.
It follows from the preceding two paragraphs that the number of distinct fixed-area states in the superposition scales as O(G ln G/δ²). (Note that we are taking a superposition over states with both A_1 and A_2 fixed, which squares the number of terms in the superposition.) Crucially, this means that the number of states is polynomial in 1/G.
We are now almost ready to consider the reduced density matrix ρ_B = tr_B̄ |ψ⟩⟨ψ|. However, as an intermediate step, we first consider taking a superposition over only states with different values of A_1, for some fixed A_2. In other words, we have (5.31). The first thing to observe is that the bulk operator Â_1 is always reconstructable on the boundary region B. Hence the only terms that survive the partial trace have A_1 = A_1′. We therefore find that ρ_B(A_2) can be written as an incoherent mixture. (The error in neglecting the tail of a Gaussian outside a window ∆x goes like e^{−O((∆x/σ)²)}. So if ∆x equals k standard deviations, the error goes like e^{−O(k²)}. Hence the √G is for the standard deviation, and the √(ln G) ensures we capture a greater number of standard deviations as G → 0, such that the error tends to zero polynomially in G.)
However, as discussed in Section 2, we can bound the entropy of such a mixture from above and below, as in (5.33). The difference between the upper and lower bounds is an O(ln G) entropy of mixing term (because there were O(√(G ln G)/δ) distinct states in the superposition) and hence can be ignored at leading order (and for calculating the O(1/√G) corrections discussed in [13,14]). Now we need to take a superposition over different values of A_2. Because all the states involved are pure, S(ρ_B) = S(ρ_B̄) and, for any A_2, S(ρ_B(A_2)) = S(ρ_B̄(A_2)). We can therefore compute the entropy of the reduced state on B̄ rather than B.
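The bounds used here are the standard mixing bounds Σᵢ pᵢ S(ρᵢ) ≤ S(Σᵢ pᵢ ρᵢ) ≤ Σᵢ pᵢ S(ρᵢ) + H(p). When the states in the mixture have orthogonal supports, as they do for distinct measured areas, the upper bound is saturated, as the following toy check (with arbitrary small density matrices) shows.

```python
import numpy as np

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log(w)))

def embed(rho, offset, d):
    """Place rho as a block at the given offset of a d x d matrix."""
    out = np.zeros((d, d))
    k = rho.shape[0]
    out[offset:offset+k, offset:offset+k] = rho
    return out

# two 'fixed-area sectors' with orthogonal supports
rho1 = np.diag([0.7, 0.3])
rho2 = np.diag([0.5, 0.5])
p = np.array([0.6, 0.4])

mix = p[0] * embed(rho1, 0, 4) + p[1] * embed(rho2, 2, 4)

avg_S = p[0] * entropy(rho1) + p[1] * entropy(rho2)
H_mix = float(-np.sum(p * np.log(p)))     # entropy of mixing term
print(entropy(mix), avg_S, avg_S + H_mix)
```

For orthogonal blocks the mixture entropy equals the average entropy plus the entropy of mixing exactly, and the latter is the O(ln G) term being discarded in the argument above.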
Since A_2 can always be reconstructed on B̄, this is again an incoherent mixture, and hence we have the analogous upper and lower bounds (5.35). Again the difference between the lower and upper bounds is O(ln G) and so can be ignored in leading order calculations. Altogether, we therefore find that the entropy of a general holographic state is the expectation of the entropies of the fixed-area states in its superposition, which is exactly what we set out to show. In particular, the QES prescription is valid at leading order for general holographic states whenever it is valid for the corresponding fixed-area states. Moreover, the QES prescription receives leading order corrections whenever there are leading order corrections to the entropy of corresponding fixed-area states. When the difference in areas is smaller than the fluctuations in this difference, we also find the O(1/√G) corrections from [9,13,14]. There is one remaining remark to make. The fluctuations in the areas A_1 and A_2 are formally divergent when we take the radial cut-off to infinity. This leads to a natural question of whether we were justified in treating the potential entropy of mixing terms as smaller than O(1/√G), non-divergent corrections.
The short answer is that this subtlety does not matter for our purposes. The IR fluctuations create a constant (divergent) difference between the entropy in fixed-area states and the entropy in general states, independent of which QES is dominant and independent of the bulk state. Hence the entropy S(B) in a general bulk state can be computed as the expectation of the entropies of the fixed-area states in its superposition, as we already argued, plus a constant shift. This shift is currently underappreciated and deserves more study, but it does not affect our ability to infer general corrections from fixed-area states.
The longer answer is as follows. First note that the IR pieces of A_2 and A_1 are the same, so the fluctuations of A_1, in states where the area A_2 is fixed, do not diverge [14]. This implies that the difference between the lower and upper bounds in (5.33) is genuinely finite. Moreover, it implies that A_2 − A_1 is independent of this IR subtlety, so the condition for corrections (5.3) remains well-defined.
The important effect of this IR subtlety is on the entropy of mixing term in (5.35), which is indeed divergent. However, it represents a large constant shift, not a large window, because the lower bound can be strengthened to include this divergence as well. This works as follows. Let there be some fixed radial cutoff ε, such that A_2 diverges in the ε → 0 limit. Group fixed-area states into blocks corresponding to some O(1) range of areas. Each block contains O(1/δ) fixed-area states, a number polynomial in 1/G, and this number grows as G → 0. As the IR cutoff is taken away, the number of such blocks grows as 1/ε to some power.
We can separate the Shannon term associated to the mixing of these blocks from the Shannon term associated to the mixing of the O(1/δ) states within each block. The first Shannon term does not depend on G, only on ε.
This IR Shannon term, crucially, can be included in the above lower bound of (5.35). The resulting inequality is true because the different blocks are distinguishable on both B and B̄. Indeed, A_1, a quantity known to B, takes on vastly different values in the different blocks (because its IR value matches that of A_2). Therefore, the entropy of mixing associated to the IR fluctuations of the area can be understood as a constant shift to the entropy, present even if there is just a single fixed-area surface. This concludes the argument.
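The split of the total Shannon term into a block term plus within-block terms is just the chain rule H(p) = H(q) + Σⱼ qⱼ H(p⁽ʲ⁾), which can be checked directly (the toy weights below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

def shannon(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# unnormalized weights for the fixed-area states, grouped into blocks
blocks = [rng.random(5) for _ in range(3)]
total = sum(b.sum() for b in blocks)

q = np.array([b.sum() for b in blocks]) / total     # distribution over blocks
full = np.concatenate([b / total for b in blocks])  # full distribution

# chain rule: H(full) = H(blocks) + sum_j q_j H(within block j)
within = sum(q[j] * shannon(b / b.sum()) for j, b in enumerate(blocks))
print(shannon(full), shannon(q) + within)
```

The first term (over blocks) carries the IR divergence in the argument above, while the within-block terms stay O(ln G).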

Entanglement wedge reconstruction
This refinement of the QES prescription brings with it a refinement of the condition for entanglement wedge reconstruction (EWR).
We show in Section 6.1 that the refined conditions are the following. A region B of the boundary will be able to reconstruct the state of a region b' of the bulk, as in Figure 1, given a bulk state ρ, if and only if (6.1) holds. 19 This condition is similar to that of Hayden and Penington 20 [8] (see also [4]), but builds on it in a key way. The similarity is that both depend at some level on the comparison between ∆A/4G and H_max(b')_ρ (though [8] did not say it this way).
The key difference is that (6.1) tells you whether B can reconstruct the particular state ρ. The condition from [8] tells you whether there exists a single reconstruction procedure that works for any state in a code subspace that contains ρ.
This difference shows up in two places: the smoothing of H_max, and the conditioning on b. The smoothing allows us to care only about the approximate dimension of ρ_b', formalizing the intuitive notion that we can ignore small pieces of the wavefunction and still approximately reconstruct the state. The conditioning on b quantifies how entanglement in ρ helps B reconstruct b', formalizing the intuition that bulk entanglement between b and b' can aid reconstruction.
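The role of smoothing in 'ignoring small pieces of the wavefunction' can be made concrete in a classical toy model: a spectrum that is approximately rank 2, but with tiny weight spread over many extra states, has a large H_0 (log of the support size) yet a small smoothed H_0^ε once an ε tail is discarded.

```python
import numpy as np

def h0(p):
    """Renyi-0 entropy: log of the support size, in nats."""
    return float(np.log(np.count_nonzero(np.asarray(p) > 0)))

def h0_smooth(p, eps):
    """Classical smoothing: drop the smallest eigenvalues carrying
    at most eps of total probability, then count the support."""
    q = np.sort(np.asarray(p, dtype=float))
    cum = np.cumsum(q)
    keep = q[cum > eps]          # survivors after trimming the eps tail
    return float(np.log(len(keep)))

# a state that is 'approximately' rank 2: tiny weight on 98 extra states
p = np.array([0.495, 0.495] + [0.01 / 98] * 98)
print(h0(p), h0_smooth(p, 0.02))
```

The unsmoothed entropy is ln 100, but after discarding ε = 0.02 of the probability mass the effective dimension drops to 2, which is the sense in which only the approximate dimension of ρ_b' matters for reconstruction.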
In Section 6.2, we explain that this new, state-specific formulation of EWR (6.1) is equivalent to a well-known quantum information task, one-shot quantum state merging. Furthermore, we explain that the AdS/CFT dictionary performs this task maximally efficiently. EWR is just very efficient one-shot quantum state merging. 21

State-specific EWR
Let us first carefully define what we mean by EWR for an arbitrary, single bulk state ρ.
Traditionally, EWR has been defined not for a single (mixed) state ρ, but for a code subspace of states H_code. There are then two definitions of what it means for EWR to be possible, depending on whether we work in the Schrödinger or Heisenberg picture. In the Schrödinger picture, we need to find a quantum channel R : B → b ⊗ b' that recovers the reduced bulk state on b ⊗ b' from the reduced boundary state on B, for any state in the code subspace. In the Heisenberg picture, for any bulk operator acting on b ⊗ b', we need to find an operator O_B, acting only on the boundary region B, whose action is the same as the action of the bulk operator, when applied to any state in the code subspace. 22

We can replace this definition with a definition that considers only a single state, by utilizing a canonical purification |ψ⟩_{bb̄b'R} of the maximally mixed state within the code subspace. In this language, EWR is possible if and only if, for any bulk operator on b ⊗ b', there exists an operator reconstruction on B that has the correct action on |ψ⟩_{bb̄b'R}. Similarly, in the Schrödinger picture, EWR is possible - in this single-state language - if and only if it is possible to recover a canonical purification of the bulk state on b̄ ⊗ R from the boundary state on region B.

19 We discuss setups with more than two candidate QES in Section 7.
20 See also Dong, Harlow, and Wall [12], which first derived EWR in settings with small code subspaces, where the minimal QES is determined up to perturbative corrections by the area term.
21 Let us make a helpful distinction. The term "entanglement wedge reconstruction" usually means two things at the same time: the task of encoding b' into B (and then decoding), and also the particular protocol implicit in the AdS/CFT dictionary, the protocol that performs the task. The task, we will explain, is a special case of quantum state merging. The protocol, we will argue, is a very efficient way to perform quantum state merging.
If EWR were exact, this single-state definition would be exactly equivalent to the traditional, code subspace definition. However, because EWR is in practice only approximate, there is a slight difference. In the traditional definition, the error is commonly defined as the 'worst-case' error, i.e. the largest output error for any input state. The error when acting on a maximally entangled state is more like an 'average-case' error: the reconstruction can do a lot worse on particular input states, as long as it does well for most input states. (See e.g. the discussion in [39].) An advantage of this new, single-state definition is that it generalizes very naturally to mixed bulk states ρ with an entanglement spectrum that isn't flat. Again, we simply say that EWR is possible if a boundary operator exists with the correct action on a (canonical) purification of the bulk state ρ. When the state ρ is unentangled, this just means that we are taking a 'weighted-average' error, where ρ_b' tells us how different states should be weighted. However, when the state ρ_{bb'} is entangled, we can take advantage of that entanglement to make reconstruction easier. This has no classical analogue.
When is EWR of region b' - using this more general definition - possible? We start by considering the tensor network shown in Figure 6. In this setup, a necessary and sufficient condition for EWR is the approximate decoupling condition (6.2) [40,41]. Roughly speaking, the intuition for this is that all purifications are equivalent up to unitaries, and B purifies B̄ ⊗ b' ⊗ R. It follows that, if (and only if) the reduced state on B̄ ⊗ b' ⊗ R is (approximately) the product of a state on B̄ and a state on b' ⊗ R, then we can extract a purification of b' ⊗ R from B. As discussed above, this is just the Schrödinger picture definition of EWR. As discussed in Section 5.1, (6.2) holds if and only if (6.3) does. In other words, the condition for EWR of region b' is exactly the condition for the QES prescription to be valid, with minimal QES γ_1 (and hence region b' is 'in the entanglement wedge').
We would like to show that the same condition holds for EWR in gravity. Given our discussion in Section 5.1 about the close connections between random tensor networks and fixed-area states, it should be unsurprising that this is indeed the case.
The simplest argument for this is to use the Petz map reconstruction [42-44]. This is an explicit, general-purpose construction for reconstructing operators that is known to be close to optimal. Specifically, using the Petz map (with reference state ρ_{bb'} ⊗ σ_b̄ for any full-rank state σ_b̄) will give a reconstruction error that is at most twice the optimal error [39,42]. Hence, for the random tensor network, the Petz map reconstruction will work with small error if and only if (6.3) holds.
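For readers unfamiliar with the Petz map, the following toy numpy sketch (not the holographic setup; we take the channel to be a simple partial trace and an arbitrary full-rank reference state) shows its defining property: it exactly recovers the reference state from the channel output.

```python
import numpy as np

rng = np.random.default_rng(3)

def mpow(m, x):
    """Hermitian matrix power via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    wx = np.where(w > 1e-12, w, 1.0) ** x * (w > 1e-12)
    return (v * wx) @ v.conj().T

dA, dB = 2, 3

# full-rank reference state sigma_AB
g = rng.normal(size=(dA*dB, dA*dB)) + 1j*rng.normal(size=(dA*dB, dA*dB))
sigma = g @ g.conj().T
sigma /= np.trace(sigma).real

def ptrace_A(rho):
    """Channel N = tr_A : AB -> B (A is the slow index, kron ordering)."""
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=0, axis2=2)

def petz(X_B):
    """Petz map for N = tr_A with reference sigma:
    R(X) = sigma^{1/2} N^dagger(N(sigma)^{-1/2} X N(sigma)^{-1/2}) sigma^{1/2}."""
    sB_inv_half = mpow(ptrace_A(sigma), -0.5)
    lifted = np.kron(np.eye(dA), sB_inv_half @ X_B @ sB_inv_half)  # N^dagger
    return mpow(sigma, 0.5) @ lifted @ mpow(sigma, 0.5)

recovered = petz(ptrace_A(sigma))
print(np.allclose(recovered, sigma, atol=1e-8))
```

Exact recovery of the reference state is the special case; the near-optimality statement quoted above is the nontrivial fact that the same map also performs well on states close to the reference.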
However, Petz map matrix elements can be computed using a replica trick [9]. And, as for the von Neumann entropy, the replica trick calculation is identical for both fixed-area states and random tensor networks [9,45]. We can therefore use the known results for random tensor networks to do the analytic continuation and conclude that the Petz map reconstruction succeeds (and hence EWR is possible at all) if and only if (6.3) holds.
What about EWR in states where the extremal surface areas are not fixed? Since the area operator A_2 can always be measured on B, we are free to consider states of fixed A_2. If entanglement wedge reconstruction is possible for all values of the area A_2, it must also be possible for states that involve superpositions over A_2, because we can reconstruct an operator φ_b' as φ_B = Σ_{A_2} Π_{A_2} φ_B^{(A_2)} Π_{A_2}, where the sum is over possible values of the area, φ_B^{(A_2)} is a reconstruction of φ_b' for states with area A_2, and Π_{A_2} is a projector onto the area being A_2.
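The sector-by-sector gluing can be illustrated in a toy two-sector Hilbert space (dimensions and operators below are arbitrary choices): per-sector reconstructions, sandwiched between area projectors, combine into a single operator with the correct action on superpositions across sectors.

```python
import numpy as np

# two 'area sectors' of dimension 2 each, total Hilbert space dim 4
P1 = np.diag([1., 1., 0., 0.])        # projector onto sector A2 = a
P2 = np.diag([0., 0., 1., 1.])        # projector onto sector A2 = a'

# per-sector reconstructions of the same logical operator (Pauli X, say)
X = np.array([[0., 1.], [1., 0.]])
Z2 = np.zeros((2, 2))
phi1 = np.block([[X, Z2], [Z2, Z2]])  # reconstruction valid in sector 1
phi2 = np.block([[Z2, Z2], [Z2, X]])  # reconstruction valid in sector 2

# glue them sector by sector
phi = P1 @ phi1 @ P1 + P2 @ phi2 @ P2

# a superposition across the two sectors
psi = np.array([1., 0., 1., 0.]) / np.sqrt(2)
expected = np.array([0., 1., 0., 1.]) / np.sqrt(2)  # X acts in each sector
print(np.allclose(phi @ psi, expected))
```

The projectors kill any cross-sector matrix elements, which is why it is enough for each φ_B^{(A_2)} to be correct within its own fixed-area code subspace.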
In general, we can't do the same thing for the area A_1, since it is not always measurable from B. However, if the region b' is reconstructable on B for all states in the superposition, then A_1 can be reconstructed on B for all the states, and we can use exactly the same argument to construct operators that work for superpositions of eigenstates of A_1. We therefore conclude that entanglement wedge reconstruction is possible so long as the refined condition (6.1) holds for the states in the superposition. The above argument was somewhat sloppy. Our previous argument for EWR of the region b', in fixed-area states, involved operators that acted within a single fixed-area code subspace (as in the tensor networks). The operator A_1 instead compares code subspaces with different areas. How do we know that it has the same reconstruction conditions?
Again, we can turn to the Petz map. To reconstruct the operator A 1 using the Petz map, we need to consider a reference state that involves a mixture of states with different areas A 1 . In the replica trick calculation of the Petz map matrix elements, the area A 1 in each replica has to be the same, whenever the different replicas are glued together at the surface γ 1 . If this is the case, the parts of the mixed reference state with the 'wrong' area A 1 will not contribute to the operator action, and the reconstruction will succeed. If some of the replicas are instead glued together at the surface γ 2 , then the areas A 1 do not need to be the same, and the reconstruction will fail.
The same statement is also true for the Petz reconstruction of ordinary bulk operators in region b [9]: the reconstruction succeeds if and only if the contribution from saddles where replicas are glued together at γ 2 is small (and so can be safely ignored while doing the analytic continuation). We already argued that those reconstructions succeed when (6.3) holds. Hence, when (6.3) holds, gluing at γ 1 must dominate the analytic continuation, and hence the operator A 1 must also be reconstructible.

EWR as one-shot quantum state merging
This single-state reformulation of EWR is a special case of a ubiquitous information-theoretic task, known as one-shot quantum state merging [21,46,47].
In quantum state merging, Alice and Bob share a quantum state. This state is chosen from some arbitrary ensemble of pure states with density matrix ρ_AB. Alternatively, we can consider a single purification |ψ⟩_{ABR} of ρ_AB. The objective of the task is to transfer Alice's part of the state to Bob while sending as few qubits from Alice to Bob as possible. In other words, to produce an output state |ψ⟩_{A'B'R} ≈ |ψ⟩_{ABR}, where the A' and B' subsystems are both held by Bob. Equivalently, the average error between the initial state (shared between Alice and Bob) and the final state (held only by Bob) should be small, where the average is over the pure states in the ensemble with density matrix ρ_AB.
It should be clear that this task is closely related to state-specific EWR. There too, the bulk state either is chosen from some ensemble ρ_{bb̄b'}, or is purified by a reference system as |ψ⟩_{bb̄b'R}. The part of the state that is held by Bob corresponds to the part of the state that is encoded in the boundary region B. The bulk region b is always encoded in the boundary region B; this corresponds to the part of the state that is initially held by Bob.
In the one-shot setting (where Alice and Bob are trying to merge a single copy of the state), it is known that the minimum number of qubits required for state merging is H_max(A|B)/ln(2). Remarkably, this is exactly how many qubits gravity seems to require! The number of qubits from region b' that can be decoded in region B is ∆A/4 ln(2) G, as stated in (6.3). Hence it seems that EWR can be explained not just as a special case of quantum state merging, but as an optimal implementation of it! However, there is an important caveat that we have ignored until now. In quantum state merging as traditionally defined, it is crucial that unlimited classical information can be sent from Alice to Bob [21]. Without this classical communication, significantly more quantum communication would be required.
Holography does not transfer large amounts of classical information from b' to B. Indeed, the amount of transferred classical information is bounded by the Holevo information, which is also equal to ∆A/4G [48]. That is, the total number of transferred qubits plus bits is bounded by ∆A/4G. There is no additional classical communication that can make state merging achievable.
So if EWR is accomplishing state merging, why did our results from Section 6.1 suggest that we only need ∆A/4G > H^ε_max(A|B) (6.6) for EWR to be possible? It turns out that the full power of classical communication is unnecessary for quantum state merging. Instead, a weaker communication primitive, known as zero-bit communication, is sufficient [49]. The number of zero-bits communicated from region b' to region B is not constrained by ∆A, and it is this additional information that allows the state merging protocol to succeed when (6.6) holds.
To understand this, we start with the resource inequality governing a highly efficient, rather general quantum protocol, the 'one-shot mother protocol,' also known as (one-shot) quantum state transfer or fully quantum Slepian-Wolf [50,51]. The inequality is (6.7). At first glance, this inequality is somewhat terrifying. Let's take some time to unpack it. The whole statement relates the relative usefulness of different quantum communication resources. On the left, we start with the state |ψ⟩, which is shared between Alice, Bob, and the reference R. Alice also has the ability to send [H^ε_0(A)_ψ + H^ε_max(A|B)_ψ]/2 ln(2) qubits to Bob.
The claim is that this is more useful to Alice and Bob than the resources on the right hand side, because the resources on the left can be used to create the resources on the right (up to some small error). What are the resources on the right? We still have the state |ψ⟩, but it has now been successfully 'merged,' so that everything except the reference is now in system A'B', held entirely by Bob. Alice and Bob have also gained [H^ε_0(A)_ψ − H^ε_max(A|B)_ψ]/2 ln(2) Bell pairs or 'ebits'. For clarity of presentation, we dropped additional terms in (6.7) of size O(ln ε), terms correcting the number of qubits required and ebits produced. These corrections are subleading for appropriate choices of ε in the limit where the entropies are large. We note that the inequality is optimal in the following sense: in any protocol for one-shot quantum state transfer, the number of qubits communicated, minus the ebits of entanglement gained, will be at least H^ε_max(A|B)_ψ/ln(2), for a particular ε that is controlled by the protocol error. How does this relate to quantum state merging? In the language of resource inequalities, quantum teleportation states that 1 ebit + 2 cbits ≥ 1 qubit (6.8), where a cbit is a classical bit. Substituting this inequality into (6.7), and recalling that classical communication is free in traditional quantum state merging, we find that the number of qubits that need to be sent is H^ε_max(A|B)/ln(2). Hence unlimited classical communication does allow Alice to give her state to Bob, just by using the mother protocol and transferring H^ε_max(A|B)/ln(2) qubits. As an aside: note that quantum conditional entropies can be negative. What does it mean if only a negative number of qubits need to be sent from Alice to Bob? The answer is that the communication cost in state merging is defined catalytically. If the protocol produces Bell pairs, these can be stored, ready to use, together with the free classical communication, to produce quantum communication in the future.
We can end up with more ability to communicate than we started with! Returning to the main point, we emphasize that classical bits are not actually required to do teleportation. Zero-bits are sufficient. We have 1 ebit + 2 zero-bits (a)= 1 qubit (6.9), where the (a) means that (6.9) only holds at leading order in the limit where we have a large number of each type of bit. Note that, unlike (6.8), (6.9) is an equality, not an inequality. Zero-bits are the minimal resource required for teleportation. Therefore, with enough zero-bits communicated from Alice to Bob, Alice can give Bob her state with just H^ε_max(A|B)/ln(2) qubits, using the mother protocol. To see this, substitute (6.9) into (6.7), finding that |ψ⟩, together with H^ε_max(A|B)_ψ/ln(2) qubits and [H^ε_0(A)_ψ − H^ε_max(A|B)_ψ]/ln(2) zero-bits, suffices to produce ψ_{A'B'R} (6.10). How much information does the AdS/CFT dictionary actually transfer? The answer is S_0/ln(2) α-bits, with α = (∆A/4G)/S_0 (6.11). Here S_0 = ln(d_b') is the thermodynamic entropy in region b'. So, for example, when the code space states in region b' are the possible microstates of a black hole with horizon area A_hor, we have α = ∆A/A_hor. We can convert α-bits into a mixture of qubits and zero-bits using another resource equality from [49], namely 1 α-bit = α qubits + (1 − α) zero-bits (6.12). We therefore find that region B can receive (1/ln(2)) S_0 α-bits = ∆A/(4 ln(2) G) qubits + (1/ln(2)) (S_0 − ∆A/4G) zero-bits (6.13) from region b'. This is worth emphasizing: the AdS/CFT dictionary transfers more than ∆A/(4 ln(2) G) qubits of information from b' to B. It also transfers many zero-bits, precisely (S_0 − ∆A/4G)/ln(2). That was for ∆A > 0; what about ∆A < 0? In this case, region B encodes no physical information about region b' (if b' is not heavily entangled with b). Nonetheless, the right hand side of (6.13) still formally defines the amount of information from b' accessible in B. This is important, for example, if we start adding bulk entanglement, as in the following scenario. Imagine that more than |∆A|/(4 ln(2) G) Bell pairs are shared between regions b' and b. Then the zero-bits of the remaining degrees of freedom in b' will be encoded in B.
This follows from the associated phase transition in the minimal QES. This phase transition is reflected in (6.13) in the following way. Converting qubits into ebits and zero-bits using (6.9), the right hand side of (6.13) says that |∆A|/(4 ln(2) G) ebits allow S_0/ln(2) − |∆A|/(4 ln(2) G) zero-bits to be transferred from b' to B, which is exactly what we just found. (Any additional ebits will continue to combine with those zero-bits to form qubits of communication, reflecting the fact that adding more and more entanglement between b' and b allows B to recover larger and larger subspaces of b'.) As an aside, we emphasize that (6.13) allowing additional zero-bits (on top of ∆A/4 ln(2) G qubits) from region b' to be encoded in region B is not some strange phenomenon that only happens in quantum gravity. Instead, it happens very generically whenever you have a noisy quantum channel. Consider the well-known properties of the quantum capacity of a channel, i.e. the number of qubits that can be communicated through that channel. The quantum capacity of a noisy channel is given by the so-called maximal regularized coherent information. However, the entanglement-assisted quantum capacity is given by half the maximal mutual information, and is generically strictly larger. The difference comes from the channel having an additional zero-bit capacity. Free entanglement allows the zero-bits to be 'upgraded' to qubits, giving additional qubit capacity.
Let's see how the same phenomenon manifests itself in gravity. Suppose we have S_0 > ∆A/4G. In this case, without using entanglement, we can learn at most ∆A/4 ln(2) G qubits about b' from B. Not all the information is encoded there. However, let's imagine we entangle (S_0 − ∆A/4G)/2 ln 2 Bell pairs between regions b and b'. If we do this, all the information about the remaining (S_0 + ∆A/4G)/2 ln 2 qubits in region b' will be successfully encoded in region B (the entanglement wedge will have expanded to include b'). By using entanglement between b and b', we have increased the amount of information about region b' that is accessible in region B. This increase in information capacity from entanglement assistance comes from the extra zero-bits in (6.13).
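The resource accounting above is simple arithmetic, which can be checked with toy numbers (all values below are arbitrary assumptions, with entropies in qubit units and factors of ln(2) dropped): the catalytic cost of merging comes out to H_max(A|B), and the α-bit decomposition reproduces the qubit/zero-bit split of (6.13).

```python
# toy entropy values (assumptions, chosen only to exercise the arithmetic)
H0, Hmax = 10.0, 3.0        # H_0(A) and H_max(A|B)
S0 = 50.0                   # thermodynamic entropy of b'
dA_over_4G = 20.0           # Delta A / 4G

# mother protocol: qubit cost and ebits produced
qubits_sent = (H0 + Hmax) / 2
ebits_gained = (H0 - Hmax) / 2

# with free classical communication (or zero-bits), each ebit gained can be
# recycled via teleportation into a future qubit of communication, so the
# net catalytic cost of one-shot state merging is H_max(A|B):
net_cost = qubits_sent - ebits_gained

# alpha-bit decomposition: S0 alpha-bits split into qubits and zero-bits,
# with alpha = (Delta A / 4G) / S0
alpha = dA_over_4G / S0
qubits = S0 * alpha
zero_bits = S0 * (1 - alpha)
print(net_cost, qubits, zero_bits)
```

With these toy numbers the net merging cost is 3 (equal to H_max), and the 50 α-bits split into 20 qubits plus 30 zero-bits, matching the pattern ∆A/4G qubits plus S_0 − ∆A/4G zero-bits.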
Having understood the information transferred from bulk to boundary, we are now ready to interpret the conditions for EWR that we found in Section 6.1. We first note that for any state |ψ⟩, we have the inequality (6.14). It therefore follows from (6.10) and (6.13) that there are sufficient qubits and zero-bits for state merging, and hence for the encoding (and reconstruction) of region b' in region B using any protocol, if and only if ∆A/4G > H^ε_max(b'|b). This is exactly what we found in Section 6.1.
To summarize, we noted that the task of encoding b' in B is the same as the task of quantum state merging. This simply followed from definitions. We were then led to ask how efficiently AdS/CFT performs this task, requiring us to carefully account for exactly how much information is transferred from b' to B by the AdS/CFT dictionary. The total information, we noted, is ∆A/4 ln(2) G qubits plus additional zero-bits (6.13). This is just enough transferred information for the most efficient state merging protocol (the mother protocol) to work. I.e. one could not transfer the bulk information in b' to B using any fewer resources. It is remarkable that AdS/CFT encodes b' in B exactly when just enough information is transferred from b' to B for any protocol to do it. EWR is a maximally efficient state merging protocol.
In contrast, the naïve QES prescription suggests that AdS/CFT exceeds the maximal efficiency bound, performing state merging as though every state were perfectly compressible. We emphasize that the arguments in this section should not be interpreted as an independent proof of the results from Section 5. A channel having sufficient capacity to carry out some task does not automatically mean that any (possibly inefficient) protocol using that channel will actually perform the task. Conversely, one could worry that region B might encode some other form of information about region b', distinct from both qubits and zero-bits, which could help make state merging possible even when the zero-bits and qubits alone would be insufficient.
Instead, our point was to make precise the relationship between entanglement wedge reconstruction (and other questions in AdS/CFT) and standard protocols in quantum information, such as state merging, which may not have been clear to members of either community.
In particular, we want to emphasize that the relevant quantum information protocols are always one-shot protocols. After all, in AdS/CFT one typically considers only a single copy of a holographic state, rather than a large number of identical copies. The only reason that the von Neumann entropy has proven relevant is that, until now, people have generally considered only states where the von Neumann entropy is equal to the one-shot entropies, at least at leading order. Once one considers states where this is not the case, it should not be surprising that it is the one-shot entropies which play the crucial role.
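The distinction can be made concrete with a small numerical sketch (our own illustration, not from the paper), comparing the von Neumann entropy with simple non-smoothed stand-ins for the one-shot entropies: the min-entropy −ln max_i p_i and the Rényi-1/2 max-entropy 2 ln Σ_i √p_i. For a flat spectrum all three coincide, so the distinction is invisible; for a decaying spectrum they separate at leading order.

```python
import math

def h_min(p):
    """Min-entropy of a classical spectrum: -ln of the largest eigenvalue."""
    return -math.log(max(p))

def h_vn(p):
    """Von Neumann (Shannon) entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def h_max(p):
    """(Non-smoothed) max-entropy, i.e. Renyi-1/2: 2 ln sum_i sqrt(p_i)."""
    return 2 * math.log(sum(math.sqrt(x) for x in p))

# Flat spectrum of rank r: all three entropies equal ln r,
# so the von Neumann entropy and the one-shot entropies agree.
r = 64
flat = [1.0 / r] * r

# Decaying (geometric) spectrum: H_min < S < H_max at leading order.
geo = [2.0 ** -(i + 1) for i in range(20)]
geo[-1] *= 2  # absorb the tail so the spectrum sums exactly to 1
```

For the flat spectrum all three functions return ln 64, while for the geometric spectrum the three entropies are separated by O(1) amounts, mirroring the statement in the text.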

Beyond two extremal surfaces
So far we have presented refined conditions for the QES prescription when there are exactly two competing surfaces, (1.12). In this section, we discuss the natural generalization of this rule which considers all bulk surfaces homologous to B.
The upshot is that the condition for large corrections is no longer two simple inequalities; it becomes a family of inequalities. Together these inequalities determine what information is actually transmitted to B.
All the claims about reconstruction in this section can be shown in random tensor networks using a careful application of the one-shot decoupling theorem. We expect based on our arguments from Section 5 that they should also be true in AdS/CFT.

Applying the refined prescription
The refined way to find the entanglement wedge (EW) is as follows.

Step 1: find the max-entanglement wedge (max-EW). The max-EW is intuitively the bulk region that B can definitely reconstruct with small error. In this sense, it most closely resembles the traditional operational definition of the entanglement wedge.
We define the max-EW as the largest region b that satisfies all of the inequalities (7.1). This definition implicitly assumes that there exists some 'largest' region satisfying (7.1) that contains all other regions satisfying (7.1). We shall prove in the next subsection that this is indeed the case. The essential intuition is that, if we can reconstruct region b_1 and we can reconstruct region b_2, then we should also be able to reconstruct their union: having access to additional degrees of freedom can only make reconstruction easier.
In principle, (7.1) requires checking infinitely many subregions b'. In practice, however, except in situations where the bulk entropy gradients can become very large (such as evaporating black holes), it should be sufficient to check only regions where ∂b is perturbatively close to a classical extremal surface, because the classical area gradient must be O(G) at the relevant minima. This justifies the simple conditions given in (1.12) when only two extremal surfaces exist.
Step 2: find the min-entanglement wedge (min-EW). The min-EW is the complement of the region that B definitely knows no information about. In other words, it is the region that B may know at least some information about. For pure states, it is the complement of the max-EW of the complementary boundary region. For mixed states, it can be smaller.
We define the min-EW as the smallest region b that satisfies all of the inequalities (7.2) for every region b' in the complement of b, where bb' denotes the union of b and b'. Again, the existence of a smallest such region is nontrivial, and is equivalent to the existence of a max-EW for the purification of B, namely BR.
Step 3: define EW as min-EW = max-EW In general, the max-EW is contained in the min-EW, as we will prove in the next subsection. In the special case in which they are the same, we can define the EW to be equal to both of them, and the entropy S(B) equals the generalized entropy of this EW. However, if the min-EW contains a region that the max-EW doesn't, then B may have partial information about that region. In general in such cases, the entanglement entropy S(B) will not be equal to the generalized entropy of any single surface.

Properties of the min-EW and max-EW
In this subsection, we prove several important properties of the min-EW and max-EW. To do so, we will need certain inequalities satisfied by the smooth min- and max-entropies. The first is that both the min- and max-entropies satisfy strong subadditivity (SSA). Secondly, the smooth min- and max-entropies satisfy a number of approximate chain-rule inequalities [56]; most importantly for our purposes, we have the chain rule (7.4).

Property 1: existence of the min/max-EW
We will show that, given any two regions satisfying (7.1), their union will also satisfy (7.1). This immediately implies the existence of the max-EW, and implies the existence of the min-EW by the equivalence with the max-EW of BR. To prove this, we need to consider three overlapping regions: the two original regions, and an arbitrary subregion b' of their union. These three overlapping regions can be decomposed into six disjoint regions, which we label b_0, b'_0, b_1, b'_1, b_2, b'_2, as shown in Figure 7. The original two regions are given by b_0 b'_0 b_1 b'_1 and b_0 b'_0 b_2 b'_2. Because the original two regions satisfied (7.1), we know the corresponding inequalities hold for i = 1 and i = 2. Adding together these four inequalities (two for each of the two regions) and comparing the area terms, we find the desired bound. We can then simplify the left-hand side using (7.10); the first inequality uses SSA and the second uses the chain rule (7.4). Together with a similar set of inequalities with 1 and 2 exchanged, this shows that the union satisfies (7.1). The max- and min-EW are therefore well-defined, up to O(ln ε) corrections (which is the same entropy difference that was required for EWR and the QES prescription to hold safely, anyway).

Property 2: min-/max-EW nesting
Almost the exact same argument shows that the max-EW and min-EW satisfy nesting. That is, a boundary region B 1 ⊆ B 2 must have a max-EW (min-EW) that is entirely contained in the max-EW (min-EW) of B 2 .
To prove this for the max-EW, once again let the regions b_0, b'_0, b_1, b'_1, b_2, b'_2 be disjoint, with the max-EW of B_1 given by b_0 b'_0 b_1 b'_1 and the max-EW of B_2 given by b_0 b'_0 b_2 b'_2. We then check (7.1) for an arbitrary region b'. This will imply that the max-EW of B_2 should have included b_1 b'_1 since the beginning. The proof, given this setup, is identical to the previous one.
The proof for the min-EW follows from nesting of the max-EW of the complement plus a purifying reference system.

Property 3: max-EW ⊆ min-EW
The max-EW is always contained in the min-EW. Intuitively this must be true if, as we claim, the max-EW characterizes the region that B has (approximately) all information about, while the min-EW characterizes the region that B has any information about.
To prove this, we assume for contradiction that there is some region that is contained in the max-EW, but not in the min-EW. Let b be the intersection of the max- and min-EWs, let b' be the region contained in the min-EW but not the max-EW, and let b'' be the complement of the union of the two wedges.
Then it must be true both that the first inequality holds and that (7.14) holds. However, these are jointly inconsistent, and we therefore have our desired contradiction.

Property 4: max-EW = min-EW only at minimal generalized entropy surfaces
In the special case that the min-EW and max-EW equal the same region b, they must be bounded by a surface that minimizes the generalized entropy, as in (7.17). This implies (7.20), where in the first line we used SSA of both area and max-entropy, and in the second line we used H^ε_max(A|B) ≥ S(A|B). Meanwhile, (7.2) tells us the analogous bound, where in the second line we used H^ε_min(A|B) ≤ S(A|B). Combining these two inequalities gives (7.17), where the inequality must be strict for non-trivial b_1 b_2 because (7.18) and (7.21) are strict for non-trivial b_1 and b_2 respectively. This is what we set out to show.
The converse is not true. A minimal generalized entropy surface will not in general satisfy all of (7.1) and (7.2). However, if all states were perfectly compressible, then this converse would be true, and therefore the naïve QES prescription would hold. Indeed, (conditional) perfect compressibility implies H ε min = S = H ε max , and equations (7.1) and (7.2) would both be satisfied only by the minimal generalized entropy surface. This is one way to understand the refined conditions for the QES prescription.

Full reconstruction outside the max-EW
Everything in the max-EW can be fully reconstructed from B. Similarly, no information reaches B from degrees of freedom outside the min-EW. However, the converses of these statements are not necessarily true: there can be regions outside the max-EW that can be fully reconstructed, and regions inside the min-EW that cannot. Nonetheless, when the min-EW and max-EW are not equal, there is always some nonempty intermediate region that is partially, but not fully, reconstructible. A tensor network example will make this clearer. Consider m tripartite random tensors arranged in a line, each with a bulk leg b_i for i ∈ {1, ..., m}, and each connected to the tensors to its left and right by maximally entangled "in-plane" legs B_i of dimension e^{A_i/4G}, except for the first and last tensors, which each have one dangling in-plane leg (the "boundary" legs). Let B be the left boundary leg and B' the right one, with dimensions much larger than any e^{A_i/4G}. See Figure 8.
Consider a bulk state that is a mixture of (a) a pure state with a large amount of entanglement (with entanglement entropy S) between b_2 and b_3, and (b) a pure state on b_2 together with a highly mixed state (with entropy S) on b_3. For a suitable choice of the extremal surface areas, we find that the max-EW of B does not contain b_2. However, the two states in the mixture are perfectly compressible, with EWs that consist of b_1 b_2 b_3 and b_1 b_2 respectively. Hence b_2 is reconstructible in both. And it is easy to check that the two states must be close to orthogonal on B (e.g. their entropies differ at O(1/G) and both have approximately flat spectra). So b_2 must be reconstructible for a mixture of the two states.
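The near-orthogonality claim can be illustrated with a toy computation of our own (a single random isometry rather than the full chain of Figure 8): push a pure state and a flat-spectrum mixed state of rank e^S through the same isometry. Their fidelity on the output is e^{−S/2} (here S = ln 32), so two branches whose entropies differ at O(1/G) are exponentially close to orthogonal.

```python
import numpy as np

rng = np.random.default_rng(3)
dB, rank = 256, 32  # "boundary" dimension and the rank e^S of the mixed branch

# Haar-random isometry V: C^rank -> C^dB (orthonormal columns via QR)
z = rng.normal(size=(dB, rank)) + 1j * rng.normal(size=(dB, rank))
V, _ = np.linalg.qr(z)

psi = V[:, 0]                    # branch (a): a pure state on B
rho = (V @ V.conj().T) / rank    # branch (b): flat spectrum with entropy ln(rank)

# Fidelity between a pure and a mixed state: sqrt(<psi|rho|psi>)
fidelity = np.sqrt(np.vdot(psi, rho @ psi).real)
```

Here the fidelity is exactly 1/√32 = e^{−S/2}; in gravity, with S = O(1/G), the corresponding overlap is non-perturbatively small.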
How is this possible? The answer is that the max-EW is the largest region that can be reconstructed without knowing anything about the state outside that region.
However, because (and only because) the min-EW is larger than the max-EW, some partial information from outside the max-EW makes it through the tensor network legs. In particular, some information from b 3 makes it through the in-plane leg B 2 (namely the part of the state which is entangled with b 2 ). And this additional information makes it possible for all the information in b 2 to reach B.
More formally, even though we have (7.24), the fully reconstructible region is instead controlled by (7.25), where the state on B_2 is the state produced by the entire network to its right (after tracing over B'). If the min-EW did not contain anything outside b_1 b_2, this state would be maximally mixed and (7.25) would reduce to (7.24). When this is not the case, the in-plane legs can expand the fully reconstructible region. While sensible in a tensor network, it is not clear how one should define a quantity analogous to H^ε_max(b_2 B_2 | b_1) in AdS/CFT, except by explicitly converting the calculation into one involving tensor networks. Hence, if the min-EW and max-EW are not equal, it may be hard to identify with certainty the full region of the bulk where everything can be reconstructed in B. It will be at least as big as the max-EW, defined by (7.1), but could be larger (because of additional information from outside the max-EW). Likewise (by looking at the complementary region in a purification, as usual), if the min-EW and max-EW are not equal, the full region of the bulk that B has any information about may be smaller (but not larger) than the min-EW, defined by (7.2).

Refining the QES prescription
In this paper, we have introduced a refinement of the usual QES prescription. This refinement is both necessary for the boundary entanglement entropies to be self-consistent, and follows from careful application of the replica trick. Without our refinements, the QES prescription would only be valid for the limited subclass of states that are perfectly compressible.
Specifically, we have strengthened the conditions required for the entropy S(B) to be given by the generalized entropy of the minimal QES. In the language of Section 7, this is only true when the max-and min-entanglement wedges coincide (perhaps up to perturbative corrections). When the two wedges do not coincide, the entropy S(B) is much more complicated. This is closely related to the breakdown of complementary reconstruction, with a large region that cannot be fully reconstructed from either region B or from its complement.
Fundamental lesson: EWR as one-shot state merging In many ways, this second point about entanglement wedge reconstruction (EWR) is the more fundamental one. For pedagogical reasons, our presentation was, in a certain sense, inverted. We led by demonstrating the large corrections to the QES prescription, in Sections 2 through 5. Only then in Section 6 did we explain that EWR should be understood through the lens of one-shot quantum state merging, necessitating the refined conditions for reconstruction.
The QES prescription is just a rule for computing one particular boundary quantity (the von Neumann entropy of a reduced state). This is just one measure of the boundary entanglement structure (albeit a very simple and useful one). EWR is stronger, telling us a deep fact about how information in the bulk is distributed on the boundary that is independent of the particular measure (Petz map operators, relative entropies, modular flows etc.) that one might use to probe it.
As we argued in Section 6, the information theoretic task of encoding the bulk into the boundary is manifestly a form of one-shot state merging, albeit one that uses zero-bits rather than the traditional classical bits. Just from this, one can see that the naïve QES prescription implied EWR conditions that were too powerful. There simply is not enough information transferred from the bulk to the boundary via the AdS/CFT dictionary. No quantum information protocol could encode the bulk in the boundary in the way implied by the naïve QES prescription; it is incompatible with quantum Shannon theory.
This reinterpretation of EWR in terms of one-shot quantum state merging seems likely to have important future consequences. For one thing, it opens the door to connecting QES and quantum error-correction [57], providing an understanding of the QES prescription that doesn't come from the Euclidean path integral. This might shed light on how to modify Hawking's calculation of non-unitary black hole evaporation. Indeed, the new arguments from the QES prescription [2,3,9,27] give a unitary answer, but unlike Hawking they make vital use of the Euclidean path integral. A Hilbert space understanding of the QES prescription may connect the two calculations.

Generalized min-/max-entropy
As far as we know, these refinements are the first example of a generalization of the generalized entropy that replaces the von Neumann entropy by a new entropy measure (in this case the smooth min-/max-entropy). The generalized entropy of a codimension-2 surface, defined as the area plus the matter von Neumann entropy, is believed to be a well-defined continuum quantity, having passed many non-trivial checks. It is UV finite, is scheme independent, and seems to correctly generalize the classical area in many classical general relativity theorems [1, 60-62]. It therefore made perfect sense to promote extremal area surfaces to extremal generalized entropy surfaces, as in the naïve QES prescription.
In contrast, the refined QES prescription asks us to do something new: to add the smooth min-entropy or max-entropy of the bulk fields to the area. The arguments from this paper suggest that this must be equally well-defined. In particular, there should be an appropriate renormalization procedure that makes these differences UV-finite.
The leading UV-divergence in the smooth min-and max-entropy of a subregion in quantum field theory is proportional to the area (just like for the von Neumann entropy). This is essentially because the UV-divergent parts of the subregion states are thermal Rindler modes, and hence are perfectly compressible.
However, the difference between the von Neumann entropy and the smooth min-/max-entropy will still be O(√S) and hence UV-divergent [30,63]. This means that the smooth min-/max-entropies cannot be renormalized by the same quantity as the von Neumann entropy. This is fine. As discussed in Section 5.3, the relevant area difference is not the expectation of the difference in area, but a lower confidence bound on the difference in areas. This differs from the expectation of the difference by O(√G), which is exactly the right scaling to renormalize the difference between the von Neumann entropy and the min-/max-entropies.
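The O(√S) gap is visible in a toy calculation of our own, using a support-size proxy for the smoothed max-entropy: for n i.i.d. copies of a qubit spectrum, H^ε_max exceeds the von Neumann entropy nS by an amount that grows sublinearly in n, consistent with O(√n).

```python
import math

def smooth_hmax_iid(n, q, eps):
    """ln of the minimal number of eigenvalues of rho^{(x)n}, rho = diag(q, 1-q),
    needed to capture probability weight 1 - eps (a support-size proxy for the
    smooth max-entropy). For q < 1/2 the eigenvalue q^k (1-q)^(n-k) decreases
    with the type k, so we accumulate types in order of decreasing eigenvalue."""
    count, weight = 0, 0.0
    for k in range(n + 1):
        count += math.comb(n, k)
        weight += math.comb(n, k) * q**k * (1 - q) ** (n - k)
        if weight >= 1 - eps:
            break
    return math.log(count)

q, eps = 0.3, 0.01
S1 = -(q * math.log(q) + (1 - q) * math.log(1 - q))  # von Neumann entropy per copy
gaps = {n: smooth_hmax_iid(n, q, eps) - n * S1 for n in (100, 400)}
```

The gap is positive and increases with n, but quadrupling n far less than quadruples the gap, as expected for √n scaling (up to O(ln n) corrections).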
It is therefore natural to hope that the generalized smooth min- and max-entropies, defined as the area term plus the corresponding smooth entropy, with A^ε_min/max respectively lower and upper confidence bounds on the area, should be UV-finite quantities.
If this is indeed the case, we should also expect the conditional generalized smooth max-entropy to be UV-finite (note that [A(bb') − A(b)]^ε_max is again an upper confidence bound on A(bb') − A(b)). There are two sets of modes that give divergent contributions to H^ε_max(b'|b): modes near the boundary of bb', and modes near the boundary between b and b'. The contribution to the divergence of H^ε_max(b'|b) from UV modes near ∂(bb') will be the same as for the smooth max-entropy H^ε_max(b'b) (because these modes are unentangled with b). Meanwhile, the divergence from UV modes near ∂b will be the same as for the smooth min-entropy H^ε_min(b'b), except with the opposite sign. Hence we should expect the total divergence to be renormalized by the area-difference term. As usual, the UV-finiteness of the smooth conditional generalized min-entropy also follows by considering complementary subsystems.
Note that the conditional generalized smooth min- and max-entropies should also be IR-finite (just like the conditional generalized entropy (A(bb') − A(b))/4G + S(b'|b)). This follows from [A(bb') − A(b)]^ε and H^ε_max(b'|b) being separately IR-finite. The refined conditions for the QES prescription (1.12) can therefore be written in terms of the signs of the (finite) conditional generalized smooth min- and max-entropies. This formulation naturally unifies the corrections discussed in this paper with the corrections from [13,14], which considered situations in which H^ε_max = H^ε_min = 0 while the area difference was small.

Bit threads
The bit threads paradigm [64], to the extent that it continues to be useful with large bulk entropies, should have a matching refinement. A good first step towards finding it is to incorporate bulk entropy, possibly by allowing threads to end on a "reference system" understood to purify the bulk matter. A more sophisticated modification that is sometimes mentioned is to allow threads to 'pass through' entanglement, effectively using the bulk Bell pairs as 'Planckian wormholes.' For this to be consistent with our refinement of the QES prescription, the number of bit threads that can pass through these Planckian wormholes should be controlled by the conditional min-and max-entropy, not the von Neumann entropy.
It would also be interesting to incorporate zero-bits into this framework, allowing bit threads to more precisely depict the total flow of information in AdS/CFT.

Other future work
We have not given a direct path integral argument for these QES refinements for general bulk states. Our argument was more indirect: we proved the refinements for random tensor networks (RTNs) using linear algebra. Because the RTN entropy can be computed using the replica trick, the replica trick must enforce these refinements. The RTN replica trick is identical to the fixed-area state replica trick, and so the same results must be true in fixed-area states. More typical (non-fixed-area) states have the same entropy as the average of the fixed-area states that comprise them, plus subleading corrections. Although we think this argument is compelling, it is very indirect. There should be some way to relate bulk min- and max-entropy directly to the holographic calculation, allowing a direct replica trick proof of our result. In particular, we should be able to see directly why these are the quantities that determine whether the LM assumption is valid.
These results should also be generalized to von Neumann algebras. We have discussed subregions (b, b', etc.) instead of subalgebras only for simplicity. The smooth conditional min- and max-entropies admit algebraic definitions, which is a better language for bulk reconstruction.
We also did not find any description as convenient as the naïve QES prescription when H^ε_min < ∆A/4G < H^ε_max. We did provide useful bounds; for example, the entropy of a state in that regime is less than the average of the entropies of any mixture comprising that state, plus O(ln d), where d is the number of states in the mixture. But something stronger, such as an explicit formula, may be too much to hope for: any formula would need to encode the details of the entanglement structure of the mixed state ρ_{bb'}, which is known to be very hard to characterize.

A Detailed evaluation of the mixture resolvent
Here we elaborate on the calculations in Section 4.3, detailing how to go from the cubic resolvent to the eigenvalues in each regime. Note the differences between this resolvent and (4.30). Here, the bulk state is (A.1). This is a slight generalization of the state from Section 4.3, in that we do not require one state to be pure; however, we still require both to have flat spectra. We recover the state from Section 4.3 by setting λ_1 = 1 and λ_2 = e^{-S}. Recall that we assumed, for simplicity, the condition (A.3); this ensures that our small and large R expansions have overlapping regimes of validity. Unlike in Section 4.3, we will not assume that p, 1 − p = O(1). This will require us to introduce a third expansion that is valid for sufficiently small R and very small p.
Here are the three expansions we use, plus details about their associated spectra, along with information that will be useful in evaluating their regime of validity. These details are computed with the help of Appendix B.

The Expansions
Expansion 1: Consider the large R expansion (A.4). Using the results of Appendix B, this leads to the associated spectrum. To analyze when this expansion is valid, it is useful to know the value of the resolvent at λ_avg.

Expansion 2: Consider the small R expansion. This results in eigenvalues of a characteristic average size, and one can again evaluate the resolvent at λ_avg. For very small p, the second term on the right-hand side of (A.8) can become smaller than the terms that were dropped. It is therefore helpful to use a slightly adapted version of Expansion 2, namely (A.12); the only effect of this change is a small shift in the spectrum. Finally, we note that for 1/λ_1 ≪ e^{(A_2−A_1)/4G}, and for values of λ where D(λ) = 0, we have (A.14). This will again be important when considering small values of p.
Expansion 3: Finally, we can use an alternative small R expansion, in which we expand both the λ_1 and λ_2 terms up to O(R^2). This results in a resolvent that gives e^{A_2/4G} eigenvalues with average eigenvalue λ_avg = e^{-A_2/4G}. This expansion is important because, when p is very small, the O(R^2) correction from the λ_2 term may be larger than the corresponding correction from the λ_1 term, even though pλ_1 ≫ (1 − p)λ_2.

The Regimes
Here are the three regimes, each defined by the relative size of ∆A/4G ≡ (A_2 − A_1)/4G and the one-shot entropies of the bulk state (A.18). There are corrections to the naïve QES prescription only in Regime 2, when H^ε_min(b') and H^ε_max(b') are on different sides of ∆A/4G.
In this regime, Expansion 1 is always valid at its eigenvalue peak, which is at λ_avg = pλ_2 e^{-A_1/4G}. Expansion 2 is valid at its eigenvalue peak, with λ_avg = (1 − p)λ_1 e^{-A_1/4G}, unless p is very small, in which case we need to use the adapted version of Expansion 2; this has only a small effect on the eigenvalue peak. Thus for all parameter values, assuming (A.3), the entropy is given by (A.19), where we have suppressed terms that vanish in the limits we have taken. The naïve quantum extremal surface prescription gives the correct answer.
Now consider Expansion 2. The resolvent at λ = O(λ_avg) can be evaluated directly. The largest dropped term, at O(λ_avg), is given by plugging this into the dropped term in (A.4). The smallest kept term is either the second term, with size O(e^{A_1/4G}/λ_1), or the third term, with size O((1 − p)e^{A_1/4G}/pλ_1). In the latter case, the ratio dropped/kept equals O((1 − p)λ_2/pλ_1); this is small given (A.3). In the former case, the ratio equals O((1 − p)^2 λ_2/p^2 λ_1), which is small unless p itself is very small. For small p, we need to be a bit more careful, recognizing that the second term in Expansion 2 only becomes important near the eigenvalue peak, where its denominator is small, and also making use of the adapted version (A.12) of Expansion 2. Using this adapted version, we can bound the ratio of the dropped term to the second term; going from the left-hand side to the right-hand side, we use the fact that (A.14) holds near the eigenvalue peak. This ratio is small so long as (1 − p)λ_2 ≪ pλ_1 and pλ_1 ≫ e^{(A_1−A_2)/4G}. It is now a simple matter to compute the entropy using the eigenvalues from Expansion 1 and (the adapted) Expansion 2 to get (A.19).
Proof. Consider Expansion 1. If λ_1 e^{(A_2−A_1)/4G} ≪ 1, the resolvent is always real and so does not contribute any eigenvalues. If λ_1 e^{(A_2−A_1)/4G} ≫ 1, then Expansion 1 works exactly as it did in Regime 2, and the smallest kept term is 1. Expansion 2 works exactly as it did in Regimes 1 and 2. It is therefore valid (when using the adapted version) so long as the ratio of dropped to kept terms is small. In the former case, the dropped/kept ratio is O((1 − p)^2 λ_2/p^2 λ_1), which is small unless p^2 λ_1 ≪ (1 − p)^2 λ_2. In the latter case, we find that the dropped/kept ratio is small so long as λ_1 ≫ e^{(A_1−A_2)/4G} or p^2 λ_1 ≫ (1 − p)^2 λ_2. Between the expansions, we can therefore cover all the possible regimes.
Note that for p^2 λ_1 ≪ (1 − p)^2 λ_2 ≪ p^2 λ_1^2 e^{(A_2−A_1)/4G}, Expansions 1, 2 and 3 are all valid. However, Expansion 3 misses the existence of the second eigenvalue peak that appears in Expansion 2, even though it is a small R expansion and this peak occurs at smaller R than the main eigenvalue peak. This is because, for these intermediate values of p, the Taylor expansion of the λ_1 term in Expansion 3 was not under control, since pλ_1 R was not small compared to e^{(A_1+A_2)/4G}. We were only able to get away with the expansion because both the true λ_1 term and our approximation of it were small corrections (because p was so small). Near the second eigenvalue peak itself this is no longer true: the approximation of the λ_1 term breaks down, and with it Expansion 3. So we do need to use Expansion 2 there.
We compute the entropy as follows. If λ_1 e^{(A_2−A_1)/4G} ≫ 1, then for p^2 λ_1^2 e^{(A_2−A_1)/4G} ≫ (1 − p)^2 λ_2 we can use Expansions 1 and 2 to compute the entropy; for smaller p, we instead use Expansion 3. If λ_1 e^{(A_2−A_1)/4G} ≪ 1, then we can always just use Expansion 3, although we can also use Expansion 2 if p is not too small. In all cases, the entropy is given by (A.22).

B Solving the quadratic resolvent
This appendix studies the quadratic resolvent equation, where W, X, Y, Z are some fixed real numbers. This equation has the solutions (B.2); the minus sign in front of the square root in (B.2) is required by R(λ → ∞) = 0.

So, we need the imaginary part of the resolvent. We can ignore everything not under the square root, because it will not contribute to D(λ). Use the handy fact that the square root (with positive real part) of a complex number a + ib can be written as √(a + ib) = p + iq, with

p = √(( √(a² + b²) + a)/2),   q = sign(b) √(( √(a² + b²) − a)/2).

The relevant piece of the imaginary part of R then gives D(λ) for λ ∈ [λ_−, λ_+], and D(λ) = 0 otherwise. To find the number of eigenvalues, we can integrate this density. The ln that appears makes this difficult to evaluate in practice. Fortunately, we can obtain a rather good approximation by expanding λ around the average of the eigenvalue distribution. The entropy is then given by (B.14).
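This closed form for the complex square root is easy to check numerically against the principal branch (the helper name below is ours):

```python
import cmath
import math

def sqrt_closed_form(a, b):
    """Principal square root p + iq (with p >= 0) of a + ib,
    via p = sqrt((|z| + a)/2), q = sign(b) * sqrt((|z| - a)/2)."""
    m = math.hypot(a, b)  # modulus |a + ib|
    p = math.sqrt((m + a) / 2)
    q = math.copysign(math.sqrt((m - a) / 2), b)
    return complex(p, q)
```

For example, sqrt_closed_form(-3.0, 4.0) returns 1 + 2i, and squaring it recovers −3 + 4i.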

Resolvent values
We are sometimes interested in evaluating the resolvent at the location of the average eigenvalue. This is important in determining whether the expansions used in Section 4 and Appendix A provide accurate estimates of the eigenvalues associated to the cubic resolvent (4.30). Plugging (B.9) into (B.2) gives the resolvent values at λ_avg, which are useful when comparing dropped terms to kept ones.

Example: bipartite tensor
We now apply this to a simple example. Consider a bipartite random tensor, with legs A and B. The resolvent associated to ρ_A satisfies a quadratic equation of the form (B.1), and solving it recovers the expected (Marchenko-Pastur type) spectrum; in particular, for |A| much smaller than |B|, ρ_A is close to maximally mixed. This is all just as expected.
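This expectation can be verified directly by sampling (dimensions and seed are arbitrary choices of ours): for |B| ≫ |A|, the eigenvalues of ρ_A cluster around 1/|A| within the Marchenko-Pastur window, and the entropy is close to its maximum ln|A|.

```python
import numpy as np

rng = np.random.default_rng(0)
dA, dB = 8, 512

# Gaussian random bipartite pure state on H_A (x) H_B
psi = rng.normal(size=(dA, dB)) + 1j * rng.normal(size=(dA, dB))
psi /= np.linalg.norm(psi)

rho_A = psi @ psi.conj().T               # reduced density matrix on A
evals = np.linalg.eigvalsh(rho_A)
S_A = -np.sum(evals * np.log(evals))     # von Neumann entropy in nats
```

The entropy deficit ln|A| − S_A is of order |A|/(2|B|), in line with the Page-curve expectation for a random bipartite tensor.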

C Numerics
Here we present numerical evidence supporting the results of Section 4. These numerics are of a single tripartite random tensor, with legs B, B', and b'. As pointed out in [9], computing the entropy of e.g. B is equivalent, non-perturbatively, to the computation of S(B) in a fixed-area state, like Figure 4 with γ_1 fixed to area ln D_B and γ_2 fixed to area ln D_{B'}, where D_B and D_{B'} are the dimensions of legs B and B' respectively. The leg b' is the "bulk" leg, analogous to the state of the bulk fields between γ_1 and γ_2, and is projected onto the state (C.1). Figure 9 displays the resulting eigenvalue density D(λ) of the density matrix ρ_B, in Regimes 1, 2, and 3 from Section 4 and Appendix A.

D One-shot decoupling
A great many facts in quantum information theory follow from the same basic principle: the decoupling theorem. While this powerful theorem was originally proven and used in the independent identically distributed (i.i.d.) setting [40], in which a large number of independent copies of the state are available, more recently a one-shot version has been proven [37], effectively generalizing many key results to the one-shot setting. The chief difference between the two decoupling theorems is the replacement of the von Neumann entropy with the one-shot entropies, the min-and max-entropy.
The setup is as follows. Consider a system A = A_1 A_2 (with dimensions |A_1| and |A_2|) entangled with a system R. The one-shot decoupling theorem provides a sufficient condition for a Haar-random unitary U acting on A to "decouple" A_1 and R on average. This is often used to provide a sufficient condition for something weaker: the existence of a unitary operator U that decouples A_1 and R.
For our purposes, the theorem says that if ln|A_1| ≤ ln|A_2| + H_min(A|R) − 2 ln(1/ε), then the L_1 distance between σ_{A_1 R}(U) = tr_{A_2}[(U ⊗ 1_R) ρ_AR (U ⊗ 1_R)†] and the decoupled state π_{A_1} ⊗ ρ_R, averaged over U, is at most ε, where π_{A_1} is the maximally mixed state on A_1 and dU is the Haar measure on the group of unitaries acting on H_A, normalized to ∫dU = 1. We present the proof as Theorem 7 below, after some useful definitions and lemmas.
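The mechanism can be illustrated numerically with a toy of our own construction (one qubit of A maximally entangled with R, a large discarded factor A_2, and a single Haar sample): after a random unitary on A, the reduced state on A_1R is close to π_{A_1} ⊗ ρ_R.

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_unitary(d, rng):
    """Haar-random unitary via QR of a complex Ginibre matrix, with phase fix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

dA1, dA2, dR = 2, 128, 2
dA = dA1 * dA2

# |psi>_{AR}: the A_1 factor maximally entangled with R, A_2 in |0>
psi = np.zeros((dA1, dA2, dR), dtype=complex)
psi[0, 0, 0] = psi[1, 0, 1] = 1 / np.sqrt(2)

U = haar_unitary(dA, rng)
phi = (U @ psi.reshape(dA, dR)).reshape(dA1, dA2, dR)

# sigma_{A1 R}(U) = tr_{A2} |phi><phi|
sigma = np.einsum('iak,jal->ikjl', phi, phi.conj()).reshape(dA1 * dR, dA1 * dR)
target = np.eye(dA1 * dR) / (dA1 * dR)  # pi_{A1} (x) rho_R, with rho_R = 1/2
trace_dist = 0.5 * np.abs(np.linalg.eigvalsh(sigma - target)).sum()
```

With |A_2| = 128 discarded dimensions, the trace distance for a typical Haar sample is of order √(|A_1||R|/|A_2|), i.e. roughly 0.1 here; shrinking |A_2| degrades the decoupling, as the theorem's condition suggests.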

Definition 1.
Let X be an operator on Hilbert space H. The L_2 norm, or Hilbert-Schmidt norm, is defined as ‖X‖_2 = √(tr(X†X)).

This bounds the L_1 norm (‖X‖_1 = tr √(X†X)) via ‖X‖_1 ≤ √d ‖X‖_2, where d is the dimension of H. This bound is involved in the i.i.d. proof of decoupling [40], but the one-shot version we are interested in requires a stronger bound (Lemma 2): ‖S‖_1 ≤ √(tr σ) ‖σ^{-1/4} S σ^{-1/4}‖_2, for any positive operator σ whose support contains that of S. In its proof, the second inequality is the Cauchy-Schwarz inequality, applied to the Hilbert space End(H) of operators on H, with the inner product ⟨A|B⟩ = tr(A†B).
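Both bounds are straightforward to check numerically on random operators (our illustration; the weighting operator σ here is an arbitrary full-rank density matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16

def l1_norm(X):
    """Trace norm of a Hermitian X: sum of absolute eigenvalues."""
    return np.abs(np.linalg.eigvalsh(X)).sum()

def l2_norm(X):
    """Hilbert-Schmidt norm sqrt(tr X^dagger X)."""
    return np.sqrt(np.trace(X.conj().T @ X).real)

# Random Hermitian S and random full-rank density matrix sigma
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
S = (G + G.conj().T) / 2
H = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
sigma = H @ H.conj().T
sigma /= np.trace(sigma).real

# sigma^{-1/4} via its eigendecomposition
w, V = np.linalg.eigh(sigma)
sig_m14 = V @ np.diag(w ** -0.25) @ V.conj().T

lhs = l1_norm(S)
weak_bound = np.sqrt(d) * l2_norm(S)  # ||S||_1 <= sqrt(d) ||S||_2
strong_bound = np.sqrt(np.trace(sigma).real) * l2_norm(sig_m14 @ S @ sig_m14)
```

Both inequalities hold for every such S and σ; the σ-weighted bound is the one that later lets the collision entropy, rather than a bare dimension factor, control the decoupling error.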
Having bounded the L_1 norm by this particular L_2 norm, we will later bound the relevant L_2 norm by something else. First, we need to define the conditional collision entropy H_C(A|B), which, as we will prove, is at least the conditional min-entropy H_min(A|B).

Definition 3.
The quantum conditional collision entropy for a density matrix ρ_AB on Hilbert space H_AB = H_A ⊗ H_B is defined as

H_C(A|B) = − ln inf_{σ_B} tr[ ((1_A ⊗ σ_B^{-1/4}) ρ_AB (1_A ⊗ σ_B^{-1/4}))² ],

where the infimum is taken over all density matrices σ_B on Hilbert space H_B. Note that σ_B^{-1} is the 'generalized inverse' of σ_B, defined as the inverse on its support. The conditional min-entropy admits an analogous variational characterization, in which the maximization is over density matrices ω_AB and is achieved when ω_AB is a projector onto the largest eigenvalue. The lemmas below hold for κ_B and ω_AB arbitrary density matrices on H_B and H_AB respectively, where dU is the Haar measure on the space of unitaries acting on H_A, normalized to ∫dU = 1.
Proof. For any Hermitian X, it follows from Schur's lemma (see e.g. [36]) that

∫dU (U ⊗ U) X (U ⊗ U)† = c_+ Π⁺_A + c_− Π⁻_A,

where Π±_A are the projectors onto the symmetric and antisymmetric subspaces and we have defined c_± = tr(X Π±_A)/rank(Π±_A).

Plugging in X = 1_{A_2 A'_2} ⊗ F_{A_1 A'_1} (with F the swap operator) and using rank(Π±_A) = ½|A|(|A| ± 1), we obtain the coefficients c_±. Plugging all of this into (D.18) gives the bound, which is what we wanted to show.

Lemma 6.
(Lemma C.2 of [37]) Let ρ_AR be a density matrix on H_AR, A = A_1 A_2, and σ_{A_1 R}(U) = tr_{A_2}[(U ⊗ 1_R) ρ_AR (U ⊗ 1_R)†]. Then the stated bound holds, where dU is the Haar measure on the space of unitaries acting on H_A, normalized to ∫dU = 1.
Proof. Use Lemma 5 to bound the Haar average, which gives what we wanted to show.
We can finally combine these to prove the one-shot decoupling theorem.
Theorem 7. If ln|A_1| ≤ ln|A_2| + H_min(A|R) − 2 ln(1/ε), then ∫dU ‖σ_{A_1 R}(U) − π_{A_1} ⊗ ρ_R‖_1 ≤ ε, where dU is the Haar measure on the group of unitaries acting on H_A, normalized to ∫dU = 1.
Proof. Let σ_{A_1 R}(U) = tr_{A_2}[(U ⊗ 1_R) ρ_AR (U ⊗ 1_R)†]. Because H_C(A|R) ≥ H_min(A|R) by Lemma 4, and ‖S‖_1 ≤ √(tr σ) ‖σ^{-1/4} S σ^{-1/4}‖_2 by Lemma 2, it suffices to show the bound (D.28), where ω_R is some density matrix on H_R. Define the operator (D.29). The left-hand side of (D.28) then equals the stated chain of expressions, where in the third line we have used the definition (D.29), in the first inequality we have used Lemma 6, and in the final inequality we have used (D.27).