From state distinguishability to effective bulk locality

We provide quantitative evidence that the emergence of an effective notion of spacetime locality in black hole physics is due to restricting to the subset of observables that are unable to resolve black hole microstates from the maximally entangled state. We identify the subset of observables in the full quantum theory that can distinguish microstates, and argue that any measurement of such observables involves either long times or large energies, both signaling the breakdown of effective field theory, where locality is manifest. We discuss some of the implications of our results for black hole complementarity and the existence of black hole interiors.


Introduction
One of the main open questions in black hole physics is the compatibility of unitarity in the entire quantum theory with the emergence of some notion of spacetime locality, the latter being manifest in any effective field theory (EFT) description.
In this paper we explore the emergence of some notion of effective bulk locality in light of the connection between the resolution of the information paradox, and the modern formulation of thermalization in terms of entanglement and typicality. In a nutshell, just as the emergence of effective thermalization in quantum statistical mechanics requires some notion of coarse-graining, we propose that the emergence of effective bulk locality is due to the restriction to a subset of observables which cannot resolve black hole microstates from their maximally entangled averages.
Using tools from quantum information theory, we prove that a given observable $O$, or a collection of them, cannot distinguish a random pure microstate in a microcanonical ensemble $\mathcal{H}_E$ of dimension $d_E$ from the maximally entangled state $\Omega_E = I_E/d_E$ unless the number of different outcomes of the operator, $N(O)$, scales as $\sqrt{d_E}$. Furthermore, whenever $N(O) \sim \sqrt{d_E}$, we prove, in a simple quantum mechanical model, that any quantum measurement would require either a very long time or a very large energy to achieve the accuracy required to distinguish these states. Either way, this points to a breakdown of the EFT description. Alternatively, we show that any measurement involving a finite amount of resources is necessarily coarse-grained.
In an AdS/CFT set-up, our results provide mathematical evidence that the operators belonging to the bulk low energy effective theory are coarse-grained, in the sense that they are unable to distinguish random pure states from the maximally mixed one unless one waits for an exponentially long time. Furthermore, they also support the idea that only low point correlators of such operators admit a local bulk semi-classical interpretation compatible with the EFT description. Taking this perspective, we further comment on some implications for quantum gravity: mainly the consistency with black hole complementarity and the relation with recent discussions regarding the (non-)existence of a classical black hole interior.

Black holes vs Quantum Statistical Mechanics
Hawking established that black holes radiate with a thermal spectrum [1] by performing a 2-point function calculation in a framework that we shall henceforth refer to as effective field theory (EFT): he considered a bulk quantum field $\phi(x)$ propagating in a non-dynamical black hole background,
$$\langle \psi_{BH}|\,\phi(x_1)\,\phi(x_2)\,|\psi_{BH}\rangle \simeq \langle \phi(x_1)\,\phi(x_2)\rangle_{EFT} ,$$
where the subscript EFT refers to the replacement of the black hole quantum state $|\psi_{BH}\rangle$ by the classical black hole background. This description is manifestly local, a property that is expected to hold due to the equivalence principle, but is hard to derive in any quantum theory of gravity. Locality appears to be in contradiction with the unitarity of quantum mechanics, and this is commonly referred to as the information paradox. One aspect of this paradox is directly related to the problem of thermalization in quantum mechanics: given a closed quantum system in an initial pure state, unitary evolution never gives rise to a mixed state such as the Gibbs state. The latter is a fundamental problem for the foundations of quantum statistical mechanics (QSM).
How do we understand thermalization then? In QSM, thermalization is always viewed as a consequence of coarse-graining the description of our physical system. In modern language, one expects entanglement and typicality to explain why QSM is an accurate description of nature at a macroscopic level. For example, consider a composite Hilbert space $\mathcal{H} = \mathcal{H}_S \otimes \mathcal{H}_B$, where $\mathcal{H}_S$ describes the subsystem that we will measure and $\mathcal{H}_B$ stands for the bath. Given an initial quantum state $\rho$ on $\mathcal{H}$, its reduced density matrix $\rho_S$ is the only information that we have access to. Quantum entanglement is responsible for encoding the information about correlations between the subsystems, whereas typicality is responsible for the apparent universality in our measurements. More precisely, if the dimensions of these Hilbert spaces satisfy $d_S \ll \sqrt{d_B}$, in the thermodynamic limit $d_B \to \infty$, the deviation of $\rho_S$ from the maximally entangled state $\Omega_S = I_S/d_S$ on $\mathcal{H}_S$, measured in trace distance, is suppressed [2]. 1

In this work we take the perspective that such coarse-graining in holographic field theories is responsible for the emergence of a notion of effective locality (and consequently, the causal structure of spacetime), which is an assumption in Hawking's calculation. To advance our main philosophy, bulk locality emerges only for the subset of observables that fail to distinguish a thermal state, or $\Omega_S$ in a microcanonical set-up, from the actual microstate of the black hole. In gravitational theories having holographic duals, the coarse-graining appears as a result of restricting to the subset of low energy observables $\mathcal{A}_{low}$ belonging to the effective field theory. These correspond to the light sector of the holographic theory in the terminology introduced in [3]. We will discuss this point more thoroughly in section 4.
With this motivation in mind, the main task we undertake is to identify the subset of observables $O$ that can distinguish pure microstates $\rho_i = |\psi_i\rangle\langle\psi_i|$ from the thermal density matrix $\rho_{BH}$ (the semi-classical description of the black hole) in a given large but finite dimensional Hilbert space $\mathcal{H}$. 2 In an exact quantum theory of gravity, such as in an AdS/CFT context, we quantify the deviations of correlators computed in black hole microstates $|\psi_i\rangle$ from the effective field theory answer. These deviations are expected to be of the form [4,5] where $e^{S_{BH}} = d_E$ is the dimension of the microcanonical Hilbert space $\mathcal{H}_E$ of black hole microstates. The $e^{-S_{BH}}$ exponential suppression is expected from semiclassical gravity [6] and statistical considerations [7]. The last term in the right hand side is currently a vague way of parameterising the fact that, depending on the nature of the correlator, i.e. the number $N$ of insertions and the properties of the individual probe operators $O_i$, such corrections may not be subleading [8,7,5]. Our main goal is to make this statement precise in quantum mechanics.
Even though our analysis is in the full quantum theory, it concerns the robustness of Hawking's EFT framework because whenever thermal answers are not sharply peaked due to large variances, the notion of bulk locality is not expected to hold. We stress that this breakdown of the EFT description is, a priori, in addition to the well established fact that low point correlators, such as the 2-pt function in Hawking's original calculation, must break down at large times, i.e. for times of the order of the black hole evaporation time scale or the Poincaré recurrence time. The latter can be made particularly precise in an AdS/CFT context [6].

Distinguishing quantum states
Consider a finite dimensional subspace $\mathcal{H}_E \subset \mathcal{H}$ of dimension $d_E$ consisting of all pure states $\psi = |\psi\rangle\langle\psi|$ that live in the microcanonical ensemble of energy $[E - \delta E, E + \delta E]$. We will assume the Hamiltonian describing the unitary time evolution of the system has non-degenerate energy gaps. This is a condition on the spectrum stating that the equality $E_k - E_l = E_m - E_n$ can only be satisfied either by $E_k = E_l$ and $E_m = E_n$, or by $E_k = E_m$ and $E_l = E_n$. 3 In this section, we identify the necessary condition for a set of observables $\mathcal{A}$ to distinguish a random pure state $\psi \in \mathcal{H}_E$ from the maximally mixed state in $\mathcal{H}_E$. We argue that an actual quantum mechanical measurement of any such observable requires large resources, i.e. very long times or very high energies. Alternatively, any quantum measurement with finite time and energy resources is equivalent to a coarse-grained observable for which random pure states still appear entangled.

Expectation value of operators
One possibility to quantify the difference between quantum states $\psi \in \mathcal{H}_E$ is to measure the expectation value of some operator $A$. We can study this either at fixed time, by averaging over the entire set of states, or by averaging over time.
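To study these questions, it is instructive to think of $\langle\psi|A|\psi\rangle$ as a random variable $X$ with uniform distribution on $\mathcal{H}_E$ or over the positive real line $\mathbb{R}^+$, respectively [10]. 4 Applying Chebyshev's inequality bounds the probability of a large deviation of $X$ from its mean $\langle X\rangle$ in terms of its variance $\sigma_X^2$,
$$\mathrm{Prob}\left(|X - \langle X\rangle| \geq \epsilon\right) \leq \frac{\sigma_X^2}{\epsilon^2} .$$
Notice that in the time average ensemble the equilibrium configuration $\omega_\psi$ depends explicitly on the initial state $|\psi(0)\rangle$ and is obtained by setting to zero all off-diagonal elements in $|\psi(t)\rangle\langle\psi(t)|$ in the energy eigenbasis. The latter is due to the dephasing occurring when averaging over time.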
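The variance $\sigma_A^2(\psi)$ of this random variable is
$$\sigma_A^2(\psi) = \left\langle \left(\langle\psi(t)|A|\psi(t)\rangle - \mathrm{tr}(A\,\omega_\psi)\right)^2 \right\rangle_t = \sum_{E \neq E'} |c_E|^2\,|c_{E'}|^2\,|\langle E|A|E'\rangle|^2 ,$$
where $c_E = \langle E|\psi(0)\rangle$ and we assumed the Hamiltonian has a non-degenerate energy gap spectrum. Chebyshev's inequality implies
$$\mathrm{Prob}_t\left(|\langle\psi(t)|A|\psi(t)\rangle - \mathrm{tr}(A\,\omega_\psi)| \geq \epsilon\right) \leq \frac{\sigma_A^2(\psi)}{\epsilon^2} .$$

Ensemble vs time averages: The equilibrium configuration $\omega_\psi$ depends on the initial state and one can equally ask how large the probability is for a random equilibrium state to differ from the ensemble average (7). This can be efficiently computed from previous calculations simply by replacing the arbitrary operator $A$ in the ensemble average discussion by $\sum_E |E\rangle\langle E|A|E\rangle\langle E|$. When we do this, the term $\mathrm{tr}(A\,\Omega_E)$ remains invariant, whereas the random expectation value $\langle\psi|A|\psi\rangle$ becomes
$$\sum_E |c_E|^2\,\langle E|A|E\rangle = \mathrm{tr}(A\,\omega_\psi) .$$
Replacing this into (9), we obtain
$$\mathrm{Prob}\left(|\mathrm{tr}(A\,\omega_\psi) - \mathrm{tr}(A\,\Omega_E)| \geq \epsilon\right) \leq \frac{\tilde\sigma_A^2}{\epsilon^2} ,$$
with the new variance $\tilde\sigma_A^2$ given by the variance of $\mathrm{tr}(A\,\omega_\psi)$ over the ensemble of random states $\psi \in \mathcal{H}_E$.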

The use of typicality on expectation values
Our goal is to show the above probabilities are suppressed in the dimensionality $d_E$ of the Hilbert space $\mathcal{H}_E$. In the process, we learn which properties the operator $A$ must satisfy to violate this conclusion.
To study these issues we will apply typicality arguments. These follow from the phenomenon denoted as concentration of measure in the mathematical literature [12]. For our purposes, it is sufficient to point out that on a $d$-dimensional sphere $S^d$ of unit radius almost all points are within geodesic distance $1/\sqrt{d}$ from its equator. More mathematically, given a spherical cap $C(\epsilon)$ located a distance $0 < \epsilon < 1$ from the center of the sphere, the isoperimetric inequality bounds its normalised measure by a quantity that is exponentially small in $d\,\epsilon^2$. It follows from this inequality that the probability for a random point on this sphere to belong to a band $B$ around the equator, excluding the pair of caps $C(\epsilon)$, differs from one only by such exponentially small terms. Our claim follows in the limit of large $d$.

These geometric facts become relevant for the foundations of quantum statistical mechanics when we consider subspaces $\mathcal{H}_E$ of the Hilbert space with a large dimension. Indeed, from (10), a random pure state in $\mathcal{H}_E$ corresponds to a random point on a sphere of dimension $d = 2d_E - 1$ parameterised by the complex vector components $c_E$.
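The concentration of measure statement above can be checked numerically. The following is a minimal sketch, not part of the original argument: it samples uniform points on the unit sphere embedded in $\mathbb{R}^n$ and measures the fraction that falls within a band of fixed half-width around the equator. The function names, the band half-width 0.2 and the sample sizes are illustrative assumptions.

```python
import math
import random

def random_unit_vector(n, rng):
    """Uniform point on the unit sphere in R^n, built from normalised Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def fraction_in_band(n, width, samples=2000, seed=0):
    """Fraction of random points whose height |x_1| above the equator is <= width.

    The 'equator' is taken to be x_1 = 0 (an illustrative choice of axis).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        if abs(random_unit_vector(n, rng)[0]) <= width:
            hits += 1
    return hits / samples

if __name__ == "__main__":
    # The band keeps the same width while the dimension of the sphere grows.
    for n in (3, 30, 300):
        print(n, fraction_in_band(n, width=0.2))
```

As the dimension grows, the band of fixed width is expected to capture almost all of the sampled points, which is the geometric fact used in the typicality arguments.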
The application of these ideas to expectation values, i.e. functions over this sphere, is known as Levy's lemma.
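Lemma 3.1 Levy's lemma: Given a bounded function $f(\psi)$ defined over the set of pure states $\psi \in \mathcal{H}_E$, for any such random state $\psi$ and any $\epsilon > 0$, the probability that $|f(\psi) - \langle f\rangle_\psi| \geq \epsilon$ is bounded above by $c_1 \exp\left(-c_2\, d_E\, \epsilon^2/\lambda^2\right)$, where $\lambda$ is the Lipschitz constant of $f$ and $c_1$, $c_2$ are positive numerical constants. See [12] for a proof.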
Ensemble average: As a first application of Levy's lemma, consider the random variable $X_\psi$. Its uniform probability distribution is precisely the normalised measure over the sphere describing the microstates in our microcanonical ensemble $\mathcal{H}_E$. Thus, expectation values of operators $A$ are functions over this sphere, i.e. $f(\psi) = \langle\psi|A|\psi\rangle$, satisfying $\langle f(\psi)\rangle_\psi = \mathrm{tr}(A\,\Omega_E)$. The Lipschitz constant $\lambda$ controls how probable large deviations are. It was proved in [2] that $\lambda$ is bounded above by a constant multiple of the operator norm $\|A\|$. Thus, unless the largest eigenvalue of $A$ scales like $\sqrt{d_E}$, we can conclude that such probability is exponentially suppressed in $d_E$.
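To illustrate the ensemble average statement, here is a small numerical sketch under stated assumptions: the observable $A$ is taken to be diagonal with eigenvalues $\pm 1$ in alternation (so $\|A\| = 1$ and $\mathrm{tr}(A\,\Omega_E) = 0$ for even dimension), and random pure states are sampled as normalised complex Gaussian vectors. The function names and sample sizes are hypothetical choices made for this example only.

```python
import math
import random

def random_probabilities(d, rng):
    """|c_i|^2 for a random pure state in C^d (normalised complex Gaussian vector)."""
    w = [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 for _ in range(d)]
    total = sum(w)
    return [x / total for x in w]

def expectation_std(d, samples=400, seed=1):
    """Sample standard deviation of <psi|A|psi> for A = diag(+1, -1, +1, ...)."""
    rng = random.Random(seed)
    values = []
    for _ in range(samples):
        p = random_probabilities(d, rng)
        # A is diagonal, so <psi|A|psi> = sum_i a_i |c_i|^2 with a_i = (-1)^i.
        values.append(sum(((-1) ** i) * p_i for i, p_i in enumerate(p)))
    mean = sum(values) / samples
    var = sum((v - mean) ** 2 for v in values) / samples
    return math.sqrt(var)

if __name__ == "__main__":
    for d in (4, 64, 1024):
        print(d, expectation_std(d), 1.0 / math.sqrt(d + 1))
```

For this particular operator the exact variance over the uniform measure is $1/(d_E + 1)$, so the printed spread should track $1/\sqrt{d_E + 1}$ and shrink as the dimension grows.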
Time average: The variance $\sigma_A^2(\psi)$ in (21) is bounded above, according to [10,13], by an expression proportional to $\Delta_A^2\,\mathrm{tr}(\omega_\psi^2)$. This upper bound has two pieces. The quantity $\mathrm{tr}(\omega_\psi^2)$ is the purity of the equilibrium state; it equals $\sum_E |c_E|^4$ and manifestly depends on the state. The quantity $\Delta_A$ is the difference between the largest and the smallest eigenvalues of $A$ restricted to $\mathcal{H}_E$. It manifestly depends on the operator $A$.
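As a hedged illustration of the state-dependent piece of this bound, the sketch below samples a random pure state (normalised complex Gaussian components, an assumption of this example) and evaluates the purity $\sum_E |c_E|^4$ of the corresponding dephased state. The helper name `equilibrium_purity` is hypothetical.

```python
import random

def equilibrium_purity(d, rng):
    """Purity sum_E |c_E|^4 of the dephased state for a random pure state in C^d."""
    # |c_E|^2 from a normalised complex Gaussian vector.
    w = [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 for _ in range(d)]
    total = sum(w)
    return sum((x / total) ** 2 for x in w)

if __name__ == "__main__":
    rng = random.Random(3)
    for d in (10, 100, 10000):
        # For the uniform measure the mean purity is 2 / (d + 1).
        print(d, equilibrium_purity(d, rng), 2.0 / (d + 1))
```

For typical states the purity is of order $1/d_E$, which is what makes the time fluctuations small for operators with a modest spectral range.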
A more sophisticated application of Levy's lemma (19) in [14] allows one to prove Using this result in (13), we learn that the probability of having significant deviations is suppressed We can get some intuition on this result by replacing $\delta a$ with the averaged gap $\overline{\delta a}$ between eigenvalues of the operator $A$. In that case, the quotient $\Delta_A/\overline{\delta a} \sim N(A)$ behaves like the number of different outcomes for the operator $A$, $N(A)$. We shall return to the importance of the magnitude $N(A)$ shortly.
Ensemble vs time averages: We use here the same strategy as for the ensemble average discussion, but replacing the operator $A$ by $\sum_E |E\rangle\langle E|A|E\rangle\langle E|$. Thus, we want to apply Levy's lemma 3.1 directly. To compute an upper bound for the corresponding Lipschitz constant, we first notice that Summing over $E$, we reach the conclusion Thus, the norm of the operator $\sum_E |E\rangle\langle E|A|E\rangle\langle E|$ is bounded above by the norm of the operator $A$, $\|A\|$. This means the probability of large deviations satisfies As before, unless the largest eigenvalue of $A$ scales like $\sqrt{d_E}$, the probability that a random time averaged expectation value significantly differs from the ensemble average is exponentially suppressed in the dimension $d_E$.

This last statement refers to the notion of quantum ergodicity introduced in [15] (see also the more recent discussion [16]). The latter is based on the equality between time and microcanonical averages. Here, we are saying that for almost every pure state, time average and ensemble average expectation values of operators $A$ satisfying $\|A\|^2 \ll d_E$ are equal.

To sum up, the use of typicality allows us to argue that almost every pure state in $\mathcal{H}_E$ behaves like the ensemble average unless the operator we use to probe it satisfies $\|A\| \sim \sqrt{d_E}$. Similarly, the fraction of time a random pure state spends away from an equilibrium state is negligible unless Furthermore, for almost every pure state, time and microcanonical averages agree when $\|A\|^2 \ll d_E$.

Measure of distinguishability
The comparison of expectation values of an observable is not the only way to tell quantum states apart in quantum mechanics. In fact, it is easy to find examples of observables whose measurement can distinguish between different quantum states even if their expectation values are equal. Consider the two spin one states $|0\rangle$ and $\frac{1}{\sqrt{2}}\left(|-1\rangle + |1\rangle\right)$. By construction, both states have vanishing $\sigma_z$ expectation values; however, a measurement of $\sigma_z$ can easily tell the states apart.
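The spin one example can be made concrete with a few lines of code. This is a minimal sketch; the basis ordering $m = -1, 0, +1$ and the helper names are assumptions of the example. It computes the outcome probabilities of a $\sigma_z$ measurement and the first two moments for both states.

```python
import math

# Basis ordering (an illustrative convention): m = -1, 0, +1.
SZ_EIGENVALUES = [-1, 0, 1]

def outcome_probabilities(state):
    """Probabilities of the sigma_z outcomes for a state given by its amplitudes."""
    norm = sum(abs(c) ** 2 for c in state)
    return [abs(c) ** 2 / norm for c in state]

def moment(state, power):
    """Expectation value <psi| sigma_z^power |psi>."""
    probs = outcome_probabilities(state)
    return sum(p * m ** power for p, m in zip(probs, SZ_EIGENVALUES))

state_zero = [0.0, 1.0, 0.0]                                # |0>
state_cat = [1.0 / math.sqrt(2), 0.0, 1.0 / math.sqrt(2)]   # (|-1> + |1>)/sqrt(2)

if __name__ == "__main__":
    for name, state in (("|0>", state_zero), ("(|-1>+|1>)/sqrt2", state_cat)):
        print(name, outcome_probabilities(state), moment(state, 1), moment(state, 2))
```

Both states give a vanishing first moment, while the outcome distributions (and already the second moment) differ, which is why a single expectation value is a poor measure of distinguishability.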
Notice that in this example we could distinguish both states by comparing the expectation value of $\sigma_z^2$. In quantum mechanics, given a state $\psi$ and an observable $A$, the result of its measurement is a set of eigenvalues $a$ appearing with probabilities $p_a$. It is clear that reconstructing the entire probability spectrum $\{p_a\}$ provides more information about the quantum state $\psi$ than simply measuring $\langle\psi|A|\psi\rangle$. In fact, if the observable $A$ has $N(A)$ different eigenvalues, we may need the collection of expectation values $\langle\psi|A^i|\psi\rangle$ for $i = 1, \ldots, N(A)$ to reconstruct such a probability spectrum. This reconstruction problem is called the moment problem.
This discussion motivates the notion of distinguishability introduced in the quantum information literature. The distinguishability of two quantum states $\rho$ and $\sigma$ using a particular observable $A$ is defined as
$$D_A(\rho, \sigma) = \frac{1}{2} \sum_a \left| \langle a|\rho|a\rangle - \langle a|\sigma|a\rangle \right| ,$$
where $|a\rangle$ are the eigenvectors of $A$. 6 Notice that this measure is independent of the absolute value of the eigenvalues $a$ and only depends on the entire probability spectrum $\{p_a\}$. Thus, it is more appropriate as a measure of quantum state distinguishability than any individual expectation value. In particular, it is shown in [17] that the optimal probability of telling $\sigma$ and $\rho$ apart in any measurement is $\frac{1}{2}\left(1 + D(\rho, \sigma)\right)$. This guarantees that if $D(\rho, \sigma)$ is small, no observable can tell $\rho$ and $\sigma$ apart. We stress that the ratio $\sigma_A^2/\|A\|^2$ could go to zero in the limit of large $d_E$ while $D_A(\rho, \Omega_E)$ may not.
This notion can be generalized to any set of observables $\mathcal{A}$, irrespective of whether they commute or not. In particular, if $\mathcal{A}$ includes the entire set of observables in the Hilbert space, one talks about the distinguishability of two quantum states $\rho$ and $\sigma$ as
$$D(\rho, \sigma) = \frac{1}{2}\,\mathrm{tr}\,|\rho - \sigma| ,$$
where it is understood the right hand side equals the maximal difference in probability spectra achieved over the entire set of available observables [17].
We are interested in computing $D_{\mathcal{A}}(\psi(t), \Omega_E)$, that is, the distinguishability between a random pure state $\psi \in \mathcal{H}_E$ and the maximally mixed state $\Omega_E = I_E/d_E$. If one identifies the conditions the observable $A$ must satisfy for $D_A(\psi(t), \Omega_E) \to 0$, one will conclude the random pure state appears entangled from the perspective of the observable $A$. The theorem below summarizes our results.
Theorem 3.2 Given a random pure state $\psi \in \mathcal{H}_E$, its distinguishability $D_{\mathcal{A}}(\psi, \Omega_E)$ from the maximally mixed state $\Omega_E$ using the set of observables $\mathcal{A}$ satisfies for an arbitrary $\epsilon > 0$, where $N(\mathcal{A})$ is the maximum number of outcomes of all measurements in $\mathcal{A}$.
The proof is in appendix B.
Since both statements hold for any $\epsilon > 0$, we learn that the probability for Furthermore, the second part of the theorem proves that typical states remain indistinguishable from equilibrium for almost all times if $N(\mathcal{A}) \ll \sqrt{d_E}$. Notice that both statements hold, in particular, for an individual operator $A$. In that case, the number of different outcomes $N(A)$ refers to that single operator.
Notice theorem 3.2 mathematically characterises the set of operators A for which a typical random pure state ψ ∈ H E appears entangled. We can intuitively understand the condition on N (A) as saying the set of measurements in A is not capable of reaching the resolution required to distinguish microstates.
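The role of the number of outcomes can be illustrated numerically. The sketch below is only an illustration under stated assumptions, not a check of the precise bound of theorem 3.2: it takes the distinguishability of a single observable in the standard form $D_A(\rho, \sigma) = \frac{1}{2}\sum_a |p_a(\rho) - p_a(\sigma)|$, assumes an observable diagonal in the sampling basis with `n_outcomes` equally degenerate eigenvalues, and samples random pure states as normalised complex Gaussian vectors. All function names are hypothetical.

```python
import random

def random_probabilities(d, rng):
    """|c_i|^2 for a random pure state in C^d (normalised complex Gaussian vector)."""
    w = [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 for _ in range(d)]
    total = sum(w)
    return [x / total for x in w]

def distinguishability(d, n_outcomes, rng):
    """D_A(psi, Omega_E) for an observable with n_outcomes equally degenerate eigenvalues.

    Assumes n_outcomes divides d; each eigenvalue then has degeneracy d // n_outcomes,
    so the maximally mixed state assigns probability 1 / n_outcomes to every outcome.
    """
    p = random_probabilities(d, rng)
    block = d // n_outcomes
    total = 0.0
    for a in range(n_outcomes):
        p_a = sum(p[a * block:(a + 1) * block])
        total += abs(p_a - 1.0 / n_outcomes)
    return 0.5 * total

if __name__ == "__main__":
    rng = random.Random(2)
    d = 4096
    for n_outcomes in (2, 64, 4096):
        print(n_outcomes, distinguishability(d, n_outcomes, rng))
```

With only a few outcomes the random pure state is essentially indistinguishable from $\Omega_E$, whereas a fully non-degenerate observable yields an order one distinguishability.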
When the entire set $\mathcal{A}$ has support on a finite subsystem, theorem 3.2 reduces to a rigorous restatement of the well-known result due to Page [18]: the difference between a pure state and a mixed state in a composite Hilbert space is exponentially small unless the number of states being measured is comparable to the square root of the dimensionality of the entire Hilbert space. Consider a composite Hilbert space $\mathcal{H}_S \otimes \mathcal{H}_B$, where $\mathcal{H}_S$ and $\mathcal{H}_B$ are the system and bath Hilbert spaces, respectively. Let $\mathcal{H}_E \subseteq \mathcal{H}_S \otimes \mathcal{H}_B$ be the subspace associated with the microcanonical ensemble as defined previously. The maximally mixed state in $\mathcal{H}_E$ is $\Omega_E = I_E/d_E$, where $I_E$ is the identity matrix on $\mathcal{H}_E$, whereas its restriction to $\mathcal{H}_S$ by tracing out over $\mathcal{H}_B$ defines $\Omega_S = \mathrm{tr}_B(\Omega_E)$. Then, from theorem 3.2 we obtain since the number of outcomes is necessarily bounded by the dimension $d_S$ of the Hilbert space $\mathcal{H}_S$. The last statement is equivalent to the bound found in [2].

7 The theorem is not useful if we consider a set $\mathcal{A}$ containing of the order of $\sqrt{d_E}$ different operators, each with an order one number of different outcomes. But it does not tell us how to implement such measurements in practice.

An equivalent, perhaps more explicit, way of reproducing Page's original discussion is by considering a random quantum state for a system of $N$ spins,
$$|\psi\rangle = \frac{1}{2^{N/2}} \sum_{\sigma_1, \ldots, \sigma_N} e^{i\theta(\sigma_1, \ldots, \sigma_N)}\, |\sigma_1 \ldots \sigma_N\rangle ,$$
where $\theta(\sigma_1, \ldots, \sigma_N)$ is a set of random phases. If only $k$ spins are measured, any observable acting on that subspace $\mathcal{H}_S$ of the Hilbert space will have a number of outcomes bounded above by its dimensionality $d_S = 2^k$. Its expectation values can be evaluated using the density matrix $\rho_k$. Explicit calculation gives [18,4] Thus, $\rho_k$ is well approximated by the maximally entangled mixed state $\Omega_k$ unless $k \sim N/2$, which is equivalent to the requirement above, $d_S^2 \sim 2^N = d_E$. This matches the content of Page's initial result [18], emphasizing the role played by quantum entanglement and concentration of measure [12], which is the mathematical justification behind typicality.
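The spin discussion can be explored numerically. The following sketch makes several assumptions that are not taken from the text: the random state is an equal-magnitude superposition of all spin configurations with independent uniform phases, the measured spins are the first $k$ ones, and, to avoid an eigenvalue computation, the deviation of $\rho_k$ from $\Omega_k$ is quantified by the upper bound $\|\rho_k - \Omega_k\|_1 \leq \sqrt{d_S}\,\|\rho_k - \Omega_k\|_2$ on the trace norm rather than by the trace norm itself. Function names are hypothetical.

```python
import cmath
import math
import random

def reduced_density_matrix(n_spins, k, rng):
    """rho_k for the first k spins of an n_spins random-phase state."""
    d_s, d_b = 2 ** k, 2 ** (n_spins - k)
    amp = 2.0 ** (-n_spins / 2)
    # psi[s][e]: amplitude with the measured spins in configuration s, the rest in e.
    psi = [[amp * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for _ in range(d_b)]
           for _ in range(d_s)]
    # Partial trace over the unmeasured spins.
    return [[sum(psi[s][e] * psi[t][e].conjugate() for e in range(d_b))
             for t in range(d_s)] for s in range(d_s)]

def trace_norm_bound(rho):
    """Upper bound sqrt(d_S) * ||rho - Omega||_2 on the trace norm ||rho - Omega||_1."""
    d_s = len(rho)
    hs_squared = sum(abs(rho[s][t] - (1.0 / d_s if s == t else 0.0)) ** 2
                     for s in range(d_s) for t in range(d_s))
    return math.sqrt(d_s * hs_squared)

if __name__ == "__main__":
    rng = random.Random(5)
    n_spins = 12
    for k in (1, 3, 6):
        print(k, trace_norm_bound(reduced_density_matrix(n_spins, k, rng)))
```

For $k$ well below $N/2$ the bound is small, so $\rho_k$ is close to $\Omega_k$; it becomes of order one around $k \sim N/2$, where the bound stops being informative.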

Coarse-grained observables
Theorem 3.2 is relevant for first principle considerations, but it does not say whether such distinguishing measurements can be performed. For example, one could conceive the existence of operators whose spectra satisfy the requirement N (A) ∼ √ d E but have an averaged eigenvalue difference suppressed in d E itself. Since obtaining information about a system is never for free in quantum mechanics [19], it is an important question to discuss whether such measurements can be implemented in short time in the probe limit.
The uncertainty principle already teaches us that achieving high precision (large N (A)) in a measurement performed in short time requires large energies. In this section we show that any measurement performed with finite time and energy resources implements the notion of a coarse-grained observable [15].
It is instructive to review the standard approach to measurements in field theory. Given an observable φ(x, t), its measurement is described in terms of a semiclassical process in which a classical source J couples to φ(x, t) by turning on an interaction H int = λ φ Jδ(x) at time t, and turning it off after an infinitesimally short amount of time. In order to keep the disturbance minimal (probe limit), one takes the limit J → 0. A semi-classical measurement is not limited in precision, and correlation functions found this way are arbitrarily accurate. However, this treatment only holds approximately as a limit of a quantum process that we now describe.
To describe a quantum measurement, we follow von Neumann [20]. We entangle our physical system with a quantum detector and consider a sharp projection on the compound state after the measurement takes place. More mathematically, assume our initial state is a product state $|\psi\rangle \otimes |\alpha\rangle \in \mathcal{H}_E \otimes \mathcal{H}_A$, where $\mathcal{H}_A$ stands for the apparatus Hilbert space. We turn on an interaction Hamiltonian $H_{int} = \lambda A \otimes J$, where $A$ is the observable we want to measure, and $J$ acts on $\mathcal{H}_A$.
As before, the interaction acts on a time scale much smaller than the inverse energy of the system, such that the evolution due to the physical system Hamiltonian during the measurement can be ignored. The analogue of the probe limit is the constraint that the kick received by the ensemble due to the measurement is small, i.e. $S_{int}/\hbar = \int dt\, H_{int}/\hbar \ll E\tau/\hbar$, where $\tau = O(1)$. Let us expand the time evolution of the initial state in the eigenbasis of $A$, writing $|\psi\rangle = \sum_a \psi_a |a\rangle$:
$$e^{-i\lambda t\, A \otimes J/\hbar}\, |\psi\rangle \otimes |\alpha\rangle = \sum_a \psi_a\, |a\rangle \otimes e^{-i\lambda a t J/\hbar}\, |\alpha\rangle .$$
The measurement will distinguish the outcome $a$ from $a + \delta a$ if the apparatus wavefunctions corresponding to these different eigenvalues become orthogonal at time $t$,
$$\langle\alpha|\, e^{-i\lambda\, \delta a\, t J/\hbar}\, |\alpha\rangle = 0 .$$
This condition relates the amount of time and energy involved in a quantum measurement with resolution $\delta a$ in our observable $A$. It allows us to find a lower bound on the smallest resolvable eigenvalue gap $\delta a$.

Theorem 3.3
The smallest gap $\delta a$ resolvable in a measurement of an observable $A$ in time $t$ using the interaction Hamiltonian $H_{int} = \lambda A \otimes J$ is bounded below by The proof appears in Appendix C, and it is based on the existence of a universal limit on how fast a state can dynamically evolve to an orthogonal state in an isolated quantum mechanical system [21]. Using (39), we can derive the relation Measurements that distinguish microstates, i.e. $N(A) \sim \sqrt{d_E}$, always require a large kick. Thus, they always involve long times or large energy resources. 8

Relation to coarse-grained observables: we want to show that if the measurement action $S_{int}/\hbar$ is finite, i.e. if the amount of time and energy resources are finite, the fine grained observable $A$ effectively behaves as a coarse-grained observable due to the finite precision $\delta a$ achievable by the measurement. Indeed, given a precision $\delta a$, any observable $A$ allows a description in terms of a coarse-grained observable
$$\tilde A = \sum_i \tilde a_i\, \Pi_i .$$
Its eigenvalues $\tilde a_i$ are defined as follows: $\tilde a_0 = n_0\, \delta a$, where $n_0\, \delta a \leq a_{min} < (n_0 + 1)\, \delta a$, whereas $n\, \delta a \leq a_{max} < (n + 1)\, \delta a$. Thus, the number of coarse-grained eigenvalues $N(\tilde A)$ equals $n - n_0 + 1$ and the projectors $\Pi_i$ include all the microscopic eigenvalues $a$ in the macroscopic eigen-band $(n_0 + i)\,\delta a \leq a < (n_0 + i + 1)\,\delta a$. Finite action $S_{int}/\hbar$ requires $\|A\| \sim \delta a$. This is equivalent to $n_0, n \sim O(1)$. Equivalently, even if the observable $A$ has no degenerate eigenvalues, i.e. the number of microscopic outcomes is $d_E$, the number of macroscopic outcomes attainable with finite precision is order one, i.e. the observable $A$ behaves like a coarse-grained observable $\tilde A$. 9
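The coarse-graining construction described above can be sketched in code. This is an illustrative example only: the spectrum (a thousand equally spaced, non-degenerate eigenvalues) and the precision $\delta a = 0.25$ are made-up inputs, and the function names are hypothetical.

```python
import math

def coarse_grain(eigenvalues, delta_a):
    """Group eigenvalues into bands [n*delta_a, (n+1)*delta_a), keyed by the integer n."""
    bands = {}
    for a in eigenvalues:
        n = math.floor(a / delta_a)
        bands.setdefault(n, []).append(a)
    return dict(sorted(bands.items()))

def number_of_coarse_outcomes(eigenvalues, delta_a):
    """N(A~) = n - n_0 + 1, with n_0 and n the band indices of a_min and a_max."""
    n_0 = math.floor(min(eigenvalues) / delta_a)
    n = math.floor(max(eigenvalues) / delta_a)
    return n - n_0 + 1

if __name__ == "__main__":
    # Hypothetical non-degenerate spectrum with d_E = 1000 microscopic outcomes.
    spectrum = [i / 1000 for i in range(1000)]
    delta_a = 0.25
    bands = coarse_grain(spectrum, delta_a)
    print(number_of_coarse_outcomes(spectrum, delta_a))
    # Coarse eigenvalue n * delta_a and the rank of the corresponding projector.
    print({n * delta_a: len(members) for n, members in bands.items()})
```

Each band plays the role of a projector $\Pi_i$ whose rank is the number of microscopic eigenvalues it contains, so a finite precision turns a thousand microscopic outcomes into a handful of macroscopic ones.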

Lessons for Quantum Gravity
In this section we discuss the implications of our results for black hole physics in an AdS/CFT context [23,24]. 10 Even though our comments are far more general, we shall use the language of $\mathcal{N} = 4$ SYM with gauge group $SU(N)$ and its holographic dual, when appropriate. Large AdS$_5$ black holes have both energies and entropies scaling as $N^2$, $S \sim E$. Hence, the dimension of the microcanonical ensemble $d_E$ is exponentially large in the energy of the microstates, i.e. $d_E \sim e^{aE}$, where $a$ is an order one number.

Breaking down of effective field theory
Our analysis finds that any quantum measurement capable of resolving the differences between random pure states and a maximally entangled one in large dimensional Hilbert spaces requires an exponentially large amount of resources, i.e. either energies or times exponential in $N^2$. Clearly, any low energy effective action does not include such degrees of freedom. For example, in AdS/CFT, the supergravity approximation deals with operators with conformal dimension $\Delta \sim O(1)$ (no scaling in $N$), whereas classical supergravity saddle points have energies of order $N^2 \sim \log d_E$. Thus, we must conclude that all low energy observables in such an effective theory can only achieve this task by waiting $\log t = O(N^2)$. 11 This is in agreement with the finding that 2-point correlators of light operators detect deviations from thermality at these time-scales [6].

9 The connection between information loss in black holes and finite resolution of low energy measurements has been previously discussed in [22].

10 Any precise application of our results to any quantum theory of gravity would require extending our analysis not only to infinite dimensional Hilbert spaces but also to unbounded operators. The techniques we used here to prove equilibration using typicality have been generalized to infinite systems in [25]. Even though we believe that an understanding and mathematical formulation of this extension for QFTs and CFTs is of great importance, we shall not pursue this here.
Equivalently, random pure states appear entangled for all semiclassical gravitational probes in time polynomial in $N$. We emphasize this statement excludes those operators built out of a large number of products of light operators. These are heavy and can distinguish microstates. Thus, these operators of the full quantum theory do not belong to the effective theory, i.e. the set of semiclassical operators $\mathcal{A}_{low}$ does not form an algebra. If one considers operators made out of the product of order $N^2$ light observables, one expects perturbation theory to break down due to the large combinatorial factors appearing in Feynman diagrams, similar to the appearance of non-planar effects in non-abelian gauge theories [26]. 12

Both our quantum mechanical results in section 3 and the arguments above suggest that the number of operator insertions, $N$ in equation (5), and their conformal dimensions $\Delta_i = \Delta(O_i)$ play a similar role to the number of outcomes in our quantum mechanics discussion [27,8,4,5,28]. We postpone a precise mathematical formulation of this problem to future work. But we stress this expectation holds in effective field theory. This is because there exists a map between low energy operators and particle excitations in this regime. Thus, the number of operator insertions in a given correlator describes how large a portion of the perturbative Fock space is being explored. In this respect, the argument is analogous to our spin discussion in section 3.3.

Effective emergence of locality
Holography allows us to apply our quantum mechanical results to gravitational physics. In particular, we know the low energy sector of the bulk physics involves classical gravity, where locality is manifest. Our work proves that any operator capable of distinguishing a random pure state from a mixed state does not belong to the low energy effective action. Thus, inverting the logic presented in the last subsection, we can say that the notion of bulk locality emerges when we restrict the entire set of observables to A low .
We finish this work emphasizing that the restriction to the subspace of observables compatible with an effective notion of bulk locality is analogous to the recent proposal by Papadodimas & Raju [28] reconstructing local bulk operators in the interior of a black hole from boundary quantum data, building on the seminal work in [29].

11 Even though it is hard to prove from first principles, it is believed that any high precision measurement, such as the ones we require in our discussion, lies beyond the regime of validity of an effective field theory. We thank Zohar Komargodski for emphasising this to us.

12 Perturbation theory in gravity is an expansion in Newton's constant $G_N$. When computing correlators in black hole microstates, the perturbative expansion is expected to be in $1/S$, where $S = A/(4G_N)$ is the standard Hawking-Bekenstein formula. For correlators involving $m \sim N^2 \sim S$ insertions, one expects such perturbative arguments to break down.

Emergence of bulk locality and relation to Papadodimas & Raju:
According to [29], local bulk operators $\phi_{CFT}(t, \Omega, z)$ in the exterior of a black hole can be constructed as where $(t, \Omega, z)$ are boundary labels which can be interpreted as bulk AdS coordinates, O i ,ω are the Fourier modes of a boundary local operator on the sphere and f ,ω (t, Ω, z) are appropriately chosen functions. Such bulk operators can be constructed order by order in a $1/N$ expansion [30]. Recently, this construction was claimed to be extended to operators probing the interior of a black hole [28]. In this case, the field (42) is replaced by the mode expansion ,ω (t, Ω, z) + Õ i ,ω g ,ω (t, Ω, z) where g (a) ,ω (t, Ω, z), a = 1, 2, can be found in [31]. The expansion (43) involves two sets of operator modes: O i ,ω as before and the mirror operators Õ i ,ω . The latter were defined in [28] as those satisfying where $A_\alpha$ belongs to the subset of local boundary observables belonging to the low energy sector $\mathcal{A}_{low}$. This last condition was more precisely stated in [28] by requiring the functions P (i, ω, ) to satisfy Thus, by construction, $A_\alpha \in \mathcal{A}_{low}$ in our previous discussion. Papadodimas & Raju showed that mirror operators exist and that they depend on the specific state $|\psi\rangle$ on which they act [28]. The defining equation (44) guarantees that boundary correlators are compatible with thermal behaviour. Furthermore, all operators $A_\alpha \in \mathcal{A}_{low}$ satisfy $A_\alpha|\psi\rangle \neq 0$. Thus, states $|\psi\rangle$ do appear entangled for the subset of observables $A_\alpha$ satisfying (46).
The connection with our work is as follows. We characterised the set of operators that do not distinguish random pure states from the maximally entangled state $\Omega_E$ in a microcanonical ensemble $\mathcal{H}_E$. Our analysis was in the entire quantum theory and our only assumption on the spectrum was the existence of non-degenerate gaps to exclude integrable systems. 13 When we embed our discussion into a holographic theory, the latter is believed to have a spectrum of relatively sparse $O(1)$ excitations separated by a large gap from a dense set of heavy states (black hole microstates) [3]. In these theories, it is natural to identify the subset of operators $\mathcal{A}_{low}$ with those satisfying the more precise constraint (46). These satisfy Thus all operators considered in [28] are expected to satisfy (47). These statements can be made more explicit for a system of $N$ spins. If we restrict ourselves to the subset of observables acting on subsystems of $k$ spins, where $k \ll N/2$, we already showed in section 3 that random pure states appear entangled. Thus, in this example it is clear that the set of operators $A_\alpha$ to be considered are those satisfying the property (47).
A further outcome of this discussion is that the construction in [28] may be less state dependent than it appears, since all operators $A_\alpha$ see typical random pure states $|\psi\rangle \in \mathcal{H}_E$ as $\Omega_E$. The latter follows from typicality and, as such, it is necessarily a probabilistic statement at this stage that requires further investigation.
Having discussed the connection between our results and the emergence of locality, we comment below on the implications for complementarity and the existence of a black hole interior. Some of the considerations below are necessarily similar to the ones also discussed in [28].
Complementarity: One of the main conceptual issues in black hole physics regards the compatibility of black hole complementarity [32] with the preservation of bulk locality. One consequence of the former is that the degrees of freedom inside and outside of the black hole are not independent. The possible tension between both concepts can easily be phrased. Low energy bulk observers using an EFT description would expect correlation functions between bulk operators in the interior ($z_{\rm int}$) and exterior ($z_{\rm ext}$) of the black hole to vanish, as in (48). This is because in the EFT approximation, one is doing a quantum field theory calculation in a black hole background and the points $z_{\rm int}$ and $z_{\rm ext}$ are causally disconnected. From the exact quantum theory of gravity perspective, one is probing the operator equation (48) in its heavy sector using the CFT operators (43). That is, one is considering correlators of the operators in (48) evaluated in states $\psi_i \in \mathcal{H}_E$, the black hole microstates. One expects that such correlators are well approximated by the thermal correlator. Consequently, one obtains the vanishing commutator (49). This would seem to violate complementarity given the expected non-independence of interior and exterior black hole degrees of freedom. Our results in sections 3.3 and 3.4 provide mathematical evidence for the existence of a conceptual framework in which both statements can be compatible. From first principles, the commutator (49) is non-vanishing. For the subset of operators in $\mathcal{A}_{\rm low}$ for which an effective local bulk description exists, deviations from thermal behaviour are exponentially suppressed in the entropy of the black hole. Thus, even though these corrections are difficult to measure, particularly so in any semiclassical approximation, they are still responsible for the non-vanishing of the exact quantum gravity answer. For the subset of operators for which no notion of locality exists, there is no reason to expect the commutator to vanish. Thus, there was no tension to begin with.
In quantum mechanics, it is possible to construct commuting coarse-grained operators from microscopic non-commuting ones, starting with the momentum and position operators [15]. This explains why a classical apparatus can measure them simultaneously. What we are saying here is that not only do typical states appear entangled to low energy gravity probes, but these probes may also appear to commute due to their coarse-grained nature, even though this is not true for the exact microscopic operators.
Existence of a geometric black hole interior: There are two main reasons why we associate black holes with thermal states in the AdS/CFT correspondence. First, in its euclidean path integral formulation, the euclidean AdS black hole is the dominant saddle contribution (at large enough temperatures) and matches the CFT expectation values of a thermal density matrix [33]. Second, eternal AdS black holes are believed to be described by the entangled state [6]
\[ |\Psi\rangle = \frac{1}{\sqrt{Z(\beta)}} \sum_i e^{-\beta E_i/2}\, |E_i\rangle_L \otimes |E_i\rangle_R , \]
where $\mathcal{H}_L$ and $\mathcal{H}_R$ stand for the two isomorphic Hilbert spaces defined on each of their two conformal boundaries. An asymptotic observer living on $\mathcal{H}_R$ will only measure observables $O_R$ acting on $\mathcal{H}_R$. By construction, such expectation values are captured by the reduced density matrix obtained by tracing over $\mathcal{H}_L$,
\[ \rho_R = \mathrm{tr}_{\mathcal{H}_L} |\Psi\rangle\langle\Psi| = \frac{1}{Z(\beta)} \sum_i e^{-\beta E_i}\, |E_i\rangle\langle E_i| . \]
Thus, these expectation values are manifestly thermal,
\[ \langle\Psi| O_R |\Psi\rangle = \mathrm{tr}\left(\rho_R\, O_R\right) . \]
One natural question this eternal AdS black hole description raises is: what is the holographic dual of the quantum state $\rho_R$? The reason to ask this question is that there are many different quantum states in $\mathcal{H}_L \otimes \mathcal{H}_R$ giving rise to the same density matrix $\rho_R$ capturing the physics of a single asymptotic observer. This suggests that $\rho_R$ only describes the exterior of the AdS event horizon [34,35,36,37].
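The two facts used above, that tracing over $\mathcal{H}_L$ yields a thermal state and that many states on $\mathcal{H}_L \otimes \mathcal{H}_R$ share the same $\rho_R$, can be checked in a toy model. The sketch below assumes a hypothetical four-level spectrum and the standard thermofield double form of the entangled state; the spectrum, the value of $\beta$ and all function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def thermofield_double(energies, beta):
    # Amplitude matrix psi[i, j] on |E_i>_L |E_j>_R; diagonal for the TFD state.
    weights = np.exp(-beta * np.asarray(energies, dtype=float) / 2)
    weights /= np.linalg.norm(weights)
    return np.diag(weights).astype(complex)

def reduced_right(psi):
    # Trace over L: rho_R[j, j'] = sum_i psi[i, j] * conj(psi[i, j']).
    return psi.T @ psi.conj()

def random_unitary(dim, rng):
    # Q factor of a complex Gaussian matrix is unitary.
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(g)
    return q

energies = np.array([0.0, 0.7, 1.3, 2.1])  # toy spectrum (assumption)
beta = 1.5                                  # toy inverse temperature (assumption)

psi = thermofield_double(energies, beta)
rho_R = reduced_right(psi)

thermal = np.diag(np.exp(-beta * energies))
thermal = thermal / np.trace(thermal)

# A different global state: act with an arbitrary unitary on the L factor only.
rng = np.random.default_rng(0)
psi_rotated = random_unitary(len(energies), rng) @ psi
rho_R_rotated = reduced_right(psi_rotated)

print(np.allclose(rho_R, thermal), np.allclose(rho_R_rotated, thermal))
```

Any unitary acting on $\mathcal{H}_L$ alone changes the global state but leaves $\rho_R$ untouched, which is the ambiguity the question in the text refers to.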
In fact, this is also heuristically suggested by euclidean path integral considerations. The euclidean black hole is a smooth geometry constructed from the lorentzian geometry by cutting at the horizon, removing its interior and gluing it in a smooth way. It is indeed true that such a euclidean saddle knows about the full entropy of the system, but it does not know anything about the specific microstates of the system [38].$^{14}$ This perspective suggests that information about the black hole microstates is "hidden" in the interior, but does such an interior allow for a local bulk description?
The arguments used to reconcile complementarity with EFT locality expectations already suggest the answer depends on the observables one studies. Whenever the effective black hole geometry is reliable, that is, when we probe it with operators $\phi_{\rm CFT} \in \mathcal{A}_{\rm low}$, we cannot distinguish between the black hole and the microstates. There should exist some effective notion of interior. This is made more precise in [28]. If an asymptotic observer attempts to improve on this situation, they must either wait for a long time or consider correlators involving operators not in $\mathcal{A}_{\rm low}$. These operators can carry very large energies, could distinguish among black hole microstates and will generically have large variances in the ensemble of microstates. Thus, there is no well defined notion of locality for these.
Ensemble averages: The most general parameterisation of the ensemble of states is given in (55). There exists a different, commonly used possibility to describe ensemble averages using unitary integrals. In this case, one averages the expectation values of a given operator $O$ over the entire set of states obtained from $\psi$ by a unitary transformation $U$. The averaged expectation value equals
\[ \langle O \rangle_U = \int dU\, \langle\psi| U^\dagger O\, U |\psi\rangle = \mathrm{tr}\left(O\, \Omega_E\right) , \]
where we used the identity [40]
\[ \int dU\, U |\psi\rangle\langle\psi| U^\dagger = \frac{\mathbb{I}_E}{d_E} = \Omega_E . \]
Thus, both ensemble averages are equal and match the microcanonical average by construction.
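The variance is defined as before, but using the unitary integral average. The only non-trivial calculation left is
\[ \int dU\, \langle\psi| U^\dagger O\, U |\psi\rangle^2 = \frac{\mathrm{tr}\left[(O \otimes O)(\mathbb{I}_E \otimes \mathbb{I}_E + S)\right]}{d_E (d_E + 1)} = \frac{(\mathrm{tr}\, O)^2 + \mathrm{tr}\, O^2}{d_E (d_E + 1)} , \]
with all operators restricted to $\mathcal{H}_E$, where we used the identity [40]
\[ \int dU\, (U \otimes U)\left(|\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi|\right)(U^\dagger \otimes U^\dagger) = \frac{\mathbb{I}_E \otimes \mathbb{I}_E + S}{d_E (d_E + 1)} \]
and the definition of the swap operator $S$ as $S|i,j\rangle = |j,i\rangle$. Plugging this back into (66), we reproduce the variance (63) computed above.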
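The unitary-integral computation above can be cross-checked numerically. The sketch below assumes the standard second-moment result for Haar-random pure states in dimension $d$, namely that the average of $|\psi\rangle\langle\psi|^{\otimes 2}$ equals $(\mathbb{I} + S)/(d(d+1))$, and compares the resulting variance with a Monte Carlo estimate. The dimension $d = 4$, the random Hermitian test operator and all names are illustrative assumptions.

```python
import numpy as np

def swap_operator(d):
    # S|i, j> = |j, i> on a d*d dimensional tensor product space.
    S = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            S[i * d + j, j * d + i] = 1.0
    return S

def haar_states(n_samples, d, rng):
    # Rows are normalised complex Gaussian vectors (Haar-random pure states).
    v = rng.normal(size=(n_samples, d)) + 1j * rng.normal(size=(n_samples, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(1)
d = 4  # toy dimension standing in for d_E (assumption)
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
O = (G + G.conj().T) / 2  # random Hermitian test observable

# Second moment from the swap-operator identity, and its closed form.
S = swap_operator(d)
second_moment_exact = np.trace(np.kron(O, O) @ (np.eye(d * d) + S)).real / (d * (d + 1))
closed_form = (np.trace(O).real ** 2 + np.trace(O @ O).real) / (d * (d + 1))

# Monte Carlo estimate of the same ensemble average.
psis = haar_states(200_000, d, rng)
x = np.einsum("ni,ij,nj->n", psis.conj(), O, psis).real
variance_exact = closed_form - (np.trace(O).real / d) ** 2
variance_mc = np.var(x)

print(second_moment_exact, closed_form, variance_exact, variance_mc)
```

The Monte Carlo variance agrees with the closed form up to sampling error, which supports the claim that the unitary average and the state-ensemble average give the same variance.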
Time averages: In this ensemble, we consider the time evolution of the most general pure state $\psi \in \mathcal{H}_E$ in (55), viewing any expectation value $\langle\psi(t)|O|\psi(t)\rangle$ as a random variable with $t$ uniformly distributed over $[0,\infty)$, so that the time averaged expectation value is defined by
\[ \langle O \rangle_t = \lim_{T\to\infty} \frac{1}{T} \int_0^T dt\, \langle\psi(t)|O|\psi(t)\rangle = \mathrm{tr}\left(O\, \langle\rho_\psi(t)\rangle_t\right) , \]
where $\langle\rho_\psi(t)\rangle_t$ is the time average of the density matrix $\rho_\psi(t) = |\psi(t)\rangle\langle\psi(t)|$ constructed out of the initial pure state (69). The variance is defined as in any other ensemble. Since the time averaged density matrix is diagonal in the energy eigenbasis, this variance only has contributions from the off-diagonal matrix elements of the operator $O$ in that basis, so that each term carries a phase with exponent $E_2 - E_1 + E_4 - E_3$. The time average forces this exponent to vanish. It is here that the assumption of the Hamiltonian having non-degenerate energy gaps enters. Assuming the latter and using that $E_1 \neq E_2$ and $E_3 \neq E_4$, we conclude that the only terms surviving the intrinsic dephasing due to the time average are those with $E_1 = E_4$ and $E_2 = E_3$. The variance then reduces to the contribution of these terms alone; in the final expression, the last term was added and subtracted to obtain a more symmetric form.
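The dephasing argument can be made concrete with a small numerical sketch. It assumes a hypothetical four-level spectrum with non-degenerate gaps and uses the exact finite-time average of each phase, $\frac{1}{T}\int_0^T e^{-i\Delta t}\,dt = e^{-i\Delta T/2}\,\sin(\Delta T/2)/(\Delta T/2)$, so that no sampling in time is needed. The spectrum, the coefficients and the function name are illustrative assumptions.

```python
import numpy as np

def finite_time_average_state(coeffs, energies, T):
    # (1/T) * integral_0^T |psi(t)><psi(t)| dt, written in the energy eigenbasis.
    c = np.asarray(coeffs, dtype=complex)
    e = np.asarray(energies, dtype=float)
    gaps = e[:, None] - e[None, :]          # E_m - E_n
    rho0 = np.outer(c, c.conj())            # rho_mn at t = 0
    x = gaps * T
    # np.sinc(y) = sin(pi*y)/(pi*y), so sinc(x/(2*pi)) = sin(x/2)/(x/2).
    phase_avg = np.exp(-0.5j * x) * np.sinc(x / (2 * np.pi))
    return rho0 * phase_avg

energies = np.array([0.0, 0.9, 2.3, 4.1])    # toy spectrum, non-degenerate gaps
coeffs = np.array([0.5, 0.5j, -0.5, 0.5])    # normalised initial state

rng = np.random.default_rng(3)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
O = (G + G.conj().T) / 2                     # random Hermitian observable

avg_state = finite_time_average_state(coeffs, energies, T=1e4)
max_offdiag = np.max(np.abs(avg_state - np.diag(np.diag(avg_state))))

# Prediction of the fully dephased (diagonal) state: sum_n |c_n|^2 O_nn.
diagonal_prediction = float(np.sum(np.abs(coeffs) ** 2 * np.diag(O).real))
time_avg_expectation = np.trace(O @ avg_state).real

print(max_offdiag, diagonal_prediction, time_avg_expectation)
```

For large $T$ the off-diagonal elements decay as $1/(\Delta T)$, so the averaged state becomes diagonal and only the diagonal matrix elements of $O$ contribute to the averaged expectation value.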

B Proof of theorem 3.2
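As with most statements regarding probabilities in Hilbert spaces of large dimensionality, the strategy of the proof relies on the use of Levy's lemma 3.1. First, choose the function $f(\psi)$ in Levy's lemma to be the distinguishability $D_{\mathcal{A}}(\psi, \Omega_E)$ of $\psi \in \mathcal{H}_E$ from the maximally mixed state $\Omega_E = \mathbb{I}_E/d_E$ using the set of observables $\mathcal{A}$. We need to compute the ensemble average $\langle D_{\mathcal{A}}(\psi, \Omega_E)\rangle_\psi$. Due to (28), we can bound this average in terms of the averages $\langle D_A(\psi, \Omega_E)\rangle_\psi$ of the individual observables $A \in \mathcal{A}$.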
This allows us to restrict the application of Levy's lemma to a single observable $A \in \mathcal{A}$. Thus, we need to compute $\langle D_A(\psi, \Omega_E)\rangle_\psi$ and its associated Lipschitz constant $\lambda$.
Since $D_A(\psi, \Omega_E) \leq 1$, because each individual state probability is bounded by unity, we conclude using (20) that $\lambda \leq 2$. Next, we find an upper bound on $\langle D_A(\psi, \Omega_E)\rangle_\psi$, where the observable $A \in \mathcal{A}$ is described by POVM elements $M_a$.
Here $M^E_a = \Pi_E M_a \Pi_E$ is the restriction of $M_a$ to $\mathcal{H}_E$, we used the Cauchy-Schwarz inequality in the third line and used (64) to write $\langle\psi\rangle_\psi = \Omega_E$.$^{15}$ Using (63) together with $\mathrm{tr}\left((M^E_a)^2\right) \leq \|M_a\|^2\, \mathrm{tr}(\Pi_E) \leq d_E$, one obtains a bound on $\langle D_A(\psi, \Omega_E)\rangle_\psi$ controlled by $N(A)$, the total number of outcomes, i.e. the range of the sum over $a$. Inserting this bound in (75), we find a bound controlled by $N(\mathcal{A})$, the total number of different outcomes in the full set $\mathcal{A}$. Inserting this bound in Levy's lemma allows us to prove the first of the probability statements in theorem 3.2.
The second part of theorem 3.2 is proved by repeating the argument above with the time-averaged distinguishability as the function of $\psi$ in Levy's lemma, and replacing $M_a$ by $e^{iHt} M_a e^{-iHt}$. This finishes the proof of indistinguishability from equilibrium for almost all states at almost all times.
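The scaling behind theorem 3.2 can be illustrated with a small simulation. The sketch below makes several assumptions that are ours rather than the paper's: the distinguishability of a single measurement is taken to be $D_A(\psi, \Omega_E) = \frac{1}{2}\sum_a |\mathrm{tr}(M_a \psi) - \mathrm{tr}(M_a \Omega_E)|$, the measurement is projective with $N$ equal-rank outcomes, and the comparison bound $N/(2\sqrt{d_E + 1})$ is what Cauchy-Schwarz combined with the Haar variance gives in this setting, which may differ by constants from the bound in the proof.

```python
import numpy as np

def haar_states(n_samples, d, rng):
    # Rows are normalised complex Gaussian vectors (Haar-random pure states).
    v = rng.normal(size=(n_samples, d)) + 1j * rng.normal(size=(n_samples, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def distinguishability(psis, n_outcomes):
    # Projective measurement whose outcome a projects onto the a-th block of
    # d / n_outcomes basis vectors; Omega_E assigns probability 1/n_outcomes.
    n_samples, d = psis.shape
    probs = (np.abs(psis) ** 2).reshape(n_samples, n_outcomes, d // n_outcomes).sum(axis=2)
    return 0.5 * np.abs(probs - 1.0 / n_outcomes).sum(axis=1)

rng = np.random.default_rng(2)
d_E = 1024  # toy microcanonical dimension (assumption)
psis = haar_states(500, d_E, rng)

n_coarse = 4
coarse = distinguishability(psis, n_coarse).mean()   # few outcomes
fine = distinguishability(psis, d_E).mean()          # one outcome per basis state
coarse_bound = n_coarse / (2 * np.sqrt(d_E + 1))

print(f"coarse: {coarse:.4f} (bound {coarse_bound:.4f}), fine: {fine:.4f}")
```

In this toy setting a measurement with few outcomes cannot tell a random pure state from $\Omega_E$, while a maximally fine-grained measurement distinguishes them with order-one probability.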

C Proof of theorem 3.3
Let us expand the initial state of the apparatus $|\alpha\rangle = \sum_j \alpha_j |j\rangle$ in the eigenbasis $|j\rangle \in \mathcal{H}_A$ of the operator $J$ appearing in the interaction Hamiltonian $H_{\rm int} = \lambda A \otimes J$. The proof first consists in establishing a bound involving the real and imaginary components of $S(t) = \langle\alpha| e^{-i\lambda(\delta a) J t} |\alpha\rangle$.