Observations of Hawking radiation: the Page curve and baby universes

We reformulate recent insights into black hole information in a manner emphasizing operationally-defined notions of entropy, Lorentz-signature descriptions, and asymptotically flat spacetimes. With the help of replica wormholes, we find that experiments of asymptotic observers are consistent with black holes as unitary quantum systems, with density of states given by the Bekenstein-Hawking formula. However, this comes at the cost of superselection sectors associated with the state of baby universes. Spacetimes studied by Polchinski and Strominger in 1994 provide a simple illustration of the associated concepts and techniques, and we argue them to be a natural late-time extrapolation of replica wormholes. The work aims to be self-contained and, in particular, to be accessible to readers who have not yet mastered earlier formulations of the ideas above.

The bulk of the recent discussions have been couched in terms of Euclidean path integrals. Indeed, even [19,24] which discussed the effect of replica wormholes in Lorentz signature did so by studying Euclidean signature replica wormholes, using them to compute entropies as functions of Euclidean coordinates, and analytically continuing the results to real times. But it is clearly of interest to understand an intrinsically Lorentz-signature description, especially since topology change is generally incompatible with having a smooth Lorentz-signature metric.
In addition, the recent discussions also rely heavily on AdS/CFT duality or related concepts. This was true even for the asymptotically flat analyses of [25][26][27][28] in which arguments were made by analogy with AdS/CFT. But reliance on AdS/CFT presents difficulties as the physics of spacetime wormholes raises the so-called 'factorization problem' that calls into question the standard interpretation of AdS/CFT. As a result, questions have been raised [29] as to what physics is really being studied.
Our goal here is to reformulate the recent progress in a manner that i) focusses on operationally defined quantities (the outcomes of 'experiments' performed by asymptotic observers), ii) can be stated and analyzed entirely in Lorentz signature, and iii) emphasizes that the physics described follows directly from having a low energy gravitational path integral that sums over topologies. While we take the inclusion of this sum over topologies as a fundamental assumption in this work, there will be no explicit input from string theory, holography (AdS/CFT), or any other UV theory of gravity. To underline the last point, we will work entirely with asymptotically flat spacetimes (though analogous statements apply directly to the asymptotically AdS case as well). As a result, AdS/CFT is mentioned only briefly in tangential comments.
Nevertheless, the interpretation of the black hole's Bekenstein-Hawking entropy ($S_{\rm BH} = \frac{A}{4G}$ + corrections) as a density of states will be a common touchpoint throughout our discussion. We do not take this to be a fundamental assumption, but rather a hypothesis to be constantly tested and explored. Indeed, while to some this interpretation will seem natural due to the success of classical black hole thermodynamics (or perhaps even required by this success; see e.g. [30]), it also flies in the face of physics associated with quantum field theory on an evaporating black hole background and perturbative quantum gravity (see e.g. [31][32][33][34][35]).
In particular, perturbative quantum gravity would suggest that Hawking radiation is essentially thermal, which is in tension with the statistical interpretation that $S_{\rm BH}$ counts black hole states. Under the standard laws of quantum mechanics, the density of states is an upper bound on the entanglement of any system. Since Hawking evaporation causes $S_{\rm BH}$ to decrease over time, the above interpretation thus would appear to force the von Neumann entropy of Hawking radiation to become small in later stages of the evaporation. As described by Page [36], it would then be natural to expect the von Neumann entropy of radiation from a black hole that forms from rapid collapse to begin at a small value, increase while thermal radiation is produced, but then to 'turn over' and decrease once it comes close to saturating this bound, requiring deviations from exact thermality.
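As a rough guide to the shape that Page's argument suggests, one can sketch the expected entropy as the smaller of two competing quantities. The symbols $S_{\rm rad}(u)$ (the von Neumann entropy of the radiation emitted up to retarded time $u$) and $S_{\rm Hawking}(u)$ (the entropy that would result if the radiation were exactly thermal) are shorthand introduced here for illustration. The expression is a heuristic sketch, not a derived result.

```latex
S_{\rm rad}(u) \;\lesssim\; \min\bigl\{\, S_{\rm Hawking}(u),\; S_{\rm BH}(u) \,\bigr\}
```

The first entry grows as thermal radiation is emitted, while the second shrinks as the black hole evaporates. The 'turn over' is expected near the time at which the two become comparable.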
JHEP04(2021)272
The resulting 'Page curve' is shown in figure 1. It will feature many times in our discussion below, again as a touchpoint to be compared with various calculations. In particular, the downward sloping part of the Page curve requires information inside the black hole to be returned to the external universe. The literature on black hole information often describes this as a result of requiring 'unitarity'. But, as noted above, there are rather more assumptions involved than just strict unitary evolution of the full quantum gravity system. In the present work we will thus instead use the term 'Bekenstein-Hawking unitarity' (or BH unitarity) to refer to this suite of ideas, which we summarize as follows:
Bekenstein-Hawking unitarity: in order to describe measurements of distant observers, black holes can be modelled as a quantum system with density of states $e^{S_{\rm BH}}$ whose evolution is unitary (up to possible interactions with other quantum systems).
We emphasize that our definition of BH unitarity is operational, referring to observations. In contrast, as we discuss further below, the von Neumann entropy of Hawking radiation is not a directly observable quantity; rather, it can only be inferred indirectly from other measurements. Looking ahead, this will be important for our conclusions since it allows BH unitarity to be satisfied despite the fact that the von Neumann entropy may not, strictly speaking, follow the Page curve in figure 1. This discussion has strong overlap with those of [11][12][13][14].
We will see below that many of the concepts and techniques related to Lorentz-signature spacetime wormholes, baby universes, and the like are well-illustrated by spacetimes described by Polchinski and Strominger in 1994 [14], which we dub 'PS wormholes'. Indeed, while PS wormholes are not under semi-classical control, and while analyzing them in isolation leads to apparent violations of BH unitarity [14], we will argue them to be a natural late-time extrapolation of the replica wormholes that were shown in [18,19] to reproduce the Page curve. Since this extrapolation turns out to lead to several simplifications, we will devote significant time to discussing PS wormholes in an effort to make our treatment as explicit as possible.
Indeed, a final goal of this work is to make the manuscript below accessible to those who have not yet mastered the above references. Rather than review those works in detail, we instead return to the logical beginning and start in section 2 with a brief review of the Hawking effect in a fixed black hole background, but emphasizing both the path integral approach and the in-in formalism that will be useful in later parts of this work. While none of this material is new, it differs sufficiently from the most common treatments in the literature to merit a brief review here. We then use this perspective to discuss the inclusion of semiclassical quantum gravity and perturbative back-reaction in section 3. This sets up the standard challenge for BH unitarity associated with apparent large deviations from the Page curve, which is often called 'the black hole information problem' [34,37,38,39].
The following sections resolve this problem by identifying new saddles for the gravitational path integral. Some possible effects of new saddles, and especially on measurements of entropy by asymptotic observers, are illustrated in section 4 through the study of PS wormholes. Although the inclusion of PS wormholes requires assumptions about physics beyond semiclassical control, it provides a simple introduction to ideas that will be of use later in this work. A key such point is that spacetime wormholes lead to correlations between the outcomes of what might at first appear to be completely independent experiments. We also discuss challenges for BH unitarity raised by PS wormholes alone, setting the stage to introduce and include replica wormholes in section 5. Doing so resolves the PS challenges and reproduces the Page curve using calculations that are fully under semiclassical control. We will also see that PS wormholes are a natural late-time extrapolation of replica wormholes.
It then remains to provide a Hilbert space description of the physics of the Page curve, and to characterise the correlations arising from replica wormholes. This is done in section 6 by slicing open the above path integrals. We find a 'baby universe' Hilbert space of intermediate states which defines superselection sectors associated with the values of asymptotic quantities. As a result, it leads to an ensemble description of the theory from the viewpoint of asymptotic observers. Again, the PS wormholes provide a simple illustration. Section 7 concludes with a summary and discussion of open issues.

Hawking radiation and the path integral
This section contains a schematic overview of Hawking's original calculation [40] of the production of radiation using linear quantum fields in a fixed classical spacetime, without back-reaction or evaporation. We also recall how this calculation can be reformulated in terms of a path integral, and how the path integral can be used to compute the Rényi entropies of the Hawking radiation. This review lays the groundwork for the semiclassical quantum gravity discussions in section 3. In keeping with the general philosophy of this paper, we will emphasise the computation of observables accessible to an asymptotic observer. Readers seeking a more thorough review of the Hawking effect in a fixed background should consult appendix A, the original work [40], or pedagogical introductions such as [41] or [38].

Hawking's Heisenberg picture calculation
The original argument of [40] considered a black hole with a single asymptotic region that forms from collapse of matter in an asymptotically flat space. For simplicity we consider a spherically symmetric collapse of uncharged matter so that the final black hole is Schwarzschild. A conformal diagram for such a spacetime is shown in figure 2(a) below. For further simplicity, we follow [40] in taking the quantum fields to be massless so that their initial data is specified at past null infinity $\mathcal{I}^-$. There the spacetime is completely flat, and the state $|\psi\rangle$ of the quantum fields is taken to coincide with the Minkowski vacuum on $\mathcal{I}^-$.
We are interested in the predictions of observations made at future null infinity $\mathcal{I}^+$. In particular, we would like to compute the expectation values $\langle\psi|O(\mathcal{I}^+)|\psi\rangle$ of operators $O(\mathcal{I}^+)$ defined at $\mathcal{I}^+$. Following [40], we work in the Heisenberg picture. We thus evolve the operators $O(\mathcal{I}^+)$ backwards in time to write them in terms of operators at $\mathcal{I}^-$. Since the Hilbert space at $\mathcal{I}^+$ can be described as a Fock space of 'out' scattering states, we can build all operators at $\mathcal{I}^+$ from creation and annihilation operators $a^\dagger_m(\mathcal{I}^+)$, $a_m(\mathcal{I}^+)$, labelled by some complete orthonormal set of modes indexed by $m$. Using the Heisenberg evolution back to $\mathcal{I}^-$, we can write $a_m(\mathcal{I}^+)$ in terms of corresponding operators $a^\dagger_n(\mathcal{I}^-)$, $a_n(\mathcal{I}^-)$ acting on the Fock space of 'in' states, and similarly for $a^\dagger_m(\mathcal{I}^+)$. Since we took the initial state at $\mathcal{I}^-$ to be the vacuum $|0\rangle_{\mathcal{I}^-}$ annihilated by all $a_n(\mathcal{I}^-)$, this rewriting allows us to compute all observables at $\mathcal{I}^+$.
Figure 2. In the shaded region of (b) near $\mathcal{I}^+$, our spacetime is nearly stationary. There, the backwards-propagation reduces to scattering in a fixed potential and results in a transmitted part $T$ and a reflected part $R$. For modes localized at late (retarded) times on $\mathcal{I}^+$, the reflected part $R$ will remain in the nearly stationary region and the transmitted part $T$ will be localized very close to $H^+$. In particular, the wavelength of $T$ becomes very short in the reference frame shown. This allows us to complete the backwards-propagation of $T$ from the near-horizon region to $\mathcal{I}^-$ using geometric optics.
For a free quantum field theory, the relationship between creation and annihilation operators at $\mathcal{I}^+$ and those at $\mathcal{I}^-$ is linear. The Heisenberg evolution is thus given by a Bogoliubov transformation
$a_m(\mathcal{I}^+) = \sum_n \left[ \alpha_{mn}\, a_n(\mathcal{I}^-) + \beta_{mn}\, a^\dagger_n(\mathcal{I}^-) \right]$. (2.1)

Black holes radiate as a simple consequence of the fact that $\beta_{mn}$ is nonzero, so the outgoing occupation numbers are positive despite choosing an ingoing vacuum. At least for operators associated with field modes $m$ that are localized at late retarded times (large affine parameter $u$ along $\mathcal{I}^+$), it is straightforward to compute the Bogoliubov transformation (2.1) using two facts. The first is that, in the region close to $\mathcal{I}^+$, the spacetime is well-approximated by that of a stationary black hole. Mode propagation in this region thus reduces to solving a Schrödinger-type problem. The second important fact is that, once the mode is propagated backward into the near-horizon region, it becomes localized very close to the horizon. In particular, as a result of the second property we may use the WKB approximation to justify either the use of geometric optics in further propagating the mode back to $\mathcal{I}^-$ [40], or the use of the adiabatic approximation to evaluate correlators without explicitly completing the backwards propagation to $\mathcal{I}^-$ [41][42][43][44]. These features are illustrated in figure 2(b). When combined, they establish the familiar result that the occupation numbers $N_m(\mathcal{I}^+)$ of such late-time modes are thermally distributed, with grey-body factors appropriate to the black hole. Interactions do not change this qualitative picture. The details of this argument are not relevant to our presentation below, but we include a brief summary in appendix A for readers wishing to review them. Readers seeking a more thorough discussion should consult the original paper [40] or reviews such as [38,41].
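For concreteness, the statement that nonzero $\beta_{mn}$ implies radiation can be written out as follows. This is the standard free-field result, sketched under the assumption that the modes are normalised so that the operators at both $\mathcal{I}^+$ and $\mathcal{I}^-$ obey canonical bosonic commutation relations. The symbols $\omega$ (mode frequency), $\kappa$ (surface gravity), $\Gamma_m$ (grey-body factor), $T_H$ (Hawking temperature) and $M$ (black hole mass) are introduced here for illustration.

```latex
% Occupation number in the in-vacuum, and the constraint from [a_m, a_{m'}^\dagger] = \delta_{mm'}
{}_{\mathcal{I}^-}\langle 0|\, a^\dagger_m(\mathcal{I}^+)\, a_m(\mathcal{I}^+) \,|0\rangle_{\mathcal{I}^-}
   = \sum_n |\beta_{mn}|^2 ,
\qquad
\sum_n \bigl( \alpha_{mn}\alpha^*_{m'n} - \beta_{mn}\beta^*_{m'n} \bigr) = \delta_{mm'} .

% Late-time thermal form (Schwarzschild, units with \hbar = c = k_B = 1)
\langle N_m(\mathcal{I}^+) \rangle \;\approx\; \frac{\Gamma_m}{e^{2\pi\omega/\kappa} - 1} ,
\qquad
T_H = \frac{\kappa}{2\pi} = \frac{1}{8\pi G M} .
```

The first relation makes explicit why the occupation numbers vanish only if every $\beta_{mn}$ does.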
In the above discussion we formulated Hawking's calculation as the computation of expectation values of all possible operators on $\mathcal{I}^+$. This is equivalent to describing the state of quantum fields on $\mathcal{I}^+$. Indeed, one way to define the density matrix of a region is as the linear functional that maps operators on that region to their expectation values. Connecting to the usual Hilbert space language, there is a unique $\rho$ such that this functional acts as $O \to \mathrm{Tr}(\rho O)$. We can recover matrix elements $\rho_{ij}$ of $\rho$ explicitly from expectation values by choosing $O = |j\rangle\langle i|$, where the states $|i\rangle$, $|j\rangle$ are chosen from a complete basis of pure states on $\mathcal{I}^+$.
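The reconstruction of $\rho_{ij}$ from expectation values can be checked in a finite-dimensional toy setting. The following is a minimal sketch using only the Python standard library; the single-qubit state `rho` and the helper names are illustrative assumptions, not anything taken from the text.

```python
# Minimal sketch: recover matrix elements rho_ij from expectation values of
# the operators O = |j><i|, using rho_ij = Tr(rho |j><i|).
# The 2x2 state below is a hypothetical example chosen for illustration.

def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

def ket_bra(j, i, dim):
    """Matrix of the operator |j><i| in the chosen orthonormal basis."""
    return [[1 if (r == j and c == i) else 0 for c in range(dim)]
            for r in range(dim)]

def expectation(rho, op):
    """The linear functional O -> Tr(rho O)."""
    return trace(matmul(rho, op))

# Hypothetical mixed state of a single qubit (Hermitian, unit trace).
rho = [[0.75, 0.25j],
       [-0.25j, 0.25]]

dim = len(rho)
# Entry [i][j] is the expectation value of |j><i|, which equals rho_ij.
reconstructed = [[expectation(rho, ket_bra(j, i, dim)) for j in range(dim)]
                 for i in range(dim)]
```

Here `reconstructed` agrees with `rho` entry by entry, which is the finite-dimensional version of the statement that the expectation values of all operators determine the state.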
Famously, despite choosing a pure state on $\mathcal{I}^-$, the state $\rho$ on $\mathcal{I}^+$ is not pure; that is, it cannot be written as $|\psi\rangle\langle\psi|$ for any $|\psi\rangle$. This impurity arises for the simple reason that $\mathcal{I}^+$ is not a Cauchy surface, as Cauchy surfaces must reach the regular origin shown as a vertical black line in figures 2(a), 2(b). Equivalently, while we can perform Heisenberg evolution of operators from $\mathcal{I}^+$ back to $\mathcal{I}^-$, we cannot do the reverse, since the operator resulting from forward evolution will have support on the black hole interior.

Path integral version
We now recall how the computation outlined in section 2.1 can be formulated as a path integral over quantum fields. In this description, the actual computation of the effect is somewhat more cumbersome. However, as we will see in the remaining sections below, the path integral framework allows us to straightforwardly incorporate both perturbative back-reaction and certain non-perturbative quantum gravity effects.

In our experience, most textbook treatments of path integrals work in the Schrödinger picture and emphasize the so-called 'in-out' formulation. In particular, the latter is naturally associated with computations of transition amplitudes. However, since our discussion will continue to emphasize expectation values, we will instead focus on the 'in-in' formulation of path integrals below. We will also continue to use the Heisenberg picture as in section 2.1 above. Both choices will simplify the discussion of various issues in the sections that follow. But the departure from standard textbook treatments suggests that we proceed slowly for the moment. We will thus first review various general features of in-in Heisenberg-picture path integrals in section 2.2.1 before returning to Hawking emission in section 2.2.2.

Path integral preliminaries
Before turning to expectation values, we begin by considering the path integral between initial and final Cauchy surfaces $\Sigma_\pm$. We use $\phi$ to denote the set of local bulk fields over which we integrate. The corresponding Heisenberg-picture operators $\hat\phi$ are defined by insertions of the field $\phi$ (or more general functionals of $\phi$) into the path integral. We first consider a path integral with boundary conditions specifying that the fields on $\Sigma_\pm$ take definite values $\phi_\pm$. These boundary conditions correspond to eigenstates of the field operators on $\Sigma_\pm$ with eigenvalues $\phi_\pm$, and this path integral computes the inner product
${}_+\langle\phi_+|\phi_-\rangle_- = \int_{\phi|_{\Sigma_\pm} = \phi_\pm} \mathcal{D}\phi \; e^{iI[\phi]}$. (2.3)
There is of course a choice of phases to be made in defining such eigenstates, and this choice is associated with the choice of possible boundary terms in the path integral action $I[\phi]$ (and with the fact that such boundary terms can change under canonical transformations). In addition, it can be difficult to keep track of normalisations in the path integral, so we should ultimately consider normalization-independent ratios.
Since $|\phi_\pm\rangle_\pm$ are defined as eigenstates of different sets of field operators, on $\Sigma_+$ or on $\Sigma_-$, they give different bases for the Hilbert space. The inner products ${}_+\langle\phi_+|\phi_-\rangle_-$ give the change of basis matrix. These may be thought of as the matrix elements of the time-evolution operator $U = \mathcal{P}\exp\left(-i\int dt\, H(t)\right)$ with a time-dependent Hamiltonian $H(t)$, so we will loosely use $U$ to indicate the path integral (2.3).
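Given an operator $\hat O_+$ defined in terms of fields on $\Sigma_+$, we can describe its Heisenberg evolution back to $\Sigma_-$ by computing its matrix elements ${}_-\langle\phi^{(2)}_-|\hat O_+|\phi^{(1)}_-\rangle_-$ between the pair of field eigenstates $|\phi^{(1)}_-\rangle_-$, $|\phi^{(2)}_-\rangle_-$. To do this, we can insert a complete set of states $|\phi_+\rangle_+$ on which $\hat O_+$ takes definite values $O_+(\phi_+)$. This leaves us to compute the two overlaps ${}_+\langle\phi_+|\phi^{(1)}_-\rangle_-$ and ${}_-\langle\phi^{(2)}_-|\phi_+\rangle_+$ before integrating over $\phi_+$. Since there are two such overlaps to compute, we have a doubled set of fields $\phi^{(1,2)}$ in the path integral, though these sets must be identified at $\Sigma_+$:
${}_-\langle\phi^{(2)}_-|\hat O_+|\phi^{(1)}_-\rangle_- = \int \mathcal{D}\phi^{(1)}\, \mathcal{D}\phi^{(2)}\; O_+(\phi_+)\; e^{iI[\phi^{(1)}] - iI[\phi^{(2)}]}$, with $\phi^{(1)}|_{\Sigma_+} = \phi^{(2)}|_{\Sigma_+} = \phi_+$ and $\phi^{(1,2)}|_{\Sigma_-} = \phi^{(1,2)}_-$.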

Figure 3. A path integral that computes the matrix elements ${}_-\langle\phi^{(2)}_-|\hat O_+|\phi^{(1)}_-\rangle_-$. The right copy of the spacetime contains fields $\phi^{(1)}$ and is weighted by $e^{iI[\phi^{(1)}]}$, while the left copy contains fields $\phi^{(2)}$ and is weighted by $e^{-iI[\phi^{(2)}]}$ (or more generally by the CPT conjugate of the action on the left copy). This conjugation is associated with the fact that the initial conditions for the right copy (fixing the field on $\Sigma_-$) are defined by the ket-state $|\phi^{(1)}_-\rangle_-$, whereas those for the left copy are defined by a bra-state. The two copies are identified along $\Sigma_+$.
We may equivalently think of doubling not the fields on a given spacetime, but the spacetime itself. The doubled spacetime then has two branches which are glued to each other on $\Sigma_+$; see figure 3. This perspective becomes particularly natural once we incorporate quantum gravity effects, since the geometry can fluctuate independently on each branch of the spacetime. The first branch (which provides a home for the field $\phi^{(1)}$) begins at the initial 'ket' state $|\phi^{(1)}_-\rangle_-$ and describes a forward time-evolution computing $U$. We then insert the operator $\hat O_+$ before passing to the second branch of the spacetime. The field $\phi^{(2)}$ lives on this second branch, and the associated path integral computes the backward evolution $U^\dagger$. The combination of these gives the familiar Heisenberg evolution of the operator. The distinction between forward and backward evolution is implemented in the path integral by the relative sign between $I[\phi^{(1)}]$ and $I[\phi^{(2)}]$ or, more generally, by CPT conjugation, which may also act nontrivially on fields.
Because our quantum field theory is unitary, if we happen to consider a trivial operator for which $O_+(\phi_+)$ is independent of $\phi_+$ then the backwards and forwards evolutions will cancel. In that case the result is clearly independent of the choice of slice $\Sigma_+$ on which the two spacetime branches are joined. More generally, so long as we interpret $O_+(\phi_+)$ as being evaluated on one of the two branches, we may choose the two spacetime branches to be glued along an arbitrary Cauchy surface $\Sigma$, as long as the support of $\hat O_+$ lies in the past of $\Sigma$. This slicing-independence will prove useful in our discussions below.
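The cancellation for a trivial operator can be made explicit. The following is a sketch in the loose operator notation used above, with the delta functional understood up to the normalisation ambiguities already mentioned; the symbol $\mathbb{1}$ for the identity operator is introduced here for illustration.

```latex
{}_-\langle \phi^{(2)}_- |\, U^\dagger\, \mathbb{1}\, U \,| \phi^{(1)}_- \rangle_-
  \;=\; {}_-\langle \phi^{(2)}_- | \phi^{(1)}_- \rangle_-
  \;=\; \delta\bigl[ \phi^{(1)}_- - \phi^{(2)}_- \bigr]
```

The same identity $U^\dagger U = \mathbb{1}$, applied only to the portion of the evolution lying to the future of the operator insertion, is what allows the gluing surface to be moved.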
The eigenstates $|\phi_-\rangle_-$ of field configurations on the initial slice $\Sigma_-$ are typically not of direct physical interest. But other boundary conditions can be described by integrating over field configurations on $\Sigma_-$ with some choice of weighting. This corresponds to allowing a general state, written as a superposition of eigenstates $|\phi_-\rangle_-$ as defined by its wavefunction. Now, since it is usually inconvenient to specify states of interest through their explicit wavefunction, we may instead choose to describe them by introducing further path integrals. For example, in our Hawking effect problem, we might specify the initial Minkowski vacuum state $|0\rangle_{\mathcal{I}^-}$ by inserting a path integral over semi-infinite flat Euclidean space and connecting it to the real Lorentz-signature path integral computing $U$.
Figure 4. The in-in (Schwinger-Keldysh) contour in the complex time-plane that computes the expectation value of $O_+$ at Lorentzian time $t$ in the vacuum state $|0\rangle$. The contour begins at negative infinite Euclidean time and follows the Euclidean axis to the origin. This part of the contour computes $|0\rangle$ in terms of fields at $t = 0$. The contour then proceeds along the Lorentzian axis (this part of the contour corresponding to the right spacetime of figure 3) until $O_+$ is inserted at $t$, whence it returns to the origin (the left spacetime of figure 3). Finally, it proceeds from the origin to positive infinite Euclidean time to compute $\langle 0|$. For clarity, the various parts of the contour have been slightly displaced from the axes in the figure.
We can now assemble these ingredients: an initial 'ket' state prepared (perhaps) by a Euclidean path integral, a Lorentzian path integral performing forward time evolution, insertion of the operator of interest, backward time evolution, and finally the preparation of the initial 'bra' state. The resulting spacetime on which we perform the path integral (see figure 4) is the 'in-in' or Schwinger-Keldysh contour, and encodes the natural formulation of dynamics when we do not wish to specify a final state [46][47][48].
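Assembled into a single expression, the ingredients listed above can be sketched as follows. The vacuum wavefunctional $\Psi_0$ is a symbol introduced here for illustration (it stands for the result of the Euclidean preparation), and the two field copies are understood to be identified on the gluing surface as described earlier. The ratio removes the overall normalisation.

```latex
\langle \hat O_+ \rangle
  = \frac{\langle 0|\, U^\dagger\, \hat O_+\, U \,|0\rangle}{\langle 0|0\rangle}
  = \frac{\displaystyle \int \mathcal{D}\phi^{(1)}\, \mathcal{D}\phi^{(2)}\;
          \Psi_0\bigl[\phi^{(1)}_-\bigr]\, \Psi_0^*\bigl[\phi^{(2)}_-\bigr]\;
          O_+(\phi_+)\; e^{\,iI[\phi^{(1)}] - iI[\phi^{(2)}]}}
         {\displaystyle \int \mathcal{D}\phi^{(1)}\, \mathcal{D}\phi^{(2)}\;
          \Psi_0\bigl[\phi^{(1)}_-\bigr]\, \Psi_0^*\bigl[\phi^{(2)}_-\bigr]\;
          e^{\,iI[\phi^{(1)}] - iI[\phi^{(2)}]}}
```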

The in-in formulation of Hawking emission
Let us now apply the above general description of quantum fields in curved spacetime to the problem at hand. The resulting path integral is shown in figure 5(a). We are interested in the expectation values of an operator $O_+$ located on $\mathcal{I}^+$, so we should take our future boundary $\Sigma_+$ to lie in the far future and to coincide with $\mathcal{I}^+$ in the region where $O_+$ is supported. Away from our operator insertion, we are free to extend $\Sigma_+$ to a complete Cauchy surface in any way we please. Furthermore, the slicing independence described above guarantees the final result to be independent of such choices. The path integral is then performed on two copies of the spacetime, but only in the region to the past of the Cauchy surface $\Sigma_+$. These two copies are identified along $\Sigma_+$, where we also insert a weighting corresponding to our operator $O_+$. Since these insertions are restricted to $\mathcal{I}^+$, the identification effectively performs a partial trace over the interior part of $\Sigma_+$.

Figure 5. (a) The path integral which computes the expectation value of an operator $O_+$ on $\mathcal{I}^+$. The right and left copies of the spacetime perform forward and backward time-evolution respectively. They are glued together along a Cauchy surface $\Sigma_+$, which must coincide with $\mathcal{I}^+$ in the region where $O_+$ is supported (denoted by the black blob) but which is otherwise arbitrary. The region to the future of $\Sigma_+$ is not part of the spacetime on which our path integral is performed. (b) The path integral which computes matrix elements of the density matrix on $\mathcal{I}^+$. Along the two copies of $\mathcal{I}^+$, we impose boundary conditions which weight field configurations according to the wavefunctions of states $|i\rangle_{\mathcal{I}^+}$, $|j\rangle_{\mathcal{I}^+}$. If $\mathcal{I}^+$ were a Cauchy surface, this would cause the path integral to fall into two disconnected pieces, indicating that the state is pure. Here, this does not happen since the two branches remain joined along $\Sigma_{\rm int}$, which is a Cauchy surface for the black hole interior.
As discussed at the end of section 2.1, computing expectation values of all operators on $\mathcal{I}^+$ is equivalent to describing the state there. In particular, we can compute components $\rho_{ij}$ of the density matrix on $\mathcal{I}^+$ by choosing our operator $O_+$ to be $|j\rangle_{\mathcal{I}^+}\,{}_{\mathcal{I}^+}\langle i|$ for pure states $|i\rangle_{\mathcal{I}^+}$, $|j\rangle_{\mathcal{I}^+}$ on $\mathcal{I}^+$. We depict this in figure 5(b). This operator insertion corresponds to a boundary condition that weights field configurations on the two branches of the Schwinger-Keldysh contour independently, so the branches are no longer meaningfully joined along $\mathcal{I}^+$; in operator terms, this says that our $O_+$ has rank one. If $\mathcal{I}^+$ were a Cauchy surface, then this boundary condition would cause the path integral to split into two disconnected pieces. Our $\rho_{ij}$ would then become a product of (conjugate) functions of $i$ and $j$ alone, and hence a rank one matrix describing a pure state. However, this does not occur because any Cauchy surface $\Sigma_+$ must include a piece $\Sigma_{\rm int}$ covering the interior of the black hole as well as a piece running along $\mathcal{I}^+$. The two branches of the contour remain connected through $\Sigma_{\rm int}$, and the state on $\mathcal{I}^+$ is mixed. This joining of the two branches is the path integral implementation of what is often called 'tracing out' the interior state living on $\Sigma_{\rm int}$.
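The effect of 'tracing out' an interior factor can be illustrated in a two-qubit toy model. The sketch below uses only the Python standard library and is not a model of the black hole itself: the first index of `psi` plays the role of the exterior (the analogue of $\mathcal{I}^+$), the second the interior (the analogue of $\Sigma_{\rm int}$), and all names and states are illustrative assumptions.

```python
import math

def reduced_density_matrix(psi):
    """Trace out the second ('interior') index of a pure state psi[a][b].

    Returns rho[a][a2] = sum_b psi[a][b] * conj(psi[a2][b]).
    """
    dim_out, dim_in = len(psi), len(psi[0])
    return [[sum(psi[a][b] * complex(psi[a2][b]).conjugate()
                 for b in range(dim_in))
             for a2 in range(dim_out)]
            for a in range(dim_out)]

def purity(rho):
    """Tr(rho^2); equals 1 exactly when a normalised rho is pure."""
    n = len(rho)
    return sum(rho[i][k] * rho[k][i] for i in range(n) for k in range(n)).real

theta = math.pi / 4
# Entangled state cos(theta)|00> + sin(theta)|11>: exterior and interior correlated.
entangled = [[math.cos(theta), 0.0],
             [0.0, math.sin(theta)]]
# Product state |0>|0>: the wavefunction factorises between the two factors.
product = [[1.0, 0.0],
           [0.0, 0.0]]

purity_entangled = purity(reduced_density_matrix(entangled))
purity_product = purity(reduced_density_matrix(product))
print(purity_entangled, purity_product)
```

For the factorised state the reduced matrix has rank one and unit purity, while for the entangled state the purity drops below one, mirroring the distinction between a disconnected and a connected pair of branches.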
In practice the simplest way to evaluate the above path integrals may well be to relate them to the Heisenberg-picture computation of section 2.1 and to use the results computed there. Nevertheless, the formulation in terms of the path integrals of figure 5 will prove useful in our quantum gravity discussions below.

Entropies from the Hawking path integral
We now conclude our review of the Hawking effect on a fixed background with a discussion of entropies. The main point will be to review how path integrals may be used to study the Rényi entropies of subsets of the Hawking radiation at $\mathcal{I}^+$, quantifying the tension between the original Hawking calculation and BH unitarity via the Page curve in figure 1.
First, we must slightly generalize the above discussion to compute the density matrix $\rho_u$ associated not with the entirety of $\mathcal{I}^+$, but only with the Hawking radiation that reaches the subset $\mathcal{I}_u \subset \mathcal{I}^+$ of points at retarded times $u' < u$. To compute matrix elements of $\rho_u$, we simply modify the discussion above as depicted in figure 6. On $\mathcal{I}_u$, we fix boundary conditions according to the desired matrix elements. We then join the two branches of the path integral along a partial Cauchy surface $\Sigma_u$ that reaches $\mathcal{I}^+$ at $u$ (rather than joining them on some $\Sigma_{\rm int}$ that reaches $\mathcal{I}^+$ only at its future endpoint $i^+$).
Next recall that we are interested in Rényi entropies. The $n$th Rényi entropy of a density matrix $\rho$ is defined by
$S_n(\rho) = \frac{1}{1-n} \log \frac{\mathrm{Tr}(\rho^n)}{(\mathrm{Tr}\,\rho)^n}$,
where we have allowed for the possibility that $\rho$ has not yet been normalised (i.e., that it may not have unit trace). As noted above, in path integral constructions it is typically simpler to work with unnormalized states than to keep track of all normalizations.
To compute $\mathrm{Tr}(\rho(u)^n)$ from the path integral, we start with $n$ copies of the spacetime depicted in figure 6 to construct $n$ replicas of $\rho(u)$. We then sew these replicas together as instructed by the matrix products and trace in $\mathrm{Tr}(\rho^n)$. Specifically, the 'ket' boundary labelled by the state $|j_r\rangle_u$ on the $r$th replica becomes identified with the 'bra' boundary labelled by ${}_u\langle i_{r+1}|$ on the $(r+1)$th replica since the insertion of complete sets of states amounts to setting $i_{r+1} = j_r$ and then summing over a complete set of such wavefunctions. The trace completes this pattern cyclically. The result is shown in figure 7 for the case $n = 2$. It is often of interest to compute (or to imagine computing) the Rényi entropies for all integers $n \geq 2$, studying an appropriate analytic continuation (see footnote 3), and taking the limit $n \to 1$ which defines the von Neumann entropy $S(\rho)$.
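In a finite-dimensional setting, the cyclic sewing of replicas is just repeated matrix multiplication followed by a trace. The following minimal sketch (standard library only, real symmetric matrices for simplicity) evaluates the Rényi entropy of a possibly unnormalised matrix for integer $n \geq 2$; the function names and the example matrices are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

def renyi_entropy(rho, n):
    """S_n = log[ Tr(rho^n) / (Tr rho)^n ] / (1 - n) for integer n >= 2.

    rho is a real symmetric, positive matrix that need not have unit trace.
    """
    if n < 2:
        raise ValueError("this sketch only handles integer n >= 2")
    sewn = rho
    # Each product identifies the 'ket' index of one replica with the
    # 'bra' index of the next; the final trace closes the cycle.
    for _ in range(n - 1):
        sewn = matmul(sewn, rho)
    return math.log(trace(sewn) / trace(rho) ** n) / (1 - n)

# Hypothetical examples: an unnormalised maximally mixed qubit and an
# unnormalised pure state.
rho_mixed = [[3.0, 0.0], [0.0, 3.0]]
rho_pure = [[2.0, 0.0], [0.0, 0.0]]

s2_mixed = renyi_entropy(rho_mixed, 2)
s3_mixed = renyi_entropy(rho_mixed, 3)
s2_pure = renyi_entropy(rho_pure, 2)
```

Dividing by $(\mathrm{Tr}\,\rho)^n$ makes the result insensitive to the overall normalisation, so the maximally mixed example gives $\log 2$ for every $n$ and the pure example gives zero.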
For any $n$, the resulting Rényi entropy will be infinite due to high-frequency modes at the 'entangling surface' where $\Sigma_u$ meets $\mathcal{I}^+$ at retarded time $u$. This divergence is local at the entangling surface and is state-independent, so it is not related to the physics of interest (in particular, it is independent of $u$). We will subsequently assume that some regulator has been chosen, for example subtraction of the Minkowski vacuum result, and implicitly discuss the resulting finite quantity throughout.
Footnote 3: An analytic function is uniquely determined by its values on nonnegative integers under certain growth conditions. Specifically, by Carlson's theorem it is sufficient that $f$ is analytic on the right half-plane, of exponential type (there are real constants $C$, $\tau$ such that $|f(z)| \leq C e^{\tau|z|}$ for all $z$ in the half-plane) and there exists $c < \pi$ such that $|f(iy)| \leq C e^{c|y|}$ for all real $y$. For systems with finite-dimensional Hilbert spaces, the Rényis always satisfy such conditions. In practice, the same seems to hold for physically interesting states on infinite-dimensional Hilbert spaces.
Figure 6. The path integral on this geometry computes matrix elements ${}_u\langle j|\rho_u|i\rangle_u$ of the density matrix $\rho_u$ describing Hawking radiation in the piece $\mathcal{I}_u$ of $\mathcal{I}^+$ before retarded time $u$. Two copies of the original black hole spacetime have been glued together along a surface $\Sigma_u$, which defines a Cauchy surface when joined to $\mathcal{I}_u$. We impose boundary conditions on $\mathcal{I}_u$ corresponding to the states $|i\rangle_u$, $|j\rangle_u$.
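The role of the bound $c < \pi$ in Carlson's theorem can be seen from a standard counterexample, sketched here as an illustration rather than as part of the original argument.

```latex
f(z) = \sin(\pi z) :
\qquad
f(n) = 0 \ \ \text{for all integers } n ,
\qquad
|f(iy)| = \sinh(\pi |y|) \;\leq\; \tfrac{1}{2}\, e^{\pi |y|} .
```

This function is entire and of exponential type, yet it vanishes on all integers without vanishing identically. It only satisfies the growth bound on the imaginary axis with $c = \pi$, so uniqueness of the continuation genuinely requires the strict inequality.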

Figure 7. To compute $\mathrm{Tr}(\rho(u)^2)$, we perform the path integral on the geometry built from two replicas of figure 6, identified as shown.
Since Hawking radiation rapidly becomes thermal at $\mathcal{I}^+$, after some brief transient behavior any correlation functions on $\mathcal{I}^+$ decay rapidly when clusters of points are separated by more than a thermal retarded time. As a result, over large stretches of time the density matrix on $\mathcal{I}^+$ may be thought of as a tensor product of thermal density matrices (with appropriate grey-body factors) associated with smaller pieces of $\mathcal{I}^+$. Consequently, all Rényi entropies $S_n$ and the von Neumann entropy $S$ will increase linearly with $u$ at large $u$. As noted in the introduction, this behavior is inconsistent with BH unitarity, which would require $S$ to be bounded by the Bekenstein-Hawking entropy $S_{\rm BH}$ defined by the Bondi mass at each retarded time $u$.
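The step from an approximate tensor product to linear growth rests on the additivity of Rényi entropies over tensor factors. In the sketch below, $N(u)$ (the number of roughly independent intervals of retarded time before $u$), $\rho_k$ (the state on the $k$th interval) and $s_n$ (the Rényi entropy per interval on the fixed background) are symbols introduced here for illustration.

```latex
\rho_u \;\approx\; \bigotimes_{k=1}^{N(u)} \rho_k
\quad\Longrightarrow\quad
S_n(\rho_u) \;\approx\; \sum_{k=1}^{N(u)} S_n(\rho_k) \;\approx\; N(u)\, s_n ,
\qquad N(u) \propto u .
```

Additivity follows because $\mathrm{Tr}\bigl[(\rho_1 \otimes \rho_2)^n\bigr] = \mathrm{Tr}(\rho_1^n)\, \mathrm{Tr}(\rho_2^n)$, so the logarithm in the definition of $S_n$ turns the product into a sum.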

Semiclassical path integrals and back-reaction
For our review of Hawking's calculation in section 2, we treated spacetime as a background field with a fixed nondynamical metric, and we integrated only over matter fields. We now wish to incorporate gravitational dynamics by integrating over metrics. Of course, outside of simple toy models it will be difficult to perform (or even define) the gravitational path integral exactly. Instead, we will treat the path integral as a weak-coupling expansion in a nonlinear effective theory. In practice, this means that we look for saddle-point configurations for which the classical (or, perhaps, the quantum-corrected) effective action is stationary under variations of the metric and other fields, and we then integrate over fluctuations around these saddles.
We will thus need to specify boundary conditions for the metric. It is natural to impose boundary conditions in asymptotic regions of spacetime where gravity becomes weak, in analogy with scattering problems in quantum field theory. We will integrate over asymptotically flat metrics, and choose in-states and out-states for gravitons (along with matter fields) on $\mathcal{I}^\pm$. Alternatively, following the review of section 2, we may use boundary conditions that do not completely specify a final state, and we may instead compute an asymptotic observable using an in-in formalism. In either case, we specify the metric and states only in the asymptotic region. We will place no restrictions on the metric deep in the spacetime interior. We will thus include contributions from any saddle-point metric matching the specified asymptotics. In particular, we allow all spacetime topologies.
To describe perturbative quantum effects, it will be convenient for us to treat the metric separately from matter fields, and begin by 'integrating out' the matter. For a given spacetime with metric g, we can use ideas reviewed in section 2 to perform the matter path integral as a QFT on the fixed background, which we can write as a quantum effective action: $e^{i I_{\rm eff}[g]} := \int \mathcal{D}\phi \, e^{i I_{\rm matter}[\phi, g]}$.
To incorporate perturbative effects from the fluctuations of the metric itself, such as black hole evaporation by emission of gravitons, this 'matter' effective action should also incorporate a one-loop effective action from integrating out linearised metric perturbations; see e.g. [49,50]. A saddle-point in the integral over metrics g is then a stationary point of the combined gravitational (Einstein-Hilbert) action and matter effective action I EH [g]+I eff [g].

Incorporating back-reaction
We now have everything we need to begin making predictions using semiclassical gravity. We first adapt the calculations of section 2.2 to incorporate a dynamical metric, preparing an initial state of matter at I − to form a black hole, and asking for the expectation value of some observable at I + . The relevant boundary conditions are similar to the situation pictured in figure 5(a), with the two branches of the in-in contour joined at a future boundary. But thus far the metric has been specified only asymptotically at I ± , and in the interior we sum over allowed possible metrics. As already noted above, in practice this means that we will proceed by studying saddle points, where here we explicitly mean saddle points of I EH [g] + I eff [g].

JHEP04(2021)272
Finding saddles can be construed as solving the associated equations of motion. However, one should realize that this is not a standard Cauchy evolution problem for two reasons. The first is that the quantum-corrected effective action is generally non-local. The second is that we impose boundary conditions at both copies of I − and also at both copies of I + , rather than imposing two conditions (on fields and on their derivatives) on a single Cauchy slice. As a result, there can be multiple saddles that contribute to a given path integral, and it can be challenging to determine whether one has in fact found all of the relevant ones. One is thus often left with simply searching for saddles and seeing what physics they entail. If one later finds additional saddles, one will need to correct the original calculation to take the new saddles into account.
It is natural to begin by assuming quantum effects to be small and treating I eff as a small correction to I EH [g]. In particular, the latter includes a factor of the inverse Newton constant 1/G, and is thus very large in the semiclassical gravity limit G → 0. The most obvious saddle for our path integrals is thus given by starting with the classical collapsing black hole solution that was used as a fixed background in section 2.1 and including perturbative corrections from I eff . Note that the variation of I eff with respect to the metric is precisely the expectation value of the stress tensor of the quantum matter fields 4 over which we have already integrated in the initial state $|0\rangle_{\mathcal{I}^-}$, up to effects associated with post-selection when the state is also (partially) specified at I + . So this indeed incorporates back-reaction from the Hawking radiation described earlier. We shall focus on this saddle below, turning to other possible saddles only in sections 4 and 5.
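As a sketch of what stationarity of I EH [g] + I eff [g] means in practice, the saddle-point condition can be written as a semiclassical Einstein equation. The normalisations below are common conventions assumed here for illustration (Einstein-Hilbert action with prefactor 1/16πG, vanishing cosmological constant, boundary terms suppressed); they are not fixed by the text, and post-selection at I + would replace the expectation value by a suitably weighted matrix element.

```latex
% Assumed conventions: I_EH = (1/16 \pi G) \int d^Dx \sqrt{-g} R  (+ boundary terms),
% and the stress tensor defined by variation of the matter effective action.
\begin{align}
  \langle T_{ab} \rangle_g
    &:= -\frac{2}{\sqrt{-g}}\,\frac{\delta I_{\rm eff}[g]}{\delta g^{ab}} , \\
  \frac{\delta}{\delta g^{ab}}\Bigl( I_{\rm EH}[g] + I_{\rm eff}[g] \Bigr) = 0
    \quad &\Longrightarrow \quad
  G_{ab}[g] = 8\pi G \,\langle T_{ab} \rangle_g .
\end{align}
```

Because I EH scales as 1/G while I eff does not, the right-hand side is a small source at weak coupling, which is why the classical collapse solution is a good starting point for perturbation theory.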
Let us begin by ignoring post-selection at I + , so that back-reaction is precisely given by the expected stress-energy tensor in the state $|0\rangle_{\mathcal{I}^-}$. As is well known, this tensor carries a flux of positive energy to infinity and a flux of negative energy into the black hole. The flux is small, so significant changes to the background occur only when they can build up over long times, or over large affine parameters. Now, in the original classical solution of figure 2(a), the only null geodesics that extend to infinite affine parameters toward the future are those that lie entirely outside the event horizon. As a result, any additional null geodesic that extends to large affine parameter must be confined to the region close to the original event horizon. We thus conclude that there is a large region inside the black hole where perturbative corrections give little change in the physics, and where the spacetime continues to collapse at least until such time as the curvatures become large (which presumably means Planck scale). For simplicity, we will continue to call this large-curvature region a singularity and to indicate it by a jagged line on spacetime diagrams. This is consistent with our current ignorance and lack of control over Planck-scale physics, though we do not rule out the possibility that a better description may become available in the future.
On the other hand, back-reaction can be significant when one follows a null geodesic that lies just inside the event horizon of the original background. Congruences of such geodesics can be studied using the Raychaudhuri equation (see e.g. [51]). In particular, while they begin with a slight negative expansion, if this initial negative value is sufficiently small (i.e., for congruences close enough to the event horizon of the original background) the incoming flux of negative energy causes the expansion to evolve through zero and to eventually become positive. This indicates that such congruences in fact escape to I + . Taking a one-parameter family of such congruences and using the cuts on which the expansions vanish to define an apparent horizon, the fact that each successive congruence must begin with a more and more negative expansion means that this apparent horizon must shrink. And again, this description must continue to hold until the curvature becomes Planck scale, at which point the apparent horizon is also correspondingly small. We denote this locus E and refer to it as the 'endpoint' of Hawking evaporation in the expectation that little more of interest can happen after this point. 5 We will idealize E as a codimension-2 surface, though in reality it describes a region of small but finite size. We define the 'evaporation time' u E to be the retarded time of the past boundary of E ; that is, the time at which Planckian curvatures are first visible asymptotically.
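For orientation, the mechanism can be sketched with the null Raychaudhuri equation in its standard form. The symbols below (affine parameter λ, null tangent k^a, spacetime dimension D) are notation introduced here for illustration and do not appear in the text; the second line assumes the semiclassical Einstein equation with the normalisation G_ab = 8πG⟨T_ab⟩.

```latex
% Null Raychaudhuri equation for a hypersurface-orthogonal (twist-free) congruence
% with affine parameter \lambda and tangent k^a, in D spacetime dimensions.
\begin{align}
  \frac{d\theta}{d\lambda}
    &= -\frac{\theta^2}{D-2} - \sigma_{ab}\sigma^{ab} - R_{ab}k^a k^b , \\
  R_{ab}k^a k^b &= 8\pi G\, \langle T_{ab}\rangle k^a k^b
    \qquad (\text{since } g_{ab}k^a k^b = 0) .
\end{align}
% In spherical symmetry the shear of a radial congruence vanishes, so for an
% ingoing negative-energy flux, \langle T_{ab}\rangle k^a k^b < 0:
\begin{equation}
  \frac{d\theta}{d\lambda} > 0
  \quad\Longleftrightarrow\quad
  8\pi G\,\bigl|\langle T_{ab}\rangle k^a k^b\bigr| > \frac{\theta^2}{D-2} .
\end{equation}
```

The inequality shows why only congruences with sufficiently small initial |θ| are turned around: the negative-energy source is of order G, so it can dominate the θ² term only close to the original event horizon.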
Without a better understanding of Planck scale physics, it is impossible to say whether and how the singularity and E influence other parts of the spacetime. But there is a unique perturbatively-semiclassical evolution in regions of spacetime from which they are causally separated, and of course also in the region to the past of the singularity and E . This region of semiclassical control is shown in figure 8. It is not geodesically complete, and does not contain a complete I + . Instead, it has a future boundary defined by the singularity, E , and (using our spherical symmetry to rule out caustics and the like) the outgoing null congruence N E from E (dotted line in figure 8) at retarded time u E . However, it can be used to study black hole evaporation so long as we do not ask about what occurs beyond N E .
In particular, let us now use the spacetime of figure 8 to construct back-reacted saddles for the density matrix ρ(u) on a region I u ⊂ I + that is expected to be under semiclassical control. We thus wish to find a back-reacted analogue of figure 6. The one issue we must consider is post-selection at I u , as this can modify the stress-energy fluxes to I + and across H + . However, as typical states at I + have stress-energy fluxes close to the mean, such effects are typically small. And even when they are large, they make little impact on qualitative features of figure 8.
We may thus construct a saddle for ρ(u) in direct analogy with figure 6, and in particular by sewing two copies of figure 8 to each other along a partial Cauchy surface Σ u that runs from the regular origin at the center of the collapsing matter to retarded time u at I + as shown in figure 9. The only difference from working on a fixed background is that gravity dynamically determines the spacetime away from the boundaries. The contribution of this saddle to the path integral is independent of the choice of Σ u , since the phases in the classical action from the two branches of the contour cancel, and the matter evolves unitarily on a fixed background. We see that the entire calculation is under semiclassical control and makes no reference to strong curvature regions.

Figure 8. The region under semiclassical control in an evaporating black hole spacetime. In the far past, the diagram coincides with 2(a) and the black hole forms from collapse of matter. This region is bounded by the jagged line (called the 'singularity'), its endpoint E , and the outgoing null congruence N E (dotted line). Planck scale physics becomes important at the singularity and E , and may influence further evolution of the spacetime. The event horizon H + (dashed line) is defined to be the boundary of the past domain of dependence of the singularity and E , and we refer to this past domain of dependence as the black hole interior.

Figure 9. The saddle-point spacetime for computing the density matrix of an evaporating black hole. The future of the blue slice Σ u where the identification occurs is not part of the configuration, so the spacetime is weakly curved everywhere, and in particular excludes the singularity.

For future reference, and because it involves essentially the same physics as Hawking's original calculation [40], we refer to the density matrix defined by saddles of the form shown in figure 9 as the Hawking density matrix ρ Hawking (u); this definition is equation (3.2). Because back-reaction is small, ρ Hawking (u) is essentially a thermal state with a temperature that varies slowly with retarded time u.
Since the predictions for any experiment are encoded in the density matrix, we see that perturbatively-semiclassical gravity suffices to make probabilistic predictions for any measurement of the Hawking radiation that avoids particularly late retarded times (at which the black hole has become Planck scale). Since these predictions are encoded in the highly mixed and quasi-thermal density matrix ρ Hawking (u), they violate BH unitarity and indicate the black hole density of states to be unrelated to the Bekenstein-Hawking entropy. Indeed, by the usual argument that starting with an arbitrarily large black hole leads to arbitrarily large entropy on I u even when the Bondi mass at u is held fixed, it suggests the actual black hole density of states to be infinite.
However, with access only to the Hawking radiation produced in a single black hole evaporation, we cannot operationally verify that the state is mixed. It turns out that this critical fact provides interesting room for further physics. The remainder of this paper is largely devoted to this point. In order to describe such possibilities without yet delving into the technical complications of replica wormholes, and to make connections with the historical literature, section 4 will use the crutch of making assumptions about physics that is beyond semiclassical control. But we will see in sections 5 and 6 that this crutch can be discarded, and that semiclassical gravitational physics does predict physics consistent with BH unitarity.

Entropy measurements and potential new saddles
Our calculation in the previous section has led us to suspect that the Hawking radiation is in a highly mixed state on I u , which in particular violates BH unitarity. Continuing with our philosophy of concentrating on the predictions for asymptotic observers, we might like to imagine performing an experiment to directly verify such violations. But this is impossible without access to several copies of the state. Indeed, as an immediate consequence of the familiar fact that a mixed state is equivalent to an ensemble of pure states, no measurement on a single copy can help us to distinguish a mixed state from an unknown pure state.
We must therefore form several black holes, taking care to prepare them in identical initial states, and collect their decay products. We end up with n sets of Hawking radiation, presumably in n identical copies of the same state since they were all prepared in the same way. With n identical copies in hand, it is a straightforward task to test whether a state is pure or highly mixed. For example, one may use the swap test of [52,53] which we will describe below. Now, what does semiclassical gravity predict for the state ρ (n) (u) of our n sets of radiation on I u ? At first sight, this may appear to be a frivolous question; surely it is trivially n copies of the result already obtained in (3.2),
$\rho^{(n)}(u) = \rho_{\rm Hawking}(u)^{\otimes n}$. (4.1)
However, as observed by Polchinski and Strominger [14], this conclusion is too hasty. While it is true that our saddle-point computation of ρ (1) ≈ ρ Hawking immediately leads to a saddle that would give (4.1), considering n copies of the state together turns out to allow potential new saddle points. 6

The purpose of this section is to describe path integrals that predict experimental measurements of entropy and to connect them with the potential new saddles discussed in [14]. Before doing so, we will admit to the reader that the potential new saddles advocated in [14] involve physics that is not under semiclassical control. It is thus important that they will not form the basis of any analyses in section 5 or 6, or for the final conclusions of this work. We nevertheless review this proposal here for three other reasons. The first is that it serves as a pedagogical tool to explain the idea of new saddles without yet delving into the technical complications of replica wormholes. The second is that this helps to place recent developments in an appropriate historical context, as the proposal of [14] turns out to have many similarities to the replica wormholes of section 5. And the third is that it suggests some of the physics that may in fact lie behind the semiclassical replica wormholes of section 5.
We thus dedicate section 4.1 to reviewing the proposal of [14], recasting the discussion in terms of experimental measurements at infinity. This is followed by a short aside in section 4.2, which describes how the black hole information problem is related to the lack of factorization of quantum gravity amplitudes. Experiments that involve only some I u ⊂ I + are introduced in section 4.3, and section 4.4 then describes shortcomings of the Polchinski-Strominger proposal, all of which will be resolved by replica wormholes in section 5.
Before diving in, we should remark that the Polchinski-Strominger work [14] was largely described in terms of two-dimensional models of gravity inspired by analogy with the string worldsheet. We interpret their proposal more broadly, applying it to more general theories of gravity in any dimension. In particular, much of [14] was concerned with the physics of the endpoint of evaporation E , the details of which will be unimportant for our considerations.

Polchinski and Strominger's proposal
To understand how considering n > 1 black holes can lead to new saddles, let us first construct the boundary conditions appropriate for such multi-black-hole experiments. For the purposes of the current section, we take our experimenter to collect all of the Hawking radiation emitted to I + for all times, deferring discussion of subsets I u to section 4.3. This will necessarily involve making assumptions about physics that is not under semiclassical control. 7 We will treat each black hole as if it is formed and decays in its own separate asymptotic region. As a result, our boundary conditions will be precisely n copies of the boundary conditions of figure 9 in the limit u → ∞ or, equivalently, extended from I u to all of I + . Placing each black hole in its own asymptotic region is a convenient abstraction, though the conclusions should be equally valid for n black holes in a single asymptotic region, so long as we prepare black holes which are sufficiently well-separated in time or space. 8 The boundary conditions for computing the components of the n-evaporation density matrix ρ (n) are thus n copies of those of figure 9, extended to all of I + .

7 As described in section 4.3, in the Polchinski-Strominger context this issue will not be resolved just by considering the subsets I u . But replica wormholes will offer a resolution in section 5.

8 This can be thought of as a version of the cluster decomposition principle.

Figure 10. An extension of the spacetime of figure 8 to larger u under the PS assumption. We have a complete I + with Minkowski-like future timelike infinity i + , and may treat I + ∪ Σ int as a Cauchy surface whenever Σ int is Cauchy in the black hole interior.
For n = 1, we expect a saddle given by extending figure 9 to u = ∞. As noted above, this extension must involve assumptions about effects in the strong curvature region. Roughly speaking, our interpretation of the assumption of [14] is that the black hole evaporates completely, but that information in the black hole interior does not emerge at I + . Indeed, Polchinski and Strominger describe information reaching the singularity as being transferred to a 'baby universe' that branches off from the parent universe and does not return, a perspective which we will explore further in section 6. For our purposes, we can cleanly state the required assumption as follows: 9

PS assumption: the extension of any evaporating black hole spacetime beyond the region of semiclassical control shown in figure 8 is such that (1) the spacetime is empty near future timelike infinity i + , so that this region resembles that of Minkowski space; and (2) for any Cauchy surface Σ int of the black hole interior, we may treat I + ∪ Σ int as a (disconnected) Cauchy surface for the full spacetime.
We depict the evaporating black hole spacetime under this assumption in figure 10. We note that the PS assumption requires that the physics of E is appropriately local: in particular, the state of any radiation emitted to I + after the black hole becomes Planckian will be independent of the history of the black hole, such as the state on Σ int away from the strongly curved region E .

Figure 11. The Hawking (a) and Polchinski-Strominger (b) wormholes contributing to the density matrix ρ (2) describing the decay products at I + of two identically-prepared black holes. (b) Another contribution to ρ (2) with the same boundary conditions, for which the identifications between black hole interiors have been swapped.

The PS assumption immediately allows us to sew together two copies of figure 10 to define a back-reacted saddle for the density matrix ρ = lim u→∞ ρ(u) on all of I + ; see either top or bottom of figure 11(a) below. The result satisfies the definition of a spacetime wormhole given in the introduction, since the boundary consists of two complete and disconnected copies of I + . For this reason, and because spacetimes like that of figure 10 were often championed by Hawking, we refer to this spacetime as the Hawking wormhole.
For n > 1 replicas, the spacetime which gives rise to the naïve result (4.1) for the n-evaporation density matrix ρ (n) is then simply n copies of the Hawking wormhole with boundary conditions $|j_r\rangle$ and $\langle i_r|$ for r = 1, 2, . . . n; see figure 11(a) for n = 2. But since the boundary conditions are invariant under independent permutations of bras and kets, it is clear that we can then build further wormholes with identical boundary conditions by simply pairing 'bra' and 'ket' boundaries in different ways. This construction defines n! distinct wormholes over which our path integral must sum, one for each permutation of the n kets relative to the n bras. We refer to the doubled spacetimes defined by the n! − 1 non-trivial pairings as PS wormholes. The single PS wormhole for the n = 2 case is shown in figure 11(b). Note that, although each wormhole involves E and its future (and thus leaves the domain of semiclassical control), since all n! − 1 PS wormholes are diffeomorphic to n copies of the Hawking wormhole of figure 11(a), they also have precisely the same validity as the Hawking wormhole to be interpreted as potential saddles.
Indeed, the fact that all n! saddles are diffeomorphic also requires them to contribute precisely the same weight to the path integral. We therefore find the components of our density matrix to be given by a sum over all permutations π ∈ Sym(n), where Sym(n) denotes the symmetric group on n indices:
$\langle i_1, \dots, i_n | \rho^{(n)} | j_1, \dots, j_n \rangle = \sum_{\pi \in {\rm Sym}(n)} \langle i_1 | \rho_{\rm Hawking} | j_{\pi(1)} \rangle \cdots \langle i_n | \rho_{\rm Hawking} | j_{\pi(n)} \rangle + \cdots$, (4.2)
where we have not normalised the state. The ellipsis (+ · · · ) in (4.2) indicates various potential corrections, including any from possible further saddles that have not yet been identified. We will assume such corrections to be negligible for the rest of section 4.
As a result of (4.2), the rules for our semiclassical path integral, while treating PS wormholes as saddles, imply that the state ρ (n) of the n-evaporation Hawking radiation collected by our experimenter differs significantly from the state ρ ⊗n Hawking that would describe n identical independent copies of the mixed state ρ Hawking that she would collect from a single evaporation.
Since this may at first seem surprising, it is useful to note that (4.2) admits a natural Hilbert space interpretation. After we collapse n black holes and allow them to evaporate, we must trace out the n interiors. But once evaporation has proceeded to completion, we see from figure 11(b) that the interiors are no longer attached to a corresponding external spacetime. As a result, there is no longer anything to distinguish them. The sum in (4.2) treats the n interiors as indistinguishable objects obeying Bose statistics. We could say that each black hole interior is like a Bosonic particle, carrying many internal degrees of freedom to describe the state of the matter that formed the black hole and the ingoing Hawking partners. When we trace these out, having several interiors in the same quantum state means that we must include a symmetrisation as is familiar from Bosonic Fock spaces. This then leads to (4.2). We will explore the Hilbert space interpretation in more detail in section 6.
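To understand the implications of (4.2), it is useful to introduce a unitary operator U π for each permutation π in the symmetric group Sym(n), where the U π act to permute states among the n collections of Hawking radiation:
$U_\pi \, |j_1\rangle \otimes \cdots \otimes |j_n\rangle = |j_{\pi(1)}\rangle \otimes \cdots \otimes |j_{\pi(n)}\rangle$. (4.3)
We can equivalently think of U π as a geometric symmetry operator acting on n copies of I + by the diffeomorphism which permutes them. Momentarily dropping the · · · in (4.2), we find
$\rho^{(n)} = \sum_{\pi \in {\rm Sym}(n)} U_\pi \, \rho_{\rm Hawking}^{\otimes n} = n! \, P_{\rm Sym} \, \rho_{\rm Hawking}^{\otimes n}$, (4.4)
where $P_{\rm Sym} = \frac{1}{n!} \sum_{\pi \in {\rm Sym}(n)} U_\pi$ is a projection onto the completely symmetric subspace that is invariant under all permutations.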
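The algebra of the permutation operators can be checked numerically in a small toy model. The sketch below is illustrative only: it replaces each set of Hawking radiation by a d-dimensional Hilbert space with a random density matrix standing in for ρ Hawking , and it assumes the convention that U π sends |j_1 ... j_n⟩ to |j_π(1) ... j_π(n)⟩. All function names are hypothetical.

```python
import itertools
import math
import numpy as np

def permutation_operator(perm, d):
    """U_pi on (C^d)^{tensor n}, sending |j_1..j_n> to |j_pi(1)..j_pi(n)> (assumed convention)."""
    n = len(perm)
    dim = d ** n
    U = np.zeros((dim, dim))
    for labels in itertools.product(range(d), repeat=n):
        permuted = tuple(labels[perm[r]] for r in range(n))
        source = np.ravel_multi_index(labels, (d,) * n)
        target = np.ravel_multi_index(permuted, (d,) * n)
        U[target, source] = 1.0
    return U

def symmetric_projector(n, d):
    """P_Sym = (1/n!) * sum over all permutations of U_pi."""
    ops = [permutation_operator(p, d) for p in itertools.permutations(range(n))]
    return sum(ops) / math.factorial(n)

def random_density_matrix(d, rng):
    """Random full-rank density matrix, a toy stand-in for rho_Hawking."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def ps_state(rho, n):
    """Normalised version of n! P_Sym rho^{tensor n}."""
    d = rho.shape[0]
    product = rho
    for _ in range(n - 1):
        product = np.kron(product, rho)
    unnormalised = symmetric_projector(n, d) @ product
    return unnormalised / np.trace(unnormalised).real

rng = np.random.default_rng(0)
state = ps_state(random_density_matrix(2, rng), 3)
U_tau = permutation_operator((1, 2, 0), 2)   # a cyclic permutation of three copies
print(np.allclose(U_tau @ state, state))     # the symmetrised state is invariant
```

Since the sum runs over the whole group, the symmetrised state does not depend on whether U π is taken to act by π or by its inverse, which is why the convention assumed above is harmless here.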
We can now ask what our experimentalist should expect when she tries to verify that the radiation is mixed. For a simple concrete example, we take the case of n = 2 copies and perform the swap test [52,53]. This means that we simply measure the swap operator S, which acts to exchange the two copies of the radiation. In terms of our previous notation, this operator is S = U π where π is the nontrivial permutation in Sym(2). Such measurements have two possible outcomes ±1 corresponding to the eigenvalues of S. For swap measurements performed on two uncorrelated copies of a single (normalised) density matrix ρ, the expectation value of such outcomes is
$\langle S \rangle = {\rm Tr}(S \, \rho \otimes \rho) = {\rm Tr}(\rho^2) = e^{-S_2(\rho)}$. (4.5)
The quantity (4.5) is known as the 'purity' of ρ, and the last equality relates it to the second Rényi entropy S 2 (ρ) as defined in (2.5). For a highly mixed state such as ρ Hawking (which has S 2 of order $G_N^{-1}$) the expectation value is very close to zero. It is thus essentially equally likely that the measurement gives +1 as −1. On the other hand, for pure ρ the outcome is guaranteed to be +1. We can therefore perform only a handful of measurements and distinguish reliably between the two cases.

Now, from (4.4) it is manifest that ρ (2) is invariant under the action of S. We thus find Tr(Sρ (2) ) = Tr(ρ (2) ) = 1, and we predict that our experimenter will always obtain the result +1 from measurement of S. In other words, if we are inspired by (4.5) to summarize her observations by defining the 'swap (Rényi) entropy'
$S_2^{\rm swap} := -\log {\rm Tr}(S \rho^{(2)})$, (4.6)
then this swap entropy vanishes. This can be generalised to an nth swap Rényi entropy, defined through the expectation value of a permutation operator acting on n copies of the radiation as
$S_n^{\rm swap} := -\frac{1}{n-1} \log {\rm Tr}(U_\tau \rho^{(n)})$, (4.7)
where τ is a cyclic permutation of the n copies, 10
τ = (1 2 · · · n) ∈ Sym(n). (4.8)
We leave the n in the definition of τ implicit, since it will be clear from context. Once again, from (4.4) it is manifest that ρ (n) is invariant under U τ , so the outcome of such a measurement will always be unity, and $S_n^{\rm swap} = 0$. More generally, any measurement (of more complicated permutations for example, or even complete tomography to obtain the density matrix) will reproduce the expectations from a pure state, as will be made more manifest in section 6.
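The contrast between the three cases can be illustrated with a toy numerical swap test. This is a minimal sketch under stated assumptions rather than a model of actual Hawking radiation: the radiation Hilbert space is taken to be d-dimensional, the "highly mixed" state is the maximally mixed one, and the symmetrised state is the normalised n = 2 combination (1 + S) ρ ⊗ ρ. Outcome probabilities follow from ⟨S⟩ = p(+1) − p(−1).

```python
import numpy as np

def swap_operator(d):
    """Swap S on C^d (x) C^d, defined by S|a>|b> = |b>|a>."""
    S = np.zeros((d * d, d * d))
    for a in range(d):
        for b in range(d):
            S[b * d + a, a * d + b] = 1.0
    return S

def swap_expectation(rho2, d):
    """Expectation value Tr(S rho2) for a two-copy state rho2."""
    return float(np.trace(swap_operator(d) @ rho2).real)

def outcome_probabilities(rho2, d):
    """Probabilities of the outcomes +1 and -1 when measuring S."""
    mean = swap_expectation(rho2, d)
    return (1 + mean) / 2, (1 - mean) / 2

d = 8
rho_mixed = np.eye(d) / d                 # toy stand-in for a highly mixed state
psi = np.zeros(d)
psi[0] = 1.0
rho_pure = np.outer(psi, psi)             # a pure state for comparison

S = swap_operator(d)
product_mixed = np.kron(rho_mixed, rho_mixed)
symmetrised = (np.eye(d * d) + S) @ product_mixed   # n = 2 sum over both pairings
symmetrised = symmetrised / np.trace(symmetrised)

mean_mixed = swap_expectation(product_mixed, d)                 # purity Tr(rho^2)
mean_pure = swap_expectation(np.kron(rho_pure, rho_pure), d)
mean_ps = swap_expectation(symmetrised, d)

print(outcome_probabilities(product_mixed, d))
print(outcome_probabilities(symmetrised, d))
```

For two independent mixed copies the outcomes ±1 are nearly equally likely, whereas the symmetrised state returns +1 with certainty, just as a pure state would.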
Nonetheless, the predictions for experiments on any one of the black holes are essentially unchanged by the inclusion of the PS wormholes. If we trace over n − 1 of our n copies, the density matrix on the remaining radiation is much the same as ρ Hawking . 11 The primary effect of the new contributions is not to alter the state of radiation for any particular black hole, but rather to introduce correlations between several different sets of radiation.

Figure 12. (a) A swap test saddle with naïve connections in the bulk. (b) A saddle in which the swap on the boundary is effectively cancelled by an additional dynamical swap in the bulk.

For future reference, we note that the expectation value of S for radiation collected from two identically-prepared evaporating black holes can be directly formulated as a gravitational path integral. Two saddle points satisfying these boundary conditions are shown in figure 12. These are essentially the same saddles pictured in figure 11, where the identification of the black hole interiors can be either 'unswapped' or 'swapped', but now with boundary conditions appropriate to our swap test expectation value. The point is that summing these saddles gives precisely the same result as taking the trace of the saddles in figure 11 since 12(a) is diffeomorphic to the spacetime defined by taking the trace of 11(b) and 12(b) is diffeomorphic to that for the trace of 11(a). We have attempted to swap the radiation to check whether the state is mixed, but the gravitational path integral has dynamically hidden this from us by performing a matching swap of black hole interiors.

Wormholes and factorization
As noted above, the path integral boundary conditions required to compute the density matrix on I + involve two disconnected copies of I + ∪ I − , and thus two disconnected boundaries. So despite the fact that it involves only one density matrix, we may characterize this argument as a 'two-replica' calculation. Furthermore, as discussed above, the Hawking result ρ Hawking is obtained from a spacetime wormhole, in the sense that disconnected boundaries become connected through the dynamical bulk. In the Hawking wormhole this happens due to the two copies of figure 10 being joined along the slice Σ int , which does not reach the asymptotic boundary. Both features are closely associated with the failure of BH unitarity due to the large entropy of ρ Hawking on I + .

If one believes in BH unitarity, it might thus seem natural to seek a one-replica calculation that describes black hole evaporation. Rather than concentrating on observables, we might try to compute components of the S-matrix directly, or equivalently the wavefunction of the Hawking radiation at I + for a given initial state at I − . For this, we would like to compute the path integral with boundary conditions on a single copy of I + ∪ I − .
However, there is no clear way to compute this path integral semiclassically, even after making assumptions about the endpoint of evaporation E . We might attempt to proceed by using a single copy of the evaporating black hole geometry of figure 10, and then perform the path integral of the quantum fields on this background with appropriate initial and final boundary conditions. But we run into difficulty due to the presence of the future singularity (the jagged line in figure 10). First, we do not expect that our low-energy effective theory will be valid in the high-curvature regions near the singularity. Second, there is no obvious prescription for the boundary conditions or measure that we should apply when we integrate over quantum fields at the singularity, and the spacetime we have chosen may not be a stationary point of the action depending on what variations are allowed by the boundary conditions. This is a more severe problem than the one encountered at the endpoint of evaporation E when studying the path integral of figure 11, since the current problem affects all of the interior Hawking partners and scales with a positive power of the black hole's initial size. Resolving this by choosing a prescription to replace the singularity with a boundary condition is equivalent to the black hole final state proposal of [58]. We will instead take the more conservative point of view that semiclassical gravity simply does not offer an answer to this question.
On the other hand, we have seen above that the Polchinski-Strominger proposal gives operationally-defined entropies indicating the final state to be pure. As a result, it is natural to expect whatever physics lies behind this operational purity to also enable calculations of the above S-matrix components. At least in some sense, it should then cause the 'two-replica' Hawking wormhole calculation of ρ to factorize into a product of 'one-replica' S-matrices. Aficionados of the AdS/CFT correspondence will thus recognize that the black hole information problem is a special case of the so-called 'factorization problem' of AdS/CFT [59][60][61]. We shall return to this issue in the discussion of section 7.3.

Experiments on part of the radiation
Section 4.1 discussed predictions for the swap test as applied to the entirety of radiation on I + , and found that they are consistent with a pure state. Here, we will generalise this to ask for the predictions of the PS proposal when we measure only the radiation on the part I u of I + to the past of some retarded time u. We postpone interpretation of the results to section 4.4.2, where (along with other difficulties) we will discover them to be inconsistent with BH unitarity. Nevertheless, this calculation will be a helpful warm-up for the replica wormholes introduced in section 5.
From the PS proposal (4.4), the expectation value of an operator O (n) acting on n sets of Hawking radiation is given by
$\langle O^{(n)} \rangle = \sum_{\pi \in {\rm Sym}(n)} {\rm Tr}\left( O^{(n)} \, U_\pi(\mathcal{I}^+) \, \rho_{\rm Hawking}^{\otimes n} \right)$, (4.9)
where we have here used the more explicit notation U π (I + ) to include the region I + on which the permutation operator acts. Strictly speaking we should divide by a normalisation factor determined by setting O (n) = 1. However, except for the term defined by the identity permutation π = 1, all terms in this normalisation factor are exponentially small. Thus the resulting corrections are negligible. We will ask for predictions when we measure a swap operator S(I u ) (or more generally U τ (I u ) for the cyclic permutation τ from equation (4.8)), but now acting only on I u , capturing the Hawking radiation that emerges before the retarded time u. As before, we encode the result in a 'swap Rényi entropy' generalising (5.1).
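We begin with the case n = 2, where there are two terms:
${\rm Tr}\left( S(\mathcal{I}_u) \rho^{(2)} \right) = {\rm Tr}\left( S(\mathcal{I}_u) \, \rho_{\rm Hawking} \otimes \rho_{\rm Hawking} \right) + {\rm Tr}\left( S(\mathcal{I}_u) S(\mathcal{I}^+) \, \rho_{\rm Hawking} \otimes \rho_{\rm Hawking} \right)$. (4.11)
The first term is the expectation value of the swap operator S(I u ) in the tensor product state ρ Hawking ⊗ ρ Hawking . From (4.5), this yields $e^{-S_2^{\rm Hawking}(u)}$, where $S_2^{\rm Hawking}(u)$ is the second Rényi entropy of the part (I u ) that is swapped. To understand the contribution of the second term, note that the product of two swap operators is again a swap operator:
$S(\mathcal{I}_u) \, S(\mathcal{I}^+) = S(\mathcal{I}^+ \setminus \mathcal{I}_u)$.
As a result, the contribution of the second term to Tr S(I u )ρ (2) is of precisely the same form as the first, but with $S_2^{\rm Hawking}(u)$ replaced by $\tilde{S}_2^{\rm Hawking}(u)$, the second Rényi entropy of ρ Hawking on the complementary part of I + , to the future of retarded time u:
${\rm Tr}\left( S(\mathcal{I}_u) \rho^{(2)} \right) = e^{-S_2^{\rm Hawking}(u)} + e^{-\tilde{S}_2^{\rm Hawking}(u)}$. (4.12)
Generalising this to cyclic permutation on n sets of radiation, there are n! terms, but only two terms are important, the identity permutation 1 and the inverse τ −1 of the cyclic permutation we are measuring. Other PS wormholes are exponentially suppressed relative to at least one of the two included terms. 12 In analogy with (4.12), the two terms give
${\rm Tr}\left( U_\tau(\mathcal{I}_u) \rho^{(n)} \right) \approx e^{-(n-1) S_n^{\rm Hawking}(u)} + e^{-(n-1) \tilde{S}_n^{\rm Hawking}(u)}$, (4.13)
with the second coming from the relation $U_\tau(\mathcal{I}_u) \, U_{\tau^{-1}}(\mathcal{I}^+) = U_{\tau^{-1}}(\mathcal{I}^+ \setminus \mathcal{I}_u)$.

12 There is an exception when both terms are comparable, $S_n^{\rm Hawking}(u) \approx \tilde{S}_n^{\rm Hawking}(u)$, in which case additional permutations give further interesting corrections: see footnote 23.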
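The n = 2 statement can be verified numerically in a small toy model. In the sketch below, which is illustrative and not drawn from the text, each set of radiation lives on a bipartite Hilbert space A ⊗ B, with A standing for the early radiation on I u and B for the remainder of I + ; a random density matrix stands in for ρ Hawking , and all names are hypothetical. The code checks both the operator identity (a swap of A composed with a swap of everything is a swap of B) and the resulting two-term sum of purities for the unnormalised state.

```python
import numpy as np

def partial_swap(dA, dB, swap_A, swap_B):
    """Operator on (A1 B1 A2 B2) exchanging the A and/or B factors of the two copies."""
    dims = (dA, dB, dA, dB)
    dim = dA * dB * dA * dB
    op = np.eye(dim).reshape(dims + dims)
    axes = [0, 1, 2, 3]                      # output indices (a1, b1, a2, b2)
    if swap_A:
        axes[0], axes[2] = axes[2], axes[0]
    if swap_B:
        axes[1], axes[3] = axes[3], axes[1]
    return op.transpose(axes + [4, 5, 6, 7]).reshape(dim, dim)

def random_density_matrix(dim, rng):
    """Random full-rank density matrix, a toy stand-in for rho_Hawking."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def reduced_purity(rho, dA, dB, keep):
    """Tr(rho_A^2) if keep == 'A', otherwise Tr(rho_B^2)."""
    t = rho.reshape(dA, dB, dA, dB)
    if keep == "A":
        reduced = np.einsum("abcb->ac", t)   # trace out B
    else:
        reduced = np.einsum("abad->bd", t)   # trace out A
    return float(np.trace(reduced @ reduced).real)

rng = np.random.default_rng(0)
dA, dB = 2, 3
rho = random_density_matrix(dA * dB, rng)
rho_pair = np.kron(rho, rho)

S_early = partial_swap(dA, dB, True, False)   # swap acting on the early part only
S_all = partial_swap(dA, dB, True, True)      # swap acting on all of the radiation
S_late = partial_swap(dA, dB, False, True)    # swap acting on the complement

unnormalised = rho_pair + S_all @ rho_pair    # identity pairing plus swapped pairing
value = float(np.trace(S_early @ unnormalised).real)
two_terms = reduced_purity(rho, dA, dB, "A") + reduced_purity(rho, dA, dB, "B")
print(value, two_terms)
```

In such a small system the normalisation of the state is not negligible, so the comparison is made for the unnormalised sum; for large entropies the text argues that the normalisation correction is exponentially small.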
Figure 13. The two PS wormholes contributing to the computation of $\mathrm{Tr}\left(S(\mathcal{I}_u)\rho^{(2)}\right)$, the expectation value of a swap operator acting on $\mathcal{I}_u$ for two sets of radiation. (b) The geometry with swapped interiors gives the second term in (4.11) and (4.12).
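The two operator facts used in the $n = 2$ computation can be checked in a small toy model. The sketch below is only an illustration under stated assumptions: two qubits A and B stand in for the radiation before and after retarded time $u$, the mixed state `rho` is an arbitrary choice and not Hawking's state, and all function names are hypothetical. It checks that a swap on A composed with a swap on the whole system is a swap on B, and that the expectation value of a swap on A in the doubled state equals the purity $\mathrm{Tr}\,\rho_A^2 = e^{-S_2(A)}$.

```python
import math
from itertools import product

# Basis states of two copies of a bipartite system A (x) B are labelled
# (a1, b1, a2, b2). Swap operators simply permute these labels.
def swap_A(i):
    a1, b1, a2, b2 = i
    return (a2, b1, a1, b2)

def swap_B(i):
    a1, b1, a2, b2 = i
    return (a1, b2, a2, b1)

def swap_AB(i):
    a1, b1, a2, b2 = i
    return (a2, b2, a1, b1)

def swap_expectation(rho, swap, dA, dB):
    """Tr[S (rho (x) rho)] for a label-permuting swap S.

    rho is a dict mapping ((a, b), (a', b')) to a matrix element;
    missing keys are treated as zero.
    """
    total = 0.0
    for i in product(range(dA), range(dB), range(dA), range(dB)):
        j = swap(i)  # <i| S = <S(i)| since S is a symmetric permutation matrix
        first = rho.get(((j[0], j[1]), (i[0], i[1])), 0.0)
        second = rho.get(((j[2], j[3]), (i[2], i[3])), 0.0)
        total += first * second
    return total

# Toy mixed state (an assumption made for illustration only):
# rho = p |Phi><Phi| + (1 - p) |01><01|, with |Phi> = (|00> + |11>)/sqrt(2).
p = 0.5
rho = {
    ((0, 0), (0, 0)): p / 2, ((0, 0), (1, 1)): p / 2,
    ((1, 1), (0, 0)): p / 2, ((1, 1), (1, 1)): p / 2,
    ((0, 1), (0, 1)): 1 - p,
}

purity_A = swap_expectation(rho, swap_A, 2, 2)   # Tr rho_A^2
purity_B = swap_expectation(rho, swap_B, 2, 2)   # Tr rho_B^2
renyi2_A = -math.log(purity_A)                   # second Renyi entropy of A
print(purity_A, purity_B, renyi2_A)
```

In this toy model the analogue of the two-term sum is `purity_A + purity_B`, with the second term arising exactly because `swap_A` composed with `swap_AB` acts as `swap_B`.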

Challenges for the Polchinski-Strominger proposal
The observations of section 4.1 may suggest that the semiclassical gravity predictions for an asymptotic observer always conspire to produce results consistent with BH unitarity. However, if the only relevant contributions from the path integral are those discussed above, with further consideration one still finds serious problems.
These problems are described below. Using arguments related to the problem that will be described in 4.4.2, [14] concluded in their context that black holes in fact violate BH unitarity and instead described black holes as 'long-lived remnants'. 13 These difficulties will all be resolved in section 5 by appealing to the recently-discovered replica wormholes of [18,19]. Nonetheless, we will first discuss the issues in more detail so we can better appreciate this resolution.

What happens at the endpoint of evaporation E ?
As observed above, we lose semiclassical control near the endpoint of evaporation E once the black hole is of Planckian size. We have thus far followed [14] in making the PS assumption, but it would be a great improvement if we were able to arrive at the same conclusions without such assumptions, and with the semiclassical approximation justified throughout the calculation.

Violations of BH unitarity
We now discuss the result of section 4.3, where we computed the expectation value of a cyclic permutation acting on the radiation arriving at $\mathcal{I}^+$ before retarded time $u$. Since von Neumann entropies are more familiar and more physical than Rényi entropies, we will phrase the calculation in terms of the 'swap von Neumann entropy' $S^{\rm swap}(u)$ obtained by formally taking the $n \to 1$ limit of (4.13),
$$S^{\rm swap}(u) \approx \min\left\{S^{\rm Hawking}(u),\ \tilde{S}^{\rm Hawking}(u)\right\}. \qquad (4.14)$$
However, the same considerations apply directly to Rényi entropies as well. We interpret $S^{\rm swap}(u)$ as a prediction for the von Neumann entropy that an asymptotic observer would deduce by performing measurements on many copies of the Hawking radiation emitted before time $u$.
To understand these quantities, we must simply note that Hawking's state does not contain significant long-range correlations, so it can be regarded as a product of uncorrelated thermal states emitted at different times. This means that $S^{\rm Hawking}(u)$ and $\tilde{S}^{\rm Hawking}(u)$ are well-approximated by the thermal entropy of Hawking radiation emitted before and after the time $u$ respectively. In particular, the sum $S^{\rm Hawking}(u) + \tilde{S}^{\rm Hawking}(u)$ gives the total entropy $S^{\rm Hawking}(\infty)$ of all radiation at $\mathcal{I}^+$ in the Hawking saddle, up to order one corrections from the vicinity of the boundary between $\mathcal{I}_u$ and $\bar{\mathcal{I}}_u$. In particular, $S^{\rm Hawking}(u)$ monotonically increases from zero to $S^{\rm Hawking}(\infty)$, while $\tilde{S}^{\rm Hawking}(u)$ monotonically decreases between the same values.
At early times we have $S^{\rm Hawking}(u) < \tilde{S}^{\rm Hawking}(u)$, so (4.14) is dominated by the first term, corresponding to the saddle-point in figure 13(a). The swap entropy $S^{\rm swap}(u)$ thus increases until $S^{\rm Hawking}(u) = \tilde{S}^{\rm Hawking}(u)$, at which point there is a first order phase transition, the second saddle-point in figure 13(b) becomes dominant, and $S^{\rm swap}(u)$ decreases back to zero. While this is qualitatively very much like the Page curve in figure 1, it disagrees quantitatively and we find a result which is incompatible with BH unitarity.
The key point is that the entropy S Hawking (∞) on I + in the Hawking saddle exceeds the Bekenstein-Hawking entropy S BH of the initial black hole by a factor of order one. This discrepancy occurs because black hole evaporation is thermodynamically irreversible and hence produces thermal entropy; the generalized second law is not saturated by evaporation in the Hawking saddle.
We can be very explicit for Schwarzschild black holes evaporating by production of massless particles, since the various entropies are determined as a function of time by dimensional analysis. In particular, the production of Hawking radiation is determined by the geometry, which provides the only length scale $R$. The emitted power (energy per unit time) is therefore proportional to $R^{-2}$, and thermal entropy is produced at a rate per unit time proportional to $R^{-1}$. Meanwhile, in $D$ spacetime dimensions the black hole mass $M$ is proportional to $G^{-1}R^{D-3}$, and $S_{BH}$ is proportional to $G^{-1}R^{D-2}$. From this, we can solve everything up to a few unknown dimensionless constants to find^{14}
$$S_{BH}(u) = S_{BH}(0)\left(1 - \frac{u}{u_E}\right)^{\frac{D-2}{D-1}}, \qquad S^{\rm Hawking}(u) = r\left[S_{BH}(0) - S_{BH}(u)\right], \qquad \tilde{S}^{\rm Hawking}(u) = r\,S_{BH}(u),$$
where $D$ is the spacetime dimension and $u_E$ is the time taken for complete evaporation. The only undetermined parameter relevant for us is the (constant) ratio
$$r = \frac{S^{\rm Hawking}(\infty)}{S_{BH}(0)},$$
which depends in detail on the dynamics through greybody factors. However, the only point that is important for us is that it is greater than one. Indeed, as shown in figure 14 one finds a violation of BH unitarity by a factor of $r$. For four-dimensional black holes, Page has computed the corresponding ratio for von Neumann entropies in various cases [62,63]; for example, for Schwarzschild black holes radiating by emission of gravitons and photons he computed $r \approx 1.48$.^{15}
^{14} We can define $S_{BH}(u)$ as the Bekenstein-Hawking entropy of a black hole with mass given by the Bondi mass at time $u$. Equivalently, this is the entropy of the black hole when the radiation arriving at $\mathcal{I}^+$ at time $u$ was emitted, where the precise definition of emission time is not important since the evaporation timescale $u_E$ is a positive power of $G_N^{-1}$.
^{15} The total entropy of Hawking radiation $S^{\rm Hawking}(\infty)$ depends on details of the endpoint of evaporation beyond semiclassical physics. We can safely ignore these details, since the effect on the entropy does not (by our PS assumption) scale with the original size of the black hole.
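The size of the discrepancy can be made concrete with a short numerical sketch. It assumes the scaling forms that follow from the dimensional analysis above, namely $S_{BH}(u) \propto (1-u/u_E)^{(D-2)/(D-1)}$, $S^{\rm Hawking}(u) = r\,[S_{BH}(0) - S_{BH}(u)]$ and $\tilde{S}^{\rm Hawking}(u) = r\,S_{BH}(u)$, in units where $S_{BH}(0) = u_E = 1$, with Page's value $r \approx 1.48$ as the default. The function names are hypothetical and the code is an illustration, not part of the original analysis.

```python
def s_bh(u, s0=1.0, u_e=1.0, dim=4):
    """Bekenstein-Hawking entropy at retarded time u (assumed scaling form)."""
    x = max(0.0, 1.0 - u / u_e)
    return s0 * x ** ((dim - 2) / (dim - 1))

def s_hawking(u, r=1.48, s0=1.0, u_e=1.0, dim=4):
    """Thermal entropy of the radiation emitted before u."""
    return r * (s0 - s_bh(u, s0, u_e, dim))

def s_hawking_after(u, r=1.48, s0=1.0, u_e=1.0, dim=4):
    """Thermal entropy of the radiation emitted after u."""
    return r * s_bh(u, s0, u_e, dim)

def s_swap_ps(u, r=1.48, s0=1.0, u_e=1.0, dim=4):
    """Swap entropy from the two PS saddles: the smaller of the two entropies."""
    return min(s_hawking(u, r, s0, u_e, dim),
               s_hawking_after(u, r, s0, u_e, dim))

def ps_transition_time(u_e=1.0, dim=4):
    """Time at which the two entropies are equal, i.e. S_BH(u) = S_BH(0)/2."""
    return u_e * (1.0 - 0.5 ** ((dim - 1) / (dim - 2)))

u_star = ps_transition_time()
# Ratio of the PS swap entropy to the Bekenstein-Hawking entropy at the peak.
excess = s_swap_ps(u_star) / s_bh(u_star)
print(u_star, s_swap_ps(u_star), excess)
```

Under these assumptions the transition happens when the black hole has lost half of its entropy, independently of $r$, and at that moment the swap entropy exceeds $S_{BH}(u)$ by exactly the factor $r$.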

The above paragraphs describe a problematic violation of entropy bounds, but only by an order one ratio. However, as is familiar from other discussions, we can magnify the problem by refusing to let the black hole evaporate freely and instead feeding it with matter so that it remains at a given size for as long as we desire (perhaps even eternally as in [64]). If this time is very long, then in the middle of this period $S_2^{\rm Hawking}(u)$ and $\tilde{S}_2^{\rm Hawking}(u)$ will both become very large, so $S_2^{\rm swap}(u)$ is also very large. But the Bekenstein-Hawking entropy $S_{BH}$ is fixed by the current mass of the black hole. So from this analysis it would appear that black holes have an unbounded number of internal states below any given mass, a serious failure of BH unitarity.

Violations of causality
Perhaps an even greater problem than the failure of BH unitarity is the observation that (4.12) entails a possible violation of causality. In particular, since it involves the entropy $\tilde{S}_2^{\rm Hawking}(u)$ on $\bar{\mathcal{I}}_u$, it predicts that the swap entropy measured by our experimenter at a finite time $u$ depends on the entire future of the black hole! This is particularly sharp if we imagine first performing this measurement at a finite distance from the black hole (or at an AdS boundary), whence we can subsequently throw matter into the black hole depending on the swap entropy obtained. Such violations of causality appear large enough to throw even the consistency of the above calculations into doubt. We take this to suggest that a consistent framework will require additional corrections to the swap entropy at finite $u$; such further corrections will be explored in the next section.

Replica wormholes
It is natural to ask if the above challenges might be resolved by finding further new saddles. Similar ideas have been investigated in various forms for many years; see e.g. [4,5,10,16,17]. We are now able to make this more concrete, since in the past year a new class of saddles has been argued to exist. These are known as replica wormholes for reasons that will shortly become clear. They were discovered as contributions to path integrals of the form studied in section 4.3 above, in our context giving the expectation value of the cyclic permutation operators $U_\tau(\mathcal{I}_u)$ acting on $n$ copies of a subset of Hawking radiation. As we review below, the replica wormholes reproduce the expectations from the Page curve quantitatively, via a path integral over spacetimes where the semiclassical approximation can be trusted everywhere. This implies that the replica wormhole geometries must also contribute to other observables, and in general to the components of the $n$-evaporation density matrix $\rho^{(n)}$, which we explore in section 5.4.

Replica wormhole spacetimes
In a sense, replica wormholes are a generalisation of PS wormholes studied in section 4, so we first revisit these in a way that is suggestive of the required generalisation. Specifically, we will reconsider the swap entropies of the Hawking radiation that emerges before some finite retarded time u, as discussed in section 4.3. Recall that this is an expression for the expectation value of the cyclic permutation U τ (I u ) applied to n copies of Hawking radiation on I u .
The PS wormholes for this amplitude (pictured in figure 13(b) for $n = 2$) are built from $2n$ pieces, consisting of $n$ 'ket' replicas $M_r$ of the evaporating black hole spacetime and $n$ conjugate 'bra' replicas $\bar{M}_r$, labelled by a replica index $r = 1, \ldots, n$. These spacetimes terminate at a future Cauchy surface $\Sigma$ where they are sewn together. The surface $\Sigma$ is divided into three pieces, with a different rule for sewing replicas along each piece. First, we have a region $\mathcal{I}_u$ on $\mathcal{I}^+$, where the boundary conditions require us to join spacetimes with the cyclic permutation $\tau$, so $M_r$ joins to $\bar{M}_{\tau(r)}$. Next, we have an exterior piece $\Sigma_{\rm ext}$ stretching from retarded time $u$ on $\mathcal{I}^+$ to the regular origin $r = 0$ that is expected to emerge after the final evaporation of the black hole (we can take $\Sigma_{\rm ext} = \mathcal{I}^+$ if we like). In this region, we sew without permutation, so $M_r$ joins to $\bar{M}_r$; this is also fixed by the boundary conditions on $\mathcal{I}^+$, which require such an identification in a neighborhood to the future of retarded time $u$, where $\Sigma_{\rm ext}$ begins. Finally, we have a Cauchy surface for the black hole interior $\Sigma_{\rm int}$, reaching from the original regular origin (before the black hole evaporates) to the evaporation endpoint $E$. Here, the boundary conditions do not uniquely specify any sewing rule, and we can join the $M_r$ to the $\bar{M}_r$ along $\Sigma_{\rm int}$ with any choice of permutation we desire. The path integral includes a sum over all possibilities, and the dominant permutation for a given calculation is dynamically determined. In particular, the interesting new contribution to the swap entropies (4.13) arose from choosing the permutation on $\Sigma_{\rm int}$ to match the permutation $\tau$ on $\mathcal{I}_u$ imposed by the boundary conditions. This description also applies to replica wormholes, but generalised to allow a more general choice of Cauchy surface $\Sigma$ where we sew the replicas, and to also allow a more general splitting of this surface into pieces.
The region I u is fixed by the boundary conditions, so must remain unchanged, but we are free to choose how the remainder Σ u of the Cauchy surface is split into two pieces: a partial Cauchy surface I (the 'island') generalising Σ int in the discussion above, and its complement in Σ u which we continue to call Σ ext . The exterior surface Σ ext extends to meet I + at retarded time u, where the boundary conditions specify that bra and ket spacetimes are connected in the trivial way, but we sew along the interior island pieces I with a nontrivial permutation. For the boundary conditions computing the expectation value of U τ (I u ), the most interesting possibility again arises when we take the sewing permutation on I to match the cyclic permutation τ which acts on I u . The novelty of the replica wormholes is that we take the Cauchy surface Σ u to be connected, so that I and Σ ext meet at a common codimension-2 boundary γ = ∂I. Indeed, for Lorentz-signature spacetimes of this form, the causal structure must have an interesting singularity: points on γ will have several past light cones, one for each bra spacetime that meets at γ (and also one for each ket spacetime). This is an important feature, but we will treat it only briefly below, referring the reader to [15,65], and [66] for further details and deferring discussion of further implications to section 7.3.4. The resulting spacetime is depicted in figure 15 for n = 2.
We can already see why such spacetimes might avoid the PS-wormhole's dependence on physics near $E$ and the resulting loss of semiclassical control discussed in section 4.4: the entire singularity (and in particular the endpoint $E$) is excluded from the spacetime under consideration, just as for the Hawking wormhole in figure 9. We will see that such replica wormholes exist for all times $u < u_E$ lying to the past of the future lightcone of $E$ (after the black hole forms), and thus remove the dependence on UV physics until the black hole reaches Planckian dimensions.
Figure 15 (caption, panel (b)). To obtain this, we divide the partial Cauchy surface $\Sigma_u$ into two pieces $I$ and $\Sigma_{\rm ext}$ along the codimension-2 surface $\gamma$. The connections along $\Sigma_{\rm ext}$ are the same as in (a), but are swapped along the island $I$. The configuration shown is not a saddle as it does not incorporate back-reaction from the structure near the special surface $\gamma$, and incorporating such back-reaction will make the spacetime metric complex. That is, passing the contour of integration through the desired saddle requires deforming it away from real Lorentzian metrics. However, the replica wormhole saddle will coincide with the (real) Hawking saddle in the formal limit $n \to 1$ of the replica number $n$.
The matter path integral in this replica wormhole spacetime is a Schwinger-Keldysh path integral on an n-sheeted spacetime which includes the insertion of a permutation operator U τ (I) acting on the island I, as well as the operator U τ (I u ) imposed by the boundary conditions. In principle, we should compute this for every such replica wormhole spacetime, in particular for all choices of I, and then perform the integral over metrics. Different choices of γ in a given single-sheeted spacetime result in different geometries for the n-sheeted whole, so our gravitational path integral integrates over all inequivalent choices of Σ int (and we might also sum over nontrivial permutations π acting on the island). If saddle-points exist, the location of γ in the resulting geometry will be determined dynamically by extremizing an appropriate action.

Quantum extremal surfaces
The interesting question now is whether this replica wormhole topology can yield a new semiclassical saddle for given boundary conditions at some replica number $n$. The general case for integer replica number $n > 1$ is still under exploration.^{16} However, we are able to make more progress by considering a formal analytic continuation of the calculation to non-integer $n$, studying the problem for $n - 1 \to 0^+$ to first order in $(n - 1)$. This will not only be convenient, but also physically interesting, since the corresponding limit of Rényi entropies gives the von Neumann entropy. Specifically, we will first compute the same observables as section 4.3, studying the path integrals with boundary conditions appropriate for computing the expectation value of a cyclic permutation $\tau$ acting on $n$ copies of the radiation emitted before retarded time $u$, encoded in the 'swap entropy'
$$S_n^{\rm swap}(u) := -\frac{1}{n-1}\log \mathrm{Tr}\left(U_\tau(\mathcal{I}_u)\rho^{(n)}\right). \qquad (5.1)$$
Continuing this to non-integer $n$ and taking the $n \to 1$ limit defines the 'swap (von Neumann) entropy' $S^{\rm swap}(u) := \lim_{n\to 1} S_n^{\rm swap}(u)$. In section 4.3, we found a new interesting contribution to this path integral arising when we chose to join the replicas along the black hole interiors $\Sigma_{\rm int}$ by the same permutation $\tau$ as we apply on $\mathcal{I}_u$. Our strategy will be to emulate this for replica wormholes as described above, replacing $\Sigma_{\rm int}$ by a general partial Cauchy surface $I$. We will reformulate the calculation of the path integral on such geometries in such a way that $n$ need not be an integer. For $n = 1$ exactly, the permutation group $\mathrm{Sym}(n = 1)$ is trivial and there is only the original saddle for $\mathrm{Tr}\,\rho(u)$ that computes the normalization of the state. Nonetheless, by continuing the problem to study a neighbourhood of $n = 1$ we introduce nontrivial dependence on the choice of $I$, but can still state the calculation in terms of the $n = 1$ geometry and associated matter state. As pointed out in [18,19], the condition for a saddle to exist at order $(n - 1)$ was found some time ago: see [71], building on [65,72].
The condition is that the splitting surface γ = ∂I is a quantum extremal surface (QES) [73]. See also [66] for discussion of saddles for real-time path integrals when n−1 is not infinitesimal.
Before reviewing the argument, we recall the definition of a QES. This is a 'quantum version' of an extremal surface, which is a stationary point of the area functional $A[\gamma]$. To go from 'classical' to 'quantum' extremal surface, we simply replace the area function with a quantum corrected version, the generalised entropy:^{17}
$$S_{\rm gen}(I; u) = \frac{A[\gamma]}{4G_N} + S_{\rm matter}(I \cup \mathcal{I}_u), \qquad \gamma = \partial I. \qquad (5.2)$$
Since the matter fields are pure on a full Cauchy surface, the second term is also the matter entropy on the partial Cauchy surface $\Sigma_{\rm ext}$ bounded by $\gamma$ and by $\mathcal{I}^+$ at retarded time $u$. This is the more standard way of describing a generalized entropy. The final argument $u$ in $S_{\rm gen}$ reminds the reader that this matter entropy term depends on where we choose this partial Cauchy surface to meet $\mathcal{I}^+$.
^{16} See [18,67-70] for related constructions in Euclidean signature and [66] for saddles with Lorentz-signature boundary conditions analogous to those considered here.
^{17} We have written $S_{\rm gen}$ as a functional of the partial Cauchy surface $I$ (up to equivalence under changes that leave the domain of dependence invariant), rather than its bounding surface $\gamma$. These data are equivalent unless our spacetime includes a closed universe component (that is, a partial Cauchy slice with empty boundary).
The definition of a QES $\gamma = \partial I$ is that $S_{\rm gen}$ is stationary to first order variations of $\gamma$. In the definition (5.2), $S_{\rm matter}(I \cup \mathcal{I}_u)$ is the von Neumann entropy of matter fields on $I \cup \mathcal{I}_u$ in the state under consideration. For a matter QFT, this entropy is divergent and depends on the choice of UV cutoff. Nevertheless, there is strong evidence [74-76] (see also the appendix of [77]) that the combination $S_{\rm gen}$ is finite and not UV sensitive, since matter fields give an equal and opposite infinite renormalisation to $G_N^{-1}$ (using the 'bare' value of $G_N$ at the EFT cutoff in (5.2)). Relatedly, if the theory has higher derivative terms or non-minimal couplings to gravity (perhaps induced by quantum effects) then the $\frac{A[\gamma]}{4G_N}$ term should be replaced by the corresponding notion of geometric entropy [78-80]. These features are not special to replica wormholes, but are familiar from quantum corrections to black hole thermodynamics, for example. Operationally, it suffices to evaluate (5.2) using the finite IR value of $G_N$ and a finite subtracted expression for $S_{\rm matter}$ using some convenient regulator which is local at $\gamma$.
We now sketch how the generalised entropy functional and the QES prescription arise from the path integral, by looking for replica wormhole saddle-points with boundary conditions for computing Tr U τ (I u )ρ (n) . 18 These spacetimes are replica symmetric: that is, the geometry respects the n-fold cyclic symmetry possessed by the boundary conditions, as well as the two-fold symmetry swapping 'bra' and 'ket' branches of the Schwinger-Keldysh contour. We are thus considering metrics that are obtained by taking 2n replicas of a single spacetime (n of them after CPT conjugation) to the past of a Cauchy surface I ∪ Σ ext ∪ I u , and gluing them along that Cauchy surface, though on the I and I u portions of this Cauchy surface we perform this gluing using a cyclic permutation τ between replicas. It suffices to check that we have a saddle-point varying only amongst such replica symmetric configurations, since the symmetry ensures stationarity to variations which break this symmetry. This will enable us to describe the problem in terms of a single copy.
First, we compute the matter path integral on such a geometry. As noted above, the replica wormhole geometry is the Schwinger-Keldysh contour giving the expectation value of $U_\tau(\mathcal{I}_u \cup I)$ for the matter state on the final Cauchy surface $\Sigma = I \cup \Sigma_{\rm ext} \cup \mathcal{I}_u$ produced by unitary evolution from the initial conditions on $\mathcal{I}^-$. We can express this in terms of the Rényi entropy of the matter reduced density matrix $\rho_{I\cup\mathcal{I}_u}$. This is much the same as the discussion in section 2.3, except we now are computing the Rényi entropy on $I \cup \mathcal{I}_u$, not just on $\mathcal{I}_u$. We can thus write the $n$-replica matter path integral as
$$Z^{(n)}_{\rm matter} = \left(Z^{(1)}_{\rm matter}\right)^n e^{-(n-1)S_n(I\cup\mathcal{I}_u)}. \qquad (5.3)$$
The factor $\left(Z^{(1)}_{\rm matter}\right)^n = \left(\mathrm{Tr}\,\rho_{I\cup\mathcal{I}_u}\right)^n$ gives the normalisation of the state on the unreplicated geometry, and is independent of $\gamma$. The matter effective action is thus given by $n$ times the $n = 1$ effective action, plus a term from the Rényi entropy on $I \cup \mathcal{I}_u$:
$$\log Z^{(n)}_{\rm matter} = n \log Z^{(1)}_{\rm matter} - (n-1)S_n(I\cup\mathcal{I}_u). \qquad (5.4)$$
This extra term will become the matter von Neumann entropy $S_{\rm matter}(I \cup \mathcal{I}_u)$ appearing in the generalised entropy (5.2) when we continue this close to $n = 1$. In particular, since the Rényi entropy is defined for any $n \geq 1$, we have succeeded in describing the matter integral in such a way that $n$ is not restricted to be an integer.
We now do the same for the integral over replica-symmetric metrics.^{19} Since the Einstein-Hilbert action is local, it is tempting to say that the action on our $n$ replicas is simply $n$ times the action on a single copy. This is almost true, but as described in [65] (following similar Euclidean observations in [69]) there is an additional local contribution at the surface $\gamma$. To understand this, it is helpful to imagine deforming the Schwinger-Keldysh contour to pass through our final Cauchy surface in an imaginary time direction, so that we can think of the geometry as being Euclidean (at least locally, very close to $\gamma$). In this vicinity, the $n$-sheeted geometry (sketched in figure 16 for $n = 2$) is obtained from the metric on a single copy by slicing the $n$ replicas along the surface $I$ emanating from $\gamma$, and joining them back together with cyclic identifications. Our $n$-copy geometry must be smooth at $\gamma$ so that we satisfy the equations of motion there, but this implies that each single geometry is not smooth: it has a conical defect at $\gamma$ with opening angle $2\pi/n$. In particular, this requires back-reaction that will modify the geometry on each replica in some $n$-dependent way.^{20} However, we can find a saddle-point by solving the equations of motion from varying the metric on a single copy while imposing the $2\pi/n$ defect boundary condition at $\gamma$. This problem can be continued in $n$, and in the $n \to 1$ limit we return to the original smooth geometry.
Now, if we were simply to evaluate the gravitational action on this single-copy singular configuration, we would find a contribution proportional to $\frac{A[\gamma]}{4G_N}$ from curvature with delta-function support at $\gamma$.^{21} But this contribution should not be there, since we are really evaluating the action on the $n$-copy metric, which is smooth and so has no such singular piece. To make up for this difference between the correct $n$-replica action and the singular Einstein-Hilbert action, we must subtract this 'by hand'. The gravitational action on the replica manifold is therefore given [65,69] by
$$i S^{(n)}_{EH} = i\,n\,S^{(1)}_{EH} - (n-1)\frac{A[\gamma]}{4G_N}. \qquad (5.5)$$
In the $n \to 1$ limit, the area term above gives the geometric term in the generalised entropy (5.2). Note that this is a real contribution when the action is evaluated in Euclidean signature, so despite the Lorentzian setting it weights the path integral by the exponential $e^{-(n-1)\frac{A[\gamma]}{4G_N}}$, and not by a phase. The same basic phenomenon was observed long ago in [15].
^{18} Here we depart from the historical presentation of the arguments in order to simplify the discussion.
^{19} What we have called the matter path integral should include linearised metric fluctuations as explained after equation (3.1), here computing the entropy of gravitons. In particular, the path integral thus incorporates small deviations from replica-symmetric metrics. In practice, this is rather subtle, but the subtleties are local at $\gamma$ and so do not accumulate to become significant.
^{20} Smoothness of the Euclidean geometry implies that such back-reaction can be small and that a one-loop perturbative treatment is self-consistent. It would be interesting to understand in more detail how this relates to classic analyses [81,82] suggesting that quantum back-reaction should be large at singularities in the causal structure such as that associated with our splitting surface, and to conjectures that the presence of such singularities may be related to a Morse index [83-86].
We now have a description for the path integral over replica symmetric configurations that we can nicely continue in $n$, and which we can study for small values of $(n - 1)$. There are two types of term appearing in the action which weights this integral. First, we have terms which are independent of $\gamma$, namely the local gravitational action $S^{(1)}_{EH}$ in (5.5) and the normalising factor $\left(Z^{(1)}_{\rm matter}\right)^n$ from (5.3). Together, these just give $n$ times what we will call the single-copy action (that is, the gravitational action, including a contribution from the singularity for $n = 1$, plus the matter effective action). Secondly, we have the two terms making up the generalised entropy, namely the area term in (5.5) and the matter entropy in (5.3). For $n - 1 \ll 1$, the second class of terms gives a small correction so we can ignore them at first, obtaining simply the path integral that computes the norm of the state. Since we will also need to divide by this result to get our final expectation value, such contributions cancel completely in the final expression.
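The relation (5.4) between the replicated matter partition function and the Rényi entropy is an identity for any positive spectrum, and it continues smoothly to non-integer $n$. The following sketch checks this numerically. It assumes, purely for illustration, that the unnormalised reduced density matrix is specified by a short list of eigenvalues `weights` (an arbitrary choice), so that $Z^{(n)} = \sum_i w_i^n$; the function names are hypothetical.

```python
import math

def renyi_entropy(weights, n):
    """Renyi entropy S_n of the normalised spectrum p_i = w_i / sum(w).

    At n = 1 this returns the von Neumann entropy, the n -> 1 limit.
    """
    z1 = sum(weights)
    probs = [w / z1 for w in weights]
    if abs(n - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(sum(p ** n for p in probs)) / (1.0 - n)

def log_z_replica(weights, n):
    """log Tr(rho^n) for an unnormalised rho with eigenvalues `weights`."""
    return math.log(sum(w ** n for w in weights))

def identity_defect(weights, n):
    """Difference between the two sides of log Z^(n) = n log Z^(1) - (n-1) S_n."""
    lhs = log_z_replica(weights, n)
    rhs = n * log_z_replica(weights, 1) - (n - 1) * renyi_entropy(weights, n)
    return lhs - rhs

weights = [3.0, 2.0, 1.0, 0.5]  # toy spectrum, chosen arbitrarily
for n in (2, 3, 1.5, 1.01):
    print(n, renyi_entropy(weights, n), identity_defect(weights, n))
print("von Neumann:", renyi_entropy(weights, 1.0))
```

The term $n \log Z^{(1)}$ is fixed by the normalisation alone, while all dependence on the shape of the spectrum sits in $(n-1)S_n$, which is why only the entropy term matters once the normalisation is divided out.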
The above considerations fix a saddle-point geometry on a single replica. However, there remains a residual integral over codimension two surfaces γ within that geometry. Note that we require a saddle point for this integral as well if we are to specify a saddle for the full n-fold replicated geometry.
For the integral over $\gamma$, the weighting is provided by the second set of terms:
$$\mathrm{Tr}\left(U_\tau(\mathcal{I}_u)\rho^{(n)}\right) \sim \int \mathcal{D}\gamma\; e^{-(n-1)S_{\rm gen}(I;u)} \sim \sum_{\gamma} e^{-(n-1)S_{\rm gen}(I;u)}, \qquad (5.6)$$
where the last step indicates a saddle-point evaluation of the integral over surfaces. Now, in principle we should also realize that for $n > 1$ the singularity at $\gamma$ will back-react on the single-replica metric and thus change the value of the single-replica action $S^{(1)}_{EH}$. But since at $n = 1$ we are at a stationary point for $S^{(1)}_{EH}$, this effect is quadratic in $(n - 1)$ and can thus be ignored for the purpose of computing our swap von Neumann entropy; see [66] for further discussion of back-reaction at finite $n - 1$ in saddles for real-time path integrals.
^{21} For one way to see this, we can split the action $\frac{1}{16\pi G_N}\int R$ into an integral in the directions parallel to $\gamma$, which gives the area, and a transverse two-dimensional integral. We can evaluate the latter integral on a small circle centred on $\gamma$ using the Gauss-Bonnet theorem. See [15] and [66] for details of this procedure in Lorentz signature near singularities in the causal structure of the form associated with $\gamma$.
^{22} It is also true that (5.5) defines a good variational principle on the singular single copy defined by taking the $n$-fold quotient of the $n$-replica manifold. See [87] for a full discussion of the Euclidean case. The Lorentz signature case follows by analytic continuation; see also [66].
The saddle-points of (5.6) are precisely the quantum extremal surfaces, since these are the points at which $S_{\rm gen}$ is stationary. We may attempt to approximate this by including only the dominant saddle-point and noting that for $n$ near 1 the dominant saddle is given by the term in which $S_{\rm gen}$ takes the smallest value.^{23} This gives
$$S^{\rm swap}(u) = \min_{\gamma} S_{\rm gen}(I; u),$$
with the minimum taken over quantum extremal surfaces $\gamma = \partial I$. This is the quantum extremal surface prescription of [73], following generalisations to time-dependent situations [95] and inclusions of quantum corrections [72]. In this context where the quantum extremal surface $\gamma$ is compact and hence bounds an island $I$, it has become known as the 'island formula' [96]. These were all originally stated in the context of holographic duality, with the result interpreted as a von Neumann entropy of a dual quantum system. Here, we do not assume any such dual description so our interpretation is rather different, instead predicting the outcome of 'swap' experiments performed on multiple sets of Hawking radiation.
Incidentally, the argument reveals why we should expect that S gen is finite and not UV sensitive. We obtain S gen as a limit of partition functions over replica manifolds which are smooth, with no singular features at the surface γ. These features of S gen are ensured if we have a sensible effective theory.
The first term in $S_{\rm gen}(I; u) = \frac{A[\gamma]}{4G_N} + S_{\rm matter}(I \cup \mathcal{I}_u)$ is naturally of order $G_N^{-1}$, while the second matter entropy term will typically be a small correction of order one. This means that in most circumstances, a QES will be close to a classical extremal surface. However, this is not always the case. In particular, for evaporating black holes there may be no nontrivial classical extremal surface but, as we will see presently, due to the parametrically long times involved there is nonetheless a QES.

Contributions from replica wormholes
The discussion of section 5.2 reduced the study of replica wormholes near $n = 1$ to the study of quantum extremal surfaces in the original semiclassical $n = 1$ saddle. Recall that for us this is the 'Hawking wormhole' in figure 9. A trivial case is when the island $I$ and its boundary $\gamma$ are empty, so that $S_{\rm gen}$ reduces to the matter entropy $S^{\rm Hawking}(u)$ on $\mathcal{I}_u$ alone. The question is then whether a nontrivial QES also exists. This is precisely the question that was studied in references [20,21]. Those works showed that a non-trivial QES $\gamma$ exists soon after the black hole forms (after roughly a scrambling time, which is logarithmic in $G_N$). To locate the QES, we first define the function $v_{\rm app}(u)$ so that for a given outgoing time $u$, the apparent horizon of the black hole lies at ingoing time $v = v_{\rm app}(u)$. Given our spherical symmetry, we may define the apparent horizon as the (spherical) surface on which the area of the transverse sphere is stationary under variations in the outgoing null direction. This surface is slightly outside the event horizon since the black hole is evaporating, so the function $v_{\rm app}(u)$ is well-defined for times $u$ soon after formation of the black hole, up until the evaporation time $u_E$. The works [20,21] showed that a QES computing $S^{\rm swap}(u)$ exists very close to the event horizon at advanced time close to $v_{\rm app}(u)$, with the corrections to this advanced time being of order the inverse black hole temperature $\beta$. This is sketched in figure 17.
The generalised entropy of $\gamma$ is dominated by the area term, so $S_{\rm gen}(I; u)$ is close to the Bekenstein-Hawking entropy $S_{BH}(u)$. This QES thus becomes dominant after the Page time and causes $S^{\rm swap}(u)$ to follow the Page curve:
$$S^{\rm swap}(u) \sim \min\left\{S^{\rm Hawking}(u),\ S_{BH}(u)\right\}. \qquad (5.9)$$
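As a rough illustration of (5.9), the sketch below evaluates the Page curve and the Page time for a freely evaporating Schwarzschild black hole. It assumes the dimensional-analysis scaling of section 4.4.2, $S_{BH}(u) \propto (1-u/u_E)^{(D-2)/(D-1)}$ and $S^{\rm Hawking}(u) = r\,[S_{BH}(0) - S_{BH}(u)]$, works in units with $S_{BH}(0) = u_E = 1$, and uses Page's $r \approx 1.48$ as a default. The function names are hypothetical, and order-one corrections to $S_{\rm gen} \approx S_{BH}(u)$ are ignored.

```python
def s_bh(u, s0=1.0, u_e=1.0, dim=4):
    """Bekenstein-Hawking entropy at retarded time u (assumed scaling form)."""
    x = max(0.0, 1.0 - u / u_e)
    return s0 * x ** ((dim - 2) / (dim - 1))

def s_hawking(u, r=1.48, s0=1.0, u_e=1.0, dim=4):
    """Thermal entropy of the Hawking radiation emitted before u."""
    return r * (s0 - s_bh(u, s0, u_e, dim))

def page_curve(u, r=1.48, s0=1.0, u_e=1.0, dim=4):
    """Swap entropy with replica wormholes: min of Hawking and BH entropies."""
    return min(s_hawking(u, r, s0, u_e, dim), s_bh(u, s0, u_e, dim))

def page_time(r=1.48, u_e=1.0, dim=4):
    """Solve r (S0 - S_BH) = S_BH, i.e. S_BH(u) = S0 r / (1 + r), for u."""
    return u_e * (1.0 - (r / (1.0 + r)) ** ((dim - 1) / (dim - 2)))

u_page = page_time()
peak = page_curve(u_page)  # equals r / (1 + r) in these units
print(u_page, peak)
```

In contrast to the PS result, the curve obtained this way never exceeds $S_{BH}(u)$, and the transition moves earlier as $r$ grows because the radiation entropy catches up with the shrinking horizon area sooner.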

The physics that allows such a QES to exist is rather generic, and in particular is independent of the dimension or asymptotics of the spacetime. Using spherical symmetry, it is sufficient to argue that $S_{\rm gen}$ is stationary to variations in ingoing and outgoing null directions. The outgoing variation of the area vanishes on the apparent horizon, so it is unsurprising that the outgoing variation of $S_{\rm gen}$ can vanish on a nearby surface $\gamma$. The ingoing variation is more subtle, requiring a balance between quantum entropy and classical area terms. This is possible due to the exponential divergence of outgoing geodesics near to the event horizon, producing a logarithmically growing contribution to the entropy. For detailed arguments, we refer the reader to the original references [20,21] with AdS asymptotics (though for similar calculations with flat asymptotics see [24-28]). We emphasise that there is no classical extremal surface close to $\gamma$ at which the $\frac{A[\gamma]}{4G_N}$ term would be stationary on its own. The entropy term is thus critically important for the extremisation, with large gradients in entropy arising from the large relative boost between the near-horizon and asymptotic region. As a result, the corresponding replica wormholes are not related to any saddle of the classical action, but only exist as saddles of the quantum-corrected effective action as discussed above.
To make contact with the discussion of section 4, we can think of the PS wormholes as spacetimes of the above form in which we simply take the island to be the entire black hole interior, I = Σ int , so that γ = E . If we choose the area term from E to be zero, the corresponding generalised entropy is S gen (Σ int ; u) = S̃ Hawking (u), giving a third term over which we should minimise in (5.9). Since this term arises from saddle-points which are not under semiclassical control, it is unclear whether or not it should really be allowed. But in the presence of the new QES and the accompanying replica wormholes, we see that it is in any case irrelevant. For any time u < u E , the replica wormhole generalised entropy S gen (I; u) ∼ S BH (u) is smaller than S̃ Hawking (u) by at least a factor of order one. Since these quantities are both large, the difference is also large. It follows that PS wormholes never dominate, and in fact can provide at most an exponentially small correction that we ignore.
We thus no longer require any input beyond semiclassical physics or assumptions about E , resolving the problem of section 4.4.1. Furthermore, since γ is located near the past light cone of the relevant cut of I + , we also avoid the causality issues described in section 4.4.3 for the PS proposal. However, we can think of the PS wormholes as a limit of saddle-points which are under semiclassical control, where we take γ to approach E . Indeed, this is what will happen to the QES γ in the limit u → u E . This provides some justification for using the PS wormholes after evaporation (i.e., for u > u E ) as a reasonable extrapolation of controlled calculations at earlier times.
In summary, we have considered a context where an asymptotically flat black hole radiates to I + , with a focus on the region I u ⊂ I + before retarded time u. We have then studied the expectation value of cyclic permutation operators τ on n copies of the radiation in I u . This models the actual results of measurements made by a sophisticated experimentalist 24 who allows n identically-prepared black holes in the same universe to evaporate, captures the radiation emitted up to corresponding retarded times, and then measures the action of the corresponding permutation. The experimentalist might then use her measurements to deduce the von Neumann entropy of the radiation on I u ; we interpret the n → 1 limit of the swap calculation as a prediction for the result. With the new QES, following from the replica wormholes, the result reproduces the Page curve, affirming the predictions of BH unitarity.
The expectation is that the replica wormholes exist also for integer n > 1, and give similar results for the expectation value of permutation operators. This would allow the above experimentalist to avoid the awkward step of taking the n → 1 limit. It remains to establish this in full, though see [70] for analogous numerical n = 2 constructions in Euclidean signature, and see [66] for discussions and explicit examples of classical realtime integer n replica saddles (without back-reaction from quantum fields).

Replica wormholes for other observables
So far, we have considered the contribution of replica wormholes to the expectation values of permutation operators U τ (I u ) and thus swap entropies (4.7). It is now natural to ask whether such topologies can also contribute to other observables. This question was also discussed in [98], which inspired many of the considerations in this section.
From one perspective, such contributions seem inevitable. We can write the expectation value of $U_\tau(I_u)$ as a sum over matrix elements of the density matrix $\rho^{(n)}(u)$ that describes radiation on n copies of $I_u$ from n evaporating black holes. This gives
$$\mathrm{Tr}\big(\rho^{(n)}(u)\, U_\tau(I_u)\big) = \sum_{i_1,\dots,i_n} \langle i_1,\dots,i_n|\,\rho^{(n)}(u)\,|i_{\tau(1)},\dots,i_{\tau(n)}\rangle, \tag{5.10}$$
where i labels a complete set of boundary conditions on $I_u$. The result (5.10) is simply a reorganisation of the path integral studied above in which we first perform the path integral with fixed matter fields on $I_u$ to compute matrix elements of $\rho^{(n)}(u)$, and only then integrate over all possible such boundary values of matter fields with appropriate identifications to perform the sum shown on the right-hand-side. Since the left-hand-side receives replica wormhole contributions, this must be true of the right-hand-side as well, and thus of the n-evaporation radiation density matrix $\rho^{(n)}(u)$. One should thus expect generic observables involving n copies of $I_u$ to be modified by replica wormholes as well. However, this argument leaves open whether the required contributions to matrix elements or other observables are large enough to appear at the semiclassical level, and thus also whether replica wormholes need to make an explicit appearance as saddles in their semiclassical computation. Since the right-hand-side of (5.10) has a sum over exponentially many terms, a semiclassical description of the sum need not tell us anything about the individual terms. Nonetheless, we argue below that the conclusion is plausible, and that replica wormholes may well give saddle-points for matrix elements or for the expectation values of simple operators. Our arguments will be rather heuristic and suggestive, so a more detailed study is required to establish this carefully; [98] goes a long way towards this aim.
24 Determining the (Rényi) entropy of an unknown state generally requires a number of copies which is exponential in the smaller candidate entropy. More sophisticated methods improve the coefficient of the exponential over that associated with the simple swap test, but the best algorithm still requires exponentially many copies; see e.g. [97].
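As a small worked case of the reorganisation described above, consider n = 2 with τ the swap of the two copies. The convention $U_{\rm swap}|i_1,i_2\rangle = |i_2,i_1\rangle$ is an assumption made here for concreteness. If the two-copy density matrix happened to factorise, the sum over matrix elements would collapse to the purity of a single copy:

```latex
\begin{align}
\mathrm{Tr}\big(\rho^{(2)}\,U_{\rm swap}\big)
  &= \sum_{i_1,i_2} \langle i_1,i_2|\,\rho^{(2)}\,|i_2,i_1\rangle , \\
\rho^{(2)} = \rho\otimes\rho \quad\Longrightarrow\quad
\mathrm{Tr}\big(\rho^{(2)}\,U_{\rm swap}\big)
  &= \sum_{i_1,i_2} \rho_{i_1 i_2}\,\rho_{i_2 i_1}
   = \mathrm{Tr}\,\rho^{2} .
\end{align}
```

Any replica wormhole contribution to the left-hand side must therefore show up as a departure of the matrix elements of $\rho^{(2)}$ from those of $\rho\otimes\rho$.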
To illustrate the point, we first consider a very simple observable, namely the product of expectation values of simple operators inserted on different copies of $\mathcal{I}^+$:
$$\mathrm{Tr}\big( O_1(u)\, O_2(u)\, \rho^{(2)} \big). \tag{5.11}$$
Here $O_r(u)$ is a simple local operator such as the value of a particular field mode on $\mathcal{I}^+$ at retarded time u, and r denotes the 'replica' of the Hawking radiation on which it acts. In contrast to our studies of the swap operator above (which mixes boundaries associated with different values of r), since we now compute the expectation value of a product of operators that each act on a single boundary, the corresponding boundary conditions for the path integral do not include any connections between the two asymptotic regions. Nevertheless, as explained below we anticipate saddle-points in the gravitational path integral which dynamically connect the boundaries via replica wormholes.
The reason for this is in fact much the same as for expectation values of the swap operator. There, the existence of a replica wormhole saddle relied on an interplay between the gravitational action (through contributions associated with the area of the surface γ) and the matter effective action, in that case the matter Rényi entropy. In computing (5.11), the role of the matter effective action is played instead by the logarithm of the two-point function O 1 O 2 evaluated in the replica wormhole geometry. But at the qualitative level this behaves much the same as the matter Rényi entropy. In particular, it has a logarithmic singularity as the surface γ approaches null separation from the retarded time u on I + , as occurs near the apparent horizon of the black hole sufficiently far in the past. The interplay between the classical area and such a logarithmic singularity was precisely what allowed for the existence of a nontrivial QES above. It is therefore reasonable to expect that there may similarly exist a semiclassical replica wormhole saddle for (5.11).
However, the effect of this saddle should be much smaller for (5.11) than for expectation values of the swap operator. In the latter case, replica wormholes dominate the late-time answer because the matter entropy in the Hawking saddle is naturally 'extensive' in the sense that it grows linearly with time. As a result, for expectation values of the swap operator the one-loop-corrected action of the Hawking saddle becomes larger than the action (associated with the area of γ) for the replica wormhole. But there is no such extensive effect for (5.11), and no corresponding late-time suppression of contributions from the Hawking saddle. So one expects replica wormholes to contribute as subdominant saddles, and thus to give corrections which are suppressed exponentially in S BH . 25 The suppression by exponential factors agrees with our heuristic argument from (5.10), as the sums on the right-hand-side of (5.10) should run over exponentially many terms so that small corrections of this order in each off-diagonal matrix element lead to the desired leading-order corrections on the left-hand-side of (5.10).

In practice it may be rather challenging to check for a replica wormhole saddle-point for quantities like (5.11), since it would seem to require finding a back-reacted (and presumably complex) solution to the gravitational equations sourced by the quantum effective action, just as for integer Rényi entropies. It may be directly tractable in simple models of gravity (as in [98]), or by studying some appropriate family of quantities with an n → 1 limit analogous to the von Neumann entropy, to evade the complications of back-reaction.

The Hilbert space of baby universes
The result reviewed above, showing that replica wormholes suffice to make the swap entropy of Hawking radiation follow a Page curve, is satisfying in many ways. In particular, it gives a completely semiclassical computation that supports Bekenstein-Hawking unitarity. Moreover, it does so by computing a quantity that is experimentally accessible, at least in principle.
However, it also raises many questions. While we now have a path-integral derivation of the Page curve, our new ingredients do not affect the computation of expectation values of observables for the Hawking radiation from a single black hole. The density matrix of radiation is still ρ Hawking (u) as computed in section 3, and still comes just from the saddle-point pictured in figure 9. In particular, the swap entropies obtained in section 5 are not equal to the Rényi entropies of ρ Hawking (u). How are these results to be reconciled?
The simple answer is that the density matrix ρ (n) (u) on n copies of radiation is not simply equal to the tensor product ρ Hawking (u) ⊗n . But this means that the results of independent and widely separated experiments are correlated, and thus give rise to a violation of cluster decomposition. How are we to interpret predictions of the theory in such a situation? What form can these correlations between experiments take? And what is the Hilbert space interpretation of these results?
In this section we answer these questions by cutting open the path integrals described so far, to obtain a Hilbert space interpretation of the correlations between boundaries from a sum over intermediate states. Before diving in we briefly preview the central ideas, which are much the same as in [11,12,14]. The intermediate states in question are states of closed 'baby' universes which propagate between distinct asymptotic boundaries. But the resulting correlations are restricted to be purely classical, so that we may regard expectation values of asymptotic observables as random variables selected from some probability distribution. The reason is that such asymptotic observables can be regarded as a mutually commuting set of operators acting on the Hilbert space of baby universes, which can be simultaneously diagonalised into superselection sectors. It therefore appears that semiclassical gravity predicts results for asymptotic observers which are consistent with BH unitarity, though the precise dynamics is not uniquely determined but chosen from an ensemble. That ensemble depends on a choice of the initial state of closed (baby) universes.

From path integrals to Hilbert spaces
To set the stage, we begin by briefly reviewing the relationship between the path integral computations of quantum amplitudes and their Hilbert space formulation, emphasizing features relevant to gravitational theories.

A Hilbert space appears when we cut a path integral into pieces, writing the integral over the cut as a sum over intermediate states. Before incorporating dynamical gravity, let us discuss this for a QFT path integral on a fixed background spacetime M, and cut the geometry along a Cauchy surface Σ of our choice. This cut manifold has two new boundaries Σ − and Σ + , the past and future sides of Σ respectively. We can now perform the path integral on this manifold with boundary, imposing boundary conditions that the fields φ approach φ ± on the boundaries Σ ± (for example), and integrating over φ elsewhere. To obtain the original path integral on M, we ensure continuity of the fields at Σ by setting φ + = φ − = φ Σ , and then integrate over all field values φ Σ on Σ.
This cutting and gluing has a Hilbert space interpretation as the insertion of a resolution of the identity, $1 = \int D\phi_\Sigma\, |\phi_\Sigma\rangle\langle\phi_\Sigma|$. We have a Hilbert space $\mathcal{H}_\Sigma$, formally spanned by field eigenstates $|\phi_\Sigma\rangle$ labelled by field configurations on Σ, where the inner product $\langle\phi_+|\phi_-\rangle$ is given by a functional delta-function setting $\phi_+ = \phi_-$. In a semiclassical approximation, where the path integral is computed by fluctuations around a saddle-point with approximately Gaussian weighting, this Hilbert space $\mathcal{H}_\Sigma$ becomes a Fock space for linearised fluctuations about the saddle.
This is somewhat complicated by the inclusion of dynamical gravity, when we also sum over the topology and geometry of spacetime. As in QFT we can cut the path integral along some Cauchy surface Σ, and include the geometry of Σ in the sum over intermediate states. But diffeomorphism invariance makes this more subtle. We have many choices of slice Σ that all lead to the same Hilbert space as long as they agree asymptotically (where the geometry is fixed by boundary conditions). These different choices are related by the Hamiltonian constraint or Wheeler-DeWitt equation. We will not need the technical details here, but we do note that this modifies the inner product on the Hilbert space associated with the cut. Due to gauge invariance, it is natural that distinct field configurations on Σ (now including the induced metric) need not define orthogonal states. But something stronger is true here, as the inner product is determined by the dynamics. Indeed, we will see below that the effect of replica wormholes can be described as a dynamical modification of the inner product on the Hilbert space at the cut.
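For the fixed-background QFT case, the cutting and gluing can be written out explicitly. The following is a schematic sketch: the evolution operator $U$ and the time labels $t_i < t_\Sigma < t_f$ are notation introduced here for illustration and do not appear in the text.

```latex
\begin{equation}
\langle \phi_f |\, U(t_f,t_i)\, | \phi_i \rangle
  = \int D\phi_\Sigma\;
    \langle \phi_f |\, U(t_f,t_\Sigma)\, | \phi_\Sigma \rangle\,
    \langle \phi_\Sigma |\, U(t_\Sigma,t_i)\, | \phi_i \rangle .
\end{equation}
```

Each factor on the right is a path integral on one piece of the cut manifold with the field fixed to $\phi_\Sigma$ on the cut, and the integral over $\phi_\Sigma$ is the gluing step. In the gravitational case the delta-function inner product implicit in this formula is what gets replaced by a dynamically determined one.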
In the context of evaporating black holes, we saw that the semiclassical path integral was helpful for computing observables on I + before some retarded time u E at which the black hole becomes Planckian and the semiclassical treatment is no longer valid. For this situation we are most naturally led to describe a gravitational Hilbert space describing the states on a partial Cauchy slice Σ u which, as part of the boundary conditions, is required to meet I + at time u. This would describe a system with a boundary. However, it will be conceptually simpler and cleaner to consider instead a Hilbert space of closed universes without boundary. As will be described in section 6.2 below, the simplest way to pass from the former to the latter is by making complete measurements on I + . However, as in section 4 this comes at the cost of requiring some assumptions about the evaporation. We will thus at first revive the 'PS assumption' of section 4 in order to explain the main ideas involved in passing to a description in terms of closed universes. We will use this assumption for the next few sections, though in section 6.5 we will describe the modifications required to avoid it, and in fact to avoid using any assumption outside the domain of semiclassical physics.

Hilbert spaces for Hawking and Polchinski-Strominger
Rather than going directly to the replica wormholes of most interest, we will warm up by discussing the Hilbert space interpretation of the Hawking and Polchinski-Strominger calculations of sections 3 and 4. In particular, for now we will make use of the 'PS assumption' introduced in section 4 to simplify the discussion.
We begin with Hawking's calculation using a single black hole and computing the expectation value of some operator O on $\mathcal{I}^+$. The Hawking wormhole computing this expectation value consists of bra and ket copies of the black hole spacetime joined along some final Cauchy slice. Using the PS assumption, we may choose this joining Cauchy slice to consist of $\mathcal{I}^+$ and $\Sigma_{\rm int}$, a Cauchy surface for the black hole interior. To obtain a Hilbert space interpretation, we can first cut this geometry along $\mathcal{I}^+$, where we obtain the Hilbert space of 'out states' $\mathcal{H}_{\mathcal{I}^+}$. We choose an orthonormal basis $\{|i\rangle_{\mathcal{I}^+}\}_i$ for this space. However, cutting only along $\mathcal{I}^+$ is not sufficient to write the expectation value as a sum over states, since the geometry is still connected through the black hole interior. We must thus also slice the geometry along $\Sigma_{\rm int}$, obtaining a Hilbert space $\mathcal{H}_{\rm int}$ with orthonormal basis $\{|a\rangle_{\rm int}\}_a$.
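Once we have cut along both $\mathcal{I}^+$ and $\Sigma_{\rm int}$, we have decomposed the geometry into disconnected bra and ket copies of the Hawking spacetime as shown in figure 18. The path integral on the ket spacetime with boundary conditions imposed on $\mathcal{I}^+$ and $\Sigma_{\rm int}$ computes the wavefunction $\psi_{ai}$ of a state in $\mathcal{H}_{\mathcal{I}^+} \otimes \mathcal{H}_{\rm int}$:
$$|\psi\rangle = \sum_{a,i} \psi_{ai}\, |i\rangle_{\mathcal{I}^+} \otimes |a\rangle_{\rm int}. \tag{6.1}$$
The path integral on the conjugate bra spacetime gives the complex conjugate of this wavefunction, which is a state in the dual space $\mathcal{H}^*_{\mathcal{I}^+} \otimes \mathcal{H}^*_{\rm int}$. To glue these spacetimes back together along $\Sigma_{\rm int}$, we sum over all states of the interior and take the inner product, obtaining the Hawking density matrix for the state on $\mathcal{I}^+$:
$$\rho_{\rm Hawking} = \sum_{i,j,a,b} \bar\psi_{bj}\, \psi_{ai}\, \langle b|a\rangle_{\rm int}\, \big(|i\rangle\langle j|\big)_{\mathcal{I}^+}. \tag{6.2}$$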
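To make the gluing step concrete, the sketch below evaluates the density matrix of (6.2) for a small toy wavefunction, assuming an orthonormal interior basis so that the interior inner product is a Kronecker delta. The two-state examples and the helper names `reduced_density_matrix` and `purity` are hypothetical and chosen only for illustration.

```python
def reduced_density_matrix(psi):
    """Radiation density matrix from a wavefunction psi[a][i].

    a labels interior states and i labels radiation states. Assuming an
    orthonormal interior basis, rho[i][j] = sum_a psi[a][i] * conj(psi[a][j]).
    """
    n_int = len(psi)
    n_rad = len(psi[0])
    return [
        [
            sum(psi[a][i] * psi[a][j].conjugate() for a in range(n_int))
            for j in range(n_rad)
        ]
        for i in range(n_rad)
    ]


def purity(rho):
    """Tr(rho^2): equals 1 for a pure state and is smaller for a mixed state."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n)).real


s = 2 ** -0.5
# Interior and radiation maximally entangled: the radiation state is mixed.
rho_entangled = reduced_density_matrix([[s + 0j, 0j], [0j, s + 0j]])
# No entanglement with the interior: the radiation state is pure.
rho_product = reduced_density_matrix([[1 + 0j, 0j], [0j, 0j]])
```

Here `purity(rho_entangled)` is close to 1/2 while `purity(rho_product)` is 1, illustrating that the mixedness of the radiation comes entirely from tracing over the interior.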

An orthonormal basis on $\Sigma_{\rm int}$ gives $\langle b|a\rangle_{\rm int} = \delta_{ab}$, and we have
$$\rho_{\rm Hawking} = \sum_{i,j,a} \bar\psi_{aj}\, \psi_{ai}\, \big(|i\rangle\langle j|\big)_{\mathcal{I}^+}. \tag{6.3}$$
This is a mixed state because we have traced out the black hole interior, with which the matter on $\mathcal{I}^+$ is entangled. The Hilbert space on $\mathcal{I}^+$ is thus insufficient to give a complete description of the original state of the system on $\mathcal{I}^-$. We must also include information about the state on $\Sigma_{\rm int}$, which we may think of as a closed 26 'baby' universe, born from the black hole formed in the 'parent' universe. In introducing this terminology, we warn the reader that there are two slightly different notions of baby universe in the literature. When required for clarity (mostly in section 6.5), we will use the term 'PS baby universe' to distinguish the above notion from others that may arise.
So far, this is a fairly conventional description of information loss. But we will go beyond this by considering the computations of section 4 that involve n copies of the black hole. The Polchinski-Strominger wormholes consist of multiple copies of the Hawking wormhole, so to obtain a Hilbert space interpretation we can again slice them along n copies of $\mathcal{I}^+$, where we have n copies of the asymptotic Hilbert space $\mathcal{H}^{\otimes n}_{\mathcal{I}^+}$, and along n copies of $\Sigma_{\rm int}$. After cutting them open in this way, for each term in (4.9) the resulting n 'ket' spacetimes are identical. In particular, they compute the wavefunction of the state
$$|\psi^{(n)}\rangle = \sum_{i_1,\dots,i_n}\ \sum_{a_1,\dots,a_n} \psi_{a_1 i_1} \cdots \psi_{a_n i_n}\, |i_1\rangle_{\mathcal{I}^+_1} \otimes \cdots \otimes |i_n\rangle_{\mathcal{I}^+_n} \otimes |a_1,\dots,a_n\rangle_{\rm BU} \tag{6.4}$$
in $\mathcal{H}^{\otimes n}_{\mathcal{I}^+} \otimes \mathcal{H}_{\rm BU}$, where $\mathcal{H}_{\rm BU}$ is the Hilbert space of closed (baby) universes. We obtain the density matrix on $\mathcal{H}^{\otimes n}_{\mathcal{I}^+}$ by tracing out the baby universes,
$$\langle i_1,\dots,i_n|\,\rho^{(n)}\,|j_1,\dots,j_n\rangle = \sum_{a_1,\dots,a_n}\ \sum_{b_1,\dots,b_n} \psi_{a_1 i_1}\bar\psi_{b_1 j_1} \cdots \psi_{a_n i_n}\bar\psi_{b_n j_n}\, \langle b_1,\dots,b_n|a_1,\dots,a_n\rangle_{\rm BU}. \tag{6.5}$$
Since we have obtained $\mathcal{H}_{\rm BU}$ by cutting along n copies of $\Sigma_{\rm int}$, it is tempting to identify $\mathcal{H}_{\rm BU}$ with the n-fold tensor product of $\mathcal{H}_{\rm int}$.
In that case its inner product would factorize into n copies of the inner product on H int . But if we made this identification, the state (6.4) would be simply the n-fold tensor product |ψ ⊗n , and the density matrix ρ (n) in (6.5) would be the tensor product (ρ (1) ) ⊗n . In particular, we would not find the sum over permutations in (4.4). We will resolve this tension below by not making assumptions about the inner product on H BU , but instead by computing the inner product induced by PS wormholes. Replica wormholes lead to similar modifications to the inner product that we will discuss in section 6.4.
Specifically, the correct inner product on $\mathcal{H}_{\rm BU}$ must be obtained by comparing (6.5) with (4.9). Since the PS wormholes involve pairing the n 'ket' copies of $\Sigma_{\rm int}$ with the n 'bra' copies in any of the n! possible ways, this inner product involves a sum over permutations:
$$\langle b_1,\dots,b_n|a_1,\dots,a_n\rangle_{\rm BU} = \sum_{\pi\in{\rm Sym}(n)} \delta_{a_1 b_{\pi(1)}} \cdots \delta_{a_n b_{\pi(n)}}. \tag{6.6}$$
26 We say that the baby universe is closed as the boundary of $\Sigma_{\rm int}$ at E involves a sphere of zero size. We will comment further on this in section 7.

Note that this is the inner product on the n-fold symmetric product $\mathrm{Sym}^n \mathcal{H}_{\rm int}$. As a result, in the Polchinski-Strominger proposal, the baby universe Hilbert space contains a Bosonic Fock space built on the 'one-universe states' $\mathcal{H}_{\rm int}$:
$$\mathcal{H}_{\rm BU} \supset \bigoplus_{n=0}^{\infty} \mathrm{Sym}^n \mathcal{H}_{\rm int}. \tag{6.7}$$
We have written 'contains' (⊃) here, since this is not in fact quite the full baby universe Hilbert space. As we will see below, it is natural to extend $\mathcal{H}_{\rm BU}$ to be a Fock space built on $\mathcal{H}_{\rm int} \oplus \mathcal{H}^*_{\rm int}$, with both baby universes and time-reversed 'anti baby universes'. The second summand $\mathcal{H}^*_{\rm int}$ (the dual space of $\mathcal{H}_{\rm int}$) gives the states of a single anti-universe.
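The permutation sum in (6.6) is easy to evaluate by brute force for small n, which makes the bosonic character of the inner product visible. The sketch below is only an illustration: the function name `bu_inner_product` and the string labels are hypothetical, and treating states with different universe numbers as orthogonal is an assumption, since (6.6) is only stated for equal numbers of labels.

```python
from itertools import permutations


def bu_inner_product(bra, ket):
    """Sum over permutations pi of prod_k delta(ket[k], bra[pi(k)]), as in (6.6).

    bra and ket are sequences of single-universe labels. States with different
    numbers of universes are taken to be orthogonal (an assumption).
    """
    if len(bra) != len(ket):
        return 0
    n = len(ket)
    return sum(
        all(ket[k] == bra[pi[k]] for k in range(n))
        for pi in permutations(range(n))
    )
```

For example, `bu_inner_product(("a", "b"), ("b", "a"))` is 1, so the ordering of labels is irrelevant, while `bu_inner_product(("a", "a"), ("a", "a"))` is 2, so states with repeated labels are not unit-normalised. Both features are those of an unnormalised symmetric-product basis.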

Baby universes and ensembles
The physical predictions that follow from the state (6.5) defined by the inner product (6.6) may not immediately be clear. We will describe this in some detail below, taking advantage of the fact that the Polchinski-Strominger proposal is simple enough to allow explicit results. The result will remain useful when we later move beyond the Polchinski-Strominger proposal (and leave behind its challenges), as many of the lessons learned here will remain true for replica wormholes, and also for gravitational path integrals more generally (under certain weak assumptions).

The PS Fock space of baby universes
The predictions of the Polchinski-Strominger proposal can be made manifest by using the familiar representation of the Bose inner product (6.6) as a Gaussian integral:
$$\langle b_1,\dots,b_n|a_1,\dots,a_n\rangle_{\rm BU} = \big\langle \alpha_{a_1}\cdots\alpha_{a_n}\, \bar\alpha_{b_1}\cdots\bar\alpha_{b_n} \big\rangle_{\rm BU}, \tag{6.8}$$
where
$$\big\langle F[\alpha,\bar\alpha] \big\rangle_{\rm BU} = \frac{1}{Z} \int \prod_a d\alpha_a\, d\bar\alpha_a\ e^{-\sum_a \bar\alpha_a \alpha_a}\, F[\alpha,\bar\alpha]. \tag{6.9}$$
The normalisation Z is chosen so that $\langle 1\rangle_{\rm BU} = 1$. The integration variables are complex 27 'baby universe fields' $\alpha_a$ labelled by an orthonormal basis of states $|a\rangle$ on $\Sigma_{\rm int}$. In place of the labels a we could instead use field configurations φ for matter fields on $\Sigma_{\rm int}$, so that α is a functional of these fields. Then the Gaussian weighting can be written as an exponential of $-\int D\phi\, \alpha[\phi]\,\alpha^*[\phi]$, where we integrate over all field configurations φ. We then compute amplitudes $\langle\,\cdot\,\rangle_{\rm BU}$ by integrating over all functionals α[φ] with this weighting. 28
27 We use complex fields so that we only count contractions between 'kets' and 'bras', not between two kets, for example. This distinguishes baby universes from 'anti' baby universes.
28 Passing from the gravitational path integral to this integral over functionals α is mathematically analogous to the passage from particle dynamics to a description of quantum field theory as a path integral over field configurations. In this analogy, φ would label points in spacetime, α would be a quantum field (a function of spacetime), and the Gaussian weighting (for a free QFT) is given by the action. The kinetic term in the QFT action can then be understood as arising due to the Hamiltonian constraint on particle worldlines, which we have implemented rather implicitly in (6.9) by diagonalising the physical 'single-universe' inner product. However, the reader should see [99] for comments and warnings about using this analogy to interpret the physics.
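The Gaussian representation can be checked numerically for a single label a. The sketch below assumes a unit-covariance complex Gaussian, so that the average of α times its conjugate is 1, and estimates the lowest moments by Monte Carlo. The function names, the sample size and the seed are arbitrary illustrative choices.

```python
import random


def sample_alpha(rng):
    """One complex Gaussian variable with <alpha conj(alpha)> = 1 and <alpha alpha> = 0."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / 2 ** 0.5


def estimate_moments(n_samples=200000, seed=0):
    """Monte Carlo estimates of <|alpha|^2>, <|alpha|^4> and <alpha^2>."""
    rng = random.Random(seed)
    m2 = 0.0
    m4 = 0.0
    m_aa = 0j
    for _ in range(n_samples):
        a = sample_alpha(rng)
        r2 = abs(a) ** 2
        m2 += r2
        m4 += r2 * r2
        m_aa += a * a
    return m2 / n_samples, m4 / n_samples, m_aa / n_samples
```

Up to sampling error, the first two moments approach 1 and 2, matching the one and two permutations that contribute to the one-universe and two-universe norms in (6.6). The third approaches zero, reflecting that contractions between two 'kets' are absent for complex fields.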

With this representation, we can write the components of the n-evaporation density matrix (6.5) on n copies of $\mathcal{I}^+$ as
$$\langle i_1,\dots,i_n|\,\rho^{(n)}\,|j_1,\dots,j_n\rangle = \big\langle \bar\Psi_{j_1}\cdots\bar\Psi_{j_n}\, \Psi_{i_1}\cdots\Psi_{i_n} \big\rangle_{\rm BU}, \tag{6.10}$$
where
$$\Psi_i = \sum_a \alpha_a\, \psi_{ai}. \tag{6.11}$$
If we for now ignore the integral over α associated with $\langle\,\cdot\,\rangle_{\rm BU}$ and instead simply fix each $\alpha_a$ to some specific value, then the expression completely factorises between the n copies, and also between 'bra' and 'ket' indices:
$$\langle i_1,\dots,i_n|\,\rho^{(n)}\,|j_1,\dots,j_n\rangle\Big|_{\text{fixed }\alpha} = \bar\Psi^\alpha_{j_1}\cdots\bar\Psi^\alpha_{j_n}\, \Psi^\alpha_{i_1}\cdots\Psi^\alpha_{i_n}. \tag{6.12}$$
Here we have included a superscript α to emphasise that $\Psi^\alpha_i$ is now to be regarded as a fixed complex number depending on our choice for each $\alpha_a$. This factorisation means that, for a given value of $\alpha_a$, the Hawking radiation can be described by a pure state $|\Psi^\alpha\rangle \in \mathcal{H}_{\mathcal{I}^+}$:
$$|\Psi^\alpha\rangle = \sum_i \Psi^\alpha_i\, |i\rangle_{\mathcal{I}^+}. \tag{6.13}$$
Now, the above potential factorisation property is spoiled by the fact that we must still integrate over α with some weighting. In other words, the parameters $\alpha_a$ are not fixed, but instead selected from a probability distribution. Note, however, that the same choice of α parameter pertains to all asymptotic observers at all boundaries. In particular, for n black holes the state at $\mathcal{I}^+$ is obtained by a single integral over α,
$$\rho^{(n)} = \int d\mu(\alpha)\, \big(|\Psi^\alpha\rangle\langle\Psi^\alpha|\big)^{\otimes n}, \tag{6.14}$$
for some measure dµ(α) (which in the PS paradigm is given by the Gaussian (6.9)). As a result, any given set of actual measurements 29 of the Hawking radiation states on $\mathcal{I}^+$ from multiple black holes are correlated in such a way that they are always compatible with n copies of some pure state $|\Psi^\alpha\rangle$. But the theory does not give a specific prediction for $|\Psi^\alpha\rangle$. Instead, it gives a probabilistic one. For an asymptotic observer, the black hole formation and evaporation can thus be described in terms of an S-matrix taking pure states to pure states, but with an unknown S-matrix selected from an ensemble.
This should be contrasted with the result obtained in the absence of PS wormholes for which the n copies are uncorrelated. Since the inner product on $\mathcal{H}_{\rm int}$ can also be written as an integral with respect to the same Gaussian measure 30 dµ(α), we may write this result using independent integrals over α for each evaporation:
$$\rho^{(n)}_{\rm Hawking} = \Big( \int d\mu(\alpha)\, |\Psi^\alpha\rangle\langle\Psi^\alpha| \Big)^{\otimes n} = \big(\rho_{\rm Hawking}\big)^{\otimes n}. \tag{6.15}$$
29 We remind the reader that quantum mechanical measurements are associated with projection operators and that, while expectation values can be inferred from the relative frequencies of the outcomes associated with projections, a direct measurement of quantum mechanical expectation values would violate the linearity of quantum mechanics.
30 Since $\mathcal{H}_{\rm int}$ gives the n = 1 term in (6.7), we may regard $\mathcal{H}_{\rm int}$ as a subspace of $\mathcal{H}_{\rm BU}$ and use the same inner product.
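The difference between a single shared integral over α and independent integrals can be made explicit for the swap operator on two copies. The following is a sketch under stated assumptions: the measure is the Gaussian of the PS proposal, so that Wick contractions pair each barred factor with an unbarred one with $\langle \bar\Psi_j \Psi_i\rangle_{\rm BU} = (\rho_{\rm Hawking})_{ij}$, the swap convention is $U_{\rm swap}|i_1,i_2\rangle = |i_2,i_1\rangle$, and the states are left unnormalised.

```latex
\begin{align}
\text{shared } \alpha:\qquad
\mathrm{Tr}\big(\rho^{(2)} U_{\rm swap}\big)
  &= \sum_{i_1,i_2} \big\langle \bar\Psi_{i_2}\bar\Psi_{i_1}\Psi_{i_1}\Psi_{i_2} \big\rangle_{\rm BU}
   = \big(\mathrm{Tr}\,\rho_{\rm Hawking}\big)^2 + \mathrm{Tr}\big(\rho_{\rm Hawking}^2\big), \\
\text{independent } \alpha:\qquad
\mathrm{Tr}\big(\rho_{\rm Hawking}^{\otimes 2}\, U_{\rm swap}\big)
  &= \mathrm{Tr}\big(\rho_{\rm Hawking}^2\big).
\end{align}
```

The extra term $(\mathrm{Tr}\,\rho_{\rm Hawking})^2 = 1$ in the first line is the contribution that survives when the Hawking purity is exponentially small, in line with the radiation on each copy being described by the same pure state.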

We dub this the Hawking result for the n-fold experiment, as the predictions of (6.15) for experiments at $\mathcal{I}^+$ are given by a Hawking \$-matrix [31]. But with PS wormholes we find instead (6.14), which is a classical mixture of n copies of a pure state as described above. Note that the measure dµ(α) arising from the Polchinski-Strominger proposal gives a complex multivariate Gaussian probability distribution for the components $\Psi_i = \langle i|\Psi\rangle$ of the Hawking radiation wavefunction. The mean is zero, and the covariance matrix is given by the Hawking density matrix $\rho_{\rm Hawking}$. 31
To understand this from the perspective of the baby universe Hilbert space, we can instead represent the Bose inner product (6.6) in terms of a Fock space generated by baby universe creation and annihilation operators,
$$|a_1,\dots,a_n,\bar b_1,\dots,\bar b_m\rangle = A^\dagger_{a_1}\cdots A^\dagger_{a_n}\, B^\dagger_{b_1}\cdots B^\dagger_{b_m}\, |{\rm HH}\rangle. \tag{6.16}$$
Here, $A_a$ and $A^\dagger_a$ annihilate and create a baby universe in the state a. Similarly $B_b$ and $B^\dagger_b$ annihilate and create a time-reversed object that may be called an 'anti' baby universe (an anti-BU). 32 Although anti-BUs did not appear in our discussion above, they naturally form the intermediate states if we considered the time-reverse of our boundary conditions (associated with a white hole that explodes to form a smooth $\mathcal{I}^+$ with classical matter and quantum fields in the vacuum state but with time-reversed Hawking radiation at $\mathcal{I}^-$). In (6.16), we have used $|{\rm HH}\rangle$ to denote the oscillator vacuum in order to think of it as a Hartle-Hawking state for reasons that we will explain momentarily. Now, the Gaussian integral (6.9) is nothing but the expectation value of (complex) 'position operators' $\hat\alpha_a$ in the oscillator vacuum $|{\rm HH}\rangle$. All the operators $\hat\alpha_a$ and $\hat\alpha^\dagger_a$ mutually commute, so we can write the Hilbert space in terms of the position eigenbasis $|\alpha\rangle_{\rm BU}$ labelled by a set of complex eigenvalues $\alpha_a$:
$$\hat\alpha_a\, |\alpha\rangle_{\rm BU} = \alpha_a\, |\alpha\rangle_{\rm BU}. \tag{6.21}$$
31 This gives non-normalised wavefunctions. While the normalisations can be absorbed into the measure, doing so appears to introduce a mild n-dependence for dµ(α). One might also consider the possibility of additional contributions involving wormholes connecting the path integral for $\rho^{(n)}$ to the normalising denominator $\mathrm{Tr}\,\rho^{(n)}$, which should be expected to remove this n dependence. It would be worthwhile to understand this issue in detail, but such a treatment is beyond the scope of this work.
32 These are much the same as the baby universe creation/annihilation operators of [11][12][13], though those references worked in a real basis. There may also be minor differences associated with subtleties discussed in [99].
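One realisation of the commuting 'position operators' that is consistent with the description above can be sketched as follows. The specific combination $\hat\alpha_a = A^\dagger_a + B_a$ and the canonical commutators are assumptions made here for illustration, chosen so that $\hat\alpha_a$ creates a baby universe or annihilates an anti baby universe. Signs and normalisations may differ from the conventions intended in the text.

```latex
\begin{align}
[A_a, A^\dagger_b] &= [B_a, B^\dagger_b] = \delta_{ab},
\qquad A_a|{\rm HH}\rangle = B_a|{\rm HH}\rangle = 0, \\
\hat\alpha_a &= A^\dagger_a + B_a, \qquad \hat\alpha^\dagger_a = A_a + B^\dagger_a, \\
[\hat\alpha_a, \hat\alpha^\dagger_b] &= [A^\dagger_a, A_b] + [B_a, B^\dagger_b]
   = -\delta_{ab} + \delta_{ab} = 0, \\
\langle{\rm HH}|\,\hat\alpha^\dagger_b\,\hat\alpha_a\,|{\rm HH}\rangle
  &= \langle{\rm HH}|\,A_b A^\dagger_a\,|{\rm HH}\rangle = \delta_{ab},
\qquad
\langle{\rm HH}|\,\hat\alpha_a\,\hat\alpha_b\,|{\rm HH}\rangle = 0 .
\end{align}
```

With these assumptions the vacuum two-point functions agree with those of a unit-covariance complex Gaussian, and since all the $\hat\alpha_a$, $\hat\alpha^\dagger_a$ commute, they can be diagonalised simultaneously.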

Lessons and comments
Having completed our derivation of (6.19) from this Hilbert space point of view, let us now pause to extract some useful lessons. The first lesson is that the appearance of only a single integral over α in the n-evaporation state (6.14) follows from the fact that the states (6.21) simultaneously diagonalize the operators $\hat\alpha_a$ and $\hat\alpha^\dagger_a$. The latter statement is a consequence of a more primitive fact that will remain true when we go beyond the PS proposal, in that the boundary conditions for computing expectation values of asymptotic observables will continue to define simultaneously-diagonalizable operators acting on the Hilbert space $\mathcal{H}_{\rm BU}$ of closed universes.
Let us first illustrate this rather abstract-sounding statement by recalling that, in the present case, we have boundary conditions $\Psi_i$ specifying both an initial state of matter on $\mathcal{I}^-$ which will collapse to a black hole as well as a final state $|i\rangle_{\mathcal{I}^+}$ of Hawking radiation on $\mathcal{I}^+$. The corresponding operator
$$\hat\Psi_i = \sum_a \hat\alpha_a\, \psi_{ai} \tag{6.24}$$
on $\mathcal{H}_{\rm BU}$ either creates a baby universe in some internal state or annihilates a time-reversed baby universe. The path integral computes an expectation value of a product of such operators, one for each separate boundary. 33 It is manifest from (6.24) that the operators are all built from the (complex) 'position' operators $\hat\alpha_a$, and in particular that creation operators $A^\dagger_a$ never appear alone. Similarly, the time-reversed boundary conditions $\bar\Psi_j$ would define operators involving $\hat\alpha^\dagger_a$, which thus also commute with (6.24).
Although we used explicit results for the $\mathcal{H}_{\rm BU}$ inner product to derive this result, as argued in [22] it in fact follows from fundamental properties of the gravitational path integral. The point is simply that we may regard (6.10) as an amplitude computed by the quantum gravity path integral with the specified boundary conditions built from $\Psi_i$, $\bar\Psi_j$. Since the path integral sums over all bulk spacetimes compatible with the stated boundary conditions, the result is independent of how the boundary conditions might have been ordered. As a consequence, the associated operators $\hat\Psi_i$, $\hat{\bar\Psi}_j$ commute. 34
33 One may thus refer to $\hat\Psi$ as a 'boundary-inserting operator'. Indeed, it is tempting to refer to these as 'boundary-creating' operators. But one should realize that both $\hat\Psi$ and its adjoint $\hat{\bar\Psi}_i = \sum_a \hat\alpha^\dagger_a\, \bar\psi_{ai}$ create boundaries in this sense. These are thus not standard creation-annihilation operators, and in particular differ from the baby universe creation and annihilation operators $A$, $B$, $A^\dagger$, $B^\dagger$.
34 Equation (6.10) describes the inner product of two states that involve only baby universes and not anti-BUs, or in other words states created from $|{\rm HH}\rangle$ by acting with the $\hat\Psi_i$ and not the $\hat{\bar\Psi}_j$. Had we used

Another lesson that becomes clear from the perspective of the baby universe Hilbert space is a sense in which our predictions depend on what we may call the 'initial state of baby universes' given by (6.23). Indeed, the correlations between different copies of Hawking radiation are mediated by the exchange of baby universes, and we have seen that each set of asymptotic boundary conditions modifies the state of $\mathcal{H}_{\rm BU}$ through the action of $\hat\Psi_i$ or $\hat{\bar\Psi}_j = \hat\Psi^\dagger_j$. In the previous sections we thus have implicitly chosen some initial state for closed universes. But recall that our amplitudes were entirely specified by boundary conditions associated with the experiments to be performed by our asymptotic observer, and that nothing more was added to adjust the baby universe state. As a result, our choice of baby universe state must have been specified by the absence of additional asymptotic boundaries. It is for this reason that we call it a Hartle-Hawking no boundary state $|{\rm HH}\rangle$. Note that we do not use this term for a state of a single connected closed universe, but a state that can include any number of universes (connected components of space) including zero; indeed, the universe number is not even diffeomorphism invariant if universes can split, join, or appear from nothing. Instead, it is defined according to the spirit of [100] by the absence of boundaries in the path integral which determines the wavefunction.
Had we instead chosen the baby universe initial state to be e.g.Ψ i |HH , expectation values in this state would be adding additional boundaries with boundary conditions Ψ i (from the ket) andΨ i (from the bra). Since we can again expandΨ i |HH in terms of the same basis of α-states, we would again find the n-evaporation density matrix ρ (n) to be a classical mixture of the same pure states |Ψ α described above. However, we will find a different mixture in which the probability distribution for α a ,ᾱ b is defined by the new wavefunction α|Ψ i |HH , which will generally differ from (6.23).
Finally, before proceeding to replica wormholes, we pause to note that the Hilbert space interpretation on which we have concentrated thus far is not unique. It arises from one particular way of cutting the path integral, regarding n sets of boundary conditions as forming a 'ket' state, and taking an overlap with the n conjugate boundary conditions forming the 'bra' state. The same path integral can also be cut in several different ways, giving rise to different Hilbert space interpretations -though always involving the same baby universe Hilbert space H BU . Any such cut splits asymptotic boundaries into two sets, depending on which side of the cut they lie. One set defines a 'ket' state and the other defines the 'bra', with the overlap between the two being obtained by a sum over intermediate baby universe states joining the two sets.
The different interpretations are readily be described in the operator language by writing an amplitude as the expectation values of products ofα a ,α † a in the Hartle-Hawking state |HH (or in another state). Since these operators all commute, we can move any subset of them to the right where they act on the 'ket' state and move the remainder to the latter in the ket-state as well, there would have been additional entries of theΨj on the right-handside. But if we had used the latter in the bra-state, we would instead find additional copies of the Ψi on the right. In general, the rule is that the amplitude contains both the bra boundary conditions and the CPT-conjugates of the ket boundary conditions. This requires theΨi to be the adjoint ofΨi, so that the above mutual-commutativity means that the operators can be simultaneously diagonalized as claimed; see further discussion in [22] and [99].

JHEP04(2021)272
the left to act on the 'bra.' And we can finally insert a complete set of baby universe states between them.
We illustrate this with a simple example computing the $n = 2$ amplitude. We interpreted this earlier as the overlap between the states $\hat{\Psi}_{i_1}\hat{\Psi}_{i_2}|HH\rangle$ and $\hat{\Psi}_{j_1}\hat{\Psi}_{j_2}|HH\rangle$, so that the intermediate states consisted of two baby universes. Alternatively, we could reorder the boundary-inserting operators $\hat{\Psi}$, $\hat{\Psi}^\dagger$ to write the amplitude as the overlap between states $\hat{\Psi}_{i_1}\hat{\Psi}_{j_1}^\dagger|HH\rangle$ and $\hat{\Psi}_{i_2}\hat{\Psi}_{j_2}^\dagger|HH\rangle$. The intermediate states are then $|HH\rangle$ (corresponding to the trivial contribution where the black hole interiors are not swapped) and states $|a,\bar{b}\rangle$ of one baby universe and one anti-universe (corresponding to the nontrivial PS wormhole). This interpretation, (6.26), is not the most natural one if we wish to describe an intermediate state in real time. Indeed, it is somewhat analogous to describing intermediate states exchanged in the T-channel of some QFT scattering process, which would naively be associated with a Hilbert space for the QFT associated to a timelike surface that splits space into two parts (as opposed to the usual Hilbert spaces associated with spacelike Cauchy surfaces). However, in the operator description above the intermediate states continue to lie in the same baby universe Hilbert space $\mathcal{H}_{BU}$. This Hilbert space description will prove useful in the context of replica wormholes. In particular, it will be straightforward to adapt this description to incomplete measurements at $\mathcal{I}^+$ by taking a partial sum over the indices $i$, $j$ to trace out the unobserved piece of the state.

Replica wormholes as baby universe interactions
We now incorporate the replica wormholes introduced in section 5 into our discussion of baby universes. We can think of the Polchinski-Strominger proposal discussed above as a theory of 'free' baby universes, in the sense that $\mathcal{H}_{BU}$ is a Bosonic Fock space. The replica wormholes then modify the inner product on $\mathcal{H}_{BU}$ by incorporating 'interactions' between baby universes.
For now, we will continue to make the PS assumption that allows us to treat the union of $\mathcal{I}^+$ and $\Sigma_{int}$ as a Cauchy surface for an evaporating black hole, where we remind the reader that $\Sigma_{int}$ runs from the regular origin below the black hole singularity out to the endpoint of evaporation $E$. This is a useful crutch to simplify the exposition but, as we will explain later, we will be able to upgrade the argument so as to remove this assumption. In addition, for simplicity here we will only consider replica wormholes such that the island $I$ on which we join the replicas lies inside the event horizon. This is well-motivated, since a QES is guaranteed to lie behind the event horizon under the assumption of the quantum focussing conjecture [77] (though this does not directly apply to replica wormholes for $n > 1$). In such a case, we can choose our Cauchy surface $\Sigma_{int}$ for the black hole interior to pass through $\gamma = \partial I$.

Now, just as the Polchinski-Strominger wormholes induced extra terms in the inner product (6.6) by pairing baby universes with permutations, the replica wormholes introduce new terms with a permutation acting only on the associated island. We thus write

$$\langle b_1,\ldots,b_n|a_1,\ldots,a_n\rangle_{BU} \supset \big(\langle b_1|\otimes\cdots\otimes\langle b_n|\big)\,U_\pi(I)\,\big(|a_1\rangle\otimes\cdots\otimes|a_n\rangle\big), \tag{6.27}$$

where the notation $\supset$ means 'contains terms of the following form'. The states and inner products on the right-hand side of this equation are taken in the tensor product of $n$ copies of the black hole interior Hilbert space, $\mathcal{H}_{int}^{\otimes n}$. The operator $U_\pi(I)$ acts as the permutation $\pi$ on those parts of the $n$ copies of $\mathcal{H}_{int}$ associated with the island $I$. If we take $I = \Sigma_{int}$, we recover the terms in (6.6). We can be a little more explicit by choosing a basis of states $|a\rangle = |a', a''\rangle$ for $\mathcal{H}_{int}$ which factorises between an orthonormal basis of states $|a'\rangle$ for the island $I$ and a corresponding basis $|a''\rangle$ for its complement:

$$\langle b_1,\ldots,b_n|a_1,\ldots,a_n\rangle_{BU} \supset \delta_{b'_1 a'_{\pi(1)}}\,\delta_{b''_1 a''_1}\cdots\delta_{b'_n a'_{\pi(n)}}\,\delta_{b''_n a''_n}. \tag{6.28}$$

Note that adding analogous terms to the inner product in a continuum quantum field theory would naturally give a vanishing contribution. Indeed, in direct parallel with our discussions on $\mathcal{I}^+$, for $n$ copies of a given state they would compute $e^{-S_n^{QFT}[I]}$, where $S_n^{QFT}[I]$ is the Rényi entropy of $I$. Such contributions are then exponentially suppressed by the area of $\gamma$ in units of the cutoff. 35 However, as we discovered by computing amplitudes with the path integral in section 5.2, making gravity dynamical naturally leads to finite contributions from the terms on the right-hand side of (6.27) or (6.28). Thus we should think of the states $|a\rangle$ as encoding not only the matter state, but also geometrical degrees of freedom (perhaps including the location of $\gamma$ in the split $|a', a''\rangle$).
Note that if we interpret the sum over states $|a\rangle$ in a natural way as a sum over real Lorentzian geometries, the saddle-point replica wormholes discussed in section 5 do not appear directly since they are complex. The direct sum over states $|a\rangle$ will be a sum of highly oscillatory phases, which (as is familiar from steepest descent integrals) can be evaluated by deforming the contour. For further discussion see [66].
Using language analogous to that of the Feynman diagrams of perturbative QFT, we can think of the contribution (6.27) to the baby universe inner product as an interaction, giving a 'vertex' for $n \to n$ 'scattering' of baby universes. Indeed, we can borrow standard techniques from perturbative QFT to compute the associated effect on the expression in (6.9) for the inner product in terms of integrals over $\alpha$. To include a replica wormhole, we insert a product of $n$ $\alpha$ fields and $n$ $\bar{\alpha}$ fields, summing over indices to induce the required connections. For example, for $n = 2$ we insert a term

$$\sum_{a'_1,a''_1,a'_2,a''_2} \alpha_{a'_1,a''_1}\,\bar{\alpha}_{a'_2,a''_1}\,\alpha_{a'_2,a''_2}\,\bar{\alpha}_{a'_1,a''_2} \tag{6.29}$$

into the integrand on the right-hand side of (6.9), where the first index on each $\alpha$ labels the state on the island and the second labels the state on its complement. Summing over all possible combinations of replica wormholes exponentiates this factor (and similar terms for all $n$) so that it modifies the original measure $d\mu(\alpha)$ from $e^{-\sum_a \alpha_a\bar{\alpha}_a}$ to a rather complicated non-Gaussian measure.
On the other hand, aside from this modification of the inner product (and the corresponding changes to the wavefunction of the Hartle-Hawking state and the algebra of universe creation and annihilation operators), there are no further changes to either the arguments or the conclusions of section 6.3. In particular, the expression for the n-evaporation density matrix given in equation (6.14) remains true, with the modified measure $d\mu(\alpha)$ described above. The details of this measure are not of primary importance for us, except that the modified measure is dominated by states $|\Psi_\alpha\rangle$ of radiation which follow the Page curve (see section 7.1 for justification).

Baby universes with semiclassical control: dropping the PS assumption
The above sections developed the notion of PS baby universes and the associated $\mathcal{H}_{BU}$ using the PS assumption. This allowed us to give a very explicit treatment of the 'saddle-point geometries', the associated amplitudes, and the resulting inner product on $\mathcal{H}_{BU}$. However, it turns out that the most important lessons from the Hilbert space interpretation do not rely on the PS assumption. These lessons include i) the existence of a baby universe Hilbert space $\mathcal{H}_{BU}$, ii) that the inner product on $\mathcal{H}_{BU}$ is determined by the path integral, and iii) the fact that asymptotic quantities define simultaneously diagonalizable operators on $\mathcal{H}_{BU}$ and the associated existence of superselection sectors.
As we now show, all of these results can be derived using physics that remains fully under semiclassical control. However, the arguments are necessarily more abstract than those using the PS assumption above. Some readers may thus choose to skip this section on a first reading of this paper.
To proceed, we follow the same basic strategy as in our study of Rényi entropies in section 5. Indeed, we will obtain a Hilbert space interpretation by slicing open the path integrals and the associated replica wormhole saddles discussed in section 5.1. We thus specify the state on $\mathcal{I}^+$ only on the subset $\mathcal{I}_u$, choosing $u < u_E$ so that $\mathcal{I}_u$ does not intersect the future light cone of $E$. We will then sum over all boundary conditions on the rest of $\mathcal{I}^+$. We may then expect the relevant saddles to remain under semiclassical control as desired.
In particular, since we impose boundary conditions only on $\mathcal{I}_u$, we may cut the path integral along Cauchy surfaces $\Sigma_u$ which extend to meet the asymptotic boundary $\mathcal{I}^+$ at the associated retarded time $u$. We may then further choose $\Sigma_u$ to be well to the past of both the singularity and $E$.
In a replica setting, we require several such cuts. The resulting Hilbert spaces $\mathcal{H}_n$ associated with such cuts are labelled by the number $n$ of boundaries on which these cuts end. 36 Although in the Hawking saddle it arises from $n$ copies of some given $\Sigma_u$, we emphasize that $\mathcal{H}_n$ is not just the product $\mathcal{H}_1^{\otimes n}$ due to contributions from replica wormholes. In particular, as we discuss below, the Hilbert space $\mathcal{H}_0$ without boundaries is not the trivial Hilbert space, but should instead be the space $\mathcal{H}_{BU}$ of closed universes.
With this small change in boundary conditions, most of the considerations above will continue to hold. We simply take $|i\rangle$ to label a basis of states on $\mathcal{I}_u$ rather than on the entirety of $\mathcal{I}^+$, and we take $|a\rangle$ to be a basis of states on a Cauchy surface $\Sigma_u$ meeting the boundary at time $u$. If we consider the path integral for any single copy of the spacetime, truncate it at $\Sigma_u$, and impose boundary conditions for the quantum fields on $\mathcal{I}_u$ and $\Sigma_u$, the result computes a wavefunction $\sum_{i,a}\psi_{ai}(u)\,|a\rangle\otimes|i\rangle$ on $\Sigma_u \cup \mathcal{I}_u$ for a state in $\mathcal{H}_1\otimes\mathcal{H}_u$ (with $\mathcal{H}_u$ the Hilbert space on $\mathcal{I}_u$).
Furthermore, we can write states on the n-boundary Hilbert space $\mathcal{H}_n$ as linear combinations of a basis $|a_1,\ldots,a_n\rangle$. The notation here is similar to that used above for $n$ baby universes, but there is a crucial difference. Because we treat asymptotic boundaries as distinguishable, the order of the $a_i$ is important. The states $|a_1,a_2\rangle$ and $|a_2,a_1\rangle$ are not the same, and $\mathcal{H}_n$ does not exhibit Bosonic statistics. This is associated with the fact that we specify the asymptotic identifications between Cauchy slices $\Sigma_u$ as part of the boundary conditions, so there can be no terms in the inner product that permute copies of $\Sigma_u$ in their entirety.
Nonetheless, we still find contributions to the inner product from replica wormholes. Such contributions again permute island regions $I$ just as in equations (6.27), (6.28):

$$\langle b_1,\ldots,b_n|a_1,\ldots,a_n\rangle \supset \big(\langle b_1|\otimes\cdots\otimes\langle b_n|\big)\,U_\pi(I)\,\big(|a_1\rangle\otimes\cdots\otimes|a_n\rangle\big) \tag{6.31}$$

$$\phantom{\langle b_1,\ldots,b_n|a_1,\ldots,a_n\rangle} = \delta_{b'_1 a'_{\pi(1)}}\,\delta_{b''_1 a''_1}\cdots\delta_{b'_n a'_{\pi(n)}}\,\delta_{b''_n a''_n}. \tag{6.32}$$

In the second line we have split the index $a$ in two, so the state $|a\rangle = |a', a''\rangle$ on $\Sigma_u$ is labelled by the state $a'$ on the island $I$ and $a''$ on its complement $\Sigma_u \setminus I$, where $\Sigma_u \setminus I$ now extends to infinity. Translating the discussion above to this notation, the boundary conditions require that the $a''$ indices must be paired without permutation, while replica wormholes give rise to the permutation $\pi$ acting on the $a'$ indices.

Again we may use $\Psi_i$ to denote the boundary condition that fixes both matter at $\mathcal{I}^-$ that collapses to form the black hole and a state $|i\rangle$ on the Hilbert space of states on $\mathcal{I}_u$. And again we may take $\Psi_i$ to define an operator $\hat{\Psi}_i$ on the Hilbert spaces $\mathcal{H}_n$. But now this operator adds a boundary, increasing the value of $n$. Thus we write $\hat{\Psi}_i(u) : \mathcal{H}_n \to \mathcal{H}_{n+1}$. Using our bases, the action of this operator takes the form

$$\hat{\Psi}_i(u)\,|a_1,\ldots,a_n\rangle = \sum_a \psi_{ai}(u)\,|a,a_1,\ldots,a_n\rangle. \tag{6.33}$$

Because boundaries are distinguishable, it is important that we added the extra label $a$ to the first slot (we could, if desired, define other versions of $\hat{\Psi}_i(u)$ which choose a different ordering). In particular, it means that these operators no longer commute. And in any case we cannot talk about diagonalizing them since they map between different Hilbert spaces. Intuitively, this is because the Hilbert spaces $\mathcal{H}_n$ carry information not just about the closed baby universes, but also about the state that escapes to $\mathcal{I}^+$ after time $u$. We would thus like to 'trace out' this extra information, leaving only the piece of the state truly associated with baby universes and which mediates the correlations on $\mathcal{I}_u$ and gives rise to the Page curve.

… boundary' (associated with the anti-baby universes discussed below), such that the most general Hilbert space $\mathcal{H}_{n,\bar{n}}$ for the case of sphere boundaries is associated with $n$ boundaries and $\bar{n}$ anti-boundaries. (There is a natural linear isomorphism from the dual space $\mathcal{H}_{n,\bar{n}}^*$ to $\mathcal{H}_{\bar{n},n}$.) Finally, while one might at first expect the Hilbert space to also depend on the retarded times $u$ associated with the location of these spheres on $\mathcal{I}^+$, choosing a notion of time-translation on $\mathcal{I}^+$ allows one to canonically identify Hilbert spaces with different values of $u$.

Figure 19. The boundary conditions corresponding to the operator $\hat{\rho}_{ij}(u)$. We have flat asymptotic regions as pictured, with matter boundary conditions at $\mathcal{I}_u$ specified by the states $|i\rangle$, $|j\rangle$. The Cauchy slices meeting $\mathcal{I}^+$ at time $u$ must be identified as shown in the asymptotic region. We do not specify what happens to the spacetime away from the asymptotic region, inside the dashed curve. In particular, this allows the spacetime to connect with other boundaries, and the future Cauchy slices need not be identified in the same way in their entirety. For example, the spacetimes in figure 15 involve two copies of such boundary conditions, computing Tr($\rho^{(2)}$ …

Replica wormhole baby universes
A convenient but abstract method of avoiding the extra information involves using the adjoint operators $\hat{\Psi}_i^\dagger(u)$. By definition, the adjoint operators map between Hilbert spaces in the opposite direction to $\hat{\Psi}_i$, so we have $\hat{\Psi}_i^\dagger(u) : \mathcal{H}_{n+1} \to \mathcal{H}_n$. The compositions $\hat{\rho}_{ij}(u) := \hat{\Psi}_j^\dagger(u)\,\hat{\Psi}_i(u)$ are then operators that map $\mathcal{H}_n$ to itself for any $n$, and for $n = 0$ in particular act within the closed universe Hilbert space $\mathcal{H}_0 = \mathcal{H}_{BU}$.
More concretely, the adjoint operator $\hat{\Psi}_j^\dagger(u)$ acts by inserting a conjugate boundary (with boundary condition labelled by $j$) and gluing it to an asymptotic boundary associated with the state on which it acts (the first such boundary, since in (6.33) we defined $\hat{\Psi}_i(u)$ to add a boundary in the first slot). As in our discussion above, this gluing of asymptotic boundaries requires a corresponding gluing of the respective spacetimes on the asymptotic part of $\Sigma_u$. But again we allow all possible gluings deeper in the bulk, and in particular we allow nontrivial replica wormholes.
The composition $\hat{\rho}_{ij}(u) = \hat{\Psi}_j^\dagger(u)\,\hat{\Psi}_i(u)$ thus acts by inserting a boundary condition corresponding to a complete in-in contour, as shown in figure 19. This is the boundary condition one would choose for computing the components of the density matrix of Hawking radiation on $\mathcal{I}_u$, which justifies the choice of notation.

Using the general argument from [22] reviewed in section 6.3.2 above, it follows that the operators $\hat{\rho}_{ij}(u)$ mutually commute on $\mathcal{H}_{BU}$. Furthermore, they can be simultaneously diagonalised by a basis of α-states, giving rise to superselection sectors and ensembles as before. In a given superselection sector (an α-state), the eigenvalues $\rho^\alpha_{ij}(u)$ of the $\hat{\rho}_{ij}(u)$ are interpreted as the components of the density matrix for the Hawking radiation in that superselection sector that emerges before time $u$.
As before, the superselection sectors mean that Hawking radiation emerging from one black hole evaporation is correlated with that emerging from another. These are classical correlations, described by a classical probability distribution determined by the decomposition into α-states of the specified baby universe state from $\mathcal{H}_{BU}$ (which in the cases discussed above is the Hartle-Hawking no-boundary state $|HH\rangle$).
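As a minimal numerical sketch of this statement, one may consider a hypothetical toy ensemble with just two superselection sectors, in each of which the radiation from a single evaporation is a pure one-qubit state. The sector weights `p`, the states `rho_alpha`, and the helper names `kron` and `mix` are illustrative assumptions and are not taken from the analysis above. The two-evaporation state is then the classical mixture $\sum_\alpha p_\alpha\, \rho_\alpha \otimes \rho_\alpha$, whose outcome statistics differ from those of the uncorrelated product of averaged states.

```python
def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def mix(weights, mats):
    """Weighted sum  sum_k w_k * M_k  of equally sized square matrices."""
    n = len(mats[0])
    return [[sum(w * M[r][c] for w, M in zip(weights, mats))
             for c in range(n)] for r in range(n)]


# Hypothetical toy ensemble (an assumption for illustration only):
# two equally likely sectors, each with a pure one-qubit radiation state.
p = [0.5, 0.5]
rho_alpha = [
    [[1.0, 0.0], [0.0, 0.0]],  # sector 1: radiation in |0>
    [[0.0, 0.0], [0.0, 1.0]],  # sector 2: radiation in |1>
]

rho_avg = mix(p, rho_alpha)                      # single-evaporation state
rho_2 = mix(p, [kron(r, r) for r in rho_alpha])  # two evaporations, same sector
rho_prod = kron(rho_avg, rho_avg)                # uncorrelated comparison

# Diagonal entries are joint probabilities for outcomes 00, 01, 10, 11.
joint_sectors = [rho_2[k][k] for k in range(4)]
joint_product = [rho_prod[k][k] for k in range(4)]
print(joint_sectors)  # [0.5, 0.0, 0.0, 0.5]
print(joint_product)  # [0.25, 0.25, 0.25, 0.25]
```

In this toy model the two observers always obtain the same outcome, although each outcome on its own looks maximally random, which is the sense in which the correlations are strict but classical.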
In this language, the swap Rényi entropies computed in section 5 were amplitudes of the form

$$\langle HH|\,\hat{\rho}_{i_1 j_1}(u)\cdots\hat{\rho}_{i_n j_n}(u)\,|HH\rangle. \tag{6.34}$$

That is, they were correlation functions in the Hartle-Hawking state of products of the $\hat{\rho}_{ij}(u)$. The replica wormholes gave particular contributions to (6.34) and, as explained in section 5.4, they should also contribute to more general amplitudes of the $\hat{\rho}_{ij}(u)$.
If we insert intermediate states of $\mathcal{H}_{BU}$ between insertions of $\hat{\rho}$ in (6.34), the resulting Hilbert space interpretation generalises (6.26) above. We give some interpretation of the intermediate states due to replica wormholes in a moment. But this is not the most natural way to describe intermediate states of baby universes in a real-time process, between consecutive experiments on different black holes. For a Hilbert space description that achieves this aim, see appendix B.

What is a baby universe?
We now pause to more carefully explore the notion of baby universe associated with this replica wormhole construction of $\mathcal{H}_n$ and $\mathcal{H}_{BU}$. We will refer to the result as a replica wormhole baby universe (RWBU), to contrast it with the Polchinski-Strominger notion of baby universe (PSBU) discussed in sections 6.2-6.4.
In particular, let us consider the intermediate states $\hat{\rho}_{ij}(u)|HH\rangle \in \mathcal{H}_{BU}$ that mediate the correlations between boundaries in (6.34). Since this is the natural object in a replica wormhole discussion, we will call it an RWBU. Similar states were considered at the end of section 6.3.2, where using the PS assumption we interpreted them as containing both a PS baby universe and a PS anti-baby universe. This conclusion must be slightly modified when we consider replica wormholes instead of PS wormholes, and in particular when we wish to avoid universes with Planckian curvature such as those that end at $E$. However, we can still think of our intermediate state as naturally containing two parts. The first part may be called a 'partial baby universe', consisting of only the island region on some $\Sigma_u$, while the second part is a partial anti-baby universe. The two must match at the boundary of the island, and they are joined at $\gamma = \partial I$. We could thus perhaps refer to this RWBU as a 'BU-anti-BU bound state'. See figure 20(a) (left), where the RWBU consists of the red slice (labelled $I$) and the teal slice (labelled $I^*$), joined at $\gamma$.

(a) Each of the $n$ replicas (as shown on the left) making up a replica wormhole has topology $S^{D-1}$ times an interval, with the asymptotic boundary at one end of the interval, and two conjugate copies of the island ($I$ and $I^*$ from 'ket' and 'bra' spacetimes respectively) joined along their common $S^{D-2}$ boundary $\gamma$ at the other. A Euclidean continuation resembles the geometry on the right, which could be described in terms of propagation of a RWBU with topology $S^{D-1}$, of which $I$ and $I^*$ make up the Northern and Southern hemispheres respectively.
(b) To build a replica wormhole, we sew n of the constituents above together along I and I * . We picture the resulting Euclidean spacetimes for n = 2, 3, which we can think of as propagation of a RWBU (n = 2, left), or an interaction of RWBUs (n ≥ 3, right).

Figure 20. We may describe the correlations between sets of Hawking radiation arising from replica wormhole spacetimes as mediated by exchange of 'replica wormhole baby universes' (RWBUs) appearing in intermediate states such as $\hat{\rho}_{ij}(u)|HH\rangle \in \mathcal{H}_{BU}$. Such a state is somewhat unusual in Lorentzian signature, but has a natural Euclidean continuation.
As described after (6.26), this notion of 'intermediate state' may seem unnatural in Lorentz signature. Nevertheless, it is the correct Lorentzian continuation of a natural Euclidean notion of intermediate state. To see this, note that a Euclidean version of the boundary condition $\hat{\rho}_{ij}(u)$ is a spacetime which asymptotes to a closed Euclidean manifold $B$. For our case of a black hole formed from collapse, $B$ has the topology $S^{D-1}$ (for a $D$-dimensional spacetime), with the two hemispheres of $S^{D-1}$ corresponding to 'ket' and 'bra' segments of the boundary, joined along an asymptotic spatial $S^{D-2}$. Figure 20(a) (right) shows the resulting Euclidean continuation of each replica. A replica wormhole will join $n$ such boundaries as shown in figure 20(b) for $n = 2, 3$.
For example, for $n = 2$ the topology of spacetime is $B$ times an interval, with a boundary lying at each end of the interval. It is then very natural to describe this cylinder in terms of a baby universe with topology $B$ propagating between the two boundaries. For example, [19] constructed replica wormholes in a two-dimensional spacetime for a two-sided black hole, with topology of a cylinder for $n = 2$, a pair of pants for $n = 3$ and so forth (as in figure 20(b)). From the Euclidean perspective, it is natural to think of such wormholes as describing interactions between closed universes. And when we analytically continue to Lorentzian signature in the correct sense to describe our density matrix boundary conditions, we arrive at precisely the situation described above. The 'island' from any ket part of the Lorentzian replica wormhole spacetime is then just half of $B$, with the other half being the island from a bra part of the replica wormhole.

Summary
This work has focussed on describing semiclassical expectations for experiments performed on Hawking radiation collected at $\mathcal{I}^+$ in an asymptotically flat spacetime. To formulate and perform the relevant computations, we used the Lorentz-signature gravitational path integral, which in the semiclassical limit involves a sum over saddle-points. In gravity, as in field theory, classifying all possible saddles tends to be rather difficult, so in practice one works to identify interesting saddles and hopes that they dominate the amplitudes of interest. We thus began with the familiar Hawking saddle described in the form of figure 9, which can be used to compute a density matrix $\rho(u)$ for the Hawking radiation arriving at $\mathcal{I}^+$ before some retarded time $u$, and hence expectation values of operators acting on that radiation or associated (Rényi) entropies. If $u$ is sufficiently to the past of the future lightcone of the endpoint of evaporation $E$, the geometry of the Hawking saddle is weakly curved, and all perturbative corrections are small. Of course, at late retarded times the resulting entropies of the Hawking radiation far exceed the Bekenstein-Hawking entropy of the remaining black hole. We refer to this phenomenon as a violation of Bekenstein-Hawking unitarity (BH unitarity).
No part of our later discussion caused any direct modification of the above conclusions. However, we noted that observers who possess only a single copy of a system cannot experimentally measure its entropy. We thus imagined experiments to verify the above violation of BH unitarity that involved forming and evaporating n identical black holes, collecting the decay products of each, and identifying the 'early' subset of each collection that was emitted before some particular retarded time. We then asked our experimenter to measure a swap operator that acts as a permutation among these n early subsets, but which leaves the remaining late subsets fixed. Such observations performed on many copies of identical-but-independent quantum states give a direct way of measuring entropies, and we accordingly refer to the associated expectation values as 'swap entropies'.
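The relation between such a swap measurement and an entropy can be checked in a small sketch for $n = 2$. Here the radiation from one evaporation is modelled, purely as an illustrative assumption, by a pure state `psi[a][b]` with `a` labelling the early subset and `b` the late subset; the function names are hypothetical. The expectation value of the operator that swaps only the early labels between two identical copies reproduces $\mathrm{Tr}(\rho_{early}^2)$, which is a standard identity.

```python
import math


def reduced_density_matrix(psi):
    """rho[a][a2] = sum_b psi[a][b] * conj(psi[a2][b]) for the early subset."""
    dA, dB = len(psi), len(psi[0])
    return [[sum(psi[a][b] * psi[a2][b].conjugate() for b in range(dB))
             for a2 in range(dA)] for a in range(dA)]


def purity(rho):
    """Tr(rho^2) for a Hermitian matrix given as nested lists."""
    d = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(d) for j in range(d)).real


def swap_expectation(psi):
    """<psi|<psi| SWAP_early |psi>|psi>, exchanging only the early labels."""
    dA, dB = len(psi), len(psi[0])
    total = 0j
    for a1 in range(dA):
        for b1 in range(dB):
            for a2 in range(dA):
                for b2 in range(dB):
                    bra = (psi[a1][b1] * psi[a2][b2]).conjugate()
                    ket = psi[a2][b1] * psi[a1][b2]  # early labels exchanged
                    total += bra * ket
    return total.real


s = 1.0 / math.sqrt(2.0)
bell = [[s, 0.0], [0.0, s]]         # early and late maximally entangled
product = [[1.0, 0.0], [0.0, 0.0]]  # early and late uncorrelated

for name, psi in (("bell", bell), ("product", product)):
    print(name, swap_expectation(psi), purity(reduced_density_matrix(psi)))
```

The two printed numbers agree for each state, so $-\log$ of the swap expectation value gives the second Rényi entropy of the early subset in this simple setting.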
In the limit where the $n$ black holes are well separated, we may approximate each black hole formation and evaporation as occurring in a separate asymptotically flat region of spacetime. The boundary conditions on our gravitational path integral then involve $n$ separate asymptotic boundaries. But in performing the sum over all geometries with such boundary conditions we allowed for so-called 'spacetime wormholes', which we define as geometries which connect distinct asymptotic boundaries. Such geometries introduce correlations between the $n$ sets of early Hawking radiation, so that the state $\rho^{(n)}(u)$ of these $n$ sets is not in fact equal to the tensor product $\rho(u)^{\otimes n}$. By this mechanism, one might hope that observables such as the swap entropies will nevertheless be compatible with BH unitarity.
Such an approach was advocated by Polchinski and Strominger in [14]. They considered including in the path integral a class of 'PS wormhole' spacetimes shown in figure 11, where the various interior connections between copies of ρ(u) are 'swapped' in all possible ways relative to the Hawking saddle. We reviewed this proposal in section 4 to introduce the idea without the technicalities of replica wormholes, finding swap entropies that share certain features with BH-unitarity-compatible Page curves. However, on closer inspection this model continues to violate BH unitarity (and perhaps also causality), as well as requiring us to regard the PS wormholes, which contain a singular and strongly-curved region E , as saddle-points.
Nevertheless, we saw in section 5 that there are other saddles which resolve these issues. In particular, our swap experiments receive contributions from replica wormholes analogous to those described in [18,19], which are closely associated with the quantum extremal surfaces studied in [20,21]. We thus briefly reviewed results and arguments from those references, translating them to our Lorentz-signature asymptotically flat setting. The result is then that our swap n-Rényi entropies, at least in the $n \to 1$ limit where calculations are more tractable, perfectly reproduce the Page curve associated with BH unitarity. This is powerful evidence in favor of the idea that there is an operational sense in which the Bekenstein-Hawking entropy is indeed the black hole density of states.
Finally, section 6 incorporated these ideas into a conceptual framework for the gravitational path integral in the presence of spacetime wormholes, building on the insights of Coleman and Giddings and Strominger [11][12][13]. A particular goal was to reconcile the operational verification of the Page curve described above with the apparent failure of BH unitarity associated with Rényi entropies of the Hawking-saddle Hawking radiation. To do so, we sought a Hilbert space interpretation by cutting open the relevant contributions to the path integral. This led us to slice open spacetime wormholes along surfaces which do not meet asymptotic boundaries, and which we associated with a Hilbert space of closed 'baby' universes $\mathcal{H}_{BU}$. The correlations between multiple sets of Hawking radiation can then be understood as arising from a sum over intermediate states of these baby universes.
It is crucial that the correlations between sets of Hawking radiation are both strict and classical; i.e., any observer who forms and evaporates identical black holes will find identical sets of Hawking radiation, but the particular radiation state obtained may be thought of as being chosen from a classical probability distribution. To explain this feature, we considered the expectation values or matrix elements of asymptotic observables in particular states and other asymptotic quantities that one might expect to be c-numbers. We found that such quantities in fact yield operators acting on $\mathcal{H}_{BU}$, defined by inserting the relevant boundary conditions in the gravitational path integral. For example, there is an operator $\hat{\rho}_{ij}(u)$ on $\mathcal{H}_{BU}$ for each $ij$ component of the density matrix of Hawking radiation before time $u$. But as argued in [22], all such operators can be simultaneously diagonalized. In particular, it is easy to show that they mutually commute. Since we defined the path integral to sum over all topologies with the required boundary conditions, the output of the path integral cannot depend on any ordering of the multiple disconnected boundaries. See also footnote 34 for the argument that these operators are normal, in the sense that they also mutually commute with their adjoints. This means that $\mathcal{H}_{BU}$ splits into superselection sectors for the algebra of asymptotic observables. In other words, there is a basis of simultaneous eigenstates $|\alpha\rangle$ of $\mathcal{H}_{BU}$ for all such operators. The correlations between multiple sets of Hawking radiation can then be described as classical correlations from a probability distribution of superselection sectors.
Explicitly, applying this to calculations of the density matrix of Hawking radiation we can write 3) The first line writes our n-copy density matrix as an expectation value in the 'no-boundary' state HH ∈ H BU of baby universes, which was an implicit choice in our earlier calculations. By inserting a complete set of α states, we write this as an average over superselection sectors, with probability measure dµ(α) = | α|HH | 2 dα (where dα is defined so that the completeness relation dα |α α| = 1 holds). This defines the notation of the final line, where we write this as an expectation value of random variables ρ ij (u) selected from the ensemble defined by the measure dµ. And while the set of all possible α-values will be determined by state-independent considerations involving the algebra of our operators on H BU , the formulae above make manifest that our results depend on the choice of state |HH ∈ H BU through the measure µ(α). A different choice of state results in a different measure, with an extreme example being an α-state giving δ-function measure. This framework finally allows us to reconcile the entropy results described above. So long as the initial baby universe state is |HH , the state of Hawking radiation from any given black hole is ρ Hawking (u) = ρ(u) . Its entropy grows with u and fails to follow the Page curve due to entanglement with baby universes. But, as is always the case for superselection sectors, this entanglement is unobservable. Evaporating additional black holes induces further entanglement with the same baby universe states, correlating the decay products so that measurements designed to deduce the entropy produce the Page curve with the help of replica wormholes. 37 In particular, the swap test provides a measurement of Tr(ρ(u) n ) ; it does not measure Tr( ρ(u) n ). 
Thus as emphasized in [29], replica wormholes do not compute the 'true' von Neumann entropy of the state of the radiation; instead they give the entropy of the state projected to a typical superselection sector [22]. It is important that the value of $\mathrm{Tr}(\rho(u)^n)$ in almost any given α-sector will be exponentially close to the average value $\langle\mathrm{Tr}(\rho(u)^n)\rangle$ computed by replica wormholes. The Page curve is therefore a robust prediction, accurate up to exponentially small corrections. We JHEP04(2021)272 superselection sector. Even for a single evaporation event, such a description must yield a density matrix that reproduces the Page curve, and for a single evaporation must do so without the help of replica wormholes. We might then ask: what have we gained from the above considerations? 38 To explore this question and the associated physics of superselection sectors, we will introduce some ideas that are not directly apparent from our considerations so far, but which were studied in more detail in [22].
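The distinction between the purity within a sector and the purity of the sector-averaged state can be illustrated numerically with a small toy. The sketch below assumes, purely for illustration, that the radiation state in each sector can be mimicked by the reduced state of a random pure state on a 'radiation' factor of dimension `d_rad` and a 'black hole' factor of dimension `d_bh`. This stand-in and all function names are hypothetical and are not taken from the text.

```python
import random


def random_state(d_rad, d_bh, rng):
    """Random pure state on radiation (x) black hole, as a d_rad x d_bh array.

    Normalising a complex Gaussian vector gives a uniformly random unit vector.
    """
    psi = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d_bh)]
           for _ in range(d_rad)]
    norm = sum(abs(x) ** 2 for row in psi for x in row) ** 0.5
    return [[x / norm for x in row] for row in psi]


def reduced_density_matrix(psi):
    """Trace out the black hole factor: rho[i][j] = sum_b psi[i][b] * conj(psi[j][b])."""
    d_rad = len(psi)
    return [[sum(a * b.conjugate() for a, b in zip(psi[i], psi[j]))
             for j in range(d_rad)] for i in range(d_rad)]


def purity(rho):
    """Tr(rho^2), which equals the sum of |rho_ij|^2 for a Hermitian matrix."""
    return sum(abs(x) ** 2 for row in rho for x in row)


def sector_statistics(d_rad=16, d_bh=4, samples=400, seed=0):
    """Compare the average of Tr(rho^2) with Tr of the squared average state.

    Each sample plays the role of one hypothetical 'sector' in this toy.
    Returns (average purity, purity of the averaged state, smallest purity seen).
    """
    rng = random.Random(seed)
    purities = []
    mean_rho = [[0j] * d_rad for _ in range(d_rad)]
    for _ in range(samples):
        rho = reduced_density_matrix(random_state(d_rad, d_bh, rng))
        purities.append(purity(rho))
        for i in range(d_rad):
            for j in range(d_rad):
                mean_rho[i][j] += rho[i][j] / samples
    return sum(purities) / samples, purity(mean_rho), min(purities)


if __name__ == "__main__":
    avg_pur, pur_avg, min_pur = sector_statistics()
    print("average of Tr(rho^2):", avg_pur)
    print("Tr(<rho>^2):         ", pur_avg)
    print("minimum Tr(rho^2):   ", min_pur)
```

In this toy with `d_rad > d_bh` (loosely, 'after the Page time'), every individual sample has purity of at least `1/d_bh`, while the averaged state is close to maximally mixed with purity near `1/d_rad`. This mirrors the distinction between $\langle\mathrm{Tr}(\rho^n)\rangle$ and $\mathrm{Tr}(\langle\rho\rangle^n)$ for n = 2.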
First, however, we should briefly discuss the predictive status of a framework involving superselection sectors. One issue is that, as pointed out by [103], the correlations between successive experiments mean that we cannot use a strict frequentist interpretation for the 'probability' of getting a particular state of the radiation. Any given observer will decohere onto a branch of the wavefunction with a baby universe state tightly concentrated around some particular superselection sector. But it is natural to instead interpret 'probabilities' of α-states as minimal Bayesian priors, assigning credences to different possible superselection sectors and thus to particular states for the Hawking radiation. Now, for general baby universe initial states this perspective allows us to make definite predictions only when certain features are common to all allowed superselection sectors, or when we consider self-averaging observables with parametrically sharply peaked probability distributions. But this is also the only sense in which frequentist probability makes definite predictions for standard systems, though in that context one may use the number of experiments as a parameter controlling the width of the distribution.
It is important to emphasize that measuring the actual state of Hawking radiation is tantamount to experimentally determining, at least in part, the α-sector in which we live. The situation is thus much the same as when working with a theory with unknown free parameters (and indeed, we could identify these parameters with coefficients in the effective action giving the S-matrix for black hole formation and evaporation). Alternatively, we can view the α-state as determined by the initial conditions of baby universes as described above. It thus has the same logical status as any other measurement of initial conditions, a situation which has been much discussed in cosmology and to which many of the same words will necessarily apply.
With the above as prologue, we now point out an important difference between the wormholes studied here and those studied in the late 1980's [11,12,104]. The earlier works primarily studied the effect of microscopic wormholes, and in particular of wormholes much smaller than any macroscopic scale of interest. In that case, they can be 'integrated out', and the resulting ensemble of α-states is describable as providing a distribution of random couplings for terms (such as the cosmological constant term) in a local effective action; see [105] for a recent review. Each member of the ensemble is thus a local theory on the scale of interest. But since the typical scale of replica wormholes is that of the event horizon of the black hole undergoing evaporation, integrating out such wormholes will not provide such a local effective theory on black hole scales. 39 So in our context the effect of α-states cannot be absorbed into a shift of local coupling constants in a useful way. Indeed, this is just the sort of non-locality required for the scenarios discussed in [106,107]. It thus appears that we will not obtain a local semiclassical description of superselection sectors by integrating out topology changing processes. On the other hand, we might still ask if one can find a local semiclassical description that retains such processes, but which explicitly includes an initial α-state for the baby universes. The answer to this question will hinge on whether α-states lie in the regime of semiclassical validity. 40 This seems unlikely, and semiclassical physics seems similarly unlikely to determine the precise spectrum of possible superselection sectors. The basic reason is that writing α-states in the occupation number basis leads to large weights for terms involving very large numbers of baby universes. But for exponentially large occupation numbers, the 'interactions' of baby universes (i.e., the topology changing processes which split and join universes) become important at leading order because the exponential suppression of any particular interaction is compensated by the number of possible such interactions. In this regime, there is no guarantee that $\mathcal{H}_{BU}$ has any useful semiclassical description, since it is no longer even approximately a Fock space of single universes, and we do not obtain a good approximation by truncating the path integral to any finite number of topologies. In particular, if we try to sum over the large number of semiclassical terms involved, small corrections to the semiclassical approximation in each term may accumulate to yield large corrections to the final answer. This issue is exemplified by toy models of black holes which are so simple that we may perform the path integral exactly, namely Jackiw-Teitelboim (JT) gravity [18,108] and the even simpler topological model introduced in [22].
38 As has often been stressed to us by Steve Giddings, no clear answer was provided by the discussions of wormholes and baby universes from the 1980's and early 1990's.
39 One could think of it as defining a local effective theory on scales larger than the black hole, but then the black hole itself would simply be treated as a particle with a large but finite number of internal states.
In the exact solution of these models, the superselection sectors have features expected of unitary quantum systems, but which are remarkable when appearing from a gravitational path integral: they have a discrete spectrum of black hole microstates, 41 bounded in number by the Bekenstein-Hawking entropy. But this is not manifest from the semiclassical approximation, where we expand in a small parameter of order $e^{-S_{BH}}$ which suppresses more complicated spacetime topologies. If we truncate that expansion at any finite order, we see no restriction to superselection sectors with the above features. In fact, the precise spectrum of α-states turns out to be sensitive to doubly nonperturbative effects. The effects are not merely of order $e^{-\#S_{BH}}$ as for subleading saddle-point geometries, but are of order $e^{-\#e^{S_{BH}}}$. This strong suppression is associated with their arising from an infinite sum of exponentially-suppressed geometric saddles. 42 From these considerations it would appear that α-states involve a regime where quantum fluctuations of spacetime topology are untamed. See section 5 of [22] for a more detailed discussion. It would thus appear that we can say little about individual superselection sectors using only semiclassical physics, and that we can only access averaged or other simple statistical properties. 43
40 Unless, perhaps, we introduce new objects to resum certain contributions: see section 7.3.2.
41 While there are no propagating degrees of freedom in these theories, we may nonetheless model the black hole interior by 'end-of-the-world branes' with a large number of internal states, perhaps much greater than $e^{S_{BH}}$.
42 Moreover, these effects may not be determined uniquely from the semiclassical expansion since (as is the case in JT gravity) the sum over topologies describes only an asymptotic expansion that does not converge. For JT, there is an extremely natural completion of the sum over topologies defined by Hamiltonians selected from an ensemble of random matrices, since the topological expansion precisely fits the rigid structure required by such a completion. For more realistic models it is unlikely that we will be so lucky as to identify an obvious completion.
Nonetheless, we can make much stronger statements by taking an axiomatic approach and making use of consistency conditions. Specifically, let us assume that the Hilbert spaces of intermediate states considered in section 6 are well-defined, and that they each have a positive semidefinite inner product. For example, while the replica wormholes discussed above showed that the entropy of Hawking radiation is consistent with BH unitarity on average, general consistency arguments show something much stronger, requiring consistency with BH unitarity for every superselection sector. More precisely, section 4 of [22] showed that the number of linearly independent pure states below a given energy (say, prepared by forming and partially evaporating a black hole and projecting the Hawking radiation onto various possible states) is bounded in every superselection sector by the thermodynamic entropy (defined by the inverse Laplace-transform of a Gibbons-Hawking type path integral [110] with periodic Euclidean boundary conditions). Since old black holes have large interiors and thus naively give rise to many more internal states than allowed by the Bekenstein-Hawking entropy (see e.g. [35]), such a bound requires surprising linear relations between such states (equivalently, some linear combinations of states must be unexpectedly 'null', with vanishing inner product with every other state, and so must be set equal to the zero state). This was seen very explicitly in the toy model of [22] and generalisations [111]. These relations rely on the same doubly-nonperturbative physics as discussed above in relation to α-states.
In [22], following [112], we interpreted such relations as novel nonperturbative manifestations of diffeomorphism invariance.
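The flavour of such doubly nonperturbative effects can be illustrated with a deliberately simple counting toy. It is loosely inspired by topological models of the kind cited above, but the setup below is an assumption of this sketch and is not taken from the text. Suppose the amplitude for n identical boundaries is a sum over all ways of grouping the boundaries into connected components, with each component weighted by a hypothetical parameter `lam` (to be thought of as of order $e^{S}$). The resulting moments coincide with those of a Poisson-distributed integer, so the 'spectrum' is discrete and the weight of the empty sector is `exp(-lam)`.

```python
import math


def stirling2(n, k):
    """Number of ways to split n labelled boundaries into k non-empty groups."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            # The i-th boundary either joins one of j existing groups or starts a new one.
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


def wormhole_moment(n, lam):
    """Toy n-boundary amplitude: sum over groupings, weight lam per connected component."""
    return sum(stirling2(n, k) * lam ** k for k in range(n + 1))


def poisson_moment(n, lam, dmax=200):
    """n-th moment of an integer d drawn with weight exp(-lam) * lam**d / d!."""
    total = 0.0
    weight = math.exp(-lam)  # weight of d = 0
    for d in range(dmax):
        total += (d ** n) * weight
        weight *= lam / (d + 1)
    return total


def zero_sector_weight(entropy):
    """Weight of the d = 0 sector when lam = exp(entropy): doubly exponentially small."""
    return math.exp(-math.exp(entropy))


if __name__ == "__main__":
    lam = 3.0
    for n in range(5):
        print(n, wormhole_moment(n, lam), poisson_moment(n, lam))
    print("zero-sector weight at entropy 3:", zero_sector_weight(3.0))
```

Each term in the sum over groupings is a simple polynomial in `lam`, yet the integer spacing of the spectrum and the weight `exp(-lam)` of the empty sector only become visible once the full sum is reorganised. This is the sense in which the toy is meant to echo the remarks above.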
As a result of these considerations, we have no reason to expect semiclassical physics to be a good approximation in the interiors of old black holes for an individual superselection sector. While this has of course been suggested before, the situation is now much improved because the semiclassical approximation itself suggests principled reasons to doubt its validity. The approximation predicts its own breakdown, as it should.
However, the attentive reader will still want to be assured that we have not thrown out the baby with the bathwater. If semiclassical physics is inadequate to describe old black holes in a given superselection sector, what ensures that we may still trust it in weakly gravitating regimes? In the language of [37], what is the 'niceness condition' which ensures that we may neglect topology changing processes involving replica wormholes or interactions with large baby universes in contexts where BH unitarity was not in danger?
The key observation in this regard is that replica wormholes become important only when the matter entropy is so large that the sum over internal states can compensate for the usual exponential suppression of topology change. We therefore need to consider these effects only when we have a region with entropy exceeding the area of its perimeter in Planck units; i.e., when $S \gtrsim A/4G$.
43 By focusing on clever averaged quantities, semiclassical calculations can nevertheless give more indirect hints at the structure of α-states. For example, [108,109] show that a single topology produces the 'ramp' in the spectral form factor that is characteristic of long-range eigenvalue repulsion and hence indicative of a discrete spectrum with statistics resembling that of a random matrix. However, the feature of the spectral form factor which more directly signifies a discrete spectrum (the 'plateau') appears to require summation of all topologies or going beyond a geometric description.
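The counting behind this condition can be sketched heuristically as follows. This is a rough estimate under assumed notation, not a computation from the text: a topology-changing contribution is taken to be suppressed by a factor of order $e^{-A/4G}$, while the sum over internal matter states is taken to contribute an enhancement of order $e^{S}$.

```latex
\[
\frac{\text{wormhole contribution}}{\text{leading contribution}}
\;\sim\; e^{S}\, e^{-A/4G}
\;=\; e^{\,S - A/4G},
\qquad \text{which is of order one or larger only if} \quad S \gtrsim \frac{A}{4G}.
\]
```

For regions with $S \ll A/4G$ the same estimate gives an exponentially small ratio, consistent with neglecting these effects in weakly gravitating regimes.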

Further open questions
We now close with some open questions and further comments.

AdS/CFT and the factorisation problem
A potential concern with the above conclusions is the strong tension with the traditional understanding of the AdS/CFT correspondence. The point is that this correspondence provides us with examples of theories of quantum gravity with a nonperturbative, UV complete description in terms of a dual conformal field theory, but in which there is no sign of the superselection sectors that we inferred from the existence of replica wormholes.
To be specific, in the asymptotically AdS context, our considerations point to the idea that semiclassical gravity should be dual not to a single unitary CFT, but should instead be dual to an ensemble of such theories, with a different theory for each superselection sector. While examples of such dualities have been recently discovered for simple two-dimensional models of gravity [108,113], the more well-established examples of gauge/gravity duality (such as the paradigmatic duality between N = 4 super Yang Mills and type IIb string theory in AdS 5 × S 5 ) involve a unique dual theory.
This tension is not entirely new; rather, it brings to the fore an old puzzle, touched upon in section 4.2, which has become known as the factorization problem [59][60][61]. The AdS/CFT correspondence equates gravitational amplitudes with fixed asymptotically AdS boundary conditions to the partition function of a CFT, with background geometry determined by the conformal boundary of the gravitational 'bulk' spacetime. If that boundary is disconnected, locality of the CFT immediately implies that the result should factorize as the product of partition functions on each connected component. But this result is surprising from the gravitational point of view: contributions from bulk spacetimes that connect different boundary components appear to spoil the above factorization property, but it seems arbitrary to exclude such spacetimes from the gravitational path integral. From the point of view of the baby universe Hilbert space discussed in section 6, factorization requires that H BU is one-dimensional, so that all states of baby universes are somehow equivalent [22,114].
There has not been any entirely satisfactory resolution to this puzzle. It thus remains to be seen whether e.g. type IIb string theory in AdS 5 × S 5 has a one-dimensional H BU (perhaps due to the proper inclusion of various stringy objects and features that go beyond semi-classical supergravity), or whether this bulk theory is in fact dual to an ensemble of field theories with only one member of the ensemble being given by N = 4 super Yang Mills using the standard bulk-to-boundary dictionary. 44 In the light of replica wormholes, the factorisation problem is directly related to the black hole information problem, since the entropy computations involved wormholes connecting multiple boundaries. We do not immediately require factorisation for the entropies, since the boundary conditions for separate boundary components are correlated in a way which explicitly spoils factorisation. But if we decompose the Rényi entropies into quantities which do require factorization, it appears that the wormholes remain and spoil factorization [98]. Somewhat less concretely, as pointed out in section 4.2 a mixed state of Hawking radiation represents a failure of factorization: components of the density matrix are computed by a product of two 'S-matrix' boundary conditions, and the state is pure exactly when the amplitude similarly factorizes. Now, it may well be that physics similar to replica wormholes appears naturally for theories with a single unitary dual after some appropriate coarse-graining which explicitly spoils factorization: see e.g. [116][117][118]. But the more pertinent question for us is whether replica wormholes are relevant in a situation where we have performed no such explicit coarse-graining. Paraphrasing [98], in a situation like the standard AdS/CFT setting having factorization and without superselection sectors, can we nonetheless understand replica wormholes as the first term in a systematically improvable expansion?

Description of superselection sectors
In section 7.2, we were rather pessimistic about describing individual superselection sectors directly in terms of standard semiclassical gravitational physics. Nonetheless, there is still scope for a relatively simple description using a different language. One such idea which has appeared recently in toy models is that of 'spacetime D-branes' or 'eigenbranes' [22,108,119,120]. These are dynamical boundaries for spacetime (analogous to D-branes providing boundaries on which the string worldsheet can end) which have the effect of (perhaps partially) fixing an α-state. While these appear to be new objects in the theory, they can also be thought of as an emergent, collective description of a coherent state of baby universes (much like regarding D-branes as a coherent state of closed strings, as opposed to new fundamental objects). Does something similar apply going beyond these toy models, to theories which are rich enough to include evaporating black holes?
In the context of evaporating black holes, the idea of providing boundary conditions for spacetime in the black hole interior to produce a pure state of Hawking radiation is not new: this is essentially the final state proposal [58]. Perhaps these ideas can be revisited as an effective description of baby universe α-states. Certainly, it remains an outstanding open problem to find a more complete, and perhaps more physical, description of the transfer of information from a black hole to the outgoing Hawking radiation in each superselection sector.

Contributions from UV physics
We have been careful to make use only of low-energy physics which is well-established and tested, and in regimes where there is no reason to expect that it fails to be trustworthy. However, we cannot rule out the possibility that the quantities we have studied are sensitive to more exotic physics from the UV completion of the theory. Indeed, this may be required to solve the factorization problem in the AdS/CFT context.
One such set of ideas is the fuzzball proposal (reviewed in [121,122]), which we highlight due to some conceptual similarity with physics of an individual superselection sector discussed above. Specifically, one piece of the fuzzball proposal is that gravitational collapse does not lead to formation of a horizon, but instead there is a tunnelling event to a horizonless configuration. The amplitude to tunnel to any given configuration is small, but this is compensated for by the large number of possible states. We can compare this to the situation for superselection sectors described in 7.2, where interactions with baby universes were similarly suppressed individually, but compensated for by a large population of baby universes. One might speculate that the fuzzballs replace the baby universes, effectively selecting a distinguished α-state. But since this selection depends on fine details of the UV completion, with extra dimensions, strings, branes and so forth, the low-energy gravity is ignorant of the details: it does the best job it can in the face of its ignorance, which is to average over the possibilities. In the hope of making such a connection, we conclude with one comment: while the fuzzball literature suggests that the tunnelling event happens before the horizon forms, from our considerations we see that this is in fact unnecessary to solve the information problem. It suffices if this physics kicks in only after the Page time, when the parametrically large interior can play a role, and when large corrections to the state of Hawking radiation are required.

Spacetimes with singular causal structure
The fact that replica wormholes can provide gravitational saddles strongly suggests that spacetimes with singular causal structures play an important role in the gravitational path integral. As noted in section 5.1, the past light cone of any splitting surface γ has multiple disconnected parts. In particular, it has one such part for each of the bra-spacetimes that join at γ (and similarly one such part for each of the ket-spacetimes).
This idea that such causal singularities should be included is not new (see e.g. [15,84,86]), though its implications remain to be fully explored. One would like to understand just how general such causal singularities can be, and in particular what singularities arise in saddle-point geometries. For example, can one find saddles where splitting surfaces for replica wormholes lie outside horizons (and thus in the past of I + )? If so, how are we to understand their effects on measurements performed by asymptotic observers? Similarly, are there saddles with multiple splitting surfaces that are causally related to each other? See [123] for an example of timelike separated islands in a cosmological context. It may be possible to probe the physics of such settings using time-folds, as may be familiar from the study of out-of-time-order correlation functions. That is, instead of each replica being constructed from one branch of forward evolution ('ket') and one of backward evolution ('bra'), we add further forward and backward branches, with the possibility of nontrivial replica-wormhole-like identifications. Such time-folds might be used to connect I + with the past of a splitting surface (where the physics is understood).
Conversely, our work above took as a fundamental assumption that the low-energy gravitational path integral sums over topologies. While this is a common discussion in treatments of gravitational path integrals, and despite its utility in describing the Hawking-Page transition in AdS space [124] and defining the Hartle-Hawking no-boundary wavefunction [100], some readers will ask if there might be formulations of quantum gravity in which it fails to hold. This important issue also deserves further attention in the future.

Non-perturbative physics of baby universes
There also remain certain questions about how non-perturbative corrections will affect our discussion of baby universes. For example, as described in section 6.5.2, the Polchinski-Strominger assumption led to a certain notion of PS baby universe, while our analysis of replica wormholes led to a different notion of RW baby universe. In particular, the latter can roughly be thought of as a bound state of a PS-baby and a PS-anti-baby universe. The difference between the two was in part due to the fact that the PS assumption allowed us to discuss the path integral associated with forming a black hole and then performing a complete projective measurement at $\mathcal{I}^+$. But did the PS assumption lead to the correct conclusion? We presume the full non-perturbative theory to allow such boundary conditions, but what are the results? Do the resulting baby universes resemble the PS-babies, or does each PS-baby necessarily come attached to an anti-baby so that the result is more like the RW baby universes? Or is this question fundamentally ill-defined due to the presence of null states as described in section 7.2? And on a similar note, does the non-perturbative theory have a meaningful distinction between universes and anti-universes?

More details of unitarity
Our work above focussed on the Page curve. This is a prominent signature of BH unitarity, but it is not in itself enough to guarantee unitarity for asymptotic observers. Does semiclassical gravity make predictions that are in line with unitarity in other ways, and in more detail?
As an illustration that challenges may lie ahead, we give an example in the context of the Polchinski-Strominger proposal in section 4. In section 4.1, we found this proposal to give predictions consistent with a pure state on I − evolving to a pure state on I + (for example, as probed by the swap test). But unitary evolution also requires that the inner product is conserved, so two orthogonal states on I − should evolve to orthogonal states on I + . We can check this using a swap test, except that we now prepare two black holes with orthogonal states at I − , perhaps by throwing a particle with two possible internal states into the black hole. Unitarity demands that the expectation value of the swap operator acting on I + for these two black holes is zero. But this is not the case for the PS proposal: the expectation value is exponentially small, but nonetheless positive. 45 Thus the Polchinski-Strominger proposal does not result in a unitary S-matrix.
If we remain within the semiclassical regime, considering only experiments on the radiation before the black hole becomes too small, then we do not have such a sharp contradiction with unitarity. Nonetheless, it provides a warning that more must be checked, and motivates a careful study of the situation when we consider several different initial states.
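The orthogonality check described above rests on a standard identity: for a pure product state of two systems, the expectation value of the swap operator equals the squared overlap of the two states. The sketch below verifies this in ordinary finite-dimensional quantum mechanics. It is not a gravitational computation, and the function names are illustrative choices rather than anything defined in the text.

```python
def inner(psi, phi):
    """Inner product <psi|phi> for state vectors given as lists of amplitudes."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))


def swap_expectation(psi1, psi2):
    """Expectation value of SWAP in the product state |psi1> (x) |psi2>.

    The product state has components Psi[a][b] = psi1[a] * psi2[b], and SWAP
    exchanges the two indices, so the result is sum_ab conj(Psi[a][b]) * Psi[b][a].
    """
    d = len(psi1)
    total = 0j
    for a in range(d):
        for b in range(d):
            original = psi1[a] * psi2[b]  # component (a, b) before the swap
            swapped = psi1[b] * psi2[a]   # component (a, b) after the swap
            total += original.conjugate() * swapped
    return total.real


if __name__ == "__main__":
    up, down = [1.0, 0.0], [0.0, 1.0]
    tilted = [0.6, 0.8]
    print(swap_expectation(up, down))    # orthogonal inputs
    print(swap_expectation(up, up))      # identical inputs
    print(swap_expectation(up, tilted))  # partial overlap
```

In this setting a strictly vanishing swap expectation value is equivalent to orthogonality of the two pure states, which is why an exponentially small but positive value signals a departure from exact unitarity.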

Moving away from asymptotics
We studied black hole formation and collapse in an idealised setting, using states that were prepared and measured at asymptotic boundaries, and using experiments with multiple black holes placed in separate spacetimes. This allowed us to make very clean statements (like commutativity of operators acting on the baby universe Hilbert space), but it can only be an approximation to more realistic settings. Any actual experiment will involve experimenters subject to gravitational physics, even if only weakly. While it is natural to assume that such real-world experiments would be well-modeled by the idealized ones described above (or involving an auxiliary system coupled to AdS, or involving sharp boundary conditions imposed on finite 'cutoff' surfaces as is implicit in e.g. [25][26][27][28]), this remains to be shown in detail. In particular, our concept of cluster decomposition, in the sense that experiments on multiple black holes will approach our 'separate universe' idealisation as the separation between them is taken to infinity, is as yet only an expectation.
It is clearly of interest to explore this further, not least in the context of cosmology. Indeed, in analogy with Everett's treatment of the quantum mechanical 'measurement problem' [125], the most interesting question would appear to be what form of conceptual framework (if any) would allow a sharp discussion of experiments whose final records, and not just the intermediate steps, are subject to quantum gravity effects.

The experience of an infalling observer
Our main focus in this paper has been to compute observables defined far from the black hole, in asymptotic regions. We have not directly commented upon the more difficult question of predictions for the observers who enter the black hole. This is more challenging, since it is far from obvious how to give a gauge invariant description of such observers, who are inevitably part of the quantum system of the black hole (a situation familiar from quantum cosmology). We will not say anything definitive on this question, but we make a few comments below.
If the baby universe state is simple (as for the Hartle-Hawking state), our path integrals describing any one black hole are dominated by the usual semiclassical black hole spacetime, with a smooth interior until the singularity. This gives us no obvious reason to doubt the conventional description that an infalling observer will experience no drama at the horizon. The firewall paradox [126,127] is evaded because, in a technical sense, information is lost: the late radiation is not required to be entangled with the early radiation.
However, the situation is less clear for multiple identically prepared black holes or more complicated baby universe states. In particular, in the AdS context one could make use of an auxiliary bath system as in [21] to effectively 'measure' the α-parameters, thus decohering the different superselection sectors of the gravitating spacetime. Since infalling observers have no access to the bath, one might expect their experiences to be described by individual superselection sectors. The firewall problem then arises with full force. In addition, we must deal with the vast number of null states required by the discussion in section 7.2.

What it means to discuss physics in this context, and how it relates to previously proposed resolutions, remains a fascinating topic for both discussion and further investigation.

In the first phase (closest to $\mathcal{I}^+$), the localization at large u means that the spacetime is very close to that of a stationary black hole as any transient effects associated with the collapse will have either dispersed to distant parts of the asymptotic region (where their gravitational effect is minimal) or will have fallen into the nascent black hole. In the approximation that the region is exactly stationary, the evolution of a mode of definite frequency ω amounts to solving a Schrödinger-type scattering problem, resulting in a reflected mode R and a transmitted mode T; see again figure 2(a) (right).
The reflected mode R reaches $\mathcal{I}^-$ at late advanced times (i.e., large affine parameter v along $\mathcal{I}^-$) without leaving the Phase I region where the spacetime remains nearly stationary. As a result, R has the same positive frequency ω along $\mathcal{I}^-$ as does L along $\mathcal{I}^+$. This means that R contributes only to what are usually called the α Bogoliubov parameters (which map annihilation operators to annihilation operators) as opposed to the more interesting β Bogoliubov parameters associated with mixing between creation and annihilation operators.
On the other hand, the transmitted mode T travels through the region where the spacetime is dynamical. However, since L is localized at large u, the transmitted mode T has high frequency with respect to natural freely-falling observers. As a result, the WKB approximation may be used to justify the use of geometric optics in propagating T back to I − and completing the calculation. This is the 2nd phase of the backwards evolution that was foreshadowed above. But rather than complete the full calculation, the end result can be seen [41] by noting that in the overlap of the regions corresponding to phases 1 and 2, the T mode is localized in a region close to the horizon that is well approximated by Rindler space. Furthermore, since the corresponding Rindler time-translation coincides with the (approximate) time translation symmetry outside the black hole, in this region T is purely positive-frequency with respect to Rindler time. But any smooth state will locally approximate the Minkowski vacuum in this Rindler region. Thus the occupation numbers of T modes are thermally distributed.
The Hawking effect is thus associated primarily with the transmitted mode T . The fact that it corresponds only to the part of the original mode that was transmitted through the potential barrier into the region near the horizon during the phase 1 evolution introduces the famous "grey-body" factors into the Hawking effect. Here the name comes from the fact that when evolving modes toward the future the corresponding transmitted part would fall into the black hole and be absorbed, and also to the fact that absorption and emission coefficients must agree in thermal equilibrium. Thus the (squared) fraction of the original mode that remains present in T is naturally interpreted as the coefficient for emission of the original mode by a radiating black hole.
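For reference, the standard form of the resulting spectrum for a bosonic mode can be written as below. The symbols are assumptions of this sketch rather than notation fixed in the text: $\kappa$ is the surface gravity, $T_H$ the Hawking temperature (in units $\hbar = c = k_B = 1$), and $t(\omega)$ the transmission amplitude through the potential barrier for suitably normalised modes.

```latex
\[
\langle N_\omega \rangle \;=\; \frac{\Gamma(\omega)}{e^{\omega/T_H} - 1},
\qquad
\Gamma(\omega) = |t(\omega)|^2,
\qquad
T_H = \frac{\kappa}{2\pi}.
\]
```

Here $\Gamma(\omega)$ is the grey-body factor: a perfectly transmitted mode ($\Gamma = 1$) would carry the full thermal occupation number, while reflection off the barrier reduces the emission by the same factor that it would reduce absorption.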

B Intermediate states of baby universes
In section 6.5, we discussed a Hilbert space interpretation of replica wormhole calculations, in particular allowing only measurements on a region I u before the black hole becomes too small. However, this description was not particularly natural from the point of view of consecutive measurements on different sets of Hawking radiation, so in this appendix we give an alternative Hilbert space interpretation.
Consider in particular a real-time process in which $\mathcal{H}_{BU}$ begins in some initial state (perhaps $|HH\rangle$), and an asymptotic observer creates a black hole before making a complete projective measurement of the Hawking radiation on $\mathscr{I}_u$. It is then natural to ask for the state on $\mathcal{H}_{BU}$ required for predicting subsequent similar measurements. Since we are leaving the radiation which emerges after time u unobserved, we have necessarily lost some information. The state of baby universes will thus become mixed due to entanglement with the unobserved part of the asymptotic state. As a result, this process is best described as a map from density matrices to density matrices (a quantum channel) on $\mathcal{H}_{BU}$.

More generally, we can write down the map from an initial density matrix on $\mathcal{H}_{BU}$ to a final density matrix on $\mathcal{H}_{BU} \otimes \mathcal{H}_u$ that describes the state of both the baby universes and the Hawking radiation to which we have access. We can write this map explicitly as
\[
\rho_{BU} \;\to\; \mathcal{N} \sum_{i,j} \mathrm{Tr}_u\!\left( \hat{\Psi}_i(u)\, \rho_{BU}\, \hat{\Psi}_j^\dagger(u) \right) \otimes |i\rangle\langle j| , \tag{B.1}
\]
where we have introduced the operation $\mathrm{Tr}_u$ (to be defined below), which enacts the partial trace over the unobserved radiation. The first factor in the tensor product is an operator on $\mathcal{H}_{BU}$, the second factor $|i\rangle\langle j|$ is an operator on the radiation Hilbert space on $\mathscr{I}_u$, and $\mathcal{N}$ is chosen to normalise the trace to unity. To explain (B.1), recall that a density matrix $\rho_{BU}$ is an operator on $\mathcal{H}_0 = \mathcal{H}_{BU}$. The operator $\hat{\Psi}_i(u)$ maps $\mathcal{H}_{BU} \to \mathcal{H}_1$, and so $\hat{\Psi}_j^\dagger(u)$ maps $\mathcal{H}_1 \to \mathcal{H}_{BU}$. Thus $\hat{\Psi}_i(u)\, \rho_{BU}\, \hat{\Psi}_j^\dagger(u)$ is a map from the one-boundary Hilbert space $\mathcal{H}_1$ to itself. Its matrix elements are computed by a path integral bounded by a pair of Cauchy surfaces $\Sigma_u$ meeting $\mathscr{I}^+$, one on the 'ket' branch and one on the 'bra' branch. The operation $\mathrm{Tr}_u$ identifies these two branches along $\Sigma_u$ asymptotically, producing an operator on $\mathcal{H}_{BU}$. It is not meaningful to specify a priori how far this identification persists into the interior. The gravitational path integral sums over all such choices; replica wormholes will lead to semiclassical contributions where the identifications persist to the edge of the associated island.
As is the case for all our discussions, the formula (B.1) simplifies if $\rho_{BU}$ is built from asymptotic boundary conditions (so that it is part of the superselected algebra of asymptotic observables). In that case one finds
\[
\mathrm{Tr}_u\!\left( \hat{\Psi}_i(u)\, \rho_{BU}\, \hat{\Psi}_j^\dagger(u) \right) = \hat{\Psi}_j^\dagger(u)\, \hat{\Psi}_i(u)\, \rho_{BU} = \hat{\rho}_{ij}(u)\, \rho_{BU} .
\]
In particular, if the baby universes are in an α-state (so that $\rho_{BU} = |\alpha\rangle\langle\alpha|$), the map leaves the state of baby universes unchanged, while producing the state $\rho_\alpha(u)$ on $\mathscr{I}_u$ whose components are given by the α-eigenvalues of $\hat{\rho}_{ij}(u)$.
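The statement about α-states can be checked in a finite-dimensional toy model. The sketch below is an illustration under strong assumptions and is not the gravitational path integral: the baby-universe Hilbert space is taken to have a finite α-basis, the unobserved radiation is modelled by an ordinary Hilbert space of dimension `n_rest`, and the operator labelled by `i` is assumed to act as |α⟩ → |α⟩ ⊗ |φ_i^α⟩ with random vectors `phi[alpha, i]`. All names and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_alpha, n_obs, n_rest = 3, 2, 4  # hypothetical toy dimensions

# phi[a, i, :] models the (unnormalised) state of the unobserved radiation
# when the baby universes are in alpha-state a and the observed radiation
# on I_u is found in state i.  Random values stand in for real dynamics.
phi = (rng.normal(size=(n_alpha, n_obs, n_rest))
       + 1j * rng.normal(size=(n_alpha, n_obs, n_rest)))


def channel(rho_bu):
    """Toy analogue of (B.1); returns out[a, b, i, j].

    (a, b) index the baby-universe factor and (i, j) the |i><j| factor.
    The partial trace over unobserved radiation is the sum over r.
    """
    out = np.einsum("ab,air,bjr->abij", rho_bu, phi, phi.conj())
    return out / np.einsum("aaii->", out)  # normalise total trace to one


# Toy analogue of rho_hat_ij(u): diagonal in alpha, entries <phi_j|phi_i>.
rho_hat = np.einsum("air,ajr->aij", phi, phi.conj())

# Baby universes prepared in a single alpha-state.
alpha0 = 1
rho_alpha = np.zeros((n_alpha, n_alpha), dtype=complex)
rho_alpha[alpha0, alpha0] = 1.0

out = channel(rho_alpha)
rho_bu_after = np.einsum("abii->ab", out)  # trace out observed radiation
rho_rad = out[alpha0, alpha0]              # radiation state on I_u
expected_rad = rho_hat[alpha0] / np.trace(rho_hat[alpha0])

print(np.allclose(rho_bu_after, rho_alpha))  # baby-universe state unchanged
print(np.allclose(rho_rad, expected_rad))    # alpha-eigenvalues of rho_hat
```

In this toy model the radiation state is simply the normalised block of `rho_hat` belonging to the chosen α, which mirrors the claim that the components of the radiation state are the α-eigenvalues of the operators built from the two branches.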
We can also use this same simplification more generally if we only wish to use (B.1) to compute expectation values of asymptotic observables (thought of as operators on $\mathcal{H}_{BU}$). By essentially the same argument as above, tracing such observables against (B.1) gives the same result as tracing the observables against $\hat{\rho}_{ij}(u)\, \rho_{BU}$. The point is simply that we can commute $\hat{\Psi}_j^\dagger(u)$ past these observables and then use the cyclic property of the trace in order to use $\hat{\Psi}_j^\dagger(u)\, \hat{\Psi}_i(u) = \hat{\rho}_{ij}(u)$. However, it should be borne in mind that other states and operators exist and may be of interest, for example to describe the experience of an observer falling into a black hole.
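A finite-dimensional toy model can illustrate why the simplified expression suffices for asymptotic observables even when the baby-universe state is not in the superselected algebra. The sketch below rests on assumptions made only for illustration: a finite α-basis, an ordinary Hilbert space for the unobserved radiation, operators acting as |α⟩ → |α⟩ ⊗ |φ_i^α⟩ with random vectors `phi[alpha, i]`, and asymptotic observables modelled as matrices diagonal in the α-basis. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_alpha, n_obs, n_rest = 3, 2, 4  # hypothetical toy dimensions


def crandn(*shape):
    """Random complex Gaussian array (stand-in for real dynamics)."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)


phi = crandn(n_alpha, n_obs, n_rest)
rho_hat = np.einsum("air,ajr->aij", phi, phi.conj())  # toy rho_hat_ij(u)

# A generic density matrix WITH coherences between different alpha-states.
a = crandn(n_alpha, n_alpha)
rho_bu = a @ a.conj().T
rho_bu /= np.trace(rho_bu)

# Unnormalised toy analogue of (B.1), and the simplified rho_hat_ij * rho_BU.
exact = np.einsum("ab,air,bjr->abij", rho_bu, phi, phi.conj())
simple = np.einsum("aij,ab->abij", rho_hat, rho_bu)

# An asymptotic observable: diagonal in the alpha-basis (arbitrary values).
obs = np.array([0.3, -1.2, 2.0])

# Trace the observable against the baby-universe factor of each expression.
lhs = np.einsum("a,aaij->ij", obs, exact)
rhs = np.einsum("a,aaij->ij", obs, simple)

print(np.allclose(lhs, rhs))       # expectation values agree
print(np.allclose(exact, simple))  # but the operators themselves differ
```

The two expressions differ only in their off-diagonal α-blocks, which a diagonal observable never probes. Those blocks are exactly what would matter for the other states and operators mentioned above.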
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.