Hints of gravitational ergodicity: Berry’s ensemble and the universality of the semi-classical Page curve

Recent developments on black holes have shown that a unitarity-compatible Page curve can be obtained from an ensemble-averaged semi-classical approximation. In this paper, we emphasize (1) that this peculiar manifestation of unitarity is not specific to black holes, and (2) that it can emerge from a single realization of an underlying unitary theory. To make things explicit, we consider a hard sphere gas leaking slowly from a small box into a bigger box. This is a quantum chaotic system in which we expect to see the Page curve in the full unitary description, while semi-classically, eigenstates are expected to behave as though they live in Berry’s ensemble. We reproduce the unitarity-compatible Page curve of this system, semi-classically. The computation has structural parallels to replica wormholes, relies crucially on ensemble averaging at each epoch, and reveals the interplay between the multiple time-scales in the problem. Working with the ensemble averaged state rather than the entanglement entropy, we can also engineer an information “paradox”. Our system provides a concrete example in which the ensemble underlying the semi-classical Page curve is an ergodic proxy for a time average, and not an explicit average over many theories. The questions we address here are logically independent of the existence of horizons, so we expect that semi-classical gravity should also be viewed in a similar light.


Introduction
Recent developments [1,2] on the information paradox [3][4][5][6] have revealed that one can reproduce the Page curve for Hawking radiation from semi-classical gravity. This can be viewed as surprising for a couple of reasons:
• Firstly, it reveals that understanding the fine-grained entropy (or at least its qualitative Page evolution) does not require us to know the microstate/density matrix in the full UV-complete theory; a knowledge of the semi-classical description is enough. While this fact may seem superficially surprising, it should be emphasized that there is no contradiction here. Entanglement entropy is just one number, whereas the full density matrix is a (possibly infinite dimensional) matrix. So the latter contains a vastly larger amount of information, which is in principle not required for extracting the fine-grained entropy. It is therefore not implausible, at least in hindsight, that semi-classical gravity is able to calculate this entropy.
• A second and more perplexing feature is that the semi-classical calculation that leads to the unitarity-compatible Page curve involves the inclusion of replica wormholes into the Euclidean path integral [7,8]. When interpreted at face value, this suggests that we are in fact dealing with an ensemble average when we use semi-classical gravity to compute the matrix elements that go into the entropy calculation [7]. Indeed for JT gravity in two dimensions, which is an ensemble average over unitary theories (and therefore is a non-unitary theory), one can explicitly demonstrate the emergence of the Page curve by evaluating the average over the underlying ensemble [7,9]. In short, we seem to be finding a unitarity-compatible Page curve from an ensemble-averaged description.

JHEP05(2021)126
The second bullet point above raises a puzzle. Our entire premise when looking for a tent-shaped (i.e., unitarity-compatible) Page curve was that quantum gravity is unitary. And yet, now we have been dealt a devil's bargain. We have a unitarity-compatible Page curve, but in the semi-classical (Euclidean) gravity limit where we are working, it seems to be arising in an ensemble average over theories. This is not quite a contradiction (the ensemble average of a quantity that follows the Page curve will also follow the Page curve), but it does raise a puzzle about how one should think about the relationship between the fundamental description of gravity and its semi-classical description.
The Euclidean path integral is believed to be ill-defined as a complete definition of quantum gravity in higher dimensions (because of, e.g., the wrong-sign kinetic term of the conformal mode of the metric). At the conceptual level, an obvious piece that is missing in our present understanding is the connection between semi-classical (bulk-metric based) gravity and the underlying "true" quantum gravity degrees of freedom, which are presumably holographic. To make matters more confusing, in low dimensions there seem to be non-unitary metric theories like JT gravity that do have well-defined path integrals. These can be explicitly demonstrated to be ensemble averages over distinct unitary matrix models [9].
The goal of this paper is to make some progress in understanding how to think of semi-classical gravity in more general contexts. More generally, we wish to understand the role (if any) of ensembles in a Page curve calculation in a unitary theory. Does the fact that semi-classical gravity is an ensemble average suggest that the fundamental theory should also necessarily be an ensemble average over distinct theories? This is the case in JT gravity, and it has been suggested that this may be the general paradigm. Such an explicit ensemble average, however, would be disappointing from the point of view of the usual lore of the AdS/CFT correspondence, where individual unitary boundary theories (e.g., $\mathcal{N} = 4$ SYM) seem to be dual to individual unitary theories of quantum gravity (e.g., type IIB string theory on AdS$_5 \times S^5$), which should each have semi-classical supergravity limits. We do expect black holes to arise as thermalized states in a single copy of an $\mathcal{N} = 4$ SYM theory.
In order to shed some light on this question, we will make two key observations in this paper regarding the two bulleted points mentioned at the beginning of this Introduction. These observations are:
• Neither of the points has a priori anything to do with gravity, black holes or horizons. By this we mean that both features can be seen in systems that apparently¹ are without gravity.
• Both features can be seen already at the level of individual unitary theories, without explicit ensemble averages. The ensembles arise much like they do in conventional statistical mechanics, where they arise as proxies for time averages when the system is in (approximate) thermal equilibrium.
In other words, the first bullet point about the semi-classical accessibility of the unitarity-compatible Page curve is equally valid in non-gravitational unitary theories. Similarly, there does not seem to be anything forbidding us from coming up with a non-gravitational theory where a Page curve emerges at the semi-classical level via an apparent ensemble average. Indeed, the bulk of this paper deals with the detailed study of an example that illustrates both these points. We expect that such examples should be fairly generically constructible in quantum chaotic systems which can be split into two subsystems. Our goal in the rest of the paper will be to exhibit these two ingredients in a single realization of a non-gravitational unitary theory. In our view, this strengthens the possibility that gravity may also fit into the same rubric: the semi-classical replica wormhole calculation reproduces the unitarity-compatible Page curve via an apparent ensemble average, while the full quantum gravity indeed remains safely unitary. Closely related ideas have appeared earlier, see [10][11][12][13]. One of the new features in our calculation will be that we are able to follow the evolution of the system (and the Page curve) explicitly at the semi-classical level. This also enables us to have a clear understanding of the epoch-dependence of the ensemble. Other crucial features of our explicit model will become clear as we proceed.
If this picture is correct, low-dimensional examples like JT gravity, which come with explicit ensemble averages and well-defined (but non-unitary) metric path integrals, are to be viewed as exceptions. The swampland ideas of [14], which suggest that in high enough dimensions ensembles for gravity contain only a single theory, seem consistent with this picture. What is nice about JT gravity, then, is that it gives us an explicitly doable, well-defined metric path integral, unlike more realistic theories of gravity.
In what follows, we will work with the concrete example of a hard sphere gas leaking slowly from a small box into a larger one. A hard sphere gas in a box is known to be a quantum chaotic system, whose eigenstates were conjectured by Berry [15,16] to behave semi-classically as though they were picked from a Gaussian ensemble. We will call this conjectural ensemble Berry's ensemble. Berry's conjecture was one of the initial motivations for the Eigenstate Thermalization Hypothesis (ETH) [17], see also [18]. The reason for our interest in this particular setup involving the hard sphere gas is that, based on general principles of unitarity, we expect to see a Page curve in this system if we compute the entanglement entropy of the larger box. Equally importantly, thanks to Berry's conjecture, we may suspect (and indeed we will demonstrate) that it should be possible to show the emergence of this Page curve via a calculation at the semi-classical level, where an ensemble average plays a significant role.
Horizons, islands and other geometric objects do not play a role in our calculations, and there is no genuine information paradox. But note that the questions we are interested in have only to do with the semi-classical ensemble average aspect, and we will show that our system shares that with the black hole system. Therefore, despite the differences, the lessons we extract from the hard sphere gas have a chance of holding for gravity as well. Indeed, this is our primary motivation behind the present paper.
We will find that the semi-classical entanglement entropy of the larger box follows the Page curve. The assumption of slow leakage² leads to two timescales in the problem, and we find that there is an analogue of a Hawking radiation epoch [1] in the present problem as well. During each epoch, we can compute the entanglement entropy assuming that the eigenstates of the relevant subsystem are taken from Berry's ensemble.³ The result, when plotted against epoch, yields a unitarity-compatible Page curve.
Interestingly enough, we also find that despite the absence of horizons, we have a simple way to obtain an information "paradox" in this system. Instead of computing the ensemble-average of the Renyi entropy from the reduced density matrix, one can consider the Renyi entropy of the ensemble-averaged reduced density matrix. By direct calculation through the epochs, we find that the evolution of this object does not have the turnaround and we are left with Page's version of the Hawking paradox.
A key technical assumption in our calculation is that the leakage is slow so that the gas in each box can come to approximate equilibrium during each epoch. This is what enables us to take advantage of Berry's ensemble averaging epoch by epoch. In doing so, we are effectively assuming that the entanglement entropy during each epoch can be computed via a suitable time average (thanks to local equilibrium) and that the ensemble average is an ergodic stand-in for this, as is often the case in statistical mechanics. The entanglement entropy of the reduced density matrix of the larger system is the thermodynamic entropy of the smaller system during that epoch. In the limit when the system has fully thermalized and both boxes have the same density of particles, this reduces to the result obtained in [19]. So our work can be viewed as a type of generalization of the result there. See [20] for some related discussions.
The structure of these observations strongly suggests that a similar mechanism holds in gravity as well. By analogy with the hard sphere gas, we are therefore tempted to conjecture that semi-classical gravity is providing an ergodic ensemble-averaged description of quantum gravitational dynamics in bulk local equilibrium. Since gravity is holographic, more ideas will be needed to make this into a fully concrete proposal, but let us make one speculative comment. We suspect that some approximate notion of coarse-graining will likely be required in defining the relevant entanglement entropy in flat space gravity. A cut-off has played a role in flat space ever since the work of Gibbons-Hawking [21], and it seems plausible to us that its correct interpretation is in implementing a coarse-graining [22].
Our work departs from some of the statements in the literature, which call for gravity to be viewed as an explicit ensemble average. On the contrary, we view our results as being in line with the ideas of [10,11]. Our purpose here is to present a concrete non-gravitational model which illustrates the relevant points, with an essentially fully calculable semi-classical Page curve. We believe this provides a clean context to evaluate the various ingredients, as well as the precise role played by gravity. We hope that our result is of some use in shedding light on how to think of the semi-classical gravity path integral, and figuring out its ultimate significance in the unitarity of the microscopic/holographic description of quantum gravity.

Two boxes for the hard sphere gas
Let us start by considering a collection of N hard spheres, each with a radius a, enclosed in a cubic box⁴ of length L. Assume that there is a larger empty box of length L′ in contact with the smaller box. At t = 0, we open a hole in the wall between them so that the gas can leak slowly into the larger box. By tuning the size of the hole, we can take the leakage rate to be slow. Technically, what this means is that the mean free path $\ell$ of the hard spheres is hierarchically larger than the diameter of the hole d. We will also take the size of the hole to be (possibly hierarchically) larger than the sphere radius a. As $\ell = \tilde{L}^3/(\sqrt{2}\pi\tilde{N}a^2)$ (see e.g., [23]), it suffices to have
$$a \lesssim d \ll \frac{\tilde{L}^3}{\tilde{N}a^2}. \qquad (2.1)$$
Here $\tilde{L}$ denotes the fact that we are referring to either of the boxes, and $\tilde{N}$ is the number of particles in it during an epoch (a term we will define below). We will model the system by assuming the hard spheres to be point particles/centers satisfying the constraint that the distance between any two centers cannot be less than 2a. If all the particles were enclosed in a single box, this description would reduce to the model discussed in [17]. It is natural to expect our system with the two connected boxes also to exhibit ergodicity and chaos, even though typically the hard sphere gas in a single box is the one that is studied in the context of chaos and thermalization [17].
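The hierarchy of scales described above can be checked numerically. The following is a minimal sketch, not part of the original calculation: the function names, the factor `ratio` used to quantify "hierarchically larger", and the values chosen for a and d are all illustrative assumptions. Only the mean free path expression is taken from the text.

```python
import math

def mean_free_path(box_length, n_particles, sphere_radius):
    """Mean free path as quoted in the text: l = L^3 / (sqrt(2) * pi * N * a^2)."""
    return box_length**3 / (math.sqrt(2) * math.pi * n_particles * sphere_radius**2)

def slow_leak(box_length, n_particles, sphere_radius, hole_diameter, ratio=100.0):
    """Check a <= d and d << l, where '<<' is modelled (by assumption) as a factor `ratio`."""
    ell = mean_free_path(box_length, n_particles, sphere_radius)
    return sphere_radius <= hole_diameter and ratio * hole_diameter <= ell

# Illustrative numbers only: a = 1e-10 m and d = 1e-6 m are made-up values.
ell = mean_free_path(2.0, 1e6, 1e-10)
print(f"mean free path = {ell:.3e} m, slow leak: {slow_leak(2.0, 1e6, 1e-10, 1e-6)}")
```

The same check has to hold for both boxes at every epoch, since the box length and particle number entering the mean free path refer to either box.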
Let us look at the Hilbert space of the system. We can denote the energy eigenstates by $|\Psi_\alpha\rangle$. Let us introduce a position basis $|X\rangle$, where X corresponds to the 3N-dimensional position vector of all the particles. In this position basis, we can define the wavefunctions
$$\Psi_\alpha(X) \equiv \langle X|\Psi_\alpha\rangle.$$
To define the domain where the wave function is defined, we first introduce an auxiliary domain [...] where the three Cartesian coordinates of the individual box domains are $B_S \equiv [0, L]^3$ and [...]. The crucial extra boundary condition that defines the true domain of the system is given by the condition that the wavefunction vanishes not on all of $\partial B_1 \cup \partial B_2$, but only on $\partial B_1 \cup \partial B_2 - H$, where H is the part of the domain which corresponds to the location of the hole. The region within this vanishing condition of the wave function is our true domain, and we denote it by D. We will not need to specify the shape and location of the hole in detail to do our calculations below, other than the conditions on its size we noted above. Note that the second box is bigger than the first, i.e. L′ > L, and H is a subset of $\partial B_1 \cap \partial B_2$.

⁴ In [17] the box size was taken to be L + 2a. This makes sure that the centres of the spheres are living in a box of length L. This adds nothing to our discussion, and makes the definition of the hole connecting the two boxes slightly unwieldy, so we will let the centers themselves bounce off the box walls. This is purely a mathematical convenience.

The hierarchy in (2.1) introduces two time-scales into the problem. Since the gas is leaking slowly, the time taken for each of the boxes to reach approximate equilibrium (separately) will be much smaller than the timescale of leakage during which the number of particles in the boxes changes appreciably. Implicit is also the assumption that the average energies are sufficiently high that each box thermalizes quickly enough compared to the other scales in the problem. In any event, the end result is an epoch where both of the boxes have separately equilibrated and the number of particles in each of the boxes remains approximately fixed. Let $N_S$ and $N_L$ denote the number of particles in the smaller and larger box at a particular epoch. As the total number of hard spheres in the boxes remains fixed throughout an epoch, we can use either $N_S$ or $N_L$ to characterize it. The numbers of hard spheres are related to each other through the conservation law
$$N_S + N_L = N. \qquad (2.4)$$
To study the evolution of the state of the system, we expand it as a linear combination of the eigenstates $|\Psi_\alpha\rangle$ of the full system (i.e., the two boxes connected by the hole) as follows: [...] The corresponding density matrix of the state is [...] In the position basis, we have [...] To analyze the properties of each of the boxes separately with the epochs, it becomes useful to focus on a subspace of the full Hilbert space.

At every epoch, we have (2.4), and this provides a natural partition of the 3N components of the vector X into $3N_S$ and $3N_L$ components as follows: $X = (x, y)$. Here x and y can loosely be thought of as denoting the position vectors of the particles in the smaller and larger boxes respectively.⁵ In terms of these coordinates, we can define the wavefunctions as [...] In this notation for the position basis, the density matrix takes the form [...]

Purity of the larger box
To calculate the entanglement entropy of the larger box, we will compute the n-th Renyi entropy of the subsystem, and then take the n → 1 limit. As a warm-up, we will start with the computation of the purity (n = 2) of the larger box. We start by computing the reduced density matrix of the particles "associated to the larger box", in the notation of the last paragraphs of the previous section: [...] Squaring the matrix, we get [...]

Now let us look at the behavior of this quantity at each epoch. From the discussion in the previous section, we can see that working in various epochs is equivalent to restricting ourselves to processes occurring at time-scales larger than the equilibration time of each box. Therefore, we can effectively replace the relevant quantities with their time averages over this timescale. As we expect the system to be ergodic, we can in turn replace the time average with an ensemble average. Therefore, to understand the behavior of quantities in each epoch, we should look at their averages in the appropriate ensemble.⁶ It turns out that there is a natural choice for such an ensemble. Consider a quantum chaotic system. Berry's conjecture [15,16] says that when the energy of an eigenstate is sufficiently high, the state behaves as if it was picked randomly from a fictitious Gaussian ensemble. It was shown in [17] that when evaluated in this eigenstate ensemble,⁷ the single particle momentum distribution function of the hard sphere gas turned out to be equal to the Maxwell-Boltzmann distribution. This is a specific manifestation of the eigenstate thermalization hypothesis (ETH). It is expected (see e.g., [20]) that for systems which satisfy the ETH condition, ergodicity is guaranteed. Therefore, we can hope that averaging over Berry's ensemble acts as an ergodic proxy for the underlying time averaging.
A further comment worth making is that Berry's conjecture is based on semi-classical physics and relies on the connection between classical and quantum chaos [24]. So this further strengthens the parallel with the black hole Page curve calculation, which was done in the setting of semi-classical gravity [7].

⁶ We will assume that the Renyi and entanglement entropies are quantities that can be calculated in this way. We will also assume that the (eigenstate) ensemble replacement can be done when the system is in local equilibrium. The fact that the results are reasonable (as we will see) will be taken as a posteriori evidence for these assumptions.
⁷ We will refer to this ensemble as Berry's ensemble in the context of the hard sphere gas.

Adopting this philosophy, we are now ready to compute the purity of the reduced density matrix in Berry's ensemble: [...] where the subscript EE denotes that the quantity is averaged over the eigenstate ensemble. Berry's conjecture would imply that the four-point function⁸ will be given in terms of the Wick contractions of the two-point functions, as in [17] (see also appendix A, for related discussions in the single box). Therefore, we have [...] Plugging the above expression into the previous one, we get [...] Pulling the sums into the ensemble average, this becomes [...]

Now let us evaluate the two-point functions in the above expression. At each epoch, we are making a semi-classical approximation that $N_S$ particles are in one box and the rest are in the other. At the level of wave functions, this enables us to assume that the value of the wavefunction Ψ vanishes (at least approximately) at the hole H. Roughly, at each epoch, we choose the boundary condition that Ψ vanishes on the boundary of $D_S$ and $D_L$, where these domains characterize the two separate boxes (and are defined precisely below). So we can decompose the state Ψ as follows: [...] where $\psi^i_S(x)$ and $\phi^i_L(y)$ are the eigenfunctions of the smaller and larger boxes, with $N_S$ and $N_L$ hard spheres respectively. These wavefunctions are defined in the domains $D_S$ and $D_L$, where [...] and [...], and they vanish on the boundary of their respective domains. The expression for the purity therefore becomes [...] (3.10) As x and y are independent variables, we can again simplify the expression in the square brackets using Berry's conjecture, now for the individual boxes. This gives us [...]
Using (A.19), we can do the above integrals. This will give us [...] where we have defined [...] Simplifying the expression, we get [...] We can see that the terms with $|U_i - U_j|/U_i$ smaller than or equal to $(\hbar^2/mU_iL^2)^{1/2}$ will dominate over the others. In particular, if $\bar{U}$ is the average energy, then we can see that the sum will be dominated by the terms with $|U_i - \bar{U}|/\bar{U} \le (\hbar^2/m\bar{U}L^2)^{1/2} \sim \lambda/(N^{1/2}L)$, where $\lambda$ denotes the thermal wavelength at the temperature $T$. For such terms, we can approximate the exponential to be 1 and we get (the sums are now restricted to the non-vanishing band) [...]

From an analogous calculation, we can also see that [...] Therefore, we can define the normalized purity of the larger box as follows: [...] This expression can be further simplified to [...] where we have defined [...] and the normalized reduced density matrices of the boxes as follows: [...] Here $|\phi^i_L\rangle$ and $|\psi^i_S\rangle$ are eigenstates of the larger and smaller box respectively.
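The Wick-contraction step that underlies the calculation in this section can be illustrated in a toy setting. The sketch below is an assumption-laden stand-in, not the Berry-ensemble computation itself: it replaces the eigenfunction values at four points by four correlated real Gaussian variables (the mixing matrix `A` is made up), and compares a Monte Carlo estimate of their four-point function with the sum over the three pairings of two-point functions.

```python
import random

def wick_four_point(cov):
    """Sum over the three pairings of a four-point function of zero-mean Gaussians."""
    return cov[0][1] * cov[2][3] + cov[0][2] * cov[1][3] + cov[0][3] * cov[1][2]

# Hypothetical mixing matrix: x = A z with z standard normal, so cov = A A^T.
A = [[1.0, 0.0, 0.0],
     [0.6, 0.8, 0.0],
     [0.0, 0.6, 0.8],
     [0.6, 0.0, 0.8]]
cov = [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]

def sampled_four_point(n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[x0 x1 x2 x3] for the Gaussian variables defined by A."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        x = [sum(a * zi for a, zi in zip(row, z)) for row in A]
        total += x[0] * x[1] * x[2] * x[3]
    return total / n_samples

exact = wick_four_point(cov)
estimate = sampled_four_point()
print(f"Wick sum = {exact:.4f}, sampled = {estimate:.4f}")
```

The two numbers should agree up to sampling noise, which is the property that lets the averaged purity be written entirely in terms of two-point functions.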

Non-crossing partitions and the n-th Renyi entropy
Now let us look at the computation of the n-th Renyi entropy of the larger box. It is straightforward to calculate $\mathrm{Tr}(\rho_L^n)$ by following the steps in section 3. The resulting expression will contain a product of 2n eigenfunctions, as in (3.2). However, to evaluate this quantity in the eigenstate ensemble, we will have to perform all the possible pairwise contractions of these 2n eigenfunctions and then do integrals over the resulting expressions. This can turn out to be quite tedious, as the number of possible contractions goes as $(2n)!/(n!\,2^n)$ for [...]

We start off by distributing all the 4n indices present in the higher dimensional analogue of (3.10) on a circle as in figure 1(a). The pair of indices corresponding to each copy of the system are connected by a dotted line through the boundary of the circle. Note that these pairs of indices are placed in such a way that the indices of various copies of the smaller (larger) box are adjacent to each other. Now let us connect one such pair of indices to another through the interior of the circle using dashed lines. While making the connection, we make sure that an index corresponding to the smaller (larger) box is connected only to another smaller (larger) box index. Doing this for all the pairs on the circle, we will get a diagram that corresponds to a particular pairwise contraction of all the 2n eigenfunctions (refer to figure 1(b)).

Now let us compute the value of each of these diagrams. It is useful to introduce some terminology before we proceed. The dashed lines partition the interior of the circle into various sub-regions (refer to figure 2). Let us call such a sub-region an m-connected region if there are m pairs of indices on the boundary of the region. Depending on the box to which the boundary indices belong, we can attribute each m-connected region to the smaller or larger box. For example, in figure 2(a), there are two 1-connected and one 2-connected regions belonging to the smaller box (these regions are marked in blue).
In terms of these regions, we can assign a value to each diagram by using (A.22) and the structure of the contractions. For every m-connected region, we should introduce a factor of $(\mathrm{Tr}(\rho_S I_S))^{m-1}$ or $(\mathrm{Tr}(\rho_L I_L))^{m-1}$, depending on which box the region belongs to. Summing over the value of each of these diagrams will give us the n-th Renyi entropy of the larger box.
If any interior dashed line of a diagram intersects another, then we will refer to the diagram as a crossing diagram. As $\mathrm{Tr}(\rho_L I_L), \mathrm{Tr}(\rho_S I_S) \ll 1$ at every epoch, it is very easy to see that all the crossing diagrams are sub-leading to the non-crossing diagrams (refer to figure 2(a) and (b) for an example). Therefore, it suffices to add the dominant non-crossing diagrams to get the n-th Renyi entropy. This makes the computation easier, as the number of non-crossing partitions, called the Catalan number (see e.g., [26]), is much smaller than the number of pairwise contractions. We can see that there is a similar contraction structure as well as leading order behavior in [11]. This close resemblance has to do with the fact that the "equilibrium approximation" in [11] is equivalent to a time averaging when the system has reached an approximate (local) equilibrium.

Figure 2 (caption, partially recovered): [...] Therefore, this diagram will have a factor of $(\mathrm{Tr}(\rho_L I_L))^2\,\mathrm{Tr}(\rho_S I_S)$. Similarly, figure (b) will have a factor of $(\mathrm{Tr}(\rho_L I_L))^3\,\mathrm{Tr}(\rho_S I_S)$. We can see that this crossing diagram will be sub-leading to figure (a) at every epoch.

Now let us write an explicit expression for the n-th Renyi entropy by adding the value of all the leading order diagrams. The structure of the contractions results in a large number of degenerate diagrams. Two diagrams can have the same value if one of them can be obtained by permuting the m-connected regions of the other diagram. We can also have a degeneracy when the diagrams have different m-connected regions but the powers of $\mathrm{Tr}(\rho_L I_L)$ and $\mathrm{Tr}(\rho_S I_S)$ add up to the same number (refer to figure 3 for an example).
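The saving obtained by keeping only non-crossing diagrams can be made concrete by comparing the two counts quoted above. The following short sketch simply tabulates the number of pairwise contractions, $(2n)!/(n!\,2^n)$, against the Catalan number; the function names are illustrative.

```python
from math import comb, factorial

def pairings(n):
    """Number of ways to pair up 2n objects: (2n)! / (n! 2^n)."""
    return factorial(2 * n) // (factorial(n) * 2**n)

def catalan(n):
    """n-th Catalan number, which counts the non-crossing partitions of n points on a circle."""
    return comb(2 * n, n) // (n + 1)

for n in range(2, 7):
    print(n, pairings(n), catalan(n))
```

Both counts grow quickly with n, but the ratio of pairings to non-crossing partitions grows factorially, so the restriction to non-crossing diagrams is a substantial simplification already at moderate n.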
To take care of these issues, let us first characterize each diagram by the m-connected regions of the larger box.⁹ We can represent the m-connected regions of the larger box by the notation $(1^{m_1} 2^{m_2} 3^{m_3} \ldots n^{m_n})$ [...] structure $(1^{m_1} 2^{m_2} 3^{m_3} \ldots n^{m_n})$ is given by [26] [...] To account for the second type of degeneracies, let us first look at the partitions of a natural number m; that is, we look at all the possible ways in which m can be written as a sum of positive integers. We can label each partition by the set $\{(j, m_j)\}$, where $j \in \mathbb{Z}^+$ and $m_j$ corresponds to the multiplicity of each j. Therefore, by definition, we have $\sum_j j\, m_j = m$. Let us denote by P(m) the set of all such partitions of m. Using these definitions, we can write down the leading order contribution to the value of the following expression as: [...] where $r = \sum_j (j + 1)\, m_j$. We have used the notation $(1^{n-r} \{(j + 1)^{m_j}\})$ to represent the non-crossing diagram consisting of (n − r) 1-connected regions and $m_j$ (j + 1)-connected regions, for all $(j, m_j) \in \{(j, m_j)\}$. When k = 0 and k = (n − 1), we can see that the factor in the parenthesis turns out to be 1.
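The counting of non-crossing diagrams with a prescribed structure can be cross-checked numerically. The sketch below assumes that the count quoted from [26] is the standard Kreweras formula, $n!/\big((n-b+1)!\,\prod_j m_j!\big)$ with $b = \sum_j m_j$ the number of regions; this identification is an assumption, since the formula itself is not reproduced here. Summing the counts over all structures should recover the Catalan number.

```python
from math import comb, factorial
from collections import Counter

def integer_partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in integer_partitions(n - first, first):
            yield (first,) + rest

def noncrossing_count(block_sizes):
    """Kreweras count of non-crossing partitions with the given block sizes (assumed form)."""
    n, b = sum(block_sizes), len(block_sizes)
    denom = factorial(n - b + 1)
    for mult in Counter(block_sizes).values():
        denom *= factorial(mult)
    return factorial(n) // denom

def total_noncrossing(n):
    """Sum of the counts over all block structures of n elements."""
    return sum(noncrossing_count(p) for p in integer_partitions(n))

for n in range(1, 7):
    print(n, total_noncrossing(n), comb(2 * n, n) // (n + 1))
```

For example, `noncrossing_count((2, 1, 1))` counts the diagrams with one 2-connected and two 1-connected regions among four elements.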
Using the above expression, we can calculate (the leading order contribution to) the averaged n-th Renyi entropy, [...]

The semi-classical Page curve
To make explicit statements about the behavior of the entanglement entropy, let us define [...] and [...] We will call $\bar{U}_S$ and $\bar{U}_L$ the average energies of the boxes. To understand the behavior of the entanglement entropy, let us look at early and late times separately. When we make plots, we will assume that the average energy per particle is roughly constant. It is possible to relax this assumption somewhat, while retaining the shape of the Page curve, but we will not explore it here since it is quite reasonable as a physical assumption in a closed system with a large number of particles [17].

Early time behavior
At early times, the larger box will have a very small number of particles compared to the smaller box. Therefore, $\mathrm{Tr}(\rho_L I_L) \gg \mathrm{Tr}(\rho_S I_S)$. The n-th Renyi entropy will be dominated by the k = n − 1 term in (4.3). The von Neumann entropy of the reduced density matrix can then be calculated as [...] For large N, $\log\Gamma(3N/2) \approx (3N/2 - 1)\log(3N/2) - 3N/2$. This gives us [...] where $V_L$ is the volume of the larger box. This is precisely the thermodynamic entropy of the larger box as a function of the number of particles $N_L$ at a given epoch. As we discussed above, if we assume that the average energy per particle is roughly constant, as in [17], we immediately see that the entanglement entropy will increase with time as $N_L$ increases with time.
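The quality of the large-N form of $\log\Gamma(3N/2)$ used above is easy to check against the exact value. The sketch below compares the expression quoted in the text with `math.lgamma`; the function names are illustrative, and the point is only that the relative error shrinks as N grows.

```python
import math

def log_gamma_large_n(n_particles):
    """Large-N form quoted in the text: (3N/2 - 1) log(3N/2) - 3N/2."""
    z = 1.5 * n_particles
    return (z - 1.0) * math.log(z) - z

def relative_error(n_particles):
    """Relative deviation of the large-N form from the exact log Gamma(3N/2)."""
    exact = math.lgamma(1.5 * n_particles)
    return abs(log_gamma_large_n(n_particles) - exact) / exact

for n in (10, 10**3, 10**6):
    print(n, relative_error(n))
```

The neglected terms grow only logarithmically in N, while the retained ones grow as N log N, which is why the approximation is adequate for the entropy at leading order.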

Late time behavior
With the passage of time, more and more particles move into the larger box. This results in an increase in the value of $\mathrm{Tr}(\rho_S I_S)$ and a decrease in $\mathrm{Tr}(\rho_L I_L)$. Depending on the relative size of the larger box, there are two types of late-time behavior. Let us look at these cases separately.
If the larger box is sufficiently bigger than the smaller box, we will reach an epoch where $\mathrm{Tr}(\rho_S I_S) = \mathrm{Tr}(\rho_L I_L)$. We call the time taken to reach this epoch the Page time ($t_P$) of the system. Let us denote the number of particles in the smaller and larger boxes at the Page time by $N_S^P$ and $N_L^P$ respectively. For any $t > t_P$, $\mathrm{Tr}(\rho_L I_L) \ll \mathrm{Tr}(\rho_S I_S)$. Therefore, the $k = 0$ term in equation (4.3) will dominate the sum. This gives us

The resulting equation is the thermodynamic entropy of the smaller box. Here N is the equilibrium value of the number of particles in the smaller box. Eventually, the two boxes will completely thermalize with respect to each other and the net exchange of particles will drop to zero. This happens when the particle densities in the two boxes equalize. Let $N_S^E$ and $N_L^E$ denote the final equilibrium values of the number of particles in the boxes. We have the relation

Using $N_L^E + N_S^E = N$, we can solve the above equation and we get

Let us define $t_E$ as the time taken for $N_L^E$ particles to leak into the larger box. For $t > t_E$, the entanglement entropy will saturate to the value

If we assume that at each epoch we have $N_L + N_S = N$, then we can plot the entanglement entropy as a function of $N_L$ or $N_S$. We can see from figure 4 that the graph increases at early times and then reaches a maximum at the Page time. The graph then decreases and saturates to (5.8).

Figure 4. .7)) as a function of $N_L$ when $n = 3$ for a system with L = 2 meters, L = 2 10 meters, $N = 10^6$, and particle mass $m = 1$ amu. We assume that the average energy of the boxes scales linearly with the number of particles as in [17]. Therefore, at every epoch, we set $\bar{U}_L/N_L = \bar{U}_S/N_S = \tfrac{3}{2} k_B T$ and we choose $T = 300$ K. As more and more particles leak out into the larger box, we can see that the average entanglement entropy (denoted by the orange line) increases with time initially and then reaches a maximum at the Page time (indicated by the dotted vertical line). After the Page time, the entanglement entropy drops and saturates to the equilibrium value. As this value is much smaller than the maximum value at the Page time, the entanglement entropy plot looks as if it has dropped down to zero. In contrast to this behavior, the entanglement entropy of the averaged state, indicated by the blue dotted line, keeps on relentlessly increasing with time (until it saturates at its final value).

Now let us look at the case where the size of the larger box is comparable to the size of the smaller box. It is possible to have a scenario where $N_L^E < N_L^P$. This would mean that the system reaches thermal equilibrium before the Page time is reached. Therefore, the entanglement entropy of the system will saturate to

JHEP05(2021)126
We mention this only for completeness; our primary interest is in the previous scenario.
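The equilibrium condition used above (equal particle densities in the two boxes, together with $N_L^E + N_S^E = N$) can be made concrete with a short sketch. The box volumes `volume_small` and `volume_large`, the function names, and the numerical values below are assumptions introduced only for illustration. The Page-time particle number is taken as an input, since it depends on expressions that are not reproduced in this sketch.

```python
def equilibrium_split(n_total, volume_small, volume_large):
    """Split n_total particles so that both boxes have the same number density.

    Solves N_S / V_S = N_L / V_L together with N_S + N_L = N.
    """
    total_volume = volume_small + volume_large
    n_small = n_total * volume_small / total_volume
    n_large = n_total * volume_large / total_volume
    return n_small, n_large


def equilibrates_before_page_time(n_large_equilibrium, n_large_page):
    """True if equilibrium is reached before the Page time (N_L^E < N_L^P)."""
    return n_large_equilibrium < n_large_page


# Hypothetical cubic boxes with side lengths 1 and 4 (arbitrary units).
n_small_eq, n_large_eq = equilibrium_split(10**6, 1.0**3, 4.0**3)
print(n_small_eq, n_large_eq)
```

When the larger box dominates the total volume, almost all particles end up in it at equilibrium, which is the regime in which the Page time is reached before equilibration.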

An information "paradox"
We have managed to reproduce the Page curve by doing an ensemble averaged computation of the entanglement entropy at each epoch. In this section, we will instead compute the entanglement entropy of the ensemble averaged state at each epoch. We will find that this leads to an information "paradox". We put the word in quotes because a genuine information paradox is tied to the existence of horizons, and also because here we know why the paradox is appearing.
Let us start by evaluating the reduced density matrix in the eigenstate ensemble:

Squaring the matrix, we get

Therefore, the purity of the larger box will be given by

Now let us evaluate the two-point functions in the above expression. Using the factorization in (3.7), we get

This is precisely the first term of equation (3.10). Therefore, we can immediately carry over the calculations in section 3 to get Tr ρ L

Therefore, the entanglement entropy of the averaged state will be given by

Under the assumptions of the previous section, we can see that the plot of $S(\langle\rho_L\rangle_{EE})$ vs. time will keep on increasing and then saturate to the value at $N_L = N_L^E$ (see figure 4). Therefore, when the larger box is sufficiently larger than the smaller box, the late time behavior of $S(\langle\rho_L\rangle_{EE})$ will be different from that of $\langle S(\rho_L)\rangle_{EE}$. We can calculate the purity of the smaller box in the eigenstate ensemble by interchanging ψ ↔ φ, x ↔ y, x ↔ y and x ↔ y in (3.11) and it turns out to be equal to the purity of the larger box in the eigenstate ensemble. Therefore, we have $\langle S(\rho_L)\rangle_{EE} = \langle S(\rho_S)\rangle_{EE}$. This behavior is expected from a unitary theory. However, if we make the same replacements in (6.4), we will be able to see that $S(\langle\rho_L\rangle_{EE}) \neq S(\langle\rho_S\rangle_{EE})$. Therefore, there is an apparent loss of unitarity when we work with the averaged state.
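The distinction between the averaged entropy and the entropy of the averaged state can be illustrated numerically in a much simpler setting. The following is a minimal sketch and not the hard sphere computation: it replaces Berry's ensemble by Haar-like random pure states on a small bipartite Hilbert space (an assumption made purely for illustration) and uses the second Rényi entropy, $-\log \mathrm{Tr}\,\rho^2$, so that only purities are needed. The dimensions `dim_l` and `dim_s`, the sample count, and all function names are hypothetical.

```python
import math
import random


def random_pure_state(dim_l, dim_s, rng):
    """Normalized complex Gaussian state, stored as a dim_l x dim_s amplitude matrix."""
    psi = [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim_s)]
           for _ in range(dim_l)]
    norm = math.sqrt(sum(abs(z) ** 2 for row in psi for z in row))
    return [[z / norm for z in row] for row in psi]


def reduced_density_matrices(psi):
    """Trace out one factor at a time and return (rho_L, rho_S)."""
    dim_l, dim_s = len(psi), len(psi[0])
    rho_l = [[sum(psi[a][b] * psi[c][b].conjugate() for b in range(dim_s))
              for c in range(dim_l)] for a in range(dim_l)]
    rho_s = [[sum(psi[a][b] * psi[a][c].conjugate() for a in range(dim_l))
              for c in range(dim_s)] for b in range(dim_s)]
    return rho_l, rho_s


def renyi2(rho):
    """Second Renyi entropy; Tr(rho^2) = sum |rho_ij|^2 for a Hermitian matrix."""
    return -math.log(sum(abs(z) ** 2 for row in rho for z in row))


def compare_averages(dim_l=8, dim_s=2, samples=2000, seed=0):
    """Compare the average of the entropy with the entropy of the averaged state."""
    rng = random.Random(seed)
    avg_l = [[0j] * dim_l for _ in range(dim_l)]
    avg_s = [[0j] * dim_s for _ in range(dim_s)]
    mean_l = mean_s = 0.0
    for _ in range(samples):
        rho_l, rho_s = reduced_density_matrices(random_pure_state(dim_l, dim_s, rng))
        mean_l += renyi2(rho_l) / samples
        mean_s += renyi2(rho_s) / samples
        for a in range(dim_l):
            for c in range(dim_l):
                avg_l[a][c] += rho_l[a][c] / samples
        for b in range(dim_s):
            for c in range(dim_s):
                avg_s[b][c] += rho_s[b][c] / samples
    return {
        "avg_of_entropy_L": mean_l,
        "avg_of_entropy_S": mean_s,
        "entropy_of_avg_L": renyi2(avg_l),
        "entropy_of_avg_S": renyi2(avg_s),
    }


result = compare_averages()
for name, value in result.items():
    print(name, round(value, 3))
```

In every realization the purities of the two reduced density matrices agree, as expected for a pure state, so the two averaged entropies coincide. The averaged reduced states are instead close to maximally mixed on each factor, so their entropies differ whenever the two dimensions differ, which mirrors the apparent loss of unitarity described above.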
These results, while simple, are interesting because they provide an explicit mechanism for understanding how the information paradox may emerge in gravity. They suggest that the vacuum one obtains by quantizing fields in the black hole background has features of an averaged state from the perspective of the fundamental theory.

Semi-classical gravity as an ergodic effective theory
Our calculations in this paper had nothing to do with gravity, horizons or a true information paradox. In fact our primary goal was to illustrate that an ensemble-averaged semi-classical approximation leading to the Page curve is not limited to gravity. But in doing this, we learnt that the ensemble average can arise as a proxy for a time average during each epoch, even in single realizations of a unitary theory. This is interesting because the precise role of the ensemble in the case of gravity has been a bit murky. For one, in 2-d JT gravity there is an explicit ensemble average. But in usual AdS/CFT in higher dimensions, we expect to see black holes in the duals of single copies of the CFT.
In our hard sphere gas, we found that an ensemble average can arise in the ergodic sense during each epoch of local equilibrium. This suggests a similar picture for gravity in higher dimensions. Loosely related ideas have appeared previously in [10,11], and our goal here was to find a model that provides a nuts-&-bolts understanding of the origin of the Page curve. The Page time of the black hole is vastly larger than its scrambling time, and Hawking temperature is a well-defined approximately constant quantity during any epoch of evaporation. This makes it possible that the ensemble average in gravity is a proxy for a time average during each epoch of Hawking radiation. In other words, an explicit average over an ensemble of distinct unitary theories may not be necessary.
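The statement that an ensemble average can stand in for a time average within a single realization can be illustrated with a classical toy example. The sketch below is not part of the hard sphere calculation: it uses the logistic map $x \to 4x(1-x)$, a standard chaotic system swapped in here purely for illustration, and compares the time average of an observable along one trajectory with its average over the invariant (arcsine) distribution. The initial condition, step counts, and function names are hypothetical choices.

```python
import math
import random


def time_average(observable, x0=0.123, steps=100_000, burn_in=1_000):
    """Average an observable along a single trajectory of x -> 4 x (1 - x)."""
    x = x0
    for _ in range(burn_in):
        x = 4.0 * x * (1.0 - x)
    total = 0.0
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
        total += observable(x)
    return total / steps


def ensemble_average(observable, samples=100_000, seed=0):
    """Average an observable over the invariant density 1 / (pi sqrt(x (1 - x)))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # x = sin^2(pi u / 2) with u uniform on [0, 1] follows the arcsine law.
        x = math.sin(0.5 * math.pi * rng.random()) ** 2
        total += observable(x)
    return total / samples


print(time_average(lambda x: x), ensemble_average(lambda x: x))
```

Both averages should approach 1/2, the mean of the invariant distribution, even though the time average is computed from a single deterministic trajectory.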

A further observation we made is that the Page version of the Hawking paradox can emerge in our perfectly unitary system if we do our semi-classical calculation using the ensemble-averaged state. We showed that the entropy increases relentlessly until it saturates at the thermodynamic entropy. This again is a strong suggestion that a similar mechanism may be at work in the gravity system as well. Indeed, a proposal that the Hawking result is a consequence of an ensemble-averaged state was made previously in [27].
In fact, our calculation in section 6 demonstrates that the "state paradox" formulated in [27] can be resolved without an explicit ensemble average over many theories. Let us take a moment to explain this. We start with the Quantum Extremal Surface [28] formula for the von Neumann entropy, which we write schematically as

where the $S_{\rm macro}$ on the right hand side is the entropy of the bulk quantum fields as would have been calculated by Hawking. The state paradox arises if we view the latter as a fine-grained contribution to the full entropy. It was proposed in [27] that the problem can be solved if we view $S_{\rm micro}$ as an ensemble average of the entropy of the state, and $S_{\rm macro}$ as an entropy of the ensemble averaged state. Our calculations in this paper suggest that these ensembles need not be explicit collections of distinct theories like in JT gravity; they can be ergodic ensembles that stand in for epoch time averages. In particular, the ensembles one thinks of here are not fixed; they are implicitly epoch-dependent. In our example of the hard sphere gas, the ensemble is controlled by the number of particles in either box in a given epoch, which fixes the appropriate Berry's ensemble.
A key point in the above discussion is that the $S_{\rm macro}$ on the right hand side of (7.1) is supposed to be computed semi-classically, via a state obtained from quantum field theory in curved space. Even though this object is usually viewed as a fine-grained entropy in the QFT in curved space Hilbert space, it is not clear that it is a fine-grained quantity in the true microscopic degrees of freedom in the holographic CFT Hilbert space. This was emphasized in [29], where $S_{\rm macro}$ was called a "coarse-grained" entropy. Let us emphasize that this is distinct from some of the uses of the phrase "coarse-graining" in the literature. What [29] emphasized was that the bulk entropy is calculated in a semi-classical bulk state, and not in the truly microscopic CFT state. The precise connection between the two descriptions has never been very clear in AdS/CFT, but observations in this paper suggest that the states in the Hilbert space of a quantum field theory in the black hole background have features of an averaged state. We will elaborate on this preliminary observation in future work [22]. More generally, if taken at face value, the message of our work is that semi-classical gravity should be viewed as a tool for capturing ergodic averaged gravitational dynamics, for evolution that is in bulk local equilibrium. Of course, developing this idea further is something that will have to be left for future work. In our hard sphere calculation, the assumption of local equilibrium entered due to the hierarchical timescales that ensured the existence of epochs. See [30] for another discussion of hierarchical timescales in strongly coupled theories, which may be related to black hole physics.

We have been quite specific in our focus in this paper, but let us conclude by emphasizing that the recent ideas on the black hole Page curve$^{10}$ may have implications even beyond black holes. See e.g. closely related discussions in cosmology [32][33][34][35][36]. We refer the reader to e.g. [37][38][39][40][41][42] for more recent papers with a fairly thorough list of references.

Acknowledgments
We thank Jude Pereira for discussions and collaborations.

A Quantum chaos in the hard sphere gas
Let us consider $N$ hard spheres, of radius $a$, enclosed in a single cubic box of length $L + 2a$ as in [17].$^{11}$ We can denote the energy eigenfunctions of the system by $\psi_i(X)$, where $X = (x_1, \ldots, x_N)$ is a $3N$-dimensional vector that labels the positions of all the particles of the system. We can define $\psi_i(X)$ on the domain

satisfying the boundary condition that $\psi_i(X)$ vanishes on $\partial D$.
The eigenfunctions of the box can be chosen to be real and they take the form [17]

$$\psi_\alpha(X) = \mathcal{N}_\alpha \int d^{3N}P \, A_\alpha(P)\, \delta\!\left(P^2 - 2mU_\alpha\right) \exp(iP \cdot X/\hbar) \tag{A.2}$$

where $A_\alpha^*(P) = A_\alpha(-P)$ and $U_\alpha$ is the energy of the state. Berry's conjecture says that when the energy of $\psi_\alpha(X)$ is sufficiently high, $A_\alpha(P)$ acts as if it is a Gaussian random variable with the two-point function

$$\langle A_\alpha(P) A_\beta(P') \rangle_{EE} = \delta_{\alpha\beta}\, \frac{\delta^{3N}(P + P')}{\delta(P^2 - P'^2)} \tag{A.3}$$

Moreover, we can compute the four-point functions in terms of the two-point functions as follows

One can also write down a similar "factorization" condition in position space by Fourier transforming, and we will use it in the main text. Now let us look at some expressions that will be useful in the computation of the entanglement entropy. Consider the expression

$^{10}$ See [31] for another take on the Page curve in gravity.
$^{11}$ Note the technical caveat we made in footnote 4. For the single box, we can use either language: normal spheres with box size $L + 2a$, or spheres whose center can reach the box walls, with box size $L$. The two problems are mathematically identical.

Using (A.3), we get

$\psi_j^*(X)\,\psi_i(X')$

Let us focus on the Dirac delta part of the expression. We have

Therefore,

In particular, when $X' = X$, we have

Let us normalize this quantity by demanding that

$U_j$ are sufficiently close to each other. In fact, it turns out that these states are the ones that dominate the entanglement entropy calculation. Therefore, we will have to use a more precise replacement to do the computation. It suffices to use a Gaussian approximation as follows (see [17] for a closely related but distinct expression)

Using the Gaussian approximation, we can see from [17] that

Substituting this into (A.12) we get

where we have defined

When $|U_j - U_k|/U_j$ is smaller than or equal to $(\hbar^2/mU_jL^2)^{1/2}$, the exponential can effectively be replaced by 1 and we will get

When the eigenstates are sufficiently close to each other in the energy spectrum, we can also see that

Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.