Chaos and High Temperature Pure State Thermalization

Classical arguments for thermalization of isolated systems do not apply in a straightforward way to the quantum case. Recently, there has been interest in diagnostics of quantum chaos in many-body systems. In the classical case, chaos is a popular explanation for the legitimacy of the methods of statistical physics. In this work, we relate a previously proposed criterion of quantum chaos in the unitary time evolution operator to the entanglement entropy growth for a far-from-equilibrium initial pure state. By mapping the unitary time evolution operator to a doubled state, chaos can be characterized by suppression of mutual information between subsystems of the past and those of the future. We show that when this mutual information is small, a typical unentangled initial state will evolve to a highly entangled final state. Our result provides a more concrete connection between quantum chaos and thermalization in many-body systems.


Introduction
Empirically, there is a generic tendency towards entropy growth in many body systems. This "arrow of time" appears at odds with the fact that our physical models of these systems are often time-reversal symmetric. A common argument has been available from the time of Boltzmann: states are exponentially likely to evolve to states of higher entropy, simply due to the counting of states at given entropy (see for example [1]). In fact, in the presence of time-reversal symmetry this explanation is incomplete. For every state of a given entropy with entropy growth, there is a state with the same entropy but with entropy decay. The best we can actually hope for is to explain why some class of preferred states experiences entropy growth in some subsystem under specific dynamics.
This question is closely related to another aspect of quantum many-body systems. The use of statistical ensembles to understand the long-time collective behavior of many degrees of freedom in terms of local microscopic interactions is one of the great simplifications and triumphs of modern physics. It is natural to ask when and why the use of these ensembles is justified; if all expectation values of interest for a given initial condition after time evolution can be computed to arbitrary accuracy in an ensemble depending only on macroscopic parameters, we will say that the system thermalizes for those initial conditions and that time evolution. For isolated classical systems, dynamical chaos is a sufficient and generic condition for ergodicity in phase space, which explains the accuracy of the microcanonical ensemble and hence equilibrium statistical mechanics.
The quantum case is more complicated. It is important to note that there are classical systems with few degrees of freedom whose observables are well-described by the microcanonical ensemble, but upon quantization do not thermalize. Although there are experimental probes of few-body quantum systems that do thermalize [2], generic thermalization in quantum systems appears to be inherently a many-body effect [3]. Quantum mechanics also supports a long-time behavior not present in classical systems, Many-Body Localization (MBL). Recent experiments detecting these phases [4] provide practical motivation to explain the mechanism of and conditions for thermalization of isolated systems. Finally, our understanding of quantum chaos and ergodicity is still incomplete, and leading justifications for quantum statistical mechanics are not as directly connected to quantum chaos as in the classical case.
There are two leading explanations for quantum thermalization. One is known as Canonical Typicality (CT) [5,6], the statement that due to the exponentially large dimension of Hilbert space or the subspaces associated to finite energy windows, almost all pure states in the subspace will appear as if they were randomly chosen from that subspace, i.e. indistinguishable from the microcanonical ensemble, on any small subregion. Importantly, the CT approach can be extended to the dynamical result [7] that, under weak assumptions about the distribution of eigenvalues of the Hamiltonian, a subsystem interacting with a sufficiently large bath will spend most of its time close to its time average, independent of the initial state of the subsystem and for almost all initial states of the bath. One useful way to view CT is as an extension of the statistical argument for entropy growth: most states in a subspace are already close to maximum entanglement within the subspace. The other explanation is the Eigenstate Thermalization Hypothesis (ETH) [3,8], which loosely stated is the conjecture that the high-energy eigenstates of quantized classically chaotic systems are indistinguishable from the microcanonical ensemble of the same system for local observables. This conjecture is well-supported numerically for a large class of systems, and gives a very clean description of quantum thermalization when it applies.
There are some deficiencies remaining in both approaches. There are no direct criteria to evaluate on a Hamiltonian to see if ETH holds, short of finding the eigenstates. On a related note, ETH has only been proven true for a small class of systems. Finally, although the ETH is inspired by ideas about classical chaos and ergodicity, there is no proof that chaos in dynamics implies ETH. The conclusions of CT appear completely unrelated to whether a system is chaotic. The principal mechanism of CT is that typical states are close to maximally entangled, or already at equilibrium. The problem is that we would like information about highly atypical states (out-of-equilibrium low-entanglement states) that form a set of measure zero in most subspaces of high dimension. The dynamical extension [7] solves this problem for systems where a small subsystem can be highly atypical in a much larger typical bath. This is a reasonable assumption for a near-isolated quantum system interacting with the rest of the world, but is not useful when we wish to consider an even smaller class of states, where the entire system is far from equilibrium and has low entanglement. It is also a statement about time averages as opposed to instantaneous density matrices. Finally, like ETH, CT gives no criterion on the time evolution to distinguish thermalizing from non-thermalizing systems. On a related note, there is no explanation for the mechanism of thermalization, apart from the high dimension of Hilbert space.
In this work, we link the entropy growth of low-entanglement states under unitary time evolution $U(t)$ to measures of quantum chaos associated with $U(t)$. In doing so, we begin to address the above deficiencies in explanations of quantum thermalization. More specifically, we consider the chaos criterion proposed in [10].
In that work, the unitary time evolution operator $U(t) = e^{-itH}$ is mapped to a doubled state. The doubled state is defined by considering two copies of the physical system and preparing maximally entangled EPR pairs between each site of the physical system and its doubling partner. Denoting this state by $|I\rangle$, the unitary $U(t)$ is mapped to the pure state $(1 \otimes U(t))|I\rangle$. By construction, the two copies of the system (named the past system and the future system) are always maximally entangled, with $U(t)$ the Schmidt matrix of the wavefunction. Quantum chaos is characterized by the suppression of mutual information between subsystems of the future and past systems. A small mutual information between a region $A$ in the past and a region $B$ in the future tells us that operators in $A$ mostly evolve to non-local operators exceeding the boundary of $B$, causing a suppression of local correlation functions. This criterion is shown to be related to another chaos criterion, the out-of-time-ordered correlation (OTOC) functions [11,12,13,14,15].
In this paper, we show that the mutual information criterion defined in [10] also controls the entropy growth for proper choices of low entropy initial states. More specifically, we consider a given partition of the system into multiple regions, and consider an ensemble of initial states that are tensor products of random states in each region. After time evolution by $U(t)$, we study the purity $\mathrm{Tr}\,\rho_A(t)^2$ of a subsystem $A$ in the final state. The ensemble average of the second Renyi entropy is determined by a sum in which each term is controlled by the second Renyi mutual information in the doubled state. When the mutual information terms are sufficiently small, a typical product state at initial time evolves into a state with nearly maximal entropy. Therefore we have shown that chaos in the dynamics $U(t)$ implies thermalization, at least for the ensemble of unentangled initial states we define. Since the random product state has a high energy, the final state has maximal entropy and infinite temperature. We also discuss the generalization of our result to initial state ensembles with finite temperature.
The remainder of the paper is organized as follows. In Section 2, we review the relevant aspects [...]
In what follows, we denote density matrices of subsystems by $\rho_A = \mathrm{Tr}_{\bar A}\,\rho$, the dimension of subsystem $A$ by $D_A$, and operators that act by identity in $\bar A$ as $O_A$. As a reminder, the von Neumann entropy of a density matrix $\rho$ is $S[\rho] = -\mathrm{Tr}[\rho \ln \rho]$, and the mutual information between two subsystems in $\rho$ is $I[\rho; A, B] = S[\rho_A] + S[\rho_B] - S[\rho_{A \cup B}]$.

Quantum chaos
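We start by reviewing some recent results in understanding quantum chaos. In classical systems, one diagnostic of chaos is exponential sensitivity to initial conditions, quantified by the exponential growth of the Poisson bracket of some pair of phase space coordinates, $\{q(t), p(0)\} = \partial q(t)/\partial q(0) \sim e^{\lambda_L t}$. A natural quantum generalization replaces the Poisson bracket by a commutator and considers the expectation value of its square, $-\mathrm{Tr}\left(\rho\,[W(t), V(0)]^2\right)$, for a pair of operators $W$ and $V$ [11]. Of the four terms in the expansion of this expression, the most interesting for our purpose is the out-of-time-ordered four-point correlator (OTOC), $C_4(W(t), V(0)) = \mathrm{Tr}[\rho\, W(t) V(0) W(t) V(0)]$ (1), whose decay for thermal $\rho$ has been interpreted as a signal of quantum chaos [12,13,10].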
The decay of $C_4$ seems to be an operator-dependent statement, but is in fact related to an information-theoretic quantity, the second Renyi entropy $S^{(2)}$, computed from the time evolution $U(t) = e^{-iHt}$ [10].
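To make the operator dependence of the OTOC concrete, the following sketch evaluates an infinite-temperature four-point function numerically for a small qubit system. It assumes the convention $C_4 = \mathrm{Tr}[W(t) V W(t) V]/D$ with $W(t) = U^\dagger W U$; the system size, the choice of Pauli operators, and the helper names `on_site`, `otoc_infinite_temperature` and `haar_unitary` are illustrative assumptions and are not taken from [10]. A Haar-random unitary is used as a stand-in for chaotic dynamics.

```python
import numpy as np

# Single-qubit identity and Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def on_site(op, site, n):
    """Embed a single-qubit operator at position `site` of an n-qubit system."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, op if k == site else I2)
    return out

def otoc_infinite_temperature(U, W, V):
    """Tr[W(t) V W(t) V] / D with W(t) = U^dagger W U (assumed convention)."""
    D = U.shape[0]
    Wt = U.conj().T @ W @ U
    return np.trace(Wt @ V @ Wt @ V) / D

def haar_unitary(D, rng):
    """Haar-random unitary from the QR decomposition of a Gaussian matrix."""
    G = (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    phases = np.diag(R) / np.abs(np.diag(R))
    return Q * phases  # rescale column j by phase j to fix the QR ambiguity

n = 4
D = 2 ** n
W = on_site(Z, 0, n)
V_far = on_site(X, n - 1, n)   # commutes with W at t = 0
V_same = on_site(X, 0, n)      # anticommutes with W at t = 0

identity = np.eye(D, dtype=complex)
otoc_trivial_far = otoc_infinite_temperature(identity, W, V_far)
otoc_trivial_same = otoc_infinite_temperature(identity, W, V_same)

rng = np.random.default_rng(0)
U = haar_unitary(D, rng)
otoc_random = otoc_infinite_temperature(U, W, V_far)
```

For trivial evolution the correlator equals $+1$ or $-1$ depending on whether the two Pauli operators commute or anticommute, while for the random unitary its magnitude is bounded by 1 and is typically much smaller.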
Since $U(t) \in \mathcal{H} \otimes \mathcal{H}^*$, we can consider it as a normalized state in a Hilbert space with inner product $\langle A, B \rangle = \mathrm{Tr}[A^\dagger B]/D$. The state $|I\rangle$ encodes an isomorphism from operators (elements of $\mathcal{H}^* \otimes \mathcal{H}$) to states in a doubled system (elements of $\mathcal{H} \otimes \mathcal{H}$) by right action, so we can explicitly map the unitary operator $U(t)$ to the state $|U(t)\rangle = (1 \otimes U(t))|I\rangle$ (see footnote 2). We denote the density matrix associated with the pure state $|U(t)\rangle$ as $\rho^{U(t)} = |U(t)\rangle\langle U(t)|$. The construction of $|U(t)\rangle$ and an example partial trace of $\rho^{U(t)}$ are illustrated in Figure 1.
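The operator-to-state map can be checked directly in a few lines. The sketch below assumes the normalization $|I\rangle = D^{-1/2}\sum_i |i\rangle_P |i\rangle_F$ in the computational basis, so that the components of $(1 \otimes U)|I\rangle$ are $U_{fp}/\sqrt{D}$; the function name `doubled_state` and the choice of a Hadamard gate as the example unitary are illustrative assumptions.

```python
import numpy as np

def doubled_state(U):
    """Return |U> = (1 (x) U)|I> as a matrix psi[p, f] (past index, future index).

    With |I> = D^{-1/2} sum_i |i>_P |i>_F, the components are U[f, p] / sqrt(D).
    """
    D = U.shape[0]
    return U.T / np.sqrt(D)

# Example unitary: a single-qubit Hadamard gate (any unitary would do).
U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
psi = doubled_state(U)

norm = np.vdot(psi, psi).real        # <U|U>
rho_past = psi @ psi.conj().T        # partial trace over the future copy
rho_future = psi.T @ psi.conj()      # partial trace over the past copy
```

Both reduced density matrices come out proportional to the identity, which is the statement that the past and future copies are maximally entangled for any unitary.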
Correlations between the past and future copies of Hilbert space in $|U(t)\rangle$ are related to chaos and scrambling. For example, the mutual information between a region $A$ in the future and a region $B$ in the past bounds correlations in time (equation (2)). We can already see a connection between the information content of $\rho^{U(t)}$ and thermalization in (2). If the mutual information between $A$ in the future and $B$ in the past is small in $\rho^{U(t)}$, the action of an operator in $B$ in the past has little influence on the expectation value of an operator in $A$ in the future. This shows that small mutual information in $\rho^{U(t)}$ is sufficient for re-thermalization of the infinite temperature ensemble after perturbation. Thus in this case, we have the natural statement that information in $U(t)$ between regions in the future and past tells us how sensitive the future region is to the initial conditions in the past region.
The main goal of this work is to extend this result to far-from-equilibrium pure states. More generally, (2) shows that we can think of the mutual information $I[\rho^{U(t)}; A_F, B_P]$ roughly as quantifying how much initial conditions in $B$ determine the subsystem $A$ after time evolution $U(t)$.
A main result of [10] is a more explicit connection of past-future mutual information to chaos: the average of the OTOC (equation (1)) over operators in subsystem $A$ in the future and $B$ in the past is determined by the second Renyi entropy of $\rho^{U(t)}$ on $A_F \cup \bar{B}_P$, where $\bar{B}_P$ is the complement of $B_P$ in the past system. By average of operators on a subsystem $A$ we mean a weighted sum over the $D_A^2$ Hermitian operators in a complete, orthonormal basis (under the above inner product on operators). We will write these operators in a script font, as $\mathcal{O}_A$, with the average implied wherever they appear. The second Renyi entropy is defined as $S^{(2)}[\rho] = -\ln \mathrm{Tr}[\rho^2]$ and is a measure of uncertainty in $\rho$: for pure states, $S^{(2)} = 0$, while for maximally mixed states $S^{(2)} = S = \ln D$, where $S$ is the von Neumann entropy and $D$ is the dimension of the Hilbert space. There are some other properties of $S^{(2)}[\rho^{U(t)}_{A_F \cup B_P}]$ that will be important in what follows. It can be seen from Jensen's inequality that $S^{(2)}[\rho] \le S[\rho]$. Thus when $S^{(2)}$ is near-maximal, so is $S$. We also have a lower bound on $S^{(2)}[\rho^{U(t)}_{A_F \cup B_P}]$ set by the subsystem dimensions. Thus for $B$ much larger than $A$, $S^{(2)}$ is large for "kinematic" reasons, independent of the time evolution. The more interesting quantity in this case is a version of the mutual information adapted for Renyi entropy, $I^{(2)}[\rho; A, B] = S^{(2)}[\rho_A] + S^{(2)}[\rho_B] - S^{(2)}[\rho_{A \cup B}]$, which is non-negative in our state since $\rho^{U(t)}$ is maximally entangled between the future and past. $I^{(2)}[\rho^{U(t)}; A_F, B_P]$ measures the deviation of $S^{(2)}[\rho^{U(t)}_{A_F \cup B_P}]$ about its kinematic value, and bounds the corresponding mutual information $I[\rho^{U(t)}; A_F, B_P]$ from above. We can then relate $I^{(2)}[\rho^{U(t)}; A_F, B_P]$ to the averaged OTOC of operators on $A$ and $\bar{B}$ (equation (4), as shown in [10]). Thus scrambling in $U(t)$ as quantified by $S^{(2)}$ is directly related to chaos. A generically small four-point correlator means a large Renyi entropy or a small $I^{(2)}$. For $A$ and $B$ small, the expression (4) is actually in terms of non-local operators on $\bar{B}$. In this case, there is a more natural expression in terms of local operators on $A$ and $B$ (equation (5)). The two expressions (4) and (5) emphasize the important point that the second Renyi mutual information characterizes the behavior of both two-point functions and the OTOC. $I^{(2)}[\rho^{U(t)}; A_F, B_P]$ between two small regions $A$ and $B$ is governed by two-point functions of operators supported on $A$ and $B$, while that between a small region $A$ and a big region $B$ (bigger than half system size) is governed by the OTOC of operators supported on $A$ and the smaller region $\bar{B}$. This is also consistent with the fact that the decay of the OTOC implies a stronger scrambling of information than simply the decay of two-point functions. As we will see below, one utility of the point of view of information is a unified treatment of the two- and four-point functions.
Footnote 2: It is important to note that $|U(t)\rangle$ depends on the basis choice used when defining $|I\rangle$, which determines the isomorphism $\mathcal{H} \cong \mathcal{H}^*$. However, entanglement properties of $\rho^{U(t)}$ are completely independent of the basis choice.
Figure 1: Pictorial representation and explicit construction of the mapping from the time evolution operator $U(t)$ to the state $|U(t)\rangle \in \mathcal{H} \otimes \mathcal{H}$ and the associated density matrix $\rho^{U(t)}$. First, in 1a we introduce our notation and draw $U(t)$ as a tensor with "input" legs at the bottom and "output" legs at the top. To help keep track of the future and past, we draw the output edge of $U(t)$ as a bolded line. Each leg corresponds to a subsystem of $\mathcal{H}$ and denotes an index in the tensor, and contraction is represented by simply connecting "input" with "output" legs. A particular example of this operation is shown in 1b, where we depict action by $1 \otimes U(t)$ on the maximally entangled state $|I\rangle$, turning $U(t)$ into a state $|U(t)\rangle$ on a doubled system. In 1c we show $\rho^{U(t)}_{A_F \cup B_P}$; on the left, $A$ has a white box around it, and $B$ has a red box. $\rho^{U(t)}$ is maximally entangled between the past and future, so that for any region $R$ exclusively in the past or future, $S[\rho^{U(t)}_R] = \ln D_R$.
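As a small worked example of the Renyi mutual information between past and future regions, the sketch below compares the identity and a SWAP gate on two qubits. It assumes the definitions $S^{(2)}[\rho] = -\ln \mathrm{Tr}[\rho^2]$ and $I^{(2)} = S^{(2)}[\rho_A] + S^{(2)}[\rho_B] - S^{(2)}[\rho_{A \cup B}]$, applied to the doubled state with components $U_{fp}/\sqrt{D}$; the helper names and the choice of gates are illustrative assumptions.

```python
import numpy as np

def doubled_state(U, n):
    """|U> as a tensor with n past qubit indices followed by n future ones."""
    D = 2 ** n
    psi = U.T / np.sqrt(D)            # psi[past, future]
    return psi.reshape([2] * (2 * n))

def renyi2(psi, keep):
    """Second Renyi entropy -ln Tr[rho^2] of the qubits listed in `keep`."""
    rest = [k for k in range(psi.ndim) if k not in keep]
    # Rows are indexed by the kept qubits, columns by the traced-out qubits.
    m = np.transpose(psi, list(keep) + rest).reshape(2 ** len(keep), -1)
    rho = m @ m.conj().T
    return -np.log(np.trace(rho @ rho).real)

def renyi2_mutual_information(psi, n, a_future, b_past):
    """I^(2) between future qubits `a_future` and past qubits `b_past`."""
    A = [n + q for q in a_future]     # future qubits are stored after past ones
    B = list(b_past)
    return renyi2(psi, A) + renyi2(psi, B) - renyi2(psi, A + B)

n = 2
identity = np.eye(4, dtype=complex)
swap = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

psi_id = doubled_state(identity, n)
psi_swap = doubled_state(swap, n)

# Future qubit 0 versus past qubit 0.
i2_identity = renyi2_mutual_information(psi_id, n, [0], [0])
i2_swap = renyi2_mutual_information(psi_swap, n, [0], [0])
# Future qubit 1 versus past qubit 0: this is where SWAP sends the information.
i2_swap_moved = renyi2_mutual_information(psi_swap, n, [1], [0])
```

Neither gate is chaotic: in both cases the information about past qubit 0 stays fully localized on a single future qubit, so $I^{(2)}$ takes its maximal value $2\ln 2$ for one pair of regions and vanishes for the other.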
In [10], the relationship (4) is used to show that a four-point correlator decaying to some value less than $\epsilon$ in any region implies a bound on a sum of mutual informations between past and future regions. In principle, $\epsilon$ can be so small that this sum is arbitrarily close to zero. In more realistic models, we can expect that the OTOC will decay to a value that is an inverse polynomial in the logarithm of the total Hilbert space dimension. This sum is called tripartite information and its negativity is proposed as a measure of "scrambling" due to unitary time evolution; then quantum chaos as measured by the decay of $C_4$ implies scrambling.
We would like to make a side remark at the end of this section. We treat $I^{(2)}$ and $I$ for $\rho^{U(t)}$ as operator-independent diagnostics of chaos. It is clear from the discussion above that if the OTOC and two-point functions decay generically, $I^{(2)}$ will be small, which implies $I$ is small as well. Although it is most direct from the discussion above to treat $I^{(2)}$ as the intrinsic measure of chaos and $I$ simply as a quantity also small in chaotic systems only because it is bounded by $I^{(2)}$, the true mutual information $I$ is more natural in many other contexts and it is intuitive that small mutual information of $\rho^{U(t)}$ should imply chaos. To that end, using a bound on von Neumann entropy in terms of Renyi entropy [16] (see Appendix B), we can show an upper bound on $I^{(2)}$ in terms of $I$ (equation (6)). Thus a sufficiently small mutual information implies small $I^{(2)}$, which in turn implies chaos according to the OTOC. In the remainder of the work, we will focus on $I^{(2)}$, but (6) should be kept in mind as a way to bound $I^{(2)}$ in terms of the true mutual information.

Thermalization of completely random product states
Our goal is to understand how entropy growth and thermalization are related to quantum chaos as defined above. As discussed in Section 1, entropy growth is a state-dependent statement and can only be true for specific classes of states, for example initial states with small entanglement. The most naive choice of initial state ensemble is product states of some fixed granularity. More precisely, we consider a partition of the initial system into regions $R_s$ such that $\bigcup_{s=1}^{S} R_s = P$ is the whole system. Correspondingly, each region $R_s$ has a Hilbert space $\mathcal{H}_s$, and the Hilbert space of the whole system can be written as a tensor product of subsystems $\mathcal{H} = \bigotimes_s \mathcal{H}_s$. We consider states of the form $|\psi(0)\rangle = \bigotimes_s |a_s\rangle$, with $|a_s\rangle$ a random pure state in $\mathcal{H}_s$. An example of one of these states, along with its time evolution, is shown in Figure 2a. For this ensemble, the average purity of a subsystem $A$ after time evolution is
$$\mathbb{E}_{\mathcal{H}}\, e^{-S^{(2)}[\rho^{\psi}_A(t)]} = \frac{1}{D_A} \prod_{s=1}^{S}\left(1 + \frac{1}{D_s}\right)^{-1} \left[ 1 + \frac{D_A^2}{D} + \sum_{R} \frac{e^{I^{(2)}[\rho^{U(t)}; A_F, R_P]}}{D_R} \right] \tag{7}$$
where the sum runs over all nontrivial subregions $R = R_{i_1} \cup R_{i_2} \cup \dots \cup R_{i_n}$ that are unions of some of the building blocks $R_s$. $P(\{R_s\})$ denotes the set of all such $R$'s, i.e. the powerset of $\{R_1, R_2, \dots, R_S\}$. A similar relation has recently been studied in the context of random dynamics in [17]. A representation of a typical term in the sum is shown in Figure 2b. We give some examples of this formula below, and present a derivation in Appendix A.
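The structure of this result can be checked numerically on a few qubits. The sketch below assumes that the average purity takes the form $\frac{1}{D_A}\prod_s (1 + 1/D_s)^{-1} \sum_R e^{I^{(2)}[\rho^{U(t)}; A_F, R_P]}/D_R$, with the sum running over the full powerset (the empty set contributes 1 and the whole system contributes $D_A^2/D$), and compares it with the exact Haar average $\mathbb{E}\,\rho^{\otimes 2} = \bigotimes_s (1_s + X_s)/(D_s(D_s+1))$ over single-qubit product states. The three-qubit system, the Haar-random $U$, and all helper names are illustrative assumptions.

```python
import numpy as np

def haar_unitary(D, rng):
    """Haar-random unitary from the QR decomposition of a Gaussian matrix."""
    G = (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def swap_operator(n, region):
    """Swap the qubits in `region` between two copies of an n-qubit system."""
    D = 2 ** n
    mask = sum(1 << (n - 1 - q) for q in region)  # qubit 0 is the leading bit
    X = np.zeros((D * D, D * D))
    for i in range(D):
        for j in range(D):
            i_new = (i & ~mask) | (j & mask)
            j_new = (j & ~mask) | (i & mask)
            X[i_new * D + j_new, i * D + j] = 1.0
    return X

def renyi2(psi, keep):
    """Second Renyi entropy of the tensor legs listed in `keep`."""
    rest = [k for k in range(psi.ndim) if k not in keep]
    m = np.transpose(psi, list(keep) + rest).reshape(2 ** len(keep), -1)
    rho = m @ m.conj().T
    return -np.log(np.trace(rho @ rho).real)

n = 3                      # three qubits, each its own region R_s
D = 2 ** n
A = [0]                    # subsystem A: the first qubit
D_A = 2 ** len(A)
U = haar_unitary(D, np.random.default_rng(0))
subsets = [[s for s in range(n) if (m >> s) & 1] for m in range(2 ** n)]

# Left side: exact Haar average of Tr[rho_A(t)^2] over product states.
UU = np.kron(U, U)
X_A = swap_operator(n, A)
haar_norm = (2 * 3) ** n   # prod_s D_s (D_s + 1) with D_s = 2
lhs = sum(np.trace(UU @ swap_operator(n, R) @ UU.conj().T @ X_A)
          for R in subsets).real / haar_norm

# Right side: Renyi mutual informations of the doubled state |U>.
psi = (U.T / np.sqrt(D)).reshape([2] * (2 * n))   # legs: past, then future
A_F = [n + q for q in A]
total = 0.0
for R in subsets:
    i2 = renyi2(psi, A_F) + renyi2(psi, R) - renyi2(psi, R + A_F)
    total += np.exp(i2) / 2 ** len(R)
rhs = total / (D_A * (1 + 1 / 2) ** n)
```

The two sides agree to numerical precision for any unitary, since the identity used here is purely algebraic; the dynamics enter only through the values of the individual $I^{(2)}$ terms.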
Apart from bounding von Neumann entropy from below, the utility of computing $S^{(2)}$ is that it can be used to bound the one-norm difference of density matrices. Recall that differences in expectation values of bounded operators between two density matrices are bounded by the one-norm of their difference, so the one-norm is the natural distance for density matrices. By Jensen's inequality, the expected one-norm distance of $\rho^{\psi}_A(t)$ from the maximally mixed state is bounded in terms of the expected purity (equation (8)). Thus as long as the deviation from maximal entanglement is sufficiently small, we can say a density matrix thermalizes in the one-norm in expectation. As we will see below, thermalizing in expectation (at infinite temperature) sufficiently well is sufficient for most states to thermalize. Note that the infinite temperature ensemble is the appropriate choice here, since for typical Hamiltonians most states will be infinite temperature states (cf. [18]), and we always have $\mathbb{E}_{\mathcal{H}} \langle H \rangle_{\rho^{\psi}} = \langle H \rangle_{\beta=0}$.
To get an intuition for the implications of (7), it is illuminating to consider some special cases. First, we consider a trivial partition with only one region $R_1 = P$ equal to the whole system. In this case the ensemble is that of random pure states on the whole system. Our formula reduces to
$$\mathbb{E}_{\mathcal{H}}\, e^{-S^{(2)}[\rho^{\psi}_A(t)]} = \frac{1}{D_A}\,\frac{1 + D_A^2/D}{1 + 1/D} \tag{9}$$
which is completely independent of dynamics. In the limit of large system size, as long as the subsystem $A$ is less than half the system and grows at most linearly with system size, $D_A^2/D$ decays exponentially with system size. Then (9) is the familiar statement that to exponential accuracy, a random pure state is close to maximally entangled in any small subsystem. This result is expected, as the typical pure state is indeed close to maximally entangled in any subsystem [19,20,21], and a random state evolves to another random state under any dynamics. In fact, the result of [19], derived by explicit integration on $S^{2D-1}$, is a special case of (9) for trivial evolution $U(t) = 1$. We can see the relationship to a more traditional measure of entanglement, the von Neumann entropy, by Jensen's inequality: $\mathbb{E}_{\mathcal{H}}\, S[\rho^{\psi}_A(t)] \ge \mathbb{E}_{\mathcal{H}}\, S^{(2)}[\rho^{\psi}_A(t)] \ge -\ln \mathbb{E}_{\mathcal{H}}\, e^{-S^{(2)}[\rho^{\psi}_A(t)]}$.
We move to the case of two initial subsystems, $\mathcal{H} = \mathcal{H}_S \otimes \mathcal{H}_{\bar S}$, where we take $1 \ll D_S \le D_{\bar S}$. A typical state and its time evolution in this setup is shown in Figure 2c. The expression (7) becomes
$$\mathbb{E}_{\mathcal{H}}\, e^{-S^{(2)}[\rho^{\psi}_A(t)]} = \frac{1}{D_A}\,\frac{1 + D_A^2/D + e^{I^{(2)}[\rho^{U(t)}; A_F, S_P]}/D_S + e^{I^{(2)}[\rho^{U(t)}; A_F, \bar{S}_P]}/D_{\bar S}}{1 + (1 + D_S + D_{\bar S})/D} \tag{11}$$
Already in this next-to-simplest case dynamics play a central role. First, if region $S$ is large (and $\bar S$ is even larger), all the terms in $(1 + D_S + D_{\bar S})/D$ are exponentially small. Regardless, this contribution serves to increase $S^{(2)}$. For $A$ smaller than half system size, $D_A^2/D$ is exponentially small.
Figure 2: Note that in general each state $|a_s\rangle$ is different despite being drawn using the same symbol. In Figure 2b, we show a typical region $R$ from the powerset $P(\{R_s\})$. Figure 2c illustrates the special case with bipartition of the past system.
The only decrease from maximal entanglement that can survive in the large system size limit is then due to the terms involving $I^{(2)}[\rho^{U(t)}; A_F, R_P]$. Thus small $I^{(2)}$ between $A$ and both $S$ and $\bar S$, equivalent respectively to the generic decay of two- and four-point correlators between $A$ and $S$, is necessary and sufficient for the expectation of $S^{(2)}[\rho^{\psi}_A(t)]$ to be near the maximal (equivalently thermal at infinite temperature) value for initial product states in $S$ and $\bar S$. It is important to note that "small" $I^{(2)}$ depends on our choice of $S$, as terms have the form $e^{I^{(2)}[\rho^{U(t)}; A_F, R_P]}/D_R$. If we want this contribution to be less than $\epsilon_R$, we only require the condition that $I^{(2)}[\rho^{U(t)}; A_F, R_P] < \ln(\epsilon_R D_R)$. It is also useful to rewrite (11) in terms of correlation functions (equation (12)). For large systems $D_A^2/D$ is exponentially smaller than $D_A^2/D_S$, so we can safely focus on the contributions due to correlators. It is clear that if the two-point functions decay and $D_A^2/D_S$ is finite, the deviation from maximum entanglement will be dictated by the (strictly positive) four-point term. Note that depending on the choice of $A$ and $S$, even with both less than half system size and (for lattice models) $|A| < |S|$, $D_A^2/D_S$ may be made of order 1, or even much greater than 1. As mentioned in Section 2, $C_4(\mathcal{O}_A(t), \mathcal{O}_S(0))_{\beta=0}$ can be as small as an inverse polynomial in the logarithm of system dimension in chaotic systems, so as long as $A$ and $S$ are chosen so that $D_A^2/D_S$ does not grow too quickly with system size, in chaotic systems random product states on $S$ and $\bar S$ will evolve to look thermal in $A$. In the limit $D_A^2/D_{\bar S} \to 0$, only the local two-point function will contribute to deviations from $\ln D_A$. This is the case considered by CT, and shows that chaos in the OTOC sense is not necessary for thermalization into a much larger random bath. On the other hand, when $D_{\bar S}$ is finite, if four-point correlations do not decay sufficiently we can have significant corrections to thermal entropy.
This argument extends without significant modification to the case of $S$ initial subsystems $\mathcal{H} = \bigotimes_{s=1}^{S} \mathcal{H}_s$, where $S$ may grow linearly with system size. As long as two-point functions generically decay between $A$ and subsystems up to half system size, the contribution from summands in (7) involving $R$ less than half the system will be small. The decay of four-point correlators between $A$ and subsystems up to half system size is necessary to bound contributions from summands involving $R$ greater than half system size. Concretely, we need bounds of the above form for regions $R$ less than half system size for $A$ to look maximally entangled on average. Note that in an integrable system, it is expected that $I^{(2)}$ will always be high for some subextensive subregions $R$ (for example numerics in [10]), although these regions may change in time as information propagates. Some subextensive region of initial conditions largely determines the density matrix in $A$. This demonstrates an obstacle to thermalization in non-chaotic isolated systems. In contrast, a chaotic system will scramble information about the initial conditions in each $\mathcal{H}_s$ across extensive regions of the system. Equivalently, extensive knowledge of initial conditions is required to determine the density matrix in $A$. The only $R$ for which [...]
When the expected deviation from thermal behavior is small, we can meaningfully bound the number of states that do not thermalize by Markov's inequality (equation (14)). The conclusion is that if we find that states are expected to thermalize sufficiently well, then a particular state is likely to have the average behavior after long times. This bound is easily "weakened" to a statement about probabilities of significant deviation for local entropy. On the other hand, if states are not expected to thermalize, we do not expect to find such a bound on physical grounds; the long-time trajectory of non-thermalizing systems can depend sensitively on the details of initial conditions.

Finite temperature extension
As discussed above, the preceding results should be interpreted as statements about thermalization at infinite temperature. To get ensembles other than infinite temperature we must restrict the set of initial states we average over. One natural way is to still consider a partition into regions $R_s$, $s = 1, 2, \dots, S$, but to draw the random state in each region from a linear subspace $\mathcal{H}^M_s \subseteq \mathcal{H}_s$ of dimension $D^M_s$, with associated projector $\pi^M_s$. The ensemble average of these product states is $\rho^M = \bigotimes_s \pi^M_s / D^M_s$, and we write $\pi^M = \bigotimes_s \pi^M_s$. The above results suggest that entropy growth and thermalization of these product states should be related to entanglement properties of some state depending on $U(t)$; tentatively, call this state $|U_M(t)\rangle$ (in Sections 2 and 3, the relevant state was isomorphic to the operator $U(t)$). As a first check that we have chosen a useful state, it is natural to require that some analog of (2) hold for $\rho^{U_M(t)}$, the density matrix obtained by replacing $U(t)$ with $U_M(t)$ in Figures 1 and 2. The case $\pi^M = 1$ has been described in Section 2; it turns out this is a very special case, due to the fact that 1 commutes with everything. For general $\rho^M$ and two regions $A_F$, $B_P$ in the future and past systems, respectively, we have a bound analogous to (2) (equation (15)). In this case $I^{(2)}[\rho^{U_M(t)}; A_F, B_P]$ can become negative (in contrast to the $\pi^M = 1$ case), so $I^{(2)}$ as defined in Section 2 is not as fundamental a quantity. It also does not bound the corresponding mutual information $I$. We can define a quantity $\tilde{I}$ that upper bounds $I$ (equation (16)). There are equalities analogous to (4) and (5) relating $\tilde{I}$ to chaos (equations (17) and (18)), valid for regions $R \in P(\{R_s\})$ (for other sorts of regions, factors of $D^M_R$ will be replaced by entropies), in which the operators $\mathcal{O}_R$ are replaced by modified operators $\tilde{\mathcal{O}}_R$. This modification of $\mathcal{O}_R$ has a natural interpretation, paralleling the discussion of (15).
The OTOCs in (17) are to be computed for operators in the past that do not move states out of $\mathcal{H}^M$ (and act by zero on states outside); in the examples following (15), $\tilde{\mathcal{O}}_R$ will conserve local energy density or subsystem charge, respectively. We also clearly have $[\pi^M, \tilde{\mathcal{O}}_R] = 0$, so according to (15) the mutual information bounds the effect of these operators in the most intuitive way. Note that as $\mathcal{H}^M_s$ becomes smaller, the four-point contributions (17) become more important.
With these preparations, we can extend the results of Section 3 to the case of generic $\rho^M$. For product states $|\psi_M\rangle = \bigotimes_s |a^M_s\rangle$ with each $|a^M_s\rangle$ taken from $\mathcal{H}^M_s$, we obtain the result analogous to (7) (equation (19)). The $\tilde{I}$ terms are the positive corrections for product states from the entropy computed from $\rho^M$. We can also generalize the discussion surrounding (8) to obtain a one-norm bound (equation (20)), so states chosen from the ensemble given by $\rho^M$ "equilibrate" (in the above sense of one-norm) to $\rho^M$ given small $\tilde{I}$ terms. The major difference is that $\rho^M$ is generically not thermal. Interpreting small $\tilde{I}$ as chaos, this shows that chaos is sufficient to "scramble" initial conditions to the extent that the particular state within the initial ensemble is irrelevant, but we have not shown that the "unentangled microcanonical ensemble" $\rho^M$ itself thermalizes. That said, in a system with a local Hamiltonian and with a choice of the regions $R_s$ of size much bigger than the thermal correlation length, the contribution of boundary terms to energy is small, and $\rho^M$ has a volume law entropy that is close to the thermal value at the same energy expectation value. In other words, an initial pure state drawn from $\rho^M$ has already almost thermalized when the reduced density matrix approaches that of $\rho^M$.

Conclusion
We have explored the consequences of small correlation (as computed in $|U(t)\rangle$ and $|U_M(t)\rangle$) between the past and future. For the density matrices $\rho_{\beta=0}$ and $\rho^M$, the expressions (2) and (15) respectively show that certain types of perturbations to these density matrices in some region $B$ are "forgotten" in some region $A$ as long as the information between $A$ in the future and $B$ in the past for $U(t)$ or $U_M(t)$ has had time to decay. Next, (7) and (19) show that the decay of information between past and future regions corresponds to entropy growth for far-from-equilibrium pure states. Finally, using these expressions in combination with (8), (14), and (20), we have shown that this decay of information between past and future means the initial conditions of a particular pure state chosen from an ensemble are forgotten (although the ensemble itself is not). Although the discussion proceeds most naturally in terms of information, we can also conclude that quantum chaos as diagnosed by the OTOC, together with the decay of local two-point functions, implies entropy growth and erasure of initial conditions ("equilibration") by relating the OTOC and two-point functions to information. Likewise, generic thermalization implies that the contribution of the $I^{(2)}$ or $\tilde{I}$ terms in (7) or (19) is small, so mutual information between local regions in the future and sub-extensive regions in the past is bounded above. Thus there is a sense in which quantum thermalization implies chaos.
The most important extension of this work is a deeper understanding of the finite temperature results. The same state may be a member of several ensembles $\rho^M$, but the formalism we developed does not identify a preferred density matrix. Such a preferred density matrix should be a time independent distribution, for example the Boltzmann distribution, when that pure state equilibrates. A first step may be to find conditions such that the "more thermal" (higher entanglement) $\rho^M$ thermalize. It should also be possible to improve the factor $D_A$ in (20) in the case that $\mathbb{E}_{\mathcal{H}}\, e^{-S^{(2)}[\rho^{\psi_M}_A(t)]} \approx e^{-S^{(2)}[\rho^{M}_A(t)]}$ for locally thermalizing $\rho^M$. Finally, to make this work more practically applicable, it is important to show that either chaos as measured by the OTOC or decay of information between the future and past is generic for local Hamiltonians.
[...] space $V_n$ (as defined above), $V_n$ can be written as a direct sum $V_n = \bigoplus_Y W_Y \otimes S_Y$, where $Y$ is an index running over Young diagrams with $n$ boxes, and $W_Y$ ($S_Y$) is an irreducible representation of $U(D)$ ($S_n$) not isomorphic to any other representation appearing with different $Y$.
We will typically use this theorem to constrain operators that commute with the action of $U(D) \times S_n$; since each irrep of the combined action appears only once in the decomposition of $V_n$, such an operator must act as multiplication by a constant on each irrep by Schur's lemma. Furthermore, the theorem tells us we can project onto each irrep by projecting onto an irrep of only $S_n$, so each such operator can be written as a sum of projectors, each of which is in turn a sum of elements of $S_n$.
As an intermediate result, we must compute the Haar integral $A_n = \int dU\, (U|\psi\rangle\langle\psi|U^\dagger)^{\otimes n}$, which is clearly independent of the choice of $|\psi\rangle$. Furthermore, $A_n$ commutes with the above actions of $U(D)$ and $S_n$, so we take $A_n$ to be a sum of $\sigma \in S_n$. It is easy to check that in fact $\sigma A_n = A_n$ for all $\sigma \in S_n$, so $A_n \propto \sum_{\sigma \in S_n} \sigma$. To find the normalization factor, note that $\mathrm{Tr}[A_n] = 1$ while $\mathrm{Tr}[\sum_{\sigma \in S_n} \sigma] = D(D+1)\cdots(D+n-1)$.
We now compute the expectation of the second Renyi entropy in the setup of Section 4. Call the nonidentity element of $S_2$, which swaps tensor factors, $X$. If we have a Hilbert space $\mathcal{H}$ with a subsystem labelled $A$, there is a permutation group $S_n^A$ that acts on $\mathcal{H}^{\otimes n}$ by only permuting tensor factors corresponding to subsystem $A$ between copies. We refer to these group elements by a subscript $A$, so for example $X_A$.
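The $n = 2$ case of the Haar integral and the action of the partial swap $X_A$ can be verified explicitly for a few qubits. The sketch below assumes the normalized form $A_2 = (1 + X)/(D(D+1))$ and uses the standard identity $\mathrm{Tr}[\rho_A^2] = \mathrm{Tr}[\rho^{\otimes 2} X_A]$; the three-qubit system, the fixed random test state, and the helper name `swap_operator` are illustrative assumptions.

```python
import numpy as np

def swap_operator(n, region):
    """Swap the qubits in `region` between two copies of an n-qubit system."""
    D = 2 ** n
    mask = sum(1 << (n - 1 - q) for q in region)  # qubit 0 is the leading bit
    X = np.zeros((D * D, D * D))
    for i in range(D):
        for j in range(D):
            i_new = (i & ~mask) | (j & mask)
            j_new = (j & ~mask) | (i & mask)
            X[i_new * D + j_new, i * D + j] = 1.0
    return X

n = 3
D = 2 ** n
region_A = [0]
D_A = 2 ** len(region_A)

X = swap_operator(n, list(range(n)))    # full swap of the two copies
X_A = swap_operator(n, region_A)        # swap of subsystem A only

# Haar second moment of a random pure state: (1 + X) / (D (D + 1)).
A2 = (np.eye(D * D) + X) / (D * (D + 1))
trace_A2 = np.trace(A2)
average_purity = np.trace(A2 @ X_A)     # Haar average of Tr[rho_A^2]

# Swap trick on one fixed state: Tr[rho_A^2] = Tr[(rho (x) rho) X_A].
rng = np.random.default_rng(1)
psi = rng.normal(size=D) + 1j * rng.normal(size=D)
psi /= np.linalg.norm(psi)
m = psi.reshape(D_A, -1)                # rows: subsystem A, columns: the rest
rho_A = m @ m.conj().T
purity_direct = np.trace(rho_A @ rho_A).real
rho = np.outer(psi, psi.conj())
purity_swap = np.trace(np.kron(rho, rho) @ X_A).real
```

With $D_A = 2$ and $D = 8$ the averaged purity evaluates to $(D_A + D/D_A)/(D+1) = 2/3$, the familiar value for a random pure state.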
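To compute the expectation of the second Renyi entropy, we use the relation $\mathrm{Tr}[\rho_A^2] = \mathrm{Tr}[\rho^{\otimes 2} X_A]$. As a reminder, the distribution on initial $\rho^{\psi_M}$ in our case is fixed as follows. We are given a partition of Hilbert space into $S$ subsystems, $\mathcal{H} = \bigotimes_s \mathcal{H}_s$, and in the vector space associated to each subsystem we choose a linear subspace $\mathcal{H}^M_s$ (with associated projector $\pi^M_s$). The distribution on $\rho^{\psi_M}$ is independently Haar random on the subspace of each subsystem. From the above discussion on the Haar integral, it follows that
$$\mathbb{E}_{\mathcal{H}}\, (\rho^{\psi_M})^{\otimes 2} = \bigotimes_s \frac{(\pi^M_s)^{\otimes 2}\,(1_s + X_s)}{D^M_s (D^M_s + 1)} = \frac{1}{\prod_s (1 + 1/D^M_s)} \sum_{R \in P(\{R_s\})} (\rho^M)^{\otimes 2} X_R$$
where $D^M_s$ is the dimension of $\mathcal{H}^M_s$, $\rho^M = \bigotimes_s \pi^M_s / D^M_s$, and $P(\{R_s\})$ is the powerset of subsystems.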
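The second equality comes from noting that the product has $2^S$ terms, based on a choice of $1_s$ or $X_s$ for each subsystem, and the included swaps combine to give a single swap of all included subsystems. We then have
$$\mathbb{E}_{\mathcal{H}}\, \mathrm{Tr}[\rho^{\psi_M}_A(t)^2] = \frac{1}{\prod_s (1 + 1/D^M_s)} \sum_{R \in P(\{R_s\})} \mathrm{Tr}\left[U(t)^{\otimes 2} (\rho^M)^{\otimes 2} X_R\, (U(t)^\dagger)^{\otimes 2} X_A\right] = \frac{1}{\prod_s (1 + 1/D^M_s)} \sum_{R \in P(\{R_s\})} e^{-S^{(2)}[\rho^{U_M(t)}_{A_F \cup R_P}]}$$
where $U_M(t) = U(t)\sqrt{\rho^M}$. To see the last equality, we refer to the explicit construction of the state corresponding to $U_M(t)$ following the procedure of Figure 1 to check that the index contractions are correct, and the proportionality factor is correct since $\mathrm{Tr}[U(t)\rho^M U(t)^\dagger] = 1$. By the same construction, we obtain the entropies of the marginals of $\rho^{U_M(t)}$, so that upon multiplication by appropriate factors of entropy, equations (7) and (19) follow from (23).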
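The relation between the restricted Haar average and the purities of the state built from $U_M(t) = U(t)\sqrt{\rho^M}$ can also be checked numerically. The sketch below assumes the identity $\mathbb{E}\,\mathrm{Tr}[\rho_A(t)^2] = \prod_s \frac{D^M_s}{D^M_s+1} \sum_R e^{-S^{(2)}[\rho^{U_M(t)}_{A_F \cup R_P}]}$ and a doubled state with components $(U_M)_{fp}$; the two two-qubit regions, the three-dimensional subspace spanned by $|00\rangle, |01\rangle, |10\rangle$ in each region, and the helper names are hypothetical choices made only for illustration.

```python
import numpy as np

def haar_unitary(D, rng):
    """Haar-random unitary from the QR decomposition of a Gaussian matrix."""
    G = (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def swap_operator(n, region):
    """Swap the qubits in `region` between two copies of an n-qubit system."""
    D = 2 ** n
    mask = sum(1 << (n - 1 - q) for q in region)
    X = np.zeros((D * D, D * D))
    for i in range(D):
        for j in range(D):
            i_new = (i & ~mask) | (j & mask)
            j_new = (j & ~mask) | (i & mask)
            X[i_new * D + j_new, i * D + j] = 1.0
    return X

def renyi2(psi, keep):
    """Second Renyi entropy of the tensor legs listed in `keep`."""
    rest = [k for k in range(psi.ndim) if k not in keep]
    m = np.transpose(psi, list(keep) + rest).reshape(2 ** len(keep), -1)
    rho = m @ m.conj().T
    return -np.log(np.trace(rho @ rho).real)

n = 4
D = 2 ** n
regions = [[0, 1], [2, 3]]             # two regions of two qubits each
A = [0]
subsets = [[], regions[0], regions[1], regions[0] + regions[1]]

# Hypothetical subspace in each region: span{|00>, |01>, |10>}, dimension 3.
pi_s = np.diag([1.0, 1.0, 1.0, 0.0])
D_M = [3, 3]
pi = np.kron(pi_s, pi_s)
rho_M = pi / np.prod(D_M)

U = haar_unitary(D, np.random.default_rng(2))

# Left side: exact Haar average over product states within the subspaces.
UU = np.kron(U, U)
PP = np.kron(pi, pi)
X_A = swap_operator(n, A)
haar_norm = np.prod([d * (d + 1) for d in D_M])
lhs = sum(np.trace(UU @ PP @ swap_operator(n, R) @ UU.conj().T @ X_A)
          for R in subsets).real / haar_norm

# Right side: purities of the doubled state built from U_M = U sqrt(rho_M).
U_M = U @ np.sqrt(rho_M)               # rho_M is diagonal: elementwise sqrt
psi = U_M.T.reshape([2] * (2 * n))     # legs: past qubits, then future qubits
A_F = [n + q for q in A]
prefactor = np.prod([d / (d + 1) for d in D_M])
rhs = prefactor * sum(np.exp(-renyi2(psi, list(R) + A_F)) for R in subsets)
```

The agreement relies on $(\rho^M)^{\otimes 2}$ commuting with $X_R$ whenever $R$ is a union of the regions, which is why the check uses only such $R$.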
To connect (21), and more generally entropies of $\rho^{U_M(t)}$, to observables, we use the explicit form of projectors onto irreps of $S_2$: $\pi_\pm = (1 \pm X)/2$. Then an operator $A$ on $\mathcal{H}^{\otimes 2}$ that commutes with the joint action of $U(D) \times S_2$ can explicitly be written as a sum of $\pi_\pm$, with coefficients $\mathrm{Tr}[A\pi_\pm]/\mathrm{Tr}[\pi_\pm]$. Applied to the average $Y$ of $\mathcal{O} \otimes \mathcal{O}$ over an orthonormal basis of Hermitian operators, this gives $Y = X/D$. We can then write (using (22) and the following discussion), assuming that $R$ factors through the tensor factorization into $\mathcal{H}_s$ for convenience, the purities of $\rho^{U_M(t)}$ in terms of averaged correlation functions, where one of the steps uses $[\pi^M, X_R] = 0$. These expressions, after multiplication by $D^M_R e^{S[\rho^M_A(t)]}$, give equations (4), (5), (17), and (18). Finally, equations (2) and (15) follow from [...] and the definition of $\rho^{U_M(t)}$.
As a possible tool for computing higher Renyi entropies, we note that elements of $S_n$ can be implemented in terms of averages of local operators for $n > 2$, despite the fact that we used a property special to $n = 2$ (invariance of $\mathcal{O}^{\otimes 2}$ when summed over an orthonormal basis) above. The idea is to take some as-yet unchosen Hermitian operator $A$, and define $M = \int dU\, (UAU^\dagger)^{\otimes n}$, which commutes with $U(D) \times S_n$, and so is a weighted sum of projectors onto irreps of $S_n$. The numbers $\mathrm{Tr}[M\sigma]$ for $\sigma \in S_n$ determine the weights; these are in turn products of traces of powers of $A$. Thus $M$ depends only on the spectrum of $A$, and by tuning this spectrum we can tune the weights of projectors. For example, if we take $n = 2$, the Haar average over operators $A$ with some fixed spectrum satisfying $D(\sum_i \lambda_i)^2 = \sum_i \lambda_i^2$ is proportional to $X$.

B Derivation of Equation (6)
We present a short derivation of (6), based on equation 23 of [16]. That equation, in our notation, involves a parameter $\tau$, where another result of [16] is that $\tau \ge \ln(D_A D_B)/(\ln(D_A D_B) + 1)$. Then, noting that $S[\rho^{U(t)}_{A_F}] = S^{(2)}[\rho^{U(t)}_{A_F}] = \ln D_A$ and likewise for $\rho^{U(t)}_{B_P}$, as $\rho^{U(t)}$ is maximally entangled between past and future (see Figure 1c), we can multiply both sides by $D_A D_B$ and use the bound on $\tau$ to obtain (6).