Chaos and high temperature pure state thermalization

Classical arguments for thermalization of isolated systems do not apply in a straightforward way to the quantum case. Recently, there has been interest in diagnostics of quantum chaos in many-body systems. In the classical case, chaos is a popular explanation for the legitimacy of the methods of statistical physics. In this work, we relate a previously proposed criterion of quantum chaos in the unitary time evolution operator to the entanglement entropy growth of a far-from-equilibrium initial pure state. By mapping the unitary time evolution operator to a doubled state, chaos can be characterized by the suppression of mutual information between subsystems of the past and those of the future. We show that when this mutual information is small, a typical unentangled initial state evolves to a highly entangled final state. Our result provides a more concrete connection between quantum chaos and thermalization in many-body systems.


Introduction
Empirically, there is a generic tendency towards entropy growth in many-body systems. This "arrow of time" appears at odds with the fact that our physical models of these systems are often time-reversal symmetric. A common argument dates back to Boltzmann: states are exponentially likely to evolve to states of higher entropy, simply because of the counting of states at a given entropy (see for example [1]). In fact, in the presence of time-reversal symmetry this explanation is incomplete: for every state of a given entropy whose entropy grows, there is a state with the same entropy whose entropy decays. The best we can actually hope for is to explain why some class of preferred states experiences entropy growth in some subsystem under specific dynamics.
This question is closely related to another aspect of quantum many-body systems. The use of statistical ensembles to understand the long-time collective behavior of many degrees of freedom in terms of local microscopic interactions is one of the great simplifications and triumphs of modern physics. It is natural to ask when and why the use of these ensembles is justified. If all expectation values of interest for a given initial condition, after time evolution, can be computed to arbitrary accuracy in an ensemble depending only on macroscopic parameters, we will say that the system thermalizes for those initial conditions and that time evolution. For isolated classical systems, dynamical chaos is a sufficient and generic condition for ergodicity in phase space, which explains the accuracy of the microcanonical ensemble and hence of equilibrium statistical mechanics.
The quantum case is more complicated. It is important to note that there are classical systems with few degrees of freedom whose observables are well described by the microcanonical ensemble but which do not thermalize upon quantization. Although there are experimental probes of few-body quantum systems that do thermalize [2], generic thermalization in quantum systems appears to be inherently a many-body effect [3]. Quantum mechanics also supports a long-time behavior not present in classical systems: many-body localization (MBL). Recent experiments detecting MBL phases [4] provide practical motivation to explain the mechanism of, and conditions for, thermalization of isolated systems. Finally, our understanding of quantum chaos and ergodicity is still incomplete, and the leading justifications for quantum statistical mechanics are not as directly connected to quantum chaos as in the classical case.
There are two leading explanations for quantum thermalization. One is known as Canonical Typicality (CT) [5,6]: due to the exponentially large dimension of Hilbert space, or of the subspaces associated with finite energy windows, almost all pure states in the subspace appear on any small subregion as if they were randomly chosen from that subspace, i.e. they are indistinguishable from the microcanonical ensemble there. Importantly, the CT approach can be extended to the dynamical result [7] that, under weak assumptions about the distribution of eigenvalues of the Hamiltonian, a subsystem interacting with a sufficiently large bath spends most of its time close to its time average, independent of the initial state of the subsystem and for almost all initial states of the bath. One useful way to view CT is as an extension of the statistical argument for entropy growth: most states in a subspace are already close to maximally entangled within the subspace. The other explanation is the Eigenstate Thermalization Hypothesis (ETH) [3,8], which, loosely stated, is the conjecture that the high-energy eigenstates of quantized classically chaotic systems are indistinguishable from the microcanonical ensemble of the same system for local observables. This conjecture is well supported numerically for a large class of systems and gives a very clean description of quantum thermalization when it applies.
Some deficiencies remain in both approaches. Short of finding the eigenstates, there is no direct criterion that can be evaluated on a Hamiltonian to decide whether ETH holds. Relatedly, ETH has only been proven for a small class of systems. Moreover, although ETH is inspired by ideas about classical chaos and ergodicity, there is no proof that chaos in the dynamics implies ETH. The conclusions of CT, on the other hand, appear completely unrelated to whether a system is chaotic. The principal mechanism of CT is that typical states are close to maximally entangled, i.e. already at equilibrium. The problem is that we would like information about highly atypical states (out-of-equilibrium, low-entanglement states), which form a set of measure zero in most subspaces of high dimension. The dynamical extension [7] solves this problem for systems in which a small subsystem can be highly atypical within a much larger typical bath. This is a reasonable assumption for a nearly isolated quantum system interacting with the rest of the world, but it is not useful when we wish to consider an even smaller class of states, in which the entire system is far from equilibrium and has low entanglement. It is also a statement about time averages rather than instantaneous density matrices. Finally, like ETH, CT gives no criterion on the time evolution that distinguishes thermalizing from non-thermalizing systems; relatedly, it offers no explanation for the mechanism of thermalization apart from the high dimension of Hilbert space.

JHEP06(2019)025
In this work, we link the entropy growth of low-entanglement states under the unitary time evolution $U(t)$ to measures of quantum chaos associated with $U(t)$. In doing so, we begin to address the above deficiencies in explanations of quantum thermalization. More specifically, we consider the chaos criterion proposed in [10]. In that work, the unitary time evolution operator $U(t) = e^{-itH}$ is mapped to a doubled state. The doubled state is defined by considering two copies of the physical system and preparing maximally entangled EPR pairs between each site of the physical system and its doubling partner. Denoting this state by $|I\rangle$, the unitary $U(t)$ is mapped to the pure state $(1\otimes U(t))|I\rangle$. By construction, the two copies of the system (called the past system and the future system) are always maximally entangled, with $U(t)$ the Schmidt matrix of the wavefunction. Quantum chaos is characterized by the suppression of mutual information between subsystems of the future and past systems. A small mutual information between a region $A$ in the past and a region $B$ in the future tells us that operators in $A$ mostly evolve to non-local operators extending beyond the boundary of $B$, which suppresses local correlation functions. This criterion was shown to be related to another chaos criterion, the out-of-time-ordered correlation (OTOC) functions [11–16].
In this paper, we show that the mutual information criterion defined in [10] also controls the entropy growth for suitable choices of low-entropy initial states. More specifically, we consider a given partition of the system into multiple regions and an ensemble of initial states that are tensor products of random states in each region. After time evolution by $U(t)$, we study the purity $\mathrm{Tr}\,\rho_A(t)^2$ of a subsystem $A$ in the final state. The ensemble average of the purity, the exponential of minus the second Renyi entropy, is given by a sum in which each term is controlled by the second Renyi mutual information in the doubled state. When the mutual information terms are sufficiently small, a typical product state at the initial time evolves into a state with nearly maximal entropy. Therefore chaos in the dynamics $U(t)$ implies thermalization, at least for the ensemble of unentangled initial states we define. Since random product states have high energy, the final state has maximal entropy and corresponds to infinite temperature. We also discuss the generalization of our result to initial state ensembles at finite temperature.
The remainder of the paper is organized as follows. In section 2, we review the relevant aspects of quantum chaos and show how information-theoretic quantities are linked to rethermalization of the thermal ensemble. Sections 3 and 4 contain the main results of this work and demonstrate a connection between thermalization of product states, quantum information theory, and quantum chaos. Derivations of the main results are in appendix A.
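In what follows, we denote density matrices of subsystems by $\rho_A = \mathrm{Tr}_{\bar A}\,\rho$, the dimension of subsystem $A$ by $D_A$, and operators that act as the identity on $\bar A$ by $O_A$. As a reminder, the von Neumann entropy of a density matrix $\rho$ is $S[\rho] = -\mathrm{Tr}[\rho\ln\rho]$, and the mutual information between two subsystems $A$ and $B$ in $\rho$ is
$$I[\rho; A, B] = S[\rho_A] + S[\rho_B] - S[\rho_{A\cup B}].$$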
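For readers who want to experiment numerically, these definitions can be sketched for pure states of a few qubits. This is a minimal illustration assuming NumPy; the helper names `reduced_dm`, `vn_entropy` and `mutual_information` are our own, entropies use the natural logarithm as in the text, and qubit 0 is taken as the most significant tensor factor (the `np.kron` convention).

```python
import numpy as np

def reduced_dm(psi, n, keep):
    """Reduced density matrix of an n-qubit pure state on the qubits in `keep`."""
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    # Move the kept qubits to the front and view the state as a (kept x rest) matrix.
    m = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def vn_entropy(rho):
    """S[rho] = -Tr rho ln rho, dropping numerically zero eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def mutual_information(psi, n, A, B):
    """I[rho; A, B] = S[rho_A] + S[rho_B] - S[rho_{A u B}] for disjoint qubit sets A, B."""
    return (vn_entropy(reduced_dm(psi, n, A)) + vn_entropy(reduced_dm(psi, n, B))
            - vn_entropy(reduced_dm(psi, n, list(A) + list(B))))

bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
print(mutual_information(bell, 2, [0], [1]))  # 2 ln 2, about 1.386
```

A Bell pair saturates the two-qubit maximum $2\ln 2$, while a product state gives zero.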

Quantum chaos
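We start by reviewing some recent results on quantum chaos. In classical systems, one diagnostic of chaos is exponential sensitivity to initial conditions, quantified by the exponential growth of the Poisson bracket of some pair of phase space coordinates. The quantum analog is the squared commutator of two operators, evaluated in a density matrix $\rho$ [11],
$$C(t) = \mathrm{Tr}\left(\rho\,[O_A(t), O_B(0)]^\dagger\,[O_A(t), O_B(0)]\right).$$
Of the four terms in the expansion of this expression, the most interesting for our purpose is the out-of-time-ordered four-point correlator (OTOC)
$$C_4(O_A(t), O_B(0))_\beta = \mathrm{Tr}\left(\rho\,O_A(t)^\dagger O_B(0)^\dagger O_A(t)\,O_B(0)\right), \tag{2.1}$$
whose decay for thermal $\rho$ has been interpreted as a signal of quantum chaos [10,12,13].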
The decay of $C_4$ looks like an operator-dependent statement, but it is in fact related to an information-theoretic quantity, the second Renyi entropy $S^{(2)}$, computed from the time evolution operator $U(t) = e^{-iHt}$ [10]. Since $U(t)\in\mathcal H\otimes\mathcal H^*$, we can regard it as a normalized state in a Hilbert space with inner product $\langle A, B\rangle = \mathrm{Tr}[A^\dagger B]/D$ (here $D$ is the dimension of $\mathcal H$). We associate the copy of the Hilbert space corresponding to the future (past) with the left (right) tensor factor in $\mathcal H\otimes\mathcal H^*$. For intuition and computation, it can be useful to choose an isomorphism $\mathcal H^*\cong\mathcal H$ that is compatible with the tensor factorization and to think of $U(t)$ as an entangled state on two copies of the original system. Concretely, we take the original Hilbert space to be a tensor product of small Hilbert spaces (for example, one on each site of a lattice): $\mathcal H = \bigotimes_x\mathcal H_x$. Denoting an orthonormal basis of $\mathcal H_x$ by $|\alpha_x\rangle$, $\alpha_x = 1, 2, \ldots, \dim\mathcal H_x$, one can define the maximally entangled state in the doubled system,
$$|I\rangle = \bigotimes_x\frac{1}{\sqrt{\dim\mathcal H_x}}\sum_{\alpha_x}|\alpha_x\rangle\otimes|\alpha_x\rangle.$$
The state $|I\rangle$ encodes an isomorphism from operators (elements of $\mathcal H^*\otimes\mathcal H$) to states in a doubled system (elements of $\mathcal H\otimes\mathcal H$) by right action, so we can explicitly map the unitary operator $U(t)$ to the state $|U(t)\rangle = (1\otimes U(t))|I\rangle$. We denote the density matrix associated with the pure state $|U(t)\rangle$ by $\rho_{U(t)} = |U(t)\rangle\langle U(t)|$. The construction of $|U(t)\rangle$ and an example partial trace of $\rho_{U(t)}$ are illustrated in figure 1.
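The construction of $|U(t)\rangle$ can be sketched numerically for a few qubits. This is our own illustration, assuming NumPy, a random dense Hermitian matrix as a stand-in Hamiltonian (`random_hamiltonian` is a hypothetical helper, not a model from the text), and an ordering in which all past qubits precede all future qubits. It checks the property depicted in figure 1c: every region lying entirely in the past or entirely in the future is maximally mixed.

```python
import numpy as np

def reduced_dm(psi, n, keep):
    """Reduced density matrix of an n-qubit pure state on the qubits in `keep`."""
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def doubled_state(U):
    """|U> = (1 (x) U)|I> with |I> = D^{-1/2} sum_i |i>|i>; past index first, future second."""
    D = U.shape[0]
    # The amplitude of |i>_past |j>_future is U[j, i] / sqrt(D).
    return (U.T / np.sqrt(D)).reshape(-1)

def random_hamiltonian(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (g + g.conj().T) / 2

def evolution(H, t):
    """U(t) = exp(-iHt) from the eigendecomposition of the Hermitian matrix H."""
    w, v = np.linalg.eigh(H)
    return v @ np.diag(np.exp(-1j * w * t)) @ v.conj().T

n = 3                                    # physical qubits; the doubled system has 2n qubits
rng = np.random.default_rng(1)
U = evolution(random_hamiltonian(2 ** n, rng), t=1.0)
psi = doubled_state(U)
# Past qubits are 0..n-1 and future qubits are n..2n-1.
rho_past = reduced_dm(psi, 2 * n, [0, 2])        # a region R_P in the past
rho_future = reduced_dm(psi, 2 * n, [n + 1])     # a region R_F in the future
print(np.allclose(rho_past, np.eye(4) / 4), np.allclose(rho_future, np.eye(2) / 2))  # True True
```

The check holds for any unitary: tracing out the future leaves $1/D$ on the past, and tracing out the past leaves $U(1/D)U^\dagger = 1/D$ on the future.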
Correlations between the past and future copies of the Hilbert space in $U(t)$ are related to chaos and scrambling. For example, the mutual information between a region $A$ in the future and a region $B$ in the past bounds correlations in time, as expressed by (2.2). We can already see a connection between the information content of $\rho_{U(t)}$ and thermalization in (2.2). If the mutual information between $A$ in the future and $B$ in the past is small in $\rho_{U(t)}$, acting with an operator on $B$ in the past has little influence on the action of an operator on $A$ in the future. This shows that small mutual information in $\rho_{U(t)}$ is sufficient for rethermalization of the infinite temperature ensemble after a perturbation. Thus in this case we have the natural statement that the information in $U(t)$ between regions in the future and the past tells us how sensitive the future region is to the initial conditions in the past region.
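The precise form of (2.2) is not reproduced here, but the mechanism behind bounds of this type can be illustrated with the general inequality $C^2/(2\|O_A\|^2\|O_B\|^2)\le I(A:B)$ for the connected correlator $C = \langle O_A O_B\rangle - \langle O_A\rangle\langle O_B\rangle$, which follows from Pinsker's inequality. The sketch below, assuming NumPy, random pure states and random single-qubit observables of our own choosing, checks this generic inequality; it is not necessarily the exact statement of (2.2).

```python
import numpy as np

def reduced_dm(psi, n, keep):
    """Reduced density matrix of an n-qubit pure state on the qubits in `keep`."""
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def vn_entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def random_hermitian(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (g + g.conj().T) / 2

def correlation_vs_information(rng, n=4):
    """Return (C^2 / (2 ||O_A||^2 ||O_B||^2), I(A:B)) for qubits A = 0, B = 1 of a random state."""
    psi = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
    psi /= np.linalg.norm(psi)
    rho_ab = reduced_dm(psi, n, [0, 1])
    rho_a, rho_b = reduced_dm(psi, n, [0]), reduced_dm(psi, n, [1])
    o_a, o_b = random_hermitian(2, rng), random_hermitian(2, rng)
    # Connected correlator <O_A O_B> - <O_A><O_B>, real for Hermitian operators.
    conn = (np.trace(rho_ab @ np.kron(o_a, o_b))
            - np.trace(rho_a @ o_a) * np.trace(rho_b @ o_b)).real
    norms = np.linalg.norm(o_a, 2) * np.linalg.norm(o_b, 2)   # operator (spectral) norms
    info = vn_entropy(rho_a) + vn_entropy(rho_b) - vn_entropy(rho_ab)
    return conn ** 2 / (2 * norms ** 2), info

rng = np.random.default_rng(2)
results = [correlation_vs_information(rng) for _ in range(200)]
print(all(lhs <= info + 1e-10 for lhs, info in results))  # True
```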
More generally, (2.2) shows that we can think of the mutual information $I[\rho_{U(t)}; A_F, B_P]$ roughly as quantifying how much the initial conditions in $B$ determine the subsystem $A$ after time evolution by $U(t)$. The main goal of this work is to extend this result to far-from-equilibrium pure states.

Figure 1. Pictorial representation and explicit construction of the mapping from the time evolution operator $U(t)$ to the state $|U(t)\rangle\in\mathcal H\otimes\mathcal H$ and the associated density matrix $\rho_{U(t)}$. First, in 1a we introduce our notation and draw $U(t)$ as a tensor with "input" legs at the bottom and "output" legs at the top. To help keep track of the future and past, we draw the output edge of $U(t)$ as a bold line. Each leg corresponds to a subsystem of $\mathcal H$ and denotes an index of the tensor, and contraction is represented by connecting "input" legs to "output" legs. A particular example of this operation is shown in 1b, where we depict the action of $1\otimes U(t)$ on the maximally entangled state $|I\rangle$, turning $U(t)$ into a state $|U(t)\rangle$ on a doubled system. In 1c we show that $\rho_{U(t)}$ is maximally entangled between the past and future, so that for any region $R$ lying entirely in the past or entirely in the future, $S[\rho_{U(t)}; R] = \ln D_R$. … $A_F\cup B_P$; on the left, $A$ has a white box around it, and $B$ has a red box.

A main result of [10] is a more explicit connection between past-future mutual information and chaos: the average of the OTOC (2.1) over operators in subsystem $A$ in the future and $B$ in the past is proportional to $e^{-S^{(2)}[\rho_{U(t)};\,A_F\cup\bar B_P]}$, where $\bar B_P$ is the complement of $B_P$ in the past system. By the average over operators on a subsystem $A$ we mean a weighted sum over the $D_A^2$ Hermitian operators in a complete, orthonormal basis (under the above inner product on operators). We write these operators in a script font, as $\mathcal O_A$, with the average implied wherever they appear. The second Renyi entropy is defined as
$$S^{(2)}[\rho] = -\ln\mathrm{Tr}\,\rho^2$$
and is a measure of uncertainty in $\rho$: for pure states $S^{(2)} = 0$, while for maximally mixed states $S^{(2)} = S = \ln D$, where $S$ is the von Neumann entropy and $D$ is the dimension of the Hilbert space. Some other properties of $S^{(2)}[\rho_{U(t)}; A_F\cup B_P]$ will be important in what follows. It follows from Jensen's inequality that $S^{(2)}\le S$, so when $S^{(2)}$ is near-maximal, so is $S$.
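The properties of $S^{(2)}$ just listed are easy to confirm numerically. A minimal sketch, assuming NumPy and random density matrices of varying rank (the helper `random_density_matrix` is our own):

```python
import numpy as np

def vn_entropy(rho):
    """S[rho] = -Tr rho ln rho, dropping numerically zero eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def renyi2(rho):
    """Second Renyi entropy S^(2)[rho] = -ln Tr rho^2."""
    return float(-np.log(np.real(np.trace(rho @ rho))))

def random_density_matrix(dim, rank, rng):
    g = rng.normal(size=(dim, rank)) + 1j * rng.normal(size=(dim, rank))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(3)
D = 8
pure = random_density_matrix(D, 1, rng)
print(renyi2(pure), renyi2(np.eye(D) / D), np.log(D))   # about 0, then ln 8 twice
for rank in range(1, D + 1):
    rho = random_density_matrix(D, rank, rng)
    assert renyi2(rho) <= vn_entropy(rho) + 1e-10       # S^(2) <= S
```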
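We also have the bounds
$$|\ln D_A - \ln D_B| \le S^{(2)}[\rho_{U(t)}; A_F\cup B_P] \le \ln D_A + \ln D_B.$$
Thus for $B$ much larger than $A$, $S^{(2)}$ is large for "kinematic" reasons, independent of the time evolution. The more interesting quantity in this case is a version of the mutual information adapted to the Renyi entropy,
$$I^{(2)}[\rho_{U(t)}; A_F, B_P] = \ln D_A + \ln D_B - S^{(2)}[\rho_{U(t)}; A_F\cup B_P],$$
which is non-negative in our state since $\rho_{U(t)}$ is maximally entangled between the future and past. $I^{(2)}$ measures the suppression of $S^{(2)}[\rho_{U(t)}; A_F\cup B_P]$ below its kinematic maximum $\ln D_A + \ln D_B$, and it bounds the corresponding mutual information $I[\rho_{U(t)}; A_F, B_P]$ from above.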
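These kinematic bounds and the ordering $0\le I\le I^{(2)}$ can be checked on the doubled state of a random unitary. The sketch below assumes NumPy and uses a Haar-random unitary (sampled by QR decomposition with phase correction) as an arbitrary stand-in for $U(t)$; it loops over every pair of nonempty regions $A$ in the future and $B$ in the past for three qubits.

```python
import itertools
import numpy as np

def reduced_dm(psi, n, keep):
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def vn_entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def renyi2(rho):
    return float(-np.log(np.real(np.trace(rho @ rho))))

def doubled_state(U):
    """|U> = (1 (x) U)|I>, past qubits first."""
    return (U.T / np.sqrt(U.shape[0])).reshape(-1)

def haar_unitary(dim, rng):
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))        # fix column phases so the distribution is Haar

n = 3
rng = np.random.default_rng(4)
psi = doubled_state(haar_unitary(2 ** n, rng))
subsets = [c for k in range(1, n + 1) for c in itertools.combinations(range(n), k)]
violations = 0
for A, B in itertools.product(subsets, subsets):
    rho = reduced_dm(psi, 2 * n, [n + q for q in A] + list(B))   # region A_F u B_P
    ln_da, ln_db = len(A) * np.log(2), len(B) * np.log(2)
    s2, s = renyi2(rho), vn_entropy(rho)
    i2 = ln_da + ln_db - s2          # I^(2)[A_F, B_P]
    i = ln_da + ln_db - s            # I[A_F, B_P]; both marginals are maximally mixed
    bounds_ok = abs(ln_da - ln_db) - 1e-9 <= s2 <= ln_da + ln_db + 1e-9
    violations += not (bounds_ok and -1e-9 <= i <= i2 + 1e-9)
print(violations)  # 0
```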
As shown in [10], the averaged four-point correlator can then be written in terms of $I^{(2)}$; this is the relation (2.4). Thus scrambling in $U(t)$, as quantified by $S^{(2)}$, is directly related to chaos: a generically small four-point correlator means a large Renyi entropy, or equivalently a small $I^{(2)}$. For $A$ and $B$ small, the expression (2.4) is actually in terms of non-local operators on $\bar B$. In this case there is a more natural expression, (2.5), in terms of two-point functions of local operators on $A$ and $B$. The two expressions (2.4) and (2.5) emphasize the important point that the second Renyi mutual information characterizes the behavior of both two-point functions and OTOCs.
$I^{(2)}$ between two small regions $A$ and $B$ is governed by two-point functions of operators supported on $A$ and $B$, while $I^{(2)}$ between a small region $A$ and a big region $B$ (bigger than half the system size) is governed by the OTOC of operators supported on $A$ and the smaller region $\bar B$. This is also consistent with the fact that the decay of the OTOC implies a stronger scrambling of information than the decay of two-point functions alone.
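The two-point-function side of this statement can be made concrete with an identity we derived ourselves from the Pauli expansion of the purity (it is not necessarily the normalization used in (2.5)). With $\langle X\rangle = \mathrm{Tr}\,X/D$, $P_A(t) = U^\dagger P_A U$, and $P_A$, $Q_B$ running over all Pauli strings on $A$ and $B$, one finds $e^{-S^{(2)}[\rho_{U(t)};A_F\cup B_P]} = \frac{1}{D_AD_B}\sum_{P_A,Q_B}|\langle P_A(t)Q_B\rangle|^2$, i.e. $e^{I^{(2)}[\rho_{U(t)};A_F,B_P]} = \sum_{P_A,Q_B}|\langle P_A(t)Q_B\rangle|^2$. The sketch below checks this numerically, assuming NumPy and a Haar-random unitary.

```python
import itertools
import numpy as np

PAULIS = [np.eye(2), np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]), np.diag([1.0, -1.0])]

def reduced_dm(psi, n, keep):
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def doubled_state(U):
    return (U.T / np.sqrt(U.shape[0])).reshape(-1)

def haar_unitary(dim, rng):
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def pauli_strings(n, region):
    """All 4**len(region) Pauli strings supported on `region`, as n-qubit matrices."""
    for labels in itertools.product(range(4), repeat=len(region)):
        factors = [np.eye(2)] * n
        for q, label in zip(region, labels):
            factors[q] = PAULIS[label]
        full = np.array([[1.0]])
        for f in factors:
            full = np.kron(full, f)
        yield full

n, A, B = 3, [0], [1, 2]          # A: future region, B: past region (physical qubit labels)
D = 2 ** n
rng = np.random.default_rng(5)
U = haar_unitary(D, rng)

rho = reduced_dm(doubled_state(U), 2 * n, [n + q for q in A] + B)
purity = np.real(np.trace(rho @ rho))                            # e^{-S^(2)[A_F u B_P]}
two_point = sum(abs(np.trace(U.conj().T @ P @ U @ Q) / D) ** 2   # |<P_A(t) Q_B>|^2
                for P in pauli_strings(n, A) for Q in pauli_strings(n, B))
print(purity, two_point / (2 ** len(A) * 2 ** len(B)))          # equal up to rounding
```

The identity term $P_A = Q_B = 1$ contributes 1, so $I^{(2)}$ is close to zero exactly when all non-trivial two-point functions between $A$ and $B$ are small.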
As we will see below, one advantage of the information-theoretic point of view is a unified treatment of two- and four-point functions.
In [10], the relationship (2.4) is used to show that a four-point correlator decaying to some value less than $\epsilon$ in any region implies that the sum of mutual informations $I[\rho_{U(t)}; A_F, B_P] + I[\rho_{U(t)}; A_F, \bar B_P]$ is bounded by a quantity that vanishes as $\epsilon\to 0$. In principle, $\epsilon$ can be so small that this sum is arbitrarily close to zero. In more realistic models, we can expect the OTOC to decay to a value that is an inverse polynomial in the logarithm of the total Hilbert space dimension. Subtracting the maximal value $I[\rho_{U(t)}; A_F, B_P\cup\bar B_P] = 2\ln D_A$, this sum gives the tripartite information, whose negativity has been proposed as a measure of "scrambling" due to unitary time evolution; quantum chaos as measured by the decay of $C_4$ then implies scrambling.
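The tripartite information $I_3 = I[A_F,B_P] + I[A_F,\bar B_P] - I[A_F,B_P\cup\bar B_P]$ of the doubled state can be sketched numerically. The comparison below, assuming NumPy, contrasts the trivial evolution $U = 1$ (no scrambling, $I_3 = 0$) with a Haar-random unitary used as a stand-in for chaotic dynamics. Since each mutual information is non-negative, $I_3\ge -2\ln D_A$.

```python
import numpy as np

def reduced_dm(psi, n, keep):
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def vn_entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def doubled_state(U):
    return (U.T / np.sqrt(U.shape[0])).reshape(-1)

def haar_unitary(dim, rng):
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def mutual_info(psi, n_total, X, Y):
    return (vn_entropy(reduced_dm(psi, n_total, X)) + vn_entropy(reduced_dm(psi, n_total, Y))
            - vn_entropy(reduced_dm(psi, n_total, X + Y)))

def tripartite_info(U, n, A, B):
    """I_3 = I[A_F, B_P] + I[A_F, Bbar_P] - I[A_F, P] for the doubled state |U>."""
    psi = doubled_state(U)
    a_f = [n + q for q in A]
    b_bar = [q for q in range(n) if q not in B]
    return (mutual_info(psi, 2 * n, a_f, list(B)) + mutual_info(psi, 2 * n, a_f, b_bar)
            - mutual_info(psi, 2 * n, a_f, list(range(n))))

n, A, B = 4, [0], [0, 1]
rng = np.random.default_rng(6)
i3_trivial = tripartite_info(np.eye(2 ** n), n, A, B)             # U = 1: no scrambling
i3_random = tripartite_info(haar_unitary(2 ** n, rng), n, A, B)   # Haar-random U
print(i3_trivial, i3_random, -2 * np.log(2))
```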
We make one side remark at the end of this section. We treat $I^{(2)}$ and $I$ for $\rho_{U(t)}$ as operator-independent diagnostics of chaos. It is clear from the discussion above that if the OTOC and two-point functions decay generically, $I^{(2)}$ will be small, which implies that $I$ is small as well. It is most direct from the discussion above to treat $I^{(2)}$ as the intrinsic measure of chaos, with $I$ small in chaotic systems only because it is bounded by $I^{(2)}$. However, the true mutual information $I$ is more natural in many other contexts, and it is intuitive that small mutual information in $\rho_{U(t)}$ should imply chaos. To that end, using a bound on the von Neumann entropy in terms of the Renyi entropy [17] (see appendix B), we can bound $I^{(2)}$ from above in terms of $I$; this is (2.6).

Thus a sufficiently small mutual information implies small $I^{(2)}$, which in turn implies chaos according to the OTOC. In the remainder of this work we will focus on $I^{(2)}$, but (2.6) should be kept in mind as a way to bound $I^{(2)}$ in terms of the true mutual information.

Thermalization of completely random product states
Our goal is to understand how entropy growth and thermalization are related to quantum chaos as defined above. As discussed in section 1, entropy growth is a state-dependent statement and can only be true for specific classes of states, for example initial states with small entanglement. The most naive choice of initial-state ensemble is product states of some fixed granularity. More precisely, we consider a partition of the system into regions $R_s$ such that $\bigcup_{s=1}^S R_s = P$ is the whole system. Correspondingly, each region $R_s$ has a Hilbert space $\mathcal H_s$ of dimension $D_s$, and the Hilbert space of the whole system is the tensor product $\mathcal H = \bigotimes_s\mathcal H_s$. We consider states of the form $|\psi(0)\rangle = \bigotimes_s|a_s\rangle$, with $|a_s\rangle$ a random pure state in $\mathcal H_s$. An example of one of these states, along with its time evolution, is shown in figure 2a. The global entropy of a density matrix does not change under unitary evolution, but the entropy of a subsystem can. Thus we consider the second Renyi entropy, in some subsystem $A$, of the state obtained by evolving a low-entanglement initial pure state with $U(t)$, averaged uniformly (according to the Haar measure) over initial states of the form $|\psi(0)\rangle$. Denoting this average by $\mathbb E_H[\,\cdot\,] = \int\prod_s da_s\,(\cdot)$, where the integrals are over the Haar measure, and writing $\rho^\psi(0) = |\psi(0)\rangle\langle\psi(0)|$ and $\rho^\psi(t) = U(t)\rho^\psi(0)U(t)^\dagger$, we find our main result
$$D_A\,\mathbb E_H\left[e^{-S^{(2)}[\rho^\psi_A(t)]}\right] = \prod_{s=1}^{S}\frac{D_s}{D_s+1}\sum_{R\in\mathcal P(\{R_s\})}\frac{e^{I^{(2)}[\rho_{U(t)};\,A_F,\,R_P]}}{D_R}, \tag{3.1}$$
where the sum runs over all subregions $R = R_{i_1}\cup R_{i_2}\cup\cdots\cup R_{i_n}$ that are unions of some of the building blocks $R_s$. $\mathcal P(\{R_s\})$ denotes the set of all such $R$, i.e. the power set of $\{R_1, R_2, \ldots, R_S\}$. The trivial subregions give dynamics-independent terms: $R = \emptyset$ contributes $1$ and $R = P$ contributes $D_A^2/D$, while every nontrivial $R$ depends on $U(t)$ through $I^{(2)}$. A similar relation has recently been studied in the context of random dynamics in [18]. A representation of a typical term in the sum is shown in figure 2b. We give some examples of this formula below and present a derivation in appendix A.
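A Monte Carlo sanity check of the form of (3.1) written above is sketched below. It assumes NumPy, single-qubit blocks $R_s$ (so $D_s = 2$), and a Haar-random unitary as a stand-in for $U(t)$. Because $e^{I^{(2)}[\rho_{U(t)};A_F,R_P]}/D_R = D_A\,e^{-S^{(2)}[\rho_{U(t)};A_F\cup R_P]}$, dividing (3.1) by $D_A$ expresses the averaged purity through purities of the doubled state. The helper names are our own.

```python
import itertools
import numpy as np

def reduced_dm(psi, n, keep):
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def doubled_state(U):
    return (U.T / np.sqrt(U.shape[0])).reshape(-1)

def haar_unitary(dim, rng):
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def predicted_purity(U, n, A):
    """E_H Tr rho_A(t)^2 from (3.1) with every block R_s a single qubit (D_s = 2)."""
    psi = doubled_state(U)
    a_f = [n + q for q in A]
    total = 0.0
    for k in range(n + 1):                       # R runs over the full power set
        for R in itertools.combinations(range(n), k):
            rho = reduced_dm(psi, 2 * n, a_f + list(R))
            total += np.real(np.trace(rho @ rho))   # e^{I^(2)[A_F,R_P]} / (D_A D_R)
    return (2 / 3) ** n * total                  # prod_s D_s / (D_s + 1)

def sampled_purity(U, n, A, samples, rng):
    """Monte Carlo estimate of E_H Tr rho_A(t)^2 over Haar-random single-qubit product states."""
    psi = np.ones((samples, 1), dtype=complex)
    for _ in range(n):
        qubit_state = rng.normal(size=(samples, 2)) + 1j * rng.normal(size=(samples, 2))
        qubit_state /= np.linalg.norm(qubit_state, axis=1, keepdims=True)
        psi = np.einsum('si,sj->sij', psi, qubit_state).reshape(samples, -1)
    psi = psi @ U.T                              # each row becomes U|psi(0)>
    keep = sorted(A)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([samples] + [2] * n).transpose([0] + [1 + q for q in keep + rest])
    m = m.reshape(samples, 2 ** len(keep), -1)
    rho = np.einsum('sij,skj->sik', m, m.conj())  # rho_A for every sample
    return float(np.mean(np.sum(np.abs(rho) ** 2, axis=(1, 2))))

n, A = 3, [0]
rng = np.random.default_rng(7)
U = haar_unitary(2 ** n, rng)
print(predicted_purity(np.eye(2 ** n), n, A))    # approximately 1: U = 1 keeps A pure
pred, est = predicted_purity(U, n, A), sampled_purity(U, n, A, 20000, rng)
print(pred, est)                                 # agree within Monte Carlo error
```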
Apart from bounding the von Neumann entropy from below, $S^{(2)}$ is useful because it bounds the one-norm distance between density matrices. Recall that $\|\rho - \sigma\|_1 = \max_{\|O\|\le 1}|\mathrm{Tr}[O(\rho - \sigma)]|$, so the one-norm is the natural distance for density matrices: it controls the difference of all bounded expectation values. Using $\|X\|_1\le\sqrt{D_A}\,\|X\|_2$ and Jensen's inequality,
$$\mathbb E_H\left\|\rho^\psi_A(t) - \frac{1_A}{D_A}\right\|_1 \le \sqrt{D_A\,\mathbb E_H\left[e^{-S^{(2)}[\rho^\psi_A(t)]}\right] - 1}. \tag{3.2}$$
Thus as long as the deviation from maximal entanglement is sufficiently small, a density matrix thermalizes in the one-norm in expectation. As we will see below, it turns out that thermalizing sufficiently well in expectation (at infinite temperature) is sufficient for most states to thermalize. Note that the infinite temperature ensemble is the appropriate choice here, since for typical Hamiltonians most states are infinite temperature states (cf. [19]), and we always have $S^{(2)}[\rho^\psi_A(t)]\le\ln D_A$, with equality only for $1_A/D_A$.

Figure 2c illustrates the special case with a bipartition of the past system.

To get an intuition for the implications of (3.1), it is illuminating to consider some special cases. First, consider the trivial partition with only one region $R_1 = P$, equal to the whole system. In this case the ensemble is that of random pure states on the whole system, and our formula reduces to
$$D_A\,\mathbb E_H\left[e^{-S^{(2)}[\rho^\psi_A(t)]}\right] = \frac{1 + D_A^2/D}{1 + 1/D}, \tag{3.3}$$
which is completely independent of the dynamics. In the limit of large system size, as long as the subsystem $A$ is less than half the system and grows at most linearly with system size, $D_A^2/D$ decays exponentially with system size. Then (3.3) is the familiar statement that, to exponential accuracy, a random pure state is close to maximally entangled on any small subsystem. This result is expected, as the typical pure state is indeed close to maximally entangled in any small subsystem [20–22], and a random state evolves to another random state under any dynamics. In fact, the result of [20], derived by explicit integration over $S^{2D-1}$, is the special case of (3.3) with trivial evolution $U(t) = 1$.
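The step behind (3.2) can be checked state by state: for any density matrix $\rho_A$, $\|\rho_A - 1_A/D_A\|_1\le\sqrt{D_A\,\mathrm{Tr}\,\rho_A^2 - 1}$, and averaging with Jensen's inequality gives (3.2). A minimal sketch, assuming NumPy and, as our own choice of ensemble, Haar-random pure states on five qubits (the trivial partition) with $A$ the first two qubits:

```python
import numpy as np

def reduced_dm(psi, n, keep):
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def distance_to_maximally_mixed(rho):
    """Trace norm ||rho - 1/d||_1 = sum of |eigenvalues| of the Hermitian difference."""
    d = rho.shape[0]
    return float(np.sum(np.abs(np.linalg.eigvalsh(rho - np.eye(d) / d))))

rng = np.random.default_rng(8)
n, A = 5, [0, 1]
d_a = 2 ** len(A)
dists, purities = [], []
for _ in range(500):
    psi = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
    psi /= np.linalg.norm(psi)
    rho = reduced_dm(psi, n, A)
    dists.append(distance_to_maximally_mixed(rho))
    purities.append(np.real(np.trace(rho @ rho)))
dists, purities = np.array(dists), np.array(purities)
per_state = np.sqrt(np.maximum(d_a * purities - 1, 0.0))   # sqrt(D_A Tr rho_A^2 - 1)
averaged = np.sqrt(max(d_a * purities.mean() - 1, 0.0))    # right-hand side of (3.2)
print(dists.mean(), averaged)
```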
We can see the relationship to a more traditional measure of entanglement, the von Neumann entropy, by Jensen's inequality:
$$\mathbb E_H\,S[\rho^\psi_A(t)] \ge \mathbb E_H\,S^{(2)}[\rho^\psi_A(t)] \ge -\ln\mathbb E_H\left[e^{-S^{(2)}[\rho^\psi_A(t)]}\right]. \tag{3.4}$$
Next, consider a bipartition of the system into a region $S$ and its complement $\bar S$ (figure 2c), for which (3.1) becomes
$$D_A\,\mathbb E_H\left[e^{-S^{(2)}[\rho^\psi_A(t)]}\right] = \frac{1 + \frac{D_A^2}{D} + \frac{e^{I^{(2)}[\rho_{U(t)};A_F,S_P]}}{D_S} + \frac{e^{I^{(2)}[\rho_{U(t)};A_F,\bar S_P]}}{D_{\bar S}}}{1 + \frac{1 + D_S + D_{\bar S}}{D}}. \tag{3.5}$$
Already in this next-to-simplest case the dynamics play a central role. First, if region $S$ is large (and $\bar S$ is even larger), all the terms in $(1 + D_S + D_{\bar S})/D$ are exponentially small; regardless, this contribution serves to increase $S^{(2)}$. For $A$ smaller than half the system size, $D_A^2/D$ is exponentially small. The only decrease from maximal entanglement that can survive in the large system size limit is then due to the terms involving $I^{(2)}[\rho_{U(t)}; A_F, R_P]$. Thus small $I^{(2)}$ between $A$ and both $S$ and $\bar S$, equivalent respectively to the generic decay of two- and four-point correlators between $A$ and $S$, is necessary and sufficient for the expectation of $S^{(2)}[\rho^\psi_A(t)]$ to be near its maximal (equivalently, thermal at infinite temperature) value for initial product states on $S$ and $\bar S$. It is important to note that what counts as "small" $I^{(2)}$ depends on our choice of $S$, as the terms have the form $e^{I^{(2)}[\rho_{U(t)};A_F,R_P]}/D_R$. If we want this contribution to be less than $\epsilon_R$, we only require $I^{(2)}[\rho_{U(t)}; A_F, R_P] < \ln D_R + \ln\epsilon_R$. It is also useful to rewrite (3.5) in terms of correlation functions using (2.4) and (2.5). For large systems $D_A^2/D$ is exponentially smaller than $D_A^2/D_S$, so we can safely focus on the contributions due to correlators. If the two-point functions decay and $D_A^2/D_S$ is finite, the deviation from maximal entanglement is dictated by the (strictly positive) four-point term $D_A^2\,C_4(\mathcal O_A(t), \mathcal O_S(0))_{\beta=0}/D_S$. Note that depending on the choice of $A$ and $S$, even with both less than half the system size and (for lattice models) $|A| < |S|$, $D_A^2/D_S$ may be of order 1, or even much greater than 1.
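The bipartite formula as written above can be evaluated directly from the doubled state. The sketch below assumes NumPy, four qubits with $S = \{0, 1\}$, $\bar S = \{2, 3\}$ and $A = \{0\}$, and compares $U = 1$ with a Haar-random unitary standing in for chaotic dynamics. The value 1 corresponds to maximal entropy on $A$. For $U = 1$, $A$ only inherits the entanglement of a random state on $S$, giving $D_A\,\mathbb E_H[e^{-S^{(2)}}] = 2\times 4/5 = 1.6$, the known average purity of one qubit of a random two-qubit state.

```python
import numpy as np

def reduced_dm(psi, n, keep):
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def doubled_state(U):
    return (U.T / np.sqrt(U.shape[0])).reshape(-1)

def haar_unitary(dim, rng):
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def renyi_term(psi, n, A, R):
    """e^{I^(2)[A_F, R_P]} / D_R, computed as D_A Tr rho^2 on A_F u R_P."""
    rho = reduced_dm(psi, 2 * n, [n + q for q in A] + list(R))
    return 2 ** len(A) * np.real(np.trace(rho @ rho))

def bipartite_prediction(U, n, A, S):
    """D_A E_H[exp(-S^(2)[rho_A(t)])] from (3.5) for product states on S and its complement."""
    s_bar = [q for q in range(n) if q not in S]
    d, d_a, d_s, d_sbar = 2 ** n, 2 ** len(A), 2 ** len(S), 2 ** len(s_bar)
    psi = doubled_state(U)
    numerator = 1 + d_a ** 2 / d + renyi_term(psi, n, A, S) + renyi_term(psi, n, A, s_bar)
    return numerator / (1 + (1 + d_s + d_sbar) / d)

n, A, S = 4, [0], [0, 1]
rng = np.random.default_rng(10)
trivial = bipartite_prediction(np.eye(2 ** n), n, A, S)
chaotic = bipartite_prediction(haar_unitary(2 ** n, rng), n, A, S)
print(trivial, chaotic)   # trivial is about 1.6; the Haar-random value is closer to 1
```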
As mentioned in section 2, $C_4(\mathcal O_A(t), \mathcal O_S(0))_{\beta=0}$ can be as small as an inverse polynomial in the logarithm of the Hilbert space dimension in chaotic systems. So as long as $A$ and $S$ are chosen such that $D_A^2/D_S$ does not grow too quickly with system size, random product states on $S$ and $\bar S$ will evolve to look thermal in $A$ in chaotic systems. In the limit $D_A^2/D_{\bar S}\to 0$, only the local two-point function contributes to deviations from $\ln D_A$. This is the case considered by CT, and it shows that chaos in the OTOC sense is not necessary for thermalization into a much larger random bath. On the other hand, when $D_{\bar S}$ is finite, four-point correlations that do not decay sufficiently can produce significant corrections to the thermal entropy.
This argument extends without significant modification to the case of $S$ initial subsystems, $\mathcal H = \bigotimes_{s=1}^S\mathcal H_s$, where $S$ may grow linearly with system size. As long as two-point functions generically decay between $A$ and subsystems up to half the system size, the contribution from summands in (3.1) involving $R$ smaller than half the system will be small. The decay of four-point correlators between $A$ and subsystems up to half the system size is necessary to bound the contributions from summands involving $R$ larger than half the system size. Concretely, we need, for regions $R$ less than half the system size, … for $A$ to look maximally entangled on average. Note that in an integrable system, $I^{(2)}$ is expected to remain large for some subextensive subregions $R$ (see, for example, the numerics in [10]), although these regions may change in time as information propagates: some subextensive region of initial conditions largely determines the density matrix in $A$. This demonstrates an obstacle to thermalization in non-chaotic isolated systems. In contrast, a chaotic system scrambles information about the initial conditions in each $\mathcal H_s$ across extensive regions of the system; equivalently, only extensive knowledge of the initial conditions determines the density matrix in $A$. The only $R$ for which $\exp(I^{(2)}[\rho_{U(t)}; A_F, R_P])$ remains large are then extensive, and for these, since $I^{(2)}\le 2\ln D_A$, the term $\exp(I^{(2)}[\rho_{U(t)}; A_F, R_P])/D_R\le D_A^2/D_R$ decays exponentially in system size if $A$ is chosen as in the examples above.
For systems with small $\epsilon\equiv D_A\,\mathbb E_H[e^{-S^{(2)}[\rho^\psi_A(t)]}] - 1$, we can meaningfully bound the fraction of states that do not thermalize using (3.2) and Markov's inequality:
$$\Pr_H\left[\left\|\rho^\psi_A(t) - \frac{1_A}{D_A}\right\|_1\ge\delta\right]\le\frac{\sqrt{\epsilon}}{\delta}. \tag{3.8}$$
The conclusion is that if states are expected to thermalize sufficiently well, then a particular state is likely to exhibit the average behavior after long times. This bound is easily "weakened" to a statement about the probability of a significant deviation of the local entropy. On the other hand, if states are not expected to thermalize, we do not expect to find such a bound on physical grounds; the long-time trajectory of non-thermalizing systems can depend sensitively on the details of the initial conditions.
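The Markov-type bound can be illustrated with samples. In this sketch, assuming NumPy, six qubits, single-qubit blocks $R_s$, and a Haar-random unitary as a stand-in for chaotic dynamics, $\epsilon$ is estimated from the same samples. For the empirical distribution, Markov's and Jensen's inequalities hold exactly, so the printed fraction can never exceed $\sqrt{\epsilon}/\delta$. The threshold $\delta = 0.3$ is an arbitrary choice.

```python
import numpy as np

def haar_unitary(dim, rng):
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def evolved_reduced_states(U, n, A, samples, rng):
    """rho_A(t) for `samples` Haar-random single-qubit product states evolved by U."""
    psi = np.ones((samples, 1), dtype=complex)
    for _ in range(n):
        qubit_state = rng.normal(size=(samples, 2)) + 1j * rng.normal(size=(samples, 2))
        qubit_state /= np.linalg.norm(qubit_state, axis=1, keepdims=True)
        psi = np.einsum('si,sj->sij', psi, qubit_state).reshape(samples, -1)
    psi = psi @ U.T                                  # each row becomes U|psi(0)>
    keep = sorted(A)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([samples] + [2] * n).transpose([0] + [1 + q for q in keep + rest])
    m = m.reshape(samples, 2 ** len(keep), -1)
    return np.einsum('sij,skj->sik', m, m.conj())

n, A, samples, delta = 6, [0], 4000, 0.3
rng = np.random.default_rng(9)
rhos = evolved_reduced_states(haar_unitary(2 ** n, rng), n, A, samples, rng)
d_a = 2 ** len(A)
purities = np.sum(np.abs(rhos) ** 2, axis=(1, 2))
eps = d_a * purities.mean() - 1                      # epsilon, estimated from the samples
dists = np.sum(np.abs(np.linalg.eigvalsh(rhos - np.eye(d_a) / d_a)), axis=1)
fraction = np.mean(dists >= delta)                   # empirical Pr[||rho_A - 1/D_A||_1 >= delta]
print(eps, fraction, np.sqrt(max(eps, 0.0)) / delta)
```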

Finite temperature extension
As discussed above, the preceding results should be interpreted as statements about thermalization at infinite temperature. To obtain ensembles at other temperatures, we must restrict the set of initial states we average over. One natural way is to still consider a partition into regions $R_s$, $s = 1, 2, \ldots, S$, but to restrict the state in each region to a subspace $\mathcal H_{M_s}\subset\mathcal H_s$. Physically, $\mathcal H_{M_s}$ is the subspace of states in an energy window $E_0 < E < E_0 + \Delta E$, where the energy is defined with respect to the subsystem Hamiltonian of $R_s$, neglecting the boundary terms. We can define a "microcanonical" density matrix $\rho_{M_s} = \pi_{M_s}/D_{M_s}$ for each region, with $\pi_{M_s}$ the projection operator onto $\mathcal H_{M_s}$ and $D_{M_s}$ the dimension of $\mathcal H_{M_s}$. We then consider initial product states $\bigotimes_s|a_s\rangle$, with each $|a_s\rangle$ a random pure state in $\mathcal H_{M_s}$, whose ensemble average is $\rho_M = \bigotimes_s\rho_{M_s}$. These states have zero entanglement between different regions and a finite energy density. It will be convenient to change the normalization of the operator inner product to $\langle A, B\rangle = \mathrm{Tr}[A^\dagger B]$.
The above results suggest that entropy growth and thermalization of these product states should be related to entanglement properties of some state depending on $U(t)$; tentatively, call this state $|U_M(t)\rangle$ (in sections 2 and 3, the relevant state was isomorphic to the operator $U(t)$). As a first check that we have chosen a useful state, it is natural to require that some analog of (2.2) hold for $\rho_{U_M(t)}$. Such a result would suggest that correlations in $|U_M(t)\rangle$ are related to thermalization of states from the ensemble $\rho_M$ in the same intuitive way as correlations in $|U(t)\rangle$ are related to thermalization of states from the ensemble $1/D$. A useful choice turns out to be $|U_M(t)\rangle = (1\otimes U(t)\rho_M^{1/2})|I\rangle$. Since we have chosen $\rho_M$ compatible with the tensor factorization of $\mathcal H$, to understand this state one can simply put projectors on each input leg of $U(t)$ in figures 1 and 2. The case $\pi_M = 1$ has been described in section 2; it turns out to be a very special case, because $1$ commutes with everything. For general $\rho_M$ and two regions $A_F$ and $B_P$ in the future and past systems, respectively, we have a bound, (4.1), that involves the commutator $[\pi_M, O_B]$. If this commutator is small, (4.1) becomes exactly analogous to (2.2). For example, suppose $H$ has the form $H\equiv H_L = \sum_s H_s + \sum_{\partial s}H_{\partial s}$, where the $H_s$ act on the disjoint subsystems $\mathcal H_s$ and the boundary terms $H_{\partial s}$ are allowed to couple nearby subsystems. If we choose $\pi_{M_s}$ to project onto some subsystem energy window (one where the eigenvalues of $H_s$ lie in some fixed range) and take $\pi_M = \bigotimes_s\pi_{M_s}$, then operators $O_B$ that are local to subsystems and do not move states into or out of the energy window have zero commutator with $\pi_M$. Another example is a local conserved quantity that we choose to concentrate in some subsystem $\mathcal H_s$ through the choice of $\pi_{M_s}$; if $O_B$ does not transport this charge across subsystems, it has zero commutator with $\pi_M$.
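A toy illustration of the first example, under assumptions of our own: two regions of two qubits each, the hypothetical subsystem Hamiltonian $H_s = Z\otimes 1 + 1\otimes Z$, and an energy window keeping only the eigenvalue $0$ (spanned by $|01\rangle$ and $|10\rangle$). An operator that stays inside the window ($X\otimes X$ on region 1) commutes with $\pi_M$, a single-qubit flip does not, and the projected operator $\pi_M O\pi_M$ always does. The sketch assumes NumPy.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

# Hypothetical subsystem Hamiltonian on a two-qubit region: eigenvalues (2, 0, 0, -2).
h_s = np.kron(Z, I2) + np.kron(I2, Z)
# pi_{M_s}: projector onto the energy window containing only the eigenvalue 0 of h_s.
pi_s = np.diag((np.abs(np.diag(h_s)) < 1e-12).astype(float))
pi_m = np.kron(pi_s, pi_s)                       # pi_M = pi_{M_1} (x) pi_{M_2}

def comm_norm(a, b):
    """Frobenius norm of the commutator [a, b]."""
    return float(np.linalg.norm(a @ b - b @ a))

o_keep = np.kron(np.kron(X, X), np.eye(4))       # |01> <-> |10>: stays inside the window
o_leak = np.kron(np.kron(X, I2), np.eye(4))      # |01> -> |11>: leaves the window
o_proj = pi_m @ o_leak @ pi_m                    # the projected operator pi_M O pi_M
print(comm_norm(pi_m, o_keep), comm_norm(pi_m, o_leak), comm_norm(pi_m, o_proj))
```

The printed values are zero, nonzero, and zero, respectively.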
If these conditions are only met approximately ($O_B$ has small matrix elements that take states out of or into $\mathcal H_M$), the commutator will be small. We can also enforce the zero-commutator condition on $O_B$ by replacing it with $\tilde O_B = \pi_M O_B\pi_M$. Then we can directly interpret (4.1) as the "re-equilibration" of $\rho_M$ after acting with $O_B$: perturbing the state by $O_B$ does not affect the action of $O_A$ in the future. The case $\pi_M = 1$ reduces exactly to (2.2) for any choice of $O_B$. As mentioned, $\pi_M = 1$ is a very special case, and this gives rise to important differences when relating information measures to chaos and equilibration for generic $\rho_M$. The bound $S^{(2)}\le S$ always holds, so $S^{(2)}$ of $\rho_{U_M(t)}$ is still a good measure of the "correlation" between the past and future. Important special cases are $S^{(2)}[\rho_{U_M(t)}; A_P]$ and $S^{(2)}[\rho_{U_M(t)}; A_F]$, which in general are not equal to $\ln D_A$ (in contrast to the $\pi_M = 1$ case), so $I^{(2)}$ as defined in section 2 is not as fundamental a quantity; it also does not bound the corresponding mutual information $I$. We can instead define a quantity $\tilde I$ that bounds $I$ from above, and there are equalities analogous to (2.4) and (2.5) relating $\tilde I$ to chaos. The $\tilde I$ terms are the positive corrections, for product states, to the entropy computed from $\rho_M$. We can also generalize the discussion surrounding (3.2) to obtain the bound (4.6), so states chosen from the ensemble given by $\rho_M$ "equilibrate" (in the above sense of the one-norm) to $\rho_M$ when the $\tilde I$ terms are small. The major difference is that $\rho_M$ is generically not thermal.
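The state $|U_M(t)\rangle$ can be sketched numerically in the same toy setting. Assumptions of our own: the window projector from the previous illustration, a hypothetical Hamiltonian with local $Z$ fields, transverse fields and a boundary coupling $X_1X_2$, NumPy, and an unnormalized $|I\rangle = \sum_i|i\rangle|i\rangle$ (consistent with the inner product $\mathrm{Tr}[A^\dagger B]$), which makes $|U_M(t)\rangle$ normalized. The past marginal equals $\rho_M$, the future marginal equals $U\rho_MU^\dagger$, and $S^{(2)}[\rho_{U_M(t)}; A_P]$ for $A = R_1$ is $\ln 2$ rather than $\ln D_A = \ln 4$.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def op_on(op, q, n):
    """Embed a single-qubit operator acting on qubit q of an n-qubit system."""
    mats = [np.eye(2)] * n
    mats[q] = op
    out = np.array([[1.0]])
    for m in mats:
        out = np.kron(out, m)
    return out

def reduced_dm(psi, n, keep):
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    m = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def renyi2(rho):
    return float(-np.log(np.real(np.trace(rho @ rho))))

def evolution(H, t):
    w, v = np.linalg.eigh(H)
    return v @ np.diag(np.exp(-1j * w * t)) @ v.conj().T

n = 4                                                    # regions R_1 = {0, 1}, R_2 = {2, 3}
h_s = np.kron(Z, np.eye(2)) + np.kron(np.eye(2), Z)      # hypothetical subsystem Hamiltonian
pi_s = np.diag((np.abs(np.diag(h_s)) < 1e-12).astype(float))
pi_m = np.kron(pi_s, pi_s)
rho_m = pi_m / np.trace(pi_m)                            # rho_M = pi_M / D_M, D_M = 4
sqrt_rho_m = pi_m / np.sqrt(np.trace(pi_m))              # rho_M^{1/2}, since pi_M is a projector

# Hypothetical dynamics: Z fields, transverse fields and a boundary coupling X_1 X_2.
H = (sum(op_on(Z, q, n) for q in range(n)) + 0.7 * sum(op_on(X, q, n) for q in range(n))
     + op_on(X, 1, n) @ op_on(X, 2, n))
U = evolution(H, 1.3)

# |U_M> = (1 (x) U rho_M^{1/2}) sum_i |i>|i>; past qubits 0..n-1, future qubits n..2n-1.
psi = (U @ sqrt_rho_m).T.reshape(-1)
rho_past = reduced_dm(psi, 2 * n, list(range(n)))
rho_future = reduced_dm(psi, 2 * n, list(range(n, 2 * n)))
s2_a_past = renyi2(reduced_dm(psi, 2 * n, [0, 1]))       # S^(2)[rho_{U_M}; A_P] with A = R_1
print(s2_a_past, np.log(4))                              # ln 2 rather than ln D_A = ln 4
```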
Interpreting small $\tilde I$ as chaos, this shows that chaos is sufficient to "scramble" initial conditions to the extent that the particular state within the initial ensemble is irrelevant, but we have not shown that the "unentangled microcanonical ensemble" $\rho_M$ itself thermalizes.

That said, in a system with a local Hamiltonian and with regions $R_s$ chosen much bigger than the thermal correlation length, the contribution of the boundary terms to the energy is small, and $\rho_M$ has a volume-law entropy close to the thermal value at the same energy expectation value. In other words, an initial pure state drawn from $\rho_M$ has already almost thermalized once its reduced density matrix approaches that of $\rho_M$.

Conclusion
We have explored the consequences of small correlation (as computed in $U(t)$ and $U_M(t)$) between the past and future. For the density matrices $\rho_{\beta=0}$ and $\rho_M$, the expressions (2.2) and (4.1), respectively, show that certain types of perturbations to these density matrices in some region $B$ are "forgotten" in some region $A$ as long as the information between $A$ in the future and $B$ in the past, in $U(t)$ or $U_M(t)$, has had time to decay. Next, (3.1) and (4.5) show that the decay of information between past and future regions corresponds to entropy growth for far-from-equilibrium pure states. Finally, using these expressions in combination with (3.2), (3.8), and (4.6), we have shown that this decay of information between past and future means that the initial conditions of a particular pure state chosen from an ensemble are forgotten (although the ensemble itself is not). Although the discussion proceeds most naturally in terms of information, by relating the OTOC and two-point functions to information we can also conclude that quantum chaos, as diagnosed by the OTOC and the decay of local two-point functions, implies entropy growth and erasure of initial conditions ("equilibration"). Likewise, generic thermalization implies that the contributions of the $I^{(2)}$ or $\tilde I$ terms in (3.1) or (4.5) are small, so the mutual information between local regions in the future and sub-extensive regions in the past is bounded above. Thus there is a sense in which quantum thermalization implies chaos.
The most important extension of this work would be a deeper understanding of the finite-temperature results. The same state may be a member of several ensembles $\rho_M$, but the formalism we developed does not identify a preferred density matrix. Such a preferred density matrix should be a time-independent distribution, for example the Boltzmann distribution, when that pure state equilibrates. A first step may be to find conditions under which the "more thermal" (higher-entanglement) $\rho_M$ thermalize. It should also be possible to improve the factor $D_A$ in (4.6) in the case that E H e −S (2) for locally thermalizing $\rho_M$. Finally, to make this work more practically applicable, it is important to show that either chaos as measured by the OTOC or decay of information between the future and past is generic for local Hamiltonians.

A Derivation of main results
We present the derivation of the results in section 4, which are a strict generalization of the results in sections 2 and 3. The main tool is Schur-Weyl duality, which describes the combined action of the symmetric and unitary groups on tensor product spaces. On the vector space $V_n = (\mathbb{C}^D)^{\otimes n}$, the symmetric group on $n$ letters, $S_n$, acts in a natural way by permuting the $n$ factors, while the unitary group $U(D)$ acts by $U^{\otimes n}$ for $U \in U(D)$.
Theorem 1 (Schur-Weyl Duality). Under the combined natural actions of $S_n$ and $U(D)$ on the vector space $V_n$ (as defined above), $V_n$ can be written as a direct sum
$$V_n \cong \bigoplus_Y W_Y \otimes S_Y,$$
where $Y$ is an index running over Young diagrams with $n$ boxes (and at most $D$ rows), and $W_Y$ ($S_Y$) is an irreducible representation of $U(D)$ ($S_n$) not isomorphic to any other representation appearing with a different $Y$.
We will typically use this theorem to constrain operators that commute with the action of $U(D)\times S_n$. Since each irrep of the combined action appears only once in the decomposition of $V_n$, Schur's lemma implies that such an operator must act as multiplication by a constant on each irrep. Furthermore, the theorem tells us that we can project onto each irrep by projecting onto an irrep of $S_n$ alone, so each such operator can be written as a linear combination of projectors, each of which is in turn a linear combination of elements of $S_n$.
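For $n = 2$ the decomposition is explicit: $V_2$ splits into the symmetric and antisymmetric subspaces, with projectors $\pi_\pm = (1 \pm X)/2$, where $X$ is the swap. The following is a minimal numerical sketch, assuming Python with NumPy (our choice, not part of the derivation); the helper names `swap` and `haar_unitary` are hypothetical. It checks that $\pi_\pm$ are orthogonal projectors of rank $D(D\pm1)/2$ and that $X$ commutes with $U\otimes U$ for a Haar-random $U$.

```python
import numpy as np


def swap(D):
    """Swap operator X on C^D (x) C^D, defined by X|i>|j> = |j>|i>."""
    X = np.zeros((D * D, D * D))
    for i in range(D):
        for j in range(D):
            # np.kron ordering: |i>|j> has index i*D + j
            X[j * D + i, i * D + j] = 1.0
    return X


def haar_unitary(D, rng):
    """Haar-random D x D unitary via QR of a complex Gaussian matrix."""
    Z = (rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    # Rescale columns by the phases of diag(R) so that Q is exactly Haar distributed
    d = np.diag(R)
    return Q * (d / np.abs(d))


D = 3
rng = np.random.default_rng(seed=0)
X = swap(D)
identity = np.eye(D * D)
pi_plus = (identity + X) / 2   # projector onto the symmetric subspace
pi_minus = (identity - X) / 2  # projector onto the antisymmetric subspace

U = haar_unitary(D, rng)
UU = np.kron(U, U)             # the U(D) action on V_2

checks = {
    "pi_plus is a projector": np.allclose(pi_plus @ pi_plus, pi_plus),
    "pi_minus is a projector": np.allclose(pi_minus @ pi_minus, pi_minus),
    "pi_plus and pi_minus are orthogonal": np.allclose(pi_plus @ pi_minus, 0),
    "rank(pi_plus) = D(D+1)/2": np.isclose(np.trace(pi_plus), D * (D + 1) / 2),
    "rank(pi_minus) = D(D-1)/2": np.isclose(np.trace(pi_minus), D * (D - 1) / 2),
    "X commutes with U (x) U": np.allclose(UU @ X, X @ UU),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Since each of the two irreps appears once, Schur's lemma then forces any operator on $V_2$ commuting with both actions to have the form $c_+\pi_+ + c_-\pi_-$.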
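As an intermediate result, we must compute the Haar integral $A_n = \int dU\,(U|\psi\rangle\langle\psi|U^\dagger)^{\otimes n}$, which is independent of the choice of $|\psi\rangle$ by the invariance of the Haar measure. Furthermore, $A_n$ commutes with the above actions of $U(D)$ and $S_n$, so $A_n$ is a linear combination of elements $\sigma \in S_n$. It is easy to check that in fact $\sigma A_n = A_n$ for all $\sigma \in S_n$, so $A_n \propto \sum_{\sigma\in S_n}\sigma$. To find the normalization factor, note that
$$\operatorname{Tr}\Big[\sum_{\sigma\in S_n}\sigma\Big] = \sum_{\sigma\in S_n} D^{\#\mathrm{cycles}(\sigma)} = D(D+1)\cdots(D+n-1).$$
The last equality is proved easily after noting that, for a given permutation of $n-1$ elements, to form a permutation of $n$ elements the $n$th element either forms a new cycle, contributing a factor of $D$, or can be put in an existing cycle in $n-1$ distinct ways (regardless of the permutation). Since $\operatorname{Tr}[A_n] = 1$, this gives
$$A_n = \frac{1}{D(D+1)\cdots(D+n-1)}\sum_{\sigma\in S_n}\sigma.$$
We now compute the expectation of the second Rényi entropy in the setup of section 4. Call the non-identity element of $S_2$, which swaps the two tensor factors, $X$. If we have a Hilbert space $\mathcal{H}$ with a subsystem labelled $A$, there is a permutation group $S_n^A$ that acts on $\mathcal{H}^{\otimes n}$ by permuting only the tensor factors corresponding to subsystem $A$ between copies. We refer to these group elements with a subscript $A$, for example $X_A$. To compute the expectation of the second Rényi entropy (more precisely, of the purity $e^{-S^{(2)}_A} = \operatorname{Tr}[\rho_A^2]$), we use the relation $\operatorname{Tr}[\rho_A^2] = \operatorname{Tr}[\rho^{\otimes 2}X_A]$. As a reminder, the distribution of the initial state $\rho^\psi_M$ in our case is fixed as follows. We are given a partition of Hilbert space into $S$ subsystems, $\mathcal{H} = \bigotimes_s \mathcal{H}_s$, and in the vector space associated to each subsystem we choose a linear subspace $\mathcal{H}^M_s$ (with associated projector $\pi^M_s$). The distribution of $\rho^\psi_M$ is independently Haar random on the subspace of each subsystem. From the above discussion of the Haar integral, it follows that
$$\mathbb{E}\big[(\rho^\psi_M)^{\otimes 2}\big] = \bigotimes_s \frac{(\pi^M_s)^{\otimes 2}(1_s + X_s)}{D^M_s(D^M_s+1)} = \frac{1}{\prod_s\left(1+1/D^M_s\right)}\,\rho_M^{\otimes 2}\sum_{R\in\mathcal{P}(\{R_s\})} X_R,$$
where $D^M_s$ is the dimension of $\mathcal{H}^M_s$, $\rho_M = \bigotimes_s \pi^M_s/D^M_s$, $\mathcal{P}(\{R_s\})$ is the power set of subsystems, and $X_R$ swaps all subsystems in $R$. The second equality comes from noting that the product has $2^S$ terms, one for each choice of $1_s$ or $X_s$ in each subsystem, and the included swaps combine to give a single swap of all included subsystems. We then have
$$\mathbb{E}\,\operatorname{Tr}\Big[\big(\operatorname{Tr}_{\bar A}\,U(t)\rho^\psi_M U(t)^\dagger\big)^2\Big] = \frac{1}{\prod_s\left(1+1/D^M_s\right)}\sum_{R\in\mathcal{P}(\{R_s\})}\operatorname{Tr}\big[U(t)^{\otimes 2}\rho_M^{\otimes 2}X_R\,U(t)^{\dagger\otimes 2}X_A\big] = \frac{1}{\prod_s\left(1+1/D^M_s\right)}\sum_{R\in\mathcal{P}(\{R_s\})}\exp\Big(-S^{(2)}\big[\rho^{U_M(t)}_{R_P\cup A_F}\big]\Big),$$
where $\bar A$ is the complement of $A$, and the subscripts $P$ and $F$ label regions in the past and future halves of the state $\rho^{U_M(t)}$ corresponding to $U_M(t)$. To see the last equality, we refer to the explicit construction of the state corresponding to $U_M(t)$ following the procedure of figure 1 to check that the index contractions are correct; the proportionality factor is correct since $\operatorname{Tr}[U(t)\rho_M U(t)^\dagger] = 1$. By the same construction, we can compute $S^{(2)}$ for other regions of $\rho^{U_M(t)}$. To connect (A.1), and more generally entropies of $\rho^{U_M(t)}$, to observables, we use the explicit form of the projectors onto the irreps of $S_2$: $\pi_\pm = (1 \pm X)/2$. Then an operator $A$ on $\mathcal{H}^{\otimes 2}$ that commutes with the joint action of $U(D)\times S_2$ can be written explicitly as a linear combination of $\pi_\pm$, with coefficients $\operatorname{Tr}[A\pi_\pm]/\operatorname{Tr}[\pi_\pm]$. This yields the desired expression in terms of observables, where we used (…2) and the following discussion, assuming that $R$ factors through the tensor factorization into $\mathcal{H}_s$ for convenience, together with the definition of $\rho^{U_M(t)}$. As a possible tool for computing higher Rényi entropies, we note that elements of $S_n$ can be implemented in terms of averages of local operators also for $n > 2$, even though above we used a property special to $n = 2$ (the invariance of $O^{\otimes 2}$ when summed over an orthonormal basis).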
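Two ingredients used above are easy to check numerically: the cycle-counting identity $\sum_{\sigma\in S_n} D^{\#\mathrm{cycles}(\sigma)} = D(D+1)\cdots(D+n-1)$ behind the normalization of $A_n$, and the swap trick $\operatorname{Tr}[\rho_A^2] = \operatorname{Tr}[\rho^{\otimes 2}X_A]$. The following is a minimal sketch in Python with NumPy, assuming a random mixed state on a bipartite space $A\otimes B$ with small illustrative dimensions; all helper names are hypothetical.

```python
from itertools import permutations
from math import prod

import numpy as np


def num_cycles(perm):
    """Number of cycles of a permutation given as a tuple with perm[i] = sigma(i)."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


def trace_of_sum_over_Sn(n, D):
    """Tr[sum_sigma sigma] on (C^D)^{(x)n}; each sigma contributes D^{#cycles}."""
    return sum(D ** num_cycles(p) for p in permutations(range(n)))


def rising_factorial(D, n):
    """D (D+1) ... (D+n-1)."""
    return prod(D + k for k in range(n))


def partial_swap(DA, DB):
    """X_A on (H_A (x) H_B)^{(x)2}: swaps the two copies of A and leaves B alone."""
    D = DA * DB
    XA = np.zeros((D * D, D * D))
    for a1 in range(DA):
        for b1 in range(DB):
            for a2 in range(DA):
                for b2 in range(DB):
                    src = ((a1 * DB + b1) * DA + a2) * DB + b2
                    dst = ((a2 * DB + b1) * DA + a1) * DB + b2
                    XA[dst, src] = 1.0
    return XA


# Cycle-counting identity for small n and D
identity_holds = all(
    trace_of_sum_over_Sn(n, D) == rising_factorial(D, n)
    for n in range(1, 6)
    for D in range(1, 5)
)

# Swap trick for a random mixed state on A (x) B
DA, DB = 2, 3
rng = np.random.default_rng(seed=1)
G = rng.standard_normal((DA * DB, DA * DB)) + 1j * rng.standard_normal((DA * DB, DA * DB))
rho = G @ G.conj().T
rho /= np.trace(rho)

rho_A = np.einsum("ibjb->ij", rho.reshape(DA, DB, DA, DB))  # partial trace over B
purity_direct = np.trace(rho_A @ rho_A).real
purity_swap = np.trace(np.kron(rho, rho) @ partial_swap(DA, DB)).real

print(identity_holds, purity_direct, purity_swap)
```

The same partial-swap construction, applied subsystem by subsystem, gives the operators $X_R$ appearing in the sum over the power set.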
The idea is to take some as-yet unchosen Hermitian operator $A$ and define $M = \int dU\,(UAU^\dagger)^{\otimes n}$, which commutes with $U(D)\times S_n$ and so is a weighted sum of projectors onto irreps of $S_n$. The numbers $\operatorname{Tr}[M\sigma]$ for $\sigma\in S_n$ determine the weights; these are in turn products of traces of powers of $A$, with one factor $\operatorname{Tr}[A^k]$ for each cycle of length $k$ in $\sigma$. Thus $M$ depends only on the spectrum of $A$, and by tuning this spectrum we can tune the weights of the projectors. For example, for $n = 2$, the Haar average over operators $A$ with a fixed spectrum $\{\lambda_i\}$ satisfying $D(\sum_i\lambda_i)^2 = \sum_i\lambda_i^2$ is proportional to $X$.
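For $n = 2$ the weights can be written down directly. Haar invariance gives $\operatorname{Tr}[M] = (\operatorname{Tr}A)^2$ and $\operatorname{Tr}[MX] = \operatorname{Tr}[A^2]$, so $\operatorname{Tr}[M\pi_\pm] = ((\operatorname{Tr}A)^2 \pm \operatorname{Tr}A^2)/2$; with $\operatorname{Tr}\pi_\pm = D(D\pm1)/2$ this yields $M = c_+\pi_+ + c_-\pi_-$ with $c_\pm = ((\operatorname{Tr}A)^2 \pm \operatorname{Tr}A^2)/(D(D\pm1))$. Since $X = \pi_+ - \pi_-$, $M$ is proportional to $X$ exactly when $c_+ = -c_-$, which rearranges to $D(\sum_i\lambda_i)^2 = \sum_i\lambda_i^2$. The sketch below (Python with NumPy; helper names hypothetical) checks this for $D = 2$ with the illustrative spectrum $\lambda = (1, \sqrt{3}-2)$, which we chose to satisfy the condition, both from the closed form and by a Monte Carlo Haar average.

```python
import numpy as np


def swap(D):
    """Swap operator X on C^D (x) C^D (np.kron ordering)."""
    X = np.zeros((D * D, D * D))
    for i in range(D):
        for j in range(D):
            X[j * D + i, i * D + j] = 1.0
    return X


def haar_unitary(D, rng):
    """Haar-random D x D unitary via QR of a complex Gaussian matrix."""
    Z = (rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))


def projector_weights(eigs):
    """c_plus, c_minus with M = c_plus*pi_plus + c_minus*pi_minus (n = 2, D >= 2)."""
    eigs = np.asarray(eigs, dtype=float)
    D = eigs.size
    tr_A, tr_A2 = eigs.sum(), (eigs**2).sum()
    c_plus = (tr_A**2 + tr_A2) / (D * (D + 1))
    c_minus = (tr_A**2 - tr_A2) / (D * (D - 1))
    return c_plus, c_minus


eigs = np.array([1.0, np.sqrt(3.0) - 2.0])  # satisfies D*(sum)^2 == sum of squares for D = 2
D = eigs.size
X = swap(D)
identity = np.eye(D * D)
pi_plus, pi_minus = (identity + X) / 2, (identity - X) / 2

c_plus, c_minus = projector_weights(eigs)
M_exact = c_plus * pi_plus + c_minus * pi_minus

# Monte Carlo estimate of M = int dU (U A U^dag)^{(x)2}
rng = np.random.default_rng(seed=2)
A = np.diag(eigs)
n_samples = 10000
M_mc = np.zeros((D * D, D * D), dtype=complex)
for _ in range(n_samples):
    U = haar_unitary(D, rng)
    B = U @ A @ U.conj().T
    M_mc += np.kron(B, B)
M_mc /= n_samples

print("c_plus, c_minus:", c_plus, c_minus)
print("max |M_mc - M_exact|:", np.abs(M_mc - M_exact).max())
```

Other spectra give other mixtures of $1$ and $X$; for instance, $A = 1$ gives $c_+ = c_- = 1$, i.e. $M = 1$.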

B Derivation of equation (2.6)
We present a short derivation of (2.6), based on equation 23 of [17]. That equation, in our notation, is where another result of [17] is that $\tau \ge \ln(D_A D_B)/(\ln(D_A D_B) + 1)$. Then, noting that $S[\rho^{U(t)}_{A_F}] = S^{(2)}[\rho^{U(t)}_{A_F}] = \ln D_A$, and likewise for $\rho^{U(t)}_{B_F}$, since $\rho^{U(t)}$ is maximally entangled between past and future (see figure 1c), we can multiply both sides by $D_A D_B$ and use the bound on $\tau$ to obtain (2.6).
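The fact used here, that the reduced density matrix of $\rho^{U(t)}$ on a future region is maximally mixed, so that its von Neumann and second Rényi entropies both equal $\ln D_A$, can be checked directly. The following is a minimal sketch, assuming Python with NumPy, the doubled-state convention $|U\rangle = D^{-1/2}\sum_{ij}U_{ij}|i\rangle_F|j\rangle_P$ (an assumption; any convention with $UU^\dagger = 1$ gives the same reduced state), and a future copy split as $A\otimes(\text{rest})$ with illustrative dimensions. The helper name `haar_unitary` is hypothetical.

```python
import numpy as np


def haar_unitary(D, rng):
    """Haar-random D x D unitary via QR of a complex Gaussian matrix."""
    Z = (rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))


D_A, D_rest = 2, 3  # illustrative: future copy = A (x) rest
D = D_A * D_rest
rng = np.random.default_rng(seed=3)
U = haar_unitary(D, rng)

# Coefficients psi[i_F, j_P] = U[i, j] / sqrt(D) of the assumed doubled state |U>;
# its norm is 1 because Tr[U U^dag] = D
psi = U / np.sqrt(D)
norm = np.linalg.norm(psi)

# Split the future index as i_F = a * D_rest + r, then trace out r and the past index j
psi_t = psi.reshape(D_A, D_rest, D)
rho_AF = np.einsum("arj,crj->ac", psi_t, psi_t.conj())

evals = np.clip(np.linalg.eigvalsh(rho_AF), 1e-15, None)
S_vn = -np.sum(evals * np.log(evals))  # von Neumann entropy
S_2 = -np.log(np.sum(evals**2))        # second Renyi entropy

print(norm, np.allclose(rho_AF, np.eye(D_A) / D_A), S_vn, S_2, np.log(D_A))
```

Because the reduced state is exactly $1/D_A$, all Rényi entropies coincide here, which is what allows $S$ and $S^{(2)}$ to be used interchangeably in this step.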