Onset of Random Matrix Behavior in Scrambling Systems

The fine grained energy spectrum of quantum chaotic systems is widely believed to be described by random matrix statistics. A basic scale in such a system is the energy range over which this behavior persists. We define the corresponding time scale by the time at which the linearly growing ramp region in the spectral form factor begins. We call this time $t_{\rm ramp}$. The purpose of this paper is to study this scale in many-body quantum systems that display strong chaos, sometimes called scrambling systems. We focus on randomly coupled qubit systems, both local and $k$-local (all-to-all interactions) and the Sachdev--Ye--Kitaev (SYK) model. Using numerical results for Hamiltonian systems and analytic estimates for random quantum circuits we find the following results. For geometrically local systems with a conservation law we find $t_{\rm ramp}$ is determined by the diffusion time across the system, order $N^2$ for a 1D chain of $N$ qubits. This is analogous to the behavior found for local one-body chaotic systems. For a $k$-local system with conservation law the time is order $\log N$ but with a different prefactor and a different mechanism than the scrambling time. In the absence of any conservation laws, as in a generic random quantum circuit, we find $t_{\rm ramp} \sim \log N$, independent of connectivity.


Introduction and Summary
Dynamical aspects of many-body quantum chaos are of fundamental interest in various fields of physics, including condensed matter physics, quantum information theory and quantum gravity. Random Matrix Theory (RMT) provides us with a simple characterization of quantum chaos: if the level statistics at small energy separation agrees with RMT, one concludes the system under consideration is chaotic. There is a rich theory of this for few-body semiclassical quantum systems (for a general reference see [1]). For many-body systems there is a substantial amount of numerical evidence, but not yet a fully developed theory (for recent analytic approaches see [2,3,4]). This work was motivated by recent numerical work we and others [6,7,8] performed on the Sachdev-Ye-Kitaev (SYK) model [9,10,12] , which has become a reference model for the study of many-body quantum chaos.
Another diagnostic of strong chaos is sensitive dependence to initial conditions, classically diagnosed by the presence of a positive Lyapunov exponent. The semiclassical quantum analog of this in few-body chaos was discussed in the 1960s [13], and diagnosed by out-of-time-order correlation functions (OTOC). Motivated by a desire to understand the quantum chaotic nature of black holes [14,15], these diagnostics were rediscovered in the general many-body context [16] and have been studied extensively in recent years [10,17,18,19,20,21,22,23]. A positive value of the analogous "chaos exponent" is a signature of strong many-body chaos. Systems with such behavior are sometimes referred to as "fast scramblers" [15]. There is a characteristic time scale for such systems, the scrambling time t scr , that measures when an initial perturbation affecting only a few degrees of freedom has chaotically spread to all the degrees of freedom in the system.
Such systems also display a simpler signature of thermalization. Two point functions of generic simple operators exponentially decay. We call the decay time of such quantities the "dissipation time" t diss . In general this time is different from the inverse of the chaos exponent.
There is also a natural time scale associated to RMT level statistics. In the energy domain one can ask how far in energy space do random matrix statistics persist, that is how many energy levels apart can one go before the characteristic correlations of RMT are lost. The inverse of this energy defines a time, which in the many-body context we will call the "ramp time", t ramp for reasons that will be made clear below.
The RMT behavior of the fine-grained energy spectrum is universal in quantum chaotic systems. 1 The structure of the spectrum at time scales shorter than t ramp contains nonuniversal information about the specific system.
In simpler, one-body, systems this time scale has been studied extensively [24,25]. For a single quantum particle propagating in a random medium, or equivalently for a banded random matrix, this time scale is the time for a particle to diffuse across the system t diff . The corresponding energy is called the "Thouless energy" and we will sometimes refer to this time scale as the "Thouless time." The goal of this paper is to study the relationship between these various time scales in strongly chaotic many-body quantum systems. 2 These time scales are connected, via gauge/gravity duality, to phenomena in quantum black holes. This connection provides an important motivation for our work.
To disentangle the various phenomena involved we study two different types of system: k-local systems, with degrees of freedom nonlocally coupled, but only k degrees of freedom interacting at a time; and geometrically local systems, with only nearest neighbor couplings. The k-local systems we study are the SYK model and a qubit model of pairs of qubits interacting with Gaussian random couplings which we call the "2-local Randomly Coupled Qubits (RCQ)" model. The geometrically local systems we study are a one dimensional nearest neighbor qubit model with Gaussian random couplings (the "Local RCQ") and, for reference, banded matrices.
Robust analytic techniques to study RMT phenomena in such many-body systems have not yet been developed. This is a major goal for future research. So of necessity one part of our work involves numerical investigation. But powerful techniques have been developed to study related systems, ones in which the random couplings vary at each time step. For discrete time steps these are referred to as random quantum circuits, in the continuous time limit these are called Brownian circuits. 3 As we will discuss various indicators of chaos are well captured by such systems. But a major difference with time independent Hamiltonian systems is the absence of conservation of energy. The fluctuating couplings pump energy into the system. To study the effects of conservation laws we use a recently introduced random quantum circuit that conserves spin [43,48].
We are not able to give a definitive analysis of t ramp in time independent Hamiltonian systems, but are able to provide numerical results that are consistent with those obtained from random quantum circuits with conservation laws.

Summary of Results
In Section 2 we review the basics of RMT, including the density of states and the eigenvalue pair correlation function. In Section 3 we introduce diagnostics of spectral rigidity, the hallmark of the spectrum of random matrices and quantum chaotic systems. We first discuss the "number variance" and point out artifacts that can affect this quantity in manybody systems. We then define the spectral form factor which is the Fourier transform of the pair correlation function. We discuss the robustness of this diagnostic in the presence of a varying density of states, and then point out the contamination of this signal due to sharp edges in the spectrum. We then introduce a new quantity Y Y * that incorporates a Gaussian filter into the spectral form factor that removes the effect of sharp edges. We then discuss analogous quantities for random unitary matrices.
In section 4.1 we study the ramp time t ramp in various k-local scrambling systems numerically. In the SYK model we use Y Y * to eliminate the "slope" and show that t ramp 3 Random quantum circuits and Brownian circuits have played an important role in a number of recent developments. An early application in the black hole context was [27]. More recently these systems have been used in the quantum information community to provide rapid approximations to random unitary operators [28]. An important advance was the use of Markov chain techniques to estimate convergence times [29]. In particular these ideas have been applied to the study of approximate unitary k-designs [30,31,32,33,34,35,36,37,38,39,40]. These techniques were used to give the first logarithmic estimate of scrambling [18]. More recently they have been used to calculate OTOC for k-local [22] and geometrically local systems [41,42,43]. These models are currently being used as testbeds to establish "quantum supremacy" [44,45,46,47].
is very short, consistent with a log N dependence (although a slow power law cannot be ruled out). A ramp time t ramp ∼ log N is equivalent to spectral rigidity extending over 2 N/2 /( √ N log N ) eigenvalues, out of a total of 2 N/2 eigenvalues. 4 We then study a 2-local qubit system, the Randomly Coupled Qubit model (RCQ). Here again we find a short ramp time, consistent with t ramp ∼ log N .
In order to understand the relation between this log N time and the scrambling time we turn to geometrically local systems in Section 4.2. We mainly focus on a one dimensional nearest neighbor version of the RCQ. For such geometrically local systems the scrambling time is t scr ∼ N . Our numerical results indicate that t ramp ∼ N 2 . Energy is conserved here and the diffusion time is t diff ∼ N 2 , indicating that one must wait for all conserved quantities to diffuse across the system before random matrix behavior is established. This is one of our central findings and we give analytic arguments for it later.
In Section 5 we turn to analytic methods. In particular we study analogs of the spectral form factor for random and Brownian quantum circuits where the couplings are varied at each time step. Powerful techniques using Markov chains exist to study such systems. We study k-local and geometrically local RCQ random circuits and find that the analog of the ramp time is t ramp ∼ log N , independent of connectivity. The scrambling time in the one dimensional geometrically local random circuit is t scr ∼ N so the mechanisms here are clearly different. In fact we find the source of the logarithm is the exponential decay of N two-point functions of single qubit operators. So the coefficient of log N is set by the dissipation time, not the Lyapunov time.
Energy is not conserved in random circuits and hence its diffusion cannot be studied there. So in order to study diffusion we examine in Section 5.2 a random circuit that conserves spin, a version of the XXZ model. Here we find that the ramp time depends on connectivity. For the geometrically local case t ramp ∼ N 2 . We are able to relate this to (many-body) diffusive effects. For the 2-local case we find that the diffusion time is order one, and so unimportant. Here t ramp ∼ log N . Again we are able to trace this to the decay of N two point functions. In the geometrically local case these decay diffusively ∼ 1/t 1/2 and in the 2-local case they decay exponentially with a scale given by the dissipation time.
Finally in Section 6 we discuss the connection between the ramp time and the relaxation time scale of correlation functions of simple operators in random quantum circuits. We expect the same connection to be true for Hamiltonian evolution, but point to an issue that makes it difficult to construct a precise argument.

Other relevant work
There has been other recent work aimed at computing t ramp in the SYK model [3,7,26]. García-García and Verbaarschot give a numerical study of the number variance. They find that spectral rigidity extends over of order N 2 eigenvalues in the spectrum. Translating their result to the time domain they predict t ramp ∼ 2 N/2 /N 5/2 [7,26]. This differs dramatically from our results. We discuss a possible explanation for this discrepancy in Section 3.1.
Altland and Bagrets [3] use interesting techniques from few-body quantum chaos to argue that in the SYK model t ramp ∼ √ N log N . This disagrees with the log N scaling we expect from conserved charge random quantum circuit intuitions. In addition the N = 34 numerical value for t ramp they calculate disagrees with our numerical results presented below by roughly an order of magnitude. We do not understand this discrepancy, but make a few remarks about it in Section 7. Their techniques may well serve to demonstrate the ramp at later times.
As this paper was in the final stages of editing the paper appeared. These authors [4] introduce a Floquet model without local conservation law with large site dimension q that can be treated analytically in the large q limit. This allows them to compute all moments of their unitary (which in this case corresponds to time evolution) analytically, a significant advance. For a one dimensional chain they find a ramp time of log N consistent with our results. For higher dimensions they find t ramp ∼ 1. For our systems we would expect a higher dimensional model with N sites to again have t ramp ∼ log N . Perhaps the difference is due to the large q limit.
After submitting the first version of this paper we learned of a very interesting paper by Kos, Ljubotina, and Prosen [5] that we had missed. In this paper the authors discuss a clever model that injects just enough randomness to compute all moments of their Floquet unitary even with fixed site dimension. They see the ramp as the sum of cyclically permuted ladders that give the result in RMT (as do the authors of [4]) and make the connection to periodic orbit theory. They find a ramp time of log N in a way that seems quite generic, in accordance with our results. This technique looks quite powerful and seems amenable to generalization.

Random Matrix Theory
In this section we review a few basic properties of Random Matrix Theory (RMT) which are relevant for the following sections. For simplicity we restrict ourselves to Gaussian random Hermitian matrices, the Gaussian Unitary Ensemble (GUE) [51]. A standard reference is [52]. The partition function for the GUE is given by where M is an L × L Hermitian matrix. We denote its eigenvalues E i , because M is often taken to be the Hamiltonian of a quantum system. We introduce the normalized eigenvalue densityρ At large L the average density in this ensemble is given by the Wigner semicircle law We see that with these normalizations the width of the entire band is order one and so the typical spacing between nearest neighbor eigenvalues is ∼ 1/L. The pair correlation function R(E 1 , E 2 ) = ρ(E 1 )ρ(E 2 ) is the probability of finding two eigenvalues at energies E 1 and E 2 . If their separation |E 1 − E 2 | 1 and their averagē E = (E 1 + E 2 )/2 is not near the edge of the band R(E 1 , E 2 ) is given by the sine kernel formula [52] , most simply stated by introducing a scaled variable Then we have A sketch of this formula is given in Figure 1. Note that 1/(Lρ sc (Ē)) is the spacing between nearest neighbor energy levels in the vicinity ofĒ. In regions of small ρ sc the eigenvalues are farther apart. The scale of the oscillations in (5) are set by this spacing. RMT universality means that formula (5) holds after scaling by the local eigenvalue density. When comparing different parts of the spectrum one must scale different regions differently. This process is called "unfolding," and becomes problematic when the fluctuations in the average density are large. We will see some effects of this later. Note that when |E 1 − E 2 | 1/L, the nonconstant part of the pair correlation behaves like 1 x 2 . This shows that long range fluctuations in the eigenvalue "gas" are suppressed, a phenomenon referred to as "spectral rigidity." We can now give a more precise definition of the Thouless energy: it is the energy difference where the RMT form of spectral rigidity ceases to be valid.
In the following sections we will often use ensembles of unitary matrices. The Lie group of L × L unitary matrices has a unique measure that is invariant under left and right multiplication, Haar measure. In random matrix theory, the unitary group U(L) with Haar probability measure is called the Circular Unitary Ensemble (CUE). The eigenvalues of each matrix are of the form λ k = e iθ k , where the eigenphases are defined in the range 0 ≤ θ k ≤ 2π, and the probability density is [52] ρ(θ 1 , θ 2 , ..., where Z CUE = L! · (2π) L denotes the partition function. Note that the single eigenvalue density ρ(θ) is uniform on the unit circle. There are no edges, unlike the semicircle. Again, we see that with these normalizations the width of the entire band is order one and so the typical spacing between nearest neighbor eigenphases (θ k ) ∼ 1/L. The pair correlation function R(θ 1 , θ 2 ) = ρ(θ 1 )ρ(θ 2 ) of eigenvalues is given by the sine kernel formula [52], Note that, unlike the GUE ensemble, this formula is exact for all values of eigenvalue differences and not just for a small window. When the eigenvalues are close, expanding the denominator yields the same answer as GUE, as expected, because the same logarithmic repulsion acts in both cases.

Number Variance
A standard diagnostic of spectral rigidity in random matrix theory and quantum chaos is the number variance. This was first applied to the SYK model in [7]. The number variance is defined as where n(E, K) denotes the count of levels in the interval [E, E + K∆] where∆ is the average nearest neighbor level spacing and the computation is done after unfolding the spectrum. If RMT universality is valid we expect this quantity to be essentially independent of E. By the definition of∆ we have n(E, K) = K and in an interval of length K the range of the expected number of energy levels is of order The number variance can be expressed in terms of the pair correlation function (see for example [53]). The result for GUE is where γ = 0.57721... is Euler's constant. The crucial thing here is the very slow log K behavior which indicates a very rigid spectrum. Poisson distributed eigenvalues would give Σ 2 (K) ∼ K. This quantity has proved a very useful numerical diagnostic in various studies of few-body quantum chaotic systems [53]. But in many-body quantum chaotic systems, like SYK, there is an additional contribution to Σ 2 (K) that obscures the RMT signal. This complication was first pointed out in the study of embedded Gaussian ensembles [54]. The basic issue is that the fluctuations in the coarse grained eigenvalue density in these systems are only suppressed by powers of 1/N . In contrast these fluctuations in random matrix theory and few-body chaos are suppressed by powers of 1/L. Here L is the dimension of the Hilbert space and in many-body systems L ∼ e N . So the density fluctuations are much larger in many-body systems.
The main effect of this on Σ 2 (K) is easy to understand. Assume that the result of each sample is described by random matrix statistics rescaled by the eigenvalue density realized in that sample, not the mean eigenvalue density. Because the typical eigenvalue spacing ∆ in each sample depends on this density it varies a lot from sample to sample. This means that Σ 2 (K) is actually of the following form (see formula 1 in [54]) For a system with zero mean energy ( TrH = 0) and small variance a rough estimate is given by 5 For GUE (12) is of order 1/L 2 so the correction in (11) is of order K 2 /L 2 which is small until K covers a finite fraction of the whole energy band. In contrast in SYK (q = 4) the correction is of order K 2 /N 4 (see for example [8] appendix F), which becomes large at K ∼ N 2 . This is the same answer found by [55] and according to [26] is consistent with numerics. In SYK the spectrum consists of 2 N/2 eigenvalues and so this result would correspond to an exponentially small fraction of the spectrum. Note that this K 2 /N 4 behavior has nothing to do with spectral rigidity and in fact will swamp the logarithmic signal for larger energy scales. We will argue later that spectral rigidity in this system extends over 2 N/2 /( √ N log N ) eigenvalues, corresponding to a Thouless time of t ∼ log N . We have made a preliminary check that this artifact is responsible for the results in [7,26] by rescaling each sample to have the same variance. This increases the range of logarithmic behavior by roughly a factor of 2 for the largest N studied. These results are discussed in detail in Appendix A.1. It is not clear to us whether there is even an in principle way to fully correct for these large fluctuations in this observable. 6

Spectral Form Factor
We now discuss another diagnostic that is less sensitive to the kind of errors discussed above, the spectral form factor. This diagnostic also has a long history in random matrix theory and quantum chaos; see e.g. [52]. 7 It was first used to study the SYK model in [8], motivated by a finite temperature extension discussed in the context of black hole physics in [59]. Here we only consider problems with finite dimensional Hilbert spaces and focus on generic behavior in the band so we restrict ourselves to infinite temperature.
The spectral form factor is defined as follows: 6 Note that the variance in (12) goes like 1/N q for general q SYK. So studying large q will improve this statistic. But we must take q 2 < N to preserve k-locality (see [56,8]). This is not sufficient to make the error in (11) smaller than the desired signal. 7 For elegant analytic formulas in RMT see [57] and [58]. [Right] Ramp and plateau coming from two slices marked with the corresponding colors.
Using the eigenvalue correlator we can write We will often discuss the normalized spectral form factor defined so that g(0) = 1. Here again L is the dimension of the Hilbert space.
To get oriented let us look at this quantity for the GUE ensemble. As a first approximation we can replace ρ sc (E) in (3) by 1, so the eigenvalue spacing is uniform and of order 1/L all across the band. 8 Then we can compute the Fourier transform of (5) giving the result This is plotted in the right panel of Fig. 2 for two different energy slices with different values of ρ. We called the linear t behavior at early time the ramp. It is the time domain 8 We approximate further by extending the integration ranges to (−∞, ∞) reflection of spectral rigidity. The Thouless energy determines the earliest time for which the linear ramp behavior holds -the ramp time, t ramp . In GUE this time is order one, set by the width of the band.
We call the constant region the "plateau" and the time at which it begins the "plateau time, " which is of order L. 9 We now discuss the consequence of having an eigenvalue density ρ(Ē) that varies across the band. We divide the entire energy band up into a large number of regions of equal energy width narrow enough so that ρ is roughly constant in each. Changing the constant value ρ(Ē) in (5) does not change the overall form of (16). It only changes the plateau time and the height of the plateau (see Fig. 2). This means that summing up the contributions from each region to compute the total spectral form factor is very simple. Along the ramp one just adds up linear functions with equal coefficients. The final plateau height is just L, the total number of eigenvalues. The only significant change is a rounding of the sharp transition to the plateau, because different regions of the spectrum have different plateau times. The size of this effect depends on the shape of ρ(E) and depends on the system being studied.
This insensitivity of the ramp means that fluctuations from sample to sample of the eigenvalue density do not affect the ramp structure, making it a robust diagnostic of the Thouless time even without unfolding. This is in contrast to the number variance, discussed above.
There is a problem, though, with using the spectral form factor to diagnose spectral rigidity when it is very long ranged because the ramp time will be very short and the height of the ramp will be very small compared to the plateau. So small effects can hide the ramp. The GUE ensemble offers a simple example of this effect when the variation of the spectral density is included. The leading contribution at relatively early times to the spectral form factor is just given by the Fourier transform of ρ sc (E), The L 2 t 3 behavior for t 1 is due to the sharp square root edge of the semicircle distribution and dominates until the "dip time" t dip when it becomes smaller than the ramp, i.e., at a time when L 2 /t 3 dip ∼ t dip , t dip ∼ L 1/2 . 10 This early time "slope" obscures the ramp until after t dip , but the ramp time is typically much earlier than this. The necessity of following the ramp underneath the slope in SYK was already emphasized in [8].
There are strategies to deal with this. In GUE one can just subtract off the disconnected part of the spectral form factor, computing But this does not work in the SYK model, as explained in [8]. This model has a square root edge in its eigenvalue density [8,61,62] but fluctuations in the edge position from sample to sample are only suppressed by a power of N , the number of fermions, rather than L (equal to 2 N/2 in SYK) as is the case in GUE. This fluctuation converts the power law slope into a Gaussian falloff in | Z(t) | 2 . But fluctuations in the edge position cancel out in |Z(t)| 2 , leaving the 1/t 3 slope.
In this paper we employ the alternate strategy, suggested to us by Douglas Stanford, of using a type of microcanonical ensemble with a Gaussian window to suppress the contribution of the edge. To see the onset of the ramp we introduce a variant of the spectral form factor In other words, we replace the energy spectrum ρ(E) with e −αE 2 ρ(E). This is in the spirit of a microcanonical density of states centered around E = 0 but with smooth edges to the energy window that will not create spurious signals in the time domain. 11 Although this changes the global shape of the energy spectrum, locally it is just a multiplication by a constant factor, and hence it does not spoil the RMT universality of the energy correlations. When α is sufficiently large, any sharp edge of the spectrum is replaced with a Gaus-sian. Then Y , which is essentially the Fourier transform of this 'Gaussian-filtered energy spectrum', decays as a Gaussian at early time. Hence the non-universal part can be strongly suppressed, while preserving the universal part. We also introduce the normalized quantity h(α, t) ≡ |Y (α, t)| 2 /|Y (α, t = 0)| 2 . Note h(0, t) = g(t).
In the GUE case at early time This becomes of order one at t ∼ √ log L. So spectral rigidity can be probed out to this scale. In fact t ramp ∼ 1 here, so we cannot fully diagnose this system using this technique. In the many-body systems we will study we expect t ramp to be large enough that, in principle at least, it can be fully revealed by this measure.
In our analysis we will also discuss the analog of the spectral form factor for the Haar random CUE ensemble. Here the relevant quantity is the (k-th moment) of the trace Using the exact formula for the pair correlation function (7) and doing the integral we get [63,64] |Tr Note that the ramp and plateau are exact for the CUE, even at finite L. There is no slope because there is no eigenvalue edge. No unfolding is necessary because the eigenvalue density is uniform. With these preliminaries out of the way we now turn to a systematic analysis of the models. We first discuss numerical data on time independent Hamiltonian systems, beginning with k-local systems.

k-local Hamiltonian Systems
k-local systems have been studied as a good model for scrambling in black holes [14,15]. These models thermalize very quickly, with a dissipation time of order one and a scrambling time of order log N . Diffusive effects will also be very rapid. Here we will see numerically that t ramp is also short, with a weak N dependence. In Section 6 we will give an analytic argument that t ramp also scales as log N but the physical mechanism is different from scrambling.

SYK Model
Let us start with the SYK model [9,10,11,12]. This is a quantum mechanical model of N Majorana fermions satisfying the anticommutation relation The dimension of the Hilbert space is L = 2 N/2 . The Hamiltonian is given by where J abcd is a Gaussian random number with a mean zero and width 6 N 3 J, and totally antisymmetric with respect to a, b, c, d. The disorder average (the average with respect to J abcd ) is taken after calculating observables.
In the first and second panels of Fig. 3, h with N = 32 has been plotted for various values of α. As mentioned above the spectral form factor of the SYK model has a 1 t 3 'slope' which hides the onset of the ramp. This is visible in the α = 0 curves where h(t) = g(t) in Fig. 3. In Fig. 4 (using the optimal value of α) we can see that the ramp continues all the way down to the Gaussian envelope, and may well continue past it. This intersection time, which we call t min , gives an upper bound to the ramp time t ramp . In the data displayed the ramp extends down to a time t min of order 10. The plateau time is of order 10 5 here, so this represents a ratio of 10 4 in time or energy scales. Note that there are Time Jt only 2 16 ∼ 66 000 energy levels 12 in the spectrum so rigidity extends across an appreciable fraction of the entire spectrum! 13 We should emphasize that as expected t min is far less, over an order of magnitude here, than t dip . It seems from the graph that the ramp probably extends somewhat further but is just being masked by the Gaussian envelope. For this finite value of N it is impossible to do better because an envelope that decays faster in time corresponds to a Gaussian filter that is broader in energy and extends closer to the edge of the spectrum, allowing more contamination from the sharp edge. But as N grows this effect goes away because the edges of the spectrum are at energies ±cN (even though the standard deviation of the energy is of order √ N ). The Gaussian filter lets in an edge signal in |Y (α, t)| 2 of order e −2α(cN ) 2 +2s 0 N /t 3 where s 0 N is the zero temperature entropy. The early time ramp signal is of order one. So choosing α as small as s 0 c 2 N is sufficient to mask the slope contribution from the edge. The Fourier transform of the Gaussian envelope decays ∼ e 2s∞N − t 2 4α where s ∞ N is the infinite temperature entropy. So an α ∼ 1/N is small enough for the Gaussian envelope to become order one in a time of order one, allowing the study of t ramp as short as order one. Thus this quantity provides a reliable definition of the Thouless time in SYK and other many-body systems with similar properties. The required N values may 12 After removing the two fold fermion parity degeneracy which is the only degeneracy present for N = 32 [50]. 13 Of course a precise comparison would require evaluating various numerical factors of order one.
well be computationally prohibitive though.
We discuss our detailed algorithm for determining t min in Appendix A.2. The oscillations in the data make this determination quite noisy. The values listed in Table 1 in the Appendix provide upper bounds for t ramp . The best we can say about the N dependence is that it is weak. Power laws faster than N 1 and exponentials, with order one coefficients, are disfavored. A log N behavior would certainly be consistent.
To gain more experience we now examine a k-local model without a sharp edge in its density of states.

2-local Randomly Coupled Qubits (RCQ)
We now consider the 2-local-RCQ model. 14 This is a model of N spin-(1/2) particles on a complete graph that interact with couplings of random strength. It has the same kind of k-local connectivity as the SYK model and hence the same pattern of sparseness in its Hamiltonian matrix. We therefore expect similar scaling properties of its random matrix statistics. The Hamiltonian is of the form where σ α i represents an element of the set {σ 0 , σ 1 , σ 2 , σ 3 } = {I, X, Y, Z} acting at the i-th site and couplings J αβ ij are randomly drawn from a normal distribution. This choice of normalization is analogous to the typical SYK normalization discussed in Section 4.1.1, such that where L = 2 N is the dimension of the Hilbert space, · J denotes ensemble averaging over Gaussian couplings J αβ ij and Tr denotes the normalized trace. We plot the eigenvalue density in Fig. 5. 15 We see that the energy distribution function ρ(E) has a Gaussian tail for sufficiently large N (N 4) (see [56] for discussion of the spectral density). Therefore the slope behaves like e −N t 2 which decays very quickly, unlike SYK.

Spectral Form Factor
The spectral form factor for the 2-local RCQ system with N from 8 to 16 is shown in Fig. 6. There are several noteworthy features of this graph. First, as just discussed, the slope decays very quickly. Then there is a bump. This is followed by a short intermediate region. Then a ramp begins, at quite early times. To study the bump we have experi-  Figure 6: The spectral form factor for the k-local randomly coupled qubits (RCQ) with 8 to 16 qubits. We observe the Gaussian decay at the early times, follow by a 'bump', ramp and plateau. mented with using Y Y * . We were unable to isolate a sizable transition region to the ramp while excluding the bump. We do not understand its origin, unfortunately.
Nonetheless it is clear that the ramp begins early, at times between 2 and 5, increasing slowly with N . For N = 16 (see Fig. 7) the plateau time is about 10 000 times greater than the ramp time. The entire spectrum contains 2 16 ∼ 66 000 eigenvalues. So as in the SYK model we see rigidity extending across an appreciable part of the entire spectrum.
We now attempt a more quantitative study of the N dependence of t ramp . In the absence of a detailed theory of the early time behavior we proceed phenomenologically. We define the time scale where the bump ends and the ramp starts as the ramp time, t ramp . In order to define the ramp time systematically, we quantify the distance of the  Figure 7: The spectral form factor for the k-local randomly coupled qubits (RCQ) for N = 16 qubits is given by the blue curve. We fit a line to the ramp (red line) and determine the time where the fractional error from the linear fit (t) equal to 5 %, 10 %, 20 % or 30 %. In the plot the green dot represents the ramp time estimate when fractional error is 10 %. spectral form factor from the linear fit to the ramp, by using the fractional error Here we use g ramp (t) to denote the linear fit to the ramp. We consider four different estimates for the ramp time, picking the points where (t) equal to 5 %, 10 %, 20 % and 30 %. See Fig. 7 for an example of an actual fit. We perform this procedure for all the values of N ranging from 8 to 16 and plot the estimates of the ramp time with respect to N . In Fig. 8, we plot an estimate of the ramp time for four different values of the fractional error. Different values of the error systematically shift the curve. As mentioned previously our theoretical expectation is that t ramp scales like log N although for reasons unrelated to the logarithmic behavior of the scrambling time. So in Fig. 8, we fit a function of the form A log N + B to the data with = 10 %. (For other values of the fractional error the fit will have the same A, but different values of the offset B.) t ramp is consistent with A log N + B, although it is far from conclusive.  A power law like N 1/2 fits equally well. A more rapid power, faster than N 1 or so, or an exponential dependence, with order one coefficients is disfavored. Larger N would be required to distinguish these functional forms.
To help disentangle the physical mechanisms at work here we now turn to a geometrically local system.

Local Hamiltonian Systems
We will consider two examples of one dimensional local Hamiltonian systems, a nearest neighbor coupled RCQ model and the XXZ chain with random magnetic field.

Local Randomly Coupled Qubits (Local-RCQ)
In geometrically local systems (which we will refer to as "local", as opposed to k-local) the various interesting time scales become distinct from one another and so it is easier to diagnose the effect of different physical mechanisms. In a one dimensional local spin system we expect the dissipation time t diss ∼ 1, the scrambling time t scr ∼ N and the diffusion time t diff ∼ N 2 . These time scales vary more quickly with N than in the k-local case making them easier to study numerically. In this system we find that t ramp ∼ N 2 and hence conjecture that it is related to diffusion, not scrambling. Let us start with a one-dimensional local randomly coupled qubit model with nearestneighbor random interactions, where parameters J αβ i are normal random variables with mean zero and variance one, so that TrH = 0 and TrH 2 = N − 1. This is analogous to SYK normalization. 16 We will explore the properties of this system numerically.

Energy Spectrum
The spectrum of this Hamiltonian was studied numerically [65,66], and shown to have a Gaussian shape. We repeated the same calculation as a sanity check. The energy spectra for N = 6, 7, 8, · · · , 16 are shown in Fig. 9. The spectrum becomes closer to Gaussian as N becomes larger. We have also looked at nearest-neighbor level spacing distribution and do not see any hints of a MBL phase anywhere in the spectrum.

Spectral Form Factor
The spectral form factor is shown in Fig. 10. We can see that an intermediate region emerges as N grows. Note that this region lasts much longer than than in the k-local spin chain. We do not have a theoretical understanding of this region and regard it as an interesting topic for future research. Here we proceed phenomenologically and define the ramp time to be where this region ends and the ramp starts.
In order to define the ramp time systematically, we adopted two methods (see Fig. 11): • Pick up the points where the fractional error (t) defined by (27) decreases to certain  Figure 11: The spectral form factor for the local randomly coupled qubits (Local-RCQ) for N = 16 qubits is given by the blue curve. We fit two different lines in the log-log plot to the bump and the ramp (two red lines) and determine, the first, time scale where two red lines intersect and, the second, where fractional error of the blue curve from the best fit of the ramp is . In the plot the green dot represents the ramp time estimate with the second method, when fractional error is 5 %.
values, as we have done for the k-local spin chain. We have used (t) equal to 5 %, 10 %, 20 % and 30 %.
• Fit the intermediate region and ramp by powers of t (straight line in a log-log plot) and identify the intersection of two fitting lines with t ramp .
We do this procedure for all the values of N ranging from 8 to 16 and plot the estimates of the ramp time with respect to N . In Fig. 12, we plot estimates of the ramp time determined in these ways. The ramp time can be fit reasonably well by an ansatz has smaller mean square error and the coefficients are of order one. Hence we conjecture that t ramp should be related to t diff rather than t scr . We will give analytic evidence for this below.

XXZ Model
Next let us consider the one-dimensional XXZ Hamiltonian system with random magnetic field. This model has many interesting features, including a very well studied MBL phase [67,68]. We are interested in the model because it has another conservation law besides energy -the z component of spin. This conservation law can be preserved (unlike energy) in a random quantum circuit as we will explore in Sec. 5.2. The random circuit analysis allows us to make the connection between t ramp and t diff manifest. Here we present some preliminary numerical results for the time independent Hamiltonian system. The intermediate region here grows as one approaches the MBL phase. We are unsure of the significance of this result. The Hamiltonian we consider is where σ i = (σ 1 i , σ 2 i , σ 3 i ) are Pauli matrices and ω i , a random magnetic field, is a random number selected from the uniform distribution on the interval [−W, +W ]. Here we use periodic boundary conditions, so the sums are modulo N . It has been argued that this model is in the MBL phase for W ≥ 2.75 [68].
This Hamiltonian commutes with the total spin along the z-direction, S tot z = 1 2 i σ 3 i . Thus we will need to look into specific spin sectors to detect nearest-neighbor Wigner-Dyson statistics. Unless otherwise specified we will consider the total spin zero sector for an even number of spins. 17

Energy Spectrum
We are interested in the diffusive regime, where the Hamiltonian exhibits chaotic prop- 17 If we do not restrict the spin, we expect a Poisson distribution. The spectral form factor for the XXZ Hamiltonian with 10 to 18 qubits for the S=0 sector and W = 0.5. We observe universal behavior for the h(α, t), Gaussian decay at the early times, flat region, ramp and plateau. The time scale where the 'bump' ends and the ramp starts we will again call ramp time.
erties and Griffiths effects, contributions of rare many-body localized regions, are not yet significant [69]. So we will investigate this Hamiltonian for W = 0.5 which is in the heart of diffusive regime. The energy spectrum is shown in Fig. 13. We have checked that the nearest-neighbor level correlations are consistent with the GOE ensemble when we look into each conserved spin sector separately.

Spectral Form Factor
In this section, we study |Z| 2 for the XXZ model. Again we take W = 0.5, which is in the heart of the diffusive phase. The qualitative behavior of |Z| 2 is similar to the local RCQ, however it is more sensitive to finite N effects in the edge of the spectrum.
As in the local RCQ, the intermediate region lasts longer than in the k-local spin chain. We determine t ramp in exactly the same way as in Sec. 4.2.1. However, for this system the numerical analysis is more subtle. For the small values of W that are in the diffusive regime there are some approximate conservation laws that produce oscillating behavior at early time and make it difficult to determine the ramp time accurately so we do not present results. If one increases W the system moves toward the MBL phase and a rather long intermediate region appears in the spectral form factor (see Fig. 15). At this point, we do not have a good theory to explain the shaded region for values of the W ≥ 1. Perhaps it is related to the subdiffusive regime between ergodic and MBL phases.
These numerical results are inconclusive, but as we will see in Sec. 5.2, the random circuit analogue of this model provides an ideal setup to test our proposal that t ramp and t diff are the same.

Random/Brownian Quantum Circuit Models
We now discuss analytic methods. In contrast to few-body systems analytic approaches for time independent Hamiltonian many-body systems have not yet been fully developed. Instead we turn to a related class of systems that display similar thermalization behavior and are analytically tractable. These are random quantum circuit and Brownian quantum circuit models.
In Brownian circuit models, the Hamiltonians are chosen randomly from certain ensembles, independently at each small time step dt. After M steps, the unitary evolution will be given by where t = M dt and each of the operators H i is independently drawn from a given ensemble. A random quantum circuit is a discrete version of this where each e −iH i dt is replaced by a unitary drawn from a suitable ensemble of unitary "gates." If the H i are chaotic we expect that U (t) performs a kind of random walk on the unitary group U(L), eventually converging to the Haar random ensemble. The analogous question concerning the onset of random matrix behavior becomes the question about how quickly the spectrum of U (t) starts displaying CUE statistics. A number of recent papers have studied thermalization in various correlations functions using these techniques [18,22,41,42,43].
The conclusion of all this work is that appropriate Brownian/random quantum circuits display known thermalizing behaviors -two point function decay (dissipation); chaotic/scrambling behavior as diagnosed by OTOCs and subsystem density matrices; and diffusion -with mechanisms and time scales that are very much analogous to fixed Hamiltonian systems. We will assume for now that Brownian/random quantum circuits will also display the onset of random matrix behavior via analogous mechanisms and with the same time scales as fixed Hamlitonian systems with the same locality properties and conservation laws. Because of their more efficient randomization we expect that at the least the time scales in these models are a lower bound for those in the coresponding Hamiltonian systems.
All the results we obtain are consistent with the Hamiltonian numerics discussed above. In addition these results are consistent with a heuristic framework for an analytic argument for Hamiltonian systems we discuss in Section 6. In any case we find this assumption to be a very plausible one, and now turn to working out its consequences.
Because of the Brownian nature of these systems the naive analog of the spectral form factor |TrU (t)| 2 decays to the value given by the average using Haar measure, which we call the Haar value, and stays there. We call the time when this value is attained the 'Haar time' t Haar .
But this does not probe the full range of correlations in the spectrum of U (t). To do this imagine running the quantum circuit for a time t, and after that repeating the same time evolution. 18 Then the time evolution from t = 0 tot = kt (k = 1, 2, · · · ) is described by U (t) = (U (t)) k . The analog of the spectral form factor for timet is then Note that u(k, t) = k for k ≤ L when U (t) is drawn from the Haar (CUE) ensemble (see discussions in Section 2 and Appendix B.1). This is just a discrete version of the ramp.
The time t at which the initial moments approach this value, t Haar , is thus the analog of t ramp in the Hamiltonian systems discussed in previous sections. Hopefully the physical meaning of t ramp can be understood if we understand the meaning of t Haar . An important question is whether t Haar depends on k. We are able to explore k = 1, 2 analytically. We then appeal to random matrix theory intuition that suggests the finer grain structure of the eigenvalue distribution is easier to establish in a chaotic system than the coarse grained structure. Since higher k probes finer structure we expect, at least for large enough k, that t Haar will, at the least, not increase as k increases. Our numerics, discussed below, are consistent with this and seem to indicate that k = 2 is already in the stable range. 19 In this section we consider Brownian circuit versions of the RCQ and XXZ models, both local and 2-local. 20 In earlier sections we suggested that for geometrically local systems t ramp is the diffusion time. An essential difference between the random/Brownian quantum circuit and time-independent Hamiltonian system is that the former does not conserve energy and hence in general there is no diffusion. To study the role of diffusion in this context we use a random circuit where spin, not energy, is conserved. In particular we study a random circuit version of the XXZ model introduced by [43,48] in Sec. 5.2. 21 There we will see it is possible to relate t Haar and hence t ramp to the diffusion time, distinguishing it from the scrambling time. In the 2-local case we show that diffusion happens in order one time, and is not important for setting t ramp . Local random circuit evolution on N qubits, each gate is randomly selected from a gateset Γ ∈ U (4).
[Right] 2-local random circuit on N qubits, in each step qubits are randomly paired and random gate is applied from a gateset Γ.

Analytic Estimate
Here we provide a calculation of the spectral form factor for random circuits. Specifically, we are interested in the behavior of u(k, t) defined by (32). We first describe the k = 1 case, which is conceptually quite simple. We use techniques presented in Ref. [34]. We will investigate the parallelized, local random circuit model shown in the left part of Fig. 16, where each step consists of two layers acting on pairs of qubits and each gate is selected at random from some 2 qubit universal gate set Γ ⊆ U(4). For general k we can express TrU k as a sum of monomials, where L = 2 N is the dimension of the Hilbert space. Therefore, (32) can be written as Calculating |Tr U (t)| 2 We want to study There are two types of terms we need to study: i = j and i = j. Without loss of generality we can set i = 1. First let us consider U 11 U * 11 . In a natural 0 (up spin) and 1 (down spin) basis, i = 1 is |0, 0, · · · , 0 = |0 ⊗N , and U 11 (t) = 0, 0, · · · , 0|Û (t) |0, 0, · · · , 0 .
By using a density matrixρ(t) =Û (t)ρ(0)Û † (t) with the initial condition at t = 0, we can write it as The density matrix can be expanded in tensor products of strings of Pauli matrices σ p , where p is a label specifying a Pauli string, 22 as Note that Tr(ρ(t)) = 1 holds for any t due to the unitarity of time evolution. We use p = 0 to denote the identity matrix, and so γ t (0) = 2 −N/2 for any t. By using these expansion coefficients γ t (p), we obtain We can easily determine γ 0 (p), by noticing that where I = σ 0 and Z = σ 3 . Namely γ 0 (p) is nonzero only when σ p is a Pauli string containing only I and Z's (equivalently, when p is a set consisting only from 0 and 3), and all nonzero elements are 2 −N/2 . Therefore, where p runs through all possible sets of 0 and 3 but (0, 0, · · · , 0). For a given choice of circuit the Pauli chain of I's and Z's evolves into other chains with positive and negative weights. On averaging these tend to cancel and γ t (p ) decays exponentially. 23 A pair II is left unchanged by the evolution. So the only invariant chain is all I's. The rate of decay of other chains is proportional to the number of Z's in the chain, which we will call d. Lemma 4.1 (Section 4.1) of Ref. [34] bounds the one and two norms of γ t (p) for the random circuit. Their bound is where d(p ) is a number of Pauli-Z's in σ p and ∆ is the 'gap parameter' which depends on the choice of the gate set Γ. Thus, we can bound the sum as In our normalization, for both local and k-local RCQ, ∆ should be of order 1, independent of the system size N . In our case d(p )=d |γ 0 (p )| = 2 −N/2 N d , and hence Next we study the behavior of U 11 U * jj . The j-th state is represented by a sequence of 0's and 1's. As we will see, the number of 1's, which we denote by q, controls the decay rate. Hence let us denote the j-th state by 1 q 0 N −q , neglecting the ordering of 0's and 1's. We consider the time evolution of an operatorÔ(t), whose initial condition is specified bŷ By definition,Ô and hence Noticing that we can writeÔ(0) as 24 where f ( α) = q i=1 (α i − 1) counts the number of σ 2 's. By expandingÔ as we have done forρ asÔ we obtain γ 0 (p) = 2 −N/2 (+i) f ( α) . Hence where we sum over all the Pauli strings of the form XY Y..XIIZZ..Z, that have X, Y elements for the first q sites and N −q I, Z elements for the remaining sites. The coefficients of these terms will decay as |γ t (p )| ≤ e −∆t(q+d(p )) |γ 0 (p )| except for q + d(p ) = 0, because the 'conversion' can take place anywhere an X, Y or Z is located. (Note that this expression, and hence the following estimate, hold for all choices of the ordering of 0's and 1's in the j-th state.) Thus, for q > 0 (i.e. j = 1), Now we can get a good bound for u(1, t), Hence, for each fixed N , at sufficiently late time, As we have mentioned above, generic gate sets have ∆ that is O(1) . If this inequality is almost saturated (numerical results explained below suggest this is actually the case), then if we plot It is clear that this estimate t Haar does not depend on the connectivity of the circuit. We should note that we can relate this estimate to the decay of two point functions. As we show in (104) of Section 6 we can rewrite where the P j are a complete basis of Pauli operators acting on the N qubit Hilbert space. The decay of u(1, t) is thus related to the decay of various two point functions. The content of the random circuit analysis is that this sum is dominated by the decay of the P j containing just one Pauli. This two point function decays like e −∆t and there are N such operators. This give the result (59).
Sketch of the calculation for u(2, t) = |Tr U 2 (t)| 2 In the above section we have estimated the convergence of the first moment u(1, t) and shown the Haar time is log N for both local and 2-local circuits. To fully understand the approach to random matrix statistics it is important to examine the Haar time for the k-th moment u(k, t). This is related to the study of approximate k designs. The techniques used above become quite challenging for general k and have only been carefully examined for k = 2. In the Appendix B.2 we carry out an analysis for u(2, t) using the techniques of Ref. [34] for a particularly simple gate set, U (4) random two qubit gates. For this gate set u(1, t) converges to Haar in one time step while u(2, t) converges in log N time. Here we give a summary of the argument. The analysis for u(2, t) boils down to a Markov process on two copies of a string of N Paulis. When the strings are unequal their weight vanishes in one time step. For a more general gate set the k = 1 rate would control their decay. When the strings are equal, a different Markov process involving transitions between pairs of Paulis controls the convergence. Again the strings with very few Z's have the slowest convergence, and cause a deviation from the Haar value of 2 of the form So u(2, t) becomes close to Haar in a time 1 ∆ log N . Again we see a log N dependence but here∆ is different from the ∆ discussed above. It is determined by the Markov chain on pairs of Paulis. We note that the gap of the full Markov chain does not control the convergence of this quantity, as explained in Appendix B.2.
We can again interpret these results in terms of two point function decay. The quantity described in (60) vanishes in one step for U (4) gates. But we can define "two step"  Figure 17: Plot of ( |TrU k | 2 −k)/N for k = 1, 2, 3, 4, 5 against t for the 2-local RCQ chain with dt = 0.2. N = 9, 10 5 samples. These exhibit exponential decays with rates that do not decrease as k increases. correlators using equation (104) of Section 6 to rewrite These "two step" two point functions will decay exponentially and the slowest ones will be the operators with a single Pauli. The analysis in Appendix B.2 shows that these decay like e −∆t = (1/15) t .

2-local RCQ Brownian circuit
We start with the Brownian circuit made from the 2-local RCQ Hamiltonian introduced in Sec. 4.1.2. In the following, 3000, 20000 and 10 5 samples have been used for N = 11, 10 and N ≤ 9 cases, respectively. In Fig. 17, u(k, t) minus the Haar value, |TrU k (t)| 2 − k, is plotted for N = 9 and dt = 0.2. 25 We can see clear exponential decays at early time and late time. As shown in the left panel of Fig. 18, the early time decay for k = 1 can be fitted by an exponential function of the time exp(−CN t), in which C 0.189 is a constant independent of N . Therefore, the early-time decay to |TrU (t)| 2 − 1 ∼ O(1) takes place within order one time.
For late time, |TrU (t)| 2 − 1 ∼ N e −∆t , with ∆ = O(N 0 ), is expected. As shown in the right panel of Fig. 18, the late time decays are consistent with ∆ 0.5, when N is sufficiently large.
In Fig. 19, we used the same fitting ansatz for k > 1, as ( |TrU k (t)| 2 − k)/N ∼ e −∆ k t . For each k, we can see convergence to exponential decay at large N . The exponent ∆ k increases with k (see also Fig. 17) 26 , which means the convergence to the Haar values at larger k happens earlier. Therefore, after t Haar is determined from k = 1, higher moments have already converged.

Local RCQ Brownian circuit
Next we consider the Brownian circuit made from the local RCQ Hamiltonian introduced in Sec. 4.2.1. We expect the same decay pattern as in the 2-local RCQ. We have used the  As shown in the left panel of Fig. 21, the early-time decay can be fit by ( |TrU (t)| 2 − 1) (4 N − 1) · e −c(N −1)t where c 0.187. Therefore, the early-time decay to |TrU (t)| 2 − 1 ∼ O(1) takes place within order one time. As shown in the right panel of Fig. 21 and in Fig. 22, the late time decay appears to be consistent withcN e −∆ k t for k = 1, 2, 3, 4 and 5, wherec is an order one constant. These observations are the same as the case of the 2-local Brownian circuit. Note that here ∆ 1 is larger than ∆ 2 . This is not surprising given the analysis above which shows they result from somewhat different mechanisms. But for k > 2, ∆ k > ∆ 2 indicating that k = 2 is already in the stable range for determining the approach to random matrix behavior.   Figure 22: Plot of ( |TrU k | 2 − k)/N 2 against t for the local RCQ chain with dt = 0.2 for N = 10, 9, 8, 7, 6, 5, 4 qubits, k = 2, 3, 4, 5. An exponential fit of the N = 9 curve is also shown.

XXZ Random Circuit and Diffusion
In this section, we will calculate u(k, t) defined in (32) for an XXZ type random circuit introduced in [43,48] which has a conserved quantity and so helps clarify the relationship between t Haar and t diff . The quantum circuit is composed of a brickwork array of twoqubit gates as in previous examples (see Fig. 16). In order to motivate this circuit, let us consider a Brownian circuit made from the Hamiltonian (30) (which we repeat here) with time varying couplings: The total spin along the z-direction, S tot z = 1 2 i σ 3 i , commutes with the Hamiltonian regardless of the value of the random magnetic field, so Brownian circuit time evolution commutes with S tot z , and hence the total spin is conserved. The random circuit under consideration is the discrete analog of this Brownian circuit.
Because the part of the Hamiltonian relevant for the interaction between i-th and j-th sites, 1 4 j , the unitary gate which describe the infinitesimal time evolution takes the following form, where U (1) and U (3) are independent random elements of U(1) and U (2) is a Haar random unitary element in U(2). The indices i and j indicates the locations of the qubits. Here U (1) and U (3) act on the spin +1 sector (|0, 0 ) and the spin −1 sector (|1, 1 ), respectively, and U (2) acts on the spin 0 sector (|0, 1 and |1, 0 ). Because of the symmetry we use the Pauli basis {I, σ + , σ − , Z}, where In this notation, the sum of the spins at the i-th and j-th sites is The unitary gate can be expressed as We now consider the adjoint action of G i,j on pairs of Paulis in the string. This has the following properties [43]: (for simplicity, we suppress i, j indices): 1. II, ZZ and (IZ + ZI)/2 are invariant.
4. The raise-by-one operators σ + I, σ + Z, Iσ + and Zσ + will transition between each other with equal probability. The averages of these linear combinations are zero. The same will happen to the lower-by-one operators.
The important point here is that any chain other than II, ZZ and (IZ + ZI)/2 are completely randomized after one application of the unitary gate, so that they average to zero immediately. The transitions between the remaining I, Z chains form a Markov process. Note that unlike the U (4) gate the evolution is a Markov chain already at k = 1. This Markov chain can be understood more easily if we use another basis In this basis, P P and M M are invariant, while P M and M P are mapped to 1 2 P M + 1 2 M P . The Markov property is now evident.

|Tr U (t)| 2 in the XXZ Random Circuit
Let us use the setup above to estimate the k = 1 moment, We first consider the 'local' model defined above, which has nearest-neighbor interactions via the gates G i,i,+1 . As in the case of RCQ, we can write the spectral form factor as where as before |i j| can be expressed as a sum over strings of σ-matrices. Let us use the M, P, σ ± basis. When i = j, at least one σ ± appears in the chain. Then, as we have seen, as soon as σ ± is hit by a random gate, U t (σσ · · · ) U † t averages to zero. Hence contributions from i = j terms disappear in one time step. Therefore we only need to consider i = j.
First let us consider the case with a single M , U ii U * ii = Tr U t (P P · · · P M P · · · P ) U † t (P P · · · P M P · · · P ) .
We need to see how U t (P P · · · P M P · · · P ) U † t evolves. After averaging M P and P M evolve to P M +M P 2 , while P P is invariant. Hence, when the random gate hits M P , the chain remains M P P · · · P with probability 1/2, and changes to P M P · · · P with probability 1/2. So M performs a random walk in the string. Because there is no preferred direction there is no net drift of the walker. We analyze this system in detail below but for now make a few qualitative remarks.
The probability distribution of M locations spreads diffusively so U ii U * ii , given by the overlap with the initial state, decays like U ii U * ii ∼ 1/ √ t at early time. At late time it is uniformly spread on the string and eventually U ii U * ii converges to 1/N . For general i there are q M 's and N − q P 's. Then the diffusive decay at early time is 1/ √ t q (when q N ). The late-time value is 1/ N q .
In order to give a precise analysis we identify M and P with two dimensional vectors, In other words we identify |M and |P to be two states of a qubit. Let us denote by P i,j a unitary operator that acts on the vector space of N -qubits and swaps the states in i, j positions: A single step of our process can be identified with the action of Instead of analyzing this Markov process, we will approximate it by N repetitions of a process where in each step a random site i is selected and the gate is applied between i and i + 1. This should approximate the original parallelized circuit at long times and in particular we expect it to give the same N scaling of the gap. 27 The Markov operator for this process is The number r of M 's in a string is conserved by this process. We will use the subscript r to denote the restricted action of Q loc,r in this subspace. This process has already been studied extensively. It is related to an interchange model on cards studied by Diaconis and Ng [71]. 28 They point out that this model can be mapped to a Heisenberg spin chain This connection becomes clear by noting that It is known that the interchange model for local and 2-local cases defines a reversible and irreducible Markov chain. This process has a uniform stationary distribution on the N r different strings. The largest eigenvalue of Q loc,r corresponding to the stationary distribution will be λ 0,r = 1. and all the eigenvalues, as we will discuss, will be between 0 and 1. Now we go through a standard Markov chain analysis. Suppose we have an initial string with r M 's, |v r = |M P P M · · · M P . Then, we can express Denote by H r the subspace of the Hilbert space spanned by vectors that have exactly r elements of M . Let us denote by λ a,r the eigenvalues of Q loc,r for a = 0, 1, .., N r − 1 in decreasing order; 1 = λ 0,r > λ 1,r ≥ λ 2,r ≥ · · · ≥ λ ( N r )−1,r > 0. The stationary state |λ 0,r is the eigenvector with the eigenvalue λ 0,r = 1 and has uniform weight on each string. Now, we can calculate u(k = 1, t) for each sector. Recalling the notation (70) M = |1 1| and P = |0 0|, the number r of M 's is the number of 1's in the binary representation of the state |i appearing in (73). In terms of the XXZ model, 0 and 1 are spin up (+1/2) and down (−1/2), so this state |i has total spin S z = 1 2 (N − r) − 1 2 r = (N/2) − r. Therefore, where the |v r that are summed over are all the permutations of the vector |M P P M · · · M P with r M 's. We now define a gap parameter ∆ r by e −∆r = λ 1,r . At large N this is just the gap in the H XXX spin chain. 29 We can give a simple lower bound on u(1, t) by just keeping the first two terms of (82) So the time to get close to the Haar value 30 is lower bounded by The gap is determined by the properties of the XXX spin chain which is one of the canonical examples of an integrable system. Reference [71] quotes the following results. 31 For r = 1, 2 the gap is the same and is given by ∆ 1 = ∆ 2 = c N 2 where c is an order one constant. Further it is conjectured that the gap is the same for all r. Our numerics (see the following section) support this. Using this result we have t > N 2 c log 1 .
As we discussed earlier the N 2 dependence is a signal of diffusion. A sharp upper bound is more subtle to obtain. The terms ignored in (83) contribute an N dependent offset to the time required to reach the Haar value. We define t offset by We can begin to estimate t offset by approximating the XXX chain spectrum. In the large N finite r limit the spectrum is dominated by r free magnons, each with nonrelativistic spectrum Here p is a continuum momentum, where we assume 1 N p 1. Assuming r N and t N 2 we can approximate This becomes order one at a time t offset ∼ N 2 r 2 . Here we see that the diffusional dynamics is related to the nonrelativistic dispersion relation (87) of the XXX chain.
But this is not the slowest dynamics in this system. It is known that there are bound states of b magnons for all b ≤ r. These have "mass" of order b and hence a spectrum (see [72]) These bound states diffuse more slowly because of their greater mass. Consider the case when all r magnons are bound up into bound states of size b, producing r/b "particles". This makes the following contribution to u(1, t): This interesting function of time becomes order one at a time t offset ∼ N 2 b 3 /r 2 . This suggests we can make t offset ∼ N 2 by choosing b ∼ r 2/3 . In fact it seems we can make t offset parametrically longer, ∼ N 2 r, by choosing the largest allowed value b ∼ r. For r ∼ N this would give t offset ∼ N 3 . However, when t ∼ N 2 the discreteness of the spectrum becomes important and these approximations are no longer valid. Instead we turn to numerics (discussed below) which indicate that t offset is independent of r for large enough r and hence t offset ∼ N 2 . 32 Our best estimate becomes Again we point out that this N 2 dependence is parametrically slower than the scrambling time for this system which is N [43,48], corresponding to ballistic propagation.
We now turn to the 2-local XXZ random circuit. The analysis is analogous to the local system and we end up with a Markov chain. We first consider a process where at each step a gate acts on one randomly selected pair of qubits. One parallel step of the full process corresponds to N repetitions of this. The Markov operator for this process is The spectrum of this operator differs in important ways from the geometrically local case giving a different N scaling for t . The spectrum of Q 2-loc,r is related to the gap of the Heisenberg model on a complete graph. More specifically, we express Then, we use a simple identity where we use S 2 to denote the total spin squared operator. So we find Recall that S z = 1 2 N i σ 3 i and it commutes with S 2 . In the basis where both are diagonal S 2 has eigenvalues s(s + 1), where s = 0, 1, · · · , N/2 for even N and S z has eigenvalues s z = −s, −s + 1, · · · , +s. Note that r = N 2 − s z . The eigenvalues of H XXX are given by Note that the eigenvalue is independent of s z and only depends on s. In a given r sector, that is for a given value of s z , all values of s such that N 2 ≥ s ≥ s z contribute. At large N the important eigenvalues Q 2-loc are given by λ a = e −Ea . So we can write and the gap is given by E 1 − E 0 . For all r sectors E 0 corresponds to s = N 2 and E 1 to s = N −2 2 . So at large N the gap is N independent.
Diffusion on this completely connected graph takes place in order one time. The degeneracy of this first excited state is N [73] and so a lower bound for u(1, t) is giving a Haar time of In fact this is an accurate estimate. The small gaps grow linearly in level, E n −E 0 = 2n and the degeneracies grow like N n − N n−1 . 33 So the full formula is well approximated by The Haar time is again given by (100). Note again that this logarithmic dependence is due to a different mechanism than the logarithmic scrambling time.
One can study u(2, t) for this model using the techniques discussed in Appendix B.2. We do not perform a full analysis of this case, but note that the "unequal chain" part of equation (121) (the last term) will converge slowly because of the same diffusive effects described above. This indicates that the time scale to converge to the Haar value for u(2, t) will be parametrically the same as in the u(1, t) analysis.

Numerical Results for XXZ Markov Chain
We conclude this section by showing some numerical results for local XXZ random circuits. For the local XXZ Markov chain the upper left panel of Fig. 23 illustrates that for a fixed value of N the low lying eigenvalues are independent of the value of r, indicating that bound state effects of the kind described in (90) are not important for the final approach to Haar. We can see that the offset is roughly ∼ N 2 by looking at the behavior of u(1, t) in the lower panel. We plot the u(1, t) for r = N/2 (i.e., S z = 0) sector for different values of N . Note that when the time is rescaled by N 2 the results align completely. This suggests that the best numerical fit to the time to get close to Haar is given by

Connection to Correlation Functions
In this section we discuss the connection between the ramp time and the relaxation time scale of correlation functions of simple operators in random quantum circuits. This may well be useful for determining the ramp time for Hamiltonian systems but there is an obstruction to a simple connection which we discuss below.
We use the approach of [49,50] and rewrite the spectral form factor as a sum over correlation functions. Rewriting the operator trace in terms of all Pauli operators P j [49] one has for any operator O Introducing another operator Q and taking the trace on both sides we get This is a general relation that can be applied to random quantum circuits as well as Hamiltonian dynamics. Let Q † = O = U (t) and average over the U (t) of random quantum circuit realizations. We can then rewrite the infinite temperature spectral form factor Tr U † (t) · Tr U (t) = 1 + To understand the approach to the ramp we must understand the decay of the two point correlators of all L 2 − 1 non-identity Pauli strings ( P j (t)P j ). In Section 5.1 we showed that this sum is dominated by simple Pauli strings and the leading contribution to come from the Pauli strings with a single non-identity element -there are 3N such strings. Each of these simple two point functions will decay exponentially ∼ e −mt for some gap parameter m. Therefore, the spectral form factor can be approximated by demonstrating a log N scaling for the ramp time with a prefactor of order 1/m. So the time scale is related to the decay of two point functions, not the Lyapunov exponent. We expect the same to be true for Hamiltonian evolution, where U (t) = e −iHt and the average is over an ensemble of Hamiltonians. In a previous version of this paper, we gave a heuristic argument estimating the ramp time of Hamiltonian systems by assuming, as above, that the slowest decay was that of such simple operators. However that argument had an error. 34 Two point functions of operators that couple to the Hamiltonian have subleading terms that do not decay in time because of energy conservation. 35 They remain constant for all times. These terms are small, but are much larger than the ramp time value and so need to be controlled to make a reliable estimate. In fact we believe that these terms almost exactly cancel at the ramp time. A reliable calculation of the ramp time has been done for Hamiltonian SYK in [70] using the collective field formalism. The result there is consistent with the intuition described here.

Closing Remarks
In this paper we have studied the onset of random matrix behavior in strongly chaotic many-body systems. But it remains to give a convincing argument for random matrix behavior over the full range of energy and time scales present. For example it is an important task to give a general argument for presence of the full ramp/plateau structure in the spectral form factor. One approach that has been suggested in [2,76] is to extend the ideas of Dyson Brownian motion used with great effect in [76] to many-body systems. Further exploring this approach is an interesting subject for future research. One important issue will be to understand the role of the comparatively large O(1/N ) (rather than O(1/L)) fluctuations in eigenvalue density that are characteristic of many-body systems.
Another interesting approach has been developed recently in [3]. They apply ideas of single-body random hopping problems to the exponentially large Hilbert space of the many-body system. Their approach does not get the value of the ramp time we argue for but we think this technique may well be useful to describe the later time ramp and plateau.
This approach is closely connected to the observation in Appendix F of [8] that the connected contributions to TrH k TrH k where H is the SYK Hamiltonian have the perturbative SY K value for k fixed as N gets large, but at k * ≈ N log N this quantity assumes the value it would have if H were a random matrix in the second quantized Hilbert space. Naively converting this to a time scale (remembering that the width of the spectrum is ∼ √ N ) one finds t ∼ √ N log N , the scale found in [3]. The result of the calculation in [3] is a formula rather similar to (105), a sum over an exponentially large number of more and more rapidly decaying terms. The leading correction is of the form N e −µt much like (106). But here µ ∼ 1/ √ N , corresponding to a parametrically slower decay rate than in the SYK version of (106). It is this slow decay that causes the large estimate t ramp ∼ √ N log N . There are no perturbative modes in SYK with such a slow decay rate 36 so their significance is a puzzle to which we hope to return. 37 One of the motivations for thinking about these strongly chaotic, scrambling, systems is their connection to the physics of quantum black holes. One important thing to understand is which phenomena are important in both small and large AdS black holes and which only occur in large ones. The time scale for evaporation of a small black hole is order S (where S is the black hole entropy). The ramp time we have found for the k-local systems appropriate to the black hole horizon is order log S, parametrically smaller than the evaporation time. These effects will be exponentially subdominant in simple observables at these short times so studying them will be difficult, but they are present and are a signature of the nonperturbative quantum dynamics of these systems.
Perhaps the most important question from the gravitational point of view is the bulk dual in the sense of AdS/CFT of these random matrix phenomena [8]. In SYK a related question is the explanation of these phenomena in terms of the G, Σ collective fields. We have made progress on this problem which we hope to report on in the near future [70]. 4. Then we compute Σ 2 (K).
Steps (1) and (2) are not implemented in [7]. The number variance calculated in this manner is shown in Fig. 24, for N = 32 and N = 34. The agreement with RMT persists to larger K, roughly a factor of two larger, than in [7]. Thus we see that the number variance is sensitive to the unfolding procedure.
As discussed in the text we expect that RMT behavior extends to an exponentially large value of K, much larger than shown even by these improved results. Our unfolding procedure is still rather crude. As a first step one could try to fully unfold the spectrum sample by sample. But it is unclear to us whether this is even in principle sufficient. At a minimum an exponentially large number of samples would be required to get a sufficiently accurate value of the variance.
For this reason the number variance seems not to be the best quantity to probe long range spectral rigidity in many-body systems. As discussed in the text the spectral form factor is less sensitive to such errors.

A.2 Estimating t min in the SYK Model
In this appendix we give our detailed results for determining t min in the SYK model numerically. The algorithm used to select an optimal α is as follows: In the second panel of Figure 3 we have varied α with step size 0.5, and observed that some α between 2.5 < α < 3.0 gives the smallest value of t min ; at α ≤ 2.5 the oscillation due to the sharp edges survives; the oscillation disappears at α = 3.0, and after then the min moves toward later time. By closer inspection we conclude that α for the smallest t min is 2.9 ± 0.1. In Fig. 4, we plotted the near-dip region of g(t) and h(α = 2.5, t). h(α = 2.5, t) is rather smooth near the dip and h(α ≥ 3.0, t) has a single minimum. Hence we can estimate the min time by choosing the value of t in the data which gives a single minimum of h(α, t). The neighboring data points can give the edges of the error bar for α. We should emphasize that this procedure is somewhat arbitrary.  N smallest t min α for smallest t min 10 9.1 ± 0. It is hard to determine the N -dependence of t ramp from Table 1. The fact that t min and hence an upper bound for t ramp is nonmonotonic, a physically unlikely result, indicates the uncertainty in the procedure. The best we can say is that t ramp is certainly far smaller than the dip time t dip and increases at most slowly with N . Assuming order one coefficients a power law faster than linear or an exponential behavior (as suggested in [7]) is disfavored.

A.3 Eigenvalue Behavior for 2-local RCQ
In order to test the spectral rigidity of the energy spectrum, we performed the unfolding with steps 1 -4 explained in Sec. A.1. The shifted-and-rescaled energy spectrum (obtained by performing steps 1, 2 and 3), before the unfolding, is shown in the right panel of Fig. 5. Unlike the SYK model, the edge of the spectrum is not sharp. Due to this, the slope of SFF decays much faster, as we will see shortly. The nearest-level separation obtained from the unfolded spectrum is plotted in the left panel of Fig. 25. A good agreement with GUE ensemble at large N can be seen. The number variance Σ 2 (K) is plotted in the right panel of Fig. 25. At N = 16, an agreement with RMT can be seen only at K 10.

B Analytic Results for Random Quantum Circuit
In this appendix we collect several analytic results which complement the discussion in Sec. 5.

B.1 k-th Moment of the Haar Measure
If we take the average with respect to the Haar measure on U(L), u(k, t) = |TrU k (t)| 2 is equal to k for k ≤ L and is equal to L for k > L. For k = 1 and k = 2, it can be seen by contracting the indices in (see e. g. Ref. [41]) and and A proof for generic k is discussed in Section 3.2 (also see [63]).

B.2 Calculating |Tr U 2 (t)| 2 in RCQ Random Circuit
In this section we study for the RCQ random circuit using the techniques of [34]. The generalization to the Brownian circuit is straightforward. Here we specialize to the U(4) gateset. We need to estimate two kinds of contributions: • U ii U ii U * jj U * jj . These terms start out at 1 and decay to their Haar value which from (108) has a large L value of 2/L 2 for i = j. It is zero for i = j.
These terms start at zero and reach their Haar value ∼ 1/L 2 .
Without loss of generality we can set i = 1 and consider U 1j U j1 U * 1j U * j1 (j = 1). These terms start out at zero and reach their Haar value ∼ 1/L 2 . In the 0, 1-basis, the |1 state is expressed by |0 N . In the same way as the discussion below equation (47), we take the state j to be of the form |1 q 0 N −q , again without loss of generality. The argument is a simple generalization of the one for U ii U * jj , presented below (47). We start from the initial state on two copies of the system, Then, we can express whereρ (2) We will useγ 0 (p 1 , p 2 ) to denote the Pauli coefficients forρ (2) (0). Recalling that |0 0| = 1 2 (I + Z) and |1 1| = we can see thatρ (2) (0) consists of the Pauli strings made only of I and Z. Therefore, where p 1 and p 2 run through the Pauli strings made only of I and Z.
is zero at t = 0 means γ t (p 1 , p 2 )γ 0 (p 1 , p 2 ) cancel with each other at early time (due to sign differences). In order to see the time evolution of the second and third terms, we need to understand how the unitary gates selected from Γ = U(4) act on two copies of Pauli strings σ p 1 and σ p 2 . We write them as σ p 1 = σ p 1,1 ⊗ σ p 1,2 ⊗ · · · ⊗ σ p 1,N and σ p 1 = σ p 2,1 ⊗ σ p 2,2 ⊗ · · · ⊗ σ p 2,N , respectively. Because each gate acts on Pauli matrices at two sites i and j, let us focus on these two sites: σ p 1,i ⊗ σ p 1,j and σ p 2,i ⊗ σ p 2,j . The gate acts on (σ p 1,i ⊗ σ p 1,j ) ⊗ (σ p 2,i ⊗ σ p 2,j ) as Here G i,j ∈ Γ =U(4) is a unitary gate and the Haar average over the gate set Γ has been taken. The coefficients c p ,q are [34] c p ,p = In words, unequal strings are mapped to zero in one step. When strings are equal and II they remain unchanged. When strings are equal and not II they are uniformly sprayed among the 15 non II configurations. (When the gate set is a subset of U (4) the first, unequal case, is given by 1 − ∆ where ∆ is the gap of the gate set. These unequal configurations die away exponentially, like the k = 1 discussion in the text. The last coefficient is also modified with no qualitative effect.) By using this expression, we can estimate the time dependence both for local and 2-local circuits. We first focus on the local circuit.
Note that we can write the sum in Eq. (118) as We can bound the second and third terms separately. For the U (4) gateset the third term in the sum is set to zero after a single step of the random circuit. One can check that the terms in the second term have the same sign, so we can rewrite the second term as where p is a sum of all 2 N − 1 Pauli strings of I's and Z's. For each fixed p , we have a Markov chain that starts at position p (note that 2 N · |γ 0 (p , p )| = 1)and converges to a uniform distribution on 4 N − 1 strings.
This implies that lim t→∞ ( * 2) ≈ 4 −N · 2 −N , which is exponentially suppressed relative to the Haar value (= 4 −N ). So we need only focus on the early time decay of this quantity.
In particular to establish closeness to Haar for fixed we do not need to consider the final decay governed by the gap of the full Markov chain. Because of the large number of configurations, and in particular the 15 possible two Pauli strings not equal to II, the return probability in this chain is small and can be ignored (up to fractional errors of 1/15). So we can think of the configuration space as a large tree. The slowest decaying N qubit strings are those with lots of II substrings and hence very few Z's. We can ignore the rare cases where the Z's are adjacent. Then the overlap (122) decays with a rate proportional to the number of Z's. The estimate becomes analogous to that presented in Section 5.1.
As in Section 5.1 we will find a time of order log N for the sum of the slowest terms to be small enough to get close to the Haar value. At this time each of the slowest chains, those with a few Z's, have only spread to polynomially many other chains. So the gap of the full Markov process should not come into play.
The initial decay rate of the Pauli string with d Z's will be given by e −∆d per step of the process where we write the probability for a two qubit string to remain in a given non II configuration as e −∆ = 1 15 . Therefore we have Putting everything together and assuming N is sufficiently large, we conclude That is the strings with just one Z dominate this decay.
We consider i = 1, without loss of generality. Again in the 0, 1-basis, |1 is expressed by |0 N . In the same way as the discussion below (47), we take the state |j to be of the form The argument is a simple generalization of the one for U ii U * jj , presented below (47). We start from the initial state on two copies of the system, The time evolution ofρ is defined bŷ Then, by construction, U 11 U * jj U 11 U * jj = Tr Ô (2) (t)Ô (2) † (0) .
We expandÔ (2) by Pauli strings aŝ The coefficients at t = 0 are given by γ 0 (p 1 , p 2 ) = 2 −N (+i) f ( α(p 1 ))+f ( α(p 2 )) , when p 1 and p 2 are Pauli strings whose first q elements are X or Y and the remaining N − q elements are I or Z. For other p 1 and p 2 , γ 0 (p 1 , p 2 ) = 0. We can now write U 11 U * jj U 11 U * jj = p 1 ,p 2 γ t (p 1 , p 2 )γ * 0 (p 1 , p 2 ) (132) We can split the sum into three parts The strings we have chosen have q X or Y s. We denote the number of Z's by d, where d can runs from zero to n − q. When j = 1 or equivalently (q ≥ 1) we note that for every string with an X at a given position there is an identical string with a Y at that position. Within the second term, contributions of string with X and Y in the same position will differ by a sign, due to the fact that there are two copies of the system and each one will contribute an i for each copy of Y . So the contributions of all such strings cancel exactly. This implies that when j = 1 the second term in (133) will be set to zero in one step. Again for the U (4) gateset, the third term in the sum is also set to zero after a single step of the random circuit.
In the remaining case (j = 1), we will only have strings of I and Z and the evolution will be analogous to the previous section. In fact we not need to repeat this estimate because there is only one such term, i = j = 1, which makes an exponentially small contribution to the final value of u(2, t).

Summing up all the contributions
We can rewrite the sum as (recall L = 2 N and assume t > 1) The factor of 2 comes from the i, j symmetry in the sum. Note that this result becomes close to the Haar value in times 1 ∆ log(N/ ).
For the 2-local system the above bound goes through without modification. There is one subtlety however. The results of Harrow-Low [34] show that full Markov chain convergence occurs in order log N time, indicating that the entire Pauli string configuration space is covered in that time. So the tree approximation may no longer be valid. But by this time our quantity is exponentially close to the Haar value (123) and so there should be no increase in the estimate.

Alternate Strategy
Another method for calculation the u(2, t) is to use the rewrite presented in equation (104), where the sum is over all the non-identity Pauli strings. In the second term of the above formula, we translate the average over product of circuits to an average on two copies of the system followed by a swap operation F.
We have not analyzed this systematically, but observe that averaging can be done independently for each gate and so a new, perhaps more complicated, chain will result. The question then is to understand how coefficients of the Pauli strings decay in this chain for each application of a gate. If we ignore backtracking and approximate the chain evolution by a tree, as above, the question reduces to computing the decay rate for single gate from U (4). The average of the U(4) gateset on a non-identity Pauli string AB (A, B = I, X, Y, Z) dotted with the same string can be computed and is, This means that in the local and 2-local circuit the strings with d non-identity elements will decay like e −d∆t , where again e −∆ = 1/15. This indicates that the decay is dominated by the slowest decaying terms which have d = 1. The total number of such terms is 3N , giving the log N result described above.
There is a small subtlety here. There are d = 2 terms with the two Paulis adjacent to each other. These decay like a d = 1 term. However there are only O(N ) terms of this kind so this will just produce an addition to the numerical factor in front of N . This is a random circuit version of the factorization issue discussed in Section 6. Therefore, the Haar time scaling, ∼ log N , is unaffected. Hilbert space, which connects w nearby states. This looks 'local' if we identify the states with a single particle hopping on L lattice sites, but it is useful to note that this is rather different from a many-body local Hamiltonian, say a local spin system. In the latter, in a natural local basis, the Hamiltonian is sparse but the nonzero entries are not aligned near the diagonal as in the banded matrix. In local Hamiltonians with N spins, each row and column has L = 2 N entries, among which there are O(N ) nonzero elements. This is much sparser than a banded matrix with w ∼ √ L, but it can already be chaotic. The time scales are also very different; the ramp time for the local Hamiltonian, t ramp ∼ N 2 ∼ (log L) 2 (see Sec. 4.2), is much shorter than (L/w) 2 when w ∼ √ L.

C.2 Brownian Circuit
The Brownian circuit of random band matrices can describe diffusion, because the number of particles -which is one -is conserved. For this reason, as we will see, we can confirm t Haar ∼ (L/w) 2 ∼ t diff . The numerical procedure is the same: we calculate U (t) = n k=1 e −iH k dt , where t = n·dt and H k are L × L random band matrices with a width w. From this we calculate the |T rU | 2 .  Figure 27: The value of (w/L) 2 |Y (α = 1, t)| 2 plotted against (w/L) 2 · t for various (L, w) satisfying √ L ≤ w ≤ L/4, α = 1. The number of samples with each value of w is 10 6 /L = 2500, 1000, 400, 100 for L = 400, 1000, 2500, 10000 cases. For w L/4 the initial growth indicates that |Y (α = 1, τ )| 2 ∼ (L/w) √ t, while for all (L, w) plotted, the growth for larger t before the plateau agrees with |Y (α = 1, t)| 2 ∼ t.
First let us see that the specific choice of dt is not important. When dt is sufficiently small, if we take tdt = n(dt) 2 to be the horizontal axis, the dt-dependence is gone; see Fig. 29. This scaling can be understood if the time evolution is described as a random walk with step-size wdt. After n steps, the typical distance from the starting point should be √ nwdt, if the random walk picture is true. Then for fixed L and w the dt-dependence should disappear when the horizontal axis is n(dt) 2 . Below we fix dt to be 0.5.
In Fig. 28 we have plotted |TrU | 2 − 1 against (t(w/L) 2 )/(L 2 − 1) for fixed w = 25 and various L, and for a fixed L/w. In both cases we can see an exponential decay. The exponent is a function of L/w, as we can see from the right panel of Fig. 28.
As we can see from Fig. 30, we can show the exponential decay at late time as well, |TrU | 2 − 1 ∼ e −2t(w/L) 2 . Note that the exponent is different from the early time. If we define t Haar as the time |TrU | 2 − 1 reaches to a certain value, say 0.1, then this scaling leads to t Haar ∼ (L/w) 2 .
It is possible to understand this scaling from the random walk picture. When the average distance from the starting point, which is √ tw, is of order L, it is natural to expect that the transfer matrix U is almost a random unitary. Therefore √ t Haar w ∼ L, or