Spread of entanglement in a Sachdev-Ye-Kitaev chain

We study the spread of R\'enyi entropy between two halves of a Sachdev-Ye-Kitaev (SYK) chain of Majorana fermions, prepared in a thermofield double (TFD) state. The SYK chain model is a model of chaotic many-body systems, which describes a one-dimensional lattice of Majorana fermions, with spatially local random quartic interaction. We find that for integer R\'enyi index $n>1$, the R\'enyi entanglement entropy saturates at a parametrically smaller value than expected. This implies that the TFD state of the SYK chain does not rapidly thermalize, despite being maximally chaotic: instead, it rapidly approaches a prethermal state. We compare our results to the signatures of thermalization observed in other quenches in the SYK model, and to intuition from nearly-$\mathrm{AdS}_2$ gravity.


Introduction
Intuition from statistical mechanics suggests that generic interacting sytsems should thermalize. For an isolated quantum system, this seems counter-intuitive since a pure state always stays pure under unitary time evolution. Nevertheless, there is a deep sense in which a highly excited pure state can nevertheless "look thermal". If we study the reduced density matrix of a small subregion of the total quantum system, we expect that in a generic thermalizing quantum system, the reduced density matrix of such a region is very close to a thermal density matrix [1][2][3]. Such a thermal reduced density matrix has entanglement entropy proportional to the volume of the subregion, in contrast to the vanishing entanglement entropy of a pure state. Therefore thermalization of an isolated system is fundamentally related to the dynamics of the entanglement entropy between subsystems.
A direct probe of thermalization is thus to start with a highly excited state with low entanglement, and to evolve it forward in time. For example, suppose that we start in the ground state |Ψ of a quantum system for t < 0, and at t = 0 abruptly change the Hamiltonian so that |Ψ is now highly excited. By studying the entanglement growth in a subregion of the quantum system after such a quench, we learn how the system thermalizes; as we have seen, the spread of entanglement is necessary for thermalization. In many strongly interacting quantum systems, it is observed that (for large enough regions) the rate of change of the (von Neumann) entanglement entropy of a region of surface area A is given by where s th is the entropy density of the resulting thermal state, and v E is an "entanglement velocity" [4][5][6]. The entanglement velocity gives a simple measure of how rapidly the density matrix appears thermal, and -at least by dimensional analysis -defines a velocity scale for the thermalization of an interacting quantum system. Another perspective on thermalization arises from quantum chaos. In a chaotic system, quantum information present in a small subregion at time t = 0 becomes spread out quickly. Consider an operator O x with support near a point x at t = 0. After time evolution, the operator O x (t) = e iHt O x e −iHt can become a 'large' operator with support in a ball of radius ∝ t [7]. The process by which these operators become delocalized, and so one must look at a large region of the system to recover a small amount of information, is coined 'scrambling' [8]. The spatial dynamics of scrambling is governed by a different velocity scale, called the butterfly velocity v B [9]. In order for a large subregion of a quantum system to thermalize, certainly operators localized inside of the subregion at t = 0 must begin to extend outside of the subregion by the thermalization time.
Thus, the dynamics of entanglement cannot be entirely independent from the dynamics of scrambling: both are intimately connected with thermalization. There are plausible arguments [10,11] that scrambling should (begin) to occur first: because the growth of entanglement is impossible without the spread of information. 1 However, many open questions remain. Is (1) a universal property of chaotic, thermalizing quantum systems? If so, is v E a physical speed in the quantum system, and if it is, does something locally well-defined propagate at this speed? What velocity scale -if either -limits the onset of classical hydrodynamics, and could thus bound sound speeds and diffusion constants [13], and why? While many of these questions become quite delicate for many-body localized systems [14] (or systems close to a many-body localization transition) [15][16][17][18], they are also not well understood for highly chaotic systems.
Here we will initiate a study of the spatial dynamics of entanglement in the SYK chain model proposed in Ref. [22].We will describe specific details of this model later. For now, let us simply emphasize that it is a highly disordered quantum system with a single energy scale J, and a large number N of degrees of freedom per lattice site. At temperatures T J (commonly denoted βJ 1, with inverse temperature β = 1/T ), the SYK model exhibits near-conformal invariance, becomes maximally chaotic in its early time behavior, as measured by the Lyapunov exponent, and has exponentially many low-lying excited states, leading to a non-zero entropy at zero temperature if we first take the large N limit. Due to being maximally chaotic, we would anticipate that the SYK chain is an effective thermalizer.
More specifically, we study the entropy growth in the generalized SYK model after a global quench from a special initial state, the thermofield double (TFD) state [39]. To construct the TFD, we tensor product two copies of the original Hilbert space of the SYK chain: H = H L ⊗ H R . We have denoted the copies as left (L) and right (R). The TFD state at time t = 0 is |TFD ∝ e −β(H L +H R )/4 |I .
The state |I is a direct product of local EPR pairs between the two systems L and R (See. Fig. 1). For a spatial entanglement cut, |I has no entanglement entropy. H L and H R are suitable notions of the SYK-chain Hamiltonian acting on only the L or R degrees of freedom (see Section 2.2 for precise definitions). Upon tracing out either the L or the R degrees of freedom, the resulting density matrix is thermal. However, we may define a Hamiltonian for the combined LR system such that |TFD is not an eigenstate. If we look at a suitable subregion A with support in both the L and R chains, such as the left half of the chain shown in Fig. 1, we can observe the spatial spread of entanglement in this doubled system. More specifically, by using the replica trick, we compute the n-th Rényi entanglement entropy S A,n (with integer n > 1) of the reduced density matrix of |TFD(t) in one half of the TFD chain. When the coupling between neighboring sites is the smallest energy scale, we find that the entropy increases linearly in time as in (1), with the growth rate The linear growth slows down at long time and eventually has to saturate if the length of the chain is finite. We study the late time behavior by two different approaches. The first approach is a perturbative expansion in the coupling strength between neighboring sites, and allows us to compute the onset of deviation from linear growth at an intermediate time scale. Such an approach does not apply directly to the long time limit. In the second approach, we make a simple ansatz for correlation functions that allows us to compute the entropy at all times by a much simpler (but still nonlinear) geometric problem. Solving this problem in the long real time limit predicts a saturation of the Rényi entropy. Since we have done a restricted minimization of action, what we obtain is an upper limit of the entropy: Here M is the length of the chain, so that is the entropy per site. c v is the specific heat of the doubled SYK chain, which is a constant at low temperature. Surprisingly, for n > 1, at low temperature the Rényi entropy density upper limit is parametrically smaller than that of the thermal ensemble, which implies there are degrees of freedom that do not thermalize in the large N limit.
We propose that this phenomenon of prethermalization is related to the presence of a large density of almost localized states at very low energy. These states are responsible for the zero temperature entropy. They evolve slowly and do not contribute to entanglement if we first take the large N limit. As a consequence, the entropy growth is upper bounded by c v T = s th (T ) − s th (0): the change in entropy due to finite temperature T . This does not include the non-vanishing zero temperature entropy density of the SYK chain. We expect the system eventually thermalizes but that the thermalization time diverges in the large N limit. The fact that the SYK chain only prethermalizes rapidly, but thermalizes slowly, implies that Eq. (1) may not be a sensible definition of an entanglement velocity. We will discuss this possibility in much more detail at the end of the paper. If one uses the definition (1) for v E , then using (4) we find that v E ∝ T for the SYK chain.
It is interesting to discuss the relation of our results with holographic models of strongly interacting theories. Although the holographic dual of the SYK model is not known, the SYK model shares many similar properties with holographic models containing extremal horizons [40][41][42]. We show that these holographic models exhibit similar 'early' time von Neumann entanglement growth, with dS E /dt ∝ T , but the von Neumann entanglement S E saturates at the thermal value, including the extremal zero temperature contribution. Although we are unable to explicitly compute the analogous holographic Rényi entropy growth, we propose that the Rényi entropy in an analogous holographic setting may behave similarly to the SYK chain. This arises due to subtleties with gravitational dynamics in AdS 2 .
Finally, we note that two models studying thermalization in (single site) SYK models in a somewhat different context have recently appeared [43,44]; both studies show evidence for rapid thermalization. Evidence for eigenstate thermalization in the SYK model has also recently appeared [45]. Our results are not inconsistent with theirs, but we defer a detailed comparison to the end of the paper.
The rest of the paper is organized as follows: in Sec. 2 we present the explicit setup of the global quench problem we are studying. We compute S A,n (t) in the limit where the coupling between different sites is much weaker than the on-site coupling in Sec. 3, and in Sec. 4 we consider the leading perturbative correction to this result. In Sec. 5 we present a geometric interpretation of the quench problem we are studying and compute the long time saturation of the entanglement entropy. This is a regime where both previous calculations fail. In Sec. 6 we compare our result to intuition from holography. In Sec. 7 we discuss the broader implications of our result and compare to other recent studies on quantum quenches in the SYK model [44,43].

The SYK Model
The Sachdev-Ye-Kitaev (SYK) model [19,20] describes N Majorana fermions with quartic random all-to-all interactions. The Hamiltonian of this model is where {J jklm } are independent, mean-zero random couplings: The model is solvable at large N , and maximally chaotic at strong coupling N βJ 1 [20,46,21]. It provides a rare example of chaotic yet tractable many-body systems.
Recently, many generalizations of the SYK model have been proposed. In particular, [22] studied a higher dimensional lattice generalization of the SYK model with spatial locality. For a one-dimensional chain, the Hamiltonian of the generalized SYK model is given by where the couplings {J jklm,x } and {J jklm,x } are all independent Gaussian random variables with mean zero, and variances It is convenient to define an effective coupling constant which determines the local properties of the model. Similar to the original SYK model, this generalized model is solvable at large N and maximally chaotic at strong coupling. At leading order in N , this model has a saddle point which is equivalent to the onesite SYK model. At next-to-leading order in N , there is non-trivial spatial dynamics with dynamical critical exponent z = ∞, also known as local criticality. Local criticality implies that space does not scale under renormalization group flow, and is responsible for some of the particular features of the SYK chain model. The spatial locality of the model enables us to study thermal transport and the spatial propagation of scrambling and chaos. The out-of-time-order correlation function (OTOC) takes the form 1 with Lyapunov exponent λ L = 2πT and butterfly velocity v B = √ 2πT D, with D the thermal diffusion constant [22]. We note that this relation between v B and thermal diffusion constant D holds in holographic locally critical theories as well [47,48]. Charge transport can also be studied in a modified model with charge conservation [24], though we will focus in this paper on the simpler model above.
Before describing how to exactly solve this model in the limit N βJ 1 in Sec. 2.4, we would like to first define the TFD state described in the introduction, and discuss how to compute the Rényi entropy in this state.

Thermofield double state and global quench
We consider two copies of a single SYK chain, and consider a special initial state: the thermofield double (TFD) state [39]. As we will see, analytic computations in such a doubled state are tractable; the qualitative features of entropy growth in the quenched time evolution of a short-range entangled initial state should not depend on details of the initial state.
We first give the general definition of the thermofield double state. For a system with lattice sites labeled by x, we first choose a basis |a, x , a = 1, 2, ..., D on each site. Here D is the Hilbert space dimension of each site (D = 2 N/2 for the SYK chain model). Then we consider the following state of the doubled system, which is a direct product of maximally entangled pairs on each site: Here L and R (left and right) label the two copies of the system. In state |I , each chain is maximally entangled with the other chain, but when we consider the two sites at x together, they are unentangled with the rest of the chain. Indeed, interpreting the chain labels L and R as an "internal" label, |I is a direct product state with no spatial entanglement. For a given Hamiltonian H of the original single chain problem, we can define its transpose H T by taking the matrix transpose in the basis |{a x } ≡ ⊗ x |a x , x . 2 More explicitly, H T is defined as Now define a Hamiltonian in the doubled system such that H acts on the left system and H T acts on the right system. One can explicitly check that the state |I satisfies The TFD state |I , introduced in (3), is then defined as with Z β = tr e −βH the thermal partition function of the single-chain system.
A key property of TFD is that the reduced density matrix of the L chain alone, or the R chain alone, is thermal, with inverse temperature β. This can be directly shown by applying Eq. (15) to obtain |TFD = Z −1/2 β e − β 2 H L |I and use the fact that |I maximally entangles the two chains. One can view the TFD state as a purification of the thermal density matrix, in which the chain R plays the role of thermal bath of chain L. Compared to a generic purification, the TFD state has the special property that the entanglement between the two chains is spatially local. The state |I (which is the β → 0 limit of |TFD ) has zero entanglement entropy between different spatial regions. |TFD at finite temperature is obtained by a finite time imaginary time evolution of |I . This imaginary time evolution leads to spatial entanglement. However, any resulting entanglement entropy will satisfy an area law [49], as long as H L,R are local. In the one-dimensional chain case, this means the entanglement entropy of a connected region A stays finite even if the size of A and its compliment go to infinity.
The definition of |TFD is not unique, since it depends on a basis choice. However, different definitions lead to |I that are related by a product of local unitaries, which does not change entanglement properties of the TFD state as long as the definition of H T is conjugated by these unitaries correspondingly. For concreteness, we give an explicit definition of |TFD in the SYK chain case. Denote the Majorana fermion operators by χ j,x,L and χ j,x,R , with j = 1, 2, ..., N . (We remind the reader that N must be even, to have a well-defined Hilbert space at each site.) One convenient choice of the state |I can be defined by the following equations: with j = 1, 2, ..., N/2. In the eigenbasis of n j,x,L(R) = c † j,x,L(R) c j,x,L(R) , the state is a product of |0 L |1 R + |1 L |0 R on each site. This leads to the equations for the Majorana operators (χ j,x,L + iχ j,x,R ) |I = 0, (j = 1, 2, . . . , N ). (18) In this choice, one can check that the SYK chain Hamiltonian (8) satisfies H T = H. . This perspective is useful in computing the partition function Z A,n , where the blue lines play the role of branch cuts. Each replicated fermion χ α shifts its replica index to α + 1 when it crosses the right branch cut line (the side closer to the reader) from below and shift to α − 1 when it crosses the left branch cut line from the above. We can further deform the two horizontal blue branch cuts to a single vertical dashed branch cut (shown in red).
Therefore we can take two identical chains and the TFD state is defined as 3 From the perspective of time evolution, the thermofield double state is an eigen-state of operator H ⊗ I − I ⊗ H T but not the eigen-state of our Hamiltonian H D = H ⊗ I + I ⊗ H T . Therefore, we can treat H D as the initial state after a global quench, and apply the corresponding time evolution operator to obtain a time dependent state: Now we can look at the sub-region A = A L ∪ A R which is supported on two sides as shown in Fig. 2 (a) and consider its reduced density matrix: By construction, each chain in the TFD state has a thermal density matrix that is invariant under time evolution. Consequently a region that is a subset of only L chain or only R chain also has a thermal time-independent density matrix. In the region we choose, ρ A L = tr A R ρ A (t) and ρ A R are thermal, but ρ A is time-dependent. If region A thermalizes after a long time, ρ A should approach the thermal density matrix ρ A L ⊗ ρ A R . Therefore, the increase in entropy of region A during thermalization corresponds to the decrease of mutual information I(A L : A R ) = S(ρ A L )+S(ρ A R )−S(ρ A ) between the two regions A L and A R . If the system thermalizes, the mutual information should vanish in long time (or at least becomes subleading in volume of A). Physically, the decrease of mutual information is a consequence of the scrambling of quantum information during chaotic time evolution. Correlation between operators in A L and A R evolves to more and more non-local operators that cannot be revealed in region A L and A R [50, 16].

Twist operators and Rényi entropy
It is difficult to directly calculate von Neumann entropy of the region A. Instead we use the replica trick [51] to compute the Rényi entropy for the subsystem A by a path integral: where the factor of Z n β in the denominator arises from the definition of the TFD state. The numerator Z A,n is the partition function evaluated on a "twisted" manifold corresponding to an n-sheeted cover of spacetime. The easiest way to understood the partition function on this twisted manifold is to consider a partition function defined on n replicas of the original theory. For the generalized SYK model with Majorana fermions χ j,x , with j = 1, 2, ..., N the flavor index and x the lattice site coordinate, the replicated theory describes fermions χ α j,x with α = 1, . . . , n. We then write down a path integral for all χ α j,x , containing evolution in Euclidean time, which prepares the TFD state, followed by evolution for real time t. From the perspective of the twisted manifold, the replica index of a fermion describes which sheet of Euclidean space-time it lives on. These sheets are connected through the branch cut lines shown in blue in Fig. 2, such that fermions passing across a branch cut have their replica index cyclically permuted. More explicitly, the boundary condition of fermions is where τ * is the time location of the branch cut. Z A,n is the partition function for replicated theory with the above boundary condition. The position of the branch cut line is not important; it can be moved around by relabelling fermions in different replicas. The only invariant information about the twist is the end points of the branch cut. 4 For our convenience, we can move the branch cut points to a "time-like" line connecting the two branch cut points (red line in Fig. 2 (b)). With this gauge choice, the boundary condition in the time direction remains untwisted, and the only effect of the twist operators is to modify the spatial couplings between fermions on different sites. Denoting the sites separated by the boundary of A as x * and x * + 1, the twisted coupling is between χ α j,x * (τ ) and χ α+1 j,x * +1 (τ ) when time τ is in the interval of the branchcut line. The coupling is diagonal in α everywhere else.
It is also helpful to write another equivalent expression of the Rényi entropy. If we define a twist operator X An which is applied to the branch cut lines A L or A R and cyclically permutes the replica index, the Rényi entropy can also be written in a time-ordered thermal two-point function of twist operators: Here ... β = tr ρ ⊗n β ... is the thermal average in n copies of the single chain system. The real time evolution and imaginary time evolution can be drawn as a contour in the complex plane of time, as shown in Fig. 3(a).

The path integral for the SYK chain
The discussion of the previous subsection was completely general. It applies to the TFD state of any system with a spatially local Hamiltonian. We will now write down the coupling more explicitly for the SYK chain model.
The replicated partition function Z A,n with the boundary condition described in the previous subsection can be written as where C stands for a special time contour for the thermofield double states as shown in Fig. 3. At the boundary of A, between sites x * and x * + 1, the contour is split into two parts C 1 and C 2 by the twist operators. The branchcut line runs along C 1 . The effect of the twist is to modify the spatial coupling J jklm,x term by the matrix g αβ x (τ ), given by , where t C = τ + it is the complex time variable. The red part represents the TFD(t)| in Fig. 2, and the black represents the |TFD(t) . For later convenience, we name them as C 1 and C 2 .
In a generic system, to compute the quenched average of the Rényi entropy S A,n over disordered couplings J jklm,x and J jklm,x one should compute Z k A,n for a general integer k, and then analytically continuate to the k → 0 limit. In the SYK model, it is known that at leading and next-to-leading order in N , the partition function is replica diagonal, such that Z k A,n Z A,n k [21,22,52]. Therefore we will directly work with Z A,n . The average over the Gaussian-distributed random couplings J and J leads to the following partition function: Next, we rewrite this fermionic partition function as a theory of bosonic bilocal fields. Define the Green's function G αβ (note the replica indices): We impose this definition of G αβ x (τ 1 , τ 2 ) by a Lagrangian multiplier Σ αβ x (τ 1 , τ 2 ) in the path integral. Integrating out fermions after inserting G and Σ leads to the following effective theory: Up to this point, all the manipulations are exact in the large N limit. In what follows, we will treat the effect of the twist operators by making certain approximations in the low temperature limit as well.

Large N and replica diagonal partition function
Our goal is to compute the Rényi entropy: We have seen that Z A,n may be evaluated using a path integral for a generalized SYK model with n flavors of replicas, with a twisted interaction on a special time contour. In this section, we aim to evaluate Z A,n with some further assumptions.
In the large N limit, the partition function Z A,n can be computed using a saddle point approximation. We find a saddle point equation for Σ and for G. The first equation is standard: We must explicitly consider functions of two time variables because time translation symmetry is broken due to the special time contour C, and the twisted interaction; the −1 should be read as matrix inverse in both time domain (τ 1 , τ 2 ) and replica indices (α, β). The second equation depends on the location x. When x = 0, 1, the self energy is the same as the normal generalized SYK model (with added replica indices): Note that (G αβ ) 3 means an entry-wise cube of the matrix G αβ , and not the αβ element of G 3 . For x = x * or x * + 1, the self energy term experiences the twisted interaction: where we have omitted the time variables (τ 1 , τ 2 ) in G and Σ for simplicity. Without the twist, the saddle point solution to the Schwinger-Dyson equation is diagonal in replica indices, with the form G αβ x (τ 1 , τ 2 ) = G s (τ 1 − τ 2 )δ αβ [21]. Since the twist couples different replicas, it is possible that the saddle point solution becomes off-diagonal. To see whether this possibility is realized, let us start with a theory with J 1 = 0. This reduces to a theory of decoupled SYK sites and the saddle point solution is diagonal. When a small J 1 is gradually turned on, it is natural to consider a perturbative solution to the Schwinger-Dyson equations (32)- (33). However, according to Eq. (33) the self-energy Σ αβ x is always proportional to G αβ x . Consequently, if we start from the J 1 = 0 diagonal solution and solve the equations iteratively, we find that the solution stays diagonal to all orders of the coupling J 2 1 . Therefore we conclude that the solution either stays diagonal or that the solution is non-diagonal, but the off-diagonal contributions are non-perturbatively small as J 1 → 0. In the following we will assume that G and Σ remain diagonal at finite J 1 .
Recall that the Renyi entropy in large N limit will be determined by the saddle point with maximal contribution to the partition function. Hence, if the true minimum of the action occurs for an off-diagonal solution, then the value of S A,n that arises from a diagonal ansatz serves as an upper bound on the true value of S A,n . 5 With the diagonal assumption G αβ With the replica diagonal ansatz, the effective action is proportional to replica number n, so that we divide the overall n to the left side. 1 n S 0 denotes the first two lines, which is the original action of SYK chain, and 1 n ∆S denotes the third line which is the extra action cost caused by the twisted coupling. When only one of τ 1 and τ 2 is on the twisted contour C 1 , the J 2 1 term in the action vanishes since it couples a replica diagonal term G αα x * to an off-diagonal term G α,α+1 x * +1 or G α+1,α x * +1 . This is the origin of the extra term.

Weak link limit
Even with the replica diagonal ansatz, the Schwinger-Dyson equation is still hard to solve, especially because of the lack of time translation invariance. However, we can start with a simple limit where N βJ 1, and 1 βJ In this limit, the twisted coupling term ∝ J 2 1 can be treated perturbatively. To do a perturbative calculation of S A,n , we begin by reviewing the untwisted SYK model at large N and strong coupling N βJ 1. The saddle point solution approximately follows a conformal form: General fluctuations around this saddle costs order N action. However, there is a special class of the fluctuations that cost action N βJ in the long wave-length limit [22]. These fluctuations correspond to a time reparametrization f x ∈ Diff(S 1 ) of the conformal solution G c (τ 1 , τ 2 ), which has the form In the limit J 2 1 /J 2 1/βJ, the twisted interaction is so weak that even these reparameterization modes will not be sourced. The first order shift to the effective action is thus obtained by evaluating the twisted term at the conformal saddle G c . The effect of the reparameterization modes will be considered later in Sec. 4 and 5.
It is convenient to first work in imaginary (Euclidean) time τ and then analytically continue to real time by taking τ → it. The imaginary time problem involves a path integral with twist operators inserted at time τ and β 2 − τ , as is shown in Fig. 4. At first order in J 2 1 , the Rényi entropy is simply obtained by evaluating the effective action on the conformal solution G c . We find where is a small UV regulator. 6 The corresponding Rényi entropy is For convenience, we will define Analytically continuing to real time by taking τ 21 = β 2 − 2it, we obtain: which includes a time-dependent piece and a constant piece coming from the cut-off . At large real time t β, the entropy grows linearly: Denote the Rényi entropy of each site in thermal equilibrium as s th n , we can define a Rényi entanglement velocity as in (1): 6 Physically, this cut-off is of order J −1 : the artificial divergence arises from approximating the actual saddle point two-point function by the conformal solution G c (τ ) in Eq. (36). The conformal approximation applies to the IR region τ J −1 [21], but in the UV limit τ J −1 the conformal saddle diverges at τ → 0 while the true saddle G true (τ → 0) → 1 2 . The effect of this deviation can be described by a cutoff term log with of order J −1 .
The factor of 2 comes from the fact that the TFD state is defined on two chains. At low temperature, s th n approaches the finite zero temperature entropy, so that we conclude v E,n ∝ T at low temperature. However, as we have discussed in the introduction (and in more detail in Sec. 5), the entanglement entropy in this system actually does not saturate to the thermal value at long time. Thus, as we have noted previously, a conventional definition of v E,n may not apply.
Usually, taking n → 1 in S A,n leads to the von Neumann entropy S A . However, this limit is singular in Eq. (42). This is a consequence of the replica diagonal ansatz, because the resulting effective action S ∝ n. Physically, the divergence suggests that the n − 1 1 region is described by a replica non-diagonal saddle point. But we emphasize that S A,n (t), as given in (39), is an upper bound for the Rényi entropy of the optimal non-diagonal solution. We will return to this point in Sec. 5.4.

Deviation from linear growth: Gaussian correction
For any chain with a finite length, entropy is upper bounded. The linear growth of entropy cannot last forever. To see the saturation of S A,n (t) as t becomes large, we need to go beyond the first order approximation of the previous section. In this section we analyze the second order correction to the first order linear growth. More explicitly, we will consider the change of the saddle point solution due to the additional twisted interaction term. Recall that this amounts to finding the minimum of the effective action expressed in Eq. (34): As we noted before in Eq. (37), so long as 1 βJ 1, the soft modes of the SYK model are reparameterizations of the untwisted saddle point solution. When the twisted coupling is also small: γ 1, we can ignore the induced change of G outside the manifold of reparameterization. In this approximation, the saddle point solution is determined by minimizing the effective action − log Z A,n [G f x (τ 1 , τ 2 )] over reparameterization f (τ ). Since the minimal action in the restricted space of G f x (τ 1 , τ 2 ) is always larger or equal to the actual minimal action in the unrestricted space of two-point functions, the entropy we obtain in this approximation always bounds the actual entropy from above. In other words, our results are still meaningful as an upper bound, even when the effect of nonreparameterization modes is not negligible.

Effective action of the reparameterization field
The form of the effective action S[f x (τ )] ≡ − log Z A,n G f x can be explicitly written down. For simplicity, we will consider a chain with an even number M ∈ 2Z of sites We choose open boundary conditions, and an entanglement cut in the middle, between sites x * = 0 and x * + 1 = 1. In this case, the system has reflection symmetry after the random average, so that the saddle point solution shall satisfy f x (τ ) = f −x+1 (τ ). In particular, f 0 (τ ) = f 1 (τ ). With this simplification one can write The S 0 term controls the dynamics of the reparametrization field in SYK chain model [22] without the twisted interaction;. α S ≈ 0.01 is the numerical coefficient of the Schwarzian term [21], and this coefficient also determines the specific heat: c v = N 8π 2 α S J . 7 Without the twist term, the SYK chain model has a saddle point solution f (τ ) = τ , which corresponds to the conformal solution G c . Using the explicit form of G f 0 in terms of the conformal saddle G c , we can further write This integral diverges due to the UV divergence of G c at τ 1 − τ 2 → 0. This divergence is artificial, since the actual saddle point two-point function should saturate to the UV value 1 2 for τ 1 − τ 2 J −1 . To take into account of this UV regularization, we introduce UV cut-off for C 1 and C 2 . On the imaginary time circle, this corresponds to defining C 1 and C 2 as interval [τ 1 , τ 2 ] and [τ 2 + 2 , β + τ 1 − 1 ], respectively, as shown in Fig. 4(b). The resulting integral can be evaluated explicitly, using the functional form of G c given in Eq. (36) 8 and the result is where η[f x * ] is the reparametrized cross ratio: As expected, the result is explicitly invariant under the global conformal group SL(2, R). Therefore we have computed the twisted interactions, restricted to the reparameterization modes f x (τ ). 7 This specific heat is for the TFD state, which is twice of the specific heat for a single chain. [22] 8 The evaluation essentially corresponds to the explicit integral This is simply the cross ratio (49) for four points on the imaginary time circle, for f 0 (τ ) = τ .
In general, it is still difficult to rigorously study the effective action of reparameterization fields. In the remainder of this section we will study the effective action and Rényi entropy using a Gaussian approximation. In the next section we will study the full non-linear effective action, but mostly for the simplified case of two-site problem.

Gaussian correction
In the spirit of our previous perturbative calculation as γ → 0, we now compute the second order correction to S A,n in γ. Defining f x (τ ) = τ + x (τ ), we must expand the action up to the quadratic order in (τ ), compute the first order correction (τ ) ∼ γ caused by the twisted interaction term in the action, and then evaluate the action to order γ 2 , accounting for the non-zero (τ ). The effective action for small (τ ) is given by The first line is the quadratic expansion of S 0 , which has been obtained in [22]. The second line corresponds to 1 n ∆S. η 0 = η[f (τ ) = τ ] is the cross ratio for the trivial reparameterization, which corresponds to the first order contribution to the entropy we obtained earlier in Eq. (39). The twist term contributes a linear in term, which directly sources the first order correction (τ ) ∼ γ. We have ignored the quadratic term in ∆S; it will modify S A,n at higher order in γ. Minimizing S[ ] is straightforward: where · · · denotes expectation values with respect to the quadratic action for (τ ) given above. In Appendix A.1, we explicitly perform this expectation value, and we find that The negative correction ∝ t 3/2 indicates that the entropy growth starts to become slower than linear around the characteristic time t * ∼ α S γJ . In the weak link limit γ 1 βJ discussed here, this time scale t * ∼ β 1 γβJ β is much longer than the thermal time β. We can further estimate the amount of entropy growth by the time t * : which is of order c v T , the specific heat's contribution to the thermal entropy. As we will see later in Sec. 5, the entropy actually saturates to a final value that is comparable with our estimation here.

Long time saturation: Geometric interpretation
In this section, we will evaluate the partition function Z A,n for a simple case, when the chain has only two coupled SYK sites (M = 2). The entanglement cut is between the two sites. For the two-site problem, the full non-linear saddle point problem can be solved using a geometric interpretation of the action. Although this special case does not directly determine the entropy growth in a longer chain, the results will help us to understand qualitative features of this system, especially the long time saturation of the (Rényi) entanglement entropy. 9 As we will discuss in Sec. 5.4, the two-site calculation can be generalized to longer chains, and by doing so, we provide an upper bound of the entropy growth and saturation in that case.

Two site effective action and the mapping to a geometric problem
As we discussed in the previous section, the two-site problem with an entanglement cut between the two sites has a reflection symmetry, so that the saddle point solution should be given by identical reparameterization fields on the two sites: The effective action is thus a functional of a single f (τ ), and the Rényi entropy has the following form: The first term is the Schwarzian action for the reparametrizations f on two sites (hence the factor of 2), and the second term arises from the twisted interaction between the two sites. The last term, coming from the log Z term, is the constant piece of the Schwarzian, which cancels the value of the first term when f (τ ) = τ . Therefore, our goal here is to find the minimal value of S A,n by varying all reparametrizations f ∈ Diff(S 1 ): The time dependence comes from the second term log η f , where η f is the cross ratio of the reparametrization of four time coordinates: (τ 1 , τ 2 , τ 2 + 2 , β + τ 1 − 1 ) (cf. (49)) and 1,2 are cut-offs of order J −1 . τ 1 and τ 2 will be analytic continued to it and β 2 − it towards the end of the calculation. In the limit 1,2 /β ∼ (βJ) − 1 1, η f is simplified to It is manifest that both terms in the two-site action (55) are SL(2, R) invariant. The Schwarzian action term has a geometric interpretation [41,53], which corresponds to the area enclosed by a curve in hyperbolic space with fixed length More explicitly, one can consider a hyperbolic disk with coordinates (ρ, θ) and metric: We specify a curve on the disk by (ρ(τ ), θ(τ )). For each reparametrization f (τ ), we let θ(τ ) = 2π β f (τ ) and ρ(τ ) determined by the constrain that induced metric g τ τ (τ ) = J 2 along the curve is fixed. Then one can show (see Appendix B for details) in the large L = βJ limit, the Schwarzian action: Here A is the area enclosed by the curve. The second term of the two site action (55) also has a simple geometric interpretation in the embedded picture. In the strong coupling limit L = βJ 1 we are interested, the curve is very close to the boundary and the cross ratio term can be written as: where X 1,2 is the point on the curve that corresponds to the twist operator time τ 1,2 , respectively. D is the distance function with respect to the metric (58). For more details see Appendix B. Therefore, the minimization problem for I(t) is, in fact, a simple geometric problem: See Fig. 5(a) for an illustration. The first term in I(t) prefers to maximize the area A enclosed by the curve, for fixed length L. In the untwisted problem (without the second term), the maximal area is obtained by a circle, corresponding to f (τ ) = τ (up to an SL(2, R) transformation). The second term adds an "attractive force" between two particular points on the curve. The locations of these two points are fixed by demanding an arc length Jτ 1 = Jτ and Jτ 2 = βJ 2 − Jτ , starting from a reference point. 10 In the L → ∞ limit, the distance between the two points is large. The attractive force ∝ log cosh D(X 1 , X 2 ) D(X 1 , X 2 ) is a confining force that grows linearly with distance. The intersite coupling γ = J 2 1 8πJ 2 plays the role of a "string tension" connecting the two points. The competition of these two terms determine the saddle point solution f (τ ).
Piece-wise arcs with fixed total length Figure 5: (a) Geometric illustration of the minimization problem we need to solve. The total length of the curve is fixed to be L. The "string" connects X 1 and X 2 tends to pull the two points closer, while the other term of the action prefers the curve to enclose a larger area. (b) For a fixed distance D(X 1 , X 2 ) between X 1 and X 2 , the joint of two arcs maximizes the area enclosed by the curve. Then the entropy is determined by the minimum of action as a function of D(X 1 , X 2 ).
The geometric interpretation greatly simplifies the problem. Since the second term only depends on the position of two points X 1 , X 2 on the curve, the minimization of action I(t) can be considered as a two-step process. Firstly, we maximize the area A for a given position of X 1 , X 2 . Secondly, we change the distance between X 1 , X 2 to minimize the action. In the first step, when X 1 , X 2 are fixed, we are varying two curves, one with the fixed length of τ 2 − τ 1 = β 2 − 2τ and the other with fixed length β 2 + 2τ . Obviously, maximizing the area requires each of them to be an arc, as shown in Fig. 5(b). For a given distance D(X 1 , X 2 ) between the two points, the shape of each arc is completely determined, so that the minimized action I(t) = min D I(D) is now determined by one parameter D = D(X 1 , X 2 ). The second step is thus a one-parameter minimization problem which can be easily studied analytically and numerically.
In principle, the geometric minimization problem can be worked out for arbitrary L and γ. However, the geometric interpretation that relates the Eq. (55) to Eq. (61) is only valid if the curve in the hyperbolic disk is close to the conformal boundary. Beyond this limit, the Schwarzian action and the area term are not equivalent, and there are subleading corrections which must be accounted for. A sufficient condition for the curve to be close to the boundary is γ

Explicit form of the action
We now follow the procedure outlined above and write down the action I(D) for imaginary time explicitly as a function of D. Denote the radius of the two arcs (C 1 and C 2 in Fig. 5(b))as ρ 1,2 , and the 'opening angles' as α 1 and α 2 . These four parameters are determined by the arc length constraint and the distance D = D(X 1 , X 2 ). By construction, the arc length of C 1 and C 2 are fixed, which implies with x = 1 2 − 2τ β . In addition, the distance between the end points X 1 , X 2 satisfies cosh D = 1 + 2 sinh 2 ρ 1 sin 2 α 1 2 = 1 + 2 sinh 2 ρ 2 sin 2 α 2 2 (63) These four equations determine all four parameters ρ 1,2 , α 1,2 as functions of D.
The area enclosed by the joint of the two arcs can be divided into four parts, including two circular wedges and two triangles. The area of the wedges are easy to compute: We can then use the Gauss-Bonnet theorem to compute the area of triangles. As all extrinsic curvature of the triangular regions is located at the corners, we simply sum the inner angles. Two of the inner angles are known. The nontrivial inner angle is ∠CXO := φ in Fig. 5(b) which can be computed by considering the angle between radial line OX/CX and geodesic X 1 X 2 : Therefore, the total area of the two triangles is A = α 1 + α 2 − 2φ − 2π. Evaluating each part with hyperbolic geometry, we obtain

Analytic continuation
The imaginary time minimization problem can be solved numerically, which leads to I(τ ) as a function of imaginary time τ . However, only knowing I(τ ) numerically makes the analytic continuation difficult. Instead, we address the real time problem directly, and analytically continue the equations (62), (63) and (66). The analytic continuation is defined by 2τ β → i2t β , or x = 1 2 − i 2t β in Eqs. (62) and (63). This leads to complex α 1,2 and ρ 1,2 . In general, this would lead to a complex action (and thus complex entropy), which would be unphysical. However, we notice that α 1 sinh ρ 1 = L( 1 2 − i 2t β ) and α 2 sinh ρ 2 = L( 1 2 + i 2t β ) are complex conjugates. Both Eqs. (62) and (63) have a Z 2 symmetry: Thus we can look for saddle point solution that is invariant under this Z 2 symmetry, which satisfies α 2 = α * 1 , ρ 2 = ρ * 1 , D ∈ R. In this case Eqs. (62) and (63) require and the area term becomes The complex angle α 1 is determined by D (which remains real) in Eq. (68). The resulting action I(D) is the sum of (69) and γ 2 log cosh D, and both terms are manifestly real. We can now minimize I(D) with respect to D directly for real time t.

Numerical and analytic results
With the analytically continuated effective action defined above, we can minimize the action with respect to D numerically, and obtain the Rényi entropy S A,n (t). The numerical result is shown in Fig. 6. We see that the entropy grows quadratically at very early time, and subsequently crosses over to a linear growth. The linear growth rate agrees with the perturbative calculation for small γ. At longer times, the growth slows down and eventually S A,n saturates to a finite value as t → ∞. Therefore the result appears to be qualitatively similar to the expectation for a thermalizing system. However, as we will show, this system has not actually thermalized.
To gain better understanding of the long time saturation behavior, we have observed numerically that at long time the saddle point value of D corresponds to Re α 1 → 2π. Therefore we can expand around this point and study the long time behavior. In this limit, Eq. (68) requires α 1 to take the following form (for details, see Appendix C.1): with δ ∈ R. In the limit δ 1, L 1, γL 1, the saddle point value is δ * = − 16α S πβ γLt , which corresponds to the entropy where the saturation value is approximately given by and the coefficient for the saturation term is given by Details of this calculation are presented in Appendix C.2. It should be noticed that the first term is γ-independent, which clearly shows that the result is non-perturbative in γ (as we have first taken t → ∞), although we still consider a small γ L −1 . The entropy at t = 0 can be computed perturbatively, leading to In the limit where t → ∞, and then γ → 0, the entropy growth is thus given by with c v the specific heat of the doubled SYK model we are studying. The entropy growth is independent from the UV cut-off. The relation to specific heat is interesting, since c v T = S th (T ) − S th (0) is the thermal entropy minus the zero temperature entropy. Therefore, even after a long time, the TFD state has not thermalized, and the entropy is smaller than the thermal entropy by an amount that is determined by the zero temperature extremal entropy.
The result of our geometric minimization procedure can also be verified by applying the Gaussian approximation method in Sec. 4 to the two-site problem. We find that the early time linear growth will be corrected by a quadratic term around time scale t * ∼ α S γL in the weak link limit γL 1: In Appendix A.2, we present the detail of the Gaussian approximation calculation, and compare the result with the geometric formula in Appendix A.3. This early time result suggests that the t 2 term becomes significant when the entropy approaches ∝ N α S βJ , which is consistent with the late time saturation value we get above.
Thus, we conclude that the Rényi entanglement entropy in the time-evolved TFD state saturates to a sub-thermal value, and in the low temperature and weak inter-site coupling limit, the entropy is proportional to the "near-extremal entropy" S th A,n (T ) − S th A,n (T = 0). This result indicates that the system does not completely thermalize, but instead reaches a "pre-thermalized" state. Roughly speaking, pre-thermalization occurs because the degrees of freedom in this system are separated into fast modes (the reparameterization quasi-Goldstone modes) and slow modes (which are responsible for the zero temperature entropy). The latter are almost localized, and so it is natural that they require a long time to thermalize. The lack of rapid thermalization for these slow modes seems consistent with the fact that the non-reparameterization modes in the coupled SYK chain have exponentially decaying correlation functions, with correlation length at the order of lattice constant [22]. The thermalization time may grow with some function of N but must diverge in the infinite N limit. We expect that at finite N , the thermalization time for the SYK chain is finite, as it seems unlikely that the slow modes of the SYK chain have many-body localized.
Our result is based on several assumptions. We have taken a replica diagonal ansatz of the two-point function, and then further restricted ourselves to the reparameterization of the unperturbed saddle point. The entropy we obtain is determined by minimization of the effective action in this restricted space. As we have previously noted, the actual entropy obtained by unrestricted minimization will be smaller or equal to what we obtained. Thus, the pre-thermalization feature we have uncovered is independent of the validity of our approximations, and remains true even for the "actual" saddle point solution G αβ x (τ 1 , τ 2 ). We have computed the Rényi entanglement entropy of the SYK chain in a TFD state, whereas most previous studies of similar systems have studied the von Neumann entropy. As such, we now discuss the analytic continuation of the Rényi entropy to n → 1. We can compare the long time entropy (75) with the Rényi entropy of the thermal ensemble for the same system A, which is The comparison of S th A,n and S A,n (t → +∞) is illustrated in Fig. 7. Since the entropy should always be smaller or equal to the thermal value, we conclude that our approximation must fail near n → 1 where S A,n > S th A,n . The actual entropy is upper bounded by both our result S A,n and by the thermal entropy S th A,n , so that it is below both curves in Fig. 7. The crossing of the two curves occur at n 1 + cvT S 0 at low temperature, which serves as an estimation of where our approximations fail.
For chains with more than two sites, it is difficult to solve the non-linear equation determining the saddle point of f x (τ ). However, we can use the same argument above and obtain an upper limit of the Rényi entropy. If we consider a uniform ansatz f x (τ ) = f (τ ), the effective action reduces to the same form as the two-site case, with the parameter 2α S rescaled to M α S when there are M sites. Therefore the minimization problem can be mapped to the same geometric problem, with the effective coupling parameters α S , γ replaced by M α S /2, γ. In the large M limit, so long as we take the uniform ansatz above, we are in fact closer to the perturbative limit since γ/M α S → 0 as M → ∞. The saddle point gives a long time saturation entropy that is simply M/2 times the two site value: Therefore the entropy grows from area law to volume law, but the final entropy density is still lower than the thermal value. Since this is an upper bound of the actual entropy, we conclude that the chain with generic size M also reaches a pre-thermalized state at long time. Furthermore, as we noted in Sec. 4.2, there is likely a significant correction to the The black line denotes the early time weak link result, which contains a linear growth with the same rate, and receives a correction at an M -independent value of order α S βJ . After that, we have no definitive predictions using the tools we have developed in this paper. It is unclear whether the entropy continue to grow linearly (but presumably with a slower growth rate) until it reaches the late time bound we have derived. The growth towards this bound may be sublinear, and it is even possible that the entropy saturates at a value sublinear in M as t → ∞. growth rate of entanglement at the time scale t ∼ t * = α S βJ , independent from the length of the chain. This deviation indicates that the upper bound we found for the long chain is not tight. There are two possibilities about the fate of entropy growth in a long chain. The final entropy in large M may either be proportional to M or grow slower than M . In the latter case, we would consider (at least part of the system) to be many-body localized. An illustration is given in Fig. 8. Physically we expect that a volume law entropy is more likely, although the growth rate may be quite slow.

Comparison to Holography
In this section we briefly compare our results to intuition from gauge-gravity duality. Many features of the SYK model are known to be shared with models of (nearly) AdS 2 gravity: in particular, a common effective action [21,40,41], and similar thermoelectric transport properties [24]. Hence it is natural to ask whether the spread of entanglement shares any qualitative features.
Unlike in the SYK model, in holography it is easier to study von Neumann entropy rather than Rényi entropy. There is a well-known formula for v E in holography in the TFD quench [4][5][6]54]. Applying this formula to geometries with nearly AdS 2 × R d infrared geometries (d ≥ 1 is required for the interpretation that entanglement is flowing across a spatial surface), we find that v E ∝ T. (79) Details of this calculation are found in Appendix D. We emphasize that this definition assumes that v E has been defined as in (1). This result agrees with our intuition based on dimensional analysis, presented in the introduction, and our early time result from the SYK chain. However, there are important differences between the late time behavior of the holographic von Neumann entropy S E (t) and that of the SYK Rényi entropy. For a region of large but finite width R (analogous to the finite length chains studied above), the holographic saturation entanglement is given by and the saturation time will be approximately This behavior is quite different from the SYK chain. Strictly speaking, the calculation was performed in a different limit. In the SYK chain model, we required that n − 1 ≥ 1 was an integer, and the holographic calculation of the previous paragraph was performed exactly at n = 1. While it is difficult to reliably perform a calculation in both models for the same value of n, we can make some preliminary comments about the behavior of the holographic calculation for n > 1. If we define then [55] has shown that S A,n ∝ Area(brane of tension ∝ n − 1), where we calculate the length of a cosmic brane of tension ∝ n − 1, stretching between the boundaries of (two-sided) AdS subject to suitable boundary conditions. Because this brane has finite tension, we must account for its gravitational backreaction. In AdS 2 , the lack of gravitational dynamics means that this backreaction is expected to be very severe. A preliminary hint for the outcome comes from the following argument: in nearly AdS 2 geometries, the gravitational dynamics are associated with the movement of the boundary of the near-horizon region [41,42,44]. Stretching a brane of finite tension from one side of AdS 2 to the other will thus warp the geometry to bring the two sides closer together. This is likely analogous to the geometric calculation that we performed in the previous section, although the physical interpretation of the tensionful brane is somewhat different.
In the geometric calculation we did, we observed that for any finite string tension γ, the backreaction of a tensionful brane stretching between two points on the boundary is so strong that as real time t → ∞, the length of the brane remains finite: Hence, we expect a qualitative change in the Rényi entanglement entropy vs. the von Neumann entanglement entropy, and that the Rényi entropy for n > 1 may saturate at a parametrically smaller value than the von Neumann entropy at n = 1.
There are two subtleties to note, which make a precise comparison between holography and the SYK chain difficult. Firstly, S A,n vanishes when S A,n ∝ n/(n − 1), as we found in our replica diagonal SYK calculation. To the extent that SYK can recover the holographic results described above as n → 1, it is crucial that one finds a replica non-diagonal saddle point of the action (29). Secondly, the calculation of holographic Rényi entropy requires calculating the backreaction of tensionful branes in AdS 2 ×R d , not in purely AdS 2 . Perhaps additional spatial dimensions modify the intuition about gravitational dynamics in AdS 2 that we presented in the previous paragraph.

Discussion
We have initiated a study of the spatial spread of entanglement in the SYK chain model. Although we were unable to compute the von Neumann entanglement directly, we were able to upper bound the spread of Rényi entanglement entropy. We found that this Rényi entropy did not saturate at the expected thermal value, but instead at a parametrically smaller value: This implies that only the light degrees of freedom in the SYK chain (the reparameterization modes) were able to quickly thermalize. The bulk of the degrees of freedom, which are responsible for the finite zero temperature entropy, appear to be 'localized' on every site, and unable to quickly thermalize. Our result casts doubt upon whether the conventional interpretation of v E as a physical velocity scale at which entanglement propagates is sensible, at least for Rényi entropy. Defined as in (1), we have shown that v E ∝ T for the SYK chain. v E ∝ T can be justified on dimensional grounds, using the local criticality of the SYK chain: as length does not scale under renormalization group flows, we expect that v has the dimensions of energy. However, for Rényi entropies, the small saturation value of ∆S A,n suggests that the speed at which entanglement can spread spatially is actually better thought of as This may seem surprising, as v E v B , in contrast with the conjecture of [10]. Of course, there are a few caveats to the direct interpretation of v E as the correct definition of entanglement velocity. (i ) We only know that the early time entropy growth rate dS A,n dt ∝ T in the weakly coupled limit. In the low temperature limit γ > 1 βJ , our result is only an upper bound of the true entropy, and so it is possible that the entropy growth rate is much slower, leading to a smaller v E . (ii ) It is also possible that indeed v E > v B , which does not directly violate the inequality v E v B since the latter is based on the assumption of thermalization [10,11].
Thus, it is not straightforward in this model what the "correct" entanglement velocity is, or even how it scales with temperature. We contrast this with two dimensional conformal field theories,where the n = 1 and n > 1 entropies behave qualitatively similarly in such a quench [4]. In fact, there is already a well-known holographic model where v B is not the fastest "infrared" velocity scale. The theory holographically dual to the planar extremal AdS-Reissner-Nordström black hole has a speed of sound v sound ∝ T 0 [56], even while v B ∝ √ T [48]. Sound waves are classical hydrodynamic excitations that only exist after the onset of local thermalization (at the very least in a sector containing the energy-momentum tensor). We do expect that both v sound and v E are smaller than the Lieb-Robinson velocity [7]. While there have been some proposals for bounds and relations between the velocities of scrambling, entanglement, and sound [13], it is clear that they must be made more precise.
Two complementary recent studies of quantum quenches in a single site SYK model have recently appeared [43,44], and both show evidence for rapid thermalization. In particular, in an extension of the SYK model involving a χ q interaction instead of χ 4 interaction in (6), [43] was able to solve for the non-equilibrium χχ two-point function exactly at q = ∞, in a simple quantum quench involving a change in the Hamiltonian at time t = 0. In this limit it was observed that (i ) this two-point function was instantaneously thermal after the quench, 11 and (ii ) χχ appears to come entirely from the light reparameterization modes in the SYK model. Generalizing our analysis of the two-site SYK chain to a model with finite q reveals that (i ) the saturation entropy ∆S A,n ∝ q −2 while (ii ) dS A,n /dt is approximately q-independent. Hence, the entropy saturation time is ∝ q −2 , which means the light degrees of freedom does thermalize instantaneously in the large q limit. We expect that correlation functions which are dominated by these light degrees of freedom do not detect the slow thermalization of the 'heavy' modes that we have found by studying S A,n .
It will also be interesting to compare our results with non-perturbative approaches of computing correlation functions in the Schwarzian action [58,59]. As we discussed in Eq. (24), the Renyi entropy calculation can be considered as a twist-operator thermal two-point function. In the perturbative limit, the Renyi entropy we obtained in Eq. (39) corresponds to a two-point function from which we see that X A,n behaves like a dimension nN γ In summary, it appears that the SYK chain is both maximally chaotic at short times and takes a long time to completely thermalize, at least in the special state that we have prepared. These statements are not inconsistent: the rapid scrambling of the SYK model comes entirely from the Schwarzian action for the reparameterization modes. It would be interesting if there are other notable consequences of the remaining, slowly thermalizing, heavy modes. We have also proposed that a similar prethermalization phenomenon may arise in the maximally chaotic holographic models with AdS 2 infrared geometries. It would also be worth studying this more closely in the future.
We have left open the possibility that the von Neumann entanglement entropy saturates at the thermal value in our quench setup, even as we have shown that all higher Rényi entropies saturate at a parametrically smaller value. It would be interesting to resolve this question in future work. Even were this to occur, the reduced density matrix of one half of the chain should not appear thermal. Interestingly, it is known that in holographic theories there are examples in which two density matrices have identical von Neumann entropy but distinct Renyi entropies to the leading order of N in the large N limit. For two disjoint regions A and B with their minimal surface also disjointed, the density matrix ρ AB and ρ A ⊗ ρ B have identical von Neumann entropy to the leading order of N (thus vanishing mutual information I(A : B)), but different Renyi entropies [55]. We do not know whether such a qualitative discrepancy between n = 1 and n > 1 Rényi entropies signals something more profound about the dynamics of the model.
In conformal field theories in two, three and four spacetime dimensions, v E , v B , v sound and the speed of light c differ by, at most, about a factor of 2. The SYK chain is one example of a class of theories where these quantities can be parametrically different. It will serve as an excellent model for sharpening and making precise a deep yet mysterious relationship between chaos, thermalization, entanglement and hydrodynamics.

A.1 Infinite chain
The propagator of the reparametrization field x,t is determined by the quadratic action: where we replace p 2 by its lattice regularized form 2(1 − cos p) as we are going to integrate over the whole Brillouin zone. For simplicity, we will neglect the large N factor in the effective action in this appendix, since it is an overall prefactor and only rescales the final answer. The two-point function of that arises from (88) is with a > 0 and 0 θ < 2π. The summation over integers n can be done through the Matsubara trick: where C 1 is an integration contour that winds integer points 2, 3, . . . clockwise, see Fig. 9 for an illustration. The integrand is analytic in the right half plane Re x > 0 except the integer points x = 1, 2, . . .. Therefore we can deform the contour to C 2 : −i∞+ → i∞+ as shown in Fig. 9 with the cost of a double pole at x = 1: The residue can be computed explicitly: while the first two integrals need further treatment. We notice the integrands diverge near x = 0 and exponentially decay when going to large imaginary x in the contour C 2 . One can show that at large real time t, corresponding to large Im(θ), it is safe to approximate the integrands by their form near x = 0: Thus, the propagator has an approximate form: where θ = 2πτ β and a = 2βD π is small. We can further simplify the second term to: This step amounts to replacing x(x + a) by x when a is small. We can now evaluate the leading growing term in the Gaussian correction for the effective action at large real time τ = β 2 + i2t, or θ = π + i4π t β : and α S using (following [22]): Therefore, we can express the final minimum of the action in following form: We factor the 2πγt β out for easy comparison with the linear t growth term. The formula indicates the linear t growth will receive a correction at time scale: as claimed in the main text.

A.2 Two-sites
The calculations for two sites are much simpler. Due to the symmetry between the two sites, we can neglect the p 2 terms in (88), and compute the two-point function of as The extra 1 2 confirms that the global reparametrization is further suppressed by the system size. Now the evaluation of the summation is much simpler because there is no longer a branch cut of the summand. We can complete the contour nicely as shown in Fig. 10. In short, we can compute the infinite sum I(θ) := n 2 2 cos nθ n 2 (n 2 − 1) = n =0,±1 e −inθ n 2 (n 2 − 1) Figure 10: Contour deformation: the integrand of I(θ) is analytic in the whole plane except the integer points x ∈ Z. Therefore we can deform the contour C 1 ∪C 2 that encloses |x| 2 integer points to C which only encloses three points 0, ±1. by a contour deformation to a simple contour C that only includes three poles (two double poles at x = ±1 and a triple pole at x = 0): Therefore we can easily compute the integral by the residue theorem: The correlator has the form [21] τ 0 = − βJ 2α S β 2 (2π) 4 3θ 2 − 6πθ + 6(π − θ) sin θ − 15 cos θ + 2.π 2 − 6 6 (105) Now we again evaluate the leading growing term in the Gaussian correction by setting Again, we see that the linear growth receives a correction at time scale α S γJ .
A.3 Comparison of the two-site result with the geometric minimization at weak link limit We can do one further self-consistency check for the two-site problem in the weak link limit. We now use the geometric interpretation to reproduce the above two-site result.
The strategy is to start with the circle solution, which is a saddle point for the area term, and then expand around the circle solution and find the minimal value when we include the twisted interaction.
In the geometric picture, we can treat α 1,2 as function of y = cosh D, and expand the area function around the saddle point: The saddle point represents a circle geometrically, which is expected to be the shape with maximal area under constrain. So we must expand the area term to quadratic order in the deviation y − y * : The constant term has simple expression A * := A(y * ) = L + 2π 2 L − 2π and the linear term vanishes since we are expanding around a saddle point for the area. The most important piece is the quadratic term. We have define a function Q(x) as follows: and it is clear that Q(x) determines the cost of fluctuations near saddle point y * . Note that Q(x) has a minimum at x = 1 2 . Now the twisted term γ 2 log y is a logarithmic function of y, which we may also expand to quadratic order: log y = log y * + 1 y * (y − y * ) − 1 2 1 (y * ) 2 (y − y * ) 2 . (110) Notice that y * is of order L 2 ; therefore the quadratic term from γ 2 log y is of order γL −4 , while the area term has L −5 . So expanding around y is useful in the limit γ 1 L , or J 2 1 J 2 1 βJ . In this limit, we get a correction for the saddle point free energy: Notice y * = 1+2 L 2π 2 sin 2 πx and we focus on x away from 0 and 1, so we can approximate y * ∼ 2 L 2π 2 sin 2 πx, therefore: where we have taken x = 1 2 − i 2t β and large real time t β to simplify the result. This expression precisely agrees with (106).

B Derivation of the geometric interpretation
In this appendix we provide the derivation of Eq. (61).

B.1 The Schwarzian action term
The relation between the Schwarzian action and the area enclosed by a closed curve in hyperbolic space has been discussed in [41,53]. To make our discussion self-contained, we include a derivation here. We embed our curve in a global Euclidean AdS 2 Poincare disk, so that our mapping is a little different from that in [41].

C Some details of the geometric minimization
Here we provide some details for the geometric optimization problem at large real time t → ∞. In particular we derive the asymptotic form of the two terms in the action.

C.1 Solving the constraint
As t → ∞, our numerics suggests that the angle α 1 → 2π. We can use this knowledge to derive a constraint between the real and imaginary parts of α 1 . The starting point is the reality condition This can be rewritten as an equation relating the real and imaginary part of α 1 = 2π− +iδ: After Taylor expanding the tan / tanh functions for small and δ we can solve the equation in leading order: C.2 Minimization of I(D) at t → ∞ limit In the limit t → ∞, we can Taylor expand A(D) and keep the leading order terms: Using the constraint = δ 2 2π − δβ 4t for α 1 = 2π − + iδ, we have: Since δ is small, we only need to keep 16π δt term to minimize the action. After δ is determined variationally, the subleading terms will determine the subleading corrections to the final entropy at late times. We next analyze the log term: Now all the terms are explicit function of δ and we can proceed to find the minimum at leading order: Inserting this value back into the action, we obtain with c given in Eq. (73). We have checked that our numerical results agree well with the analytic result in both the long-time saturation value and the 1 t 2 term, as is shown in Fig.  11.

D Holographic von Neumann Entanglement Velocity
In this appendix we calculate v E for a holographic model with an AdS 2 × R d geometry in the IR. The formula for v E in a generic planar geometry with metric ds 2 = L 2 r 2 g(r) f (r) dr 2 − f (r)g(r)dt 2 + dx 2 is [4][5][6]54] where r + is the location of the (outer) event horizon of (132), and r * is the solution to 2d r * = (f g) (r * ) (f g)(r * ) .
which should occur behind the horizon. Note that v E is best understood as arising from a calculation of entanglement in a TFD state [4], analogous to the case we studied in the main text. This calculation may also be done for spatial quenches [5,6], but in this case it is important that the initial state of the quench has vanishing entropy density; otherwise the formula above is generally modified. If the matter which sources (132) gives rise to an extremal black hole, then at zero temperature r + → r e < ∞. (Note that r −d e ∝ s > 0 [56].) At a very small but nonzero temperature T , we expect that near the horizon, and that r + > r e for this new geometry if the specific heat is positive. The coefficients c 1 , c 2 and a are not independent of T but we will only need the fact that, to leading order in T , they are constants. (134) implies that where d r e = ab c 2 .
Using (133) we conclude that to leading order in T This confirms the scaling that we claimed in the main text.