Spectral Fluctuations in the Sachdev-Ye-Kitaev Model

We present a detailed quantitative analysis of spectral correlations in the Sachdev-Ye-Kitaev (SYK) model. We find that the deviations from universal Random Matrix Theory (RMT) behavior are due to a small number of long-wavelength fluctuations from one realization of the ensemble to the next one. These modes can be parameterized effectively in terms of Q-Hermite orthogonal polynomials, the main contribution being the scale fluctuations for which we give a simple estimate. Our numerical results for $N=32$ show that only the lowest eight polynomials are needed to eliminate the nonuniversal part of the spectral fluctuations. The covariance matrix of the coefficients of this expansion is obtained analytically from low-order double-trace moments. We evaluate the covariance matrix of the first six moments and find that it agrees with the numerics. We also analyze the spectral correlation using a nonlinear $\sigma$-model, which is derived through a Fierz transformation, and evaluate the one and two-point spectral correlator to two-loop order. The wide correlator is given by the sum of the universal RMT result and corrections whose lowest-order term corresponds to scale fluctuations. However, the loop expansion of the $\sigma$-model results in an ill-behaved expansion of the resolvent, and it gives universal RMT fluctuations not only for $q=4$ but also for the $q=2$ SYK model while the correct result in this case should have been Poisson statistics. We analyze the number variance and spectral form factor for $N=32$ and $q=4$ numerically. We show that the quadratic deviation of the number variance for large energies appears as a peak for small times in the spectral form factor. After eliminating the long-wavelength fluctuations, we find quantitative agreement with RMT up to an exponentially large number of level spacings or exponentially short times, respectively.


Introduction
Starting with the seminal talk by Kitaev [1], the Sachdev-Ye-Kitaev (SYK) model [2] has attracted a great deal of attention in recent years, in particular as a model for two-dimensional gravity. The low-energy limit of the SYK model is given by the Schwarzian action which can also be obtained from Jackiw-Teitelboim gravity [3,4]. In this limit, the SYK model is dual to a black hole [5], and because of this an initial state has to thermalize which is only possible if its dynamics are chaotic. In fact the SYK model turned out to be maximally chaotic [1,3,6,7], which was shown by the calculation of Out-of-Time-Order Correlators (OTOC) [1,3,8].
A different measure of chaos in quantum systems is the extent to which correlations of eigenvalues are given by random matrix theory. This goes back to the Bohigas-Giannoni-Schmidt conjecture [9] stating that if the corresponding classical system is chaotic, the eigenvalue correlations of the quantum system are given by random matrix theory. In the SYK model this can be investigated by the exact diagonalization of the SYK Hamiltonian. It was found that level statistics is given by the random matrix theory with the same anti-unitary symmetries as N mod 8 dependent anti-unitary symmetry of the SYK model [10][11][12]. This has been understood analytically in terms of a two-replica nonperturbative saddle point of the so called ΣG formulation of the SYK model [13]. However, there are also deviations from random matrix theory at many level spacings or small times which follow from a nonlinear σ-model formulation [14,15] or from a moment calculation of the SYK model [16,17].
The so-called complex SYK model [2] was first introduced in nuclear physics [18][19][20] to reflect the four-body nature of the nuclear interaction (known as a two-body interaction in the nuclear physics literature) as well as the exponential increase of the level density and the random matrix behavior of nuclear level correlations. The great advance that was made in [2] is to formulate the SYK model as a path integral which isolates N as a prefactor of the action, making it possible to evaluate the Green's functions of the model by mean field theory. This analysis revealed one of the most striking properties of the SYK model, namely that its ground state entropy is nonzero and extensive, making it a model for non-Fermi liquids and black hole physics alike [5,21]. This approach also showed that the level density of the SYK model increases as exp √ E − E 0 [12,[22][23][24] exactly as the phenomenologically successful Bethe formula [25] for the nuclear level density, which was actually not realized in the early nuclear physics literature. The randomness of the SYK model is not expected to be important, and it has been shown for several non-random SYK-like models, specifically tensor models, have a similar melonic mean field behavior [26][27][28].
The SYK model has become a paradigm of quantum many-body physics. It has been used to understand thermalization [29,30], eigenstate thermalization [31] and decay of the thermofield double state [32]. The chaotic-integrable phase transition has been studied in a mass-deformed SYK [33,34]. Coupled SYK models have been used to get a deeper understanding of wormholes [35][36][37][38][39] and black hole microstates [40]. Lattices of coupled SYK models describe phase transitions involving non-Fermi liquids [41][42][43]. The complex SYK model has a conserved charge (the total number of fermions) and its effects were recently analyzed in the ΣG formulation [44]. It has also been used to construct a model for quantum batteries [45]. The duality been the SYK model and Jackiw-Teitelboim gravity has been further explored in random matrix theories with the spectral density of the SYK model at low energies [46][47][48][49][50].
There are different observables to study level correlations. The best known one is the spacing distribution of neighboring levels or the ratio of the maximum and the minimum of two consecutive spacings [51]. The disadvantage of these measures is that they include both two-point and higher-point correlations. In this paper we will focus on the number variance, which is the variance of the number of eigenvalues in a interval that contains n eigenvalues on average, and the spectral form factor which is the Fourier transform of the pair correlation function. Note that the number variance is an integral transform of both the pair correlation function (by definition) and the spectral form factor.
The average spectral density is not universal, and to analyze the universality of spectral correlators, one has to eliminate this non-universal part which appears in two different places. First, one has to subtract the disconnected part of the two-point correlator, and second, one has to unfold the spectrum by a smooth transformation resulting in a spectral density that is uniform. Note that the number variance is already defined in terms of level numbers and no further unfolding is needed (although in practice it is often convenient to do so). The number variance and the spectral form factor are complementary observables, each with their own advantages and disadvantages.
The goal of this paper is to determine quantitatively the agreement of level correlations in the SYK model with random matrix theory. To do this we distinguish between level correlations of one specific realization of the SYK model and level correlations due to fluctuations from one realization to the next. The latter are expected to be large. Since the SYK model is determined by only N q independent random variables, the relative error in a observable is of order 1/ N q 1/2 . Such fluctuations in the average level density result in a contribution to the number variance of ∼ n 2 / N q which for q = 4 becomes important when n ∼ N 2 , which is not large in comparison to the total number of levels of 2 N/2 /2. This effect gives a constant contribution to the pair correlation function, and a function proportional to δ(τ ) to the spectral form factor, which becomes the dominant contribution for τ < N q . It was already realized many years ago that these scale fluctuations can be eliminated by rescaling the eigenvalues according to the width of the spectral density for each configuration [52,53]. Recently, it was shown that the main long-range spectral fluctuations in the SYK model are of this nature [15][16][17]. However, these are not the only long-range fluctuations: they are just the first term in a "multipole" expansion of the smoothened spectral density for each realization. In this paper we systematically study such long range fluctuations. Since the spectral density is close to the weight function of the Q-Hermite polynomials [12,23,54], it is natural to expand the deviations in terms of Q-Hermite polynomials. It turns out that we only need a small number of polynomials, suggesting a separation of scales between long-wavelength fluctuations of the spectral density and the short-wavelength fluctuations of the universal RMT spectral correlations. The long-wavelength fluctuations are determined by low-order moments, and we present an analytical calculation of the covariance matrix of the first six moments.
We also analyze the deviations from random matrix theory in terms of the replica limit of a spectral determinant [14,15,55,56]. We evaluate the wide correlator to two-loop order. The random matrix contribution to this correlator is given by the massless part of the propagator, whereas the deviations are due to massive modes. For q = 4 these results are in agreement with [15] and the numerical results of [11] (for values of N in the universality class of the Gaussian Unitary Ensemble (GUE)). However, the expressions for q = 2 are qualitatively the same albeit with a Thouless energy that scales as N rather than N 2 for q = 4. It is clear though, that for q = 2, with energies given by sums of single-particle energies, spectral correlations are given by Poisson statistics. Presently, the mechanism that nullifies the contribution from the zero modes is not clear. Both the ΣG formulation [13] and the spectral determinant [14,15] of the SYK model make use of the replica trick. Although the replica trick may result in an incorrect answer [57], we expect that at the mean field level only replica diagonal solutions contribute to one-point functions [58] while solutions that couple the two replicas, but are otherwise diagonal, appear in the calculation of the two-point function [13-15, 37, 59, 60].
Our strategy to eliminate the collective fluctuations of the spectral density is discussed in section 3 after introducing the SYK model in section 2. In section 4 we give a simple argument to determine the leading correction to the number variance. The σ-model formulation of the SYK model is discussed in section 5. We calculate the one-point function and the two-point function to two-loop order. We also obtain a very efficient expansion for the the expansion of the resolvent in terms of powers of the resolvent of a semicircle rescaled to the actual width of the spectrum of the SYK model. This expansion is obtained by a resummation of the expansion in semicicular resolvents obtained in [15]. The covariance matrix is obtained in section 6, where we give explicit results for the first six double-trace moments. Numerical results for the number variance and spectral form factor are presented in section 7. In this section we also discuss the effects of ensemble averaging and the fluctuations relative to the ensemble average, the latter contributes to the number variance and the spectral form factor. The structure of high-order double-trace moments and the convergence to the random matrix result is discussed in section 8. Concluding remarks are made in section 9. In appendix A we derive the σ-model for the spectral determinant from a Fierz transformation. Several combinatorial formulas are given in appendix B. The replica limit of the one-point and two-point functions of the GUE are calculated in appendix C. Examples for the calculation of double-trace moments are worked out in appendix D and explicit results for low-order moments are given in appendix E.

The Sachdev-Ye-Kitaev (SYK) model
The SYK model is a model of N interacting Majorana fermions with a q-body Hamiltonian given by with α being an index set of q integer elements: and hence α can have N q configurations. Furthermore, and the J α are independently Gaussian distributed random variables with variance v 2 given by This choice results in a many-body variance (the following bracket · · · denotes ensemble average over all J α ) and a ground state energy that scales linearly with N , see equation (3.5). In this paper we only consider even values of q, especially q = 4. The Majorana fermions are represented as Dirac γ matrices which is an effective way to obtain a Hamiltonian that can be diagonalized numerically.

Spectral density
The average spectral density of the SYK model is well approximated [23,54] by the weight function of the Q-Hermite polynomials (in units where the many-body variance σ 2 is one) [61]: where In physical units the energy E is related to dimensionless energy x by 4) and the ground state energy is given by where σ 2 was given in (2.5). Note that E 0 is extensive in N . This results in (3.6) In this paper we will only use the dimensionless energy x.
The average spectral density of the SYK model can be expanded in terms of the orthogonal polynomials corresponding to this weight function, The average spectral density is even in x so that all odd coefficients vanish after averaging.
Since the second and fourth moments of the SYK model coincide with those of the Q-Hermite spectral density we also have Both ρ SYK and and ρ QH are normalized to unity. The Q-Hermite polynomials satisfy the recursion relation [61] H with The orthogonality relations are given by where n η ! is the Q-factorial defined as (1 + η s ). (3.12) The expansion (3.7) is a generalization of the Gram-Charlier expansion -for η = 1 it becomes the Gram-Charlier expansion. It is only positive-definite when the expansion coefficients are sufficiently small. For the SYK model with N = 32 and q = 4 we find that | c k | < 0.01 and decreasing for larger values of k. The expansion converges very well in this case. The expansion coefficients can be calculated in two ways. First, from the scalar product of the numerically calculated average spectral density and the Q-Hermite orthogonal polynomials, and second, from minimizing the L 2 norm of the difference between the numerical result and the expansion up to a given order. The two should give the same results. However, an expansion in orthogonal polynomials does not converge well near the end points of the spectrum, and a much better fit is obtained by excluding a small fraction of the spectrum in this region. In that case, the coefficients cannot be obtained by taking scalar products, but they still can be obtained by minimization. We illustrate this in figure 1 for the average spectral density of an ensemble of 400 SYK Hamiltonians for N = 32 and q = 4. What is plotted is the difference of the average spectral density and the eighth order Q-Hermite approximation. In the right figure the coefficients are obtained by calculating the scalar products and the difference (black curve) is close to the contribution of the tenth order Q-Hermite polynomial (red curve). In the left figure, the coefficients are obtained by minimization of χ 2 value of the difference not taking into account 2.5% of the total number of eigenvalues at each end of the spectrum. This gives a fit that is a factor 10 better with an accuracy of 1 part in 10000.
We can also expand the spectral density of the SYK Hamiltonian for each configuration in terms of Q-Hermite polynomials Now, the c k are stochastic variables whose averages give the expansion coefficients c k of the average spectral density in equation (3.7). The fluctuations of c k correspond to spectral fluctuations with wavelength scale of E 0 /k. We expect that these fluctuations are nonuniversal for small values of k, but for large values of k, they will correspond to universal random matrix correlations. The aim of this paper is to study at which scale this transition to universal random matrix behavior takes place.
Since the coefficients c k are determined by N q independent stochastic variables, the relative error of the c k as well as of ρ(x) is of order 1/ N q 1/2 . We thus expect that the variance of the number of levels in an interval containingn levels on average is equal to Σ 2 (n) ∼n 2 N q −1 (3.14) in agreement with the explicit calculation of TrH 2 TrH 2 for the SYK model [15,17]. The covariance matrix c k c l can be calculated analytically for small values for k and l by calculation double-trace moments using methods similar to those first introduced for the calculation of the single-trace moments [20,[62][63][64], see section 6.

Collective fluctuations of eigenvalues
As was noticed in earlier work, the main contribution to the number variances comes from overall rescaling of the eigenvalues from one configuration to the next [15-17, 52, 53]. Such fluctuations can be written as where ξ is a stochastic variable with a zero average and a finite variance. The corresponding spectral density fluctuates as where ρ(x) is the smoothened spectral density of a single realization. This results in the two-point correlation function where · denotes averaging over δ. Since we consider spectral correlations on a scale |x−y| 1, terms involving the first and second derivative of ρ(x) can be ignored. The expectation value of ξ 2 can be obtained from the normalized (by Hilbert space dimension) and rescaled (by variance) double-trace momentM 2,2 , withM m,n defined as M m,n := TrH m TrH n 2 N σ m+n , (4.4) where σ is given by (2.5). As a special case we introduce the notation for (normalized and rescaled) single-trace momentsM n :=M 0,n . (4.5) TheM 2,2 in terms of scale fluctuations is given bỹ On the other hand, this quantity can be calculated directly in the SYK model [17]: We thus find This results in the number variance This follows from the elementary calculatioñ 2mM2n . (4.12) We will see below that in the SYK model, this result is equal to the average double-trace moment with two cross contractions, see equation (6.13).

Nonlinear σ-model
Using standard random matrix techniques it is possible to derive a nonlinear σ-model for the SYK model. For the complex SYK model this was done already in the early eighties [14] but because of the coupling between the massive and massless modes, it was hard to analyze the σ-model reliably. The SYK model with Majorana fermions is simpler and Altland and Bagrets obtained [15] the following nonlinear σ-model for the β = 2 universality class (see appendix A for a derivation), The coefficient of a 2 µ is not positive definite, but the integrals can be made convergent by an appropriate rotation of the integration contour in the complex plane [15]. However, we evaluate the integral perturbatively in a loop expansion about its saddle point when the choice of the integration contour is irrelevant. The X µ 's are all the linearly independent products of Dirac matrices in N dimensions and they form a basis for the vector space of all 2 N/2 × 2 N/2 matrices. Because the Hamiltonian commutes with the Dirac chirality matrix γ c , the Hamiltonian splits into two blocks, and the partition function is for one of the two blocks. Therefore, the sum over µ is, as indicated by the prime, over the upper block of the X µ with even |µ| (the length of the multi-index µ), and only ranges from 0 ≤ |µ| ≤ N/2 (with only half the generators for |µ| = N/2, see appendix A). We use the normalization convention that X 2 µ = 1 and denote the upper block by X U µ . As an example, for N = 4 we have and in this case the µ would be a sum over the set {1, iγ 1 γ 2 , iγ 1 γ 3 , iγ 1 γ 4 }. In the analytical calculation we sum over all µ and correct for that by including the appropriate combinatorial factor. Note that contrary to [15] we use Hermitian generators X µ † = X µ . The matrix a µ is an 2n × 2n matrix in replica space and The matrix block corresponding x or y will be denoted by 11 or 22, respectively. The trace over the 2n dimensional replica space (n dimensional in case of the one-point function) is denoted by tr, while the combined trace over the Clifford algebra and the replica space is denoted by Tr. The coefficient T µ a combinatorial factor due the the commutation of the γ matrices, with |µ| the cardinality of index set µ. It is a Fierz coefficient of the Fierz transformation that arises in the derivation of the σ model, see appendix A, with Γ α the q-body operator defined in equation (2.3) and D = 2 N/2 /2. The pair correlation function is given by where P pp is the projection onto the 11 block and 22 block for p = 1 and p = 2, respectively. To obtain this expression we have written the derivatives with respect to x and y as a derivative with respect to a 0 .
The generating function for the one-point function has the same form as in the case of the two-point function, (5.1), but now a µ is an n × n matrix and z = diag(x, · · · , x n ). The resolvent is given by where the right-hand side is obtained after a partial integration by expressing the z-derivative as a derivative with respect to a 0 .

One-point function
To better understand the convergence properties of the nonlinear sigma model, it is instructive to work out the one-point function. The generating function for the one-point function has the same form as in the case of the two-point function, (5.1), but now a µ is an n × n matrix and z = (x, · · · , x n ). The resolvent is given by where a 0 is the diagonal element of the replica matrix. We evaluate the integral by a loop expansion about the saddle point. The saddle point equation is given by where the trace Tr is over the gamma matrices. The solution is given by The saddle point result for the resolvent is equal tō The resolvent thus satisfies the saddle point equation (5.12) and is given byḠ We expand a µ about its saddle point a µ =ā + α µ (5.14) Using the saddle point equation for a, the expansion of the logarithm is given by This results in the propagator while the vertices of the loop expansion are given by We now calculate the resolvent to two-loop order in this expansion. The one-loop contribution, Tra 0 Tr( µ α µ X U µ ) 3 , vanishes in the replica limit. The first nonvanishing contribution is given by the diagram TrαTrααααα (5.19) Inserting the expressions for the vertices and propagators we obtain after taking the replica limit: The trace can be evaluated by observing that X U µ and X U ν commute or anti-commute de-pending on the number of gamma matrices they have in common, while T µ only depends on the number of indices in µ. This results in m n N m The coefficients of the 1/z expansion can be obtained by using various combinatorial identities, see appendix B. The normalized fourth moment is thus given by Next we consider the contribution we find after taking the replica limit The traces are only nonzero if X U ρ contains the gamma matrices that X U µ and X U ν do not have in common. For m 1 gamma matrices in X U µ , m 2 gamma matrices in X U ν and s common gamma matrices this gives the combinatorial factor where the phase factor is due to the fact that only i s X U µ X U ν is Hermitian. The diagram (5.25) is thus equal to The sums for the leading order term in the 1/z expansion of the resolvent can be evaluated as the contribution T 6 [62], 1 see appendix B. The combinatorial factor 1 4 is to account for the prime on the sums in equation (5.25). Note that the sum over m 1 and m 2 only runs up to N/2 while the sum over odd m 1 and m 2 vanishes anyway. We thus obtain the large z expansion of the correction (5.28) The second diagram where we have used combinatorial identities to obtain the coefficient of 1/z 7 . To summarize, starting from the generating function Z, we have obtained the normalized sixth moment M 6 σ 6 = 5 + 6η + 3η 2 + T 6 (5. 31) in agreement with an explicit moment calculation of the SYK model [11]. If we continue this expansion we will recover the full 1/z expansion of the resolvent of the SYK model. This expansion is not well behaved for z close to the support of the spectrum and will have to be resummed to obtain nonperturbative results, see section 5.3 for more discussion of this issue.

Loop expansion of the two-point function
The saddle-point equation for the two point function has the same form as equation (5.8) but now z is a vector of length 2n, see equation (5.2), and a µ is a 2n × 2n matrix. The solution with a 12 µ = 0 is the same as for the one point function with z replaced by x and y for the 11-sector and the 22-sector, respectively, The saddle point value of the resolvent is thus given bȳ whereḠ(x) andḠ(y) are related toā 11 andā 22 according to equation (5.11), in this order. The propagator and vertices follow from the expansion of the logarithm in the action The propagator is thus given by .
We first calculate the one-loop contribution to the two-point correlator As is the case for the GUE (see appendix C.2) we have two contributions. The first contribution is given by where we have used that and that the sum over µ in equation (5.38), as indicated by the primes, only runs over the even values of µ < N/2 resulting in the combinatorial factor of 1 4 . The second one-loop contribution to the two-point function is given by after taking the replica limit. The sum of the two contributions is equal to The µ = 0 term is the large N limit of the two point correlator for the GUE (see appendix C.2). It is given by .
The corresponding spectral correlation function follows from the discontinuity across the real axis This is exactly the asymptotic behavior of where D is the total number of eigenvalues, D = 2 N/2 /2. In figure 2 we show the GUE result µ = 0 and the sum of the correction terms µ = 0) both for q = 2 and q = 4. The correction of the q = 2 result is much larger, by a factor of order N 2 , but is not qualitatively different from that of the q = 4 result. We observe that in both cases we have a scale separation between the GUE result and the correction due to the massive modes. Since other correction terms are of higher order in 1/D, it is puzzling how the correction terms for q = 2 can nullify the GUE contribution in order to get Poisson statistics. The only way out seems to be that interaction terms between the zero modes (|µ| = 0 or |µ| = N terms) and the massive modes (|µ| = 0, N terms) are important for q = 2 while their contributions cancel for q = 4. The details of this cancellation are not clear, in particular because the lowest order contribution of such terms was large in the complex SYK model [14]. Since T µ ∼ 1 − 2|µ|q/N + · · · , the |µ| = 0 terms are strongly suppressed with respect to the |µ| = 0 or |µ| = N terms for x → y. For µ values with N |µ| 1 we have that T µ 1. Therefore, we can approximate the correction to the GUE result by expanding the denominator to first order in T µ . In the center of the spectrum this given the result so that the correction to the two point correlator is given by In terms of the unfolded energy difference ω = (x − y)ρ(0), we find which is in agreement with the results in figure 2. Note that we only get the universal GUE result if we rescale by the saddle point result for the spectral density which differs by O(1) from the correct SYK result. This result agrees with [15], where it was argued that the corrections to the two-point spectral correlation function are of the form where N 2k is the degeneracy of massive modes with mass (k), and ∆ = πσ/D is the level spacing in the center of the band. The (k) are given by Since T k ∼ 1/N q , the (k) are much larger than the span of the spectrum, while ω N . The correlator of [15] is thus well-approximated by which is exactly the result from the above perturbative calculation.
Although the σ-model was derived for arbitrary N and even q, it cannot be naively applied to cases with Dyson index β = 1 or β = 4. As was discussed at end of appendix A, the reason is that the |µ| = 0 term after the Fierz transformation, gives the GUE result. To extract the GOE result, we have exploit the symmetry of the γ matrices for N mod 8 = 0 before the Fierz transformation. Then the |µ| = 0 terms include the Cooperon contributions characteristic of the GOE result. Similar arguments can be made for the β = 4 case.

Higher order corrections
It becomes increasingly hard to calculate higher order corrections to the loop-expansion of the two-point correlation function. Such terms are important for ensemble fluctuations that go beyond scale fluctuations. We only calculate the three-loop diagram given by One easily finds, There is another remarkable combinatorial identity (note that there is no prime) which can be proved by applying the Fierz transformation (A.8) to µ T µ X U µ X U ν and the same for the sum over ν and ρ. The sum is only nonzero when the summation indices are either all even or all odd. Since the sum over the many-body space is only over the even indices up to N/2, we thus get an overall combinatorial factor of 1/16, so that the total contribution of the diagram (5.51) to the 33 moment is given by which together with the last term of equation (5.41) gives the correct result for the 33 moment after using the identity This agrees with with the moment calculation in section 6, and is a very nontrivial check of the correctness of the σ-model calculation.

Does the expansion in powers of the resolvent of the semicircle make sense?
The σ-model results in an expansion of the resolvent of the SYK model in powers of g 0 Each of the terms has a cut in the complex plane on the interval [−2, 2], while the resolvent of the SYK model has a cut beyond that. In the Q-Hermite approximation, the cut is located on the interval where η is given in equation (3.2). Since the resolvent of the Q-Hermite spectral density is known analytically, we can get the coefficients a k in this case, The spectral density that can be derived from derived from (5.58) is strongly oscillating, and to make sense of the result, the asymptotic series has to be resummed. Based on the case η = 1, one might think that a Borel resummation might lead to a convergent result [14], but we were not able to work out the sums for arbitrary η. However, we can also expand the resolvent in powers of Naively, because the spectrum corresponding to this resolvent has the same support as the weight function of the Q-Hermite polynomials, we expect that this gives a much better expansion. It turns out that the expansion is surprisingly simple [65] Since |g 0η | = √ 1 − η and |η| < 1, this expansion is convergent for the Q-Hermite spectral density.
We could also expand, the exact resolvent of the SYK model in powers of g 2k+1 0η (z), by requiring that the moments up to a given order are the same. To eight order in the moments this gives with corresponding corrections δρ 3 (k = 3) and δρ 4 (k = 4) to the spectral density. For N = 32 the sixth order result agrees better than 0.1% with the exact result, which is much better than the Q-Hermite approximation, see figure 3. The eighth order correction only gives a slight improvement.

Analytical Calculation of the Fluctuations of the Expansion Coefficients
In this section, we calculate the covariance matrix of the expansion coefficients of the spectral density in terms of the Q-Hermite density and the Q-Hermite polynomials, We will obtain explicit results for the expectation values c k c l for k, l ≤ 6. Since the numerics in this paper are done in dimensionless units, and spectral densities are normalized to one, it is convenient to study the rescaled, by the many-body variance σ given in (2.5), and normalized momentsM m,n := which where already introduced in section 4. The covariances of the stochastic coefficients c i are related to the double-trace moments byM Note that because the H η i (x) are orthogonal with respect to ρ QH (x) we have f mi = 0 for m < i, which means f mi forms a triangular matrix of infinite size. The coefficients c mi also vanish when m + i is odd. The covariance matrix c i c j is given by Since the inverse of the triangular matrix f mi must also be triangular, we can consistently truncate equation (6.5) up to some finite values of i, j, m and n. An efficient way to calculate the f mi , using that where the momentsM QH m+k (η) can be obtained from Riordan-Touchard formula, introduced earlier as equation (5.59): An important caveat is in order: in the numerics we used an irreducible block of the random Hamiltonians to calculate the two-point correlations, which is the appropriate thing to do for probing RMT universalities. This means we really should be looking at instead ofM m,n , where γ c is the chirality Dirac matrix in even dimensions. However, since γ c is a product of N different Dirac matrices, we have Tr (γ c H m ) = 0 realization by realization if m < N/q. For N = 32 and q = 4, this means in the double-trace moments up to m = 7 and n = 7 equation (6.8) coincide withM m,n , which means we might as well useM m,n for the calculation of c i c j up to i = 7 and j = 7. In this paper we will not go beyondM 6,6 due to computational complexity, but we do caution that Tr (γ c H m ) must be confronted for higher moments. 2

General expression for double-trace Wick contractions
In this section we give a general expression for double-trace contractions contributing to rescaled double-trace momentsM m,n .
(a) Since the ensemble average is taken over a Gaussian distribution and thus reduced to a sum over Wick contractions, we can represent the double traces as chord diagrams. Since there are two traces, we will use two different horizontal lines to attach the chords on. Such horizontal lines are called backbones in some of the chord diagram literature. For example, in figure 4 we draw three chord diagrams with two backbones that represent some of the contractions that contribute toM 4,4 . In a self-evident manner they represent (we will not adopt Einstein's summation convention unless otherwise stated) Note (a) belongs to the disconnected part ofM 4,4 , since all the chords connect only within single traces, and we call such chords single-trace links; on the other hand (b) and (c) belong to the connected part ofM 4,4 because both contain chords that go from one backbone to the other, and we call such chords cross links. It is important to notice that notationally we used • Latin-letter subscripts on the cross-linked Γ's and • Greek-letter subscripts on the single-trace-linked Γ's.
The combinatorics for double-trace moments are much like the single trace moments discussed in [62], with two additional rules for the d variables which are the cardinality of the regions in the Venn diagram of overlapping indices. In other words, d a i 1 a i 2 ...a i k is the number of elements common and only common to the sets a i 1 , a i 2 , . . . , a i k (naturally, i 1 , . . . , i k are all different from each other in this definition). The additional rules are as follows: • The d-variables with odd number of Latin-letter subscripts (and an arbitrary number of Greek-letter subscripts) must be set to zero.
• Kronecker deltas are needed to enforce that the total number of indices corresponding to one vertex is q.
More explicitly, a double-trace chord diagram G has a value of   As one can see, the most general description of the double trace combinatorics is unfortunately convoluted. We encourage the readers to look into the application of (6.9)-(6.12) to the examples ofM 3,3 andM 3,5 illustrated in appendix D.

Some useful properties of double-trace moments
There are a few other properties of double-trace contractions that will be useful for evaluating double-trace averages: (i)M m,n = 0 if m + n is odd. Hence we only need to concern ourselves with the cases of m + n being even.
(iv) For any N , there exists a charge conjugation matrix, either C + or C − (or both), such that C −1 ± γ i C ± = ±γ T i , this implies the following reflection identity 4 where we also used This reflection, together with the cyclic permutations of the Γ's, gives rise to a natural dihedral group action on the traces.
(v) If all but one subscript in the traces are summed, the result is a constant independent of the remaining unsummed subscript, see appendix D of reference [62]. This implies every contraction's value must contain a factor of N q . This means each term contributing to 2 −N TrH m TrH n will factorize into N q times another integer (and times σ m+n ). Analogous to the single trace situation, this implies a further factorization property of a special class of double-trace contractions: if a single-trace link intersects with exactly one cross link and nothing else, then this chord diagram factorizes into a double-trace diagram with this single-trace link removed and a single-trace diagram of two chords with one intersection (whose value is η of (3.2)). Figure 5 illustrates such an example. = × η Figure 5. The factorization of a chord diagram.

Low-order covariances of Q-Hermite expansion coefficients
In this section we give explicit results for the covariances up to sixth order for N = 32 and q = 4 which is the case we study numerically later. Apart from the momentsM 2,2m given by equation (6.13), we also need the following results: Then using equation (6.5) up to i, j, m, n = 6, we obtain for N = 32, q = 4 the following nonzero covariances up to i, j ≤ 6: Note that c 0 = 1 and c 1 = 0 even before averaging. 5 These numbers agree reasonably well with the numerical results presented in figure 10. General expressions for low-order moments are given in appendix E.

Numerical analysis of spectral correlations
In this section we analyze the spectral correlation of the eigenvalues obtained by diagonalization of the SYK Hamiltonian for N = 32. In section 7.1 we discuss the number variance. The fluctuations due to ensemble averaging are analyzed in section 7.2 and spectral form factor is evaluated in 7.3.

Number variance
If ρ(x) is the average spectral density, then the average number of levels in an interval of width ∆ is given byn = x+∆/2 x−∆/2 ρ(y) dy.
The actual number of levels in the interval is equal to n = x+∆/2 x−∆/2 ρ(y)dy, (7.2) so that the deviation from the average number is given by 3) The variance of δn as a function ofn is known as the number variance, The average eigenvalue density can be obtained in two ways. First as the ensemble average of the spectral density for a very large ensemble. Second, for large matrices, as a fit of a smooth function,ρ(x), to the spectral density of a single disorder realization. If the two are the same for N → ∞, we speak of spectral ergodicity [67]. Even ifρ(x) − ρ(x) ∼ 1/N , it can give a large contributions to the number variance for intervals where n is no longer much smaller than N . In particular, this may happen in many-body systems where the number of levels increases exponentially with the number of particles.
To calculate the number variance as a function of the average number of levels in an interval we need to know x as a function of the number of levels with energy less than x. In other words, we have to invert the mode number function defined as Since the number of eigenvalues in an interval does not change under a coordinate transformation, it is simplest to invert (7.5) in coordinates where the level density is constant. If the constant is equal to one, the transformation is particularly simple, The spacing of the levels in the new coordinates is thus given by and the new level sequence is obtained by adding the differences, This procedure is usually referred to as unfolding. We emphasize that this is not necessary to calculate the number variance, but it makes it much more convenient to invert the mode number function N (x).
As was argued in section 4, due to the relative error in the spectral density of N q −1/2 , for largen the number variance behaves as which agrees with results obtained previously [15,16] for q = 4 and q = 3. The Thouless energy scale can thus be estimated as To calculate the number variance, we unfold the spectrum by means of a fit to the ensemble average of the spectral density including up to eight order Q-Hermitian polynomials The results are shown in figure 6 where we plot Σ 2 (n) (black points) versus the average number of levelsn in an interval for an ensemble of 400 SYK Hamiltonians with N = 32. For small n < 40 we see excellent agreement with the GOE (red curve) (see right figure), but for large distances, the number variance grows quadratically, Σ 2 (n) ∼ (n/293) 2 . The latter result is in good agreement with analytical result ofn 2 /2 N q obtained earlier in this section (which gives (n/268) 2 for N = 32 and q = 4).

The effect of ensemble averaging
As mentioned before, the ensemble fluctuations contribute to the number variance as This contribution can be subtracted by rescaling the eigenvalues of each realization of the ensemble according to its width [15-17, 52, 53]. However, this is only the first term in a "multipole" expansion, and in this section we systematically study the long-wavelength fluctuations that give rise to the discrepancy between spectral correlation in the SYK model and random matrix theory. This quadratic contribution (7.11) due to the scale fluctuations is projected out in the ∆ 3 statistic defined by ∆ 3 (n) = 1 0 duK(u)Σ 2 (un), (7.12) with smoothening kernel K(u) given by (see figure 7) in case where the number variance depends only weakly onn, the ∆ 3 (n) statistic in essence only measures level fluctuations up to ( √ 2 − 1)n and roughly corresponds to the number variance atn/4. Indeed, as is shown in figure 7, the ∆ 3 statistic agrees with the GOE to much larger values ofn with a Thouless energy that is about 2000 level spacings.
Next we explore the origin of other contributions to the discrepancy between the SYK model and the Wigner-Dyson ensembles. We do this by eliminating the "collective" fluctuations of the eigenvalues of each realization of the ensemble in which a macroscopic number of eigenvalues moves together relative to the ensemble average. This is achieved by calculating the average mode number by fitting a smooth function to the spectral density of a single configuration. The smoothened spectral density can be obtained in a systematic way by expanding it in Q-Hermite polynomials, truncated at l-th order: 14) The coefficients c k can be calculated by minimizing the L 2 norm excluding a small fraction of the eigenvalues in both tails of the spectrum. Generically, all even and odd coefficients are nonvanishing with an ensemble average that agrees with the fit to the average spectral density obtained in equation (3.7). In figure 8 we show the difference of the cumulative spectral density of the SYK model andρ QH,2 ,ρ QH,4 ,ρ QH,6 andρ QH, 8 . It is clear that for l = 8 the systematic dependence is of the same order as the statistical fluctuations, and no further gains can be made by including   higher values of l.
We now calculate the number variance usingρ QH,l to determine the number of eigenvalues in the interval for each configuration. We use both ensemble and spectral averaging to reduce the statistical fluctuations. For the latter we choose half-overlapping intervals. In figure 9 we show results for l = 8. The Thouless energy is about 2000 level spacings, but for largē n the number variance saturates to a constant and decreases beyondn ≈ 6000. This has two reasons. By unfolding configuration by configuration, we have eliminated fluctuations with wavelength larger than about 2 N/2 /16 = 2048. Second, because the total number of eigenvalues is fixed, the variance is suppressed whenn becomes of the same order as 2 N/2 /2.
The coefficients of the expansion of the mode number in Q-Hermite polynomials as a function of the ensemble realization number are shown in figure 10. In agreement with our analytical results, the first four coefficients and c 5 and c 7 fluctuate about zero, while c 6 and c 8 fluctuate about a nonzero value.
Next we study the dependence of the deviation of the number variance from the GOE result on the number of Q-Hermite polynomials that have been taken into account. In figure  11 we show the number variance when an increasing number of coefficients has been fitted to the spectral density of each configuration. In the upper row only a 2 has been fitted while a 3 until a 8 have been put to the value of the ensemble average. In the second row, a 2 , a 3 and a 4 have been fitted and a 5 , a 6 , a 7 and a 8 have been put equal to the ensemble average. In the third (fourth) row, the first five (six) coefficients have been obtained by fitting while the remaining ones up to order eight have been put equal to the ensemble average. Because the bulk of the deviation of the spectral density from the Q-Hermite approximation is given by a sixth order polynomial, the number variance is sensitive to what extent the length of the interval is commensurate with the distance of the zeros of the polynomial. That is why we observe large jumps (see figure 11 lower left) when the number of intervals used to calculate the number variance changes. When the length of the interval becomes smaller than about half the distance between the zeros of the sixth order Q-Hermite polynomial, this effect is no longer visible. The right column which shows a blow-up of the figures in the left column for smalln illustrates that the Thouless energy increases gradually with the number of Q-Hermite polynomials that have been taken into account in fitting the local spectral density, until it saturates at a value ofn corresponding to the shortest wavelength used for unfolding.
For second-order local unfolding, the number variance for largen is still quite accurately given by a quadraticn-dependence, but with a much smaller coefficient than in the case of ensemble averaging, For fourth order and beyond, the quadratic contribution is negligible and the number variance beyond the Thouless energy is very well fitted by a linearn-dependence.

Spectral form factor
Alternatively spectral correlations can be studied by means of the connected spectral form factor defined as The second term is the disconnected part. One could also study the spectral form factor without this subtraction [12], but in this paper we only consider the connected spectral form factor. Since spectral correlations are universal on the scale of the average level spacing, we have where ρ 2,unv is the universal random matrix result. Therefore, the spectral form factor becomes universal in terms of the variable t/ ρ(x) . Its large N limit is thus given by a double scaling limit which receives contributions from all orders in 1/N . We calculate the spectral form factor for the eigenvalues of the Hamiltonian of the SYK model. In order to have a well-behaved sum in equation (7.15) we need to perform some degree of smoothening which we will do by introducing a Gaussian cutoff of width w [13], We will evaluate the spectral form factor for the unfolded spectrum with ρ unf (x) = 1 and have chosen the prefactor such that for large t, when we can use the diagonal approximation, it is equal to unity. In figures 12 and 14 we show the spectral form factor for the unfolded spectra that were used to calculate the number variance in the previous section. In figure 12 finite size effects. The GOE result given by is represented by the red curve in both figures. It agrees with the SYK spectra up to very short times. Comparing the two figures, we observe that the spectral fluctuations due to the ensemble average result in a peak at small times. The peak should not be confused with the contribution from the disconnected part of the two-point correlator which is many orders of magnitude larger. For local unfolding there is no peak, and spectral form factor approaches zero for t → 0. Note that without the Gaussian cut-off the spectral form factor K c (t = 0) = 0 because of the normalization of the spectral density. With the cut-off, the spectral form factor at t = 0 measures the fluctuations of the number of levels inside the Gaussian window. It becomes only zero when the window is larger than the spectral support. In figure 13 we show the form factor for small times on a linear scale. Then the constant  Figure 13.
The spectral form factor for small t obtained by using the ensemble average of the spectral density for unfolding (black points). The results are fitted to a Gaussian (red curve). part of the two-point correlator gives the contribution For w → ∞ this converges to the δ-function 2πb N δ(t). In figure 13, we fit the parameters b N and w to the SYK data. For w ≤ 3000, the fitted value of b N ≈ 1.26 × 10 −5 ≈ 1/282 2 is close to the theoretical value given in equation (7.21), while for w = 4000 it is 15 % smaller. The number variance is related to the spectral form factor by a an integral over a smoothening kernel [68]: Therefore, the δ(t) contribution to the spectral form factor leads to a quadratic dependence If K(t) remains finite for t → 0 the large n limit of the number variance is given by In figure 14 we show the spectral form factor for an increasing number of fitted coefficients while the remaining ones up to eighth order are kept equal to the ensemble average. For a a fourth or fifth order fit, the spectral form factor seems to approach a constant for t → 0. From equation (7.22) it is clear that this will result in a linear dependence of the number variance. The coefficient of the linear term for fifth order unfolding is equal to 3.910 −4 which is in good agreement with the spectral form factor for t → 0 (which is 5.7 × 10 −4 , 7.1 × 10 −4 , 5.7 × 10 −4 and 3.0 × 10 −4 for w = 1000, w = 2000, w = 3000 and w = 4000, respectively.) In order to see the consistency of the calculation we compare two different ways to calculate the number variance. We conclude that a seemingly insignificant deviation of the form factor from the GOE result gives rise to a large deviation from the random matrix result for then-dependence of the number variance.

Double-trace moments and the validity of random matrix correlations
In this section we explore the question whether double-trace moments can shed light on the convergence to random matrix correlations. In subsection 8.1 we discuss the calculation of high-order double-trace moments, and in section 8.2 evaluate subleading corrections to estimate the convergence to the RMT result.

The calculation of high-order moments
The connected two-point correlator is given by Since the universal random matrix result scales as high-order moments have to be suppressed by 2 N , yet a naive subscripts counting when Wickcontracting Γ's would suggest double traces are suppressed by a power law N −kq where k is the number of cross links. An important observation was first made in the appendix F of [12] on the nested crosslinked-only chord diagrams. The authors of [12] proved that −k where T m was first defined in equation (5.3): The observation was that lim k→∞ t 1234...k|k(k−1)..
for the simple reason that this limit is dominated by only two terms m = 0, N , both of which give T m = 1 (other terms have |T m | < 1). This shows the naive expectation for moments is wrong and may serve as a starting point for understanding the RMT ramp from a moment method perspective.
In this section we offer a proof of a generalized version of equation ( The starting point of our proof is the completeness relation The X µ 's are as defined at the beginning of section 5: {X µ } is the set of all linearly independent matrices formed by products of Dirac gamma matrices, with appropriate prefactors such that X 2 µ = 1, and µ is a multi-index containing all the subscripts of the Dirac matrices in the product. The set {X µ } has 2 N elements because they form a basis for the vector space of 2 N/2 × 2 N/2 matrices. It will be useful to organize X µ 's by the number of Dirac matrices in the product, that is, the length |µ| of the multi-index µ, then the completeness relation can be rewritten as Tr (X µm Γ a 1 . . . Γ a k X µm Γ a k . . . Γ a 1 ) Since (with c aµm the number of indices that a and µ m have in common) we have for fixed m, µm a Tr (X µm Γ a X µm Γ a ) Substituting equation (8.10) into equation (8.8), we arrive at This result is an application of the so called "cut-vertex factorization" property for single traces [62]. 6 This property states that the chord intersections that do not form a closed loop factorize into products of individual intersections. In this instance we have k independent intersections, each giving a factor of (−1) mq T m . We thus have demonstrated that equation (8.7) reproduces equation (8.3) for even q and even k, and trivially for when both q and k are odd because both formulas give zero. We will now use (8.7) to derive a modest generalization to equations (8.3) and (8.4). The diagrams we consider are the ones with a number of intersecting cross links and a number of nested links. Repeating the previous analysis, we conclude that these contributions are of the form where k is the number of nested cross links and rest(N, m, q) is whatever that remains of the single-trace chord diagram after taking out the intersections between the dashed chord and nested links. We remind the reader that the dashed chord represents the insertion of X µ 's. Again in this limit the sum is dominated by two terms m = 0 and m = N , so the limit is simply 2 −N [rest(N, 0, q) + rest (N, N, q)] . (8.14) · · · · · · · · · · · · = 2 1−N × Figure 17. An example of equation (8.14). The ellipses on the left represent infinite number of nested cross links. On the right we have same contraction with all the nested links removed, as a single trace diagram.
Note that rest(N, 0, q) = rest(N, N, q) = single-trace diagram with dashed chord removed, (8.15) because the m = 0 and m = N terms correspond to the insertions of identity matrix and Dirac chirality matrix in equation (8.7), and the chirality matrix insertion is equivalent to the identity matrix insertion when q is even 7 , we conclude that a diagram with a large number of nested cross links is equal to 2 1−N times the corresponding single-trace diagram with all nested links removed -see figure 17 for a concrete example.
The approach developed in this section can be viewed as complementary to that of section 6.1: the method of section 6.1 gives a completely combinatorial formula (6.9)-(6.12) in which the number of summations scales as a power in q and is independent of N , hence is useful for calculating exact numerics for low-order moments with large N , as we have seen. However, for exactly the same reason it is not very useful for discussing the large q behaviors, neither is it helpful for discussing the asymptotic behavior with large number of cross links due to its complexity. On the other hand, the approach of this section gives a symbolically simple formula in terms of single traces, hence makes various asymptotic behaviors amenable to discussion for a large number of nested cross links as we have seen and will see more soon.

Long-wavelength fluctuations
Previously we saw a somewhat mysterious numerical phenomenon: we do not need to remove exponentially many long-range modes to get very close to the random matrix result. With the techniques discussed in section 8, we give a partial explanation to this phenomenon and estimate how the number of subtractions scales with N . We first summarize the gist of this section: • The mode numbers correspond to the numbers of cross links in double-trace contractions.
• The double-trace contractions converge rapidly to the RMT result as the number of cross links increases.
Here by "RMT result" we simply mean the weak statement that after a certain point the contractions stop being suppressed by the number of cross links but by 2 −N instead, as would be the case for an RMT theory. As we have seen in equations (4.12) and (6.13), the double traces from diagrams with precisely two cross links can be entirely attributed to scale fluctuations -the lowest long-wavelength mode. For higher modes and larger number of cross links, we cannot make the correspondence as precise, but it is intuitively clear that both probe finer structures of the correlations. In the case of actual RMT ensembles, such a correspondence can be made precise [69]. Since the goal of this section is humble, this level of understanding should suffice as a starting point. We will estimate speed of convergence of equation (8.4). In fact t 1234...k|k(k−1)...1 is not quite the right quantity to study. Near equation (6.8), it was pointed out that for higher moments (large number of cross links) the contribution Tr (γ c H m ) (Trγ c H n ) must be included. The completely nested diagram contribution in this double trace is and this is the quantity whose speed of convergence we wish to study. Its finite-k relative deviation from the limiting result is For m = 0, N , we have |T m | < 1, and the remaining largest terms are . (8.20) So for large enough k, we would expect the summand N m T k m to be sharply peaked near So we see that contractions with k > N/(2q) log N are suppressed, and to obtain the RMT result we only have to subtract long wavelength fluctuations with k < N/(2q) log N . 8 For N = 32 and q = 4, we subtracted the first eight long-range modes to get a decent agreement to the RMT result, which is in the right ball park of our estimate N/(2q) log N ≈ 14. This analysis also applies to the slightly broader class of diagrams that include the ones described near equation (8.14).
For a more complete analysis, we would need to get a handle on the cases with linked intersections and the unfolding, which is outside the scope of this paper.

Discussion and Conclusions
How can we analytically understand the observations of this paper? To get some perspective, we first discuss spectral correlations of the eigenvalues of the QCD Dirac operator. The first results were obtained in [70] were the number variance was calculated from the quenched Dirac eigenvalues of a single lattice QCD configuration. Agreement with random matrix theory was found to as many level spacings as allowed by the statistics (which is about 100 level spacings for a 12 4 lattice). However, it was soon realized that when we compare to the ensemble average, which is what should be done, deviations are already visible on the scale of a couple of level spacings. The deviations are now well understood and can be obtained analytically from chiral perturbation theory [71,72]. The scale where deviations from random matrix theory occur is the quark mass scale for which the corresponding pion Compton wave length is equal to the size of the box.
The situation with the SYK model is similar. Comparing the estimate of the scale where spectral correlations start deviating from RMT with statistical fluctuations of the spectral density from one realization of the ensemble to the next, we see that the deviations from the RMT predictions can mostly be accounted for by such long-wavelength fluctuations. This is further confirmed by our numerical study, where after the first few long-wavelength modes are subtracted realization by realization of the ensemble, the spectral form factor becomes RMT-behaved until very short times and the number variance has RMT spectral rigidity until the energy scale given by the wavelength of the subtracted modes. This can be partly explained by the fact that the Hilbert space is 2 N/2 -dimensional whereas there are only N q model parameters J α , implying that very few matrix entries of the Hamiltonian can fluctuate independently. This scenario is to be contrasted with actual random matrix ensembles, where all matrix entries (barring Hermiticity and symmetry constraints) can fluctuate independently.
However, this does not quite explain the separation of scales between the long-wavelength modes and scale of universal RMT fluctuations. We have gained some insight into this scale separation by looking at the convergence of a class of double-trace chord diagrams toward those of RMT, and analytically demonstrated that their early convergence behavior is consistent with our numerical results. It is desirable to have analytically controlled estimates of all chord diagrams to make more quantitative comparisons. We did develop a combinatorial formula for all chord diagrams in section 6.1, and it has been used to calculate the first few moments, but it is complicated and hard to apply to general high-order moments. We hope to investigate this in future works.
In the same vein of understanding QCD Dirac operator's spectral statistics through chiral perturbation theory, we studied the nonlinear sigma model formulation of the spectral determinant of the SYK model. Although it is reassuring that the large energy expansion of the sigma model correctly reproduces the moments in a nontrivial manner, and that the wide correlator is given by the sum of the GUE result and corrections that are in agreement with numerical results, only few low-order correction terms could be calculated and it is not clear how the contribution of all other corrections to the two-point function cancels. In our view the following two issues have to be addressed to complete our understanding of the spectral density and spectral correlations in terms of the σ-model: 1. The most straightforward calculation gives a one-point resolvent that has the wrong branch cut. It is not known how to perform a resummation at the level of the σ-model action. In particular, we have not been able to obtain a Gaussian spectral density when N q.
2. Although the correction terms to the RMT result explain the numerical results for q = 4, it has not been shown that all other corrections in the loop expansion are small -in fact we do not know the validity of the loop expansion. Since naive application of the result to q = 2, which has Poisson statistics, also gives RMT spectral correlations albeit with a much smaller Thouless energy, and we conclude that the loop expansion has to break down in this case.
Neither issue implies that the σ-model approach is inherently flawed, but they do suggest that our understanding of the σ-model is not complete. Similar issues arise in the calculation of nested cross-linked diagrams where the q = 2 case is not qualitatively different from the q = 4 case. We hope to address some of these questions in future work.

A Derivation of the σ-model
In this section we derive the nonlinear σ-model for the SYK model, see [14] for the complex SYK model, and [15] for the Majorana SYK model. Our derivation follows a different route using the Fierz transformation, see also [73]. Since [γ c , H] = 0 we have the Hamiltonian in the block-diagonal form in the chiral basis where γ c = diag(1, . . . , 1 Taking into account the unitary symmetries as required for universal RMT correlations, we focus on the upper block and define the generating function as where k is the replica index. We should think of φ as also having an implicit index of the D = 2 N/2−1 dimensional Hilbert space (the index of the block Hamiltonian H U ) and " · " indicates summation over the Hilbert space. The integrals are convergent because Im(x) = i and Im(y) = −i . We also introduce the sign factor which will be inserted in the definition of the partition function such that the integrals are convergent. The correlation function of two resolvents is given by Recall that the disordered average is over Gaussian variables J α with variance v 2 , so after averaging we obtain where p = 1, 2, k = 1, 2, . . . , n and z pk can be viewed as the diagonal entries of z := diag( n x, · · · , x, n y, · · · , y) (A.7) defined in the replica space. Next we apply the Fierz transformation and where the X µ 's are defined in equation (8.5). Restricting this equation to the left-upper block, we get We summed over µ's with even string length because only a product of even number of Dirac matrices has a non-zero left-upper block, and we denote the left-upper block of X µ by X U µ . We also used that We can further reduce the number of generators in equation (A.9) by half by noticing that generators of X µ with |µ| > N/2 are related to |µ| < N/2 one-to-one by multiplication of γ c , and half of X µ 's with |µ| = N/2 is related to the other half in the same way. On the other hand γ c is the identity matrix when restricted to upper block, so within the upper block the above-mentioned pairs are identical. This allows us to reduce the number of generators to 2 N −2 = D 2 , and where µ denotes the sum over µ's with even |µ| and |µ| < N/2 if N/2 is odd, and also over half of µ's with |µ| = N/2 if N/2 is even. After the Fierz transformation we thus obtain the generating function Z(x, y) = DφDφ * e i p,k φ * pk ηpz pk φ pk − σ 2 2 1 D pkql ηpηq µ Tµφ * pk ·X U µ ·φ ql φ * ql ·X U µ ·φ pk , (A.12) where σ 2 = N q v 2 is the multi-particle variance and Now we use that This results in Z(x, y) = Da µ DφDφ * e i pk φ * pk ηpz pk φ pk e − µ 1 2σ 2 Tµ tra 2 µ + pkql ηpηq µ a pk;ql We can now perform the Gaussian integral over φ and φ * resulting in the partition function A rescaling of the integration variable a µ → σa µ gives equation (5.1). This is the partition function obtained by Altland and Bagrets [15]. Note that the µ = 0 term in this sum is exactly the GUE result. This derivation raises an important concern. The µ = 0 term in (A.12) consists 4n 2 D 2 N q terms of the form φ i * pk φ i ql φ j * ql φ j pk each with weight σ 2 /2D. However, the expression before the Fierz transformation (A.6) has only 4n 2 D N/2 q such terms from the diagonal Γ α (there are N/2 q of them) and 4n 2 D( N q − N/2 q such terms from the off-diagonal Γ α , each with weight σ 2 /2. Note that all Γ α have only one nonzero matrix element in each column and in each row in the representation we are working in. So in total we have 4n 2 D N q terms of this form in equation (A.6) which is an exponentially smaller number than in equation (A.12), and most terms of the form φ i * pk φ i ql φ j * ql φ j pk in (A.12) are actually canceled by the other terms obtained after the Fierz transformation. Indeed, the dominance of the GUE contribution has to break down for q = 2, where the eigenvalues are uncorrelated. Yet, there is no qualitative difference between the σ-model for q = 2 and q = 4.
For N mod 8 = 0 the γ matrices are real and the spectral correlations are in the universality class of the Gaussian Orthogonal Ensemble (GOE). In this case, the term in (A.6) φ * pk · Γ U α · φ pk (A.17) can be written as After applying the Fierz transformation, we obtain additional terms of the form which is the so-called Cooperon contribution to the GOE result. Note that in order to get a β independent spectral density for the Wigner-Dyson ensemble, we have to scale the variance of the random matrix Hamiltonian as 1/β. For the SYK model we do not have such rescaling and we expect an additional √ β dependence in the saddle-point equation and the corresponding resolvent and semi-circular spectral density.

B Some combinatorial identities
One can easily show the identity, For arbitrary p the sum behaves as If |µ| and |ν| are fixed we have Using this we obtain the identity which can also be shown by applying the Fierz transformation to T ν X ν X ν . To obtain the sixth moment from the σ-model calculation we need the identity ≡ 4T 6 . (B.5)

C.1 One-point function
The GUE partition function for n replicas is given by where the integral is over n × n Hermitian matrices σ. Using that the replica limit of the partition function is equal to one, we find that the resolvent is given by where we have expressed the derivative with respect z in terms of a derivative with respect to theσ kk and partially integrated these variables. We evaluate the resolvent in powers of 1/z and in powers of 1/N . The saddle point equation is given bỹ Expanding around the physical solutionσ that asymptotes as 1/z for large z, and using the saddle point equation we obtain the expansion where the expectation value is given by the sum over all Wick contractions. The terms with an odd number of factors σ vanish and we have not included them in the above. One can easily show that the following terms vanish in the replica limit The two-point correlation function is given by 16) with P kk the projection on the kk block. The differentiation gives other contributions but they do not contribute to the double trace connected two-point function. We evaluate the integral by a saddle point approximation. The saddle point equation is given by In a 12 block notation, the solution with σ 12 = 0 has the replica-diagonal form The propagators and the vertices follow from the expansion of the logarithm This results in the following quadratic part of the action (C.23) The connect part of the second diagram can be evaluated as lim n→0 1 n 2 N 2 trσ 11 trσ 22 trσ 11 σ 12 σ 21 trσ 22 σ 21 σ 12 c (C.24) The sum of the two contributions is equal to (C. 25) which gives the correct result for two-point correlator to order 1/N 2 (see [14]) and the M 1,1 and M 2,2 moments. Note that in the normalization of this appendix,σ(x)σ(y) → 1 for x → y, so that the two-point function behaves as 1/(N 2 (x − y) 2 ) in this limit.

12) by 3-cross-linked examples
To illustrate are general prescription for the evaluation double-trace diagrams (see 6.1) we work out two double-trace diagrams with three cross-links in this appendix. The starting point is the fact that 2 −N/2 Tr(Γ a 1 Γ a 2 . . . Γ am ) can only be 0 or ±1. This implies 2 −N Tr(Γ a 1 Γ a 2 . . . Γ am )Tr(Γ am Γ a m−1 . . . Γ a 1 ) = 2 −N |Tr(Γ a 1 Γ a 2 . . . Γ am )| 2 = 0 or 1. (D.1) This particular simplicity motivates us to shuffle the Γ's to the above "canonical" ordering. The shuffling will introduce phase factors due to the relation where c αβ = |α ∩ β| is the number of common elements in sets α and β. For example let us consider the contraction Tr(Γ a 1 Γ a 2 Γ a 3 )Tr(Γ a 1 Γ a 3 Γ a 2 ), the shuffling (see figure 18) will introduce a phase factor of (−1) ca 1 a 2 +ca 1 a 3 , and hence a 1 ,a 2 ,a 3 In fact in this particular case we can get rid of the phase factor by cyclicity permuting the a 1 a 2 a 3 a 1 a 3 a 2 = ×(−1) ca 1 a 2 +ca 1 a 3 a 1 a 2 a 3 a 3 a 2 a 1 Figure 18. Shuffling a double trace to its canonical ordering (subscripts not summed over).
Γ's in the second trace, so we must have However we want to illustrate a general point beyond this simple example, so we keep phase factors in our discussion. We see in general each intersection among the chords introduces a phase factor of (−1) q+c αβ . We would still like to get rid of the trace in our equation (D.3) in favor of a purely combinatorial term, so that it can be effectively handled by computers.
The key question is, when is 2 −N |Tr(Γ a 1 Γ a 2 Γ a 3 )| 2 equal to 0 and when is it equal to 1? Recall that each Γ is a product of q different Dirac matrices. The necessary and sufficient condition for Tr(Γ a 1 Γ a 2 Γ a 3 ) to be nonvanishing is that every one of the N Dirac matrices occurs exactly even number (including zero) number of times in the totality of Γ a 1 , Γ a 2 and Γ a 3 .
This suggests that a useful perspective will be provided by the d-variables: d a i 1 a i 2 ...a i k is the number of elements common and only common to the sets a i 1 , a i 2 , . . . , a i k (naturally, i 1 , . . . , i k are all different from each other in this definition). In the case of three index sets (k = 3), we have the d-variables {d a 1 a 2 , d a 1 a 3 , d a 2 a 3 , d a 1 a 2 a 3 }. There are also {d a 1 , d a 2 , d a 3 } which count the number of subscripts that occur exactly once in only a 1 , a 2 or a 3 , but they are not independent variables due to the constraint that each index set has q elements. 9 Both the c-variables (c a i 1 a i 2 ) and the d-variables have played an important role in calculating single-trace contractions, and the relations between them were discussed in [62]. Here we cite one diagram that makes their relation clear, see figure 19: c a 1 a 2 = d a 1 a 2 + d a 1 a 2 a 3 ; c a 1 a 3 = d a 1 a 3 + d a 1 a 2 a 3 ; c a 2 a 3 = d a 2 a 3 + d a 1 a 2 a 3 ; (D. 4) In general, c a i 1 a i 2 = d a i 1 a i 2 +  We come back to the case of |Tr(Γ a 1 Γ a 2 Γ a 3 )| 2 . Here d a 1 a 2 counts the number of Dirac matrices of which the subscripts appear exactly in sets a 1 and a 2 , no less and no more. So there the Dirac matrices appear exactly twice in the totality of Γ a 1 , Γ a 2 and Γ a 3 . The same can be said about d a 1 a 3 and d a 2 a 3 . The total number of Dirac matrices that appear exactly twice in the totality of Γ a 1 , Γ a 2 , Γ a 3 is thus d 2 = d a 1 a 2 + d a 1 a 3 + d a 2 a 3 .

(D.6)
On the other hand d a 1 a 2 a 3 counts the number of Dirac matrices that appear exactly three times in the totality of Γ a 1 , Γ a 2 , Γ a 3 . Hence if d a 1 a 2 a 3 = 0, |Tr(Γ a 1 Γ a 2 Γ a 3 )| 2 = 0. We also want to know how many Dirac matrices appear exactly once: Exactly once in a 1 : d a 1 = q − d a 1 a 2 − d a 1 a 3 − d a 1 a 2 a 3 ; Exactly once in a 2 : d a 2 = q − d a 1 a 2 − d a 2 a 3 − d a 1 a 2 a 3 ; Exactly once in a 3 : d a 3 = q − d a 1 a 3 − d a 2 a 3 − d a 1 a 2 a 3 .

(D.7)
And |Tr(Γ a 1 Γ a 2 Γ a 3 )| 2 = 0 also if any of the {d a 1 , d a 2 , d a 3 } is nonzero. Synthesizing everything α 1 α 2 α 3 α 1 α 3 α 2 Figure 20. The single trace chord diagram that has the same intersection structure as that of figure  18, note here all chords are attached to a single backbone as opposed to two in figure 18. discussed so far, we arrive at the formula 2 −N a 1 ,a 2 ,a 3 Tr(Γ a 1 Γ a 2 Γ a 3 )Tr(Γ a 1 Γ a 3 Γ a 2 ) =2 −N a 1 ,a 2 ,a 3 (−1) ca 1 a 2 +ca 1 a 3 |Tr(Γ a 1 Γ a 2 Γ a 3 )| 2 Note for the second equality we replaced the sum over over index sets a 1 , a 2 , a 3 by a sum over numbers d a 1 a 2 , d a 1 a 3 , d a 2 a 3 . In principle there is also a sum over d a 1 a 2 a 3 , but we have argued only d a 1 a 2 a 3 = 0 case contributes. It is clear that in this case there is only one scenario where the three Kronecker δ constraint is satisfied: How about cases with both single-trace links and cross links? Let us consider the example in figure 21, we would need to calculate 2 −N a 1 ,a 2 ,a 3 ,β Tr(Γ a 1 Γ a 2 Γ a 3 )Tr(Γ a 1 Γ a 3 Γ β Γ a 2 Γ β ) =2 −N a 1 ,a 2 ,a 3 ,β (−1) q+ca 1 a 2 +ca 1 a 3 +c a 2 β |Tr(Γ a 1 Γ a 2 Γ a 3 )| 2 . (D.12) Now the d-variables are {d a 1 a 2 , d a 1 a 3 , d a 2 a 3 , d a 1 β , d a 2 β , d a 3 β , d a 1 a 2 a 3 , d a 1 a 2 β , d a 1 a 3 β , d a 2 a 3 β , d a 1 a 2 a 3 β }.
(D. 13) However the constraint of the sum on the right-hand side is still |Tr(Γ a 1 Γ a 2 Γ a 3 )| 2 , just as in equation (D.8), so we may say similar things about the conditions on the contributing summands: the traces with Dirac matrices whose subscripts appear odd number of times in the totality of a 1 , a 2 , a 3 are vanishing. This means d a 1 β , d a 2 β , d a 3 β , d a 1 a 2 a 3 , d a 1 a 2 a 3 β must all be zero. So the list (D.13) is shortened to {d a 1 a 2 , d a 1 a 3 , d a 2 a 3 , d a 1 a 2 β , d a 1 a 3 β , d a 2 a 3 β }. (D.14) We still need to take into account the Dirac matrices whose subscripts only appear once exclusively in either a 1 , a 2 or a 3 , and they are counted respectively by (D. 15) So these must also be zero. We also have c a 1 a 2 = d a 1 a 2 + d a 1 a 2 β , c a 1 a 3 = d a 1 a 3 + d a 1 a 3 β , c a 2 β = d a 1 a 2 β + d a 2 a 3 β .