Multi-trace Correlators in the SYK Model and Non-geometric Wormholes

We consider multi-energy level distributions in the SYK model, and in particular, the role of global fluctuations in the density of states of the SYK model. The connected contributions to the moments of the density of states go to zero as $N \to \infty$, however, they are much larger than the standard RMT correlations. We provide a diagrammatic description of the leading behavior of these connected moments, showing that the dominant diagrams are given by 1PI cactus graphs, and derive a vector model of the couplings which reproduces these results. We generalize these results to the first subleading corrections, and to fluctuations of correlation functions. In either case, the new set of correlations between traces (i.e. between boundaries) are not associated with, and are much larger than, the ones given by topological wormholes. The connected contributions that we discuss are the beginning of an infinite series of terms, associated with more and more information about the ensemble of couplings, which hints towards the dual of a single realization. In particular, we suggest that incorporating them in the gravity description requires the introduction of new, lighter and lighter, fields in the bulk with fluctuating boundary couplings.


Introduction
The Sachdev-Ye-Kitaev (SYK) model [1,2] is a simple tractable model of quantum chaos and holography. The model consists of N Majorana fermions, and a Hamiltonian with all-to-all random interactions of p fermions (the full definition is given in section 1.1). It has been extensively analyzed in the large N limit, both at finite p [3][4][5] and when p ∝ √ N , AKA the double scaled limit [6][7][8][9]. 1 In the fixed p large N limit, the theory at low energies is governed by an effective Schwarzian action, which, via its relation to JT gravity, is a simple toy model of holography [11][12][13]. In the double scaled limit p ∝ √ N , the model also has a Schwarzian regime but it is also related to a q-deformation of random matrix theory [14], and the full large N density of states is given by the q-Gaussian distribution [6,9,15,16].
As we are dealing with a random Hamiltonian in the large N limit, the natural objects to consider are disorder averaged. The simplest of these are expectation values of a single trace; for example we may consider the averaged moments of the Hamiltonian m k = tr(H k ) J , where · J represents an ensemble average over different random couplings in the Hamiltonian, or the averaged thermal partition function Z(β) = tr(e −βH ) J . These teach us about the expected spectral density of the Hamiltonian ρ(E) = i δ(E − E i ) J . We may also consider the correlations between the eigenvalues, which are computed through expectation values of higher trace objects. For example the spectral form factor tr e (−β+it)H tr e (−β−it)H J at long times gives us insight into the nearest neighbor eigenvalue spacing [16]. This spacing follows a distribution given by a random matrix theory (RMT) universality class which only depends on the symmetries of the model [8,[16][17][18][19][20]. The eigenvalues also exhibit long range correlations [21,22], which dominate at shorter time scales, and are the focus of this paper. More precisely, our main objects of focus are the multi-trace thermal expectation values Z(β 1 , β 2 , . . . , β n ) = tr e −β 1 H tr e −β 2 H . . . tr e −βnH J (1.1) (and their behavior at relatively short time scales). Specifically we focus on the connected component, and to a few leading orders in the large N expansion.
The motivation for focusing on the connected correlators is to understand them in the gravitational dual theory. In a gravitational theory these objects usually come about from a path integral over a space-time with multiple boundaries, with the connected components representing a geometry which is connected, in some sort of way, in the bulk [23]. Such a picture arises in JT gravity, and its proposed duality to the β-ensemble RMT models with a Schwarzian envelope density of states. There, the connected components come from topological connections between boundaries, and replicate exactly the topological recursion results of RMT [24].
In contradistinction, we are discussing the full SYK model. There at shorter time scales, the gravitational interpretation is different -the SYK model contains the RMT results, but it also has modifications to the connected components that come from "global fluctuations" of the spectrum [21,22]. These large scale fluctuations are suppressed as some power of 1/N , rather than the exponential 2 −N/2 suppression of the RMT terms 2 , and so are dominant at least at early times. We will focus on these global fluctuations at leading orders in 1/N , and try to fit them into the broader gravitational picture. We note that in the double scaling limit these corrections go like e −c √ N log(N ) , which is exponentially suppressed but still much larger than 2 −N/2 . So at short time scales these are effects which are much larger than geometric wormholes.
Studying the full SYK model is necessary because the JT gravity-RMT correspondence restricts to extremely low energies and very long times, and throws out many interesting effects at finite times. In particular we will see that, properly interpreted, these "global fluctuations" imply the existence of specific ultra light particles in the bulk (or low dimension operators). Furthermore, when one attempts to discuss the dual model at a given fixed realization, the relevant information resides in these additional fields. 3 To put this work in context, we present a brief reference to known results with regards to the SYK spectrum and spectral fluctuations. An additional more detailed comparison is carried out in subsection 3.6 where we use our explicit formulas to compare to existing analytic and numerical results.
Single trace correlators give us the overall shape of the spectrum. For the finite p, N → ∞ they were computed in [1][2][3][4]. In the double scaled limit, N, p → ∞ while keeping N/p 2 fixed, they were computed in [7,9,15,16], and there the SYK model has an asymptotic q-Gaussian spectrum. This q-Gaussian spectrum is a great approximation of the actual spectrum at finite N and p [26][27][28]. The center of this spectrum resembles a Gaussian distribution [8], while the edge of the spectrum resembles the Schwarzian density of states ρ(E) ∼ sinh( √ E − E 0 ) [16,26,29,30]. The nearest neighbor (or a few apart) level spacing in the SYK model follows that of a random matrix ensemble. 4 The precise RMT ensemble (GOE, GUE, or GSE) depends on the values N mod 8 and p mod 4, which give rise to different particlehole symmetries, and have been completely classified in [8,[16][17][18][19][20]. This level spacing statistics results in a universal RMT contribution to the spectral form factor which is dominant at exponentially late times [16]. Apart from these universal contributions, there are also global fluctuations of the spectrum which are less suppressed in the large N limit [9,16,22]. It has been an active area of research in past few years to analytically show the SYK model has RMT spectral fluctuations, as well as to characterize the global fluctuations.
The most straightforward approach to computing the spectral fluctuations of the SYK model is to start with its path integral formulation [3][4][5], and to study the path integral of two replicas of the SYK model [33]. The two replicas path integral describes the double trace thermal partition function tr e −β 1 H tr e −β 2 H . It has been argued that the universal RMT contributions should arise from replica non-diagonal saddle points of this action [16,24,33], thus by finding these saddle points one can try to understand the connected thermal partition function and the spectral form factor. Finding these saddle points both analytically and numerically has been an ongoing area of research [24,[33][34][35][36].
A different method to study the spectral fluctuations is to derive a non-linear σmodel for the SYK model using random matrix theory techniques, 5 as was first done in [21]. 6 This method produces the universal RMT level spacing statistics, as well as corrections around it, and was also studied in [22]. This σ model is also related to the study of spectral fluctuations using supersymmetry techniques [38,39].
We note that this problem can also be mapped to the study of the cummulants of q-Brownian motion [14,40].

Framework and Main Results
The goal of this paper is two fold. The first is to calculate explicit expressions for multitrace correlators in the SYK model at early times, and develop a general perturbative framework for calculating corrections to these observables. The second goal is to start exploring how to interpret such correlators in a gravitational theory. In particular we identify the leading deviations that an observer sees at a given realization of the couplings. Two notions that we will discuss are global modes 7 and fluctuations parameters. Both have to do with specific ways in which the spectrum of the theory can fluctuate over specific energy separations or time scales, although we will focus more on their implications to the gravitational picture.
Of course, if we start with any fixed Hamiltonian then the full set of eigenvalues is tantamount to the full model. But if we are interested in expectation values of simple operators on typical states, or some other coarse grained observables at shorter time scales, we can do with less information which is captured in a coarse grained description of the spectrum. 8 To understand what is meant by a coarse grained description of the spectrum, consider the Hamiltonian in (1.2) for some arbitrary choice of couplings J, and the quantity ρ(E, ) = 2 −N/2 N E− /2,E+ /2 , (2.1) where N E− /2,E+ /2 is the number of eigenvalues in the small interval [E − /2, E + /2]. Then the coarse grained spectrum is given by the limits 9 If this limit exists then it implies that the coarse grained spectrum coincides with the ensemble average spectrum and that fluctuations around this average spectrum are small for any typical Hamiltonian in the ensemble. For a fixed Hamiltonian we can still use this averaged spectrum to compute observables like the thermal partition function, and it will give the correct result up to small differences which depend on the exact realizations of the couplings. We can ask, however, about fluctuations of ρ(E, ) at finite N , and then at infinite N . We expect that in some cases will become a finite random function with some fixed distribution, if we choose the right normalization factor F (N ). We will find that for some fixed positive κ, which depends on the precise scaling of the model at large N .
In other words, as we go over the various realizations with a measure dictated by their distribution, we can write ρ(E) = ρ 0 (E) + N −κ δρ(E) (2.6) where δρ(E) is a random function drawn from some ensemble 10 . All the information on the specific realization is projected into the distribution of this function. Thus we have coarse grained the information in all the J's, but not to the full extent of averaging over all of them. The scaling (2.5) is already a striking statement since in ordinary β-ensemble RMT models, and in the gravitational paraphernalia that goes along with it, the leading coarse grained information is of order (dim H) −1 ∼ e −const·N . So for SYK like models the fluctuations of the spectrum are much much larger. As we discuss below, δρ(E) is the leading global mode.
From the coarse grained spectrum, (2.6), we can compute the leading fluctuations of the multi-energy distribution, ρ(E 1 , . . . , E n ) ≡ ρ(E 1 ) . . . ρ(E n ). These leading order fluctuations are encoded in the connected part of the multi-energy distribution with respect to the average over the random fluctuations δρ. Denoting the ensemble from which δρ(E) is drawn as P({δρ}) the connected part of this multi-energy distribution is given by ρ c (E 1 , · · · , E n ) = N −nκ DP(δρ) δρ(E 1 ) · · · δρ(E n ). (2.7) Next we can proceed in two directions: 1. We may extend these fluctuations to an entire expansion for a series of increasing positive numbers κ. We expect this sum to be asymptotic. However, if we terminate the expansion at some fixed order, say i = 1, · · · , L we can hope that there is some joint distribution of the vector of functions δρ(E) = δρ 1 (E), δρ 2 (E), · · · , δρ L (E) , (2.9) which encodes the leading and some subleading orders coarse grained information on the exact realization of the couplings. We will derive such an expansion for the SYK model in sections 5 and 6.
2. In some cases the distribution is highly degenerate, which is the situation in the SYK model. In this case, the measure on the ρ's is not on a space of functions with an infinite number of parameters. Rather, if we truncate the series at some finite order, the vector δρ is a vector of functions that depend on a finite number of parameters, which we will denote as h i 's. Since these parameters control changes to the entire spectrum we will refer to them as fluctuation parameters, and we will refer to how they deform the spectrum as global modes. We call these deformations global as they encode spectral correlations of eigenvalues with a finite energy separation; this is in contrast to the local nearest neighbor eigenvalue statistics which follows that of a β-ensemble RMT universality class.
For example, we will see in section 4.2 that in the SYK model the leading order fluctuation behaves as an overall rescaling of the spectrum, i.e., we can write with some distribution on h which we denote as P (h). In this case the leading order fluctuation also captures some contributions to smaller terms (larger κ i 's) in the expansion (2.8). The connected leading contribution to the multi-trace correlator will then be where we are subtracting disconnected contribution with respect to the integral over h. In this case the fluctuation parameter is h and the global mode is a breathing mode which scales the distribution. We can now start asking about an observer in a given realization of the couplings, and how they may measure a deviation from the ensemble averaged theory. Actually this whole construction was built to allow us to easily isolate this leading coarse grained information. Indeed equations (2.4) and (2.6) tell us that the leading deviation from the averaged theory is given by measuring a particular deviation δρ(E), or equivalently measuring the particular values of the fluctuation parameters {h i }. The question is whether we can identify this set of fluctuation parameters with partial information about the set of random couplings J.
The answer turns out to be yes. Consider the set of couplings J I , where I is an index set of length p (there are N p distinct couplings), and notice that the SYK Hamiltonian along with the Gaussian ensemble of couplings is invariant under SO(N ) rotations. Thus different realizations of the couplings are parametrized by the orbits of SO(N ), i.e., by the cosets {J I }/SO(N ). (2.12) Another way of parametrizing these cosets is by the SO(N ) invariant combinations of the J's. For example the invariants containing the least number of J's are where ⊕ stands for the XOR operation. Already at the level of 4 J's there are many different invariants. It turns out that the fluctuations parameters that we discussed before are just these h's, hence the leading order fluctuations of the SYK model are characterized by SO(N ) invariant combinations of the random couplings which contain the least number of J's. A more detailed explanation and discussion of this general result happens in section 6. The main implication of these fluctuation parameters is that one needs to introduce corresponding light fields into the gravitational path integral, which participate in generating correlations between disconnected bulk geometries. Such correlations should be distinguished from the standard picture of gravitational wormholes, in which the wormhole connects two disconnected boundaries through a connected bulk geometry. Rather they seem to be related to multi-trace deformations involving these light fields [46][47][48]. This allows for the description of these non-geometric wormholes as a perturbative expansion around the disconnected geometry saddle. This will also be shown using the path integral formulation in section 8. It is puzzling that these nongeometric wormholes can be the dominant term in the multi-boundary dynamics, say, for concreteness, by being the dominant contribution to the spectral form factor at intermediate times, as we will show in section 3.6. This shows that these additional fields must be included in any consistent gravitational dual to the SYK model.
Suppose that indeed all the correlations are multi-trace deformations that live on the boundary of AdS 2 and consider the situation that AdS 2 (perhaps times an additional compact manifold) is obtained as a near horizon limit of some higher dimensional black hole. In this case, the boundary of AdS 2 should be thought of as the transition region surrounding the near horizon region. The suggestion above then implies that more light fields need to be introduced in the near horizon limit, with new interactions at the transition region. An outside observer might see them as new degrees of freedom, beyond standard fields, living close to the horizon.

The outline of the paper
In section 3 we compute the leading correction to the multi-trace moments. We start by considering the moments as a sum over multi-trace chord diagram, which are generated by summing over all Wick contractions of the random couplings. Then we focus on the double trace moments in section 3.1, and show that the leading order correction to the double trace moments comes from the minimal number of contractions between the two traces. This minimal connectivity leads to a general structure for the leading order multi-trace contribution coming from a sum over cactus diagrams, which is developed in section 3.2, and proven in sections 3.3 and 3.4. In section 3.5 we re-sum these contributions, allowing us to give explicit formulae for the leading order contribution to the connected parts of the multi-trace thermal partition function and density of states.
We end section 3 by comparing this leading order correction to previous results of the spectral fluctuations. Specifically in section 3.6 we compare the leading order connected contribution of the spectral form factor to the RMT contribution and to numerical simulations from [16]. We show that this leading order contribution is the dominant connected contribution to the spectral form factor at early times, and even becomes the dominant contribution overall at intermediate times for large enough N .
In section 4 we show that the leading order contribution calculated in section 3 can be recast as a dual 0-dimensional large M = N p vector model. We then show, in section 4.1, that this vector model is actually a theory of the random couplings in the SYK model. This allows us to extend our analysis to couplings that have a non-Gaussian distribution. Finally the gravitational interpretation of this leading connected contribution is discussed is section 4.2, which follows the framework laid out in this introduction.
Section 5 focuses on the first sub-leading contribution to the multi-trace correlators, which is the leading contribution involving odd moments. Similar to the leading contribution, we first compute the double trace moments in section 5.1, followed by a derivation of the general structure of these contributions in section 5.2. We conclude this portion with section 5.3, in which we recast this sub-leading contribution as a new fluctuation parameter which is embedded in a fluctuating effective Hamiltonian.
The general structure of the multi-trace correlations, and their gravitational description, is discussed in section 6. We start by systematically describing the high order corrections to the connected multi-trace correlators as a perturbative collection of fluctuation parameters in section 6.1. This description follows the formalism described is this section. We then go on to interpret the dual gravitational picture by considering life in a single realization of the couplings in 6.2, and the connection of these connected contributions to geometric wormholes in 6.3.
In section 7 we use the multi-trace chord diagram description to calculate the leading order contributions to the connected multi-trace expectations of random operators. Then in section 8 we replicate the leading order connected contributions to the double trace correlators in the replica path integral formalism of the SYK model. In particular we show that the leading order contributions to the double trace correlators can be computed as a perturbative expansion around the disconnected saddle point of the 2-replica action for the collective fields G and Σ. We end the paper with a brief discussion on future directions in section 9.

Multi-Trace Thermal Partition Function
Our goal is to calculate the leading contribution to the connected part of the thermal partition function (1.1), by expanding in multi-trace moments. For example, the connected part of double-trace correlator is given explicitly by where k = k 1 + k 2 . The final expression in this case takes a very simple form and is given in equations (3.3), (3.5) and (3.7). Diagrammatic representation: A convenient diagrammatic representation of this quantity is the following. As the distribution of the random variables J I is Gaussian, we use Wick's theorem to pair the J I 's. We then represent this graphically as 'chord diagrams' (this is similar to the construction in [6], though we stress the following results are valid both in the double scaled limit and in the finite p large N limit). We then proceed as follows: • Each trace is drawn as a circle with k j nodes marked on it, one for each Hamiltonian insertion. 11 • Wick contraction of the coefficients J I is shown by drawing chords connecting the appropriate nodes. These chords can go between nodes in the same circle, or between different circles.
• The connected moments only contain contributions from the pairings that fully connect the different traces. 12 An example of such a 'multi-trace chord diagram' can be seen in figure 1(a,b). Next, we can represent these multi-trace chord diagrams in a more concise way as follows: we represent each trace by a single vertex (instead of a circle), and preserve the external chords connecting the different traces, omitting the internal chords. Only if the resulting diagram is fully connected will it contribute to the connected moment. An example of this is presented in figure 1(c,d). In order to avoid confusion with the vertices in chord diagrams, we represent each trace by a square-like vertex, and refer to these diagrams as 'multi-trace diagrams'.
We will use this compact diagrammatic description to compute the leading order connected contributions to the moments and the thermal partition function. We shall start by focusing on the double trace expectations as an illustrative example in section 3.1. Then, we move on to the general multi-trace expectations in section 3.2.

Double trace expectation value
We start by computing the leading order contribution to the double trace connected moments M c (k 1 , k 2 ) (see (3.1)). We shall do this by considering the number of index sets paired between the two traces, and show that the minimal number of pairings is 2 and that it dominates at leading order. Consider first a single pairing of H's between the two traces. All other multifermion operators Ψ I are contracted within themselves in each trace, and the fermion species in these contractions appear in pairs. This leaves the fermion species, in a single H, unpaired within the same trace, and the result of the trace in the Hilbert space is zero (because tr(ψ i ) = 0). So a single pairing between the two traces gives an exactly vanishing contribution.
Next we may have two pairings with index sets I 1 , I 2 , each of them corresponding to a pairing between the two traces. An example of this is shown in the left hand side of figure 2. In this case the traces do not vanish only if I 1 = I 2 for the same reason as above. Since the same index set I 1 = I 2 appears twice in each trace, the corresponding nodes within each trace are effectively contracted, as shown in the right hand side of figure 2. A priori the effective contractions within each trace are not independent (they are the same index set in the two traces), so naively we cannot do the sum over the index sets in each trace separately. However, since the trace is independent of permutations of the i = 1, · · · , N sites, and we sum over all the other index sets, we have in (3.1) the exact relation N p −k/2 I,J,K,··· tr (Ψ I Ψ J · · · Ψ I · · · ) tr (Ψ I Ψ K · · · Ψ I · · · ) = N p −k/2−1 I,I ,J,K,··· tr (Ψ I Ψ J · · · Ψ I · · · ) tr (Ψ I Ψ K · · · Ψ I · · · ) .

(3.2)
We see that this factorizes into the product of the appropriate single-trace chord diagrams that appear in the expectation value of a single trace, down by a factor of N p −1 (originating from the constraint I 1 = I 2 ). Each choice of two internal diagrams for the two traces comes from 2 k 1 2 k 2 2 choices of which two lines go from one trace to the other, 13 so we get Figure 2: An example of a diagram contributing to the double trace with k 1 = 4 and k 2 = 8.
In [16] and [22], it was found that having k pairings between the traces results in a suppression by a factor which is N p −k/2 for small k, and becomes a flat exponential suppression of 2 −N for large enough k. Thus the dominant contributions come from minimizing the number of pairings between the traces, at least for early times.
There are actually different ways in which more than two pairings can appear. For example, we could have four index sets I 1 , · · · , I 4 connecting the two traces, with one possibility for a non-vanishing contribution being I 1 = I 2 and I 3 = I 4 . 14 Clearly this diagram is subleading because the I 3 = I 4 imposes another constraint on the index sets. But there could also be more complicated situations. A particular case, which we deal with in section 5, is the case in which three sets I 1 , I 2 , I 3 are identified between the two traces. By the non-vanishing trace condition, as before, I 3 is fixed to be those indices in I 1 and I 2 that are not in their intersection. However, since the size of I 3 should be p, we must have in addition that |I 1 ∩ I 2 | = p/2, giving a further suppression. This effect is subleading in the partition function, but it is the leading effect for traces with odd numbers of H's, and hence can be isolated. But generally, the minimal number of pairings between the traces dominates the connected component, and hence the leading contribution is equation (3.3). We note that this is not a new result, and was derived several times before [9,14,22].
It will be convenient to organize the corrections that we calculate in terms of the small parameter We can now exponentiate (3.3) to arrive at the leading order contribution to the connected double trace thermal partition function Here Z(β) is the expected thermal partition function of the SYK model. If we consider the SYK model in the Schwarzian regime then Z(β) is the Schwarzian thermal partition function [3,29], while if we consider the SYK model in the double scaled limit then Z(β) was calculated in [6,7]. But, of course, equation (3.5) holds throughout all energy ranges, and for any length p of the Hamiltonian (as long as the couplings are Gaussian). We can also compute the connected spectral density function directly from (3.3) to be where ρ 0 (E) is the averaged spectral density function of the model.

Multi-trace expectation values and cactus diagrams
Next we would like to see what are the leading contributions for a general number of traces when considering the connected part after disorder average. We will see that these corrections can be described concisely via sums over 1PI 'cactus' diagrams (to be defined below).
We again represent the different possibilities for chords connecting different traces by diagrams, as described above. In these diagrams, every trace is represented by a vertex, and the edges are the chords that go between the different traces. The computation in the previous subsection, for the leading order contribution to the connected double trace, just corresponds to the diagram in figure 3. Another example, which contributes to the three traces connected correlator is shown in figure 4. Because of the further suppression when having more than two external contractions in a trace, as was the case in the double trace, one could expect that only nodes of degree two will be present at leading order (like the diagram in figure 4), but this is not the case as we will soon see.  To explain this, we will show that we can consider each diagram as a "Feynman diagram" in the following way. Each index set I represents a momentum, r, valued in Z N 2 , so that r i = 1 if i ∈ I and otherwise r i = 0 (where i = 1, · · · , N ). The condition that at a given vertex the I's are such that each index i appears an even number of times (i.e., the non-vanishing of the trace) is exactly r (j) = 0 for r (j) being the incoming momenta to this vertex, and the equations are valued in Z N 2 , which is precisely momentum conservation at the vertices. We see that this is a direct analogy to Feynman diagrams.
The first claim we would like to present is that in the connected average of V traces, the leading behavior is V −1 . To show this, recall that for every chord we have one explicit factor of coming just from the normalization of the Hamiltonian, so overall this gives a scaling of E where E is the number of edges. 15 Similarly, each free index set I (that we sum over) gives something of order at most I 1 = N p = −1 . The question then is how many free momenta, i.e., index sets, we have.
Each vertex gives one momentum conservation equation, and of course as usual one vertex gives just overall momentum conservation of the external lines, that we do not have here; so it just gives δ(0) and no additional constraints. So we have V − 1 constraints on the momenta, so long as the graph is connected. 16 Now, without further restrictions, all that those constraints would do is to eliminate precisely V − 1 of the free momenta. However, an important point is that our momenta are in addition restricted to satisfy |I| = p. If a particular momentum conservation boils down to be setting the sum of two momenta to be vanishing, this |I| = p restriction does not change the counting. However, this restriction can give further constraints if more momenta are involved. For instance, for three momenta I 1 , I 2 , I 3 incoming to a vertex, momentum conservation indeed fixed I 3 to be I 1 ∪ I 2 minus their intersection; this is what was taken into account in this counting. However, because |I j | = p, not only is I 3 fixed, but moreover I 1 and I 2 are now restricted to satisfy |I 1 ∩ I 2 | = p/2 in order to have a solution. As was mentioned, we will see that this still does not mean that at leading order we can have only vertices of degree two. What it does mean is that taking into account all the possible momenta conservations, if we cannot express them as setting only sums of two momenta to be zero at a time, we will obtain a further suppression (that is, the suppression will be greater than just fixing V − 1 momenta) and the result will be subleading. This point will be essential below.
With at least V − 1 constraints we get at most E − V + 1 free momenta and thus a suppression of at least It is simple to check that the circular diagrams of the form of figure 4 indeed saturate this bound. This justifies the claim above that this is the leading order connected contribution for V traces.
To arrive at the main claim of this section, let us use the following graph theory terminology. A path is a sequence of vertices v 1 − v − · · · − v 2 connected by edges such that all vertices (and edges) are distinct. A connected graph is a graph where every two vertices are connected by a path. A graph cycle is a sequence of vertices v 1 − v 2 − · · · − v n − v 1 connected by edges, such that we do not visit an edge twice, nor a vertex twice except for the first and the last one.
The graphs saturating the V −1 leading behavior, and so those which contribute at leading order, are connected 1PI graphs, 17 such that any pair of cycles have no edge in common (equivalently, any two cycles have at most one vertex in common.) In graph theory, connected graphs in which no pair of cycles share an edge are known as cactus graphs (or cactus trees) [49]; for examples, see figure 5. The claim is then that the leading order contribution to the connected moments scales like V −1 , and comes only from the connected 1PI cactus graphs. The restriction to 1PI diagrams in the claim above is easy to understand using the Feynman diagram structure. To show this, consider a graph that is not 1PI, where e is the edge that connects two connected components that are otherwise not connected to each other (see fig. 6). Since e is the only external leg of the connected components, by momentum conservation of each of these components we see that r(e) = 0, which is a contradiction as |{i : r i (e) = 1}| = p. Thus the diagram's contribution must vanish identically. Moreover, we can give a simple characterization of these diagrams that give the dominant contribution. A generic 1PI cactus diagram is just a tree of "bubbles" attached at vertices. Along the edges, there can be arbitrarily many vertices of degree two (where each bubble contains at least two vertices). A generic example is shown in figure 7. This fact, as well as the dominance of 1PI cactus diagrams, are proved in the following two subsections.

General structure of 1PI cactus diagrams
In order to characterize the general structure of 1PI cactus diagrams, we should concentrate on vertices of degree greater than 2, since as we saw a vertex of degree two only imposes that the two incoming momenta are equal (in Z 2 ). Therefore, consider a vertex v of degree > 2 in a 1PI cactus diagram, with edges e 1 , e 2 , · · · .
Consider going from v along e 1 , denoting the other endpoint by v 1 (we denote this by v e 1 − v 1 ). Then necessarily we can complete it to a cycle ending on v. Indeed, erase e 1 -since any non-vanishing graph is 1PI -it is still connected, so we have a path from v 1 to v not going through e 1 ; it necessarily completes to a cycle v − v with e i = e 1 (it is indeed a cycle). Now start from some other e j = e 1 , e i and do the same, getting another cycle ending on e k , with all v 1 , v i , v j , v k being distinct because no two cycles share the same edge. Therefore we see that v looks like a flower, with all its edges forming distinct cycles. In particular the degree of each vertex is even for 1PI cactus graphs (this is not true for a general cactus graph).
Every node on each of these cycles can be of degree two, or it can be of higher degree. In the latter case, we saw that it can have further cycles emanating from it. But let us show that they are only of the form we met, i.e., distinct cycles. Consider the first such vertex v 1 of higher degree on one of the cycles C 1 that we found that encircles v; see fig. 8. Let e be an additional edge of v 1 . Deleting e, since again the graph is 1PI, we can find a path P from v to v. If this path does not share a common edge with C 1 , then we get the cycles C 1 and P − e − P which have a common edge (where P is the path from v to v 1 , see the figure), being a contradiction. Otherwise, the path comes to C 1 . It cannot cross before v 1 since v 1 is the first higher degree vertex. If it crosses C 1 after v 1 , then we can still get two cycles C 1 , and C 2 with a common edge, as shown in fig. 9. Figure 9: The case described in the analysis in the text, where we get two cycles C 1 and C 2 with a common edge.
The only remaining possibility is that the path gets back to v 1 , meaning it is a cycle. This is possible, since the analogous construction of C 2 would not give a cycle. 18 Applying the entire same argument with v replaced by any other vertex on these cycles, we get that every such cycle can have more distinct cycles, or 'bubbles", emanating from its vertices (that is, we can attach to any such bubble more bubbles), but these are the only cycles that we have. In particular, the most general non-vanishing cactus graph (which is 1PI and all vertex degrees being at least two) takes the form of the example in fig. 7. More precisely, as claimed, it is a tree of bubbles attached at vertices, where any additional vertices of degree two can appear on edges. There is nothing more than those bubbles; this is because as we saw, every edge belongs to a cycle necessarily, and we mapped all the possible cycles. 19

Dominance of cactus diagrams
We saw that contributions to the connected moments cannot be larger than the scaling V −1 . We will now prove the claim from section 3.2 that a connected graph saturates this bound if and only if it is a 1PI cactus graph. (As we saw, the contribution of non-1PI graphs vanishes.) In order to prove this, let us recall when the bound V −1 was tight. At any vertex, we have momentum conservation stating that the sum of the incoming momenta vanishes. However, the important point is that when taking into account momentum conservation in other vertices as well, we may find out that not only the sum of all momenta vanishes, but also there are subsets of momenta where the sum of momenta vanishes in each subset separately. 20 After taking all the momentum conservation into account, when reducing those subsets to be of the smallest size possible, the question is whether there is at least one subset containing three or more momenta. If that is the case, then we saw that we will get a larger suppression than V −1 . This is because the momentum conservation not only fixes one momentum (giving the counting that lead to V −1 ) but also constraints the others. For example, in the case of a subset of three momenta corresponding to chords I 1 ,I 2 ,I 3 , not only that I 3 was completely fixed, but |I 1 ∩ I 2 | = p/2 necessarily, which gave a further suppression. If on the other hand, all momentum conservations lead to subsets including only pairs of momenta, we will have the V −1 leading behavior.
Let us start with the direction ⇐, showing that all cactus diagrams scale as V −1 . Consider a vertex v of degree > 2 with edges e 1 , e 2 , · · · , as in the previous subsection (if its degree is 2, then necessarily it gives a momentum conservation involving only two momenta). If we show that any such vertex has no further suppression, then we will get the minimal V −1 .
Given the general structure of cactus diagrams that we found in the previous subsection, this follows immediately. Indeed, because of the bubble structure found, as we saw the momentum on e 1 equals that on e i (in the notation used in the previous subsection), the momentum on e j is the same as on e k , and so on. Thus at each vertex we actually get that the sums of pairs of momenta vanish based on these bubbles, and we have the minimal V −1 suppression.
In the other direction ⇒, let us show that if we have two cycles with a common edge, we get a further suppression and so no V −1 behavior. That is, we must show that there is a minimal subset of momenta satisfying momentum conservation, containing at least three momenta. v e r r 2 r 1 Figure 10: Two cycles with a common edge.
Consider a pair of two such cycles with a common edge. Since they are different, we can find a vertex v of degree > 2, having edges with momenta r, r 1 , r 2 , · · · ; see fig. 10. In order to construct a minimal set of momenta satisfying momentum conservation in a well defined way, let us apply the following painting procedure: • We go from v in the direction of the common edge e with momentum r, painting the edge e.
• In each step there are the vertices that were touched by the edges we painted in the previous step -in the next step we apply momentum conservation, painting all the rest of the edges of these vertices that are not already painted.
• We apply this painting procedure at each vertex we come across other than v. 21 Therefore we will stop (necessarily after a finite number of steps) when there is no vertex having both painted and unpainted edges, with the only allowed exception being v.
We should think about this procedure as an equation, where the sum of the momenta painted in the previous step equals the sum of the newly painted momenta. 21 We never get back to a vertex we visited before, since all of its surrounding is already painted.
We can do this process in any order we would like with the same result. Therefore, we can just as well go first along the two cycles. We thus get r 1 and r 2 necessarily. After we complete all vertices other than v, we may get some additional edges of v. Thus we found a subset of momenta satisfying r + r 1 + r 2 + · · · = 0. By the procedure we did, there is necessarily no smaller subset containing r that satisfies momentum conservation, while this subset includes at least three momenta. This means that this diagram is suppressed more than V −1 as we saw. An example of this construction is shown in figure 11. This completes the proof showing the dominance of cactus diagrams. r r 1 r 2 r 3 r 4 r 5 r 6 Figure 11: Applying momentum conservation for a common edge to two cycles. In this case we get r = r 1 + r 3 + r 4 = r 1 + r 2 . Here the resulting equation includes only r, r 1 , r 2 without r 5 , r 6 , but in other cases we may get more momenta. This diagram scales as 4 .

Partition functions and density of states correlations
With the understanding of which graphs to consider, we move to calculate the leading order contribution to multi-trace moments and thermal partition functions. Let us first consider the contribution of each cactus diagram to the connected moments M c (k 1 , · · · , k n ). Recall that from the disorder average, we have the contractions that link the different traces, as shown in the cactus diagrams, and the remaining contractions will form the usual chord diagrams in each trace separately. From now on we restrict to only cactus diagrams. For these, every full multi-trace chord diagram will be multiplied by two factors: (1) a power of according to the number of constraints, which as we saw is V −1 for a cactus diagram, and (2) a combinatorial factor that counts how many multi-trace chord diagrams result in the same internal chord diagrams for each trace (one such example appears in figure 2); this contribution is described precisely below. As we sum over the chord diagrams inside each trace, we get that M c (k 1 , · · · , k n ) is i M (k i ) times the two factors mentioned above. All that remains is to describe the combinatorial factor in item (2). For every vertex i of degree d i in a cactus diagram, each of the d i /2 inner chords will be contracted to other vertices. So first we need to choose which chords participate in the multi-trace contraction, which is k i /2 d i /2 options. Clearly, there are (d i /2)! options to choose to which of the chords in each vertex we contract. Then we need to contract individual J's (and not only chords) between the different traces. So what remains is simply to note that there are two ways to contract two chords. Similarly, for a cycle made of n vertices of degree 2, there are 2 n options to contract the chords. Therefore we should assign a factor of 2 for every edge in the multi-trace diagrams. The exceptional case n = 2 is accounted for by recalling that it has a Z 2 symmetry, and we should divide by a symmetry factorŜ for each diagram. For a single such component of n = 2 we have a factor of 2 contribution toŜ giving indeed the correct counting. In these kinds of cactus diagrams where the vertices are labeled, these Z 2 symmetries are actually the only symmetries we have.
To summarize this, we have whereŜ is the symmetry factor of each diagram, as described above. Now let us consider the connected thermal partition function (remembering from the beginning of this section that all d i are even) (3.10) The cactus diagrams in the last formula are for fixed vertices associated to chosen k 1 , · · · , k V (that is, the vertices are labeled). Therefore there is a large degeneracy in this formula, where we will need to include many different labelings of the same cactus diagram. Instead, it will be more convenient to have only one cactus diagram of each kind in the sum, without including permutations of the vertices. We can take care of this degeneracy by including a factor of V !, and then diving by the symmetry factor related to vertex permutations, as usual in Feynman diagrams; this symmetry factor will be included as usual in the symmetry factor S of unlabeled graphs. Therefore we can write this as (3.11) Let us clarify this formula. As mentioned, we want to count each distinct cactus diagram just once. Then, in this formula, we fix an assignment of the β i 's to the vertices of the cactus diagram in an arbitrary way. Of course, at the end, we know that the partition function is symmetric in the β i 's, so the obtained expression in this formula is implicitly understood as symmetrized in the β i . As mentioned, the over-counting that can happen in specific diagrams is accounted for as usual by the symmetry factor S (which is now enlarged with respect to the previous case ofŜ since the vertices are not labeled, similarly to the usual internal vertices in a Feynman diagram). We give an example demonstrating how to use this formula below. Finally, we can translate this to the joint density of states as well 22 We note that all these results hold to leading order in N , or equivalently in . Furthermore, we again stress that the average density of states ρ 0 (E), thermal partition function Z(β), and moments M (k), depend on the scaling and limit one considers the SYK model in; but the result itself is universal and valid in any large N scaling, either the fixed p or the double scaling limit.
The formula (3.12) for the connected density of states can be written as a superposition of integer (positive or vanishing) powers of the dilation operator E ∂ ∂E for the various energies acting on the leading order density of states ρ 0 .

An example
As an example consider a diagram with V vertices where one vertex is of degree 4 and the rest have degree 2. It is not hard to see that the only such connected diagrams are of the form shown in fig. 12 (and they are cactus diagrams). The contribution to ρ c of such a diagram for a fixed number k of degree 2 vertices on one of the circles is (3.13) 22 Note for this equation, that ρ 0 is even.
where the operator D k acting on E i is The symmetry factor is just 4 because each cycle has a Z 2 symmetry. 23 We wrote the result explicitly, but it is simpler to write it in the symmetrized form as instructed in (3.12). This means that we can simply forget the superscripts and just write this as Again, this formula is understood by distributing arbitrarily the V energies in the V factors of D k , and understanding the expression as symmetrized in the energies.

Time scales and comparison to RMT fluctuations
We have written the leading contributions to connected multi-trace correlators using dilatation operators acting on the spectrum. Actually this is the first out of a whole series of transformations on the spectrum (we discuss the next term in section 5.1 and comment on the others in section 6.1). Each of them has an amplitude, and may become important in a different range of energies, or time scales. In this section we make some comments on the time scales associated with the leading operator, and compare to numerical data. 24 It is a well known empirical fact that the statistics of the nearest neighbor eigenvalue spacing in the SYK model is the same as that of a random matrix ensemble (RMT).
The exact ensemble (GOE, GUE, or GSE) depends on the particle hole symmetry class, leading to a complete classification of the RMT ensemble based on the values of N mod 8 and p mod 4 [8,[16][17][18]. This universal level spacing statistics is given by an exponentially suppressed term in the double trace spectral density, ρ(E, E ), which dominates at energy separations of the typical eigenvalue separation E−E ∼ 2 −N/2 (the first term on the RHS of equation (3.16)). The perturbative moment method we used does not allow us to find this universal term as it is non-perturbative or exponentially suppressed in N . 25 The range of accessible moments implicitly determines the scale of energy separation that we can probe. If we can reliably compute moments up to the k'th moment, then we can resolve energies (and energy separations) up to 1/k of the energy range. However, the approach above does not necessitate the explicit evaluation (or approximation) of any moment -rather we write the connected part as a dilation operator acting on the moments, for which we can take the exact value at each k. So the bottleneck question is up to what order is the operator reliable (in amplitude or in energy range).
Generally, explicit evaluation and re-summation of moments is reliable for finite k (in the limit of N → ∞). But since we can act with our operator on the exact partition functions we are not limited by this. We can therefore hope that the leading contributions to the connected double trace spectral density are valid down to perhaps polynomially small energy separation of order E − E ∼ N −#p . Actually there are arguments that in this case the situation is considerably better, and that the connected double trace spectral density computed by the moment method is correct up to exponentially small energy separation (but still much larger than the typical level spacing of 2 −N/2 ).
The authors in [21] used a sigma model to compute the double trace spectral density and argued that at small separations ω = E − E ∼ 2 −N/2 this spectral density consists of a random matrix part plus a one loop correction. The one loop correction can be written as a sum over massive modes which are suppressed by a power law in N : (ω) is the contribution from random matrix theory, ∆ = 2 −N/2 is the average energy spacing in the bulk of the spectrum, and (k) = T −1 k − 1 are the masses of the These massive modes become important at level spacing of the order ω ∼ N # 2 −N/2 , which are still exponentially suppressed but much larger than the RMT scale. Furthermore, a calculation in [22] showed that the leading term in the massive modes agrees with the leading term moment calculation, while a two loop calculation in the sigma model scales like the next leading term in the moment expansion. Thus there is some evidence that we should be able to trust the perturbative moment series up to energy separations that are exponentially small in N , and not just suppressed in powers of N .
(A caveat to this is that we were not been able to precisely match (3.16) to our formulas. There are two related reasons for this -the first is that (3.16) smears over the average energy, and the second is that it is more reliable at the center of the distribution, where the σ-model is defined. If we use our formula there, around the point E = 0, then the leading correction is just a scaling 1 + 1 We can translate the energy separation to a time scale by considering the spectral form factor [50][51][52] The late time behavior of the spectral form factor contains a ramp and a plateau [16,33], which is described by the appropriate β-ensemble. This ramp dominates the spectral form factor at exponential times which was approximated in [16] as t dip ∼ e S 0 /2 , where S 0 ∝ N is the zero temperature entropy in the large N limit (we will see in (3.20) that t dip gets modified when including the effects of the global modes). Thus we expect that the perturbative moment expansion is relevant up to these exponential times.
To be a little more precise, we mentioned before that the leading correction to the multi-trace correlator, which we computed so far, is only the first of an infinite (in the large N limit) set of corrections. When we say that the moment expansion is relevant up to the time scales above, we mean the sum of these terms. However, for the purposes of comparisons at finite p and N , which we do next, we use only the leading term computed above.
From our double trace moments we can approximate the known contributions to the spectral form factor as where the first and third terms are from [16] -the first term is the disconnected contribution and the third term is the RMT contribution given in equation (42) there. 26 The second term is the leading order contribution calculated in (3.5). N d is a symmetry factor counting the degeneracy of each energy level (for even N this factor is N d = 2 for N mod 8 = 0, and 1 for N mod 8 = 0). The normalized spectral form factor g (given in (3.17)) can similarly be written as g ≈ g d + g 2 + g RM T , where g d is the disconnected part, g 2 is the leading order correction and g RM T is the random matrix theory contribution. In figure 13 we present a plot of the approximate spectral form factor and its three contributions, which seems to match numerical computations of g and the connected part g c based on data from [16]. See appendix A for more details on how this comparison was done, as well as further comparisons for N = 28, 30, and 32. Figure 13: The approximate spectral form factor as a function of time for N = 34 and p = 4. g = g d + g 2 + g RM T is the full spectral form factor, g d is the disconnected portion given by the first term in (3.18), g 2 is the leading connected portion given by the second term in (3.18), and g RM T is the universal random matrix contribution given by the last term in (3.18). The connected spectral form factor g c = g − g d is also shown. These are matched to the numerical computation of g and g c from [16].
A more careful comparison of our g c and the numerical g c shows that they match with a typical deviation of less than 10%. The maximal deviation is around 35%. This is also to be expected -jumping ahead, the size of the first subleading correction is given in (5.2). If we plug in p = 4, N = 34, and compute the ratio between (5.2) and the leading correction we obtain a deviation of 33%, so this is the best that we can hope to do for this value of N . In the holographic Schwarzian regime, using (3.6), we can approximate the second moment contribution to the spectral form factor as This term at late times decays as g 2 ∼ t −1 , which is slower than the disconnected term g d ∼ t −3 . Thus we expect there to be some intermediate time frame where this term is the leading contribution to the spectral form factor, with a crossover time t c ≈ 2/( E 2 0 ). This crossover region is not seen for N = 34, p = 4, as the crossover time is very similar to the dip time, however for larger N this intermediate region should exist. We present an example of such crossover region in figure 14, where we plot the expected spectral form factor for N = 60. Numerical verification of such crossover region may be within reach with the recent advances in numerical techniques that have been used to calculate certain correlation functions numerically for N = 60 [53].
This slower decay of g 2 compared to the disconnected spectral form factor also implies that the dip time (defined as the time of transition from the decay to the ramp) will be larger. As mentioned above, the dip time as inferred from the disconnected contribution behaves as t dip ∼ e S 0 /2 [16]. Taking into account the correction from g 2 and finding the time where it crosses the ramp behavior, we get (where we show the scaling, dropping overall constant and β factors). This is a much later time compared to t dip . In fact, its exponential entropy dependent term is the same as that of the plateau time t p ∼ e S 0 + C 2β found in [16]. They are still exponentially separated in N , in a temperature dependent way, owing to the C/(2β) term (as C is linear in N ).

Dual Vector Model
In this section we will take the Feynman rules that gave us cactus diagrams as the leading order contribution to the moments of the Hamiltonian, and show that they are equivalent to a zero dimensional vector field theory. We then show that this dual theory is in fact closely related to a theory for the random couplings, which allows us to extend this analysis to non-Gaussian distributions for the random couplings.
To obtain the vector model, we first go over the Feynman rules we found for evaluating a cactus diagram, and extend them to any Feynman diagram such that only cacti remain to leading order. The general rules for cactus diagrams are as follows: 1. Each vertex represents a single trace, with k Hamiltonian insertions.
2. Each line connecting vertices is a propagator with the scaling of .
3. Each closed loop comes with a value of 1/ from the sum over index sets.
Under these rules, a cactus diagram with n vertices is of order n−1 .
These rules are the same as for a large M = 1/ 0-dimensional vector model. We can think of each trace as a vertex, and associate to each index set I a dual scalar vector field φ I . Specifically a trace with k insertions becomes a single vertex with a value M (k) Furthermore, we take the propagator for φ I to be Gaussian, with In this large 1/ vector model we see that propagators come with a factor of , as desired, and closed index loops indicate a summation over all indices and thus give a 1/ . It is simple to check that the leading order term of any number of traces in this theory will be 1, which is achieved by contracting all pairs of identical index sets φ I from the same trace with each other. This corresponds to the leading order disconnected moment, and indeed it has the right value of 1 (times the disconnected contribution). The leading order connected piece is indeed cactus diagrams, as before, and the combinatorial factor it gives is identical to the one above as the counting of the number of diagrams is exactly the same.
We can also translate the multi-trace chord diagrams to this large 1/ vector model, by simply thinking of each chord as a φ propagator and each trace as a vertex. An example of this procedure can be seen in figure 15.
The complete joint moments (to leading orders) can be calculated using this dual model as where C is the normalization C ≡ dφ I e − I φ 2 I /(2 ) . We will denote the expectation in this vector model by so that 1 φ = 1. Then the connected expectation is defined recursively as in (1.6), only now with respect to the expectation over the dual fields φ I . This dual vector model gives the correct leading order behavior of the connected moments. We can also exponentiate the moments (4.3) to get the joint thermal partition function as an expectation in this vector model (4.5) To calculate the leading order connected contribution we consider only the connected parts, or joint cumulants, on both sides of (4.5) with respect to the expectation over the φ I 's.

The vector model as a theory for the couplings
Already in the dual model results for the moments and the thermal partition function, equations (4.3) and (4.5), it is apparent that the expectation over φ I 's is similar to the expectation over the random couplings J I . Not only do they correspond to the same index sets I, but they also have the same Gaussian distribution (once we re-scale the 1/2 factor in the Hamiltonian). In this subsection we will make this equivalence exact. As a consequence of this, we can extend our results to non-Gaussian random couplings. Throughout this subsection we will assume that the random couplings are independent identically distributed (i.i.d) with zero mean, unit variance (by their definition, with J extracted), and bounded moments (but not necessarily Gaussian). We can also easily extend to other cases as we comment below. Under these assumptions the SYK model is self averaging and independent of the exact distribution of the random couplings in the large N limit [9,15]. 27 As we will see, multi-trace connected expectations do however depend on the exact distribution of the couplings [54], and so we should consider different distributions for the couplings when computing connected multi-trace expectations. We will further assume in what follows that the distribution is even to make the computations simpler, though this is not strictly required.
The idea is that at the leading orders we can perform the trace first, before the expectation over the random couplings, and what remains is a theory for the couplings. We can do it perturbatively around a Gaussian model. This is done as follows. Consider a particular contraction of J I 's in the averaged multi-trace. The argument here is valid only for the contractions that are leading in N for a given connected component with an even number of insertions. As derived above, at leading order in N all the J I 's in a given trace will eventually come in pairs. As a result we can first evaluate each trace as a chord diagram, giving us a factor of the disconnected moment. There are various different contractions of the couplings that will give the same internal chord diagrams. At leading order in N , the same set of contractions is obtained by considering the expectation value of product of terms of the form J I 1 J I 2 · · · J I 1 · · · J I 2 · · · for every trace, with ordering according to the chord diagram in this trace. (That is, we impose the ordering of the chord diagrams using large N .) But this means the coefficients coming from the J's of all chord diagrams are the same, and are just ( I J 2 Note that in the traces over the fermions there are constraints on the indices, which just give the overall suppression by a power of , as explained before.
For clarity, let us give a simple example where we show this explicitly. We can do the following sequence of equalities for the particular contraction shown here, with an implied summation over index sets: Let us repeat the steps. In the first equality, we used the fact that in the leading order in N contractions, the J I 's come in pairs. This is the crucial step from which the argument already follows. But to be explicit about the chord diagrams we can continue with the evaluation. In the second equality we separated the indices of the J I 's and the fermions Ψ I 's, and it results in the explicit binomial coefficient (the variances from the contractions are the same in the second and third lines). In the third equality we used once again the argument in (3.2) by which we can make the indices in the traces independent, with the appropriate binomial suppression. Therefore, we see that We note that (4.7) only holds at leading order in N for each connected component. As (4.7) matches the expression for the moments from the dual vector model, (4.3), it follows that the φ I 's of the vector model are really the random couplings J I , up to a normalization of −1/2 . The distribution of the couplings J I could include higher moments (independent of ) and the perturbative analysis goes through, with the result (4.7), where the average over J given with the corresponding distribution. We analyze a simple example below. We could just as well consider couplings that are not independent, and the result remains the same. The difference is that the scaling with of the higher moments should be taken appropriately, and it can be chosen so as not to affect the single trace moments.

An example of non-Gaussian couplings
As a simple example of such a computation with a non Gaussian distribution, consider adding a 4-point interaction term I u 4! J 4 I to the distribution of the couplings. This adds a new vertex with 4 identical φ I fields to the vector model. Notice that all the moments of J I for fixed I are independent of N and uniformly bounded (in the large N limit), so we have universality of the single trace quantities, in the sense that they are independent of u. For example, Z(β) does not depend on u in the large N limit but Z c (β 1 , β 2 ) does.
Consider the double-trace moments, and let us consider them as a perturbation theory in u at leading order in (or at each order in ). At leading order, we have the diagrams shown in fig. 16, where the notation is as before, and in addition by non-filled squares we denote the perturbations (recall that the filled ones stand for the traces). The first diagram is the only one we had before, and gave us something of the order . The second diagram goes as 2 1 u and so is also proportional to . 28 We see it affects the (connected) double trace at the leading order in . Its effect on the disconnected part is subleading. Thus the double-trace moments up to order u (in the convention above) is (4.8) 28 The first factor of 2 comes from the insertions, and the 1/ is from the sum over the free index I. Note that we do not allow self contractions of the interaction vertex.
We can continue in perturbation theory in the standard way to find higher order corrections in u, or add additional interaction vertices for the couplings which can also be taken into account in a similar manner. Thus the vector model is a powerful tool to find the leading order multi-trace correlates for arbitrary distributions of the coupling. 29 We would like to emphasize the point that M (k) does not depend on u, but M (k 1 , k 2 ) does. I.e., single trace quantities are universal, while connected multi-trace quantities are not, and even the leading corrections change as the distribution changes. This means that at the level of a single trace there is a single gravity description (independent of the distribution of the couplings) but the gravity description of the theory with several boundaries is not universal at the time scales that we are interested in. We would like to interpret the results above in the gravitational dual theory (going back to the Gaussian distribution for the couplings). In equation (4.3) we wrote (to leading order) the connected multi-trace correlator as a simple integral, where in the integrand we have the single trace quantities after the ensemble average. Therefore, to obtain the gravitational interpretation we just need to rewrite each trace in terms of the gravity dual, interpret the result, and we are done. The only twist comes in the interpretation stage. The integral that we carry out is over N p quantities, and these do not necessarily have a meaning in gravity, which is the effective theory of the averaged quantities (in the SYK context). We would therefore like to replace this integral by a "compressed" version. In other words, we would like to find a set of "minimal" modifications to the gravity action that produces the full cactus expansion for any number of traces. We therefore begin by "minimizing" the vector model to the smallest possible set of degrees of freedom, which is just the fluctuation of 29 So long as the distribution satisfies the standard requirements of being bounded with a normalized variance, so that the single trace expectations are universal and self averaging.
I . This is similar to a single instance of the average over α states discussed in [23,61]. This is straightforward. Starting from equation (4.5), we can go to radial coordinates for the φ I 's with h 2 ≡ I φ 2 I , and integrate the N p − 1 dimensional sphere. Then we see that the joint thermal partition function is simply 2 /(2 ) , and P h 2 (h 2 )dh 2 (implicitly defined by the equality) is a probability measure on the fluctuation parameter h 2 .
From here we can do a simple saddle point calculation of this integral. The large 1/ action for h 2 has a saddle point at h 2 = 1 and expanding around this saddle point gives the cactus diagrams we saw before. For example, the leading order connected two trace thermal partition function is obtained by setting h 2 = 1 + √ ρ, and expanding the integral for small .
The important point is that Z is already the ensemble averaged single trace partition function. That is, and the non-trivial connected multi-trace correlator is induced only by the fluctuation parameter h 2 . In the 2nd equality we emphasize that it is also the partition function in an ensemble where the average size of the couplings is rescaled by h 2 .
The main point now is that if the ensemble averaged theory has a gravitational description, then we can replace each Z in (4.9) by the gravity expression for the quantity, i.e., In the SYK model that we are discussing, this can be concretely one of the following • H grav = H Schwarzian in the low energy limit. For example we can take 30 H Schwarzian = H Liouville (as in [55,56]).
• H grav = T double scaled where the latter is the transfer matrix of the double-scaled SYK model (as in [6]). 30 with an additional constraint on the initial and final state In any case, if we think about (4.11) as coming from some Euclidean path integral over some geometry, then each space is multiplied by a fluctuating parameter which rescales the entire action. Another way of phrasing this is to define an effective Hamiltonian and an effective partition function Equation (4.11) is perhaps familiar from the discussion of wormholes [20,23,24,43,[57][58][59][60][61][62][63]. There too, wormholes are equivalent to fluctuations in parameters in the theory. There are, however, some differences.
The first is that there are no wormholes. In fact, this effect is considerably larger than anything that can be obtained by a smooth 2D surface with multiple boundaries. Smooth surfaces will introduce a connected component which scales like where κ is a finite number which depends on the topology of the surface (and N is the number of fermions). Here the size of the effect is for the double scaled SYK model. In particular, it is a perturbative effect for finite p, which is perhaps an irrationally large value. This is of course just the power of N in which we will expect to see the effects of replicas. The other difference is that Z in equation (4.10) is still an ensemble averaged quantity, and not the partition function for a fixed value of the parameters. This, however, can be fixed at a small cost. We can consider a different averaged partition function, which is the ensemble average conditioned on the value of where h 2 is slightly off 1. We can then use the fact that (h 2 2 ) k/2 m k = m k|h 2 (1 + O( k 2 )). (4.17) So to leading order we can replace each Z by the conditioned partition function.
Life in a given realization: Consider now taking (4.17), and resumming it to the partition function, to obtain then we can discuss the dual of a given realization. In a given realization of the theory has a specific h 2 2 value which slightly differs from 1. This is the leading piece of information about the specific realization. It seems that, to the order that we are discussing, correlation function in this specific realization are the same as correlation functions in a universe obtained by integrating over all the couplings with the conditional distribution (i.e. constrained by . So on the one hand we are provided with the leading information about the distribution, and on the other hand we are still averaging on almost the same number of random coupling, giving rise to a gravitational dual (as in the standard coupling-averaged SYK duality). This is at the leading level of precision. More and more detailed information about the specific realization appears in higher and higher order corrections, and we turn to these in the next section.

New Fluctuation Parameters
The multi-trace diagrams that are not of the cactus type give higher order corrections in N . However, if one of the k's in M c (k 1 , · · · , k V ) is odd, there are no contributing cactus diagrams and a non-zero answer comes from these higher order terms. Higher order contributions of this type are therefore both easy to isolate, and provide a useful example of how generic higher order corrections work in general.
We will start in section 5.1 by analyzing in detail the case of two traces M c (k 1 , k 2 ) for k 1 , k 2 odd, and find its leading behavior. Then in section 5.2 we will generalize this to other multi-trace moments, and in section 5.3 we show how to change the effective Hamiltonian to generate such connected terms. In particular we will show that if we require locality of the effective Hamiltonian, then not only would there be new fluctuations parameters, but we necessarily have to introduce new fields into the theory with specific couplings. We will occasionally refer to them as fluctuation fields.

The odd moments of the double trace
As we found before, the leading contributions to the double trace partition function come from the minimal number of couplings connecting the two traces. Thus the next order correction to the double trace partition function comes from connecting three couplings between the traces, 31 which contributes to the connected odd moments M c (k 1 , k 2 ). An example is shown in figure 17. We shall start by computing the lowest of these moments, M c (3, 3), and then generalize to other k 1 , k 2 . We must fully connect the Hamiltonians between the two traces for this contribution to be non-vanishing, and additionally all fermionic indices must appear exactly twice within each trace. As mentioned, this immediately tells us that every one of the three Ψ I j 's in each trace must share half of their indices with the other two Ψ I 's. 32 M c (3, 3) is now straightforward to compute. Having balanced the fermions in each trace as was just mentioned, ensemble average over the couplings implies that we have the same index sets in the two traces. The only thing that can change between the two traces is the ordering of the fermions inside them. Since the trace is cyclic we only have two possible such configurations, and since we are summing over all configurations we 31 Recall that when we connect a single coupling between the two traces we get zero in each trace. 32 We have such a single solution for the intersections size only in this case. This is another distinguishing feature of this first subleading correction. In section 6.1 we will discuss the situation where four or more index sets are correlated, in which case there is a large number of possibilities for partial overlaps.

are bound to receive contributions from both. This gives us
where we have implicitly rearranged the ordering of the indices in each index set relative to the previous convention. This, however, does not introduce any additional signs since we rearrange the index set in the two traces simultaneously. We see that in the first line the traces have the same ordering, and in the second line we have switched ( so that if p/2 is odd the two contributions in (5.1) cancel each other, while if p/2 is even they add up. We can take the sum over all possible index sets explicitly in this case, and find that For the rest of the section we will assume p is divisible by 4 and so this contribution does not vanish. Now let us consider arbitrary k 1 , k 2 . We should go over all possibilities of choosing, in each trace, which three Hamiltonian insertions participate in the triple contraction as above. When we draw the chord diagrams it is convenient to cut the diagram (or start the trace) at one of those three insertions, which we can think of as the "first" one. Since the trace is cyclic, we need to divide each trace by 3, as any of them will be counted once as the "first" one. The distances between those three insertions can be arbitrary, hence we should sum over them. Each trace therefore contributes Note that we dropped the constraint that I 1 , I 2 , I 3 are distinct, which is justified at large N . 33 The normalization factor is chosen in correspondence which will be useful when handling W k as a 6-pt function later on.
With this definition, the full double-trace connected moment for odd k 1 , k 2 is given by (as mentioned, we choose for each trace the first element, giving a factor of k, and divide by 3 for overcounting).
We see that the computation reduces to a calculation of a certain (ordered) 6point function. In fact, up to a numerical value, we could consider a generic 6-point function (where every pair of operators are next to each other), and allow all possible contractions of those operators, except for neighboring ones (as in normal ordering). For p/2 even, this is just what we have here. We will see such a computation explicitly in subsection 5.3.
Computing this 6-point function depends on the precise large N limit considered, and can be done in the scaling in which p is independent of N using [64]. Here we will perform this calculation explicitly in the double-scaled limit. But it is important to emphasize that for any scaling the result is a product of two 6-point functions, one in each space.  figure 18. It can be calculated similarly to what is done in [6], which we closely follow. The idea is that the intersections with the first and third dashed chords are easily accounted for, since any open chord that we have in the regions between the dashed chords, necessary crosses them. This is accounted for by the operator S that gives a factor of q 1/2 for every open chord (since here the dashed chords correspond to operators of size p/2). All that remains is to account for intersections with the dashed chord in the middle. Propagation along such a contracted pair is explained in section 3.1 of [6]. Denoting 33 When viewed as chord diagrams, we do not assign a value to the intersections of the corresponding chords, but rather only to an intersection of a Hamiltonian chord with one of the I j chords. Since at large N we have no elements in common to three such sets of size of order p, we can drop the empty intersection condition. The calculation can also be done without this assumption with the same large N result. an operator of size p/2 byM , and using notations as in [6], 34 W k is given by × H n+i (cos θ 1 |q)H n (cos θ 2 |q)H m (cos θ 2 |q)H m+i (cos θ 3 |q).

(5.5)
This can be simplified, written in terms of Al Salam-Chihara polynomials Q i (defined in equation (B.14) of [6]) This expression for W k completes the result for the double-trace connected odd moments.
k 1 k 2 k 3 Figure 18: A demonstration of the 6-point function that is needed for the calculation of odd moments. 34 T is the transfer matrix that represents the Hamiltonian in chord space. U (D) are matrices with 1's one diagonal above (below) the main diagonal, and P

Multi-trace correlators
In the previous subsection we discussed the case of two traces, with odd powers of H. It is easy to generalize it to the case of a general number of vertices V , where two of them are of odd degree. In this case, the general diagrams that contribute to the connected moment are cactus diagrams built on top of the basic diagram of the previous subsection. A generic diagram of this sort is demonstrated in figure 19. In order to represent these contributions in the theory for the couplings J (as in section 4), we use, as before, that whenever three index sets are paired the trace forces those three indices I 1 , I 2 , I 3 to satisfy |I 1 ∩ I 2 | = p/2 and then I 3 = I 1 ⊕ I 2 (where ⊕ stands here for the XOR operation). Then, the vertices with odd degree k correspond to the operator This allows for the leading order case of having a single constraint involving three paired chords, with the rest being of the form of cactus diagrams.
Just as before, the propagator is still normalized to 1 as it simply comes from the value assigned to each chord (that is, a contraction of two J's) J I 1 J I 2 = δ I 1 ,I 2 . (5.8) The traces of even degree are just as before The generalization to further higher orders should now be clear. Any vertex corresponds to a sum of terms of the form such that every SYK site appears an even number of times among the I j 's, or, more concisely, k j=1 r j = 0 where r j is the set I j represented in Z N 2 , multiplied by the appropriate amplitude. This is simply the trace constraint. For every given structure of indices, the trace translates to a certain form of a chord diagram (that is not just a contraction of pairs), and the value of the diagram (with the appropriate combinatorial factor) is the amplitude. The expressions above, equations (5.7) and (5.9), give the leading contribution in this sum, for odd and even degrees respectively.

More than two vertices of odd degree
We saw what happens exactly at leading order when we have two vertices of odd degree. Now we would like to discuss the general case. To be concrete, suppose that there are non-zero diagrams contributing when we use (5.7) for the odd degree vertices, and (5.9) for the even degree vertices. Are there other possibilities that may give a result that is of lower order (i.e., more dominant)? As we will show here, the short answer to this question is "no", namely that a non-vanishing result with the vertices (5.7) and (5.9) will be the leading order. For example consider the case of four traces with all nodes having an odd degree. Two of the diagrams contributing to this case are shown in figure 20. While all vertices are of the form (5.7) in the diagram on the left, there is one vertex not of this form for the diagram on the right (requiring three invariants, each consisting of three J's). In these cases it can be checked that the order of the first diagram is 4 , while the second one is 9/2 which is subleading.
In order to show the statement above, first note that if we use basic invariants of the J's that are higher than the J I 1 J I 2 J I 1 ⊕I 2 and J 2 I appearing in (5.7) and (5.9), then clearly the result will be of higher order. What is not clear is what happens when the Figure 20: Two different diagrams contributing to four traces with odd degree. Note that additionally there is a diagram looking like a tetrahedron, contributing at the same order as the diagram shown on the LHS.
vertices are made of one or more of these dominant quadratic and cubic invariants (as we have in the example on the right hand side of figure 20) 35 . In all these cases, we can represent them by the familiar double line notation. Namely, we split all index sets of size p, and consider now index sets of size p/2. The random couplings are then of the form J I 1 I 2 and the cubic invariant is J I 1 I 2 J I 2 I 3 J I 1 I 3 .
Suppose then that we have a Feynman diagram with V n vertices of degree n (for n = 2, 3, · · · ), E (double line) propagators and L loops (of single lines). Then as we saw each propagator gives , while each loop gives an order of −1/2 (since the sets are of size p/2). Then each diagram is of the order (using χ = V + L − E and E = n As expected, increasing the degree of vertices can only lead to a smaller result. Indeed, for the first diagram in figure 20 we have χ = 2, V 3 = 4 giving 4 . In the second diagram, we are connecting three times the basic diagram (of two vertices connected by three edges) each one having V 3 = 2 (and χ = 2) giving in total ( 3/2 ) 3 . Each such addition of a diagram can lower the degree by −χ/2 , that is at most by −1 , however, if we compare diagrams with the same number of vertices, it has an extra vertex giving 1 2 + n 4 , which suppresses by at least 5/4 . Therefore, it can only lead to higher order results. Note that for n = 2 we have 1 2 + n 4 = , and this is why we found many diagrams of the same order in the even degree case, which are the cactus diagrams.

The effective Hamiltonian of the h 3 fluctuation
In subsection 5.1 we wrote the connected odd correlation function in equation (5.4). In the double scaling limit, an exact expression for W k is given in (5.5) but an inspection of figure 18 reveals that it is related to a 6-point function for any value of p. In subsection 35 Which scales as a disconnected diagram. 5.2 we showed how to incorporate this in the theory for the couplings. However, in the latter, we had to explicitly use W k as a new object in the vertex. Here we show another way to represent the correction to the double-trace of odd moments that automatically generates this W k correction via a relatively simple modification of the single trace Hamiltonian.
The ideology is similar to the one we used before in section 4.2. There we rewrote the connected part of the multi-trace correlator, induced by fluctuations of h 2 , as where the Z's on the RHS are the single trace partition functions after averaging over the couplings or, equivalently but more interesting for us, the gravitational partition functions. The fluctuation parameter h 2 is just h 2 2 = I J 2 I . Here, we would like to write the odd double-trace contribution in a similar form. The fluctuation parameter that we need to include is h 3 3 = 3/2 |I 1 ∩I 2 |=p/2 J I 1 J I 2 J I 1 ⊕I 2 . But for now let us not commit to using h 3 and denote it as a general parameter α where h 2 is the random variable which encodes the h 2 fluctuation, and α is a new random variable (or a set of a few variables) which reproduces the h 3 fluctuation parameter. P (h 2 , α) is their joint probability measure. The reason for using α is that we will present two possible options of implementing (5.13). The first uses a general α not related to h 3 , but we currently have it working only for the double trace. The second uses h 3 , and it works more generally, but it entails another peculiar ingredient to be discussed below.
The Z's are modified single trace "partition functions" computed separately for each trace, for which we can find a gravitational dual (for every relevant value of α). These new "partition functions" are approximately the familiar single trace partition functions (after ensemble average, or equivalently written in the gravitational language), modified by small α dependent terms.
The key requirement that goes into constructing the Z's is the requirement that the α-dependent modifications will be "minimal" when interpreted gravitationally. By this we mean, for example, that the modification will be local and will include a minimal set of particles and interactions, and with a "reasonable" (in some loose sense) gravity interpretation.
In the first realization, the form of the modification will be the following. At step one, we consider a random operator 36  Here H is the original Hamiltonian and we are averaging over the random coefficients in H and in O. By the normal ordering of O 2 we meant that we do not allow self contractions.
In step 2, we replace the RHS of (5.16) by the suitable gravitational dual. Each such Z has such a dual gravitational interpretation, since we are doing a small doubletrace (in the sense that we deform the theory by O 2 ) deformation of a background which has a good gravitational dual [46][47][48]. The net result will be a deformation of the original gravitational action by a small double-trace deformation with a random coefficient (which is correlated across different universes).
Basically, we will simply verify explicitly that (5.15), with an appropriate P (α), generates the correct fluctuation for two traces. Still, it is instructive to first argue why we modify the Hamiltonian by an O 2 term. The original expression that we are interested in can be written as where there are L 1 (L 2 ) products in the first (second) trace and L 1 , L 2 → ∞. To get the desired odd moments, we go over all possibilities of choosing three Hamiltonian terms out of the first trace in (5.17), as well as three Hamiltonian terms in the second one. We would like to correlate them in such a way as to generate (5.4), or more precisely figure 18. The diagram in that figure corresponds to a 6-point function, where each Hamiltonian is replaced by a a pair of operators. Effectively we replace the 36 As a reminder, a random operator O of length p is of the form where L are index sets of length p , and theJ are Gaussian independent random variables normalized such that J LJL = δ L,L .
3 Hamiltonian insertions (that we chose in each trace) by where A 1 +A 2 +A 3 +A 4 = L 1 −3, and α is a small parameter proportional to M c (3, 3) 1/3 . It will shortly be promoted to a random variable. The subindex in O j indicates the identification of the operators. This reproduces both the original α 0 partition function and the α 3 W L 1 −3 terms. The net result seems to involve adding O i O j terms to the Hamiltonian with a coefficient proportional to α. Changing the Hamiltonian into H + α i =j,i,j=1,2,3 O i O j is still cumbersome, since they are all operators with the same dimension (1/2) and the same propagators. In fact, we can do with a modification of the form if we focus on the α 3 term in each trace. So choosing 3 H's in the trace and replacing them by the 6-point function is correctly captured just by changing the effective Hamiltonian.
In this replacement we of course generate additional terms that we do not want. So there are several consistency conditions that we should impose on the Hamiltonian modification: • It should not modify, at leading order in N , the single trace partition function.
• Since we can bring down also 2 interaction terms in each trace, it might give us cactus-like contributions -we need to verify that this is not the case, at least in leading order.
• It should reproduce the odd double-trace leading order result.
Let us verify that (5.15) satisfies these constraints, with an appropriate P (α). The last condition is given in terms of moments as the requirement that is the effective moment after tracing over the microscopic operators and couplings. As above, α is a small parameter in which we expand. Explicitly, it should go as N p −1/4 in order to give the N dependence of the odd double trace. (This is because we will see that the leading contribution will come from six insertions of α, and we saw that the odd double trace scales like N p −3/2 .) We should specify a probability distribution for α that reproduces the 6-point function contribution that we saw above. Let us take P (α) = 1 2 (δ(α − κ) + δ(α + κ)) ; (5.22) we will specify κ in a moment. Note that this distribution satisfies Since :O 2 : is normal ordered, the effective trace vanishes at first order in α. Working up to sixth order in α, the only contribution to the connected double trace is where we have used cyclicity to start the trace before one of the operator insertions, just as before. We have not included various lower powers of κ which correct the disconnected moments (i.e., correct each moment separately), since we consider the connected moment. Each averaged trace is what we have in the definition of W k . 37 These are non-vanishing only for k 1 , k 2 odd. So, choosing we get indeed M c (k 1 , k 2 ) as in (5.4), establishing (5.20). This effective Hamiltonian has been constructed in order to reproduce the connected piece in the two trace correlator. One can ask whether, for example, it reproduces correctly all the diagrams in figure 19. The answer is no -it actually does generate all the correct diagrams for a multi-trace correlator with odd vertices, but it does not give the correct combinatorial factors. So one can either try and complicate the model, or re-start in a more systematic way (which will entail some additional twists).
Perhaps the most "natural" choice is to replace α in (5.15) by 26) and then to consider the vector model of the couplings derived in section 5.2. This is in analogy to the definition of h 2 as h 2 = ( I J 2 I ) 1/2 . This choice also has a "natural" probability distribution P (h 2 , h 3 ) derived directly from the probability distribution of the couplings (or dual vector model.) Replacing α by h 3 has the advantage of automatically satisfying the first and last conditions from the list above, however it does modify the even terms. Additionally there is some ambiguity as to which root to pick out of the possible three (if we allow phases). Unlike previously where taking h 2 to be the positive or negative square-root gives the same result, in this case the different roots do not give the same answer. Indeed the different choices of h 3 "interfere" with each other, and in particular if we sum over all three possibilities only the desired contributions with three h 3 terms remain.
This observation tells us how to modify the Hamiltonian in a consistent way: simply sum over all different choices of the cube-root in each trace separately. This is done by correcting the :O 2 : term using a new "random" variable χ which is to be drawn from the discrete measure over the complex plane with equal support at the points 1, e 2iπ/3 , and e −2iπ/3 . This new effective Hamiltonian now also satisfies the second condition, and allows us to write the effective action in a compact form 27) and The probability measure P (h 2 , h 3 ) for Gaussian couplings is given by integrating over the dual vector field with the defining constraints: It is worth noting several points. The first is that the χ variable is local to each universe. I.e. in each Z we sum over its own χ variable independently of the other universes. The only information which carries between universes is h 2 and h 3 as expected.
Averaging over χ for each universe, in and of itself, is not a problem. It just means that the theory splits into sectors, labelled by χ, and the full theory is the sum over these sectors. In each sector, the Hamiltonian is slightly different.
There is one fly in the ointment which is that the effective Hamiltonian (5.27) is no longer Hermitian as χ is complex, though the final result in the multi-trace correlator is always real due to the sum over complex conjugate values of χ. We are not sure how to interpret this from a gravitational standpoint, and if the loss of Hermiticity has any gravitational consequences, especially since the underlying microscopic Hamiltonian is always Hermitian.

General fluctuation parameters
We discussed until now the connected contribution of multi-trace correlators induced by fluctuations of h 2 = ( terms were easier to handle because they were the leading contribution to appropriate moments (they are also unique in that they are "isolated" in a sense that we will discuss below). These, however, are just the first out of an infinite (in the large N limit) number of fluctuation parameters. We will focus on the latter and briefly discuss their contribution to the connected two-trace correlator. One estimate for the size of contributions from higher fluctuation parameters is given in [16,22]. For the case of tr(H k )tr(H k ) , when the random coefficients are all paired between the traces in their order along the circle, then This is of the order of N p −k/2 for small k, and of the order 2 −N at large k.
This clumps many fluctuation parameters together, but we can be more specific about the strength of each one separately. Consider the case that we want to contract n Hamiltonians between the two traces 38 . The requirement that each Majorana fermion appears twice means that we need to split the n multi-fermions index sets into smaller 38 The two traces can have k 1 , k 2 ≥ n insertions in total.
index sets, such that each of the latter appears in two of the n Hamiltonians. More specifically, we split the multi-index sets of the Majoranas in groups as Ψ 1 = Ψ 1,1 · · · Ψ 1,n Ψ 2 = Ψ 2,1 · · · Ψ 2,n . . . Ψ n = Ψ n,1 · · · Ψ n,n , with the constraint that 1) Ψ i,j = Ψ j,i , 2) Ψ i,i = ∅, 3) the intersection of Ψ i,j 1 and Ψ i,j 2 is empty for all 1 ≤ i ≤ n, j 1 = j 2 , and 4) that the total length in each row is p. Notice that these four conditions are equivalent to contracting the Ψ I 's in an O(N ) invariant way, and hence we are really just enumerating the possible O(N ) invariant combinations of n couplings. If we denote |Ψ i,j | = n ij , then the n ij 's satisfy The associated chord diagram is the same as in figure 17 except with n insertions correlated between the two traces. Each fluctuation parameter corresponds to a different set of n ij 's that satisfy the constraints (6.3). For a fixed set of n ij 's the contribution of such a fluctuation parameter is determined by the number of random parameters J such that their index sets satisfy the four constraints listed above. In the limit N p, n ij , this fluctuation parameter contributes to the double trace correlator with strength where the numerator comes from the number of indices which satisfy the decomposition (6.2), and the denominators comes from the normalization of the J's and the fact that we have 2n of them participating in this term (n in each trace) 39 . In the double scaling limit, we rearrange the expression above to be ( √ λ/e) pn i<j 1/n ij !. We can carry out a saddle point estimate of the strength of this contribution in the large N, p, n ij limit. We define A = (2π) n(n−1)/4 (2πp) n/2 λ e np/2 , (6.5) and n ij = px ij . Then we can evaluate the following integral for which there is a saddle point at x ij = 1/(n − 1). This is of course just the range where each of the Ψ i overlaps with the others to a similar extent. Evaluating the width around the saddle point, we obtain that the contribution for (N, p, n) is , (6.7) for λ = p 2 /N .

(6.8)
This regime is relevant for the double scaled limit, when we take the fluctuation parameters to be smaller than p (which is taken to infinity). We see that there is no divergence associated with the increasing n.
In the case of finite p and n large, most of the n ij 's are zero and the analysis above needs to be modified. We will not do this analysis here.

Fluctuation parameters and the dual of a single realization
We would like here to briefly suggest a possible solution to the question of what is the dual of a given realization of the couplings J I , and how it differs from the ensemble average. In the discussion, we will focus on lessons that can be drawn from the twotrace correlator. The phrasing of the question will actually be different from "what is the dual of..." but rather it would be: suppose we are provided information about the couplings J up to a certain precision, what is the gravitational dual which captures all the observables up to that set precision?
The advantage of this phrasing is that it is naturally compatible with the grading in the strength of the fluctuation parameters, and it is also more operational in the sense that it sets a benchmark precision. Its main advantage though is that, at any finite order in precision, we still do an average over a very large set of random couplings, making a gravitational dual perhaps more reasonable. Of course, the precision can always be further increased.
We have really discussed it before in sections 4.2, 5.3 and 6.1, but here we summarize the structure. There were several ingredients in the computation of the connected two-trace correlator, which we summarize right below. We have argued for the first four in general, and have shown the last item for the h 2 and h 3 fluctuation parameters. The sequence of ingredients leads to a natural suggestion to what is effectively the dual of a given realization.
• In each trace, correlate sets of H's according to (6.2) in general. Each such splitting is associated with a fluctuation parameter, h 2 and h 3 being the simplest ones.
• The expectation values of the fluctuation parameters is zero (except h 2 ). However, their variance contributes to the two-trace correlator.
• In each trace the contribution is similar to a n(n − 1) functions, made out (in the generic case) of n(n−1)/2 insertions of pairs of the same operator. Each operator is made out of n ij = n ji fundamental fermions as in the previous subsection. We will refer to these operators as • We label by h {n ij } the fluctuation parameter that corresponds to the set {n ij , i < j} as in the previous subsection. Then these n(n − 1) functions are among the ones generated by the substitution Oˆiĵ . (6.9) We emphasize that this generates the required contribution, but in general it generates much more, and we have not explored whether they can be cancelled in general by a suitable choice of the h's joint distribution. To do this one might have to extend the set of fluctuation parameters.
• If we are able to cancel all the unwanted contributions that originate from equation (6.9) (by a suitable probability measure of the fluctuation parameter) as we have seen for h 2 and h 3 , then the multi-trace connected correlation function is given by a substitution where {h} is the set of L leading fluctuation parameters, and L is fixed by the degree of precision that we want to obtain. P is the probability measure on the space of fluctuation parameters.
Our proposal is therefore the following. If we have the theory in a given realization, we still need to specify the precision to which we are working. This amounts to specifying a limited set of fluctuation parameters which take the values prescribed by the given realization. We can then take the ensemble average, conditional on the values of these small number of fluctuation parameters. The dual of this is described by adding the additional operators (field in the bulk) {O n } and using the Hamiltonian H eff .
For a slightly more "operational" perspective, consider the case that we have some higher dimensional theory, which we will call the UV theory, which flows in the IR to some AdS 2 × M near horizon. Suppose also that the latter (the IR theory) is effectively described by some SYK like theory, or a set of weakly coupled SYK models, for which the rules above apply. We would like to consider the experience of an observer outside the IR SYK-like region.
This outside observer probes the near horizon IR region and the black-hole with the set of fields at her/his disposal. The set of allowed fields is really determined by the theory outside the black-hole -actually all the way to the boundary of space. As they are defined using the UV degrees of freedom, they are not necessarily easily defined in terms of the IR degrees of freedom. An operator in the UV theory flows to some operator in the IR SYK model, but they do not need to span or coincide with any set of preconceived operators in the latter. Since the Hamiltonian, in terms of the UV degrees of freedom, flows to a random operator in terms of the IR SYK degrees of freedom, we will assume that this is true for all UV operators. So we are probing the black-hole with random operators of the class that we discussed before. The outside observer throws quanta of the corresponding fields, and measures the response of the black-hole by measuring the outgoing quanta, i.e. by measuring a correlation function of random operators in the IR SYK model.
Let us go back to the Hamiltonian. Given a specific UV theory we flow to a specific realization of the SYK Hamiltonian in the IR. Under the suggestion above we are instructed to add the set of random operators with all values of n < p. In the conformal limit it means that there is a spectrum of low mass virtual particles (that correspond to dimensions smaller than 1, and a continuum of such particles in the double scaled limit). Time evolution using the HamiltonianĤ now means that quanta of these fields generate, bubble and decay as part of the time evolution. The details of the realization are encoded in their couplings.
It's important to emphasize that none of these fields need to exist in the UV theory. They are suggested only for the internal consistency of the near horizon, when trying to discuss the theory in a given realization. It does mean, however, that this consistency implies a very large amount of sort of "quantum hair", confined to the near horizon area.

The relation to geometric wormholes
Since we are discussing connected multi-trace correlators, a natural question is whether there is any relation to the wormholes studied in [24] for JT gravity, or more generally in [23, 57-59, 61, 62]. It seems that the answer is no. This can be seen from several points of view: 1. The strength of correlation that we are discussing is much larger: In this work we have seen that for the SYK model, the leading connected contribution to the spectral form factor Z( Note that this is of the same strength as a perturbative process (in the sense that it is a power of N −1 ). In the double scaling limit (λ = p 2 /N ), the spectral form factor scales like exp(− 1 2 √ λN log(N )). In contrast, following [24], we know that the first connected contribution to the spectral form factor in JT comes from a connected topology, which is reduced by a factor of e −κN , where there N represents the ground state entropy of JT theory (and κ is some coefficient of order 1). This is a non-perturbative result, and the two corrections obviously do not match. This is a consequence of the fact that JT itself, which is a radical truncation of the full SYK model, is dual to an RMT model with a sinh( √ E) envelope, but otherwise has standard β-ensemble level statistics, unlike the full SYK model.
We might ask if there's some local deformation we can apply to JT gravity, namely -some W (φ) term we can add to the Lagrangian that will reproduce the results we see in this work. In [65] it was shown that any deformation of this type still results in a matrix model. As such, the first connected component will scale like e −κN , and will not match the one computed here.
2 Relevant time scales: The standard e −κN JT wormholes build the ramp and the plateau [33]. These are very late time phenomena. In contrast, the corrections we find here are early time corrections (as evident in figure 23), and tend to zero at late times.
3. Bulk vs. boundary corrections: Finally, we would like to highlight again one point, which is that the gravitational implementation of the connected correlator (with the exception of the h 2 contribution) is done via an analogue of a multi-trace deformation of the theory [46][47][48]. After all, we are (schematically) deforming the Lagrangian by a local term L → L + C 1···k O 1 · · · O k where the O's are operators whose dimension sums to 1, and C are fluctuation parameters. This is rather different from the wormholes of [24,57,58], in which wormholes connect the two spaces via the interior 40 .
Note, however, that the necessity of a large number of fluctuating fields is certainly a statement about the bulk and may affect its internal dynamics (for example, when discussing the stability of bulk configurations). Furthermore, double trace deformations can drive a change in the bulk by their effect on the quantization of the fields as in [66,67].

Multi-Trace Operator Correlations
The simplest generalization of the multi-trace thermal partition function is to consider multi-trace correlation functions of random operators. We will look at random operators of the form where the couplingsJ I are independent Gaussian variables with zero mean and unit variance, as already defined in (5.14). We will further take the disorder average over these couplings. These are the same type of operators considered in [6,7]. If we are working in the double-scaled limit then we require these operators to have a well defined double-scaled limit,p 2 /N fixed, as in [6]; though our results also hold in the usual large N limit.
There are several arguments as to why this class of operators is interesting to consider, and we refer the reader to [7,68] for further details. The only argument that we will repeat is in the spirit of section 6.2: if we are given some UV theory which flows to an IR of the form AdS 2 × M , which is approximately described by an SYK model (or an array of such models), then the probes at our disposal are determined by the theory away from the near horizon region, which knows little about the SYK regime. The only expectation that we can have about such operators is that they are statistically similar to the local energy-momentum tensor, which is one of these probes.
We can also motivate this choice by noting that these correlation functions reduce to the standard SYK correlation functions after taking the disorder average overJ I . For example the 2-point function of A is really which is the class of 2-point functions that are computed in the large N limit of the SYK model, say in [3,4]. The higher point functions after disorder averaging are also identical to the regular correlation functions computed, say in [64]. As such, they seem like a reasonable choice of operators to consider. These operators are also reminiscent of end-of-the-world branes [57] as they have some internal degrees of freedom,p, and they can be correlated between different traces as the Hamiltonian.

Multi-trace 2-point functions
We will start by computing the connected two point functions where we insert two operators on each side of the trace, and subtract the disconnected part. We will denote the regular thermal 2-point function by 3) The exact expression of these correlation functions is not important for what follows, though we note that such functions were computed in [6,7] in the double scaled limit, and in [3,4,64] for the large N finite p limit. Furthermore, one can take β 1 = β + τ and β 2 = −τ to get the thermal Euclidean 2-point function, though we found the β 1,2 variables to be more convenient for this specific computation. We will be interested in the connected two point functions The leading order contribution to these functions will again be minimally connected, as was the case in the thermal partition function in (3.5). In this case, however, there are two different diagrams that may dominate, depending on the ratio of p,p, and they are given in figure 21.
The first diagram in figure 21 is simply contracting the four A operators. There are two ways to do this, and each is suppressed by a factor of Ñ p −1 , which together give a contribution that we denote by I 1 , with The second diagram in figure 21 can be computed in the same manner as was done for the double trace moments. Its moments are identical to those of (3.3), only with k i =k 1 +k 2 , wherek i 's are the moments in the expansion of e −β i H . We can immediately switchk i with derivatives β i ∂ β i , resulting in a value of The full connected correlation function is the sum of I 1 and I 2 .
The leading order behavior of G Ifp > p then the exact result of the n trace function will be a sum of cactus diagrams with derivative operators, as in (3.11). Ifp < p then the leading order behavior will just be a constant multiplying the disconnected result, similar to (7.5).
Of course, the correlation induced by A in diagram I in figure 21 is a new effect on top of figure II, and as such we can trust it (there is also a clear regime where it is larger than subleading fluctuation parameters associated with H). It is associated with a fluctuation parameter derived from A: I A 2 I . Correspondingly, the vector model we find for A is in the same spirit as the one we found for h 2 .

The associated vector model for operators
In the case wherep = p we can construct a dual vector model for each random operator which captures the leading order behavior, similar to the one we found for the Hamil-tonian in section 4. This time the 0-dimensional vector field represents the random couplings of the operator in addition to the Hamiltonian.
Starting from the operator A defined in (7.1), we consider the zero dimensional fields φ (A) I taking an index I from 1 to −1 A ≡ N p A , the number of index sets, or random couplings, in the operator A. Similar to the Hamiltonian case, the joint generating functional of A is 41 As before, we can take only the connected part of both sides if we so desire. Additionally we can then go to the variables φ 2 I and (φ (A) I ) 2 and reduce the integration to these two fluctuation parameters. Since the integration on these two variables is after we do the average over J,J, then we can replace . J,J by their GR expression and obtain the GR description of these two fluctuation parameters.
We stress that this vector model is only accurate at leading order in A or , and for all values of p,p, as long as p =p. This means that any correlation functions whose leading order scaling is smaller cannot be computed using this dual vector model. In particular the correlations functions involving an odd number of A insertions in some traces is more suppressed in A , similarly to an odd number of Hamiltonian insertions considered in section 5, and so cannot be computed using this vector model. To compute these odd correlation functions we must add additional terms to the effective action, similarly to what was done for the Hamiltonian in section 5.3. Another interesting correlation function that cannot be computed using this vector model is the correlations of single operator insertions. Both these cases will be discussed in the next subsection.

1-point functions and an odd number of operator insertions
A slightly more complicated observable is the fluctuation (across the ensemble) of the 1-point function of the operator A. This will teach us about the expectation value of the operator A in a specific realization. Thus we want to find the leading order behavior of the connected thermal 1-point function G (1,1) (β 1 , β 2 ;p) ≡ tr e −β 1 H A tr e −β 2 H A J,J , (7.10) or its moments For the expectation value to not vanish, we must have that the two A's are contracted, and that each fermion in the A chain must have a pair in one of the H operators that are contracted between the traces. As before, only minimally connected diagrams contribute at leading order, so the number of H's contracted between the sides will be n = p/p , except forp < p when n = 2. In the following we will discuss a couple of simpler cases: Case 1:p = p Ifp = p then we can have all the fermions in A be in a single H which is contracted between the traces. All the other H's on each side will be paired, so the leading order moments will be odd, and their contribution will be similar to the double trace moments The thermal two point function in this case will be In the same spirit as in section 5.3 and section 6.1 we would like to find a "minimal" description of this connected correlator, after we do the ensemble average. This can be implemented using the substitutionH = H + αA (7.14) where A is the operator above and α is a random variable with In gravity, the interpretation is the following. A marginal operator corresponds to a massless field, and adding αA to the Hamiltonian corresponds to shifting the VEV of the field in the bulk. In the theory in which we carry out an ensemble average over the coefficients of A, α is averaged over. In the language above, α encodes the fluctuation parameter I J IJI . We interpret the case of a fixed realization as having a fixed α.
The ansatz is arranged such that we reproduce the correlator (7.13). At the level of a single trace correlator it (1) does not introduce a 1-pt function for A since we take α = 0, and (2) introduces a correction to the partition function at order , which is small compared to terms we are neglecting. At the level of connected multi-trace correlation function, it introduces corrections which are subleading to the fluctuation parameter h 2 (which is the fluctuating parameter in front of H). For example, in the connected part of 2 traces it will introduce a term proportional to α 4 − α 2 2 , but this is of order 2 , whereas the cactus diagrams that we discussed in section 3.2 give a contribution of order .
Finally we would like to point out that we could also think of A not as a random operator but rather as a fixed operator, and this same computation would yield the variance of the thermal expectation value of A. For example we can take A = Ψ I for a fixed index set I and using this chord diagram formalism calculate the probability distribution of the thermal expectation value of Ψ I . 42 As this computation shows, the operators with the largest expectation value, on average, would be operators of the same length as the Hamiltonian. Thus living in a given realization of the couplings, the first operators whose expectations deviate from the average value are the ones of length p, and their typical deviation is of order 1/2 ∼ N −p/2 . 43 This also measures the extent of the breaking of the O(N ) symmetry: at large N it looks like the theory has an O(N ) symmetry, just like averaged model has, but really the exact realization of the couplings breaks this symmetry by a small amount of order 1/2 .
In this case we must pair the fermions in A with those in at least two insertions of H in each trace. The random coefficients of the H's are then contracted between the two sides. We must give half of the fermions in A to one of the H's, and the other half to the other H. The rest of the fermions in the two H's are paired. This contribution is similar to the one discussed on section 5.1. For example we can think of figure 17, where now two of the nodes that connect the different traces are H nodes, and one is an A node. The moments will once again factorize in a similar way to the odd double trace moments (see section 5.1) The combinatorial factor is so the order of this will be 2 ≤ mp(2, 2) < . We note that this contribution only happens ifp mod 4 ≡ 0, as otherwise the two possible contractions anti-commute and thus cancel each other, similar to the argument as to why the contributions to the odd moments vanish if p mod 4 ≡ 2 in section 5.1. The function Wp(k) is also very similar to W (k) from equation (5.3) -only the size of the index set of the three chords is no longer p/2, but rather two of them have length p/2, and the third has length p −p/2. Explicitly, in the large N limit, it is given by the three point function (7.18) We can write these contributions as a correction to the vector model coming from a matrix model type interaction, similar to section 5.2. The idea is to add an additional term to the effective Hamiltonian, and correct the operator A accordingly, in order to allow for an index contraction of (H, H, A). In order to not change the leading order result of an even number of insertions of A, H, we need to make sure that these new terms are subdominant with respect to the previous corrections. The Hamiltonian is then H eff = h 2 H + β:Op /2 O p−p/2 :, (7.19) where the distribution of h 2 is given in (4.9). The operator A is corrected by the random operator A eff = h A A + α:Op /2 Op /2 :, (7.20) where by normal ordering we mean that we don't allow self contractions for the random operator (if the two operators are of the same length). We have introduced the Gaussian variables α, β, with zero mean and variances to be specified later, and the variable h A with mean 1 and variance Ñ p −1 . Now we can compute the operator moment Wp(k − 2), (7.22) where q = e −2p 2 /N , and the factor (1 + qp 2 /(4p 2 ) ) comes from different possible contractions of the fluctuation fields. This means that This matches (7.16) if we require (7.24) Next we need to verify that we have not introduced any large corrections to observables that include an even number of both A, H insertions, in all traces. This motivates us to require that both the α and β contractions need to be subleading w.
As before, we interpret the operators Op /2 and O p−p/2 as new operators in the theory (or new fields in the bulk), whose existence is required in order to explain, in gravity, this two-trace correlator. Before we saw this happening for the Hamiltonian, but it is actually a feature associated with any operator observable in the theory.

Case 3:p > 2p
These will look like the deformations of high order corrections to the double trace moments. In general they will involve higher point functions, a sum over all the possible ways to distributep among the various H's, and a sum over all the ways to distribute the remaining indices in the H's among themselves. This function will have a discontinuity every timep = np for some integer n, as the leading order moments will change parity, from even moments to odd moments, or vice versa. We have not carried out a full analysis of these cases.

Multi-Trace Correlations from the Replica Path Integral
Computations of SYK model are typically done in the path integral formulation of the theory, as in [3,4,10]. As the resulting action is a large N action, the theory can be solved in the large N limit by finding the saddle points and looking at the fluctuations around them.
Similarly, multi-trace expectations have previously been computed in the path integral formulation using replicas [16,[33][34][35][36]57]. The main object of interest in these computations is the spectral form factor, which is an analytic continuation of the double trace thermal partition function. The main focus of these computations is finding non-diagonal saddle points of the two replica path integral that generate the RMT contribution to the multi-trace expectations [33]. These additional saddle points are not the source of the global fluctuations, as they are related to connected topologies in the gravity dual. Rather, we will demonstrate how to calculate the leading order contributions to the connected thermal expectation function as a perturbative expansion around the disconnected saddle.
We shall start out by looking at the fermionic path integral and show how the global modes arise in this description. Then we use this intuition to derive a formulation of the two replica action in terms of the bi-local fields G and Σ. This action is found via an expansion followed by a resummation of the interacting part, and has a completely different form (and saddle point equations) than the naive action derived in [33]. Finally we show how the global modes naturally arise from a perturbative expansion around the disconnected saddle of this action.

The path integral derivation
The thermal partition function of the SYK model is given by the path integral 44 with the Euclidean action given by where τ is the normalized Euclidean time and we set J = 1. The expectation value of the thermal partition function is computed through the path integral Taking the integral over J I 's leaves us with the following expression for the thermal partition function: 4) with = N p −1 as before. We will call the action in the path integral above S 0 [ψ], and it is nothing but the standard single trace action.
We can now compute the leading connected double trace thermal expectation value using this same path integral formulation of the model. Notice that We can treat this extra term as a small interaction term, and expand the exponent in a Taylor series, as is typically done in perturbation theory. The zeroth order term in the series is just the disconnected component. The first term in the series vanishes as each free action S 0 [ψ] has a Z 2 symmetry of ψ (i) → −ψ (i) , but the interaction term is odd under this symmetry. Thus the leading order of the connected component will be the second order term, and will give the contribution The path integral will vanish because of the aforementioned Z 2 × Z 2 symmetry if J = I, in which case we can write the connected thermal partition function as Z(β 2 ), (8.8) which is identical to (3.5) computed using the moment method.
If we want to calculate n trace thermal partition functions then the computation is similar. The "free" part of the action will consist of n averaged actions, one for each of the n replicas (or traces). Additionally we will have an interaction term of the form Expanding the interaction term order by order in will again give us only cactus diagrams as the leading order contribution to the connected thermal expectation function.

Path Integral in terms of G and Σ
Up to now we have worked with the action for the original fermions to compute the connected thermal expectation functions. However the path integral formulation is much more elegant as we can express it in terms of bi-local fields G and Σ rather than the original N fermions. Furthermore, in these variables (with the correct scaling) the action becomes a large N action, and thus a saddle point approximation is valid (see for example [10] for details). We would like to reproduce the above computation when considering these bilocal fields, both for completeness and to be able to relate this work to others, such as [33]. We will define the bi-local fields as We use a slightly different normalization for the G ii field, the reason for which will become apparent in the analysis. Furthermore we shall concentrate on the double trace thermal partition function for brevity, though similar calculations can be preformed for high trace expectations. We introduce Lagrange multipliers Σ i and Σ 12 to enforce the definitions of G, as is typically done in the replica path integral (see for example [33]). After substituting in the definitions for G, and taking the integral over the couplings, we arrive at the following expression for the thermal partition function Typically at this stage the fermions are immediately integrated out, leaving the well known G-Σ action [33], however we will take another route. We will expand the exponential of the last term (the one involving fermions) in the above path integral action and consider the path integral over the fermions order by order. As the fermions are Grassmann variables, the integral over them will vanish unless they exactly come in pairs. This already tells us that all the odd terms in this expansion will vanish. The zero term is just a one, so the first nontrivial term is the second term which will be j (τ 4 ), (8.12) which will vanish for all i = j, by Grassmann integration arguments. This leaves us with the term (8.13) where in the last line we substituted the definitions of the two point functions.
In general, all higher expansion will reduce to index pairings of the fermions from Grassmann integration arguments, allowing us to substitute in the two point functions. This leads to Wick contractions of Σ 12 's. We note that we neglect corrections arising from the case where more than two fermions share the same index as these are suppressed by higher order powers in 1/N . 45 The numerical factor multiplying the Wick contraction of k contraction is (−1) k from anti-commuting the fermions times 1/(2k)! from expanding the exponent times (2k − 1)!! which is the number of Wick contractions. Overall this gives a factor of 45 Similar corrections have already been neglected in the path integral derivation when we substituted G p (τ, τ ) for I Ψ I (τ )Ψ I (τ ).
(−1/2) k /k!, allowing us to re-sum the exponent, and rewrite the partition function as (8.14) Now we are ready to integrate out the fermions, leaving us with the large N action dτ k Σ 12 (τ 1 , τ 2 )Σ 12 (τ 3 , τ 4 )G 1 (τ 1 , τ 3 )G 2 (τ 2 , τ 4 ). (8.15) The first term in the action (8.15) is the sum of the original actions for the replica diagonal fields. To this we add an action for the replica off-diagonal fields, along with a quadratic interaction term between the diagonal and off-diagonal fields.

(8.16)
The final step is expanding this action order by order perturbatively in √ , after which the quadratic action for G 12 leads to Wick contractions of the form G 12 (τ 1 , τ 2 )G 12 (τ 3 , τ 4 ) = −G 1 (τ 1 , τ 3 )G 2 (τ 2 , τ 4 ). (8.17) This transforms the calculation of the two replica thermal partition function to a sum of correlation functions of G 1 and G 2 in the original theory.
For concreteness let us analyze the first few terms in this expansion, and compare them to the previous results. It is clear that the zeroth order term gives the disconnected contribution, while the term of order √ vanishes as self contractions vanish due to the anti-symmetry of G i (namely G i (t, t) = G i (0) = 0 by construction.) The first non-trivial term is the one of order , which is 1 2p! β 2 1 β 2 2 dτ 1 . . . dτ 4 G p (τ 12 )G p (τ 34 ). (8.18) There is only one way to do the Wick contractions for this term, and the resulting contribution is 1 2 β 2 1 β 2 2 dτ 1 dτ 3 G p 1 (τ 13 ) dτ 2 dτ 4 G p 2 (τ 24 ) = 2 β 1 ∂ ∂β 1 Z 0 (β 1 )β 2 ∂ ∂β 2 Z 0 (β 1 ), (8.19) where the expectations are with respect to the standard SYK action for each replica. 47 This leading order connected contribution is identical to the previously obtained results in equations (3.5) and (8.8), as should be expected.
The next correction comes from taking down three interaction terms, and it reads different Wick contraction (that all give the same result,) the p p/2 factor has to do with choosing which G 12 's in each group contract with a different G 12 , while the (p/2)! counts the different ways to contract the G 12 's after choosing the overall pairings. At the end this contribution reduces to the product of two six point functions The factorization form and the lengths of the operators in the correlation functions is also the same as in section 5.1. Indeed this is the same correlation function as in (5.4) only from the path integral approach.
Higher order terms in this pertubrative expansion have the same form as the ones studied in section 6.1. This is because the counting of all possible Wick contraction at order n is equivalent to counting how many ways there are to split n multi-fermions into groups such that each group appears twice, and so the suppression factors in this expansion will be identical to the ones computed in section 6.1.
Finally, we note that this calculation is a perturbative expansion around the disconnected saddle of (8.16), namely the solution where G 12 = 0. One can also consider connected saddle points of (8.16), similar to [33], which are related to a connected geometry in the bulk; though that is beyond the scope of this paper.

Discussion
In this paper we analyzed the leading order contributions to connected multi-trace expectation values in the SYK model. We showed that at early times the contributions can be organized into a well defined perturbative expansion in powers of = N p −1 ∼ N −p .
The lowest order contributions arise from connected cactus diagrams, as derived in section 3, or equivalently using a large M = N p vector model for the random couplings which was described in section 4. They are associated with a specific fluctuation parameter I J 2 I . This is actually the first of an infinite set of fluctuation parameters, which give higher order corrections to the moments, with decreasing strength. The next term in this expansion, which is the first correction of the odd moments, was systematically studied in section 5; while the general structure of this expansion was further examined in section 6.1.
The puzzling aspect of this expansion is its gravitational interpretation. Our analysis is valid at any energy range (captured by the moment method), and if we are studying the SYK model in a low temperature regime, albeit in early times, we expect there to be a clear gravitational dual. As these are connected contributions, one might expect them to arise from connected geometries between two disconnected boundaries in the sum over geometries implied by the gravitational path integral. However such contributions are topologically different and so are suppressed by factors of e −S 0 ∼ e −N [24], rather than the polynomial suppression in N that we observe. Thus the gravitational path integral of the dual theory must contain an additional non-geometric way to couple disconnected boundaries. Based on the leading order contributions studied in section 4.2, it seems like these contributions correspond to a global mode that collectively re-scales the AdS radius of the dual theory. The rest of the perturbative expansion can also be seen as additional fluctuation fields connecting the different copies of the boundary theory via the coefficient of their interaction terms without resorting to wormholes. Actually, the form of the interaction suggests that the between-universe correlations are only given by boundary terms. However, once we have identified the need for more fluctuation fields, then the rules of the AdS/CFT correspondence tell us that they propagate in the bulk.
It would be interesting to precisely specify the gravitational theory where such non-geometric fluctuations naturally arise from the path integral, perhaps by some deformation of JT gravity like [65,69], with the additional light fields that we discussed here. These non-geometric wormholes may also exist in recent stringy constructions of the SYK model [70], and so may be related to certain constructions in string theory. A more ambitious goal would be to understand under what more general circumstances these non-geometric global fluctuations arise in a theory of quantum gravity. For example, we might expect them to be there in any black holes with an AdS 2 near horizon limit.
One may speculate that perhaps these global fluctuations, and the corresponding bulk fields, are a generic feature of a UV complete theory of quantum gravity, at least when the background is associated with high entropy configurations. The only piece of evidence that we can provide is to consider the case in which there are no global modes at all, in the sense above. This is just a standard RMT model for the Hamiltonian and the various observables in the theory. I.e. H, or any other operator, is drawn from an ensemble dH exp(−N Tr(V (H))) where N is the dimension of the Hilbert space. This is not the case for the SYK model, but JT gravity alone falls into this class [24]. The problem is that it is difficult to have weakly coupled fields, i.e, familiar matter, propagating in any bulk dual. Consider two such fields, corresponding to operators O 1,2 in the field theory. If the statistics of these operators is RMT, i.e., independent of each other and of the Hamiltonian, then the correlator O 1 (t 1 )O 2 (t 2 )O 1 (t 3 )O 2 (t 4 ) will be exponentially suppressed (i.e. a suppression of the order 1/N ∼ e −S 0 ), 48 which does not correspond to weakly coupled gravity. But if there are strong correlations between matrix elements, these imply global modes (although perhaps not as simple as the ones we discussed here).
Generally, it is unclear if the correct gravitational picture is that of a single dual theory to the whole SYK ensemble, as suggested by [23,60], or perhaps that there is a gravitational dual to each realization of random couplings. In the second case the effective gravitational picture arises from measuring coarse-grained observables which only weakly depend on the exact values of the couplings, similar to [43,71,72]. In both cases, it is not quite clear what is the dual of a single realization if no coarsegraining is done. In this note, we are adding the option that there is actually a range of intermediate possibilities. We actually need to specify first the precision in which we plan to carry out our experiment, i.e., how much information about the specific couplings we want to obtain. If we increase the precision, we need to use a gravitational description which includes more fields, more couplings and more fluctuation parameters. This is not unreasonable since GR is an effective theory and we may have to use more features to describe effects with higher and higher precision. We can perhaps think about all these fields and couplings as already existing in the background, but averaged over if we carry out an experiment with low precision. They are still low mass fields, not to be integrated out by a Wilsonian argument, but we can ignore them at low precision experiments. This is tantamount to saying that coarse grained observable is given by GR. As we increase the level of precision we need to add more of these fields and couplings. When we reach high precision measurements -as we try and nail down what is the precise realization -we need to include many such fields, which might change the behaviour of the background altogether. For example, it might lead to disconnecting universes, and, simultaneously, introducing many new objects in the bulk, beyond GR. Clearly, more work is needed to elaborate this point of view.
Another interesting endeavor would be to analyze the global fluctuations in the charged SYK model [73,74], or the supersymmetric SYK models [75,76]. This can be done using the same chord diagram methodology which has been previously used to study these models [68,77]. Especially interesting would be to analyze how the existence of a large number of exact ground states in the N = 2 theory, observed in [17,77], affects these global fluctuations.
Finally we note that two replica action derived in the path integral analysis of the global fluctuations (in section 8) is different than the standard action for the two-replica bi-local fields G and Σ derived in [33]. 49 It may be simpler to study the connected saddle points of this new action, perhaps even finding the exact contributions that lead to the universal RMT structure of the ramp and plateau in the spectral form factor. and in the double scaled limit q is simply q = e −2p 2 /N . At finite N and p we can find an exact expression for q given by (see [26] for details) (A.4) Figure 23: A comparison of the leading order correction to the spectral form factor for N = 28, 30, 32 to the numerical results from [16].
Using this spectral density we can calculate Z(β) and the first contribution to the connected 2-point thermal partition function (given by (3.5)) via simple numerical integration. However the scaling we choose for the normalization of H is different than the choice in [16], namely we choose to normalize 2 −N/2 tr(H 2 ) = 1, while they choose to have it proportional to N to get a well defined large N action. Thus to compare to their numerical results we need to set (see [22] for details) Then we can compare these results to the numerical results in [16] using simple numerical integration of the spectral density. Apart from the comparison in the text for N = 34, we also compared the leading term to the numerical results from [16] for N = 28, 30, 32, which are given in figure 23. We note that the slope-plateau transition is not exactly matched as we use the GUE approximation from (3.18) rather than the GOE or GSE forms of the ramp, which are the correct universality classes for N = 32 and N = 28 respectively.