On the Spectral Form Factor for Random Matrices

In the physics literature the spectral form factor (SFF), the squared Fourier transform of the empirical eigenvalue density, is the most common tool to test universality for disordered quantum systems, yet previous mathematical results have been restricted only to two exactly solvable models (Forrester in J Stat Phys 183:33, 2021. 10.1007/s10955-021-02767-5, Commun Math Phys 387:215–235, 2021. 10.1007/s00220-021-04193-w). We rigorously prove the physics prediction on SFF up to an intermediate time scale for a large class of random matrices using a robust method, the multi-resolvent local laws. Beyond Wigner matrices we also consider the monoparametric ensemble and prove that universality of SFF can already be triggered by a single random parameter, supplementing the recently proven Wigner–Dyson universality (Cipolloni et al. in Probab Theory Relat Fields, 2021. 10.1007/s00440-022-01156-7) to larger spectral scales. Remarkably, extensive numerics indicates that our formulas correctly predict the SFF in the entire slope-dip-ramp regime, as customarily called in physics.


Introduction
Spectral statistics of disordered quantum systems tend to exhibit universal behavior and hence are widely used to study quantum chaos and to identify universality classes. In the chaotic regime, the celebrated Wigner-Dyson-Mehta eigenvalue gap statistics involving the well-known sine-kernel [42] tests this universality on the scale of individual eigenvalue spacing. On this small microscopic scale the universality phenomenon is the most robust and it depends only on the fundamental symmetry type of the model. On larger scales more details of the model influence the spectral statistics, nevertheless several qualitative and also quantitative universal patterns still prevail.
1.1. The spectral form factor and predictions from physics. In the physics literature the standard tool to investigate eigenvalues λ 1 , λ 2 , . . . , λ N of a Hermitian N × N matrix (Hamiltonian) H on all scales at once is the spectral form factor (SFF) [40] defined as e it(λi−λj ) = | e itH | 2 (1.1) For typical disordered Hamiltonians a key feature of SFF(t) is that for larger t (more precisely, in the ramp and plateau regimes, see later) it is strongly dependent on the sample, i.e. the standard deviation of SFF(t) is comparable with K(t). In other words, SFF(t) is not self-averaging [45] despite the large summation in (1.1). The spectral form factor and its expectation K(t) have a very rich physics literature since they contain most physically relevant information about spectral statistics. Quantizations of integrable systems typically result in K(t) ∼ 1/N for all t where N is the dimension of the Hilbert space. Chaotic systems give rise to a linearly growing behavior of K(t) for smaller t (so-called ramp) until it turns into a flat regime, the plateau. The turning point is around the Heisenberg time T H , but the details of the transition depend on the symmetry class of H and on whether the eigenvalues are rescaled to take into account the non-constant density of states (in physics terminology: unfolding the spectrum). For example, in the time irreversible case (GUE symmetry class) the unfolded SFF has a sharp kink, while in the GOE symmetry class the kink is smoothened. The exact formulas can be computed from the Fourier transform of the two point eigenvalue correlation function of the corresponding Gaussian random matrix ensemble, see [42,Eqs. (6.2.17), (7.2.46)], the result is 3) for any fixed τ > 0 in the large N limit. Here we expressed the physical time t in units of the Heisenberg time, τ = t/T H , where T H is given by T H = 2πρ withρ being the average density. Choosing the standard normalisation for the independent (up to symmetry) matrix elements, the limiting density of states is the semicircle law ρ sc (E) = 1 2π (4 − E 2 ) + , so we have N eigenvalues in an interval of size 4, henceρ = N/4 and thus T H = π 2 N . In particular, in the original t variable (1.5) Note the lower bound on t: the formula holds in the large N limit in the regime where t ≥ δN for some fixed δ > 0 that is independent of N . The corresponding formulas without unfolding the spectrum (i.e. for the quantity defined in (1.1)) are somewhat different, see e.g. [9,Eq. (4.8)] for the GUE case; they still have a ramp-plateau shape but the kink is smoothened. The ramp-plateau picture and its sensitivity to the symmetry type has been established well beyond the standard mean field random matrix models. In fact, the Bohigas-Giannoni-Schmit conjecture [6] asserts that the formulas (1.3) are universal, i.e. they hold essentially for any chaotic quantum system, depending only on whether the system is without or with time reversal symmetry. The nonrigorous but remarkably effective semiclassical orbit theory [48,43,31,4] based upon Gutzwiller's trace formula [27] and many follow-up works verified this conjecture for quantizations of a large family of classical chaotic systems, e.g. for certain billiards.
For smaller times, t T H , other details of H may become relevant. In particular the drop from K(t = 0) = 1 to K(t) 1 for 1 t T H is first dominated by the typical non-analyticity of the density of states at the spectral edges giving rise to the slope regime up to an intermediate minimum Std| e itH | 2 Figure 1. A typical slope-dip-ramp-plateau picture for the spectral form factor of a chaotic system. The figure on log-log scale shows the SFF of a single GUE realisation H of size 500×500, as well as the empirical mean and standard deviation obtained from 500 independent realisations. point of K(t), called the dip (in the early literature the dip was called correlation hole [40], for a recent overview, see [17]). Figure 1 shows the typical slope-dip-ramp-plateau picture for the GUE ensemble. Formula (1.5) is valid starting from scales t N 1/2 , while K(t) is oscillatorily decreasing for t N 1/2 with a dip-time t dip ∼ N 1/2 . Thus K(t) follows the universal behavior (1.5) only for t t dip . In this regime the fluctuation of the SFF is comparable with its expectation, K(t), in fact e itH is approximately Gaussian. In contrast, the dominant contribution to the slope regime, t t dip , is self-averaging with a relatively negligible fluctuation. However, if the edge effects are properly discounted (e.g. by considering the circular ensemble with uniform spectral density on the unit circle), i.e. the slope regime is entirely removed, then the Gaussian behavior holds for all t T H with a universal variance given by (1.5).
In more recent works spectral form factors were studied for the celebrated Sachdev-Ye-Kitaev (SYK) model [46,18,32,24,23] which also exhibits a similar slope-dip-ramp-plateau pattern although the details are still debated in the physics literature and the numerics are much less reliable due to the exponentially large dimensionality of the model.

1.2.
Our results. Quite surprisingly, despite its central role in the physics literature on quantum chaos, SFF has not been rigorously investigated in the mathematics literature up to very recently, when Forrester computed the large N limit of K(t) rigorously for the GUE in [21] and the Laguerre Unitary Ensemble (LUE) in [22] in the entire regime t N . Both results rely on a remarkable identity from [9] (and its extension to the LUE case) and on previous stimulating work of Okuyama [44]. However, these methods use exact identities and thus are restricted to a few explicitly solvable invariant ensembles.
The main goal of the current paper is to investigate SFF beyond these special cases with a robust method, the multi-resolvent local laws. While our approach is valid for quite general ensembles, for definiteness we focus on two models: the standard Wigner ensemble (for both symmetry classes) and the novel monoparametric ensemble introduced recently [25] by Gharibyan, Pattison, Shenker and Wells. The latter consists of matrices of the form H s := s 1 H 1 + s 2 H 2 , where H 1 and H 2 are typical but fixed realisations of two independent Wigner matrices and s = (s 1 , s 2 ) ∈ S 1 ⊂ R is a continuous random variable. The normalization s 2 1 + s 2 2 = 1 guarantees that the semicircle law for H s is independent of s and it also shows that the model has effectively only one random parameter. One may also consider similar ensembles with finitely many parameters (see Remark 2.4) resulting in qualitatively the same behavior but with different power laws, see Table 1.
We study the statistics of H s in the probability space of the single random variable s and probe how much universality still persists with such reduced randomness. We write E s for the expectation wrt. s and E H , Std H for the expectation and standard deviation wrt. H 1 and H 2 .
Our main result is to prove a formula for the expectation and standard deviation of SFF for both ensembles up to an intermediate time. While this does not include the ramp regime, it already allows us to draw the following two main conclusions of the paper: (a) The expectation and standard deviation of SFF(t) for Wigner and monoparametric ensembles exhibit the same universal behavior to leading order for 1 t N 1/4 if the trivial edge effects are removed. In the monoparametric case it is quite remarkable that already a single real random variable generates universality. (b) For the monoparametric ensemble K(t) = E s [SFF(t)] depends non-trivially on the fixed H 1 , H 2 matrices, but for large t this dependence is a subleading effect whose relative size becomes increasingly negligible as a negative power of t. In particular, while the speed of convergence to universality is much slower for the monoparametric ensemble than for the Wigner case, it is improving for larger t.
The second item answers a question raised by the authors of [25] which strongly motivated the current work. In particular, sampling from s does not give a consistent estimator for K(t), but the relative precision of such estimate improves for larger times. We supplement these proofs with an extensive numerics demonstrating that both conclusions hold not only for t N 1/4 but for the entire ramp regime, i.e. up to t T H ∼ N . Note that recently we have proved [15] that the Wigner-Dyson-Mehta eigenvalue gap universality holds for the monoparametric ensemble, which strongly supports, albeit does not prove, that K(t) in the plateau regime is also universal.
We remark that our method applies without difficulty for finite temperatures (expressed by a parameter β > 0) and for different-time autocorrelation functions, i.e. for e (−β+it)H e (−β−it )H as well, but for the simplicity of the presentation we focus on SFF(t) defined in (1.1), i.e. on β = 0 and t = t .
1.3. Relations to previous mathematical results. Rigorous mathematics for the spectral form factor, even for Wigner matrices or even for GOE, significantly lags behind establishing the compelling physics picture about the slope-dip-ramp-plateau. Given the recently developed tools in random matrix theory, it may appear surprising that they do not directly answer the important questions on SFF. We now briefly explain why.
1.3.1. Limitations of the resolvent methods. For problems on macroscopic spectral scales (involving the cumulative effect of order N many eigenvalues), and to a large extent also on mesoscopic scales (involving many more than O(1) eigenvalues), the resolvent method is suitable. This method considers the resolvent G(z) = (H − z) −1 of H for a spectral parameter z away from (but typically still close to) the real axis and establishes that in a certain sense G(z) becomes deterministic. This works for η = z N −1 (in the bulk spectrum), i.e. on scales just above the eigenvalue spacing (note that the imaginary part of the spectral parameter sets a scale in the spectrum). Such results are called local laws and they can be extended to regular functions f (H) by standard spectral calculus (Helffer-Sjöstrand formula, see (3.3) later).
However, the interesting questions about SFF concern a 1/N subleading fluctuation effect beyond the local laws. Indeed Tr is a special case of the well-studied linear eigenvalue statistics, To leading order it is deterministic and its fluctuation satisfies the central limit theorem (CLT) without the customary is a normal random variable with variance 1 The computation of higher moments of Tr f (H) − E Tr f (H) requires a generalization of the local laws to polynomial combinations of several G's that are called multi-resolvent local laws. Applying (1.6)-(1.7) to f (x) = e itx we obtain, roughly, using that where J 1 is the first Bessel function of the first kind. Note that V f in (1.7) scales essentially as the H 1/2 Sobolev norm of f hence V f ∼ t for our f (x) = e itx in the regime t 1. Therefore the size of the fluctuation term in (1.8) is V f /N 2 ∼ t/N 2 and it competes with the deterministic term linear ramp function) becomes bigger than the slope function (J 1 /t) 2 . This argument, however, is heuristic as it neglects the error terms in (1.6) that also depend on t via f . CLT for linear statistics (1.6) for Wigner matrices H has been proven [33,34,1,47,49,26,29,30,13,36,41,38,3,28] for test functions of the form f (x) = g(N a (x − E)) with some fixed reference point |E| < 2, scaling exponent a ∈ [0, 1) and smooth function g with compact support, i.e for macroscopic (a = 0) and mesoscopic (0 < a < 1) test functions living on a single scale N −a . These proofs give optimal error terms for such functions but they were not optimized for dealing with functions that oscillate on a mesoscopic scale and have macroscopic support, like f (x) = e itx for some t ∼ N α , α > 0. The only CLT-type result for a special two-scale observable is in [2] where the eigenvalue counting function smoothed on an intermediate scale N −1/3 was considered.
Quite remarkably, extensive numerics shows that the formulas (1.6)-(1.7) for f (x) = e itx are in perfect agreement with the expected behavior of K(t) in the entire slope-dip-ramp regime all the way up to t N , i.e. the CLT for linear statistics correctly predicts SFF well beyond its proven regime of validity. In the current paper we optimise the error terms specifically for e itx and thus we could cover the regime t N 5/11 for the variance in (1.6) (corresponding to E[SFF(t)]).
is the two point function, given by the celebrated Wigner-Dyson sine kernel, and K GOE (t) has a similar origin. Wigner-Dyson theory is designed for microscopic scales, i.e. to describe eigenvalue correlations on scales comparable with the local level spacing ∆, this is encoded in the fact that (1.3) holds for any fixed τ > 0 in the N → ∞ limit (equivalently that (1.5) holds only for t ≥ δN since ∆ ∼ 1/N in the bulk). While this is a very elegant argument supporting (1.3), mathematically it is quite far from a rigorous proof. The mathematical proofs of the sine-kernel universality use test functions that are rapidly decaying beyond scale ∆. The typical statements (so called fixed energy universality [7,39]) show that for any fixed energy E in the bulk in the large N limit, for any smooth, compactly supported functions g : R 2 → R. The current methods for proving the Wigner-Dyson universality cannot deal with functions that are macroscopically supported, like g(x, y) = e it(x−y) with a fast oscillation t ∼ N .
1.4. Summary. Using multi-resolvent local laws we prove a CLT for linear statistics of monoparametric ensembles (Theorem 2.5) with covariance with an additional term depending on the fourth cumulant. Due to a careful analysis of the error terms this allows us to prove the expected behavior on the expectation and standard deviation of the SFF for Wigner matrices for t N 5/17 (Theorem 2.7) and for the monoparametric ensemble for t N 1/4 (Theorem 2.8). Beyond these regime the spectral form factor is not understood mathematically apart from the special GUE and LUE cases. However, we can still use our predictions from the CLT for linear statistics (1.6) to derive an Ansatz for the behavior of SFF(t) in the entire t N regime. In particular, we show that the SFF is universal for the monoparametric ensemble. We find numerically that our theory correctly reproduces SFF(t) for any t N and it also coincides with the physics predictions for the GUE case.

Notations and conventions.
For positive quantities f and g we will frequently use the notation f ≈ g meaning that f /g → 1 in a limit that is always clear from the context. Similarly, f g means that f /g → 0. Finally, the relation f ∼ g means that there exist two positive constants c, C such that c ≤ f /g ≤ C.
We say that an event holds "with very high probability" if for any fixed D > 0 the probability of the event is bigger than Acknowledgement. We are grateful to the authors of [25] for sharing with us their insights and preliminary numerical results. We are especially thankful to Stephen Shenker for very valuable advice over several email communications. Helpful comments on the manuscript from Peter Forrester and from the anonymous referees are also acknowledged.

Statement of the main results
Our new results mainly concern the monoparametric ensemble but for comparison reasons we also prove the analogous results for the Wigner ensemble. We start with the two corresponding definitions.
Definition 2.1. The Wigner ensemble consists of Hermitian N × N random matrices H with the following properties. The off-diagonal matrix elements below the diagonal are independent, identically distributed (i.i.d) real (β = 1) or complex (β = 2) random variables; in the latter case we assume that Eh 2 ij = 0. The diagonal elements are i.i.d. real random variables with Eh 2 ii = 2/(N β). Besides the standard normalisation (1.4), we also make the customary moment assumption: for every q ∈ N there is a constant C q such that In the case of Gaussian distributions, it is called the Gaussian Orthogonal or Unitary Ensemble (GOE/GUE), for the real and complex cases, respectively.
Remark 2.2. The assumptions Eh 2 ij = 0 in the complex case, and Eh 2 ii = 2/(βN ) are made purely for convenience. All results can easily be generalised beyond this case but we refrain from doing so for notational simplicity.
where H 1 , H 2 are independent Wigner matrices satisfying 2 E|h ij | 4 and s = (s 1 , s 2 ) ∈ S 1 is a random vector, independent of H 1 , H 2 . On the distribution of s we assume that it has an square integrable density ρ(s) independent of N . We write E s for the expectation wrt. s and E H , Std H for the expectation and standard deviation wrt. the Wigner matrices H 1 and H 2 .
The parameter space S 1 ⊂ R 2 inherits the usual scalar product and norms from R 2 , so for s, r ∈ S 1 we have s, r := s 1 r 1 + s 2 r 2 , s p := (|s 1 | p + |s 2 | p ) 1/p . We also introduce the entrywise product of two vectors: For a fixed s, H s is just the weighted sum of two Wigner matrices, and, due to the normalisation, itself is just a Wigner matrix. However, the concept of monoparametric ensemble views H s as a random matrix in the probability space of the single random variable s for a typical but fixed (quenched) realization of H 1 and H 2 . While Wigner matrices have a large (∼ N 2 ) number of independent random degrees of freedom, the monoparametric ensemble is generated by one single random variable hence, naively, much less universality properties are expected. Nevertheless, the standard Wigner-Dyson local eigenvalue universality holds [15].
Remark 2.4. In [15] we considered the un-normalized monoparametric model H s := H 1 + sH 2 , for some real valued random variable s, whose density of states is a rescaled semicircular distribution. In this paper we prefer to work with more homogeneous models since the formulas are somewhat nicer, but our main results also apply to this inhomogeous model with some slightly different exponents in the error terms. One may also consider a different un-normalized ensemble, s 1 H 1 + s 2 H 2 with s ∈ R 2 having an absolutely continuous distribution, which is effectively a two parameter model. Similar results also hold for the multi-parametric analogue of (2.2), i.e. s 1 H 1 + · · · + s k H k for s ∈ S k−1 , see Remark 2.6 and Section 2.4 later. Despite all these options, for definiteness, the main body of this paper concerns the homogenous monoparametric model from Definition 2.3.

Central limit theorem for sum of Wigner matrices.
To understand the effect of the random s, we study the joint statistics of H s and H r for two different fixed realisations r, s in the probability space of H 1 , H 2 , i.e. we aim at the correlation effects between H s and H r . We introduce the short-hand notations To estimate the error term in the following theorem we introduce a parameter 1 ≤ τ N and the weighted norm For the applications later, the parameter τ will be optimized.
Here E p are error terms which for any 1 ≤ τ N and any ξ, > 0 may be estimated by 4 for p ≥ 2. Additionally, if s 1 = · · · = s p , i.e. in the Wigner case, we have the improved bound and the first term of (2.7) for β = 2 coincides with (1.7).
We note that (2.7) generalizes the standard variance calculation yielding (1.7) to s = r, see Section 3.2.4. Remark 2.6. Theorem 2.5 verbatim holds true also for the multi-parametric model upon interpreting s, r and s p as the Euclidean inner product and p-norm in R k . Similarly, Theorem 2.5 also applies to the un-normalised case s ∈ R 2 for which on the rhs. of (2.5) the function f has to be replaced by f ( s ·) with · := · 2 and v sr from (2.7) has to be replaced by (2.10) 3 For the applications in this paper, SFF in the regime t 1, the first term in (2.7) is the only relevant one. 4 The exponent in (2.8) can be optimized depending on τ and f .

SFF for Wigner and monoparametric ensemble.
In this section we specialise Theorem 2.5 to the SFF case. We define the approximate expectation (rescaled by 1/N ) in terms of the Bessel functions J k of the first kind. We also define the approximate variance From Theorem 2.5, choosing f i (x) = e ±itx and τ = t, and recalling that e ±itH s = N −1 Tr e ±itH s , we readily conclude the following asymptotics for SFF of the Wigner and monoparametric ensemble.

13)
and we have the asymptotics where we set e := (1, 0) ∈ S 1 . Figure 2). In particular, in the ramp regime the SFF is a non-negative random variable whose fluctuations are of the same size as its expectation. Thus the SFF is not self-averaging in the ramp regime, while it is self-averaging in the slope regime but only owing to the dominance of the function e(t) representing the edge effect. If one discounts the edge effect, i.e. artificially removes e(t), then S wig (t) ≈ E wig (t) would hold for all 1 t N , demonstrating the universal behavior of SFF in the entire slope-dip-ramp regime.
In the first plot we compare the empirical mean (red) and standard deviation (blue) of | e itH | 2 obtained from sampling 10, 000 independent 100 × 100 GUE matrices H with our approximation (2.13). In the second plot we similarly compare the empirical mean (red) and variance (blue), with respect to s, obtained from sampling 500 independent scalar random variables s (from the uniform distribution on S 1 ) and 500 independent 100 × 100 GUE matrix pairs H 1 , H 2 , with the prediction (2.15). We also test the precision of the latter GUE-pair sampling by finding the empirical standard deviation (with respect to H 1 , H 2 ) of the empirical mean of the monoparametric SFF (orange). We observe that for both ensembles our resolvent approximation seems valid for all t < N .
where the function where ψ(t) ∼ 1 is a positive function with some oscillation.
In particular, this result immediately shows the following concentration effect: i.e. averaging in s reduces the size of the fluctuation of the SFF by a factor of t −1/4 .

Note that
both in the slope and ramp regimes showing that not only the expectation but also the variance of the SFF for the monoparametric ensemble coincide with those for the Wigner ensemble to leading order, hence they follow the universal pattern (red and blue curves in the second plot in Figure 2). However, the dependence of E s [SFF(t)] on the fixed Wigner matrix pair (H 1 , H 2 ) is still present, albeit to a lower order, expressed by the residual standard deviation S res (t) whose relative size decreases as t −1/4 as t increases (orange curves in Figure 2). It is quite remarkable that a single random mode s generates almost the entire randomness in the ensemble that is responsible for the universality of SFF. A similar phenomenon was manifested in the Wigner-Dyson universality proven in [15].
Remark 2.10. Based upon extensive numerics (see Figure 2) we believe that (2.13), (2.15) and (2.18) hold up to any t N , i.e. in the entire slope-dip-ramp regime and not only up to some fractional power of N as stated and proved rigorously. The proof for the entire regime t N is out of reach with the current technology based upon the multi-resolvent local law Lemma 3.3 whose error term does not trace the expected improvement due to different spectral parameters z 1 = z 2 . We expect that the entire ramp regime t N should be accessible by resolvent techniques if a sharp version of Lemma 3.3, tracing the gain from z 1 = z 2 , was available.
Remark 2.11. We stated Theorems 2.7 and 2.8 only for the first two moments but the CLT from Theorem 2.5 allows us to compute arbitrary moments E| e itH | 2m for the Wigner case and E s | e itH s | 2m for the monoparametric case (together with their concentration in the (H 1 , H 2 )space), albeit with worsening error estimates. This would lead to rigorous results of the type (2.13) and (2.15) but for a shorter time scale t N c(m) with some c(m) > 0. However, in the spirit of Remark 2.10, we believe that e itH s can be approximated for any t N , to leading order, by a family of complex Gaussians ξ(t, s) of mean and variance with v sr from (2.7). Note that (2.20) also specifies the covariance of ξ(t, s) and ξ(t , s ) = ξ(−t , s ) for different times.
The next lemma, to be proven in Section 3.2.4, provides explicit asymptotic formulas for v ss ± (t), in particular they imply the asymptotics in (2.14) together with e(t) ∼ t −3/2 (up to some oscillation due to the Bessel function) in the large t regime.
The relation in (2.17) requires a stationary phase calculation that will be done separately in Section 5.
2.3. Implications for sampling. Determining the standard deviation of | e itH | 2 is important for numerical testing of (2.13). By taking the empirical average E n H of n independent Wigner matrices we may approximate the true expectation E H | e itH | 2 at a speed c.f. the top of Figure 3.
Here Ω(· · · ) indicates an oscillatory error term of the given size. In the ramp regime the fluctuation of E n H | e itH | 2 thus scales like t/( √ nN 2 ) using (2.14). In particular, this fluctuation vanishes as the sample size n goes to infinite, hence the statistics via sampling to test (2.13) is consistent. In contrast, for the monoparametric ensemble, by taking the empirical average of n copies of s we naturally have Replacing the first term by its expectation plus its fluctuation in the H-probability space, we also get where the error term contains both standard deviations and satisfies , S res (t)) n = 5 n = 20 n = 1000 Figure 3. In the first plot we show the empirical mean of | e itH | 2 for k independent GUE matrices H. As expected the standard deviation of the sample average fluctuates within a strip of width n −1/2 Std H | e itH | 2 , in particular the sample average exactly reproduces the mean if n → ∞. In the second plot we show the empirical mean of | e itH s | 2 for k independently sampled scalar random variables s for a fixed GUE matrix pair H 1 , H 2 . We observe that while the sample mean approximates the true mean E s increasingly well as n → ∞, the latter is still dependent on the chosen realisation of H 1 , H 2 . Thus the empirical mean fluctuates in a strip of width max(n −1/2 S wig (t), S res (t)) around the doubly averaged E H E s | e itH s | 2 .
due to (2.15) and (2.17). In particular, both in the slope and in the ramp regimes the size of the fluctuation of E n s | e itH s | 2 does not vanish even as the number of samples goes to infinity, n → ∞, hence the statistics is not consistent, c.f. the bottom of Figure 3. However, this lack of consistency, expressed by S res (t) is still negligible compared with the leading first term in (2.24) by a factor t −1/4 1 in the large t regime, see (2.19). We recall that mathematically rigorously we can prove all these facts only for t N 1/4 , i.e. well before the dip time, but the numerical tests leave no doubt on their validity in the entire regime 1 t N .

Extensions.
Beside the Wigner ensemble, we formulated our main results on SFF for the normalized monoparametric model in Theorem 2.8. We chose this model for definiteness, but our approach applies to the multi-parametric as well as to the un-normalised models introduced in Remark 2.4. Here we explain the modified results for these natural generalisations.
First, for the multi-parametric normalised model, H s = s 1 H 1 + . . . + s k H k with k − 1 effective parameters s ∈ S k−1 , Theorem 2.8 holds true verbatim modulo different sizes for the residual standard deviation S res (t). In fact, we have Consequently, the upper bounds on the times of proven validity in (2.15) slightly improve but they still remain below the dip time and we omit the precise formulas. We note that the t-power in (2.26) is not optimal for k ≥ 3. A refined stationary phase estimate could be used to improve the estimate but we refrain from doing so since our primary interest is the mono-parametric model with few degrees of freedom. Second, for the un-normalised model H s = s 1 H 1 + s 2 H 2 with two effective parameters s ∈ R 2 , Theorem 2.8 also holds true modulo some minor changes. More precisely, (2.15) becomes with S res obtained from replacing v sr by v sr from Remark 2.6 in (2.12). For S res (t) a stationary phase calculation gives the modified assuming that s has an absolutely continuous distribution with a differentiable, compactly supported density ρ on R 2 with ρ(0) = 0. We will not prove the relation in these formulas in this paper, we only show how to obtain the necessary upper bound on them at the end of Section 5.
Note that now 29) i.e. the fluctuation due to the residual randomness of (H 1 , H 2 ) after taking the expectation in s remains negligible, in fact it is reduced compared with the normalised case (2.19). As a consequence t 1/4 in (2.25) is replaced by t 3/4 .
Analogous results hold for the most general multi-parametric un-normalised model as well as to the mono-parametric inhomogeneous model H s = H 1 + sH 2 , s ∈ R. We omit their precise formulation, the key point is that the analogue of (2.27) hold in all cases with a residual standard deviation S res (t) being smaller than the leading term S wig (t) by a polynomial factor in t (e.g. by t −1/2 for H s = H 1 + sH 2 ). This guarantees that the universality of SFF holds for all these models. Table 1 summarizes the decay exponents of our main parametric models. Table 1. For our three main parametric models the following table lists the size of the residual fluctuation compared to the fluctuation of the Wigner-SFF.
Quenched parametric model Randomness Outline. The rest of the paper is organised as follows. In Section 3 we outline the resolvent method and explain how via the Helffer-Sjöstrand representation a resolvent CLT implies the CLT for the linear statistics f (λ i ) of arbitrary test functions f from which our main results Theorems 2.5-2.8 follow. In Section 4 we present the proof of the resolvent CLT, while in Section 5 we conclude the proof of the asympotics (2.17) via a stationary phase argument.

Resolvent method
Let H be a Wigner matrix 5 and G(z) := (H − z) −1 its resolvent with a spectral parameter z ∈ C \ R. Define m sc (z), the Stieltjes transform of the semicircle law: The local law for a single resolvent states that the diagonal matrix m(z) · I well approximates the random resolvent G(z) in the following sense (see e.g. [5,20,35]): with η = | z|, for any fixed deterministic matrix A and deterministic vectors x, y. The first bound is called averaged local law, while the second one is the isotropic local law. The bounds (3.2) are understood in very high probability for any fixed ξ > 0. The Helffer-Sjöstrand formula with z = x + iη and d 2 z := dη dx, expresses the linear statistics of arbitrary functions as an integral of the resolvent G(z) and the almost-analytic extension of f . Here the free parameter τ ∈ R is chosen such that N −1 τ −1 1, and χ a smooth cut-off equal to 1 on [−5, 5] and equal to 0 on [−10, 10] c . The same τ was used to define the weighted H 2 -norm (2.4) and eventually we will optimize its value, a procedure that improves the standard error terms in the CLT. By (3.2) it follows that In order to compute the fluctuation in (3.5) via (3.3) we need to understand the correlation between G(z) , G(z ) for two different spectral parameters z, z which turns out to be given by allows only deterministic test matrices multiplying G. Nevertheless G(z)G(z ) is still approximable by a deterministic object: Statements of the form (3.7) with an appropriate error term are called multi-resolvent local laws. We will apply this theory to the product of the resolvents G s of H s = s 1 H 1 + s 2 H 2 for two different parameters s, see the corresponding local law on G s G r in (3.11) later. Even though H 1 and H 2 as well as s and r are independent, the common (H 1 , H 2 ) ingredients in H s and H r introduce a nontrivial correlation between these matrices. We therefore need to extend CLT for resolvents via multi-resolvent local laws to this parametric situation.

10)
with ρ i := π −1 | m i |. We point out that similar resolvent CLT have often been used as a basic input to prove CLT for linear eigenvalue statistics of both Hermitian and non-Hermitian matrices down to optimal mesoscopic scales (see e.g. [11,10,14,28,29,30,37,38,36]). The main novelty here is to extend the resolvent CLT to the monoparametric ensemble.
Along the proof of Proposition 3.1 we establish the following multi-resolvent local laws.
we have the two-and three-resolvent local laws where m i = m sc (z i ), with very high probability for any fixed ξ, > 0 and | z i | ≥ N −1+ .
The proofs of Proposition 3.1 and Lemma 3.3 will be presented in Section 4. In these proofs we will often use the standard cumulant expansion (see [8,30,34] in the random matrix context): Here ∂ ab denotes the directional derivative ∂ h ab , the first term in the rhs. represents the second order (Gaussian) contribution, while the sum in (3.12) represents the non-Gaussian contribution with κ p,q ab denoting the joint cumulant of p copies of N 1/2 h ab and q copies of N 1/2 h ab . The cumulant expansion is typically truncated at a high (N -independent) level R with an error term Ω R that is negligible. To see this, note that in our applications f will be a product of resolvents at spectral parameters z i with η * = min | z i | 1/N hence derivatives of f remain bounded with very high probability by the isotropic local law (3.2) thus the tail of the series (3.12) decays as N −(k+1)/2 .

3.2.
Proof of Theorem 2.5. The proof of Theorem 2.5 is divided into three steps: (i) computation of the expectation, (ii) computation of the variance, (iii) proof of Wick Theorem. The expectation is computed in Section 3.2.1, while the Wick Theorem and the explicit computation of the variance are proven in Section 3.2.2.

Expectation. Using the bound
and | G s − m | N ξ (N η) −1 by (3.2), with m = m sc , we conclude that for any N −1 η 0 τ −1 . Note that we chose η 0 N −1 in order to use Proposition 3.1.
Plugging (3.10) into (3.14), and using (3.13) to estimate the error term, we get that where to go to the last line we chose η 0 ∼ N −1+ , for some very small > 0, and we used the norm f τ defined in (2.4).
Adding back the regime |η| < η 0 at the price of a negligible error smaller than the one in (3.15), by explicit computations (exactly as in [13,Section D.1]) in the leading term of (3.15), we conclude (3.16)

Second moment and Wick theorem.
Define (3.17) then in this section, using Proposition 3.1, we compute the leading order term of E H L N (f 1 , s 1 )L N (f 2 , s 2 ). More precisely, by (3.8) for p = 2, and using (3.13) to estimate the error term, it follows that 18) where to go to the last line we chose η 0 ∼ N − τ −1 , for any > 0, and V 12 is defined in (3.9). From (3.18), adding back the regimes |η i | < N − τ −1 at the price of an error smaller than the one in the last line of (3.18), we conclude (2.6) for p = 2 by explicit computation in deterministic term as in [13,Section D.2]. We conclude this section with the computation of higher moments: which concludes the proof of (2.6) for any p ∈ N, after adding back the regimes |η i | < N − τ −1 at the price of an error smaller than the one in the second line of (3.19).

3.2.3.
Proof of Theorems 2.7 and 2.8. We just show how Theorem 2.7 follows by Theorem 2.5; the proof of Theorems 2.8 is completely analogous and so omitted. In particular, to make the presentation shorter we just show the details of the proof of the first equation in (2.13). Using Theorem 2.5 as an input, the proof of the second equation in (2.13) follows exactly in the same way.
First of all we write with E wig (t) defined in (2.14). Finally, using the asymptotics of E wig (t) in (2.14) we readily conclude that the error term in (3.21) is much smaller than the leading term E wig (t) as long as t N 5/11 .

3.2.4.
Variance calculations when s = r and the proof of Lemma 2.12. We note that (2.7) generalises the standard variance calculation yielding (1.7) to s = r. For the case s = r the two formulas can be seen to be equivalent using the identity and v ss where the series representations follow directly from [13, Remark 2.6] and the series evaluations follow from [50, V. §5.51 (1)].

Central Limit Theorem for resolvents
The proof of Proposition 3.1 is divided into three parts: in Section 4.1 we compute the subleading order correction to E H G i , in Section 4.2 we explicitly compute the variance, and finally in Section 4.3 we prove a Wick Theorem. To keep our presentation simpler we only prove the CLT for resolvent in the complex case, the real case is completely analogous and so omitted (see e.g. [13,Section 4]).

4.1.
Computation of the expectation. For G = G s (z) we have so that G ≈ m for the solution m to the equation The fact that G ≈ m in averaged and isotropic sense follows from the single resolvent local law (3.2). This is a consequence of the fact that the term H i G in (4.1) is designed in such a way EH i G ≈ 0 in averaged and isotropic sense. In fact, for Gaussian ensembles EH i G = 0 and the deviation from zero for general ensembles is a lower order effect due to non-vanishing of higher order cumulants of the entry distribution. From (4.1) and (4.2) we obtain Additionally, we define ρ(z) := π −1 | m(z)|. For simplicity of notation from now on we assume that z > 0. We remark that by 1 − m 2 · in the lhs. of (4.3) we denote the operator acting on matrices We then start computing: for any small ξ > 0, where we used that |1 − m 2 | ρ, that m = m 2 /(1 − m 2 ), and that | G − m | N ξ (N η) −1 by (3.2). Then using cumulant expansion (see (3.12), ignoring the truncation error) we claim (and prove below) that where κ (i) (ab, α) denotes the joint cumulant of the random variables h i ab , h i α1 , . . . , h i α k , and ∂ (i) Here h i αj are the entries of H i . Combining (4.5) with (4.4) we obtain exactly the expansion in (3.10) (recall that here we only present the proof in the complex case, the real case being completely analogous).
We start with k = 2. In this case we can neglect the summation when a = b since it gives a contribution N −3/2 . Hence we can assume that a = b. In this case we have the bounds with very high probability. The first bound in (4.6) follows from the isotropic law in (3.2). The second bound in (4.6) follows by writing G = m + (G − m) and using the isotropic resummation with e a ∈ R N the unit vector in the a-direction and 1 := (1, . . . , 1) ∈ R N . For k = 3 whenever there are at least two off-diagonal G's we get a bound N −2 η −1 ρ. The only way to get only diagonal G's is that α is one of (ab, ba, ba), (ba, ab, ba), (ba, ba, ab); in this case κ (i) (ab, α) = κ 4 /N 2 , with κ 4 := κ (i) (ab, ba, ab, ba). For these terms we have (see [13,Lemma 4.2] for the analogous proof for Wigner matrices) with very high probability, where the error comes from terms with at least two off-diagonal G's.
Hence we finally conclude that the terms k = 3 give a contribution: All the terms with k ≥ 4 can be estimated trivially using that |G ab | 1 with very high probability by (3.2).

4.2.
Computation of the variance. For the second moment, using (4.3), we compute We made this replacement to use the equation for G − m from (4.3).
Then performing cumulant expansion we compute: (4.11) Using the local law (3.11) we conclude that with very high probability.
We are now left with the third line of (4.11). The α-derivative in (4.11) may hit either (G 1 ) ba or G 2 − E 2 G 2 . Define where k 1 denotes the number of derivatives that hit (G 1 ) ba . The summation α indicates the summation over tuples α ki i , with i = 1, 2 and k 2 := k − k 1 . We now claim that Similarly to the proof of [13, Eq. (113)] we readily conclude that the terms in Φ k in (4.13) with k = 2, or k 1 odd and k ≥ 4, or k ≥ 3 and k 1 even are bounded by N ξ Ψ 2 L −1/2 . For k = 3 and k 1 = 3, analogously to (4.8)-(4.9) we obtain a contribution of to (4.14). For k = 3 and k 1 = 1 we start computing the action of the α 1 -derivative on (G 1 ) ba : with very high probability. Additionally, we have that (see [13,Lemma 4.2] for the analogous proof for Wigner matrices) with very high probability. We thus conclude that the (k, k 1 ) = (3, 1) contribution to (4.14) is where we used that only the terms with κ 4 = κ (i) (ab, ba, ab, ba) contribute. This concludes the proof of (3.8) for p = 2.

4.3.
Asymptotic Wick Theorem. The proof of the Wick Theorem for resolvent is completely analogous to the one for Wigner matrices in [13,Section 4]. The only differences are that along the proof we have to carefully keep track of the s i , as we did in Section 4.2, since in the Wigner case s 1 = · · · = s p = (1, 0), and that we have to use the three G's local law in (3.11) with a weaker error term instead of the one in [13,Eq. (45)] to compute the leading order deterministic term (see (4.21)-(4.22) below). Define with S ⊂ N. Similarly to Section 4.2 we start computing Then proceeding analogously to (4.13)-(4.18) (see also [13,Eqs. (110)-(114)] for the Wigner case) we conclude that In order to compute the leading deterministic term of G 1 G 2 i we use the local law (3.11) and get (4.22) Finally, proceeding iteratively we conclude (3.8).

4.4.
Multi resolvents local laws. The goal of this section is to prove the local laws in (3.11).
For the second local law in (3.11) we start writing the equation for G 1 G 2 2 : (4.27) Then, using the usual single G local law and the two G's local law from (3.11), we conclude that (4.28) Then, using that with very high probability, and (4.26) we conclude (3.11). The proof of (4.29) follows analogously to the one of (4.25).

Stationary phase calculations
The proof of (2.17) is a tedious stationary phase calculation since v sr ± (t), the leading part of v sr ±,κ (t) (see (2.12)), are given in terms of oscillatory integrals for t 1 being the large parameter. Unlike in the s = r case, no explicit formula similar to (2.21) is available. The main complication is that V sr (x, y) defined in (2.7) has logarithmic singularities, integrated against a fast oscillatory term from f g , so standard stationary phase formulas cannot directly be applied. Nevertheless, a certain number of integration by parts can still be performed before the derivative of the integrand stops being integrable and the leading term can be computed.
We will first give a proof of then we explain how to modify this argument to obtain in both cases with a definite large t asymptotics with computable explicit constants. The proof reveals that the corresponding results for E s E r v sr + (t) and E s E r v sr + (t) 2 guarantee only an upper bound with the same behavior depending on the distribution of s on S 1 , the matching lower bound may not necessarily hold. However, for our main conclusions like (2.19) only an upper bound on S res (t) is important. All these exponents are valid for the k = 2 case, i.e. for H s = s 1 H 1 + s 2 H 2 . For the general multivariate model, k ≥ 3, exactly the same proof gives the upper bounds The k-dependence of the exponent can directly be related to the tail behavior (5.5) and (5.8) below, so for simplicity we will carry out our main analysis only for k = 2. In fact, a more careful analysis yields somewhat better bounds than (5.4), but we will not pursue this improvement here. We introduce a new random variable U := s, r then clearly |U | ≤ 1 and since r, s ∈ S k have a distribution with an L 2 density, it is easy to see that the density ρ * of U is bounded by The fact that the main contribution to the lhs. of (5.4) comes from the regime U ≈ 1 is a consequence of the singularity of the logarithm in (2.7) in this regime (see computations below).
Indeed, U = cos α where α is the angle between r, s and near U ≈ ±1 we have 1 ± U ≈ 1 2 α 2 (1 + O(α 2 )). For example, for k = 2 we have in the 1 regime. In particular, the bound in (5.5) is actually an asymptotics in the most critical U ≈ 1 regime, while the regime U ≈ −1 it may happen that the density ρ * is much smaller than (5.5) predicts. For symmetric distribution, ρ(s) = ρ(s + π), the two asymptotics are the same. Similar relations hold for k ≥ 3, in which case we have with an explicit asymptotics for U ≈ 1. So we will study Since |m| ≤ 1, as long as |U | ≤ 1 − δ for any small fixed δ > 0, the arguments of the logarithms are separated away from zero and they allow to perform arbitrary number of integration by parts, each gaining a factor of 1/t. There is a square root singularity of m(x) and m(y) at the spectral edges 2, −2 which still allows one to perform one integration by parts in each variable since m is still integrable. Therefore the contribution of the regime |U | ≤ 1 − δ to (5.9) is of order t 2 (1/t) 2 = O(1), hence negligible compared with the target (5.1). In the sequel we thus focus on the important U ≈ ±1 regimes, in particular every dU integral is understood to be restricted to |U | ≥ 1 − δ.
Note that m(y) = −m(−y), so if U has a symmetric distribution (for example if s ∈ S 1 has a symmetric distribution), then by symmetry we have For definiteness, we focus on R − (t), the analysis of R + is analogous. From the explicit form This shows that the critical regime is U ≈ 1 and x ≈ y for the first integrand in (5.9) and U ≈ −1, x ≈ −y for the second. Again, for definiteness, we focus on the first regime, i.e. on the first log-integrand in (5.9) and establish the following relations for large t and k = 2: Lemma 5.1. In the k = 2 case we have and for t ≥ 1. For t 1 an analogous asymptotic statement holds with explicitly computable positive constants that depend on the distribution of s.
Proof of Lemma 5.1. Introduce the variables Since |x|, |y| ≤ 2 we have |a| ≤ 2, |b| ≤ min{|2 − a|, |2 + a|}. (5.13) In terms of these variables, we have (5.14) Here we also used the identity following from the equation −m(x) −1 = x + m(x) and similarly for m(y). In the regime (5.13) we have Note that by Taylor expansion around a and concavity of the function , as well as We define the function for |U | ≤ 1, and a, b as in (5.13). We will use F to approximate in the critical regime where |U | ≥ 1 − δ and |b| ≤ δ for some small fixed δ > 0. We clearly have in the regime (5.13), where |b| ≤ √ 4 − a 2 ≤ 2d, using (5.16). For the difference function an elementary calculation from (5.14)-(5.16) gives in the regime |U | ≥ 1 − δ and |b| ≤ δ. Furthermore, similar estimates hold for the first derivative; as well as for the second derivatives The proof of Lemma 5.1 consists of two parts. First we compute the integral with log F , i.e. we show that with an explicit positive constant factor in the asymptotic regime t 1. Second, we show that the integrand in (5.11) can indeed be replaced with F up to a negligible error, Part I. To prove (5.24), we use the a, b variables and the symmetry of F in a to restrict the a integration to 0 ≤ a ≤ 2: db e 2itb log F U, a, b). In the boundary terms we can perform one more integration by parts in the a variable when plugged into (5.26). Just focusing on the first boundary term in (5.27), using |U | ≤ 1 we have Since ρ * (U ) is a density bounded by (1 − U 2 ) −1/2 in the U ≈ 1 regime from (5.5), the logarithmic singularity is integrable showing that the two boundary terms in (5.27), when plugged into (5.26), give at most an O(1) contribution, negligible compared with the target behavior of order √ t in (5.1). To compute the main (second) term in the rhs. of (5.27), we first extend the integration limits to infinity and claim that gives a negligible contribution to (5.26) (the lower limit is removed similarly). Indeed, we apply one more integration by parts inside the absolute value in (5.28): .
Its contribution to the rhs of (5.28) is thus bounded by Summarizing, we just proved that where in the second line we used residue calculation, in the third line we used that in the regime U ≈ 1 with some positive constant c 0 > 0 depending on the distribution of s (see (5.6)), and finally in the fourth line we used that for large t the main contribution to the integral comes from U ≈ 1 in order to simplify the integrand. This completes the proof of (5.24).
Part II. We now prove (5.25). After changing to the a, b variables and considering only the 0 ≤ a ≤ 2 regime for definiteness, we perform an integration by parts in b that gives  recalling the definition of M from (5.18). The first term in (5.30) is the boundary term, which is negligible after one more integration by parts using the ∂ a derivative estimate from (5.22).
In the second term we perform one more integration by parts to obtain where the first term comes from the boundary. In this term we can perform one more integration by parts in a. The corresponding boundary terms are easily seen to be order one and the main term is analogous to the first term in the rhs of (5.31) just we have the mixed ∂ a ∂ b derivative. Recalling ∆ = M − F from (5.20), we use the estimate in the situation where M F > 0 are positive functions (see (5.19)). Similar bound holds for the mixed derivative. Therefore, we can estimate both integrals in (5.31) as follows: Here we used the bounds (5.21), (5.22) and (5.23) and that |b| ≤ 2 − a 4 − a 2 to simplify some estimates. For computing the derivatives of F we used its explicit form (5.17). This completes the proof of (5.25) and thus also the proof of (5.11) in Lemma 5.1. The proof of (5.12) is very similar. We again approximate M = |1 − U m(x)m(y)| 2 by F at the expense of negligible errors. We omit these calculations as they are very similar those for (5.11) and focus only on the main term which is (see the analogous (5.26)) 16t 4 dU ρ * (U ) (5.33) After one integration by parts and neglecting the lower order boundary terms, we have the following analogue of (5.29): as the leading term. This proves (5.12) and completes the proof of Lemma 5.1.
We close this section by commenting on the proof of the upper bound in (2.28). Recall from (2.16) that the essential part of S res (t) in the slope regime is given by E s E r v sr (t) expressed by the oscillatory integrals R ± (t) := t 2 R 2 ρ(s)ρ(r) ds dr where U = s,r s r is the cosine of the angle between the vectors s, r ∈ R 2 . Assuming for the moment that ρ, the density of s, is rotationally symmetric, ρ(s) = ρ( s ) with a slight abuse of notations, performing an integration by parts in x and ignoring lower order boundary term. In the last step we also computed the Fourier transform (we used that ρ(0) = 0 to extend ρ to R). The main contribution comes from the regime where A is nearly singular, and considering (5.10), we just focus on the regime U ∼ 1 and x ∼ y, the singularity from the other logarithmic term is treated analogously. Similarly to the proof of (5.25) we may ignore the edge regime, and effectively we have Here we used the regularity of ρ, so that the last two factors essentially restrict the integration to the regime |x|, |y| 1/t. The final inequality is obtained just by scaling.
To understand S res (t) in the ramp regime, we need to compute E s E r v sr ± (t) 2 , i.e. integrals of the following type:  The last two factors essentially restrict the integration to the regime |x − x | 1/t, |y − y | 1/t and by scaling we obtain a bound of order t 1/2 for |(5.38)|. This completes the sketch of the proof of (2.28) in the radially symmetric case, the general case is analogous but technically more cumbersome and we omit the details. Data availability statement. All data generated or analysed are included in this published article.