Onset of the wave turbulence description of the longtime behavior of the nonlinear Schrödinger equation

Consider the cubic nonlinear Schrödinger equation set on a d-dimensional torus, with data whose Fourier coefficients have phases which are uniformly distributed and independent. We show that, on average, the evolution of the moduli of the Fourier coefficients is governed by the so-called wave kinetic equation, predicted in wave turbulence theory, on a nontrivial timescale.


The Kinetic equation
The central theme in the theory of non-equilibrium statistical physics of interacting particles is the derivation of a kinetic equation that describes the distribution of particles in phase space. The main example here is Boltzmann's kinetic theory: rather than following the individual trajectories of N point particles obeying N-body Newtonian dynamics, Boltzmann derived a kinetic equation describing the effective dynamics of the distribution function in a certain large-particle limit (the so-called Boltzmann-Grad limit).
A parallel kinetic theory for waves, which are as fundamental as particles, was proposed by physicists in the past century. As in Boltzmann's theory, the aim is to understand the effective behavior and energy dynamics of systems in which many waves interact nonlinearly according to time-reversible dispersive or wave equations. The theory predicts that the macroscopic behavior of such nonlinear wave systems is described by a wave kinetic equation giving the average distribution of energy among the available wave numbers (frequencies). Of course, the form of this kinetic equation depends directly on the particular dispersive system (PDE) governing the reversible microscopic dynamics.
The aim of this work is to start the rigorous investigation of such a passage from a reversible nonlinear dispersive PDE to an irreversible kinetic equation describing its effective dynamics. For this, we consider the cubic nonlinear Schrödinger equation (NLS) on a generic torus $\mathbb{T}^d_L$ of size L (with periodic boundary conditions), with a parameter λ > 0 quantifying the importance of nonlinear effects (or, equivalently via scaling, the size of the initial datum). The spatial dimension is d ≥ 3. Here, and throughout the paper, the genericity of the torus is encoded in the Laplacian $\Delta_\beta$ weighted by $\beta := (\beta_1, \dots, \beta_d) \in [1,2]^d$, and we denote by $\mathbb{Z}^d_L := \frac{1}{L}\mathbb{Z}^d$ the dual space to $\mathbb{T}^d_L$. Typically in this theory, the initial data are randomly distributed in an appropriate fashion. We consider random initial data of the form (1.1), whose Fourier coefficient at $k \in \mathbb{Z}^d_L$ is determined by $\phi(k)$ and carries a random phase $\vartheta_k(\omega)$, for some nice (smooth and localized) deterministic function $\phi : \mathbb{R}^d \to [0, \infty)$. The phases $\vartheta_k(\omega)$ are independent random variables, uniformly distributed on [0, 1]. Notice that the normalization of the Fourier transform is chosen so that $\|u_0\|_{L^2} \sim 1$.
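The normalization claim can be checked in a toy setting. The sketch below is only an illustration under stated assumptions: it takes d = 1, a Gaussian profile φ(k) = e^{−k²}, Fourier coefficients of the assumed form φ(k)e^{2πiϑ_k}, and the inversion formula u_0(x) = L^{−1}Σ_k û_0(k)e^{2πikx}; none of these choices is claimed to be the exact form of (1.1), and the helper names are hypothetical. It shows that the L² norm does not depend on the phases and is a Riemann sum for ∫φ².

```python
import numpy as np

# Toy check (d = 1) of the L^2 normalization of random initial data.
# Assumptions (illustrative only): hat u0(k) = phi(k) * exp(2*pi*i*theta_k) on Z_L = (1/L)Z,
# theta_k i.i.d. uniform on [0, 1), and u0(x) = (1/L) * sum_k hat u0(k) * exp(2*pi*i*k*x).

def random_initial_data(L, phi, kmax, rng):
    """Return integer labels n (with k = n / L) and random Fourier coefficients for |k| <= kmax."""
    m = int(round(kmax * L))
    n = np.arange(-m, m + 1)
    theta = rng.random(n.size)                      # independent uniform phases
    return n, phi(n / L) * np.exp(2j * np.pi * theta)

def l2_norm_sq_parseval(L, coeffs):
    """Parseval on T_L with this normalization: ||u0||^2 = (1/L) * sum_k |hat u0(k)|^2."""
    return np.sum(np.abs(coeffs) ** 2) / L

def l2_norm_sq_physical(L, n, coeffs, num_points):
    """Riemann sum of |u0|^2 over [0, L); exact up to rounding when num_points > 2 * max|n|."""
    x = np.arange(num_points) * L / num_points
    u = np.exp(2j * np.pi * np.outer(x, n) / L) @ coeffs / L
    return np.sum(np.abs(u) ** 2) * L / num_points

L = 50
phi = lambda k: np.exp(-k ** 2)
n, c = random_initial_data(L, phi, kmax=5.0, rng=np.random.default_rng(0))
print(l2_norm_sq_parseval(L, c))            # Riemann sum for int phi^2 = sqrt(pi/2)
print(l2_norm_sq_physical(L, n, c, 1024))   # same value, computed in physical space
print(np.sqrt(np.pi / 2))
```

Changing the seed changes the phases but not the printed norms, which is the sense in which ‖u_0‖_{L²} ∼ 1 holds deterministically for this kind of randomization.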
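Filtering by the linear group and expanding in Fourier series, we write the solution in terms of its interaction representation $(a_k(t))_{k \in \mathbb{Z}^d_L}$; see (1.2). The main conjecture of wave turbulence theory is that, as $L \to \infty$ (large box limit) and $\lambda^2 L^{-d} \to 0$ (weakly nonlinear limit), the quantity $\rho^L_k(t) := \mathbb{E}|a_k(t)|^2$ converges to a solution of a kinetic equation. More precisely, it is conjectured that, as $L \to \infty$, $t \to \infty$ and $\lambda^2 L^{-d} \to 0$, one has $\rho^L_k(t) \sim \rho(t, k)$, where $\rho : \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}_+$ satisfies the wave kinetic equation (WKE). In (WKE) we use the convention $k_0 = k$, the notation Σ for the linear constraint $k_1 - k_2 + k_3 - k$ and Ω for the quadratic resonance function, and $\delta(\Sigma)\delta(\Omega)$ is to be understood in the sense of distributions: $\delta(\Sigma)$ is just the convolution integral over $k_1 - k_2 + k_3 = k$, whereas
$$\delta(\Omega = 0) := \lim_{\epsilon \to 0} \int \frac{1}{\epsilon}\,\varphi\Big(\frac{\Omega}{\epsilon}\Big)\, dk_1\, dk_2\, dk_3$$
for some $\varphi \in C^\infty_c(\mathbb{R})$ with $\int \varphi = 1$. Note that this measure is absolutely continuous with respect to the surface measure, through the formula $\delta(\Omega) = \frac{1}{|\nabla \Omega|}\, d\mu_\Omega$, with $d\mu_\Omega$ the surface measure on $\{\Omega = 0\}$.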

Background
In the physics literature, the wave kinetic Eq. (WKE) was first derived by Peierls [33] in his investigations of solid state physics; it was discovered again by Hasselmann [23,24] in his work on the energy spectrum of water waves. The subject was revived and systematically investigated by Zakharov and his collaborators [38], particularly after the discovery of special power-type stationary solutions for the kinetic equation that serve as analogs of the Kolmogorov spectra of hydrodynamic turbulence. These so-called Kolmogorov-Zakharov spectra predict steady states of the corresponding microscopic system (possibly with forcing and dissipation at well-separated extreme scales), where the energy cascades at a constant flux through the (intermediate) frequency scales. Nowadays, wave turbulence is a vibrant area of research in nonlinear wave theory with important practical applications in several areas including oceanography and plasma physics, to mention a few. We refer to [31,32] for recent reviews.
The analysis of (WKE) is full of very interesting questions (see [16,22,34] for recent developments), but we will focus here on the problem of its rigorous derivation. Several partial or heuristic derivations have been put forward for (WKE), or for the closely related quantum Boltzmann equations [1,2,3,10,13,17,28,30,36]. However, to the best of our knowledge, there is no rigorous mathematical statement of the derivation of (WKE) from random data. The closest attempt in this direction is due to Lukkarinen and Spohn [29], who studied the large box limit for the discrete nonlinear Schrödinger equation at statistical equilibrium (corresponding to a stationary solution of (WKE)).
In preparation for such a study, one can first try to understand the large box and weakly nonlinear limit of (NLS) without assuming any randomness in the data. When (NLS) is set on a rational torus, it is possible to extract a governing equation by retaining only exact resonances [6,18,20,21]. The limiting equation is then Hamiltonian and dictates the behavior of the microscopic system ((NLS) on $\mathbb{T}^d_L$) on timescales $L^2/\lambda^2$ (up to a logarithmic loss for d = 2), for sufficiently small λ. Such a result is not possible if the equation is set on a generic torus, since most of the exact resonances are destroyed there.
Finally, we point out that there are very few instances where the derivation of kinetic equations has been done rigorously. The fundamental result of Lanford [27], later clarified in [19], deals with N-body Newtonian dynamics, from which the Boltzmann equation emerges in the Boltzmann-Grad limit. This can be understood as a classical analog of the rigorous derivation of (WKE). Another such success is the case of random linear Schrödinger operators (Anderson's model) [12,14,15,35], which can be understood as a linear analog of the problem of rigorously deriving (WKE).

The difficulties of the problem
There are several difficulties in proving the validity of (WKE), which we now enumerate.
(a) The textbook derivation of the wave kinetic equation is done under the assumption that the independence of the data propagates for all time. This assumption cannot be verified for any nonlinear model. A way around this difficulty is to Taylor expand the profile $a_k$ in terms of the initial data. Such an expansion can be represented by Feynman trees, and allows us to use the statistical independence of the data in computing the expected value of $|a_k|^2$. Moreover, one needs to control the errors in such an expansion to derive the kinetic equation (WKE). These calculations are presented in Sects. 4 and 5.
(b) The wave kinetic equation induces an O(1) change on its initial configuration at a timescale of order τ. Thus we need to establish that, for solutions of (NLS), the expansion mentioned above converges up to time τ. This requires a local existence result on a timescale several orders of magnitude longer than what is known. This shortcoming is a main reason why our argument cannot reach the kinetic timescale τ; we have to content ourselves with a derivation over timescales on which the kinetic equation only effects a relatively small change on the initial distribution, and as such coincides (up to negligible errors) with its first time iterate.
Therefore, a pressing issue is to increase the length of the time interval [0, T] over which the Taylor expansion gives a good representation of solutions to the nonlinear problem. For deterministic data, the best known results giving effective bounds in terms of L come from our previous work [6], which describes the solution up to times $\sim L^2/\lambda^2$ (up to a $\log L$ loss for d = 2) for $\lambda \ll 1$. Such a timescale would be very short for our purposes. To increase T, we have to rely on the randomness of the initial data. Roughly speaking, for a random field normalized to 1 in $L^2(\mathbb{T}^d_L)$, the $L^\infty$ norm can heuristically be bounded on average by $L^{-d/2}$. Therefore, regarding the nonlinearity $\lambda^2|u|^2u$ as a potential term $Vu$ with $V = \lambda^2|u|^2$ and $\|V\|_{L^\infty} \lesssim \lambda^2 L^{-d}$, one would hope to get a convergent expansion on an interval [0, T] provided that $T\lambda^2 L^{-d} \ll 1$, which amounts to $T \le \sqrt{\tau}$. This is the target in this manuscript. The heuristic presented above can be implemented by relying on Khinchine-type improvements to the Strichartz norms of a linear solution $e^{it\Delta_\beta}u_0$ with random initial data $u_0$. Similar improvements have been used to lower the regularity threshold for well-posedness of nonlinear dispersive PDE; here, the aim is instead to prolong the existence time and improve the Taylor approximation. The randomness gives us better control on the size of the linear solution over the interval [0, T], while an improved deterministic Strichartz estimate for $\|e^{it\Delta_\beta}\psi\|_{L^p([0,T]\times\mathbb{T}^d)}$ with $\psi \in L^2(\mathbb{T}^d)$ allows us to maintain the random improvement for the nonlinear problem. The genericity of the $(\beta_i)$ is crucial (as was first observed in [11]), and allows us to go beyond the limiting $T^{1/p}$ growth that occurs on the rational torus.
Unfortunately, the available estimates here (including those in [11]) are not optimal for some ranges of the parameters λ and L, which is why, in d = 3, our result in Theorem 1.1 below falls short of the timescale $\sqrt{\tau} \sim L^3/\lambda^2$.
(c) To derive the kinetic equation in the large box limit using the expansion for $\rho^L_k(t) = \mathbb{E}|a_k(t)|^2$, one has to prove equidistribution theorems for the quasi-resonances over a very fine scale, namely $T^{-1}$. Since T can be much larger than $L^2$, such scales are much finer than any equidistribution scale on the rational torus. Again, the genericity of the $(\beta_i)$ is crucial here. For this we use and extend a recent result of Bourgain on pair correlation for irrational quadratic forms [5].
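The heuristic $\|u\|_{L^\infty} \approx L^{-d/2}$ for random phases can be seen numerically in a one-dimensional toy model. The sketch below assumes the same illustrative normalization as a Gaussian profile with coefficients φ(k)e^{2πiϑ_k} on $\frac1L\mathbb{Z}$ (an assumption, not the paper's exact setup) and compares random phases with a "focused" field in which all phases vanish. The function name is hypothetical.

```python
import numpy as np

# Toy illustration (d = 1): sup norm of an L^2-normalized field with random phases versus
# a focused field with all phases equal. Assumed normalization:
# u(x) = (1/L) * sum_k phi(k) e^{2 pi i theta_k} e^{2 pi i k x}, k in (1/L)Z, |k| <= kmax.

def normalized_sup(L, phases, phi, kmax, num_points):
    """Return max_x |u(x)| / ||u||_{L^2(T_L)}, evaluated on a uniform grid via the FFT."""
    m = int(round(kmax * L))
    n = np.arange(-m, m + 1)
    if num_points <= 2 * m:
        raise ValueError("grid too coarse: distinct modes would alias")
    c = phi(n / L) * np.exp(2j * np.pi * phases)
    a = np.zeros(num_points, dtype=complex)
    a[n % num_points] = c                       # mode n sits at FFT index n mod num_points
    u = (num_points / L) * np.fft.ifft(a)       # u(x_j) at x_j = j * L / num_points
    l2 = np.sqrt(np.sum(np.abs(c) ** 2) / L)    # Parseval with the same normalization
    return np.max(np.abs(u)) / l2

L, kmax, num_points = 400, 3.0, 8192
phi = lambda k: np.exp(-k ** 2)
modes = 2 * int(round(kmax * L)) + 1
random_ratio = normalized_sup(L, np.random.default_rng(1).random(modes), phi, kmax, num_points)
focused_ratio = normalized_sup(L, np.zeros(modes), phi, kmax, num_points)
print(np.sqrt(L) * random_ratio)    # O(1), up to a logarithmic factor
print(np.sqrt(L) * focused_ratio)   # grows like sqrt(L)
```

For random phases, $\sqrt{L}\,|u(x)|/\|u\|_{L^2}$ has mean square exactly 1 at every point, so its maximum only picks up a logarithmic factor. In the focused case the maximum stays of order 1, independently of L.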

The main result
Precise statements of our results in arbitrary dimensions d ≥ 3 will be given in Sect. 2. Those statements depend on several parameters coming from equidistribution of lattice points and Strichartz estimates. For the purposes of this introductory section, we present a less general theorem without the explicit appearance of these parameters.

Theorem 1.1 Consider the cubic (NLS) on the three-dimensional torus $\mathbb{T}^3_L$. Assume that the initial data are chosen randomly as in (1.1). There exists δ > 0 such that the following holds for L sufficiently large and $L^{-A} \le \lambda \le L^{B}$ (for positive A and B): the approximation (1.3) holds. We note that the right-hand side of (1.3) is nothing but the first time iterate of the wave kinetic equation (WKE) with initial data φ (cf. (1.1)), which coincides (up to the error term in (1.3)) with the exact solution of (WKE) over long timescales, albeit shorter than the kinetic timescale τ.
The proof of this theorem can be split into three components.
(1) Section 4: Feynman tree representation. In this section we derive the Taylor expansion of the nonlinear solution in terms of the initial data. Roughly speaking, we write the Fourier modes $a_k(t)$ of the nonlinear solution (see (1.2)) as a finite sum of terms $J_n$ plus a remainder $R_N$, where the $J_n$ are sums of monomials of degree 2n + 1 in the initial data $a(0)$, and $R_N$ depends on the nonlinear solution $a(t)$. Each term of $J_n$ can be represented by a Feynman tree, which makes the calculation of $\mathbb{E}(J_n\overline{J_{n'}})$ more transparent; such terms appear in the expansion of $\mathbb{E}|a_k|^2$. The estimates in this section rely on essentially sharp bounds on quasi-resonant sums of the form (1.4), where $\mathbf{1}(S)$ denotes the characteristic function of a set S and Q is an irrational quadratic form. Since A will be taken large ($2^A \sim T \gg L^2$), such estimates belong to the realm of number theory and will be a consequence of the third component of this work. The bounds we obtain for such interactions are good up to times of order $\sqrt{\tau}$, which is sufficient given the restrictions on the time interval of convergence imposed by the second component below.
(2) Section 5: Construction of solutions. In this section we construct solutions on a time interval [0, T] via a contraction mapping argument. To maximize T while maintaining a contraction, we rely on the Khinchine improvement to the space-time Strichartz bounds, as well as on the long-time Strichartz estimates on generic irrational tori proved in [11]. It is here that our estimates are very far from optimal, since there is no proof of the conjectured optimal Strichartz estimates.
(3) Section 8: Equidistribution of irrational quadratic forms. The purpose of this section is twofold: first, to prove bounds on quasi-resonant sums like those in (1.4) for the largest possible T; second, to extract the exact asymptotics, with effective error bounds, of the leading part of the sum.
It is this leading part that converges to the kinetic equation collision kernel as L → ∞.
Here we remark that, if Q is a rational form, then the largest A for which one could hope for an estimate like (1.4) is given by $2^A \sim L^2$, which reflects the fact that a rational quadratic form cannot be equidistributed at scales smaller than $L^{-2}$ (at the level of NLS, this would yield the time restriction $T \lesssim L^2$ for the rational torus). However, generic irrational quadratic forms Q are equidistributed at much finer scales than $L^{-2}$. Here, we adapt a recent work of Bourgain [5], which allows us to reach equidistribution scales essentially down to $L^{-d}$.
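The contrast between rational and generic forms can be made concrete with a toy computation of the finest spacing between values of a diagonal form $Q_\beta(n) = \sum_i \beta_i n_i^2$ on a box of integer points. For $k = n/L$ one has $Q_\beta(k) = Q_\beta(n)/L^2$, so spacings are measured in units of $L^{-2}$. The three-variable form, the box and the choice β = (1, √2, √3) are illustrative assumptions; this is not the resonance function used in the paper.

```python
import numpy as np
from itertools import product

# Finest spacing between distinct values of Q_beta(n) = sum_i beta_i n_i^2 on {0, ..., N}^d,
# in units of L^{-2} (since Q_beta(n / L) = Q_beta(n) / L^2). Illustrative toy comparison only.

def min_spacing(beta, N):
    """Smallest gap between distinct values of Q_beta on the box {0, ..., N}^d."""
    values = np.array([sum(b * m * m for b, m in zip(beta, n))
                       for n in product(range(N + 1), repeat=len(beta))])
    distinct = np.unique(values)          # sorted; merges exact repeats (rational case)
    return np.min(np.diff(distinct))

rational = min_spacing((1, 1, 1), 12)                          # integer values
generic = min_spacing((1.0, np.sqrt(2.0), np.sqrt(3.0)), 12)   # beta in [1, 2]^3
print(rational)   # 1: nothing below the scale L^{-2}
print(generic)    # much smaller than 1
```

In the rational case all values are integers, so no two distinct values of $Q(k)$ are closer than $L^{-2}$. In the generic case, near-coincidences such as $(9 + 9\sqrt2) - (16 + 4\sqrt2) = 5\sqrt2 - 7 \approx 0.07$ already occur on this small box.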

Notations
In addition to the notation $\mathbb{T}^d_L = [0, L]^d$ and $\mathbb{Z}^d_L = \frac{1}{L}\mathbb{Z}^d$ introduced earlier, we use standard notations. A function f on $\mathbb{T}^d_L$ and its Fourier transform $\hat f$ on $\mathbb{Z}^d_L$ are related by the Fourier inversion formula, and Parseval's theorem takes the corresponding form. We use weighted $\ell^p$ spaces, with p ≥ 1 and s ∈ ℝ, from which the Sobolev spaces $H^s(\mathbb{T}^d)$ are defined naturally. For functions defined on $\mathbb{R}^d$, we fix a normalization of the Fourier transform. We denote by C any constant whose value does not depend on λ or L. The notation $A \lesssim B$ means that there exists a constant C such that $A \le CB$. Finally, we use the notation $u = O_X(B)$ to mean $\|u\|_X \lesssim B$.
We would like to thank Peter Sarnak for pointing us to unpublished work by Bourgain [5]. This reference helped us improve an earlier version of our work. We also would like to thank Peter and Simon Myerson for many helpful and illuminating discussions.

The general result
We start by writing the equations (2.1) satisfied by the interaction representation $(a_k(t))_{k \in \mathbb{Z}^d_L}$ given in (1.2). Here Ω(k, k_1, k_2, k_3) is the resonance function recalled above, and the $\vartheta_k(\omega)$ are i.i.d. random variables uniformly distributed in [0, 2π]. Our results depend on two parameters, the equidistribution parameter ν and a Strichartz parameter $\theta_d$, which we now explain.

The Equidistribution parameter ν
The interaction frequency Ω(k, k_1, k_2, k_3) above is an irrational quadratic form. Such quadratic forms can be equidistributed at scales much smaller than the finest scale $\sim L^{-2}$ of rational forms. We denote by ν the largest real number such that, for all $k \in \mathbb{Z}^d_L$ with $|k| \le 1$ and all ε > 0, there exists δ > 0 such that, for |a|, |b|

Proposition 2.1
With the above definition for ν, we have:
Proof. The first assertion is classical; see, e.g., [6]. The second assertion is proved in Sect. 8.

The Strichartz parameter θ d
Our proof relies on long-time Strichartz estimates, which are used to maintain linear bounds for the nonlinear problem. The genericity of the β's gives crucial improvements over the rational case. The improved estimates for generic β's were proved in [11]. The $N^\gamma$ term can be thought of as the time it takes for a focused wave with localized wave number ≤ N to refocus. For the rational torus, γ = 0.
Here we only need the $L^4_{t,x}([0, T] \times \mathbb{T}^d_L)$ norm, and therefore we introduce a parameter $\theta_d$ to record how the constant in the $L^4_{t,x}([0, T] \times \mathbb{T}^d_L)$ estimates depends on L. By scaling, the result in [11] translates into:

The approximation theorem
With these parameters defined, we state the approximation theorem for the cubic NLS in dimension d ≥ 3 and generic β's.
Theorem 2.2 For every $\epsilon_0 > 0$ sufficiently small and $L > L_*(\epsilon_0)$ sufficiently large, the following holds. There exists a set $E_{\epsilon_0,L}$ of probability $\mathbb{P}(E_{\epsilon_0,L}) \ge 1 - e^{-L^{\epsilon_0}}$ such that, if $\omega \in E_{\epsilon_0,L}$, then for any $L > L_*(\epsilon_0)$ the solution $a_k(t)$ of (NLS) exists on the time interval considered, and the approximation estimate of the theorem holds. For d = 3, 4, the solutions exist globally in time [4,26], and one has the same estimate without multiplying by $\mathbf{1}_{E_{\epsilon_0,L}}$ inside the expectation.
Here we note that the error could be controlled in a much stronger norm than $\ell^\infty$, and that other randomizations of the data are possible (complex Gaussians, for instance) without any significant change in the proof.

Formal derivation of the kinetic equation
In this section, we present the formal derivation of the kinetic equation, whose basic steps we shall follow in the proof. The starting point is Eq. (2.1) integrated in time, namely (3.1). The derivation of the kinetic equation proceeds as follows.
Step 1: expanding in the data. Noting the symmetry of (3.1) in the variables $k_1$ and $k_3$, we obtain, upon integrating by parts twice and substituting (2.1) for $\dot a_k$, the expansion (3.2).
Step 2: parity pairing. We now compute $\mathbb{E}|a_k|^2$, where the expectation $\mathbb{E}$ is understood with respect to the random phases (random parameter ω). The key observation is that, since the phases are independent and uniformly distributed, the expectation of a product of initial amplitudes $a^0_{k_j}$ and their conjugates vanishes unless each factor $a^0_k$ is paired with a conjugate factor $\overline{a^0_k}$ carrying the same wave number. Computing $\mathbb{E}|a_k|^2$ with the help of this observation, we see that there are no terms of order $\lambda^2$. There are two kinds of terms of order $\lambda^4$: they are obtained either by pairing the term of order $\lambda^2$, namely (3.2b), with its conjugate, or by pairing one of the terms of order $\lambda^4$, (3.2c) or (3.2d), with the term of order 1, namely $a^0_k$. Overall, this leads to an explicit expression for $\mathbb{E}|a_k|^2$ up to degenerate cases, which occur, for instance, if $k, k_1, k_2, k_3$ are not distinct. The details of the computation are as follows.
(a) Consider first $\mathbb{E}|(3.2b)|^2 = \mathbb{E}\,(3.2b)\overline{(3.2b)}$, and denote by $k_1, k_2, k_3$ the indices in (3.2b) and by $k_1', k_2', k_3'$ the indices in $\overline{(3.2b)}$. There are two possibilities: $k_1 = k_1'$, $k_3 = k_3'$, or $k_1 = k_3'$, $k_3 = k_1'$, and in both cases $k_2 = k_2'$. Summing over both, and neglecting degenerate cases (which occur if $k, k_1, k_2, k_3$ are not distinct), we obtain the contribution of $\mathbb{E}|(3.2b)|^2$.
(b) Consider next the pairing of $a^0_k$ with (3.2c), which contributes $2\mathbb{E}\,\mathrm{Re}\,(3.2c)\overline{a^0_k}$. The possible pairings are of the type $\{k, k_2\} = \{k_4, k_6\}$, implying $k_3 = k_5$, and leading to $\Omega(k_1, k_4, k_5, \dots)$. This gives, neglecting degenerate cases, the corresponding contribution, where we used the symmetry between the variables $k_1$ and $k_3$, as well as the identity $\mathrm{Re}(e^{iy} - 1) = -2|\sin(y/2)|^2$ for $y \in \mathbb{R}$.
(c) Finally, the pairing of $a^0_k$ with (3.2d) can be discussed similarly. Summing the above expressions for $\mathbb{E}|(3.2b)|^2$, $2\mathbb{E}\,\mathrm{Re}\,\overline{a^0_k}(3.2c)$ and $2\mathbb{E}\,\mathrm{Re}\,\overline{a^0_k}(3.2d)$ gives the desired result.
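The pairing rule of Step 2 is exact and easy to encode. For $\xi_k = e^{2\pi i\vartheta_k}$ with independent uniform phases, $\mathbb{E}[\xi_k^m] = 1$ if m = 0 and 0 for any other integer m; hence the expectation of a product of $\xi$'s and conjugates is 1 if the exponents cancel for every wave number, and 0 otherwise. The sketch below (with a hypothetical helper name) applies this to the sixth-order moment appearing in $\mathbb{E}|(3.2b)|^2$, treating the initial amplitudes as pure phases for simplicity.

```python
from collections import Counter

# Exact pairing rule for xi_k = exp(2*pi*i*theta_k), theta_k i.i.d. uniform:
# E[prod_j xi_{labels[j]}^{parities[j]}] is 1 if the parities of each label sum to zero, else 0.

def phase_moment(labels, parities):
    """Parity +1 stands for xi_k and -1 for its complex conjugate."""
    net = Counter()
    for k, s in zip(labels, parities):
        net[k] += s
    return 1 if all(v == 0 for v in net.values()) else 0

# a_{k1} conj(a_{k2}) a_{k3} times the conjugate of a_{k1'} conj(a_{k2'}) a_{k3'}
SIX = (+1, -1, +1, -1, +1, -1)
print(phase_moment((1, 5, 7, 1, 5, 7), SIX))  # k1 = k1', k3 = k3', k2 = k2'
print(phase_moment((1, 5, 7, 7, 5, 1), SIX))  # k1 = k3', k3 = k1', k2 = k2'
print(phase_moment((1, 5, 7, 1, 6, 7), SIX))  # k2 != k2': the expectation vanishes
```

The first two calls realize the two non-degenerate possibilities listed in (a); any choice that leaves some wave number unpaired gives zero.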
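Step 3: the big box limit L → ∞. Assuming the condition (3.3) on Ω, we see that, as L → ∞, the Riemann sums over $\mathbb{Z}^d_L$ obtained in Step 2 converge to the corresponding integrals.
Step 4: the large time limit t → ∞. Observe that $\int_{\mathbb{R}} \frac{(\sin x)^2}{x^2}\,dx = \pi$, so that, in the sense of distributions, the time-dependent kernels of Step 2, normalized by t, concentrate on the resonant set {Ω = 0}. Therefore, as t → ∞, the expression obtained in Step 3 converges to the collision integral of (WKE).
Conclusion: relevant timescales for the problem. Overall, assuming that the above limits are justified, we find (3.4). This suggests that the actual timescale of the problem is τ and that, setting s = t/τ, the governing equation should read (3.5). In which regime is this approximation expected? Let T be the timescale over which we consider the equation.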
• In order for (3.4) to hold, the condition (3.3) has to hold, and the limits L → ∞ and T → ∞ have to be taken; one needs
• In order for the nonlinear evolution of (3.5) to effect an O(κ) change on the initial data, the two conditions above should be satisfied; in addition, T should be of the order of κτ (equivalently, s ∼ κ). Thus we find the conditions
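The large time limit of Step 4 can be checked numerically. Integrating the oscillating factor in time gives $|\int_0^t e^{isx}\,ds|^2 = \sin^2(tx/2)/(x/2)^2$, and the substitution y = tx/2 together with $\int_{\mathbb{R}} \sin^2 y / y^2\,dy = \pi$ shows that this kernel has total mass 2πt; hence it behaves like $2\pi t\,\delta(x)$ for large t. The sketch below checks both facts with a plain trapezoidal rule. The 2π normalization of Ω in the paper may differ, and the Gaussian test function and helper names are illustrative assumptions.

```python
import numpy as np

def trapezoid(y, dx):
    """Composite trapezoidal rule on a uniform grid."""
    return dx * (np.sum(y) - 0.5 * (y[0] + y[-1]))

def sinc_squared_integral(R=2000.0, dx=0.01):
    """Approximate int_R (sin x / x)^2 dx = pi by truncating to [-R, R] (tail error about 1/R)."""
    x = np.arange(-R, R + dx / 2, dx)
    return trapezoid(np.sinc(x / np.pi) ** 2, dx)   # np.sinc(z) = sin(pi z) / (pi z)

def kernel_against(f, t, X=10.0, dx=1e-4):
    """(1/t) * int f(x) sin^2(t x / 2) / (x / 2)^2 dx, which tends to 2*pi*f(0) as t grows."""
    x = np.arange(-X, X + dx / 2, dx)
    kernel = t * np.sinc(t * x / (2 * np.pi)) ** 2   # equals sin^2(t x / 2) / ((x / 2)^2 * t)
    return trapezoid(f(x) * kernel, dx)

gauss = lambda x: np.exp(-x ** 2)
print(sinc_squared_integral(), np.pi)
for t in (50.0, 200.0, 800.0):
    print(t, kernel_against(gauss, t), 2 * np.pi)   # error decays roughly like 1/t
```

For this test function the leading error is $\frac{2}{t}\int (f(x) - f(0))\,x^{-2}\,dx = -4\sqrt{\pi}/t$, which illustrates why T must be large for the δ(Ω) approximation to be accurate.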

Feynman trees: bounding the terms in the expansion
Since we are considering the problem with rapidly decaying φ, this rapid decay yields all the bounds one needs for wave numbers $|k| \ge L^{0+}$; thus we may as well consider φ to be compactly supported.

Expansion of the solution in the data
We mostly follow the notation of Lukkarinen-Spohn [29], Section 3 (see also [9]). The iterates of φ considered in the previous section can be represented through trees (at least up to lower-order error terms). To explain these trees, let us start with the equation satisfied by the amplitude $a_k$ of the wave number k, where the subscript in $P_3$ indicates that it is a monomial of degree 3, and where we suppress the k dependence for convenience. The expansion can be obtained by integrating by parts on the oscillating factor $e^{-2\pi i s\Omega}$. Thus the first integration by parts gives the cubic expansion. Using the equation for a, we see that $\dot P_3(a)$ consists of three monomials of degree 5; if we denote one of them by $P_5$, the integral term consists of three integrals of this type. Another integration by parts gives the quintic expansion, which consists of three terms of the same form. Consequently, to compute the expansion to order N we need to integrate by parts N times on the oscillating exponentials, which gives an expansion in terms of $J_n = \sum_\ell J_{n,\ell}$, where each $J_{n,\ell}$ is a monomial of degree 2n + 1 generated by the n-th integration by parts. The index ℓ is a vector whose entries keep track of the history of how the monomial $J_{n,\ell}$ was generated, and $R_{N+1}$ is the remaining time integral. Each $J_{n,\ell}$ can be represented by a tree similar to Fig. 1, which we now explain.
The trees will be constructed in reverse order of their usage. Therefore the labeling of the wave numbers will be done backwards: n − j, 0 ≤ j ≤ n.
The tree corresponding to $J_{n,\ell}$ is given as follows.
• There are n + 1 levels in the tree, the bottom level being the 0th level. Descending from the top to the bottom, each level is generated from the previous one by an integration by parts step. Thus level j represents the terms present after n − j integrations by parts.
• $k_{j,m}$ denotes the wave numbers present in level j, and therefore $1 \le m \le 2(n - j) + 1$.
• $k_{j,m}$ has a parity $\sigma_m$ due to complex conjugation: $\sigma_m = +1$ for m odd and $\sigma_m = -1$ for m even.
• For each level j, we associate a number $\ell_j$, which singles out the wave number $k_{j,\ell_j}$ that has 3 branches. This is the wave number of the a (or $\bar a$) that was differentiated by the j-th integration by parts. The index vector ℓ keeps track of the integration by parts history in the tree for $J_{n,\ell}$; its entries $\ell_j$, $1 \le j \le n$, record the position of the differentiated wave number at each level.
• The tree has a signature $\sigma = \prod_{j=1}^{n} (-1)^{\ell_j + 1}$.
• Transition rules. To go from level j to level j − 1, the wave numbers are related as in (4.2). The wave numbers at level 0 are those present in $J_{n,\ell}$.
• At each level j, the derivative of the element with wave number $k_{j,\ell_j}$ (due to the integration by parts) generates an oscillatory term with frequency $\Omega_j(k)$.
• We introduce variables $s = (s_0, \dots, s_n) \in \mathbb{R}^{n+1}$. This choice of variables can be explained as follows: repeated integration by parts generates iterated time integrals, which can be rewritten as integrals in the variables s. With this notation at hand, Fig. 1 represents $J_{3,(2,3,1)}$, and the general formula for $J_{n,\ell}$ follows. Here and throughout the manuscript we use the notation (4.4).
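The combinatorics of these trees can be reproduced in a few lines. The sketch below is a hypothetical re-implementation of the bookkeeping described in the bullets, under the reading that level j has 2(n − j) + 1 entries and that the transition from level j to level j − 1 replaces the entry in slot $\ell_j$ of parity s by three entries of parities (s, −s, s). It enumerates the admissible index vectors, computes the signature, and checks that every level has alternating parities summing to 1.

```python
from itertools import product
from math import prod

# Hypothetical bookkeeping of the trees for J_{n,ell}: level n holds a single wave number
# (parity +1); going from level j to level j - 1 replaces the entry in slot ell_j (1-based)
# of parity s by three entries of parities (s, -s, s).

def index_vectors(n):
    """All ell = (ell_1, ..., ell_n) with 1 <= ell_j <= 2(n - j) + 1, the size of level j."""
    return list(product(*(range(1, 2 * (n - j) + 2) for j in range(1, n + 1))))

def signature(ell):
    """sigma = prod_j (-1)^(ell_j + 1)."""
    return prod((-1) ** (l + 1) for l in ell)

def levels(ell):
    """Parities of the wave numbers at each level, keyed by the level index j."""
    n = len(ell)
    current = [+1]                        # level n: the single wave number k_{n,1} = k
    out = {n: current}
    for j in range(n, 0, -1):             # transition from level j to level j - 1
        slot = ell[j - 1] - 1             # ell_j as a 0-based position in level j
        s = current[slot]
        current = current[:slot] + [s, -s, s] + current[slot + 1:]
        out[j - 1] = current
    return out

print(len(index_vectors(3)))     # number of trees contributing to J_3
print(signature((2, 3, 1)))      # signature of the tree J_{3,(2,3,1)} of Fig. 1
print(levels((2, 3, 1))[0])      # parities +1, -1, ..., +1 of the 7 wave numbers at level 0
```

The number of trees for $J_n$ is $1 \cdot 3 \cdots (2n-1)$, matching the one cubic term, the three quintic terms, and so on.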

Bound on the correlation
Our aim is to prove the following proposition.
The trivial estimate does not see this factor: the above proposition essentially allows a gain of 1/t over the trivial bound. This gain of 1/t comes from cancellations in the non-degenerate interactions, as exhibited by Eq. (4.13).
Before we start the proof of Proposition 4.1, we classify a transition (4.2) as degenerate if the corresponding parallelogram of wave numbers degenerates into a line; in this case $\Omega_j(k) = 0$. When all transitions in a tree representing $J_{n,\ell}$ are degenerate, we denote the term by $D_{n,\ell}(t, k)$, and if at least one transition is non-degenerate we denote it by $J_{n,\ell}(t, k)$, so that each term falls into exactly one of these two classes.
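Why degenerate transitions carry zero frequency can be seen from a factorization of the resonance function. Under the assumed convention $\Omega = |k_1|_\beta^2 - |k_2|_\beta^2 + |k_3|_\beta^2 - |k|_\beta^2$ with $k_1 - k_2 + k_3 = k$ and $|x|_\beta^2 = \sum_i \beta_i x_i^2$ (the sign convention in the paper may differ), one has $\Omega = -2\langle k_1 - k, k_3 - k\rangle_\beta$, which vanishes as soon as the parent wave number k coincides with $k_1$ or $k_3$. The sketch below checks this identity numerically; the function names are hypothetical.

```python
import numpy as np

# Resonance function of a transition k -> (k1, k2, k3) with k1 - k2 + k3 = k, under the assumed
# convention Omega = |k1|_b^2 - |k2|_b^2 + |k3|_b^2 - |k|_b^2 with |x|_b^2 = sum_i b_i x_i^2.

def norm_b_sq(x, beta):
    return np.sum(beta * x * x, axis=-1)

def omega(k, k1, k2, beta):
    k3 = k - k1 + k2                       # enforce the linear constraint
    return (norm_b_sq(k1, beta) - norm_b_sq(k2, beta)
            + norm_b_sq(k3, beta) - norm_b_sq(k, beta))

def omega_factored(k, k1, k2, beta):
    """Equivalent form -2 <k1 - k, k3 - k>_beta; it vanishes when k1 = k or k3 = k."""
    k3 = k - k1 + k2
    return -2.0 * np.sum(beta * (k1 - k) * (k3 - k), axis=-1)

rng = np.random.default_rng(0)
L = 7.0
beta = np.array([1.0, np.sqrt(2.0), np.sqrt(3.0)])
k, k1, k2 = rng.integers(-20, 21, size=(3, 3)) / L      # three wave numbers in Z^3_L
print(omega(k, k1, k2, beta), omega_factored(k, k1, k2, beta))
print(omega(k, k1, k1, beta))   # degenerate: k2 = k1 forces k3 = k, so Omega = 0 up to rounding
```

The factorized form also shows that non-degenerate transitions can still be exactly resonant when the two difference vectors are β-orthogonal; controlling how often this happens near {Ω = 0} is the role of the equidistribution estimates.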

Cancellation of degenerate interactions
As can be seen from a simple computation in the formula for $D_{n,\ell}$, the contribution of the degenerate terms is too large to be estimated directly. Luckily, all those terms cancel out, as the lemma below shows.
Note that this cancellation between graph expectations is essentially due to the invariance of the expectation E|a k | 2 under Wick renormalization, which is a classical trick in the analysis of the nonlinear Schrödinger equation that eliminates all degenerate interactions. However, working at the level of graph expectations might be applicable in more general contexts.
Proof. First, we note that each level in the tree has total parity equal to 1, so that …. Hence, by Eq. (4.7), …, and thus we obtain …. The result will follow once we show that …, which follows by parametrizing the above sum as ….

Estimate on non-degenerate interactions
Proposition 4.1 now follows from the following lemma:

Fig. 2 Relabeling trees
Proof. We will only consider the case $G_{n',\ell'}(t, k) = J_{n',\ell'}(t, k)$, since the case $G_{n',\ell'}(t, k) = D_{n',\ell'}(t, k)$ is easier to bound. Using the identity above, we can write, for any $(e_0, \dots, e_n) \in \mathbb{R}^{n+1}$ and η > 0, the representation (4.9). Thus, by choosing η = 1/t, we obtain the corresponding representation of the correlation.
Here we employed the notation Ω j = 2πΩ j (k).
Next, since the phase factors are i.i.d. with mean zero, only specific pairings of the wave numbers contribute nonzero terms, namely pairings between terms with the same wave number and opposite parity. For this reason we introduce $\mathcal{P} = \mathcal{P}(n, n', \sigma, \sigma')$, a class of pairings of indices and parities, as illustrated in Fig. 3. Furthermore, we define the pairing of wave numbers $\Gamma_\psi$ induced by ψ, and obtain the corresponding expansion of the correlation. By Hölder's inequality, one has an elementary bound for any m ≥ 1 and $b_1, \dots, b_m$, and applying this bound to the α integral yields the estimate used below.
Let p = p(k) be the smallest integer such that $k_{p+1,\ell_p} \notin \{k_{p,\ell_p+1}, k_{p,\ell_p+1+\sigma_{p+1,\ell_p}}\}$, i.e., in the tree for $J_{n,\ell}$ the transition from level p + 1 to level p is not degenerate. Note that 0 ≤ p ≤ n − 1. Fig. 3 illustrates all the notations and pairings introduced for the product of two non-degenerate terms. We distinguish three cases depending on the values of the numbers $J_i$.
Case 1: $J_1, J_2, J_3 \ge 2n + 2$. For fixed p and ψ, we sum over all wave numbers in $I_{p,\psi}$ that yield degenerate transitions, i.e., wave numbers generated in rows 0 ≤ l ≤ p − 1. This contributes $L^{dp}$ to the bound, and leaves an integral involving $\sum^*$, where $\sum^*$ stands for the sum over $k_{p,j}$ with $1 \le j \le 2(n-p)+1$ and $j \notin \{I_1, I_2, I_3\}$, and over $k_{0,j}$ with $1 \le j \le 2n+1$ and $j \notin \{J_1, J_2, J_3\}$. The contribution of this integral is acceptable as long as the denominator is $O(\langle\alpha\rangle^{-2})$. Therefore, since the resonance moduli $\Omega_i$ are bounded, it suffices to prove the desired bound when the domain of integration is reduced to α ∈ [−R, R] for some R > 0. Furthermore, by bounding the integrand suitably, matters reduce to estimating (4.11). By the identity $k_{p+1,I_1} - k_{p,I_2} = \sigma_{p,I_1}(k_{p,I_3} - k_{p,I_1})$, this can be rewritten, and since $\sum_{j=1}^{2(n-p)+1} \sigma_{p,j} k_{p,j} = k_{n,1} = k$, we obtain (4.12), where C depends only on k and the variables $k_{p,j}$ with $j \notin \{I_1, I_2, I_3\}$.
By setting $P = k_{p,I_3}$ and $R = k_{p,I_1}$, for $t < L^\nu$ we obtain the bound (4.13), using the equidistribution result of Sect. 8. This bounds (4.11). The sum $\sum^*$ is over 2(n + n' − p − 2) variables; however, because of the pairing $\Gamma_\psi$, half of them drop out, and the remaining sum and the α integral only cost factors of log t. This gives the stated bound.
Case 2: only two of $J_1, J_2, J_3$ are ≥ 2n + 2. Suppose, for instance, that $J_2 \le 2n + 1$. Then there exists $I_4 \le 2(n - p) + 1$ such that $\psi(I_4) = J_4 \ge 2n + 2$ (such an index exists because the set $\{1, \dots, 2(n-p)+1\} \setminus \{I_1, I_2, I_3, J_2\}$ has an odd number of elements, so they cannot all be paired among themselves). One can then follow the above argument, replacing $I_2$ by $I_4$.
Case 3: two of $J_1, J_2, J_3$ are ≤ 2n + 1. Assume, for instance, that $J_1, J_3 \le 2n + 1$. Proceeding as in Case 1, it suffices to bound $t^{n} L^{dp} \left(\frac{\lambda^2}{L^{2d}}\right)^{n+n'} \sum^*$, where $\sum^*$ is the sum over $k_{p,j}$ with $j \in \{1, \dots, 2(n-p)+1\} \setminus \{I_1, I_3, J_1, J_3\}$, and over $k_{0,j}$ with $j \in \{1, \dots, 2n+1\}$. A crucial observation is that, since $\sum_{j=1}^{2(n-p)+1} \sigma_{p,j} k_{p,j} = k_{n,1} = k$, the wave numbers $k_{p,I_1}$ and $k_{p,I_3}$ do not contribute to this sum, since the pairings $k_{p,I_1} = k_{p,J_1}$ and $k_{p,I_3} = k_{p,J_3}$ cause them to cancel one another. Furthermore, 0 ≤ p ≤ n − 2 since $J_1, J_3 \le 2n + 1$, and we bound the integrand accordingly. From Eq. (4.12), we obtain an expression involving a sum over p + 2 ≤ l ≤ n, up to a term C that only depends on the variables in $\sum^*$. Applying (4.13) enables us to bound the inner sum by $L^{2d}\log t$, and the α integral by log t. Finally, the number of variables in $\sum^*$ is 2(n + n' − p − 1). Because of the pairing, only n + n' − p − 1 of them are free, and fixing $k_{n,1} = k$ brings their number down to n + n' − p − 2. Thus $\sum^*$ contributes $L^{d(n+n'-p-2)}$. Overall, we obtain the desired estimate.
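The class of pairings used in this proof can be explored by brute force in a simplified model. The sketch below treats the 2n + 1 leaves of $J_n$ and the 2n' + 1 leaves of $\overline{J_{n'}}$ as distinct symbols with alternating parities (conjugation flips the parities of the second tree), ignores all constraints between wave numbers, and enumerates the perfect matchings of +1 leaves with −1 leaves. This is an illustrative model, not the class $\mathcal{P}$ of the paper, and the helper names are hypothetical.

```python
from itertools import permutations
from math import factorial

# Simplified model of the pairings in E[J_n * conj(J_{n'})]: leaves are distinct symbols with
# alternating parities +1, -1, ..., +1; conjugation flips the parities of the second tree.

def leaves(n, tree, conjugate=False):
    sign = -1 if conjugate else 1
    return [(tree, m, sign * (1 if m % 2 == 0 else -1)) for m in range(2 * n + 1)]

def pairings(n, n_prime):
    """Yield every perfect matching of +1 leaves with -1 leaves, as a list of pairs."""
    all_leaves = leaves(n, "J") + leaves(n_prime, "Jbar", conjugate=True)
    plus = [leaf for leaf in all_leaves if leaf[2] == 1]
    minus = [leaf for leaf in all_leaves if leaf[2] == -1]
    for image in permutations(minus):
        yield list(zip(plus, image))

def is_cross(pairing):
    """True if every pair joins a leaf of J_n to a leaf of conj(J_{n'})."""
    return all(p[0] != q[0] for p, q in pairing)

for n, n_prime in [(1, 1), (2, 1), (2, 2)]:
    matchings = list(pairings(n, n_prime))
    cross = sum(is_cross(pm) for pm in matchings)
    print(n, n_prime, len(matchings), factorial(n + n_prime + 1), cross)
```

There are (n + n' + 1)! matchings in total. For n = n' = 1, exactly two of them pair every leaf across the two trees, which are the two non-degenerate possibilities of the formal computation in Sect. 3. Pairings inside a single tree correspond to the degenerate situations treated separately above.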

Deterministic local well-posedness
Local or long-time existence of smooth solutions is usually established by using Strichartz estimates to bound solutions. The known Strichartz estimates for our problem (2.2) are not sufficient to prove existence of solutions over a time interval long enough for the wave kinetic equation (WKE) to emerge. However, if the data are assumed to be random, one has improved estimates due to Khinchin's inequality [7]. Based on this, we first present a local well-posedness theorem under the assumption that the data satisfy a certain estimate. In Sect. 6, we show that such an improved estimate holds with high probability. Moreover, to use the results of Sects. 4 and 8, we restrict the discussion to the case $T < L^{d-\epsilon_0}$.

Strichartz estimate
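Recall Eq. (2.2), which can be rewritten in the form used below. Moreover, we denote by $\mathbf{1}_{B_j}$ the characteristic function of the unit cube centered at $j \in \mathbb{Z}^d$ and define the corresponding localized pieces $\psi_{B_j}$ of ψ; in particular, $\psi_{B_0} = P_1\psi$.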
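Then, using Galilean invariance, one obtains the same estimate for each localized piece. Converting this estimate to its dual and applying the Christ-Kiselev lemma, one gets the Strichartz estimate (5.1), for an appropriate choice of the constant $C_{d,\epsilon}$ used in the definition of $S_{d,\epsilon}$.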

A priori bound in Z s T and energy
Let $Z^s_T$ denote the function space defined by the norm …. Using Eq. (5.3), we have, for every $\epsilon_0 > 0$, …, and therefore $\int_0^t \dots$, proving Eq. (5.5).

Lemma 5.2 (A priori energy estimates)
Proof. The bound follows by duality, combined with the Strichartz estimate (5.1).

Existence theorem
Local well-posedness for (NLS) will be established in the space $Z^s_T$, with data f satisfying a bound of size I. This seemingly strange normalization is actually well adapted to the problem we are considering. Indeed, consider for simplicity initial data f supported on Fourier frequencies ≲ 1, whose $L^2$ norm is of size $L^{\epsilon_0}$, and with random Fourier coefficients of uncorrelated phases. Then we expect $e^{it\Delta_\beta}f$ to be evenly spread over $\mathbb{T}^d_L$. By conservation of the $L^2$ norm, this corresponds to $\|e^{it\Delta_\beta}f\|_{Z^s_T} \sim I$. The solution $u \in Z^s_T$ satisfies $\|u\|_{Z^s_T} \le 2I$. Moreover, (5.9) holds.

Remark 5.4
The time scale T over which the solution can be constructed would be equal to √ τ , up to subpolynomial losses in L, if the long-time Strichartz estimate conjectured in [11] for p = 4 could be established. Since it is currently not known to be true, the result stated above gives a shorter time scale, with a more complicated numerology.
Proof. This theorem is proved by a contraction mapping argument, which yields a fixed point of the map Φ. To check that Φ is a contraction on $B_{Z^s_T}(0, 2I)$, first note that, by Eq. (5.5), Φ maps $B_{Z^s_T}(0, 2I)$ into itself. Again by Eq. (5.5), Φ is Lipschitz on this ball with constant less than 1. Therefore Φ is a contraction on $\{u \in Z^s_T : \|u\|_{Z^s_T} \le 2I\}$, and the $H^s$ estimate follows from the a priori energy bound.
Besides the established bounds on u, we need to investigate the rate of convergence of the iterates $\Phi^N(0) \to u$.
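The convergence of the iterates $\Phi^N(0)$ can be illustrated on a toy problem. The sketch below replaces (NLS) by the scalar cubic ODE $i\dot a = |a|^2 a$, whose exact solution is $a(t) = a_0 e^{-i|a_0|^2 t}$, and takes Φ to be its discretized Duhamel map $\Phi(a)(t) = a_0 - i\int_0^t |a|^2 a\,ds$. This is a hypothetical toy analogue, not the map of Theorem 5.3; it only shows how each iterate adds one more order of the expansion and how the error decays with N.

```python
import numpy as np

# Toy analogue of the Picard iterates Phi^N(0) for the scalar cubic ODE i a' = |a|^2 a,
# with exact solution a(t) = a0 * exp(-i |a0|^2 t). Phi is the (hypothetical) discretized
# Duhamel map Phi(a)(t) = a0 - i * int_0^t |a|^2 a ds, computed with the trapezoidal rule.

def duhamel_map(a, a0, t):
    f = np.abs(a) ** 2 * a
    dt = t[1] - t[0]
    increments = 0.5 * (f[1:] + f[:-1]) * dt
    integral = np.concatenate(([0.0], np.cumsum(increments)))
    return a0 - 1j * integral

def picard_errors(a0=1.0, T=0.5, num_points=2001, iterations=12):
    """Sup-norm error of Phi^N(0), N = 1, ..., iterations, against the exact solution."""
    t = np.linspace(0.0, T, num_points)
    exact = a0 * np.exp(-1j * abs(a0) ** 2 * t)
    a = np.zeros(num_points, dtype=complex)     # Phi^0(0) = 0
    errors = []
    for _ in range(iterations):
        a = duhamel_map(a, a0, t)
        errors.append(float(np.max(np.abs(a - exact))))
    return errors

for N, err in enumerate(picard_errors(iterations=8), start=1):
    print(N, err)   # decays rapidly (factorially) in N until the quadrature error is reached
```

On a short interval the error of $\Phi^N(0)$ decays factorially in N. This is the behavior that the corollaries below quantify for (NLS) in $Z^s_T$, with the short time replaced by the smallness coming from the parameter restrictions.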

Corollary 5.5 Under the conditions of Theorem 5.3, there holds
Moreover, the energy estimate (5.6) gives

Consequently, by writing
Next, we establish an energy bound for the Feynman trees.

Corollary 5.6 Under the conditions of Theorem 5.3,
Since $U_n$ is the linear propagator of $J_n$ in physical space, it can be represented by the following iterative procedure. Set $v^m_0 = e^{2\pi i t\Delta_\beta} u_0$ for $0 \le m \le 2n+1$, and define the higher iterates recursively. Hence $U_n = v^1_n$. Using the energy estimate (5.6), we bound $v^1_n$. We can then descend the tree by estimating $v^{n-j}_{n-j}$ with the $Z^s$ estimate (5.5), which leads to the stated bound.

Improved integrability through randomization
Recall the form of the initial data, where the $\vartheta_k(\omega)$ are independent random variables, uniformly distributed on $[0,1]$. For any $t$, $s$, $\omega$, the corresponding $L^2$-based identity holds; in other words, randomizing the phases of the Fourier coefficients has no effect on $L^2$-based norms. This is not the case for Lebesgue exponents larger than 2.
Proof. (i) The proof is standard; see, for instance, [7].
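The contrast between $L^2$ and higher Lebesgue exponents can be checked on a single trigonometric polynomial. The sketch below uses $N$ modes of equal amplitude $N^{-1/2}$ on the unit circle, once with all phases aligned and once with independent uniform phases (these data are illustrative assumptions). The $L^2$ norms coincide by Parseval, while the aligned case has $\|u\|_{L^4}^4 = (2N + 1/N)/3$, which grows with $N$, whereas random phases keep $\|u\|_{L^4}^4$ near its expected value $2 - 1/N$.

```python
import cmath
import math
import random


def mean_abs_pow(coeffs, p, M):
    """(1/M) sum_j |u(j/M)|^p for u(x) = sum_k coeffs[k] exp(2 pi i k x).
    For M larger than the spectral width of |u|^p (p = 2, 4), this equals the
    integral of |u|^p over the unit circle exactly, up to rounding."""
    total = 0.0
    for j in range(M):
        x = j / M
        u = sum(c * cmath.exp(2j * math.pi * k * x) for k, c in enumerate(coeffs))
        total += abs(u) ** p
    return total / M


N, M = 64, 256
aligned = [1 / math.sqrt(N)] * N  # all phases equal to 0
rng = random.Random(1)
randomized = [c * cmath.exp(2j * math.pi * rng.random()) for c in aligned]

for name, cs in (("aligned", aligned), ("random", randomized)):
    print(name, mean_abs_pow(cs, 2, M), mean_abs_pow(cs, 4, M))
```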

Proof of the main theorem
Fix $\epsilon_0 > 0$ sufficiently small, and recall that $T \le L^d$. Recall also the set appearing in the statement of Theorem 2.2, and the bound provided by conservation of mass.

2) Iterative resolution. To ensure that $R \le \frac{1}{2}$, we restrict the range of the parameters $\lambda$, $T$ relative to $L$. There are two regimes, depending on the Strichartz constant $S^*$ and on the number-theoretic restriction $t \le L^{d-\epsilon_0}$ (see Remark 8.2).
For this range of parameters, the energy inequality (5.9) yields the corresponding bound on $u$; it also implies $\|u\|_{L^\infty} \lesssim 1$.
Note that, for these ranges of parameters, $T \le L^{-2\delta}\sqrt{\tau}$, where $\delta$ is as in Theorem 8.1.
With these restrictions on the range of the parameters, we proceed with a decomposition in which $S_N$ includes all the terms in $\Phi^N(0)$ of degree greater than $N$. By Corollary 5.5 and Corollary 5.6, this yields a bound whose constant depends on $N$. In terms of Fourier variables, this can be written as follows.

5) Large-time limit $t \sim T \to \infty$.
We first recall the relevant estimate for a smooth function $f$. Consequently, for $\epsilon_0$ sufficiently small and $t \le T \le L^{d-\epsilon_0}$, we choose $L \ge L_1(\epsilon_0)$ to bound the error term in Step 1 by $\frac{t}{\tau} L^{-\epsilon_0}$. Also, since R

Number theoretic results
Our aim in this section is to prove an asymptotic formula for the following Riemann sum. The difficulty in proving this theorem is that $\Omega$ can be very small, while the time interval on which the asymptotic formula is claimed to hold is very large. In fact, if we restrict ourselves to a timescale that is not too long, the asymptotic formula is straightforward, as will be demonstrated in Proposition 8.10. However, to prove the theorem as stated, we need to generalize a result of Bourgain on pair correlations of generic quadratic forms [5].
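Why long times are delicate can be seen in a one-dimensional toy case with the non-generic choice $\beta = 1$ and the assumed profile $W(k) = e^{-k^2}$ (both are illustrative, not the setting of Theorem 8.1). For moderate $t$, the Riemann sum $L^{-1}\sum_{k \in L^{-1}\mathbb{Z}} W(k)e^{2\pi i t k^2}$ matches the integral $\int W(k)e^{2\pi i t k^2}\,dk = \sqrt{\pi/(1 - 2\pi i t)}$, but at $t = L^2$ every lattice phase $e^{2\pi i K^2}$ equals 1, so the sum stays near $\sqrt{\pi}$ while the integral has decayed. Rational coefficients produce such exact resonances; genericity of $\beta$ is what the equidistribution results below exploit.

```python
import cmath
import math


def riemann_sum(t, L, kmax=10.0):
    """(1/L) * sum over k in (1/L)Z, |k| <= kmax, of W(k) exp(2 pi i t k^2),
    with the assumed profile W(k) = exp(-k^2) and non-generic beta = 1."""
    n = int(kmax * L)
    return sum(math.exp(-(j / L) ** 2) * cmath.exp(2j * math.pi * t * (j / L) ** 2)
               for j in range(-n, n + 1)) / L


def integral(t):
    """Exact value of int_R exp(-k^2) exp(2 pi i t k^2) dk = sqrt(pi / a) with
    a = 1 - 2 pi i t (principal branch, valid since Re a > 0)."""
    return cmath.sqrt(math.pi / (1 - 2j * math.pi * t))


L = 20
for t in (0.5, L ** 2):
    print(t, abs(riemann_sum(t, L) - integral(t)))
```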
Bourgain considered a positive definite diagonal form, for generic $\beta = (\beta_1, \ldots, \beta_d) \in [1,2]^d$, and proved that for $d = 3$ the lattice points in the corresponding region are equidistributed at scale $L^{-\rho}$, for $0 < \rho < d-1$. Our quadratic form $\Omega$, restricted to $\Sigma$, can be transformed into $Q(p,q)$, given in (8.1), as follows. Rescale time as $\mu := tL^{-2}$ and let $K_i = Lk_i \in \mathbb{Z}$; after the corresponding substitutions, the sum can be expressed in terms of a quadratic form $Q_0$ in the integer variables. The quadratic form $Q_0$ can be diagonalized by a change of coordinates in which $p_i$ and $q_i$ are either both even or both odd.
Consequently, the sum (8.3) can be written as a combination of four sums of the form (8.4), where $Q(p,q)$ is given by (8.1), and where we have suppressed the dependence of $W$ on $k$ for convenience.
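The parity condition on $(p, q)$ has a simple source. Assuming, hypothetically, that $Q_0$ is the bilinear form $\sum_i \beta_i x_i y_i$ in integer vectors $x, y$ (the text does not display $Q_0$, so this form and the normalization factor 4 are assumptions), the substitution $p = x + y$, $q = x - y$ gives $4x_iy_i = p_i^2 - q_i^2$, and $p_i - q_i = 2y_i$ forces $p_i$ and $q_i$ to share parity; conversely, same-parity pairs map back to integers. A quick numerical check:

```python
import random


def q0(beta, x, y):
    """Assumed bilinear form Q_0(x, y) = sum_i beta_i x_i y_i (hypothetical)."""
    return sum(b * xi * yi for b, xi, yi in zip(beta, x, y))


def q_diag(beta, p, q):
    """Diagonal form sum_i beta_i (p_i^2 - q_i^2)."""
    return sum(b * (pk * pk - qk * qk) for b, pk, qk in zip(beta, p, q))


def to_pq(x, y):
    """Change of coordinates p = x + y, q = x - y (componentwise)."""
    return [a + b for a, b in zip(x, y)], [a - b for a, b in zip(x, y)]


def from_pq(p, q):
    """Inverse map; it lands in the integers exactly when p_i, q_i share parity."""
    assert all((pk - qk) % 2 == 0 for pk, qk in zip(p, q))
    return ([(pk + qk) // 2 for pk, qk in zip(p, q)],
            [(pk - qk) // 2 for pk, qk in zip(p, q)])


rng = random.Random(0)
d = 3
beta = [rng.uniform(1, 2) for _ in range(d)]
x = [rng.randint(-50, 50) for _ in range(d)]
y = [rng.randint(-50, 50) for _ in range(d)]
p, q = to_pq(x, y)
print(4 * q0(beta, x, y), q_diag(beta, p, q))  # agree up to rounding
```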

Remark 8.2
Note that, unlike Bourgain, we do not exclude the points where $p_i^2 = q_i^2$ for all $i \in \{1, \ldots, d\}$. These points contribute $O(L^d)$ to the sum and will be treated as lower order terms; they also account for the $O(L^d)$ term in Theorem 8.1.
It is this fact that prevents us from using the full strength of our equidistribution result, which holds for $\mu = tL^{-2} \le L^{d-1-\epsilon}$; instead, we use it only for $t \le L^{d-\epsilon}$. This ensures that the $O(L^d)$ term is an error in the asymptotic formula.
To prove the asymptotic formula given in Theorem 8.1, with $0 < \mu = tL^{-2} \le L^{d-1-\epsilon}$, we proceed as follows: 1) identify which part of the sum contributes the leading order term and which parts contribute error terms; 2) prove equidistribution of lattice points on a coarse scale; 3) present Bourgain's theorem on equidistribution on a fine scale; and finally 4) prove Theorem 8.1.

Identifying main terms vs error terms
To identify the leading order term in the equidistribution formula, we first obtain upper bounds on lattice sums that are optimal up to sub-polynomial factors.
An upper bound on the cardinality of the set in question can be obtained by bounding the number of lattice points in subsets of the appropriate form; applying (8.5) with $M = L^2$ then gives the corresponding bound, and (8.6) follows.

Corollary 8.5 The number of elements in $R_Z$ can be bounded by
Moreover, if we further assume $|a|, |b| \le 1$, then we have the improved bound
Remark 8.6 Note that, in the first estimate (8.7), the second term may be treated as an error as long as $b - a \ge L^{-(d-1)+\epsilon_0}$ for some $\epsilon_0 > 0$. Analogously, the second term of (8.8) may be treated as an error provided $b - a \ge L^{-d+\epsilon_0}$.
Following this remark on identifying the leading order term, we can now identify subsets of $R_Z$ that contribute only error terms. The first such subsets consist of the points with $|p_i - q_i| \lesssim L^{1-\delta}$ for some fixed $\delta > 0$ and some $i$, which we may assume without loss of generality to be $i = 1$.

Lemma 8.7
For $|a|, |b| \le 1$, the number of elements in $R_Z$ satisfying $|p_1 - q_1| \lesssim L^{1-\delta}$ obeys the following bound.

Proof. If $p_i = q_i$ for at least one $i$, then Corollary 8.5 with $d$ replaced by $d-1$ gives a bound which is of lower order. Moreover, if $p_i \neq q_i$ for all $i$ and $|p_1 - q_1| \lesssim L^{1-\delta}$, then, by Lemma 8.4, the sum over $2 \le i \le d$ can be bounded by $L^{2(d-2)+}(b-a) + L^{0+}$, while the sum over $p_1$ and $q_1$ can be bounded by $L^{2-\delta}$. Multiplying these bounds gives the stated estimate.

Next, we show that if $|p_i|$ or $|q_i|$ is smaller than $L^{1-\delta}$ for some $i$, where we may again assume $i = 1$, then the contribution to the number of elements in $R_Z$ is of lower order.

Lemma 8.8
For $|a|, |b| \le 1$, we have the following estimate.

Proof. If both $|p_1| \lesssim L^{1-\delta}$ and $|q_1| \lesssim L^{1-\delta}$, or if $p_i = q_i$ for at least one $i$, then the stated bound follows from Lemma 8.7. Otherwise, the sum over $2 \le i \le d$ contributes $L^{2(d-2)+}(b-a) + L^{0+}$, while the sum over $p_1$ and $q_1$ contributes $L^{2-\delta}$.
From Lemmas 8.7 and 8.8, we obtain the corresponding bound. Then, for $|a|, |b| \le 1$, we have the following cardinality bound on the set difference $R_Z \setminus R_Z^{\delta}$.

Asymptotic formula on a coarse scale
These upper bounds, in particular Corollary 8.5, allow us to give a simple proof of the asymptotic formula for $\#R_Z$ on a coarser scale, e.g. $b - a = L^{4/3}$. Note that this is still better than the trivial Riemann sum scale $b - a = O(L^2)$.

Proof. First, we smooth the characteristic functions by extending the region to a slightly larger region, with a controlled error term. This is done as follows.
Then, with the smoothed cut-offs so defined, Corollary 8.5 controls the resulting error, assuming that $b - a \ge L^{1+4\delta}$. Thus, it suffices to obtain the asymptotic formula for the corresponding smoothed sum $S$.
Using the Fourier transform, we express $S$ as an integral in the variable $s$, and applying Poisson summation we may rewrite $S(s)$ as a sum over $\ell$, where $z = (x, y)$ and $\ell = (m, n)$. The term $\ell = 0$ contributes the asymptotic formula, where we used $(b - a) < L^{2-\delta}$ in replacing $h_L(L^2 Q)$ by $\delta_{\mathrm{dirac}}(Q)$. It therefore remains to show that the sum over $\ell \neq 0$ can be treated as an error. First, we estimate the sum for $|s| \le L^{-1-\delta}$. In this case, we write the phase as $L^2 Q(z)s - L\ell \cdot z$ and note that, since $|s| \le L^{-1-\delta}$ and $|z| \lesssim 1$, the term $L^2 s \nabla Q(z)$ is of size at most $L^{1-\delta}$, so the $z$-gradient of the phase is bounded below by a multiple of $L|\ell|$; thus, upon integrating (8.11) by parts, we obtain (8.13). Since each derivative of $W_L$ contributes a factor $L^{1-\delta}$, each integration by parts contributes a factor of $\frac{1}{L^\delta |\ell|}$. Applying a sufficient number of integrations by parts, and using the fact that $|\hat h_L(s)| \lesssim b - a$, we may ensure that the contribution of $\ell \neq 0$ and $|s| \le L^{-1-\delta}$ is arbitrarily small. For $|s| \ge L^{-1-\delta}$, we use a decay bound valid for all $N$, and thus this term can also be treated as an error. This concludes the proof.
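The mechanism of this proof (Poisson summation, with the zero dual frequency giving the main term and the nonzero frequencies giving errors) can be checked in one dimension with a Gaussian, where both sides are explicit: the Fourier transform of $e^{-\pi s x^2}$ is $s^{-1/2}e^{-\pi\xi^2/s}$. This is a standard identity used here only as an illustration; the function names are hypothetical.

```python
import math


def theta_direct(s, n_max=50):
    """Lattice sum sum_{n in Z} exp(-pi s n^2), truncated to |n| <= n_max."""
    return sum(math.exp(-math.pi * s * n * n) for n in range(-n_max, n_max + 1))


def theta_poisson(s, m_max=50):
    """Dual side of Poisson summation: s^{-1/2} sum_{m in Z} exp(-pi m^2 / s)."""
    return sum(math.exp(-math.pi * m * m / s)
               for m in range(-m_max, m_max + 1)) / math.sqrt(s)


for s in (0.05, 0.37, 2.0):
    print(s, theta_direct(s), theta_poisson(s))
```

For small $s$ (a slowly varying summand, as in the coarse-scale regime) the dual sum is dominated by its $m = 0$ term $s^{-1/2}$, which is exactly the integral of the summand; the $m \neq 0$ terms are exponentially small.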

Bourgain's theorem
Now we present Bourgain's proof of equidistribution.

(8.15)
In order to prove Theorem 8.11, we first make a series of reductions.
Step 1: Restrict to dyadic lengths and discrete intervals $(a,b)$. We first show that it suffices to assume dyadic lengths $L = 2^{N_1}$ with $N_1 \in \mathbb{N}$, and intervals of the form $(a,b) = (N_2 L^{-d+1+\epsilon}, (N_2+1)L^{-d+1+\epsilon})$ with $N_2 \in \mathbb{Z}$. The restriction to dyadic lengths is valid since it can only affect the implicit constants in the theorem. Now suppose (8.15) holds for all such $L$ and $(a,b)$, and suppose we are given another interval $(a', b')$ with $|a'|, |b'| \le 1$ and $L^{-d+1+2\epsilon} < b' - a' < L^{-\epsilon}$. Then, assuming $\delta$ is sufficiently small (depending on $\epsilon$), we sum over the intervals of the form $(N_2 L^{-d+1+\epsilon}, (N_2+1)L^{-d+1+\epsilon})$ contained in, respectively covering, $(a', b')$. Thus, by again taking $\delta$ smaller if needed, we obtain Theorem 8.11 with $\epsilon$ replaced by $2\epsilon$; up to a relabeling of $\epsilon$, this is Theorem 8.11.
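The reduction to discrete intervals is a sandwich: $(a', b')$ contains a union of grid intervals $(N_2 h, (N_2+1)h)$, with $h = L^{-d+1+\epsilon}$, and is contained in a union that is at most two grid intervals larger. Since $b' - a' > L^{\epsilon}h$, the relative discrepancy is at most $2/n_{\mathrm{inner}}$, which tends to zero. A sketch with exact rational arithmetic; the values $L = 2^{10}$, $d = 3$, $\epsilon = 1/10$ and the interval are illustrative assumptions chosen so that $h = 2^{-19}$ is exact.

```python
import math
from fractions import Fraction


def grid_sandwich(a, b, h):
    """Inner and outer unions of grid intervals (N h, (N + 1) h) around (a, b).
    Returns ((inner_lo, inner_hi), (outer_lo, outer_hi), n_inner, n_outer)."""
    i_lo, i_hi = math.ceil(a / h), math.floor(b / h)
    o_lo, o_hi = math.floor(a / h), math.ceil(b / h)
    return (i_lo * h, i_hi * h), (o_lo * h, o_hi * h), i_hi - i_lo, o_hi - o_lo


# L = 2**10, d = 3, eps = 1/10 give L^(-d+1+eps) = 2**(-19) exactly.
h = Fraction(1, 2 ** 19)
a, b = Fraction(1234, 10 ** 4), Fraction(1235, 10 ** 4)  # b - a = 10**-4
inner, outer, n_in, n_out = grid_sandwich(a, b, h)
print(n_in, n_out, float(2 / Fraction(n_in)))  # relative slack between the two counts
```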
Step 2: Ignore intervals that contribute lower order sums. Set $\tilde\delta = 4d\delta$; then, by Corollary 8.9, we obtain for $\tilde\delta$ sufficiently small a bound in which we have used the restriction on $b - a$ and assumed $\delta$ to be sufficiently small compared to $\epsilon$. Thus, we restrict our attention to the remaining case. With this reduction at hand, we divide each interval into at most $L^{3\delta}$ intervals, and prove that (8.17) holds for intervals $I^\alpha_i$ and $J^\alpha_i$ satisfying Conditions (a), (b), and (c). Summing in $\alpha$, using (8.16), and then using that $\tilde\delta = 4d\delta$, we conclude (8.15). Summarizing: if, by abuse of notation, we drop the index $\alpha$ and replace $\tilde\delta$ with $\delta$, we have reduced the proof of Theorem 8.11 to proving the following proposition.
Step 3: Transform the region of summation. The sum can be rewritten accordingly, and, using the fact that $|u - v| > L^{1-\delta}$, we express the region $R_Z$ in the new variables. Setting $\xi = \frac{b+a}{2}$ and $\eta = \frac{b-a}{2}$, and then taking logarithms and Taylor expanding $\ln(x)$ around $x = 1$, we obtain the transformed description of the region (here an assumption has been made without loss of generality).

Step 4: Replace the sum with an analogous sum. Instead of the sum over the region $R_Z$, we consider the sum over the region $S_Z$. In order to make this reduction, we need a bound on the cardinality of the set of $(p,q)$ satisfying the corresponding constraint. Such a bound follows from a weaker version of Proposition 8.12, in which the asymptotic formula (8.18) is replaced by a sharp upper bound:

Proposition 8.13 Fix $\epsilon > 0$. Then, for $\delta > 0$ sufficiently small, the following holds: suppose $I_j$ and $J_j$ satisfy the hypotheses of Proposition 8.12; then, for $a, b$ satisfying $|a|, |b| \le 1$ and the corresponding constraint, the upper bound (8.22) holds.

We note that Proposition 8.12 may require a stricter smallness criterion on $\delta$, relative to the choice of $\epsilon$, than Proposition 8.13. With this in mind, applying Proposition 8.13, the difference between summing over $p$ and $q$ satisfying (8.20) and computing the cardinality of $S_Z$ is of order $O(L^{2(d-1)-(3d+1)\delta}(b-a))$, and hence can be treated as an error. Similar arguments will be used later to bound analogous error terms. By the arguments above, the sum in Proposition 8.13 may be bounded from above by the cardinality of $S_Z$ with $\eta$ replaced by $2\eta$ in its definition. Hence, up to a factor of 2 in the definition of $S_Z$, to prove both Propositions 8.12 and 8.13 it suffices to obtain an asymptotic formula for $\#S_Z$.
Setting $F$ accordingly, we can rewrite the cardinality of $S_Z$ as the sum of $\mathbf{1}_{[-A,A]}(F(p,q))$ over the relevant lattice points $(p,q)$.
For a technical reason (see Step 7), we replace $\mathbf{1}_{[-A,A]}$ by a smooth approximation. Let $\varphi : \mathbb{R} \to \mathbb{R}$ be a smooth, non-negative, symmetric Friedrichs mollifier that is monotonically decreasing on $\mathbb{R}_+$, and set $\varphi_\varepsilon(x) = \varepsilon^{-1}\varphi(x/\varepsilon)$. This splits the count into two sums, $I$ and $II$. By an argument analogous to the one showing that the cardinality of $R_Z$ is well approximated by that of $S_Z$, we may show that the sum $II$ can be estimated up to an acceptable error.
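A minimal sketch of the smoothing step, assuming the standard bump $\varphi(x) \propto e^{-1/(1-x^2)}$ on $(-1,1)$ as the mollifier (the text only requires a smooth, non-negative, symmetric, radially decreasing $\varphi$; this particular choice and the helper names are illustrative). The convolution $\mathbf{1}_{[-A,A]} * \varphi_\varepsilon$ equals 1 on $[-A+\varepsilon, A-\varepsilon]$, vanishes outside $[-A-\varepsilon, A+\varepsilon]$, takes values in $[0,1]$, and equals $1/2$ at $\pm A$ by symmetry, which is what makes upper and lower comparisons with the sharp indicator possible.

```python
import math


def bump(x):
    """C_c^infinity bump on (-1, 1): smooth, non-negative, symmetric, decreasing on [0, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0


def integrate(f, lo, hi, n=4000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


MASS = integrate(bump, -1.0, 1.0)  # normalizing constant so that phi has unit mass


def mollifier(x, eps):
    """phi_eps(x) = eps^{-1} phi(x / eps), with phi = bump / MASS."""
    return bump(x / eps) / (MASS * eps)


def smoothed_indicator(x, A, eps):
    """(1_{[-A, A]} * phi_eps)(x): integrate phi_eps(x - y) over y in [-A, A]
    (the integrand vanishes unless |x - y| < eps)."""
    lo, hi = max(-A, x - eps), min(A, x + eps)
    if lo >= hi:
        return 0.0
    return integrate(lambda y: mollifier(x - y, eps), lo, hi)


A, eps = 1.0, 0.1
for x in (0.0, 0.85, 0.95, 1.0, 1.05, 1.2):
    print(x, smoothed_indicator(x, A, eps))
```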
Step 5: Expressing the sum using the Fourier transform. The number $\#S_Z$ can be expressed using the Fourier transform as follows.

Step 6: A scaling argument. As mentioned earlier, if $A$ is large compared to $L^{-1}$, then comparing the sum over $S_Z$ with the area of $S$ is relatively simple. For this reason, we split the sum by scaling with a factor $A/A_0$, where $A_0 = \frac{L^{4/3}}{|u^2 - v^2|}$, i.e., we split the integral into two terms. Ignoring the factor $\varphi_{L^{-100d}}$, the first integral counts the pairs $(p,q)$ such that

As in Step 4, the factor $\varphi_{L^{-100d}}$ can be ignored, up to an acceptable error. One is then reduced to a counting problem. Again, applying an upper/lower bounding argument similar to that of Step 4, with Proposition 8.13 replaced by Proposition 8.10, we obtain an asymptotic formula for this first term. For the purpose of proving Proposition 8.13, one simply observes that the first term is of the required order. Thus, to complete the proof of Propositions 8.12 and 8.13, and by implication of Theorem 8.11, it suffices to estimate $IV$.
Step 7: Replace $S_2$ with a sum involving smooth cut-offs. This is a preparatory step, needed in Step 10 in order to apply an argument involving the Mellin transform and estimates on the Riemann zeta function.
We rewrite $S_2$ in terms of the coordinates $m = p_d - q_d$, $n = p_d + q_d$ and a set $K$. Without loss of generality, we may assume $u \le v$. Let us cover $K$ by disjoint intervals $M_j$ of length $L^{1-100d\delta}$, and let $w_j$ denote the center of $M_j$. It is not difficult to show that this can be done with $\#\{M_j\} \lesssim L^{100d\delta}$ and with a suitable bound on the resulting set difference. Since $M_j$ has length $L^{1-100d\delta}$, we may also replace $m$ with the midpoints $w_j$ to obtain a further estimate. Again, up to an allowable error, we may replace the sharp cut-off functions with a smooth cut-off $\psi$ satisfying $\psi \equiv 1$ on $[-1 + L^{-100d\delta}, 1 - L^{-100d\delta}]$ and supported in $[-1, 1]$.
Finally, up to an allowable error, the sum in $m$ can be replaced by a sum involving a smooth cut-off. We now decompose $IV$ and observe (8.28), together with a rapidly decaying bound valid for any $N$. Thus, using that $A \sim (b - a)L^{-2+2\delta}$, the corresponding term is an acceptable error.
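The covering in Step 7 is elementary: an interval of length $|K|$ is cut into about $|K|/\ell$ consecutive pieces of length $\ell = L^{1-100d\delta}$, and replacing an integer $m$ by the center $w_j$ of its piece moves it by at most $\ell/2$. A sketch with illustrative values (assumptions, not from the text): $K = [-L, L)$ with $L = 2^{12}$, and $\ell = 2^9$, i.e. $\theta = 1/4$ standing in for $100d\delta$.

```python
import math


def cover_by_intervals(lo, hi, length):
    """Cover [lo, hi) by consecutive disjoint half-open intervals [s, s + length);
    return a list of (start, end, center)."""
    n = math.ceil((hi - lo) / length)
    return [(lo + j * length, lo + (j + 1) * length, lo + j * length + length / 2)
            for j in range(n)]


L, length = 2 ** 12, 2 ** 9  # length = L^(1 - theta) with theta = 1/4
cover = cover_by_intervals(-L, L, length)
# Distance from each integer m in K to the center w_j of the piece containing it.
worst = max(abs(m - cover[(m + L) // length][2]) for m in range(-L, L))
print(len(cover), worst)  # number of pieces is 2 L^theta; |m - w_j| <= length / 2
```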
Step 8: $V$ is an error. Now consider $V$. We aim to show that $V$ is an error for a set of $(\beta_2, \beta_d)$ of full measure, independent of our choice of length $L = 2^{N_1}$ and interval $(a,b) = (N_2 L^{-d+1+\epsilon}, (N_2+1)L^{-d+1+\epsilon})$. By Chebyshev's inequality, it suffices to prove a corresponding bound on average over $(\beta_2, \beta_d)$. To see this, one defines an auxiliary quantity and applies Chebyshev's inequality; this yields (8.30) for a set of $(\beta_2, \beta_d)$ of full measure, where the implicit constant depends on $(\beta_2, \beta_d)$.
Applying (8.28) and (8.29), then averaging in $\beta_2$ and $\beta_d$ and using Plancherel's theorem for the integral in $\beta_d$, we obtain the required bound from $A = \eta L^{-2+2\delta}$ and the preceding estimates.

Step 9: Bounding $VII$. To bound $\|S_1\|_{L^2_{\beta_2}}$, we rewrite $S_1$ as an oscillatory integral; then, for $t \le L^4$ and after taking the supremum over the indices $3 \le i \le d-1$, we obtain an estimate. Here we use the trivial bound in the case where $|t|$ times the infimum over $\beta_2$ of the $\beta_2$-derivative of the phase is at most 1; otherwise, we use Van der Corput's lemma (see, for example, [37], Chapter 8, Proposition 2). In the latter case, to apply the proposition, we split the integral into regions on which the $\beta_2$-derivative of the phase is monotonic in $\beta_2$. Setting $(p_i - q_i)(p_i + q_i) = w_i$ and $(r_i - s_i)(r_i + s_i) = z_i$, and summing over fixed $w_i$ and $z_i$ using the divisor bound $d(k) \lesssim |k|^{\epsilon}$, we obtain a sum which can be rearranged by summing first over the set $A_\psi(k, w_2, z_2)$ and then over $(k, w_2, z_2)$.

Now we estimate $\#A_\psi(k, w_2, z_2)$ for fixed $(k, w_2, z_2)$. Assume $A_\psi(k, w_2, z_2) \neq \emptyset$; then there exist $w_0$ and $z_0$ such that $L^{2-2\delta} \le |w_0 - \psi_2| \le L^2$, together with a corresponding condition on $z_0$. Since $w_1 \in \mathbb{Z}$, it follows that $\#A_\psi \lesssim 1 + \frac{L^2 \gcd(w_2, z_2)}{|z_2|}$. Applying this bound to $VII$ yields the desired estimate, where we used that $\eta = L^{-d+1+\epsilon}$, that $\delta$ is sufficiently small, and that $d \ge 3$.
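The divisor bound enters through the factorization $(p_i - q_i)(p_i + q_i) = w_i$: for fixed $w \neq 0$, each integer solution corresponds to a factorization $w = a \cdot b$ with $a = p - q$, $b = p + q$ of the same parity, so there are at most $2d(|w|)$ solutions, and $d(n) \lesssim_\epsilon n^{\epsilon}$. The sketch below (helper names are hypothetical) checks the factorization count against brute force.

```python
def divisor_count(n):
    """Number of positive divisors d(n) of n >= 1."""
    return sum(1 for a in range(1, n + 1) if n % a == 0)


def count_by_divisors(w):
    """#{(p, q) in Z^2 : (p - q)(p + q) = w} for w != 0, via the factorization
    p - q = a, p + q = w / a; a and w / a must have the same parity, and each
    positive divisor a yields the two factorizations (a, w/a) and (-a, -w/a)."""
    n = abs(w)
    return sum(2 for a in range(1, n + 1) if n % a == 0 and (a - n // a) % 2 == 0)


def count_brute(w):
    """Same count by brute force; any solution has |p|, |q| <= (1 + |w|) / 2."""
    n = abs(w)
    return sum(1 for p in range(-n, n + 1) for q in range(-n, n + 1)
               if (p - q) * (p + q) == w)


for w in (15, -15, 24, 7, 4):
    print(w, count_by_divisors(w), count_brute(w), 2 * divisor_count(abs(w)))
```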
Step 10: Bounding $VIII$. Now consider $VIII$. We proceed to estimate $S_j$, defined in (8.26). Letting $\hat\chi$ denote the Mellin transform of $\chi$, we shift the contour to $\operatorname{Re} s = \frac{1}{2}$ and pick up a residue term, which for $|t| \ge L^{1/3}$ is $O(L^{-N})$ for any $N$, due to the decay of $\hat\chi$ and the fact that $u$ and $v$ are of comparable size. We then use $u \sim L^{1-3\delta}$ and, again, the rapid decay of $\hat\psi$. We now use the classical $L^4$ bound for the zeta function in the critical strip [25], which yields an estimate on $S_2$. Combining this estimate with (8.31), we obtain the bound for $VIII$, where we used that $\eta = L^{-d+1+\epsilon} \le L^{-2+\epsilon}$ since $d \ge 3$. Thus, assuming $\delta$ to be sufficiently small, $L^{-2d+2+200d\delta} \le L^{-2d+2+2\epsilon-3d\delta} = \eta^2 L^{-3d\delta}$, and hence $VIII \le L^{2(d-1)-3d\delta}\eta$, as desired.
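For orientation, the Mellin transform used in Step 10 is, in its standard normalization (the text's normalization may differ), $\hat\chi(s) = \int_0^\infty \chi(x)x^{s-1}\,dx$. A sanity check on the textbook example $\chi(x) = e^{-x}$, whose Mellin transform is $\Gamma(s)$, computed for real $s > 0$ via the substitution $x = e^t$ and the trapezoidal rule; the function name and truncation parameters are illustrative.

```python
import math


def mellin_exp(s, t_lo=-60.0, t_hi=6.0, n=20000):
    """Mellin transform int_0^inf x^(s-1) e^(-x) dx for real s > 0, computed with
    the substitution x = e^t (integrand e^(s t - e^t)) and the trapezoidal rule."""
    h = (t_hi - t_lo) / n
    vals = [math.exp(s * t - math.exp(t)) for t in (t_lo + i * h for i in range(n + 1))]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


for s in (0.5, 1.0, 2.5):
    print(s, mellin_exp(s), math.gamma(s))
```

The substitution turns the Mellin integral into an integral over $\mathbb{R}$ of a rapidly decaying smooth function, for which the trapezoidal rule is very accurate.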

Proof of Theorem 8.1
First, we note that the sum in Theorem 8.1 can be simplified as follows. (1) Ignore all pairs $(p,q)$ such that $p_j = q_j$ for each $j$. The sum over such pairs with $|p|, |q| \le L^{1+\delta}$ is of order $O(t^2 L^{d(1+\delta)})$ and hence contributes an admissible error; here we used the restriction $t \le L^{d-\epsilon}$.
(2) We restrict the sum to the positive sector $p, q \in \mathbb{Z}^d_+ \cap [0, L^{1+\delta}]^d$ with $p \neq q$. Here we use that the subset of $(p,q)$ such that $p_j = 0$ or $q_j = 0$ for some $j$ is an admissible error; this follows from Lemma 8.8. To carry out this estimate rigorously, one must split the contributions into $|Q(p,q)| \le \mu^{-1}$ and $|Q(p,q)| > \mu^{-1}$. Assuming without loss of generality that $p_1 = 0$, splitting the latter part dyadically in the size of $|Q(p,q)|$ and using $|g(x)| \lesssim \frac{1}{|x|^2}$, one obtains the required estimate, where $W$ was defined in (8.4). With all these reductions in mind, Theorem 8.1 will follow from the following theorem.

Theorem 8.14 (Equidistribution) Fix $\epsilon > 0$ and let $\delta > 0$ be sufficiently small. Then, for generic $\beta \in [1,2]^d$ and for any function $W \in \mathcal{S}(\mathbb{R}^d)$, the following holds:

We remark that the above theorem is in fact stronger than required: in view of the restriction on $t$ in the hypothesis of Theorem 8.1, we need only consider $\mu$ in the range $0 < \mu \le L^{d-2-\epsilon}$. Before proving Theorem 8.14, we need a couple of auxiliary lemmas. The first is helpful in bounding the errors in the asymptotic formula; the second will be useful for localizing the sum in Theorem 8.14.

Proof of Theorem 8.14. We first note that, by symmetry, it suffices to restrict ourselves to the positive sector $p, q \in \mathbb{Z}^d_+$. Lemma 8.8 implies that the subset of $(p,q)$ such that $p_j = 0$ or $q_j = 0$ may be treated as an admissible error.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.