Propagation of chaos for maxima of particle systems with mean-field drift interaction

We study the asymptotic behavior of the normalized maxima of real-valued diffusive particles with mean-field drift interaction. Our main result establishes propagation of chaos: in the large population limit, the normalized maxima behave as those arising in an i.i.d. system where each particle follows the associated McKean–Vlasov limiting dynamics. Because the maximum depends on all particles, our result does not follow from classical propagation of chaos, where convergence to an i.i.d. limit holds for any fixed number of particles but not all particles simultaneously. The proof uses a change of measure argument that depends on a delicate combinatorial analysis of the iterated stochastic integrals appearing in the chaos expansion of the Radon–Nikodym density.


Introduction
This paper is concerned with the large-population asymptotics of the maxima of certain real-valued diffusive particle systems X^{1,N}, ..., X^{N,N} with mean-field interaction through the drifts. Specifically, we are interested in large-N limits of

(max_{i≤N} X^{i,N}_T − b^N_T) / a^N_T,    (1.1)

where a^N_T and b^N_T are suitable normalizing constants. The particle dynamics are specified as follows, specializing the setup of [11]. For each N ∈ N the N-particle system evolves according to a stochastic differential equation of the form

dX^{i,N}_t = A(t, X^{i,N}_{[0,t]}) [ B(t, X^{i,N}_{[0,t]}, ∫ g(t, X^{i,N}_{[0,t]}, y_{[0,t]}) µ^N_t(dy)) dt + dW^i_t ] + C(t, X^{i,N}_{[0,t]}) dt    (1.2)

for i = 1, ..., N, with i.i.d. initial conditions X^{i,N}_0 ∼ ν_0, where ν_0 is a given probability measure on R. We use the notation x_{[0,t]} = (x(s))_{s∈[0,t]} for any continuous function x, and for each t ∈ R_+ we let

µ^N_t = (1/N) Σ_{i=1}^N δ_{X^{i,N}_{[0,t]}}

denote the empirical measure of the particle trajectories up to time t. The coefficients A(t, x_{[0,t]}), B(t, x_{[0,t]}, r), C(t, x_{[0,t]}) and the interaction function g(t, x_{[0,t]}, y_{[0,t]}) are defined for all t ∈ R_+, x, y ∈ C(R_+), and r ∈ R. Precise assumptions are discussed below. Finally, W^i, i ∈ N, is a family of independent standard Brownian motions. We emphasize that there is no interaction in the volatility coefficient A. This is crucial for the methods used in this paper.
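To make the setup concrete, here is a minimal Euler–Maruyama sketch of a system of the form (1.2). The specific choices A = 1, C = 0, B(t, x, r) = r, g(t, x_{[0,t]}, y_{[0,t]}) = κ(y(t) − x(t)) (mean reversion toward the empirical mean), ν_0 = N(0, 1), and all parameter values are ours, purely for illustration.

```python
import math
import random

def simulate_particles(N=200, T=1.0, steps=100, kappa=1.0, seed=0):
    """Euler-Maruyama sketch of the N-particle system (1.2) with the
    illustrative choices A = 1, C = 0, B(t, x, r) = r and
    g(t, x_{[0,t]}, y_{[0,t]}) = kappa * (y(t) - x(t)),
    i.e. mean reversion of each particle toward the empirical mean."""
    rng = random.Random(seed)
    dt = T / steps
    # i.i.d. initial conditions, nu_0 = N(0, 1) (illustrative choice)
    X = [rng.gauss(0.0, 1.0) for _ in range(N)]
    for _ in range(steps):
        # integral of g against the empirical measure mu^N_t reduces to
        # kappa * (empirical mean - own position) for this choice of g
        mean = sum(X) / N
        X = [x + kappa * (mean - x) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for x in X]
    return X

X_T = simulate_particles()
max_T = max(X_T)  # the object whose large-N normalization the paper studies
```

The normalized maximum (1.1) would then be obtained by centering and scaling max_T with suitable constants a^N_T, b^N_T.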
Under suitable assumptions, classical propagation of chaos [19,24,9] states that for any fixed number k ∈ N, the first k particles (X^{1,N}, ..., X^{k,N}) converge jointly as N → ∞ to k independent copies (X^1, ..., X^k) of the solution to the McKean–Vlasov equation

dX_t = A(t, X_{[0,t]}) [ B(t, X_{[0,t]}, ∫ g(t, X_{[0,t]}, y_{[0,t]}) µ_t(dy)) dt + dW_t ] + C(t, X_{[0,t]}) dt,  µ_t = Law(X_{[0,t]}),    (1.3)

with initial condition µ_0 = ν_0. A rigorous version of this statement that fits our current setup is given in [11, Theorem 2.1], where convergence takes place in total variation and comes with quantitative bounds on the distance between the k-tuple from the N-particle system and the limiting k-tuple; see also [16].
At an intuitive level, propagation of chaos means that for large N the interacting particle system behaves approximately like a system of i.i.d. particles. This intuition suggests that the large-N asymptotics of the normalized maxima in (1.1) should match the asymptotics of the normalized maxima of the independent copies X^i of the solution of (1.3),

(max_{i≤N} X^i_T − b^N_T) / a^N_T.    (1.4)

Because they are i.i.d., the latter fall within the framework of classical extreme value theory; see e.g. [7,20] for an introduction. This intuition is flawed however, because propagation of chaos only makes statements about a fixed number k of particles, while the maximum max_{i≤N} X^{i,N}_T depends on all the particles. Furthermore, there are lower bounds on how similar (X^{1,N}, ..., X^{k,N}) and (X^1, ..., X^k) can be in general. In a simple Gaussian example, it is shown in [17] that the relative entropy between the two is bounded below by a constant times (k/N)^2. In particular, if k → ∞ and k/N remains bounded away from zero, convergence does not take place. Barriers of this kind have prevented us from deriving statements about normalized maxima as corollaries of standard results on propagation of chaos.
Our main result nonetheless shows, under assumptions, that the normalized maxima of the N-particle systems do behave asymptotically like those of an i.i.d. system. In this sense, one has propagation of chaos of normalized maxima. The following statement is slightly informal; Theorem 2.4 gives the precise version.
Theorem 1.1. Suppose Assumptions 2.1 and 2.3 below are satisfied. Fix T ∈ (0, ∞) and suppose that for some normalizing constants a^N_T, b^N_T the normalized maxima (1.4) of the i.i.d. system converge weakly to a nondegenerate distribution Γ_T on R as N → ∞. Then the normalized maxima (1.1) of the interacting particle systems also converge to Γ_T as N → ∞.
The precise assumptions are discussed in Section 2, along with additional comments, and examples are developed in Section 3. Here we only highlight three points, deferring the details to Sections 2 and 3.
First, a key motivating example and application of Theorem 1.1 comes from a class of models known as rank-based diffusions, which were first studied by [8] in the context of stochastic portfolio theory. In a rank-based model with drift interaction, the N-particle system evolves as

dX^{i,N}_t = (1/N) rank_t(X^{i,N}_t) dt + dW^i_t,  i = 1, ..., N,

where rank_t(X^{i,N}_t) denotes the rank of the ith particle within the population: rank_t(X^{i,N}_t) = k if X^{i,N}_t is the kth smallest particle, with a suitable convention in case of ties. The factor 1/N anticipates a passage to the large-N limit. Rank-based diffusions of this type have been studied extensively and their mean-field asymptotics are well understood. However, the asymptotics of the largest particle, of particular interest in the applied context, were previously unknown. As shown in Example 3.2, our main result is applicable and allows us to fill this gap.
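A minimal simulation sketch of a rank-based system of this type follows, with a general drift function applied to the normalized rank as in Example 3.2 below; the specific choice B(r) = 1 − 2r (low ranks pushed up, high ranks pushed down) and all parameters are hypothetical illustrations of ours.

```python
import math
import random

def simulate_rank_based(N=100, T=1.0, steps=200, seed=0,
                        B=lambda r: 1.0 - 2.0 * r):
    """Euler-Maruyama sketch of a rank-based system
    dX^{i,N}_t = B(rank_t(X^{i,N}_t)/N) dt + dW^i_t.
    The drift B(r) = 1 - 2r is a hypothetical choice."""
    rng = random.Random(seed)
    dt = T / steps
    X = [rng.gauss(0.0, 1.0) for _ in range(N)]
    for _ in range(steps):
        # rank/N is the empirical distribution function evaluated at X^i
        # (ties broken by index order, a simple convention)
        order = sorted(range(N), key=lambda i: X[i])
        rank = [0] * N
        for r, i in enumerate(order, start=1):
            rank[i] = r
        X = [X[i] + B(rank[i] / N) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for i in range(N)]
    return X

X_rank = simulate_rank_based()
```

Note the drift of each particle changes discontinuously whenever two particles cross, which is the feature that makes these systems hard to treat by Lipschitz-based coupling arguments.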
Second, note that Theorem 1.1 only asserts one-dimensional marginal convergence at single time points T. Nonetheless, as discussed in Section 2, in some cases one expects joint marginal convergence of the form

( (max_{i≤N} X^{i,N}_{T_1} − b^N_{T_1}) / a^N_{T_1}, ..., (max_{i≤N} X^{i,N}_{T_n} − b^N_{T_n}) / a^N_{T_n} ) ⇒ Γ_{T_1} ⊗ ... ⊗ Γ_{T_n}

for any T_1 < ... < T_n, where the limit is in the sense of weak convergence toward a product measure with nondegenerate components. No continuous process has finite-dimensional marginal distributions of this form, so this precludes convergence at the level of continuous processes. Third, as part of the hypotheses of Theorem 1.1 we assume that the normalized maxima (1.4) of the i.i.d. system admit a nondegenerate limit law Γ_T. Classical extreme value theory asserts that up to affine transformations, Γ_T must belong to a one-parameter family of extreme value distributions consisting of the Fréchet, Gumbel, and Weibull distributions. It is obviously of interest to characterize Γ_T in terms of the data A, B, C, g, and ν_0. This question is the subject of ongoing work and falls outside the scope of this paper. Nonetheless, in the examples in Section 3, we are able to verify this domain of attraction hypothesis by hand.
Let us mention that the large body of work on the extreme eigenvalue statistics of random matrices is related to our paper in that those eigenvalues can in many cases be described by mean-field interacting diffusions. For example, the eigenvalues of a GUE (Gaussian Unitary Ensemble) random matrix are described by Dyson Brownian motion. However, the largest eigenvalue, suitably normalized, converges in distribution to the Tracy–Widom law [25], which is different from the extreme value distributions that can arise in our framework. Another random matrix model is the Ginibre ensemble [10], whose normalized spectral radius converges to the Gumbel law [21]. Although this is the same limit law that we observe in our examples, the interaction among the eigenvalues of the Ginibre ensemble is not covered by our setup, in essence because we exclude interaction in the diffusion coefficients.
The rest of the paper is organized as follows. First, we finish the introduction with an outline of some of the main steps and ideas of the proof of the main theorem. Then, in Section 2, we give precise statements of our assumptions and results. We also reproduce an argument due to D. Lacker (personal communication) which vastly simplifies the proof under suitable Lipschitz assumptions; see Remark 2.9. Examples and applications are discussed in Section 3. Section 4 collects key lemmas needed for the proof of the main theorem. These lemmas are proved in Section 5. Finally, the main theorem is proved in Section 6. We will frequently use the notation [n] = {1, ..., n} for any n ∈ N = {1, 2, ...}, and R_+ = [0, ∞). We will allow generic constants C to vary from line to line, and occasionally indicate the dependence on parameters by writing C(n), C(p, n), etc.

Outline of the proof of Theorem 1.1.
The remainder of this introduction contains an outline of some of the main steps and ideas of the proof of Theorem 1.1. To simplify the discussion we take A = 1, C = 0, and B(t, x_{[0,t]}, r) = r. We fix T ∈ (0, ∞) and note that the theorem will be proved if we show that for any x ∈ R,

P( max_{i≤N} X^{i,N}_T ≤ x_N ) − P( max_{i≤N} X^i_T ≤ x_N ) → 0,  where x_N = a^N_T x + b^N_T.    (1.5)

Here X^i, i ∈ N, are i.i.d. copies of the solution of (1.3) with driving Brownian motions W^i, and all objects are defined on a filtered probability space (Ω, F, (F_t)_{t≥0}, P).
The first observation, going back at least to [1] and also used by [6,16,11], is that the structure of the particle dynamics (1.2) allows us to construct for each N a (locally) equivalent measure Q^N ∼_loc P under which (X^1, ..., X^N) acquires the law of (X^{1,N}, ..., X^{N,N}). This is accomplished by the Radon–Nikodym density process

Z^N_t = exp( M^N_t − (1/2)⟨M^N⟩_t ),

where the local martingale M^N is given by

M^N_t = Σ_{i=1}^N ∫_0^t ΔB^{i,N}_s dW^i_s,    (1.6)

and where, by overloading notation, ΔB^{i,N}_s denotes the difference between the drift of the ith particle evaluated at the empirical measure µ^N_s and at the limiting law µ_s. We may then re-express the left-hand side of (1.5) as

E[ (Z^N_T − 1) Π_{i=1}^N 1_{{X^i_T ≤ x_N}} ].    (1.7)

The key point is that (1.7) is expressed in terms of the mutually independent processes (X^i, W^i), i ∈ N, while the dependence among the particles in the original N-particle system is captured by the Radon–Nikodym derivative Z^N_T. The proof of the theorem rests on a detailed analysis of how Z^N_T interacts with the indicators in (1.7), ultimately allowing us to "extract enough independence" to show that (1.7) tends to zero in the large-N limit. An analogous strategy of "extracting independence" through the above change of measure was used in [11], although the actual execution of this strategy is very different in our context.
Iterating the SDE satisfied by the stochastic exponential Z^N leads to the formal chaos expansion

Z^N_T = 1 + Σ_{m=1}^∞ ∫_0^T ∫_0^{t_m} ... ∫_0^{t_2} dM^N_{t_1} ... dM^N_{t_m}.    (1.8)

If T is sufficiently small, one can show that a truncated version of this expansion can be substituted for Z^N_T − 1 in (1.7) at the cost of an arbitrarily small error ε > 0. Importantly, although the truncation level m_0 (say) depends on ε, it does not depend on N. We are thus left with showing that each of the remaining m_0 terms tends to zero, that is, for each m ∈ [m_0],

E[ ( ∫_0^T ∫_0^{t_m} ... ∫_0^{t_2} dM^N_{t_1} ... dM^N_{t_m} ) Π_{i=1}^N 1_{{X^i_T ≤ x_N}} ] → 0.    (1.9)

This is done by substituting 1_{{X^i_T ≤ x_N}} = 1 − 1_{{X^i_T > x_N}} and expanding the product, as well as substituting the definition of M^N into the iterated integral and expanding using multilinearity. The result is a sum consisting of all terms of the form

E[ Π_{β=1}^k 1_{{X^{ℓ_β}_T > x_N}} ∫_0^T ∫_0^{t_m} ... ∫_0^{t_2} G^{i_1 j_1}_{t_1} dW^{i_1}_{t_1} ... G^{i_m j_m}_{t_m} dW^{i_m}_{t_m} ],    (1.10)

for distinct indices ℓ_1, ..., ℓ_k ∈ [N] and multiindices (i_1, ..., i_m) ∈ [N]^m and (j_1, ..., j_m) ∈ [N]^m, and where the processes G^{ij} arise when the empirical measure µ^N_t is substituted into the definition of M^N. We are now in a position to sketch the main ways in which we exploit the independence among the processes (X^i, W^i), i ∈ N.
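A one-dimensional toy analogue of such a chaos expansion can be checked numerically. For the deterministic-integrand case M_t = θW_t (our illustration, not the paper's M^N), the stochastic exponential Z_t = exp(θW_t − θ²t/2) expands into iterated Wiener integrals, and the m-fold iterated integral over the simplex equals H_m(W_t, t)/m! in terms of time-scaled probabilists' Hermite polynomials. The sketch below verifies that a truncated expansion reproduces the exponential.

```python
import math

def hermite_terms(w, t, m_max):
    """Time-scaled probabilists' Hermite polynomials H_m(w, t):
    H_0 = 1, H_1 = w, H_{m+1} = w*H_m - m*t*H_{m-1}.
    The m-fold iterated Ito integral of dW over {0 < t_1 < ... < t_m < t}
    equals H_m(W_t, t)/m!."""
    H = [1.0, w]
    for m in range(1, m_max):
        H.append(w * H[m] - m * t * H[m - 1])
    return H[:m_max + 1]

def truncated_exponential(theta, w, t, m0):
    """Chaos expansion of Z_t = exp(theta*W_t - theta^2*t/2) on {W_t = w},
    truncated at level m0: sum over m of theta^m * H_m(w, t)/m!."""
    H = hermite_terms(w, t, m0)
    return sum(theta ** m * H[m] / math.factorial(m) for m in range(m0 + 1))

exact = math.exp(0.5 * 0.3 - 0.5 ** 2 * 1.0 / 2)   # Z_1 on {W_1 = 0.3}, theta = 0.5
approx = truncated_exponential(0.5, 0.3, 1.0, 20)  # truncation level m0 = 20
```

The rapid decay of the terms illustrates why, in this toy case, a truncation level independent of the other parameters suffices for any given error tolerance.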
For each k, there are N^{2m} N^k terms of the form (1.10). Using iterated stochastic integral estimates, along with the independence of the X^i, i ∈ N, and the fact that P(X_T > x_N) = O(1/N) due to the domain of attraction assumption, we show that each of these terms is bounded by a constant times ⌈log N⌉^m N^{−m−k}. This is not enough to deduce (1.9) however, because it only produces the upper bound O(N^m ⌈log N⌉^m), which does not tend to zero with N. Nonetheless, a refined analysis shows that a large number of the terms (1.10) are in fact zero. Very roughly, this happens when there is a small overlap between the indices {ℓ_1, ..., ℓ_k} and {i_1, ..., i_m, j_1, ..., j_m}, in which case the expectation in (1.10) vanishes despite the presence of the indicators. A counting argument then shows that for each k, at most of order N^{m+k−1} terms are nonzero. Using the earlier estimate to control these remaining terms finally yields the bound O(N^{−1} ⌈log N⌉^m) for the left-hand side of (1.9). This does tend to zero as N → ∞ and allows us to complete the proof.
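The interplay of the two bounds quoted above can be illustrated numerically: the naive bound of order N^m ⌈log N⌉^m diverges, while the refined bound of order ⌈log N⌉^m / N vanishes. The choice m = 2 and the grid of N values below are arbitrary.

```python
import math

def naive_bound(N, m):
    """Naive bound of order N^m * ceil(log N)^m: diverges as N grows."""
    return N ** m * math.ceil(math.log(N)) ** m

def refined_bound(N, m):
    """Refined bound of order ceil(log N)^m / N, obtained after discarding
    the terms whose expectation vanishes: tends to zero."""
    return math.ceil(math.log(N)) ** m / N

Ns = [10 ** 2, 10 ** 4, 10 ** 6]
naive = [naive_bound(N, 2) for N in Ns]      # increasing
refined = [refined_bound(N, 2) for N in Ns]  # decreasing toward zero
```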
Counting the nonzero terms (1.10) and bounding their size constitute the heart of the proof. The key arguments involved are given as lemmas in Section 4. However, other parts of the proof also require substantial technical effort. In particular, work is required to (i) reduce from the case of general coefficients A, B, C to the simpler ones discussed above; (ii) obtain sufficiently strong iterated integral bounds to truncate the chaos expansion independently of N when T is small; and (iii) remove the smallness requirement on T. This leads to added complexity and explains why the full proof of Theorem 1.1 is rather long and technical.

Assumptions and main results
To give a precise description of our setup, we first introduce regularity and growth assumptions on the data A, B, C, g.
Assumption 2.1. The coefficients A(t, x_{[0,t]}), B(t, x_{[0,t]}, r), C(t, x_{[0,t]}) and the interaction function g(t, x_{[0,t]}, y_{[0,t]}) are measurable in their respective arguments. They satisfy the following conditions:
• A and C are uniformly bounded,
• for every t ∈ R_+ and x ∈ C(R_+), the function r ↦ B(t, x_{[0,t]}, r) is twice continuously differentiable, and its first and second derivatives are bounded uniformly in (t, x).
Remark 2.2. Note that r ↦ B(t, x_{[0,t]}, r) itself need not be bounded, only its first two derivatives. We thus cover examples with linear growth. Moreover, if the interaction function g is uniformly bounded, the growth properties of r ↦ B(t, x_{[0,t]}, r) become irrelevant.
By imposing further conditions we could appeal to known results on well-posedness of McKean–Vlasov equations to assert that (1.3) has a solution. Rather than doing this, we will assume existence directly. (Uniqueness is not actually required, so we do not assume it.)

Assumption 2.3. Fix a probability measure ν_0 on R and assume that the McKean–Vlasov equation (1.3) admits a weak solution (X, W) with X_0 ∼ ν_0. Construct (for instance as a countable product) a filtered probability space (Ω, F, (F_t)_{t≥0}, P) carrying a sequence (X^i, W^i), i ∈ N, of independent copies of (X, W). Then, assume that there is a continuous function K(t) such that the moment bounds (2.1)–(2.2) hold for all p ∈ N, t ∈ R_+, N ∈ N, and i, j.

Sufficient conditions for the moment bounds (2.1)–(2.2), along with further discussion, are given in Remark 2.7 below.
Let Assumptions 2.1 and 2.3 be in force. For each N ∈ N we now use the processes (X^i, W^i) to construct the N-particle systems by changing the probability measure. First define the N-particle empirical measure

µ^N_t = (1/N) Σ_{i=1}^N δ_{X^i_{[0,t]}}.    (2.3)

Next, define the (candidate) density process

Z^N_t = exp( M^N_t − (1/2)⟨M^N⟩_t ),

where

M^N_t = Σ_{i=1}^N ∫_0^t ΔB^{i,N}_s dW^i_s    (2.4)

and

ΔB^{i,N}_s = B(s, X^i_{[0,s]}, ∫ g(s, X^i_{[0,s]}, y_{[0,s]}) µ^N_s(dy)) − B(s, X^i_{[0,s]}, ∫ g(s, X^i_{[0,s]}, y_{[0,s]}) µ_s(dy)).    (2.5)

Assumptions 2.1 and 2.3 imply that E[∫_0^t (ΔB^{i,N}_s)^2 ds] < ∞ for all i and t, which ensures that M^N is a well-defined martingale and hence Z^N a positive local martingale. We claim that E[Z^N_T] = 1 for all T ∈ (0, ∞), so that Z^N is a true martingale. To see this, note that Lemma 4.3 implies that for any s < t ≤ T with t − s small enough, the chaos expansion of Z^N_t / Z^N_s converges in L^2. Moreover, [3, Proposition 1] together with Assumption 2.3 imply that each iterated integral has expectation zero. As a result, E[Z^N_t / Z^N_s] = 1 for all such s, t, and this implies E[Z^N_T] = 1 as claimed.
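The martingale property E[Z^N_T] = 1 can be sanity-checked by Monte Carlo in the simplest scalar case M_t = θW_t, a toy stand-in (ours) for the actual M^N; the values of θ, T and the sample size are arbitrary.

```python
import math
import random

def mean_stochastic_exponential(theta=0.5, T=1.0, n_samples=200_000, seed=1):
    """Monte Carlo estimate of E[Z_T] for the stochastic exponential
    Z_T = exp(theta*W_T - theta^2*T/2) of the scalar martingale
    M_t = theta*W_t.  The exact value is 1."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        w = rng.gauss(0.0, math.sqrt(T))  # W_T ~ N(0, T)
        acc += math.exp(theta * w - theta ** 2 * T / 2)
    return acc / n_samples

z_mean = mean_stochastic_exponential()
```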
Since Z^N is a true martingale, it induces a locally equivalent probability measure Q^N ∼_loc P under which the processes defined by

W^{i,N}_t = W^i_t − ∫_0^t ΔB^{i,N}_s ds,  i = 1, ..., N,

are mutually independent standard Brownian motions. Thus under Q^N we find that X^1, ..., X^N follow the N-particle dynamics (1.2):

dX^i_t = A(t, X^i_{[0,t]}) [ B(t, X^i_{[0,t]}, ∫ g(t, X^i_{[0,t]}, y_{[0,t]}) µ^N_t(dy)) dt + dW^{i,N}_t ] + C(t, X^i_{[0,t]}) dt.    (2.6)

The following is the precise formulation of our main result.
Theorem 2.4. Suppose Assumptions 2.1 and 2.3 are satisfied and consider the laws Q^N constructed above. Fix T ∈ (0, ∞) and suppose that for some normalizing constants a^N_T, b^N_T the normalized maxima of the i.i.d. system converge weakly to a nondegenerate distribution function Γ_T on R:

lim_{N→∞} P( (max_{i≤N} X^i_T − b^N_T) / a^N_T ≤ x ) = Γ_T(x),  x ∈ R.

Then the normalized maxima of the interacting particle systems also converge to Γ_T:

lim_{N→∞} Q^N( (max_{i≤N} X^i_T − b^N_T) / a^N_T ≤ x ) = Γ_T(x),  x ∈ R.    (2.7)

Classical extreme value theory asserts that up to affine transformations, Γ_T must belong to a one-parameter family of extreme value distributions consisting of the Fréchet, Gumbel, and Weibull distributions. Our assumptions tend to preclude the heavy-tailed behavior that is characteristic of the Fréchet class.
Proposition 2.5. Let the assumptions of Theorem 2.4 be satisfied. Assume in addition that all moments of ν_0 are finite and that one has the linear growth bound |B(t, x_{[0,t]}, r)| ≤ c(1 + x*_t + |r|), where the constant c may depend on T and we use the notation x*_t = sup_{s∈[0,t]} |x(s)|. Then Γ_T must belong to the Gumbel or Weibull family.

Proof. We allow c to change from one occurrence to the next. The assumptions imply a linear growth bound on the drift of X, which together with the uniform boundedness of A and C yields a pathwise bound on X*_t in terms of a nondecreasing process J_T collecting the initial condition, the stochastic integral, and the accumulated drift. Pathwise application of Gronwall's inequality then yields X*_T ≤ e^{cT} J_T. Because all moments of ν_0 are finite, A is uniformly bounded, and thanks to (2.1) of Assumption 2.3, all moments of J_T are finite. (For the stochastic integral term this uses the BDG inequalities.) Then so are the moments of X*_T, and then also of X_T. However, if X_T were in the Fréchet domain of attraction it would have a regularly varying tail (see [7, Theorem 1.2.1]), implying that all sufficiently high moments are infinite. This excludes the Fréchet family.
Remark 2.6. Weak convergence is equivalent to convergence for all x ∈ R where Γ_T is continuous. However, since all extreme value distributions are continuous, restricting to continuity points is redundant. Theorem 2.4 asserts one-dimensional marginal convergence at single time points T. We do not prove full finite-dimensional marginal convergence in this paper, but let us nonetheless make the following observation. In certain examples, the random vectors (X_{T_1}, ..., X_{T_n}) with T_1 < ... < T_n exhibit asymptotic independence. This means that each X_{T_α}, α ∈ [n], belongs to the maximum domain of attraction of some extreme value distribution Γ_{T_α} with normalizing constants a^N_{T_α}, b^N_{T_α}, and that the vector of normalized maxima converges to a product measure:

( (max_{i≤N} X^i_{T_1} − b^N_{T_1}) / a^N_{T_1}, ..., (max_{i≤N} X^i_{T_n} − b^N_{T_n}) / a^N_{T_n} ) ⇒ Γ_{T_1} ⊗ ... ⊗ Γ_{T_n}.    (2.8)

In particular, this is known to hold for multivariate Gaussian distributions with correlations in (−1, 1); see [20, Corollary 5.28]. Thus if X is a Gaussian process with non-trivial correlation function, then all finite-dimensional marginal distributions of the centered and scaled processes max_{i≤N} (X^i_t − b^N_t)/a^N_t converge as N → ∞ to product distributions with nondegenerate components (specifically, affine transformations of the Gumbel distribution).
No continuous process has finite-dimensional marginal distributions of this form, so this precludes convergence at the level of continuous processes. The Gaussian case is discussed further in Example 3.1. Whenever the i.i.d. particles X^i, i ∈ N, satisfy the asymptotic independence property (2.8), it is natural to expect that the same is true for the interacting N-particle systems, although proving this is outside the scope of this paper.
We end this section with a few additional remarks.

Remark 2.7 (on Assumption 2.3).
There is a large literature on well-posedness of McKean–Vlasov equations, providing a range of conditions under which a solution to (1.3) exists; see e.g. [19,24,9,16]. Next, the moment bound (2.1) holds if, for instance, g(t, X^i_{[0,t]}, X^j_{[0,t]}) is sub-Gaussian with a uniformly bounded variance proxy (see e.g. [22] for a review of sub-Gaussianity). One can then also verify (2.2) by noticing that the 2p-th moment of the centered empirical average (2.9) can be controlled by that of its leading term plus a term proportional to p! 3^p K(r)^p / N^{2p}. Conditionally on X^i_{[0,t]}, the N − 1 summands in (2.9) are pairwise independent and identically distributed with zero mean. In the sub-Gaussian case, these N − 1 summands are also sub-Gaussian, so their average is sub-Gaussian with an O(1/N) variance proxy, and the desired bound follows from [22, Lemma 1.4]. In the case of bounded summands, we instead apply Hoeffding's inequality [22, Theorem 1.9] and then again [22, Lemma 1.4].
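The O(1/N) variance scaling of the centered empirical average of bounded summands, which drives the Hoeffding-based verification of (2.2), can be checked by simulation; Uniform(−1, 1) summands below are a hypothetical stand-in for the centered g terms.

```python
import random

def variance_of_average(N=500, trials=4000, seed=2):
    """Monte Carlo estimate of the variance of the average of N bounded
    i.i.d. zero-mean summands (Uniform(-1,1), which has variance 1/3).
    The exact variance of the average is 1/(3N), i.e. O(1/N)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(N))
        means.append(s / N)
    mu = sum(means) / trials
    return sum((m - mu) ** 2 for m in means) / (trials - 1)

v = variance_of_average()  # should be close to 1/1500
```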
Remark 2.8 (Non-i.i.d. initial conditions). Standard propagation of chaos is frequently formulated under weaker assumptions on the initial conditions of the N-particle systems than being i.i.d. A common assumption is that (X^{1,N}_0, ..., X^{k,N}_0) converges weakly to (X^1_0, ..., X^k_0) as N → ∞ for each k ∈ N, where X^i_0, i ∈ N, is an i.i.d. sequence. Although we have not succeeded in proving our main result under this weaker assumption on the initial conditions, it is nonetheless possible to move slightly beyond the i.i.d. setting through an additional change of measure. Specifically, let ν^N_0 (a probability measure on R^N) be the desired joint initial law of the N-particle system, and assume it is absolutely continuous with respect to the N-fold product measure ν_0^{⊗N}, where as above ν_0 is the initial law of the limiting McKean–Vlasov SDE. We make the total variation type stability assumption that the total variation distance between ν^N_0 and ν_0^{⊗N} tends to zero as N → ∞. Letting Q^N be defined as before, we now obtain a new measure by using

(dν^N_0 / dν_0^{⊗N})(X^1_0, ..., X^N_0) Z^N_T

as Radon–Nikodym derivative. This affects the initial law, but not the form of the particle dynamics. Then as N → ∞, the probabilities of the normalized-maximum event under the new measure and under Q^N differ by a quantity tending to zero. This shows that the large-N asymptotics of the normalized maxima of the N-particle system are unaffected when the initial distribution is ν^N_0 instead of ν_0^{⊗N}.
Remark 2.9 (A coupling argument). D. Lacker has pointed out to us that a simple coupling argument yields our propagation of chaos result in the presence of constant volatility and Lipschitz drift. Although this does not lead to a proof of our main result (in particular, our key example of rank-based models is excluded due to discontinuous drifts; see Example 3.2), it is worth recording the argument here. Assume that the drift function B satisfies the Lipschitz condition

|B(t, x, µ) − B(t, y, ν)| ≤ C ( |x − y| + W_p(µ, ν) )

for some constant C, all x, y ∈ R, and all probability measures µ, ν with finite p-th moment.
Here W_p(µ, ν) is the p-Wasserstein distance between µ and ν for some fixed p ∈ [1, ∞). We let the N-particle system be given as the unique strong solution of the system of SDEs

dX^{i,N}_t = B(t, X^{i,N}_t, µ^N_t) dt + dW^i_t,  X^{i,N}_0 = ξ^i,

where W^i, i ∈ N, is a sequence of independent standard Brownian motions and ξ^i, i ∈ N, is a sequence of p-integrable i.i.d. initial conditions. For each i, let X^i be the unique strong solution of the McKean–Vlasov SDE

dX^i_t = B(t, X^i_t, µ_t) dt + dW^i_t,  X^i_0 = ξ^i,  µ_t = Law(X^i_t),

using the same Brownian motion and initial condition as for the N-particle system. We then obtain

|X^{i,N}_t − X^i_t| ≤ C ∫_0^t ( |X^{i,N}_s − X^i_s| + W_p(µ^N_s, µ_s) ) ds,

and a Gronwall-type argument gives a bound on E[|X^{i,N}_T − X^i_T|] by a constant multiple of ∫_0^T E[W_p(µ^N_s, µ_s)] ds. Consequently, provided that

∫_0^T E[ W_p(µ^N_s, µ_s) ] ds / a^N_T → 0 as N → ∞,

our propagation of chaos result follows. This happens, for instance, in the Gaussian case where a^N_T behaves like 1/√(log N) (see Example 3.1 below) and E[W_p(µ^N_s, µ_s)] behaves like N^{−γ} for some γ > 0.
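The rate comparison in the Gaussian case can be checked numerically: √(log N)·N^{−γ} tends to zero for any γ > 0. The value γ = 0.25 below is an arbitrary illustrative choice.

```python
import math

def ratio(N, gamma=0.25):
    """sqrt(log N) * N^(-gamma): a Wasserstein rate N^(-gamma) divided by
    a normalizing constant a^N_T of order 1/sqrt(log N).  Tends to zero,
    since any power of N beats any power of log N."""
    return math.sqrt(math.log(N)) * N ** (-gamma)

values = [ratio(10 ** k) for k in range(1, 8)]  # N = 10, 100, ..., 10^7
```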

Examples
We discuss two examples that illustrate the main result.
Example 3.1 (Gaussian particles). The following Gaussian particle system has been studied in a number of contexts, such as models for monetary reserves of banks [4,2], and default intensities in large interbank networks [12, Example 2.2]. The N-particle system evolves according to the multivariate Ornstein–Uhlenbeck process

dX^{i,N}_t = κ ( (1/N) Σ_{j=1}^N X^{j,N}_t − X^{i,N}_t ) dt + σ dW^i_t,  X^{i,N}_0 ∼ N(m_0, σ_0^2) i.i.d.

Here κ, m_0 ∈ R and σ, σ_0 ∈ (0, ∞) are parameters. In our setting this example arises by taking A = σ, C = 0, g(t, x_{[0,t]}, y_{[0,t]}) = y(t), and B(t, x_{[0,t]}, r) = (κ/σ)(r − x(t)). The McKean–Vlasov limit is dX_t = κ(E[X_t] − X_t) dt + σ dW_t. Taking expectations one obtains E[X_t] = E[X_0] = m_0 for all t ∈ R_+, showing that X is an Ornstein–Uhlenbeck process with constant mean m_0 and time-t variance given by

v_t = σ_0^2 e^{−2κt} + (σ^2 / (2κ)) (1 − e^{−2κt}).

Letting X^i, i ∈ N, be independent copies of X, we see that g(t, X^i_{[0,t]}, X^j_{[0,t]}) = X^j_t is Gaussian for all i, j. Thus, in view of Remark 2.7, Assumption 2.3 is satisfied. By normalizing, we see that X_T also belongs to the maximum domain of attraction of the Gumbel distribution Γ(x) = exp(−e^{−x}) for each T, with normalizing constants

a^N_T = √(v_T) / √(2 log N),  b^N_T = m_0 + √(v_T) ( √(2 log N) − (log log N + log 4π) / (2 √(2 log N)) ).

This shows that the hypotheses of Theorem 2.4 are satisfied, and we deduce that the same asymptotics hold for the N-particle systems,

Q^N( (max_{i≤N} X^i_T − b^N_T) / a^N_T ≤ x ) → Γ(x).

Lastly, X is a Gaussian process with correlation function

Corr(X_s, X_t) = √( (α − 1 + e^{2κs}) / (α − 1 + e^{2κt}) ),  s ≤ t,

where α = 2κσ_0^2/σ^2. Thus Corr(X_s, X_t) ∈ (0, 1) for all s ≠ t. The discussion in Section 2 implies that the finite-dimensional marginal distributions of X exhibit asymptotic independence, and that (2.8) holds for all n ∈ N and T_1 < ... < T_n. In particular, there is no functional convergence in the space of continuous processes.
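The classical Gumbel normalizing constants for standard normal maxima can be checked numerically: with b_N as below, Φ(b_N)^N should approach the Gumbel value at x = 0, namely exp(−e^0) = e^{−1}. The convergence is slow (of order 1/log N), so the agreement is only rough even for N = 10^6.

```python
import math

def Phi(x):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gumbel_constants(N):
    """Classical normalizing constants for the maximum of N i.i.d. standard
    normals: a_N = 1/sqrt(2 log N) and
    b_N = sqrt(2 log N) - (log log N + log 4*pi)/(2 sqrt(2 log N)).
    For X_T ~ N(m0, v_T) one rescales by sqrt(v_T) and shifts by m0."""
    L = math.sqrt(2.0 * math.log(N))
    b = L - (math.log(math.log(N)) + math.log(4.0 * math.pi)) / (2.0 * L)
    return 1.0 / L, b

N = 10 ** 6
a_N, b_N = gumbel_constants(N)
p = Phi(b_N) ** N  # P(max of N i.i.d. standard normals <= b_N)
```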
Example 3.2 (Rank-based diffusions). Consider the N-particle system evolving according to

dX^{i,N}_t = B( F^N_t(X^{i,N}_t) ) dt + dW^i_t,  i = 1, ..., N,

where B(r) is a twice continuously differentiable function on [0, 1] and

F^N_t(x) = (1/N) Σ_{j=1}^N 1_{{X^{j,N}_t ≤ x}}

is the empirical distribution function. Such systems are called rank-based because the drift (and in more general formulations also the diffusion) of each particle depends on its rank within the population. Indeed, modulo tie-breaking, F^N_t(X^{i,N}_t) equals 1/N if X^{i,N}_t is the smallest particle, 2/N if it is the second smallest, and so on. Rank-based systems have been studied extensively and play an important role in stochastic portfolio theory; see e.g. [8,23,13,14,15]. They are challenging to analyze in part because the drift is discontinuous as a function of the current state and the empirical measure (with the Wasserstein metric W_p for any p ≥ 1), making e.g. the argument in Remark 2.9 inapplicable.
The above system fits into our setup by taking A = 1, C = 0, B(t, x_{[0,t]}, r) = B(r), and g(t, x_{[0,t]}, y_{[0,t]}) = 1_{{y(t) ≤ x(t)}}, so that integrating g against the empirical measure recovers F^N_t. The above setup is well-studied, and both the N-particle system and the McKean–Vlasov equation are well-posed [23,13]. Since the interaction function g and the drift coefficient B are both bounded, Assumption 2.3 is readily seen to be satisfied.
General criteria for verifying the domain of attraction assumption on X are not available. However, if X is stationary, more can be said. It is known [23,13] that the distribution function F_t(x) = P(X_t ≤ x) satisfies the PDE

∂_t F_t(x) = (1/2) ∂_x^2 F_t(x) − ∂_x B̄(F_t(x)),    (3.1)

where B̄(u) = ∫_0^u B(r) dr. Let us assume that B̄(u) > 0 for all u ∈ (0, 1), B̄(1) = 0, and B(1) ≠ 0. In this case there is a solution F(x) to the stationary equation

(1/2) F''(x) = (d/dx) B̄(F(x))

which is a distribution function. By using F as initial condition for X_0, the solution of the McKean–Vlasov equation has constant marginal law, P(X_t ≤ x) = F(x) for all t and x. By integrating (3.1) once and using that F'(x) and B̄(F(x)) vanish as x → −∞, we obtain

F'(x) = 2 B̄(F(x)).    (3.2)

(Here it becomes clear why B̄ ≥ 0 and B̄(1) = 0 are needed, as F' is a probability density.) We now apply the von Mises condition [7, Theorem 1.1.8], which states that F belongs to the Gumbel domain of attraction if

lim_{x→x̂} (1 − F(x)) F''(x) / F'(x)^2 = −1,

where x̂ is the upper endpoint of F. The mean value theorem yields B̄(F(x)) = B̄(F(x)) − B̄(1) = −B(r*)(1 − F(x)) for some r* ∈ (F(x), 1). Next, (3.2) implies that F'' = 2 B(F) F' = 4 B(F) B̄(F). Thus,

(1 − F(x)) F''(x) / F'(x)^2 = (1 − F(x)) B(F(x)) / B̄(F(x)) = −B(F(x)) / B(r*) → −1

as x → x̂. This confirms that the hypotheses of Theorem 2.4 are satisfied. We deduce that Gumbel asymptotics hold for the N-particle systems.
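As a concrete check of the stationary analysis, consider the hypothetical choice B(r) = 1 − 2r (ours, for illustration). Then B̄(u) = u(1 − u) satisfies B̄ > 0 on (0, 1) and B̄(1) = 0, the stationary relation F' = 2B̄(F) is solved by the logistic distribution F(x) = 1/(1 + e^{−2x}), and the von Mises ratio indeed tends to −1.

```python
import math

def B(r):
    """Hypothetical rank drift: positive for low ranks, negative for high."""
    return 1.0 - 2.0 * r

def Bbar(u):
    """Bbar(u) = integral_0^u B(r) dr = u(1-u); Bbar > 0 on (0,1), Bbar(1) = 0."""
    return u * (1.0 - u)

def F(x):
    """Logistic CDF, which solves the stationary relation F' = 2*Bbar(F)."""
    return 1.0 / (1.0 + math.exp(-2.0 * x))

def von_mises_ratio(x):
    """(1 - F) F'' / (F')^2 with F' = 2*Bbar(F) and F'' = 2*B(F)*F'.
    Approaches -1 as x grows, confirming the Gumbel domain of attraction."""
    f = F(x)
    Fp = 2.0 * Bbar(f)
    Fpp = 2.0 * B(f) * Fp
    return (1.0 - f) * Fpp / Fp ** 2
```

Algebraically the ratio collapses to (1 − 2F(x))/F(x), which makes the limit −1 transparent in this case.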

Key lemmas
As discussed in Section 1.1, the proof of Theorem 2.4 relies on counting the nonzero terms of the form (1.10) and bounding their size. This was done under a smallness assumption on T which allows us to truncate the chaos expansion (1.8) at a finite level. In order to perform this truncation without any smallness assumption on T, we have to partition the interval (0, T] into a sufficiently large number n of subintervals. Doing so leads to expressions analogous to (1.10) but more complex, and it is those expressions that we need to control. Lemmas 4.1 and 4.2 control the number of nonzero expressions. Lemmas 4.3 and 4.5 provide tail bounds on iterated stochastic integrals which are used to bound the size of the nonzero expressions and to control the error that we commit when truncating the chaos expansions, among other things. The proofs of the lemmas are given in Section 5.
We work with the notation and assumptions of Section 2. In particular, Assumptions 2.1 and 2.3 are in force. We also use the notation F^K_T for the σ-algebra generated by (X^i, W^i), i ∈ K, for subsets K ⊂ [N], and write L for the space of all progressively measurable processes Y with locally integrable moments. We fix a family of progressively measurable processes G^{ij} = (G^{ij}_t)_{t≥0}, i, j ∈ [N], and introduce the iterated integral notation

I^N_{i,j}(s, t) = ∫_s^t ∫_s^{t_k} ... ∫_s^{t_2} G^{i_1 j_1}_{t_1} dW^{i_1}_{t_1} ... G^{i_k j_k}_{t_k} dW^{i_k}_{t_k}    (4.1)

for any k ∈ N and any multiindices i = (i_1, ..., i_k) ∈ [N]^k and j = (j_1, ..., j_k) ∈ [N]^k. Our first key lemma is the following, where later on the random variable Ψ will be instantiated as products of indicators as in (1.10).

Lemma 4.1 (criteria for zero expectation). Assume that (4.2) holds for all indices under consideration. Finally, let K ⊂ [N] and consider a bounded F^K_T-measurable random variable Ψ. Assume at least one of the following conditions is satisfied: (i) condition (4.3) holds, where {j_{β,1}, ..., j_{β,ℓ_0−1}} is regarded as the empty set when ℓ_0 = 1, or (ii) the second membership condition holds. Then E[Ψ I^N_{i,j}(s, t)] = 0.

The criteria (i) and (ii) in Lemma 4.1 for zero expectation are of a combinatorial nature involving index set membership. The following lemma, Lemma 4.2, counts the number of ways in which these conditions can fail, thereby bounding the number of nonzero terms.

We next develop bounds on iterated stochastic integrals. The following lemma will allow us to truncate the chaos expansions of the ratios Z^N_t/Z^N_s at levels that do not need to increase with N in order to comply with given error tolerances. Note that the lemma gives an upper bound that is summable in m only if t − s is sufficiently small. This is the reason we are forced to partition [0, T] into subintervals when proving Theorem 2.4 without any smallness assumption on T.

Lemma 4.3 (first iterated integral L^p estimate). Let I^N_m(s, t) denote the m-fold iterated integral

I^N_m(s, t) = ∫_s^t ∫_s^{t_m} ... ∫_s^{t_2} dM^N_{t_1} ... dM^N_{t_m},

where M^N is defined in (2.4) and it is understood that I^N_0(s, t) = 1. Then, for any N, m, p ∈ N, any T ∈ (0, ∞), and all s, t ∈ [0, T] we have an L^{2p} bound on I^N_m(s, t) that is summable in m when t − s is sufficiently small, where the constant C(T) only depends on T and the bounds from Assumptions 2.1–2.3.
Remark 4.4. Note that we only consider L^{2p} norms for positive integers p, which is all that is needed later on. This is why Assumption 2.3 only involves even integer moments.
The proof of Lemma 4.3 relies on a sharp iterated integral estimate (4.5), valid for any continuous local martingale M and any p ∈ [1, ∞), which follows from [3, Theorem 1] on noting that 1 + 1 + 1/(2p) < 3 for any such p; here we write M_{s,t} = M_t − M_s for brevity. While (4.5) is instrumental for proving Lemma 4.3, it cannot be used to bound the iterated integrals appearing in (4.4), which involve several different local martingales. In order to control the nonzero terms of the form (4.4) we will instead use a weaker estimate obtained by repeated application of the BDG and Hölder inequalities. Fortunately this is sufficient thanks to the sharp control on the number of nonzero terms afforded by Lemmas 4.1 and 4.2. The following general estimate for iterated stochastic integrals involving several continuous local martingales serves this purpose, and it is also used in the proof of Lemma 4.1 as well as to control linearization errors when reducing from general drift coefficients B to linear ones in the proof of the main result.

Lemma 4.5 (second iterated integral L^p estimate). For any set of k ∈ N continuous local martingales M^1, ..., M^k and any p ∈ (1, ∞) we have the estimate stated in the display above.

We end this section with an algebraic estimate which will allow us to combine Lemma 4.2 and Lemma 4.5 to show that (1.7) indeed tends to zero as N → ∞.

Lemma 4.6. For any C ∈ (1, ∞) and N, S ∈ N, one has an upper bound of the form

(S + 2)(S + 1)^{2(S+1)} e^{Ce} (Ce)^{S+1}.

Proofs of the key lemmas
In this section we prove the lemmas presented in Section 4. We start with the proof of Lemma 4.5 because it is used in the proof of Lemma 4.1.

Proof of Lemma 4.5
We prove the lemma by induction. The base case k = 1 follows from the sharp BDG inequality (7) in [3] and Hölder's inequality. For the induction step, we assume that the inequality holds for any p > 1 with k replaced by k − 1. Applying the sharp BDG inequality, Hölder's inequality, Doob's maximal inequality (e.g. Theorem 5.1.3 in [5] with p replaced by 2p ≥ 2, so that q ≤ 2), and finally the induction hypothesis yields the claimed estimate.

Proof of Lemma 4.1
We will need the following two auxiliary lemmas on conditioning, the proofs of which are trivial modifications of the proof of Lemma 2.1.4 in [18].
Lemma 5.1. For any Brownian motion W, two processes a, b ∈ L, and a σ-algebra G such that a(s) and W(s) are G-measurable for s ≤ t, one has the conditioning identity displayed above.

Lemma 5.2. For any Brownian motion W, a process a ∈ L, and a σ-algebra G such that W is independent of G, one has the corresponding vanishing conditional expectation.

Notice that these lemmas can be applied with a and b being either the processes G^{ij}, which belong to L by definition, or the iterated integrals I^N_{i,j}(s, t) for i = (i_1, ..., i_k) ∈ [N]^k and j = (j_1, ..., j_k) ∈ [N]^k, which also belong to L. (Recall that these iterated integrals are defined in (4.1).) The latter can be seen by applying Lemma 4.5 with suitable choices of the local martingales for all ℓ ∈ [k] and then Hölder's inequality.
Assume now that condition (i) is satisfied. Let β ∈ [n] be the largest index such that (4.3) holds for some ℓ_0 ∈ [k_β], and then let ℓ_0 be the smallest index for which this happens. Now define the index set V accordingly. Maximality of β implies that i_{α,ℓ} ∈ V for all α ≥ β + 1 and all ℓ ∈ [k_α]. Moreover, by definition of V we have j_{α,ℓ} ∈ V for all α ≥ β + 1 and all ℓ ∈ [k_α]. Thus every index appearing in i_α or j_α for α ≥ β + 1 belongs to V. As a result, it remains to show (5.1), and this will rely on repeated application of Lemma 5.1. Note that j_{β,ℓ} ∈ V for all ℓ ≤ ℓ_0 − 1 by definition of V. Moreover, minimality of ℓ_0 implies that i_{β,ℓ} ∈ V for all ℓ ≤ ℓ_0 − 1. For ℓ in this range, starting with ℓ = 1, we may therefore apply Lemma 5.1 iteratively. The right-hand side of the last application is zero. Indeed, the relevant index lies outside V by (4.3). We therefore deduce from Lemma 5.2 that the conditional expectation vanishes. This yields (5.1) as required.
Next, assume that condition (ii) is satisfied. In addition, we may assume that condition (i) does not hold, since otherwise we would fall into the case just treated. We then define and observe that i_{α,ℓ} ∈ V for all α ∈ [n] and all ℓ ∈ [k_α] (since condition (i) does not hold), and that j_{α,ℓ} ∈ V for all α ∈ [n] and all ℓ ∈ [k_α] except if (α, ℓ) = (1, k_1) (by definition of V and since (ii) holds). In particular, The same iterative application of Lemma 5.1 as before, but now with G = F^V_T and using that W The conditional expectation on the right-hand side is equal to zero for all t_{k_1} ≤ T, thanks to (4.2) and the fact that i_{1,k_1} ∈ V and j_{1,k_1} ∉ V. This completes the proof of the lemma.

Proof of Lemma 4.3
For m = 0 the inequality holds trivially, so we may assume that m ≥ 1. Applying (4.5) with Next, Hölder's inequality and the fact that the distribution of ∆B^{i,N} is the same for all i, since our system is exchangeable, give Letting C be the upper bound on the derivative of r → B(t, x_{[0,t]}, r) afforded by Assumption 2.3, we obtain Using this in (5.3) yields the desired inequality.

Proof of Lemma 4.6
We recall the identity By the binomial theorem and Leibniz's rule for the derivative of a product, we have the estimate Combining (5.5) and (5.6), plugging in x = (C/N)^{1−1/log N}, noting that this value of x is bounded above by Ce/N since C > 1, and finally using that 1 + Ce/N ≤ exp(Ce/N) and that (eC)^{S−i+1} < (eC)^{S+1} since eC > e > 1, we obtain the desired inequality.
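The intermediate bound x ≤ Ce/N for x = (C/N)^{1−1/log N} can be seen as follows (log denotes the natural logarithm):

```latex
\Big(\tfrac{C}{N}\Big)^{1-\frac{1}{\log N}}
= \frac{C}{N}\,\Big(\frac{N}{C}\Big)^{\frac{1}{\log N}}
= \frac{C}{N}\,\exp\Big(\frac{\log N - \log C}{\log N}\Big)
= \frac{Ce}{N}\, e^{-\log C/\log N}
\;\le\; \frac{Ce}{N},
```

since log C > 0 when C > 1, so the last exponential factor is at most one.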
6 Proof of Theorem 2.4

We now prove Theorem 2.4. The setup of Section 2 will be used. In particular, the objects µ^N, M^N, ∆B^{i,N} in (2.3)-(2.5), as well as the density process Z^N = exp(M^N − ½⟨M^N⟩), will be referred to freely. Assumptions 2.1 and 2.3 are in force. The induced measure Q^N, the normalizing constants a^N_T, b^N_T, and the limiting distribution function Γ_T are as in the statement of the theorem. The time point T is fixed throughout.
We must prove (2.7). It suffices to do this for x ∈ R such that Γ_T(x) > 0. Indeed, suppose this has been done and consider x such that Γ_T(x) = 0. Because all extreme value distributions are continuous, for any ε > 0 there is x′ > x such that 0 < Γ_T(x′) < ε, and thus Since ε > 0 was arbitrary, the left-hand side converges to Γ_T(x) = 0 as N → ∞. We thus pick x such that Γ_T(x) > 0 and set out to prove that as N → ∞, where for brevity we introduce the notation The proof is divided into several steps.
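Written out, the monotonicity argument reads (with the probability as in (2.7)):

```latex
\limsup_{N\to\infty} \mathbb{P}\Big(\max_{i\le N} X^{i,N}_T \le a^N_T x + b^N_T\Big)
\;\le\; \lim_{N\to\infty} \mathbb{P}\Big(\max_{i\le N} X^{i,N}_T \le a^N_T x' + b^N_T\Big)
= \Gamma_T(x') < \varepsilon,
```

using x < x′ and the fact that (2.7) has already been established at x′, since Γ_T(x′) > 0.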
Step 1: partitioning the time interval. Chaos expansions of Z^N are at the core of the proof, and to get sufficient control on the convergence of these expansions we partition the interval (0, T] into n subintervals (T_{α−1}, T_α], α ∈ [n], of equal length T_α − T_{α−1} = T/n. We choose n large enough that C(T) T/n < 1/2, where C(T) is the constant in Lemma 4.3, and then keep n fixed for the remainder of the proof. We now observe the identity where δ_{αβ} is the Kronecker delta, that is, δ_{αβ} = 1 if α = β and δ_{αβ} = 0 otherwise. This yields where To prove the theorem it suffices to show that A^N_α → 0 as N → ∞ for each α ∈ [n]. We thus fix any such α and set out to prove that A^N_α → 0.
Step 2: controlling the tails of the chaos expansions uniformly in N. Let ε > 0 be arbitrary. We will show by induction that there are positive integers m_1, m_2, . . ., m_α, not depending on N, such that for γ = 1, . . ., α + 1 we have with the convention that an empty product is equal to one. The base case γ = 1 holds trivially because the right-hand side is then just |A^N_α|. Suppose now that for some γ ∈ [α] we have determined positive integers m_1, . . ., m_{γ−1} such that (6.1) holds. We will find m_γ such that (6.1) holds with γ replaced by γ + 1.
To this end, decompose the chaos expansion of Z^N_{T_γ}/Z^N_{T_{γ−1}} as As will become clear shortly, the infinite series converges in L² thanks to Lemma 4.3 and the fact that T_γ − T_{γ−1} = T/n is sufficiently small. Plugging this into the induction hypothesis (6.1) we get (6.2). The third term on the right-hand side of (6.2) is bounded by and by one if γ = α. The martingale property of Z^N thus implies that the conditional expectation above is bounded by two. Using also Hölder's inequality, the triangle inequality, and finally Lemma 4.3, we bound the expression in the preceding display by Thanks to the choice of n in Step 1, the right-hand side is bounded by We now choose m_γ large enough that this expression is less than ε. Plugging this back into (6.2) yields (6.1) with γ replaced by γ + 1. This completes the induction step and shows that (6.1) holds for all γ = 1, . . ., α + 1. In particular, taking γ = α + 1 we obtain

Step 3: reduction to linear drift. We now linearize the drift function B(t, x_{[0,t]}, r) with respect to its third argument. We write D_3B(t, x_{[0,t]}, r) for the derivative with respect to r and define for simplicity the process Note that D_3B^i is adapted to the filtration (F^{{i}}_t)_{t≥0} generated by (X^i, W^i). We also write ∫ (·) µ(dy) for any signed measure µ on C(R_+). We then have the Taylor formula where R^{i,N} is a process which is uniformly bounded in terms of the bound on the second derivative of r → B(t, x_{[0,t]}, r) given by Assumption 2.1. We now define local martingales and iterated integrals We will prove that there exists a constant C, not depending on N, such that To prove this, we expand the products and use the triangle inequality to bound the left-hand side by where the sum ranges over all (k_1, . . ., k_α) such that On each summand in the above expression we apply the identity and then use the triangle inequality along with Hölder's inequality to get Thus it suffices to bound each of the products in (6.7) by a constant times 1/√N. Since each of these products has at least one factor with i_β = 1, this will follow directly from the estimates and Once these estimates have been proved, (6.6) follows.
To prove (6.8)-(6.9) we first derive L^p estimates for the quadratic variations of M^N and M^N − M̄^N. Let C be a uniform bound on the first and second derivatives of r → B(t, x_{[0,t]}, r) as given by Assumption 2.1, and recall that Assumption 2.3 gives for any positive integer p and t ∈ R_+. Therefore, using Hölder's inequality we obtain, for any positive integer p, and To prove (6.8) we apply Lemma 4.5 to the iterated integral I^N_{k_β}(T_{β−1}, T_β) and combine this with (6.10) to get To prove (6.9) we observe that the difference of the corresponding iterated integrals can be written as the sum of 2^{k_β} − 1 terms, each having the form where Y_ℓ = M^N − M̄^N for at least one ℓ and Y_ℓ = M̄^N for the remaining ℓ. By first applying Lemma 4.5 and then (6.10) and (6.11) we get (Here we used (6.10) for each Y_ℓ that equals M̄^N and (6.11) for each Y_ℓ that equals M^N − M̄^N, and the 1/√N factor emerged because there is at least one factor of the latter kind.) By summing and using the triangle inequality we finally obtain (6.9).
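The count of 2^{k_β} − 1 terms arises from multilinearity: writing the original integrator as the linearized one plus the difference in each of the k_β slots of the iterated integral and expanding, then subtracting the term with the linearized integrator in every slot, gives schematically (with I_k(Y_1, …, Y_k) denoting the iterated integral driven by Y_1, …, Y_k)

```latex
I_{k_\beta}(M^N,\dots,M^N) - I_{k_\beta}(\bar M^N,\dots,\bar M^N)
= \sum_{\substack{Y \in \{\bar M^N,\; M^N - \bar M^N\}^{k_\beta} \\ Y \neq (\bar M^N,\dots,\bar M^N)}}
I_{k_\beta}\big(Y_1,\dots,Y_{k_\beta}\big),
```

and each of the 2^{k_β} − 1 summands contains at least one slot with Y_ℓ = M^N − \bar M^N, which is the source of the 1/√N factor.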
To summarize, we have now proved (6.8)-(6.9), thus showing that each of the products in (6.7) is bounded by a constant times 1/√N. This in turn yields (6.6) as desired. We end Step 3 by combining (6.6) and (6.3) to get The key point is that the iterated integrals I^N_m(T_{β−1}, T_β) are defined in terms of the local martingale M̄^N in (6.4) which, unlike M^N, depends linearly on µ^N − µ. In a sense, all nonlinear dependence on µ^N − µ has been absorbed into the vanishing term C/√N.
Step 4: expanding the iterated integrals. Our starting point is now (6.12), where we recall that α is fixed, ε > 0 is arbitrary, and m_1, . . ., m_α, C do not depend on N. Therefore, to show that A^N_α → 0 as N → ∞, it is enough to show that the expectation in (6.12) tends to zero as N → ∞. We now pave the way by expanding the sums and products in (6.12) to bring us into a position where the results of Section 4 can be applied.
We first expand the product indexed by β to bound the expectation in (6.12) by where the sum ranges over all (k_1, . . ., k_α). It suffices to show that each summand in (6.13) vanishes as N → ∞, so we fix a tuple (k_1, . . ., k_α) and focus on the corresponding expectation. The next step is to insert the identity and expand the product to write the expectation as The purpose of the substitution is to allow us to use the fact that {X^i_T > x_N} are independent events whose probabilities are of order 1/N. We proceed to expand the iterated integrals I^N_{k_β}(T_{β−1}, T_β). In view of (6.4) and the definition of H^i_s(µ^N_s − µ_s) and µ^N_s we have where for all i, j ∈ [N]. Plugging (6.15) and (6.16) into the definition of I^N_{k_β}(T_{β−1}, T_β), see (6.5), gives where we use the iterated integral notation (4.1) of Section 4. The product of iterated integrals appearing in (6.14) can then be written as (6.18). We are now finally in a position where the results of Section 4 can be applied to show that (6.18) tends to zero as N → ∞. In particular, for small values of κ, an overwhelming number of the expectations in (6.18) will be zero, while for large values of κ we can exploit the smallness of the probabilities P(X^{i_{0ℓ}}_T > x_N).
Step 5: application of key lemmas. Our focus is on showing that (6.18) tends to zero as N → ∞, and we recall that α and k_1, . . ., k_α are fixed. We first aim to apply Lemma 4.1 to assert that a large number of the expectations in (6.18) are in fact zero. We thus fix κ ∈ [N] and instantiate the lemma with G_ij as in (6.16), the time points T_0, . . ., T_α, the natural numbers k_1, . . ., k_α, the k_β-tuples i_β, j_β ∈ [N]^{k_β} for β ∈ [α], the subset K = {i_{01}, . . ., i_{0κ}} ⊂ [N], and the bounded F^K_T-measurable random variable Ψ = ∏_{ℓ=1}^{κ} 1_{{X^{i_{0ℓ}}_T > x_N}}. We must verify the conditions of Lemma 4.1. It is clear that for each i, j, G_ij is adapted to (F^{{i,j}}_t)_{t≥0} and, thanks to Assumption 2.3 and the uniform boundedness of D_3B^i, belongs to L. Indeed, a brief calculation yields for any p ∈ N and t ∈ R_+, where C is a uniform bound on D_3B^i and K(t) comes from Assumption 2.3. Moreover, using that the (X^i, W^i) are mutually independent and D_3B^i is adapted to (F^{{i}}_t)_{t≥0}, one verifies that (4.2) holds whenever V ⊂ [N] and i ∈ V, j ∉ V. Lemma 4.1 now tells us that the expectation in (6.18) vanishes whenever the subset K = {i_{01}, . . ., i_{0κ}} and tuples i_1, . . ., i_α, j_1, . . ., j_α satisfy at least one of the conditions (i)-(ii) of the lemma. Thanks to Lemma 4.2,
Thanks to Lemma 4.6, this is in turn bounded by (C⌈log N⌉^S / N)(S + 2)(S + 1)^{2(S+1)} e^{Ce} (Ce)^{S+1}, which tends to zero as N → ∞. Tracing backwards, we deduce that (6.18), and hence (6.14), tends to zero as well. This is true for any choice of (k_1, . . ., k_α), showing that (6.13) tends to zero. As a result, we see from (6.12) that lim sup_{N→∞} |A^N_α| ≤ αε, and thus A^N_α → 0 since ε > 0 was arbitrary. We recall from Step 1 that it was enough to obtain this for any α ∈ [n] in order to prove the theorem.
It still remains to establish (6.20). Applying Hölder's inequality with exponents p_N = log N and q_N = (1 − 1/log N)^{−1} gives where in the last step we used Stirling's approximation, and where C (which, as per our conventions, may change from one occurrence to the next) does not depend on p or N. We deduce that Choosing p = α⌈p_N⌉ we apply the above bounds to obtain All that remains in order to establish (6.20) is to show that P(X_T > x_N) ≤ C/N. But this follows from the fact that (1 − P(X_T > x_N))^N = P(max_{i≤N} X^i_T ≤ a^N_T x + b^N_T) → Γ_T(x) > 0 by assumption, so that N P(X_T > x_N) ≤ −N log(1 − P(X_T > x_N)) ≤ C for some constant C that does not depend on N. This completes the proof of (6.20), and of the theorem.
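The final step rests on the elementary inequality p ≤ −log(1 − p) for p ∈ (0, 1). The following standalone sketch (illustrative only; the variable names and the value γ = 0.5 are our own choices, not from the proof) checks numerically that if (1 − p_N)^N equals a constant γ > 0, then N p_N stays below −log γ:

```python
import math

# If (1 - p_N)^N = gamma with gamma in (0, 1), then
#   N * p_N <= -N * log(1 - p_N) = -log(gamma),
# by the elementary inequality p <= -log(1 - p) on (0, 1).
gamma = 0.5  # stand-in for the limit Gamma_T(x) > 0
for N in (10, 10**3, 10**6):
    p = 1.0 - gamma ** (1.0 / N)  # solves (1 - p)^N = gamma exactly
    assert N * p <= -math.log(gamma)
    print(N, round(N * p, 6))
```

Since N p_N increases towards −log γ, the constant C in the proof can be taken to be any number at least −log Γ_T(x).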

7 Examples
Assumption 2.3 is satisfied. Now, it is a well-known fact [7, Example 1.1.7] that the standard Gaussian distribution belongs to the maximum domain of attraction of the standard Gumbel distribution Γ(x) = exp(−e^{−x}), with normalizing constants b_N = 1/a_N = √(2 log N − log log N − log(4π)).
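As a purely numerical illustration (not part of the paper's argument), one can test this Gumbel approximation by simulation. The sketch below assumes the normalizing constants b_N = 1/a_N = √(2 log N − log log N − log 4π) and uses the fact that the maximum of N i.i.d. standard normals has the same law as Φ^{-1}(U^{1/N}) for U uniform on (0, 1):

```python
import math
import random
from statistics import NormalDist

random.seed(1)
N = 10**6      # number of i.i.d. standard normals per maximum
REPS = 50_000  # number of simulated maxima

# One admissible choice of Gumbel normalizing constants for Gaussian maxima:
#   b_N = 1 / a_N = sqrt(2 log N - log log N - log(4 pi)).
b = math.sqrt(2 * math.log(N) - math.log(math.log(N)) - math.log(4 * math.pi))
a = 1.0 / b

inv_cdf = NormalDist().inv_cdf
# The maximum of N i.i.d. N(0,1) variables has the law of Phi^{-1}(U^{1/N}).
hits = sum(
    (inv_cdf(random.random() ** (1.0 / N)) - b) / a <= 0.0
    for _ in range(REPS)
)
empirical = hits / REPS
gumbel_at_zero = math.exp(-1.0)  # Gumbel CDF exp(-e^{-x}) evaluated at x = 0
print(empirical, gumbel_at_zero)
```

Convergence is notoriously slow for Gaussian maxima (errors of order 1/log N), so only loose agreement with exp(−1) ≈ 0.368 should be expected.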
Assumption 2.1 is satisfied. The limiting McKean–Vlasov equation takes the form

Lemma 4.2 (counting lemma). Fix natural numbers n, N, κ, k_1, . . ., k_n. The number of ways we can pick a subset K ⊂ [N] with |K| = κ along with tuples i_α, j_α ∈ [N]^{k_α} for all α ∈ [n] such that both properties (i) and (ii) of Lemma 4.1 fail to hold is bounded by