Abstract
We study the asymptotic behavior of the normalized maxima of real-valued diffusive particles with mean-field drift interaction. Our main result establishes propagation of chaos: in the large population limit, the normalized maxima behave as those arising in an i.i.d. system where each particle follows the associated McKean–Vlasov limiting dynamics. Because the maximum depends on all particles, our result does not follow from classical propagation of chaos, where convergence to an i.i.d. limit holds for any fixed number of particles but not all particles simultaneously. The proof uses a change of measure argument that depends on a delicate combinatorial analysis of the iterated stochastic integrals appearing in the chaos expansion of the Radon–Nikodym density.
1 Introduction
This paper is concerned with the large-population asymptotics of the maxima of certain real-valued diffusive particle systems \(X^{1,N},\ldots ,X^{N,N}\) with mean-field interaction through the drifts. Specifically, we are interested in large-N limits of
$$\begin{aligned} \max _{i \le N} \frac{X^{i,N}_T - b^N_T}{a^N_T}, \end{aligned}$$(1.1)
where \(a^N_T\) and \(b^N_T\) are suitable normalizing constants. The particle dynamics are specified as follows, specializing the setup of [11]. For each \(N \in {{\mathbb {N}}}\) the N-particle system evolves according to a stochastic differential equation of the form
for \(i=1,\ldots ,N\), with i.i.d. initial conditions \(X^{i,N}_0 \sim \nu _0\) where \(\nu _0\) is a given probability measure on \({{\mathbb {R}}}\). We use the notation \({\varvec{x}}_{[0,t]} = (x(s))_{s \in [0,t]}\) for any continuous function \({\varvec{x}}\), and for each \(t \in {{\mathbb {R}}}_+\) we let
$$\begin{aligned} \mu ^N_t = \frac{1}{N} \sum _{i=1}^N \delta _{X^{i,N}_{[0,t]}} \end{aligned}$$
denote the empirical measure of the particle trajectories up to time t. The coefficients \(A(t, {\varvec{x}}_{[0,t]})\), \(B(t, {\varvec{x}}_{[0,t]}, r)\), \(C(t, {\varvec{x}}_{[0,t]})\) and the interaction function \(g(t, {\varvec{x}}_{[0,t]}, {\varvec{y}}_{[0,t]})\) are defined for all \(t \in {{\mathbb {R}}}_+\), \({\varvec{x}}, {\varvec{y}} \in C({{\mathbb {R}}}_+)\), and \(r \in {{\mathbb {R}}}\). Precise assumptions are discussed below. Finally, \(W^i\), \(i \in {{\mathbb {N}}}\), is a family of independent standard Brownian motions. We emphasize that there is no interaction in the volatility coefficient A. This is crucial for the methods used in this paper.
Under suitable assumptions, classical propagation of chaos [9, 19, 24] states that for any fixed number \(k \in {{\mathbb {N}}}\), the first k particles \((X^{1,N}, \ldots , X^{k,N})\) converge jointly as \(N \rightarrow \infty \) to k independent copies \((X^1,\ldots ,X^k)\) of the solution to the McKean–Vlasov equation
with initial condition \(\mu _0 = \nu _0\). A rigorous version of this statement that fits our current setup is given in [11, Theorem 2.1], where convergence takes place in total variation and comes with quantitative bounds on the distance between the k-tuple from the N-particle system and the limiting k-tuple; see also [16].
At an intuitive level, propagation of chaos means that for large N the interacting particle system behaves approximately like a system of i.i.d. particles. This intuition suggests that the large-N asymptotics of the normalized maxima in (1.1) should match the asymptotics of the normalized maxima of the independent copies \(X^i\) of the solution of (1.3),
$$\begin{aligned} \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T}. \end{aligned}$$(1.4)
Because they are i.i.d., the latter fall within the framework of classical extreme value theory; see e.g. [7, 20] for an introduction. This intuition is flawed however, because propagation of chaos only makes statements about a fixed number k of particles, while the maximum \(\max _{i\le N} X^{i,N}_T\) depends on all the particles. Furthermore, there are lower bounds on how similar \((X^{1,N},\ldots ,X^{k,N})\) and \((X^1,\ldots ,X^k)\) can be in general. In a simple Gaussian example, it is shown in [17] that the relative entropy between the two is bounded below by a constant times \((k/N)^2\). In particular, if \(k \rightarrow \infty \) and k/N remains bounded away from zero, convergence does not take place. Barriers of this kind have prevented us from deriving statements about normalized maxima as corollaries of standard results on propagation of chaos.
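While the interacting case requires the analysis developed in this paper, the i.i.d. benchmark itself is elementary. The following sketch (a toy computation of ours, not taken from the paper) illustrates the classical extreme value limit in a case where the distribution of the maximum is available in closed form: for i.i.d. Exp(1) variables, the maximum centered by \(\log N\) has exact CDF \((1 - e^{-x}/N)^N\), which converges to the Gumbel law \(\exp (-e^{-x})\).

```python
import math

# For i.i.d. Exp(1) variables E_1, ..., E_N, the exact CDF of
# max_i E_i - log N is (1 - e^{-x}/N)^N, converging to the Gumbel law.
def max_cdf_exp(N: int, x: float) -> float:
    """Exact CDF of the maximum of N i.i.d. Exp(1) variables, centered by log N."""
    return (1.0 - math.exp(-x) / N) ** N

def gumbel_cdf(x: float) -> float:
    return math.exp(-math.exp(-x))

for N in (10, 100, 10_000):
    err = max(abs(max_cdf_exp(N, x) - gumbel_cdf(x))
              for x in (-1.0, 0.0, 1.0, 2.0))
    print(N, round(err, 6))   # error shrinks as N grows
```

Here the convergence is at rate O(1/N); for Gaussian maxima, relevant to Example 3.1 below, the rate is only logarithmic.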
Our main result nonetheless shows, under suitable assumptions, that the normalized maxima of the N-particle systems do behave asymptotically like those of an i.i.d. system. In this sense, one has propagation of chaos of normalized maxima. The following statement is slightly informal; Theorem 2.4 gives the precise version.
Theorem 1.1
Suppose Assumptions 2.1 and 2.3 below are satisfied. Fix \(T \in (0,\infty )\) and suppose that for some normalizing constants \(a^N_T,b^N_T\) the normalized maxima (1.4) of the i.i.d. system converge weakly to a nondegenerate distribution \(\Gamma _T\) on \({{\mathbb {R}}}\) as \(N \rightarrow \infty \). Then the normalized maxima (1.1) of the interacting particle systems also converge to \(\Gamma _T\) as \(N \rightarrow \infty \).
The precise assumptions are discussed in Sect. 2, along with additional comments, and examples are developed in Sect. 3. Here we only highlight three points, deferring the details to Sects. 2 and 3.
First, a key motivating example and application of Theorem 1.1 comes from a class of models known as rank-based diffusions, which were first studied by [8] in the context of stochastic portfolio theory. In a rank-based model with drift interaction, the N-particle system evolves as
where \(\text {rank}_t(X^{i,N}_t)\) denotes the rank of the ith particle within the population: \(\text {rank}_t(X^{i,N}_t) = k\) if \(X^{i,N}_t\) is the kth largest particle, with a suitable convention in case of ties. The factor 1/N anticipates a passage to the large-N limit. Rank-based diffusions of this type have been studied extensively and their mean-field asymptotics are well understood. However, the asymptotics of the largest particle, of particular interest in the applied context, were previously unknown. As shown in Example 3.2, our main result is applicable and allows us to fill this gap.
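Rank-based dynamics of this type are straightforward to simulate. The following Euler–Maruyama sketch (our own illustration; the drift choice \(B(r) = 1 - 2r\), which pushes low-ranked particles up and high-ranked particles down, is hypothetical) shows the only structural ingredient, namely the rank computation at each step.

```python
import numpy as np

# Euler-Maruyama sketch of a rank-based system: particle i receives drift
# B(rank_i(t)/N), where rank 1 denotes the smallest particle.
# The drift B(r) = 1 - 2r is an illustrative choice, not from the paper.
rng = np.random.default_rng(0)
N, T, n_steps = 500, 1.0, 200
dt = T / n_steps
B = lambda r: 1.0 - 2.0 * r

X = rng.standard_normal(N)          # i.i.d. initial conditions
for _ in range(n_steps):
    # ranks[i] = position of particle i in the ascending sort of the population
    ranks = np.empty(N)
    ranks[np.argsort(X)] = np.arange(1, N + 1)
    X = X + B(ranks / N) * dt + np.sqrt(2.0 * dt) * rng.standard_normal(N)

print(X.min(), X.max())
```

With this mean-reverting drift the empirical distribution stabilizes; the tail behavior of the largest particle is exactly the question addressed by the main theorem.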
Second, note that Theorem 1.1 only asserts one-dimensional marginal convergence at single time points T. Nonetheless, as discussed in Sect. 2, in some cases one expects joint marginal convergence of the form
for any \(T_1< \ldots < T_n\), where the limit is in the sense of weak convergence toward a product measure with nondegenerate components. No continuous process has finite-dimensional marginal distributions of this form, so this precludes convergence at the level of continuous processes.
Third, as part of the hypotheses of Theorem 1.1 we assume that the normalized maxima (1.4) of the i.i.d. system admit a nondegenerate limit law \(\Gamma _T\). Classical extreme value theory asserts that up to affine transformations, \(\Gamma _T\) must belong to a one-parameter family of extreme value distributions consisting of the Fréchet, Gumbel, and Weibull distributions. It is obviously of interest to characterize \(\Gamma _T\) in terms of the data A, B, C, g, and \(\nu _0\). This question is the subject of ongoing work, and falls outside the scope of this paper. Nonetheless, in the examples in Sect. 3, we are able to verify this domain of attraction hypothesis by hand.
Let us mention that the large body of work that exists on the extreme eigenvalue statistics of random matrices is related to our paper in that those eigenvalues in many cases can be described by mean-field interacting diffusions. For example, the eigenvalues of a GUE (Gaussian Unitary Ensemble) random matrix are described by Dyson Brownian motion. However, the largest eigenvalue, suitably normalized, converges in distribution to the Tracy–Widom law [25], which is different from the extreme value distributions that can arise in our framework. Another random matrix model is the Ginibre ensemble [10], whose normalized spectral radius converges to the Gumbel law [21]. Although this is the same limit law that we observe in our examples, the interaction among the eigenvalues of the Ginibre ensemble is not covered by our setup, in essence because we exclude interaction in the diffusion coefficients.
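For readers who wish to see the Ginibre picture numerically, the following sketch (ours, for illustration only) samples a complex Ginibre matrix with variance-1/N entries and computes its spectral radius; the circular law places the spectrum in the unit disk, while the Gumbel fluctuations established in [21] occur at a finer scale than this crude check can resolve.

```python
import numpy as np

# Complex Ginibre matrix with i.i.d. entries of variance 1/N: the circular
# law says the eigenvalues fill the unit disk, so the spectral radius is
# close to 1. The Gumbel fluctuations of [21] live on a much finer scale.
rng = np.random.default_rng(4)
N = 400
G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)
radius = np.abs(np.linalg.eigvals(G)).max()
print(radius)   # close to 1 for large N
```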
The rest of the paper is organized as follows. First, we finish the introduction with an outline of some of the main steps and ideas of the proof of the main theorem. Then, in Sect. 2, we give precise statements of our assumptions and results. We also reproduce an argument due to D. Lacker (personal communication) which vastly simplifies the proof under suitable Lipschitz assumptions; see Remark 2.9. Examples and applications are discussed in Sect. 3. Section 4 collects key lemmas needed for the proof of the main theorem. These lemmas are proved in Sect. 5. Finally, the main theorem is proved in Sect. 6. We will frequently use the notation
and \({{\mathbb {R}}}_+ = [0,\infty )\). We will allow generic constants C to vary from line to line, and occasionally indicate the dependence on parameters by writing C(n), C(p, n), etc.
1.1 Outline of the proof of Theorem 1.1
The remainder of this introduction contains an outline of some of the main steps and ideas of the proof of Theorem 1.1. To simplify the discussion we take \(A = 1\), \(C = 0\), and \(B(t, {\varvec{x}}_{[0,t]}, r) = r\). We fix \(T \in (0,\infty )\) and note that the theorem will be proved if we show that for any \(x \in {{\mathbb {R}}}\),
$$\begin{aligned} {{\mathbb {P}}}\Big ( \max _{i \le N} X^{i,N}_T \le x_N \Big ) - {{\mathbb {P}}}\Big ( \max _{i \le N} X^i_T \le x_N \Big ) \rightarrow 0 \quad \text {as } N \rightarrow \infty , \end{aligned}$$(1.5)
where \(x_N = a^N_T x + b^N_T\).
Here \(X^i\), \(i \in {{\mathbb {N}}}\), are i.i.d. copies of the solution of (1.3) with driving Brownian motions \(W^i\), and all objects are defined on a filtered probability space \((\Omega ,{{\mathcal {F}}},({{\mathcal {F}}}_t)_{t\ge 0},{{\mathbb {P}}})\).
The first observation, going back at least to [1] and also used by [6, 11, 16], is that the structure of the particle dynamics (1.2) allows us to construct for each N a (locally) equivalent measure \({{\mathbb {Q}}}^N \sim _\text {loc} {{\mathbb {P}}}\) under which \((X^1,\ldots ,X^N)\) acquires the law of \((X^{1,N},\ldots ,X^{N,N})\). This is accomplished by the Radon–Nikodym density process
where the local martingale \(M^N\) is given by
and where, by overloading notation, we set \(\mu ^N_t = \frac{1}{N} \sum _{i=1}^N \delta _{X^i_{[0,t]}}\). We may then re-express the left-hand side of (1.5) as
$$\begin{aligned} {{\mathbb {E}}}\bigg [ (Z^N_T - 1) \prod _{i=1}^N {\varvec{1}}_{\{ X^i_T \le x_N \}} \bigg ]. \end{aligned}$$(1.7)
The key point is that (1.7) is expressed in terms of the mutually independent processes \((X^i,W^i)\), \(i \in [N]\), while the dependence that exists among the particles in the original N-particle system is captured by the Radon–Nikodym derivative \(Z^N_T\). The proof of the theorem rests on a detailed analysis of how \(Z^N_T\) interacts with the indicators in (1.7), ultimately allowing us to “extract enough independence” to show that (1.7) tends to zero in the large-N limit. An analogous strategy of “extracting independence” through the above change of measure was used in [11], although the actual execution of this strategy is very different in our context.
Iterating the SDE satisfied by the stochastic exponential \(Z^N\) leads to the formal chaos expansion
$$\begin{aligned} Z^N_T = 1 + \sum _{m=1}^\infty \int _0^T \int _0^{s_m} \cdots \int _0^{s_2} dM^N_{s_1} \cdots dM^N_{s_{m-1}} \, dM^N_{s_m}. \end{aligned}$$(1.8)
If T is sufficiently small, one can show that a truncated version of this expansion can be substituted for \(Z^N_T-1\) in (1.7) at the cost of an arbitrarily small error \(\varepsilon > 0\). Importantly, although the truncation level \(m_0\) (say) depends on \(\varepsilon \), it does not depend on N. We are thus left with showing that each of the remaining \(m_0\) terms tends to zero, that is, for each \(m \in [m_0]\),
$$\begin{aligned} \lim _{N \rightarrow \infty } {{\mathbb {E}}}\bigg [ \int _0^T \int _0^{s_m} \cdots \int _0^{s_2} dM^N_{s_1} \cdots dM^N_{s_m} \prod _{i=1}^N {\varvec{1}}_{\{ X^i_T \le x_N \}} \bigg ] = 0. \end{aligned}$$(1.9)
This is done by substituting \({\varvec{1}}_{\{ X^i_T \le x_N \}} = 1 - {\varvec{1}}_{\{ X^i_T > x_N \}}\) and expanding the product, as well as substituting the definition of \(M^N\) into the iterated integral and expanding using multilinearity. The result is a sum consisting of all terms of the form
with \(k\in [N]\), \(\{\ell _1,\ldots ,\ell _k\} \subset [N]\), \((i_1,\ldots ,i_m) \in [N]^m\), and \((j_1,\ldots ,j_m) \in [N]^m\), and where the processes
arise when the empirical measure \(\mu ^N_t\) is substituted into the definition of \(M^N\). We are now in a position to sketch the main ways in which we exploit the independence among the processes \((X^i,W^i)\), \(i \in {{\mathbb {N}}}\).
For each k, there are \(N^{2m} \left( {\begin{array}{c}N\\ k\end{array}}\right) \) terms of the form (1.10). Using iterated stochastic integral estimates, along with the independence of the \(X^i\), \(i \in {{\mathbb {N}}}\), and the fact that \({{\mathbb {P}}}(X_T > x_N) = O(1/N)\) due to the domain of attraction assumption, we show that each of these terms is bounded by
This is not enough to deduce (1.9) however, because it only produces the upper bound \(O(N^m \lceil \log N \rceil ^{m})\), which does not tend to zero with N. Nonetheless, a refined analysis shows that a large number of the terms (1.10) are in fact zero. Very roughly, this happens when there is a small overlap between the indices \(\{\ell _1,\ldots ,\ell _k\}\) and \(\{i_1,\ldots ,i_m,j_1,\ldots ,j_m\}\), in which case the expectation in (1.10) vanishes despite the presence of the indicators. A counting argument then shows that for each k, at most \(\left( {\begin{array}{c}N\\ k\end{array}}\right) k (k+1) \cdots (k+m) N^{m-1}\) terms remain. Using the earlier estimate to control these remaining terms finally yields the bound \(O(N^{-1} \lceil \log N \rceil ^{m})\) for the left-hand side of (1.9). This does tend to zero as \(N \rightarrow \infty \) and allows us to complete the proof.
Counting the nonzero terms (1.10) and bounding their size constitute the heart of the proof. The key arguments involved are given as lemmas in Sect. 4. However, other parts of the proof also require substantial technical effort. In particular, work is required to (i) reduce from the case of general coefficients A, B, C to the simpler ones discussed above; (ii) obtain sufficiently strong iterated integral bounds to truncate the chaos expansion independently of N when T is small; and (iii) remove the smallness requirement on T. This leads to added complexity and explains why the full proof of Theorem 1.1 is rather long and technical.
2 Assumptions and main results
To give a precise description of our setup, we first introduce regularity and growth assumptions on the data A, B, C, g.
Assumption 2.1
The coefficient functions \((t,{\varvec{x}}) \mapsto A(t, {\varvec{x}}_{[0,t]})\), \((t,{\varvec{x}},r) \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\), \((t,{\varvec{x}}) \mapsto C(t, {\varvec{x}}_{[0,t]})\) and the interaction function \((t,{\varvec{x}},{\varvec{y}}) \mapsto g(t, {\varvec{x}}_{[0,t]}, {\varvec{y}}_{[0,t]})\) are real-valued measurable functions on \({{\mathbb {R}}}_+ \times C({{\mathbb {R}}}_+)\), \({{\mathbb {R}}}_+ \times C({{\mathbb {R}}}_+) \times {{\mathbb {R}}}\), \({{\mathbb {R}}}_+ \times C({{\mathbb {R}}}_+)\), and \({{\mathbb {R}}}_+ \times C({{\mathbb {R}}}_+) \times C({{\mathbb {R}}}_+)\), respectively. They satisfy the following conditions:
-
A and C are uniformly bounded,
-
for every \(t \in {{\mathbb {R}}}_+\) and \({\varvec{x}} \in C({{\mathbb {R}}}_+)\), the function \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) is twice continuously differentiable, and its first and second derivatives are bounded uniformly in \((t, {\varvec{x}})\).
Remark 2.2
Note that \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) itself need not be bounded, only its first two derivatives. We thus cover examples with linear growth. Moreover, if the interaction function g is uniformly bounded, the growth properties of \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) become irrelevant.
By imposing further conditions we could appeal to known results on well-posedness of McKean–Vlasov equations to assert that (1.3) has a solution. Rather than doing this, we will assume existence directly (uniqueness is not actually required, so we do not assume it).
Assumption 2.3
Fix a probability measure \(\nu _0\) on \({{\mathbb {R}}}\) and assume that the McKean–Vlasov equation (1.3) admits a weak solution (X, W) with \(X_0 \sim \nu _0\). Construct (for instance as a countable product) a filtered probability space \((\Omega ,{{\mathcal {F}}},({{\mathcal {F}}}_t)_{t\ge 0},{{\mathbb {P}}})\) with a countable sequence \((X^i,W^i)\), \(i \in {{\mathbb {N}}}\), of independent copies of (X, W). Then, assume that there is a continuous function K(t) such that for all \(p \in {{\mathbb {N}}}\), \(t \in {{\mathbb {R}}}_+\), \(N \in {{\mathbb {N}}}\), and \(i,j \in [N]\), one has the moment bounds
and
Sufficient conditions for the moment bounds (2.1)–(2.2) along with further discussion are given in Remark 2.7 below.
Let Assumptions 2.1 and 2.3 be in force. For each \(N \in {{\mathbb {N}}}\) we now use the processes \((X^i,W^i)\) to construct the N-particle systems by changing the probability measure. First define the N-particle empirical measure
$$\begin{aligned} \mu ^N_t = \frac{1}{N} \sum _{i=1}^N \delta _{X^i_{[0,t]}}. \end{aligned}$$
Next, define the (candidate) density process \(Z^N = \exp (M^N - \frac{1}{2}\langle M^N \rangle )\) where
$$\begin{aligned} M^N_t = \sum _{i=1}^N \int _0^t \Delta B^{i,N}_s \, dW^i_s \end{aligned}$$
and
$$\begin{aligned} \Delta B^{i,N}_s = B\bigg (s, X^i_{[0,s]}, \int g(s, X^i_{[0,s]}, {\varvec{y}}) \, \mu ^N_s(d{\varvec{y}})\bigg ) - B\bigg (s, X^i_{[0,s]}, \int g(s, X^i_{[0,s]}, {\varvec{y}}) \, \mu _s(d{\varvec{y}})\bigg ), \end{aligned}$$
where \(\mu _s\) denotes the law of \(X_{[0,s]}\).
Assumptions 2.1 and 2.3 imply that \({{\mathbb {E}}}[ \int _0^t (\Delta B^{i,N}_s)^2 ds] < \infty \) for all i and t, which ensures that \(M^N\) is a well-defined square-integrable martingale, and hence that \(Z^N\) is a positive local martingale. We claim that \({{\mathbb {E}}}[Z^N_T] = 1\) for all \(T \in (0,\infty )\), so that \(Z^N\) is a true martingale. To see this, note that Lemma 4.3 implies that for any \(s < t \le T\) with \(t-s\) small enough, the chaos expansion
converges in \(L^2\). Moreover, [3, Proposition 1] together with Assumption 2.3 imply that each iterated integral has expectation zero. As a result, \({{\mathbb {E}}}[Z^N_t / Z^N_s] = 1\) for all such s, t, and this implies \({{\mathbb {E}}}[Z^N_T]=1\) as claimed.
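As a sanity check of the martingale property \({{\mathbb {E}}}[Z^N_T] = 1\), the following Monte Carlo sketch (a toy instance of ours, not the actual \(Z^N\)) verifies it in the simplest case of a deterministic integrand, \(Z_T = \exp (\theta W_T - \theta ^2 T/2)\).

```python
import numpy as np

# Monte Carlo check that E[exp(theta*W_T - theta^2*T/2)] = 1, the simplest
# instance of the martingale property E[Z_T] = 1 of a stochastic exponential.
rng = np.random.default_rng(1)
theta, T, n_paths = 0.7, 2.0, 400_000
W_T = np.sqrt(T) * rng.standard_normal(n_paths)
Z_T = np.exp(theta * W_T - 0.5 * theta**2 * T)
print(Z_T.mean())   # close to 1
```

For the actual density \(Z^N\) the integrand \(\Delta B^{i,N}\) is random, which is why the chaos expansion argument above is needed in place of a direct Novikov-type criterion.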
Since \(Z^N\) is a true martingale, it induces a locally equivalent probability measure \({{\mathbb {Q}}}^N \sim _\text {loc} {{\mathbb {P}}}\) under which the processes defined by
are mutually independent standard Brownian motions. Thus under \({{\mathbb {Q}}}^N\) we find that \(X^1,\ldots ,X^N\) follow the N-particle dynamics (1.2),
The following is the precise formulation of our main result.
Theorem 2.4
Suppose Assumptions 2.1 and 2.3 are satisfied and consider the laws \({{\mathbb {Q}}}^N\) constructed above. Fix \(T \in (0,\infty )\) and suppose that for some normalizing constants \(a^N_T,b^N_T\) the normalized maxima of the i.i.d. system converge weakly to a nondegenerate distribution function \(\Gamma _T\) on \({{\mathbb {R}}}\):
Then the normalized maxima of the interacting particle systems also converge to \(\Gamma _T\):
Classical extreme value theory asserts that up to affine transformations, \(\Gamma _T\) must belong to a one-parameter family of extreme value distributions consisting of the Fréchet, Gumbel, and Weibull distributions. Our assumptions tend to preclude the heavy-tailed behavior that is characteristic of the Fréchet class.
Proposition 2.5
Let the assumptions of Theorem 2.4 be satisfied. Assume in addition that all moments of \(\nu _0\) are finite and one has the linear growth bound \(|B(t,{\varvec{x}}_{[0,t]}, 0)| \le c(1 + x^*_t)\) for all \(t \in [0,T]\), \({\varvec{x}} \in C({{\mathbb {R}}}_+)\), where the constant c may depend on T and we use the notation \(x^*_t = \sup _{s\le t}|x_s|\). Then \(\Gamma _T\) must belong to the Gumbel or Weibull family.
Proof
We allow c to change from one occurrence to the next. The assumptions imply the bound \(|B(t, X_{[0,t]}, r)| \le c(1 + X^*_t + |r|)\), which together with the uniform boundedness of A and C yields
This in turn implies
for the nondecreasing process
Pathwise application of Gronwall’s inequality then yields \(X^*_T \le e^{cT} J_T\). Because all moments of \(\nu _0\) are finite, A is uniformly bounded, and thanks to (2.1) of Assumption 2.3, all moments of \(J_T\) are finite. (For the stochastic integral term this uses the BDG inequalities.) Then so are the moments of \(X^*_T\), and then also of \(X_T\). However, if \(X_T\) were in the Fréchet domain of attraction it would have a regularly varying tail (see [7, Theorem 1.2.1]), implying that all sufficiently high moments are infinite. This excludes the Fréchet family. \(\square \)
Remark 2.6
Weak convergence in Theorem 2.4 is equivalent to pointwise convergence of the distribution functions at every \(x \in {{\mathbb {R}}}\) where \(\Gamma _T\) is continuous. However, since all extreme value distributions are continuous, restricting to continuity points is redundant.
Theorem 2.4 asserts one-dimensional marginal convergence at single time points T. We do not prove full finite-dimensional marginal convergence in this paper, but let us nonetheless make the following observation. In certain examples, the random vectors \((X_{T_1},\ldots ,X_{T_n})\) with \(T_1<\ldots <T_n\) exhibit asymptotic independence. This means that each \(X_{T_\alpha }\), \(\alpha \in [n]\), belongs to the maximum domain of attraction of some extreme value distribution \(\Gamma _{T_\alpha }\) with normalizing constants \(a^N_{T_\alpha },b^N_{T_\alpha }\), and that the vector of normalized maxima converges to a product measure:
as \(N \rightarrow \infty \) for all \((x_1,\ldots ,x_n) \in {{\mathbb {R}}}^n\). Asymptotic independence is characterized by the condition
for all \(\alpha \ne \beta \) in [n] and all \(x_\alpha ,x_\beta \in {{\mathbb {R}}}\) such that \(\Gamma _{T_\alpha }(x_\alpha ) > 0\) and \(\Gamma _{T_\beta }(x_\beta ) > 0\); see [20, Proposition 5.27]. In particular, this is known to hold for multivariate Gaussian distributions with correlation in \((-1,1)\); see [20, Corollary 5.28]. Thus if X is a Gaussian process with non-trivial correlation function, then all finite-dimensional marginal distributions of the centered and scaled processes \(\max _{i \le N} (X^i_t - b^N_t)/a^N_t\) converge as \(N \rightarrow \infty \) to product distributions with nondegenerate components (specifically, affine transformations of Gumbel). No continuous process has finite-dimensional marginal distributions of this form, so this precludes convergence at the level of continuous processes. The Gaussian case is discussed further in Example 3.1. Whenever the i.i.d. particles \(X^i\), \(i \in {{\mathbb {N}}}\), satisfy the asymptotic independence property (2.8), it is natural to expect that the same is true for the interacting N-particle systems, although proving this is outside the scope of this paper.
We end this section with a few additional remarks.
Remark 2.7
(on Assumption 2.3) There is a large literature on well-posedness of McKean–Vlasov equations, providing a range of conditions under which a solution to (1.3) exists; see e.g. [9, 16, 19, 24]. Next, the moment bound (2.1) is satisfied if the centered random variables \(g(t, X^i_{[0,t]}, X^j_{[0,t]}) - \int g(t, X^i_{[0,t]}, \varvec{y})\mu _t(d\varvec{y})\) are bounded or conditionally (on \(X^i_{[0,t]}\)) sub-Gaussian with a uniformly bounded variance proxy (see e.g. [22] for a review of sub-Gaussianity). One can then also verify (2.2) by noticing that the 2p-th moment of
can be controlled by that of
plus a term proportional to \(p! 3^p K(r)^p / N^{2p}\). Conditionally on \(X^i_{[0,t]}\), the \(N - 1\) summands in (2.9) are pairwise independent and identically distributed with zero mean. In the sub-Gaussian case, these \(N-1\) summands above are also sub-Gaussian, so their average is sub-Gaussian with an \(O(\frac{1}{N})\) variance, and the desired bound follows from [22, Lemma 1.4]. In the case of bounded summands, we instead apply Hoeffding’s inequality [22, Theorem 1.9] and then again [22, Lemma 1.4].
Remark 2.8
(Non-i.i.d. initial conditions) Standard propagation of chaos is frequently formulated under weaker assumptions on the initial conditions of the N-particle systems than being i.i.d. A common assumption is that \((X_0^{1,N}, \ldots , X_0^{k,N})\) converges weakly to \((X_0^{1}, \ldots , X_0^{k})\) as \(N \rightarrow \infty \) for each \(k \in {\mathbb {N}}\), where \(X^i_0\), \(i \in {{\mathbb {N}}}\), is an i.i.d. sequence. Although we have not succeeded in proving our main result under this weaker assumption on the initial conditions, it is nonetheless possible to move slightly beyond the i.i.d. setting through an additional change of measure. Specifically, let \(\nu ^N_0\) (a probability measure on \({{\mathbb {R}}}^N\)) be the desired joint initial law of the N-particle system, and assume it is absolutely continuous with respect to the N-fold product measure \(\nu _0^{\otimes N}\), where as above \(\nu _0\) is the initial law of the limiting McKean–Vlasov SDE. We make the total variation type stability assumption that
Letting \({{\mathbb {Q}}}^N\) be defined as before, we now obtain a new measure \(\widetilde{{{\mathbb {Q}}}}^N\) by using
as Radon–Nikodym derivative. This affects the initial law, but not the form of the particle dynamics. Then as \(N \rightarrow \infty \) we have
This shows that the large-N asymptotics of the normalized maxima of the N-particle system are unaffected when the initial distribution is \(\nu ^N_0\) instead of \(\nu _0^{\otimes N}\).
Remark 2.9
(A coupling argument) D. Lacker has pointed out to us that a simple coupling argument yields our propagation of chaos result in the presence of constant volatility and Lipschitz drift. Although this does not lead to a proof of our main result (in particular, our key example of rank-based models is excluded due to discontinuous drifts; see Example 3.2), it is worth recording the argument here. Assume that the drift function B satisfies the Lipschitz condition
for some constant C, all \(x,y \in {{\mathbb {R}}}\), and all probability measures \(\mu ,\nu \) with finite p-th moment. Here \({{\mathcal {W}}}_p(\mu ,\nu )\) is the p-Wasserstein distance between \(\mu \) and \(\nu \) for some fixed \(p \in [1,\infty )\). We let the N-particle system be given as the unique strong solution of the system of SDEs
where \(W^i\), \(i \in {{\mathbb {N}}}\), is a sequence of independent standard Brownian motions and \(\xi ^i\), \(i \in {{\mathbb {N}}}\), is a sequence of p-integrable i.i.d. initial conditions. For each i, let \(X^i\) be the unique strong solution of the McKean–Vlasov SDE
using the same Brownian motion and initial condition as for the N-particle systems. We then obtain
and a Gronwall-type argument gives
Consequently,
Provided that \(\int _0^T{{\mathbb {E}}}\left[ {{\mathcal {W}}}_p(\mu _{s}^N, \mu _{s})\right] ds / a^N_T \rightarrow 0\) as \(N \rightarrow \infty \), our propagation of chaos result follows. This happens, for instance, in the Gaussian case where \(a^N_T\) behaves like \(1/\sqrt{\log N}\) (see Example 3.1 below) and \({{\mathbb {E}}}\left[ {{\mathcal {W}}}_p(\mu _{s}^N, \mu _{s})\right] \) behaves like \(N^{-\gamma }\) for some \(\gamma > 0\).
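The coupling can be sketched numerically in the explicit linear case where the drift is \(\text {mean}(\mu ) - x\) (an illustrative choice of ours, not taken from the paper): the N-particle system and the i.i.d. McKean–Vlasov copies are driven by the same Brownian increments and the same initial conditions, and the resulting pathwise error shrinks roughly like \(N^{-1/2}\).

```python
import numpy as np

# Synchronous coupling sketch (cf. Remark 2.9) for the linear drift
# b(x, mu) = mean(mu) - x. The interacting system and the i.i.d.
# McKean-Vlasov copies (whose limiting mean is 0) share Brownian increments.
rng = np.random.default_rng(3)

def coupling_error(N, n_rep=20, T=1.0, n_steps=100):
    """Average of mean_i |X^{i,N}_T - X^i_T| over n_rep independent runs."""
    dt = T / n_steps
    errs = []
    for _ in range(n_rep):
        X0 = rng.standard_normal(N)
        Xp = X0.copy()    # interacting N-particle system
        Xm = X0.copy()    # i.i.d. McKean-Vlasov copies
        for _ in range(n_steps):
            dW = np.sqrt(dt) * rng.standard_normal(N)
            Xp = Xp + (Xp.mean() - Xp) * dt + dW
            Xm = Xm - Xm * dt + dW
        errs.append(np.abs(Xp - Xm).mean())
    return float(np.mean(errs))

e_small, e_large = coupling_error(100), coupling_error(10_000)
print(e_small, e_large)   # shrinks roughly like N^{-1/2}
```

For rank-based models the drift is discontinuous, so no such synchronous coupling bound is available; this is precisely why the change-of-measure approach of the main theorem is needed.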
3 Examples
We discuss two examples that illustrate the main result.
Example 3.1
(Gaussian particles) The following Gaussian particle system has been studied in a number of contexts, such as models for monetary reserves of banks [2, 4], and default intensities in large interbank networks [12, Example 2.2]. The N-particle system evolves according to the multivariate Ornstein–Uhlenbeck process
with i.i.d. \(N(m_0,\sigma _0^2)\) initial conditions. Here \(\kappa , m_0 \in {{\mathbb {R}}}\) and \(\sigma , \sigma _0 \in (0,\infty )\) are parameters. In our setting this example arises by taking \(A(t,{\varvec{x}}_{[0,t]}) = \sigma \), \(B(t,{\varvec{x}}_{[0,t]},r) = -\kappa (x_t - r)/\sigma \), \(C(t, {\varvec{x}}_{[0,t]}) = 0\), and \(g(t,{\varvec{x}}_{[0,t]},{\varvec{y}}_{[0,t]}) = y_t\). Clearly Assumption 2.1 is satisfied. The McKean–Vlasov equation (1.3) reduces to
Taking expectations one obtains \({{\mathbb {E}}}[X_t] = {{\mathbb {E}}}[X_0] = m_0\) for all \(t \in {{\mathbb {R}}}_+\), showing that X is an Ornstein–Uhlenbeck process with constant mean \(m_0\) and time-t variance given by
$$\begin{aligned} \sigma _t^2 = \sigma _0^2 e^{-2\kappa t} + \frac{\sigma ^2}{2\kappa }\big (1 - e^{-2\kappa t}\big ). \end{aligned}$$
Letting \(X^i\), \(i \in {{\mathbb {N}}}\), be independent copies of X, we see that \(g(t, X^i_{[0,t]}, X^j_{[0,t]}) = X^j_t\) is Gaussian for all i, j. Thus in view of Remark 2.7, Assumption 2.3 is satisfied.
Now, it is a well-known fact [7, Example 1.1.7] that the standard Gaussian distribution belongs to the maximum domain of attraction of the standard Gumbel distribution \(\Gamma (x) = \exp (-e^{-x})\) with normalizing constants
$$\begin{aligned} a^N = \frac{1}{\sqrt{2\log N}}, \qquad b^N = \sqrt{2\log N} - \frac{\log \log N + \log (4\pi )}{2\sqrt{2\log N}}. \end{aligned}$$
By normalizing, we see that \(X_T\) also belongs to the maximum domain of attraction of \(\Gamma \) for each T, with normalizing constants
$$\begin{aligned} a^N_T = \sigma _T a^N, \qquad b^N_T = m_0 + \sigma _T b^N, \end{aligned}$$
where \(a^N, b^N\) are the constants for the standard Gaussian.
Indeed, since \((X^i_T - m_0)/\sigma _T\) is standard Gaussian we have
This shows that the hypotheses of Theorem 2.4 are satisfied, and we deduce that the same asymptotics hold for the N-particle systems,
Lastly, X is a Gaussian process with correlation function
where \(\alpha = 2 \kappa \sigma _0^2 / \sigma ^2\). Thus \(\text {Corr}(X_s,X_t) \in (0,1)\) for all \(s \ne t\). The discussion in Sect. 2 implies that the finite-dimensional marginal distributions of X exhibit asymptotic independence, and that (2.8) holds for all \(n \in {{\mathbb {N}}}\) and \(T_1< \cdots < T_n\). In particular, there is no functional convergence in the space of continuous processes.
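The Gaussian normalizing constants can be checked numerically. The sketch below (ours) uses one standard choice of constants for the standard Gaussian (cf. [7, Example 1.1.7]) and compares the exact CDF \(\Phi (a_N x + b_N)^N\) of the normalized maximum with the Gumbel limit; the agreement is only moderate even for \(N = 10^6\), reflecting the well-known \(O(1/\log N)\) rate for Gaussian maxima.

```python
import math

# One standard choice of normalizing constants for the maximum of N i.i.d.
# standard Gaussians, and an exact comparison with the Gumbel limit.
def gauss_constants(N: int):
    c = math.sqrt(2.0 * math.log(N))
    b = c - (math.log(math.log(N)) + math.log(4.0 * math.pi)) / (2.0 * c)
    return 1.0 / c, b      # (a_N, b_N)

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

N = 10**6
a_N, b_N = gauss_constants(N)
for x in (-1.0, 0.0, 1.0):
    exact = Phi(a_N * x + b_N) ** N          # exact CDF of normalized max
    limit = math.exp(-math.exp(-x))          # Gumbel limit
    print(x, round(exact, 4), round(limit, 4))
```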
Example 3.2
(Rank-based diffusions) Consider the N-particle system evolving according to
for \(i \in [N]\), where B(r) is a twice continuously differentiable function on [0, 1] and
is the empirical distribution function. Such systems are called rank-based because the drift (and in more general formulations also the diffusion) of each particle depends on its rank within the population. Indeed, modulo tie-breaking, \(F^N_t( X^{i,N}_t) = k/N\) where \(k=1\) if \(X^{i,N}_t\) is the smallest particle, \(k=2\) if \(X^{i,N}_t\) is the second smallest, and so on. Rank-based systems have been studied extensively and play an important role in stochastic portfolio theory; see e.g. [8, 13,14,15, 23]. They are challenging to analyze in part because the drift is discontinuous as a function of the current state and the empirical measure (with the Wasserstein metric \({{\mathcal {W}}}_p\) for any \(p \ge 1\)), making e.g. the argument in Remark 2.9 inapplicable.
The above system fits into our setup by taking \(A(t,{\varvec{x}}_{[0,t]}) = \sqrt{2}\), \(B(t,{\varvec{x}}_{[0,t]},r) = B(r)\), \(C(t, {\varvec{x}}_{[0,t]}) = 0\), and \(g(t,{\varvec{x}}_{[0,t]},{\varvec{y}}_{[0,t]}) = {\varvec{1}}_{\{y_t \le x_t\}}\). Clearly Assumption 2.1 is satisfied. The limiting McKean–Vlasov equation takes the form
The above setup is well-studied, and both the N-particle system and the McKean–Vlasov equation are well-posed [13, 23]. Since the interaction function g and drift coefficient B are both bounded, Assumption 2.3 is readily seen to be satisfied.
General criteria for verifying the domain of attraction assumption on X are not available. However, if X is stationary, more can be said. It is known [13, 23] that the distribution function \(F_t(x)\) satisfies the PDE
$$\begin{aligned} \partial _t F_t(x) = \partial _{xx} F_t(x) - \partial _x {\mathfrak {B}}(F_t(x)), \end{aligned}$$
where \({\mathfrak {B}}(u) = \int _0^u B(r) dr\). Let us assume that \(B(0), B(1) \ne 0\), \({\mathfrak {B}}(u) > 0\) for all \(u \in (0,1)\), and \({\mathfrak {B}}(1) = 0\). In this case there is a solution F(x) to the stationary equation
$$\begin{aligned} F''(x) = \frac{d}{dx} {\mathfrak {B}}(F(x)) \end{aligned}$$(3.1)
which is a distribution function. By using F as initial condition for \(X_0\), the solution of the McKean–Vlasov equation has constant marginal law, \({{\mathbb {P}}}(X_t \le x) = F(x)\) for all t and x. By integrating (3.1) once and using that \(F(-\infty ) = F'(-\infty ) = {\mathfrak {B}}(0) = 0\) one obtains
$$\begin{aligned} F'(x) = {\mathfrak {B}}(F(x)). \end{aligned}$$(3.2)
(Here it becomes clear why \({\mathfrak {B}} \ge 0\) and \({\mathfrak {B}}(1) = 0\) are needed, as \(F'\) is a probability density.) We now apply the von Mises condition [7, Theorem 1.1.8], which states that F belongs to the Gumbel domain of attraction if
$$\begin{aligned} \lim _{x \uparrow x^*} \frac{F''(x)(1-F(x))}{F'(x)^2} = -1, \end{aligned}$$
where \(x^*\) denotes the right endpoint of F.
The mean value theorem yields \({\mathfrak {B}}(F(x)) = {\mathfrak {B}}(F(x)) - {\mathfrak {B}}(1) = B(r^*)(F(x)-1)\) for some \(r^* \in (F(x),1)\). Next, (3.2) implies that \(F'' = B(F){\mathfrak {B}}(F)\). Thus,
$$\begin{aligned} \frac{F''(x)(1-F(x))}{F'(x)^2} = \frac{B(F(x)) {\mathfrak {B}}(F(x)) (1-F(x))}{{\mathfrak {B}}(F(x))^2} = \frac{B(F(x))(1-F(x))}{B(r^*)(F(x)-1)} = -\frac{B(F(x))}{B(r^*)} \rightarrow -1 \end{aligned}$$
as \(x \uparrow x^*\), since \(F(x), r^* \rightarrow 1\) and B is continuous with \(B(1) \ne 0\).
This confirms that the hypotheses of Theorem 2.4 are satisfied. We deduce that Gumbel asymptotics hold for the N-particle systems,
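As a concrete instance of the stationary analysis above (with an illustrative drift choice of ours, not one from the paper), take \(B(r) = 1 - 2r\), so that \({\mathfrak {B}}(u) = u(1-u)\) satisfies \(B(0), B(1) \ne 0\), \({\mathfrak {B}} > 0\) on (0, 1), and \({\mathfrak {B}}(1) = 0\). The stationary equation \(F' = {\mathfrak {B}}(F)\) is then solved by the logistic distribution function, whose exponential tail lies in the Gumbel domain of attraction; the sketch below verifies the ODE numerically.

```python
import math

# Illustrative drift B(r) = 1 - 2r, so BB(u) = u(1 - u). The stationary
# equation F'(x) = BB(F(x)) is solved by the logistic CDF
# F(x) = 1/(1 + e^{-x}), since F' = F(1 - F) holds identically.
F = lambda x: 1.0 / (1.0 + math.exp(-x))
BB = lambda u: u * (1.0 - u)

# Compare a central-difference derivative of F with BB(F) on a grid.
h = 1e-5
err = max(abs((F(x + h) - F(x - h)) / (2 * h) - BB(F(x)))
          for x in [0.5 * i for i in range(-20, 21)])
print(err)   # numerically zero up to O(h^2)
```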
4 Key lemmas
As discussed in Sect. 1.1, the proof of Theorem 2.4 relies on counting the nonzero terms of the form (1.10) and bounding their size. This was done under a smallness assumption on T which allows us to truncate the chaos expansion (1.8) at a finite level. In order to perform this truncation without any smallness assumption on T, we have to partition the interval (0, T] into a sufficiently large number n of subintervals \((T_{\alpha -1},T_\alpha ]\), \(\alpha \in [n]\). Doing so leads to expressions analogous to (1.10) but more complex, and it is those expressions that we need to control. Lemmas 4.1 and 4.2 control the number of nonzero expressions. Lemmas 4.3 and 4.5 provide tail bounds on iterated stochastic integrals, which are used, among other things, to bound the size of the nonzero expressions and to control the error committed when truncating the chaos expansions. The proofs of the lemmas are given in Sect. 5.
We work with the notation and assumptions of Sect. 2. In particular, Assumptions 2.1 and 2.3 are in force. We also use the notation
and write \({\mathbb {L}}\) for the space of all progressively measurable processes Y with locally integrable moments, \(\int _0^t {{\mathbb {E}}}[ |Y_s|^p ] ds < \infty \) for all \(t \in {{\mathbb {R}}}_+\) and \(p \in {{\mathbb {N}}}\).
We fix a family of progressively measurable processes \({G}^{{i}{j}} \in {\mathbb {L}}\), \({i},{j}\in [N]\), such that \({G}^{{i}{j}}\) is adapted to the filtration \(({{\mathcal {F}}}_t^{\{{i},{j}\}})_{t \ge 0}\), and introduce the iterated integral notation
for any \(k \in {{\mathbb {N}}}\) and any multiindices \({\varvec{i}} = ({i}_1,\ldots ,{i}_k) \in [N]^k\) and \({{\varvec{j}}} = ({j}_1,\ldots ,{j}_k) \in [N]^k\). Our first key lemma is the following, where later on the random variable \(\Psi \) will be instantiated as products of indicators as in (1.10).
Lemma 4.1
(criteria for zero expectation) Assume for all \(V \subset [N]\), \({i}\in V\), \({j}\notin V\) that
Let \(T \ge 0\) and \(n \in {{\mathbb {N}}}\). Consider a nondecreasing finite sequence \(0 = T_0 \le \cdots \le T_n = T\) and fix natural numbers \(k_1,\ldots ,k_n \in {{\mathbb {N}}}\). For each \(\alpha \in [n]\), fix two \(k_\alpha \)-tuples
Finally, let \(K \subset [N]\) and consider a bounded \({{\mathcal {F}}}_T^K\)-measurable random variable \(\Psi \). Assume at least one of the following conditions is satisfied:
-
(i)
there exist some \(\beta \in [n]\) and \(\ell _0 \in [k_{\beta }]\) such that
$$\begin{aligned} {i}_{\beta , \ell _0} \notin K \cup \{{j}_{\beta , 1}, \ldots , {j}_{\beta , \ell _0-1} \} \cup \bigcup _{\alpha = \beta + 1}^n \{{j}_{\alpha , 1},\ldots ,{j}_{\alpha , k_\alpha } \}, \end{aligned}$$(4.3)where \(\{{j}_{\beta , 1}, \ldots , {j}_{\beta , \ell _0-1} \}\) is regarded as the empty set when \(\ell _0 = 1\),
-
(ii)
one has
$$\begin{aligned} {j}_{1, k_1} \notin K \cup \{{j}_{1,1}, \ldots , {j}_{1,k_1-1} \} \cup \bigcup _{\alpha = 2}^n \{{j}_{\alpha ,1},\ldots ,{j}_{\alpha , k_\alpha } \}. \end{aligned}$$
Then
The criteria (i) and (ii) in Lemma 4.1 for zero expectation are of a combinatorial nature involving index set membership. The following lemma counts the number of ways in which these conditions can fail, thereby bounding the number of nonzero terms.
Lemma 4.2
(counting lemma) Fix natural numbers \(n, N, \kappa , k_1,\ldots ,k_n\). The number of ways we can pick a subset \(K \subset [N]\) with \(|K| = \kappa \) along with tuples \(\varvec{i}_{\alpha }, \varvec{j}_{\alpha } \in [N]^{k_{\alpha }}\) for all \(\alpha \in [n]\) such that both properties (i) and (ii) of Lemma 4.1 fail to hold is bounded by
where \(S = k_1 + \cdots + k_n\).
We next develop bounds on iterated stochastic integrals. The following lemma will allow us to truncate the chaos expansions of ratios \(Z^N_t / Z^N_s\) at levels that do not need to increase with N to comply with given error tolerances. Note that the lemma gives an upper bound that is summable in m only if \(t-s\) is sufficiently small. This is the reason we are forced to partition [0, T] into subintervals when proving Theorem 2.4 without any smallness assumption on T.
Lemma 4.3
(first iterated integral \(L^p\) estimate) Let
where \(M^N\) is defined in (2.4) and it is understood that \(I^N_1(s,t) = M^N_t - M^N_s\). Then, for any \(N,m, p \in {\mathbb {N}}\), any \(T \in (0,\infty )\), and all \(s, t \in \left[ 0, \, T\right] \) we have
where the constant C(T) only depends on T and the bounds from Assumptions 2.1–2.3.
Remark 4.4
Note that we only consider \(L^{2p}\) norms for positive integers p, which is all that is needed later on. This is why Assumption 2.3 only involves even integer moments.
The proof of Lemma 4.3 relies on the following sharp iterated integral estimate, valid for any continuous local martingale M and any \(p \in [1,\infty )\), which follows from [3, Theorem 1] on noting that \(1 + \sqrt{1 + 1/(2p)} < 3\) for any such p:
where we write \(\langle M \rangle _{s,t} = \langle M \rangle _t - \langle M \rangle _s\) for brevity.
While (4.5) is instrumental for proving Lemma 4.3, it cannot be used to bound the iterated integrals appearing in (4.4), which involve several different local martingales. In order to control the nonzero terms of the form (4.4) we will instead use a weaker estimate obtained by repeated application of the BDG and Hölder inequalities. Fortunately this is sufficient thanks to the sharp control on the number of nonzero terms afforded by Lemmas 4.1 and 4.2. The following general estimate for iterated stochastic integrals involving several continuous local martingales serves this purpose, and it is also used in the proof of Lemma 4.1 as well as to control linearization errors when reducing from general drift coefficients B to linear ones in the proof of the main result.
Lemma 4.5
(second iterated integral \(L^p\) estimate) For any set of \(k \in {{\mathbb {N}}}\) continuous local martingales \(M^1,\ldots ,M^k\) and any \(p \in (1,\infty )\) we have the estimate
We end this section with an algebraic estimate which will allow us to combine Lemma 4.2 and Lemma 4.5 to show that (1.7) indeed tends to zero as \(N \rightarrow \infty \).
Lemma 4.6
For any \(C \in (1,\infty )\), \(N, S \in {{\mathbb {N}}}\), one has the inequality
5 Proofs of the key lemmas
In this section we prove the lemmas presented in Sect. 4. We start with the proof of Lemma 4.5 because it is used in the proof of Lemma 4.1.
5.1 Proof of Lemma 4.5
We prove the lemma by induction. The base case \(k=1\) follows from the sharp BDG inequality (7) in [3] and Hölder’s inequality. For the induction step, we assume that the inequality holds for any \(p>1\) with k replaced by \(k-1\). Applying the sharp BDG inequality, Hölder’s inequality, Doob’s maximal inequality (e.g. Theorem 5.1.3 in [5] with p replaced by \(2p \ge 2\) and so \(q \le 2\)), and finally the induction hypothesis yields
5.2 Proof of Lemma 4.1
We will need the following two auxiliary lemmas on conditioning, the proofs of which are a trivial modification of the proof of Lemma 2.1.4 in [18].
Lemma 5.1
For any Brownian motion W, two processes \(a, b \in {\mathbb {L}}\), and a \(\sigma \)-algebra \({{\mathcal {G}}}\) such that a(s) and W(s) are \({{\mathcal {G}}}\)-measurable for \(s \le t\), one has
Lemma 5.2
For any Brownian motion W, a process \(a \in {\mathbb {L}}\), and a \(\sigma \)-algebra \({{\mathcal {G}}}\) such that W is independent of \({{\mathcal {G}}}\), one has
Notice that these lemmas can be applied for a and b being either the processes \(G^{ij}\), which belong to \({\mathbb {L}}\) by definition, or the iterated integrals \(I_{{\varvec{i}},{{\varvec{j}}}}^{N}(s, \, t)\) for \({\varvec{i}} = ({i}_1,\ldots ,{i}_k) \in [N]^k\) and \({{\varvec{j}}} = ({j}_1,\ldots ,{j}_k) \in [N]^k\), which also belong to \({\mathbb {L}}\). (Recall that these iterated integrals are defined in (4.1).) The latter can be seen by applying Lemma 4.5 with \(M^{\ell }_t = \int _{0}^{t}G_s^{i_{\ell }j_{\ell }}dW_s^{i_{\ell }}\) for all \(\ell \in [k]\) and then Hölder’s inequality.
Assume now that condition (i) is satisfied. Let \(\beta \in [n]\) be the largest index such that (4.3) holds for some \(\ell _0 \in [k_{\beta }]\), and then let \(\ell _0\) be the smallest index for which this happens. Now define
Maximality of \(\beta \) implies that \({i}_{\alpha , \ell } \in V\) for all \(\alpha \ge \beta + 1\) and all \(\ell \in [k_\alpha ]\). Moreover, by definition of V we have \(j_{\alpha , \ell } \in V\) for all \(\alpha \ge \beta + 1\) and all \(\ell \in [k_\alpha ]\). Thus every index appearing in \(\varvec{i}_\alpha \) or \(\varvec{j}_\alpha \) for \(\alpha \ge \beta + 1\) belongs to V. As a result, \(I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1}, T_{\alpha })\) is \({{\mathcal {F}}}_T^V\)-measurable for all \(\alpha \ge \beta + 1\). Since \(K \subset V\), \(\Psi \) is also \({{\mathcal {F}}}_T^V\)-measurable. Finally, for \(\alpha \le \beta - 1\) we have that \(I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1}, T_{\alpha })\) is \({{\mathcal {F}}}_{T_{\beta -1}}\)-measurable. We conclude that
It remains to show that
and this will rely on repeated application of Lemma 5.1. Note that \(j_{\beta , \ell } \in V\) for all \(\ell \le \ell _0 - 1\) by definition of V. Moreover, minimality of \(\ell _0\) implies that \({i}_{\beta , \ell } \in V\) for all \(\ell \le \ell _0-1\). For \(\ell \) in this range, starting with \(\ell = 1\), we may therefore apply Lemma 5.1 iteratively with
where \(\varvec{i}_\beta ^{(\ell )} = ({i}_{\beta , \ell }, \ldots , {i}_{\beta , k_{\beta }})\) and \(\varvec{j}_\beta ^{(\ell )} = ({j}_{\beta , \ell }, \ldots , {j}_{\beta , k_{\beta }}),\) to obtain
The right-hand side of the last display is zero. Indeed, we have
where \(W^{{i}_{\beta , \ell _0}}\) is independent of \({{\mathcal {F}}}_T^V\) because \(i_{\beta , \ell _0} \notin V\) due to (4.3). We therefore deduce from Lemma 5.2 that
This yields (5.1) as required.
Next, assume that condition (ii) is satisfied. In addition, we may assume that condition (i) does not hold since otherwise we would fall in the case just treated. We then define
and observe that \({i}_{\alpha , \ell } \in V\) for all \(\alpha \in [n]\) and all \(\ell \in [k_\alpha ]\) (since condition (i) does not hold), and that \({j}_{\alpha , \ell } \in V\) for all \(\alpha \in [n]\) and all \(\ell \in [k_\alpha ]\) except if \((\alpha , \ell ) = (1, k_1)\) (by definition of V and since (ii) holds). In particular, \(I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1}, T_{\alpha })\) is \({{\mathcal {F}}}_T^V\)-measurable for all \(\alpha \ge 2\), and as before \(\Psi \) is \({{\mathcal {F}}}_T^V\)-measurable as well. Thus
The same iterative application of Lemma 5.1 as before, but now with \({{\mathcal {G}}}= {{\mathcal {F}}}_T^V\) and using that \(W^{{i}_{1,\ell }}_t\), \(t \le T\), is \({{\mathcal {F}}}_T^V\)-measurable for all \(\ell \in [k_1]\) and that \({G}^{{i}_{1,\ell }{j}_{1,\ell }}_{t}\), \(t \le T\), is \({{\mathcal {F}}}_T^V\)-measurable for all \(\ell \in [k_1-1]\), leads to
The conditional expectation on the right-hand side is equal to zero for all \(t_{k_1} \le T\), thanks to (4.2) and the fact that \(i_{1, k_1} \in V\) and \(j_{1, k_1} \notin V\). This completes the proof of the lemma.
5.3 Proof of Lemma 4.2
First we pick the subset \(K = \{i_{0,1}, \ldots , i_{0,\kappa }\} \subset [N]\), which can be done in exactly \(\left( {\begin{array}{c}N\\ \kappa \end{array}}\right) \) ways. Next, we pick the coordinates of the vectors \({{\varvec{j}}_{1}} = ({j}_{1, 1}, \ldots ,{j}_{1, k_1}), \ldots , {{\varvec{j}}_{n}} = ({j}_{n, 1}, \ldots ,{j}_{n, k_n})\). There are N possible choices for each of the first \(k_1 - 1\) coordinates \({j}_{1, 1}, \ldots ,{j}_{1, k_1-1}\) of \({{\varvec{j}}_{1}}\), and also N choices for each of the \(k_\alpha \) coordinates of \({{\varvec{j}}_\alpha }\) for \(\alpha \in \{2, \ldots , n\}\). Therefore, we can pick all these coordinates in \(N^{k_1 - 1 + k_2 + \ldots + k_n} = N^{S - 1}\) ways. Then, the \(k_1\)-th coordinate \(j_{1,k_1}\) of \({{\varvec{j}}_1}\) needs to be taken equal to either one of the other \(k_1 - 1 + k_2 + k_3 + \cdots + k_n = S - 1\) coordinates we have already picked or one of the \(\kappa \) elements of \(K = \{i_{0,1}, \ldots , i_{0,\kappa }\}\), as otherwise the second condition (ii) of Lemma 4.1 will be satisfied. This can be done in at most \(S - 1 + \kappa \) ways. Hence, there are at most \(\left( {\begin{array}{c}N\\ \kappa \end{array}}\right) (\kappa + S - 1)N^{S - 1}\) ways to pick the subset K of [N] and all the coordinates of \({{\varvec{j}}_1}, \ldots , {{\varvec{j}}_n}\). Finally, we pick the coordinates of the vectors \({\varvec{i}_1},\ldots , {\varvec{i}_n}\), where for each \(\beta \in [n]\) and \(\ell \in [k_\beta ]\) we must take
so that the first condition (i) of Lemma 4.1 will fail to hold. This can be done in at most \(u(\beta , \ell ) = \kappa + \ell - 1 + k_{\beta +1} + k_{\beta +2} + \ldots + k_{n}\) ways. Observing that \(u(\beta , \ell )\) takes every integer value between \(u(n, 1) = \kappa \) and \(u(1, k_1) = \kappa + S - 1\) exactly once, we see that the coordinates of \({\varvec{i}_1}, \ldots , {\varvec{i}_n}\) can be picked in at most \(\kappa (\kappa + 1) \cdots (\kappa + S - 1)\) ways. Therefore, the number of ways in which the entire selection of the elements of K and the coordinates of \({\varvec{i}_1}, \ldots , {\varvec{i}_n}\) and \({{\varvec{j}}_1}, \ldots , {{\varvec{j}}_n}\) can be done is at most
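The counting argument can be verified by brute force for tiny parameter values. The following sketch transcribes conditions (i) and (ii) of Lemma 4.1 as stated above, enumerates all selections for which both fail, and checks the count against the product \(\left( {\begin{array}{c}N\\ \kappa \end{array}}\right) (\kappa + S - 1)N^{S-1}\,\kappa (\kappa +1)\cdots (\kappa +S-1)\) derived in this proof:

```python
from itertools import combinations, product
from math import comb

def split(flat, ks):
    # split a flat index tuple into blocks of sizes k_1, ..., k_n
    out, pos = [], 0
    for k in ks:
        out.append(flat[pos:pos + k])
        pos += k
    return out

def cond_i_holds(K, i_tuples, j_tuples):
    # condition (i) of Lemma 4.1
    n = len(i_tuples)
    for beta in range(n):
        for l0 in range(len(i_tuples[beta])):
            forbidden = set(K) | set(j_tuples[beta][:l0])
            for alpha in range(beta + 1, n):
                forbidden |= set(j_tuples[alpha])
            if i_tuples[beta][l0] not in forbidden:
                return True
    return False

def cond_ii_holds(K, j_tuples):
    # condition (ii) of Lemma 4.1
    forbidden = set(K) | set(j_tuples[0][:-1])
    for alpha in range(1, len(j_tuples)):
        forbidden |= set(j_tuples[alpha])
    return j_tuples[0][-1] not in forbidden

def count_both_fail(N, kappa, ks):
    # brute-force count of selections where both (i) and (ii) fail
    S = sum(ks)
    idx, total = range(1, N + 1), 0
    for K in combinations(idx, kappa):
        for flat_j in product(idx, repeat=S):
            j_tuples = split(flat_j, ks)
            if cond_ii_holds(K, j_tuples):
                continue
            for flat_i in product(idx, repeat=S):
                if not cond_i_holds(K, split(flat_i, ks), j_tuples):
                    total += 1
    return total

def lemma_bound(N, kappa, ks):
    # binom(N, kappa) * (kappa + S - 1) * N^(S-1) * kappa(kappa+1)...(kappa+S-1)
    S = sum(ks)
    rising = 1
    for t in range(S):
        rising *= kappa + t
    return comb(N, kappa) * (kappa + S - 1) * N ** (S - 1) * rising

assert count_both_fail(3, 1, (1, 1)) <= lemma_bound(3, 1, (1, 1))
assert count_both_fail(3, 1, (2,)) <= lemma_bound(3, 1, (2,))
```

The enumeration grows like \(N^{2S}\), so this check is only feasible for very small N, n, and \(k_\alpha \).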
5.4 Proof of Lemma 4.3
For \(m = 0\) the inequality holds trivially so we can assume that \(m \ge 1\). Applying (4.5) with \(M = M^N\) yields
Next, Hölder’s inequality and the fact that the distribution of \(\Delta B^{i,N}\) is the same for all i since our system is exchangeable give
Letting C be the upper bound on the derivative of \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) afforded by Assumption 2.3, we obtain
Using this in (5.3) yields
We may now use Assumption 2.3 to deduce that for any integer \(p > 1\),
Plugging this into (5.2) we obtain
A calculation using Stirling’s approximation yields \(((pm)!)^{1/2p} / m! < (2pm )^{-m/2}(8e^2p)^{m}\), and substituting this into (5.4) finally leads to
This shows that the desired result holds with \(C(T) = 24Ce^2\sqrt{\sup _{0 \le u \le T}K(u)}\).
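The Stirling-type inequality \(((pm)!)^{1/2p} / m! < (2pm)^{-m/2}(8e^2p)^{m}\) invoked in the last step can be sanity-checked numerically over a grid of small p and m (a sketch; the inequality in fact holds with considerable slack):

```python
import math

def lhs(p, m):
    # ((pm)!)^(1/(2p)) / m!
    return math.factorial(p * m) ** (1.0 / (2 * p)) / math.factorial(m)

def rhs(p, m):
    # (2pm)^(-m/2) * (8 e^2 p)^m
    return (2.0 * p * m) ** (-m / 2.0) * (8.0 * math.e ** 2 * p) ** m

# verify the claimed inequality on a small grid of positive integers
for p in range(1, 6):
    for m in range(1, 9):
        assert lhs(p, m) < rhs(p, m)
```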
5.5 Proof of Lemma 4.6
We recall the identity
By the binomial theorem and Leibniz’s rule for the derivative of a product of functions, we have the estimate
Combining (5.5) and (5.6), plugging in \(x = (C/N)^{1-1/\log N}\), noting that this value of x is upper bounded by Ce/N since \(C>1\), and finally using that \(1 + Ce/N \le \exp (Ce/N)\) and that \(\left( eC\right) ^{S - i + 1} < \left( eC\right) ^{S + 1}\) since \(eC> e > 1\), we obtain the desired inequality.
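The numerical fact used here, that \(x = (C/N)^{1-1/\log N}\) is bounded by Ce/N for \(C > 1\) (with log the natural logarithm, via the identity \(x = (C/N)\,e^{1 - \log C/\log N}\)), is easy to check:

```python
import math

def x_val(C, N):
    # the substitution x = (C/N)^(1 - 1/log N) from the proof (natural log)
    return (C / N) ** (1.0 - 1.0 / math.log(N))

# x <= C e / N for C > 1 and N >= 2, as claimed in the proof
for C in (1.5, 2.0, 10.0, 100.0):
    for N in (2, 5, 10, 100, 10 ** 4):
        assert x_val(C, N) <= C * math.e / N
```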
6 Proof of Theorem 2.4
We now prove Theorem 2.4. The setup of Sect. 2 will be used. In particular the objects \(\mu ^N\), \(M^N\), \(\Delta B^{i,N}\) in (2.3)–(2.5) as well as the density process \(Z^N = \exp (M^N - \frac{1}{2} \langle M^N \rangle )\) will be referred to freely. Assumptions 2.1 and 2.3 are in force. The induced measure \({{\mathbb {Q}}}^N\), the normalizing constants \(a^N_T,b^N_T\), and the limiting distribution function \(\Gamma _T\) are as in the statement of the theorem. The time point T is fixed throughout.
We must prove (2.7). It suffices to do this for \(x \in {{\mathbb {R}}}\) such that \(\Gamma _T(x) > 0\). Indeed, suppose this has been done and consider x such that \(\Gamma _T(x) = 0\). Because all extreme value distributions are continuous, for any \(\varepsilon > 0\) there is \(x' > x\) such that \(0< \Gamma _T(x') < \varepsilon \), and thus
Since \(\varepsilon > 0\) was arbitrary, the left-hand side converges to \(\Gamma _T(x) = 0\) as \(N \rightarrow \infty \). We thus pick x such that \(\Gamma _T(x) > 0\) and set out to prove that as \(N \rightarrow \infty \),
where for brevity we introduce the notation
The proof is divided into several steps.
Step 1: partitioning the time interval. Chaos expansions of \(Z^N\) are at the core of the proof, and to get sufficient control on the convergence of these expansions we partition the interval (0, T] into n subintervals \((T_{\alpha -1},T_\alpha ]\), \(\alpha \in [n]\), of equal length \(T_\alpha - T_{\alpha -1} = T/n\). We choose n large enough that \(C(T) \sqrt{T/n} < 1/2\), where C(T) is the constant in Lemma 4.3, and then keep n fixed for the remainder of the proof. We now observe the identity
where \(\delta _{\alpha \beta }\) is the Kronecker delta, thus \(\delta _{\alpha \beta }=1\) if \(\alpha = \beta \) and \(\delta _{\alpha \beta }=0\) otherwise. This yields
where
To prove the theorem it suffices to show that \(A^N_\alpha \rightarrow 0\) as \(N \rightarrow \infty \) for each \(\alpha \in [n]\). We thus fix any such \(\alpha \) and set out to prove that \(A^N_\alpha \rightarrow 0\).
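The requirement \(C(T)\sqrt{T/n} < 1/2\) in Step 1 amounts to taking \(n > 4\,C(T)^2\,T\). A minimal sketch, with hypothetical values of C(T) and T chosen purely for illustration:

```python
import math

def min_subintervals(C_T, T):
    # smallest n with C(T) * sqrt(T / n) < 1/2, i.e. n > 4 C(T)^2 T
    return math.floor(4.0 * C_T ** 2 * T) + 1

# hypothetical values of C(T) and T, for illustration only
C_T, T = 2.0, 1.0
n = min_subintervals(C_T, T)
assert C_T * math.sqrt(T / n) < 0.5           # the Step 1 requirement holds
assert n == 1 or C_T * math.sqrt(T / (n - 1)) >= 0.5   # and n is minimal
```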
Step 2: controlling the tails of the chaos expansions uniformly in N. Let \(\varepsilon > 0\) be arbitrary. We will show by induction that there are positive integers \(m_1, m_2, \ldots , m_{\alpha }\), which do not depend on N, such that for \(\gamma = 1,\ldots , \alpha + 1\) we have
with the convention that an empty product is equal to one. The base case \(\gamma = 1\) holds trivially because the right-hand side is then just equal to \(|A^N_\alpha |\). Suppose now that for some \(\gamma \in [\alpha ]\) we have determined positive integers \(m_1,\ldots ,m_{\gamma -1}\) such that (6.1) holds. We will find \(m_\gamma \) such that (6.1) is true with \(\gamma \) replaced by \(\gamma +1\).
To this end, decompose the chaos expansion of \(Z^N_{T_\gamma } / Z^N_{T_{\gamma -1}}\) as
As will become clear shortly, the infinite series converges in \(L^2\) thanks to Lemma 4.3 and the fact that \(T_{\gamma }-T_{\gamma -1} = T/n\) is sufficiently small. Plugging this into the induction hypothesis (6.1) we get
The third term on the right-hand side of (6.2) is bounded by
Note that \(| \prod _{\beta = \gamma +1}^\alpha ( Z_{T_\beta }^N / Z_{T_{\beta - 1}}^N - \delta _{\alpha \beta })|\) is bounded by \((Z^N_{T_\alpha } + Z^N_{T_{\alpha -1}}) / Z^N_{T_\gamma }\) if \(\gamma \le \alpha -1\), and by one if \(\gamma =\alpha \). The martingale property of \(Z^N\) thus implies that the conditional expectation above is bounded by two. Using also Hölder’s inequality, the triangle inequality, and finally Lemma 4.3, we bound the expression in the preceding display by
Thanks to the choice of n in Step 1, the right-hand side is bounded by
We now simply choose \(m_\gamma \) large enough that this expression is less than \(\varepsilon \). Plugging this back into (6.2) yields (6.1) with \(\gamma \) replaced by \(\gamma +1\). This completes the induction step and shows that (6.1) holds for all \(\gamma =1,\ldots ,\alpha +1\). In particular, taking \(\gamma =\alpha +1\) we obtain
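The choice of \(m_\gamma \) can be sketched as follows, assuming (as the bounds above suggest) that the quantity to be made smaller than \(\varepsilon \) is a geometric tail with ratio \(q = C(T)\sqrt{T/n} < 1/2\) guaranteed by Step 1:

```python
def truncation_level(eps, q):
    # smallest m with sum_{k > m} q^k = q^(m+1) / (1 - q) < eps, for 0 < q < 1
    m = 0
    while q ** (m + 1) / (1.0 - q) >= eps:
        m += 1
    return m

# with the worst-case ratio q = 1/2 allowed by Step 1
m_gamma = truncation_level(1e-3, 0.5)
assert 0.5 ** (m_gamma + 1) / 0.5 < 1e-3
```

The point, as in the text, is that \(m_\gamma \) depends only on \(\varepsilon \) and the ratio, not on N.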
Step 3: reduction to linear drift. We now linearize the drift function \(B(t, {\varvec{x}}_{[0,t]}, r)\) with respect to its third argument. We write \(D_3 B(t, {\varvec{x}}_{[0,t]}, r)\) for the derivative with respect to r and define for simplicity the process
Note that \(D_3B^i\) is adapted to the filtration \(({{\mathcal {F}}}^{\{i\}}_t)_{t \ge 0}\) generated by \((X^i,W^i)\). We also write \(H^i_t(\mu ) = \int g(t, X^i_{[0,t]}, {\varvec{y}}_{[0,t]}) \mu (d{\varvec{y}})\) for any signed measure \(\mu \) on \(C({{\mathbb {R}}}_+)\). We then have the Taylor formula
where \(R^{i,N}\) is a process which is uniformly bounded in terms of the bound on the second derivative of \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) given by Assumption 2.1. We now define local martingales
and iterated integrals
We will prove that there exists a constant C, which does not depend on N, such that
To prove this, we expand the products and use the triangle inequality to bound the left hand side by
where the sum ranges over all \((k_1,\ldots , k_\alpha )\) such that \(k_\beta \in [m_{\beta }]\cup \{0\}\) for \(\beta < \alpha \) and \(k_{\alpha }\in [m_\alpha ]\). On each summand in the above expression we apply the identity
and then use the triangle inequality along with Hölder’s inequality to get
Thus it suffices to bound each of the products in (6.7) by a constant times \(1/\sqrt{N}\). Since each of these products has at least one factor with \(i_\beta = 1\), this will follow directly from the estimates
and
Once these estimates have been proved, (6.6) follows.
To prove (6.8)–(6.9) we first derive \(L^p\) estimates for the quadratic variations of \({\widetilde{M}}^N\) and \(M^N - {\widetilde{M}}^N\). Let C be a uniform bound on the first and second derivatives of \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) as given by Assumption 2.1, and recall that Assumption 2.3 gives
for any positive integer p and \(t \in {{\mathbb {R}}}_+\). Therefore using Hölder’s inequality we obtain, for any positive integer p,
and
To prove (6.8) we apply Lemma 4.5 to the iterated integral \(\widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta )\) and combine this with (6.10) to get
To prove (6.9) we observe that \(I_{k_\beta }^N(T_{\beta - 1}, T_\beta ) - \widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta )\) can be written as the sum of \(2^{k_\beta } - 1\) terms, each having the form
where \(Y^{\ell } = M^N - \widetilde{M}^N\) for at least one \(\ell \) and \(Y^{\ell } = \widetilde{M}^N\) for the remaining \(\ell \). By first applying Lemma 4.5 and then (6.10) and (6.11) we get
(Here we used (6.10) for each \(Y^{\ell }\) that equals \(\tilde{M}^N\) and (6.11) for each \(Y^{\ell }\) that equals \(M^N - \tilde{M}^N\), and the \(1/\sqrt{N}\) factor emerged because there is at least one factor of the latter kind.) By summing and using the triangle inequality we finally obtain (6.9).
To summarize, we have now proved (6.8)–(6.9), thus showing that each of the products in (6.7) is bounded by a constant times \(1/\sqrt{N}\). This in turn yields (6.6) as desired.
We end Step 3 by combining (6.6) and (6.3) to get
The key point is that the iterated integrals \({\widetilde{I}}^N_m(T_{\beta -1},T_\beta )\) are defined in terms of the local martingale \({\widetilde{M}}^N\) in (6.4) which, unlike \(M^N\), depends linearly on \(\mu ^N-\mu \). In a sense, all nonlinear dependence on \(\mu ^N-\mu \) has been absorbed into the vanishing term \(C/\sqrt{N}\).
Step 4: expanding the iterated integrals. Our starting point is now (6.12), where we recall that \(\alpha \) is fixed, \(\varepsilon >0\) is arbitrary, and \(m_1,\ldots ,m_\alpha , C\) do not depend on N. Therefore, to show that \(A^N_\alpha \rightarrow 0\) as \(N \rightarrow \infty \), it is enough to show that the expectation in (6.12) tends to zero as \(N \rightarrow \infty \). We now pave the way by expanding the sums and products in (6.12) to bring us into a position where the results of Sect. 4 can be applied.
We first expand the product indexed by \(\beta \) to bound the expectation in (6.12) by
where the sum ranges over all \((k_1,\ldots , k_\alpha )\) such that \(k_\beta \in [m_{\beta }]\cup \{0\}\) for \(\beta < \alpha \) and \(k_{\alpha }\in [m_\alpha ]\). It suffices to show that each summand in (6.13) vanishes as \(N \rightarrow \infty \), so we fix a tuple \((k_1,\ldots ,k_\alpha )\) and focus on the corresponding expectation.
The next step is to insert the identity \({\varvec{1}}_{\{X_{T}^{i} \le x_N\}} = 1 - {\varvec{1}}_{\{X_{T}^{i} > x_N\}}\) and expand the product to write the expectation as
The purpose of the substitution \({\varvec{1}}_{\{X_{T}^{i} \le x_N\}} = 1 - {\varvec{1}}_{\{X_{T}^{i} > x_N\}}\) is to allow us to use the fact that \(\{X_{T}^{i} > x_N\}\) are independent events whose probabilities are of order 1/N.
We proceed to expand the iterated integrals \(\widetilde{I}_{k_{\beta }}^N(T_{\beta - 1}, T_{\beta })\). In view of (6.4) and the definition of \(H_s^i(\mu ^N_s - \mu _s)\) and \(\mu ^N_s\) we have
where
for all \(i, j \in [N]\). Plugging (6.15) and (6.16) into the definition of \(\widetilde{I}_{k_{\beta }}^N(T_{\beta - 1}, T_{\beta })\), see (6.5), gives
where we use the iterated integral notation (4.1) of Sect. 4. The product of iterated integrals appearing in (6.14) can then be written
where the sum extends over all \(\alpha \)-tuples \((({\varvec{i}}_1,{{\varvec{j}}}_1),\ldots ,({\varvec{i}}_\alpha ,{{\varvec{j}}}_\alpha ))\) consisting of pairs \(({\varvec{i}}_{\beta },{{\varvec{j}}}_{\beta })\) in \([N]^{k_\beta } \times [N]^{k_\beta }\). Substituting this representation turns the right-hand side of (6.14) into
whose absolute value is bounded by
We are now finally in a position where the results of Sect. 4 can be applied to show that (6.18) tends to zero as \(N\rightarrow \infty \). In particular, for small values of \(\kappa \), an overwhelming number of expectations in (6.18) will be zero, while for large values of \(\kappa \) we can exploit the smallness of the probabilities \({{\mathbb {P}}}(X_{T}^{i_{0\ell }} > x_N)\).
Step 5: application of key lemmas. Our focus is on showing that (6.18) tends to zero as \(N \rightarrow \infty \), and we recall that \(\alpha \) and \(k_1,\ldots ,k_\alpha \) are fixed.
We first aim to apply Lemma 4.1 to assert that a large number of the expectations in (6.18) are in fact zero. We thus fix \(\kappa \in [N]\) and instantiate the lemma with \(G^{ij}\) as in (6.16), the time points \(T_0,\ldots ,T_\alpha \), the natural numbers \(k_1,\ldots ,k_\alpha \), the \(k_\beta \)-tuples \(\varvec{i}_\beta , {\varvec{j}}_\beta \in [N]^{k_\beta }\) for \(\beta \in [\alpha ]\), the subset \(K = \{i_{01},\ldots ,i_{0\kappa }\} \subset [N]\), and the bounded \({{\mathcal {F}}}^K_T\)-measurable random variable \(\Psi = \prod _{\ell =1}^{\kappa } {\varvec{1}}_{\{X_{T}^{i_{0\ell }} > x_N\}}\). We must verify the conditions of Lemma 4.1. It is clear that for each i, j, \(G^{ij}\) is adapted to \(({{\mathcal {F}}}^{\{i,j\}}_t)_{t \ge 0}\) and, thanks to Assumption 2.3 and the uniform boundedness of \(D_3B^i\), belongs to \({\mathbb {L}}\). Indeed, a brief calculation yields
for any \(p \in {{\mathbb {N}}}\) and \(t \in {{\mathbb {R}}}_+\), where C is a uniform bound on \(D_3B^i\) and K(t) comes from Assumption 2.3. Moreover, using that the \((X^i,W^i)\) are mutually independent and \(D_3B^i\) is adapted to \(({{\mathcal {F}}}^{\{i\}}_t)_{t \ge 0}\), one verifies that (4.2) holds whenever \(V \subset [N]\) and \(i \in V\), \( j \notin V\).
Lemma 4.1 now tells us that the expectation in (6.18) vanishes whenever the subset \(K=\{i_{01},\ldots ,i_{0\kappa }\}\) and tuples \(\varvec{i}_1,\ldots ,\varvec{i}_\alpha , {\varvec{j}}_1, \ldots , {\varvec{j}}_\alpha \) satisfy at least one of the conditions (i)–(ii) of the lemma. Thanks to Lemma 4.2, for each \(\kappa \in [N]\) there are at most
terms for which this is not the case, where we write \(S = k_1 + \ldots + k_{\alpha }\). We claim, and will prove below, that each of these nonzero terms admits the bound
for a constant C that does not depend on N. Combining these two facts we upper bound (6.18) by
Thanks to Lemma 4.6, this is in turn bounded by
which tends to zero as \(N \rightarrow \infty \). Tracing backwards, we deduce that (6.18) and hence (6.14) tends to zero as well. This is true for any choice of \((k_1,\ldots ,k_\alpha )\), showing that (6.13) tends to zero. As a result, we see from (6.12) that \(\limsup _{N\rightarrow \infty } |A^N_\alpha | \le \alpha \varepsilon \) and thus \(A^N_\alpha \rightarrow 0\) since \(\varepsilon > 0\) was arbitrary. We recall from Step 1 that it was enough to obtain this for any \(\alpha \in [n]\) in order to prove the theorem.
It still remains to establish (6.20). Applying Hölder’s inequality with exponents \(p_N=\log N\) and \(q_N = (1-1/\log N)^{-1}\) gives
Using Lemma 4.5 with \(k = k_{\beta }\) and \(M_{t}^{\ell } = \int _0^{t}G^{i_{\beta \ell },j_{\beta \ell }}_s dW_s^{i_{\beta \ell }}\) for \(\ell \in [k_{\beta }]\) we bound
for any \(p \in {{\mathbb {N}}}\). Moreover, from Hölder’s inequality and (6.19), and recalling also that \(T_\beta - T_{\beta -1} ~=~ T/n\) (see Step 1), we get
where in the last step we used Stirling’s approximation and where C (which as per our conventions may change from one occurrence to the next) does not depend on p or N. We deduce that
Choosing \(p = \alpha \lceil p_N \rceil \) we apply the above bounds to obtain
All that remains in order to establish (6.20) is to show that \({{\mathbb {P}}}( X_T > x_N) \le C/N\). But this follows from the fact that \((1 - {{\mathbb {P}}}(X_T> x_N))^N = {{\mathbb {P}}}(\max _{i \le N} X^i_T \le a^N_T x + b^N_T) \rightarrow \Gamma _T(x) > 0\) by assumption, so that \(N {{\mathbb {P}}}(X_T> x_N) \le - N \log (1-{{\mathbb {P}}}(X_T > x_N)) \le C\) for some constant C that does not depend on N. This completes the proof of (6.20), and of the theorem.
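The elementary inequality \(p \le -\log (1-p)\) for \(p \in [0,1)\) that drives this last step, and the resulting boundedness of \(N\,{{\mathbb {P}}}(X_T > x_N)\) when \((1-{{\mathbb {P}}}(X_T > x_N))^N\) stays bounded away from zero, can be illustrated numerically (with a hypothetical tail probability of order 1/N):

```python
import math

# elementary bound p <= -log(1 - p) on [0, 1)
for p in (1e-8, 1e-3, 0.1, 0.5, 0.99):
    assert p <= -math.log1p(-p)

# illustration with hypothetical tail probabilities p_N = c / N:
# (1 - p_N)^N stays bounded away from zero, and N * p_N stays bounded
c = 2.0
for N in (10, 100, 10 ** 4):
    p_N = c / N
    assert (1.0 - p_N) ** N > 0.1
    assert N * p_N <= -N * math.log1p(-p_N) <= 2.0 * c
```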
References
Ben Arous, G., Brunaud, M.: Méthode de Laplace: étude variationnelle des fluctuations de diffusions de type “champ moyen”. Stoch. Stoch. Rep. 31(1–4), 79–144 (1990)
Bo, L., Capponi, A.: Systemic risk in interbanking networks. SIAM J. Financial Math. 6(1), 386–424 (2015)
Carlen, E., Kree, P.: Lp estimates on iterated stochastic integrals. Ann. Probab. 19(1), 354–368 (1991)
Carmona, R., Fouque, J., Sun, L.: Mean field games and systemic risk. Commun. Math. Sci. 13(4), 911–933 (2015)
Cohen, S.N., Elliott, R.J.: Stochastic Calculus and Applications, volume 5 of Probability and its Applications. Springer, Cham, second edition (2015)
Dawson, D.A., Gärtner, J.: Large deviations from the McKean–Vlasov limit for weakly interacting diffusions. Stochastics 20(4), 247–308 (1987)
de Haan, L., Ferreira, A.: Extreme Value Theory: An Introduction. Springer Series in Operations Research and Financial Engineering. Springer, New York (2007)
Fernholz, E.R.: Stochastic portfolio theory, volume 48 of Applications of Mathematics. Stochastic Modelling and Applied Probability (New York). Springer-Verlag, New York (2002)
Gärtner, J.: On the McKean–Vlasov limit for interacting diffusions. Math. Nachr. 137, 197–248 (1988)
Ginibre, J.: Statistical ensembles of complex, quaternion, and real matrices. J. Math. Phys. 6, 440–449 (1965)
Jabir, J.-F.: Rate of propagation of chaos for diffusive stochastic particle systems via Girsanov transformation. arXiv:1907.09096 (2019)
Jiao, Y., Kolliopoulos, N.: Well-posedness of a system of SDEs driven by jump random measures. arXiv:2102.03918 (2021)
Jourdain, B., Reygner, J.: Propagation of chaos for rank-based interacting diffusions and long time behaviour of a scalar quasilinear parabolic equation. Stoch. Partial Differ. Equ. Anal. Comput. 1(3), 455–506 (2013)
Jourdain, B., Reygner, J.: Capital distribution and portfolio performance in the mean-field Atlas model. Ann. Finance 11(2), 151–198 (2015)
Kolli, P., Shkolnikov, M.: SPDE limit of the global fluctuations in rank-based models. Ann. Probab. 46(2), 1042–1069 (2018)
Lacker, D.: On a strong form of propagation of chaos for McKean-Vlasov equations. Electron. Commun. Probab., 23:Paper No. 45, 11 (2018)
Lacker, D.: Hierarchies, entropy, and quantitative propagation of chaos for mean field diffusions. arXiv:2105.02983 (2021)
Ledger, S.: Particle systems and stochastic PDEs on the half-line. Ph.D. thesis, University of Oxford (2016)
McKean, H.P., Jr.: A class of Markov processes associated with nonlinear parabolic equations. Proc. Nat. Acad. Sci. USA 56, 1907–1911 (1966)
Resnick, S.I.: Extreme values, regular variation and point processes. Springer Series in Operations Research and Financial Engineering. Springer, New York (2008). Reprint of the 1987 original
Rider, B.: A limit theorem at the edge of a non-Hermitian random matrix ensemble, Random matrix theory. volume 36, pp. 3401–3409 (2003)
Rigollet, P.: High-Dimensional Statistics. Graduate lecture notes, Chapter 1. MIT OpenCourseWare, Massachusetts Institute of Technology. https://ocw.mit.edu/courses/18-s997-high-dimensional-statistics-spring-2015/resources/mit18_s997s15_chapter1/ (2015)
Shkolnikov, M.: Large systems of diffusions interacting through their ranks. Stoch. Process. Appl. 122(4), 1730–1747 (2012)
Sznitman, A.-S.: Topics in propagation of chaos. In École d’Été de Probabilités de Saint-Flour XIX—1989, volume 1464 of Lecture Notes in Math., pages 165–251. Springer, Berlin (1991)
Tracy, C.A., Widom, H.: Level-spacing distributions and the Airy kernel. Commun. Math. Phys. 159(1), 151–174 (1994)
Funding
Open Access funding provided by Carnegie Mellon University
The authors would like to thank Dan Lacker for valuable input, in particular for communicating the elegant argument presented in Remark 2.9. Several insightful comments by an anonymous referee are gratefully acknowledged. This work has been partially supported by the National Science Foundation under grant NSF DMS-2206062.
Kolliopoulos, N., Larsson, M. & Zhang, Z. Propagation of chaos for maxima of particle systems with mean-field drift interaction. Probab. Theory Relat. Fields 187, 1093–1127 (2023). https://doi.org/10.1007/s00440-023-01213-9