1 Introduction

This paper is concerned with the large-population asymptotics of the maxima of certain real-valued diffusive particle systems \(X^{1,N},\ldots ,X^{N,N}\) with mean-field interaction through the drifts. Specifically, we are interested in large-N limits of

$$\begin{aligned} \max _{i \le N} \frac{X^{i,N}_T - b^N_T}{a^N_T}, \end{aligned}$$
(1.1)

where \(a^N_T\) and \(b^N_T\) are suitable normalizing constants. The particle dynamics are specified as follows, specializing the setup of [11]. For each \(N \in {{\mathbb {N}}}\) the N-particle system evolves according to a stochastic differential equation of the form

$$\begin{aligned} dX^{i,N}_t &= A(t, X^{i,N}_{[0,t]}) \left( B\left( t, X^{i,N}_{[0,t]}, \int g(t, X^{i,N}_{[0,t]}, {\varvec{y}}_{[0,t]}) \mu ^N_t(d{\varvec{y}}) \right) dt + dW^i_t \right) \\ &\quad + C(t, X^{i,N}_{[0,t]}) dt \end{aligned}$$
(1.2)

for \(i=1,\ldots ,N\), with i.i.d. initial conditions \(X^{i,N}_0 \sim \nu _0\) where \(\nu _0\) is a given probability measure on \({{\mathbb {R}}}\). We use the notation \({\varvec{x}}_{[0,t]} = (x(s))_{s \in [0,t]}\) for any continuous function \({\varvec{x}}\), and for each \(t \in {{\mathbb {R}}}_+\) we let

$$\begin{aligned} \mu ^N_t = \frac{1}{N} \sum _{i=1}^N \delta _{X^{i,N}_{[0,t]}} \end{aligned}$$

denote the empirical measure of the particle trajectories up to time t. The coefficients \(A(t, {\varvec{x}}_{[0,t]})\), \(B(t, {\varvec{x}}_{[0,t]}, r)\), \(C(t, {\varvec{x}}_{[0,t]})\) and the interaction function \(g(t, {\varvec{x}}_{[0,t]}, {\varvec{y}}_{[0,t]})\) are defined for all \(t \in {{\mathbb {R}}}_+\), \({\varvec{x}}, {\varvec{y}} \in C({{\mathbb {R}}}_+)\), and \(r \in {{\mathbb {R}}}\). Precise assumptions are discussed below. Finally, \(W^i\), \(i \in {{\mathbb {N}}}\), is family of independent standard Brownian motions. We emphasize that there is no interaction in the volatility coefficient A. This is crucial for the methods used in this paper.

Under suitable assumptions, classical propagation of chaos [9, 19, 24] states that for any fixed number \(k \in {{\mathbb {N}}}\), the first k particles \((X^{1,N}, \ldots , X^{k,N})\) converge jointly as \(N \rightarrow \infty \) to k independent copies \((X^1,\ldots ,X^k)\) of the solution to the McKean–Vlasov equation

$$\begin{aligned} \begin{aligned} dX_t&= A(t, X_{[0,t]}) \left( B\left( t, X_{[0,t]}, \int g(t, X_{[0,t]}, {\varvec{y}}_{[0,t]}) \mu _t(d{\varvec{y}}) \right) dt + dW_t \right) \\&\quad +C(t, X_{[0,t]}) dt \\ \mu _t&= \text {Law}(X_{[0,t]}) \end{aligned} \end{aligned}$$
(1.3)

with initial condition \(\mu _0 = \nu _0\). A rigorous version of this statement that fits our current setup is given in [11, Theorem 2.1], where convergence takes place in total variation and comes with quantitative bounds on the distance between the k-tuple from the N-particle system and the limiting k-tuple; see also [16].

At an intuitive level, propagation of chaos means that for large N the interacting particle system behaves approximately like a system of i.i.d. particles. This intuition suggests that the large-N asymptotics of the normalized maxima in (1.1) should match the asymptotics of the normalized maxima of the independent copies \(X^i\) of the solution of (1.3),

$$\begin{aligned} \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T}. \end{aligned}$$
(1.4)

Because they are i.i.d., the latter fall within the framework of classical extreme value theory; see e.g. [7, 20] for an introduction. This intuition is flawed, however, because propagation of chaos only makes statements about a fixed number k of particles, while the maximum \(\max _{i\le N} X^{i,N}_T\) depends on all the particles. Furthermore, there are lower bounds on how similar \((X^{1,N},\ldots ,X^{k,N})\) and \((X^1,\ldots ,X^k)\) can be in general. In a simple Gaussian example, it is shown in [17] that the relative entropy between the two is bounded below by a constant times \((k/N)^2\). In particular, if \(k \rightarrow \infty \) and k/N remains bounded away from zero, convergence does not take place. Barriers of this kind have prevented us from deriving statements about normalized maxima as corollaries of standard results on propagation of chaos.

Our main result nonetheless shows, under suitable assumptions, that the normalized maxima of the N-particle systems do behave asymptotically like those of an i.i.d. system. In this sense, one has propagation of chaos of normalized maxima. The following statement is slightly informal; Theorem 2.4 gives the precise version.

Theorem 1.1

Suppose Assumptions 2.1 and 2.3 below are satisfied. Fix \(T \in (0,\infty )\) and suppose that for some normalizing constants \(a^N_T,b^N_T\) the normalized maxima (1.4) of the i.i.d. system converge weakly to a nondegenerate distribution \(\Gamma _T\) on \({{\mathbb {R}}}\) as \(N \rightarrow \infty \). Then the normalized maxima (1.1) of the interacting particle systems also converge to \(\Gamma _T\) as \(N \rightarrow \infty \).

The precise assumptions are discussed in Sect. 2, along with additional comments, and examples are developed in Sect. 3. Here we only highlight three points, deferring the details to Sects. 2 and 3.

First, a key motivating example and application of Theorem 1.1 comes from a class of models known as rank-based diffusions, which were first studied by [8] in the context of stochastic portfolio theory. In a rank-based model with drift interaction, the N-particle system evolves as

$$\begin{aligned} dX^{i,N}_t = B\left( \frac{1}{N} \text {rank}_t(X^{i,N}_t) \right) dt + dW^i_t, \quad i=1,\ldots ,N, \end{aligned}$$

where \(\text {rank}_t(X^{i,N}_t)\) denotes the rank of the ith particle within the population: \(\text {rank}_t(X^{i,N}_t) = k\) if \(X^{i,N}_t\) is the kth largest particle, with a suitable convention in case of ties. The factor 1/N anticipates a passage to the large-N limit. Rank-based diffusions of this type have been studied extensively and their mean-field asymptotics are well understood. However, the asymptotics of the largest particle, of particular interest in the applied context, were previously unknown. As shown in Example 3.2, our main result is applicable and allows us to fill this gap.
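To make this concrete, here is a minimal Euler–Maruyama simulation of the rank-based system and its maximum at time T; the drift choice \(B(r) = 1 - 2r\), the parameter values, and the function name are our own illustrative choices, not taken from the examples below (which use the smallest-first rank convention of Example 3.2).

```python
import numpy as np

rng = np.random.default_rng(0)

def max_of_rank_based_system(N, T=1.0, n_steps=200, B=lambda r: 1.0 - 2.0 * r):
    """Euler-Maruyama scheme for dX^i = B(rank_i / N) dt + dW^i with
    i.i.d. N(0,1) initial conditions; rank 1 = smallest particle."""
    dt = T / n_steps
    X = rng.standard_normal(N)
    for _ in range(n_steps):
        ranks = np.argsort(np.argsort(X)) + 1   # ties are broken by array order
        X = X + B(ranks / N) * dt + np.sqrt(dt) * rng.standard_normal(N)
    return X.max()

print(max_of_rank_based_system(1000))
```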

Second, note that Theorem 1.1 only asserts one-dimensional marginal convergence at single time points T. Nonetheless, as discussed in Sect. 2, in some cases one expects joint marginal convergence of the form

$$\begin{aligned} \left( \max _{i \le N} \frac{X^{i,N}_{T_1} - b^N_{T_1}}{a^N_{T_1}}, \ldots , \max _{i \le N} \frac{X^{i,N}_{T_n} - b^N_{T_n}}{a^N_{T_n}} \right) \Rightarrow \Gamma _{T_1} \otimes \cdots \otimes \Gamma _{T_n} \text { as } N \rightarrow \infty , \end{aligned}$$

for any \(T_1< \ldots < T_n\), where the limit is in the sense of weak convergence toward a product measure with nondegenerate components. No continuous process has finite-dimensional marginal distributions of this form, so this precludes convergence at the level of continuous processes.

Third, as part of the hypotheses of Theorem 1.1 we assume that the normalized maxima (1.4) of the i.i.d. system admit a nondegenerate limit law \(\Gamma _T\). Classical extreme value theory asserts that up to affine transformations, \(\Gamma _T\) must belong to a one-parameter family of extreme value distributions consisting of the Fréchet, Gumbel, and Weibull distributions. It is obviously of interest to characterize \(\Gamma _T\) in terms of the data ABCg, and \(\nu _0\). This question is the subject of ongoing work, and falls outside the scope of this paper. Nonetheless, in the examples in Sect. 3, we are able to verify this domain of attraction hypothesis by hand.

Let us mention the large body of work on the extreme eigenvalue statistics of random matrices, which is related to our paper in that those eigenvalues can in many cases be described by mean-field interacting diffusions. For example, the eigenvalues of a GUE (Gaussian Unitary Ensemble) random matrix are described by Dyson Brownian motion. However, the largest eigenvalue, suitably normalized, converges in distribution to the Tracy–Widom law [25], which is different from the extreme value distributions that can arise in our framework. Another random matrix model is the Ginibre ensemble [10], whose normalized spectral radius converges to the Gumbel law [21]. Although this is the same limit law that we observe in our examples, the interaction among the eigenvalues of the Ginibre ensemble is not covered by our setup, in essence because we exclude interaction in the diffusion coefficients.

The rest of the paper is organized as follows. First, we finish the introduction with an outline of some of the main steps and ideas of the proof of the main theorem. Then, in Sect. 2, we give precise statements of our assumptions and results. We also reproduce an argument due to D. Lacker (personal communication) which vastly simplifies the proof under suitable Lipschitz assumptions; see Remark 2.9. Examples and applications are discussed in Sect. 3. Section 4 collects key lemmas needed for the proof of the main theorem. These lemmas are proved in Sect. 5. Finally, the main theorem is proved in Sect. 6. We will frequently use the notation

$$\begin{aligned}{}[n] = \{1,\ldots ,n\} \text { for any } n \in {{\mathbb {N}}}= \{1,2,\ldots \}, \end{aligned}$$

and \({{\mathbb {R}}}_+ = [0,\infty )\). We will allow generic constants C to vary from line to line, and occasionally indicate the dependence on parameters by writing C(n), C(pn), etc.

1.1 Outline of the proof of Theorem 1.1

The remainder of this introduction contains an outline of some of the main steps and ideas of the proof of Theorem 1.1. To simplify the discussion we take \(A = 1\), \(C = 0\), and \(B(t, {\varvec{x}}_{[0,t]}, r) = r\). We fix \(T \in (0,\infty )\) and note that the theorem will be proved if we show that for any \(x \in {{\mathbb {R}}}\),

$$\begin{aligned} {{\mathbb {P}}}\left( \max _{i \le N} \frac{X^{i,N}_T - b^N_T}{a^N_T} \le x \right) - {{\mathbb {P}}}\left( \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T} \le x \right) \rightarrow 0 \text { as } N \rightarrow \infty . \end{aligned}$$
(1.5)

Here \(X^i\), \(i \in {{\mathbb {N}}}\), are i.i.d. copies of the solution of (1.3) with driving Brownian motions \(W^i\), and all objects are defined on a filtered probability space \((\Omega ,{{\mathcal {F}}},({{\mathcal {F}}}_t)_{t\ge 0},{{\mathbb {P}}})\).

The first observation, going back at least to [1] and also used by [6, 11, 16], is that the structure of the particle dynamics (1.2) allows us to construct for each N a (locally) equivalent measure \({{\mathbb {Q}}}^N \sim _\text {loc} {{\mathbb {P}}}\) under which \((X^1,\ldots ,X^N)\) acquires the law of \((X^{1,N},\ldots ,X^{N,N})\). This is accomplished by the Radon–Nikodym density process

$$\begin{aligned} Z^N_t = \frac{d{{\mathbb {Q}}}^N|_{{{\mathcal {F}}}_t}}{d{{\mathbb {P}}}|_{{{\mathcal {F}}}_t}} = \exp \left( M^N_t - \frac{1}{2} \langle M^N \rangle _t \right) , \end{aligned}$$
(1.6)

where the local martingale \(M^N\) is given by

$$\begin{aligned} M^N_t = \sum _{i=1}^N \int _0^t \int g(s, X^i_{[0,s]}, {\varvec{y}}_{[0,s]}) \left( \mu ^N_s(d{\varvec{y}}) - \mu _s(d{\varvec{y}})\right) dW^i_s \end{aligned}$$

and where, by overloading notation, we set \(\mu ^N_t = \frac{1}{N} \sum _{i=1}^N \delta _{X^i_{[0,t]}}\). We may then re-express the left-hand side of (1.5) as

$$\begin{aligned} {{\mathbb {E}}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{ X^i_T \le x_N \}} \left( Z^N_T - 1\right) \right] , \quad \text {where}\; x_N = a^N_T x + b^N_T. \end{aligned}$$
(1.7)

The key point is that (1.7) is expressed in terms of the mutually independent processes \((X^i,W^i)\), \(i \in [N]\), while the dependence that exists among the particles in the original N-particle system is captured by the Radon–Nikodym derivative \(Z^N_T\). The proof of the theorem rests on a detailed analysis of how \(Z^N_T\) interacts with the indicators in (1.7), ultimately allowing us to “extract enough independence” to show that (1.7) tends to zero in the large-N limit. An analogous strategy of “extracting independence” through the above change of measure was used in [11], although the actual execution of this strategy is very different in our context.
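Spelling out the step leading to (1.7): since \((X^1,\ldots ,X^N)\) under \({{\mathbb {Q}}}^N\) has the same law as \((X^{1,N},\ldots ,X^{N,N})\) under \({{\mathbb {P}}}\),

$$\begin{aligned} {{\mathbb {P}}}\left( \max _{i \le N} X^{i,N}_T \le x_N \right) - {{\mathbb {P}}}\left( \max _{i \le N} X^i_T \le x_N \right) = {{\mathbb {E}}}\left[ Z^N_T \prod _{i=1}^N {\varvec{1}}_{\{ X^i_T \le x_N \}} \right] - {{\mathbb {E}}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{ X^i_T \le x_N \}} \right] , \end{aligned}$$

which is exactly (1.7).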

Iterating the SDE satisfied by the stochastic exponential \(Z^N\) leads to the formal chaos expansion

$$\begin{aligned} Z^N_T = 1 + \sum _{m=1}^\infty \int _0^T \int _0^{t_1} \cdots \int _0^{t_{m-1}} dM^N_{t_m} \cdots dM^N_{t_1}. \end{aligned}$$
(1.8)
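In the simplest scalar case \(M_t = \sigma W_t\), the m-fold iterated integral in (1.8) is an explicit Hermite polynomial in \(W_T\), which makes the expansion easy to sanity-check numerically. The following sketch is ours, with hypothetical parameter values:

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e

rng = np.random.default_rng(1)
sigma, T, m0 = 0.7, 1.0, 8          # all parameters hypothetical
W_T = np.sqrt(T) * rng.standard_normal(100_000)

# Exact stochastic exponential Z_T = exp(sigma W_T - sigma^2 T / 2).
Z = np.exp(sigma * W_T - 0.5 * sigma**2 * T)

# The m-fold iterated integral of dM = sigma dW over [0, T] equals
# sigma^m T^{m/2} He_m(W_T / sqrt(T)) / m!  (He_m = probabilists' Hermite),
# so the truncated expansion is a Hermite series in W_T / sqrt(T).
coeffs = [(sigma * np.sqrt(T)) ** m / factorial(m) for m in range(m0 + 1)]
Z_trunc = hermite_e.hermeval(W_T / np.sqrt(T), coeffs)

print(np.mean(np.abs(Z - Z_trunc)))  # small once m0 is moderately large
```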

If T is sufficiently small, one can show that a truncated version of this expansion can be substituted for \(Z^N_T-1\) in (1.7) at the cost of an arbitrarily small error \(\varepsilon > 0\). Importantly, although the truncation level \(m_0\) (say) depends on \(\varepsilon \), it does not depend on N. We are thus left with showing that each of the remaining \(m_0\) terms tends to zero, that is, for each \(m \in [m_0]\),

$$\begin{aligned} {{\mathbb {E}}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{ X^i_T \le x_N \}} \int _0^T \int _0^{t_1} \cdots \int _0^{t_{m-1}} dM^N_{t_m} \cdots dM^N_{t_1} \right] \rightarrow 0 \text { as } N \rightarrow \infty . \end{aligned}$$
(1.9)

This is done by substituting \({\varvec{1}}_{\{ X^i_T \le x_N \}} = 1 - {\varvec{1}}_{\{ X^i_T > x_N \}}\) and expanding the product, as well as substituting the definition of \(M^N\) into the iterated integral and expanding using multilinearity. The result is a sum consisting of all terms of the form

$$\begin{aligned} \frac{1}{N^m} {{\mathbb {E}}}\left[ \prod _{r=1}^k {\varvec{1}}_{\{ X^{\ell _r}_T > x_N \}} \int _0^T G^{i_1 j_1}_{t_1} \int _0^{t_1} G^{i_2 j_2}_{t_2} \cdots \int _0^{t_{m-1}} G^{i_m j_m}_{t_m} dW^{i_m}_{t_m} \cdots dW^{i_1}_{t_1} \right] \end{aligned}$$
(1.10)

with \(k\in [N]\), \(\{\ell _1,\ldots ,\ell _k\} \subset [N]\), \((i_1,\ldots ,i_m) \in [N]^m\), and \((j_1,\ldots ,j_m) \in [N]^m\), and where the processes

$$\begin{aligned} G^{ij}_t = g(t, X^i_{[0,t]}, X^j_{[0,t]}) - \int g(t, X^i_{[0,t]}, {\varvec{y}}_{[0,t]}) \mu _t(d{\varvec{y}}) \end{aligned}$$

arise when the empirical measure \(\mu ^N_t\) is substituted into the definition of \(M^N\). We are now in a position to sketch the main ways in which we exploit the independence among the processes \((X^i,W^i)\), \(i \in {{\mathbb {N}}}\).

For each k, there are \(N^{2m} \left( {\begin{array}{c}N\\ k\end{array}}\right) \) terms of the form (1.10). Using iterated stochastic integral estimates, along with the independence of the \(X^i\), \(i \in {{\mathbb {N}}}\), and the fact that \({{\mathbb {P}}}(X_T > x_N) = O(1/N)\) due to the domain of attraction assumption, we show that each of these terms is bounded by

$$\begin{aligned} {{\mathbb {P}}}(X_T > x_N)^{k \left( 1-\frac{1}{\log N}\right) } \lceil \log N \rceil ^{m} = \left( \frac{O(1)}{N} \right) ^k \lceil \log N \rceil ^{m}. \end{aligned}$$

This is not enough to deduce (1.9), however, because it only produces the upper bound \(O(N^m \lceil \log N \rceil ^{m})\), which does not tend to zero with N. Nonetheless, a refined analysis shows that a large number of the terms (1.10) are in fact zero. Very roughly, this happens when there is a small overlap between the indices \(\{\ell _1,\ldots ,\ell _k\}\) and \(\{i_1,\ldots ,i_m,j_1,\ldots ,j_m\}\), in which case the expectation in (1.10) vanishes despite the presence of the indicators. A counting argument then shows that for each k, at most \(\left( {\begin{array}{c}N\\ k\end{array}}\right) k (k+1) \cdots (k+m) N^{m-1}\) terms remain. Using the earlier estimate to control these remaining terms finally yields the bound \(O(N^{-1} \lceil \log N \rceil ^{m})\) on the left-hand side of (1.9). This does tend to zero as \(N \rightarrow \infty \) and allows us to complete the proof.
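Schematically, and keeping track of the \(1/N^m\) prefactor in (1.10), the refined count combines with the term-wise bound as

$$\begin{aligned} \sum _{k=1}^N \left( {\begin{array}{c}N\\ k\end{array}}\right) k(k+1)\cdots (k+m) N^{m-1} \cdot \frac{1}{N^m} \left( \frac{O(1)}{N}\right) ^k \lceil \log N \rceil ^{m} = O\left( N^{-1} \lceil \log N \rceil ^{m}\right) , \end{aligned}$$

where the sum over k contributes only a constant depending on m; Lemma 4.6 below quantifies this.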

Counting the nonzero terms (1.10) and bounding their size constitute the heart of the proof. The key arguments involved are given as lemmas in Sect. 4. However, other parts of the proof also require substantial technical effort. In particular, work is required to (i) reduce from the case of general coefficients ABC to the simpler ones discussed above; (ii) obtain sufficiently strong iterated integral bounds to truncate the chaos expansion independently of N when T is small; and (iii) remove the smallness requirement on T. This leads to added complexity and explains why the full proof of Theorem 1.1 is rather long and technical.

2 Assumptions and main results

To give a precise description of our setup, we first introduce regularity and growth assumptions on the data ABCg.

Assumption 2.1

The coefficient functions \((t,{\varvec{x}}) \mapsto A(t, {\varvec{x}}_{[0,t]})\), \((t,{\varvec{x}},r) \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\), \((t,{\varvec{x}}) \mapsto C(t, {\varvec{x}}_{[0,t]})\) and the interaction function \((t,{\varvec{x}},{\varvec{y}}) \mapsto g(t, {\varvec{x}}_{[0,t]}, {\varvec{y}}_{[0,t]})\) are real-valued measurable functions on \({{\mathbb {R}}}_+ \times C({{\mathbb {R}}}_+)\), \({{\mathbb {R}}}_+ \times C({{\mathbb {R}}}_+) \times {{\mathbb {R}}}\), \({{\mathbb {R}}}_+ \times C({{\mathbb {R}}}_+)\), and \({{\mathbb {R}}}_+ \times C({{\mathbb {R}}}_+) \times C({{\mathbb {R}}}_+)\), respectively. They satisfy the following conditions:

  • A and C are uniformly bounded,

  • for every \(t \in {{\mathbb {R}}}_+\) and \({\varvec{x}} \in C({{\mathbb {R}}}_+)\), the function \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) is twice continuously differentiable, and its first and second derivatives are bounded uniformly in \((t, {\varvec{x}})\).

Remark 2.2

Note that \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) itself need not be bounded, only its first two derivatives. We thus cover examples with linear growth. Moreover, if the interaction function g is uniformly bounded, the growth properties of \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) become irrelevant.

By imposing further conditions we could appeal to known results on well-posedness of McKean–Vlasov equations to assert that (1.3) has a solution. Rather than doing this, we will assume existence directly (uniqueness is not actually required, so we do not assume it).

Assumption 2.3

Fix a probability measure \(\nu _0\) on \({{\mathbb {R}}}\) and assume that the McKean–Vlasov equation (1.3) admits a weak solution (X, W) with \(X_0 \sim \nu _0\). Construct (for instance as a countable product) a filtered probability space \((\Omega ,{{\mathcal {F}}},({{\mathcal {F}}}_t)_{t\ge 0},{{\mathbb {P}}})\) with a countable sequence \((X^i,W^i)\), \(i \in {{\mathbb {N}}}\), of independent copies of (X, W). Then, assume that there is a continuous function K(t) such that for all \(p \in {{\mathbb {N}}}\), \(t \in {{\mathbb {R}}}_+\), \(N \in {{\mathbb {N}}}\), and \(i,j \in [N]\), one has the moment bounds

$$\begin{aligned} {{\mathbb {E}}}\left[ g(t, X^i_{[0,t]}, X^j_{[0,t]})^{2p} \right] \le p!\, K(t)^p \end{aligned}$$
(2.1)

and

$$\begin{aligned} {{\mathbb {E}}}\left[ \left( \int g(t, X^i_{[0,t]}, {\varvec{y}}_{[0,t]}) ( \mu ^N_t - \mu _t)(d{\varvec{y}}) \right) ^{2p} \right] \le \frac{1}{N^p} p! \, K(t)^p. \end{aligned}$$
(2.2)

Sufficient conditions for the moment bounds (2.1)–(2.2) along with further discussion are given in Remark 2.7 below.

Let Assumptions 2.1 and 2.3 be in force. For each \(N \in {{\mathbb {N}}}\) we now use the processes \((X^i,W^i)\) to construct the N-particle systems by changing the probability measure. First define the N-particle empirical measure

$$\begin{aligned} \mu ^N_t = \frac{1}{N} \sum _{i=1}^N \delta _{X^i_{[0,t]}}. \end{aligned}$$
(2.3)

Next, define the (candidate) density process \(Z^N = \exp (M^N - \frac{1}{2}\langle M^N \rangle )\) where

$$\begin{aligned} M^N_t = \sum _{i=1}^N \int _0^t \Delta B^{i,N}_s dW^i_s \end{aligned}$$
(2.4)

and

$$\begin{aligned} \begin{aligned} \Delta B^{i,N}_t&= B\left( t,X_{[0,t]}^i,\int g(t, X_{[0,t]}^i, \varvec{y}_{[0,t]}) \mu _t^N(d\varvec{y})\right) \\&\quad - B\left( t,X_{[0,t]}^i,\int g(t, X_{[0,t]}^i, \varvec{y}_{[0,t]}) \mu _t(d\varvec{y})\right) . \end{aligned} \end{aligned}$$
(2.5)

Assumptions 2.1 and 2.3 imply that \({{\mathbb {E}}}[ \int _0^t (\Delta B^{i,N}_s)^2 ds] < \infty \) for all i and t, which ensures that \(M^N\) is a well-defined square-integrable martingale and hence that \(Z^N\) is a positive local martingale. We claim that \({{\mathbb {E}}}[Z^N_T] = 1\) for all \(T \in (0,\infty )\), so that \(Z^N\) is a true martingale. To see this, note that Lemma 4.3 implies that for any \(s < t \le T\) with \(t-s\) small enough, the chaos expansion

$$\begin{aligned} \frac{Z^N_t}{Z^N_s} = 1 + \sum _{m=1}^\infty \int _s^t \int _s^{t_1} \cdots \int _s^{t_{m-1}} dM^N_{t_m} \cdots dM^N_{t_1} \end{aligned}$$

converges in \(L^2\). Moreover, [3, Proposition 1] together with Assumption 2.3 imply that each iterated integral has expectation zero. As a result, \({{\mathbb {E}}}[Z^N_t / Z^N_s] = 1\) for all such \(s,t\), and iterating this over a sufficiently fine partition of \([0,T]\) implies \({{\mathbb {E}}}[Z^N_T]=1\) as claimed.

Since \(Z^N\) is a true martingale, it induces a locally equivalent probability measure \({{\mathbb {Q}}}^N \sim _\text {loc} {{\mathbb {P}}}\) under which the processes defined by

$$\begin{aligned} W^{i,N}_t = W^i_t - \int _0^t \Delta B^{i,N}_s ds \end{aligned}$$

are mutually independent standard Brownian motions. Thus under \({{\mathbb {Q}}}^N\) we find that \(X^1,\ldots ,X^N\) follow the N-particle dynamics (1.2),

$$\begin{aligned} dX^i_t &= A(t, X^i_{[0,t]}) \left( B\left( t, X^i_{[0,t]}, \int g(t, X^i_{[0,t]}, {\varvec{y}}_{[0,t]}) \mu ^N_t(d{\varvec{y}}) \right) dt+dW^{i,N}_t \right) \\ &\quad + C(t, X^i_{[0,t]}) dt. \end{aligned}$$
(2.6)

The following is the precise formulation of our main result.

Theorem 2.4

Suppose Assumptions 2.1 and 2.3 are satisfied and consider the laws \({{\mathbb {Q}}}^N\) constructed above. Fix \(T \in (0,\infty )\) and suppose that for some normalizing constants \(a^N_T,b^N_T\) the normalized maxima of the i.i.d. system converge weakly to a nondegenerate distribution function \(\Gamma _T\) on \({{\mathbb {R}}}\):

$$\begin{aligned} {{\mathbb {P}}}\left( \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T} \le x \right) \rightarrow \Gamma _T(x) \text { as } N \rightarrow \infty , \quad x \in {{\mathbb {R}}}. \end{aligned}$$

Then the normalized maxima of the interacting particle systems also converge to \(\Gamma _T\):

$$\begin{aligned} {{\mathbb {Q}}}^N\left( \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T} \le x \right) \rightarrow \Gamma _T(x) \text { as } N \rightarrow \infty , \quad x \in {{\mathbb {R}}}. \end{aligned}$$
(2.7)

Classical extreme value theory asserts that up to affine transformations, \(\Gamma _T\) must belong to a one-parameter family of extreme value distributions consisting of the Fréchet, Gumbel, and Weibull distributions. Our assumptions tend to preclude the heavy-tailed behavior that is characteristic of the Fréchet class.

Proposition 2.5

Let the assumptions of Theorem 2.4 be satisfied. Assume in addition that all moments of \(\nu _0\) are finite and one has the linear growth bound \(|B(t,{\varvec{x}}_{[0,t]}, 0)| \le c(1 + x^*_t)\) for all \(t \in [0,T]\), \({\varvec{x}} \in C({{\mathbb {R}}}_+)\), where the constant c may depend on T and we use the notation \(x^*_t = \sup _{s\le t}|x_s|\). Then \(\Gamma _T\) must belong to the Gumbel or Weibull family.

Proof

We allow c to change from one occurrence to the next. The assumptions imply the bound \(|B(t, X_{[0,t]}, r)| \le c(1 + X^*_t + |r|)\), which together with the uniform boundedness of A and C yields

$$\begin{aligned} |X_t| &\le |X_0| + c\int _0^t \left( 1 + X^*_s + \int \left| g(s, X_{[0,s]}, {\varvec{y}}_{[0,s]}) \right| \mu _s(d{\varvec{y}}) \right) ds \\ &\quad + \left| \int _0^t A(s, X_{[0,s]}) dW_s \right| . \end{aligned}$$

This in turn implies

$$\begin{aligned} X^*_t \le c \int _0^t X^*_s ds + J_t \end{aligned}$$

for the nondecreasing process

$$\begin{aligned} J_t = |X_0| + c\int _0^t \left( 1 + \int \left| g(s, X_{[0,s]}, {\varvec{y}}_{[0,s]}) \right| \mu _s(d{\varvec{y}}) \right) ds + \sup _{s \le t} \left| \int _0^s A(u, X_{[0,u]}) dW_u \right| . \end{aligned}$$

Pathwise application of Gronwall’s inequality then yields \(X^*_T \le e^{cT} J_T\). Because all moments of \(\nu _0\) are finite, A is uniformly bounded, and thanks to (2.1) of Assumption 2.3, all moments of \(J_T\) are finite. (For the stochastic integral term this uses the BDG inequalities.) Then so are the moments of \(X^*_T\), and then also of \(X_T\). However, if \(X_T\) were in the Fréchet domain of attraction it would have a regularly varying tail (see [7, Theorem 1.2.1]), implying that all sufficiently high moments are infinite. This excludes the Fréchet family. \(\square \)

Remark 2.6

Weak convergence is equivalent to convergence for all \(x \in {{\mathbb {R}}}\) where \(\Gamma _T\) is continuous. However, since all extreme value distributions are continuous, restricting to continuity points is redundant.

Theorem 2.4 asserts one-dimensional marginal convergence at single time points T. We do not prove full finite-dimensional marginal convergence in this paper, but let us nonetheless make the following observation. In certain examples, the random vectors \((X_{T_1},\ldots ,X_{T_n})\) with \(T_1<\ldots <T_n\) exhibit asymptotic independence. This means that each \(X_{T_\alpha }\), \(\alpha \in [n]\), belongs to the maximum domain of attraction of some extreme value distribution \(\Gamma _{T_\alpha }\) with normalizing constants \(a^N_{T_\alpha },b^N_{T_\alpha }\), and that the vector of normalized maxima converges to a product measure:

$$\begin{aligned} {{\mathbb {P}}}\left( \max _{i \le N} \frac{X^i_{T_1} - b^N_{T_1}}{a^N_{T_1}} \le x_1, \ldots , \max _{i \le N} \frac{X^i_{T_n} - b^N_{T_n}}{a^N_{T_n}} \le x_n \right) \rightarrow \Gamma _{T_1}(x_1) \cdots \Gamma _{T_n}(x_n) \end{aligned}$$
(2.8)

as \(N \rightarrow \infty \) for all \((x_1,\ldots ,x_n) \in {{\mathbb {R}}}^n\). Asymptotic independence is characterized by the condition

$$\begin{aligned} \frac{{{\mathbb {P}}}(X_{T_\alpha }> a^N_{T_\alpha } x_\alpha + b^N_{T_\alpha } \text { and } X_{T_\beta }> a^N_{T_\beta } x_\beta + b^N_{T_\beta } )}{{{\mathbb {P}}}(X_{T_\alpha } > a^N_{T_\alpha } x_\alpha )} \rightarrow 0 \end{aligned}$$

for all \(\alpha \ne \beta \) in [n] and all \(x_\alpha ,x_\beta \in {{\mathbb {R}}}\) such that \(\Gamma _{T_\alpha }(x_\alpha ) > 0\) and \(\Gamma _{T_\beta }(x_\beta ) > 0\); see [20, Proposition 5.27]. In particular, this is known to hold for multivariate Gaussian distributions with correlation in \((-1,1)\); see [20, Corollary 5.28]. Thus if X is a Gaussian process with non-trivial correlation function, then all finite-dimensional marginal distributions of the centered and scaled processes \(\max _{i \le N} (X^i_t - b^N_t)/a^N_t\) converge as \(N \rightarrow \infty \) to product distributions with nondegenerate components (specifically, affine transformations of Gumbel). No continuous process has finite-dimensional marginal distributions of this form, so this precludes convergence at the level of continuous processes. The Gaussian case is discussed further in Example 3.1. Whenever the i.i.d. particles \(X^i\), \(i \in {{\mathbb {N}}}\), satisfy the asymptotic independence property (2.8), it is natural to expect that the same is true for the interacting N-particle systems, although proving this is outside the scope of this paper.

We end this section with a few additional remarks.

Remark 2.7

(on Assumption 2.3) There is a large literature on well-posedness of McKean–Vlasov equations, providing a range of conditions under which a solution to (1.3) exists; see e.g. [9, 16, 19, 24]. Next, the moment bound (2.1) is satisfied if the centered random variables \(g(t, X^i_{[0,t]}, X^j_{[0,t]}) - \int g(t, X^i_{[0,t]}, \varvec{y})\mu _t(d\varvec{y})\) are bounded or conditionally (on \(X^i_{[0,t]}\)) sub-Gaussian with a uniformly bounded variance proxy (see e.g. [22] for a review of sub-Gaussianity). One can then also verify (2.2) by noticing that the 2p-th moment of

$$\begin{aligned}&\int g(t, X^i_{[0,t]}, \varvec{y}) (\mu _t^{N} - \mu _t )(d\varvec{y}) \\&\quad =\frac{1}{N}\sum _{j = 1}^{N}\left( g(t, X^i_{[0,t]}, X^j_{[0,t]}) - \int g(t, X^i_{[0,t]}, \varvec{y})\mu _t(d\varvec{y}) \right) \end{aligned}$$

can be controlled by that of

$$\begin{aligned} \frac{1}{N - 1}\sum _{\begin{array}{c} j = 1 \\ j \ne i \end{array}}^{N}\left( g(t, X^i_{[0,t]}, X^j_{[0,t]}) - \int g(t, X^i_{[0,t]}, \varvec{y})\mu _t(d\varvec{y}) \right) \end{aligned}$$
(2.9)

plus a term proportional to \(p! 3^p K(t)^p / N^{2p}\). Conditionally on \(X^i_{[0,t]}\), the \(N - 1\) summands in (2.9) are independent and identically distributed with zero mean. In the sub-Gaussian case these summands are also sub-Gaussian, so their average is sub-Gaussian with an \(O(\frac{1}{N})\) variance proxy, and the desired bound follows from [22, Lemma 1.4]. In the case of bounded summands, we instead apply Hoeffding’s inequality [22, Theorem 1.9] and then again [22, Lemma 1.4].
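For a bounded interaction such as \(g(t,{\varvec{x}}_{[0,t]},{\varvec{y}}_{[0,t]}) = {\varvec{1}}_{\{y_t \le x_t\}}\) from the rank-based example, the \(1/N^p\) scaling in (2.2) is easy to probe by simulation. The sketch below is our own illustration, with i.i.d. standard Gaussians standing in for the time-t marginals:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
p, n_mc = 2, 20_000

def scaled_moment(N):
    """Estimate N^p E[(int g d(mu^N - mu))^{2p}] for g(x, y) = 1{y <= x},
    with X^1, ..., X^N replaced by i.i.d. standard Gaussians."""
    X = rng.standard_normal((n_mc, N))
    emp_cdf = (X <= X[:, [0]]).mean(axis=1)   # integral of g against mu^N at X^1
    true_cdf = norm.cdf(X[:, 0])              # integral of g against mu at X^1
    return N ** p * np.mean((emp_cdf - true_cdf) ** (2 * p))

for N in (50, 100, 200, 400):
    print(N, scaled_moment(N))   # roughly constant in N, as (2.2) predicts
```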

Remark 2.8

(Non-i.i.d. initial conditions) Standard propagation of chaos is frequently formulated under weaker assumptions on the initial conditions of the N-particle systems than being i.i.d. A common assumption is that \((X_0^{1,N}, \ldots , X_0^{k,N})\) converges weakly to \((X_0^{1}, \ldots , X_0^{k})\) as \(N \rightarrow \infty \) for each \(k \in {\mathbb {N}}\), where \(X^i_0\), \(i \in {{\mathbb {N}}}\), is an i.i.d. sequence. Although we have not succeeded in proving our main result under this weaker assumption on the initial conditions, it is nonetheless possible to move slightly beyond the i.i.d. setting through an additional change of measure. Specifically, let \(\nu ^N_0\) (a probability measure on \({{\mathbb {R}}}^N\)) be the desired joint initial law of the N-particle system, and assume it is absolutely continuous with respect to the N-fold product measure \(\nu _0^{\otimes N}\), where as above \(\nu _0\) is the initial law of the limiting McKean–Vlasov SDE. We make the total variation type stability assumption that

$$\begin{aligned} \lim _{N \rightarrow \infty } \int _{{{\mathbb {R}}}^N} \left| \frac{d\nu ^N_0}{d\nu _0^{\otimes N}} - 1 \right| d\nu _0^{\otimes N} = 0. \end{aligned}$$

Letting \({{\mathbb {Q}}}^N\) be defined as before, we now obtain a new measure \(\widetilde{{{\mathbb {Q}}}}^N\) by using

$$\begin{aligned} {\widetilde{Z}}^N_0 = \frac{d\nu ^N_0}{d\nu _0^{\otimes N}}(X^1_0,\ldots ,X^N_0) \end{aligned}$$

as Radon–Nikodym derivative. This affects the initial law, but not the form of the particle dynamics. Then as \(N \rightarrow \infty \) we have

$$\begin{aligned} &\left| \widetilde{{{\mathbb {Q}}}}^N\left( \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T} \le x \right) - {{\mathbb {Q}}}^N\left( \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T} \le x \right) \right| \\ &\quad = \left| {{\mathbb {E}}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{ X^{i}_{T} \le x_N \}} \left( \widetilde{Z}_0^N - 1\right) \right] \right| \le \int _{{{\mathbb {R}}}^N} \left| \frac{d\nu ^N_0}{d\nu _0^{\otimes N}} - 1 \right| d\nu _0^{\otimes N} \rightarrow 0. \end{aligned}$$

This shows that the large-N asymptotics of the normalized maxima of the N-particle system are unaffected when the initial distribution is \(\nu ^N_0\) instead of \(\nu _0^{\otimes N}\).

Remark 2.9

(A coupling argument) D. Lacker has pointed out to us that a simple coupling argument yields our propagation of chaos result in the presence of constant volatility and Lipschitz drift. Although this does not lead to a proof of our main result (in particular, our key example of rank-based models is excluded due to discontinuous drifts; see Example 3.2), it is worth recording the argument here. Assume that the drift function B satisfies the Lipschitz condition

$$\begin{aligned} |B(x,\mu ) - B(y,\nu )| \le C(|x-y| + {{\mathcal {W}}}_p(\mu ,\nu )) \end{aligned}$$

for some constant C, all \(x,y \in {{\mathbb {R}}}\), and all probability measures \(\mu ,\nu \) with finite p-th moment. Here \({{\mathcal {W}}}_p(\mu ,\nu )\) is the p-Wasserstein distance between \(\mu \) and \(\nu \) for some fixed \(p \in [1,\infty )\). We let the N-particle system be given as the unique strong solution of the system of SDEs

$$\begin{aligned} dX^{i,N}_t = B(X^{i,N}_t, \mu ^N_t) dt + dW^i_t, \quad X^{i,N}_0 = \xi ^i, \quad i=1,\ldots ,N, \end{aligned}$$

where \(W^i\), \(i \in {{\mathbb {N}}}\), is a sequence of independent standard Brownian motions and \(\xi ^i\), \(i \in {{\mathbb {N}}}\), is a sequence of p-integrable i.i.d. initial conditions. For each i, let \(X^i\) be the unique strong solution of the McKean–Vlasov SDE

$$\begin{aligned} dX^i_t = B(X^i_t, \mu _t)dt + dW^i_t, \quad X^i_0 = \xi ^i, \quad \mu _t = \text {Law}(X^i_t), \end{aligned}$$

using the same Brownian motion and initial condition as for the N-particle systems. We then obtain

$$\begin{aligned} \max _{i \le N}|X_t^{i, N} - X_t^{i}|&= \max _{i \le N}\left| \int _0^t \left( B(X_s^{i, N}, \mu _s^N) - B(X_s^{i}, \mu _s) \right) ds\right| \\&\le C\int _0^t\max _{i \le N}|X_s^{i, N} - X_s^{i}|ds + C\int _0^T {{\mathcal {W}}}_p(\mu _{s}^N,\mu _{s})ds, \end{aligned}$$

and a Gronwall-type argument gives

$$\begin{aligned} \max _{i \le N}\left| X_T^{i, N} - X_T^{i}\right| \le Ce^{CT}\int _0^T{{\mathcal {W}}}_p(\mu _{s}^N,\mu _{s})ds. \end{aligned}$$

Consequently,

$$\begin{aligned}&{{\mathbb {E}}}\left[ \left| \max _{i \le N} \frac{X^{i,N}_T - b^N_T}{a^N_T} - \max _{i \le N} \frac{X^{i}_T - b^N_T}{a^N_T}\right| \right] \\&\quad \le \frac{1}{a^N_T}{{\mathbb {E}}}\left[ \max _{i \le N}\left| X_T^{i, N} - X_T^{i}\right| \right] \\&\quad \le \frac{Ce^{CT}}{a^N_T}\int _0^T{{\mathbb {E}}}\left[ {{\mathcal {W}}}_p(\mu _{s}^N, \mu _{s})\right] ds. \end{aligned}$$

Provided that \(\int _0^T{{\mathbb {E}}}\left[ {{\mathcal {W}}}_p(\mu _{s}^N, \mu _{s})\right] ds / a^N_T \rightarrow 0\) as \(N \rightarrow \infty \), our propagation of chaos result follows. This happens, for instance, in the Gaussian case where \(a^N_T\) behaves like \(1/\sqrt{\log N}\) (see Example 3.1 below) and \({{\mathbb {E}}}\left[ {{\mathcal {W}}}_p(\mu _{s}^N, \mu _{s})\right] \) behaves like \(N^{-\gamma }\) for some \(\gamma > 0\).
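In the Gaussian case of Example 3.1 the drift is Lipschitz, so the coupling can be implemented directly. The following sketch (ours; all parameter values hypothetical) drives both systems with common Brownian increments and compares the maximal discrepancy to the scaling constant \(a^N_T \asymp 1/\sqrt{2\log N}\):

```python
import numpy as np

rng = np.random.default_rng(3)
kappa, sigma, T, n_steps = 1.0, 1.0, 1.0, 200

def coupled_max_gap(N):
    """Run the interacting system and its McKean-Vlasov limit (m0 = 0) with the
    same Brownian increments and the same initial conditions."""
    dt = T / n_steps
    X_N = rng.standard_normal(N)     # i.i.d. N(0,1) initial conditions
    X = X_N.copy()
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(N)    # common noise: the coupling
        X_N = X_N - kappa * (X_N - X_N.mean()) * dt + sigma * dW
        X = X - kappa * X * dt + sigma * dW          # limit drift: -kappa (X - m0)
    return np.abs(X_N - X).max()

for N in (100, 1000, 10_000):
    a_N = 1.0 / np.sqrt(2 * np.log(N))   # order of the Gaussian scaling constant
    print(N, coupled_max_gap(N) / a_N)   # ratio decays as N grows
```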

3 Examples

We discuss two examples that illustrate the main result.

Example 3.1

(Gaussian particles) The following Gaussian particle system has been studied in a number of contexts, such as models for monetary reserves of banks [2, 4], and default intensities in large interbank networks [12, Example 2.2]. The N-particle system evolves according to the multivariate Ornstein–Uhlenbeck process

$$\begin{aligned} X_{t}^{i, N}&= X_{0}^{i} - \kappa \int _{0}^{t}\left( X_{s}^{i, N} - \frac{1}{N}\sum _{j=1}^NX_{s}^{j, N} \right) ds + \sigma W_t^{i}, \quad i=1,\ldots ,N, \end{aligned}$$

with i.i.d. \(N(m_0,\sigma _0^2)\) initial conditions. Here \(\kappa , m_0 \in {{\mathbb {R}}}\) and \(\sigma , \sigma _0 \in (0,\infty )\) are parameters. In our setting this example arises by taking \(A(t,{\varvec{x}}_{[0,t]}) = \sigma \), \(B(t,{\varvec{x}}_{[0,t]},r) = -\kappa (x_t - r)/\sigma \), \(C(t, {\varvec{x}}_{[0,t]}) = 0\), and \(g(t,{\varvec{x}}_{[0,t]},{\varvec{y}}_{[0,t]}) = y_t\). Clearly Assumption 2.1 is satisfied. The McKean–Vlasov equation (1.3) reduces to

$$\begin{aligned} dX_t = - \kappa \left( X_t - {{\mathbb {E}}}[X_t] \right) dt + \sigma dW_t. \end{aligned}$$

Taking expectations one obtains \({{\mathbb {E}}}[X_t] = {{\mathbb {E}}}[X_0] = m_0\) for all \(t \in {{\mathbb {R}}}_+\), showing that X is an Ornstein–Uhlenbeck process with constant mean \(m_0\) and time-t variance given by

$$\begin{aligned} \sigma _t^2 = \text {Var}(X_t) = e^{-2\kappa t}\sigma _0^2 + (1-e^{-2\kappa t}) \frac{\sigma ^2}{2\kappa }. \end{aligned}$$
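Indeed, writing \(v(t) = \text {Var}(X_t)\), Itô's formula together with \({{\mathbb {E}}}[X_t] = m_0\) gives the linear ODE

$$\begin{aligned} v'(t) = -2\kappa v(t) + \sigma ^2, \quad v(0) = \sigma _0^2, \end{aligned}$$

whose unique solution is the displayed formula.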

Letting \(X^i\), \(i \in {{\mathbb {N}}}\), be independent copies of X, we see that \(g(t, X^i_{[0,t]}, X^j_{[0,t]}) = X^j_t\) is Gaussian for all ij. Thus in view of Remark 2.7, Assumption 2.3 is satisfied.

Now, it is a well-known fact [7, Example 1.1.7] that the standard Gaussian distribution belongs to the maximum domain of attraction of the standard Gumbel distribution \(\Gamma (x) = \exp (-e^{-x})\) with normalizing constants

$$\begin{aligned} b^N = \frac{1}{a^N} = \sqrt{ 2\log N - \log \log N - \log (4\pi ) }. \end{aligned}$$

By normalizing, we see that \(X_T\) also belongs to the maximum domain of attraction of \(\Gamma \) for each T, with normalizing constants

$$\begin{aligned} a^N_T = \sigma _T a^N \text { and } b^N_T = m_0 + \sigma _T b^N. \end{aligned}$$

Indeed, since \((X^i_T - m_0)/\sigma _T\) is standard Gaussian we have

$$\begin{aligned} &{{\mathbb {P}}}\left( \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T} \le x \right) \\ &\quad = {{\mathbb {P}}}\left( \max _{i \le N} \frac{(X^i_T - m_0)/\sigma _T - b^N}{a^N} \le x \right) \rightarrow \exp (-e^{-x}) \text { as } N \rightarrow \infty . \end{aligned}$$

This shows that the hypotheses of Theorem 2.4 are satisfied, and we deduce that the same asymptotics hold for the N-particle systems,

$$\begin{aligned} {{\mathbb {P}}}\left( \max _{i \le N} \frac{X^{i,N}_T - b^N_T}{a^N_T} \le x \right) \rightarrow \exp (-e^{-x}) \text { as } N \rightarrow \infty . \end{aligned}$$
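The quality of this approximation for finite N can be gauged with a quick Monte Carlo experiment; the sketch below (ours, with hypothetical sample sizes) treats the standard Gaussian case \(m_0 = 0\), \(\sigma _T = 1\) and evaluates the empirical distribution function of the normalized maximum at \(x = 0\), where the Gumbel value is \(e^{-1}\):

```python
import numpy as np

rng = np.random.default_rng(4)

def normalized_max_cdf_at(x, N, n_mc=10_000, batch=100):
    """Empirical P(max_i (X_i - b_N)/a_N <= x) for N i.i.d. standard Gaussians."""
    b_N = np.sqrt(2 * np.log(N) - np.log(np.log(N)) - np.log(4 * np.pi))
    a_N = 1.0 / b_N
    hits = 0
    for _ in range(n_mc // batch):          # batching keeps memory use modest
        M = rng.standard_normal((batch, N)).max(axis=1)
        hits += np.sum((M - b_N) / a_N <= x)
    return hits / n_mc

for N in (100, 1000, 10_000):
    print(N, normalized_max_cdf_at(0.0, N), np.exp(-1.0))
```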

Lastly, X is a Gaussian process with correlation function

$$\begin{aligned} \text {Corr}(X_s,X_t) = \sqrt{ \frac{\alpha + e^{2\kappa s} - 1}{\alpha + e^{2\kappa t} - 1} }, \quad s < t, \end{aligned}$$

where \(\alpha = 2 \kappa \sigma _0^2 / \sigma ^2\). Thus \(\text {Corr}(X_s,X_t) \in (0,1)\) for all \(s \ne t\). The discussion in Sect. 2 implies that the finite-dimensional marginal distributions of X exhibit asymptotic independence, and that (2.8) holds for all \(n \in {{\mathbb {N}}}\) and \(T_1< \cdots < T_n\). In particular, there is no functional convergence in the space of continuous processes.

Example 3.2

(Rank-based diffusions) Consider the N-particle system evolving according to

$$\begin{aligned} dX^{i,N}_t = B\left( F^N_t( X^{i,N}_t) \right) dt + \sqrt{2} dW^i_t \end{aligned}$$

for \(i \in [N]\), where B(r) is a twice continuously differentiable function on [0, 1] and

$$\begin{aligned} F^N_t(x) = \frac{1}{N} \sum _{j=1}^N {\varvec{1}}_{\{X^{j,N}_t \le x\}} \end{aligned}$$

is the empirical distribution function. Such systems are called rank-based because the drift (and in more general formulations also the diffusion) of each particle depends on its rank within the population. Indeed, modulo tie-breaking, \(F^N_t( X^{i,N}_t) = k/N\) where \(k=1\) if \(X^{i,N}_t\) is the smallest particle, \(k=2\) if \(X^{i,N}_t\) is the second smallest, and so on. Rank-based systems have been studied extensively and play an important role in stochastic portfolio theory; see e.g. [8, 13,14,15, 23]. They are challenging to analyze in part because the drift is discontinuous as a function of the current state and the empirical measure (with the Wasserstein metric \({{\mathcal {W}}}_p\) for any \(p \ge 1\)), making e.g. the argument in Remark 2.9 inapplicable.

The above system fits into our setup by taking \(A(t,{\varvec{x}}_{[0,t]}) = \sqrt{2}\), \(B(t,{\varvec{x}}_{[0,t]},r) = B(r)\), \(C(t, {\varvec{x}}_{[0,t]}) = 0\), and \(g(t,{\varvec{x}}_{[0,t]},{\varvec{y}}_{[0,t]}) = {\varvec{1}}_{\{y_t \le x_t\}}\). Clearly Assumption 2.1 is satisfied. The limiting McKean–Vlasov equation takes the form

$$\begin{aligned} dX_t&= B( F_t(X_t) ) dt + \sqrt{2} dW_t, \\ F_t(x)&= {{\mathbb {P}}}(X_t \le x). \end{aligned}$$

The above setup is well-studied, and both the N-particle system and the McKean–Vlasov equation are well-posed [13, 23]. Since the interaction function g and drift coefficient B are both bounded, Assumption 2.3 is readily seen to be satisfied.

General criteria for verifying the domain of attraction assumption on X are not available. However, if X is stationary, more can be said. It is known [13, 23] that the distribution function \(F_t(x)\) satisfies the PDE

$$\begin{aligned} \partial _t F = \partial _{xx} F - \partial _x {\mathfrak {B}}(F), \end{aligned}$$

where \({\mathfrak {B}}(u) = \int _0^u B(r) dr\). Let us assume that \(B(0), B(1) \ne 0\), \({\mathfrak {B}}(u) > 0\) for all \(u \in (0,1)\), and \({\mathfrak {B}}(1) = 0\). In this case there is a solution F(x) to the stationary equation

$$\begin{aligned} F'' = {\mathfrak {B}}(F)' \end{aligned}$$
(3.1)

which is a distribution function. By using F as initial condition for \(X_0\), the solution of the McKean–Vlasov equation has constant marginal law, \({{\mathbb {P}}}(X_t \le x) = F(x)\) for all t and x. By integrating (3.1) once and using that \(F(-\infty ) = F'(-\infty ) = {\mathfrak {B}}(0) = 0\) one obtains

$$\begin{aligned} F' = {\mathfrak {B}}(F). \end{aligned}$$
(3.2)

(Here it becomes clear why \({\mathfrak {B}} \ge 0\) and \({\mathfrak {B}}(1) = 0\) are needed, as \(F'\) is a probability density.) We now apply the von Mises condition [7, Theorem 1.1.8], which states that F belongs to the Gumbel domain of attraction if

$$\begin{aligned} \lim _{x \rightarrow \infty } \frac{(1 - F(x)) F''(x)}{F'(x)^2} = -1. \end{aligned}$$

The mean value theorem yields \({\mathfrak {B}}(F(x)) = {\mathfrak {B}}(F(x)) - {\mathfrak {B}}(1) = B(r^*)(F(x)-1)\) for some \(r^* \in (F(x),1)\). Next, (3.2) implies that \(F'' = B(F){\mathfrak {B}}(F)\). Thus,

$$\begin{aligned} \frac{(1 - F(x)) F''(x)}{F'(x)^2} = \frac{(1-F(x))B(F(x))}{{\mathfrak {B}}(F(x))} = - \frac{B(F(x))}{B(r^*)} \rightarrow -\frac{B(1)}{B(1)} = -1 \text { as } x \rightarrow \infty . \end{aligned}$$

This confirms that the hypotheses of Theorem 2.4 are satisfied. We deduce that Gumbel asymptotics hold for the N-particle systems,

$$\begin{aligned} {{\mathbb {P}}}\left( \max _{i \le N} \frac{X^{i,N}_T - b^N_T}{a^N_T} \le x \right) \rightarrow \exp (-e^{-x}) \text { as } N \rightarrow \infty . \end{aligned}$$
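As a concrete instance (our illustration, not one treated in the paper), take \(B(r) = 1 - 2r\), so that \({\mathfrak {B}}(u) = u(1-u)\) satisfies all of the above requirements. Equation (3.2) then becomes the logistic equation

$$\begin{aligned} F' = F(1 - F), \quad \text {solved by} \quad F(x) = \frac{1}{1 + e^{-x}}, \end{aligned}$$

so the stationary law is the logistic distribution; its tail \(1 - F(x) \sim e^{-x}\) as \(x \rightarrow \infty \) places it in the Gumbel domain of attraction, consistent with the von Mises computation above.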

4 Key lemmas

As discussed in Sect. 1.1, the proof of Theorem 2.4 relies on counting the nonzero terms of the form (1.10) and bounding their size. There, this was done under a smallness assumption on T, which allowed us to truncate the chaos expansion (1.8) at a finite level. In order to perform this truncation without any smallness assumption on T, we have to partition the interval (0, T] into a sufficiently large number n of subintervals \((T_{\alpha -1},T_\alpha ]\), \(\alpha \in [n]\). Doing so leads to expressions analogous to (1.10) but more complex, and it is those expressions that we need to control. Lemmas 4.1 and 4.2 control the number of nonzero expressions. Lemmas 4.3 and 4.5 provide moment bounds on iterated stochastic integrals which are used to bound the size of the nonzero expressions and to control the error that we commit when truncating the chaos expansions, among other things. The proofs of the lemmas are given in Sect. 5.

We work with the notation and assumptions of Sect. 2. In particular, Assumptions 2.1 and 2.3 are in force. We also use the notation

$$\begin{aligned} {{\mathcal {F}}}^V_t = \sigma (X^i_s,W^i_s :s \le t, i \in V) \text { for any index set } V \subset {{\mathbb {N}}}, \end{aligned}$$

and write \({\mathbb {L}}\) for the space of all progressively measurable processes Y with locally integrable moments, \(\int _0^t {{\mathbb {E}}}[ |Y_s|^p ] ds < \infty \) for all \(t \in {{\mathbb {R}}}_+\) and \(p \in {{\mathbb {N}}}\).

We fix a family of progressively measurable processes \({G}^{{i}{j}} \in {\mathbb {L}}\), \({i},{j}\in [N]\), such that \({G}^{{i}{j}}\) is adapted to the filtration \(({{\mathcal {F}}}_t^{\{{i},{j}\}})_{t \ge 0}\), and introduce the iterated integral notation

$$\begin{aligned} I_{{\varvec{i}},{{\varvec{j}}}}^{N}(s, t) = \int _s^t G_{t_1}^{i_1,j_1} \int _s^{t_1} G_{t_2}^{i_2,j_2}\cdots \int _s^{t_{k-1}}G_{t_k}^{i_k,j_k} dW_{t_k}^{i_k}\cdots dW_{t_1}^{i_1} \end{aligned}$$
(4.1)

for any \(k \in {{\mathbb {N}}}\) and any multiindices \({\varvec{i}} = ({i}_1,\ldots ,{i}_k) \in [N]^k\) and \({{\varvec{j}}} = ({j}_1,\ldots ,{j}_k) \in [N]^k\). Our first key lemma is the following, where later on the random variable \(\Psi \) will be instantiated as products of indicators as in (1.10).

Lemma 4.1

(criteria for zero expectation) Assume for all \(V \subset [N]\), \({i}\in V\), \({j}\notin V\) that

$$\begin{aligned} {{\mathbb {E}}}[ {G}^{{i}{j}}_s \mid {{\mathcal {F}}}_t^V ] = 0, \quad s \le t. \end{aligned}$$
(4.2)

Let \(T \ge 0\) and \(n \in {{\mathbb {N}}}\). Consider a nondecreasing finite sequence \(0 = T_0 \le \cdots \le T_n = T\) and fix natural numbers \(k_1,\ldots ,k_n \in {{\mathbb {N}}}\). For each \(\alpha \in [n]\), fix two \(k_\alpha \)-tuples

$$\begin{aligned} {\varvec{i}_{\alpha }} &= ({i}_{\alpha , 1}, \ldots ,{i}_{\alpha , k_\alpha }) \in [N]^{k_\alpha },\\ {{\varvec{j}}_{\alpha }} &= ({j}_{\alpha , 1}, \ldots ,{j}_{\alpha , k_\alpha }) \in [N]^{k_\alpha }. \end{aligned}$$

Finally, let \(K \subset [N]\) and consider a bounded \({{\mathcal {F}}}_T^K\)-measurable random variable \(\Psi \). Assume at least one of the following conditions is satisfied:

  (i)

    there exist some \(\beta \in [n]\) and \(\ell _0 \in [k_{\beta }]\) such that

    $$\begin{aligned} {i}_{\beta , \ell _0} \notin K \cup \{{j}_{\beta , 1}, \ldots , {j}_{\beta , \ell _0-1} \} \cup \bigcup _{\alpha = \beta + 1}^n \{{j}_{\alpha , 1},\ldots ,{j}_{\alpha , k_\alpha } \}, \end{aligned}$$
    (4.3)

    where \(\{{j}_{\beta , 1}, \ldots , {j}_{\beta , \ell _0-1} \}\) is regarded as the empty set when \(\ell _0 = 1\),

  (ii)

    one has

    $$\begin{aligned} {j}_{1, k_1} \notin K \cup \{{j}_{1,1}, \ldots , {j}_{1,k_1-1} \} \cup \bigcup _{\alpha = 2}^n \{{j}_{\alpha ,1},\ldots ,{j}_{\alpha , k_\alpha } \}. \end{aligned}$$

Then

$$\begin{aligned} {{\mathbb {E}}}\left[ \Psi \prod _{\alpha \in [n]} I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1} , T_{\alpha }) \right] = 0. \end{aligned}$$
(4.4)

The criteria (i) and (ii) in Lemma 4.1 for zero expectation are of a combinatorial nature involving index set membership. The following lemma counts the number of ways in which these conditions can fail, thereby bounding the number of nonzero terms.

Lemma 4.2

(counting lemma) Fix natural numbers \(n, N, \kappa , k_1,\ldots ,k_n\). The number of ways we can pick a subset \(K \subset [N]\) with \(|K| = \kappa \) along with tuples \(\varvec{i}_{\alpha }, \varvec{j}_{\alpha } \in [N]^{k_{\alpha }}\) for all \(\alpha \in [n]\) such that both properties (i) and (ii) of Lemma 4.1 fail to hold is bounded by

$$\begin{aligned} \left( {\begin{array}{c}N\\ \kappa \end{array}}\right) \kappa (\kappa +1) \cdots (\kappa +S) N^{S-1}, \end{aligned}$$

where \(S = k_1 + \cdots + k_n\).
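For small parameter values the bound can be checked against a brute-force enumeration. The sketch below (ours) treats the case \(n = 1\), encoding the failure of conditions (i) and (ii) of Lemma 4.1 directly as index-membership tests:

```python
from itertools import combinations, product
from math import comb

def count_both_fail(N, k1, kappa):
    """n = 1: count subsets K of [N] with |K| = kappa and tuples i, j in [N]^{k1}
    for which neither (i) nor (ii) of Lemma 4.1 holds."""
    count = 0
    for K in map(set, combinations(range(1, N + 1), kappa)):
        for j in product(range(1, N + 1), repeat=k1):
            if j[-1] not in K | set(j[:-1]):
                continue                     # here (ii) holds, so skip
            for i in product(range(1, N + 1), repeat=k1):
                # (i) fails iff every i_l lies in K or among j_1, ..., j_{l-1}
                if all(i[l] in K | set(j[:l]) for l in range(k1)):
                    count += 1
    return count

N, k1, kappa = 4, 2, 1
S = k1
bound = comb(N, kappa) * N ** (S - 1)
for r in range(kappa, kappa + S + 1):
    bound *= r                               # kappa (kappa+1) ... (kappa+S)
print(count_both_fail(N, k1, kappa), "<=", bound)   # the count is below the bound
```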

We next develop bounds on iterated stochastic integrals. The following lemma will allow us to truncate the chaos expansions of ratios \(Z^N_t / Z^N_s\) at levels that do not need to increase with N to comply with given error tolerances. Note that the lemma gives an upper bound that is summable in m only if \(t-s\) is sufficiently small. This is the reason we are forced to partition [0, T] into subintervals when proving Theorem 2.4 without any smallness assumption on T.

Lemma 4.3

(first iterated integral \(L^p\) estimate) Let

$$\begin{aligned} I_m^N(s,t) = \int _s^t\int _s^{t_1}\cdots \int _s^{t_{m-1}}dM_{t_m}^{N}dM_{t_{m-1}}^{N}\cdots dM_{t_1}^{N} \end{aligned}$$

where \(M^N\) is defined in (2.4) and it is understood that \(I^N_1(s,t) = M^N_t - M^N_s\). Then, for any \(N,m, p \in {\mathbb {N}}\), any \(T \in (0,\infty )\), and all \(s, t \in \left[ 0, \, T\right] \) we have

$$\begin{aligned} \Vert I_m^N(s,t) \Vert _{2p} \le \left( C(T)p\sqrt{t - s}\right) ^m \end{aligned}$$

where the constant C(T) only depends on T and the bounds from Assumptions 2.1 and 2.3.

Remark 4.4

Note that we only consider \(L^{2p}\) norms for positive integers p, which is all that is needed later on. This is why Assumption 2.3 only involves even integer moments.

The proof of Lemma 4.3 relies on the following sharp iterated integral estimate, valid for any continuous local martingale M and any \(p \in [1,\infty )\), which follows from [3, Theorem 1] on noting that \(1 + \sqrt{1 + 1/(2p)} < 3\) for any such p:

$$\begin{aligned} \left\| \int _s^t\int _s^{t_1}\cdots \int _s^{t_{m-1}}dM_{t_m} dM_{t_{m-1}} \cdots dM_{t_1} \right\| _{2p} \le \frac{(2pm)^{m/2}3^m}{m!} \left\| \langle M \rangle _{s,t}^{1/2} \right\| _{2pm}^{m}, \end{aligned}$$
(4.5)

where we write \(\langle M \rangle _{s,t} = \langle M \rangle _t - \langle M \rangle _s\) for brevity.

While (4.5) is instrumental for proving Lemma 4.3, it cannot be used to bound the iterated integrals appearing in (4.4), which involve several different local martingales. In order to control the nonzero terms of the form appearing in (4.4), we will instead use a weaker estimate obtained by repeated application of the BDG and Hölder inequalities. Fortunately, this is sufficient thanks to the sharp control on the number of nonzero terms afforded by Lemmas 4.1 and 4.2. The following general estimate for iterated stochastic integrals involving several continuous local martingales serves this purpose, and it is also used in the proof of Lemma 4.1 as well as to control linearization errors when reducing from general drift coefficients B to linear ones in the proof of the main result.

Lemma 4.5

(second iterated integral \(L^p\) estimate) For any set of \(k \in {{\mathbb {N}}}\) continuous local martingales \(M^1,\ldots ,M^k\) and any \(p \in (1,\infty )\) we have the estimate

$$\begin{aligned} \left\| \int _s^t\int _s^{t_1}\cdots \int _s^{t_{k-1}}dM_{t_k}^k\cdots dM_{t_1}^1\right\| _p\le (4\sqrt{p})^{k} 2^{k(k-1)/4}\prod _{\ell =1}^{k}\Vert \langle M^{\ell } \rangle _{s,t}^{1/2}\Vert _{2^{\ell }p}. \end{aligned}$$

We end this section with an algebraic estimate which will allow us to combine Lemma 4.2 and Lemma 4.5 to show that (1.7) indeed tends to zero as \(N \rightarrow \infty \).

Lemma 4.6

For any \(C \in (1,\infty )\), \(N, S \in {{\mathbb {N}}}\), one has the inequality

$$\begin{aligned} \sum _{\kappa =1}^N\left( {\begin{array}{c}N\\ \kappa \end{array}}\right) \kappa (\kappa +1)\cdots (\kappa +S)\left( \frac{C}{N}\right) ^{\kappa (1 - 1/\log N)}\le (S+2)(S+1)^{2(S+1)}e^{Ce}(Ce)^{S+1}. \end{aligned}$$
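Since the left-hand side is an explicit finite sum, the inequality is easy to probe numerically. A quick check (our sketch, with arbitrary admissible values of C, N, S), evaluated in log-space to avoid overflow:

```python
from math import exp, lgamma, log

def lhs(C, N, S):
    """Left-hand side of Lemma 4.6; note that
    kappa (kappa+1) ... (kappa+S) = Gamma(kappa+S+1) / Gamma(kappa)."""
    total = 0.0
    for kappa in range(1, N + 1):
        log_binom = lgamma(N + 1) - lgamma(kappa + 1) - lgamma(N - kappa + 1)
        log_prod = lgamma(kappa + S + 1) - lgamma(kappa)
        log_pow = kappa * (1 - 1 / log(N)) * log(C / N)
        total += exp(log_binom + log_prod + log_pow)
    return total

def rhs(C, S):
    return (S + 2) * (S + 1) ** (2 * (S + 1)) * exp(C * exp(1)) * (C * exp(1)) ** (S + 1)

for N in (10, 100, 1000):
    for S in (1, 3):
        print(N, S, lhs(2.0, N, S) <= rhs(2.0, S))   # True in all cases
```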

5 Proofs of the key lemmas

In this section we prove the lemmas presented in Sect. 4. We start with the proof of Lemma 4.5 because it is used in the proof of Lemma 4.1.

5.1 Proof of Lemma 4.5

We prove the lemma by induction. The base case \(k=1\) follows from the sharp BDG inequality (7) in [3] and Hölder’s inequality. For the induction step, we assume that the inequality holds for any \(p>1\) with k replaced by \(k-1\). Applying the sharp BDG inequality, Hölder’s inequality, Doob’s maximal inequality (e.g. Theorem 5.1.3 in [5] with p replaced by \(2p \ge 2\) and so \(q \le 2\)), and finally the induction hypothesis yields

$$\begin{aligned}&\bigg \Vert \int _s^t\int _s^{t_1} \cdots \int _s^{t_{k-1}}dM_{t_k}^k\cdots dM_{t_1}^1\bigg \Vert _p \\&\quad \le 2\sqrt{p}\left\| \left( \int _s^t\left( \int _s^{t_1}\cdots \int _s^{t_{k-1}}dM_{t_k}^k\cdots dM_{t_2}^2\right) ^2d\langle M^1\rangle _{t_1}\right) ^\frac{1}{2}\right\| _p\\&\quad \le 2\sqrt{p}\left\| \sup _{s\le t_1\le t}\bigg |\int _s^{t_1}\cdots \int _s^{t_{k-1}}dM_{t_k}^k\cdots dM_{t_2}^2\bigg |\, \langle M^1\rangle _{s,t}^{1/2}\right\| _p\\&\quad \le 2\sqrt{p}\left\| \sup _{s\le t_1\le t}\bigg |\int _s^{t_1}\cdots \int _s^{t_{k-1}}dM_{t_k}^k\cdots dM_{t_2}^2\bigg |\right\| _{2p}\, \left\| \langle M^1\rangle _{s,t}^{1/2}\right\| _{2p}\\&\quad \le 2\sqrt{p}\, 2\left\| \int _s^{t}\cdots \int _s^{t_{k-1}}dM_{t_k}^k\cdots dM_{t_2}^2\right\| _{2p}\, \left\| \langle M^1\rangle _{s,t}^{1/2} \right\| _{2p}\\&\quad \le 2\sqrt{p}\, 2\, (4\sqrt{2p} )^{k-1} 2^{(k-1)(k-2)/4}\prod _{\ell =1}^{k-1}\left\| \langle M^{\ell +1}\rangle _{s,t}^{1/2}\right\| _{2^{\ell +1}p} \left\| \langle M^1\rangle _{s,t}^{1/2} \right\| _{2p}\\&\quad =(4\sqrt{p})^{k} 2^{k(k-1)/4}\prod _{\ell =1}^{k}\Vert \langle M^{\ell } \rangle _{s,t}^{1/2}\Vert _{2^{\ell }p}. \end{aligned}$$

5.2 Proof of Lemma 4.1

We will need the following two auxiliary lemmas on conditioning, the proofs of which are a trivial modification of the proof of Lemma 2.1.4 in [18].

Lemma 5.1

For any Brownian motion W, two processes \(a, b \in {\mathbb {L}}\), and a \(\sigma \)-algebra \({{\mathcal {G}}}\) such that a(s) and W(s) are \({{\mathcal {G}}}\)-measurable for \(s \le t\), one has

$$\begin{aligned} {{\mathbb {E}}}\left[ \int _0^t a(s) b(s) dW(s) {\ \Big |\ }{{\mathcal {G}}}\right] = \int _0^t a(s) {{\mathbb {E}}}[ b(s) \mid {{\mathcal {G}}}] dW(s). \end{aligned}$$

Lemma 5.2

For any Brownian motion W, a process \(a \in {\mathbb {L}}\), and a \(\sigma \)-algebra \({{\mathcal {G}}}\) such that W is independent of \({{\mathcal {G}}}\), one has

$$\begin{aligned} {{\mathbb {E}}}\left[ \int _s^t a(u) dW(u) {\ \Big |\ }{{\mathcal {F}}}_s \vee {{\mathcal {G}}}\right] = 0, \quad s \le t. \end{aligned}$$

Notice that these lemmas can be applied with a and b being either the processes \(G^{ij}\), which belong to \({\mathbb {L}}\) by definition, or the iterated integrals \(I_{{\varvec{i}},{{\varvec{j}}}}^{N}(s, \, t)\) for \({\varvec{i}} = ({i}_1,\ldots ,{i}_k) \in [N]^k\) and \({{\varvec{j}}} = ({j}_1,\ldots ,{j}_k) \in [N]^k\), which also belong to \({\mathbb {L}}\). (Recall that these iterated integrals are defined in (4.1).) The latter can be seen by applying Lemma 4.5 with \(M^{\ell }_t = \int _{0}^{t}G_s^{i_{\ell }j_{\ell }}dW_s^{i_{\ell }}\) for all \(\ell \in [k]\) and then Hölder’s inequality.

Assume now that condition (i) is satisfied. Let \(\beta \in [n]\) be the largest index such that (4.3) holds for some \(\ell _0 \in [k_{\beta }]\), and then let \(\ell _0\) be the smallest index for which this happens. Now define

$$\begin{aligned} V = K \cup \{{j}_{1,1}, \ldots , {j}_{1,k_1-1} \} \cup \bigcup _{\alpha = 2}^n \{{j}_{\alpha ,1},\ldots ,{j}_{\alpha , k_\alpha } \}. \end{aligned}$$

Maximality of \(\beta \) implies that \({i}_{\alpha , \ell } \in V\) for all \(\alpha \ge \beta + 1\) and all \(\ell \in [k_\alpha ]\). Moreover, by definition of V we have \(j_{\alpha , \ell } \in V\) for all \(\alpha \ge \beta + 1\) and all \(\ell \in [k_\alpha ]\). Thus every index appearing in \(\varvec{i}_\alpha \) or \(\varvec{j}_\alpha \) for \(\alpha \ge \beta + 1\) belongs to V. As a result, \(I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1}, T_{\alpha })\) is \({{\mathcal {F}}}_T^V\)-measurable for all \(\alpha \ge \beta + 1\). Since \(K \subset V\), \(\Psi \) is also \({{\mathcal {F}}}_T^V\)-measurable. Finally, for \(\alpha \le \beta - 1\) we have that \(I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1}, T_{\alpha })\) is \({{\mathcal {F}}}_{T_{\beta -1}}\)-measurable. We conclude that

$$\begin{aligned}&{{\mathbb {E}}}\left[ \Psi \prod _{\alpha \in [n]}I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1} , T_{\alpha }) \right] \\&\quad = {{\mathbb {E}}}\left[ \Psi \prod _{\begin{array}{c} \alpha \in [n] \\ \alpha \ne \beta \end{array}} I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1} , T_{\alpha }) {{\mathbb {E}}}\left[ I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1} , T_{\beta }) {\ \Big |\ }{{\mathcal {F}}}_{T_{\beta -1}} \vee {{\mathcal {F}}}_T^V \right] \right] . \end{aligned}$$

It remains to show that

$$\begin{aligned} {{\mathbb {E}}}\left[ I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1} , T_{\beta }) {\ \Big |\ }{{\mathcal {F}}}_{T_{\beta -1}} \vee {{\mathcal {F}}}_T^V \right] = 0, \end{aligned}$$
(5.1)

and this will rely on repeated application of Lemma 5.1. Note that \(j_{\beta , \ell } \in V\) for all \(\ell \le \ell _0 - 1\) by definition of V. Moreover, minimality of \(\ell _0\) implies that \({i}_{\beta , \ell } \in V\) for all \(\ell \le \ell _0-1\). For \(\ell \) in this range, starting with \(\ell = 1\), we may therefore apply Lemma 5.1 iteratively with

$$\begin{aligned} {{\mathcal {G}}}&= {{\mathcal {F}}}_{T_{\beta -1}} \vee {{\mathcal {F}}}_T^V, \\ W(t)&= W^{{i}_{\beta , \ell }}_{t}, \\ a(t)&= G^{{i}_{\beta , \ell } {j}_{\beta , \ell }}_{t}, \\ b(t)&= I_{{\varvec{i}_{\beta }^{(\ell +1)}},{{\varvec{j}}_{\beta }^{(\ell +1)}}}^{N}(T_{\beta - 1} , t), \end{aligned}$$

where \(\varvec{i}_\beta ^{(\ell )} = ({i}_{\beta , \ell }, \ldots , {i}_{\beta , k_{\beta }})\) and \(\varvec{j}_\beta ^{(\ell )} = ({j}_{\beta , \ell }, \ldots , {j}_{\beta , k_{\beta }}),\) to obtain

$$\begin{aligned}&{{\mathbb {E}}}\left[ I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1} , T_{\beta }) {\ \Big |\ }{{\mathcal {F}}}_{T_{\beta -1}} \vee {{\mathcal {F}}}_T^V \right] \\&\quad = {{\mathbb {E}}}\left[ \int _{T_{\beta -1}}^{T_\beta } G^{{i}_{\beta , 1} {j}_{\beta , 1}}_{t_1} I_{{\varvec{i}_{\beta }^{(2)}},{{\varvec{j}}_{\beta }^{(2)}}}^{N}(T_{\beta - 1} , t_1) dW^{i_{\beta , 1}}_{t_1} {\ \Big |\ }{{\mathcal {F}}}_{T_{\beta -1}} \vee {{\mathcal {F}}}_T^V \right] \\&\quad = \int _{T_{\beta -1}}^{T_\beta } G^{{i}_{\beta , 1} {j}_{\beta , 1}}_{t_1} {{\mathbb {E}}}\left[ I_{{\varvec{i}_{\beta }^{(2)}},{{\varvec{j}}_{\beta }^{(2)}}}^{N}(T_{\beta - 1} , t_1) {\ \Big |\ }{{\mathcal {F}}}_{T_{\beta -1}} \vee {{\mathcal {F}}}_T^V \right] dW^{i_{\beta , 1}}_{t_1} \\&\quad \ \vdots \\&\quad = \int _{T_{\beta -1}}^{T_{\beta }} {G}^{{i}_{\beta , 1} {j}_{\beta , 1}}_{t_1} \cdots \int _{T_{\beta -1}}^{t_{\ell _0 - 2}} {G}^{{i}_{\beta , \ell _0-1} {j}_{\beta , \ell _0-1}}_{t_{\ell _0-1}} \\&\quad {{\mathbb {E}}}\left[ I_{{\varvec{i}_{\beta }^{(\ell _0)}},{{\varvec{j}}_{\beta }^{(\ell _0)}}}^{N}(T_{\beta - 1} , t_{\ell _0 - 1}) {\ \Big |\ }{{\mathcal {F}}}_{T_{\beta -1}} \vee {{\mathcal {F}}}_T^V \right] dW^{{i}_{\beta , \ell _0-1}}_{t_{\ell _0-1}} \cdots dW^{{i}_{\beta , 1}}_{ t_{1}}. \end{aligned}$$

The right-hand side of the last display is zero. Indeed, we have

$$\begin{aligned} I_{{\varvec{i}_{\beta }^{(\ell _0)}},{{\varvec{j}}_{\beta }^{(\ell _0)}}}^{N}(T_{\beta - 1}, t) = \int _{T_{\beta - 1}}^{t} G^{i_{\beta , \ell _0}, j_{\beta , \ell _0}}_{t_{\ell _0}} I_{{\varvec{i}_{\beta }^{(\ell _0 + 1)}},{{\varvec{j}}_{\beta }^{(\ell _0 + 1)}}}^{N}(T_{\beta - 1}, t_{\ell _0}) dW^{{i}_{\beta , \ell _0}}_{t_{\ell _0}}, \quad t \ge T_{\beta - 1}, \end{aligned}$$

where \(W^{{i}_{\beta , \ell _0}}\) is independent of \({{\mathcal {F}}}_T^V\) because \(i_{\beta , \ell _0} \notin V\) due to (4.3). We therefore deduce from Lemma 5.2 that

$$\begin{aligned} {{\mathbb {E}}}\left[ I_{{\varvec{i}_{\beta }^{(\ell _0)}},{{\varvec{j}}_{\beta }^{(\ell _0)}}}^{N}(T_{\beta - 1}, t) {\ \Big |\ }{{\mathcal {F}}}_{T_{\beta -1}} \vee {{\mathcal {F}}}_T^V \right] = 0, \quad t \ge T_{\beta - 1}. \end{aligned}$$

This yields (5.1) as required.

Next, assume that condition (ii) is satisfied. In addition, we may assume that condition (i) does not hold, since otherwise we would fall into the case just treated. We then define

$$\begin{aligned} V = K \cup \{{j}_{1,1}, \ldots , {j}_{1,k_1-1} \} \cup \bigcup _{\alpha = 2}^n \{{j}_{\alpha ,1},\ldots ,{j}_{\alpha , k_\alpha } \} \end{aligned}$$

and observe that \({i}_{\alpha , \ell } \in V\) for all \(\alpha \in [n]\) and all \(\ell \in [k_\alpha ]\) (since condition (i) does not hold), and that \({j}_{\alpha , \ell } \in V\) for all \(\alpha \in [n]\) and all \(\ell \in [k_\alpha ]\) except if \((\alpha , \ell ) = (1, k_1)\) (by definition of V and since (ii) holds). In particular, \(I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1}, T_{\alpha })\) is \({{\mathcal {F}}}_T^V\)-measurable for all \(\alpha \ge 2\), and as before \(\Psi \) is \({{\mathcal {F}}}_T^V\)-measurable as well. Thus

$$\begin{aligned}&{{\mathbb {E}}}\left[ \Psi \prod _{\alpha \in [n]} I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1} , T_{\alpha }) \right] \\&\quad = {{\mathbb {E}}}\left[ \Psi \prod _{\alpha = 2}^n I_{{\varvec{i}_{\alpha }},{{\varvec{j}}_{\alpha }}}^{N}(T_{\alpha - 1} , T_{\alpha }) {{\mathbb {E}}}\left[ I_{{\varvec{i}_1},{{\varvec{j}}_1}}^{N}(T_{0} , T_{1}) {\ \Big |\ }{{\mathcal {F}}}_T^V \right] \right] . \end{aligned}$$

The same iterative application of Lemma 5.1 as before, but now with \({{\mathcal {G}}}= {{\mathcal {F}}}_T^V\) and using that \(W^{{i}_{1,\ell }}_t\), \(t \le T\), is \({{\mathcal {F}}}_T^V\)-measurable for all \(\ell \in [k_1]\) and that \({G}^{{i}_{1,\ell }{j}_{1,\ell }}_{t}\), \(t \le T\), is \({{\mathcal {F}}}_T^V\)-measurable for all \(\ell \in [k_1-1]\), leads to

$$\begin{aligned}&{{\mathbb {E}}}\left[ I_{{\varvec{i}_1},{{\varvec{j}}_1}}^{N}(T_{0} , T_{1}) {\ \Big |\ }{{\mathcal {F}}}_T^V \right] \\&\quad = \int _{T_{0}}^{T_{1}} {G}^{{i}_{1, 1} {j}_{1, 1}}_{t_1} \cdots \int _{T_{0}}^{t_{k_1 - 1}} {{\mathbb {E}}}\left[ {G}^{{i}_{1, k_1} {j}_{1, k_1}}_{t_{k_1}} {\ \Big |\ }{{\mathcal {F}}}_T^V \right] dW^{{i}_{1, k_1}}_{ t_{k_1} } \cdots dW^{{i}_{1, 1}}_{ t_{1} }. \end{aligned}$$

The conditional expectation on the right-hand side is equal to zero for all \(t_{k_1} \le T\), thanks to (4.2) and the fact that \(i_{1, k_1} \in V\) and \(j_{1, k_1} \notin V\). This completes the proof of the lemma.

5.3 Proof of Lemma 4.2

First we pick the subset \(K = \{i_{0,1}, \ldots , i_{0,\kappa }\} \subset [N]\), which can be done in exactly \(\left( {\begin{array}{c}N\\ \kappa \end{array}}\right) \) ways. Next, we pick the coordinates of the vectors \({{\varvec{j}}_{1}} = ({j}_{1, 1}, \ldots ,{j}_{1, k_1}), \ldots , {{\varvec{j}}_{n}} = ({j}_{n, 1}, \ldots ,{j}_{n, k_n})\). There are N possible choices for each of the first \(k_1 - 1\) coordinates \({j}_{1, 1}, \ldots ,{j}_{1, k_1-1}\) of \({{\varvec{j}}_{1}}\), and likewise N choices for each of the \(k_\alpha \) coordinates of \({{\varvec{j}}_\alpha }\) for \(\alpha \in \{2, \ldots , n\}\). Therefore, we can pick all these coordinates in \(N^{k_1 - 1 + k_2 + \cdots + k_n} = N^{S - 1}\) ways. Then, the \(k_1\)-th coordinate \(j_{1,k_1}\) of \({{\varvec{j}}_1}\) must be taken equal either to one of the other \(k_1 - 1 + k_2 + k_3 + \cdots + k_n = S - 1\) coordinates we have already picked or to one of the \(\kappa \) elements of \(K = \{i_{0,1}, \ldots , i_{0,\kappa }\}\), as otherwise the second condition (ii) of Lemma 4.1 would be satisfied. This choice can be made in at most \(S - 1 + \kappa \) ways. Hence, there are at most \(\left( {\begin{array}{c}N\\ \kappa \end{array}}\right) (\kappa + S - 1)N^{S - 1}\) ways to pick the subset K of [N] and all the coordinates of \({{\varvec{j}}_1}, \ldots , {{\varvec{j}}_n}\). Finally, we pick the coordinates of the vectors \({\varvec{i}_1},\ldots , {\varvec{i}_n}\), where for each \(\beta \in [n]\) and \(\ell \in [k_\beta ]\) we must take

$$\begin{aligned} {i}_{\beta , \ell } \in K \cup \{{j}_{\beta , 1}, \ldots , {j}_{\beta , \ell -1} \} \cup \bigcup _{\alpha = \beta + 1}^n \{{j}_{\alpha , 1},\ldots ,{j}_{\alpha , k_\alpha } \}, \end{aligned}$$

so that the first condition (i) of Lemma 4.1 will fail to hold. This can be done in at most \(u(\beta , \ell ) = \kappa + \ell - 1 + k_{\beta +1} + k_{\beta +2} + \ldots + k_{n}\) ways. Observing that \(u(\beta , \ell )\) takes every integer value between \(u(n, 1) = \kappa \) and \(u(1, k_1) = \kappa + S - 1\) exactly once, we see that the coordinates of \({\varvec{i}_1}, \ldots , {\varvec{i}_n}\) can be picked in at most \(\kappa (\kappa + 1) \cdots (\kappa + S - 1)\) ways. Therefore, the number of ways in which the entire selection of the elements of K and the coordinates of \({\varvec{i}_1}, \ldots , {\varvec{i}_n}\) and \({{\varvec{j}}_1}, \ldots , {{\varvec{j}}_n}\) can be done is at most

$$\begin{aligned} \left( {\begin{array}{c}N\\ \kappa \end{array}}\right) (\kappa + S - 1)N^{S - 1} \kappa (\kappa + 1) \cdots (\kappa + S - 1) < \left( {\begin{array}{c}N\\ \kappa \end{array}}\right) \kappa (\kappa + 1)\cdots (\kappa + S)N^{S - 1}. \end{aligned}$$
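To make the counting concrete, consider the smallest nontrivial instance \(n = 1\), \(k_1 = 2\) (so that \(S = 2\)) and \(\kappa = 1\): there are N ways to pick the single element of K, then N ways to pick \(j_{1,1}\), at most \(S - 1 + \kappa = 2\) ways to pick \(j_{1,2}\) (it must coincide with \(j_{1,1}\) or with the element of K), and then \(u(1,1) = 1\) and \(u(1,2) = 2\) ways to pick \(i_{1,1}\) and \(i_{1,2}\), for a total of at most \(4N^2\) selections, in agreement with the left-hand side of the last display.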

5.4 Proof of Lemma 4.3

For \(m = 0\) the inequality holds trivially, so we may assume that \(m \ge 1\). Applying (4.5) with \(M = M^N\) yields

$$\begin{aligned} \left\| \int _{s}^t\int _{s}^{t_1} \cdots \int _{s}^{t_{m-1}}dM_{t_{m}}^{N} \cdots dM_{t_1}^{N} \right\| _{2p} \le \frac{(2pm)^{m/2}3^m}{m!} \Vert \langle M^{N} \rangle _{s,t}^{1/2} \Vert _{2pm}^{m}. \end{aligned}$$
(5.2)

Next, Hölder’s inequality and the fact that, by exchangeability of our system, the distribution of \(\Delta B^{i,N}\) is the same for all i give

$$\begin{aligned} \begin{aligned} \left\| \langle M^{N} \rangle _{s,t}^{1/2} \right\| _{2pm}^{m}&= {\mathbb {E}} \left[ \left( \sum _{i=1}^{N}\int _{s}^{t}(\Delta B_u^{i,N})^2du\right) ^{pm}\right] ^\frac{1}{2p} \\&\le {\mathbb {E}} \left[ (N(t-s))^{pm - 1}\sum _{i=1}^{N} \int _{s}^{t}(\Delta B_u^{i,N})^{2pm}du\right] ^\frac{1}{2p} \\&= \left( N^{pm}(t - s)^{pm - 1}\int _{s}^{t}{\mathbb {E}}\left[ (\Delta B_u^{1,N} )^{2pm}\right] du\right) ^\frac{1}{2p}. \\ \end{aligned} \end{aligned}$$
(5.3)

Letting C be the upper bound on the derivative of \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) afforded by Assumption 2.1, the mean value theorem gives

$$\begin{aligned} \left| \Delta B_u^{1,N}\right| \le C\left| \int g(u,X_{[0,u]}^1, \varvec{y}) (\mu _u^{N} - \mu _u)(d\varvec{y}) \right| . \end{aligned}$$

Using this in (5.3) yields

$$\begin{aligned}{} & {} \left\| \langle M^{N} \rangle _{s, t}^{1/2} \right\| _{2pm}^{m}\\{} & {} \quad \le \Bigg (N^{pm}(t - s)^{pm - 1}C^{2pm}\int _{s}^{t} {\mathbb {E}}\Bigg [ \left| \int g(u,X_{[0,u]}^1, \varvec{y}) (\mu _u^{N} - \mu _u)(d\varvec{y}) \right| ^{2pm}\Bigg ]du \Bigg )^\frac{1}{2p}. \end{aligned}$$

We may now use Assumption 2.3 to deduce that for any integer \(p > 1\),

$$\begin{aligned} \left\| \langle M^{N} \rangle _{s, t}^{1/2} \right\| _{2pm}^{m}&\le \left( N^{pm}(t - s)^{pm - 1}C^{2pm}\int _{s}^{t}\frac{1}{N^{pm}} \left( pm\right) ! K(u)^{pm} du \right) ^\frac{1}{2p} \\&\le (t - s)^{\frac{m}{2}}C^{m}\left( \left( pm\right) !\right) ^{\frac{1}{2p}}\sup _{0 \le u \le T}K(u)^{m/2}. \end{aligned}$$

Plugging this into (5.2) we obtain

$$\begin{aligned}&\left\| \int _{s}^t\int _{s}^{t_1} \cdots \int _{s}^{t_{m-1}}dM_{t_{m}}^{N} \cdots dM_{t_1}^{N} \right\| _{2p} \nonumber \\&\quad \le (3C)^{m}\frac{((pm)!)^{1/2p}}{m!}(2pm)^{\frac{m}{2}}(t - s)^{\frac{m}{2}}\sup _{0 \le u \le T}K(u)^{m/2}. \end{aligned}$$
(5.4)

A calculation using Stirling’s approximation yields \(((pm)!)^{1/2p} / m! < (2pm )^{-m/2}(8e^2p)^{m}\), and substituting this into (5.4) finally leads to

$$\begin{aligned} \Bigg \Vert \int _{s}^t\int _{s}^{t_1} \cdots \int _{s}^{t_{m-1}}dM_{t_{m}}^{N}\cdots dM_{t_1}^{N} \Bigg \Vert _{2p} \le (3C)^{m}(t - s)^{\frac{m}{2}}(8e^2p)^{m} \sup _{0 \le u \le T}K(u)^{m/2}. \end{aligned}$$

This shows that the desired result holds with \(C(T) = 24Ce^2\sqrt{\sup _{0 \le u \le T}K(u)}\).
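For completeness, we note that the elementary bounds \((pm)! \le (pm)^{pm}\) and \(m! \ge (m/e)^m\) already yield this factorial estimate:

$$\begin{aligned} \frac{((pm)!)^{1/2p}}{m!} \le (pm)^{m/2}\left( \frac{e}{m}\right) ^{m} = (2pm)^{-m/2}\left( \sqrt{2}\, e\, p\right) ^{m} < (2pm)^{-m/2}(8e^2p)^{m}. \end{aligned}$$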

5.5 Proof of Lemma 4.6

We recall the identity

$$\begin{aligned} x \frac{d^{S+1}}{dx^{S+1}}\sum _{\kappa =0}^N\left( {\begin{array}{c}N\\ \kappa \end{array}}\right) x^{\kappa +S} = \sum _{\kappa =1}^{N}\left( {\begin{array}{c}N\\ \kappa \end{array}}\right) \kappa (\kappa +1) \cdots (\kappa +S-1)(\kappa + S)x^{\kappa }. \end{aligned}$$
(5.5)
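This identity follows by differentiating the sum term by term: the \(\kappa = 0\) term \(x^S\) is annihilated by the \(S+1\) derivatives, while for \(\kappa \ge 1\),

$$\begin{aligned} x \frac{d^{S+1}}{dx^{S+1}} x^{\kappa +S} = (\kappa +S)(\kappa +S-1) \cdots \kappa \, x^{\kappa }. \end{aligned}$$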

By the binomial theorem and Leibniz’s rule for the derivative of a product of functions, we have the estimate

$$\begin{aligned} x \frac{d^{S+1}}{dx^{S+1}}\sum _{\kappa =0}^N\left( {\begin{array}{c}N\\ \kappa \end{array}}\right) x^{\kappa +S}&= x \frac{d^{S+1}}{dx^{S+1}} \Bigg (x^S(1+x)^N\Bigg )\nonumber \\&=x \sum _{i=0}^{S+1}\left( {\begin{array}{c}S+1\\ i\end{array}}\right) S(S-1)\cdots (S-i+1)x^{S-i}\nonumber \\&\qquad \qquad \times N(N-1)\cdots (N-S+i)(1+x)^{N-S-1+i}\nonumber \\&\le (S+1)^{2(S+1)}(1+x)^N\sum _{i=0}^{S+1}(xN)^{S-i+1}. \end{aligned}$$
(5.6)

Combining (5.5) and (5.6), plugging in \(x = (C/N)^{1-1/\log N}\), noting that this value of x is upper bounded by Ce/N since \(C>1\), and finally using that \(1 + Ce/N \le \exp (Ce/N)\) and that \(\left( eC\right) ^{S - i + 1} < \left( eC\right) ^{S + 1}\) since \(eC> e > 1\), we obtain the desired inequality.
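Explicitly, since \(xN \le Ce\) and \((1+x)^N \le (1 + Ce/N)^N \le \exp (Ce)\) for this choice of x, the right-hand side of (5.6) is at most

$$\begin{aligned} (S+1)^{2(S+1)}e^{Ce}\sum _{i=0}^{S+1}(Ce)^{S-i+1} \le (S+2)(S+1)^{2(S+1)}e^{Ce}(Ce)^{S+1}, \end{aligned}$$

which is precisely the bound invoked in Sect. 6 below.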

6 Proof of Theorem 2.4

We now prove Theorem 2.4. The setup of Sect. 2 will be used. In particular the objects \(\mu ^N\), \(M^N\), \(\Delta B^{i,N}\) in (2.3)–(2.5) as well as the density process \(Z^N = \exp (M^N - \frac{1}{2} \langle M^N \rangle )\) will be referred to freely. Assumptions 2.1 and 2.3 are in force. The induced measure \({{\mathbb {Q}}}^N\), the normalizing constants \(a^N_T,b^N_T\), and the limiting distribution function \(\Gamma _T\) are as in the statement of the theorem. The time point T is fixed throughout.

We must prove (2.7). It suffices to do this for \(x \in {{\mathbb {R}}}\) such that \(\Gamma _T(x) > 0\). Indeed, suppose this has been done and consider x such that \(\Gamma _T(x) = 0\). Because all extreme value distributions are continuous, for any \(\varepsilon > 0\) there is \(x' > x\) such that \(0< \Gamma _T(x') < \varepsilon \), and thus

$$\begin{aligned} {{\mathbb {Q}}}^N\left( \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T} \le x \right) \le {{\mathbb {Q}}}^N\left( \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T} \le x' \right) \rightarrow \Gamma _T(x') < \varepsilon . \end{aligned}$$

Since \(\varepsilon > 0\) was arbitrary, the left-hand side converges to \(\Gamma _T(x) = 0\) as \(N \rightarrow \infty \). We thus pick x such that \(\Gamma _T(x) > 0\) and set out to prove that as \(N \rightarrow \infty \),

$$\begin{aligned} {{\mathbb {Q}}}^N\left( \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T} \le x \right) - {{\mathbb {P}}}\left( \max _{i \le N} \frac{X^i_T - b^N_T}{a^N_T} \le x \right) = {{\mathbb {E}}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{X^i_T \le x_N\}} (Z^N_T - 1) \right] \rightarrow 0, \end{aligned}$$

where the equality uses that \(Z^N_T\) is the density of \({{\mathbb {Q}}}^N\) with respect to \({{\mathbb {P}}}\) on \({{\mathcal {F}}}_T\), and where for brevity we introduce the notation

$$\begin{aligned} x_N = a^N_T x + b^N_T. \end{aligned}$$
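For orientation only (the proof uses nothing about \(a^N_T, b^N_T\) beyond what is stated in the theorem): if \(X_T\) were standard Gaussian, classical extreme value theory would permit the choices

$$\begin{aligned} a^N_T = \frac{1}{\sqrt{2\log N}}, \qquad b^N_T = \sqrt{2\log N} - \frac{\log \log N + \log (4\pi )}{2\sqrt{2\log N}}, \end{aligned}$$

with Gumbel limit \(\Gamma _T(x) = \exp (-e^{-x})\); see e.g. [7, 20].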

The proof is divided into several steps.

Step 1: partitioning the time interval. Chaos expansions of \(Z^N\) are at the core of the proof, and to get sufficient control on the convergence of these expansions we partition the interval (0, T] into n subintervals \((T_{\alpha -1},T_\alpha ]\), \(\alpha \in [n]\), of equal length \(T_\alpha - T_{\alpha -1} = T/n\). We choose n large enough that \(C(T) \sqrt{T/n} < 1/2\), where C(T) is the constant in Lemma 4.3, and then keep n fixed for the remainder of the proof. We now observe the identity

$$\begin{aligned} Z^N_T - 1 = \sum _{\alpha =1}^n \prod _{\beta =1}^\alpha \left( \frac{Z^N_{T_\beta }}{Z^N_{T_{\beta -1}}} - \delta _{\alpha \beta } \right) , \end{aligned}$$

where \(\delta _{\alpha \beta }\) is the Kronecker delta, thus \(\delta _{\alpha \beta }=1\) if \(\alpha = \beta \) and \(\delta _{\alpha \beta }=0\) otherwise. (For \(n = 2\), for instance, the right-hand side equals \((Z^N_{T_1} - 1) + Z^N_{T_1}(Z^N_{T_2}/Z^N_{T_1} - 1) = Z^N_{T_2} - 1\), using that \(Z^N_{T_0} = Z^N_0 = 1\).) This yields

$$\begin{aligned} {{\mathbb {E}}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{X^i_T \le x_N\}} (Z^N_T - 1) \right] = \sum _{\alpha = 1}^n A^N_\alpha , \end{aligned}$$

where

$$\begin{aligned} A^N_\alpha = {{\mathbb {E}}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{X^i_T \le x_N\}} \prod _{\beta =1}^\alpha \left( \frac{Z^N_{T_\beta }}{Z^N_{T_{\beta -1}}} - \delta _{\alpha \beta } \right) \right] . \end{aligned}$$

To prove the theorem it suffices to show that \(A^N_\alpha \rightarrow 0\) as \(N \rightarrow \infty \) for each \(\alpha \in [n]\). We thus fix any such \(\alpha \) and set out to prove that \(A^N_\alpha \rightarrow 0\).

Step 2: controlling the tails of the chaos expansions uniformly in N. Let \(\varepsilon > 0\) be arbitrary. We will show by induction that there are positive integers \(m_1, m_2, \ldots , m_{\alpha }\), which do not depend on N, such that for \(\gamma = 1,\ldots , \alpha + 1\) we have

$$\begin{aligned} \begin{aligned} |A^N_{\alpha }|&\le (\gamma - 1)\varepsilon \\&\quad + \left| {\mathbb {E}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{X_{T}^{i} \le x_N\}}\prod _{\beta = 1}^{\gamma - 1}\left( \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }}I_m^N(T_{\beta - 1}, T_\beta )\right) \prod _{\beta = \gamma }^\alpha \left( \frac{Z_{T_\beta }^N}{Z_{T_{\beta - 1}}^N}-\delta _{\alpha \beta }\right) \right] \right| , \end{aligned} \end{aligned}$$
(6.1)

with the convention that an empty product is equal to one. The base case \(\gamma = 1\) holds trivially because the right-hand side is then just equal to \(|A^N_\alpha |\). Suppose now that for some \(\gamma \in [\alpha ]\) we have determined positive integers \(m_1,\ldots ,m_{\gamma -1}\) such that (6.1) holds. We will find \(m_\gamma \) such that (6.1) is true with \(\gamma \) replaced by \(\gamma +1\).

To this end, decompose the chaos expansion of \(Z^N_{T_\gamma } / Z^N_{T_{\gamma -1}}\) as

$$\begin{aligned} \frac{Z_{T_\gamma }^N}{Z_{T_{\gamma - 1}}^N}-\delta _{\alpha \gamma } = \sum _{m=\delta _{\alpha \gamma }}^{m_{\gamma }} I_m^N(T_{\gamma - 1}, T_\gamma ) + \sum _{m=m_{\gamma } + 1}^{\infty } I_m^N(T_{\gamma - 1}, T_\gamma ). \end{aligned}$$

As will become clear shortly, the infinite series converges in \(L^2\) thanks to Lemma 4.3 and the fact that \(T_{\gamma }-T_{\gamma -1} = T/n\) is sufficiently small. Plugging this into the induction hypothesis (6.1) we get

$$\begin{aligned} |A^N_{\alpha }|\le & {} (\gamma - 1)\varepsilon \nonumber \\{} & {} \quad + \Bigg |{\mathbb {E}}\Bigg [\prod _{i=1}^N {\varvec{1}}_{\{X_{T}^{i} \le x_N\}}\prod _{\beta = 1}^{\gamma }\Bigg (\sum _{m = \delta _{\alpha \beta }}^{m_{\beta }}I_m^N(T_{\beta - 1}, T_\beta )\Bigg ) \prod _{\beta = \gamma +1}^\alpha \Bigg (\frac{Z_{T_\beta }^N}{Z_{T_{\beta - 1}}^N}-\delta _{\alpha \beta }\Bigg )\Bigg ]\Bigg | \nonumber \\{} & {} \quad + \Bigg | {\mathbb {E}}\Bigg [ \prod _{i=1}^N {\varvec{1}}_{\{X_{T}^{i} \le x_N\}}\prod _{\beta = 1}^{\gamma - 1}\Bigg (\sum _{m = \delta _{\alpha \beta }}^{m_{\beta }}I_m^N(T_{\beta - 1}, T_\beta )\Bigg ) \nonumber \\{} & {} \quad \times \Bigg (\sum _{m = m_\gamma + 1}^{\infty }I_m^N(T_{\gamma - 1}, T_\gamma )\Bigg ) \prod _{\beta = \gamma +1}^\alpha \Bigg (\frac{Z_{T_\beta }^N}{Z_{T_{\beta - 1}}^N}-\delta _{\alpha \beta }\Bigg ) \Bigg ] \Bigg | . \end{aligned}$$
(6.2)

The third term on the right-hand side of (6.2) is bounded by

$$\begin{aligned} {\mathbb {E}}\Bigg [ \prod _{\beta = 1}^{\gamma - 1} \Bigg | \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }}I_m^N(T_{\beta - 1}, T_\beta ) \Bigg | \sum _{m = m_\gamma + 1}^{\infty } \left| I_m^N(T_{\gamma - 1}, T_\gamma ) \right| {{\mathbb {E}}}\Bigg [ \Bigg | \prod _{\beta = \gamma +1}^\alpha \Bigg ( \frac{Z_{T_\beta }^N}{Z_{T_{\beta - 1}}^N}-\delta _{\alpha \beta } \Bigg ) \Bigg | {\ \Big |\ }{{\mathcal {F}}}_{T_\gamma } \Bigg ] \Bigg ]. \end{aligned}$$

Note that \(| \prod _{\beta = \gamma +1}^\alpha ( Z_{T_\beta }^N / Z_{T_{\beta - 1}}^N - \delta _{\alpha \beta })|\) is bounded by \((Z^N_{T_\alpha } + Z^N_{T_{\alpha -1}}) / Z^N_{T_\gamma }\) if \(\gamma \le \alpha -1\), and by one if \(\gamma =\alpha \). The martingale property of \(Z^N\) thus implies that the conditional expectation above is bounded by two. Using also Hölder’s inequality, the triangle inequality, and finally Lemma 4.3, we bound the expression in the preceding display by

$$\begin{aligned}&2\, {\mathbb {E}}\Bigg [ \prod _{\beta = 1}^{\gamma - 1} \Bigg | \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }}I_m^N(T_{\beta - 1}, T_\beta ) \Bigg | \sum _{m = m_\gamma + 1}^{\infty } \left| I_m^N(T_{\gamma - 1}, T_\gamma ) \right| \Bigg ] \\&\quad \le 2 \prod _{\beta = 1}^{\gamma - 1} \Bigg \Vert \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }}I_m^N(T_{\beta - 1}, T_\beta ) \Bigg \Vert _{2\gamma -2} \Bigg \Vert \sum _{m = m_\gamma + 1}^{\infty } \left| I_m^N(T_{\gamma - 1}, T_\gamma ) \right| \Bigg \Vert _2 \\&\quad \le 2 \prod _{\beta = 1}^{\gamma - 1} \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }} \left\| I_m^N(T_{\beta - 1}, T_\beta ) \right\| _{2\gamma -2} \sum _{m = m_\gamma + 1}^{\infty } \left\| I_m^N(T_{\gamma - 1}, T_\gamma ) \right\| _2 \\&\quad \le 2 \prod _{\beta = 1}^{\gamma - 1} \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }} \left( C(T) (\gamma -1) \sqrt{T/n} \right) ^m \sum _{m = m_\gamma + 1}^{\infty } \left( C(T) \sqrt{T/n} \right) ^m. \end{aligned}$$

Thanks to the choice of n in Step 1, the right-hand side is bounded by

$$\begin{aligned} 2 \prod _{\beta = 1}^{\gamma - 1} \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }} \left( \frac{\gamma -1}{2} \right) ^m \sum _{m = m_\gamma + 1}^{\infty } 2^{-m}. \end{aligned}$$

We now simply choose \(m_\gamma \) large enough that this expression is less than \(\varepsilon \); this is possible since the last sum equals \(2^{-m_\gamma }\). Plugging this back into (6.2) yields (6.1) with \(\gamma \) replaced by \(\gamma +1\). This completes the induction step and shows that (6.1) holds for all \(\gamma =1,\ldots ,\alpha +1\). In particular, taking \(\gamma =\alpha +1\) we obtain

$$\begin{aligned} |A^N_{\alpha }| \le \alpha \varepsilon + \left| {\mathbb {E}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{X_{T}^{i} \le x_N\}}\prod _{\beta = 1}^{\alpha } \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }}I_m^N(T_{\beta - 1}, T_\beta ) \right] \right| . \end{aligned}$$
(6.3)

Step 3: reduction to linear drift. We now linearize the drift function \(B(t, {\varvec{x}}_{[0,t]}, r)\) with respect to its third argument. We write \(D_3 B(t, {\varvec{x}}_{[0,t]}, r)\) for the derivative with respect to r and define for simplicity the process

$$\begin{aligned} D_3 B^i_t = D_3 B\left( t, X^i_{[0,t]}, \int g(t, X^i_{[0,t]}, {\varvec{y}}_{[0,t]}) \mu _t(d{\varvec{y}}) \right) . \end{aligned}$$

Note that \(D_3B^i\) is adapted to the filtration \(({{\mathcal {F}}}^{\{i\}}_t)_{t \ge 0}\) generated by \((X^i,W^i)\). We also write \(H^i_t(\mu ) = \int g(t, X^i_{[0,t]}, {\varvec{y}}_{[0,t]}) \mu (d{\varvec{y}})\) for any signed measure \(\mu \) on \(C({{\mathbb {R}}}_+)\). We then have the Taylor formula

$$\begin{aligned} \Delta B^{i,N}_t = D_3 B^i_t \, H^i_t(\mu ^N_t - \mu _t) + R^{i,N}_t \left( H^i_t(\mu ^N_t - \mu _t) \right) ^2, \end{aligned}$$

where \(R^{i,N}\) is a process which is uniformly bounded in terms of the bound on the second derivative of \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) given by Assumption 2.1. We now define local martingales

$$\begin{aligned} \widetilde{M}_t^N = \sum _{i=1}^N \int _0^t D_3B_s^i \, H_s^i(\mu _s^N - \mu _s) dW_s^i \end{aligned}$$
(6.4)

and iterated integrals

$$\begin{aligned} \widetilde{I}_m^N(s,t) = \int _s^t\int _s^{t_1}\cdots \int _s^{t_{m-1}} d\widetilde{M}^N_{t_m}\cdots d\widetilde{M}^N_{t_1}. \end{aligned}$$
(6.5)

We will prove that there exists a constant C, which does not depend on N, such that

$$\begin{aligned} {\mathbb {E}}\left[ \left| \prod _{\beta = 1}^{\alpha } \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }}I_m^N(T_{\beta - 1}, T_\beta ) - \prod _{\beta = 1}^{\alpha } \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }} {\widetilde{I}}_m^N(T_{\beta - 1}, T_\beta ) \right| \right] \le \frac{C}{\sqrt{N}}. \end{aligned}$$
(6.6)

To prove this, we expand the products and use the triangle inequality to bound the left-hand side by

$$\begin{aligned} \sum _{k_1,\ldots ,k_\alpha } {\mathbb {E}} \left[ \left| I_{k_1}^N(T_{0},T_1)\cdots I_{k_\alpha }^N(T_{\alpha - 1},T_\alpha ) - \widetilde{I}_{k_1}^N(T_{0},T_1)\cdots \widetilde{I}_{k_\alpha }^N(T_{\alpha - 1},T_\alpha ) \right| \right] , \end{aligned}$$

where the sum ranges over all \((k_1,\ldots , k_\alpha )\) such that \(k_\beta \in [m_{\beta }]\cup \{0\}\) for \(\beta < \alpha \) and \(k_{\alpha }\in [m_\alpha ]\). On each summand in the above expression we apply the identity

$$\begin{aligned} \prod _{\beta =1}^\alpha x_\beta - \prod _{\beta =1}^\alpha y_\beta = \sum _{(i_1,\ldots ,i_\alpha )\in \{0,1\}^\alpha \setminus \{\textbf{0}\}}\prod _{\beta = 1}^\alpha (x_\beta - y_\beta )^{i_\beta }y_\beta ^{1-i_\beta } \end{aligned}$$

(for \(\alpha = 2\), for instance, this identity reads \(x_1 x_2 - y_1 y_2 = (x_1 - y_1)y_2 + y_1(x_2 - y_2) + (x_1 - y_1)(x_2 - y_2)\)) and then use the triangle inequality along with Hölder’s inequality to get

$$\begin{aligned}&{\mathbb {E}} \left[ \left| I_{k_1}^N(T_{0},T_1)\cdots I_{k_\alpha }^N(T_{\alpha - 1},T_\alpha ) - \widetilde{I}_{k_1}^N(T_{0},T_1)\cdots \widetilde{I}_{k_\alpha }^N(T_{\alpha - 1},T_\alpha ) \right| \right] \nonumber \\&\quad \le \sum _{(i_1,\ldots ,i_\alpha )\in \{0,1\}^\alpha \setminus \{\textbf{0}\}} {\mathbb {E}}\left[ \left| \prod _{\beta = 1}^\alpha \left( I_{k_\beta }^N(T_{\beta - 1}, T_\beta ) - \widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta )\right) ^{i_\beta }\widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta )^{1-i_\beta }\right| \right] \nonumber \\&\quad \le \sum _{(i_1,\ldots ,i_\alpha )\in \{0,1\}^\alpha \setminus \{\textbf{0}\}}\prod _{\beta = 1}^\alpha \left\| \left( I_{k_\beta }^N(T_{\beta - 1}, T_\beta ) - \widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta ) \right) ^{i_\beta }\widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta )^{1-i_\beta } \right\| _{\alpha }. \end{aligned}$$
(6.7)

Thus it suffices to bound each of the products in (6.7) by a constant times \(1/\sqrt{N}\). Since each of these products has at least one factor with \(i_\beta = 1\), this will follow directly from the estimates

$$\begin{aligned} \Vert \widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta )\Vert _{\alpha } \le C \end{aligned}$$
(6.8)

and

$$\begin{aligned} \Vert I_{k_\beta }^N(T_{\beta - 1}, T_\beta ) - \widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta )\Vert _{\alpha } \le \frac{C}{\sqrt{N}}. \end{aligned}$$
(6.9)

Once these estimates have been proved, (6.6) follows.

To prove (6.8)–(6.9) we first derive \(L^p\) estimates for the quadratic variations of \({\widetilde{M}}^N\) and \(M^N - {\widetilde{M}}^N\). Let C be a uniform bound on the first and second derivatives of \(r \mapsto B(t, {\varvec{x}}_{[0,t]}, r)\) as given by Assumption 2.1, and recall that Assumption 2.3 gives

$$\begin{aligned} {\mathbb {E}}\left[ \left( H_t^i(\mu _t^N - \mu _t) \right) ^{2p} \right] \le \frac{1}{N^p}p!K(t)^p \end{aligned}$$

for any positive integer p and \(t \in {{\mathbb {R}}}_+\). Therefore using Hölder’s inequality we obtain, for any positive integer p,

$$\begin{aligned} \begin{aligned} \left\| \langle \widetilde{M}^N \rangle _{s,t}^{1/2} \right\| _{2p}&\le C\, {\mathbb {E}}\left[ \left( \sum _{i=1}^N\int _s^t \left( H_u^i(\mu _u^N - \mu _u) \right) ^2 du \right) ^{p}\right] ^{\frac{1}{2p}} \\&\le C\, (N(t - s))^{\frac{1}{2} - \frac{1}{2p}} \left( \sum _{i=1}^N\int _s^t {\mathbb {E}}\left[ \left( H_u^i(\mu _u^N - \mu _u) \right) ^{2p} \right] du\right) ^{\frac{1}{2p}} \\&\le C\, (t-s)^{1/2} (p!)^{1/(2p)} \sup _{s \le u \le t} K(u)^{1/2} \end{aligned} \end{aligned}$$
(6.10)

and

$$\begin{aligned} \begin{aligned} \left\| \langle M^N - \widetilde{M}^N \rangle _{s,t}^{1/2} \right\| _{2p}&\le C\, {\mathbb {E}}\left[ \left( \sum _{i=1}^N\int _s^t \left( H_u^i(\mu _u^N - \mu _u) \right) ^4 du \right) ^{p}\right] ^{\frac{1}{2p}} \\&\le C\, (N(t - s))^{\frac{1}{2} - \frac{1}{2p}} \left( \sum _{i=1}^N \int _s^t {\mathbb {E}} \left[ \left( H_u^i(\mu _u^N - \mu _u)\right) ^{4p} \right] du \right) ^{\frac{1}{2p}} \\&\le C\, (t-s)^{1/2} ((2p)!)^{1/(2p)} \sup _{s \le u \le t} K(u) \frac{1}{\sqrt{N}}. \end{aligned} \end{aligned}$$
(6.11)

To prove (6.8) we apply Lemma 4.5 to the iterated integral \(\widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta )\) and combine this with (6.10) to get

$$\begin{aligned} \Vert \widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta )\Vert _{\alpha } \le C \prod _{\ell =1}^{k_\beta } \left\| \langle \widetilde{M}^N \rangle _{T_{\beta -1},T_{\beta }}^{1/2} \right\| _{2^{\ell }\alpha } \le C. \end{aligned}$$

To prove (6.9) we observe that \(I_{k_\beta }^N(T_{\beta - 1}, T_\beta ) - \widetilde{I}_{k_\beta }^N(T_{\beta - 1}, T_\beta )\) can be written as the sum of \(2^{k_\beta } - 1\) terms, each having the form

$$\begin{aligned} \int _{T_{\beta -1}}^{T_\beta }\int _{T_{\beta -1}}^{t_1}\cdots \int _{T_{\beta -1}}^{t_{k_\beta -1}} d{Y}^{k_\beta }_{t_{k_\beta }}\cdots d{Y}^1_{t_1}, \end{aligned}$$

where \(Y^{\ell } = M^N - \widetilde{M}^N\) for at least one \(\ell \) and \(Y^{\ell } = \widetilde{M}^N\) for the remaining \(\ell \). By first applying Lemma 4.5 and then (6.10) and (6.11) we get

$$\begin{aligned} \left\| \int _{T_{\beta -1}}^{T_\beta }\int _{T_{\beta -1}}^{t_1}\cdots \int _{T_{\beta -1}}^{t_{k_\beta -1}} d{Y}^{k_\beta }_{t_{k_\beta }}\cdots d{Y}^1_{t_1} \right\| _{\alpha } \le C \prod _{\ell =1}^{k_\beta }\left\| \langle Y^\ell \rangle _{T_{\beta -1}, T_\beta }^{1/2}\right\| _{2^{\ell }\alpha } \le \frac{C}{\sqrt{N}}. \end{aligned}$$

(Here we used (6.10) for each \(Y^{\ell }\) that equals \(\widetilde{M}^N\) and (6.11) for each \(Y^{\ell }\) that equals \(M^N - \widetilde{M}^N\), and the \(1/\sqrt{N}\) factor emerged because there is at least one factor of the latter kind.) By summing and using the triangle inequality we finally obtain (6.9).

To summarize, we have now proved (6.8)–(6.9), thus showing that each of the products in (6.7) is bounded by a constant times \(1/\sqrt{N}\). This in turn yields (6.6) as desired.

We end Step 3 by combining (6.6) and (6.3) to get

$$\begin{aligned} |A^N_{\alpha }| \le \alpha \varepsilon + \frac{C}{\sqrt{N}} + \left| {\mathbb {E}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{X_{T}^{i} \le x_N\}}\prod _{\beta = 1}^{\alpha } \sum _{m = \delta _{\alpha \beta }}^{m_{\beta }} {\widetilde{I}}_m^N(T_{\beta - 1}, T_\beta ) \right] \right| . \end{aligned}$$
(6.12)

The key point is that the iterated integrals \({\widetilde{I}}^N_m(T_{\beta -1},T_\beta )\) are defined in terms of the local martingale \({\widetilde{M}}^N\) in (6.4) which, unlike \(M^N\), depends linearly on \(\mu ^N-\mu \). In a sense, all nonlinear dependence on \(\mu ^N-\mu \) has been absorbed into the vanishing term \(C/\sqrt{N}\).

Step 4: expanding the iterated integrals. Our starting point is now (6.12), where we recall that \(\alpha \) is fixed, \(\varepsilon >0\) is arbitrary, and \(m_1,\ldots ,m_\alpha , C\) do not depend on N. Therefore, to show that \(A^N_\alpha \rightarrow 0\) as \(N \rightarrow \infty \), it is enough to show that the expectation in (6.12) tends to zero as \(N \rightarrow \infty \). We now pave the way by expanding the sums and products in (6.12) to bring us into a position where the results of Sect. 4 can be applied.

We first expand the product indexed by \(\beta \) to bound the expectation in (6.12) by

$$\begin{aligned} \sum _{k_1,\ldots ,k_\alpha } \left| {\mathbb {E}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{X_{T}^{i} \le x_N\}} \widetilde{I}_{k_1}^N(T_{0},T_1)\cdots \widetilde{I}_{k_\alpha }^N(T_{\alpha - 1},T_\alpha ) \right] \right| , \end{aligned}$$
(6.13)

where the sum ranges over all \((k_1,\ldots , k_\alpha )\) such that \(k_\beta \in [m_{\beta }]\cup \{0\}\) for \(\beta < \alpha \) and \(k_{\alpha }\in [m_\alpha ]\). It suffices to show that each summand in (6.13) vanishes as \(N \rightarrow \infty \), so we fix a tuple \((k_1,\ldots ,k_\alpha )\) and focus on the corresponding expectation.

The next step is to insert the identity \({\varvec{1}}_{\{X_{T}^{i} \le x_N\}} = 1 - {\varvec{1}}_{\{X_{T}^{i} > x_N\}}\) and expand the product to write the expectation as

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \prod _{i=1}^N {\varvec{1}}_{\{X_{T}^{i} \le x_N\}} \widetilde{I}_{k_1}^N(T_{0},T_1)\cdots \widetilde{I}_{k_\alpha }^N(T_{\alpha - 1},T_\alpha ) \right] \\&\quad =\sum _{\kappa =1}^N (-1)^{\kappa }\sum _{\{i_{01},\ldots ,i_{0\kappa }\} \subset [N]}{\mathbb {E}} \left[ \prod _{\ell =1}^{\kappa } {\varvec{1}}_{\{X_{T}^{i_{0\ell }} > x_N\}} \widetilde{I}_{k_1}^N(T_{0},T_1)\cdots \widetilde{I}_{k_\alpha }^N(T_{\alpha - 1},T_\alpha ) \right] . \end{aligned} \end{aligned}$$
(6.14)

The purpose of the substitution \({\varvec{1}}_{\{X_{T}^{i} \le x_N\}} = 1 - {\varvec{1}}_{\{X_{T}^{i} > x_N\}}\) is to allow us to use the fact that the \(\{X_{T}^{i} > x_N\}\) are independent events whose probabilities are of order 1/N. (Note that there is no \(\kappa = 0\) term in (6.14): since \(k_\alpha \ge 1\), conditioning on \({{\mathcal {F}}}_{T_{\alpha - 1}}\) shows that \({\mathbb {E}}[ \widetilde{I}_{k_1}^N(T_{0},T_1)\cdots \widetilde{I}_{k_\alpha }^N(T_{\alpha - 1},T_\alpha ) ] = 0\).)

We proceed to expand the iterated integrals \(\widetilde{I}_{k_{\beta }}^N(T_{\beta - 1}, T_{\beta })\). In view of (6.4) and the definition of \(H_s^i(\mu ^N_s - \mu _s)\) and \(\mu ^N_s\) we have

$$\begin{aligned} \widetilde{M}_t^N = \frac{1}{N}\sum _{i=1}^N\sum _{j=1}^N\int _0^tG_s^{ij} dW_s^i, \end{aligned}$$
(6.15)

where

$$\begin{aligned} G_t^{ij} = D_3B_t^i\, \left( g\left( t,X_{[0,t]}^i,X_{[0,t]}^j\right) - \int g\left( t,X_{[0,t]}^i,{\varvec{y}}_{[0,t]}\right) \mu _t(d{{\varvec{y}}}) \right) \end{aligned}$$
(6.16)

for all \(i, j \in [N]\). Plugging (6.15) and (6.16) into the definition of \(\widetilde{I}_{k_{\beta }}^N(T_{\beta - 1}, T_{\beta })\), see (6.5), gives

$$\begin{aligned} \widetilde{I}^N_{k_\beta }(T_{\beta - 1}, T_\beta ) = \frac{1}{N^{k_\beta }}\sum _{{\varvec{i}}\in [N]^{k_\beta }}\sum _{{{\varvec{j}}}\in [N]^{k_\beta }}I_{{\varvec{i}},{{\varvec{j}}}}^{N}(T_{\beta - 1}, T_\beta ), \end{aligned}$$
(6.17)

where we use the iterated integral notation (4.1) of Sect. 4. For \(k_\beta = 1\), for instance, (6.17) simply states that \(\widetilde{M}^N_{T_\beta } - \widetilde{M}^N_{T_{\beta - 1}} = N^{-1}\sum _{i,j \in [N]} I_{(i),(j)}^{N}(T_{\beta - 1}, T_\beta )\). The product of iterated integrals appearing in (6.14) can then be written

$$\begin{aligned}{} & {} \widetilde{I}_{k_1}^N(T_{0},T_1)\cdots \widetilde{I}_{k_\alpha }^N(T_{\alpha - 1},T_\alpha )\\{} & {} \quad = \frac{1}{N^{k_1 + \cdots + k_\alpha }}\sum _{({\varvec{i}}_1,{{\varvec{j}}}_1),\ldots ,({\varvec{i}}_\alpha ,{{\varvec{j}}}_\alpha )} \prod _{\beta = 1}^{\alpha } I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1}, T_\beta ) \end{aligned}$$

where the sum extends over all \(\alpha \)-tuples \((({\varvec{i}}_1,{{\varvec{j}}}_1),\ldots ,({\varvec{i}}_\alpha ,{{\varvec{j}}}_\alpha ))\) consisting of pairs \(({\varvec{i}}_{\beta },{{\varvec{j}}}_{\beta })\) in \([N]^{k_\beta } \times [N]^{k_\beta }\). Substituting this representation turns the right-hand side of (6.14) into

$$\begin{aligned} \frac{1}{N^{k_1 + \cdots + k_\alpha }} \sum _{\kappa =1}^N (-1)^{\kappa }\sum _{\{i_{01},\ldots ,i_{0\kappa }\} \subset [N]} \sum _{({\varvec{i}}_1,{{\varvec{j}}}_1),\ldots ,({\varvec{i}}_\alpha ,{{\varvec{j}}}_\alpha )} {\mathbb {E}} \left[ \prod _{\ell =1}^{\kappa } {\varvec{1}}_{\{X_{T}^{i_{0\ell }} > x_N\}} \prod _{\beta = 1}^{\alpha } I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1}, T_\beta ) \right] , \end{aligned}$$

whose absolute value is bounded by

$$\begin{aligned} \frac{1}{N^{k_1 + \cdots + k_\alpha }} \sum _{\kappa =1}^N \sum _{\{i_{01},\ldots ,i_{0\kappa }\} \subset [N]} \sum _{({\varvec{i}}_1,{{\varvec{j}}}_1),\ldots ,({\varvec{i}}_\alpha ,{{\varvec{j}}}_\alpha )} \left| {\mathbb {E}} \left[ \prod _{\ell =1}^{\kappa } {\varvec{1}}_{\{X_{T}^{i_{0\ell }} > x_N\}} \prod _{\beta = 1}^{\alpha } I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1}, T_\beta ) \right] \right| .\nonumber \\ \end{aligned}$$
(6.18)

We are now finally in a position where the results of Sect. 4 can be applied to show that (6.18) tends to zero as \(N\rightarrow \infty \). In particular, for small values of \(\kappa \), an overwhelming number of expectations in (6.18) will be zero, while for large values of \(\kappa \) we can exploit the smallness of the probabilities \({{\mathbb {P}}}(X_{T}^{i_{0\ell }} > x_N)\).

Step 5: application of key lemmas. Our focus is on showing that (6.18) tends to zero as \(N \rightarrow \infty \), and we recall that \(\alpha \) and \(k_1,\ldots ,k_\alpha \) are fixed.

We first aim to apply Lemma 4.1 to assert that a large number of the expectations in (6.18) are in fact zero. We thus fix \(\kappa \in [N]\) and instantiate the lemma with \(G^{ij}\) as in (6.16), the time points \(T_0,\ldots ,T_\alpha \), the natural numbers \(k_1,\ldots ,k_\alpha \), the \(k_\beta \)-tuples \(\varvec{i}_\beta , {\varvec{j}}_\beta \in [N]^{k_\beta }\) for \(\beta \in [\alpha ]\), the subset \(K = \{i_{01},\ldots ,i_{0\kappa }\} \subset [N]\), and the bounded \({{\mathcal {F}}}^K_T\)-measurable random variable \(\Psi = \prod _{\ell =1}^{\kappa } {\varvec{1}}_{\{X_{T}^{i_{0\ell }} > x_N\}}\). We must verify the conditions of Lemma 4.1. It is clear that for each pair (i, j), \(G^{ij}\) is adapted to \(({{\mathcal {F}}}^{\{i,j\}}_t)_{t \ge 0}\) and, thanks to Assumption 2.3 and the uniform boundedness of \(D_3B^i\), belongs to \({\mathbb {L}}\). Indeed, a brief calculation yields

$$\begin{aligned} {{\mathbb {E}}}\left[ ( G^{ij}_t )^{2p} \right] \le 2 C p! K(t)^p \end{aligned}$$
(6.19)

for any \(p \in {{\mathbb {N}}}\) and \(t \in {{\mathbb {R}}}_+\), where C is a uniform bound on \(D_3B^i\) and K(t) comes from Assumption 2.3. Moreover, using that the \((X^i,W^i)\) are mutually independent and \(D_3B^i\) is adapted to \(({{\mathcal {F}}}^{\{i\}}_t)_{t \ge 0}\), one verifies that (4.2) holds whenever \(V \subset [N]\), \(i \in V\), and \(j \notin V\).

Lemma 4.1 now tells us that the expectation in (6.18) vanishes whenever the subset \(K=\{i_{01},\ldots ,i_{0\kappa }\}\) and tuples \(\varvec{i}_1,\ldots ,\varvec{i}_\alpha , {\varvec{j}}_1, \ldots , {\varvec{j}}_\alpha \) satisfy at least one of the conditions (i)–(ii) of the lemma. Thanks to Lemma 4.2, for each \(\kappa \in [N]\) there are at most

$$\begin{aligned} \left( {\begin{array}{c}N\\ \kappa \end{array}}\right) \kappa (\kappa +1) \cdots (\kappa +S-1) (\kappa +S) N^{S-1} \end{aligned}$$

terms for which this is not the case, where we write \(S = k_1 + \ldots + k_{\alpha }\). We claim, and will prove below, that each of these nonzero terms admits the bound

$$\begin{aligned}{} & {} \left| {\mathbb {E}} \left[ \prod _{\ell =1}^{\kappa } {\varvec{1}}_{\{X_{T}^{i_{0\ell }} > x_N\}} \prod _{\beta = 1}^{\alpha } I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1}, T_\beta ) \right] \right| \le C \lceil \log N \rceil ^{S}\left( \frac{C}{N}\right) ^{\kappa (1 - 1/\log N)} \qquad \end{aligned}$$
(6.20)

for a constant C that does not depend on N. Combining these two facts we upper bound (6.18) by

$$\begin{aligned}&\frac{1}{N^{k_1 + \cdots + k_\alpha }} \sum _{\kappa =1}^N \left( {\begin{array}{c}N\\ \kappa \end{array}}\right) \kappa (\kappa +1) \cdots (\kappa +S) N^{S-1} C \lceil \log N \rceil ^{S}\left( \frac{C}{N}\right) ^{\kappa (1 - 1/\log N)} \\&\quad = \frac{C \lceil \log N \rceil ^{S}}{N} \sum _{\kappa =1}^N \left( {\begin{array}{c}N\\ \kappa \end{array}}\right) \kappa (\kappa +1) \cdots (\kappa +S) \left( \frac{C}{N}\right) ^{\kappa (1 - 1/\log N)}. \end{aligned}$$

Thanks to Lemma 4.6, this is in turn bounded by

$$\begin{aligned} \frac{C \lceil \log N \rceil ^{S}}{N} (S+2)(S+1)^{2(S+1)}e^{Ce}(Ce)^{S+1}, \end{aligned}$$

which tends to zero as \(N \rightarrow \infty \). Tracing backwards, we deduce that (6.18) and hence (6.14) tend to zero as well. This is true for any choice of \((k_1,\ldots ,k_\alpha )\), showing that (6.13) tends to zero. As a result, we see from (6.12) that \(\limsup _{N\rightarrow \infty } |A^N_\alpha | \le \alpha \varepsilon \) and thus \(A^N_\alpha \rightarrow 0\) since \(\varepsilon > 0\) was arbitrary. We recall from Step 1 that it was enough to obtain this for every \(\alpha \in [n]\) in order to prove the theorem.

It still remains to establish (6.20). Applying Hölder’s inequality with exponents \(p_N=\log N\) and \(q_N = (1-1/\log N)^{-1}\), so that \(1/q_N + \alpha \cdot \frac{1}{\alpha p_N} = 1\), gives

$$\begin{aligned}&\left| {\mathbb {E}} \left[ \prod _{\ell =1}^{\kappa } {\varvec{1}}_{\{X_{T}^{i_{0\ell }}> x_N\}} \prod _{\beta = 1}^{\alpha } I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1}, T_\beta ) \right] \right| \\&\quad \le {\mathbb {E}} \left[ \prod _{\ell =1}^{\kappa } {\varvec{1}}_{\{X_{T}^{i_{0\ell }}> x_N\}} \right] ^{\frac{1}{q_N}} \prod _{\beta = 1}^{\alpha } \left\| I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1}, T_\beta ) \right\| _{\alpha p_N} \\&\quad = {{\mathbb {P}}}\left( X_T > x_N \right) ^{\kappa /q_N} \prod _{\beta = 1}^{\alpha } \left\| I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1}, T_\beta ) \right\| _{\alpha p_N}. \end{aligned}$$

Using Lemma 4.5 with \(k = k_{\beta }\) and \(M_{t}^{\ell } = \int _0^{t}G^{i_{\beta \ell },j_{\beta \ell }}_s dW_s^{i_{\beta \ell }}\) for \(\ell \in [k_{\beta }]\) we bound

$$\begin{aligned} \left\| I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1}, T_\beta ) \right\| _p\le (4\sqrt{p})^{k_\beta } 2^{k_\beta (k_\beta -1)/4}\prod _{\ell =1}^{k_\beta }\Vert \langle M^{\ell } \rangle _{T_{\beta -1},T_\beta }^{1/2}\Vert _{2^{\ell }p} \end{aligned}$$

for any \(p \in {{\mathbb {N}}}\). Moreover, from Hölder’s inequality and (6.19), and recalling also that \(T_\beta - T_{\beta -1} = T/n\) (see Step 1), we get

$$\begin{aligned}&\Vert \langle M^{\ell } \rangle _{T_{\beta -1},T_\beta }^{1/2}\Vert _{2^{\ell }p}\\&\quad \le \left( \frac{T}{n} \right) ^{\frac{1}{2} - \frac{1}{2^\ell p}} \left( \int _{T_{\beta -1}}^{T_\beta } {{\mathbb {E}}}\left[ ( G^{ij}_s )^{2^\ell p} \right] ds \right) ^{1/(2^\ell p)} \\&\quad \le \left( \frac{T}{n} \right) ^{1/2} (2 C (2^{\ell -1}p)! )^{1/(2^\ell p)} \sup _{t \le T} K(t)^{1/2} \\&\quad \le C \sqrt{p}, \end{aligned}$$

where in the last step we used Stirling’s approximation and where C (which as per our conventions may change from one occurrence to the next) does not depend on p or N. We deduce that

$$\begin{aligned} \left\| I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1}, T_\beta ) \right\| _p\le C (4\sqrt{p})^{k_\beta } \prod _{\ell =1}^{k_\beta } \sqrt{p} = C p^{k_\beta }. \end{aligned}$$

Choosing \(p = \alpha \lceil p_N \rceil \) we apply the above bounds to obtain

$$\begin{aligned}&\left| {\mathbb {E}} \left[ \prod _{\ell =1}^{\kappa } {\varvec{1}}_{\{X_{T}^{i_{0\ell }}> x_N\}} \prod _{\beta = 1}^{\alpha } I_{{\varvec{i}_{\beta }},{{\varvec{j}}_{\beta }}}^{N}(T_{\beta - 1}, T_\beta ) \right] \right| \\&\quad \le {{\mathbb {P}}}\left( X_T> x_N \right) ^{\kappa /q_N} \prod _{\beta = 1}^{\alpha } C (\alpha \lceil p_N \rceil )^{k_\beta } \\&\quad = C \lceil p_N \rceil ^{S}\, {{\mathbb {P}}}\left( X_T > x_N \right) ^{\kappa /q_N}. \end{aligned}$$

All that remains in order to establish (6.20) is to show that \({{\mathbb {P}}}( X_T > x_N) \le C/N\). But this follows from the fact that \((1 - {{\mathbb {P}}}(X_T> x_N))^N = {{\mathbb {P}}}(\max _{i \le N} X^i_T \le a^N_T x + b^N_T) \rightarrow \Gamma _T(x) > 0\) by assumption, so that \(N {{\mathbb {P}}}(X_T> x_N) \le - N \log (1-{{\mathbb {P}}}(X_T > x_N)) \le C\) for some constant C that does not depend on N. This completes the proof of (6.20), and of the theorem.