1 Introduction

In this article, we study the existence and uniqueness of strong solutions for classes of McKean–Vlasov SDEs, where the drift exhibits a discontinuity in the spatial component. We also provide time-stepping schemes of Euler–Maruyama type, for which we prove strong convergence with a certain rate.

A McKean–Vlasov equation (introduced in [44, 45]) for a d-dimensional process \(X=(X_t)_{t \in [0,T]}\), with a given finite time-horizon \(T >0\), is an SDE where the underlying coefficients depend on the current state \(X_t\) and, additionally, on the law of \(X_t\). We consider more specifically the one-dimensional equation of the form

$$\begin{aligned} \mathrm {d}X_t =b(X_t, {\mathcal {L}}_{X_t}) \, \mathrm {d}t + \sigma (X_t) \, \mathrm {d}W_t, \quad X_0 = \xi \in L_2^{0}({\mathbb {R}}), \end{aligned}$$
(1.1)

where \(L_2^{0}({\mathbb {R}})\) denotes the space of real-valued, \({\mathcal {F}}_0\)-measurable random variables with finite second moments, \((W_t)_{t \in [0,T]}\) is a one-dimensional standard Brownian motion and \({\mathcal {L}}_{X_t}\) denotes the marginal law of the process X at time \(t \in [0,T]\). In particular, we are concerned with the well-posedness of equations of the form (1.1), where \(b(\cdot ,\mu )\) is discontinuous in zero, and piecewise Lipschitz on the subintervals \((-\infty ,0)\) and \((0,\infty )\). Concerning the measure component of the drift, we will require global Lipschitz continuity with respect to the Wasserstein distance with quadratic cost, denoted by \({\mathcal {W}}_2\) (see below for a precise definition). The diffusion term will be only state dependent and globally Lipschitz continuous. Our setting contrasts with the standard case of globally Lipschitz continuous coefficients, which is well-studied in the literature, both from an analytic and a numerical perspective; see, e.g., [7, 46, 58].

The study of SDEs and McKean–Vlasov equations with discontinuous drift is motivated by such models in biology (see, e.g., [25]) and financial mathematics (see, e.g., Atlas models in equity markets [3, 33] and dividend maximisation problems [59]). Further, in stochastic control a discontinuous control can lead to equations with discontinuous drift (see [57]). In the context of stochastic N-player games, non-smooth cost functions (such as the \(\ell _1\)-regularisation) or constraints on the size of the control process can result in discontinuous controls (bang-bang type optimal controls) and hence will give controlled state dynamics with discontinuous drift, as in [11].

We start our literature review with some key references on standard SDEs with irregular and discontinuous drift, namely [39, 40, 61, 62], and then proceed to discuss some recent articles on McKean–Vlasov SDEs with non-Lipschitz drift. Zvonkin [62] (for one-dimensional SDEs) and Veretennikov [61] (for the multi-dimensional setting) prove the existence of a unique strong solution for an SDE where the drift is assumed to be measurable and bounded, but the diffusion coefficient \(\sigma \) needs to satisfy rather strong assumptions, namely that it is bounded and uniformly elliptic, i.e., there is a \(\lambda >0\) such that for all \(x \in {\mathbb {R}}^{d}\) and all \(v \in {\mathbb {R}}^{d}\), we have \(v^{\top }\sigma (x)\sigma (x)^{\top }v \ge \lambda v^{\top }v\). An interesting addition to the aforementioned results in the case where the diffusion is not uniformly elliptic was established in the one-dimensional case in [39]. The authors assume the drift coefficient to be piecewise Lipschitz and \(\sigma \) to be globally Lipschitz with \(\sigma (\eta ) \ne 0\) for each of finitely many points of discontinuity \(\eta \) of the drift. This condition guarantees that the process does not spend a positive amount of time in the singularity. Under these assumptions, by explicitly constructing a transformation that removes the singularities, the existence of a unique strong solution can be proven and a numerical procedure for solving this class of SDEs can be constructed.

The main contribution of [40] is the extension of the one-dimensional case to the multi-dimensional setting under the assumption of piecewise Lipschitz continuity of the drift. In [40], the authors introduce a meaningful concept of piecewise Lipschitz continuity in higher dimensions, which is based on the notion of the so-called intrinsic metric. As already indicated by the one-dimensional case, there needs to be an intricate connection between the geometry of the set of discontinuities and the diffusion coefficient. We note that the exceptional set of singularities, denoted by \(\varTheta \), is assumed to be a \({\mathcal {C}}^{4}\) hypersurface and for the diffusion part one requires the following: There exists a constant \(C>0\) such that \(| \sigma (\eta )^{\top } n(\eta ) | \ge C\) for all \(\eta \in \varTheta \), where \(n(\eta )\) is orthogonal to the tangent space of \(\varTheta \) in \(\eta \) and \(| n(\eta )| =1\). Under these assumptions (and some additional technical conditions on the coefficients and on the geometry of \(\varTheta \)) the existence of a unique strong solution for multi-dimensional SDEs with piecewise Lipschitz continuous drift can be proven.

Moving on to McKean–Vlasov equations, the existence and uniqueness theory for strong solutions of such SDEs with coefficients of linear growth and Lipschitz type conditions (with respect to the state and the measure component) is well-established (see, e.g., [13, 58]). More general existence/uniqueness results for weak and strong solutions of McKean–Vlasov SDEs can be found in [6, 37, 48]. The article [6] is concerned with the weak and strong existence/uniqueness of one-dimensional equations with additive noise, where the drift is assumed to be measurable, continuous in the measure component with respect to the Monge–Kantorovich metric and further satisfies a linear growth condition. In [37, 48], a d-dimensional setting is considered, where the drift is assumed to be bounded, measurable (and possibly path-dependent) and Lipschitz continuous in the measure component with respect to the total variation distance. The diffusion is non-degenerate and independent of the measure. Under these assumptions (and some technical conditions) weak existence and uniqueness are proven.

For further recent existence and uniqueness results for strong and weak solutions of McKean–Vlasov SDEs, including results concerning standard Lipschitz assumptions on the coefficients, we refer to [15, 20, 21, 27, 30, 42, 55] and the references given therein.

The numerical analysis of SDEs with discontinuous drifts has received significant attention over the last few years, see, e.g., [19, 29, 38–41, 49–52] and the references therein for well-posedness results and for strong and weak convergence rates of numerical schemes.

In particular, in [39] (for the one-dimensional case) and in [40] (for the multi-dimensional setting) the standard strong convergence rate of order 1/2 for a method derived from the Euler–Maruyama scheme was proven. However, the applicability of these schemes is limited as they require the explicit knowledge of a transformation (and its inverse) to map the SDE with discontinuous coefficients into one with Lipschitz continuous coefficients. In [41], an Euler–Maruyama scheme without the aforementioned transformation is introduced (in a multi-dimensional setting). While this scheme is easier to apply, the authors only show a strong convergence rate of order \(1/4 - \varepsilon \) for any \(\varepsilon >0\), imposing also the stronger assumption of boundedness for both coefficients of the underlying SDE. The central idea of [41] is to quantify the probability that a multi-dimensional process is in a small neighbourhood of the set of discontinuities, using an occupation time formula. In the one-dimensional case, with coefficients of linear growth, these techniques were refined in [49] and the expected strong convergence rate of order 1/2 was recovered. Other recent works concerned with the numerical approximation of SDEs with discontinuous drifts include [50, 51], where a higher order scheme and an adaptive time-stepping scheme were introduced, respectively. In [38] a numerical scheme for classical one-dimensional diffusion processes generated by a differential operator involving discontinuous coefficients is presented. As the generator is non-local for McKean–Vlasov equations it seems a challenging problem to use these techniques in our framework.

The simulation of McKean–Vlasov SDEs typically involves two steps: First, at each time t, the true measure \({\mathcal {L}}_{X_t}\) is approximated by the empirical measure

$$\begin{aligned} \mu _t^{{\varvec{X}}^{N}}(\mathrm {d}x) := \frac{1}{N}\sum _{j=1}^{N} \delta _{X_t^{j,N}}(\mathrm {d}x), \end{aligned}$$

where \(\delta _{x}\) denotes the Dirac measure at point x and \(({\varvec{X}}^{N}_t)_{ t \in [0,T]} = (X_t^{1,N}, \ldots , X_t^{N,N})_{t \in [0,T]}^{\top }\), an interacting particle system, is the solution to the \({\mathbb {R}}^{dN}\)-dimensional SDE with components

$$\begin{aligned} \mathrm {d}X_t^{i,N} = b(X_t^{i,N}, \mu _t^{{\varvec{X}}^{N}} ) \, \mathrm {d}t + \sigma (X_t^{i,N}) \, \mathrm {d}W_t^{i}, \quad X_{0}^{i,N} = \xi ^{i}. \end{aligned}$$

Here, \(W^{i} = (W_t^{i})_{t \in [0,T]}\) and \(\xi ^{i}\), for \(i \in \lbrace 1, \ldots , N \rbrace \), are independent Brownian motions (also independent of W) and independent copies of \(\xi \), respectively. In a second step, one needs to introduce a reasonable time-stepping method to discretise the particles \((X_t^{i,N})_{t \in [0,T]}\) over the finite time horizon [0, T]. Numerical schemes for interacting particle systems with Hölder continuous coefficients (in the state variable), and with coefficients satisfying certain monotonicity assumptions (in the state variable) and Lipschitz continuity (in the measure variable), can be found in [4, 5, 36, 54] (and the references cited therein), respectively, where a strong convergence analysis is conducted. In [1, 9], a quantitative \(L_p\)-error analysis in terms of density and cumulative distribution function approximation is presented. The survey [7] discusses several examples and numerical schemes for McKean–Vlasov equations involving singular drifts, e.g., a probabilistic interpretation of the Burgers equation, see also [9], of the 2D-incompressible Navier–Stokes equation (see e.g., [22, 47]) and turbulent flow models [53]. Other examples of McKean–Vlasov equations with singular drifts appear in the Keller–Segel equation [26], the Coulomb gas model [17], the Thomson problem [32], and the Stefan problem [34, 35].
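To make the two approximation steps concrete, the following minimal sketch simulates such an interacting particle system with a plain Euler–Maruyama discretisation. The coefficients (a drift with a single discontinuity in zero, a mean-field interaction through the empirical mean, and constant diffusion) are hypothetical example choices, not taken from the models above.

```python
import numpy as np

def simulate_particles(N=200, M=100, T=1.0, seed=0):
    """Plain Euler-Maruyama for an interacting particle system.

    Illustrative coefficients (assumptions for this example only):
    b(x, mu) = -sign(x) - 1 + (mean(mu) - x),  sigma(x) = 1,
    so the drift has a single discontinuity at x = 0.
    """
    rng = np.random.default_rng(seed)
    h = T / M                       # time-step size
    X = rng.standard_normal(N)      # i.i.d. copies of xi ~ N(0, 1)
    for _ in range(M):
        # the true law L_{X_t} is replaced by the empirical measure of the N particles
        drift = -np.sign(X) - 1.0 + (X.mean() - X)
        X = X + drift * h + np.sqrt(h) * rng.standard_normal(N)
    return X

X_T = simulate_particles()
```

Replacing the true law by the empirical measure is exactly the first approximation step described above; the time loop is the second.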

Our numerical schemes present an original approximation method which, as of now, is restricted to the specific case of a one-dimensional state space and a drift with a single point of discontinuity, but which provides, in this specific framework, a suitable alternative to the more common mollification/cut-off approximation methods.

In this article, we first focus on the decomposable case, namely that

$$\begin{aligned} b(x,\mu ) = b_1(x) + b_2(x,\mu ), \end{aligned}$$

where \(b_1\) is piecewise Lipschitz continuous with a discontinuity in zero, and \(b_2\) satisfies the usual Lipschitz assumptions in both components. This structure allows us to present the main ideas of the analysis to be used later in a more general setting, in particular a transformation of the state variable to remove the discontinuity. In this setting, we prove well-posedness of the McKean–Vlasov equation and the associated particle system. This structure includes the important class of McKean–Vlasov equations of the form

$$\begin{aligned} \mathrm {d}X_t = \left( V(X_t) + \int _{{\mathbb {R}}} \beta (X_t-y) \, {\mathcal {L}}_{X_t}(\mathrm {d}y) \right) \, \mathrm {d}t + \sigma (X_t) \, \mathrm {d}W_t, \quad X_0 = \xi , \end{aligned}$$
(1.2)

where V describes an external potential and \(\beta \) an interaction kernel; see, e.g., [31] and the references cited therein related to mean-field over-damped Langevin equations. These models also embed the class of self-stabilizing diffusions and the McKean–Vlasov model related to the granular media equation.

We then relax the structural assumption of decomposability slightly, in a setting which encompasses the above as a special case, but we still have to require a certain continuity of the measure derivatives at the point of discontinuity. The necessity of this condition arises from the explicit measure (or time) dependence of the employed transformation. A future research direction concerns a setting where the point of discontinuity is time-dependent, or depends on the distribution of the process \((X_t)_{t \in [0,T]}\), which is relevant for the study of further practically important models, e.g., from [33].

Having established the existence of a unique strong solution with bounded moments, we propose two Euler–Maruyama schemes for the particle systems as numerical approximations to the McKean–Vlasov equations. For an Euler–Maruyama scheme applied to the SDE in the transformed state, strong convergence of order 1/2 follows immediately, while for a direct time-discretisation of the particle system without transformation, we are only able to show order 1/9. Numerical tests indicate that this order is in general not sharp. We will discuss the reasons for this gap and possible improvements later.

The main contributions of the present article are as follows. First, we establish the well-posedness of McKean–Vlasov SDEs (with a certain discontinuity) and of their associated particle systems. Techniques from variational calculus on the measure space \({\mathcal {P}}({\mathbb {R}})\) equipped with the Wasserstein distance \({\mathcal {W}}_2\) will be essential in the proofs, due to the possible measure dependence of the transformation applied to the processes as described above. The second central contribution of the present paper is the development of numerical schemes for approximating such McKean–Vlasov SDEs and their associated particle systems. Here, a non-standard strong convergence analysis based on occupation time estimates of the discretised processes in a neighbourhood of the discontinuity will be presented.

The remainder of the paper is organised as follows: In Sect. 2, we collect all preliminary tools and notions needed throughout the paper. The precise problem description and the main results are presented in Sect. 3. Then, Sect. 4 discusses numerical schemes for McKean–Vlasov SDEs with discontinuous drift. We show strong convergence of certain orders with respect to the number of particles and time-steps, respectively. In Sect. 5, we apply our numerical scheme to a model problem arising in neuroscience [25] and to a slight modification of a mean-field game in systemic risk [16, 28].

2 Preliminaries

In the sequel, we will introduce several concepts and notions, which will be needed throughout this article. In addition, we will give a brief introduction to the so-called Lions derivative (abbreviated by L-derivative), which allows us to define a derivative with respect to measures of the space \({\mathcal {P}}_2({\mathbb {R}})\) (see below for a precise definition). Also, we recall the transformation used to cope with drifts having discontinuities in a given finite number of points and first developed in [39]. We give a summary of important properties of this mapping. Note that generic constants used in this article are denoted by \(C>0\). They are independent of the number of particles and number of time-steps, and might change their values from line to line.

2.1 Notions and notation

We start by introducing some notions and fixing the notation.

  • Throughout this article, \((\varOmega ,{\mathcal {F}},({\mathcal {F}}_t)_{t \in [0,T]},{\mathbb {P}})\) will denote a filtered probability space, where \(\mathbb {F}=({\mathcal {F}}_t)_{t \in [0,T]}\) is the natural filtration of W augmented with an independent \(\sigma \)-algebra \({\mathcal {F}}_0\) and \((\varOmega ,{\mathcal {F}},{\mathbb {P}})\) is assumed to be atomless.

  • \(({\mathbb {R}}^d,\left\langle \cdot ,\cdot \right\rangle , |\cdot |)\) represents the d-dimensional (\(d \ge 1\)) Euclidean space. As a matrix-norm, we will use \(\Vert A \Vert := \sup _{v \in {\mathbb {R}}^{d}, |v|=1}|Av|\), for any \(A \in {\mathbb {R}}^{d \times d}\).

  • We use \({\mathcal {P}}({\mathbb {R}})\) to denote the family of all probability measures on \(({\mathbb {R}},{\mathcal {B}}({\mathbb {R}}))\), where \({\mathcal {B}}({\mathbb {R}})\) denotes the Borel \(\sigma \)-field over \({\mathbb {R}}\) and define the subset of probability measures with finite second moment by

    $$\begin{aligned} {\mathcal {P}}_2({\mathbb {R}}):= \Big \{ \mu \in {\mathcal {P}}({\mathbb {R}}) : \ \int _{{\mathbb {R}}} |x|^2 \mu (\mathrm {d} x)<\infty \Big \}. \end{aligned}$$
  • We recall the definition of the standard Wasserstein distance with quadratic cost: For any \(\mu , \nu \in {\mathcal {P}}_2({\mathbb {R}})\), we define

    $$\begin{aligned} {\mathcal {W}}_2(\mu , \nu ) := \left( \inf _{\pi \in \varPi (\mu ,\nu )} \int _{{\mathbb {R}} \times {\mathbb {R}}} |x-y |^2 \pi (\mathrm {d}x,\mathrm {d}y) \right) ^{1/2}, \end{aligned}$$

    where \(\varPi (\mu ,\nu )\) denotes the set of all couplings between \(\mu \) and \(\nu \).

  • For a given \(p \ge 2\), \(L_p^{0}({\mathbb {R}})\) refers to the space of real-valued, \({\mathcal {F}}_0\)-measurable random variables X satisfying \({\mathbb {E}}[|X|^p] < \infty \) and for a terminal time \(T>0\), \({\mathcal {S}}^p([0,T])\) refers to the space of real-valued continuous, \({\mathbb {F}}\)-adapted processes, defined on the interval [0, T], with finite p-th moments, i.e., processes \((X_t)_{t \in [0,T]}\) satisfying \({\mathbb {E}} \left[ \sup _{t \in [0,T]}|X_t|^p \right] < \infty \).
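In one dimension, the Wasserstein distance \({\mathcal {W}}_2\) between two empirical measures with the same number of equally weighted atoms admits a closed form: the optimal coupling is monotone and pairs order statistics, which reduces the infimum over couplings to a sort. A small sketch of this standard fact (not specific to this paper):

```python
import numpy as np

def wasserstein2_empirical(x, y):
    """W2 between two empirical measures, each with N equal atoms.

    In one dimension the optimal coupling pairs order statistics,
    so the infimum over couplings reduces to sorting both samples.
    """
    xs, ys = np.sort(x), np.sort(y)
    return np.sqrt(np.mean((xs - ys) ** 2))

# shifting a sample by a constant moves it in W2 by exactly that constant
x = np.array([0.0, 1.0, 2.0])
assert np.isclose(wasserstein2_empirical(x, x + 3.0), 3.0)
```

This quantile representation is what makes the particle approximations of the measure component cheap to evaluate in the one-dimensional setting considered here.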

We briefly introduce the L-derivative of a functional \(f: {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\), as it will appear in the proofs presented in the main section. For further information on this concept, we refer to [12] or [10]. Here, we follow the exposition of [14]. We will associate to the function f a lifted function \({\tilde{f}}\), defined by \({\tilde{f}}(X)=f({\mathcal {L}}_X)\), where \({\mathcal {L}}_X\) is the law of X, for \(X \in L_2(\varOmega , {\mathcal {F}},{\mathbb {P}};{\mathbb {R}})\).

This will allow us to introduce L-differentiability as Fréchet differentiability on the lifted space. In particular, a function f on \({\mathcal {P}}_2({\mathbb {R}})\) is said to be L-differentiable at \(\mu _0 \in {\mathcal {P}}_2({\mathbb {R}})\) if there exists a random variable \(X_0 \in L_2(\varOmega , {\mathcal {F}},{\mathbb {P}};{\mathbb {R}})\) with law \(\mu _0\), such that the lifted function \({\tilde{f}}\) is Fréchet differentiable at \(X_0\).

Now, the Riesz representation theorem implies that there is a (\({\mathbb {P}}\)-a.s.) unique \(\varPhi \in L_2(\varOmega , {\mathcal {F}},{\mathbb {P}};{\mathbb {R}})\) with

$$\begin{aligned} {\tilde{f}}(X) = {\tilde{f}}(X_0) + \langle \varPhi , X-X_0 \rangle _{L_2} + o(\Vert X-X_0\Vert _{L_2}), \text { as } \Vert X-X_0\Vert _{L_2} \rightarrow 0, \end{aligned}$$

with the standard inner product and norm on \(L_2(\varOmega , {\mathcal {F}},{\mathbb {P}};{\mathbb {R}})\). If f is L-differentiable for all \(\mu _0 \in {\mathcal {P}}_2({\mathbb {R}})\), then we say that f is L-differentiable.

It is known (see, e.g., [14, Proposition 5.25]) that there exists a Borel measurable function \(\chi : {\mathbb {R}} \rightarrow {\mathbb {R}}\), such that \(\varPhi = \chi (X_0)\) almost surely, and hence

$$\begin{aligned} f({\mathcal {L}}_X) = f({\mathcal {L}}_{X_0}) + {\mathbb {E}}\left\langle \chi (X_0), X -X_0 \right\rangle +o(\Vert X-X_0\Vert _{L_2}). \end{aligned}$$

Note that \(\chi \) only depends on the law of \(X_0\), but not on \(X_0\) itself. We define \(\partial _{\mu }f({\mathcal {L}}_{X_0})(y):=\chi (y)\), \(y \in {\mathbb {R}}\), as the L-derivative of f at \(\mu _0\). If, in addition, for a fixed \(y \in {\mathbb {R}}\), there is a version of the mapping \({\mathcal {P}}_2({\mathbb {R}}) \ni \mu \mapsto \partial _{\mu }f(\mu )(y)\) which is continuously L-differentiable, then the L-derivative of \(\partial _{\mu }f(\cdot )(y): {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\), is defined as

$$\begin{aligned} \partial ^2_{\mu } f(\mu )(y,y'):= \partial _{\mu }(\partial _{\mu }f)(\cdot )(y)(\mu ,y'), \end{aligned}$$

for \((\mu ,y,y') \in {\mathcal {P}}_2({\mathbb {R}}) \times {\mathbb {R}} \times {\mathbb {R}}\).

We require a definition describing regularity properties of a function \(f: {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\) in terms of the measure derivative (see [14, 18]).

Definition 2.1

Let \(f: {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\) be a given functional.

  • We say that f is an element of the class \({\mathcal {C}}^{(1,1)}_{b}\) if f is continuously L-differentiable, if for any \(\mu \) there is a continuous version of the mapping \({\mathbb {R}} \ni y \mapsto \partial _{\mu } f(\mu )(y)\), and if the derivatives

    $$\begin{aligned} \partial _{\mu } f(\mu )(y), \quad \partial _{y} \lbrace \partial _{\mu } f(\mu )(\cdot ) \rbrace (y), \end{aligned}$$

    exist, are bounded and jointly continuous in the variables \((\mu ,y)\) such that \(y \in {\mathrm{Supp}}(\mu )\).

  • We say that f is an element of the class \({\mathcal {C}}^{(2,1)}_{b}\), if it is an element of \({\mathcal {C}}^{(1,1)}_{b}\) and in addition the second order Lions derivative \(\partial ^2_{\mu } f(\mu )(y,y')\) exists, is bounded and is again jointly continuous in the corresponding variables. Also, the joint continuity of all derivatives is here required globally, i.e., for all \((\mu ,y,y')\).

We give the following additional remark, which links the L-derivative of a function of empirical measures to the standard partial derivatives of its empirical projection. For a functional \(f: {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\), we associate with it the finite-dimensional projection \(f^N: {\mathbb {R}}^N \rightarrow {\mathbb {R}}\) defined as

$$\begin{aligned} f^{N}({\varvec{x}}^{N}):=f\left( \frac{1}{N} \sum _{j=1}^N \delta _{x_j} \right) , \end{aligned}$$

for \({\varvec{x}}^{N}:= (x_1, \ldots , x_N)\). If \(f \in {\mathcal {C}}^{(2,1)}_{b}\), then \(f^{N}\) is twice differentiable (in a classical sense) and

$$\begin{aligned}&\partial _{x_i} f^{N}({\varvec{x}}^{N}) = \frac{1}{N} \partial _{\mu }f\left( \frac{1}{N} \sum _{j=1}^N \delta _{x_j} \right) (x_i), \\&\partial _{x_i} \partial _{x_k} f^{N}({\varvec{x}}^{N}) = \frac{1}{N} \partial _y \partial _{\mu }f\left( \frac{1}{N} \sum _{j=1}^N \delta _{x_j} \right) (x_i) \delta _{i,k} + \frac{1}{N^2} \partial ^2_{\mu }f\left( \frac{1}{N} \sum _{j=1}^N \delta _{x_j} \right) (x_i,x_k), \end{aligned}$$

where \(\delta _{i,k}\) is the Kronecker delta, see, e.g., [14, Proposition 5.35].
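The first of the two identities above can be verified numerically for a simple functional. For \(f(\mu ) = \int x^2 \, \mu (\mathrm {d}x)\) one has \(\partial _{\mu }f(\mu )(y) = 2y\), so the partial derivative of the empirical projection should equal \(2x_i/N\). A minimal check, assuming this choice of f (an example not taken from the text):

```python
import numpy as np

def f_N(x):
    """Empirical projection of f(mu) = int x^2 mu(dx): f_N(x) = mean(x_j^2)."""
    return np.mean(x ** 2)

# For this f, the L-derivative is d_mu f(mu)(y) = 2y, so the identity
# d_{x_i} f_N(x) = (1/N) * 2 x_i can be checked by central differences.
x = np.array([0.5, -1.0, 2.0])
N, i, eps = len(x), 1, 1e-6
e = np.zeros(N); e[i] = eps
fd = (f_N(x + e) - f_N(x - e)) / (2 * eps)        # finite-difference derivative
assert np.isclose(fd, (1 / N) * 2 * x[i], atol=1e-6)
```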

2.2 Properties of the transformation G

In [39], the authors consider one-dimensional SDEs of the form

$$\begin{aligned} \mathrm {d}X_t=b(X_t) \, \mathrm {d}t+\sigma (X_t) \, \mathrm {d}W_t, \quad X_0 = x \in {\mathbb {R}}, \end{aligned}$$

with a piecewise Lipschitz continuous drift coefficient b that is discontinuous in \(K \in {\mathbb {N}}\) points \(\eta _1, \ldots , \eta _K\), and a Lipschitz diffusion coefficient \(\sigma \) that does not vanish in any \(\eta _k\). A mapping \(G:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is defined to transform the SDE into one for \(Z=G(X)\) with globally Lipschitz continuous coefficients. For simplicity, we restrict the discussion to \(K=1\) with \(\eta _1=0\). We define the mapping G by

$$\begin{aligned} G(x) := x + \alpha x |x| \phi \left( \frac{x}{c} \right) , \end{aligned}$$

where

$$\begin{aligned} \phi (x):= {\left\{ \begin{array}{ll} (1-x^2)^3, &{} |x|\le 1, \\ 0, &{} |x|>1, \end{array}\right. } \quad \alpha := \frac{b(0^{-})- b(0^{+})}{2\sigma ^2(0)}, \end{aligned}$$

and c is a constant satisfying \(0< c < 1/|\alpha |\). The choice of \(\alpha \) yields a Lipschitz continuous drift coefficient for the SDE of \(Z=G(X)\), in particular, it removes the discontinuity in 0 from the drift. The restriction on c guarantees that G possesses a global inverse.

It is known from [40] that G satisfies the following properties:

  • G is \({\mathcal {C}}^{1}({\mathbb {R}},{\mathbb {R}})\) with \(0< \inf _{x \in {\mathbb {R}}} G'(x) \le \sup _{x \in {\mathbb {R}}} G'(x) < \infty \). Therefore, G is Lipschitz continuous and has an inverse \(G^{-1}: {\mathbb {R}}\rightarrow {\mathbb {R}}\) that is Lipschitz continuous as well.

  • The derivative \(G'\) is Lipschitz continuous (i.e., also absolutely continuous). In addition, \(G'\) has a bounded Lebesgue density \(G'':{\mathbb {R}}\rightarrow {\mathbb {R}}\), which is Lipschitz continuous on each of the subintervals \((-\infty ,0)\) and \((0,\infty )\). Also, Itô’s formula can still be applied to G and \(G^{-1}\).
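The definition of G and the role of the constraint \(0< c < 1/|\alpha |\) can be made concrete in a few lines of code. The sketch below uses the illustrative values \(b(0^{-})=1\), \(b(0^{+})=-1\), \(\sigma (0)=1\) (so \(\alpha =1\)), which are assumptions for the example only, and checks numerically that \(G'\) stays positive, the property that yields the global inverse:

```python
import numpy as np

def phi(u):
    # bump function: (1 - u^2)^3 on [-1, 1], zero outside
    return np.where(np.abs(u) <= 1, (1 - u**2) ** 3, 0.0)

def G(x, alpha, c):
    """The transform G(x) = x + alpha * x * |x| * phi(x / c)."""
    return x + alpha * x * np.abs(x) * phi(x / c)

# Illustrative choice (not from the paper): b(0-) = 1, b(0+) = -1,
# sigma(0) = 1, hence alpha = (1 - (-1)) / 2 = 1.
alpha = 1.0
c = 0.5 * (1 / abs(alpha))          # any 0 < c < 1/|alpha| works

# Monotonicity check: G' > 0 on a fine grid (central differences),
# which is what guarantees that G has a global inverse.
xs = np.linspace(-2, 2, 4001)
eps = 1e-6
Gp = (G(xs + eps, alpha, c) - G(xs - eps, alpha, c)) / (2 * eps)
assert Gp.min() > 0.0
```

Outside the interval \([-c,c]\) the bump \(\phi (x/c)\) vanishes, so G coincides with the identity there; the modification is purely local around the discontinuity.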

3 Existence and uniqueness results

The following subsections are devoted to proving well-posedness results for certain classes of one-dimensional McKean–Vlasov SDEs with a drift having a discontinuity in zero. In a first step, we study a simple class where the resulting transformation will not depend on the measure. Here, the transformation techniques developed in [39] will allow us to prove existence and uniqueness of a strong solution. The second class of McKean–Vlasov SDEs investigated below has the intrinsic difficulty that the required transformation will depend on the measure (i.e., will be time-dependent). Hence, a fixed-point iteration in the measure component will be required and we need to use techniques from variational calculus on the measure space \({\mathcal {P}}_2({\mathbb {R}})\), in particular an Itô formula for functionals acting on this space.

For each of these classes of McKean–Vlasov SDEs, we will additionally study the well-posedness of their associated interacting particle system. Although they can be considered as N-dimensional classical SDEs, with N denoting the number of particles, the resulting set of discontinuities of the N-dimensional drift cannot be handled by the main results of [40].

Future work is needed to extend the methods developed in this article to a multi-dimensional framework. In particular, it seems that the decomposable case can be generalised when discontinuities of the form discussed in [40] are considered.

3.1 McKean–Vlasov SDEs and interacting particle systems with decomposable drift

For a given terminal time \(T >0\) and given \(p \ge 2\), we consider a one-dimensional McKean–Vlasov SDE of the form

$$\begin{aligned} \mathrm {d} X_t = b(X_t, {\mathcal {L}}_{X_t}) \, \mathrm {d}t + \sigma (X_t) \, \mathrm {d}W_t, \quad \ X_0= \xi \in L_p^{0}({\mathbb {R}}), \end{aligned}$$
(3.1)

where \(b:{\mathbb {R}}\times {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\) and \(\sigma : {\mathbb {R}}\rightarrow {\mathbb {R}}\) are measurable functions.

In the following, we state the model assumptions which will specify the set-up for this subsection:

H. 1

  1. (1)

    We have \(\sigma (0) \ne 0\) and there exists a constant \(L >0\) such that

    $$\begin{aligned} | \sigma (x) - \sigma (x') | \le L |x-x'| \quad \forall x, x' \in {\mathbb {R}}. \end{aligned}$$
  2. (2)

    The drift is decomposable in the following sense:

    $$\begin{aligned} b(x, \mu ) = b_1(x) + b_2(x,\mu ) \quad \forall x \in {\mathbb {R}}, \ \forall \mu \in {\mathcal {P}}_2({\mathbb {R}}), \end{aligned}$$

    where \(b_1: {\mathbb {R}} \rightarrow {\mathbb {R}}\) is Lipschitz continuous on the subintervals \((-\infty ,0)\) and \((0,\infty )\) and there exists a constant \(L_1>0\) such that

    $$\begin{aligned} | b_2(x,\mu ) - b_2(x',\nu ) | \le L_1 \left( |x-x'| + {\mathcal {W}}_2(\mu ,\nu ) \right) \quad \forall x, x' \in {\mathbb {R}}, \ \forall \mu , \nu \in {\mathcal {P}}_2({\mathbb {R}}). \end{aligned}$$
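As a concrete instance of Assumption (H.1(2)), one may take an interaction drift of convolution type, \(b_2(x,\mu ) = \int \beta (x-y)\, \mu (\mathrm {d}y)\), with a Lipschitz kernel \(\beta \); the choice \(\beta = \tanh \) below is a hypothetical example. Since \(\tanh \) is 1-Lipschitz, the required estimate holds with \(L_1 = 1\), which the sketch spot-checks on empirical measures:

```python
import numpy as np

rng = np.random.default_rng(1)

def b2(x, sample):
    """Interaction drift b2(x, mu) = int tanh(x - y) mu(dy), mu empirical."""
    return np.mean(np.tanh(x - sample))

# tanh is 1-Lipschitz, so |b2(x,mu) - b2(x',nu)| <= |x - x'| + W2(mu, nu);
# spot-check on random inputs, computing W2 of equal-size empirical
# measures via the monotone (sorted) coupling.
for _ in range(100):
    x, xp = rng.normal(size=2)
    mu, nu = rng.normal(size=50), rng.normal(size=50)
    w2 = np.sqrt(np.mean((np.sort(mu) - np.sort(nu)) ** 2))
    assert abs(b2(x, mu) - b2(xp, nu)) <= abs(x - xp) + w2 + 1e-12
```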

We now state the main results of this section:

Proposition 3.1

Let Assumption (H.1) be satisfied, let \(\xi \in L_p^{0}({\mathbb {R}})\) for a given \(p \ge 2\) and assume \(c < 1/|\alpha |\). Then, the McKean–Vlasov SDE defined in (3.1) has a unique strong solution in \({\mathcal {S}}^{p}([0,T])\).

Proof

Transforming the McKean–Vlasov SDE (3.1), employing the transformation \(G: {\mathbb {R}} \rightarrow {\mathbb {R}}\) defined in Sect. 2.2, with

$$\begin{aligned} \alpha =\frac{b(0^{-}, \mu )- b(0^{+},\mu )}{2\sigma ^2(0)} = \frac{b_1(0^{-}) - b_1(0^{+})}{2 \sigma ^2(0)}, \end{aligned}$$

in order to eliminate the discontinuity in zero, yields a McKean–Vlasov SDE with globally Lipschitz continuous coefficients. This can be shown in a similar manner to [39, Theorem 2.5]. Moreover, G has a global inverse due to the choice \(c < 1/|\alpha |\) (see [40, Lemma 2.2]), and Itô’s formula can be applied to \(G^{-1}\), which allows us to deduce the claim. \(\square \)

The interacting particles \((X^{i,N}_t)_{t \in [0,T]}\), \(i \in \lbrace 1, \ldots , N \rbrace \), associated with (3.1) satisfy

$$\begin{aligned} \mathrm {d}X_t^{i,N} = b_1(X_t^{i,N}) \, \mathrm {d}t + b_2(X_t^{i,N}, \mu _t^{{\varvec{X}}^{N}}) \, \mathrm {d}t + \sigma (X_t^{i,N}) \, \mathrm {d}W_t^{i}, \end{aligned}$$
(3.2)

where \((\xi ^{i},W^{i})\), for \(i \in \lbrace 1, \ldots , N \rbrace \), are independent copies of \((\xi ,W)\).

Proposition 3.2

Let Assumption (H.1) be satisfied, let \(\xi \in L_p^{0}({\mathbb {R}})\) for a given \(p \ge 2\) and assume \(c < 1/|\alpha |\). Then, the interacting particle system defined in (3.2) has a unique strong solution in \({\mathcal {S}}^{p}([0,T])\).

Proof

In contrast to the set-up in [40], the set of discontinuities, denoted by \(\varTheta \), is not a differentiable manifold, but has the form

$$\begin{aligned} \varTheta =\{(x_1,\dots ,x_N)^{\top } \in {\mathbb {R}}^N:\exists j\in \{1,\dots ,N\}:x_j=0\}. \end{aligned}$$

However, we may define \({\varvec{G}}_N: {\mathbb {R}}^{N} \rightarrow {\mathbb {R}}^{N}\) by

$$\begin{aligned} {\varvec{G}}_N({\varvec{x}}^N) := (G(x_1), \ldots , G(x_N))^{\top }, \end{aligned}$$

where G is as in Proposition 3.1 and \({\varvec{x}}^N=(x_1, \ldots , x_N)^{\top }\), which allows us to transform the particle system \(({\varvec{X}}_t^{N})_{t \in [0,T]} = (X^{1,N}_t, \ldots , X^{N,N}_t)^{\top }_{t \in [0,T]}\) into a new particle system with globally Lipschitz continuous coefficients. Now, \({\varvec{G}}_N\) has a global inverse, as the mapping G has a global inverse, due to the choice of c (see Sect. 2.2). Therefore, applying Itô’s formula to the inverse allows us to deduce the claim. \(\square \)

3.2 McKean–Vlasov SDE with non-decomposable drift

Here, we consider again a one-dimensional McKean–Vlasov SDE of the form (3.1),

$$\begin{aligned} \mathrm {d} X_t = b(X_t, {\mathcal {L}}_{X_t}) \, \mathrm {d}t + \sigma (X_t) \, \mathrm {d}W_t, \quad \ X_0= \xi \in L_p^{0}({\mathbb {R}}). \end{aligned}$$
(3.3)

However, in contrast to the above setting, we will not assume that b can be decomposed into two parts as in Assumption (H.1(2)) from the previous section, and therefore the transformation will also depend on the measure. To be precise, for any \((x,\mu ) \in {\mathbb {R}}\times {\mathcal {P}}_2({\mathbb {R}})\), we define

$$\begin{aligned} G(x,\mu ) := x + \alpha (\mu ) x |x| \phi \left( \frac{x}{c} \right) , \end{aligned}$$
(3.4)

where

$$\begin{aligned}&\phi (x):= {\left\{ \begin{array}{ll} (1-x^2)^3, &{} |x|\le 1, \\ 0, &{} |x|>1, \end{array}\right. } \qquad \alpha (\mu ) := \frac{b(0^{-},\mu )- b(0^{+},\mu )}{2\sigma ^2(0)}, \end{aligned}$$
(3.5)

and \(c>0\) is a constant small enough to guarantee the invertibility of G. When we speak of an ‘inverse’ of \(G(x,\mu )\), we mean ‘inverse with respect to x’, i.e., the inverse is a function \(G^{-1}:{\mathbb {R}}\times {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\) which satisfies \(G^{-1} \big (G(x,\mu ), \mu \big )=x\) for all \((x,\mu ) \in {\mathbb {R}}\times {\mathcal {P}}_2({\mathbb {R}})\) and \(G\big (G^{-1}(z,\mu ), \mu \big )=z\) for all \((z,\mu )\in {\mathbb {R}}\times {\mathcal {P}}_2({\mathbb {R}})\). For a given flow of measures \((\mu _t)_{t \in [0,T]} \in {\mathcal {C}}([0,T],{\mathcal {P}}_2({\mathbb {R}}))\), G may also be viewed as a mapping \(G :{\mathbb {R}}\times [0,T] \rightarrow {\mathbb {R}}\).
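Since \(\alpha \) now depends on \(\mu \), the transformation changes along the measure flow, and its inverse with respect to x has no closed form; numerically it can be obtained by bisection, exploiting the strict monotonicity of \(G(\cdot ,\mu )\) for \(0< c < 1/|\alpha (\mu )|\). A minimal sketch with an illustrative scalar value of \(\alpha (\mu )\) (an assumption, not derived from a specific drift):

```python
import numpy as np

def phi(u):
    return np.where(np.abs(u) <= 1, (1 - u**2) ** 3, 0.0)

def G(x, alpha_mu, c):
    """Measure-dependent transform; alpha_mu = alpha(mu) is a scalar."""
    return x + alpha_mu * x * np.abs(x) * phi(x / c)

def G_inverse(z, alpha_mu, c, lo=-1e6, hi=1e6, tol=1e-12):
    """Invert G(., mu) in x by bisection; valid because G(., mu) is
    strictly increasing for 0 < c < 1/|alpha(mu)|."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if G(mid, alpha_mu, c) < z:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

alpha_mu, c = -0.8, 1.0       # illustrative values with c < 1/|alpha(mu)| = 1.25
z = G(0.3, alpha_mu, c)
assert abs(G_inverse(z, alpha_mu, c) - 0.3) < 1e-8
```

In an actual scheme, \(\alpha (\mu )\) would be re-evaluated from the (empirical) measure at each time step, so the bisection is performed against the current transform.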

In the following, we state the model assumptions which will specify the set-up for this subsection:

H.2

Assumption (H.1(1)) is satisfied and we require:

  (1)

    There exist constants \(L, L_1>0\) such that

    $$\begin{aligned}&\sup _{x \ne 0} \frac{|b(x,\mu )|}{1+|x|} \le L, \quad | b(x,\mu ) - b(x,\nu ) | \le L_1 {\mathcal {W}}_2(\mu ,\nu ) \quad \forall x \in {\mathbb {R}}\setminus \lbrace 0 \rbrace ,\\ {}&\forall \mu , \nu \in {\mathcal {P}}_2({\mathbb {R}}). \end{aligned}$$

    Additionally, for all \(\mu \in {\mathcal {P}}_2({\mathbb {R}})\), \({\mathbb {R}}\ni x \mapsto b(x, \mu )\) is Lipschitz continuous on the subintervals \((-\infty ,0)\) and \((0,\infty )\), uniformly with respect to \(\mu \).

  (2)

    \(\alpha \in {\mathcal {C}}^{(1,1)}_{b}\), and the mapping \({\mathcal {P}}_2({\mathbb {R}}) \times {\mathbb {R}}\ni (\mu , y) \mapsto \partial _y \partial _{\mu } \alpha (\mu )(y)\) is Lipschitz continuous, that is, there exists a constant \(L_2>0\) such that

    $$\begin{aligned}&| \partial _y \partial _{\mu } \alpha (\mu )(y) - \partial _y \partial _{\mu } \alpha (\nu )(y')| \le L_2 \left( |y-y'| + {\mathcal {W}}_2(\mu ,\nu ) \right) \quad \forall y,y' \in {\mathbb {R}}, \\ {}&\forall \mu , \nu \in {\mathcal {P}}_2({\mathbb {R}}). \end{aligned}$$
  (3)

    For any \(\mu \in {\mathcal {P}}_2({\mathbb {R}})\), the mapping \({\mathbb {R}}\ni y \mapsto \partial _{\mu } \alpha (\mu )(y)\) vanishes at zero, i.e.,

    $$\begin{aligned}&\partial _{\mu } \alpha (\mu )(0) = \partial _{\mu } b(0^{-}, \mu )(0)- \partial _{\mu }b(0^{+},\mu )(0) = 0. \end{aligned}$$
    (3.6)
  (4)

    The mapping \({\mathcal {P}}_2({\mathbb {R}}) \times {\mathbb {R}}\ni (\mu ,y) \mapsto \partial _{\mu }\alpha (\mu )(y) b(y,\mu )\) is Lipschitz continuous.

Remark 3.1

The requirement in (H.2(2)) that \(\alpha \in {\mathcal {C}}^{(1,1)}_{b}\) is needed to apply an Itô formula for \(\alpha \) (see [14, Proposition 5.102]).

Remark 3.2

Note that (H.2(4)) could also be replaced by the following alternative set of assumptions: On each of the two subintervals \((-\infty ,0)\) and \((0,\infty )\), \(b(\cdot ,\mu )\) is a \({\mathcal {C}}^{1}\) function with bounded derivative and, additionally, for any \(\mu \in {\mathcal {P}}_2({\mathbb {R}})\), the mapping \({\mathbb {R}}\ni y \mapsto \partial _y\partial _{\mu } \alpha (\mu )(y)\) vanishes at zero, i.e.,

$$\begin{aligned} \partial _y \partial _{\mu } \alpha (\mu )(0) = \partial _y\partial _{\mu } b(0^{-}, \mu )(0)- \partial _y\partial _{\mu }b(0^{+},\mu )(0) = 0. \end{aligned}$$

The following proposition shows the Lipschitz continuity of the mapping \({\mathbb {R}}\times {\mathcal {P}}_2({\mathbb {R}}) \ni (x,\mu ) \mapsto G^{-1}(x,\mu )\).

Proposition 3.3

Let the function G be defined as in (3.4) and (3.5) with \(c < 1/ \sup _{\mu \in {\mathcal {P}}_2({\mathbb {R}})}|\alpha (\mu )|\), and let Assumption (H.2(1)) be satisfied. Then, there exists a constant L(c), depending on c and the model parameters in Assumption (H.2(1)), such that \(L(c) \rightarrow 0\) as \(c \rightarrow 0\) and, for any \(x,y \in {\mathbb {R}}\) and \(\mu , \nu \in {\mathcal {P}}_2({\mathbb {R}})\),

$$\begin{aligned} |G^{-1}(x,\mu ) - G^{-1}(y,\nu )| \le 2|x-y| + L(c){\mathcal {W}}_2(\mu ,\nu ). \end{aligned}$$

Proof

First, we note that by differentiating (3.4), we get for \(x \in [-c,c]\)

$$\begin{aligned} \partial _x G(x,\mu )&= 1 - \frac{6\alpha (\mu )}{c^2} |x| x^2 \left( 1-(x/c)^2 \right) ^2 + 2\alpha (\mu )|x|(1-(x/c)^2)^3\\&= 1 - 2c \alpha (\mu ) \frac{|x|}{c}\big (4x^2/c^2 - 1\big )\left( 1-(x/c)^2 \right) ^2\,. \end{aligned}$$

It is easy to verify that

$$\begin{aligned} \sup _{x\in [0,c]}\Big |\frac{|x|}{c}\big (4x^2/c^2 - 1\big )\left( 1-(x/c)^2 \right) ^2\Big | =\sup _{z\in [0,1]}\Big ||z|\big (4z^2 - 1\big )\left( 1-z^2 \right) ^2\Big |<\frac{1}{4}\,, \end{aligned}$$

which implies that for all \(x\in {\mathbb {R}}\) and \(\mu \in {\mathcal {P}}_2({\mathbb {R}})\)

$$\begin{aligned} \big |\partial _x G(x,\mu )-1\big | <\frac{c}{2} \sup _{\mu \in {\mathcal {P}}_2({\mathbb {R}})}|\alpha (\mu )|\,. \end{aligned}$$

In particular, if \(c < 1/\sup _{\mu \in {\mathcal {P}}_2({\mathbb {R}})} |\alpha (\mu )|\), then for all \(x,y\in {\mathbb {R}}\) and \(\mu \in {\mathcal {P}}_2({\mathbb {R}})\), we have

$$\begin{aligned} \partial _x G(x,\mu )>\frac{1}{2}, \quad \big |G^{-1}(x,\mu )-G^{-1}(y,\mu )\big |<2|x-y|\,. \end{aligned}$$
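The two elementary bounds above, the supremum being below 1/4 and the resulting lower bound \(\partial _x G > 1/2\), can be sanity-checked on a grid; the values \(\alpha \equiv 0.9\) and \(c=1\) below are assumed toy choices with \(c\,|\alpha | < 1\).

```python
# Grid-based sanity check (not a proof) of two bounds from the proof of
# Proposition 3.3, for the assumed toy choice alpha = 0.9, c = 1.
ALPHA, C = 0.9, 1.0

def h(z):
    # integrand of the supremum in the proof: |z (4z^2 - 1) (1 - z^2)^2|
    return abs(z * (4.0 * z * z - 1.0) * (1.0 - z * z) ** 2)

def dG_dx(x):
    # closed form of dG/dx derived in the proof; equals 1 outside [-c, c]
    u = x / C
    if abs(u) > 1.0:
        return 1.0
    return 1.0 - 2.0 * C * ALPHA * abs(u) * (4.0 * u * u - 1.0) * (1.0 - u * u) ** 2

grid_max = max(h(k / 100000.0) for k in range(100001))
grid_min_deriv = min(dG_dx(-C + 2.0 * C * k / 100000.0) for k in range(100001))

assert grid_max < 0.25       # sup_{z in [0,1]} |z(4z^2-1)(1-z^2)^2| < 1/4
assert grid_min_deriv > 0.5  # hence dG/dx > 1/2 when c < 1/|alpha|
```

The grid maximum is roughly 0.18, comfortably below the stated bound of 1/4.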

That \( \mu \mapsto \partial _xG(x,\mu )\) is Lipschitz continuous is a consequence of (H.2(1)). It is easy to show that the mapping \( x \mapsto \partial _xG(x,\mu )\) is also Lipschitz continuous. Denote the Lipschitz constant of \(\partial _xG\) with respect to the first and second argument by \(L_x\) and \(L_{\mu }\), respectively. Writing

$$\begin{aligned} G^{-1}(x,\mu ) = \int _{0}^{x} \frac{1}{\partial _xG(G^{-1}(y,\mu ),\mu )} \, \mathrm {d}y, \end{aligned}$$

we obtain

$$\begin{aligned}&|G^{-1}(x,\mu ) - G^{-1}(x,\nu )| \\&\quad \le \int _{0}^{x} \left| \frac{1}{\partial _xG(G^{-1}(y,\mu ),\mu )} - \frac{1}{\partial _xG(G^{-1}(y,\nu ),\mu )} \right| \, \mathrm {d}y \\&\qquad + \int _{0}^{x} \left| \frac{1}{\partial _xG(G^{-1}(y,\nu ),\mu )} - \frac{1}{\partial _xG(G^{-1}(y,\nu ),\nu )} \right| \, \mathrm {d}y \\&\quad \le \int _{0}^{x} \left| \frac{\partial _xG(G^{-1}(y,\nu ),\mu )-\partial _xG(G^{-1}(y,\mu ),\mu )}{\partial _xG(G^{-1}(y,\mu ),\mu )\partial _xG(G^{-1}(y,\nu ),\mu )} \right| \, \mathrm {d}y\\&\qquad + \int _{0}^{x} \left| \frac{\partial _xG(G^{-1}(y,\nu ),\nu )-\partial _xG(G^{-1}(y,\nu ),\mu )}{\partial _xG(G^{-1}(y,\nu ),\mu )\partial _xG(G^{-1}(y,\nu ),\nu )} \right| \, \mathrm {d}y \\&\quad \le \int _{0}^{x} 4L_x\left| G^{-1}(y,\nu )-G^{-1}(y,\mu ) \right| \, \mathrm {d}y + \int _{0}^{x} 4L_{\mu } {\mathcal {W}}_2(\mu ,\nu ) \, \mathrm {d}y \\&\quad \le 4L_x\int _{0}^{x} \left| G^{-1}(y,\nu )-G^{-1}(y,\mu ) \right| \, \mathrm {d}y + 4L_{\mu }x {\mathcal {W}}_2(\mu ,\nu ) \,. \end{aligned}$$

For \(0 \le x< c\) (the case \(-c< x \le 0\) is analogous), we have

$$\begin{aligned} |G^{-1}(x,\mu ) - G^{-1}(x,\nu )|&\le 4L_x\int _{0}^{x} \left| G^{-1}(y,\nu )-G^{-1}(y,\mu ) \right| \, \mathrm {d}y + 4L_{\mu }x {\mathcal {W}}_2(\mu ,\nu ) \\&\le 4L_x\int _{0}^{x} \left| G^{-1}(y,\nu )-G^{-1}(y,\mu ) \right| \, \mathrm {d}y + 4L_{\mu }c {\mathcal {W}}_2(\mu ,\nu ) \,, \end{aligned}$$

and hence Gronwall’s inequality implies

$$\begin{aligned} |G^{-1}(x,\mu ) - G^{-1}(x,\nu )| \le 4L_{\mu }c {\mathcal {W}}_2(\mu ,\nu )e^{4L_xx} \le 4L_{\mu }ce^{4L_xc} {\mathcal {W}}_2(\mu ,\nu )\,. \end{aligned}$$

For \(|x| \ge c\), \(|G^{-1}(x,\mu ) - G^{-1}(x,\nu )|=0\le 4L_{\mu }ce^{4L_xc} {\mathcal {W}}_2(\mu ,\nu )\) by the definition of G. We finally obtain, for all \(x,y\in {\mathbb {R}}\) and all \(\mu ,\nu \in {\mathcal {P}}_2({\mathbb {R}})\), with \(L(c):=4L_{\mu }ce^{4L_xc}\),

$$\begin{aligned}&|G^{-1}(x,\mu ) - G^{-1}(y,\nu )| \le |G^{-1}(x,\mu ) - G^{-1}(x,\nu )|+|G^{-1}(x,\nu ) - G^{-1}(y,\nu )|\\&\quad \le L(c) {\mathcal {W}}_2(\mu ,\nu ) + 2|x-y| \le \max (L(c),2) ({\mathcal {W}}_2(\mu ,\nu ) + |x-y|)\,. \end{aligned}$$

\(\square \)

Remark 3.3

In what follows, we will assume that \(c < 1/\sup _{\mu \in {\mathcal {P}}_2({\mathbb {R}})} |\alpha (\mu )|\) and that c is small enough such that the Lipschitz constant of the mapping \({\mathcal {P}}_2({\mathbb {R}}) \ni \mu \mapsto G^{-1}(x,\mu )\) (i.e., the constant L(c) from Proposition 3.3) is less than one half. The reason for this requirement will become clear in the proof of Theorem 3.1.

Similarly to the previous section, we aim to recover a unique strong solution of (3.3) by setting \(X_t = G^{-1}(Z_t^{\mu },\mu _t)\), where \(\mu _t = {\mathcal {L}}_{X_t}\) for \(t \in [0,T]\), and \((Z_t^{\mu })_{t \in [0,T]}\) is the process obtained by applying the transformation G to X. Even though G is not twice continuously differentiable in the state variable, Itô’s formula is still applicable due to the special form of the discontinuity (see [24, Theorem 2.1] and the comments after the proof of this theorem). In the following, we write \({\bar{\phi }}(x) := x|x|\phi (x/c)\), so that \(G(x,\mu ) = x + \alpha (\mu ){\bar{\phi }}(x)\). Now, observe that

$$\begin{aligned} \mathrm {d}G(X_t,\mu _t)&= \left( \partial _t G(X_t,\mu _t) + b(X_t,\mu _t) + \alpha ({\mu _t}) {\bar{\phi }}'(X_t) b(X_t,\mu _t) \right. \\&\qquad \left. + \frac{1}{2} \alpha ({\mu _t}) {\bar{\phi }}''(X_t) \sigma ^2(X_t) \right) \, \mathrm {d}t \\&\qquad + (\sigma (X_t) + \alpha ({\mu _t}) {\bar{\phi }}'(X_t)\sigma (X_t)) \, \mathrm {d}W_t. \end{aligned}$$

Itô’s formula along a flow of measures \((\mu _t)_{t \in [0,T]} \in {\mathcal {C}}([0,T],{\mathcal {P}}_2({\mathbb {R}}))\) (see, e.g., [14, Proposition 5.102]) implies

$$\begin{aligned}&\partial _t G(x,\mu _t) \\&= \int _{{\mathbb {R}}} \left( b(y,\mu _t) \partial _{\mu }G(x,\mu _t)(y) + \frac{\sigma ^{2}(y)}{2} \partial _y \partial _{\mu }G(x,\mu _t)(y) \right) \mu _t(\mathrm {d}y) \\&=:{\mathcal {L}}_{\mu _t}(G(x,\cdot ))(\mu _t), \end{aligned}$$

where we recall that \(\partial _y \partial _{\mu }G(x,\mu _t)(y)\) denotes the derivative of the mapping \({\mathbb {R}}\ni y \mapsto \partial _{\mu }G(x,\mu _t)(y)\) and

$$\begin{aligned}&\partial _{\mu }G(x,\mu _t)(y) = \partial _{\mu }\alpha (\mu _t)(y)|x|x\phi (x/c),\\&\partial _y \partial _{\mu }G(x,\mu _t)(y) = \partial _y\partial _{\mu }\alpha (\mu _t)(y)|x|x\phi (x/c). \end{aligned}$$

Hence, we define

$$\begin{aligned}&\mathrm {d}Z_t^{\mu } :={\tilde{b}}(Z^{\mu }_t,\mu _t) \, \mathrm {d}t + {\tilde{\sigma }}(Z^{\mu }_t,\mu _t) \, \mathrm {d}W_t, \quad Z_0^{\mu } = G(\xi ,\delta _{\xi }), \end{aligned}$$
(3.7)

where

$$\begin{aligned} {\tilde{b}}(z,\mu )&:= {\mathcal {L}}_{\mu }\big (G(G^{-1}(z,\mu ),\cdot )\big )(\mu ) + b(G^{-1}(z,\mu ),\mu ) \nonumber \\&\qquad + \alpha ({\mu }) {\bar{\phi }}'(G^{-1}(z,\mu )) b(G^{-1}(z,\mu ),\mu ) \nonumber \\&\qquad + \frac{1}{2} \alpha (\mu ) {\bar{\phi }}''(G^{-1}(z,\mu )) \sigma ^2(G^{-1}(z,\mu )), \nonumber \\ {\tilde{\sigma }}(z,\mu )&:= \sigma (G^{-1}(z,\mu )) + \alpha ({\mu }) {\bar{\phi }}'(G^{-1}(z,\mu ))\sigma (G^{-1}(z,\mu )). \end{aligned}$$
(3.8)

In the following, we will show that the decoupled SDE (3.7), where the flow \((\mu _t)_{t \in [0,T]} \in {\mathcal {C}}([0,T],{\mathcal {P}}_2({\mathbb {R}}))\) is fixed, has Lipschitz continuous coefficients. Note that for such a flow of measures the process \(X^{\mu }\) in (3.3) (interpreted as a classical SDE) has bounded moments, uniformly in \((\mu _t)_{t \in [0,T]}\), as a consequence of (H.1(1)) and (H.2(1)). In particular, for \(\xi \in L_2^{0}({\mathbb {R}})\), we have the a-priori estimate \({\mathbb {E}}\left[ \sup _{0 \le t \le T} |X^{\mu }_t|^{2} \right] \le C(1+ {\mathbb {E}}[|\xi |^{2}])=:{\bar{C}}\), where \(C>0\) depends only on T and the constants appearing in the model assumptions. We introduce the following subspace of \({\mathcal {C}}([0,T],{\mathcal {P}}_2({\mathbb {R}}))\): we define \({\mathcal {P}}^{b} := \lbrace \mu \in {\mathcal {C}}([0,T],{\mathcal {P}}_2({\mathbb {R}})): \ \sup _{t \in [0,T]} \int _{{\mathbb {R}}} x^2 \, \mu _t(\mathrm {d}x) \le {\bar{C}} \rbrace \), where \({\bar{C}}\) is defined as above, and equip this space with the metric \((\mu ,\nu ) \mapsto \sup _{t \in [0,T]} {\mathcal {W}}_2(\mu _t,\nu _t)\), for \((\mu _t)_{t \in [0,T]}, (\nu _t)_{t \in [0,T]} \in {\mathcal {P}}^{b}\), under which it is a complete metric space.

Lemma 3.1

Let Assumption (H.2) be satisfied and assume \(c < 1/ \sup _{\mu \in {\mathcal {P}}_2({\mathbb {R}})}|\alpha (\mu )|\). Then, \({\tilde{b}}\) and \({\tilde{\sigma }}\) given in (3.8) are Lipschitz continuous on \({\mathbb {R}}\times {\mathcal {P}}^{b}\).

Proof

Using (H.1(1)), the Lipschitz continuity of \(z \mapsto G^{-1}(z,\nu _t)\) and the uniform boundedness of \(\alpha \), see (H.2(1)), in combination with [40, Lemma 2.5], gives the Lipschitz continuity of \( z \mapsto {\tilde{\sigma }}(z,\nu _t)\), with a Lipschitz constant independent of \(\nu _t\). Similarly, we can deduce that \(\nu _t \mapsto {\tilde{\sigma }}(z,\nu _t)\) is Lipschitz continuous, due to Proposition 3.3 and the Lipschitz continuity of \( \nu _t \mapsto \alpha (\nu _t)\).

The choice of \(\alpha \) along with (H.2(1)) guarantees that the mapping

$$\begin{aligned} z \mapsto b(G^{-1}(z,\nu _t),\nu _t) + \frac{1}{2} \alpha (\nu _t) {\bar{\phi }}''(G^{-1}(z,\nu _t)) \sigma ^2(G^{-1}(z,\nu _t)), \end{aligned}$$

is Lipschitz continuous. From [40, Lemma 2.4], we can deduce the Lipschitz continuity of

$$\begin{aligned} z \mapsto \alpha (\nu _t) {\bar{\phi }}'(G^{-1}(z,\nu _t)) b(G^{-1}(z,\nu _t),\nu _t). \end{aligned}$$

That

$$\begin{aligned}&\nu _t \mapsto b(G^{-1}(z,\nu _t),\nu _t) + \frac{1}{2} \alpha (\nu _t) {\bar{\phi }}''(G^{-1}(z,\nu _t)) \sigma ^2(G^{-1}(z,\nu _t)), \\&\nu _t \mapsto \alpha (\nu _t) {\bar{\phi }}'(G^{-1}(z,\nu _t)) b(G^{-1}(z,\nu _t),\nu _t), \end{aligned}$$

are Lipschitz continuous is a consequence of (H.1(1)) and (H.2(1)), Proposition 3.3 and the fact that \(G^{-1}(z,\mu ^{(1)})\) and \(G^{-1}(z,\mu ^{(2)})\), for \(\mu ^{(1)}, \mu ^{(2)} \in {\mathcal {P}}_2({\mathbb {R}})\), have the same sign. Also note that \({\bar{\phi }}'\) and \({\bar{\phi }}''\) are (piecewise) Lipschitz continuous and bounded. It remains to analyse the Lipschitz continuity of

$$\begin{aligned}&(z,\nu _t) \mapsto \int _{{\mathbb {R}}} \left( b(y,\nu _t) \partial _{\mu }G(G^{-1}(z,\nu _t),\nu _t)(y) + \frac{\sigma ^{2}(y)}{2} \partial _y \partial _{\mu }G(G^{-1}(z,\nu _t),\nu _t)(y) \right) \nu _t(\mathrm {d}y). \end{aligned}$$
(3.9)

Assumptions (H.2(2)) and (H.2(3)) guarantee that the above mapping is well defined and, further, that the mapping \(y \mapsto b(y,\nu _t) \partial _{\mu }G(G^{-1}(z,\nu _t),\nu _t)(y)\) is continuous at zero. We start by analysing the Lipschitz continuity of (3.9) with respect to the measure variable. Consider now an arbitrary coupling \(\varPi _t(\cdot ,\cdot )\) between \(\nu _t(\cdot )\) and \(\mu _t(\cdot )\), for \((\mu _t)_{t \in [0,T]}, (\nu _t)_{t \in [0,T]} \in {\mathcal {P}}^{b}\), and estimate

$$\begin{aligned}&\int _{{\mathbb {R}}^2} \left( b(y,\nu _t) \partial _{\mu }G(G^{-1}(z,\nu _t),\nu _t)(y) - b(x,\mu _t) \partial _{\mu }G(G^{-1}(z,\mu _t),\mu _t)(x) \right) \, \varPi _t(\mathrm {d}y,\mathrm {d}x) \\&= \int _{{\mathbb {R}}^2} \left( b(y,\nu _t) \partial _{\mu }G(G^{-1}(z,\nu _t),\nu _t)(y) - b(x,\mu _t) \partial _{\mu }G(G^{-1}(z,\nu _t),\mu _t)(x) \right) \, \varPi _t(\mathrm {d}y,\mathrm {d}x) \\&\quad + \int _{{\mathbb {R}}^2} \left( b(x,\mu _t) \partial _{\mu }G(G^{-1}(z,\nu _t),\mu _t)(x) - b(x,\mu _t)\partial _{\mu }G(G^{-1}(z,\mu _t),\mu _t)(x) \right) \, \varPi _t(\mathrm {d}y,\mathrm {d}x). \end{aligned}$$

Note that

$$\begin{aligned}&\int _{{\mathbb {R}}^2} \left| b(y,\nu _t) \partial _{\mu }G(G^{-1}(z,\nu _t),\nu _t)(y) - b(x,\mu _t) \partial _{\mu }G(G^{-1}(z,\nu _t),\mu _t)(x) \right| \, \varPi _t(\mathrm {d}y,\mathrm {d}x) \\&= \int _{{\mathbb {R}}^2} \Big | b(y,\nu _t) \partial _{\mu }\alpha (\nu _t)(y) {\bar{\phi }}(G^{-1}(z,\nu _t)) - b(x,\mu _t) \partial _{\mu }\alpha (\mu _t)(x){\bar{\phi }}(G^{-1}(z,\nu _t)) \Big | \, \varPi _t(\mathrm {d}y,\mathrm {d}x) \\&\le C {\mathcal {W}}_2(\mu _t,\nu _t), \end{aligned}$$

where we used (H.2(4)) and the fact that \({\bar{\phi }}\) is bounded. Furthermore, from the boundedness of \((x,\nu _t) \mapsto \partial _{\mu }\alpha (\nu _t)(x)\) and (H.2(1)), we derive

$$\begin{aligned}&\int _{{\mathbb {R}}^2} \left| b(x,\mu _t) \partial _{\mu }G(G^{-1}(z,\nu _t),\mu _t)(x) - b(x,\mu _t) \partial _{\mu }G(G^{-1}(z,\mu _t),\mu _t)(x) \right| \, \varPi _t(\mathrm {d}y,\mathrm {d}x) \nonumber \\&\quad \le \int _{{\mathbb {R}}^2} \left| b(x,\mu _t)\partial _{\mu }\alpha (\mu _t)(x) \right| \left| {\bar{\phi }}(G^{-1}(z,\nu _t)) - {\bar{\phi }}(G^{-1}(z,\mu _t)) \right| \, \varPi _t(\mathrm {d}y,\mathrm {d}x) \nonumber \\&\quad \le C \int _{{\mathbb {R}}^2} (1+|x|) \left| {\bar{\phi }}(G^{-1}(z,\nu _t)) - {\bar{\phi }}(G^{-1}(z,\mu _t)) \right| \, \varPi _t(\mathrm {d}y,\mathrm {d}x) \nonumber \\&\quad \le C {\mathcal {W}}_2(\mu _t,\nu _t). \end{aligned}$$
(3.10)

We remark that in the last inequality, we used the Lipschitz continuity of \( z \mapsto |z| z \phi (z/c)\), Proposition 3.3 and employed that \((\mu _t)_{t \in [0,T]}\) is an element of the space \({\mathcal {P}}^{b}\). In a similar manner, we can show the Lipschitz continuity of

$$\begin{aligned} z \mapsto \int _{{\mathbb {R}}} b(y,\nu _t) \partial _{\mu }G(G^{-1}(z,\nu _t),\nu _t)(y) \, \nu _t(\mathrm {d}y). \end{aligned}$$

Analogous statements can be derived for

$$\begin{aligned} (z,\nu _t) \mapsto \int _{{\mathbb {R}}} \frac{\sigma ^{2}(y)}{2} \partial _y \partial _{\mu }G(G^{-1}(z,\nu _t),\nu _t)(y) \, \nu _t(\mathrm {d}y), \end{aligned}$$

taking (H.1(1)) and (H.2(2)) into account. \(\square \)

We are now ready to present the main result of this section, together with its proof:

Theorem 3.1

Let Assumption (H.2) be satisfied, let \(\xi \in L_p^{0}({\mathbb {R}})\) for a given \(p \ge 2\) and assume that the constant c is sufficiently small (as in Remark 3.3). Then, the McKean–Vlasov SDE defined in (3.3) has a unique strong solution in \({\mathcal {S}}^{p}([0,T])\).

Proof

First, we remark that for any given flow of measures \((\mu _t)_{t \in [0,T]} \in {\mathcal {C}}([0,T],{\mathcal {P}}_2({\mathbb {R}}))\), the SDE defined in (3.7) has a unique strong solution by Lemma 3.1. Now, for \((\mu _t)_{t \in [0,T]}, (\nu _t)_{t \in [0,T]} \in {\mathcal {P}}^{b}\), we obtain, for any \(t \in [0,T]\), using Lemma 3.1, the Burkholder–Davis–Gundy and Hölder inequalities, along with Gronwall’s inequality,

$$\begin{aligned}&{\mathbb {E}}\left[ |Z^{\mu }_t-Z^{\nu }_t|^2 \right] \\&\le C \left( {\mathbb {E}} \left[ \int _{0}^{t} |{\tilde{b}}(Z^{\mu }_s,\mu _s) - {\tilde{b}}(Z^{\nu }_s,\nu _s)|^2 \, \mathrm {d}s \right] + {\mathbb {E}} \left[ \int _{0}^{t} |{\tilde{\sigma }}(Z^{\mu }_s,\mu _s) - {\tilde{\sigma }}(Z^{\nu }_s,\nu _s)|^2 \, \mathrm {d}s \right] \right) \\&\le C \Bigg ( {\mathbb {E}} \left[ \int _{0}^{t} |{\tilde{b}}(Z^{\mu }_s,\mu _s) - {\tilde{b}}(Z^{\nu }_s,\mu _s)|^2 \, \mathrm {d}s \right] + {\mathbb {E}} \left[ \int _{0}^{t} |{\tilde{b}}(Z^{\nu }_s,\mu _s) - {\tilde{b}}(Z^{\nu }_s,\nu _s)|^2 \, \mathrm {d}s \right] \\&\quad + {\mathbb {E}} \left[ \int _{0}^{t} |{\tilde{\sigma }}(Z^{\mu }_s,\mu _s) - {\tilde{\sigma }}(Z^{\nu }_s,\mu _s)|^2 \, \mathrm {d}s \right] + {\mathbb {E}} \left[ \int _{0}^{t} |{\tilde{\sigma }}(Z^{\nu }_s,\mu _s) - {\tilde{\sigma }}(Z^{\nu }_s,\nu _s)|^2 \, \mathrm {d}s \right] \Bigg ) \\&\le C {\mathbb {E}} \left[ \int _{0}^{t} \left( |Z^{\mu }_s-Z^{\nu }_s|^2 + {\mathcal {W}}_2^{2}(\mu _s,\nu _s) \right) \mathrm {d}s \right] \le C \int _{0}^{t} {\mathcal {W}}_2^{2}(\mu _s,\nu _s) \, \mathrm {d}s. \end{aligned}$$

For \(k \ge 0\) and \(t \in [0,T]\), we define the Picard iteration

$$\begin{aligned} \mu _t^{k+1} = \text {Law}\left( G^{-1}(Z_t^{\mu ^{k}},\mu _t^{k}) \right) , \end{aligned}$$
(3.11)

with \(Z_t^{\mu ^{0}} = G(\xi ,\delta _{\xi })\) and \(\mu _t^{0}=\delta _{\xi }\). Note that by Itô’s formula (applied to \(G^{-1}\)), the process defined by \(X_t^{k+1}:= G^{-1}(Z_t^{\mu ^{k}},\mu _t^{k})\) is the solution to

$$\begin{aligned} \mathrm {d} X^{k+1}_t = b(X^{k+1}_t,\mu _t^{k}) \, \mathrm {d}t + \sigma (X^{k+1}_t) \, \mathrm {d}W_t, \quad X^{k+1}_0= \xi . \end{aligned}$$

Recall that \((X^{k+1}_t)_{t \in [0,T]}\) has uniformly bounded moments (uniformly in k), due to (H.1(1)) and (H.2(1)), i.e., we have

$$\begin{aligned} \sup _{k \ge 1} {\mathbb {E}}\left[ \sup _{0 \le t \le T} |X^{k}_t|^{p} \right] \le C(1+{\mathbb {E}}[|\xi |^{p}]). \end{aligned}$$
(3.12)

The applicability of Itô’s formula for \(G^{-1}\) is a consequence of the fact that the inverse inherits the regularity of G; in particular, the mapping \({\mathcal {P}}_2({\mathbb {R}}) \ni \mu \mapsto G^{-1}(y,\mu )\) is still an element of the class \({\mathcal {C}}^{(1,1)}_{b}\) (see Proposition A.1 in Appendix A.1). Then, the above estimate and Proposition 3.3 yield

$$\begin{aligned} \sup _{t \in [0, T]} {\mathcal {W}}_2^{2}(\mu _t^{k+1},\mu _t^{k})&\le \sup _{t \in [0, T]}{\mathbb {E}}\left[ |X^{k+1}_t-X^{k}_t|^2 \right] \nonumber \\&\le 2 \sup _{t \in [0, T]} {\mathbb {E}}\left[ |G^{-1}(Z_t^{\mu ^{k}},\mu _t^{k}) - G^{-1}(Z_t^{\mu ^{k-1}},\mu _t^{k})|^2 \right] \nonumber \\&\quad + 2 \sup _{t \in [0, T]}{\mathbb {E}}\left[ |G^{-1}(Z_t^{\mu ^{k-1}},\mu _t^{k}) - G^{-1}(Z_t^{\mu ^{k-1}},\mu _t^{k-1})|^2 \right] \nonumber \\&\le C \int _{0}^{T} {\mathcal {W}}_2^{2}(\mu ^{k}_s,\mu ^{k-1}_s) \mathrm {d}s + 2L(c) \sup _{t \in [0, T]} {\mathcal {W}}_2^{2}(\mu ^{k}_{t},\mu ^{k-1}_{t}) \nonumber \\&\le C\int _{0}^{T} {\mathcal {W}}_2^{2}(\mu ^{k}_s,\mu ^{k-1}_s) \mathrm {d}s + 2L(c) \sup _{t \in [0, T]} {\mathbb {E}}\left[ |X^{k}_{t}-X^{k-1}_{t}|^2 \right] , \end{aligned}$$
(3.13)

where \(L:= 2L(c) < 1\) due to the choice of c and \(C>0\) is a constant depending on the constants appearing in Proposition 3.3 and Lemma 3.1.

Let now \(0< T_0 <T\) be such that \(CT_0 + L < 1\). With this choice, the uniqueness of (3.3) on \([0,T_0]\) follows from the estimate in (3.13), applied to two solutions \((X,\mu )\) and \((Y,\nu )\) of (3.3), where \(\mu _t\) and \(\nu _t\) are the marginal laws of \(X_t\) and \(Y_t\), respectively, for \(t \in [0,T_0]\). In addition, we observe that the sequence of flows \((\mu ^{k})_{k}\), for \(\mu ^{k}=(\mu ^{k}_t)_{t \in [0,T_0]}\), is a Cauchy sequence in the complete metric space \({\mathcal {C}}([0,T_0],{\mathcal {P}}_2({\mathbb {R}}))\) equipped with the metric \(\sup _{t \in [0, T_0]} {\mathcal {W}}_2(\mu _t,\nu _t)\). Hence, (3.11) has a fixed point; in particular, we have \(X_t = G^{-1}(Z_t^{\mu },\mu _t)\), where \(\mu _t={\mathcal {L}}_{X_t}\). Itô’s formula applied to \(G^{-1}\) yields the claim for the time interval \([0,T_0]\). Repeating the above procedure starting at \(T_0\), we can extend the solution to the interval \([T_0,T_1]\), for some \(T_0< T_1 <T\). This is possible as the choice of \(T_1\) depends on \(X_{T_0}\) only through the second moment of \(X_{T_0}\), for which we have a uniform bound on the entire interval [0, T], see (3.12). Proceeding in this manner, we obtain well-posedness of (3.3) on [0, T]. \(\square \)
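To make the Picard iteration (3.11) concrete: each step freezes the measure flow and solves the classical SDE for \(X^{k+1}\) displayed above, which can be approximated by an Euler–Maruyama scheme over a particle cloud. The sketch below is a toy illustration under assumed coefficients (a drift with a jump at zero that depends on the measure only through its mean, a Lipschitz functional standing in for the general \({\mathcal {W}}_2\)-Lipschitz dependence, and unit diffusion); it is not the scheme analysed in the paper.

```python
import random, math

# Toy sketch of the Picard iteration (3.11): freeze the measure flow, solve
# the resulting classical SDE by Euler-Maruyama over a particle cloud, and
# update the flow by the empirical law of the solution. All coefficients are
# illustrative stand-ins, not taken from the paper.
random.seed(0)

def b(x, mean):                      # jump at x = 0, Lipschitz in the mean
    return (-1.0 if x > 0 else 1.0) - 0.5 * x + 0.2 * mean

def sigma(x):
    return 1.0                       # constant, globally Lipschitz diffusion

M, NSTEP, T = 2000, 50, 1.0          # cloud size, time steps, horizon
dt = T / NSTEP

def euler_maruyama(means):
    """Simulate M i.i.d. paths of dX = b(X, mu_t) dt + sigma(X) dW with the
    frozen flow 'means' (per-step means of mu_t); return the new flow of
    per-step sample means."""
    xs = [0.5] * M                   # deterministic initial condition xi = 0.5
    out = [0.5]
    for n in range(NSTEP):
        xs = [x + b(x, means[n]) * dt
                + sigma(x) * math.sqrt(dt) * random.gauss(0.0, 1.0)
              for x in xs]
        out.append(sum(xs) / M)
    return out

flow = [0.5] * (NSTEP + 1)           # mu^0 = delta_xi, encoded by its means
for _ in range(5):                   # a few Picard steps
    flow = euler_maruyama(flow)

assert all(math.isfinite(m) for m in flow)
```

After a few iterations the flow of means stabilises, in line with the contraction argument on \([0,T_0]\).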

3.3 Interacting particle system with non-decomposable drift

In the following, we state the model assumptions which will specify the set-up for this subsection:

H.3

Assumptions (H.2(3)) and (H.2(4)) are satisfied and we require:

  (1)

    Assumption (H.1(1)) holds and there exists a constant \(L >0\) such that \(|\sigma (x)| \le L\) for all \(x \in {\mathbb {R}}\).

  (2)

    There exists a constant \(L_1>0\) such that

    $$\begin{aligned} | b(x,\mu ) - b(x,\nu ) | \le L_1 {\mathcal {W}}_2(\mu ,\nu ) \quad \forall x \in {\mathbb {R}} \setminus \lbrace 0 \rbrace , \ \forall \mu , \nu \in {\mathcal {P}}_2({\mathbb {R}}). \end{aligned}$$

    Further, for any \(\mu \in {\mathcal {P}}_2({\mathbb {R}})\), \(x \mapsto b(x,\mu )\) is Lipschitz continuous on the subintervals \((-\infty ,0)\) and \((0,\infty )\), uniformly with respect to \(\mu \).

  (3)

    \(\alpha \in {\mathcal {C}}^{(1,2)}_{b}\) is a bounded function and the mappings

    $$\begin{aligned}&{\mathcal {P}}_2({\mathbb {R}}) \times {\mathbb {R}}\ni (\mu ,y) \mapsto \partial _{\mu } \alpha (\mu )(y), \\&{\mathcal {P}}_2({\mathbb {R}}) \times {\mathbb {R}}\ni (\mu ,y) \mapsto \partial _y \partial _{\mu } \alpha (\mu )(y),\\&{\mathcal {P}}_2({\mathbb {R}}) \times {\mathbb {R}}\times {\mathbb {R}}\ni (\mu ,y,y') \mapsto \partial ^{2}_{\mu }\alpha (\mu )(y,y'), \end{aligned}$$

    are bounded and Lipschitz continuous.

Remark 3.4

Note that, compared to (H.2(1)), we do not require the drift to be uniformly bounded in the measure component.

The interacting particles of the system \((X^{i,N}_t)_{t \in [0,T]}\), for \(i \in \lbrace 1, \ldots , N \rbrace \), associated with (3.3) satisfy

$$\begin{aligned} \mathrm {d}X_t^{i,N} = b(X_t^{i,N}, \mu _t^{{\varvec{X}}^{N}}) \, \mathrm {d}t + \sigma (X_t^{i,N}) \, \mathrm {d}W_t^{i}, \quad X_0^{i,N} = \xi ^{i}, \end{aligned}$$
(3.14)

where \((\xi ^{i},W^{i})\), for \(i \in \lbrace 1, \ldots , N \rbrace \), are independent copies of \((\xi ,W)\).

In contrast to the case of particle systems with decomposable drift, we set \(\alpha : {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\),

$$\begin{aligned} \alpha (\mu ^{{\varvec{x}}^{N}}) = \frac{b(0^{-},\mu ^{{\varvec{x}}^{N}})- b(0^{+},\mu ^{{\varvec{x}}^{N}})}{2\sigma ^2(0)}, \end{aligned}$$

(which could also be interpreted as a mapping \(\alpha _N: {\mathbb {R}}^N \rightarrow {\mathbb {R}}\)) and apply to each particle the following transformation \(G: {\mathbb {R}}\times {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\),

$$\begin{aligned} G \left( x_i, \mu ^{{\varvec{x}}^{N}} \right) = x_i + \alpha \left( \mu ^{{\varvec{x}}^{N}}\right) x_i|x_i| \phi \big (x_i/c \big ). \end{aligned}$$
(3.15)

We set \({\mathbb {R}}^{N} \ni {\varvec{x}}^{N} \mapsto G_i({\varvec{x}}^{N}) := G \left( x_i, \mu ^{{\varvec{x}}^{N}} \right) \), and use these mappings to define \({\varvec{G}}_N: {\mathbb {R}}^{N} \rightarrow {\mathbb {R}}^{N}\) by

$$\begin{aligned} {\varvec{G}}_N({\varvec{x}}^{N}) := \left( G_1({\varvec{x}}^{N}), \ldots , G_N({\varvec{x}}^{N}) \right) ^{\top }. \end{aligned}$$
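For illustration, the particle-level transformation (3.15) can be sketched as follows; the functional \(\alpha (\mu ) = 0.4\tanh \big (\int y \, \mu (\mathrm {d}y)\big )\) is a hypothetical stand-in (the paper defines \(\alpha \) through the jump of b at zero), and \(c = 0.5\) is an assumed toy cut-off.

```python
import math

# Sketch of the particle-level transformation (3.15): alpha is evaluated at
# the empirical measure mu^{x^N} of the cloud. The functional alpha below is
# a hypothetical stand-in, not the paper's definition.
C = 0.5  # cut-off level c; assumed small enough for invertibility

def phi(u):
    return (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def alpha_emp(xs):
    """Toy alpha(mu^{x^N}): a bounded Lipschitz functional of the mean."""
    return 0.4 * math.tanh(sum(xs) / len(xs))

def G_N(xs):
    """G_N(x^N) = (G(x_1, mu^{x^N}), ..., G(x_N, mu^{x^N}))."""
    a = alpha_emp(xs)                # one alpha value for the whole cloud
    return [x + a * x * abs(x) * phi(x / C) for x in xs]

zs = G_N([-0.6, -0.1, 0.0, 0.2, 0.8])
assert len(zs) == 5 and zs[2] == 0.0   # G fixes the origin
```

Note that particles with \(|x_i| \ge c\) are left unchanged, since the bump \(\phi (x_i/c)\) vanishes there.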

To obtain the transformed process \(({\varvec{Z}}^N_t)_{t \in [0,T]} = (Z^{1,N}_t,\ldots ,Z^{N,N}_t)_{t \in [0,T]}^{\top } \in {\mathbb {R}}^N\), we proceed as follows: For any \(t \in [0,T]\) and \(i \in \lbrace 1, \ldots , N \rbrace \), we have, using [14, Proposition 5.35] (see also Sect. 2)

$$\begin{aligned} \mathrm {d}G(X_t^{i,N},\mu _t^{{\varvec{X}}^{N}})&= \mathrm {d}G_i(X_t^{1,N},\ldots ,X_t^{N,N}) \nonumber \\&= \partial _{x_i} G(X_t^{i,N},\mu _t^{{\varvec{X}}^{N}})\mathrm {d}X_t^{i,N}+\frac{1}{2} \partial ^2_{x_i} G(X_t^{i,N},\mu _t^{{\varvec{X}}^{N}})\mathrm {d}[X^{i,N}]_t \nonumber \\&\quad + \frac{1}{N} \sum _{k =1}^{N} \partial _{\mu } G(X_t^{i,N},\mu _t^{{\varvec{X}}^{N}})(X_t^{k,N}) \mathrm {d}X^{k,N}_t \nonumber \\&\quad + \frac{1}{2N} \sum _{k =1}^{N} \partial _y \partial _{\mu }G(X_t^{i,N},\mu _t^{{\varvec{X}}^{N}})(X_t^{k,N}) \mathrm {d}[X^{k,N}]_t \nonumber \\&\quad + \frac{1}{2N^{2}} \sum _{k=1}^{N}\partial ^2_{\mu }G(X_t^{i,N},\mu _t^{{\varvec{X}}^{N}})(X_t^{k,N},X_t^{k,N}) \mathrm {d}[X^{k,N}]_t \nonumber \\&\quad + \frac{1}{N} \partial _{x_i} \partial _{\mu }G(X_t^{i,N},\mu _t^{{\varvec{X}}^{N}})(X_t^{i,N}) \mathrm {d}[X^{i,N}]_t. \nonumber \end{aligned}$$

The applicability of Itô’s formula for the function G is guaranteed by [40, Theorem 3.19]. Note that Assumption 3.4 therein is imposed to guarantee Lipschitz continuity of the second order derivatives of G outside the set of discontinuities; Assumption (H.3(3)) and the definition of G serve as a substitute for this condition.

Assuming for now the global invertibility of \({\varvec{G}}_N\), we may introduce

$$\begin{aligned} \mathrm {d}{\varvec{Z}}^{N}_t = {\varvec{B}}_N\left( {\varvec{G}}^{-1}_N({\varvec{Z}}^{N}_t) \right) \, \mathrm {d}t + \varvec{\varSigma }_N\left( {\varvec{G}}^{-1}_N({\varvec{Z}}^{N}_t) \right) \, \mathrm {d}{\varvec{W}}^{N}_t, \quad {\varvec{Z}}^{N}_0= {\varvec{G}}_N((X_0^{1,N}, \ldots , X_0^{N,N})), \end{aligned}$$

where \({\varvec{W}}^{N}_t = (W^{1}_t, \ldots , W^{N}_t)^{\top }\), \({\varvec{B}}_N({\varvec{x}}^{N})=(B_1({\varvec{x}}^{N}), \ldots , B_N({\varvec{x}}^{N}))^{\top }\) is defined by

$$\begin{aligned} B_i({\varvec{x}}^{N})&:= \partial _{x_i} G(x_i,\mu ^{{\varvec{x}}^{N}}) b(x_i,\mu ^{{\varvec{x}}^{N}}) + \frac{1}{2} \sigma ^{2}(x_i) \partial ^2_{x_i} G(x_i,\mu ^{{\varvec{x}}^{N}}) \nonumber \\&\quad + \frac{1}{N} \sum _{k =1}^{N} \partial _{\mu } G(x_i,\mu ^{{\varvec{x}}^{N}})(x_k) b(x_k,\mu ^{{\varvec{x}}^{N}}) + \frac{1}{2N} \sum _{k =1}^{N} \partial _y \partial _{\mu }G(x_i,\mu ^{{\varvec{x}}^{N}})(x_k) \sigma ^{2}(x_k) \nonumber \\&\quad + \frac{1}{2N^{2}} \sum _{k=1}^{N}\partial ^2_{\mu }G(x_i,\mu ^{{\varvec{x}}^{N}})(x_k,x_k) \sigma ^{2}(x_k) + \frac{1}{N} \partial _{x_i} \partial _{\mu }G(x_i,\mu ^{{\varvec{x}}^{N}})(x_i) \sigma ^2(x_i), \end{aligned}$$
(3.16)

and \(\varvec{\varSigma }_N({\varvec{x}}^N) = \left( \varSigma ^{i,j}({\varvec{x}}^N) \right) _{i,j \in \lbrace 1, \ldots , N \rbrace }\) by

$$\begin{aligned} \varSigma ^{i,j}({\varvec{x}}^N)&= \partial _{x_i} G(x_i,\mu ^{{\varvec{x}}^{N}}) \sigma (x_i) \delta _{i,j} + \frac{1}{N}\partial _{\mu } G(x_i,\mu ^{{\varvec{x}}^{N}})(x_j) \sigma (x_j). \end{aligned}$$
(3.17)

In the following lemma, we will prove the invertibility of \({\varvec{G}}_N\).

Lemma 3.2

Let Assumption (H.3(3)) be satisfied and assume that the constant c in (3.15) satisfies

$$\begin{aligned} c< \min \left( 1, \left( \sup _{{\varvec{x}}^N \in {\mathbb {R}}^{N}} \left( |\alpha _N({\varvec{x}}^N)| + \max _{i \in \lbrace 1, \ldots , N \rbrace } |\partial _\mu \alpha (\mu ^{{\varvec{x}}^N})(x_i)| \right) \right) ^{-1} \right) . \end{aligned}$$
(3.18)

Then, \({\varvec{G}}_N\) has a global inverse.

Proof

We will employ Hadamard’s global inverse function theorem (see, e.g., [56, Theorem 2.2]) to prove that \({\varvec{G}}_N\) has a global inverse. To do so, we need to verify the following properties of \({\varvec{G}}_N\): \({\varvec{G}}_N\) is in \({\mathcal {C}}^{1}({\mathbb {R}}^N,{\mathbb {R}}^{N})\), \(\lim _{|{\varvec{x}}^N| \rightarrow \infty } |{\varvec{G}}_N({\varvec{x}}^N)| = \infty \), and \({\varvec{G}}'_N({\varvec{x}}^N)\) is invertible for all \({\varvec{x}}^N \in {\mathbb {R}}^N\). The first two conditions follow immediately from the definition of \({\varvec{G}}_N\) and the uniform boundedness of \(\alpha \) and \({\bar{\phi }}(x) := x|x| \phi \big (x/c \big )\).

Hence, we need to prove that \({\varvec{G}}'_N({\varvec{x}}^N)\) is invertible. First, note that

$$\begin{aligned} {\varvec{G}}'_N({\varvec{x}}^N) = \mathrm {I}_{N \times N} + \text {diag}_{N \times N}({\bar{\phi }}'(x_1)\alpha _N({\varvec{x}}^N), \ldots ,{\bar{\phi }}'(x_N)\alpha _N({\varvec{x}}^N))+ \bar{\varvec{\phi }}({\varvec{x}}^N)\varvec{\alpha }'_N({\varvec{x}}^N), \end{aligned}$$

where \(\mathrm {I}_{N \times N}\) is the \(N \times N\) identity matrix, \(\bar{\varvec{\phi }}({\varvec{x}}^N) = ({\bar{\phi }}(x_1),\ldots ,{\bar{\phi }}(x_N))^{\top }\), with \((\varvec{\alpha }_N'({\varvec{x}}^N))_i= \frac{1}{N}\partial _\mu \alpha (\mu ^{{\varvec{x}}^N})(x_i)\) and \(\varvec{\alpha }'_N\) is a row vector.

Now, we define

$$\begin{aligned} {\mathcal {A}}({\varvec{x}}^N) := \text {diag}_{N \times N}({\bar{\phi }}'(x_1)\alpha _N({\varvec{x}}^N), \ldots ,{\bar{\phi }}'(x_N)\alpha _N({\varvec{x}}^N)) + \bar{\varvec{\phi }}({\varvec{x}}^N)\varvec{\alpha }'_N({\varvec{x}}^N), \end{aligned}$$

and remark that \({\varvec{G}}'_N({\varvec{x}}^N)\) can be identified with the linear operator \(\mathrm {I}_{N \times N} + {\mathcal {A}}({\varvec{x}}^{N}): {\mathbb {R}}^N \rightarrow {\mathbb {R}}^N\). Therefore, it suffices to show that c can be chosen (uniformly in \({\varvec{x}}^{N}\)) such that the operator norm of \({\mathcal {A}}({\varvec{x}}^N)\) is smaller than one, since then \({\varvec{G}}'_N({\varvec{x}}^N)\) is a small perturbation of the identity and hence invertible (see [40, Lemma 3.17]). We compute

$$\begin{aligned} \Vert {\mathcal {A}}({\varvec{x}}^N) \Vert&\le \max _{i \in \lbrace {1, \ldots , N \rbrace }} |{\bar{\phi }}'(x_i)| |\alpha _N({\varvec{x}}^N)|+ \max _{i \in \lbrace 1, \ldots , N \rbrace } |{\bar{\phi }}(x_i)| |\partial _\mu \alpha (\mu ^{{\varvec{x}}^N})(x_i)| \\&\le c |\alpha _N({\varvec{x}}^N)| + c^2 \max _{i \in \lbrace 1, \ldots , N \rbrace }|\partial _\mu \alpha (\mu ^{{\varvec{x}}^N})(x_i)|, \end{aligned}$$

which implies that for

$$\begin{aligned} c< \min \left( 1, \left( |\alpha _N({\varvec{x}}^N)| + \max _{i \in \lbrace 1, \ldots , N \rbrace }|\partial _\mu \alpha (\mu ^{{\varvec{x}}^N})(x_i)| \right) ^{-1} \right) , \end{aligned}$$

\(\Vert {\mathcal {A}}({\varvec{x}}^N) \Vert < 1\). Note that (H.3(3)) guarantees that \(\alpha \) and its derivatives are uniformly bounded, i.e., c can be chosen uniformly in \({\varvec{x}}^N\). Therefore, Hadamard’s global inverse function theorem proves that, for each given \(N \ge 1\), \({\varvec{G}}_N: {\mathbb {R}}^N \rightarrow {\mathbb {R}}^N\) is a diffeomorphism. \(\square \)
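The operator-norm bound at the heart of the proof can be sanity-checked numerically. In the sketch below, \(\alpha (\mu ) = 0.4\tanh \big (\int y\,\mu (\mathrm {d}y)\big )\) (so that \(\partial _\mu \alpha (\mu )(y) = 0.4\big (1-\tanh ^2(\int y\,\mu (\mathrm {d}y))\big )\), constant in y) and \(c = 0.5\) are hypothetical stand-ins, not taken from the paper.

```python
import math

# Numerical sanity check for the proof of Lemma 3.2, using a toy alpha.
# alpha(mu) = 0.4*tanh(mean) and its Lions derivative are stand-ins.
C = 0.5

def phi(u):
    return (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def phibar(x):                 # phibar(x) = x|x| phi(x/c);  |phibar| <= c^2
    return x * abs(x) * phi(x / C)

def phibar_prime(x):           # closed form; zero outside [-c, c], |.| < c/2
    u = x / C
    if abs(u) > 1.0:
        return 0.0
    return -2.0 * C * abs(u) * (4.0 * u * u - 1.0) * (1.0 - u * u) ** 2

xs = [-0.6, -0.1, 0.0, 0.2, 0.8]
mean = sum(xs) / len(xs)
alpha_N = 0.4 * math.tanh(mean)
dmu_alpha = 0.4 * (1.0 - math.tanh(mean) ** 2)

# The bound on ||A(x^N)|| from the proof:
bound = (max(abs(phibar_prime(x)) for x in xs) * abs(alpha_N)
         + max(abs(phibar(x)) for x in xs) * abs(dmu_alpha))
assert bound <= C * abs(alpha_N) + C * C * abs(dmu_alpha) + 1e-12
assert bound < 1.0
```

The intermediate inequality mirrors the estimate \(\Vert {\mathcal {A}}({\varvec{x}}^N)\Vert \le c|\alpha _N({\varvec{x}}^N)| + c^2 \max _i |\partial _\mu \alpha (\mu ^{{\varvec{x}}^N})(x_i)|\) from the proof.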

We proceed by showing that the transformed SDE has (locally) Lipschitz continuous coefficients.

Lemma 3.3

Let Assumption (H.3) be satisfied. Then, the coefficients \({\varvec{B}}_N\) and \(\varvec{\varSigma }_N\) introduced in (3.16) and (3.17), respectively, are locally Lipschitz continuous with linear growth.

Proof

For \({\varvec{x}}^{N}, {\varvec{y}}^{N} \in {\mathbb {R}}^N\), we obtain using (3.17)

$$\begin{aligned}&\Vert \varvec{\varSigma }_N({\varvec{x}}^{N}) - \varvec{\varSigma }_N({\varvec{y}}^{N}) \Vert ^2 \\&\quad \le \sum _{i=1}^{N} |\varSigma ^{i,i}({\varvec{x}}^{N}) - \varSigma ^{i,i}({\varvec{y}}^{N})|^2 + \sum _{i \ne j} |\varSigma ^{i,j}({\varvec{x}}^{N}) - \varSigma ^{i,j}({\varvec{y}}^{N})|^2 \\&\quad \le \sum _{i=1}^{N} |\partial _{x_i} G(x_i,\mu ^{{\varvec{x}}^{N}}) \sigma (x_i) - \partial _{y_i} G(y_i,\mu ^{{\varvec{y}}^{N}}) \sigma (y_i)|^2 \\&\qquad \quad + \frac{1}{N^2} \sum _{i \ne j} |\partial _{\mu } G(x_i,\mu ^{{\varvec{x}}^{N}})(x_j) \sigma (x_j) -\partial _{\mu } G(y_i,\mu ^{{\varvec{y}}^{N}})(y_j) \sigma (y_j)|^2 \\&\quad \le C |{\varvec{x}}^{N} - {\varvec{y}}^{N}|^2, \end{aligned}$$

where we used (H.3(1)), and the Lipschitz continuity of the functions \(x \mapsto \partial _{x} G(x,\mu )\) and \(\mu \mapsto \partial _{x} G(x,\mu )\) to estimate the first term. Assumptions (H.3(1)) and (H.3(3)), in particular the Lipschitz continuity of \(x \mapsto \partial _{\mu } G(x,\mu )(y)\) and \(\mu \mapsto \partial _{\mu } G(x,\mu )(y)\), are employed to handle the second sum. Also note that all these mappings are bounded due to (H.3(3)) and the definition of the transformation.

Noting that

$$\begin{aligned}&| {\varvec{B}}_N({\varvec{x}}^{N}) - {\varvec{B}}_N({\varvec{y}}^{N}) |^2 = \sum _{i=1}^{N} |B_i({\varvec{x}}^{N}) - B_i({\varvec{y}}^{N})|^2, \end{aligned}$$

we further obtain for the drift

$$\begin{aligned}&|B_i({\mathbf {x}}^{N}) - B_i({\mathbf {y}}^{N})|^2 \\&\quad \le C \Bigg ( |\partial _{x_i} G(x_i,\mu ^{{\mathbf {x}}^{N}}) b(x_i,\mu ^{{\mathbf {x}}^{N}}) + \frac{1}{2} \sigma ^{2}(x_i) \partial ^2_{x_i} G(x_i,\mu ^{{\mathbf {x}}^{N}}) \\& -\partial _{y_i} G(y_i,\mu ^{{\mathbf {y}}^{N}}) b(y_i,\mu ^{{\mathbf {y}}^{N}}) - \frac{1}{2} \sigma ^{2}(y_i) \partial ^2_{y_i} G(y_i,\mu ^{{\mathbf {y}}^{N}})|^2 \\&\qquad +\frac{1}{N}\sum _{k =1}^{N}| \partial _{\mu } G(x_i,\mu ^{{\mathbf {x}}^{N}})(x_k) b(x_k,\mu ^{{\mathbf {x}}^{N}}) - \partial _{\mu } G(y_i,\mu ^{{\mathbf {y}}^{N}})(y_k) b(y_k,\mu ^{{\mathbf {y}}^{N}})|^2 \\&\qquad + \frac{1}{2N} \sum _{k =1}^{N} |\partial _y \partial _{\mu }G(x_i,\mu ^{{\mathbf {x}}^{N}})(x_k) \sigma ^{2}(x_k) - \partial _y \partial _{\mu }G(y_i,\mu ^{{\mathbf {y}}^{N}})(y_k) \sigma ^{2}(y_k)|^2 \\&\qquad + \frac{1}{2N^{2}} \sum _{k=1}^{N}|\partial ^2_{\mu }G(x_i,\mu ^{{\mathbf {x}}^{N}})(x_k,x_k) \sigma ^{2}(x_k)-\partial ^2_{\mu }G(y_i,\mu ^{{\mathbf {y}}^{N}})(y_k,y_k) \sigma ^{2}(y_k)|^2 \\&\qquad + \frac{1}{N} |\partial _{x_i} \partial _{\mu }G(x_i,\mu ^{{\mathbf {x}}^{N}})(x_i) \sigma ^2(x_i)-\partial _{y_i} \partial _{\mu }G(y_i,\mu ^{{\mathbf {y}}^{N}})(y_i) \sigma ^2(y_i)|^2 \Bigg )=: \sum _{j=1}^{5} \varPi _j. \end{aligned}$$

That the terms \(\varPi _3,\varPi _4\) and \(\varPi _5\) allow a Lipschitz bound is a consequence of (H.3(1)) and (H.3(3)).

For any \(R>0\), in view of (H.3(2)) and (H.3(3)), we derive the following estimate for \(\varPi _2\):

$$\begin{aligned} \varPi _2&\le \frac{C}{N} \Bigg ( \sum _{k = 1}^{N} \left| \partial _{\mu } \alpha (\mu ^{{\mathbf {x}}^{N}})(x_k){\bar{\phi }}(x_i) b(x_k,\mu ^{{\mathbf {x}}^{N}}) - \partial _{\mu } \alpha (\mu ^{{\mathbf {y}}^{N}})(y_k){\bar{\phi }}(x_i)b(y_k,\mu ^{{\mathbf {y}}^{N}}) \right| ^2 \\&\quad + \sum _{k = 1}^{N} \left| \partial _{\mu } \alpha (\mu ^{{\mathbf {y}}^{N}})(y_k){\bar{\phi }}(x_i)b(y_k,\mu ^{{\mathbf {y}}^{N}}) - \partial _{\mu } \alpha (\mu ^{{\mathbf {y}}^{N}})(y_k){\bar{\phi }}(y_i)b(y_k,\mu ^{{\mathbf {y}}^{N}}) \right| ^2 \Bigg ) \\&\le \frac{C}{N} \sum _{k = 1}^{N} \left| \partial _{\mu } \alpha (\mu ^{{\mathbf {x}}^{N}})(x_k) b(x_k,\mu ^{{\mathbf {x}}^{N}}) - \partial _{\mu } \alpha (\mu ^{{\mathbf {y}}^{N}})(y_k)b(y_k,\mu ^{{\mathbf {y}}^{N}}) \right| ^2 + L_R|x_i-y_i|^2, \end{aligned}$$

for some constant \(L_R>0\) and any \(|{\varvec{x}}^{N}|, |{\varvec{y}}^{N}| \le R\). We proceed with the estimate

$$\begin{aligned} \sum _{k = 1}^{N} \left| \partial _{\mu } \alpha (\mu ^{{\varvec{x}}^{N}})(x_k) b(x_k,\mu ^{{\varvec{x}}^{N}}) - \partial _{\mu } \alpha (\mu ^{{\varvec{y}}^{N}})(y_k)b(y_k,\mu ^{{\varvec{y}}^{N}}) \right| ^2 \le C \sum _{k = 1}^{N} |x_k-y_k|^2, \end{aligned}$$

which holds due to Assumptions (H.2(3)) and (H.2(4)). Combining the above estimates, we obtain

$$\begin{aligned} \varPi _2 \le L_R \left( |x_i-y_i|^2 + \frac{1}{N}\sum _{k = 1}^{N} |x_k-y_k|^2 \right) . \end{aligned}$$

Finally, we point out that

$$\begin{aligned} x \mapsto b(x,\mu ) + \frac{1}{2} \sigma ^{2}(x) \partial ^2_{x} G(x,\mu ), \end{aligned}$$

is Lipschitz continuous due to the choice of \(\alpha \). Employing this along with (H.3(1)), (H.3(2)) and (H.3(3)), we derive

$$\begin{aligned} \varPi _1 \le C \left( |x_i-y_i|^2 + \frac{1}{N}\sum _{k = 1}^{N} |x_k-y_k|^2 \right) , \end{aligned}$$

for some constant \(C >0\). Taking the estimates for \(\varPi _1, \ldots , \varPi _5\) into account yields the local Lipschitz continuity of \({\varvec{B}}_N\).

The linear growth of \({\varvec{B}}_N\) and \(\varvec{\varSigma }_N\), i.e., that there exists a constant \(C>0\) such that \(|{\varvec{B}}_N({\varvec{x}}^{N})| + \Vert \varvec{\varSigma }_N({\varvec{x}}^{N})\Vert \le C(1+|{\varvec{x}}^{N}|)\) for all \({\varvec{x}}^{N} \in {\mathbb {R}}^{N}\), is a direct consequence of the growth conditions on b and \(\sigma \) along with the bounds for the derivatives of G and \(\alpha \). \(\square \)

Theorem 3.2

Let Assumption (H.3) be satisfied, let \(\xi \in L_p^{0}({\mathbb {R}})\) for a given \(p \ge 2\) and assume that the constant c in (3.15) satisfies (3.18). Then, the interacting particle system defined in (3.14) has a unique strong solution in \({\mathcal {S}}^{p}([0,T])\).

Proof

From Lemma 3.3 and the linear growth of \({\varvec{B}}_N\) and \(\varvec{\varSigma }_N\), we can deduce that the SDE for \({\varvec{Z}}^{N}\) has a unique strong solution (see [43, Chapter 5, Theorem 2.5]). Applying Itô’s formula to \({\varvec{G}}_N^{-1}({\varvec{Z}}^{N}_t)\) then yields a unique strong solution to the particle system defined in (3.14). Note that \({\varvec{G}}_N^{-1}\) exists due to Lemma 3.2 and that Itô’s formula is applicable to \({\varvec{G}}_N^{-1}\), as it inherits the regularity of \({\varvec{G}}_N\) (see Appendix A.2 for details). \(\square \)

4 Euler–Maruyama scheme with and without transformation

In this section, we restrict most of our discussion to the case of a McKean–Vlasov SDE with decomposable drift, due to the simpler structure of the underlying transformation and the particle systems.

In the following subsections, we will present two Euler–Maruyama schemes to discretise the particle system defined in (3.2) in time.

For the first scheme (Scheme 1), we will discretise the transformed (continuous) particle system in time and then exploit the global inverse \(G^{-1}\) to obtain approximations of the original (discontinuous) particle system. A slight modification of this scheme will also be applied in the non-decomposable case. An approximation result with respect to the number of particles will also be presented.

The second scheme (Scheme 2) will be defined by directly discretising the discontinuous particle system, without making use of the transformation G. We give strong convergence rates in terms of the number of time-steps and pathwise strong propagation of chaos results in order to obtain quantitative \(L_2\)-approximations for the underlying McKean–Vlasov SDE.

4.1 Scheme 1: Euler–Maruyama after transformation (decomposable case)

We define the following explicit Euler–Maruyama scheme to discretise the particle system (3.2) in time. In a first step, we partition a given time interval [0, T] into subintervals of equal length \(h=T/M\), for some integer \(M>0\), and define \(t_n:= nh\). Then, we simulate the transformed particle system by

$$\begin{aligned} Z_{t_{n+1}}^{i,N,M} = Z_{t_{n}}^{i,N,M} + {\tilde{b}}(G^{-1}(Z_{t_n}^{i,N,M}), \mu _{t_n}^{{\varvec{Z}}^{N,M}}) h + {\tilde{\sigma }}(G^{-1}(Z_{t_n}^{i,N,M})) \varDelta W_{n}^{i}, \end{aligned}$$
(4.1)

for \(n \in \lbrace 0, \ldots , M-1 \rbrace \), where \(Z_0^{i,N,M} = G(X_0^{i,N})\), \(\varDelta W_{n}^{i} = W_{t_{n+1}}^{i} - W_{t_{n}}^{i}\), for \(i \in \lbrace 1, \ldots , N \rbrace \), and

$$\begin{aligned} \mu _{t_n}^{{\varvec{Z}}^{N,M}}(\mathrm {d}x) := \frac{1}{N} \sum _{j=1}^{N} \delta _{G^{-1}(Z_{t_{n}}^{j,N,M})}(\mathrm {d}x). \end{aligned}$$

We introduce the notation \(\eta (t):= \sup \lbrace s \in \lbrace 0, h, \ldots , Mh \rbrace : s \le t \rbrace \), for \(t \in [0,T]\), which allows us to define the continuous time version of (4.1)

$$\begin{aligned} Z_{t}^{i,N,M} = Z_0^{i,N,M} + \int _{0}^{t} {\tilde{b}}(G^{-1}(Z_{\eta (s)}^{i,N,M}), \mu _{\eta (s)}^{{\varvec{Z}}^{N,M}}) \, \mathrm {d}s + \int _{0}^{t} {\tilde{\sigma }}(G^{-1}(Z_{\eta (s)}^{i,N,M})) \, \mathrm {d}W_s^{i}. \end{aligned}$$
(4.2)

Then, we propose an Euler–Maruyama approximation to \(X_t^{i,N}\), for \(i \in \lbrace 1, \ldots , N \rbrace \) and \(t \in [0,T]\), by

$$\begin{aligned} X_t^{i,N,M}=G^{-1}(Z_{t}^{i,N,M}). \end{aligned}$$
(4.3)
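For concreteness, the scheme (4.1)–(4.3) can be sketched in code. The sketch below uses one hypothetical decomposable example, \(b_1(x) = 1\) for \(x<0\) and \(-1\) for \(x \ge 0\), \(b_2(x,\mu ) = \int y \, \mu (\mathrm {d}y) - x\) and \(\sigma \equiv 1\), together with a polynomial bump \({\bar{\phi }}\) supported on \([-c,c]\) and \(\alpha \) chosen to cancel the drift jump at zero; these choices, and the Newton iteration used for \(G^{-1}\), are illustrative rather than the paper's exact construction.

```python
import numpy as np

# Illustrative sketch of Scheme 1 for a decomposable drift
# b(x, mu) = b1(x) + b2(x, mu). All concrete choices below (b1, b2, sigma,
# the bump phi_bar, the constants alpha and c) are hypothetical examples,
# not the paper's exact construction.
rng = np.random.default_rng(1)

def b1(x):                         # discontinuous at 0; jump b1(0-) - b1(0+) = 2
    return np.where(x < 0, 1.0, -1.0)

def b2(x, particles):              # mean-field interaction, Lipschitz in W_2
    return np.mean(particles) - x

sigma = 1.0                        # constant diffusion, sigma(0) != 0
alpha = (1.0 - (-1.0)) / (2 * sigma**2)   # cancels the drift jump at zero
c = 0.5                            # support radius of the bump; keeps G monotone

def _u(x):
    return np.clip(np.abs(x) / c, 0.0, 1.0)

def phi_bar(x):                    # sign(x) * c^2 * u^2 (1-u)^3, u = |x|/c
    u = _u(x)
    return np.sign(x) * c**2 * u**2 * (1 - u)**3

def phi_bar_p(x):                  # first derivative (an even function)
    u = _u(x)
    return c * u * (1 - u)**2 * (2 - 5*u)

def phi_bar_pp(x):                 # second derivative; phi_bar''(0+-) = +-2
    u = _u(x)
    return np.sign(x) * (2 - 18*u + 36*u**2 - 20*u**3)

def G(x):  return x + alpha * phi_bar(x)
def Gp(x): return 1.0 + alpha * phi_bar_p(x)

def G_inv(z, iters=30):            # Newton iteration; G' stays close to 1
    x = np.array(z, dtype=float)
    for _ in range(iters):
        x = x - (G(x) - z) / Gp(x)
    return x

N, M, T = 100, 200, 1.0
h = T / M
Z = G(rng.normal(size=N))          # Z_0^{i,N,M} = G(X_0^{i,N})
for n in range(M):
    X = G_inv(Z)                   # current particles in original coordinates
    # transformed drift G'(x) b(x, mu) + 0.5 sigma^2 G''(x), with G'' = alpha phi_bar''
    drift = Gp(X) * (b1(X) + b2(X, X)) + 0.5 * sigma**2 * alpha * phi_bar_pp(X)
    Z = Z + drift * h + Gp(X) * sigma * rng.normal(scale=np.sqrt(h), size=N)
X_T = G_inv(Z)                     # X^{i,N,M} = G^{-1}(Z^{i,N,M}), cf. (4.3)
```

Since \(G'\) here stays uniformly close to one, a few Newton steps suffice for \(G^{-1}\); bisection is a robust alternative when less is known about \(G'\).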

The convergence of this algorithm is proven in the following theorem:

Theorem 4.1

Let Assumption (H.1) be satisfied, let \(\xi \in L_p^{0}({\mathbb {R}})\) for some \(p > 4\) and assume \(c < 1/|\alpha |\). For \(i \in \lbrace 1, \ldots , N \rbrace \), let \((X_t^{i})_{t \in [0,T]}\) be the unique strong solution of (3.1) driven by the Brownian motion \((W_t^{i})_{t \in [0,T]}\) with initial data \(\xi ^{i}\), and \((X_t^{i,N,M})_{t \in [0,T]}\) be given by (4.2) and (4.3). Then, there exists a constant \(C>0\) (independent of N and M) such that

$$\begin{aligned} \max _{i \in \lbrace 1, \ldots , N \rbrace }{\mathbb {E}}\left[ \sup _{t \in [0,T]} |X_t^{i} - X_t^{i,N,M}|^2 \right] \le C \left( h + N^{-1/2} \right) . \end{aligned}$$

Proof

Note that

$$\begin{aligned} |X_t^{i} - X_t^{i,N,M}|^2 \le 2 |X_t^{i} - X_t^{i,N}|^2 + 2|X_t^{i,N} - X_t^{i,N,M}|^2, \end{aligned}$$

and recall that the dynamics of \(Z_t^{i} = G(X_t^{i})\) satisfies

$$\begin{aligned} \mathrm {d}Z_t^{i} = {\tilde{b}}(Z_t^{i}, {\tilde{\mu }}_t^{Z}) \, \mathrm {d}t + {\tilde{\sigma }}(Z_t^{i}) \, \mathrm {d}W_t^{i}, \end{aligned}$$

where \({\tilde{\mu }}_t^{Z} := {\mathcal {L}}_{G^{-1}(Z^{i}_t)}\). Therefore, we obtain for some constant \(C>0\)

$$\begin{aligned} {\mathbb {E}} \left[ \sup _{t \in [0,T]} |X_t^{i} - X_t^{i,N}|^2 \right]&= {\mathbb {E}} \left[ \sup _{t \in [0,T]} |G^{-1}(Z_t^{i}) - G^{-1}(Z_t^{i,N})|^2 \right] \\&\le L_{G^{-1}}^2 {\mathbb {E}} \left[ \sup _{t \in [0,T]} |Z_t^{i} - Z_t^{i,N}|^2 \right] \le C N^{-1/2}, \end{aligned}$$

where the last inequality can be derived similarly to the propagation of chaos results for equations with Lipschitz continuous coefficients as in, e.g., [13] for \(d=1\) (note that the rate \(N^{-1/2}\) can be improved to \(N^{-1}\) in the case where \(b_2(x,\mu ) = \int _{{\mathbb {R}}} \beta (x,y) \, \mu (\mathrm {d}y)\), with \(\beta \) Lipschitz continuous, see [46]). To apply the aforementioned propagation of chaos result, we require that \(\xi \in L_p^{0}({\mathbb {R}})\) for \(p > 4\) (see also [14, Theorem 5.8]). Furthermore, we have

$$\begin{aligned} {\mathbb {E}} \left[ \sup _{t \in [0,T]} |X_t^{i,N} - X_t^{i,N,M}|^2 \right]&= {\mathbb {E}} \left[ \sup _{t \in [0,T]} |G^{-1}(Z_t^{i,N}) - G^{-1}(Z_t^{i,N,M})|^2 \right] \\&\le L_{G^{-1}}^2 {\mathbb {E}} \left[ \sup _{t \in [0,T]} |Z_t^{i,N} - Z_t^{i,N,M}|^2 \right] \le C h, \end{aligned}$$

since the SDEs for \((Z_t^{i,N})_{t \in [0,T]}\) and \((Z_t^{i,N,M})_{t \in [0,T]}\) have globally Lipschitz continuous coefficients. From these two estimates the claim follows. \(\square \)

4.2 Scheme 2: Euler–Maruyama without transformation (decomposable case)

As G and \(G^{-1}\) may be difficult to construct in multi-dimensional settings, and since the evaluation of the inverse at each time point can be computationally expensive, it would be preferable to discretise the particle system \((X^{i,N})_{i \in \lbrace 1, \ldots , N \rbrace }\) in time directly, without the use of the transformation G. In addition, a drawback of Scheme 1 is that an SDE with an additive diffusion term is transformed into one with multiplicative noise, so that the Euler–Maruyama scheme no longer coincides with the Milstein scheme. We employ an Euler–Maruyama scheme to discretise the particle system (3.2) and compute an approximate solution by

$$\begin{aligned} X_{t}^{i,N,M} = X^{i,N}_0 + \int _{0}^{t} \left( b_1(X_{\eta (s)}^{i,N,M}) + b_2(X_{\eta (s)}^{i,N,M}, \mu _{\eta (s)}^{{\varvec{X}}^{N,M}}) \right) \, \mathrm {d}s + \int _{0}^{t} \sigma (X_{\eta (s)}^{i,N,M}) \, \mathrm {d}W_s^{i}. \end{aligned}$$
(4.4)
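Since no transformation is involved, the scheme (4.4) is straightforward to implement. The following minimal sketch uses hypothetical example coefficients: a drift with a jump at zero, an interaction through the empirical mean, and constant \(\sigma \) (so that \(\sigma (0) \ne 0\)).

```python
import numpy as np

# Minimal sketch of Scheme 2: direct Euler-Maruyama for the particle system,
# no transformation. The coefficients are hypothetical examples: a drift with
# a jump at zero, interaction through the empirical mean, and constant sigma
# (so sigma(0) != 0, as required for non-degeneracy).
rng = np.random.default_rng(2)

def b1(x):
    return np.where(x < 0, 1.0, -1.0)   # discontinuous at 0

def b2(x, particles):
    return np.mean(particles) - x       # empirical-measure dependence

sigma = 1.0
N, M, T = 100, 200, 1.0
h = T / M
X = rng.normal(size=N)                  # i.i.d. initial particles
for n in range(M):
    dW = rng.normal(scale=np.sqrt(h), size=N)
    X = X + (b1(X) + b2(X, X)) * h + sigma * dW   # one step of (4.4)
```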

The following results are concerned with moment stability of the discretised particle system and estimates for the occupation time of the particle system in the neighbourhood of the set of discontinuities.

Moment stability:

We first remark that due to the linear growth of the coefficients in the state component, the Lipschitz continuity in the measure variable and the fact that all particles are identically distributed, we have the following result (see, e.g., [43] for details):

Proposition 4.1

Let Assumption (H.1) be satisfied, and let \(\xi \in L_p^{0}({\mathbb {R}})\) for \(p \ge 2\). Then, there exist constants \(C_1, C_2>0\), such that

$$\begin{aligned} \max _{i \in \lbrace 1, \ldots , N \rbrace } \max _{n \in \lbrace 0, \ldots , M \rbrace } {\mathbb {E}}\left[ |X_{t_n}^{i,N,M} |^p \right] \le C_1, \end{aligned}$$

and for all \(i \in \lbrace 1, \ldots , N \rbrace \) and for all \(t \in [0,T]\),

$$\begin{aligned} {\mathbb {E}} \left[ |X_{t}^{i,N,M} - X_{\eta (t)}^{i,N,M}|^p\right] \le C_2h^{p/2}. \end{aligned}$$
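The second bound reflects that, within one step, the scheme moves by \(b \, h + \sigma \varDelta W\), so the p-th moment of the increment is dominated by \({\mathbb {E}}|\sigma \varDelta W|^p = O(h^{p/2})\). A quick Monte Carlo illustration with hypothetical constant coefficients checks the case \(p=2\), where quartering h should roughly quarter the second moment:

```python
import numpy as np

# Monte Carlo check of the increment bound E|X_t - X_{eta(t)}|^p <= C h^{p/2}
# for p = 2: over one step the scheme moves by b*h + sigma*dW, so the second
# moment is b^2 h^2 + sigma^2 h, i.e. of order h. The constants b_val, sig
# are hypothetical.
rng = np.random.default_rng(3)
b_val, sig, n_paths = -1.0, 1.0, 200_000

def one_step_second_moment(h):
    dW = rng.normal(scale=np.sqrt(h), size=n_paths)
    incr = b_val * h + sig * dW        # X_t - X_{eta(t)} over a full step
    return float(np.mean(incr**2))

m_h  = one_step_second_moment(0.01)
m_h4 = one_step_second_moment(0.0025)
ratio = m_h / m_h4                     # approx. (0.01 / 0.0025)^{2/2} = 4
```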

Occupation time formula for (4.4):

Below, we will show an estimate of the expected occupation time of a fixed particle of the system defined by (4.4) in a neighbourhood of zero.

Proposition 4.2

Let Assumption (H.1) be satisfied and let i be an arbitrary but fixed particle index. Further, let \(\xi \in L_p^{0}({\mathbb {R}})\) for \(p \ge 2\) and let \((X_t^{i,N,M})_{t \in [0,T]}\) be given by (4.4). Then, there exists a constant \(C>0\) such that for all \(N, M \in {\mathbb {N}}\) and all sufficiently small \(\varepsilon >0\)

$$\begin{aligned} \int _{0}^{T} {\mathbb {P}} \left( \lbrace {\varvec{X}}_{t}^{N,M} \in \varTheta ^{i,\varepsilon } \rbrace \right) \, \mathrm {d}t \le C \varepsilon , \end{aligned}$$

where \( {\varvec{X}}_{t}^{N,M} = (X_{t}^{1,N,M},\ldots ,X_{t}^{N,N,M})^{\top } \in {\mathbb {R}}^{N}\) and \(\varTheta ^{i,\varepsilon }\) is given by

$$\begin{aligned} \varTheta ^{i,\varepsilon }:= \lbrace {\varvec{x}}^{N} =(x_1, \ldots , x_N)^{\top } \in {\mathbb {R}}^N: \ \exists {\varvec{y}}^N \in \varTheta ^i \text { with } |{\varvec{x}}^N-{\varvec{y}}^N| < \varepsilon \rbrace , \end{aligned}$$

with \(\varTheta ^i:=\{{\varvec{x}}^{N}=(x_1,\ldots ,x_N)^{\top } \in {\mathbb {R}}^{N} \ :\ x_i=0\}\).

Proof

We aim to apply [41, Theorem 2.7], which states the following: Let \((X_t)_{t \in [0,T]}\) be an \({\mathbb {R}}^d\)-valued Itô process

$$\begin{aligned} X_T = X_0 + \int _{0}^{T} A_t \, \mathrm {d}t + \int _{0}^{T} B_t \, \mathrm {d}W_t, \end{aligned}$$

with progressively measurable processes \(A=(A_t)_{t \in [0,T]}\) and \(B=(B_t)_{t \in [0,T]}\), where A is \({\mathbb {R}}^d\)-valued and B is \({\mathbb {R}}^{d \times d}\)-valued. The set of discontinuities, \(\varTheta \), is assumed to be a \({\mathcal {C}}^3\) hypersurface of positive reach. Namely, there exists \( \varepsilon > 0\) such that \(p(x)= {{\,\mathrm{arg\,min}\,}}_{y \in \varTheta } |x-y|\) is a single valued function of class \({\mathcal {C}}^3\) on the tubular neighbourhood \(\varTheta ^{\varepsilon }:= \lbrace x \in {\mathbb {R}}^{d}: \ \inf _{y \in \varTheta } |x-y| < \varepsilon \rbrace \) (see Definition 2.4 in [41] for details). Then, there exists a constant \(C>0\), such that for all sufficiently small \(\varepsilon >0\)

$$\begin{aligned} \int _{0}^{T} {\mathbb {P}} \left( \lbrace X_t \in \varTheta ^{\varepsilon } \rbrace \right) \, \mathrm {d}t \le C \varepsilon , \end{aligned}$$

assuming, additionally, that

  1. the processes A and B are almost surely bounded by a constant C if \(X_t(\omega )\) is in a small neighbourhood of \(\varTheta \), and

  2. there exists a constant \(C>0\) such that for almost all \(\omega \in \varOmega \), we have: If, for any \(t \in [0,T]\), \(X_t(\omega )\) is in a small neighbourhood of \(\varTheta \) then

    $$\begin{aligned} n^{\top }\left( p\left( X_t(\omega ) \right) \right) B_t^{\top }(\omega )B_t(\omega ) n\left( p\left( X_t(\omega ) \right) \right) \ge C, \end{aligned}$$

    where n(x) has length one and is orthogonal to the tangent space of \(\varTheta \) in x.

We now return to our particular model problem. First, we remark that \(\varTheta ^{i}\) satisfies all regularity conditions of [41, Theorem 2.7], i.e., it is a \({\mathcal {C}}^{3}\) hypersurface of positive reach. We then observe that the N-dimensional particle system can be rewritten as

$$\begin{aligned} \mathrm {d}{\varvec{X}}_{t}^{N,M} = {\varvec{B}}_N({\varvec{X}}_{\eta (t)}^{N,M}) \, \mathrm {d}t + \varvec{\varSigma }_N({\varvec{X}}_{\eta (t)}^{N,M}) \, \mathrm {d}{\varvec{W}}^{N}_t, \end{aligned}$$

where \({\varvec{W}}^{N}_t= (W^{1}_t, \ldots , W^{N}_t)^{\top }\) and \({\varvec{B}}_N: {\mathbb {R}}^N \rightarrow {\mathbb {R}}^N\) and \(\varvec{\varSigma }_N: {\mathbb {R}}^N \rightarrow {\mathbb {R}}^{N \times N}\) are defined by

$$\begin{aligned} {\varvec{B}}_N({\varvec{x}}^{N})&= (b(x_1,\mu ^{{\varvec{x}}^{N}}),\ldots ,b(x_N,\mu ^{{\varvec{x}}^{N}}))^{\top }, \\ \varvec{\varSigma }_N({\varvec{x}}^N)&= \text{ diag}_{N \times N}(\sigma (x_1),\ldots ,\sigma (x_N)). \end{aligned}$$

Further, we observe that there is a constant \(C>0\) such that: If, for any \(t \in [0,T]\) and \(\omega \in \varOmega \), \({\varvec{X}}_{t}^{N,M}(\omega )\) is in a small neighbourhood of \(\varTheta ^{i}\) then

$$\begin{aligned} n^{\top }\left( p\left( {\varvec{X}}_{t}^{N,M}(\omega ) \right) \right) \varvec{\varSigma }_N^{\top }({\varvec{X}}_{t}^{N,M}(\omega ))\varvec{\varSigma }_N({\varvec{X}}_{t}^{N,M}(\omega )) n\left( p\left( {\varvec{X}}_{t}^{N,M}(\omega ) \right) \right) \ge C, \end{aligned}$$

as \(\sigma \) is continuous and \(\sigma (0) \ne 0\). Also, note that a normal vector of the tangent space of \(\varTheta ^{i}\) is \(e_i\), i.e., the i-th unit vector. Further, a close inspection of the proof of [41, Theorem 2.7], shows that the boundedness assumption on the coefficients in a neighbourhood of \(\varTheta ^{i}\) is not needed in our case, due to the moment bound of an individual particle established in Proposition 4.1. Hence,

$$\begin{aligned} \int _{0}^{T} {\mathbb {P}} \left( \lbrace {\varvec{X}}_{t}^{N,M} \in \varTheta ^{i,\varepsilon } \rbrace \right) \, \mathrm {d}t \le C \varepsilon , \end{aligned}$$

where the constant \(C>0\) is independent of N, due to the fact that the normal vector is the i-th unit vector. \(\square \)
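The linear dependence on \(\varepsilon \) in Proposition 4.2 can also be observed numerically. The following Monte Carlo sketch, with hypothetical coefficients of the type covered by the assumptions (jump drift, mean interaction, unit diffusion), estimates the expected occupation time of the particles near zero for two values of \(\varepsilon \) and checks that it roughly halves when \(\varepsilon \) is halved:

```python
import numpy as np

# Monte Carlo illustration of the occupation-time bound: the expected time a
# particle of the Euler-Maruyama system spends in (-eps, eps) scales linearly
# in eps. The coefficients below are hypothetical examples (jump drift plus
# mean interaction, unit diffusion).
rng = np.random.default_rng(4)

def b(x, particles):
    return np.where(x < 0, 1.0, -1.0) + (np.mean(particles) - x)

N, M, T, reps = 50, 100, 1.0, 200
h = T / M

def occupation(eps):
    total = 0.0
    for _ in range(reps):
        X = rng.normal(size=N)
        for n in range(M):
            # average over the (exchangeable) particles, weighted by h
            total += h * float(np.mean(np.abs(X) < eps))
            X = X + b(X, X) * h + rng.normal(scale=np.sqrt(h), size=N)
    return total / reps

occ1, occ2 = occupation(0.2), occupation(0.1)
ratio = occ1 / occ2                 # close to 2 if occupation time is O(eps)
```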

Auxiliary proposition:

Based on the occupation time estimate from Proposition 4.2, we will prove the following result, which is needed in the proof of Theorem 4.2 given below.

Proposition 4.3

Let Assumption (H.1) be satisfied, and let \(\xi \in L_p^{0}({\mathbb {R}})\) for \(p \ge 8\). Furthermore, let \((X_t^{i,N,M})_{t \in [0,T]}\) be given by (4.4). Then, there exists a constant \(C>0\) (independent of N and M) such that for any \(t \in [0,T]\), we have

$$\begin{aligned} \max _{i \in \lbrace 1, \ldots , N \rbrace } {\mathbb {E}}\left[ \left| \int _{0}^{t} \left( G''(X_s^{i,N,M}) - G''(X_{\eta (s)}^{i,N,M}) \right) \sigma ^2(X_{\eta (s)}^{i,N,M}) \, \mathrm {d}s \right| ^2 \right] \le C h^{2/9}. \end{aligned}$$

Proof

First, observe that the linear growth of \(\sigma \) and the piecewise Lipschitz continuity of \(G''\) imply that there exists a constant \(C>0\) such that

$$\begin{aligned}&\left| \left( G''(X_s^{i,N,M}) - G''(X_{\eta (s)}^{i,N,M}) \right) \sigma ^2(X_{\eta (s)}^{i,N,M}) \right| \\&\quad \le {\left\{ \begin{array}{ll} C \left( 1+ (X_{\eta (s)}^{i,N,M})^2 \right) |X_s^{i,N,M} - X_{\eta (s)}^{i,N,M}|, \ X_s^{i,N,M} \notin (-\varepsilon ,\varepsilon ), |X_s^{i,N,M} - X_{\eta (s)}^{i,N,M}| < \varepsilon , \\ C \left( 1+ (X_{\eta (s)}^{i,N,M})^2 \right) , \text{ otherwise }, \end{array}\right. } \end{aligned}$$

where \(\varepsilon >0\) will be specified later. With this at hand, we derive

$$\begin{aligned}&{\mathbb {E}}\left[ \left| \int _{0}^{t} \left( G''(X_s^{i,N,M}) - G''(X_{\eta (s)}^{i,N,M}) \right) \sigma ^2(X_{\eta (s)}^{i,N,M}) \, \mathrm {d}s \right| ^2 \right] \\&\quad \le C \int _{0}^{t} {\mathbb {E}} \left[ \left| \left( G''(X_s^{i,N,M}) - G''(X_{\eta (s)}^{i,N,M}) \right) \sigma ^2(X_{\eta (s)}^{i,N,M}) \right| ^2 \right] \, \mathrm {d}s \\&\quad \le C \Bigg ( \int _{0}^{t} {\mathbb {E}} \left[ \left| \left( G''(X_s^{i,N,M}) - G''(X_{\eta (s)}^{i,N,M}) \right) \sigma ^2(X_{\eta (s)}^{i,N,M}) \right| ^2 \right. \\&\qquad \qquad \times \left. \left( \mathrm {I}_{\lbrace X_s^{i,N,M} \notin (-\varepsilon ,\varepsilon ) \rbrace } \mathrm {I}_{\lbrace |X_s^{i,N,M} - X_{\eta (s)}^{i,N,M}|< \varepsilon \rbrace } \right) \right] \, \mathrm {d}s \\&\qquad + \int _{0}^{t} {\mathbb {E}} \Big [ \left| \left( G''(X_s^{i,N,M}) - G''(X_{\eta (s)}^{i,N,M}) \right) \sigma ^2(X_{\eta (s)}^{i,N,M}) \right| ^2 \\&\qquad \qquad \times \left( \mathrm {I}_{\lbrace X_s^{i,N,M} \in (-\varepsilon ,\varepsilon ) \rbrace } + \mathrm {I}_{\lbrace X_s^{i,N,M} \notin (-\varepsilon ,\varepsilon ) \rbrace } \mathrm {I}_{\lbrace |X_s^{i,N,M} - X_{\eta (s)}^{i,N,M}| \ge \varepsilon \rbrace } \right) \Big ] \, \mathrm {d}s \Bigg ) \\&\quad \le C \Bigg ( \int _{0}^{t} {\mathbb {E}} \Bigg [ \left( 1+ (X_{\eta (s)}^{i,N,M})^4 \right) |X_s^{i,N,M} - X_{\eta (s)}^{i,N,M}|^2\\&\quad \qquad \times \left( \mathrm {I}_{\lbrace X_s^{i,N,M} \notin (-\varepsilon ,\varepsilon ) \rbrace } \mathrm {I}_{\lbrace |X_s^{i,N,M} - X_{\eta (s)}^{i,N,M}| < \varepsilon \rbrace } \right) \Bigg ] \, \mathrm {d}s + \int _{0}^{t} {\mathbb {E}} \Bigg [ \left( 1+ (X_{\eta (s)}^{i,N,M})^4 \right) \\&\quad \qquad \qquad \times \left( \mathrm {I}_{\lbrace X_s^{i,N,M} \in (-\varepsilon ,\varepsilon ) \rbrace } + \mathrm {I}_{\lbrace X_s^{i,N,M} \notin (-\varepsilon ,\varepsilon ) \rbrace } \mathrm {I}_{\lbrace |X_s^{i,N,M} - X_{\eta (s)}^{i,N,M}| \ge \varepsilon \rbrace } \right) \Bigg ] \, \mathrm {d}s \Bigg ) \\&\quad \le C \left( \varepsilon ^2 + \varepsilon ^{1/2} + \int _{0}^{t} \left( {\mathbb {P}}(|X_s^{i,N,M} - X_{\eta (s)}^{i,N,M}| \ge \varepsilon ) \right) ^{1/2} \, \mathrm {d}s \right) , \end{aligned}$$

where we used Hölder’s inequality and Propositions 4.1 and 4.2 in the last display. Markov’s inequality along with Proposition 4.1 implies that there exists a constant \(C>0\) such that

$$\begin{aligned} \left( {\mathbb {P}}(|X_s^{i,N,M} - X_{\eta (s)}^{i,N,M}| \ge \varepsilon ) \right) ^{1/2} \le \frac{\left( {\mathbb {E}} \left[ \left| X_s^{i,N,M} - X_{\eta (s)}^{i,N,M} \right| ^8 \right] \right) ^{1/2}}{\varepsilon ^4} \le \frac{C h^2}{\varepsilon ^4}. \end{aligned}$$

Choosing \(\varepsilon = h^{4/9}\) gives the result. \(\square \)
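The choice \(\varepsilon = h^{4/9}\) balances the competing terms in the final bound: with it, \(\varepsilon ^{1/2}\) and \(h^2/\varepsilon ^4\) are both of order \(h^{2/9}\), while \(\varepsilon ^2 = h^{8/9}\) is of higher order. A short arithmetic check:

```python
# Arithmetic check of the exponent balance: with eps = h^{4/9}, the terms
# eps^2, eps^{1/2} and h^2 / eps^4 have orders h^{8/9}, h^{2/9} and h^{2/9},
# so the overall bound is O(h^{2/9}).
h = 1e-6
eps = h ** (4 / 9)
terms = (eps ** 2, eps ** 0.5, h ** 2 / eps ** 4)
dominant = h ** (2 / 9)   # the two dominant terms coincide at this order
```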

We are now ready to present our main convergence result. In this case, we only obtain a strong convergence rate of order 1/9:

Theorem 4.2

Let Assumption (H.1) be satisfied, let \(\xi \in L_p^{0}({\mathbb {R}})\) for some \(p \ge 8\) and assume \(c < 1/|\alpha |\). Furthermore, let \((X_t^{i})_{t \in [0,T]}\) be the unique strong solution of (3.1) driven by the Brownian motion \((W_t^{i})_{t \in [0,T]}\) with initial data \(\xi ^{i}\), and \((X_t^{i,N,M})_{t \in [0,T]}\) given by (4.4). Then, there exists a constant \(C>0\) (independent of N and M) such that

$$\begin{aligned} \max _{i \in \lbrace 1, \ldots , N \rbrace } {\mathbb {E}}\left[ \sup _{t \in [0,T]} |X_t^{i} - X_t^{i,N,M}|^2 \right] \le C \left( h^{2/9} + N^{-1/2} \right) . \end{aligned}$$

Proof

Note that

$$\begin{aligned} {\mathbb {E}}\left[ \sup _{t \in [0,T]} |X_t^{i} - X_t^{i,N} |^2 \right]&\le L_{G^{-1}}^2 {\mathbb {E}} \left[ \sup _{t \in [0,T]} |G( X_t^{i}) - G(X_t^{i,N})|^2 \right] \nonumber \\&\le C N^{-1/2}, \end{aligned}$$
(4.5)

where in the last display, we used the pathwise propagation of chaos result as in the previous subsection. Further, we have

$$\begin{aligned}&{\mathbb {E}} \left[ \sup _{t \in [0,T]} |X_t^{i,N}-X_t^{i,N,M}|^2 \right] \le L_{G^{-1}}^2 {\mathbb {E}} \left[ \sup _{t \in [0,T]} |Z_t^{i,N} - G(X_t^{i,N,M})|^2 \right] \\&\quad \le C \left( {\mathbb {E}} \left[ \sup _{t \in [0,T]} |Z_t^{i,N}-Z_t^{i,N,M}|^2 \right] + {\mathbb {E}} \left[ \sup _{t \in [0,T]} |Z_t^{i,N,M} - G(X_t^{i,N,M})|^2 \right] \right) \\&\quad \le C \left( h + {\mathbb {E}} \left[ \sup _{t \in [0,T]} |Z_t^{i,N,M} - G(X_t^{i,N,M})|^2 \right] \right) , \end{aligned}$$

where in the last estimate, we employed standard strong convergence results for the Euler–Maruyama scheme applied to SDEs with globally Lipschitz continuous coefficients. Following similar arguments to [41] or [49], one further obtains, by applying Itô’s formula to \(G(X_t^{i,N,M})\),

$$\begin{aligned}&{\mathbb {E}} \left[ \sup _{t \in [0,T]} |Z_t^{i,N,M} - G(X_t^{i,N,M})|^2 \right] \\&\quad \le C\Bigg (h + \int _{0}^{T} {\mathbb {E}}\left[ \left| \left( G''(X_s^{i,N,M}) - G''(X_{\eta (s)}^{i,N,M}) \right) \sigma ^2(X_{\eta (s)}^{i,N,M}) \right| ^2 \right] \, \mathrm {d}s \\&\qquad + \int _{0}^{T} {\mathbb {E}} \left[ \sup _{s \in [0,t]} |Z_s^{i,N,M} - G(X_s^{i,N,M})|^2 \right] \, \mathrm {d}t \Bigg ), \end{aligned}$$

where the second summand on the right side is of order \(h^{2/9}\) due to Proposition 4.3. Hence, Gronwall’s inequality yields

$$\begin{aligned} {\mathbb {E}} \left[ \sup _{t \in [0,T]} |X_t^{i,N}-X_t^{i,N,M}|^2 \right] \le Ch^{2/9}, \end{aligned}$$
(4.6)

and the claim follows by combining (4.5) and (4.6). \(\square \)

Remark 4.1

The convergence rate in terms of the number of particles in the above theorem can again be improved to 1/2 if the drift has the form (1.2), see [46].

The convergence rate in terms of the number of time-steps established in Theorem 4.2 could be improved by employing exponential tail estimate techniques, as in [41]. The resulting strong convergence rate would be \(1/4-\varepsilon \), for an arbitrarily small \(\varepsilon >0\). However, to achieve this, one would need to assume boundedness of the coefficients in Eq. (3.1). Another possibility to recover a better convergence rate in our setting would be to require that the initial data satisfies \(\xi \in L_p^{0}({\mathbb {R}})\) for all \(p \ge 2\) and that \(\sigma \) is uniformly bounded. This would enable us to obtain sharper estimates when employing Markov’s inequality in the proof of Proposition 4.3. If we assume moment boundedness of the initial data of all orders, but allow \(\sigma \) to grow linearly, we would obtain a rate of order \(1/8-\varepsilon \).

Moreover, although we expect the optimal convergence rate of the Euler–Maruyama scheme applied to the interacting particle system to be 1/2 (as for equations with Lipschitz coefficients), we only achieve the order 1/9 (or \(1/4-\varepsilon \) under stronger assumptions on the initial data or the coefficients of the underlying McKean–Vlasov SDE). The bottleneck is the estimate of the probability that \(X_{\eta (s)}^{i,N,M}\) and \(X_s^{i,N,M}\) have different signs, i.e., the term \(|G''(X_s^{i,N,M}) - G''(X_{\eta (s)}^{i,N,M})|\) in the proof of Proposition 4.3 does not allow a Lipschitz-type estimate. Refined estimates of this expected sign change, as derived in [49] for one-dimensional SDEs, are not easy to prove for an interacting particle system. In particular, the proof of the main result in [49] is not applicable in our setting, as an individual particle (seen as a one-dimensional equation) is not Markovian (due to the dependence on the interaction terms), a property which is key in [49].

4.3 Scheme 1 for the non-decomposable case

Here, we first prove a propagation of chaos result in the case of non-decomposable drifts in Lemma 4.1. The time-discretisation error is then given in Theorem 4.3.

Lemma 4.1

Let Assumption (H.3) hold, let \(\xi \in L_p^{0}({\mathbb {R}})\) for \(p > 4\) and assume that \(b: {\mathbb {R}} \times {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\) is uniformly bounded. Let \((X_t^{i})_{t \in [0,T]}\) be the unique strong solution of (3.3) driven by the Brownian motion \((W_t^{i})_{t \in [0,T]}\) with initial data \(\xi ^{i}\), and let \((X_t^{i,N})_{t \in [0,T]}\) be the solution to the associated particle system. Then, there exists a constant \(C>0\) (independent of N) such that

$$\begin{aligned} \max _{i \in \lbrace 1, \ldots , N \rbrace } \sup _{t \in [0,T]} {\mathbb {E}}\left[ |X_t^{i} - X_t^{i,N}|^2 \right] \le C N^{-1/2}. \end{aligned}$$

Proof

First, we observe, using the definitions \(\mu _t = {\mathcal {L}}_{X_t}\), \(\mu ^{N}_t(\mathrm {d}x) = \frac{1}{N} \sum _{j=1}^{N} \delta _{X_t^{j}}(\mathrm {d}x)\), and \(\mu _t^{{\varvec{X}}^{N},N-1}(\mathrm {d}x) = \frac{1}{N-1} \sum _{j \ne i} \delta _{X_t^{j,N}}(\mathrm {d}x)\), that

$$\begin{aligned}&\sup _{t \in [0,T]} {\mathbb {E}} \left[ |X_t^{i} - X_t^{i,N}|^2 \right] \nonumber \\&\quad = \sup _{t \in [0,T]} {\mathbb {E}} \left[ |G^{-1}(G(X^{i}_t,\mu _t),\mu _t) - G^{-1}(G(X_t^{i,N},\mu _t^{{\mathbf {X}}^{N},N-1}),\mu _t^{{\mathbf {X}}^{N},N-1})|^2 \right] \nonumber \\&\quad \le C \Bigg (\sup _{t \in [0,T]} {\mathbb {E}} \left[ |G(X^{i}_t,\mu _t)- G(X_t^{i,N},\mu _t^{{\mathbf {X}}^{N},N-1})|^2 \right] \nonumber \\&\qquad + \sup _{t \in [0,T]} L(c)\left( {\mathcal {W}}^{2}_2(\mu _t,\mu ^{N}_t) + {\mathcal {W}}^{2}_2(\mu ^{N}_t,\mu _t^{{\mathbf {X}}^{N},N-1}) \right) \Bigg ), \end{aligned}$$
(4.7)

where \(L(c) \rightarrow 0\) as \(c \rightarrow 0\) (see Proposition 3.3). Furthermore, [14, Theorem 5.8] implies that \({\mathcal {W}}^{2}_2(\mu _t,\mu ^{N}_t) \le CN^{-1/2}\) and, in addition, by the triangle inequality, we deduce

$$\begin{aligned} L(c) {\mathcal {W}}^{2}_2(\mu ^{N}_t,\mu _t^{{\mathbf {X}}^{N},N-1})&\le 2L(c) \left( \sup _{t \in [0,T]} {\mathbb {E}} \left[ |X_t^{i} - X_t^{i,N}|^2 \right] + {\mathcal {W}}^{2}_2(\mu _t^{{\mathbf {X}}^{N}},\mu _t^{{\mathbf {X}}^{N},N-1}) \right) \nonumber \\&\le 2L(c) \sup _{t \in [0,T]} {\mathbb {E}} \left[ |X_t^{i} - X_t^{i,N}|^2 \right] + CN^{-1/2}, \end{aligned}$$
(4.8)

where we used \({\mathcal {W}}^{2}_2(\mu _t^{{\varvec{X}}^{N}},\mu _t^{{\varvec{X}}^{N},N-1}) \le C N^{-1}\), which follows from [60, Lemma 3.1].

To summarise, taking (4.7) and (4.8) into account, we obtain

$$\begin{aligned} \sup _{t \in [0,T]} {\mathbb {E}} \left[ |X_t^{i} - X_t^{i,N}|^2 \right]&\le C \Bigg ( \sup _{t \in [0,T]} {\mathbb {E}} \left[ |G(X^{i}_t,\mu _t)- G(X_t^{i,N},\mu _t^{{\mathbf {X}}^{N}})|^2 \right] \\&\quad + \sup _{t \in [0,T]} {\mathbb {E}} \left[ |G(X_t^{i,N},\mu _t^{{\mathbf {X}}^{N}})- G(X_t^{i,N},\mu _t^{{\mathbf {X}}^{N},N-1})|^2 \right] + N^{-1/2} \Bigg ). \end{aligned}$$

A similar analysis to the above can be employed to handle the second term; hence, we arrive at

$$\begin{aligned} \sup _{t \in [0,T]} {\mathbb {E}} \left[ |X_t^{i} - X_t^{i,N}|^2 \right]&\le C \left( \sup _{t \in [0,T]} {\mathbb {E}} \left[ |G(X^{i}_t,\mu _t)- G(X_t^{i,N},\mu _t^{{\mathbf {X}}^{N}})|^2 \right] + N^{-1/2} \right) . \end{aligned}$$
(4.9)

To further estimate (4.9), we apply Itô’s formula to derive

$$\begin{aligned}&G(X^{i}_t,\mu _t)- G(X_t^{i,N},\mu _t^{{\mathbf {X}}^{N}}) \\&\quad = G(X^{i}_0,\mu _0)- G(X_0^{i,N},\mu _0^{{\mathbf {X}}^{N}}) \\&\qquad + \int _{0}^{t} \left( b(X^{i}_s,\mu _s) + \alpha ({\mu _s}) {\bar{\phi }}'(X^{i}_s) b(X^{i}_s,\mu _s) + \frac{1}{2} \alpha ({\mu _s}) {\bar{\phi }}''(X^{i}_s) \sigma ^2(X^{i}_s) \right) \, \mathrm {d}s \\&\qquad - \int _{0}^{t} \Bigg (b(X^{i,N}_s,\mu _s^{{\mathbf {X}}^{N}}) + \alpha (\mu _s^{{\mathbf {X}}^{N}}) {\bar{\phi }}'(X^{i,N}_s) b(X^{i,N}_s,\mu _s^{{\mathbf {X}}^{N}}) \\& + \frac{1}{2} \alpha (\mu _s^{{\mathbf {X}}^{N}}) {\bar{\phi }}''(X^{i,N}_s) \sigma ^2(X^{i,N}_s) \Bigg ) \, \mathrm {d}s \\&\qquad + \int _{0}^{t} \left( \sigma (X^{i}_s) + \alpha ({\mu _s}) {\bar{\phi }}'(X^{i}_s)\sigma (X^{i}_s) - \sigma (X^{i,N}_s) - \alpha (\mu _s^{{\mathbf {X}}^{N}}) {\bar{\phi }}'(X^{i,N}_s)\sigma (X^{i,N}_s) \right) \, \mathrm {d}W^{i}_s \\&\qquad + \int _{0}^{t} \Big ( \partial _s G(X^{i}_s,\mu _s) - \frac{1}{2N} \sum _{k =1}^{N} \partial _y \partial _{\mu }G(X_s^{i,N},\mu _s^{{\mathbf {X}}^{N}})(X_s^{k,N}) \sigma ^2(X^{k,N}_s) \\& - \frac{1}{N} \sum _{k =1}^{N} \partial _{\mu } G(X_s^{i,N},\mu _s^{{\mathbf {X}}^{N}})(X_s^{k,N}) b(X^{k,N}_s,\mu _s^{{\mathbf {X}}^{N}}) \Big ) \, \mathrm {d}s \\&\qquad - \int _{0}^{t} \frac{1}{N} \sum _{k =1}^{N} \partial _{\mu } G(X_s^{i,N},\mu _s^{{\mathbf {X}}^{N}})(X_s^{k,N}) \sigma (X^{k,N}_s) \, \mathrm {d}W_s^{k} \\&\qquad - \int _{0}^{t} \frac{1}{2N^{2}} \sum _{k=1}^{N}\partial ^2_{\mu }G(X_s^{i,N},\mu _s^{{\mathbf {X}}^{N}})(X_s^{k,N},X_s^{k,N}) \sigma ^2(X_s^{k,N}) \, \mathrm {d}s \\&\qquad - \int _{0}^{t} \frac{1}{N} \partial _{x_i} \partial _{\mu }G(X_s^{i,N},\mu _s^{{\mathbf {X}}^{N}})(X_s^{i,N})\sigma ^2(X_s^{i,N}) \, \mathrm {d}s =: \sum _{j=1}^{7} \varPi _j. \nonumber \end{aligned}$$

Due to (H.3(3)), it is clear that \({\mathbb {E}}[|\varPi _6|^2] + {\mathbb {E}}[|\varPi _7|^2] \le CN^{-1}\). For the term \(\varPi _5\), we derive, using BDG’s inequality,

$$\begin{aligned}&\mathbb {E}[|\varPi _5|^2] \le \frac{1}{N}\mathbb {E}\Bigg [\sum _{k=1}^{N} C \int _{0}^{t}\Big ( \partial _{\mu } G(X_s^{i,N},\mu _s^{\varvec{X}^{N}})(X_s^{k,N}) \sigma (X^{k,N}_s) \\&\qquad - \partial _{\mu }\alpha (\mu _s)(X_s^{k})\bar{\phi }(X^{k}_s) \sigma (X^{k}_s) \Big )^2 \, \mathrm {d}s \\&\quad +\sum _{k,l=1}^{N} \int _{0}^{t}\partial _{\mu }\alpha (\mu _s)(X_s^{k})\bar{\phi }(X^{k}_s) \sigma (X^{k}_s) \, \mathrm {d}W_s^{k} \int _{0}^{t} \partial _{\mu }\alpha (\mu _s)(X_s^{l})\bar{\phi }(X^{l}_s) \sigma (X^{l}_s) \, \mathrm {d}W_s^{l} \Bigg ]. \end{aligned}$$

Therefore, taking into account the Lipschitz continuity of \((x,\mu ) \mapsto \partial _{\mu }\alpha (\mu )(x) {\bar{\phi }}(x) \sigma (x)\) and the independence of \((X_t^{k})_{k \in \lbrace 1, \ldots , N \rbrace }\) for \(t \in [0,T]\), and using the triangle inequality \({\mathcal {W}}_2(\mu _s^{{\varvec{X}}^{N}},\mu _s) \le {\mathcal {W}}_2(\mu _s^{{\varvec{X}}^{N}},\mu ^{N}_s) + {\mathcal {W}}_2(\mu _s^N,\mu _s)\) along with [14, Theorem 5.8], we obtain

$$\begin{aligned} {\mathbb {E}}[|\varPi _5|^2]&\le C \left( \int _{0}^{T} {\mathbb {E}} \left[ |X_t^{i} - X_t^{i,N}|^2 \right] \, \mathrm {d}t + N^{-1/2} \right) . \end{aligned}$$

Arguing as in Lemma 3.1, combined with BDG’s inequality and Hölder’s inequality, we deduce

$$\begin{aligned}&{\mathbb {E}}[|\varPi _1|^2]+ {\mathbb {E}}[|\varPi _2|^2]+ {\mathbb {E}}[|\varPi _3|^2] + {\mathbb {E}}[|\varPi _4|^2] \\&\quad \le C \left( \int _{0}^{T} {\mathbb {E}} \left[ |X_t^{i} - X_t^{i,N}|^2 \right] \, \mathrm {d}t + N^{-1/2} \right) . \end{aligned}$$

Therefore, inserting the estimates for \({\mathbb {E}}[|\varPi _1|^2], \ldots , {\mathbb {E}}[|\varPi _7|^2]\) back into (4.9), we deduce the claim using Gronwall’s inequality. \(\square \)

The following hybrid explicit-implicit time-stepping algorithm computes a discrete-time approximation of \((X_t^{i,N})_{t \in [0,T]}\), denoted by \(X_{t_n}^{i,N,M}\) for \(n \in \lbrace 0, \ldots , M \rbrace \) and \(i \in \lbrace 1, \ldots , N \rbrace \):

  • Set \({\tilde{X}}^{i,N,M}_{t_0} = G(X^{i,N}_0,\mu _0^{{\varvec{X}}^{N}})\) and \(X^{i,N,M}_{t_0} = X^{i,N}_0= \xi ^{i}\).

  • For \(n \ge 1\), compute

    $$\begin{aligned} {\tilde{X}}^{i,N,M}_{t_n} &= {\tilde{X}}^{i,N,M}_{t_{n-1}} + B_i({\tilde{X}}^{1,N,M}_{t_{n-1}}, \ldots , {\tilde{X}}^{N,N,M}_{t_{n-1}}) h \\ &\quad + \sum _{j=1}^{N}\varSigma ^{i,j}({\tilde{X}}^{1,N,M}_{t_{n-1}}, \ldots , {\tilde{X}}^{N,N,M}_{t_{n-1}}) \varDelta W^{j}_n, \end{aligned}$$

    where \(B_i\) and \(\varSigma ^{i,j}\) are defined by (3.16) and (3.17), respectively.

  • Find \(X^{i,N,M}_{t_n}\) such that \(X^{i,N,M}_{t_n} = G^{-1}({\tilde{X}}^{i,N,M}_{t_n},\mu _{t_n}^{{\varvec{X}}^{N,M}})\), with \(\mu _{t_n}^{{\varvec{X}}^{N,M}}(\mathrm {d}x) = \frac{1}{N} \sum _{j=1}^{N} \delta _{X^{j,N,M}_{t_n}}(\mathrm {d}x)\).
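The loop structure of this algorithm can be sketched in Python. The transform \(G\), the coefficient \(\alpha \), the bump \({\bar{\phi }}\) and the drift below are simplified stand-ins of our own choosing (in particular, the measure-derivative contributions to \(B_i\) and \(\varSigma ^{i,j}\) are dropped), so this illustrates only the explicit step in the transformed variable followed by the fixed-point inversion of the implicit step, not the paper's exact scheme:

```python
import numpy as np

# Toy stand-ins (assumptions, not the paper's construction): alpha(mu) is a
# measure-dependent coefficient and phibar a smooth bounded bump, so that
# G(x, mu) = x + alpha(mu) * phibar(x) is invertible in x for small alpha.
def alpha(X):
    return 0.1 / (1.0 + np.mean(X ** 2))

def phibar(x):
    return np.tanh(x)

def phibar_prime(x):
    return 1.0 - np.tanh(x) ** 2

def G(x, X):
    return x + alpha(X) * phibar(x)

def b(x, X):
    # toy drift: discontinuity at zero plus mean-field interaction
    return np.where(x <= 0, -0.5, 0.5) + np.mean(X) - x

def sigma(x):
    return 0.7 + 0.0 * x

def invert_step(Z, X0, iters=60):
    # Implicit step: solve X = G^{-1}(Z, mu^X), i.e. the fixed point
    # X = Z - alpha(X) * phibar(X), a contraction for small alpha.
    X = X0.copy()
    for _ in range(iters):
        X = Z - alpha(X) * phibar(X)
    return X

def simulate(N=200, M=100, T=1.0, seed=0):
    # Hybrid scheme: explicit Euler-Maruyama step in tilde X = G(X, mu),
    # then recover X by inverting G with the new empirical measure.
    rng = np.random.default_rng(seed)
    h = T / M
    X = rng.normal(size=N)      # xi^i
    Z = G(X, X)                 # tilde X at t_0
    for _ in range(M):
        dW = rng.normal(scale=np.sqrt(h), size=N)
        fac = 1.0 + alpha(X) * phibar_prime(X)   # leading term of the chain rule
        Z = Z + fac * b(X, X) * h + fac * sigma(X) * dW
        X = invert_step(Z, X)
    return X
```

The fixed-point iteration in `invert_step` plays the role of solving \({\varvec{F}}_N = 0\) in Remark 4.2 below; it converges here because the toy \(\alpha \) is uniformly small.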

Remark 4.2

The implicit function theorem applied to the function

$$\begin{aligned} {\varvec{F}}_N({\varvec{x}}^{N},{\varvec{y}}^{N}) = {\varvec{y}}^{N}- \left( G^{-1}(x_1,\mu ^{{\varvec{y}}^{N}}), \ldots , G^{-1}(x_N,\mu ^{{\varvec{y}}^{N}}) \right) ^{\top }, \quad {\varvec{x}}^{N}, {\varvec{y}}^{N} \in {\mathbb {R}}^{N}, \end{aligned}$$

implies that we can express \({\varvec{y}}^{N}\) in terms of \({\varvec{x}}^{N}\). The applicability of the implicit function theorem follows from similar arguments to the ones presented in Lemma 3.2 along with Proposition A.1.

Remark 4.3

We could also define an explicit scheme by setting \(X^{i,N,M}_{t_n} = G^{-1}({\tilde{X}}^{i,N,M}_{t_n},\mu _{t_n}^{\tilde{{\varvec{X}}}^{N,M}})\), with \(\mu _{t_n}^{\tilde{{\varvec{X}}}^{N,M}}(\mathrm {d}x) = \frac{1}{N} \sum _{j=1}^{N} \delta _{{\tilde{X}}^{j,N,M}_{t_n}}(\mathrm {d}x)\). However, to derive a strong convergence rate for the resulting scheme, one has to analyse the quantity \({\mathbb {E}}[|X^{i,N,M}_{t_n}-{\tilde{X}}^{i,N,M}_{t_n}|^2]\). Similar arguments as for Scheme 2 could possibly be used here, but our current analysis does not allow us to derive an optimal convergence rate in h.

Theorem 4.3

Let Assumption (H.3) hold, let \(\xi \in L_p^{0}({\mathbb {R}})\) for \(p > 4\) and assume that \(b: {\mathbb {R}} \times {\mathcal {P}}_2({\mathbb {R}}) \rightarrow {\mathbb {R}}\) is uniformly bounded. Let \((X_t^{i})_{t \in [0,T]}\) be the unique strong solution of (3.3) driven by the Brownian motion \((W_t^{i})_{t \in [0,T]}\) with initial data \(\xi ^{i}\), and \(X_{t_n}^{i,N,M}\) for \(n \in \lbrace 0, \ldots , M\rbrace \) be defined by the above algorithm. Then, there exists a constant \(C>0\) (independent of N and M) such that

$$\begin{aligned} \max _{i \in \lbrace 1, \ldots , N \rbrace } \max _{n \in \lbrace 0, \ldots , M \rbrace } {\mathbb {E}}\left[ |X_{t_n}^{i} - X_{t_n}^{i,N,M}|^2 \right] \le C (N^{-1/2} + h). \end{aligned}$$

Proof

We start with the observation that for any \(n \in \lbrace 0, \ldots , M \rbrace \)

$$\begin{aligned}&|X_{t_n}^{i,N} - X_{t_n}^{i,N,M}|^2 \\&= |G^{-1}(G(X_{t_n}^{i,N},\mu _{t_n}^{{\varvec{X}}^{N},N-1}),\mu _{t_n}^{{\varvec{X}}^{N},N-1})-G^{-1}({\tilde{X}}^{i,N,M}_{t_n},\mu _{t_n}^{{\varvec{X}}^{N,M}})|^2 \\&\le 2|G^{-1}(G(X_{t_n}^{i,N},\mu _{t_n}^{{\varvec{X}}^{N},N-1}),\mu _{t_n}^{{\varvec{X}}^{N},N-1})-G^{-1}(G(X_{t_n}^{i,N},\mu _{t_n}^{{\varvec{X}}^{N}}),\mu _{t_n}^{{\varvec{X}}^{N}})|^2 \\&\quad + 2|G^{-1}(G(X_{t_n}^{i,N},\mu _{t_n}^{{\varvec{X}}^{N}}),\mu _{t_n}^{{\varvec{X}}^{N}})-G^{-1}({\tilde{X}}^{i,N,M}_{t_n},\mu _{t_n}^{{\varvec{X}}^{N,M}})|^2. \end{aligned}$$

Using the arguments from the previous lemma, the first term can be shown to be of order \({\mathcal {O}}(N^{-1})\). For the second term, we derive the estimate

$$\begin{aligned}&|G^{-1}(G(X_{t_n}^{i,N},\mu _{t_n}^{{\varvec{X}}^{N}}),\mu _{t_n}^{{\varvec{X}}^{N}})-G^{-1}({\tilde{X}}^{i,N,M}_{t_n},\mu _{t_n}^{{\varvec{X}}^{N,M}})|^2 \\&\quad \le C \left( |G(X_{t_n}^{i,N},\mu _{t_n}^{{\varvec{X}}^{N}}) - {\tilde{X}}^{i,N,M}_{t_n}|^2 + L(c){\mathcal {W}}_2^{2}(\mu _{t_n}^{{\varvec{X}}^{N}},\mu _{t_n}^{{\varvec{X}}^{N,M}}) \right) , \end{aligned}$$

where \(L(c) \rightarrow 0\) as \(c \rightarrow 0\), as in Proposition 3.3.

Therefore, we get

$$\begin{aligned} {\mathbb {E}} \left[ |X_{t_n}^{i,N} - X_{t_n}^{i,N,M}|^2 \right] \le C \left( N^{-1} + {\mathbb {E}} \left[ |G(X_{t_n}^{i,N},\mu _{t_n}^{{\varvec{X}}^{N}}) - {\tilde{X}}^{i,N,M}_{t_n}|^2 \right] \right) . \end{aligned}$$

Using now the definition of \({\tilde{X}}^{i,N,M}_{t_n}\) and Lemma 3.3, one shows that there exists a constant \(C>0\) such that \({\mathbb {E}} \left[ |G(X_{t_n}^{i,N},\mu _{t_n}^{{\varvec{X}}^{N}}) - {\tilde{X}}^{i,N,M}_{t_n}|^2 \right] \le Ch\). This together with Lemma 4.1 yields the claim. \(\square \)

5 Numerical illustration

In the following, we will present two examples of McKean–Vlasov SDEs and interacting particle systems exhibiting discontinuous drifts in order to motivate the theoretical study of such equations and to numerically illustrate the strong convergence behaviour of an Euler–Maruyama scheme. These models find applications in biology and mathematical finance, in particular systemic risk.

As we do not know the exact solution of the considered equations, the convergence rate (in terms of the number of time-steps) is determined by comparing two solutions computed on a fine and a coarse grid, respectively, driven by the same Brownian paths. In order to illustrate the strong convergence behaviour in the uniform time-step h, we compute the root-mean-square error (RMSE) by comparing the numerical solution at level l of the time discretisation with the solution at level \(l-1\) at the final time \(T=1\). To be precise, as error measure we use the quantity

$$\begin{aligned} \text {RMSE}:= \sqrt{\frac{1}{N} \sum _{i=1}^{N} \left( X_T^{i,N,M_l} - X_T^{i,N,M_{l-1}} \right) ^2}, \end{aligned}$$

where \(M_l = 2^lT\) and by \(X_T^{i,N,M_l}\) we denote the approximation of X at time T computed based on N particles and \(2^lT\) time-steps. The number of particles used in the tests will be specified below.
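The level comparison can be sketched as follows; `euler_paths` and `rmse_between_levels` are hypothetical helper names of ours, and the coupling of the two levels is realised by summing the fine-level Brownian increments in pairs to obtain the coarse-level ones:

```python
import numpy as np

def euler_paths(drift, diff, x0, T, M, dW):
    # Euler-Maruyama for N particles over M steps, driven by prescribed
    # Brownian increments dW of shape (M, N), so levels stay coupled.
    h = T / M
    X = x0.copy()
    for n in range(M):
        X = X + drift(X) * h + diff(X) * dW[n]
    return X

def rmse_between_levels(drift, diff, x0, T, l, rng):
    # Same Brownian path on levels l and l-1: the fine increments are summed
    # in pairs to produce the coarse increments.
    M_f = 2 ** l * int(T)
    dW_f = rng.normal(scale=np.sqrt(T / M_f), size=(M_f, x0.size))
    dW_c = dW_f[0::2] + dW_f[1::2]
    X_f = euler_paths(drift, diff, x0, T, M_f, dW_f)
    X_c = euler_paths(drift, diff, x0, T, M_f // 2, dW_c)
    return np.sqrt(np.mean((X_f - X_c) ** 2))
```

Here `drift` and `diff` act on the whole particle vector, so mean-field interactions (e.g. through the empirical mean) fit into the same interface.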

5.1 Neuronal interactions

In this section, we provide a numerical illustration for a specific model for neuronal interactions. Interacting particle systems are ubiquitous in neuroscience, such as the Hodgkin–Huxley model [2, 8] or mean-field equations describing the behaviour of a (large) network of interacting spiking neurons [23]. For other mean-field models appearing in neuroscience, we refer to the references given in [23].

A recent model of the action potential of neurons is described in [25] and involves discontinuous coefficients. Discontinuities arise for the following reason: after a charging phase of an individual neuron, subject to spikes of nearby neurons, randomness, and the effect of discharge at a constant rate, the neuron emits a spike to the network once a certain threshold is hit and then enters a recovery phase. The transition between these two phases is characterised by a discontinuity in the dynamics describing the potential of each neuron.

The action potential of N interacting neurons at time \(t \in [0,T]\), \(V^{i,N}_t (\text {mod } 2) \in [0,2)\), where \(V^{i,N}_t \in {\mathbb {R}}\), \(i \in \lbrace 1, \ldots , N \rbrace \), is modelled by the discontinuous mean-field equation

$$\begin{aligned} \mathrm {d}V^{i,N}_t&= \lambda (V^{i,N}_t (\text {mod } 2)) \, \mathrm {d}t + \sigma ^{\varepsilon }(V^{i,N}_t (\text {mod } 2)) \, \mathrm {d}W^{i}_t \\&\quad + \frac{1}{N}\sum _{j=1}^N \varTheta (\xi _i,\xi _j) \mathrm {I}_{[1,1+\kappa ]}(V^{j,N}_t (\text {mod } 2)) \mathrm {I}_{[0,1]}(V^{i,N}_t (\text {mod } 2)) \, \mathrm {d}t, \end{aligned}$$

with square-integrable random initial values \(V_0^{i,N} = \eta _i\), for \(i \in \lbrace 1, \ldots , N \rbrace \) and \(0< \kappa < 1\) fixed. The set of i.i.d. random variables \(\lbrace \xi _1, \ldots , \xi _N \rbrace \), \(\xi _i \in D\), describes the locations of the N (non-moving) neurons, where D is modelled as an open connected domain of \({\mathbb {R}}^3\). Hence, the position of each neuron is given by \(X_t^{i,N} = \xi ^{i}\) at each time \(t \in [0,T]\). It is further assumed that \(\lbrace \xi _1, \ldots , \xi _N, \eta _1, \ldots , \eta _N \rbrace \) are independent for each integer \(N \ge 1\). The standard Brownian motions \((W_t^{i})_{t \in [0,T]}\) are independent, and independent of \(\xi _i\) and \(\eta _i\), for \(i \in \lbrace 1, \ldots , N \rbrace \). Observe that \(V^{i,N}\) is specified by an SDE whose drift has a discontinuity in the state variable (due to the choice of \(\lambda \); see below for details), but also jumps whenever a particle \(j \ne i\) reaches one of the critical values 1 or \(1+\kappa \).
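The mean-field spiking term in the drift above can be computed in vectorised form. The following sketch uses the kernel \(\varTheta (x,y) = \sin (|x-y|)\) from the conditions below; the function name is ours:

```python
import numpy as np

def interaction_drift(V, xi, kappa):
    # Mean-field spiking term of the neuron model: particle j in the spiking
    # window [1, 1+kappa] (mod 2) pushes particle i while i is in its
    # charging phase [0, 1] (mod 2), weighted by Theta(xi_i, xi_j) and
    # averaged over the N particles.
    v = np.mod(V, 2.0)
    spiking = (v >= 1.0) & (v <= 1.0 + kappa)    # I_{[1,1+kappa]}(V^j mod 2)
    charging = (v >= 0.0) & (v <= 1.0)           # I_{[0,1]}(V^i mod 2)
    # Theta(x, y) = sin(|x - y|) for the 3-d neuron locations xi (shape (N, 3))
    Theta = np.sin(np.linalg.norm(xi[:, None, :] - xi[None, :, :], axis=-1))
    return charging * (Theta @ spiking.astype(float)) / len(V)
```

Since \(|\varTheta | \le 1\) and the indicators are bounded by one, each component of the returned drift lies in \([-1,1]\).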

The following conditions are imposed in [25] to guarantee strong well-posedness of the particle system above (see [25, Theorem 2.2]) and of the associated McKean–Vlasov equation (see [25, Theorem 6.1]); propagation of chaos type results, i.e., weak convergence of the law of the empirical distribution of \((\xi ^{i},V^{i,N})\) to a Dirac measure centred at the law of the solution to the underlying McKean–Vlasov equation, are shown in [25, Theorem 5.7]:

  1. 1.

    \(\lambda (v) = -{\hat{\lambda }}v \mathrm {I}_{[0,1]}(v) + \mathrm {I}_{(1,2)}(v), \quad {\hat{\lambda }} >0\),    \(\varTheta (x,y) = \sin (|x-y|)\), for \(x,y \in {\mathbb {R}}^3\);

  2. 2.

    \( \sigma ^{\varepsilon }\) is a \({\mathcal {C}}_b^{1}([0,2])\) function satisfying \( \sigma ^{\varepsilon } \ge \sqrt{2 \varepsilon } > 0\) and

    $$\begin{aligned}&\sigma ^{\varepsilon }(v) = \sqrt{2\varepsilon } \text { on } [1,2], \quad \sigma ^{\varepsilon }(2) = \sigma ^{\varepsilon }(0) = \sqrt{2 \varepsilon }, \quad (\sigma ^{\varepsilon })'(0) = (\sigma ^{\varepsilon })'(2) = 0, \\&\quad \text { with } \varepsilon >0 \text { fixed}. \end{aligned}$$

These conditions specify our Example 1. For Example 2, the diffusion term was chosen as \(\sigma ^{\varepsilon }(x) = \sqrt{2\varepsilon }+x\). For our tests, we used \(N=10^3\). Furthermore, we set \(\kappa =0.01\), \({\hat{\lambda }} = 0.02\) and \(\varepsilon = 0.1\). The initial values \(\eta _1, \ldots , \eta _N\) are chosen as independent normal random variables with mean 1 and standard deviation 2, taken modulo 2. The variables \(\xi _1, \ldots , \xi _N\) are independent three-dimensional random variables drawn from the same multivariate normal distribution with some given mean vector and covariance matrix.
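One concrete function satisfying the conditions on \(\sigma ^{\varepsilon }\) (equal to \(\sqrt{2\varepsilon }\) on \([1,2]\), bounded below by \(\sqrt{2\varepsilon }\), with vanishing derivatives at the endpoints) can be built from a smooth bump on \([0,1]\). The particular bump below is our choice; the conditions only require the listed properties:

```python
import numpy as np

def sigma_eps(v, eps=0.1, delta=0.2):
    # A C^1 diffusion coefficient on [0, 2] (taken mod 2): constant
    # sqrt(2*eps) on [1, 2], plus a bump delta*sin(pi*v)^2 on [0, 1] that
    # vanishes with zero slope at v = 0 and v = 1, so values and first
    # derivatives match at 0, 1 and 2.
    v = np.mod(np.asarray(v, dtype=float), 2.0)
    base = np.sqrt(2.0 * eps)
    bump = np.where(v <= 1.0, delta * np.sin(np.pi * v) ** 2, 0.0)
    return base + bump
```

By construction \(\sigma ^{\varepsilon } \ge \sqrt{2\varepsilon }\) everywhere, as required for the non-degeneracy condition.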

We investigate numerically the convergence of Scheme 2, i.e., the Euler–Maruyama scheme without applying any transformations. In Fig. 1, we observe strong convergence of order 3/4 for Example 1, which is most likely due to the choice of the constant diffusion \(\sigma ^{\varepsilon } = \sqrt{2\varepsilon }\). In [50], a Milstein scheme for one-dimensional SDEs with discontinuities in the drift was derived and a strong convergence order of 3/4 was proven. In addition, it is conjectured in [50] (Conjectures 1 and 2) that the rate 3/4 is optimal.

Fig. 1

Strong convergence of the Euler–Maruyama scheme applied to the particle system obtained by approximating the equation for the action potential of the neurons

5.2 Systemic risk

In this section, we consider a McKean–Vlasov SDE of the form

$$\begin{aligned} \mathrm {d}X_t = \left( a \left( {\mathbb {E}}[X_t]- X_t \right) + \kappa _1 \mathrm {I}_{ \lbrace X_t \le 0 \rbrace } + \kappa _2 \mathrm {I}_{\lbrace X_t > 0 \rbrace } \right) \, \mathrm {d}t + (\sigma + X_t) \, \mathrm {d}W_t, \quad X_0=x \in {\mathbb {R}}, \end{aligned}$$
(5.1)

where \(a \ge 0\) is the mean-reversion rate, \(\kappa _1 < 0\), \(\kappa _2>0\) and \(\sigma >0\). The strong well-posedness of (5.1) follows from Proposition 3.1. This equation can be linked to a model of systemic risk in [16], where a mean-field game of N banks borrowing from, and lending to, a central bank is proposed. The banks control the rate of their borrowing depending on their (log-)monetary reserves, which are modelled by a system of SDEs with interaction through their average. In this setting flocking, and thus systemic default events, may occur. Here, we give a slight reformulation of this problem following [28, Section 4]. The problem consists in finding \(({\hat{\mu }}_t,{\hat{\beta }}_t)_{t \in [0,T]}\), where \(({\hat{\mu }}_t)_{t \in [0,T]}\) is a flow of measures in \({\mathcal {C}}([0,T],{\mathcal {P}}_2({\mathbb {R}}))\) and \(({\hat{\beta }}_t)_{t \in [0,T]}\) is an adapted, square-integrable control process (describing the rate of borrowing from, or lending to, the central bank), such that \(({\hat{\beta }}_t)_{t \in [0,T]}\) minimises the objective function given by

$$\begin{aligned} J^{{\hat{\mu }}}(x;\beta )= {\mathbb {E}}&\left[ \int _{0}^{T}\left( r |\beta _t| + \frac{\varepsilon }{2} \left( X^{{\hat{\mu }},\beta }_t - \int _{{\mathbb {R}}} x \, {\hat{\mu }}_t(\mathrm {d}x) \right) ^2 \right) \, \mathrm {d}t \right. \\&\qquad \qquad \quad \left. + \frac{c}{2} \left( X^{{\hat{\mu }},\beta }_T - \int _{{\mathbb {R}}} x \, {\hat{\mu }}_T(\mathrm {d}x) \right) ^2 \right] , \end{aligned}$$

for \(r, \varepsilon , c \ge 0\), where

$$\begin{aligned} \mathrm {d}X^{{\hat{\mu }},\beta }_t = \left( a \left( \int _{{\mathbb {R}}} x \, {\hat{\mu }}_t(\mathrm {d}x)- X^{{\hat{\mu }},\beta }_t \right) + \beta _t \right) \, \mathrm {d}t + (\sigma + X^{{\hat{\mu }},\beta }_t) \, \mathrm {d}W_t, \quad X^{{\hat{\mu }},\beta }_0=x \in {\mathbb {R}}, \end{aligned}$$
(5.2)

and \({\hat{\mu }}_t = {\mathcal {L}}_{X^{{\hat{\mu }},{\hat{\beta }}}_t}\) for all \(t \in [0,T]\).

For simplicity, we did not add the common noise term as in [16]. In addition, we modified the diffusion to allow it to be degenerate. In [28], a constraint \(\beta _t \in [\kappa _1,\kappa _2]\) on the borrowing/lending rate is imposed for all \(t \in [0,T]\). The minimiser of the objective function for this mean-field game (with constant diffusion term) is shown to be a control of bang-zero-bang type (see [28, equation (24)] for an analytic expression). Written in feedback form, this optimal control strategy has discontinuities that are time-dependent, i.e., the zero-control region changes over time, a setting not covered by our current analysis.

Using instead the special bang-bang type control \((\beta ^{*}_t)_{t \in [0,T]}\) of the form

$$\begin{aligned} \beta ^{*}_t = {\left\{ \begin{array}{ll} \kappa _1, &{}\text { if } X_t \le 0 \\ \kappa _2, &{} \text { if } X_t > 0, \\ \end{array}\right. } \end{aligned}$$

and plugging \(\beta _t^{*}\) back into (5.2) results in an equation of the form (5.1) with a one-point discontinuity.

In our numerical experiments, we set \(\kappa _1= -0.5\), \(\kappa _2=0.5\), \(x=0\) and \(\sigma =0.7\). Further, we consider three different choices for the mean-reversion rate, i.e., we set \(a=1,5,10\). The expected value is approximated by the empirical mean of \(N=10^4\) particles. For a larger value of a, we expect sample paths of (5.1) to be more concentrated around the mean of the samples; see Fig. 2a, b (with \(T=1\) and \(M=2^7\)). We can also observe that a stronger concentration effect improves the strong approximation behaviour; see Fig. 3.
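This experimental setup can be reproduced with a standard Euler–Maruyama particle scheme for (5.1), replacing \({\mathbb {E}}[X_t]\) by the empirical mean of the particles; the function name and code organisation are ours, while the parameter values are the ones quoted above:

```python
import numpy as np

def simulate_systemic(a=1.0, kappa1=-0.5, kappa2=0.5, sigma=0.7,
                      x0=0.0, T=1.0, M=2 ** 7, N=10_000, seed=0):
    # Euler-Maruyama for the particle system approximating (5.1):
    # dX = ( a*(mean - X) + kappa1*1{X<=0} + kappa2*1{X>0} ) dt
    #      + (sigma + X) dW,
    # with E[X_t] replaced by the empirical mean over N particles. The
    # bang-bang drift encodes the control beta* from the text.
    rng = np.random.default_rng(seed)
    h = T / M
    X = np.full(N, float(x0))
    for _ in range(M):
        drift = a * (X.mean() - X) + np.where(X <= 0, kappa1, kappa2)
        X = X + drift * h + (sigma + X) * rng.normal(scale=np.sqrt(h), size=N)
    return X
```

As discussed above, a larger mean-reversion rate a concentrates the terminal particle cloud around its empirical mean.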

Fig. 2

Sample trajectories of the particle system associated with (5.1) for \(N=10\) and \(a=1\) (left) and \(a=10\) (right)

Fig. 3

Strong convergence of the Euler–Maruyama scheme applied to the particle system obtained by approximating the Eq. (5.1) with \(a=1,5,10\)