Multilevel and Multi-index Monte Carlo methods for the McKean–Vlasov equation
Abstract
We address the approximation of functionals depending on a system of particles, described by stochastic differential equations (SDEs), in the mean-field limit when the number of particles approaches infinity. This problem is equivalent to estimating the weak solution of the limiting McKean–Vlasov SDE. To that end, our approach uses systems with finite numbers of particles and a time-stepping scheme. In this case, there are two discretization parameters: the number of time steps and the number of particles. Based on these two parameters, we consider different variants of the Monte Carlo and Multilevel Monte Carlo (MLMC) methods and show that, in the best case, the optimal work complexity of MLMC, to estimate the functional in one typical setting with an error tolerance of \(\mathrm {TOL}\), is \(\mathcal O\bigl(\mathrm{TOL}^{-3}\bigr)\) when using the partitioning estimator and the Milstein time-stepping scheme. We also consider a method that uses the recent Multi-index Monte Carlo method and show an improved work complexity in the same typical setting of \(\mathcal O\bigl(\mathrm{TOL}^{-2}\log (\mathrm{TOL}^{-1})^{2}\bigr)\). Our numerical experiments are carried out on the so-called Kuramoto model, a system of coupled oscillators.
Keywords
Multi-index Monte Carlo · Multilevel Monte Carlo · Monte Carlo · Particle systems · McKean–Vlasov · Mean-field · Stochastic differential equations · Weak approximation · Sparse approximation · Combination technique

Mathematics Subject Classification
65C05 (Monte Carlo methods) · 65C30 (Stochastic differential and integral equations) · 65C35 (Stochastic particle methods)

1 Introduction
In our setting, a stochastic particle system is a system of coupled d-dimensional stochastic differential equations (SDEs), each modeling the state of a “particle.” Such particle systems are versatile tools that can be used to model the dynamics of various complicated phenomena using relatively simple interactions, e.g., pedestrian dynamics (Helbing and Molnar 1995; Haji-Ali 2012), collective animal behavior (Erban et al. 2016; Erban and Haskovec 2012), interactions between cells (Dobramysl et al. 2016) and in some numerical methods such as ensemble Kalman filters (Del Moral et al. 2016). One common goal of the simulation of these particle systems is to average some quantity of interest computed on all particles, e.g., the average velocity, the average exit time or the average number of particles in a specific region.
Under certain conditions, most importantly the exchangeability of particles and sufficient regularity of the SDE coefficients, the stochastic particle system approaches a mean-field limit as the number of particles tends to infinity (Sznitman 1991). Exchangeability of particles refers to the assumption that all permutations of the particles have the same joint distribution. In the mean-field limit, each particle follows a single McKean–Vlasov SDE in which the advection and/or diffusion coefficients depend on the distribution of the solution to the SDE (Gärtner 1988). In many cases, the objective is to approximate the expected value of a quantity of interest (QoI) in the mean-field limit as the number of particles tends to infinity, subject to some error tolerance, \(\mathrm {TOL}\). While it is possible to approximate the expectation of these QoIs by estimating the solution to a nonlinear PDE using traditional numerical methods, such methods usually suffer from the curse of dimensionality. Indeed, the cost of these methods usually scales like \(\mathrm{TOL}^{-wd}\), where d is the dimension of the state space and \(w>1\) is a constant that depends on the particular numerical method. Using sparse numerical methods alleviates the curse of dimensionality but requires increasing regularity as the dimensionality of the state space increases. On the other hand, Monte Carlo methods do not suffer from this curse with respect to the dimensionality of the state space. This work explores different variants and extensions of the Monte Carlo method when the underlying stochastic particle system satisfies certain crucial assumptions. We theoretically show the validity of some of these assumptions in a somewhat general setting, while verifying the other assumptions numerically on a simple stochastic particle system, leaving further theoretical justification to future work.
Generally, the SDEs that constitute a stochastic particle system cannot be solved exactly, and their solution must instead be approximated using a time-stepping scheme with a number of time steps, N. This parameter and the finite number of particles, P, are the two approximation parameters involved in approximating a finite average of the QoI computed over all particles in the system. Then, to approximate the expectation of this average, we use a Monte Carlo method: multiple independent and identical stochastic particle systems, approximated with the same number of time steps, N, are simulated; the average QoI is computed from each, and an overall average is then taken. Using this method, the variance of the estimator is reduced by increasing the number of simulations of the stochastic particle system or by increasing the number of particles in the system. Section 3.1 presents the Monte Carlo method more precisely in the setting of stochastic particle systems. Particle methods that are not based on Monte Carlo were also discussed in Bossy and Talay (1996, 1997). In these methods, a single simulation of the stochastic particle system is carried out, and only the number of particles is increased to reduce the variance.
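To make the plain Monte Carlo approach concrete, the following sketch simulates a Kuramoto-type particle system with Euler–Maruyama time stepping and averages the per-system QoI over independent replicas. All function names, parameter values, the initial distribution and the choice of QoI (the average of the cosine) are illustrative assumptions, not the exact setup used in this paper.

```python
import numpy as np

def kuramoto_qoi(P, N, T=1.0, sigma=0.4, rng=None):
    """One sample of phi_P^N: Euler-Maruyama simulation of a
    Kuramoto-type particle system, returning the particle-averaged
    QoI. All parameter values here are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    X = rng.normal(0.0, 0.2, size=P)        # initial states x_q^0
    theta = rng.uniform(-0.2, 0.2, size=P)  # intrinsic frequencies
    for _ in range(N):
        # Mean-field coupling (1/P) sum_p sin(X_p - X_q): an O(P^2) step
        drift = theta + np.mean(np.sin(X[None, :] - X[:, None]), axis=1)
        X = X + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=P)
    return np.mean(np.cos(X))               # assumed QoI: average of cos

def mc_estimator(M, P, N, seed=0):
    """Plain Monte Carlo: average M i.i.d. particle-system samples."""
    rng = np.random.default_rng(seed)
    return np.mean([kuramoto_qoi(P, N, rng=rng) for _ in range(M)])
```

Increasing M reduces the statistical error, while increasing N and P reduces the two discretization biases; the total work grows like \(M N P^2\) with this naive pairwise coupling.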
As an improvement on Monte Carlo methods, the Multilevel Monte Carlo (MLMC) method was first introduced in Heinrich (2001) for parametric integration and in Giles (2008b) for SDEs; see Giles (2015) and the references therein for an overview. MLMC improves the efficiency of the Monte Carlo method when only an approximation, controlled by a single discretization parameter, of the solution to the underlying system can be computed. The basic idea is to reduce the number of samples required on the finest, most accurate but most expensive discretization by using a correlated coarser and cheaper discretization as a control variate, thereby reducing the variability of the fine approximation. More details are given in Sect. 3.2 for the case of stochastic particle systems. The application of MLMC to particle systems has been investigated in many works (Bujok et al. 2013; Haji-Ali 2012; Rosin et al. 2014). The same concepts have also been applied to nested expectations (Giles 2015). More recently, a particle method applying the MLMC methodology to stochastic particle systems was introduced in Ricketson (2015), achieving an improved work complexity for a linear system with a diffusion coefficient that is independent of the state variable.
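A minimal sketch of one MLMC correction sample for a hierarchy in the number of time steps: the same Wiener increments drive a fine and a coarse Euler–Maruyama discretization of the same (assumed Kuramoto-type) system, so the two QoIs are strongly correlated and their difference has small variance. Dynamics and parameter values are illustrative assumptions.

```python
import numpy as np

def coupled_level_difference(P, N_fine, T=1.0, sigma=0.4, rng=None):
    """One MLMC correction sample phi_P^{N_fine} - phi_P^{N_fine/2}:
    the same Wiener increments drive a fine and a coarse Euler-Maruyama
    discretization of an assumed Kuramoto-type system, so the two QoIs
    are correlated and the difference has small variance.
    N_fine must be even; parameter values are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    dt_f = T / N_fine
    x0 = rng.normal(0.0, 0.2, size=P)
    theta = rng.uniform(-0.2, 0.2, size=P)
    dW = sigma * np.sqrt(dt_f) * rng.normal(size=(N_fine, P))

    def drift(X):
        return theta + np.mean(np.sin(X[None, :] - X[:, None]), axis=1)

    Xf = x0.copy()                          # fine path, N_fine steps
    for n in range(N_fine):
        Xf = Xf + drift(Xf) * dt_f + dW[n]

    Xc = x0.copy()                          # coarse path, N_fine/2 steps
    for n in range(0, N_fine, 2):
        # one coarse step consumes the sum of two fine Wiener increments
        Xc = Xc + drift(Xc) * 2 * dt_f + dW[n] + dW[n + 1]

    return np.mean(np.cos(Xf)) - np.mean(np.cos(Xc))
```

Averaging many such differences over a hierarchy of levels, with most samples taken on the cheap coarse levels, is what yields the MLMC cost savings.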
Recently, the Multi-index Monte Carlo (MIMC) method (Haji-Ali et al. 2015a) was introduced to tackle high-dimensional problems with more than one discretization parameter. MIMC is based on the same concepts as MLMC and improves its efficiency even further, but requires mixed regularity with respect to the discretization parameters. More details are given in Sect. 3.3 for the case of stochastic particle systems. In that section, we demonstrate the improved work complexity of MIMC compared with that of MC and MLMC, when applied to a stochastic particle system. More specifically, we show that, when using a naive simulation method for the particle system with quadratic complexity, the optimal work complexity of MIMC is \(\mathcal O\bigl(\mathrm{TOL}^{-2} \log (\mathrm{TOL}^{-1})^{2}\bigr)\) when using the Milstein time-stepping scheme and \(\mathcal O\bigl(\mathrm{TOL}^{-2} \log (\mathrm{TOL}^{-1})^{4}\bigr)\) when using the Euler–Maruyama time-stepping scheme. Finally, in Sect. 4, we provide numerical verification of the assumptions made throughout the current work and of the derived work complexity rates.
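The mixed-regularity requirement can be illustrated with the first-order mixed difference that MIMC samples. The sketch below applies it to a toy, deterministic bias model \(\phi(P,N) \approx \phi_\infty + c_1/N + c_2/P + c_3/(NP)\) (an assumption for illustration only): the constant and single-parameter terms cancel and only the cross term survives, which is exactly the product-form decay MIMC exploits.

```python
def mixed_difference(phi, P, N):
    """First-order mixed difference (Delta_P Delta_N phi)(P, N):
    the building block that MIMC samples at each multi-index.
    In practice the four evaluations must share random inputs."""
    return (phi(P, N) - phi(P // 2, N)
            - phi(P, N // 2) + phi(P // 2, N // 2))

# Toy bias model: constant and single-parameter terms cancel in the
# mixed difference; only the 0.1/(N*P) cross term survives.
phi = lambda P, N: 1.0 + 0.5 / N + 0.3 / P + 0.1 / (N * P)
```

For this toy model, `mixed_difference(phi, P, N)` equals \(0.1/(NP)\), i.e., it decays in the product of the two discretization parameters rather than in each one separately.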
In what follows, the notation \(a \lesssim b\) means that there exists a constant c, which does not depend on a or b, such that \(a \le c\,b\).
2 Problem setting
Kuramoto Example
Lemma 2.1
Proof
Finally, as mentioned above, with a naive method, the total cost to compute a single sample of \(\phi _{P}^N\) is \(\mathcal O\bigl(N P^{2}\bigr)\). The quadratic power of P can be reduced by using, for example, a multipole algorithm (Carrier et al. 1988; Greengard and Rokhlin 1987). In general, we consider the work required to compute one sample of \(\phi _P^N\) to be \(\mathcal O\bigl(N P^{{\gamma _{\mathrm {p}}}}\bigr)\) for a constant \({{\gamma _{\mathrm {p}}}}\ge 1\).
3 Monte Carlo methods
In this section, we study different Monte Carlo methods that can be used to estimate the previous quantity, \(\phi _\infty \). In the following, for each q, \(\omega ^{(m)}_{q}\) denotes the m-th sample of the set of underlying random variables that are used in calculating \(X_{q|P}^{N|N}\), i.e., the Wiener path, \(W_{q}\), the initial condition, \(x_q^0\), and any other random variables that enter the drift or diffusion coefficients. Moreover, we sometimes write \(\phi _P^N(\varvec{\omega }_{1:P}^{(m)})\) to emphasize the dependence of the m-th sample of \(\phi _P^{N}\) on the underlying random variables.
3.1 Monte Carlo (MC)
Kuramoto Example
Using a naive calculation method of \(\phi _{P}^N\) (i.e., \({{\gamma _{\mathrm {p}}}}=2\)) gives a work complexity of \(\mathcal O\bigl(\mathrm{TOL}^{-4}\bigr)\). See also Table 1 for the work complexities for different common values of \({{\gamma _{\mathrm {p}}}}\).
3.2 Multilevel Monte Carlo (MLMC)
In the following subsections, we look at different settings in which either \(P_\ell \) or \(N_\ell \) depends on \(\ell \) while the other parameter is constant for all \(\ell \). We begin by recalling the optimal convergence rates of MLMC when applied to a generic random variable, Y, with a trivial generalization to the case when there are two discretization parameters: one that is a function of the level, \(\ell \), and the other, \({\widetilde{L}}\), that is fixed for all levels.
Theorem 3.1
 1.
 2.
 3.
\(\mathrm {Work}\left[ Y_{{\widetilde{L}}, \ell } - Y_{{\widetilde{L}}, \ell -1}\right] \lesssim {\widetilde{\beta }}^{{\widetilde{\gamma }} {\widetilde{L}}} \beta ^{\gamma \ell }\).
Table 1: The work complexity of the different methods presented in this work in common situations, encoded as \((a, b)\) to represent \(\mathcal O\bigl(\mathrm{TOL}^{-a} \log (\mathrm{TOL}^{-1})^{b}\bigr)\)
Method  \({{s_{\mathrm {t}}}}=1, {{\gamma _{\mathrm {p}}}}=1\)  \({{s_{\mathrm {t}}}}=1, {{\gamma _{\mathrm {p}}}}=2\)  \({{s_{\mathrm {t}}}}=2, {{\gamma _{\mathrm {p}}}}=1\)  \({{s_{\mathrm {t}}}}=2, {{\gamma _{\mathrm {p}}}}=2\) 

MC (Sect. 3.1)  \((3, 0)\)  \((4, 0)\)  \((3, 0)\)  \((4, 0)\) 
MLMC (Sect. 3.2.1)  \((2, 2)\)  \((3, 2)\)  \((2, 0)\)  \((3, 0)\) 
MLMC (Sect. 3.2.2)  \((3, 0)\)  \((3, 2)\)  \((3, 0)\)  \((3, 2)\) 
MLMC (Sect. 3.2.3)  \((2, 2)\)  \((3, 0)\)  \((2, 2)\)  \((3, 0)\) 
MIMC (Sect. 3.3)  \((2, 2)\)  \((2, 4)\)  \((2, 0)\)  \((2, 2)\) 
Proof
3.2.1 MLMC hierarchy based on the number of time steps
Kuramoto Example
In this example, using the Milstein time-stepping scheme, we have \({{s_{\mathrm {t}}}}= 2\) (cf. Fig. 1), and a naive calculation method of \(\phi _{P}^N\) (\({{\gamma _{\mathrm {p}}}}=2\)) gives a work complexity of \(\mathcal O\bigl(\mathrm{TOL}^{-3}\bigr)\). See also Table 1 for the work complexities for different common values of \({{s_{\mathrm {t}}}}\) and \({{\gamma _{\mathrm {p}}}}\).
3.2.2 MLMC hierarchy based on the number of particles
Kuramoto Example

Using the sampler \(\overline{\varphi }\) in (13), we verify numerically that \({{s_{\mathrm {p}}}}= 0\) (cf. Fig. 1). Hence, the work complexity is the same as that of a Monte Carlo estimator. This is to be expected, since using the “correlated” samples of \(\overline{\varphi }_{P_{\ell -1}}^N\) and \(\phi _{P_\ell }^N\) does not reduce the variance of the difference, as Fig. 1 shows.

Using the partitioning estimator, \(\widehat{\varphi }\), in (14), we verify numerically that \({{s_{\mathrm {p}}}}= 1\) (cf. Fig. 1). Hence, the work complexity is \(\mathcal O\bigl(\mathrm{TOL}^{-3} \log (\mathrm{TOL}^{-1})^{2}\bigr)\). Here, the samples of \(\widehat{\varphi }_{P_{\ell -1}}^N\) have a higher correlation with the corresponding samples of \(\phi _{P_\ell }^N\), thus reducing the variance of the difference. Still, using MLMC with hierarchies based on the number of time steps (fixing the number of particles) yields a better work complexity. See also Table 1 for the work complexities for different common values of \({{s_{\mathrm {t}}}}\) and \({{\gamma _{\mathrm {p}}}}\).
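One way the partitioning estimator might be implemented for a single correction sample is sketched below, under the same assumed Kuramoto-type dynamics as before: the coarse term reuses the fine level's P initial conditions and Wiener increments, split into two disjoint sub-systems of P/2 particles whose QoIs are averaged. Names and parameter values are illustrative assumptions.

```python
import numpy as np

def partitioning_difference(P, N, T=1.0, sigma=0.4, rng=None):
    """One correction sample phi_P^N - (phi_{P/2}^N + phi_{P/2}^N)/2,
    where the two coarse terms reuse the fine level's P initial
    conditions and Wiener increments, split into disjoint halves.
    P must be even; dynamics and parameters are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    x0 = rng.normal(0.0, 0.2, size=P)
    theta = rng.uniform(-0.2, 0.2, size=P)
    dW = sigma * np.sqrt(dt) * rng.normal(size=(N, P))

    def run(idx):
        # Euler-Maruyama for the sub-system formed by particles in idx
        X = x0[idx].copy()
        for n in range(N):
            drift = theta[idx] + np.mean(np.sin(X[None, :] - X[:, None]), axis=1)
            X = X + drift * dt + dW[n, idx]
        return np.mean(np.cos(X))

    fine = run(np.arange(P))
    half = P // 2
    coarse = 0.5 * (run(np.arange(half)) + run(np.arange(half, P)))
    return fine - coarse
```

Because the two halves together use exactly the random inputs of the fine system, the coarse average tracks the fine QoI closely, which is the source of the improved variance decay.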
3.2.3 MLMC hierarchy based on both the number of particles and the number of time steps
Kuramoto Example
We choose \({{\beta _{\mathrm {p}}}}={{\beta _{\mathrm {t}}}}\) and use a naive calculation method of \(\phi _{P}^N\) (yielding \({{\gamma _{\mathrm {p}}}}=2\)) and the partitioning sampler (yielding \({{s_{\mathrm {p}}}}=1\)). Finally, using the Milstein time-stepping scheme, we have \({{s_{\mathrm {t}}}}=2\). Refer to Fig. 1 for numerical verification. Based on these rates, we have, in (19), \(s=2\log ({{\beta _{\mathrm {p}}}})\), \(w=\log ({{\beta _{\mathrm {p}}}})\) and \(\gamma =3\log ({{\beta _{\mathrm {p}}}})\). The MLMC work complexity in this case is \(\mathcal O\bigl(\mathrm{TOL}^{-3}\bigr)\). See also Table 1 for the work complexities for different common values of \({{s_{\mathrm {t}}}}\) and \({{\gamma _{\mathrm {p}}}}\).
3.3 Multiindex Monte Carlo (MIMC)
Kuramoto Example
Here again, we use a naive calculation method of \(\phi _{P}^N\) (yielding \({{\gamma _{\mathrm {p}}}}=2\)) and the partitioning sampler (yielding \({{s_{\mathrm {p}}}}=1\)). Finally, using the Milstein time-stepping scheme, we have \({{s_{\mathrm {t}}}}=2\). Hence, \(\zeta = 0\), \(\mathfrak z=1\) and the work complexity is \(\mathcal O\bigl(\mathrm{TOL}^{-2} \log (\mathrm{TOL}^{-1})^{2}\bigr)\). See also Table 1 for the work complexities for different common values of \({{s_{\mathrm {t}}}}\) and \({{\gamma _{\mathrm {p}}}}\).
4 Numerical example
In this section, we provide numerical evidence for the assumptions made and the work complexities derived in Sect. 3. This section also verifies that the constants of the work complexity (which were not tracked) are not significant for reasonable error tolerances. The results in this section were obtained using the mimclib software library (Haji-Ali 2016) and GNU parallel (Tange 2011).
We now compare the MLMC method (Giles 2008b) in the setting presented in Sect. 3.2.3 and the MIMC method (Haji-Ali et al. 2015a) presented in Sect. 3.3. In both methods, we use the Milstein time-stepping scheme and the partitioning sampler, \(\widehat{\varphi }\), in (14). Recall that in this case, we verified numerically that \({{\gamma _{\mathrm {p}}}}=2\), \({{s_{\mathrm {p}}}}=1\) and \({{s_{\mathrm {t}}}}=2\). We also use the MLMC and MIMC algorithms as outlined in their original works, with an initial 25 samples on each level or multi-index to compute the variance estimate that is required to determine the optimal number of samples. In the following, we refer to these methods as simply “MLMC” and “MIMC.” We focus on the settings in Sects. 3.2.3 and 3.3 since checking the bias of the estimator in those settings can be done straightforwardly by checking the absolute value of the level differences in MLMC or of the multi-index differences in MIMC. On the other hand, checking the bias in the settings outlined in Sects. 3.1, 3.2.1 and 3.2.2 is not as straightforward, and determining the number of time steps and/or the number of particles needed to satisfy a certain error tolerance requires more sophisticated algorithms. This makes a fair numerical comparison with these latter settings somewhat difficult.
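The sample-allocation step mentioned above (pilot variance estimates feeding an optimal number of samples per level) follows the standard MLMC rule of Giles (2008b); a sketch, with the tolerance-splitting parameter \(\theta\) as an assumed input:

```python
import numpy as np

def mlmc_sample_sizes(V, C, tol, theta=0.5):
    """Standard MLMC sample allocation (Giles 2008b): given per-level
    variance estimates V[l] and per-sample costs C[l] (e.g., from 25
    pilot samples), choose M[l] minimizing total work subject to the
    statistical-error budget sum_l V[l]/M[l] <= theta * tol**2."""
    V, C = np.asarray(V, dtype=float), np.asarray(C, dtype=float)
    lam = np.sum(np.sqrt(V * C)) / (theta * tol ** 2)  # Lagrange multiplier
    return np.ceil(lam * np.sqrt(V / C)).astype(int)
```

For example, with variances [1.0, 0.25], costs [1.0, 4.0], tol = 0.1 and theta = 0.5, this allocates 400 samples to the coarse level and 100 to the fine one, putting most of the work where samples are cheap.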
Figure 3 (left) shows the exact errors of both MLMC and MIMC for different prescribed tolerances. This plot shows that both methods estimate the quantity of interest up to the same error tolerance; comparing their work complexity is thus fair. On the other hand, Fig. 3 (right) shows a P–P plot, i.e., a plot of the cumulative distribution function (CDF) of the MLMC and MIMC estimators, normalized by their variance and shifted by their mean, versus the CDF of a standard normal distribution. This figure shows that our assumption in Sect. 2 of the asymptotic normality of these estimators is well founded. Figure 4 shows the maximum discretization level of both the number of time steps and the number of particles for MLMC and MIMC (cf. (22)). Recall that, for a fixed tolerance in MIMC, \(2 \alpha _2 + \alpha _1\) is bounded by a constant (cf. (21)). Hence, Fig. 4 has a direct implication for the results reported in Fig. 5, where we plot the maximum cost of the samples used in both MLMC and MIMC for different tolerances. This cost represents an indivisible unit of simulation for both methods, assuming we treat the simulation of the particle system as a black box. Hence, Fig. 5 shows that MIMC has better parallelization scaling; i.e., even with an infinite number of computation nodes, MIMC would still be more efficient than MLMC.
Finally, we show in Fig. 6 the cost estimates of MLMC and MIMC for different tolerances. This figure clearly shows the performance improvement of MIMC over MLMC and shows that the complexity rates that we derived in this work are reasonably accurate.
5 Conclusions
This work has shown, both numerically and theoretically under certain assumptions that could be verified numerically, the improvement of MIMC over MLMC when used to approximate a quantity of interest computed on a particle system as the number of particles goes to infinity. The application to other particle systems (or, equivalently, other McKean–Vlasov SDEs) is straightforward, and similar improvements are expected. The same machinery was also suggested for approximating nested expectations in Giles (2015), and the analysis here applies to that setting as well. Moreover, the same machinery, i.e., a multi-index structure with respect to the number of time steps and the number of particles coupled with a partitioning estimator, could be used to create control variates that reduce the computational cost of approximating quantities of interest on stochastic particle systems with a finite number of particles.
Future work includes analyzing the optimal level separation parameters, \({{\beta _{\mathrm {p}}}}\) and \({{\beta _{\mathrm {t}}}}\), and the behavior of the tolerance splitting parameter, \(\theta \). Another direction could be applying the MIMC method to higher-dimensional particle systems such as the crowd model in Haji-Ali (2012). On the theoretical side, the next step is to prove the assumptions that were postulated and verified numerically in this work for certain classes of particle systems, namely:
the second-order convergence, with respect to the number of particles, of the variance of the partitioning estimator (14), and the convergence rates of the mixed differences (MIMC1) and (MIMC2).
Notes
Acknowledgements
R. Tempone is a member of the KAUST Strategic Research Initiative, Center for Uncertainty Quantification in Computational Sciences and Engineering. R. Tempone received support from the KAUST CRG3 Award Ref: 2281 and the KAUST CRG4 Award Ref: 2584. The authors would like to thank Lukas Szpruch for the valuable discussions regarding the theoretical foundations of the methods.
References
Acebrón, J.A., Bonilla, L.L., Vicente, C.J.P., Ritort, F., Spigler, R.: The Kuramoto model: a simple paradigm for synchronization phenomena. Rev. Mod. Phys. 77(1), 137 (2005)
Bossy, M., Talay, D.: Convergence rate for the approximation of the limit law of weakly interacting particles: application to the Burgers equation. Ann. Appl. Probab. 6(3), 818–861 (1996)
Bossy, M., Talay, D.: A stochastic particle method for the McKean–Vlasov and the Burgers equation. Math. Comput. 66(217), 157–192 (1997)
Bujok, K., Hambly, B., Reisinger, C.: Multilevel simulation of functionals of Bernoulli random variables with application to basket credit derivatives. Methodol. Comput. Appl. Probab. 73, 1–26 (2013)
Carrier, J., Greengard, L., Rokhlin, V.: A fast adaptive multipole algorithm for particle simulations. SIAM J. Sci. Stat. Comput. 9(4), 669–686 (1988)
Cliffe, K., Giles, M., Scheichl, R., Teckentrup, A.: Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Comput. Vis. Sci. 14(1), 3–15 (2011)
Collier, N., Haji-Ali, A.L., Nobile, F., von Schwerin, E., Tempone, R.: A continuation multilevel Monte Carlo algorithm. BIT Numer. Math. 55(2), 399–432 (2015)
Dobramysl, U., Rüdiger, S., Erban, R.: Particle-based multiscale modeling of calcium puff dynamics. Multiscale Model. Simul. 14(3), 997–1016 (2016)
Erban, R., Haskovec, J.: From individual to collective behaviour of coupled velocity jump processes: a locust example. Kinet. Relat. Models 5(4), 817–842 (2012)
Erban, R., Haskovec, J., Sun, Y.: A Cucker–Smale model with noise and delay. SIAM J. Appl. Math. 76(4), 1535–1557 (2016)
Gärtner, J.: On the McKean–Vlasov limit for interacting diffusions. Math. Nachr. 137(1), 197–248 (1988)
Giles, M.B.: Improved multilevel Monte Carlo convergence using the Milstein scheme. In: Keller, A., Heinrich, S., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2006, pp. 343–358. Springer, Berlin (2008a)
Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008b)
Giles, M.B.: Multilevel Monte Carlo methods. Acta Numer. 24, 259–328 (2015)
Giles, M.B., Szpruch, L.: Antithetic multilevel Monte Carlo estimation for multi-dimensional SDEs without Lévy area simulation. Ann. Appl. Probab. 24(4), 1585–1620 (2014)
Greengard, L., Rokhlin, V.: A fast algorithm for particle simulations. J. Comput. Phys. 73(2), 325–348 (1987)
Haji-Ali, A.L.: Pedestrian flow in the mean-field limit. King Abdullah University of Science and Technology (KAUST). http://hdl.handle.net/10754/250912 (2012)
Haji-Ali, A.L.: mimclib. https://github.com/StochasticNumerics/mimclib (2016)
Haji-Ali, A.L., Nobile, F., Tempone, R.: Multi-index Monte Carlo: when sparsity meets sampling. Numer. Math. 132, 767–806 (2015a)
Haji-Ali, A.L., Nobile, F., von Schwerin, E., Tempone, R.: Optimization of mesh hierarchies in multilevel Monte Carlo samplers. Stoch. Partial Differ. Equ. Anal. Comput. 4, 76–112 (2015b)
Heinrich, S.: Multilevel Monte Carlo methods. In: Large-Scale Scientific Computing. Lecture Notes in Computer Science, vol. 2179, pp. 58–67. Springer, Berlin (2001)
Helbing, D., Molnar, P.: Social force model for pedestrian dynamics. Phys. Rev. E 51(5), 4282 (1995)
Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Springer, Berlin (1992). doi:10.1007/978-3-662-12616-5
Kolokoltsov, V., Troeva, M.: On the mean field games with common noise and the McKean–Vlasov SPDEs. ArXiv preprint arXiv:1506.04594 (2015)
Del Moral, P., Kurtzmann, A., Tugaut, J.: On the stability and the uniform propagation of chaos of a class of extended Ensemble Kalman–Bucy filters. SIAM J. Control Optim. 55(1), 119–155 (2016)
Ricketson, L.: A multilevel Monte Carlo method for a class of McKean–Vlasov processes. ArXiv preprint arXiv:1508.02299 (2015)
Rosin, M., Ricketson, L., Dimits, A., Caflisch, R., Cohen, B.: Multilevel Monte Carlo simulation of Coulomb collisions. J. Comput. Phys. 274, 140–157 (2014)
Sznitman, A.S.: Topics in propagation of chaos. In: École d'été de probabilités de Saint-Flour XIX–1989, pp. 165–251. Springer (1991)
Tange, O.: GNU Parallel: the command-line power tool. ;login: USENIX Mag. 36(1), 42–47 (2011)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.