Abstract
We perform a general optimization of the parameters in the multilevel Monte Carlo (MLMC) discretization hierarchy based on uniform discretization methods with general approximation orders and computational costs. We optimize hierarchies with geometric and non-geometric sequences of mesh sizes and show that geometric hierarchies, when optimized, are nearly optimal and have the same asymptotic computational complexity as non-geometric optimal hierarchies. We discuss how enforcing constraints on parameters of MLMC hierarchies affects the optimality of these hierarchies. These constraints include an upper and a lower bound on the mesh size or enforcing that the number of samples and the number of discretization elements are integers. We also discuss the optimal tolerance splitting between the bias and the statistical error contributions and its asymptotic behavior. To provide numerical grounds for our theoretical results, we apply these optimized hierarchies together with the Continuation MLMC Algorithm (Collier et al., BIT Numer Math 55(2):399–432, 2015). The first example considers a three-dimensional elliptic partial differential equation with random inputs. Its space discretization is based on continuous piecewise trilinear finite elements and the corresponding linear system is solved by either a direct or an iterative solver. The second example considers a one-dimensional Itô stochastic differential equation discretized by a Milstein scheme.
References
Amestoy, P.R., Duff, I.S., L’Excellent, J.Y., Koster, J.: A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J. Matrix Anal. Appl. 23, 15–41 (2001). doi:10.1137/S0895479899358194
Babuška, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM Rev. 52(2), 317–355 (2010)
Balay, S., Gropp, W.D., McInnes, L.C., Smith, B.F.: Efficient management of parallelism in object oriented numerical software libraries. In: Arge, E., Bruaset, A.M., Langtangen, H.P. (eds.) Modern Software Tools in Scientific Computing, pp. 163–202. Birkhäuser, Boston (1997)
Balay, S., Brown, J., Buschelman, K., Gropp, W.D., Kaushik, D., Knepley, M.G., McInnes, L.C., Smith, B.F., Zhang, H.: PETSc Web page (2013). http://www.mcs.anl.gov/petsc
Barth, A., Schwab, C., Zollinger, N.: Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients. Numer. Math. 119(1), 123–161 (2011)
Barth, A., Lang, A., Schwab, C.: Multilevel Monte Carlo method for parabolic stochastic partial differential equations. BIT Numer. Math. 53(1), 3–27 (2013)
Bayer, C., Hoel, H., von Schwerin, E., Tempone, R.: On nonasymptotic optimal stopping criteria in Monte Carlo simulations. SIAM J. Sci. Comput. 36(2), A869–A885 (2014). doi:10.1137/130911433
Charrier, J., Scheichl, R., Teckentrup, A.: Finite element error analysis of elliptic PDEs with random coefficients and its application to multilevel Monte Carlo methods. SIAM J. Numer. Anal. 51(1), 322–352 (2013)
Cliffe, K., Giles, M., Scheichl, R., Teckentrup, A.: Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Comput. Vis. Sci. 14(1), 3–15 (2011)
Collier, N., Dalcin, L., Calo, V.: PetIGA: High-performance isogeometric analysis (2013). arXiv:1305.4452
Collier, N., Haji-Ali, A.L., Nobile, F., von Schwerin, E., Tempone, R.: A continuation multilevel Monte Carlo algorithm. BIT Numer. Math. 55(2), 399–432 (2015). doi:10.1007/s10543-014-0511-3
Giles, M.: Improved multilevel Monte Carlo convergence using the Milstein scheme. Monte Carlo and Quasi-Monte Carlo Methods 2006, pp. 343–358. Springer, Berlin (2008)
Giles, M.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)
Giles, M., Reisinger, C.: Stochastic finite differences and multilevel Monte Carlo for a class of SPDEs in finance. SIAM J. Financ. Math. 3(1), 572–592 (2012)
Glasserman, P.: Monte Carlo methods in financial engineering. Stochastic Modelling and Applied Probability. Applications of Mathematics. Springer, New York (2004)
Heinrich, S.: Monte Carlo complexity of global solution of integral equations. J. Complex. 14(2), 151–175 (1998)
Heinrich, S., Sindambiwe, E.: Monte Carlo complexity of parametric integration. J. Complex. 15(3), 317–341 (1999)
Hoel, H., von Schwerin, E., Szepessy, A., Tempone, R.: Adaptive multilevel Monte Carlo simulation. In: Engquist, B., Runborg, O., Tsai, Y.H. (eds.) Numerical Analysis of Multiscale Computations. Lecture Notes in Computational Science and Engineering, vol. 82, pp. 217–234. Springer, Berlin (2012)
Hoel, H., von Schwerin, E., Szepessy, A., Tempone, R.: Implementation and analysis of an adaptive multilevel Monte Carlo algorithm. Monte Carlo Methods Appl. 20(1), 1–41 (2014)
Jouini, E., Cvitanić, J., Musiela, M. (eds.): Option pricing, interest rates and risk management. Handbooks in Mathematical Finance. Cambridge University Press, Cambridge (2001)
Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics, vol. 113, 2nd edn. Springer, New York (1991)
Kebaier, A.: Statistical Romberg extrapolation: a new variance reduction method and applications to options pricing. Ann. Appl. Probab. 15(4), 2681–2705 (2005)
Milstein, G.N., Tretyakov, M.V.: Stochastic numerics for mathematical physics. Springer, New York (2004)
Moon, K.S., Szepessy, A., Tempone, R., Zouraris, G.E.: Convergence rates for adaptive weak approximation of stochastic differential equations. Stoch. Anal. Appl. 23(3), 511–558 (2005)
Moraes, A., Tempone, R., Vilanova, P.: Multilevel hybrid Chernoff tau-leap. BIT Numer. Math. (2015). doi:10.1007/s10543-015-0556-y
Øksendal, B.: Stochastic Differential Equations. Universitext, 5th edn. Springer, Berlin (1998)
Teckentrup, A., Scheichl, R., Giles, M., Ullmann, E.: Further analysis of multilevel Monte Carlo methods for elliptic PDEs with random coefficients. Numer. Math. 125(3), 569–600 (2013)
Tesei, F., Nobile, F.: A multi level Monte Carlo method with control variate for elliptic PDEs with log-normal coefficients. Technical report (2014)
Xia, Y., Giles, M.: Multilevel path simulation for jump-diffusion SDEs. In: Plaskota, L., Woźniakowski, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2010, pp. 695–708. Springer, Berlin (2012)
Acknowledgments
R. Tempone is a member of the Research Center on Uncertainty Quantification (SRI-UQ), division of Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) at King Abdullah University of Science and Technology (KAUST). The authors would like to recognize the support of the following KAUST and University of Texas at Austin AEA projects: Round 2, “Predictability and Uncertainty Quantification for Models of Porous Media”, and Round 3, “Uncertainty quantification for predictive modeling of the dissolution of porous and fractured media”. F. Nobile has been partially supported by the Swiss National Science Foundation under the Project No. 140574 “Efficient numerical methods for flow and transport phenomena in heterogeneous random porous media” and by the Center for ADvanced MOdeling Science (CADMOS). E. von Schwerin has been partially supported by the aforementioned SRI-UQ and CADMOS. We would also like to acknowledge the following open source software packages that made this work possible: PETSc [4], PetIGA [10].
Appendix 1: Derivations and proofs
Appendix 1.1: Optimal hierarchies given \(h_0\), \(\theta \), and L
Here we solve Problem 2.1 of Sect. 2.2 for the optimal hierarchy for any fixed value of L. We initially treat the parameter \(\theta \) as given, postponing its optimization until later, and proceed in two steps to find the optimal \(\{M_\ell \}_{\ell =0}^L\) and \(\{h_\ell \}_{\ell =0}^L\). Given general work estimates \(\{W_\ell \}_{\ell =0}^L\) in (2.4) and general variance estimates \(\{V_\ell \}_{\ell =0}^L\), we assume equality in (2.9b) and introduce the Lagrange multiplier \(\lambda \) to obtain the Lagrangian
Requiring that the variation of the Lagrangian with respect to \(M_\ell \) be zero gives \(M_\ell = \sqrt{\lambda \frac{V_\ell }{W_\ell }}.\) Solving for \(\lambda \) in the variance constraint (2.9b) and substituting back leads to (2.13). Substituting this optimal \(M_\ell \) in the total work (2.4) yields
We proceed to find the optimal \(\{h_\ell \}_{\ell =0}^L\) under the particular models (2.12). The total work (5.1) is minimized when
is minimized. Here the finest mesh, \(h_L\), is given by the bias constraint (2.12b) as
independently of the multilevel construction. Now, treat the coarsest mesh, \(h_0\), as given and find the optimal \(h_1,\ldots ,h_{L-1}\) that minimize
The requirement that the derivative of this sum with respect to \(h_\ell \) equals zero, for \(\ell =1,\ldots ,L-1\), leads to the optimality condition
which after taking the logarithm and using \(\chi \) defined in (2.14), leads to
This is a second-order linear difference equation whose solution depends on \(\chi \).
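The two steps above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It rests on assumptions that are not reproduced in this appendix: `eps2` is a placeholder for the right-hand side of the variance constraint (2.9b), and the models (2.12) are taken to be power laws, \(V_\ell \propto h_{\ell -1}^{q_2}\) and \(W_\ell \propto h_\ell ^{-d\gamma }\), with \(\chi = q_2/(d\gamma )\). All function names are hypothetical, and the Gauss-Seidel sweeps are a stand-in numerical technique chosen for brevity rather than the closed-form solution derived in the text.

```python
import math

def optimal_samples(V, W, eps2):
    """Real-valued M_l minimizing sum(M_l * W_l) subject to sum(V_l / M_l) = eps2.

    Stationarity of the Lagrangian gives M_l = sqrt(lambda * V_l / W_l);
    the constraint then fixes sqrt(lambda) = sum(sqrt(V_l * W_l)) / eps2.
    """
    s = sum(math.sqrt(v * w) for v, w in zip(V, W))
    sqrt_lam = s / eps2
    M = [sqrt_lam * math.sqrt(v / w) for v, w in zip(V, W)]
    total_work = s * s / eps2  # equals sum(M_l * W_l)
    return M, total_work

def optimal_interior_meshes(h0, hL, L, q2, dgamma, sweeps=500):
    """Interior meshes h_1..h_{L-1} for fixed h_0 and h_L (assumed power-law models).

    Each update solves the stationarity condition in h_l exactly:
    q2 * t_{l+1} = dgamma * t_l, with t_l = h_{l-1}^{q2/2} * h_l^{-dgamma/2}.
    """
    chi = q2 / dgamma
    h = [h0 * (hL / h0) ** (l / L) for l in range(L + 1)]  # geometric start
    for _ in range(sweeps):
        for l in range(1, L):
            prod = h[l - 1] ** (q2 / 2) * h[l + 1] ** (dgamma / 2)
            h[l] = (prod / chi) ** (2.0 / (q2 + dgamma))
    return h

def level_terms(h, q2, dgamma):
    """t_l = h_{l-1}^{q2/2} * h_l^{-dgamma/2} for l = 1..L (constants dropped)."""
    return [h[l - 1] ** (q2 / 2) * h[l] ** (-dgamma / 2) for l in range(1, len(h))]

if __name__ == "__main__":
    M, work = optimal_samples([1.0, 0.1, 0.01], [1.0, 4.0, 16.0], eps2=1e-4)
    print(M, work)
    h = optimal_interior_meshes(1.0, 1e-3, L=5, q2=4.0, dgamma=2.0)
    print(h, level_terms(h, 4.0, 2.0))
```

Under these assumptions, consecutive level terms at the stationary point differ by the factor \(1/\chi \), and for \(\chi =1\) the meshes form a geometric sequence between \(h_0\) and \(h_L\).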
Appendix 1.1.1: For \(\chi =1\)
This section provides proofs of Theorem 2.1, Lemma 2.1, and Corollary 2.1. The solution of the difference Eq. (5.5) for the case \(\chi =1\) is the geometric sequence
In other words, all \(h_\ell \) are defined in terms of \(h_0\) and \(h_L\), where the latter is determined by \(\theta \) through (5.3) and we solve for the former by setting the derivative of (5.2) with respect to \(h_0\) equal to zero. This optimality condition becomes (for \(q_2=d\gamma \))
Combining this expression with (5.6) for \(\ell =1\) and solving for \(h_0\) yields
Substituting this expression and (5.3) in the expression for \( {\beta } \) in (5.6) we obtain (2.15c). Moreover, substituting (2.15c) and (5.6) and (2.10) and (2.5) in (2.13) yields (2.15b). Next, we substitute (2.15b) and (2.15a) in (2.6) to obtain the optimal work for \(q_2=d\gamma \)
Using (5.6) and (5.7), we obtain
Substituting for \(h_L\) from (5.3)
Optimizing for \(\theta \) yields (2.15d). Substituting back gives the work as a function of L
where \({{\mathfrak {e}}}(L) = \frac{1}{2\eta (L+1)}\). Treating L as a continuous variable and differentiating with respect to L yields
where \({y=2\eta (L+1)} > 0\) and \({C = \log \left( {\mathrm {TOL}}^{-1} Q_W V_0^\eta Q_S^{-\eta } \right) }\). Setting (5.12) to zero gives the equation
It follows that \(\lim _{C \rightarrow \infty } \frac{y}{C} = 1\) which leads to (2.17) for the value of L and (2.18) for the work. Since \(y>0\), Eq. (5.13) implies
for any \(C>0\). Furthermore, for any \(y>0\), it holds
which together with (5.13) gives
for any \(C>0\). Inequalities (5.14) and (5.15) are (2.16).
Appendix 1.1.2: For \(\chi \ne 1\)
This section provides proofs of Theorem 2.2, Lemma 2.2, and Corollary 2.2. The solution of the difference Eq. (5.5) for the case \(\chi \ne 1\) is
We now distinguish between two different cases for \(h_0\): either we are free to choose the optimal \(h_0\in {\mathbb {R}}_+\), or we have an upper bound on the coarsest mesh \(h_0\). The first, idealized, situation will allow us to obtain explicit expressions for the optimal splitting parameter \(\theta \) and the asymptotic work, and we start by considering this case. We return to the other case at the end of this section.
Unconstrained optimization of \(h_0\)
We take \(h_1,\ldots ,h_L\) given by (5.16) and (5.3) and set the derivative of (5.2) with respect to \(h_0\) equal to zero. This optimality condition becomes (after some straightforward simplifications)
which, since all parameters are positive, is equivalent to
Combining this expression for \(h_1\) with the one in (5.16) and solving for \(h_0\) gives
which after substituting back into (5.16) and using (5.3) yields (2.19a). Finally substituting these optimal mesh sizes into (2.13) yields (2.19b).
Optimal splitting parameter \(\theta \)
Now the sequences \(\{h_\ell \}_{\ell =0}^L\) and \(\{M_\ell \}_{\ell =0}^L\) are determined in terms of the still not optimized L and \(\theta \) as well as measurable model parameters. The work per level in (2.6) becomes
Since the only \(\ell \)-dependent factor in the right hand side is the last one, \(\chi ^{-\ell }\), and using \(\sum _{\ell =0}^L\chi ^{-\ell }=\chi ^{-L}(1-\chi ^{L+1})/(1-\chi )\), the total work in (2.6) becomes
with
Thus given the value of L the dependence on the splitting parameter \(\theta \) is straightforward, and the minimal work for a given L is obtained with the minimizer of (5.18c), namely (2.19c). With this optimal splitting parameter \(\theta \) in (5.17) the total work as a function of the yet to be determined parameter L and the tolerance is
with
Optimal number of levels
The optimal integer L seems impossible to find analytically. In practical computations we instead perform an exhaustive search over a small range of integer values. In the analysis below we treat L as a real parameter to obtain the bounds (2.20), which delimit the range of integer values that must be tested and allow a complexity analysis as \({\mathrm {TOL}}\rightarrow 0\) without an exactly determined L.
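The search over integer values of L described above can be sketched as follows. This is only an illustration: `best_integer_level` is a hypothetical helper, the real-valued bounds passed to it stand in for the bounds (2.20), which are not reproduced here, and `toy_work` is a made-up convex placeholder rather than the work model (5.19).

```python
import math

def best_integer_level(work, L_lo, L_hi):
    """Return the integer L in [ceil(L_lo), floor(L_hi)] with the smallest work(L).

    `work` is any callable mapping an integer L >= 0 to a positive cost.
    """
    lo = max(0, math.ceil(L_lo))
    hi = math.floor(L_hi)
    if hi < lo:
        raise ValueError("empty search range for L")
    return min(range(lo, hi + 1), key=work)

def toy_work(L):
    """Placeholder cost (NOT the model in the text): grows with L, decays in 1/(L+1)."""
    n = L + 1
    return n ** 2 + 200.0 / n

if __name__ == "__main__":
    print(best_integer_level(toy_work, 1.3, 9.7))
```

Because the bounds confine L to a short interval, evaluating the work at every integer in that interval is cheap compared with a single MLMC run.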
Treating L as a real parameter, we differentiate the work (5.19) with respect to L to obtain
where, introducing the shorthand
and using the constants \(c_1\) and \(c_2\) in (2.21) we write
so that
with
Clearly \(u(L,{\mathrm {TOL}})<0\) for all \(\chi \in {\mathbb {R}}_+{\setminus }\{1\}\) so the sign of \(\partial W/\partial L\) is the opposite of the sign of \(v(L,{\mathrm {TOL}})\). For a fixed \(\chi \in {\mathbb {R}}_+{\setminus }\{1\}\) we have
and, since \(\xi (L)\ge \xi (0)=2\eta \),
For the opposite inequality,
we distinguish between the cases \(0<\chi <1\) and \(1<\chi \). When \(0<\chi <1\) we have the upper bound \(\xi (L)<\frac{2\eta }{1-\chi }\) and consequently
In contrast, \(\xi (L)\) is unbounded when \(1<\chi \) but, since the definitions of \(\chi \) and \(\eta \) and the relation between strong and weak convergence orders imply that \(2\eta \ge \chi \), we have
and
which gives the bound
Hence
and it follows that
Combining (5.24) with (5.25) and (5.26), we obtain the bounds (2.20).
Optimal hierarchies with an upper bound on \(h_0\)
Practical computations will impose an upper limit on the mesh sizes, \(h_0\le h_{{\mathrm {max}}}\). If the mesh sizes (2.19a) violate such a bound, we must modify our analysis slightly. We now consider \(h_0\) given as one of the coarsest mesh sizes that can be realized in the given discretization, and analyze the case \(L\ge 1\). Using the optimal mesh sizes (5.16) yields
where the only \(\ell \)-dependent factor in the right hand side is the last one, \(\chi ^{-\ell }\), so that the sum in (5.4) is
In this sum only \(h_L\) depends on \(\theta \) through (5.3). Keeping L fixed we wish to minimize the total work, which by (5.1)–(5.2) is
with respect to \(\theta \). Letting
and
we obtain
with the optimality condition
where
In this case when \(h_0\) is constrained we no longer have an explicit expression for the optimal \(\theta \). However, using
and that
we conclude that the optimal \(\theta \) satisfies
Similarly, from the inequality
and the relation
we obtain an upper bound for \(\theta \), namely
Finally, combining (5.27) and (5.28) we have the following bounds for the optimal \(\theta \):
where the upper bound has a non-trivial dependence on \({\mathrm {TOL}}\) and L through C.
Appendix 1.2: Heuristic optimization of geometric hierarchies
This section motivates the results in Sect. 2.3 and Corollary 2.3 where we optimized geometric hierarchies defined by \(h_\ell = h_0 {\beta } ^{-\ell }\) for given \(h_0\) and \( {\beta } > 1\). In this case, the work and variance models are given in (2.28) and L must satisfy the bias constraint
We distinguish between two cases:
• \(\chi =1\), or equivalently \(q_2 = d\gamma \): In this case, the total work defined in (5.1) simplifies to
We make the simplification of treating L as a real parameter and substitute the lower bound of (5.30) in (5.31) and optimize with respect to \( {\beta } \) to get \( {\beta } =\exp \left( \frac{2}{q_2}\right) \). Substituting this choice and (2.30), the total work satisfies
Optimizing for \(\theta \) suggests that \(\theta \rightarrow 1\) as \({\mathrm {TOL}}\rightarrow 0\) and (2.18) follows.
• \(\chi \ne 1\): In this case, the total work defined in (5.1) simplifies to
for a given \(L, h_0\) and \(\theta \). Again, we make the simplification of treating L as a real parameter and substitute the lower bound (5.30) to obtain
for any \( {\beta } \). Substituting back in (5.32) and optimizing with respect to \( {\beta } \) to minimize the work gives (2.29). Substituting this optimal \( {\beta } \) in (5.32) yields
Asymptotically, using (2.30) as \({\mathrm {TOL}}\rightarrow 0\) yields (2.23) with the following constants
Optimizing these constants with respect to \(\theta \) yields (2.25) and substituting this and (2.32) back yields (2.24a) and (2.31) for \(C_1\) and \(C_2\), respectively. This, as Remark 2.6 mentions, shows that the asymptotic computational complexities of optimal non-geometric and geometric hierarchies are the same.
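The choice \( {\beta } =\exp \left( \frac{2}{q_2}\right) \) in the case \(\chi =1\) can be checked numerically. The sketch below assumes, beyond what is reproduced here, that each level \(\ell \ge 1\) contributes a term proportional to \( {\beta } ^{q_2/2}\) and that the real-valued number of levels scales like \(1/\log {\beta } \) for a fixed ratio \(h_0/h_L\), so the \( {\beta } \)-dependent part of the cost is \( {\beta } ^{q_2/2}/\log {\beta } \). The function names and the golden-section search are illustrative choices, not part of the original analysis.

```python
import math

def beta_cost(beta, q2):
    """Assumed beta-dependent factor of the level sum when chi = 1."""
    return beta ** (q2 / 2) / math.log(beta)

def minimize_beta(q2, lo=1.0001, hi=50.0, iters=200):
    """Golden-section search for the minimizer of beta_cost on [lo, hi].

    In u = log(beta), the log of the cost has derivative q2/2 - 1/u, so the
    cost is unimodal on (1, inf). The bracket [lo, hi] must contain
    exp(2/q2), which holds for q2 >= 0.52 with the default hi.
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if beta_cost(c, q2) < beta_cost(d, q2):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

if __name__ == "__main__":
    for q2 in (2.0, 4.0):
        print(q2, minimize_beta(q2), math.exp(2.0 / q2))
```

Under these assumptions the numerical minimizer agrees with \(\exp (2/q_2)\), for example \( {\beta } \approx 2.718\) for \(q_2=2\).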
Cite this article
Haji-Ali, AL., Nobile, F., von Schwerin, E. et al. Optimization of mesh hierarchies in multilevel Monte Carlo samplers. Stoch PDE: Anal Comp 4, 76–112 (2016). https://doi.org/10.1007/s40072-015-0049-7
DOI: https://doi.org/10.1007/s40072-015-0049-7
Keywords
- Multilevel Monte Carlo
- Monte Carlo
- Partial differential equations with random data
- Stochastic differential equations
- Optimal discretization
Mathematics Subject Classification
- 65C05
- 65N30
- 65N22