Skip to main content

Optimization of mesh hierarchies in multilevel Monte Carlo samplers


We perform a general optimization of the parameters in the multilevel Monte Carlo (MLMC) discretization hierarchy based on uniform discretization methods with general approximation orders and computational costs. We optimize hierarchies with geometric and non-geometric sequences of mesh sizes and show that geometric hierarchies, when optimized, are nearly optimal and have the same asymptotic computational complexity as non-geometric optimal hierarchies. We discuss how enforcing constraints on parameters of MLMC hierarchies affects the optimality of these hierarchies. These constraints include an upper and a lower bound on the mesh size or enforcing that the number of samples and the number of discretization elements are integers. We also discuss the optimal tolerance splitting between the bias and the statistical error contributions and its asymptotic behavior. To provide numerical grounds for our theoretical results, we apply these optimized hierarchies together with the Continuation MLMC Algorithm (Collier et al., BIT Numer Math 55(2):399–432, 2015). The first example considers a three-dimensional elliptic partial differential equation with random inputs. Its space discretization is based on continuous piecewise trilinear finite elements and the corresponding linear system is solved by either a direct or an iterative solver. The second example considers a one-dimensional Itô stochastic differential equation discretized by a Milstein scheme.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12


  1. We consider uniform meshes, but the extension to certain non-uniform meshes is immediate; see Remark 2.2 in Sect. 2.2.


  1. Amestoy, P.R., Duff, I.S., L’Excellent, J.Y., Koster, J.: A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J. Matrix Anal. Appl. 23, 15–41 (2001). doi:10.1137/S0895479899358194

    Article  MathSciNet  MATH  Google Scholar 

  2. Babuška, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM Rev. 52(2), 317–355 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  3. Balay, S., Gropp, W.D., McInnes, L.C., Smith, B.F.: Efficient management of parallelism in object oriented numerical software libraries. In: Arge, E., Bruaset, A.M., Langtangen, H.P. (eds.) Modern Software Tools in Scientific Computing, pp. 163–202. Birkhäuser, Boston (1997)

    Chapter  Google Scholar 

  4. Balay, S., Brown, J., Buschelman, K., Gropp, W.D., Kaushik, D., Knepley, M.G., McInnes, L.C., Smith, B.F., Zhang, H.: PETSc Web page (2013).

  5. Barth, A., Schwab, C., Zollinger, N.: Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients. Numer. Math. 119(1), 123–161 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  6. Barth, A., Lang, A., Schwab, C.: Multilevel Monte Carlo method for parabolic stochastic partial differential equations. BIT Numer. Math. 53(1), 3–27 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  7. Bayer, C., Hoel, H., von Schwerin, E., Tempone, R.: On nonasymptotic optimal stopping criteria in Monte Carlo simulations. SIAM J. Sci. Comput. 36(2), A869–A885 (2014). doi:10.1137/130911433

    Article  MathSciNet  MATH  Google Scholar 

  8. Charrier, J., Scheichl, R., Teckentrup, A.: Finite element error analysis of elliptic PDEs with random coefficients and its application to multilevel Monte Carlo methods. SIAM J. Numer. Anal. 51(1), 322–352 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  9. Cliffe, K., Giles, M., Scheichl, R., Teckentrup, A.: Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Comput. Vis. Sci. 14(1), 3–15 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  10. Collier, N., Dalcin, L., Calo, V.: PetIGA: High-performance isogeometric analysis. arxiv (1305.4452) (2013). arXiv:1305.4452

  11. Collier, N., Haji-Ali, A.L., Nobile, F., von Schwerin, E., Tempone, R.: A continuation multilevel Monte Carlo algorithm. BIT Numer. Math. 55(2), 399–432 (2015). doi:10.1007/s10543-014-0511-3

  12. Giles, M.: Improved multilevel Monte Carlo convergence using the Milstein scheme. Monte Carlo and Quasi-Monte Carlo Methods 2006, pp. 343–358. Springer, Berlin (2008)

    Chapter  Google Scholar 

  13. Giles, M.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  14. Giles, M., Reisinger, C.: Stochastic finite differences and multilevel Monte Carlo for a class of SPDEs in finance. SIAM J. Financ. Math. 3(1), 572–592 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  15. Glasserman, P.: Monte Carlo methods in financial engineering. Stochastic Modelling and Applied Probability. Applications of Mathematics. Springer, New York (2004)

    Google Scholar 

  16. Heinrich, S.: Monte Carlo complexity of global solution of integral equations. J. Complex. 14(2), 151–175 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  17. Heinrich, S., Sindambiwe, E.: Monte Carlo complexity of parametric integration. J. Complex. 15(3), 317–341 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  18. Hoel, H., von Schwerin, E., Szepessy, A., Tempone, R.: Adaptive multilevel Monte Carlo simulation. In: Engquist, B., Runborg, O., Tsai, Y.H. (eds.) Numerical Analysis of Multiscale Computations. Lecture Notes in Computational Science and Engineering, vol. 82, pp. 217–234. Springer, Berlin (2012)

    Chapter  Google Scholar 

  19. Hoel, H., von Schwerin, E., Szepessy, A., Tempone, R.: Implementation and analysis of an adaptive multilevel Monte Carlo algorithm. Monte Carlo Methods Appl. 20(1), 1–41 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  20. Jouini, E., Cvitanić, J., Musiela, M. (eds.): Option pricing, interest rates and risk management. Handbooks in Mathematical Finance. Cambridge University Press, Cambridge (2001)

  21. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics, vol. 113, 2nd edn. Springer, New York (1991)

    MATH  Google Scholar 

  22. Kebaier, A.: Statistical Romberg extrapolation: a new variance reduction method and applications to options pricing. Ann. Appl. Probab. 14(4), 2681–2705 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  23. Milstein, G.N., Tretyakov, M.V.: Stochastic numerics for mathematical physics. Springer, New York (2004)

    Book  MATH  Google Scholar 

  24. Moon, K.S., Szepessy, A., Tempone, R., Zouraris, G.E.: Convergence rates for adaptive weak approximation of stochastic differential equations. Stoch. Anal. Appl. 23(3), 511–558 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  25. Moraes, A., Tempone, R., Vilanova, P.: Multilevel hybrid chernoff tau-leap. BIT Numer. Math. (2015). doi:10.1007/s10543-015-0556-y

  26. Øksendal, B.: Stochastic Differential Equations. Universitext, 5th edn. Springer, Berlin (1998)

    Book  MATH  Google Scholar 

  27. Teckentrup, A., Scheichl, R., Giles, M., Ullmann, E.: Further analysis of multilevel Monte Carlo methods for elliptic PDEs with random coefficients. Numer. Math. 125(3), 569–600 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  28. Tesei, F., Nobile, F.: A multi level Monte Carlo method with control variate for elliptic pdes with log-normal coefficients. Technical report (2014)

  29. Xia, Y., Giles, M.: Multilevel path simulation for jump-diffusion SDEs. In: Plaskota, L., Woźniakowski, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2010, pp. 695–708. Springer, Berlin (2012)

    Chapter  Google Scholar 

Download references


R. Tempone is a member of the Research Center on Uncertainty Quantification (SRI-UQ), division of Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) at King Abdullah University of Science and Technology (KAUST). The authors would like to recognize the support of the following KAUST and University of Texas at Austin AEA projects: Round 2, “Predictability and Uncertainty Quantification for Models of Porous Media”, and Round 3, “Uncertainty quantification for predictive modeling of the dissolution of porous and fractured media”. F. Nobile has been partially supported by the Swiss National Science Foundation under the Project No. 140574 “Efficient numerical methods for flow and transport phenomena in heterogeneous random porous media” and by the Center for ADvanced MOdeling Science (CADMOS). E. von Schwerin has been partially supported by the aforementioned SRI-UQ and CADMOS. We would also like to acknowledge the following open source software packages that made this work possible: PETSc [4], PetIGA [10].

Author information

Authors and Affiliations


Corresponding authors

Correspondence to Abdul-Lateef Haji-Ali or Erik von Schwerin.

Appendix 1: Derivations and proofs

Appendix 1: Derivations and proofs

1.1 Appendix 1.1: Optimal hierarchies given \(h_0\), \(\theta \), and L

Here we solve Problem 2.1 of Sect. 2.2 for the optimal hierarchy for any fixed value of L. We initially treat the parameter \(\theta \) as given, postponing its optimization until later, and proceed in two steps to find the optimal \(\{M_\ell \}_{\ell =0}^L\) and \(\{h_\ell \}_{\ell =0}^L\). Assuming general work estimates \(\{W_\ell \}_{\ell =0}^L\) in (2.4) and general variance estimates of \(\{V_\ell \}_{\ell =0}^L\), we assume equality in (2.9b) and introduce the Lagrange multiplier \(\lambda \) to obtain the Lagrangian

$$\begin{aligned} {{\mathcal {L}}}\left( \{M_\ell \}_{\ell =0}^L,\lambda \right)&= \sum _{\ell =0}^{L} M_\ell W_\ell +\lambda \left\{ \sum _{\ell =0}^L \frac{V_\ell }{M_\ell }- \left( \theta \frac{{\mathrm {TOL}}}{C_\alpha }\right) ^2 \right\} . \end{aligned}$$

The requirement that the variation of the Lagrangian with respect to \(M_\ell \) is zero, gives \(M_\ell = \sqrt{\lambda \frac{V_\ell }{W_\ell }}.\) Solving for \(\lambda \) in the variance constraint (2.9b) and substituting back leads to (2.13). Substituting this optimal \(M_\ell \) in the total work (2.4) yields

$$\begin{aligned} W({\mathbf {H}})&= \left( \frac{C_\alpha }{ \theta {\mathrm {TOL}}} \right) ^2 \left( \sum _{\ell =0}^L \sqrt{W_\ell V_\ell } \right) ^2. \end{aligned}$$

We proceed to find the optimal \(\{h_\ell \}_{\ell =0}^L\) under the particular models (2.12). The total work (5.1) is minimized when

$$\begin{aligned} \sum _{\ell =0}^L \sqrt{W_\ell V_\ell }&= \sqrt{\frac{{V_0}}{h_0^{d\gamma }}} + \sqrt{Q_S}\sum _{\ell =1}^L \sqrt{\frac{h_{\ell -1}^{q_2}}{h_\ell ^{d\gamma }}}, \end{aligned}$$

is minimized. Here the finest mesh, \(h_L\), is given by the bias constraint (2.12b) as

$$\begin{aligned} h_L&= \left( \frac{(1-\theta )\,{\mathrm {TOL}}}{Q_W} \right) ^{\frac{1}{q_1}}, \end{aligned}$$

independently of the multilevel construction. Now, treat the coarsest mesh, \(h_0\), as given and find the optimal \(h_1,\ldots ,h_{L-1}\) that minimize

$$\begin{aligned} \frac{1}{\sqrt{Q_S}}\sum _{\ell =1}^L \sqrt{W_\ell V_\ell }&= \sum _{\ell =1}^L \sqrt{\frac{h_{\ell -1}^{q_2}}{h_\ell ^{d\gamma }}}. \end{aligned}$$

The requirement that the derivative of this sum with respect to \(h_\ell \) equals zero, for \(\ell =1,\ldots ,L-1\), leads to the optimality condition

$$\begin{aligned} q_2 h_\ell ^{\left( \frac{q_2+d\gamma }{2}\right) }&= d\gamma h_{\ell -1}^{\left( \frac{q_2}{2}\right) } h_{\ell +1}^{\left( \frac{d\gamma }{2}\right) }, \end{aligned}$$

which after taking the logarithm and using \(\chi \) defined in (2.14), leads to

$$\begin{aligned} - \log {\left( h_{\ell +1}\right) } + \left( 1+\chi \right) \log {\left( h_\ell \right) } - \chi \log {\left( h_{\ell -1}\right) }&= -\frac{2}{d\gamma } \log {\left( \chi \right) }. \end{aligned}$$

This is a second order linear difference whose solution depends on \(\chi \).

1.1.1 Appendix 1.1.1: For \(\chi =1\)

This section provides proofs of Theorem 2.1, Lemma 2.1, and Corollary 2.1. The solution of the difference Eq. (5.5) for the case \(\chi =1\) is the geometric sequence

$$\begin{aligned} h_\ell = h_0 {\beta } ^{-\ell },\,\, \text {with}\, {\beta } =\left( \frac{h_0}{h_L}\right) ^{1/L}. \end{aligned}$$

In other words, all \(h_\ell \) are defined in terms of \(h_0\) and \(h_L\), where the latter is determined by \(\theta \) through (5.3) and we solve for the former by setting the derivative of (5.2) with respect to \(h_0\) equal to zero. This optimality condition becomes (for \(q_2=d\gamma \))

$$\begin{aligned} h_1 = \left( \frac{Q_S}{V_0}\right) ^{\frac{1}{q_2}} h_0^2. \end{aligned}$$

Combining this expression with (5.6) for \(\ell =1\) and solving for \(h_0\) yields

$$\begin{aligned} h_0 = h_L^\frac{1}{L+1} \left( \frac{V_0}{Q_S}\right) ^\frac{L}{q_2 (L+1)}. \end{aligned}$$

Substituting this expression and (5.3) in the expression for \( {\beta } \) in (5.6) we obtain (2.15c). Moreover, substituting (2.15c) and (5.6) and (2.10) and (2.5) in (2.13) yields (2.15b). Next, we substitute (2.15b) and (2.15a) in (2.6) to obtain the optimal work for \(q_2=d\gamma \)

$$\begin{aligned} W = \left( \frac{C_\alpha }{\theta {\mathrm {TOL}}} \right) ^2 \left( \sqrt{V_0} h_0^\frac{-q_2}{2} + \sqrt{Q_S} {\beta } ^\frac{q_2}{2} L \right) ^2. \end{aligned}$$

Using (5.6) and (5.7), we obtain

$$\begin{aligned} W = \left( \frac{C_\alpha }{\theta {\mathrm {TOL}}} \right) ^2 h_L^\frac{-q_2}{L+1} V_0^\frac{1}{L+1} Q_S^\frac{L}{L+1}\left( 1+L\right) ^2 . \end{aligned}$$

Substituting for \(h_L\) from (5.3)

$$\begin{aligned} W = \left( \frac{C_\alpha }{\theta {\mathrm {TOL}}} \right) ^2 \left( \frac{Q_W}{(1-\theta )\,{\mathrm {TOL}}} \right) ^{\frac{1}{ \eta (L+1)}} V_0^\frac{1}{L+1} Q_S^\frac{L}{L+1}\left( 1+L\right) ^2. \end{aligned}$$

Optimizing for \(\theta \) yields (2.15d). Substituting back gives the work as a function of L

$$\begin{aligned} W(L) \!=\! C_\alpha ^2 {\mathrm {TOL}}^{-2(1+{{\mathfrak {e}}}(L))} Q_W^{2{{\mathfrak {e}}}(L)} V_0^{2 \eta {{\mathfrak {e}}}(L)} Q_S^{-2 \eta {{\mathfrak {e}}}(L)} Q_S \left( \frac{1}{2\eta }\right) ^2 \left( 1 + \frac{1}{{{\mathfrak {e}}}(L)}\right) ^{2(1+{{\mathfrak {e}}}(L))}, \end{aligned}$$

where \({{\mathfrak {e}}}(L) = \frac{1}{2\eta (L+1)}\). Treating L as a continuous variable and differentiating with respect to L yields

$$\begin{aligned} W'(L) = 2 W(L) {{\mathfrak {e}}}'(L) \left( C -y + \log (1+y) \right) , \end{aligned}$$

where \({y=2\eta (L+1)} > 0\) and \({C = \log \left( {\mathrm {TOL}}^{-1} Q_W V_0^\eta Q_S^{-\eta } \right) }\). Setting (5.12) to zero gives the equation

$$\begin{aligned} y - \log (1+y) = C. \end{aligned}$$

It follows that \(\lim _{C \rightarrow \infty } \frac{y}{C} = 1\) which leads to (2.17) for the value of L and (2.18) for the work. Since \(y>0\), Eq. (5.13) implies

$$\begin{aligned} 1<y C^{-1} \end{aligned}$$

for any \(C>0\). Furthermore, for any \(y>0\), it holds

$$\begin{aligned} \left( \frac{\exp (1)-1}{\exp (1)}\right) (1+y) - C \le 1+y - \log (1+y) - C, \end{aligned}$$

which together with (5.13) gives

$$\begin{aligned} y C^{-1} \le \frac{\exp (1)}{\exp (1)-1} + \frac{1}{(\exp (1)-1)C}, \end{aligned}$$

for any \(C>0\). Inequalities (5.14) and (5.15) are (2.16).

1.1.2 Appendix 1.1.2: For \(\chi \ne 1\)

This section provides proofs of Theorem 2.2, Lemma 2.2, and Corollary 2.2. The solution of the difference Eq. (5.5) for the case \(\chi \ne 1\) is

$$\begin{aligned} h_\ell&= h_0^{\left( \frac{\chi ^\ell -\chi ^L}{1-\chi ^L}\right) } h_L^{\left( \frac{1-\chi ^\ell }{1-\chi ^L}\right) } \chi ^{-\frac{2}{d\gamma }\left( \frac{L(1-\chi ^\ell ) -\ell (1-\chi ^L)}{(1-\chi )(1-\chi ^L)}\right) }. \end{aligned}$$

We now distinguish between two different cases for \(h_0\): either we are free to choose the optimal \(h_0\in {\mathbb {R}}_+\), or we have an upper bound on the coarsest mesh \(h_0\). The first, idealized, situation will allow us to obtain explicit expressions for the optimal splitting parameter \(\theta \) and the asymptotic work, and we start by considering this case. We return to the other case at the end of this section.

Unconstrained optimization of \(h_0\) We take \(h_1,\ldots ,h_L\) given by (5.16) and (5.3) and set the derivative of (5.2) with respect to \(h_0\) equal to zero. This optimality condition becomes (after some straightforward simplifications)

$$\begin{aligned} -\frac{d\gamma }{2}\frac{\sqrt{{V_0}}}{h_0^{1+d\gamma /2}} +\frac{q_2}{2}\sqrt{Q_S}\frac{h_0^{q_2/2-1}}{h_1^{d\gamma /2}}&= 0, \end{aligned}$$

which, since all parameters are positive, is equivalent to

$$\begin{aligned} h_1&= \left( \frac{\chi ^2 Q_S}{{V_0}} \right) ^{\frac{1}{d\gamma }} h_0^{1+\chi }. \end{aligned}$$

Combining this expression for \(h_1\) with the one in (5.16) and solving for \(h_0\) gives

$$\begin{aligned} h_0&= h_L^{\frac{1-\chi }{1-\chi ^{L+1}}} \left( \frac{{V_0}}{Q_S}\right) ^{\frac{1}{d\gamma } \frac{1-\chi ^L}{1-\chi ^{L+1}}} \chi ^{-\frac{2}{d\gamma }\frac{1}{1-\chi } \left( L\frac{1-\chi }{1-\chi ^{L+1}}-\chi \frac{1-\chi ^L}{1-\chi ^{L+1}} \right) }, \end{aligned}$$

which after substituting back into (5.16) and using (5.3) yields (2.19a). Finally substituting these optimal mesh sizes into (2.13) yields (2.19b).

Optimal splitting parameter \(\theta \) Now the sequences \(\{h_\ell \}_{\ell =0}^L\) and \(\{M_\ell \}_{\ell =0}^L\) are determined in terms of the still not optimized L and \(\theta \) as well as measurable model parameters. The work per level in (2.6) becomes

$$\begin{aligned} \frac{M_\ell }{h_\ell ^{d\gamma }}&= \left( \frac{C_\alpha }{\theta {\mathrm {TOL}}}\right) ^2 \left( \frac{Q_W}{(1-\theta )\,{\mathrm {TOL}}} \right) ^{\frac{1}{\eta }\frac{1-\chi }{1-\chi ^{L+1}}} V_0 \left( \frac{Q_S}{V_0}\right) ^{\left\{ \frac{1-\chi ^L}{1-\chi ^{L+1}} \right\} } \\&\quad \times \, \frac{1-\chi ^{L+1}}{1-\chi } \chi ^{\left\{ -\frac{2\chi }{1-\chi }\frac{1-\chi ^L}{1-\chi ^{L+1}} +L\frac{1+\chi ^{L+1}}{1-\chi ^{L+1}}\right\} } \chi ^{-\ell }. \end{aligned}$$

Since the only \(\ell \)-dependent factor in the right hand side is the last one, \(\chi ^{-\ell }\), and using \(\sum _{\ell =0}^L\chi ^{-\ell }=\chi ^{-L}(1-\chi ^{L+1})/(1-\chi )\), the total work in (2.6) becomes

$$\begin{aligned} W{(L,\theta ,{\mathrm {TOL}})}&= w_1\left( L,{\mathrm {TOL}}\right) \, w_2(L) \, f\left( L,\theta \right) \left( \frac{1-\chi ^{L+1}}{1-\chi }\right) ^2, \end{aligned}$$


$$\begin{aligned} w_1\left( L,{\mathrm {TOL}}\right)&= {\mathrm {TOL}}^{-\left( 2+\frac{1}{\eta }\frac{1-\chi }{1-\chi ^{L+1}}\right) }, \end{aligned}$$
$$\begin{aligned} w_2(L)&= C_\alpha ^2 \, V_0 \left( \frac{Q_S}{V_0}\right) ^{\left\{ \frac{1-\chi ^L}{1-\chi ^{L+1}} \right\} } Q_W^{\left\{ \frac{1}{\eta }\frac{1-\chi }{1-\chi ^{L+1}}\right\} } \nonumber \\&\quad \times \, \chi ^{\left\{ -\frac{2\chi }{1-\chi }\frac{1-\chi ^L}{1-\chi ^{L+1}} +2L\frac{\chi ^{L+1}}{1-\chi ^{L+1}}\right\} }, \end{aligned}$$
$$\begin{aligned} f\left( L,\theta \right)&= \frac{1}{ \theta ^2 \left( 1-\theta \right) ^{\frac{1}{\eta }\frac{1-\chi }{1-\chi ^{L+1}}} }. \end{aligned}$$

Thus given the value of L the dependence on the splitting parameter \(\theta \) is straightforward, and the minimal work for a given L is obtained with the minimizer of (5.18c), namely (2.19c). With this optimal splitting parameter \(\theta \) in (5.17) the total work as a function of the yet to be determined parameter L and the tolerance is

$$\begin{aligned} W{\left( L,{\mathrm {TOL}}\right) }&= w_1(L,{\mathrm {TOL}}) \, w_2(L) \, w_3 (L), \end{aligned}$$


$$\begin{aligned} w_3(L)&= \left( \frac{1}{2\eta }\right) ^2 \left( 1+2\eta \frac{1-\chi ^{L+1}}{1-\chi } \right) ^{2\left( 1+\frac{1}{2\eta }\frac{1-\chi }{1-\chi ^{L+1}}\right) }. \end{aligned}$$

Optimal number of levels

The optimal integer L seems impossible to find analytically. In practical computations we instead perform an extensive search over a small range of integer values. In the analysis below we treat L as a real parameter to obtain the bounds (2.20) that delimit the range of integer values that must be tested, and allow a complexity analysis as \({\mathrm {TOL}}\rightarrow 0\) without an exactly determined L.

Treating L as a real parameter, we differentiate the work (5.19) with respect to L to obtain

$$\begin{aligned} \frac{\partial W}{\partial L}&= \frac{\partial w_1}{\partial L} \, w_2 \, w_3 + w_1 \, \frac{\partial w_2}{\partial L} \, w_3 + w_1 \, w_2 \, \frac{\partial w_3}{\partial L}, \end{aligned}$$

where, introducing the shorthand

$$\begin{aligned} \xi (L) = 2\eta \frac{1-\chi ^{L+1}}{1-\chi }\quad \text {for } L\in [0,\infty ), \end{aligned}$$

and using the constants \(c_1\) and \(c_2\) in (2.21) we write

$$\begin{aligned} \frac{\partial w_1}{\partial L}&= w_1(L,{\mathrm {TOL}})\,\frac{\log {(\chi )}\,\chi ^{L+1}}{1-\chi ^{L+1}} \frac{2}{\xi (L)} \log {\left( {\mathrm {TOL}}^{-1}\right) }, \end{aligned}$$
$$\begin{aligned} \frac{\partial w_2}{\partial L}&= w_2(L)\, \frac{\log {(\chi )}\,\chi ^{L+1}}{1-\chi ^{L+1}} \frac{2}{\xi (L)} \left( c_1 + -c_2(L+1) + \xi (L) \right) , \end{aligned}$$
$$\begin{aligned} \frac{\partial w_3}{\partial L}&= w_3(L)\,\frac{\log {(\chi )}\,\chi ^{L+1}}{1-\chi ^{L+1}} \frac{2}{\xi (L)} \big ( \log {\left( 1+\xi (L)\right) }-\xi (L) \big ), \end{aligned}$$

so that

$$\begin{aligned} \frac{\partial W}{\partial L}(L,{\mathrm {TOL}})&= u(L,{\mathrm {TOL}})v(L,{\mathrm {TOL}}), \end{aligned}$$


$$\begin{aligned} u(L,{\mathrm {TOL}})&= W{(L,{\mathrm {TOL}})}\, \frac{\log {(\chi )}\,\chi ^{L+1}}{1-\chi ^{L+1}} \frac{2}{\xi (L)}, \\ v(L,{\mathrm {TOL}})&= \log {\left( {\mathrm {TOL}}^{-1}\right) } + c_1 + -c_2 (L+1) + \log {\left( 1+\xi (L)\right) }. \end{aligned}$$

Clearly \(u(L,{\mathrm {TOL}})<0\) for all \(\chi \in {\mathbb {R}}_+{\setminus }\{1\}\) so the sign of \(\partial W/\partial L\) is the opposite of the sign of \(v(L,{\mathrm {TOL}})\). For a fixed \(\chi \in {\mathbb {R}}_+{\setminus }\{1\}\) we have

$$\begin{aligned} v(L,{\mathrm {TOL}}) > 0&\Leftrightarrow L+1 < \frac{1}{c_2} \left( \log {\left( {\mathrm {TOL}}^{-1}\right) }+c_1+\log { \left( 1+\xi (L)\right) }\right) , \end{aligned}$$

and, since \(\xi (L)\ge \xi (0)=2\eta \),

$$\begin{aligned} L+1 < \frac{1}{c_2} \left( \log {\left( {\mathrm {TOL}}^{-1}\right) }+c_1+\log {\left( 1+2\eta \right) }\right)&\Rightarrow v(L,{\mathrm {TOL}}) > 0 \Leftrightarrow \frac{\partial W}{\partial L} < 0. \end{aligned}$$

For the opposite inequality,

$$\begin{aligned} v(L,{\mathrm {TOL}}) < 0&\Leftrightarrow L+1 > \frac{1}{c_2} \left( \log {\left( {\mathrm {TOL}}^{-1}\right) }+c_1+\log {\left( 1+\xi (L) \right) }\right) , \end{aligned}$$

we distinguish between the cases \(0<\chi <1\) and \(1<\chi \). When \(0<\chi <1\) we have the upper bound \(\xi (L)<\frac{2\eta }{1-\chi }\) and consequently

$$\begin{aligned} L+1 > \frac{1}{c_2} \left( \log {\left( {\mathrm {TOL}}^{-1}\right) }+c_1+\log {\left( 1+ \frac{2\eta }{1-\chi }\right) }\right)&\Rightarrow \frac{\partial W}{\partial L} > 0,&\chi \in (0,1). \end{aligned}$$

In contrast \(\xi (L)\) is unbounded when \(1<\chi \) but, since the definitions of \(\chi \) and \(\eta \) and the relation between strong and weak convergence orders implies that \(2\eta \ge \chi \), we have

$$\begin{aligned} \log {\left( 1+\xi (L)\right) }&< \log {\left( \frac{2\eta }{\chi -1}\right) } + (L+1)\log {\chi }, \end{aligned}$$


$$\begin{aligned} c_2&\ge \frac{\chi }{\chi -1}\log {\chi }, \end{aligned}$$

which gives the bound

$$\begin{aligned} \frac{1}{c_2} \log {\left( 1+\xi (L)\right) }&< \frac{\chi -1}{\chi }(L+1) + \frac{1}{c_2} \log {\left( \frac{2\eta }{\chi -1}\right) }. \end{aligned}$$


$$\begin{aligned} L+1 - \frac{1}{c_2} \log {\left( 1+\xi (L)\right) }&> \frac{L+1}{\chi } - \frac{1}{c_2} \log {\left( \frac{2\eta }{\chi -1}\right) }, \end{aligned}$$

and it follows that

$$\begin{aligned} L+1 > \frac{\chi }{c_2} \left( \log {\left( {\mathrm {TOL}}^{-1}\right) }+c_1+\log {\left( \frac{2\eta }{\chi -1}\right) }\right)&\Rightarrow \frac{\partial W}{\partial L} > 0,&\chi \in (1,\infty ). \end{aligned}$$

Combining (5.24) with (5.25) and (5.26), we obtain the bounds (2.20).

Optimal hierarchies with an upper bound on \(h_0\) Practical computations will impose an upper limit on the mesh sizes, \(h_0\le h_{{\mathrm {max}}}\). If the mesh sizes (2.19a) violate such a bound, we must modify our analysis slightly. We now consider \(h_0\) given as one of the coarsest mesh sizes that can be realized in the given discretization, and analyze the case \(L\ge 1\). Using the optimal mesh sizes (5.16) yields

$$\begin{aligned} \sqrt{\frac{h_{\ell -1}^{q_2}}{h_\ell ^{d\gamma }}}&= h_0^{\frac{d\gamma }{2}\chi ^L\frac{1-\chi }{1-\chi ^L}} h_L^{-\frac{d\gamma }{2}\frac{1-\chi }{1-\chi ^L}} \chi ^{\left( \frac{L}{1-\chi ^L}-\frac{\chi }{1-\chi } -\ell \right) }, \end{aligned}$$

where the only \(\ell \)-dependent factor in the right hand side is the last one, \(\chi ^{-\ell }\), so that the sum in (5.4) is

$$\begin{aligned} \sum _{\ell =1}^L \sqrt{\frac{h_{\ell -1}^{q_2}}{h_\ell ^{d\gamma }}}&= \left( \frac{h_0^{\left( \chi ^L\right) }}{h_L}\right) ^{\frac{d\gamma }{2}\frac{1-\chi }{1-\chi ^L}} \chi ^{\left( \frac{L\chi ^L}{1-\chi ^L}-\frac{\chi }{1-\chi }\right) } \frac{1-\chi ^L}{1-\chi }. \end{aligned}$$

In this sum only \(h_L\) depends on \(\theta \) through (5.3). Keeping L fixed we wish to minimize the total work, which by (5.1)–(5.2) is

$$\begin{aligned} W({\mathbf {H}})&= \left( \frac{C_\alpha }{ \theta {\mathrm {TOL}}} \right) ^2 \left( \sqrt{\frac{{V_0}}{h_0^{d\gamma }}} + \sqrt{Q_S} \left( \frac{h_0^{\left( \chi ^L\right) }}{h_L}\right) ^ {\frac{d\gamma }{2}\frac{1-\chi }{1-\chi ^L}} \chi ^{\left( \frac{L\chi ^L}{1-\chi ^L}-\frac{\chi }{1-\chi }\right) } \frac{1-\chi ^L}{1-\chi } \right) ^2, \end{aligned}$$

with respect to \(\theta \). Letting

$$\begin{aligned} \varDelta&= \frac{1}{2\eta }\frac{1-\chi }{1-\chi ^L}, \end{aligned}$$


$$\begin{aligned} C&= \sqrt{\frac{Q_S}{{V_0}}} h_0^{\frac{d\gamma }{2}\frac{1-\chi ^{L+1}}{1-\chi ^L}} \chi ^{\left( \frac{L\chi ^L}{1-\chi ^L}-\frac{\chi }{1-\chi }\right) } \frac{1-\chi ^L}{1-\chi } \left( \frac{Q_W}{{\mathrm {TOL}}}\right) ^\varDelta , \end{aligned}$$

we obtain

$$\begin{aligned} W({\mathbf {H}})&\propto \tilde{f}\left( \theta ,L,h_0\right) = \frac{1}{\theta ^2}\left( 1+\frac{C}{\left( 1-\theta \right) ^\varDelta } \right) ^2, \end{aligned}$$

with the optimality condition

$$\begin{aligned} \frac{\partial \tilde{f}}{\partial \theta }&= \frac{2}{\theta ^2}\left( 1+\frac{C}{\left( 1-\theta \right) ^\varDelta }\right) \left( \frac{C\varDelta }{\left( 1-\theta \right) ^{\varDelta +1}} -\frac{1}{\theta }\left( 1+\frac{C}{\left( 1-\theta \right) ^\varDelta }\right) \right) = 0, \end{aligned}$$


$$\begin{aligned} \frac{2}{\theta ^2}\left( 1+\frac{C}{\left( 1-\theta \right) ^\varDelta }\right)&> 0. \end{aligned}$$

In this case when \(h_0\) is constrained we no longer have an explicit expression for the optimal \(\theta \). However, using

$$\begin{aligned} \frac{C\varDelta }{\left( 1-\theta \right) ^{\varDelta +1}} -\frac{1}{\theta }\left( 1+\frac{C}{\left( 1-\theta \right) ^\varDelta }\right)&< \frac{C}{\left( 1-\theta \right) ^\varDelta } \left( \frac{\varDelta }{1-\theta }-\frac{1}{\theta }\right) , \end{aligned}$$

and that

$$\begin{aligned} \frac{\varDelta }{1-\theta }-\frac{1}{\theta } = 0&\Leftrightarrow \theta = \frac{1}{1+\varDelta }, \end{aligned}$$

we conclude that the optimal \(\theta \) satisfies

$$\begin{aligned} \frac{1}{1+\varDelta }&\le \theta . \end{aligned}$$

Similarly, from the inequality

$$\begin{aligned} \frac{C\varDelta }{\left( 1-\theta \right) ^{\varDelta +1}} -\frac{1}{\theta }\left( 1+\frac{C}{\left( 1-\theta \right) ^\varDelta }\right)&> \frac{1}{\left( 1-\theta \right) ^\varDelta } \left( \frac{C\varDelta }{1-\theta }-\frac{1+C}{\theta }\right) , \end{aligned}$$

and the relation

$$\begin{aligned} \frac{C\varDelta }{1-\theta }-\frac{1+C}{\theta } = 0&\Leftrightarrow \theta = \frac{1+C}{1+C+\varDelta }, \end{aligned}$$

we obtain an upper bound for \(\theta \), namely

$$\begin{aligned} \theta&\le \frac{1+C}{1+C+C\varDelta }. \end{aligned}$$

Finally, combining (5.27) and (5.28) we have the following bounds for the optimal \(\theta \):

$$\begin{aligned} \left( 1+\frac{1}{2\eta }\frac{1-\chi }{1-\chi ^L}\right) ^{-1} \le \theta \le \left( 1+\frac{1}{2\eta }\frac{1-\chi }{1-\chi ^L}\frac{C}{1+C}\right) ^{-1}, \end{aligned}$$

where the upper bound has a non-trivial dependence on \({\mathrm {TOL}}\) and L through C.

1.2 Appendix 1.2: Heuristic optimization of geometric hierarchies

This section motivates the results in Sect. 2.3 and Corollary 2.3 where we optimized geometric hierarchies defined by \(h_\ell = h_0 {\beta } ^{-\ell }\) for given \(h_0\) and \( {\beta } > 1\). In this case, the work and variance models are in (2.28) and L is must satisfy the bias constraint

$$\begin{aligned} L&\ge \frac{\log {(h_0)} - \frac{1}{q_1}\log {\left( \frac{(1-\theta ) {\mathrm {TOL}}}{Q_W}\right) } }{\log ( {\beta } )}. \end{aligned}$$

We distinguish between two cases:

\(\chi =1\): Or equivalently \(q_2 = d\gamma \). In this case, the total work defined in (5.1) simplifies to

$$\begin{aligned} W= \left( \frac{C_\alpha }{\theta {\mathrm {TOL}}}\right) ^2 \left( \sqrt{V_0} h_0^\frac{ -q_2}{2} + L \sqrt{Q_S} \beta ^{\frac{q_2}{2}} \right) ^2, \end{aligned}$$

We make the simplification of treating L as a real parameter and substitute the lower bound of (5.30) in (5.31) and optimize with respect to \( {\beta } \) to get \( {\beta } =\exp \left( \frac{2}{q_2}\right) \). Substituting this choice and (2.30), the total work satisfies

$$\begin{aligned} \frac{W}{{\mathrm {TOL}}^{-2} \left( \log {{\mathrm {TOL}}} \right) ^2} \rightarrow \theta ^{-2} C_\alpha ^2 Q_S \exp (2) \left( \frac{1}{2\eta }\right) ^2, \qquad \text {as } {\mathrm {TOL}}\rightarrow 0. \end{aligned}$$

Optimizing for \(\theta \) suggests that \(\theta \rightarrow 1\) as \({\mathrm {TOL}}\rightarrow 0\) and (2.18) follows.

\(\chi \ne 1\): In this case, the total work defined in (5.1) simplifies to

$$\begin{aligned} W= \left( \frac{C_\alpha }{\theta {\mathrm {TOL}}} \right) ^2 h_0^{d\gamma (\chi -1)} \left( \sqrt{V_0} h_0^\frac{-q_2}{2} + \sqrt{Q_S} \frac{ \left( 1- {\beta } ^\frac{L(d\gamma -q_2)}{2} \right) }{ {\beta } ^{-\frac{d\gamma }{2}} - {\beta } ^{-\frac{q_2}{2}}} \right) ^2, \end{aligned}$$

for a given \(L, h_0\) and \(\theta \). Again, we make the simplification of treating L as a real parameter and substitute the lower bound (5.30) to obtain

$$\begin{aligned} {\beta } ^\frac{L(d\gamma -q_2)}{2} = \left( \frac{(1-\theta ){\mathrm {TOL}}}{Q_W} \right) ^\frac{\chi -1}{2\eta } h_0^\frac{d\gamma (1-\chi )}{2}, \end{aligned}$$

for any \( {\beta } \). Substituting back in (5.32) and optimizing with respect to \( {\beta } \) to minimize the work gives (2.29). Substituting this optimal \( {\beta } \) in (5.32) yields

$$\begin{aligned} W= \left( \frac{C_\alpha }{\theta {\mathrm {TOL}}} \right) ^2 h_0^{d\gamma (\chi -1)} \left( \sqrt{V_0} h_0^\frac{-q_2}{2} + \sqrt{Q_S} \frac{\chi ^\frac{\chi }{\chi -1}}{\chi -1} \left( 1 - \chi ^{-L}\right) \right) ^2. \end{aligned}$$

Asymptotically, using (2.30) as \({\mathrm {TOL}}\rightarrow 0\) yields (2.23) with the following constants

$$\begin{aligned} C_1&= (1-\theta )^\frac{\chi -1}{\eta }\theta ^{-2} C_\alpha ^2 Q_W^\frac{1-\chi }{\eta } Q_S \left( \frac{\chi ^\frac{\chi }{\chi -1}}{\chi -1} \right) ^2, \end{aligned}$$
$$\begin{aligned} C_2&= \theta ^{-2} C_\alpha ^2 h_0^{d\gamma (\chi -1)} \left( \sqrt{V_0} h_0^\frac{-q_2}{2} + \sqrt{Q_S} \frac{\chi ^\frac{\chi }{\chi -1}}{\chi -1} \right) ^2. \end{aligned}$$

Optimizing these constants with respect to \(\theta \) yields (2.25) and substituting this and (2.32) back yields (2.24a) and (2.31) for \(C_1\) and \(C_2\), respectively. This, as Remark 2.6 mentions, shows that the asymptotic computational complexities of optimal non-geometric and geometric hierarchies are the same.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Haji-Ali, AL., Nobile, F., von Schwerin, E. et al. Optimization of mesh hierarchies in multilevel Monte Carlo samplers. Stoch PDE: Anal Comp 4, 76–112 (2016).

Download citation

  • Received:

  • Published:

  • Issue Date:

  • DOI:


  • Multilevel Monte Carlo
  • Monte Carlo
  • Partial differential equations with random data
  • Stochastic differential equations
  • Optimal discretization

Mathematics Subject Classification

  • 65C05
  • 65N30
  • 65N22