A randomized multi-index sequential Monte Carlo method

We consider the problem of estimating expectations with respect to a target distribution with an unknown normalizing constant, where even the unnormalized target can only be approximated at finite resolution. Under this assumption, the present work builds upon a recently introduced multi-index sequential Monte Carlo (SMC) ratio estimator, which provably enjoys the complexity improvements of multi-index Monte Carlo (MIMC) and the efficiency of SMC for inference. We leverage a randomization strategy to remove bias entirely, which simplifies estimation substantially, particularly in the MIMC context, where the choice of index set is otherwise important. Under reasonable assumptions, the proposed method provably achieves the same canonical complexity of MSE^{-1} as the original method (where MSE is mean squared error), but without discretization bias. It is illustrated on examples of Bayesian inverse and spatial statistics problems.


Introduction
We consider the computational approach to Bayesian inverse problems [49], which has attracted a lot of attention in recent years. One typically requires the expectation of a quantity of interest ϕ(x), where the unknown parameter x ∈ X ⊂ R^d has posterior probability distribution π(x), as given by Bayes' theorem (1). Assuming the target distribution cannot be computed analytically, we instead compute the expectation as π(ϕ) = (1/Z) ∫_X ϕ(x) f(x) π_0(dx), where Z = ∫_X f(x) π_0(dx), f is prescribed up to a normalizing constant, π_0 is the prior and π(x) ∝ f(x)π_0(x). Markov chain Monte Carlo (MCMC) [17,47,2,8] and sequential Monte Carlo (SMC) [7,11] are two methodologies which can be used to achieve this. In this paper, we use the degenerate notations dπ(x) = π(dx) = π(x)dx to mean the same thing, i.e. the probability under π of an infinitesimal volume element dx (Lebesgue measure by default) centred at x. Standard Monte Carlo methods can be costly, particularly when the problem involves approximation of an underlying continuum domain, a problem setting which is becoming progressively more prevalent over time [50,8,49,33,54,40]. The multilevel Monte Carlo (MLMC) method was developed to reduce the computational cost in this setting, by performing most simulations with low accuracy and low cost [18], and successively refining the approximation with corrections that use fewer simulations of higher cost but lower variance. The MLMC approach has recently attracted a lot of interest for inference problems, for example MLMCMC [14,24] and MLSMC [3,4,38]. The related multi-fidelity Monte Carlo methods often focus on the case where the models lack structure and quantifiable convergence behaviour [41,5], which is very common across science and engineering applications. It is worthwhile to note that MLMC methods can be implemented on the same class of problems, and do not require structure or a priori convergence estimates in order to be implemented. However, convergence rates provide a convenient mechanism to deliver quantifiable theoretical results. This is the case for both multilevel and multi-fidelity approaches.
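To make the multilevel idea concrete, the following minimal sketch (our own toy illustration, not code from the paper; the Gaussian toy model, bias/variance decay and sample counts are assumptions) estimates E[Y] by the telescoping sum of level corrections, spending most samples on the cheap coarse level:

```python
import random

def mlmc_estimate(sample_level, L, N):
    """Plain MLMC telescoping-sum estimator (illustrative sketch).

    sample_level(l) returns one coupled draw (Y_l, Y_{l-1}) of the quantity
    of interest at levels l and l-1 (with the convention Y_{-1} := 0).
    N[l] is the number of samples used for the level-l correction.
    """
    est = 0.0
    for l in range(L + 1):
        incs = [sample_level(l) for _ in range(N[l])]
        est += sum(yf - yc for yf, yc in incs) / N[l]
    return est

# Toy model: level l approximates E[Y] = 1 with bias 2^-l; the shared noise
# variable z makes consecutive levels strongly coupled.
def sample_level(l):
    z = random.gauss(0.0, 1.0)
    fine = 1.0 - 2.0 ** (-l) + 2.0 ** (-l) * z
    coarse = (1.0 - 2.0 ** (-(l - 1)) + 2.0 ** (-(l - 1)) * z) if l > 0 else 0.0
    return fine, coarse

random.seed(0)
print(mlmc_estimate(sample_level, L=5, N=[4000, 2000, 1000, 500, 250, 125]))
```

Because the coupled increments shrink geometrically, the correction levels need far fewer samples than the base level, which is the source of the cost reduction.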
Recently, an extension of the MLMC method has been established, called multi-index Monte Carlo (MIMC) [21]. Instead of using first-order differences, MIMC uses high-order mixed differences to reduce the variance of the hierarchical differences dramatically. MIMC was first considered in the inference context in [9] and [27,29]. The state of the art for MIMC in inference is presented in [34], in which the MISMC ratio estimator for posterior inference is given, and the theoretical convergence rate of this estimator is guaranteed. Although a canonical complexity of MSE^{-1} can be achieved for the MISMC ratio estimator, it still suffers from discretization bias. This bias constrains the choice of index set, and estimators of the bias often suffer from high variance, which means implementation can be cumbersome for precisely the challenging problems where the method is otherwise expected to be particularly advantageous.
Debiasing techniques were first introduced in [43,44], [36] and [48], with many more works using or developing them further [1,19,25,35,56]. These debiasing techniques are based on an idea similar to MLMC, but in addition to reducing the estimator variance, they focus on building unbiased estimators. The connection between debiasing techniques and the MLMC method has been pointed out by [12], [18] and [44]. [55] has further clarified the connection within a general framework for unbiased estimators. The first work to combine debiasing techniques and MLMC in the context of inference is [6]. A recent breakthrough involves using double randomization strategies to remove the bias of the increment estimator [23,28,30].
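The single-term debiasing idea of [43,44] can be sketched as follows (an illustrative toy of ours, with a geometric level distribution chosen for the example): draw a random level L ~ p and return the level-L increment weighted by 1/p_L, which is unbiased for the telescoping sum whenever that sum converges and the weighted second moments are summable.

```python
import random

def single_term_estimator(increment, p):
    """Rhee-Glynn style single-term debiased estimator (sketch).

    p is a list of probabilities over levels 0..len(p)-1 summing to 1.
    Returns increment(L) / p[L] for a randomly drawn level L ~ p, which is
    unbiased for sum_l increment(l) over the supported levels.
    """
    u, l, cum = random.random(), 0, p[0]
    while u > cum and l < len(p) - 1:
        l += 1
        cum += p[l]
    return increment(l) / p[l]

# Toy check: increment(l) = 2^-l, so the debiased target is the geometric
# sum over the supported levels (close to 2 for 10 levels).
weights = [2.0 ** (-1.5 * l) for l in range(10)]
total = sum(weights)
p = [w / total for w in weights]
random.seed(0)
est = sum(single_term_estimator(lambda l: 2.0 ** -l, p) for _ in range(20000)) / 20000
print(est)
```

The choice of p trades variance against expected cost; levels must be sampled often enough that the 1/p_L weighting does not blow up the second moment.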
The starting point of our current work is the MISMC ratio estimator introduced in [34]. Our new randomized MISMC (rMISMC) ratio estimator will be reformulated in the framework of [44] to remove discretization bias entirely. Like the MISMC ratio estimator, our estimator provably enjoys the complexity improvements of MIMC and the efficiency of SMC for inference. Theoretical results will be given to show that it achieves the canonical complexity of MSE^{-1} under appropriate assumptions, but without any discretization bias and the consequent requirements for its estimation. From a practical perspective, estimating this bias, and balancing it along with the variance and cost in order to select the index set, comprises a significant overhead for existing multi-index methods. In addition to convenience and simplification, the particular formulation of our un-normalized estimators is novel, and may prove useful in other contexts where one cannot obtain i.i.d. samples from the increments. The unbiased estimators of the normalizing constant and un-normalized integral can also be useful in their own right, in the context of Robbins-Monro [46] or other stochastic approximation algorithms [31,32,28].
The paper is organized as follows. In Section 2, we present the motivating problems considered in the subsequent numerical experiments. In Section 3, the original MISMC ratio estimator is reviewed for convenience, and the rMISMC ratio estimator and its theoretical results are stated. In Section 4, we apply the MISMC and rMISMC methods to Bayesian inverse problems for elliptic PDEs and to log Gaussian process models.

Motivating problems
Here, we introduce Bayesian inference for a D-dimensional elliptic partial differential equation and for two statistical models, the log Gaussian Cox model and the log Gaussian process model. We will apply the methods presented in Section 3 to these motivating problems in order to demonstrate their efficacy.

Elliptic partial differential equation
We consider a D-dimensional elliptic partial differential equation defined over an open domain Ω ⊂ R^D with locally continuous boundary ∂Ω, i.e. the boundary is the graph of a continuous function in a neighbourhood of any point. Given a forcing function f(x) : Ω → R and a diffusion coefficient function a(x) : Ω → R, depending on a random variable x ∼ π, the partial differential equation for u(x) : Ω̄ → R (where Ω̄ is the closure of Ω) is given by −∇·(a(x)(z)∇u(x)(z)) = f(x)(z) for z ∈ Ω, with u(x)(z) = 0 for z ∈ ∂Ω (2). The dependence of the solution u of (2) on x arises from the dependence of a and f on x.
In particular, in the numerical experiments we assume the prior distribution (3) and take a(x)(z) = a_0 + ∑_{i=1}^d x_i ψ_i(z) (4), where the ψ_i are smooth functions with ‖ψ_i‖_∞ := sup_{z∈Ω} |ψ_i(z)| ≤ 1 for i = 1, ..., d, and a_0 > ∑_{i=1}^d x_i.
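As a concrete 1D analogue of model (2) (a sketch under our own discretization choices; the paper's experiments use a FEM discretization, cf. Section 4), one can solve −(a(z)u′(z))′ = f(z) on (0, 1) with homogeneous Dirichlet conditions by a flux-form finite-difference scheme:

```python
import numpy as np

def solve_elliptic_1d(a, f, m):
    """Solve -(a(z) u'(z))' = f(z) on (0,1), u(0) = u(1) = 0, by finite
    differences on m interior points (illustrative 1D analogue of (2)).
    a and f are vectorized callables on arrays of grid points."""
    h = 1.0 / (m + 1)
    z = np.linspace(0.0, 1.0, m + 2)
    a_half = a(0.5 * (z[:-1] + z[1:]))          # diffusion at cell midpoints
    A = np.zeros((m, m))
    for i in range(m):
        A[i, i] = a_half[i] + a_half[i + 1]     # flux-conservative stencil
        if i > 0:
            A[i, i - 1] = -a_half[i]
        if i < m - 1:
            A[i, i + 1] = -a_half[i + 1]
    rhs = h * h * f(z[1:-1])
    return z, np.concatenate(([0.0], np.linalg.solve(A, rhs), [0.0]))
```

With a ≡ 1 and f ≡ 1 the exact solution is u(z) = z(1 − z)/2, which this stencil reproduces at the nodes; refining m plays the role of increasing one component of the resolution multi-index α.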

The Bayesian inverse problem
Under an elliptic partial differential equation model, we wish to infer the unknown parameter value x ∈ X ⊂ R^d given n evaluations of the solution, y ∈ R^n [49]. We aim to analyse the posterior distribution P(x|y) with density π(x) = π(x|y). In practice, one can only expect to evaluate a discretized version of x ∈ X. π(dx) can then be obtained up to a constant of proportionality by applying Bayes' theorem: π(dx) ∝ L(x)π_0(dx), where π_0(dx) is the prior distribution and L(x) is the likelihood, which is proportional to the probability density that the data y were generated with a given value of the unknown parameter x. Define the vector-valued function G(x) = (v_1(u(x)), ..., v_n(u(x)))^T, where n is the number of data, v_i ∈ L^2 and v_i(u(x)) = ∫_Ω v_i(z)u(x)(z)dz for i = 1, ..., n. Then the data can be modelled as y = G(x) + η, η ∼ N(0, Σ), where N(0, Σ) denotes the Gaussian distribution with mean zero and variance-covariance matrix Σ. The likelihood of the evaluations y can then be derived as L(x) ∝ exp(−Φ(x; y)), where Φ(x; y) = ½(y − G(x))^T Σ^{-1}(y − G(x)). When the elliptic PDE can only be solved approximately, we denote the approximate solution at resolution multi-index α by u_α, as described above; the approximate likelihood L_α is obtained by replacing u(x) with u_α(x), and the posterior density is given by π_α(x) ∝ L_α(x)π_0(x).
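The Gaussian log-likelihood above can be evaluated, up to an additive constant, in a couple of lines (a minimal sketch; the function and argument names are ours):

```python
import numpy as np

def log_likelihood(y, Gx, Sigma):
    """Gaussian log-likelihood log L(x) up to an additive constant, for data
    modelled as y = G(x) + eta, eta ~ N(0, Sigma). Gx is the vector
    G(x) = (v_1(u(x)), ..., v_n(u(x)))."""
    r = y - Gx
    return -0.5 * float(r @ np.linalg.solve(Sigma, r))
```

Evaluating this at the approximate solution u_α simply means passing G_α(x) in place of G(x); the additive constant cancels in any ratio of normalized quantities.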

Log Gaussian process models
Now, we consider the log Gaussian Cox model and the log Gaussian process model. A log Gaussian process (LGP) is given by Λ(z) = exp(x(z)), where x = {x(z) : z ∈ Ω ⊂ R^D} is a real-valued Gaussian process [42,49]. The log Gaussian process model provides a flexible approach to non-parametric density modelling with controllable smoothness properties. However, inference for the LGP is intractable. The LGP model for density estimation [53] assumes data z_i ∼ p, where p(z) = Λ(z)/∫_Ω Λ(z)dz. As such, the likelihood of x associated to observations Z = {z_1, ..., z_n} is given by L(x) = ∏_{i=1}^n Λ(z_i) / (∫_Ω Λ(z)dz)^n. The log Gaussian Cox (LGC) model assumes the observations are distributed according to a spatially inhomogeneous Poisson point process with intensity function Λ. The likelihood of observing Z = {z_1, ..., z_n} under the LGC model is [37,39,34,5] L(x) = exp(−∫_Ω Λ(z)dz) ∏_{i=1}^n Λ(z_i). This construction has an elegant simplicity, which is flexible and convenient due to the underlying Gaussian process. Some example applications are presented in [13].
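On a computational grid, the two log-likelihoods can be sketched as follows (our own minimal discretization, anticipating the quadrature approximation described in the finite approximation subsection below; the cell-indexing convention is an assumption):

```python
import numpy as np

def lgc_loglik(x, points_idx, cell_area):
    """Log-likelihood of the log Gaussian Cox model on a grid (sketch).
    x holds the latent field on grid cells, points_idx indexes the cells
    containing the observed points, and cell_area weights the quadrature
    approximation of the integral of Lambda = exp(x)."""
    return float(x[points_idx].sum() - cell_area * np.exp(x).sum())

def lgp_loglik(x, points_idx, cell_area, n):
    """Log-likelihood of the LGP density model (sketch): each of the n
    points has density exp(x) / integral(exp(x))."""
    return float(x[points_idx].sum() - n * np.log(cell_area * np.exp(x).sum()))
```

The two models differ only in how the integral of the intensity enters: subtracted directly (Poisson process) versus through an n-fold log-normalization (density model).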
We consider a dataset comprised of the locations of n = 126 Scots pine saplings in a natural forest in Finland [37], denoted z_1, ..., z_n ∈ [0, 1]^2. This is modelled with both the LGC model, following [22], and the LGP model, following [53]. The prior is defined in terms of a KL-expansion with a suitable parameter θ, where CN(0, 1) denotes a standard complex normal distribution. The coefficient r controls the smoothness, and here we choose r = 1.6. Note that the periodic prior measure is defined on [0, 2]^2 so that no boundary conditions are imposed on the sub-domain [0, 1]^2. The posterior distribution is then given by π(dx) ∝ L(x)π_0(dx), where π_0 is constructed in (20) and L(x) is constructed in (19) (or (18)).

The finite approximation problem
One typically uses a grid-based approximation for inference in the LGC model [39,13,51,5] and in the LGP model [45,20,52]. We approximate the likelihoods and priors of LGC and LGP via the fast Fourier transform (FFT), as described below. First, we truncate the KL-expansion of the prior at an index α, as in (22)-(23). The finite approximations of the likelihoods of LGC and LGP are then defined by substituting x̃_α for x in (18) and (19), where x̃_α(z) is defined as an interpolant over the grid output from the FFT and Q denotes a quadrature rule such that Q(exp(x_α)) ≈ ∫_{[0,1]^2} exp(x(z))dz. The finite approximations of the posterior distributions of LGC and LGP are defined accordingly. The quantity of interest for these models will be ϕ(x) = ∫_{[0,1]^2} exp(x(z))dz, and we will estimate its expectation π(ϕ) = E(ϕ(x)|z_1, ..., z_n).
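A minimal sketch of FFT-based synthesis of a truncated prior sample, together with the quadrature for the quantity of interest, is given below (the decay profile and normalization are our assumptions, modelled loosely on the KL-expansion (20)-(21), not the paper's exact spectrum):

```python
import numpy as np

def sample_truncated_prior(alpha, r=1.6, rng=None):
    """Draw one sample of a truncated spectral (KL-type) expansion on a
    2^alpha x 2^alpha Fourier grid (sketch; the decay exponent r and the
    FFT synthesis are illustrative assumptions)."""
    if rng is None:
        rng = np.random.default_rng()
    m = 2 ** alpha
    k = np.fft.fftfreq(m, d=1.0 / m)            # integer Fourier frequencies
    K1, K2 = np.meshgrid(k, k, indexing="ij")
    decay = (1.0 + K1 ** 2 + K2 ** 2) ** (-r / 2.0)
    xi = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    return (np.fft.ifft2(decay * xi) * m).real   # synthesis on the grid

def quantity_of_interest(x_grid):
    """Quadrature rule Q for phi(x) = integral of exp(x) over the unit
    square, using equal-weight cells."""
    return float(np.exp(x_grid).mean())
```

Increasing α refines the grid and retains more Fourier modes simultaneously, which is exactly the role the resolution multi-index plays in the multi-index hierarchy.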

Randomized Multi-index sequential Monte Carlo
The original MISMC estimator was considered in [9] and [27,29]. Convergence guarantees were established in [34], which demonstrates the importance of selecting a reasonable index set by comparing results for the tensor product index set and the total degree index set. We now introduce a very interesting extension of multi-index sequential Monte Carlo, called randomized multi-index sequential Monte Carlo. The basic methodology of randomized multi-index Monte Carlo was first introduced in [43,44]. Instead of fixing an index set in advance, we choose α randomly from a distribution. A further advantage of this approach is that it yields an unbiased unnormalized estimator, which is discretization-free.
Define the target distribution as π(x) = f(x)/Z, where Z = ∫_X f(x)dx and f(x) := L(x)π_0(x). Given a quantity of interest ϕ : X → R, for simplicity we define f(ϕ) := ∫_X ϕ(x)f(x)dx, so that f(1) = ∫_X f(x)dx = Z. Define their approximations at finite resolution α by f_α(ϕ) := ∫_X ϕ(x)f_α(x)dx. Consider the ratio decomposition π(ϕ) = f(ϕ)/f(1) = (∑_{α∈Z_+^D} Δf_α(ϕ))/(∑_{α∈Z_+^D} Δf_α(1)), where Δ is the first-order mixed difference operator, defined recursively via the first-order difference operators Δ_i along each direction: Δ := Δ_1 ∘ ⋯ ∘ Δ_D, with Δ_i g(α) := g(α) − g(α − e_i) for α_i > 0 and Δ_i g(α) := g(α) if α_i = 0, where e_i is the i-th canonical vector in R^D, i.e. (e_i)_j = 1 for j = i and 0 otherwise. For convenience, we denote the vector of multi-indices (α_1(α), ..., α_{2^D}(α)), where α_1(α) = α, α_{2^D}(α) = α − ∑_{i=1}^D e_i, and α_i(α) for 1 < i < 2^D correspond to the intermediate multi-indices arising in the computation of the mixed difference operator Δ.
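The bookkeeping of corner multi-indices and signs in the mixed difference Δ can be sketched in a few lines (function names are ours; the boundary convention Δ_i g(α) = g(α) when α_i = 0 is implemented by dropping corners with a negative component):

```python
from itertools import product

def mixed_difference_terms(alpha):
    """Enumerate the corner multi-indices and signs in the first-order
    mixed difference: Delta g(alpha) = sum over s in {0,1}^D of
    (-1)^{|s|} g(alpha - s), dropping corners with a negative component
    (which realises Delta_i g(alpha) = g(alpha) when alpha_i = 0)."""
    D = len(alpha)
    terms = []
    for s in product((0, 1), repeat=D):
        corner = tuple(a - si for a, si in zip(alpha, s))
        if min(corner) < 0:
            continue
        terms.append((corner, (-1) ** sum(s)))
    return terms
```

For D = 2 and α = (1, 1) this returns the four corners (1,1), (0,1), (1,0), (0,0) with signs +, −, −, +, i.e. the familiar 2D mixed difference.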
Throughout this section C > 0 is a constant whose value may change from line to line.
In order to provide estimates analogous to the variance rate in MIMC [21], we use the SMC sampler [7,11] to compute the increments. We hence adapt Algorithm 1 to an extended target which is an approximate coupling of the actual targets, as in [26,27,29], [9] and [16], and utilize a ratio of estimates, similar to [16]. To this end, we define a likelihood on the coupled space as in (34); the approximate coupling is defined by (35). Example 3.1 (Approximate Coupling). Let D = 2, d = 1 and α = (1, 1); an explicit example of the approximate coupling constructed in (32), (34) and (35) can then be written out over the four corner indices α_1(α), ..., α_4(α). For our choice of prior coupling (32), we effectively have a single distribution for x ∈ X. Note that any suitable prior which preserves the marginals as in (33) is admissible.
Assumption 3.1. Let J ∈ N be given, and let X be a Banach space. For each j ∈ {1, ..., J} there exists some C > 0 such that, for all (α, x) ∈ Z_+^D × X, the corresponding bound holds. We then have the following convergence result [10].
Proposition 3.1. Assume 3.1. Then for any (J, p) ∈ N × (0, ∞) there exists a C > 0 such that the stated moment bound holds for any N ∈ N and any bounded and measurable ψ : X^{2^D} → R. In addition, the estimator is unbiased: E[F_α^N(ψ)] = F_α(ψ). Now we define the function ψ with respect to an arbitrary test function ζ_α, as in (38). Note that the signs of the terms in the mixed difference alternate with the parity of the corner index. Following from Proposition 3.1, the increment estimator is unbiased, and there is a C > 0 controlling its variance. Now, given I ⊂ Z_+^D, {N_α}_{α∈I} and ϕ : X → R, for each α run an independent SMC sampler as in Algorithm 1 with N_α samples, and define the MIMC ratio estimator (42), where Z_min is a lower bound on Z.
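The final ratio combination, with the Z_min floor on the denominator, amounts to the following arithmetic (a sketch; the names and toy numbers are ours, not from the paper):

```python
def mismc_ratio(increments_phi, increments_one, z_min):
    """Combine per-index SMC increment estimates into a ratio estimator of
    the type (42) (sketch). increments_phi maps each multi-index alpha to
    an estimate of Delta f_alpha(phi); increments_one likewise estimates
    Delta f_alpha(1). The denominator estimate of Z is floored at z_min."""
    num = sum(increments_phi.values())
    den = max(sum(increments_one.values()), z_min)
    return num / den
```

Flooring the denominator at the known lower bound Z_min guards the ratio against a small or negative normalizing-constant estimate, at the price of a controlled perturbation.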
A finer analysis than that of Proposition 3.1, required in order to achieve rigorous MIMC complexity results, is provided by Theorem 3.1, given in [34].
Theorem 3.1. Assume 3.1. Then for any J ∈ N there exists a C > 0 such that the stated bound holds for any N ∈ N, any bounded and measurable ψ : X^{2^D} → R, and any multi-index α, where ψ_{ζ_α}(x) is as in (38).
Proof. The result is proven in [34].

Theoretical results
The following standard MISMC assumptions will be made.
Proof. The proof is given in Appendix A.1.
Now that unbiasedness has been established, the next step is to examine the variance.
Proposition 3.3. Assume 3.1 and 3.2, let ζ : X → R, and suppose p_α > 0 for every multi-index α. Then the variance of the randomized MISMC estimator (43) is given by (45). In particular, if ∑_α V_α/p_α < ∞, then one has the canonical convergence rate.
Proof. The proof is given in Appendix A.2.
The randomized MISMC ratio estimator is now defined, for ϕ : X → R, by (46), where F^rMI is defined in (43). Before presenting the main result of the present work, we first recall the main results of [34], which are derived from Theorem 3.1. [34] considers two index sets for the original MISMC ratio estimator: the tensor product index set and the total degree index set. Compared to the tensor product index set, the total degree index set abandons some expensive indices and requires much looser conditions in the convergence theorem. The convergence result for the tensor product index set is given in Theorem 3.2 of [34], and that for the total degree index set in Theorem 3.3 of [34].

Theorem 3.2. Assume 3.1 and 3.2, with ∑_{j=1}^D γ_j s_j ≤ 2 and β_i > γ_i for i = 1, ..., D. Then for any ε > 0 and bounded and Lipschitz ϕ : X → R, it is possible to choose a tensor product index set such that COST( ϕ^MI_{I_L}) ≤ Cε^{-2}, the canonical rate.

Proof. The proof is given in [34].

Theorem 3.3. Assume 3.1 and 3.2, with β_i > γ_i for i = 1, ..., D. Then for any ε > 0 and bounded and Lipschitz ϕ : X → R, it is possible to choose a total degree index set such that COST( ϕ^MI_{I_L}) ≤ Cε^{-2}, the canonical rate. The estimator ϕ^MI_{I_L} is defined in equation (42).
Proof. The proof is given in [34].
Theorem 3.4. Assume 3.1 and 3.2 (V,C), with β_i > γ_i for i = 1, ..., D. Then, for bounded and Lipschitz ϕ : X → R, it is possible to choose a probability distribution p on Z_+^D such that, for some C > 0 independent of N, MSE( ϕ^rMI) ≤ C/N and the expected COST( ϕ^rMI) ≤ CN, i.e. the canonical rate. The estimator ϕ^rMI is defined in equation (46).
Proof. The proof is given in Appendix A.3.
The noticeable differences of Theorem 3.4 with respect to Theorems 3.2 and 3.3 are that (i) discretization bias does not appear, so the bias rates of Assumption 3.2 (B) are not required, nor is the constraint related to them shown in Table 1; and (ii) no index set I needs to be selected, since the estimator sums over α ∈ Z_+^D (noting that many of these indices do not get populated).
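For concreteness, one admissible style of sampling distribution (a sketch in the standard randomized-multilevel fashion; the explicit p used in the proof of Appendix A.3 may differ) balances the variance and cost rates of Assumption 3.2: assuming V_α ≤ C 2^{−⟨β,α⟩} and C_α ≤ C 2^{⟨γ,α⟩}, take

```latex
% Geometric sampling distribution over multi-indices (sketch).
p_\alpha \propto 2^{-\langle(\beta+\gamma)/2,\,\alpha\rangle},
\qquad\text{so that}\qquad
\sum_{\alpha\in\mathbb{Z}_+^D}\frac{V_\alpha}{p_\alpha}
\;\lesssim\;\sum_{\alpha}2^{-\langle(\beta-\gamma)/2,\,\alpha\rangle}<\infty,
\qquad
\sum_{\alpha\in\mathbb{Z}_+^D}p_\alpha C_\alpha
\;\lesssim\;\sum_{\alpha}2^{-\langle(\beta-\gamma)/2,\,\alpha\rangle}<\infty.
```

Both sums are geometric and finite precisely when β_i > γ_i for each i, which is the condition appearing in Theorem 3.4.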


Numerical results
The problems considered here are the same as in [34], and we compare our rMISMC ratio estimator with the original MISMC ratio estimator. The code used for the numerical results in this paper can be found at https://github.com/liang

Verification of assumption
Discussions in connection with the required Assumption 3.2 for the 2D PDE and 2D LGP models are revisited here. Verification for the 1D PDE model follows naturally from the discussion of the 2D PDE model. Propositions 4.1, 4.2 and 4.3 and their proofs are given in [34].
We define the mixed Sobolev-like norms as in (47), where A = ∑_{k∈Z²} a_k φ_k ⊗ φ_k for the orthonormal basis {φ_k} defined above (21), and ‖·‖ is the L²(Ω) norm. Note that the approximations of the posteriors of the motivating problems have the form exp(Φ_α(x)) for some Φ_α : X → R.

Proposition 4.1. Let X be a Banach space with D = 2 such that π_0(X) = 1, with norm ‖·‖_X. For all ε > 0, there exists a C(ε) > 0 such that the stated bound holds for Φ_α = log(L_α) given by (25) or (15), respectively.

The variance rates required in Assumption 3.2 (V) for the PDE and LGP models are verified via Propositions 4.2 and 4.3. However, it is difficult to verify the variance rate theoretically for the LGC model: it involves a factor of double exponentials, like exp(∫ exp(x(z))dz), for which the Fernique theorem does not guarantee finiteness. Instead, we verify it numerically, as reported in Appendix B.

Proposition 4.2. Let u_α be the solution of (2) at resolution α, as described in Section 2.1.1, for a(x) given by (4), uniformly over x ∈ [−1, 1]^d, and f(x) ∈ L²(Ω). Then there exists a C > 0 such that the stated bound holds. Since L_α(x) ≤ C < ∞ in the PDE problem by Assumption 3.1, the constant in Proposition 4.1 can be made uniform over x, so the variance rate in Assumption 3.2 is obtained.

Proposition 4.3. Let x ∼ π_0, where π_0 is a Gaussian process of the form (20) with spectral decay corresponding to (21), and let x_α correspond to truncation on the index set (23). Then there is a C > 0 such that the stated bound holds for all q below a threshold determined by β. Furthermore, this rate is inherited by the likelihood, with β_i = β.
The reference solution is computed as in [28] and [34]. From Figure 5, we have rates α = 2 and β = 4. The value of γ is 1, because we use linear nodal basis functions in FEM and a tridiagonal solver.
From Figure 1, the convergence behaviour of rMLSMC and MLSMC is nearly identical, with a convergence rate of approximately −1, the canonical rate. The difference in performance between (r)MLSMC and SMC is the rate of convergence: the rate for SMC is approximately −4/5. At the same total computational cost, the MSE of (r)MLSMC is larger than that of SMC until the cost reaches 10^4. We conclude that (r)MLSMC performs better than SMC in terms of the rate of convergence, as expected.
We consider two index sets in the MISMC approach: the tensor product (TP) index set and the total degree (TD) index set. From Figure 2, MISMC with the two different index sets and rMISMC have similar convergence behaviour, with convergence rate approximately −1. Although we do not show the SMC method in Figure 2, the theoretical convergence rate of SMC drops from −4/5 (1D) to −2/3 (2D); its rate of convergence suffers from the curse of dimensionality. Thus far, the convergence behaviour of MISMC (with TP or TD index set) and rMISMC is similar when applied to the 1D and 2D PDE problems, all achieving the canonical rate, but we will see a difference in the following two statistical models.

Log Gaussian Cox model
Now, we consider the LGC model introduced in Section 2.2. We set the parameters θ = (θ_1, θ_2, θ_3) = (0, 1, (33/π)²) in the LGC model. When using rMISMC and MISMC on the 2D log Gaussian Cox model, we need to set the starting level to α_1 = α_2 = 5, since the asymptotic regularity only appears for α_1 ≥ 5 and α_2 ≥ 5. Further, from Figure 7, we have mixed rates s_1 = s_2 = 0.8 and β_1 = β_2 = 1.6. Since we use the FFT for approximation, one has an asymptotic rate for the cost of γ_i = 1 + ω < β_i, for some ω > 0 and i = 1, 2. The rate of convergence of MISMC TD and rMISMC is approximately −1, and both achieve the canonical complexity of MSE^{-1}. However, the constant for MISMC TD is smaller than that for rMISMC. We set a relatively large minimum number of samples, N_0, in the SMC sampler to alleviate the unexpectedly high variance caused by having few samples. It is, however, reasonable to expect a higher variance for the randomized method, since it involves infinitely many terms rather than finitely many. MISMC TD achieves the canonical rate, but MISMC TP only attains a sub-canonical rate. This is because an additional assumption is required in MISMC for the tensor product index set, which is not needed for the total degree index set. This indicates that an improper choice of index set in MISMC can cause the canonical rate to drop to a sub-canonical rate, which highlights the benefit of rMISMC: it achieves the canonical rate without specifying an index set in advance.

Log Gaussian process model
We set the parameters θ = (θ_1, θ_2, θ_3) = (0, 1, (33/π/2)²) in the LGP model. Similar to the LGC setting, when using rMISMC and MISMC on the 2D log Gaussian process model, we need to set the starting level to α_1 = α_2 = 5, since the asymptotic regularity only appears for α_1 ≥ 5 and α_2 ≥ 5. Further, from Figure 8, we set s_1 = s_2 = 0.8 and β_1 = β_2 = 1.6. As with the cost rate in LGC, one has γ_i = 1 + ω < β_i for some ω > 0 and i = 1, 2. In the LGP model, we observe results similar to those for the LGC model: MISMC TP has a sub-canonical rate, and the constant for MISMC TD is smaller than that for rMISMC. However, the difference between the constants for rMISMC and MISMC TD is much greater for LGP than for LGC. There may be other sources of high variance; this is the subject of ongoing investigation. In addition, it should be noted that the LGP model is much more sensitive to the parameter values θ = (θ_1, θ_2, θ_3) than the LGC model.

Financial Interests
KJHL and XL gratefully acknowledge the support of IBM and EPSRC in the form of an Industrial Case Doctoral Studentship Award.

A Proofs
The proofs of the various results in the paper are presented here, along with restatements of the results.
A.1 Proof relating to Proposition 3.2

Proof. Using the law of iterated conditional expectations, and (40) conditioned on N_α, one has the claimed unbiasedness.

A.2 Proof relating to Proposition 3.3

In particular, if ∑_α V_α/p_α < ∞, then one has the canonical convergence rate.
Proof. One has a decomposition of the variance into diagonal and off-diagonal terms, which will be treated separately. First, for the diagonal terms, adding ±(N_α/(N p_α)) Δf_α(ζ_α) inside the square of the diagonal term and using the triangle inequality yields two contributions. Since p is proportional to a multinomial distribution, we have E N_α = N p_α. Then, for the first term of (60), the same decomposition as in (56), along with the result of Theorem 3.1 (conditioned on N_α) and Assumption 3.2 (V), gives the required bound. For the second term of (60), applying Jensen's inequality to (Δf_α(ζ_α))², we obtain the corresponding bound. Hence we derive the bound for the diagonal terms. Now, the off-diagonal terms are more subtle, because of the correlation between N_α and N_{α′}, which are (proportional to) draws from a multinomial distribution with N/N_min samples and probability p. Using the law of iterated conditional expectations, line (65) follows from the conditional independence of F^{N_α}_α(ψ_{ζ_α}) and F^{N_{α′}}_{α′}(ψ_{ζ_{α′}}) given N, together with some simple algebra. Line (66) follows from (64) and Assumption 3.2 (V), along with Jensen's inequality. Furthermore, since the remaining sums converge, we can find a constant C to bound the last line.
A.3 Proof relating to Theorem 3.4

Theorem 3.4 (restated). Assume 3.1 and 3.2 (V,C), with β_i > γ_i for i = 1, ..., D. Then, for suitable ϕ : X → R, it is possible to choose a probability distribution p on Z_+^D such that, for some C > 0, MSE( ϕ^rMI) ≤ C/N and the expected COST( ϕ^rMI) ≤ CN, i.e. the canonical rate. The estimator ϕ^rMI is defined in equation (46).
Proof. Since our unnormalized estimator is unbiased, the MSE reduces to the variance. Following from Proposition 3.3, we need to choose p such that the variance sum and the expected cost are simultaneously finite.

B Figures
The plots of the best log-linear fits of the convergence parameters for the various problems are illustrated in Figures 5, 6, 7 and 8.

where s_k ∈ {−1, 1} is the sign of the k-th term in Δf_α.
The function ψ_{ζ_α} gives the mixed difference of the quantity of interest ζ_α among the 2^D intermediate multi-indices. Of particular interest in our estimator are the functions ζ_α = ϕ_α, for arbitrary ϕ_α, and ζ_α = 1. Example 3.2 (Mixed Difference). Following Example 3.1, an example of the mixed difference of the quantity of interest ζ_{(1,1)} constructed in (38) is given by the signed sum over the four corner indices. Consider drawing N/N_min i.i.d. samples α^i ∼ p, where p is a probability distribution on Z_+^D with p(α) =: p_α > 0, to be specified later. Define the allocations A ∈ Z_+^{D×N/N_min} by A_i = α^i, and the (scaled) counts for each α ∈ Z_+^D by N_α = N_min #{i : α^i = α} ∈ Z_+, collectively denoted N. Note that E N_α = N p_α and N_α/N → p_α. Now consider constructing a MISMC estimator of the type in (42) using a random number N_α of samples, F^{N_α}_α(ψ_{ζ_α}), and recall the properties (40) and Theorem 3.1. Conditioned on A, or equivalently conditioned on N, these properties still hold. For ζ : X → R, define the estimator as in (43).
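The allocation step just described can be sketched in a few lines (function and variable names are ours):

```python
import random
from collections import Counter

def allocate_samples(p_items, N, N_min):
    """Draw N/N_min i.i.d. multi-indices alpha_i ~ p and return the scaled
    counts N_alpha = N_min * #{i : alpha_i = alpha} (sketch of the random
    allocation step; p_items is a list of (alpha, p_alpha) pairs)."""
    alphas, probs = zip(*p_items)
    draws = random.choices(alphas, weights=probs, k=N // N_min)
    counts = Counter(draws)
    return {a: N_min * c for a, c in counts.items()}
```

By construction the counts sum to N, each is a multiple of N_min, and E N_α = N p_α, so indices with small p_α are simply left unpopulated in a given run.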

Figure 1: Comparison among three methods for the 1D inverse toy problem. We use 100 realisations to compute the MSE for each experiment and use the rate of convergence to compare the different methods. The red circle line is MLSMC with the ratio estimator; the yellow diamond line is rMLSMC with the ratio estimator; the purple square line is single-level sequential Monte Carlo. Rates of regression: (1) MLSMC: −1.008; (2) rMLSMC: −1.016; (3) SMC: −0.812.

Figure 2: Comparison of methods for the 2D non-analytical PDE. We use 200 realisations to compute the MSE for each experiment and use the rate of convergence to compare the different methods. The blue triangle line is MISMC with the ratio estimator and tensor product set; the red circle line is MISMC with the ratio estimator and total degree set; the yellow diamond line is rMISMC with the ratio estimator. Rates of regression: (1) MISMC_TP: −0.964; (2) MISMC_TD: −0.925; (3) rMISMC: −1.015.

Figure 3: Comparison of methods for the 2D LGC model. We use 200 realisations to compute the MSE for each experiment and use the rate of convergence to compare the different methods. The blue triangle line is MISMC with the ratio estimator and tensor product set; the red circle line is MISMC with the ratio estimator and total degree set; the yellow diamond line is rMISMC with the ratio estimator. Rates of regression: (1) MISMC_TP: −0.502; (2) MISMC_TD: −0.972; (3) rMISMC: −1.008.

Figure 4: Comparison of methods for the 2D LGP model. We use 200 realisations to compute the MSE for each experiment and use the rate of convergence to compare the different methods. The blue triangle line is MISMC with the ratio estimator and tensor product set; the red circle line is MISMC with the ratio estimator and total degree set; the yellow diamond line is rMISMC with the ratio estimator. Rates of regression: (1) MISMC_TP: −0.565; (2) MISMC_TD: −1.017; (3) rMISMC: −1.195.


Figure 5: Convergence parameters fitting for the 1D inverse toy problem. Empirical values for the strong and weak convergence parameters. All plots use 100 realizations with 1000 samples at each index, up to the maximum level L_max = 6, for each experiment. (Left) Base-2 logarithm plot of the mean of the differences, with a reference line of slope −2; (Right) base-2 logarithm plot of the variance of the differences, with a reference line of slope −4.

Figure 6: Convergence parameters fitting for the 2D inverse elliptic PDE problem. Empirical values for the strong and weak convergence parameters. All plots use 20 realizations with 1000 samples at each index, up to the maximum multi-index α_1 = 6, α_2 = 6, for each experiment. (Left) Base-2 logarithm plots of the mean of the unnormalized increments of increments; (Right) base-2 logarithm plots of the variance of the unnormalized increments of increments.

Figure 7: Convergence parameters fitting for the 2D LGC model. Empirical values for the strong and weak convergence parameters. All plots use 20 realizations with 1000 samples at each index, from the minimum multi-index α_1 = α_2 = 5 up to the maximum multi-index α_1 = α_2 = 8, for each experiment. (Left) Base-2 logarithm plots of the mean of the unnormalized increments of increments; (Right) base-2 logarithm plots of the variance of the unnormalized increments of increments.

Figure 8: Convergence parameters fitting for the 2D LGP model. Empirical values for the strong and weak convergence parameters. All plots use 20 realizations with 1000 samples at each index, from the minimum multi-index α_1 = α_2 = 5 up to the maximum multi-index α_1 = α_2 = 8, for each experiment. (Left) Base-2 logarithm plots of the mean of the unnormalized increments of increments; (Right) base-2 logarithm plots of the variance of the unnormalized increments of increments.

Table 1: Comparison of non-desirable constraints required to achieve canonical complexity with MISMC and rMISMC.