1 Introduction

We consider the computational approach to Bayesian inverse problems (Stuart 2010), which has attracted a lot of attention in recent years. One typically requires the expectation of a quantity of interest \(\varphi (x)\), where the unknown parameter \(x \in \mathsf X\subset \mathbb R^{d}\) has posterior probability distribution \(\pi (x)\), as given by Bayes' theorem. Assuming the target distribution cannot be computed analytically, we instead compute the expectation as

$$\begin{aligned} \int _{\mathsf X}\varphi (x)\pi (dx) = \frac{1}{Z} \int _{\mathsf X}\varphi (x)f(x) \pi _0(dx), \end{aligned}$$
(1)

where \(Z = \int _{\mathsf X}f(x)\pi _0(dx)\), f is prescribed up to a normalizing constant, \(\pi _0\) is the prior and \(\pi (x) \propto f(x) \pi _0(x)\). Markov chain Monte Carlo (MCMC) (Geyer 1992; Robert and Casella 1999; Bernardo et al. 1998; Cotter et al. 2013) and sequential Monte Carlo (SMC) (Chopin and Papaspiliopoulos 2020; Del Moral et al. 2006) are two methodologies which can be used to achieve this. In this paper, we use the notations \(d\pi (x) = \pi (dx) = \pi (x)dx\) interchangeably, all denoting the probability under \(\pi \) of an infinitesimal volume element dx (Lebesgue measure by default) centered at x.

Standard Monte Carlo methods can be costly, particularly when the problem involves approximation of an underlying continuum domain, a setting which is becoming progressively more prevalent (Tarantola 2005; Cotter et al. 2013; Stuart 2010; Law et al. 2015; Van Leeuwen et al. 2015; Oliver et al. 2008). The multilevel Monte Carlo (MLMC) method was developed to reduce the computational cost in this setting by performing most simulations with low accuracy at low cost (Giles 2015), and successively refining the approximation with corrections that use fewer simulations of higher cost but lower variance. The MLMC approach has recently attracted a lot of interest in inference problems, leading to methods such as MLMCMC (Dodwell et al. 2015; Hoang et al. 2013) and MLSMC (Beskos et al. 2018, 2017; Moral et al. 2017). The related multi-fidelity Monte Carlo methods often focus on the case where the models lack structure and quantifiable convergence behaviour (Peherstorfer et al. 2018; Cai and Adams 2022), which is very common across science and engineering applications. It is worth noting that MLMC methods can be implemented on the same class of problems, and do not require structure or a priori convergence estimates in order to be implemented. However, convergence rates provide a convenient mechanism to deliver quantifiable theoretical results. This is the case for both multilevel and multi-fidelity approaches.
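To fix ideas, the following is a minimal sketch of the MLMC telescoping estimator, assuming a user-supplied routine `sample_diff(l, n)` that returns n i.i.d. samples of the level-l correction \(Y_l = P_l - P_{l-1}\) (with \(Y_0 = P_0\)); the names are illustrative and not from any particular library.

```python
import numpy as np

def mlmc_estimate(sample_diff, n_samples):
    """MLMC telescoping sum: E[P_L] = sum_{l=0}^{L} E[Y_l], estimated by
    independent Monte Carlo averages with n_samples[l] samples of the
    level-l correction Y_l. Most samples go to the cheap coarse levels;
    the expensive high-level corrections have low variance and need few."""
    return sum(np.mean(sample_diff(l, n)) for l, n in enumerate(n_samples))
```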

Recently, an extension of the MLMC method called multi-index Monte Carlo (MIMC) has been established (Haji-Ali et al. 2016). Instead of using first-order differences, MIMC uses high-order mixed differences to reduce the variance of the hierarchical differences dramatically. MIMC was first applied in the inference context in Cui et al. (2018) and Jasra et al. (2018b, 2021b). The state of the art for MIMC in inference is presented in Law et al. (2022), in which the MISMC ratio estimator for posterior inference is given, and the theoretical convergence rate of this estimator is guaranteed. Although a canonical complexity of MSE\(^{-1}\) can be achieved for the MISMC ratio estimator, it still suffers from discretization bias. This bias constrains the choice of index set, and estimators of the bias often suffer from high variance, which means implementation can be cumbersome for precisely those challenging problems where the method is otherwise expected to be particularly advantageous.

Debiasing techniques were first introduced in Rhee and Glynn (2012, 2015), McLeish (2011) and Strathmann et al. (2015), with many subsequent works using or developing them further (Agapiou et al. 2014; Glynn and Rhee 2014; Jacob and Thiery 2015; Lyne et al. 2015; Walter 2017). These debiasing techniques are based on a similar idea to MLMC but, beyond reducing the estimator variance, they focus on building unbiased estimators. The connection between the debiasing technique and the MLMC method has been pointed out by Dereich and Mueller-Gronbach (2015), Giles (2015) and Rhee and Glynn (2015). Vihola (2018) has further clarified the connection within a general framework for unbiased estimators. The first work to combine the debiasing technique and MLMC in the context of inference is Chada et al. (2021). A recent breakthrough involves using double randomization strategies to remove the bias of the increment estimator (Heng et al. 2021; Jasra et al. 2021a, 2020).

The starting point of our current work is the MISMC ratio estimator introduced in Law et al. (2022). Our new randomized MISMC (rMISMC) ratio estimator is reformulated in the framework of Rhee and Glynn (2015) to remove discretization bias entirely. Like the MISMC ratio estimator, our estimator provably enjoys the complexity improvements of MIMC and the efficiency of SMC for inference. Theoretical results will be given to show that it achieves the canonical complexity of MSE\(^{-1}\) under appropriate assumptions, but without any discretization bias and the consequent requirements for its estimation. From a practical perspective, estimating this bias, and balancing it along with the variance and cost in order to select the index set, comprises a significant overhead for existing multi-index methods. In addition to convenience and simplification, the particular formulation of our un-normalized estimators is novel, and may prove useful in other contexts where one cannot obtain i.i.d. samples from the increments. The unbiased estimators of the normalizing constant and un-normalized integral can also be useful in their own right, in the context of Robbins-Monro (Robbins and Monro 1951) or other stochastic approximation algorithms (Kushner and Clark 2012; Law et al. 2019; Jasra et al. 2021a).

The paper is organized as follows. In Sect. 2, we present the motivating problems considered in the following numerical experiments. In Sect. 3, the original MISMC ratio estimator is reviewed for convenience, and the rMISMC ratio estimator and its theoretical results are stated. In Sect. 4, we apply MISMC and rMISMC methods on Bayesian inverse problems for elliptic PDEs and log Gaussian process models.

2 Motivating problems

Here, we introduce Bayesian inference for a D-dimensional elliptic partial differential equation and for two statistical models, the log Gaussian Cox model and the log Gaussian process model. We will apply the methods presented in Sect. 3 to these motivating problems in order to show their efficacy.

2.1 Elliptic partial differential equation

We consider a D-dimensional elliptic partial differential equation defined over an open domain \(\Omega \subset \mathbb R^{D}\) with locally continuous boundary \(\partial \Omega \), i.e. the boundary is the graph of a continuous function in a neighbourhood of any point. Given a forcing function \(\textsf{f}(x): \Omega \rightarrow \mathbb R\) and a diffusion coefficient function \(a(x): \Omega \rightarrow \mathbb R\), depending on a random variable \(x \sim \pi \), the partial differential equation for \(u(x): {\bar{\Omega }} \rightarrow \mathbb R\) (where \({\bar{\Omega }}\) is the closure of \(\Omega \)) is given by

$$\begin{aligned} \begin{aligned} -\nabla \cdot (a(x) \nabla u(x))&= \textsf {f}(x), \ \ \text{ on } \ \Omega ,\\ u(x)&= 0, \ \ \text{ on } \ \partial \Omega . \end{aligned} \end{aligned}$$
(2)

The dependence of the solution u of (2) on x arises from the dependence of a and \(\textsf{f}\) on x.

In particular, in the numerical experiments we take the prior distribution to be

$$\begin{aligned} x \sim U[-1,1]^d =: \pi _0 \end{aligned}$$
(3)

and a(x) as

$$\begin{aligned} a(x)(z) = a_0 + \sum _{i=1}^{d} x_i \psi _i(z), \end{aligned}$$
(4)

where \(\psi _i\) are smooth functions with \(\left\Vert \psi _i\right\Vert _{\infty }:= \sup _{z\in \Omega } \vert \psi _i(z) \vert \le 1\) for \(i=1,...,d\), and \(a_0 > \sum _{i=1}^{d} \vert x_i \vert \), which ensures uniform ellipticity.

2.1.1 Finite element method

Consider the 1D piecewise linear nodal basis functions \(\phi _{j}^{K}\), \(j=1,2,...,K\), on the mesh \(\{ z_{i}^{K} = i/(K+1) \}_{i=0}^{K+1}\), which are defined as

$$\begin{aligned} \phi _j^{K}(z) = \left\{ \begin{array}{lll} \frac{z-z_{j-1}^{K}}{z_{j}^{K}-z_{j-1}^{K}}, &{} z \in [z_{j-1}^{K}, z_{j}^{K}]\\ 1-\frac{z-z_{j}^{K}}{z_{j+1}^{K}-z_{j}^{K}}, &{} z \in [z_{j}^{K}, z_{j+1}^{K}] \\ 0, &{} \text {otherwise}. \end{array} \right. \end{aligned}$$
(5)
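As a concrete illustration, the nodal basis (5) on this uniform mesh can be evaluated in a few lines; this is a hedged sketch, and the function name `hat` is our own.

```python
import numpy as np

def hat(z, j, K):
    """Evaluate the piecewise-linear nodal basis phi_j^K of (5) on the
    uniform mesh z_i = i/(K+1), i = 0,...,K+1, for j in {1,...,K}."""
    h = 1.0 / (K + 1)                                 # uniform mesh width
    return np.maximum(0.0, 1.0 - np.abs(z - j * h) / h)
```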

For an index \(\alpha = (\alpha _1,\alpha _2) \in \mathbb {N}^{2}\), we can form the tensor product grid over \(\Omega =[0,1]^2\) as

$$\begin{aligned} \left\{ (z_{i_1}^{K_{1,\alpha }},z_{i_2}^{K_{2,\alpha }})\right\} ^{K_{1,\alpha }+1,K_{2,\alpha }+1}_{i_1=0,i_2=0}, \end{aligned}$$
(6)

where \(K_{1,\alpha }=2^{\alpha _1}\) and \(K_{2,\alpha }=2^{\alpha _2}\), and the mesh size in each direction is \(K_{1,\alpha }^{-1}\) and \(K_{2,\alpha }^{-1}\), respectively. Then the bilinear basis functions are constructed as products of the nodal basis functions in the two directions:

$$\begin{aligned} \phi _i^{\alpha }(z) = \phi _{i_1,i_2}^{\alpha }(z_1,z_2) = \phi _{i_1}^{K_{1,\alpha }}(z_1)\phi _{i_2}^{K_{2,\alpha }}(z_2), \end{aligned}$$
(7)

where \(i=i_1+K_{1,\alpha }i_2\) for \(i_1=1,...,K_{1,\alpha }\) and \(i_2 = 1,...,K_{2,\alpha }\) and \(K_{\alpha } = K_{1,\alpha }K_{2,\alpha }\).

A Galerkin approximation can be written as

$$\begin{aligned} u_{\alpha }(x)(z) = \sum _{i=1}^{K_{\alpha }}u_{\alpha }^{i}(x) \ \phi _{i}^{\alpha }(z), \end{aligned}$$
(8)

where \(u_{\alpha }^i(x)\) for \(i=1,...,K_{\alpha }\) are the approximate values of the solution u(x) at the mesh points, which we want to obtain. Substituting the Galerkin approximation into the weak form of PDE (2), we derive the corresponding Galerkin system:

$$\begin{aligned} \textbf{A}_{\alpha }(x) \textbf{u}_{\alpha }(x) = \textbf{f}_{\alpha }(x), \end{aligned}$$
(9)

where \(\textbf{A}_{\alpha }(x)\) is the stiffness matrix whose components are given by

$$\begin{aligned}{}[\textbf{A}_{\alpha }(x)]_{ij} := \int _{z_{j_1-1}^{K_{1,\alpha }}}^{z_{j_1+1}^{K_{1,\alpha }}} \int _{z_{j_2-1}^{K_{2,\alpha }}}^{z_{j_2+1}^{K_{2,\alpha }}} a(x)(z) \nabla \phi _{i}^{\alpha }(z) \cdot \nabla \phi _{j}^{\alpha }(z) \, dz, \end{aligned}$$
(10)

where \(j = j_1+j_2 \,K_{1,\alpha }\) for \(j_1 = 1,...,K_{1,\alpha }\) and \(j_2 =1,...,K_{2,\alpha }\),

$$\begin{aligned} \textbf{u}_{\alpha }(x) = [u_{\alpha }^{1}(x),...,u_{\alpha }^{K_{\alpha }}(x)]^{T}, \end{aligned}$$

and

$$\begin{aligned}{}[\textbf{f}_{\alpha }(x)]_{j} = \int _{z_{j_1-1}^{K_{1,\alpha }}}^{z_{j_1+1}^{K_{1,\alpha }}}\int _{z_{j_2-1}^{K_{2,\alpha }}}^{z_{j_2+1}^{K_{2,\alpha }}} \textsf{f}(x)(z) \phi _{j}^{\alpha }(z) dz. \end{aligned}$$
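To make the construction concrete, the following is a minimal 1D analogue of the Galerkin system (9), assuming a constant coefficient a and piecewise-linear elements on the uniform mesh of (5); the 2D bilinear assembly follows the same pattern with tensor products. The function name is ours, and the load vector uses a simple lumped quadrature rather than exact integration.

```python
import numpy as np

def solve_1d_fem(a, f, K):
    """Assemble and solve the tridiagonal Galerkin system for
    -(a u')' = f on [0,1] with u(0) = u(1) = 0, using piecewise-linear
    elements on the uniform mesh z_i = i/(K+1)."""
    h = 1.0 / (K + 1)
    z = np.linspace(h, 1.0 - h, K)                       # interior nodes
    A = (a / h) * (2.0 * np.eye(K)
                   - np.eye(K, k=1) - np.eye(K, k=-1))   # stiffness matrix
    b = h * f(z)                                         # lumped load vector
    return z, np.linalg.solve(A, b)                      # nodal values u_alpha^i
```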

2.1.2 The Bayesian inverse problem

Under the elliptic partial differential equation model, we wish to infer the unknown parameter \(x \in \mathsf X\subset \mathbb R^{d}\) given n evaluations of the solution, \(y \in \mathbb R^{n}\) (Stuart 2010). We aim to analyse the posterior distribution \(\mathbb {P}(x \vert y)\) with density \(\pi (x) = \pi (x \vert y)\). In practice, one can only expect to evaluate a discretized version of the forward model. The posterior \(\pi (dx)\) can then be obtained up to a constant of proportionality by applying Bayes' theorem:

$$\begin{aligned} \pi (dx) \propto L(x) \pi _0(dx), \end{aligned}$$
(11)

where \(\pi _0(dx)\) is the prior distribution and L(x) is the likelihood, which is proportional to the probability density that the data y was generated with the given value of the unknown parameter x.

Define the vector-valued observation function as follows

$$\begin{aligned} \mathcal {G}(u(x)) = [v_1(u(x)),...,v_n(u(x))]^{T}, \end{aligned}$$
(12)

where n is the number of observations, \(v_i \in L^2\) and \(v_i(u(x)) = \int v_i(z)u(x)(z)dz\) for \(i=1,...,n\). The data can then be modelled as

$$\begin{aligned} y = \mathcal {G}(u(x)) + \nu , \ \ \ \nu \sim \mathcal {N}(0,\Sigma ), \end{aligned}$$
(13)

where \(\mathcal {N}(0,\Sigma )\) denotes the Gaussian distribution with mean zero and covariance matrix \(\Sigma \). The likelihood of the observations y can then be derived as

$$\begin{aligned} \pi (y\vert x) \propto L(x):= \exp \Big (-\frac{1}{2}\vert y-\mathcal {G}(u(x))\vert ^2_{\Sigma } \Big ), \end{aligned}$$
(14)

where \(\vert w \vert _{\Sigma } = (w^{\top }\Sigma ^{-1}w)^{1/2}\).
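For example, the (unnormalized) log-likelihood of (14) can be evaluated as follows, given the forward map output \(\mathcal {G}(u(x))\); a small sketch with illustrative names.

```python
import numpy as np

def log_likelihood(y, G_u, Sigma):
    """Unnormalized Gaussian log-likelihood log L(x) of (14), where
    |w|_Sigma^2 = w^T Sigma^{-1} w and G_u = G(u(x))."""
    r = y - G_u
    return -0.5 * r @ np.linalg.solve(Sigma, r)
```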

When the elliptic PDE can only be solved approximately, we denote the approximate solution at resolution multi-index \(\alpha \) by \(u_{\alpha }\), as described above. The approximate likelihood is then given by

$$\begin{aligned} \pi _{\alpha }(y\vert x) \propto L_{\alpha }(x):= \exp \Big (-\frac{1}{2}\vert y-\mathcal {G}(u_{\alpha }(x))\vert ^2_{\Sigma } \Big ), \end{aligned}$$
(15)

and the posterior density is given by

$$\begin{aligned} \pi _{\alpha }(dx) \propto L_{\alpha }(x)\pi _0(dx). \end{aligned}$$
(16)

2.2 Log Gaussian process models

Now, we consider the log Gaussian Cox model and the log Gaussian process model. A log Gaussian process (LGP) \(\Lambda (z)\) is given by

$$\begin{aligned} \Lambda (z) = \exp \{ x(z) \} , \end{aligned}$$
(17)

where \(x = \{ x(z): z\in \Omega \subset \mathbb R^{D} \}\) is a real-valued Gaussian process (Rasmussen 2003; Stuart 2010). The log Gaussian process model provides a flexible approach to non-parametric density modelling with controllable smoothness properties; however, exact inference for the LGP is intractable. The LGP model for density estimation (Tokdar and Ghosh 2007) assumes data \(z_i \sim p\), where \(p(z) = \Lambda (z)/\int _{\Omega } \Lambda (z) dz\). As such, the likelihood of x associated with observations \({\textsf{Z}}=\{z_1,\dots ,z_n\}\) is given by

$$\begin{aligned} L(x; {\textsf{Z}} ) = \prod _{z \in {\textsf{Z}}} p(z) = \left( \int _{\Omega } \Lambda (z) dz \right) ^{-n} \prod _{z \in {\textsf{Z}}} \Lambda (z) . \end{aligned}$$
(18)

The log Gaussian Cox (LGC) model assumes the observations are distributed according to a spatially inhomogeneous Poisson point process with intensity function given by \(\Lambda \). The likelihood of observing \({\textsf{Z}}=\{z_1,\dots ,z_n\}\) under the LGC model is (Møller et al. 1998; Murray et al. 2010; Law et al. 2022; Cai and Adams 2022)

$$\begin{aligned} \begin{aligned} L(x; {\textsf{Z}})&= \exp \Big \{ \int _{\Omega } (1-\Lambda (z)) dz \Big \} \prod _{z\in {\textsf{Z}}} \Lambda (z), \\&= \exp \Big \{ \int _{\Omega } (1-\exp (x(z))) dz \Big \} \prod _{z\in {\textsf{Z}}} \exp (x(z)). \end{aligned} \end{aligned}$$
(19)

This construction is elegantly simple, and is flexible and convenient due to the underlying Gaussian process. Some example applications are presented in Diggle et al. (2013).

We consider a dataset comprising the locations of \(n=126\) Scots pine saplings in a natural forest in Finland (Møller et al. 1998), denoted \(z_1,...,z_n \in [0,1]^2\). This is modelled with both the LGC model, following Heng et al. (2020), and the LGP model, following Tokdar and Ghosh (2007). The prior is defined in terms of a Karhunen-Loève (KL) expansion with a suitable parameter \(\theta = (\theta _1,\theta _2,\theta _3)\) as follows, for \(z \in [0,2]^2\),

$$\begin{aligned} x(z) = \theta _1 + \sum _{k \in \mathbb {Z}\times \mathbb {Z}_{+} \cup \mathbb {Z}_{+} \times \{0\}} \rho _{k} (\theta ) (\xi _{k}\phi _{k}(z) + \xi _{k}^{*} \phi _{-k}(z)), \end{aligned}$$
(20)
$$\begin{aligned} \xi _{k} \sim \mathcal {C}\mathcal {N}(0,1) \ \text {i.i.d.}, \end{aligned}$$

where \(\mathcal {C}\mathcal {N}(0,1)\) denotes a standard complex normal distribution, \(\xi _{k}^{*}\) is the complex conjugate of \(\xi _{k}\), \(\phi _{k}(z) \propto \exp [\pi i z \cdot k]\) are Fourier series basis functions (with \(i=\sqrt{-1}\)) and

$$\begin{aligned} \rho _{k}^{2}(\theta ) = \theta _2/((\theta _3 + k_1^2)(\theta _3 + k_2^2))^{\frac{r+1}{2}}. \end{aligned}$$
(21)

The coefficient r controls the smoothness, and here we will choose \(r=1.6\). Note that the periodic prior measure is defined on \([0,2]^2\) so that no boundary conditions are imposed on the sub-domain \([0,1]^2\). Then, the posterior distribution is given by

$$\begin{aligned} \pi (dx) \propto L(x) \pi _0(dx), \end{aligned}$$
(22)

where \(\pi _0\) is constructed in (20) and L(x) is constructed in (19) (or (18)).

2.2.1 The finite approximation problem

One typically uses a grid-based approximation for inference in LGC (Murray et al. 2010; Diggle et al. 2013; Teng et al. 2017; Cai and Adams 2022) and in LGP (Riihimäki and Vehtari 2014; Griebel and Hegland 2010; Tokdar 2007). We approximate the likelihoods and priors of LGC and LGP using the fast Fourier transform (FFT), as described below. First, we truncate the KL-expansion of the prior as follows, for an index \(\alpha = (\alpha _1,\alpha _2) \in \mathbb {N}^{2}\),

$$\begin{aligned} x_{\alpha }(z) = \theta _1 + \sum _{k \in \mathcal {A}_{\alpha }} \rho _{k} (\theta ) (\xi _{k}\phi _{k}(z) + \xi _{k}^{*} \phi _{-k}(z)), \end{aligned}$$
(23)
$$\begin{aligned} \xi _{k} \sim \mathcal {C}\mathcal {N}(0,1) \ \text {i.i.d.}, \end{aligned}$$

where \(\mathcal {A}_{\alpha }:= \{ -2^{\alpha _1/2},-(2^{\alpha _1/2}-1),...,2^{\alpha _1/2}-1,2^{\alpha _1/2} \} \times \{ 1,2,...,2^{\alpha _2/2}-1,2^{\alpha _2/2} \} \cup \{ 1,2,...,2^{\alpha _2/2}-1,2^{\alpha _2/2} \} \times 0\). The cost for approximating \(x_{\alpha }(z)\) over the grid is \(\mathcal {O}((\alpha _1+\alpha _2)2^{\alpha _1+\alpha _2})\). The finite approximations of the likelihood of LGC and LGP are then defined by

$$\begin{aligned}&\text{(LGC) } \ L_{\alpha }(x_{\alpha }) :=\exp \Bigg [ \sum _{i=1}^{n} {\hat{x}}_{\alpha }(z_i) - Q(\exp (x_{\alpha })) \Bigg ] ,\end{aligned}$$
(24)
$$\begin{aligned}&\text{(LGP) } \ L_{\alpha }(x_{\alpha }) :=\exp \Bigg [ \sum _{i=1}^{n} {\hat{x}}_{\alpha }(z_i) - n\log Q(\exp (x_{\alpha })) \Bigg ] , \end{aligned}$$
(25)

where \({\hat{x}}_{\alpha }(z)\) is an interpolant of the grid values output by the FFT, and Q denotes a quadrature rule such that \(Q(\exp (x_{\alpha })) \approx \int _{[0,1]^2} \exp (x(z)) dz\). The finite approximations of the posterior distributions of LGC and LGP are then defined by

$$\begin{aligned} \pi _{\alpha }(dx_{\alpha }) \propto L_{\alpha }(x_{\alpha }) \pi _0(dx_{\alpha }). \end{aligned}$$
(26)

The quantity of interest for these models will be \(\varphi (x) = \int _{[0,1]^2} \exp (x(z)) dz\), and we will estimate its expectation \(\pi (\varphi ) = \mathbb {E}( \varphi (x) \vert z_1,\dots , z_n )\).
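As an illustration, the approximate log-likelihoods (24)-(25) can be sketched as below, assuming the FFT has already produced the field values on a uniform grid over \([0,1]^2\); the simple grid-mean quadrature Q is our own assumption, and any consistent rule could be substituted.

```python
import numpy as np

def log_L_alpha(x_at_data, x_on_grid, model="LGC"):
    """Approximate log-likelihoods (24)-(25).
    x_at_data: interpolated values hat-x_alpha(z_i) at the n data points;
    x_on_grid: values of x_alpha on a uniform grid covering [0,1]^2."""
    Q = np.mean(np.exp(x_on_grid))      # quadrature for int exp(x_alpha) dz
    s = np.sum(x_at_data)
    if model == "LGC":
        return s - Q                    # (24), up to an additive constant
    return s - len(x_at_data) * np.log(Q)   # (25), LGP
```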

3 Randomized multi-index sequential Monte Carlo

The original MISMC estimator was considered in Cui et al. (2018) and Jasra et al. (2018b, 2021b). Convergence guarantees were established in Law et al. (2022), which demonstrates the importance of selecting a reasonable index set by comparing results for the tensor product index set and the total degree index set. We now introduce an extension of multi-index sequential Monte Carlo, called randomized multi-index sequential Monte Carlo. The basic randomization methodology was first introduced in Rhee and Glynn (2012, 2015). Instead of fixing an index set in advance, we choose \(\alpha \) randomly from a probability distribution. A further advantage of this approach is that it yields unbiased un-normalized estimators, which are discretization-free.

Define the target distribution as \(\pi (x) = f(x)/Z\), where \(Z = \int _{\mathsf X} f(x) dx\) and \(f(x):= L(x) \pi _0(x)\). Given a quantity of interest \(\varphi : \mathsf X\rightarrow \mathbb R\), for simplicity, we define

$$\begin{aligned} f(\varphi ):= \int _{\mathsf X} \varphi (x) f(x) dx = f(1) \pi (\varphi ), \end{aligned}$$
(27)

where \(f(1) = \int _{\mathsf X} f(x) dx = Z\). Define their approximations at finite resolution \(\alpha \in \mathbb {Z}_+^D\) by \(\pi _\alpha (x) = f_\alpha (x)/Z_\alpha \), where \(Z_\alpha = \int _{\mathsf X} f_\alpha (x) dx\) and \(f_\alpha (x):= L_\alpha (x) \pi _0(x)\), and \(\varphi _{\alpha }:\mathsf X\rightarrow \mathbb R\), where \(\lim _{\vert \alpha \vert \uparrow \infty } f_\alpha = f\) and \(\lim _{\vert \alpha \vert \uparrow \infty } \varphi _\alpha = \varphi \).

Consider the ratio decomposition

$$\begin{aligned} \pi (\varphi ) = \frac{f(\varphi )}{f(1)} = \frac{\sum _{\alpha \in \mathbb {Z}_+^D} \Delta ( f_\alpha (\varphi _\alpha ) )}{\sum _{\alpha \in \mathbb {Z}_+^D} \Delta f_\alpha (1)}, \end{aligned}$$
(28)

where \(\Delta \) is the first-order mixed difference operator

$$\begin{aligned} \Delta = \otimes _{i=1}^{D} \Delta _i:= \Delta _D \circ \cdots \circ \Delta _1 , \end{aligned}$$
(29)

which is defined recursively by the first-order difference operator \(\Delta _i\) along direction \(1 \le i \le D\). If \(\alpha _i>0\),

$$\begin{aligned} \Delta _i \varphi _{\alpha }(x_{\alpha }) = \varphi _{\alpha }(x_{\alpha }) - \varphi _{\alpha -e_i}(x_{\alpha }) , \end{aligned}$$
(30)

where \(e_i\) is the \(i\)th canonical basis vector in \(\mathbb R^{D}\), i.e. \((e_i)_j=1\) for \(j=i\) and 0 otherwise. If \(\alpha _i=0\), then \(\Delta _i \varphi _{\alpha }(x_{\alpha }) = \varphi _{\alpha }(x_{\alpha })\).
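The mixed difference operator expands into a signed sum over the \(2^D\) corner multi-indices; the following sketch makes this explicit for a generic function g of the multi-index (the names are illustrative).

```python
from itertools import product

def mixed_difference(g, alpha):
    """(Delta g)(alpha) per (29)-(30): compose the one-directional
    differences, i.e. sum g(alpha - s) over s in {0,1}^D with sign
    (-1)^{|s|}, skipping direction i whenever alpha_i = 0."""
    shifts = [(0, 1) if a > 0 else (0,) for a in alpha]
    return sum((-1) ** sum(s) * g(tuple(a - si for a, si in zip(alpha, s)))
               for s in product(*shifts))
```

For example, `mixed_difference(g, (1, 1))` returns \(g(1,1) - g(1,0) - g(0,1) + g(0,0)\), matching the signs in Example 2 below.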

For convenience, we denote the vector of multi-indices

$$\begin{aligned} \varvec{\alpha }(\alpha ):= (\varvec{\alpha }_1(\alpha ),...,\varvec{\alpha }_{2^D}(\alpha ) ) \in \mathbb {Z}_{+}^{D \times 2^D}, \end{aligned}$$
(31)

where \(\varvec{\alpha }_1(\alpha ) = \alpha \), \(\varvec{\alpha }_{2^D}(\alpha ) = \alpha - \sum _{i=1}^{D}e_i\), and \(\varvec{\alpha }_i(\alpha )\) for \(1< i < 2^D\) are the intermediate multi-indices arising in the computation of the mixed difference operator \(\Delta \).

Throughout this section \(C>0\) is a constant whose value may change from line to line.

3.1 Original MISMC ratio estimator

In order to make use of (28), we need to construct estimators of \(\Delta ( f_\alpha (\zeta _\alpha ) )\), both for our quantity of interest \(\zeta _\alpha = \varphi _\alpha \) and for \(\zeta _\alpha =1\). The natural and naive way to estimate \(\Delta ( f_\alpha (\zeta _\alpha ) )\) would be to sample from a coupling of \((\pi _{\varvec{\alpha }_1(\alpha )},...,\pi _{\varvec{\alpha }_{2^D}(\alpha )})\). However, such a coupling is non-trivial to construct; instead we construct an approximate coupling \(\Pi _{\alpha }: \sigma (\mathsf X^{2^D}) \rightarrow [0,1]\) as follows. We first define the coupled prior distribution as

$$\begin{aligned} \Pi _0(d\varvec{x}) = \pi _0(d\varvec{x}_1) \prod _{i=2}^{2^D} \delta _{\varvec{x}_1}(d \varvec{x}_i) , \end{aligned}$$
(32)

where \(\varvec{x}= (\varvec{x}_1,...,\varvec{x}_{2^D}) \in \mathsf X^{2^D}\) and \(\delta _{\varvec{x}_1}\) denotes the Dirac measure at \(\varvec{x}_1\). Note that this is an exact coupling of the prior in the sense that for any \(j \in \{1,\dots , 2^D\}\)

$$\begin{aligned} \int _{\mathsf X^{2^D-1}} \Pi _0(d\varvec{x}_{-j}) = \pi _{0}(d\varvec{x}_j) . \end{aligned}$$
(33)

Here we denote \(\varvec{x}_{-j} = (\varvec{x}_1,...,\varvec{x}_{j-1},\varvec{x}_{j+1},...,\varvec{x}_{2^D})\) which omits the jth coordinate. Indeed it is the same coupling used in MIMC (Haji-Ali et al. 2016).
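Sampling the coupled prior (32) amounts to drawing once from \(\pi _0\) and copying the draw to all \(2^D\) coordinates; a one-line sketch (names illustrative):

```python
def sample_coupled_prior(sample_pi0, D):
    """Draw x ~ Pi_0 of (32): one pi_0 draw, replicated 2^D times,
    so that every marginal equals pi_0 as in (33)."""
    x1 = sample_pi0()
    return [x1] * (2 ** D)
```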

In order to obtain estimates analogous to the variance rate in MIMC (Haji-Ali et al. 2016), we use the SMC sampler (Chopin and Papaspiliopoulos 2020; Del Moral et al. 2006) to compute them. We hence adapt Algorithm 1 to an extended target which is an approximate coupling of the actual target, as in Jasra et al. (2018a, 2018b, 2021b), Cui et al. (2018) and Franks et al. (2018), and utilize a ratio of estimates, similar to Franks et al. (2018). To this end, we define a likelihood on the coupled space as

$$\begin{aligned} \textbf{L}_\alpha (\varvec{x}) = \max \{ L_{\varvec{\alpha }_1(\alpha )}(\varvec{x}_1), \dots , L_{\varvec{\alpha }_{2^D}(\alpha )}(\varvec{x}_{2^D}) \} . \end{aligned}$$
(34)

The approximate coupling is defined by

$$\begin{aligned} F_\alpha (d\varvec{x}) = \textbf{L}_\alpha (\varvec{x}) \Pi _0(d\varvec{x}), \quad \Pi _\alpha (d\varvec{x}) = \frac{1}{F_\alpha (1)} F_\alpha (d\varvec{x}) . \end{aligned}$$
(35)

Example 1

(Approximate Coupling) Let \(D=2\), \(d=1\) and \(\alpha =(1,1)\). An example of the approximate coupling constructed in (32), (34) and (35) is given by, for \(\varvec{x}=(\varvec{x}_1,\varvec{x}_2,\varvec{x}_3,\varvec{x}_4) \in \mathsf X^{4}\),

$$\begin{aligned}&\Pi _{(1,1)}(\varvec{x}_1,\varvec{x}_2,\varvec{x}_3,\varvec{x}_4) \approx F_{(1,1)}(\varvec{x}_1,\varvec{x}_2,\varvec{x}_3,\varvec{x}_4)\\&\quad = \textbf{L}_{(1,1)}(\varvec{x}_1,\varvec{x}_2,\varvec{x}_3,\varvec{x}_4) \Pi _0(\varvec{x}_1,\varvec{x}_2,\varvec{x}_3,\varvec{x}_4), \end{aligned}$$

where \(\textbf{L}_{(1,1)}(\varvec{x}_1,\varvec{x}_2,\varvec{x}_3,\varvec{x}_4) = \max \{ L_{00}(\varvec{x}_1),L_{01}(\varvec{x}_2),L_{10}(\varvec{x}_3),L_{11}(\varvec{x}_4) \}\) and \( \Pi _0(\varvec{x}_1,\varvec{x}_2,\varvec{x}_3,\varvec{x}_4) = \pi _0(\varvec{x}_1)\delta _{\varvec{x}_1}(\varvec{x}_2) \delta _{\varvec{x}_1}(\varvec{x}_3) \delta _{\varvec{x}_1}(\varvec{x}_4)\). For our choice of prior coupling (32), we effectively have a single distribution, for \(x \in \mathsf X\),

$$\begin{aligned} \Pi _{(1,1)} \approx \max \{ L_{00}(x), L_{01}(x),L_{10}(x),L_{11}(x) \} \pi _0(x). \end{aligned}$$

Note that any suitable prior coupling which preserves the marginals as in (33) is admissible.

Let \(H_{\alpha ,j} = F_{\alpha , j+1}/F_{\alpha ,j}\) for some intermediate distributions \(F_{\alpha ,1}, \dots , F_{\alpha ,J}=F_{\alpha }\), for example \(F_{\alpha ,j} = \textbf{L}_\alpha (\varvec{x})^{\tau _j} \Pi _0(\varvec{x})\), where the tempering parameters satisfy \(\tau _1=0\), \(\tau _j<\tau _{j+1}\) and \(\tau _J=1\) (for example \(\tau _j = (j-1)\tau _0\)). Now let \(\varvec{\mathcal {M}}_{\alpha ,j}\) for \(j=2,\dots ,J\) be Markov transition kernels such that \((\Pi _{\alpha ,j} \varvec{\mathcal {M}}_{\alpha ,j})(d\varvec{x}) = \Pi _{\alpha ,j}(d\varvec{x})\), where \(\Pi _{\alpha ,j} = F_{\alpha ,j}/F_{\alpha ,j}(1)\); any suitable MCMC kernel \(\mathcal {M}\) can be used (Geyer 1992; Robert and Casella 1999; Cotter et al. 2013).

Algorithm 1 SMC sampler for coupled estimation of \(\Delta ( f_\alpha (\zeta _\alpha ) )\)
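A hedged sketch of such a tempered SMC sampler is given below; it returns the normalizing-constant-type estimate of (37) together with the final particles. The callables `log_L`, `sample_prior` and `mcmc_kernel` are user-supplied stand-ins (assumptions), with `log_L` vectorized over particles; adaptive tempering and resampling thresholds are omitted for brevity.

```python
import numpy as np

def tempered_smc(log_L, sample_prior, mcmc_kernel, taus, N, rng):
    """Tempered SMC in the spirit of Algorithm 1, targeting
    F_j proportional to L^{tau_j} * Pi_0 with tau_1 = 0 < ... < tau_J = 1.
    Returns (Z_hat, particles), where Z_hat estimates F(1) as in (37)."""
    x = sample_prior(N)                          # particles from Pi_0
    log_Z = 0.0
    for tau_prev, tau in zip(taus[:-1], taus[1:]):
        logw = (tau - tau_prev) * log_L(x)       # incremental weights H_j
        log_Z += np.log(np.mean(np.exp(logw)))   # accumulate Z^N of (37)
        w = np.exp(logw - np.max(logw))
        idx = rng.choice(N, size=N, p=w / np.sum(w))  # multinomial resampling
        x = mcmc_kernel(x[idx], tau)             # Pi_{alpha,j}-invariant move
    return np.exp(log_Z), x
```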

For \(j=1,\dots , J\) define

$$\begin{aligned} \Pi ^N_{\alpha ,j}(d\varvec{x}):= \frac{1}{N} \sum _{i=1}^N \delta _{\varvec{x}^{j,i}}(d\varvec{x}) , \end{aligned}$$
(36)

and then define

$$\begin{aligned} \varvec{Z}_{\alpha }^N:= \prod _{j=1}^{J-1} \Pi ^N_{\alpha ,j}( H_{\alpha ,j} ) , \quad F^N_\alpha (d\varvec{x}):= \varvec{Z}_\alpha ^N \Pi ^N_{\alpha ,J}(d\varvec{x}) . \end{aligned}$$
(37)

The following Assumption will be needed.

Assumption 1

Let \(J\in \mathbb {N}\) be given, and let \(\mathsf X\) be a Banach space. For each \(j\in \{1,\dots ,J\}\) there exists some \(C>0\) such that for all \((\alpha ,x)\in \mathbb {Z}_+^D\times \mathsf X\),

$$\begin{aligned} C^{-1} < Z, L_\alpha (x) \le C. \end{aligned}$$

Then, we have the following convergence result (Del Moral 2004).

Proposition 1

Assume 1. Then for any \((J,p)\in \mathbb {N}\times (0,\infty )\) there exists a \(C>0\) such that for any \(N\in \mathbb {N}\), \(\psi : \mathsf X^{2^D} \rightarrow \mathbb R\) bounded and measurable, and \(\alpha \in \mathbb {Z}_+^D\),

$$\begin{aligned} \mathbb {E}\left[ \vert F_\alpha ^N(\psi ) - F_\alpha (\psi ) \vert ^p\right] ^{1/p} \le \frac{C \left\Vert \psi \right\Vert _\infty }{N^{1/2}} . \end{aligned}$$

In addition, the estimator is unbiased \(\mathbb {E}[F_{\alpha }^N(\psi )] = F_\alpha (\psi )\).

Now, we define the function \(\psi \) with respect to an arbitrary test function \(\zeta _\alpha \), as follows

$$\begin{aligned} \psi _{\zeta _\alpha }(\varvec{x})&:= \sum _{k=1}^{2^D} \iota _k \omega _k(\varvec{x}) \zeta _{\varvec{\alpha }_k(\alpha )}(\varvec{x}_k) \, , \end{aligned}$$
(38)
$$\begin{aligned} \omega _k(\varvec{x})&:= \frac{L_{\varvec{\alpha }_k(\alpha )}\left( \varvec{x}_{k} \right) }{\textbf{L}_{\alpha }(\varvec{x})} \, , \end{aligned}$$
(39)

where \(\iota _k \in \{-1,1\}\) is the sign of the \(k^\textrm{th}\) term in \(\Delta f_\alpha \). The function \(\psi _{\zeta _{\alpha }}\) gives the mixed difference of the quantity of interest \(\zeta _{\alpha }\) among the \(2^D\) intermediate multi-indices. Of particular interest in our estimator are the functions \(\zeta _{\alpha } = \varphi _{\alpha }\), for arbitrary \(\varphi _\alpha \), and \(\zeta _{\alpha } = 1\).

Example 2

(Mixed Difference) Following from Example 1, an example of the mixed difference of the quantity of interest \(\zeta _{(1,1)}\) constructed in (38) is given by

$$\begin{aligned} \psi _{\zeta _{(1,1)}}(\varvec{x})&= \bigg (\frac{L_{11}(\varvec{x}_4)}{\textbf{L}_{(1,1)}(\varvec{x})} \zeta _{11}(\varvec{x}_4) - \frac{L_{10}(\varvec{x}_3)}{\textbf{L}_{(1,1)}(\varvec{x})} \zeta _{10}(\varvec{x}_3) \bigg ) \\&\quad - \bigg (\frac{L_{01}(\varvec{x}_2)}{\textbf{L}_{(1,1)}(\varvec{x})} \zeta _{01}(\varvec{x}_2) - \frac{L_{00}(\varvec{x}_1)}{\textbf{L}_{(1,1)}(\varvec{x})} \zeta _{00}(\varvec{x}_1) \bigg ). \end{aligned}$$

Note that the signs of the terms in the mixed difference are \(\iota _1 = 1\), \(\iota _2 = -1\), \(\iota _3 = -1\) and \(\iota _4 = 1\).

Following from Proposition 1 we have that

$$\begin{aligned} \mathbb {E}[F_\alpha ^N(\psi _{\zeta _\alpha })] = F_\alpha (\psi _{\zeta _\alpha }) = \Delta f_\alpha (\zeta _\alpha ) , \end{aligned}$$
(40)

and there is a \(C>0\) such that

$$\begin{aligned} \mathbb {E}\left[ \vert F_\alpha ^N(\psi _{\zeta _\alpha }) - F_\alpha (\psi _{\zeta _\alpha }) \vert ^2\right] \le C \frac{\left\Vert \psi _{\zeta _\alpha }\right\Vert _\infty ^2}{N} . \end{aligned}$$
(41)

Now given \(\mathcal{I}\subset \mathbb {Z}_+^D\), \(\{N_\alpha \}_{\alpha \in \mathcal{I}}\) and \(\varphi :\mathsf X\rightarrow \mathbb R\), for each \(\alpha \) run an independent SMC sampler as in Algorithm 1 with \(N_\alpha \) samples, and define the MISMC ratio estimator as

$$\begin{aligned} {\widehat{\varphi }}^\textrm{MI}_\mathcal{I}= \frac{\sum _{\alpha \in \mathcal{I}} F^{N_\alpha }_\alpha (\psi _{\varphi _\alpha })}{\max \{\sum _{\alpha \in \mathcal{I}} F^{N_\alpha }_\alpha (\psi _{1}), Z_\textrm{min}\}} , \end{aligned}$$
(42)

where \(Z_\textrm{min}\) is a lower bound on Z.

A finer analysis than that provided in Proposition 1 is required in order to achieve rigorous MIMC complexity results; this is provided by Theorem 1, proven in Law et al. (2022).

Theorem 1

Assume 1. Then for any \(J\in \mathbb {N}\) there exists a \(C>0\) such that for any \(N\in \mathbb {N}\), \(\psi : \mathsf X^{2^D} \rightarrow \mathbb R\) bounded and measurable and \(\alpha \in \mathbb {Z}_+^D\)

$$\begin{aligned} \mathbb {E}_{\alpha } \left[ \vert F_\alpha ^N(\psi _{\zeta _\alpha }) - F_\alpha (\psi _{\zeta _\alpha }) \vert ^2\right] \le \frac{C}{N} \int _\mathsf X(\Delta ( L_\alpha (x)\zeta _\alpha (x) ))^2 \pi _0(dx) , \end{aligned}$$

where \(\psi _{\zeta _\alpha }(\varvec{x})\) is as in (38).

Proof

The result is proven in Law et al. (2022). \(\square \)

3.2 Random sample size version

Consider drawing \(N/N_\textrm{min}\) i.i.d. samples \(\alpha _i \sim \textbf{p}\), where \(\textbf{p}\) is a probability distribution on \(\mathbb {Z}_+^D\) with \(\textbf{p}(\alpha ) =: p_{\alpha } > 0\), to be specified later. Define the allocations \(\textbf{A} \in \mathbb {Z}_+^{D \times N/N_\textrm{min}}\) by \(\textbf{A}_i = \alpha _i\), and the (scaled) counts for each \(\alpha \in \mathbb {Z}_+^D\) by \(N_\alpha = N_\textrm{min} \#\{ i; \alpha _i = \alpha \} \in \mathbb {Z}_+\), collectively denoted \(\textbf{N}\). Note that \(\mathbb {E}N_\alpha = N p_\alpha \) and \(N_\alpha / N \rightarrow p_\alpha \) almost surely as \(N \rightarrow \infty \).

Now consider constructing a MISMC estimator of the type in (42) using a random number \(N_\alpha \) of samples, \(F_\alpha ^{N_\alpha }(\psi _{\zeta _\alpha })\), and recall the properties (40) and Theorem 1. Conditioned on \(\textbf{A}\), or equivalently conditioned on \(\textbf{N}\), these properties still hold. For \(\zeta : {\textsf{X}} \rightarrow \mathbb R\) define the estimator

$$\begin{aligned} {\widehat{F}}^\textrm{rMI}(\zeta ):= \sum _{\alpha \in \mathbb {Z}_+^D} \frac{N_\alpha }{N p_\alpha } F_\alpha ^{N_\alpha }(\psi _{\zeta _\alpha }) . \end{aligned}$$
(43)
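Putting the pieces together, the randomized estimator (43) can be sketched as follows, with `sample_alpha`, `p` and `F_hat` (the SMC estimate \(F_\alpha ^{N_\alpha }(\psi _{\zeta _\alpha })\)) supplied by the user; the names are illustrative.

```python
from collections import Counter

def rmi_estimate(sample_alpha, p, F_hat, N, N_min):
    """Randomized MI estimator (43): draw N/N_min indices alpha_i ~ p,
    give each distinct alpha a budget N_alpha = N_min * count(alpha),
    and reweight the independent SMC estimates by N_alpha / (N p_alpha)."""
    counts = Counter(sample_alpha() for _ in range(N // N_min))
    return sum(N_min * c / (N * p(alpha)) * F_hat(alpha, N_min * c)
               for alpha, c in counts.items())
```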

3.3 Theoretical results

The following standard MISMC assumptions will be made.

Assumption 2

For any \(\zeta : \mathsf X\rightarrow \mathbb R\) bounded and Lipschitz, there exist \(C, \beta _i, s_i, \gamma _i >0\) for \(i=1,\dots , D\) such that for resolution vector \((2^{-\alpha _1},\dots , 2^{-\alpha _D})\), i.e. resolution \(2^{-\alpha _i}\) in the \(i^\textrm{th}\) direction, the following holds

  1. (B)

    \(\vert \Delta f_\alpha (\zeta ) \vert =: B_{\alpha } \le C \prod _{i=1}^D 2^{-\alpha _i s_i}\);

  2. (V)

    \(\int _\mathsf X(\Delta ( L_\alpha (x)\zeta _\alpha (x) ))^2 \pi _0(dx) =: V_{\alpha } \le C \prod _{i=1}^D 2^{-\alpha _i \beta _i}\);

  3. (C)

    \(\textrm{COST}(F_\alpha (\psi _{\zeta })) \propto \prod _{i=1}^D 2^{\alpha _i \gamma _i}\).

First, we need to examine the bias of the estimator (43).

Proposition 2

Assume 1, let \(\zeta : \mathsf X\rightarrow \mathbb R\), and suppose that \(p_{\alpha } >0\) for every multi-index \(\alpha \). Then the randomized MISMC estimator (43) is free from discretization bias, i.e.

$$\begin{aligned} \mathbb {E}[ {\widehat{F}}^\textrm{rMI}(\zeta ) ] = f(\zeta ) . \end{aligned}$$
(44)

Proof

The proof is given in Appendix A.1. \(\square \)

Now that unbiasedness has been established, the next step is to examine the variance.

Proposition 3

Assume 1 and 2, let \(\zeta : \mathsf X\rightarrow \mathbb R\), and suppose that \(p_{\alpha } >0\) for every multi-index \(\alpha \). Then the variance of the randomized MISMC estimator (43) satisfies

$$\begin{aligned} \mathbb {E}\left[ ({\widehat{F}}^\textrm{rMI}(\zeta ) - f(\zeta ))^2 \right] \le \frac{C}{N} \Bigg ( \sum _{\alpha \in \mathbb {Z}_{+}^{D}} \frac{1}{p_\alpha } \prod _{i=1}^D 2^{-\alpha _i \beta _i} + \sum _{\alpha ' \ne \alpha \in \mathbb {Z}_{+}^{D}} \prod _{i=1}^D 2^{-\alpha _i \beta _i/2} \prod _{j=1}^D 2^{-\alpha _j' \beta _j/2} \Bigg ). \end{aligned}$$
(45)

In particular, if

$$\begin{aligned} \sum _{\alpha \in \mathbb {Z}_{+}^{D}} \frac{1}{p_\alpha } \prod _{i=1}^D 2^{-\alpha _i \beta _i} \le C , \end{aligned}$$

then one has the canonical convergence rate.

Proof

The proof is given in Appendix A.2. \(\square \)
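To make the condition in Proposition 3 concrete, consider the following illustrative choice of \(\textbf{p}\) (not necessarily the one made in Appendix A.3). Under Assumption 2 (V, C) with \(\beta _i>\gamma _i\), take \(p_\alpha \propto \prod _{i=1}^D 2^{-\alpha _i(\beta _i+\gamma _i)/2}\). Then

$$\begin{aligned} \frac{1}{p_\alpha } \prod _{i=1}^D 2^{-\alpha _i \beta _i} \propto \prod _{i=1}^D 2^{-\alpha _i (\beta _i-\gamma _i)/2} , \quad p_\alpha \prod _{i=1}^D 2^{\alpha _i \gamma _i} \propto \prod _{i=1}^D 2^{-\alpha _i (\beta _i-\gamma _i)/2} , \end{aligned}$$

so both the variance sum in Proposition 3 and the expected cost are finite whenever \(\beta _i > \gamma _i\), which is the mechanism behind Theorem 4 below.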

The randomized MISMC ratio estimator is now defined for \(\varphi : \mathsf X\rightarrow \mathbb R\) by

$$\begin{aligned} \widehat{\varphi }^\textrm{rMI}:= \frac{{\widehat{F}}^\textrm{rMI}(\varphi )}{\max \{{\widehat{F}}^\textrm{rMI}(1),Z_\textrm{min}\}} , \end{aligned}$$
(46)

where \({\widehat{F}}^\textrm{rMI}\) is defined in (43).

Before presenting the main result of the present work, we first recall the main results of Law et al. (2022), which are derived from Theorem 1. Law et al. (2022) considers two index sets for the original MISMC ratio estimator: the tensor product index set and the total degree index set. Compared to the tensor product index set, the total degree index set discards some expensive indices and requires much weaker conditions in the convergence theorem. The convergence result for the tensor product index set is given in Theorem 2 of Law et al. (2022), and that for the total degree index set in Theorem 3 of Law et al. (2022).

Theorem 2

Assume 1 and 2, with \(\sum _{j=1}^D \frac{\gamma _j}{s_j} \le 2\) and \(\beta _i>\gamma _i\) for \(i=1,\dots ,D\). Then for any \(\varepsilon >0\) and bounded and Lipschitz \(\varphi : \mathsf X\rightarrow \mathbb R\), it is possible to choose a tensor product index set \(\mathcal{I}_{L_1:L_D}:= \{\alpha \in \mathbb {Z}_{+}^{D}:\alpha _1\in \{0,...,L_1\},...,\alpha _D \in \{0,...,L_D\}\}\) and \(\{N_\alpha \}_{\alpha \in \mathcal{I}_{L_1:L_D}}\), such that for some \(C>0\) that is independent of \(\varepsilon \)

$$\begin{aligned} \mathbb {E}\left[ ( {\widehat{\varphi }}^\textrm{MI}_{\mathcal{I}_{L_1:L_D}} - \pi (\varphi ) )^2 \right] \le C \varepsilon ^2 , \end{aligned}$$

and \(\textrm{COST}({\widehat{\varphi }}^\textrm{MI}_{\mathcal{I}_{L_1:L_D}} ) \le C \varepsilon ^{-2}\), the canonical rate. The estimator \(\widehat{\varphi }^\textrm{MI}_{\mathcal{I}_{L_1:L_D}} \) is defined in equation (42).

Proof

The proof is given in Law et al. (2022). \(\square \)

Theorem 3

Assume 1 and 2, with \(\beta _i>\gamma _i\) for \(i=1,\dots ,D\). Then for any \(\varepsilon >0\) and bounded and Lipschitz \(\varphi : \mathsf X\rightarrow \mathbb R\), it is possible to choose a total degree index set \(\mathcal{I}_{L}:= \{\alpha \in \mathbb {Z}_{+}^{D}: \sum _{i=1}^{D} \delta _i\alpha _i \le L, \sum _{i=1}^{D} \delta _i=1 \}\), \(\delta _i \in (0,1]\), and \(\{N_\alpha \}_{\alpha \in \mathcal{I}_{L}}\), such that for some \(C>0\) that is independent of \(\varepsilon \)

$$\begin{aligned} \mathbb {E}\left[ ( {\widehat{\varphi }}^\textrm{MI}_{\mathcal{I}_{L}} - \pi (\varphi ) )^2 \right] \le C \varepsilon ^2 , \end{aligned}$$

and \(\textrm{COST}({\widehat{\varphi }}^\textrm{MI}_{\mathcal{I}_{L}} ) \le C \varepsilon ^{-2}\), the canonical rate. The estimator \(\widehat{\varphi }^\textrm{MI}_{\mathcal{I}_{L}} \) is defined in equation (42).

Proof

The proof is given in Law et al. (2022). \(\square \)

Theorem 4

Assume 1 and 2 (V,C), with \(\beta _i>\gamma _i\) for \(i=1,\dots ,D\). Then, for bounded and Lipschitz \(\varphi : \mathsf X\rightarrow \mathbb R\), it is possible to choose a probability distribution \(\textbf{p}\) on \(\mathbb {Z}_+^D\) such that for some \(C>0\) that is independent of N

$$\begin{aligned} \mathbb {E}[ ( {\widehat{\varphi }}^\textrm{rMI} - \pi (\varphi ) )^2 ] \le \frac{C}{N} , \end{aligned}$$

and expected \(\textrm{COST}({\widehat{\varphi }}^\textrm{rMI}) \le C N\), i.e. the canonical rate. The estimator \({\widehat{\varphi }}^\textrm{rMI}\) is defined in equation (46).

Proof

The proof is given in Appendix A.3. \(\square \)

The noticeable differences of Theorem 4 with respect to Theorems 2 and 3 are that (i) discretization bias does not appear, so the bias rates in Assumption 2 (B) are not required, nor is the constraint related to them shown in Table 1, and (ii) no index set \(\mathcal{I}\) needs to be selected, since the estimator sums over \(\alpha \in \mathbb {Z}_+^D\) (noting that many of these indices do not get populated).

Table 1 Comparison of non-desirable constraints required to achieve canonical complexity with MISMC and rMISMC

4 Numerical results

The problems considered here are the same as in Law et al. (2022), and we intend to compare our rMISMC ratio estimator with the original MISMC ratio estimator.

The code used for the numerical results in this paper can be found at https://github.com/liangxinzhu/rMISMCRE.git.

4.1 Verification of assumption

We revisit here the discussion of the required Assumption 2 for the 2D PDE and 2D LGP models. Verification for the 1D PDE model follows naturally from the discussion of the 2D PDE model. Propositions 4, 5 and 6 and their proofs are given in Law et al. (2022).

We define the mixed Sobolev-like norms as

$$\begin{aligned} \left\Vert x\right\Vert _{q}:=\left\Vert A^{q/2}x\right\Vert , \end{aligned}$$
(47)

where \(A=\sum _{k\in \mathbb {Z}^2} a_k\phi _k \otimes \phi _k\), for the orthonormal basis \(\{\phi _k\}\) defined above (21), \(a_k = k_1^2k_2^2\), and \(\left\Vert \cdot \right\Vert \) is the \(L^2(\Omega )\) norm. Note that the approximate posteriors of the motivating problems have the form \(\exp (\Phi _{\alpha }(x))\) for some \(\Phi _{\alpha }:\mathsf X\rightarrow \mathbb R\).

Proposition 4

Let \(\mathsf X\) be a Banach space with \(D=2\) s.t. \(\pi _0(\mathsf X)=1\), with norm \(\left\Vert \cdot \right\Vert _{\mathsf X}\). For all \(\epsilon > 0\), there exists a \(C(\epsilon ) > 0\) such that the following holds for \(\Phi _{\alpha } = \log (L_{\alpha })\) given by (25) or (15), respectively:

$$\begin{aligned} \Delta \exp (\Phi (x_{\alpha })) \le C(\epsilon )\exp (\epsilon \left\Vert x\right\Vert _{\mathsf X}^2)\big (\vert \Delta \Phi _{\alpha }(x_{\alpha })\vert + \vert \Delta _1 \Phi _{\alpha -e_2}(x_{\alpha -e_2})\vert \vert \Delta _2 \Phi _{\alpha -e_1}(x_{\alpha -e_1})\vert \big ). \end{aligned}$$
(48)

The variance rates required in Assumption 2 (V) for the PDE and LGP models are verified via Propositions 5 and 6. However, it is difficult to verify the variance rate theoretically for the LGC model: since it involves a double exponential factor, like \(\exp (\int \exp (x(z))dz)\), the Fernique theorem does not guarantee that such a term is finite. Instead we verify it numerically, as reported in Appendix B.

Proposition 5

Let \(u_{\alpha }\) be the solution of (2) at resolution \(\alpha \), as described in Sect. 2.1.1, for a(x) given by (4) and \(\textsf{f}(x) \in L^2(\Omega )\). Then there exists a \(C>0\) such that, uniformly over \(x \in [-1,1]^d\),

$$\begin{aligned} \left\Vert \Delta u_{\alpha }(x)\right\Vert \le C 2^{-2(\alpha _1 + \alpha _2)}. \end{aligned}$$
(49)

Since \(L_{\alpha }(x) \le C < \infty \) in the PDE problem by Assumption 1, the constant in Proposition 4 can be made uniform over x, so the variance rate in Assumption 2 is obtained.

Proposition 6

Let \(x \sim \pi _0\), where \(\pi _0\) is a Gaussian process of the form (20) with spectral decay corresponding to (21), and let \(x_{\alpha }\) correspond to truncation on the index set \(\mathcal {A}_{\alpha } = \cap _{i=1}^2\{ \vert k_i \vert \le 2^{\alpha _i} \}\) as in (23). Then there is a \(C>0\) such that for all \(q<(\beta -1)/2\)

$$\begin{aligned} \left\Vert \Delta x_{\alpha }\right\Vert ^2 \le C \left\Vert x\right\Vert _{q}^2 2^{-2q(\alpha _1 + \alpha _2) }. \end{aligned}$$
(50)

Furthermore, this rate is inherited by the likelihood with \(\beta _i=\beta \).

4.2 1D toy problem

We consider a 1D PDE toy problem which has also been used in Jasra et al. (2021a) and Law et al. (2022). Let the domain be \(\Omega =[0,1]\), \(a = 1\) and \(\textsf{f}=x\) in PDE (2). This toy PDE problem has the analytical solution \(u(x)(z) = \frac{x}{2}(z^2-z)\). Given the quantity of interest \(\varphi (x) = x^2\), we aim to compute its expectation \(\mathbb {E}[\varphi (x)]\). In the following implementation, we take the observations at the ten points \([0.1, 0.2, ..., 0.9, 1]\) in the interval [0, 1], so the observations are generated by

$$\begin{aligned} y_i = -0.5x^{*}(z_i^2-z_i) + \nu _i, \text { for } i=1,...,10, \end{aligned}$$
(51)

where \([z_1,...,z_{10}] = [0.1, 0.2,...,0.9,1]\), \(x^*\) is sampled from \(U[-1,1]\) and \(\nu _i \sim \mathcal {N}(0,0.2^2)\). The reference solution is computed as in Jasra et al. (2021a) and Law et al. (2022). From Fig. 5, we have the rates \(\alpha = 2\) and \(\beta = 4\). The value of \(\gamma \) is 1, because we use linear nodal basis functions in the FEM and a tridiagonal solver.
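A minimal sketch of this data-generation step (51), with a fixed seed of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
z = np.linspace(0.1, 1.0, 10)            # observation points z_1,...,z_10
x_star = rng.uniform(-1.0, 1.0)          # x* ~ U[-1,1]
y = -0.5 * x_star * (z**2 - z) + rng.normal(0.0, 0.2, size=z.size)  # (51)
```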

Fig. 1 Comparison among three methods for the 1D inverse toy problem. We use 100 realisations to compute the MSE for each experiment and use the rate of convergence to compare the different methods. The red circle line is MLSMC with ratio estimator; the yellow diamond line is rMLSMC with ratio estimator; the purple square line is single-level sequential Monte Carlo. Rates of regression: (1) MLSMC: \(-\)1.008; (2) rMLSMC: \(-\)1.016; (3) SMC: \(-\)0.812

From Fig. 1, the convergence behaviour of rMLSMC and MLSMC is nearly the same, and their convergence rate is approximately \(-1\), the canonical rate. The difference in performance between (r)MLSMC and SMC is the rate of convergence: the convergence rate for SMC is approximately \(-4/5\). At the same total computational cost, the MSE of (r)MLSMC is larger than that of SMC until the cost reaches \(10^4\). We conclude that (r)MLSMC performs better than SMC in terms of the rate of convergence, as expected.

4.3 2D elliptic partial differential equation

Applying rMLSMC to a 1D analytical PDE problem is only an appetizer; we now focus on applying rMISMC to higher-dimensional problems. We consider a 2D non-analytical elliptic PDE on the domain \(\Omega =[0,1]^2\) with \(\textsf{f} = 100\) and a(x) taking the form

$$\begin{aligned} a(x)(z) = 3+x_1 \cos (3z_1)\sin (3z_2) + x_2 \cos (z_1)\sin (z_2). \end{aligned}$$
(52)

We let the prior distribution be a uniform distribution \(U[-1,1]^2\) and set the quantity of interest to be \(\varphi (x) = x_1^2 + x_2^2\), which is a generalisation of the one-dimensional case. We take the observations at a set of four points: {(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)}, and the corresponding observations are given by

$$\begin{aligned} y = u_{\alpha }(x^{*}) + \nu , \end{aligned}$$
(53)

where \(u_{\alpha }(x^{*})\) is the approximate solution at \(\alpha = [10,10]\), \(x^*\) is sampled from \(U[-1,1]^2\) and \(\nu \sim \mathcal {N}(0,0.5^2\textbf{I}_{4})\). The 2D PDE solver applied in this paper is modified from the MATLAB toolbox IFISS (Elman et al. 2007).

Due to the zero Dirichlet boundary condition, the approximate solution is zero at \(\alpha _i=0\) and \(\alpha _i=1\) for \(i=1,2\), so we set the starting index as \(\alpha _1=\alpha _2 = 2\). From Fig. 6, we have \(s_1=s_2=2\) and \(\beta _1=\beta _2=4\) for the mixed rates. Since we use bilinear basis functions and the MATLAB backslash solver, one has \(\gamma _1=\gamma _2=1\).

Fig. 2 Comparison between two methods for the 2D non-analytical PDE. We use 200 realisations to compute the MSE for each experiment and use the rate of convergence to compare the different methods. The blue triangle line is MISMC with ratio estimator and tensor product set; the red circle line is MISMC with ratio estimator and total degree set; the yellow diamond line is rMISMC with ratio estimator. Rates of regression: (1) MISMC_TP: \(-\)0.964; (2) MISMC_TD: \(-\)0.925; (3) rMISMC: \(-\)1.015

We consider two index sets in the MISMC approach: the tensor product (TP) index set and the total degree (TD) index set. From Fig. 2, MISMC with the two different index sets and rMISMC have similar convergence behaviour, with convergence rate approximately \(-1\). Although we do not show the SMC method in Fig. 2, its theoretical convergence rate drops from \(-4/5\) (1D) to \(-2/3\) (2D), i.e. its rate of convergence suffers from the curse of dimensionality. So far, the convergence behaviour of MISMC (with TP or TD index set) and rMISMC is similar when applied to the 1D and 2D PDE problems, with both achieving the canonical rates, but we will see a difference in the following two statistical models.

4.4 Log Gaussian Cox model

Now, we consider the LGC model introduced in Sect. 2.2. We set the parameters as \(\theta = (\theta _1,\theta _2,\theta _3) = (0,1,(33/\pi )^2)\) in the LGC model. When using rMISMC and MISMC on the 2D log Gaussian Cox model, we need to set the starting level to \(\alpha _1 = \alpha _2 = 5\), since the asymptotic regularity only appears for \(\alpha _1 \ge 5\) and \(\alpha _2 \ge 5\). Further, from Fig. 7, we have mixed rates \(s_1=s_2=0.8\) and \(\beta _1=\beta _2=1.6\). Since we use the FFT for approximation, one has an asymptotic rate for the cost of \(\gamma _i=1 + \omega < \beta _i\) for any \(\omega >0\) and \(i=1,2\).

Fig. 3 Comparison between two methods for the 2D LGC model. We use 200 realisations to compute the MSE for each experiment and use the rate of convergence to compare the different methods. The blue triangle line is MISMC with ratio estimator and tensor product set; the red circle line is MISMC with ratio estimator and total degree set; the yellow diamond line is rMISMC with ratio estimator. Rates of regression: (1) MISMC_TP: \(-\)0.502; (2) MISMC_TD: \(-\)0.972; (3) rMISMC: \(-\)1.008

Fig. 4 Comparison between two methods for the 2D LGP model. We use 200 realisations to compute the MSE for each experiment and use the rate of convergence to compare the different methods. The blue triangle line is MISMC with ratio estimator and tensor product set; the red circle line is MISMC with ratio estimator and total degree set; the yellow diamond line is rMISMC with ratio estimator. Rates of regression: (1) MISMC_TP: \(-\)0.565; (2) MISMC_TD: \(-\)1.017; (3) rMISMC: \(-\)1.195

The rates of convergence of MISMC TD and rMISMC are approximately \(-1\), and both achieve the canonical complexity of MSE\(^{-1}\). However, the constant for MISMC TD is smaller than that for rMISMC. We set a relatively large minimum number of samples, \(N_0\), in the SMC sampler to alleviate the unexpectedly high variance caused by having too few samples. It is reasonable to expect a higher variance for the randomized method, however, since it involves infinitely many terms rather than finitely many. MISMC TD achieves the canonical rate, but MISMC TP only has a sub-canonical rate. This is because the assumption \( \sum _{i=1}^{2} \frac{\gamma _i}{s_i} \le 2\) is violated \(\left( \sum _{i=1}^{2} \frac{\gamma _i}{s_i} = \frac{5}{2}\right) \); this assumption is only needed for the tensor product index set, not for the total degree index set. This indicates that an improper choice of index set in MISMC will drop the rate from canonical to sub-canonical, which highlights the benefit of rMISMC, since it achieves the canonical rate without specifying an index set in advance.

4.5 Log Gaussian process model

We set the parameters as \(\theta = (\theta _1, \theta _2, \theta _3) = (0,1,(33/\pi /2)^2)\) in the LGP model. Similar to the LGC setting, when using rMISMC and MISMC on the 2D log Gaussian process model, we need to set the starting level to \(\alpha _1 = \alpha _2 = 5\), since the asymptotic regularity only appears for \(\alpha _1 \ge 5\) and \(\alpha _2 \ge 5\). Further, from Fig. 8, we set \(s_1=s_2=0.8\) and \(\beta _1=\beta _2=1.6\). As with the cost rate in LGC, one has \(\gamma _i=1 + \omega < \beta _i\) for any \(\omega >0\) and \(i=1,2\).

For the LGP model, we observe similar results to the LGC model: MISMC TP has a sub-canonical rate, and the constant for MISMC TD is smaller than that for rMISMC. However, the gap between the constants for rMISMC and MISMC TD is much greater in LGP than in LGC. There may be other unidentified sources of high variance in rMISMC; this is the subject of ongoing investigation. In addition, it should be noted that the LGP model is much more sensitive to the parameter values \(\theta =(\theta _1,\theta _2,\theta _3)\) than the LGC model.