Multilevel particle filters: normalizing constant estimation


In this article, we introduce two new estimates of the normalizing constant (or marginal likelihood) for partially observed diffusion (POD) processes, with discrete observations. One estimate is biased but non-negative and the other is unbiased but not almost surely non-negative. Our method uses the multilevel particle filter of Jasra et al. (Multilevel particle filter, arXiv:1510.04977, 2015). We show that, under assumptions, for Euler discretized PODs and a given \(\varepsilon >0\), in order to obtain a mean square error (MSE) of \({\mathcal {O}}(\varepsilon ^2)\) one requires a work of \({\mathcal {O}}(\varepsilon ^{-2.5})\) for our new estimates, versus a standard particle filter that requires a work of \({\mathcal {O}}(\varepsilon ^{-3})\). Our theoretical results are supported by numerical simulations.


Fig. 1
Fig. 2


References

  1. Beskos, A., Jasra, A., Law, K.J.H., Tempone, R., Zhou, Y.: Multilevel SMC samplers. Stoch. Proc. Appl. (2016) (to appear)

  2. Cappé, O., Moulines, É., Rydén, T.: Inference in Hidden Markov Models. Springer, New York (2005)


  3. Cérou, F., Del Moral, P., Guyader, A.: A non-asymptotic theorem for unnormalized Feynman–Kac particle models. Ann. Inst. Henri Poincaré 47, 629–649 (2011)


  4. Chopin, N., Singh, S.S.: On particle Gibbs sampling. Bernoulli 21, 1855–1883 (2015)


  5. Del Moral, P.: Feynman–Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Springer, New York (2004)


  6. Del Moral, P.: Mean Field Simulation for Monte Carlo Integration. Chapman & Hall, London (2013)


  7. Del Moral, P., Jacod, J., Protter, P.: The Monte-Carlo method for filtering with discrete-time observations. Probab. Theory Rel. Fields 120, 346–368 (2001)


  8. Doucet, A., Johansen, A.: A tutorial on particle filtering and smoothing: Fifteen years later. In: Crisan, D., Rozovsky, B. (eds.) Handbook of Nonlinear Filtering. Oxford University Press, Oxford (2011)


  9. Fearnhead, P., Papaspiliopoulos, O., Roberts, G.O.: Particle filters for partially observed diffusions. J. R. Stat. Soc. Ser. B 70, 755–777 (2008)


  10. Giles, M.B.: Multi-level Monte Carlo path simulation. Oper. Res. 56, 607–617 (2008)


  11. Giles, M.B.: Multilevel Monte Carlo methods. Acta Numer. 24, 259–328 (2015)


  12. Heinrich, S.: Multilevel Monte Carlo methods. In: Margenov, S., Wasniewski, J., Yalamov, P. (eds.) Large-Scale Scientific Computing. Springer, Berlin (2001)


  13. Jacob, P.E., Lindsten, F., Schön, T.B.: Coupling of particle filters. arXiv preprint arXiv:1606.01156 (2016)

  14. Jasra, A., Kamatani, K., Law, K.J.H., Zhou, Y.: Multilevel particle filter. arXiv preprint arXiv:1510.04977 (2015)

  15. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Springer, Berlin (1992)


  16. Sen, D., Thiery, A., Jasra, A.: On coupling particle filter trajectories. arXiv preprint arXiv:1606.01016 (2016)

Acknowledgements


We thank the referee for his/her comments which have substantially improved the paper. AJ and YZ were supported by an AcRF tier 2 grant: R-155-000-161-112. AJ is affiliated with the Risk Management Institute, the Center for Quantitative Finance, and the OR & Analytics cluster at NUS. KK and AJ acknowledge CREST, JST for additionally supporting the research.

Author information



Corresponding author

Correspondence to Ajay Jasra.


Appendix 1: set-up

Basic notations

Recall the following notations. The total variation norm is \(\Vert \cdot \Vert _{\text {tv}}\). The collection of real-valued Lipschitz functions on a space E is written \(\text {Lip}(E)\). For two Markov kernels \(M_1\) and \(M_2\) on the same space E, letting \({\mathcal {A}}=\{\varphi : \Vert \varphi \Vert \le 1, \varphi \in \text {Lip}(E)\}\) write

$$\begin{aligned}&|||M_{1}-M_{2}||| := \sup _{\varphi \in {\mathcal {A}}}\sup _x |\int _E \varphi (y) M_1(x,\mathrm{d}y)\\&\quad - \int _E \varphi (y) M_2(x,\mathrm{d}y) |. \end{aligned}$$

Consider a sequence of random variables \((v_n)_{n\ge 0}\) with \(v_n=(u_{n,1},u_{n,2})\in {\mathcal {U}}\times {\mathcal {U}}=: {\mathcal {V}}\). For \(\mu \in {\mathcal {P}}({\mathcal {V}})\) (the probability measures on \({\mathcal {V}}\)) and function \(\varphi \in {\mathcal {B}}_b({\mathcal {U}})\) (bounded-measurable, real-valued) we will write:

$$\begin{aligned} \mu (\varphi _j) = \int _{\mathcal {V}}\varphi (u_j) \mu (\mathrm{d}v)\qquad j\in \{1,2\}. \end{aligned}$$

Write the \(j\in \{1,2\}\) marginals (on \(u_j\)) of a probability \(\mu \in {\mathcal {P}}({\mathcal {V}})\) as \(\mu _j\). Define the potentials \(G_n{:}\,{\mathcal {U}}\rightarrow {\mathbb {R}}_+\). Let \(\eta _0\in {\mathcal {P}}({\mathcal {V}})\) and define Markov kernels \(M_{n}{:}\,{\mathcal {V}}\rightarrow {\mathcal {P}}({\mathcal {V}})\) with \(n\ge 1\). It is explicitly assumed that for \(\varphi \in {\mathcal {B}}_b({\mathcal {U}})\) the j marginals satisfy

$$\begin{aligned} M_{n}(\varphi _j)(v)= & {} \int _{\mathcal {V}}\varphi (u_j') M_n(v,\mathrm{d}v')\nonumber \\= & {} \int _{\mathcal {U}}\varphi (u_j') M_{n,j}(u_j,\mathrm{d}u_j'). \end{aligned}$$

We adopt the definition for \((v,\tilde{v})=((u_1,u_2),(\tilde{u}_1,\tilde{u}_2))\) of a sequence of Markov kernels \((\bar{M}_n)_{n\ge 1}\), \(\bar{M}_n:{\mathcal {V}}\times {\mathcal {V}}\rightarrow {\mathcal {P}}({\mathcal {V}})\)

$$\begin{aligned} \bar{M}_n((v,\tilde{v}),\mathrm{d}v') := M_n((u_1,\tilde{u}_2),\mathrm{d}v'). \end{aligned}$$

In the main text \({\mathcal {U}}={\mathbb {R}}^d\).

Marginal Feynman–Kac formula

Given the above notations and definitions we define the \(j\)-marginal Feynman–Kac formulae:

$$\begin{aligned} \gamma _{n,j}(\mathrm{d}u_n) = \int \prod _{p=0}^{n-1} G_p(u_p) \eta _{0,j}(\mathrm{d}u_0) \prod _{p=1}^n M_{p,j}(u_{p-1},\mathrm{d}u_p) \end{aligned}$$

with for \(\varphi \in {\mathcal {B}}_b({\mathcal {U}})\)

$$\begin{aligned} \eta _{n,j}(\varphi ) = \frac{\gamma _{n,j}(\varphi )}{\gamma _{n,j}(1)}. \end{aligned}$$

One can also define the sequence of Bayes operators, for \(\mu \in {\mathcal {P}}({\mathcal {U}})\)

$$\begin{aligned} \Phi _{n,j}(\mu )(\mathrm{d}u) = \frac{\mu (G_{n-1}M_{n,j}(\cdot ,\mathrm{d}u))}{\mu (G_{n-1})}\qquad n\ge 1. \end{aligned}$$

Recall that for \(n\ge 1\), \(\eta _{n,j} = \Phi _{n,j}(\eta _{n-1,j})\).
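The recursion \(\eta _{n,j}=\Phi _{n,j}(\eta _{n-1,j})\) is the usual reweight/resample/propagate cycle of a particle filter. As a minimal sketch, with hypothetical Gaussian choices of \(G\) and \(M\) (illustrative only, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)

def G(u, y=1.0):
    # Hypothetical bounded positive potential: Gaussian likelihood of an
    # assumed observation y. Stands in for G_{n-1}; not the paper's model.
    return np.exp(-0.5 * (y - u) ** 2)

def M(u):
    # Hypothetical Markov kernel M_{n,j}: a stable autoregression.
    return 0.9 * u + 0.5 * rng.standard_normal(u.shape)

def bayes_operator_step(particles):
    """One application of Phi_{n,j}: reweight by G_{n-1}, resample, propagate."""
    w = G(particles)
    w = w / w.sum()                                  # normalized weights
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return M(particles[idx])                         # draw from M_{n,j}(u, .)

eta_prev = rng.standard_normal(1000)                 # sample from eta_{n-1,j}
eta_next = bayes_operator_step(eta_prev)             # approximates eta_{n,j}
```

Iterating this step yields the particle approximations of \(\eta _{n,j}\) used throughout the appendix.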

Feynman–Kac formulae for multilevel particle filters

For \(\mu \in {\mathcal {P}}({\mathcal {V}})\) define for \(u\in {\mathcal {U}}\), \(v\in {\mathcal {V}}\):

$$\begin{aligned}&G_{n,j,\mu }(u) = \frac{G_{n}(u)}{\mu _j(G_{n})} \\&\bar{G}_{n,\mu }(v) = G_{n,1,\mu }(u_1) \wedge G_{n,2,\mu }(u_2). \end{aligned}$$

Now for any sequence \((\mu _n)_{n\ge 0}\), \(\mu _n\in {\mathcal {P}}({\mathcal {V}})\), define the sequence of operators \((\bar{\Phi }_n(\mu _{n-1}))_{n\ge 1}\):

$$\begin{aligned}&\bar{\Phi }_n(\mu _{n-1})(\mathrm{d}v_n)\\&\quad =\mu _{n-1}(\bar{G}_{n-1,\mu _{n-1}})\frac{\mu _{n-1}(\bar{G}_{n-1,\mu _{n-1}}M_n(\cdot ,\mathrm{d}v_n))}{\mu _{n-1}(\bar{G}_{n-1,\mu _{n-1}})}\\&\qquad +\,(1-\mu _{n-1}(\bar{G}_{n-1,\mu _{n-1}}))\\&\qquad \times \,\mu _{n-1}\otimes \mu _{n-1} \Big ( \Big [ \frac{G_{n-1,1,\mu _{n-1}}-\bar{G}_{n-1,\mu _{n-1}}}{\mu _{n-1}(G_{n-1,1,\mu _{n-1}}-\bar{G}_{n-1,\mu _{n-1}})}\\&\qquad \otimes \frac{G_{n-1,2,\mu _{n-1}}-\bar{G}_{n-1,\mu _{n-1}}}{\mu _{n-1}(G_{n-1,2,\mu _{n-1}}-\bar{G}_{n-1,\mu _{n-1}})} \Big ] \bar{M}_n(\cdot ,\mathrm{d}v_n) \Big ). \end{aligned}$$

Now define \(\bar{\eta }_n := \bar{\Phi }_n(\bar{\eta }_{n-1})\) for \(n\ge 1\), \(\bar{\eta }_0=\eta _0\). The following Proposition is proved in Jasra et al. (2015):

Proposition 5.1

Let \((\mu _n)_{n\ge 0}\) be a sequence of probability measures on \({\mathcal {V}}\) with \(\mu _0=\eta _0\) and for each \(j\in \{1,2\}\), \(\varphi \in {\mathcal {B}}_b({\mathcal {U}})\)

$$\begin{aligned} \mu _{n}(\varphi _j) = \eta _{n,j}(\varphi ). \end{aligned}$$


Then, for \(n\ge 1\),

$$\begin{aligned} \eta _{n,j}(\varphi ) = \bar{\Phi }_n(\mu _{n-1})(\varphi _j). \end{aligned}$$

In particular \(\bar{\eta }_{n,j}=\eta _{n,j}\) for each \(n\ge 0\).

The point of the proposition is that if one has a system that samples \(\bar{\eta }_0\), \(\bar{\Phi }_1(\bar{\eta }_0)\), and so on, then marginally one obtains exactly the marginals \(\eta _{n,j}\) at each time point. In practice one cannot do this, but rather samples at time 0

$$\begin{aligned} \Big (\prod _{i=1}^N \bar{\eta }_0(\mathrm{d}v_0^i)\Big ). \end{aligned}$$

Writing the empirical measure of the samples as \(\bar{\eta }^N_{0}\), one then samples

$$\begin{aligned} \prod _{i=1}^N \bar{\Phi }_{p}(\bar{\eta }^N_{p-1})(\mathrm{d}v_p^i). \end{aligned}$$

Again writing the empirical measure as \(\bar{\eta }^N_{1}\) and so on, one runs the following system:

$$\begin{aligned} \Big (\prod _{i=1}^N \bar{\eta }_0(\mathrm{d}v_0^i)\Big )\Big (\prod _{p=1}^n \prod _{i=1}^N \bar{\Phi }_{p}(\bar{\eta }^N_{p-1})(\mathrm{d}v_p^i)\Big ) \end{aligned}$$

which is exactly one pair of particle filters at a given level of the MLPF.

\(\eta _{n,1}\) (and its approximation) will represent the predictor at time n for a ‘fine’ level and \(\eta _{n,2}\) (and its approximation) will represent the predictor at time n for a ‘coarse’ level. The time index here is shifted backwards by one, relative to the main text, and this whole section only considers one coupled particle filter. This is all that is required, due to the independence of the particle filters.
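The mixture structure of \(\bar{\Phi }_n\) translates directly into a two-branch (maximal-coupling) resampling of the ancestor indices. A sketch of that index-coupling step, under generic weight vectors (the names below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def coupled_resample(g1, g2):
    """Maximal-coupling resampling of ancestor indices, mirroring Phi_bar_n.

    g1, g2: potentials G_{n-1} evaluated at the fine/coarse particle clouds.
    With probability mu(G_bar) a common ancestor is drawn (first term of the
    operator); otherwise the two indices come from the residual weights
    (second term). Returns the index arrays (a1, a2).
    """
    N = len(g1)
    w1, w2 = g1 / g1.mean(), g2 / g2.mean()   # G_{n,j,mu} at the particles
    wbar = np.minimum(w1, w2)                 # G_bar
    p_couple = wbar.mean()                    # mu(G_bar), lies in [0, 1]
    r1, r2 = w1 - wbar, w2 - wbar             # residual weights
    a1 = np.empty(N, dtype=int)
    a2 = np.empty(N, dtype=int)
    for i in range(N):
        if rng.uniform() < p_couple:
            a1[i] = a2[i] = rng.choice(N, p=wbar / wbar.sum())
        else:
            a1[i] = rng.choice(N, p=r1 / r1.sum())
            a2[i] = rng.choice(N, p=r2 / r2.sum())
    return a1, a2

# nearly identical weights => the pair is resampled together almost always
u = rng.standard_normal(200)
g_fine = np.exp(-0.5 * u ** 2)
g_coarse = np.exp(-0.5 * (u + 0.01) ** 2)
a1, a2 = coupled_resample(g_fine, g_coarse)
```

The coupled indices are then propagated through \(M_n\); pairs drawn in the residual branch would be propagated through \(\bar{M}_n\), i.e. \(M_n\) applied to the mixed pair \((u_1,\tilde{u}_2)\).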

Appendix 2: normalizing constant: unbiased estimator

Note the following

$$\begin{aligned} \gamma _{n,j}^N(1) = \prod _{p=0}^{n-1} \bar{\eta }_{p,j}^N(G_p) \end{aligned}$$

to estimate \(\gamma _{n,j}(1)\) (\(p(y_{1:n})\) in the main text; recall that the subscript \(j\in \{1,2\}\) has 1 as the fine level and 2 as the coarse). This estimate is unbiased, as proved in Jasra et al. (2015). We will consider the analysis of

$$\begin{aligned} \gamma _{n,1}^N(1) - \gamma _{n,2}^N(1). \end{aligned}$$
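As a sketch of how such estimates arise in code (with a hypothetical potential, hypothetical kernels, and an independent-resampling placeholder, none of which are the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

def G(u):
    # hypothetical bounded potential (not the paper's model)
    return np.exp(-0.5 * (1.0 - u) ** 2)

def indep_resample(u1, u2, g1, g2):
    # placeholder: independent multinomial resampling of each cloud; the
    # MLPF proper would use a coupled scheme here
    N = len(u1)
    i1 = rng.choice(N, size=N, p=g1 / g1.sum())
    i2 = rng.choice(N, size=N, p=g2 / g2.sum())
    return u1[i1], u2[i2]

def nc_estimates(u1, u2, kernel1, kernel2, n):
    """Running products gamma^N_{n,j}(1) of the average weights, j = 1, 2."""
    Z1 = Z2 = 1.0
    for _ in range(n):
        g1, g2 = G(u1), G(u2)
        Z1 *= g1.mean()                  # eta^N_{p,1}(G_p)
        Z2 *= g2.mean()                  # eta^N_{p,2}(G_p)
        u1, u2 = indep_resample(u1, u2, g1, g2)
        u1, u2 = kernel1(u1), kernel2(u2)
    return Z1, Z2

fine = lambda u: 0.9 * u + 0.5 * rng.standard_normal(u.shape)
coarse = lambda u: 0.9 * u + 0.5 * rng.standard_normal(u.shape)
Z1, Z2 = nc_estimates(rng.standard_normal(500), rng.standard_normal(500),
                      fine, coarse, n=5)
diff = Z1 - Z2           # the difference whose MSE Proposition 5.2 bounds
```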

In the assumptions below \(G_n\) is exactly \(G(x_n,y_n)\) in the main text. \(M_{n,1}\) (resp. \(M_{n,2}\)) is simply the finer (resp. coarser) Euler discretized Markov transition (there is no time parameter for the transition kernel in the main text). The following assumptions are (A1-2) in the main text, adapted to the notations of this appendix.

(A3) There exist \(c>1\) and \(C>0\), such that for all \(n\ge 0\), \(u,u' \in {\mathcal {U}}\)

  1. (i)

    Boundedness: \(c^{-1}< G_n(u) < c\);

  2. (ii)

    Globally Lipschitz: \(|G_n(u) - G_n(u')| \le C |u-u'|\).

(A4) There exists a \(C>0\) such that for each \(u,u'\in {\mathcal {U}}\), \(j\in \{1,2\}\) and \(\varphi \in {\mathcal {B}}_b({\mathcal {U}})\cap \text {Lip}({\mathcal {U}})\)

$$\begin{aligned} |M_{n,j}(\varphi )(u) - M_{n,j}(\varphi )(u')| \le C\Vert \varphi \Vert ~|u-u'|. \end{aligned}$$


Define, for \(n\ge 0\),

$$\begin{aligned} B(n) = \left( \sum _{p=0}^n \left\{ {\mathbb {E}}\left[ \left\{ \left| U_{p,1}^1-U_{p,2}^1\right| \wedge 1 \right\} ^{2}\right] ^{1/2} + \left\| \eta _{p,1}-\eta _{p,2}\right\| _{\text {tv}}\right\} + \sum _{p=1}^n|||M_{p,1}-M_{p,2}|||\right) ^2 \end{aligned}$$

where \({\mathbb {E}}\) is expectation w.r.t. the law associated to the algorithm described in this appendix. Let \(\overline{B}(0) = C B(0)\) and for \(n\ge 1\)

$$\begin{aligned} \overline{B}(n)= & {} C(n)[B(n-1) + \overline{B}(n-1) + \Vert \eta _{n-1,1}-\eta _{n-1,2}\Vert _{\text {tv}}^2\\&\quad +(\gamma _{n-1,1}(1)-\gamma _{n-1,2}(1))^2] \end{aligned}$$

where C(n) is a constant depending upon n.

Proposition 5.2

Assume (A3-4). Then for any \(n\ge 1\), \(N\ge 1\):

$$\begin{aligned} {\mathbb {E}}\left[ \left( \left[ \gamma _{n,1}^N(1) - \gamma _{n,2}^N(1)\right] - \left[ \gamma _{n,1}(1) - \gamma _{n,2}(1)\right] \right) ^2\right] \le \frac{\overline{B}(n)}{N}. \end{aligned}$$


Proof

Throughout, C(n) is a constant that depends on n, whose value may change from line to line. We prove the result by induction on n. The case \(n=1\) follows by Jasra et al. (2015, Theorem C.2), so we go immediately to the case of a general \(n>1\) and assume the result at \(n-1\). We have

$$\begin{aligned}&\left[ \gamma _{n,1}^N(1) - \gamma _{n,2}^N(1)\right] - \left[ \gamma _{n,1}(1) - \gamma _{n,2}(1)\right] \nonumber \\&\quad =\prod _{p=0}^{n-2}\eta _{p,1}^N(G_p)\left[ \eta _{n-1,1}^N(G_{n-1})-\eta _{n-1,2}^N(G_{n-1})\right] \nonumber \\&\qquad +\, \eta _{n-1,2}^N(G_{n-1})\left[ \prod _{p=0}^{n-2}\eta _{p,1}^N(G_p)-\prod _{p=0}^{n-2}\eta _{p,2}^N(G_p)\right] \nonumber \\&\qquad -\, \prod _{p=0}^{n-2}\eta _{p,1}(G_p)\left[ \eta _{n-1,1}(G_{n-1})-\eta _{n-1,2}(G_{n-1})\right] \nonumber \\&\qquad -\,\eta _{n-1,2}(G_{n-1})\left[ \prod _{p=0}^{n-2}\eta _{p,1}(G_p)-\prod _{p=0}^{n-2}\eta _{p,2}(G_p)\right] \nonumber \\&\quad = T_1^N + T_2^N - (T_1 + T_2). \end{aligned}$$

By the \(C_2\)-inequality applied to the above decomposition, it suffices to bound \({\mathbb {E}}[(T_1^N-T_1)^2]\) and \({\mathbb {E}}[(T_2^N-T_2)^2]\) separately.

Term \({\mathbb {E}}[(T_1^N-T_1)^2]\).

We have

$$\begin{aligned} {\mathbb {E}}\left[ \left( T_1^N-T_1\right) ^2\right]&\le 2\,{\mathbb {E}}\left[ \left( \gamma _{n-2,1}^N(1)\left[ \left( \eta _{n-1,1}^N(G_{n-1})-\eta _{n-1,2}^N(G_{n-1})\right) - \left( \eta _{n-1,1}(G_{n-1})-\eta _{n-1,2}(G_{n-1})\right) \right] \right) ^2\right] \\&\quad +\,2\left( \eta _{n-1,1}(G_{n-1})-\eta _{n-1,2}(G_{n-1})\right) ^2 {\mathbb {E}}\left[ \left( \gamma _{n-2,1}^N(1)-\gamma _{n-2,1}(1)\right) ^2\right] . \end{aligned}$$

The almost-sure boundedness of \(\gamma _{n-2,1}^N(1)\) and Jasra et al. (2015, Theorem C.2) imply that

$$\begin{aligned} {\mathbb {E}}\left[ \left( \gamma _{n-2,1}^N(1)\left[ \left( \eta _{n-1,1}^N(G_{n-1})-\eta _{n-1,2}^N(G_{n-1})\right) - \left( \eta _{n-1,1}(G_{n-1})-\eta _{n-1,2}(G_{n-1})\right) \right] \right) ^2\right] \le C(n) \frac{B(n-1)}{N}. \end{aligned}$$

Proposition 5.3 along with (A3) gives

$$\begin{aligned}&(\eta _{n-1,1}(G_{n-1})-\eta _{n-1,2}(G_{n-1}))^2{\mathbb {E}}[(\gamma _{n-2,1}^N(1)\\&\quad -\,\gamma _{n-2,1}(1))^2] \le \Vert \eta _{n-1,1}-\eta _{n-1,2}\Vert _{\text {tv}}^2 \frac{C(n)}{N}. \end{aligned}$$


Hence

$$\begin{aligned} {\mathbb {E}}\left[ (T_1^N-T_1)^2\right] \le C(n) \Big [\frac{B(n-1)}{N} + \Vert \eta _{n-1,1}-\eta _{n-1,2}\Vert _{\text {tv}}^2\frac{1}{N}\Big ]. \end{aligned}$$

Term \({\mathbb {E}}[(T_2^N-T_2)^2]\).

We have

$$\begin{aligned}&{\mathbb {E}}\left[ (T_2^N-T_2)^2\right] \\&\quad \le 2{\mathbb {E}}\left[ \eta _{n-1,2}^N(G_{n-1})^2(\left[ \gamma _{n-1,1}^N(1)\right. \right. \\&\left. \left. \qquad -\, \gamma _{n-1,2}^N(1)\right] - \left[ \gamma _{n-1,1}(1) - \gamma _{n-1,2}(1)\right] )^2\right] \\&\qquad +\, 2\left[ \gamma _{n-1,1}(1) - \gamma _{n-1,2}(1)\right] ^2\\&\qquad \times \,{\mathbb {E}}\left[ (\eta _{n-1,2}^N(G_{n-1})-\eta _{n-1,2}(G_{n-1}))^2\right] . \end{aligned}$$

By (A3) and the induction hypothesis

$$\begin{aligned}&{\mathbb {E}}\left[ \eta _{n-1,2}^N(G_{n-1})^2(\left[ \gamma _{n-1,1}^N(1) - \gamma _{n-1,2}^N(1)\right] \right. \\&\left. \quad -\, \left[ \gamma _{n-1,1}(1) - \gamma _{n-1,2}(1)\right] )^2\right] \le C(n) \frac{\overline{B}(n-1)}{N}. \end{aligned}$$

By Jasra et al. (2015, Proposition C.1)

$$\begin{aligned}&\left[ \gamma _{n-1,1}(1) - \gamma _{n-1,2}(1)\right] ^2{\mathbb {E}}\left[ (\eta _{n-1,2}^N(G_{n-1})\right. \\&\left. \quad -\,\eta _{n-1,2}(G_{n-1}))^2\right] \le \left[ \gamma _{n-1,1}(1) - \gamma _{n-1,2}(1)\right] ^2\frac{C(n)}{N}. \end{aligned}$$


Hence

$$\begin{aligned}&{\mathbb {E}}[(T_2^N-T_2)^2]\\&\quad \le C(n)\left[ \frac{\overline{B}(n-1)}{N}+\left[ \gamma _{n-1,1}(1) - \gamma _{n-1,2}(1)\right] ^2\frac{1}{N}\right] . \end{aligned}$$

From here one can conclude the proof. \(\square \)

Proposition 5.3

Assume (A3-4). Then for any \(n\ge 1\) there exists a \(C(n)<+\infty \) such that for any \(N\ge 1\), \(j\in \{1,2\}\)

$$\begin{aligned} {\mathbb {E}}[(\gamma _{n,j}^N(1) - \gamma _{n,j}(1))^2] \le \frac{C(n)}{N}. \end{aligned}$$


Proof

We prove the result by induction on n. The case \(n=1\) follows by Jasra et al. (2015, Proposition C.1), so we go immediately to the case of a general \(n>1\) and assume the result at \(n-1\). We have

$$\begin{aligned}&\gamma _{n,j}^N(1) - \gamma _{n,j}(1) = \prod _{p=0}^{n-2}\eta _{p,j}^N(G_p)\left[ \eta _{n-1,j}^N(G_{n-1})\right. \\&\left. \quad -\,\eta _{n-1,j}(G_{n-1})\right] + \eta _{n-1,j}(G_{n-1})\left[ \gamma _{n-1,j}^N(1) - \gamma _{n-1,j}(1)\right] . \end{aligned}$$

Thus, by the \(C_2\)-inequality:

$$\begin{aligned}&{\mathbb {E}}\left[ (\gamma _{n,j}^N(1) - \gamma _{n,j}(1))^2\right] \\&\quad \le 2{\mathbb {E}}\left[ \left( \prod _{p=0}^{n-2}\eta _{p,j}^N(G_p)\left[ \eta _{n-1,j}^N(G_{n-1})-\eta _{n-1,j}(G_{n-1})\right] \right) ^2 \right] \\&\quad +\, 2{\mathbb {E}}\left[ \left( \eta _{n-1,j}(G_{n-1})\left[ \gamma _{n-1,j}^N(1) - \gamma _{n-1,j}(1)\right] \right) ^2\right] . \end{aligned}$$

The boundedness of the \(\{G_p\}_{p\ge 0}\) together with Jasra et al. (2015, Proposition C.1) deals with the first term on the R.H.S. of the inequality, and the induction hypothesis deals with the second. \(\square \)

For the following result, it is assumed that \(M_{n,1}\) and \(M_{n,2}\) are induced by an Euler approximation, with discretization levels h/2 and h.
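A minimal sketch of such a coupled pair of Euler kernels, sharing Brownian increments across the two discretization levels (the SDE below is an illustrative choice, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

def coupled_euler(u0, a, b, T, h):
    """Coupled Euler kernels at levels h/2 (fine) and h (coarse).

    Both chains consume the same Brownian increments, which is what couples
    the pair (M_{n,1}, M_{n,2}). a and b are user-supplied drift/diffusion
    functions; the choices below are hypothetical.
    """
    uf = np.asarray(u0, dtype=float)
    uc = uf.copy()
    for _ in range(int(round(T / h))):
        dW1 = np.sqrt(h / 2) * rng.standard_normal(uf.shape)
        dW2 = np.sqrt(h / 2) * rng.standard_normal(uf.shape)
        # two fine steps of size h/2
        uf = uf + a(uf) * (h / 2) + b(uf) * dW1
        uf = uf + a(uf) * (h / 2) + b(uf) * dW2
        # one coarse step of size h, reusing the summed increment
        uc = uc + a(uc) * h + b(uc) * (dW1 + dW2)
    return uf, uc

# e.g. an Ornstein-Uhlenbeck-type SDE du = -u dt + dW over one unit of time
uf, uc = coupled_euler(np.zeros(1000), lambda u: -u,
                       lambda u: np.ones_like(u), T=1.0, h=0.1)
```

The small fine-coarse discrepancy produced by the shared increments is exactly what the quantity \(B(n)\) controls.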

Proposition 5.4

Assume (A3(i)). Then for any \(n\ge 1\) there exists a \(C(n)<+\infty \) such that for any \(\varphi \in {\mathcal {B}}_b({\mathcal {U}})\)

$$\begin{aligned} |\gamma _{n,1}(\varphi )-\gamma _{n,2}(\varphi )| \le C(n) \sup _{u\in {\mathcal {U}}}|\varphi (u)|h. \end{aligned}$$


Proof

We prove the result by induction on n. The case \(n=1\) follows by Del Moral et al. (2001, Eq. 2.4), so we go immediately to the case of a general \(n>1\) and assume the result at \(n-1\). We have

$$\begin{aligned}&\gamma _{n,1}(\varphi )-\gamma _{n,2}(\varphi ) = \gamma _{n-1,1}(G_{n-1})[\eta _{n,1}(\varphi )- \eta _{n,2}(\varphi )]\\&\quad +\, \eta _{n,2}(\varphi )[\gamma _{n-1,1}(G_{n-1})-\gamma _{n-1,2}(G_{n-1})]. \end{aligned}$$

By Jasra et al. (2015, Lemma D.1.) (assumption 4.2(i) of that paper holds for an Euler approximation)

$$\begin{aligned}&|\gamma _{n-1,1}(G_{n-1})[\eta _{n,1}(\varphi )- \eta _{n,2}(\varphi )]|\\&\quad \le 2\sup _{u\in {\mathcal {U}}}|\varphi (u)|\gamma _{n-1,1}(G_{n-1})\Vert \eta _{n,1}- \eta _{n,2}\Vert _{\text {tv}} \\&\quad \le 2\sup _{u\in {\mathcal {U}}}|\varphi (u)|\gamma _{n-1,1}(G_{n-1}) h. \end{aligned}$$

The induction hypothesis yields

$$\begin{aligned}&|\eta _{n,2}(\varphi )[\gamma _{n-1,1}(G_{n-1})-\gamma _{n-1,2}(G_{n-1})]|\\&\quad \le \sup _{u\in {\mathcal {U}}}|\varphi (u)|C(n-1) \sup _{u\in {\mathcal {U}}}|G_{n-1}(u)|h. \end{aligned}$$

The proof can then easily be completed. \(\square \)

Remark 5.1

In the Euler case, Proposition 5.4 along with Jasra et al. (2015, Lemma D.1.) and that \(B(n)= {\mathcal {O}}(h^{1/2})\) (see Jasra et al. (2015, Corollary D.1)) establishes that \(\overline{B}(n) = {\mathcal {O}}(h^{1/2})\).
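To connect this rate with the complexity claim in the abstract: under \(\overline{B}_l(n) = {\mathcal {O}}(h_l^{1/2})\) with \(h_l=2^{-l}\), the standard multilevel optimization of the sample sizes \(N_l\) (a sketch following the usual MLMC argument, not a statement proved in this paper) yields the \({\mathcal {O}}(\varepsilon ^{-2.5})\) cost:

```latex
\begin{aligned}
&\text{Per-level variance and cost: } V_l = \mathcal{O}(h_l^{1/2}),\qquad C_l = \mathcal{O}(h_l^{-1}).\\
&\text{Minimize } \textstyle\sum_l N_l C_l \text{ subject to } \textstyle\sum_l V_l/N_l = \varepsilon^2
\ \Longrightarrow\ N_l \propto \sqrt{V_l/C_l} = h_l^{3/4},\\
&\text{so } N_l = \varepsilon^{-2} h_l^{3/4} \sum_{k=0}^{L} h_k^{-1/4},
\qquad \text{bias } \mathcal{O}(h_L) = \mathcal{O}(\varepsilon)\ \Longrightarrow\ h_L \asymp \varepsilon,\\
&\text{Cost} = \sum_{l=0}^{L} N_l h_l^{-1}
= \varepsilon^{-2}\Big(\sum_{l=0}^{L} h_l^{-1/4}\Big)^{2}
= \mathcal{O}\big(\varepsilon^{-2}\, h_L^{-1/2}\big)
= \mathcal{O}\big(\varepsilon^{-5/2}\big).
\end{aligned}
```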

Appendix 3: normalizing constant: biased estimator

In order to follow this section, one must have read the previous sections of the appendix. We now consider the case of the biased estimator. In this scenario, the full algorithm is considered, that is, a single particle filter and L coupled (but independent) particle filters. Let \(n\ge 1\) be given. We define \(\gamma _{n,j}^l(1)\), \(j\in \{1,2\}\), as the normalizing constants associated to level \(l\in \{1,\ldots ,L\}\). We write \(\gamma _{n,1}^0(1)\) as the normalizing constant at the coarsest level. We set

$$\begin{aligned} \gamma _{n,j}^{N_l}(1) = \prod _{p=0}^{n-1}\eta _{p,j}^{N_l}(G_p) \end{aligned}$$

with \(j\in \{1,2\}\), \(l\in \{1,\ldots ,L\}\), with an obvious extension to \(\gamma _{n,1}^{N_0}(1)\). We will analyze the estimate

$$\begin{aligned} \gamma _{n,1}^{N_0}(1) \prod _{l=1}^L \frac{\gamma _{n,1}^{N_l}(1)}{\gamma _{n,2}^{N_l}(1)}. \end{aligned}$$
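Assembling this biased estimate from the filters' outputs is a one-liner; the numbers in the usage line below are hypothetical:

```python
import numpy as np

def ml_nc_estimate(Z0, Z_fine, Z_coarse):
    """Assemble the biased multilevel normalizing-constant estimate.

    Z0 is the single-filter estimate gamma^{N_0}_{n,1}(1) at the coarsest
    level; Z_fine[l] and Z_coarse[l] are the two estimates produced by the
    l-th coupled filter, l = 1, ..., L.
    """
    ratios = np.asarray(Z_fine, dtype=float) / np.asarray(Z_coarse, dtype=float)
    return Z0 * np.prod(ratios)

# hypothetical numbers: if each coupled pair agreed exactly, every ratio
# would equal 1 and the estimate would collapse to Z0
est = ml_nc_estimate(2.0, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
```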

We denote by (A) the assumption that (A3-4) of the previous section hold uniformly at each level (where applicable). We write \(\overline{B}_l(n)\) for the level-specific version of \(\overline{B}(n)\) of the previous section.

Proposition 5.5

Assume (A). Then for any \(n\ge 1\) there exists a \(C(n)<+\infty \) such that for any \(L\ge 1\), \(N_{0:L}\ge 1\) we have

$$\begin{aligned}&{\mathbb {E}}\left[ \left( \gamma _{n,1}^{N_0}(1) \prod _{l=1}^L \frac{\gamma _{n,1}^{N_l}(1)}{\gamma _{n,2}^{N_l}(1)} - \gamma _{n,1}^{0}(1) \prod _{l=1}^L \frac{\gamma _{n,1}^{l}(1)}{\gamma _{n,2}^{l}(1)} \right) ^2\right] \\&\quad \le C(n)\left( \frac{1}{\sqrt{N_0}} + \sum _{l=1}^L\left( \frac{\overline{B}_l(n)^{1/2}}{\sqrt{N_l}} + \frac{|\gamma _{n,1}^{l}(1)-\gamma _{n,2}^{l}(1)|}{\sqrt{N_l}} \right) \right) ^2. \end{aligned}$$


Proof

Note that,

$$\begin{aligned}&\gamma _{n,1}^{N_0}(1) \prod _{l=1}^L \frac{\gamma _{n,1}^{N_l}(1)}{\gamma _{n,2}^{N_l}(1)} - \gamma _{n,1}^{0}(1) \prod _{l=1}^L \frac{\gamma _{n,1}^{l}(1)}{\gamma _{n,2}^{l}(1)} \\&\quad =(\gamma _{n,1}^{N_0}(1)-\gamma _{n,1}^{0}(1))\prod _{l=1}^L \frac{\gamma _{n,1}^{N_l}(1)}{\gamma _{n,2}^{N_l}(1)}\\&\qquad +\, \sum _{l=1}^L\left( \gamma _{n,1}^{0}(1) \prod _{t=1}^{l-1} \frac{\gamma _{n,1}^{t}(1)}{\gamma _{n,2}^{t}(1)} \left( \frac{\gamma _{n,1}^{N_l}(1)}{\gamma _{n,2}^{N_l}(1)}- \frac{\gamma _{n,1}^{l}(1)}{\gamma _{n,2}^{l}(1)} \right) \right. \nonumber \\&\left. \qquad \prod _{s=l+1}^{L} \frac{\gamma _{n,1}^{N_s}(1)}{\gamma _{n,2}^{N_s}(1)} \right) . \end{aligned}$$

So, by the Minkowski inequality,

$$\begin{aligned}&{\mathbb {E}}\left[ \left( \gamma _{n,1}^{N_0}(1) \prod _{l=1}^L \frac{\gamma _{n,1}^{N_l}(1)}{\gamma _{n,2}^{N_l}(1)} - \gamma _{n,1}^{0}(1) \prod _{l=1}^L \frac{\gamma _{n,1}^{l}(1)}{\gamma _{n,2}^{l}(1)} \right) ^2\right] \nonumber \\&\quad \le \left( {\mathbb {E}}[(\gamma _{n,1}^{N_0}(1)-\gamma _{n,1}^{0}(1))^2]^{1/2} \prod _{l=1}^L {\mathbb {E}}\left[ \frac{\gamma _{n,1}^{N_l}(1)^2}{\gamma _{n,2}^{N_l}(1)^2}\right] ^{1/2}\right. \nonumber \\&\qquad +\,\sum _{l=1}^L\left( \gamma _{n,1}^{0}(1) \prod _{t=1}^{l-1} \frac{\gamma _{n,1}^{t}(1)}{\gamma _{n,2}^{t}(1)} {\mathbb {E}}\left[ \left( \frac{\gamma _{n,1}^{N_l}(1)}{\gamma _{n,2}^{N_l}(1)}\right. \right. \right. \nonumber \\&\left. \left. \left. \left. \qquad -\, \frac{\gamma _{n,1}^{l}(1)}{\gamma _{n,2}^{l}(1)} \right) ^2 \right] ^{1/2} \prod _{s=l+1}^{L} {\mathbb {E}}\left[ \left( \frac{\gamma _{n,1}^{N_s}(1)}{\gamma _{n,2}^{N_s}(1)}\right) ^2 \right] ^{1/2}\right) \right) ^2.\nonumber \\ \end{aligned}$$

By standard results in SMC:

$$\begin{aligned} {\mathbb {E}}[(\gamma _{n,1}^{N_0}(1)-\gamma _{n,1}^{0}(1))^2]^{1/2} \le \frac{C(n)}{\sqrt{N_0}}. \end{aligned}$$


By Minkowski and the boundedness of the \(\{G_n\}_{n\ge 0}\) we also have:

$$\begin{aligned}&{\mathbb {E}}\left[ \left( \frac{\gamma _{n,1}^{N_l}(1)}{\gamma _{n,2}^{N_l}(1)} - \frac{\gamma _{n,1}^{l}(1)}{\gamma _{n,2}^{l}(1)} \right) ^2 \right] ^{1/2}\\&\quad \le C(n)\left( {\mathbb {E}}\left[ \left[ \gamma _{n,1}^{N_l}(1)-\gamma _{n,2}^{N_l}(1) -(\gamma _{n,1}^{l}(1)-\gamma _{n,2}^{l}(1)) \right] ^2\right] ^{1/2}\right. \\&\left. \qquad +\, |\gamma _{n,1}^{l}(1)-\gamma _{n,2}^{l}(1)| {\mathbb {E}}\left[ (\gamma _{n,2}^{l}(1)-\gamma _{n,2}^{N_l}(1))^2\right] ^{1/2} \right) . \end{aligned}$$

Applying Propositions 5.2 and 5.3 to the two expectations, we obtain

$$\begin{aligned}&{\mathbb {E}}\left[ \left( \frac{\gamma _{n,1}^{N_l}(1)}{\gamma _{n,2}^{N_l}(1)} - \frac{\gamma _{n,1}^{l}(1)}{\gamma _{n,2}^{l}(1)} \right) ^2 \right] ^{1/2} \le C(n)\left( \frac{\overline{B}_l(n)^{1/2}}{\sqrt{N_l}} + \frac{|\gamma _{n,1}^{l}(1)-\gamma _{n,2}^{l}(1)|}{\sqrt{N_l}} \right) . \end{aligned}$$

Combining the Minkowski decomposition with the last two displayed bounds, together with the boundedness of the \(\{G_n\}_{n\ge 0}\), allows one to conclude the proof. \(\square \)




Cite this article

Jasra, A., Kamatani, K., Osei, P.P. et al. Multilevel particle filters: normalizing constant estimation. Stat Comput 28, 47–60 (2018).



Keywords

  • Filtering
  • Diffusions
  • Particle filter
  • Multilevel Monte Carlo