An Algorithm for Approximating the Second Moment of the Normalizing Constant Estimate from a Particle Filter

We propose a new algorithm for approximating the non-asymptotic second moment of the marginal likelihood estimate, or normalizing constant, provided by a particle filter. The computational cost of the new method is O(M) per time step, independently of the number of particles N in the particle filter, where M is a parameter controlling the quality of the approximation. This is in contrast to O(MN) for a simple averaging technique using M i.i.d. replicates of a particle filter with N particles. We establish that the approximation delivered by the new algorithm is unbiased, strongly consistent and, under standard regularity conditions, increasing M linearly with time is sufficient to prevent growth of the relative variance of the approximation, whereas for the simple averaging technique it can be necessary to increase M exponentially with time in order to achieve the same effect. This makes the new algorithm useful as part of strategies for estimating Monte Carlo variance. Numerical examples illustrate performance in the context of a stochastic Lotka–Volterra system and a simple AR(1) model.


Introduction
Particle filters, also known as Sequential Monte Carlo (SMC) methods (Doucet et al, 2001), are used across a variety of disciplines including systems biology, econometrics, neuroscience and signal processing, to perform approximate inferential calculations in general state-space Hidden Markov Models (HMMs) and, in particular, to provide an unbiased estimate of the marginal likelihood. Recent application areas of these techniques include, for example, systems biology (Golightly and Wilkinson, 2011; Golightly et al, 2015), where the calculation of the marginal likelihood (ML) plays an important role in the estimation of the parameters of stochastic models of biochemical networks. Estimation of the marginal likelihood also features centrally in Particle Markov Chain Monte Carlo methods (Andrieu et al, 2010).
In the present paper we address the problem of approximating the non-asymptotic second moment of the particle filter estimate of the marginal likelihood, henceforth for brevity "the second moment". As part of strategies to estimate Monte Carlo variance, this allows one to report a numerical measure of the reliability of the particle filter estimate. Our contributions are to introduce a new particle "Pairs algorithm" and prove that it unbiasedly and consistently approximates the second moment. We also establish, under regularity conditions, a linear-in-time bound on the relative variance of the approximation to the second moment, and illustrate through a simple calculation and numerical simulations that the Pairs algorithm performs more reliably than a default strategy which uses independent copies of the particle filter. In order to discuss the connections between our work and the existing literature, we first need to introduce some notation and definitions.
An HMM is a process (X_n, Y_n)_{n≥0}, where (X_n)_{n≥0}, called the signal process, is a Markov chain with state space X, initial distribution π_0 and transition kernel f. Each observation Y_n ∈ Y is conditionally independent of the rest of the signal process given X_n, with conditional distribution g(X_n, ·), where g is a probability kernel from X to Y. The HMM can be represented as:

X_0 ∼ π_0,    X_{n+1} | X_n ∼ f(X_n, ·),    n ≥ 0,
Y_n | X_n ∼ g(X_n, ·),    n ≥ 0.
We consider a fixed observation sequence (y_n)_{n≥0}, assume that g admits a density g(x, y) w.r.t. some dominating measure, and write for brevity g_n(x) = g(x, y_n). For simplicity we also assume throughout that for all n ≥ 0, sup_x g_n(x) < +∞ and g_n(x) > 0 for all x ∈ X. We then define the sequence of distributions (π_n)_{n≥1}, called prediction filters, by

π_{n+1}(A) := ∫_X π_n(dx) g_n(x) f(x, A) / ∫_X π_n(dx) g_n(x),    A ∈ X, n ≥ 0,   (1)

where X denotes the σ-algebra associated with the space X, and the sequence (Z_n)_{n≥0},

Z_n := ∏_{p=0}^{n} π_p(g_p),    n ≥ 0.   (2)

The interpretation of these definitions is the following: π_{n+1} is the distribution of X_{n+1} | Y_{0:n} = y_{0:n}, where for any sequence (a_n)_{n≥0} we write a_{p:q} = (a_p, ..., a_q), and Z_n is the marginal likelihood of the first n + 1 observations y_{0:n}. In many cases of interest, the distributions π_n and constants Z_n cannot be computed exactly, and numerical approximations are needed. A particle filter, shown in Algorithm 1, provides such approximations, denoted respectively π_n^N and Z_n^N. In Algorithm 1, q_0 and q_n, n ≥ 1, are respectively a distribution and Markov kernels on X, which may depend on the observation sequence (y_n)_{n≥0}, but this dependence is suppressed from the notation. We assume throughout the rest of the paper that π_0(·), f(x, ·), q_0(·) and q_n(x, ·) admit densities w.r.t. some common dominating measure dx and, with a slight abuse of notation, the corresponding densities are denoted π_0(x), f(x, x′), q_0(x) and q_n(x, x′).
Algorithm 1 SMC algorithm for estimating Z_n using N particles

Initialization (n = 0): sample X_0^i ∼ q_0 i.i.d. for i = 1, ..., N; compute the weights w_0^i = g_0(X_0^i) π_0(X_0^i) / q_0(X_0^i); set Z_0^N = N^{-1} Σ_{i=1}^N w_0^i; resample using the normalized weights to obtain a set of equally-weighted particles.

For n ≥ 1: sample X_n^i ∼ q_n(X_{n-1}^i, ·) for i = 1, ..., N; compute the weights w_n^i = g_n(X_n^i) f(X_{n-1}^i, X_n^i) / q_n(X_{n-1}^i, X_n^i); set Z_n^N = Z_{n-1}^N · N^{-1} Σ_{i=1}^N w_n^i; resample using the normalized weights to obtain a set of equally-weighted particles.

It is well known that Algorithm 1 provides an unbiased estimate of Z_n, i.e. E[Z_n^N] = Z_n. A detailed account of this fact is given in (Del Moral, 2004, Ch. 9).
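To fix ideas, the structure of Algorithm 1 can be sketched in a few lines of Python. This is a generic illustration of an SMC estimate of Z_n with multinomial resampling, not the authors' code; the interface (the callables sample_q0, logw0, sample_q, logw) is our own convention.

```python
import numpy as np

def particle_filter(sample_q0, logw0, sample_q, logw, n_steps, N, rng):
    """Sketch of Algorithm 1: returns the unbiased estimate Z_n^N.

    sample_q0(N, rng)    -> N initial particles from q_0
    logw0(x)             -> log of g_0(x) pi_0(x) / q_0(x)
    sample_q(n, x, rng)  -> particles propagated through q_n(x, .)
    logw(n, x_prev, x)   -> log of g_n(x) f(x_prev, x) / q_n(x_prev, x)
    """
    x = sample_q0(N, rng)
    lw = logw0(x)
    log_Z = 0.0
    for n in range(n_steps + 1):
        if n > 0:
            x_new = sample_q(n, x, rng)
            lw = logw(n, x, x_new)
            x = x_new
        # Z_n^N accumulates the average unnormalized weight at each step.
        m = lw.max()
        w = np.exp(lw - m)
        log_Z += m + np.log(w.mean())
        # Multinomial resampling with the normalized weights yields a set
        # of equally-weighted particles for the next step.
        x = x[rng.choice(N, size=N, p=w / w.sum())]
    return np.exp(log_Z)
```

A convenient correctness check: in a model with constant likelihood g_n ≡ c, the estimate collapses to c^{n+1} exactly, whatever the proposal.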

The approximation is delivered by Algorithm 2, the Pairs algorithm, which we introduce in the next section, and which must be run in addition to the particle filter used to compute Z_n^N. Our main motivation for approximating E[(Z_n^N)^2] is Theorem 2.1, which concerns the approximation delivered by the Pairs algorithm and shows that, under standard regularity conditions, it is sufficient to increase M linearly with n to control the relative variance of this approximation. This is in contrast to Lee and Whiteley (2015), who do not provide any results concerning the time-dependence of the errors associated with their estimators.
We note that Chan and Lai (2013) investigated numerical techniques for assessing the asymptotic variance associated with particle estimates of expectations with respect to filtering distributions, but they did not explore methods for approximating the non-asymptotic second moment considered here. We also note that Bhadra and Ionides (2014) proposed using a "meta-model" for purposes of optimizing parameters of the particle filter. Their method amounts to fitting an AR(1) process to the output of the particle filter; it seems difficult to assess the bias of their approach and no proof of consistency is given.

Outline of how the algorithm is derived
The full details of the derivation of the Pairs algorithm are given in Appendix A. We now give an account of some of the main ideas behind this derivation, for which some more notation is needed. Let us introduce the nonnegative integral kernels: for x ∈ X and y = (y_1, y_2) ∈ X²,

Q_1(x, dy) = [g_0(x) π_0(x) / q_0(x)] q_1(y_1, y_2) δ_x(dy_1) dy_2,   (3)

and for n ≥ 2, x = (x_1, x_2) ∈ X² and y ∈ X²,

Q_n(x, dy) = [g_{n-1}(x_2) f(x_1, x_2) / q_{n-1}(x_1, x_2)] q_n(y_1, y_2) δ_{x_2}(dy_1) dy_2.   (4)

In terms of compositions of these kernels, the lack-of-bias property of the particle filter reads as (5). The kernels also encapsulate the main ingredients of the particle filter itself; indeed, one may take the point of view that Algorithm 1 is derived from the Q_n, in the sense that resampling is performed according to weights given by evaluating the functions in (6) and sampling is performed using the Markov kernels in (7). Now introduce the so-called coalescence operator C, which acts on functions F : X² × X² → R as C(F)(x, y) = F(x, x). Cérou et al (2011) derived the representation (8) of the second moment of Z_n^N, in which C^1 := C, C^0 := Id, {ε_n}_{n≥0} is a sequence of i.i.d. {0, 1}-valued random variables with distribution P(ε_n = 1) = 1 − P(ε_n = 0) = 1/N, and Q_n^{⊗2} is the two-fold tensor product of Q_n. The main idea behind the Pairs algorithm is to identify, using (8), certain nonnegative kernels Q_n^{(N)} such that the second moment can be written as in (9). The details of these kernels Q_n^{(N)} are given in the Appendix. Observing the similarity with (5), to obtain the Pairs algorithm we derive a particle algorithm from the weighting functions and Markov kernels which are associated with Q_n^{(N)} in the same way as (6)-(7) are associated with Q_n. Results for standard particle filters then transfer to the Pairs algorithm directly, which leads to our Theorem 2.1 below.

The algorithm and its properties
In Algorithm 2 both N ≥ 2 and M ≥ 1 are parameters. The computational cost of Algorithm 2 is O(M) per time step, uniformly in N, and the quantity Ξ_n^{(N,M)} which it delivers can be considered an approximation to E[(Z_n^N)^2], in the sense of Theorem 2.1 below.
Algorithm 2 (Pairs algorithm for approximating E[(Z_n^N)^2]): at each step, M pairs of particles are weighted and then resampled, via conditionally i.i.d. draws according to the normalized weights, to obtain a set of equally-weighted pairs; the weighting functions and Markov kernels used are those associated with the kernels Q_n^{(N)} of Appendix A.

Theorem 2.1. If additionally for each n ≥ 0 there exist constants 0 < w_n^- ≤ w_n^+ < +∞, and for each n ≥ 1 constants 0 < ε_n^- ≤ ε_n^+ < +∞ and a probability measure μ_n, such that

w_0^- ≤ g_0(x) π_0(x) / q_0(x) ≤ w_0^+,    for all x,   (10)

together with the analogous bounds (11) on the weight functions for n ≥ 1 and the condition (12) relating the proposal kernels q_n to the measures μ_n via ε_n^-, ε_n^+, then for any N ≥ 2 and n ≥ 0 the relative variance of Ξ_n^{(N,M)}, as an estimator of E[(Z_n^N)^2], is bounded above by ∆(n + 1)/M, where ∆, a squared ratio of the constants appearing in (10)-(12), is independent of M and N.
The proof of Theorem 2.1 is given in Appendix A. The conditions (10)-(12) are fairly standard in the stability theory of particle filters, but are rather strong: they rarely hold when X is an unbounded subset of R^d. Attempting to establish similar results under more realistic conditions, for example via the techniques of Whiteley (2013), seems to be a much more difficult task, beyond the scope of the present work, and we leave a full investigation of this matter to future research.

Comparison to using i.i.d. replicates
The simple averaging approximation based on i.i.d. replicates Z_n^{N,1}, ..., Z_n^{N,M̃} of Z_n^N is

Ξ̃_n^{(N,M̃)} := M̃^{-1} Σ_{j=1}^{M̃} (Z_n^{N,j})².   (13)

The cost of computing Ξ̃_n^{(N,M̃)} is O(M̃N) per time step, since it involves M̃ copies of Algorithm 1, each using N particles.
To illustrate why Ξ_n^{(N,M)} is to be preferred over Ξ̃_n^{(N,M̃)} in terms of relative variance, consider for simplicity of exposition the case: for n ≥ 1, q_n(x, ·) = f(x, ·) = π_0(·); for n = 0, q_0(·) = π_0(·); and for n ≥ 0, g_n(x) = g(x). In this case, for all n ≥ 0, we have π_n = π_0 and in Algorithm 1 the particles {X_n^i}_{i=1}^N are i.i.d. draws from π_0. Then Z_n^N = ∏_{p=0}^{n} π_p^N(g), where π_p^N(g) := N^{-1} Σ_{i=1}^N g(X_p^i), and the relative variance of Ξ̃_n^{(N,M̃)} involves the constant

C := E[π_0^N(g)²] / (E[π_0^N(g)])² ≥ 1,

where the inequality is Jensen's, with equality holding if and only if π_0^N(g) is a.s. constant. So if π_0^N(g) exhibits any stochastic variability at all, in the sense that C > 1, then M̃ must be scaled exponentially fast with n in order to control (14), cf. the linear-in-n scaling in Theorem 2.1.
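The geometric growth behind this argument is easy to observe numerically. The following sketch (our own construction, not from the paper) uses a likelihood g taking the values 1 and 2 with probability 1/2 on i.i.d. particles, for which C = 1 + 1/(9N) is available in closed form, and checks that the relative second moment of Z_n^N equals C^{n+1}.

```python
import numpy as np

# Simplified setting of the comparison above: at every step the N particles
# are i.i.d., so Z_n^N is a product of n + 1 independent averages pi_p^N(g).
N, n, R = 10, 10, 50000
rng = np.random.default_rng(0)

# g takes the values 1 and 2 with probability 1/2 each, so pi(g) = 3/2,
# Var(g(X)) = 1/4, and hence C = E[pi^N(g)^2] / pi(g)^2 = 1 + 1/(9N).
g_vals = rng.integers(1, 3, size=(R, n + 1, N)).astype(float)
Z = g_vals.mean(axis=2).prod(axis=1)     # R replicates of Z_n^N

C = 1.0 + 0.25 / (N * 1.5**2)
rel_second_moment = (Z**2).mean() / 1.5**(2 * (n + 1))
# rel_second_moment is close to C**(n + 1): geometric growth in n.
```

Since the relative second moment of Z_n^N grows like C^{n+1}, averaging squared i.i.d. replicates needs M̃ growing geometrically in n to keep the relative variance of (13) under control.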

Numerical examples
We will illustrate the properties of the Pairs algorithm using two numerical examples. The first, in Section 3.1, is a simple toy example based on an AR(1) autoregressive process. The second, in Section 3.2, is a more realistic example involving a Lotka-Volterra system, observed in noise. In Section 3.3 we investigate the performance of the Pairs algorithm within a strategy for estimating Monte Carlo variance.
Throughout Section 3 we denote by M′ the number of pairs used in the Pairs algorithm to obtain a reliable benchmark estimate of the true quantity E[(Z_n^N)^2].

AR(1) example
The signal (X_n)_{n≥0} of this model is an AR(1) process, defined by X_{n+1} = αX_n + ε_{n+1}, where we set α = 0.5, ε_n ∼ N(0, σ²), σ = 10. We assume that g_n(x) = exp(−x²/100) for all n. We also take q_n(x, ·) = f(x, ·), i.e. we propose using the actual signal density, and we set q_0 = π_0, given by X_0 ∼ N(0, σ²/(1 − α²)), i.e. the process (X_n)_{n≥0} is stationary a priori.
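A minimal sketch of this AR(1) signal model (illustrative code with our own function names): since the chain is initialized at its stationary law, the marginal variance remains σ²/(1 − α²) at every time step, a useful sanity check for an implementation.

```python
import numpy as np

alpha, sigma = 0.5, 10.0
stat_var = sigma**2 / (1 - alpha**2)   # stationary variance of the signal

def simulate_signal(n_steps, n_paths, rng):
    """Simulate X_n = alpha X_{n-1} + eps_n with X_0 drawn from the
    stationary law N(0, sigma^2 / (1 - alpha^2))."""
    x = rng.normal(0.0, np.sqrt(stat_var), size=n_paths)
    for _ in range(n_steps):
        x = alpha * x + rng.normal(0.0, sigma, size=n_paths)
    return x

def g(x):
    # The potential g_n(x) = exp(-x^2 / 100), the same at every time step.
    return np.exp(-x**2 / 100.0)
```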
In Figure 1 we compare two approaches for estimating E[(Z_n^N)^2]: the Pairs algorithm, and the standard MC approach using i.i.d. replicates as in (13). We consider two sub-examples: the first with a comparatively small number of particles, N = 50, and the second with a higher number of particles, N = 250. The plots show log(Ξ_n^{(N,M)}) − log(Ξ_n^{(N,M′)}) for the Pairs algorithm and log(Ξ̃_n^{(N,M̃)}) − log(Ξ_n^{(N,M′)}) for the standard MC approach (please refer to Algorithm 2 and (13)). Here we take M′ = 10^6, so that Ξ_n^{(N,M′)} serves as a reliable benchmark. In the top left plot of Figure 1 we have chosen M = M̃ = 10^4. For the equal-cost plot on the top right we have chosen M = 10^4 and M̃ = 2500. Here, by "equal cost" we mean that M and M̃ are chosen such that the execution times of the standard MC algorithm and the Pairs algorithm are the same. The time parameter n varies from 0 to 500 in both plots, and we plot 20 independent runs of both algorithms in order to compare their variability properties.
The second row of plots in Figure 1 concerns the case of a larger number of particles, N = 250. Again, the bottom left compares the case M = M̃ = 10^4, and the bottom right the equal-cost case where M = 10^4 and M̃ = 700. The fact that M̃ is lower here than in the N = 50 case reflects the fact that the cost of the standard MC approach is O(M̃N) per time step, compared to O(M) for the Pairs algorithm. We have plotted 20 independent runs for both algorithms.
Figure 2 shows boxplots based on 100 independent runs of both algorithms, for the case of equal M = M̃ = 10^4 and for the equal-cost case, with N = 50. It is apparent that the estimates of E[(Z_n^N)^2] obtained using the Pairs algorithm have much less variability than those produced using the standard Monte Carlo approach with i.i.d. replicates (especially for large values of the time parameter n).

Lotka-Volterra example

In this section we illustrate the numerical performance of the Pairs algorithm in the context of a partially observed Langevin approximation to a Lotka-Volterra system (Golightly and Wilkinson, 2011). The signal process in the HMM is obtained from a discretization of the stochastic differential equation (SDE)

dX_t = α(X_t, c) dt + √β(X_t, c) dW_t,

where X_t = (X_{1,t}, X_{2,t}) and W_t = (W_{1,t}, W_{2,t}). Here each component of W_t is an independent standard Brownian motion, c = (c_1, c_2, c_3) are parameters, and α(x, c) and β(x, c) are the drift and diffusion coefficients, given for the Lotka-Volterra system by

α(x, c) = (c_1 x_1 − c_2 x_1 x_2, c_2 x_1 x_2 − c_3 x_2)^T,
β(x, c) = [ c_1 x_1 + c_2 x_1 x_2, −c_2 x_1 x_2 ; −c_2 x_1 x_2, c_2 x_1 x_2 + c_3 x_2 ].

We consider an Euler discretization of the SDE with time resolution ∆t = 1/m for some m ≥ 1, the resulting process satisfying

X_{n+(j+1)∆t} − X_{n+j∆t} = α(X_{n+j∆t}, c) ∆t + √(β(X_{n+j∆t}, c) ∆t) χ_j   (15)

for n ∈ N and j ∈ {0, 1, ..., m − 1}, where (χ_j) is a sequence of independent N(0, 1) random variables. The signal process in the HMM, denoted (X_n)_{n≥0}, consists of an R²-valued random variable X_0 = (100, 100) and, for n ≥ 0, R^{2m}-valued random variables X_{n+1} = (X_{n+∆t}, X_{n+2∆t}, ..., X_{n+1}). The model for the observations is Y_n = X_n + ε_n, where ε_n ∼ N(0, Σ_{2×2}), Σ_{2×2} = σ² I_{2×2}, with I_{2×2} the 2 × 2 identity matrix. We assume that the process is observed at integer times n. Following Golightly and Wilkinson (2011), we consider two values of the observation noise variance, σ² = 10 and σ² = 200. We fix the rate constants c = (c_1, c_2, c_3) = (0.5, 0.0025, 0.3), and use m = 1 for the discretization parameter.
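The Euler step (15) can be sketched as follows. The drift and diffusion used here are the standard chemical-Langevin expressions for the Lotka-Volterra reactions (prey birth at rate c_1 x_1, predator-prey interaction at rate c_2 x_1 x_2, predator death at rate c_3 x_2); treat these coefficient formulas as an assumption of this sketch.

```python
import numpy as np

c1, c2, c3 = 0.5, 0.0025, 0.3   # rate constants c = (c_1, c_2, c_3)

def hazards(x):
    # Reaction rates: prey birth, interaction, predator death.
    return c1 * x[0], c2 * x[0] * x[1], c3 * x[1]

def drift(x):
    h1, h2, h3 = hazards(x)
    return np.array([h1 - h2, h2 - h3])        # alpha(x, c)

def diffusion(x):
    h1, h2, h3 = hazards(x)
    return np.array([[h1 + h2, -h2],
                     [-h2, h2 + h3]])          # beta(x, c)

def euler_step(x, dt, rng):
    # One step of (15): X' = X + alpha dt + sqrt(beta dt) chi, chi ~ N(0, I).
    chol = np.linalg.cholesky(diffusion(x) * dt)
    return x + drift(x) * dt + chol @ rng.standard_normal(2)
```

The Cholesky factor plays the role of the matrix square root √(β∆t), so the increment has the covariance β(x, c)∆t required by (15).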
We adopt the approach to constructing the proposal kernels (q_n)_{n≥1} suggested in Golightly and Wilkinson (2011, Section 4.3), in which q_n(x_n, x_{n+1}) is chosen to be a tractable Gaussian approximation to the conditional density of x_{n+1} given x_n and y_{n+1}. The proposal kernel is given by

q_{n+1}(x_n, x_{n+1}) = ∏_{j=0}^{m−1} ψ_{n+(j+1)∆t}(x_{n+j∆t}, x_{n+(j+1)∆t}),

where ψ_{n+(j+1)∆t}(x_{n+j∆t}, ·) = N(·; x_{n+j∆t} + a_j ∆t, b_j ∆t), with

a_j = α_j + β_j (β_j ∆_j + Σ)^{−1} (y_{n+1} − (x_{n+j∆t} + α_j ∆_j)),
b_j = β_j − β_j (β_j ∆_j + Σ)^{−1} β_j ∆t,
∆_j = 1 − j∆t,  α_j = α(x_{n+j∆t}, c),  β_j = β(x_{n+j∆t}, c).

We consider the process (X_n, Y_n)_{n≥0} as an HMM, to which the particle algorithms are applied.
We first obtain a reliable benchmark value of E[(Z_n^N)^2], denoted Ξ_n^{(N,M′)}, using a single run of the Pairs algorithm with M′ = 10^6. In Figure 3 we compare Ξ_n^{(N,M)} from the Pairs algorithm with the simple Monte Carlo approximation Ξ̃_n^{(N,M̃)} based on i.i.d. replicates, defined in (13), for two different values of the observation noise, σ² = 10 and σ² = 200. In both cases we again plot deviations from the benchmark on the log scale, for the Pairs algorithm and for the standard MC approach.
On the top left of Figure 3 we have the low-noise example. In this example we set N = 100, M = 10^4 and M̃ = 300. On the top right we present the large-noise case, where we set N = 100, M = 10^5 and M̃ = 3000 in order to equalize the computational cost. Again, as in the previous example, we have plotted 20 independent runs for both algorithms.
In both plots, and especially for large values of the time parameter n, the estimates obtained with the help of the Pairs algorithm have much less variability than the estimates calculated using standard Monte Carlo with i.i.d. replicates. We can clearly see that, as the time parameter n increases, the variability of the estimates of E[(Z_n^N)^2] obtained using the Pairs algorithm grows at a far slower rate than for the standard Monte Carlo approach. The observations about the variability of the estimates in Figure 3 are also supported by the corresponding boxplots, based on 100 independent runs of the two algorithms.

(Table 1 reports values for the low-noise and large-noise cases, for different values of the time parameter n, for the Lotka-Volterra example.) We see that, although the scale of the values in Table 1 is small, we still have, by Jensen's inequality, that E[(Z_n^N)^2] ≥ Z_n².

Estimating Monte Carlo variance
The purpose of this example is to show that the benefits of approximating E[(Z_n^N)^2] using the Pairs algorithm carry over to its use within a strategy for both estimating Z_n and reporting Monte Carlo variance. As a benchmark for comparison, we consider the following standard approach based on i.i.d. replicates of a particle filter.
MC strategy. Run M̃ independent particle filters, each with Ñ particles, to give i.i.d. replicates Z_n^{Ñ,1}, ..., Z_n^{Ñ,M̃}; report their sample mean as an estimate of Z_n and their sample variance as an estimate of its Monte Carlo variance. The cost of this strategy is O(ÑM̃), and the variance estimate it delivers is a standard sample variance, thus unbiased.

There are various ways that the MC strategy could be changed or augmented by using the Pairs algorithm. We consider the following:

Pairs strategy. Run M̃ independent particle filters, each with N particles, to give Z_n^{N,1}, ..., Z_n^{N,M̃}. Additionally run one instance of the Pairs algorithm with parameters (M, N), to give Ξ_n^{(N,M)}. Then report the sample mean of the Z_n^{N,j} as an estimate of Z_n, and

Ξ_n^{(N,M)} − [M̃(M̃ − 1)]^{-1} Σ_{j≠k} Z_n^{N,j} Z_n^{N,k}

as an estimate of Var[Z_n^N]. The cost of this strategy is O(M̃N + M). So if, for instance, N = Ñ and M = M̃, the additional cost of the Pairs strategy beyond that of the MC strategy becomes negligible as N grows.
To see that the variance estimate delivered by the Pairs strategy is unbiased, note that E[Ξ_n^{(N,M)}] = E[(Z_n^N)^2], while for j ≠ k the replicates Z_n^{N,j} and Z_n^{N,k} are independent and unbiased, so that E[Z_n^{N,j} Z_n^{N,k}] = Z_n²; the expectation of the reported quantity is therefore E[(Z_n^N)^2] − Z_n² = Var[Z_n^N]. For visual comparison we normalize estimates of Z_n by Z_n^{N′}, where Z_n^{N′} is a reliable benchmark estimate of Z_n obtained from a particle filter with N′ = 10^6 particles. We make comparisons with N = Ñ = 50 and M = M̃ = 10^4; with these settings, in our implementation, the additional cost of the Pairs strategy beyond that of the MC strategy was found to be insignificant, and very similar results were obtained when the costs of the two strategies were exactly equalized.
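This unbiasedness argument can be checked in miniature: if Ξ is any unbiased estimate of E[Z²] and Z^1, ..., Z^M̃ are i.i.d. unbiased estimates of Z, then the average of the pairwise products Z^j Z^k, j ≠ k, is unbiased for Z², so the difference is unbiased for Var(Z). The stand-in distributions below are our own illustrative choice, not the particle filter output.

```python
import numpy as np

rng = np.random.default_rng(0)
M_tilde, reps = 20, 200000

# Stand-in for the particle filter output: Z^N ~ N(10, 4), so that
# E[(Z^N)^2] = 104 and Var[Z^N] = 4 (an illustrative choice).
z = rng.normal(10.0, 2.0, size=(reps, M_tilde))   # i.i.d. replicates
xi = rng.normal(10.0, 2.0, size=reps) ** 2        # unbiased for E[(Z^N)^2]

# Average of the pairwise products Z^j Z^k over j != k; each product of
# two independent unbiased estimates is unbiased for Z^2.
s, s2 = z.sum(axis=1), (z**2).sum(axis=1)
pair_prod = (s**2 - s2) / (M_tilde * (M_tilde - 1))

var_est = xi - pair_prod   # unbiased for Var[Z^N] = E[(Z^N)^2] - Z^2
```

Note the algebraic shortcut: Σ_{j≠k} Z^j Z^k = (Σ_j Z^j)² − Σ_j (Z^j)², which avoids the O(M̃²) double sum.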
In Figure 4 we compare the MC and Pairs strategies. The top left shows boxplots of the estimates of Z_n obtained from 1000 independent realizations of the two strategies, for different values of n. The top right shows boxplots for the variance estimates, also from 1000 realizations. We can clearly see that for increasing n the estimates for the MC strategy exhibit larger variability than the estimates obtained from the Pairs strategy. On the bottom two plots of Figure 4 we compare kernel density estimates of the distribution of the variance estimates for n = 500: on the bottom left the estimated density is plotted, and on the bottom right the log of the density, highlighting the heavier tails of the distribution for the MC strategy. The kernel density estimates in both plots were produced using a normal kernel function with bandwidths 0.06 (Pairs strategy) and 0.9 (MC strategy). The density estimates indicate a more concentrated distribution for the Pairs strategy (thick black line) than for the MC strategy (grey line).

Appendix A: Auxiliary definitions, results and proof of Theorem 2.1
This appendix is structured as follows.After introducing notation in A.1, A.2 introduces a generic particle system, of which we show Algorithm 1 to be a special case.The account of this generic particle system and some of its properties is needed in order to derive an associated pairs particle system in A.3, of which we show Algorithm 2 to be a special case.The proof of Theorem 2.1, in A.4, rests on the key observation that the pairs particle system is also an instance of the generic particle system of A.2, allowing properties of the latter to be transferred to the Pairs algorithm.

A.1 Notation and conventions
For a measurable space (E, E), denote by B_b(E) the set of all R-valued, measurable and bounded functions on E, and by M(E) and P(E) the sets of, respectively, measures and probability measures on E. For μ ∈ M(E) and φ ∈ B_b(E) we write μ(φ) := ∫_E φ(x) μ(dx). For a nonnegative integral kernel L : E × E → [0, ∞), φ ∈ B_b(E) and μ ∈ M(E), we write L(φ)(x) := ∫_E L(x, dy) φ(y) and (μL)(·) := ∫_E μ(dx) L(x, ·), and for two such kernels, L and M, we write their composition as (LM)(x, ·) := ∫_E L(x, dx′) M(x′, ·). We write two-fold tensor product measures and functions as, respectively, μ^{⊗2} ∈ M(E²) and φ ⊗ φ ∈ B_b(E × E). For φ ∈ B_b(E × E) we write the tensor product integral operator L^{⊗2}(φ)(x, x′) := ∫_{E×E} L(x, dy) L(x′, dy′) φ(y, y′). We also introduce a measurable space (E_0, E_0) and use exactly similar notation when dealing with functions, measures and kernels on (E_0, E_0), and kernels between (E_0, E_0) and (E, E).
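On a finite state space the notation above reduces to linear algebra: a measure is a row vector, a function a column vector, and a kernel a matrix, so that μ(φ), L(φ), μL and LM are all matrix products. A small illustrative sketch (not from the paper):

```python
import numpy as np

# Two-point space E = {0, 1}: measures are row vectors, functions are
# column vectors, kernels are matrices with L[x, y] = L(x, {y}).
mu = np.array([0.3, 0.7])              # a probability measure on E
phi = np.array([2.0, 5.0])             # a bounded function on E
L = np.array([[0.5, 0.5],
              [0.2, 0.8]])             # a Markov kernel on E

mu_phi = mu @ phi      # mu(phi) = integral of phi against mu
L_phi = L @ phi        # L(phi)(x) = sum_y L(x, y) phi(y), a function
mu_L = mu @ L          # (mu L)(.) = sum_x mu(x) L(x, .), a measure
LM = L @ L             # kernel composition (L L)(x, .)
```

In particular (μL)(φ) = μ(L(φ)) is just associativity of the matrix product, and a two-fold tensor product such as L^{⊗2} corresponds to the Kronecker product of the matrix with itself.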

A.2 A generic particle system
For each n ≥ 2 let Q̄_n : E × E → (0, ∞) be an integral kernel such that for each x ∈ E, Q̄_n(x, ·) is a finite measure on (E, E). Then introduce

M̄_n : (x, A) ∈ E × E ↦ Q̄_n(x, A) / Q̄_n(x, E) ∈ [0, 1],    Ḡ_{n−1} : x ∈ E ↦ Q̄_n(x, E) ∈ (0, ∞),

which are, respectively, a Markov kernel and a measurable, bounded, strictly positive function.
With these objects, and for some fixed N ≥ 1, we associate a particle process (ζ_n)_{n≥0} as follows. The initial configuration ζ_0 = {ζ_0^i}_{i=1}^N consists of i.i.d. draws from η_0, and the evolution of ζ_n = {ζ_n^i}_{i=1}^N is described by the law

P(ζ_n ∈ dζ_n | ζ_0, ..., ζ_{n−1}) := ∏_{i=1}^N [ Σ_{j=1}^N Ḡ_{n−1}(ζ_{n−1}^j) M̄_n(ζ_{n−1}^j, dζ_n^i) / Σ_{j=1}^N Ḡ_{n−1}(ζ_{n−1}^j) ],   (18)

where dζ_n is to be understood as an infinitesimal neighborhood of the point (ζ_n^1, ..., ζ_n^N). Let us define the empirical measures η_n^N := N^{-1} Σ_{i=1}^N δ_{ζ_n^i} and γ_n^N := [∏_{p=0}^{n−1} η_p^N(Ḡ_p)] η_n^N.

Algorithm 1 as an instance of the generic particle system. Let (X, X), π_0, f, g be the ingredients of the HMM as in Section 1. To obtain Algorithm 1 as an instance of the generic particle system under the law (18), take E_0 = X, E_0 = X, and E = X², E = X^{⊗2}. Then for points x = (x_1, x_2) ∈ E and y = (y_1, y_2) ∈ E, take

M̄_n(x, dy) = δ_{x_2}(dy_1) q_n(y_1, y_2) dy_2,    Ḡ_{n−1}(x) = g_{n−1}(x_2) f(x_1, x_2) / q_{n−1}(x_1, x_2),    n ≥ 2,   (21)

and for x ∈ E_0, y = (y_1, y_2) ∈ E, take

M̄_1(x, dy) = δ_x(dy_1) q_1(y_1, y_2) dy_2,    Ḡ_0(x) = g_0(x) π_0(x) / q_0(x),    η_0 = π_0.   (22)
Observe then that, with Z_n as in (2) and Z_n^N as in Algorithm 1, these quantities coincide with the total masses γ_{n+1}(1) and γ_{n+1}^N(1) associated with this choice of ingredients.

Properties of the generic particle system. We now give a brief account of certain key properties of the particle system introduced above, which we shall later put to use in analyzing the Pairs algorithm.
Remark A.1. It is known that, for each n ≥ 0 and any φ ∈ B_b(E), γ_n^N(φ) is an unbiased and convergent approximation of γ_n(φ); see e.g. (Del Moral, 2004, Theorem 7.4.2). Moreover, as discussed in (Del Moral, 2004, Section 9.4.1), Cérou et al (2011) have obtained second moment formulae for γ_n^N(1) via a study of the tensor product empirical measures (γ_n^N)^{⊗2}. Introducing the coalescence operator C, which acts on bounded measurable functions F as C(F)(x, y) = F(x, x), we have the representation (27) and, in particular, the case F = 1 ⊗ 1, where C^1 := C, C^0 := Id and {ε_n}_{n≥0} is a sequence of i.i.d. {0, 1}-valued random variables with distribution P(ε_n = 1) = 1 − P(ε_n = 0) = 1/N.

A.3 The pairs particle system
In order to derive the Pairs algorithm, our first step is to obtain, in Proposition A.3 below, an alternative representation of the formula on the right of (27). Define for each n ≥ 1 the kernels Q_n^{(N)}, using the previously defined coalescence operator C, as in (28)-(30).

Proposition A.3. For any n ≥ 1 and N ≥ 2, the representations (31) and (32) hold, where F_N := N^{−1} C(F) + (1 − 1/N) F. The case of general F establishes (31); for (32), note that C(1 ⊗ 1) = 1 ⊗ 1 and (γ_n^N)^{⊗2}(1 ⊗ 1) = γ_n^N(1)².

A.4 Proof of Theorem 2.1
To conclude the paper, we gather together various facts from the preceding sections of the appendix and complete the proof of Theorem 2.1.
Proof of Theorem 2.1. Unless stated otherwise, throughout the proof N ≥ 2 is fixed to an arbitrary value. Comparing (37) with (18), we see that the pairs particle system described in Section A.3 is itself an instance of the generic particle system described in Section A.2: in place of E_0, η_0, E, Ḡ_n, M̄_n, etc. in the latter, take E_0², η_0^{⊗2}, E², G_n^{(N)}, M_n^{(N)}, etc. This observation allows us to transfer the various properties described in Section A.2 over to the pairs particle system, as follows.

In a recent arXiv manuscript (Lee and Whiteley, 2015), A. Lee and the second author of the present paper introduced a method which allows one to unbiasedly approximate Var[Z_n^N] from a single run of the particle filter which delivers Z_n^N. As N → ∞, the method of Lee and Whiteley (2015) allows one to consistently approximate the asymptotic variance lim_{N→∞} N Var[Z_n^N / Z_n]. By contrast, the Pairs algorithm performs the different task of approximating, for any fixed N ≥ 2, the non-asymptotic quantity E[(Z_n^N)^2], with an accuracy controlled by an auxiliary parameter M (this statement is made precise in Theorem 2.1 below). Thus the Pairs algorithm allows one to reliably approximate this quantity regardless of whether N is large. We shall later illustrate how this property makes the Pairs algorithm useful within strategies for estimating Var[Z_n^N]. In Theorem 2.1 we prove an important result regarding the time dependence of the error of the approximation of E[(Z_n^N)^2].

Fig. 1 AR(1) example: comparison of the estimates of E[(Z_n^N)^2].
Fig. 2 AR(1) example: boxplots of the estimates.
Fig. 3 Lotka-Volterra example: comparison of the estimates of E[(Z_n^N)^2]. In order to achieve a better visual representation, normalized estimates are plotted.

Fig. 4 AR(1) example: comparison of Pairs and MC strategies for Ñ = N = 50 particles and M = M̃ = 10^4. Top left: estimates of Z_n normalized by Z_n^{N′}, for both strategies. Top right: estimates of the relative variance (y-axis on a log scale). Bottom left (right): kernel density estimates of the pdf (log-pdf) of the relative variance at n = 500 (x-axis on a log scale).

Proposition A.1. (Cérou et al, 2011, Lemma 3.2) For any F ∈ B_b(E × E), the representation (27) holds.

Proof. Starting from the identity of Proposition A.1, namely equation (27), the lack-of-bias property (26), combined with (36) and (32), yields the unbiasedness claim of Theorem 2.1. The moment estimate of Section A.2 reads: if for each p ≥ 0 there exists a finite constant c_p such that the corresponding boundedness condition holds, then the associated error bound holds. Now consider (43) for some given p. When n ≤ p + 1 the required bound is immediate; for n ≥ p + 2, suppose there exist constants 0 < k_p^- ≤ k_p^+ < +∞, independent of N, bounding the relevant quantities, and the claimed linear-in-n bound follows by iterating over the remaining steps.

References

Lee A, Whiteley N (2015) Variance estimation and allocation in the particle filter. arXiv preprint arXiv:1509.00394
Whiteley N (2013) Stability properties of some particle filters. The Annals of Applied Probability 23(6):2500-2537

Table 1 Estimates of Z_n and E[(Z_n^N)^2] for the Lotka-Volterra example.

Proposition A.2. (Cérou et al, 2011, Corollary 1.5) If for each p ≥ 0 there exists a finite constant c_p such that the corresponding boundedness condition holds, then the associated moment estimate holds, with y = (y, ŷ) ∈ E², x = (x, x̂) ∈ E² when n ≥ 2 and x = (x, x̂) ∈ E_0² when n = 1. Similarly to Q_{p,n}, we write, for p < n, Q_{p,n}^{(N)} for the composition of the kernels Q_{p+1}^{(N)}, ..., Q_n^{(N)}.