Adaptive online variance estimation in particle filters: the ALVar estimator

We present a new approach, the ALVar estimator, to the estimation of the asymptotic variance in sequential Monte Carlo methods, or particle filters. The method, which adaptively adjusts the lag of the estimator proposed in Olsson and Douc (Bernoulli 25(2):1504–1535), applies to very general distribution flows and particle filters, including auxiliary particle filters with adaptive resampling. The algorithm operates entirely online, in the sense that it is able to monitor the variance of the particle filter in real time with, on average, constant computational complexity and memory requirements per iteration. Crucially, it does not require the calibration of any algorithmic parameter. Since it estimates the variance solely on the basis of the genealogy of the propagated particle cloud, without additional simulations, the routine requires only minor code additions to the underlying particle algorithm. Finally, we prove that the ALVar estimator is consistent for the true asymptotic variance as the number of particles tends to infinity and illustrate numerically its superiority to existing approaches.


Introduction
In this paper we present an adaptive online algorithm for estimating the asymptotic variance in particle filters, or sequential Monte Carlo (SMC) methods. SMC methods approximate a given sequence of distributions by recursively propagating a sample of random simulations, so-called particles, with associated importance weights. Applications include finance, signal processing, robotics, biology, and several more; see, e.g., [10,4]. This methodology, first introduced by Gordon, Salmond and Smith [12] in the form of the bootstrap particle filter, revolves around two operations: a selection step, which resamples the particles in proportion to their importance weights, and a mutation step, which randomly propagates the particles in the state space.
Since the bootstrap particle filter was introduced, several theoretical results describing the convergence of SMC methods as the number of particles tends to infinity have been established; see, e.g., [1,6,7]. A contribution of vital importance was made by Del Moral and Guionnet [8], who established, under general assumptions, a central limit theorem (CLT) for standard SMC methods, a result that was later refined in, among others, [3,16,9]. CLTs are generally essential in Monte Carlo simulation, as they allow the accuracy of the produced estimates to be assessed in terms of confidence bounds. However, in the case of particle filters, the asymptotic variance of the weak, Gaussian limit is generally intractable due to the recursive nature of these algorithms. Thus, estimating the variance of SMC methods is a very challenging task, and although the literature on SMC is vast, only very few works are dedicated to this specific problem. Until just a couple of years ago, the only available way to estimate the particle-filter variance was to take a naive (and computationally very demanding) approach consisting of calculating the sample variance across independent replicates of the particle filter; see [5] for a similar procedure in the context of parallelisation of SMC methods. An important step towards online variance estimation in particle filters was taken by Chan and Lai [2], who developed an asymptotic-variance estimator which can be computed sequentially on the basis of a single realisation of the particle filter and without significant additional computational effort. In the same work, the estimator, which we will refer to as the Chan and Lai estimator (CLE), was shown to be consistent as the number of particles tends to infinity. The CLE was later refined and analysed further in [17,11].
In a particle filter, the repeated resampling operations induce genealogical relations between the particles, allowing the estimator (the weighted empirical measure formed by the particles) to be split into terms corresponding to particle subpopulations obtained by stratifying the particle sample by the time-zero ancestors. At each iteration, the CLE is, simply put, given by the sample variance of these contributions with respect to the average of the full population. However, as time increases, the set of distinct time-zero ancestors depletes gradually, and eventually all the particles share one and the same time-zero ancestor. This particle-path degeneracy phenomenon makes the CLE collapse to zero in the long run. In order to remedy this issue and to push the technology towards truly online settings, Olsson and Douc [20] devised a lag-based, numerically stable strategy in which the particle sample at time n is stratified by the ancestors at some more recent time (n − λ) ∨ 0, where λ ∈ N is a fixed lag parameter. Such a procedure, which can still be implemented in an online fashion, completely avoids the issue of particle-path degeneracy at the cost of a bias induced by the lag. Still, under mild assumptions, satisfied also by models with a non-compact state space, the authors managed to bound this bias uniformly in time by a quantity that decays geometrically with λ. The simulation study presented in the same work confirms the long-term stability of the produced estimates, which stay, when the lag is well chosen, very close to the ones produced by the naive estimator for arbitrarily long periods of time. However, designing the lag parameter λ is highly non-trivial, as the optimal choice depends on the ergodicity properties of the model; indeed, the user faces a delicate bias-variance tradeoff in the sense that using a too small lag results in a numerically stable but significantly biased estimator, while using a too large one eliminates the bias at the cost of the high variance implied by the same degeneracy issue as that of the CLE.
In this paper we develop further the lag-based approach of Olsson and Douc [20] and propose an estimator that is capable of adapting the size of the lag automatically, by monitoring the degree of depletion of the ancestor sets, as the particles evolve. Like the fixed-lag method in [20], our adaptive-lag variance (ALVar) estimator operates online with time-constant memory requirements, but does not require the calibration of any algorithmic parameter. Moreover, since it estimates the variance only on the basis of the genealogy of the propagated particle cloud, without additional simulations, the routine requires only minor code additions to the underlying particle algorithm and has a computational complexity that is linear in the number of particles, fully comparable to the particle filter itself. These appealing complexity properties are crucial in practical applications. As a comparison, the online approach to variance estimation in SMC methods recently proposed in [14], relying on backward-sampling techniques, has, at best, a quadratic complexity in the number of particles, which is impractical for large particle sample sizes. Unlike previous works on variance estimation in SMC, which focus on the standard bootstrap particle filter operating on Feynman-Kac models [6], our estimator applies to more general auxiliary particle filters (APF) [21] and classes of models. In this setting, we show that the ALVar estimator is consistent as the number of particles tends to infinity. Moreover, we claim and illustrate numerically that the values of the lag chosen adaptively by the algorithm stay stable over time and increase, on average, only logarithmically with the number of particles; the latter property is fundamental to avoid an excessive demand of computational resources in applications. Furthermore, we extend our estimator to particle filters with adaptive resampling, in which the selection operation is performed only when triggered by some criterion monitoring the particle-weight degeneracy, yielding the first SMC variance estimator in that context.
The rest of the paper is structured as follows. In Section 2 we introduce some notation, our general model framework, SMC methods, and variance estimation in particle filters; in addition, we show that all the results obtained in the framework of Feynman-Kac models and the bootstrap particle filter can be extended to our framework and the APF. In Section 3 we present the ALVar estimator, prove its consistency, and provide an extension to particle filters with adaptive resampling. Section 4 provides numerical simulations illustrating the algorithm on some classic state-space models. Finally, Sections A-B provide some of the proofs of the results stated in Sections 2-3.

Notation
We denote by N the set of nonnegative integers and let N* := N \ {0}. For every (m, n) ∈ N² such that m ≤ n, we denote ⟦m, n⟧ := {k ∈ N : m ≤ k ≤ n}. Moreover, we let R₊ and R₊* be the sets of nonnegative and positive real numbers, respectively, and denote vectors by x_{m:n} := (x_m, x_{m+1}, …, x_{n−1}, x_n). For a finite set (p_i)_{i=1}^N, N ∈ N*, of nonnegative numbers, we denote by Cat((p_i)_{i=1}^N) the categorical distribution with sample space ⟦1, N⟧ and probability function ⟦1, N⟧ ∋ i ↦ p_i / Σ_{ℓ=1}^N p_ℓ. For some general state space (E, 𝓔) we let M(E) and F(E) be the sets of probability measures on E and bounded measurable functions on (E, 𝓔), respectively. For any µ ∈ M(E) and h ∈ F(E) we denote by µh := ∫ h(x) µ(dx) the Lebesgue integral of h with respect to µ.
The following kernel notation will be used frequently. Let (E′, 𝓔′) be another measurable space; then a (possibly unnormalised) transition kernel K : E × 𝓔′ → R₊ induces the following operations. For any h ∈ F(E′) and µ ∈ M(E) we may define the measurable function Kh : E ∋ x ↦ ∫ h(x′) K(x, dx′) as well as the measure µK : 𝓔′ ∋ A ↦ ∫ K(x, A) µ(dx). Now, let (E″, 𝓔″) be a third measurable state space and L a possibly unnormalised transition kernel on E′ × 𝓔″; then, similarly to the operations between measures and kernels, we may define the product kernel KL : E × 𝓔″ ∋ (x, A) ↦ ∫ K(x, dx′) L(x′, A).

Model setup
In order to define the distribution-flow model under consideration, let (X_n, 𝒳_n)_{n∈N} be a sequence of measurable state spaces. We introduce unnormalised transition kernels L_n : X_n × 𝒳_{n+1} → R₊, n ∈ N, and write L_m ⋯ L_{n−1} for their products, the empty product (m = n) being the identity by convention. In addition, we let χ be some measure on X_0. Using these quantities we may define a flow φ_n ∈ M₁(X_n), n ∈ N, of probability distributions by letting, for every n ∈ N,

φ_n := χL_0 ⋯ L_{n−1} / χL_0 ⋯ L_{n−1} 𝟙_{X_n}.   (2.1)

Example 1 (Feynman-Kac models). Feynman-Kac models are applied in a variety of scientific fields such as statistics, physics, biology, and signal processing; see [6] for a broad coverage of the topic. For every n ∈ N, let M_n : X_n × 𝒳_{n+1} → [0, 1] be a Markov transition kernel, g_n : X_n → R₊ a measurable potential function, and ν a probability measure on (X_0, 𝒳_0). Then the associated Feynman-Kac model is obtained as a special case of (2.1) by expressing the kernels L_n in terms of M_n and g_n and the initial measure χ in terms of ν.

Example 2 (Partially dominated state-space models). General state-space models (SSMs) constitute an important modelling tool in a diversity of scientific and engineering disciplines; see, e.g., [1] and the references therein. An SSM consists of a bivariate Markov chain (X_n, Y_n)_{n∈N} evolving on some measurable product space (X × Y, 𝒳 ⊗ 𝒴) according to a Markov transition kernel of the form ((x, y), d(x′, y′)) ↦ M(x, dx′) G(x′, dy′), where M is a Markov kernel on X × 𝒳 and G a Markov kernel from (X, 𝒳) to (Y, 𝒴). The chain is initialised according to ν ⊗ G, where ν is some probability measure on (X, 𝒳). In this setting, only the process (Y_n)_{n∈N} is observed, whereas the process (X_n)_{n∈N}, referred to as the state process, is unobserved and hence called hidden. In this construction, it can be shown (see [1, Section 2.2] for details), first, that the state process is itself a Markov chain with transition kernel M and, second, that the observations (Y_n)_{n∈N} are conditionally independent given (X_n)_{n∈N}, with marginal emission distributions Y_n ∼ G(X_n, •), n ∈ N. We assume that the model is partially dominated, i.e., that the kernel G admits a transition density g : X × Y → R₊ with respect to some reference measure µ.
Many practical applications of SSMs call for the computation of flows of hidden-state posteriors given a sequence (y_n)_{n∈N} of observations. In particular, the flow (φ_n)_{n∈N} of filter distributions, each filter φ_n being the conditional distribution of the state X_n at time n given Y_{0:n} = y_{0:n}, can be expressed as a Feynman-Kac model with (X_n, 𝒳_n) = (X, 𝒳), M_n = M, and g_n(x) := g(x, y_n) for all n ∈ N; see [1, Section 3.1] for details. Inspired by this terminology, we will sometimes refer to each distribution φ_n in the general flow defined by (2.1) as the filter at time n.

Sequential Monte Carlo methods
In the following we assume that all random variables are defined on a common probability space (Ω, F, P). As mentioned in the introduction, we may approximate recursively the distribution sequence (φ_n)_{n∈N} by propagating a random sample (ξ^i_n, ω^i_n)_{i=1}^N of particles and associated weights, where N ∈ N* is the Monte Carlo sample size. More precisely, at each time step, the filter distribution φ_n is approximated by the weighted empirical measure

φ^N_n := Σ_{i=1}^N (ω^i_n / Ω_n) δ_{ξ^i_n},

where Ω_n := Σ_{i=1}^N ω^i_n and δ_{ξ^i_n} is the Dirac measure located at ξ^i_n. The APF propagates the sample (ξ^i_n, ω^i_n)_{i=1}^N recursively as follows. The algorithm is initialised by standard importance sampling, drawing (ξ^i_0)_{i=1}^N from ν^{⊗N}, where ν ∈ M₁(X_0) is some proposal distribution dominating χ, and letting ω^i_0 ← γ_{−1}(ξ^i_0) for each i, where γ_{−1} is the Radon-Nikodym derivative of χ with respect to ν. The auxiliary functions (ϑ_n)_{n∈N}, where ϑ_n ∈ F(X_n), are introduced in order to favour the resampling of particles that are more likely to be propagated into regions of high likelihood (as measured by the target distributions). The particles are propagated according to some proposal Markov transition kernels P_n, n ∈ N. These kernels are such that, for each n ∈ N and x_n ∈ X_n, the probability measure P_n(x_n, •) is absolutely continuous with respect to the measure L_n(x_n, •). Hence, given x_n, there is a Radon-Nikodym derivative γ_n(x_n, •) such that for every x_n ∈ X_n and h ∈ F(X_{n+1}),

L_n h(x_n) = ∫ h(x_{n+1}) γ_n(x_n, x_{n+1}) P_n(x_n, dx_{n+1}).

Algorithm 1 shows one iteration of the APF, which we will express in the following as (ξ^i_{n+1}, ω^i_{n+1}, I^i_{n+1})_{i=1}^N ← APF((ξ^i_n, ω^i_n)_{i=1}^N), where also the resampled indices are included in the output for reasons that will become clear later.

Algorithm 1 Auxiliary particle filter
1: for i = 1 → N do
2:   draw I^i_{n+1} ∼ Cat((ω^ℓ_n ϑ_n(ξ^ℓ_n))_{ℓ=1}^N);
3:   draw ξ^i_{n+1} ∼ P_n(ξ^{I^i_{n+1}}_n, •);
4:   weight ω^i_{n+1} ← γ_n(ξ^{I^i_{n+1}}_n, ξ^i_{n+1}) / ϑ_n(ξ^{I^i_{n+1}}_n);
5: end for

As mentioned in the introduction, the first proof of a CLT for SMC methods, obtained in [8], has been refined and generalised in a number of papers. The following theorem provides a CLT for APFs in the general model context of Section 2.2 and follows immediately from the more general result [19, Theorem B.6]: for every n ∈ N and h_n ∈ F(X_n), as N tends to infinity,

√N (φ^N_n h_n − φ_n h_n) ⇒ σ_n(h_n) Z,   (2.2)

with Z being standard normally distributed and σ²_n(h_n) := σ²_{0,n}(h_n), where, for ℓ ∈ ⟦0, n⟧, σ²_{ℓ,n}(h_n) is the truncated asymptotic variance given by (2.3). The truncated asymptotic variance (ℓ > 0) will be useful later on. The present paper focuses on estimating online, as n increases and while the particle sample is propagated, the sequence of asymptotic variances in (2.2). Before presenting our online variance estimator, the next section provides a brief overview of some existing approaches.
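To fix ideas, one iteration of an APF of this form can be sketched in a few lines of NumPy. The callables `propose`, `gamma`, and `theta` below are hypothetical stand-ins for the proposal kernel P_n, the Radon-Nikodym derivative γ_n, and the auxiliary function ϑ_n; they are illustrative placeholders, not part of the formal setup above.

```python
import numpy as np

def apf_step(particles, weights, rng, propose, gamma, theta):
    """One APF iteration (sketch): selection, mutation, weighting.

    propose(x, rng) draws from the proposal kernel P_n(x, .),
    gamma(x_old, x_new) evaluates the Radon-Nikodym derivative gamma_n,
    theta(x) evaluates the auxiliary function vartheta_n.
    """
    N = len(particles)
    # selection: draw indices I ~ Cat(omega^l * theta(xi^l))
    probs = weights * theta(particles)
    probs = probs / probs.sum()
    idx = rng.choice(N, size=N, p=probs)
    parents = particles[idx]
    # mutation: propagate each resampled particle through the proposal
    new_particles = propose(parents, rng)
    # weighting: omega^i <- gamma_n(parent, child) / theta(parent)
    new_weights = gamma(parents, new_particles) / theta(parents)
    return idx, new_particles, new_weights
```

With ϑ_n ≡ 1 and γ_n equal to the observation likelihood, this reduces to one step of the bootstrap particle filter.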

Estimation of the asymptotic variance
As touched upon in the introduction, a naive approach to variance estimation in particle filters is to use a brute-force strategy running a sufficiently large number K ∈ N* of independent particle filters. The asymptotic variance of interest can then be estimated by multiplying the sample variance of these independent filter approximations by N. However, having O(KN) complexity, where N as well as K should be sufficiently large to provide precise filter and variance estimates, respectively, this approach is clearly computationally impractical. Moreover, implementing this procedure in an online fashion requires the samples of every particle filter to be stored, implying also an O(KN) memory requirement.
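The brute-force strategy is straightforward to sketch. Here `run_filter` is a hypothetical callable returning the particle sample of one independent filter run at the time step of interest; it is an assumption of this sketch, not an object defined in the paper.

```python
import numpy as np

def naive_asymptotic_variance(run_filter, h, N, K, seed=0):
    """Brute-force estimate: N times the sample variance of the filter
    estimate phi_n^N(h) over K independent particle-filter runs."""
    estimates = np.empty(K)
    for k in range(K):
        rng = np.random.default_rng(seed + k)
        particles, weights = run_filter(N, rng)
        estimates[k] = np.average(h(particles), weights=weights)
    return N * estimates.var(ddof=1)
```

Both runtime and (for an online implementation) memory scale as O(KN), which is exactly the cost discussed above.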
Appealingly, the online approach devised by Chan and Lai [2] estimates consistently the sequence of asymptotic variances based only on the cloud of evolved particles, without requiring the execution of multiple SMC algorithms in parallel or any additional simulations. This is possible by keeping track, as n increases, of the so-called Eve indices (E^i_n)_{i=1}^N (using the terminology of Lee and Whiteley [17]), identifying the particles at time zero from which the ones at time n originate, in the sense that E^i_n denotes the index of the time-zero ancestor of particle ξ^i_n. These indices can be traced iteratively in the particle filter by initially letting E^i_0 ← i for all i ∈ ⟦1, N⟧ and then, as n increases, updating them according to E^i_{n+1} ← E^{I^i_{n+1}}_n. Such updates are straightforwardly implemented by adding one line of code after the selection operation on Line 2 in Algorithm 1. The CLE is then given by

σ̂²_n(h_n) := N Σ_{i=1}^N ( Σ_{j : E^j_n = i} (ω^j_n / Ω_n) (h_n(ξ^j_n) − φ^N_n h_n) )².   (2.4)

As a main result, Chan and Lai [2] established the consistency of this estimator, in the sense that for every n ∈ N and h_n ∈ F(X_n), σ̂²_n(h_n) converges to σ²_n(h_n) in probability as N tends to infinity. Although groundbreaking in theory, the estimator (2.4) suffers from a severe drawback in practice due to the particle-path degeneracy phenomenon. Indeed, because of the resampling operation, at each iteration of the filter some particles will inevitably be propagated from the same parent particle. Thus, eventually, when n is large enough, all particles will share the same time-zero ancestor, i.e., there will exist i_0 ∈ ⟦1, N⟧ such that E^i_n = i_0 for all i ∈ ⟦1, N⟧. Recently, Koskela et al. [15] showed, under standard mixing assumptions on the model, that the number of iterations needed for the genealogical paths of the particles to coalesce in this way is O(N). Hence, the estimate (2.4) eventually collapses to zero, which makes it unusable for large values of n. In practice, the estimator exhibits poor accuracy and high variability already when the Eve indices take only a few distinct values, as the variance estimate is then supported by only a few distinct subpopulations.
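A minimal sketch of the Eve-index bookkeeping and of an estimator of the ancestor-stratified form described above may look as follows; the exact normalisation used in [2] may differ, so this should be read as an illustration of the stratification idea rather than as the definitive formula.

```python
import numpy as np

def update_eve(eve, resampled_idx):
    """Eve-index update after selection: each particle inherits the
    time-zero ancestor of its resampled parent."""
    return eve[resampled_idx]

def cle(particles, norm_weights, eve, h):
    """Ancestor-stratified variance estimator (sketch): N times the sum
    of squared, weighted, centred contributions of each time-zero
    ancestor's subpopulation. eve[i] is the Eve index of particle i."""
    N = len(particles)
    vals = h(particles)
    centred = norm_weights * (vals - np.dot(norm_weights, vals))
    # sum the centred, weighted terms within each ancestor stratum
    contrib = np.bincount(eve, weights=centred, minlength=N)
    return N * np.sum(contrib ** 2)
```

When all Eve indices coincide, the strata contributions cancel and the estimate collapses to zero, which is precisely the degeneracy described above.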
In order to remedy this issue, Olsson and Douc [20] suggest, rather than tracing the time-zero ancestors, estimating the variance on the basis of the ancestors in some more recent generation. For this purpose, they introduce the Enoch indices, defined recursively, for all i ∈ ⟦1, N⟧ and m ∈ ⟦0, n + 1⟧, by

E^i_{m,n+1} ← E^{I^i_{n+1}}_{m,n} for m ≤ n,   E^i_{n+1,n+1} ← i.   (2.5)

In words, E^i_{m,n} indicates the index of the ancestor at time m ≤ n of particle i at time n; moreover, notice that when m = 0, these indices coincide with the Eve indices. Then, letting, for some lag λ ∈ N, n_λ := (n − λ) ∨ 0, the CLE (2.4) is replaced by the modified estimator

σ̂²_{n,λ}(h_n) := N Σ_{i=1}^N ( Σ_{j : E^j_{n_λ,n} = i} (ω^j_n / Ω_n) (h_n(ξ^j_n) − φ^N_n h_n) )².   (2.6)

Since, for a given lag λ, the generation n_λ to which the Enoch indices underpinning the estimator (2.6) refer varies with n, the algorithm requires the storage and iterative updating of a window (E^i_{n_λ,n}, …, E^i_{n,n})_{i=1}^N of Enoch indices. One iteration of the procedure is shown in Algorithm 2, which is initialised by generating the initial particle cloud as in Algorithm 1 and letting, in addition, E^i_{0,0} ← i for all i ∈ ⟦1, N⟧. We observe that the memory requirement and computational complexity of each iteration of the algorithm are both O(λN), independently of the time index n.
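The sliding window of Enoch indices can be maintained as follows; this is a sketch of the bookkeeping only, not of the full Algorithm 2.

```python
import numpy as np

def update_enoch_window(window, resampled_idx, lag):
    """Shift the Enoch-index window one generation forward.

    window is a list of index arrays (E^i_{m,n})_{i=1..N} for
    m = n_lambda, ..., n. Each new particle inherits the window entries
    of its resampled parent, and the newest generation points to itself.
    At most lag + 1 generations are kept, dropping the oldest.
    """
    N = len(resampled_idx)
    new_window = [e[resampled_idx] for e in window]
    new_window.append(np.arange(N))
    return new_window[-(lag + 1):]
```

Each update costs O(λN) time and memory, matching the complexity noted above.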

Algorithm 2 APF with online Enoch-indices updates
The estimator (2.6) is not consistent for the asymptotic variance σ²_n(h_n) as N tends to infinity; still, Olsson and Douc [20, Proposition 8] showed that for all λ ∈ N and h_n ∈ F(X_n), σ̂²_{n,λ}(h_n) converges to σ²_{n_λ,n}(h_n) in probability as N tends to infinity, where σ²_{n_λ,n}(h_n) is the truncated asymptotic variance given by (2.3), a quantity that is always smaller than the true asymptotic variance. Additional theoretical results [20, Section 4] establish that, under mild, verifiable model assumptions, the asymptotic bias induced by the truncation decays geometrically fast with λ (uniformly in n).
The results of [20] were derived in the context of Feynman-Kac models and standard bootstrap particle filters, which is a more restrictive setting than the one considered here. Still, interestingly, it is possible to show that a general APF operating on a general distribution flow of the form (2.1) can actually be interpreted as a standard bootstrap filter operating on a certain auxiliary, extended Feynman-Kac model. Thus, using this trick, which is described in detail in Section A, we are able to extend the consistency results obtained in [20] to the general setting of the present paper. This is the content of Theorem 2.2, whose proof is found in Section A.
Theorem 2.2. Let Assumptions 1 and 2 hold. Then for every n ∈ N, λ ∈ N, and h_n ∈ F(X_n), σ̂²_{n,λ}(h_n) converges to σ²_{n_λ,n}(h_n) in probability as N tends to infinity.

The main practical issue with the lag-based approach of [20] is that the design of an optimal lag might be a difficult task. Using a too large lag implies, as for the CLE, depletion of the set of ancestors supporting the estimator, leading to high variance; on the other hand, using a too small lag decreases this variance, but at the cost of significant underestimation of the asymptotic variance of interest. The fact that the asymptotic bias decreases geometrically fast suggests that a good approximation of the asymptotic variance can be obtained even for moderate values of λ, but quantifying this optimal lag may be a laborious task. In the numerical simulations in [20], the algorithm is run multiple times for several distinct values of λ, whereupon the variance estimates obtained in this manner are compared to the one obtained using the naive estimator in order to determine the empirically best lag. This method is not ideal, as it requires extensive preparatory computations and does not take into account the possibility of varying the lag as the particles evolve. Instead, it is desirable to keep the lag as large as possible as long as the estimator is of good quality (in some sense) and to decrease it whenever degeneracy, as determined by the depletion of the Enoch indices, is detected. This argument will be developed further in the next section, leading to the design of a fully adaptive approach.

The ALVar estimator
We first need to identify a criterion determining an optimal lag at a given iteration n. We have previously discussed the bias-variance tradeoff, which typically arises when the objective is to minimise the mean-squared error (MSE) of an estimator with respect to the estimand of interest. For every n ∈ N, h_n ∈ F(X_n), and λ ∈ N, the MSE of the estimator (2.6) can be written as the sum of its variance and its squared bias according to

MSE_{n,λ}(h_n) = Var(σ̂²_{n,λ}(h_n)) + (E[σ̂²_{n,λ}(h_n)] − σ²_n(h_n))².   (3.1)

Our intention is to design a routine for adapting the lag λ in such a way that (3.1) is minimised. Even if we do not have closed-form expressions for the expectation and the variance of the lag-based estimator in (3.1), we may make the following observations.
• Since σ̂²_{n,λ}(h_n) tends to σ²_{n_λ,n}(h_n) in probability as the number N of particles tends to infinity, where σ²_{n_λ,n}(h_n) ≤ σ²_n(h_n) for all λ, and the difference decreases as λ approaches n, we may assume that also the non-asymptotic bias is reduced when λ increases.
• On the other hand, the larger the value of λ, the fewer distinct elements the set (E^i_{n_λ,n})_{i=1}^N of Enoch indices contains, causing an increase in the variance of the estimator (2.6); see Figure 5.
The reduction of the number of distinct Enoch indices may be tolerated as long as an increase of the lag is beneficial for the reduction of the bias, but at some point the behaviour becomes pathological. Imagine, for instance, that we use the CLE in the early iterations of the particle filter for estimating the variance; then, at some time n, one realises that there exists some λ ∈ ⟦0, n − 1⟧ for which σ̂²_{n,λ}(h_n) > σ̂²_n(h_n), although their asymptotic values are supposed to be in the opposite order and the lag-based estimator is expected to be less variable. This suggests that the Eve indices might be depleted and no longer reliable for supporting the variance estimator. It is then reasonable to assume that they will be unreliable also in the subsequent steps, since their degeneracy can only get worse. Extending this idea to the Enoch indices, we may define recursively the concept of depleted Enoch indices.

Definition 3.1. Let (h_n)_{n∈N} be a given sequence of functions such that h_n ∈ F(X_n) for all n. The Enoch indices (E^i_{m,n})_{i=1}^N are said to be depleted if at least one of the following conditions is satisfied:
(i) the indices (E^i_{m,n−1})_{i=1}^N are depleted;
(ii) there exists m′ ∈ ⟦m + 1, n⟧ such that σ̂²_{n,n−m′}(h_n) > σ̂²_{n,n−m}(h_n).

By convention, indices (E^i_{m,n})_{i=1}^N with m < 0 are always depleted (even if these are not explicitly defined). In order to check the depletion status of some indices (E^i_{m,n})_{i=1}^N using Definition 3.1 we need to know the status of previous generations. Thus, in practice, depletion may be determined iteratively forwards in time, starting from (E^i_{0,0})_{i=1}^N, which are not depleted by definition. Then for every n ∈ N*, knowing whether the indices (E^i_{m,n−1})_{i=1}^N are depleted or not for all m ∈ ⟦0, n − 1⟧, it is possible to check the same for (E^i_{m,n})_{i=1}^N, starting from m = 0 and proceeding forwards to m = n. This is done by first checking condition (i) in Definition 3.1; if this is not satisfied, we check condition (ii). The idea behind condition (i) is that if a set (E^i_{m,n−1})_{i=1}^N of Enoch indices is ill-suited to estimate the variance at some time n − 1, it will not be suited to estimate the variance at any future time, since the number of distinct elements in the set can only decrease with n. Regarding condition (ii), if instead the indices (E^i_{m,n−1})_{i=1}^N are non-depleted, we still need to check whether there is a more recent generation whose associated estimate is larger.

Algorithm 3 describes our method, the adaptive-lag variance (ALVar) estimator, in which the lag at each iteration is, as established by Theorem 3.2 below, chosen as the largest one for which the corresponding Enoch indices are not depleted. This non-depletion condition is ensured by selecting recursively λ_{n+1} in such a way that it produces the largest estimate, with the selection bounded from above by λ_n + 1. The lag is initialised by setting λ_0 ← 0.
Proof.We proceed by induction.The claim is true for n = 0 since we initialise λ 0 ← 0. Now, let the claim be true for some n ∈ N; then by the induction hypothesis and condition (i On the other hand, by the induction hypothesis and the very construction of λ n+1 in Algorithm 3, none of the depletion conditions of Definition 3.1 are satisfied for m ∈ (n + 1) λ n+1 , n + 1 ; hence, the corresponding Enoch indices are not depleted.If λ n+1 < λ n + 1, then, again by the construction of λ n+1 , (E i m,n+1 ) N i=1 are depleted for m ∈ (n + 1) λ n + 1 , (n + 1) λ n+1 − 1 as well by condition (ii).This concludes the proof.
The computation of the estimator (2.6) has complexity O(N) and is performed λ_n + 2 times at each iteration n. In order to obtain an online algorithm with constant memory requirements we need λ_n to be uniformly bounded in n. Although in theory the lag might increase indefinitely, such that λ_n = n for all n ∈ N, we may assume that there exists an upper bound on the lag for any fixed number N of particles. In support of this assumption, we know that the expected number of generations until all the Enoch indices are equal, which is certainly larger than any lag selected by the proposed method, is O(N) uniformly in n; see [15]. Thus, in practice there will generally exist some λ_max, depending on the model and on N but independent of n, such that λ_n < λ_max for all n ∈ N. Hence, the final algorithm is online, since it has both complexity and memory demand (again dominated by the storage of the Enoch indices) of order O(λ_max N), independently of n, and adaptive, since the choice of each new lag is adapted to the output of the particle filter as well as to the lag of the previous iteration. In the next section we prove the consistency of the estimator and present a heuristic argument concerning the dependence of the lag on the number of particles.
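The lag-selection rule described above (pick, among the candidate lags 0, …, λ_n + 1, the one maximising the lag-based estimate) admits a one-line sketch. Here `estimates_by_lag` is assumed to hold the candidate estimates σ̂²_{n,ℓ}(h_n), ℓ = 0, 1, …, already computed from the Enoch-index window; this is an illustrative simplification of Algorithm 3, not its full listing.

```python
import numpy as np

def select_lag(prev_lag, estimates_by_lag):
    """Adaptive lag choice (sketch): among the candidate lags
    0, ..., prev_lag + 1, return the one whose lag-based variance
    estimate is largest. A shorter lag producing a larger estimate
    signals depletion of the older Enoch-index generations."""
    candidates = estimates_by_lag[: prev_lag + 2]
    return int(np.argmax(candidates))
```

By construction the returned lag never exceeds `prev_lag + 1`, matching the bound λ_{n+1} ≤ λ_n + 1 used in the proof of Theorem 3.2.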

Theoretical results
Next, we show that for every n ∈ N, the adaptive-lag estimator constructed in the previous section is consistent for the true asymptotic variance σ²_n(h_n), recalling, however, that the algorithm is meant to operate in the regime where N is fixed and n is arbitrarily large. The 'asymptotic' algorithm is not online, since we are going to show that for all n ∈ N, λ_n tends to n in probability as N grows, implying that in the limit we obtain the CLE at each step. Nevertheless, as we will see later, for a fixed number of particles, the range of the lags returned by the algorithm is expected to grow very slowly with N; more precisely, in Section 3.2.2 we argue that this range increases only logarithmically with N, a claim that is also confirmed by our numerical experiments in Section 4.1.1.

Consistency
We now establish the consistency of the ALVar estimator: for every n ∈ N and h_n ∈ F(X_n), σ̂²_{n,λ_n}(h_n) converges to σ²_n(h_n) in probability as N tends to infinity.

Proof. We proceed by induction, assuming that the claim holds true for n − 1. For every ε > 0, it holds that

P(|σ̂²_{n,λ_n}(h_n) − σ²_n(h_n)| ≥ ε) ≤ P(λ_n < n) + P(|σ̂²_n(h_n) − σ²_n(h_n)| ≥ ε),

where σ̂²_n(h_n) is the CLE defined in (2.4) and based on the same particle system. The second term on the right-hand side converges to zero as N tends to infinity by the consistency of the CLE. To treat the first term, note that λ_n = n requires λ_{n−1} = n − 1, since λ_n ≤ λ_{n−1} + 1 by construction; thus, by the induction hypothesis and the construction of the lag in Algorithm 3, P(λ_n = n) tends to one as N tends to infinity. Finally, the base case holds trivially true, since λ_0 = 0 and σ̂²_{0,0}(h_0) converges to σ²_0(h_0) in probability for all h_0 ∈ F(X_0).

Heuristics on the dependence of the lag on the number of particles
In the light of Theorem 2.2, we expect λ_n to increase with N. It is, however, crucial to understand how the values of the lags (λ_n)_{n∈N} depend on N, since this determines the performance and memory requirements of our algorithm. For instance, a linear dependence would imply a quadratic complexity, which is not desirable. In the rest of this section we provide a heuristic argument showing that if we minimise the MSE (3.1), then we may expect λ_n to be O(log N) for all n ∈ N.
If we approximate E[σ̂²_{n,λ}(h_n)] in (3.1) by its asymptotic limit σ²_{n_λ,n}(h_n), then the second term on the right-hand side is approximately the square of the asymptotic bias, (σ²_{n_λ,n}(h_n) − σ²_n(h_n))². In [20] it is shown that, under mild assumptions, the asymptotic bias is O(ρ^λ) for some mixing rate ρ ∈ (0, 1). Regarding the variance of σ̂²_{n,λ}(h_n), we know that it increases with the lag, as the number of distinct Enoch indices decreases. Since the variance of a Monte Carlo estimator is generally inversely proportional to the Monte Carlo sample size, we may expect Var(σ̂²_{n,λ}(h_n)) to be O(1/N_λ), where N_λ is the number of distinct Enoch indices (E^i_{n_λ,n})_{i=1}^N at generation n_λ. Now, by adapting the proof of Corollary 2 in [15], we may argue that, under standard mixing assumptions which can be relaxed in practice, N_λ is O(N/λ). Finally, we determine the order of the optimal lag λ* by letting it be the minimiser of the resulting crude approximation of the MSE (3.1) as a function of λ, namely c ρ^{2λ} + c′λ/N, where c > 0 and c′ > 0 are constants independent of λ and N. It is then easily seen that λ* is O(log N). Although this argument is heuristic, we will see later that it is well supported by our numerical simulations, in which the lags produced are very close to the ones minimising the MSE, with a logarithmic dependence on N.
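The heuristic can be checked numerically: minimising the proxy c ρ^{2λ} + c′λ/N over integer lags yields lags that grow by a roughly constant amount for each hundredfold increase in N, i.e., logarithmically. The constants below (ρ = 1/2, c = c′ = 1) are arbitrary illustrative choices, not values from the analysis.

```python
import numpy as np

def optimal_lag(N, rho=0.5, c=1.0, c_prime=1.0, max_lag=200):
    """Minimise the crude MSE proxy c*rho^(2*lam) + c_prime*lam/N
    over integer lags lam = 0, ..., max_lag."""
    lam = np.arange(max_lag + 1)
    mse = c * rho ** (2 * lam) + c_prime * lam / N
    return int(lam[np.argmin(mse)])
```

Setting the derivative of the continuous proxy to zero gives ρ^{2λ} = c′/(−2c N ln ρ), i.e., λ* = O(log N), which the discrete minimiser reproduces.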

Extension to particle filters with adaptive resampling
We now consider the case in which selection is not necessarily performed at each iteration.Selection is essential in particle filters, as it copes with the well-known importance-weight degeneracy phenomenon (see, e.g., [1, Section 7.3]); however, since resampling adds variance to the estimator, this operation should not be used unnecessarily.A common procedure is hence to resample only when flagged by some weight-degeneracy criterion.One popular such criterion among others is the effective sample size (ESS) [18] defined by , which gives an approximation of the number of active particles, i.e., particles with nondegenerated importance weight at time n.The ESS is minimal and equal to one when all the weights are equal to zero except one and maximal and equal to N when all weights are non-zero and equal.Using the ESS, one may, e.g., let the resampling operation be triggered only when ESS N n ≤ αN , where α ∈ (0, 1) is a design parameter.More generally, we may let (ρ N n ) n∈N be a sequence of binaryvalued random variables indicating whether resampling should be triggered or not.The sequence (ρ N n ) n∈N is assumed to be adapted to the filtration (F N n ) n∈N generated by the particle filter, where Thus, these indicators may be based on the ESS, letting ρ N n = ½ {ESS N n <αN } , but also on n only, implying a deterministic selection schedule.Algorithm 4 shows one iteration of this adaptive procedure, which we later express in the compact form As described in Algorithm 4 APF with adaptive resampling , ξ i n+1 )/(ϑn(ξ )) ρ N n ; 9: end for the following, particle filters with adaptive resampling still satisfy a CLT, with asymptotic variance having a structure similar to that of (2.3) but depending also on an "asymptotic" resampling schedule to be defined next.Assumption 3.For given α ∈ (0, 1), let (ρ N n ) n∈N be defined as The following lemma is adopted from [19, Lemma 3.5] (with d = ∞).We now have the following CLT for adaptive APFs, whose proof is found in Section 
B.2.

Theorem 3.5. Let Assumption 1 hold. Let (ξ_n^i, ω_n^i)_{i=1}^N be generated by n iterations of Algorithm 4 according to a selection schedule (ρ_n^N)_{n∈N} satisfying Assumption 3. Then for every n ∈ N and h_n ∈ F(X_n), √N(φ_n^N h_n − φ_n h_n) converges in distribution to σ_{n,ρ^α_{0:n−1}}(h_n) Z, where Z is a standard normally distributed random variable and the asymptotic variance σ²_{n,ρ^α_{0:n−1}}(h_n), depending on α, is given in detail in Appendix B.

When designing a lag-based estimator of the asymptotic variance provided by Theorem 3.5, it turns out to be more convenient to define the lag in terms of the number of resampling operations rather than the number of iterations of the particle filter. For this purpose, let r_n := Σ_{m=0}^{n−1} ρ_m^N count the number of times selection is performed before time n (with the convention r_0 = 0). Then the Enoch indices at each time n will be indexed by the resampling times r_n rather than by n, since every iteration without resampling leaves these unaltered. More specifically, in the following, a generic Enoch index E^i_{m,r_n} will indicate the ancestor of the particle ξ_n^i at any time n′ ∈ ⟦0, n⟧ such that r_{n′} = m ∈ ⟦0, r_n⟧. Then for all i ∈ ⟦1, N⟧ and m ∈ ⟦0, r_{n+1}⟧, the update (2.5) can be rewritten as E^i_{m,r_{n+1}} = E^{I^i_{n+1}}_{m,r_n} for m < r_{n+1}.
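The Enoch-index update just described, including the case where resampling is skipped, can be sketched as follows (a minimal sketch with our own function and variable names; particles are indexed from 0):

```python
import numpy as np

def update_enoch(enoch, ancestors, resampled):
    """One bookkeeping step for Enoch indices under adaptive resampling.

    enoch:     list of integer arrays; enoch[m][i] holds E^i_{m, r_n}, the
               ancestor of particle i relative to the m-th resampling time.
    ancestors: ancestor indices I^i_{n+1} drawn in the selection step.
    resampled: whether selection was triggered at time n (rho_n^N = 1).
    """
    if not resampled:
        # r_{n+1} = r_n: every generation of indices is left unaltered.
        return enoch
    # E^i_{m, r_{n+1}} = E^{I^i_{n+1}}_{m, r_n} for m < r_{n+1} ...
    updated = [generation[ancestors] for generation in enoch]
    # ... and the newest generation points at the particles themselves.
    updated.append(np.arange(len(ancestors)))
    return updated
```

With this layout, iterations without resampling cost nothing, which is exactly why indexing the lag by resampling times keeps the bookkeeping cheap.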
Notice that when resampling is not performed at time n, it holds that r_{n+1} = r_n and E^i_{m,r_{n+1}} = E^i_{m,r_n} for all m ∈ ⟦0, r_{n+1}⟧ and i ∈ ⟦1, N⟧. In practice, for a given n ∈ N, the lag takes on values in ⟦0, r_n⟧ instead of ⟦0, n⟧ and, as before, the expression r_{n,λ}, λ ∈ N, indicates the quantity (r_n − λ) ∨ 0. In this setting, the estimator (2.6) is rewritten with the lag applied to the resampling times; Algorithm 5 shows one iteration of the resulting procedure.
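The ESS-based trigger used throughout this section can be sketched as follows (a minimal sketch; the function names are ours, not from the paper's algorithms):

```python
import numpy as np

def ess(weights):
    """Effective sample size (sum w)^2 / sum w^2 of unnormalised weights."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def resampling_indicator(weights, alpha):
    """rho_n^N = 1 iff ESS_n^N <= alpha * N, with alpha in (0, 1)."""
    return ess(weights) <= alpha * len(weights)
```

For uniform weights the ESS equals N and no resampling is triggered; when a single weight dominates, the ESS approaches one and resampling fires for any α ∈ (0, 1).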

Numerical illustrations
In this section we apply, as an illustration, our approach to optimal filtering in SSMs. In order to benchmark carefully our variance estimator against the fixed-lag estimator of [20], we tested ALVar on the same SSMs as in the latter work, namely
- the stochastic volatility model introduced by Hull and White [13] and
- a linear Gaussian state-space model for which exact computation of the filter is possible using the Kalman filter.

Stochastic volatility model
Our first SSM is governed by the equations

X_{n+1} = a X_n + σ U_{n+1},  Y_n = b exp(X_n / 2) V_n,  n ∈ N,   (4.1)

where (U_n)_{n∈N*} and (V_n)_{n∈N} are sequences of uncorrelated standard Gaussian noise variables. The parameters are assumed to be known, with (a, b, σ) = (0.975, 0.641, 0.165). We only observe the process (Y_n)_{n∈N}, representing stock log-returns, while (X_n)_{n∈N}, representing the log-volatility, is a hidden state process which we want to infer. The state X_0 is initialised according to a zero-mean Gaussian distribution with variance σ²/(1 − a²), i.e., the stationary distribution of the state process. Thus, we deal with a fully dominated nonlinear SSM with X = Y = R, 𝒳 = 𝒴 = B(R), the Borel σ-field on R, in which both M and G are Gaussian kernels.
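A record of observations from this model can be simulated as follows (a sketch assuming the standard Hull-White form X_{n+1} = a X_n + σ U_{n+1}, Y_n = b exp(X_n/2) V_n stated in (4.1); the function name is ours):

```python
import numpy as np

def simulate_sv(n, a=0.975, b=0.641, sigma=0.165, seed=0):
    """Simulate n steps of the stochastic volatility model (4.1)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    # X_0 drawn from the stationary law N(0, sigma^2 / (1 - a^2)).
    x[0] = rng.normal(scale=np.sqrt(sigma**2 / (1 - a**2)))
    for k in range(1, n):
        x[k] = a * x[k - 1] + sigma * rng.normal()
    # Observations: log-returns with state-dependent volatility.
    y = b * np.exp(x / 2) * rng.normal(size=n)
    return x, y
```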
A record y_{0:5000} of observations was obtained by simulating the process (X_n, Y_n)_{n∈N} under the dynamics (4.1) and the given parameterisation. For all n ∈ N, we let h_n be the identity function. In order to have a reliable benchmark for the variance we first implemented the naive, brute-force estimation technique described in Section 2.4, producing 2000 replicates of the particle filter with N = 5000. Then we computed the sample variance of these filter estimates at each iteration and multiplied it by N.
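The brute-force benchmark amounts to the following (a sketch; `run_filter` is a hypothetical callable returning the per-iteration filter estimates of one particle-filter replicate):

```python
import numpy as np

def brute_force_variance(run_filter, n_replicates=2000, n_particles=5000):
    """Brute-force benchmark for the asymptotic variance.

    Runs many independent particle filters, takes the sample variance of the
    estimates at each iteration, and multiplies it by the particle count N.
    """
    estimates = np.array([run_filter(n_particles) for _ in range(n_replicates)])
    # Unbiased sample variance across replicates, scaled by N.
    return n_particles * np.var(estimates, axis=0, ddof=1)
```

This is exactly the computationally demanding procedure that the online ALVar estimator is designed to replace.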
Algorithm 3 was implemented with the two different sample sizes N = 1000 and N = 100000 in order to assess stability as well as convergence. The output is displayed in Figures 2 and 3, where the ALVar estimator is compared to the brute-force benchmark, the CLE, and the fixed-lag approach of [20] with λ ∈ {14, 24}. In both cases, our estimator produces more precise and stable estimates of the asymptotic variance. Moreover, increasing the number of particles leads to significantly better accuracy, demonstrating the convergence properties of our method. These patterns can also be noticed in Figure 4, where we focus on large values of n. We see, as expected, that when n is large the CLE either drops to zero or suffers from large variance due to the depletion of the Eve indices. The fixed-lag approach behaves similarly to our adaptive approach, both being close to the benchmark brute-force values. The fundamental difference is that in the adaptive method the lag is designed adaptively and dynamically, whereas for the fixed-lag method the lag is set to a constant value close to the average lag produced by the ALVar estimator. We stress again that without access to the ALVar procedure, the design of a suitable fixed lag λ would require an exhaustive preliminary simulation-based analysis, in which multiple fixed-lag variance estimates are produced for a range of different lags and repeated runs of the particle filter and then compared to an estimate obtained using the brute-force estimator. In the boxplots in Figure 5, each box represents the distribution of variance estimates at time n = 1000 using the ALVar algorithm, the CLE, and the fixed-lag approach with several choices of λ, obtained on the basis of 100 replicates of Algorithm 2 for each of these lags. For the box dedicated to the ALVar estimator we have indicated the average λ_1000 across the 100 independent particle-filter replicates (not to be confused with the average lag across all iterations of a single realisation of the particle filter). We observe that our estimator manifests negligible bias, with variability similar to that of the best fixed-lag estimators.

Adaptive-lag analysis
In this part we investigate how the lags chosen adaptively at each iteration of Algorithm 3 are distributed and how they depend on the number of particles N. In Figure 6 we show the evolution of the chosen lags over time for N = 1000 particles. We see that after an initial constant increase, the values stabilise in a range where most of them lie between 5 and 30. An interesting pattern is the presence of regimes with a constant increase of the lag, during which the same generation of Enoch indices is used, followed by possibly sudden drops, when the generation used so far becomes depleted and a more recent one comes into substitution. Although we do not show it here, we observe a similar pattern when N is increased from 1000 to 100000, in which case the range of selected lags is between 10 and 50, with an average around 24. This is not a surprise, since in Theorem 3.3 we have shown that the adaptive lag is expected to converge to the maximum possible value λ_n = n, which is why we expect larger lags for larger sample sizes. The good news is that the complexity does not explode as N increases; indeed, the ALVar algorithm is between 1.5 and 2 times slower than a standard particle filter for N = 1000 particles and between 2 and 2.5 times slower for N = 100000 particles. Moreover, our method always took significantly less than twice the time of a fixed-lag algorithm with λ selected around the average value of the adaptive approach (1.4 and 1.7 times slower for N = 1000 and N = 100000 particles, respectively). The computational time of the ALVar procedure is closely related to the values of the lags that it chooses across the iterations, since the larger these are, the longer it takes to update the Enoch indices and to perform the update on Line 8 of Algorithm 3.
In Figure 7 we illustrate how the lags are distributed for different particle sample sizes; as predicted by our heuristic argument in Section 3.2.2, the average lag clearly appears to grow logarithmically with N. The maxima behave similarly.

ALVar vs. empirical MSE
At the beginning of Section 3 we claimed that an optimal choice of the lag could be the one minimising the MSE. We now want to check that, for most iterations, the lags selected by the ALVar algorithm are sufficiently close to the ones minimising (3.1). As mentioned, (3.1) is hard to evaluate analytically but can be estimated by means of the empirical MSE obtained by running M ∈ N* independent particle filters. More precisely, for every n ∈ N, λ ∈ N, and h_n ∈ F(X_n), we define the empirical MSE at time n as

MSE_n(λ) := M^{−1} Σ_{j=1}^M (σ̂^{2,j}_{n,λ}(h_n) − σ²_n(h_n))²,

where σ̂^{2,j}_{n,λ}(h_n) is the estimate produced by the j-th particle filter and σ²_n(h_n) can be approximated by a brute-force estimate. Then, for every n ∈ N and h_n ∈ F(X_n) we determine the optimal lag by selecting λ*_n ← arg min_{λ∈⟦0,n⟧} MSE_n(λ).   (4.2)

In order to compare the adaptive lags formed by the ALVar estimator to the empirical-MSE-optimal lags (4.2), we ran M = 1000 particle filters, each with N = 10000 particles, for n = 500 iterations; letting h_n = id for all n, we determined, for each replicate, the adaptive lags selected by the ALVar procedure as well as the ones minimising the empirical MSE. Figure 8 reports adaptive-lag distributions at some iterations, together with the lag values λ*_n minimising the empirical MSE; remarkably, we observe that the empirically optimal lags are within the range of lags selected by the ALVar algorithm, although the latter tends to choose slightly larger values on average.
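The empirical-MSE selection (4.2) can be sketched as follows (function name is ours; the inputs are the per-replicate, per-lag variance estimates and a brute-force benchmark):

```python
import numpy as np

def empirical_mse_optimal_lag(lag_estimates, benchmark):
    """Empirical MSE over M replicates and the minimising lag.

    lag_estimates[j, lam]: variance estimate sigma^2_{n,lam} from replicate j,
                           for each candidate lag lam in 0..n.
    benchmark:             brute-force approximation of sigma^2_n(h_n).
    """
    # MSE_n(lam) = M^{-1} sum_j (sigma^{2,j}_{n,lam} - sigma^2_n)^2
    mse = np.mean((np.asarray(lag_estimates) - benchmark) ** 2, axis=0)
    return int(np.argmin(mse)), mse
```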

Variance estimation in the case of adaptive resampling
In this section we test the ALVar estimator in the case where the resampling operation is triggered by the ESS criterion according to Algorithm 4. Figure 9 displays brute-force estimates of the asymptotic variance as well as estimates produced by the ALVar estimator in Algorithm 5 for two distinct choices of the parameter α ∈ {0.5, 0.2}. In both cases, the observations y_{0:5000} generated in the previous section were used as input to the particle filter. Although based on the same observations and exhibiting similar patterns, the two brute-force-estimated asymptotic variances differ, as expected, from each other and from the ones reported in Figures 2 and 3. Still, in both cases the ALVar estimator is capable of tracking closely the time evolution of the variance. Since the lag is now chosen in terms of selection times rather than simple iterations, the resulting average values are significantly smaller than before, namely 3.0 when α = 0.5 and 1.9 when α = 0.2.

Linear Gaussian SSM and approximate confidence bounds
In this section we are interested in evaluating the ability of the ALVar estimator to provide reliable Monte Carlo confidence bounds for the quantities of interest. In order to do that, we consider the linear Gaussian SSM

X_{n+1} = a X_n + σ_u U_{n+1},  Y_n = b X_n + σ_v V_n,  n ∈ N,

where (U_n)_{n∈N*} and (V_n)_{n∈N} are again sequences of uncorrelated standard Gaussian noise variables. We let the parameters be equal to (a, b, σ_u, σ_v) = (0.98, 1, 0.2, 1). For linear Gaussian models, the filter distributions are Gaussian and available in closed form through the Kalman filter (see, e.g., [1, Section 5.2.3]), which makes these models particularly well suited for assessing the performance of particle methods. We proceed as follows. Given a sequence y_{0:1000} of observations, the Kalman filter produces the exact values of φ_n(id) = E[X_n | Y_{0:n} = y_{0:n}], n ∈ ⟦0, 1000⟧; then, from a single run of the ALVar procedure we may create, for each n ∈ ⟦0, 1000⟧, a 95% confidence interval

(φ_n^N(id) + λ_{0.025} √(σ̂²_{n,λ_n}(id)/N), φ_n^N(id) − λ_{0.025} √(σ̂²_{n,λ_n}(id)/N)),   (4.3)

where λ_{0.025} is the 2.5% quantile of the standard Gaussian distribution. Finally, we let N = 10000 and produce 200 independent replicates of the APF and the associated ALVar estimator (Algorithm 3). For each replicate we calculate, for every n ∈ ⟦0, 1000⟧, a 95% confidence interval (4.3). Figure 10 reports the failure rate, i.e., the fraction of cases in which the true value falls outside the corresponding confidence interval, at each iteration. We observe an average failure rate across all times of around 5.2% (against the ideal 5% failure rate), which may suggest a slight underestimation of the asymptotic variance. Similar results were obtained with the ESS-based approach described by Algorithm 5, where using α = 0.2 and α = 0.5 led to average failure rates of 5.2% and 5.0%, respectively. All in all, these results confirm that our approach is reliable and has small or negligible bias.
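The interval construction (4.3) and the failure-rate computation can be sketched as follows (the 97.5% Gaussian quantile, i.e. −λ_{0.025} ≈ 1.96, is hard-coded; function names are ours):

```python
import numpy as np

Z = 1.959963984540054  # 97.5% standard Gaussian quantile, i.e. -lambda_{0.025}

def confidence_interval(estimate, var_estimate, n_particles):
    """95% interval  estimate -/+ Z * sqrt(var_estimate / N), as in (4.3)."""
    half = Z * np.sqrt(np.asarray(var_estimate) / n_particles)
    return np.asarray(estimate) - half, np.asarray(estimate) + half

def failure_rate(truth, estimates, var_estimates, n_particles):
    """Fraction of iterations where the exact (e.g. Kalman) value lies outside."""
    lo, hi = confidence_interval(estimates, var_estimates, n_particles)
    t = np.asarray(truth)
    return float(np.mean((t < lo) | (t > hi)))
```

A well-calibrated variance estimator should yield a failure rate close to the nominal 5%; systematic excess indicates variance underestimation.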
A. Proof of Theorem 2.2

In the following we prove Theorem 2.2 by extending [20, Proposition 8], which provides the convergence of interest in the special case of Feynman-Kac models and standard bootstrap filters, to the general model context in Section 2.2 and APFs. For this purpose, we first need to reformulate the APF introduced in Section 2.3 as a simple bootstrap filter operating on an auxiliary, extended Feynman-Kac model. More precisely, define a sequence (X̄_n, 𝒳̄_n)_{n∈N} of measurable spaces by setting X̄_0 := X_0 and 𝒳̄_0 := 𝒳_0 and, for n ∈ N*, X̄_n := X_{n−1} × X_n and 𝒳̄_n := 𝒳_{n−1} ⊗ 𝒳_n. Moreover, elements in these spaces will be denoted by x̄_0 := x_0, x̄_n := (x_{n−1}, x_n), and we will also write x̄_n^1 := x_{n−1} and x̄_n^2 := x_n to indicate the first and second components of x̄_n, respectively. Let ν̄ := ν and define ḡ_0(x_0) := ϑ_0(x_0) γ^{−1}(x_0). We also define the initial unnormalised measure χ̄ ∈ M(𝒳̄_0) such that for A ∈ 𝒳̄_0, χ̄(A) = ν(ḡ_0 1_A) (notice that χ̄ = χ). Now, define, for each n ∈ N*, the auxiliary potential function ḡ_n and the Markov transition kernel M̄_n given by (A.1) and (A.2), respectively. Thus, as established by Lemma A.1 below, Algorithm 1 may be reinterpreted as a bootstrap particle filter operating on the auxiliary Feynman-Kac model (A.1)-(A.2). Algorithm 6 shows one iteration of this procedure, which is initialised by sampling (ξ̄_0^i)_{i=1}^N from ν̄^{⊗N} = ν^{⊗N} and letting ω̄_0^i ← ḡ_0(ξ̄_0^i) for all i.
Algorithm 6 (Bootstrap particle filter on the auxiliary Feynman-Kac model).

Lemma A.1. Let (ξ_n^i, ω_n^i)_{i=1}^N and (ξ̄_n^i, ω̄_n^i)_{i=1}^N be the particles, weights, and resampling indices produced by n iterations of Algorithms 1 and 6, respectively. Then for every n ∈ N*, (ξ_n^i, ω_n^i ϑ_n(ξ_n^i))_{i=1}^N =_D (ξ̄_n^{i,2}, ω̄_n^i)_{i=1}^N, where ξ̄_n^{i,2} indicates the second component of ξ̄_n^i := (ξ̄_n^{i,1}, ξ̄_n^{i,2}).
Proof. First note that by construction, (ξ_0^i, ω_0^i ϑ_0(ξ_0^i))_{i=1}^N =_D (ξ̄_0^i, ω̄_0^i)_{i=1}^N. We may hence proceed by induction and assume that (ξ_n^i, ω_n^i ϑ_n(ξ_n^i))_{i=1}^N =_D (ξ̄_n^{i,2}, ω̄_n^i)_{i=1}^N for some n ∈ N (in the case n = 0, we denote ξ̄_0^{i,2} := ξ̄_0^i). In Algorithm 1 we draw the conditionally i.i.d. resampling indices (I_{n+1}^i)_{i=1}^N in proportion to the weights (ω_n^i ϑ_n(ξ_n^i))_{i=1}^N, whereas in Algorithm 6 we draw (Ī_{n+1}^i)_{i=1}^N in proportion to (ω̄_n^i)_{i=1}^N; by the induction hypothesis, the distributions of these two collections of indices coincide. Using the induction hypothesis and the previous result on the resampling indices, this implies that (ξ_{n+1}^i)_{i=1}^N =_D (ξ̄_{n+1}^{i,2})_{i=1}^N. Finally, since the weights are functions only of the particles and the indices, it holds, for i ∈ ⟦1, N⟧, that (ξ_{n+1}^i, ω_{n+1}^i ϑ_{n+1}(ξ_{n+1}^i))_{i=1}^N =_D (ξ̄_{n+1}^{i,2}, ω̄_{n+1}^i)_{i=1}^N. This completes the proof.
We are now ready to prove Theorem 2.2.
Proof of Theorem 2.2. Consider a given iteration index n ∈ N; if we are interested in estimating the variance of the particle estimator after n iterations, we may assume that ϑ_n ≡ 1, since the adjustment multiplier ϑ_n only influences the distribution of the particles of the APF at the subsequent iterations n + 1, n + 2, ... Now, the estimate of φ_n h_n produced by Algorithm 1 for a given h_n ∈ F(X_n) can be interpreted as a statistically equivalent estimate formed by Algorithm 6 by defining h̄_n : X̄_n ∋ x̄_n ↦ h_n(x̄_n^2). Then by Lemma A.1, φ_n^N h_n =_D φ̄_n^N h̄_n. Next, we also notice that the expectations φ̄_n h̄_n and φ_n h_n coincide; indeed, for every m ∈ N*, we may define the extended kernel products L̄_{ℓ:m−1} for ℓ ∈ ⟦1, m − 1⟧, with L̄_{ℓ:m−1} = id otherwise; moreover, for any function h̄_m ∈ F(𝒳̄_m) such that h̄_m(x̄_m) = h_m(x̄_m^2) for all x̄_m ∈ X̄_m, this implies that φ̄_m h̄_m = χ̄L̄_{0:m−1}h̄_m / χ̄L̄_{0:m−1}1_{X̄_m} = φ_m h_m. Thus, since we are assuming that ϑ_n ≡ 1, √N(φ_n^N h_n − φ_n h_n) converges in distribution to σ̄_n(h̄_n) Z, where σ̄²_n(h̄_n) is given by (2.3) with ℓ = 0. It is also easily checked that the equality holds for all the truncated variances. This is seen by first recalling that Algorithm 6 may be viewed as a particular case of the general framework under consideration, in which the Radon-Nikodym derivatives with respect to the initial proposal density and the proposal Markov kernels are given by the potential functions and, moreover, the adjustment multipliers are all equal to one. Thus, we may write, for all ℓ ∈ ⟦0, n⟧, σ̄²_{ℓ,n}(h̄_n) = σ²_{ℓ,n}(h_n), and (Ē^i_{m,n})_{i=1}^N =_D (E^i_{m,n})_{i=1}^N (where the latter are the Enoch indices of the particle system generated by Algorithm 2). Now, if we let, for any λ ∈ N, σ̄²_{n,λ}(h̄_n) be the lag-based variance estimator for the auxiliary bootstrap particle model, we have, again as a consequence of Lemma A.1 and recalling that h̄_n(ξ̄_n^i) = h_n(ξ̄_n^{i,2}), and by (2.6), that σ̄²_{n,λ}(h̄_n) =_D σ̂²_{n,λ}(h_n). Finally, note that the assumptions of the theorem imply that for all n ∈ N and x̄_n ∈ X̄_n, ḡ_n(x̄_n) > 0 and sup_{x̄_n ∈ X̄_n} ḡ_n(x̄_n) < ∞, which means that the assumptions of [20, Proposition 8] are satisfied as well. This implies that σ̄²_{n,λ}(h̄_n) converges in probability to the truncated asymptotic variance of Section 2.2; more
precisely, for every m ∈ N, we define an extended kernel L̄_m. Note that L̄_m depends only on x_{n_m} and is constant with respect to the previous states. Moreover, note that if selection is performed at each iteration, then n_m = m for all m, implying L̄_m = L_m. The initial measure χ̄ on 𝒳̄_0 is defined analogously. In order to apply an APF to this model we introduce auxiliary adjustment functions (ϑ̄_m)_{m∈N}. After resampling, the particles are propagated according to the Markov proposal kernels (P̄_m)_{m∈N}. Under the assumptions of the paper, each measure L̄_m(x̄_m, ·), x̄_m ∈ X̄_m, is absolutely continuous with respect to the probability measure P̄_m(x̄_m, ·). Hence, for every x̄_m, we may let γ̄_m(x̄_m, x̄_{m+1}) = dL̄_m(x̄_m, ·)/dP̄_m(x̄_m, ·)(x̄_{m+1}), x̄_{m+1} ∈ X̄_{m+1}, be the corresponding Radon-Nikodym derivative; moreover, γ̄_m admits an explicit product form over the underlying transitions. Finally, we define the proposal probability measure ν̄ = ν P_0 ··· P_{n_0−1}. The measure χ̄ is absolutely continuous with respect to ν̄, and we let γ̄^{−1} be the Radon-Nikodym derivative of χ̄ with respect to ν̄; it is easily shown that γ̄^{−1}(x̄_0) admits an explicit expression for ν̄-almost all x̄_0 ∈ X̄_0. Algorithm 7 shows one iteration of the APF for the extended model, which is initialised by drawing (ξ̄_0^i)_{i=1}^N from ν̄^{⊗N} and letting ω̄_0^i ← γ̄^{−1}(ξ̄_0^i) for all i ∈ ⟦1, N⟧. Proposition B.1 connects the output of this algorithm to that of Algorithm 4 (cf. [19, Proposition C.1], which states a similar result in the context of particle-based additive smoothing).
Proposition B.1. Let (ρ_n)_{n∈N} be a deterministic selection schedule and let (n_m)_{m∈N} be the induced selection times. Furthermore, let (ξ_{n_m}^i, ω_{n_m}^i)_{i=1}^N, m ∈ N, be the subsequence of weighted samples generated by Algorithm 4 for the original model at the selection times, and let (ξ̄_m^i, ω̄_m^i)_{i=1}^N, m ∈ N, be weighted samples generated by Algorithm 7 for the extended model. Then for every m ∈ N, (Π_{n_m}^m(ξ̄_m^i), ω̄_m^i)_{i=1}^N =_D (ξ_{n_m}^i, ω_{n_m}^i)_{i=1}^N.

Proof. We proceed by induction. Assume that we have generated a sample (ξ_{n_m}^i, ω_{n_m}^i)_{i=1}^N by means of n_m iterations of Algorithm 4, and that the claim holds true for this sample. We now examine the output of iteration n_{m+1}. Since we know that ρ_{n_m} = 1, the sample at time n_m + 1 is produced by selection and mutation; thereafter, selection is not performed until time n_{m+1} (since ρ_k = 0 for all k ∈ ⟦n_m + 1, n_{m+1} − 1⟧). Hence, each particle path ξ_{n_m+1:n_{m+1}}^i is generated according to the corresponding proposal kernel, and the claim follows for m + 1. The base case m = 0 is established similarly. This completes the proof.
Thus, in the case of a deterministic, but possibly irregular, resampling schedule, the APF may be reinterpreted as a particle model with systematic resampling operating on the auxiliary model described above. As the CLT in Proposition 2.1 is a general result, valid for arbitrary models and APFs (with systematic resampling), it holds also for the extended model and Algorithm 7, and the asymptotic normality of the output of Algorithm 7 follows immediately. This finding is summarised in Proposition B.2, which implies that we may obtain a CLT also in the case where Algorithm 4 is driven by a deterministic resampling schedule (ρ_n)_{n∈N}. To conclude this argument formally, consider the output of Algorithm 4 after an arbitrarily chosen number n of iterations; even though n is generally not a resampling time, we may, without loss of generality, assume that it is (since the distribution of the particle sample at a given time point does not depend on whether selection will be performed in a subsequent iteration of the algorithm). In particular, in the extended model we may let X̄_{r_n} = X_{n_{r_n−1}+1} × ··· × X_n; then by Proposition B.1, (Π_n^{r_n}(ξ̄_{r_n}^i), ω̄_{r_n}^i)_{i=1}^N =_D (ξ_n^i, ω_n^i)_{i=1}^N. For any function h_n ∈ F(X_n) we may define h̄_{r_n} : X̄_{r_n} ∋ x̄_{r_n} ↦ h_n(Π_n^{r_n}(x̄_{r_n})) = h_n(x_n). It is straightforward to check that for a so-defined extended function it holds that φ_n h_n = φ̄_{r_n} h̄_{r_n}; the desired CLT then follows under Assumption 3. The proof is completed by noting that the base case holds true, since λ_0 = 0 and σ̂²_{0,0}(h_0) →_P σ²_0(h_0) for all h_0 ∈ F(X_0).
We are now ready to prove Corollary 3.6.
Proof of Corollary 3.6. Following the lines of the proof of Lemma 3.5, we let again R_n = {0, 1}^n be the set of binary sequences of length n. For all ρ_{0:n−1} ∈ R_n, let σ̂²_{n,λ_n,ρ_{0:n−1}}(h_n) be independent variance estimators obtained on the basis of independent realisations of the algorithm.

Fig 1: The points in the plot correspond to fixed-lag estimates of σ²_n(id), n = 500, for the stochastic volatility model in Section 4.1, with different values of λ. The particle filter used N = 1000 particles. Each estimate is marked differently depending on whether the corresponding Enoch indices (E^i_{n_λ,n})_{i=1}^N are depleted or not. In addition, we indicate which of the two conditions in Definition 3.1 indicates non-depletion: for λ ≥ 25, condition (i) was fulfilled at least, while for λ ∈ ⟦14, 24⟧ condition (ii) (and not (i)) was fulfilled.

Fig 2: Comparison between variance estimators in the context of optimal filtering in the stochastic volatility model (4.1). The particle sample size was N = 1000 and the plot displays every 50th variance estimate. The brute-force estimates are based on 2000 replicates of the particle filter.

Fig 3: Comparison between variance estimators in the context of optimal filtering in the stochastic volatility model (4.1). The particle sample size was N = 100000 and the plot displays every 50th variance estimate. The brute-force estimates are based on 2000 replicates of the particle filter.

Fig 4: Similar plots as in Figures 2 and 3, but with focus on large n ∈ ⟦4900, 5000⟧. The left and right panels correspond to N = 1000 and N = 100000, respectively.

Fig 5: Distribution of fixed-lag and adaptive variance estimates at iteration n = 1000. Each box is based on 100 replicates of Algorithm 2 for a given value of λ, except the one marked ALVar, which corresponds to our adaptive method. The average adaptive lag was approximately 19.0. Each particle filter used N = 10000 particles. The circles correspond to the estimate produced by the brute-force algorithm, and the lines and stars in the boxes indicate the medians and means of each sample, respectively.

Fig 6: Evolution of the choice of lag for n ∈ ⟦0, 5000⟧ (left panel) and N = 1000 particles. The right panel shows the first 100 iterations. The average lag was approximately 14.0.

Fig 7: Each box shows the distribution of lags selected adaptively by the ALVar algorithm in a single run up to iteration n = 5000 for a given particle sample size N (excluding the first 100 lags in each run). Different boxes correspond to different sample sizes. The dots and lines in each box represent the means and the medians of the sample distributions, respectively. The dashed line is the least-squares fit between the average lag of each run and log10(N).

Fig 8: Each box represents the distribution, at a given iteration, of the adaptive lags selected at that time by the ALVar estimator for the M = 1000 different replicates. The crosses and the lines in each box are the corresponding means and medians, respectively. The stars are the lags minimising the empirical MSE evaluated at each iteration on the basis of the M = 1000 replicates.

Fig 9: Variance estimation in the APF with ESS-based selection schedule using N = 10000 particles and a resampling parameter α equal to 0.5 (top panel) and 0.2 (bottom panel), respectively. The plot displays every 50th estimate.

Fig 10: Empirical failure rates across iterations n ∈ ⟦0, 1000⟧. Each rate is obtained on the basis of 200 replicates of Algorithm 3 with N = 10000. The dashed line represents the perfect rate of 5%, while the dash-dotted line is the overall average failure rate.

and by (2.3) we may conclude that σ̄²_{ℓ,n}(h̄_n) = σ²_{ℓ,n}(h_n). (A.3) Finally, recall that for every m ∈ ⟦0, n⟧, the Enoch indices (Ē^i_{m,n})_{i=1}^N are, for all k ∈ ⟦0, n⟧ and i ∈ ⟦1, N⟧, recursively defined by Ē^i_{m,k} := i for m = k and Ē^i_{m,k} := Ē^{Ī^i_k}_{m,k−1} for m < k. Consequently, these indices are functions of the resampling indices, and by Lemma A.1 it holds that (Ē^i_{m,n})_{i=1}^N =_D (E^i_{m,n})_{i=1}^N.

Algorithm 7 (Bootstrap particle filter with systematic selection in the extended model). For the weighted samples (ξ̄_m^i, ω̄_m^i)_{i=1}^N, m ∈ N, generated by this algorithm, Proposition B.1 gives, for every m ∈ N, (Π_{n_m}^m(ξ̄_m^i), ω̄_m^i)_{i=1}^N =_D (ξ_{n_m}^i, ω_{n_m}^i)_{i=1}^N.