Abstract
We present a new approach—the ALVar estimator—to the estimation of asymptotic variance in sequential Monte Carlo methods, or particle filters. The method, which adaptively adjusts the lag of the estimator proposed in Olsson and Douc (Bernoulli 25(2):1504–1535), applies to very general distribution flows and particle filters, including auxiliary particle filters with adaptive resampling. The algorithm operates entirely online, in the sense that it is able to monitor the variance of the particle filter in real time with, on the average, constant computational complexity and memory requirements per iteration. Crucially, it does not require the calibration of any algorithmic parameter. Estimating the variance only on the basis of the genealogy of the propagated particle cloud, without additional simulations, the routine requires only minor code additions to the underlying particle algorithm. Finally, we prove that the ALVar estimator is consistent for the true asymptotic variance as the number of particles tends to infinity and illustrate numerically its superiority to existing approaches.
1 Introduction
In this paper we present an adaptive online algorithm for estimating the asymptotic variance in particle filters, or sequential Monte Carlo (SMC) methods. SMC methods approximate a given sequence of distributions by recursively propagating a sample of random simulations, so-called particles, with associated importance weights. Applications include finance, signal processing, robotics, and biology, among many others; see, e.g., Doucet et al. (2001) and Chopin and Papaspiliopoulos (2020). This methodology, first introduced by Gordon et al. (1993) in the form of the bootstrap particle filter, revolves around two operations: a selection step, which resamples the particles in proportion to their importance weights, and a mutation step, which randomly propagates the particles in the state space.
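To make the selection and mutation operations concrete, the following minimal Python sketch implements one bootstrap-filter iteration for a hypothetical AR(1) state process with Gaussian observations; the model, the parameter values, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bootstrap_pf_step(rng, particles, y, phi=0.9, sigma_x=1.0, sigma_y=0.5):
    """One bootstrap-filter iteration for a toy AR(1) state-space model:
    mutate the particles, weight them by the observation density, resample."""
    N = particles.shape[0]
    # Mutation: propagate each particle through the AR(1) state dynamics.
    particles = phi * particles + sigma_x * rng.standard_normal(N)
    # Weighting: Gaussian observation density g(x, y) = N(y; x, sigma_y^2).
    log_w = -0.5 * ((y - particles) / sigma_y) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Selection: multinomial resampling in proportion to the weights.
    idx = rng.choice(N, size=N, p=w)
    return particles[idx], idx
```

Resampling at every step yields an equally weighted output sample; the ancestor indices `idx` are returned because the genealogical information they carry is exactly what the variance estimators considered in this paper are built from.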
Since the introduction of the bootstrap particle filter, several theoretical results describing the convergence of SMC methods as the number of particles tends to infinity have been established; see, e.g., Cappé et al. (2005), Del Moral (2004), and Del Moral (2013). A contribution of vital importance was made by Del Moral and Guionnet (1999), who established, under general assumptions, a central limit theorem (CLT) for standard SMC methods, a result that was later refined by, among others, Chopin (2004), Künsch (2005), and Douc and Moulines (2008). CLTs are generally essential in Monte Carlo simulation, since they allow the accuracy of produced estimates to be assessed in terms of confidence bounds. However, in the case of particle filters, the asymptotic variance of the weak, Gaussian limit is generally intractable due to the recursive nature of these algorithms. Estimating the variance of SMC methods is thus a very challenging task, and although the literature on SMC is vast, only very few works are dedicated to this specific problem. Until just a couple of years ago, the only possible way to estimate the particle-filter variance was to take a naive—and computationally very demanding—approach consisting of calculating the sample variance across independent replicates of the particle filter; see Crisan et al. (2018) for a similar procedure in the context of parallelisation of SMC methods. An important step towards online variance estimation in particle filters was taken by Chan and Lai (2013), who developed an asymptotic-variance estimator which can be computed sequentially on the basis of a single realisation of the particle filter and without significant additional computational effort. In the same work, the estimator, which we will refer to as the Chan and Lai estimator (CLE), was shown to be consistent as the number of particles tends to infinity.
The CLE was later refined and analysed further by Lee and Whiteley (2018) and Du and Guyader (2021).
In a particle filter, the repeated resampling operations induce genealogical relations between the particles, allowing the estimator—the weighted empirical measure formed by the particles—to be split into terms corresponding to particle subpopulations obtained by stratifying the particle sample by the time-zero ancestors. At each iteration, the CLE is, simply put, given by the sample variance of these contributions with respect to the average of the full population. However, as time increases, the set of distinct time-zero ancestors depletes gradually, and eventually all the particles share one and the same time-zero ancestor. This particle-path degeneracy phenomenon makes the CLE collapse to zero in the long run. In order to remedy this issue and to push the technology towards truly online settings, Olsson and Douc (2019) devised a lag-based, numerically stable strategy in which the particle sample at time n is stratified by the ancestors at some more recent time \((n-\uplambda ) \vee 0\), where \(\uplambda \in {\mathbb {N}}\) is a fixed lag parameter. Such a procedure—which can still be implemented in an online fashion—completely avoids the issue of particle-path degeneracy at the cost of a bias induced by the lag. Still, under mild assumptions, satisfied also by models with a non-compact state space, the authors managed to bound this bias uniformly in time by a quantity that decays geometrically with \(\uplambda \). The simulation study presented in the same work confirms the long-term stability of the produced estimates, which stay, when the lag is well chosen, very close to the ones produced by the naive estimator for arbitrarily long periods of time.
However, choosing the lag parameter \(\uplambda \) is highly non-trivial, as the optimal choice depends on the ergodicity properties of the model; indeed, the user faces a delicate bias–variance tradeoff in the sense that too small a lag results in a numerically stable but significantly biased estimator, whereas too large a lag eliminates the bias at the cost of the high variance implied by the same degeneracy issue as that affecting the CLE.
In this paper we develop further the lag-based approach of Olsson and Douc (2019) and propose an estimator that is capable of adapting automatically, by monitoring the degree of depletion of the ancestor sets, the size of the lag as the particles evolve. Like the fixed-lag method of Olsson and Douc (2019), our adaptive-lag variance (ALVar) estimator operates online with time-constant memory requirements, but does not require the calibration of any algorithmic parameter. Moreover, estimating the variance only on the basis of the genealogy of the propagated particle cloud, without additional simulations, the routine requires only minor code additions to the underlying particle algorithm and has a linear computational complexity in the number of particles that is fully comparable to the particle filter itself. These appealing complexity properties are absolutely crucial in practical applications. As a comparison, the online approach to variance estimation in SMC methods recently proposed by Janati et al. (2022), relying on backward-sampling techniques, has, at best, a quadratic complexity in the number of particles, which is impractical for large particle sample sizes. In addition, just like the CLE, the estimator of Janati et al. (2022) also exhibits a decay towards zero for longer time horizons, even though this occurs at a lower rate than for the CLE. Unlike previous works on variance estimation in SMC, which focus on the standard bootstrap particle filter operating on Feynman–Kac models (Del Moral 2004), our estimator applies to more general auxiliary particle filters (APF, Pitt and Shephard 1999) and classes of models. In this setting, we show that the ALVar estimator is asymptotically consistent as the number of particles tends to infinity. 
Moreover, we claim and illustrate numerically that the values of the lag chosen adaptively by the algorithm stay stable over time and increase, on the average, only logarithmically with the number of particles; the latter property is fundamental to avoid an excessive demand of computational resources in applications. Furthermore, we extend our estimator to particle filters with adaptive resampling, in which the selection operation is performed only when triggered by some criterion monitoring the particle weight degeneracy, yielding the first SMC variance estimator in that context.
The rest of the paper is structured as follows: in Sect. 2 we introduce some notation, our general model framework, and SMC methods, and give some background to variance estimation in particle filters; in addition, we show that all the results obtained in the framework of Feynman–Kac models and the bootstrap particle filter can be extended to our framework and the APF. In Sect. 3 we present the ALVar estimator, prove its consistency, provide an extension to particle filters with adaptive resampling, and show how the estimator can be applied also in the context of online lag-based fixed-point marginal smoothing. Section 4 provides numerical simulations illustrating the algorithm on some classic state-space models. Finally, Appendices A and B provide some of the proofs of the results stated in Sects. 2 and 3.
2 Preliminaries
2.1 Notation
We denote by \({\mathbb {N}}\) the set of nonnegative integers and let \({\mathbb {N}}^*{:}{=}{\mathbb {N}}{\setminus }\{0\}\). For every \((m,n)\in {\mathbb {N}}^2\) such that \(m\le n\), we denote \(\llbracket m, n \rrbracket {:}{=}\{k\in {\mathbb {N}}:m\le k\le n\}\). Moreover, we let \({\mathbb {R}}_+\) and \({\mathbb {R}}_+^*\) be the sets of nonnegative and positive real numbers, respectively, and denote vectors by \(x_{m:n}{:}{=}(x_m,x_{m+1},\dots ,x_{n-1},x_n)\). For a finite set \((p_i)_{i=1}^N\), \(N \in {\mathbb {N}}^*\), of nonnegative numbers, we denote by \(\textsf{Cat}((p_i)_{i=1}^N)\) the categorical distribution with sample space \(\llbracket 1, N \rrbracket \) and probability function \(\llbracket 1, N \rrbracket \ni i \mapsto p_i/\sum _{\ell =1}^N p_\ell \). For some general state space \(({\textsf{E}}, \mathcal {E})\) we let \({\textsf{M}}(\mathcal {E})\), \({\textsf{M}}_1(\mathcal {E})\), and \({\textsf{F}}(\mathcal {E})\) be the sets of measures, probability measures, and bounded measurable functions on \(({\textsf{E}},\mathcal {E})\), respectively. For any \(\mu \in {\textsf{M}}(\mathcal {E})\) and \(h \in {\textsf{F}}(\mathcal {E})\) we denote by \(\mu h {:}{=}\int h(x) \, \mu (dx)\) the Lebesgue integral of h with respect to \(\mu \).
The following kernel notation will be frequently used. Let \(({\textsf{E}}', \mathcal {E}')\) be another measurable space; then a (possibly unnormalised) transition kernel \({\textbf {K}}:{\textsf{E}} \times \mathcal {E}'\rightarrow {\mathbb {R}}_+\) induces the following operations. For any \(h \in {\textsf{F}}(\mathcal {E}')\) and \(\mu \in {\textsf{M}}(\mathcal {E})\) we may define the measurable function
\[ {\textbf {K}}h: {\textsf{E}} \ni x \mapsto \int h(x') \, {\textbf {K}}(x, dx') \]
as well as the measures
\[ \mu {\textbf {K}}: \mathcal {E}' \ni A \mapsto \int {\textbf {K}}(x, A) \, \mu (dx), \qquad \mu \otimes {\textbf {K}}: \mathcal {E} \otimes \mathcal {E}' \ni A \mapsto \iint \mathbb {1}_A(x, x') \, {\textbf {K}}(x, dx') \, \mu (dx). \]
Now, let \(({\textsf{E}}'', \mathcal {E}'')\) be a third measurable state space and \({\textbf {L}}\) a possibly unnormalised transition kernel on \({\textsf{E}}' \times \mathcal {E}''\); then, similarly to the operations between measures and kernels, we may define the products
\[ {\textbf {K}}{\textbf {L}}: {\textsf{E}} \times \mathcal {E}'' \ni (x, A) \mapsto \int {\textbf {K}}(x, dx') \, {\textbf {L}}(x', A), \qquad {\textbf {K}} \otimes {\textbf {L}}: {\textsf{E}} \times (\mathcal {E}' \otimes \mathcal {E}'') \ni (x, A) \mapsto \iint \mathbb {1}_A(x', x'') \, {\textbf {L}}(x', dx'') \, {\textbf {K}}(x, dx'). \]
2.2 Model setup
In order to define the distribution-flow model under consideration, let \(({\textsf{X}}_n,\mathcal {X}_n)_{n\in {\mathbb {N}}}\) be a sequence of measurable state spaces. We introduce unnormalised transition kernels \(({\textbf {L}}_{n})_{n \in {\mathbb {N}}}\), \({\textbf {L}}_{n}: {\textsf{X}}_n \times \mathcal {X}_{n+1}\rightarrow {\mathbb {R}}_+\), where each \({\textbf {L}}_{n}\) is such that \(\sup _{x_n \in {\textsf{X}}_n} {\textbf {L}}_{n} \mathbb {1}_{{\textsf{X}}_{n+1}}(x_n) < \infty \). For compactness, we write \({\textbf {L}}_{k,m} {:}{=}{\textbf {L}}_{k}{\textbf {L}}_{k+1}\cdots {\textbf {L}}_{m}\) whenever \(k \le m\), otherwise \( {\textbf {L}}_{k,m}=\text {id}\) by convention. In addition, we let \(\chi \) be some measure on \(\mathcal {X}_0\). Using these quantities we may define a flow \(\phi _{n} \in {\textsf{M}}_1(\mathcal {X}_n)\), \(n \in {\mathbb {N}}\), of probability distributions by letting, for every \(n \in {\mathbb {N}}\),
\[ \phi _{n} h {:}{=}\frac{\chi {\textbf {L}}_{0,n-1} h}{\chi {\textbf {L}}_{0,n-1} \mathbb {1}_{{\textsf{X}}_n}}, \quad h \in {\textsf{F}}(\mathcal {X}_n). \tag{2.1} \]
Example 1
(Feynman–Kac models) Feynman–Kac models are applied in a variety of scientific fields such as statistics, physics, biology, and signal processing; see Del Moral (2004) for a broad coverage of the topic. For every \(n\in {\mathbb {N}}\), let \({\textbf {M}}_{n}:{\textsf{X}}_n \times \mathcal {X}_{n+1}\rightarrow [0,1]\) be a Markov transition kernel, \(g_{n}: {\textsf{X}}_n \rightarrow {\mathbb {R}}_+\) a measurable potential function, and \(\nu \) a probability measure on \(({\textsf{X}}_0,\mathcal {X}_0)\). Then the Feynman–Kac model \((\phi _{n})_{n \in {\mathbb {N}}}\) induced by \(\nu \) and \((({\textsf{X}}_n, \mathcal {X}_n), {\textbf {M}}_{n}, g_{n})_{n \in {\mathbb {N}}}\) is given by (2.1) with \(\chi {:}{=}\nu \) and, for every \(n \in {\mathbb {N}}\) and \(x_n \in {\textsf{X}}_n\), \({\textbf {L}}_{n}(x_n, dx_{n+1}) {:}{=}g_{n}(x_n) \, {\textbf {M}}_{n}(x_n, dx_{n+1})\).
Example 2
(Partially dominated state-space models) General state-space models (SSMs) constitute an important modeling tool in a diversity of scientific and engineering disciplines; see, e.g., Cappé et al. (2005) and the references therein. An SSM consists of a bivariate Markov chain \((X_n, Y_n)_{n \in {\mathbb {N}}}\) evolving on some measurable product space according to a Markov transition kernel constructed on the basis of two other Markov kernels \({\textbf {M}}:{\textsf{X}}\times \mathcal {X}\rightarrow [0,1]\) and \({\textbf {G}}:{\textsf{X}}\times \mathcal {Y}\rightarrow [0,1]\) as
\[ {\textsf{X}}\times {\textsf{Y}}\ni (x, y) \mapsto \iint \mathbb {1}_A(x', y') \, {\textbf {G}}(x', dy') \, {\textbf {M}}(x, dx'), \quad A \in \mathcal {X}\otimes \mathcal {Y}. \]
The chain is initialised according to \(X_0 \sim \nu \) and \(Y_0 \sim {\textbf {G}}(X_0, \cdot )\), where \(\nu \) is some probability measure on \(({\textsf{X}}, \mathcal {X})\). In this setting, only the process \((Y_n)_{n \in {\mathbb {N}}}\) is observed, whereas the process \((X_n)_{n \in {\mathbb {N}}}\)—referred to as the state process—is unobserved and hence referred to as hidden. In this construction, it can be shown (see Cappé et al. 2005, Section 2.2, for details), first, that the state process is itself a Markov chain with transition kernel \({\textbf {M}}\) and, second, that the observations \((Y_n)_{n \in {\mathbb {N}}}\) are conditionally independent given \((X_n)_{n \in {\mathbb {N}}}\), with marginal emission distributions \(Y_n \sim {\textbf {G}}(X_n, \cdot )\), \(n \in {\mathbb {N}}\). We assume that the model is partially dominated, i.e., that the kernel \({\textbf {G}}\) admits a transition density \(g:{\textsf{X}}\times {\textsf{Y}}\rightarrow {\mathbb {R}}_+\) with respect to some reference measure \(\mu \).
Many practical applications of SSMs call for computation of flows of hidden-state posteriors given a sequence \((y_n)_{n \in {\mathbb {N}}}\) of observations. In particular, the flow \((\phi _{n})_{n \in {\mathbb {N}}}\) of filter distributions, each filter \(\phi _{n}\) being the conditional distribution of the state \(X_n\) at time n given \(Y_{0:n} = y_{0:n}\), can be expressed as a Feynman–Kac model with \(({\textsf{X}}_n, \mathcal {X}_n)=({\textsf{X}},\mathcal {X})\), \({\textbf {M}}_{n}={\textbf {M}}\), and \(g_{n}(x){:}{=}g(x,y_n)\) for all \(n \in {\mathbb {N}}\); see Cappé et al. (2005, Section 3.1) for details. Inspired by this terminology, we will sometimes refer to each distribution \(\phi _{n}\) in the general flow defined by (2.1) as the filter at time n.
2.3 Sequential Monte Carlo methods
In the following we assume that all random variables are well defined on a common probability space \((\Omega ,\mathcal {F},{\mathbb {P}})\). As mentioned in the introduction, we may approximate recursively the distribution sequence \((\phi _{n})_{n\in {\mathbb {N}}}\) by propagating a random sample \((\xi _{n}^{i},\omega _{n}^{i})_{i=1}^N\) of particles and associated weights. Here \(N \in {\mathbb {N}}^*\) is the Monte Carlo sample size. More precisely, at each time step, the filter distribution \(\phi _{n}\) is approximated by the weighted empirical measure
\[ \phi _{n}^N {:}{=}\sum _{i=1}^N \frac{\omega _{n}^{i}}{\Omega _n} \delta _{\xi _{n}^{i}}, \]
where \(\Omega _n{:}{=}\sum _{i=1}^N\omega _{n}^{i}\) and \(\delta _{\xi _{n}^{i}}\) is the Dirac measure located at \(\xi _{n}^{i}\). The APF propagates the sample \((\xi _{n}^{i},\omega _{n}^{i})_{i=1}^N\) recursively as follows. The algorithm is initialised by standard importance sampling, drawing \(\xi _{0}^{i}\sim \nu \) independently for \(i \in \llbracket 1, N \rrbracket \), where \(\nu \in {\textsf{M}}_1(\mathcal {X}_0)\) is some proposal distribution dominating \(\chi \), and letting \(\omega _{0}^{i}\leftarrow \gamma _{-1}(\xi _{0}^{i})\) for each i, where \(\gamma _{-1}\) is the Radon–Nikodym derivative of \(\chi \) with respect to \(\nu \). The auxiliary functions \((\vartheta _{n})_{n\in {\mathbb {N}}}\), where \(\vartheta _{n}\in {\textsf{F}}(\mathcal {X}_n)\), are introduced in order to favor the resampling of particles that are more likely to be propagated into regions of high likelihood (as measured by the target distributions). The particles are propagated according to some proposal Markov transition kernels \({\textbf {P}}_{n}\), \(n\in {\mathbb {N}}\). These kernels are such that, for each \(n\in {\mathbb {N}}\) and \(x_n\in {\textsf{X}}_n\), the measure \({\textbf {L}}_{n}(x_n,\cdot )\) is absolutely continuous with respect to the probability measure \({\textbf {P}}_{n}(x_n,\cdot )\). Hence, given \(x_n\), there is a Radon–Nikodym derivative \(\gamma _{n}(x_n,\cdot )\) such that for every \(x_n\in {\textsf{X}}_n\) and \(h\in {\textsf{F}}(\mathcal {X}_{n+1})\),
\[ {\textbf {L}}_{n} h(x_n) = \int h(x_{n+1}) \, \gamma _{n}(x_n, x_{n+1}) \, {\textbf {P}}_{n}(x_n, dx_{n+1}). \]
Algorithm 1 shows one iteration of the APF. In the following we will express one iteration of the APF as \((\xi _{n+1}^{i},\omega _{n+1}^{i}, I_{n+1}^{i})_{i=1}^N\leftarrow {{\textsf{P}}}{{\textsf{F}}}((\xi _{n}^{i},\omega _{n}^{i})_{i=1}^N)\), where also the resampled indices are included in the output for reasons that will become clear later.
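One APF iteration can be sketched in Python as follows; the callables `theta`, `propose`, and `gamma` stand in for the auxiliary functions \(\vartheta _{n}\), the proposal kernels \({\textbf {P}}_{n}\), and the Radon–Nikodym derivatives \(\gamma _{n}\), and are placeholders of this illustration rather than part of the paper's pseudocode.

```python
import numpy as np

def apf_step(rng, xs, ws, theta, propose, gamma):
    """One auxiliary-particle-filter iteration (a sketch of Algorithm 1):
    selection driven by the adjusted weights, then mutation, then reweighting."""
    N = xs.shape[0]
    # Selection: draw ancestor indices from Cat(omega_n^i * theta_n(xi_n^i)).
    p = ws * theta(xs)
    idx = rng.choice(N, size=N, p=p / p.sum())
    parents = xs[idx]
    # Mutation: propagate the selected particles through the proposal kernel.
    children = propose(rng, parents)
    # Reweighting: omega_{n+1}^i = gamma_n(parent, child) / theta_n(parent).
    new_ws = gamma(parents, children) / theta(parents)
    return children, new_ws, idx
```

Note that the resampled indices `idx` are part of the output, mirroring the convention \((\xi _{n+1}^{i},\omega _{n+1}^{i}, I_{n+1}^{i})_{i=1}^N\leftarrow {{\textsf{P}}}{{\textsf{F}}}(\cdot )\) above.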
As mentioned in the introduction, the first proof of the CLT for SMC methods obtained by Del Moral and Guionnet (1999) has been refined and generalised in a number of papers. The following theorem provides a CLT for APFs in the general model context of Sect. 2.2, and follows immediately from the more general result of Mastrototaro et al. (2022, Theorem B.6).Footnote 2
Assumption 1
For every \(n \in {\mathbb {N}}\), \(\vartheta _{n} \in {\textsf{F}}(\mathcal {X}_n)\) and \(\gamma _{n}/\vartheta _{n} \in {\textsf{F}}(\mathcal {X}_n)\). In addition, \(\gamma _{-1} \in {\textsf{F}}(\mathcal {X}_0)\).
Theorem 2.1
Let Assumption 1 hold. Then for every \(n\in {\mathbb {N}}\) and \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), as \(N\rightarrow \infty \),
with Z being standard normally distributed and \( \sigma _{n}^2(h_{n}) {:}{=}\sigma _{0,n}^2(h_{n})\), where, for \(\ell \in \llbracket 0, n \rrbracket \),
The truncated asymptotic variance (\(\ell > 0\)) will be useful later on.
The present paper focuses on estimating online, as n increases and while the particle sample is propagated, the sequence of the asymptotic variances in (2.2). Before presenting our online variance estimator, the next section provides a brief overview of some current approaches.
2.4 Estimation of asymptotic variance
As touched upon in the introduction, a naive approach to variance estimation in particle filters is to use a brute-force strategy which runs a sufficiently large number \(K \in {\mathbb {N}}^*\) of independent particle filters. Then the asymptotic variance of interest can be estimated by multiplying the sample variance of these filter approximations by N. However, having \({\mathcal {O}}(KN)\) complexity, where N as well as K should be sufficiently large to provide precise filter and variance estimates, respectively, this approach is clearly computationally impractical. Moreover, implementing this procedure in an online fashion requires all the samples of each particle filter to be stored, implying also an \({\mathcal {O}}(KN)\) memory requirement.
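The brute-force estimator described above amounts to a few lines of code; in this sketch, `run_filter` is a placeholder for any routine returning a single particle-filter estimate \(\phi _{n}^N h_{n}\).

```python
import numpy as np

def naive_variance_estimate(run_filter, N, K, rng):
    """Brute-force variance estimation: N times the sample variance of K
    independent filter estimates, each based on N particles."""
    estimates = np.array([run_filter(N, rng) for _ in range(K)])
    return N * estimates.var(ddof=1)
```

For instance, with `run_filter` returning the mean of N i.i.d. standard Gaussians (a degenerate "filter" with known asymptotic variance 1), the estimate concentrates around 1 as K grows, at the \({\mathcal {O}}(KN)\) cost discussed above.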
Appealingly, the online approach devised by Chan and Lai (2013) estimates consistently the sequence of asymptotic variances based only on the cloud of evolved particles and without requiring the execution of multiple SMC algorithms in parallel or any additional simulations. This is possible by keeping track, as n increases, of the so-called Eve indices \((E_{n}^{i})_{i=1}^N\) (borrowing the terminology from Lee and Whiteley (2018)) identifying the particles at time zero from which the ones at time n originate, in the sense that \(E_{n}^{i}\) denotes the index of the time-zero ancestor of particle \( \xi _{n}^{i}\). These indices can be traced iteratively in the particle filter by initially letting, for all \(i\in \llbracket 1, N \rrbracket \), \(E_{0}^{i}\leftarrow i\) and then, as n increases, updating them according to \(E_{n+1}^{i}\leftarrow E_{n}^{I_{n+1}^{i}}\). Such updates are straightforwardly implemented by adding one line of code after the selection operation on Line 2 in Algorithm 1. Then the CLE of \( \sigma _{n}^2(h_{n})\) is, for any \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), given by
\[ {\hat{\sigma }}_{n}^2(h_{n}) {:}{=}N \sum _{i=1}^N \Biggl ( \sum _{j: E_{n}^{j} = i} \frac{\omega _{n}^{j}}{\Omega _n} \bigl ( h_{n}(\xi _{n}^{j}) - \phi _{n}^N h_{n} \bigr ) \Biggr )^2. \tag{2.4} \]
As a main result, Chan and Lai (2013) established the consistency of this estimator, in the sense that for every \(n\in {\mathbb {N}}\) and \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), \({\hat{\sigma }}_{n}^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n}^2(h_{n})\) as N tends to infinity. Although groundbreaking in theory, the estimator (2.4) suffers from a severe drawback in practice due to the particle-path degeneracy phenomenon. Indeed, because of the resampling operation, at each iteration of the filter some particles will inevitably be propagated from the same parent particle. Thus, eventually, when n is large enough, all particles will share the same time-zero ancestor, i.e., there will exist \(i_0\in \llbracket 1, N \rrbracket \) such that \(E_{n}^{i}=i_0\) for all \(i\in \llbracket 1, N \rrbracket \). Recently, Koskela et al. (2020) showed, under some standard mixing assumptions on the model, that the number of iterations needed to make the genealogical paths of the particles coalesce in this way is \({\mathcal {O}}(N)\). Hence, eventually the estimate (2.4) collapses to zero, which makes it unusable for large values of n. In practice, the estimator exhibits poor accuracy and high variability already when the Eve indices take on only a few distinct values, as the variance estimates will then be based on only a few distinct terms.
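The Eve-index bookkeeping and the resulting estimator can be sketched in a few lines of Python; the vectorised grouping via `np.bincount` is an implementation choice of this illustration, not of the original algorithm.

```python
import numpy as np

def update_eve(eve, idx):
    """Propagate the Eve indices through one selection step:
    E_{n+1}^i = E_n^{I_{n+1}^i}."""
    return eve[idx]

def cle(h_vals, w, eve):
    """A sketch of the CLE: N times the sum, over time-zero ancestors, of the
    squared centred and weighted contribution of each subpopulation."""
    N = h_vals.shape[0]
    wn = w / w.sum()
    centred = wn * (h_vals - np.sum(wn * h_vals))
    # Sum the centred terms within each ancestral subpopulation.
    per_ancestor = np.bincount(eve, weights=centred, minlength=N)
    return N * np.sum(per_ancestor ** 2)
```

When all Eve indices coincide, the single ancestral sum telescopes to zero and the estimate collapses, which is precisely the degeneracy phenomenon described above.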
In order to remedy this issue, Olsson and Douc (2019) suggest estimating the variance on the basis of the ancestors in some more recent generation rather than tracing the time-zero ancestors. For this purpose, they introduce the Enoch indices defined recursively, for all \(i\in \llbracket 1, N \rrbracket \) and \(m\in \llbracket 0, n+1 \rrbracket \), by
\[ E_{n+1,n+1}^{i} {:}{=}i, \qquad E_{m,n+1}^{i} {:}{=}E_{m,n}^{I_{n+1}^{i}} \quad \text {for } m \le n. \tag{2.5} \]
In words, \(E_{m,n}^{i}\) indicates the index of the ancestor at time \(m \le n\) of particle i at time n; moreover, notice that when \(m=0\), these indices correspond to the Eve indices. Then, letting \(n\langle \uplambda \rangle {:}{=}(n-\uplambda )\vee 0\) for some lag \(\uplambda \in {\mathbb {N}}\), the CLE (2.4) is replaced by the modified estimator
\[ {\hat{\sigma }}_{n,\uplambda }^2(h_{n}) {:}{=}N \sum _{i=1}^N \Biggl ( \sum _{j: E_{n\langle \uplambda \rangle ,n}^{j} = i} \frac{\omega _{n}^{j}}{\Omega _n} \bigl ( h_{n}(\xi _{n}^{j}) - \phi _{n}^N h_{n} \bigr ) \Biggr )^2. \tag{2.6} \]
Since, for a given lag \( \uplambda \), the number \(n\langle \uplambda \rangle \) of the generation to which the Enoch indices underpinning the estimator (2.6) refer varies with n, the algorithm requires the storage and iterative updating of a window \((E_{n\langle \uplambda \rangle ,n}^{i}, \dots , E_{n,n}^{i})_{i = 1}^N\) of Enoch indices. One iteration of the procedure is shown in Algorithm 2, which is initialised by generating the initial particle cloud as in Algorithm 1 and letting, in addition, \(E_{0,0}^{i}\leftarrow i\) for all \(i\in \llbracket 1, N \rrbracket \). We observe that the memory requirement and computational complexity of each iteration of the algorithm are both \({\mathcal {O}}(\uplambda N)\), independently of the time index n.
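The window update amounts to re-indexing every stored generation by the new ancestor indices, appending the identity generation, and discarding the oldest generation once the window exceeds \(\uplambda + 1\) entries; a Python sketch of this bookkeeping:

```python
import numpy as np

def update_enoch_window(window, idx, lag):
    """Shift the window (E_{n<lag>,n}, ..., E_{n,n}) of Enoch indices one
    step forward in time (a sketch of the bookkeeping in Algorithm 2)."""
    # Re-index each stored generation: E_{m,n+1}^i = E_{m,n}^{I_{n+1}^i}.
    new_window = [gen[idx] for gen in window]
    # Append the identity generation: E_{n+1,n+1}^i = i.
    new_window.append(np.arange(idx.shape[0]))
    # Keep at most lag + 1 generations, dropping the oldest.
    if len(new_window) > lag + 1:
        new_window.pop(0)
    return new_window
```

Each iteration touches \(\uplambda + 1\) arrays of length N, in line with the \({\mathcal {O}}(\uplambda N)\) complexity noted above.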
The estimator (2.6) is not consistent for the asymptotic variance \(\sigma _{n}^2(h_{n})\) as N tends to infinity; still, Olsson and Douc (2019, Proposition 8) showed that for all \(\uplambda \in {\mathbb {N}}\) and \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), \({\hat{\sigma }}_{n,\uplambda }^2(h_{n})\) converges to \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\) in probability as N tends to infinity, where \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\) is the truncated asymptotic variance given by (2.3), a quantity that is always smaller than the true asymptotic variance. Additional theoretical results (Olsson and Douc 2019, Section 4) establish that under mild, verifiable model assumptions, the asymptotic bias induced by the truncation decays geometrically fast with \(\uplambda \) (uniformly in n).
The results of Olsson and Douc (2019) were derived in the context of Feynman–Kac models and standard bootstrap particle filters, which is a more restrictive setting than the one considered here. Still, interestingly, it is possible to show that a general APF operating on a general distribution flow in the form (2.1) can actually be interpreted as a standard bootstrap filter operating on a certain auxiliary, extended Feynman–Kac model. Thus, using this trick, which is described in detail in Appendix A, we are able to extend the consistency results obtained by Olsson and Douc (2019) to the general setting of the present paper. This is the contents of Theorem 2.2, whose proof is found in Appendix A.
Assumption 2
For all \(n\in {\mathbb {N}}\) and \((x_n,x_{n+1})\in {\textsf{X}}_n\times {\textsf{X}}_{n+1}\),
and
Moreover, for all \(x_0\in {\textsf{X}}_0\),
Theorem 2.2
Let Assumptions 1 and 2 hold. Then for every \(n\in {\mathbb {N}}\), \(\uplambda \in {\mathbb {N}}\), and \(h_{n} \in {\textsf{F}}(\mathcal {X}_n)\), as \(N \rightarrow \infty \),
\[ {\hat{\sigma }}_{n,\uplambda }^2(h_{n}) \overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n}). \]
The main practical issue with the lag-based approach of Olsson and Douc (2019) is that the design of an optimal lag might be a difficult task. Using too large a lag implies, as for the CLE, depletion of the set of ancestors supporting the estimator, leading to high variance; on the other hand, using too small a lag decreases this variance, albeit at the cost of significant underestimation of the asymptotic variance of interest. The fact that the asymptotic bias decreases geometrically fast suggests that we should obtain a good approximation of the asymptotic variance even for moderate values of \(\uplambda \), but quantifying this optimal lag size may be a laborious task. In the numerical simulations of Olsson and Douc (2019), the algorithm is run multiple times for several distinct values of \(\uplambda \), whereupon the variance estimates obtained in this manner are compared to that obtained using the naive estimator in order to determine the empirically best lag. This method is not ideal as it requires extensive prefatory computations and does not take into account the possibility of varying the lag as the particles evolve. Instead, it is desirable to keep the lag as large as possible as long as the estimator is of good quality (in some sense) and to decrease it whenever some degeneracy, determined by the depletion of the Enoch indices, is detected. This argument will be developed further in the next section, leading to the design of a fully adaptive approach.
3 Main results
3.1 The ALVar estimator
We first need to identify a criterion to determine an optimal lag at a given iteration n. We have previously discussed the bias–variance tradeoff, which usually arises when the objective is to minimise the mean-squared error (MSE) of an estimator with respect to the estimand of interest. For every \(n \in {\mathbb {N}}\), \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), and \(\uplambda \in {\mathbb {N}}\), the MSE of the estimator (2.6) can be written as the sum of its variance and its squared bias according to
\[ {\mathbb {E}}\bigl [ \bigl ( {\hat{\sigma }}_{n,\uplambda }^2(h_{n}) - \sigma _{n}^2(h_{n}) \bigr )^2 \bigr ] = {\text {Var}}\bigl ( {\hat{\sigma }}_{n,\uplambda }^2(h_{n}) \bigr ) + \bigl ( {\mathbb {E}}\bigl [ {\hat{\sigma }}_{n,\uplambda }^2(h_{n}) \bigr ] - \sigma _{n}^2(h_{n}) \bigr )^2. \tag{3.1} \]
Our intention is to design a routine for adapting the lag \(\uplambda \) in such a way that (3.1) is minimised. Even if we do not have closed-form expressions of the expectation and the variance of the lag-based estimator in (3.1), we may make the following considerations.
-
Since \({\hat{\sigma }}_{n,\uplambda }^2(h_{n})\) tends to \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\) in probability as the number N of particles tends to infinity, where \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\le \sigma _{n}^2(h_{n})\) for all \(\uplambda \) and the difference decreases as \(\uplambda \) approaches n, we may assume that the non-asymptotic bias is also reduced when \(\uplambda \) increases.
-
On the other hand, the larger the value of \(\uplambda \) is, the fewer distinct elements the set \((E_{n\langle \uplambda \rangle ,n}^{i})_{i=1}^N\) of Enoch indices contains, causing an increase in the variance of the estimator (2.6); see Fig. 5.
The reduction of the number of distinct Enoch indices may be tolerated as long as an increase of the lag is beneficial for the reduction of the bias, but at some point the behavior becomes pathological. Imagine, for instance, that we use the CLE in the early iterations of the particle filter for estimating the variance; then, at some time n, we realise that there exists some \(\uplambda \in \llbracket 0, n-1 \rrbracket \) for which \({\hat{\sigma }}_{n,\uplambda }^2(h_{n}) > {\hat{\sigma }}_{n}^2(h_{n})\), although their asymptotic values are supposed to be in the opposite order and the lag-based estimator is expected to be less variable. This suggests that the Eve indices might be depleted and no longer reliable for supporting the variance estimator. It is then reasonable to assume that they will be unreliable also in subsequent steps, since their degeneracy can only get worse. Extending this idea to the Enoch indices, we may define recursively the concept of depleted Enoch indices.
Definition 3.1
Let \((h_{n})_{n\in {\mathbb {N}}}\) be a given sequence of functions such that \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\) for all n. The Enoch indices \((E_{m,n}^{i})_{i=1}^N\) are said to be depleted if at least one of the following conditions is satisfied:
-
(i)
the Enoch indices \((E_{m,n-1}^{i})_{i=1}^N\) are depleted;
-
(ii)
the Enoch indices \((E_{m-1,n}^{i})_{i=1}^N\) are depleted and, letting \(\uplambda :=n-m\), there exists \(\uplambda '\in \llbracket 0, \uplambda -1 \rrbracket \) such that \( {\hat{\sigma }}_{n,\uplambda }^2(h_{n})<{\hat{\sigma }}_{n,\uplambda '}^2(h_{n}) \).
By convention, for every \(n\in {\mathbb {N}}\), the Enoch indices \((E_{n,n}^{i})_{i=1}^N\) are never depleted, while \((E_{-1,n}^{i})_{i=1}^N\) are always depleted (even if these indices are never explicitly defined).
In order to check the depletion status of some indices \((E_{m,n}^{i})_{i=1}^N\) using Definition 3.1 we need to know the status of previous generations. Thus, in practice, depletion may be determined iteratively forwards in time, starting from \((E_{0,0}^{i})_{i=1}^N\), which are not depleted by definition. Then for every \(n\in {\mathbb {N}}^*\), knowing whether the indices \((E_{m,n-1}^{i})_{i=1}^N\) are depleted or not for all \(m\in \llbracket 0, n-1 \rrbracket \), it is possible to check the same for \((E_{m,n}^{i})_{i=1}^N\) starting from \(m=0\) and proceeding forwards to \(m = n\). This is done by checking first condition (i) in Definition 3.1; if this is not satisfied, then we check condition (ii). The idea behind condition (i) is that if a set \((E_{m,n-1}^{i})_{i=1}^N\) of Enoch indices is ill-suited to estimate the variance at some time \(n-1\), it will not be suited to estimate the variance at any future time, since the number of distinct elements in the set can only decrease with n. Regarding condition (ii), if instead the indices \((E_{m,n-1}^{i})_{i=1}^N\) are non-depleted, we still need to check if there is a more recent generation \((E_{m',n}^{i})_{i=1}^N\), \(m' \in \llbracket m + 1, n \rrbracket \), that produces a better estimate. The additional requirement of \((E_{m-1,n}^{i})_{i=1}^N\) being depleted serves to guarantee monotonicity, i.e., if \((E_{m,n}^{i})_{i=1}^N\) are depleted, then \((E_{m',n}^{i})_{i=1}^N\) should be as well for all \(m'\in \llbracket 0, m \rrbracket \), whereas if \((E_{m,n}^{i})_{i=1}^N\) are non-depleted, then neither should \((E_{m',n}^{i})_{i=1}^N\) be for any \(m'\in \llbracket m, n \rrbracket \) (Fig. 1).
Algorithm 3 describes our method, the adaptive-lag variance (ALVar) estimator, in which the optimal lag at each iteration is, as established by Theorem 3.2 below, the largest one for which the corresponding Enoch indices are not depleted. This non-depletion condition is ensured by recursively selecting \(\uplambda _{n+1}\) as the lag in \(\llbracket 0, \uplambda _n+1 \rrbracket \) that produces the largest estimate. The lag is initialised by setting \(\uplambda _0\leftarrow 0\).
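One iteration of this lag update can be sketched as below; this is a minimal illustration, in which `var_est` is a hypothetical callable returning \({\hat{\sigma }}_{n+1,\uplambda }^2(h_{n+1})\) for a candidate lag.

```python
def update_lag(prev_lag, n, var_est):
    """One ALVar lag update: among the candidate lags 0, ..., prev_lag + 1
    (never exceeding the current time n), return the one whose lag-based
    variance estimate is largest."""
    candidates = range(min(prev_lag + 1, n) + 1)
    return max(candidates, key=var_est)
```

For instance, with previous lag 1 and estimates 0.1, 0.3, 0.2 for lags 0, 1, 2, the update returns lag 1, the maximiser over the admissible candidates.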
Theorem 3.2
For every \(n \in {\mathbb {N}}\), let \(\uplambda _{n}\) be the lag produced by n iterations of Algorithm 3. Then if \(\uplambda _n<n\), none of the Enoch indices \((E_{m,n}^{i})_{i=1}^N\), \(m\in \llbracket n\langle \uplambda _n\rangle , n \rrbracket \), are depleted whereas all the Enoch indices \((E_{m,n}^{i})_{i=1}^N\), \(m\in \llbracket 0, n\langle \uplambda _n\rangle -1 \rrbracket \), are depleted.
Proof
We proceed by induction. The claim is true for \(n=0\) since we initialise \(\uplambda _0\leftarrow 0\). Now, let the claim be true for some \(n\in {\mathbb {N}}\); then by the induction hypothesis and condition (i) of Definition 3.1, it holds that \((E_{m,n+1}^{i})_{i=1}^N\) are depleted for every \(m \in \llbracket 0, n \langle \uplambda _n \rangle -1 \rrbracket \) if \(\uplambda _n<n\), where \(n \langle \uplambda _n \rangle - 1 = (n + 1) \langle \uplambda _n+1\rangle -1\). On the other hand, by the induction hypothesis and the very construction of \(\uplambda _{n + 1}\) in Algorithm 3, none of the depletion conditions of Definition 3.1 are satisfied for \(m \in \llbracket (n+1) \langle \uplambda _{n+1} \rangle , n+1 \rrbracket \); hence, the corresponding Enoch indices are not depleted. If \(\uplambda _{n+1} < \uplambda _n+1\), then, again by the construction of \(\uplambda _{n + 1}\), \((E_{m,n+1}^{i})_{i=1}^N\) are depleted for \(m\in \llbracket (n+1)\langle \uplambda _n+1\rangle , (n+1)\langle \uplambda _{n+1}\rangle -1 \rrbracket \) as well by condition (ii). This concludes the proof. \(\square \)
The computation of the estimator (2.6) has complexity \({\mathcal {O}}(N)\) and is performed \(\uplambda _n+2\) times at each iteration n. In order to have an online algorithm with constant memory requirements we need \(\uplambda _n\) to be uniformly bounded in n. Although in theory the lag might increase indefinitely such that \(\uplambda _n=n\) for all \(n\in {\mathbb {N}}\), we may assume that there exists an upper bound on the lag for any fixed number N of particles. In support of this assumption, we know that the expected number of generations to the time where all the Enoch indices are equal, which is certainly larger than any lag selected by the proposed method, is \({\mathcal {O}}(N)\) uniformly in n; see Koskela et al. (2020). Thus, in practice there will generally exist some \(\uplambda _\text {max}\), depending on the model and on N but independent of n, such that \(\uplambda _n<\uplambda _\text {max}\) for all \(n\in {\mathbb {N}}\). Hence, the final algorithm is online, since it has both complexity and memory demand (again dominated by the storage of the Enoch indices) of order \({\mathcal {O}}(\uplambda _\text {max}N)\), independently of n, and adaptive, since the choice of each new lag is adapted to the output of the particle filter as well as the lag of the previous iteration. In the next section we are going to prove consistency of the estimator and present a heuristic argument concerning the dependence of the lag on the number of particles.
3.2 Theoretical results
Next, we show that for every \(n \in {\mathbb {N}}\), the adaptive-lag estimator constructed in the previous section is consistent for the true asymptotic variance \(\sigma _{n}^2(h_{n})\); recall, however, that the algorithm is meant to operate in the regime where N is fixed and n is arbitrarily large. The ‘asymptotic’ algorithm is not online, since we are going to show that for all \(n\in {\mathbb {N}}\), \(\uplambda _n\) tends to n in probability as N grows, implying that in the limit we obtain the CLE at each step. Nevertheless, as we will see later, for a fixed number of particles, the range of the lags returned by the algorithm is expected to grow very slowly with N; more precisely, in Sect. 3.2.2 we argue that this range increases only logarithmically with N, a claim that is also confirmed by our numerical experiments in Sect. 4.1.1.
3.2.1 Consistency
We now establish the consistency of the ALVar estimator.
Theorem 3.3
Let Assumption 2 hold. For every \(n\in {\mathbb {N}}\) and \(h_{m}\in {\textsf{F}}(\mathcal {X}_m)\), \(m\in \llbracket 1, n \rrbracket \), let \((\uplambda _m)_{m=1}^n\) be the lags produced by n iterations of Algorithm 3. Then, as \(N\rightarrow \infty \), it holds that \(\uplambda _n\overset{{\mathbb {P}}}{\longrightarrow }n\) and \( {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n}^2(h_{n}). \)
Proof
We proceed by induction, assuming that the claim holds true for \(n-1\). For every \(\varepsilon >0\), it holds that
where \({\hat{\sigma }}_{n}^2(h_{n})\) is the CLE defined in (2.4), based on the same particle system. The second term on the right-hand side converges to zero as \(N \rightarrow \infty \), since \( {\hat{\sigma }}_{n}^2(h_{n})={\hat{\sigma }}_{n,n}^2(h_{n}) \) is consistent for \( \sigma _{n}^2(h_{n})\) by Theorem 2.2. To treat the first term, write
Since \({\hat{\sigma }}_{n}^2(h_{n})={\hat{\sigma }}_{n,n}^2(h_{n})\), it holds necessarily that \(\uplambda _n\ne n\) on the event \({\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})\ne {\hat{\sigma }}_{n}^2(h_{n})\); thus,
To treat the probability \({\mathbb {P}}(\uplambda _n=n)\) we may write
where the second term of (3.5) is zero since \(\uplambda _n\le \uplambda _{n-1}+1\) by construction. Now,
where, by the induction hypothesis, \({\mathbb {P}}(\uplambda _{n-1}={n-1})\rightarrow 1\) as \(N\rightarrow \infty \). Moreover, by Theorem 2.2, it holds that \({\hat{\sigma }}_{n,\uplambda }^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\) and \({\hat{\sigma }}_{n,n}^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n}^2(h_{n})\) as \(N\rightarrow \infty \), where \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\le \sigma _{n}^2(h_{n})\) for all \(\uplambda \in \llbracket 0, n - 1 \rrbracket \), implying that (3.7) converges to one as \(N\rightarrow \infty \). Hence, \(\uplambda _n\overset{{\mathbb {P}}}{\longrightarrow }n\) and combining this with (3.6), (3.4), (3.3), and (3.2) yields that \( {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n}^2(h_{n})\). Finally, the base case holds trivially true since \(\uplambda _0=0\) and \( {\hat{\sigma }}_{0,0}^2(h_{0})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{0}^2(h_{0}) \) for all \(h_{0}\in {\textsf{F}}(\mathcal {X}_0)\). \(\square \)
3.2.2 Heuristics on the dependence of the lag on the number of particles
In the light of Theorem 2.2, we expect \(\uplambda _n\) to increase with N. It is however crucial to understand how the values of the lags \((\uplambda _n)_{n\in {\mathbb {N}}}\) depend on N, since this will determine the performance and memory requirement of our algorithm. For instance, a linear dependence would imply a quadratic complexity, which is not desirable. In the rest of this section we provide a heuristic argument showing that if we minimise the MSE (3.1), then we may expect \(\uplambda _n\) to be \({\mathcal {O}}(\log N)\) for all \(n\in {\mathbb {N}}\).
If we approximate \({\mathbb {E}}[{\hat{\sigma }}_{n,\uplambda }^2(h_{n})]\) in (3.1) by the asymptotic limit \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\), then the second term on the right-hand side is approximately the square of the asymptotic bias, \( (\sigma _{n}^2(h_{n})-\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n}))^2\). Olsson and Douc (2019) show that under mild assumptions, the asymptotic bias is \({\mathcal {O}}(\rho ^\uplambda )\) for some mixing rate \(\rho \in (0,1)\). Regarding the variance of \({\hat{\sigma }}_{n,\uplambda }^2(h_{n})\), we know that it increases with the lag and hence as the number of distinct Enoch indices decreases. Since the variance of a Monte Carlo estimator is generally inversely proportional to the Monte Carlo sample size, we may expect \({\text {Var}}({\hat{\sigma }}_{n,\uplambda }^2(h_{n}))\) to be \({\mathcal {O}}(1/N_\uplambda )\), where \(N_\uplambda \) is the number of distinct Enoch indices \((E_{n\langle \uplambda \rangle ,n}^{i})_{i=1}^N\) at generation \(n\langle \uplambda \rangle \). Now, by adapting the proof of Corollary 2 in Koskela et al. (2020), we may argue that under standard mixing assumptions, which can be relaxed in practice, \( N_\uplambda \) is \({\mathcal {O}}(N/\uplambda )\). Finally, we determine the order of the optimal lag \(\uplambda ^*\) by letting it be the minimiser of the resulting crude approximation
of the MSE (3.1) as a function of \(\uplambda \), where \(c > 0\) and \(c' > 0\) are constants independent of \(\uplambda \) and N. It is then easily seen that \(\uplambda ^*\) is
Although this argument is heuristic, we will see later that it is well supported by our numerical simulations, in which the lags produced are very close to the ones minimising the MSE, with a logarithmic dependence on N.
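The minimisation behind this heuristic can be made explicit. Assuming the crude MSE approximation takes the form \(c\rho ^{2\uplambda }+c'\uplambda /N\) (squared asymptotic bias plus variance, with the constants above), setting its derivative in \(\uplambda \) to zero gives

```latex
\frac{\mathrm{d}}{\mathrm{d}\uplambda}
  \Bigl( c\rho^{2\uplambda} + c'\,\frac{\uplambda}{N} \Bigr)
  = 2c\rho^{2\uplambda}\ln\rho + \frac{c'}{N} = 0
  \quad \Longleftrightarrow \quad
  \rho^{2\uplambda^*} = \frac{c'}{2cN\ln(1/\rho)},
```

so that \(\uplambda ^* = \{\ln N + \ln (2c\ln (1/\rho )/c')\}/(2\ln (1/\rho ))\), which is indeed \({\mathcal {O}}(\log N)\).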
3.3 Extension to particle filters with adaptive resampling
We now consider the case in which selection is not necessarily performed at each iteration. Selection is essential in particle filters, as it copes with the well-known importance-weight degeneracy phenomenon (see, e.g., Cappé et al. 2005, Section 7.3); however, since resampling adds variance to the estimator, this operation should not be used unnecessarily. A common approach is hence to resample only when flagged by some weight-degeneracy criterion. One popular such criterion is the effective sample size (ESS, Liu 1996) defined by \(\textsf{ESS}_n^N {:}{=}1 / \sum _{i=1}^{N}(\omega _{n}^{i}/\Omega _{n})^2\), which approximates the number of active particles, i.e., particles with non-degenerate importance weights at time n. The ESS is minimal, equal to one, when all weights but one are zero, and maximal, equal to N, when all weights are non-zero and equal. Using the ESS, one may, e.g., let the resampling operation be triggered only when \(\textsf{ESS}_n^N\le \alpha N\), where \(\alpha \in (0,1)\) is a design parameter. More generally, we may let \((\rho _{n}^{N})_{n\in {\mathbb {N}}}\) be a sequence of binary-valued random variables indicating whether resampling should be triggered or not. The sequence \((\rho _{n}^{N})_{n \in {\mathbb {N}}}\) is assumed to be adapted to the filtration \(({\mathcal {F}}_{n}^{N})_{n \in {\mathbb {N}}}\) generated by the particle filter, where \({\mathcal {F}}_{n}^{N} {:}{=}\sigma ((\xi _{0}^{i})_{i=1}^N, (\xi _{m}^{i}, I_{m}^{i})_{i=1}^N, m \in \llbracket 1, n \rrbracket )\). Thus, these indicators may be based on the ESS, letting \(\rho _{n}^{N}=\mathbb {1}_{\{ \textsf{ESS}_n^N\le \alpha N\}}\), but also on n only, implying a deterministic selection schedule.
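As a minimal sketch (illustrative, not taken from the paper's implementation), the ESS and the induced resampling indicator can be computed as follows:

```python
import numpy as np

def ess(weights):
    """Effective sample size 1 / sum_i (w_i / Omega)^2 for a vector of
    unnormalised, non-negative importance weights."""
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()          # self-normalised weights w_i / Omega
    return 1.0 / np.sum(p ** 2)

def resample_indicator(weights, alpha):
    """ESS-based trigger rho_n = 1{ESS_n <= alpha * N}."""
    return ess(weights) <= alpha * len(weights)
```

With uniform weights the ESS equals N and resampling is skipped; with a single non-zero weight it equals one and resampling is triggered.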
Algorithm 4 shows one iteration of this adaptive procedure, which we later express in the compact form \((\xi _{n+1}^{i},\omega _{n+1}^{i},I_{n+1}^{i})_{i=1}^N\leftarrow \textsf{AdaPF}((\xi _{n}^{i},\omega _{n}^{i})_{i=1}^N,\rho _{n}^{N})\).
As described in the following, particle filters with adaptive resampling still satisfy a CLT, with asymptotic variance having a structure similar to that of (2.3) but depending also on an “asymptotic” resampling schedule to be defined next. We proceed in the same way as Mastrototaro et al. (2022, Section 3.2), following in turn Del Moral et al. (2012, Section 5.2), and consider, rather than a single deterministic parameter \(\alpha \), a sequence \( (\alpha _n)_{n\in {\mathbb {N}}} \) of parameters being realisations of random variables with state space (0, 1). This assumption, which can be relaxed in practice, is needed in order to deal with some technicalities in the proofs.
Assumption 3
The resampling schedule \((\rho _{n}^{N})_{n\in {\mathbb {N}}}\) is governed by the ESS, i.e., for every \(n\in {\mathbb {N}}\),
where the parameters \( (\alpha _n)_{n\in {\mathbb {N}}} \) are realisations of absolutely continuous independent random variables \((\upalpha _n)_{n\in {\mathbb {N}}}\) taking on values in (0, 1).
The following lemma is adopted from Mastrototaro et al. (2022, Lemma 3.5, with \(d=\infty \)).
Lemma 3.4
Let Assumption 3 hold in Algorithm 4. Then for every \(n\in {\mathbb {N}}\) and almost all \(\alpha _{0:n}\in (0,1)^{n+1}\) there exists \(\rho _{n}^{\alpha }\in \{0,1\}\) such that, as \(N\rightarrow \infty \),
We now have the following CLT for adaptive APFs, whose proof is found in Appendix B.
Theorem 3.5
Let Assumption 1 hold and let \((\xi _{n}^{i},\omega _{n}^{i})_{i=1}^N\) be generated by n iterations of Algorithm 4 according to a selection schedule \( (\rho _{n}^{N})_{n\in {\mathbb {N}}} \) satisfying Assumption 3. Then for every \(n\in {\mathbb {N}}\), \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), and almost all \(\alpha _{0:n-1}\in (0,1)^{n}\), as \(N\rightarrow \infty \),
where Z is a standard normally distributed random variable and the asymptotic variance \(\sigma _{n}^2\langle \rho _{0:n-1}^{\alpha }\rangle (h_{n})\), depending on \(\alpha _{0:n-1}\), is given in detail in Appendix B.
When designing a lag-based estimator of the asymptotic variance provided by Theorem 3.5, it turns out to be more convenient to define the lag in terms of the number of resampling operations rather than the number of iterations of the particle filter. For this purpose, let \(r_{n} {:}{=}\sum _{m=0}^{n-1}\rho _{m}^{N}\) be the counter of the number of times selection is performed before time n (with the convention \(r_{0}=0\)). Then the Enoch indices at each time n will be indexed by the resampling times \(r_{n}\) rather than n, since every iteration without resampling leaves these unaltered. More specifically, in the following, a generic Enoch index \(E_{m,r_{n}}^{i}\) will indicate the ancestor of the particle \(\xi _{n}^{i}\) at any time \(n'\in \llbracket 0, n \rrbracket \) such that \(r_{n'}=m \in \llbracket 0, r_{n} \rrbracket \). Then for all \(i\in \llbracket 1, N \rrbracket \) and \(m\in \llbracket 0, r_{n+1} \rrbracket \), the update (2.5) can be rewritten as
Notice that when we do not have resampling at time n, it holds that \(r_{n+1}=r_{n}\) and \(I_{n+1}^{i}=i\), implying \( E_{m,r_{n}}^{i} = E_{m,r_{n+1}}^{i} \) for all \(m\in \llbracket 0, r_{n+1} \rrbracket \) and \(i\in \llbracket 1, N \rrbracket \). In practice, for a given \(n\in {\mathbb {N}}\), the lag takes on values in \(\llbracket 0, r_{n} \rrbracket \) instead of \(\llbracket 0, n \rrbracket \) and, as before, the expression \(r_{n}\langle \uplambda \rangle \), \(\uplambda \in {\mathbb {N}}\), indicates the quantity \((r_{n}-\uplambda )\vee 0\). In this setting, the estimator (2.6) is rewritten as
Algorithm 5 shows one update of the adaptive-resampling APF along with the calculation of the corresponding ALVar estimate. Corollary 3.6, whose proof is found in Appendix B, provides the consistency of the variance estimator produced by the algorithm.
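The resampling-time-indexed Enoch bookkeeping described above may be sketched as follows; the list-of-lists layout is an assumption made for illustration, not the paper's data structure.

```python
def update_enoch(enoch, ancestors, resampled):
    """Update the stored Enoch-index generations at one iteration.

    enoch[m][i] holds E_{m, r_n}^i; ancestors[i] holds I_{n+1}^i.
    Without resampling, r_{n+1} = r_n and I_{n+1}^i = i, so the indices
    are returned unchanged; with resampling, every stored generation is
    traced through the ancestor indices and a fresh self-referential
    generation r_{n+1} is appended."""
    if not resampled:
        return enoch
    traced = [[gen[a] for a in ancestors] for gen in enoch]
    traced.append(list(range(len(ancestors))))
    return traced
```

In practice one would also discard generations older than the current lag, so that only \({\mathcal {O}}(\uplambda _\text {max} N)\) indices are kept in memory.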
Corollary 3.6
Let Assumptions 2 and 3 hold. Moreover, for every \(n\in {\mathbb {N}}\) and \(h_{m}\in {\textsf{F}}(\mathcal {X}_m)\), \(m\in \llbracket 1, n \rrbracket \), let \((\uplambda _m)_{m=1}^n\) be the lags produced by n iterations of Algorithm 5. Then, letting \( {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n}) \) be computed according to (3.8), it holds for almost all \(\alpha _{0:n-1}\in (0,1)^{n}\), as \(N\rightarrow \infty \),
3.4 Application of ALVar to lag-based fixed-point particle smoothing
As we will see next, the generality of the model framework in Sect. 2.2 allows the ALVar estimator to be used to assess the accuracy of certain online particle smoothing estimators. More precisely, for fixed \(m \in {\mathbb {N}}\) and \(h_{m} \in {\textsf{F}}(\mathcal {X}_m)\), consider online computation of the expectations \(\phi _{m\mid n}h_{m}\) as \(n \in \{m, m + 1, \ldots \}\) progresses, where for each \(n \ge m\),
This problem is referred to as fixed-point smoothing. In the context of SSMs (Example 2), \(\phi _{m \mid n}\) is the conditional distribution of the hidden state at time m given all the observations up to time \(n \ge m\), and the fixed-point smoothing problem consists of updating this distribution online as new observations become available. It is well known that Algorithm 1 provides, as a by-product, particle approximations also of the distributions (3.9), in the sense that \(\phi _{m\mid n}^{N} h_{m}\), where
forms a consistent estimator of \(\phi _{m \mid n} h_{m}\) for each n. In the following we show how the variance of \(\phi _{m \mid n}^{N} h_{m}\) can be estimated using the ALVar estimator. For this purpose, we will exploit the generality of the model framework in Sect. 2.2 and introduce an auxiliary path-space model in which \(\phi _{m \mid n}\) can be interpreted as a filter distribution. More precisely, for every \(n\in {\mathbb {N}}\), define the path space \({\textsf{X}}_n^\textsf{path}{:}{=}{\textsf{X}}_0 \times \cdots \times {\textsf{X}}_n\) with corresponding \(\sigma \)-field and the unnormalised transition kernel
Defining also, for \(n \ge m\), the path-wise objective functions \(h_{n}^\textsf{path}(x_{0:n}) {:}{=}h_{m}(x_m)\), \(x_{0:n} \in {\textsf{X}}_n^\textsf{path}\), allows us to write \(\phi _{m \mid n} h_{m} = \phi _{n}^\textsf{path}h_{n}^\textsf{path}\), where \(\phi _{n}^\textsf{path}\) is induced by (2.1) for the kernels \(({\textbf {L}}_{n}^\textsf{path})_{n \in {\mathbb {N}}}\) and the same initial distribution \(\chi \) as in the original model. In other words, by extending the original model to an auxiliary path-space model, we have been able to express the quantity of interest as a filter expectation, which can be targeted using Algorithm 1 (operating on the extended model). Thus, by defining also proposal transition kernels
of similar form it holds that \({\textbf {L}}_{n}^\textsf{path}(x_{0:n}, \cdot )\) is absolutely continuous with respect to \({\textbf {P}}_{n}^\textsf{path}(x_{0:n}, \cdot )\) for all \(x_{0:n} \in {\textsf{X}}_n^\textsf{path}\), with Radon–Nikodym derivative given by \(\gamma _{n}^\textsf{path}(x_{0:n + 1}) {:}{=}\gamma _{n} (x_n, x_{n + 1})\), \(x_{0:n + 1} \in {\textsf{X}}_{n + 1}^\textsf{path}\). Finally, we define the auxiliary adjustment-weight multipliers \(\vartheta _{n}^\textsf{path}(x_{0:n}) {:}{=}\vartheta _{n}(x_n)\), \(x_{0:n} \in {\textsf{X}}_n^\textsf{path}\). Using this interpretation, we may now address the fixed-point smoothing problem by estimating \(\phi _{n}^\textsf{path}h_{n}^\textsf{path}= \phi _{m \mid n} h_{m}\) sequentially for \(n \in \{m, m + 1, \ldots \}\) using Algorithm 1 and monitor online the variance using the ALVar.
However, this approach is not without problems; indeed, due to the particle-path degeneracy phenomenon, this estimator suffers from high variance for large n, since the set \((E_{m,n}^{i})_{i = 1}^N\) of Enoch indices deteriorates eventually as n increases. Following Kitagawa and Sato (2001) and Olsson et al. (2008), this issue can be addressed by introducing a lag parameter \(\Delta \in {\mathbb {N}}^*\) and approximating \(\phi _{m \mid n} h_{m}\) by \(\phi _{m \mid m_{\Delta }(n)}^{N} h_{m}\), where \(m_{\Delta }(n) {:}{=}n \wedge (m+\Delta )\), leading to a somewhat biased but variance-reduced estimator. This idea can be most easily understood in the SSM context, where the argument is that future observations at long temporal distances from the state of interest do not affect the posterior distribution of the same, and that these can therefore be omitted from the particle estimator in order to avoid the genealogical-tree degeneracy. As the bias depends on the ergodic properties of the model (see Olsson et al. 2008, for an analysis), designing a good lag is, however, generally non-trivial (an adaptive approach was developed by Alenlöv and Olsson 2019). Nevertheless, in the following we assume that we are given some suitable lag \(\Delta \) and want to estimate the asymptotic variance of the lag-based approximation \(\phi _{m \mid m_{\Delta }(n)}^{N} h_{m}\). Moreover, since the updating of the particle approximation \(\phi _{m \mid m_{\Delta }(n)}^{N} h_{m}\) of \(\phi _{m \mid n} h_{m}\) ceases when \(n \ge m + \Delta \), we may, by keeping track of the particle history across a sliding window of fixed length \(\Delta + 1\), use this technique to address simultaneously the fixed-point smoothing problem for a range of fixed time points and objective functions, i.e., to approximate online, with time-homogeneous computational load and memory requirements, the elements of the vector
as n increases, where \(h_{m} \in {\textsf{F}}(\mathcal {X}_m)\) for each m (see Alenlöv and Olsson 2019, for a treatment of this problem). More precisely, we proceed by computing, for every iteration \(n \in \{\Delta , \Delta + 1, \ldots \}\), the marginal smoothing estimate \(\phi _{n-\Delta \mid n}^{N} h_{n-\Delta }\) and furnish the same with the lag-based variance estimate
obtained by applying (2.6) in the context of the auxiliary path-space particle model defined above. Here the lag \(\uplambda = \uplambda _n\) is designed adaptively with n using the ALVar in accordance with Algorithm 3, i.e., by letting, recursively, \(\uplambda _n\) be the \(\uplambda \in \llbracket 0, \uplambda _{n - 1}+1 \rrbracket \) maximising \({\hat{\sigma }}_{n-\Delta \mid n,\uplambda }^2(h_{n-\Delta })\). Note that since no estimates are computed in the first \(\Delta -1\) iterations, we let by convention \(\uplambda _n=n\) for all \(n\in \llbracket 0, \Delta -1 \rrbracket \). Interestingly, as established by the following result (which is proven in Appendix B), the lag selected adaptively by the ALVar in this manner always exceeds the smoothing lag \(\Delta \).
Proposition 3.7
Let \((\uplambda _n)_{n \in {\mathbb {N}}}\) be a sequence of lags produced by applying ALVar to the asymptotic-variance estimator (3.10). Then \(\uplambda _n \ge \Delta \) for all \(n\ge \Delta \).
In Sect. 4.1.4 we illustrate numerically that the ALVar may exhibit an excellent performance also in the context of lag-based fixed-point smoothing.
4 Numerical illustrations
In this section we apply, as an illustration, our approach to optimal filtering in SSMs (Example 2). In order to benchmark carefully our variance estimator against the fixed-lag estimator of Olsson and Douc (2019), we tested the ALVar on the same SSMs as in the latter work, namely
-
the stochastic volatility model introduced by Hull and White (1987) and
-
a linear Gaussian state space model for which exact computation of the filter is possible using the Kalman filter.
4.1 Stochastic volatility model
Our first SSM is governed by the equations
where \((U_n)_{n\in {\mathbb {N}}^*}\) and \((V_n)_{n\in {\mathbb {N}}}\) are sequences of uncorrelated standard Gaussian noise variables. The parameters are assumed to be known, with \((a,b,\sigma )=(0.975, 0.641, 0.165)\). We only observe the process \((Y_n)_{n\in {\mathbb {N}}}\), representing stock log-returns, while \((X_n)_{n\in {\mathbb {N}}}\), representing the log-volatility, is a hidden state process which we want to infer. The state \(X_0\) is initialised according to a zero-mean Gaussian distribution with variance \( \sigma ^2/(1-a^2)\), i.e., the stationary distribution of the state process. Thus, we deal with a fully dominated nonlinear SSM with \({\textsf{X}}={\textsf{Y}}={\mathbb {R}}\), \(\mathcal {X}=\mathcal {Y}={\mathcal {B}}({\mathbb {R}})\), the Borel \(\sigma \)-field on \({\mathbb {R}}\), in which both \({\textbf {M}}\) and \({\textbf {G}}\) are Gaussian kernels.
A record \(y_{0:5000}\) of observations was obtained by simulating the process \((X_n, Y_n)_{n\in {\mathbb {N}}}\) under the dynamics (4.1) for the given parameterisation. For all \(n\in {\mathbb {N}}\), we let \(h_{n}\) be the identity function. In order to have a reliable benchmark for the variance we first implemented the naive, brute-force estimation technique described in Sect. 2.4, producing 2000 replicates of the particle filter with \(N=5000\). Then we computed the sample variance of these filter estimates at each iteration and multiplied the same by N.
Algorithm 3, with an underlying bootstrap particle filter (\({\textbf {P}}_{n} \equiv {\textbf {M}}\) and \(\vartheta _{n} \equiv 1\)), was implemented with the two different sample sizes \(N=1000\) and \(N=100{,}000\) in order to assess stability as well as convergence. The output is displayed in Figs. 2 and 3, where the ALVar estimator is compared to the brute-force benchmark, the CLE, and the fixed-lag approach of Olsson and Douc (2019) with \(\uplambda \in \{14, 24\}\). In both cases, our estimator produces more precise and stable estimates of the asymptotic variance. Moreover, increasing the number of particles leads to significantly better accuracy, demonstrating the convergence properties of our method. These patterns can also be noticed in Fig. 4, where we focus on large values of n. As expected, we see that when n is large the CLE either drops to zero or suffers from large variance due to the depletion of the Eve indices. The fixed-lag approach behaves similarly to our adaptive approach, both being close to the benchmark brute-force values. The fundamental difference is that in the adaptive method the lag is designed adaptively and dynamically, whereas for the fixed-lag method the lag is set to a constant value close to the average lag produced by the ALVar estimator. We stress again that without access to the ALVar procedure, the design of a suitable fixed lag \(\uplambda \) would require an exhaustive prefatory simulation-based analysis, where \(\uplambda \) is selected by producing multiple fixed-lag variance estimates for a range of different lags and repeated runs of the particle filter and comparing the same to an estimate obtained using the brute-force estimator.
The previous plots are based upon single runs of the algorithms producing variance estimates for different \(n\in \llbracket 0, 5000 \rrbracket \); we now focus instead on how several estimates are distributed for some specific times n. In the boxplots displayed in Fig. 5, each box represents the distribution of variance estimates at time \(n = 1000\) using the ALVar algorithm, the CLE, and the fixed-lag approach with several choices of \(\uplambda \), obtained on the basis of 100 replicates of Algorithm 2 for each of these lags. For the box dedicated to the ALVar we have indicated the average \(\uplambda _{1000}\) across the 100 independent particle filter replicates (not to be confused with the average lag across all iterations of a single realisation of the particle filter). We observe that our estimator manifests negligible bias, with variability similar to the one of the best fixed-lag estimators.
In addition, we include a comparison of the ALVar estimator to the one proposed by Janati et al. (2022), in which genealogical tracing is replaced by an online update based on particle approximation of the so-called backward kernels. Still, this gain in variance and time stability comes at a price; indeed, the most efficient version of the algorithm, where a set of auxiliary statistics are propagated through backward sampling in accordance with the so-called particle-based, rapid incremental smoother (PaRIS, Olsson and Westerborn 2017), has an \({\mathcal {O}}(N^2)\) computational cost per iteration (instead of the \({\mathcal {O}}(N^3)\) cost of the Rao-Blackwellised version). Figure 6 displays boxplots of variance estimates produced using the ALVar estimator, the backward-sampling method of Janati et al. (2022) and the CLE for different times n. We observe that in the early stages the backward-sampling method is more precise than the ALVar and the CLE in terms of sample variance. However, as the number of iterations increases, it begins to exhibit some increasing empirical bias and decays eventually towards zero like the CLE, but more slowly and with less variability. On the other hand, our approach has the fundamental advantages of being (i) stable throughout all iterations and (ii) significantly faster in terms of computational time. Indeed, already with \(N=1000\), the ALVar estimator was, in our implementation, about two orders of magnitude faster. We do not exclude that the execution time of the backward-sampling method might be optimised further to reduce this gap; still, suffering from an \({\mathcal {O}}(N^2)\) computational complexity, the backward-sampling estimator will always be considerably slower than the ALVar and scale unfavorably with the number of particles N.
4.1.1 Adaptive-lag analysis
In this part we investigate how the values of the lags chosen adaptively at each iteration of Algorithm 3 are distributed and depend on the number of particles N. Figure 7 displays the evolution of the chosen lags over time for \(N=1000\) particles. We see that after an initial constant increase of the lag, the values stabilise in a range between 5 and 30. An interesting pattern is the presence of regimes with constant increase of the lag, during which the same generation of Enoch indices is used, and sudden drops, when the so-far-used generation becomes depleted and a more recent one comes into substitution.
Although not shown here, a similar pattern can also be seen when N is increased from 1000 to 100,000, in which case the range of selected lags is between 10 and 50, with an average around 24. This is not a surprise since, as shown in Theorem 3.3, the adaptive lag converges to the maximum possible value \(\uplambda _{n}=n\) as \(N\rightarrow \infty \); thus, we expect the lags to increase with the sample size. The good news is that the complexity does not explode as N increases; indeed, the ALVar algorithm is between 1.5 and 2 times slower than a standard particle filter for \(N=1000\) particles and between 2 and 2.5 times slower for \(N=100{,}000\) particles. Moreover, our novel method always took significantly less than twice the time of a fixed-lag algorithm with \(\uplambda \) selected around the average value of the adaptive approach (1.4 and 1.7 times slower for \(N=1000\) and \(N=100{,}000\) particles, respectively). The computational time of the ALVar procedure is closely related to the values of the adaptively selected lags: the larger the lags, the longer it takes to update the Enoch indices and to perform the update on Line 8 in Algorithm 3. In Fig. 8 we illustrate how the lags are distributed for different particle sample sizes; as predicted by our heuristic argument in Sect. 3.2.2, the dependence of the average lag on N appears to be (close to perfectly) logarithmic. The maxima behave similarly.
4.1.2 ALVar versus empirical MSE
At the beginning of Sect. 3 we claimed that an optimal choice of the lag could be the one minimising the MSE. We now want to check that the lags selected by the ALVar algorithm are sufficiently close to the ones minimising (3.1) for most iterations. As we mentioned, (3.1) is hard to evaluate analytically but can be estimated by means of the empirical MSE obtained by running \(M\in {\mathbb {N}}^*\) independent particle filters. More precisely, for every \(n\in {\mathbb {N}}\), \(\uplambda \in {\mathbb {N}}\), and \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), we define the empirical MSE
at time n, where \({\hat{\sigma }}^{2, j}_{n, \uplambda }(h_{n})\) is the estimate produced by the j-th particle filter and \(\sigma _{n}^2(h_{n})\) can be approximated by a brute-force estimate. Then, for every \(n\in {\mathbb {N}}\) and \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\) we determine the optimal lag by selecting
In order to compare the adaptive lags formed by the ALVar estimator to the empirical MSE-optimal lags (4.2), we ran \(M=1000\) particle filters, each with \(N=10{,}000\) particles, for \(n=500\) iterations; letting \(h_{n}=\text {id}\) for all n, we determined, for each replicate, the adaptive lags selected by the ALVar procedure as well as the ones minimising the empirical MSE. Figure 9 reports adaptive-lag distributions at some iterations, together with the lag values \({\hat{\uplambda }}_n^*\) minimising the empirical MSE; remarkably, we observe that the empirically optimal lags are within the range of lags selected by the ALVar algorithm, although the latter tends to choose slightly larger values on average.
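In outline, the empirical-MSE lag selection of this comparison can be sketched as follows (synthetic variance estimates stand in for actual particle-filter output; the bias and spread profiles are illustrative assumptions only):

```python
import numpy as np

def optimal_lag(var_estimates, sigma2_ref):
    """var_estimates: shape (M, L), the lag-l variance estimate from each of
    M independent particle filters; sigma2_ref: brute-force reference value.
    Returns the lag minimising the empirical MSE, as in (4.2)."""
    mse = np.mean((var_estimates - sigma2_ref) ** 2, axis=0)
    return int(np.argmin(mse))

# toy illustration: synthetic estimates with a bias that shrinks and a
# spread that grows as the lag increases (illustrative profiles only)
rng = np.random.default_rng(1)
M, L, sigma2 = 1000, 20, 2.0
bias = np.linspace(-1.0, 0.0, L)
spread = np.linspace(0.1, 1.0, L)
est = sigma2 + bias + spread * rng.standard_normal((M, L))
lag = optimal_lag(est, sigma2)   # an interior lag trades off bias against spread
```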
4.1.3 Variance estimation in the case of adaptive resampling
In this section we test the ALVar estimator in the setting where the resampling operation is triggered by the ESS criterion according to Algorithm 4. Figure 10 displays brute-force estimates of the asymptotic variance as well as estimates produced by the ALVar estimator in Algorithm 5 for two distinct choices of the parameter \(\alpha \in \{0.5, 0.2\}\). In both cases, the observations \(y_{0:5000}\) generated in the previous section were used as input to the particle filter. Although based on the same observations and exhibiting similar patterns, the two brute-force-estimated asymptotic variances differ, as expected, from each other and from the ones reported in Figs. 2 and 3. Still, in both cases the ALVar estimator tracks the brute-force benchmark closely. Since the lag size is now defined in terms of the number of selection-operation activations rather than of simple iterations, the resulting average values, 3.0 for \(\alpha =0.5\) and 1.9 for \(\alpha =0.2\), are significantly smaller than before.
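The ESS-triggered selection rule used here can be sketched as follows (a minimal illustration with multinomial resampling; the actual Algorithm 4 may differ in details such as the resampling scheme):

```python
import numpy as np

def ess(weights):
    """Effective sample size: 1 / sum of squared normalised weights."""
    w = weights / weights.sum()
    return 1.0 / np.sum(w ** 2)

def maybe_resample(particles, weights, alpha, rng):
    """Trigger multinomial resampling only when ESS < alpha * N."""
    N = len(weights)
    if ess(weights) < alpha * N:
        idx = rng.choice(N, size=N, p=weights / weights.sum())
        return particles[idx], np.full(N, 1.0 / N), True
    return particles, weights, False
```

A lower \(\alpha \) makes the criterion harder to trip, so selection fires less frequently.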
4.1.4 Variance estimation in fixed-point particle smoothing
We test the performance of our approach in the context of lag-based fixed-point smoothing with two distinct choices of the lag parameter \(\Delta \). As established in Sect. 3.4, we may apply the ALVar directly in this context as well, by storing (a part of) the trajectories and interpreting the lag-based particle smoother as a particle filter in a path model. Note that as a consequence of Proposition 3.7, the lags selected by the ALVar estimator are necessarily larger than or equal to \(\Delta \). This is reflected by our simulations, reported in Fig. 11, where the average lag selected by the ALVar is about 24 and 59 in the cases \(\Delta =10\) and \(\Delta =50\), respectively, and never below \(\Delta \). Clearly, an increase of \(\Delta \) implies a higher degree of path degeneracy at the time m of estimation, which in turn results in higher variance and a reduced difference between \(\uplambda _n\) and \(\Delta \). The latter can be viewed as a demonstration of the ability of the ALVar to reduce the lag when a higher degree of depletion of the Enoch indices is detected. The increase of variance is clear from Fig. 11. Finally, we conclude that also in this setting, the ALVar provides time-stable and nearly unbiased variance estimates that follow the brute-force benchmark closely.
4.2 Linear Gaussian SSM
As a second example, we consider optimal filtering in the linear Gaussian SSM given by
Here \({\textsf{X}}={\mathbb {R}}^{d_x}\) and \({\textsf{Y}}={\mathbb {R}}^{d_y}\), \((d_x,d_y)\in {\mathbb {N}}^*\times {\mathbb {N}}^*\), with \(\mathcal {X}\) and \(\mathcal {Y}\) being the corresponding Borel \(\sigma \)-fields. The elements in the sequences \((U_n)_{n\in {\mathbb {N}}^*}\) and \((V_n)_{n\in {\mathbb {N}}}\) are mutually independent and standard multi-normally distributed in \({\mathbb {R}}^{d_x}\) and \({\mathbb {R}}^{d_y}\), respectively. Here the matrices A and \(S_u\) are \(d_x \times d_x\), B is \(d_y \times d_x\), and \(S_v\) is \(d_y \times d_y\).
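For concreteness, the dynamics just described can be simulated as follows (the initial state is set to zero here as an assumption; the excerpt does not specify the initial distribution):

```python
import numpy as np

def simulate_lgssm(A, B, Su, Sv, n_steps, rng):
    """Simulate X_{n+1} = A X_n + Su U_{n+1}, Y_n = B X_n + Sv V_n with
    i.i.d. standard multivariate normal noise, per the model above."""
    dx, dy = A.shape[0], B.shape[0]
    x = np.zeros(dx)                 # assumed initial state
    xs, ys = [], []
    for _ in range(n_steps):
        xs.append(x)
        ys.append(B @ x + Sv @ rng.standard_normal(dy))
        x = A @ x + Su @ rng.standard_normal(dx)
    return np.array(xs), np.array(ys)
```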
4.2.1 Approximate confidence bounds
First, in order to evaluate the ability of the ALVar estimator to provide reliable Monte Carlo confidence bounds for the targeted quantities, we consider a linear Gaussian SSM with \(d_x = d_y =1\) and scalar parameters given by \((A,B,S_u,S_v)=(0.98, 1, 0.2, 1)\). For linear Gaussian models, the filter distributions are Gaussian and available in a closed form through the Kalman filter (see, e.g., Cappé et al. 2005, Section 5.2.3), which makes these models particularly well suited for assessing the performance of particle methods. We proceed as follows. Given a sequence \(y_{0:1000}\) of observations, we first calculate the exact values of \(\phi _{n} \text {id} = {\mathbb {E}}[X_n \mid Y_{0:n}=y_{0:n}]\) for \(n\in \llbracket 0, 1000 \rrbracket \) using the Kalman filter; then we execute the fully adapted APF, for which the adjustment-weight multipliers and proposal kernels are given by \(\vartheta _{n}(x) = {\textbf {L}}_{n} \mathbb {1}_{{\textsf{X}}}(x) = {\textbf {M}} g_{n+1} (x)\), \(x \in {\textsf{X}}\), and \({\textbf {P}}_{n}(x, A) = {\textbf {L}}_{n}(x, A) / {\textbf {L}}_{n} \mathbb {1}_{{\textsf{X}}}(x) = {\textbf {M}}(g_{n+1}\mathbb {1}_{A})(x)/\vartheta _{n}(x)\), \((x, A) \in {\textsf{X}} \times \mathcal {X}\), respectively. In the case of systematic resampling, the fully adapted APF always provides completely uniform importance weights. We execute this particle filter with \(N = 10{,}000\) particles on the same data as before and furnish, using the ALVar (Algorithm 3), each produced particle estimate with a 95% confidence interval
where \(\uplambda _{0.025}\) is the 2.5% quantile of the standard Gaussian distribution. The latter particle filtering procedure is repeated 200 times, yielding equally many statistically independent confidence intervals per time point n. Figure 12 reports the failure rate, i.e., the ratio of cases in which the true value falls outside the corresponding confidence interval, at each time point. We observe an average failure rate across all times around \(5.0\%\), which is the ideal value.
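As a sketch of this benchmark construction (scalar Kalman recursion plus the normal-approximation interval; the initialisation \(m_0, P_0\) is an assumption, since it is not specified in this excerpt):

```python
import numpy as np

def kalman_filter_means(y, A=0.98, B=1.0, Su=0.2, Sv=1.0, m0=0.0, P0=1.0):
    """Exact filter means E[X_n | y_{0:n}] for the scalar model above;
    the initialisation (m0, P0) is an assumption."""
    m, P = m0, P0
    means = []
    for yn in y:
        S = B * P * B + Sv ** 2            # innovation variance
        K = P * B / S                      # Kalman gain
        m = m + K * (yn - B * m)           # filtered mean at time n
        P = (1.0 - K * B) * P              # filtered variance at time n
        means.append(m)
        m, P = A * m, A * P * A + Su ** 2  # one-step prediction
    return np.array(means)

def conf_interval(estimate, sigma2_hat, N, z=1.959964):
    """95% interval: estimate +/- z * sqrt(sigma2_hat / N), with z the
    97.5% standard-normal quantile."""
    half = z * np.sqrt(sigma2_hat / N)
    return estimate - half, estimate + half
```

The failure rate is then the fraction of replicates whose interval misses the Kalman mean at each time point.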
Similar results were obtained with the ESS-based approach described by Algorithm 5, where using \(\alpha = 0.2\) and \(\alpha = 0.5\) led to average failure rates of \(5.2\%\) and \(4.9\%\), respectively. All in all, these results confirm that our approach is reliable and has small or negligible bias.
4.2.2 Increasing the dimensionality
We finally test how the ALVar responds to an increase of dimensionality. As pointed out in several works (see, e.g., Bickel et al. 2008; Rebeschini and van Handel 2015), the performance of standard particle filters generally deteriorates with increasing dimensionality, due to increased weight skewness. Countermeasures usually include increasing the number of particles or designing better proposal distributions. In order to form an idea of how this affects the performance of the ALVar, we present some results on a linear Gaussian SSM (4.3) with \(d_x=15\) and \(d_y=5\). The matrix A has entries \(a_{ij}\propto 0.5^{ \mid i-j \mid + 1}\), \((i,j)\in \llbracket 1, 15 \rrbracket ^2\), renormalised such that \(\Vert A\Vert _{2}=0.98\), while B has entries \(b_{ij}\), \((i,j)\in \llbracket 1, 5 \rrbracket \times \llbracket 1, 15 \rrbracket \), being unity if \(i=(j \mod 5)\), otherwise zero; the remaining parameters are \(S_u=0.2{{\textbf {I}}}\) and \(S_v={{\textbf {I}}}\). In this setting, the bootstrap particle filter was used to track online the expected radius of the hidden states under the filter posterior, i.e., \((\phi _{n} h_{n})_{n \in {\mathbb {N}}}\) with \(h_{n}{:}{=}\Vert \cdot \Vert _{2}\), while monitoring the variance using the ALVar. Figure 13 reports averages and quantiles of the resulting variance estimates produced at different time steps, obtained on the basis of 100 independent runs of the particle filter and associated ALVar estimator. In this case too, the variance estimator remains stable and accurate in the long term, without any additional precaution except a possibly larger particle sample size.
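The parameter matrices described above can be constructed as follows (under a literal reading of the condition \(i=(j \bmod 5)\) with 1-based indices, which leaves the columns of B with \(j \bmod 5 = 0\) equal to zero):

```python
import numpy as np

dx, dy = 15, 5
# A: entries proportional to 0.5^{|i - j| + 1}, renormalised so that the
# spectral norm ||A||_2 equals 0.98
i, j = np.indices((dx, dx))
A = 0.5 ** (np.abs(i - j) + 1.0)
A *= 0.98 / np.linalg.norm(A, 2)
# B: b_ij = 1 if i = (j mod 5), else 0 (1-based indices; under this literal
# reading the columns with j mod 5 = 0, i.e. j = 5, 10, 15, remain zero)
B = np.zeros((dy, dx))
for jj in range(1, dx + 1):
    ii = jj % 5
    if ii >= 1:
        B[ii - 1, jj - 1] = 1.0
Su, Sv = 0.2 * np.eye(dx), np.eye(dy)
```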
Notes
More precisely, following the terminology of Del Moral (2004), the distribution flow generated by kernels of this type is referred to as an updated Feynman–Kac marginal model. For simplicity, we will, however, refer to it simply as a ‘Feynman–Kac model’.
In the mentioned work, where focus is on particle approximation of path-distribution flows using backward-sampling techniques, the underlying model is, in contrast to here, supposed to be dominated in the sense that the kernels \(({\textbf {L}}_{n})_{n\in {\mathbb {N}}}\) and \(({\textbf {P}}_{n})_{n\in {\mathbb {N}}}\) are assumed to have densities with respect to some dominating measures. Still, as long as focus is only on the marginal flow \((\phi _{n})_{n \in {\mathbb {N}}}\), the same proofs may be straightforwardly adapted to the non-dominated setting.
In order to clarify this reasoning, we redefine \(\uplambda \) as the random variable such that \(n\langle \uplambda \rangle \) is the largest time before n for which there are \(N' < N\) distinct Enoch indices. Now, in Kingman’s \(N\)-coalescent model the expected (continuous) time needed to reach \(N'\) distinct ancestors of N offspring is \(2/N'-2/N\). If we plug this value (instead of the time to the most recent common ancestor) into the proof of the mentioned corollary, we obtain that \({\mathbb {E}}[\uplambda ]\) is \({\mathcal {O}}(N/N')\). Thus, letting \(N'=N_\uplambda \), we may conclude that \(N_\uplambda \) is \({\mathcal {O}}(N/{\mathbb {E}}[\uplambda ])\).
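The closed form \(2/N'-2/N\) invoked here follows from a telescoping sum, which can be verified in exact arithmetic:

```python
from fractions import Fraction

def expected_time_to(n_prime, n):
    """Expected time in Kingman's coalescent to go from n lineages down to
    n_prime: while k lineages remain, the mean waiting time is
    1 / C(k, 2) = 2 / (k (k - 1)), and the sum telescopes."""
    return sum(Fraction(2, k * (k - 1)) for k in range(n_prime + 1, n + 1))

# telescoping identity used above: 2/N' - 2/N
assert expected_time_to(5, 100) == Fraction(2, 5) - Fraction(2, 100)
```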
References
Alenlöv, J., Olsson, J.: Particle-based adaptive-lag online marginal smoothing in general state-space models. IEEE Trans. Signal Process. 67(21), 5571–5582 (2019)
Bickel, P., Li, B., Bengtsson, T.: Sharp failure rates for the bootstrap particle filter in high dimensions. In: Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, vol. 3, pp. 318–329 (2008)
Cappé, O., Moulines, E., Rydén, T.: Inference in Hidden Markov Models. Springer, New York (2005)
Chan, H.P., Lai, T.L.: A general theory of particle filters in hidden Markov models and some applications. Ann. Stat. 41(6), 2877–2904 (2013)
Chopin, N.: Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Stat. 32(6), 2385–2411 (2004)
Chopin, N., Papaspiliopoulos, O.: An Introduction to Sequential Monte Carlo. Springer, New York (2020)
Crisan, D., Míguez, J., Ríos-Muñoz, G.: On the performance of parallelisation schemes for particle filtering. EURASIP J. Adv. Signal Process. 2018(1), 1–18 (2018)
Del Moral, P.: Feynman–Kac Formulae. Genealogical and Interacting Particle Systems With Applications. Springer, New York (2004)
Del Moral, P.: Mean Field Simulation for Monte Carlo Integration. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. CRC Press, Boca Raton (2013)
Del Moral, P., Guionnet, A.: Central limit theorem for nonlinear filtering and interacting particle systems. Ann. Appl. Probab. 9(2), 275–297 (1999)
Del Moral, P., Doucet, A., Jasra, A.: On adaptive resampling strategies for sequential Monte Carlo methods. Bernoulli 18(1), 252–278 (2012)
Douc, R., Moulines, E.: Limit theorems for weighted samples with applications to sequential Monte Carlo methods. Ann. Stat. 36(5), 2344–2376 (2008)
Doucet, A., De Freitas, N., Gordon, N. (eds.): Sequential Monte Carlo Methods in Practice. Springer, New York (2001)
Du, Q., Guyader, A.: Variance estimation in adaptive sequential Monte Carlo. Ann. Appl. Probab. 31(3), 1021–1060 (2021)
Gordon, N., Salmond, D., Smith, A.F.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F Radar Signal Process. 140, 107–113 (1993)
Hull, J., White, A.: The pricing of options on assets with stochastic volatilities. J. Finance 42, 281–300 (1987)
Janati, Y., Le Corff, S., Petetin, Y.: Variance estimation for sequential Monte Carlo algorithms: a backward sampling approach (2022). https://doi.org/10.48550/ARXIV.2204.01401
Kitagawa, G., Sato, S.: Monte Carlo smoothing and self-organising state-space model. In: Sequential Monte Carlo Methods in Practice. Stat. Eng. Inf. Sci., pp. 177–195. Springer, New York (2001)
Koskela, J., Jenkins, P.A., Johansen, A.M., Spanò, D.: Asymptotic genealogies of interacting particle systems with an application to sequential Monte Carlo. Ann. Stat. 48(1), 560–583 (2020). https://doi.org/10.1214/19-AOS1823
Künsch, H.R.: Recursive Monte Carlo filters: algorithms and theoretical analysis. Ann. Stat. 33(5), 1983–2021 (2005)
Lee, A., Whiteley, N.: Variance estimation in the particle filter. Biometrika 105(3), 609–625 (2018)
Liu, J.S.: Metropolized independent sampling with comparisons to rejection sampling and importance sampling. Stat. Comput. 6, 113–119 (1996)
Mastrototaro, A., Olsson, J., Alenlöv, J.: Fast and numerically stable particle-based online additive smoothing: the AdaSmooth algorithm. J. Am. Stat. Assoc. 25, 1–12 (2022). https://doi.org/10.1080/01621459.2022.2118602
Olsson, J., Douc, R.: Numerically stable online estimation of variance in particle filters. Bernoulli 25(2), 1504–1535 (2019)
Olsson, J., Westerborn, J.: Efficient particle-based online smoothing in general hidden Markov models: the PaRIS algorithm. Bernoulli 23(3), 1951–1996 (2017)
Olsson, J., Cappé, O., Douc, R., Moulines, E.: Sequential Monte Carlo smoothing with application to parameter estimation in non-linear state space models. Bernoulli 14(1), 155–179 (2008)
Pitt, M.K., Shephard, N.: Filtering via simulation: auxiliary particle filters. J. Am. Stat. Assoc. 94(446), 590–599 (1999)
Rebeschini, P., van Handel, R.: Can local particle filters beat the curse of dimensionality? Ann. Appl. Probab. 25(5), 2809–2866 (2015)
Acknowledgements
This work is supported by the Swedish Research Council, Grant 2018-05230.
Funding
Open access funding provided by Royal Institute of Technology.
Author information
Authors and Affiliations
Contributions
Both authors contributed equally to the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
A: Proof of Theorem 2.2
In the following we prove Theorem 2.2 by extending Olsson and Douc (2019, Proposition 8), which provides the convergence of interest in the special case of Feynman–Kac models and standard bootstrap filters, to the general model context in Sect. 2.2 and APFs. For this purpose, we first need to reformulate the APF introduced in Sect. 2.3 as a simple bootstrap filter operating on an auxiliary, extended Feynman–Kac model. More precisely, define a sequence \((\bar{{\textsf{X}}}_{n},\bar{\mathcal {X}}_{n})_{n\in {\mathbb {N}}}\) of measurable spaces by setting \(\bar{{\textsf{X}}}_{0} {:}{=}{\textsf{X}}_0\) and \(\bar{\mathcal {X}}_{0} {:}{=}\mathcal {X}_0\) and, for \(n \in {\mathbb {N}}^*\), \(\bar{{\textsf{X}}}_{n} {:}{=}{\textsf{X}}_{n-1} \times {\textsf{X}}_n\) and \(\bar{\mathcal {X}}_{n} {:}{=}\mathcal {X}_{n-1} \otimes \mathcal {X}_n\). Moreover, elements in these spaces will be denoted by \({\bar{x}}_{0} {:}{=}x_0\) and \({\bar{x}}_{n} {:}{=}(x_{n-1}, x_n)\), and we write \({\bar{x}}_{n}^1 {:}{=}x_{n - 1}\) and \({\bar{x}}_{n}^2 {:}{=}x_n\) for the first and second components of \({\bar{x}}_{n}\), respectively. Let \({\bar{\nu }} {:}{=}\nu \) and define \({\bar{g}}_{0}({\bar{x}}_{0}) {:}{=}\vartheta _{0}(x_0)\gamma _{-1}(x_0)\). We also define the initial unnormalised measure \({\bar{\chi }} \in {\textsf{M}}(\bar{\mathcal {X}}_{0})\) by \({\bar{\chi }}(A) {:}{=}{\bar{\nu }}({\bar{g}}_{0} \mathbb {1}_{A})\) for \(A \in \bar{\mathcal {X}}_{0}\) (notice that \({\bar{\chi }}\ne \chi \)). Now, define, for each \(n\in {\mathbb {N}}^*\), the auxiliary potential function
and the Markov transition kernel
Thus, as established by Lemma A.1 below, Algorithm 1 may now be reinterpreted as a bootstrap particle filter operating on the auxiliary Feynman–Kac model given by (A.1) and (A.2). Algorithm 6 shows one iteration of this procedure, which is initialised by sampling \(({\bar{\xi }}_{0}^{i})_{i=1}^N\) from \({\bar{\nu }}\) and letting \({\bar{\omega }}_{0}^{i}\leftarrow {\bar{g}}_{0}({\bar{\xi }}_{0}^{i})\) for all i.
Lemma A.1
Let \((\xi _{n}^{i},\omega _{n}^{i},I_{n}^{i})_{i=1}^N\) and \(({\bar{\xi }}_{n}^{i},{\bar{\omega }}_{n}^{i},{\bar{I}}_{n}^{i})_{i=1}^N\) be the particles, weights, and resampling indices produced by n iterations of Algorithms 1 and 6, respectively. Then for every \(n\in {\mathbb {N}}^*\),
where \({\bar{\xi }}_{n}^{i,2}\) indicates the second component of \({\bar{\xi }}_{n}^{i}{:}{=}({\bar{\xi }}_{n}^{i,1},{\bar{\xi }}_{n}^{i,2})\).
Proof
First note that by construction, \((\xi _{0}^{i},\omega _{0}^{i}\vartheta _{0}(\xi _{0}^{i}))_{i=1}^N\overset{{\mathcal {D}}}{=}({\bar{\xi }}_{0}^{i},{\bar{\omega }}_{0}^{i})_{i=1}^N\). We may hence proceed by induction and assume that \((\xi _{n}^{i},\omega _{n}^{i}\vartheta _{n}(\xi _{n}^{i}))_{i=1}^N\overset{{\mathcal {D}}}{=}({\bar{\xi }}_{n}^{i,2},{\bar{\omega }}_{n}^{i})_{i=1}^N\) for some \(n\in {\mathbb {N}}\) (in the case \(n=0\), we denote \({\bar{\xi }}_{0}^{i,2} {:}{=}{\bar{\xi }}_{0}^{i}\)). In Algorithm 1 we draw the conditionally i.i.d. resampling indices \((I_{n+1}^{i})_{i=1}^N\) according to
whereas in Algorithm 6 we draw \(({\bar{I}}_{n+1}^{i})_{i=1}^N\) according to
which implies that \((I_{n+1}^{i})_{i=1}^N\overset{{\mathcal {D}}}{=}({\bar{I}}_{n+1}^{i})_{i=1}^N\). Furthermore, in Algorithm 1 the particles are propagated by sampling, for \(i \in \llbracket 1, N \rrbracket \),
whereas in Algorithm 6, for \(i \in \llbracket 1, N \rrbracket \),
which corresponds to assigning
and then sampling
Using the induction hypothesis and the previous result on the resampling indices, this implies that \((\xi _{n+1}^{i})_{i=1}^N\overset{{\mathcal {D}}}{=}({\bar{\xi }}_{n+1}^{i,2})_{i=1}^N\). Finally, since the weights are functions only of the particles and the indices, it holds, for \(i\in \llbracket 1, N \rrbracket \),
which shows that \((\xi _{n+1}^{i},\omega _{n+1}^{i}\vartheta _{n+1}(\xi _{n+1}^{i}),I_{n+1}^{i})_{i=1}^N \overset{{\mathcal {D}}}{=}({\bar{\xi }}_{n+1}^{i,2},{\bar{\omega }}_{n+1}^{i},{\bar{I}}_{n+1}^{i})_{i=1}^N\). This completes the proof. \(\square \)
We are now ready to prove Theorem 2.2.
Proof of Theorem 2.2
Consider a given iteration index \(n\in {\mathbb {N}}\); if we are interested in estimating the variance of the particle estimator after n iterations, we may assume that \(\vartheta _{n}\equiv 1\), since the adjustment multiplier \(\vartheta _{n}\) only influences the distribution of the particles of the APF at the subsequent iterations \(n + 1, n + 2, \ldots \). Now, the estimate of \(\phi _{n} h_{n}\) produced by Algorithm 1 for a given \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\) can be interpreted as a statistically equivalent estimate formed by Algorithm 6 by defining \({\bar{h}}_{n}: \bar{{\textsf{X}}}_{n} \ni {\bar{x}}_{n} \mapsto h_{n}({\bar{x}}_{n}^2)\). Then by Lemma A.1,
Next, we also notice that the expectations \({\bar{\phi }}_{n}{\bar{h}}_{n}\) and \(\phi _{n}h_{n}\) coincide; indeed, for every \(m\in {\mathbb {N}}^*\), we may define
for \(\ell \in \llbracket 1, m - 1 \rrbracket \) and \( \bar{{\textbf {L}}}_{\ell :m-1}=\text {id} \) otherwise; moreover, for any function \({\bar{h}}_{m} \in {\textsf{F}}(\bar{\mathcal {X}}_{m})\) such that for all \({\bar{x}}_{m}\in \bar{{\textsf{X}}}_{m}\), \({\bar{h}}_{m}({\bar{x}}_{m})=h_{m}({\bar{x}}_{m}^2)\) for some \(h_{m}\in {\textsf{F}}(\mathcal {X}_{m})\), it holds, for every \(\ell \in \llbracket 0, m - 1 \rrbracket \),
This implies that
Thus, since we are assuming that \(\vartheta _{n}\equiv 1\), it follows that \({\bar{\phi }}_{n}{\bar{h}}_{n} = \phi _{n}h_{n}\). Applying the CLT in Proposition 2.1 to the auxiliary particle model yields, since \(({\bar{g}}_{n})_{n\in {\mathbb {N}}}\) are all bounded by Assumption 1, as \(N\rightarrow \infty \),
for some asymptotic variance \({\bar{\sigma }}_n^2({\bar{h}}_{n})\). However, since \({\bar{\phi }}_{n}^{N}{\bar{h}}_{n} \overset{{\mathcal {D}}}{=}\phi _{n}^{N}h_{n}\) and \({\bar{\phi }}_{n}{\bar{h}}_{n}=\phi _{n}h_{n}\), it necessarily holds that \({\bar{\sigma }}_n^2({\bar{h}}_{n})=\sigma _n^2(h_{n})\), where \(\sigma _n^2(h_{n})\) is given by (2.3) with \(\ell =0\). It is also easily checked that the equality holds for all the truncated variances. This is done by first recalling that Algorithm 6 may be seen as a particular case of the general framework under consideration, in which the Radon–Nikodym derivatives with respect to the initial proposal density and the proposal Markov kernels are given by the potential functions and, moreover, the adjustment multipliers are all equal to one. Thus, we may write, for all \(\ell \in \llbracket 0, n \rrbracket \),
and by (2.3) we may conclude that
Finally, recall that for every \(m\in \llbracket 0, n \rrbracket \), the Enoch indices \(({\bar{E}}_{m,n}^{i})_{i=1}^N\) are, for all \(k\in \llbracket 0, n \rrbracket \) and \(i\in \llbracket 1, N \rrbracket \), recursively defined by
Consequently, these indices are functions of the resampling indices, and by Lemma A.1 it holds that \(({\bar{E}}_{m,n}^{i})_{i=1}^N\overset{{\mathcal {D}}}{=}(E_{m,n}^{i})_{i=1}^N\) (where the latter are the Enoch indices of the particle system generated by Algorithm 2). Now, if we let, for any \(\uplambda \in {\mathbb {N}}\), \(\hat{{\bar{\sigma }}}_{n,\uplambda }^2({\bar{h}}_{n})\) be the lag-based variance estimator for the auxiliary bootstrap particle model, we have, again as a consequence of Lemma A.1, recalling that \({\bar{h}}_{n}({\bar{x}}_{n}) = h_{n}({\bar{x}}_{n}^2)\) and \(\vartheta _{n}\equiv 1\),
and by (2.6) we may thus conclude that
Finally, note that the assumptions of the theorem imply that for all \(n\in {\mathbb {N}}\) and \({\bar{x}}_{n}\in \bar{{\textsf{X}}}_{n}\), \({\bar{g}}_{n}({\bar{x}}_{n})>0\) and \(\sup _{{\bar{x}}_{n}\in \bar{{\textsf{X}}}_{n}}{\bar{g}}_{n}({\bar{x}}_{n})<\infty \), which means that the assumptions of (Olsson and Douc 2019, Proposition 8) are satisfied as well. This implies that \(\hat{{\bar{\sigma }}}_{n,\uplambda }^2({\bar{h}}_{n})\overset{{\mathbb {P}}}{\longrightarrow }{\bar{\sigma }}_{n\langle \uplambda \rangle ,n}^2({\bar{h}}_{n})\) as \(N \rightarrow \infty \), and combining this result with (A.3) and (A.4) yields immediately that \( {\hat{\sigma }}_{n,\uplambda }^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\) as \(N \rightarrow \infty \). The proof is complete. \(\square \)
B: Extensions to APFs with adaptive resampling
1.1 B.1: Model extension
The purpose of this section is to show, using arguments similar to those of Mastrototaro et al. (2022, Appendix C.1), that the results for the standard bootstrap particle filter can be used directly to establish asymptotic normality in the case where the selection schedule is deterministic and determined by some given sequence \(({\rho _{n}})_{n\in {\mathbb {N}}}\) of indicators, where \({\rho _{n}} = 1\) means that resampling is performed at time n. To do this, we extend the model of Sect. 2.2 by allowing the states to be represented by paths of different lengths determined by \(({\rho _{n}})_{n\in {\mathbb {N}}}\). First, define the resampling times \(n_m {:}{=}\min \{n \in {\mathbb {N}}: r_{n+1} = m + 1\}\), \(m \in {\mathbb {N}}\), where \((r_{n})_{n \in {\mathbb {N}}}\) are defined in Sect. 3.3, and, by convention, let \(n_{-1} {:}{=}-1\). Then we introduce the sequence \((\varvec{{\textsf{X}}}_{m}, \varvec{{\mathcal {X}}}_{\!m})_{m \in {\mathbb {N}}}\) of measurable spaces, where \(\varvec{{\textsf{X}}}_{m} {:}{=}{\textsf{X}}_{n_{m-1}+1 }\times {\textsf{X}}_{n_{m-1}+2} \times \cdots \times {\textsf{X}}_{n_{m}}\) and \(\varvec{{\mathcal {X}}}_{\!m} {:}{=}\mathcal {X}_{n_{m-1}+1} \otimes \mathcal {X}_{n_{m-1}+2} \otimes \cdots \otimes \mathcal {X}_{n_{m}}\). As a general rule, we use boldface to indicate that a quantity is related to such an extended path space; e.g., we let \({\varvec{x}}_{m} {:}{=}x_{n_{m-1}+1:n_m}\) denote a generic element in \(\varvec{{\textsf{X}}}_{m}\) and define, for every \(m \in {\mathbb {N}}^*\) and \(k\in \llbracket n_{m-1}+1, n_m \rrbracket \), the projection
We then introduce the multi-step unnormalised transition kernels \((\varvec{{\mathcal {L}}}_{m})_{m \in {\mathbb {N}}}\), obtained as tensor-products of the single-step Markov transition kernels \(({\textbf {L}}_{n})_{n\in {\mathbb {N}}}\) of Sect. 2.2; more precisely, for every \(m \in {\mathbb {N}}\),
Note that \(\varvec{{\mathcal {L}}}_{m}\) depends only on \(x_{n_m}\) and is constant with respect to the previous states. Moreover, note that if selection is performed at each iteration, then \(n_m =m\) for all m, implying \(\varvec{{\mathcal {L}}}_{m} = {\textbf {L}}_{m}\). The initial measure \(\varvec{\chi }\) on \(\varvec{{\mathcal {X}}}_{\!0}\) is defined as \(\varvec{\chi }(\mathrm {d}{\varvec{x}}_{0}) {:}{=}\chi (\mathrm {d}x_0) \prod _{k=0}^{n_0-1} {\textbf {L}}_{k}(x_k, \mathrm {d}x_{k+1})\). Again, for compactness, we write \(\varvec{{\mathcal {L}}}_{k:\ell } {:}{=}\varvec{{\mathcal {L}}}_{k}\cdots \varvec{{\mathcal {L}}}_{\ell }\) whenever \(k \in \llbracket 0, \ell \rrbracket \), otherwise \( \varvec{{\mathcal {L}}}_{k:\ell }=\text {id} \). Next, for every \(n\in {\mathbb {N}}\), we define the distribution flow
In order to apply an APF to this model we introduce auxiliary functions \((\varvec{\vartheta }_{m})_{m\in {\mathbb {N}}}\) defined by
After resampling, the particles are propagated according to the Markov proposal kernels \((\varvec{{\mathcal {P}}}_{m})_{m \in {\mathbb {N}}}\), where
Under the assumptions of the paper, each measure \(\varvec{{\mathcal {L}}}_{m}({\varvec{x}}_{m},\cdot )\), \({\varvec{x}}_{m}\in \varvec{{\textsf{X}}}_{m}\), is absolutely continuous with respect to \(\varvec{{\mathcal {P}}}_{m}({\varvec{x}}_{m},\cdot )\). Hence, for every \({\varvec{x}}_{m}\), we may let \(\varvec{\gamma }_{m}({\varvec{x}}_{m}, {\varvec{x}}_{m + 1}) = d \varvec{{\mathcal {L}}}_{m}({\varvec{x}}_{m},\cdot ) / d \varvec{{\mathcal {P}}}_{m}({\varvec{x}}_{m},\cdot )\), \({\varvec{x}}_{m + 1} \in \varvec{{\textsf{X}}}_{m + 1}\), be the corresponding Radon–Nikodym derivative. Moreover, it is easily seen that for \(\varvec{{\mathcal {P}}}_{m}({\varvec{x}}_{m}, \cdot )\)-almost all \({\varvec{x}}_{m+1} \in \varvec{{\textsf{X}}}_{m+1}\),
Finally, we define the proposal probability measure \(\varvec{\nu }(\mathrm {d}{\varvec{x}}_{0}) {:}{=}\nu (\mathrm {d}x_0) \prod _{k=0}^{n_0-1} {\textbf {P}}_{k}(x_k, \mathrm {d}x_{k+1})\). The measure \(\varvec{\chi }\) is absolutely continuous with respect to \(\varvec{\nu }\), and we let \(\varvec{\gamma }_{-1}\) be the Radon–Nikodym derivative of \(\varvec{\chi }\) with respect to \(\varvec{\nu }\). It is easily shown that \(\varvec{\gamma }_{-1}({\varvec{x}}_{0})=\gamma _{-1}(x_0)\prod _{k=0}^{n_0-1}\gamma _{k}(x_k,x_{k+1})\) for \(\varvec{\nu }\)-almost all \({\varvec{x}}_{0} \in \varvec{{\textsf{X}}}_{0}\).
Algorithm 7 shows one iteration of the APF for the extended model, which is initialised by drawing \((\varvec{\xi }_{0}^{i})_{i=1}^N\) from \(\varvec{\nu }\) and letting \(\varvec{\omega }_{0}^{i} \leftarrow \varvec{\gamma }_{-1}(\varvec{\xi }_{0}^{i})\) for all \(i\in \llbracket 1, N \rrbracket \). Proposition B.1 connects the output of this algorithm to that of Algorithm 4 (cf. Mastrototaro et al. 2022, Proposition C.1, which states a similar result in the context of particle-based additive smoothing).
Proposition B.1
Let \(({\rho _{n}})_{n \in {\mathbb {N}}}\) be a deterministic selection schedule and let \((n_m)_{m \in {\mathbb {N}}}\) be the induced selection times. Furthermore, let \((\xi _{n_m}^{i}, \omega _{n_m}^{i})_{i = 1}^N\), \(m \in {\mathbb {N}}\), be a subsequence of weighted samples generated by Algorithm 4 for the original model and let \((\varvec{\xi }_{m}^{i}, \varvec{\omega }_{m}^{i})_{i = 1}^N\), \(m \in {\mathbb {N}}\), be weighted samples generated by Algorithm 7 for the extended model. Then for every \(m \in {\mathbb {N}}\),
Proof
We proceed by induction. Assume that we have generated a sample \((\xi _{n_m}^{i}, \omega _{n_m}^{i})_{i = 1}^N\) by means of \(n_m\) iterations of Algorithm 4, and that the claim holds true for this sample. We now examine the output of iteration \(n_{m + 1}\). Since we know that \(\rho _{n_m} = 1\), the sample at time \(n_m + 1\) is produced by selection and mutation; thereafter, selection is not performed until time \(n_{m+1}\) (since \(\rho _k = 0\) for all \(k \in \llbracket n_m + 1, n_{m + 1} - 1 \rrbracket \)). Hence, each particle path \(\xi _{n_m + 1:n_{m + 1}}^{i}\) will be generated according to
and assigned the importance weight
where
Now, on the other hand, by applying one iteration of Algorithm 7 to the sample \((\varvec{\xi }_{m}^{i}, \varvec{\omega }_{m}^{i})_{i = 1}^N\) we obtain path-particles \(\varvec{\xi }_{m + 1}^{i} = \xi _{n_m + 1:n_{m+ 1}}^{i}\), \(i \in \llbracket 1, N \rrbracket \), with distribution
whose associated weights are
and where
Finally, by comparing (B.1) and (B.4), (B.2) and (B.5), (B.3) and (B.6), and applying the induction hypothesis,
The base case \(m = 0\) is established similarly. This completes the proof. \(\square \)
Thus, in the case of a deterministic—but possibly irregular—resampling schedule, the APF may be reinterpreted as a particle model with systematic resampling operating on the auxiliary model described above. As the CLT in Proposition 2.1 is a general result, valid for arbitrary models and APFs (with systematic resampling), it holds also for the extended model and Algorithm 7, and the asymptotic normality of the output of Algorithm 7 follows immediately. This finding is summarised by the following proposition.
Proposition B.2
Assume that the functions \(\varvec{\gamma }_{-1}\), \((\varvec{\gamma }_{m}/\varvec{\vartheta }_{m})_{m\in {\mathbb {N}}}\), and \((\varvec{\vartheta }_{m})_{m\in {\mathbb {N}}}\) are all bounded. Then for every \(m\in {\mathbb {N}}\) and \({\varvec{h}}_{m}\in {\textsf{F}}(\varvec{{\mathcal {X}}}_{\!m})\), as \(N\rightarrow \infty \),
where Z is standard normally distributed and \(\varvec{\sigma }_m^2({\varvec{h}}_{m}) {:}{=}\varvec{\sigma }_{0,m}^2({\varvec{h}}_{m})\), with, for \(\ell \in \llbracket 0, m \rrbracket \),
Proposition B.2 implies that we may obtain a CLT also in the case where Algorithm 4 is driven by a deterministic resampling schedule \(({\rho _{n}})_{n\in {\mathbb {N}}}\). To conclude this argument formally, consider the output of Algorithm 4 after an arbitrarily chosen number n of iterations; even though n is generally not a resampling time, we may, without loss of generality, assume that it is so (since the distribution of the particle sample at a given time point does not depend on whether selection will be performed in a subsequent iteration of the algorithm). In particular, in the extended model we may let \(\varvec{{\textsf{X}}}_{r_{n}} = {\textsf{X}}_{n_{r_{n}-1}+1}\times \cdots \times {\textsf{X}}_n\); then by Proposition B.1,
For any function \(h_{n} \in {\textsf{F}}({\textsf{X}}_{n})\) we may define \({\varvec{h}}_{r_{n}}: \varvec{{\textsf{X}}}_{r_{n}} \ni {\varvec{x}}_{r_{n}} \mapsto h_{n}(\varvec{\Pi }_n^{r_{n}}({\varvec{x}}_{r_{n}}))=h_{n}(x_{n})\). It is straightforward to check that for a so-defined extended function it holds that \(\phi _{n}h_{n}=\varvec{\phi }_{r_{n}}{\varvec{h}}_{r_{n}}\). Thus, under Assumption 1, Proposition B.2 implies that, as \(N\rightarrow \infty \),
where Z is standard normally distributed and, since \(\phi _{n}^{N}h_{n}={\varvec{\phi }_{r_{n}}^N}{\varvec{h}}_{r_{n}}\) and \(\phi _{n}h_{n}=\varvec{\phi }_{r_{n}}{\varvec{h}}_{r_{n}}\), the asymptotic variance \(\sigma _{n}^2\langle {\rho _{0:n-1}}\rangle (h_{n})\) equals \(\varvec{\sigma }_{r_{n}}^2\langle {\rho _{0:n-1}}\rangle ({\varvec{h}}_{r_{n}})\); here we have added \({\rho _{0:n-1}}\) to the notation in order to highlight that the extended model under consideration is governed by the given selection schedule.
1.2 B.2: Proof of Theorem 3.5
The following proof resembles closely the proof of Mastrototaro et al. (2022, Corollary 3.7).
Proof
Let \({\textsf{R}}_n {:}{=}\{0,1\}^{n}\) be the set of binary sequences of length n. To each \({\rho _{0:n-1}} \in {\textsf{R}}_n\) we associate independent realisations \((\xi _{n}^{i}, \omega _{n}^{i})_{i=1}^N\) of Algorithm 4, each realisation being driven by the deterministic selection schedule governed by the corresponding \({\rho _{0:n-1}}\), and let \(h_{n}^N\langle {\rho _{0:n-1}}\rangle {:}{=}\Omega _{n}^{-1}\sum _{i=1}^{N} \omega _{n}^{i}h_{n}(\xi _{n}^{i})\) denote the corresponding filter estimate. Then for every \(N\in {\mathbb {N}}^*\),
By Lemma 3.4, for almost all \(\alpha _{0:n-1}\in (0,1)^{n}\),
and by Slutsky’s lemma and (B.8), all terms in the sum (B.9) tend to zero except one, which converges weakly to \(\sigma _n\langle \rho _{0:n-1}^{\alpha }\rangle (h_{n}) Z\). This completes the proof. \(\square \)
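In outline, the decomposition (B.9) invoked in this proof plausibly has the following shape, with \(\rho_{0:n-1}^{N}\) denoting the selection schedule realised by the adaptive algorithm; this is a sketch reconstructed from the surrounding prose, not the paper's display:

```latex
\sqrt{N}\bigl(\phi_{n}^{N}h_{n}-\phi_{n}h_{n}\bigr)
  \overset{\mathcal{D}}{=}
  \sum_{\rho_{0:n-1}\in\mathsf{R}_{n}}
  \mathbb{1}\{\rho_{0:n-1}^{N}=\rho_{0:n-1}\}\,
  \sqrt{N}\bigl(h_{n}^{N}\langle\rho_{0:n-1}\rangle-\phi_{n}h_{n}\bigr),
```

so that, as the indicator of the limiting schedule \(\rho_{0:n-1}^{\alpha}\) tends to one in probability, only the corresponding term survives in the limit.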
B.3: Proof of Corollary 3.6
We first show the consistency of the variance estimates provided by Algorithm 5 in the case of a deterministic resampling schedule.
Lemma B.3
Let Assumption 2 hold. For every \(n\in {\mathbb {N}}\) and functions \(h_{m}\in {\textsf{F}}(\mathcal {X}_m)\), \(m\in \llbracket 1, n \rrbracket \), let \((\uplambda _m)_{m=1}^n\) be the lags produced by n iterations of Algorithm 5 driven by some deterministic selection schedule \(({\rho _{n}})_{n \in {\mathbb {N}}}\). Then, as \(N\rightarrow \infty \), it holds that \(\uplambda _n\overset{{\mathbb {P}}}{\longrightarrow }r_{n}\) and \({\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n}^2\langle {\rho _{0:n-1}}\rangle (h_{n})\).
Proof
We proceed by induction, assuming that the claim holds true for \(n-1\), where \(n\in {\mathbb {N}}^*\). Along the lines of the proof of Theorem 3.3, we show that the lag \(\uplambda _n\) produced by Algorithm 5 converges in probability to \(r_{n}\). We consider separately the two cases \({\rho _{n-1}}=0\) and \({\rho _{n-1}}=1\). In the former case, resampling is not performed at time \(n-1\); thus, \(\uplambda _n=\uplambda _{n-1}\) and \(r_{n}=r_{n-1}\), which implies that
as \(N\rightarrow \infty \) by the induction hypothesis. In the latter case, \({\rho _{n-1}}=1\), resampling is triggered, so that \(r_{n}=r_{n-1}+1\), while \(\uplambda _n\) is determined by Line 10 in Algorithm 5. Thus, we may write
where the second term of (B.11) is zero since, necessarily, \(\uplambda _n\le \uplambda _{n-1} + 1\) by construction. To treat the first term (B.12), write
In order to show that (B.13) tends to one, we consider the extended model with resampling times \((n_m)_{m\in {\mathbb {N}}}\) induced by \(({\rho _{n}})_{n\in {\mathbb {N}}}\). Without loss of generality we assume that n is a resampling time. Then by Proposition B.1, for every \(\uplambda \in \llbracket 0, r_{n} \rrbracket \),
where we have defined, as previously, \({\varvec{h}}_{r_{n}}: \varvec{{\textsf{X}}}_{r_{n}} \ni {\varvec{x}}_{r_{n}} \mapsto h_{n}(\varvec{\Pi }_n^{r_{n}}({\varvec{x}}_{r_{n}}))\) (where \(\varvec{{\textsf{X}}}_{r_{n}} {:}{=}{\textsf{X}}_{n_{r_{n}-1}+1}\times \cdots \times {\textsf{X}}_n\)). Furthermore, by noting that
and
we conclude that if Assumption 2 is satisfied for the original model, then it is also satisfied for the extended one. Thus, Theorem 2.2 implies that for every \(\uplambda \in \llbracket 0, r_{n} \rrbracket \),
where we have included \({\rho _{0:n-1}}\) in the notation to highlight that the extended model under consideration is determined by the given selection schedule. By (B.7) it holds that for \(\uplambda \in \llbracket 0, r_{n} - 1 \rrbracket \),
thus, using (B.12–B.14) and the induction hypothesis we may conclude that \({\mathbb {P}}(\uplambda _n=r_{n}) \rightarrow 1\) as \(N\rightarrow \infty \).
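One way to organise the two-term bound described above is the following; this is a sketch consistent with the surrounding prose, not the paper's display (B.11):

```latex
\mathbb{P}(\uplambda_{n}\neq r_{n})
  \le \mathbb{P}(\uplambda_{n}<r_{n})
    + \mathbb{P}(\uplambda_{n}>r_{n}),
```

where the second term vanishes since, by construction, \(\uplambda_{n}\le \uplambda_{n-1}+1\), and the first is handled via the extended-model convergence results above.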
Finally, we show that for every \(\varepsilon >0\), \({\mathbb {P}}(\mid {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})- \sigma _{n}^2\langle {\rho _{0:n-1}}\rangle (h_{n})\mid \ge 2\varepsilon )\) tends to zero as \(N\rightarrow \infty \). Recalling that \(\sigma _{n}^2\langle {\rho _{0:n-1}}\rangle (h_{n}) = \varvec{\sigma }_{r_{n}}^2 \langle {\rho _{0:n-1}}\rangle ({\varvec{h}}_{r_{n}})\), we obtain the bound
where the second term tends to zero as \(N\rightarrow \infty \) by (B.14). For the first term it holds that
as \(N\rightarrow \infty \).
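In outline, the bound used in this last step can be sketched as follows (a reconstruction consistent with the surrounding prose, not the paper's display):

```latex
\mathbb{P}\bigl(\bigl|\hat{\sigma}_{n,\uplambda_{n}}^{2}(h_{n})
    -\sigma_{n}^{2}\langle\rho_{0:n-1}\rangle(h_{n})\bigr|\ge 2\varepsilon\bigr)
\le \mathbb{P}\bigl(\bigl|\hat{\sigma}_{n,r_{n}}^{2}(h_{n})
    -\sigma_{n}^{2}\langle\rho_{0:n-1}\rangle(h_{n})\bigr|\ge 2\varepsilon\bigr)
  + \mathbb{P}(\uplambda_{n}\neq r_{n}),
```

where the second term on the right tends to zero by (B.14) and the first by the consistency of the fixed-lag estimator.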
The proof is completed by noting that the base case holds true, since \(\uplambda _0=0\) and \({\hat{\sigma }}_{0,0}^2(h_{0})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{0}^2(h_{0})\) for all \(h_{0}\in {\textsf{F}}(\mathcal {X}_0)\). \(\square \)
We are now ready to prove Corollary 3.6.
Proof of Corollary 3.6
Following the lines of the proof of Theorem 3.5, we let again \({\textsf{R}}_n = \{0,1\}^n\) be the set of binary sequences of length n. For each \({\rho _{0:n-1}} \in {\textsf{R}}_n\), let \({\hat{\sigma }}_{n,\uplambda _n}^2 \langle {\rho _{0:n-1}}\rangle (h_{n})\) be independent variance estimators obtained on the basis of independent realisations of Algorithm 5, each realisation being driven by the corresponding selection schedule \({\rho _{0:n-1}}\). Then for every \(N\in {\mathbb {N}}^*\),
Now, for almost all \(\alpha _{0:n-1}\in (0,1)^{n}\), by (B.10) and Lemma B.3, all terms of (B.15) tend to zero as \(N\rightarrow \infty \) except one, which converges in probability to \(\sigma _{n}^2\langle \rho _{0:n-1}^{\alpha }\rangle (h_{n})\). This completes the proof. \(\square \)
B.4: Proof of Proposition 3.7
By construction of the \(\textsf {ALVar} \) estimator in Algorithm 3, Proposition 3.7 is a direct consequence of the following lemma.
Lemma B.4
Let \((\Delta ,n)\in {\mathbb {N}}^2\) be such that \(n\ge \Delta \) and let \(h_{n-\Delta } \in {\textsf{F}}(\mathcal {X}_{n - \Delta })\). Then for every \(\uplambda \in \llbracket 0, \Delta -1 \rrbracket \), \({\hat{\sigma }}_{n-\Delta \mid n,\uplambda }^2(h_{n-\Delta })\le {\hat{\sigma }}_{n-\Delta \mid n,\Delta }^2(h_{n-\Delta })\), where \({\hat{\sigma }}_{n-\Delta \mid n, \uplambda }^2(h_{n-\Delta })\) is given by (3.10).
Proof
For ease of notation, set \(m = n-\Delta \). First, since \(n \langle \Delta \rangle = m\), we note that
We now fix an arbitrary \(\uplambda \in \llbracket 0, \Delta -1 \rrbracket \) and show that \({\hat{\sigma }}_{n-\Delta \mid n,\uplambda }^2(h_{n-\Delta })\), given by (3.10), is bounded from above by (B.16). For this purpose, note that \(E_{m, n}^{j}=E_{m, n\langle \uplambda \rangle }^{i'}\) for all \((i',j) \in \llbracket 1, N \rrbracket ^2\) such that \(E_{n\langle \uplambda \rangle ,n}^{j}=i'\), from which it follows that
We may now conclude the proof by showing that the right-hand side of the previous equation is always at most (B.16); indeed, since the weights are nonnegative, for each \(i\in \llbracket 1, N \rrbracket \),
The proof is complete. \(\square \)
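To make the genealogical objects in this appendix concrete, the following is a minimal, hypothetical sketch of a genealogy-based variance estimator of the Olsson–Douc flavour: the weighted, centred terms are grouped by their lag-ancestor ("Eve") index and the squared group totals are summed. The function name and the simplified form are illustrative assumptions; the paper's estimator (3.10) carries additional weight factors that are omitted here.

```python
import numpy as np

def lag_variance_estimate(h_vals, weights, eve):
    """Illustrative genealogy-based variance estimate (simplified form,
    not the paper's (3.10)): group the weighted, centred terms by their
    lag-ancestor ("Eve") index and sum the squared group totals, scaled
    by the number of particles N."""
    N = len(h_vals)
    W = weights / weights.sum()                 # self-normalised weights
    centred = W * (h_vals - np.dot(W, h_vals))  # W^i (h(xi^i) - phi^N h)
    totals = np.zeros(N)
    np.add.at(totals, eve, centred)             # sum within ancestor groups
    return N * np.sum(totals**2)
```

With lag zero each particle is its own ancestor (`eve = np.arange(N)`), while with a fully coalesced genealogy (all entries of `eve` equal) the estimate collapses to zero; this degeneracy of long-lag genealogies is precisely what motivates adaptive lag selection in the ALVar estimator.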
Mastrototaro, A., Olsson, J. Adaptive online variance estimation in particle filters: the ALVar estimator. Stat Comput 33, 77 (2023). https://doi.org/10.1007/s11222-023-10243-1