1 Introduction

In this paper we present an adaptive online algorithm for estimating the asymptotic variance in particle filters, also known as sequential Monte Carlo (SMC) methods. SMC methods approximate a given sequence of distributions by recursively propagating a sample of random simulations, so-called particles, with associated importance weights. Applications include finance, signal processing, robotics, and biology, among many other fields; see, e.g., Doucet et al. (2001) and Chopin and Papaspiliopoulos (2020). This methodology, first introduced by Gordon et al. (1993) in the form of the bootstrap particle filter, revolves around two operations: a selection step, which resamples the particles in proportion to their importance weights, and a mutation step, which randomly propagates the particles in the state space.
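As an illustration of these two operations, a minimal bootstrap-style selection–mutation step might be sketched as follows; this is a simplified, numpy-based sketch in which `transition_sample` and `likelihood` are hypothetical placeholders for the model dynamics and observation density, not objects defined in the paper.

```python
import numpy as np

def bootstrap_pf_step(particles, weights, transition_sample, likelihood, rng):
    """One selection + mutation step of a bootstrap-style particle filter."""
    n = len(particles)
    # Selection: resample indices in proportion to the importance weights.
    idx = rng.choice(n, size=n, p=weights / weights.sum())
    # Mutation: propagate the selected particles through the state dynamics.
    new_particles = transition_sample(particles[idx], rng)
    # Reweight by the likelihood of the current observation.
    new_weights = likelihood(new_particles)
    return new_particles, new_weights, idx
```

Here `idx` plays the role of the resampled ancestor indices, which are central to the genealogy-based variance estimators discussed in the paper.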

Since the introduction of the bootstrap particle filter, several theoretical results describing the convergence of SMC methods as the number of particles tends to infinity have been established; see, e.g., Cappé et al. (2005); Del Moral (2004), and Del Moral (2013). A contribution of vital importance was made by Del Moral and Guionnet (1999), who established, under general assumptions, a central limit theorem (CLT) for standard SMC methods, a result that was later refined by, among others, Chopin (2004); Künsch (2005), and Douc and Moulines (2008). CLTs are essential in Monte Carlo simulation, as they allow the accuracy of produced estimates to be assessed in terms of confidence bounds. However, in the case of particle filters, the asymptotic variance of the weak Gaussian limit is generally intractable due to the recursive nature of these algorithms. Estimating the variance of SMC methods is therefore a challenging task, and although the literature on SMC is vast, only very few works are dedicated to this specific problem. Until just a couple of years ago, the only available way to estimate the particle-filter variance was the naive—and computationally very demanding—approach of calculating the sample variance across independent replicates of the particle filter; see Crisan et al. (2018) for a similar procedure in the context of parallelisation of SMC methods. An important step towards online variance estimation in particle filters was taken by Chan and Lai (2013), who developed an asymptotic-variance estimator that can be computed sequentially on the basis of a single realisation of the particle filter and without significant additional computational effort. In the same work, this estimator, which we will refer to as the Chan and Lai estimator (CLE), was shown to be consistent as the number of particles tends to infinity.
The CLE was later refined and analysed further by Lee and Whiteley (2018) and Du and Guyader (2021).

In a particle filter, the repeated resampling operations induce genealogical relations between the particles, allowing the estimator—the weighted empirical measure formed by the particles—to be split into terms corresponding to particle subpopulations obtained by stratifying the particle sample by the time-zero ancestors. At each iteration, the CLE is, simply put, given by the sample variance of these contributions with respect to the average of the full population. However, as time increases, the set of distinct time-zero ancestors depletes gradually, and eventually all the particles share one and the same time-zero ancestor. This particle-path degeneracy phenomenon makes the CLE collapse to zero in the long run. In order to remedy this issue and to push the technology towards truly online settings, Olsson and Douc (2019) devised a lag-based, numerically stable strategy in which the particle sample at time n is stratified by the ancestors at some more recent time \((n-\uplambda ) \vee 0\), where \(\uplambda \in {\mathbb {N}}\) is a fixed lag parameter. Such a procedure—which can still be implemented in an online fashion—avoids completely the issue of particle-path degeneracy at the cost of a bias induced by the lag. Still, under mild assumptions, satisfied also by models with non-compact state spaces, the authors managed to bound this bias uniformly in time by a quantity that decays geometrically with \(\uplambda \). The simulation study presented in the same work confirms the long-term stability of the produced estimates, which stay, when the lag is well chosen, very close to the ones produced by the naive estimator for arbitrarily long periods of time.
However, designing the lag parameter \(\uplambda \) is highly non-trivial as the optimal choice depends on the ergodicity properties of the model; indeed, the user faces a delicate bias–variance tradeoff in the sense that using too small a lag results in a numerically stable but significantly biased estimator, whereas using too large a lag eliminates the bias at the cost of the high variance implied by the same degeneracy issue as that affecting the CLE.

In this paper we develop further the lag-based approach of Olsson and Douc (2019) and propose an estimator that is capable of adapting automatically, by monitoring the degree of depletion of the ancestor sets, the size of the lag as the particles evolve. Like the fixed-lag method of Olsson and Douc (2019), our adaptive-lag variance (ALVar) estimator operates online with time-constant memory requirements, but does not require the calibration of any algorithmic parameter. Moreover, since it estimates the variance only on the basis of the genealogy of the propagated particle cloud, without additional simulations, the routine requires only minor code additions to the underlying particle algorithm and has a computational complexity that is linear in the number of particles, fully comparable to that of the particle filter itself. These appealing complexity properties are crucial in practical applications. As a comparison, the online approach to variance estimation in SMC methods recently proposed by Janati et al. (2022), relying on backward-sampling techniques, has, at best, a quadratic complexity in the number of particles, which is impractical for large particle sample sizes. In addition, just like the CLE, the estimator of Janati et al. (2022) also exhibits a decay towards zero for longer time horizons, even though this occurs at a lower rate than for the CLE. Unlike previous works on variance estimation in SMC, which focus on the standard bootstrap particle filter operating on Feynman–Kac models (Del Moral 2004), our estimator applies to more general auxiliary particle filters (APF, Pitt and Shephard 1999) and classes of models. In this setting, we show that the ALVar estimator is asymptotically consistent as the number of particles tends to infinity.
Moreover, we claim and illustrate numerically that the values of the lag chosen adaptively by the algorithm stay stable over time and increase, on the average, only logarithmically with the number of particles; the latter property is fundamental to avoid an excessive demand of computational resources in applications. Furthermore, we extend our estimator to particle filters with adaptive resampling, in which the selection operation is performed only when triggered by some criterion monitoring the particle weight degeneracy, yielding the first SMC variance estimator in that context.

The rest of the paper is structured as follows: in Sect. 2 we introduce some notation, our general model framework, SMC methods, and give some background to variance estimation in particle filters; in addition, we show that all the results obtained in the framework of Feynman–Kac models and the bootstrap particle filter can be extended to our framework and the APF. In Sect. 3 we present the ALVar estimator, prove its consistency, provide an extension to particle filters with adaptive resampling, and show how the estimator can be applied also in the context of online lag-based fixed-point marginal smoothing. Section 4 provides numerical simulations illustrating the algorithm on some classic state-space models. Finally, Appendices A and B provide some of the proofs of the results stated in Sects. 2 and 3.

2 Preliminaries

2.1 Notation

We denote by \({\mathbb {N}}\) the set of nonnegative integers and let \({\mathbb {N}}^*{:}{=}{\mathbb {N}}{\setminus }\{0\}\). For every \((m,n)\in {\mathbb {N}}^2\) such that \(m\le n\), we denote \(\llbracket m, n \rrbracket {:}{=}\{k\in {\mathbb {N}}:m\le k\le n\}\). Moreover, we let \({\mathbb {R}}_+\) and \({\mathbb {R}}_+^*\) be the sets of nonnegative and positive real numbers, respectively, and denote vectors by \(x_{m:n}{:}{=}(x_m,x_{m+1},\dots ,x_{n-1},x_n)\). For a finite collection \((p_i)_{i=1}^N\), \(N \in {\mathbb {N}}^*\), of nonnegative numbers, we denote by \(\textsf{Cat}((p_i)_{i=1}^N)\) the categorical distribution with sample space \(\llbracket 1, N \rrbracket \) and probability function \(\llbracket 1, N \rrbracket \ni i \mapsto p_i/\sum _{\ell =1}^N p_\ell \). For some general state space \(({\textsf{E}}, \mathcal {E})\) we let \({\textsf{M}}(\mathcal {E})\), \({\textsf{M}}_1(\mathcal {E})\), and \({\textsf{F}}(\mathcal {E})\) be the sets of measures, probability measures, and bounded measurable functions on \(({\textsf{E}},\mathcal {E})\), respectively. For any \(\mu \in {\textsf{M}}(\mathcal {E})\) and \(h \in {\textsf{F}}(\mathcal {E})\) we denote by \(\mu h {:}{=}\int h(x) \, \mu (dx)\) the Lebesgue integral of h with respect to \(\mu \).

The following kernel notation will be frequently used. Let \(({\textsf{E}}', \mathcal {E}')\) be another measurable space; then a (possibly unnormalised) transition kernel \({\textbf {K}}:{\textsf{E}} \times \mathcal {E}'\rightarrow {\mathbb {R}}_+\) induces the following operations. For any \(h \in {\textsf{F}}(\mathcal {E}')\) and \(\mu \in {\textsf{M}}(\mathcal {E})\) we may define the measurable function

$$\begin{aligned} {\textbf {K}} h: {\textsf{E}} \ni x \mapsto \int h(x') \, {\textbf {K}}(x, dx') \end{aligned}$$

as well as the measures

$$\begin{aligned} \mu {\textbf {K}}: \mathcal {E}' \ni A \mapsto \int {\textbf {K}}(x, A) \, \mu (dx) \quad \text {and} \quad \mu \otimes {\textbf {K}}: \mathcal {E}\otimes \mathcal {E}' \ni A \mapsto \iint \mathbb {1}_A(x, x') \, {\textbf {K}}(x, dx') \, \mu (dx). \end{aligned}$$

Now, let \(({\textsf{E}}'', \mathcal {E}'')\) be a third measurable state space and \({\textbf {L}}\) a possibly unnormalised transition kernel on \({\textsf{E}}' \times \mathcal {E}''\); then, similarly to the operations between measures and kernels, we may define the products

$$\begin{aligned} {\textbf {K}}{\textbf {L}}: {\textsf{E}} \times \mathcal {E}'' \ni (x, A) \mapsto \int {\textbf {L}}(x', A) \, {\textbf {K}}(x, dx') \quad \text {and} \quad {\textbf {K}}\otimes {\textbf {L}}: {\textsf{E}} \times (\mathcal {E}' \otimes \mathcal {E}'') \ni (x, A) \mapsto \iint \mathbb {1}_A(x', x'') \, {\textbf {L}}(x', dx'') \, {\textbf {K}}(x, dx'). \end{aligned}$$

2.2 Model setup

In order to define the distribution-flow model under consideration, let \(({\textsf{X}}_n,\mathcal {X}_n)_{n\in {\mathbb {N}}}\) be a sequence of measurable state spaces. We introduce unnormalised transition kernels \(({\textbf {L}}_{n})_{n \in {\mathbb {N}}}\), \({\textbf {L}}_{n}: {\textsf{X}}_n \times \mathcal {X}_{n+1}\rightarrow {\mathbb {R}}_+\), where each \({\textbf {L}}_{n}\) is such that \(\sup _{x_n \in {\textsf{X}}_n} {\textbf {L}}_{n} \mathbb {1}_{{\textsf{X}}_{n+1}}(x_n) < \infty \). For compactness, we write \({\textbf {L}}_{k,m} {:}{=}{\textbf {L}}_{k}{\textbf {L}}_{k+1}\cdots {\textbf {L}}_{m}\) whenever \(k \le m\), otherwise \( {\textbf {L}}_{k,m}=\text {id}\) by convention. In addition, we let \(\chi \) be some measure on \(\mathcal {X}_0\). Using these quantities we may define a flow \(\phi _{n} \in {\textsf{M}}_1(\mathcal {X}_n)\), \(n \in {\mathbb {N}}\), of probability distributions by letting, for every \(n \in {\mathbb {N}}\),

$$\begin{aligned} \phi _{n} = \frac{\chi {\textbf {L}}_{0,n-1}}{\chi {\textbf {L}}_{0,n-1}\mathbb {1}_{{\textsf{X}}_n}}. \end{aligned}$$
(2.1)
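Note that (2.1) implies the one-step recursion below, obtained by writing \(\chi {\textbf {L}}_{0,n} = (\chi {\textbf {L}}_{0,n-1}){\textbf {L}}_{n}\) and normalising; this is the recursive form mimicked by the sequential algorithms of Sect. 2.3:

$$\begin{aligned} \phi _{n+1} = \frac{\phi _{n}{\textbf {L}}_{n}}{\phi _{n}{\textbf {L}}_{n}\mathbb {1}_{{\textsf{X}}_{n+1}}}, \quad n \in {\mathbb {N}}. \end{aligned}$$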

Example 1

(Feynman–Kac models) Feynman–Kac models are applied in a variety of scientific fields such as statistics, physics, biology, and signal processing; see Del Moral (2004) for a broad coverage of the topic. For every \(n\in {\mathbb {N}}\), let \({\textbf {M}}_{n}:{\textsf{X}}_n \times \mathcal {X}_{n+1}\rightarrow [0,1]\) be a Markov transition kernel, \(g_{n}: {\textsf{X}}_n \rightarrow {\mathbb {R}}_+\) a measurable potential function, and \(\nu \) a probability measure on \(({\textsf{X}}_0,\mathcal {X}_0)\). Then the Feynman–Kac model \((\phi _{n})_{n \in {\mathbb {N}}}\) induced by \(\nu \) and \((({\textsf{X}}_n, \mathcal {X}_n), {\textbf {M}}_{n}, g_{n})_{n \in {\mathbb {N}}}\) is given by (2.1) with

$$\begin{aligned} {\textbf {L}}_{n}h_{n+1}(x_n)= {\textbf {M}}_{n}(g_{n+1} h_{n+1})(x_n), \quad h_{n+1}\in {\textsf{F}}(\mathcal {X}_{n+1}), \quad x_n \in {\textsf{X}}_n. \end{aligned}$$

Example 2

(Partially dominated state-space models) General state-space models (SSMs) constitute an important modelling tool in a diversity of scientific and engineering disciplines; see, e.g., Cappé et al. (2005) and the references therein. An SSM consists of a bivariate Markov chain \((X_n, Y_n)_{n \in {\mathbb {N}}}\) evolving on some measurable product space according to a Markov transition kernel constructed on the basis of two other Markov kernels \({\textbf {M}}:{\textsf{X}}\times \mathcal {X}\rightarrow [0,1]\) and \({\textbf {G}}:{\textsf{X}}\times \mathcal {Y}\rightarrow [0,1]\) as

$$\begin{aligned} ({\textsf{X}}\times {\textsf{Y}}) \times (\mathcal {X}\otimes \mathcal {Y}) \ni ((x, y), A) \mapsto \iint \mathbb {1}_A(x', y') \, {\textbf {G}}(x', dy') \, {\textbf {M}}(x, dx'). \end{aligned}$$

The chain is initialised according to \((X_0, Y_0) \sim \nu \otimes {\textbf {G}}\), i.e., \(X_0 \sim \nu \) and \(Y_0 \sim {\textbf {G}}(X_0, \cdot )\), where \(\nu \) is some probability measure on \(({\textsf{X}}, \mathcal {X})\). In this setting, only the process \((Y_n)_{n \in {\mathbb {N}}}\) is observed, whereas the process \((X_n)_{n \in {\mathbb {N}}}\)—referred to as the state process—is unobserved and hence referred to as hidden. With this construction, it can be shown (see Cappé et al. 2005, Section 2.2, for details), first, that the state process is itself a Markov chain with transition kernel \({\textbf {M}}\) and, second, that the observations \((Y_n)_{n \in {\mathbb {N}}}\) are conditionally independent given \((X_n)_{n \in {\mathbb {N}}}\), with marginal emission distributions \(Y_n \sim {\textbf {G}}(X_n, \cdot )\), \(n \in {\mathbb {N}}\). We assume that the model is partially dominated, i.e., that the kernel \({\textbf {G}}\) admits a transition density \(g:{\textsf{X}}\times {\textsf{Y}}\rightarrow {\mathbb {R}}_+\) with respect to some reference measure \(\mu \).

Many practical applications of SSMs call for computation of flows of hidden-state posteriors given a sequence \((y_n)_{n \in {\mathbb {N}}}\) of observations. In particular, the flow \((\phi _{n})_{n \in {\mathbb {N}}}\) of filter distributions, each filter \(\phi _{n}\) being the conditional distribution of the state \(X_n\) at time n given \(Y_{0:n} = y_{0:n}\), can be expressed as a Feynman–Kac model with \(({\textsf{X}}_n, \mathcal {X}_n)=({\textsf{X}},\mathcal {X})\), \({\textbf {M}}_{n}={\textbf {M}}\), and \(g_{n}(x){:}{=}g(x,y_n)\) for all \(n \in {\mathbb {N}}\); see Cappé et al. (2005, Section 3.1) for details. Inspired by this terminology, we will sometimes refer to each distribution \(\phi _{n}\) in the general flow defined by (2.1) as the filter at time n.

2.3 Sequential Monte Carlo methods

In the following we assume that all random variables are well defined on a common probability space \((\Omega ,\mathcal {F},{\mathbb {P}})\). As mentioned in the introduction, we may approximate recursively the distribution sequence \((\phi _{n})_{n\in {\mathbb {N}}}\) by propagating a random sample \((\xi _{n}^{i},\omega _{n}^{i})_{i=1}^N\) of particles and associated weights. Here \(N \in {\mathbb {N}}^*\) is the Monte Carlo sample size. More precisely, at each time step, the filter distribution \(\phi _{n}\) is approximated by the weighted empirical measure

$$\begin{aligned} \phi _{n}^{N} {:}{=}\sum _{i=1}^N\frac{\omega _{n}^{i}}{\Omega _{n}}\delta _{\xi _{n}^{i}}, \end{aligned}$$

where \(\Omega _n{:}{=}\sum _{i=1}^N\omega _{n}^{i}\) and \(\delta _{\xi _{n}^{i}}\) is the Dirac measure located at \(\xi _{n}^{i}\). The APF propagates the sample \((\xi _{n}^{i},\omega _{n}^{i})_{i=1}^N\) recursively as follows. The algorithm is initialised by standard importance sampling, drawing \(\xi _{0}^{i} \sim \nu \) independently for \(i \in \llbracket 1, N \rrbracket \), where \(\nu \in {\textsf{M}}_1(\mathcal {X}_0)\) is some proposal distribution dominating \(\chi \), and letting \(\omega _{0}^{i}\leftarrow \gamma _{-1}(\xi _{0}^{i})\) for each i, where \(\gamma _{-1}\) is the Radon–Nikodym derivative of \(\chi \) with respect to \(\nu \). The auxiliary functions \((\vartheta _{n})_{n\in {\mathbb {N}}}\), where \(\vartheta _{n}\in {\textsf{F}}(\mathcal {X}_n)\), are introduced in order to favour the resampling of particles that are more likely to be propagated into regions of high likelihood (as measured by the target distributions). The particles are propagated according to some proposal Markov transition kernels \({\textbf {P}}_{n}\), \(n\in {\mathbb {N}}\). These kernels are such that, for each \(n\in {\mathbb {N}}\) and \(x_n\in {\textsf{X}}_n\), the measure \({\textbf {L}}_{n}(x_n,\cdot )\) is absolutely continuous with respect to the probability measure \({\textbf {P}}_{n}(x_n,\cdot )\). Hence, given \(x_n\), there is a Radon–Nikodym derivative \(\gamma _{n}(x_n,\cdot )\) such that for every \(x_n\in {\textsf{X}}_n\) and \(h\in {\textsf{F}}(\mathcal {X}_{n+1})\),

$$\begin{aligned} \int h(x_{n+1}) \, {\textbf {L}}_{n}(x_n,dx_{n+1}) = \int h(x_{n+1}) \gamma _{n}(x_n,x_{n+1}) \,{\textbf {P}}_{n}(x_n,dx_{n+1}). \end{aligned}$$

Algorithm 1 shows one iteration of the APF. In the following we will express one iteration of the APF as \((\xi _{n+1}^{i},\omega _{n+1}^{i}, I_{n+1}^{i})_{i=1}^N\leftarrow {{\textsf{P}}}{{\textsf{F}}}((\xi _{n}^{i},\omega _{n}^{i})_{i=1}^N)\), where also the resampled indices are included in the output for reasons that will be clear later.

Algorithm 1: one iteration of the APF.
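In code, one such iteration might be sketched as follows; this is a simplified, numpy-based illustration in which `theta`, `proposal_sample`, and `gamma` are hypothetical callables standing in for \(\vartheta _{n}\), \({\textbf {P}}_{n}\), and \(\gamma _{n}\), respectively.

```python
import numpy as np

def apf_step(particles, weights, theta, proposal_sample, gamma, rng):
    """One APF iteration: auxiliary selection, mutation, and reweighting."""
    n = len(particles)
    # Selection: resample proportionally to the auxiliary-adjusted weights.
    adjusted = weights * theta(particles)
    idx = rng.choice(n, size=n, p=adjusted / adjusted.sum())
    parents = particles[idx]
    # Mutation: propagate through the proposal kernel P_n.
    children = proposal_sample(parents, rng)
    # Reweight by the Radon-Nikodym derivative, compensating for theta.
    new_weights = gamma(parents, children) / theta(parents)
    return children, new_weights, idx
```

Returning `idx` corresponds to including the resampled indices \((I_{n+1}^{i})_{i=1}^N\) in the output of \({{\textsf{P}}}{{\textsf{F}}}\), as in the text above.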

As mentioned in the introduction, the first proof of the CLT for SMC methods obtained by Del Moral and Guionnet (1999) has been refined and generalised in a number of papers. The following theorem provides a CLT for APFs in the general model context of Sect. 2.2, and follows immediately from the more general result of Mastrototaro et al. (2022, Theorem B.6).

Assumption 1

For every \(n \in {\mathbb {N}}\), \(\vartheta _{n} \in {\textsf{F}}(\mathcal {X}_n)\) and \(\gamma _{n}/\vartheta _{n} \in {\textsf{F}}(\mathcal {X}_n)\). In addition, \(\gamma _{-1} \in {\textsf{F}}(\mathcal {X}_0)\).

Theorem 2.1

Let Assumption 1 hold. Then for every \(n\in {\mathbb {N}}\) and \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), as \(N\rightarrow \infty \),

$$\begin{aligned} \sqrt{N}(\phi _{n}^{N}h_{n}-\phi _{n}h_{n})\overset{{\mathcal {D}}}{\longrightarrow }\sigma _n(h_{n})Z, \end{aligned}$$
(2.2)

with Z being standard normally distributed and \( \sigma _{n}^2(h_{n}) {:}{=}\sigma _{0,n}^2(h_{n})\), where, for \(\ell \in \llbracket 0, n \rrbracket \),

$$\begin{aligned} \sigma _{\ell ,n}^2(h_{n}) {:}{=}{}& \frac{\chi (\gamma _{-1}\{{\textbf {L}}_{0,n-1}(h_{n}-\phi _{n}h_{n})\}^2)}{(\chi {\textbf {L}}_{0,n-1}\mathbb {1}_{{\textsf{X}}_n})^2} \mathbb {1}_{\{\ell =0\}} \\ &+ \sum _{m=(\ell -1)\vee 0}^{n-1} \phi _{m}\vartheta _{m} \frac{\phi _{m}{\textbf {L}}_{m}(\vartheta _{m}^{-1}\gamma _{m} \{{\textbf {L}}_{m+1,n-1}(h_{n}-\phi _{n}h_{n})\}^2 )}{(\phi _{m} {\textbf {L}}_{m,n-1} \mathbb {1}_{{\textsf{X}}_n})^2}. \end{aligned}$$
(2.3)

The truncated asymptotic variance \(\sigma _{\ell ,n}^2(h_{n})\), corresponding to \(\ell > 0\), will be useful later on.

The present paper focuses on estimating online, as n increases and while the particle sample is propagated, the sequence of the asymptotic variances in (2.2). Before presenting our online variance estimator, the next section provides a brief overview of some current approaches.

2.4 Estimation of asymptotic variance

As touched upon in the introduction, a naive approach to variance estimation in particle filters is to use a brute-force strategy which runs a sufficiently large number \(K \in {\mathbb {N}}^*\) of independent particle filters. Then the asymptotic variance of interest can be estimated by multiplying the sample variance of these filter approximations by N. However, having \({\mathcal {O}}(KN)\) complexity, where N as well as K should be sufficiently large to provide precise filter and variance estimates, respectively, this approach is clearly computationally impractical. Moreover, implementing this procedure in an online fashion requires all the samples of each particle filter to be stored, implying also an \({\mathcal {O}}(KN)\) memory requirement.
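As a sketch of this brute-force strategy, suppose a user-supplied routine `run_filter_estimate` (a name of our choosing) runs one independent particle filter and returns \(\phi _{n}^{N}h_{n}\); the naive estimator then reads:

```python
import numpy as np

def brute_force_variance(run_filter_estimate, n_particles, n_runs, rng):
    """Estimate the asymptotic variance as N times the sample variance of
    the filter estimates across n_runs independent replicates."""
    estimates = np.array(
        [run_filter_estimate(n_particles, rng) for _ in range(n_runs)])
    return n_particles * estimates.var(ddof=1)
```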

Appealingly, the online approach devised by Chan and Lai (2013) estimates consistently the sequence of asymptotic variances based only on the cloud of evolved particles and without requiring the execution of multiple SMC algorithms in parallel or any additional simulations. This is made possible by keeping track, as n increases, of the so-called Eve indices \((E_{n}^{i})_{i=1}^N\) (borrowing the terminology of Lee and Whiteley (2018)) identifying the particles at time zero from which the ones at time n originate, in the sense that \(E_{n}^{i}\) denotes the index of the time-zero ancestor of particle \( \xi _{n}^{i}\). These indices can be traced iteratively in the particle filter by initially letting, for all \(i\in \llbracket 1, N \rrbracket \), \(E_{0}^{i}\leftarrow i\) and then, as n increases, updating them according to \(E_{n+1}^{i}\leftarrow E_{n}^{I_{n+1}^{i}}\). Such updates are straightforwardly implemented by adding one line of code after the selection operation on Line 2 in Algorithm 1. Then the CLE of \( \sigma _{n}^2(h_{n})\) is, for any \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), given by

$$\begin{aligned} {\hat{\sigma }}_{n}^2(h_{n}){:}{=}N \sum _{i=1}^N\left( \sum _{j:E_{n}^{j}=i}\frac{\omega _{n}^{j}}{\Omega _{n}}\{h_{n}(\xi _{n}^{j})-\phi _{n}^{N}h_{n}\}\right) ^2. \end{aligned}$$
(2.4)

As a main result, Chan and Lai (2013) established the consistency of this estimator, in the sense that for every \(n\in {\mathbb {N}}\) and \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), \({\hat{\sigma }}_{n}^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n}^2(h_{n})\) as N tends to infinity. Although groundbreaking in theory, the estimator (2.4) suffers from a severe drawback in practice due to the particle-path degeneracy phenomenon. Indeed, because of the resampling operation, at each iteration of the filter some particles will inevitably be propagated from the same parent particle. Thus, eventually, when n is large enough, all particles will share the same time-zero ancestor, i.e., there will exist \(i_0\in \llbracket 1, N \rrbracket \) such that \(E_{n}^{i}=i_0\), for all \(i\in \llbracket 1, N \rrbracket \). Recently, Koskela et al. (2020) showed, under some standard mixing assumptions on the model, that the number of iterations needed to make the genealogical paths of the particles coalesce in this way is \({\mathcal {O}}(N)\). Hence, eventually the estimate (2.4) collapses to zero, which makes it unusable for large values of n. In practice, the estimator exhibits poor accuracy and high variability already when the Eve indices take on only a few distinct values, as the variance estimates will then be based on only a few distinct strata.
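In code, the estimator (2.4) may be sketched as follows (a numpy-based illustration; the function name is ours). The array `eve` holds zero-based Eve indices and is refreshed after each selection step by `eve = eve[idx]`, `idx` being the resampled indices.

```python
import numpy as np

def cle_variance(h_vals, weights, eve):
    """Chan-Lai estimator (2.4): stratify the weighted, centred terms by the
    Eve indices and sum the squared stratum totals."""
    n = len(h_vals)
    wn = weights / weights.sum()
    phi_h = np.dot(wn, h_vals)          # self-normalised estimate of phi_n h_n
    contrib = wn * (h_vals - phi_h)     # per-particle centred contributions
    # Sum the contributions within each Eve-index stratum.
    strata = np.bincount(eve, weights=contrib, minlength=n)
    return n * np.sum(strata ** 2)
```

Once all entries of `eve` coincide, the single stratum total cancels to zero, which reproduces the collapse described above.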

In order to remedy this issue, Olsson and Douc (2019) suggest to, rather than tracing the time-zero ancestors, estimate the variance on the basis of the ancestors in some more recent generation. For this purpose, they introduce the Enoch indices defined recursively, for all \(i\in \llbracket 1, N \rrbracket \) and \(m\in \llbracket 0, n+1 \rrbracket \), by

$$\begin{aligned} E_{m,n+1}^{i}{:}{=}{\left\{ \begin{array}{ll} i\quad &{}\text {for }m=n+1,\\ E_{m,n}^{I_{n+1}^{i}}&{}\text {for }m<n+1. \end{array}\right. } \end{aligned}$$
(2.5)

In words, \(E_{m,n}^{i}\) indicates the index of the ancestor at time \(m \le n\) of particle i at time n; moreover, notice that when \(m=0\), these indices correspond to the Eve indices. Then, letting \(n\langle \uplambda \rangle {:}{=}(n-\uplambda )\vee 0\) for some lag \(\uplambda \in {\mathbb {N}}\), the CLE (2.4) is replaced by the modified estimator

$$\begin{aligned} {\hat{\sigma }}_{n,\uplambda }^2(h_{n}){:}{=}N \sum _{i=1}^N\left( \sum _{j:E_{n\langle \uplambda \rangle ,n}^{j}=i}\frac{\omega _{n}^{j}}{\Omega _{n}}\{h_{n}(\xi _{n}^{j})-\phi _{n}^{N}h_{n}\}\right) ^2. \end{aligned}$$
(2.6)

Since for a given lag \( \uplambda \), the number \(n\langle \uplambda \rangle \) of the generation to which the Enoch indices underpinning the estimator (2.6) refer varies with n, the algorithm requires the storage and iterative updating of a window \((E_{n\langle \uplambda \rangle ,n}^{i}, \dots , E_{n,n}^{i})_{i = 1}^N\) of Enoch indices. One iteration of the procedure is shown in Algorithm 2, which is initialised by generating the initial particle cloud as in Algorithm 1 and letting, in addition, \(E_{0,0}^{i}\leftarrow i\) for all \(i\in \llbracket 1, N \rrbracket \). We observe that the memory requirement and computational complexity of each iteration of the algorithm are both \({\mathcal {O}}(\uplambda N)\), independently of the time index n.

Algorithm 2: one iteration of the fixed-lag variance estimation procedure.

The estimator (2.6) is not consistent for the asymptotic variance \(\sigma _{n}^2(h_{n})\) as N tends to infinity; still, Olsson and Douc (2019, Proposition 8) showed that for all \(\uplambda \in {\mathbb {N}}\) and \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), \({\hat{\sigma }}_{n,\uplambda }^2(h_{n})\) converges to \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\) in probability as N tends to infinity, where \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\) is the truncated asymptotic variance given by (2.3), a quantity that is always smaller than the true asymptotic variance. Additional theoretical results (Olsson and Douc 2019, Section 4) establish that under mild, verifiable model assumptions, the asymptotic bias induced by the truncation decays geometrically fast with \(\uplambda \) (uniformly in n).
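The window update of Algorithm 2 and the estimator (2.6) may be sketched as follows (a numpy-based illustration; names are ours). Each stored generation is rewired through the resampled indices, the identity is appended for the newest generation, and the oldest generation is dropped once more than \(\uplambda + 1\) generations are stored.

```python
import numpy as np

def update_enoch_window(window, idx, lag):
    """Shift the Enoch-index window after one selection step."""
    n_particles = len(idx)
    new_window = [e[idx] for e in window]      # E_{m,n+1}^i = E_{m,n}^{I_{n+1}^i}
    new_window.append(np.arange(n_particles))  # E_{n+1,n+1}^i = i
    if len(new_window) > lag + 1:
        new_window.pop(0)                      # keep at most lag + 1 generations
    return new_window

def lag_variance(h_vals, weights, enoch_oldest):
    """Fixed-lag estimator (2.6), stratifying by the oldest stored generation."""
    n = len(h_vals)
    wn = weights / weights.sum()
    contrib = wn * (h_vals - np.dot(wn, h_vals))
    strata = np.bincount(enoch_oldest, weights=contrib, minlength=n)
    return n * np.sum(strata ** 2)
```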

The results of Olsson and Douc (2019) were derived in the context of Feynman–Kac models and standard bootstrap particle filters, which is a more restrictive setting than the one considered here. Still, interestingly, it is possible to show that a general APF operating on a general distribution flow in the form (2.1) can actually be interpreted as a standard bootstrap filter operating on a certain auxiliary, extended Feynman–Kac model. Thus, using this trick, which is described in detail in Appendix A, we are able to extend the consistency results obtained by Olsson and Douc (2019) to the general setting of the present paper. This is the contents of Theorem 2.2, whose proof is found in Appendix A.

Assumption 2

For all \(n\in {\mathbb {N}}\) and \((x_n,x_{n+1})\in {\textsf{X}}_n\times {\textsf{X}}_{n+1}\),

$$\begin{aligned} \frac{\gamma _{n}(x_n,x_{n+1})\vartheta _{n+1}(x_{n+1}) }{\vartheta _{n}(x_n)}>0 \end{aligned}$$

and

$$\begin{aligned} \sup _{(x_n,x_{n+1})\in {\textsf{X}}_n\times {\textsf{X}}_{n+1}}\frac{\gamma _{n}(x_n,x_{n+1})\vartheta _{n+1}(x_{n+1}) }{\vartheta _{n}(x_n)}<\infty . \end{aligned}$$

Moreover, for all \(x_0\in {\textsf{X}}_0\),

$$\begin{aligned} \gamma _{-1}(x_0)\vartheta _{0}(x_0)>0\quad \text {and}\quad \sup _{x_0\in {\textsf{X}}_0}\gamma _{-1}(x_0)\vartheta _{0}(x_0)<\infty . \end{aligned}$$

Theorem 2.2

Let Assumptions 1 and 2 hold. Then for every \(n\in {\mathbb {N}}\), \(\uplambda \in {\mathbb {N}}\), and \(h_{n} \in {\textsf{F}}(\mathcal {X}_n)\), as \(N \rightarrow \infty \),

$$\begin{aligned} {\hat{\sigma }}_{n,\uplambda }^2(h_{n}) \overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n}). \end{aligned}$$

The main practical issue with the lag-based approach of Olsson and Douc (2019) is that the design of an optimal lag might be a difficult task. Using too large a lag implies, as for the CLE, depletion of the set of ancestors supporting the estimator, leading to high variance; on the other hand, using too small a lag decreases this variance, however at the cost of significant underestimation of the asymptotic variance of interest. The fact that the asymptotic bias decreases geometrically fast suggests that we should obtain a good approximation of the asymptotic variance even for moderate values of \(\uplambda \), but quantifying this optimal lag size may be a laborious task. In the numerical simulations of Olsson and Douc (2019), the algorithm is run multiple times for several distinct values of \(\uplambda \), whereupon the variance estimates obtained in this manner are compared to that obtained using the naive estimator in order to determine the empirically best lag. This method is not ideal as it requires extensive preliminary computations and does not take into account the possibility of varying the lag as the particles evolve. Instead, it is desirable to keep the lag as large as possible as long as the estimator is of good quality (in some sense) and decrease it whenever some degeneracy, determined by the depletion of the Enoch indices, is detected. This argument will be developed further in the next section, leading to the design of a fully adaptive approach.

3 Main results

3.1 The ALVar estimator

We first need to identify a criterion to determine an optimal lag at a given iteration n. We have previously discussed the bias–variance tradeoff, which usually arises when the objective is to minimise the mean-squared error (MSE) of an estimator with respect to the estimand of interest. For every \(n \in {\mathbb {N}}\), \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), and \(\uplambda \in {\mathbb {N}}\), the MSE of the estimator (2.6) can be written as the sum of its variance and its squared bias according to

$$\begin{aligned} {\mathbb {E}}\left[ \left( {\hat{\sigma }}_{n,\uplambda }^2(h_{n})-\sigma _{n}^2(h_{n})\right) ^2\right] ={}&{\text {Var}}\left( {\hat{\sigma }}_{n,\uplambda }^2(h_{n})\right) \\ &+\left( {\mathbb {E}}\left[ {\hat{\sigma }}_{n,\uplambda }^2(h_{n})\right] -\sigma _{n}^2(h_{n})\right) ^2. \end{aligned}$$
(3.1)

Our intention is to design a routine for adapting the lag \(\uplambda \) in such a way that (3.1) is minimised. Although closed-form expressions for the expectation and the variance of the lag-based estimator in (3.1) are not available, we may make the following observations.

  • Since \({\hat{\sigma }}_{n,\uplambda }^2(h_{n})\) tends to \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\) in probability as the number N of particles tends to infinity, where \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\le \sigma _{n}^2(h_{n})\) for all \(\uplambda \), and the difference decreases as \(\uplambda \) approaches n, we may assume that also the non-asymptotic bias is reduced when \(\uplambda \) increases.

  • On the other hand, the larger the value of \(\uplambda \), the fewer distinct elements the set \((E_{n\langle \uplambda \rangle ,n}^{i})_{i=1}^N\) of Enoch indices contains, causing an increase of the variance of the estimator (2.6); see Fig. 5.

The reduction of the number of distinct Enoch indices may be tolerated as long as an increase of the lag is beneficial for the reduction of the bias, but at some point the behaviour becomes pathological. Imagine, for instance, that we use the CLE in the early iterations of the particle filter for estimating the variance; then, at some time n, we may find that there exists some \(\uplambda \in \llbracket 0, n-1 \rrbracket \) for which \({\hat{\sigma }}_{n,\uplambda }^2(h_{n}) > {\hat{\sigma }}_{n}^2(h_{n})\), although their asymptotic values are supposed to be in the opposite order and the lag-based estimator is expected to be less variable. This suggests that the Eve indices might be depleted and no longer reliable for supporting the variance estimator. It is then reasonable to assume that these will be unreliable also in subsequent steps, since their degeneracy can only get worse. Extending this idea to the Enoch indices, we may define recursively the concept of depleted Enoch indices.

Definition 3.1

Let \((h_{n})_{n\in {\mathbb {N}}}\) be a given sequence of functions such that \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\) for all n. The Enoch indices \((E_{m,n}^{i})_{i=1}^N\) are said to be depleted if at least one of the following conditions is satisfied:

  (i)

    the Enoch indices \((E_{m,n-1}^{i})_{i=1}^N\) are depleted;

  2. (ii)

    the Enoch indices \((E_{m-1,n}^{i})_{i=1}^N\) are depleted and, letting \(\uplambda :=n-m\), there exists \(\uplambda '\in \llbracket 0, \uplambda -1 \rrbracket \) such that \( {\hat{\sigma }}_{n,\uplambda }^2(h_{n})<{\hat{\sigma }}_{n,\uplambda '}^2(h_{n}) \).

By convention, for every \(n\in {\mathbb {N}}\), the Enoch indices \((E_{n,n}^{i})_{i=1}^N\) are never depleted, while \((E_{-1,n}^{i})_{i=1}^N\) are always depleted (even if these indices are never explicitly defined).

In order to check the depletion status of some indices \((E_{m,n}^{i})_{i=1}^N\) using Definition 3.1 we need to know the status of previous generations. Thus, in practice, depletion may be determined iteratively forwards in time, starting from \((E_{0,0}^{i})_{i=1}^N\), which are not depleted by definition. Then for every \(n\in {\mathbb {N}}^*\), knowing whether the indices \((E_{m,n-1}^{i})_{i=1}^N\) are depleted or not for all \(m\in \llbracket 0, n-1 \rrbracket \), it is possible to check the same for \((E_{m,n}^{i})_{i=1}^N\) starting from \(m=0\) and proceeding forwards to \(m = n\). This is done by checking first condition (i) in Definition 3.1; if this is not satisfied, then we check condition (ii). The idea behind condition (i) is that if a set \((E_{m,n-1}^{i})_{i=1}^N\) of Enoch indices is ill-suited to estimate the variance at some time \(n-1\), it will not be suited to estimate the variance at any future time, since the number of distinct elements in the set can only decrease with n. Regarding condition (ii), if instead the indices \((E_{m,n-1}^{i})_{i=1}^N\) are non-depleted, we still need to check if there is a more recent generation \((E_{m',n}^{i})_{i=1}^N\), \(m' \in \llbracket m + 1, n \rrbracket \), that produces a better estimate. The additional requirement of \((E_{m-1,n}^{i})_{i=1}^N\) being depleted serves to guarantee monotonicity, i.e., if \((E_{m,n}^{i})_{i=1}^N\) are depleted, then \((E_{m',n}^{i})_{i=1}^N\) should be as well for all \(m'\in \llbracket 0, m \rrbracket \), whereas if \((E_{m,n}^{i})_{i=1}^N\) are non-depleted, then \((E_{m',n}^{i})_{i=1}^N\) should not be either for all \(m'\in \llbracket m, n \rrbracket \) (Fig. 1).
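The forward determination of depletion statuses described above can be sketched as follows (a minimal illustration under our own naming, assuming the fixed-lag estimates \({\hat{\sigma }}_{n,\uplambda }^2(h_{n})\) for all candidate lags are available as an array):

```python
def update_depletion(depleted_prev, sigma2_hat_n, n):
    """Forward depletion check at time n per Definition 3.1.

    depleted_prev[m]  -- status of (E_{m,n-1}), m = 0, ..., n - 1
    sigma2_hat_n[lam] -- fixed-lag estimates for lag lam = 0, ..., n at time n
    """
    depleted = [False] * (n + 1)  # (E_{n,n}) is never depleted by convention
    for m in range(n):            # proceed forwards from m = 0 to m = n - 1
        # condition (i): depleted at time n - 1 implies depleted at time n
        if depleted_prev[m]:
            depleted[m] = True
            continue
        # condition (ii): (E_{m-1,n}) depleted -- with (E_{-1,n}) depleted by
        # convention -- and some shorter lag gives a strictly larger estimate
        lam = n - m
        prev_depleted = True if m == 0 else depleted[m - 1]
        if prev_depleted and any(sigma2_hat_n[lam] < sigma2_hat_n[lp]
                                 for lp in range(lam)):
            depleted[m] = True
    return depleted
```

By construction the returned statuses are monotone in m: depleted generations always precede non-depleted ones, in agreement with the monotonicity discussed above.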

Fig. 1 The points in the plot correspond to fixed-lag estimates of \(\sigma _{n}^2(\text {id})\), \(n = 500\), for the stochastic volatility model in Sect. 4.1, with different values of \(\uplambda \). The particle filter used \(N=1000\) particles. Each estimate is marked differently depending on whether the corresponding Enoch indices \((E_{n\langle \uplambda \rangle ,n}^{i})_{i=1}^N\) are depleted or not. In addition, we indicate which of the two conditions in Definition 3.1 indicates depletion. For \(\uplambda \ge 25\), condition (i) was always fulfilled, while for \(\uplambda \in \llbracket 14, 24 \rrbracket \) condition (ii) (and not (i)) was fulfilled

Algorithm 3 describes our method, the adaptive-lag variance (ALVar) estimator, in which the optimal lag at each iteration is, as established by Theorem 3.2 below, the largest one for which the corresponding Enoch indices are not depleted. This non-depletion condition is ensured by recursively selecting \(\uplambda _{n+1}\) as the lag, bounded from above by \(\uplambda _n+1\), that produces the largest estimate. The lag is initialised by setting \(\uplambda _0\leftarrow 0\).


Theorem 3.2

For every \(n \in {\mathbb {N}}\), let \(\uplambda _{n}\) be the lag produced by n iterations of Algorithm 3. Then if \(\uplambda _n<n\), none of the Enoch indices \((E_{m,n}^{i})_{i=1}^N\), \(m\in \llbracket n\langle \uplambda _n\rangle , n \rrbracket \), are depleted whereas all the Enoch indices \((E_{m,n}^{i})_{i=1}^N\), \(m\in \llbracket 0, n\langle \uplambda _n\rangle -1 \rrbracket \), are depleted.

Proof

We proceed by induction. The claim is true for \(n=0\) since we initialise \(\uplambda _0\leftarrow 0\). Now, let the claim be true for some \(n\in {\mathbb {N}}\); then by the induction hypothesis and condition (i) of Definition 3.1, it holds that \((E_{m,n+1}^{i})_{i=1}^N\) are depleted for every \(m \in \llbracket 0, n \langle \uplambda _n \rangle -1 \rrbracket \) if \(\uplambda _n<n\), where \(n \langle \uplambda _n \rangle - 1 = (n + 1) \langle \uplambda _n+1\rangle -1\). On the other hand, by the induction hypothesis and the very construction of \(\uplambda _{n + 1}\) in Algorithm 3, none of the depletion conditions of Definition 3.1 are satisfied for \(m \in \llbracket (n+1) \langle \uplambda _{n+1} \rangle , n+1 \rrbracket \); hence, the corresponding Enoch indices are not depleted. If \(\uplambda _{n+1} < \uplambda _n+1\), then, again by the construction of \(\uplambda _{n + 1}\), \((E_{m,n+1}^{i})_{i=1}^N\) are depleted for \(m\in \llbracket (n+1)\langle \uplambda _n+1\rangle , (n+1)\langle \uplambda _{n+1}\rangle -1 \rrbracket \) as well by condition (ii). This concludes the proof. \(\square \)

The computation of the estimator (2.6) has complexity \({\mathcal {O}}(N)\) and is performed \(\uplambda _n+2\) times at each iteration n. In order to have an online algorithm with constant memory requirements we need \(\uplambda _n\) to be uniformly bounded in n. Although in theory the lag might increase indefinitely such that \(\uplambda _n=n\) for all \(n\in {\mathbb {N}}\), we may assume that there exists an upper bound on the lag for any fixed number N of particles. In support of this assumption, we know that the expected number of generations to the time where all the Enoch indices are equal, which is certainly larger than any lag selected by the proposed method, is \({\mathcal {O}}(N)\) uniformly in n; see Koskela et al. (2020). Thus, in practice there will generally exist some \(\uplambda _\text {max}\), depending on the model and on N but independent of n, such that \(\uplambda _n<\uplambda _\text {max}\) for all \(n\in {\mathbb {N}}\). Hence, the final algorithm is online, since it has both complexity and memory demand (again dominated by the storage of the Enoch indices) of order \({\mathcal {O}}(\uplambda _\text {max}N)\), independently of n, and adaptive, since the choice of each new lag is adapted to the output of the particle filter as well as the lag of the previous iteration. In the next section we are going to prove consistency of the estimator and present a heuristic argument concerning the dependence of the lag on the number of particles.
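To make the preceding complexity discussion concrete, the estimator (2.6) and one lag update of Algorithm 3 can be sketched as follows (a minimal NumPy illustration under our own naming; `enoch_by_lag[lam]` is assumed to hold the indices \((E_{n\langle \uplambda \rangle ,n}^{i})_{i=1}^N\) as a zero-based integer array):

```python
import numpy as np

def sigma2_hat(h_vals, w, enoch, phi_hat):
    """Fixed-lag estimator (2.6): N * sum_i (sum_{j: E^j = i} (w^j/W)(h(xi^j) - phi^N h))^2."""
    N = len(w)
    contrib = (w / w.sum()) * (h_vals - phi_hat)              # O(N) work
    per_root = np.bincount(enoch, weights=contrib, minlength=N)
    return N * np.sum(per_root ** 2)

def alvar_step(h_vals, w, enoch_by_lag, phi_hat, lam_prev):
    """One lag update: the new lag maximises the estimate over [0, lam_prev + 1]."""
    cands = [sigma2_hat(h_vals, w, enoch_by_lag[lam], phi_hat)
             for lam in range(lam_prev + 2)]                  # lam_prev + 2 evaluations
    lam = int(np.argmax(cands))
    return lam, cands[lam]
```

Note that each call to `sigma2_hat` is \({\mathcal {O}}(N)\), matching the stated per-iteration cost of \(\uplambda _n+2\) evaluations.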

3.2 Theoretical results

Next, we show that for every \(n \in {\mathbb {N}}\), the adaptive-lag estimator constructed in the previous section is asymptotically consistent for the true asymptotic variance \(\sigma _{n}^2(h_{n})\), recalling however that the algorithm is meant to work in the regime where N is fixed and n is arbitrarily large. The ‘asymptotic’ algorithm is not online, since we are going to show that for all \(n\in {\mathbb {N}}\), \(\uplambda _n\) tends to n in probability as N grows, implying that in the limit we obtain the CLE at each step. Nevertheless, as we will see later, for a fixed number of particles, the range of the lags returned by the algorithm is expected to grow very slowly with N; more precisely, in Sect. 3.2.2 we argue that this range increases only logarithmically with N, a claim that is also confirmed by our numerical experiments in Sect. 4.1.1.

3.2.1 Consistency

We now establish the consistency of the ALVar estimator.

Theorem 3.3

Let Assumption 2 hold. For every \(n\in {\mathbb {N}}\) and \(h_{m}\in {\textsf{F}}(\mathcal {X}_m)\), \(m\in \llbracket 1, n \rrbracket \), let \((\uplambda _m)_{m=1}^n\) be the lags produced by n iterations of Algorithm 3. Then, as \(N\rightarrow \infty \), it holds that \(\uplambda _n\overset{{\mathbb {P}}}{\longrightarrow }n\) and \( {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n}^2(h_{n}). \)

Proof

We proceed by induction, assuming that the claim holds true for \(n-1\). For every \(\varepsilon >0\), it holds that

$$\begin{aligned} {\mathbb {P}}(\mid {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})-\sigma _{n}^2(h_{n})\mid \ge 2\varepsilon ) \le {}&{\mathbb {P}}(\mid {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})-{\hat{\sigma }}_{n}^2(h_{n})\mid \ge \varepsilon )\\ &{}+{\mathbb {P}}(\mid {\hat{\sigma }}_{n}^2(h_{n})-\sigma _{n}^2(h_{n})\mid \ge \varepsilon ), \end{aligned}$$
(3.2)

where \({\hat{\sigma }}_{n}^2(h_{n})\) is the CLE defined in (2.4), based on the same particle system. The second term on the right-hand side converges to zero as \(N \rightarrow \infty \), since \( {\hat{\sigma }}_{n}^2(h_{n})={\hat{\sigma }}_{n,n}^2(h_{n}) \) is consistent for \( \sigma _{n}^2(h_{n})\) by Theorem 2.2. To treat the first term, write

$$\begin{aligned} {\mathbb {P}}(\mid {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})-{\hat{\sigma }}_{n}^2(h_{n})\mid \ge \varepsilon )\le {\mathbb {P}}({\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})\ne {\hat{\sigma }}_{n}^2(h_{n})). \end{aligned}$$
(3.3)

Since \({\hat{\sigma }}_{n}^2(h_{n})={\hat{\sigma }}_{n,n}^2(h_{n})\), it holds necessarily that \(\uplambda _n\ne n\) on the event \({\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})\ne {\hat{\sigma }}_{n}^2(h_{n})\); thus,

$$\begin{aligned} {\mathbb {P}}({\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})\ne {\hat{\sigma }}_{n}^2(h_{n}))\le 1-{\mathbb {P}}(\uplambda _n=n). \end{aligned}$$
(3.4)

To treat the probability \({\mathbb {P}}(\uplambda _n=n)\) we may write

$$\begin{aligned} {\mathbb {P}}(\uplambda _n=n)&= {\mathbb {P}}(\uplambda _n=n, \uplambda _{n-1}={n-1} )\nonumber \\&\quad +{\mathbb {P}}(\uplambda _n=n, \uplambda _{n-1}<{n-1}) \end{aligned}$$
(3.5)
$$\begin{aligned}&={\mathbb {P}}(\uplambda _n=n, \uplambda _{n-1}={n-1}), \end{aligned}$$
(3.6)

where the second term of (3.5) is zero since \(\uplambda _n\le \uplambda _{n-1}+1\) by construction. Now,

$$\begin{aligned} {\mathbb {P}}(\uplambda _n=n, \uplambda _{n-1}={n-1}) = {\mathbb {P}}\left( \{\uplambda _{n-1}={n-1}\} \bigcap _{\uplambda = 0}^{n - 1} \{{\hat{\sigma }}_{n,n}^2(h_{n})\ge {\hat{\sigma }}_{n,\uplambda }^2(h_{n})\} \right) , \end{aligned}$$
(3.7)

where, by the induction hypothesis, \({\mathbb {P}}(\uplambda _{n-1}={n-1})\rightarrow 1\) as \(N\rightarrow \infty \). Moreover, by Theorem 2.2, it holds that \({\hat{\sigma }}_{n,\uplambda }^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\) and \({\hat{\sigma }}_{n,n}^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n}^2(h_{n})\) as \(N\rightarrow \infty \), where \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\le \sigma _{n}^2(h_{n})\) for all \(\uplambda \in \llbracket 0, n - 1 \rrbracket \), implying that (3.7) converges to one as \(N\rightarrow \infty \). Hence, \(\uplambda _n\overset{{\mathbb {P}}}{\longrightarrow }n\) and combining this with (3.6), (3.4), (3.3), and (3.2) yields that \( {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n}^2(h_{n})\). Finally, the base case holds trivially true since \(\uplambda _0=0\) and \( {\hat{\sigma }}_{0,0}^2(h_{0})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{0}^2(h_{0}) \) for all \(h_{0}\in {\textsf{F}}(\mathcal {X}_0)\). \(\square \)

3.2.2 Heuristics on the dependence of the lag on the number of particles

In the light of Theorem 2.2, we expect \(\uplambda _n\) to increase with N. It is however crucial to understand how the values of the lags \((\uplambda _n)_{n\in {\mathbb {N}}}\) depend on N, since this will determine the performance and memory requirement of our algorithm. For instance, a linear dependence would imply a quadratic complexity, which is not desirable. In the rest of this section we provide a heuristic argument showing that if we minimise the MSE (3.1), then we may expect \(\uplambda _n\) to be \({\mathcal {O}}(\log N)\) for all \(n\in {\mathbb {N}}\).

If we approximate \({\mathbb {E}}[{\hat{\sigma }}_{n,\uplambda }^2(h_{n})]\) in (3.1) by the asymptotic limit \(\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n})\), then the second term on the right-hand side is approximately the square of the asymptotic bias, \( (\sigma _{n}^2(h_{n})-\sigma _{n\langle \uplambda \rangle ,n}^2(h_{n}))^2\). Olsson and Douc (2019) show that under mild assumptions, the asymptotic bias is \({\mathcal {O}}(\rho ^\uplambda )\) for some mixing rate \(\rho \in (0,1)\). Regarding the variance of \({\hat{\sigma }}_{n,\uplambda }^2(h_{n})\), we know that it increases with the lag, i.e., as the number of distinct Enoch indices decreases. Since the variance of a Monte Carlo estimator is generally inversely proportional to the Monte Carlo sample size, we may expect \({\text {Var}}({\hat{\sigma }}_{n,\uplambda }^2(h_{n}))\) to be \({\mathcal {O}}(1/N_\uplambda )\), where \(N_\uplambda \) is the number of distinct Enoch indices \((E_{n\langle \uplambda \rangle ,n}^{i})_{i=1}^N\) at generation \(n\langle \uplambda \rangle \). Now, by adapting the proof of Corollary 2 in Koskela et al. (2020), we may argue that under standard mixing assumptions, which can be relaxed in practice, \( N_\uplambda \) is \({\mathcal {O}}(N/\uplambda )\) (see Footnote 3). Finally, we determine the order of the optimal lag \(\uplambda ^*\) by letting it be the minimiser of the resulting crude approximation

$$\begin{aligned} \uplambda \mapsto c \frac{\uplambda }{N}+ c' \rho ^{2\uplambda } \end{aligned}$$

of the MSE (3.1) as a function of \(\uplambda \), where \(c > 0\) and \(c' > 0\) are constants independent of \(\uplambda \) and N. It is then easily seen that \(\uplambda ^*\) is

$$\begin{aligned} {\mathcal {O}}\left( \frac{1}{2}\log _\rho \left( -\frac{c}{2 c' N\log \rho }\right) \right) ={\mathcal {O}}(\log _{1/\rho }N)={\mathcal {O}}(\log N). \end{aligned}$$

Although this argument is heuristic, we will see later that it is well supported by our numerical simulations, in which the lags produced are very close to the ones minimising the MSE, with a logarithmic dependence on N.
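The heuristic can also be checked numerically: the following sketch minimises the crude MSE approximation over integer lags for increasing N (the constants \(c\), \(c'\) and the rate \(\rho \) are arbitrary illustrative values, not taken from any model in this paper):

```python
def lag_star(N, c=1.0, cp=1.0, rho=0.8):
    """Integer minimiser of lambda -> c * lambda / N + cp * rho**(2 * lambda)."""
    mse = lambda lam: c * lam / N + cp * rho ** (2 * lam)
    lam = 0
    # the bias reduction from a longer lag shrinks geometrically, so the first
    # non-improving step is the global integer minimiser
    while mse(lam + 1) < mse(lam):
        lam += 1
    return lam

# each tenfold increase of N adds a roughly constant amount to the optimal lag
lags = [lag_star(N) for N in (10**2, 10**3, 10**4, 10**5)]
print(lags)  # roughly equal increments, consistent with O(log N)
```

For these illustrative constants the increments per decade of N are constant, in line with the \({\mathcal {O}}(\log N)\) claim.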

3.3 Extension to particle filters with adaptive resampling

We now consider the case in which selection is not necessarily performed at each iteration. Selection is essential in particle filters, as it copes with the well-known importance-weight degeneracy phenomenon (see, e.g., Cappé et al. 2005, Section 7.3); however, since resampling adds variance to the estimator, this operation should not be used unnecessarily. A common approach is hence to resample only when flagged by some weight-degeneracy criterion. One popular such criterion is the effective sample size (ESS, Liu 1996) defined by \(\textsf{ESS}_n^N {:}{=}1 / \sum _{i=1}^{N}(\omega _{n}^{i}/\Omega _{n})^2\), which approximates the number of active particles, i.e., particles with non-degenerate importance weights at time n. The ESS is minimal and equal to one when all the weights but one are zero, and maximal and equal to N when all weights are non-zero and equal. Using the ESS, one may, e.g., let the resampling operation be triggered only when \(\textsf{ESS}_n^N\le \alpha N\), where \(\alpha \in (0,1)\) is a design parameter. More generally, we may let \((\rho _{n}^{N})_{n\in {\mathbb {N}}}\) be a sequence of binary-valued random variables indicating whether resampling should be triggered or not. The sequence \((\rho _{n}^{N})_{n \in {\mathbb {N}}}\) is assumed to be adapted to the filtration \(({\mathcal {F}}_{n}^{N})_{n \in {\mathbb {N}}}\) generated by the particle filter, where \({\mathcal {F}}_{n}^{N} {:}{=}\sigma ((\xi _{0}^{i})_{i=1}^N, (\xi _{m}^{i}, I_{m}^{i})_{i=1}^N, m \in \llbracket 1, n \rrbracket )\). Thus, these indicators may be based on the ESS, letting \(\rho _{n}^{N}=\mathbb {1}_{\{ \textsf{ESS}_n^N<\alpha N\}}\), but also on n only, implying a deterministic selection schedule.
Algorithm 4 shows one iteration of this adaptive procedure, which we later express in the compact form \((\xi _{n+1}^{i},\omega _{n+1}^{i},I_{n+1}^{i})_{i=1}^N\leftarrow \textsf{AdaPF}((\xi _{n}^{i},\omega _{n}^{i})_{i=1}^N,\rho _{n}^{N})\).

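The ESS-based triggering just described can be sketched as follows (a minimal NumPy illustration of one selection decision with multinomial resampling; it is not the full \(\textsf{AdaPF}\) update, and the names are ours):

```python
import numpy as np

def ess(w):
    """Effective sample size: 1 / sum_i (w_i / W)^2; equals N for uniform weights."""
    p = w / w.sum()
    return 1.0 / np.sum(p ** 2)

def select_if_degenerate(rng, xi, w, alpha):
    """Multinomial selection triggered only when ESS < alpha * N (i.e. rho_n^N = 1)."""
    N = len(w)
    if ess(w) < alpha * N:
        I = rng.choice(N, size=N, p=w / w.sum())   # resample ancestor indices
        return xi[I], np.ones(N), I                # resampled: uniform weights
    return xi, w, np.arange(N)                     # no selection: I^i = i
```

When no selection occurs, the ancestor indices are the identity, which is exactly what leaves the Enoch indices unaltered in the update below.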

As described in the following, particle filters with adaptive resampling still satisfy a CLT, with asymptotic variance having a structure similar to that of (2.3) but depending also on an “asymptotic” resampling schedule to be defined next. We proceed in the same way as Mastrototaro et al. (2022, Section 3.2), following in turn Del Moral et al. (2012, Section 5.2), and consider, rather than a single deterministic parameter \(\alpha \), a sequence \( (\alpha _n)_{n\in {\mathbb {N}}} \) of parameters being realisations of random variables with state space (0, 1). This assumption, which can be relaxed in practice, is needed in order to deal with some technicalities in the proofs.

Assumption 3

The resampling schedule \((\rho _{n}^{N})_{n\in {\mathbb {N}}}\) is governed by the ESS, i.e., for every \(n\in {\mathbb {N}}\),

$$\begin{aligned} \rho _{n}^{N} {:}{=}\mathbb {1}_{\{\textsf{ESS}_n^N<\alpha _nN\}}, \end{aligned}$$

where the parameters \( (\alpha _n)_{n\in {\mathbb {N}}} \) are realisations of absolutely continuous independent random variables \((\upalpha _n)_{n\in {\mathbb {N}}}\) taking on values in (0, 1).

The following lemma is adopted from Mastrototaro et al. (2022, Lemma 3.5, with \(d=\infty \)).

Lemma 3.4

Let Assumption 3 hold in Algorithm 4. Then for every \(n\in {\mathbb {N}}\) and almost all \(\alpha _{0:n}\in (0,1)^{n+1}\) there exists \(\rho _{n}^{\alpha }\in \{0,1\}\) such that, as \(N\rightarrow \infty \),

$$\begin{aligned} \rho _{n}^{N}\overset{{\mathbb {P}}}{\longrightarrow }\rho _{n}^{\alpha }. \end{aligned}$$

We now have the following CLT for adaptive APFs, whose proof is found in Appendix B.

Theorem 3.5

Let Assumption 1 hold and let \((\xi _{n}^{i},\omega _{n}^{i})_{i=1}^N\) be generated by n iterations of Algorithm 4 according to a selection schedule \( (\rho _{n}^{N})_{n\in {\mathbb {N}}} \) satisfying Assumption 3. Then for every \(n\in {\mathbb {N}}\), \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), and almost all \(\alpha _{0:n-1}\in (0,1)^{n}\), as \(N\rightarrow \infty \),

$$\begin{aligned} \sqrt{N}(\phi _{n}^{N}h_{n}-\phi _{n}h_{n})\overset{{\mathcal {D}}}{\longrightarrow }\sigma _n\langle \rho _{0:n-1}^{\alpha }\rangle (h_{n})Z, \end{aligned}$$

where Z is a standard normally distributed random variable and the asymptotic variance \(\sigma _{n}^2\langle \rho _{0:n-1}^{\alpha }\rangle (h_{n})\), depending on \(\alpha _{0:n-1}\), is given in detail in Appendix B.

When designing a lag-based estimator of the asymptotic variance provided by Theorem 3.5, it turns out to be more convenient to define the lag in terms of the number of resampling operations rather than the number of iterations of the particle filter. For this purpose, let \(r_{n} {:}{=}\sum _{m=0}^{n-1}\rho _{m}^{N}\) be the counter of the number of times selection is performed before time n (with the convention \(r_{0}=0\)). Then the Enoch indices at each time n will be indexed by the resampling times \(r_{n}\) rather than n, since every iteration without resampling leaves these unaltered. More specifically, in the following, a generic Enoch index \(E_{m,r_{n}}^{i}\) will indicate the ancestor of the particle \(\xi _{n}^{i}\) at any time \(n'\in \llbracket 0, n \rrbracket \) such that \(r_{n'}=m \in \llbracket 0, r_{n} \rrbracket \). Then for all \(i\in \llbracket 1, N \rrbracket \) and \(m\in \llbracket 0, r_{n+1} \rrbracket \), the update (2.5) can be rewritten as

$$\begin{aligned} E_{m,r_{n+1}}^{i} {:}{=}{\left\{ \begin{array}{ll} i\quad &{}\text {for }m=r_{n+1},\\ E_{m,r_{n}}^{I_{n+1}^{i}}&{}\text {for }m<r_{n+1}. \end{array}\right. } \end{aligned}$$

Notice that when we do not have resampling at time n, it holds that \(r_{n+1}=r_{n}\) and \(I_{n+1}^{i}=i\), implying \( E_{m,r_{n}}^{i} = E_{m,r_{n+1}}^{i} \) for all \(m\in \llbracket 0, r_{n+1} \rrbracket \) and \(i\in \llbracket 1, N \rrbracket \). In practice, for a given \(n\in {\mathbb {N}}\), the lag takes on values in \(\llbracket 0, r_{n} \rrbracket \) instead of \(\llbracket 0, n \rrbracket \) and, as before, the expression \(r_{n}\langle \uplambda \rangle \), \(\uplambda \in {\mathbb {N}}\), indicates the quantity \((r_{n}-\uplambda )\vee 0\). In this setting, the estimator (2.6) is rewritten as

$$\begin{aligned} {\hat{\sigma }}_{n,\uplambda }^2(h_{n}) {:}{=}N\sum _{i=1}^N\left( \sum _{j:E_{r_{n}\langle \uplambda \rangle ,r_{n}}^{j}=i}\frac{\omega _{n}^{j}}{\Omega _{n}}\{h_{n}(\xi _{n}^{j})-\phi _{n}^{N}h_{n}\}\right) ^2. \end{aligned}$$
(3.8)
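The resampling-time-indexed Enoch update displayed above admits a direct sketch (a minimal illustration with our own naming, storing one list of indices per resampling generation):

```python
def update_enoch(enoch, I, resampled):
    """One Enoch-index update indexed by resampling times.

    enoch[m][i] holds E^i_{m, r_n}; I[i] is the ancestor index of particle i.
    """
    if not resampled:                 # r_{n+1} = r_n and I^i = i: nothing changes
        return enoch
    N = len(I)
    # E^i_{m, r_{n+1}} = E^{I^i}_{m, r_n} for m < r_{n+1}
    new = [[gen[I[i]] for i in range(N)] for gen in enoch]
    new.append(list(range(N)))        # E^i_{r_{n+1}, r_{n+1}} = i
    return new
```

In an online implementation one would additionally drop the oldest generations beyond the current maximal lag to keep the memory demand bounded.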

Algorithm 5 shows one update of the adaptive-resampling APF along with the calculation of the corresponding ALVar estimate. Corollary 3.6, whose proof is found in Appendix B, provides the consistency of the variance estimator produced by the algorithm.


Corollary 3.6

Let Assumptions 2 and 3 hold. Moreover, for every \(n\in {\mathbb {N}}\) and \(h_{m}\in {\textsf{F}}(\mathcal {X}_m)\), \(m\in \llbracket 1, n \rrbracket \), let \((\uplambda _m)_{m=1}^n\) be the lags produced by n iterations of Algorithm 5. Then, letting \( {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n}) \) be computed according to (3.8), it holds for almost all \(\alpha _{0:n-1}\in (0,1)^{n}\), as \(N\rightarrow \infty \),

$$\begin{aligned} {\hat{\sigma }}_{n,\uplambda _n}^2(h_{n})\overset{{\mathbb {P}}}{\longrightarrow }\sigma _{n}^2\langle \rho _{0:n-1}^{\alpha }\rangle (h_{n}). \end{aligned}$$

3.4 Application of ALVar to lag-based fixed-point particle smoothing

As we will see next, the generality of the model framework in Sect. 2.2 allows the ALVar estimator to be used to assess the accuracy of certain online particle smoothing estimators. More precisely, for fixed \(m \in {\mathbb {N}}\) and \(h_{m} \in {\textsf{F}}(\mathcal {X}_m)\), consider online computation of the expectations \(\phi _{m\mid n}h_{m}\) as \(n \in \{m, m + 1, \ldots \}\) progresses, where for each \(n \ge m\),

$$\begin{aligned} \phi _{m\mid n}: \mathcal {X}_m \ni A \mapsto \frac{\chi {\textbf {L}}_{0,m-1}(\mathbb {1}_{A} {\textbf {L}}_{m,n-1} \mathbb {1}_{{\textsf{X}}_n})}{\chi {\textbf {L}}_{0,n-1}\mathbb {1}_{{\textsf{X}}_n}}. \end{aligned}$$
(3.9)

This problem is referred to as fixed-point smoothing. In the context of SSMs (Example 2), \(\phi _{m \mid n}\) is the conditional distribution of the hidden state at time m given all the observations up to time \(n \ge m\), and the fixed-point smoothing problem consists of updating this distribution online as new observations become available. It is well known that Algorithm 1 provides, as a by-product, particle approximations also of the distributions (3.9), in the sense that \(\phi _{m\mid n}^{N} h_{m}\), where

$$\begin{aligned} \phi _{m\mid n}^{N} {:}{=}\sum _{i = 1}^N \frac{\omega _{n}^{i}}{\Omega _{n}} \delta _{\xi _{m}^{E_{m,n}^{i}}}, \end{aligned}$$

forms a consistent estimator of \(\phi _{m \mid n} h_{m}\) for each n. In the following we show how the variance of \(\phi _{m \mid n}^{N} h_{m}\) can be estimated using the ALVar estimator. For this purpose, we will exploit the generality of the model framework in Sect. 2.2 and introduce an auxiliary path-space model in which \(\phi _{m \mid n}\) can be interpreted as a filter distribution. More precisely, for every \(n\in {\mathbb {N}}\), define the path space \({\textsf{X}}_n^\textsf{path}{:}{=}{\textsf{X}}_0 \times \cdots \times {\textsf{X}}_n\) with corresponding \(\sigma \)-field and the unnormalised transition kernel

$$\begin{aligned} {\textbf {L}}_{n}^\textsf{path}: {\textsf{X}}_n^\textsf{path}\times \mathcal {X}_{n+1}^\textsf{path}\ni (x_{0:n}, A) \mapsto \int \mathbb {1}_{A}(x_{0:n + 1}) \, {\textbf {L}}_{n}(x_n, dx_{n+1}). \end{aligned}$$

Defining also, for \(n \ge m\), the path-wise objective functions \(h_{n}^\textsf{path}(x_{0:n}) {:}{=}h_{m}(x_m)\), \(x_{0:n} \in {\textsf{X}}_n^\textsf{path}\), allows us to write \(\phi _{m \mid n} h_{m} = \phi _{n}^\textsf{path}h_{n}^\textsf{path}\), where \(\phi _{n}^\textsf{path}\) is induced by (2.1) for the kernels \(({\textbf {L}}_{n}^\textsf{path})_{n \in {\mathbb {N}}}\) and the same initial distribution \(\chi \) as in the original model. In other words, by extending the original model to an auxiliary path-space model, we have been able to express the quantity of interest as a filter expectation, which can be targeted using Algorithm 1 (operating on the extended model). Thus, by defining also proposal transition kernels

$$\begin{aligned} {\textbf {P}}_{n}^\textsf{path}: {\textsf{X}}_n^\textsf{path}\times \mathcal {X}_{n+1}^\textsf{path}\ni (x_{0:n}, A) \mapsto \int \mathbb {1}_{A}(x_{0:n + 1}) \, {\textbf {P}}_{n}(x_{n}, dx_{n+1}), \end{aligned}$$

of similar form it holds that \({\textbf {L}}_{n}^\textsf{path}(x_{0:n}, \cdot )\) is absolutely continuous with respect to \({\textbf {P}}_{n}^\textsf{path}(x_{0:n}, \cdot )\) for all \(x_{0:n} \in {\textsf{X}}_n^\textsf{path}\), with Radon–Nikodym derivative given by \(\gamma _{n}^\textsf{path}(x_{0:n + 1}) {:}{=}\gamma _{n} (x_n, x_{n + 1})\), \(x_{0:n + 1} \in {\textsf{X}}_{n + 1}^\textsf{path}\). Finally, we define the auxiliary adjustment-weight multipliers \(\vartheta _{n}^\textsf{path}(x_{0:n}) {:}{=}\vartheta _{n}(x_n)\), \(x_{0:n} \in {\textsf{X}}_n^\textsf{path}\). Using this interpretation, we may now address the fixed-point smoothing problem by estimating \(\phi _{n}^\textsf{path}h_{n}^\textsf{path}= \phi _{m \mid n} h_{m}\) sequentially for \(n \in \{m, m + 1, \ldots \}\) using Algorithm 1 and monitor online the variance using the ALVar.
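On the path-space model, the fixed-point smoothing estimator \(\phi _{m\mid n}^{N} h_{m}\) reduces to a weighted sum over the time-m particles selected by the Enoch indices; a minimal NumPy sketch (names ours):

```python
import numpy as np

def fixed_point_estimate(h, xi_m, w_n, enoch_m_n):
    """phi_{m|n}^N h_m = sum_i (w_n^i / Omega_n) h(xi_m^{E_{m,n}^i}).

    xi_m      -- particles at the fixed time m
    w_n       -- current importance weights at time n
    enoch_m_n -- zero-based Enoch indices E^i_{m,n}
    """
    return float(np.sum((w_n / w_n.sum()) * h(xi_m[enoch_m_n])))
```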

Fig. 2 Comparison between variance estimators in the context of optimal filtering in the stochastic volatility model (4.1). The particle sample size was \(N=1000\) and the plot displays every 50th variance estimate. The brute-force estimates are based on 2000 replicates of the particle filter

However, this approach is not without problems; indeed, due to the particle-path degeneracy phenomenon, this estimator suffers from high variance for large n, since the set \((E_{m,n}^{i})_{i = 1}^N\) of Enoch indices deteriorates eventually as n increases. Following Kitagawa and Sato (2001) and Olsson et al. (2008), this issue can be addressed by introducing a lag parameter \(\Delta \in {\mathbb {N}}^*\) and approximating \(\phi _{m \mid n} h_{m}\) by \(\phi _{m \mid m_{\Delta }(n)}^{N} h_{m}\), where \(m_{\Delta }(n) {:}{=}n \wedge (m+\Delta )\), leading to a somewhat biased but variance-reduced estimator. This idea can be most easily understood in the SSM context, where the argument is that future observations at long temporal distances from the state of interest do not affect the posterior distribution of the same, and that these can therefore be omitted from the particle estimator in order to avoid the genealogical-tree degeneracy. As the bias depends on the ergodic properties of the model (see Olsson et al. 2008, for an analysis), designing a good lag is, however, generally non-trivial (an adaptive approach was developed by Alenlöv and Olsson 2019). Nevertheless, in the following we assume that we are given some suitable lag \(\Delta \) and want to estimate the asymptotic variance of the lag-based approximation \(\phi _{m \mid m_{\Delta }(n)}^{N} h_{m}\). Moreover, since the updating of the particle approximation \(\phi _{m \mid m_{\Delta }(n)}^{N} h_{m}\) of \(\phi _{m \mid n} h_{m}\) ceases when \(n \ge m + \Delta \), we may, by keeping track of the particle history across a sliding window of fixed length \(\Delta + 1\), use this technique to address simultaneously the fixed-point smoothing problem for a range of fixed time points and objective functions, i.e., to approximate online, with time-homogeneous computational load and memory requirements, the elements of the vector

$$\begin{aligned} (\phi _{0 \mid n} h_{0}, \phi _{1 \mid n} h_{1}, \ldots , \phi _{n \mid n} h_{n}) \end{aligned}$$

as n increases, where \(h_{m} \in {\textsf{F}}(\mathcal {X}_m)\) for each m (see Alenlöv and Olsson 2019, for a treatment of this problem). More precisely, we proceed by computing, for every iteration \(n \in \{\Delta , \Delta + 1, \ldots \}\), the marginal smoothing estimate \(\phi _{n-\Delta \mid n}^{N} h_{n-\Delta }\) and furnish the same with the lag-based variance estimate

$$\begin{aligned} {\hat{\sigma }}_{n-\Delta \mid n,\uplambda }^2(h_{n-\Delta }) {:}{=}N \sum _{i=1}^N\left( \sum _{j:E_{n\langle \uplambda \rangle ,n}^{j}=i}\frac{\omega _{n}^{j}}{\Omega _{n}}\{h_{n-\Delta }(\xi _{n-\Delta }^{E_{n-\Delta ,n}^{j}})-\phi _{n-\Delta \mid n}^{N}h_{n-\Delta }\}\right) ^2 \end{aligned}$$
(3.10)

obtained by applying (2.6) in the context of the auxiliary path-space particle model defined above. Here the lag \(\uplambda = \uplambda _n\) is designed adaptively with n using the ALVar in accordance with Algorithm 3, i.e., by letting, recursively, \(\uplambda _n\) be the \(\uplambda \in \llbracket 0, \uplambda _{n - 1}+1 \rrbracket \) maximising \({\hat{\sigma }}_{n-\Delta \mid n,\uplambda }^2(h_{n-\Delta })\). Note that since no estimates are computed in the first \(\Delta -1\) iterations, we let by convention \(\uplambda _n=n\) for all \(n\in \llbracket 0, \Delta -1 \rrbracket \). Interestingly, as established by the following result (which is proven in Appendix B), the lag selected adaptively by the ALVar in this manner is always at least the smoothing lag \(\Delta \).

Proposition 3.7

Let \((\uplambda _n)_{n \in {\mathbb {N}}}\) be a sequence of lags produced by applying ALVar to the asymptotic-variance estimator (3.10). Then \(\uplambda _n \ge \Delta \) for all \(n\ge \Delta \).

In Sect. 4.1.4 we illustrate numerically that the ALVar may exhibit an excellent performance also in the context of lag-based fixed-point smoothing.

Fig. 3 Comparison between variance estimators in the context of optimal filtering in the stochastic volatility model (4.1). The particle sample size was \(N=100{,}000\) and the plot displays every 50th variance estimate. The brute-force estimates are based on 2000 replicates of the particle filter

Fig. 4 Similar plots as in Figs. 2 and 3, but with focus on large \(n \in \llbracket 4900, 5000 \rrbracket \). The left and right panels correspond to \(N = 1000\) and \(N = 100{,}000\), respectively

4 Numerical illustrations

In this section we apply, as an illustration, our approach to optimal filtering in SSMs (Example 2). In order to benchmark carefully our variance estimator against the fixed-lag estimator of Olsson and Douc (2019), we tested the ALVar on the same SSMs as in the latter work, namely

  • the stochastic volatility model introduced by Hull and White (1987) and

  • a linear Gaussian state space model for which exact computation of the filter is possible using the Kalman filter.

4.1 Stochastic volatility model

Our first SSM is governed by the equations

$$\begin{aligned} \begin{aligned} X_{n+1}&=a X_n +\sigma U_{n+1},\\ Y_n&=b \exp (X_n/2) V_n, \end{aligned} \qquad n \in {\mathbb {N}}, \end{aligned}$$
(4.1)

where \((U_n)_{n\in {\mathbb {N}}^*}\) and \((V_n)_{n\in {\mathbb {N}}}\) are sequences of uncorrelated standard Gaussian noise variables. The parameters are assumed to be known, with \((a,b,\sigma )=(0.975, 0.641, 0.165)\). We only observe the process \((Y_n)_{n\in {\mathbb {N}}}\), representing stock log-returns, while \((X_n)_{n\in {\mathbb {N}}}\), representing the log-volatility, is a hidden state process which we want to infer. The state \(X_0\) is initialised according to a zero-mean Gaussian distribution with variance \( \sigma ^2/(1-a^2)\), i.e., the stationary distribution of the state process. Thus, we deal with a fully dominated nonlinear SSM with \({\textsf{X}}={\textsf{Y}}={\mathbb {R}}\) and \(\mathcal {X}=\mathcal {Y}={\mathcal {B}}({\mathbb {R}})\), the Borel \(\sigma \)-field on \({\mathbb {R}}\), in which both \({\textbf {M}}\) and \({\textbf {G}}\) are Gaussian kernels.
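For reference, the model (4.1) can be simulated as follows (a minimal NumPy sketch with the stated parameterisation; the seed is arbitrary):

```python
import numpy as np

def simulate_sv(T, a=0.975, b=0.641, sigma=0.165, seed=0):
    """Simulate (X_n, Y_n), n = 0, ..., T - 1, from the SV model (4.1)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)
    y = np.empty(T)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - a ** 2))  # stationary initialisation
    for n in range(T):
        y[n] = b * np.exp(x[n] / 2.0) * rng.normal()       # observation
        if n + 1 < T:
            x[n + 1] = a * x[n] + sigma * rng.normal()     # state transition
    return x, y
```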

A record \(y_{0:5000}\) of observations was obtained by simulating the process \((X_n, Y_n)_{n\in {\mathbb {N}}}\) under the dynamics (4.1) for the given parameterisation. For all \(n\in {\mathbb {N}}\), we let \(h_{n}\) be the identity function. In order to have a reliable benchmark for the variance, we first implemented the naive, brute-force estimation technique described in Sect. 2.4, producing 2000 replicates of the particle filter with \(N=5000\). Then we computed the sample variance of these filter estimates at each iteration and multiplied it by \(N\).
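The brute-force benchmark can be sketched as follows, assuming a hypothetical routine `particle_filter(y, N, rng)` that returns the filter estimates \(\phi_n^N h_n\) for all time points of a single run:

```python
import numpy as np

def brute_force_variance(particle_filter, y, N, R, rng=None):
    """Estimate the asymptotic variance at each time point by the sample
    variance of filter estimates across R independent particle-filter
    replicates, scaled by the particle sample size N."""
    rng = np.random.default_rng() if rng is None else rng
    # stack the R replicate trajectories: shape (R, len(y))
    estimates = np.stack([particle_filter(y, N, rng) for _ in range(R)])
    # unbiased sample variance across replicates at each time, scaled by N
    return N * estimates.var(axis=0, ddof=1)
```

This is exactly the computationally demanding approach the ALVar estimator is designed to replace: it requires \(R\) full runs of the filter.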

Algorithm 3, with an underlying bootstrap particle filter (\({\textbf {P}}_{n} \equiv {\textbf {M}}\) and \(\vartheta _{n} \equiv 1\)), was implemented with the two different sample sizes \(N=1000\) and \(N=100{,}000\) in order to assess stability as well as convergence. The output is displayed in Figs. 2 and 3, where the ALVar estimator is compared to the brute-force benchmark, the CLE, and the fixed-lag approach of Olsson and Douc (2019) with \(\uplambda \in \{14, 24\}\). In both cases, our estimator produces more precise and stable estimates of the asymptotic variance. Moreover, increasing the number of particles leads to significantly better accuracy, demonstrating the convergence properties of our method. These patterns can also be noticed in Fig. 4, where we focus on large values of n. As expected, we see that when n is large the CLE either drops to zero or suffers from large variance due to the depletion of the Eve indices. The fixed-lag approach behaves similarly to our adaptive approach, both being close to the benchmark brute-force values. The fundamental difference is that in the adaptive method the lag is designed adaptively and dynamically, whereas in the fixed-lag method the lag is set to a constant value close to the average lag produced by the ALVar estimator. We stress again that without access to the ALVar procedure, the design of a suitable fixed lag \(\uplambda \) would require an exhaustive preliminary simulation-based analysis, where \(\uplambda \) is selected by producing multiple fixed-lag variance estimates for a range of different lags and repeated runs of the particle filter and comparing them to an estimate obtained using the brute-force estimator.
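For concreteness, a minimal bootstrap particle filter for the model (4.1), estimating the filter means, might look as follows. This is an illustrative sketch only (multinomial resampling at every step, self-normalised weights), not the paper's Algorithm 3, which additionally maintains the genealogical quantities needed by the ALVar estimator:

```python
import numpy as np

def bootstrap_pf(y, N, a=0.975, b=0.641, sigma=0.165, rng=None):
    """Bootstrap particle filter for the SV model (4.1): mutation via the
    state dynamics, weighting by the local observation likelihood."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    est = np.empty(T)
    # initial particles from the stationary law of the state process
    x = rng.normal(0.0, sigma / np.sqrt(1.0 - a**2), size=N)
    for n in range(T):
        # log-weight: Gaussian density of y[n] given x (up to a constant)
        s = b * np.exp(x / 2.0)
        logw = -0.5 * (y[n] / s) ** 2 - np.log(s)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        est[n] = np.sum(w * x)          # self-normalised filter-mean estimate
        # selection (multinomial) followed by mutation
        idx = rng.choice(N, size=N, p=w)
        x = a * x[idx] + sigma * rng.normal(size=N)
    return est
```

The ALVar procedure would piggyback on exactly this loop, tracking ancestor (Enoch) indices across the resampling steps at essentially no extra cost.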

Fig. 5

Distribution of fixed-lag and adaptive variance estimates at iteration \(n=1000\). Each box is based on 100 replicates of Algorithm 2 for a given value of \(\uplambda \), except the one marked ALVar, which corresponds to our adaptive method. The average adaptive lag was approximately 19.0. Each particle filter used \(N=10{,}000\) particles. The circles correspond to the estimate produced by the brute-force algorithm and the lines and stars in the boxes indicate the medians and means of each sample, respectively

The previous plots are based upon single runs of the algorithms producing variance estimates for different \(n\in \llbracket 0, 5000 \rrbracket \); we now focus instead on how several estimates are distributed at some specific times n. In the boxplots displayed in Fig. 5, each box represents the distribution of variance estimates at time \(n = 1000\) using the ALVar algorithm, the CLE, and the fixed-lag approach with several choices of \(\uplambda \), obtained on the basis of 100 replicates of Algorithm 2 for each of these lags. For the box dedicated to the ALVar we have indicated the average \(\uplambda _{1000}\) across the 100 independent particle filter replicates (not to be confused with the average lag across all iterations of a single realisation of the particle filter). We observe that our estimator exhibits negligible bias, with variability similar to that of the best fixed-lag estimators.

In addition, we include a comparison of the ALVar estimator to the one proposed by Janati et al. (2022), in which genealogical tracing is replaced by an online update based on particle approximations of the so-called backward kernels. Still, the gain in variance and time stability offered by this approach comes at a price; indeed, the most efficient version of the algorithm, where a set of auxiliary statistics is propagated through backward sampling in accordance with the so-called particle-based, rapid incremental smoother (PaRIS, Olsson and Westerborn 2017), has an \({\mathcal {O}}(N^2)\) computational cost per iteration (instead of the \({\mathcal {O}}(N^3)\) cost of the Rao-Blackwellised version). Figure 6 displays boxplots of variance estimates produced using the ALVar estimator, the backward-sampling method of Janati et al. (2022), and the CLE for different times n. We observe that in the early stages the backward-sampling method is more precise than the ALVar and the CLE in terms of sample variance. However, as the number of iterations increases, it begins to exhibit an increasing empirical bias and eventually decays towards zero like the CLE, albeit more slowly and with less variability. On the other hand, our approach has the fundamental advantages of being (i) stable throughout all iterations and (ii) significantly faster in terms of computational time. Indeed, already with \(N=1000\), the ALVar estimator was, in our implementation, about two orders of magnitude faster. We do not exclude that the execution time of the backward-sampling method might be optimised further to reduce this gap; still, suffering from an \({\mathcal {O}}(N^2)\) computational complexity, the backward-sampling estimator will always be considerably slower than the ALVar and scale unfavorably with the number of particles N.

Fig. 6

Boxplots of variance estimates produced using the ALVar, backward-sampling (PaRIS implementation with \(M=3\) draws) and CLE estimators for \(n\in \{50,100,750,2000,5000\}\). Each box is based on 100 independent replicates of the corresponding algorithm and each particle filter used \(N=1000\) particles. The circles correspond to estimates produced by the brute-force method and lines and stars indicate the medians and means of each sample, respectively. In our simulations one run of the ALVar with \(N=1000\) and 5000 iterations takes 2–3 s compared to almost 5 min for the backward-sampling technique

4.1.1 Adaptive-lag analysis

In this part we investigate how the values of the lags chosen adaptively at each iteration of Algorithm 3 are distributed and depend on the number of particles N. Figure 7 displays the evolution of the chosen lags over time for \(N=1000\) particles. We see that after an initial constant increase of the lag, the values stabilise in a range between 5 and 30. An interesting pattern is the presence of regimes with constant increase of the lag, during which the same generation of Enoch indices is used, and sudden drops, when the generation used so far becomes depleted and a more recent one takes its place.

Fig. 7

Evolution of the choice of optimal lag for \(n \in \llbracket 0, 5000 \rrbracket \) (left panel) and \(N = 1000\) particles. The right panel shows the first 100 iterations. The average lag was approximately 14.0

Although not shown here, a similar pattern can be seen when N is increased from 1000 to \(100{,}000\), in which case the range of selected lags is between 10 and 50, with an average around 24. This is not a surprise: we have shown in Theorem 3.3 that the adaptive lag is expected to converge to the maximum possible value \(\uplambda _{n}=n\), which is why we expect the lags to increase with the sample size. The good news is that the complexity does not explode as N increases; indeed, the ALVar algorithm is between 1.5 and 2 times slower than a standard particle filter for \(N=1000\) particles and between 2 and 2.5 times slower for \(N=100{,}000\) particles. Moreover, our novel method always took significantly less than twice the time of a fixed-lag algorithm with \(\uplambda \) selected around the average value of the adaptive approach (1.4 and 1.7 times slower for \(N=1000\) and \(N=100{,}000\) particles, respectively). The computational time of the ALVar procedure is closely related to the values of the adaptively selected lags: the larger the lags, the longer it takes to update the Enoch indices and to perform the update on Line 8 in Algorithm 3. In Fig. 8 we illustrate how the lags are distributed for different particle sample sizes; as predicted by our heuristic argument in Sect. 3.2.2, the dependence of the average lag on N appears to be (close to perfectly) logarithmic. The maximal lags behave similarly.
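The least-squares fit underlying Fig. 8 can be reproduced schematically as follows. The average lags used here are illustrative round numbers consistent with those reported in the text (about 14 for \(N=1000\) and 24 for \(N=100{,}000\)), not the actual simulation output:

```python
import numpy as np

# Hypothetical average adaptive lags for a range of particle sample sizes
N_values = np.array([1e3, 1e4, 1e5])
avg_lags = np.array([14.0, 19.0, 24.0])

# least-squares fit of the model: average lag ≈ c0 + c1 * log10(N)
X = np.column_stack([np.ones_like(N_values), np.log10(N_values)])
coef, *_ = np.linalg.lstsq(X, avg_lags, rcond=None)
# coef[0] is the intercept c0, coef[1] the slope c1 in log10(N)
```

A close-to-perfect linear fit of the average lag against \(\log_{10}(N)\) is what the heuristic argument of Sect. 3.2.2 predicts.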

Fig. 8

Each box shows the distribution of lags selected adaptively by the ALVar algorithm in a single run up to iteration \(n=5000\) for a given particle sample size N (excluding the first 100 lags in each run). Different boxes correspond to different sample sizes. The dots and lines in each box represent the means and the medians of the sample distributions, respectively. The dashed line is the least squares fit between the average lag of each run and \(\log _{10}(N)\)

Fig. 9

Each box represents the distribution of adaptive lags selected at a given iteration n by the ALVar estimator and is based on \(M=1000\) independent replicates. The red crosses and lines in each box correspond to means and medians, respectively. Blue stars correspond to lags minimising the empirical MSE evaluated at each iteration on the basis of the \(M=1000\) replicates

4.1.2 ALVar versus empirical MSE

At the beginning of Sect. 3 we claimed that an optimal choice of the lag could be the one minimising the MSE. We now want to check that the lags selected by the ALVar algorithm are sufficiently close to the ones minimising (3.1) for most iterations. As we mentioned, (3.1) is hard to evaluate analytically but can be estimated by means of the empirical MSE obtained by running \(M\in {\mathbb {N}}^*\) independent particle filters. More precisely, for every \(n\in {\mathbb {N}}\), \(\uplambda \in {\mathbb {N}}\), and \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\), we define the empirical MSE

$$\begin{aligned} \textsf{MSE}^N_n\langle h_{n} \rangle (\uplambda ) {:}{=}\frac{1}{M}\sum _{j=1}^{M}\left( {\hat{\sigma }}^{2, j}_{n, \uplambda }(h_{n}) -\sigma _{n}^2(h_{n})\right) ^2 \end{aligned}$$

at time n, where \({\hat{\sigma }}^{2, j}_{n, \uplambda }(h_{n})\) is the estimate produced by the j-th particle filter and \(\sigma _{n}^2(h_{n})\) can be approximated by a brute-force estimate. Then, for every \(n\in {\mathbb {N}}\) and \(h_{n}\in {\textsf{F}}(\mathcal {X}_n)\) we determine the optimal lag by selecting

$$\begin{aligned} {\hat{\uplambda }}_n^*\leftarrow \mathop {\mathrm {arg\,min}}\limits _{\uplambda \in \llbracket 0, n \rrbracket }\textsf{MSE}^N_n\langle h_{n} \rangle (\uplambda ). \end{aligned}$$
(4.2)
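Given a matrix of variance estimates across replicates and candidate lags, the selection (4.2) reduces to an argmin over the lag axis; a minimal sketch (the array layout is an assumption for illustration):

```python
import numpy as np

def empirical_mse_optimal_lag(est, sigma2_true):
    """Return the lag minimising the empirical MSE (4.2).

    est[j, lam] is the variance estimate of the j-th particle-filter
    replicate computed with lag lam; sigma2_true is a brute-force
    approximation of the true asymptotic variance."""
    mse = np.mean((est - sigma2_true) ** 2, axis=0)  # average over replicates
    return int(np.argmin(mse))
```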

In order to compare the adaptive lags formed by the ALVar estimator to the empirical MSE-optimal lags (4.2), we ran \(M=1000\) particle filters, each with \(N=10{,}000\) particles, for \(n=500\) iterations; letting \(h_{n}=\text {id}\) for all n, we determined, for each replicate, the adaptive lags selected by the ALVar procedure as well as the ones minimising the empirical MSE. Figure 9 reports adaptive-lag distributions at some iterations, together with the lag values \({\hat{\uplambda }}_n^*\) minimising the empirical MSE; remarkably, we observe that the empirically optimal lags are within the range of lags selected by the ALVar algorithm, although the latter tends to choose slightly larger values on average.

Fig. 10

Variance estimation in the particle filter with ESS-based selection schedule using \(N=10{,}000\) particles and a resampling parameter \(\alpha \) equal to 0.5 (top panel) and 0.2 (bottom panel). The plot displays every 50th estimate

4.1.3 Variance estimation in the case of adaptive resampling

In this section we test the ALVar estimator in the setting where the resampling operation is triggered by the ESS criterion according to Algorithm 4. Figure 10 displays brute-force estimates of the asymptotic variance as well as estimates produced by the ALVar estimator in Algorithm 5 for two distinct choices of the parameter \(\alpha \in \{0.5, 0.2\}\). In both cases, the observations \(y_{0:5000}\) generated in the previous section were used as input to the particle filter. Although based on the same observations and exhibiting similar patterns, the two brute-force-estimated asymptotic variances differ, as expected, from each other and from the ones reported in Figs. 2 and 3. Still, in both cases the ALVar estimator is capable of tracking closely the brute-force benchmark. Since the lag size is now defined in terms of the number of selection-operation activations and not of simple iterations, the resulting average values, 3.0 when \(\alpha =0.5\) and 1.9 when \(\alpha =0.2\), are significantly smaller than before.
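For reference, the ESS-based trigger used by the adaptive-resampling particle filter can be sketched as follows: resampling is activated when the effective sample size of the normalised weights drops below \(\alpha N\):

```python
import numpy as np

def ess(w):
    """Effective sample size of a vector of normalised importance weights."""
    return 1.0 / np.sum(w ** 2)

def should_resample(w, alpha):
    """Trigger the selection step when the ESS drops below alpha * N."""
    return ess(w) < alpha * len(w)
```

With uniform weights the ESS equals \(N\) and no resampling is triggered; as the weights degenerate towards a single particle the ESS tends to one.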

4.1.4 Variance estimation in fixed-point particle smoothing

We test the performance of our approach in the context of lag-based fixed-point smoothing with two distinct choices of the lag parameter \(\Delta \). As established in Sect. 3.4, we may apply the ALVar directly in this context by storing (a part of) the trajectories and interpreting the lag-based particle smoother as a particle filter in a path model. Note that as a consequence of Proposition 3.7, it necessarily holds that the lags selected by the ALVar estimator are systematically larger than or equal to \(\Delta \). This is reflected by our simulations, reported in Fig. 11, where the average lag selected by the ALVar is about 24 and 59 in the cases \(\Delta =10\) and \(\Delta =50\), respectively, and never below the values of \(\Delta \). Clearly, an increase of \(\Delta \) implies a higher degree of path degeneracy at the time m of estimation, which in turn results in higher variance and a reduced difference between \(\uplambda _n\) and \(\Delta \). The latter can be viewed as a demonstration of the ability of the ALVar to reduce the lag when a higher degree of depletion of the Enoch indices is detected. The increase of variance is clear from Fig. 11. Finally, we conclude that also in this setting, the ALVar provides time-stable and nearly unbiased variance estimates following closely the brute-force benchmark.

Fig. 11

Variance estimates for lag-based fixed-point particle smoothing with \(N=10{,}000\) particles and lag parameter \(\Delta \) equal to 10 (top panel) and 50 (bottom panel). Each plot displays estimated asymptotic variances of \(\phi _{m\mid m+\Delta }^{N} \text {id}\) for \(m \in \{0, 100, 200, \ldots , 4900\}\). For the ALVar approach, means and quantiles are based on 100 independent runs. The benchmark brute-force estimator is based on 2000 independent replicates of the marginal smoother with \(N=10{,}000\) particles

4.2 Linear Gaussian SSM

As a second example, we consider optimal filtering in the linear Gaussian SSM given by

$$\begin{aligned} \begin{aligned} X_{n+1}&=A X_n +S_u U_{n+1}, \\ Y_n&=B X_n +S_v V_n, \end{aligned} \quad n \in {\mathbb {N}}. \end{aligned}$$
(4.3)

Here \({\textsf{X}}={\mathbb {R}}^{d_x}\) and \({\textsf{Y}}={\mathbb {R}}^{d_y}\), \((d_x,d_y)\in {\mathbb {N}}^*\times {\mathbb {N}}^*\), with \(\mathcal {X}\) and \(\mathcal {Y}\) being the corresponding Borel \(\sigma \)-fields. The elements in the sequences \((U_n)_{n\in {\mathbb {N}}^*}\) and \((V_n)_{n\in {\mathbb {N}}}\) are mutually independent and standard multi-normally distributed in \({\mathbb {R}}^{d_x}\) and \({\mathbb {R}}^{d_y}\), respectively. Here the matrices A and \(S_u\) are \(d_x \times d_x\), B is \(d_y \times d_x\), and \(S_v\) is \(d_y \times d_y\).
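Direct simulation of (4.3) is straightforward; the sketch below is a hypothetical helper for generating data from the model, with `Su` and `Sv` the noise scale matrices defined above:

```python
import numpy as np

def simulate_lgssm(T, A, B, Su, Sv, x0, rng=None):
    """Simulate T steps of the linear Gaussian state-space model (4.3)."""
    rng = np.random.default_rng() if rng is None else rng
    dx, dy = A.shape[0], B.shape[0]
    x = np.empty((T, dx))
    y = np.empty((T, dy))
    x[0] = x0
    y[0] = B @ x[0] + Sv @ rng.standard_normal(dy)
    for n in range(1, T):
        x[n] = A @ x[n - 1] + Su @ rng.standard_normal(dx)   # state transition
        y[n] = B @ x[n] + Sv @ rng.standard_normal(dy)       # observation
    return x, y
```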

4.2.1 Approximate confidence bounds

First, in order to evaluate the ability of the ALVar estimator to provide reliable Monte Carlo confidence bounds for the targeted quantities we consider a linear Gaussian SSM with \(d_x = d_y =1\) and scalar parameters given by \((A,B,S_u,S_v)=(0.98, 1, 0.2, 1)\). For linear Gaussian models, the filter distributions are Gaussian and available in a closed form through the Kalman filter (see, e.g., Cappé et al. 2005, Section 5.2.3), which makes these models particularly well suited for assessing the performance of particle methods. We proceed as follows. Given a sequence \(y_{0:1000}\) of observations, we first calculate the exact values of \(\phi _{n} \text {id} = {\mathbb {E}}[X_n \mid Y_{0:n}=y_{0:n}]\) for \(n\in \llbracket 0, 1000 \rrbracket \) using the Kalman filter; then we execute the fully adapted APF, for which the adjustment-weight multipliers and proposal kernels are given by \(\vartheta _{n}(x) = {\textbf {L}}_{n} \mathbb {1}_{{\textsf{X}}}(x) = {\textbf {M}} g_{n+1} (x)\), \(x \in {\textsf{X}}\), and \({\textbf {P}}_{n}(x, A) = {\textbf {L}}_{n}(x, A) / {\textbf {L}}_{n} \mathbb {1}_{{\textsf{X}}}(x) = {\textbf {M}}(g_{n+1}\mathbb {1}_{A})(x)/\vartheta _{n}(x)\), \((x, A) \in {\textsf{X}} \times \mathcal {X}\), respectively. In the case of systematic resampling, the fully adapted APF always provides completely uniform importance weights. We execute this particle filter with \(N = 10{,}000\) particles on the same data as before and furnish, using the ALVar (Algorithm 3), each produced particle estimate with a 95% confidence interval

$$\begin{aligned} \phi _{n}^{N} \text {id} \pm \uplambda _{0.025}\frac{{\hat{\sigma }}_{n,\uplambda _n}(\text {id})}{\sqrt{N}}, \end{aligned}$$

where \(\uplambda _{0.025}\) is the 2.5% quantile of the standard Gaussian distribution. The latter particle filtering procedure is repeated 200 times, yielding equally many statistically independent confidence intervals per time point n. Figure 12 reports the failure rate, i.e., the ratio of cases in which the true value falls outside the corresponding confidence interval, at each time point. We observe an average failure rate across all times around \(5.0\%\), which is the ideal value.
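The exact filter means entering the failure-rate computation can be obtained with a standard Kalman filter; the following sketch (a textbook implementation, using explicit matrix inversion for clarity rather than numerical robustness) computes \({\mathbb {E}}[X_n \mid Y_{0:n}=y_{0:n}]\) for model (4.3):

```python
import numpy as np

def kalman_filter_means(y, A, B, Su, Sv, m0, P0):
    """Exact filter means E[X_n | Y_{0:n} = y_{0:n}] for model (4.3)."""
    Q, R = Su @ Su.T, Sv @ Sv.T          # state and observation noise covariances
    m, P = np.asarray(m0, float), np.asarray(P0, float)
    means = []
    for n, yn in enumerate(y):
        if n > 0:                        # prediction step
            m = A @ m
            P = A @ P @ A.T + Q
        S = B @ P @ B.T + R              # innovation covariance
        K = P @ B.T @ np.linalg.inv(S)   # Kalman gain
        m = m + K @ (yn - B @ m)         # measurement update
        P = P - K @ B @ P
        means.append(m.copy())
    return np.array(means)
```

Comparing these exact means to the particle estimates, and checking whether they fall inside the ALVar-based confidence intervals, yields the empirical failure rates of Fig. 12.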

Fig. 12

Empirical failure rates across iterations \(n\in \llbracket 0, 1000 \rrbracket \). Each rate is obtained on the basis of 200 replicates of Algorithm 3 with \(N=10{,}000\). The dashed line represents the overall average failure rate

Fig. 13

ALVar-produced variance estimates of particle approximations of the filtered state radius in the multivariate linear Gaussian SSM discussed in Sect. 4.2.2. Means and quantiles are shown for \(n \in \{0, 20, 40, \ldots , 1000\}\) on the basis of 100 independent replicates of the ALVar approach and each particle filter used \(N=10{,}000\) particles. The dots correspond to estimates produced using the brute-force approach (500 replicates with \(N=1000\))

Similar results were obtained with the ESS-based approach described by Algorithm 5, where using \(\alpha = 0.2\) and \(\alpha = 0.5\) led to average failure rates of \(5.2\%\) and \(4.9\%\), respectively. All in all, these results confirm that our approach is reliable and has small or negligible bias.

4.2.2 Increasing the dimensionality

We finally test how the ALVar responds to an increase of dimensionality. As pointed out in several works (see, e.g., Bickel et al. 2008; Rebeschini and van Handel 2015), the performance of standard particle filters generally deteriorates with increasing dimensionality, due to increased weight skewness. Countermeasures usually include increasing the number of particles or designing better proposal distributions. In order to form an idea of how this affects the performance of the ALVar, we present some results on a linear Gaussian SSM (4.3) with \(d_x=15\) and \(d_y=5\). The matrix A has entries \(a_{ij}\propto 0.5^{ \mid i-j \mid + 1}\), \((i,j)\in \llbracket 1, 15 \rrbracket ^2\), renormalised such that \(\Vert A\Vert _{2}=0.98\), while B has entries \(b_{ij}\), \((i,j)\in \llbracket 1, 5 \rrbracket \times \llbracket 1, 15 \rrbracket \), equal to one if \(i=(j \bmod 5)\) and zero otherwise; the remaining parameters are \(S_u=0.2{{\textbf {I}}}\) and \(S_v={{\textbf {I}}}\). In this setting, the bootstrap particle filter was used to track online the expected radius of the hidden states under the filter posterior, i.e., \((\phi _{n} h_{n})_{n \in {\mathbb {N}}}\) with \(h_{n}{:}{=}\Vert \cdot \Vert _{2}\), while monitoring the variance using the ALVar. Figure 13 reports averages and quantiles of the resulting variance estimates produced at different time steps, obtained on the basis of 100 independent runs of the particle filter and associated ALVar estimator. Also in this case the variance estimator stays stable with good accuracy in the long term, without any additional precaution except a possibly larger size of the particle sample.
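The construction of the matrices A and B described above can be sketched as follows (using zero-based indices, so that the one-based condition \(i=(j \bmod 5)\) is mirrored by `j % 5`):

```python
import numpy as np

dx, dy = 15, 5

# A: entries a_ij ∝ 0.5^{|i-j|+1}, renormalised so the spectral norm is 0.98
i, j = np.meshgrid(np.arange(dx), np.arange(dx), indexing="ij")
A = 0.5 ** (np.abs(i - j) + 1.0)
A *= 0.98 / np.linalg.norm(A, 2)

# B: b_ij equal to one if i matches j modulo the observation dimension
B = np.zeros((dy, dx))
for col in range(dx):
    B[col % dy, col] = 1.0
```

Each of the 15 state coordinates is thus observed (with noise) through exactly one of the 5 observation coordinates.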