Abstract
We investigate a new sampling scheme aimed at improving the performance of particle filters whenever (a) there is a significant mismatch between the assumed model dynamics and the actual system, or (b) the posterior probability tends to concentrate in relatively small regions of the state space. The proposed scheme pushes some particles toward specific regions where the likelihood is expected to be high, an operation known as nudging in the geophysics literature. We reinterpret nudging in a form applicable to any particle filtering scheme, as it does not involve any changes in the rest of the algorithm. Since the particles are modified, but the importance weights do not account for this modification, the use of nudging leads to additional bias in the resulting estimators. However, we prove analytically that nudged particle filters can still attain asymptotic convergence with the same error rates as conventional particle methods. Simple analysis also yields an alternative interpretation of the nudging operation that explains its robustness to model errors. Finally, we show numerical results that illustrate the improvements that can be attained using the proposed scheme. In particular, we present nonlinear tracking examples with synthetic data and a model inference example using real-world financial data.
Keywords
Particle filtering · Nudging · Robust filtering · Data assimilation · Model errors · Approximation errors

1 Introduction
1.1 Background
State-space models (SSMs) are ubiquitous in many fields of science and engineering, including weather forecasting, mathematical finance, target tracking, machine learning, population dynamics, etc., where inferring the states of dynamical systems from data plays a key role.
A SSM comprises a pair of stochastic processes \((x_t)_{t\ge 0}\) and \((y_t)_{t\ge 1}\) called signal process and observation process, respectively. The conditional relations between these processes are defined with a transition and an observation model (also called likelihood model) where observations are conditionally independent given the signal process, and the latter is itself a Markov process. Given an observation sequence, \(y_{1:t}\), the filtering problem in SSMs consists in the estimation of expectations with respect to the posterior probability distribution of the hidden states, conditional on \(y_{1:t}\), which is also referred to as the filtering distribution.
Apart from a few special cases, neither the filtering distribution nor the integrals (or expectations) with respect to it can be computed exactly; hence, one needs to resort to numerical approximations of these quantities. Particle filters (PFs) have been a classical choice for this task since their introduction by Gordon et al. (1993); see also Kitagawa (1996), Liu and Chen (1998), Doucet et al. (2000, 2001). The PF constructs an empirical approximation of the posterior probability distribution via a set of Monte Carlo samples (usually termed particles) which are modified or killed sequentially as more data are taken into account. These samples are then used to estimate the relevant expectations. The original form of the PF, often referred to as the bootstrap particle filter (BPF), has received significant attention due to its efficiency in a variety of problems, its intuitive appeal and its straightforward implementation. A large body of theoretical work concerning the BPF has also been compiled. For example, it has been proved that the expectations with respect to the empirical measures constructed by the BPF converge to the expectations with respect to the true posterior distributions when the number of particles is large enough (Del Moral and Guionnet 1999; Chopin 2004; Künsch 2005; Douc and Moulines 2008) or that they converge uniformly over time under additional assumptions related to the stability of the true distributions (Del Moral and Guionnet 2001; Del Moral 2004).
Despite the success of PFs in relatively low-dimensional settings, their use has been regarded as impractical in models where \((x_t)_{t\ge 0}\) and \((y_t)_{t\ge 1}\) are sequences of high-dimensional random variables. In such scenarios, standard PFs have been shown to collapse (Bengtsson et al. 2008; Snyder et al. 2008). This problem has received significant attention from the data assimilation community. The high-dimensional models which are common in meteorology and other fields of geophysics are often dealt with via an operation called nudging (Hoke and Anthes 1976; Malanotte-Rizzoli and Holland 1986, 1988; Zou et al. 1992). Within the particle filtering context, nudging can be defined as a transformation of the particles, which are pushed toward the observations using some observation-dependent map (van Leeuwen 2009, 2010; Ades and van Leeuwen 2013, 2015). If the dimensions of the observations and the hidden states are different, which is often the case, a gain matrix is computed in order to perform the nudging operation. In van Leeuwen (2009, 2010) and Ades and van Leeuwen (2013, 2015), nudging is performed after the sampling step of the particle filter. The importance weights are then computed accordingly, so that they remain proper. Hence, nudging in this version amounts to a sophisticated choice of the importance function that generates the particles. It has been shown (numerically) that the schemes proposed by van Leeuwen (2009, 2010) and Ades and van Leeuwen (2013, 2015) can track high-dimensional systems with a low number of particles. However, generating samples from the nudged proposal requires costly computations for each particle, and the evaluation of the weights becomes heavier as well. It is also unclear how to apply existing nudging schemes when non-Gaussianity and nontrivial nonlinearities are present in the observation model.
A related class of algorithms includes the so-called implicit particle filters (IPFs) (Chorin and Tu 2009; Chorin et al. 2010; Atkins et al. 2013). Similar to nudging schemes, IPFs rely on the principle of pushing particles to high-probability regions in order to prevent the collapse of the filter in high-dimensional state spaces. In a typical IPF, the region where particles should be generated is determined by solving an algebraic equation. This equation is model dependent, yet it can be solved for a variety of different cases (general procedures for finding solutions are given by Chorin and Tu 2009; Chorin et al. 2010). The fundamental principle underlying IPFs, moving the particles toward high-probability regions, is similar to nudging. Note, however, that unlike IPFs, nudging-based methods are not designed to guarantee that the resulting particles land on high-probability regions; it can be the case that nudged particles are moved to relatively low-probability regions (at least occasionally). Since an IPF requires the solution of a model-dependent algebraic equation for every particle, it can be computationally costly, similar to the nudging methods by van Leeuwen (2009, 2010) and Ades and van Leeuwen (2013, 2015). Moreover, it is not straightforward to derive the map for the translation of particles in general models; hence, the applicability of IPFs depends heavily on the specific model at hand.
1.2 Contribution

First, we define the nudging step not just as a relaxation step toward observations but as a step that strictly increases the likelihood of a subset of particles. This definition paves the way for different nudging schemes, such as using the gradients of likelihoods or employing random search schemes to move around the state space. In particular, classical nudging (relaxation) operations arise as a special case of nudging using gradients when the likelihood is assumed to be Gaussian. Compared to IPFs, the nudging operation we propose is easier to implement as we only demand the likelihood to increase (rather than the posterior density). Indeed, nudging operators can be implemented in relatively straightforward forms, without the need to solve modeldependent equations.

Second, unlike the other nudging-based PFs, we do not correct the bias induced by the nudging operation during the weighting step. Instead, we compute the weights in the same way they would be computed in a conventional (non-nudged) PF and the nudging step is devised to preserve the convergence rate of the PF, under mild standard assumptions, despite the bias. Moreover, computing biased weights is usually faster than computing proper (unbiased) weights. Depending on the choice of nudging scheme, the proposed algorithm can have an almost negligible computational overhead compared to the conventional PF from which it is derived.

Finally, we show that a nudged PF for a given SSM (say \({\mathcal {M}}_0\)) is equivalent to a standard BPF running on a modified dynamical model (denoted \({\mathcal {M}}_1\)). In particular, model \({\mathcal {M}}_1\) is endowed with the same likelihood function as \({\mathcal {M}}_0\), but the transition kernel is observation driven in order to match the nudging operation. As a consequence, the implicit model \({\mathcal {M}}_1\) is “adapted to the data” and we have empirically found that, for any sufficiently long sequence \(y_1, \ldots , y_t\), the evidence^{1} (Robert 2007) in favor of \({\mathcal {M}}_1\) is greater than the evidence in favor of \({\mathcal {M}}_0\). We can show, for several examples, that this implicit adaptation to the data makes the NuPF robust to mismatches in the state equation of the SSM compared to conventional PFs. In particular, provided that the likelihoods are specified or calibrated reliably, we have found that NuPFs perform reliably under a certain amount of mismatch in the transition kernel of the SSM, while standard PFs degrade clearly in the same scenario.
The second and third examples are aimed at testing the robustness of the NuPF when there is a significant misspecification in the state equation of the SSM. This is helpful in realworld applications because practitioners often have more control over measurement systems, which determine the likelihood, than they have over the state dynamics. We present computer simulation results for a stochastic Lorenz 63 model and a maneuvering target tracking problem.
In the fourth example, we present numerical results for a stochastic Lorenz 96 model, in order to show how a relatively highdimensional system can be tracked without a major increase in the computational effort compared to the standard BPF. For this set of computer simulations, we have also compared the NuPF with the ensemble Kalman filter (EnKF), which is the de facto choice for tackling this type of systems.
Let us remark that, for the two stochastic Lorenz systems, the Markov kernel in the SSM can be sampled in a relatively straightforward way, yet transition probability densities cannot be computed (as they involve a sequence of noise variables mapped by a composition of nonlinear functions). Therefore, computing proper weights for proposal functions other than the Markov kernel itself is, in general, not possible for these examples.
Finally, we demonstrate the practical use of the NuPF on a problem where a real dataset is used to fit a stochastic volatility model using either particle Markov chain Monte Carlo (pMCMC) (Andrieu et al. 2010) or nested particle filters (Crisan and Miguez 2018).
1.3 Organization
The paper is structured as follows. After a brief note about notation, we describe the SSMs of interest and the BPF in Sect. 2. Then in Sect. 3, we outline the general algorithm and the specific nudging schemes we propose to use within the PF. We prove a convergence result in Sect. 4 which shows that the new algorithm has the same asymptotic convergence rate as the BPF. We also provide an alternative interpretation of the nudging operation that explains its robustness in scenarios where there is a mismatch between the observed data and the assumed SSM. We discuss the computer simulation experiments in Sect. 5 and present results for real data in Sect. 6. Finally, we make some concluding remarks in Sect. 7.
1.4 Notation
We denote the set of real numbers as \({\mathbb {R}}\), while \({\mathbb {R}}^d = {\mathbb {R}}\times {\mathop {\cdots }\limits ^{d}} \times {\mathbb {R}}\) is the space of ddimensional real vectors. We denote the set of positive integers with \({\mathbb {N}}\) and the set of positive reals with \({\mathbb {R}}_+\). We represent the state space with \({\mathsf {X}} \subset {\mathbb {R}}^{d_x}\) and the observation space with \({\mathsf {Y}} \subset {\mathbb {R}}^{d_y}\).
In order to denote sequences, we use the shorthand notation \(x_{i_1:i_2} = \{x_{i_1},\ldots ,x_{i_2}\}\). For sets of integers, we use \([n] = \{1,\ldots ,n\}\). The \(p\)-norm of a vector \(x\in {\mathbb {R}}^d\) is defined by \(\Vert x\Vert _p = (|x_1|^p + \cdots + |x_d|^p)^{{1}/{p}}\). The \(L_p\) norm of a random variable z with probability density function (pdf) p(z) is denoted \(\Vert z\Vert _p = \left( \int |z|^p p(z) {\text{ d }}z\right) ^{1/p}\), for \(p\ge 1\). The Gaussian (normal) probability distribution with mean m and covariance matrix C is denoted \({\mathcal {N}}(m,C)\). We denote the identity matrix of dimension d with \(I_d\).
The supremum norm of a real function \(\varphi :{\mathsf {X}} \rightarrow {\mathbb {R}}\) is denoted \(\Vert \varphi \Vert _\infty = \sup _{x \in {\mathsf {X}}} |\varphi (x)|\). A function is bounded if \(\Vert \varphi \Vert _\infty < \infty \) and we indicate the space of real bounded functions \({\mathsf {X}} \rightarrow {\mathbb {R}}\) as \(B({\mathsf {X}})\). The set of probability measures on \({\mathsf {X}}\) is denoted \({\mathcal {P}}({\mathsf {X}})\), the Borel \(\sigma \)-algebra of subsets of \({\mathsf {X}}\) is denoted \({\mathcal {B}}({\mathsf {X}})\), and the integral of a function \(\varphi :{\mathsf {X}}\rightarrow {\mathbb {R}}\) with respect to a measure \(\mu \) on the measurable space \(\left( {\mathsf {X}},{\mathcal {B}}({\mathsf {X}})\right) \) is denoted \((\varphi ,\mu ):=\int \varphi {\text{ d }}\mu \). The unit Dirac delta measure located at \(x \in {\mathbb {R}}^d\) is denoted \(\delta _x({\text{ d }}x)\). The Monte Carlo approximation of a measure \(\mu \) constructed using N samples is denoted as \(\mu ^N\). Given a Markov kernel \(\tau (\text{ d }x'\mid x)\) and a measure \(\pi (\text{ d }x)\), we define the notation \(\xi (\text{ d }x') = \tau \pi \triangleq \int \tau (\text{ d }x'\mid x) \pi (\text{ d }x)\).
2 Background
2.1 Statespace models
We are interested in the sequence of posterior probability distributions of the states generated by the SSM. To be specific, at each time \(t=1, 2, \ldots \) we aim at computing (or, at least, approximating) the probability measure \(\pi _t\) which describes the probability distribution of the state \(x_t\) conditional on the observation of the sequence \(y_{1:t}\). When it exists, we use \(\pi (x_t\mid y_{1:t})\) to denote the pdf of \(x_t\) given \(y_{1:t}\) with respect to the Lebesgue measure, i.e., \(\pi _t({\text{ d }}x_t) = \pi (x_t\mid y_{1:t}){\text{ d }}x_t\).
The measure \(\pi _t\) is often termed the optimal filter at time t. It is closely related to the probability measure \(\xi _t\), which describes the probability distribution of the state \(x_t\) conditional on \(y_{1:t-1}\), and it is, therefore, termed the predictive measure at time t. As for the case of the optimal filter, we use \(\xi (x_t\mid y_{1:t-1})\) to denote the pdf, with respect to the Lebesgue measure, of \(x_t\) given \(y_{1:t-1}\).
2.2 Bootstrap particle filter
3 Nudged particle filter
3.1 General algorithm
When considered jointly, the sampling and nudging steps in (3.1) can be seen as sampling from a proposal distribution which is obtained by modifying the kernel \(\tau _t(\cdot \mid x_{t-1})\) in a way that depends on the observation \(y_t\). Indeed, this is the classical view of nudging in the literature (van Leeuwen 2009, 2010; Ades and van Leeuwen 2013, 2015). However, unlike in this classical approach, here the weighting step does not account for the effect of nudging. In the proposed NuPF, the weights are kept the same as in the original filter, \(w_t^{(i)} \propto g_t(x_t^{(i)})\). In doing so, we save computations but, at the same time, introduce bias in the Monte Carlo estimators. One of the contributions of this paper is to show that this bias can be controlled using simple design rules for the nudging step, while practical performance can be improved at the same time.
In order to provide an explicit description of the NuPF, let us first state a definition for the nudging step.
Definition 1

First, we choose a set of indices \({\mathcal {I}}_t \subset [N]\) that identifies the particles to be nudged. Let \(M=|{\mathcal {I}}_t|\) denote the number of elements in \({\mathcal {I}}_t\). We prove in Sect. 4 that keeping \(M \le {\mathcal {O}}({\sqrt{N}})\) allows the NuPF to converge with the same error rates \({\mathcal {O}}({1}/{\sqrt{N}})\) as the BPF. In Sect. 3.2, we discuss two simple methods to build \({\mathcal {I}}_t\) in practice.

Second, we choose an operator \(\alpha _t^{y_t}\) that guarantees an increase in the likelihood of any particle. We discuss different implementations of \(\alpha _t^{y_t}\) in Sect. 3.3.
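Taken together, these two design choices yield a filtering step that differs from the BPF only by the nudging of a small subset of particles. The following is a minimal Python sketch of one such step; `sample_kernel`, `g` and `nudge` are hypothetical callables standing in for the transition kernel, the likelihood and the nudging operator, and multinomial resampling is used for simplicity:

```python
import numpy as np

def nupf_step(particles, y, sample_kernel, g, nudge, rng):
    """One step of the nudged particle filter (sketch).

    particles : (N, d) array of particles from the previous time step
    y         : current observation
    sample_kernel(x, rng) -> a draw from the transition kernel
    g(x, y)   -> likelihood value (non-negative scalar)
    nudge(x, y) -> nudged particle (should not decrease g)
    """
    N = len(particles)
    # 1. Propagate every particle through the transition kernel.
    x = np.array([sample_kernel(p, rng) for p in particles])
    # 2. Nudge a batch of M <= sqrt(N) particles, per the rate guideline.
    M = int(np.sqrt(N))
    idx = rng.choice(N, size=M, replace=False)
    for i in idx:
        x[i] = nudge(x[i], y)
    # 3. Weight with the *same* formula as the BPF (no bias correction).
    w = np.array([g(xi, y) for xi in x])
    w /= w.sum()
    # 4. Multinomial resampling.
    return x[rng.choice(N, size=N, p=w)]
```

Note that step 3 is where this sketch departs from proper importance sampling: the weights ignore the nudging map, which is precisely the bias analyzed in Sect. 4.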
3.2 Selection of particles to be nudged

Batch nudging. Let the number of nudged particles M be fixed. A simple way to construct \({\mathcal {I}}_t\) is to draw indices \(i_1, i_2, \ldots , i_M\) uniformly from [N] without replacement, and then let \({\mathcal {I}}_t = i_{1:M}\). We refer to this scheme as batch nudging, since all the indices are selected at once. One advantage of this scheme is that the number of particles to be nudged, M, is deterministic and can be set a priori.

Independent nudging. The size and the elements of \({\mathcal {I}}_t\) can also be selected randomly in a number of ways. Here, we have studied a procedure in which, for each index \(i = 1, \ldots , N\), we assign \(i \in {\mathcal {I}}_t\) with probability \(\frac{M}{N}\). In this way, the actual cardinality \(|{\mathcal {I}}_t|\) is random, but its expected value is exactly M. This procedure is particularly suitable for parallel implementations, since each index can be assigned to \({\mathcal {I}}_t\) (or not) at the same time as all others.
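Both selection schemes are only a few lines of code. The sketch below assumes a NumPy random generator; the function names are illustrative:

```python
import numpy as np

def batch_indices(N, M, rng):
    # Batch nudging: exactly M distinct indices, drawn uniformly
    # from [N] without replacement, all at once.
    return rng.choice(N, size=M, replace=False)

def independent_indices(N, M, rng):
    # Independent nudging: each index enters I_t independently with
    # probability M/N, so |I_t| is random with expectation exactly M.
    return np.flatnonzero(rng.random(N) < M / N)
```

The independent scheme maps naturally onto parallel hardware, since the N Bernoulli trials are mutually independent.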
3.3 How to nudge
Gradient nudging. If \(g_t(x_t)\) is a differentiable function of \(x_t\), one straightforward way to nudge particles is to take gradient steps. In Algorithm 3, we show a simple procedure with one gradient step alone, where \(\gamma _t>0\) is a step-size parameter and \(\nabla _{x_t} g_t\) denotes the vector of partial derivatives of \(g_t\) with respect to the state variables, i.e.,
$$\begin{aligned} \nabla _{x_t} g_t = \left[ \begin{array}{c} \frac{\partial g_t}{\partial x_{1,t}}\\ \frac{\partial g_t}{\partial x_{2,t}}\\ \vdots \\ \frac{\partial g_t}{\partial x_{d_x,t}} \end{array} \right] \quad \text{ for } \quad x_t = \left[ \begin{array}{c} x_{1,t}\\ x_{2,t}\\ \vdots \\ x_{d_x,t} \end{array} \right] \in {\mathsf {X}}. \end{aligned}$$
Algorithms can obviously be designed where nudging involves several gradient steps. In this work, we limit our study to the single-step case, which is shown to be effective and keeps the computational overhead to a minimum. We also note that the performance of gradient nudging can be sensitive to the choice of the step-size parameters \(\gamma _t>0\), which are, in turn, model dependent.^{2}
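As an illustration of the Gaussian special case mentioned in Sect. 1.2, consider a likelihood \(g_t(x) = {\mathcal {N}}(y_t; x, R)\): the gradient of \(\log g_t\) is \(R^{-1}(y_t - x)\), so a single gradient step is exactly a relaxation of the particle toward the observation. A minimal sketch (the step size and covariance are illustrative assumptions):

```python
import numpy as np

def gaussian_loglik_grad(x, y, R_inv):
    # For g_t(x) = N(y; x, R), the gradient of log g_t at x is
    # R^{-1} (y - x): a step in this direction relaxes x toward y.
    return R_inv @ (y - x)

def gradient_nudge(x, y, R_inv, gamma=0.1):
    # One gradient step on log g_t (Algorithm 3 uses the gradient of
    # g_t itself; for exponential-family likelihoods the log-gradient
    # points in the same direction).
    return x + gamma * gaussian_loglik_grad(x, y, R_inv)
```

For a small enough step size the nudged particle has a strictly higher likelihood, which is the defining property required of the nudging operator.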

Random nudging. Gradient-free techniques inherited from the field of global optimization can also be employed in order to push particles toward regions where they have higher likelihoods. A simple stochastic-search technique adapted to the nudging framework is shown in Algorithm 4. We hereafter refer to the latter scheme as random search nudging.
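A hedged sketch of such a stochastic-search step follows; this is one plausible reading of a random-search nudge, and `sigma2` and `max_tries` are illustrative parameters rather than values from the paper:

```python
import numpy as np

def random_search_nudge(x, y, g, sigma2=0.1, max_tries=50, rng=None):
    """Gradient-free nudging sketch: propose Gaussian perturbations of
    the particle and keep the first one that increases the likelihood.
    If no proposal succeeds, return the particle unchanged."""
    if rng is None:
        rng = np.random.default_rng()
    g0 = g(x, y)
    for _ in range(max_tries):
        cand = x + np.sqrt(sigma2) * rng.normal(size=x.shape)
        if g(cand, y) > g0:
            return cand  # strictly higher likelihood: accept
    return x  # give up after max_tries proposals
```

By construction the returned particle never has a lower likelihood than the input, at the cost of extra likelihood evaluations per nudged particle.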

Model-specific nudging. Particles can also be nudged using model-specific information. For instance, in some applications the state vector \(x_t\) can be split into two subvectors, \(x_t^\text {obs}\) and \(x_t^\text {unobs}\) (observed and unobserved, respectively), such that \(g_t(x_t) = g_t(x_t^\text {obs})\), i.e., the likelihood depends only on \(x_t^\text {obs}\) and not on \(x_t^\text {unobs}\). If the relationship between \(x_t^\text {obs}\) and \(x_t^\text {unobs}\) is tractable, one can first nudge \(x_t^\text {obs}\) in order to increase the likelihood and then modify \(x_t^\text {unobs}\) in order to keep it coherent with \(x_t^\text {obs}\). A typical example of this kind arises in object tracking problems, where positions and velocities have a special and simple physical relationship, but usually only position variables are observed through a linear or nonlinear transformation. In this case, nudging would only affect the position variables. However, using these position variables, one can also nudge velocity variables with simple rules. We discuss this idea and show numerical results in Sect. 5.
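One plausible instance of model-specific nudging for a tracking problem is sketched below: positions are relaxed toward a position observation, and velocities are then corrected with the induced displacement so the kinematics stay coherent. The constant-velocity setup and all parameter values are assumptions for illustration, not the exact rule used in Sect. 5:

```python
import numpy as np

def tracking_nudge(pos, vel, y, gamma=0.1, dt=1.0):
    """Model-specific nudging sketch for a constant-velocity tracker
    with position-only observations y (hypothetical setup):
    nudge the observed position block toward y, then adjust the
    (unobserved) velocity to match the displacement just applied."""
    new_pos = pos + gamma * (y - pos)      # relaxation of positions
    new_vel = vel + (new_pos - pos) / dt   # keep kinematics consistent
    return new_pos, new_vel
```

Only the position block enters the likelihood, so the position update increases \(g_t\); the velocity update is a heuristic consistency correction.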
3.4 Nudging general particle filters
In this paper, we limit our presentation to BPFs in order to focus on the key concepts of nudging and to ease presentation. It should be apparent, however, that nudging steps can be plugged into general PFs. More specifically, since the nudging step is algorithmically detached from the sampling and weighting steps, it can be easily used within any PF, even if it relies on different proposals and different weighting schemes. We leave for future work the investigation of the performance of nudging within widely used PFs, such as auxiliary particle filters (APFs) (Pitt and Shephard 1999).
4 Analysis
The nudging step modifies the random generation of particles in a way that is not compensated by the importance weights. Therefore, we can expect nudging to introduce bias in the resulting estimators in general. However, in Sect. 4.1 we prove that, as long as some basic guidelines are followed, the estimators of integrals with respect to the filtering measure \(\pi _t\) and the predictive measure \(\xi _t\) converge in \(L_p\) as \(N\rightarrow \infty \) with the usual Monte Carlo rate \({\mathcal {O}}(1/\sqrt{N})\). The analysis is based on a simple induction argument and ensures the consistency of a broad class of estimators. In Sect. 4.2, we briefly comment on the conditions needed to guarantee that convergence is attained uniformly over time. We do not provide a full proof, but this can be done by extending the classical arguments in Del Moral and Guionnet (2001) or Del Moral (2004) and using the same treatment of the nudging step as in the induction proof of Sect. 4.1. Finally, in Sect. 4.3, we provide an interpretation of nudging in a scenario with modeling errors. In particular, we show that the NuPF can be seen as a standard BPF for a modified dynamical model which is “a better fit” for the available data than the original SSM.
4.1 Convergence in \(L_p\)
The goal in this section is to provide theoretical guarantees of convergence for the NuPF under mild assumptions. First, we analyze a general NuPF (with arbitrary nudging operator \(\alpha _t^{y_t}\) and an upper bound on the size M of the index set \({\mathcal {I}}_t\)) and then we provide a result for a NuPF with gradient nudging.
As in the case of the BPF, the simple Assumption 1 stated next is sufficient for consistency and for obtaining explicit error rates (Del Moral and Miclo 2000; Crisan and Doucet 2002; Míguez et al. 2013) for the NuPF, as stated in Theorem 1.
Assumption 1
Theorem 1
See “Appendix A” for a proof.
Theorem 1 is very general; it actually holds for any map \(\alpha _t^{y_t}: {\mathsf {X}}\rightarrow {\mathsf {X}}\), i.e., not necessarily a nudging operator. We can also obtain error rates for specific choices of the nudging scheme. A simple, yet practically appealing, setup is the combination of batch and gradient nudging, as described in Sects. 3.2 and 3.3, respectively.
Assumption 2
Lemma 1
See “Appendix B” for a proof.
It is straightforward to apply Lemma 1 to prove convergence of the NuPF with a batch gradientnudging step. Specifically, we have the following result.
Theorem 2
The proof is straightforward (using the same argument as in the proof of Theorem 1 combined with Lemma 1), and we omit it here. We note that Lemma 1 provides a guideline for the choice of M and \(\gamma _t\). In particular, one can select \(M = N^\beta \), where \(0< \beta < 1\), together with \(\gamma _t \le N^{\frac{1}{2} - \beta }\) in order to ensure that \(\gamma _t M \le \sqrt{N}\). Actually, it would be sufficient to set \(\gamma _t \le C N^{\frac{1}{2} - \beta }\) for some constant \(C<\infty \) in order to keep the same error rate (albeit with a different constant in the numerator of the bound). Therefore, Lemma 1 provides a heuristic to balance the step size with the number of nudged particles.^{3} We can increase the number of nudged particles, but in that case we need to shrink the step size accordingly, so as to keep \(\gamma _t M \le \sqrt{N}\). Similar results can be obtained using the gradient of the log-likelihood, \(\log g_t\), if \(g_t\) belongs to the exponential family of densities.
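The heuristic above can be encoded directly. The sketch below assumes the choice \(M = N^\beta \) with \(C = 1\); the function name is illustrative:

```python
import numpy as np

def nudging_schedule(N, beta=0.5, C=1.0):
    """Guideline from Lemma 1: nudge M = N**beta particles with step
    size gamma <= C * N**(0.5 - beta), so that gamma * M <= C*sqrt(N)
    and the O(1/sqrt(N)) error rate is preserved."""
    M = int(N ** beta)
    gamma = C * N ** (0.5 - beta)
    return M, gamma
```

Larger \(\beta \) (more nudged particles) thus forces a proportionally smaller step size, and vice versa.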
4.2 Uniform convergence
 (i)
The likelihood function is bounded and bounded away from zero, i.e., \(g_t \in B({\mathsf {X}})\), and there is some constant \(a>0\) such that \(\inf _{t>0, x\in {\mathsf {X}}} g_t(x) \ge a\).
 (ii)
The kernel mixes sufficiently well, namely, for any given integer m there is a constant \(0<\varepsilon <1\) such that
$$\begin{aligned} \inf _{t>0;\,(x,x')\in {\mathsf {X}}^2} \frac{ \tau _{t+m|t}(A\mid x) }{ \tau _{t+m|t}(A\mid x') } > \varepsilon \end{aligned}$$
for any Borel set A, where \(\tau _{t+m|t}\) denotes the composition of the kernels \(\tau _{t+m} \circ \tau _{t+m-1} \circ \cdots \circ \tau _t\).
4.3 Nudging as a modified dynamical model
We have found in computer simulation experiments that the NuPF is consistently more robust to model errors than the conventional BPF. In order to gain some analytical insight into this scenario, in this section we reinterpret the NuPF as a standard BPF for a modified, observation-driven dynamical model and discuss why this modified model can be expected to be a better fit for the given data than the original SSM. In this way, the NuPF can be seen as an automatic adaptation of the underlying model to the available data.
The dynamic models of interest in stochastic filtering can be defined by a prior measure \(\tau _0\), the transition kernels \(\tau _t\) and the likelihood functions \(g_t(x)=g_t(y_t\mid x)\), for \(t\ge 1\). In this section, we write the latter as \(g_t^{y_t}(x) = g_t(y_t\mid x)\), in order to emphasize that \(g_t\) is parametrized by the observation \(y_t\), and we also assume that every \(g_t^{y_t}\) is a normalized pdf in \(y_t\) for the sake of clarity. Hence, we can formally represent the SSM defined by (2.1), (2.2) and (2.3) as \({\mathcal {M}}_0=\{\tau _0,\tau _t,g_t^{y_t}\}\).

We first draw \({{\bar{x}}}_t^{(i)}\) from \(\tau _t({\text{ d }}x_t\mid x_{t-1}^{(i)})\),

Then generate a sample \(u_t^{(i)}\) from the uniform distribution \({\mathcal {U}}(0,1)\), and

If \(u_t^{(i)} < \varepsilon _M\), then we set \({{\tilde{x}}}_t^{(i)} = \alpha _t^{y_t}({{\bar{x}}}_t^{(i)})\), else we set \({{\tilde{x}}}_t^{(i)} = {{\bar{x}}}_t^{(i)}\).
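The three steps above define the observation-driven kernel \({{\tilde{\tau }}}_t^{y_t}\) of the modified model \({\mathcal {M}}_1\). A minimal sketch, with `sample_kernel` and `nudge` as hypothetical callables:

```python
import numpy as np

def sample_modified_kernel(x_prev, y, sample_kernel, nudge, eps_M, rng):
    """Draw from the observation-driven kernel implicit to the NuPF:
    propagate through the original kernel, then apply the nudging
    operator with probability eps_M (the fraction of nudged particles)."""
    x_bar = sample_kernel(x_prev, rng)   # step 1: original kernel draw
    u = rng.uniform()                    # step 2: U(0, 1) sample
    return nudge(x_bar, y) if u < eps_M else x_bar  # step 3: mixture
```

Seen this way, \({{\tilde{\tau }}}_t^{y_t}\) is a two-component mixture of the original kernel and its nudged image, which is why a standard BPF run on \({\mathcal {M}}_1\) reproduces the NuPF.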
Intuitively, one can expect that the observationdriven \({\mathcal {M}}_1\) is a better fit for the data sequence \(y_{1:T}\) than the original model \({\mathcal {M}}_0\). Within the Bayesian methodology, a common approach to compare two competing probabilistic models (\({\mathcal {M}}_0\) and \({\mathcal {M}}_1\) in this case) for a given dataset \(y_{1:t}\) is to evaluate the socalled model evidence (Bernardo and Smith 1994) for both \({\mathcal {M}}_0\) and \({\mathcal {M}}_1\).
Definition 2
The evidence (or likelihood) of a probabilistic model \({\mathcal {M}}\) for a given dataset \(y_{1:t}\) is the probability density of the data conditional on the model, which we denote as \(\mathsf{p}(y_{1:t}\mid {\mathcal {M}})\).
We have empirically observed in several computer experiments that \(\mathsf{p}(y_{1:t}\mid {\mathcal {M}}_1) > \mathsf{p}(y_{1:t}\mid {\mathcal {M}}_0)\) and we argue that the observation-driven kernel \({{\tilde{\tau }}}_t^{y_t}\) implicit to the NuPF is the reason why the latter filter is robust to modeling errors in the state equation, compared to standard PFs. This claim is supported by the numerical results in Sects. 5.2 and 5.3, which show how the NuPF attains significantly better performance than the standard BPF, the auxiliary PF (Pitt and Shephard 1999) or the extended Kalman filter (Anderson and Moore 1979) in scenarios where the filters are built upon a transition kernel different from the one used to generate the actual observations.
While it is hard to show that \(\mathsf{p}(y_{1:t}\mid {\mathcal {M}}_1) > \mathsf{p}(y_{1:t}\mid {\mathcal {M}}_0)\) for every NuPF, it is indeed possible to guarantee that the latter inequality holds for specific nudging schemes. An example is provided in “Appendix C”, where we describe a certain nudging operator \(\alpha _t^{y_t}\) and then proceed to prove that \(\mathsf{p}(y_{1:t}\mid {\mathcal {M}}_1) > \mathsf{p}(y_{1:t}\mid {\mathcal {M}}_0)\), for that particular scheme, under some regularity conditions on the likelihoods and transition kernels.
5 Computer simulations

A stochastic Lorenz 63 model with misspecified parameters,

A maneuvering target monitored by a network of sensors collecting nonlinear observations corrupted with heavytailed noise,

And, finally, a high-dimensional stochastic Lorenz 96 model.^{4}
5.1 A high-dimensional, inhomogeneous linear-Gaussian state-space model
We compare the NuPF with three alternative PFs. The first method we implement is the PF with the optimal proposal pdf \(p(x_t\mid x_{t-1},y_t) \propto g_t(y_t\mid x_t) \tau _t(x_t\mid x_{t-1})\), abbreviated as Optimal PF. The pdf \(p(x_t\mid x_{t-1},y_t)\) leads to an analytically tractable Gaussian density for model (5.1)–(5.3) (Doucet et al. 2000) but not in the nonlinear tracking examples below. Note, however, that at each time step, the mean and covariance matrix of this proposal have to be explicitly evaluated in order to compute the importance weights.
The third tracking algorithm implemented for model (5.1)–(5.3) is the conventional BPF.
5.2 Stochastic Lorenz 63 model with misspecified parameters
Let us note here that the Markov kernel which takes the state from time \(n-1\) to time n (i.e., from the time of one observation to the time of the next observation) is straightforward to simulate using the Euler–Maruyama scheme (5.6); however, the associated transition probability density cannot be evaluated because it involves the mapping of both the state and a sequence of \(t_s\) noise samples through a composition of nonlinear functions. This precludes the use of importance sampling schemes that require the evaluation of this density when computing the weights.
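For concreteness, such a kernel can be simulated by iterating Euler–Maruyama steps of the stochastic Lorenz 63 SDE between observation times. The sketch below uses the classical \((s, r, b)\) parameters; the step size and noise scale are illustrative assumptions, not necessarily the values of scheme (5.6):

```python
import numpy as np

def lorenz63_kernel(x, t_s, h=1e-3, sigma=np.sqrt(10.0), rng=None,
                    s=10.0, r=28.0, b=8.0 / 3.0):
    """Simulate the Markov kernel between two observation times by
    iterating t_s Euler-Maruyama steps of the stochastic Lorenz 63
    model. The kernel is easy to sample, but its density involves
    t_s noise variables pushed through nonlinear maps, hence it is
    intractable -- which is why BPF-style weights are used."""
    if rng is None:
        rng = np.random.default_rng()
    for _ in range(t_s):
        drift = np.array([
            s * (x[1] - x[0]),
            r * x[0] - x[1] - x[0] * x[2],
            x[0] * x[1] - b * x[2],
        ])
        x = x + h * drift + np.sqrt(h) * sigma * rng.normal(size=3)
    return x
```

Sampling this kernel is cheap, but no closed-form transition density is available, in line with the remark above.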
We have implemented the NuPF with independent gradient nudging. Each particle is nudged with probability \(\frac{1}{\sqrt{N}}\), where N is the number of particles (hence \({\mathbb {E}}[M]=\sqrt{N}\)) and the size of the gradient steps is set to \(\gamma = 0.75\) (see Algorithm 3).
Figure 2a displays the \({\text{ NMSE }}\), attained for varying number of particles N, for the standard BPF and the NuPF. It is seen that the NuPF outperforms the BPF for the whole range of values of N in the experiment, in terms of both the mean and the standard deviation of the errors, although the NMSE values become closer for larger N. The plot on the right displays the values of \(x_{2,t}\) and its estimates for a typical simulation. In general, the experiment shows that the NuPF can track the actual system using the misspecified model and a small number of particles, whereas the BPF requires a higher computational effort to attain a similar performance.
As a final experiment with this model, we have tested the robustness of the algorithms with respect to the choice of parameters in the nudging step. In particular, we have tested the NuPF with independent gradient nudging for a wide range of step sizes \(\gamma \). Also, we have tested the NuPF with random search nudging using a wide range of covariances of the form \(C = \sigma ^2 I\) by varying \(\sigma ^2\).
The results can be seen in Fig. 3, which shows that the algorithm is robust to the choice of parameters over a range of step sizes and variances of the random search step. As expected, random search nudging takes longer to run than gradient steps, and this difference in run times should widen in higher-dimensional models, where random search becomes harder.
5.3 Object tracking with a misspecified model
We have carried out 10,000 Monte Carlo runs with \(N = 500\) particles in the auxiliary particle filter (APF) (Pitt and Shephard 1999; Johansen and Doucet 2008; Douc et al. 2009), the BPF (Gordon et al. 1993) and the NuPF. We have also implemented the extended Kalman filter (EKF), which uses the gradient of the observation model.
Figure 4 shows a typical simulation run with each one of the four algorithms [on the left side, plots (a)–(d)] and a boxplot of the NMSEs obtained for the 10,000 simulations [on the right, plot (e)]. Plots (a)–(d) show that, while the EKF also uses the gradient of the observation model, it fails to handle the heavy-tailed noise, as it relies on Gaussian approximations. The BPF and the APF collapse due to the model mismatch in the state equation. Plot (e) shows that the NMSE of the NuPF is just slightly smaller in the mean than the NMSE of the EKF, but much more stable.
5.4 High-dimensional stochastic Lorenz 96 model
In all the simulations for this system, we run the NuPF with batch gradient nudging (with \(M = \lfloor \sqrt{N} \rfloor \) nudged particles and step size \(\gamma = 0.075\)). In the first computer experiment, we fixed the dimension \(d = 40\) and ran the BPF and the NuPF with an increasing number of particles. The results can be seen in Fig. 5, which shows that the NuPF performs better than the BPF in terms of NMSE [plot (a)]. Since the run times of both algorithms are nearly identical, the NuPF attains a significantly better performance when run times are considered jointly with NMSEs [plot (b)].
In a second computer experiment, we compared the NuPF with the EnKF. Figure 6a shows how the NMSE of the two algorithms grows as the model dimension d increases and the number of particles N is kept fixed. In particular, the EnKF attains a better performance for smaller dimensions (up to \(d=10^3\)); however, its NMSE blows up for \(d > 10^3\) while the performance of the NuPF remains stable. The running time of the EnKF was also higher than the running time of the NuPF in the range of higher dimensions (\(d\ge 10^3\)).
5.5 Assessment of bias
In this section, we numerically quantify the bias of the proposed algorithm on a low-dimensional linear-Gaussian state-space model. To assess the bias, we compute the marginal likelihood estimates given by the BPF and the NuPF. The reason for this choice is that the BPF is known to yield unbiased estimates of the marginal likelihood (Del Moral 2004).^{6} The NuPF leads to biased (typically overestimated) marginal likelihood estimates; hence, it is of interest to compare them with those of the BPF. To this end, we choose a simple linear-Gaussian state-space model for which the marginal likelihood can be exactly computed as a byproduct of the Kalman filter. We then compare this exact marginal likelihood to the estimates given by the BPF and the NuPF.
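As a sketch of this assessment (with a scalar model and names of our choosing, not the paper's code), the exact log marginal likelihood follows from the Kalman recursions, while the BPF estimate is \(\log \prod _t \frac{1}{N}\sum _i w_t^{(i)}\) with unnormalized weights \(w_t^{(i)}\):

```python
import numpy as np

def kalman_log_ml(y, a, q, r, m0=0.0, p0=1.0):
    """Exact log marginal likelihood of the scalar linear-Gaussian SSM
    x_t = a x_{t-1} + u_t, y_t = x_t + v_t (u_t ~ N(0,q), v_t ~ N(0,r)),
    computed as a byproduct of the Kalman filter."""
    m, p, log_ml = m0, p0, 0.0
    for yt in y:
        m_pred, p_pred = a * m, a * a * p + q
        s = p_pred + r                       # innovation variance
        log_ml += -0.5 * (np.log(2 * np.pi * s) + (yt - m_pred) ** 2 / s)
        k = p_pred / s                       # Kalman gain
        m = m_pred + k * (yt - m_pred)
        p = (1 - k) * p_pred
    return log_ml

def bpf_log_ml(y, a, q, r, n, rng):
    """BPF estimate of the same quantity: the running product of the
    averages of unnormalized weights, with multinomial resampling."""
    x = rng.standard_normal(n)               # x_0 ~ N(0, 1)
    log_ml = 0.0
    for yt in y:
        x = a * x + np.sqrt(q) * rng.standard_normal(n)
        w = np.exp(-0.5 * (yt - x) ** 2 / r) / np.sqrt(2 * np.pi * r)
        log_ml += np.log(w.mean())
        x = rng.choice(x, size=n, p=w / w.sum())   # multinomial resampling
    return log_ml
```

For moderate N, the BPF estimate fluctuates around the Kalman value; replacing the BPF by a NuPF in this comparison exposes the (positive) nudging bias.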
We have conducted an experiment aimed at quantifying the bias \(\epsilon >0\) above. In particular, we have run 20,000 independent simulations for the BPF and the NuPF with \(N=100\), \(N=1000\) and \(N=10,000\). For each value of N, we have computed running empirical means as in (5.12) and (6.1) for \(K=1, \ldots , 20,000\). The variance of \({{\bar{Z}}}_{\text {BPF}}^N\) increases with T; hence, the estimators for small K display a relatively large variance and we need \(K\gg 1\) to clearly observe the bias. The NuPF performs independent gradient nudging with step size \(\gamma = 0.1\).
The results of the experiment are displayed in Fig. 7, which shows how, as expected, the NuPF overestimates \(Z^\star \). We can also see how the bias becomes smaller as N increases (because only an average of \(\sqrt{N}\) particles is nudged per time step).
6 Experimental results on model inference
In this section, we illustrate the application of the NuPF to estimate the parameters of a financial time-series model. In particular, we adopt a stochastic volatility SSM and we aim at estimating its unknown parameters (and track its state variables) using the EUR/USD log-return data from 2014-12-31 to 2016-12-31 (obtained from www.quandl.com). For this task, we apply two recently proposed Monte Carlo schemes: the nested particle filter (NPF) (Crisan and Miguez 2018) (a purely recursive, particle-filter-style Monte Carlo method) and the particle Metropolis–Hastings (pMH) algorithm (Andrieu et al. 2010) (a batch Markov chain Monte Carlo procedure). In their original forms, both algorithms use the marginal likelihood estimators given by the BPF to construct a Monte Carlo approximation of the posterior distribution of the unknown model parameters. Here, we compare the performance of these algorithms when the marginal likelihoods are computed using either the BPF or the proposed NuPF.
6.1 Nudging the nested particle filter
We have conducted 1000 independent Monte Carlo runs for each algorithm and computed the model evidence estimates. We have used the same parameters and the same setup for the two versions of the NPF (nudged and conventional). In particular, each unknown parameter is jittered independently. The parameter \(\mu \) is jittered with a zero-mean Gaussian kernel with variance \(\sigma _\mu ^2 = 10^{-3}\), the parameter \(\sigma _v\) is jittered with a truncated Gaussian kernel on \((0,\infty )\) with variance \({\sigma }_{\sigma _v}^2 = 10^{-4}\), and the parameter \(\phi \) is jittered with a zero-mean truncated Gaussian kernel on \([-1,1]\) with variance \(\sigma _\phi ^2 = 10^{-4}\). We have chosen a large step size for the nudging step, \(\gamma = 4\), and we have used batch nudging with \(M = \lfloor \sqrt{N} \rfloor \).
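For concreteness, the jittering scheme just described can be sketched as follows (a simple rejection implementation of the truncated kernels; the function name is ours, and the variances follow the values given in the text, assuming the exponents are negative as usual for jittering kernels):

```python
import numpy as np

def jitter(mu, sigma_v, phi, rng):
    """Jitter the three stochastic-volatility parameters independently:
    a Gaussian kernel for mu (variance 1e-3), a Gaussian kernel truncated
    to (0, inf) for sigma_v (variance 1e-4) and one truncated to [-1, 1]
    for phi (variance 1e-4). Truncation is implemented by rejection."""
    mu_new = mu + np.sqrt(1e-3) * rng.standard_normal()
    sigma_v_new = -1.0
    while sigma_v_new <= 0.0:                 # reject draws outside (0, inf)
        sigma_v_new = sigma_v + np.sqrt(1e-4) * rng.standard_normal()
    phi_new = 2.0
    while not -1.0 <= phi_new <= 1.0:         # reject draws outside [-1, 1]
        phi_new = phi + np.sqrt(1e-4) * rng.standard_normal()
    return mu_new, sigma_v_new, phi_new
```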
The results in Fig. 8 demonstrate empirically that the use of the nudging step within the NPF reduces the variance of the model evidence estimators; hence, it improves the numerical stability of the NPF.
6.2 Nudging the particle Metropolis–Hastings
The pMH algorithm is a Markov chain Monte Carlo (MCMC) method for inferring parameters of general SSMs (Andrieu et al. 2010). The pMH uses PFs as auxiliary devices to estimate parameter likelihoods, in a similar way to how the NPF uses them to compute importance weights. In the case of the pMH, these estimates should be unbiased, and they are needed to determine the acceptance probability for each element of the Markov chain. For the details of the algorithm, see Andrieu et al. (2010) (or Dahlin and Schön 2015 for a tutorial-style introduction). Let us note that the use of the NuPF does not lead to an unbiased estimate of the likelihood with respect to the assumed SSM. However, as discussed in Sect. 4.3, one can view the use of nudging in this context as an implementation of pMH with an implicit dynamical model \({\mathcal {M}}_1\) derived from the original SSM \({\mathcal {M}}_0\).
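A single pMH iteration can be sketched as follows (the names and the symmetric Gaussian random-walk proposal are our assumptions; `estimate_log_z` stands for a run of the BPF, or the NuPF, that returns a log marginal likelihood estimate for the proposed parameter):

```python
import numpy as np

def pmh_step(theta, log_z, estimate_log_z, log_prior, prop_std, rng):
    """One particle Metropolis-Hastings step with a symmetric Gaussian
    random-walk proposal. The acceptance ratio uses the (noisy) particle
    filter estimates of the log marginal likelihood."""
    theta_prop = theta + prop_std * rng.standard_normal(theta.shape)
    log_z_prop = estimate_log_z(theta_prop)
    # Symmetric proposal: the ratio reduces to likelihood x prior.
    log_alpha = (log_z_prop + log_prior(theta_prop)) - (log_z + log_prior(theta))
    if np.log(rng.random()) < log_alpha:
        return theta_prop, log_z_prop, True   # accept
    return theta, log_z, False                # reject
```

When the estimates are unbiased (as with the BPF), the chain targets the exact parameter posterior; plugging in a NuPF instead targets the posterior of the implicit model \({\mathcal {M}}_1\) mentioned above.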
We have compared the pMH-BPF algorithm and the pMH-NuPF scheme (using a batch nudging procedure with \(\gamma = 0.1\) and \(M = \lfloor \sqrt{N}\rfloor \)) by running 1000 independent Monte Carlo trials. We have computed the marginal likelihood estimates in the NuPF after the nudging step.
First, in order to illustrate the impact of nudging on the parameter posteriors, we have run the pMH-NuPF and the pMH-BPF and obtained a long Markov chain (\(2\times 10^6\) iterations) from both algorithms. Figure 9 displays the two-dimensional marginals of the resulting posterior distribution. It can be seen from Fig. 9 that the bias of the NuPF yields a perturbation relative to the posterior distribution approximated with the pMH-BPF. The discrepancy is small but noticeable for small N (see Fig. 9a for \(N=100\)) and vanishes as we increase N (see Fig. 9b, c, for \(N=500\) and \(N=1000\), respectively). We observe that for a moderate number of particles, such as \(N=500\) in Fig. 9b, the error in the posterior distribution due to the bias in the NuPF is very slight.
Two common figures of merit for MCMC algorithms are the acceptance rate of the Markov kernel (desirably high) and the autocorrelation function of the chain (desirably low). Figure 10 shows the acceptance rates for the pMH-NuPF and the pMH-BPF algorithms with \(N=100\), \(N=500\) and \(N=1000\) particles in both PFs. It is observed that the use of nudging leads to noticeably higher acceptance rates, although the difference becomes smaller as N increases.
Figure 11 displays the average autocorrelation functions (ACFs) of the chains obtained in the 1000 independent simulations. The autocorrelation of the chains produced by the pMH-NuPF method decays more quickly than that of the chains output by the conventional pMH-BPF, especially for lower values of N. Even for \(N=1000\) (which ensures an almost negligible perturbation of the posterior distribution, as shown in Fig. 9c), there is an improvement in the ACFs of the parameters \(\phi \) and \(\sigma _v\) when using the NuPF. Lower correlation can also be expected to translate into better estimates for a fixed chain length.
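The empirical ACF used in such comparisons can be computed with the standard estimator below (the function name is ours):

```python
import numpy as np

def acf(chain, max_lag):
    """Empirical autocorrelation function of a scalar chain: the lag-k
    autocovariance normalized by the lag-0 (sample) variance."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(max_lag + 1)])
```

For a well-mixing chain the ACF decays quickly toward zero; a slowly decaying ACF signals high correlation and, for a fixed chain length, a smaller effective sample size.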
7 Conclusions
We have proposed a simple modification of the particle filter which, according to our computer experiments, can improve the performance of the algorithm (e.g., when tracking high-dimensional systems) or enhance its robustness to model mismatches in the state equation of a SSM. The modification of the standard particle filtering scheme consists of an additional step, which we term nudging, in which a subset of particles is pushed toward regions of the state space with a higher likelihood. In this way, the state space can be explored more efficiently while keeping the computational effort at nearly the same level as in a standard particle filter. We refer to the new algorithm as the "nudged particle filter" (NuPF). While, for clarity and simplicity, we have kept the discussion and the numerical comparisons restricted to the modification (nudging) of the conventional BPF, the new step can be naturally incorporated into most known particle filtering methods.
We have presented a basic analysis of the NuPF which indicates that the algorithm converges (in \(L_p\)) with the same error rate as the standard particle filter. In addition, we have also provided a simple reinterpretation of nudging that illustrates why the NuPF tends to outperform the BPF when there is some mismatch in the state equation of the SSM. To be specific, we have shown that, given a fixed sequence of observations, the NuPF amounts to a standard PF for a modified dynamical model which empirically leads to a higher model evidence (i.e., a higher likelihood) compared to the original SSM.
The analytical results have been supplemented with a number of computer experiments, with both synthetic and real data. In the latter case, we have tackled the fitting of a stochastic volatility SSM using Bayesian methods for model inference and a time-series dataset consisting of euro-to-US-dollar exchange rates over a period of two years. We have shown how different figures of merit (model evidence, acceptance probabilities or autocorrelation functions) improve when using the NuPF, instead of a standard BPF, in order to implement a nested particle filter (Crisan and Miguez 2018) and a particle Metropolis–Hastings (Andrieu et al. 2010) algorithm.
Since the nudging step is fairly general, it can be used with a wide range of differentiable or non-differentiable likelihoods. Besides, the new operation does not require any modification of the well-defined steps of the PF, so it can be plugged into a variety of common particle filtering methods. Therefore, it can be adopted by a practitioner with hardly any additional effort. In particular, gradient-nudging steps (for differentiable log-likelihoods) can be implemented using automatic differentiation tools, currently available in many software packages, hence relieving the user from explicitly calculating the gradient of the likelihood.
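As an illustration of the interface such a gradient-nudging step needs, the sketch below substitutes a central finite-difference approximation for the gradient (in practice an automatic-differentiation tool would fill this role, as noted above; all names here are ours):

```python
import numpy as np

def num_grad(f, x, eps=1e-6):
    """Central finite-difference approximation to the gradient of f at x.
    This stands in for the output of an autodiff tool in this sketch."""
    g = np.zeros_like(x, dtype=float)
    for j in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[j] = eps
        g[j] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def nudge(particle, log_lik, gamma):
    """One gradient-nudging step: move the particle toward higher
    (log-)likelihood with step size gamma."""
    return particle + gamma * num_grad(log_lik, particle)
```

The user only has to supply the log-likelihood itself; the gradient is obtained mechanically, whether by finite differences as here or by automatic differentiation.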
Similar to the resampling step, which is routinely employed for numerical stability, we believe the nudging step can be systematically used for improving the performance and robustness of particle filters.
Footnotes
 1.
 Given a dataset \(\{ y_1, \ldots , y_t \}\), the evidence in favor of a model \({\mathcal {M}}\) is the joint probability density of \(y_1, \ldots , y_t\) conditional on \({\mathcal {M}}\), denoted \(\mathsf{p}(y_{1:t}\mid {\mathcal {M}})\).
 2.
 3.
Note that the step sizes may have to be kept small enough to ensure that \(g_t({\bar{x}}_t^{(i)} + \gamma _t \nabla _x g_t({{\bar{x}}}_t^{(i)}) ) \ge g_t({{\bar{x}}}_t^{(i)})\), so that proper nudging, according to Definition 1, is performed.
 4.
 For the experiments involving the Lorenz 96 model, simulation from the model is implemented in C++ and integrated into MATLAB. The rest of the simulations are fully implemented in MATLAB.
 5.
 When N is increased, the results are similar for the NuPF, the optimal PF and the NuPF-PW, as they already perform close to optimally for \(N=100\); only the BPF improves significantly.
 6.
Note that the estimates of integrals \((\varphi ,\pi _t)\) computed using the selfnormalized importance sampling approximations (i.e., \((\varphi ,\pi _t^N) \approx (\varphi ,\pi _t)\)) produced by the BPF and the NuPF methods are biased and the bias vanishes with the same rate for both algorithms as a result of Theorem 1. The same is true for the approximate predictive measures \(\xi _t^N\).
References
 Ades, M., van Leeuwen, P.J.: An exploration of the equivalent weights particle filter. Q. J. R. Meteorol. Soc. 139(672), 820–840 (2013)
 Ades, M., van Leeuwen, P.J.: The equivalent-weights particle filter in a high-dimensional system. Q. J. R. Meteorol. Soc. 141(687), 484–503 (2015)
 Anderson, B.D.O., Moore, J.B.: Optimal Filtering. Prentice-Hall, Englewood Cliffs (1979)
 Andrieu, C., Doucet, A., Holenstein, R.: Particle Markov chain Monte Carlo methods. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 72(3), 269–342 (2010)
 Atkins, E., Morzfeld, M., Chorin, A.J.: Implicit particle methods and their connection with variational data assimilation. Mon. Weather Rev. 141(6), 1786–1803 (2013)
 Bain, A., Crisan, D.: Fundamentals of Stochastic Filtering. Springer, Berlin (2009)
 Bengtsson, T., Bickel, P., Li, B.: Curse-of-dimensionality revisited: collapse of the particle filter in very large scale systems. In: Probability and Statistics: Essays in Honor of David A. Freedman, pp. 316–334. Institute of Mathematical Statistics, Beachwood (2008)
 Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. Wiley, New York (1994)
 Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. I. Athena Scientific, Belmont (2001)
 Bubeck, S., et al.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015)
 Chopin, N.: Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Stat. 32(6), 2385–2411 (2004)
 Chorin, A.J., Tu, X.: Implicit sampling for particle filters. Proc. Natl. Acad. Sci. 106(41), 17249–17254 (2009)
 Chorin, A., Morzfeld, M., Tu, X.: Implicit particle filters for data assimilation. Commun. Appl. Math. Comput. Sci. 5(2), 221–240 (2010)
 Crisan, D.: Particle filters—a theoretical perspective. In: Doucet, A., de Freitas, N., Gordon, N. (eds.) Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science, pp. 17–41. Springer, New York (2001)
 Crisan, D., Doucet, A.: A survey of convergence results on particle filtering. IEEE Trans. Signal Process. 50(3), 736–746 (2002)
 Crisan, D., Miguez, J.: Uniform convergence over time of a nested particle filtering scheme for recursive parameter estimation in state-space Markov models. Adv. Appl. Probab. 49(4), 1170–1200 (2017)
 Crisan, D., Miguez, J.: Nested particle filters for online parameter estimation in discrete-time state-space Markov models. Bernoulli 24(4A), 3039–3086 (2018)
 Dahlin, J., Schön, T.B.: Getting started with particle Metropolis–Hastings for inference in nonlinear dynamical models. arXiv:1511.01707 (2015)
 Del Moral, P., Miclo, L.: Branching and interacting particle systems. Approximations of Feynman–Kac formulae with applications to nonlinear filtering. In: Azéma, J., Ledoux, M., Émery, M., Yor, M. (eds.) Séminaire de Probabilités XXXIV. Lecture Notes in Mathematics, vol. 1729, pp. 1–145. Springer, Berlin (2000)
 Del Moral, P.: Feynman–Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Springer, New York (2004)
 Del Moral, P., Guionnet, A.: Central limit theorem for nonlinear filtering and interacting particle systems. Ann. Appl. Probab. 9(2), 275–297 (1999)
 Del Moral, P., Guionnet, A.: On the stability of interacting processes with applications to filtering and genetic algorithms. Ann. Inst. Henri Poincaré (B) Probab. Stat. 37(2), 155–194 (2001)
 Douc, R., Moulines, E.: Limit theorems for weighted samples with applications to sequential Monte Carlo methods. Ann. Stat. 36(5), 2344–2376 (2008)
 Douc, R., Moulines, E., Olsson, J.: Optimality of the auxiliary particle filter. Probab. Math. Stat. 29(1), 1–28 (2009)
 Doucet, A., Godsill, S., Andrieu, C.: On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 10(3), 197–208 (2000)
 Doucet, A., De Freitas, N., Gordon, N.: An introduction to sequential Monte Carlo methods. In: Doucet, A., de Freitas, N., Gordon, N. (eds.) Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science, pp. 3–14. Springer, New York (2001)
 Gordon, N.J., Salmond, D.J., Smith, A.F.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. In: IEE Proceedings F (Radar and Signal Processing), vol. 140, pp. 107–113 (1993)
 Hoke, J.E., Anthes, R.A.: The initialization of numerical models by a dynamic-initialization technique. Mon. Weather Rev. 104(12), 1551–1556 (1976)
 Johansen, A.M., Doucet, A.: A note on auxiliary particle filters. Stat. Probab. Lett. 78(12), 1498–1504 (2008)
 Kitagawa, G.: Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Stat. 5(1), 1–25 (1996)
 Künsch, H.R.: Recursive Monte Carlo filters: algorithms and theoretical analysis. Ann. Stat. 33(5), 1983–2021 (2005)
 Liu, J.S., Chen, R.: Sequential Monte Carlo methods for dynamic systems. J. Am. Stat. Assoc. 93(443), 1032–1044 (1998)
 Malanotte-Rizzoli, P., Holland, W.R.: Data constraints applied to models of the ocean general circulation. Part I: the steady case. J. Phys. Oceanogr. 16(10), 1665–1682 (1986)
 Malanotte-Rizzoli, P., Holland, W.R.: Data constraints applied to models of the ocean general circulation. Part II: the transient, eddy-resolving case. J. Phys. Oceanogr. 18(8), 1093–1107 (1988)
 Míguez, J., Crisan, D., Djurić, P.M.: On the convergence of two sequential Monte Carlo methods for maximum a posteriori sequence estimation and stochastic global optimization. Stat. Comput. 23(1), 91–107 (2013)
 Oreshkin, B.N., Coates, M.J.: Analysis of error propagation in particle filters with approximation. Ann. Appl. Probab. 21(6), 2343–2378 (2011)
 Pitt, M.K., Shephard, N.: Filtering via simulation: auxiliary particle filters. J. Am. Stat. Assoc. 94(446), 590–599 (1999)
 Robert, C.P.: The Bayesian Choice. Springer, New York (2007)
 Shiryaev, A.N.: Probability. Springer, Berlin (1996)
 Snyder, C., Bengtsson, T., Bickel, P., Anderson, J.: Obstacles to high-dimensional particle filtering. Mon. Weather Rev. 136(12), 4629–4640 (2008)
 Tsay, R.S.: Analysis of Financial Time Series. Wiley, New York (2005)
 van Leeuwen, P.J.: Particle filtering in geophysical systems. Mon. Weather Rev. 137(12), 4089–4114 (2009)
 van Leeuwen, P.J.: Nonlinear data assimilation in geosciences: an extremely efficient particle filter. Q. J. R. Meteorol. Soc. 136(653), 1991–1999 (2010)
 Zou, X., Navon, I., Le Dimet, F.: An optimal nudging data assimilation scheme using parameter estimation. Q. J. R. Meteorol. Soc. 118(508), 1163–1186 (1992)
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.