Abstract
This paper presents a novel algorithm for efficient online estimation of the filter derivatives in general hidden Markov models. The algorithm, which has a linear computational complexity and very limited memory requirements, is furnished with a number of convergence results, including a central limit theorem with an asymptotic variance that can be shown to be uniformly bounded in time. Using the proposed filter derivative estimator, we design a recursive maximum likelihood algorithm updating the parameters according to the gradient of the one-step predictor log-likelihood. The efficiency of this online parameter estimation scheme is illustrated in a simulation study.
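The recursive maximum likelihood scheme summarised above can be illustrated, in a simplified exact form, on a finite-state HMM, where the filter and its derivative (the tangent filter) admit closed-form recursions; the paper's algorithm replaces these exact recursions by PaRIS-based particle estimates in general state spaces. The sketch below is an assumption-laden illustration, not the paper's method: it assumes a two-state chain with a fixed transition matrix and unit-variance Gaussian emissions with means \((\theta, -\theta)\), and all function names are ours.

```python
import numpy as np

def gaussian_pdf(y, means):
    # Density of N(mean, 1) at y, evaluated for each state mean.
    return np.exp(-0.5 * (y - means) ** 2) / np.sqrt(2.0 * np.pi)

def loglik_and_score(ys, Q, mu0, theta):
    """Exact filter/tangent-filter recursion for a finite HMM with fixed
    transition matrix Q and Gaussian emissions with means (theta, -theta).
    Returns the log-likelihood and its theta-derivative (the score)."""
    means = np.array([theta, -theta])
    dmeans = np.array([1.0, -1.0])             # d(means)/d(theta)
    pi, tau = mu0.copy(), np.zeros_like(mu0)   # filter and tangent filter
    loglik, score = 0.0, 0.0
    for y in ys:
        mu, dmu = pi @ Q, tau @ Q              # one-step predictor and derivative
        g = gaussian_pdf(y, means)
        dg = g * (y - means) * dmeans          # theta-derivative of emission density
        c = mu @ g                             # predictive likelihood of y
        dc = dmu @ g + mu @ dg
        pi = mu * g / c                        # filter update
        tau = (dmu * g + mu * dg) / c - pi * dc / c   # tangent-filter update
        loglik += np.log(c)
        score += dc / c                        # gradient of log predictor likelihood
    return loglik, score
```

A recursive maximum likelihood update would then apply \(\theta_{t+1} = \theta_t + \gamma_t \, \partial_\theta \log c_t\) online, with a decreasing step-size sequence \((\gamma_t)\); the paper's contribution is a stable particle approximation of the quantities computed exactly here.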
References
Anderson, B. D. O., Moore, J. B. (1979). Optimal filtering. New Jersey: Prentice-Hall.
Cappé, O. (2001). Ten years of HMMs (online bibliography 1989–2000). http://perso.telecom-paristech.fr/~cappe/docs/hmmbib.html. Accessed Mar 2013.
Cappé, O. (2011). Online EM algorithm for hidden Markov models. Journal of Computational and Graphical Statistics, 20(3), 728–749.
Cappé, O., Moulines, E., Rydén, T. (2005). Inference in hidden Markov models. New York: Springer.
Crisan, D., Heine, K. (2008). Stability of the discrete time filter in terms of the tails of noise distributions. Journal of the London Mathematical Society, 78(2), 441–458.
Del Moral, P., Guionnet, A. (2001). On the stability of interacting processes with applications to filtering and genetic algorithms. Annales de l’Institut Henri Poincaré, 37(2), 155–194.
Del Moral, P., Doucet, A., Singh, S. (2010). A backward particle interpretation of Feynman–Kac formulae. ESAIM: Mathematical Modelling and Numerical Analysis, 44(5), 947–975.
Del Moral, P., Doucet, A., Singh, S. S. (2015). Uniform stability of a particle approximation of the optimal filter derivative. SIAM Journal on Control and Optimization, 53(3), 1278–1304.
Delyon, B. (1996). General results on the convergence of stochastic algorithms. IEEE Transactions on Automatic Control, 41(9), 1245–1255.
Dempster, A. P., Laird, N. M., Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38 (with discussion).
Douc, R., Matias, C. (2001). Asymptotics of the maximum likelihood estimator for general hidden Markov models. Bernoulli, 7(3), 381–420.
Douc, R., Garivier, A., Moulines, E., Olsson, J. (2011). Sequential Monte Carlo smoothing for general state space hidden Markov models. Annals of Applied Probability, 21(6), 2109–2145.
Douc, R., Moulines, E., Olsson, J. (2014). Long-term stability of sequential Monte Carlo methods under verifiable conditions. Annals of Applied Probability, 24(5), 1767–1802.
Doucet, A., Tadić, V. B. (2003). Parameter estimation in general state-space models using particle methods. Annals of the Institute of Statistical Mathematics, 55(2), 409–422.
Doucet, A., Godsill, S., Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3), 197–208.
Doucet, A., De Freitas, N., Gordon, N. (Eds.). (2001). Sequential Monte Carlo methods in practice. New York: Springer.
Fearnhead, P., Wyncoll, D., Tawn, J. (2010). A sequential smoothing algorithm with linear computational cost. Biometrika, 97(2), 447–464.
Gordon, N., Salmond, D., Smith, A. F. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F-Radar and Signal Processing, 140(2), 107–113.
Hull, J., White, A. (1987). The pricing of options on assets with stochastic volatilities. The Journal of Finance, 42(2), 281–300.
Jacob, P. E., Murray, L. M., Rubenthaler, S. (2013). Path storage in the particle filter. Statistics and Computing, 25(2), 487–496.
Jasra, A. (2015). On the behaviour of the backward interpretation of Feynman–Kac formulae under verifiable conditions. Journal of Applied Probability, 52(2), 339–359.
Julier, S. J., Uhlmann, J. K. (1997). A new extension of the Kalman filter to nonlinear systems. In AeroSense: The 11th international symposium on aerospace/defense sensing, simulation and controls.
Kantas, N., Doucet, A., Singh, S. S., Maciejowski, J., Chopin, N. (2015). On particle methods for parameter estimation in state-space models. Statistical Science, 30(3), 328–351.
Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1), 1–25.
Kitagawa, G., Sato, S. (2001). Monte Carlo smoothing and self-organising state-space model. In Sequential Monte Carlo methods in practice (pp. 177–195). New York: Springer.
Le Corff, S., Fort, G., Moulines, E. (2011). Online expectation maximization algorithm to solve the SLAM problem. In 2011 IEEE statistical signal processing workshop (SSP) (pp. 225–228).
Le Gland, F., Mevel, L. (1996). Geometric ergodicity in hidden Markov models. In Research report, RR-2991, INRIA.
Le Gland, F., Mevel, L. (1997). Recursive estimation in hidden Markov models. In Proceedings of the 36th IEEE conference on decision and control (pp. 3468–3473).
Martinez-Cantin, R., de Freitas, N., Castellanos, J. A. (2007). Analysis of particle methods for simultaneous robot localization and mapping and a new algorithm: Marginal-SLAM. In Proceedings 2007 IEEE international conference on robotics and automation (pp. 2415–2420).
Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B. (2002). FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of the AAAI national conference on artificial intelligence. Edmonton: AAAI.
Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B. (2003). An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. In Proceedings of the eighteenth international joint conference on artificial intelligence (IJCAI). Acapulco: IJCAI.
Nguyen, T. N. M., Le Corff, S., Moulines, E. (2017). On the two-filter approximations of marginal smoothing distributions in general state-space models. Advances in Applied Probability, 50(1), 154–177.
Olsson, J., Cappé, O., Douc, R., Moulines, E. (2008). Sequential Monte Carlo smoothing with application to parameter estimation in non-linear state space models. Bernoulli, 14(1), 155–179.
Olsson, J., Westerborn, J. (2016). Efficient parameter inference in general hidden Markov models using the filter derivatives. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 3984–3988).
Olsson, J., Westerborn, J. (2017). Efficient particle-based online smoothing in general hidden Markov models: The PaRIS algorithm. Bernoulli, 23(3), 1951–1996.
Poyiadjis, G., Doucet, A., Singh, S. (2011). Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika, 98(1), 65–80.
Poyiadjis, G., Doucet, A., Singh, S. S. (2005). Particle methods for optimal filter derivative: application to parameter estimation. In Proceedings IEEE international conference on acoustics, speech, and signal processing (pp. 925–928).
Tadić, V. B. (2010). Analyticity, convergence, and convergence rate of recursive maximum-likelihood estimation in hidden Markov models. IEEE Transactions on Information Theory, 56(12), 6406–6432.
Tadić, V. B., Doucet, A. (2017). Asymptotic bias of stochastic gradient search. Annals of Applied Probability, 27(6), 3255–3304.
Appendices
Proofs
Define for all \(t \in \mathbb {N}\) and \(\theta \in \Theta \),
(Note that our definition of \(\mathbf {L}_{t}\) differs from that used by Olsson and Westerborn (2017), in which the order of \(g_{t;\theta }\) and \(\mathbf {Q}_{\theta }\) is swapped.) With this notation, by the filtering recursion (4)–(5),
with, as previously, . This condensed form of the filtering recursion will be used in Sect. A.3.
In the coming analysis, the following decomposition will be instrumental. For all \(t \in \mathbb {N}\),
1.1 Proof of Theorem 1
We apply the decomposition (21). Note that
where
Now, since and both belong to \(\textsf {F}(\mathcal {X})\), Proposition 1 provides constants \(c_t > 0\) and \(\tilde{c}_t > 0\) such that for all \(\varepsilon > 0\),
To deal with the second part of the decomposition (21), we use the same technique. First, by applying Proposition 1 with \(f \equiv 1/g_{t;\theta }\) and \(\tilde{f} \equiv 0\), we obtain constants \(a_t > 0\) and \(\tilde{a}_t > 0\) such that for all \(\varepsilon > 0\),
Similarly, using Proposition 1 with and \(\tilde{f} \equiv f_t / g_{t;\theta }\) provides constants \(b_t > 0\) and \(\tilde{b}_t > 0\) such that for all \(\varepsilon > 0\),
Combining (24), (25), and (26) yields, for all \(\varepsilon > 0\),
from which the statement of the theorem follows. \(\square \)
The following result is obtained by inspection of the proof of Olsson and Westerborn (2017, Theorem 1(i)).
Proposition 1
Let Assumption 1 hold. Then, for all \(t \in \mathbb {N}\), all \(\theta \in \Theta \), all additive state functionals , all measurable functions \(f_t\)and \(\tilde{f}_t\) such that \(f_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\) and \(\tilde{f}_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\), and all \(\tilde{N}\in \mathbb {N}\), there exist constants \(c_t > 0\) and \(\tilde{c}_t > 0\) (possibly depending on \(\theta \), \(h_{t}\)\(f_t\), \(\tilde{f}_t\), and \(\tilde{N}\)) such that for all \(\varepsilon > 0\),
where are produced using the PaRIS algorithm.
1.2 Proof of Corollary 1
The \(\mathbb {P}\)-a.s. convergence of to is implied straightforwardly by the exponential convergence rate in Theorem 1. Indeed, note that
now, by Theorem 1,
where the right-hand side tends to zero when n tends to infinity. This completes the proof. \(\square \)
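For completeness, the passage from the exponential deviation bound to almost sure convergence is the standard Borel–Cantelli argument. In a hedged reconstruction (the displayed bounds are not reproduced in this version of the text, so the deviation \(\Delta_N\) and the constants \(c_t, \tilde{c}_t > 0\) below are placeholders standing in for the quantities of Theorem 1), it reads:

```latex
\sum_{N = 1}^{\infty} \mathbb{P}\bigl( |\Delta_N| \ge \varepsilon \bigr)
  \le c_t \sum_{N = 1}^{\infty} e^{- \tilde{c}_t \varepsilon^2 N}
  = \frac{c_t \, e^{- \tilde{c}_t \varepsilon^2}}{1 - e^{- \tilde{c}_t \varepsilon^2}}
  < \infty ,
```

so that, by the first Borel–Cantelli lemma, \(|\Delta_N| \ge \varepsilon\) occurs only finitely often \(\mathbb{P}\)-a.s., for every \(\varepsilon > 0\).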
1.3 Proof of Theorem 2
where in this case
are defined in (23). By Proposition 2, since and ,
where Z is standard normally distributed and
with \(\sigma _{t;\theta }(h_{t;\theta })\) being defined in (14). Now, Proposition 1 and Proposition 2 yield
and
(with 0 denoting the zero function), respectively, implying, by Slutsky’s theorem,
Finally, we complete the proof by noting that the term in (14) coincides with the asymptotic variance provided by Del Moral et al. (2015, Theorem 3.2). \(\square \)
Proposition 2
Let Assumption 1 hold. Then for all \(t \in \mathbb {N}\), all \(\theta \in \Theta \), all additive state functionals , all measurable functions \(f_t \in \textsf {F}(\mathcal {X})\) and \(\tilde{f}_t \in \textsf {F}(\mathcal {X})\), and all \(\tilde{N}\in \mathbb {N}\), as \(N\rightarrow \infty \),
where Z is a standard Gaussian random variable and
with
and
Proof of Proposition 2
Assume first that . Then, by Lemma 1 and Slutsky’s theorem, as \(\Omega _{t} / N\overset{\mathbb {P}}{\longrightarrow }\pi _{t;\theta } g_{t;\theta }\) by Proposition 1,
where again Z has a standard Gaussian distribution, is given in Lemma 1, and we have set and and used, first, that and and, second, that
Now, by iterating (20) we conclude that for all \((s, t) \in \mathbb {N}^2\),
and, consequently,
Finally, by (28) it holds that
where \(\Gamma ^2_{t;\theta } \langle f_t, \tilde{f}_t \rangle (h_{t})\) is defined in (29) and \(\sigma ^2_{t;\theta } \langle f_t, \tilde{f}_t \rangle (h_{t})\) is defined in (27). In the general case, the previous holds true when \(\tilde{f}_t\) is replaced by , which completes the proof. \(\square \)
The following lemma is obtained by inspection of the proof of Olsson and Westerborn (2017, Theorem 3).
Lemma 1
Let Assumption 1 hold. Then for all \(t \in \mathbb {N}\), all \(\theta \in \Theta \), all additive state functionals , all measurable functions \(f_t\)and \(\tilde{f}_t\) such that \(f_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\) and \(\tilde{f}_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\), and all \(\tilde{N}\in \mathbb {N}\), as \(N\rightarrow \infty \),
where Z is a standard normally distributed random variable and
with
and
1.4 Proof of Theorem 3
As noted above, the first term of the asymptotic variance coincides with the asymptotic variance obtained by Del Moral et al. (2015, Theorem 3.2). The same work provides a constant \(c \in \mathbb {R}_+\) such that , and we may hence focus on bounding the second term of the asymptotic variance.
For this purpose, note that for all \(s \le t - 1\) and \(x_{s + 1} \in \textsf {X}\),
By applying the forgetting of the filter, or, more particularly, Douc et al. (2011, Lemma 10), to (30) we obtain
Note that in the previous bound, the exponential contraction follows from the fact that the objective function \(f_t\) is centred around its predicted mean. The latter is a consequence of the fact that the tangent filter, being a covariance, is itself centred (recall the identities (10) and (11) and the decomposition (21)). In addition, from the proof of Olsson and Westerborn (2017, Theorem 8) we extract, using Assumption 3,
and under Assumption 2, for all \(x \in \textsf {X}\),
Combining the previous bounds gives
where
Finally, summing up yields
which completes the proof. \(\square \)
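For completeness, the summation step concluding the proof is the usual geometric-series bound. In a hedged form (with \(c > 0\) and \(\rho \in (0, 1)\) standing in as placeholders for the constant and mixing rate produced by the preceding forgetting bounds), each summand is dominated by \(c \rho^{t - s}\), whence

```latex
\sum_{s = 0}^{t - 1} c \, \rho^{\, t - s}
  \le c \sum_{k = 1}^{\infty} \rho^{k}
  = \frac{c \, \rho}{1 - \rho} < \infty ,
```

uniformly in \(t\); this uniformity is precisely what yields the time-uniform bound on the asymptotic variance.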
Kernels
Given two measurable spaces \((\textsf {X},\mathcal {X})\) and \((\textsf {Y},\mathcal {Y})\), an unnormalised transition kernel \(\mathbf {K}\) between these spaces induces two integral operators, one acting on functions and the other on measures. Specifically, we define the function
and the measure
whenever these quantities are well defined. Moreover, let \(\mathbf {L}\) be another unnormalised transition kernel from \((\textsf {Y},\mathcal {Y})\) to the measurable space \((\textsf {Z},\mathcal {Z})\); then two different products of \(\mathbf {K}\) and \(\mathbf {L}\) can be defined, namely
and
whenever these are well defined. These products form new transition kernels from \((\textsf {X},\mathcal {X})\) to \((\textsf {Z},\mathcal {Z})\) and from \((\textsf {X},\mathcal {X})\) to , respectively. Also the -product of a kernel \(\mathbf {K}\) and a measure \(\nu \in \textsf {M}(\mathcal {X})\) is defined as the new measure
Finally, for any kernel \(\mathbf {K}\) and any bounded measurable function h we write \(\mathbf {K}^2 h {:=}(\mathbf {K} h)^2\) and \(\mathbf {K} h^2 {:=}\mathbf {K}(h^2)\). Similar notation will be used for measures.
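On a finite state space the operations above reduce to matrix algebra, which gives a concrete way to check the definitions: an unnormalised kernel from a space of size m to one of size n is a nonnegative m-by-n matrix. The sketch below is illustrative only (the function names are ours, not the paper's notation).

```python
import numpy as np

# An unnormalised kernel K from X (size m) to Y (size n) is a nonnegative
# m-by-n matrix with K(x, {y}) = K[x, y].

def kernel_on_function(K, f):
    """The function Kf(x) = sum_y K(x, y) f(y)."""
    return K @ f

def measure_on_kernel(nu, K):
    """The measure (nu K)({y}) = sum_x nu(x) K(x, y)."""
    return nu @ K

def kernel_product(K, L):
    """The product kernel (K L)(x, {z}) = sum_y K(x, y) L(y, z)."""
    return K @ L

def tensor_product(K, L):
    """The kernel (x, {(y, z)}) -> K(x, y) L(y, z), from X to Y x Z."""
    return K[:, :, None] * L[None, :, :]
```

Summing the tensor product over its middle coordinate recovers the ordinary product kernel, and for a Markov (row-stochastic) kernel the notation of the last paragraph satisfies the Jensen inequality \(\mathbf{K} h^2 \ge \mathbf{K}^2 h\) pointwise.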
Olsson, J., Westerborn Alenlöv, J. Particle-based online estimation of tangent filters with application to parameter estimation in nonlinear state-space models. Ann Inst Stat Math 72, 545–576 (2020). https://doi.org/10.1007/s10463-018-0698-1