Particle-based online estimation of tangent filters with application to parameter estimation in nonlinear state-space models

Annals of the Institute of Statistical Mathematics

Abstract

This paper presents a novel algorithm for efficient online estimation of the filter derivatives in general hidden Markov models. The algorithm, which has a linear computational complexity and very limited memory requirements, is furnished with a number of convergence results, including a central limit theorem with an asymptotic variance that can be shown to be uniformly bounded in time. Using the proposed filter derivative estimator, we design a recursive maximum likelihood algorithm updating the parameters according to the gradient of the one-step predictor log-likelihood. The efficiency of this online parameter estimation scheme is illustrated in a simulation study.
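To fix ideas, here is a minimal sketch (not the authors' implementation) of the kind of recursive maximum likelihood update the paper works towards: at each time step, an estimate of the gradient of the one-step predictor log-likelihood is used to move the parameter along a decreasing step-size sequence. The function and parameter names below, in particular the placeholder `score_increment` standing in for the particle-based filter-derivative estimator developed in the paper, are illustrative assumptions only.

```python
import numpy as np

def recursive_ml(observations, theta0, score_increment, gamma0=0.1, decay=0.6):
    """Generic recursive maximum-likelihood (stochastic gradient ascent) sketch.

    observations    : sequence of observations y_0, ..., y_{T-1}
    theta0          : initial parameter value (NumPy array)
    score_increment : callable (theta, y_t, t) -> estimate of the gradient of the
                      one-step predictor log-likelihood at time t (placeholder for
                      the particle-based tangent-filter estimator of the paper)
    gamma0, decay   : step sizes gamma_t = gamma0 * (t + 1) ** (-decay)
    """
    theta = np.asarray(theta0, dtype=float)
    path = [theta.copy()]
    for t, y_t in enumerate(observations):
        gamma_t = gamma0 * (t + 1.0) ** (-decay)
        theta = theta + gamma_t * score_increment(theta, y_t, t)  # gradient ascent step
        path.append(theta.copy())
    return np.array(path)

# Toy usage with an exact score: i.i.d. N(theta, 1) observations.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.normal(loc=2.0, scale=1.0, size=5000)
    exact_score = lambda theta, y_t, t: np.array([y_t - theta[0]])
    estimates = recursive_ml(obs, np.array([0.0]), exact_score)
    print(estimates[-1])  # should end up close to 2.0
```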


References

  • Anderson, B. D. O., Moore, J. B. (1979). Optimal filtering. New Jersey: Prentice-Hall.

  • Cappé, O. (2001). Ten years of HMMs (online bibliography 1989–2000). http://perso.telecom-paristech.fr/~cappe/docs/hmmbib.html. Accessed Mar 2013.

  • Cappé, O. (2011). Online EM algorithm for hidden Markov models. Journal of Computational and Graphical Statistics, 20(3), 728–749.

  • Cappé, O., Moulines, E., Rydén, T. (2005). Inference in hidden Markov models. New York: Springer.

  • Crisan, D., Heine, K. (2008). Stability of the discrete time filter in terms of the tails of noise distributions. Journal of the London Mathematical Society, 78(2), 441–458.

  • Del Moral, P., Guionnet, A. (2001). On the stability of interacting processes with applications to filtering and genetic algorithms. Annales de l’Institut Henri Poincaré, 37(2), 155–194.

  • Del Moral, P., Doucet, A., Singh, S. (2010). A backward particle interpretation of Feynman–Kac formulae. ESAIM: Mathematical Modelling and Numerical Analysis, 44(5), 947–975.

  • Del Moral, P., Doucet, A., Singh, S. S. (2015). Uniform stability of a particle approximation of the optimal filter derivative. SIAM Journal on Control and Optimization, 53(3), 1278–1304.

  • Delyon, B. (1996). General results on the convergence of stochastic algorithms. IEEE Transactions on Automatic Control, 41(9), 1245–1255.

  • Dempster, A. P., Laird, N. M., Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1), 1–38 (with discussion).

  • Douc, R., Matias, C. (2001). Asymptotics of the maximum likelihood estimator for general hidden Markov models. Bernoulli, 7(3), 381–420.

  • Douc, R., Garivier, A., Moulines, E., Olsson, J. (2011). Sequential Monte Carlo smoothing for general state space hidden Markov models. Annals of Applied Probability, 21(6), 2109–2145.

  • Douc, R., Moulines, E., Olsson, J. (2014). Long-term stability of sequential Monte Carlo methods under verifiable conditions. Annals of Applied Probability, 24(5), 1767–1802.

  • Doucet, A., Tadić, V. B. (2003). Parameter estimation in general state-space models using particle methods. Annals of the Institute of Statistical Mathematics, 55(2), 409–422.

  • Doucet, A., Godsill, S., Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3), 197–208.

  • Doucet, A., De Freitas, N., Gordon, N. (Eds.). (2001). Sequential Monte Carlo methods in practice. New York: Springer.

  • Fearnhead, P., Wyncoll, D., Tawn, J. (2010). A sequential smoothing algorithm with linear computational cost. Biometrika, 97(2), 447–464.

  • Gordon, N., Salmond, D., Smith, A. F. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F-Radar and Signal Processing, 140(2), 107–113.

  • Hull, J., White, A. (1987). The pricing of options on assets with stochastic volatilities. The Journal of Finance, 42(2), 281–300.

  • Jacob, P. E., Murray, L. M., Rubenthaler, S. (2013). Path storage in the particle filter. Statistics and Computing, 25(2), 487–496.

  • Jasra, A. (2015). On the behaviour of the backward interpretation of Feynman–Kac formulae under verifiable conditions. Journal of Applied Probability, 52(2), 339–359.

  • Julier, S. J., Uhlmann, J. K. (1997). A new extension of the Kalman filter to nonlinear systems. In AeroSense: The 11th international symposium on aerospace/defense sensing, simulation and controls.

  • Kantas, N., Doucet, A., Singh, S. S., Maciejowski, J., Chopin, N. (2015). On particle methods for parameter estimation in state-space models. Statistical Science, 30(3), 328–351.

  • Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1), 1–25.

  • Kitagawa, G., Sato, S. (2001). Monte Carlo smoothing and self-organising state-space model. In Sequential Monte Carlo methods in practice (pp. 177–195). New York: Springer.

  • Le Corff, S., Fort, G., Moulines, E. (2011). Online expectation maximization algorithm to solve the SLAM problem. In 2011 IEEE statistical signal processing workshop (SSP) (pp. 225–228).

  • Le Gland, F., Mevel, L. (1996). Geometric ergodicity in hidden Markov models. Research report RR-2991, INRIA.

  • Le Gland, F., Mevel, L. (1997). Recursive estimation in hidden Markov models. In Proceedings of the 36th IEEE conference on decision and control (pp. 3468–3473).

  • Martinez-Cantin, R., de Freitas, N., Castellanos, J. A. (2007). Analysis of particle methods for simultaneous robot localization and mapping and a new algorithm: Marginal-SLAM. In Proceedings 2007 IEEE international conference on robotics and automation (pp. 2415–2420).

  • Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B. (2002). FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of the AAAI national conference on artificial intelligence. Edmonton: AAAI.

  • Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B. (2003). An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. In Proceedings of the sixteenth international joint conference on artificial intelligence (IJCAI). Acapulco: IJCAI.

  • Nguyen, T. N. M., Le Corff, S., Moulines, E. (2017). On the two-filter approximations of marginal smoothing distributions in general state-space models. Advances in Applied Probability, 50(1), 154–177.

  • Olsson, J., Cappé, O., Douc, R., Moulines, E. (2008). Sequential Monte Carlo smoothing with application to parameter estimation in non-linear state space models. Bernoulli, 14(1), 155–179.

  • Olsson, J., Westerborn, J. (2016). Efficient parameter inference in general hidden Markov models using the filter derivatives. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 3984–3988).

  • Olsson, J., Westerborn, J. (2017). Efficient particle-based online smoothing in general hidden Markov models: The PaRIS algorithm. Bernoulli, 23(3), 1951–1996.

  • Poyiadjis, G., Doucet, A., Singh, S. (2011). Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika, 98(1), 65–80.

  • Poyiadjis, G., Doucet, A., Singh, S. S. (2005). Particle methods for optimal filter derivative: application to parameter estimation. In Proceedings IEEE international conference on acoustics, speech, and signal processing (pp. 925–928).

  • Tadić, V. B. (2010). Analyticity, convergence, and convergence rate of recursive maximum-likelihood estimation in hidden Markov models. IEEE Transactions on Information Theory, 56(12), 6406–6432.

  • Tadić, V. B., Doucet, A. (2017). Asymptotic bias of stochastic gradient search. Annals of Applied Probability, 27(6), 3255–3304.


Author information

Corresponding author

Correspondence to Johan Westerborn Alenlöv.

Appendices

Proofs

Define for all \(t \in \mathbb {N}\) and \(\theta \in \Theta \),

$$\begin{aligned} \mathbf {L}_{t;\theta } : \textsf {X} \times \mathcal {X} \ni (x, A) \mapsto g_{t;\theta }(x) \mathbf {Q}_{\theta }(x, A). \end{aligned}$$

(Note that our definition of \(\mathbf {L}_{t}\) differs from that used by Olsson and Westerborn (2017), in which the order of \(g_{t;\theta }\) and \(\mathbf {Q}_{\theta }\) is swapped.) With this notation, by the filtering recursion (4)–(5),

$$\begin{aligned} \pi _{t+1;\theta } = \frac{\pi _{t;\theta } \mathbf {L}_{t;\theta }}{\pi _{t;\theta } \mathbf {L}_{t;\theta }\mathbb {1}_{\textsf {X}}}, \end{aligned}$$
(20)

with, as previously, . This condensed form of the filtering recursion will be used in Sect. A.3.
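For readers who prefer a computational picture, the following is a minimal sketch of how a weighted particle sample can be pushed through one step of the normalized recursion (20) with \(\mathbf {L}_{t;\theta }(x, A) = g_{t;\theta }(x) \mathbf {Q}_{\theta }(x, A)\): weight by the local likelihood, self-normalise, resample, and mutate through \(\mathbf {Q}_{\theta }\). This is a plain bootstrap-type update, not the PaRIS-based estimator analysed in the paper, and the callables `g_t` and `sample_Q` are assumed to be supplied by the user.

```python
import numpy as np

def filter_step(particles, g_t, sample_Q, rng):
    """One self-normalised particle update of pi_{t+1} = pi_t L_t / (pi_t L_t 1),
    where L_t(x, A) = g_t(x) Q(x, A).

    particles : (N, d) array whose rows approximate pi_t
    g_t       : callable mapping the (N, d) array to the N local likelihoods g_{t;theta}(xi^i)
    sample_Q  : callable (x, rng) -> one draw from Q_theta(x, .)
    """
    w = g_t(particles)                 # unnormalised weights g_t(xi^i)
    w = w / w.sum()                    # self-normalisation (the denominator in (20))
    n = len(particles)
    idx = rng.choice(n, size=n, p=w)   # multinomial resampling according to the weights
    return np.array([sample_Q(particles[i], rng) for i in idx])  # mutation through Q_theta
```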

In the coming analysis, the following decomposition will be instrumental. For all \(t \in \mathbb {N}\),

(21)

A.1 Proof of Theorem 1

We apply the decomposition (21). Note that

(22)

where

(23)

Now, since and both belong to \(\textsf {F}(\mathcal {X})\), Proposition 1 provides constants \(c_t > 0\) and \(\tilde{c}_t > 0\) such that for all \(\varepsilon > 0\),

(24)

To deal with the second part of the decomposition (21), we use the same technique. First, by applying Proposition 1 with \(f \equiv 1/g_{t;\theta }\) and \(\tilde{f} \equiv 0\), we obtain constants \(a_t > 0\) and \(\tilde{a}_t > 0\) such that for all \(\varepsilon > 0\),

(25)

Similarly, using Proposition 1 with and \(\tilde{f} \equiv f_t / g_{t;\theta }\) provides constants \(b_t > 0\) and \(\tilde{b}_t > 0\) such that for all \(\varepsilon > 0\),

(26)

Combining (24), (25), and (26) yields, for all \(\varepsilon > 0\),

from which the statement of the theorem follows. \(\square \)

The following result is obtained by inspection of the proof of Olsson and Westerborn (2017, Theorem 1(i)).

Proposition 1

Let Assumption 1 hold. Then, for all \(t \in \mathbb {N}\), all \(\theta \in \Theta \), all additive state functionals , all measurable functions \(f_t\) and \(\tilde{f}_t\) such that \(f_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\) and \(\tilde{f}_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\), and all \(\tilde{N}\in \mathbb {N}\), there exist constants \(c_t > 0\) and \(\tilde{c}_t > 0\) (possibly depending on \(\theta \), \(h_{t}\), \(f_t\), \(\tilde{f}_t\), and \(\tilde{N}\)) such that for all \(\varepsilon > 0\),

where are produced using the PaRIS algorithm.

A.2 Proof of Corollary 1

The \(\mathbb {P}\)-a.s. convergence of to is implied straightforwardly by the exponential convergence rate in Theorem 1. Indeed, note that

now, by Theorem 1,

where the right-hand side tends to zero when n tends to infinity. This completes the proof. \(\square \)

A.3 Proof of Theorem 2

By combining (21) and (22),

where in this case

are defined in (23). By Proposition 2, since and ,

where Z is standard normally distributed and

with \(\sigma _{t;\theta }(h_{t;\theta })\) being defined in (14). Now, Proposition 1 and Proposition 2 yield

and

(with 0 denoting the zero function), respectively, implying, by Slutsky’s theorem,

Finally, we complete the proof by noting that the term in (14) coincides with the asymptotic variance provided by Del Moral et al. (2015, Theorem 3.2). \(\square \)

Proposition 2

Let Assumption 1 hold. Then, for all \(t \in \mathbb {N}\), all \(\theta \in \Theta \), all additive state functionals , all measurable functions \(f_t \in \textsf {F}(\mathcal {X})\) and \(\tilde{f}_t \in \textsf {F}(\mathcal {X})\), and all \(\tilde{N}\in \mathbb {N}\), as \(N\rightarrow \infty \),

where Z is a standard Gaussian random variable and

$$\begin{aligned} \sigma _{t;\theta }^2 \langle f_t, \tilde{f}_t \rangle (h_{t}) {:=}\tilde{\sigma }_{t;\theta }^2 \langle f_t, \tilde{f}_t \rangle (h_{t}) + \sum _{s = 0}^{t-1} \sum _{\ell = 0}^{s} \tilde{N}^{\ell - (s+1)} \varsigma _{s, \ell , t; \theta } \langle f_t \rangle (h_{t}), \end{aligned}$$
(27)

with

and

Proof of Proposition 2

Assume first that . Then, by Lemma 1 and Slutsky’s theorem, as \(\Omega _{t} / N\overset{\mathbb {P}}{\longrightarrow }\pi _{t;\theta } g_{t;\theta }\) by Proposition 1,

where again Z has standard Gaussian distribution, is given in Lemma 1, and we have set and and used, first, that and and, second, that

Now, by iterating (20) we conclude that for all \((s, t) \in \mathbb {N}^2\),

$$\begin{aligned} \pi _{t;\theta }= & {} \frac{\pi _{t - 1;\theta } \mathbf {L}_{t - 1;\theta }}{\pi _{t - 1;\theta } \mathbf {L}_{t - 1;\theta } \mathbb {1}_{\textsf {X}}} = \frac{\pi _{t - 2;\theta } \mathbf {L}_{t - 2;\theta } \mathbf {L}_{t - 1;\theta }}{(\pi _{t - 2;\theta } \mathbf {L}_{t - 2;\theta } \mathbb {1}_{\textsf {X}}) (\pi _{t - 1;\theta } \mathbf {L}_{t - 1;\theta } \mathbb {1}_{\textsf {X}})}\\= & {} \frac{\pi _{t - 2;\theta } \mathbf {L}_{t - 2;\theta } \mathbf {L}_{t - 1;\theta }}{\pi _{t - 2;\theta } \mathbf {L}_{t - 2;\theta } \mathbf {L}_{t - 1;\theta } \mathbb {1}_{\textsf {X}}} \\= & {} \frac{\pi _{s + 1;\theta } \mathbf {L}_{s + 1;\theta } \ldots \mathbf {L}_{t - 1;\theta }}{\pi _{s + 1;\theta } \mathbf {L}_{s + 1;\theta } \ldots \mathbf {L}_{t - 1;\theta } \mathbb {1}_{\textsf {X}}} \end{aligned}$$

and, consequently,

$$\begin{aligned} \pi _{t;\theta } g_{t;\theta } = \frac{\pi _{s + 1;\theta } \mathbf {L}_{s + 1;\theta } \ldots \mathbf {L}_{t - 1;\theta } g_{t;\theta }}{\pi _{s + 1;\theta } \mathbf {L}_{s + 1;\theta } \ldots \mathbf {L}_{t - 1;\theta } \mathbb {1}_{\textsf {X}}}. \end{aligned}$$
(28)

Finally, by (28) it holds that

where \(\Gamma ^2_{t;\theta } \langle f_t, \tilde{f}_t \rangle (h_{t})\) is defined in (29) and \(\sigma ^2_{t;\theta } \langle f_t, \tilde{f}_t \rangle (h_{t})\) is defined in (27). In the general case, the previous holds true when \(\tilde{f}_t\) is replaced by , which completes the proof. \(\square \)

The following lemma is obtained by inspection of the proof of Olsson and Westerborn (2017, Theorem 3).

Lemma 1

Let Assumption 1 hold. Then, for all \(t \in \mathbb {N}\), all \(\theta \in \Theta \), all additive state functionals , all measurable functions \(f_t\) and \(\tilde{f}_t\) such that \(f_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\) and \(\tilde{f}_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\), and all \(\tilde{N}\in \mathbb {N}\), as \(N\rightarrow \infty \),

where Z is a standard Gaussian random variable and

$$\begin{aligned} \Gamma _{t;\theta }^2 \langle f_t, \tilde{f}_t \rangle (h_{t}) {:=}\tilde{\Gamma }_{t;\theta }^2 \langle f_t, \tilde{f}_t \rangle (h_{t}) + \sum _{s = 0}^{t-1} \sum _{\ell = 0}^{s} \tilde{N}^{\ell - (s+1)} \gamma _{s, \ell , t; \theta } \langle f_t \rangle (h_{t}), \end{aligned}$$
(29)

with

and

A.4 Proof of Theorem 3

As noted above, the first term of the asymptotic variance coincides with the asymptotic variance obtained by Del Moral et al. (2015, Theorem 3.2). The same work provides a constant \(c \in \mathbb {R}_+\) such that , and we may hence focus on bounding the second term of the asymptotic variance.

For this purpose, note that for all \(s \le t - 1\) and \(x_{s + 1} \in \textsf {X}\),

(30)

By applying the forgetting property of the filter, or, more precisely, Douc et al. (2011, Lemma 10), to (30) we obtain

Note that in the previous bound, the exponential contraction follows from the fact that the objective function \(f_t\) is centred around its predicted mean. The latter is a consequence of the fact that the tangent filter, being a covariance-type quantity, is itself centred (recall the identities (10) and (11) and the decomposition (21)). In addition, from the proof of Olsson and Westerborn (2017, Theorem 8) we extract, using Assumption 3,

and under Assumption 2, for all \(x \in \textsf {X}\),

Combining the previous bounds gives

where

Finally, summing up yields

which completes the proof. \(\square \)
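As a side remark (for intuition only), the mechanism behind the time-uniform bound can be condensed into a simple geometric-series computation. Suppose, as the contraction and moment bounds above deliver, that the summands of the correction term in (27) obey a generic bound of the form \(\varsigma _{s, \ell , t; \theta } \langle f_t \rangle (h_{t}) \le c \rho ^{t - s}\) for some constants \(c > 0\) and \(\rho \in (0, 1)\) (this specific form is assumed here only to illustrate the argument). Then, for \(\tilde{N} \ge 2\),

$$\begin{aligned} \sum _{s = 0}^{t-1} \sum _{\ell = 0}^{s} \tilde{N}^{\ell - (s+1)} c \rho ^{t - s} \le \frac{c}{\tilde{N} - 1} \sum _{s = 0}^{t-1} \rho ^{t - s} \le \frac{c \rho }{(\tilde{N} - 1)(1 - \rho )}, \end{aligned}$$

since \(\sum _{\ell = 0}^{s} \tilde{N}^{\ell - (s+1)} = (1 - \tilde{N}^{-(s+1)}) / (\tilde{N} - 1) \le 1 / (\tilde{N} - 1)\). The right-hand side is independent of t, which is exactly the flavour of the uniform-in-time variance bound of Theorem 3.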

Kernels

Given two measurable spaces \((\textsf {X},\mathcal {X})\) and \((\textsf {Y},\mathcal {Y})\), an unnormalised transition kernel \(\mathbf {K}\) between these spaces induces two integral operators, one acting on functions and the other on measures. Specifically, we define the function

$$\begin{aligned} \mathbf {K} f : \textsf {X} \ni x \mapsto \int f(y) \, \mathbf {K}(x, \text {d}y) \quad (f \in \textsf {F}(\mathcal {Y})) \end{aligned}$$

and the measure

$$\begin{aligned} \nu \mathbf {K} : \mathcal {Y} \ni \textsf {A} \mapsto \int \mathbf {K}(x, \textsf {A}) \, \nu (\text {d}x) \quad (\nu \in \textsf {M}(\mathcal {X})) \end{aligned}$$

whenever these quantities are well defined. Moreover, let \(\mathbf {L}\) be another unnormalised transition kernel from \((\textsf {Y},\mathcal {Y})\) to the measurable space \((\textsf {Z},\mathcal {Z})\); then two different products of \(\mathbf {K}\) and \(\mathbf {L}\) can be defined, namely

$$\begin{aligned} \mathbf {K} \mathbf {L} : \textsf {X} \times \mathcal {Z} \ni (x, \textsf {A}) \mapsto \int \mathbf {K}(x, \text {d}y) \, \mathbf {L}(y, \textsf {A}) \end{aligned}$$

and

$$\begin{aligned} \mathbf {K} \otimes \mathbf {L} : \textsf {X} \times (\mathcal {Y} \otimes \mathcal {Z}) \ni (x, \textsf {A}) \mapsto \iint \mathbb {1}_{\textsf {A}}(y, z) \, \mathbf {K}(x, \text {d}y) \, \mathbf {L}(y, \text {d}z) \end{aligned}$$

whenever these are well defined. These products form new transition kernels from \((\textsf {X},\mathcal {X})\) to \((\textsf {Z},\mathcal {Z})\) and from \((\textsf {X},\mathcal {X})\) to \((\textsf {Y} \times \textsf {Z},\mathcal {Y} \otimes \mathcal {Z})\), respectively. Also the \(\otimes \)-product of a kernel \(\mathbf {K}\) and a measure \(\nu \in \textsf {M}(\mathcal {X})\) is defined as the new measure

$$\begin{aligned} \nu \otimes \mathbf {K} : \mathcal {X} \otimes \mathcal {Y} \ni \textsf {A} \mapsto \iint \mathbb {1}_{\textsf {A}}(x, y) \, \mathbf {K}(x, \text {d}y) \, \nu (\text {d}x) \end{aligned}$$
Finally, for any kernel \(\mathbf {K}\) and any bounded measurable function h we write \(\mathbf {K}^2 h {:=}(\mathbf {K} h)^2\) and \(\mathbf {K} h^2 {:=}\mathbf {K}(h^2)\). Similar notation will be used for measures.
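To make the abstract kernel notation concrete, the following is a minimal sketch under the simplifying assumption that all state spaces are finite, so that an unnormalised kernel reduces to a nonnegative matrix and the operators above become ordinary matrix and vector products. The function names are illustrative only.

```python
import numpy as np

# Assumption: X, Y, Z finite, so a kernel K from X to Y is a nonnegative |X| x |Y| matrix
# with K[x, y] = K(x, {y}); a measure nu on X is a nonnegative length-|X| vector.

def kernel_on_function(K, f):
    """K f : x -> integral of f(y) K(x, dy); a matrix-vector product."""
    return K @ f

def measure_on_kernel(nu, K):
    """nu K : A -> integral of K(x, A) nu(dx); a vector-matrix product."""
    return nu @ K

def kernel_product(K, L):
    """K L : (x, A) -> integral of K(x, dy) L(y, A); an ordinary matrix product."""
    return K @ L

def measure_kernel_tensor(nu, K):
    """nu (x) K : measure on X x Y with mass nu(dx) K(x, dy); an outer-type product."""
    return nu[:, None] * K

# K^2 h := (K h)^2 generally differs from K h^2 := K(h^2):
K = np.array([[0.5, 0.5], [0.2, 0.8]])
h = np.array([1.0, 3.0])
print(kernel_on_function(K, h) ** 2)   # K^2 h
print(kernel_on_function(K, h ** 2))   # K h^2
```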

About this article

Cite this article

Olsson, J., Westerborn Alenlöv, J. Particle-based online estimation of tangent filters with application to parameter estimation in nonlinear state-space models. Ann Inst Stat Math 72, 545–576 (2020). https://doi.org/10.1007/s10463-018-0698-1
