Abstract
This paper presents a novel algorithm for efficient online estimation of the filter derivatives in general hidden Markov models. The algorithm, which has a linear computational complexity and very limited memory requirements, is furnished with a number of convergence results, including a central limit theorem with an asymptotic variance that can be shown to be uniformly bounded in time. Using the proposed filter derivative estimator, we design a recursive maximum likelihood algorithm updating the parameters according to the gradient of the one-step predictor log-likelihood. The efficiency of this online parameter estimation scheme is illustrated in a simulation study.
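The recursive maximum likelihood scheme summarised above can be illustrated, in a simplified exact form, on a finite-state HMM, where the filter and its derivative (the tangent filter) admit closed-form recursions; the paper's algorithm replaces these exact recursions by PaRIS-based particle estimates in general state spaces. The sketch below is an assumption-laden illustration, not the paper's method: it assumes a two-state chain with a fixed transition matrix and unit-variance Gaussian emissions with means \((\theta, -\theta)\), and all function names are ours.

```python
import numpy as np

def gaussian_pdf(y, means):
    # Density of N(mean, 1) at y, evaluated for each state mean.
    return np.exp(-0.5 * (y - means) ** 2) / np.sqrt(2.0 * np.pi)

def loglik_and_score(ys, Q, mu0, theta):
    """Exact filter/tangent-filter recursion for a finite HMM with fixed
    transition matrix Q and Gaussian emissions with means (theta, -theta).
    Returns the log-likelihood and its theta-derivative (the score)."""
    means = np.array([theta, -theta])
    dmeans = np.array([1.0, -1.0])             # d(means)/d(theta)
    pi, tau = mu0.copy(), np.zeros_like(mu0)   # filter and tangent filter
    loglik, score = 0.0, 0.0
    for y in ys:
        mu, dmu = pi @ Q, tau @ Q              # one-step predictor and derivative
        g = gaussian_pdf(y, means)
        dg = g * (y - means) * dmeans          # theta-derivative of emission density
        c = mu @ g                             # predictive likelihood of y
        dc = dmu @ g + mu @ dg
        pi = mu * g / c                        # filter update
        tau = (dmu * g + mu * dg) / c - pi * dc / c   # tangent-filter update
        loglik += np.log(c)
        score += dc / c                        # gradient of log predictor likelihood
    return loglik, score
```

A recursive maximum likelihood update would then apply \(\theta_{t+1} = \theta_t + \gamma_t \, \partial_\theta \log c_t\) online, with a decreasing step-size sequence \((\gamma_t)\); the paper's contribution is a stable particle approximation of the quantities computed exactly here.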
References
Anderson, B. D. O., Moore, J. B. (1979). Optimal filtering. New Jersey: Prentice-Hall.
Cappé, O. (2001). Ten years of HMMs (online bibliography 1989–2000). http://perso.telecom-paristech.fr/~cappe/docs/hmmbib.html. Accessed Mar 2013.
Cappé, O. (2011). Online EM algorithm for hidden Markov models. Journal of Computational and Graphical Statistics, 20(3), 728–749.
Cappé, O., Moulines, E., Rydén, T. (2005). Inference in hidden Markov models. New York: Springer.
Crisan, D., Heine, K. (2008). Stability of the discrete time filter in terms of the tails of noise distributions. Journal of the London Mathematical Society, 78(2), 441–458.
Del Moral, P., Guionnet, A. (2001). On the stability of interacting processes with applications to filtering and genetic algorithms. Annales de l’Institut Henri Poincaré, 37(2), 155–194.
Del Moral, P., Doucet, A., Singh, S. (2010). A backward particle interpretation of Feynman–Kac formulae. ESAIM: Mathematical Modelling and Numerical Analysis, 44(5), 947–975.
Del Moral, P., Doucet, A., Singh, S. S. (2015). Uniform stability of a particle approximation of the optimal filter derivative. SIAM Journal on Control and Optimization, 53(3), 1278–1304.
Delyon, B. (1996). General results on the convergence of stochastic algorithms. IEEE Transactions on Automatic Control, 41(9), 1245–1255.
Dempster, A. P., Laird, N. M., Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38 (with discussion).
Douc, R., Matias, C. (2001). Asymptotics of the maximum likelihood estimator for general hidden Markov models. Bernoulli, 7(3), 381–420.
Douc, R., Garivier, A., Moulines, E., Olsson, J. (2011). Sequential Monte Carlo smoothing for general state space hidden Markov models. Annals of Applied Probability, 21(6), 2109–2145.
Douc, R., Moulines, E., Olsson, J. (2014). Long-term stability of sequential Monte Carlo methods under verifiable conditions. Annals of Applied Probability, 24(5), 1767–1802.
Doucet, A., Tadić, V. B. (2003). Parameter estimation in general state-space models using particle methods. Annals of the Institute of Statistical Mathematics, 55(2), 409–422.
Doucet, A., Godsill, S., Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3), 197–208.
Doucet, A., De Freitas, N., Gordon, N. (Eds.). (2001). Sequential Monte Carlo methods in practice. New York: Springer.
Fearnhead, P., Wyncoll, D., Tawn, J. (2010). A sequential smoothing algorithm with linear computational cost. Biometrika, 97(2), 447–464.
Gordon, N., Salmond, D., Smith, A. F. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F-Radar and Signal Processing, 140(2), 107–113.
Hull, J., White, A. (1987). The pricing of options on assets with stochastic volatilities. The Journal of Finance, 42(2), 281–300.
Jacob, P. E., Murray, L. M., Rubenthaler, S. (2013). Path storage in the particle filter. Statistics and Computing, 25(2), 487–496.
Jasra, A. (2015). On the behaviour of the backward interpretation of Feynman–Kac formulae under verifiable conditions. Journal of Applied Probability, 52(2), 339–359.
Julier, S. J., Uhlmann, J. K. (1997). A new extension of the Kalman filter to nonlinear systems. In AeroSense: The 11th international symposium on aerospace/defense sensing, simulation and controls.
Kantas, N., Doucet, A., Singh, S. S., Maciejowski, J., Chopin, N. (2015). On particle methods for parameter estimation in state-space models. Statistical Science, 30(3), 328–351.
Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1), 1–25.
Kitagawa, G., Sato, S. (2001). Monte Carlo smoothing and self-organising state-space model. In Sequential Monte Carlo methods in practice (pp. 177–195). New York: Springer.
Le Corff, S., Fort, G., Moulines, E. (2011). Online expectation maximization algorithm to solve the SLAM problem. In 2011 IEEE statistical signal processing workshop (SSP) (pp. 225–228).
Le Gland, F., Mevel, L. (1996). Geometric ergodicity in hidden Markov models. In Research report, RR-2991, INRIA.
Le Gland, F., Mevel, L. (1997). Recursive estimation in hidden Markov models. In Proceedings of the 36th IEEE conference on decision and control (pp. 3468–3473).
Martinez-Cantin, R., de Freitas, N., Castellanos, J. A. (2007). Analysis of particle methods for simultaneous robot localization and mapping and a new algorithm: Marginal-SLAM. In Proceedings 2007 IEEE international conference on robotics and automation (pp. 2415–2420).
Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B. (2002). FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of the AAAI national conference on artificial intelligence. Edmonton: AAAI.
Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B. (2003). An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. In Proceedings of the eighteenth international joint conference on artificial intelligence (IJCAI). Acapulco: IJCAI.
Nguyen, T. N. M., Le Corff, S., Moulines, E. (2017). On the two-filter approximations of marginal smoothing distributions in general state-space models. Advances in Applied Probability, 50(1), 154–177.
Olsson, J., Cappé, O., Douc, R., Moulines, E. (2008). Sequential Monte Carlo smoothing with application to parameter estimation in non-linear state space models. Bernoulli, 14(1), 155–179.
Olsson, J., Westerborn, J. (2016). Efficient parameter inference in general hidden Markov models using the filter derivatives. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 3984–3988).
Olsson, J., Westerborn, J. (2017). Efficient particle-based online smoothing in general hidden Markov models: The PaRIS algorithm. Bernoulli, 23(3), 1951–1996.
Poyiadjis, G., Doucet, A., Singh, S. (2011). Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika, 98(1), 65–80.
Poyiadjis, G., Doucet, A., Singh, S. S. (2005). Particle methods for optimal filter derivative: application to parameter estimation. In Proceedings IEEE international conference on acoustics, speech, and signal processing (pp. 925–928).
Tadić, V. B. (2010). Analyticity, convergence, and convergence rate of recursive maximum-likelihood estimation in hidden Markov models. IEEE Transactions on Information Theory, 56(12), 6406–6432.
Tadić, V. B., Doucet, A. (2017). Asymptotic bias of stochastic gradient search. Annals of Applied Probability, 27(6), 3255–3304.
Appendices
Proofs
Define for all \(t \in \mathbb {N}\) and \(\theta \in \Theta \),
(Note that our definition of \(\mathbf {L}_{t}\) differs from that used by Olsson and Westerborn (2017), in which the order of \(g_{t;\theta }\) and \(\mathbf {Q}_{\theta }\) is swapped.) With this notation, by the filtering recursion (4)–(5),
with, as previously, . This condensed form of the filtering recursion will be used in Sect. A.3.
In the coming analysis, the following decomposition will be instrumental. For all \(t \in \mathbb {N}\),
1.1 Proof of Theorem 1
We apply the decomposition (21). Note that
where
Now, since and both belong to \(\textsf {F}(\mathcal {X})\), Proposition 1 provides constants \(c_t > 0\) and \(\tilde{c}_t > 0\) such that for all \(\varepsilon > 0\),
To deal with the second part of the decomposition (21), we use the same technique. First, by applying Proposition 1 with \(f \equiv 1/g_{t;\theta }\) and \(\tilde{f} \equiv 0\), we obtain constants \(a_t > 0\) and \(\tilde{a}_t > 0\) such that for all \(\varepsilon > 0\),
Similarly, using Proposition 1 with and \(\tilde{f} \equiv f_t / g_{t;\theta }\) provides constants \(b_t > 0\) and \(\tilde{b}_t > 0\) such that for all \(\varepsilon > 0\),
Combining (24), (25), and (26) yields, for all \(\varepsilon > 0\),
from which the statement of the theorem follows. \(\square \)
The following result is obtained by inspection of the proof of Olsson and Westerborn (2017, Theorem 1(i)).
Proposition 1
Let Assumption 1 hold. Then, for all \(t \in \mathbb {N}\), all \(\theta \in \Theta \), all additive state functionals , all measurable functions \(f_t\)and \(\tilde{f}_t\) such that \(f_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\) and \(\tilde{f}_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\), and all \(\tilde{N}\in \mathbb {N}\), there exist constants \(c_t > 0\) and \(\tilde{c}_t > 0\) (possibly depending on \(\theta \), \(h_{t}\)\(f_t\), \(\tilde{f}_t\), and \(\tilde{N}\)) such that for all \(\varepsilon > 0\),
where are produced using the PaRIS algorithm.
1.2 Proof of Corollary 1
The \(\mathbb {P}\)-a.s. convergence of to is implied straightforwardly by the exponential convergence rate in Theorem 1. Indeed, note that
now, by Theorem 1,
where the right-hand side tends to zero when n tends to infinity. This completes the proof. \(\square \)
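For completeness, the passage from the exponential deviation bound to almost sure convergence is the standard Borel–Cantelli argument. In a hedged reconstruction (the displayed bounds are not reproduced in this version of the text, so the deviation \(\Delta_N\) and the constants \(c_t, \tilde{c}_t > 0\) below are placeholders standing in for the quantities of Theorem 1), it reads:

```latex
\sum_{N = 1}^{\infty} \mathbb{P}\bigl( |\Delta_N| \ge \varepsilon \bigr)
  \le c_t \sum_{N = 1}^{\infty} e^{- \tilde{c}_t \varepsilon^2 N}
  = \frac{c_t \, e^{- \tilde{c}_t \varepsilon^2}}{1 - e^{- \tilde{c}_t \varepsilon^2}}
  < \infty ,
```

so that, by the first Borel–Cantelli lemma, \(|\Delta_N| \ge \varepsilon\) occurs only finitely often \(\mathbb{P}\)-a.s., for every \(\varepsilon > 0\).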
1.3 Proof of Theorem 2
where in this case
are defined in (23). By Proposition 2, since and ,
where Z is standard normally distributed and
with \(\sigma _{t;\theta }(h_{t;\theta })\) being defined in (14). Now, Proposition 1 and Proposition 2 yield
and
(with 0 denoting the zero function), respectively, implying, by Slutsky’s theorem,
Finally, we complete the proof by noting that the term in (14) coincides with the asymptotic variance provided by Del Moral et al. (2015, Theorem 3.2). \(\square \)
Proposition 2
Let Assumption 1 hold. Then for all \(t \in \mathbb {N}\), all \(\theta \in \Theta \), all additive state functionals , all measurable functions \(f_t \in \textsf {F}(\mathcal {X})\) and \(\tilde{f}_t \in \textsf {F}(\mathcal {X})\), and all \(\tilde{N}\in \mathbb {N}\), as \(N\rightarrow \infty \),
where Z is a standard Gaussian random variable and
with
and
Proof of Proposition 2
Assume first that . Then, by Lemma 1 and Slutsky’s theorem, as \(\Omega _{t} / N\overset{\mathbb {P}}{\longrightarrow }\pi _{t;\theta } g_{t;\theta }\) by Proposition 1,
where again Z has a standard Gaussian distribution, is given in Lemma 1, and we have set and and used, first, that and and, second, that
Now, by iterating (20) we conclude that for all \((s, t) \in \mathbb {N}^2\),
and, consequently,
Finally, by (28) it holds that
where \(\Gamma ^2_{t;\theta } \langle f_t, \tilde{f}_t \rangle (h_{t})\) is defined in (29) and \(\sigma ^2_{t;\theta } \langle f_t, \tilde{f}_t \rangle (h_{t})\) is defined in (27). In the general case, the previous holds true when \(\tilde{f}_t\) is replaced by , which completes the proof. \(\square \)
The following lemma is obtained by inspection of the proof of Olsson and Westerborn (2017, Theorem 3).
Lemma 1
Let Assumption 1 hold. Then for all \(t \in \mathbb {N}\), all \(\theta \in \Theta \), all additive state functionals , all measurable functions \(f_t\)and \(\tilde{f}_t\) such that \(f_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\) and \(\tilde{f}_t g_{t;\theta } \in \textsf {F}(\mathcal {X})\), and all \(\tilde{N}\in \mathbb {N}\), as \(N\rightarrow \infty \),
where Z is a standard normally distributed random variable and
with
and
1.4 Proof of Theorem 3
As noted above, the first term of the asymptotic variance coincides with the asymptotic variance obtained by Del Moral et al. (2015, Theorem 3.2). The same work provides a constant \(c \in \mathbb {R}_+\) such that , and we may hence focus on bounding the second term of the asymptotic variance.
For this purpose, note that for all \(s \le t - 1\) and \(x_{s + 1} \in \textsf {X}\),
By applying the forgetting of the filter, or, more particularly, Douc et al. (2011, Lemma 10), to (30) we obtain
Note that in the previous bound, the exponential contraction follows from the fact that the objective function \(f_t\) is centred around its predicted mean. The latter is a consequence of the fact that the tangent filter, being a covariance, is itself centred (recall the identities (10) and (11) and the decomposition (21)). In addition, from the proof of Olsson and Westerborn (2017, Theorem 8) we extract, using Assumption 3,
and under Assumption 2, for all \(x \in \textsf {X}\),
Combining the previous bounds gives
where
Finally, summing up yields
which completes the proof. \(\square \)
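For completeness, the summation step concluding the proof is the usual geometric-series bound. In a hedged form (with \(c > 0\) and \(\rho \in (0, 1)\) standing in as placeholders for the constant and mixing rate produced by the preceding forgetting bounds), each summand is dominated by \(c \rho^{t - s}\), whence

```latex
\sum_{s = 0}^{t - 1} c \, \rho^{\, t - s}
  \le c \sum_{k = 1}^{\infty} \rho^{k}
  = \frac{c \, \rho}{1 - \rho} < \infty ,
```

uniformly in \(t\); this uniformity is precisely what yields the time-uniform bound on the asymptotic variance.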
Kernels
Given two measurable spaces \((\textsf {X},\mathcal {X})\) and \((\textsf {Y},\mathcal {Y})\), an unnormalised transition kernel \(\mathbf {K}\) between these spaces induces two integral operators, one acting on functions and the other on measures. Specifically, we define the function
and the measure
whenever these quantities are well defined. Moreover, let \(\mathbf {L}\) be another unnormalised transition kernel from \((\textsf {Y},\mathcal {Y})\) to the measurable space \((\textsf {Z},\mathcal {Z})\); then two different products of \(\mathbf {K}\) and \(\mathbf {L}\) can be defined, namely
and
whenever these are well defined. These products form new transition kernels from \((\textsf {X},\mathcal {X})\) to \((\textsf {Z},\mathcal {Z})\) and from \((\textsf {X},\mathcal {X})\) to , respectively. Also the -product of a kernel \(\mathbf {K}\) and a measure \(\nu \in \textsf {M}(\mathcal {X})\) is defined as the new measure
Finally, for any kernel \(\mathbf {K}\) and any bounded measurable function h we write \(\mathbf {K}^2 h {:=}(\mathbf {K} h)^2\) and \(\mathbf {K} h^2 {:=}\mathbf {K}(h^2)\). Similar notation will be used for measures.
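On a finite state space the operations above reduce to matrix algebra, which gives a concrete way to check the definitions: an unnormalised kernel from a space of size m to one of size n is a nonnegative m-by-n matrix. The sketch below is illustrative only (the function names are ours, not the paper's notation).

```python
import numpy as np

# An unnormalised kernel K from X (size m) to Y (size n) is a nonnegative
# m-by-n matrix with K(x, {y}) = K[x, y].

def kernel_on_function(K, f):
    """The function Kf(x) = sum_y K(x, y) f(y)."""
    return K @ f

def measure_on_kernel(nu, K):
    """The measure (nu K)({y}) = sum_x nu(x) K(x, y)."""
    return nu @ K

def kernel_product(K, L):
    """The product kernel (K L)(x, {z}) = sum_y K(x, y) L(y, z)."""
    return K @ L

def tensor_product(K, L):
    """The kernel (x, {(y, z)}) -> K(x, y) L(y, z), from X to Y x Z."""
    return K[:, :, None] * L[None, :, :]
```

Summing the tensor product over its middle coordinate recovers the ordinary product kernel, and for a Markov (row-stochastic) kernel the notation of the last paragraph satisfies the Jensen inequality \(\mathbf{K} h^2 \ge \mathbf{K}^2 h\) pointwise.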
Olsson, J., Westerborn Alenlöv, J. Particle-based online estimation of tangent filters with application to parameter estimation in nonlinear state-space models. Ann Inst Stat Math 72, 545–576 (2020). https://doi.org/10.1007/s10463-018-0698-1