Reinforcement learning, Sequential Monte Carlo and the EM algorithm

Abstract

Using the expression for the unnormalized nonlinear filter for a hidden Markov model, we develop a dynamic-programming-like backward recursion for the filter. This is combined with some ideas from reinforcement learning and a conditional version of importance sampling in order to develop a scheme based on stochastic approximation for estimating the desired conditional expectation. This is then extended to a smoothing problem. Applying these ideas to the EM algorithm, a reinforcement learning scheme is developed for estimating the partially observed log-likelihood function. A stochastic approximation scheme maximizes this function over the unknown parameter. The two procedures are performed on two different time scales, emulating the alternating ‘expectation’ and ‘maximization’ operations of the EM algorithm. We also extend this to a continuous state space problem. Numerical results are presented in support of our schemes.
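
To make the two-timescale idea concrete, the following is a minimal sketch in Python/NumPy of a two-timescale stochastic approximation emulating EM. It is not the scheme of the paper (which estimates the required conditional expectations via the backward filter recursion with conditional importance sampling); the toy Gaussian latent-variable model, step sizes and variable names below are assumptions chosen purely for illustration, picked so that the 'E' quantity is available in closed form.

    # Illustrative two-timescale stochastic approximation emulating EM (assumed toy model).
    # Model: hidden Z ~ N(mu, 1), observed Y = Z + N(0, 1); then E[Z | Y; mu] = (Y + mu)/2,
    # so the EM fixed point is mu = E[Y] = mu_true.
    import numpy as np

    rng = np.random.default_rng(0)
    mu_true = 1.5      # parameter generating the data (to be recovered)
    mu = 0.0           # slow iterate: current parameter estimate ('M' step)
    s = 0.0            # fast iterate: running estimate of the 'E' quantity

    for n in range(1, 200001):
        z = mu_true + rng.standard_normal()   # hidden variable
        y = z + rng.standard_normal()         # observation
        e_step = 0.5 * (y + mu)               # conditional expectation under the current mu
        a_n = 1.0 / n ** 0.6                  # fast step size
        b_n = 1.0 / n                         # slow step size, with b_n / a_n -> 0
        s += a_n * (e_step - s)               # fast timescale: track the conditional expectation
        mu += b_n * (s - mu)                  # slow timescale: move the parameter toward its maximizer

    print(f"estimated mu = {mu:.3f}, true mu = {mu_true:.3f}")

Because b_n/a_n → 0, the slow iterate sees the fast iterate as quasi-static, so the alternation of 'expectation' and 'maximization' happens implicitly. In the schemes of the paper the fast-timescale quantity is produced by the reinforcement-learning estimate of the partially observed log-likelihood rather than by a closed-form posterior mean.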

Notes

  1. Double, because the importance sampling measures for the two ‘value functions’, corresponding to terminal conditions \(f\) and \(\mathbf{1}\), differ, unlike in the non-adaptive case, where the measure was fixed as the common \(q(\cdot \mid \cdot)\).

References

  1. Borkar V S and Jain A V 2014 Reinforcement learning, particle filters and the EM algorithm. In: Proceedings of the Workshop on Information Theory and Applications, San Diego, CA, February

  2. Bain A and Crisan D 2009 Fundamentals of stochastic filtering. Berlin–Heidelberg: Springer Verlag

  3. Asmussen S and Glynn P W 2007 Stochastic simulation: algorithms and analysis. New York: Springer Verlag

  4. Borkar V S 2009 Reinforcement learning – a bridge between numerical methods and Monte Carlo. In: Sastry N S N, Rao T S S R K, Delampady M and Rajeev B (Eds.) Perspectives in mathematical sciences I: probability and statistics. Singapore: World Scientific, pp. 71–91

  5. Zakai M 1969 On the optimal filtering of diffusion processes. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 11(3): 230–243

  6. Borkar V S 1991 A remark on control of partially observed Markov chains. Annals of Operations Research 29(1): 429–438

  7. Borkar V S 1991 Topics in controlled Markov chains. Pitman Research Notes in Maths. No. 240. Harlow, UK: Longman Scientific and Technical

  8. Elliott R J, Aggoun L and Moore J B 1995 Hidden Markov models: estimation and control. New York: Springer Verlag

  9. Ahamed T P I, Borkar V S and Juneja S 2006 Adaptive importance sampling technique for Markov chains using stochastic approximation. Operations Research 54(3): 489–504

  10. Lindsten F and Schön T B 2013 Backward simulation methods for Monte Carlo statistical inference. In: Foundations and Trends in Machine Learning, vol. 6(1), Hanover, MA: NOW Publishers, pp. 1–143

  11. Cappé O, Moulines E and Rydén T 2005 Inference in hidden Markov models. New York: Springer Verlag

  12. Borkar V S 2008 Stochastic approximation: a dynamical systems viewpoint. New Delhi, India: Hindustan Book Agency and Cambridge, UK: Cambridge University Press

  13. Dempster A P, Laird N M and Rubin D B 1977 Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B 39(1): 1–38

  14. Hirsch M W 1989 Convergent activation dynamics in continuous time networks. Neural Networks 2: 331–349


Author information

Corresponding author

Correspondence to Vivek S Borkar.

Additional information

The work of VSB was supported in part by a J C Bose Fellowship and a grant for ‘Distributed Computation for Optimization over Large Networks and High Dimensional Data Analysis’ from the Department of Science and Technology, Government of India. A part of this work was presented at the Workshop on Information Theory and Applications, San Diego, CA, February 2014 [1].

Cite this article

Borkar, V.S., Jain, A.V. Reinforcement learning, Sequential Monte Carlo and the EM algorithm. Sādhanā 43, 123 (2018). https://doi.org/10.1007/s12046-018-0889-8
