Reinforcement learning, Sequential Monte Carlo and the EM algorithm

BORKAR, VIVEK S; JAIN, ANKUSH V

doi:10.1007/s12046-018-0889-8

Published: 29 June 2018

Sādhanā, Volume 43, article number 123 (2018)


Abstract

Using the expression for the unnormalized nonlinear filter for a hidden Markov model, we develop a dynamic-programming-like backward recursion for the filter. This is combined with some ideas from reinforcement learning and a conditional version of importance sampling in order to develop a scheme based on stochastic approximation for estimating the desired conditional expectation. This is then extended to a smoothing problem. Applying these ideas to the EM algorithm, a reinforcement learning scheme is developed for estimating the partially observed log-likelihood function. A stochastic approximation scheme maximizes this function over the unknown parameter. The two procedures are performed on two different time scales, emulating the alternating ‘expectation’ and ‘maximization’ operations of the EM algorithm. We also extend this to a continuous state space problem. Numerical results are presented in support of our schemes.
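As a rough illustration of the backward recursion mentioned in the abstract, the sketch below assumes a finite state space, a known transition matrix, and a fixed observation sequence. All names (`pi0`, `P`, `lik`, `forward_unnormalized`, `backward_value`, `filter_estimate`) and the two-state numbers are hypothetical and do not come from the article. It computes the recursion exactly; it does not attempt the stochastic approximation or importance sampling steps of the proposed scheme.

```python
import numpy as np

def forward_unnormalized(pi0, P, lik):
    """Unnormalized filter: rho_{m+1}(j) = sum_i rho_m(i) P[i, j] * g(y_{m+1} | j).

    lik[m, j] holds the observation likelihood g(y_{m+1} | j) (assumed given).
    """
    rho = np.asarray(pi0, dtype=float).copy()
    for g in lik:
        rho = (rho @ P) * g
    return rho

def backward_value(P, lik, terminal):
    """Backward recursion: V_n = terminal, V_m(i) = sum_j P[i, j] g(y_{m+1} | j) V_{m+1}(j)."""
    V = np.asarray(terminal, dtype=float).copy()
    for g in lik[::-1]:
        V = P @ (g * V)
    return V

def filter_estimate(pi0, P, lik, f):
    """Ratio of two 'value functions' with terminal conditions f and the constant 1."""
    numerator = pi0 @ backward_value(P, lik, f)
    denominator = pi0 @ backward_value(P, lik, np.ones(len(f)))
    return numerator / denominator

# Hypothetical two-state example with three observations.
pi0 = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lik = np.array([[0.7, 0.1],
                [0.6, 0.3],
                [0.2, 0.9]])
f = np.array([0.0, 1.0])  # indicator of the second state

rho = forward_unnormalized(pi0, P, lik)
forward_answer = (rho @ f) / rho.sum()
backward_answer = filter_estimate(pi0, P, lik, f)
print(forward_answer, backward_answer)
```

In this sketch the forward and backward computations give the same conditional expectation, since both evaluate the same product of transition and likelihood factors, only in opposite order.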

Notes

Double, because the importance sampling measures for the two ‘value functions’ corresponding to the terminal conditions f and \(\mathbf 1 \) differ, unlike in the non-adaptive case, where it was fixed as the common \(q(\cdot | \cdot )\) for both.

References

[1] Borkar V S and Jain A V 2014 Reinforcement learning, particle filters and the EM algorithm. In: Proceedings of the Workshop on Information Theory and Applications, San Diego, February
[2] Bain A and Crisan D 2009 Fundamentals of stochastic filtering. Berlin–Heidelberg: Springer Verlag
[3] Asmussen S and Glynn P W 2007 Stochastic simulation: algorithms and analysis. New York: Springer Verlag
[4] Borkar V S 2009 Reinforcement learning – a bridge between numerical methods and Monte Carlo. In: Sastry N S N, Rao T S S R K, Delampady M and Rajeev B (Eds.) Perspectives in mathematical sciences I: probability and statistics. Singapore: World Scientific, pp. 71–91
[5] Zakai M 1969 On the optimal filtering of diffusion processes. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 11(3): 230–243
[6] Borkar V S 1991 A remark on control of partially observed Markov chains. Annals of Operations Research 29(1): 429–438
[7] Borkar V S 1991 Topics in controlled Markov chains. Pitman Research Notes in Maths. No. 240. Harlow, UK: Longman Scientific and Technical
[8] Elliott R J, Aggoun L and Moore J B 1995 Hidden Markov models: estimation and control. New York: Springer Verlag
[9] Ahamed T P I, Borkar V S and Juneja S 2006 Adaptive importance sampling technique for Markov chains using stochastic approximation. Operations Research 54(3): 489–504
[10] Lindsten F and Schön T B 2013 Backward simulation methods for Monte Carlo statistical inference. In: Foundations and Trends in Machine Learning, vol. 6(1), Hanover, MA: NOW Publishers, pp. 1–143
[11] Cappé O, Moulines E and Rydén T 2005 Inference in hidden Markov models. New York: Springer Verlag
[12] Borkar V S 2008 Stochastic approximation: a dynamical systems viewpoint. New Delhi, India: Hindustan Book Agency and Cambridge, UK: Cambridge University Press
[13] Dempster A P, Laird N M and Rubin D B 1977 Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B 39(1): 1–38
[14] Hirsch M W 1989 Convergent activation dynamics in continuous time networks. Neural Networks 2: 331–349

Author information

ANKUSH V JAIN
Present address: Graviton Research Capital LLP, 14th Floor, Tower C, Building 8, Cyber City, Gurugram, Haryana, 122002, India

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
VIVEK S BORKAR & ANKUSH V JAIN

Corresponding author

Correspondence to VIVEK S BORKAR.

Additional information

Work of VSB supported in part by a J C Bose Fellowship and a grant for ‘Distributed Computation for Optimization over Large Networks and High Dimensional Data Analysis’ from the Department of Science and Technology, Government of India. A part of this work was presented at the Workshop on Information Theory and Applications, San Diego, CA, February 2014 [1].

About this article

Cite this article

BORKAR, V.S., JAIN, A.V. Reinforcement learning, Sequential Monte Carlo and the EM algorithm. Sādhanā 43, 123 (2018). https://doi.org/10.1007/s12046-018-0889-8

Received: 09 February 2017
Revised: 19 November 2017
Accepted: 19 November 2017
Published: 29 June 2018
DOI: https://doi.org/10.1007/s12046-018-0889-8
