Abstract
Reinforcement learning is a machine learning technique that learns a policy by trial and error in the target environment, guided by a reward criterion. In this paper, we investigate a reinforcement learning algorithm for a partially observable environment without prior knowledge of the state transition and observation likelihood functions. When only noisy or incomplete measurements are available, for example from inexpensive sensors, a state estimator can be combined with a policy trained via reinforcement learning to recover the policy's performance. Even so, performance may degrade due to state estimation error. To address this problem, we propose a planning method that accounts for a prediction of the state estimation error. The proposed algorithm indirectly predicts the state estimation error and, using this prediction, avoids entering regions where large estimation error may occur. We tested the proposed method on a simple inverted pendulum task in which the agent receives only noisy and drifted measurements. Empirical results demonstrate that the proposed method improves the performance of the policy in a partially observable environment.
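As a minimal sketch of the setting the abstract describes, the code below tracks a pendulum's angle from noisy measurements with a simple bootstrap particle filter, the kind of state estimator that could feed a learned policy. The dynamics, noise levels, and particle count here are illustrative assumptions, not the authors' implementation or their error-prediction method.

```python
import numpy as np

def pendulum_step(theta, omega, u, dt=0.05, g=9.8, length=1.0):
    """One Euler step of simple pendulum dynamics (illustrative model)."""
    omega = omega + (-g / length * np.sin(theta) + u) * dt
    theta = theta + omega * dt
    return theta, omega

def particle_filter_estimate(observations, controls, n_particles=500,
                             obs_noise=0.2, proc_noise=0.05, rng=None):
    """Bootstrap particle filter: predict, weight by likelihood, resample."""
    rng = np.random.default_rng(0) if rng is None else rng
    thetas = rng.normal(0.0, 0.5, n_particles)
    omegas = rng.normal(0.0, 0.5, n_particles)
    estimates = []
    for z, u in zip(observations, controls):
        # predict: propagate particles through the dynamics plus process noise
        thetas, omegas = pendulum_step(thetas, omegas, u)
        thetas = thetas + rng.normal(0.0, proc_noise, n_particles)
        # update: weight particles by a Gaussian observation likelihood
        w = np.exp(-0.5 * ((z - thetas) / obs_noise) ** 2)
        w = w / w.sum()
        # point estimate: the weighted mean angle
        estimates.append(float(np.sum(w * thetas)))
        # resample particles in proportion to their weights
        idx = rng.choice(n_particles, n_particles, p=w)
        thetas, omegas = thetas[idx], omegas[idx]
    return estimates

# Demo: simulate a true trajectory and recover it from noisy angle readings.
rng = np.random.default_rng(1)
true_theta, true_omega = 0.3, 0.0
truth, obs, ctrl = [], [], []
for _ in range(50):
    true_theta, true_omega = pendulum_step(true_theta, true_omega, 0.0)
    truth.append(true_theta)
    obs.append(true_theta + rng.normal(0.0, 0.2))  # noisy sensor
    ctrl.append(0.0)
estimates = particle_filter_estimate(obs, ctrl)
est_err = float(np.mean(np.abs(np.array(estimates) - np.array(truth))))
```

A policy would then act on `estimates` instead of the raw observations; the paper's contribution is to additionally predict when `est_err`-like quantities become large and steer the agent away from those regions.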
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Watabe, Y., Shibuya, T. (2022). Reinforcement Learning Algorithm for Partially Observable Environment Considering State Estimation Error Prediction. In: Mandal, J.K., Buyya, R., De, D. (eds) Proceedings of International Conference on Advanced Computing Applications. Advances in Intelligent Systems and Computing, vol 1406. Springer, Singapore. https://doi.org/10.1007/978-981-16-5207-3_52
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5206-6
Online ISBN: 978-981-16-5207-3
eBook Packages: Intelligent Technologies and Robotics