Abstract
This paper addresses reinforcement learning with policies whose actions depend on states and on random elements that are autocorrelated across subsequent time instants. As a result, the agent learns from experiments that extend over time and potentially provide better clues for policy improvement. Such policies are also less problematic to implement physically, e.g. in robotics: most RL algorithms add white noise to the control signal, which causes unwanted shaking of the robot, whereas autocorrelated noise avoids it. An algorithm is introduced here that approximately optimizes such policies. Its efficiency is verified on four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) against three other methods (PPO, SAC, ACER); it outperforms them on three of the four problems.
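The paper's own algorithm is not reproduced here, but the core idea of temporally autocorrelated exploration noise can be illustrated with an Ornstein–Uhlenbeck process, a standard choice for smooth exploration (used, e.g., in Lillicrap et al.'s DDPG). The sketch below is illustrative only; the function name and parameter values are not taken from the paper. Unlike i.i.d. white noise, successive samples are strongly correlated, so the perturbed control signal varies smoothly rather than jittering.

```python
import math
import random

def ou_noise(steps, theta=0.15, sigma=0.2, dt=0.01, seed=0):
    """Generate a 1-D Ornstein-Uhlenbeck noise sequence.

    x evolves as dx = theta * (mu - x) * dt + sigma * dW, with mu = 0.
    Successive samples are autocorrelated (correlation ~ exp(-theta*dt)
    per step), so actions perturbed by this noise change smoothly over
    time, in contrast to i.i.d. Gaussian (white) noise.
    """
    rng = random.Random(seed)
    x = 0.0
    out = []
    for _ in range(steps):
        x += theta * (0.0 - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

# Example: a 1000-step noise trajectory to add to a deterministic action.
trajectory = ou_noise(1000)
```

With these parameters the lag-1 autocorrelation is close to 1, which is precisely the property that keeps a robot's actuators from shaking during exploration.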
Notes
1. We chose PyBullet because it is freeware, while MuJoCo is commercial software.
References
Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can learn difficult learning control problems. IEEE Trans. Syst. Man Cybern. B 13, 834–846 (1983)
Coumans, E., Bai, Y.: PyBullet, a Python module for physics simulation for games, robotics and machine learning (2016–2019). http://pybullet.org
Erez, T., Tassa, Y., Todorov, E.: Simulation tools for model-based robotics: comparison of Bullet, Havok, MuJoCo, ODE and PhysX. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 4397–4404 (2015)
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft Actor-Critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor (2018). arXiv:1801.01290
van Hoof, H., Tanneberg, D., Peters, J.: Generalized exploration in policy search. Mach. Learn. 106, 1705–1724 (2017). https://doi.org/10.1007/s10994-017-5657-1
Kakade, S., Langford, J.: Approximately optimal approximate reinforcement learning. In: Proceedings of the Nineteenth International Conference on Machine Learning, ICML’02, pp. 267–274 (2002)
Kimura, H., Kobayashi, S.: An analysis of actor/critic algorithms using eligibility traces: reinforcement learning with imperfect value function. In: ICML (1998)
Korenkevych, D., Mahmood, A.R., Vasan, G., Bergstra, J.: Autoregressive policies for continuous control deep reinforcement learning. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), pp. 2754–2762 (2019)
Liang, E., et al.: RLlib: abstractions for distributed reinforcement learning. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 3053–3062. PMLR, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018
Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning (2016). arXiv:1509.02971
Mnih, V., et al.: Playing Atari with deep reinforcement learning (2013). arXiv:1312.5602
Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization (2015). arXiv:1502.05477
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017). arXiv:1707.06347
Todorov, E., Erez, T., Tassa, Y.: MuJoCo: a physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE (2012)
Wang, Z., et al.: Sample efficient actor-critic with experience replay (2016). arXiv:1611.01224
Wawrzyński, P.: Learning to control a 6-degree-of-freedom walking robot. In: Proceedings of EUROCON 2007 the International Conference on Computer as a Tool, pp. 698–705 (2007)
Wawrzyński, P.: Real-time reinforcement learning by sequential actor-critics and experience replay. Neural Networks 22(10), 1484–1497 (2009)
Wawrzyński, P.: Control policy with autocorrelated noise in reinforcement learning for robotics. Int. J. Mach. Learn. Comput. 5(2), 91–95 (2015)
Acknowledgement
This work was partially funded by a grant of Warsaw University of Technology Scientific Discipline Council for Computer Science and Telecommunications.
Copyright information
© 2020 Springer Nature Switzerland AG
Szulc, M., Łyskawa, J., Wawrzyński, P. (2020). A Framework for Reinforcement Learning with Autocorrelated Actions. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science(), vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7