Abstract
This paper addresses reinforcement learning with policies whose actions depend on states and on random elements that are autocorrelated across subsequent time instants. As a result, the agent learns from experiments that extend over time and potentially provide better clues for policy improvement. Such policies are also less problematic to implement physically, e.g. in robotics: most RL algorithms add white noise to the control signal, which causes unwanted shaking of the robot, whereas autocorrelated noise avoids it. An algorithm is introduced here that approximately optimizes such policies. Its efficiency is verified on four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) against three other methods (PPO, SAC, ACER); it outperforms them on three of the four problems.
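The paper's own algorithm is not reproduced here, but the core idea of temporally autocorrelated exploration noise can be illustrated with an Ornstein–Uhlenbeck process, a standard choice for smooth exploration (used, e.g., in Lillicrap et al.'s DDPG). The sketch below is illustrative only; the function name and parameter values are not taken from the paper. Unlike i.i.d. white noise, successive samples are strongly correlated, so the perturbed control signal varies smoothly rather than jittering.

```python
import math
import random

def ou_noise(steps, theta=0.15, sigma=0.2, dt=0.01, seed=0):
    """Generate a 1-D Ornstein-Uhlenbeck noise sequence.

    x evolves as dx = theta * (mu - x) * dt + sigma * dW, with mu = 0.
    Successive samples are autocorrelated (correlation ~ exp(-theta*dt)
    per step), so actions perturbed by this noise change smoothly over
    time, in contrast to i.i.d. Gaussian (white) noise.
    """
    rng = random.Random(seed)
    x = 0.0
    out = []
    for _ in range(steps):
        x += theta * (0.0 - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

# Example: a 1000-step noise trajectory to add to a deterministic action.
trajectory = ou_noise(1000)
```

With these parameters the lag-1 autocorrelation is close to 1, which is precisely the property that keeps a robot's actuators from shaking during exploration.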
Notes
1. We chose PyBullet because it is freeware, while MuJoCo is commercial software.
References
Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can learn difficult learning control problems. IEEE Trans. Syst. Man Cybern. B 13, 834–846 (1983)
Coumans, E., Bai, Y.: PyBullet, a Python module for physics simulation for games, robotics and machine learning (2016–2019). http://pybullet.org
Erez, T., Tassa, Y., Todorov, E.: Simulation tools for model-based robotics: comparison of Bullet, Havok, MuJoCo, ODE and PhysX. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 4397–4404 (2015)
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft Actor-Critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor (2018). arXiv:1801.01290
van Hoof, H., Tanneberg, D., Peters, J.: Generalized exploration in policy search. Mach. Learn. 106, 1705–1724 (2017). https://doi.org/10.1007/s10994-017-5657-1
Kakade, S., Langford, J.: Approximately optimal approximate reinforcement learning. In: Proceedings of the Nineteenth International Conference on Machine Learning, ICML’02, pp. 267–274 (2002)
Kimura, H., Kobayashi, S.: An analysis of actor/critic algorithms using eligibility traces: reinforcement learning with imperfect value function. In: ICML (1998)
Korenkevych, D., Mahmood, A.R., Vasan, G., Bergstra, J.: Autoregressive policies for continuous control deep reinforcement learning. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), pp. 2754–2762 (2019)
Liang, E., et al.: RLlib: abstractions for distributed reinforcement learning. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 3053–3062. PMLR, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018
Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning (2016). arXiv:1509.02971
Mnih, V., et al.: Playing Atari with deep reinforcement learning (2013). arXiv:1312.5602
Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization (2015). arXiv:1502.05477
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017). arXiv:1707.06347
Todorov, E., Erez, T., Tassa, Y.: MuJoCo: a physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE (2012)
Wang, Z., et al.: Sample efficient actor-critic with experience replay (2016). arXiv:1611.01224
Wawrzyński, P.: Learning to control a 6-degree-of-freedom walking robot. In: Proceedings of EUROCON 2007 the International Conference on Computer as a Tool, pp. 698–705 (2007)
Wawrzyński, P.: Real-time reinforcement learning by sequential actor-critics and experience replay. Neural Networks 22(10), 1484–1497 (2009)
Wawrzyński, P.: Control policy with autocorrelated noise in reinforcement learning for robotics. Int. J. Mach. Learn. Comput. 5(2), 91–95 (2015)
Acknowledgement
This work was partially funded by a grant of Warsaw University of Technology Scientific Discipline Council for Computer Science and Telecommunications.
Copyright information
© 2020 Springer Nature Switzerland AG
Szulc, M., Łyskawa, J., Wawrzyński, P. (2020). A Framework for Reinforcement Learning with Autocorrelated Actions. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science(), vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7