A Framework for Reinforcement Learning with Autocorrelated Actions

  • Conference paper
  • Neural Information Processing (ICONIP 2020)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS), volume 12533

Abstract

The subject of this paper is reinforcement learning. Policies are considered here that produce actions based on states and on random elements that are autocorrelated over subsequent time instants. Consequently, an agent learns from experiments that are distributed over time and potentially give better clues for policy improvement. Physical implementation of such policies, e.g. in robotics, is also less problematic, as it avoids making robots shake; this is in contrast to most RL algorithms, which add white noise to the control signal and thereby cause unwanted shaking of the robots. An algorithm is introduced here that approximately optimizes the aforementioned policy. Its efficiency is verified on four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) against three other methods (PPO, SAC, ACER). The algorithm outperforms the others on three of these problems.
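
To illustrate the kind of policy considered here, the sketch below contrasts white Gaussian exploration noise with first-order autoregressive (AR(1)) noise added to a deterministic policy output. It is only a minimal illustration of autocorrelated exploration under assumed parameters (the mixing coefficient alpha and the scale sigma are arbitrary choices); it is not the ACERAC algorithm introduced in the paper.

```python
import numpy as np

class AR1Noise:
    """AR(1) noise: xi_t = alpha * xi_{t-1} + sqrt(1 - alpha^2) * eps_t, eps_t ~ N(0, I).

    With alpha = 0 this reduces to white noise; alpha close to 1 yields slowly
    varying, autocorrelated perturbations of the control signal.
    """

    def __init__(self, dim, alpha=0.9, sigma=0.2, rng=None):
        self.alpha = alpha          # autocorrelation coefficient (illustrative value)
        self.sigma = sigma          # stationary standard deviation of the noise
        self.rng = rng or np.random.default_rng()
        self.xi = np.zeros(dim)     # current noise state

    def reset(self):
        self.xi = np.zeros_like(self.xi)

    def sample(self):
        eps = self.rng.standard_normal(self.xi.shape)
        # The sqrt(1 - alpha^2) factor keeps the stationary variance equal to sigma^2.
        self.xi = self.alpha * self.xi + np.sqrt(1.0 - self.alpha ** 2) * eps
        return self.sigma * self.xi


def act(policy_mean, noise):
    """Exploratory action: deterministic policy output plus autocorrelated noise."""
    return policy_mean + noise.sample()


if __name__ == "__main__":
    # Compare lag-1 autocorrelation of white vs. AR(1) noise on a 1-D action.
    for name, gen in [("white", AR1Noise(dim=1, alpha=0.0)),
                      ("AR(1)", AR1Noise(dim=1, alpha=0.9))]:
        xs = np.array([gen.sample()[0] for _ in range(10000)])
        corr = np.corrcoef(xs[:-1], xs[1:])[0, 1]
        print(f"{name:6s} lag-1 autocorrelation: {corr:+.2f}")
```

With alpha close to 1 the perturbation changes little between subsequent time instants, so a robot driven by such actions does not shake, while alpha = 0 recovers the usual white-noise exploration.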

Notes

  1. We chose PyBullet because it is freeware, whereas MuJoCo is commercial software.

References

  1. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can learn difficult learning control problems. IEEE Trans. Syst. Man Cybern. B 13, 834–846 (1983)

  2. Coumans, E., Bai, Y.: PyBullet, a Python module for physics simulation for games, robotics and machine learning (2016–2019). http://pybullet.org

  3. Erez, T., Tassa, Y., Todorov, E.: Simulation tools for model-based robotics: comparison of Bullet, Havok, MuJoCo, ODE and PhysX. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 4397–4404 (2015)

  4. Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft Actor-Critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor (2018). arXiv:1801.01290

  5. van Hoof, H., Tanneberg, D., Peters, J.: Generalized exploration in policy search. Mach. Learn. 106, 1705–1724 (2017). https://doi.org/10.1007/s10994-017-5657-1

  6. Kakade, S., Langford, J.: Approximately optimal approximate reinforcement learning. In: Proceedings of the Nineteenth International Conference on Machine Learning, ICML’02, pp. 267–274 (2002)

  7. Kimura, H., Kobayashi, S.: An analysis of actor/critic algorithms using eligibility traces: reinforcement learning with imperfect value function. In: ICML (1998)

  8. Korenkevych, D., Mahmood, A.R., Vasan, G., Bergstra, J.: Autoregressive policies for continuous control deep reinforcement learning. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), pp. 2754–2762 (2019)

  9. Liang, E., et al.: RLlib: abstractions for distributed reinforcement learning. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 3053–3062. PMLR, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018

  10. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning (2016). arXiv:1509.02971

  11. Mnih, V., et al.: Playing Atari with deep reinforcement learning (2013). arXiv:1312.5602

  12. Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization (2015). arXiv:1502.05477

  13. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017). arXiv:1707.06347

  14. Todorov, E., Erez, T., Tassa, Y.: MuJoCo: a physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE (2012)

  15. Wang, Z., et al.: Sample efficient actor-critic with experience replay (2016). arXiv:1611.01224

  16. Wawrzyński, P.: Learning to control a 6-degree-of-freedom walking robot. In: Proceedings of EUROCON 2007 the International Conference on Computer as a Tool, pp. 698–705 (2007)

  17. Wawrzyński, P.: Real-time reinforcement learning by sequential actor-critics and experience replay. Neural Networks 22(10), 1484–1497 (2009)

  18. Wawrzyński, P.: Control policy with autocorrelated noise in reinforcement learning for robotics. Int. J. Mach. Learn. Comput. 5(2), 91–95 (2015)

Acknowledgement

This work was partially funded by a grant from the Warsaw University of Technology Scientific Discipline Council for Computer Science and Telecommunications.

Author information

Correspondence to Paweł Wawrzyński.

A Algorithms’ Hyperparameters

This section presents the hyperparameters used in the simulations reported in Sect. 5. All algorithms used a discount factor of 0.99. The remaining hyperparameters for ACERAC, ACER, SAC, and PPO are given in Tables 1, 2, 3, and 4, respectively.
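
As a hedged illustration of how these settings might be collected in code: only the discount factor 0.99 is taken from the text above, while the algorithm-specific entries are placeholders standing in for the values listed in Tables 1–4.

```python
# Shared setting reported in the appendix text; the per-algorithm dictionaries
# are placeholders for the values given in Tables 1-4.
COMMON = {"discount_factor": 0.99}

HYPERPARAMS = {
    "ACERAC": dict(COMMON),  # remaining values: Table 1
    "ACER": dict(COMMON),    # remaining values: Table 2
    "SAC": dict(COMMON),     # remaining values: Table 3
    "PPO": dict(COMMON),     # remaining values: Table 4
}
```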

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Szulc, M., Łyskawa, J., Wawrzyński, P. (2020). A Framework for Reinforcement Learning with Autocorrelated Actions. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol. 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_8

  • DOI: https://doi.org/10.1007/978-3-030-63833-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63832-0

  • Online ISBN: 978-3-030-63833-7

  • eBook Packages: Computer Science, Computer Science (R0)
