Abstract
For practical considerations, reinforcement learning has proven to be a difficult task outside of simulation when applied to a physical experiment. Here we derive an optimal approach to model-free reinforcement learning, achieved entirely online, through careful experimental design and algorithmic decision making. We design a reinforcement learning scheme to implement traditionally episodic algorithms in an unstable one-dimensional mechanical environment. The training scheme is completely autonomous, requiring no human to be present throughout the learning process. We show that the pseudo-episodic technique allows for additional learning updates with off-policy actor-critic and experience replay methods, and that including these additional updates between periods of traditional training episodes can improve the speed and consistency of learning. Furthermore, we validate the procedure on experimental hardware. In the physical environment, several algorithm variants learned rapidly, each surpassing the baseline maximum reward. The algorithms in this research are model free and use only information obtained by an onboard sensor during training.
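The pseudo-episodic scheme summarized above lends itself to a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: `env`, `agent`, and their methods (`read_sensor`, `step`, `act`, `update_online`, `update_off_policy`) are hypothetical placeholders standing in for the physical one-dimensional system and an off-policy actor-critic learner, while the structure (no manual resets, software-delimited episodes, extra replay updates between them) follows the abstract's description.

```python
import random
from collections import deque

# Minimal sketch of a pseudo-episodic training loop. All names here
# (env, agent, update_online, update_off_policy) are hypothetical
# placeholders, not the authors' code: env stands for the physical
# 1-D mechanical system, agent for an off-policy actor-critic learner.

replay = deque(maxlen=50_000)  # experience replay buffer

def run_pseudo_episode(env, agent, max_steps=500):
    """One on-policy segment: the plant is never manually reset; the
    'episode' simply ends when a terminal condition is flagged."""
    state = env.read_sensor()  # model free: onboard sensor only
    for _ in range(max_steps):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        replay.append((state, action, reward, next_state, done))
        agent.update_online(state, action, reward, next_state, done)
        state = next_state
        if done:  # e.g. the system reaches a boundary condition
            return

def replay_updates(agent, n_updates=200, batch_size=32):
    """Additional off-policy updates performed between pseudo-episodes,
    the step the abstract credits with faster, more consistent learning."""
    for _ in range(n_updates):
        if len(replay) < batch_size:
            return
        agent.update_off_policy(random.sample(replay, batch_size))
```

In this shape, the physical state simply carries over from one pseudo-episode to the next, so the cost of the extra `replay_updates` calls is computation time between segments rather than operator intervention.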
Code Availability
Code exemplifying the methods implemented in this paper may be made available upon request.
Acknowledgements
We are grateful to Professor Peter Washabaugh of the University of Michigan's Department of Aerospace Engineering for providing the air-sled and air-track experiment, and to Connor Stadler, an undergraduate in the same department, for determining an efficient method of communicating state and action information between the Arduino and the Python script.
Funding
This work is supported in part by the US Air Force Office of Scientific Research under grant number AFOSR-FA9550-19-0213, titled "Brain Inspired Networks for Multifunctional Intelligent Systems in Aerial Vehicles," and in part by grant number FA9550-16-1-0087, titled "Avian-Inspired Multifunctional Morphing Vehicles," both monitored by Dr. B.L. Lee.
Author information
Contributions
Both authors contributed to the concept and design of the experiment presented in this article. Algorithm design, experimental setup, data collection, and analysis were completed by Kevin Haughn. The first draft of this manuscript was written by Kevin Haughn. Both authors were active in revising the first draft and all subsequent versions, and approve of the final manuscript.
Ethics declarations
Ethics approval
This article has the approval of both authors.
Consent for Publication
Both authors authorized the publishing of this article.
Conflict of Interest
The authors have no conflict of interest.
Additional information
Availability of data and material
Not applicable
Consent to participate
Both authors gave consent to participate in this article.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Haughn, K.P.T., Inman, D.J. Autonomous Learning in a Pseudo-Episodic Physical Environment. J Intell Robot Syst 104, 32 (2022). https://doi.org/10.1007/s10846-022-01577-5