Abstract
For practical considerations, reinforcement learning has proven to be a difficult task outside of simulation when applied to a physical experiment. Here we derive an optimal approach to model-free reinforcement learning, achieved entirely online, through careful experimental design and algorithmic decision making. We design a reinforcement learning scheme to implement traditionally episodic algorithms in an unstable one-dimensional mechanical environment. The training scheme is completely autonomous, requiring no human to be present throughout the learning process. We show that the pseudo-episodic technique allows for additional learning updates with off-policy actor-critic and experience replay methods, and that including these additional updates between periods of traditional training episodes can improve the speed and consistency of learning. Furthermore, we validate the procedure on experimental hardware. In the physical environment, several algorithm variants learned rapidly, each surpassing the baseline maximum reward. The algorithms in this research are model free and use only information obtained by an onboard sensor during training.
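The pseudo-episodic scheme summarized above lends itself to a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: `env`, `agent`, and their methods (`read_sensor`, `step`, `act`, `update_online`, `update_off_policy`) are hypothetical placeholders standing in for the physical one-dimensional system and an off-policy actor-critic learner, while the structure (no manual resets, software-delimited episodes, extra replay updates between them) follows the abstract's description.

```python
import random
from collections import deque

# Minimal sketch of a pseudo-episodic training loop. All names here
# (env, agent, update_online, update_off_policy) are hypothetical
# placeholders, not the authors' code: env stands for the physical
# 1-D mechanical system, agent for an off-policy actor-critic learner.

replay = deque(maxlen=50_000)  # experience replay buffer

def run_pseudo_episode(env, agent, max_steps=500):
    """One on-policy segment: the plant is never manually reset; the
    'episode' simply ends when a terminal condition is flagged."""
    state = env.read_sensor()  # model free: onboard sensor only
    for _ in range(max_steps):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        replay.append((state, action, reward, next_state, done))
        agent.update_online(state, action, reward, next_state, done)
        state = next_state
        if done:  # e.g. the system reaches a boundary condition
            return

def replay_updates(agent, n_updates=200, batch_size=32):
    """Additional off-policy updates performed between pseudo-episodes,
    the step the abstract credits with faster, more consistent learning."""
    for _ in range(n_updates):
        if len(replay) < batch_size:
            return
        agent.update_off_policy(random.sample(replay, batch_size))
```

In this shape, the physical state simply carries over from one pseudo-episode to the next, so the cost of the extra `replay_updates` calls is computation time between segments rather than operator intervention.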
Code Availability
Code exemplifying the methods implemented in this paper may be made available upon request.
Acknowledgements
We are grateful to Professor Peter Washabaugh of the University of Michigan's Department of Aerospace Engineering for providing the air-sled and air-track experiment, and to Connor Stadler, an undergraduate in the same department, for determining an efficient method of communicating state and action information between the Arduino and the Python script.
Funding
This work is supported in part by the US Air Force Office of Scientific Research under grant number AFOSR-FA9550-19-0213, titled "Brain Inspired Networks for Multifunctional Intelligent Systems in Aerial Vehicles," and in part by grant number FA9550-16-1-0087, titled "Avian-Inspired Multifunctional Morphing Vehicles," both monitored by Dr. B.L. Lee.
Author information
Contributions
Both authors contributed to the concept and design of the experiment presented in this article. Algorithm design, experimental setup, data collection, and analysis were completed by Kevin Haughn. The first draft of this manuscript was written by Kevin Haughn. Both authors were active in revising the first draft and all subsequent versions, and approve of the final manuscript.
Ethics declarations
Ethics approval
This article has the approval of both authors.
Consent for Publication
Both authors authorized the publishing of this article.
Conflict of Interest
The authors have no conflict of interest.
Additional information
Availability of data and material
Not applicable
Consent to participate
Both authors gave consent to participate in this article.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Haughn, K.P.T., Inman, D.J. Autonomous Learning in a Pseudo-Episodic Physical Environment. J Intell Robot Syst 104, 32 (2022). https://doi.org/10.1007/s10846-022-01577-5