Abstract
Reinforcement learning methods often produce brittle policies — policies that perform well during training but generalize poorly beyond their direct training experience, becoming unstable under small disturbances. To address this issue, we propose a method for stabilizing a control policy in the space of configuration paths. It is applied post-training and relies purely on the data produced during training, together with an instantaneous control-matrix estimation. The approach is evaluated empirically on a planar bipedal walker subjected to a variety of perturbations. The control policies obtained via reinforcement learning are compared against their stabilized counterparts. Across different experiments, we find a two- to four-fold increase in stability, measured in terms of perturbation amplitudes. We also provide a zero-dynamics interpretation of our approach.
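The abstract's core idea — correcting a trained policy's action using an instantaneous estimate of the control matrix so the state tracks a nominal configuration path — can be illustrated with a minimal sketch. The toy dynamics, reference path, and function names below are hypothetical stand-ins (not the paper's walker model or implementation); the sketch only shows the general pattern: estimate B = ∂f/∂u by finite differences, then apply a least-squares action correction toward the reference.

```python
import numpy as np

def step(x, u):
    # Toy nonlinear dynamics standing in for the walker (hypothetical).
    return x + 0.1 * np.array([x[1], -np.sin(x[0]) + u[0]])

def estimate_control_matrix(x, u, eps=1e-5):
    """Finite-difference estimate of the control matrix B = d(step)/d(u)."""
    base = step(x, u)
    B = np.zeros((len(x), len(u)))
    for j in range(len(u)):
        du = np.zeros_like(u)
        du[j] = eps
        B[:, j] = (step(x, u + du) - base) / eps
    return B

def stabilized_action(x, x_ref_next, u_policy):
    """Correct the policy action so the next state tracks the reference path.

    The correction is the least-squares solution of B @ du = error,
    computed with the instantaneous control-matrix estimate.
    """
    B = estimate_control_matrix(x, u_policy)
    err = x_ref_next - step(x, u_policy)
    return u_policy + np.linalg.pinv(B) @ err
```

Since the system is underactuated (one input, two states), the pseudoinverse correction cancels tracking error only in the controllable direction; the remaining error evolves under the zero dynamics, which is the interpretation the abstract alludes to.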
We thank Georges Harik for many useful discussions.
Sergey Pankov received his Ph.D. degree in physics from Rutgers University in 2003. His research interests include legged locomotion control, reinforcement learning, and deep learning.
Cite this article
Pankov, S. Configuration Path Control. Int. J. Control Autom. Syst. 21, 306–317 (2023). https://doi.org/10.1007/s12555-021-0466-5