Configuration Path Control

  • Regular Papers
  • Robot and Applications

International Journal of Control, Automation and Systems

Abstract

Reinforcement learning methods often produce brittle policies: policies that perform well during training but generalize poorly beyond their direct training experience and thus become unstable under small disturbances. To address this issue, we propose a method for stabilizing a control policy in the space of configuration paths. The method is applied post-training and relies purely on the data produced during training, together with an instantaneous estimate of the control matrix. The approach is evaluated empirically on a planar bipedal walker subjected to a variety of perturbations, with the control policies obtained via reinforcement learning compared against their stabilized counterparts. Across the experiments, we find a two- to four-fold increase in stability, as measured by the tolerated perturbation amplitudes. We also provide a zero-dynamics interpretation of our approach.
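
As a rough sketch of the kind of post-training correction the abstract describes (not the paper's actual algorithm), the Python fragment below projects the measured configuration onto a nominal configuration path recorded from training rollouts and maps the deviation back into actuator space through a pseudo-inverse of an estimated control matrix. The function name, argument names, and gains kp and kd are all hypothetical.

    import numpy as np

    def stabilized_action(policy_action, q, q_dot, nominal_path, B_hat,
                          kp=50.0, kd=5.0):
        """Hypothetical post-training stabilization step (illustration only).

        q, q_dot     -- current configuration and its time derivative
        nominal_path -- (N, n) array of configurations from training rollouts
        B_hat        -- instantaneous estimate of the control matrix mapping
                        actuator inputs into configuration space
        """
        # Project onto the nominal configuration path (naive nearest neighbour).
        q_ref = nominal_path[np.argmin(np.linalg.norm(nominal_path - q, axis=1))]

        # PD-style deviation from the path; the reference is treated as
        # locally stationary only to keep the sketch short.
        e, e_dot = q_ref - q, -q_dot

        # Map the desired configuration-space correction into actuator space
        # through the pseudo-inverse of the estimated control matrix.
        delta_u = np.linalg.pinv(B_hat) @ (kp * e + kd * e_dot)
        return policy_action + delta_u

In the paper the correction is derived from the configuration-path geometry and admits a zero-dynamics interpretation; the sketch only conveys the ingredients named in the abstract (training-rollout data plus an instantaneous control-matrix estimate).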

Author information

Corresponding author

Correspondence to Sergey Pankov.

Additional information

We thank Georges Harik for many useful discussions.

Sergey Pankov received his Ph.D. degree in physics from Rutgers University in 2003. His research interests include legged locomotion control, reinforcement learning, and deep learning.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pankov, S. Configuration Path Control. Int. J. Control Autom. Syst. 21, 306–317 (2023). https://doi.org/10.1007/s12555-021-0466-5
