Abstract
Reinforcement learning methods often produce brittle policies — policies that perform well during training but generalize poorly beyond their direct training experience, becoming unstable under small disturbances. To address this issue, we propose a method for stabilizing a control policy in the space of configuration paths. It is applied post-training and relies purely on the data produced during training, together with an instantaneous control-matrix estimation. The approach is evaluated empirically on a planar bipedal walker subjected to a variety of perturbations. The control policies obtained via reinforcement learning are compared against their stabilized counterparts. Across different experiments, we find a two- to four-fold increase in stability, measured in terms of perturbation amplitudes. We also provide a zero-dynamics interpretation of our approach.
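The abstract's core idea — correcting a trained policy's action using an instantaneous estimate of the control matrix so the state tracks a nominal configuration path — can be illustrated with a minimal sketch. The toy dynamics, reference path, and function names below are hypothetical stand-ins (not the paper's walker model or implementation); the sketch only shows the general pattern: estimate B = ∂f/∂u by finite differences, then apply a least-squares action correction toward the reference.

```python
import numpy as np

def step(x, u):
    # Toy nonlinear dynamics standing in for the walker (hypothetical).
    return x + 0.1 * np.array([x[1], -np.sin(x[0]) + u[0]])

def estimate_control_matrix(x, u, eps=1e-5):
    """Finite-difference estimate of the control matrix B = d(step)/d(u)."""
    base = step(x, u)
    B = np.zeros((len(x), len(u)))
    for j in range(len(u)):
        du = np.zeros_like(u)
        du[j] = eps
        B[:, j] = (step(x, u + du) - base) / eps
    return B

def stabilized_action(x, x_ref_next, u_policy):
    """Correct the policy action so the next state tracks the reference path.

    The correction is the least-squares solution of B @ du = error,
    computed with the instantaneous control-matrix estimate.
    """
    B = estimate_control_matrix(x, u_policy)
    err = x_ref_next - step(x, u_policy)
    return u_policy + np.linalg.pinv(B) @ err
```

Since the system is underactuated (one input, two states), the pseudoinverse correction cancels tracking error only in the controllable direction; the remaining error evolves under the zero dynamics, which is the interpretation the abstract alludes to.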
We thank Georges Harik for many useful discussions.
Sergey Pankov received his Ph.D. degree in physics from Rutgers University in 2003. His research interests include legged locomotion control, reinforcement learning, and deep learning.
Cite this article
Pankov, S. Configuration Path Control. Int. J. Control Autom. Syst. 21, 306–317 (2023). https://doi.org/10.1007/s12555-021-0466-5