Abstract
We propose approximating the Poincaré map of biped walking dynamics using Gaussian processes, and locally optimizing the parameters of a given biped walking controller based on the approximated map. Gaussian processes let us estimate the probability distribution of a target nonlinear function together with its covariance, so the optimization method can take the uncertainty of the approximated map into account throughout the learning process. We use a reinforcement learning (RL) method as the optimization method. Although RL is a useful nonlinear optimizer, it is usually difficult to apply to real robotic systems because of the large number of iterations required to acquire suitable policies. In this study, we first approximate the Poincaré map using data from a real robot, and then apply RL to the estimated map in order to optimize stepping and walking policies. We show that stepping and walking policies can be improved in both simulated and real environments. Experimental validation of the approach on a humanoid robot is presented.
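The core idea above — fitting a Gaussian process to observed section crossings so that the learned return map comes with a predictive variance — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scalar "map" `true_map`, the kernel hyperparameters, and the noise level are all assumed for the example, and a real biped's Poincaré map would be vector-valued.

```python
import numpy as np

def rbf_kernel(A, B, ell=0.5, sf=1.0):
    # Squared-exponential covariance between 1-D input arrays A and B.
    d = A[:, None] - B[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

def true_map(x):
    # Hypothetical scalar return map: state at one section crossing
    # mapped to the state at the next crossing (stands in for the
    # unknown walking dynamics).
    return 0.8 * x + 0.2 * np.sin(3.0 * x)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, 25)                     # sampled crossings
y = true_map(X) + 0.01 * rng.standard_normal(25)   # noisy observations

sn2 = 1e-4                                 # observation-noise variance
K = rbf_kernel(X, X) + sn2 * np.eye(len(X))
alpha = np.linalg.solve(K, y)              # precomputed K^{-1} y

def gp_predict(xs):
    """Predictive mean and variance of the learned map at points xs."""
    Ks = rbf_kernel(xs, X)
    mean = Ks @ alpha
    v = np.linalg.solve(K, Ks.T)
    var = rbf_kernel(xs, xs).diagonal() - np.sum(Ks * v.T, axis=1) + sn2
    return mean, var
```

An optimizer built on this model can query `gp_predict` instead of the real robot, and the returned variance grows away from the training data — exactly the uncertainty signal the abstract says the RL method exploits.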
Morimoto, J., Atkeson, C.G. Nonparametric representation of an approximated Poincaré map for learning biped locomotion. Auton Robot 27, 131–144 (2009). https://doi.org/10.1007/s10514-009-9133-z