Nonparametric representation of an approximated Poincaré map for learning biped locomotion


Abstract

We propose approximating a Poincaré map of biped walking dynamics using Gaussian processes. We locally optimize the parameters of a given biped walking controller based on the approximated Poincaré map. By using Gaussian processes, we can estimate a probability distribution over the target nonlinear function, together with its covariance, so the optimization method can take the uncertainty of the approximated map into account throughout the learning process. We use a reinforcement learning (RL) method as the optimization method. Although RL is a useful nonlinear optimizer, it is usually difficult to apply to real robotic systems because of the large number of iterations required to acquire suitable policies. In this study, we first approximate the Poincaré map using data from a real robot, and then apply RL on the estimated map to optimize stepping and walking policies. We show that stepping and walking policies can be improved in both simulated and real environments. Experimental validation of the approach on a humanoid robot is presented.
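The paper itself presents no code, so the following is a minimal, hypothetical Python sketch of the two ingredients the abstract describes: Gaussian-process regression of the Poincaré return map from data collected on the robot, and a model-based evaluation of policy parameters that accounts for the map's predictive uncertainty. The names (`GPPoincareMap`, `rbf_kernel`, `model_based_return`) and the variance-penalized rollout objective are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between inputs A (N x D) and B (M x D)."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

class GPPoincareMap:
    """Gaussian-process model of the return map on a Poincare section.

    Each training input pairs the state at one section crossing with the
    policy parameters in effect; the target is the state at the next crossing.
    """
    def __init__(self, length_scale=1.0, signal_var=1.0, noise_var=1e-3):
        self.ls, self.sv, self.nv = length_scale, signal_var, noise_var

    def fit(self, X, Y):
        """X: (N, D) inputs, Y: (N, d) next-section states from the robot."""
        self.X = X
        K = rbf_kernel(X, X, self.ls, self.sv) + self.nv * np.eye(len(X))
        self.L = np.linalg.cholesky(K)                 # K = L L^T
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))
        return self

    def predict(self, Xs):
        """Predictive mean (M, d) and marginal variance (M,) at query inputs Xs."""
        Ks = rbf_kernel(self.X, Xs, self.ls, self.sv)  # (N, M)
        mean = Ks.T @ self.alpha
        v = np.linalg.solve(self.L, Ks)
        var = self.sv + self.nv - np.sum(v**2, axis=0)
        return mean, var

def model_based_return(gp, x0, theta, reward_fn, horizon=10, risk_weight=1.0):
    """Evaluate policy parameters `theta` by rolling the learned map forward
    from section state `x0`, penalizing the GP's predictive variance so the
    optimizer avoids regions where the approximated map is uncertain."""
    x, total = np.atleast_2d(np.asarray(x0, dtype=float)), 0.0
    for _ in range(horizon):
        z = np.hstack([x, np.atleast_2d(theta)])       # (section state, policy params)
        mean, var = gp.predict(z)
        total += reward_fn(mean[0]) - risk_weight * float(var[0])
        x = mean                                       # step to the next crossing
    return total
```

Policy improvement would then amount to adjusting `theta` to increase `model_based_return`, for example with a policy-gradient update; the reward function and learning rule used in the paper are not reproduced here.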

Author information

Correspondence to Jun Morimoto.


Cite this article

Morimoto, J., Atkeson, C.G. Nonparametric representation of an approximated Poincaré map for learning biped locomotion. Auton Robot 27, 131–144 (2009). https://doi.org/10.1007/s10514-009-9133-z
