Abstract
We propose approximating the Poincaré map of biped walking dynamics using Gaussian processes, and locally optimizing the parameters of a given biped walking controller based on the approximated map. Gaussian processes let us estimate the probability distribution of a target nonlinear function together with its covariance, so the optimization method can take the uncertainty of the approximated map into account throughout the learning process. We use a reinforcement learning (RL) method as the optimization method. Although RL is a useful nonlinear optimizer, it is usually difficult to apply to real robotic systems because of the large number of iterations required to acquire suitable policies. In this study, we first approximate the Poincaré map using data from a real robot, and then apply RL to the estimated map in order to optimize stepping and walking policies. We show that stepping and walking policies can be improved in both simulated and real environments. Experimental validation of the approach on a humanoid robot is presented.
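The core idea above — fitting a Gaussian process to observed section crossings so that the learned return map comes with a predictive variance — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scalar "map" `true_map`, the kernel hyperparameters, and the noise level are all assumed for the example, and a real biped's Poincaré map would be vector-valued.

```python
import numpy as np

def rbf_kernel(A, B, ell=0.5, sf=1.0):
    # Squared-exponential covariance between 1-D input arrays A and B.
    d = A[:, None] - B[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

def true_map(x):
    # Hypothetical scalar return map: state at one section crossing
    # mapped to the state at the next crossing (stands in for the
    # unknown walking dynamics).
    return 0.8 * x + 0.2 * np.sin(3.0 * x)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, 25)                     # sampled crossings
y = true_map(X) + 0.01 * rng.standard_normal(25)   # noisy observations

sn2 = 1e-4                                 # observation-noise variance
K = rbf_kernel(X, X) + sn2 * np.eye(len(X))
alpha = np.linalg.solve(K, y)              # precomputed K^{-1} y

def gp_predict(xs):
    """Predictive mean and variance of the learned map at points xs."""
    Ks = rbf_kernel(xs, X)
    mean = Ks @ alpha
    v = np.linalg.solve(K, Ks.T)
    var = rbf_kernel(xs, xs).diagonal() - np.sum(Ks * v.T, axis=1) + sn2
    return mean, var
```

An optimizer built on this model can query `gp_predict` instead of the real robot, and the returned variance grows away from the training data — exactly the uncertainty signal the abstract says the RL method exploits.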
Morimoto, J., Atkeson, C.G. Nonparametric representation of an approximated Poincaré map for learning biped locomotion. Auton Robot 27, 131–144 (2009). https://doi.org/10.1007/s10514-009-9133-z