Autonomous Robots, Volume 27, Issue 2, pp 105–121

A novel method for learning policies from variable constraint data

  • Matthew Howard
  • Stefan Klanke
  • Michael Gienger
  • Christian Goerick
  • Sethu Vijayakumar
Abstract

Many everyday human skills can be framed in terms of performing some task subject to constraints imposed by the environment. Constraints are usually unobservable and frequently change between contexts. In this paper, we present a novel approach for learning (unconstrained) control policies from movement data, where observations come from movements under different constraints. As a key ingredient, we introduce a small but highly effective modification to the standard risk functional, allowing us to make a meaningful comparison between the estimated policy and constrained observations. We demonstrate our approach on systems of varying complexity, including kinematic data from the ASIMO humanoid robot with 27 degrees of freedom, and present results for learning from human demonstration.
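The modified risk functional mentioned above can be sketched under one simplifying assumption: constrained observations arise as nullspace projections u = N(x)π(x) of the unconstrained policy, where N(x) is an unknown, context-dependent projection matrix. Because N is an orthogonal projection, the component of π(x) along each observed movement direction û equals the observation itself (û ûᵀ π(x) = u), so the policy can be penalized only along that direction, where the unknown constraint has no effect. The linear policy model, dimensions, and variable names below are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Hypothetical ground-truth unconstrained policy: pi(x) = W x
W_true = rng.standard_normal((d, d))

# Observations under varying constraints: each movement is the policy
# projected into the nullspace of a random constraint matrix A.
X, U = [], []
while len(X) < 200:
    x = rng.standard_normal(d)
    A = rng.standard_normal((1, d))
    N = np.eye(d) - np.linalg.pinv(A) @ A   # nullspace projection matrix
    u = N @ (W_true @ x)                    # constrained observation
    if np.linalg.norm(u) > 1e-6:            # skip degenerate samples
        X.append(x)
        U.append(u)

# Modified risk: penalize error only along each observed direction,
#   E[W] = sum_n || u_n - uhat_n uhat_n^T W x_n ||^2,
# which is linear in W and solvable by ordinary least squares.
rows, targets = [], []
for x, u in zip(X, U):
    uhat = u / np.linalg.norm(u)
    P = np.outer(uhat, uhat)          # rank-1 projection onto u's direction
    rows.append(np.kron(P, x))        # maps W.flatten() to P @ W @ x
    targets.append(u)
w, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(targets), rcond=None)
W_est = w.reshape(d, d)

print(np.allclose(W_est, W_true, atol=1e-6))
```

With enough variation in the constraints, the stacked rank-1 systems jointly constrain all entries of W, and the unconstrained policy is recovered even though no individual observation reveals it; a naive regression of U on X would instead average over the projections and yield a biased estimate.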

Keywords

Direct policy learning · Constrained motion · Imitation · Nullspace control

Supplementary material

Learning to reach for a ball. (23.5 MB)

Learning to wash a car. (37.3 MB)


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Matthew Howard (1)
  • Stefan Klanke (1)
  • Michael Gienger (2)
  • Christian Goerick (2)
  • Sethu Vijayakumar (1)

  1. Institute of Perception, Action and Behaviour, University of Edinburgh, Edinburgh, UK
  2. Honda Research Institute Europe (GmbH), Offenbach, Germany
