Methods for Learning Control Policies from Variable-Constraint Demonstrations

  • Matthew Howard
  • Stefan Klanke
  • Michael Gienger
  • Christian Goerick
  • Sethu Vijayakumar
Chapter

Abstract

Many everyday human skills can be framed as performing a task subject to constraints imposed by the task or the environment. These constraints are usually not observable and frequently change between contexts. In this chapter, we explore the problem of learning control policies from data containing variable, dynamic and non-linear constraints on motion. We argue that an effective approach is to learn the unconstrained policy in a way that is consistent with the constraints. We then discuss several recent algorithms for extracting policies from movement data in which observations are recorded under variable, unknown constraints. Finally, we review a number of experiments that test the performance of these algorithms and demonstrate how the resulting policy models generalise over constraints, allowing prediction of behaviour in unseen settings where new constraints apply.
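
The idea of learning an unconstrained policy that is consistent with constrained observations can be illustrated with a minimal sketch. It is not taken from the chapter. It assumes that each constraint acts as a null-space projection N = I - aᵀa / (a aᵀ) on the action, a common model for constrained motion, and every name in it (policy, null_space_projection, observe, project_onto), along with the point-attractor policy, is hypothetical. Under that assumption, projecting the unconstrained action onto the direction of an observed action recovers the observation, whichever constraint was active.

```python
def policy(x):
    # Hypothetical unconstrained policy: a point attractor at the origin.
    return [-x[0], -x[1]]

def null_space_projection(a):
    # Assumed constraint model: N = I - a^T a / (a a^T) for one constraint row a.
    norm_sq = a[0] * a[0] + a[1] * a[1]
    return [[1.0 - a[0] * a[0] / norm_sq, -a[0] * a[1] / norm_sq],
            [-a[1] * a[0] / norm_sq, 1.0 - a[1] * a[1] / norm_sq]]

def matvec(m, v):
    # 2 x 2 matrix times 2-vector.
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def observe(x, a):
    # Action seen by the learner: the policy output with the component
    # along the constraint direction a removed, so that a . u = 0.
    return matvec(null_space_projection(a), policy(x))

def project_onto(v, u):
    # Orthogonal projection of v onto the direction of u (u must be non-zero).
    norm_sq = u[0] * u[0] + u[1] * u[1]
    scale = (v[0] * u[0] + v[1] * u[1]) / norm_sq
    return [scale * u[0], scale * u[1]]

if __name__ == "__main__":
    x = [1.0, 2.0]
    # Three different constraints applied at the same state.
    for a in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
        u = observe(x, a)
        recovered = project_onto(policy(x), u)
        print("constraint", a, "observed", u, "policy projected onto observation", recovered)
```

The observed actions differ from one constraint to the next, yet each agrees with the same underlying policy once that policy is projected onto the observed direction. A learner could therefore score a candidate policy by this agreement without ever knowing the constraint itself.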

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Matthew Howard (1)
  • Stefan Klanke (1)
  • Michael Gienger (2)
  • Christian Goerick (2)
  • Sethu Vijayakumar (1)

  1. Institute of Perception, Action and Behaviour, University of Edinburgh, Scotland, UK
  2. Honda Research Institute Europe GmbH, Offenbach, Germany
