Many everyday human skills can be framed in terms of performing some task subject to constraints imposed by the environment. Constraints are usually unobservable and frequently change between contexts. In this paper, we present a novel approach for learning (unconstrained) control policies from movement data, where observations come from movements under different constraints. As a key ingredient, we introduce a small but highly effective modification to the standard risk functional, allowing us to make a meaningful comparison between the estimated policy and constrained observations. We demonstrate our approach on systems of varying complexity, including kinematic data from the ASIMO humanoid robot with 27 degrees of freedom, and present results for learning from human demonstration.
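The core idea — that constrained observations cannot be compared directly against an unconstrained policy estimate, but can be compared after projecting the estimate onto the observed movement — can be sketched as follows. This is an illustrative reconstruction under assumptions, not the authors' exact formulation: it assumes each observation `u_n` lies in the (unknown) constrained subspace, and uses the rank-1 projection onto the direction of `u_n` as the surrogate constraint. The function name `constrained_risk` and the toy policy are hypothetical.

```python
import numpy as np

def constrained_risk(policy, X, U):
    """Sketch of a modified risk functional: rather than penalising the
    full difference between the policy prediction and the constrained
    observation u_n (which would punish components the unknown constraint
    removed), each prediction is first projected onto the direction of
    the observed movement, and the squared residual is accumulated."""
    err = 0.0
    for x, u in zip(X, U):
        norm2 = u @ u
        if norm2 < 1e-12:
            continue  # skip near-zero observations
        P = np.outer(u, u) / norm2    # rank-1 projection onto span(u)
        r = u - P @ policy(x)         # residual after projecting the prediction
        err += r @ r
    return err / len(X)

# Toy check: true policy pi(x) = -x, observed through a constraint that
# zeroes the second coordinate. The naive squared error between u_n and
# pi(x_n) is nonzero, but the projected risk of the true policy vanishes.
X = [np.array([1.0, 2.0]), np.array([-0.5, 1.5])]
pi = lambda x: -x
U = [np.array([-x[0], 0.0]) for x in X]  # constraint removes 2nd dimension
```

Under this sketch, the true unconstrained policy attains zero modified risk even though it disagrees with every constrained observation in the naive least-squares sense — which is the kind of meaningful comparison the abstract refers to.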
Howard, M., Klanke, S., Gienger, M. et al. A novel method for learning policies from variable constraint data. Auton Robot 27, 105–121 (2009). https://doi.org/10.1007/s10514-009-9129-8
Keywords: Direct policy learning · Constrained motion · Nullspace control