Autonomous Robots, Volume 36, Issue 3, pp. 273–294

Socially guided intrinsic motivation for robot learning of motor skills

Abstract

This paper presents a technical approach to robot learning of motor skills that combines active, intrinsically motivated learning with imitation learning. Our algorithmic architecture, called SGIM-D, allows efficient learning of high-dimensional continuous sensorimotor inverse models in robots; in particular, it learns distributions of parameterised motor policies that solve a corresponding distribution of parameterised goals/tasks. This is made possible by integrating imitation learning techniques into an algorithm for learning inverse models that relies on active goal babbling. After reviewing social learning and intrinsic motivation approaches to action learning, we describe the general framework of our algorithm, before detailing its architecture. In an experiment where a robot arm has to learn to use a flexible fishing line, we illustrate that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation, benefiting from the properties of human demonstrations to learn to produce varied outcomes in the environment while developing more precise control policies in large spaces.
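The abstract only outlines how SGIM-D interleaves imitation with active goal babbling. The Python sketch below illustrates that general idea on a toy problem; it is not the authors' implementation, and every concrete choice in it is a hypothetical assumption: the 2-link planar arm standing in for the environment (`forward`), the nearest-neighbour inverse model, a fixed 4x4 grid with a simple competence-progress `interest` measure (a simplified stand-in for the region-based goal selection used in intrinsically motivated goal exploration), and a "teacher" that just demonstrates a random policy every `demo_every` episodes, which the learner replays with small variations.

```python
import math
import random

N_CELLS = 4                      # hypothetical: goal space [-2, 2]^2 split into a 4x4 grid
GOAL_LOW, GOAL_SIZE = -2.0, 4.0

def forward(theta):
    """Hypothetical environment: 2-link planar arm with unit links.
    Maps a policy (two joint angles) to an outcome (end-effector position)."""
    a, b = theta
    return (math.cos(a) + math.cos(a + b), math.sin(a) + math.sin(a + b))

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def interest(history, window=5):
    """Competence progress: absolute change in mean competence between the
    last `window` attempts and the `window` attempts before them."""
    if len(history) < 2 * window:
        return 0.0
    recent = sum(history[-window:]) / window
    older = sum(history[-2 * window:-window]) / window
    return abs(recent - older)

class Learner:
    def __init__(self, sigma=0.1, seed=0):
        self.rng = random.Random(seed)
        self.sigma = sigma       # std of Gaussian exploration noise on policies
        self.memory = []         # every (policy, outcome) pair experienced so far
        self.history = {(i, j): [] for i in range(N_CELLS) for j in range(N_CELLS)}

    def perturb(self, policy):
        return tuple(p + self.rng.gauss(0.0, self.sigma) for p in policy)

    def inverse(self, goal):
        """Nearest-neighbour inverse model plus exploration noise."""
        if not self.memory:
            return (self.rng.uniform(-math.pi, math.pi), self.rng.uniform(-math.pi, math.pi))
        policy, _ = min(self.memory, key=lambda m: dist(m[1], goal))
        return self.perturb(policy)

    def execute(self, policy):
        outcome = forward(policy)
        self.memory.append((policy, outcome))
        return outcome

    def choose_goal(self, eps=0.2):
        """Pick a cell in proportion to its interest (uniformly with prob. eps,
        or while no cell has measurable progress), then a goal inside it."""
        cells = list(self.history)
        weights = [interest(self.history[c]) for c in cells]
        if self.rng.random() < eps or sum(weights) == 0.0:
            cell = self.rng.choice(cells)
        else:
            cell = self.rng.choices(cells, weights=weights)[0]
        size = GOAL_SIZE / N_CELLS
        goal = tuple(GOAL_LOW + (k + self.rng.random()) * size for k in cell)
        return cell, goal

    def autonomous_episode(self):
        """Goal babbling: self-select a goal, try to reach it, and record
        competence (negative distance to the goal) for that cell."""
        cell, goal = self.choose_goal()
        outcome = self.execute(self.inverse(goal))
        self.history[cell].append(-dist(outcome, goal))

    def imitation_episode(self, demo_policy, n_repeats=5):
        """Hypothetical imitation step: replay the demonstration with small variations."""
        for _ in range(n_repeats):
            self.execute(self.perturb(demo_policy))

def run(n_episodes=300, demo_every=50, seed=0):
    learner = Learner(seed=seed)
    teacher = random.Random(seed + 1)    # hypothetical teacher: random demonstrations
    for t in range(n_episodes):
        if t % demo_every == 0:
            demo = (teacher.uniform(-math.pi, math.pi), teacher.uniform(-math.pi, math.pi))
            learner.imitation_episode(demo)
        else:
            learner.autonomous_episode()
    return learner

learner = run()
goals = [(1.0, 0.0), (0.0, 1.5), (-1.0, 0.5)]
errors = [min(dist(o, g) for _, o in learner.memory) for g in goals]
print(len(learner.memory), [round(e, 3) for e in errors])
```

Measuring interest as competence *progress* rather than raw competence means that cells whose goals are already mastered, or lie outside the arm's reach (the corners of the square are farther than 2 from the origin), tend to stop attracting goals once their competence plateaus.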

Keywords

Active learning; Intrinsic motivation; Exploration; Motor skill learning; Inverse model; Programming by demonstration; Learning from demonstration; Imitation

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Flowers Team, INRIA and ENSTA ParisTech, Paris, France