
Socially guided intrinsic motivation for robot learning of motor skills

Published in: Autonomous Robots

Abstract

This paper presents a technical approach to robot learning of motor skills that combines active, intrinsically motivated learning with imitation learning. Our algorithmic architecture, called SGIM-D, allows efficient learning of high-dimensional continuous sensorimotor inverse models in robots; in particular, it learns distributions of parameterised motor policies that solve a corresponding distribution of parameterised goals/tasks. This is made possible by integrating imitation learning techniques into an algorithm for learning inverse models that relies on active goal babbling. After reviewing social learning and intrinsic motivation approaches to action learning, we describe the general framework of our algorithm before detailing its architecture. In an experiment where a robot arm has to learn to use a flexible fishing line, we show that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation: it benefits from the properties of human demonstrations to learn to produce varied outcomes in the environment, while developing more precise control policies in large spaces.
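The core idea of the abstract, a learner that interleaves autonomous, intrinsically motivated goal exploration with imitation of human demonstrations, can be sketched as a small toy loop. This is an illustrative sketch only: the class and method names, the scalar goal space, and the simple competence-progress measure are assumptions for the example, not the paper's exact SGIM-D formulation.

```python
import random


class SGIMLearner:
    """Toy socially guided intrinsically motivated learner.

    Alternates between two strategies:
      - autonomous exploration: pick the goal with the highest recent
        competence progress (intrinsic motivation), then try a policy;
      - social learning: replay a teacher-demonstrated policy and record
        its outcome.
    """

    def __init__(self, goals):
        self.competence = {g: 0.0 for g in goals}  # last competence per goal
        self.progress = {g: 1.0 for g in goals}    # recent competence progress
        self.memory = []                           # (policy, outcome) pairs

    def choose_goal(self):
        # Intrinsic motivation: prefer the goal with highest recent progress.
        return max(self.progress, key=self.progress.get)

    def autonomous_step(self, execute):
        goal = self.choose_goal()
        policy = random.random()  # stand-in for local policy optimisation
        outcome = execute(policy)
        new_comp = 1.0 - abs(outcome - goal)  # closer outcome -> higher competence
        self.progress[goal] = abs(new_comp - self.competence[goal])
        self.competence[goal] = new_comp
        self.memory.append((policy, outcome))
        return goal, new_comp

    def social_step(self, demo_policy, execute):
        # Mimicry: reproduce the demonstrated policy, store what it achieves.
        outcome = execute(demo_policy)
        self.memory.append((demo_policy, outcome))
        return outcome
```

In this reduced setting both policies and outcomes are scalars in [0, 1] and `execute` is the environment; a demonstration gives the learner a (policy, outcome) pair it might not have discovered by babbling alone, which is the mechanism the paper's experiment evaluates at full scale.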





Acknowledgments

The authors would like to thank Paul Fudal, Jerome Bechu and Haylee Fogg for their support for the experimental setup, and Freek Stulp for his very helpful comments. This research was partially funded by ERC Grant EXPLORERS 240007 and ANR MACSi.

Author information

Corresponding author

Correspondence to Sao Mai Nguyen.


About this article

Cite this article

Nguyen, S.M., Oudeyer, PY. Socially guided intrinsic motivation for robot learning of motor skills. Auton Robot 36, 273–294 (2014). https://doi.org/10.1007/s10514-013-9339-y
