Abstract
This paper presents a technical approach to robot learning of motor skills which combines active intrinsically motivated learning with imitation learning. Our algorithmic architecture, called SGIM-D, allows efficient learning of high-dimensional continuous sensorimotor inverse models in robots, and in particular learns distributions of parameterised motor policies that solve a corresponding distribution of parameterised goals/tasks. This is made possible by the technical integration of imitation learning techniques within an algorithm for learning inverse models that relies on active goal babbling. After reviewing social learning and intrinsic motivation approaches to action learning, we describe the general framework of our algorithm, before detailing its architecture. In an experiment where a robot arm has to learn to use a flexible fishing line, we illustrate that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation and benefits from human demonstration properties to learn how to produce varied outcomes in the environment, while developing more precise control policies in large spaces.
Similar content being viewed by others
References
Abbeel, P. & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In Proceedings of the 21st international conference on machine learning (ICML’04) (pp. 1–8).
Akgun, B., Cakmak, M., Yoo, J., & Thomaz, A. (2012). Trajectories and keyframes for kinesthetic teaching: A human–robot interaction perspective. In International conference on human–robot interaction.
Argall, B. D., Browning, B., & Veloso, M. (2008). Learning robot motion control with demonstration and advice-operators. In Proceedings IEEE/RSJ international conference on intelligent robots and systems IEEE (pp. 399–404).
Argall, B. D., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5), 469–483. doi:10.1016/j.robot.2008.10.024.
Argall, B. D., Browning, B., & Veloso, M. (2011). Teacher feedback to scaffold and refine demonstrated motion primitives on a mobile robot. Robotics and Autonomous Systems, 59(3–4), 243–255.
Baldassarre, G. (2011). What are intrinsic motivations? A biological perspective. In 2011 IEEE international conference on development and learning (ICDL) (Vol. 2, pp. 1–8).
Baranes, A., & Oudeyer, P. Y. (2010). Intrinsically motivated goal exploration for active motor learning in robots. Paris: INRIA.
Baranes, A., & Oudeyer, P. Y. (2013). Active learning of inverse models with intrinsically motivated goal exploration in robots. Robotics and Autonomous Systems, 61(1), 49–73.
Barto, A. G., Singh, S., & Chenatez, N. (2004a). Intrinsically motivated learning of hierarchical collections of skills. In Proceedings of 3rd international conference on development and learning, San Diego, CA (pp. 112–119).
Barto, A. G., Singh, S., & Chentanez, N. (2004b). Intrinsically motivated learning of hierarchical collections of skills. In ICDL international conference on developmental learning.
Billard, A., Calinon, S., Dillmann, R., & Schaal, S. (2007). Robot programming by demonstration. In B. Siciliano & O. Khatib (Eds.), Handbook of robotics (Chapt. 59). New York: Springer.
Bishop, C. (2007). Pattern recognition and machine learning. In Information science and statistics. Heidelberg: Springer.
Blumberg, B., Downie, M., Ivanov, Y., Berlin, M., Johnson, M. P., & Tomlinson, B. (2002). Integrated learning for interactive synthetic characters. ACM Transactions on Graphics 21:417–426. doi:10.1145/566654.566597.
Breazeal, C., & Scassellati, B. (2002). Robots that imitate humans. Trends in Cognitive Sciences, 6(11), 481–487.
Cakmak, M., & Thomaz, A. L. (2010). Optimality of human teachers for robot learners. In IEEE international conference on development and learning (ICDL) (Vol. 4).
Cakmak, M., DePalma, N., Thomaz, A. L., & Arriaga, R. (2009). Effects of social exploration mechanisms on robot learning. In The 18th IEEE international symposium on robot and human interactive communication (RO-MAN 2009) (pp. 128–134).
Cakmak, M., Chao, C., & Thomaz, A. L. (2010). Designing interactions for robot active learners. IEEE Transactions on Autonomous Mental Development, 2(2), 108–118.
Calinon, S. (2009). Robot programming by demonstration: A probabilistic approach. Boca Raton: EPFL/CRC Press. EPFL Press ISBN 978-2-940222-31-5, CRC Press ISBN 978-1-4398-0867-2.
Calinon, S., & F G, Billard A,. (2007). On learning, representing and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man and Cybernetics, 37(2), 286–298.
Call, J., & Carpenter, M. (2002). Three sources of information in social learning. In K. Dautenhahn & C. L. Nehaniv (Eds.), Imitation in animals and artifacts (pp. 211–228). Cambridge, MA: MIT Press.
Cederborg, T., Li, M., Baranes, A., & Oudeyer, P. Y. (2010). Incremental local inline gaussian mixture regression for imitation learning of multiple tasks. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), Taipei, Taiwan.
Chernova, S., & Veloso, M. (2009). Interactive policy learning through confidence-based autonomy. Journal of Artificial Intelligence Research, 34. doi:10.1613/jair.2584.
Clouse, J., & Utgoff, P. (1992). A teaching method for reinforcement learning. In Proceedings of the nineth international conference on machine learning.
Cohn, D. A., Ghahramani, Z., & Jordan, M. I. (1996). Active learning with statistical models. Journal of Artificial Intelligence Research, 4, 129–145.
Coleman, T., & Li, Y. (1994). On the convergence of reflective newton methods for large-scale nonlinear minimization subject to bounds. Mathematical Programming, 67(2), 189–224.
Coleman, T., & Li, Y. (1996). An interior, trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, 6, 418–445.
Csibra, G. (2003). Teleological and referential understanding of action in infancy. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences, 358(1431), 447.
Csibra, G., & Gergely, G. (2007). Obsessed with goals: Functions and mechanisms of teleological interpretation of actions in humans. Acta Psychologica, 124(1), 60–78. doi:10.1016/j.actpsy.2006.09.007. Becoming an intentional agent: Early development of action interpretation and action control.
da Silva, B., Konidaris, G., & Barto, A. (2012). Learning parameterized skills. In 29th international conference on machine learning (ICML 2012).
Dautenhahn, K., & Nehaniv, C. L. (2002). Imitation in animals and artifacts. Cambridge: MIT Press.
d’Avella, A., Portone, A., Fernandez, L., & Lacquaniti, F. (2006). Control of fast-reaching movement by muscle synergies combinations. The Journal of Neuroscience, 26(30), 7791–7810.
Deci, E., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum Press.
Fedorov, V. (1972). Theory of optimal experiment. New York, NY: Academic Press, Inc.
Grollman, D. H., & Jenkins, O. C. (2008). Sparse incremental learning for interactive robot control policy estimation. In International conference on robotics and automation (ICRA 2008) (pp. 3315–3320).
Kaplan, F., Oudeyer, P. Y., Kubinyi, E., & Miklosi, A. (2002). Robotic clicker training. Robotics and Autonomous Systems, 38(3–4), 197–206.
Kober, J., & Peters, J. (2011). Policy search for motor primitives in robotics. Machine Learning, 84(1), 171–203.
Kober, J., Wilhelm, A., Oztop, E., & Peters, J. (2012). Reinforcement learning to adjust parametrized motor primitives to new situations. Autonomous Robots, 1–19. doi:10.1007/s10514-012-9290-3.
Koenig, N., Takayama, L., & Matarić, M. (2010). Communication and knowledge sharing in human–robot interaction and learning from demonstration. Neural Networks, 23(8–9), 1104–1112. doi:10.1016/j.neunet.2010.06.005.
Kormushev, P., Calinon, S., & Caldwell, D. G. (2010). Robot motor skill coordination with EM-based reinforcement learning. In Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), Taipei, Taiwan (pp 3232–3237).
Kormushev, P., Calinon, S., & Caldwell, D. G. (2011). Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input. Advanced Robotics, 25(5), 581–603.
Krzanowski, W. J. (1988). Principles of multivariate analysis: A user’s perspective. New York: Oxford University Press.
Lagarias, J. C., Reeds, J. A., Wright, M. H., & Wright, P. E. (1998). Convergence properties of the nelder-mead simplex method in low dimensions. SIAM Journal of Optimization, 9(1), 112–147.
Lopes, M. (2012). Optimal teaching on sequential decision tasks (to appear).
Lopes, M., & Oudeyer, P. Y. (2010). Active learning and intrinsically motivated exploration in robots: Advances and challenges (guest editorial). IEEE Transactions on Autonomous Mental Development, 2(2), 65–69.
Lopes, M., Melo, F., Montesano, L., & Santos-Victor, J. (2009a). Abstraction levels for robotic imitation: Overview and computational approaches. In From motor to interaction learning in robots. Berlin: Springer.
Lopes, M., Melo, F. S., Kenward, B., & Santos-Victor, J. (2009b). A computational model of social-learning mechanisms. Adaptive Behaviour, 17(6), 467–483.
Lopes, M., Melo, F., Montesano, L., & Santos-Victor, J. (2010b). Abstraction levels for robotic imitation: Overview and computational approaches. In O. Sigaud & J. Peters (Eds.), From motor to interaction learning in robots, Studies in computational intelligence (Vol. 264, pp. 313–355). Berlin: Springer.
Lopes, M., Cederbourg, T., & Oudeyer, P. Y. (2011) Simultaneous acquisition of task and feedback models. In IEEE international conference on development and learning.
Mangin, O., & Oudeyer, P. Y. (2012) Learning the combinatorial structure of demonstrated behaviors with inverse feedback control. In A. A. Salah, J., Ruiz-del Solar, Ç. Meriçli, & P. Y. Oudeyer (Eds.), HBU 2012. LNCS (Vol. 7559, pp 135–148). Heidelberg: Springer.
Muja, M., & Lowe, D. (2009). Fast approximate nearest neighbors with automatic algorithm. In International conference on computer vision theory and applications (VISAPP’09).
Nehaniv, C. L., Dautenhahn, K., et al. (2004). Imitation and social learning in robots, humans, and animals: Behavioural, social and communicative dimensions. Cambridge: Cambridge University Press.
Nehaniv, C. L., & Dautenhahn, K. (2007). Imitation and social learning in robots, humans and animals: Behavioural, social and communicative dimensions. Cambridge: Cambridge University Press.
Nguyen, S. M., & Oudeyer, P.-Y. (2012a). Interactive learning gives the tempo to an intrinsically motivated robot learner. In IEEE-RAS international conference on humanoid robots.
Nguyen, S. M., & Oudeyer, P.-Y. (2012b). Whom will an intrinsically motivated robot learner choose to imitate from? In J. Szufnarowska (Ed.), Proceedings of the post-graduate conference on robotics and development of cognition (pp. 32–35). doi:10.2390/biecoll-robotdoc2012-12.
Nguyen, S. M., & Oudeyer, P.-Y. (2012c). Active choice of teachers, learning strategies and goals for a socially guided intrinsic motivation learner. Paladyn Journal of Behavioural Robotics, 3(3), 136–146.
Nicolescu, M., & Mataric, M. (2003). Natural methods for robot task learning: Instructive demonstrations, generalization and practice. In Proceedings of the second international joint conference on autonomous agents and multiagent systems, ACM (pp. 241–248).
Oudeyer, P. Y. (2011). Developmental constraints on the evolution and acquisition of sensorimotor skills. Habilitation a Diriger des Recherches.
Oudeyer, P. Y., & Kaplan, F. (2007). What is intrinsic motivation? a typology of computational approaches. Frontiers in Neurorobotics, 1, 6.
Oudeyer, P. Y., Kaplan, F., & Hafner, V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2), 265–286.
Oudeyer, P. Y., Baranes, A., & Kaplan, F. (2013). Intrinsically motivated learning of real-word sensorimotor skills with developmental constraints. In G. Baldassarre & Miroli (Eds.), Intrinsically motivated learning in natural and artificial system. London: Springer.
Peters, J., & Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), 682–697.
Rolf, M., Steil, J., & Gienger, M. (2010). Gobal babbling permits direct learning of inverse kinematics. IEEE Transactions on Autonomous Mental Development, 2(3), 216–229.
Roy, N., & McCallum, A. (2001). Towards optimal active learning through sampling estimation of error reduction. In Proceedings of the 18th international conference on machine learning, 1, 143–160.
Schaal, S., Ijspeert, A., & Billard, A. (2003). Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society of London Series B, Biological sciences 358(1431), 537–547.
Schmidhuber J (1991) Curious model-building control systems. In: Proceedings of the international joint conference on neural networks (Vol. 2, pp. 1458–1463).
Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3), 230–247.
Slater, A., & Lewis, M. (2006). Introduction to infant development. Oxford: Oxford University Press.
Smart, W., & Kaelbling, L. (2002). Effective reinforcement learning for mobile robots. In Proceedings of the IEEE international conference on robotics and automation (pp 3404–3410).
Stulp, F., & Schaal, S. (2011). Hierarchical reinforcement learning with movement primitives. In Humanoids (pp. 231–238).
Stulp, F., & Sigaud, O. (2012). Policy improvement methods: Between black-box optimization and episodic reinforcement learning.
Theodorou, E., Buchli, J., & Schaal, S. (2010). Reinforcement learning of motor skills in high dimensions: A path integral approach. In IEEE international conference on robotics and automation (ICRA) 2010 (pp. 2397–2403).
Thomaz, A. L. (2006). Socially guided machine learning. PhD thesis, MIT.
Thomaz, A. L., & Breazeal, C. (2008). Experiments in socially guided exploration: Lessons learned in building robots that learn with and without human teachers. Connection Science, Special Issue on Social Learning in Embodied Agents, 20(2, 3), 91–110.
Tomasello, M., & Carpenter, M. (2007). Shared intentionality. Developmental Science, 10(1), 121–125.
Verma, D., & Rao, R. (2006). Goal-based imitation as probabilistic inference over graphical models. In Advances in NIPS (Vol. 18).
Weiss, E., & Flanders, M. (2004). Muscular and postural synergies of the human hand. Journal of Neurophysiology, 92, 523–535.
Whiten, A. (2000). Primate culture and social learning. Cognitive Science, 24(3), 477–508.
Xu, T., Yu, C., & Smith, L. (2011). It’s the child’s body: The role of toddler and parent in selecting toddler’s visual experience. IN Proceedings of IEEE 10th international conference in development and learning.
Acknowledgments
The authors would like to thank Paul Fudal, Jerome Bechu and Haylee Fogg for their support for the experimental setup, and Freek Stulp for his very helpful comments. This research was partially funded by ERC Grant EXPLORERS 240007 and ANR MACSi.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Nguyen, S.M., Oudeyer, PY. Socially guided intrinsic motivation for robot learning of motor skills. Auton Robot 36, 273–294 (2014). https://doi.org/10.1007/s10514-013-9339-y
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10514-013-9339-y