Paladyn

, Volume 3, Issue 3, pp 136–146 | Cite as

Active choice of teachers, learning strategies and goals for a socially guided intrinsic motivation learner

Research Article

Abstract

We present an active learning architecture that allows a robot to actively learn which data collection strategy is most efficient for acquiring motor skills to achieve multiple outcomes, and generalise over its experience to achieve new outcomes. The robot explores its environment both via interactive learning and goal-babbling. It learns at the same time when, who and what to actively imitate from several available teachers, and learns when not to use social guidance but use active goal-oriented self-exploration. This is formalised in the framework of life-long strategic learning.

The proposed architecture, called Socially Guided Intrinsic Motivation with Active Choice of Teacher and Strategy (SGIM-ACTS), relies on hierarchical active decisions of what and how to learn driven by empirical evaluation of learning progress for each learning strategy. We illustrate with an experiment where a simulated robot learns to control its arm for realising two kinds of different outcomes. It has to choose actively and hierarchically at each learning episode: 1) what to learn: which outcome is most interesting to select as a goal to focus on for goal-directed exploration; 2) how to learn: which data collection strategy to use among self-exploration, mimicry and emulation; 3) once he has decided when and what to imitate by choosing mimicry or emulation, then he has to choose who to imitate, from a set of different teachers. We show that SGIM-ACTS learns significantly more effciently than using single learning strategies, and coherently selects the best strategy with respect to the chosen outcome, taking advantage of the available teachers (with different levels of skills).

Keywords

strategic learner imitation learning mimicry emulation artificial curiosity intrinsic motivation interactive learner active learning goal babbling robot skill learning 

References

  1. [1]
    Brenna D. Argall, B. Browning, and Manuela Veloso. Learning robot motion control with demonstration and advice-operators. In In Proceedings IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 399–404. IEEE, September 2008.Google Scholar
  2. [2]
    Brenna D. Argall, B. Browning, and Manuela Veloso. Teacher feedback to scaffold and refine demonstrated motion primitives on a mobile robot. Robotics and Autonomous Systems, 59(3–4):243–255, 2011.CrossRefGoogle Scholar
  3. [3]
    Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469–483, 2009.CrossRefGoogle Scholar
  4. [4]
    C.G. Atkeson, Moore Andrew, and Schaal Stefan. Locally weighted learning. AI Review, 11:11–73, April 1997.Google Scholar
  5. [5]
    Y. Baram, R. El-Yaniv, and K. Luz. Online choice of active learning algorithms. The Journal of Machine Learning Research,, 5:255–291, 2004.MathSciNetGoogle Scholar
  6. [6]
    Adrien Baranes and Pierre-Yves Oudeyer. Active learning of inverse models with intrinsically motivated goal exploration in robots. Robotics and Autonomous Systems, 61(1):49–73, 2013.CrossRefGoogle Scholar
  7. [7]
    Andrew G. Barto, S. Singh, and N Chentanez. Intrinsically motivated learning of hierarchical collections of skills. In ICDL International Conference on Developmental Learning, pages 112–119, 2004.Google Scholar
  8. [8]
    Aude Billard, Sylvain Calinon, Ruediger Dillmann, and Stefan Schaal. Handbook of Robotics, chapter Robot Programming by Demonstration. Number 59. MIT Press, 2007.Google Scholar
  9. [9]
    Cynthia Breazeal and B. Scassellati. Robots that imitate humans. Trends in Cognitive Sciences, 6(11):481–487, 2002.CrossRefGoogle Scholar
  10. [10]
    Maya Cakmak, C. Chao, and Andrea L. Thomaz. Designing interactions for robot active learners. Autonomous Mental Development, IEEE Transactions on, 2(2):108–118, 2010.CrossRefGoogle Scholar
  11. [11]
    Maya Cakmak, Nick DePalma, Andrea L. Thomaz, and Rosa Arriaga. Effects of social exploration mechanisms on robot learning. In The 18th IEEE International Symposium on Robot and Human Interactive Communication, 2009. RO-MAN 2009., pages 128–134. IEEE, 2009.CrossRefGoogle Scholar
  12. [12]
    J. Call and M. Carpenter. Imitation in animals and artifacts, chapter Three sources of information in social learning, pages 211–228. Cambridge, MA: MIT Press., 2002.Google Scholar
  13. [13]
    Sonia Chernova and Manuela Veloso. Interactive policy learning through confidence-based autonomy. Journal of Artificial Intelligence Research, 34, 2009.Google Scholar
  14. [14]
    Gergely Csibra. Teleological and referential understanding of action in infancy. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 358(1431):447, 2003.CrossRefGoogle Scholar
  15. [15]
    B.C. da Silva, G. Konidaris, and Andrew G. Barto. Learning parameterized skills. In 29th International Conference on Machine Learning (ICML 2012), 2012.Google Scholar
  16. [16]
    Kerstin Dautenhahn and Chrystopher L. Nehaniv. Imitation in Animals and Artifacts. MIT Press, 2002.Google Scholar
  17. [17]
    Daniel H Grollman and Odest Chadwicke” Jenkins. Incremental learning of subtasks from unsegmented demonstration. In Intelligent Robots and Systems IROS 2010 IEEERSJ International Conference on, pages 261–266, 2010.CrossRefGoogle Scholar
  18. [18]
    Jens Kober, Andreas Wilhelm, Erhan Oztop, and Jan Peters. Reinforcement learning to adjust parametrized motor primitives to new situations. Autonomous Robots, pages 1–19, 2012. 10.1007/s10514-012-9290-3.Google Scholar
  19. [19]
    N Koenig, L Takayama, and M Matari¢. Communication and knowledge sharing in human-robot interaction and learning from demonstration. Neural Netw, 23(8–9):1104–1112, Oct–Nov 2010.CrossRefGoogle Scholar
  20. [20]
    Petar Kormushev, Sylvain Calinon, and Darwin G. Caldwell. Robot motor skill coordination with EM-based reinforcement learning. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), pages 3232–3237, Taipei, Taiwan, October 2010.Google Scholar
  21. [21]
    Petar Kormushev, Sylvain Calinon, and Darwin G. Caldwell. Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input. Advanced Robotics, 25(5):581–603, 2011.CrossRefGoogle Scholar
  22. [22]
    J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright. Convergence properties of the nelder-mead simplex method in low dimensions. SIAM Journal of Optimization, 9(1):112–147, 1998.MathSciNetMATHCrossRefGoogle Scholar
  23. [23]
    Manuel Lopes, Thomas Cederborg, and Pierre-Yves Oudeyer. Simultaneous acquisition of task and feedback models. Development and Learning (ICDL), 2011 IEEE International Conference on, pages 1–7, 2011.CrossRefGoogle Scholar
  24. [24]
    Manuel Lopes, Tobias Lang, Marc Toussaint, Pierre-Yves Oudeyer, et al. Exploration in model-based reinforcement learning by empirically estimating learning progress. In Neural Information Processing Systems (NIPS), 2012.Google Scholar
  25. [25]
    Manuel Lopes, Francisco Melo, and Luis Montesano. Active learning for reward estimation in inverse reinforcement learning. Machine Learning and Knowledge Discovery in Databases, pages 31–46, 2009.CrossRefGoogle Scholar
  26. [26]
    Manuel Lopes, Francisco Melo, Luis Montesano, and Jose Santos-Victor. From Motor to Interaction Learning in Robots, chapter Abstraction Levels for Robotic Imitation: Overview and Computational Approaches. Springer, 2009.Google Scholar
  27. [27]
    Manuel Lopes and Pierre-Yves Oudeyer. The Strategic Student Approach for Life-Long Exploration and Learning. In IEEE Conference on Development and Learning / EpiRob, San Diego, États-Unis, November 2012.Google Scholar
  28. [28]
    Chrystopher L Nehaniv and Kerstin Dautenhahn. Imitation and Social Learning in Robots, Humans and Animals: Behavioural, Social and Communicative Dimensions. Cambridge Univ. Press, Cambridge, March 2007.CrossRefGoogle Scholar
  29. [29]
    Sao Mai Nguyen, Adrien Baranes, and Pierre-Yves Oudeyer. Bootstrapping intrinsically motivated learning with human demonstrations. In IEEE International Conference on Development and Learning, Frankfurt, Germany, 2011.Google Scholar
  30. [30]
    Sao Mai Nguyen and Pierre-Yves Oudeyer. Properties for efficient demonstrations to a socially guided intrinsically motivated learner. In 21st IEEE International Symposium on Robot and Human Interactive Communication, 2012.Google Scholar
  31. [31]
    M.N. Nicolescu and M.J. Mataric. Natural methods for robot task learning: Instructive demonstrations, generalization and practice. In Proceedings of the second international joint conference on Autonomous agents and multiagent systems, pages 241–248. ACM, 2003.CrossRefGoogle Scholar
  32. [32]
    Pierre-Yves Oudeyer and Frederic Kaplan. What is intrinsic motivation? a typology of computational approaches. Frontiers in Neurorobotics, 2007.Google Scholar
  33. [33]
    Pierre-Yves Oudeyer, Frederic Kaplan, and Verena Hafner. Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2):265–286, 2007.CrossRefGoogle Scholar
  34. [34]
    Jan Peters and Stefan Schaal. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682–697, 2008.CrossRefGoogle Scholar
  35. [35]
    G. Qi, X. Hua, Y. Rui, J. Tang, and H. Zhang. Two-dimensional active learning for image classification. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.Google Scholar
  36. [36]
    Roi Reichart, Katrin Tomanek, Udo Hahn, and Ari Rappoport. Multi-task active learning for linguistic annotations. In Annual Meeting of the Association for Computational Linguistics (ACL). Citeseer, 2008.Google Scholar
  37. [37]
    Matthias Rolf and Jochen J Steil. Goal babbling: a new concept for early sensorimotor exploration. pages 40–43, Osaka, 11/2012 2012. IEEE.Google Scholar
  38. [38]
    Richard M. Ryan and Edward L. Deci. Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary Educational Psychology, 25(1):54–67, 2000.CrossRefGoogle Scholar
  39. [39]
    Stefan Schaal, A Ijspeert, and Aude Billard. Computational approaches to motor learning by imitation. Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 358(1431), 03 2003.Google Scholar
  40. [40]
    Andrea L. Thomaz. Socially Guided Machine Learning. PhD thesis, MIT, 5 2006.Google Scholar
  41. [41]
    Andrea L. Thomaz and Cynthia Breazeal. Experiments in socially guided exploration: Lessons learned in building robots that learn with and without human teachers. Connection Science, 20Special Issue on Social Learning in Embodied Agents (2–3):91–110, 2008.CrossRefGoogle Scholar
  42. [42]
    M. Tomasello and M. Carpenter. Shared intentionality. Developmental Science, 10(1):121–125, 2007.CrossRefGoogle Scholar
  43. [43]
    Andrew Whiten. Primate culture and social learning. Cognitive Science, 24(3):477–508, 2000.CrossRefGoogle Scholar

Copyright information

© Versita Warsaw and Springer-Verlag Wien 2012

Authors and Affiliations

  1. 1.Flowers TeamINRIA and ENSTA ParisTech, FranceTalence CedexFrance

Personalised recommendations