International Journal of Social Robotics, Volume 2, Issue 2, pp 195–215

Confidence-Based Multi-Robot Learning from Demonstration

  • Sonia Chernova
  • Manuela Veloso


Learning from demonstration algorithms enable a robot to learn a new policy based on demonstrations provided by a teacher. In this article, we explore a novel research direction, multi-robot learning from demonstration, which extends demonstration-based learning methods to collaborative multi-robot domains. Specifically, we study the problem of enabling a single person to teach individual policies to multiple robots at the same time. We present flexMLfD, a task- and platform-independent multi-robot demonstration learning framework that supports both independent and collaborative multi-robot behaviors. Building upon this framework, we contribute three approaches to teaching collaborative multi-robot behaviors based on different information sharing strategies, and evaluate these approaches by teaching two Sony QRIO humanoid robots to perform three collaborative ball sorting tasks. We then present a scalability analysis of flexMLfD using up to seven Sony AIBO robots. We conclude the article by proposing a formalization for a broader multi-robot learning from demonstration research area.


Learning from demonstration, Multi-robot learning, Human–robot interaction, Multi-robot systems





Copyright information

© Springer Science & Business Media BV 2010

Authors and Affiliations

  1. Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
