International Journal of Social Robotics, Volume 4, Issue 4, pp 409–421

How Humans Teach Agents

A New Experimental Perspective
  • W. Bradley Knox
  • Brian D. Glass
  • Bradley C. Love
  • W. Todd Maddox
  • Peter Stone


Human beings are a largely untapped source of in-the-loop knowledge and guidance for computational learning agents, including robots. To effectively design agents that leverage available human expertise, we need to understand how people naturally teach. In this paper, we describe two experiments that ask how differing conditions affect a human teacher’s feedback frequency and the computational agent’s learned performance. The first experiment considers the impact of a self-perceived teaching role in contrast to believing one is merely critiquing a recording. The second considers whether a human trainer will give more frequent feedback if the agent acts less greedily (i.e., choosing actions believed to be worse) when the trainer’s recent feedback frequency decreases. From the results of these experiments, we draw three main conclusions that inform the design of agents. More broadly, these two studies stand as early examples of a nascent technique of using agents as highly specifiable social entities in experiments on human behavior.


Human-agent interaction, Human teachers, Shaping, Single agent learning, Reinforcement learning, Misbehavior, Attention



This research was supported in part by NIH (R01 MH077708 to WTM), NSF (IIS-0917122), AFOSR (FA9550-10-1-0268), ONR (N00014-09-1-0658), and the FHWA (DTFH61-07-H-00030). We thank the research assistants of MaddoxLab for their crucial help gathering data.



Copyright information

© Springer Science & Business Media BV 2012

Authors and Affiliations

  • W. Bradley Knox (1)
  • Brian D. Glass (2)
  • Bradley C. Love (3)
  • W. Todd Maddox (2)
  • Peter Stone (1)
  1. Department of Computer Science, University of Texas at Austin, Austin, USA
  2. Department of Psychology, University of Texas at Austin, Austin, USA
  3. Department of Cognitive, Perceptual and Brain Sciences, University College London, London, UK
