
How Humans Teach Agents

A New Experimental Perspective

International Journal of Social Robotics

Abstract

Human beings are a largely untapped source of in-the-loop knowledge and guidance for computational learning agents, including robots. To effectively design agents that leverage available human expertise, we need to understand how people naturally teach. In this paper, we describe two experiments that ask how differing conditions affect a human teacher’s feedback frequency and the computational agent’s learned performance. The first experiment considers the impact of a self-perceived teaching role in contrast to believing one is merely critiquing a recording. The second considers whether a human trainer will give more frequent feedback if the agent acts less greedily (i.e., choosing actions believed to be worse) when the trainer’s recent feedback frequency decreases. From the results of these experiments, we draw three main conclusions that inform the design of agents. More broadly, these two studies stand as early examples of a nascent technique of using agents as highly specifiable social entities in experiments on human behavior.
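
To make the second experiment's mechanism concrete, the following is a minimal Python sketch of action selection whose greediness tracks the trainer's recent feedback rate. It is an illustration under stated assumptions, not the paper's implementation: `predicted_value` and `recent_feedback_rate` are hypothetical stand-ins for the agent's learned feedback model and a running measure of how often the trainer has recently given feedback.

```python
import random

def select_action(state, actions, predicted_value, recent_feedback_rate,
                  max_epsilon=0.5):
    # The chance of a non-greedy action grows as the trainer's recent
    # feedback rate (a fraction in [0, 1]) falls toward zero.
    epsilon = max_epsilon * (1.0 - recent_feedback_rate)
    if random.random() < epsilon:
        # Deliberately take a possibly-worse action, which may prompt
        # the now-quiet trainer to resume giving feedback.
        return random.choice(actions)
    # Otherwise exploit: take the action the learned model rates best.
    return max(actions, key=lambda a: predicted_value(state, a))
```

Under a scheme like this, a steadily responsive trainer sees an almost fully greedy agent, while a lapse in feedback makes the agent's behavior visibly degrade; the experiment asks whether that degradation increases feedback frequency.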

Notes

  1. Following common practice in reinforcement learning, we use “reward” to mean both positively and negatively valued feedback.

  2. The trainer’s assessment of return is, of course, dependent on her understanding of the task and her expectations of the agent’s future behavior, both of which may be flawed and will likely become more accurate over time. (The standard definition of return is restated after these notes.)

  3. Tetris is one of five task domains for which TAMER results have been published [18–20, 33].

  4. The specification of Tetris in RL-Library follows that of traditional Tetris, except that there are no points or levels of increasing speed; these omissions are standard in the Tetris learning literature [4]. We use RL-Library for its convenience and its compatibility with RL-Glue, a software specification for reinforcement-learning agents and environments [35]. (A skeletal RL-Glue-style agent is sketched after these notes.)

  5. Instructions given to subjects can be found at http://www.cs.utexas.edu/~bradknox/papers/12ijsr.

  6. The Greedy group can be considered similar to the Teaching group from the critique experiment. Although the two groups’ instructions differ, both groups have identical TAMER agent algorithms, and subjects in both know that they are teaching.

  7. Performance is again tested offline, not during training, and the testing policy is greedy regardless of condition (this selection rule is sketched after these notes).

  8. The bimodality of performance is stark: of the 79 subjects across conditions, 23 agents clear 0–1 lines in the 9th testing interval, 47 clear more than 100, and only 2 clear 5–20 lines.

  9. Though exploration is often considered equivalent to non-greedy action, this definition does not fit every use of the term in RL. For instance, an agent following an exploratory policy may sometimes select the same action that its greedy policy would choose. This is a semantic point, however, and it does not affect our assertion that explore/exploit is not a comprehensive dichotomy.

  10. A human opposite the subject could have fully scripted behavior, act naturally except in certain situations (e.g., misbehaving at predetermined times), or simply act naturally. Additionally, the subject may believe either that this person is a fellow subject or that she is working for the experimenters. For simplicity, and to differentiate this person from the subject, we call the human who would potentially be replaced by an agent a “human actor.”
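
Note 2 refers to the return that the trainer implicitly estimates. For reference, the standard definition from the RL literature [34], with discount factor $\gamma \in [0, 1]$ and reward $r_t$ at time $t$, is:

```latex
R_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}
```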
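Note 4 mentions RL-Glue [35], in which an agent implements a small lifecycle interface. The following Python skeleton is a sketch of that interface: the `agent_*` method names follow the RL-Glue specification, while `choose` and `learn` are hypothetical placeholders (a real agent would also subclass the Agent base class of the RL-Glue codec it uses).

```python
class SketchAgent:
    """Minimal sketch of an agent under the RL-Glue interface [35]."""

    def agent_init(self, task_spec):
        # Called once per experiment; task_spec describes the
        # environment's observations, actions, and discounting.
        self.last_action = None

    def agent_start(self, observation):
        # First step of an episode: act, with no reward yet.
        self.last_action = self.choose(observation)
        return self.last_action

    def agent_step(self, reward, observation):
        # Every later step: learn from the reward, then act again.
        self.learn(reward, observation)
        self.last_action = self.choose(observation)
        return self.last_action

    def agent_end(self, reward):
        # The episode terminated (e.g., the Tetris board topped out).
        self.learn(reward, observation=None)

    def agent_cleanup(self):
        pass  # release any resources held for the experiment

    # --- hypothetical placeholders, not part of the RL-Glue spec ---
    def choose(self, observation):
        raise NotImplementedError

    def learn(self, reward, observation):
        raise NotImplementedError
```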
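Note 7’s offline testing policy is pure exploitation. In TAMER [18], the agent learns to predict the trainer’s feedback for state–action pairs, and greedy selection simply maximizes that prediction; `human_reward_model` below is a hypothetical stand-in for the learned predictor.

```python
def greedy_test_action(state, actions, human_reward_model):
    # Exploit only: choose the action with the highest predicted
    # human reward, with no exploratory deviation.
    return max(actions, key=lambda a: human_reward_model(state, a))
```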

References

  1. Abbeel P, Ng A (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the twenty-first international conference on machine learning. ACM, New York, p 1

  2. Argall B, Browning B, Veloso M (2007) Learning by demonstration with critique from a human teacher. In: Proceedings of the ACM/IEEE international conference on human-robot interaction. ACM, New York, pp 57–64

  3. Argall B, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Robot Auton Syst 57(5):469–483

  4. Bertsekas D, Tsitsiklis J (1996) Neuro-dynamic programming. Athena Scientific, Nashua

  5. Bouton M (2007) Learning and behavior: a contemporary synthesis. Sinauer Associates, Sunderland

  6. Breazeal C (2004) Designing sociable robots. MIT Press, Cambridge

  7. Chernova S, Veloso M (2009) Interactive policy learning through confidence-based autonomy. J Artif Intell Res 34(1):1–25

  8. Chernova S, Veloso M (2009) Teaching collaborative multi-robot tasks through demonstration. In: 8th IEEE-RAS international conference on humanoid robots (Humanoids 2008). IEEE Press, New York, pp 385–390

  9. Dautenhahn K (2007) Methodology and themes of human-robot interaction: a growing research field. Int J Adv Robot Syst 4(1):103–108

  10. Dobbs J, Arnold D, Doctoroff G (2004) Attention in the preschool classroom: the relationships among child gender, child misbehavior, and types of teacher attention. Early Child Dev Care 174(3):281–295

  11. Evers V, Maldonado H, Brodecki T, Hinds P (2008) Relational vs. group self-construal: untangling the role of national culture in HRI. In: Proceedings of the 3rd ACM/IEEE international conference on human robot interaction. ACM, New York, pp 255–262

  12. Fagot B (1973) Influence of teacher behavior in the preschool. Dev Psychol 9(2):198

  13. Grollman D, Jenkins O (2007) Dogged learning for robots. In: IEEE international conference on robotics and automation. IEEE Press, New York, pp 2483–2488

  14. Hinds P, Roberts T, Jones H (2004) Whose job is it anyway? A study of human-robot interaction in a collaborative task. Hum-Comput Interact 19(1):151–181

  15. Isbell C, Kearns M, Singh S, Shelton C, Stone P, Kormann D (2006) Cobot in LambdaMOO: an adaptive social statistics agent. In: AAMAS

  16. Kaochar T, Peralta R, Morrison C, Fasel I, Walsh T, Cohen P (2011) Towards understanding how humans teach robots. In: User modeling, adaption and personalization, pp 347–352

  17. Kim E, Leyzberg D, Tsui K, Scassellati B (2009) How people talk when teaching a robot. In: Proceedings of the 4th ACM/IEEE international conference on human robot interaction. ACM, New York, pp 23–30

  18. Knox WB, Stone P (2009) Interactively shaping agents via human reinforcement: the TAMER framework. In: The 5th international conference on knowledge capture

  19. Knox WB, Breazeal C, Stone P (2012) Learning from feedback on actions past and intended. In: Proceedings of the 7th ACM/IEEE international conference on human-robot interaction, late-breaking reports session (HRI 2012)

  20. Knox WB, Stone P (2012) Reinforcement learning with human and MDP reward. In: Proceedings of the 11th international conference on autonomous agents and multiagent systems (AAMAS)

  21. Kuhlmann G, Stone P, Mooney R, Shavlik J (2004) Guiding a reinforcement learner with natural language advice: initial results in RoboCup soccer. In: The AAAI-2004 workshop on supervisory control of learning and adaptive systems

  22. MacDorman K, Ishiguro H (2006) The uncanny advantage of using androids in cognitive and social science research. Interact Stud 7(3):297–337

  23. MacDorman K, Minato T, Shimada M, Itakura S, Cowley S, Ishiguro H (2005) Assessing human likeness by eye contact in an android testbed. In: Proceedings of the XXVII annual meeting of the Cognitive Science Society, pp 21–23

  24. Maclin R, Shavlik J (1996) Creating advice-taking reinforcement learners. Mach Learn 22(1):251–281

  25. Nicolescu M, Mataric M (2002) Learning and interacting in human-robot domains. IEEE Trans Syst Man Cybern, Part A, Syst Hum 31(5):419–430

  26. Nicolescu M, Mataric M (2003) Natural methods for robot task learning: instructive demonstrations, generalization and practice. In: AAMAS. ACM, New York, pp 241–248

  27. Pomerleau D (1989) ALVINN: an autonomous land vehicle in a neural network. In: Advances in neural information processing systems, vol 1. Morgan Kaufmann, San Mateo

  28. Pryor K (2002) Don’t shoot the dog! The new art of teaching and training. Interpet Publishing, Dorking

  29. Ramirez K (1999) Animal training: successful animal management through positive reinforcement. Shedd Aquarium, Chicago

  30. Reed K, Patton J, Peshkin M (2007) Replicating human-human physical interaction. In: IEEE international conference on robotics and automation

  31. Rouder J, Speckman P, Sun D, Morey R, Iverson G (2009) Bayesian t tests for accepting and rejecting the null hypothesis. Psychon Bull Rev 16(2):225–237

  32. Saunders J, Nehaniv C, Dautenhahn K (2006) Teaching robots by moulding behavior and scaffolding the environment. In: Proceedings of the 1st ACM SIGCHI/SIGART conference on human-robot interaction. ACM, New York, pp 118–125

  33. Sridharan M (2011) Augmented reinforcement learning for interaction with non-expert humans in agent domains. In: Proceedings of the IEEE international conference on machine learning applications

  34. Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT Press, Cambridge

  35. Tanner B, White A (2009) RL-Glue: language-independent software for reinforcement-learning experiments. J Mach Learn Res 10:2133–2136

  36. Thomaz A (2006) Socially guided machine learning. PhD thesis, Massachusetts Institute of Technology

  37. Thomaz A, Breazeal C (2006) Reinforcement learning with human teachers: evidence of feedback and guidance with implications for learning performance. In: AAAI

  38. Thomaz A, Cakmak M (2009) Learning about objects with human teachers. In: Proceedings of the 4th ACM/IEEE international conference on human robot interaction. ACM, New York, pp 15–22

  39. Wolfgang C (2004) Solving discipline and classroom management problems: methods and models for today’s teachers. Wiley, New York

  40. Woodward M, Wood R (2009) Using Bayesian inference to learn high-level tasks from a human teacher. In: International conference on artificial intelligence and pattern recognition (AIPR-09)


Acknowledgements

This research was supported in part by NIH (R01 MH077708 to WTM), NSF (IIS-0917122), AFOSR (FA9550-10-1-0268), ONR (N00014-09-1-0658), and the FHWA (DTFH61-07-H-00030). We thank the research assistants of MaddoxLab for their crucial help gathering data.

Author information

Correspondence to W. Bradley Knox.


Cite this article

Knox, W.B., Glass, B.D., Love, B.C. et al. How Humans Teach Agents. Int J of Soc Robotics 4, 409–421 (2012). https://doi.org/10.1007/s12369-012-0163-x

