Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning


For real-world applications, virtual agents must be able to learn new behaviors from non-technical users. Positive and negative feedback provide an intuitive way to train new behaviors, and existing work has presented algorithms for learning from such feedback. That work, however, treats feedback as a numeric reward to be maximized and assumes that all trainers provide feedback in the same way. In this work, we show that users can provide feedback in many different ways, which we describe as “training strategies.” Specifically, users may not always give explicit feedback in response to an action, and may be more likely to provide explicit reward than explicit punishment, or vice versa, so that the lack of feedback itself conveys information about the behavior. We present a probabilistic model of trainer feedback that describes how a trainer chooses to provide explicit reward and/or explicit punishment and, based on this model, develop two novel learning algorithms (SABL and I-SABL) that take trainer strategy into account and can therefore learn from cases where no feedback is provided. Through online user studies, we demonstrate that these algorithms can learn with less feedback than algorithms based on a numerical interpretation of feedback. Furthermore, we conduct an empirical analysis of the training strategies employed by users and of the factors that can affect their choice of strategy.
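To make concrete the idea that a lack of feedback can carry information, the following is a minimal Python sketch of a Bayesian update under one possible trainer model. It is an illustration under stated assumptions, not the SABL or I-SABL algorithm as specified in the paper: the abstract does not give the model's equations, so the likelihood form, the error rate `eps`, the function names, and the example actions are all hypothetical. The parameters `mu_plus` and `mu_minus` are assumed here to be the probabilities that the trainer withholds an explicit reward or an explicit punishment, respectively.

```python
from typing import Dict


def feedback_likelihood(feedback: str, correct: bool, eps: float,
                        mu_plus: float, mu_minus: float) -> float:
    """P(feedback | action correctness) under an assumed trainer model.

    feedback is one of "reward", "punish", or "none".
    eps      : assumed probability that the trainer misjudges the action.
    mu_plus  : assumed probability of withholding an explicit reward.
    mu_minus : assumed probability of withholding an explicit punishment.
    """
    # Probability that the trainer judges the action to be correct.
    p_judged_correct = (1.0 - eps) if correct else eps
    p_judged_incorrect = 1.0 - p_judged_correct
    if feedback == "reward":
        return p_judged_correct * (1.0 - mu_plus)
    if feedback == "punish":
        return p_judged_incorrect * (1.0 - mu_minus)
    if feedback == "none":
        # Silence arises from a withheld reward or a withheld punishment.
        return p_judged_correct * mu_plus + p_judged_incorrect * mu_minus
    raise ValueError(f"unknown feedback: {feedback!r}")


def update_posterior(prior: Dict[str, float], action: str, feedback: str,
                     eps: float, mu_plus: float,
                     mu_minus: float) -> Dict[str, float]:
    """Bayes update of the belief over which action the trainer wants."""
    unnormalized = {
        target: p * feedback_likelihood(feedback, action == target,
                                        eps, mu_plus, mu_minus)
        for target, p in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {target: v / total for target, v in unnormalized.items()}


# Hypothetical trainer who usually rewards correct actions but rarely
# punishes incorrect ones, so silence suggests the action was wrong.
prior = {"sit": 0.5, "roll": 0.5}
posterior = update_posterior(prior, action="sit", feedback="none",
                             eps=0.1, mu_plus=0.1, mu_minus=0.9)
print(posterior)
```

With these assumed parameters, the agent performs "sit", receives no feedback, and shifts its belief toward "roll" being the intended behavior. Swapping the values of `mu_plus` and `mu_minus` reverses the inference, which is why a learner that accounts for trainer strategy could interpret silence differently for different trainers.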




  1. Note that for the \(\mu\) parameters, \(+\) and \(-\) distinguish reward and punishment, and not explicit/implicit feedback as in the R\(+\)/P\(+\) notation taken from the behaviorism literature.

  2. Though users were instructed to enable their computer speakers, we have no way of knowing whether the participant could actually hear the dog cry.

  3. We exclude more data in the Mechanical Turk studies to remove participants who do the minimum amount of work to receive their compensation.




This work was supported in part by Grants IIS-1149917 and IIS-1319412 from the National Science Foundation.


Corresponding author

Correspondence to Robert Loftin.


Cite this article

Loftin, R., Peng, B., MacGlashan, J. et al. Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. Auton Agent Multi-Agent Syst 30, 30–59 (2016). https://doi.org/10.1007/s10458-015-9283-7



  • Learning from feedback
  • Reinforcement learning
  • Bayesian inference
  • Interactive learning
  • Machine learning
  • Human–computer interaction