
Autonomous Agents and Multi-Agent Systems

Volume 30, Issue 1, pp 30–59

Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning

  • Robert Loftin
  • Bei Peng
  • James MacGlashan
  • Michael L. Littman
  • Matthew E. Taylor
  • Jeff Huang
  • David L. Roberts

Abstract

For real-world applications, virtual agents must be able to learn new behaviors from non-technical users. Positive and negative feedback are an intuitive way to train new behaviors, and existing work has presented algorithms for learning from such feedback. That work, however, treats feedback as a numeric reward to be maximized, and assumes that all trainers provide feedback in the same way. In this work, we show that users can provide feedback in many different ways, which we describe as “training strategies.” Specifically, users may not always give explicit feedback in response to an action, and may be more likely to provide explicit reward than explicit punishment, or vice versa, such that the lack of feedback itself conveys information about the behavior. We present a probabilistic model of trainer feedback that describes how a trainer chooses to provide explicit reward and/or explicit punishment and, based on this model, develop two novel learning algorithms (SABL and I-SABL) that take the trainer's strategy into account and can therefore learn in cases where no feedback is provided. Through online user studies, we demonstrate that these algorithms can learn with less feedback than algorithms based on a numerical interpretation of feedback. Furthermore, we conduct an empirical analysis of the training strategies employed by users, and of the factors that can affect their choice of strategy.
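
To make the idea concrete, the following short Python sketch shows one way a strategy-aware Bayesian learner could treat silence as evidence. It is a minimal illustration assuming a simple feedback model; the parameter names (mu_plus, mu_minus, eps), the candidate-policy representation, and all numeric values are illustrative assumptions, not the exact formulation of the algorithms in the paper.

    # Illustrative sketch (not the paper's exact formulation) of a
    # strategy-aware Bayesian update over a finite set of candidate policies.
    # Assumed feedback model: after each action the trainer gives explicit
    # reward (+1), explicit punishment (-1), or nothing (0); eps is the
    # trainer's error rate; mu_plus is the probability of withholding reward
    # for a correct action; mu_minus is the probability of withholding
    # punishment for an incorrect action.

    def feedback_likelihood(feedback, action_correct, mu_plus, mu_minus, eps):
        """Return P(feedback | action correctness, trainer strategy)."""
        if action_correct:
            p_reward = (1 - eps) * (1 - mu_plus)   # reward given, not withheld
            p_punish = eps * (1 - mu_minus)        # erroneous punishment
        else:
            p_reward = eps * (1 - mu_plus)         # erroneous reward
            p_punish = (1 - eps) * (1 - mu_minus)  # punishment given, not withheld
        p_none = 1.0 - p_reward - p_punish         # silence absorbs the rest
        return {1: p_reward, -1: p_punish, 0: p_none}[feedback]

    def update_posterior(policies, probs, state, action, feedback,
                         mu_plus=0.1, mu_minus=0.8, eps=0.05):
        """One Bayesian update: policies is a list of dicts mapping
        state -> intended action; probs the matching prior probabilities."""
        new_probs = [p * feedback_likelihood(feedback, policy[state] == action,
                                             mu_plus, mu_minus, eps)
                     for policy, p in zip(policies, probs)]
        total = sum(new_probs)
        return [p / total for p in new_probs]

    # Example: two candidate behaviors for a single state "s". With
    # mu_plus = 0.1 this trainer almost always rewards correct actions, so
    # silence after the agent tries "a" is itself evidence that "a" was
    # NOT the intended action.
    policies = [{"s": "a"}, {"s": "b"}]
    probs = [0.5, 0.5]
    probs = update_posterior(policies, probs, "s", "a", 0)
    print(probs)  # belief shifts toward {"s": "b"}, roughly [0.15, 0.85]

Under these assumed parameters a trainer who rewards generously but rarely punishes makes silence informative: the absence of an expected reward lowers the posterior probability of any policy under which the action was correct.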

Keywords

Learning from feedback · Reinforcement learning · Bayesian inference · Interactive learning · Machine learning · Human–computer interaction


Acknowledgments

This work was supported in part by Grants IIS-1149917 and IIS-1319412 from the National Science Foundation.


Copyright information

© The Author(s) 2015

Authors and Affiliations

  • Robert Loftin (1)
  • Bei Peng (2)
  • James MacGlashan (3)
  • Michael L. Littman (3)
  • Matthew E. Taylor (2)
  • Jeff Huang (3)
  • David L. Roberts (1)
  1. Department of Computer Science, North Carolina State University, Raleigh, USA
  2. School of Electrical Engineering and Computer Science, Washington State University, Pullman, USA
  3. Department of Computer Science, Brown University, Providence, USA
