
Machine Learning, Volume 107, Issue 8–10, pp. 1517–1535

Inverse reinforcement learning from summary data

  • Antti Kangasrääsiö
  • Samuel Kaski
Article
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2018 Journal Track

Abstract

Inverse reinforcement learning (IRL) aims to explain observed strategic behavior by fitting reinforcement learning models to behavioral data. However, traditional IRL methods are only applicable when the observations are in the form of state-action paths. This assumption may not hold in many real-world modeling settings, where only partial or summarized observations are available. In general, we may assume that there is a summarizing function \(\sigma \), which acts as a filter between us and the true state-action paths that constitute the demonstration. Some initial approaches to extending IRL to such situations have been presented, but with very specific assumptions about the structure of \(\sigma \), such as that only certain state observations are missing. This paper instead focuses on the most general case of the problem, where no assumptions are made about the summarizing function, except that it can be evaluated. We demonstrate that inference is still possible. The paper presents exact and approximate inference algorithms that allow full posterior inference, which is particularly important for assessing parameter uncertainty in this challenging inference situation. We demonstrate empirical scalability to reasonably sized problems, and practical applicability by estimating the posterior of a cognitive science RL model based only on an observed user's task completion time.
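The setting described above, where only a summary \(\sigma(\xi)\) of each demonstrated path \(\xi\) is observed and the likelihood is therefore intractable, is the kind of setting in which approximate Bayesian computation (ABC) applies. The following is a minimal, hypothetical Python sketch of rejection ABC for a toy problem; the chain MDP, the softmax policy parameter theta, and the helper names simulate_episode, sigma, and abc_posterior are illustrative assumptions, not the algorithms presented in the paper.

```python
# Illustrative sketch only: rejection ABC for a behavior parameter theta when
# only a summary (episode completion time) of each state-action path is observed.
import numpy as np

rng = np.random.default_rng(0)
GOAL, MAX_STEPS = 9, 50  # toy chain MDP: start at state 0, finish at state GOAL

def simulate_episode(theta):
    """Roll out one state-action path under a softmax policy with weight theta."""
    s, path = 0, []
    for _ in range(MAX_STEPS):
        # Two actions: 0 = move towards the goal, 1 = stay; larger theta favors progress.
        logits = np.array([theta, 0.0])
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = rng.choice(2, p=p)
        path.append((s, a))
        s = min(s + 1, GOAL) if a == 0 else s
        if s == GOAL:
            break
    return path

def sigma(path):
    """Summarizing function: only the completion time is observed, not the path."""
    return len(path)

def abc_posterior(observed_summary, n_proposals=5000, eps=2.0):
    """Rejection ABC: keep proposals whose simulated summary is close to the data."""
    accepted = []
    for _ in range(n_proposals):
        theta = rng.uniform(-2.0, 4.0)              # prior over the behavior parameter
        sim_summary = sigma(simulate_episode(theta))
        if abs(sim_summary - observed_summary) <= eps:  # discrepancy on summaries only
            accepted.append(theta)
    return np.array(accepted)

if __name__ == "__main__":
    true_theta = 1.5
    observed = sigma(simulate_episode(true_theta))  # only the summary is ever seen
    post = abc_posterior(observed)
    print(f"observed completion time: {observed}")
    print(f"approximate posterior mean of theta: {post.mean():.2f} (n={len(post)})")
```

Rejection ABC of this kind scales poorly when each simulation requires solving an RL problem; the sketch is included only to make the role of the summarizing function concrete, not to reproduce the exact and approximate inference algorithms developed in the paper.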

Keywords

Inverse reinforcement learning · Bayesian inference · Monte-Carlo estimation · Approximate Bayesian computation

Acknowledgements

This work has been supported by the Academy of Finland (Finnish Centre of Excellence in Computational Inference Research COIN, and Grants 294238, 292334). Computational resources were provided by the Aalto Science IT project.


Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Department of Computer Science, Aalto University, Espoo, Finland
