# Inverse reinforcement learning from summary data

## Abstract

Inverse reinforcement learning (IRL) aims to explain observed strategic behavior by fitting reinforcement learning models to behavioral data. However, traditional IRL methods are only applicable when the observations are in the form of state-action paths. This assumption may not hold in many real-world modeling settings, where only partial or summarized observations are available. In general, we may assume that there is a summarizing function \(\sigma \), which acts as a filter between us and the true state-action paths that constitute the demonstration. Some initial approaches to extending IRL to such situations have been presented, but with very specific assumptions about the structure of \(\sigma \), such as that only certain state observations are missing. This paper instead focuses on the most general case of the problem, where no assumptions are made about the summarizing function, except that it can be evaluated. We demonstrate that inference is still possible. The paper presents exact and approximate inference algorithms that allow full posterior inference, which is particularly important for assessing parameter uncertainty in this challenging inference situation. We demonstrate empirical scalability to reasonably sized problems, and practical applicability by estimating the posterior for a cognitive science RL model based only on an observed user's task completion time.
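To make the setting concrete, the following is a minimal sketch of likelihood-free posterior inference when only a summary \(\sigma \) of the demonstration is observed. It uses rejection-style approximate Bayesian computation on a toy chain MDP, where the agent's parameter \(\theta \) controls how reliably it moves toward the goal and the summary is the task completion time. All names here (`simulate_path`, `abc_posterior`, the chain environment, the uniform prior) are illustrative assumptions, not the paper's actual model or algorithm.

```python
import math
import random


def simulate_path(theta, chain_len=10, max_steps=200, rng=random):
    """Simulate one episode on a chain of states; return only its summary.

    The probability of stepping toward the goal grows with theta
    (a stand-in for how strongly the reward parameter shapes the policy).
    The true state-action path is discarded: sigma exposes only the
    completion time, mimicking summary-data observation.
    """
    p = 1.0 / (1.0 + math.exp(-theta))
    pos, steps = 0, 0
    while pos < chain_len and steps < max_steps:
        pos = max(pos + (1 if rng.random() < p else -1), 0)
        steps += 1
    return steps  # sigma(path): observed task completion time


def abc_posterior(observed, n_samples=5000, eps=5, seed=0):
    """Rejection ABC: keep theta draws whose simulated summary is close
    to the observed summary."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        theta = rng.uniform(-3.0, 3.0)        # draw from the prior
        s = simulate_path(theta, rng=rng)     # simulate a summary
        if abs(s - observed) <= eps:          # ABC acceptance test
            accepted.append(theta)
    return accepted


# Generate an "observed" summary from a hidden true parameter,
# then approximate the posterior over theta from that summary alone.
obs = simulate_path(1.5, rng=random.Random(42))
post = abc_posterior(obs)
```

The accepted samples `post` form a rough approximation of the posterior over \(\theta \); in practice one would replace the naive rejection loop with a more sample-efficient method such as the Bayesian-optimization-based ABC discussed in the paper.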

## Keywords

Inverse reinforcement learning · Bayesian inference · Monte Carlo estimation · Approximate Bayesian computation

## Notes

### Acknowledgements

This work has been supported by the Academy of Finland (Finnish Centre of Excellence in Computational Inference Research COIN, and Grants 294238, 292334). Computational resources were provided by the Aalto Science IT project.
