Recognition of Agents Based on Observation of Their Sequential Behavior

  • Qifeng Qiao
  • Peter A. Beling
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8188)

Abstract

We study the use of inverse reinforcement learning (IRL) as a tool for recognition of agents on the basis of observations of their sequential decision behavior. We model the problem faced by the agents as a Markov decision process (MDP) and model the observed behavior of an agent in terms of forward planning for the MDP. The agent’s true decision problem and process may not be captured by the MDP and its policy, but we interpret the observed behavior as optimal actions in the MDP. We use IRL to learn reward functions for the MDP and then use these reward functions as the basis for clustering or classification models. Experimental studies with GridWorld, a navigation problem, and the secretary problem, an optimal stopping problem, show the algorithms’ performance in different agent-recognition scenarios, in which the agents’ underlying decision strategy may or may not be expressible as an MDP policy. Empirical comparisons of our method with several existing IRL algorithms and with direct methods that use feature statistics observed in state-action space suggest that it may be superior for agent recognition problems, particularly when the state space is large but the observed decision trajectories are short.
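
The recognition pipeline the abstract describes can be illustrated compactly: recover a reward function per agent with IRL, then treat the recovered rewards as feature vectors for clustering or classification. The Python sketch below is a minimal, hypothetical rendering of that idea, not the paper’s algorithm: a small random finite MDP stands in for GridWorld, each agent’s policy is taken as given (in practice it would be estimated from the observed state-action trajectory), rewards are recovered with the classic linear-programming IRL formulation of Ng and Russell, and the recovered reward vectors are clustered with k-means. All sizes, parameters, and names here are illustrative assumptions.

```python
# Minimal sketch, under illustrative assumptions (not the paper's algorithm):
# a random finite MDP stands in for GridWorld, each agent's policy is given
# rather than estimated from trajectories, rewards come from linear-
# programming IRL, and agents are clustered by k-means in reward space.
import numpy as np
from scipy.optimize import linprog
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
nS, nA, gamma, lam = 8, 3, 0.9, 1.0            # arbitrary demo sizes/penalty

# Random transition model: P[a][s, s2] = Pr(next state s2 | state s, action a).
P = rng.dirichlet(np.ones(nS), size=(nA, nS))

def optimal_policy(R, iters=300):
    """Deterministic optimal policy for state-reward vector R via value iteration."""
    V = np.zeros(nS)
    for _ in range(iters):
        Q = np.array([R + gamma * P[a] @ V for a in range(nA)])  # (nA, nS)
        V = Q.max(axis=0)
    Q = np.array([R + gamma * P[a] @ V for a in range(nA)])
    return Q.argmax(axis=0)

def lp_irl(pi):
    """Recover a reward vector consistent with policy pi (LP IRL, Ng & Russell style)."""
    Ppi = np.array([P[pi[s]][s] for s in range(nS)])   # transition rows under pi
    G = np.linalg.inv(np.eye(nS) - gamma * Ppi)        # (I - gamma * P_pi)^(-1)
    # Variables x = [R, t, u], each of length nS: maximize sum(t) - lam * sum(u),
    # where t is the optimality margin and u bounds |R| for an L1 penalty.
    c = np.concatenate([np.zeros(nS), -np.ones(nS), lam * np.ones(nS)])
    A, b = [], []
    for s in range(nS):
        for a in range(nA):
            if a == pi[s]:
                continue
            m = (P[pi[s]][s] - P[a][s]) @ G
            row = np.zeros(3 * nS); row[:nS] = -m; row[nS + s] = 1.0
            A.append(row); b.append(0.0)               # t_s <= m @ R
            row = np.zeros(3 * nS); row[:nS] = -m
            A.append(row); b.append(0.0)               # m @ R >= 0
        for sign in (1.0, -1.0):
            row = np.zeros(3 * nS); row[s] = sign; row[2 * nS + s] = -1.0
            A.append(row); b.append(0.0)               # |R_s| <= u_s
    bounds = [(-1, 1)] * nS + [(None, None)] * nS + [(0, None)] * nS
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
    return res.x[:nS]

# Two agent types with different hypothetical ground-truth rewards; a few
# noisy agents per type stand in for the observed decision makers.
true_rewards = [np.eye(nS)[0], np.eye(nS)[-1]]
features, truth = [], []
for label, R_true in enumerate(true_rewards):
    for _ in range(5):
        pi = optimal_policy(R_true + 0.05 * rng.standard_normal(nS))
        features.append(lp_irl(pi))
        truth.append(label)

# Recognition step: unsupervised clustering in recovered-reward space
# (classification would instead fit a supervised model on labeled rewards).
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(features))
print("true labels:", truth)
print("clusters:   ", pred.tolist())
```

The design point worth noting is that the recognition step operates on recovered reward vectors rather than on raw state-action statistics; per the abstract, this representation is the one expected to help most when the state space is large relative to the length of the observed trajectories.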



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Qifeng Qiao
  • Peter A. Beling

  Department of Systems Engineering, University of Virginia, USA
