Accelerating Imitation Learning in Relational Domains via Transfer by Initialization

  • Sriraam Natarajan
  • Phillip Odom
  • Saket Joshi
  • Tushar Khot
  • Kristian Kersting
  • Prasad Tadepalli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8812)

Abstract

The problem of learning to mimic a human expert or teacher from training trajectories is called imitation learning. To make teaching easier in this setting, we propose to employ transfer learning, where one learns on a source problem and transfers the knowledge to potentially more complex target problems. We consider multi-relational environments such as real-time strategy games and use functional-gradient boosting to capture and transfer the models learned in these environments. Our experiments demonstrate that our learner acquires a good initial model from the simple scenario and effectively transfers the knowledge to the more complex scenario, thereby achieving a jump start, a steeper learning curve, and higher converged performance.
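The transfer scheme described above can be illustrated with a minimal, self-contained sketch. This is not the authors' relational implementation (which boosts first-order regression trees over relational policies); it is a plain-Python analogue using squared-error regression stumps, where boosting on the target task is initialized with the ensemble learned on the source task instead of starting from zero. All function names and the toy tasks are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a 1-D regression stump (threshold, left value, right value)
    minimizing squared error on the current residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return (t, lv, rv)

def predict(ensemble, x, lr=0.5):
    """Sum the (shrunk) contributions of all stumps in the ensemble."""
    return sum(lr * (lv if x <= t else rv) for t, lv, rv in ensemble)

def boost(xs, ys, rounds, ensemble=None, lr=0.5):
    """Functional-gradient boosting for squared loss. Passing a non-empty
    `ensemble` initializes the model with a previously learned one
    (transfer by initialization)."""
    ensemble = list(ensemble) if ensemble else []
    for _ in range(rounds):
        # For squared loss, the functional gradient at each example
        # is simply the residual y - F(x).
        residuals = [y - predict(ensemble, x, lr) for x, y in zip(xs, ys)]
        ensemble.append(fit_stump(xs, residuals))
    return ensemble

def mse(ensemble, xs, ys):
    return sum((predict(ensemble, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Source task: a step function; target task: a related but steeper step.
xs = [i / 10 for i in range(20)]
src_y = [0.0 if x < 1.0 else 1.0 for x in xs]
tgt_y = [0.0 if x < 1.0 else 2.0 for x in xs]

source_model = boost(xs, src_y, rounds=20)
# Target learning with few rounds: transferred vs. from scratch.
transfer = boost(xs, tgt_y, rounds=5, ensemble=source_model)
scratch = boost(xs, tgt_y, rounds=5)
print(mse(transfer, xs, tgt_y), mse(scratch, xs, tgt_y))
```

With the same small budget of target-task boosting rounds, the transferred model starts closer to the target function and therefore reaches a lower error, which is the "jump start" effect the abstract refers to.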

Keywords

Optimal Policy, Target Task, Game Engine, Gradient Ascent, Functional Gradient

Acknowledgments

SN and PO thank the Army Research Office, grant number W911NF-13-1-0432, under the Young Investigator Program. SN and TK gratefully acknowledge the support of the DARPA DEFT Program under the Air Force Research Laboratory (AFRL) prime contract no. FA8750-13-2-0039. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, AFRL, or the US government. SJ was supported by a Computing Innovations Postdoctoral Fellowship. KK was supported by the Fraunhofer ATTRACT fellowship STREAM and by the European Commission under contract number FP7-248258-First-MM. PT acknowledges the support of ONR grant N000141110106.


Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Sriraam Natarajan (1)
  • Phillip Odom (1)
  • Saket Joshi (2)
  • Tushar Khot (3)
  • Kristian Kersting (4)
  • Prasad Tadepalli (5)
  1. Indiana University Bloomington, Bloomington, USA
  2. Cycorp Inc., Austin, USA
  3. University of Wisconsin-Madison, Madison, USA
  4. Fraunhofer IAIS, Germany
  5. Oregon State University, Corvallis, USA