ILP 2013: Inductive Logic Programming, pp. 64–75
Accelerating Imitation Learning in Relational Domains via Transfer by Initialization
Abstract
The problem of learning to mimic a human expert or teacher from training trajectories is called imitation learning. To ease the burden on the teacher in this setting, we propose to employ transfer learning, where a model is learned on a source problem and the knowledge is transferred to potentially more complex target problems. We consider multi-relational environments such as real-time strategy games and use functional-gradient boosting to capture and transfer the models learned in these environments. Our experiments demonstrate that the learner acquires a good initial model from the simple scenario and effectively transfers this knowledge to the more complex scenario, achieving a jump start, a steeper learning curve, and higher converged performance.
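As a rough sketch of the transfer-by-initialization idea described above (not the paper's actual implementation), a boosted model is a sum of weak regressors fit to functional gradients, and transfer amounts to starting the target learner from the source ensemble instead of from zero. All names here (`stump`, `psi`, `fit_gradient_step`) and the toy binary-action data are hypothetical illustrations.

```python
import math

def stump(feature_idx, threshold, value):
    """A weak regressor: returns `value` if state[feature_idx] > threshold, else 0."""
    return lambda state: value if state[feature_idx] > threshold else 0.0

def psi(ensemble, state):
    """Potential of a state under the boosted model: sum of the weak regressors."""
    return sum(h(state) for h in ensemble)

def fit_gradient_step(ensemble, examples, step=0.5):
    """One functional-gradient step: fit a stump to the pointwise gradients
    I[expert took the action] - P(action | state) and append it to the ensemble."""
    def prob(state):  # logistic probability of taking the action
        return 1.0 / (1.0 + math.exp(-psi(ensemble, state)))
    grads = [(s, (1.0 if taken else 0.0) - prob(s)) for s, taken in examples]
    # greedily pick the (feature, threshold) stump with least squared error to the gradients
    best = None
    for f in range(len(examples[0][0])):
        for t in (0.0, 0.5):
            pos = [g for s, g in grads if s[f] > t]
            v = sum(pos) / len(pos) if pos else 0.0
            err = sum((g - (v if s[f] > t else 0.0)) ** 2 for s, g in grads)
            if best is None or err < best[0]:
                best = (err, f, t, v)
    _, f, t, v = best
    ensemble.append(stump(f, t, step * v))
    return ensemble

# Transfer by initialization: the target learner starts from the source-task
# ensemble (a jump start) and then keeps boosting on target-task trajectories.
source_ensemble = [stump(0, 0.5, 1.2)]    # learned on the simple scenario
target_ensemble = list(source_ensemble)   # copied as the target's initial model
target_examples = [((1.0, 0.0), True), ((0.0, 1.0), False)]
fit_gradient_step(target_ensemble, target_examples)
```

The point of the copy on the last lines is that the target model begins with a non-trivial potential function, so its first gradient steps refine the transferred knowledge rather than relearn it.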
Keywords
Optimal Policy · Target Task · Game Engine · Gradient Ascent · Functional Gradient
Acknowledgments
SN and PO gratefully acknowledge Army Research Office grant number W911NF-13-1-0432 under the Young Investigator Program. SN and TK gratefully acknowledge the support of the DARPA DEFT Program under the Air Force Research Laboratory (AFRL) prime contract no. FA8750-13-2-0039. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, AFRL, or the US government. SJ was supported by a Computing Innovations Postdoctoral Fellowship. KK was supported by the Fraunhofer ATTRACT fellowship STREAM and by the European Commission under contract number FP7-248258-First-MM. PT acknowledges the support of ONR grant N000141110106.