Abstract
Imitation learning is an effective alternative to reinforcement learning: it avoids the delayed-reward problem by learning from mentor-demonstrated trajectories. A limitation of imitation learning is that collecting sufficient qualified demonstrations is expensive. In this work, we study how an agent can automatically improve on a weak policy by acquiring more demonstrations for learning on its own. We propose the LEWE framework, which samples tasks for the weak policy to execute and then learns from the successful trajectories to achieve an improvement. Since the sampling strategy is key to the efficiency of LEWE, we further propose incorporating active learning into the sampling strategy of LEWE. Experiments on a spatial positioning task show that LEWE with active learning effectively and efficiently improves the weak policy and outperforms the competing sampling approaches.
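To make the framework concrete, below is a minimal sketch of the self-practice loop as the abstract describes it, not the paper's actual algorithm. All names (lewe, sample_task, execute, is_success, learn) are hypothetical placeholders, and the active-learning sampling strategy is abstracted behind sample_task.

```python
def lewe(weak_policy, sample_task, execute, is_success, learn,
         n_rounds=10, tasks_per_round=100):
    """Hypothetical self-practice loop: sample tasks, run the current
    policy, keep the successful trajectories, and retrain on them."""
    policy = weak_policy
    for _ in range(n_rounds):
        demonstrations = []
        for _ in range(tasks_per_round):
            task = sample_task()                # sampling strategy (e.g. active learning)
            trajectory = execute(policy, task)  # roll the policy out on the task
            if is_success(trajectory):          # only successes become demonstrations
                demonstrations.append(trajectory)
        if demonstrations:                      # imitation-learn from the new demos
            policy = learn(policy, demonstrations)
    return policy
```

Each round, the sampler decides which tasks the weak policy should attempt; an active-learning sampler would pick tasks whose outcome is most informative, so fewer executions are wasted on tasks the policy already solves or cannot yet solve.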
Acknowledgments
This research was supported by the Jiangsu Science Foundation (BK2012303), the 2013 State Grid Research Project, and the Baidu Fund (181315P00651).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Da, Q., Yu, Y., Zhou, Z.-H. (2013). Self-Practice Imitation Learning from Weak Policy. In: Zhou, Z.-H., Schwenker, F. (eds) Partially Supervised Learning. PSL 2013. Lecture Notes in Computer Science, vol. 8183. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40705-5_2