Deep Active Learning for Autonomous Navigation

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 629)

Abstract

Imitation learning refers to an agent’s ability to mimic a desired behavior by learning from observations. A major challenge in learning from demonstrations is representing the demonstrations in a manner that is adequate for learning and efficient for real-time decisions. Creating feature representations is especially challenging when they must be extracted from high-dimensional visual data. In this paper, we present a method for imitation learning from raw visual data. The proposed method is applied to a popular imitation learning domain that is relevant to a variety of real-life applications, namely navigation. To create a training set, a teacher uses an optimal policy to perform a navigation task, and the actions taken are recorded along with visual footage from the first-person perspective. Features are automatically extracted and used to learn, via a deep convolutional neural network, a policy that mimics the teacher. A trained agent can then predict an action to perform based on the scene it finds itself in. The method is generic: the network is trained without knowledge of the task, the targets, or the environment in which it is acting. Another common challenge in imitation learning is generalizing a policy to situations unseen in the training data. To address this challenge, the learned policy is subsequently improved by employing active learning. While executing a task, the agent can query the teacher for the correct action to take in situations where it has low confidence. The active samples are added to the training set and used to update the initial policy. The proposed approach is demonstrated on four different tasks in a 3D simulated environment. The experiments show that an agent can effectively perform imitation learning from raw visual data for navigation tasks, and that active learning can significantly improve the initial policy using a small number of samples. The simulated testbed facilitates reproduction of these results and comparison with other approaches.
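The confidence-based active-learning loop described in the abstract can be sketched as follows. This is a hypothetical, minimal illustration, not the authors' implementation: the deep convolutional network is replaced by a toy nearest-centroid classifier, and `teacher_policy`, the binary action space, and the confidence threshold of 0.6 are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_policy(state):
    # Stand-in for the optimal teacher: action 1 if the state's mean
    # is positive, else action 0 (a hypothetical, trivially computable rule).
    return int(state.mean() > 0)

def train(X, y):
    # "Training" here just stores per-action centroids; in the paper this
    # step would fit the convolutional network on the demonstration frames.
    return {a: X[y == a].mean(axis=0) for a in (0, 1)}

def predict_with_confidence(model, state):
    # Softmax over negative distances to the centroids yields an action
    # and a confidence score in (0.5, 1.0] for the two-action case.
    d = np.array([np.linalg.norm(state - model[a]) for a in (0, 1)])
    p = np.exp(-d) / np.exp(-d).sum()
    a = int(p.argmax())
    return a, p[a]

# Initial demonstration set: states recorded while the teacher acts.
X = np.vstack([rng.normal(loc=1.0, size=(10, 4)),
               rng.normal(loc=-1.0, size=(10, 4))])
y = np.array([teacher_policy(s) for s in X])
model = train(X, y)

# Execution phase: the agent queries the teacher only when its
# confidence in the predicted action falls below a threshold.
THRESHOLD = 0.6
queries = 0
for _ in range(100):
    s = rng.normal(size=4)
    a, conf = predict_with_confidence(model, s)
    if conf < THRESHOLD:
        # Low confidence: ask the teacher, add the active sample to the
        # training set, and update the policy.
        X = np.vstack([X, s])
        y = np.append(y, teacher_policy(s))
        model = train(X, y)
        queries += 1
```

The key design point the sketch captures is that only low-confidence states trigger a query, so the teacher labels a small number of samples relative to the number of states the agent visits.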

Keywords

Active Learning · Optimal Policy · Reinforcement Learning · Deep Learning · Convolutional Neural Network


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Ahmed Hussein (1)
  • Mohamed Medhat Gaber (1)
  • Eyad Elyan (1)

  1. School of Computing, Robert Gordon University, Aberdeen, UK
