Design of Transfer Reinforcement Learning Mechanisms for Autonomous Collision Avoidance

  • Xiongqing Liu
  • Yan Jin
Conference paper


It is often hard for a reinforcement learning (RL) agent to utilize previous experience to solve new tasks that are similar but more complex. In this research, we combine transfer learning with reinforcement learning and investigate how the hyperparameters of both impact learning effectiveness and task performance in the context of autonomous robotic collision avoidance. A deep reinforcement learning algorithm was first implemented for a robot to learn, from its own experience, how to avoid randomly generated single obstacles. The effect of transferring previously learned experience was then studied by introducing two key concepts: transfer belief, i.e., how much a robot should trust its previous experience, and transfer period, i.e., how long the previous experience should be applied in the new context. The proposed approach was tested on collision avoidance problems by varying the transfer period. It is shown that transfer learning on average yielded a ~50% increase in learning speed at ~30% competence levels, and that there exists an optimal transfer period at which the variance is lowest and learning is fastest.
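The paper's deep RL implementation is not reproduced here, but the two transfer concepts in the abstract can be illustrated with a minimal sketch. The function below assumes tabular Q-values and a simple epsilon-greedy policy purely for illustration; the names `q_new`, `q_source`, and the default hyperparameter values are hypothetical, not taken from the paper. Within the transfer period, the agent follows the source-task policy with probability equal to the transfer belief; afterwards it relies only on the new task's estimates.

```python
import random

def select_action(q_new, q_source, state, step,
                  transfer_belief=0.8, transfer_period=500, epsilon=0.1):
    """Pick an action for `state` on the new task.

    While `step` is inside the transfer period, reuse the source-task
    policy with probability `transfer_belief` (how much the agent trusts
    its previous experience); otherwise act epsilon-greedily on the new
    task's own Q-values.
    """
    actions = list(q_new[state].keys())
    if step < transfer_period and random.random() < transfer_belief:
        # Trust previous experience: follow the source-task greedy policy.
        return max(actions, key=lambda a: q_source[state][a])
    if random.random() < epsilon:
        return random.choice(actions)  # explore the new task
    return max(actions, key=lambda a: q_new[state][a])  # exploit new estimates
```

Under this sketch, the transfer period bounds how long old experience can steer behavior, while the transfer belief sets how strongly it does so within that window, matching the two knobs the abstract varies.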



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Southern California, Los Angeles, USA
