Crashing to Learn, Learning to Survive: Planning in Dynamic Environments via Domain Randomization

Conference paper
Part of the Mechanisms and Machine Science book series (Mechan. Machine Science, volume 84)


Autonomous robots in the real world must avoid collisions with pedestrians and react quickly to sudden changes in the environment. Most local planners rely on static environment maps that cannot capture such critical elements of uncertainty. Learning-based methods for end-to-end navigation have become popular recently, but it is still unclear how to incorporate collision avoidance in a simple, safe, and fast manner. We propose a reinforcement learning curriculum based on domain randomization to train a high-performing policy entirely in simulation. The policy is first trained in a simple obstacle-free environment to quickly learn point-to-point navigation. The learned policy is then transferred to a dynamic environment to learn collision avoidance. The key idea is to randomize the obstacle dynamics to obtain a robust planner that can be deployed directly to the real world. The resulting policy outperforms conventional planners in dynamic real-world environments, even when the robot is intentionally obstructed.
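The second curriculum stage hinges on randomizing obstacle dynamics at every training episode. The sketch below illustrates one plausible way to do this; the class name `ObstacleDynamics`, the function `randomize_obstacles`, and the sampling ranges are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ObstacleDynamics:
    """Per-episode motion parameters for one simulated obstacle (assumed form)."""
    speed: float    # linear speed in m/s
    heading: float  # direction of travel in radians
    radius: float   # obstacle footprint radius in m


def randomize_obstacles(n_obstacles, rng,
                        speed_range=(0.0, 1.5),
                        radius_range=(0.2, 0.6)):
    """Sample a fresh set of obstacle dynamics for one training episode.

    Drawing new speeds, headings, and sizes each episode prevents the
    policy from overfitting to any single obstacle motion pattern.
    """
    return [
        ObstacleDynamics(
            speed=rng.uniform(*speed_range),
            heading=rng.uniform(0.0, 2.0 * math.pi),
            radius=rng.uniform(*radius_range),
        )
        for _ in range(n_obstacles)
    ]


# Curriculum stage 2: each episode sees differently randomized obstacles.
rng = random.Random(0)
episode_obstacles = randomize_obstacles(5, rng)
```

In a full pipeline, a routine like this would be called at each episode reset so that the transferred policy experiences a broad distribution of obstacle behaviors before real-world deployment.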


Keywords: Mobile robot navigation · Reinforcement learning · Domain randomization



Copyright information

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. PSG College of Technology, Coimbatore, India
  2. IIT Palakkad, Palakkad, India
