
Crashing to Learn, Learning to Survive: Planning in Dynamic Environments via Domain Randomization

Conference paper
Part of the Mechanisms and Machine Science book series (Mechan. Machine Science, volume 84)

Abstract

Autonomous robots in the real world must avoid collisions with pedestrians and react quickly to sudden changes in the environment. Most local planners rely on static environment maps that cannot capture such critical elements of uncertainty. Learning-based methods for end-to-end navigation have become popular recently, but it remains unclear how to incorporate collision avoidance in a simple, safe, and fast manner. We propose a reinforcement learning curriculum based on domain randomization to train a high-performing policy entirely in simulation. The policy is first trained in a simple obstacle-free environment to quickly learn point-to-point navigation. The learned policy is then transferred to a dynamic environment to learn collision avoidance. The key idea is to randomize the obstacle dynamics to obtain a robust planner that can be deployed directly in the real world. The resulting policy outperforms conventional planners in dynamic real-world environments, even when the robot is intentionally obstructed.
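For concreteness, the two-stage curriculum described above can be sketched as a short training loop. The Python sketch below assumes a Gym-style simulator interface (reset/step) and an off-the-shelf actor-critic agent; the constructor names, agent methods, and randomization ranges are illustrative assumptions, not details taken from the paper.

import random

# Minimal sketch of the two-stage curriculum described in the abstract.
# The environment constructors, agent interface, and randomization ranges
# below are illustrative assumptions, not the authors' implementation.

class CurriculumTrainer:
    def __init__(self, agent, make_empty_env, make_dynamic_env):
        self.agent = agent                        # e.g. a DDPG-style actor-critic
        self.make_empty_env = make_empty_env      # builds an obstacle-free world
        self.make_dynamic_env = make_dynamic_env  # builds a world with moving obstacles

    def run_episode(self, env):
        obs, done, total_reward = env.reset(), False, 0.0
        while not done:
            action = self.agent.act(obs)
            next_obs, reward, done, _ = env.step(action)
            self.agent.remember(obs, action, reward, next_obs, done)
            self.agent.update()
            obs, total_reward = next_obs, total_reward + reward
        return total_reward

    def stage_one(self, episodes):
        # Stage 1: learn point-to-point navigation with no obstacles.
        env = self.make_empty_env()
        for _ in range(episodes):
            self.run_episode(env)

    def stage_two(self, episodes):
        # Stage 2: transfer to a dynamic world and randomize the obstacle
        # dynamics every episode (domain randomization), so the policy
        # cannot overfit to a single pedestrian behaviour.
        for _ in range(episodes):
            env = self.make_dynamic_env(
                n_obstacles=random.randint(1, 6),          # illustrative range
                obstacle_speed=random.uniform(0.2, 1.5),   # m/s, illustrative
                heading_noise=random.uniform(0.0, 0.5),    # rad, illustrative
            )
            self.run_episode(env)

The key design choice reflected here is that the obstacle dynamics are re-sampled every episode in the second stage, so the transferred policy is exposed to a broad distribution of obstacle behaviours in simulation before being deployed on the real robot.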

Keywords

Mobile robot navigation · Reinforcement learning · Domain randomization


Copyright information

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. PSG College of Technology, Coimbatore, India
  2. IIT Palakkad, Palakkad, India
