Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12368)

Abstract

In this paper, we propose a novel end-to-end learnable network that performs joint perception, prediction, and motion planning for self-driving vehicles and produces interpretable intermediate representations. Unlike existing neural motion planners, our motion-planning costs are consistent with our perception and prediction estimates. This is achieved by a novel differentiable semantic occupancy representation that the motion planner uses explicitly as its cost. Our network is learned end-to-end from human demonstrations. Experiments on a large-scale manual-driving dataset and in closed-loop simulation show that the proposed model significantly outperforms state-of-the-art planners at imitating human driving while producing much safer trajectories.
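To make the abstract's central idea concrete, here is a minimal sketch of scoring candidate ego trajectories directly against a predicted semantic occupancy grid, so the planning cost is consistent with the perception and prediction outputs. All names, shapes, and parameter values are assumptions for illustration, not the paper's implementation; the paper's formulation is differentiable so the whole pipeline trains end-to-end, whereas this sketch keeps the grid lookup discrete for brevity.

```python
# Hypothetical sketch: cost candidate trajectories with a predicted occupancy grid.
import numpy as np

def occupancy_cost(occupancy, trajectories, resolution=0.5, origin=(-35.0, -70.0)):
    """Sum predicted occupancy probability along each candidate trajectory.

    occupancy:    (T, H, W) array, P(cell occupied) at each future timestep.
    trajectories: (N, T, 2) array of candidate ego (x, y) waypoints in metres.
    resolution:   metres per grid cell (assumed value).
    origin:       world coordinates of grid cell (0, 0) (assumed value).
    """
    T, H, W = occupancy.shape
    N = trajectories.shape[0]
    # Convert metric waypoints to integer grid indices (clipped to the grid).
    cols = np.clip(((trajectories[..., 0] - origin[0]) / resolution).astype(int), 0, W - 1)
    rows = np.clip(((trajectories[..., 1] - origin[1]) / resolution).astype(int), 0, H - 1)
    t_idx = np.broadcast_to(np.arange(T), (N, T))
    # Cost of a trajectory = accumulated occupancy probability it drives over.
    return occupancy[t_idx, rows, cols].sum(axis=1)

# Toy usage: pick the lowest-cost trajectory among random candidates.
rng = np.random.default_rng(0)
occ = rng.random((10, 280, 140)) * 0.1           # mostly free space
trajs = rng.uniform(-30, 30, size=(100, 10, 2))  # 100 candidates, 10 timesteps
print("lowest-cost candidate:", int(np.argmin(occupancy_cost(occ, trajs))))
```

Because the cost is read straight off the predicted occupancy, any object the perception and prediction stages flag is, by construction, the same object the planner avoids; a trajectory passing through high-probability cells accumulates a large cost and is rejected.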

A. Sadat and S. Casas contributed equally.



Author information


Corresponding author

Correspondence to Abbas Sadat.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 35355 KB)

Supplementary material 2 (mp4 9244 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Sadat, A., Casas, S., Ren, M., Wu, X., Dhawan, P., Urtasun, R. (2020). Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12368. Springer, Cham. https://doi.org/10.1007/978-3-030-58592-1_25


  • DOI: https://doi.org/10.1007/978-3-030-58592-1_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58591-4

  • Online ISBN: 978-3-030-58592-1

  • eBook Packages: Computer Science, Computer Science (R0)
