Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12368)

Abstract

In this paper, we propose a novel end-to-end learnable network that performs joint perception, prediction, and motion planning for self-driving vehicles and produces interpretable intermediate representations. Unlike existing neural motion planners, our motion-planning costs are consistent with our perception and prediction estimates. This is achieved by a novel differentiable semantic occupancy representation that the motion planner uses explicitly as its cost. Our network is learned end-to-end from human demonstrations. Experiments on a large-scale manual-driving dataset and in closed-loop simulation show that the proposed model significantly outperforms state-of-the-art planners at imitating human driving while producing much safer trajectories.
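To make the abstract's central idea concrete, here is a minimal sketch of scoring candidate ego trajectories directly against a predicted semantic occupancy grid, so the planning cost is consistent with the perception and prediction outputs. All names, shapes, and parameter values are assumptions for illustration, not the paper's implementation; the paper's formulation is differentiable so the whole pipeline trains end-to-end, whereas this sketch keeps the grid lookup discrete for brevity.

```python
# Hypothetical sketch: cost candidate trajectories with a predicted occupancy grid.
import numpy as np

def occupancy_cost(occupancy, trajectories, resolution=0.5, origin=(-35.0, -70.0)):
    """Sum predicted occupancy probability along each candidate trajectory.

    occupancy:    (T, H, W) array, P(cell occupied) at each future timestep.
    trajectories: (N, T, 2) array of candidate ego (x, y) waypoints in metres.
    resolution:   metres per grid cell (assumed value).
    origin:       world coordinates of grid cell (0, 0) (assumed value).
    """
    T, H, W = occupancy.shape
    N = trajectories.shape[0]
    # Convert metric waypoints to integer grid indices (clipped to the grid).
    cols = np.clip(((trajectories[..., 0] - origin[0]) / resolution).astype(int), 0, W - 1)
    rows = np.clip(((trajectories[..., 1] - origin[1]) / resolution).astype(int), 0, H - 1)
    t_idx = np.broadcast_to(np.arange(T), (N, T))
    # Cost of a trajectory = accumulated occupancy probability it drives over.
    return occupancy[t_idx, rows, cols].sum(axis=1)

# Toy usage: pick the lowest-cost trajectory among random candidates.
rng = np.random.default_rng(0)
occ = rng.random((10, 280, 140)) * 0.1           # mostly free space
trajs = rng.uniform(-30, 30, size=(100, 10, 2))  # 100 candidates, 10 timesteps
print("lowest-cost candidate:", int(np.argmin(occupancy_cost(occ, trajs))))
```

Because the cost is read straight off the predicted occupancy, any object the perception and prediction stages flag is, by construction, the same object the planner avoids; a trajectory passing through high-probability cells accumulates a large cost and is rejected.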

A. Sadat and S. Casas contributed equally.



Author information


Corresponding author

Correspondence to Abbas Sadat.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 35355 KB)

Supplementary material 2 (mp4 9244 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Sadat, A., Casas, S., Ren, M., Wu, X., Dhawan, P., Urtasun, R. (2020). Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12368. Springer, Cham. https://doi.org/10.1007/978-3-030-58592-1_25


  • DOI: https://doi.org/10.1007/978-3-030-58592-1_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58591-4

  • Online ISBN: 978-3-030-58592-1

  • eBook Packages: Computer Science, Computer Science (R0)
