DSDNet: Deep Structured Self-driving Network

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12366)


In this paper, we propose the Deep Structured self-Driving Network (DSDNet), which performs object detection, motion prediction, and motion planning with a single neural network. Towards this goal, we develop a deep structured energy-based model that considers the interactions between actors and produces socially consistent multimodal future predictions. Furthermore, DSDNet explicitly exploits the predicted future distributions of actors to plan a safe maneuver using a structured planning cost. Our sample-based formulation allows us to overcome the difficulty of probabilistic inference over continuous random variables. Experiments on a number of large-scale self-driving datasets demonstrate that our model significantly outperforms the state of the art.
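To make the sample-based idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes each actor's continuous future is replaced by a small finite set of candidate trajectories, so that the energy model becomes a categorical distribution whose marginals can be computed exactly by enumeration. The two-actor setup, the function names, the hand-picked unary energies, and the `safety_radius` and `collision_penalty` parameters are all hypothetical choices made for illustration.

```python
import math
from itertools import product


def min_distance(traj_a, traj_b):
    """Smallest distance between two trajectories at matching time steps."""
    return min(math.dist(p, q) for p, q in zip(traj_a, traj_b))


def joint_marginals(samples_a, samples_b, unary_a, unary_b,
                    safety_radius=1.0, collision_penalty=5.0):
    """Exact marginals of a two-actor energy model over finite samples.

    Joint energy = unary(a_i) + unary(b_j) + pairwise collision term
    (all values here are illustrative assumptions).
    """
    weights = {}
    for i, j in product(range(len(samples_a)), range(len(samples_b))):
        energy = unary_a[i] + unary_b[j]
        if min_distance(samples_a[i], samples_b[j]) < safety_radius:
            energy += collision_penalty  # discourage colliding futures
        weights[(i, j)] = math.exp(-energy)  # p is proportional to exp(-E)
    z = sum(weights.values())  # partition function is a finite sum
    marg_a = [sum(weights[(i, j)] for j in range(len(samples_b))) / z
              for i in range(len(samples_a))]
    marg_b = [sum(weights[(i, j)] for i in range(len(samples_a))) / z
              for j in range(len(samples_b))]
    return marg_a, marg_b


def expected_collision_cost(plan, samples, probs, safety_radius=1.0):
    """Probability mass of actor futures that come too close to the plan."""
    return sum(p for traj, p in zip(samples, probs)
               if min_distance(plan, traj) < safety_radius)


def choose_plan(plans, actors):
    """Pick the ego plan with the lowest summed expected collision cost.

    `actors` is a list of (samples, probs) pairs.
    """
    return min(plans, key=lambda plan: sum(
        expected_collision_cost(plan, s, p) for s, p in actors))
```

Because every actor has only a handful of candidate futures, the normalizing constant is a finite sum rather than an integral over trajectory space. Enumeration like this only scales to a few actors; larger scenes would need an approximate inference scheme instead.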


Keywords: Autonomous driving, Motion prediction, Motion planning




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Uber ATG, Pittsburgh, USA
  2. University of Toronto, Toronto, Canada
