Learning Lane Graph Representations for Motion Forecasting

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)


We propose a motion forecasting model that exploits a novel structured map representation as well as actor-map interactions. Instead of encoding vectorized maps as raster images, we construct a lane graph from raw map data to explicitly preserve the map structure. To capture the complex topology and long-range dependencies of the lane graph, we propose LaneGCN, which extends graph convolutions with multiple adjacency matrices and along-lane dilation. To capture the complex interactions between actors and maps, we exploit a fusion network consisting of four types of interactions: actor-to-lane, lane-to-lane, lane-to-actor, and actor-to-actor. Powered by LaneGCN and actor-map interactions, our model is able to predict accurate and realistic multi-modal trajectories. Our approach significantly outperforms the state of the art on the large-scale Argoverse motion forecasting benchmark.
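The layer form the abstract describes — a graph convolution with one weight matrix per lane relation (e.g. predecessor, successor) plus along-lane dilation via powers of the successor adjacency — can be sketched as follows. This is a minimal illustration under our own assumptions (node count, feature size, random weights, and the relation names are all hypothetical), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, F = 6, 4                          # 6 lane nodes, 4-dim features (illustrative)

# A chain of lane segments: node i precedes node i + 1.
A_suc = np.zeros((N, N))             # successor adjacency
for i in range(N - 1):
    A_suc[i, i + 1] = 1.0
A_pre = A_suc.T                      # predecessor adjacency

X = rng.normal(size=(N, F))          # lane-node input features

# One relation-specific weight per adjacency, plus a self weight,
# in the spirit of a multi-relation graph convolution.
W_self = rng.normal(size=(F, F))
relations = {
    "pre": (A_pre, rng.normal(size=(F, F))),
    "suc": (A_suc, rng.normal(size=(F, F))),
    # Along-lane dilation: A_suc^2 aggregates from two segments ahead,
    # enlarging the receptive field along the lane direction.
    "suc_dil2": (np.linalg.matrix_power(A_suc, 2), rng.normal(size=(F, F))),
}

out = X @ W_self
for A, W in relations.values():
    out += A @ X @ W                 # sum relation-specific messages
out = np.maximum(out, 0.0)           # ReLU nonlinearity
print(out.shape)
```

Stacking such layers, with larger dilation powers, is one way long-range dependencies along a lane can be captured without deepening the network proportionally to lane length.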


Keywords: HD map · Motion forecasting · Autonomous driving

Supplementary material (1.4 mb)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Uber ATG, Pittsburgh, USA
  2. University of Toronto, Toronto, Canada
