DA4AD: End-to-End Deep Attention-Based Visual Localization for Autonomous Driving

  • Conference paper
  • Computer Vision – ECCV 2020 (ECCV 2020)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12373)

Abstract

We present a visual localization framework based on novel deep attention-aware features for autonomous driving that achieves centimeter-level localization accuracy. Conventional approaches to the visual localization problem rely on handcrafted features or human-made objects on the road. They are known to be either prone to unstable matching caused by severe appearance or lighting changes, or too scarce to deliver constant and robust localization results in challenging scenarios. In this work, we seek to exploit the deep attention mechanism to search for salient, distinctive and stable features that are good for long-term matching in the scene through a novel end-to-end deep neural network. Furthermore, our learned feature descriptors are demonstrated to be competent to establish robust matches and therefore successfully estimate the optimal camera poses with high precision. We comprehensively validate the effectiveness of our method using a freshly collected dataset with high-quality ground truth trajectories and hardware synchronization between sensors. Results demonstrate that our method achieves a competitive localization accuracy when compared to LiDAR-based localization solutions under various challenging circumstances, leading to a potential low-cost localization solution for autonomous driving.



Acknowledgments

This work is supported by Baidu Autonomous Driving Technology Department (ADT) in conjunction with the Apollo Project. Shufu Xie helped with the development of the lane-based method. Shirui Li and Yuanfan Xie helped with the sensor calibration. Shuai Wang, Lingchang Li, and Shuangcheng Guo helped with the sensor synchronization.

Author information

Correspondence to Shiyu Song.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 82394 KB)

Supplementary material 2 (pdf 4098 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, Y., et al. (2020). DA4AD: End-to-End Deep Attention-Based Visual Localization for Autonomous Driving. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12373. Springer, Cham. https://doi.org/10.1007/978-3-030-58604-1_17

  • DOI: https://doi.org/10.1007/978-3-030-58604-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58603-4

  • Online ISBN: 978-3-030-58604-1
