BEVSeg: Geometry and Data-Driven Based Multi-view Segmentation in Bird’s-Eye-View

Chen, Qiuxiao; Tai, Hung-Shuo; Li, Pengfei; Wang, Ke; Qi, Xiaojun

doi:10.1007/978-3-031-44137-0_36

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14253)

Included in the following conference series:

International Conference on Computer Vision Systems

Abstract

Perception and awareness of the surroundings are crucial for autonomous vehicle navigation. To drive safely, autonomous systems must extract spatial information and understand the semantic meaning of the environment. We propose BEVSeg, a novel network architecture that generates perception and semantic information by combining geometry-based and data-driven techniques in two respective modules. Specifically, the geometry-based aligned BEV domain data augmentation module addresses overfitting and misalignment by augmenting the coherent BEV feature map, aligning the augmented object and segmentation ground truths, and aligning the augmented BEV feature map with its augmented ground truths. The data-driven hierarchy double-branch spatial attention module addresses the inflexibility of BEV feature generation by flexibly learning multi-scale BEV features through an enlarged feature receptive field and learned regions of interest. Experimental results on the nuScenes benchmark dataset demonstrate that BEVSeg achieves state-of-the-art results, with an mIoU 3.6% higher than the baseline. Code and models will be released.
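The alignment idea behind the augmentation module can be illustrated with a minimal sketch. It assumes the augmentation is a rigid transform in the BEV plane restricted to a horizontal flip plus rotations by multiples of 90 degrees (so every cell maps exactly onto another cell), and that object ground truths are given as (row, col) cell indices. The names `aligned_bev_augment` and `rot90_points` are hypothetical, and the paper's actual transform set and parameters are not given in this abstract. What the sketch shows is one shared transform applied to the BEV feature map, the segmentation ground truth, and the object annotations, so all three stay aligned.

```python
import numpy as np


def rot90_points(points, k, height, width):
    """Map (row, col) cell indices through k counter-clockwise np.rot90 steps.

    Hypothetical helper: one np.rot90 step sends cell (r, c) of an (h, w)
    grid to (w - 1 - c, r), and the grid shape becomes (w, h).
    """
    pts = np.asarray(points, dtype=float).copy()
    h, w = height, width
    for _ in range(k % 4):
        r, c = pts[:, 0].copy(), pts[:, 1].copy()
        pts[:, 0], pts[:, 1] = w - 1 - c, r
        h, w = w, h  # grid shape swaps after each quarter turn
    return pts


def aligned_bev_augment(feat, seg, points, k=0, flip=False):
    """Apply one shared BEV-plane transform to features, labels and objects.

    Hypothetical sketch, not the authors' implementation.
    feat:   (C, H, W) BEV feature map
    seg:    (H, W) segmentation ground truth on the same grid
    points: (N, 2) object centers as (row, col) cell indices
    k:      number of counter-clockwise 90-degree rotations
    flip:   mirror along the width (column) axis before rotating
    """
    _, h, w = feat.shape
    assert seg.shape == (h, w), "features and labels must share the BEV grid"
    pts = np.asarray(points, dtype=float).copy()
    if flip:
        # Mirror all three inputs along the column axis.
        feat = feat[:, :, ::-1]
        seg = seg[:, ::-1]
        pts[:, 1] = w - 1 - pts[:, 1]
    # Rotate the spatial axes only; the channel axis of feat is untouched.
    feat = np.rot90(feat, k=k, axes=(1, 2))
    seg = np.rot90(seg, k=k, axes=(0, 1))
    pts = rot90_points(pts, k, h, w)
    return np.ascontiguousarray(feat), np.ascontiguousarray(seg), pts
```

For example, `aligned_bev_augment(feat, seg, centers, k=1, flip=True)` returns a rotated, mirrored feature map whose segmentation labels and object centers still index the same BEV cells. In this sketch the exact-grid transforms avoid resampling; arbitrary-angle rotations would additionally need interpolation for the features and nearest-neighbour resampling for the labels.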

Author information

Authors and Affiliations

Utah State University, Logan, UT, 84322, USA
Qiuxiao Chen & Xiaojun Qi
DiDi Labs, Mountain View, CA, 94043, USA
Hung-Shuo Tai & Ke Wang
University of California, Riverside, Riverside, CA, 92521, USA
Pengfei Li

Corresponding author

Correspondence to Qiuxiao Chen.

Editor information

Editors and Affiliations

UC San Diego, La Jolla, CA, USA
Henrik I. Christensen
Queensland University of Technology, Brisbane, QLD, Australia
Peter Corke
KU Leuven, Leuven, Belgium
Renaud Detry
TU Wien, Vienna, Austria
Jean-Baptiste Weibel
TU Wien, Vienna, Austria
Markus Vincze

About this paper

Cite this paper

Chen, Q., Tai, HS., Li, P., Wang, K., Qi, X. (2023). BEVSeg: Geometry and Data-Driven Based Multi-view Segmentation in Bird’s-Eye-View. In: Christensen, H.I., Corke, P., Detry, R., Weibel, JB., Vincze, M. (eds) Computer Vision Systems. ICVS 2023. Lecture Notes in Computer Science, vol 14253. Springer, Cham. https://doi.org/10.1007/978-3-031-44137-0_36

DOI: https://doi.org/10.1007/978-3-031-44137-0_36
Published: 21 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44136-3
Online ISBN: 978-3-031-44137-0
eBook Packages: Computer Science, Computer Science (R0)