Abstract
Accurate detection of traffic objects, drivable areas, and lane lines is a fundamental requirement for an autonomous driving system. The complex and dynamic nature of the driving environment, together with varying lighting conditions, makes performing these tasks efficiently difficult. This paper proposes EHSINet, an efficient and versatile neural network architecture that adaptively handles multiple tasks: traffic object detection, drivable area segmentation, and lane line segmentation. EHSINet is a fully convolutional network that enables long-range, high-order spatial interactions between neighborhood features without using vision transformers. Evaluated on the BDD100K and KITTI datasets, EHSINet outperforms state-of-the-art methods, performing especially well in complex environments, under illumination changes, and in severe weather. Tests in real-world scenarios further demonstrate its strong generalization and practical value. Code is available at https://github.com/Pepper-FlavoredChewingGum/EHSINet.
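The "high-order spatial interaction without vision transformers" in the abstract refers to recursive gated convolutions in the style of HorNet: each step multiplies a depthwise-convolved spatial context with a gating feature element-wise, raising the interaction order by one. The following is a minimal NumPy sketch of that idea only, not the authors' implementation; the kernel values, the reuse of the input as the gate, and the omission of channel splits, projections, and normalization are all simplifying assumptions.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution with 'same' zero padding.
    x: (C, H, W) feature map, kernels: (C, 3, 3)."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * kernels[c])
    return out

def gnconv_sketch(x, order=3, seed=0):
    """Recursive gated convolution sketch: each iteration gates a
    spatially aggregated context with a feature map element-wise,
    so after k steps every output value mixes k levels of
    neighborhood interactions (higher-order spatial interaction)."""
    rng = np.random.default_rng(seed)
    C = x.shape[0]
    gate = x  # simplification: the real design splits channels for gates
    y = x
    for _ in range(order):
        k = rng.standard_normal((C, 3, 3)) / 9.0  # stand-in weights
        y = depthwise_conv3x3(y, k) * gate        # element-wise gating
    return y
```

The gating product is what distinguishes this from a plain stack of convolutions: multiplication couples spatial positions multiplicatively, giving transformer-like long-range interaction while staying fully convolutional.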
Data Availability
The code and datasets analysed during the current study are available in the EHSINet repository, https://github.com/Pepper-FlavoredChewingGum/EHSINet.
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
The project design and paper writing were carried out by all authors. JY conceived the core ideas and principles of the paper. YL designed the experiments, trained the models, and wrote the paper. CL collected the data, while RT processed and documented it. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Informed Consent
The research did not involve human participants or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yao, J., Li, Y., Liu, C. et al. Ehsinet: Efficient High-Order Spatial Interaction Multi-task Network for Adaptive Autonomous Driving Perception. Neural Process Lett 55, 11353–11370 (2023). https://doi.org/10.1007/s11063-023-11379-x