Abstract
Accurate detection of traffic objects, drivable areas, and lane lines is a fundamental requirement for an autonomous driving system. The complex and dynamic nature of the driving environment, together with varying lighting conditions, makes performing these tasks efficiently difficult. This paper proposes EHSINet, an efficient and versatile neural network architecture that adaptively handles multiple tasks: traffic object detection, drivable area segmentation, and lane line segmentation. EHSINet is a fully convolutional network that enables long-range, high-order spatial interactions between neighborhood features without using vision transformers. Evaluated on the BDD100K and KITTI datasets, EHSINet outperforms state-of-the-art methods, performing especially well in complex environments, under illumination changes, and in severe weather. Tests in real-world scenarios further demonstrate its strong generalization and practical value. Code is available at https://github.com/Pepper-FlavoredChewingGum/EHSINet.
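The "high-order spatial interaction without vision transformers" in the abstract refers to recursive gated convolutions in the style of HorNet: each step multiplies a depthwise-convolved spatial context with a gating feature element-wise, raising the interaction order by one. The following is a minimal NumPy sketch of that idea only, not the authors' implementation; the kernel values, the reuse of the input as the gate, and the omission of channel splits, projections, and normalization are all simplifying assumptions.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution with 'same' zero padding.
    x: (C, H, W) feature map, kernels: (C, 3, 3)."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * kernels[c])
    return out

def gnconv_sketch(x, order=3, seed=0):
    """Recursive gated convolution sketch: each iteration gates a
    spatially aggregated context with a feature map element-wise,
    so after k steps every output value mixes k levels of
    neighborhood interactions (higher-order spatial interaction)."""
    rng = np.random.default_rng(seed)
    C = x.shape[0]
    gate = x  # simplification: the real design splits channels for gates
    y = x
    for _ in range(order):
        k = rng.standard_normal((C, 3, 3)) / 9.0  # stand-in weights
        y = depthwise_conv3x3(y, k) * gate        # element-wise gating
    return y
```

The gating product is what distinguishes this from a plain stack of convolutions: multiplication couples spatial positions multiplicatively, giving transformer-like long-range interaction while staying fully convolutional.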
Data Availability
The code and datasets analysed during the current study are available in the EHSINet repository, https://github.com/Pepper-FlavoredChewingGum/EHSINet.
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
The project design and paper writing were carried out by all authors. JY conceived the core ideas and principles of the paper. YL designed the experiments, trained the models, and wrote the paper. CL collected the data, while RT processed and documented it. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Informed Consent
The research did not involve human participants or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yao, J., Li, Y., Liu, C. et al. Ehsinet: Efficient High-Order Spatial Interaction Multi-task Network for Adaptive Autonomous Driving Perception. Neural Process Lett 55, 11353–11370 (2023). https://doi.org/10.1007/s11063-023-11379-x