
Channel spatial attention based single-shot object detector for autonomous vehicles

  • 1200: Machine Vision Theory and Applications for Cyber Physical Systems
  • Published in: Multimedia Tools and Applications

Abstract

Real-time object detection with high accuracy is a major concern for autonomous vehicles, as it directly affects safety. Many recent state-of-the-art methods use Convolutional Neural Networks (CNNs) for object detection. Although these methods yield strong results, balancing the trade-off between accuracy and real-time detection remains challenging. High accuracy helps the vehicle avoid collisions and obey traffic rules, while faster detection allows decisions to be made quickly. In this paper, single-shot object detection provides fast inference, and an attention module provides more accurate detection. The channel attention mechanism produces fine-grained, refined features and emphasizes 'what' is semantically meaningful in a given input. In addition to channel attention, spatial attention emphasizes 'where' the meaningful information lies, acting as a booster for the attention block. The proposed model applies these two attention mechanisms sequentially, channel (RGB-wise) attention followed by spatial attention, for single-shot object detection (CSA-SS). The model is trained and tested on challenging datasets, namely KITTI and Berkeley DeepDrive (BDD). Experimental results show that the proposed model surpasses state-of-the-art techniques by 1.66 and 1.13 mAP on the KITTI and BDD datasets, respectively.
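To make the sequential channel-then-spatial attention described above concrete, the following is a minimal PyTorch sketch of a CBAM-style attention block that could refine a single-shot detector's feature map before the detection heads. It is illustrative only and not the authors' exact CSA-SS module; the class names, the reduction ratio of 16, the 7x7 spatial convolution, and the 512x38x38 feature-map shape are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel ('what') attention: pool away the spatial dimensions, then re-weight channels."""
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumed value
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(                 # shared MLP implemented as 1x1 convolutions
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        # Combine average- and max-pooled channel descriptors, then gate with a sigmoid.
        attn = torch.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        return x * attn

class SpatialAttention(nn.Module):
    """Spatial ('where') attention: pool over channels, produce a 2-D attention mask."""
    def __init__(self, kernel_size=7):            # kernel size is an assumed value
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg_map = torch.mean(x, dim=1, keepdim=True)      # (N, 1, H, W)
        max_map, _ = torch.max(x, dim=1, keepdim=True)    # (N, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn

class ChannelSpatialAttention(nn.Module):
    """Channel attention first, spatial attention second, mirroring the sequential design above."""
    def __init__(self, channels):
        super().__init__()
        self.channel_attn = ChannelAttention(channels)
        self.spatial_attn = SpatialAttention()

    def forward(self, x):
        return self.spatial_attn(self.channel_attn(x))

# Hypothetical usage: refine one SSD-style backbone feature map before the detection heads.
feature_map = torch.randn(1, 512, 38, 38)
refined = ChannelSpatialAttention(512)(feature_map)
print(refined.shape)  # torch.Size([1, 512, 38, 38])
```

Applying the channel gate before the spatial gate follows the ordering implied by the abstract: the 'what' weighting first reshapes the channel statistics from which the 'where' mask is computed, and the refined map keeps the input's shape, so such a block can be inserted into an existing single-shot detector without changing its heads.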


Author information

Corresponding author

Correspondence to Divya Singh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Singh, D., Srivastava, R. Channel spatial attention based single-shot object detector for autonomous vehicles. Multimed Tools Appl 81, 22289–22305 (2022). https://doi.org/10.1007/s11042-021-11267-3

