
DCM3-YOLOv4: A Real-Time Multi-Object Detection Framework

Published in Automotive Innovation

Abstract

Key issues for roadside sensing systems (RSS) include achieving accurate and real-time sharing of over-the-horizon perception information. This study proposes a novel and efficient framework dedicated to multi-object detection from the roadside perspective. First, a MobileNet-based backbone is developed to increase forward inference speed; with network parameters obtained through network architecture search (NAS), it outperforms other backbones in both performance and speed. Second, an optimization method based on the coordinate attention mechanism is developed to strengthen the network's long-range dependence on spatial information. Third, the traditional convolution operation in the attention mechanism is replaced with depthwise over-parameterized convolution (DO-Conv) to improve the capability of extracting features from high-dimensional feature space. Finally, the lightweight one-stage multi-object detection model DCM3-YOLOv4 for the roadside perspective is developed. Test results show that DCM3-YOLOv4 achieves a mean average precision (mAP) of 0.930 on the RS-UA dataset with a parameter size of 31.12 million, and its inference time of 96.13 ms is faster than that of the baseline model on the same platform. The proposed methods can be applied in a wide range of scenarios where the accuracy and speed requirements of RSS must be met.
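The abstract names three building blocks whose details appear only in the full text: a MobileNet-style backbone found via NAS, coordinate attention, and DO-Conv. The two attention-related components are published techniques, so minimal PyTorch sketches can make them concrete. The first sketch implements coordinate attention (Hou et al., CVPR 2021), which pools features along the height and width axes separately and re-weights the input with two direction-aware attention maps, capturing long-range dependence along one spatial direction while preserving position along the other. The reduction ratio, layer names, and activation choice below are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CoordinateAttention(nn.Module):
    """Minimal sketch of coordinate attention (Hou et al., CVPR 2021)."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)  # bottleneck width (assumed)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Direction-aware pooling: average over W -> (n, c, h, 1), and
        # over H -> (n, c, 1, w), transposed so both stack along one axis.
        pool_h = x.mean(dim=3, keepdim=True)
        pool_w = x.mean(dim=2, keepdim=True).transpose(2, 3)
        y = torch.cat([pool_h, pool_w], dim=2)           # (n, c, h + w, 1)
        y = self.act(self.bn(self.conv1(y)))             # shared encoding
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                  # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))  # (n, c, 1, w)
        return x * a_h * a_w                 # broadcast re-weighting of input

The second sketch, continuing from the imports above, illustrates depthwise over-parameterized convolution (DO-Conv, Cao et al. 2022), which the framework uses in place of the plain convolutions inside the attention mechanism: a trainable depthwise operator D is composed with the conventional kernel W on every forward pass, so the layer has extra trainable parameters during training yet folds back into a single standard convolution at inference. This version fixes D_mul to the kernel area and uses a simple initialization, both simplifications of the original.

class DOConv2d(nn.Module):
    """Simplified DO-Conv sketch with D_mul = kernel_size ** 2."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3,
                 stride: int = 1, padding: int = 1):
        super().__init__()
        k = kernel_size
        self.stride, self.padding, self.k = stride, padding, k
        # Conventional kernel W, stored flattened as (out_ch, in_ch, k*k).
        self.W = nn.Parameter(torch.randn(out_ch, in_ch, k * k) * 0.02)
        # Depthwise operator D: (in_ch, k*k, k*k), identity at init, so the
        # layer starts out exactly equivalent to a plain convolution.
        self.D = nn.Parameter(torch.eye(k * k).expand(in_ch, -1, -1).clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out_ch, in_ch, _ = self.W.shape
        # Compose kernels: W'[o, i, s] = sum_d W[o, i, d] * D[i, d, s];
        # for deployment this product can be precomputed once and frozen.
        W_prime = torch.einsum('oid,ids->ois', self.W, self.D)
        W_prime = W_prime.reshape(out_ch, in_ch, self.k, self.k)
        return F.conv2d(x, W_prime, stride=self.stride, padding=self.padding)

As a quick shape check, CoordinateAttention(64)(torch.randn(1, 64, 32, 32)) and DOConv2d(64, 64)(torch.randn(1, 64, 32, 32)) both return tensors of shape (1, 64, 32, 32), so either module can stand in for a same-geometry layer in a YOLOv4-style network.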


Abbreviations

CA: Coordinate attention

CNN: Convolutional neural network

CPU: Central processing unit

DO-Conv: Depthwise over-parameterized convolution

DSSD: Deconvolutional single shot detector

FPN: Feature pyramid network

FPS: Frames per second

GPU: Graphics processing unit

HOG: Histogram of oriented gradients

IDE: Integrated drive electronics

LiDAR: Light detection and ranging

mAP: Mean average precision

PANet: Path aggregation network

PCA: Principal component analysis

PHOG: Pyramid histogram of oriented gradients

RPN: Region proposal network

RSS: Roadside sensing system

SE: Squeeze-and-excitation

SIFT: Scale-invariant feature transform

SPP: Spatial pyramid pooling

SSD: Single shot multibox detector

SSN: Spatial shortcut network

SURF: Speeded-up robust features

SVM: Support vector machine

YOLOv4: You only look once v4


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (52072333, 52202503) and Science and Technology Project of Hebei Education Department (BJK2023026) and Hebei Natural Science Foundation (F2022203054).

Author information

Corresponding author

Correspondence to Lisheng Jin.

Ethics declarations

Conflict of interest

On behalf of all the authors, the corresponding author states that there is no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, B., Wang, H., Jin, L. et al. DCM3-YOLOv4: A Real-Time Multi-Object Detection Framework. Automot. Innov. 7, 283–299 (2024). https://doi.org/10.1007/s42154-023-00258-9

