Abstract
The key challenges for a roadside sensing system (RSS) are achieving accurate and real-time sharing of over-the-horizon perception information. This study proposes a novel and efficient framework for multi-object detection from the roadside perspective. First, a MobileNet-based backbone, whose parameters are obtained through network architecture search (NAS), is developed to increase forward-inference speed; compared with other backbones, it offers superior accuracy and speed. Second, an optimization method based on the coordinate attention mechanism is developed to strengthen the network's long-range dependence on spatial information. Third, the conventional convolution operation in the attention mechanism is replaced with depthwise over-parameterized convolution (DO-Conv) to improve feature extraction in the high-dimensional feature space. Finally, a lightweight single-stage multi-object detection model for the roadside perspective, DCM3-YOLOv4, is developed. Test results show that DCM3-YOLOv4 achieves a mean average precision (mAP) of 0.930 on the RS-UA dataset with a model size of 31.12 million parameters. Its inference time is 96.13 ms, faster than the baseline models on the same platform. The proposed methods can be applied in a wide range of scenarios where the accuracy and speed requirements of an RSS must be met.
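The coordinate attention mechanism referred to above factorizes global pooling into two direction-aware 1-D pools, one along the height axis and one along the width axis, so the resulting attention maps retain positional information in each direction. The following is a minimal NumPy sketch of that pooling-and-reweighting idea only; the learned 1×1 convolutions and channel reduction of the full coordinate attention block (Hou et al., 2021) are deliberately omitted, and the function name is illustrative, not from the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x):
    """Simplified coordinate attention over a feature map x of shape (C, H, W).

    The global pool is split into two 1-D pools: one along the width axis
    (keeping height-wise position) and one along the height axis (keeping
    width-wise position). The input is then reweighted by both maps.
    """
    a_h = sigmoid(x.mean(axis=2, keepdims=True))  # (C, H, 1): height-aware weights
    a_w = sigmoid(x.mean(axis=1, keepdims=True))  # (C, 1, W): width-aware weights
    return x * a_h * a_w                          # broadcast to (C, H, W)
```

Because each attention map varies along only one spatial axis, long-range dependencies along that axis can be captured at the cost of two 1-D pools rather than a dense spatial attention map.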
Abbreviations
- CA: Coordinate attention
- CNN: Convolutional neural network
- CPU: Central processing unit
- DO-Conv: Depthwise over-parameterized convolution
- DSSD: Deconvolutional single shot detector
- FPNs: Feature pyramid networks
- FPS: Frames per second
- GPU: Graphics processing unit
- HOG: Histogram of oriented gradients
- IDE: Integrated drive electronics
- LiDAR: Light detection and ranging
- mAP: Mean average precision
- PANet: Path aggregation network
- PCA: Principal component analysis
- PHOG: Pyramid histogram of oriented gradients
- RPN: Region proposal network
- RSS: Roadside sensing system
- SE: Squeeze-and-excitation
- SIFT: Scale-invariant feature transform
- SPP: Spatial pyramid pooling
- SSD: Single shot multibox detector
- SSN: Spatial shortcut network
- SURF: Speeded-up robust features
- SVM: Support vector machine
- YOLOv4: You only look once v4
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (52072333, 52202503), the Science and Technology Project of Hebei Education Department (BJK2023026), and the Hebei Natural Science Foundation (F2022203054).
Ethics declarations
Conflict of interest
On behalf of all the authors, the corresponding author states that there is no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, B., Wang, H., Jin, L. et al. DCM3-YOLOv4: A Real-Time Multi-Object Detection Framework. Automot. Innov. 7, 283–299 (2024). https://doi.org/10.1007/s42154-023-00258-9