
Efficient Multi-object Detection for Complexity Spatio-Temporal Scenes

  • Conference paper
  • In: Web and Big Data (APWeb-WAIM 2023)

Abstract

Multi-object detection in traffic scenarios plays a crucial role in protecting people and property and in keeping road traffic flowing smoothly. However, existing algorithms perform poorly in real-world scenarios for three reasons: (1) traffic-scene datasets are scarce; (2) models are not tailored to specific scenarios; and (3) computational complexity is too high for practical use. In this paper, we address these drawbacks. Specifically, we introduce a Full-Scene Traffic Dataset (FSTD) with spatio-temporal features that covers multiple views, multiple scenes, and multiple objects. We also propose BNF-YOLOv7, a lightweight and efficient improvement of YOLOv7 with redesigned BiFusion, NWD, and SPPFCSPC modules that addresses the intricacies of multi-object detection in traffic scenarios. First, we improve the SPPCSPC structure into SPPFCSPC, which keeps the same receptive field while running faster. Second, we adopt the BiFusion feature fusion module to strengthen feature representation and improve object localization. Third, we introduce the Normalized Gaussian Wasserstein Distance (NWD) and redesign the loss function to better detect tiny objects in traffic scenes. Experiments on the FSTD and UA-DETRAC datasets show that BNF-YOLOv7 outperforms competing algorithms, raising mAP by 3.3% on FSTD and by 2.4% on UA-DETRAC, while maintaining markedly better real-time performance with a 10% FPS increase in real scenarios.
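
Although the full text is behind the paywall, two of the techniques named in the abstract are publicly documented, so illustrative sketches can be given; neither sketch is the authors' implementation. The SPPCSPC-to-SPPFCSPC change follows the same idea as YOLOv5's SPP-to-SPPF refactoring: three parallel max-pools with kernels 5, 9, and 13 are replaced by three chained 5x5 max-pools, whose compositions reproduce the 9x9 and 13x13 receptive fields at lower cost. The PyTorch sketch below shows only that pooling equivalence; the surrounding CSP convolutions of the real module are omitted.

```python
import torch
import torch.nn.functional as F

def spp_parallel(x: torch.Tensor) -> torch.Tensor:
    """SPP-style pooling: parallel max-pools with kernels 5, 9, 13."""
    p5 = F.max_pool2d(x, 5, stride=1, padding=2)
    p9 = F.max_pool2d(x, 9, stride=1, padding=4)
    p13 = F.max_pool2d(x, 13, stride=1, padding=6)
    return torch.cat([x, p5, p9, p13], dim=1)

def sppf_sequential(x: torch.Tensor) -> torch.Tensor:
    """SPPF-style pooling: three chained 5x5 max-pools.

    Chaining two 5x5 max-pools gives a 9x9 receptive field and chaining
    three gives 13x13, so the concatenation equals spp_parallel(x) while
    the smaller kernels run faster.
    """
    p1 = F.max_pool2d(x, 5, stride=1, padding=2)
    p2 = F.max_pool2d(p1, 5, stride=1, padding=2)
    p3 = F.max_pool2d(p2, 5, stride=1, padding=2)
    return torch.cat([x, p1, p2, p3], dim=1)

# Sanity check of the equivalence (max-pooling only selects existing
# values, so the two outputs match exactly):
# x = torch.randn(1, 64, 40, 40)
# assert torch.equal(spp_parallel(x), sppf_sequential(x))
```

The NWD term follows Wang et al.'s normalized Gaussian Wasserstein distance for tiny object detection (arXiv:2110.13389): a box (cx, cy, w, h) is modeled as a 2-D Gaussian with mean (cx, cy) and covariance diag(w^2/4, h^2/4), the squared 2-Wasserstein distance between two such Gaussians reduces to a closed form, and the similarity is exp(-sqrt(W2^2) / C) for a dataset-dependent constant C. A minimal sketch follows; the default C below is the constant used in the NWD paper, not a value reported in this abstract.

```python
import math

def nwd(box_a, box_b, c: float = 12.8) -> float:
    """Normalized Gaussian Wasserstein Distance between two boxes.

    Boxes are (cx, cy, w, h). Each box is modeled as a 2-D Gaussian with
    mean (cx, cy) and covariance diag(w^2/4, h^2/4); the squared
    2-Wasserstein distance between the Gaussians is then the squared L2
    distance between the vectors (cx, cy, w/2, h/2). Unlike IoU, the
    result varies smoothly even when tiny boxes do not overlap.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)
```

How this similarity enters the redesigned loss (for instance, mixing 1 - nwd with the existing IoU-based term) is not specified in the abstract, so any weighting scheme would be a further assumption.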



Acknowledgments

This work was supported in part by the Major Key Project of PCL under Grants PCL2023A09 and PCL2022A03, and by the Guangdong Major Project of Basic and Applied Basic Research under Grant 2019B030302002.

Author information


Corresponding author

Correspondence to Xiangyu Song.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, K., Song, X., Sun, S., Zhao, J., Xu, C., Song, H. (2024). Efficient Multi-object Detection for Complexity Spatio-Temporal Scenes. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_13


  • DOI: https://doi.org/10.1007/978-981-97-2421-5_13


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2420-8

  • Online ISBN: 978-981-97-2421-5

  • eBook Packages: Computer Science (R0)
