
Efficient Multi-object Detection for Complexity Spatio-Temporal Scenes

  • Conference paper
  • In: Web and Big Data (APWeb-WAIM 2023)

Abstract

Multi-object detection in traffic scenarios plays a crucial role in protecting people and property and in keeping road traffic flowing smoothly. However, existing algorithms perform poorly in real-world scenarios for three reasons: (1) traffic-scene datasets are scarce; (2) models are not tailored to specific scenarios; and (3) computational complexity is too high for practical use. In this paper, we address these drawbacks. Specifically, we introduce a Full-Scene Traffic Dataset (FSTD) with spatio-temporal features that covers multiple views, multiple scenes, and multiple objects. We also propose BNF-YOLOv7, a lightweight and efficient improvement of YOLOv7 with redesigned BiFusion, NWD, and SPPFCSPC modules that addresses the intricacies of multi-object detection in traffic scenarios. First, we improve the SPPCSPC structure into SPPFCSPC, which keeps the same receptive field while running faster. Second, we adopt the BiFusion feature fusion module to strengthen feature representation and improve object localization. Third, we introduce the Normalized Gaussian Wasserstein Distance (NWD) and redesign the loss function to better detect tiny objects in traffic scenes. Experiments on the FSTD and UA-DETRAC datasets show that BNF-YOLOv7 outperforms competing algorithms, raising mAP by 3.3% on FSTD and by 2.4% on UA-DETRAC, while maintaining markedly better real-time performance with a 10% FPS increase in real scenarios.
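
Although the full text is behind the paywall, two of the techniques named in the abstract are publicly documented, so illustrative sketches can be given; neither sketch is the authors' implementation. The SPPCSPC-to-SPPFCSPC change follows the same idea as YOLOv5's SPP-to-SPPF refactoring: three parallel max-pools with kernels 5, 9, and 13 are replaced by three chained 5x5 max-pools, whose compositions reproduce the 9x9 and 13x13 receptive fields at lower cost. The PyTorch sketch below shows only that pooling equivalence; the surrounding CSP convolutions of the real module are omitted.

```python
import torch
import torch.nn.functional as F

def spp_parallel(x: torch.Tensor) -> torch.Tensor:
    """SPP-style pooling: parallel max-pools with kernels 5, 9, 13."""
    p5 = F.max_pool2d(x, 5, stride=1, padding=2)
    p9 = F.max_pool2d(x, 9, stride=1, padding=4)
    p13 = F.max_pool2d(x, 13, stride=1, padding=6)
    return torch.cat([x, p5, p9, p13], dim=1)

def sppf_sequential(x: torch.Tensor) -> torch.Tensor:
    """SPPF-style pooling: three chained 5x5 max-pools.

    Chaining two 5x5 max-pools gives a 9x9 receptive field and chaining
    three gives 13x13, so the concatenation equals spp_parallel(x) while
    the smaller kernels run faster.
    """
    p1 = F.max_pool2d(x, 5, stride=1, padding=2)
    p2 = F.max_pool2d(p1, 5, stride=1, padding=2)
    p3 = F.max_pool2d(p2, 5, stride=1, padding=2)
    return torch.cat([x, p1, p2, p3], dim=1)

# Sanity check of the equivalence (max-pooling only selects existing
# values, so the two outputs match exactly):
# x = torch.randn(1, 64, 40, 40)
# assert torch.equal(spp_parallel(x), sppf_sequential(x))
```

The NWD term follows Wang et al.'s normalized Gaussian Wasserstein distance for tiny object detection (arXiv:2110.13389): a box (cx, cy, w, h) is modeled as a 2-D Gaussian with mean (cx, cy) and covariance diag(w^2/4, h^2/4), the squared 2-Wasserstein distance between two such Gaussians reduces to a closed form, and the similarity is exp(-sqrt(W2^2) / C) for a dataset-dependent constant C. A minimal sketch follows; the default C below is the constant used in the NWD paper, not a value reported in this abstract.

```python
import math

def nwd(box_a, box_b, c: float = 12.8) -> float:
    """Normalized Gaussian Wasserstein Distance between two boxes.

    Boxes are (cx, cy, w, h). Each box is modeled as a 2-D Gaussian with
    mean (cx, cy) and covariance diag(w^2/4, h^2/4); the squared
    2-Wasserstein distance between the Gaussians is then the squared L2
    distance between the vectors (cx, cy, w/2, h/2). Unlike IoU, the
    result varies smoothly even when tiny boxes do not overlap.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)
```

How this similarity enters the redesigned loss (for instance, mixing 1 - nwd with the existing IoU-based term) is not specified in the abstract, so any weighting scheme would be a further assumption.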



Acknowledgments

This work was supported in part by the Major Key Project of PCL under Grants PCL2023A09 and PCL2022A03, and by the Guangdong Major Project of Basic and Applied Basic Research under Grant 2019B030302002.

Author information


Corresponding author

Correspondence to Xiangyu Song.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, K., Song, X., Sun, S., Zhao, J., Xu, C., Song, H. (2024). Efficient Multi-object Detection for Complexity Spatio-Temporal Scenes. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_13


  • DOI: https://doi.org/10.1007/978-981-97-2421-5_13


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2420-8

  • Online ISBN: 978-981-97-2421-5

  • eBook Packages: Computer Science (R0)
