Abstract
Pedestrian detection is a challenging computer vision task that plays a crucial role in downstream applications such as video surveillance and autonomous driving. Despite significant progress over the past two decades, scale variance and occlusion remain prominent issues. To address these problems, we propose BDF-YOLOv5. Building on YOLOv5, we replace the original FPN with a bi-directional fusion (BDF) network structure. We also modify the loss function for bounding box regression, proposing weighted-CIOU. Extensive experiments on the CrowdHuman dataset demonstrate the feasibility of our method: compared to the baseline model (YOLOv5), BDF-YOLOv5 achieves an improvement of approximately 4.0%.
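For orientation, the sketch below computes the standard Complete-IoU (CIoU) loss for a single pair of boxes. It is not the weighted-CIOU proposed in the paper, whose weighting scheme is not described in this abstract. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def ciou_loss(box_pred, box_gt, eps=1e-9):
    """Standard CIoU loss for two boxes given as (x1, y1, x2, y2).

    Illustrative only: this is the unweighted CIoU formulation,
    not the paper's weighted-CIOU.
    """
    px1, py1, px2, py2 = box_pred
    gx1, gy1, gx2, gy2 = box_gt

    # Intersection area (zero if the boxes do not overlap).
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h

    # Union area and IoU.
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Squared distance between the two box centers.
    rho2 = ((px1 + px2) - (gx1 + gx2)) ** 2 / 4 \
         + ((py1 + py2) - (gy1 + gy2)) ** 2 / 4

    # Squared diagonal of the smallest box enclosing both boxes.
    enclose_w = max(px2, gx2) - min(px1, gx1)
    enclose_h = max(py2, gy2) - min(py1, gy1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

The loss is close to zero for identical boxes and exceeds 1 for disjoint boxes, because the center-distance penalty still provides a signal when the IoU is zero.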
Acknowledgements
Fund Project: This work was supported by the Tianjin Normal University Graduate Research Innovation Project Funding (2023KYCX005Y).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Xu, Y., Liu, R. (2024). BDF-YOLOV5: Improved YOLOV5 Based on Bi-directional Fusion Network for Dense Pedestrian Detection. In: Wang, W., Liu, X., Na, Z., Zhang, B. (eds) Communications, Signal Processing, and Systems. CSPS 2023. Lecture Notes in Electrical Engineering, vol 1032. Springer, Singapore. https://doi.org/10.1007/978-981-99-7505-1_52
DOI: https://doi.org/10.1007/978-981-99-7505-1_52
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7539-6
Online ISBN: 978-981-99-7505-1
eBook Packages: Engineering, Engineering (R0)