Abstract
Small object detection has been widely used in real-world applications, such as object detection from the perspective of UAVs and industrial inspection to locate small defects on the surface of materials. However, the width of each layer of the network structure is often insufficient to represent rich multi-scale information, which can make the model insensitive to small objects and reduce its detection accuracy. To address these issues, we propose the MSF-YOLO model, based on the YOLOv3 algorithm. First, the multi-scale features of the image are fused: relative to the original ResNet cell, the single convolutional scale is increased to four convolutional scales, and the features under each receptive field are fused to obtain rich hierarchical information from images. Second, the initial anchor boxes are optimized: K-means clustering is applied twice to optimize the sizes of the initial anchor boxes and improve their overlap, further improving the accuracy of the model. Third, the convergence of the model is accelerated: by introducing weight parameters obtained from training on the COCO dataset, the training process is optimized and convergence is accelerated. Experimental results on two public datasets show that MSF-YOLO outperforms YOLOv3, with average accuracies of 98.67% and 97.51%, and performs very well on the mAP and IoU metrics compared with state-of-the-art models. Finally, an industrial dataset is introduced for evaluation, and the results show a 31.54% improvement over the original YOLOv3. In summary, the MSF-YOLO model proposed in this paper is adaptable to small object detection tasks in many different scenarios.
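To make the anchor-box step more concrete, the following is a minimal sketch of K-means clustering over ground-truth box sizes. It is not the authors' implementation. It assumes the distance convention commonly used in the YOLO family (assigning each box to the anchor with the highest IoU, with boxes compared as width and height pairs aligned at a common corner). The abstract does not say how the two K-means passes are combined, so only a single pass is shown. The function names `iou_wh`, `kmeans_anchors`, and `mean_best_iou`, as well as the toy box sizes, are hypothetical.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors (assumed single pass)."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initialise anchors from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is equivalent to the smallest (1 - IoU) distance.
            best = max(range(k), key=lambda i: iou_wh(b, anchors[i]))
            clusters[best].append(b)
        new_anchors = []
        for i, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_anchors.append((w, h))
            else:
                new_anchors.append(anchors[i])  # keep an empty cluster's anchor
        if new_anchors == anchors:  # assignments have stabilised
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])  # smallest area first


def mean_best_iou(boxes, anchors):
    """Average IoU between each box and its best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)


# Toy box sizes in pixels (hypothetical data, not taken from the paper).
boxes = [(10, 10), (12, 12), (11, 11), (100, 100), (102, 98), (98, 102)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors, mean_best_iou(boxes, anchors))
```

The mean best IoU is one way to quantify the "overlap" that the anchor optimization aims to improve. A second clustering pass, as described in the abstract, would presumably refine these anchors further, but its exact form would need to be taken from the full paper.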
Data availability
The UAV-View and S2TLD datasets are available in refs. 15 and 26. Airplane-Rive is an industrial dataset that is not publicly available under the partner's policy, but it is available from the corresponding author on reasonable request. The source code of the paper is available at https://github.com/798911956/MSF-YOLO.
References
Pathak AR, Pandey M, Rautaray S (2018) Application of deep learning for object detection. Procedia Comput Sci 132:1706–1717
Sharma V, Mir RN (2020) A comprehensive and systematic look up into deep learning based object detection techniques: A review. Comput Sci Rev 38:100301
Zhao ZQ, Zheng P, Xu S et al (2019) Object detection with deep learning: A review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232
Dhillon A, Verma GK (2020) Convolutional neural network: a review of models, methodologies and applications to object detection. Prog Artif Intell 9(2):85–112
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollar P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014. Proceedings, Part V 13. Springer, pp 740–755
Everingham M, Van Gool L, Williams CKI et al (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88(2):303–338
Liu Y, Sun P, Wergeles N et al (2021) A survey and performance evaluation of deep learning methods for small object detection. Expert Syst Appl 172(4):114602
Tong K, Wu Y, Zhou F (2020) Recent advances in small object detection based on deep learning: A review. Image Vis Comput 97:103910
Peters M, Neumann M, Iyyer M et al (2018) Deep contextualized word representations. https://doi.org/10.18653/v1/N18-1202
He K, Chen X, Xie S, Li Y, Dollar P, Girshick R (2022) Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 16000–16009
Weng L (2017) Object detection for dummies part 3: R-cnn family. lilianweng.github.io/lil-log
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1440–1448
Sun X, Wu P, Hoi SCH (2018) Face detection using deep learning: An improved faster RCNN approach. Neurocomputing 299:42–50
Du J (2018) Understanding of object detection based on CNN family and YOLO. Journal of Physics: Conference Series 1004(1):012029
Loey M, Manogaran G, Taha MHN et al (2021) Fighting against COVID-19: A novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection. Sustain Cities Soc 65:102600
Wang C-Y, Bochkovskiy A, Liao H-YM (2023) Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7464–7475
Kisantal M, Wojna Z, Murawski J, Naruniec J, Cho K (2019) Augmentation for small object detection. arXiv preprint arXiv:1902.07296
Ren Y, Zhu C, Xiao S (2018) Small object detection in optical remote sensing images via modified faster R-CNN. Applied Sciences 8(5):813
Fu K, Chang Z, Zhang Y et al (2020) Rotation-aware and multi-scale convolutional neural network for object detection in remote sensing images. ISPRS J Photogramm Remote Sens 161:294–308
Hu Y, Wu X, Zheng G, Liu X (2019) Object detection of UAV for anti-UAV based on improved yolo v3. In: 2019 Chinese Control Conference (CCC). IEEE, pp 8386–8390
Pham MT, Courtrai L, Friguet C et al (2020) YOLO-Fine: one-stage detector of small objects under various backgrounds in remote sensing images. Remote Sens 12(15):2501
Hu G, Yang Z, Hu L, Huang L, Han J (2018) Small object detection with multiscale features. Int J Digit Multim Broadcast 2018
Liu M, Wang X, Zhou A et al (2020) UAV-YOLO: Small object detection on unmanned aerial vehicle perspective. Sensors 20(8):2238
Zhang C, Benz P, Argaw DM, Lee S, Kim J, Rameau F, Bazin J-C, Kweon IS (2021) Resnet or densenet? Introducing dense shortcuts to resnet. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 3550–3559
Zhong Y, Wang J, Peng J, Zhang L (2020) Anchor box optimization for object detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 1286–1294
Anand R, Shanthi T, Nithish MS et al (2020) Face recognition and classification using GoogleNET architecture. In: Soft computing for problem solving. Springer, Singapore, pp 261–269
Cheng B, Girshick R, Dollar P, Berg AC, Kirillov A (2021) Boundary IOU: improving object-centric image segmentation evaluation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 15334–15342
Yang X, Yan J, Liao W, Yang X, Tang J, He T (2022) Scrdet++: detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing. IEEE Trans Pattern Anal Mach Intell 45(2):2384–2399
Funding
This work was supported by Jiangxi Provincial Department of Science and Technology (Grant numbers: 20202BBEL53002) and Beijing Science and Technology Planning Project (Grant number: Z201100001820022).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Yang, F., Zhou, J., Chen, Y. et al. MSF-YOLO: A multi-scale features fusion-based method for small object detection. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-023-17818-0
DOI: https://doi.org/10.1007/s11042-023-17818-0