MSF-YOLO: A multi-scale features fusion-based method for small object detection

Multimedia Tools and Applications

Abstract

Small object detection is widely used in real-world applications, such as detecting small objects from the perspective of UAVs and locating small surface defects on materials during industrial inspection. The width of each layer of the network structure is not enough to represent rich multi-scale information, which can make the model insensitive to small objects and lower its detection accuracy. To address these issues, we propose the MSF-YOLO model, based on the YOLOv3 algorithm. First, the multi-scale features of the image are fused. Relative to the original ResNet cell, the single convolutional scale is increased to four convolutional scales, and the features under each receptive field are fused to obtain rich hierarchical information from images. Second, the initial anchor boxes are optimized. K-means clustering is applied twice to optimize the size of the initial anchor boxes and improve their overlap, further improving the accuracy of the model. Third, the convergence of the model is accelerated. By introducing weight parameters obtained from training on the COCO dataset, the training process is optimized and convergence is accelerated. Experimental results on two public datasets show that MSF-YOLO outperforms YOLOv3, with average accuracies of 98.67% and 97.51%, respectively, and performs very well on the mAP and IoU metrics compared with state-of-the-art models. Finally, an industrial dataset is introduced for evaluation, and the results show a 31.54% improvement over the original YOLOv3. In summary, the MSF-YOLO model proposed in this paper is adaptable to small object detection tasks in many different scenarios.
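The anchor-box step in the abstract can be illustrated with a minimal sketch. It assumes the common YOLO-style formulation, in which ground-truth box sizes (width, height) are clustered with the distance d = 1 − IoU; the abstract does not give the details of the authors' two-pass procedure, so this shows a single K-means pass only. The function names (`iou_wh`, `kmeans_anchors`, `mean_best_iou`) and the sample boxes are hypothetical and are not taken from the paper or its repository.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs using d = 1 - IoU; return k anchors sorted by area."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initialise from k distinct boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is equivalent to smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_anchors.append((w, h))
            else:
                new_anchors.append(anchors[i])  # keep an empty cluster's anchor
        if new_anchors == anchors:  # assignments have stabilised
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


def mean_best_iou(boxes, anchors):
    """Average, over all boxes, of the IoU with the best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)


# Hypothetical box sizes in pixels: three small objects and three larger ones.
boxes = [(10, 10), (12, 12), (11, 11), (50, 60), (52, 58), (48, 62)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors, mean_best_iou(boxes, anchors))
```

The mean best IoU is one way to quantify the "overlap of the anchor box" that the abstract refers to: a higher value means the initial anchors already sit closer to the labelled box sizes before training starts.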

Data availability

The UAV-View and S2TLD datasets are available in refs. 15 and 26. Airplane-Rive is an industrial dataset that is not publicly available under the partner's policy, but it is available from the corresponding author on reasonable request. The source code of the paper is available at https://github.com/798911956/MSF-YOLO.

References

  1. Pathak AR, Pandey M, Rautaray S (2018) Application of deep learning for object detection. Procedia Comput Sci 132:1706–1717

  2. Sharma V, Mir RN (2020) A comprehensive and systematic look up into deep learning based object detection techniques: A review. Comput Sci Rev 38:100301

  3. Zhao ZQ, Zheng P, Xu S et al (2019) Object detection with deep learning: A review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232

  4. Dhillon A, Verma GK (2020) Convolutional neural network: a review of models, methodologies and applications to object detection. Prog Artif Intell 9(2):85–112

  5. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollar P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014. Proceedings, Part V 13. Springer, pp 740–755

  6. Everingham M, Van Gool L, Williams CKI et al (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88(2):303–338

  7. Liu Y, Sun P, Wergeles N et al (2021) A survey and performance evaluation of deep learning methods for small object detection. Expert Syst Appl 172(4):114602

  8. Tong K, Wu Y, Zhou F (2020) Recent advances in small object detection based on deep learning: A review. Image Vis Comput 97:103910

  9. Peters M, Neumann M, Iyyer M et al (2018) Deep contextualized word representations. https://doi.org/10.18653/v1/N18-1202

  10. He K, Chen X, Xie S, Li Y, Dollar P, Girshick R (2022) Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 16000–16009

  11. Weng L (2017) Object detection for dummies part 3: R-cnn family. lilianweng.github.io/lil-log

  12. Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1440–1448

  13. Sun X, Wu P, Hoi SCH (2018) Face detection using deep learning: An improved faster RCNN approach. Neurocomputing 299:42–50

  14. Du J (2018) Understanding of object detection based on CNN family and YOLO. J Phys Conf Ser 1004(1):012029

  15. Loey M, Manogaran G, Taha MHN et al (2021) Fighting against COVID-19: A novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection. Sustain Cities Soc 65:102600

  16. Wang C-Y, Bochkovskiy A, Liao H-YM (2023) Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7464–7475

  17. Kisantal M, Wojna Z, Murawski J, Naruniec J, Cho K (2019) Augmentation for small object detection. arXiv preprint arXiv:1902.07296

  18. Ren Y, Zhu C, Xiao S (2018) Small object detection in optical remote sensing images via modified faster R-CNN. Appl Sci 8(5):813

  19. Fu K, Chang Z, Zhang Y et al (2020) Rotation-aware and multi-scale convolutional neural network for object detection in remote sensing images. ISPRS J Photogramm Remote Sens 161:294–308

  20. Hu Y, Wu X, Zheng G, Liu X (2019) Object detection of UAV for anti-UAV based on improved yolo v3. In: 2019 Chinese Control Conference (CCC). IEEE, pp 8386–8390

  21. Pham MT, Courtrai L, Friguet C et al (2020) YOLO-Fine: one-stage detector of small objects under various backgrounds in remote sensing images. Remote Sens 12(15):2501

  22. Hu G, Yang Z, Hu L, Huang L, Han J (2018) Small object detection with multiscale features. Int J Digit Multim Broadcast 2018

  23. Liu M, Wang X, Zhou A et al (2020) UAV-YOLO: Small object detection on unmanned aerial vehicle perspective. Sensors 20(8):2238

  24. Zhang C, Benz P, Argaw DM, Lee S, Kim J, Rameau F, Bazin J-C, Kweon IS (2021) Resnet or densenet? Introducing dense shortcuts to resnet. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 3550–3559

  25. Zhong Y, Wang J, Peng J, Zhang L (2020) Anchor box optimization for object detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 1286–1294

  26. Anand R, Shanthi T, Nithish MS et al (2020) Face recognition and classification using GoogleNET architecture. In: Soft computing for problem solving. Springer, Singapore, pp 261–269

  27. Cheng B, Girshick R, Dollar P, Berg AC, Kirillov A (2021) Boundary IOU: improving object-centric image segmentation evaluation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 15334–15342

  28. Yang X, Yan J, Liao W, Yang X, Tang J, He T (2022) Scrdet++: detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing. IEEE Trans Pattern Anal Mach Intell 45(2):2384–2399

Funding

This work was supported by the Jiangxi Provincial Department of Science and Technology (Grant number: 20202BBEL53002) and the Beijing Science and Technology Planning Project (Grant number: Z201100001820022).

Author information

Corresponding author

Correspondence to Jiaqi Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, F., Zhou, J., Chen, Y. et al. MSF-YOLO: A multi-scale features fusion-based method for small object detection. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-023-17818-0

  • DOI: https://doi.org/10.1007/s11042-023-17818-0
