
DFE-Net: detail feature extraction network for small object detection

  • Research
  • Published in The Visual Computer

Abstract

General-purpose object detectors suffer from low accuracy and high miss rates when detecting small objects. In this paper, we propose a novel deep learning model called the detail feature extraction network (DFE-Net), built on YOLOv5. First, we remove the C5 layer from the baseline backbone and introduce a detail feature enhancement module that incorporates extended separable convolution, enabling the model to effectively capture fine-grained information about small objects. Second, we propose a novel attention-based decoupled head, in which a mixed attention module assigns different weights to the classification and regression branches, effectively improving detection performance. Finally, we adopt the SIoU loss function to speed up model convergence. Experimental results show that, compared with the baseline, DFE-Net not only improves accuracy significantly on general small object datasets but also reduces the parameter count by 57.7%. mAP\(_{50}\) and mAP\(_{50-95}\) improve by 3.7% and 2.1% on the VisDrone2019 dataset, and by 5.9% and 5.1% on the TT100K dataset.
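To make the abstract's two architectural ideas concrete, the sketch below shows what a dilated ("extended") separable convolution and an attention-weighted decoupled head could look like in PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the module names, channel sizes, and the use of an SE-style gate as a stand-in for the mixed attention module are all assumptions.

```python
# Illustrative sketch only: module names, channel sizes, and the SE-style
# attention gate are assumptions, not the DFE-Net authors' implementation.
import torch
import torch.nn as nn


class DilatedSeparableConv(nn.Module):
    """Depthwise-separable convolution with dilation: one plausible reading
    of the paper's 'extended separable convolution', enlarging the receptive
    field while keeping the parameter count low."""

    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        # Depthwise 3x3 conv with dilation (padding preserves spatial size).
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=dilation,
                                   dilation=dilation, groups=channels, bias=False)
        # Pointwise 1x1 conv mixes information across channels.
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class DecoupledHead(nn.Module):
    """Decoupled head: separate classification and regression branches, each
    re-weighted by its own channel-attention gate (a hypothetical stand-in
    for the paper's mixed attention module)."""

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.cls_att = self._se(channels)
        self.reg_att = self._se(channels)
        self.cls_conv = nn.Sequential(DilatedSeparableConv(channels),
                                      nn.Conv2d(channels, num_classes, 1))
        self.reg_conv = nn.Sequential(DilatedSeparableConv(channels),
                                      nn.Conv2d(channels, 4, 1))  # box offsets

    @staticmethod
    def _se(channels: int, r: int = 16):
        # Squeeze-and-excitation: global pool -> bottleneck MLP -> sigmoid gate.
        return nn.Sequential(nn.AdaptiveAvgPool2d(1),
                             nn.Conv2d(channels, channels // r, 1), nn.SiLU(),
                             nn.Conv2d(channels // r, channels, 1), nn.Sigmoid())

    def forward(self, x):
        cls = self.cls_conv(x * self.cls_att(x))  # task-specific re-weighting
        reg = self.reg_conv(x * self.reg_att(x))
        return cls, reg


feat = torch.randn(1, 256, 80, 80)           # one feature-pyramid level
cls_out, reg_out = DecoupledHead(256, 10)(feat)
print(cls_out.shape, reg_out.shape)           # (1, 10, 80, 80) (1, 4, 80, 80)
```

For the loss, Gevorgyan's SIoU takes roughly the form \(L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2}\), where the distance cost \(\Delta\) is modulated by an angle cost that encourages the predicted box to first align with the target along one axis, and \(\Omega\) is a shape cost; this redefined penalty is what speeds up convergence relative to plain IoU-based losses.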

Availability of data and materials

The data and materials for this study can be accessed by contacting the authors or through an open data repository.


Acknowledgements

This work was supported by the Provincial Key Laboratory Performance Subsidy Project (22567612H) and the National Natural Science Foundation of China (62106214).

Funding

This study received funding from the Provincial Key Laboratory Performance Subsidy (22567612H) and the National Natural Science Foundation of China (62106214).

Author information

Contributions

LL designed and conducted the experiments and wrote the manuscript. The other authors provided guidance and supervision.

Corresponding author

Correspondence to Li Ling.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Code availability

The code used in the study can be obtained by contacting the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, H., Ling, L., Li, Y. et al. DFE-Net: detail feature extraction network for small object detection. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03277-7

