
FESSD:SSD target detection based on feature fusion and feature enhancement

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

In recent years, significant breakthroughs have been made in target detection. However, although existing two-stage target detection algorithms achieve high precision, their detection speed is too slow to meet real-time requirements. One-stage target detection algorithms can meet real-time requirements but have weaker detection capability, especially for small targets. In this paper, we propose an end-to-end feature fusion and feature enhancement SSD (FESSD) target detection algorithm to improve the detection capability of one-stage target detection. First, the deeper ResNet-50 replaces VGG16 as the backbone network to obtain richer semantic information, and five extra layers are added to generate feature maps of different sizes for multi-scale target detection. Then, the feature maps are fused by the maximum pooling feature fusion module (MPFFM) and the upsampling feature fusion module (UPFFM) to generate a new feature pyramid, which introduces semantic information into the shallow feature maps. Finally, the feature enhancement module (FEM) expands the receptive field of the output feature maps, introduces more context information, and further enhances the feature expression ability of the model. Experimental results on the PASCAL VOC and MS COCO datasets validate the effectiveness of the method.
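
The abstract does not spell out the internals of MPFFM and UPFFM, so the following is only a minimal sketch under stated assumptions. It assumes that a shallower (higher-resolution) map is brought down to the current level with 2×2, stride-2 max pooling (MPFFM-like), that a deeper (lower-resolution) map is brought up with 2× nearest-neighbour upsampling (UPFFM-like), and that the aligned maps are fused by channel concatenation. The helper names (`max_pool_2x2`, `upsample_nearest_2x`, `concat_channels`, `fuse_level`) are hypothetical, and plain Python lists stand in for tensors so the shape logic is easy to follow.

```python
from typing import List

# A feature map is a list of channels; each channel is a 2-D grid (H x W) of floats.
FeatureMap = List[List[List[float]]]


def max_pool_2x2(fmap: FeatureMap) -> FeatureMap:
    """Halve the spatial size with 2x2, stride-2 max pooling (an odd last row/column is dropped)."""
    pooled = []
    for ch in fmap:
        h, w = len(ch) // 2, len(ch[0]) // 2
        pooled.append([
            [max(ch[2 * i][2 * j], ch[2 * i][2 * j + 1],
                 ch[2 * i + 1][2 * j], ch[2 * i + 1][2 * j + 1])
             for j in range(w)]
            for i in range(h)
        ])
    return pooled


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Double the spatial size by repeating each value over a 2x2 block."""
    up = []
    for ch in fmap:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row twice
        up.append(rows)
    return up


def concat_channels(*fmaps: FeatureMap) -> FeatureMap:
    """Fuse maps that share the same H x W by stacking their channels."""
    sizes = {(len(f[0]), len(f[0][0])) for f in fmaps}
    if len(sizes) != 1:
        raise ValueError(f"spatial sizes differ: {sizes}")
    return [ch for f in fmaps for ch in f]


def fuse_level(shallow: FeatureMap, current: FeatureMap, deep: FeatureMap) -> FeatureMap:
    """Hypothetical one-level fusion: pooled shallower map (MPFFM-like),
    the current map, and the upsampled deeper map (UPFFM-like)."""
    return concat_channels(max_pool_2x2(shallow), current, upsample_nearest_2x(deep))


if __name__ == "__main__":
    shallow = [[[float(4 * i + j) for j in range(4)] for i in range(4)]]  # 1 x 4 x 4
    current = [[[1.0, 2.0], [3.0, 4.0]]]                                 # 1 x 2 x 2
    deep = [[[9.0]]]                                                     # 1 x 1 x 1
    fused = fuse_level(shallow, current, deep)
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 2 2
```

In a real detector each input would normally pass through learned convolutions (for example, 1×1 convolutions to align channel counts) before and after fusion; those layers are left out here because the abstract gives no details about them.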

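The abstract states only that FEM expands the receptive field and adds context. One common way to do this, used for example by receptive-field-block designs, is to run parallel 3×3 convolutions with different dilation rates; whether FEM uses exactly this layout is an assumption here, and the dilation rates 1, 3 and 5 below are illustrative. The sketch computes receptive fields with the standard arithmetic: a k×k kernel with dilation d spans k + (k − 1)(d − 1) pixels, and each layer grows the field by (span − 1) times the accumulated stride. The function `fem_branch_fields` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Each layer is described as (kernel_size, stride, dilation).
Layer = Tuple[int, int, int]


def effective_kernel(k: int, dilation: int) -> int:
    """Span of a k x k kernel with the given dilation: k + (k - 1) * (d - 1)."""
    return k + (k - 1) * (dilation - 1)


def receptive_field(layers: Iterable[Layer]) -> int:
    """Receptive field, in input pixels, of a stack of convolution layers."""
    rf, jump = 1, 1  # jump: distance in input pixels between adjacent output positions
    for k, stride, dilation in layers:
        rf += (effective_kernel(k, dilation) - 1) * jump
        jump *= stride
    return rf


def fem_branch_fields(dilations: List[int], k: int = 3) -> Dict[int, int]:
    """Hypothetical FEM-style branches: a 1x1 conv followed by a k x k dilated conv.
    Returns each branch's receptive field, measured on the branch's input map."""
    return {d: receptive_field([(1, 1, 1), (k, 1, d)]) for d in dilations}


if __name__ == "__main__":
    print(fem_branch_fields([1, 3, 5]))  # {1: 3, 3: 7, 5: 11}
```

Because such branches act on an already downsampled feature map, the gain in input pixels is multiplied by that map's stride: appending a 3×3, dilation-5 layer after three stride-2 3×3 layers grows the field from 15 to 95 input pixels.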

Acknowledgements

This work was supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B0909020001) and the National Natural Science Foundation of China (Grant No. 61573113).

Author information

Corresponding author

Correspondence to Huilin Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qian, H., Wang, H., Feng, S. et al. FESSD:SSD target detection based on feature fusion and feature enhancement. J Real-Time Image Proc 20, 2 (2023). https://doi.org/10.1007/s11554-023-01258-y

  • DOI: https://doi.org/10.1007/s11554-023-01258-y
