Abstract
In recent years, significant breakthroughs have been made in target detection. However, although existing two-stage target detection algorithms achieve high accuracy, their detection speed is too slow to meet real-time requirements. One-stage target detection algorithms can meet real-time requirements but have weaker detection capability, especially for small targets. In this paper, we propose FESSD, an end-to-end SSD target detection algorithm based on feature fusion and feature enhancement, to improve the detection capability of one-stage detectors. First, the deeper ResNet-50 replaces VGG16 as the backbone network to obtain richer semantic information, and five extra layers are added to generate feature maps of different sizes for multi-scale target detection. Then, the feature maps are fused by the maximum pooling feature fusion module (MPFFM) and the upsampling feature fusion module (UPFFM) to generate a new feature pyramid, which introduces semantic information into the shallow feature maps. Finally, the feature enhancement module (FEM) expands the receptive field of the output feature maps, introduces more context information, and further enhances the feature representation ability of the model. Experimental results on the PASCAL VOC and MS COCO datasets demonstrate the effectiveness of the method.
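The abstract describes the fusion and enhancement steps only at block level. As a rough, hedged illustration of what max-pooling fusion and upsampling fusion do to feature maps at neighbouring pyramid levels, the sketch below works on single-channel maps stored as nested Python lists. It is not the authors' MPFFM, UPFFM, or FEM: fusing by element-wise addition, 2×2 max pooling with stride 2, nearest-neighbour upsampling, and the use of dilated convolutions in the receptive-field helper are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
from typing import List, Sequence, Tuple

# A single-channel feature map, indexed as grid[row][col].
Grid = List[List[float]]


def max_pool_2x2(x: Grid) -> Grid:
    """Halve the spatial size with 2x2 max pooling, stride 2 (assumes even height/width)."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample_nearest_2x(x: Grid) -> Grid:
    """Double the spatial size by repeating each value in a 2x2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum; both maps must already have the same size."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("feature maps must have matching sizes")
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def maxpool_fuse(shallow: Grid, current: Grid) -> Grid:
    """Hypothetical max-pooling fusion: shrink a larger, shallower map to the
    current level's size and add it, so fine spatial detail flows deeper."""
    return add_maps(current, max_pool_2x2(shallow))


def upsample_fuse(deep: Grid, current: Grid) -> Grid:
    """Hypothetical upsampling fusion: enlarge a smaller, deeper map to the
    current level's size and add it, so semantic information flows shallower."""
    return add_maps(current, upsample_nearest_2x(deep))


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Receptive field of stacked stride-1 convolutions given (kernel, dilation)
    pairs: each layer widens it by (kernel - 1) * dilation."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf
```

With a 4×4 shallow map holding 1 to 16 row by row and a 2×2 deep map `[[1, 0], [0, 1]]`, `upsample_fuse(deep, shallow)` returns `[[2, 3, 3, 4], [6, 7, 7, 8], [9, 10, 12, 13], [13, 14, 16, 17]]` and `maxpool_fuse(shallow, deep)` returns `[[7, 8], [14, 17]]`. Three plain 3×3 convolutions give `receptive_field([(3, 1)] * 3) == 7`, while dilation rates 1, 2 and 3 give 13; dilation is one common way to enlarge the receptive field without further downsampling, but the abstract does not state which mechanism FEM uses.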
Acknowledgements
This work was supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B0909020001) and the National Natural Science Foundation of China (Grant No. 61573113).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qian, H., Wang, H., Feng, S. et al. FESSD:SSD target detection based on feature fusion and feature enhancement. J Real-Time Image Proc 20, 2 (2023). https://doi.org/10.1007/s11554-023-01258-y
DOI: https://doi.org/10.1007/s11554-023-01258-y