Research on improved algorithm of object detection based on feature pyramid



To address the low detection accuracy of SSD on small objects, this paper proposes an improved SSD object detection algorithm based on the feature pyramid (FP-SSD). In a deep convolutional neural network, high-level features carry rich semantic information but are insensitive to translation, whereas low-level features have high resolution but weak semantic representation. The feature pyramid structure contains features at multiple scales. To combine the high-level and low-level features of the pyramid, the proposed algorithm applies a deconvolution network to the high-level features to obtain semantic information, applies a dilated convolution network to the low-level features to learn position information, and applies convolution to the middle-level features to reduce the number of feature channels; a further convolution then fuses the features. The fused features are used to construct a multi-scale detection structure. FP-SSD achieves a mean accuracy of 79% on PASCAL VOC2007 and 47% on MSCOCO, a substantial improvement over SSD. Experiments comparing detection accuracy and results across object scales show that FP-SSD is more accurate than SSD, with more precise localization and higher recognition confidence.
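For features from three pyramid levels to be fused by convolution, they first have to share one spatial resolution. The sketch below only illustrates that size arithmetic and is not the configuration used in FP-SSD: the feature-map sizes (38, 19, 10) are borrowed from the standard SSD300 layout, and the kernel, stride, padding, dilation, and channel values are hypothetical choices picked so that all three branches land on a 19 x 19 map.

```python
def conv_out(n, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a (possibly dilated) convolution."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deconv_out(n, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (deconvolution)."""
    return (n - 1) * stride - 2 * padding + (kernel - 1) + output_padding + 1


# Assumed input sizes (SSD300-style); the paper's abstract does not state them.
low, mid, high = 38, 19, 10

# Low level: dilated 3x3 convolution with stride 2 (hypothetical settings).
low_to_mid = conv_out(low, kernel=3, stride=2, padding=2, dilation=2)

# Middle level: 1x1 convolution, which changes channels but not resolution.
mid_to_mid = conv_out(mid, kernel=1)

# High level: 3x3 deconvolution with stride 2 (hypothetical settings).
high_to_mid = deconv_out(high, kernel=3, stride=2, padding=1)

sizes = (low_to_mid, mid_to_mid, high_to_mid)

# With 256 channels per branch (an assumption), channel-wise concatenation
# gives the input width of the final fusing convolution.
fused_channels = 3 * 256

print(sizes, fused_channels)
```

Under these assumed settings each branch yields a 19 x 19 map, so the three outputs can be concatenated along the channel axis and passed to a single fusing convolution. Other target resolutions would need different strides and paddings.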


Keywords: Feature pyramid; Object detection; Convolutional neural network; Multi-scale detection; Deep learning



This work is partially supported by Shanxi Science Foundation (No.2015011045). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Data Science and Technology, North University of China, Taiyuan, China
