Single-Stage Detector with Semantic Attention for Occluded Pedestrian Detection

  • Fang Wen
  • Zehang Lin
  • Zhenguo Yang
  • Wenyin Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11296)

Abstract

In this paper, we propose PDSA, a pedestrian detection method with semantic attention for occluded pedestrian detection, built on a single-stage detector architecture (RetinaNet). PDSA contains a semantic segmentation component and a detector component. The first component uses visible bounding boxes for semantic segmentation, aiming to obtain an attention map for pedestrians and for inter-class (non-pedestrian) occlusion. The second component uses the single-stage detector to locate pedestrians from the features obtained by the first component. The single-stage detector over-samples possible object locations, which is faster than two-stage detectors that train a classifier to identify candidate object locations. In addition, we introduce the repulsion loss to deal with intra-class occlusion. Extensive experiments on the public CityPersons dataset demonstrate the effectiveness of PDSA for occluded pedestrian detection, where it outperforms state-of-the-art approaches.
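
As a rough illustration of the semantic attention idea, the sketch below rasterizes visible bounding boxes into a binary mask (the kind of weak segmentation target that box annotations can provide) and uses an attention map to reweight a feature map. The abstract does not state how PDSA combines attention and features, so the element-wise multiplication, the function names `boxes_to_mask` and `apply_attention`, and the box convention are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in pixels, x2/y2 exclusive.
Box = Tuple[int, int, int, int]
Map2D = List[List[float]]


def boxes_to_mask(boxes: List[Box], height: int, width: int) -> Map2D:
    """Rasterize visible-region boxes into a binary mask (1.0 inside any box)."""
    mask = [[0.0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the image bounds before filling.
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                mask[y][x] = 1.0
    return mask


def apply_attention(features: List[Map2D], attention: Map2D) -> List[Map2D]:
    """Scale every channel of a C x H x W feature map by an H x W attention map.

    Element-wise multiplication is an assumed fusion rule for this sketch.
    """
    return [
        [[v * a for v, a in zip(f_row, a_row)]
         for f_row, a_row in zip(channel, attention)]
        for channel in features
    ]


# One visible box on a 3 x 4 grid, and a single-channel feature map of 2.0s.
mask = boxes_to_mask([(1, 0, 3, 2)], height=3, width=4)
features = [[[2.0] * 4 for _ in range(3)]]
attended = apply_attention(features, mask)
```

In a trained model the attention map would be predicted by the segmentation component rather than taken from the annotations; here the mask stands in for it so that the reweighting step can be seen in isolation.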

Keywords

Occluded pedestrian detection · Single-stage detector · Repulsion loss · Semantic segmentation network

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61703109, No. 91748107), China Postdoctoral Science Foundation (No. 2018M643026), and the Guangdong Innovative Research Team Program (No. 2014ZT05G157).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Automation, Guangdong University of Technology, Guangzhou, China
  2. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
  3. Department of Computer Science, City University of Hong Kong, Hong Kong, China