Is Faster R-CNN Doing Well for Pedestrian Detection?

  • Liliang Zhang
  • Liang Lin (email author)
  • Xiaodan Liang
  • Kaiming He
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9906)

Abstract

Pedestrian detection has arguably been treated as a special topic beyond general object detection. Although recent deep learning object detectors such as Fast/Faster R-CNN have shown excellent performance for general object detection, they have had limited success in detecting pedestrians, and previous leading pedestrian detectors were generally hybrid methods combining hand-crafted and deep convolutional features. In this paper, we investigate issues involving Faster R-CNN for pedestrian detection. We discover that the Region Proposal Network (RPN) in Faster R-CNN indeed performs well as a stand-alone pedestrian detector, but, surprisingly, the downstream classifier degrades the results. We argue that two reasons account for the unsatisfactory accuracy: (i) insufficient resolution of feature maps for handling small instances, and (ii) lack of any bootstrapping strategy for mining hard negative examples. Driven by these observations, we propose a very simple but effective baseline for pedestrian detection, using an RPN followed by boosted forests on shared, high-resolution convolutional feature maps. We comprehensively evaluate this method on several benchmarks (Caltech, INRIA, ETH, and KITTI) and show competitive accuracy and good speed. Code will be made publicly available.
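
As a rough illustration of point (ii), bootstrapped hard-negative mining can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: synthetic 2-D points stand in for features pooled from RPN proposals, plain AdaBoost over decision stumps is swapped in for the boosted forests, the classifier is retrained from scratch at each stage, and every name and parameter (`make_data`, `bootstrap_train`, `init_neg`, `add_neg`) is hypothetical.

```python
import math
import random


def make_data(seed=0, n_pos=40, n_neg=200):
    """Synthetic stand-in for proposal features (assumption, not real data)."""
    rng = random.Random(seed)
    pos = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(n_pos)]
    neg = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_neg)]
    return pos, neg


def stump_predict(x, f, thr, pol):
    """Decision stump: +pol if feature f is at or above thr, else -pol."""
    return pol if x[f] >= thr else -pol


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def train_boosted(X, y, rounds=10):
    """Discrete AdaBoost over stumps; labels are +1 / -1."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, f, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Up-weight misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Weighted vote of all stumps; higher means more pedestrian-like."""
    return sum(a * stump_predict(x, f, thr, pol) for a, f, thr, pol in model)


def bootstrap_train(pos, neg_pool, stages=3, init_neg=30, add_neg=30, seed=0):
    """Train in stages, adding the highest-scoring unused negatives each time."""
    rng = random.Random(seed)
    order = list(range(len(neg_pool)))
    rng.shuffle(order)
    chosen = set(order[:init_neg])  # start from random negatives
    model = None
    for stage in range(stages):
        negs = [neg_pool[i] for i in sorted(chosen)]
        X = pos + negs
        y = [1] * len(pos) + [-1] * len(negs)
        model = train_boosted(X, y)
        if stage == stages - 1:
            break
        # Hard negatives: pool entries the current model scores highest.
        rest = [i for i in range(len(neg_pool)) if i not in chosen]
        rest.sort(key=lambda i: score(model, neg_pool[i]), reverse=True)
        chosen.update(rest[:add_neg])
    return model, chosen
```

Calling `bootstrap_train(*make_data())` returns the final stage's model together with the indices of the negatives it was trained on. Each stage concentrates the negative set on the examples the previous classifier found most confusing, which is the bootstrapping behaviour the abstract says is missing from the downstream classifier in Faster R-CNN.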

Keywords

Pedestrian detection, Convolutional neural networks, Boosted forests, Hard-negative mining

Notes

Acknowledgement

This work was supported in part by State Key Development Program under Grant 2016YFB1001000, in part by Guangdong Natural Science Foundation under Grant S2013050014548. This work was also supported by Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase). We thank the anonymous reviewers for their constructive comments on improving this paper.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Liliang Zhang (1)
  • Liang Lin (1, email author)
  • Xiaodan Liang (1)
  • Kaiming He (2)

  1. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
  2. Microsoft Research, Beijing, China
