Occlusion-Aware R-CNN: Detecting Pedestrians in a Crowd

  • Shifeng Zhang
  • Longyin Wen
  • Xiao Bian
  • Zhen Lei
  • Stan Z. Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)


Pedestrian detection in crowded scenes is challenging because pedestrians often gather together and occlude each other. In this paper, we propose a new occlusion-aware R-CNN (OR-CNN) to improve detection accuracy in crowds. Specifically, we design a new aggregation loss that forces proposals to lie close to, and cluster compactly around, their corresponding objects. We also replace the RoI pooling layer with a new part occlusion-aware region of interest (PORoI) pooling unit, which integrates prior structural information about the human body, together with visibility prediction, into the network to handle occlusion. Our detector is trained end-to-end and achieves state-of-the-art results on three pedestrian detection datasets (CityPersons, ETH, and INRIA), and performs on par with the state of the art on Caltech.
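
The abstract gives no formulas, so the following is only an illustrative sketch of the two ideas it names, not the authors' implementation. It assumes that the "compact" part of the aggregation loss can be read as a smooth L1 penalty between each ground-truth box and the mean of the predictions assigned to it, and that PORoI pooling can be read as a visibility-weighted sum of per-part features. The function names, the (x, y, w, h) box format, and the use of raw boxes rather than regression offsets are all assumptions made for the example.

```python
from collections import defaultdict


def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty on a scalar residual."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def compactness_term(pred_boxes, assigned_gt, gt_boxes):
    """Assumed form of a compactness penalty (hypothetical helper).

    pred_boxes  : list of (x, y, w, h) predictions, one per proposal
    assigned_gt : list of ground-truth indices, one per proposal
    gt_boxes    : list of (x, y, w, h) ground-truth boxes
    """
    # Group the predictions by the ground-truth object they are assigned to.
    groups = defaultdict(list)
    for box, gt_idx in zip(pred_boxes, assigned_gt):
        groups[gt_idx].append(box)
    if not groups:
        return 0.0
    total = 0.0
    for gt_idx, boxes in groups.items():
        n = len(boxes)
        # Mean predicted box for this object, coordinate by coordinate.
        mean_box = [sum(b[k] for b in boxes) / n for k in range(4)]
        total += sum(smooth_l1(g - m) for g, m in zip(gt_boxes[gt_idx], mean_box))
    # Average over the objects that received at least one proposal.
    return total / len(groups)


def fuse_part_features(part_features, visibility):
    """Assumed fusion step: sum part feature vectors scaled by visibility in [0, 1]."""
    dim = len(part_features[0])
    return [sum(v * f[k] for f, v in zip(part_features, visibility)) for k in range(dim)]
```

In this sketch, two predictions that straddle a ground-truth box symmetrically give a zero compactness penalty, because only their mean is compared with the target; a per-proposal regression term would still be needed to pull each prediction toward the object. In the fusion step, a part with a low visibility score contributes proportionally less to the pooled feature.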


Pedestrian detection, Occlusion-aware, Convolutional network, Structure information, Visibility prediction



This work was supported by the National Key Research and Development Plan (Grant No. 2016YFC0801002), the Chinese National Natural Science Foundation Projects #61473291, #61572501, #61502491, #61572536, the Science and Technology Development Fund of Macau (No. 0025/2018/A1, 151/2017/A, 152/2017/A), JDGrapevine Plan and AuthenMetric R&D Funds. We also thank NVIDIA for GPU donations through their academic program.




Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Center for Biometrics and Security Research, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. GE Global Research, Niskayuna, USA
  4. Macau University of Science and Technology, Macau, China
