Efficiently Handling Scale Variation for Pedestrian Detection
Abstract
Pedestrian detection is a popular yet challenging research topic in the computer vision community. Despite great progress in recent years, handling scale variation, which is common in real-world applications, remains an open question. To address this problem, this paper presents a novel pedestrian detector that better classifies and regresses proposals of different scales produced by a region proposal network (RPN). Specifically, we make the following major modifications to the Adapted FasterRCNN baseline. First, we divide all proposals into small and large pools according to their scales and process each pool with a separate classification network. Second, we employ two auxiliary supervisions to balance the effect of the two pools of proposals on back-propagation. Notably, the proposed detector adds no extra computational overhead and introduces very few additional parameters. Experiments on the CityPersons, Caltech and ETH datasets show significant improvements over the baseline method, especially on the small-scale subset. In particular, on the CityPersons and ETH datasets, our method surpasses previous state-of-the-art methods at lower computational cost at test time.
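As a rough illustration of the proposal-splitting step described in the abstract, the following minimal sketch partitions RPN proposals into a small pool and a large pool. The use of box height as the scale measure, the 80-pixel threshold, and the function name `split_proposals_by_scale` are assumptions made for this example; the abstract does not state the actual criterion.

```python
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) in pixels; this layout is an assumption for the sketch.
Box = Sequence[float]


def split_proposals_by_scale(
    boxes: Sequence[Box],
    height_threshold: float = 80.0,  # hypothetical threshold, not taken from the paper
) -> Tuple[List[int], List[int]]:
    """Return the indices of the small-pool and large-pool proposals."""
    small: List[int] = []
    large: List[int] = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        height = y2 - y1
        # Proposals shorter than the threshold go to the small pool,
        # all others go to the large pool.
        if height < height_threshold:
            small.append(i)
        else:
            large.append(i)
    return small, large


proposals = [
    (10, 20, 30, 70),     # height 50
    (100, 50, 160, 250),  # height 200
    (5, 5, 25, 85),       # height 80
]
small_idx, large_idx = split_proposals_by_scale(proposals)
```

Each index list could then be used to route the corresponding proposal features to its own classification branch.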
Keywords
Pedestrian detection, Scale variation, Convolutional neural networks
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 61702262), the Funds for International Cooperation and Exchange of the National Natural Science Foundation of China (Grant No. 61861136011), the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20181299), the CCF-Tencent Open Fund (RAGR20180113), the Fundamental Research Funds for the Central Universities (No. 30918011322) and the Young Elite Scientists Sponsorship Program by CAST (2018QNRC001).