Abstract
Driven by recent advances in deep learning, the accuracy of object detection has improved tremendously. However, detecting small and blurred pedestrians remains an open challenge. In this paper, we propose a novel neural network structure that can be flexibly combined with powerful object detection systems to boost pedestrian detection. The proposed structure contains two key modules: (i) a cascaded deconvolution-convolution (CDC) module, which expands the resolution of feature maps while preserving the crucial information in them; and (ii) a double-helix connection (DHC) module, which effectively fuses shallow-level and deep-level features in the detection network. The CDC module enables the network to reuse features of the lower layers and learn richer features from low-resolution input. In addition, the DHC module incorporates the features learned in different layers in a novel and unified fashion. Extensive experiments on the KITTI and Caltech Pedestrian datasets demonstrate that the proposed modules can be easily plugged into existing object detection networks (e.g., single-stage SSD and two-stage MSCNN) and consistently achieve better performance without bells and whistles.
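To make the idea of a deconvolution followed by a convolution concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a single-channel feature map, a 2x2 transposed-convolution kernel with stride 2, and a 3x3 convolution with zero padding; the function names (`transposed_conv2d`, `conv2d_same`, `cdc_block`) and all kernel values are hypothetical choices for illustration, and the abstract does not specify the actual layer configuration.

```python
def transposed_conv2d(x, kernel, stride=2):
    """Transposed convolution: scatter each input value, weighted by the
    kernel, onto a larger output grid (assumed stride and kernel size)."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


def conv2d_same(x, kernel):
    """Convolution (cross-correlation, as in most deep learning frameworks)
    with zero padding, so the output keeps the input resolution."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    r, c = i + a - pad, j + b - pad
                    if 0 <= r < h and 0 <= c < w:
                        acc += x[r][c] * kernel[a][b]
            out[i][j] = acc
    return out


def cdc_block(x, deconv_kernel, conv_kernel, stride=2):
    """Hypothetical deconvolution-then-convolution step: upsample first,
    then refine the upsampled map at the new resolution."""
    upsampled = transposed_conv2d(x, deconv_kernel, stride)
    return conv2d_same(upsampled, conv_kernel)


# Illustrative values only: a 2x2 feature map and hand-picked kernels.
feature = [[1.0, 2.0], [3.0, 4.0]]
deconv_kernel = [[1.0, 1.0], [1.0, 1.0]]  # copies each value into a 2x2 block
conv_kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # identity
result = cdc_block(feature, deconv_kernel, conv_kernel)
print(len(result), len(result[0]))
```

With a 2x2 kernel and stride 2, the transposed convolution exactly doubles each spatial dimension, and the padded convolution that follows leaves that resolution unchanged. In a trained network both kernels would be learned and would operate over many channels.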
This work is supported by NSFC (61471235) and the Shanghai ‘The Belt and Road’ Young Scholar Exchange Grant (17510740100).
G. Liu is supported by NSFC 61622305, NSFC 61502238, and NSFJPC BK20141003.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, Z., Han, X., Lin, W., Cheng, MM., Liu, G., Xiong, H. (2018). Pedestrian Detection with a Directly-Cascaded Deconvolution-Convolution Structure. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_34
DOI: https://doi.org/10.1007/978-3-030-00776-8_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00775-1
Online ISBN: 978-3-030-00776-8
eBook Packages: Computer Science, Computer Science (R0)