Pedestrian Detection with a Directly-Cascaded Deconvolution-Convolution Structure

Chen, Zhiming; Han, Xintong; Lin, Weiyao; Cheng, Ming-Ming; Liu, Guangcan; Xiong, Hongkai

doi:10.1007/978-3-030-00776-8_34

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

Driven by recent advances in deep learning, the accuracy of object detection has improved tremendously. However, detecting small and blurred pedestrians remains an open challenge. In this paper, we propose a novel neural network structure that can be flexibly combined with powerful object detection systems to boost pedestrian detection. The proposed structure contains two key modules: (i) a cascaded deconvolution-convolution (CDC) module, which expands the resolution of feature maps while keeping their crucial information; and (ii) a double-helix connection (DHC) module, which effectively fuses shallow-level and deep-level features in the detection network. The CDC module enables the network to reuse features from lower layers and to learn richer features from low-resolution input. In addition, the DHC module incorporates the features learned in different layers in a novel and unified fashion. Extensive experiments on the KITTI and Caltech Pedestrian datasets demonstrate that the proposed modules can be easily plugged into existing object detection networks (e.g., single-stage SSD and two-stage MSCNN) and consistently improve performance without bells and whistles.
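To make the deconvolution-then-convolution idea concrete, the following is a minimal one-dimensional sketch in plain Python. It is not the authors' implementation: the function names (`deconv1d`, `conv1d`, `cdc_block`), the kernel sizes, the stride of 2, and the fixed weights are all illustrative assumptions, whereas the layers in a real detector are two-dimensional and learned. It only shows how a transposed convolution can expand the resolution of a feature map and how a following convolution can refine the result without changing its size.

```python
def deconv1d(x, kernel, stride):
    """Transposed convolution without padding: every input value
    scatters a scaled copy of the kernel into the output."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for j, weight in enumerate(kernel):
            out[i * stride + j] += value * weight
    return out


def conv1d(x, kernel, pad):
    """Stride-1 convolution (cross-correlation) with zero padding."""
    padded = [0.0] * pad + list(x) + [0.0] * pad
    k = len(kernel)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


def cdc_block(x, up_kernel, refine_kernel):
    """Hypothetical cascade: upsample by 2 with a deconvolution,
    then refine with a size-preserving convolution."""
    upsampled = deconv1d(x, up_kernel, stride=2)
    return conv1d(upsampled, refine_kernel, pad=len(refine_kernel) // 2)


# Assumed toy inputs: a 4-element "feature map" and hand-picked weights.
features = [1.0, 2.0, 3.0, 4.0]
upsampled = deconv1d(features, [1.0, 1.0], stride=2)
refined = cdc_block(features, up_kernel=[1.0, 1.0],
                    refine_kernel=[0.25, 0.5, 0.25])
print(len(features), "->", len(upsampled), "->", len(refined))
```

With these assumed weights the deconvolution reduces to nearest-neighbour upsampling (4 values become 8), and the 3-tap convolution smooths the upsampled signal while keeping its length at 8.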

This work is supported by NSFC (61471235) and the Shanghai ‘The Belt and Road’ Young Scholar Exchange Grant (17510740100).

G. Liu is supported by NSFC 61622305, NSFC 61502238, and NSFJPC BK20141003.

Author information

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Zhiming Chen, Weiyao Lin & Hongkai Xiong
University of Maryland, College Park, USA
Xintong Han
Nankai University, Tianjin, China
Ming-Ming Cheng
Nanjing University of Information Science and Technology, Nanjing, China
Guangcan Liu

Corresponding author

Correspondence to Weiyao Lin.

Editor information

Editors and Affiliations

Hefei University of Technology, Hefei, China
Richang Hong
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
University of Tokyo, Tokyo, Japan
Toshihiko Yamasaki
Hefei University of Technology, Hefei, China
Meng Wang
City University of Hong Kong, Hong Kong, Hong Kong
Chong-Wah Ngo

About this paper

Cite this paper

Chen, Z., Han, X., Lin, W., Cheng, MM., Liu, G., Xiong, H. (2018). Pedestrian Detection with a Directly-Cascaded Deconvolution-Convolution Structure. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_34

DOI: https://doi.org/10.1007/978-3-030-00776-8_34
Published: 19 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00775-1
Online ISBN: 978-3-030-00776-8
eBook Packages: Computer Science, Computer Science (R0)
