
Multimedia Tools and Applications, Volume 78, Issue 2, pp 1719–1736

Multi-scale pedestrian detection using skip pooling and recurrent convolution

  • Chen Zhang
  • Joohee Kim

Abstract

Detecting pedestrians of different scales is essential for applications such as autonomous driving. Recent research has shown that combining multiple feature maps with contextual information helps detect objects of different scales. In this paper, we propose a multi-scale pedestrian detector that combines skip pooling over multi-resolution feature maps with recurrent convolutional layers that extract contextual information. To fully exploit the distinct characteristics of features at different levels for multi-scale pedestrian detection, the multi-scale features and the context features are fused at the fully connected layer. To gather spatial contextual information, we propose a modified recurrent convolutional layer that produces context feature maps at different resolutions. In addition, we construct a set of scale-dependent classification and bounding box regression subnetworks to further improve multi-scale pedestrian detection. Experiments on the Caltech and KITTI pedestrian detection benchmarks show that the proposed method achieves state-of-the-art performance at a faster speed.
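The abstract gives no implementation details, so the following is only a minimal, dependency-free Python sketch of two of the ideas it names: skip pooling (pooling the same region of interest from feature maps of different strides, L2-normalising each pooled vector and concatenating the results, in the style of Inside-Outside Net) and routing each proposal to a scale-dependent head. The toy map sizes, the strides, the 2×2 pooling grid, the fixed rescaling factor of 1000 and the 80-pixel routing threshold are all illustrative assumptions rather than values from the paper, the helper names are hypothetical, and the recurrent context layer is not modelled.

```python
import math
from typing import List, Sequence, Tuple

# Feature maps are stored as [channel][row][col]; plain lists keep the sketch dependency-free.
FeatureMap = List[List[List[float]]]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def roi_max_pool(fmap: FeatureMap, box: Box, stride: int, out: int = 2) -> List[float]:
    """RoI max pooling: project `box` onto a map with the given stride,
    split it into an out x out grid and keep the maximum of each bin."""
    height, width = len(fmap[0]), len(fmap[0][0])
    # Project image coordinates to feature-map cells, clamped to the map.
    c1 = min(width - 1, max(0, int(box[0] // stride)))
    r1 = min(height - 1, max(0, int(box[1] // stride)))
    c2 = min(width - 1, max(c1, math.ceil(box[2] / stride) - 1))
    r2 = min(height - 1, max(r1, math.ceil(box[3] / stride) - 1))
    rows, cols = r2 - r1 + 1, c2 - c1 + 1
    pooled: List[float] = []
    for channel in fmap:
        for i in range(out):
            # Every bin covers at least one cell, even when the RoI is tiny.
            ra = r1 + (i * rows) // out
            rb = max(ra + 1, r1 + ((i + 1) * rows) // out)
            for j in range(out):
                ca = c1 + (j * cols) // out
                cb = max(ca + 1, c1 + ((j + 1) * cols) // out)
                pooled.append(max(channel[r][c] for r in range(ra, rb) for c in range(ca, cb)))
    return pooled


def l2_normalize(vec: Sequence[float], scale: float) -> List[float]:
    """L2-normalise a vector, then multiply it by `scale`."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [scale * v / norm for v in vec]


def skip_pool(fmaps: Sequence[FeatureMap], strides: Sequence[int], box: Box,
              scale: float = 1000.0) -> List[float]:
    """Skip pooling: pool the same RoI from several layers, normalise each pooled
    vector separately so layers with larger activations do not dominate, then
    concatenate. `scale` is an assumed fixed factor (ParseNet-style layers learn it)."""
    fused: List[float] = []
    for fmap, stride in zip(fmaps, strides):
        fused.extend(l2_normalize(roi_max_pool(fmap, box, stride), scale))
    return fused


def scale_branch(box: Box, threshold: float = 80.0) -> str:
    """Hypothetical router: send a proposal to a 'small' or 'large' head by its
    pixel height. The 80 px threshold is an illustrative assumption."""
    return "small" if box[3] - box[1] < threshold else "large"


if __name__ == "__main__":
    # Toy single-channel maps: a stride-4 shallow map (8x8) and a stride-8 deep map (4x4).
    shallow = [[[float(r * 8 + c) for c in range(8)] for r in range(8)]]
    deep = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
    box = (0.0, 0.0, 16.0, 32.0)  # 16 px wide, 32 px tall
    print(roi_max_pool(shallow, box, 4))  # [25.0, 27.0, 57.0, 59.0]
    print(roi_max_pool(deep, box, 8))     # [4.0, 5.0, 12.0, 13.0]
    print(len(skip_pool([shallow, deep], [4, 8], box)), scale_branch(box))  # 8 small
```

Normalising each pooled vector before concatenation is the design choice that matters in this sketch: features taken from different layers can differ greatly in magnitude, so concatenating the raw vectors would let one layer dominate the fused descriptor that is passed to the fully connected layer.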

Keywords

Pedestrian detection · Deep learning · Convolutional neural networks · Multi-scale object detection · Recurrent neural networks

Notes

Acknowledgements

This work is supported by the Industrial Core Technology Development Program of MOTIE/KEIT, KOREA [#10083639, Development of Camera-based Real-time Artificial Intelligence System for Detecting Driving Environment and Recognizing Objects on Road Simultaneously].


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, USA
