Abstract
Many deep learning approaches to pedestrian detection have proven effective, but they do not localize occluded pedestrians accurately enough. This paper proposes a novel segmentation and context network (SCN) that combines segmentation and context information to improve the accuracy of bounding box regression for pedestrian detection. The SCN model consists of a segmentation sub-model and a context sub-model. To separate pedestrian instances from the background and address pedestrian occlusion, the segmentation sub-model extracts pedestrian segmentation information and generates more accurate pedestrian regions. Because different pedestrian instances need different context information, this paper extracts context information from context regions at different scales. To improve detection performance, the context sub-model uses the hole (atrous) algorithm to expand the receptive field of the output feature maps and connects the multi-channel features through a skip layer. Finally, the loss functions of the two sub-models' outputs are fused. Experimental results on different datasets validate the effectiveness of the SCN model and show that the deeply supervised algorithm achieves a good trade-off between accuracy and complexity.
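Two ideas in the abstract can be illustrated with a minimal Python sketch: context regions taken at several scales around a pedestrian box, and the way the hole algorithm (usually called atrous or dilated convolution) widens the area a kernel covers. The abstract gives no concrete values, so the scale factors 1.0, 1.5 and 2.0, the example box, and the function names `context_regions` and `effective_kernel_size` are illustrative assumptions and not the authors' implementation.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in pixels; this box convention is an assumption for the sketch.
Box = Tuple[float, float, float, float]


def context_regions(box: Box, scales: List[float],
                    image_size: Tuple[int, int]) -> List[Box]:
    """Enlarge `box` about its centre by each factor in `scales`.

    Each enlarged region is clipped to the image, given as (width, height).
    The scale factors are hypothetical; the abstract does not state them.
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) / 2.0, (y2 - y1) / 2.0
    regions = []
    for s in scales:
        regions.append((
            max(0.0, cx - s * half_w),
            max(0.0, cy - s * half_h),
            min(float(width), cx + s * half_w),
            min(float(height), cy + s * half_h),
        ))
    return regions


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent of a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


if __name__ == "__main__":
    # Hypothetical pedestrian box in a 100 x 100 image.
    for region in context_regions((40, 20, 60, 80), [1.0, 1.5, 2.0], (100, 100)):
        print(region)
    # A 3 x 3 kernel at dilation rates 1, 2 and 4.
    print([effective_kernel_size(3, d) for d in (1, 2, 4)])
```

With a dilation rate of 1 the kernel is an ordinary convolution. Larger rates cover a wider area with the same number of weights, which is how the receptive field grows without adding parameters.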
Acknowledgements
The authors would like to thank Dollar et al. for sharing the Caltech dataset and MATLAB toolbox, Ess et al. for sharing the ETH dataset, and Dalal et al. for sharing the INRIA dataset. This work was supported in part by the National Natural Science Foundation of China under Grants 61203261, 61876099 and U1613223, in part by the China Post-Doctoral Science Foundation under Grant 2012M521335, in part by the Research Fund of Guangxi Key Lab of Multi-source Information Mining and Security under Grant MIMS16-02, in part by the Shenzhen Science and Technology Research and Development Funds under Grant JCYJ20170307093018753, and in part by the Fundamental Research Funds of Shandong University under Grant 2018JCG07.
Cite this article
Li, Z., Chen, Z., Jonathan Wu, Q.M. et al. Pedestrian detection via deep segmentation and context network. Neural Comput & Applic 32, 5845–5857 (2020). https://doi.org/10.1007/s00521-019-04057-4
DOI: https://doi.org/10.1007/s00521-019-04057-4