Exploiting context based on CNN and coding representations for pedestrian co-detection

Jiang, Linfeng; Ji, Jinsheng; Zhong, Weilin; Zhang, Tao; Xiong, Huilin

doi:10.1007/s11042-018-6806-7

Published: 31 October 2018

Volume 79, pages 4277–4296, (2020)

Multimedia Tools and Applications

Abstract

Object co-detection methods have shown that exploiting contextual information across multiple images can significantly improve detection performance. In this paper, we propose a pedestrian co-detection method that combines the strengths of convolutional neural networks (CNNs) and locality-constrained linear coding (LLC) in a unified conditional random field (CRF) model. First, we obtain object candidates using the region proposal network (RPN) of Faster R-CNN. Second, we build a fully connected CRF consisting of unary potentials on individual object candidates and two types of pairwise potentials on pairs of object candidates. The unary potential is computed independently for each object candidate using the baseline method. The pairwise potentials, one based on multiscale CNN representations and the other on LLC representations, capture relationships among object candidates across all the test images. Finally, we jointly predict the category labels of all object candidates through mean-field inference in the CRF. We evaluate the proposed method on the ETH, Caltech, and INRIA pedestrian datasets. The experimental results demonstrate its effectiveness compared with the baseline method.
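The joint labeling step described in the abstract can be illustrated with a minimal sketch of mean-field inference in a fully connected CRF over object candidates. Everything specific here is an assumption for illustration, not the authors' formulation: the Gaussian kernel form, the Potts label compatibility, the bandwidths and weights, the naive O(N²) message passing, and the names `gaussian_kernel`, `mean_field`, `cnn_feats`, and `llc_codes` are all hypothetical. The random arrays merely stand in for detector scores, multiscale CNN features, and LLC codes.

```python
import numpy as np


def gaussian_kernel(feats, bandwidth):
    """Dense Gaussian affinity between all candidate pairs (zero diagonal)."""
    sq = np.sum(feats ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T, 0.0)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(k, 0.0)  # a candidate sends no message to itself
    return k


def softmax(x):
    """Row-wise softmax with the usual max-subtraction for stability."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)


def mean_field(unary, kernels, weights, n_iters=10):
    """Naive mean-field inference with a Potts label compatibility.

    unary:   (N, L) negative log-probabilities from a base detector.
    kernels: list of (N, N) pairwise affinity matrices.
    weights: one non-negative weight per kernel (assumed values).
    Returns an (N, L) array of approximate label marginals.
    """
    q = softmax(-unary)
    for _ in range(n_iters):
        # Each candidate collects the beliefs of all others, weighted by affinity.
        msg = sum(w * (k @ q) for w, k in zip(weights, kernels))
        # Under a Potts model, penalizing disagreement is equivalent (up to
        # normalization) to rewarding agreement with similar candidates.
        q = softmax(-unary + msg)
    return q


# Toy data: 6 candidates, 2 labels (0 = background, 1 = pedestrian).
rng = np.random.default_rng(0)
scores = rng.uniform(0.05, 0.95, size=6)           # stand-in detector scores
unary = -np.log(np.stack([1.0 - scores, scores], axis=1))
cnn_feats = rng.normal(size=(6, 16))               # stand-in CNN features
llc_codes = rng.normal(size=(6, 32))               # stand-in LLC codes

kernels = [gaussian_kernel(cnn_feats, 4.0), gaussian_kernel(llc_codes, 6.0)]
marginals = mean_field(unary, kernels, weights=[1.0, 0.5])
labels = marginals.argmax(axis=1)                  # jointly predicted labels
```

In this sketch, a candidate with an ambiguous detector score is pulled toward the label of confidently scored candidates that resemble it in either feature space, which is the intuition behind sharing context across images. Practical fully connected CRFs typically replace the dense matrix products with fast approximate filtering.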

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61673274, 61375008, and 61075106.

Author information

Authors and Affiliations

Department of Automation, Shanghai Jiao Tong University, 800 Dongchuan Rd., Shanghai, 200240, China
Linfeng Jiang, Jinsheng Ji, Weilin Zhong, Tao Zhang & Huilin Xiong

Corresponding authors

Correspondence to Linfeng Jiang or Huilin Xiong.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jiang, L., Ji, J., Zhong, W. et al. Exploiting context based on CNN and coding representations for pedestrian co-detection. Multimed Tools Appl 79, 4277–4296 (2020). https://doi.org/10.1007/s11042-018-6806-7

Received: 08 May 2018
Revised: 13 September 2018
Accepted: 23 October 2018
Published: 31 October 2018
Issue Date: February 2020
DOI: https://doi.org/10.1007/s11042-018-6806-7
