
Exploiting context based on CNN and coding representations for pedestrian co-detection

Multimedia Tools and Applications

Abstract

Object co-detection methods have shown that exploiting contextual information across multiple images can significantly improve detection performance. In this paper, we propose a pedestrian co-detection method that combines the strengths of convolutional neural networks (CNNs) and locality-constrained linear coding (LLC) in a unified conditional random field (CRF) model. First, we obtain object candidates using the region proposal network (RPN) of Faster R-CNN. Second, we build a fully connected CRF that consists of unary potentials on individual object candidates and two types of pairwise potentials on pairs of object candidates. The unary potential is computed independently for each object candidate using the baseline method. The pairwise potentials, one based on multiscale CNN representations and one based on LLC representations, capture relationships among object candidates across all the test images. Finally, we jointly predict the category labels of all object candidates through mean field inference in the CRF. We evaluated the proposed method on the ETH, Caltech, and INRIA pedestrian datasets. The experimental results demonstrate that the proposed method is effective compared with the baseline method.
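The joint labelling step can be illustrated with a minimal sketch of mean field inference in a fully connected CRF over object candidates. This is not the authors' implementation: the abstract does not give the form of the potentials, so the Gaussian kernels, the Potts label compatibility, the kernel weights and bandwidths, the dense N x N computation, and the toy feature vectors standing in for the multiscale CNN and LLC representations are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(feats, bandwidth):
    """Dense affinity k(i, j) = exp(-||f_i - f_j||^2 / (2 * bandwidth^2)), zero diagonal."""
    sq = np.sum(feats ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T, 0.0)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(k, 0.0)  # a candidate sends no message to itself
    return k

def softmax(x):
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_field(unary_prob, kernels, weights, n_iters=10):
    """Approximate marginals of a fully connected CRF with Potts pairwise terms.

    unary_prob: (N, L) class probabilities per candidate from a baseline detector.
    kernels:    list of (N, N) affinity matrices, one per pairwise potential.
    weights:    one scalar weight per kernel (hypothetical values, not from the paper).
    """
    unary_energy = -np.log(np.clip(unary_prob, 1e-8, 1.0))
    affinity = sum(w * k for w, k in zip(weights, kernels))
    q = softmax(-unary_energy)
    for _ in range(n_iters):
        msg = affinity @ q  # (N, L): label support gathered from similar candidates
        # Potts penalty for label l: affinity mass currently placed on labels other than l
        pairwise_energy = msg.sum(axis=1, keepdims=True) - msg
        q = softmax(-unary_energy - pairwise_energy)
    return q

# Toy data: labels are (background, pedestrian). Candidates 0-2 look alike,
# candidate 3 is dissimilar. Candidate 2 has a weak, slightly wrong unary score.
unary = np.array([[0.10, 0.90],
                  [0.10, 0.90],
                  [0.55, 0.45],
                  [0.90, 0.10]])
cnn_feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]])  # stand-in for CNN features
llc_feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [-8.0, 9.0]])   # stand-in for LLC codes

kernels = [gaussian_kernel(cnn_feats, 1.0), gaussian_kernel(llc_feats, 1.0)]
q = mean_field(unary, kernels, weights=[0.5, 0.5])
print(np.round(q, 3))
```

In this toy setting, candidate 2 is pulled toward the pedestrian label by the two confident candidates it resembles, while the dissimilar candidate 3 is left essentially at its unary score. The dense affinity matrix costs O(N^2) memory and time; filtering-based approximations exist for large candidate sets but are outside the scope of this sketch.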

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61673274, 61375008, and 61075106.

Author information

Corresponding authors

Correspondence to Linfeng Jiang or Huilin Xiong.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jiang, L., Ji, J., Zhong, W. et al. Exploiting context based on CNN and coding representations for pedestrian co-detection. Multimed Tools Appl 79, 4277–4296 (2020). https://doi.org/10.1007/s11042-018-6806-7

  • DOI: https://doi.org/10.1007/s11042-018-6806-7
