
SPID: Surveillance Pedestrian Image Dataset and Performance Evaluation for Pedestrian Detection

  • Dan Wang
  • Chongyang Zhang
  • Hao Cheng
  • Yanfeng Shang
  • Lin Mei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10118)

Abstract

Pedestrian detection is highly valued in intelligent surveillance systems. However, most existing pedestrian datasets were collected from non-surveillance videos, so there are significant differences between this self-collected data and practical surveillance data in resolution, illumination, viewpoint, and occlusion. Because of these differences, most existing pedestrian detection algorithms developed on traditional datasets can hardly be applied to surveillance applications directly. To fill this gap, we constructed a Surveillance Pedestrian Image Dataset (SPID), in which all images were collected from in-service surveillance systems, and used it to evaluate existing pedestrian detection (PD) methods. The dataset covers a variety of surveillance scenes, pedestrian scales, viewpoints, and illumination conditions. Four traditional PD algorithms based on hand-crafted features and one deep-learning-based PD method are evaluated on SPID and on well-known existing pedestrian datasets such as INRIA and Caltech. The experimental ROC curves show that all of these algorithms perform worse on SPID than on the INRIA and Caltech datasets, confirming that the differences between non-surveillance data and real surveillance data degrade PD performance; the main factors are scale, viewpoint, illumination, and occlusion. A dedicated surveillance pedestrian dataset is therefore necessary. We believe that the release of SPID can stimulate innovative research on the challenging and important problem of surveillance pedestrian detection. SPID is available online at: http://ivlab.sjtu.edu.cn/best/Data/List/Datasets.
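
As an illustration of the kind of classical baseline evaluated in the paper, the following minimal sketch (Python with OpenCV, not the authors' evaluation code) runs the stock HOG + linear-SVM people detector on a single frame and matches its detections against ground-truth boxes; the image path, the annotation, and the IoU threshold are illustrative assumptions, not part of the paper's protocol.

import cv2

# Stock HOG + linear-SVM people detector shipped with OpenCV.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Hypothetical surveillance frame; the path is an illustrative assumption.
frame = cv2.imread("surveillance_frame.jpg")
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8), scale=1.05)

def iou(a, b):
    # Intersection-over-union of two (x, y, w, h) boxes.
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# Hypothetical ground-truth annotation for this frame.
ground_truth = [(120, 60, 40, 100)]

# Match detections to ground truth at IoU >= 0.5 (the usual PASCAL criterion).
matched = sum(any(iou(gt, det) >= 0.5 for det in boxes) for gt in ground_truth)
print("detections:", len(boxes), "matched GT:", matched, "missed:", len(ground_truth) - matched)

In an actual benchmark run, this per-frame matching would be repeated over every annotated image, and misses and false positives would be aggregated while sweeping the detection score threshold to trace the per-dataset ROC curves compared in the paper.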

Keywords

Height Distribution · Convolutional Neural Network · Surveillance Camera · Pedestrian Detection · Deep Convolutional Neural Network

Notes

Acknowledgement

This work was partly funded by NSFC (No. 61571297, No. 61527804), the 111 Project (B07022), and the China National Key Technology R&D Program (No. 2012BAH07B01). The authors also thank the following organizations for providing surveillance data: SEIEE of Shanghai Jiao Tong University, the Third Research Institute of the Ministry of Public Security, Tianjin Tiandy Digital Technology Co., Shanghai Jian Qiao University, and the Qingpu Branch of the Shanghai Public Security Bureau.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Dan Wang (1)
  • Chongyang Zhang (1)
  • Hao Cheng (1)
  • Yanfeng Shang (2)
  • Lin Mei (2)
  1. Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. The Third Research Institute of the Ministry of Public Security, Shanghai, China
