BEST: Benchmark and Evaluation of Surveillance Task

  • Chongyang Zhang
  • Bingbing Ni
  • Li Song
  • Guangtao Zhai
  • Xiaokang Yang
  • Wenjun Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10118)


Smart (intelligent) video surveillance technology plays a central role in emerging smart city systems. Most intelligent visual algorithms require large-scale image and video datasets to train classifiers or learn discriminative features. However, most existing datasets are collected under non-surveillance conditions and differ significantly from practical surveillance data. As a consequence, many existing intelligent visual algorithms trained on traditional datasets perform poorly in real-world surveillance applications. We believe the lack of high-quality surveillance datasets has greatly limited the application of computer vision algorithms in practical surveillance scenarios. To address this problem, this work develops a large-scale, comprehensive surveillance image and video database and test platform, called Benchmark and Evaluation of Surveillance Task (abbreviated as BEST). The original images and videos in BEST were all collected from surveillance cameras in active use, and have been carefully selected to cover a wide and balanced range of outdoor surveillance scenarios. Compared with existing surveillance and non-surveillance datasets, the proposed BEST dataset provides a realistic, extensive and diversified testbed for more comprehensive performance evaluation. Our experimental results show that seven pedestrian detection algorithms perform worse on BEST than on existing datasets. This highlights the difference between non-surveillance data and real surveillance data, which is the major cause of the performance decrease. The dataset is open to the public and can be downloaded at:



This work was partly funded by NSFC (No. 61571297, No. 61371146, No. 61527804, No. 61521062), the 111 Program (B07022), and the China National Key Technology R&D Program (No. 2012BAH07B01). The authors also thank the following organizations for providing surveillance data: SEIEE of Shanghai Jiao Tong University, the Third Research Institute of the Ministry of Public Security, Tianjin Tiandy Digital Technology Co., Shanghai Jian Qiao University, and the Qingpu Branch of the Shanghai Public Security Bureau.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Chongyang Zhang (1)
  • Bingbing Ni (1)
  • Li Song (1)
  • Guangtao Zhai (1)
  • Xiaokang Yang (1)
  • Wenjun Zhang (1)
  1. Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
