Learning Pedestrian Detection from Virtual Worlds

  • Giuseppe Amato
  • Luca Ciampi
  • Fabrizio Falchi
  • Claudio Gennaro
  • Nicola Messina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11751)


In this paper, we present a real-time pedestrian detection system trained using a virtual environment. Pedestrian detection is a very popular research topic with countless practical applications, and recently there has been increasing interest in deep learning architectures for this task. However, the availability of large labeled datasets is essential for training such algorithms effectively. For this reason, in this work we introduce ViPeD, a new synthetically generated set of images extracted from a realistic 3D video game, where labels can be generated automatically by exploiting 2D pedestrian positions obtained from the graphics engine. We exploit this new synthetic dataset to fine-tune a state-of-the-art, computationally efficient Convolutional Neural Network (CNN). A preliminary experimental evaluation, compared against the performance of other existing approaches trained on real-world images, shows encouraging results.
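The automatic labeling step described above amounts to turning 2D pedestrian positions exported by the graphics engine into bounding-box annotations in a format a detector can consume. The sketch below is a hypothetical illustration (the function name, box convention, and YOLO-style output format are assumptions, not the paper's actual pipeline): it converts a pixel-space box into a normalized class/center/size label line.

```python
def to_yolo_label(box, img_w, img_h, class_id=0):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into a
    YOLO-style label line: class id, center-x, center-y, width, height,
    all normalized to [0, 1] by the image dimensions."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0 / img_w   # normalized box center, x
    cy = (y_min + y_max) / 2.0 / img_h   # normalized box center, y
    w = (x_max - x_min) / img_w          # normalized box width
    h = (y_max - y_min) / img_h          # normalized box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# Example: one pedestrian box on a 1920x1080 game frame
print(to_yolo_label((960, 270, 1056, 540), 1920, 1080))
```

Because the box coordinates come directly from the engine rather than a human annotator, labels of this kind can be produced for every rendered frame at no annotation cost.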



This work was partially supported by the AI4EU project, funded by the EC (H2020 - Contract n. 825619). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Jetson TX2 board used for this research.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Giuseppe Amato (1)
  • Luca Ciampi (1), corresponding author
  • Fabrizio Falchi (1)
  • Claudio Gennaro (1)
  • Nicola Messina (1)

  1. Institute of Information Science and Technologies (ISTI), Italian National Research Council (CNR), Pisa, Italy
