Pedestrian Behavior Understanding and Prediction with Deep Neural Networks

  • Shuai Yi
  • Hongsheng Li
  • Xiaogang Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9905)


In this paper, a deep neural network (Behavior-CNN) is proposed to model pedestrian behaviors in crowded scenes, a task with many applications in surveillance. A pedestrian behavior encoding scheme is designed to provide a general representation of walking paths, which can be used as both the input and the output of the CNN. The proposed Behavior-CNN is trained with real-scene crowd data and then thoroughly investigated from multiple aspects, including the location map and location awareness property, the semantic meanings of learned filters, and the influence of receptive fields on behavior modeling. Multiple applications, including walking path prediction, destination prediction, and tracking, demonstrate the effectiveness of Behavior-CNN for pedestrian behavior modeling.
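The abstract does not spell out the encoding scheme, so the following is only a minimal sketch of one plausible form, suggested by the "displacement volume" keyword: each pedestrian's past positions are stored as displacement vectors at the cell of the current position. The function names (`encode_paths`, `decode_cell`), the grid layout, the normalization by grid size, and the offset of 1 are all assumptions made for illustration, not the authors' implementation.

```python
def encode_paths(paths, width, height):
    """Encode walking paths into a width x height x 2M displacement volume.

    paths: list of paths, each a list of M+1 (x, y) integer grid positions,
           oldest first, with the last entry being the current position.
    All names and scaling choices here are hypothetical.
    """
    if not paths:
        raise ValueError("need at least one path")
    steps = len(paths[0]) - 1  # M past positions per pedestrian
    # volume[x][y] holds 2M channels; all zeros means "no pedestrian here".
    volume = [[[0.0] * (2 * steps) for _ in range(height)]
              for _ in range(width)]
    for path in paths:
        if len(path) != steps + 1:
            raise ValueError("all paths must have the same length")
        cx, cy = path[-1]
        cell = volume[cx][cy]
        for t, (px, py) in enumerate(path[:-1]):
            # Displacement from a past position to the current one, scaled
            # by the grid size and shifted by 1 (assumed offset) so that
            # occupied cells differ from empty, all-zero cells.
            cell[2 * t] = 1.0 + (cx - px) / width
            cell[2 * t + 1] = 1.0 + (cy - py) / height
    return volume


def decode_cell(cell, current, width, height):
    """Recover the past grid positions stored in one occupied cell."""
    cx, cy = current
    past = []
    for t in range(len(cell) // 2):
        px = cx - round((cell[2 * t] - 1.0) * width)
        py = cy - round((cell[2 * t + 1] - 1.0) * height)
        past.append((px, py))
    return past


# One pedestrian who walked (2, 3) -> (3, 3) -> (4, 4) on an 8 x 8 grid.
volume = encode_paths([[(2, 3), (3, 3), (4, 4)]], width=8, height=8)
past_positions = decode_cell(volume[4][4], (4, 4), width=8, height=8)
```

Because the same volume layout can describe either past or future displacements, a representation of this kind can serve as both the input and the output of a convolutional network, which is the property the abstract highlights.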


Receptive Field; Association Strategy; Displacement Volume; Deep Neural Network; Walking Pattern
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was supported in part by the Ph.D. Programs Foundation of China under Grant 20130185120039, in part by the Hong Kong Innovation and Technology Support Programme under Grant ITS/221/13FP, in part by the National Natural Science Foundation of China under Grant 61371192 and Grant 61301269, and in part by the General Research Fund through the Research Grants Council, Hong Kong, under Grant CUHK14206114, Grant CUHK14205615, Grant CUHK419412, and Grant CUHK14203015.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Department of Electronic Engineering, Chinese University of Hong Kong, Hong Kong, China
  2. Sensetime Group Limited, Hong Kong, China
