
Towards Intelligent Crowd Behavior Understanding Through the STFD Descriptor Exploration

  • Original Paper

Sensing and Imaging

Abstract

Automated, online detection of crowd anomalies from surveillance CCTV footage is a research-intensive and application-critical task. This paper proposes a novel technique for detecting crowd abnormalities by analyzing the spatial and temporal features of input video signals. The integrated solution defines an image descriptor, the spatio-temporal feature descriptor (STFD), that reflects the global motion pattern of crowds over time. A purpose-designed convolutional neural network (CNN) then classifies dominant or large-scale abnormal crowd behaviors. The reported work focuses on: (1) detecting moving objects in an online (near real-time) manner through spatio-temporal segmentation of crowds, identified by the similarity of group trajectory structures in the temporal space and by foreground blocks from a Gaussian mixture model in the spatial space; (2) dividing the scene into multiple clustered groups via spectral clustering, treating image pixels from the segmented regions as dynamic particles; (3) creating STFD descriptor instances by computing attributes such as collectiveness, stability, conflict and crowd density for the individuals (particles) in each group; (4) feeding the generated STFD descriptor instances into the devised CNN to detect suspicious crowd behaviors. For testing and evaluation, the PETS database was selected as the primary experimental dataset. Results against benchmark models and systems show promising advances in both accuracy and efficiency for crowd anomaly detection.
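To make step (3) concrete, the following is a minimal, illustrative sketch of computing group-level attributes of the kind the STFD descriptor aggregates (collectiveness, conflict, density). The formulas here are simplified stand-ins chosen for clarity, not the paper's exact definitions: collectiveness is approximated as the mean pairwise cosine similarity of particle motion directions, conflict as proximity-weighted directional disagreement, and density as particles per unit bounding-box area. The function name and inputs (per-group particle positions and optical-flow velocities) are assumptions for illustration.

```python
import numpy as np

def group_descriptors(positions, velocities):
    """Illustrative per-group attributes in the spirit of the STFD descriptor.

    positions  : (N, 2) array of particle coordinates for one clustered group
    velocities : (N, 2) array of motion vectors (e.g. from optical flow)
    Returns [collectiveness, conflict, density]; the paper's formulas differ.
    """
    # Normalize motion vectors so only direction matters.
    v = velocities / (np.linalg.norm(velocities, axis=1, keepdims=True) + 1e-9)
    sim = v @ v.T                      # pairwise cosine similarity, (N, N)
    n = len(v)
    # Collectiveness: mean off-diagonal similarity (1.0 = fully coherent motion).
    collectiveness = (sim.sum() - n) / (n * (n - 1)) if n > 1 else 1.0
    # Conflict: directional disagreement weighted by spatial proximity.
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=2)
    w = np.exp(-d)
    conflict = float((w * (1.0 - sim)).sum() / (w.sum() + 1e-9))
    # Density: particles per unit area of the group's bounding box.
    area = float(np.prod(positions.max(axis=0) - positions.min(axis=0))) + 1e-9
    density = n / area
    return np.array([collectiveness, conflict, density])
```

One such feature vector per group per frame window, stacked over time, would form a descriptor map suitable as CNN input, which is the general shape of the pipeline the abstract describes.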



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61203172), the SSTP of Sichuan (Nos. 2018YYJC0994 and 2017JY0011), and the Shenzhen STPP (No. GJHZ20160301164521358).

Author information


Corresponding author

Correspondence to Yuanping Xu.

Additional information

This article is part of the Topical Collection on Recent Developments in Sensing and Imaging.


About this article


Cite this article

Xu, Y., Lu, L., Xu, Z. et al. Towards Intelligent Crowd Behavior Understanding Through the STFD Descriptor Exploration. Sens Imaging 19, 17 (2018). https://doi.org/10.1007/s11220-018-0201-3

