
Towards Intelligent Crowd Behavior Understanding Through the STFD Descriptor Exploration

  • Original Paper

Sensing and Imaging

Abstract

Automated, online detection of crowd anomalies from surveillance CCTV footage is a research-intensive and application-critical task. This paper proposes a novel technique for detecting crowd abnormalities by analyzing the spatial and temporal features of input video signals. The integrated solution defines an image descriptor, the spatio-temporal feature descriptor (STFD), that reflects the global motion pattern of crowds over time. A purpose-designed convolutional neural network (CNN) then classifies dominant or large-scale abnormal crowd behaviors. The reported work focuses on: (1) detecting moving objects in an online (near real-time) manner through spatio-temporal segmentation of crowds, identified by the similarity of group trajectory structures in the temporal space and by foreground blocks from a Gaussian mixture model in the spatial space; (2) dividing the scene into multiple clustered groups via spectral clustering, treating image pixels from the segmented regions as dynamic particles; (3) creating STFD descriptor instances by computing attributes such as collectiveness, stability, conflict and crowd density for the individuals (particles) in each group; (4) feeding the generated STFD descriptor instances into the devised CNN to detect suspicious crowd behaviors. For testing and evaluation, the PETS database was selected as the primary experimental dataset. Results against benchmark models and systems show promising advances in both accuracy and efficiency for crowd anomaly detection.
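To make step (3) concrete, the following is a minimal, illustrative sketch of computing group-level attributes of the kind the STFD descriptor aggregates (collectiveness, conflict, density). The formulas here are simplified stand-ins chosen for clarity, not the paper's exact definitions: collectiveness is approximated as the mean pairwise cosine similarity of particle motion directions, conflict as proximity-weighted directional disagreement, and density as particles per unit bounding-box area. The function name and inputs (per-group particle positions and optical-flow velocities) are assumptions for illustration.

```python
import numpy as np

def group_descriptors(positions, velocities):
    """Illustrative per-group attributes in the spirit of the STFD descriptor.

    positions  : (N, 2) array of particle coordinates for one clustered group
    velocities : (N, 2) array of motion vectors (e.g. from optical flow)
    Returns [collectiveness, conflict, density]; the paper's formulas differ.
    """
    # Normalize motion vectors so only direction matters.
    v = velocities / (np.linalg.norm(velocities, axis=1, keepdims=True) + 1e-9)
    sim = v @ v.T                      # pairwise cosine similarity, (N, N)
    n = len(v)
    # Collectiveness: mean off-diagonal similarity (1.0 = fully coherent motion).
    collectiveness = (sim.sum() - n) / (n * (n - 1)) if n > 1 else 1.0
    # Conflict: directional disagreement weighted by spatial proximity.
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=2)
    w = np.exp(-d)
    conflict = float((w * (1.0 - sim)).sum() / (w.sum() + 1e-9))
    # Density: particles per unit area of the group's bounding box.
    area = float(np.prod(positions.max(axis=0) - positions.min(axis=0))) + 1e-9
    density = n / area
    return np.array([collectiveness, conflict, density])
```

One such feature vector per group per frame window, stacked over time, would form a descriptor map suitable as CNN input, which is the general shape of the pipeline the abstract describes.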



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61203172), the SSTP of Sichuan (Nos. 2018YYJC0994 and 2017JY0011), and the Shenzhen STPP (No. GJHZ20160301164521358).

Author information


Corresponding author

Correspondence to Yuanping Xu.

Additional information

This article is part of the Topical Collection on Recent Developments in Sensing and Imaging.


About this article


Cite this article

Xu, Y., Lu, L., Xu, Z. et al. Towards Intelligent Crowd Behavior Understanding Through the STFD Descriptor Exploration. Sens Imaging 19, 17 (2018). https://doi.org/10.1007/s11220-018-0201-3

