Abstract
In recent years, crowd behavior prediction (CBP) has attracted considerable academic attention because it can help prevent crowd disasters. CBP has been formulated either as a one-class classification (OCC) problem or as a multi-class classification (MCC) problem. OCC-based CBP models learn normal crowd behavior patterns and treat outliers as anomalies, that is, as abnormal crowd behaviors. However, these models do not distinguish between anomaly types and treat all anomalies as a single class. MCC-based CBP models overcome this drawback, but very few MCC datasets and models have been proposed. Current state-of-the-art MCC-based CBP approaches exploit spatial–temporal features but do not address two crucial challenges in crowd scenes: (a) human-scale variation caused by perspective distortion and (b) the effects of cluttered backgrounds. To this end, an end-to-end trainable two-stream multiscale deep architecture is proposed for MCC-based CBP. The first stream uses a deep convolutional neural network to extract multiscale spatial features from the frames, which handles human-scale variation. The second stream extracts multiscale temporal features from de-background frames using a multi-layer dilated convolutional long short-term memory network. The de-background frames are obtained with the visual background extractor (ViBe) algorithm, which minimizes the effect of the cluttered background. The multiscale features from the two streams are concatenated and used to classify different crowd behaviors. Experiments are conducted on two large-scale crowd behavior datasets, MED and GTA. The results show that the proposed model outperforms state-of-the-art MCC-based CBP approaches.
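The data flow described in the abstract (raw frames into a spatial stream, de-background frames into a temporal stream, then concatenation) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names are hypothetical, a per-pixel temporal median stands in for ViBe, block averaging stands in for the learned convolutional features, and a fixed dilated 1D filter over per-frame foreground energy stands in for the dilated convolutional LSTM.

```python
from statistics import median


def de_background(frames, threshold=10):
    """Zero out pixels close to a per-pixel temporal median background.

    Assumption: a plain median model, NOT the sample-based ViBe algorithm.
    frames: list of equally sized 2D lists of grayscale values.
    """
    height, width = len(frames[0]), len(frames[0][0])
    background = [
        [median(f[r][c] for f in frames) for c in range(width)]
        for r in range(height)
    ]
    return [
        [
            [f[r][c] if abs(f[r][c] - background[r][c]) > threshold else 0
             for c in range(width)]
            for r in range(height)
        ]
        for f in frames
    ]


def multiscale_spatial_features(frame, scales=(1, 2)):
    """Average one raw frame over non-overlapping s x s blocks for each scale s."""
    height, width = len(frame), len(frame[0])
    features = []
    for s in scales:
        for r in range(0, height - s + 1, s):
            for c in range(0, width - s + 1, s):
                block = [frame[r + i][c + j] for i in range(s) for j in range(s)]
                features.append(sum(block) / (s * s))
    return features


def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D cross-correlation whose kernel taps are `dilation` steps apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def multiscale_temporal_features(foreground_frames, dilations=(1, 2)):
    """Filter per-frame foreground energy at several dilation rates."""
    energy = [sum(sum(row) for row in f) for f in foreground_frames]
    features = []
    for d in dilations:
        features.extend(dilated_conv1d(energy, [-1, 1], d))
    return features


def two_stream_descriptor(frames):
    """Concatenate spatial features (raw frame) and temporal features (foreground)."""
    spatial = multiscale_spatial_features(frames[0])
    temporal = multiscale_temporal_features(de_background(frames))
    return spatial + temporal


# Toy clip: a bright object (200) moves over a static background (50), then leaves.
demo_frames = [
    [[200, 50], [50, 50]],
    [[50, 200], [50, 50]],
    [[50, 50], [200, 50]],
    [[50, 50], [50, 200]],
    [[50, 50], [50, 50]],
]
print(two_stream_descriptor(demo_frames))
```

In the proposed model both streams are learned end to end and the concatenated features feed a classifier; the sketch only shows why the temporal stream sees motion without the static background and how larger dilation rates compare frames that are further apart.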
Acknowledgements
The support and the resources provided by the 'PARAM Shivay Facility' under the National Supercomputing Mission, Government of India, at the Indian Institute of Technology, Varanasi, are gratefully acknowledged.
Additional information
Communicated by Ichiro IDE.
About this article
Cite this article
Tripathy, S.K., Kostha, H. & Srivastava, R. TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction. Multimedia Systems 29, 15–31 (2023). https://doi.org/10.1007/s00530-022-00975-x
DOI: https://doi.org/10.1007/s00530-022-00975-x