TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction

Tripathy, Santosh Kumar; Kostha, Harsh; Srivastava, Rajeev

doi:10.1007/s00530-022-00975-x

Regular Paper
Published: 21 July 2022

Multimedia Systems, Volume 29, pages 15–31 (2023)

Abstract

In recent years, crowd behavior prediction (CBP) has attracted considerable academic attention because it can help control crowd disasters. CBP has been framed either as a one-class classification (OCC) problem or as a multi-class classification (MCC) problem. OCC-based CBP models learn normal crowd behavior patterns and treat outliers as anomalies, that is, as abnormal crowd behaviors. However, these models do not distinguish between anomaly types and treat them all as a single class. MCC-based CBP models overcome this drawback, but very few datasets and models have been proposed for them. Current state-of-the-art MCC-based CBP approaches exploit spatial–temporal features but do not address two crucial challenges in crowd scenes: (a) human-scale variation due to perspective distortion and (b) the effects of cluttered backgrounds. To this end, an end-to-end trainable two-stream multiscale deep architecture is proposed for MCC-based CBP. The first stream uses a deep convolutional neural network to extract multiscale spatial features from the frames, which handles human-scale variation. The second stream extracts multiscale temporal features from de-background frames using a multi-layer dilated convolutional long short-term memory network. The de-background frames are obtained with a visual background extractor algorithm, which minimizes the effect of the cluttered background. The multiscale features from the two streams are concatenated and used to classify different crowd behaviors. Experiments are conducted on two large-scale crowd behavior datasets, MED and GTA. The results show that the proposed model performs better than state-of-the-art MCC-based CBP approaches.
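
The de-backgrounding step mentioned in the abstract can be illustrated with a small sketch. The code below is a simplified, ViBe-inspired sample-based background subtractor in pure Python; it is not the authors' implementation. The function names (`init_model`, `debackground`), the parameter values (`n_samples`, `radius`, `min_matches`, `subsample`), the single-frame self-initialisation, and the choice to zero out background pixels are all assumptions made for illustration.

```python
import random


def init_model(frame, n_samples=20):
    # Assumption/simplification: every sample starts as the pixel's own
    # first-frame value (the original ViBe draws initial samples from the
    # pixel's spatial neighbourhood instead).
    return [[[px] * n_samples for px in row] for row in frame]


def debackground(frame, model, radius=20, min_matches=2, subsample=16, rng=random):
    """Return a frame in which background pixels are set to 0.

    A pixel counts as background when at least `min_matches` stored
    samples lie within `radius` of its current intensity.
    """
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            px = frame[y][x]
            matches = sum(1 for s in model[y][x] if abs(s - px) < radius)
            if matches >= min_matches:
                # Background pixel: with probability 1/subsample, overwrite
                # one random sample of its own model...
                if rng.randrange(subsample) == 0:
                    samples = model[y][x]
                    samples[rng.randrange(len(samples))] = px
                # ...and, independently, one random sample of a nearby model.
                if rng.randrange(subsample) == 0:
                    ny = min(max(y + rng.choice((-1, 0, 1)), 0), h - 1)
                    nx = min(max(x + rng.choice((-1, 0, 1)), 0), w - 1)
                    samples = model[ny][nx]
                    samples[rng.randrange(len(samples))] = px
            else:
                # Foreground pixel: keep its intensity in the output.
                out[y][x] = px
    return out


# Toy usage: a static 3x4 grayscale background with one changed pixel.
background = [[100] * 4 for _ in range(3)]
model = init_model(background)
frame = [row[:] for row in background]
frame[1][2] = 200  # hypothetical foreground object
result = debackground(frame, model)
```

In this toy run only the changed pixel survives in `result`; every other position is zeroed. In a two-stream design such as the one described above, frames processed this way would feed the temporal stream, while the unmodified frames would feed the spatial stream.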

Acknowledgements

The support and the resources provided by the ‘PARAM Shivay Facility’ under the National Supercomputing Mission, Government of India, at the Indian Institute of Technology, Varanasi, are gratefully acknowledged.

Author information

Authors and Affiliations

Computing and Vision Lab, Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, UP, 221005, India
Santosh Kumar Tripathy & Rajeev Srivastava
Department of Chemical Engineering, Indian Institute of Technology (BHU), Varanasi, UP, 221005, India
Harsh Kostha

Corresponding author

Correspondence to Santosh Kumar Tripathy.

Additional information

Communicated by Ichiro IDE.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tripathy, S.K., Kostha, H. & Srivastava, R. TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction. Multimedia Systems 29, 15–31 (2023). https://doi.org/10.1007/s00530-022-00975-x

Received: 05 September 2021
Accepted: 27 June 2022
Published: 21 July 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s00530-022-00975-x
