Abstract
In recent years, video-based crowd counting and density estimation (CCDE) has become essential for crowd analysis. Current approaches rarely exploit spatial–temporal features for CCDE, and they usually take no measures to reduce the influence of the frame background when estimating crowd density maps, which degrades performance as measured by Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). They also neglect attention to how each individual feature set responds to crowd counting. To address these gaps, we design an end-to-end trainable attentive multi-stream convolutional neural network (AMS-CNN) for crowd counting. First, a multi-stream CNN (MS-CNN) is designed to obtain crowd density maps. The MS-CNN comprises three streams that fuse deep spatial, temporal, and spatial foreground features from different cues of the crowd video dataset: frames, volumes of frames, and frame foregrounds. To improve accuracy, we design three stream-wise attention modules that generate attentive crowd density maps, and their relative average is obtained using a relative averaged attentive density-map (RAAD) layer. The relative averaged density map is concatenated with the MS-CNN output and passed through two-stage CNN blocks to produce the final density map. Experiments are conducted on three publicly available crowd density video datasets: Mall, UCSD, and Venice. The results are promising and improve on state-of-the-art approaches in terms of MAE and RMSE.
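The abstract reports results as MAE and RMSE over crowd counts. As a minimal sketch of how these two metrics are commonly computed in crowd counting, the example below assumes that the predicted count of a frame is the sum of its predicted density map; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math


def count_from_density(density_map):
    """Estimated count: sum of all values in a 2-D density map (list of rows)."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean Absolute Error between predicted and ground-truth counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def rmse(pred_counts, true_counts):
    """Root Mean Squared Error between predicted and ground-truth counts."""
    n = len(true_counts)
    squared = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(squared / n)


# Toy data (hypothetical): two tiny "density maps" and their true counts.
pred_maps = [
    [[0.5, 0.5], [1.0, 0.0]],  # sums to 2.0
    [[1.0, 1.0], [1.0, 2.0]],  # sums to 5.0
]
true_counts = [3, 4]

pred_counts = [count_from_density(m) for m in pred_maps]
print("MAE:", mae(pred_counts, true_counts))
print("RMSE:", rmse(pred_counts, true_counts))
```

Both metrics are computed per frame and averaged over the test set. RMSE penalizes large per-frame errors more heavily than MAE, which is why the two are usually reported together.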
Acknowledgments
The support and the resources provided by 'PARAM Shivay Facility' under the National Supercomputing Mission, Government of India at the Indian Institute of Technology, Varanasi, are gratefully acknowledged.
Cite this article
Tripathy, S.K., Srivastava, R. AMS-CNN: Attentive multi-stream CNN for video-based crowd counting. Int J Multimed Info Retr 10, 239–254 (2021). https://doi.org/10.1007/s13735-021-00220-7
DOI: https://doi.org/10.1007/s13735-021-00220-7