AMS-CNN: Attentive multi-stream CNN for video-based crowd counting

Tripathy, Santosh Kumar; Srivastava, Rajeev

doi:10.1007/s13735-021-00220-7


Regular Paper
Published: 31 October 2021

Volume 10, pages 239–254, (2021)

International Journal of Multimedia Information Retrieval


Abstract

In recent years, video-based crowd counting and density estimation (CCDE) has become essential for crowd analysis. Current approaches rarely exploit spatial–temporal features for CCDE, and they usually take no measures to reduce the influence of the frame background when estimating crowd density maps, which leads to higher Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Attention to the response of each individual feature set toward crowd counting is also neglected. To address these gaps, we design an end-to-end trainable attentive multi-stream convolutional neural network (AMS-CNN) for crowd counting. First, a multi-stream CNN (MS-CNN) is designed to obtain crowd density maps. The MS-CNN comprises three streams that fuse deep spatial, temporal, and spatial foreground features from different cues of the crowd video dataset: frames, volumes of frames, and frame foregrounds. To improve accuracy, we design three stream-wise attention modules that generate attentive crowd density maps, and their relative average is obtained using a relative averaged attentive density-map (RAAD) layer. The relative averaged density map is concatenated with the MS-CNN output and passed through two-stage CNN blocks to produce the final density map. Experiments are conducted on three publicly available crowd video datasets: Mall, UCSD, and Venice. We obtain better results in terms of MAE and RMSE than state-of-the-art approaches.
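The abstract reports results as MAE and RMSE over crowd counts. As a minimal sketch of how these two metrics are conventionally computed in crowd counting, the Python below sums each predicted density map to get a per-frame count and then compares those counts with ground-truth counts. The function names and the toy data are illustrative assumptions, not taken from the paper, and the authors' exact evaluation protocol is not shown in this preview.

```python
import math
from typing import Sequence

# A density map is treated here as a 2-D grid of non-negative values.
DensityMap = Sequence[Sequence[float]]


def count_from_density_map(density_map: DensityMap) -> float:
    """Sum all cells of a density map to get the estimated crowd count."""
    return sum(sum(row) for row in density_map)


def _check_lengths(predicted: Sequence[float], actual: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before computing a metric."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean Absolute Error between predicted and ground-truth counts."""
    _check_lengths(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root Mean Squared Error between predicted and ground-truth counts."""
    _check_lengths(predicted, actual)
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Toy data (hypothetical): two 2x2 predicted density maps and their true counts.
predicted_maps = [
    [[0.5, 0.5], [1.0, 0.0]],  # sums to 2.0
    [[1.0, 1.0], [1.0, 2.0]],  # sums to 5.0
]
true_counts = [3.0, 4.0]

predicted_counts = [count_from_density_map(m) for m in predicted_maps]
print("MAE :", mae(predicted_counts, true_counts))
print("RMSE:", rmse(predicted_counts, true_counts))
```

Lower values are better for both metrics. RMSE penalizes large per-frame errors more heavily than MAE, so the two are usually reported together.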



Acknowledgments

The support and resources provided by the 'PARAM Shivay Facility' under the National Supercomputing Mission, Government of India, at the Indian Institute of Technology, Varanasi, are gratefully acknowledged.

Author information

Authors and Affiliations

Computing and Vision Lab, Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, UP, 221005, India
Santosh Kumar Tripathy & Rajeev Srivastava


Corresponding author

Correspondence to Santosh Kumar Tripathy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Tripathy, S.K., Srivastava, R. AMS-CNN: Attentive multi-stream CNN for video-based crowd counting. Int J Multimed Info Retr 10, 239–254 (2021). https://doi.org/10.1007/s13735-021-00220-7


Received: 26 March 2021
Revised: 24 September 2021
Accepted: 06 October 2021
Published: 31 October 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s13735-021-00220-7
