Abstract
Crowd counting plays a substantial role in intelligent surveillance. This work presents a multi-scale multi-task convolutional neural network (MSMT-CNN) that estimates accurate density maps; the crowd count is then obtained by summing all values in the estimated density map. The ground-truth density maps used for training are generated with a novel adaptive human-shaped kernel. The multi-scale strategy addresses the scale-variation problem, and a multi-task learning strategy is added to make the estimated density maps more accurate. A weighted loss function is proposed to enhance the activations in dense regions and suppress background noise. Experimental results on two benchmark datasets demonstrate the effectiveness of MSMT-CNN: compared with existing crowd counting methods, the root mean squared error (RMSE) is reduced by 39.8 on the UCF_CC_50 dataset, and the mean absolute error (MAE) is reduced by 2.3 on the World Expo’10 dataset. Furthermore, evaluations on practical bus videos verify the practicability of MSMT-CNN.
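The counting principle stated in the abstract (the count is the sum of the density map) and the two reported metrics can be illustrated with a small sketch. It makes an explicit assumption: the exact form of the paper's adaptive human-shaped kernel is not given here, so the sketch substitutes a fixed isotropic Gaussian placed at each annotated head, a common choice in earlier density-map work. The function names `gaussian_density_map`, `count_from_density` and `mae_rmse` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_density_map(height: int, width: int,
                         heads: Sequence[Tuple[int, int]],
                         sigma: float = 2.0) -> List[List[float]]:
    """Build a ground-truth density map from head annotations (row, col).

    ASSUMPTION: uses a fixed isotropic Gaussian, NOT the paper's adaptive
    human-shaped kernel. Each kernel is truncated at 3*sigma and
    renormalized inside the image, so every annotated head contributes
    exactly 1 to the total sum of the map.
    """
    density = [[0.0] * width for _ in range(height)]
    radius = max(1, int(math.ceil(3 * sigma)))
    for r0, c0 in heads:
        cells = []
        total = 0.0
        # Collect unnormalized Gaussian weights within the image bounds.
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                w = math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
                cells.append((r, c, w))
                total += w
        # Normalize so this head adds a mass of 1 to the map.
        for r, c, w in cells:
            density[r][c] += w / total
    return density


def count_from_density(density: List[List[float]]) -> float:
    """Crowd count = sum of all values in the density map."""
    return sum(sum(row) for row in density)


def mae_rmse(predicted: Sequence[float],
             actual: Sequence[float]) -> Tuple[float, float]:
    """Mean absolute error and root mean squared error over test images."""
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse


if __name__ == "__main__":
    # Three annotated heads, one near the image border.
    dmap = gaussian_density_map(32, 32, [(5, 5), (16, 20), (30, 2)])
    print(round(count_from_density(dmap), 6))  # 3.0
    print(mae_rmse([10.0, 20.0], [12.0, 17.0]))
```

Renormalizing each truncated kernel keeps the integral equal to the number of annotated people even for heads near the image border, which is what makes "sum the map" a valid counting rule. MAE measures average counting accuracy, while RMSE penalizes large per-image errors more heavily.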
Acknowledgements
This work has been supported by the National Natural Science Foundation of China under Grant Nos. 61501060, 61703381, 61601203, U1762264 and U1764257, and by the National Key Research and Development Program of China under Grant No. 2018YFB0105003.
About this article
Cite this article
Cao, J., Yang, B., Nan, W. et al. Robust crowd counting based on refined density map. Multimed Tools Appl 79, 2837–2853 (2020). https://doi.org/10.1007/s11042-019-08467-3
DOI: https://doi.org/10.1007/s11042-019-08467-3