Abstract
Crowd counting is a practical and essential research topic in computer vision that benefits diverse applications in smart city environment safety. Most existing methods adopt a common paradigm: they regress a Gaussian density map, which serves as the learning objective during model training. However, because of the unavoidable occlusion between individuals and the scale variation in crowd images, the corresponding Gaussian density map is degraded and fails to provide reliable supervision for optimization. To address this problem, we propose to replace the traditional Gaussian density map with a better alternative, namely the smooth inverse map (SIM). The proposed SIM reflects head locations spatially and provides a smooth gradient that stabilizes model learning. In addition, to cope with large scale variations, the model needs to learn more discriminative features. We therefore introduce a multiscale aggregation (MA) module that adaptively fuses features from different hierarchies to exploit semantic information under diverse receptive fields. SIM and MA are designed as complementary modules that guide the model toward an accurate density map. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art techniques.
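To make the contrast drawn in the abstract more concrete, the following is a minimal NumPy sketch of two kinds of supervision target and of a softmax-weighted feature fusion. The abstract does not give the SIM or MA formulas, so `inverse_distance_map` (1 / (1 + distance to the nearest annotated head)) and `adaptive_fusion` are hypothetical stand-ins chosen for illustration, not the authors' definitions. The fixed Gaussian width `sigma`, the `(x, y)` head-point format, and the assumption that multiscale features have already been resized to a common shape are also assumptions.

```python
import numpy as np


def gaussian_density_map(points, shape, sigma=2.0):
    """Conventional target: one Gaussian per annotated head.

    Each kernel is renormalized over the image grid so that every head
    contributes exactly 1, making the map's sum equal to the head count.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for x, y in points:
        kernel = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()
    return density


def inverse_distance_map(points, shape):
    """Hypothetical inverse-style target (NOT the paper's SIM formula).

    Each pixel stores 1 / (1 + distance to the nearest head): the value is
    1.0 on a head and decays smoothly with distance, independent of sigma.
    """
    h, w = shape
    if len(points) == 0:
        return np.zeros(shape, dtype=np.float64)
    ys, xs = np.mgrid[0:h, 0:w]
    dists = np.stack([np.hypot(xs - x, ys - y) for x, y in points])
    return 1.0 / (1.0 + dists.min(axis=0))


def adaptive_fusion(features, logits):
    """Hypothetical multiscale fusion: softmax-weighted sum of feature maps.

    Assumes all maps were already resized to the same shape; in a trained
    network the logits would be learned rather than passed in by hand.
    """
    logits = np.asarray(logits, dtype=np.float64)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return sum(wt * f for wt, f in zip(weights, features))


if __name__ == "__main__":
    heads = [(5, 5), (12, 8), (20, 15)]  # (x, y) pixel coordinates
    dm = gaussian_density_map(heads, (24, 32), sigma=2.0)
    print(round(float(dm.sum()), 6))  # 3.0: the integral equals the count
    im = inverse_distance_map(heads, (24, 32))
    print(im[5, 5])  # 1.0 at the head located at x=5, y=5
    fused = adaptive_fusion([np.ones((2, 2)), 3 * np.ones((2, 2))], [0.0, 0.0])
    print(fused[0, 0])  # 2.0 with equal weights
```

A Gaussian map encodes the count as its integral, so overlapping kernels in dense regions blur together. The inverse-distance map instead peaks at 1.0 on every head and varies smoothly between heads; recovering a count from it would require an extra decoding step, such as local-maximum detection, which is not shown here.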
Data Availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (Nos. 61601266 and 61801272) and the Natural Science Foundation of Shandong Province (Nos. ZR2021QD041 and ZR2020MF127).
Ethics declarations
Ethics approval and consent to participate
We declare that this study raises no ethical issues.
Conflict of interest
We declare that we have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: 1229: Multimedia Data Analysis for Smart City Environment Safety
Guest Editors: Alessandro Bruno, Aladine Chetouani, Zoheir Sabeur, Marouane Tliba, Evangelos Maltezos, Miguel Gonzalez San Emeterio
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, X., Gao, M., Zhai, W. et al. Multiscale aggregation network via smooth inverse map for crowd counting. Multimed Tools Appl (2022). https://doi.org/10.1007/s11042-022-13664-8
DOI: https://doi.org/10.1007/s11042-022-13664-8