Abstract
Crowd counting aims to estimate the number and spatial distribution of people in a scene. In practice, heavy occlusion and large scale variation make accurate counting in crowded venues extremely challenging. To address these problems, this paper designs the Texture Feature Attention Convolutional Neural Network (TFA-CNN), a crowd density estimation network that maintains good accuracy in scenes that are both crowded and subject to large scale changes. Specifically: (1) a Differential Texture Module (DT Module) is proposed to identify various texture features in the low-level feature maps and to better distinguish background regions from foreground regions; (2) a Multi-Channel Threshold Replacement Attention Module (MTRA Module) is proposed, which combines channel and spatial attention mechanisms so that the network focuses more on the head positions of the crowd, thereby reducing the counting error. Experiments on several challenging public datasets show that TFA-CNN outperforms many state-of-the-art (SOTA) methods, demonstrating excellent generalization and robustness.
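The abstract does not specify how the MTRA Module combines its two attention branches or what its threshold replacement step does. As a rough illustration of the general idea only, the sketch below applies a generic channel-then-spatial attention, in the style of CBAM, to a tiny feature map, written in plain Python so it has no dependencies. It is not the MTRA Module: the function names are hypothetical, and the learned layers CBAM uses (a shared MLP for channel attention and a 7×7 convolution for spatial attention) are replaced with parameter-free sums to keep the example short.

```python
import math
from typing import List

Matrix = List[List[float]]
FeatureMap = List[Matrix]  # C channels, each an H x W grid


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap: FeatureMap) -> FeatureMap:
    """Scale each channel by sigmoid(mean + max) of that channel's values.

    Simplified: CBAM feeds the pooled values through a shared learned MLP;
    here they are just added, so there are no trainable parameters.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values) + max(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(fmap: FeatureMap) -> FeatureMap:
    """Scale each position by sigmoid(mean + max) taken across channels.

    Simplified: CBAM applies a learned 7x7 convolution to the stacked
    mean/max maps; here the two maps are just added.
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    mask = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            column = [channel[i][j] for channel in fmap]
            mask[i][j] = sigmoid(sum(column) / len(column) + max(column))
    return [
        [[channel[i][j] * mask[i][j] for j in range(w)] for i in range(h)]
        for channel in fmap
    ]


def channel_then_spatial(fmap: FeatureMap) -> FeatureMap:
    """Channel attention first, then spatial attention (the CBAM ordering)."""
    return spatial_attention(channel_attention(fmap))


# Toy 2-channel, 2x2 feature map: channel 0 has one strong response
# (a stand-in for a head location); channel 1 is flat.
fmap = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[0.1, 0.1], [0.1, 0.1]],
]
out = channel_then_spatial(fmap)
```

Because every attention weight lies in (0, 1), no value grows. However, the position with the strong response receives a spatial weight close to 1 while the background positions receive weights close to 0.5, so the contrast between the "head" location and its surroundings increases in every channel. This is the kind of effect the abstract attributes to combining channel and spatial attention.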
Data availability
The labeled dataset used in this article is available from the corresponding author on request.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Nos. 62067002, 61967006, and 62062033), in part by the Science and Technology Project of the Transportation Department of Jiangxi Province, China (No. 2022X0040), and in part by the Natural Science Foundation of Jiangxi Province under Grant 20232BAB202018.
Author information
Contributions
XL and LZ wrote the main manuscript, HX optimized it, and ZY and HP collected part of the data. All authors read the manuscript.
Ethics declarations
Conflict of interest
The authors declare that there are no competing interests related to the content of this article.
Additional information
Communicated by T. Li.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xiong, L., Li, Z., Huang, X. et al. TFA-CNN: an efficient method for dealing with crowding and noise problems in crowd counting. Multimedia Systems 29, 3259–3276 (2023). https://doi.org/10.1007/s00530-023-01194-8
DOI: https://doi.org/10.1007/s00530-023-01194-8