Crowd density estimation based on classification activation map and patch density level

  • Liping Zhu
  • Chengyang Li
  • Zhongguo Yang
  • Kun Yuan
  • Shang Wang
Original Article


Crowd counting and density map estimation face many challenges, such as occlusion, non-uniform density, and intra-scene and inter-scene variations in scale and perspective. Thanks to the development of deep learning and of large crowd datasets in recent years, most crowd counting methods have achieved notable success. This paper addresses the crowd density estimation problem for both sparse and dense conditions. To this end, we make two contributions: (1) A network named Patch Scale Discriminant Regression Network (PSDR). Given an input crowd image, it divides the image into patches and sends patches of different density levels to different regression networks to obtain the corresponding density maps. It then combines all patch density maps to predict the density map of the entire image as the output. (2) A person classification activation map (CAM) method. CAM provides person location information and guides the generation of the entire density map in the final stage. Experiments confirm that CAM gives PSDR a further performance boost. For instance, on the SmartCity dataset, we achieve (8.6–1.1) MAE and (11.6–1.4) MSE. Our method, which combines the two contributions above, performs better than state-of-the-art methods.
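The patch-wise routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the grid split, the function names (`split_into_patches`, `estimate_density_map`), the two density levels, and the toy classifier and regressors are all assumptions made for illustration, standing in for the trained networks that PSDR would use. The CAM guidance step is not modelled here.

```python
import numpy as np

def split_into_patches(image, rows, cols):
    """Yield (row_slice, col_slice) pairs for a rows x cols grid over the image."""
    h, w = image.shape[:2]
    row_edges = np.linspace(0, h, rows + 1, dtype=int)
    col_edges = np.linspace(0, w, cols + 1, dtype=int)
    for i in range(rows):
        for j in range(cols):
            yield (slice(row_edges[i], row_edges[i + 1]),
                   slice(col_edges[j], col_edges[j + 1]))

def estimate_density_map(image, classify_level, regressors, rows=2, cols=2):
    """Route each patch to the regressor for its density level, then stitch."""
    density = np.zeros(image.shape[:2], dtype=np.float64)
    for rs, cs in split_into_patches(image, rows, cols):
        patch = image[rs, cs]
        level = classify_level(patch)              # hypothetical level classifier
        density[rs, cs] = regressors[level](patch)  # level-specific regressor
    return density

# Toy stand-ins (assumptions): a real system would use trained networks here.
def toy_classify(patch):
    return "dense" if patch.mean() > 0.5 else "sparse"

toy_regressors = {
    "sparse": lambda p: np.full(p.shape[:2], 0.1),
    "dense": lambda p: np.full(p.shape[:2], 1.0),
}

image = np.zeros((4, 4))
image[:2, :2] = 1.0  # top-left patch plays the role of a dense region
density = estimate_density_map(image, toy_classify, toy_regressors)
estimated_count = density.sum()  # a count is the integral of the density map
```

Summing the stitched map gives the estimated count, which is the usual way a density map is turned into a number of people; how the patch maps are blended at their borders in PSDR is not stated in the abstract, so the sketch simply tiles them.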


Crowd density estimation · Image patch · Density level · Attention mechanism · Classification activation map



This work was supported by the National Natural Science Foundation of China (Grant No. 61672042), Models and Methodology of Data Services Facilitating Dynamic Correlation of Big Stream Data, 2017.1–2020.12.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. College of Information Science and Engineering, China University of Petroleum (Beijing), Beijing, China
  2. Key Lab of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing, China
  3. Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream Data, North China University of Technology, Beijing, China
  4. School of Computer Science, North China University of Technology, Beijing, China
  5. School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
