Multi-scale dilated convolution of convolutional neural network for crowd counting

  • Yanjie Wang
  • Shiyu Hu
  • Guodong Wang (corresponding author)
  • Chenglizhao Chen
  • Zhenkuan Pan


A growing number of crowd density estimation methods have been developed for scene monitoring, crowd safety, and on-site management scheduling. We propose a convolutional neural network method for density estimation from a single static image, named Multi-scale Dilated Convolution of Convolutional Neural Network (Multi-scale-CNN). The method uses density map regression: a convolutional neural network learns the mapping from a single image to its density map. To adapt to changes in the scale of people in crowd images, the network has two major components: a convolutional neural network for general feature extraction, and a multi-scale dilated convolution module that handles scale variation. Existing work that relies on multi-column or multi-input convolutional neural networks to solve the multi-scale problem remains insufficient. Our method uses a single-column network to extract features and combines it with multi-scale dilated convolution to aggregate multi-scale information, addressing the shortcomings of both kinds of network. The multi-scale dilated convolution module systematically aggregates multi-scale context information using dilated convolution without reducing the receptive field, thereby integrating low-level detail into the high-level semantic features and improving the network's ability to perceive and count small targets. We evaluate the proposed network on the ShanghaiTech, UCF_CC_50, and WorldExpo’10 datasets and compare the results with a number of current mainstream crowd counting algorithms; the results show that our method surpasses current state-of-the-art methods and achieves excellent counting accuracy and robustness. The training and testing code for our models can be downloaded at
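The two ideas named in the abstract, multi-scale dilated convolution and density map regression, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the dilation rates (1, 2, 3), the 3×3 kernel, and the averaging used to merge the parallel branches are all assumptions made for illustration. It shows only that a dilated kernel covers a wider span with the same number of weights, that zero padding can keep the output at the input resolution, and that a count is read off a density map by summing it.

```python
# Illustrative sketch only: the dilation rates, kernel weights, and the
# averaging used to merge branches are assumptions, not the paper's settings.

def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d(image, kernel, dilation):
    """Naive 2D dilated convolution (cross-correlation) with zero padding.

    The padding is chosen so the output has the same height and width as
    the input, i.e. no downsampling takes place.
    """
    height, width = len(image), len(image[0])
    k = len(kernel)                      # square, odd-sized kernel assumed
    pad = dilation * (k // 2)
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + i * dilation - pad
                    xx = x + j * dilation - pad
                    # Positions outside the image count as zeros.
                    if 0 <= yy < height and 0 <= xx < width:
                        total += kernel[i][j] * image[yy][xx]
            output[y][x] = total
    return output


def multi_scale_aggregate(image, kernel, dilations=(1, 2, 3)):
    """Run parallel dilated branches and average them element-wise."""
    branches = [dilated_conv2d(image, kernel, d) for d in dilations]
    height, width = len(image), len(image[0])
    return [
        [sum(b[y][x] for b in branches) / len(branches) for x in range(width)]
        for y in range(height)
    ]


def count_from_density_map(density_map):
    """In density map regression, the count is the sum over the map."""
    return sum(sum(row) for row in density_map)
```

With a 3×3 kernel, `effective_kernel_size(3, 2)` is 5 and `effective_kernel_size(3, 3)` is 7, so each added branch sees a wider context at no extra parameter cost. A trained network would learn the kernel weights and would typically concatenate or sum feature channels rather than average single-channel maps.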


Image processing · Crowd counting · Deep learning · Dilated convolution




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Yanjie Wang (1)
  • Shiyu Hu (1)
  • Guodong Wang (1), corresponding author
  • Chenglizhao Chen (1)
  • Zhenkuan Pan (1)

  1. College of Computer Science and Technology, Qingdao University, Qingdao, People’s Republic of China
