Multi-scale dilated convolution of convolutional neural network for crowd counting
Abstract
A growing number of crowd density estimation methods have been developed for scene monitoring, crowd safety and on-site management scheduling. We propose a method for estimating crowd density from a single static image, based on a convolutional neural network named Multi-scale Dilated Convolution of Convolutional Neural Network (Multi-scale-CNN). The method uses density map regression: a convolutional neural network learns the mapping from a single image to its density map. To adapt to changes in the scale of people in crowd images, the network is composed of two major components: a convolutional neural network for general feature extraction, and a multi-scale dilated convolution module that handles scale variation. Current studies that rely on multi-column or multi-input convolutional neural networks do not adequately solve the multi-scale problem. Our method uses a single-column network to extract features and combines it with multi-scale dilated convolution to aggregate multi-scale information, addressing the shortcomings of both kinds of network. The multi-scale dilated convolution module systematically aggregates multi-scale context information by means of dilated convolution without reducing the receptive field, thereby integrating low-level detail information into the high-level semantic features and improving the network's ability to perceive and count small targets. We evaluate the proposed network on the ShanghaiTech, UCF_CC_50 and WorldExpo'10 datasets and compare the results with a number of current mainstream crowd counting algorithms. The results show that our method surpasses current state-of-the-art methods and achieves excellent counting accuracy and robustness. The training and testing code for our models can be downloaded at https://github.com/doctorwgd/Multi-scale-CNN.
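The idea of aggregating context at several scales with dilated convolution, and of reading a count off a density map, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the 3x3 kernel, the dilation rates (1, 2, 3) and the averaging of branches are all assumptions chosen for illustration. The abstract does not state the actual configuration, which is in the linked repository.

```python
def effective_kernel_size(k, rate):
    """Span covered by a k-tap kernel whose taps are spaced `rate` pixels apart."""
    return k + (k - 1) * (rate - 1)


def dilated_conv2d(image, kernel, rate):
    """Zero-padded ('same') 2D dilated convolution on nested lists.

    Assumes a square, odd-sized kernel. The output has the same height and
    width as the input, so no spatial resolution is lost as the rate grows.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + (i - c) * rate  # taps are `rate` pixels apart
                    xx = x + (j - c) * rate
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def multi_scale_aggregate(image, kernel, rates=(1, 2, 3)):
    """Average the responses of parallel branches with different dilation rates.

    Averaging is an illustrative choice; a real network would typically learn
    how to combine the branches.
    """
    branches = [dilated_conv2d(image, kernel, r) for r in rates]
    h, w = len(image), len(image[0])
    return [
        [sum(b[y][x] for b in branches) / len(branches) for x in range(w)]
        for y in range(h)
    ]


def count_from_density(density):
    """In density map regression, the estimated count is the sum of the map."""
    return sum(sum(row) for row in density)
```

With a 3x3 kernel, rates 1, 2 and 3 cover spans of 3, 5 and 7 pixels respectively while using the same nine weights, which is how a single column can see several scales without pooling away detail.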
Keywords
Image processing, Crowd counting, Deep learning, Dilated convolution