CAFN: The Combination of Atrous and Fractionally Strided Convolutional Neural Networks for Understanding the Densely Crowded Scenes

  • Lvyuan Fan
  • Minglei Tong (corresponding author)
  • Min Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11257)


Estimating the crowd count in highly clustered scenes is extremely challenging because head scales vary widely and non-uniformly across an image. This paper aims to develop a simple but effective method that concentrates on predicting the density map accurately. We propose a combination of atrous and fractionally strided convolutional neural networks (CAFN), which consists of only two components: an atrous convolutional neural network as the front-end for 2D feature extraction, which uses dilated kernels to deliver larger receptive fields and to reduce the number of network parameters, and a fractionally strided convolutional neural network as the back-end, which reduces the loss of detail caused by down-sampling. CAFN is easy to train because of its purely convolutional structure. We evaluate CAFN on three datasets (Shanghai Tech A and B, and UCF_CC_50), where it delivers satisfactory performance. In particular, CAFN achieves a lower Mean Absolute Error (MAE) on Shanghai Tech A (MAE = 100.8) and on UCF_CC_50 (MAE = 305.3), and the experimental results show that the proposed model effectively lowers estimation errors compared with previous methods.
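To make the two building blocks named in the abstract concrete, the following is a minimal one-dimensional sketch in plain Python, not the authors' implementation. It illustrates how a dilated (atrous) kernel covers a wider span with the same number of weights, how a fractionally strided (transposed) convolution enlarges a feature map, and how MAE is computed over per-image counts. All function names, kernel values, and sample numbers are hypothetical and chosen only for illustration; the paper's networks operate on 2D feature maps with learned weights.

```python
def effective_kernel_size(k, dilation):
    """Span covered by a k-tap kernel at the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D atrous convolution (cross-correlation form, stride 1)."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


def transposed_conv1d(signal, kernel, stride=2):
    """1D fractionally strided (transposed) convolution without padding."""
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, x in enumerate(signal):
        # Each input sample scatters a scaled copy of the kernel.
        for j, w in enumerate(kernel):
            out[i * stride + j] += x * w
    return out


def mean_absolute_error(predicted_counts, true_counts):
    """MAE over per-image crowd counts."""
    pairs = list(zip(predicted_counts, true_counts))
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    # Three weights, dilation 2: the kernel spans 5 samples, not 3.
    print(effective_kernel_size(3, 2))
    print(dilated_conv1d([1, 2, 3, 4, 5, 6], [1, 1, 1], dilation=2))
    # Stride 2 doubles the resolution of a two-sample feature map.
    print(transposed_conv1d([1, 2], [1, 1], stride=2))
    print(mean_absolute_error([100, 210], [110, 200]))
```

In this sketch the dilated kernel still has three weights, which is the sense in which dilation enlarges the receptive field without adding parameters, while the transposed convolution maps two input samples to four output samples.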


CAFN; Crowd density estimation; Atrous convolutions; Fractionally strided convolutions



Sponsored by Natural Science Foundation of Shanghai (16ZR1413300).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Shanghai University of Electric Power, Shanghai, People’s Republic of China
