Bilateral counting network for single-image object counting

  • He Li
  • Shihui Zhang (corresponding author)
  • Weihang Kong
Original Article


This paper proposes a novel bilateral counting network that produces accurate and robust estimates for the single-image object counting task. The proposed network has two main components: the concentrated dilated pyramid module and the dual-context extraction path. The concentrated dilated pyramid module uses a pyramid structure to extract multi-scale features from the image, addressing the scale variation problem in object counting, and uses a shortcut concentration to ease back-propagation of the gradient and thereby improve counting performance. The dual-context extraction path obtains context at different levels relevant to object counting by convolving and down-sampling the image a different number of times. The two components are integrated to improve the final counting result. Extensive experiments on vehicle counting and crowd counting datasets, including TRANCOS, Mall, Shanghaitech_A and WorldExpo’10, demonstrate the feasibility and effectiveness of the proposed network for the object counting task.
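To illustrate why a pyramid of dilated convolutions can respond to objects of different sizes, the following minimal sketch computes the input span covered by a single dilated kernel using the standard relation k_eff = k + (k - 1)(d - 1). The dilation rates and the 3x3 kernel size are assumptions chosen for illustration; the abstract does not state the values used in the proposed module.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Hypothetical dilation rates for the parallel pyramid branches.
# These are illustrative assumptions, not values taken from the paper.
ASSUMED_DILATION_RATES = (1, 2, 3, 4)


def pyramid_spans(kernel_size: int = 3, rates=ASSUMED_DILATION_RATES) -> dict:
    """Map each assumed dilation rate to the span its kernel covers."""
    return {d: effective_kernel_size(kernel_size, d) for d in rates}


if __name__ == "__main__":
    for rate, span in pyramid_spans().items():
        print(f"dilation={rate}: covers {span}x{span} input pixels")
```

Under these assumed rates, each branch keeps the same number of weights (nine for a 3x3 kernel) while covering a progressively larger region, which is the general motivation for combining dilated branches when object scale varies.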


Object counting; Bilateral counting network; Multi-scale feature; Different-level context



This work was supported in part by the National Natural Science Foundation of China (No. 61379065), the Natural Science Foundation of Hebei Province, China (Nos. F2019203285, 2019203526), a project funded by the China Postdoctoral Science Foundation (No. 2018M631763), and the Yanshan University Doctoral Foundation (No. BL18010).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
  2. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, China
