Weakly-Supervised Crowd Counting Learns from Sorting Rather Than Locations

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12353)

Abstract

In crowd counting datasets, location labels are costly to annotate, yet the evaluation metrics do not use them. Moreover, existing multi-task approaches rely on high-level auxiliary tasks to improve counting accuracy, a research trend that increases the demand for annotations. In this paper, we propose a weakly-supervised counting network that directly regresses the crowd numbers without location supervision. We further train the network to count by exploiting the relationships among images: alongside the counting network, we propose a soft-label sorting network that sorts the given images by their crowd numbers. The sorting network explicitly drives the shared backbone CNN to become density-sensitive. The proposed method therefore improves counting accuracy by utilizing the information hidden in crowd numbers rather than learning from extra labels such as locations and perspectives. We evaluate the proposed method on three crowd counting datasets, where it performs favorably against fully supervised state-of-the-art approaches.
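The idea of supervising both counting and ordering from count labels alone can be illustrated with a minimal sketch. It is not the paper's soft-label sorting network: a simple pairwise logistic ranking loss is swapped in as a stand-in, and the function names (`count_loss`, `pairwise_order_loss`, `joint_loss`) and the `weight` parameter are hypothetical. The only labels used are per-image crowd numbers; no locations are involved.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def count_loss(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between regressed and annotated crowd numbers."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pairwise_order_loss(scores: Sequence[float], true: Sequence[float]) -> float:
    """Ranking loss over image pairs whose annotated counts differ.

    For a pair (i, j) with true[i] > true[j], the loss is small when
    scores[i] exceeds scores[j]. This is an illustrative stand-in,
    not the soft-label sorting formulation of the paper.
    """
    total, pairs = 0.0, 0
    for i in range(len(true)):
        for j in range(len(true)):
            if true[i] > true[j]:
                total += softplus(-(scores[i] - scores[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


def joint_loss(
    pred: Sequence[float],
    scores: Sequence[float],
    true: Sequence[float],
    weight: float = 1.0,  # assumed trade-off between the two terms
) -> float:
    """Counting term plus weighted ordering term, both from counts only."""
    return count_loss(pred, true) + weight * pairwise_order_loss(scores, true)
```

In a real model, `pred` and `scores` would come from two heads on a shared backbone, so gradients from the ordering term also shape the features used for counting.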

Acknowledgements

This work was supported in part by the Italy-China collaboration project TALENT:2018YFE0118400, in part by the National Natural Science Foundation of China: 61620106009, 61772494, 61931008, U1636214, 61836002 and 61976069, in part by the Key Research Program of Frontier Sciences, CAS: QYZDJ-SSW-SYS013, and in part by the Youth Innovation Promotion Association CAS.

Corresponding author

Correspondence to Guorong Li.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, Y., Li, G., Wu, Z., Su, L., Huang, Q., Sebe, N. (2020). Weakly-Supervised Crowd Counting Learns from Sorting Rather Than Locations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12353. Springer, Cham. https://doi.org/10.1007/978-3-030-58598-3_1

  • DOI: https://doi.org/10.1007/978-3-030-58598-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58597-6

  • Online ISBN: 978-3-030-58598-3

  • eBook Packages: Computer Science, Computer Science (R0)
