Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12355)

Abstract

We formulate counting as a sequential decision problem and present a novel crowd counting model solvable by deep reinforcement learning. In contrast to existing counting models that directly output count values, we divide one-step estimation into a sequence of much easier and more tractable sub-decision problems. This sequential nature corresponds exactly to a real physical process: scale weighing. Inspired by scale weighing, we propose a novel ‘counting scale’ termed LibraNet, in which the count value is treated as a weight. With a crowd image virtually placed on one side of a scale, LibraNet (the agent) sequentially learns to place appropriate weights on the other side to match the crowd count. At each step, LibraNet chooses one weight (action) from the weight box (the pre-defined action pool) according to the current crowd image features and the weights already placed on the scale pan (state). LibraNet must learn to balance the scale according to the feedback of the needle (Q values). By visualizing how LibraNet chooses actions, we show that it implements scale weighing exactly. Extensive experiments demonstrate the effectiveness of our design choices and yield state-of-the-art results on a few crowd counting benchmarks, including ShanghaiTech, UCF_CC_50 and UCF-QNRF. We also demonstrate good cross-dataset generalization of LibraNet. Code and models are made available at https://git.io/libranet.
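
The weighing loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the weight values in `WEIGHTS`, the `END` action, and the `oracle_q` scoring function are all hypothetical stand-ins chosen for illustration. In particular, `oracle_q` reads the true count directly, whereas a trained agent would estimate Q values from image features and the weights already placed.

```python
# Hypothetical weight box; the action pool used by LibraNet is not specified here.
WEIGHTS = [-10, -5, -2, -1, 1, 2, 5, 10]
END = "end"  # hypothetical action that stops the weighing


def oracle_q(target, placed_total, action):
    """Stand-in for a learned Q function (assumption, not the paper's network).

    Scores an action by how close the scale would be to balance afterwards.
    It peeks at the true count, which a real agent never observes.
    """
    if action == END:
        return -abs(target - placed_total)
    # Small penalty on placing a weight so that END wins once the scale balances.
    return -abs(target - (placed_total + action)) - 0.5


def weigh(target, max_steps=50):
    """Greedily place weights until the END action has the highest score."""
    placed = []
    total = 0
    for _ in range(max_steps):
        actions = WEIGHTS + [END]
        best = max(actions, key=lambda a: oracle_q(target, total, a))
        if best == END:
            break
        placed.append(best)
        total += best
    return total, placed


if __name__ == "__main__":
    total, placed = weigh(23)
    print(total, placed)
```

With these assumed weights, each step moves the placed total closer to the count until the stop action scores highest, which mirrors the idea of replacing one hard estimate with a sequence of easier sub-decisions.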

Keywords

Crowd counting, Reinforcement learning

Notes

Acknowledgement

This work is supported by the Natural Science Foundation of China under Grant No. 61876211 and Grant No. U1913602. Part of this work was done when L. Liu was visiting The University of Adelaide.

Supplementary material

Supplementary material 1: 504449_1_En_10_MOESM1_ESM.pdf (PDF, 3341 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
  2. The University of Adelaide, Adelaide, Australia