Estimating People Flows to Better Count Them in Crowded Scenes

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12360)

Abstract

Modern methods for counting people in crowded scenes rely on deep networks to estimate people densities in individual images. As a result, very few take advantage of temporal consistency in video sequences, and those that do impose only weak smoothness constraints across consecutive frames.

In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them. This enables us to impose much stronger constraints encoding the conservation of the number of people. As a result, it significantly boosts performance without requiring a more complex architecture. Furthermore, it also enables us to exploit the correlation between people flow and optical flow to further improve the results.
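The idea of inferring densities from flows can be illustrated with a minimal sketch. It is not the authors' implementation: the grid size, the 3x3 neighbourhood, the dictionary layout of `flows`, and the names `densities_from_flows` and `total` are all assumptions made for illustration. The point is only that when each cell's density is the sum of the flows leaving it (first frame) or entering it (second frame), the total count is conserved by construction as long as nobody leaves the grid.

```python
# Offsets (dy, dx) to the 9 cells of a 3x3 neighbourhood; (0, 0) means "stay".
# The neighbourhood size is an assumption of this sketch.
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def densities_from_flows(flows, height, width):
    """Return (density_before, density_after) for two consecutive frames.

    flows[(y, x, dy, dx)] is the (hypothetical) number of people moving from
    cell (y, x) in the first frame to cell (y + dy, x + dx) in the second.
    """
    before = [[0.0] * width for _ in range(height)]
    after = [[0.0] * width for _ in range(height)]
    for (y, x, dy, dx), amount in flows.items():
        if (dy, dx) not in OFFSETS:
            raise ValueError("flows may only connect neighbouring cells")
        # Everyone present in a cell either stays or flows out of it.
        before[y][x] += amount
        ty, tx = y + dy, x + dx
        # Everyone present in a cell either stayed or flowed into it.
        # People who step off the grid are simply not counted afterwards.
        if 0 <= ty < height and 0 <= tx < width:
            after[ty][tx] += amount
    return before, after


def total(density):
    """Total number of people in a density grid."""
    return sum(map(sum, density))


flows = {
    (1, 1, 0, 1): 1.0,  # one person steps right from (1, 1) to (1, 2)
    (0, 0, 0, 0): 2.0,  # two people stay in (0, 0)
}
before, after = densities_from_flows(flows, height=3, width=3)
assert total(before) == total(after) == 3.0
```

In a learned model the flow values would be network outputs rather than a hand-written dictionary, but the same bookkeeping shows why predicting flows makes a conservation constraint easy to express.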

We demonstrate that our approach consistently outperforms state-of-the-art methods on five benchmark datasets.

Keywords

Crowd counting, Grid flow model, Temporal consistency

Notes

Acknowledgments

This work was supported in part by the Swiss National Science Foundation.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. CVLab, EPFL, Lausanne, Switzerland
  2. ClearSpace, Écublens, Switzerland