
A Flow Base Bi-path Network for Cross-Scene Video Crowd Understanding in Aerial View

  • Conference paper
  • First Online:
  • Conference: Computer Vision – ECCV 2020 Workshops (ECCV 2020)
  • Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12538)

Abstract

Drone footage can be applied to dynamic traffic monitoring, object detection and tracking, and other vision tasks. The variability of the shooting location introduces several intractable challenges for these tasks, such as varying scale, unstable exposure, and scene migration. In this paper, we strive to tackle these challenges and automatically understand crowds from visual data collected by drones. First, to alleviate the background noise generated in cross-scene testing, we propose a double-stream crowd counting model that extracts optical flow and frame difference information in an additional branch. Second, to improve the model’s generalization ability across different scales and times of day, we randomly combine a variety of data transformations to simulate unseen environments. Finally, to tackle crowd density estimation in extremely dark environments, we introduce synthetic data generated with the game Grand Theft Auto V (GTA V). Experimental results demonstrate the effectiveness of the virtual data. Our method wins the challenge with a mean absolute error (MAE) of 12.70\(^1\). Moreover, a comprehensive ablation study is conducted to explore each component’s contribution.
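The bi-path design described in the abstract can be pictured with a minimal sketch: an appearance branch on the RGB frame plus a motion branch on optical flow and frame difference, fused to regress a density map. The PyTorch snippet below is only an illustration under assumed choices; the layer widths, depths, the names BiPathCounter and conv_block, and the simple concatenation-based fusion are not the authors' exact architecture.

```python
# Minimal bi-path density estimator sketch: an RGB branch plus an auxiliary
# motion branch fed with optical flow (2 ch) and a frame difference (1 ch).
# Channel counts, depths, and the concatenation fusion are illustrative
# assumptions, not the architecture from the paper.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions followed by 2x downsampling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class BiPathCounter(nn.Module):
    def __init__(self):
        super().__init__()
        # Appearance path: RGB frame.
        self.rgb_path = nn.Sequential(conv_block(3, 64), conv_block(64, 128))
        # Motion path: 2-channel optical flow + 1-channel frame difference.
        self.motion_path = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        # Fuse both paths and regress a 1-channel density map.
        self.head = nn.Sequential(
            nn.Conv2d(128 + 64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1),
        )

    def forward(self, rgb, flow, frame_diff):
        motion = torch.cat([flow, frame_diff], dim=1)          # (B, 3, H, W)
        feats = torch.cat([self.rgb_path(rgb),
                           self.motion_path(motion)], dim=1)   # (B, 192, H/4, W/4)
        return self.head(feats)                                # density map


if __name__ == "__main__":
    model = BiPathCounter()
    rgb = torch.randn(1, 3, 256, 256)
    flow = torch.randn(1, 2, 256, 256)   # e.g. from a dense optical-flow estimator
    diff = torch.randn(1, 1, 256, 256)   # e.g. |gray(frame_t) - gray(frame_{t-1})|
    density = model(rgb, flow, diff)
    print(density.shape, density.sum().item())  # crowd count ≈ sum of the density map
```

In practice the two auxiliary inputs would come from consecutive frames: the flow from an off-the-shelf optical-flow estimator and the frame difference from an absolute difference of grayscale frames; summing the predicted density map yields the crowd count.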

Z. Zhao and T. Han—Equal Contribution.

\(^1\) On the final leaderboard, our method reached an MAE of 12.36, ranking second.



Author information

Corresponding author

Correspondence to Qi Wang.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhao, Z., Han, T., Gao, J., Wang, Q., Li, X. (2020). A Flow Base Bi-path Network for Cross-Scene Video Crowd Understanding in Aerial View. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science(), vol 12538. Springer, Cham. https://doi.org/10.1007/978-3-030-66823-5_34


  • DOI: https://doi.org/10.1007/978-3-030-66823-5_34

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66822-8

  • Online ISBN: 978-3-030-66823-5

  • eBook Packages: Computer Science, Computer Science (R0)
