Edge computing-based real-time passenger counting using a compact convolutional neural network

  • Biao Yang
  • Jinmeng Cao
  • Xiaofeng Liu
  • Nan Wang
  • Jidong Lv
Original Article


Crowd counting from low-resolution images is a challenging task, particularly in edge computing systems. Embedded devices are commonly unable to perform patch-based crowd counting in real time. This work develops a real-time method to count passengers on a bus using an Nvidia TX2. Videos of the entrance are recorded by a camera mounted up ahead, and the data suffer from severe occlusion, which makes it difficult to design handcrafted features. The count is obtained by summing the pixel values of a density map estimated by a compact convolutional neural network (CCNN), which uses skip connections to be robust to scale variations. A weighted Euclidean loss is proposed to handle cluttered backgrounds and blurry foregrounds: it increases activations in dense regions while restraining activations in background regions. The counting results are further improved by smoothing that exploits constraints between consecutive frames. CCNN is compared with existing counting approaches, both patch-based and whole-image-based, on two benchmark datasets. The results indicate that CCNN counts dense crowds accurately. Moreover, evaluation on bus datasets verifies that CCNN can count passengers from low-resolution input images in real time on the TX2.
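
The counting pipeline described above can be illustrated with a minimal NumPy sketch. Only the first function follows directly from the abstract (the count is the sum of the density map's pixel values). The weighting scheme in `weighted_euclidean_loss` (weight = 1 + alpha × ground-truth density) and the moving-average filter in `smooth_counts` are assumptions chosen for illustration; the abstract does not specify either, and all function and parameter names here are hypothetical.

```python
import numpy as np


def count_from_density(density_map):
    """Estimated count: the sum of all pixel values of the density map."""
    return float(np.sum(density_map))


def weighted_euclidean_loss(pred, target, alpha=10.0):
    """Per-pixel weighted squared error between two density maps.

    ASSUMPTION: weight = 1 + alpha * target, so pixels with higher
    ground-truth density contribute more than background pixels.
    The paper's actual weighting is not given in the abstract.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    weights = 1.0 + alpha * target
    return float(0.5 * np.sum(weights * (pred - target) ** 2))


def smooth_counts(counts, window=3):
    """Centered moving average over per-frame counts.

    ASSUMPTION: a simple stand-in for the temporal smoothing between
    consecutive frames; the paper's constraint may differ.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    counts = np.asarray(counts, dtype=float)
    pad = window // 2
    # Repeat the edge values so the output has one value per frame.
    padded = np.pad(counts, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


# Toy density map: one person concentrated in a single pixel,
# another spread over two neighbouring pixels.
density = np.zeros((4, 4))
density[0, 0] = 1.0
density[2, 2] = 0.5
density[2, 3] = 0.5

total = count_from_density(density)
loss = weighted_euclidean_loss(np.zeros((4, 4)), density, alpha=1.0)
smoothed = smooth_counts([2, 2, 5, 2, 2], window=3)
```

In this toy case the density map sums to two people, and the isolated spike of 5 in the per-frame counts is spread over its neighbours by the averaging window. A median filter would be an equally plausible stand-in if spikes should be suppressed outright rather than averaged.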


Crowd counting; Edge computing; Compact convolutional neural network; Weighted Euclidean loss; Nvidia TX2



This work has been supported by the National Natural Science Foundation of China under Grant Nos. 61501060 and 61703381, the Natural Science Foundation of Jiangsu Province under Grant No. BK20150271, the Key Laboratory for New Technology Application of Road Conveyance of Jiangsu Province under Grant No. BM20082061708, the Fundamental Research Funds for the Central Universities under Grant No. 2018B47114, the Key Research and Development Projects of Jiangsu Province under Grant Nos. BE2017071, BE2017647, and BE2018004-04, and the Projects of International Cooperation and Exchanges of Changzhou under Grant No. CZ20170018.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Department of Information Science and Engineering, Changzhou University, Changzhou, China
  2. Department of IoT Engineering, College of Hohai University, Changzhou, China
  3. Department of Electronic Engineering, College of Information Science and Engineering, Ocean University of China, Qingdao, China
