Multimedia Tools and Applications, Volume 73, Issue 1, pp 273–289

Scene-adaptive accurate and fast vertical crowd counting via joint using depth and color information

  • Huiyuan Fu
  • Huadong Ma (corresponding author)
  • Hongtian Xiao


Reliable, real-time crowd counting is one of the most important tasks in intelligent visual surveillance systems. Most previous works count passing people using color information only. Because of the inherent limitations of color information, they are inevitably affected by unpredictable, complex environments (e.g., illumination changes, occlusion, and shadows). To overcome this bottleneck, we propose a new crowd counting algorithm based on multimodal joint information processing. Our method uses color and depth information together, captured with an ordinary depth camera (e.g., Microsoft Kinect). Specifically, we first detect the head of each passing or stationary person in the surveillance region on the depth information, with the ability to adapt to varying scenes. Then, we track and count each detected head on the color information. The characteristic advantage of our algorithm is that it is scene-adaptive: it can be applied directly to all kinds of scenes without additional conditions. Based on the proposed approach, we have built a practical system for robust and fast crowd counting in complicated scenes. Extensive experimental results show the effectiveness of the proposed method.
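The detect-then-count pipeline described above can be illustrated with a minimal sketch. It is not the authors' algorithm; it only shows the general idea under stated assumptions: the camera looks straight down, the depth frame is a 2D list of distances in millimetres, most pixels see the floor (so the median depth approximates the floor distance), and a head is a strict local depth minimum sufficiently far above the floor. The function names `detect_heads` and `count_crossings`, the parameters `min_height`, `radius`, and `line_row`, and the virtual counting line are all hypothetical. Tracking on the color image is not implemented here; the trajectories are assumed to be given.

```python
from statistics import median


def detect_heads(depth, min_height=300, radius=1):
    """Return (row, col) positions of candidate heads in an overhead depth frame.

    depth: 2D list of distances from the camera in millimetres (assumed format).
    A head is taken to be a strict local minimum of depth (the point closest
    to the camera) that rises at least min_height above the estimated floor.
    """
    rows, cols = len(depth), len(depth[0])
    # Assumption: most pixels see the floor, so the median approximates its depth.
    floor = median(v for row in depth for v in row)
    heads = []
    for r in range(rows):
        for c in range(cols):
            d = depth[r][c]
            if floor - d < min_height:
                continue  # too close to the floor to be a head
            neighbours = [
                depth[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            if all(d < n for n in neighbours):
                heads.append((r, c))
    return heads


def count_crossings(tracks, line_row):
    """Count trajectories that cross a virtual line, split by direction.

    tracks: list of trajectories, each a list of (row, col) head positions,
    assumed to come from a separate tracker. Returns (down, up).
    """
    down = up = 0
    for track in tracks:
        start, end = track[0][0], track[-1][0]
        if start < line_row <= end:
            down += 1
        elif end < line_row <= start:
            up += 1
    return down, up


# Toy frame: floor at 3000 mm, two heads, one shoulder pixel next to the first head.
frame = [[3000] * 5 for _ in range(5)]
frame[1][1] = 1400  # head
frame[1][2] = 1600  # shoulder, not a local minimum
frame[3][3] = 1300  # head
heads = detect_heads(frame)

# Toy trajectories: one crossing downwards, one upwards, one not crossing.
tracks = [[(0, 1), (2, 1), (4, 1)], [(4, 3), (1, 3)], [(0, 0), (1, 0)]]
counts = count_crossings(tracks, line_row=2)
```

Estimating the floor from the frame itself is one simple way to avoid per-scene calibration in this toy setting; a real system would need to handle depth noise, missing measurements, and crowded frames in which the median no longer falls on the floor.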


Keywords: Multimodal joint multimedia processing; Crowd counting; Ordinary depth camera; Scene-adaptive scheme; Real-time system



This work was supported by the China National Funds for Distinguished Young Scientists under Grant No. 60925010, the Natural Science Foundation of China under Grant No. 61272517, the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20120005130002, the Co-sponsored Project of Beijing Committee of Education, the Funds for Creative Research Groups of China under Grant No. 61121001, and the Program for Changjiang Scholars and Innovative Research Team in University under Grant No. IRT1049.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing, China
