Multimedia Tools and Applications, Volume 76, Issue 6, pp 8915–8936

Gesture recognition of traffic police based on static and dynamic descriptor fusion

  • Fan Guo
  • Jin Tang (corresponding author)
  • Xile Wang


We present a method for recognizing the gestures made by Chinese traffic police, based on the fusion of static and dynamic descriptors, for use in driver assistance systems and intelligent vehicles. Recognition combines the extracted static and dynamic features. First, the point cloud of the human upper body is obtained for each frame of the input video and used to estimate the static descriptor with a 2.5D gesture model. Next, the dynamic descriptor is estimated by computing the motion history image of the input RGB video sequence. Finally, the two descriptors are fused, and the mean structural similarity index (MSSIM) is used to recognize the gestures. A comparative study and qualitative evaluation against other gesture recognition methods show that the proposed method achieves better recognition results on a number of video sequences.
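The abstract names the three stages of the pipeline but not their parameters. The Python/NumPy sketch below illustrates them under stated assumptions; it is not the authors' implementation. The motion history image uses the usual update rule (pixels whose frame difference exceeds a threshold are set to a duration `tau`; all others decay by one per frame), and `mssim` follows the standard mean SSIM with an 11×11 Gaussian window (σ = 1.5) and constants K1 = 0.01, K2 = 0.03. Several pieces are assumptions made only for illustration. `static_descriptor` is a hypothetical stand-in (a normalized depth image), because the 2.5D gesture model is not described here. `fuse` is a hypothetical pixel-wise weighted average with weight `alpha`. `classify` assumes recognition picks the gesture template with the highest MSSIM. The values of `tau` and `diff_thresh` are illustrative.

```python
import numpy as np


def to_gray(rgb):
    """Luma conversion of an H x W x 3 RGB frame (ITU-R BT.601 weights)."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])


def motion_history_image(frames, tau=30, diff_thresh=25.0):
    """Dynamic descriptor: MHI of a grayscale frame sequence, scaled to [0, 1].

    Pixels whose frame-to-frame difference exceeds diff_thresh are set to tau;
    all other pixels decay by 1 per frame, never below 0.
    tau and diff_thresh are illustrative values, not taken from the paper.
    """
    mhi = np.zeros(np.shape(frames[0]), dtype=np.float64)
    for prev, cur in zip(frames[:-1], frames[1:]):
        diff = np.asarray(cur, np.float64) - np.asarray(prev, np.float64)
        moving = np.abs(diff) > diff_thresh
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau


def static_descriptor(depth):
    """HYPOTHETICAL stand-in for the 2.5D static descriptor: the upper-body
    depth map rescaled to [0, 1] (nearer is brighter, 0 where depth is missing)."""
    d = np.asarray(depth, dtype=np.float64)
    out = np.zeros_like(d)
    valid = d > 0
    if valid.any():
        lo, hi = d[valid].min(), d[valid].max()
        out[valid] = 1.0 - (d[valid] - lo) / (hi - lo + 1e-12)
    return out


def fuse(static_desc, dynamic_desc, alpha=0.5):
    """HYPOTHETICAL fusion rule: pixel-wise weighted average of two [0, 1] images."""
    s = np.asarray(static_desc, np.float64)
    m = np.asarray(dynamic_desc, np.float64)
    return alpha * s + (1.0 - alpha) * m


def _gaussian_filter_valid(img, size=11, sigma=1.5):
    """Separable Gaussian smoothing; 'valid' mode drops the border pixels."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-ax ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def mssim(a, b, data_range=1.0, k1=0.01, k2=0.03):
    """Mean SSIM of two equally sized grayscale images (at least 11 x 11)."""
    a = np.asarray(a, np.float64)
    b = np.asarray(b, np.float64)
    if a.shape != b.shape or min(a.shape) < 11:
        raise ValueError("images must share a shape of at least 11 x 11")
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_a, mu_b = _gaussian_filter_valid(a), _gaussian_filter_valid(b)
    var_a = _gaussian_filter_valid(a * a) - mu_a ** 2
    var_b = _gaussian_filter_valid(b * b) - mu_b ** 2
    cov = _gaussian_filter_valid(a * b) - mu_a * mu_b
    ssim_map = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
    return float(ssim_map.mean())


def classify(query, templates):
    """Return the gesture label whose template has the highest MSSIM with query."""
    return max(templates, key=lambda label: mssim(query, templates[label]))
```

For example, with `templates` mapping gesture labels to fused template images of the same size, `classify(fuse(static_descriptor(depth), motion_history_image(frames)), templates)` returns the best-matching label. A weighted average keeps the fused image in [0, 1], so a single `data_range` of 1.0 is appropriate for the SSIM constants.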


Keywords: Chinese traffic police; Gesture recognition; 2.5D gesture model; Motion history image; Descriptor fusion



This work was supported in part by the National Natural Science Foundation of China (Nos. 61502537 and 91220301), the China Postdoctoral Science Foundation (No. 2014M552154), the Hunan Planned Projects for Key Scientific Research Funds (No. 2015WK3006), and the Postdoctoral Science Foundation of Central South University (No. 126648).

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. School of Information Science and Engineering, Central South University, Changsha, China