
Spatial and temporal pyramid-based real-time gesture recognition

  • Special Issue Paper
  • Published in: Journal of Real-Time Image Processing

Abstract

This paper proposes a novel method for real-time gesture recognition. To improve the effectiveness and accuracy of hand gesture recognition (HGR), a spatial pyramid is applied to segment each gesture sequence into linguistic units, and a temporal pyramid is proposed to compute a time-related histogram for each single gesture. Together, the two pyramids extract more comprehensive information about human gestures from RGB and depth video. A two-layered HGR scheme is further exploited to reduce computational complexity. The proposed method achieves high accuracy with low computational cost on the ChaLearn Gesture Dataset, which comprises more than 50,000 recorded gesture sequences.
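To make the temporal-pyramid idea concrete, the minimal sketch below (not the authors' implementation) shows how a time-related histogram could be assembled from per-frame codebook assignments: each pyramid level splits the gesture into finer temporal cells, and per-cell bag-of-codeword histograms are concatenated so that temporal order is preserved. The function name, codebook size, pyramid depth, and per-cell normalization are all illustrative assumptions.

```python
# Minimal sketch of a temporal pyramid histogram. All names and
# parameters here are illustrative assumptions, not the authors' code.
import numpy as np

def temporal_pyramid_histogram(codeword_ids, num_codewords, levels=3):
    """Concatenate bag-of-codeword histograms over a temporal pyramid.

    codeword_ids : 1-D array of per-frame codebook assignments
                   (e.g., from k-means over RGB-D features).
    levels       : pyramid depth; level l splits the sequence into
                   2**l equal temporal cells, preserving gesture order.
    """
    codeword_ids = np.asarray(codeword_ids)
    n = len(codeword_ids)
    hists = []
    for level in range(levels):
        cells = 2 ** level
        bounds = np.linspace(0, n, cells + 1, dtype=int)
        for start, end in zip(bounds[:-1], bounds[1:]):
            h = np.bincount(codeword_ids[start:end],
                            minlength=num_codewords)
            # Normalize each cell so longer gestures do not dominate.
            hists.append(h / max(end - start, 1))
    return np.concatenate(hists)

# Example: a 60-frame gesture quantized against a 16-word codebook.
rng = np.random.default_rng(0)
frames = rng.integers(0, 16, size=60)
feat = temporal_pyramid_histogram(frames, num_codewords=16)
print(feat.shape)  # (16 * (1 + 2 + 4),) = (112,)
```

The resulting fixed-length vector can then be fed to any classifier; a coarse first layer followed by a finer second layer is one way a two-layered scheme could cut computation, since most candidates are rejected cheaply at the first stage.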



Acknowledgments

We would like to acknowledge the editors and reviewers, whose valuable comments greatly improved the manuscript. This work was supported in part by the Major State Basic Research Development Program of China (973 Program 2015CB351804) and the National Natural Science Foundation of China under Grant Nos. 61572155 and 61272386.

Author information

Corresponding author

Correspondence to Feng Jiang.


About this article


Cite this article

Jiang, F., Ren, J., Lee, C. et al. Spatial and temporal pyramid-based real-time gesture recognition. J Real-Time Image Proc 13, 599–611 (2017). https://doi.org/10.1007/s11554-016-0620-0

