
Spatial and temporal pyramid-based real-time gesture recognition

  • Special Issue Paper
  • Published in: Journal of Real-Time Image Processing

Abstract

This paper proposes a novel method for real-time gesture recognition. To improve the effectiveness and accuracy of hand gesture recognition (HGR), a spatial pyramid is applied to segment each gesture sequence into linguistic units, and a temporal pyramid is proposed to compute a time-related histogram for each single gesture. Together, the two pyramids extract more comprehensive information about human gestures from RGB and depth video. A two-layered HGR scheme is further exploited to reduce computational complexity. The proposed method achieves high accuracy with low computational cost on the ChaLearn Gesture Dataset, which comprises more than 50,000 recorded gesture sequences.
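To make the temporal-pyramid idea concrete, the minimal sketch below (not the authors' implementation) shows how a time-related histogram could be assembled from per-frame codebook assignments: each pyramid level splits the gesture into finer temporal cells, and per-cell bag-of-codeword histograms are concatenated so that temporal order is preserved. The function name, codebook size, pyramid depth, and per-cell normalization are all illustrative assumptions.

```python
# Minimal sketch of a temporal pyramid histogram. All names and
# parameters here are illustrative assumptions, not the authors' code.
import numpy as np

def temporal_pyramid_histogram(codeword_ids, num_codewords, levels=3):
    """Concatenate bag-of-codeword histograms over a temporal pyramid.

    codeword_ids : 1-D array of per-frame codebook assignments
                   (e.g., from k-means over RGB-D features).
    levels       : pyramid depth; level l splits the sequence into
                   2**l equal temporal cells, preserving gesture order.
    """
    codeword_ids = np.asarray(codeword_ids)
    n = len(codeword_ids)
    hists = []
    for level in range(levels):
        cells = 2 ** level
        bounds = np.linspace(0, n, cells + 1, dtype=int)
        for start, end in zip(bounds[:-1], bounds[1:]):
            h = np.bincount(codeword_ids[start:end],
                            minlength=num_codewords)
            # Normalize each cell so longer gestures do not dominate.
            hists.append(h / max(end - start, 1))
    return np.concatenate(hists)

# Example: a 60-frame gesture quantized against a 16-word codebook.
rng = np.random.default_rng(0)
frames = rng.integers(0, 16, size=60)
feat = temporal_pyramid_histogram(frames, num_codewords=16)
print(feat.shape)  # (16 * (1 + 2 + 4),) = (112,)
```

The resulting fixed-length vector can then be fed to any classifier; a coarse first layer followed by a finer second layer is one way a two-layered scheme could cut computation, since most candidates are rejected cheaply at the first stage.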



Acknowledgments

We would like to acknowledge the editors and reviewers, whose valuable comments greatly improved the manuscript. This work was supported in part by the Major State Basic Research Development Program of China (973 Program 2015CB351804) and the National Natural Science Foundation of China under Grant Nos. 61572155 and 61272386.

Author information

Corresponding author

Correspondence to Feng Jiang.


About this article


Cite this article

Jiang, F., Ren, J., Lee, C. et al. Spatial and temporal pyramid-based real-time gesture recognition. J Real-Time Image Proc 13, 599–611 (2017). https://doi.org/10.1007/s11554-016-0620-0

