Abstract
Human hands are dexterous and have always been an intuitive means of instructing or communicating with peers. In recent years, hand gestures have also been widely used as a novel means of human-computer interaction. However, existing approaches target only single-handed gestures, not gestures in which two hands are in close proximity (bimanual gestures). This paper therefore tackles problems in bimanual gesture recognition that have not been well studied in the literature. To overcome the critical issue of hand-hand self-occlusion in bimanual gestures, multiple cameras at different viewpoints are used. A tailored multi-camera system is constructed to acquire multi-view bimanual gesture data. Classifiers are trained on our bimanual gesture dataset using both shape and color features. A weighted-sum fusion scheme combines the results predicted by the different classifiers, with the fusion weights optimized according to how well recognition performs on each view. Our experiments show that multiple-view results outperform single-view results. The proposed method is especially suitable for interactive multimedia applications, such as our two demo programs: a video game and a sign language learner.
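The weighted-sum fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the weights are optimized, so this example assumes the simplest option: each view's weight is its validation accuracy, normalized so the weights sum to one. The function names `accuracy_weights` and `fuse_views`, and all numbers in the usage lines, are hypothetical and not taken from the paper.

```python
def accuracy_weights(view_accuracies):
    """Turn per-view validation accuracies into weights that sum to 1.

    Assumption: weights are proportional to accuracy. The paper's actual
    weight optimization may differ.
    """
    total = sum(view_accuracies)
    if total <= 0:
        raise ValueError("at least one view must have positive accuracy")
    return [a / total for a in view_accuracies]


def fuse_views(view_probs, weights):
    """Weighted-sum fusion of per-view class probabilities.

    view_probs: one probability list per camera view, all of equal length.
    weights:    one weight per view.
    Returns (predicted_class_index, fused_scores).
    """
    n_classes = len(view_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, view_probs))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused


# Hypothetical example: three camera views, two gesture classes.
weights = accuracy_weights([0.5, 0.9, 0.8])
view_probs = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
label, fused = fuse_views(view_probs, weights)
```

In this sketch, a view that suffers heavily from occlusion (and therefore scores a low validation accuracy) contributes less to the final decision, which is the intuition behind per-view weighting.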
Acknowledgments
The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong SAR, China (Ref. No. UGC/FDS11/E03/15).
About this article
Cite this article
Poon, G., Kwan, K.C. & Pang, WM. Occlusion-robust bimanual gesture recognition by fusing multi-views. Multimed Tools Appl 78, 23469–23488 (2019). https://doi.org/10.1007/s11042-019-7660-y
DOI: https://doi.org/10.1007/s11042-019-7660-y