Abstract
This paper proposes a novel moving hand segmentation approach that combines skin color, grayscale, depth, and motion cues for gesture recognition. The approach does not rely on restrictive assumptions about the user or the scene, and it handles hand-over-face occlusion. First, an online updated skin color histogram (OUSCH) model is built to represent skin color robustly; second, a motion region of interest (MRoI) is adaptively extracted from the variance of grayscale and depth optical flow to locate the moving body part (MBP) and reduce the impact of noise; then, Harris-Affine corners in the MRoI that satisfy skin color and adaptive motion constraints are adopted as skin seed points; next, the seed points are grown into a candidate hand region using skin color, depth, and motion criteria; finally, boundary depth gradient, skeleton extraction, and shortest path search are employed to segment the moving hand region from the candidate region. Experimental results demonstrate that the proposed approach accurately segments moving hand regions in a variety of situations, especially when the face is occluded by a hand, and achieves higher segmentation accuracy than other state-of-the-art approaches.
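To make the pipeline concrete, the following is a minimal Python/OpenCV sketch of the stages named above. It is an illustration under stated assumptions, not the paper's implementation: the OUSCH model is approximated by an exponentially updated H-S histogram, the Harris-Affine detector by OpenCV's plain Harris corners, and all thresholds (skin_t, motion_t, depth_t) and function names are placeholders; the final boundary-depth-gradient, skeleton, and shortest-path refinement step is omitted.

```python
import cv2
import numpy as np

def update_skin_histogram(hist, skin_patch_hsv, alpha=0.1):
    # H-S histogram of known skin pixels, blended into the running model
    # by an exponential moving average (a stand-in for the paper's OUSCH).
    new = cv2.calcHist([skin_patch_hsv], [0, 1], None,
                       [32, 32], [0, 180, 0, 256])
    cv2.normalize(new, new, 0, 255, cv2.NORM_MINMAX)
    if hist is None:
        return new
    return ((1.0 - alpha) * hist + alpha * new).astype(np.float32)

def skin_probability(frame_bgr, hist):
    # Back-project the histogram to get a per-pixel skin likelihood map.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    return cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], 1)

def motion_roi(prev_gray, gray, flow_thresh=1.0):
    # Dense Farneback optical flow; the bounding box of pixels whose flow
    # magnitude exceeds a threshold approximates the MRoI around the MBP.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2).astype(np.float32)
    mask = (mag > flow_thresh).astype(np.uint8)
    return cv2.boundingRect(mask), mag  # (x, y, w, h), flow magnitude

def skin_seeds(gray, skin_prob, mag, roi, skin_t=128, motion_t=1.0):
    # Harris corners inside the MRoI that also satisfy skin-color and
    # motion constraints (plain Harris here, not Harris-Affine).
    x, y, w, h = roi
    pts = cv2.goodFeaturesToTrack(gray[y:y + h, x:x + w], maxCorners=100,
                                  qualityLevel=0.01, minDistance=5,
                                  useHarrisDetector=True)
    seeds = []
    if pts is not None:
        for cx, cy in pts.reshape(-1, 2).astype(int):
            px, py = x + cx, y + cy
            if skin_prob[py, px] > skin_t and mag[py, px] > motion_t:
                seeds.append((px, py))
    return seeds

def grow_candidate_region(seeds, skin_prob, depth, mag,
                          skin_t=128, depth_t=40, motion_t=0.5):
    # 4-connected region growing from the seed points: a pixel joins the
    # candidate hand region if it looks like skin, moves, and lies at a
    # depth close to the first seed (all thresholds are assumptions).
    h, w = skin_prob.shape
    region = np.zeros((h, w), np.uint8)
    if not seeds:
        return region
    ref_depth = int(depth[seeds[0][1], seeds[0][0]])
    stack = list(seeds)
    while stack:
        px, py = stack.pop()
        if not (0 <= px < w and 0 <= py < h) or region[py, px]:
            continue
        if (skin_prob[py, px] > skin_t and mag[py, px] > motion_t
                and abs(int(depth[py, px]) - ref_depth) < depth_t):
            region[py, px] = 255
            stack += [(px + 1, py), (px - 1, py),
                      (px, py + 1), (px, py - 1)]
    return region
```

One design point the sketch preserves: a single dense optical flow computation supplies both the motion mask that defines the MRoI and the per-pixel motion test reused by the seed-selection and region-growing stages, which is why motion can act as a cue at every step without extra cost.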
Cite this article
Lin, J., Ruan, X., Yu, N. et al. Multi-cue based moving hand segmentation for gesture recognition. Aut. Control Comp. Sci. 51, 193–203 (2017). https://doi.org/10.3103/S0146411617030063