Journal of Signal Processing Systems, Volume 91, Issue 10, pp 1105–1113

2D Hand Detection Using Multi-Feature Skin Model Supervised Cascaded CNN

  • Qiangyu Wang
  • Guoying Zhang
  • Shu Yu


Hand gesture recognition is one of the most popular forms of human-computer interface. The first step in most vision-based gesture recognition systems is hand detection and segmentation. Because hands are involved in a wide variety of daily tasks, detection suffers from both extreme illumination changes and the intrinsic variability of hand appearance. To overcome these problems, we propose a new method for 2D hand detection that combines multi-feature hand proposal generation with cascaded convolutional neural network (CCNN) classification. To account for varying luminance, we use color, Gabor, HOG and SIFT features to discriminate skin regions and generate hand proposals. We also propose a cascaded CNN that retains deep context information to detect hands among the proposals. The proposed Multi-Feature Supervised Cascaded CNN (MFS-CCNN) method is tested on a combination of several datasets: the Oxford Hands Dataset, the VIVA hand detection dataset and the Egohands Dataset as positive samples, and ImageNet 2012 and the FDDB dataset as negative samples. The proposed method achieves competitive results.
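The two-stage idea in the abstract (skin-based proposal generation followed by a cascade that rejects proposals early) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the sketch uses color only (no Gabor, HOG or SIFT features), the Cb/Cr thresholds are a common rule of thumb, and the hypothetical `box_area` stage merely stands in for a CNN stage.

```python
from collections import deque

# Assumed Cb/Cr skin range (a common rule of thumb, not the paper's model).
CB_RANGE = (77.0, 127.0)
CR_RANGE = (133.0, 173.0)

def is_skin(rgb):
    """Color-only skin test on one (R, G, B) pixel using BT.601 chroma."""
    r, g, b = rgb
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]

def hand_proposals(image):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected skin regions."""
    h, w = len(image), len(image[0])
    mask = [[is_skin(px) for px in row] for row in image]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected skin region.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def box_area(box):
    """Hypothetical stand-in score for a classifier stage: box area in pixels."""
    x0, y0, x1, y1 = box
    return (x1 - x0 + 1) * (y1 - y0 + 1)

def cascade(proposals, stages):
    """Keep proposals passing every (score_fn, threshold) stage.

    all() short-circuits, so later stages never run on a rejected proposal.
    """
    return [box for box in proposals
            if all(score(box) >= thr for score, thr in stages)]

# Toy image: blue background, one 2x3 skin-colored patch, one stray skin pixel.
SKIN, BG = (224, 172, 105), (20, 60, 200)
image = [[BG] * 6 for _ in range(5)]
for y in range(1, 4):
    for x in range(2, 4):
        image[y][x] = SKIN
image[4][5] = SKIN

proposals = hand_proposals(image)
detections = cascade(proposals, [(box_area, 4)])
print(proposals, detections)
```

In a fuller pipeline each `(score_fn, threshold)` pair would presumably wrap a trained network of increasing capacity, so that cheap stages discard most non-hand proposals before the expensive ones run.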


Keywords: Hand detection; Feature modeling; Convolutional neural networks



This work is supported by the Foundation of China Institute of Water Resources and Hydropower Research (GE0145B112017).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Mechanical Electronic & Information Engineering, China University of Mining & Technology, Beijing, People’s Republic of China
  2. State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, Beijing, China
  3. Department of Geotechnical Engineering, China Institute of Water Resources and Hydropower Research, Beijing, China
