Pattern Recognition and Image Analysis

Volume 18, Issue 3, pp. 417–430

Robust hand detection in still video images using a combination of salient regions and color cues for interaction with an intelligent environment

  • T. Plötz
  • J. Richarz
  • G. A. Fink
Application Problems


The “intelligence” of an intelligent environment is influenced not only by the functionality it offers but also, to a large extent, by the naturalness and intuitiveness of its interaction modes. Gestures are a very important natural interaction mode, provided that the environment’s interface imposes no strict constraints on how they may be performed. Since gestures are generally defined by hand/arm poses and motions, an important prerequisite for the recognition of unconstrained gestures is the robust detection of hands in video images. However, owing to the strongly articulated nature of hands and the challenges posed by a realistic (i.e., not strictly controlled) environment, this is a very difficult task: hands must be found in almost arbitrary configurations and under strongly varying lighting conditions. In this article, we present an approach to hand detection in the context of an intelligent house that fuses structural cues with color information. We first describe our detection algorithm, which uses scale-invariant salient region features combined with an efficient region-based filtering approach to reduce the number of false positives. The results are then fused with the output of a skin color classifier. We present a detailed experimental evaluation on realistic data, including a comparison of different cue fusion schemes. This evaluation on a challenging task demonstrates that, although each of the two feature types (image structure and color) has drawbacks, their combination yields promising results for robust hand detection.
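The abstract mentions that different cue fusion schemes are compared but does not name them. As a hedged illustration only, the sketch below assumes that each candidate region already carries a structural score and a skin-color score in [0, 1] (both hypothetical inputs) and combines them with three generic late-fusion rules: weighted sum, product, and max. These are standard classifier-combination rules, not necessarily the schemes evaluated in the article.

```python
"""Illustrative late fusion of two per-region detection cues.

Hypothetical setup (not taken from the article): each candidate hand
region has a structural score and a skin-color score, both in [0, 1].
"""

from typing import Callable, Dict, List, Tuple


def weighted_sum(s: float, c: float, w: float = 0.5) -> float:
    """Linear combination: w * structure + (1 - w) * color."""
    return w * s + (1.0 - w) * c


def product_rule(s: float, c: float) -> float:
    """Product of the two scores (implicitly treats the cues as independent)."""
    return s * c


def max_rule(s: float, c: float) -> float:
    """Accept a region as soon as either cue is confident."""
    return max(s, c)


# Hypothetical registry of fusion rules, keyed by a short name.
FUSION_RULES: Dict[str, Callable[[float, float], float]] = {
    "sum": weighted_sum,
    "product": product_rule,
    "max": max_rule,
}


def fuse_and_detect(
    regions: List[Tuple[str, float, float]],
    rule: str = "sum",
    threshold: float = 0.5,
) -> List[str]:
    """Return the IDs of regions whose fused score reaches the threshold.

    regions: (region_id, structure_score, color_score) tuples.
    """
    fuse = FUSION_RULES[rule]
    return [rid for rid, s, c in regions if fuse(s, c) >= threshold]


if __name__ == "__main__":
    # Made-up scores chosen only to show how the rules differ.
    candidates = [
        ("hand", 0.8, 0.9),     # both cues agree
        ("face", 0.2, 0.95),    # skin-colored, but not hand-like structure
        ("texture", 0.7, 0.1),  # hand-like structure, but not skin-colored
    ]
    for name in FUSION_RULES:
        print(name, fuse_and_detect(candidates, rule=name))
```

With these made-up scores, the product rule keeps only `hand`, the weighted sum also admits `face`, and the max rule admits all three candidates. This mirrors the trade-off the abstract describes: each cue alone produces false positives that the other cue can help suppress, depending on how strictly the fusion rule requires agreement.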







Copyright information

© Pleiades Publishing, Ltd. 2008

Authors and Affiliations

  1. Intelligent Systems Group, Robotics Research Institute, Dortmund University of Technology, Dortmund, Germany
