Robust hand detection in still video images using a combination of salient regions and color cues for interaction with an intelligent environment
The “intelligence” of an intelligent environment depends not only on the functionality it offers but also, to a large extent, on how natural and intuitive its interaction modes are. Gestures are a particularly important natural interaction mode, provided that the environment’s interface imposes no strict constraints on how they are performed. Since gestures are generally defined by hand/arm poses and motions, robust detection of hands in video images is an important prerequisite for recognizing unconstrained gestures. This task is very challenging, however: hands are strongly articulated, and in a realistic (i.e., not strictly controlled) environment they must be found in almost arbitrary configurations and under strongly varying lighting conditions. In this article, we present an approach to hand detection in the context of an intelligent house that fuses structural cues with color information. We first describe our detection algorithm, which uses scale-invariant salient region features combined with an efficient region-based filtering approach to reduce the number of false positives. The results are then fused with the output of a skin color classifier. We present a detailed experimental evaluation on realistic data, including a comparison of different cue fusion schemes, and demonstrate that although each of the two feature types (image structure and color) has drawbacks, their combination yields promising results for robust hand detection.
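The abstract does not state which fusion rules are compared, so the following is only a minimal sketch of score-level cue fusion under stated assumptions. It assumes that the salient-region detector yields candidate boxes, each with a structural confidence in [0, 1], and that the skin color classifier yields a per-pixel skin probability map. The names `Candidate`, `mean_skin_probability`, `fuse`, and `detect_hands`, as well as the default weight and threshold, are hypothetical. The sum, product, max, and min rules are standard classifier-fusion rules, not necessarily the schemes evaluated by the authors.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    # Hypothetical hand candidate from a salient-region detector:
    # bounding box (x, y, w, h) plus a structural score in [0, 1].
    x: int
    y: int
    w: int
    h: int
    structure_score: float


def mean_skin_probability(skin_map: Sequence[Sequence[float]], c: Candidate) -> float:
    """Average per-pixel skin probability inside the candidate's box.

    Assumes skin_map is indexed as skin_map[row][col] and the box lies inside it.
    """
    values = [skin_map[r][col]
              for r in range(c.y, c.y + c.h)
              for col in range(c.x, c.x + c.w)]
    return sum(values) / len(values) if values else 0.0


def fuse(structure: float, color: float, rule: str = "sum", weight: float = 0.5) -> float:
    """Combine two cue scores in [0, 1] with a standard fusion rule."""
    if rule == "sum":
        # Weighted linear combination; `weight` applies to the structural cue.
        return weight * structure + (1.0 - weight) * color
    if rule == "product":
        # Both cues must support the hand hypothesis.
        return structure * color
    if rule == "max":
        return max(structure, color)
    if rule == "min":
        return min(structure, color)
    raise ValueError(f"unknown fusion rule: {rule}")


def detect_hands(candidates: List[Candidate],
                 skin_map: Sequence[Sequence[float]],
                 rule: str = "sum",
                 weight: float = 0.5,
                 threshold: float = 0.5) -> List[Tuple[Candidate, float]]:
    """Keep candidates whose fused score reaches a (hypothetical) threshold."""
    accepted = []
    for c in candidates:
        color = mean_skin_probability(skin_map, c)
        score = fuse(c.structure_score, color, rule, weight)
        if score >= threshold:
            accepted.append((c, score))
    return accepted
```

With this setup, a structurally hand-like region on a skin-free background (for example, a textured wall patch) is rejected because its color score pulls the fused score below the threshold, while a region supported by both cues survives. The product rule is stricter than the weighted sum, since a near-zero score from either cue suppresses the candidate entirely.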