Extracting hand articulations from monocular depth images using curvature scale space descriptors

Frontiers of Information Technology & Electronic Engineering

Abstract

We propose a framework for detecting hand articulations from a monocular depth image using curvature scale space (CSS) descriptors. We extract the hand contour from the input depth image and locate the fingertips and finger-valleys as the local extrema of a modified CSS map of the contour. We then recover undetected fingertips from the local change in depth of points in the interior of the contour. Compared with traditional appearance-based approaches that use either angle detectors or convex hull detectors, the modified CSS descriptor extracts fingertips and finger-valleys more precisely because it is more robust to noisy or corrupted data. Moreover, the local extrema of depth recover the fingertips of bending fingers well, whereas traditional appearance-based approaches hardly work in that case without matching against hand models. Experimental results show that our method captures hand articulations more precisely than three state-of-the-art appearance-based approaches.
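The contour step described above can be illustrated with a minimal sketch. It uses the standard CSS idea (curvature of a Gaussian-smoothed closed contour at a single scale), not the modified CSS map of the paper, whose details the abstract does not give. All names here (`smooth_closed`, `curvature`, `curvature_extrema`, `demo_contour`) and the parameters `sigma` and `thresh` are illustrative assumptions, and the contour is assumed to be closed and ordered counter-clockwise, so that convex points such as fingertips have positive curvature and concave points such as finger-valleys have negative curvature.

```python
import numpy as np

def smooth_closed(signal, sigma):
    """Circular Gaussian smoothing of one coordinate of a closed contour."""
    radius = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(signal, radius, mode="wrap")  # periodic boundary
    return np.convolve(padded, kernel, mode="valid")

def _diff(f):
    """Central difference along a closed (periodic) sequence."""
    return (np.roll(f, -1) - np.roll(f, 1)) / 2.0

def curvature(x, y, sigma):
    """Signed curvature of the contour smoothed at scale sigma (in samples)."""
    xs = smooth_closed(np.asarray(x, dtype=float), sigma)
    ys = smooth_closed(np.asarray(y, dtype=float), sigma)
    dx, dy = _diff(xs), _diff(ys)
    ddx, ddy = _diff(dx), _diff(dy)
    speed_cubed = np.maximum((dx**2 + dy**2) ** 1.5, 1e-12)
    return (dx * ddy - dy * ddx) / speed_cubed

def curvature_extrema(x, y, sigma=3.0, thresh=1.0):
    """Indices of strong convex maxima (tip candidates) and
    strong concave minima (valley candidates) of the curvature."""
    k = curvature(x, y, sigma)
    prev, nxt = np.roll(k, 1), np.roll(k, -1)
    tips = np.flatnonzero((k > prev) & (k >= nxt) & (k > thresh))
    valleys = np.flatnonzero((k < prev) & (k <= nxt) & (k < -thresh))
    return tips, valleys

def demo_contour(n=500):
    """Synthetic five-lobed closed curve standing in for a hand contour."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    r = 1.0 + 0.3 * np.cos(5.0 * t)
    return r * np.cos(t), r * np.sin(t)

x, y = demo_contour()
tips, valleys = curvature_extrema(x, y)
```

On the synthetic five-lobed curve, `tips` holds one index per lobe and `valleys` one index per gap between lobes. A full CSS map would repeat the curvature computation over a range of `sigma` values and track how the extrema evolve across scales; the threshold used here is only a stand-in for that multi-scale selection.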

Author information

Corresponding author

Correspondence to De-hui Kong.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 61227004, 61370120, 61390510, 61300065, and 61402024), Beijing Municipal Natural Science Foundation, China (No. 4142010), Beijing Municipal Commission of Education, China (No. km201410005013), and the Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality, China.

ORCID: Shao-fan WANG, http://orcid.org/0000-0002-3045-624X

About this article

Cite this article

Wang, Sf., Li, C., Kong, Dh. et al. Extracting hand articulations from monocular depth images using curvature scale space descriptors. Frontiers Inf Technol Electronic Eng 17, 41–54 (2016). https://doi.org/10.1631/FITEE.1500126

  • DOI: https://doi.org/10.1631/FITEE.1500126