Extracting hand articulations from monocular depth images using curvature scale space descriptors

Wang, Shao-fan; Li, Chun; Kong, De-hui; Yin, Bao-cai

doi:10.1631/FITEE.1500126

Published: 09 January 2016

Volume 17, pages 41–54, (2016)

Frontiers of Information Technology & Electronic Engineering

Shao-fan Wang¹, Chun Li¹, De-hui Kong¹ & Bao-cai Yin²,¹,³

Abstract

We propose a framework for detecting hand articulations from a monocular depth image using curvature scale space (CSS) descriptors. We extract the hand contour from the input depth image and locate the fingertips and finger-valleys at the local extrema of a modified CSS map of the contour. We then recover undetected fingertips from the local change in depth of points in the interior of the contour. Compared with traditional appearance-based approaches that use either angle detectors or convex hull detectors, the modified CSS descriptor extracts the fingertips and finger-valleys more precisely because it is more robust to noisy or corrupted data. Moreover, the local extrema of depth recover the fingertips of bending fingers well, whereas traditional appearance-based approaches hardly work without matching hand models. Experimental results show that our method captures hand articulations more precisely than three state-of-the-art appearance-based approaches.
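The contour step described in the abstract can be illustrated with a minimal sketch of the standard CSS building block: smooth a closed contour with a Gaussian at one scale, compute its curvature, and keep the strong local extrema. This is not the authors' modified CSS map, whose details are not given on this page. The function names, the `sigma` and `threshold` parameters, and the synthetic five-lobed contour standing in for a hand outline are all assumptions made for illustration. The contour is assumed to be sampled counter-clockwise, so convex points (fingertip candidates) have positive curvature and concave points (finger-valley candidates) have negative curvature.

```python
import numpy as np


def smooth_closed(signal, sigma):
    """Gaussian-smooth a 1-D signal sampled around a closed contour (wraps around)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    # Pad circularly so the first and last samples are smoothed like any others.
    padded = np.concatenate([signal[-radius:], signal, signal[:radius]])
    return np.convolve(padded, kernel, mode="valid")


def contour_curvature(x, y, sigma):
    """Signed curvature of the closed contour (x, y) after smoothing at scale sigma."""
    xs, ys = smooth_closed(x, sigma), smooth_closed(y, sigma)
    # Circular central differences for first and second derivatives.
    dx = (np.roll(xs, -1) - np.roll(xs, 1)) / 2.0
    dy = (np.roll(ys, -1) - np.roll(ys, 1)) / 2.0
    ddx = np.roll(xs, -1) - 2.0 * xs + np.roll(xs, 1)
    ddy = np.roll(ys, -1) - 2.0 * ys + np.roll(ys, 1)
    # Assumes no repeated points, so the speed term is never zero.
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5


def curvature_extrema(kappa, threshold):
    """Indices of strong curvature maxima (tip candidates) and minima (valley candidates)."""
    prev_k, next_k = np.roll(kappa, 1), np.roll(kappa, -1)
    tips = np.flatnonzero((kappa > prev_k) & (kappa > next_k) & (kappa > threshold))
    valleys = np.flatnonzero((kappa < prev_k) & (kappa < next_k) & (kappa < -threshold))
    return tips, valleys


# Hypothetical test contour: five lobes, sampled counter-clockwise.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
r = 1.0 + 0.3 * np.cos(5.0 * theta)
x, y = r * np.cos(theta), r * np.sin(theta)

kappa = contour_curvature(x, y, sigma=2.0)
tips, valleys = curvature_extrema(kappa, threshold=1.0)
print(len(tips), "tip candidates;", len(valleys), "valley candidates")
```

A full CSS map would repeat the curvature computation over a range of `sigma` values and track the extrema across scales; the single-scale version above only shows the core operation. Extracting the contour from a depth image and recovering bent fingertips from interior depth extrema are separate steps that this sketch does not cover.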

Author information

Authors and Affiliations

Beijing Key Laboratory of Multimedia and Intelligent Software Technology, College of Metropolitan Transportation, Beijing University of Technology, Beijing, 100124, China
Shao-fan Wang, Chun Li, De-hui Kong & Bao-cai Yin
School of Software Technology, Dalian University of Technology, Dalian, 116024, China
Bao-cai Yin
Collaborative Innovation Center of Electric Vehicles in Beijing, Beijing, 100081, China
Bao-cai Yin

Corresponding author

Correspondence to De-hui Kong.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 61227004, 61370120, 61390510, 61300065, and 61402024), Beijing Municipal Natural Science Foundation, China (No. 4142010), Beijing Municipal Commission of Education, China (No. km201410005013), and the Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality, China.

ORCID: Shao-fan WANG, http://orcid.org/0000-0002-3045-624X

About this article

Cite this article

Wang, Sf., Li, C., Kong, Dh. et al. Extracting hand articulations from monocular depth images using curvature scale space descriptors. Frontiers Inf Technol Electronic Eng 17, 41–54 (2016). https://doi.org/10.1631/FITEE.1500126

Received: 20 April 2015
Accepted: 26 October 2015
Published: 09 January 2016
Issue Date: January 2016
DOI: https://doi.org/10.1631/FITEE.1500126
