Real-Time Hand Pose Estimation Using Depth Camera

Ge, Liuhao; Yuan, Junsong; Magnenat Thalmann, Nadia

doi:10.1007/978-3-030-28603-3_16

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Research on real-time 3D hand pose estimation with depth cameras has grown steadily in recent years, because this technology plays an important role in various human–computer interaction applications. In this chapter, we first review existing techniques and systems for real-time 3D hand pose estimation. We then discuss two point-set-based methods for 3D hand pose estimation from depth images: (1) a point-set-based holistic regression method that directly regresses the holistic 3D hand pose; and (2) a point-set-based point-wise regression method that generates dense outputs for robust 3D hand pose estimation. Extensive experiments are conducted to evaluate the effectiveness of these two methods. We also discuss the advantages and limitations of the proposed methods.
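
The difference between the two formulations named in the abstract can be illustrated with a minimal NumPy sketch. It is a toy illustration under stated assumptions, not the chapter's networks: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the function names, and the single linear layer standing in for a learned regressor are all hypothetical, and averaging per-point votes is only one simple way to aggregate dense outputs.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (H, W) into an (N, 3) point set.

    Assumes a pinhole camera; pixels with depth <= 0 are treated as invalid.
    """
    v, u = np.nonzero(depth > 0)           # rows and columns of valid pixels
    z = depth[v, u].astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def holistic_pose(global_feature, weights, num_joints):
    """Holistic regression: one feature vector for the whole point set
    is mapped to all joint coordinates at once.

    global_feature: (F,), weights: (F, 3 * num_joints); the linear map is
    a stand-in for a learned regressor.
    """
    return (global_feature @ weights).reshape(num_joints, 3)

def pointwise_pose(points, offsets):
    """Point-wise regression: every point gives an estimate for every joint,
    and the dense estimates are averaged into one pose.

    points: (N, 3); offsets: (N, J, 3), the displacement predicted from each
    point to each joint.
    """
    votes = points[:, None, :] + offsets   # (N, J, 3) per-point joint estimates
    return votes.mean(axis=0)              # (J, 3) aggregated pose

if __name__ == "__main__":
    depth = np.array([[0.0, 2.0], [3.0, 4.0]])
    pts = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    joints = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 2.0]])
    exact_offsets = joints[None, :, :] - pts[:, None, :]
    print(pointwise_pose(pts, exact_offsets))
```

The holistic function returns a single pose from one global descriptor, whereas the point-wise function combines many per-point estimates, so an error in any single estimate is diluted by the average.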

Author information

Authors and Affiliations

Institute for Media Innovation, Nanyang Technological University Singapore, Singapore, 637553, Singapore
Liuhao Ge & Nadia Magnenat Thalmann
Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, 14260-2500, USA
Junsong Yuan

Authors

Liuhao Ge
Junsong Yuan
Nadia Magnenat Thalmann

Corresponding author

Correspondence to Junsong Yuan.

Editor information

Editors and Affiliations

School of Computer Science and Informatics, Cardiff University, Cardiff, UK
Paul L. Rosin
School of Computer Science and Informatics, Cardiff University, Cardiff, UK
Yu-Kun Lai
IEEE, University of East Anglia, Norwich, UK
Ling Shao
Department of Computer Science, Edge Hill University, Ormskirk, UK
Yonghuai Liu

About this chapter

Cite this chapter

Ge, L., Yuan, J., Magnenat Thalmann, N. (2019). Real-Time Hand Pose Estimation Using Depth Camera. In: Rosin, P., Lai, YK., Shao, L., Liu, Y. (eds) RGB-D Image Analysis and Processing. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-28603-3_16

DOI: https://doi.org/10.1007/978-3-030-28603-3_16
Published: 27 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28602-6
Online ISBN: 978-3-030-28603-3
eBook Packages: Computer Science, Computer Science (R0)
