The Visual Computer, Volume 29, Issue 6–8, pp 837–848

Model-based hand pose estimation via spatial-temporal hand parsing and 3D fingertip localization

  • Hui Liang
  • Junsong Yuan
  • Daniel Thalmann
  • Zhengyou Zhang
Original Article


In this paper we present a novel vision-based, markerless hand pose estimation scheme that takes depth image sequences as input. The proposed scheme exploits both the temporal constraints and the spatial features of the input sequence, and focuses on hand parsing and 3D fingertip localization for hand pose estimation. The hand parsing algorithm incorporates a novel spatial-temporal feature into a Bayesian inference framework to assign the correct label to each image pixel. The 3D fingertip localization algorithm adapts a recently developed geodesic extrema extraction method to fingertip detection by combining it with the hand parsing algorithm, a novel path-reweighting method, and K-means clustering in metric space. The detected 3D fingertip locations are then used for hand pose estimation with an inverse kinematics solver. Quantitative experiments on synthetic data show that the proposed scheme can accurately capture natural hand motion. A simulated water-oscillator application is also built to demonstrate the effectiveness of the proposed method in human-computer interaction scenarios.
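The abstract names geodesic extrema extraction as the basis of fingertip localization but does not spell it out. The sketch below shows the generic idea only, under stated assumptions: build a neighbourhood graph over 3D points, compute shortest-path (geodesic) distances from a root point (for a hand, something like the palm centre), take the farthest point as an extremum, add it as an extra zero-distance source, and repeat. The helper names `build_graph`, `multi_source_dijkstra` and `geodesic_extrema`, the fixed neighbourhood `radius`, and the toy point set are all hypothetical; the paper's hand-parsing constraint, path reweighting and K-means clustering are not reproduced here.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Graph = Dict[int, List[Tuple[int, float]]]


def build_graph(points: Sequence[Point], radius: float) -> Graph:
    """Hypothetical helper: link every pair of points closer than `radius`,
    weighting each edge by its Euclidean length (O(n^2), fine for a sketch)."""
    graph: Graph = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d <= radius:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def multi_source_dijkstra(graph: Graph, sources: Sequence[int]) -> Dict[int, float]:
    """Geodesic (shortest-path) distance from each node to its nearest source."""
    dist = {i: math.inf for i in graph}
    heap = []
    for s in sources:
        dist[s] = 0.0
        heap.append((0.0, s))
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def geodesic_extrema(points: Sequence[Point], root: int, k: int, radius: float) -> List[int]:
    """Return up to k point indices, each the geodesically farthest point from
    the root and all previously found extrema (equivalent to linking each new
    extremum to the root with a zero-cost edge)."""
    graph = build_graph(points, radius)
    sources = [root]
    extrema: List[int] = []
    for _ in range(k):
        dist = multi_source_dijkstra(graph, sources)
        reachable = {i: d for i, d in dist.items() if math.isfinite(d)}
        farthest = max(reachable, key=reachable.get)
        if reachable[farthest] == 0.0:
            break  # every reachable point is already a source
        extrema.append(farthest)
        sources.append(farthest)
    return extrema


# Toy "hand": five straight arms of different lengths radiating from a root
# at the origin, sampled at unit spacing (a stand-in, not real depth data).
directions = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, 1)]
lengths = [10, 9, 8, 7, 6]
points: List[Point] = [(0, 0, 0)]
for (dx, dy, dz), n in zip(directions, lengths):
    for t in range(1, n + 1):
        points.append((dx * t, dy * t, dz * t))

tips = geodesic_extrema(points, root=0, k=5, radius=1.01)
print([points[i] for i in tips])
# → [(10, 0, 0), (0, 9, 0), (-8, 0, 0), (0, -7, 0), (0, 0, 6)]
```

On this clean toy input the five extrema are exactly the five arm tips, found in order of decreasing geodesic length. On real depth data, raw extrema also land on wrist or silhouette noise, which is why the abstract pairs the extraction with hand parsing, path reweighting and clustering before the fingertips are passed to the inverse kinematics solver.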


Keywords: Fingertip detection; Geodesic distance; Hand pose estimation; Human-computer interaction



This research, which was carried out at the BeingThere Centre, is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Hui Liang (1)
  • Junsong Yuan (2)
  • Daniel Thalmann (3)
  • Zhengyou Zhang (4)
  1. Institute for Media Innovation & School of EEE, Nanyang Technological University, Singapore
  2. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
  3. Institute for Media Innovation, Nanyang Technological University, Singapore
  4. Microsoft Research, Redmond, USA
