3D Hand Pose Detection in Egocentric RGB-D Images

  • Grégory Rogez
  • Maryam Khademi
  • J. S. Supančič III
  • J. M. M. Montiel
  • Deva Ramanan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8925)


We focus on the task of hand pose estimation from egocentric viewpoints. For this problem, we show that depth sensors are particularly informative for extracting the near-field interactions of the camera wearer with his/her environment. Despite recent advances in full-body pose estimation using Kinect-like sensors, reliable monocular hand pose estimation in RGB-D images remains an unsolved problem. The problem is exacerbated with a wearable sensor and a first-person viewpoint: the occlusions inherent to this camera view and the limited field of view make it even more difficult. We propose to use task- and viewpoint-specific synthetic training exemplars in a discriminative detection framework. We also exploit depth features for sparser and faster detection. We evaluate our approach on a real-world annotated dataset and propose a novel annotation technique for accurate 3D hand labelling, even in the case of partial occlusions.
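The abstract states that depth features make detection sparser and faster. The sketch below shows one generic way depth can do this in a sliding-window detector; it is an illustration under stated assumptions, not the method described in the paper. It assumes a pinhole camera with focal length `focal_px` (in pixels), a nominal hand extent `hand_size_m`, and an arm's-reach cutoff `max_reach_m`; the function name, parameter names, and default values are all hypothetical. Locations with no depth reading or lying beyond reach are skipped, and each surviving location is assigned a single window size derived from its depth instead of a full scale pyramid.

```python
from typing import List, Sequence, Tuple

Window = Tuple[int, int, int]  # (row, col, side length in pixels)


def candidate_windows(
    depth_m: Sequence[Sequence[float]],
    focal_px: float,
    hand_size_m: float = 0.2,   # hypothetical nominal hand extent (metres)
    max_reach_m: float = 0.7,   # hypothetical arm's-reach cutoff (metres)
    stride: int = 8,
) -> List[Window]:
    """List depth-consistent windows for a hypothetical sliding-window hand detector.

    depth_m is an H x W depth map in metres, where 0 marks a missing reading.
    """
    windows: List[Window] = []
    for r in range(0, len(depth_m), stride):
        row = depth_m[r]
        for c in range(0, len(row), stride):
            z = row[c]
            # Skip missing measurements and surfaces beyond the wearer's reach.
            if z <= 0 or z > max_reach_m:
                continue
            # Pinhole model: an object of size S at depth Z spans f * S / Z pixels.
            side = int(round(focal_px * hand_size_m / z))
            windows.append((r, c, side))
    return windows
```

Because each location receives one scale from its depth, a classifier would be evaluated on far fewer windows than in an exhaustive multi-scale scan; in practice the cutoff and nominal size would need tuning for a given sensor and wearer.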


Keywords: Egocentric vision · Hand pose · Multi-class classifier · RGB-D sensor



Supplementary material

Supplementary material 1 (MP4 17,420 KB)

Supplementary material 2 (MP4 16,284 KB)


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Grégory Rogez (1, 2), corresponding author
  • Maryam Khademi (1)
  • J. S. Supančič III (1)
  • J. M. M. Montiel (2)
  • Deva Ramanan (1)
  1. Department of Computer Science, University of California, Irvine, USA
  2. Aragon Institute of Engineering Research (i3A), Universidad de Zaragoza, Zaragoza, Spain
