3D Pose Estimation of a Front-Pointing Hand Using a Random Regression Forest

  • Dai Fujita
  • Takashi Komuro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10118)


In this paper, we propose a method for estimating the 3D pose of a front-pointing hand from camera images to enable freehand pointing interaction from a distance. Our method uses a Random Regression Forest (RRF) to achieve estimation that is robust to environmental and individual variations. To improve estimation accuracy, the method supports the use of two cameras and integrates the hand-pose distributions obtained from the two cameras, each modeled as a Gaussian mixture model. Tracking the hand pose over time further improves estimation accuracy and stability. In the performance evaluation, the root mean square error of the angle estimation was 4.10°, which suggests that the proposed method is accurate enough to be applied to user interface systems.
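The abstract does not state exactly how the two per-camera Gaussian mixtures are integrated. A common choice, assumed here and not confirmed by the abstract, is to treat the cameras as independent observations and multiply the two mixtures. Each pair of components then yields one Gaussian whose precision is the sum of the two precisions, weighted by how well the two component means agree. The minimal sketch below works on a single pose angle rather than the full 3D pose, ignores angle wrap-around, and takes the mean of the heaviest fused component as the point estimate. It also computes the root mean square error, the metric reported above. All function names (`fuse_mixtures`, `dominant_mean`, `rmse`) are hypothetical.

```python
import math
from typing import List, Tuple

# Assumption: a 1D Gaussian mixture component stored as (weight, mean, variance).
Component = Tuple[float, float, float]


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fuse_mixtures(gmm_a: List[Component], gmm_b: List[Component]) -> List[Component]:
    """Multiply two 1D Gaussian mixtures (independence assumption) and renormalize.

    For components (w_a, m_a, v_a) and (w_b, m_b, v_b), the product is
    w_a * w_b * N(m_a; m_b, v_a + v_b) * N(x; m, v), where
    1/v = 1/v_a + 1/v_b and m = v * (m_a/v_a + m_b/v_b).
    """
    fused = []
    for w_a, m_a, v_a in gmm_a:
        for w_b, m_b, v_b in gmm_b:
            var = 1.0 / (1.0 / v_a + 1.0 / v_b)
            mean = var * (m_a / v_a + m_b / v_b)
            # Pairs whose means disagree get a tiny weight.
            weight = w_a * w_b * gaussian_pdf(m_a, m_b, v_a + v_b)
            fused.append((weight, mean, var))
    total = sum(w for w, _, _ in fused)
    return [(w / total, m, v) for w, m, v in fused]


def dominant_mean(gmm: List[Component]) -> float:
    """Point estimate: the mean of the highest-weight component."""
    return max(gmm, key=lambda c: c[0])[1]


def rmse(estimates: List[float], truths: List[float]) -> float:
    """Root mean square error between estimated and ground-truth angles."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths))


# Usage: each camera reports two candidate angles (degrees). Fusing them keeps
# the hypothesis both cameras agree on (about 10 to 12 degrees).
camera_a = [(0.5, 10.0, 4.0), (0.5, 40.0, 4.0)]
camera_b = [(0.7, 12.0, 9.0), (0.3, 60.0, 9.0)]
fused = fuse_mixtures(camera_a, camera_b)
estimate = dominant_mean(fused)
```

Multiplying the mixtures lets a spurious mode seen by only one camera be suppressed when the other camera does not support it. The cost is that the number of components grows as the product of the two input sizes, so a practical system would typically prune or merge low-weight components.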


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Graduate School of Science and Engineering, Saitama University, Saitama, Japan
