Learning Fast Hand Pose Recognition

Chapter in: Computer Vision and Machine Learning with RGB-D Sensors

Abstract

Practical real-time hand pose recognition requires a highly accurate classifier that runs in a few milliseconds. We present a novel classifier architecture, the Discriminative Ferns Ensemble (DFE), to address this challenge. The architecture optimizes both classification speed and accuracy when a large training set is available. Speed is obtained by using simple binary features and direct indexing into a set of tables, and accuracy by using a large-capacity model and careful discriminative optimization. The proposed framework is applied to hand pose recognition in depth and infrared images, using a very large training set. Both the accuracy and the classification time obtained are considerably better than those of relevant competing methods, so accuracy targets can be reached with run times orders of magnitude shorter than those of the competition. We show empirically that, with DFE, classification time can be significantly reduced for a fixed target accuracy by increasing the training sample size. Finally, scalability to a large number of classes is tested using a synthetically generated data set of \(81\) classes.
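The speed argument above (simple binary features plus direct indexing into tables) can be illustrated with a minimal NumPy sketch of inference in a generic ferns ensemble. The following are assumptions for illustration, not details taken from the chapter: each of M ferns computes K bits by comparing pairs of pixel values; the K bits form an index into a per-fern table of 2^K rows holding one weight per class; and the class with the largest summed weight wins. The function names, the pixel-pair feature choice, and the random tables are hypothetical, and the chapter's discriminative training of the tables is not shown.

```python
import numpy as np


def fern_indices(image, pixel_pairs):
    """Return one table index per fern, built from K binary pixel comparisons.

    pixel_pairs has shape (M, K, 4); entry (r1, c1, r2, c2) for fern m, bit k
    names two pixel locations to compare (hypothetical feature choice).
    """
    r1, c1, r2, c2 = (pixel_pairs[..., i] for i in range(4))
    bits = (image[r1, c1] > image[r2, c2]).astype(np.int64)  # shape (M, K)
    k = pixel_pairs.shape[1]
    place_values = 1 << np.arange(k)  # 1, 2, 4, ..., 2^(K-1)
    return bits @ place_values  # shape (M,), each value in [0, 2^K)


def dfe_scores(image, pixel_pairs, tables):
    """Sum the per-class weight rows selected by each fern.

    tables has shape (M, 2**K, C): one table of class weights per fern.
    """
    idx = fern_indices(image, pixel_pairs)
    ferns = np.arange(tables.shape[0])
    return tables[ferns, idx].sum(axis=0)  # shape (C,)


def classify(image, pixel_pairs, tables):
    """Predict the class with the highest summed score."""
    return int(np.argmax(dfe_scores(image, pixel_pairs, tables)))


# Usage with random, untrained parameters (illustration only).
rng = np.random.default_rng(0)
M, K, C, H, W = 50, 8, 5, 64, 64  # ferns, bits per fern, classes, image size
image = rng.random((H, W))  # stand-in for a depth or infrared patch
pixel_pairs = np.stack(
    [rng.integers(0, H, (M, K)), rng.integers(0, W, (M, K)),
     rng.integers(0, H, (M, K)), rng.integers(0, W, (M, K))],
    axis=-1,
)
tables = rng.normal(size=(M, 2 ** K, C))
print(classify(image, pixel_pairs, tables))
```

In this sketch the per-image cost is M·K pixel comparisons plus the sum of M table rows of length C; enlarging the tables (larger K) grows memory exponentially while the lookup itself stays a single indexing operation per fern.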

Notes

  1. Assuming that the underlying space is of dimension higher than \(K\) and \(MK\), respectively, which holds for the image sizes considered.

  2. FN is measured at a false-positive (FP) rate of 2 %. Hence, an FN rate near 50 % is far better than random. At FP \(=\) 10 %, the false-negative rates of Naive Bayes MI bits and Rand bits drop to 11 % and 18 %, respectively.
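Note 2 reports false-negative (FN) rates at a fixed false-positive (FP) operating point. A minimal sketch of how such a number can be computed from classifier scores is given below, assuming a score-threshold detector that declares a sample positive when its score exceeds the threshold. The function name `fn_rate_at_fp` and the synthetic scores are hypothetical; this is not the chapter's evaluation code.

```python
import numpy as np


def fn_rate_at_fp(pos_scores, neg_scores, target_fp=0.02):
    """FN rate at the threshold that keeps the FP rate at or below target_fp.

    A sample is called positive when its score is strictly above the threshold.
    """
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # descending
    n_allowed = int(np.floor(target_fp * len(neg)))  # negatives allowed above t
    # Threshold at the first negative that must be rejected; at most
    # n_allowed negatives can score strictly above it.
    t = neg[n_allowed] if n_allowed < len(neg) else -np.inf
    pos = np.asarray(pos_scores, dtype=float)
    return float(np.mean(pos <= t))  # fraction of positives missed


# Usage on synthetic scores (illustration only).
rng = np.random.default_rng(1)
neg = rng.normal(0.0, 1.0, 10_000)  # hypothetical negative-class scores
pos = rng.normal(2.0, 1.0, 1_000)  # hypothetical positive-class scores
print(fn_rate_at_fp(pos, neg, 0.02), fn_rate_at_fp(pos, neg, 0.10))
```

For reference, a scorer that ignores its input flags positives and negatives at the same rate, so at FP = 2 % its FN rate is about 98 %, which puts an FN rate near 50 % in context. Lowering the FP budget can only raise the threshold, so the FN rate at FP = 2 % is never below the FN rate at FP = 10 %.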

Author information

Correspondence to Aharon Bar-Hillel.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Krupka, E. et al. (2014). Learning Fast Hand Pose Recognition. In: Shao, L., Han, J., Kohli, P., Zhang, Z. (eds) Computer Vision and Machine Learning with RGB-D Sensors. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-08651-4_13

  • DOI: https://doi.org/10.1007/978-3-319-08651-4_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08650-7

  • Online ISBN: 978-3-319-08651-4

  • eBook Packages: Computer Science, Computer Science (R0)
