Learning Fast Hand Pose Recognition

Krupka, Eyal; Vinnikov, Alon; Klein, Ben; Bar-Hillel, Aharon; Freedman, Daniel; Stachniak, Simon; Keskin, Cem

doi:10.1007/978-3-319-08651-4_13

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Practical real-time hand pose recognition requires a classifier that is highly accurate and runs in a few milliseconds. We present a novel classifier architecture, the Discriminative Ferns Ensemble (DFE), for addressing this challenge. The architecture optimizes both classification speed and accuracy when a large training set is available. Speed is obtained by using simple binary features and direct indexing into a set of tables, and accuracy by using a large-capacity model and careful discriminative optimization. The proposed framework is applied to the problem of hand pose recognition in depth and infrared images, using a very large training set. Both the accuracy and the classification time obtained are considerably superior to those of relevant competing methods, allowing accuracy targets to be reached with runtimes orders of magnitude faster than the competition. We show empirically that, with DFE, classification time for a fixed target accuracy can be reduced significantly by increasing the training set size. Finally, scalability to a large number of classes is tested using a synthetically generated data set of 81 classes.
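The speed claim rests on two mechanisms named above: simple binary features and direct indexing into tables. The Python/NumPy sketch below illustrates generic ferns-ensemble inference under stated assumptions; it is not the authors' implementation. Here M denotes the number of ferns and K the bits per fern (symbols chosen for this sketch). The class name `FernsEnsemble`, the pixel-comparison bit tests, and the random table contents are all hypothetical. In DFE the table entries come from discriminative optimization, which is not reproduced here. The `one_hot` method shows that summing table rows equals a linear score over a sparse indicator vector of length M·2^K with exactly M nonzeros. This is one way to see how large tables give a high-capacity model.

```python
import numpy as np


class FernsEnsemble:
    """Hypothetical ferns-ensemble classifier (inference only).

    Each of M ferns evaluates K binary tests on the image; the K bits are
    packed into an integer index that selects one row of that fern's table.
    Each row holds one score per class, and the rows are summed over ferns.
    """

    def __init__(self, num_ferns, bits_per_fern, num_classes, image_shape, seed=0):
        rng = np.random.default_rng(seed)
        n_pixels = image_shape[0] * image_shape[1]
        # Assumed bit test: "pixel a is brighter than pixel b", with pixel
        # positions drawn at random (indices into the flattened image).
        self.pix_a = rng.integers(0, n_pixels, size=(num_ferns, bits_per_fern))
        self.pix_b = rng.integers(0, n_pixels, size=(num_ferns, bits_per_fern))
        # One table per fern: 2**K rows x C class scores (to be learned).
        self.tables = np.zeros((num_ferns, 2 ** bits_per_fern, num_classes))
        # Weights 1, 2, 4, ... used to pack K bits into one integer.
        self.powers = 1 << np.arange(bits_per_fern)

    def fern_indices(self, image):
        """Return one table index in [0, 2**K) per fern."""
        flat = np.asarray(image, dtype=float).ravel()
        bits = flat[self.pix_a] > flat[self.pix_b]  # (M, K) booleans
        return bits.astype(np.int64) @ self.powers  # (M,) integers

    def scores(self, image):
        """Sum the selected table row of every fern: M lookups, no search."""
        idx = self.fern_indices(image)
        return self.tables[np.arange(len(idx)), idx].sum(axis=0)  # (C,)

    def predict(self, image):
        return int(np.argmax(self.scores(image)))

    def one_hot(self, image):
        """Sparse indicator vector of length M * 2**K with exactly M ones."""
        idx = self.fern_indices(image)
        num_ferns, table_size, _ = self.tables.shape
        phi = np.zeros(num_ferns * table_size)
        phi[np.arange(num_ferns) * table_size + idx] = 1.0
        return phi


# Usage with random table contents standing in for trained weights.
clf = FernsEnsemble(num_ferns=8, bits_per_fern=6, num_classes=5, image_shape=(16, 16))
clf.tables = np.random.default_rng(1).normal(size=clf.tables.shape)
image = np.random.default_rng(2).random((16, 16))
M, T, C = clf.tables.shape
# Table lookups equal a linear score over the one-hot code.
assert np.allclose(clf.scores(image), clf.one_hot(image) @ clf.tables.reshape(M * T, C))
print(clf.predict(image))
```

In this sketch, each fern reads only K pixel pairs and one table row, so inference cost depends on M, K and the number of classes but not on the table size 2^K. Model capacity can therefore be raised by enlarging the tables without slowing the lookup. The costs are memory and the amount of training data needed to fill the tables.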

Notes

1. Assuming that the underlying space is of dimension higher than \(K\) and \(MK\), respectively, conditions that are satisfied for the image sizes considered.
2. Note that FN is measured at a false-positive rate of 2%. Hence, an FN rate near 50% is far better than random. At FP = 10%, the false-negative rates of Naive Bayes MI bits and Rand bits drop to 11% and 18%, respectively.
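Note 2 reports false-negative rates at a fixed false-positive rate. The sketch below assumes a binary detector that declares a positive when its score exceeds a threshold. The threshold is chosen from the negative-class scores so that at most the target fraction of negatives pass. The FN rate is then the fraction of positives that fail. The function name `fn_rate_at_fp` and the toy scores are hypothetical, and this is not the evaluation code used in the chapter.

```python
import numpy as np


def fn_rate_at_fp(pos_scores, neg_scores, target_fp):
    """Return the false-negative rate at an operating point whose
    false-positive rate does not exceed target_fp.

    A sample is predicted positive when its score is strictly greater
    than the threshold.
    """
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # highest first
    # Number of negatives allowed above the threshold; the small epsilon
    # guards against floating-point products such as 0.29 * 100 = 28.999...
    n_allowed = int(np.floor(target_fp * len(neg) + 1e-9))
    if n_allowed >= len(neg):
        threshold = -np.inf  # every sample is accepted
    else:
        # Only negatives sorted before position n_allowed can score above
        # this value, so at most n_allowed negatives (fewer with ties) pass.
        threshold = neg[n_allowed]
    pos = np.asarray(pos_scores, dtype=float)
    return float(np.mean(pos <= threshold))  # rejected positives = false negatives


neg = np.arange(10.0)                 # toy negative-class scores 0..9
pos = np.array([8.5, 9.5, 7.0, 3.0])  # toy positive-class scores
print(fn_rate_at_fp(pos, neg, 0.0))   # 0.75
print(fn_rate_at_fp(pos, neg, 0.1))   # 0.5
```

Raising `target_fp` lowers the threshold or leaves it unchanged, so the FN rate can only stay the same or fall. This matches the direction of the change from the 2% to the 10% operating point described in note 2.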

Author information

Authors and Affiliations

Microsoft Research, Herzelia, Israel
Eyal Krupka, Alon Vinnikov, Ben Klein, Aharon Bar-Hillel & Daniel Freedman
Microsoft Console dev R&D, Redmond, WA, USA
Simon Stachniak
Microsoft Research Cambridge, Cambridge, UK
Cem Keskin

Corresponding author

Correspondence to Aharon Bar-Hillel.

Editor information

Editors and Affiliations

University of Sheffield, United Kingdom
Ling Shao
Civolution Technology, Eindhoven, The Netherlands
Jungong Han
Microsoft Research, Cambridge, United Kingdom
Pushmeet Kohli
Microsoft Research, Redmond, Washington, USA
Zhengyou Zhang

About this chapter

Cite this chapter

Krupka, E. et al. (2014). Learning Fast Hand Pose Recognition. In: Shao, L., Han, J., Kohli, P., Zhang, Z. (eds) Computer Vision and Machine Learning with RGB-D Sensors. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-08651-4_13

DOI: https://doi.org/10.1007/978-3-319-08651-4_13
Published: 15 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08650-7
Online ISBN: 978-3-319-08651-4
eBook Packages: Computer Science, Computer Science (R0)