Abstract
Practical real-time hand pose recognition requires a classifier that is highly accurate and runs within a few milliseconds. We present a novel classifier architecture, the Discriminative Ferns Ensemble (DFE), to address this challenge. The architecture optimizes both classification speed and accuracy when a large training set is available. Speed is obtained by using simple binary features and direct indexing into a set of tables, and accuracy by using a large-capacity model and careful discriminative optimization. The proposed framework is applied to the problem of hand pose recognition in depth and infrared images, using a very large training set. Both the accuracy and the classification time obtained are considerably better than those of relevant competing methods, allowing one to reach accuracy targets with runtimes orders of magnitude shorter than those of the competition. We show empirically that, for a fixed target accuracy, DFE can significantly reduce classification time as the training sample size increases. Finally, scalability to a large number of classes is tested using a synthetically generated data set of \(81\) classes.
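To make the "binary features and direct indexing into a set of tables" idea concrete, the following is a minimal sketch of ferns-style classification, not the chapter's implementation. It assumes \(M\) ferns of \(K\) bits each, with every bit obtained by thresholding the difference between two pixels; the abstract does not specify the feature form, so this is an illustrative choice. The names `fern_indices`, `classify`, `pairs`, `thresholds`, and `tables` are hypothetical, and the table weights are random here, whereas in DFE they would be learned by discriminative optimization.

```python
import numpy as np

def fern_indices(image, pairs, thresholds):
    """Compute one K-bit table index per fern.

    pairs:      int array (M, K, 2) of flat pixel positions (assumed feature form).
    thresholds: float array (M, K), one threshold per bit.
    """
    flat = image.ravel().astype(np.float64)
    # Each bit compares a pixel difference against its threshold.
    bits = (flat[pairs[..., 0]] - flat[pairs[..., 1]]) > thresholds  # (M, K)
    weights = 1 << np.arange(bits.shape[1])  # bit k contributes 2**k
    return (bits * weights).sum(axis=1)      # (M,) integers in [0, 2**K)

def classify(image, pairs, thresholds, tables):
    """Sum the looked-up table rows over all ferns and return the best class.

    tables: float array (M, 2**K, C) holding per-class weights for every
            fern and every possible K-bit index.
    """
    idx = fern_indices(image, pairs, thresholds)
    # Direct indexing: one table row per fern, no arithmetic on the features.
    scores = tables[np.arange(tables.shape[0]), idx].sum(axis=0)  # (C,)
    return int(np.argmax(scores)), scores

# Toy run with random (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
M, K, C = 4, 3, 5
image = rng.random((8, 8))
pairs = rng.integers(0, image.size, size=(M, K, 2))
thresholds = np.zeros((M, K))
tables = rng.normal(size=(M, 2 ** K, C))
label, scores = classify(image, pairs, thresholds, tables)
```

At test time the cost per image is \(MK\) pixel comparisons plus \(M\) table lookups of \(C\) values each, which is why this family of classifiers can be fast; the capacity comes from the \(M \cdot 2^K \cdot C\) table entries.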
Notes
- 1.
Assuming that the underlying space is of dimension higher than \(K\) and \(MK\), respectively; both conditions are satisfied for the image sizes considered.
- 2.
Note that FN is measured at a false-positive rate of 2 %. Hence, FN near 50 % is far better than random. At FP \(=\) 10 % the false-negative rates of Naive Bayes MI bits and Rand bits drop to 11 and 18 %, respectively.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Krupka, E. et al. (2014). Learning Fast Hand Pose Recognition. In: Shao, L., Han, J., Kohli, P., Zhang, Z. (eds) Computer Vision and Machine Learning with RGB-D Sensors. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-08651-4_13
DOI: https://doi.org/10.1007/978-3-319-08651-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08650-7
Online ISBN: 978-3-319-08651-4
eBook Packages: Computer Science, Computer Science (R0)