Generalised Pose Estimation Using Depth

  • Simon Hadfield
  • Richard Bowden
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6553)


Estimating the pose of an object, be it articulated, deformable or rigid, is an important task, with applications ranging from Human-Computer Interaction to environmental understanding. The idea of a general pose estimation framework, capable of being rapidly retrained to suit a variety of tasks, is appealing. This paper proposes a solution that requires only a set of labelled training images in order to be applied to many pose estimation tasks. This is achieved by treating pose estimation as a classification problem, with particle filtering used to provide non-discretised estimates. Depth information, extracted from a calibrated stereo sequence, is used for background suppression and object scale estimation. The appearance and shape channels are then transformed to Local Binary Pattern histograms, and pose classification is performed via a randomised decision forest. To demonstrate flexibility, the approach is applied to two different situations, articulated hand pose and rigid head orientation, achieving accurate estimation rates of 97% and 84%, respectively.
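The feature extraction stage described above (depth-based background suppression followed by Local Binary Pattern histograms of the appearance and shape channels) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which LBP variant, neighbourhood, or depth threshold is used, so the basic 8-neighbour LBP, the fixed `max_depth` cut-off, and all function names below are assumptions chosen for illustration. The resulting vector is the kind of input a randomised decision forest could classify; the classifier and the particle filter are not sketched here.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]

# Offsets of the 8 neighbours, clockwise from top-left (an assumed ordering).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def foreground_mask(depth: Image, max_depth: float) -> List[List[bool]]:
    """Mark pixels with a valid depth no further than max_depth as foreground.

    Assumption: a single depth threshold stands in for background suppression.
    """
    return [[0.0 < d <= max_depth for d in row] for row in depth]


def lbp_code(img: Image, r: int, c: int) -> int:
    """Basic 8-neighbour LBP: bit i is set if neighbour i >= the centre pixel."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img: Image, mask: List[List[bool]]) -> List[float]:
    """Normalised 256-bin histogram of LBP codes over interior foreground pixels."""
    hist = [0.0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            if mask[r][c]:
                hist[lbp_code(img, r, c)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def pose_feature(gray: Image, depth: Image, max_depth: float) -> List[float]:
    """Concatenate appearance (gray) and shape (depth) LBP histograms."""
    mask = foreground_mask(depth, max_depth)
    return lbp_histogram(gray, mask) + lbp_histogram(depth, mask)
```

A call such as `pose_feature(gray, depth, max_depth=1.5)` yields a 512-element vector (256 appearance bins followed by 256 shape bins). Masking before histogramming means background texture does not contribute to the descriptor.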


Keywords: pose, depth, stereo, head, hand, classification, particle filter, gesture, LBP, RDF, background suppression, object extraction, segmentation



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Simon Hadfield (1)
  • Richard Bowden (1)
  1. Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, England
