Real-Time Upper Body Detection and 3D Pose Estimation in Monoscopic Images

  • Antonio S. Micilotta
  • Eng-Jon Ong
  • Richard Bowden
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3953)


This paper presents a novel solution to the difficult task of both detecting humans and estimating their 3D pose in monoscopic images. The approach consists of two parts. First, the location of a human is identified by a probabilistic assembly of detected body parts. Detectors for the face, torso and hands are learnt using AdaBoost. A pose likelihood is then obtained using an a priori mixture model on body configuration, and possible configurations are assembled from the available evidence using RANSAC. Once a human has been detected, the location is used to initialise a matching algorithm which matches the silhouette and edge map of a subject with a 3D model. This is done efficiently using chamfer matching, integral images and the pose estimate from the initial detection stage. We demonstrate the application of the approach to large, cluttered natural images and at near-framerate operation (16 fps) on lower-resolution video streams.
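The chamfer matching step named in the abstract can be illustrated with a small sketch. It assumes a binary edge map, a two-pass city-block distance transform, and an exhaustive search over translations only; the abstract does not say which distance transform, model projection or search strategy the authors use, so those choices are placeholders. The names `distance_transform`, `chamfer_score` and `best_offset`, and the toy data, are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (row, col)
INF = 10**9


def distance_transform(edges: Sequence[Sequence[int]]) -> List[List[int]]:
    """City-block distance from each pixel to the nearest edge pixel (two passes)."""
    h, w = len(edges), len(edges[0])
    dist = [[0 if edges[r][c] else INF for c in range(w)] for r in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(h):
        for c in range(w):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(h - 1, -1, -1):
        for c in range(w - 1, -1, -1):
            if r < h - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < w - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


def chamfer_score(dist: Sequence[Sequence[int]],
                  template: Sequence[Point],
                  offset: Point) -> float:
    """Mean distance from the shifted template points to the nearest image edge."""
    h, w = len(dist), len(dist[0])
    total = 0
    for dr, dc in template:
        r, c = offset[0] + dr, offset[1] + dc
        if not (0 <= r < h and 0 <= c < w):
            return float("inf")  # template falls outside the image
        total += dist[r][c]
    return total / len(template)


def best_offset(edges: Sequence[Sequence[int]],
                template: Sequence[Point]) -> Point:
    """Translation with the lowest chamfer score (exhaustive search, assumed)."""
    dist = distance_transform(edges)
    h, w = len(edges), len(edges[0])
    candidates = [(r, c) for r in range(h) for c in range(w)]
    return min(candidates, key=lambda o: chamfer_score(dist, template, o))


# Toy data: a 6x6 edge map containing one L-shaped contour.
EDGES = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
# The same L shape as template points relative to its top-left corner.
TEMPLATE = [(0, 0), (1, 0), (2, 0), (2, 1)]

if __name__ == "__main__":
    print(best_offset(EDGES, TEMPLATE))
```

The distance transform is computed once per image, so each candidate placement costs only one lookup per template point. This is what makes the matching cheap enough to repeat over many pose hypotheses.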


Keywords: Mixture Model; Body Part; Face Detection; False Detection; Integral Image



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Antonio S. Micilotta (1)
  • Eng-Jon Ong (1)
  • Richard Bowden (1)
  1. Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, United Kingdom
