Fast Human Pose Detection Using Randomized Hierarchical Cascades of Rejectors

Rogez, Grégory; Rihan, Jonathan; Orrite-Uruñuela, Carlos; Torr, Philip H. S.

doi:10.1007/s11263-012-0516-9

Published: 31 January 2012

Volume 99, pages 25–52, (2012)

International Journal of Computer Vision

Grégory Rogez¹,
Jonathan Rihan²,
Carlos Orrite-Uruñuela¹ &
Philip H. S. Torr²

1211 Accesses
20 Citations

Abstract

This paper addresses human detection and pose estimation from monocular images by formulating it as a classification problem. Our main contribution is a multi-class pose detector that uses the best components of state-of-the-art classifiers, including hierarchical trees, cascades of rejectors, and randomized forests. Given a database of images with corresponding human poses, we define a set of classes by discretizing camera viewpoint and pose space. A bottom-up approach is first followed to build a hierarchical tree by recursively clustering and merging the classes at each level. For each branch of this decision tree, we take advantage of the alignment of training images to build a list of potentially discriminative HOG (Histograms of Oriented Gradients) features. We then select the HOG blocks that show the best rejection performance. We finally grow an ensemble of cascades by randomly sampling one of these HOG-based rejectors at each branch of the tree. The resulting multi-class classifier is then used to scan images in a sliding window scheme. One of the properties of our algorithm is that the randomization can be applied on-line at no extra cost, therefore classifying each window with a different ensemble of randomized cascades. Our approach, when compared to other pose classifiers, gives fast and efficient detection performance with both fixed and moving cameras. We present results using different publicly available training and testing data sets.
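
The idea of sampling one rejector per branch and voting over an ensemble of cascades can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names Node, run_cascade and classify_window are hypothetical, each rejector is assumed to be a simple threshold on one scalar feature response (standing in for a HOG-block classifier, whose details the abstract does not give), and the tiny two-class tree is invented for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One branch of the class hierarchy (hypothetical structure)."""
    classes: frozenset   # pose classes reachable below this branch
    rejectors: list      # pool of (feature_index, threshold) pairs (assumed form)
    children: list = field(default_factory=list)


def run_cascade(root, features, rng):
    """Run one randomized cascade and return the leaf classes that survive."""
    survivors = set()
    stack = [root]
    while stack:
        node = stack.pop()
        # Randomization step: draw one rejector from this branch's pool.
        idx, thr = rng.choice(node.rejectors)
        if features[idx] < thr:
            continue  # rejected: the whole subtree is pruned
        if node.children:
            stack.extend(node.children)
        else:
            survivors |= node.classes
    return survivors


def classify_window(root, features, rng, n_cascades=20):
    """Accumulate class votes for one window over freshly sampled cascades."""
    votes = {}
    for _ in range(n_cascades):
        for c in run_cascade(root, features, rng):
            votes[c] = votes.get(c, 0) + 1
    return votes


# Toy hierarchy: a root branch with two leaf classes "A" and "B".
leaf_a = Node(frozenset({"A"}), [(0, 0.5), (1, 0.5)])
leaf_b = Node(frozenset({"B"}), [(2, 0.5), (3, 0.5)])
root = Node(frozenset({"A", "B"}), [(4, 0.1)], [leaf_a, leaf_b])

# Stand-in feature responses for one window (not real HOG values).
window_features = [0.9, 0.8, 0.1, 0.2, 0.7]
votes = classify_window(root, window_features, random.Random(0))
```

Because the rejectors are drawn at classification time, passing a shared random generator across windows means each window is evaluated by a different ensemble, which is the on-line randomization the abstract refers to. In a sliding window scan, a window whose root rejector fails is discarded after a single feature test.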

Author information

Authors and Affiliations

Computer Vision Lab, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
Grégory Rogez & Carlos Orrite-Uruñuela
Department of Computing, Oxford Brookes University, Wheatley Campus, Oxford, OX33 1HX, UK
Jonathan Rihan & Philip H. S. Torr

Corresponding author

Correspondence to Grégory Rogez.

Additional information

Part of this work was conducted while the first author was a research fellow at Oxford Brookes University. This work was partly supported by the EPSRC grant GR/T21790/01(P) and by Sony Entertainment Europe (SCEE). G. Rogez and C. Orrite would like to acknowledge support provided by: “Departamento de Ciencia, Tecnología y Universidad del Gobierno de Aragón”, “Fondo Social Europeo” and “Ministerio de Ciencia e Innovación (TIN2010-20177)”. Prof. Torr is in receipt of a Royal Society Wolfson Research Merit Award.

About this article

Cite this article

Rogez, G., Rihan, J., Orrite-Uruñuela, C. et al. Fast Human Pose Detection Using Randomized Hierarchical Cascades of Rejectors. Int J Comput Vis 99, 25–52 (2012). https://doi.org/10.1007/s11263-012-0516-9

Received: 24 April 2011
Accepted: 05 January 2012
Published: 31 January 2012
Issue Date: August 2012
DOI: https://doi.org/10.1007/s11263-012-0516-9
