Max Margin Learning of Hierarchical Configural Deformable Templates (HCDTs) for Efficient Object Parsing and Pose Estimation

Zhu, Long (Leo); Chen, Yuanhao; Lin, Chenxi; Yuille, Alan

doi:10.1007/s11263-010-0375-1

Open access
Published: 31 August 2010

Volume 93, pages 1–21, (2011)

International Journal of Computer Vision

Long (Leo) Zhu¹,
Yuanhao Chen²,
Chenxi Lin³ &
…
Alan Yuille⁴

Abstract

In this paper we formulate a hierarchical configurable deformable template (HCDT) to model articulated visual objects, such as horses and baseball players, for tasks such as parsing, segmentation, and pose estimation. HCDTs represent an object by an AND/OR graph in which the OR nodes act as switches that enable the graph topology to vary adaptively. This hierarchical representation is compositional, and the node variables represent the positions and properties of subparts of the object. The graph and the node variables are required to obey the summarization principle, which enables an efficient compositional inference algorithm to rapidly estimate the state of the HCDT. We specify the structure of the AND/OR graph of the HCDT by hand and learn the model parameters discriminatively by extending max-margin learning to AND/OR graphs. We illustrate the three main aspects of HCDTs (representation, inference, and learning) on the tasks of segmenting, parsing, and pose (configuration) estimation for horses and humans. We demonstrate that the inference algorithm is fast and that max-margin learning is effective. We show that HCDTs give state-of-the-art results for segmentation and pose estimation when compared to other methods on benchmark datasets.
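
The role of the OR nodes as switches can be illustrated with a small sketch. This is not the inference algorithm of the paper, which works with position and property variables at every node; it is a toy version that assumes each leaf carries a single fixed score, that an AND node adds up the scores of all its children, and that an OR node selects its best-scoring child. The `Node` class, the `best_parse` function, and the node names in `TOY_GRAPH` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    kind: str                                   # "AND", "OR", or "LEAF"
    children: List[str] = field(default_factory=list)
    score: float = 0.0                          # local score (assumed fixed in this toy)


def best_parse(graph: Dict[str, Node], name: str) -> Tuple[float, Dict[str, str]]:
    """Return the best total score below `name` and the child chosen at each OR node."""
    node = graph[name]
    if node.kind == "LEAF":
        return node.score, {}

    results = [best_parse(graph, child) for child in node.children]

    if node.kind == "AND":
        # An AND node composes all of its children: scores add up.
        choices: Dict[str, str] = {}
        for _, child_choices in results:
            choices.update(child_choices)
        return node.score + sum(s for s, _ in results), choices

    if node.kind == "OR":
        # An OR node is a switch: keep only the best-scoring child.
        best = max(range(len(results)), key=lambda i: results[i][0])
        best_score, best_choices = results[best]
        choices = dict(best_choices)
        choices[name] = node.children[best]
        return node.score + best_score, choices

    raise ValueError(f"unknown node kind: {node.kind!r}")


# Hypothetical toy graph: a horse is a head AND one of two leg configurations.
TOY_GRAPH = {
    "horse": Node("AND", ["head", "legs"]),
    "head": Node("LEAF", score=1.5),
    "legs": Node("OR", ["legs_standing", "legs_running"]),
    "legs_standing": Node("LEAF", score=0.4),
    "legs_running": Node("LEAF", score=2.0),
}

score, choices = best_parse(TOY_GRAPH, "horse")
print(score, choices)
```

On this toy graph the OR node `legs` switches to `legs_running`, so the selected topology is the head plus the running legs. The recursion revisits shared subgraphs, so a graph with heavily shared nodes would need memoization; that is omitted here to keep the sketch short.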

Author information

Authors and Affiliations

Department of Statistics, University of California at Los Angeles, Los Angeles, CA, 90095, USA
Long (Leo) Zhu
University of Science and Technology of China, Hefei, Anhui, 230026, P.R. China
Yuanhao Chen
Alibaba Group R&D, Hangzhou, P.R. China
Chenxi Lin
Department of Statistics, Psychology and Computer Science, University of California at Los Angeles, Los Angeles, CA, 90095, USA
Alan Yuille

Corresponding author

Correspondence to Long (Leo) Zhu.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Zhu, L., Chen, Y., Lin, C. et al. Max Margin Learning of Hierarchical Configural Deformable Templates (HCDTs) for Efficient Object Parsing and Pose Estimation. Int J Comput Vis 93, 1–21 (2011). https://doi.org/10.1007/s11263-010-0375-1

Received: 17 July 2008
Accepted: 05 August 2010
Published: 31 August 2010
Issue Date: May 2011
DOI: https://doi.org/10.1007/s11263-010-0375-1
