Abstract
We present a novel approach to modeling human pose, together with interacting objects, based on compositional models of local visual interactions and their relations. Skeleton models, while flexible enough to capture large articulations, fail to accurately model self-occlusions and interactions. Poselets and Visual Phrases address this limitation, but at the expense of requiring a large set of templates. We combine all three approaches in a compositional model that is flexible enough to model detailed articulations yet still captures occlusions and object interactions. Unlike much previous work on action classification, we do not assume test images are labeled with a person; instead, we present results for “action detection” in an unlabeled image. Notably, for each detection, our model reports a detailed description including an action label, articulated human pose, object poses, and occlusion flags. We demonstrate that modeling occlusion is crucial for recognizing human-object interactions. We present results on the PASCAL Action Classification challenge showing that our unified model advances the state of the art for detection, action classification, and articulated pose estimation.
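To make the flavor of such a model concrete, the following is a minimal toy sketch of scoring a tree-structured part model in which each part carries an explicit occlusion flag, in the spirit of pictorial structures and flexible mixtures-of-parts. All part names, candidate locations, and score values below are invented for illustration; this is not the paper's actual model, features, or learned weights, and a real system would use tree dynamic programming rather than exhaustive enumeration.

```python
# Toy sketch (hypothetical values throughout): max-sum inference over a tiny
# tree of parts, where each part has a placement and a binary occlusion flag.
import itertools

# Parts arranged in a tree: part -> parent (root has parent None).
PARENT = {"torso": None, "head": "torso", "arm": "torso", "object": "arm"}

# Candidate (x, y) placements per part (toy grid).
CANDIDATES = [(0, 0), (1, 0), (0, 1)]

def appearance(part, loc, occ):
    """Appearance score for placing `part` at `loc`; occluded parts get a
    flat prior score instead of an image-based response (toy numbers)."""
    toy = {"torso": 2.0, "head": 1.5, "arm": 1.0, "object": 0.8}
    return -0.5 if occ else toy[part] - 0.3 * (loc[0] + loc[1])

def pairwise(child_loc, parent_loc, occ):
    """Deformation cost between a child and its parent placement;
    occluded children are tied to the parent only weakly."""
    dx = child_loc[0] - parent_loc[0]
    dy = child_loc[1] - parent_loc[1]
    penalty = dx * dx + dy * dy
    return -0.1 * penalty if occ else -0.5 * penalty

def infer():
    """Exhaustive max-sum over this tiny model (4 parts x 3 locs x 2 flags).
    Real part models exploit the tree structure for efficient inference."""
    parts = list(PARENT)
    best_score, best_cfg = float("-inf"), None
    states = list(itertools.product(CANDIDATES, [False, True]))
    for assignment in itertools.product(states, repeat=len(parts)):
        cfg = dict(zip(parts, assignment))
        score = 0.0
        for part, (loc, occ) in cfg.items():
            score += appearance(part, loc, occ)
            par = PARENT[part]
            if par is not None:
                score += pairwise(loc, cfg[par][0], occ)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg

if __name__ == "__main__":
    score, cfg = infer()
    for part, (loc, occ) in cfg.items():
        print(f"{part}: loc={loc} occluded={occ}")
```

The key point the abstract makes is visible even in this toy: because occlusion is a state the model scores rather than a failure mode, each detection can report back which parts were occluded along with their poses.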
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Desai, C., Ramanan, D. (2012). Detecting Actions, Poses, and Objects with Relational Phraselets. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7575. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33765-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33764-2
Online ISBN: 978-3-642-33765-9