Abstract
Describing persons and their actions is a challenging problem due to variations in pose, scale and viewpoint in real-world images. Recently, the semantic pyramids approach [1] for pose normalization has been shown to provide excellent results for gender and action recognition. The performance of the semantic pyramids approach relies on robust image description and is therefore limited by its use of shallow local features. In the context of object recognition [2] and object detection [3], convolutional neural networks (CNNs), or deep features, have been shown to improve performance over conventional shallow features.
We propose deep semantic pyramids for human attribute and action recognition. The method works by constructing spatial pyramids from CNN features extracted at different part locations. These pyramids are then combined to obtain a single semantic representation. We validate our approach on the Berkeley and 27 Human Attributes datasets for attribute classification. For action recognition, we perform experiments on two challenging datasets: Willow and PASCAL VOC 2010. The proposed deep semantic pyramids provide a significant gain of \(17.2\,\%\), \(13.9\,\%\), \(24.3\,\%\) and \(22.6\,\%\) compared to the standard shallow semantic pyramids on the Berkeley, 27 Human Attributes, Willow and PASCAL VOC 2010 datasets, respectively. Our results also show that deep semantic pyramids outperform conventional CNNs based on the full bounding box of the person. Finally, we compare our approach with state-of-the-art methods and show a gain in performance over the best methods in the literature.
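The combination step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the part names (`full_body`, `upper_body`, `head`) and the `deep_semantic_pyramid` helper are hypothetical, and the toy vectors stand in for CNN activations extracted from the corresponding part regions. The sketch assumes the per-part descriptors are L2-normalized and concatenated into a single representation.

```python
import numpy as np

def deep_semantic_pyramid(part_features):
    """Combine per-part descriptors into one semantic representation.

    part_features: dict mapping a part name (e.g. 'full_body',
    'upper_body', 'head') to its feature vector. Each vector is
    L2-normalized before concatenation so that every part contributes
    comparably to the final descriptor.
    """
    chunks = []
    for name in sorted(part_features):  # fixed order for reproducibility
        v = np.asarray(part_features[name], dtype=np.float64)
        norm = np.linalg.norm(v)
        chunks.append(v / norm if norm > 0 else v)
    return np.concatenate(chunks)

# Toy example: three parts with 4-D stand-in "CNN" descriptors.
parts = {
    "full_body": np.ones(4),
    "upper_body": np.array([0.0, 2.0, 0.0, 0.0]),
    "head": np.array([3.0, 4.0, 0.0, 0.0]),
}
descriptor = deep_semantic_pyramid(parts)
print(descriptor.shape)  # (12,)
```

The resulting fixed-length descriptor can then be fed to a standard classifier (e.g. a linear SVM) for attribute or action classification.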
References
Khan, F.S., van de Weijer, J., Anwer, R.M., Felsberg, M., Gatta, C.: Semantic pyramids for gender and action recognition. TIP 23(8), 3633–3645 (2014)
Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: CVPR (2014)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
Bourdev, L., Maji, S., Malik, J.: Describing people: A poselet-based approach to attribute classification. In: ICCV (2011)
Zhang, N., Farrell, R., Iandola, F., Darrell, T.: Deformable part descriptors for fine-grained recognition and attribute prediction. In: ICCV (2013)
Zhang, N., Paluri, M., Ranzato, M., Darrell, T., Bourdev, L.: PANDA: Pose aligned networks for deep attribute modeling. In: CVPR (2014)
Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. PAMI 32(9), 1627–1645 (2010)
Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3D human pose annotations. In: ICCV (2009)
LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., Jackel, L.: Handwritten digit recognition with a back-propagation network. In: NIPS (1989)
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
Toshev, A., Szegedy, C.: DeepPose: Human pose estimation via deep neural networks. In: CVPR (2014)
Sharif Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: CVPRW (2014)
Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: CVPR (2012)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
Sharma, G., Jurie, F., Schmid, C.: Expanded parts model for human attribute and action recognition in still images. In: CVPR (2013)
Joo, J., Wang, S., Zhu, S.C.: Human attribute recognition by rich appearance dictionary. In: ICCV (2013)
Liang, Z., Wang, X., Huang, R., Lin, L.: An expressive deep model for human action parsing from a single image. In: ICME (2014)
Khan, F.S., Anwer, R.M., van de Weijer, J., Bagdanov, A., Lopez, A., Felsberg, M.: Coloring action recognition in still images. IJCV 105(3), 205–221 (2013)
Maji, S., Bourdev, L.D., Malik, J.: Action recognition from a distributed representation of pose and appearance. In: CVPR (2011)
Prest, A., Schmid, C., Ferrari, V.: Weakly supervised learning of interactions between humans and objects. PAMI 34(3), 601–614 (2012)
Yao, B., Jiang, X., Khosla, A., Lin, A.L., Guibas, L.J., Li, F.F.: Human action recognition by learning bases of action attributes and parts. In: ICCV (2011)
Delaitre, V., Laptev, I., Sivic, J.: Recognizing human actions in still images: a study of bag-of-features and part-based representations. In: BMVC (2010)
Delaitre, V., Sivic, J., Laptev, I.: Learning person-object interactions for action recognition in still images. In: NIPS (2011)
Sharma, G., Jurie, F., Schmid, C.: Discriminative spatial saliency for image classification. In: CVPR (2012)
Khan, F.S., van de Weijer, J., Bagdanov, A., Felsberg, M.: Scale coding bag-of-words for action recognition. In: ICPR (2014)
Shapovalova, N., Gong, W., Pedersoli, M., Roca, F.X., Gonzàlez, J.: On importance of interactions and context in human action recognition. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds.) IbPRIA 2011. LNCS, vol. 6669, pp. 58–66. Springer, Heidelberg (2011)
Acknowledgments
This work has been supported by SSF through a grant for the project CUAS, by VR through a grant for the projects ETT, by EU’s Horizon 2020 Program through a grant for the project CENTAURO, through the Strategic Area for ICT research ELLIIT, and CADICS, project TIN2013-41751 of Spanish Ministry of Science and the Catalan project 2014 SGR 221, grants 255745 and 251170 of the Academy of Finland and Data to Intelligence (D2I) DIGILE SHOK project. The calculations were performed using computer resources within the Aalto University School of Science “Science-IT” project.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Khan, F.S., Anwer, R.M., van de Weijer, J., Felsberg, M., Laaksonen, J. (2015). Deep Semantic Pyramids for Human Attributes and Action Recognition. In: Paulsen, R., Pedersen, K. (eds) Image Analysis. SCIA 2015. Lecture Notes in Computer Science(), vol 9127. Springer, Cham. https://doi.org/10.1007/978-3-319-19665-7_28
Print ISBN: 978-3-319-19664-0
Online ISBN: 978-3-319-19665-7