Script Data for Attribute-Based Recognition of Composite Activities

Rohrbach, Marcus; Regneri, Michaela; Andriluka, Mykhaylo; Amin, Sikandar; Pinkal, Manfred; Schiele, Bernt

doi:10.1007/978-3-642-33718-5_11

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7572)

Included in the following conference series:

European Conference on Computer Vision

Abstract

State-of-the-art human activity recognition methods build on discriminative learning, which requires a representative training set for good performance. This leads to scalability issues when recognizing large sets of highly diverse activities. In this paper we leverage the fact that many human activities are compositional and that the essential components of the activities can be obtained from textual descriptions or scripts. To share and transfer knowledge between composite activities, we model them by a common set of attributes corresponding to basic actions and object participants. This attribute representation allows us to incorporate script data that provides new variations of a composite activity, or even describes unseen composite activities. In our experiments on 41 composite cooking tasks, we found that script data successfully captures the high variability of composite activities. We show improvements in a supervised case where training data for all composite cooking tasks is available, and we are also able to recognize unseen composites using only script data, without any manual video annotation.
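The core idea above, describing each composite activity only through the attributes (basic actions and objects) mentioned in its script data and matching those against attribute scores observed in a video, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `attribute_weights` and `predict_composite`, the toy vocabulary and scripts, the L2-normalized frequency weighting, and the dot-product scoring are all hypothetical choices made for illustration. The per-attribute detector scores for a video are assumed to be given rather than computed.

```python
import math
from collections import Counter


def attribute_weights(scripts, vocabulary):
    """Hypothetical weighting: turn each composite's script descriptions into
    an L2-normalized attribute-frequency vector over a shared vocabulary."""
    weights = {}
    for composite, descriptions in scripts.items():
        # Count how often each known attribute is mentioned across all scripts.
        counts = Counter(a for desc in descriptions for a in desc if a in vocabulary)
        norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
        weights[composite] = {a: counts[a] / norm for a in vocabulary}
    return weights


def predict_composite(attribute_scores, weights):
    """Score each composite as the dot product of its script-derived weights
    with per-attribute detector scores for one video; return the best one."""
    def score(composite):
        w = weights[composite]
        return sum(w[a] * attribute_scores.get(a, 0.0) for a in w)
    return max(weights, key=score)


# Toy example (assumed data): composites are defined only by script text,
# so no annotated video of either composite is needed.
vocabulary = {"cut", "peel", "knife", "cucumber", "squeeze", "orange", "juicer"}
scripts = {
    "prepare cucumber": [["peel", "cucumber", "knife"], ["cut", "cucumber", "knife"]],
    "make orange juice": [["cut", "orange", "knife"], ["squeeze", "orange", "juicer"]],
}
weights = attribute_weights(scripts, vocabulary)

# Hypothetical attribute detector outputs for a single test video.
video = {"squeeze": 0.9, "orange": 0.8, "juicer": 0.7, "cut": 0.4, "knife": 0.5}
print(predict_composite(video, weights))  # make orange juice
```

Because a composite enters this sketch only through its script descriptions, adding a new composite requires new scripts but no annotated video, which corresponds to the unseen-composite setting described in the abstract.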

Author information

Authors and Affiliations

Max Planck Institute for Informatics, Saarbrücken, Germany
Marcus Rohrbach, Mykhaylo Andriluka, Sikandar Amin & Bernt Schiele
Department of Computational Linguistics, Saarland University, Germany
Michaela Regneri & Manfred Pinkal
Department of Computer Science, Technische Universität München, Germany
Sikandar Amin

Editor information

Editors and Affiliations

Microsoft Research Ltd., CB3 0FB, Cambridge, UK
Andrew Fitzgibbon
Dept. of Computer Science, University of North Carolina, 27599, Chapel Hill, NC, USA
Svetlana Lazebnik
California Institute of Technology, 91125, Pasadena, CA, USA
Pietro Perona
Institute of Industrial Science, The University of Tokyo, 153-8505, Tokyo, Japan
Yoichi Sato
INRIA, 38330, Montbonnot, France
Cordelia Schmid

About this paper

Cite this paper

Rohrbach, M., Regneri, M., Andriluka, M., Amin, S., Pinkal, M., Schiele, B. (2012). Script Data for Attribute-Based Recognition of Composite Activities. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33718-5_11

DOI: https://doi.org/10.1007/978-3-642-33718-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33717-8
Online ISBN: 978-3-642-33718-5
eBook Packages: Computer Science, Computer Science (R0)
