Abstract
We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show that our model extracts latent “flow fields” which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.
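The core idea above, learning features from *pairs* of successive frames via multiplicative interactions, can be illustrated with a toy sketch. The snippet below pairs a frame with its translated successor and computes gated (multiplicative) responses under two random convolutional filter banks; the names `pairwise_features`, `fx_bank`, and `fy_bank` are illustrative, and the filters are random rather than learned, so this is only a stand-in for the paper's trained convolutional gated model:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 2-D valid cross-correlation of image x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def pairwise_features(frame_t, frame_t1, fx_bank, fy_bank):
    """Gated (multiplicative) interaction between two successive frames,
    pooled over space. A hypothetical stand-in for learned spatio-temporal
    features: each feature multiplies the filter responses of the two
    frames, so it is sensitive to the transformation between them."""
    feats = []
    for kx, ky in zip(fx_bank, fy_bank):
        fx = conv2d_valid(frame_t, kx)
        fy = conv2d_valid(frame_t1, ky)
        feats.append(np.mean(fx * fy))  # gated product, spatially pooled
    return np.array(feats)

rng = np.random.default_rng(0)
f0 = rng.standard_normal((32, 32))
f1 = np.roll(f0, shift=1, axis=1)  # successor frame: one-pixel translation
fx_bank = [rng.standard_normal((5, 5)) for _ in range(4)]
fy_bank = [rng.standard_normal((5, 5)) for _ in range(4)]
feats = pairwise_features(f0, f1, fx_bank, fy_bank)
print(feats.shape)  # (4,)
```

Because the response depends on products of the two frames' filter outputs rather than on either frame alone, such features respond to the motion between frames, which is the intuition behind the latent "flow fields" described in the abstract.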
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Taylor, G.W., Fergus, R., LeCun, Y., Bregler, C. (2010). Convolutional Learning of Spatio-temporal Features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds) Computer Vision – ECCV 2010. ECCV 2010. Lecture Notes in Computer Science, vol 6316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15567-3_11
Print ISBN: 978-3-642-15566-6
Online ISBN: 978-3-642-15567-3