Deep learning human actions from video via sparse filtering and locally competitive algorithms

Hahn, William Edward; Lewkowitz, Stephanie; Lacombe, Daniel C.; Barenholtz, Elan

doi:10.1007/s11042-015-2808-x

Published: 07 August 2015

Multimedia Tools and Applications, Volume 74, pages 10097–10110 (2015)

William Edward Hahn¹,
Stephanie Lewkowitz³,
Daniel C. Lacombe Jr.² &
Elan Barenholtz¹,²

Abstract

Physiological and psychophysical evidence suggests that early visual cortex compresses the visual input on the basis of spatial and orientation-tuned filters. Recent computational advances have suggested that these neural response characteristics may reflect a ‘sparse coding’ architecture, in which only a small number of neurons need to be active for any given image, revealing critical structure latent in natural scenes. Here we present a novel neural network architecture combining a sparse filter model and locally competitive algorithms (LCAs), and demonstrate the network’s ability to classify human actions from video. Sparse filtering is an unsupervised feature learning algorithm designed to optimize the sparsity of the feature distribution directly, without needing to model the data distribution. LCAs are defined by a system of differential equations in which the initial conditions define an optimization problem and the dynamics converge to a sparse decomposition of the input vector. We applied this architecture to train a classifier on categories of motion in human action videos. Inputs to the network were small 3D patches taken from frame differences in the videos. Dictionaries were derived for each action class, and the activation levels for each dictionary were then assessed during reconstruction of a novel test patch. Overall classification accuracy was approximately 97%. We discuss how this sparse filtering approach provides a natural framework for multi-sensory and multimodal data processing, including RGB video, RGBD video, hyper-spectral video, and stereo audio/video streams.
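
The two components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Euler integration, the parameter values (lam, tau, n_steps), and the rule of picking the class whose dictionary receives the largest summed absolute activation are all assumptions made for illustration. The sparse filtering cost follows the standard formulation (soft absolute value, normalization per feature, normalization per example, then an L1 penalty), and the LCA dynamics follow the standard form tau * du/dt = b - u - (Phi^T Phi - I) a with a soft-threshold activation.

```python
import numpy as np


def sparse_filtering_objective(W, X, eps=1e-8):
    """Sparse filtering cost for weights W (n_features x n_inputs)
    and data X (n_inputs x n_examples)."""
    F = np.sqrt((W @ X) ** 2 + eps)                    # soft absolute value
    F = F / np.linalg.norm(F, axis=1, keepdims=True)   # normalize each feature across examples
    F = F / np.linalg.norm(F, axis=0, keepdims=True)   # normalize each example across features
    return F.sum()                                     # L1 penalty (all entries are positive)


def soft_threshold(u, lam):
    """Shrink u toward zero by lam; values inside [-lam, lam] become zero."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)


def lca(x, Phi, lam=0.1, tau=10.0, n_steps=500):
    """Euler-integrated LCA for a dictionary Phi with unit-norm columns.
    Dynamics: tau * du/dt = b - u - (Phi^T Phi - I) a, with a = soft_threshold(u)."""
    b = Phi.T @ x                                # feed-forward drive
    G = Phi.T @ Phi - np.eye(Phi.shape[1])       # lateral inhibition between atoms
    u = np.zeros(Phi.shape[1])                   # internal (membrane) state
    for _ in range(n_steps):
        a = soft_threshold(u, lam)
        u += (b - u - G @ a) / tau               # Euler step with dt = 1
    return soft_threshold(u, lam)


def classify(x, dictionaries, **lca_kwargs):
    """Hypothetical decision rule: encode x against the concatenated per-class
    dictionaries and return the index of the class with the largest activation."""
    Phi = np.hstack(dictionaries)
    a = lca(x, Phi, **lca_kwargs)
    boundaries = np.cumsum([D.shape[1] for D in dictionaries])[:-1].tolist()
    activation = [np.abs(part).sum() for part in np.split(a, boundaries)]
    return int(np.argmax(activation))
```

In this sketch each column of a dictionary would correspond to one flattened 3D patch of frame differences, and a test patch is assigned to whichever class dictionary contributes most to its sparse code. How the per-class dictionaries are learned and how patch-level decisions are pooled over a whole video are left out.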

Acknowledgments

The authors would like to thank Rice University, Stanford University, and KTH Royal Institute of Technology, Stockholm, Sweden.

Author information

Authors and Affiliations

Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL, USA
William Edward Hahn & Elan Barenholtz
Department of Psychology, Florida Atlantic University, Boca Raton, FL, USA
Daniel C. Lacombe Jr. & Elan Barenholtz
Department of Physics, Florida Atlantic University, Boca Raton, FL, USA
Stephanie Lewkowitz

Corresponding author

Correspondence to William Edward Hahn.

About this article

Cite this article

Hahn, W.E., Lewkowitz, S., Lacombe, D.C. et al. Deep learning human actions from video via sparse filtering and locally competitive algorithms. Multimed Tools Appl 74, 10097–10110 (2015). https://doi.org/10.1007/s11042-015-2808-x

Received: 02 April 2015
Revised: 19 May 2015
Accepted: 01 July 2015
Published: 07 August 2015
Issue Date: November 2015
DOI: https://doi.org/10.1007/s11042-015-2808-x
