Stacked Overcomplete Independent Component Analysis for Action Recognition

Part of the Lecture Notes in Computer Science book series (LNIP, volume 10112)

Abstract

Generating discriminative representations of video clips is vital for human action recognition, especially in complex action scenarios. In this paper, we introduce Overcomplete Independent Component Analysis (OICA) to learn structural spatio-temporal features directly from raw video data. As an unsupervised learning method, OICA can fully exploit unlabeled videos, which is crucial for action recognition because labeling huge volumes of video data is too labor-intensive in practice. In addition, owing to the overcompleteness and independence constraints on the component bases, the features learned by OICA describe complex actions more accurately and in sufficient detail. Furthermore, inspired by the layered structure of deep neural networks, we propose stacking OICA into a two-layer network to abstract robust high-level features; such stacking proves effective in practice for boosting recognition accuracy. We evaluate the proposed stacked OICA network on four benchmark datasets, Hollywood2, YouTube, UCF Sports and KTH, which cover both simple and complex action scenarios. The experimental results show that our method always outperforms the baselines and achieves state-of-the-art performance.
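The abstract does not state the objective used to train OICA. As a minimal sketch, assuming the reconstruction-cost ICA (RICA) formulation of Le et al. (NIPS 2011), one standard way to learn overcomplete ICA bases, the code below learns an overcomplete filter bank from whitened patches and then stacks a second layer on the concatenated first-layer responses of adjacent sub-cubes. All helper names (`whiten`, `rica_cost_grad`, `train_rica`, `activations`, `stack_two_layers`), the smooth absolute-value activation, the sub-cube concatenation scheme, and the backtracking gradient descent (standing in for an L-BFGS solver) are illustrative assumptions, not the chapter's implementation.

```python
import numpy as np

# Hypothetical sketch: overcomplete ICA via the RICA objective (Le et al., 2011)
# and an assumed two-layer stacking scheme. Not the chapter's actual code.


def whiten(X, eps=1e-5):
    """ZCA-whiten X (n_features x n_samples). Returns (X_white, M, mu)."""
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    s, U = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    M = U @ np.diag(1.0 / np.sqrt(s + eps)) @ U.T
    return M @ Xc, M, mu


def rica_cost_grad(W, X, lam, eps=1e-2):
    """Cost and gradient (w.r.t. W, shape k x n) of
    (lam/m) * sum sqrt((W X)^2 + eps) + (1/m) * ||W^T W X - X||_F^2.
    The first term is a smooth L1 sparsity penalty on the filter responses;
    the second is a soft reconstruction cost that replaces the orthonormality
    constraint of standard ICA, which is what permits k > n (overcomplete)."""
    m = X.shape[1]
    S = W @ X                                # filter responses, k x m
    smooth_abs = np.sqrt(S ** 2 + eps)
    R = W.T @ S - X                          # reconstruction residual, n x m
    cost = lam * smooth_abs.sum() / m + (R ** 2).sum() / m
    grad = ((lam / m) * (S / smooth_abs) @ X.T
            + (2.0 / m) * W @ (X @ R.T + R @ X.T))
    return cost, grad


def train_rica(X, k, lam=0.1, iters=100, lr=1.0, seed=0):
    """Learn k filters by gradient descent with backtracking (assumed
    stand-in for L-BFGS). Returns W and the monotone cost history."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((k, X.shape[0]))
    cost, grad = rica_cost_grad(W, X, lam)
    history = [cost]
    for _ in range(iters):
        step = lr
        while step > 1e-10:
            W_new = W - step * grad
            new_cost, new_grad = rica_cost_grad(W_new, X, lam)
            if new_cost < cost:
                break
            step *= 0.5                      # shrink until the cost decreases
        else:
            break                            # no descent step found: stop
        W, cost, grad = W_new, new_cost, new_grad
        history.append(cost)
    return W, history


def activations(W, X, eps=1e-2):
    """Smooth |W x| responses used as layer outputs (an assumption)."""
    return np.sqrt((W @ X) ** 2 + eps)


def stack_two_layers(sub_blocks, k1, k2, lam=0.1, iters=100):
    """Hypothetical two-layer stack. sub_blocks[j][:, i] is the flattened
    j-th sub-cube of the i-th larger video block (all arrays n x m)."""
    # Layer 1: one shared filter bank learned on all sub-cubes pooled together.
    X1, M1, mu1 = whiten(np.hstack(sub_blocks))
    W1, _ = train_rica(X1, k1, lam, iters)
    # Layer 2 input: concatenated layer-1 responses of each block's sub-cubes.
    H = np.vstack([activations(W1, M1 @ (B - mu1)) for B in sub_blocks])
    H_white, M2, mu2 = whiten(H)
    W2, _ = train_rica(H_white, k2, lam, iters)
    return (W1, M1, mu1), (W2, M2, mu2)
```

In use, one would sample small spatio-temporal cubes from unlabeled clips, flatten them into columns, and call `train_rica` with `k` larger than the cube dimension to obtain an overcomplete basis. Comparing `rica_cost_grad` against finite differences on a small problem is a cheap check before training on real data.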

Keywords

  • Independent Component Analysis
  • Action Recognition
  • Image Patch
  • Convolutional Neural Network
  • Deep Neural Network

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • DOI: 10.1007/978-3-319-54184-6_23
  • Chapter length: 16 pages

[Figs. 1–6 not reproduced]


Acknowledgement

This work is supported in part by the National Natural Science Foundation of China under Grants 61673362 and 61233003, and by the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Zilei Wang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Liu, Z., Tian, Y., Wang, Z. (2017). Stacked Overcomplete Independent Component Analysis for Action Recognition. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol. 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_23

  • DOI: https://doi.org/10.1007/978-3-319-54184-6_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54183-9

  • Online ISBN: 978-3-319-54184-6

  • eBook Packages: Computer Science, Computer Science (R0)