
Spatio-temporal Video Representation with Locality-Constrained Linear Coding

  • Manal Al Ghamdi
  • Nouf Al Harbi
  • Yoshihiko Gotoh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7585)

Abstract

This paper presents a spatio-temporal coding technique for video sequences. The framework is based on a space-time extension of the scale-invariant feature transform (SIFT) combined with locality-constrained linear coding (LLC). The coding scheme projects each spatio-temporal descriptor onto a local coordinate system defined by a codebook, and the projected codes are aggregated by max pooling to form the final representation. The extension is evaluated on human action classification tasks. Experiments with the KTH, Weizmann, UCF Sports and Hollywood datasets indicate that the approach produces results comparable to the state of the art.
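For illustration only (this sketch is not part of the published chapter), the Python fragment below shows the approximated locality-constrained linear coding step of Wang et al. [15] followed by max pooling, applied to generic local descriptors; the descriptor dimensionality, the codebook, and the parameters (k nearest codewords, regulariser beta) are placeholder assumptions.

```python
import numpy as np

def llc_encode(X, B, k=5, beta=1e-4):
    """Approximated LLC (Wang et al., 2010): code each descriptor over
    its k nearest codewords under a sum-to-one constraint.

    X: (n, d) local descriptors, e.g. spatio-temporal SIFT.
    B: (m, d) codebook of m visual words.
    Returns (n, m) sparse codes.
    """
    n, m = X.shape[0], B.shape[0]
    codes = np.zeros((n, m))
    # Squared Euclidean distance from every descriptor to every codeword
    dists = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    for i in range(n):
        idx = np.argsort(dists[i])[:k]        # k nearest codewords
        z = B[idx] - X[i]                     # shift basis to the descriptor
        C = z @ z.T                           # local covariance
        C += np.eye(k) * beta * np.trace(C)   # conditioning, as in the LLC paper
        w = np.linalg.solve(C, np.ones(k))
        codes[i, idx] = w / w.sum()           # enforce the sum-to-one constraint
    return codes

def max_pool(codes):
    """Clip-level representation: element-wise max over all descriptor codes."""
    return codes.max(axis=0)

# Toy usage with random data standing in for real descriptors and a codebook
rng = np.random.default_rng(0)
video_vector = max_pool(llc_encode(rng.normal(size=(200, 96)),
                                   rng.normal(size=(1024, 96))))
```

Because the locality constraint restricts each code to a few nearby codewords, encoding reduces to a small k-by-k linear solve per descriptor rather than a full sparse-coding optimisation.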

Keywords

Action Recognition · Interest Point · Sparse Code · Human Action Recognition · Gaussian Pyramid


References

  1. Laptev, I.: On space-time interest points. International Journal of Computer Vision, 107–123 (2005)
  2. Shechtman, E., Irani, M.: Space-time behavior-based correlation, or how to tell if two underlying motion fields are similar without computing them? IEEE Transactions on Pattern Analysis and Machine Intelligence, 2045–2056 (2007)
  3. Liu, J., Shah, M.: Learning human actions via information maximization. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008)
  4. Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008)
  5. Kläser, A., Marszałek, M., Schmid, C.: A spatio-temporal descriptor based on 3D-gradients. In: British Machine Vision Conference (BMVC), pp. 995–1004 (2008)
  6. Wong, S., Cipolla, R.: Extracting spatiotemporal interest points using global information. In: IEEE International Conference on Computer Vision (ICCV), pp. 3455–3460 (2007)
  7. Willems, G., Tuytelaars, T., Van Gool, L.: An efficient dense and scale-invariant spatio-temporal interest point detector. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 650–663. Springer, Heidelberg (2008)
  8. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 91–110 (2004)
  9. Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: Proceedings of the ACM International Conference on Multimedia, pp. 357–360 (2007)
  10. Cheung, W., Hamarneh, G.: N-SIFT: N-dimensional scale invariant feature transform for matching medical images. In: IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 720–723 (2007)
  11. Chen, M.Y., Hauptmann, A.: MoSIFT: Recognizing human actions in surveillance videos. Technical report, Carnegie Mellon University (2009)
  12. Yang, M., Zhang, L., Yang, J., Zhang, D.: Robust sparse coding for face recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 625–632 (2011)
  13. Grauman, K., Darrell, T.: The pyramid match kernel: Discriminative classification with sets of image features. In: IEEE International Conference on Computer Vision (ICCV), pp. 1458–1465 (2005)
  14. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1794–1801 (2009)
  15. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3360–3367 (2010)
  16. Allaire, S., Kim, J., Breen, S., Jaffray, D., Pekar, V.: Full orientation invariance and improved feature selectivity of 3D SIFT with application to medical image analysis. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2008)
  17. Lopes, A., Oliveira, R., de Almeida, J., de A. Araújo, A.: Spatio-temporal frames in a bag-of-visual-features approach for human actions recognition. In: XXII Brazilian Symposium on Computer Graphics and Image Processing, pp. 315–321 (2009)
  18. Vedaldi, A., Fulkerson, B.: VLFeat: An open and portable library of computer vision algorithms. In: Proceedings of the ACM International Conference on Multimedia, pp. 1469–1472 (2010)
  19. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: A local SVM approach. In: Proceedings of the International Conference on Pattern Recognition (ICPR), pp. 32–36 (2004)
  20. Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: IEEE International Conference on Computer Vision (ICCV), pp. 1395–1402 (2005)
  21. Rodriguez, M., Ahmed, J., Shah, M.: Action MACH: A spatio-temporal maximum average correlation height filter for action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008)
  22. Sun, C., Junejo, I.N., Foroosh, H.: Action recognition using rank-1 approximation of joint self-similarity volume. In: IEEE International Conference on Computer Vision (ICCV), pp. 1007–1012 (2011)
  23. Weinland, D., Özuysal, M., Fua, P.: Making action recognition robust to occlusions and viewpoint changes. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part III. LNCS, vol. 6313, pp. 635–648. Springer, Heidelberg (2010)
  24. Schindler, K., Van Gool, L.: Action snippets: How many frames does human action recognition require? In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008)
  25. Yeffet, L., Wolf, L.: Local trinary patterns for human action recognition. In: IEEE International Conference on Computer Vision (ICCV), pp. 492–497 (2009)
  26. Yao, A., Gall, J., Van Gool, L.: A Hough transform-based voting framework for action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2061–2068 (2010)
  27. de Campos, T., Barnard, M., Mikolajczyk, K., Kittler, J., Yan, F., Christmas, W., Windridge, D.: An evaluation of bags-of-words and spatio-temporal shapes for action recognition. In: IEEE Workshop on Applications of Computer Vision (WACV), pp. 344–351 (2011)
  28. Wu, S., Oreifej, O., Shah, M.: Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories. In: IEEE International Conference on Computer Vision (ICCV), pp. 1419–1426 (2011)
  29. Gilbert, A., Illingworth, J., Bowden, R.: Fast realistic multi-action recognition using mined dense spatio-temporal features. In: IEEE International Conference on Computer Vision (ICCV), pp. 925–931 (2009)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Manal Al Ghamdi (1)
  • Nouf Al Harbi (1)
  • Yoshihiko Gotoh (1)

  1. University of Sheffield, UK
