
Local polynomial space–time descriptors for action classification

  • Special Issue Paper
Machine Vision and Applications

Abstract

In this paper, we propose to tackle human action indexing by introducing a new local motion descriptor based on a model of the optical flow. We propose applying a coding step to the vector field before modeling it. We use two models: a spatial model and a temporal model. The spatial model is computed by projecting the optical flow onto bivariate orthogonal polynomials. The time evolution of the spatial coefficients is then modeled with a one-dimensional polynomial basis. To perform action classification, we extend recent still-image signatures based on local descriptors to our descriptor and combine them with linear SVM classifiers. Experiments on the well-known UCF11 dataset and on the more challenging Hollywood2 action classification dataset show promising results.
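The two-stage model summarized in the abstract (a spatial projection of each optical-flow frame onto bivariate orthogonal polynomials, followed by a one-dimensional polynomial model of how those coefficients evolve over time) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the polynomials are built by Gram-Schmidt on monomials with a uniform weight on evenly spaced samples in [-1, 1], the bivariate basis is taken as tensor products P_i(y)·P_j(x) with total degree i + j ≤ d, the coding step applied to the vector field is not specified in the abstract and is omitted, and all function names, the input layout and the degrees are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def orthonormal_poly_basis_1d(n_points: int, degree: int) -> Matrix:
    """Return P_0..P_degree sampled on n_points evenly spaced abscissae in
    [-1, 1], orthonormal under a uniform discrete weight.

    Built by modified Gram-Schmidt on 1, x, x^2, ... (an assumed
    construction, chosen here for simplicity)."""
    if degree >= n_points:
        raise ValueError("degree must be smaller than the number of samples")
    if n_points == 1:
        xs = [0.0]
    else:
        xs = [-1.0 + 2.0 * k / (n_points - 1) for k in range(n_points)]
    basis: Matrix = []
    for d in range(degree + 1):
        p = [x ** d for x in xs]
        for q in basis:
            # Remove the component of p along each previously built polynomial.
            dot = sum(a * b for a, b in zip(p, q))
            p = [a - dot * b for a, b in zip(p, q)]
        norm = math.sqrt(sum(a * a for a in p))
        basis.append([a / norm for a in p])
    return basis


def spatial_coefficients(component: Matrix, degree: int) -> List[float]:
    """Project one flow component (an H x W grid) onto the bivariate basis
    P_i(y) * P_j(x), i + j <= degree, ordered by i then j (assumed basis)."""
    h, w = len(component), len(component[0])
    py = orthonormal_poly_basis_1d(h, degree)
    px = orthonormal_poly_basis_1d(w, degree)
    coeffs = []
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            coeffs.append(sum(component[r][s] * py[i][r] * px[j][s]
                              for r in range(h) for s in range(w)))
    return coeffs


def temporal_coefficients(series: Sequence[Sequence[float]],
                          degree: int) -> List[float]:
    """series[t] holds the spatial coefficients of frame t; each coefficient's
    trajectory over time is projected onto a 1-D orthonormal basis."""
    t_len = len(series)
    pt = orthonormal_poly_basis_1d(t_len, degree)
    out = []
    for k in range(len(series[0])):
        trajectory = [series[t][k] for t in range(t_len)]
        for p in pt:
            out.append(sum(a * b for a, b in zip(trajectory, p)))
    return out


def space_time_descriptor(flows: Sequence[Tuple[Matrix, Matrix]],
                          spatial_degree: int,
                          temporal_degree: int) -> List[float]:
    """flows[t] = (u, v): horizontal and vertical flow grids of frame t in a
    local space-time cube (hypothetical input layout)."""
    per_frame = [spatial_coefficients(u, spatial_degree)
                 + spatial_coefficients(v, spatial_degree)
                 for u, v in flows]
    return temporal_coefficients(per_frame, temporal_degree)


if __name__ == "__main__":
    # A 5-frame, 8x8 cube of constant rightward flow (u = 1, v = 0).
    flows = [([[1.0] * 8 for _ in range(8)], [[0.0] * 8 for _ in range(8)])
             for _ in range(5)]
    desc = space_time_descriptor(flows, spatial_degree=2, temporal_degree=1)
    print(len(desc))  # 24
```

With spatial degree d and temporal degree d_t, the descriptor has (d + 1)(d + 2)/2 spatial coefficients per flow component, two components, and d_t + 1 temporal coefficients each, i.e. 24 values for d = 2 and d_t = 1. For a constant translation, only the first value (degree 0 in space and time) is non-zero, which is a quick sanity check that the bases are orthonormal.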






Author information

Correspondence to Olivier Kihl.

About this article

Cite this article

Kihl, O., Picard, D. & Gosselin, PH. Local polynomial space–time descriptors for action classification. Machine Vision and Applications 27, 351–361 (2016). https://doi.org/10.1007/s00138-014-0652-z

  • DOI: https://doi.org/10.1007/s00138-014-0652-z
