Characterizing Actions with Local Descriptors Based on Kinematics and Flow Recurrences

  • Pau Agustí
  • V. Javier Traver
  • Filiberto Pla
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7887)


A common approach to characterizing and recognizing human actions consists of detecting interest points within the video volume and describing them locally, followed by a bag-of-words representation. Many of the proposed descriptors are based on the local optic flow, but they may simply summarize the flow as histograms of its orientations. Potentially interesting and discriminative properties of the optic flow are arguably ignored this way. This work addresses this issue by exploring two new optic flow-based descriptors. One consists of kinematic features of the spatial variations of the optic flow. The other captures dynamic patterns of the optic flow in terms of its temporal recurrences. Experiments show that these descriptors perform competitively with state-of-the-art descriptors. Further elaboration of the proposed descriptors and additional experimentation are required to better assess their potential.
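The abstract does not state which kinematic features are used or how recurrences are computed, so the following is only a minimal sketch under stated assumptions. It takes divergence and vorticity as two commonly used kinematic features of a 2D flow field, and a thresholded pairwise-distance matrix (as in a recurrence plot) as a stand-in for temporal recurrences. The function names `kinematic_features` and `recurrence_matrix` and the threshold `eps` are hypothetical and are not taken from the paper.

```python
import numpy as np


def kinematic_features(u, v):
    """Divergence and vorticity of a 2D flow field.

    u, v: arrays of shape (H, W) holding the horizontal and vertical
    flow components. Which kinematic features the paper uses is not
    stated in the abstract; these two are an assumption.
    """
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    divergence = du_dx + dv_dy   # local expansion or contraction
    vorticity = dv_dx - du_dy    # local rotation
    return divergence, vorticity


def recurrence_matrix(states, eps):
    """Binary recurrence matrix of a sequence of state vectors.

    states: array of shape (T, D), one D-dimensional flow summary per frame.
    eps: distance threshold (hypothetical parameter).
    R[i, j] is 1 when the states at times i and j are within eps of each other.
    """
    states = np.asarray(states, dtype=float)
    diffs = states[:, None, :] - states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return (dists <= eps).astype(np.uint8)


if __name__ == "__main__":
    # A purely expanding flow: u = x, v = y.
    ys, xs = np.mgrid[0:5, 0:5].astype(float)
    div, vort = kinematic_features(xs, ys)
    print(div.mean(), vort.mean())

    # Three per-frame states; the first and third are close to each other.
    R = recurrence_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 0.1]], eps=0.5)
    print(R)
```

In this sketch, the spatial derivatives describe how the flow varies within a frame, while the recurrence matrix describes which frames have similar flow states over time. How either is turned into a local descriptor around an interest point is left open here.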


Action characterization; Interest points; Local descriptors; Optic flow; Kinematic features; Recurrences





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Pau Agustí (1, 2)
  • V. Javier Traver (1, 2)
  • Filiberto Pla (1, 2)
  1. Institute of New Imaging Technologies (iNIT), Spain
  2. Dpt. Lenguajes y Sistemas Informáticos, Universitat Jaume I, Castelló, Spain
