Human Action Recognition under Log-Euclidean Riemannian Metric

  • Chunfeng Yuan
  • Weiming Hu
  • Xi Li
  • Stephen Maybank
  • Guan Luo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5994)


This paper presents a new action recognition approach based on local spatio-temporal features. The approach makes two main contributions. First, a new local spatio-temporal descriptor is proposed to represent the cuboids detected in video sequences. Specifically, the descriptor uses the covariance matrix to capture the self-correlation of the low-level features within each cuboid. Since covariance matrices do not lie in a Euclidean space, the Log-Euclidean Riemannian metric is used to measure the distance between covariance matrices. Second, the Earth Mover’s Distance (EMD) is used to match any pair of video sequences. In contrast to the widely used Euclidean distance, EMD is more robust when matching histograms or distributions of different sizes. Experimental results on two datasets demonstrate the effectiveness of the proposed approach.
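As a rough illustration of the first contribution, the sketch below builds a covariance descriptor from the feature vectors of one cuboid and compares two descriptors with the standard Log-Euclidean distance, i.e. the Frobenius norm of the difference of the matrix logarithms. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the small ridge term, and the random stand-in features are all hypothetical, and the abstract does not specify which low-level features are used.

```python
import numpy as np


def covariance_descriptor(features, ridge=1e-6):
    """Covariance matrix of low-level feature vectors sampled in one cuboid.

    features: array of shape (n_samples, d), one feature vector per row.
    ridge: small diagonal term (an assumption of this sketch) that keeps
           the matrix strictly positive definite so its logarithm exists.
    """
    features = np.asarray(features, dtype=float)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    return cov + ridge * np.eye(cov.shape[0])


def spd_log(matrix):
    """Matrix logarithm of a symmetric positive-definite matrix.

    Uses the eigendecomposition M = U diag(w) U^T, so log(M) = U diag(log w) U^T.
    """
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(a, b):
    """Log-Euclidean distance: Frobenius norm of log(a) - log(b)."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")


# Random placeholder features standing in for two cuboids (hypothetical data).
rng = np.random.default_rng(0)
cuboid_a = rng.normal(size=(200, 4))
cuboid_b = 2.0 * rng.normal(size=(200, 4))

cov_a = covariance_descriptor(cuboid_a)
cov_b = covariance_descriptor(cuboid_b)
dist_ab = log_euclidean_distance(cov_a, cov_b)
```

Because the matrix logarithm maps symmetric positive-definite matrices into the vector space of symmetric matrices, ordinary Euclidean tools can be applied to the log-mapped descriptors once this step is done.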


Action recognition, Spatio-temporal descriptor, Log-Euclidean Riemannian metric, EMD





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Chunfeng Yuan (1)
  • Weiming Hu (1)
  • Xi Li (1)
  • Stephen Maybank (2)
  • Guan Luo (1)
  1. National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China
  2. School of Computer Science and Information Systems, Birkbeck College, London, UK
