Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions

  • Johanna Carvajal
  • Arnold Wiliem
  • Chris McCool
  • Brian Lovell
  • Conrad Sanderson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9794)


We present a comparative evaluation of various techniques for action recognition while keeping as many variables as possible controlled. We employ two categories of Riemannian manifolds: symmetric positive definite matrices and linear subspaces. For both categories we use their corresponding nearest neighbour classifiers, kernels, and recent kernelised sparse representations. We compare against traditional action recognition techniques based on Gaussian mixture models and Fisher vectors (FVs). We evaluate these techniques under ideal conditions and measure their sensitivity to more challenging conditions (variations in scale and translation). Despite recent advances in handling manifolds, manifold-based techniques obtain the lowest performance, and their kernel representations are less stable under challenging conditions. The FV approach obtains the highest accuracy under ideal conditions and also deals best with moderate changes in scale and translation.


Riemannian Manifold, Gaussian Mixture Model, Linear Subspace, Action Recognition, Sparse Representation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



NICTA is funded by the Australian Government via the Department of Communications, and the Australian Research Council via the ICT Centre of Excellence program.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Johanna Carvajal (1, 3)
  • Arnold Wiliem (1)
  • Chris McCool (2)
  • Brian Lovell (1)
  • Conrad Sanderson (1, 3, 4)
  1. University of Queensland, Brisbane, Australia
  2. Queensland University of Technology, Brisbane, Australia
  3. NICTA, Brisbane, Australia
  4. Data61, CSIRO, Canberra, Australia
