Novel Human Action Recognition in RGB-D Videos Based on Powerful View Invariant Features Technique

  • Sebastien Mambou
  • Ondrej Krejcar
  • Kamil Kuca
  • Ali Selamat
Part of the Studies in Computational Intelligence book series (SCI, volume 769)


Human action recognition is an important topic in current research. It is hindered by several factors, among them the variation in shapes and postures of the human body, and the time and memory required to capture, store, label, and process the images. In addition, recognizing a human action from different viewpoints is challenging because of the large variation within each view; one possible solution to this problem is to learn view-invariant features that are robust to view variation. In this paper we address the problem by learning view-shared and view-specific features with an innovative deep model built around a novel sample-affinity matrix (SAM), which accurately measures the similarities among video samples across camera views. The SAM allows us to precisely balance the transfer of information between views and to learn more informative shared features for cross-view action classification. We also propose a novel view-invariant features algorithm, which gives a better understanding of the internal processing of our approach. Through a series of experiments on the NUMA and IXMAS multi-camera-view video datasets, we demonstrate that our method outperforms state-of-the-art methods.


Keywords: Action recognition · Viewpoint · Sample-affinity matrix · Cross-view actions · NUMA · IXMAS
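The paper does not spell out the exact form of its sample-affinity matrix here, but the idea of measuring pairwise similarities among video samples can be sketched with a common choice, an RBF-kernel affinity over per-sample feature vectors. The function name, the `gamma` parameter, and the toy features below are illustrative assumptions, not the authors' definition.

```python
import numpy as np

def sample_affinity_matrix(features, gamma=0.5):
    """Illustrative sample-affinity matrix (assumed RBF form):
    entry (i, j) is exp(-gamma * ||x_i - x_j||^2) for samples x_i, x_j,
    where each row of `features` is one video sample's feature vector."""
    sq = np.sum(features ** 2, axis=1)
    # squared pairwise distances via the expansion ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (features @ features.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives from floating-point round-off
    return np.exp(-gamma * d2)

# toy example: 4 samples (e.g., two actions, each seen from two views), 3-dim features
X = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.9, 0.1]])
S = sample_affinity_matrix(X)
# samples 0 and 1 (same action, different views) are far more similar
# than samples 0 and 2 (different actions)
```

Such a matrix is symmetric with ones on the diagonal, and large off-diagonal entries flag pairs of samples, possibly from different camera views, that should share features.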



This work and the contribution were supported by The Faculty of Informatics and Management, University of Hradec Kralove, Czech Republic.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Faculty of Informatics and Management, Center for Basic and Applied Research, University of Hradec Kralove, Hradec Kralove, Czech Republic
  2. Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia
