
A unified model for human activity recognition using spatial distribution of gradients and difference of Gaussian kernel

  • Original Article
  • Published in: The Visual Computer

Abstract

Understanding human actions and activities from video data is a growing field that has gained importance in surveillance, security, entertainment, and personal logging. In this work, a new hybrid technique is proposed for describing human actions and activities in video sequences. The unified framework yields a robust feature vector that combines global and local information, strengthening the discriminative description of actions. First, entropy-based texture segmentation is used to extract the human silhouette, followed by construction of average energy silhouette images (AEIs). An AEI is a 2D binary projection of the human silhouette frames of a video sequence, which reduces the time complexity of feature-vector generation. Spatial distribution gradients are computed at different resolution levels of the AEI sub-images, capturing the overall shape variation of the human silhouette during the activity. Owing to the scale, rotation, and translation invariance of space-time interest points (STIPs), a vocabulary of difference-of-Gaussian (DoG)-based STIPs is created using vector quantization, and this vocabulary is unique for each activity class. Extensive experiments are conducted to validate the performance of the proposed approach on four standard benchmarks: Weizmann, KTH, Ballet Movements, and multi-view IXMAS. Promising results are obtained in comparison with similar state-of-the-art methods, demonstrating the robustness of the proposed hybrid feature vector against the challenges posed by these datasets, such as illumination and view variations.
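As a rough illustration of the AEI and spatial-gradient stages described above, the following Python sketch averages binary silhouette frames into an average energy image and builds a multi-resolution gradient-orientation descriptor over its sub-images. All function names, the grid scheme, and the parameter choices here are hypothetical assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def average_energy_image(silhouettes, threshold=0.5):
    """Average a stack of binary silhouette frames and re-binarise
    the result into a single Average Energy Image (AEI)."""
    energy = np.mean(np.stack(silhouettes).astype(float), axis=0)
    return (energy >= threshold).astype(np.uint8)

def spatial_gradient_histograms(aei, levels=2, bins=8):
    """Hypothetical multi-resolution descriptor: split the AEI into a
    2^l x 2^l grid at each level l and accumulate magnitude-weighted
    gradient-orientation histograms per cell."""
    gy, gx = np.gradient(aei.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # orientation in [0, pi)
    feats = []
    h, w = aei.shape
    for l in range(levels + 1):
        n = 2 ** l
        for i in range(n):
            for j in range(n):
                ys = slice(i * h // n, (i + 1) * h // n)
                xs = slice(j * w // n, (j + 1) * w // n)
                hist, _ = np.histogram(ang[ys, xs], bins=bins,
                                       range=(0, np.pi),
                                       weights=mag[ys, xs])
                feats.append(hist / (hist.sum() + 1e-9))  # per-cell L1 norm
    return np.concatenate(feats)

# Toy example: a vertical bar drifting rightward stands in for
# segmented silhouette frames.
frames = []
for t in range(4):
    f = np.zeros((16, 16), dtype=np.uint8)
    f[4:12, 4 + t:8 + t] = 1
    frames.append(f)

aei = average_energy_image(frames)
desc = spatial_gradient_histograms(aei)
print(aei.shape, desc.shape)  # prints (16, 16) (168,)
```

With `levels=2` the grid yields 1 + 4 + 16 = 21 cells of 8 bins each, so the descriptor has 168 dimensions; concatenating coarse and fine cells is what lets one vector carry both global shape and local detail.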



Author information

Correspondence to Dinesh Kumar Vishwakarma.


About this article

Cite this article

Vishwakarma, D.K., Dhiman, C. A unified model for human activity recognition using spatial distribution of gradients and difference of Gaussian kernel. Vis Comput 35, 1595–1613 (2019). https://doi.org/10.1007/s00371-018-1560-4
