
Realistic human action recognition by Fast HOG3D and self-organization feature map

Original Paper, published in Machine Vision and Applications.

Abstract

Local features are now widely used in vision-based human action recognition, especially for “wild” or unconstrained videos. This paper proposes a novel framework that combines Fast HOG3D with a self-organization feature map (SOM) network for action recognition in unconstrained videos, bypassing demanding preprocessing steps such as human detection, tracking, or contour extraction. Our contributions are twofold: we create a local feature descriptor that is more compact and computationally efficient than the original HOG3D, and we are the first to successfully apply SOM to a realistic action recognition task, studying the influence of its training parameters. We mainly test our approach on the UCF-YouTube dataset of 11 realistic sport actions, achieving promising results that outperform a local feature-based support vector machine and are comparable with bag-of-words. Experiments are also carried out on the KTH and UT-Interaction datasets for comparison. Results on all three datasets confirm that our method performs comparably with, if not better than, the state-of-the-art.
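The SOM side of such a framework can be illustrated with a minimal sketch. This is not the paper's implementation: the grid size, decay schedules, and descriptor dimensionality below are illustrative placeholders (the paper studies the actual training parameters experimentally). Descriptor vectors, e.g. Fast HOG3D features, are presented one at a time; the best-matching unit (BMU) and its grid neighbors are pulled toward each input, so similar descriptors end up mapped to nearby neurons.

```python
import numpy as np

def train_som(data, grid_h=8, grid_w=8, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 2-D self-organizing map on descriptor vectors (rows of `data`)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((grid_h, grid_w, dim))
    # Grid coordinates of every neuron, used by the neighborhood function.
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    coords = np.stack([ys, xs], axis=-1).astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            v = data[i]
            # Best-matching unit: neuron whose weight vector is closest to v.
            dists = np.linalg.norm(weights - v, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Linearly decay the learning rate and neighborhood radius.
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            # Gaussian neighborhood around the BMU on the map grid.
            d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (v - weights)
            step += 1
    return weights

def best_matching_unit(weights, v):
    """Map a descriptor vector to its (row, col) position on the trained grid."""
    dists = np.linalg.norm(weights - v, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)
```

After training, each video's local descriptors can be projected to BMU positions on the map, giving a compact representation for a downstream classifier.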


[Figs. 1–21 appear in the full article.]


Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 60971098 and 61302152.

Author information

Correspondence to Nijun Li.

About this article

Cite this article

Li, N., Cheng, X., Zhang, S. et al. Realistic human action recognition by Fast HOG3D and self-organization feature map. Machine Vision and Applications 25, 1793–1812 (2014). https://doi.org/10.1007/s00138-014-0639-9
