Recognizing human interactions by genetic algorithm-based random forest spatio-temporal correlation

Li, Nijun; Cheng, Xu; Guo, Haiyan; Wu, Zhenyang

doi:10.1007/s10044-015-0463-5

Industrial and Commercial Application
Published: 19 March 2015

Volume 19, pages 267–282, (2016)
Pattern Analysis and Applications

Abstract

Recognizing human interactions is more challenging than recognizing single-person activities and has attracted much attention from the computer vision community. This paper proposes an innovative and effective way to recognize human interactions that combines the advantages of the global motion context (MC) feature and the spatio-temporal (S-T) correlation of local spatio-temporal interest point features. The MC feature is used to train a random forest, with a genetic algorithm (GA) applied in the training phase to achieve a good compromise between reliability and efficiency. In addition, we propose S-T correlation-based matching, in which MC’s structure and the Needleman–Wunsch algorithm are used to calculate the spatial and temporal correlation scores of two videos, respectively. Experiments on the UT-Interaction dataset show that our approaches outperform other prevalent machine learning methods, and that the combination of the GA search-based random forest and S-T correlation achieves state-of-the-art performance.
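
As an illustration of the temporal matching step, the following is a minimal sketch of the standard Needleman–Wunsch global alignment score applied to two label sequences. The abstract does not say how videos are encoded before alignment, so the per-frame label sequences, the function name `needleman_wunsch_score`, and the match, mismatch, and gap values are all assumptions made for this example, not the authors' settings.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of two sequences.

    The scoring values are illustrative defaults (an assumption),
    not parameters taken from the paper.
    """
    n, m = len(seq_a), len(seq_b)
    # Row 0: aligning an empty prefix of seq_a against j items costs j gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # Column 0: aligning i items of seq_a against nothing costs i gaps.
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            up = prev[j] + gap        # gap inserted in seq_b
            left = curr[j - 1] + gap  # gap inserted in seq_a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[m]


if __name__ == "__main__":
    # Hypothetical per-frame label sequences for two videos.
    video_a = [1, 2, 3]
    video_b = [1, 3]
    print(needleman_wunsch_score(video_a, video_b))
```

Only two rows of the dynamic-programming table are kept, since the score alone is needed here; recovering the alignment itself would require the full table and a traceback.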

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 60971098 and 61302152.

Author information

Authors and Affiliations

School of Information Science and Engineering, Southeast University, Room 205 of Jianxiong Building, Sipailou #2, Xuanwu District, Nanjing, 210096, People’s Republic of China
Nijun Li, Xu Cheng, Haiyan Guo & Zhenyang Wu

Corresponding author

Correspondence to Nijun Li.

About this article

Cite this article

Li, N., Cheng, X., Guo, H. et al. Recognizing human interactions by genetic algorithm-based random forest spatio-temporal correlation. Pattern Anal Applic 19, 267–282 (2016). https://doi.org/10.1007/s10044-015-0463-5

Received: 08 January 2014
Accepted: 05 March 2015
Published: 19 March 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s10044-015-0463-5
