Abstract
To improve the accuracy of action recognition, a compact discriminant hierarchical clustering approach and a new action recognition framework are proposed. First, based on the low-level features 3D Self-Correlation Histogram of Oriented Gradient in Trajectory (3D_SCHOGT) and 3D Self-Correlation Histogram of Oriented Optical Flow in Trajectory (3D_SCHOOFT), mid-level semantics that are simultaneously pure, representative and discriminative are obtained with the proposed compact discriminant hierarchical clustering approach. Within this approach, singularities are removed, purity, representativeness and discriminativeness are evaluated quantitatively, and an additive constraint on the information entropies of clusters is imposed, so that all three properties are improved. Second, by introducing a category constraint, a discriminant classification model, the Category Constraint Latent Support Vector Machine (CC-LSVM), is proposed, which enhances the discriminative ability of the classifier. Finally, to further improve recognition accuracy, a new framework is proposed that feeds low-level features, mid-level semantics and mid-level semantic self-correlation features into the proposed CC-LSVM classifier in a weighted association manner, makes full use of the category information of actions, and mines the correlations between multi-semantic features and action categories. The accuracies on the Weizmann, KTH, UCF-Sports and YouTube datasets are 100%, 98.83%, 98.67% and 90.73% respectively, which exceed those of all compared methods. The experiments demonstrate the effectiveness of the proposed compact discriminant hierarchical clustering approach and the new framework.
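The abstract does not give the formulas used to evaluate cluster purity or the information entropy of a cluster. As a rough illustration only, the sketch below uses two common stand-ins: Shannon entropy of the category labels inside a cluster, and majority-label purity. The function names, the toy labels and both definitions are assumptions made for illustration, not the measures defined in the paper.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the category labels inside one cluster.

    Assumption: `labels` is a non-empty list with one action category per
    cluster member. Lower entropy means a purer cluster.
    """
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def purity(labels):
    """Fraction of members that belong to the cluster's majority category."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical clusters of mid-level elements, labelled by action category.
pure_cluster = ["walk", "walk", "walk", "walk"]
mixed_cluster = ["walk", "run", "walk", "run"]

for name, cluster in [("pure", pure_cluster), ("mixed", mixed_cluster)]:
    print(name, cluster_entropy(cluster), purity(cluster))
```

Under these stand-in definitions, a cluster whose members all share one category has zero entropy and purity 1, while an evenly mixed two-category cluster has an entropy of 1 bit and purity 0.5.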
Acknowledgements
This work was supported by the National Natural Science Foundation of China [No. 61072110]; the Science and Technology Overall Innovation Project of Shaanxi Province [2013KTZB03-03-03]; the Industrial Research Project of Shaanxi Province [2015GY011]; and the International Cooperation Projects of Shaanxi Province [2015KW-004] and [2016KW-042].
Cite this article
Tong, M., Tian, W., Wang, H. et al. A compact discriminant hierarchical clustering approach for action recognition. Multimed Tools Appl 77, 7539–7564 (2018). https://doi.org/10.1007/s11042-017-4660-7