
A compact discriminant hierarchical clustering approach for action recognition

Published in: Multimedia Tools and Applications

Abstract

To improve the accuracy of action recognition, a compact discriminant hierarchical clustering approach and a new action recognition framework are proposed. First, starting from the low-level features 3D Self-Correlation Histogram of Oriented Gradient in Trajectory (3D_SCHOGT) and 3D Self-Correlation Histogram of Oriented Optical Flow in Trajectory (3D_SCHOOFT), mid-level semantics that are simultaneously pure, representative and discriminative are obtained with the proposed compact discriminant hierarchical clustering approach, which removes singularities, quantitatively evaluates the purity, representativeness and discriminativeness of candidate clusters, and imposes an additive constraint on their information entropies. Second, by introducing a category constraint, a discriminant classification model, the Category Constraint Latent Support Vector Machine (CC-LSVM), is proposed to enhance the discriminative ability of the classifier. Finally, a new framework feeds the low-level features, the mid-level semantics and the mid-level semantic self-correlation features into the CC-LSVM classifier through a weighted association, making full use of the category information of actions and mining the correlations between multi-semantic features and action categories. The recognition accuracies on the Weizmann, KTH, UCF-Sports and YouTube datasets are 100%, 98.83%, 98.67% and 90.73%, respectively, outperforming the compared methods. The experiments demonstrate the effectiveness of the proposed compact discriminant hierarchical clustering approach and the new framework.
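The cluster-selection idea summarized above (hierarchical clustering followed by pruning on purity, representativeness and an information-entropy constraint) can be illustrated with a minimal sketch. The Python code below is an assumption-laden illustration, not the authors' implementation: Ward agglomerative clustering stands in for the compact discriminant hierarchical clustering, cluster size stands in for representativeness, and a simple per-cluster label-entropy threshold stands in for the additive entropy constraint. All function names, thresholds and the toy data are hypothetical.

# Minimal sketch (assumptions, not the paper's method): hierarchically cluster
# low-level descriptors into candidate mid-level clusters, then keep only the
# pure, representative and discriminative ones.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import entropy

def select_midlevel_clusters(X, labels, n_clusters=20,
                             min_size=5, min_purity=0.6, max_entropy=1.0):
    """Cluster descriptors X (n_samples x dim) and return {cluster_id: indices}
    for clusters that are large enough, dominated by one action class (purity)
    and have low label entropy (discriminativeness)."""
    Z = linkage(X, method="ward")                  # build the hierarchical tree
    assign = fcluster(Z, t=n_clusters, criterion="maxclust")

    selected = {}
    for cid in np.unique(assign):
        idx = np.where(assign == cid)[0]
        if len(idx) < min_size:                    # drop singular / tiny clusters
            continue
        counts = np.bincount(labels[idx])
        purity = counts.max() / len(idx)           # fraction of the dominant class
        h = entropy(counts / len(idx))             # label entropy of the cluster
        if purity >= min_purity and h <= max_entropy:
            selected[int(cid)] = idx
    return selected

# Toy usage: class-separated random vectors stand in for trajectory descriptors.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 50)
X = rng.normal(size=(200, 32)) + y[:, None] * 3.0
clusters = select_midlevel_clusters(X, y)
print(len(clusters), "clusters kept")

Because the toy data are separated by class, most surviving clusters are dominated by a single action label; on real trajectory descriptors the thresholds would have to be tuned, and the paper's own quantitative criteria would replace these stand-ins.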



Acknowledgements

This work was supported by the National Natural Science Foundation of China [No. 61072110], the Science and Technology Overall Innovation Project of Shaanxi Province [2013KTZB03-03-03], the Industrial Research Project of Shaanxi Province [2015GY011], and the International Cooperation Projects of Shaanxi Province [2015KW-004, 2016KW-042].

Author information

Corresponding author

Correspondence to Ming Tong.


About this article


Cite this article

Tong, M., Tian, W., Wang, H. et al. A compact discriminant hierarchical clustering approach for action recognition. Multimed Tools Appl 77, 7539–7564 (2018). https://doi.org/10.1007/s11042-017-4660-7

