Feature Similarity and Frequency-Based Weighted Visual Words Codebook Learning Scheme for Human Action Recognition

  • Saima Nazir
  • Muhammad Haroon Yousaf
  • Sergio A. Velastin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10749)


Human action recognition has become a popular field in computer vision over the past decade. This paper presents a human action recognition scheme based on a textual-information concept inspired by document retrieval systems. Videos are represented using a commonly used local feature representation. In addition, we formulate a new weighted class-specific dictionary learning scheme that reflects the importance of each visual word for a particular action class. Weighted class-specific dictionary learning enables the scheme to learn a sparse representation for each action class. To evaluate the scheme on realistic and complex scenarios, we tested it on the UCF Sports and UCF11 benchmark datasets. The reported experimental results outperform recent state-of-the-art methods, achieving average accuracies of 98.93% on UCF Sports and 93.88% on UCF11. To the best of our knowledge, this contribution is the first to apply a weighted class-specific dictionary learning method to realistic human action recognition datasets.
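The abstract's frequency-based weighting of visual words per action class follows the document-retrieval analogy, in which a bag-of-visual-words histogram plays the role of a term-frequency vector. Below is a minimal, hypothetical sketch of one such tf-idf-style class-specific weighting; the function name and the exact smoothing formula are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def class_specific_weights(counts):
    """Hypothetical tf-idf-style weights for visual words per action class.

    counts: (n_classes, vocab_size) array of visual-word occurrence counts,
            aggregated over all training videos of each class.
    Returns a weight matrix of the same shape.
    """
    counts = np.asarray(counts, dtype=float)
    n_classes = counts.shape[0]
    # Term frequency: how often each visual word occurs within a class.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    # Inverse class frequency: visual words shared by many classes carry
    # little discriminative information and are down-weighted.
    df = np.count_nonzero(counts, axis=0)           # classes containing each word
    icf = np.log((1.0 + n_classes) / (1.0 + df))    # smoothed to avoid log(0)
    return tf * icf

# Toy example: 2 classes, 3-word codebook. Word 2 occurs in both classes,
# so it receives zero weight; words 0 and 1 are class-specific.
counts = np.array([[10, 0, 5],
                   [ 0, 8, 4]])
weights = class_specific_weights(counts)
```

A test video's histogram would then be multiplied element-wise by the weight row of each candidate class before matching, so that words distinctive for a class dominate the comparison.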


Keywords: Human action recognition · Bag of visual words · Spatio-temporal features · UCF Sports



Sergio A. Velastin acknowledges funding by the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement nº 600371, el Ministerio de Economía y Competitividad (COFUND2013-51509) and Banco Santander. The authors also acknowledge support from the Directorate of ASR and TD, University of Engineering and Technology Taxila, Pakistan.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. University of Engineering and Technology Taxila, Taxila, Pakistan
  2. Universidad Carlos III de Madrid, Madrid, Spain
