
Human action recognition with bag of visual words using different machine learning methods and hyperparameter optimization

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Human activity recognition (HAR) has a wide range of applications, and its widespread use has motivated new studies aimed at improving HAR performance. In this study, HAR is carried out on the commonly used KTH and Weizmann datasets as well as on a dataset we created. Speeded-up robust features (SURF) are extracted from these datasets and encoded with a bag of visual words (BoVW) representation. Unlike previous studies that use similar methods, SURF descriptors are extracted from binary images as well as grayscale images. Four machine learning (ML) methods, namely k-nearest neighbors, decision tree, support vector machine and naive Bayes, are used to classify the BoVW features, and hyperparameter optimization is applied to tune each of them. The ML methods are then compared with one another, and the recognition performance of binary-image features is compared with that of grayscale-image features. The results show that when the contrast of the environment decreases as a human enters the frame, SURF descriptors extracted from binary images are more effective for HAR than those extracted from grayscale images.
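For readers who want to prototype the pipeline the abstract describes, the sketch below outlines one possible implementation under stated assumptions: SURF descriptors (from grayscale or Otsu-binarized frames) pooled into a k-means visual vocabulary, BoVW histograms, and an SVM tuned with a grid search. This is a minimal illustration, not the authors' code; it assumes opencv-contrib-python built with the nonfree SURF module and scikit-learn, and the vocabulary size, Hessian threshold, and SVM search space are placeholder values.

```python
# Minimal sketch (not the authors' implementation) of a SURF + BoVW + SVM pipeline.
# Assumptions: opencv-contrib-python with nonfree modules (cv2.xfeatures2d), scikit-learn;
# vocabulary size, Hessian threshold and the SVM grid are placeholder choices.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def binarize(gray_frame):
    # Otsu thresholding turns a grayscale frame into a binary, silhouette-like image.
    _, binary = cv2.threshold(gray_frame, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def surf_descriptors(image, hessian_threshold=400):
    # Extract SURF descriptors; returns an (n, 64) array (empty if no keypoints found).
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold)
    _, descriptors = surf.detectAndCompute(image, None)
    return descriptors if descriptors is not None else np.empty((0, 64), dtype=np.float32)


def build_codebook(descriptor_list, vocabulary_size=500):
    # Cluster all pooled training descriptors; the cluster centres are the visual words.
    return KMeans(n_clusters=vocabulary_size, random_state=0).fit(np.vstack(descriptor_list))


def bovw_histogram(descriptors, codebook):
    # Assign each descriptor to its nearest visual word and build a normalized histogram.
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)


# Hyperparameter optimization for one of the classifiers (here an SVM); X_train is a
# matrix of BoVW histograms and y_train the action labels (both hypothetical names).
param_grid = {"C": [0.1, 1, 10, 100], "kernel": ["linear", "rbf"], "gamma": ["scale", 0.01]}
# clf = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)
```

A k-nearest-neighbors, decision-tree, or naive-Bayes classifier from scikit-learn could be substituted for the SVC inside the same GridSearchCV call to reproduce the comparison of the four ML methods described above.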





Acknowledgements

The authors are thankful to RAC-LAB (www.rac-lab.com) for providing the trial version of their commercial software for this study.

Author information

Corresponding author

Correspondence to Muhammet Fatih Aslan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Aslan, M.F., Durdu, A. & Sabanci, K. Human action recognition with bag of visual words using different machine learning methods and hyperparameter optimization. Neural Comput & Applic 32, 8585–8597 (2020). https://doi.org/10.1007/s00521-019-04365-9
