Classification of Human Actions Using 3-D Convolutional Neural Networks: A Hierarchical Approach

Thakkar, Shaival; Joshi, M. V.

doi:10.1007/978-981-13-0020-2_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 841)

Included in the following conference series:

National Conference on Computer Vision, Pattern Recognition, Image Processing, and Graphics


Abstract

In this paper, we present a hierarchical approach to human action classification using 3-D convolutional neural networks (3-D CNNs). Human actions generally involve the positioning and movement of the hands and legs, so they can be grouped by whether they are performed by the hands, by the legs, or, in some cases, by both. This observation is the intuition behind our hierarchical classification. In this work, we treat actions as tasks performed by either hand or leg movements. Instead of using a single 3-D CNN to classify the given actions, we use multiple networks arranged hierarchically: a first network performs binary classification to separate hand actions from leg actions, and two further networks, one for hand actions and one for leg actions, classify among the target action categories. For example, for the KTH dataset we train three networks to classify six actions, three performed by the hands and three by the legs. The novelty of our approach lies in separating hand and leg actions first, so that each subsequent classifier receives only features corresponding to either hands or legs. This leads to better classification accuracy. In addition, the 3-D CNN extracts features automatically in both the spatial and temporal domains, avoiding the need for hand-crafted features. This makes it one of the better approaches to video classification. We evaluate our method on the KTH, Weizmann, and UCF-sports datasets, and comparison with state-of-the-art methods shows that our approach outperforms most of them.
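The two-stage routing described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `classify_hierarchically`, the `stub_*` classifiers, and the convention that index 0 of the root output means "hand" are all hypothetical. The label names are the six standard KTH classes, and their split into hand and leg groups is an assumption based on the abstract's "three actions each for hands and legs". In practice each classifier would be a trained 3-D CNN returning class scores for a video clip.

```python
from typing import Callable, Sequence

# Six KTH classes, grouped by limb (grouping assumed from the abstract).
HAND_ACTIONS = ("boxing", "handclapping", "handwaving")
LEG_ACTIONS = ("walking", "jogging", "running")

# A classifier maps a video clip to one score per class (hypothetical interface).
Classifier = Callable[[object], Sequence[float]]


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def classify_hierarchically(
    clip: object,
    root_net: Classifier,
    hand_net: Classifier,
    leg_net: Classifier,
) -> str:
    """Route a clip through the binary root, then through one branch network."""
    # Stage 1: binary decision. Assumed convention: index 0 = hand, 1 = leg.
    group = argmax(root_net(clip))
    # Stage 2: only the selected branch sees the clip.
    if group == 0:
        return HAND_ACTIONS[argmax(hand_net(clip))]
    return LEG_ACTIONS[argmax(leg_net(clip))]


# Stub classifiers with fixed scores, standing in for trained 3-D CNNs.
def stub_root(clip: object) -> Sequence[float]:
    return [0.2, 0.8]  # favours "leg"


def stub_hand(clip: object) -> Sequence[float]:
    return [0.1, 0.7, 0.2]


def stub_leg(clip: object) -> Sequence[float]:
    return [0.1, 0.3, 0.6]


label = classify_hierarchically(None, stub_root, stub_hand, stub_leg)
print(label)  # running
```

One consequence of this design is that an error at the root cannot be recovered downstream, so the accuracy of the binary hand/leg classifier bounds the accuracy of the whole hierarchy.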



Author information

Authors and Affiliations

Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, 382007, Gujarat, India
Shaival Thakkar & M. V. Joshi


Corresponding author

Correspondence to Shaival Thakkar.

Editor information

Editors and Affiliations

Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
Renu Rameshan
Indraprastha Institute of Information Technology, New Delhi, India
Chetan Arora
Indian Institute of Technology, New Delhi, India
Sumantra Dutta Roy

Cite this paper

Thakkar, S., Joshi, M.V. (2018). Classification of Human Actions Using 3-D Convolutional Neural Networks: A Hierarchical Approach. In: Rameshan, R., Arora, C., Dutta Roy, S. (eds) Computer Vision, Pattern Recognition, Image Processing, and Graphics. NCVPRIPG 2017. Communications in Computer and Information Science, vol 841. Springer, Singapore. https://doi.org/10.1007/978-981-13-0020-2_2

DOI: https://doi.org/10.1007/978-981-13-0020-2_2
Published: 26 April 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0019-6
Online ISBN: 978-981-13-0020-2
eBook Packages: Computer Science, Computer Science (R0)