Hybridized Deep Learning Architectures for Human Activity Recognition

Pillay, Bradley Joel; Pillay, Anban W.; Jembere, Edgar

doi:10.1007/978-3-030-66151-9_11

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1342)

Included in the following conference series:

Southern African Conference for Artificial Intelligence Research

Abstract

Human activity recognition using video data has been an active research area in computer vision for many years, and various approaches have been proposed to recognize human activities effectively. This study focuses on identifying activities performed by single individuals using visual information from short video clips. Several deep learning techniques are combined into an architecture for the human activity recognition task. The architecture hybridizes a two-stream neural network with a multi-layer perceptron (MLP). The two-stream neural network is a temporal segment network (TSN), which consists of a spatial stream and a temporal stream. The architecture adopts Octave Convolution neural networks as frame-level feature extractors in the TSN. Optical flow is computed with the FlowNet 2.0 algorithm and serves as input to the temporal stream. The architecture was trained and evaluated on the KTH human activity dataset. The results obtained are competitive with existing state-of-the-art results.
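The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two streams are fused or how the MLP is sized, so concatenating the per-stream consensus features, the single hidden layer, and every function name below are assumptions made for illustration. The segment sampling and averaging consensus follow the general TSN idea of picking one snippet per equal-duration segment and aggregating snippet-level outputs. The per-frame feature vectors stand in for the outputs of the Octave Convolution extractors, which are not modelled here.

```python
import math
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Split a clip into equal-duration segments and pick one frame index per segment."""
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = rng or random.Random()
    # Integer segment boundaries; each segment is non-empty because
    # num_frames >= num_segments.
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return [rng.randrange(bounds[i], bounds[i + 1]) for i in range(num_segments)]


def segment_consensus(features):
    """Average the per-segment feature vectors of one stream (averaging consensus)."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def fuse_streams(spatial_features, temporal_features):
    """Assumed fusion: concatenate the consensus vectors of the two streams."""
    return segment_consensus(spatial_features) + segment_consensus(temporal_features)


def dense(x, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def mlp_forward(x, w1, b1, w2, b2):
    """Hypothetical one-hidden-layer MLP head: ReLU hidden layer, softmax output."""
    hidden = [max(0.0, h) for h in dense(x, w1, b1)]
    return softmax(dense(hidden, w2, b2))
```

In use, `sample_segment_indices` would choose which RGB frames (spatial stream) and which optical-flow fields (temporal stream) are passed to the feature extractors, and `mlp_forward` would map the fused vector to class probabilities, for example six outputs for the six KTH action classes.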

The support of the Centre for High Performance Computing (CHPC) is gratefully acknowledged.

Author information

Authors and Affiliations

School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Westville Campus, Private Bag X54001, Durban, 4000, South Africa
Bradley Joel Pillay, Anban W. Pillay & Edgar Jembere
Centre for AI Research (CAIR), Cape Town, South Africa
Bradley Joel Pillay, Anban W. Pillay & Edgar Jembere

Corresponding author

Correspondence to Bradley Joel Pillay.

Editor information

Editors and Affiliations

University of Pretoria, Pretoria, South Africa
Aurona Gerber

Cite this paper

Pillay, B.J., Pillay, A.W., Jembere, E. (2020). Hybridized Deep Learning Architectures for Human Activity Recognition. In: Gerber, A. (eds) Artificial Intelligence Research. SACAIR 2021. Communications in Computer and Information Science, vol 1342. Springer, Cham. https://doi.org/10.1007/978-3-030-66151-9_11

DOI: https://doi.org/10.1007/978-3-030-66151-9_11
Published: 21 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66150-2
Online ISBN: 978-3-030-66151-9
eBook Packages: Computer Science; Computer Science (R0)