Abstract
Human activity recognition from video has been an active research area in computer vision for many years, and a variety of approaches have been proposed to recognize human activities effectively. This study focuses on identifying activities performed by a single individual using visual information from short video clips. Several deep learning techniques are combined into an architecture for the human activity recognition task. The architecture hybridizes a two-stream neural network with a multi-layer perceptron (MLP). The two-stream network is a temporal segment network (TSN) consisting of a spatial stream and a temporal stream, and it uses Octave Convolution neural networks as frame-level feature extractors. Optical flow, computed with the FlowNet 2.0 algorithm, serves as the input to the temporal stream. The architecture was trained and evaluated on the KTH human activity dataset. The results obtained are competitive with existing state-of-the-art results.
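As a rough illustration of the pipeline outlined in the abstract, the sketch below shows TSN-style sparse sampling (one frame per video segment), a segmental consensus over per-snippet class scores, and a fusion of the spatial and temporal streams that feeds a small MLP. The function names, the centre-frame sampling rule, averaging as the consensus function, and concatenation as the fusion step are all assumptions made for illustration; the abstract does not specify them. The Octave Convolution feature extractors and the FlowNet 2.0 optical flow are not modelled here, so per-snippet scores are taken as given.

```python
from statistics import fmean


def segment_centre_indices(num_frames, num_segments):
    """Return one frame index per segment: the centre of each equal-length segment.

    Centre-frame sampling is an assumption; random sampling within each
    segment is an equally plausible choice.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    seg_len = num_frames / num_segments
    return [int(seg_len * k + seg_len / 2) for k in range(num_segments)]


def segmental_consensus(snippet_scores):
    """Average per-snippet class scores into one video-level score vector."""
    return [fmean(per_class) for per_class in zip(*snippet_scores)]


def fuse_streams(spatial_scores, temporal_scores):
    """Concatenate the two stream outputs (an assumed fusion) as MLP input."""
    return list(spatial_scores) + list(temporal_scores)


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden ReLU layer followed by a linear output layer (hypothetical sizes)."""
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
```

In this sketch each stream would call `segmental_consensus` on its own snippet scores, and the fused vector from `fuse_streams` would be passed to `mlp_forward` with trained weights to produce the final class scores.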
The support of the Centre for High Performance Computing (CHPC) is gratefully acknowledged.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Pillay, B.J., Pillay, A.W., Jembere, E. (2020). Hybridized Deep Learning Architectures for Human Activity Recognition. In: Gerber, A. (eds) Artificial Intelligence Research. SACAIR 2021. Communications in Computer and Information Science, vol 1342. Springer, Cham. https://doi.org/10.1007/978-3-030-66151-9_11
DOI: https://doi.org/10.1007/978-3-030-66151-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66150-2
Online ISBN: 978-3-030-66151-9
eBook Packages: Computer Science, Computer Science (R0)