A motion-aware ConvLSTM network for action recognition

Majd, Mahshid; Safabakhsh, Reza

doi:10.1007/s10489-018-1395-8

Published: 23 January 2019

Applied Intelligence, volume 49, pages 2515–2521 (2019)

Abstract

Human action recognition is an emerging goal of computer vision, with applications such as video surveillance and human-computer interaction. Despite many attempts to develop deep architectures that learn the spatio-temporal features of video, hand-crafted optical flow is still an important part of the recognition process. To integrate motion features deeply into the learning process, we propose a spatio-temporal video recognition network in which a motion-aware long short-term memory (LSTM) module estimates motion flow while also extracting spatio-temporal features. The module incorporates a dedicated optical flow estimator based on kernelized cross-correlation. The proposed network can be used without any extra learning process, and the optical flow does not need to be pre-computed and stored. Extensive experiments on two action recognition benchmarks verify the effectiveness of the proposed approach.
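The abstract states that the flow estimator is based on kernelized cross-correlation but gives no further detail. As a minimal illustration of the general idea, and not the authors' estimator, the Python sketch below scores every candidate integer shift within a small search window by a Gaussian-kernel similarity between a patch in the previous frame and the shifted patch in the next frame, then takes the best-scoring shift as the local motion. Kernel cross-correlators evaluate all shifts at once in the Fourier domain; this brute-force spatial loop is a deliberate simplification. All names and parameters (`estimate_displacement`, `flow_field`, patch radius `r`, `search`, `sigma`) are hypothetical.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale frame, indexed frame[row][col]


def gaussian_kernel(a: List[float], b: List[float], sigma: float) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def extract_patch(frame: Frame, cy: int, cx: int, r: int) -> List[float]:
    """Flatten the (2r+1) x (2r+1) patch centred at (cy, cx); pixels outside the frame read as 0."""
    h, w = len(frame), len(frame[0])
    return [
        frame[y][x] if 0 <= y < h and 0 <= x < w else 0.0
        for y in range(cy - r, cy + r + 1)
        for x in range(cx - r, cx + r + 1)
    ]


def estimate_displacement(prev: Frame, curr: Frame, cy: int, cx: int,
                          r: int = 1, search: int = 2,
                          sigma: float = 1.0) -> Tuple[int, int]:
    """Return the integer shift (dy, dx), with |dy|, |dx| <= search, whose patch in `curr`
    has the highest kernel similarity to the patch at (cy, cx) in `prev`."""
    template = extract_patch(prev, cy, cx, r)
    # Score the zero shift first; the strict '>' below means a shift that only
    # ties with the current best never replaces it.
    best = (0, 0)
    best_score = gaussian_kernel(template, extract_patch(curr, cy, cx, r), sigma)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            candidate = extract_patch(curr, cy + dy, cx + dx, r)
            score = gaussian_kernel(template, candidate, sigma)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def flow_field(prev: Frame, curr: Frame, r: int = 1, search: int = 2,
               sigma: float = 1.0) -> List[List[Tuple[int, int]]]:
    """Dense integer flow: one (dy, dx) estimate for every pixel of `prev`."""
    return [
        [estimate_displacement(prev, curr, y, x, r, search, sigma)
         for x in range(len(prev[0]))]
        for y in range(len(prev))
    ]


if __name__ == "__main__":
    prev = [[0.0] * 7 for _ in range(7)]
    curr = [[0.0] * 7 for _ in range(7)]
    prev[3][3] = 1.0  # a bright pixel at row 3, column 3 ...
    curr[3][4] = 1.0  # ... moves one column to the right in the next frame
    print(estimate_displacement(prev, curr, 3, 3))  # (0, 1)
```

Because the zero shift is scored first, uniform regions, where every candidate patch matches equally well, report no motion instead of an arbitrary corner of the search window. The cost grows with the square of both `search` and the patch width, which is why correlation-based estimators of this kind are typically computed in the Fourier domain.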

Author information

Authors and Affiliations

Amirkabir University of Technology, Tehran, Iran
Mahshid Majd & Reza Safabakhsh

Corresponding author

Correspondence to Reza Safabakhsh.

About this article

Cite this article

Majd, M., Safabakhsh, R. A motion-aware ConvLSTM network for action recognition. Appl Intell 49, 2515–2521 (2019). https://doi.org/10.1007/s10489-018-1395-8

Issue Date: 15 July 2019