Abstract
Video action recognition is a hot topic in computer vision with broad application potential, including intelligent surveillance, content recommendation, and virtual reality. Although convolutional neural networks have achieved great success in image classification, action recognition has yet to see its "AlexNet" moment, and temporal modeling remains the key challenge of video action recognition. To address the insufficient exploitation of spatiotemporal features in video action recognition, we propose a spatiotemporal feature enhancement network. First, we design a novel temporal feature aggregation module that enhances short-range temporal features and strengthens action-related features. Second, we present a grouped convolution superposition module that expands the temporal receptive field and improves the network's ability to learn long-range spatiotemporal features. Finally, experiments on the public action datasets UCF-101 and HMDB-51 demonstrate that the proposed method effectively enhances the fusion of spatiotemporal features and achieves high accuracy.
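The two ideas named in the abstract can be illustrated schematically. The following is a minimal NumPy sketch, not the authors' implementation: it assumes frame features of shape (T, C), uses a forward temporal difference as a stand-in for short-range temporal aggregation (a common motion-encoding idea in this literature), and mimics the grouped superposition by splitting channels into groups and accumulating them so that later groups see the aggregate of earlier ones, which is how stacked grouped branches enlarge the effective receptive field. The function names and group count are illustrative assumptions.

```python
import numpy as np

def temporal_aggregation(feats):
    """Toy short-range temporal enhancement: each frame's features are
    enriched with the difference to the next frame (residual motion cue).
    Illustrative only; the paper's actual module is not reproduced here."""
    diff = np.zeros_like(feats)
    diff[:-1] = feats[1:] - feats[:-1]  # forward temporal difference
    return feats + diff                 # residual enhancement

def grouped_superposition(feats, groups=4):
    """Toy grouped superposition: split channels into groups and let each
    group carry the running sum of all earlier groups, so the effective
    receptive field grows group by group (a Res2Net-like hierarchy)."""
    assert feats.shape[1] % groups == 0, "channels must divide evenly"
    chunks = np.split(feats, groups, axis=1)
    out, carry = [], np.zeros_like(chunks[0])
    for g in chunks:
        carry = carry + g       # superpose earlier groups onto this one
        out.append(carry.copy())
    return np.concatenate(out, axis=1)
```

For example, with a linearly increasing feature `[[0], [1], [2]]`, `temporal_aggregation` returns `[[1], [2], [2]]`: every frame is shifted toward its successor, emphasizing change over static appearance.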
Data Availability
Data will be made available on reasonable request.
Acknowledgements
This work is supported by the Key R&D Project of Zhejiang Province of China (No. 2021C03151), the Natural Science Foundation of Zhejiang Province of China (No. LY20F020018), and the Public Projects of Zhejiang Province of China (No. LGG21G010001).
Ethics declarations
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
About this article
Cite this article
Huang, G., Wang, X., Li, X. et al. Spatiotemporal feature enhancement network for action recognition. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-17834-0