Abstract
In this paper, we focus on how to better represent video data for action recognition. We propose a new image feature, the phase spectrum reconstruction map, which extracts contour features from the RGB frames of a video clip that are beneficial for action recognition. We demonstrate the effectiveness of this feature with ablation experiments using a channel-based feature-fusion method and a two-stream method. We also verify that the reconstructed map does contain motion-related features, which convolutional neural networks can learn even when the reconstructed map is the only input. Our method is trained and evaluated on the benchmark datasets HMDB-51 and UCF-101, and on both it shows significant improvements over counterparts that do not add the reconstructed map features.
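The contour-emphasizing property described above can be illustrated with classical phase-only Fourier reconstruction, where an image is rebuilt from its phase spectrum alone (unit magnitude); the result is known to concentrate energy at edges and contours. The sketch below is a minimal, hypothetical illustration of that general idea in NumPy, not the paper's exact reconstruction procedure.

```python
import numpy as np


def phase_only_reconstruction(frame: np.ndarray) -> np.ndarray:
    """Rebuild a grayscale frame from its Fourier phase spectrum only.

    Discarding the magnitude (setting it to 1) and inverting the FFT
    emphasizes edge/contour structure, since phase carries the spatial
    localization of image features. Illustrative only: the paper's
    reconstruction map may differ in detail.
    """
    spectrum = np.fft.fft2(frame.astype(np.float64))
    phase = np.angle(spectrum)                 # keep phase, drop magnitude
    recon = np.fft.ifft2(np.exp(1j * phase))   # unit-magnitude inverse FFT
    return np.abs(recon)                       # contour-like response map


# Toy input: a bright square on a dark background.
frame = np.zeros((64, 64))
frame[16:48, 16:48] = 255.0
contour_map = phase_only_reconstruction(frame)
```

In this toy example, the response map is strongest along the square's boundary and nearly flat in its interior, matching the intuition that the phase spectrum preserves contour information.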
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Acknowledgements
This research is supported in part by the National Key Research and Development Program of China under Grant No. 2020AAA0140004.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
We declare that we have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wen, H., Lu, ZM., Cui, JL. et al. A novel feature for action recognition. Multimed Tools Appl 83, 41441–41456 (2024). https://doi.org/10.1007/s11042-023-17251-3