
Leveraging cross-resolution attention for effective extreme low-resolution video action recognition

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Recognizing human actions in extremely low-resolution (eLR) videos poses a formidable challenge in the action recognition domain due to the limited spatial and temporal information available in eLR frames. In this work, we propose a novel architecture that recognizes human actions in an eLR setting. The proposed approach and its variants utilize an expanded knowledge distillation scheme that provides the essential flow of information from high-resolution (HR) frames to eLR frames. To further improve generalization, we integrate cross-resolution attention modules that operate without HR information at inference time. Additionally, we investigate the impact of an eLR data preprocessing pipeline that leverages a super-resolution algorithm and experimentally show the efficacy of the proposed models in the eLR space. Our experiments highlight the importance of studying eLR human action recognition and demonstrate that the proposed methods can surpass or compete with current state-of-the-art methods, achieving effective generalization on both the UCF-101 and HMDB-51 datasets.
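The abstract outlines three ingredients: HR-to-eLR knowledge distillation, attention modules on the eLR branch that need no HR input at inference, and a super-resolution preprocessing step. The following is a minimal PyTorch-style sketch of the first two ideas only, written under stated assumptions; the module names, tensor shapes, and loss weights (temperature, alpha, beta) are hypothetical illustrations, not the authors' implementation.

```python
# Sketch (not the paper's code): an HR teacher guides an eLR student through
# soft-label distillation and feature matching, while a spatial self-attention
# block refines eLR features and requires no HR frames at inference time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelfAttention(nn.Module):
    """Self-attention over eLR feature maps (assumes channels >= 8)."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # residual gate, starts at 0

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)     # (B, HW, C//8)
        k = self.key(x).flatten(2)                       # (B, C//8, HW)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (B, HW, HW)
        v = self.value(x).flatten(2).transpose(1, 2)     # (B, HW, C)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out                      # residual connection

def distillation_loss(student_feat, teacher_feat, student_logits,
                      teacher_logits, labels, temperature=4.0,
                      alpha=0.5, beta=0.5):
    """Cross-entropy + soft-label KD (Hinton et al.) + feature matching."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  F.softmax(teacher_logits / temperature, dim=1),
                  reduction="batchmean") * temperature ** 2
    feat = F.mse_loss(student_feat, teacher_feat)
    return ce + alpha * kd + beta * feat
```

In this sketch, the teacher network would see HR frames only during training, while the student sees eLR frames (e.g., 12x16 crops); at test time only the student and its attention block run, which is what allows inference without HR information.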



Data availability

The data used to support the findings of this study are available from the corresponding author upon request.


Funding

This declaration is not applicable.

Author information

Authors and Affiliations

Authors

Contributions

All of the authors contributed equally to this work and reviewed the manuscript.

Corresponding author

Correspondence to Oguzhan Oguz.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

This declaration is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Oguz, O., Ikizler-Cinbis, N. Leveraging cross-resolution attention for effective extreme low-resolution video action recognition. SIViP 18, 399–406 (2024). https://doi.org/10.1007/s11760-023-02766-x

