Abstract
The extensive deployment of camera-based IoT devices in our society is heightening the vulnerability of citizens’ sensitive information and individual data privacy. In this context, thermal imaging techniques become essential for data desensitization, eliminating sensitive visual data to safeguard individual privacy. Thermal imaging can also play an important role in industry, where low resolution, high noise, and unclear object features are common. Moreover, existing works often process an entire video as a single entity, which yields suboptimal robustness because individual actions occurring at different times are overlooked. In this paper, we propose a lightweight algorithm for action recognition in thermal infrared videos using human skeletons to address these issues. Our approach combines YOLOv7-tiny for target detection, AlphaPose for pose estimation, dynamic skeleton modeling, and Graph Convolutional Networks (GCNs) for spatial-temporal feature extraction in action prediction. To overcome detection and pose-estimation challenges, we created the OQ35-human and OQ35-keypoint datasets for training. In addition, the proposed model enhances robustness by using visible-spectrum data for GCN training. Furthermore, we introduce a two-stream shift Graph Convolutional Network to improve action recognition accuracy. Experimental results on our custom thermal infrared action dataset (InfAR-skeleton) demonstrate Top-1 accuracy of 88.06% and Top-5 accuracy of 98.28%. On the filtered Kinetics-skeleton dataset, the algorithm achieves Top-1 accuracy of 55.26% and Top-5 accuracy of 83.98%. Thermal infrared action recognition thus protects individual privacy while meeting the requirements of action recognition.
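The skeleton-based recognition stage described above can be illustrated with a minimal numpy sketch: one normalized-adjacency graph convolution over the joints of each frame, preceded by a zero-parameter temporal channel shift in the spirit of shift-GCN. This is not the authors’ implementation; the 17-joint COCO edge list, tensor layout, and channel split are illustrative assumptions.

```python
import numpy as np

# 17-keypoint COCO skeleton edges (a common convention; the paper's
# exact joint layout is not specified here, so treat this as illustrative).
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (5, 6), (5, 7), (7, 9), (6, 8),
         (8, 10), (5, 11), (6, 12), (11, 12), (11, 13), (13, 15),
         (12, 14), (14, 16)]
V = 17  # number of joints

def normalized_adjacency(edges, num_joints):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, as in ST-GCN-style layers."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def temporal_shift(x, shift=1):
    """Shift one third of the channels forward in time, one third
    backward, and leave the rest in place (the zero-parameter shift)."""
    C = x.shape[0]
    c1, c2 = C // 3, 2 * (C // 3)
    out = np.zeros_like(x)
    out[:c1, shift:, :] = x[:c1, :-shift, :]       # forward in time
    out[c1:c2, :-shift, :] = x[c1:c2, shift:, :]   # backward in time
    out[c2:, :, :] = x[c2:, :, :]                  # untouched
    return out

def gcn_layer(x, A_hat, W):
    """One spatial graph convolution on x of shape (C_in, T, V):
    aggregate neighboring joints with A_hat, then mix channels with W."""
    y = np.einsum('ctv,vw->ctw', x, A_hat)  # joint aggregation per frame
    return np.einsum('ctv,cd->dtv', y, W)   # channel mixing

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16, V))       # (channels, frames, joints)
A_hat = normalized_adjacency(EDGES, V)
W = rng.standard_normal((3, 64))          # 3 -> 64 output channels
out = gcn_layer(temporal_shift(x), A_hat, W)
print(out.shape)                          # (64, 16, 17)
```

A full two-stream variant would run one such network on joint coordinates and another on bone vectors (joint differences along edges), then fuse the class scores.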
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Code Availability
The code that supports the findings of this study is available from the corresponding author upon reasonable request.
Funding
No funding was received for conducting this study.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study’s conception and design. Experimentation and ablation studies were performed by JL, WH, and JW. Data analysis and review were conducted by DH, RH, and DT. The first draft of the manuscript was written by JL. Review and editing were performed by HW, and all authors commented on previous versions of the manuscript. Project supervision was provided by F. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, J., Wang, H., Wang, J. et al. Thermal infrared action recognition with two-stream shift Graph Convolutional Network. Machine Vision and Applications 35, 65 (2024). https://doi.org/10.1007/s00138-024-01550-2