
Thermal infrared action recognition with two-stream shift Graph Convolutional Network

  • Research
  • Published in: Machine Vision and Applications

Abstract

The widespread deployment of camera-based IoT devices is heightening the exposure of citizens’ sensitive information and threatening individual data privacy. In this context, thermal imaging becomes essential for data desensitization: it removes visually identifying detail and thereby safeguards individual privacy. Thermal imaging can also play an important role in industry, where scenes often suffer from low resolution, high noise, and indistinct object features. Moreover, existing works often process an entire video as a single entity, which degrades robustness by overlooking individual actions occurring at different times. To address these issues, we propose a lightweight, skeleton-based algorithm for action recognition in thermal infrared videos. Our approach combines YOLOv7-tiny for person detection, AlphaPose for pose estimation, dynamic skeleton modeling, and Graph Convolutional Networks (GCNs) for spatial-temporal feature extraction in action prediction. To overcome the detection and pose-estimation challenges posed by thermal imagery, we created the OQ35-human and OQ35-keypoint datasets for training. In addition, the proposed model improves robustness by using visible-spectrum data for GCN training. Furthermore, we introduce a two-stream shift Graph Convolutional Network to improve recognition accuracy. On our custom thermal infrared action dataset (InfAR-skeleton), the method achieves Top-1 accuracy of 88.06% and Top-5 accuracy of 98.28%; on the filtered Kinetics-skeleton dataset, it achieves Top-1 accuracy of 55.26% and Top-5 accuracy of 83.98%. Thermal infrared action recognition thus protects individual privacy while meeting the requirements of action recognition.
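The full text is not reproduced here, but the abstract's titular building block, the shift graph convolution, is well defined in the literature. The sketch below is a minimal re-implementation of the non-local spatial shift from Shift-GCN (Cheng et al., CVPR 2020), which the paper's two-stream network builds on. All function and variable names are ours, and the joint/bone score-fusion scheme shown is a common convention for two-stream skeleton models, not a confirmed detail of this paper.

```python
import torch

def non_local_spatial_shift(x: torch.Tensor) -> torch.Tensor:
    """Non-local spatial shift (after Cheng et al., CVPR 2020).

    x: skeleton features of shape (N, C, T, V) --
       batch, channels, frames, joints.
    Channel c of joint v is taken from joint (v + c) mod V, so every
    joint mixes features from all others without an explicit
    adjacency matrix.
    """
    n, c, t, v = x.shape
    # src[c_i, v_i] = (v_i + c_i) % V : source joint for each channel
    src = (torch.arange(v).view(1, v) + torch.arange(c).view(c, 1)) % v
    index = src.view(1, c, 1, v).expand(n, c, t, v)
    return torch.gather(x, dim=3, index=index)

def fuse_streams(joint_scores: torch.Tensor, bone_scores: torch.Tensor,
                 alpha: float = 0.5) -> torch.Tensor:
    """Late fusion of per-stream class scores, shape (N, num_classes).

    A typical two-stream setup runs the same shift-GCN backbone on
    joint coordinates and on bone vectors (differences between
    connected joints), then averages the scores; the weighting here
    is an assumption, not the paper's reported scheme.
    """
    return alpha * joint_scores + (1.0 - alpha) * bone_scores

if __name__ == "__main__":
    x = torch.randn(2, 64, 30, 18)  # 18 joints, e.g. a COCO-style layout
    y = non_local_spatial_shift(x)
    assert y.shape == x.shape
```

Because the shift replaces per-joint graph convolutions with index gathers, it adds no learned parameters, which is consistent with the paper's emphasis on a lightweight pipeline.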


Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Code availability

The code that supports the findings of this study is available from the corresponding author upon reasonable request.


Funding

No funding was received for conducting this study.

Author information


Contributions

All authors contributed to the study’s conception and design. Experimentation and ablation studies were performed by JL, WH, and JW. Data analysis and review were conducted by DH, RH, and DT. The first draft of the manuscript was written by JL. Review and editing were performed by HW, and all authors commented on previous versions of the manuscript. Project supervision was provided by F. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Huanyu Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, J., Wang, H., Wang, J. et al. Thermal infrared action recognition with two-stream shift Graph Convolutional Network. Machine Vision and Applications 35, 65 (2024). https://doi.org/10.1007/s00138-024-01550-2
