Zero-shot action recognition by clustered representation with redundancy-free features

Xia, Limin; Wen, Xin

doi:10.1007/s00138-023-01470-7


Original Paper
Published: 09 October 2023

Machine Vision and Applications, Volume 34, article number 116 (2023)

Abstract

Zero-shot action recognition (ZSAR) is a practical and challenging problem: it addresses a limitation of conventional action recognition by recognizing action classes that have no visual samples during training. However, existing ZSAR methods overlook the fact that the generated features contain many outliers, which degrade recognition. A new ZSAR method is proposed that suppresses this defect through a clustered representation with redundancy-free features. In addition, a generative adversarial network (GAN) with gradient penalty is trained to synthesize stable features, which addresses data imbalance and alleviates the bottleneck of unstable features generated by existing methods. To reduce the feature dimension and the subsequent computation, redundancy-free features are introduced into ZSAR. Experiments on the public Olympic Sports, HMDB51, and UCF101 datasets show that our method outperforms state-of-the-art approaches in zero-shot action recognition, with absolute gains of 1.8%, 0.3%, and 1.7%, respectively.
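
The abstract does not say how the clustered representation is built, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' method: run k-means (Lloyd's algorithm with Forgy-style random initialization) over the features synthesized for one unseen class, treat clusters holding less than a fraction `min_share` of the samples as outliers, average the surviving features into a class prototype, and label a test feature by its nearest prototype. The function names, the `min_share` rule, and the toy 2-D data are hypothetical; the GAN with gradient penalty and the redundancy-free feature step are not modeled.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty collection of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points: Sequence[Vector], k: int, iters: int = 50, seed: int = 0) -> List[List[int]]:
    """Lloyd's k-means with Forgy initialization; returns member indices per cluster."""
    rng = random.Random(seed)
    # Forgy initialization: k data points sampled without replacement.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    clusters: List[List[int]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(i)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        updated = [mean([points[i] for i in members]) if members else centroids[c]
                   for c, members in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return clusters


def clustered_prototype(features: Sequence[Vector], k: int = 2, min_share: float = 0.25) -> Vector:
    """HYPOTHETICAL: cluster one class's synthesized features, treat clusters holding
    fewer than min_share of the samples as outliers, and average what remains."""
    clusters = kmeans(features, k)
    kept = [i for members in clusters if len(members) >= min_share * len(features)
            for i in members]
    if not kept:  # only possible when min_share > 1 / k; fall back to the largest cluster
        kept = max(clusters, key=len)
    return mean([features[i] for i in kept])


def classify(x: Sequence[float], prototypes: Dict[str, Vector]) -> str:
    """Assign x to the label whose prototype is nearest in squared Euclidean distance."""
    return min(prototypes, key=lambda label: sq_dist(x, prototypes[label]))


# Toy 2-D "synthesized" features (hypothetical); [20.0, 20.0] stands in for a GAN outlier.
synthesized = {
    "run": [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05], [20.0, 20.0]],
    "jump": [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]],
}
prototypes = {label: clustered_prototype(feats) for label, feats in synthesized.items()}
print(classify([4.0, 4.0], prototypes))  # prints jump
```

In this toy case the outlier ends up alone in its own cluster and is discarded, so the "run" prototype stays at (0.05, 0.05). For contrast, the plain mean of all six "run" samples is dragged to (3.375, 3.375), and the same query [4.0, 4.0] would then be labeled "run".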

Data availability

The public Olympic Sports dataset used in this study is available from the Stanford repository, http://vision.stanford.edu/Datasets/OlympicSports/. The UCF101 dataset is available from the University of Central Florida repository, https://www.crcv.ucf.edu/data/UCF101.php. The HMDB51 dataset is available from the Serre Lab repository, https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 51678075) and the Science and Technology Project of Hunan (Grant No. 2017GK2271).

Author information

Authors and Affiliations

School of Automation, Central South University, Changsha, 410083, China
Limin Xia & Xin Wen

Contributions

LX proposed the research topic, guided the design of the research proposal and the conduct of the experiments, wrote part of the manuscript, and proofread the whole text. XW participated in the experimental design, carried out the research, and wrote part of the manuscript text. All authors reviewed the manuscript.

Corresponding author

Correspondence to Xin Wen.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xia, L., Wen, X. Zero-shot action recognition by clustered representation with redundancy-free features. Machine Vision and Applications 34, 116 (2023). https://doi.org/10.1007/s00138-023-01470-7

Received: 15 March 2023
Revised: 06 August 2023
Accepted: 18 September 2023
Published: 09 October 2023
DOI: https://doi.org/10.1007/s00138-023-01470-7
