Abstract
Action quality assessment is a significant research domain in computer vision that aims to evaluate how accurately human movements are performed and to provide feedback and guidance for training and rehabilitation. However, the approaches commonly used in this field do not account for the imbalanced distribution of the data, which particularly affects labels with fewer samples. To address this issue, we propose using kernel density estimation (KDE) to recalculate the label density and weighting the loss function by the reciprocal of the square root of each label's density. Additionally, for diving, we divide the entire motion into three sub-stages (takeoff, aerial movement, and entry) and connect them using an across-staged temporal reasoning module (ASTRM). Our approach achieves a Spearman correlation coefficient (\(\rho \)) of 0.9222 and a relative \(\ell _2\)-distance (\(\mathrm R\)-\(\ell _2\), \(\times \)100) of 0.3304 on the FineDiving dataset, which is competitive with other methods. Furthermore, comprehensive ablation experiments validate the effectiveness of the proposed methods and modules.
Data Availability
The FineDiving dataset is available upon request at https://github.com/xujinglin/FineDiving.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No.52072132).
Author information
Contributions
Pu-Xiang Lian: Conceptualization, Methodology, Validation, Formal analysis, Writing – original draft, Writing – review & editing, Visualization. Zhi-Gang Shao: Methodology, Formal analysis, Validation, Writing – review & editing.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lian, PX., Shao, ZG. Improving action quality assessment with across-staged temporal reasoning on imbalanced data. Appl Intell 53, 30443–30454 (2023). https://doi.org/10.1007/s10489-023-05166-3
DOI: https://doi.org/10.1007/s10489-023-05166-3