Automatic Modelling for Interactive Action Assessment

Gao, Jibin; Pan, Jia-Hui; Zhang, Shao-Jie; Zheng, Wei-Shi

doi:10.1007/s11263-022-01695-5

Published: 10 December 2022

Volume 131, pages 659–679, (2023)

International Journal of Computer Vision

Jibin Gao¹, Jia-Hui Pan¹, Shao-Jie Zhang¹ & Wei-Shi Zheng¹,² (ORCID: orcid.org/0000-0001-8327-0003)

Abstract

Action assessment, the task of visually assessing how well an action is performed, has attracted much attention in recent years, with promising applications in areas such as medical treatment and sporting events. However, most existing action assessment methods target actions performed by a single person; in particular, they neglect the asymmetric relations among agents (e.g., between persons and objects), which limits their performance on many non-individual actions. In this work, we formulate a framework for modelling asymmetric interactions among agents for action assessment, taking into account the subordinate relations among agents in many interactive actions. Specifically, we propose an asymmetric interaction learner consisting of an automatic assigner and an asymmetric interaction network search module. The automatic assigner groups the agents within an action into a primary agent (e.g., a human) and secondary agents (e.g., objects); the asymmetric interaction network search module adaptively learns the asymmetric interactions between these agents. We conduct experiments on the JIGSAWS dataset, which contains surgical actions, and additionally collect two new datasets, TASD-2 and PaSk, for action assessment on interactive sporting actions. Experimental results on these three datasets show that our framework achieves state-of-the-art performance. Extensive experiments on the AQA-7 dataset also indicate that our model is robust in conventional action assessment settings.

Notes

The Fisher transform was proposed in 1921 to address the skewed distribution of the sample correlation r (Pearson, 1913; Fisher, 1915); applying it when averaging correlations makes the result more reliable (Corey et al., 1998).
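
As a rough illustration of the note above, the sketch below averages several correlation coefficients in Fisher z-space, where z = atanh(r) = 0.5 ln((1 + r) / (1 - r)), and maps the mean back with tanh. It assumes a plain unweighted mean; the function names and the example values are hypothetical and are not taken from the article, which does not spell out its exact procedure here.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher z-transform of a correlation coefficient r with -1 < r < 1."""
    return math.atanh(r)


def average_correlation(rs):
    """Average correlations in z-space, then map the mean back with tanh.

    Assumption: an unweighted mean of the z-values (no sample-size weights).
    """
    if not rs:
        raise ValueError("need at least one correlation coefficient")
    zs = [fisher_z(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical per-task correlations, for illustration only.
example = [0.90, 0.50, 0.70]
print(average_correlation(example))
```

For positive inputs the z-space average gives more weight to the high correlations than a plain arithmetic mean does, because atanh stretches values near 1.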

References

Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., & Sivic, J. (2016). Netvlad: Cnn architecture for weakly supervised place recognition. In CVPR (pp. 5297–5307).
Azar, S. M., Atigh, M. G., Nickabadi, A., & Alahi, A. (2019). Convolutional relational machine for group activity recognition. In CVPR (pp. 7892–7901).
Bertasius, G., Soo Park, H., Yu, S. X., & Shi, J. (2017). Am I a baller? Basketball performance assessment from first-person videos. In ICCV (pp. 2177–2185).
Cai, H., Zhu, L., & Han, S. (2018). Proxylessnas: Direct neural architecture search on target task and hardware. In ICLR.
Carreira, J., & Zisserman, A. (2017). Quo vadis, action recognition? A new model and the kinetics dataset. In CVPR (pp. 6299–6308).
Chang, X., Zheng, W.-S., & Zhang, J. (2015). Learning person-person interaction in collective activity recognition. TIP, 24(6), 1905–1918.
Chen, J., Wang, Y., Qin, J., Liu, L., & Shao, L. (2017). Fast person re-identification via cross-camera semantic binary transformation. In CVPR.
Corey, D. M., Dunlap, W. P., & Burke, M. J. (1998). Averaging correlations: Expected values and bias in combined Pearson rs and Fisher’s z transformations. JGP, 125(3), 245–261.
Dong, X., & Yang, Y. (2019). Searching for a robust neural architecture in four GPU hours. In CVPR (pp. 1761–1770).
Doughty, H., Damen, D., & Mayol-Cuevas, W. (2018). Who’s better, who’s best: Skill determination in video using deep ranking. In CVPR.
Doughty, H., Mayol-Cuevas, W., & Damen, D. (2019). The pros and cons: Rank-aware temporal attention for skill determination in long videos. In CVPR (pp. 7862–7871).
Fang, H.-S., Xie, S., Tai, Y.-W., & Lu, C. (2017). Rmpe: Regional multi-person pose estimation. In ICCV (pp. 2334–2343).
Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4), 507–521.
Gao, J., Zheng, W.-S., Pan, J.-H., Gao, C., Wang, Y., Zeng, W., & Lai, J. (2020). An asymmetric modeling for action assessment. In ECCV (pp. 222–238), Springer.
Gao, Y., Vedula, S. S., Reiley, C. E., Ahmidi, N., Varadarajan, B., Lin, H. C., Tao, L., Zappella, L., Béjar, B., Yuh, D. D. et al. (2014). Jhu-isi gesture and skill assessment working set (jigsaws): A surgical activity dataset for human motion modeling. In W2CAI (Vol. 3, p. 3).
Guo, Z., Zhang, X., Mu, H., Heng, W., Liu, Z., Wei, Y., & Sun, J. (2019). Single path one-shot neural architecture search with uniform sampling. In ECCV (pp. 544–560).
Hu, S., Xie, S., Zheng, H., Liu, C., Shi, J., Liu, X., & Lin, D. (2020). Dsnas: Direct neural architecture search without parameter retraining. In CVPR (pp. 12084–12092).
Ilg, W., Mezger, J., & Giese, M. (2003). Estimation of skill levels in sports based on hierarchical spatio-temporal correspondences. In JPRS (pp. 523–531), Springer.
International Swimming Federation (FINA). Fina diving rules, 2017. URL https://resources.fina.org/fina/document/2021/01/12/916f78f6-2a42-46d6-bea8-e49130211edf/2017-2021_diving_16032018.pdf.
Joachims, T. (2006). Training linear SVMs in linear time. In SIGKDD (pp. 217–226).
Liu, D., Li, Q., Jiang, T., Wang, Y., Miao, R., Shan, F., & Li, Z. (2021). Towards unified surgical skill assessment. In CVPR (pp. 9522–9531).
Liu, H., Simonyan, K., & Yang, Y. (2018). Darts: Differentiable architecture search. In ICLR.
Lu, L., Lu, Y., Yu, R., Di, H., Zhang, L., & Wang, S. (2019). Gaim: Graph attention interaction model for collective activity recognition. TMM, 22(2), 524–539.
Malpani, A., Vedula, S. S., Chen, C. C. G., & Hager, G. D. (2014). Pairwise comparison-based objective score for automated skill assessment of segments in a surgical task. In IPCAI (pp. 138–147), Springer.
Martin, J., Regehr, G., Reznick, R., Macrae, H., Murnaghan, J., Hutchison, C., & Brown, M. (1997). Objective structured assessment of technical skill (OSATS) for surgical residents. BJS, 84(2), 273–278.
Pan, J.-H., Gao, J., & Zheng, W.-S. (2019). Action assessment by joint relation graphs. In ICCV.
Parmar, P., & Morris, B. T. (2019). What and how well you performed? A multitask learning approach to action quality assessment. In CVPR.
Parmar, P., & Tran Morris, B. (2017). Learning to score Olympic events. In CVPRW (pp. 20–28).
Parmar, P., & Tran Morris, B. (2019). Action quality assessment across multiple actions. In WACV (pp. 1468–1476). https://doi.org/10.1109/WACV.2019.00161.
Pearson, K. (1913). On the probable error of a correlation coefficient as found from a fourfold table. Biometrika. https://doi.org/10.1093/biomet/9.1-2.22
Pérez, J. S., Meinhardt-Llopis, E., & Facciolo, G. (2013). Tv-l1 optical flow estimation. In IPOL (pp. 137–150).
Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018). Efficient neural architecture search via parameters sharing. In ICML (pp. 4092–4101).
Pirsiavash, H., Vondrick, C., & Torralba, A. (2014). Assessing the quality of actions. In ECCV (pp. 556–571), Springer.
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2009). The graph neural network model. TNN, 20(1), 61–80.
Sharma, Y., Bettadapura, V., Plötz, T., Hammerla, N., Mellor, S., McNaney, R., Olivier, P., Deshmukh, S., McCaskie, A., & Essa, I. (2014). Video based assessment of OSATS using sequential motion textures, Georgia Institute of Technology.
Shu, T., Todorovic, S., & Zhu, S.-C. (2017). Cern: Confidence-energy recurrent network for group activity recognition. In CVPR (pp. 5523–5531).
Tang, Y., Ni, Z., Zhou, J., Zhang, D., Lu, J., Wu, Y., & Zhou, J. (2020). Uncertainty-aware score distribution learning for action quality assessment. In CVPR (pp. 9839–9848).
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In NeurIPS (pp. 5998–6008). Curran Associates, Inc. URL http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
Wang, M., Ni, B., & Yang, X. (2017). Recurrent modeling of interaction context for collective activity recognition. In CVPR (pp. 3048–3056).
Wu, J., Wang, L., Wang, L., Guo, J., & Wu, G. (2019). Learning actor relation graphs for group activity recognition. In CVPR (pp. 9964–9974).
Xie, S., Zheng, H., Liu, C., & Lin, L. (2018). Snas: Stochastic neural architecture search. In ICLR.
Xu, C., Fu, Y., Zhang, B., Chen, Z., Jiang, Y.-G., & Xue, X. (2018). Learning to score the figure skating sports videos. arXiv preprint arXiv:1802.02774.
Yan, R., Tang, J., Shu, X., Li, Z., & Tian, Q. (2018a). Participation-contributed temporal dynamic model for group activity recognition. In ACM MM (pp. 1292–1300).
Yan, S., Xiong, Y., & Lin, D. (2018b). Spatial temporal graph convolutional networks for skeleton-based action recognition. In AAAI.
Yao, T., Mei, T., & Rui, Y. (2016). Highlight detection with pairwise deep ranking for first-person video summarization. In CVPR (pp. 982–990).
Zeng, L.-A., Hong, F.-T., Zheng, W.-S., Yu, Q.-Z., Zeng, W., Wang, Y.-W., & Lai, J.-H. (2020). Hybrid dynamic-static context-aware attention network for action assessment in long videos. In ACM MM (pp. 2526–2534).
Zhang, P., Tang, Y., Hu, J.-F., & Zheng, W.-S. (2019). Fast collective activity recognition under weak supervision. TIP, 29, 29–43.
Zhang, Q., & Li, B. (2011). Video-based motion expertise analysis in simulation-based surgical training using hierarchical Dirichlet process hidden Markov model. In MMAR (pp. 19–24), ACM.
Zhang, Q., & Li, B. (2015). Relative hidden Markov models for video-based evaluation of motion skills in surgical training. TPAMI, 37(6), 1206–1218.
Zhang, Y., Wang, C., Wang, X., Zeng, W., & Liu, W. (2020). Fairmot: On the fairness of detection and re-identification in multiple object tracking. arXiv preprint arXiv:2004.01888.
Zhu, K., & Wu, J. (2021). Residual attention: A simple but effective method for multi-label recognition. In ICCV (pp. 184–193).
Zia, A., & Essa, I. (2018). Automated surgical skill assessment in RMIS training. IJCARS, 13, 731–739.
Zia, A., Sharma, Y., Bettadapura, V., Sarin, E. L., Ploetz, T., Clements, M. A., & Essa, I. (2016). Automated video-based assessment of surgical skills for training and evaluation in medical schools. IJCARS, 11(9), 1623–1636.
Zia, A., Sharma, Y., Bettadapura, V., Sarin, E. L., & Essa, I. (2018). Video and accelerometer-based motion analysis for automated surgical skills assessment. IJCARS, 13(3), 443–455.

Acknowledgements

This work was supported partially by the NSFC (U21A20471, U1911401, U1811461), Guangdong NSF Project (Nos. 2020B1515120085, 2018B030312002), Guangzhou Research Project (201902010037), the Key-Area Research and Development Program of Guangzhou (202007030004), and the Major Key Project of PCL (PCL2021A12). The corresponding author and principal investigator for this paper is Wei-Shi Zheng.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China
Jibin Gao, Jia-Hui Pan, Shao-Jie Zhang & Wei-Shi Zheng
Peng Cheng Laboratory, Shenzhen, 518055, China
Wei-Shi Zheng

Corresponding author

Correspondence to Wei-Shi Zheng.

Additional information

Communicated by Dima Damen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gao, J., Pan, JH., Zhang, SJ. et al. Automatic Modelling for Interactive Action Assessment. Int J Comput Vis 131, 659–679 (2023). https://doi.org/10.1007/s11263-022-01695-5

Received: 30 August 2021
Accepted: 24 September 2022
Published: 10 December 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s11263-022-01695-5
