
Surgical Skill Assessment via Video Semantic Aggregation

  • Conference paper
  • First Online:
Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (MICCAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13437)

Abstract

Automated video-based assessment of surgical skills is a promising means of assisting young surgical trainees, especially in resource-poor areas. Existing works often resort to a joint CNN-LSTM framework that models long-term relationships by applying LSTMs to spatially pooled short-term CNN features. However, this practice inevitably neglects the differences among semantic concepts such as tools, tissues, and background in the spatial dimension, impeding subsequent temporal relationship modeling. In this paper, we propose a novel skill assessment framework, Video Semantic Aggregation (ViSA), which discovers different semantic parts and aggregates them across the spatiotemporal dimensions. The explicit discovery of semantic parts provides an explanatory visualization that helps in understanding the network's decisions. It also enables us to further incorporate auxiliary information, such as kinematic data, to improve representation learning and performance. Experiments on two datasets demonstrate the competitiveness of ViSA compared with state-of-the-art methods. Source code is available at: bit.ly/MICCAI2022ViSA.
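The pooling contrast described in the abstract can be sketched numerically: the CNN-LSTM baseline collapses each frame's whole spatial grid into one vector, mixing tools, tissue, and background, whereas a ViSA-style aggregation pools features within semantic groups separately. Below is a minimal NumPy sketch under stated assumptions — the assignment maps are random placeholders (in the actual method they are learned from the features), and all shapes and names are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C, H, W = 8, 16, 7, 7           # frames, channels, spatial size
feats = rng.standard_normal((T, C, H, W))

# Baseline (CNN-LSTM style): global average pooling collapses the
# spatial grid, mixing all semantic concepts into a single vector.
pooled = feats.mean(axis=(2, 3))               # (T, C)

# ViSA-style aggregation (sketch): soft-assign each spatial location
# to one of K semantic groups (softmax over K), then pool features
# within each group separately, keeping parts distinct over time.
K = 3
logits = rng.standard_normal((T, K, H, W))
assign = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Weighted average of features under each group's assignment map.
num = np.einsum('tkhw,tchw->tkc', assign, feats)   # (T, K, C)
den = assign.sum(axis=(2, 3))[..., None] + 1e-8    # (T, K, 1)
group_feats = num / den                            # (T, K, C)

print(pooled.shape, group_feats.shape)
```

The group-wise features `(T, K, C)` preserve a per-part representation along the temporal axis, which is what subsequent temporal modeling can then operate on instead of a single pooled vector per frame.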



Acknowledgment

This work is supported by JST AIP Acceleration Research Grant Number JPMJCR20U1, JSPS KAKENHI Grant Number JP20H04205, JST ACT-X Grant Number JPMJAX190D, JST Moonshot R&D Grant Number JPMJMS2011, the Fundamental Research Funds for the Central Universities under Grant DUT21RC(3)028, and a project commissioned by NEDO.

Author information

Corresponding author: Weimin Wang.


Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 726 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, Z., Gu, L., Wang, W., Nakamura, R., Sato, Y. (2022). Surgical Skill Assessment via Video Semantic Aggregation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13437. Springer, Cham. https://doi.org/10.1007/978-3-031-16449-1_39

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-16449-1_39


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16448-4

  • Online ISBN: 978-3-031-16449-1

  • eBook Packages: Computer Science; Computer Science (R0)
