
Evaluating robotic-assisted surgery training videos with multi-task convolutional neural networks


We seek to understand whether an automated algorithm can replace human scoring of surgical trainees performing the urethrovesical anastomosis in radical prostatectomy on synthetic tissue. Specifically, we investigate neural networks for predicting the surgical proficiency score, the Global Evaluative Assessment of Robotic Skills (GEARS) score, from video clips of surgeons performing the anastomosis. The algorithm tracks the surgical instruments in each video and records the positions of key points on the instruments over time. These positional features are used to train a multi-task convolutional network that infers each sub-category of the GEARS score and thereby determines the proficiency level of the trainee. Experimental results show that the predicted scores match manual inspection in 86.1% of all GEARS sub-categories. Furthermore, the model distinguishes between proficiency levels (novice to expert) in 83.3% of videos. Evaluation of GEARS sub-categories with artificial neural networks is possible for novice and intermediate surgeons, but additional research is needed to understand whether expert surgeons can be evaluated with a similar automated system.
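The pipeline described above (key-point tracks over time, a shared convolutional trunk, one output per GEARS sub-category) can be illustrated with a minimal sketch. This is not the authors' implementation: the sub-category names, the 1 to 5 score range, the layer sizes, and the helper names `conv1d_relu`, `global_average_pool`, `predict_scores`, and `agreement_rate` are all assumptions made for illustration, and the weights are random rather than trained.

```python
import random

# Assumed GEARS sub-categories (hypothetical choice; the abstract does not list them).
SUBCATEGORIES = ["depth_perception", "bimanual_dexterity", "efficiency",
                 "force_sensitivity", "autonomy", "robotic_control"]
NUM_LEVELS = 5  # assumed score range of 1..5 per sub-category


def conv1d_relu(track, kernels):
    """Valid 1D convolution over time followed by ReLU.

    track:   list of T frames, each a list of C key-point coordinates
    kernels: list of F filters, each a list of K taps, each tap a list of C weights
    returns: list of (T - K + 1) frames, each a list of F activations
    """
    k = len(kernels[0])
    out = []
    for t in range(len(track) - k + 1):
        frame = []
        for kern in kernels:
            s = sum(w * x
                    for tap, row in zip(kern, track[t:t + k])
                    for w, x in zip(tap, row))
            frame.append(max(0.0, s))
        out.append(frame)
    return out


def global_average_pool(features):
    """Average each filter's activations over time, giving one vector per clip."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def predict_scores(track, kernels, heads):
    """Shared trunk, then one linear head per sub-category (multi-task output)."""
    shared = global_average_pool(conv1d_relu(track, kernels))
    scores = {}
    for name, weights in heads.items():
        logits = [sum(w * f for w, f in zip(row, shared)) for row in weights]
        scores[name] = logits.index(max(logits)) + 1  # class index -> score 1..5
    return scores


def agreement_rate(predicted, manual):
    """Fraction of (video, sub-category) pairs where prediction equals manual score."""
    pairs = [(p[name], m[name]) for p, m in zip(predicted, manual) for name in m]
    return sum(a == b for a, b in pairs) / len(pairs)


rng = random.Random(0)
C, K, F, T = 8, 5, 4, 60  # coordinates per frame, kernel width, filters, frames
kernels = [[[rng.uniform(-1, 1) for _ in range(C)] for _ in range(K)]
           for _ in range(F)]
heads = {name: [[rng.uniform(-1, 1) for _ in range(F)] for _ in range(NUM_LEVELS)]
         for name in SUBCATEGORIES}
# Synthetic stand-in for tracked instrument key points (not real data).
track = [[rng.uniform(0, 1) for _ in range(C)] for _ in range(T)]

scores = predict_scores(track, kernels, heads)
print(scores)
```

Sharing the trunk across heads is what makes the network multi-task: every sub-category is predicted from the same pooled motion features, and only the final linear layer differs. The `agreement_rate` helper mirrors the kind of per-sub-category match rate the abstract reports, computed here over whatever predictions and manual scores are supplied.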



Availability of data and material

Videos available upon request, with proper data management plan and IRB reciprocity approval.

Code availability

All source code is custom developed and available upon request.




Funding

Internal research and development funds were employed.

Author information



Corresponding author

Correspondence to Eric C. Larson.

Ethics declarations

Conflict of interest

Authors Wang, Dai, Morgan, Elsaied, Garbens, Qu, Steinberg, Gahan, and Larson declare that they have no conflict of interest.

IRB approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study. This article does not contain any studies with animals performed by any of the authors.



About this article


Cite this article

Wang, Y., Dai, J., Morgan, T.N. et al. Evaluating robotic-assisted surgery training videos with multi-task convolutional neural networks. J Robotic Surg (2021).




Keywords

  • Surgical training
  • Robotic-assisted surgery
  • Deep learning
  • Skill evaluation
  • Keypoint detection