Automated Surgical Activity Recognition with One Labeled Sequence
Prior work has demonstrated the feasibility of automated activity recognition in robot-assisted surgery from motion data. However, these efforts have assumed the availability of a large number of densely-annotated sequences, which must be provided manually by experts. This process is tedious, expensive, and error-prone. In this paper, we present the first analysis under the assumption of scarce annotations, where as few as one annotated sequence is available for training. We demonstrate the feasibility of automated recognition in this challenging setting, and we show that learning representations in an unsupervised fashion, before the recognition phase, leads to significant gains in performance. In addition, our paper poses a new challenge to the community: how much further can we push performance in this important yet relatively unexplored regime?
Keywords: Surgical activity recognition · Gesture recognition · Maneuver recognition · Semi-supervised learning
This work was supported by a fellowship for modeling, simulation, and training from the Link Foundation. We also thank Anand Malpani, Madeleine Waldram, Swaroop Vedula, Gyusung I. Lee, and Mija R. Lee for procuring the MISTIC-SL dataset. The procurement of MISTIC-SL was supported by the Johns Hopkins Science of Learning Institute.