Real-Time Segmentation of Non-rigid Surgical Tools Based on Deep Learning and Tracking

  • Luis C. García-Peraza-Herrera
  • Wenqi Li
  • Caspar Gruijthuijsen
  • Alain Devreker
  • George Attilakos
  • Jan Deprest
  • Emmanuel Vander Poorten
  • Danail Stoyanov
  • Tom Vercauteren
  • Sébastien Ourselin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10170)


Real-time tool segmentation is an essential component of computer-assisted surgical systems. We propose a novel real-time automatic method based on Fully Convolutional Networks (FCN) and optical flow tracking. Our method exploits the ability of deep neural networks to produce accurate segmentations of highly deformable parts together with the high speed of optical flow. Furthermore, the pre-trained FCN can be fine-tuned on a small number of medical images without the need to hand-craft features. We validated our method on existing and new benchmark datasets, covering both ex vivo and in vivo real clinical cases in which different surgical instruments are employed. Two versions of the method are presented: non-real-time and real-time. The former, using only deep learning, achieves a balanced accuracy of 89.6% on a real clinical dataset, outperforming the (non-real-time) state of the art by 3.8 percentage points. The latter, which combines deep learning with optical flow tracking, yields an average balanced accuracy of 78.2% across all validated datasets.
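The real-time variant propagates the FCN segmentation between frames using optical flow. As an illustrative sketch only (not the authors' implementation, which builds on pyramidal Lucas-Kanade tracking), the core single-window Lucas-Kanade displacement estimate can be written in a few lines of NumPy; the synthetic blob image, window size, and test shift below are invented for the demo:

```python
import numpy as np

def lucas_kanade_step(prev, curr, y, x, win=7):
    """Estimate the (dy, dx) displacement at pixel (y, x) by solving
    the Lucas-Kanade least-squares system over a win x win window."""
    h = win // 2
    Iy, Ix = np.gradient(prev)          # spatial image gradients
    It = curr - prev                    # temporal gradient
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    A = np.stack([Iy[sl].ravel(), Ix[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (dy, dx), *_ = np.linalg.lstsq(A, b, rcond=None)
    return dy, dx

# Synthetic check: a smooth blob translated by a known sub-pixel shift.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
blob = lambda cy, cx: np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 72.0)
prev_frame = blob(32.0, 32.0)
curr_frame = blob(32.3, 32.5)           # moved by (dy, dx) = (0.3, 0.5)
dy_est, dx_est = lucas_kanade_step(prev_frame, curr_frame, 36, 36)
```

A practical system would apply this estimate pyramidally and at many feature points to warp the previous segmentation mask forward until the next FCN output arrives.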


Keywords: Optical Flow · Affine Transformation · Convolutional Neural Network · Deep Neural Network · Visual Servoing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was supported by the Wellcome Trust [WT101957], EPSRC (NS/A000027/1, EP/H046410/1, EP/J020990/1, EP/K005278), the NIHR BRC UCLH/UCL High Impact Initiative and a UCL EPSRC CDT Scholarship Award (EP/L016478/1). The authors would like to thank NVIDIA for the donated GeForce GTX TITAN X GPU, their colleagues E. Maneas, S. Moriconi, F. Chadebecq, M. Ebner and S. Nousias for the ground truth of FetalFlexTool, and E. Maneas for preparing the setup with an ex vivo placenta.

Supplementary material

Supplementary material 1 (440896_1_En_8_MOESM1_ESM.mp4, 1369 KB)
Supplementary material 2 (440896_1_En_8_MOESM2_ESM.mp4, 1221 KB)
Supplementary material 3 (440896_1_En_8_MOESM3_ESM.mp4, 1118 KB)
Supplementary material 4 (mp4, 3555 KB)



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Luis C. García-Peraza-Herrera (1)
  • Wenqi Li (1)
  • Caspar Gruijthuijsen (4)
  • Alain Devreker (4)
  • George Attilakos (3)
  • Jan Deprest (5)
  • Emmanuel Vander Poorten (4)
  • Danail Stoyanov (2)
  • Tom Vercauteren (1)
  • Sébastien Ourselin (1)
  1. Translational Imaging Group, CMIC, University College London, London, UK
  2. Surgical Robot Vision Group, CMIC, University College London, London, UK
  3. University College London Hospitals, London, UK
  4. Katholieke Universiteit Leuven, Leuven, Belgium
  5. Universitair Ziekenhuis Leuven, Leuven, Belgium
