Feature Aggregation Decoder for Segmenting Laparoscopic Scenes

  • Abdolrahim Kadkhodamohammadi
  • Imanol Luengo
  • Santiago Barbarisi
  • Hinde Taleb
  • Evangello Flouty
  • Danail Stoyanov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11796)


Laparoscopic scene segmentation is one of the key building blocks required for developing advanced computer-assisted interventions and robotic automation. Scene segmentation approaches often rely on encoder-decoder architectures that encode a representation of the input, which is then decoded into semantic pixel labels. In this paper, we propose to use the deep Xception model for the encoder and a simple yet effective decoder built around a feature aggregation module. Our feature aggregation module constructs a mapping function that reuses and transfers encoder features, combining information across all feature scales to build a richer representation that retains both high-level context and low-level boundary information. We argue that this aggregation module allows us to simplify the decoder and reduce its parameter count. We have evaluated our approach on two datasets; our experimental results show that our model outperforms state-of-the-art models under the same experimental setup and significantly improves on previous results, \(98.44\%\) vs \(89.00\%\), on the EndoVis’15 dataset.
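The core idea of the decoder, as described above, is to bring encoder features from all scales to a common resolution and fuse them into one richer representation. The following is a minimal NumPy sketch of that aggregation pattern, not the authors' implementation: the function names, nearest-neighbour upsampling, and the 1x1-convolution-as-matrix-multiply fusion are illustrative assumptions.

```python
import numpy as np

def upsample_nearest(feat, target_hw):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    c, h, w = feat.shape
    th, tw = target_hw
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return feat[:, rows][:, :, cols]

def aggregate_features(features, fuse_weights):
    """Upsample all encoder features to the finest scale, concatenate
    them along the channel axis, and fuse with a 1x1 convolution
    (equivalent to a matrix multiply over flattened spatial positions)."""
    target_hw = features[0].shape[1:]  # finest scale listed first
    upsampled = [upsample_nearest(f, target_hw) for f in features]
    stacked = np.concatenate(upsampled, axis=0)   # (sum of C_i, H, W)
    c, h, w = stacked.shape
    flat = stacked.reshape(c, h * w)
    return (fuse_weights @ flat).reshape(-1, h, w)

# Toy multi-scale encoder outputs: 16x16, 8x8 and 4x4 maps
# with 4, 8 and 16 channels respectively.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(c, s, s)) for c, s in [(4, 16), (8, 8), (16, 4)]]
w = rng.normal(size=(32, 4 + 8 + 16))   # fuse 28 channels down to 32
out = aggregate_features(feats, w)
print(out.shape)   # (32, 16, 16)
```

In a real network the fusion weights would be learned and followed by a non-linearity; the sketch only shows why aggregating all scales keeps both coarse context (from the 4x4 map) and fine boundary detail (from the 16x16 map) in the decoded representation.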


Keywords: Semantic segmentation · Minimally invasive surgery · Surgical vision



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Abdolrahim Kadkhodamohammadi¹
  • Imanol Luengo¹
  • Santiago Barbarisi¹
  • Hinde Taleb¹
  • Evangello Flouty¹
  • Danail Stoyanov¹,²

  1. Digital Surgery Ltd., London, UK
  2. Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
