Feature Aggregation Decoder for Segmenting Laparoscopic Scenes

  • Conference paper
OR 2.0 Context-Aware Operating Theaters and Machine Learning in Clinical Neuroimaging (OR 2.0 2019, MLCN 2019)

Abstract

Laparoscopic scene segmentation is one of the key building blocks required for developing advanced computer-assisted interventions and robotic automation. Scene segmentation approaches often rely on encoder-decoder architectures that encode a representation of the input, which is then decoded into semantic pixel labels. In this paper, we propose to use the deep Xception model for the encoder and a simple yet effective decoder that relies on a feature aggregation module. Our feature aggregation module constructs a mapping function that reuses and transfers encoder features, combining information across all feature scales to build a richer representation that retains both high-level context and low-level boundary information. We argue that this aggregation module allows us to simplify the decoder and reduce its number of parameters. We have evaluated our approach on two datasets, and our experimental results show that our model outperforms state-of-the-art models under the same experimental setup, significantly improving on previous results (98.44% vs. 89.00%) on the EndoVis'15 dataset.
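The abstract describes the aggregation module only at a high level: encoder features from every scale are resized to a common resolution and fused into one representation that mixes semantic context with boundary detail. The sketch below illustrates that general idea in plain NumPy; the function names, the nearest-neighbor upsampling, and channel-wise concatenation as the fusion step are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def upsample_nearest(x, factor):
    # x: (C, H, W) feature map; repeat rows and columns to upsample
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def aggregate_features(feature_maps):
    """Hypothetical multi-scale aggregation: resize every encoder
    feature map to the finest spatial resolution present, then
    concatenate along the channel axis so the result carries both
    high-level context and low-level boundary information."""
    target_h = max(f.shape[1] for f in feature_maps)
    resized = []
    for f in feature_maps:
        factor = target_h // f.shape[1]
        resized.append(upsample_nearest(f, factor) if factor > 1 else f)
    return np.concatenate(resized, axis=0)

# Toy encoder outputs at three scales: (channels, height, width)
low = np.ones((16, 64, 64))       # fine scale: boundary detail
mid = np.ones((32, 32, 32)) * 2   # intermediate scale
high = np.ones((64, 16, 16)) * 3  # coarse scale: semantic context

agg = aggregate_features([low, mid, high])
print(agg.shape)  # (112, 64, 64)
```

In a real decoder the concatenation would typically be followed by learned convolutions that reduce the channel count back down; the point here is only that a single fused tensor at the finest resolution lets the decoder stay shallow.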


Notes

  1. Even though this dataset has been annotated for shaft, manipulator, and background classes, the author of [2] confirms that shaft and manipulator are merged. We also merge these classes during our experiments.

References

  1. Pascal VOC 2012: segmentation leaderboard. http://host.robots.ox.ac.uk/leaderboard/displaylb.php?challengeid=11&compid=6. Accessed March 2019

  2. Bodenstedt, S., et al.: Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery. arXiv preprint arXiv:1805.02475 (2018)

  3. Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 833–851. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_49

  4. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1800–1807. IEEE, July 2017

  5. da Costa Rocha, C., Padoy, N., Rosa, B.: Self-supervised surgical tool segmentation using kinematic information. In: International Conference on Robotics and Automation (ICRA). IEEE (2019)

  6. D’Ettorre, C., et al.: Automated pick-up of suturing needles for robotic surgical assistance. In: International Conference on Robotics and Automation (ICRA), pp. 1370–1377. IEEE (2018)

  7. García-Peraza-Herrera, L.C., et al.: ToolNet: holistically-nested real-time segmentation of robotic surgical tools. In: International Conference on Intelligent Robots and Systems (IROS), pp. 5717–5722. IEEE, September 2017

  8. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587. IEEE (2014)

  9. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, June 2018

  10. Jin, A., et al.: Tool detection and operative skill assessment in surgical videos using region-based convolutional neural networks. In: Winter Conference on Applications of Computer Vision (WACV), pp. 691–699, March 2018

  11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

  12. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440. IEEE (2015)

  13. Maier-Hein, L., et al.: Why rankings of biomedical image analysis competitions should be interpreted with care. Nat. Commun. 9(1), 5217 (2018)

  14. Münzer, B., Schoeffmann, K., Böszörmenyi, L.: Content-based processing and analysis of endoscopic images and videos: a survey. Multimedia Tools Appl. 77(1), 1323–1362 (2018)

  15. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  16. Wilson, M., Coleman, M., McGrath, J.: Developing basic hand-eye coordination skills for laparoscopic surgery using gaze training. BJU Int. 105(10), 1356–1358 (2010)

Correspondence to Abdolrahim Kadkhodamohammadi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Kadkhodamohammadi, A., Luengo, I., Barbarisi, S., Taleb, H., Flouty, E., Stoyanov, D. (2019). Feature Aggregation Decoder for Segmenting Laparoscopic Scenes. In: Zhou, L., et al. (eds.) OR 2.0 Context-Aware Operating Theaters and Machine Learning in Clinical Neuroimaging. OR 2.0 2019, MLCN 2019. Lecture Notes in Computer Science, vol. 11796. Springer, Cham. https://doi.org/10.1007/978-3-030-32695-1_1

  • DOI: https://doi.org/10.1007/978-3-030-32695-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32694-4

  • Online ISBN: 978-3-030-32695-1
