Self-supervised Learning for Dense Depth Estimation in Monocular Endoscopy

  • Xingtong Liu
  • Ayushi Sinha
  • Mathias Unberath
  • Masaru Ishii
  • Gregory D. Hager
  • Russell H. Taylor
  • Austin Reiter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11041)

Abstract

We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method requires only sequential data from monocular endoscopic videos and a multi-view stereo reconstruction method, e.g., structure from motion, which supervises learning in a sparse but accurate manner. Consequently, our method requires neither manual interaction, such as scaling or labeling, nor patient CT in the training and application phases. We demonstrate the performance of our method on sinus endoscopy data from two patients and validate depth prediction quantitatively using corresponding patient CT scans, where we found submillimeter residual errors. (Link to the supplementary video: https://camp.lcsr.jhu.edu/miccai-2018-demonstration-videos/)
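
The abstract above does not include code, and the authors' implementation is not reproduced here. Purely as an illustrative sketch of the core idea (a dense depth network supervised only at pixels where a sparse, scale-ambiguous structure-from-motion reconstruction provides depth), the snippet below shows one plausible masked, scale-normalized loss in PyTorch. All names (SparseDepthLoss, pred_depth, sparse_depth, mask) are hypothetical and are not taken from the paper; the actual training objective described by the authors may differ.

```python
import torch
import torch.nn as nn


class SparseDepthLoss(nn.Module):
    """Hypothetical sketch: penalize dense depth predictions only at pixels
    covered by sparse SfM points, after normalizing away the global scale
    ambiguity inherent to monocular reconstructions."""

    def forward(self, pred_depth, sparse_depth, mask):
        # pred_depth, sparse_depth: (B, 1, H, W) depth maps.
        # mask: (B, 1, H, W), 1 where an SfM point projects, 0 elsewhere.
        eps = 1e-8
        valid = mask.sum(dim=(1, 2, 3), keepdim=True) + eps

        # Mean depth over the sparse support, used to normalize scale.
        pred_scale = (pred_depth * mask).sum(dim=(1, 2, 3), keepdim=True) / valid
        gt_scale = (sparse_depth * mask).sum(dim=(1, 2, 3), keepdim=True) / valid

        pred_n = pred_depth / (pred_scale + eps)
        gt_n = sparse_depth / (gt_scale + eps)

        # L1 error evaluated only where sparse supervision exists.
        return ((pred_n - gt_n).abs() * mask).sum() / (mask.sum() + eps)
```

In this reading, the network still predicts depth at every pixel, while the loss touches only the sparse but accurate SfM points; dense structure emerges because the convolutional network must produce spatially coherent outputs. Again, this is an assumption-laden sketch, not the authors' published training procedure.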

Acknowledgement

The work reported in this paper was funded in part by NIH R01-EB015530, in part by a research contract from Galen Robotics, and in part by Johns Hopkins University internal funds.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xingtong Liu (1)
  • Ayushi Sinha (1)
  • Mathias Unberath (1)
  • Masaru Ishii (2)
  • Gregory D. Hager (1)
  • Russell H. Taylor (1)
  • Austin Reiter (1)

  1. The Johns Hopkins University, Baltimore, USA
  2. Johns Hopkins Medical Institutions, Baltimore, USA
