Unsupervised Learning of Endoscopy Video Frames’ Correspondences from Global and Local Transformation

  • Mohammad Ali Armin
  • Nick Barnes
  • Salman Khan
  • Miaomiao Liu
  • Florian Grimpen
  • Olivier Salvado
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11041)


Inferring the correspondences between consecutive video frames with high accuracy is essential for many medical image processing and computer vision tasks (e.g. image mosaicking and 3D scene reconstruction). Image correspondences can be computed by feature extraction and matching algorithms, but these are computationally expensive and struggle with low-texture frames. Convolutional neural networks (CNNs) can estimate dense image correspondences with high accuracy, but the lack of labeled data, especially in medical imaging, rules out end-to-end supervised training. In this paper, we present an unsupervised learning method that estimates dense image correspondences (DIC) between endoscopy frames using a new CNN model, called EndoRegNet. The proposed network has three distinguishing components: a local DIC estimator, a polynomial image transformer that regularizes the local correspondences, and a visibility mask that refines the image correspondences. EndoRegNet was trained on a mix of simulated and real endoscopy video frames, and its performance was evaluated on real endoscopy frames. We compared the results of EndoRegNet with those of traditional feature-based image registration. Our results show that EndoRegNet provides faster and more accurate image correspondence estimation than feature-based registration. It can also effectively handle the deformations and occlusions that are common in endoscopy video frames, without requiring any labeled data.
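The abstract names the three components without giving their equations. The NumPy sketch below illustrates how such pieces can fit together in an unsupervised setting; it is not the authors' EndoRegNet implementation, and several choices are assumptions: the polynomial is taken to be second order, its coefficients are fitted by least squares to a given dense flow instead of being predicted by a network, the visibility mask only excludes pixels whose matches fall outside the frame, and the training signal is a masked L1 photometric error. The function names `fit_polynomial_flow`, `warp_bilinear` and `masked_photometric_loss` are hypothetical.

```python
import numpy as np


def poly_basis(xs, ys):
    # ASSUMPTION: second-order polynomial in pixel coordinates; the abstract
    # does not state the degree used by EndoRegNet.
    return np.stack([np.ones_like(xs), xs, ys, xs**2, xs * ys, ys**2], axis=-1)


def fit_polynomial_flow(flow):
    """Regularize a dense (H, W, 2) flow (dx, dy) by a global polynomial fit.

    Stand-in for the "polynomial image transformer": the coefficients are
    fitted in closed form here, not predicted by a network (assumption).
    """
    h, w, _ = flow.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    basis = poly_basis(xs.ravel(), ys.ravel())                # (H*W, 6)
    grid = np.stack([xs.ravel(), ys.ravel()], axis=-1)        # (H*W, 2)
    targets = grid + flow.reshape(-1, 2)                      # matched positions
    coeffs, *_ = np.linalg.lstsq(basis, targets, rcond=None)  # (6, 2)
    return (basis @ coeffs - grid).reshape(h, w, 2)


def warp_bilinear(image, flow):
    """Sample a grayscale image at (x + dx, y + dy) with bilinear interpolation.

    Returns the warped image and a visibility mask that is 1 where the sampled
    location lies inside the frame (ASSUMPTION: a simple geometric mask, not
    the refined mask described in the paper).
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    sx, sy = xs + flow[..., 0], ys + flow[..., 1]
    visible = (sx >= 0) & (sx <= w - 1) & (sy >= 0) & (sy <= h - 1)
    sx, sy = np.clip(sx, 0, w - 1), np.clip(sy, 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = sx - x0, sy - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom, visible.astype(np.float64)


def masked_photometric_loss(target, source, flow):
    """Mean absolute error between target and warped source over visible pixels.

    Needs no labeled correspondences: only the two frames and a flow estimate.
    """
    warped, mask = warp_bilinear(source, flow)
    return float(np.sum(mask * np.abs(target - warped)) / max(mask.sum(), 1.0))
```

Because the loss compares the target frame with the warped source frame, no ground-truth correspondences are needed; in a network, the same loss would be minimized with respect to the predicted flow. Fitting a low-order polynomial trades local detail for robustness to noisy local estimates, which is the kind of regularization the abstract attributes to the polynomial transformer.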


Keywords: Convolutional neural network · Unsupervised learning · Image correspondences · Registration



We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Mohammad Ali Armin (1, 2)
  • Nick Barnes (1, 4)
  • Salman Khan (1)
  • Miaomiao Liu (1)
  • Florian Grimpen (3)
  • Olivier Salvado (2)
  1. CSIRO (Data61), Canberra, Australia
  2. Biomedical Informatics Group, Brisbane, Australia
  3. Department of Gastroenterology and Hepatology, Royal Brisbane and Women’s Hospital, Brisbane, Australia
  4. College of Engineering and Computer Science (ANU), Canberra, Australia
