Abstract
Video-assisted transoral tracheal intubation (TI) relies on an endoscope to help the physician insert a tracheal tube into the glottis rather than the esophagus. The growing trend toward robot-assisted TI requires a medical robot to distinguish anatomical features as an experienced physician does, a skill that can be imitated with supervised deep-learning techniques. However, real datasets of oropharyngeal organs are often inaccessible because open-source data are limited and patient privacy must be protected. In this work, we propose a domain-adaptive Sim-to-Real framework called IoU-Ranking Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The framework combines an image blending strategy called IoU-Ranking Blend (IRB) with the style-transfer method ArtFlow. IRB alleviates the poor segmentation performance caused by significant domain differences between datasets, while ArtFlow further reduces the remaining discrepancies. To cope with the limited availability of real endoscopic images, a virtual oropharynx image dataset generated with the SOFA framework serves as the training data for semantic segmentation. We combined IRB-AF with state-of-the-art domain-adaptive segmentation models. The results show that our approach further improves segmentation accuracy and training stability.
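The abstract names IRB but does not describe its procedure, so the following is only a minimal sketch of the two generic ingredients the name points to: an intersection-over-union (IoU) score between two segmentation label maps, and a linear blend of a simulated image with a real one. The function names `class_iou` and `blend`, the `alpha` weight, and the use of NumPy are assumptions made for illustration; this is not the authors' implementation, and how IRB ranks or selects images is not shown here.

```python
import numpy as np


def class_iou(pred, target, num_classes):
    """Per-class IoU between two integer label maps of the same shape.

    Hypothetical helper for illustration; not taken from the paper.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            # Class absent from both maps: IoU is undefined.
            ious.append(float("nan"))
        else:
            ious.append(np.logical_and(p, t).sum() / union)
    return np.array(ious)


def blend(sim_img, real_img, alpha):
    """Linear blend of two same-shape uint8 images.

    alpha is the (assumed) weight of the simulated image;
    1 - alpha is the weight of the real image.
    """
    mixed = alpha * sim_img.astype(np.float32) + (1.0 - alpha) * real_img.astype(np.float32)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)
```

Under these assumptions, `np.nanmean(class_iou(pred, target, num_classes))` gives a single mean-IoU score that ignores classes absent from both maps, and `blend(sim_img, real_img, 0.5)` weights the two images equally.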
Notes
Endoscopic Images generated from SOFA-based oropharynx model with style transfer from phantom (EISOST) - https://github.com/gkw0010/EISOST-Sim2Real-Dataset-Release.
Acknowledgements
This work was supported in part by the Hong Kong Research Grants Council (RGC) Collaborative Research Fund (CRF-C4026-21GF).
Author information
Contributions
G.W., J.L., and H.R. conceived the concepts. G.W., T.R., J.L., and L.B. advised on the design and implementation of the experiments. G.W., T.R., and J.L. conducted experiments and analyzed the data. G.W., T.R., J.L., and L.B. wrote the manuscript. All authors read, edited, and discussed the manuscript and agree with the claims made in this work. H.R. coordinated and supervised the research.
Ethics declarations
Ethics approval
Ethical approval was not sought for the present study because this article does not contain any studies with human or animal subjects.
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Wang, G., Ren, TA., Lai, J. et al. Domain adaptive Sim-to-Real segmentation of oropharyngeal organs. Med Biol Eng Comput 61, 2745–2755 (2023). https://doi.org/10.1007/s11517-023-02877-0
DOI: https://doi.org/10.1007/s11517-023-02877-0