Abstract
In the field of ophthalmic surgery, many clinicians nowadays record their microscopic procedures with a video camera and use the recorded footage for later purposes, such as forensics, teaching, or training. However, to use the video material efficiently after surgery, the video content needs to be analyzed automatically. Operation instruments are important semantic content to be analyzed and indexed in these short videos, since they indicate the corresponding operation phase and surgical action. Related work has already shown that instruments in cataract surgery videos can be detected accurately. However, the underlying dataset (from the CATARACTS challenge) has very good visual quality, which does not reflect the typical quality of videos acquired in general hospitals. In this paper, we therefore analyze how well deep learning models for instrument recognition generalize when the dataset changes. More precisely, we train models such as ResNet-50, Inception v3, and NASNet Mobile on a dataset of high visual quality (CATARACTS) and test them on a dataset of low visual quality (Cataract-101), and vice versa. Our results show that generalizability is surprisingly low overall, and slightly worse for the model trained on the high-quality dataset.
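The cross-dataset evaluation protocol described in the abstract can be illustrated with a minimal sketch. The sketch below assumes a TensorFlow/Keras setup with frames organized into class-labelled folders; the directory names, class count, and training hyperparameters are illustrative assumptions, not details taken from the paper, and the same procedure would be repeated with the datasets swapped.

```python
# Hypothetical sketch of a cross-dataset generalization test: train an
# instrument classifier on one dataset, evaluate it on the other.
# Directory names, class count, and hyperparameters are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

IMG_SIZE = (224, 224)
NUM_CLASSES = 10  # number of instrument classes; placeholder value

def load_frames(directory):
    """Load instrument-labelled frames from a dataset directory."""
    return tf.keras.utils.image_dataset_from_directory(
        directory, image_size=IMG_SIZE, batch_size=32, label_mode="categorical")

def build_classifier():
    """ImageNet-pretrained ResNet-50 backbone with a new classification head."""
    base = ResNet50(include_top=False, weights="imagenet",
                    input_shape=IMG_SIZE + (3,), pooling="avg")
    head = layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    model = models.Model(base.input, head)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Train on the high-quality dataset, evaluate on the low-quality one
# (the reverse direction is run the same way with the datasets swapped).
train_ds = load_frames("cataracts_frames/")   # CATARACTS (high visual quality)
test_ds = load_frames("cataract101_frames/")  # Cataract-101 (low visual quality)

model = build_classifier()
model.fit(train_ds, epochs=5)
in_domain = model.evaluate(train_ds)          # same-dataset baseline
cross_domain = model.evaluate(test_ds)        # cross-dataset generalization
print("in-domain:", in_domain, "cross-domain:", cross_domain)
```

Comparing the in-domain and cross-domain scores (here plain accuracy, chosen only for brevity) quantifies the generalization gap the paper investigates; the same comparison applies to other backbones such as Inception v3 or NASNet Mobile.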
References
Primus, M.J., et al.: Frame-based classification of operation phases in cataract surgery videos. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10704, pp. 241–253. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73603-7_20
Quellec, G., Lamard, M., Cochener, B., Cazuguel, G.: Real-time task recognition in cataract surgery videos using adaptive spatiotemporal polynomials. IEEE Trans. Med. Imaging 34(4), 877–887 (2014)
Al Hajj, H., Lamard, M., Conze, P.-H., Cochener, B., Quellec, G.: Monitoring tool usage in surgery videos using boosted convolutional and recurrent neural networks. Med. Image Anal. 47, 203–218 (2018)
Al Hajj, H., et al.: CATARACTS: challenge on automatic tool annotation for cataract surgery. Med. Image Anal. 52, 24–41 (2019)
Schoeffmann, K., Taschwer, M., Sarny, S., Münzer, B., Primus, M.J., Putzgruber, D.: Cataract-101: video dataset of 101 cataract surgeries. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 421–425. ACM (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. CoRR, vol. abs/1512.00567 (2015)
Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. CoRR, vol. abs/1707.07012 (2017)
Charrière, K., Quellec, G., Lamard, M., Coatrieux, G., Cochener, B., Cazuguel, G.: Automated surgical step recognition in normalized cataract surgery videos. In: 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4647–4650, August 2014
Quellec, G., Lamard, M., Cochener, B., Cazuguel, G.: Real-time segmentation and recognition of surgical tasks in cataract surgery videos. IEEE Trans. Med. Imaging 33(12), 2352–2360 (2014)
Charrière, K., et al.: Real-time multilevel sequencing of cataract surgery videos. In: 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6, June 2016
Charrière, K., et al.: Real-time analysis of cataract surgery videos using statistical models. Multimed. Tools Appl. 76(21), 22473–22491 (2017)
Al Hajj, H., Lamard, M., Cochener, B., Quellec, G.: Smart data augmentation for surgical tool detection on the surgical tray. In: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 4407–4410, July 2017
Al Hajj, H., Lamard, M., Charrière, K., Cochener, B., Quellec, G.: Surgical tool detection in cataract surgery videos through multi-image fusion inside a convolutional neural network. In: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2002–2005, July 2017
Zisimopoulos, O., et al.: DeepPhase: surgical phase recognition in CATARACTS videos. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 265–272. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_31
Halevy, A., Norvig, P., Pereira, F.: The unreasonable effectiveness of data. IEEE Intell. Syst. 24(2), 8–12 (2009)
Vardazaryan, A., Mutter, D., Marescaux, J., Padoy, N.: Weakly-supervised learning for tool localization in laparoscopic videos. In: Stoyanov, D., et al. (eds.) LABELS/CVII/STENT-2018. LNCS, vol. 11043, pp. 169–179. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01364-6_19
Acknowledgments
This work was funded by the FWF Austrian Science Fund under grant P 31486-N31.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sokolova, N., Schoeffmann, K., Taschwer, M., Putzgruber-Adamitsch, D., El-Shabrawi, Y. (2020). Evaluating the Generalization Performance of Instrument Classification in Cataract Surgery Videos. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science(), vol 11962. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_51
DOI: https://doi.org/10.1007/978-3-030-37734-2_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37733-5
Online ISBN: 978-3-030-37734-2
eBook Packages: Computer Science, Computer Science (R0)