AGNet: Attention-Guided Network for Surgical Tool Presence Detection
We propose a novel approach to automatically recognize the presence of surgical tools in surgical videos. The task is challenging because of the large variation and partial appearance of surgical tools, the complicated surgical scenes, and the co-occurrence of several tools in the same frame. Inspired by the human visual attention mechanism, which first orients toward and selects important visual cues and then carefully analyzes these foci of attention, we first leverage a global prediction network to obtain a set of visual attention maps and a global prediction for each tool, and then use a local prediction network to predict the presence of tools based on these attention maps. A gate function produces the final prediction by balancing the global and local predictions. The proposed attention-guided network (AGNet) achieves state-of-the-art performance on the m2cai16-tool dataset and surpasses the winner of the 2016 challenge by a significant margin.
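The gating step described above can be illustrated with a minimal sketch. The abstract does not give the form of the gate function, so this example assumes a per-tool sigmoid gate that forms a convex combination of the global and local probabilities. The function names, the gate logits, the tool names, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(global_probs, local_probs, gate_logits):
    """Blend per-tool global and local presence probabilities.

    Assumed form (not taken from the paper):
        fused = g * p_global + (1 - g) * p_local,  with g = sigmoid(gate_logit)
    """
    if not (len(global_probs) == len(local_probs) == len(gate_logits)):
        raise ValueError("all inputs must have one entry per tool")
    fused = []
    for p_global, p_local, logit in zip(global_probs, local_probs, gate_logits):
        g = sigmoid(logit)  # weight given to the global prediction
        fused.append(g * p_global + (1.0 - g) * p_local)
    return fused


def present_tools(fused_probs, tool_names, threshold=0.5):
    """Return the names of tools whose fused probability reaches the threshold.

    Each tool is thresholded independently, so several tools can be
    reported for the same frame (multi-label prediction).
    """
    return [name for name, p in zip(tool_names, fused_probs) if p >= threshold]


if __name__ == "__main__":
    # Illustrative values only; these are not results from the paper.
    names = ["grasper", "hook", "scissors"]
    global_probs = [0.9, 0.2, 0.6]
    local_probs = [0.7, 0.9, 0.1]
    gate_logits = [0.0, 0.0, 0.0]  # a zero logit weights both branches equally

    fused = gated_fusion(global_probs, local_probs, gate_logits)
    print(fused)
    print(present_tools(fused, names))
```

With a gate logit of zero the two branches are weighted equally. A large positive logit would defer almost entirely to the global prediction, and a large negative one to the local prediction. In a trained network the gate would presumably be learned rather than fixed by hand.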
Keywords: Surgical tool recognition, Attention-guided network, Laparoscopic videos, Cholecystectomy, Deep learning
The work described in this paper was supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Nos. CUHK 14202514 and CUHK 14203115).