AGNet: Attention-Guided Network for Surgical Tool Presence Detection

  • Xiaowei Hu
  • Lequan Yu
  • Hao Chen
  • Jing Qin
  • Pheng-Ann Heng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10553)


We propose a novel approach to automatically recognize the presence of surgical tools in surgical videos, which is quite challenging due to the large variation and partial appearance of surgical tools, the complexity of surgical scenes, and the co-occurrence of multiple tools in the same frame. Inspired by the human visual attention mechanism, which first orients toward and selects important visual cues and then carefully analyzes these foci of attention, we propose to first leverage a global prediction network to obtain a set of visual attention maps and a global prediction for each tool, and then harness a local prediction network to predict the presence of tools based on these attention maps. We apply a gate function to obtain the final prediction results by balancing the global and local predictions. The proposed attention-guided network (AGNet) achieves state-of-the-art performance on the m2cai16-tool dataset and surpasses the 2016 challenge winner by a significant margin.
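The abstract describes fusing a global prediction with an attention-guided local prediction through a gate function. The exact form of the gate is not specified here, so the following is a minimal sketch under the assumption that the gate is a per-tool convex combination of the two branches' sigmoid probabilities; the function name `agnet_fuse` and the gate parameterization are illustrative, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    """Elementwise logistic function mapping logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))

def agnet_fuse(global_logits, local_logits, gate_weight):
    """Fuse global and local per-tool logits with a gate (sketch).

    gate_weight in [0, 1] balances the two branches per tool:
    1.0 trusts the global branch entirely, 0.0 the local branch.
    Returns per-tool presence probabilities.
    """
    g = sigmoid(np.asarray(global_logits, dtype=float))  # global branch
    l = sigmoid(np.asarray(local_logits, dtype=float))   # attention-guided local branch
    return gate_weight * g + (1.0 - gate_weight) * l

# Example with 7 tool classes (the m2cai16-tool dataset annotates 7 tools):
global_logits = np.array([2.0, -1.0, 0.5, -3.0, 1.5, 0.0, -0.5])
local_logits  = np.array([1.0, -2.0, 1.5, -2.5, 2.0, 0.5, -1.0])
probs = agnet_fuse(global_logits, local_logits, gate_weight=0.5)
present = probs > 0.5  # per-tool presence decision for this frame
```

In practice the gate weight would be learned (e.g. produced by a small subnetwork) rather than fixed; the convex combination above simply illustrates how the two branches can be balanced into one multi-label prediction.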


Keywords: Surgical tool recognition · Attention-guided network · Laparoscopic videos · Cholecystectomy · Deep learning



Acknowledgments

The work described in this paper was supported by the following grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Nos. CUHK 14202514 and CUHK 14203115).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Xiaowei Hu (1)
  • Lequan Yu (1)
  • Hao Chen (1)
  • Jing Qin (2)
  • Pheng-Ann Heng (1)
  1. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, The People's Republic of China
  2. Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong, The People's Republic of China
