
Attention Aware Deep Learning Model for Wireless Capsule Endoscopy Lesion Classification and Localization



Wireless capsule endoscopy (WCE) is a fundamental diagnostic tool for detecting gastrointestinal (GI) lesions. Detecting and localizing lesions in WCE images with computer-aided methods is challenging because of the complex nature of the GI tract and the high similarity between normal tissue and lesion regions. This study presents a lesion-attention-aware convolutional neural network (CNN) that uses a self-attention mechanism to localize lesion regions in WCE images.


The proposed lesion-region estimator uses ResNet-50 as a convolutional stem, followed by a self-attention mechanism that aggregates spatial features in a global context to produce lesion attention maps for WCE images. These attention maps are fused with the original WCE image to emphasize the lesion regions. The lesion-attention-map estimator and the classification network are trained jointly, improving both the localization accuracy of the attention maps and the classification accuracy on WCE images.
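The self-attention step described above can be sketched in plain NumPy. This is not the paper's implementation; it is a minimal illustration, with randomly initialized query/key projections standing in for learned weights, of how pairwise attention over the spatial positions of a CNN feature map can be collapsed into a single attention map and fused with the input image:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lesion_attention_map(features, d_k=16, seed=0):
    """Self-attention over the spatial positions of a CNN feature map.

    features: (H, W, C) array, e.g. the output of a ResNet-50 stem.
    Returns an (H, W) map in [0, 1] scoring how strongly each position
    is attended to across the whole (global) spatial grid.
    """
    H, W, C = features.shape
    x = features.reshape(H * W, C)            # flatten the spatial grid
    rng = np.random.default_rng(seed)         # stand-in for learned projections
    Wq = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wk = rng.standard_normal((C, d_k)) / np.sqrt(C)
    q, k = x @ Wq, x @ Wk
    attn = softmax(q @ k.T / np.sqrt(d_k))    # (HW, HW) pairwise weights
    amap = attn.sum(axis=0).reshape(H, W)     # total attention received per position
    return (amap - amap.min()) / (np.ptp(amap) + 1e-8)

def fuse(image, amap):
    """Fuse the attention map with the original image to emphasize lesion regions."""
    scale = image.shape[0] // amap.shape[0]
    up = np.kron(amap, np.ones((scale, scale)))   # nearest-neighbour upsample
    return image * (0.5 + 0.5 * up[..., None])    # brighten attended regions
```

The fusion here simply modulates pixel intensity by the upsampled attention map; in a trained network the map and the fusion weights would be learned end to end together with the classifier.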


The model is evaluated on two publicly available datasets, a bleeding dataset and the Kvasir-Capsule dataset, achieving overall classification accuracies of 95.1% and 94.7%, respectively. The proposed attention-augmented CNN outperforms existing CNN-based models.


The experimental results show that the proposed lesion-aware classification network achieves superior classification accuracy by aggregating semantic and conceptual attention maps through the self-attention mechanism. Furthermore, this mechanism improves model explainability, since the gradients of the attention maps can be analyzed to localize the evidence behind a prediction.
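The gradient-based explainability idea can be illustrated with a toy example. The snippet below is not from the paper; it assumes a hypothetical linear classification head over the attention map, so the gradient of the class score with respect to each spatial position is simply that position's weight, and a Grad-CAM-style saliency map keeps the positively contributing regions:

```python
import numpy as np

def gradient_saliency(amap, score_weights):
    """Grad-CAM-style saliency over an attention map.

    For a linear score s = sum(w * amap), the gradient ds/d(amap) equals w,
    so gradient-weighting the map and applying ReLU keeps only the regions
    that push the class score up. Result is normalized to [0, 1].
    """
    grad = score_weights                     # analytic gradient of s w.r.t. amap
    sal = np.maximum(grad * amap, 0.0)       # suppress negatively contributing regions
    return sal / (sal.max() + 1e-8)
```

In a real network the gradient would come from backpropagation through the classifier rather than from a closed form, but the ranking of spatial regions works the same way.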




Author information



Corresponding author

Correspondence to Senthil Murugan Balakrishnan.


Cite this article

Muruganantham, P., Balakrishnan, S.M. Attention Aware Deep Learning Model for Wireless Capsule Endoscopy Lesion Classification and Localization. J. Med. Biol. Eng. 42, 157–168 (2022).



Keywords

  • Wireless capsule endoscopy (WCE)
  • Self-attention mechanism
  • Conceptual feature maps