
Playing to distraction: towards a robust training of CNN classifiers through visual explanation techniques

A Correction to this article was published on 16 September 2021

This article has been updated


The field of deep learning is evolving in different directions, yet more efficient training strategies are still needed. In this work, we present a novel and robust training scheme that integrates visual explanation techniques into the learning process. Unlike attention mechanisms, which focus on the relevant parts of images, we aim to improve the robustness of the model by making it pay attention to other regions as well. Broadly speaking, the idea is to distract the classifier during learning by forcing it to focus not only on relevant regions but also on those that, a priori, are less informative for discriminating the class. We tested the proposed approach by embedding it into the learning process of a convolutional neural network for the analysis and classification of two well-known datasets, namely Stanford Cars and FGVC-Aircraft. Furthermore, we evaluated our model in a real-case scenario, the classification of egocentric images, which allows us to obtain relevant information about people's lifestyles. In particular, we work on the challenging EgoFoodPlaces dataset, achieving state-of-the-art results with a lower level of complexity. The results indicate the suitability of the proposed training scheme for image classification and show that it improves the robustness of the final model.
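The idea of distracting the classifier can be illustrated with a small sketch. This is not the authors' formulation, which the abstract does not specify; it only assumes that a non-negative heatmap (for example, one produced by a visual explanation technique) is available for each image, and that a penalty on concentrated attention is added to the usual classification loss. The names `distraction_penalty`, `combined_loss`, and `weight` are hypothetical and chosen for illustration.

```python
import math


def cross_entropy(probs, label):
    """Standard cross-entropy for one sample: -log p(label)."""
    return -math.log(max(probs[label], 1e-12))


def distraction_penalty(heatmap):
    """Hypothetical penalty (an assumption, not taken from the paper).

    Mean squared distance between the sum-normalized heatmap and a uniform
    map. It is 0 when attention is spread evenly over all cells and grows
    as attention concentrates on a few cells. Values must be non-negative.
    """
    cells = [v for row in heatmap for v in row]
    total = sum(cells)
    if total <= 0:
        return 0.0  # an empty heatmap carries no attention to penalize
    uniform = 1.0 / len(cells)
    return sum((v / total - uniform) ** 2 for v in cells) / len(cells)


def combined_loss(probs, label, heatmap, weight=0.5):
    """Classification loss plus a weighted attention-spread term."""
    return cross_entropy(probs, label) + weight * distraction_penalty(heatmap)


# Same prediction, two different attention patterns.
probs = [0.1, 0.9]
focused = [[0.0, 0.0], [0.0, 1.0]]      # all attention on one cell
spread = [[0.25, 0.25], [0.25, 0.25]]   # attention spread evenly

print(combined_loss(probs, 1, focused) > combined_loss(probs, 1, spread))
```

Under these assumptions, two predictions with identical class probabilities receive different losses depending on how concentrated the heatmap is, so minimizing the combined loss pushes the model to use more of the image. The uniform target and the value of `weight` are arbitrary choices made for the sketch.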





Acknowledgements

We would like to thank the Center for Information Technology of the University of Groningen for their support and for providing access to the Peregrine high-performance computing cluster.

Author information

Corresponding author

Correspondence to David Morales.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Partial financial support was received from HAT.tec GmbH. This work has been financially supported in part by European Union ERDF funds, by the Spanish Ministry of Science and Innovation (research project PID2019-109238GB-C21), and by the Principado de Asturias Regional Government (research project IDI-2018-000176). The funders had no role in the study design, data collection, analysis, and preparation of the manuscript.


About this article


Cite this article

Morales, D., Talavera, E. & Remeseiro, B. Playing to distraction: towards a robust training of CNN classifiers through visual explanation techniques. Neural Comput & Applic 33, 16937–16949 (2021).




Keywords

  • Visual explanation techniques
  • Learning process
  • Convolutional neural networks
  • Image classification
  • Fine-grained recognition
  • Egocentric vision