Abstract
The field of deep learning is evolving in different directions, yet there is still a need for more efficient training strategies. In this work, we present a novel and robust training scheme that integrates visual explanation techniques into the learning process. Unlike attention mechanisms, which focus on the relevant parts of images, we aim to improve the robustness of the model by making it pay attention to other regions as well. Broadly speaking, the idea is to distract the classifier during the learning process by forcing it to focus not only on relevant regions but also on those that, a priori, are not very informative for discriminating the class. We tested the proposed approach by embedding it into the learning process of a convolutional neural network for the analysis and classification of two well-known datasets, namely Stanford Cars and FGVC-Aircraft. Furthermore, we evaluated our model in a real-case scenario, the classification of egocentric images, which allows us to obtain relevant information about people's lifestyles. In particular, we worked on the challenging EgoFoodPlaces dataset, achieving state-of-the-art results with a lower level of complexity. The results obtained indicate that the proposed training scheme is suitable for image classification and improves the robustness of the final model.
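The abstract does not specify which visual explanation technique drives the distraction, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a class activation map (CAM, in the style of Zhou et al., 2016, where the map is the weighted sum of the last convolutional feature maps) serves as the saliency signal. It also assumes that "distracting" the classifier means hiding the most salient pixels of a training image so that the network must rely on the remaining regions. The function names, the `fraction` parameter and the masking rule are hypothetical, and the saliency map is assumed to already match the image resolution (in practice a CAM would be upsampled first).

```python
from typing import List

Matrix = List[List[float]]


def class_activation_map(feature_maps: List[Matrix], weights: List[float]) -> Matrix:
    """CAM-style map: M(x, y) = sum_k w_k * f_k(x, y) for one class."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * fmap[i][j]
    return cam


def distraction_mask(saliency: Matrix, fraction: float) -> List[List[int]]:
    """Hypothetical rule: hide (0) the `fraction` most salient positions, keep (1) the rest.

    Positions tied with the threshold value are also hidden.
    """
    flat = sorted((v for row in saliency for v in row), reverse=True)
    k = int(round(fraction * len(flat)))
    if k == 0:
        # Nothing to hide: keep every position.
        return [[1] * len(row) for row in saliency]
    threshold = flat[k - 1]
    return [[0 if v >= threshold else 1 for v in row] for row in saliency]


def apply_mask(image: Matrix, mask: List[List[int]], fill: float = 0.0) -> Matrix:
    """Replace hidden pixels with `fill` so the classifier cannot use them."""
    return [
        [px if m else fill for px, m in zip(img_row, m_row)]
        for img_row, m_row in zip(image, mask)
    ]


if __name__ == "__main__":
    # Two 2x2 feature maps with hypothetical class weights.
    cam = class_activation_map([[[1, 0], [0, 0]], [[0, 0], [0, 2]]], [0.5, 1.0])
    mask = distraction_mask(cam, 0.25)
    print(cam)
    print(apply_mask([[10, 20], [30, 40]], mask))
```

With the toy inputs above, the CAM is `[[0.5, 0.0], [0.0, 2.0]]`, so hiding the top 25% of positions blanks only the bottom-right pixel. A training loop built on this idea might apply such masks to some of the images in each mini-batch, although how often and how much to hide are open design choices the sketch does not settle.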


Change history
16 September 2021
A Correction to this paper has been published: https://doi.org/10.1007/s00521-021-06407-7
Acknowledgements
We would like to thank the Center for Information Technology of the University of Groningen for their support and for providing access to the Peregrine high performance computing cluster.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Partial financial support was received from HAT.tec GmbH. This work has been financially supported in part by European Union ERDF funds, by the Spanish Ministry of Science and Innovation (research project PID2019-109238GB-C21), and by the Principado de Asturias Regional Government (research project IDI-2018-000176). The funders had no role in the study design, data collection, analysis, and preparation of the manuscript.
About this article
Cite this article
Morales, D., Talavera, E. & Remeseiro, B. Playing to distraction: towards a robust training of CNN classifiers through visual explanation techniques. Neural Comput & Applic 33, 16937–16949 (2021). https://doi.org/10.1007/s00521-021-06282-2