On the post-hoc explainability of deep echo state networks for time series forecasting, image and video classification

  • S.I.: Effective and Efficient Deep Learning
Neural Computing and Applications

Abstract

Since their inception, learning techniques under the reservoir computing paradigm have shown a great modeling capability for recurrent systems without the computing overheads required by other approaches, especially deep neural networks. Among them, different flavors of echo state networks have attracted much attention over time, mainly due to the simplicity and computational efficiency of their learning algorithm. However, these advantages do not compensate for the fact that echo state networks remain black-box models whose decisions cannot be easily explained to a general audience. This issue is even more involved for multi-layered (also referred to as deep) echo state networks, whose more complex hierarchical structure further hinders the explainability of their internals to users without expertise in machine learning or even computer science. This lack of explainability can jeopardize the widespread adoption of these models in domains where accountability and understandability of machine learning models are a must (e.g., medical diagnosis, social politics). This work addresses this issue by conducting an explainability study of echo state networks when applied to learning tasks with time series, image and video data. Among these tasks, we place particular emphasis on the last one (video classification) which, to the best of our knowledge, has never been tackled before with echo state networks in the related literature. Specifically, the study proposes three different techniques capable of eliciting understandable information about the knowledge grasped by these recurrent models, namely potential memory, temporal patterns and pixel absence effect. Potential memory addresses questions related to the effect of the reservoir size on the capability of the model to store temporal information, whereas temporal patterns unveil the recurrent relationships captured by the model over time. Finally, pixel absence effect attempts to evaluate the effect of the absence of a given pixel when the echo state network model is used for image and video classification. The benefits of the proposed suite of techniques are showcased over three different domains of applicability: time series modeling, image and, for the first time in the related literature, video classification. The obtained results reveal that the proposed techniques not only allow for an informed understanding of the way these models work, but also serve as diagnostic tools capable of detecting issues inherited from data (e.g., presence of hidden bias).
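
The pixel absence idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-layer (non-deep) reservoir, the column-by-column scanning of the image, the ridge-regression readout, the toy bar images and every function name are assumptions made purely for illustration. Absence is modelled here by setting one pixel to zero and measuring how much the score of the true class changes.

```python
import numpy as np

def make_reservoir(n_inputs, n_units, spectral_radius=0.9, seed=0):
    """Random, untrained input and recurrent weights (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale the recurrent matrix to the requested spectral radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def final_state(image, w_in, w):
    """Feed the image one column per time step; return the last reservoir state."""
    x = np.zeros(w.shape[0])
    for column in image.T:
        x = np.tanh(w_in @ column + w @ x)
    return x

def fit_readout(states, targets, ridge=1e-2):
    """Ridge-regression readout mapping reservoir states to class scores."""
    n_units = states.shape[1]
    gram = states.T @ states + ridge * np.eye(n_units)
    return np.linalg.solve(gram, states.T @ targets).T

def pixel_absence_effect(image, label, w_in, w, w_out):
    """Drop in the true-class score when each pixel is set to zero, one at a time."""
    base = w_out @ final_state(image, w_in, w)
    effect = np.zeros(image.shape)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            occluded = image.copy()
            occluded[r, c] = 0.0
            score = w_out @ final_state(occluded, w_in, w)
            effect[r, c] = base[label] - score[label]
    return effect

def bar_image(label, size=8):
    """Toy image: class 0 has a bright bar on the left, class 1 on the right."""
    image = np.zeros((size, size))
    image[:, 1 if label == 0 else size - 2] = 1.0
    return image

# Toy training set: noisy bar images of both classes.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=200)
images = [bar_image(k) + 0.1 * rng.standard_normal((8, 8)) for k in labels]

w_in, w = make_reservoir(n_inputs=8, n_units=50)
states = np.stack([final_state(img, w_in, w) for img in images])
w_out = fit_readout(states, np.eye(2)[labels])

# Effect map for a clean class-0 image; only pixels on the bar can change the score.
test_image = bar_image(0)
effect = pixel_absence_effect(test_image, 0, w_in, w, w_out)
print(np.round(effect, 3))
```

In this toy setting, pixels that are already zero contribute nothing by construction, so the map highlights only the bar. On real images, a map of this kind could point to regions the model relies on, including regions that reflect bias in the data rather than the object of interest.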

Notes

  1. For clarity, any consideration of the bias term is omitted from this statement.

Acknowledgements

This work has received funding support from the Basque Government (Eusko Jaurlaritza) through the Consolidated Research Group MATHMODE (IT1294-19) and ELKARTEK program (3KIA project, KK-2020/00049).

Author information

Contributions

A. Barredo Arrieta, J. Del Ser, and M. N. Bilbao contributed to conceptualization; A. Barredo Arrieta, S. Gil-Lopez, I. Laña, and J. Del Ser provided methodology; A. Barredo Arrieta and J. Del Ser performed formal analysis and investigation; A. Barredo Arrieta performed writing (original draft preparation); S. Gil-Lopez, I. Laña, M. N. Bilbao, and J. Del Ser performed writing (review and editing); and J. Del Ser contributed to funding acquisition and supervision.

Corresponding author

Correspondence to Javier Del Ser.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Barredo Arrieta, A., Gil-Lopez, S., Laña, I. et al. On the post-hoc explainability of deep echo state networks for time series forecasting, image and video classification. Neural Comput & Applic 34, 10257–10277 (2022). https://doi.org/10.1007/s00521-021-06359-y

  • DOI: https://doi.org/10.1007/s00521-021-06359-y
