
Architectural richness in deep reservoir computing

  • S.I.: IWANN 2019 SI on Advances in Computational Intelligence

Abstract

Reservoir computing (RC) is a popular class of recurrent neural networks (RNNs) with untrained dynamics. Recently, advances in deep RC architectures have had a great impact on time-series applications, offering a convenient trade-off between predictive performance and training complexity. In this paper, we analyze untrained RNNs in greater depth by studying the quality of the recurrent dynamics developed by the layers of deep RC neural networks. We do so by assessing the richness of the neural representations at the different levels of the architecture, using measures originating from the fields of dynamical systems, numerical analysis and information theory. Our experiments, on both synthetic and real-world datasets, show that depth, as an architectural factor of RNN design, has a natural effect on the quality of RNN dynamics (even without learning of the internal connections). The interplay between depth and the values of the RC scaling hyper-parameters, especially the scaling of inter-layer connections, is crucial for designing rich untrained recurrent neural systems.
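For concreteness, the following minimal sketch illustrates the kind of untrained deep reservoir architecture analyzed in the paper, with distinct scalings for the recurrent, input and inter-layer connections; the class name, layer sizes and NumPy realization are illustrative assumptions rather than the authors' reference implementation (see [13, 17] for the formal model).

```python
import numpy as np

class DeepReservoir:
    """Minimal sketch of an untrained deep reservoir (DeepESN-style).

    Layer 1 is fed by the external input; each layer l > 1 is fed by the
    states of layer l - 1.
    rho      -- spectral radius of each recurrent weight matrix
    omega_in -- scaling of the input-to-first-layer weights
    omega_il -- scaling of the inter-layer (layer-to-layer) weights
    """

    def __init__(self, n_in, n_units=100, n_layers=5,
                 rho=0.9, omega_in=1.0, omega_il=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W_hat, self.W_in = [], []
        for l in range(n_layers):
            # Recurrent weights, rescaled to the desired spectral radius.
            W = rng.uniform(-1, 1, (n_units, n_units))
            W *= rho / max(abs(np.linalg.eigvals(W)))
            self.W_hat.append(W)
            # Input weights: external input for the first layer,
            # previous-layer states otherwise.
            in_dim = n_in if l == 0 else n_units
            scale = omega_in if l == 0 else omega_il
            self.W_in.append(scale * rng.uniform(-1, 1, (n_units, in_dim)))

    def run(self, U):
        """U: (T, n_in) input sequence -> list of (T, n_units) state sequences."""
        states = []
        layer_input = U
        for W, W_in in zip(self.W_hat, self.W_in):
            x = np.zeros(W.shape[0])
            X = np.empty((len(layer_input), W.shape[0]))
            for t, u in enumerate(layer_input):
                x = np.tanh(W_in @ u + W @ x)  # untrained state update
                X[t] = x
            states.append(X)
            layer_input = X  # states of layer l feed layer l + 1
        return states
```

For instance, `DeepReservoir(n_in=3, n_layers=5).run(U)` returns one state sequence per layer, i.e., the kind of layer-wise representation whose richness is assessed in the paper.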

Notes

  1. The maximum among the moduli of the eigenvalues (a minimal computation is sketched after this list).

  2. The interested reader is referred to [13, 17] for a more complete overview of the general deep RC approach, including the readout computation and processing.

  3. The size of \({\mathcal {K}}\) is set to 0.3 times the standard deviation of the instantaneous reservoir activations, as in [34].

  4. From LÍngua BRAsileira de Sinais, i.e., Brazilian Sign Language.

  5. The original dataset contained a variable number of time steps with zero input features at the beginning and at the end of each time series, which we removed in a preliminary step (see the sketch after this list).

  6. Despite its simplicity, this choice for the architectural design of deep RC was found useful in several application contexts (see, e.g., [16, 19]). Notice, however, that the deep RC approach is not restrictive in this sense: it offers the flexibility to use different hyper-parameterizations at different levels of the architecture.

  7. Note the values of the color scale in the \(\text{ESP}_{\text{index}}\) plot in Fig. 8, especially in comparison with those in the same plot of Fig. 7.

  8. Given a vector \(\mathbf{v}\), \(\text{MAD}(\mathbf{v}) = \text{MEDIAN}(|\mathbf{v}-\text{MEDIAN}(\mathbf{v})|)\) (a minimal computation is sketched after this list).
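As a concrete reference for Footnotes 1, 5 and 8 above, the following sketch shows one plausible NumPy realization of the quantities they mention; the function names are illustrative assumptions and this is not the code used for the experiments in the paper.

```python
import numpy as np

def spectral_radius(W):
    """Footnote 1: the maximum among the moduli of the eigenvalues of W."""
    return max(abs(np.linalg.eigvals(W)))

def trim_zero_padding(series, atol=0.0):
    """Footnote 5: drop the leading/trailing time steps whose input
    features are all zero (series has shape (T, n_features))."""
    nonzero = np.flatnonzero(np.any(np.abs(series) > atol, axis=1))
    if nonzero.size == 0:
        return series[:0]
    return series[nonzero[0]:nonzero[-1] + 1]

def mad(v):
    """Footnote 8: median absolute deviation, MEDIAN(|v - MEDIAN(v)|)."""
    v = np.asarray(v)
    return np.median(np.abs(v - np.median(v)))
```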

References

  1. Atiya AF, Parlos AG (2000) New results on recurrent network training: unifying the algorithms and accelerating convergence. IEEE Trans Neural Netw 11(3):697–709

  2. Bacciu D, Barsocchi P, Chessa S, Gallicchio C, Micheli A (2014) An experimental characterization of reservoir computing in ambient assisted living applications. Neural Comput Appl 24(6):1451–1464

  3. Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166

  4. Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/

  5. Colla V, Matino I, Dettori S, Cateni S, Matino R (2019) Reservoir computing approaches applied to energy management in industry. In: International conference on engineering applications of neural networks. Springer, pp 66–79

  6. Cover TM (1965) Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans Electron Comput 3:326–334

  7. Dettori S, Matino I, Colla V, Speets R (2020) Deep echo state networks in industrial applications. In: IFIP international conference on artificial intelligence applications and innovations. Springer, pp 53–63

  8. Dias DB, Madeo RC, Rocha T, Biscaro HH, Peres SM (2009) Hand movement recognition for Brazilian sign language: a study using distance-based neural networks. In: 2009 international joint conference on neural networks. IEEE, pp 697–704

  9. Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml

  10. Gallicchio C (2019) Chasing the echo state property. In: 27th European symposium on artificial neural networks, computational intelligence and machine learning, ESANN 2019, pp 667–672. ESANN (i6doc.com)

  11. Gallicchio C, Micheli A (2010) A Markovian characterization of redundancy in echo state networks by PCA. In: Proceedings of the 18th European symposium on artificial neural networks (ESANN). d-side publications

  12. Gallicchio C, Micheli A (2011) Architectural and Markovian factors of echo state networks. Neural Netw 24(5):440–456

  13. Gallicchio C, Micheli A (2017) Deep echo state network (DeepESN): a brief survey. arXiv preprint arXiv:1712.04323

  14. Gallicchio C, Micheli A (2017) Echo state property of deep reservoir computing networks. Cogn Comput 9(3):337–350

  15. Gallicchio C, Micheli A (2019) Reservoir topology in deep echo state networks. In: International conference on artificial neural networks. Springer, pp 62–75

  16. Gallicchio C, Micheli A (2020) Fast and deep graph neural networks. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 3898–3905

  17. Gallicchio C, Micheli A (2021) Deep reservoir computing. In: Nakajima K, Fischer I (eds) Reservoir computing. Springer, pp 77–95

  18. Gallicchio C, Micheli A, Pedrelli L (2017) Deep reservoir computing: a critical experimental analysis. Neurocomputing 268:87–99. https://doi.org/10.1016/j.neucom.2016.12.089

  19. Gallicchio C, Micheli A, Pedrelli L (2018) Design of deep echo state networks. Neural Netw 108:33–47

  20. Gallicchio C, Scardapane S (2020) Deep randomized neural networks. In: Recent trends in learning from data. Springer, pp 43–68

  21. Graves A, Mohamed A-r, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, pp 6645–6649

  22. Haber E, Ruthotto L (2017) Stable architectures for deep neural networks. Inverse Probl 34(1):014004

  23. Hermans M, Schrauwen B (2013) Training and analysing deep recurrent neural networks. Adv Neural Inf Process Syst 26:190–198

  24. Hu H, Wang L, Lv SX (2020) Forecasting energy consumption and wind power generation using deep echo state network. Renew Energy 154:598–613

  25. Jaeger H (2001) The “echo state” approach to analysing and training recurrent neural networks – with an erratum note. GMD Technical Report, German National Research Center for Information Technology, Bonn, Germany

  26. Jaeger H (2002) Short term memory in echo state networks. Tech. rep., GMD – German National Research Institute for Computer Science

  27. Jaeger H (2005) Reservoir riddles: suggestions for echo state network research. In: Proceedings of the 2005 IEEE international joint conference on neural networks (IJCNN), vol 3, pp 1460–1462. IEEE

  28. Jaeger H, Haas H (2004) Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304(5667):78–80

  29. Jaeger H, Lukoševičius M, Popovici D, Siewert U (2007) Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw 20(3):335–352

  30. Kawai Y, Park J, Asada M (2019) A small-world topology enhances the echo state property and signal propagation in reservoir computing. Neural Netw 112:15–23

  31. Kim T, King BR (2020) Time series prediction using deep echo state networks. Neural Comput Appl 32(23):17769–17787

  32. Lukoševičius M, Jaeger H (2009) Reservoir computing approaches to recurrent neural network training. Comput Sci Rev 3(3):127–149

  33. Olszewski RT (2001) Generalized feature extraction for structural pattern recognition in time-series data. Tech. rep., Carnegie Mellon University, School of Computer Science, Pittsburgh, PA

  34. Ozturk M, Xu D, Principe J (2007) Analysis and design of echo state networks. Neural Comput 19(1):111–138

  35. Pascanu R, Gulcehre C, Cho K, Bengio Y (2013) How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026

  36. Principe J, Xu D, Fisher J, Haykin S (2000) Information theoretic learning. In: Unsupervised adaptive filtering, vol 1

  37. Principe JC (2010) Information theoretic learning: Renyi’s entropy and kernel perspectives. Springer Science & Business Media

  38. Rodan A, Tiňo P (2010) Minimum complexity echo state network. IEEE Trans Neural Netw 22(1):131–144

  39. Scardapane S, Wang D (2017) Randomness in neural networks: an overview. Wiley Interdiscip Rev Data Min Knowl Discov 7(2):e1200

  40. Tiňo P, Hammer B, Bodén M (2007) Markovian bias of neural-based architectures with feedback connections. In: Perspectives of neural-symbolic integration. Springer, pp 95–133

  41. Verstraeten D, Schrauwen B, d’Haene M, Stroobandt D (2007) An experimental unification of reservoir computing methods. Neural Netw 20(3):391–403

  42. Weigend AS (2018) Time series prediction: forecasting the future and understanding the past. Routledge

  43. Werbos PJ (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78(10):1550–1560

  44. Williams BH, Toussaint M, Storkey AJ (2006) Extracting motion primitives from natural handwriting data. In: International conference on artificial neural networks. Springer, pp 634–643

  45. Xue Y, Yang L, Haykin S (2007) Decoupled echo state networks with lateral inhibition. Neural Netw 20(3):365–376

  46. Yildiz I, Jaeger H, Kiebel S (2012) Re-visiting the echo state property. Neural Netw 35:1–9

Acknowledgements

This work has been partially supported by the European Union’s Horizon 2020 Research and Innovation program, under project TEACHING (Grant agreement ID: 871385), URL https://www.teaching-h2020.eu, and by the project BrAID under the Bando Ricerca Salute 2018—Regional public call for research and development projects aimed at supporting clinical and organisational innovation processes of the Regional Health Service—Regione Toscana.

Author information

Corresponding author

Correspondence to Claudio Gallicchio.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Further Results

Here, we report additional experimental results that provide further support to the analysis conducted in the paper.

In particular, in Fig. 10, we show the deviation of the results across the different datasets, expressed in terms of median absolute deviation (MAD; see Footnote 8), for the spectral radius, the input scaling, and the inter-layer scaling variability. The provided plots match the corresponding median results given in Figs. 7, 8, and 9. We can observe that the deviation is generally rather small and, across the different reservoir configurations and numbers of layers, shows a trend in line with the observations made in Sect. 4.3, with generally lower values for richer reservoirs.

Next, we detail the outcomes of the experiments reported in Sect. 4.3, presented separately for the individual datasets. We provide the values of \(\text{ESP}_{\text{index}}\), ASE, LUD and C, achieved under the experimental settings illustrated in Sect. 4.2, varying the value of the spectral radius \(\rho\) in Fig. 11, the input scaling \(\omega_{\text{in}}\) in Fig. 12, and the inter-layer scaling \(\omega_{\text{il}}\) in Fig. 13. In each figure, each row corresponds to a different dataset. The values shown in the plots clearly confirm the same trends analyzed in Sect. 4.3 (see Figs. 7, 8, and 9).

Finally, we report the outcomes of further experiments conducted under the same conditions as in Sect. 4.2, but with a preliminary rescaling of the input to \([-1,1]\), applied to each dimension individually, for each dataset. Results are given in Fig. 14 and, as is apparent from the plots, are qualitatively in line with those without rescaling shown in Figs. 11, 12, and 13, thereby confirming the analysis discussed in Sect. 4.3.
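As a minimal sketch of the per-dimension rescaling described above (a plain min–max normalization to \([-1,1]\); the function name is an illustrative assumption, not necessarily the exact preprocessing used for the paper):

```python
import numpy as np

def rescale_per_dimension(U, eps=1e-12):
    """Linearly rescale each input dimension of U (shape (T, n_features))
    to the range [-1, 1], independently of the other dimensions."""
    u_min = U.min(axis=0)
    u_max = U.max(axis=0)
    span = np.maximum(u_max - u_min, eps)  # avoid division by zero for constant dims
    return 2.0 * (U - u_min) / span - 1.0
```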

Fig. 10

Median absolute deviation across the results for the different datasets considered in Figs. 7, 8 and 9. Results are reported for \(\text{ESP}_{\text{index}}\), ASE, LUD and C (in \(\log_{10}\) scale) measured in deeper reservoir layers, varying the values of the spectral radius \(\rho\) (Fig. 10a), of the input scaling \(\omega_{\text{in}}\) (Fig. 10b), and of the inter-layer scaling \(\omega_{\text{il}}\) (Fig. 10c)

Fig. 11

Spectral radius variability. \(\text{ESP}_{\text{index}}\), ASE, LUD and C (in \(\log_{10}\) scale) measured in deeper reservoir layers (horizontal axis in each plot), varying the value of the spectral radius \(\rho\) (vertical axis in each plot). Results are presented individually for each dataset

Fig. 12

Input scaling variability. \(\text{ESP}_{\text{index}}\), ASE, LUD and C (in \(\log_{10}\) scale) measured in deeper reservoir layers (horizontal axis in each plot), varying the value of the input scaling \(\omega_{\text{in}}\) (vertical axis in each plot). Results are presented individually for each dataset

Fig. 13

Inter-layer scaling variability. \(\text{ESP}_{\text{index}}\), ASE, LUD and C (in \(\log_{10}\) scale) measured in deeper reservoir layers (horizontal axis in each plot), varying the value of the inter-layer scaling \(\omega_{\text{il}}\) (vertical axis in each plot). Results are presented individually for each dataset

Fig. 14

Richness of reservoir dynamics measured in deeper reservoir layers, varying the values of the spectral radius \(\rho\) (Fig. 14a), of the input scaling \(\omega_{\text{in}}\) (Fig. 14b), and of the inter-layer scaling \(\omega_{\text{il}}\) (Fig. 14c). The plots show \(\text{ESP}_{\text{index}}\) (the lower the better), ASE (the higher the better), LUD (the higher the better), and C (in \(\log_{10}\) scale, the lower the better). Results are aggregated across all datasets. Unlike the results shown in Figs. 11, 12, and 13, the input is rescaled to a common range \([-1,1]\) for all the datasets

Cite this article

Gallicchio, C., Micheli, A. Architectural richness in deep reservoir computing. Neural Comput & Applic 35, 24525–24542 (2023). https://doi.org/10.1007/s00521-021-06760-7
