Abstract
Reservoir computing (RC) is a popular class of recurrent neural networks (RNNs) with untrained dynamics. Recent advances in deep RC architectures have had a great impact on time-series applications, offering a convenient trade-off between predictive performance and training complexity. In this paper, we deepen the analysis of untrained RNNs by studying the quality of the recurrent dynamics developed by the layers of deep RC neural networks. We do so by assessing the richness of the neural representations at the different levels of the architecture, using measures originating from the fields of dynamical systems, numerical analysis, and information theory. Our experiments, on both synthetic and real-world datasets, show that depth, as an architectural factor of RNN design, has a natural effect on the quality of RNN dynamics, even without learning of the internal connections. The interplay between depth and the values of the RC scaling hyper-parameters, especially the scaling of inter-layer connections, is crucial for designing rich untrained recurrent neural systems.
Notes
The maximum among the eigenvalues in modulus.
The size of \({\mathcal {K}}\) is set to 0.3 times the standard deviation of the instantaneous reservoir activations, as in [34].
From LÍngua BRAsileira de Sinais, i.e., Brazilian Sign Language.
The original dataset contained a variable number of time-steps with zero input features at the beginning and at the end of each time-series, which we removed as a preliminary step.
Despite its simplicity, this architectural design choice for deep RC has proved useful in several application contexts (see, e.g., [16, 19]). Notice, however, that the deep RC approach is not restrictive in this sense in general, as it allows different hyper-parameterizations at different levels of the architecture.
Given a vector \({\mathbf {v}}\), \(\text {MAD}({\mathbf {v}}) = \text {MEDIAN}(| {\mathbf {v}}-\text {MEDIAN}({\mathbf {v}})|)\).
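As a concrete illustration of the first note above, the spectral radius of a random reservoir weight matrix can be computed, and imposed by rescaling, along the following lines. This is a minimal NumPy sketch; the function name, matrix size, and target value are illustrative, not taken from the paper's setup.

```python
import numpy as np

def scale_to_spectral_radius(W, rho):
    """Rescale W so that its spectral radius (the maximum
    eigenvalue modulus) equals the target value rho."""
    sr = max(abs(np.linalg.eigvals(W)))
    return W * (rho / sr)

# Example: a random 100x100 reservoir matrix rescaled to spectral radius 0.9.
rng = np.random.default_rng(42)
W = scale_to_spectral_radius(rng.uniform(-1.0, 1.0, (100, 100)), rho=0.9)
```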
References
Atiya AF, Parlos AG (2000) New results on recurrent network training: unifying the algorithms and accelerating convergence. IEEE Trans Neural Netw 11(3):697–709
Bacciu D, Barsocchi P, Chessa S, Gallicchio C, Micheli A (2014) An experimental characterization of reservoir computing in ambient assisted living applications. Neural Comput Appl 24(6):1451–1464
Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166
Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/
Colla V, Matino I, Dettori S, Cateni S, Matino R (2019) Reservoir computing approaches applied to energy management in industry. In: International conference on engineering applications of neural networks. Springer, pp 66–79
Cover TM (1965) Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans Electron Comput 3:326–334
Dettori S, Matino I, Colla V, Speets R (2020) Deep echo state networks in industrial applications. In: IFIP international conference on artificial intelligence applications and innovations. Springer, pp 53–63
Dias DB, Madeo RC, Rocha T, Biscaro HH, Peres SM (2009) Hand movement recognition for brazilian sign language: a study using distance-based neural networks. In: 2009 international joint conference on neural networks, pp. 697–704. IEEE
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Gallicchio C (2019) Chasing the echo state property. In: 27th European symposium on artificial neural networks, computational intelligence and machine learning, ESANN 2019, pp 667–672. ESANN (i6doc.com)
Gallicchio C, Micheli A (2010) A Markovian characterization of redundancy in echo state networks by PCA. In: Proc. of the 18th European symposium on artificial neural networks (ESANN). d-side publications
Gallicchio C, Micheli A (2011) Architectural and markovian factors of echo state networks. Neural Netw 24(5):440–456
Gallicchio C, Micheli A (2017) Deep echo state network (deepesn): a brief survey. arXiv preprint arXiv:1712.04323
Gallicchio C, Micheli A (2017) Echo state property of deep reservoir computing networks. Cogn Comput 9(3):337–350
Gallicchio C, Micheli A (2019) Reservoir topology in deep echo state networks. In: International conference on artificial neural networks. Springer, pp. 62–75
Gallicchio C, Micheli A (2020) Fast and deep graph neural networks. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 3898–3905
Gallicchio C, Micheli A (2021) Deep reservoir computing. In: Nakajima K, Fischer I (eds) Reservoir computing. Springer, pp 77–95
Gallicchio C, Micheli A, Pedrelli L (2017) Deep reservoir computing: a critical experimental analysis. Neurocomputing 268:87–99. https://doi.org/10.1016/j.neucom.2016.12.089
Gallicchio C, Micheli A, Pedrelli L (2018) Design of deep echo state networks. Neural Netw 108:33–47
Gallicchio C, Scardapane S (2020) Deep randomized neural networks. Recent Trends Learn Data 43–68
Graves A, Mohamed Ar, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing, pp 6645–6649. IEEE
Haber E, Ruthotto L (2017) Stable architectures for deep neural networks. Inverse Probl 34(1):014004
Hermans M, Schrauwen B (2013) Training and analysing deep recurrent neural networks. Adv Neural Inf Process Syst 26:190–198
Hu H, Wang L, Lv SX (2020) Forecasting energy consumption and wind power generation using deep echo state network. Renew Energy 154:598–613
Jaeger H (2001) The “echo state” approach to analysing and training recurrent neural networks—with an erratum note. Tech. rep., German National Research Center for Information Technology (GMD), Bonn, Germany
Jaeger H (2002) Short term memory in echo state networks. Tech. rep, GMD-German National Research Institute for Computer Science
Jaeger H (2005) Reservoir riddles: suggestions for echo state network research. In: Proceedings of the 2005 IEEE international joint conference on neural networks (IJCNN), vol 3, pp 1460–1462. IEEE
Jaeger H, Haas H (2004) Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304(5667):78–80
Jaeger H, Lukoševičius M, Popovici D, Siewert U (2007) Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw 20(3):335–352
Kawai Y, Park J, Asada M (2019) A small-world topology enhances the echo state property and signal propagation in reservoir computing. Neural Netw 112:15–23
Kim T, King BR (2020) Time series prediction using deep echo state networks. Neural Comput Appl 32(23):17769–17787
Lukoševičius M, Jaeger H (2009) Reservoir computing approaches to recurrent neural network training. Comput Sci Rev 3(3):127–149
Olszewski RT (2001) Generalized feature extraction for structural pattern recognition in time-series data. Tech. rep., School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Ozturk M, Xu D, Principe J (2007) Analysis and design of echo state networks. Neural Comput 19(1):111–138
Pascanu R, Gulcehre C, Cho K, Bengio Y (2013) How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026
Principe J, Xu D, Fisher J (2000) Information theoretic learning. In: Haykin S (ed) Unsupervised adaptive filtering, vol 1. Wiley
Principe JC (2010) Information theoretic learning: Renyi’s entropy and kernel perspectives. Springer Science & Business Media
Rodan A, Tiňo P (2010) Minimum complexity echo state network. IEEE Trans Neural Netw 22(1):131–144
Scardapane S, Wang D (2017) Randomness in neural networks: an overview. Wiley Interdiscip Rev Data Min Knowl Discov 7(2):e1200
Tiňo P, Hammer B, Bodén M (2007) Markovian bias of neural-based architectures with feedback connections. In: Perspectives of neural-symbolic integration. Springer, pp 95–133
Verstraeten D, Schrauwen B, d’Haene M, Stroobandt D (2007) An experimental unification of reservoir computing methods. Neural Netw 20(3):391–403
Weigend AS (2018) Time series prediction: forecasting the future and understanding the past. Routledge
Werbos PJ (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78(10):1550–1560
Williams BH, Toussaint M, Storkey AJ (2006) Extracting motion primitives from natural handwriting data. In: International conference on artificial neural networks. Springer, pp 634–643
Xue Y, Yang L, Haykin S (2007) Decoupled echo state networks with lateral inhibition. Neural Netw 20(3):365–376
Yildiz I, Jaeger H, Kiebel S (2012) Re-visiting the echo state property. Neural Netw 35:1–9
Acknowledgements
This work has been partially supported by the European Union’s Horizon 2020 Research and Innovation program, under project TEACHING (Grant agreement ID: 871385), URL https://www.teaching-h2020.eu, and by the project BrAID under the Bando Ricerca Salute 2018—Regional public call for research and development projects aimed at supporting clinical and organisational innovation processes of the Regional Health Service—Regione Toscana.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A Further Results
Here, we report additional experimental results that provide further support to the analysis conducted in the paper.
In particular, in Fig. 10 we show the deviation of the results across the different datasets, expressed in terms of the median absolute deviation (MAD, see footnote 8), for the spectral radius, the input scaling, and the inter-layer scaling. The provided plots match the corresponding median results given in Figs. 7, 8, and 9. We can observe that the deviation is generally small and, across the reservoir configurations and numbers of layers, follows a trend in line with the observations made in Sect. 4.3, with typically lower values for richer reservoirs.
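The MAD statistic of footnote 8 admits a direct implementation; the following minimal NumPy sketch (the function name is ours) makes the definition \(\text {MAD}({\mathbf {v}}) = \text {MEDIAN}(| {\mathbf {v}}-\text {MEDIAN}({\mathbf {v}})|)\) concrete:

```python
import numpy as np

def mad(v):
    """Median absolute deviation: MEDIAN(|v - MEDIAN(v)|)."""
    v = np.asarray(v, dtype=float)
    return np.median(np.abs(v - np.median(v)))

# e.g., mad([1, 2, 3, 4, 5]) -> 1.0 (median 3; absolute deviations 2, 1, 0, 1, 2)
```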
Next, we detail the outcomes of the experiments reported in Sect. 4.3, aggregated on the individual datasets. We provide the values of \(\text {ESP}_{index}\), ASE, LUD, and C achieved under the experimental settings illustrated in Sect. 4.2, varying the spectral radius \(\rho \) in Fig. 11, the input scaling \( \omega _{{{\text{in}}}} \) in Fig. 12, and the inter-layer scaling \( \omega _{{{\text{il}}}} \) in Fig. 13. In each figure, each row corresponds to a different dataset. The values shown in the plots confirm the same trends analyzed in Sect. 4.3 (see Figs. 7, 8, and 9).
Finally, we report the outcomes of further experiments conducted under the same conditions as in Sect. 4.2, but with the input of each dataset preliminarily rescaled to \([-1,1]\) along each dimension individually. The results, given in Fig. 14, are qualitatively in line with those without rescaling shown in Figs. 11, 12, and 13, thereby confirming the analysis discussed in Sect. 4.3.
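The per-dimension rescaling to \([-1,1]\) mentioned above can be sketched as follows, assuming each dataset is arranged as a time-steps × features array; the function name and the handling of constant dimensions are our own choices, not specified in the paper.

```python
import numpy as np

def rescale_to_unit_range(X):
    """Rescale each input dimension (column) of X to [-1, 1] independently,
    mapping the per-column minimum to -1 and the maximum to 1."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant dimensions
    return 2.0 * (X - lo) / span - 1.0
```

In a multi-dataset pipeline, this would be applied to each dataset separately before driving the reservoir, so that every input dimension spans the same range.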
Cite this article
Gallicchio, C., Micheli, A. Architectural richness in deep reservoir computing. Neural Comput & Applic 35, 24525–24542 (2023). https://doi.org/10.1007/s00521-021-06760-7