Echo State Property of Deep Reservoir Computing Networks

Abstract

In recent years, the Reservoir Computing (RC) framework has emerged as a state-of-the-art approach for efficient learning in temporal domains. Recently, within the RC context, deep Echo State Network (ESN) models have been proposed. Being composed of a stack of multiple non-linear reservoir layers, deep ESNs potentially allow one to exploit the advantages of a hierarchical temporal feature representation at different levels of abstraction, while preserving the training efficiency typical of the RC methodology. In this paper, we generalize to the case of deep architectures the fundamental RC conditions related to the Echo State Property (ESP), based on the study of stability and contractivity of the resulting dynamical system. Besides providing a necessary condition and a sufficient condition for the ESP of layered RC networks, the results of our analysis also provide insights into the nature of the state dynamics in hierarchically organized recurrent models. In particular, we find that adding layers to a deep reservoir architecture can only drive the regime of the network's dynamics towards (equally or) less stable behaviors. Moreover, our investigation shows the intrinsic ability of temporal dynamics differentiation at the different levels of a deep recurrent architecture, with higher layers in the stack characterized by less contractive dynamics. These theoretical insights are further supported by experimental results that show the effect of layering in terms of a progressively increased short-term memory capacity of the recurrent models.
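
For concreteness, the following minimal NumPy sketch illustrates the kind of layered reservoir architecture studied in the paper: a stack of leaky-integrator reservoir layers in which each layer is fed by the state of the layer below. This is our own illustrative code, not the authors' implementation; the class name, initialization scheme, and default hyper-parameters are assumptions, and bias terms are omitted for brevity.

import numpy as np

class DeepReservoir:
    """Minimal deep reservoir sketch: a stack of leaky-integrator layers.
    Layer 1 is driven by the external input; layer k > 1 is driven by the
    state of layer k-1. Only the untrained reservoir part is shown."""

    def __init__(self, n_in, n_r, n_layers, leaky=0.5, rho=0.9, scale_in=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = [leaky] * n_layers                      # leaking rates a^(k)
        self.W_in, self.W_hat = [], []
        for k in range(n_layers):
            fan_in = n_in if k == 0 else n_r
            self.W_in.append(scale_in * rng.uniform(-1, 1, (n_r, fan_in)))
            W = rng.uniform(-1, 1, (n_r, n_r))
            W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius rho
            self.W_hat.append(W)
        self.x = [np.zeros(n_r) for _ in range(n_layers)]

    def step(self, u):
        """One global state update; each layer receives the fresh state of the layer below."""
        layer_input = u
        for k in range(len(self.x)):
            pre = self.W_in[k] @ layer_input + self.W_hat[k] @ self.x[k]
            self.x[k] = (1 - self.a[k]) * self.x[k] + self.a[k] * np.tanh(pre)
            layer_input = self.x[k]
        return np.concatenate(self.x)                    # global state of the deep reservoir

# Example usage: drive a 3-layer reservoir with a sinusoidal input.
res = DeepReservoir(n_in=1, n_r=100, n_layers=3)
states = [res.step(np.array([np.sin(0.1 * t)])) for t in range(200)]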

References

  1. Aboudib A, Gripon V, Coppin G. A biologically inspired framework for visual information processing and an application on modeling bottom-up visual attention. Cogn Comput. 2016;8(6):1007–1026.

  2. Angelov P, Sperduti A. 2016. Challenges in deep learning. In: Proceedings of the 24th European symposium on artificial neural networks (ESANN), p. 489–495. http://www.i6doc.com.

  3. Bengio Y. Learning deep architectures for AI. Found Trends Mach Learn. 2009;2(1):1–127.

  4. Bianchi F, Livi L, Alippi C. 2016. Investigating echo state networks dynamics by means of recurrence analysis. arXiv preprint arXiv:1601.07381, p. 1–25.

  5. Buehner M, Young P. A tighter bound for the echo state property. IEEE Trans Neural Netw. 2006;17(3):820–824.

  6. Cireşan D, Giusti A, Gambardella L, Schmidhuber J. 2013. Mitosis detection in breast cancer histology images with deep neural networks. In: International conference on medical image computing and computer-assisted intervention. Springer; p. 411–418.

  7. Cireşan D, Meier U, Gambardella L, Schmidhuber J. Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 2010;22(12):3207–3220.

  8. Deng L, Yu D. Deep learning: methods and applications. Found Trends Signal Process. 2014;7(3–4):197–387.

  9. El Hihi S, Bengio Y. 1995. Hierarchical recurrent neural networks for long-term dependencies. In: NIPS, p. 493–499.

  10. Gallicchio C, Micheli A. Architectural and Markovian factors of echo state networks. Neural Netw. 2011;24(5):440–456.

  11. Gallicchio C, Micheli A. 2016. Deep reservoir computing: a critical analysis. In: Proceedings of the 24th European symposium on artificial neural networks (ESANN), p. 497–502. http://www.i6doc.com.

  12. Gallicchio C, Micheli A, Pedrelli L. 2016. Deep reservoir computing: a critical experimental analysis. Neurocomputing. Accepted.

  13. Gerstner W, Kistler W. 2002. Spiking neuron models: single neurons, populations, plasticity. Cambridge University Press.

  14. Goodfellow I, Bengio Y, Courville A. 2016. Deep learning. Book in preparation for MIT Press. http://www.deeplearningbook.org.

  15. Graves A, Mohamed AR, Hinton G. 2013. Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on Acoustics, speech and signal processing (ICASSP). IEEE; p. 6645–6649.

  16. Hammer B, Tiňo P. Recurrent neural networks with small weights implement definite memory machines. Neural Comput. 2003;15(8):1897–1929.

  17. Hermans M, Schrauwen B. 2013. Training and analysing deep recurrent neural networks. In: NIPS, p. 190–198.

  18. Jaeger H. 2001. The “echo state” approach to analysing and training recurrent neural networks - with an erratum note. Tech. rep. GMD - German National Research Institute for Computer Science, Tech. Rep.

  19. Jaeger H. 2001. Short term memory in echo state networks, Tech. rep., German National Research Center for Information Technology.

  20. Jaeger H. 2007. Discovering multiscale dynamical features with hierarchical echo state networks. Tech. rep., Jacobs University Bremen.

  21. Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 2004;304(5667):78–80.

  22. Jaeger H, Lukoševičius M, Popovici D, Siewert U. Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw. 2007;20(3):335–352.

  23. Klopf A, Weaver S, Morgan J. A hierarchical network of control systems that learn: Modeling nervous system function during classical and instrumental conditioning. Adapt. Behav. 1993;1(3):263–319.

  24. Kolen JF, Kremer SC. 2001. A field guide to dynamical recurrent networks. IEEE Press.

  25. Krizhevsky A, Sutskever I, Hinton G. Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, and Weinberger KQ, editors. Advances in neural information processing systems; 2012. p. 1097–1105.

  26. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–444.

  27. Lukoševičius M, Jaeger H. Reservoir computing approaches to recurrent neural network training. Comput Sci Rev. 2009;3(3):127–149.

  28. Maass W, Natschläger T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 2002;14(11):2531–2560.

  29. Malik ZK, Hussain A, Wu QJ. 2016. Multilayered echo state machine: a novel architecture and algorithm. IEEE Trans Cybern. (In press).

  30. Manjunath G, Jaeger H. Echo state property linked to an input: Exploring a fundamental characteristic of recurrent neural networks. Neural Comput. 2013;25(3):671–696.

  31. O’Searcoid M. 2006. Metric spaces. Springer Science & Business Media.

  32. Pascanu R, Gulcehre C, Cho K, Bengio Y. 2014. How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026v5.

  33. Rabinovich M, Huerta R, Varona P, Afraimovich V. Generation and reshaping of sequences in neural systems. Biol Cybern. 2006;95(6):519–536.

  34. Rabinovich M, Varona P, Selverston A, Abarbanel H. Dynamical principles in neuroscience. Rev Modern Phys. 2006;78(4):1213.

  35. Rodan A, Tiňo P. 2011. Negatively correlated echo state networks. In: Proceedings of the 19th European symposium on artificial neural networks (ESANN), p. 53–58. http://www.i6doc.com.

  36. Sato Y, Nagatomi T, Horio K, Miyamoto H. The cognitive mechanisms of multi-scale perception for the recognition of extremely similar faces. Cogn Comput. 2015;7(5):501–508.

  37. Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85–117.

  38. Schrauwen B, Wardermann M, Verstraeten D, Steil J, Stroobandt D. Improving reservoirs using intrinsic plasticity. Neurocomputing 2008;71(7):1159–1171.

  39. Spratling M. A hierarchical predictive coding model of object recognition in natural images. Cogn Comput. 2016: 1–17.

  40. Steil J. 2004. Backpropagation-decorrelation: online recurrent learning with O(n) complexity. In: Proceedings of the 2004 IEEE international joint conference on neural networks (IJCNN). IEEE; vol. 2, p. 843–848.

  41. Tiňo P, Hammer B, Bodén M. 2007. Markovian bias of neural-based architectures with feedback connections. In: Perspectives of neural-symbolic integration. Springer; p. 95–133.

  42. Tiňo P, Dorffner G. Predicting the future of discrete sequences from fractal representations of the past. Mach Learn. 2001;45(2):187–217.

  43. Triefenbach F, Jalalvand A, Demuynck K, Martens JP. Acoustic modeling with hierarchical reservoirs. IEEE Trans Audio Speech Lang Process. 2013;21(11):2439–2450.

  44. Triefenbach F, Jalalvand A, Schrauwen B, Martens JP. 2010. Phoneme recognition with large hierarchical reservoirs. In: Advances in neural information processing systems, p. 2307–2315.

  45. Tyrrell T. The use of hierarchies for action selection. Adapt Behav. 1993;1(4):387–420.

  46. Verstraeten D, Schrauwen B, D’haene M, Stroobandt D. An experimental unification of reservoir computing methods. Neural Netw. 2007;20(3):391–403.

  47. Wainrib G, Galtier M. A local echo state property through the largest lyapunov exponent. Neural Netw. 2016;76:39–45.

  48. Xue Y, Yang L, Haykin S. Decoupled echo state networks with lateral inhibition. Neural Netw. 2007;20(3):365–376.

  49. Yildiz I, Jaeger H, Kiebel S. Re-visiting the echo state property. Neural Netw. 2012;35:1–9.

Author information

Corresponding author

Correspondence to Claudio Gallicchio.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Appendices

Appendix A: Proof of Lemma 2

Let us consider the Jacobian of the deepESN state transition function in Eq. 1, evaluated at u(t) and x(t − 1). From Eq. 8 we can see that for every j > i, the block \(\mathbf{J}_{F^{(i)},\mathbf{x}^{(j)}}(\mathbf{u}(t),\mathbf{x}(t-1))\) is a zero matrix, as the hierarchical structure of the deepESN architecture (see Fig. 1 and Eqs. 2 and 3) implies that the state update at layer i does not depend on the previous state of the system at higher layers in the stack (i.e. at layers j, with j > i). Thereby, \(\mathbf{J}_{F,\mathbf{x}}(\mathbf{u}(t),\mathbf{x}(t-1))\) has the structure of a lower-triangular block matrix. As such, its eigenvalues are the eigenvalues of the matrices on its block diagonal, i.e. the eigenvalues of \(\mathbf{J}_{F^{(i)},\mathbf{x}^{(i)}}(\mathbf{u}(t),\mathbf{x}(t-1))\) for every \(i = 1,2,\ldots,N_L\). Accordingly, the spectral radius of \(\mathbf{J}_{F,\mathbf{x}}(\mathbf{u}(t),\mathbf{x}(t-1))\) is the maximum among the spectral radii of its diagonal blocks, i.e.,

$$ \rho(\mathbf{J}_{F,\mathbf{x}}(\mathbf{u}(t),\mathbf{x}(t-1))) = \max\limits_{k = 1,2,\ldots,N_{L}} \rho\left(\mathbf{J}_{F^{(k)},\mathbf{x}^{(k)}}(\mathbf{u}(t),\mathbf{x}(t-1))\right). $$
(25)

To compute the diagonal block matrices \(\mathbf{J}_{F^{(k)},\mathbf{x}^{(k)}}(\mathbf{u}(t),\mathbf{x}(t-1))\), we observe from Eq. 8 that

$$ \begin{array}{l} \mathbf{J}_{F^{(k)},\mathbf{x}^{(k)}}(\mathbf{u}(t),\mathbf{x}(t-1)) = \frac{\partial F^{(k)}(\mathbf{u}(t), \mathbf{x}^{(1)}(t-1),\ldots,\mathbf{x}^{(k)}(t-1))}{\partial \mathbf{x}^{(k)}(t-1)} = \\\\ \frac{\partial}{\partial \mathbf{x}^{(k)}(t-1)}\left( (1-a^{(k)}) \mathbf{x}^{(k)}(t-1) + a^{(k)} \tanh(\mathbf{W}_{in}^{(k)} F^{(k-1)}(\mathbf{u}(t),\mathbf{x}^{(1)}(t-1),\ldots,\mathbf{x}^{(k-1)}(t-1)) + \right.\\\\ \left. \boldsymbol{\theta}^{(k)} + \hat{\mathbf{W}}^{(k)} \mathbf{x}^{(k)}(t-1)) \right)= \\\\ (1-a^{(k)})\mathbf{I} + a^{(k)} \; \left( \begin{array}{llll} 1-(\tilde{x}_1^{(k)}(t))^2 &\; 0 &\; {\ldots} & \;0\\ 0 & \; 1-(\tilde{x}_2^{(k)}(t))^2 & \; {\ldots} & \; 0\\ {\vdots} & \; {\vdots} & \; {\ddots} & \; {\vdots} \\ 0 & \; 0 & \; {\ldots} & \; 1-(\tilde{x}_{N_R}^{(k)}(t))^2\\ \end{array} \right) \hat{\mathbf{W}}^{(k)} \end{array} $$
(26)

where, for \(j = 1,2,\ldots,N_R\), \(\tilde{x}_{j}^{(k)}(t)\) denotes the j-th element of \(\tilde{\mathbf{x}}^{(k)}(t) = \tanh(\mathbf{W}_{in}^{(k)} F^{(k-1)}(\mathbf{u}(t),\mathbf{x}^{(1)}(t-1),\mathbf{x}^{(2)}(t-1),\ldots,\mathbf{x}^{(k-1)}(t-1)) + \boldsymbol{\theta}^{(k)} + \hat{\mathbf{W}}^{(k)} \mathbf{x}^{(k)}(t-1))\).

Considering zero input and zero state, from Eq. 26 we can derive that, for every \(k = 1,2,\ldots,N_L\),

$$ \rho(\mathbf{J}_{F^{(k)},\mathbf{x}^{(k)}}(\mathbf{0}_{u},\mathbf{0})) = \rho ((1-a^{(k)}) \mathbf{I} + a^{(k)} \hat{\mathbf{W}}^{(k)} ) $$
(27)

and therefore Eq. 25 becomes

$$ \rho(\mathbf{J}_{F,\mathbf{x}}(\mathbf{0}_{u},\mathbf{0})) = \max\limits_{k = 1,2,\ldots,N_{L}} \rho ((1-a^{(k)}) \mathbf{I} + a^{(k)} \hat{\mathbf{W}}^{(k)} ). $$
(28)
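
As an illustration of how the necessary condition can be checked in practice, the following sketch (ours, not part of the paper) evaluates the right-hand side of Eq. 28 as the maximum spectral radius over the layer-wise diagonal blocks \((1-a^{(k)})\mathbf{I} + a^{(k)}\hat{\mathbf{W}}^{(k)}\); the necessary condition for the ESP requires this quantity to be smaller than 1.

import numpy as np

def necessary_condition_radius(W_hat_list, leaky_list):
    """Eq. 28: spectral radius of the Jacobian at zero input and zero state,
    i.e. the maximum over layers k of rho((1 - a^(k)) I + a^(k) W_hat^(k)).
    The necessary condition for the ESP requires the returned value to be < 1."""
    radii = []
    for a, W_hat in zip(leaky_list, W_hat_list):
        M = (1 - a) * np.eye(W_hat.shape[0]) + a * W_hat
        radii.append(np.max(np.abs(np.linalg.eigvals(M))))
    return max(radii)

Note that appending further layers can only leave this maximum unchanged or increase it, in line with the observation that layering drives the dynamics towards equally or less stable regimes.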

Appendix B: Proof of Lemma 3

case (a): This case follows from the contractivity analysis of standard shallow ESNs [10]. Indeed, \(\forall \mathbf{u} \in \mathbb{R}^{N_U}\) and \(\forall \mathbf{x}^{(1)}, \mathbf{x}^{\prime(1)} \in \mathbb{R}^{N_R}\)

$$\begin{array}{@{}rcl@{}} &&\|F^{(1)}(\mathbf{u},\mathbf{x}^{(1)}) - F^{(1)}(\mathbf{u},\mathbf{x}^{\prime(1)})\|\\ &=&\|(1 - a^{(1)}) \mathbf{x}^{(1)} + a^{(1)} \tanh\left( \mathbf{W}_{in}^{(1)} \mathbf{u} + \boldsymbol{\theta}^{(1)} + \hat{\mathbf{W}}^{(1)} \mathbf{x}^{(1)}\right) \\ &&-(1 - a^{(1)}) \mathbf{x}^{\prime(1)} -a^{(1)}\tanh\left( \mathbf{W}_{in}^{(1)} \mathbf{u} + \boldsymbol{\theta}^{(1)} + \hat{\mathbf{W}}^{(1)} \mathbf{x}^{\prime(1)}\right)\| \end{array} $$
$$\begin{array}{@{}rcl@{}} &=&\|(1 \,-\, a^{(1)}) (\mathbf{x}^{(1)}\,-\,\mathbf{x}^{\prime(1)}) \,+\, a^{(1)} (\tanh\left( \mathbf{W}_{in}^{(1)} \mathbf{u} + \boldsymbol{\theta}^{(1)} \!+ \hat{\mathbf{W}}^{(1)} \mathbf{x}^{(1)}\right) \\ &&-\tanh\left( \mathbf{W}_{in}^{(1)} \mathbf{u} + \boldsymbol{\theta}^{(1)} + \hat{\mathbf{W}}^{(1)} \mathbf{x}^{\prime(1)}\right))\| \\ &\leq&(1 - a^{(1)}) \|\mathbf{x}^{(1)} - \mathbf{x}^{\prime(1)}\| + a^{(1)} \|\mathbf{W}_{in}^{(1)}\mathbf{u}+\boldsymbol{\theta}^{(1)}+\hat{\mathbf{W}}^{(1)}\mathbf{x}^{(1)} \\ &&-\mathbf{W}_{in}^{(1)}\mathbf{u} - \boldsymbol{\theta}^{(1)} - \hat{\mathbf{W}}^{(1)}\mathbf{x}^{\prime(1)}\| \\ &\leq&(1 - a^{(1)}) \|\mathbf{x}^{(1)}- \mathbf{x}^{\prime(1)}\| + a^{(1)} \|\hat{\mathbf{W}}^{(1)}\| \|\mathbf{x}^{(1)} - \mathbf{x}^{\prime(1)}\| \\ &=&((1 - a^{(1)}) + a^{(1)} \|\hat{\mathbf{W}}^{(1)}\|) \|\mathbf{x}^{(1)} - \mathbf{x}^{\prime(1)}\| \end{array} $$
(29)

from which it follows that \(C^{(1)} = (1 - a^{(1)}) + a^{(1)} \|\hat{\mathbf{W}}^{(1)}\|\) is a Lipschitz constant for \(F^{(1)}\). Thus, if \(C^{(1)} < 1\), then \(F^{(1)}\) is a contraction (see Definition 2).

case (b): In this case, assuming that \(F^{(i-1)}\) is a contraction with Lipschitz constant \(C^{(i-1)} < 1\), \(\forall \mathbf{u} \in \mathbb{R}^{N_U}\) and \(\forall \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)}, \mathbf{x}^{\prime(1)}, \ldots, \mathbf{x}^{\prime(i)} \in \mathbb{R}^{N_R}\)

$$\begin{array}{@{}rcl@{}} \begin{array}{l} \|F^{(i)}(\mathbf{u}, \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)}) - F^{(i)}(\mathbf{u}, \mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i)})\| = \\ \\ \| (1-a^{(i)}) \mathbf{x}^{(i)} + a^{(i)} \tanh (\mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}^{(1)},\ldots, \mathbf{x}^{(i-1)}) + \boldsymbol{\theta}^{(i)} + \hat{\mathbf{W}}^{(i)} \mathbf{x}^{(i)}) - \\ \;\;(1-a^{(i)}) \mathbf{x}'^{(i)} - a^{(i)} \tanh (\mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i-1)}) + \boldsymbol{\theta}^{(i)} + \hat{\mathbf{W}}^{(i)} \mathbf{x}'^{(i)}) \| = \\ \\ \| (1-a^{(i)}) (\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}) + a^{(i)} (\tanh (\mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i-1)}) + \boldsymbol{\theta}^{(i)} + \hat{\mathbf{W}}^{(i)} \mathbf{x}^{(i)}) - \\ \;\; \tanh (\mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i-1)}) + \boldsymbol{\theta}^{(i)} + \hat{\mathbf{W}}^{(i)} \mathbf{x}'^{(i)})) \| \leq \\ \\ (1-a^{(i)}) \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}\| + a^{(i)} \|\mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}^{(1)},\ldots, \mathbf{x}^{(i-1)}) + \boldsymbol{\theta}^{(i)} + \hat{\mathbf{W}}^{(i)} \mathbf{x}^{(i)} - \\ \mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}'^{(1)},\ldots,\mathbf{x}'^{(i-1)}) - \boldsymbol{\theta}^{(i)} - \hat{\mathbf{W}}^{(i)} \mathbf{x}'^{(i)}\| = \\ \\ (1-a^{(i)}) \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}\| + a^{(i)} \| \mathbf{W}_{in}^{(i)} (F^{(i-1)}(\mathbf{u}, \mathbf{x}^{(1)},\ldots, \mathbf{x}^{(i-1)}) - \\ F^{(i-1)}(\mathbf{u}, \mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i-1)})) + \hat{\mathbf{W}}^{(i)} (\mathbf{x}^{(i)} - \mathbf{x}'^{(i)})\| \leq \\ \\ (1-a^{(i)}) \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}\| + a^{(i)} \left( \| \mathbf{W}_{in}^{(i)} \| \| F^{(i-1)}(\mathbf{u}, \mathbf{x}^{(1)},\ldots, \mathbf{x}^{(i-1)}) - \right.\\ \left.F^{(i-1)}(\mathbf{u}, \mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i-1)}) \| + \|\hat{\mathbf{W}}^{(i)}\| \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}\| \right) \leq \\ \\ (1-a^{(i)}) \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}\| + a^{(i)} \left( \| \mathbf{W}_{in}^{(i)}\| \, C^{(i-1)} \, \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i-1)})-\right.\\ \left.(\mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i-1)})\| + \|\hat{\mathbf{W}}^{(i)}\| \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)} \|\right) \leq \\ \\ (1-a^{(i)}) \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-(\mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i)})\| + a^{(i)} \left( C^{(i-1)} \|\mathbf{W}_{in}^{(i)}\| \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-\right.\\ \left.(\mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i)})\| + \|\hat{\mathbf{W}}^{(i)}\| \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-(\mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i)})\|\right) = \\ \\ (1-a^{(i)}) \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-(\mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i)})\| + \\ a^{(i)} \left( C^{(i-1)} \|\mathbf{W}_{in}^{(i)}\| + \|\hat{\mathbf{W}}^{(i)}\|\right) \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-(\mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i)})\| = \\ \\ \left[(1-a^{(i)}) + a^{(i)}\left( C^{(i-1)} \|\mathbf{W}_{in}^{(i)}\| + \|\hat{\mathbf{W}}^{(i)}\|\right)\right] \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-(\mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i)})\| \end{array} \end{array} $$
(30)

from which we can see that \(C^{(i)} = (1-a^{(i)}) + a^{(i)}\left(C^{(i-1)} \|\mathbf{W}_{in}^{(i)}\| + \|\hat{\mathbf{W}}^{(i)}\|\right)\) is a Lipschitz constant for \(F^{(i)}\); thereby, whenever \(C^{(i)} < 1\), \(F^{(i)}\) is a contraction (see Definition 2).
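
The layer-wise Lipschitz constants of Lemma 3 can be computed recursively, as in the following sketch (again an illustration of ours, with the spectral norm assumed as the operator norm \(\|\cdot\|\)); contractivity of \(F^{(i)}\) holds whenever the returned \(C^{(i)}\) is smaller than 1.

import numpy as np

def lipschitz_constants(W_in_list, W_hat_list, leaky_list):
    """Recursive computation of the C^(i) of Lemma 3, using the spectral norm:
    C^(1) = (1 - a^(1)) + a^(1) ||W_hat^(1)||,
    C^(i) = (1 - a^(i)) + a^(i) (C^(i-1) ||W_in^(i)|| + ||W_hat^(i)||) for i > 1.
    F^(i) is a contraction whenever C^(i) < 1."""
    C = []
    for i, (a, W_in, W_hat) in enumerate(zip(leaky_list, W_in_list, W_hat_list)):
        norm_hat = np.linalg.norm(W_hat, 2)              # largest singular value
        if i == 0:
            C.append((1 - a) + a * norm_hat)
        else:
            C.append((1 - a) + a * (C[-1] * np.linalg.norm(W_in, 2) + norm_hat))
    return C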

Appendix C: Proof of Theorem 2

Proof

Given any input string of length N, denoted by \(\mathbf{s}_N = [\mathbf{u}(1),\ldots,\mathbf{u}(N)]\), and for every pair of deepESN global states \(\mathbf{x} = (\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N_{L})}) \in \mathbb{R}^{N_{L} N_{R}}\) and \(\mathbf{x}^{\prime} = (\mathbf{x}^{\prime(1)},\ldots, \mathbf{x}^{\prime(N_{L})}) \in \mathbb{R}^{N_{L} N_{R}}\), we have that:

$$\begin{array}{@{}rcl@{}} &&\|\hat{F}(\mathbf{s}_{N},\mathbf{x}) - \hat{F}(\mathbf{s}_{N},\mathbf{x}^{\prime})\| \\ &=&\|\hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N)],\mathbf{x}) - \hat{F}([\mathbf{u}(1), \ldots, \mathbf{u}(N)],\mathbf{x}^{\prime})\| \\ &=&\|F(\mathbf{u}(N),\hat{F}([\mathbf{u}(1), \ldots, \mathbf{u}(N-1)],\mathbf{x})) \\ &&-F(\mathbf{u}(N),\hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N-1)],\mathbf{x}^{\prime}))\| \\ &\leq& C \|\hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N\,-\,1)],\mathbf{x}) \,-\, \hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N\,-\,1)],\mathbf{x}^{\prime})\| \\ &=&C \|F(\mathbf{u}(N-1),\hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N-2)],\mathbf{x})) \\ &&- F(\mathbf{u}(N-1),\hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N-2)],\mathbf{x}^{\prime}))\| \\ &\leq& C^{2} \|\hat{F}([\mathbf{u}(1), \ldots, \mathbf{u}(N\,-\,2)],\mathbf{x}) \,-\, \hat{F}([\mathbf{u}(1), \ldots, \mathbf{u}(N\,-\,2)],\mathbf{x}^{\prime})\| \\ &\leq&\ldots \\ && C^{N-1} \|\hat{F}([\mathbf{u}(1)],\mathbf{x}) - \hat{F}([\mathbf{u}(1)],\mathbf{x}^{\prime})\| \\ &=&C^{N-1} \|F(\mathbf{u}(1),\hat{F}([\;],\mathbf{x})) - F(\mathbf{u}(1),\hat{F}([\;],\mathbf{x}^{\prime}))\| \\ &=&C^{N-1} \|F(\mathbf{u}(1),\mathbf{x}) - F(\mathbf{u}(1),\mathbf{x}^{\prime})\| \\ &\leq& C^{N} \, \|\mathbf{x} - \mathbf{x}^{\prime}\| \\ &=&C^{N} \, \max\limits_{k = 1,2,\ldots,N_{L}} \|\mathbf{x}^{(k)} - \mathbf{x}^{\prime(k)} \| \\ &\leq& C^{N} D \end{array} $$
(31)

from which it follows that \(\|\hat{F}(\mathbf{s}_{N},\mathbf{x}) - \hat{F}(\mathbf{s}_{N},\mathbf{x}^{\prime})\|\) is upper bounded by a term that approaches 0 as \(N \to \infty\), since \(C < 1\). Thereby, the ESP condition in Definition 1 holds. □
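
The geometric bound of Eq. 31 can also be inspected numerically: feeding the same input sequence to the same reservoir started from two different initial global states, the state distance should decay at least as fast as \(C^{N}\) when the global transition function is a contraction. The helper below is a hypothetical sketch of ours; reservoir_step(u, x) stands for any implementation of the global state transition function.

import numpy as np

def state_distance_decay(reservoir_step, x0_a, x0_b, inputs):
    """Distance ||x_a(t) - x_b(t)|| over time when the same input sequence is
    fed to the same global state transition function started from two
    different initial states. Under a contraction with constant C < 1, the
    distance after N steps is bounded by C**N * ||x0_a - x0_b|| (cf. Eq. 31)."""
    xa, xb, dists = x0_a, x0_b, []
    for u in inputs:
        xa = reservoir_step(u, xa)
        xb = reservoir_step(u, xb)
        dists.append(np.linalg.norm(xa - xb))
    return dists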

About this article

Cite this article

Gallicchio, C., Micheli, A. Echo State Property of Deep Reservoir Computing Networks. Cogn Comput 9, 337–350 (2017). https://doi.org/10.1007/s12559-017-9461-9
