Echo State Property of Deep Reservoir Computing Networks

Abstract

In recent years, the Reservoir Computing (RC) framework has emerged as a state-of-the-art approach for efficient learning in temporal domains. Recently, within the RC context, deep Echo State Network (ESN) models have been proposed. Being composed of a stack of multiple non-linear reservoir layers, deep ESNs potentially allow one to exploit the advantages of a hierarchical temporal feature representation at different levels of abstraction, while preserving the training efficiency typical of the RC methodology. In this paper, we generalize to the case of deep architectures the fundamental RC conditions related to the Echo State Property (ESP), based on the study of stability and contractivity of the resulting dynamical system. Besides providing a necessary condition and a sufficient condition for the ESP of layered RC networks, the results of our analysis also provide insights into the nature of the state dynamics in hierarchically organized recurrent models. In particular, we find that adding layers to a deep reservoir architecture can only drive the regime of the network's dynamics towards (equally or) less stable behaviors. Moreover, our investigation shows the intrinsic ability of temporal dynamics differentiation at the different levels of a deep recurrent architecture, with higher layers in the stack characterized by less contractive dynamics. These theoretical insights are further supported by experimental results that show the effect of layering in terms of a progressively increased short-term memory capacity of the recurrent models.
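
For concreteness, the following minimal NumPy sketch illustrates the kind of layered reservoir architecture studied in the paper: a stack of leaky-integrator reservoir layers in which each layer is fed by the state of the layer below. This is our own illustrative code, not the authors' implementation; the class name, initialization scheme, and default hyper-parameters are assumptions, and bias terms are omitted for brevity.

import numpy as np

class DeepReservoir:
    """Minimal deep reservoir sketch: a stack of leaky-integrator layers.
    Layer 1 is driven by the external input; layer k > 1 is driven by the
    state of layer k-1. Only the untrained reservoir part is shown."""

    def __init__(self, n_in, n_r, n_layers, leaky=0.5, rho=0.9, scale_in=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = [leaky] * n_layers                      # leaking rates a^(k)
        self.W_in, self.W_hat = [], []
        for k in range(n_layers):
            fan_in = n_in if k == 0 else n_r
            self.W_in.append(scale_in * rng.uniform(-1, 1, (n_r, fan_in)))
            W = rng.uniform(-1, 1, (n_r, n_r))
            W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius rho
            self.W_hat.append(W)
        self.x = [np.zeros(n_r) for _ in range(n_layers)]

    def step(self, u):
        """One global state update; each layer receives the fresh state of the layer below."""
        layer_input = u
        for k in range(len(self.x)):
            pre = self.W_in[k] @ layer_input + self.W_hat[k] @ self.x[k]
            self.x[k] = (1 - self.a[k]) * self.x[k] + self.a[k] * np.tanh(pre)
            layer_input = self.x[k]
        return np.concatenate(self.x)                    # global state of the deep reservoir

# Example usage: drive a 3-layer reservoir with a sinusoidal input.
res = DeepReservoir(n_in=1, n_r=100, n_layers=3)
states = [res.step(np.array([np.sin(0.1 * t)])) for t in range(200)]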

References

  1. Aboudib A, Gripon V, Coppin G. A biologically inspired framework for visual information processing and an application on modeling bottom-up visual attention. Cogn Comput. 2016;8(6):1007–1026.

  2. Angelov P, Sperduti A. 2016. Challenges in deep learning. In: Proceedings of the 24th European symposium on artificial neural networks (ESANN), p. 489–495. http://www.i6doc.com.

  3. Bengio Y. Learning deep architectures for AI. Found Trends Mach Learn. 2009;2(1):1–127.

  4. Bianchi F, Livi L, Alippi C. 2016. Investigating echo state networks dynamics by means of recurrence analysis. arXiv preprint arXiv:1601.07381, p. 1–25.

  5. Buehner M, Young P. A tighter bound for the echo state property. IEEE Trans Neural Netw. 2006;17(3):820–824.

  6. Cireşan D, Giusti A, Gambardella L, Schmidhuber J. 2013. Mitosis detection in breast cancer histology images with deep neural networks. In: International conference on medical image computing and computer-assisted intervention. Springer; p. 411–418.

  7. Cireşan D, Meier U, Gambardella L, Schmidhuber J. Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 2010;22(12):3207–3220.

  8. Deng L, Yu D. Deep learning: methods and applications. Found Trends Signal Process. 2014;7(3–4):197–387.

  9. El Hihi S, Bengio Y. 1995. Hierarchical recurrent neural networks for long-term dependencies. In: NIPS, p. 493–499.

  10. Gallicchio C, Micheli A. Architectural and Markovian factors of echo state networks. Neural Netw. 2011;24(5):440–456.

  11. Gallicchio C, Micheli A. 2016. Deep reservoir computing: a critical analysis. In: Proceedings of the 24th European symposium on artificial neural networks (ESANN), p. 497–502. http://www.i6doc.com.

  12. Gallicchio C, Micheli A, Pedrelli L. 2016. Deep reservoir computing: a critical experimental analysis. Neurocomputing. Accepted.

  13. Gerstner W, Kistler W. 2002. Spiking neuron models: single neurons, populations, plasticity. Cambridge University Press.

  14. Goodfellow I, Bengio Y, Courville A. 2016. Deep learning. Book in preparation for MIT Press. http://www.deeplearningbook.org.

  15. Graves A, Mohamed AR, Hinton G. 2013. Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on Acoustics, speech and signal processing (ICASSP). IEEE; p. 6645–6649.

  16. Hammer B, Tiňo P. Recurrent neural networks with small weights implement definite memory machines. Neural Comput. 2003;15(8):1897–1929.

  17. Hermans M, Schrauwen B. 2013. Training and analysing deep recurrent neural networks. In: NIPS, p. 190–198.

  18. Jaeger H. 2001. The “echo state” approach to analysing and training recurrent neural networks - with an erratum note. Tech. rep. GMD - German National Research Institute for Computer Science, Tech. Rep.

  19. Jaeger H. 2001. Short term memory in echo state networks, Tech. rep., German National Research Center for Information Technology.

  20. Jaeger H. 2007. Discovering multiscale dynamical features with hierarchical echo state networks. Tech. rep., Jacobs University Bremen.

  21. Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 2004;304(5667):78–80.

  22. Jaeger H, Lukoševičius M, Popovici D, Siewert U. Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw. 2007;20(3):335–352.

  23. Klopf A, Weaver S, Morgan J. A hierarchical network of control systems that learn: Modeling nervous system function during classical and instrumental conditioning. Adapt. Behav. 1993;1(3):263–319.

  24. Kolen JF, Kremer SC. 2001. A field guide to dynamical recurrent networks. IEEE Press.

  25. Krizhevsky A, Sutskever I, Hinton G. Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, and Weinberger KQ, editors. Advances in neural information processing systems; 2012. p. 1097–1105.

  26. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–444.

  27. Lukoševičius M, Jaeger H. Reservoir computing approaches to recurrent neural network training. Comput Sci Rev. 2009;3(3):127–149.

  28. Maass W, Natschläger T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 2002;14(11):2531–2560.

  29. Malik ZK, Hussain A, Wu QJ. 2016. Multilayered echo state machine: a novel architecture and algorithm. IEEE Trans Cybern. (In press).

  30. Manjunath G, Jaeger H. Echo state property linked to an input: Exploring a fundamental characteristic of recurrent neural networks. Neural Comput. 2013;25(3):671–696.

  31. O’Searcoid M. 2006. Metric spaces. Springer Science & Business Media.

  32. Pascanu R, Gulcehre C, Cho K, Bengio Y. 2014. How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026v5.

  33. Rabinovich M, Huerta R, Varona P, Afraimovich V. Generation and reshaping of sequences in neural systems. Biol Cybern. 2006;95(6):519–536.

  34. Rabinovich M, Varona P, Selverston A, Abarbanel H. Dynamical principles in neuroscience. Rev Modern Phys. 2006;78(4):1213.

  35. Rodan A, Tiňo P. 2011. Negatively correlated echo state networks. In: Proceedings of the 19th European symposium on artificial neural networks (ESANN), p. 53–58. http://www.i6doc.com.

  36. Sato Y, Nagatomi T, Horio K, Miyamoto H. The cognitive mechanisms of multi-scale perception for the recognition of extremely similar faces. Cogn Comput. 2015;7(5):501–508.

  37. Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85–117.

  38. Schrauwen B, Wardermann M, Verstraeten D, Steil J, Stroobandt D. Improving reservoirs using intrinsic plasticity. Neurocomputing 2008;71(7):1159–1171.

  39. Spratling M. A hierarchical predictive coding model of object recognition in natural images. Cogn Comput. 2016: 1–17.

  40. Steil J. 2004. Backpropagation-decorrelation: online recurrent learning with O(n) complexity. In: Proceedings of the 2004 IEEE international joint conference on neural networks (IJCNN). IEEE; vol. 2, p. 843–848.

  41. Tiňo P, Hammer B, Bodén M. 2007. Markovian bias of neural-based architectures with feedback connections. In: Perspectives of neural-symbolic integration. Springer; p. 95–133.

  42. Tiňo P, Dorffner G. Predicting the future of discrete sequences from fractal representations of the past. Mach Learn. 2001;45(2):187–217.

  43. Triefenbach F, Jalalvand A, Demuynck K, Martens JP. Acoustic modeling with hierarchical reservoirs. IEEE Trans Audio Speech Lang Process. 2013;21(11):2439–2450.

  44. Triefenbach F, Jalalvand A, Schrauwen B, Martens JP. 2010. Phoneme recognition with large hierarchical reservoirs. In: Advances in neural information processing systems, p. 2307–2315.

  45. Tyrrell T. The use of hierarchies for action selection. Adapt Behav. 1993;1(4):387–420.

  46. Verstraeten D, Schrauwen B, D’haene M, Stroobandt D. An experimental unification of reservoir computing methods. Neural Netw. 2007;20(3):391–403.

  47. Wainrib G, Galtier M. A local echo state property through the largest lyapunov exponent. Neural Netw. 2016;76:39–45.

  48. Xue Y, Yang L, Haykin S. Decoupled echo state networks with lateral inhibition. Neural Netw. 2007;20(3):365–376.

  49. Yildiz I, Jaeger H, Kiebel S. Re-visiting the echo state property. Neural Netw. 2012;35:1–9.

Author information

Corresponding author

Correspondence to Claudio Gallicchio.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Appendices

Appendix A: Proof of Lemma 2

Let us consider the Jacobian of the deepESN state transition function in Eq. 1, evaluated at u(t) and x(t − 1). From Eq. 8 we can see that for every j > i, the block \(\mathbf{J}_{F^{(i)},\mathbf{x}^{(j)}}(\mathbf{u}(t),\mathbf{x}(t-1))\) is a zero matrix, as the hierarchical structure of the deepESN architecture (see Fig. 1 and Eqs. 2 and 3) implies that the state update at layer i does not depend on the previous state of the system at higher layers in the stack (i.e. at layers j, with j > i). Thereby, \(\mathbf{J}_{F,\mathbf{x}}(\mathbf{u}(t),\mathbf{x}(t-1))\) has the structure of a lower-triangular block matrix. As such, its eigenvalues are the eigenvalues of the matrices on its block diagonal, i.e. the eigenvalues of \(\mathbf{J}_{F^{(i)},\mathbf{x}^{(i)}}(\mathbf{u}(t),\mathbf{x}(t-1))\) for every \(i = 1,2,\ldots,N_L\). Accordingly, the spectral radius of \(\mathbf{J}_{F,\mathbf{x}}(\mathbf{u}(t),\mathbf{x}(t-1))\) is the maximum among the spectral radii of its diagonal blocks, i.e.,

$$ \rho(\mathbf{J}_{F,\mathbf{x}}(\mathbf{u}(t),\mathbf{x}(t-1))) = \max\limits_{k = 1,2,\ldots,N_{L}} \rho\left(\mathbf{J}_{F^{(k)},\mathbf{x}^{(k)}}(\mathbf{u}(t),\mathbf{x}(t-1))\right). $$
(25)

To compute the diagonal block matrices \(\mathbf{J}_{F^{(k)},\mathbf{x}^{(k)}}(\mathbf{u}(t),\mathbf{x}(t-1))\), we observe from Eq. 8 that

$$ \begin{array}{l} \mathbf{J}_{F^{(k)},\mathbf{x}^{(k)}}(\mathbf{u}(t),\mathbf{x}(t-1)) = \frac{\partial F^{(k)}(\mathbf{u}(t), \mathbf{x}^{(1)}(t-1),\ldots,\mathbf{x}^{(k)}(t-1))}{\partial \mathbf{x}^{(k)}(t-1)} = \\\\ \frac{\partial}{\partial \mathbf{x}^{(k)}(t-1)}\left( (1-a^{(k)}) \mathbf{x}^{(k)}(t-1) + a^{(k)} \tanh(\mathbf{W}_{in}^{(k)} F^{(k-1)}(\mathbf{u}(t),\mathbf{x}^{(1)}(t-1),\ldots,\mathbf{x}^{(k-1)}(t-1)) + \right.\\\\ \left. \boldsymbol{\theta}^{(k)} + \hat{\mathbf{W}}^{(k)} \mathbf{x}^{(k)}(t-1)) \right)= \\\\ (1-a^{(k)})\mathbf{I} + a^{(k)} \; \left( \begin{array}{llll} 1-(\tilde{x}_1^{(k)}(t))^2 &\; 0 &\; {\ldots} & \;0\\ 0 & \; 1-(\tilde{x}_2^{(k)}(t))^2 & \; {\ldots} & \; 0\\ {\vdots} & \; {\vdots} & \; {\ddots} & \; {\vdots} \\ 0 & \; 0 & \; {\ldots} & \; 1-(\tilde{x}_{N_R}^{(k)}(t))^2\\ \end{array} \right) \hat{\mathbf{W}}^{(k)} \end{array} $$
(26)

where, for \(j = 1,2,\ldots,N_R\), \(\tilde{x}_{j}^{(k)}(t)\) denotes the j-th element of \(\tilde{\mathbf{x}}^{(k)}(t) = \tanh(\mathbf{W}_{in}^{(k)} F^{(k-1)}(\mathbf{u}(t),\mathbf{x}^{(1)}(t-1),\mathbf{x}^{(2)}(t-1),\ldots,\mathbf{x}^{(k-1)}(t-1)) + \boldsymbol{\theta}^{(k)} + \hat{\mathbf{W}}^{(k)} \mathbf{x}^{(k)}(t-1))\).

Considering zero input and zero state, from Eq. 26 we can derive that, for every \(k = 1,2,\ldots,N_L\),

$$ \rho(\mathbf{J}_{F^{(k)},\mathbf{x}^{(k)}}(\mathbf{0}_{u},\mathbf{0})) = \rho ((1-a^{(k)}) \mathbf{I} + a^{(k)} \hat{\mathbf{W}}^{(k)} ) $$
(27)

and therefore Eq. 25 becomes

$$ \rho(\mathbf{J}_{F,\mathbf{x}}(\mathbf{0}_{u},\mathbf{0})) = \max\limits_{k = 1,2,\ldots,N_{L}} \rho ((1-a^{(k)}) \mathbf{I} + a^{(k)} \hat{\mathbf{W}}^{(k)} ). $$
(28)
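
As an illustration of how the necessary condition can be checked in practice, the following sketch (ours, not part of the paper) evaluates the right-hand side of Eq. 28 as the maximum spectral radius over the layer-wise diagonal blocks \((1-a^{(k)})\mathbf{I} + a^{(k)}\hat{\mathbf{W}}^{(k)}\); the necessary condition for the ESP requires this quantity to be smaller than 1.

import numpy as np

def necessary_condition_radius(W_hat_list, leaky_list):
    """Eq. 28: spectral radius of the Jacobian at zero input and zero state,
    i.e. the maximum over layers k of rho((1 - a^(k)) I + a^(k) W_hat^(k)).
    The necessary condition for the ESP requires the returned value to be < 1."""
    radii = []
    for a, W_hat in zip(leaky_list, W_hat_list):
        M = (1 - a) * np.eye(W_hat.shape[0]) + a * W_hat
        radii.append(np.max(np.abs(np.linalg.eigvals(M))))
    return max(radii)

Note that appending further layers can only leave this maximum unchanged or increase it, in line with the observation that layering drives the dynamics towards equally or less stable regimes.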

Appendix B: Proof of Lemma 3

case (a): This case follows from the contractivity analysis of standard shallow ESNs [10]. Indeed, \(\forall \mathbf{u} \in \mathbb{R}^{N_U}\) and \(\forall \mathbf{x}^{(1)}, \mathbf{x}^{\prime(1)} \in \mathbb{R}^{N_R}\)

$$\begin{array}{@{}rcl@{}} &&\|F^{(1)}(\mathbf{u},\mathbf{x}^{(1)}) - F^{(1)}(\mathbf{u},\mathbf{x}^{\prime(1)})\|\\ &=&\|(1 - a^{(1)}) \mathbf{x}^{(1)} + a^{(1)} \tanh\left( \mathbf{W}_{in}^{(1)} \mathbf{u} + \boldsymbol{\theta}^{(1)} + \hat{\mathbf{W}}^{(1)} \mathbf{x}^{(1)}\right) \\ &&-(1 - a^{(1)}) \mathbf{x}^{\prime(1)} -a^{(1)}\tanh\left( \mathbf{W}_{in}^{(1)} \mathbf{u} + \boldsymbol{\theta}^{(1)} + \hat{\mathbf{W}}^{(1)} \mathbf{x}^{\prime(1)}\right)\| \end{array} $$
$$\begin{array}{@{}rcl@{}} &=&\|(1 \,-\, a^{(1)}) (\mathbf{x}^{(1)}\,-\,\mathbf{x}^{\prime(1)}) \,+\, a^{(1)} (\tanh\left( \mathbf{W}_{in}^{(1)} \mathbf{u} + \boldsymbol{\theta}^{(1)} \!+ \hat{\mathbf{W}}^{(1)} \mathbf{x}^{(1)}\right) \\ &&-\tanh\left( \mathbf{W}_{in}^{(1)} \mathbf{u} + \boldsymbol{\theta}^{(1)} + \hat{\mathbf{W}}^{(1)} \mathbf{x}^{\prime(1)}\right))\| \\ &\leq&(1 - a^{(1)}) \|\mathbf{x}^{(1)} - \mathbf{x}^{\prime(1)}\| + a^{(1)} \|\mathbf{W}_{in}^{(1)}\mathbf{u}+\boldsymbol{\theta}^{(1)}+\hat{\mathbf{W}}^{(1)}\mathbf{x}^{(1)} \\ &&-\mathbf{W}_{in}^{(1)}\mathbf{u} - \boldsymbol{\theta}^{(1)} - \hat{\mathbf{W}}^{(1)}\mathbf{x}^{\prime(1)}\| \\ &\leq&(1 - a^{(1)}) \|\mathbf{x}^{(1)}- \mathbf{x}^{\prime(1)}\| + a^{(1)} \|\hat{\mathbf{W}}^{(1)}\| \|\mathbf{x}^{(1)} - \mathbf{x}^{\prime(1)}\| \\ &=&((1 - a^{(1)}) + a^{(1)} \|\hat{\mathbf{W}}^{(1)}\|) \|\mathbf{x}^{(1)} - \mathbf{x}^{\prime(1)}\| \end{array} $$
(29)

from which it follows that \(C^{(1)} = (1 - a^{(1)}) + a^{(1)} \|\hat{\mathbf{W}}^{(1)}\|\) is a Lipschitz constant for \(F^{(1)}\). Thus, if \(C^{(1)} < 1\), then \(F^{(1)}\) is a contraction (see Definition 2).

case (b): In this case, assuming that \(F^{(i-1)}\) is a contraction with Lipschitz constant \(C^{(i-1)} < 1\), \(\forall \mathbf{u} \in \mathbb{R}^{N_U}\) and \(\forall \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)}, \mathbf{x}^{\prime(1)}, \ldots, \mathbf{x}^{\prime(i)} \in \mathbb{R}^{N_R}\)

$$\begin{array}{@{}rcl@{}} \begin{array}{l} \|F^{(i)}(\mathbf{u}, \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)}) - F^{(i)}(\mathbf{u}, \mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i)})\| = \\ \\ \| (1-a^{(i)}) \mathbf{x}^{(i)} + a^{(i)} \tanh (\mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}^{(1)},\ldots, \mathbf{x}^{(i-1)}) + \boldsymbol{\theta}^{(i)} + \hat{\mathbf{W}}^{(i)} \mathbf{x}^{(i)}) - \\ \;\;(1-a^{(i)}) \mathbf{x}'^{(i)} - a^{(i)} \tanh (\mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i-1)}) + \boldsymbol{\theta}^{(i)} + \hat{\mathbf{W}}^{(i)} \mathbf{x}'^{(i)}) \| = \\ \\ \| (1-a^{(i)}) (\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}) + a^{(i)} (\tanh (\mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i-1)}) + \boldsymbol{\theta}^{(i)} + \hat{\mathbf{W}}^{(i)} \mathbf{x}^{(i)}) - \\ \;\; \tanh (\mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i-1)}) + \boldsymbol{\theta}^{(i)} + \hat{\mathbf{W}}^{(i)} \mathbf{x}'^{(i)})) \| \leq \\ \\ (1-a^{(i)}) \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}\| + a^{(i)} \|\mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}^{(1)},\ldots, \mathbf{x}^{(i-1)}) + \boldsymbol{\theta}^{(i)} + \hat{\mathbf{W}}^{(i)} \mathbf{x}^{(i)} - \\ \mathbf{W}_{in}^{(i)}F^{(i-1)}(\mathbf{u}, \mathbf{x}'^{(1)},\ldots,\mathbf{x}'^{(i-1)}) - \boldsymbol{\theta}^{(i)} - \hat{\mathbf{W}}^{(i)} \mathbf{x}'^{(i)}\| = \\ \\ (1-a^{(i)}) \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}\| + a^{(i)} \| \mathbf{W}_{in}^{(i)} (F^{(i-1)}(\mathbf{u}, \mathbf{x}^{(1)},\ldots, \mathbf{x}^{(i-1)}) - \\ F^{(i-1)}(\mathbf{u}, \mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i-1)})) + \hat{\mathbf{W}}^{(i)} (\mathbf{x}^{(i)} - \mathbf{x}'^{(i)})\| \leq \\ \\ (1-a^{(i)}) \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}\| + a^{(i)} \left( \| \mathbf{W}_{in}^{(i)} \| \| F^{(i-1)}(\mathbf{u}, \mathbf{x}^{(1)},\ldots, \mathbf{x}^{(i-1)}) - \right.\\ \left.F^{(i-1)}(\mathbf{u}, \mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i-1)}) \| + \|\hat{\mathbf{W}}^{(i)}\| \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}\| \right) \leq \\ \\ (1-a^{(i)}) \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)}\| + a^{(i)} \left( \| \mathbf{W}_{in}^{(i)}\| \, C^{(i-1)} \, \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i-1)})-\right.\\ \left.(\mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i-1)})\| + \|\hat{\mathbf{W}}^{(i)}\| \|\mathbf{x}^{(i)} - \mathbf{x}'^{(i)} \|\right) \leq \\ \\ (1-a^{(i)}) \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-(\mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i)})\| + a^{(i)} \left( C^{(i-1)} \|\mathbf{W}_{in}^{(i)}\| \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-\right.\\ \left.(\mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i)})\| + \|\hat{\mathbf{W}}^{(i)}\| \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-(\mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i)})\|\right) = \\ \\ (1-a^{(i)}) \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-(\mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i)})\| + \\ a^{(i)} \left( C^{(i-1)} \|\mathbf{W}_{in}^{(i)}\| + \|\hat{\mathbf{W}}^{(i)}\|\right) \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-(\mathbf{x}'^{(1)},\ldots, \mathbf{x}'^{(i)})\| = \\ \\ \left[(1-a^{(i)}) + a^{(i)}\left( C^{(i-1)} \|\mathbf{W}_{in}^{(i)}\| + \|\hat{\mathbf{W}}^{(i)}\|\right)\right] \|(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i)})-(\mathbf{x}'^{(1)}, \ldots, \mathbf{x}'^{(i)})\| \end{array} \end{array} $$
(30)

from which we can see that \(C^{(i)} = (1-a^{(i)}) + a^{(i)}\left(C^{(i-1)} \|\mathbf{W}_{in}^{(i)}\| + \|\hat{\mathbf{W}}^{(i)}\|\right)\) is a Lipschitz constant for \(F^{(i)}\); thereby, whenever \(C^{(i)} < 1\), \(F^{(i)}\) is a contraction (see Definition 2).
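
The layer-wise Lipschitz constants of Lemma 3 can be computed recursively, as in the following sketch (again an illustration of ours, with the spectral norm assumed as the operator norm \(\|\cdot\|\)); contractivity of \(F^{(i)}\) holds whenever the returned \(C^{(i)}\) is smaller than 1.

import numpy as np

def lipschitz_constants(W_in_list, W_hat_list, leaky_list):
    """Recursive computation of the C^(i) of Lemma 3, using the spectral norm:
    C^(1) = (1 - a^(1)) + a^(1) ||W_hat^(1)||,
    C^(i) = (1 - a^(i)) + a^(i) (C^(i-1) ||W_in^(i)|| + ||W_hat^(i)||) for i > 1.
    F^(i) is a contraction whenever C^(i) < 1."""
    C = []
    for i, (a, W_in, W_hat) in enumerate(zip(leaky_list, W_in_list, W_hat_list)):
        norm_hat = np.linalg.norm(W_hat, 2)              # largest singular value
        if i == 0:
            C.append((1 - a) + a * norm_hat)
        else:
            C.append((1 - a) + a * (C[-1] * np.linalg.norm(W_in, 2) + norm_hat))
    return C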

Appendix C: Proof of Theorem 2

Proof

Given any input string of length N, denoted by \(\mathbf{s}_N = [\mathbf{u}(1),\ldots,\mathbf{u}(N)]\), and for every pair of deepESN global states \(\mathbf{x} = (\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N_{L})}) \in \mathbb{R}^{N_{L} N_{R}}\) and \(\mathbf{x}^{\prime} = (\mathbf{x}^{\prime(1)},\ldots, \mathbf{x}^{\prime(N_{L})}) \in \mathbb{R}^{N_{L} N_{R}}\), we have that:

$$\begin{array}{@{}rcl@{}} &&\|\hat{F}(\mathbf{s}_{N},\mathbf{x}) - \hat{F}(\mathbf{s}_{N},\mathbf{x}^{\prime})\| \\ &=&\|\hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N)],\mathbf{x}) - \hat{F}([\mathbf{u}(1), \ldots, \mathbf{u}(N)],\mathbf{x}^{\prime})\| \\ &=&\|F(\mathbf{u}(N),\hat{F}([\mathbf{u}(1), \ldots, \mathbf{u}(N-1)],\mathbf{x})) \\ &&-F(\mathbf{u}(N),\hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N-1)],\mathbf{x}^{\prime}))\| \\ &\leq& C \|\hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N\,-\,1)],\mathbf{x}) \,-\, \hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N\,-\,1)],\mathbf{x}^{\prime})\| \\ &=&C \|F(\mathbf{u}(N-1),\hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N-2)],\mathbf{x})) \\ &&- F(\mathbf{u}(N-1),\hat{F}([\mathbf{u}(1),\ldots, \mathbf{u}(N-2)],\mathbf{x}^{\prime}))\| \\ &\leq& C^{2} \|\hat{F}([\mathbf{u}(1), \ldots, \mathbf{u}(N\,-\,2)],\mathbf{x}) \,-\, \hat{F}([\mathbf{u}(1), \ldots, \mathbf{u}(N\,-\,2)],\mathbf{x}^{\prime})\| \\ &\leq&\ldots \\ && C^{N-1} \|\hat{F}([\mathbf{u}(1)],\mathbf{x}) - \hat{F}([\mathbf{u}(1)],\mathbf{x}^{\prime})\| \\ &=&C^{N-1} \|F(\mathbf{u}(1),\hat{F}([\;],\mathbf{x})) - F(\mathbf{u}(1),\hat{F}([\;],\mathbf{x}^{\prime}))\| \\ &=&C^{N-1} \|F(\mathbf{u}(1),\mathbf{x}) - F(\mathbf{u}(1),\mathbf{x}^{\prime})\| \\ &\leq& C^{N} \, \|\mathbf{x} - \mathbf{x}^{\prime}\| \\ &=&C^{N} \, \max\limits_{k = 1,2,\ldots,N_{L}} \|\mathbf{x}^{(k)} - \mathbf{x}^{\prime(k)} \| \\ &\leq& C^{N} D \end{array} $$
(31)

from which it follows that \(\|\hat{F}(\mathbf{s}_{N},\mathbf{x}) - \hat{F}(\mathbf{s}_{N},\mathbf{x}^{\prime})\|\) is upper bounded by a term that approaches 0 as \(N \to \infty\), since \(C < 1\). Thereby, the ESP condition in Definition 1 holds. □
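
The geometric bound of Eq. 31 can also be inspected numerically: feeding the same input sequence to the same reservoir started from two different initial global states, the state distance should decay at least as fast as \(C^{N}\) when the global transition function is a contraction. The helper below is a hypothetical sketch of ours; reservoir_step(u, x) stands for any implementation of the global state transition function.

import numpy as np

def state_distance_decay(reservoir_step, x0_a, x0_b, inputs):
    """Distance ||x_a(t) - x_b(t)|| over time when the same input sequence is
    fed to the same global state transition function started from two
    different initial states. Under a contraction with constant C < 1, the
    distance after N steps is bounded by C**N * ||x0_a - x0_b|| (cf. Eq. 31)."""
    xa, xb, dists = x0_a, x0_b, []
    for u in inputs:
        xa = reservoir_step(u, xa)
        xb = reservoir_step(u, xb)
        dists.append(np.linalg.norm(xa - xb))
    return dists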

About this article

Cite this article

Gallicchio, C., Micheli, A. Echo State Property of Deep Reservoir Computing Networks. Cogn Comput 9, 337–350 (2017). https://doi.org/10.1007/s12559-017-9461-9
