Cognitive Computation, Volume 9, Issue 3, pp 364–378

Training Echo State Networks with Regularization Through Dimensionality Reduction

  • Sigurd Løkse
  • Filippo Maria Bianchi
  • Robert Jenssen


In this paper, we introduce a new framework for training a class of recurrent neural networks, called Echo State Networks, to predict real-valued time series and to provide a visualization of the modeled system dynamics. The method consists of projecting the output of the internal layer of the network onto a lower-dimensional space before training the output layer to learn the target task. Notably, we enforce a regularization constraint that leads to better generalization capabilities. We evaluate the performance of our approach on several benchmark tests, using different techniques to train the readout of the network, and achieve superior predictive performance with the proposed framework. Finally, we provide insight into the effectiveness of the implemented mechanism through a visualization of the trajectory in phase space, relying on methodologies from nonlinear time-series analysis. By applying our method to well-known chaotic systems, we provide evidence that the lower-dimensional embedding retains the dynamical properties of the underlying system better than the full-dimensional internal states of the network.
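The procedure described above — driving a fixed random reservoir, projecting its internal states onto a lower-dimensional space, and only then fitting the readout — can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes PCA as the dimensionality-reduction step and a ridge-regression readout, and all hyperparameter values (reservoir size, spectral radius, number of retained components, regularization strength) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_reservoir(u, n_res=200, rho=0.9, in_scale=0.5):
    """Drive a random tanh reservoir with scalar input sequence u; return all states."""
    W_in = rng.uniform(-in_scale, in_scale, n_res)
    W = rng.uniform(-1.0, 1.0, (n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius rho
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, ut in enumerate(u):
        x = np.tanh(W @ x + W_in * ut)
        states[t] = x
    return states

def pca_project(states, d):
    """Project reservoir states onto their top-d principal components."""
    centered = states - states.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:d].T

# illustrative task: one-step-ahead prediction of a sine wave
u = np.sin(np.linspace(0, 20 * np.pi, 1001))
states = run_reservoir(u[:-1])
Z = pca_project(states[100:], d=20)  # drop a 100-step washout, reduce 200 -> 20 dims
y = u[101:]                          # target: the next input value

# ridge-regression readout trained on the reduced states
lam = 1e-6
W_out = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
mse = np.mean((Z @ W_out - y) ** 2)
```

Fitting the readout on the 20-dimensional projection rather than the full 200-dimensional states acts as the regularization discussed in the paper: the readout has far fewer parameters, and the discarded low-variance directions are precisely those most prone to fitting noise.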


Keywords: Echo state network · Nonlinear time-series analysis · Dimensionality reduction · Time-series prediction


Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.


Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Machine Learning Group, Department of Physics and Technology, University of Tromsø - The Arctic University of Norway, Tromsø, Norway
