
Training Echo State Networks with Regularization Through Dimensionality Reduction


Abstract

In this paper, we introduce a new framework to train a class of recurrent neural networks, called Echo State Networks (ESNs), to predict real-valued time series and to provide a visualization of the modeled system dynamics. The method consists of projecting the output of the network's internal layer onto a lower-dimensional space before training the output layer on the target task. Notably, we enforce a regularization constraint that leads to better generalization capabilities. We evaluate the performance of our approach on several benchmark tests, using different techniques to train the readout of the network, and achieve superior predictive performance with the proposed framework. Finally, we provide insight into the effectiveness of the proposed mechanism through a visualization of the trajectory in phase space, relying on the methodologies of nonlinear time-series analysis. By applying our method to well-known chaotic systems, we provide evidence that the lower-dimensional embedding retains the dynamical properties of the underlying system better than the full-dimensional internal states of the network.
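The abstract outlines a three-step pipeline: drive a fixed random reservoir with the input series, compress the resulting internal states to a low-dimensional embedding, and train only the readout on the compressed states. The sketch below is a minimal illustration of that idea, assuming PCA as the dimensionality-reduction step and a ridge-regression readout; the reservoir size, spectral radius, reduced dimension, and regularization strength are illustrative placeholders rather than the paper's actual configuration.

```python
# Minimal ESN-with-dimensionality-reduction sketch (assumed setup:
# PCA projection + ridge readout; hyperparameters are placeholders).
import numpy as np

rng = np.random.default_rng(0)

# Toy one-dimensional input series and one-step-ahead target.
u = np.sin(0.1 * np.arange(2000))[:, None]     # shape (T, 1)
y = u[1:]                                      # predict the next value
u = u[:-1]
T = len(u)

# Fixed random reservoir; rescale to spectral radius 0.9 so the
# echo state property plausibly holds.
N = 300
W_in = rng.uniform(-0.5, 0.5, (N, 1))
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

# Collect the internal states.
X = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W_in @ u[t] + W @ x)
    X[t] = x

washout = 100                                  # discard the transient
X, y = X[washout:], y[washout:]

# Project the reservoir states onto the top-d principal components.
d = 20
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:d].T                              # low-dimensional embedding

# Ridge-regression readout trained on the reduced states only.
lam = 1e-6
W_out = np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)
pred = Z @ W_out
print("train MSE:", np.mean((pred - y) ** 2))
```

In this reading, restricting the readout to d ≪ N principal coordinates is what supplies the regularization constraint mentioned in the abstract, on top of the explicit ridge penalty.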



Author information

Corresponding author

Correspondence to Filippo Maria Bianchi.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008. Additional informed consent was obtained from all patients for whom identifying information is included in this article.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.


About this article


Cite this article

Løkse, S., Bianchi, F.M. & Jenssen, R. Training Echo State Networks with Regularization Through Dimensionality Reduction. Cogn Comput 9, 364–378 (2017). https://doi.org/10.1007/s12559-017-9450-z

