1 Introduction

Time series data are observations of well-defined data items that represent repeated measurements over a period of time, such as a month, quarter, or year [1]. A time series shows the actual movements in the data over time, caused by cyclical, seasonal, and irregular events affecting the data item being measured. Time series data are used in many areas such as statistics, signal processing, pattern recognition, earthquake prediction, weather forecasting, trajectory forecasting, control engineering and communications engineering.

The development of time series forecasting models for nonlinear behavior represents a challenge for both engineers and mathematicians. Typically, nonlinear time series modeling involves two major steps: the selection of a model structure with a certain set of parameters, and the selection of an algorithm that estimates these parameters. The latter issue usually biases the former. There are still many unsolved problems related to the design and implementation of time series models.

In the literature, a wide range of machine learning algorithms has been proposed and applied for the task of time series forecasting. One of the most common is the Echo State Network (ESN) [2]. ESN is a supervised-learning recurrent neural network (RNN) with a fixed random hidden (reservoir) layer and a memoryless readout. The idea of ESN is to drive a large, random, fixed RNN with the input signal, so that each neuron (unit) in the reservoir produces a nonlinear response signal [3]. The desired output signal is then obtained as a trainable linear combination of all these response signals. The basic idea of ESN is similar to that of the Liquid State Machine (LSM), which was developed independently by Maass et al. [4].

However, the standard ESN possesses several drawbacks that affect its acceptability. First, the fixed random reservoir is difficult to understand. Second, specifying the reservoir and the input connections requires many trials. Third, imposing a constraint on the spectral radius of the reservoir matrix is of little use when setting the reservoir parameters [5]. Lastly, the reservoir’s connectivity and weight structure are not optimal, and the dynamic organization of the reservoir is still unclear.

In an attempt to overcome these problems, Rodan and Tino [6] proposed the deterministic Cycle Reservoir with regular Jumps (CRJ). CRJ is a new class of state-space reservoir models: it possesses a fixed state-transition structure (the “reservoir”) and an adjustable readout from the state space, as in all Reservoir Computing (RC) models.

CRJ has highly constrained weight values: the nodes in the reservoir are connected in a unidirectional cycle with a fixed weight \(r_c\), similar to the Simple Cycle Reservoir (SCR) [7]. In addition, bidirectional jumps with a fixed weight \(r_j\) serve as shortcuts in the CRJ network. Previous work has shown that the addition of these regular jumps can improve the performance of the model [6]. Recently, CRJ has shown very promising performance in different types of applications [8, 9].

In this work, we investigate the application of CRJ [6] to different time series forecasting problems. For this purpose, seven time series datasets from different real-world applications are utilized. The performance of the developed CRJ model is evaluated and compared with ESN [2, 10] and the NARX model. The ultimate goal of this study is to reveal the efficiency of CRJ when used for time series forecasting in different applications.

This paper is organized as follows: all methods used in this work for the task of time series forecasting are described in Sect. 2. The selected time series datasets for the purpose of evaluating and benchmarking are presented in Sect. 3. The details of the conducted experiments and the discussion are given in Sect. 4. Finally, the findings of this work are summarized in Sect. 5.

2 Methods

2.1 Echo State Network (ESN)

ESN is a discrete-time recurrent neural network with A input units, M internal (reservoir) units and S output units over discrete time slots \(n=1,2,3,\dots \). The activations of the units are collected in one vector per layer, as given in Eq. 1.

$$\begin{aligned} a(n)=(a_{1}(n),\dots ,a_{A}(n))^{T},\ b(n)=(b_{1}(n),\dots ,b_{M}(n))^{T},\ c(n)=(c_{1}(n),\dots ,c_{S}(n))^{T} \end{aligned}$$
(1)

The connection weights between the neurons are collected in four matrices: an \(M \times A\) input weight matrix \(W^{in}=(w_{ji}^{in})\), an \(M \times M\) internal weight matrix \(W=(w_{ij})\), an \(S \times (A+M)\) output weight matrix \(W^{out}=(w_{ij}^{out})\), and an \(M \times S\) matrix \(W^{back}=(w_{ij}^{back})\) for the connections that project back from the output to the internal units.

Unlike standard RNNs, where all the weights for the input, internal and output connections are adaptable, in ESN the reservoir connection weights as well as the input weights are randomly generated and fixed (non-trainable); only the output weights are trained. The fixed random weights of the input and reservoir layers are scaled by chosen values, v for the input and \(\lambda \) for the reservoir, where \(v, \lambda \in (0,1)\). Moreover, the spectral radius of the reservoir matrix should be less than 1 to ensure a sufficient condition for the “echo state property” (ESP). By doing so, ESN ensures that the reservoir state is an “echo” of the entire input history. The internal units are updated, when moving from time slot n to time slot \(n+1\), according to Eq. 2.

$$\begin{aligned} b(n+1)=f(W^{in}a(n+1)+Wb(n)+W^{back}c(n)+z(n+1)) \end{aligned}$$
(2)

where f is the reservoir activation function (usually the hyperbolic tangent, tanh) and z is an optional small white-noise term that may be needed in some cases to alleviate overfitting. The linear output is computed using Eq. 3.

$$\begin{aligned} c(n+1)=f^{out}(W^{out}x(n+1)) \end{aligned}$$
(3)

where \(f^{out}=(f_{1}^{out},\dots ,f_{S}^{out})\) are the output unit functions and \(x(n+1)=[b(n+1);a(n+1)]\) is the concatenation of the internal and input activation vectors.
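As an illustration of Eqs. 2 and 3, the following is a minimal NumPy sketch of an ESN, assuming no output feedback (\(W^{back}=0\)), no noise term, an identity output function, and a least-squares fit of the readout. The toy sine task, sizes and scalings are illustrative choices, not those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A, M = 1, 50                       # input and reservoir sizes (illustrative)
v, lam = 0.5, 0.9                  # input scaling v and spectral radius lambda

W_in = v * rng.uniform(-1, 1, (M, A))
W = rng.uniform(-0.5, 0.5, (M, M))
W *= lam / max(abs(np.linalg.eigvals(W)))   # enforce spectral radius < 1 (ESP)

def update(b, a_next):
    # Eq. 2 without the feedback term W_back * c(n) and the noise z(n+1)
    return np.tanh(W_in @ a_next + W @ b)

# Toy one-step-ahead prediction task on a sine wave
T = 200
inputs = np.sin(0.1 * np.arange(T)).reshape(T, A)
targets = np.sin(0.1 * (np.arange(T) + 1))

b = np.zeros(M)
X = np.zeros((T, M + A))
for n in range(T):
    b = update(b, inputs[n])
    X[n] = np.concatenate([b, inputs[n]])   # x(n) = [b(n); a(n)]

# Readout (Eq. 3 with identity f_out), trained by least squares
W_out, *_ = np.linalg.lstsq(X, targets, rcond=None)
pred = X @ W_out
```

Only `W_out` is fitted; `W_in` and `W` stay exactly as generated, which is the defining trait of reservoir computing.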

2.2 Cycle Reservoir with Regular Jump (CRJ)

CRJ is a simple deterministic reservoir model with highly constrained weight values [6]. CRJ generates its reservoir deterministically, and this reservoir has the potential to perform better than the standard ESN and other models previously proposed in the literature [7].

Implementing CRJ requires optimizing several parameters, including the cycle weight \(r_c\), the jump weight \(r_j\) and the input weight v. The reservoir size N is then determined, as in ESN. Moreover, the numbers of input and output units, along with the bias value added to the input units, are determined by the nature of the task and the target output.

Unlike ESN, CRJ has a simple regular topology with full connectivity between the input and the reservoir. There is no need to specify a different weight value for each connection between two nodes, since all reservoir nodes are connected in a unidirectional cycle with the same weight \(r_c\), whose value should be in the range [0, 1].

CRJ also has bidirectional shortcuts (jumps) with weight \(r_j\) between the reservoir units. These jumps increase the density of the connections between the internal units, which in turn facilitates training. Unlike ESN, which generates the input weights randomly, CRJ sets the signs of its input weights in a completely deterministic manner. The deterministic input signs are generated from the decimal digits of \(\pi \), with each digit thresholded at 4: if the digit is in the range \(0\leqslant digit\leqslant 4\), the connection sign is negative \((-)\); if it is in the range \(5\leqslant digit\leqslant 9\), the sign is positive \((+)\).
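The CRJ construction described above can be sketched as follows. The helper names and the hard-coded digits-of-\(\pi\) string are illustrative, and the jump wiring assumes the jump size divides N evenly, a simplification of the scheme in [6].

```python
import numpy as np

def crj_reservoir(N, r_c, r_j, jump_size):
    """Cycle reservoir with regular jumps (illustrative construction)."""
    W = np.zeros((N, N))
    # Unidirectional cycle: every node feeds the next with the same weight r_c
    for i in range(N):
        W[(i + 1) % N, i] = r_c
    # Bidirectional jumps of fixed length `jump_size`, all with weight r_j
    for i in range(0, N, jump_size):
        j = (i + jump_size) % N
        W[j, i] = r_j
        W[i, j] = r_j
    return W

# First decimal digits of pi, used to fix the input weight signs
PI_DIGITS = "1415926535897932384626433832795028841971693993751"

def input_signs(k):
    # Digits 0-4 map to a minus sign, digits 5-9 to a plus sign
    return np.array([-1 if int(d) <= 4 else 1 for d in PI_DIGITS[:k]])

def input_weights(k, v):
    # All input weights share the magnitude v; only the signs vary
    return v * input_signs(k)
```

Note how few free parameters remain: \(r_c\), \(r_j\), the jump size and v fully determine the network, which is what makes CRJ deterministic and easy to reproduce.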

2.3 Nonlinear Auto-Regressive with eXogenous Inputs (NARX)

The NARX model was first presented in 1985 by Leontaritis and Billings [11, 12] as a means of describing the input-output relationship of a nonlinear system [13]. Time series prediction using the NARX model has been explored in many articles [14, 15]. The general NARX model structure can be represented by the following nonlinear difference equation:

$$\begin{aligned} y(t) = f(y(t-1),\dots ,y(t-n),u(t-1),\dots ,u(t-m)) \end{aligned}$$
(4)

f represents a nonlinear mapping between the system inputs \(u(t-1),\dots ,u(t-m)\), the past outputs \(y(t-1),\dots ,y(t-n)\) and the current output y(t). The orders of the model output and input are assumed to be n and m, respectively. The NARX model can be expanded as given in Eq. 5, and the model parameters can be estimated using least squares estimation (LSE).

$$\begin{aligned} y(t)= & {} a_0 + \sum _{k_1=1}^{n} f_{k_1} (x_{k_1}(t)) \nonumber \\+ & {} \sum _{k_1=1}^{n} \sum _{k_2=k_1}^{n} f_{k_{1}k_{2}} (x_{k_1}(t),x_{k_2}(t)) + \dots \nonumber \\+ & {} \sum _{k_1=1}^{n} \dots \sum _{k_l=k_{l-1}}^{n} f_{k_{1}k_{2} \dots k_l} (x_{k_1}(t),\dots , x_{k_l}(t)) \end{aligned}$$
(5)

Given that:

$$\begin{aligned} f_{k_{1}k_{2} \dots k_z} (x_{k_1}(t),\dots , x_{k_z}(t)) = a_{k_{1}k_{2} \dots k_z} \prod _{i=1}^{z} x_{k_i}(t) \end{aligned}$$
(6)

z is in the interval [1, l], and \(a_{k_{1}k_{2} \dots k_z}\) are the model parameters to be estimated. The regressors \(x_k(t)\) combine past outputs and inputs:

$$\begin{aligned} x_k(t) = \left\{ \begin{array}{ll} y(t-k) &{} \text {if}\, 1 \le k \le n\\ u(t-(k-n)) &{} \text {if}\, n+1 \le k \le n+m\\ \end{array} \right. \end{aligned}$$
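The regressor definition above can be sketched as a small helper function (an illustrative Python snippet, not the authors' code):

```python
import numpy as np

def narx_regressor(y, u, t, n, m):
    """Build x(t) = (x_1(t), ..., x_{n+m}(t)) from past outputs and inputs.

    x_k(t) = y(t-k)      for 1 <= k <= n
    x_k(t) = u(t-(k-n))  for n+1 <= k <= n+m
    Assumes t >= max(n, m) so all lags exist.
    """
    past_y = [y[t - k] for k in range(1, n + 1)]   # n lagged outputs
    past_u = [u[t - k] for k in range(1, m + 1)]   # m lagged inputs
    return np.array(past_y + past_u)
```

Stacking these vectors over t yields the regression matrix on which the parameters \(a_{k_1 \dots k_z}\) are estimated by least squares.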

Identifying a NARX model requires two steps: (1) selecting the best model structure [16], and (2) estimating the model parameters. The NARX model has been used in many time series analysis, modeling and nonlinear system identification tasks [17, 18].

3 Datasets

Seven time series datasets are drawn from the DataMarket Repository, which is sponsored by Qliktech [19], for experimenting with and benchmarking the models described in the previous section. The selected datasets represent real-world data, including financial forecasting problems, unemployment rates, environmental modeling and pollutant concentrations. The datasets describe a variety of problems over different time periods and have different levels of complexity. All datasets are described in Table 1 and depicted in Fig. 1.

Fig. 1. Seven representative time series.

Table 1. Datasets.

4 Experiments and Results

4.1 Parameters Settings

All datasets are divided into two sets for training and testing: 70% is used for training and the rest for testing. In order to obtain the best performance of each model in terms of lowest error, the parameters of the models should be optimized. Therefore, multiple values of each parameter are tested, and the best value is used to obtain the final output.
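Because the data are time series, the 70/30 split must be chronological rather than shuffled, so the test set always follows the training set in time. A minimal sketch (an illustrative helper, not the authors' code):

```python
import numpy as np

def chronological_split(series, train_frac=0.7):
    """Split a time series into train/test parts without shuffling.

    Shuffling would leak future values into training, so the first
    `train_frac` of the observations form the training set and the
    remaining tail forms the test set.
    """
    series = np.asarray(series)
    split = int(len(series) * train_frac)
    return series[:split], series[split:]
```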

  • Echo State Network (ESN): For the spectral radius (\(\lambda \)), 20 different values in the range [0.05–1] were tested. For the connectivity (con), 10 different values in the range [0.05–0.5] were tested. The model was also tested with different internal unit sizes (N) in the range [50–500]. The final best parameter values for the seven datasets are shown in Table 2.

  • Cycle Reservoir with Regular Jump (CRJ): For the internal unit weights (\(r_c\)) and (\(r_j\)), 20 different values of each parameter in the range [0.05–1] were tested. For the input unit scaler (v), 20 different values in the range [0.05–1] were tested. As in ESN, the model was also tested with different internal unit sizes (N) in the range [50–500]. For the jump size, N/2 different values were tested. The final best parameter values for the seven datasets are provided in Table 3.

  • NARX: The Levenberg-Marquardt training algorithm is utilized to train the model for all datasets. Different numbers of hidden nodes are tested for each dataset, starting with 5 neurons up to 50 with a step of 5. The results for this model are reported by averaging the obtained RMSE values over 10 independent runs. Evaluation results along with the best parameters are shown in Table 4 for the seven datasets.

The performance of the models is evaluated via the Root Mean Square Error (RMSE), as given in Eq. 7.

$$\begin{aligned} RMSE=\sqrt{\frac{\sum ^T_{n=1}(\hat{c}(n)-c(n))^{2}}{T}} \end{aligned}$$
(7)

where \(\hat{c}(n)\) is the predicted output and c(n) is the desired output.
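Eq. 7 translates directly into a small helper (illustrative):

```python
import numpy as np

def rmse(pred, target):
    """Root mean square error over T time steps (Eq. 7)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Mean of squared prediction errors, then square root
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```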

Table 2. ESN result
Table 3. CRJ result
Table 4. NARX result

4.2 Comparison Results

The final evaluation results of the CRJ, ESN and NARX models are summarized in Table 5. The results are presented in terms of RMSE and the standard deviation over the 10 independent runs of each model (denoted as RMSE ± STD). Note that the CRJ model does not have a standard deviation since it is deterministic. On the other hand, ESN and NARX yielded different RMSE results across runs due to their random weight generation.

As can be noticed in the evaluation results, the CRJ model showed the lowest RMSE values for all datasets. Examining the results of the other two models, neither NARX nor ESN dominates as the second best model across all datasets. The ESN model achieved the second best evaluation results on four datasets (3, 4, 5, 6), while the NARX model was the second best on three datasets (1, 2, 7). Moreover, comparing NARX to ESN in terms of standard deviation, the NARX model is more robust since it showed lower values on most of the datasets, while ESN showed noticeably high standard deviation values on Datasets 4 and 6. Overall, we can conclude that the CRJ model is very efficient when applied to complex time series forecasting problems. The CRJ model has the advantage of simplicity and robustness when compared to other well-known models such as ESN and NARX.

Table 5. Comparison result.

5 Conclusion

In this work, the application of the Cycle Reservoir with regular Jumps (CRJ) model is evaluated for time series forecasting. For the purpose of benchmarking and evaluation, seven time series datasets that describe different real-world applications are utilized. The evaluation results of CRJ are compared with those obtained for two well-regarded models: the Echo State Network (ESN) and the Nonlinear Auto-Regressive with eXogenous inputs (NARX) model. The evaluation results showed that CRJ achieved the lowest RMSE on all datasets. Subsequently, we argue that CRJ has the potential to outperform ESN and NARX in terms of accuracy and robustness when applied to time series forecasting problems.